Robin Moffatt

127 posts published

emr

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

In the previous articles (here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/]) I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic MapReduce

elasticsearch

Streaming data from Oracle using Oracle GoldenGate and Kafka Connect

This article was also posted on the Confluent blog [http://www.confluent.io/blog/streaming-data-oracle-using-oracle-goldengate-kafka-connect/]; head over there for more great Kafka-related content!

Kafka Connect [http://docs.confluent.io/3.0.0/connect/index.html] is part of the Confluent Platform [http://www.confluent.io/product], providing a set

Oracle GoldenGate

Using logdump to Troubleshoot the Oracle GoldenGate for Big Data Kafka Handler

Oracle GoldenGate [http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html] for Big Data (OGG BD) supports sending transactions as messages to Kafka topics, both through the native Oracle handler [http://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/GUID-2561CA12-9BAC-454B-A2E3-2D36C5C60EE5.htm#GADBD449] and through a connector into Confluent'