Robin Moffatt

127 posts published

emr

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

In the previous articles (here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/]) I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic MapReduce (EMR) Hadoop

elasticsearch

Streaming data from Oracle using Oracle GoldenGate and Kafka Connect

This article was also posted on the Confluent blog [http://www.confluent.io/blog/streaming-data-oracle-using-oracle-goldengate-kafka-connect/]; head over there for more great Kafka-related content! Kafka Connect [http://docs.confluent.io/3.0.0/connect/index.html] is part of the Confluent Platform [http://www.confluent.io/product], providing a set

Oracle GoldenGate

Using logdump to Troubleshoot the Oracle GoldenGate for Big Data Kafka Handler

Oracle GoldenGate [http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html] for Big Data (OGG BD) supports sending transactions as messages to Kafka topics, through both the native Oracle handler [http://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/GUID-2561CA12-9BAC-454B-A2E3-2D36C5C60EE5.htm#GADBD449] and a connector into Confluent's Kafka