emr

Tagged

A collection of 2 posts

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

In the previous articles (here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/]) I gave the background to

obiee

ETL Offload with Spark and Amazon EMR - Part 1 - Introduction

We recently undertook a two-week Proof of Concept exercise for a client, evaluating whether their existing ETL processing could be done faster and more cheaply using Spark. They were also interested in whether something like Redshift [http://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html] would provide a