Rittman Mead
Tagged

s3

A collection of 3 posts

emr

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

In the previous articles (here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/]) I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic MapReduce

Robin Moffatt Dec 19, 2016 • 11 min read
spark

ETL Offload with Spark and Amazon EMR - Part 2 - Code development with Notebooks and Docker

In the previous article [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1/] I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic MapReduce (EMR) Hadoop platform. The proof of concept we ran was on

Robin Moffatt Dec 16, 2016 • 10 min read
Big Data

Forays into Kafka - Enabling Flexible Data Pipelines

One of the defining features of “Big Data” from a technologist’s point of view is the sheer number of tools and permutations at one’s disposal. Do you go Flume or Logstash? Avro or Thrift? Pig or Spark? Foo or Bar? (I made that last one up). This wealth

Robin Moffatt Oct 28, 2015 • 20 min read