Tagged

spark

A collection of 16 posts

emr

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

In the previous articles (here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here [https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/]) I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic MapReduce

Big Data

Oracle OpenWorld 2015 Roundup Part 2 : Data Integration, and Big Data (in the Cloud...)

In yesterday’s part one of our three-part Oracle OpenWorld 2015 round-up [https://www.rittmanmead.com/blog/2015/11/oracle-openworld-2015-roundup-part-1-obiee12c-and-data-visualisation-cloud-service/], we looked at the launch of OBIEE12c just before OpenWorld itself, and the new Data Visualisation Cloud Service that Thomas Kurian demo’d in his mid-week keynote. In part two we’

Technical

OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance

The other week I posted a three-part series (part 1 [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-1-why-mapreduce-is-only-for-batch-processing/], part 2 [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-2-introducing-apache-yarn-and-apache-tez/] and part 3 [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-3-introducing-apache-spark/]) on going beyond MapReduce for Hadoop-based ETL, where I l