A few months ago I posted an article on the blog about using Apache Spark to
analyse activity on our website
[https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-3-introducing-apache-spark/],
joining the site activity to reference tables for some
one-off analysis. In this article I’ll be