In some previous articles on the blog I’ve analysed Apache webserver log files
sitting on a Hadoop cluster using Hive
[https://www.rittmanmead.com/blog/2014/04/simple-data-manipulation-and-reporting-using-hive-impala-and-cdh5/],
Pig
[https://www.rittmanmead.com/blog/2014/05/simple-hadoop-dataflows-using-apache-pig-and-cdh4-6/] and,
most recently, Apache Spark
[https://www.rittmanmead.com/blog/2014/05/