Going Beyond MapReduce for Hadoop ETL Pt.2 : Introducing Apache YARN and Apache Tez

Monday, December 8th, 2014

In the first post in this three-part series on going beyond MapReduce for Hadoop ETL, I looked at how a typical Apache Pig script gets compiled into a series of MapReduce jobs, and how those MapReduce jobs pass data between themselves by writing intermediate result sets to disk (HDFS, the Hadoop cluster file system). As a […]

Going Beyond MapReduce for Hadoop ETL Pt.1 : Why MapReduce Is Only for Batch Processing

Sunday, December 7th, 2014

Over the previous few months I’ve been looking at the various ways you can load data into Hadoop, process it and then report on it using Oracle tools. We’ve looked at Apache Hive and how it provides a SQL layer over Hadoop, making it possible for tools like ODI and OBIEE to use their usual […]

Analytics with Kibana and Elasticsearch through Hadoop – part 3 – Visualising the data in Kibana

Tuesday, November 4th, 2014

In this post we will see how Kibana can be used to create visualisations over the various sets of data that we have combined. Kibana is a graphical front end for data held in Elasticsearch, which also provides the analytic capabilities. Previously we looked at where the data came from and exposing it through Hive, […]

Analytics with Kibana and Elasticsearch through Hadoop – part 2 – Getting data into Elasticsearch

Tuesday, November 4th, 2014

In the first part of this series I described how I made several sets of data relating to the Rittman Mead blog from various sources available through Hive. This included blog hits from the Apache webserver log, tweets, and metadata from WordPress. Having got it into Hive I now need to get it into […]

Analytics with Kibana and Elasticsearch through Hadoop – part 1 – Introduction

Monday, November 3rd, 2014

I’ve recently started learning more about the tools and technologies that fall under the loose umbrella term of Big Data, following a lot of the blogs that Mark Rittman has written, including getting Apache log data into Hadoop, and bringing Twitter data into Hadoop via MongoDB. What I wanted to do was visualise the […]
