flume - Rittman Mead

Forays into Kafka - Enabling Flexible Data Pipelines

One of the defining features of “Big Data” from a technologist’s point of view is the sheer number of tools and permutations at one’s disposal. Do you go Flume or Logstash? Avro or Thrift? Pig or Spark? Foo or Bar? (I made that last one up.) This wealth

Technical

Trickle-Feeding Log Data into the HBase NoSQL Database using Flume

The other day I posted an article on the blog about using Flume to transport Apache web log entries from our website into Hadoop [https://www.rittmanmead.com/blog/2014/05/trickle-feeding-webserver-log-files-to-hdfs-using-apache-flume/], with the final destination for the entries being an HDFS file - with the HDFS file essentially mirroring

Technical

Trickle-Feeding Log Files to HDFS using Apache Flume

In some previous articles on the blog I’ve analysed Apache webserver log files sitting on a Hadoop cluster using Hive [https://www.rittmanmead.com/blog/2014/04/simple-data-manipulation-and-reporting-using-hive-impala-and-cdh5/], Pig [https://www.rittmanmead.com/blog/2014/05/simple-hadoop-dataflows-using-apache-pig-and-cdh4-6/] and, most recently, Apache Spark [https://www.rittmanmead.com/blog/2014/05/