Rittman Mead Blog - Big Data

A collection of 112 posts

Using Oracle Big Data SQL to Add Dimensions and Attributes to Hadoop Reporting

In a previous post I looked at using Oracle’s new Big Data SQL product with ODI12c [https://www.rittmanmead.com/blog/2014/10/using-oracle-big-data-sql-odi12c-hive/], where I used Big Data SQL [http://www.oracle.com/us/products/database/big-data-sql/overview/index.html]

Connecting OBIEE11g on Windows to a Kerberos-Secured CDH5 Hadoop Cluster using Cloudera HiveServer2 ODBC Drivers

In a few previous posts and magazine articles [http://www.oracle.com/technetwork/issue-archive/2014/14-sep/o54ba-2279189.html] I’ve covered connecting OBIEE11g to a Hadoop cluster [https://www.rittmanmead.com/blog/2014/01/obiee-11-1-1-7-cloudera-hadoop-hiveimpala-part-2-load-data-into-

OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance

The other week I posted a three-part series (part 1 [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-1-why-mapreduce-is-only-for-batch-processing/], part 2 [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-2-

Rittman Mead BI Forum 2015 Call for Papers Now Open!

I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 [https://docs.google.com/a/rittmanmead.com/forms/d/18XWPmrzr3te55rloDJyB8vFP9j8AUNKKryvOToGHBlA/viewform] is now open, with abstract submissions accepted until January 18th 2015. As in previous years the BI Forum will run over

Going Beyond MapReduce for Hadoop ETL Pt.2 : Introducing Apache YARN and Apache Tez

In the first post [https://www.rittmanmead.com/blog/2014/12/going-beyond-mapreduce-for-hadoop-etl-pt-1-why-mapreduce-is-only-for-batch-processing/] in this three part series on going beyond MapReduce for Hadoop ETL, I looked at how a typical Apache Pig script gets compiled into

Going Beyond MapReduce for Hadoop ETL Pt.1 : Why MapReduce Is Only for Batch Processing

Over the previous few months I’ve been looking at the various ways you can load data into Hadoop, process it and then report on it using Oracle tools [https://www.rittmanmead.com/blog/2013/11/why-odi-dw-and-obiee-developers-should-be-interested-in-hadoop/]. We’ve looked

Analytics with Kibana and Elasticsearch through Hadoop - part 3 - Visualising the data in Kibana

In this post we will see how Kibana can be used to create visualisations over various sets of data that we have combined together. Kibana is a graphical front end for data held in Elasticsearch, which also provides the analytic capabilities. Previously we looked at where the data came from

Analytics with Kibana and Elasticsearch through Hadoop - part 2 - Getting data into Elasticsearch

In the first part of this series [https://www.rittmanmead.com/blog/2014/11/analytics-with-kibana-and-elasticsearch-through-hadoop-part-1-introduction/] I described how I made several sets of data relating to the Rittman Mead blog from various sources available through Hive. This included blog hits

Analytics with Kibana and Elasticsearch through Hadoop - part 1 - Introduction

I’ve recently started learning more about the tools and technologies that fall under the loose umbrella term of Big Data [http://cdn.meme.am/instances/500x/47510205.jpg], following a lot of the blogs that Mark Rittman has written, including getting Apache log data into Hadoop [https://www.

Using rlwrap with Apache Hive beeline for improved readline functionality

rlwrap is a nice little wrapper in which you can invoke command-line utilities and get them to behave with full readline [http://www.gnu.org/software/bash/manual/html_node/Readline-Interaction.html#Readline-Interaction] functionality just like you’d get at the bash prompt. For example, up/down arrow

Adding Oracle Big Data SQL to ODI12c to Enhance Hive Data Transformations

An updated version of the Oracle BigDataLite VM [http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html] came out a couple of weeks ago, and as well as updating the core Cloudera CDH software to the latest release it also included Oracle Big Data SQL [http://www.

News and Updates from Oracle Openworld 2014

It’s the Saturday after Oracle Openworld 2014 [http://www.oracle.com/openworld], and I’m now home from San Francisco and back in the UK. It’s been a great week as usual, with lots of product announcements and updates to the BI, DW and Big Data products we

Using Oracle GoldenGate for Trickle-Feeding RDBMS Transactions into Hive and HDFS

A few months ago I wrote a post on the blog around using Apache Flume to trickle-feed log data into HDFS and Hive [https://www.rittmanmead.com/blog/2014/05/trickle-feeding-webserver-log-files-to-hdfs-using-apache-flume/], using the Rittman Mead website as the source for

Analyzing Twitter Data using Datasift, MongoDB, Hive and ODI12c

Last week I posted an article on the blog around analysing Twitter data using Datasift, MongoDB and Pig [https://www.rittmanmead.com/blog/2014/09/analyzing-twitter-data-using-datasift-mongodb-and-pig/], where I used the Datasift [http://datasift.com] service to stream tweets about Rittman Mead into a

Analyzing Twitter Data using Datasift, MongoDB and Pig

If you followed our recent postings on the updated Oracle Information Management Reference Architecture [https://www.rittmanmead.com/blog/2014/06/introducing-the-updated-oracle-rittman-mead-information-management-reference-architecture-pt1-information-architecture-and-the-data-factory/], one of the key concepts we talk about is the “data reservoir”

Upcoming Big Data and Hadoop for Oracle BI, DW and DI Developers Presentations

If you’ve been following our postings on the blog over the past year, you’ll probably have seen quite a lot of activity around big data and Hadoop [http://www.rittmanmead.com/category/big-data/] and in particular, what these technologies bring to the world of Oracle Business Intelligence,

Rittman Mead and Oracle Big Data Appliance

Over the past couple of years Rittman Mead have been broadening our skills and competencies out from core OBIEE, ODI and Oracle data warehousing into the new “emerging” analytic platforms: R and database advanced analytics, Hadoop, cloud and clustered/distributed systems. As we talked about in the recent series of

Why Oracle Big Data SQL Potentially Solves a Big Issue with Hadoop Security

Oracle announced their Big Data SQL product a couple of weeks ago, which effectively extends Exadata’s query-offloading to Hadoop data sources. I covered the launch a few days afterwards [https://www.rittmanmead.com/blog/2014/07/taking-a-look-at-the-new-oracle-big-data-sql/], focusing on

Taking a Look at the New Oracle Big Data SQL

Oracle launched their Oracle Big Data SQL [http://www.oracle.com/us/products/database/big-data-sql/overview/index.html] product earlier this week, and it’ll be of interest to anyone who saw our series of posts a few weeks ago about the updated Oracle Information Management Reference Architecture

Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture Pt2. - Delivering the Data Factory

In my previous post on our updated Oracle Information Management Reference Architecture [https://www.rittmanmead.com/blog/2014/06/introducing-the-updated-oracle-rittman-mead-information-management-reference-architecture-pt1-information-architecture-and-the-data-factory/], jointly developed with Oracle’s Enterprise Architecture team, we went through a conceptual and

Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture Pt1. - Information Architecture and the "Data Factory"

One of the things we’re really interested in at Rittman Mead is the architecture of “information management” systems and how these change over time as thinking and product capabilities evolve. In fact we often collaborate with the Enterprise Architecture team within Oracle, giving input into the architecture designs

End-to-End ODI12c ETL on Oracle Big Data Appliance Pt.5 : Bulk Unload to Oracle

All week I’ve been looking at what’s involved in moving data around Hadoop on the Oracle Big Data Appliance, using ODI12c to orchestrate the end-to-end process. Using web log data from the Rittman Mead website, I first landed the log data on HDFS using Apache Flume,

End-to-End ODI12c ETL on Oracle Big Data Appliance Pt.4 : Transforming Data using Python & Hive Streaming

This week I’m taking an in-depth look at ETL on the Oracle Big Data Appliance, using Oracle Data Integrator 12c to call the various bits of Hadoop functionality and orchestrate the whole process. So far, I’ve landed web log data into the Hadoop cluster using Flume, created

End-to-End ODI12c ETL on Oracle Big Data Appliance Pt.3 : Enhance with Oracle Reference Data via Sqoop, and CKMs

In the first two posts in this series, I used the software on the Oracle Big Data Appliance 3.0 to ingest web log data from the Rittman Mead blog server, parse and load that data into a Hive table, and then join that table to another to add details