January 29th, 2014 by Mark Rittman
Oracle released a new developer VM for download on OTN yesterday called “bigdatalite” – if you’re interested in big data, Hadoop and some of the SQL-on-Hadoop technologies I’ve been looking at recently on the blog, this is something you’ll want to download as soon as possible and play around with. I’ve had access to an earlier version of this VM since 2012 because of some development work I did with ODI and these technologies, but up until now there’s not been a publicly downloadable version I could point people to. Now there is, so I just wanted to walk through what’s in it, and how you can start to play around with some of the features.
Once you’ve downloaded the various archive files and imported the VM into VirtualBox, log in as oracle/welcome1 and you’ll see a (strangely militaristic-looking) desktop and some links to start an Oracle database, open a browser and so on:
Give the various services a few seconds to start up, and then click on the “Start Here” link on the desktop to open your browser.
The getting started page lists out the various products that are installed on the VM, which you can group as:
- Hadoop and big data products from Cloudera – Cloudera Manager, their equivalent to Enterprise Manager; Cloudera’s distribution of Hadoop (similar to how Red Hat and SuSE distribute their own versions of Linux); and Cloudera Impala and Search, their add-ons to Hadoop that make querying and searching faster
- Oracle’s Big Data Connectors, a set of technologies that link the Oracle database to Hadoop, allowing you to query Hadoop from Oracle, and load and unload data between the two platforms
- Oracle Data Integrator 12c, with a couple of Hadoop integration examples pre-created
- Oracle Database 12c, to use with the Big Data Connectors and ODI
- Oracle NoSQL database, a key/value database similar to Apache HBase
- A bunch of other related Oracle tools such as JDeveloper, SQL Developer, and Oracle’s R Distribution – with RStudio and additional R packages separately installable
So it’s a great place to start playing around with Hadoop in general, a way to get some experience with Impala and Hive if you’re an OBIEE developer, and also a great way to try out the integration pieces between the Oracle Database and Hadoop, including ODI’s capabilities in this area.
If you click on the Cloudera Manager link (http://localhost:7180/cmf/login) you’ll be taken to Cloudera Manager. This web UI allows you to see the state of the various services managed by Cloudera Manager, including:
- HDFS (the distributed filesystem that holds the data files, which are then typically analysed using Hive and Impala);
- Hive and Impala (two technologies for issuing SQL-type queries over HDFS files);
- MapReduce (the core data-processing technology within Hadoop that splits operations into mapping, shuffling and reducing (aggregating) data and automatically parallelises it over nodes in the Hadoop cluster)
- Sqoop (for loading data into and out of Hadoop from relational databases)
- Hue (a web UI for all of the above, that we’ll look at in a moment)
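To make the MapReduce description in the list above a bit more concrete, here’s a minimal single-process sketch of the map, shuffle and reduce steps in plain Python, using the classic word-count example. It’s only an illustration of the idea, not how Hadoop itself is implemented: there’s no cluster and no parallelism here, and the function names and sample data are my own assumptions.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate the values for each key (here, by summing them)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases one after another on a list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files held in HDFS
    sample = ["big data lite", "big data connectors"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run on many nodes at once and the framework handles the shuffle between them; the sketch just shows what each step contributes.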
Hue is the other main web interface you’ll want to look at, and this is more of a developer-focused web app that allows you to create and view HDFS files, create Hive tables and then query them using Hive and Impala.
I covered Hue and the process of uploading files to create Hive tables in the two blog posts below the other week, and once you’ve done that you can query the tables from tools such as OBIEE using the 11.1.1.7 release’s Hive connectivity:
- OBIEE 11.1.1.7, Cloudera Hadoop & Hive/Impala Part 1 : Install and Set-up an EC2 Hadoop Cluster
- OBIEE 11.1.1.7, Cloudera Hadoop & Hive/Impala Part 2 : Load Data into Hive Tables, Analyze using Hive & Impala
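As a rough idea of the kind of SQL involved when you create a Hive table over files in HDFS and then query it, here’s a small Python sketch that just builds the statements as strings, which you could then paste into a Hive or Impala query editor. The table name, column names and HDFS path are all made up for illustration, and the helper functions are hypothetical rather than part of any tool on the VM.

```python
def create_table_ddl(table, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement over delimited text files.

    columns is a list of (name, hive_type) tuples; location is the HDFS
    directory that holds the data files.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def top_n_query(table, group_col, n=10):
    """Build a simple aggregate query: row counts per value, largest first."""
    return (
        f"SELECT {group_col}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {group_col} ORDER BY cnt DESC LIMIT {n}"
    )


if __name__ == "__main__":
    # Hypothetical table and path, for illustration only
    print(create_table_ddl(
        "web_logs",
        [("host", "STRING"), ("status", "INT")],
        "/user/oracle/web_logs",
    ))
    print(top_n_query("web_logs", "status", 5))
```

Because the table is declared as external, the statement only describes files that are already sitting in the HDFS directory; it doesn’t move or copy them.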
If you’re more from the database side, there are some tutorials available on the big data connectors and so forth – there don’t appear to be any separate tutorials for ODI though, so you’ll need to “reverse-engineer” the two examples in ODI Studio to work through how they’ve been created. I’ll try and do this soon and post it on the blog, if anyone’s interested.
Anyway, the VM is downloadable now with supporting materials available on OTN here. I’ve added some links below to earlier posts on our blog that might be of interest to you if you’re looking to try OBIEE and ODI with this platform:
- Why ODI, DW and OBIEE Developers Should Be Interested in Hadoop
- Creating a Multi-Node Hadoop/Impala Cluster as a Datasource for OBIEE 11.1.1.7
- Connecting OBIEE 11.1.1.7 to Cloudera Impala
- Accelerating Hadoop/Hive OBIEE Queries Using Exalytics and the Summary Advisor
- OBIEE, ODI and Hadoop Part 4: Hive Data Transformation & Integration via ODI 11g
- OBIEE, ODI and Hadoop Part 3: A Closer Look at Hive, HDFS and Cloudera CDH3
- OBIEE, ODI and Hadoop Part 2: Connecting OBIEE 11.1.1.7 to Hadoop Data Sources
- OBIEE, ODI and Hadoop Part 1: So What Is Hadoop, MapReduce and Hive?