If you’ve been following our postings on the blog over the past year, you’ll probably have seen quite a lot of activity around big data and Hadoop and, in particular, what these technologies bring to the world of Oracle Business Intelligence, Oracle Data Warehousing and Oracle Data Integration. For anyone who’s not had a chance to read the posts and articles, the four links below are a great introduction to what we’ve been up to:
- A list of some of our “getting started with Hadoop” articles
- Rittman Mead Announce New Partnerships with Cloudera and Amazon Web Services
- Rittman Mead and Oracle Big Data Appliance
- Presentation at the recent Oracle Virtual Technology Summit on ODI12c for Hadoop / Big Data Integration
In addition, we recently took part in an OTN ArchBeat podcast with Stewart Bryson and Andrew Bond on the updated Oracle Information Management Reference Architecture we co-developed with Oracle’s Enterprise Architecture team. In it you can hear me talk with Stewart and Andrew about how the updated architecture came about, the thinking behind it, and how concepts like the data reservoir and data factory can be delivered in an agile way.
- Podcast Show Notes: Redefining Information Management Architecture (including links to the three parts in the podcast)
- Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture : Part 1: Information Architecture and the Data Factory
- Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture : Part 2: Delivering the Data Factory
I’m also pleased to be delivering a number of presentations and seminars over the next few months, on Oracle and Cloudera’s Hadoop technology and how it applies to Oracle BI, DW and DI developers - if you’re part of a local Oracle user group and you’d like me to deliver one of them for your group, drop me an email at email@example.com.
Slovenian Oracle User Group / Croatian Oracle User Group Conferences, October 2014
These two events run over consecutive days in Slovenia and Croatia. I’m delivering the keynote at each, on Analytics and Big Data, along with a one-day seminar that runs on the Tuesday in Slovenia, and over the Wednesday and Thursday in Croatia. The seminar covers applying Hadoop and big data technologies to Oracle BI, DW and data integration, and is made up of four sessions:
Part 1 : Introduction to Hadoop and Big Data Technologies for Oracle BI & DW Developers "In this session we'll introduce some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We'll explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle's products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI)."
Part 2 : Hadoop and NoSQL Data Ingestion using Oracle Data Integrator 12c and Hadoop Technologies "There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we'll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We'll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop's schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a "data factory" to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data."
Part 3 : Big Data Analysis using Hive, Pig, Spark and Oracle R Enterprise / Oracle R Advanced Analytics for Hadoop "Data within a Hadoop cluster is typically analysed and processed using technologies such as Pig, Hive and Spark before being made available for wider use using products like Oracle Big Data SQL and Oracle Business Intelligence. In this session, we'll introduce Pig and Hive as key analysis tools for working with Hadoop data using MapReduce, and then move on to Spark as the next-generation analysis platform typically being used on Hadoop clusters today. We'll also look at the role of Oracle's R technologies in this scenario, using Oracle R Enterprise and Oracle R Advanced Analytics for Hadoop to analyse and understand larger datasets than we could normally accommodate with desktop analysis environments."
Part 4 : Visualizing Hadoop Datasets using Oracle Business Intelligence, Oracle BI Publisher and Oracle Endeca Information Discovery "Once insights and analysis have been produced within your Hadoop cluster by analysts and technical staff, it's usually the case that you want to share the output with a wider audience in the organisation. Oracle Business Intelligence has connectivity to Hadoop through Apache Hive compatibility, and other Oracle tools such as Oracle BI Publisher and Oracle Endeca Information Discovery can be used to visualise and publish Hadoop data. In this final session we'll look at what's involved in connecting these tools to your Hadoop environment, and also consider where data is optimally located when large amounts of Hadoop data need to be analysed alongside more traditional data warehouse datasets."
Oracle Openworld 2014 (ODTUG Sunday Symposium), September 2014
Along with another session later in the week on the upcoming Oracle BI Cloud Services, I’m doing a session on the User Group Sunday for ODTUG on ODI12c and the Big Data Connectors for ETL on Hadoop:
Deep Dive into Big Data ETL with Oracle Data Integrator 12c and Oracle Big Data Connectors [UGF9481] "Much of the time required to work with big data sources is spent in the data acquisition, preparation, and transformation stages of a project before your data reaches a state suitable for analysis by your users. Oracle Data Integrator, together with Oracle Big Data Connectors, provides a means to efficiently load and unload data to and from Oracle Database into a Hadoop cluster and perform transformations on the data, either in raw form or in technologies such as Apache Hive or R. This presentation looks at how Oracle Data Integrator can form the centerpiece of your big data ETL strategy, within either a custom-built big data environment or one based on Oracle Big Data Appliance."
UK Oracle User Group Tech’14 Conference, December 2014
I’m delivering an extended version of my OOW presentation at UKOUG Tech’14’s “Super Sunday” event, this time over 90 minutes rather than the 45 at OOW, giving me a bit more time for demos and discussion:
Deep-Dive into Big Data ETL using ODI12c and Oracle Big Data Connectors "Much of the time required to work with Big Data sources is spent in the data acquisition, preparation and transformation stages of a project, before your data is in a state suitable for analysis by your users. Oracle Data Integrator, together with Oracle Big Data Connectors, provides a means to efficiently load and unload data to and from Oracle Database into a Hadoop cluster, and perform transformations on the data either in raw form or in technologies such as Apache Hive or R. In this presentation, we will look at how ODI can form the centrepiece of your Big Data ETL strategy, either within a custom-built Big Data environment or one based on Oracle Big Data Appliance."
Oracle DW Global Leaders’ Meeting, Dubai, December 2014
The Oracle DW Global Leaders forum is an invite-only group organised by Oracle and attended by select customers and associate partners, one of which is Rittman Mead. I’ll be delivering the technical seminar at the end of the second day, which will run over two sessions and will be based on the main points from the one-day seminars I’m running in Croatia and Slovenia.
From Hadoop to dashboards, via ODI and the BDA - the complete trail : Part 1 and Part 2 "Join Rittman Mead for this afternoon workshop, taking you through data acquisition and transformation in Hadoop using ODI, Cloudera CDH and Oracle Big Data Appliance, through to reporting on that data using OBIEE, Endeca and Oracle Big Data SQL. Hear our project experiences, and tips and techniques based on real-world implementations."
Keep an eye out for more Hadoop and big data content over the next few weeks, including a look at MongoDB and NoSQL-type databases, and how they can be used alongside Oracle BI, DW and data integration tools.