Each year at the Rittman Mead BI Forum we host an optional one-day masterclass before the event opens properly on Wednesday evening, with guest speakers over the years including Kurt Wolff, Kevin McGinley and, last year, Cloudera’s Lars George. This year I’m particularly excited that, together with Jordan Meyer, our Head of R&D, I’ll be presenting the masterclass on the topic of “Delivering the Oracle Big Data and Information Management Reference Architecture”.
At last year’s Brighton BI Forum we launched a new reference architecture, developed by Rittman Mead in collaboration with Oracle, that incorporated big data and schema-on-read databases into the Oracle data warehouse and BI reference architecture. Two subsequent blog posts, and a white paper published on the Oracle website a few weeks later, introduced concepts such as the “Discovery Lab”, “Data Reservoirs” and the “Data Factory” as a way of incorporating the latest thinking, and product capabilities, into the reference architecture for Oracle-based BI, data warehousing and big data systems.
One problem I always find with reference architectures, though, is that they tell you what you should create but not how to create it. Just how do you go from a set of example files and a vague requirement from the client to do something interesting with Hadoop and data science, and how do you turn the insights produced by that process into a production-ready, enterprise Big Data system? How do you implement the data factory, and how do you use new tools such as Oracle Big Data Discovery and Oracle Big Data SQL as part of this architecture? In this masterclass we’re looking to explain the “how” and “why” to go with this new reference architecture, based on our experiences working with clients over the past couple of years.
The masterclass will be divided into two sections; the first, led by Jordan Meyer, will focus on the data discovery and “data science” parts of the Information Management architecture, going through initial analysis and discovery of datasets using R and Oracle R Enterprise. Jordan will share techniques he uses from both his work at Rittman Mead and his work with Slacker Radio, a Silicon Valley startup, and will introduce the R and Oracle R Enterprise toolset for uncovering insights, correlations and patterns in sample datasets and productionizing them as database routines. Over his three hours he’ll cover topics including:
Session #1 - Data exploration and discovery with R (2 hours)
1.1 Introduction to R
1.2 Tidy Data
1.3 Data transformations
1.4 Data Visualization
Session #2 - Predictive Modeling in the enterprise (1 hr)
2.3 Deploying models to the data warehouse with ORE
After lunch, I’ll take the insights and analysis patterns identified in the Discovery Lab and turn them into production big data pipelines and datasets using Oracle Data Integrator 12c, Oracle Big Data Discovery and Oracle Big Data SQL. For a flavour of the topics I’ll be covering, take a look at this Slideshare presentation from a recent Oracle event; in the masterclass itself I’ll concentrate on techniques and approaches for ingesting and transforming streaming and semi-structured data, storing it in Hadoop-based data stores, and presenting it to users through BI tools like OBIEE and Oracle’s new Big Data Discovery.
Session #3 - Building the Data Reservoir and Data Factory (2 hr)
3.1 Designing and Building the Data Reservoir using Cloudera CDH5 / Hortonworks HDP, Oracle BDA and Oracle Database 12c
3.2 Building the Data Factory using ODI12c & new component Hadoop KM modules, real-time loading using Apache Kafka, Spark and Spark Streaming
Session #4 - Accessing and visualising the data (1 hr)
4.1 Discovering and Analyzing the Data Reservoir using Oracle Big Data Discovery
4.2 Reporting and Dashboards across the Data Reservoir using Oracle Big Data SQL + OBIEE
You can register for a place at either of the two masterclasses when booking your BI Forum 2015 place, but you’ll need to hurry, as we limit the number of attendees at each event in order to maximise interaction and networking within each group. Registration is open now and the two events take place in May. Hopefully we’ll see you there!