Previewing Three Sessions at the Brighton Rittman Mead BI Forum 2015

April 7th, 2015

As well as a one-day masterclass from Jordan Meyer and myself, a data visualisation challenge, keynotes and product update sessions from Oracle, and our guest speaker from the Oracle Data Warehouse Global Leaders Program, the Brighton Rittman Mead BI Forum 2015 of course has a fantastic set of speakers and sessions on a wide range of topics around Oracle BI, data warehousing and big data. In this blog post I’m going to highlight three sessions at the Brighton BI Forum, and later in the week I’ll do the same for three sessions from the Atlanta event. Let’s start with a speaker who’s new to the BI Forum but very well-known to the UK OBIEE community: Steve Devine.

Steve is one of the most experienced OBIEE practitioners in Europe, formerly with Edenbrook / Hitachi Consulting and Claremont, and now working with Altius in the UK. At the Brighton BI Forum 2015 Steve’s going to talk to us about what’s probably the hottest topic around OBIEE at the moment, in his session “The Art and Science of Creating Effective Data Visualisations”. Over to Steve:


“These days, news publications and the internet are packed with eye-catching data visualisations and infographics – the New York Times, the Guardian or Information Is Beautiful to name but a few. Yet the scientists and statisticians tell us that everything could be a bar chart, and that nothing should ever be a pie chart! How do we make sense of these seemingly disparate, contrasting views?
My presentation provides an introduction to how graphic design principles complement the more science-orientated aspects of data viz design. It will focus on a simple-to-apply design framework that brings all of these principles together, enabling you to create visualisations that have the right balance of aesthetics and function. By way of example, I’ll apply this framework to traditional BI scenarios such as operational and exploratory dashboards, as well as new areas that BI tools are just beginning to support, such as commentary and storytelling. I’ll also look at how well Oracle’s BI tools address today’s data visualisation needs, and how they compare to the competition.”

On the topic of data visualisation, I’m also very pleased to have Daniel Adams from Rittman Mead’s US office coming over to the Brighton BI Forum to talk about effective dashboard design. Daniel’s been working with Rittman Mead clients in the US and Europe for the past year helping them apply data visualisation and dashboard design best practices to their dashboards and reports, and he’ll be sharing some of his methods and approaches in his session “User Experience First: Guided information and attractive dashboard design”:


“Most front-end OBI developers can give users exactly what they ask for, but will that lead to insightful dashboards that improve data culture and elevate the user experience? One of the biggest mistakes I see as a designer is dashboards that are a cluttered collection of tables and graphs. Poorly designed dashboards can prevent users from adopting a BI implementation, diminishing its ROI.
In this session, attendees will learn to design dashboards that inform, instruct, and lead to smart discussion and decisions. This includes learning to visualize data to convey meaning, implement attractive visual design, and create a layout that leads users through a target-rich environment. We will walk through a series of “before” and “after” dashboards that demonstrate the difference between merely meeting a requirement, and using proven UX and UI design concepts to make OBIEE dashboards insightful and enjoyable to use.”

Finally, someone I’m very pleased to have over to the Brighton BI Forum for the first time is Gerd Aiglstorfer. I first met Gerd at an Oracle event in Germany several years ago, and since then I’ve followed several of his blog posts and the launch of his Oracle University Expert Sessions on OBIEE development, administration and RPD modelling. Gerd is one of Europe’s premier experts in OBIEE and Oracle BI, and for his inaugural BI Forum presentation he’ll be deep-diving into one of the most complex topics around repository modelling in his session “Driving OBIEE Join Semantics on Multi Star Queries as User”:


“Multi star queries are a very useful and powerful piece of OBIEE functionality, but when I examine reports developed by business users or report developers I often find misunderstandings about how they work and how the queries are built by OBIEE. As the execution strategy OBIEE 11.1.1.7 uses to generate SQL for multi star queries has also changed, I had the idea of introducing the topic at the BI Forum. It’s quite an interesting topic for going into the technical details of the OBIEE SQL generator engine.
I’ll introduce how users can drive join semantics on common fields in multi star queries. You will get a full picture of the functionality, for a better understanding of how report creation affects SQL generation. I noticed some inconsistencies during my tests of the new OBIEE 11.1.1.7 logic in January 2014. I will demonstrate the issues and would like to discuss whether you would say, as I do: “It’s a defect within the SQL generator engine”.”

Full agenda details on the Brighton Rittman Mead BI Forum 2015 can be found on the event homepage, along with details of the optional one-day masterclass on Delivering the Oracle Information Management and Big Data Reference Architecture, and our first-ever Data Visualisation Bake-Off, using the DonorsChoose.org dataset.

Realtime BI Show with Kevin and Stewart – BI Forum 2015 Special!

April 6th, 2015

Jordan Meyer and I were very pleased to be invited onto the Realtime BI Show podcast last week, run by Kevin McGinley and Stewart Bryson, to talk about the upcoming Rittman Mead BI Forum running in Brighton and Atlanta in May 2015. Stewart and Kevin are of course speaking at the Atlanta BI Forum event on May 13th-15th 2015 at the Renaissance Atlanta Midtown Hotel, Atlanta, and in the podcast we talk about the one-day masterclass that Jordan and I are running, some of the sessions at the event, and the rise of big data and data discovery within the Oracle BI+DW industry.

Full details on the two BI Forum 2015 events can be found on the event homepage, along with details of the optional one-day masterclass on Delivering the Oracle Information Management and Big Data Reference Architecture, the guest speakers and the inaugural Data Visualization Challenge. Registration is now open and can be done online using the two links below.

  • Rittman Mead BI Forum 2015, Brighton – May 6th – 8th 2015
  • Rittman Mead BI Forum 2015, Atlanta – May 13th – 15th 2015

We’ve also set up a special discount code for listeners to the Realtime BI Show, with 10% off both registration and the masterclass fee for both the Brighton and Atlanta events – use code RTBI10 on the Eventbrite registration forms to qualify.

Analysing ODI performance with Flame Graphs

April 2nd, 2015

Flame Graphs are a visualisation that I learnt about through the excellent Linux systems performance work of Brendan Gregg, and recently saw Luca Canali talk about at UKOUG Tech 14. They’re a brilliant way of summarising extremely dense information so that the main components accounting for the most time can be identified at a glance. I was recently doing some analysis for a client on their ODI batch runtime, and I thought it would be a good idea to try them out. Load Plans can have complex hierarchies, and working out which main sections account for what time can be tricky, as can following a load plan step through to a session, and on to a session step and its constituent parts.

A flame graph is made up of the “stack trace” on the y-axis and the amount of time spent in each frame on the x-axis. This is different from most other standard visualisations, where the x-axis represents the passage of time; instead, the flame graph summarises the data at multiple levels of the stack trace hierarchy. The “stack trace” in the case of ODI is load plan -> load plan step (load plan step […]) -> session -> session step -> task. It’s as easy to see the overall run time as it is a load plan step part way down, or a constituent task of a session step. And what’s more, flame graphs look nice! That may seem a flimsy reason for using them on its own, but it’s a bonus over trawling through dull tables of data alone.

Looking at the flame graph above (taken from a demo BI Apps implementation) it’s nice and easy to see that the Warehouse Load Phase accounts for c.75% of the time, within which the two areas accounting for most time are AP and AR balances. This is from literally a single glance at one graphic. Flame Graphs are built as SVGs which enables them to be interactive (here’s an example). Clicking on any of the stack trace boxes drills into that area, so for the tasks taking less time (and so displaying less text) this is useful to see the specifics. Here’s the GL balance load in detail, showing how long the row inserts take in proportion to the index build:


Creating the flame graph is simple. You just need a stack trace that is semicolon-separated, followed by a space and a counter value at the end. A bit of recursive SQL magic with the SNP_ tables (helpfully documented by Oracle here) gives us this kind of output file, with one line for every task executed and its duration:
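Something along these lines, where each line is the path through the load plan hierarchy down to a task, followed by its duration in seconds; the step names and timings here are purely illustrative:

    Load Plan;Warehouse Load Phase;AP Balances;Insert new rows 1250
    Load Plan;Warehouse Load Phase;AR Balances;Insert new rows 1100
    Load Plan;Warehouse Load Phase;GL Balances;Insert new rows 480
    Load Plan;Warehouse Load Phase;GL Balances;Build index 160
    Load Plan;Source Extract Phase;Load calendar dates 90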

which you then run through the Flame Graph tool:
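For example, using Brendan Gregg’s flamegraph.pl from his FlameGraph repository on GitHub; the filenames here are placeholders:

    $ ./flamegraph.pl --title "ODI Load Plan" odi_tasks.txt > odi_flamegraph.svg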

Simply load the resulting SVG into a web browser such as Chrome, and you’re done. Here’s an example that you can download and try out.

Announcing Oracle E-Business Suite for Hadoop and MongoDB

April 1st, 2015

Rittman Mead are very pleased today to announce our special edition of Oracle E-Business Suite R12 running on Apache Hadoop and MongoDB, for customers looking for the ultimate in scalability, flexible data storage and lower cost-of-ownership. Powered by Hadoop technologies such as Apache Hive, HDFS and MapReduce, optional reference data storage in MongoDB and reporting provided by Apache Pig, we think this represents the ultimate platform for large deployments of Oracle’s premier ERP suite.


In this special edition of Oracle E-Business Suite R12, we’ve replaced the Oracle Database storage engine with Hadoop, MapReduce and Apache Hive, with MapReduce providing the data processing engine and Apache Hive providing a SQL layer integrated with Oracle Forms. We’ve replaced Oracle Workflow with Apache Oozie, and added MongoDB as the optional web-scale NoSQL database for document and reference data storage, freeing you from the size limitations of relational databases, the hassles of referential integrity and the restrictions of defined schemas. Developer access is provided through Apache Hue, or you can write your own Java MapReduce or JavaScript MongoDB API programs to extend E-Business Suite’s functionality. Best of all, there’s no need for expensive DBAs as developers handle all data modeling themselves (with MongoDB’s collections automatically adapting to new data schemas), and HDFS’s three-node replication removes the need for complicated backup & recovery procedures.


We’ve also brought Oracle Reports into the 21st century by replacing it with Apache Pig, a high-level abstraction language for Hadoop that automatically compiles your “Pig Latin” programs into MapReduce code, and allows you to bring in data from Facebook and Twitter to combine with your main EBS dataset stored in Hive and MongoDB.


On the longer-term roadmap, features and enhancements we’re planning include:

  • Loosening the current INSERT-only restriction to allow UPDATEs, DELETEs and full ACID semantics once HIVE-5317 is implemented
  • Adding MongoDB’s new write-reliability and durability so that data is always saved when EBS writes it to the underlying MongoDB collection
  • Reducing the current 5-30 minute response times to less than a minute by moving to Tez or Apache Spark
  • Providing integration with Oracle Discoverer 9iAS to delight end-users, and provide ad-hoc reporting truly at the speed-of-thought

For more details on our special Oracle E-Business Suite for Hadoop edition, contact us at enquiries@rittmanmead.com – but please note we’re only accepting new customers for today, April 1st 2015. 

Oracle GoldenGate, MySQL and Flume

March 30th, 2015

Back in September, Mark blogged about Oracle GoldenGate (OGG) and HDFS. In this short follow-up post I’m going to look at configuring the OGG Big Data Adapter for Flume, to trickle-feed blog posts and comments from our site to HDFS. If you haven’t done so already, I strongly recommend you read through Mark’s previous post, as it explains in detail how the OGG BD Adapter works. Just like Hive and HDFS, Flume isn’t a fully supported target, so we will use the Oracle GoldenGate for Java Adapter user exits to achieve what we want.

What we need to do now is:

  1. Configure our MySQL database to be fit for duty for GoldenGate
  2. Install and configure Oracle GoldenGate for MySQL on our DB server
  3. Create a new OGG Extract and Trail files for the database tables we want to feed to Flume
  4. Configure a Flume Agent on our Cloudera cluster to ‘sink’ to HDFS
  5. Create and configure the OGG Java adapter for Flume
  6. Create External Tables in Hive to expose the HDFS files to SQL access

OGG and Flume

Setting up the MySQL Database Source Capture

The MySQL database I will use for this example contains blog posts, comments etc. from our website. We now want to use Oracle GoldenGate to capture new blog posts and our readers’ comments, and feed this information into the Hadoop cluster we have running in the Rittman Mead Labs, along with other feeds such as Twitter and activity logs.

The database has to be configured to use binary logging, and we also need to ensure that the socket file can be found in /tmp/mysql.socket. You can find the details for this in the documentation. We also need to make sure that the tables we want to extract from are using the InnoDB engine and not the default MyISAM one. The engine can easily be changed with an ALTER TABLE statement:
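Assuming WordPress-style table names, a guess based on the wp.def definitions file used later:

    ALTER TABLE wp_posts ENGINE=InnoDB;
    ALTER TABLE wp_comments ENGINE=InnoDB;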

Assuming we have already installed OGG for MySQL in /opt/oracle/OGG/, we can now go ahead and configure the Manager process and the Extract for our tables. The tables we are interested in are:
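Going by the wp.def definitions file used later, presumably the WordPress posts and comments tables, along the lines of:

    blog.wp_posts      -- blog posts
    blog.wp_comments   -- reader comments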

First, configure the manager:
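A minimal dirprm/mgr.prm is enough; the port number is an arbitrary (if conventional) choice:

    PORT 7809
    PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3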

Now configure the Extract to capture changes made to the tables we are interested in:
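A sketch of the Extract parameter file; the database name, credentials and trail location are all assumptions:

    EXTRACT mysqlext
    SOURCEDB blog, USERID ogguser, PASSWORD oggpasswd
    -- point OGG at the MySQL binary log index if it is not found automatically:
    -- TRANLOGOPTIONS ALTLOGDEST /var/lib/mysql/bin-log.index
    EXTTRAIL ./dirdat/tr
    TABLE blog.wp_posts;
    TABLE blog.wp_comments;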

We should now be able to create the extract and start the process, as with a normal extract.
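In GGSCI that looks something like this, using the names from the sketch above:

    ADD EXTRACT mysqlext, TRANLOG, BEGIN NOW
    ADD EXTTRAIL ./dirdat/tr, EXTRACT mysqlext
    START EXTRACT mysqlext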

We will also have to generate metadata to describe the table structures in the MySQL database. This file will be used by the Flume adapter to map columns and data types to the Avro format.
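This is the job of the defgen utility. A sketch of a dirprm/defgen.prm that would produce the wp.def file referenced below, again with assumed credentials:

    DEFSFILE ./dirdef/wp.def
    SOURCEDB blog, USERID ogguser, PASSWORD oggpasswd
    TABLE blog.wp_posts;
    TABLE blog.wp_comments;

It is then run from the OGG home with ./defgen paramfile dirprm/defgen.prm.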

Setting up the OGG Java Adapter for Flume

The OGG Java Adapter for Flume will use the EXTTRAIL created earlier as a source, pack the data up and feed it to the cluster’s Flume Agent using Avro and RPC. The Flume Adapter thus needs to know:

  • Where to find the OGG EXTTRAIL to read from
  • How to treat the incoming data and operations (e.g. insert, update, delete)
  • Where to send the Avro messages

First we create a parameter file for the Flume Adapter:
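A sketch modelled on the AdapterExamples directory that ships with the adapter; the extract name is an assumption:

    EXTRACT flumeext
    SETENV (GGS_USEREXIT_CONF = "dirprm/flume.props")
    GETUPDATEBEFORES
    CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU INCLUDEUPDATEBEFORES
    SOURCEDEFS ./dirdef/wp.def
    TABLE blog.*;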

There are two things to note here:

  • The OGG Java Adapter User Exit is configured in a file called flume.props
  • The source tables’ structures are defined in wp.def

The flume.props file is a ‘standard’ User Exit config file:
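Along these lines; the property names follow the pattern used in the adapter’s sample configuration, so treat the exact keys as indicative rather than definitive:

    gg.handlerlist=ggflume
    gg.handler.ggflume.type=com.goldengate.delivery.handler.flume.FlumeHandler
    gg.handler.ggflume.host=bd5node1.rittmandev.com
    gg.handler.ggflume.port=4545
    gg.handler.ggflume.rpcType=avro
    gg.handler.ggflume.includeOpType=true
    gg.handler.ggflume.includeOpTimestamp=true

    goldengate.userexit.writers=javawriter
    javawriter.bootoptions=-Xms32m -Xmx512m -Djava.class.path=ggjava/ggjava.jar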

Some points of interest here are:

  • The Flume agent we will send our data to is running on port 4545 on host bd5node1.rittmandev.com
  • We want each record to be prefixed with I(nsert), U(pdate) or D(elete)
  • We want each record to be postfixed with a timestamp of the transaction date
  • The Java class com.goldengate.delivery.handler.flume.FlumeHandler will do the actual work. (The curious reader can view the code in /opt/oracle/OGG/AdapterExamples/big-data/flume/src/main/java/com/goldengate/delivery/handler/flume/FlumeHandler.java)

Before starting up the OGG Flume extract, let’s first make sure that the Flume agent on bd5node1 is configured to receive our Avro messages (Source) and knows what to do with the data (Sink):
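A Flume agent definition along these lines satisfies the points below; the agent and component names are arbitrary:

    flumeagent.sources = ogg-source
    flumeagent.channels = mem-channel
    flumeagent.sinks = hdfs-sink

    # Avro RPC source listening for the OGG Flume handler
    flumeagent.sources.ogg-source.type = avro
    flumeagent.sources.ogg-source.bind = 0.0.0.0
    flumeagent.sources.ogg-source.port = 4545
    flumeagent.sources.ogg-source.channels = mem-channel

    flumeagent.channels.mem-channel.type = memory
    flumeagent.channels.mem-channel.capacity = 10000

    # HDFS sink, partitioned by the schema/table event headers
    flumeagent.sinks.hdfs-sink.type = hdfs
    flumeagent.sinks.hdfs-sink.channel = mem-channel
    flumeagent.sinks.hdfs-sink.hdfs.path = /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME}
    flumeagent.sinks.hdfs-sink.hdfs.fileType = DataStream
    flumeagent.sinks.hdfs-sink.hdfs.rollSize = 1048576
    flumeagent.sinks.hdfs-sink.hdfs.rollCount = 0
    flumeagent.sinks.hdfs-sink.hdfs.rollInterval = 0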

Here we note that

  • The agent’s source (inbound data stream) is to run on port 4545 and to use Avro
  • The agent’s sink will write to HDFS and store the files in /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME}
  • The HDFS files will be rolled over every 1MB (1,048,576 bytes)

We are now ready to head back to the webserver that runs the MySQL database and start the Flume extract, which will feed all committed MySQL transactions against our selected tables to the Flume Agent on the cluster; the agent in turn writes the data to HDFS:
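Back in GGSCI on the database server, using the extract name assumed earlier:

    START EXTRACT flumeext
    INFO EXTRACT flumeext, DETAIL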

If I now submit this blog post, I should see the results showing up in our Hadoop cluster in the Rittman Mead Labs.

We can quickly create an external table in Hive to view the results with SQL:
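A minimal sketch for the comments feed; the column list is cut down for readability and the field delimiter depends on how the Flume handler was configured:

    CREATE EXTERNAL TABLE blog_comments (
      op_type         STRING,
      op_ts           STRING,
      comment_id      BIGINT,
      comment_author  STRING,
      comment_content STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/flume/gg/blog/wp_comments';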

Please leave a comment and you’ll be contributing to an OGG Flume!
