More Details on the Lars George Cloudera Hadoop Masterclass – RM BI Forum 2014, Brighton & Atlanta

April 29th, 2014 by


It’s just over a week until the first of the two Rittman Mead BI Forum 2014 events takes place in Brighton and Atlanta, and one of the highlights of the events for me will be Cloudera’s Lars George’s Hadoop Masterclass. Hadoop and Big Data are two technologies that are becoming increasingly important to the worlds of Oracle BI and data warehousing, so this is an excellent opportunity to learn some basics, get some tips from an expert in the field, and then use the rest of the week to relate it all back to core OBIEE, ODI and Oracle Database.

Lars’ masterclass is running on the Wednesday before each event, on May 7th at the Brighton Hotel Seattle and then the week after, on Wednesday 14th May at the Renaissance Atlanta Midtown Hotel. Attendance for the masterclass is just £275 + VAT for the Brighton event, or $500 for the Atlanta event, but you can only book it as part of the overall BI Forum – so book up now while there are still places left! In the meantime, here are the details of what’s in the masterclass:

Session 1 – Introduction to Hadoop 

The first session of the day sets the stage for the following ones. We will look into the history of Hadoop, where it comes from, and how it made its way into the open-source world. This overview also covers the basic building blocks in Hadoop: the file system HDFS and the batch processing system MapReduce.

Then we will look into the flourishing ecosystem that is now the larger Hadoop project, with its various components and how they help form a complete data processing stack. We also briefly look into how Hadoop-based distributions today help tie the various components together in a reliable manner based on predictable release cycles.

Finally we will pick up the topic of cluster design and planning, talking about the major decision points when provisioning a Hadoop cluster. This includes the hardware considerations depending on specific use-cases as well as how to deploy the framework on the cluster once it is operational.

Session 2 – Ingress and Egress

The second session dives into the use of the platform as part of an Enterprise Data Hub, i.e. the central storage and processing location for all of the data in a company (large, medium, or small). We will discuss how data is acquired into the data hub and provisioned for further access and processing. There are various tools that allow the importing of data from single event based systems to relational database management systems. 

As data is stored, the user has to decide how to store it for further processing, since that choice can affect performance considerably. In state-of-the-art data processing pipelines there are usually hybrid approaches that combine lightweight LT (no E for “extract” needed), i.e. transformations, with optimised data formats as the final location for fast subsequent processing. Continuous and reliable data collection is vital for productionising the initial proof-of-concept pipelines.

Towards the end we will also look at the lower level APIs available for data consumption, rounding off the set of available tools for a Hadoop practitioner.

Session 3 – NoSQL and Hadoop

For certain use-cases there is an inherent need for something more “database” like compared to the features offered by the original Hadoop components, i.e. file system and batch processing. Especially for slowly changing dimensions and entities in general there is a need for updating previously stored data as time progresses. This is where HBase, the Hadoop Database, comes in and allows for random reads and writes to existing rows of data, or entities in a table.

We will dive into the architecture of HBase to derive the need for proper schema design, one of the key tasks when implementing an HBase-backed solution. Similar to the file formats from session 2, HBase allows you to design table layouts freely, which can lead to suboptimal performance. This session will introduce the major access patterns observed in practice and explain how they can play to HBase’s strengths.

Finally a set of real-world examples will show how fellow HBase users (e.g. Facebook) have gone through various modifications of their schema design before arriving at their current setup. Available open-source projects show further schema designs that will help in coming to terms with this involved topic.

Session 4 – Analysing Big Data

The final session of the day tackles the processing of data, since so far we have learned mostly about the storage and preparation of data for subsequent handling. We will look into the existing frameworks atop Hadoop and how they offer distinct (but sometimes also overlapping) functionalities. There are frameworks that run as separate instances, but also higher-level abstractions on top of those that help developers and data wranglers of all kinds to find their weapon of choice.

Using everything learned so far, the user will then see how the various tools can be combined to build the promised reliable data processing pipelines, landing data in the Enterprise Data Hub and using automation to start the subsequent processing without any human intervention. The closing part of this session will look into the external interfaces, such as JDBC/ODBC, enabling the visualisation of the computed and analysed data in appealing UI-based tools.

Detailed Agenda:

  • Session 1 – Introduction to Hadoop
    • Introduction to Hadoop
      • Explain pedigree, history
      • Explain and show HDFS, MapReduce, Cloudera Manager
    • The Hadoop Ecosystem
      • Show need for other projects within Hadoop
      • Ingress, egress, random access, security
    • Cluster Design and Planning
      • Introduce concepts on how to scope out a cluster
      • Typical hardware configuration
      • Deployment methods 
  • Session 2 – Ingress and Egress
    • Explain Flume, Sqoop to load data record based or in bulk
    • Data formats and serialisation
      • SequenceFile, Avro, Parquet
    • Continuous data collection methods
    • Interfaces for data retrieval (lower level)
  • Session 3 – NoSQL and Hadoop
    • HBase Introduction
    • Schema design
    • Access patterns
    • Use-case examples
  • Session 4 – Analysing Big Data
    • Processing frameworks
      • Explain and show MapReduce, YARN, Spark, Solr
    • High level abstractions
      • Hive, Pig, Crunch, Impala, Search
    • Data pipelines in Hadoop
      • Explain Oozie, Crunch
    • Access to data for existing systems
      • ODBC/JDBC

Full details of both BI Forum events can be found at the Rittman Mead BI Forum 2014 web page, and full agenda details for Brighton and Atlanta are on this blog post from last week.

Preview of the Rittman Mead BI Forum in Atlanta

April 25th, 2014 by

Mark has done a great job of previewing the upcoming content for both BI Forums, the one running locally for us in Atlanta, as well as the one in Brighton, UK. We have an exceptional Master Class this year with Lars George from Cloudera, including an introduction to the Cloudera Big Data stack with full details on building, loading and analyzing Hadoop clusters. The exact details of what’s covered, as well as the timetable for all speaker presentations, are listed here. Additionally, Mark posted on the two special presentations occurring at the two events: Maria Colgan on the In-Memory database option in Atlanta, and myself and Andrew Bond covering the latest iteration of Oracle’s Information Management Reference Architecture in Brighton. And finally, Mark also covered three presentations for the Atlanta event covering Advanced Visualizations and Mobility. Instead of rehashing all of that, I wanted to do a blog post diving a bit more into the Atlanta event, and some of the content not previously mentioned, especially the sessions by Oracle. We’ve always had incredible representation from Oracle at the BI Forum, and we are very appreciative that the different teams consider our event to be so important in the community.

I wanted to start off by discussing the venue a bit: the Renaissance Hotel in Midtown Atlanta. It’s a modern, upscale Atlanta hotel in Midtown that also has the amazing Rooftop 866 bar with incredible views of the city (those of you that have “socialized” with me over the years know I’ll be spending some time up there). I’m confident this will be our best venue to date.


Before diving into the sessions that Oracle will be presenting in Atlanta, it seems prudent to give those folks a “warm and fuzzy” feeling, show our appreciation, and make them feel safe and sound. So here’s an image that many of our readers will already recognize; for those who don’t, I’m sure you’ll know it by heart when the two events conclude:


Philippe Lions will be back again this year previewing the newest version of Sample App. For customers and partners who are like us at Rittman Mead, Sample App is a pivotal part of your OBIEE methodology. It allows us to demonstrate anything from simple OBIEE analyses and dashboards, to some of the crazy mad-scientist stuff that Philippe’s team comes up with. If Oracle and Philippe didn’t design and build Sample App and keep it current, then we would have to build it ourselves. From my understanding, this will be the first time Philippe has previewed this content external to Oracle, so we are pleased and honored that he chose the BI Forum as the venue for this. It’s also worth noting that Philippe is a BI Forum veteran… he has never missed the Atlanta event since its inception four years ago.

We also have Jack Berkowitz, VP of Product Management for Business Analytics at Oracle, speaking on “Analytics Applications and the Cloud”. He’ll be discussing Oracle BI Applications (OBIA) in detail and the roadmap Oracle has for deploying those applications in the Cloud. I imagine that Jack will be giving the Wednesday night Keynote (as he did last year with Philippe), which is always a crowd-pleaser. Jack also spoke on the new Mobile Application Designer last year, so I imagine he will also be able to update us on that product even though his focus at Oracle has shifted. Also from Oracle we have Matt Bedin (another BI Forum veteran) talking about Oracle BI and Cloud, but with a focus on Oracle’s roadmap with regular Oracle Analytics in the Cloud, which equates to having a Cloud-optimized OBIEE running in Oracle’s Public Cloud. As this product is not yet generally available, attendees will get the scoop on where this product is going… and we might even get some hints on when to expect it.

We are excited to have Chris Lynskey, Senior Director, Product Management and Strategy at Oracle, making his first appearance at the BI Forum. He’ll be speaking on “Endeca Information Technology for Self-Service and Big Data”, so we’ll see Endeca’s positioning for structured and non-structured reporting on an ad hoc basis. We’ll have several presentations that delve into Endeca, but it will be good to hear from Chris on this topic, as he was with Endeca prior to the acquisition by Oracle, and has been deeply involved with the 3.1 release. Rounding out Oracle’s participation is BI Forum newcomer Susan Cheung, Vice President of Product Management for Oracle TimesTen. Susan will be speaking on “TimesTen In-Memory Database for Analytics – Best Practices and Use Cases”. It will be interesting to have both Susan and Maria Colgan at the Forum, as attendees will have a chance to see Oracle’s complete In-Memory strategy and roadmap in one setting.

The final session I’d like to discuss is an entry from yours truly on “ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate”. I know… it’s an incredibly long title… but I had to get in all the buzz words. I also rely heavily on the Information Management Reference Architecture that Andrew Bond and I are presenting at the UK BI Forum, so my Atlanta session will be based around this newest release. I love this content, and I think it shows in my excitement level every time I present it. I describe an Agile methodology that utilizes Oracle’s BI stack to the fullest: integrating OBIEE, ODI and perhaps the most beneficial element, Oracle GoldenGate. For organizations that are investigating ways to deliver content rapidly while also making the end user central to the development process, this session is for you.

Manifesto for Agile Software Development

There are still slots available at both venues, so feel free to contact me directly if you have questions about either event.

Simple Data Manipulation and Reporting using Hive, Impala and CDH5

April 24th, 2014 by

Although I’m pretty clued-up on OBIEE, ODI, Oracle Database and so on, I’m relatively new to the worlds of Hadoop and Big Data, so most evenings and weekends I play around with Hadoop clusters on my home VMWare ESXi rig and try and get some experience that might then come in useful on customer projects. A few months ago I went through an example of loading-up flight delays data into Cloudera CDH4 and then analysing it using Hive and Impala, but realistically it’s unlikely the data you’ll analyse in Hadoop will come in such convenient, tabular form. Something that’s more realistic is analysing log files from web servers or other high-volume, semi-structured sources, so I asked Robin to download the most recent set of Apache log files from our website, and I thought I’d have a go at analysing them using Pig and Hive, and maybe then visualise the output using OBIEE (if possible, later on).

As I said, I’m not an expert in Hadoop and the Cloudera platform, so I thought it’d be interesting to describe the journey I went through, and also give some observations from myself on when to use Hive and when to use Pig; when products like Cloudera Impala could be useful, and also the general state-of-play with the Cloudera Hadoop platform. So the files I started off with were Apache weblog files, with 10 in total and sizes ranging from 350MB to around 2MB.


Looking inside one of the log files, they’re in the standard Apache log file format (or “combined log format”), where the visitor’s IP address is recorded, the date of access, some other information and the page (or resource) they requested:


What I’m looking to do is count the number of visitors on a day, which was the most popular page, what time of day are we most busy, and so on. I’ve got a Cloudera Hadoop CDH5.0 6-node cluster running on a VMWare ESXi server at home, so the first thing to do is log into Hue, the web-based developer admin tool that comes with CDH5, and upload the files to a directory on HDFS (Hadoop Distributed File System), the Unix-like clustered file system that underpins most of Hadoop.


You can, of course, SFTP the files to one of the Hadoop nodes and use the “hadoop fs” command-line tool to copy the files into HDFS, but for relatively small files like these it’s easier to use the web interface to upload them from your workstation. Once I’ve done that, I can then view the log files in the HDFS directory, just as if they were sitting on a regular Unix filesystem.


At this point though, the files are still “unstructured” – just a single log entry per line – and I’ll therefore need to do something before I can count things like number of hits per day, what pages were requested and so on. At this beginner’s level, there are two main options you can use – Hive, a SQL interface over HDFS that lets you select from, and do set-based transformations with, files of data; or Pig, a more procedural language that lets you manipulate file contents as a series of step-by-step tasks. For someone like myself with a relational data warehousing background, Hive is probably easier to work with but it comes with some quite significant limitations compared to a database like Oracle – we’ll see more on this later.

Whilst Hive tables are, at the simplest level, mapped onto comma or otherwise-delimited files, another neat feature in Hive is that you can use what’s called a “SerDe”, or “Serializer-Deserializer”, to map more complex file structures into regular table columns. In the Hive DDL script below, I use this SerDe feature to have a regular expression parse the log file into columns, with the data source being an entire directory of files, not just a single one:

Things to note in the above DDL are:

  • EXTERNAL table means that the datafile used to populate the Hive table sits somewhere outside Hive’s usual /user/hive/warehouse directory, in this case in the /user/root/logs HDFS directory.
  • ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ tells Hive to use the Regular Expressions Serializer-Deserializer to interpret the source file contents, and 
  • WITH SERDEPROPERTIES … gives the SerDe the regular expression to use, in this case to decode the Apache log format.
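
To get a feel for what a regular-expression SerDe does with each line, here’s a minimal Python sketch that applies a pattern for the Apache combined log format and returns one value per column. The pattern, the column names and the sample line are my own assumptions for illustration; they are not the regular expression used in the Hive table, and the sample line is made up.

```python
import re

# Assumed pattern for the Apache "combined" log format.
# The group (column) names are illustrative, not taken from the Hive DDL.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}|-) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of column name -> value, or None if the line doesn't match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


# A made-up log line, for illustration only.
sample = ('192.0.2.10 - - [24/Apr/2014:10:15:32 +0000] '
          '"GET /blog/index.html HTTP/1.1" 200 5120 '
          '"http://example.com/" "Mozilla/5.0"')
row = parse_line(sample)
print(row["host"], row["time"], row["status"])
```

Every value comes back as a string, which is broadly how a regex-based approach behaves: any typing or date handling has to happen in a later step.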

Probably the easiest way to run the Hive DDL command to create the table is to use the Hive query editor in Hue, but there are a couple of things you’ll need to do before this particular command will work:

1. You’ll need to get hold of the JAR file in the Hadoop install that provides this SerDe (hive-contrib-0.12.0-cdh5.0.0.jar) and then copy it to somewhere on your HDFS file system, for example /user/root. In my CDH5 installation, this file was at /opt/cloudera/parcels/CDH/lib/hive/lib/, but it’ll probably be at /usr/lib/hive/lib if you installed CDH5 using the traditional packages (rather than parcels) route. Also if you’re using a version of CDH prior to 5, the file will be named accordingly. This JAR file then needs to be accessible to Hive, and whilst there are various more-permanent ways you can do this, the easiest is to point to the JAR file in an entry in the query editor File Resources section as shown below.

2. Whilst you’re there, un-check the “Enable Parameterization” checkbox, otherwise the query editor will interpret the SerDe output string as parameter references.


Once the command has completed, you can click over to the Hive Metastore table browser, and see the columns in the new table. 


Behind the scenes, Hive maps its table structure onto all the files in the /user/root/logs HDFS directory, and when I run a SELECT statement against it, for example to do a simple row count, MapReduce mappers, shufflers and sorters are spun-up to return the count of rows to me.


But in its current form, this table still isn’t all that useful – I’ve just got raw IP addresses for page requesters, and the request date is in a format that’s not easy to work with. So let’s do some further manipulation, creating another table that splits out the request date into year, month, day and time, using Hive’s CREATE TABLE AS SELECT command to transform and then load in one command:

Note the ParquetHive SerDe I’m using in this table’s row format definition – Parquet is a compressed, column-store file format developed by Cloudera originally for Impala (more on that in a moment), that from CDH4.6 is also available for Hive and Pig. By using Parquet, we can potentially benefit from speed and space savings compared to regular files, so let’s use that feature now and see where it takes us. After creating the new Hive table, I can then run a quick query to count web server hits per month:
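
As a rough illustration of the same transformation outside Hive, the sketch below splits an Apache-style request timestamp into year, month, day and time, and then counts hits per month. The function names and the sample timestamps are hypothetical, and it assumes an English locale so that month abbreviations like “Apr” parse correctly.

```python
from collections import Counter
from datetime import datetime


def split_request_time(raw):
    """Split a timestamp such as '[24/Apr/2014:10:15:32 +0000]' into its parts."""
    # %b relies on English month abbreviations (an assumption about the locale).
    stamp = datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "year": stamp.year,
        "month": stamp.month,
        "day": stamp.day,
        "time": stamp.strftime("%H:%M:%S"),
    }


def hits_per_month(raw_times):
    """Count log entries per (year, month) pair."""
    counts = Counter()
    for raw in raw_times:
        parts = split_request_time(raw)
        counts[(parts["year"], parts["month"])] += 1
    return counts


# Made-up timestamps, for illustration only.
sample_times = [
    "[24/Apr/2014:10:15:32 +0000]",
    "[25/Apr/2014:23:01:07 +0000]",
    "[02/Mar/2014:08:45:00 +0000]",
]
monthly = hits_per_month(sample_times)
print(sorted(monthly.items()))
```

In Hive the equivalent work would be done with string or date functions in the SELECT list followed by a GROUP BY; the sketch only shows the shape of the calculation.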


So – getting more useful, but it’d be even nicer if I could map the IP addresses to actual countries, so I can see how many hits came from the UK, how many from the US, and so on. To do this, I’d need to use a lookup service or table to map my IP addresses to countries or cities, and one commonly-used such service is the free GeoIP database provided by MaxMind, where you turn your IP address into an integer via a formula, and then do a BETWEEN to locate that IP within ranges defined within the database. How best to do this though?

There are several ways that you can enhance and manipulate data in your Hadoop system like this. One way, and something I plan to look at on this blog later in this series, is to use Pig, potentially with a call-out to Perl or Python to do the lookup on a row-by-row (or tuple-by-tuple) basis – this blog article on the Cloudera site goes through a nice example. Another way, and again something I plan to cover in this series on the blog, is to use something called “Hadoop Streaming” – the ability within MapReduce to “subcontract” the map and reduce parts of the operation to external programs or scripts, in this case a Python script that again queries the MaxMind database to do the IP-to-country lookup.
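
To make the Hadoop Streaming idea a bit more concrete: the external scripts read lines and write tab-separated key/value lines, and the framework sorts the mapper output by key before handing it to the reducer. The sketch below simulates that flow locally with two plain functions. It simply counts hits per client IP rather than doing a GeoIP lookup, and the function names and sample lines are assumptions of mine; in a real streaming job each function would sit in its own script reading from standard input.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit 'key<TAB>1' per log line, keyed on the first field (client IP)."""
    for line in lines:
        fields = line.split()
        if fields:
            yield f"{fields[0]}\t1"


def reducer(sorted_lines):
    """Reduce step: sum the counts for each key; input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Local simulation of map -> sort -> reduce on made-up log lines.
sample_lines = [
    "192.0.2.10 - - [24/Apr/2014:10:15:32 +0000] ...",
    "198.51.100.7 - - [24/Apr/2014:10:15:40 +0000] ...",
    "192.0.2.10 - - [24/Apr/2014:10:16:02 +0000] ...",
]
result = list(reducer(sorted(mapper(sample_lines))))
print(result)
```

The `sorted()` call stands in for the shuffle-and-sort phase that Hadoop performs between the two steps.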

But surely it’d be easiest to just calculate the IP address integer and just join my existing Hive table to this GeoIP lookup table, and do it that way? Let’s start by trying to do this, first by modifying my final table design to include the IP address integer calculation defined on the MaxMind website: 
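
The integer calculation itself is straightforward: for an IPv4 address w.x.y.z, the value is 16777216*w + 65536*x + 256*y + z. Here’s a minimal Python sketch of it; the function name is my own, and the validation is an extra I’ve added rather than part of the formula.

```python
def ip_to_int(address):
    """Convert a dotted-quad IPv4 address to its integer form.

    Uses 16777216*w + 65536*x + 256*y + z for an address w.x.y.z.
    """
    parts = [int(part) for part in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    w, x, y, z = parts
    return 16777216 * w + 65536 * x + 256 * y + z


print(ip_to_int("1.2.3.4"))  # 16909060
```

In Hive the same arithmetic would need the address split on the dots and each piece cast to a number before multiplying.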

Now I can query this from the Hive query editor, and I can see the IP address integer calculations that I can then use to match to the GeoIP IP address ranges.


I then upload the IP Address to Countries CSV file from the MaxMind site to HDFS, and define a Hive table over it like this:

Then I try some variations on the BETWEEN clause, in a SELECT with a join:

... which all fail, because Hive only supports equi-joins. One option is to use a Hive UDF (user-defined function) such as this one here to implement a GeoIP lookup, but something that’s probably a bit more promising is to switch over to Impala, which has the ability to do non-equality joins through the cross join feature (Hive can in fact also use cross-joins, but they’re not very efficient). Impala also has the benefit of being much faster for BI-type queries than Hive, and it’s also designed to work with Parquet, so let’s switch over to the Impala query editor, run the “invalidate metadata” command to re-sync its table view with Hive’s table metastore, and then try the join in there:
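
Conceptually, the BETWEEN join finds, for each IP address integer, the one range whose start and end values enclose it. The Python sketch below shows that lookup logic using a binary search over sorted range starts. The ranges and country codes are invented placeholders, not real GeoIP data, and this is not a description of how Impala executes the query.

```python
from bisect import bisect_right

# Invented (start, end, country) ranges, sorted by start; not real GeoIP data.
RANGES = [
    (16777216, 16777471, "AA"),
    (16777472, 16778239, "BB"),
    (3221225984, 3221226239, "CC"),
]
STARTS = [start for start, _, _ in RANGES]


def lookup_country(ip_int):
    """Return the country whose range contains ip_int, or None if no range matches."""
    # Find the last range that starts at or before ip_int...
    index = bisect_right(STARTS, ip_int) - 1
    if index >= 0:
        start, end, country = RANGES[index]
        # ...then confirm ip_int falls inside it (the BETWEEN condition).
        if start <= ip_int <= end:
            return country
    return None


print(lookup_country(16777300))
```

A cross join with a filter arrives at the same answer by comparing every log row against every range, which is why it gets expensive as either table grows.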


Not bad. Of course this is all fairly simple stuff, and we’re still largely working with relational-style set-based transformations. In the next two posts in the series though I want to get a bit deeper into Hadoop-style transformations – first by using a feature called “Hadoop Streaming” to process data on its way into Hadoop, done in parallel, by calling out to Python and Perl scripts; and then take a look at Pig, the more “procedural” alternative to Hive – with the objective being to enhance this current dataset to bring in details of the pages being requested, filter out the non-page requests, and do some work with authors, tags and clickstream analysis.

Previewing Three Oracle Data Visualization Sessions at the Atlanta US BI Forum 2014

April 22nd, 2014 by

Many of the sessions at the UK and US Rittman Mead BI Forum 2014 events in May focus on the back-end of BI and data warehousing, with for example Chris Jenkins’ session on TimesTen giving us some tips and tricks from TimesTen product development, and Wayne Van Sluys’s session on Essbase looking at what’s involved in Essbase database optimisation (full agendas for the two events can be found here). But two areas within BI that have got a lot of attention over the past couple of years are (a) data visualisation, and (b) mobile, so I’m particularly pleased that our Atlanta event has three of the most innovative practitioners in this area – Kevin McGinley from Accenture (left in pictures below), Christian Screen from Art of BI (centre), and Patrick Rafferty from Branchbird (right), talking about what they’ve been doing in these areas.


If you were at the BI Forum a couple of years ago you’ll of course know Kevin McGinley, who won the “best speaker” award the previous year and most recently has gone on to organise the BI track at ODTUG KScope and write for OTN and his own blog. Kevin also hosts, along with our own Stewart Bryson, a video podcast series on iTunes called “Real-Time BI with Kevin & Stewart”, and I’m excited that he’s joining us again at this year’s BI Forum in Atlanta to talk about adding 3rd party visualisations to OBIEE. Over to Kevin…

“I can’t tell you how many times I’ve told someone that I can’t precisely meet a certain charting requirement because of a lack of configurability or variety in the OBIEE charting engine.  Combine that with an increase in the variety and types of data people are interested in visualizing within OBIEE and you have a clear need.  Fortunately, OBIEE is a web-based tool and can leverage other visualization engines, if you just know how to work with the engine and embed it into OBIEE.

In my session, I’ll walk through a variety of reasons you might want to do this and the various approaches for doing it.  Then, I’ll take two specific engines and show you the process for building a visualization with them right in an OBIEE Analysis.  In both examples, you’ll come away with a capability you’ve never been able to do directly in OBIEE before.”


Another speaker, blogger, writer and developer very well known to the OBIEE community is Art of BI Software’s Christian Screen, co-author of the Packt book “Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial” and developer of the OBIEE collaboration add-in, BITeamwork. Last year Christian spoke to us about developing plug-ins for OBIEE, but this year he’s returned to a topic he’s very passionate about – mobile BI, and in particular, Oracle’s Mobile App Designer. According to Christian:

“Last year Oracle marked its mobile business intelligence territory by updating its Oracle BI iOS application with a new look and feel. Unbeknownst to many, they also released the cutting-edge Oracle BI Mobile Application Designer (MAD). These are both components available as part of the Oracle BI Foundation Suite. But it is where they are taking the mobile analytics platform that is most interesting at the moment as we look at the mobile analytics consumption chain. MAD is still in its 1.x release and there is a lot of promise with this tool to satisfy the analytical cravings growing in the bellies of many enterprise organizations. There is also quite a bit of discussion around building new content just for mobile consumption compared to viewing existing content through the mobile applications native to major mobile devices.

The “Oracle BI Got MAD and You Should be Happy” session will discuss these topics and I’ll be sharing the stage with Jayant Sharma from Oracle BI Product Development where we’ll also be showing some cutting edge material and demos for Oracle BI MAD.  Because MAD provides a lot of flexibility for development customizations, compared to the Oracle BI iOS/Android applications, our session will explore business use cases around pre-built MAD applications, HTML5, mobile security, and development of plug-ins using the MAD SDK.  One of the drivers for this session is to show how many of the Oracle Analytics components integrate with MAD and how an Oracle BI developer can quickly leverage the capabilities of MAD to show the tool’s value within their current Oracle BI implementation.

We will also discuss the common concern of mobile security by touching on the BitzerMobile acquisition and using the central mobile configuration settings for Oracle BI Mobile. The crowd will hopefully walk away with a better understanding of Oracle BI mobility with MAD and a desire to go build something.”


As well as OBIEE and Oracle Mobile App Designer, Oracle also have another product, Oracle Endeca Information Discovery, that combines a data aggregation and search engine with dashboard visuals and data discovery. One of the most innovative partner companies in the Endeca space is Branchbird, and we’re very pleased to have Branchbird’s Patrick Rafferty join us to talk about “More Than Mashups – Advanced Visualizations and Data Discovery”. Over to Patrick …

“In this session, we’ll explore how Oracle Endeca customers are moving beyond simple dashboards and charts and creating exciting visualizations on top of their data using Oracle Endeca Studio. We’ll discuss how the latest trends in data visualization, especially geospatial and temporal visualization, can be brought into the enterprise and how they drive competitive advantage.

This session will show in-production real-life examples of how extending Oracle Endeca Studio’s visualization capabilities to integrate technology like D3 can create compelling discovery-driven visualizations that increase revenue, cut cost and enhance the ability to answer unknown questions through data discovery.”


The full agendas for the Atlanta and Brighton BI Forum events can be found on this blog post, and full details of both events, including registration links, links to book accommodation and details of the Lars George Cloudera Hadoop masterclass, can be found on the Rittman Mead BI Forum 2014 home page.

Preview of Maria Colgan, and Andrew Bond/Stewart Bryson Sessions at RM BI Forum 2014

April 16th, 2014 by

We’ve got a great selection of presentations at the two upcoming Rittman Mead BI Forum 2014 events in Brighton and Atlanta, including sessions on Endeca, TimesTen, OBIEE (of course), ODI, GoldenGate, Essbase and Big Data (full timetable for both events here). Two of the sessions I’m particularly looking forward to though are ones by Maria Colgan, product manager for the new In-Memory Option for Oracle Database, and another by Andrew Bond and Stewart Bryson, on an update to Oracle’s reference architecture for Data Warehousing and Information Management.

The In-Memory Option for Oracle Database was of course the big news item from last year’s Oracle Openworld, promising to bring in-memory analytics and column-storage to the Oracle Database. Maria is of course well known to the Oracle BI and Data Warehousing community through her work with the Oracle Database Cost-Based Optimizer, so we’re particularly glad to have her at the Atlanta BI Forum 2014 to talk about what’s coming with this new feature. I asked Maria to jot down a few words for the blog on what she’ll be covering, so over to Maria:

“At Oracle Open World last year, Oracle announced the upcoming availability of the Oracle Database In-Memory option, a solution for accelerating database-driven business decision-making to real-time. Unlike specialized In-Memory Database approaches that are restricted to particular workloads or applications, Oracle Database 12c leverages a new in-memory column store format to speed up analytic workloads.

Given this announcement and the performance improvements promised by this new functionality, is it still necessary to create a separate access and performance layer in your data warehouse environment or to run your Oracle data warehouse on an Exadata environment?

This session explains in detail how Oracle Database In-Memory works and will demonstrate just how much performance improvement you can expect. We will also discuss how it integrates into the existing Oracle Data Warehousing Architecture and with an Exadata environment.”

The other session I’m particularly looking forward to is one being delivered jointly by Stewart Bryson and Andrew Bond, who heads up Enterprise Architecture at Oracle and was responsible, along with Doug Cackett, for the various data warehousing, information management and big data reference architectures we’ve covered on the blog over the past few years, including the first update to include “big data” a year or so ago.


Back towards the start of this year, Stewart, myself and Jon Mead met up with Andrew and his team to work together on an update to this reference architecture, and Stewart carried on with the collaboration afterwards, bringing in some of our ideas around agile development, big data and data warehouse design into the final architecture. Stewart and Andrew will be previewing the updated reference architecture at the Brighton BI Forum event, and in the meantime, here’s a preview from Andrew:

“I’m very excited to be attending the event and unveiling Oracle’s latest iteration of the Information Management reference architecture. In this version we have focused on a pragmatic approach to “Analytics 3.0” and in particular looked at bringing an agile methodology to break the IT / business barrier. We’ve also examined exploitation of in-memory technologies and the Hadoop ecosystem and guiding the plethora of new technology choices.

We’ve worked very closely with a number of key customers and partners on this version – most notably Rittman Mead and I’m delighted that Stewart and I will be able to co-present the architecture and receive immediate feedback from delegates.”

Full details of the event, running in Brighton on May 7-9th 2014 and Atlanta, May 15th-17th 2014, can be found on the Rittman Mead BI Forum 2014 homepage, and the agendas for the two days are on this blog post from earlier in the week.
