Oracle Open World : Day 4 : Final Round of BI&W Presentations

It was the final day of Open World today, and in fact it was probably the most productive (for me at least) in terms of BI & data warehousing content. I'll start off with one presentation that I missed, George Lumpkin's "Data Warehouse Architectures" - you can download the presentation here, username/password is here.

I was particularly interested in this one as George Lumpkin is Senior Director of Product Management, Data Warehousing & BI Platform for Oracle; in other words, what he's saying should pretty much be Oracle's position on DW architecture. Now I've got to be careful here as I wasn't at the presentation, but from looking through the slides, what he's proposing is a physical architecture that has three elements:

  1. A Staging Area, for supporting ETL as needed
  2. An Atomic Data layer, which is the base data warehouse schema and contains atomic (i.e. detail-level) data, has a 3NF design, supports general end-user queries, and is the source of data for the next level which is...
  3. Application-Specific Performance Structures, which contains aggregate data, presents a dimensional view, and supports specific end-users, tools and applications. Later slides go on to show this layer comprising views (over the atomic data layer), materialized views (sourced from the same) and analytic workspaces.

The atomic data layer contains both real-time and batch-loaded data, with the real-time data being sourced from CDC, queues and so forth. The slides then go on to summarise the architecture as being simple (only three layers, no ODS is necessary, and data marts and analytics are combined into one layer), broad (contains both normalised and denormalised data, at detail and aggregate level) and integrated (a single security model, as all three physical layers are part of the same database, and a single source of the truth).
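To make this a bit more concrete, here's a minimal sketch of how the three layers might hang together in a single database; bear in mind the table and column names are my own invention for illustration, not anything from George's slides.

    -- Layer 1: staging table, loaded and emptied by the ETL process as needed
    CREATE TABLE stg_orders (
      order_id    NUMBER,
      cust_id     NUMBER,
      order_date  DATE,
      amount      NUMBER
    );

    -- Layer 2: atomic, 3NF detail tables in the base warehouse schema
    CREATE TABLE dw_customers (
      cust_id     NUMBER PRIMARY KEY,
      cust_name   VARCHAR2(100) NOT NULL,
      region      VARCHAR2(30)  NOT NULL
    );

    CREATE TABLE dw_orders (
      order_id    NUMBER PRIMARY KEY,
      cust_id     NUMBER NOT NULL REFERENCES dw_customers,
      order_date  DATE   NOT NULL,
      amount      NUMBER NOT NULL
    );

    -- Layer 3: application-specific performance structures; first a
    -- dimensional view over the atomic layer...
    CREATE VIEW sales_fact_v AS
      SELECT o.order_date, c.region, o.amount
      FROM   dw_orders o
      JOIN   dw_customers c ON c.cust_id = o.cust_id;

    -- ...then an aggregate held as a materialized view, sourced from the same
    -- atomic data, with query rewrite so end-user queries pick it up automatically
    CREATE MATERIALIZED VIEW sales_by_region_mv
      ENABLE QUERY REWRITE AS
      SELECT c.region, TRUNC(o.order_date, 'MM') AS order_month,
             SUM(o.amount) AS amount
      FROM   dw_orders o
      JOIN   dw_customers c ON c.cust_id = o.cust_id
      GROUP  BY c.region, TRUNC(o.order_date, 'MM');

The analytic workspaces in the same layer would presumably be built over the same atomic tables via AWM or the AWXML API rather than in plain SQL like this.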

Now to me this looks interesting, for a couple of reasons. Firstly, it looks like he's broken aggregate data out from detail-level data in order to support real-time updates to the warehouse; if real-time updates were applied to a Kimball-style dimensional warehouse, you'd need to apply the changes to all the places you've denormalized the data to, and you'd also need to be constantly refreshing the aggregates. The typical Kimball response to this move, though, would be to point out that users working with aggregate data would need to drill through to the detail-level 3NF warehouse to retrieve data at the detail level, which is non-intuitive. Also, the Performance Data Layer doesn't look to me like a dimensional warehouse with conformed dimensions - it looks more like a set of independent data marts, some of which are implemented using AWs - which again goes against the Kimball grain, as you won't be able to drill across from sales cubes to HR cubes and so forth. So it looks to me as if this architecture sacrifices the usability of a dimensional data warehouse for the benefit of receiving real-time updates. It's an approach, I suppose, but I'm curious as to why Oracle are promoting it above others. My guess is that this is what the Fusion BI&W layer will be built like, and what Oracle are really selling here is the idea of real-time analytics and business intelligence as a differentiator between themselves and SAP; I'm not sure, though, whether it's the right approach for all data warehouses. Comments anyone?
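On the aggregate-refresh point, it's worth noting that Oracle can keep an aggregate like the one above current as real-time data trickles in, using fast-refresh materialized views; a rough sketch, again with my own invented names, might look like this:

    -- A materialized view log captures changes to the detail table as rows arrive
    CREATE MATERIALIZED VIEW LOG ON dw_orders
      WITH SEQUENCE, ROWID (order_date, amount)
      INCLUDING NEW VALUES;

    -- An aggregate that's incrementally refreshed as each load transaction
    -- commits; the extra COUNT columns are required for fast refresh of a SUM
    CREATE MATERIALIZED VIEW sales_by_day_mv
      REFRESH FAST ON COMMIT
      ENABLE QUERY REWRITE AS
      SELECT order_date,
             SUM(amount)   AS amount,
             COUNT(amount) AS amount_cnt,
             COUNT(*)      AS row_cnt
      FROM   dw_orders
      GROUP  BY order_date;

The catch, of course, is that in a heavily denormalized dimensional design every denormalized copy and every aggregate needs this sort of treatment, and fast refresh only works within Oracle's documented restrictions - which I suspect is exactly why the aggregates have been broken out into their own layer here.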

Later on in the morning I took a look around the demogrounds and had a chat with some of the product managers for Oracle's BI products. There certainly seems to be a lot of excitement around what they've got up their sleeves, and again Fusion crops up again and again as having a major input into their development plans. One product I had a closer look at was XML Publisher, and it's shaping up to be quite an interesting product. To answer my own question from a couple of days ago, the product as it stands only takes XML as an input, but the development team are working on an engine for processing SQL data, and at some point in a future release it should be possible to use both XML and SQL data as the report data source.

One interesting demo that I saw for XMLP (as we now call it) was a dashboard application that contained a number of charts, graphs, grids and so forth, with data taken from a number of XML feeds. It all looked very slick, and quite similar in fact to the Report Center demo I saw the other day. What's interesting here is that there are at least four examples of dashboarding that the various product teams are working on: Oracle Portal plus Discoverer Portlets, BI Beans (the Executive Insight demo is a good example), Report Center and now XML Publisher, all of which are solving the same problem in different ways. It'll be interesting to see how this pans out; it certainly seems that the product teams are pretty much on the ball now in terms of knowing what customers are looking for, but hopefully it won't be too confusing for someone just looking to build a dashboard and not knowing which technology to use. Watch this space, as they say...

After the stroll around the demogrounds I went to the presentation (slides, paper) by Keith Laker, Katarina Obradovic-Sarkic and the BI Beans/OLAP product team on Integrating OLAP and Data Mining in your BI Applications. Keith is always pretty good value in terms of coming up with something innovative (last year he built an OWB Paris "expert" that migrated OSA databases to Discoverer Drake), and this year he went through a BI Beans application they'd built which he termed the "Cube Factory". The idea behind this was that you might have a number of measures - revenue, sales, units and so forth - and a customer dimension that has a number of attributes - length of tenure, age, income band and so on. The demo showed a JSP application that let you select one or more measures, and then displayed a page listing all the attributes for the customer. Under the list of attributes was an "attribute importance" button that, when clicked, showed a BI Beans bar graph displaying each attribute and the importance assigned to it in relation to the measures, calculated using the data mining option. You could then pick two or more of these attributes (in the demo, Keith picked the ones with the highest importance - age band, income, length of tenure and so on) and the application would then, via the AWXML Java API, build a cube "on the fly" dimensioned by these attributes. The point of all this was to show how an application can assist the user in picking those items that are most relevant for slicing and dicing a measure, and then build a cube on the fly to support this analysis.

I had a word with Keith afterwards as I wasn't sure on what basis the attribute importance was worked out - was it for the measures that were selected, or was it precalculated - and it turned out that the importance was in fact precalculated based on the revenue measure. Looking through the accompanying paper and from talking to Keith, the way the application works is that the cube is initially built (using AWM) as an eighteen-dimension cube (one dimension for each possible customer attribute) and the data sources are mapped to the dimensions, but the data is not at this stage loaded. Then, when the user selects the attributes by which they will analyze the measures, the AWXML API is used to populate the cube based just on the dimensions that are being used. The paper linked to above gives more details, and apparently the demo and source code will go up on OTN shortly so that we can download and play around with it. I thought it was a pretty cool demo actually, a nice usage of data mining and OLAP together (I was wondering how they would bring the two together for the presentation; I thought it would be a bit tenuous but in fact it was pretty relevant) and I look forward to downloading it off OTN in due course.
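As an aside, the attribute-importance calculation itself can be driven straight from PL/SQL with the data mining option; here's a rough sketch of the sort of thing involved, with my own table, column and model names rather than anything from Keith's demo (attribute importance models need a target column, so I've assumed revenue binned into a categorical REVENUE_BAND):

    -- Build an Attribute Importance model over the customer table, ranking
    -- each attribute by its influence on the (binned) revenue measure
    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'CUST_AI',
        mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
        data_table_name     => 'CUSTOMERS',
        case_id_column_name => 'CUST_ID',
        target_column_name  => 'REVENUE_BAND');
    END;
    /

    -- List the attributes - age band, income, tenure and so on - ranked by importance
    SELECT attribute_name, importance_value, rank
    FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_AI('CUST_AI'))
    ORDER  BY rank;

The JSP front end would then presumably feed the top-ranked attributes through to the AWXML API to build and populate the cube.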

The final presentation for me was the Discoverer 10.1.2 Best Practices one by Mike Donahue; I'd seen previous papers by Oracle on 9.0.x and 4i and was quite looking forward to an update, in particular on scalability and sizing. The paper and presentation are available online; a couple of points that I took away from the session were:

  • The server architecture of Discoverer Plus running against OLAP data is different to that when running against relational data; Plus relational talks to the Discoverer Servlet, which in turn talks to a Discoverer Session and the Discoverer Preference Server on the application server, which then talks to the database, whilst Plus OLAP (after authenticating) talks directly to the database, via the Discoverer Catalog. Slides 8 and 9 cover this, and it's worth bearing in mind when working out where the load will go on a new system.
  • Discoverer Viewer is more application-server intensive than Discoverer Plus, as with Viewer the app server (the Discoverer Servlet) needs to query the database, render the HTML and so on, whereas with Plus the display part of the work is done by the applet running in your browser. This means that the "cost" of a Viewer user is more than that of a Plus user - probably not what you'd expect, as the Viewer interface is considered more "lightweight" and you'd probably think a Plus user took up more server resource than a Viewer one.
  • Discoverer 10.1.2 has some "patented" optimisations for "join and item trimming", which only work on simple or complex folders, where Discoverer itself puts the SQL together. If you usually build your folders as custom folders, with a "select * from customers" type query, these optimisations won't be used - which is a good reason now to stop building folders like this (see the sketch after this list).
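To illustrate that last point with a hypothetical example (the table and the generated SQL here are my own sketch, not taken from Mike's slides): Discoverer wraps a custom folder's SQL as an inline view that it has to treat as a black box, whereas with a simple folder it generates the SQL itself and can trim out items and joins the worksheet doesn't use.

    -- Custom folder: your own SQL is wrapped as an inline view, so every
    -- column of CUSTOMERS is fetched even if the worksheet only shows one
    SELECT v.cust_name
    FROM   (SELECT * FROM customers) v;

    -- Simple folder: Discoverer generates the SQL itself, and item trimming
    -- lets it select just the columns (and joins) the worksheet actually needs
    SELECT c.cust_name
    FROM   customers c;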

The paper that comes with the presentation is pretty thin on details, but the slides are a bit more useful. Apparently Oracle are due to publish a couple of white papers shortly on sizing and troubleshooting 10.1.2; I'll keep an eye out and post a link when I notice them.

Well, that's it in terms of Open World feedback; I'm off to pack now, and I'll do a short posting tomorrow morning on what the highlights have been. Other than that, that's it from me for now.