Oracle Open World : Day 4 : Final Round of BI&W Presentations

September 23rd, 2005 by Mark Rittman

It was the final day of Open World today, and in actual fact it was probably
the most productive (for me at least) in terms of BI & data warehousing content.
I’ll start off with one presentation that I missed, George Lumpkin’s "Data
Warehouse Architectures" - you can download the presentation

here
, username/password is
here.

I was particularly interested in this one as George Lumpkin is Senior
Director of Product Management, Data Warehousing & BI Platform for Oracle, in
other words what he’s saying should pretty much be Oracle’s position on DW
architecture. Now I’ve got to be careful here as I wasn’t at the presentation,
but from looking through the slides, what he’s proposing is a physical
architecture that has three elements:

  1. A Staging Area, for supporting ETL as needed
  2. An Atomic Data layer, which is the base data warehouse schema and
    contains atomic (i.e. detail-level) data, has a 3NF design, supports general
    end-user queries, and is the source of data for the next level which is…
  3. Application-Specific Performance Structures, containing aggregate data,
    has a dimensional view, and supports specific end-users, tools and
    applications. Later slides go on to show this layer comprising of views
    (over the atomic data layer), materialized views (sourced from the same) and
    analytic workspaces.

The atomic data layer contains both real-time and batch loaded data, with the
real-time data being sourced from CDC, queues and so forth. The slides then go
on to summarise the architecture as being simple (only contains three layers, an
ODS is not neccessary, and data marts and analytics are combined into one
layer), broad (contains both normalised and denormalised, detail and aggregate
data) and integrated (single security as all three physical layer are part of
the same database, and a single source of the truth)

Now to me this looks interesting, for a couple of reasons. Firstly, it looks
like he’s broken aggregate data out from detail-level data in order to support
real-time updates to the warehouse; if real-time updates were applied to a
Kimball-style dimensional warehouse, you’d need to apply the changes to all the
places that you’ve denormalized the data to, and you’d also need to be
constantly refreshing the aggregates. The typical Kimball response to this move
though would be to point out that users, when working with aggregate data, would
need to drill-through to the detail-level 3NF warehouse to retrieve data at the
detail level, which is non-intuitive. Also, the Performance Data Layer doesn’t
look to me like a dimensional warehouse with conformed dimensions - it looks
more like a set of independent data marts, some of which are implemented using
AWs - which again goes against the Kimball grain as you won’t be able to drill
across from sales cubes to HR cubes and so forth. So it looks to me that this
architecture sacrifices the usability of a dimensional data warehouse for the
benefit of receiving real-time updates - It’s an approach I suppose but I’m
curious as to why Oracle are promoting this approach above others? My guess is
that this is what the Fusion BI&W layer will be built like, and what Oracle are
trying to sell here is the idea of real-time analytics and business intelligence
as a differentiator between themselves and SAP; not sure though whether it’s the
right approach for all data warehouses. Comments anyone?

Later on in the morning I took a look around the demogrounds and had a chat
with some of the product managers for Oracle’s BI products. There certainly
seems to be a lot of excitement around what they’ve got up their sleeves and
again Fusion seems to crop up again and again as having a major input into their
development plans. One product I had a closer look at was XML Publisher and it’s
shaping up to be quite an interesting product. To answer my own question from a
couple of days ago, the product as it stands only takes XML as an input, but the
development team are working on an engine for processing SQL data and at some
point in a future release it should be possible to use both XML and SQL data as
the report data source. One interesting demo that I saw for XMLP (as we now call
it) was a dashboard application that contained a number of charts, graphs, grids
and so forth with data taken from a number of XML feeds. It all looked very
slick and quite similar in fact to the Report Center demo I saw the other day -
what’s interesting here is that there’s at least four examples of dashboarding
that the various product teams are working on; you’ve got Oracle Portal plus
Discoverer Portlets, BI Beans (the Executive Insight demo is a good example),
Report Center and now XML Publisher, all of which are solving the same problem
in different ways. It’ll be interesting to see how this pans out, it certainly
seems that the product teams are pretty much on the ball now in terms of knowing
what customers are looking for, hopefully it won’t be too confusing though for
someone just looking to build a dashboard and not knowing which technology to
use. Watch this space as they say …

After the stroll around the demogrounds I went to the presentation (slides,

paper
) by Keith Laker, Katarina Obradovic-Sarkic and the BI Beans/OLAP
product team on Integrating OLAP and Data Mining in your BI Applications. Keith
is always pretty good value in terms of coming up with something innovative
(last year he build an OWB Paris "expert" that migrated OSA databases to
Discoverer Drake) and this year he went through a BI Beans application they’d
build which he termed the "Cube Factory". The idea behind this was that you
might have a number of measures - revenue, sales, units and so forth - and a
customer dimension that has a number of attributes - length of tenure, age,
income band and so on. The demo showed a JSP application that let you select one
or more measures, and then displayed a page listing all the attributes for the
customer. Under the list of attributes was an "attribute importance" button
that, when clicked on, showed a BI Beans bar graph that displayed each attribute
and the importance assigned to it in relation to the measures, calculated using
the data mining option. You could then pick two or more of these attributes (in
the demo, Keith picked the ones with the highest importance - age band, income,
length of tenure and so on) and the application would then, via the AWXML Java
API, build a cube "on the fly" dimensioned by these attributes. The point of all
this was to show how an application can assist the user in picking those items
that are most relevant for slicing and dicing a measure, and then build a cube
on the fly to support this analysis. I had a word with Keith afterwards as I
wasn’t sure on what basis the attribute imporance was worked out - was it for
the measures that were selected, or was it precalculated - and it turned out
that the importance was infact precalculated based on the revenue measure.
Looking through the accompanying paper and from talking to Keith, the way that
the application works is that the cube is initially built (using AWM) as an
eighteen-dimension cube (one dimension for each possible customer attribute),
the data sources are mapped to the dimensions, but the data is not at this stage
loaded. Then, when the user selects the attributes by which they will analyze
the measures, the AWXML API is then used to populate the cube just based on the
dimensions that are being used. The paper linked to above gives more details,
and apparently the demo and source code will go up on OTN shortly so that we can
download and play around with it. I thought it was a pretty cool demo actually,
a nice usage of data mining and OLAP together (I was wondering how they would
bring the two together for the presentation, I thought it would be a bit tenuous
but infact it was pretty relevant) and I look forward to downloading it off OTN
in due course.

The final presentation for me was the Discoverer 10.1.2 Best Practices one by
Mike Donahue; I’d seen previous paper by Oracle on 9.0.x and 4i and was quite
looking forward to an update, in particular on scalability and sizing. The

paper
and

presentation
are available online; a couple of points that I took away from
the session were

  • The server architecture of Discoverer Plus running against OLAP data is
    different to when running against relational data; Plus relational talks to
    the Discoverer Servlet, which in turn talks to a Discoverer Session and the
    Discoverer Preference Server on the application server which then talks to
    the database, whilst Plus OLAP (after authenticating) talks directly to the
    database, via the Discoverer Catalog. Slides 8 and 9 cover this. Worth
    bearing in mind when working out where the load will go on a new system.
  • Discoverer Viewer is more application server intensive than Discoverer
    Plus, as with Viewer the app server (the Discoverer Servlet) needs to query
    the database, render the HTML and so on; with Plus the display part of the
    work is done by the applet running in your browser, meaning that the "cost"
    of a Viewer user is more than a "Plus" user, not probably what you’d be
    expecting as the Viewer interface is considered more "lightweight" and you’d
    probably think that a Plus user took up more server resource than a Viewer
    user.
  • Discoverer 10.1.2 has some "patented" optimisations for "join and item
    trimming", which means that if you usually build your folders as custom
    folders, with a "select * from customers" type query, these optimisations
    won’t be used (they only work on simple or complex folders, where Discoverer
    itself puts the SQL together) which is a good reason now to stop building
    folders like this.

The paper that comes with the presentation is pretty thin on details, but the
slides are a bit more useful. Apparently Oracle are due to publish shortly a
couple of white papers on sizing and troubleshooting 10.1.2., I’ll keep an eye
out and post a link when I notice them.

Well, that’s it for me in terms of Open World feedback, I’m off to pack now,
I’ll do a short posting tomorrow morning on what the highlights for me have
been, other than that, that’s it for me now.

Comments are closed.