Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces

December 28th, 2011 by Stewart Bryson

In the previous post, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I argued that the usual approach to Agile data warehousing is not Agile at all, because it violates Agile’s main principle: working software delivered iteratively.

If you haven’t already deduced it from my first post — or if you haven’t already seen me speak on this topic — what I am recommending is bypassing, either temporarily or permanently, the inhibitors specific to data warehousing projects that limit our ability to deliver working software quickly. Specifically, I’m recommending that we wait to build and populate physical star schemas until a later phase, if at all. Remember the two reasons we build dimensional models: model simplicity and performance. With our Extreme BI solution, we have tools to counter both. We have OBIEE 11g, with a rich metadata layer that presents our underlying data model, even if it is transactional, as a star schema to the end user. This removes our dependency on a simple physical model to provide a simple logical model to end users. We also have Exadata, which delivers world-class performance against any type of model and can close the performance gap that star schemas normally exist to fill. With these tools at our disposal, we can postpone the long process of building dimensional models, at least for the first few iterations. This is the only way to get working software in front of the end user in a single iteration, and, as I will argue, it is the best way to collaborate with an end user and deliver the content they are expecting.

Of the puzzle pieces we need to deliver this model, the first is the Oracle Next-Generation Reference DW Architecture (we need an acronym for that), which Mark has already written about in depth here. As you browse through that post, pay special attention to his formulation of the foundation layer, which is the most important layer for delivering Extreme BI.

Oracle Next-Generation Reference DW Architecture

Foundation Layer

This is our “process-neutral” layer, which means simply that it isn’t imbued with requirements about what users want and how they want it. Instead, the foundation layer has one job and one job only: tracking what happened in our source systems. Typically, the foundation layer logical model looks identical to the source systems, except that we add a few metadata columns to each record, such as commit timestamps and Oracle Database system change numbers (SCNs). There are other, more complex solutions for modeling the foundation layer when the 3NF from the source system or systems is not sufficient, such as data vault. Our foundation layer is generally “insert-only”, meaning we track all history so that we are insulated from changing user requirements in the near and distant future.
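
To make that concrete, here’s a minimal sketch of what one of these insert-only foundation tables might look like. The table and column names are hypothetical; the point is the handful of metadata columns riding alongside the source attributes:

    -- Hypothetical foundation-layer mirror of a source ORDERS table.
    -- Every source operation (insert, update or delete) lands here as
    -- a new row, so we never issue UPDATEs or DELETEs against it.
    CREATE TABLE fnd_orders (
      order_id        NUMBER        NOT NULL,  -- source primary key
      customer_id     NUMBER,
      order_total     NUMBER(12,2),
      order_status    VARCHAR2(30),
      -- metadata columns describing the change itself
      src_commit_ts   TIMESTAMP     NOT NULL,  -- source commit timestamp
      src_commit_scn  NUMBER        NOT NULL,  -- source SCN at commit
      src_op_type     VARCHAR2(1)   NOT NULL   -- 'I', 'U' or 'D'
    );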

UPDATE: Kent Graziano, a major data vault evangelist, has started blogging. Perhaps with some pressure from the public, we could “encourage” him to blog on what data vault would look like in a standard foundation layer.

Capturing Change

Also required for delivering Extreme BI is a process for capturing change from the source systems and rapidly applying it to the foundation layer, which I described briefly in one of my posts on real-time data warehousing. We have a bit of a tug-of-war at this point between Oracle Streams and Oracle GoldenGate. GoldenGate is the stated platform of the future because it’s a simple, flexible, powerful and resilient replication technology. However, it does not yet have powerful change data capture functionality specific to data warehouses, such as easy subscriptions to raw changed data, or support for multiple subscription groups. You can, in general, work around these limitations using the INSERTALLRECORDS parameter and some custom code (perhaps fodder for a future blog post). Regardless of the technology, Extreme BI requires a process for capturing and applying source system changes quickly and efficiently to the foundation layer on the Exadata Database Machine.
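
Whichever technology does the capturing, the net effect is the same: every source operation arrives in the foundation layer as an insert. As a sketch, reusing the hypothetical fnd_orders table above, downstream consumers can then reconstruct the current state of the source with a simple analytic view:

    -- Sketch: present the current state of each order from the
    -- insert-only change history, filtering out records whose most
    -- recent operation was a delete.
    CREATE OR REPLACE VIEW fnd_orders_current AS
    SELECT order_id, customer_id, order_total, order_status
    FROM  (SELECT o.*,
                  ROW_NUMBER() OVER (PARTITION BY order_id
                                     ORDER BY src_commit_scn DESC) rn
           FROM   fnd_orders o)
    WHERE rn = 1
    AND   src_op_type <> 'D';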

Extreme Performance

Although I’ll drill into more detail in the next post, the reason we need Extreme Performance is to offset the performance gains we usually get from star schemas, since we won’t be building those, at least not in the initial iterations. Rittman Mead has deployed a variant of this methodology sans Exadata, using a powerful Oracle Database RAC instead, but there is no substitute for Exadata. Although the hardware on the Database Machine is superb, it’s really the software that is a game-changer. The most extraordinary features include smart scan and storage indexes, as well as hybrid columnar compression, which Mark talks about here, referencing an article by Arup Nanda found here. For years now, with standard Oracle data warehouses, we’ve pushed the architecture to its limits trying to reduce IO contention at the cost of CPU utilization, using database features such as partitioning, parallel query and basic block compression. But Exadata Storage can eliminate the IO boogeyman, using combinations of these standard features plus the Exadata-only features to elevate query performance against 3NF schemas to be on par with traditional star schemas, and beyond.
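
As a small illustration of those software features, here’s a hedged sketch against the hypothetical fnd_orders table from earlier. The compression level shown is just one of several HCC options, and HCC only kicks in for data loaded via direct-path operations; smart scans and storage indexes require no declaration at all:

    -- Exadata-only Hybrid Columnar Compression, plus standard
    -- parallelism, on the hypothetical foundation table. HCC applies
    -- to subsequent direct-path loads.
    ALTER TABLE fnd_orders COMPRESS FOR QUERY HIGH;
    ALTER TABLE fnd_orders PARALLEL 8;

    -- A full scan like this can then be offloaded to the storage
    -- cells (smart scan), with storage indexes pruning IO.
    SELECT customer_id, SUM(order_total)
    FROM   fnd_orders
    WHERE  src_op_type <> 'D'
    GROUP  BY customer_id;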

Extreme Metadata

Extreme performance is only half the battle… we also need Extreme Metadata to provide the proper level of abstraction, so that report and dashboard developers still have a simple model to report against. This is what OBIEE 11g brings to the table. We have also delivered a variant of this methodology without OBIEE, using Cognos instead, which has a metadata layer called Framework Manager. But as with Exadata, the BI Server has no equal in the metadata department, so my advice… don’t substitute ingredients.

Consider, for a moment, the evolution of dimensional modeling in deploying a data warehouse. Not too long ago, we had to solve most data warehousing issues with the logical model because BI tools were simplistic. Generally… there was no abstraction of the physical into the logical, unless you count the renaming of columns as abstraction. As these tools evolved, we often found ourselves with a choice: solve some user need in the logical model, or solve it with the feature set of the BI tool. The use of aggregation in data warehousing is a perfect example of this evolution. Aggregate tables used to be designed as just another part of the logical modeling exercise, and were generally represented in the published data model for the EDW. But now, building aggregates is more of a technical implementation than a logical one, as either the BI Server or the Oracle Database can handle the transparent navigation to aggregate tables.
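
On the database side, for instance, a materialized view with query rewrite enabled provides exactly this kind of transparent navigation. A hedged sketch, again using the hypothetical fnd_orders table:

    -- Sketch: a summary the optimizer can transparently substitute
    -- for queries that aggregate orders by customer. Reports keep
    -- querying the detail table; query rewrite redirects them here.
    CREATE MATERIALIZED VIEW mv_orders_by_customer
      ENABLE QUERY REWRITE
    AS
    SELECT customer_id,
           COUNT(*)         AS order_count,
           SUM(order_total) AS total_revenue
    FROM   fnd_orders
    GROUP  BY customer_id;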

The metadata that OBIEE provides adds two features necessary for Agile delivery. First, we are able to report against complex transactional schemas, but still expose those schemas as simplified dimensional models. This allows us to bypass the complex ETL process, at least initially, so that we can get new subject areas into users’ hands in a single iteration. Second, OBIEE’s capability to map multiple Logical Table Sources (LTSs) to the same logical table makes it easy to modify — or “remap” — the source of our logical tables over time. So, in later iterations, if we decide that it’s necessary to embark upon complex ETL processes to complete user stories, we can do this in the metadata layer without affecting our reports and dashboards, or changing the logical model that report developers are used to seeing.
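
To illustrate what remapping means in practice, consider the physical SQL the BI Server might generate for the same logical request before and after an ETL iteration. Both queries below are purely illustrative, with hypothetical table names:

    -- Iteration 1: the logical fact maps to the transactional 3NF
    -- schema, so the BI Server joins the source tables directly.
    SELECT c.customer_name, SUM(ol.quantity * ol.unit_price)
    FROM   orders o
    JOIN   order_lines ol ON ol.order_id = o.order_id
    JOIN   customers c    ON c.customer_id = o.customer_id
    GROUP  BY c.customer_name;

    -- A later iteration: the same logical table, remapped to a star
    -- schema built by ETL. Reports and the logical model are unchanged.
    SELECT d.customer_name, SUM(f.revenue)
    FROM   fact_sales f
    JOIN   dim_customer d ON d.customer_key = f.customer_key
    GROUP  BY d.customer_name;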

Flow of Data Through the Three-Layer Semantic Model

More to Come…

In the next post, I’ll describe what I call the Model-Driven Iteration, where we use OBIEE against the foundation layer to expose new subject areas in a single iteration. After that, I’ll describe ETL Iterations, where we transform a portion of our model iteratively using ETL tools such as ODI, OWB or Informatica. Finally, I’ll describe what I call Combined Iterations, where both Model-Driven activity and ETL activity are going on at the same time.

Comments

  1. M Says:

    Hi Stewart,
    This next-generation DW architecture is what Bill Inmon calls DW 2.0, which makes for quite a good acronym.

  2. Sachin B Says:

    Hi Stewart,

    Doesn’t the COTS/off-the-shelf approach bypass the foundation layer completely?
    In the current BI Apps scenario, we load the transactional data into the staging (_DS/_FS) tables, then load them into the corresponding facts and dims.
    If this approach has to be applied, wouldn’t targeting the source system directly be appropriate? Because creating a foundation layer data model is somewhat repetitive, since most CRM and ERP systems have audit tables for major entities to keep track of record changes.
    I do however feel that we might experience stuff like lock escalations and performance issues on the transactional system side.

  3. Stewart Bryson Says:

    @Sachin

    Excellent point! You are correct that the COTS stuff, with all the history tracking, does take a bit of the edge off the value of the foundation layer. But as you will see in later posts, it is absolutely pivotal that the transactional schema (whether foundation or COTS-based) exist on the Exadata Database Machine. After all… it’s the “Extreme Performance” angle that allows us to consider reporting against transactional schemas in the first place.

  4. Stewart Bryson Says:

    @M

    Hopefully you don’t think this architecture is TOO Bill Inmon-like. :-)

  5. Dan O'Brien Says:

    Hi Stewart, You talked about this during RM’s Brighton conference last year. IMHO you would be very, very hard pressed to model the BI Apps RPD directly against 3NF schemas, and it would be impossible in some cases (i.e. for HR Analytics, where a persisted workforce snapshot is built in the warehouse). So great in theory, but in practice, highly doubtful (for BI Apps-type applications at least). Cheers, Dan (in New Zealand)

  6. Stewart Bryson Says:

    @Dan:

    This is exactly what Siebel did initially with this product: reported directly against the Siebel CRM system as if it were a star schema. It was only later that Siebel started adding the packaged applications that later became the basis for the BI Applications.

    We’ve used this methodology to report on some very complex source systems, including financials, CRM, billing, etc.

    Also… watch the Oracle BI Apps roadmap. You are likely to start seeing some BI Apps offerings delivered in this way again.

  7. Stewart Bryson Says:

    @Dan:

    I also meant to add: one of the reasons that the BI Apps schemas are so complicated is that they have to be one-size-fits-all, supporting extensibility with flex-fields, etc.

    For designing a custom data warehouse to handle financials, etc., a logical model doesn’t have to be nearly that complex. Though I’ll grant you: chart of accounts and financial reporting is one of the more complicated DW concepts.

    I will talk about the part ETL plays in this methodology later in the series.

    Thanks for reading.

  8. Dan Says:

    @Stewart, thanks for your responses, always a good read.

    I worked last year with a French mobile operator on an OBIEE “Virtual Data Warehouse” POC on top of high-volume 3NF sources. We succeeded in demoing the value of this approach, BUT the RPD modelling was not at all trivial, especially since data may be fragmented across source systems for a single model.

    Complex fragmentation (i.e. combined level-based and data-range-based fragmentation) is limited, and difficult to test, especially on 3NF sources. You cannot have partial fragments at a particular level, it seems. Also, I tried both a normalized physical layer and a de-normalized opaque-view physical layer: the former approach was better suited when you have data fragmentation / multiple sources.

    In any case, I would love to see a real-world “BI Apps level of complexity” RPD on top of multiple source applications directly.

    Agreed that the BI Apps “one size fits all” approach is sometimes pointless, since customers will customize. On the other hand, the BI Apps methodology tries to address many of the Warehousing “gotchas” in a structured way.
