Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration

After laying the groundwork with an introduction, and following up with a high-level description of the required puzzle pieces, it's time to get down to business and describe how Extreme BI works. At Rittman Mead, we are delivering several projects with this methodology right now, and have more in the pipeline.

I'll gradually introduce the different types of generic iterations that we engage in, focusing in this post on what I call the "model-driven" iteration. Our first few iterations are always model-driven. We begin when a user opens a user story requesting new content. For any request for new content, we require that all of the following elements are included in the story:

  1. A narrative about the data they are looking for, and how they want to see it. We are not looking for requirements documents here, but we are looking for the user to give a complete picture of what it is that they need.
  2. An indication of how they report on this content today. In a new data warehouse environment, this would include some sort of report that they currently run against the source system, and in a perfect world, the SQL used to pull that report (see the sketch after this list).
  3. An indication of data sets that are "nice to haves". This might include data that isn't available to them in the current paradigm of the report, or that was simply too complicated to pull in that paradigm. After an initial inspection of these nice-to-haves and the complexity involved in including them in this story, the project manager may decide to pull these elements out and put them in a separate user story. This, of course, depends on the Agile methodology used, and on the individual implementation of that methodology.
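
To make item 2 concrete, here's the kind of source-system query we hope to see attached to the story. The schema and column names below are hypothetical, but the point is that the SQL captures exactly how the user produces their numbers today:

```sql
-- Hypothetical source-system query attached to a user story:
-- "monthly sales totals by customer region", exactly as the
-- user runs it against the transactional schema today.
SELECT cr.region_name,
       TRUNC(so.order_date, 'MM')         AS order_month,
       SUM(sol.quantity * sol.unit_price) AS sales_amount
FROM   sales_orders so
JOIN   sales_order_lines sol ON sol.order_id  = so.order_id
JOIN   customers c           ON c.customer_id = so.customer_id
JOIN   customer_regions cr   ON cr.region_id  = c.region_id
GROUP  BY cr.region_name, TRUNC(so.order_date, 'MM');
```
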
First, we assign the story to an RPD developer, who uses the modeling capabilities of the OBIEE Admin Tool to "discover" the logical dimensional model tucked inside the user story, and develops that logical model in the Business Model and Mapping (BMM) layer. Unlike a "pure" dimensional modeling exercise, where we focus only on user requirements and pay very little attention to source systems, in model-driven development we shift constantly between the source of the data and the question of how best to solve the user story dimensionally. Instead of working directly against the source system, though, we work against the foundation layer in the Oracle Next-Generation Reference Data Warehouse Architecture. We take a top-down approach, first creating empty facts and dimensions in the BMM, and then mapping them to the foundation layer tables in the physical layer.

To take a simple example, we can see how a series of foundation layer tables developed in 3NF could be mapped to a single logical dimension table, our Customer dimension:

Model-Driven Development of Dimension Table

I rearranged the layout from the Admin Tool to provide an "ETL-friendly" view of the mapping. All the way to the right, we can see the logical, dimensional version of our Customer table, and how it maps back to the source tables. This mapping could be quite complicated, involving perhaps dozens of tables. The important thing to keep in mind is that this complexity is hidden not only from the consumer of the reports, but also from the developers. We can generate a similar example of what our Sales fact table would look like:
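
Expressed as SQL rather than Admin Tool metadata, the logical table source behind such a fact might flatten the transactional tables roughly as follows. This is only a sketch, reusing the hypothetical source tables from the user story example above:

```sql
-- A sketch of the flattening a fact logical table source performs:
-- transactional order and order-line tables presented as a single
-- Sales fact at order-line grain, carrying the key that joins to
-- the Customer dimension. Table names are hypothetical.
SELECT so.customer_id                AS customer_key,
       so.order_date                 AS order_date,
       sol.quantity                  AS quantity,
       sol.quantity * sol.unit_price AS sales_amount
FROM   sales_orders so
JOIN   sales_order_lines sol ON sol.order_id = so.order_id;
```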

Another way of making the same point is to look at the complex, transactional model:

We can then compare this to the simplified, dimensional model:

And finally, when we view the subject area during development of an analysis, all we see are facts and dimensions. The front-end developer can be blissfully ignorant that he or she is developing against a complex transactional schema, because all that is visible is the abstracted logical model:
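
The same abstraction shows up in the logical SQL an analysis issues against the BI Server. A hypothetical request might look like the following; note that it references only presentation-layer names, never the underlying transactional tables:

```sql
-- Hypothetical logical SQL issued by an analysis. It references
-- only the presentation layer (subject area, facts, dimensions);
-- the BI Server translates it into physical SQL against the
-- underlying 3NF tables.
SELECT "Customer"."Customer Name",
       "Sales"."Sales Amount"
FROM   "Sales Subject Area";
```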

When mapping the BMM to complex 3NF schemas, the BI Server is very, very smart, and understands how to do more with less. The metadata capabilities of OBIEE are superior to other metadata products, and to a "roll-your-own metadata" approach using database views, for the following reasons:

  1. The generated SQL usually won't involve self-joins, even when the same table exists in both the logical fact table and the logical dimension table.
  2. The BI Server will only include the tables required to satisfy the request, either because they have columns mapped to the attributes being requested, or because they are reference tables needed to bring disparate tables together. Any table not required to satisfy the request will be excluded (see the sketch after this list).
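
As a sketch of that second point: given a Customer dimension whose logical table source maps several 3NF tables (customer, address, region, and so on), a request for just Customer Name and Sales Amount doesn't need most of them, and the physical SQL the BI Server generates might look something like this (hypothetical names again):

```sql
-- Hypothetical physical SQL generated for a request of Customer
-- Name and Sales Amount only. The address, region, and similar
-- tables in the Customer mapping are pruned, because no requested
-- column maps to them and they aren't needed to join the
-- remaining tables.
SELECT c.customer_name,
       SUM(sol.quantity * sol.unit_price) AS sales_amount
FROM   customers c
JOIN   sales_orders so       ON so.customer_id = c.customer_id
JOIN   sales_order_lines sol ON sol.order_id   = so.order_id
GROUP  BY c.customer_name;
```
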
Since the entire user story needs to be closed in a single iteration, the user who opened the story needs to be able to see the actual content. This means that the development of the analysis (or report) and the dashboard is also required to complete the story. It's important to get something in front of the end user immediately, but it doesn't have to be perfect. We should focus on a clear, concise analysis in the first iteration, so it's easy for the end user to verify that the data is correct. In future iterations, we can deliver high-impact, eye-catching dashboards.

Equally important to closing the story is being able to prove that it's complete. In Agile methodologies, this is usually referred to as the "Validation Step" or "Showcase". Since we have already produced the content, it's easy to prove to the user that the story is complete. But suppose we believed we couldn't deliver new content in a single iteration. That would imply an iteration during our project that didn't include actual end-user content. How would we go about validating or showcasing that iteration? How would we showcase a completed ETL mapping, for instance, if we haven't delivered any content to consume it?

What we have at the end of the iteration is a completely abstracted view of our model: a complex, transactional, 3NF schema presented as a star schema. We are able to deliver portions of a subject area at a time, which is important for time-boxed iterations. The Extreme Metadata of OBIEE 11g allows us to hide this complexity in a single iteration, but it's the performance of the Exadata Database Machine that allows us to build real analyses and dashboards and present them to the general user community.

In the next post, we'll examine the ETL Iteration, and explore how we can gradually manifest our logical business model as a physical model over time. As you will see, the ETL iteration is an optional one... it will be absolutely necessary in some environments, and completely superfluous in others.