Over the last year, I've been speaking at conferences on one subject more than any other: Agile Data Warehousing with Exadata and OBIEE. Although I've been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take up. So I'll use the next few blog posts to make my case for what I like to call Extreme BI: an Agile approach to data warehousing using the combination of Extreme Performance and Extreme Metadata.
In a standard data warehouse implementation, whether we fall into the Inmon or Kimball camp, some portion of our data model will be dimensional in nature: a star schema with facts and dimensions. So let me pose a question, which I think will lend itself well to diving into the Extreme BI discussion: why do we build dimensional models? The first reason is simplicity. We want to model our reporting structures in a way that makes sense to the business user. The standard OLTP data model that takes two of the four walls in the conference room to display is just never going to make sense to your average business user. At the end of a logical modeling exercise, I expect the end user to have a look at a completed dimensional model and say: "Yep... that's our business alright". The second reason we build dimensional models is performance. Denormalizing highly complex transactional models into simplified star schemas generally produces tremendous performance gains.
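To make the simplicity argument concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical and chosen for illustration; nothing here is specific to Exadata or OBIEE. The point is the shape of the query: each dimension joins to the fact exactly once, and the aggregate reads almost like the business question itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimensions: descriptive context, one row per member.
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)""")
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, calendar_date TEXT, year INTEGER)""")

# Fact: measures at the grain of one sale, with keys to the dimensions.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    sale_amount REAL)""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(10, "2011-01-15", 2011), (11, "2012-02-20", 2012)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (2, 10, 50.0), (1, 11, 75.0)])

# The characteristic star-schema query: join out to each dimension
# once and aggregate the measure by the descriptive attributes.
rows = cur.execute("""
    SELECT d.year, p.category, SUM(f.sale_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    GROUP BY d.year, p.category
    ORDER BY d.year""").fetchall()
print(rows)  # [(2011, 'Hardware', 150.0), (2012, 'Hardware', 75.0)]
```

Compare that to the chain of joins the equivalent question would require against a normalized OLTP model; the star shape is what lets a business user look at the model and recognize their business in it.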
So my follow-up question: can the combination of Exadata and OBIEE, or Extreme BI, actually change the way we deliver projects? We've all seen the Exadata performance numbers that Oracle publishes, and I can tell you first hand the performance is impressive. Can this Extreme Performance combined with the Extreme Metadata that OBIEE provides give us a more compelling case for delivering data warehouses using Agile methodologies?
To start with, I'd like to paint a picture of what the typical waterfall data warehousing project looks like. The tasks we usually have to complete, in order, are the following:
- User interviews
- Construct requirement documents
- Create logical data model
- SQL prototyping of source transactional models
- Document source-to-target mappings
- ETL development
- Front-end development (analyses and dashboards)
- Performance tuning
Raise your hand if this looks familiar. We would have to go through all these steps, which could take months, before end users could see the fruits of our labor. To mitigate this scenario, organizations will attempt to deliver data warehouses using "Agile" methodologies. What this usually means, in my experience, is a simple repackaging of the same waterfall project plan into "iterations" or "sprints", so that the project can be delivered iteratively. So the process might look like the following:
- Iteration 1: Interviews and user requirements
- Iteration 2: Logical modeling
- Iteration 3: ETL Development
- Iteration 4: Front-end development
But this, ladies and gentlemen, is not Agile. To get an understanding of what lies at the heart of Agile development, we need look no further than the Agile Manifesto, or the history of the Agile Movement. When examining the different methodologies, there is one major theme that permeates all of them: working software delivered iteratively. It's not enough to simply deliver the same old waterfall methodology in "sprints" or "iterations", because at the end of those iterations we don't have any working software... software that end users can actually use to do their jobs better or to make better decisions. In the example above, we still require four iterations before we get any usable content. It doesn't matter if we've written some complex ETL to load a fact table if the end user doesn't have a working dashboard to go along with it.
To apply the Agile Manifesto to data warehouse delivery, the following key elements are required for us to deliver with a true Agile spirit:
- User stories instead of requirements documents: a user asks for particular content through a narrative process, and includes in that story whatever process they currently use to generate that content.
- Time-boxed iterations: iterations always have a standard length, and we choose one or more user stories to complete in that iteration.
- Rework is part of the game: there aren't any missed requirements... only those that haven't been addressed yet.
I've been conscious not to prescribe any distinct Agile methodology, though I can't help using more Scrum-like concepts in this formulation. However, I think this list is generic enough to apply to most methodologies. Over the next few posts, I'll discuss the necessary puzzle pieces to engage in Extreme BI, as well as how we might implement new subject area content in a single iteration. Additionally, I'll discuss how these implementations might be reworked, or "refactored", over several iterations to produce data warehouses that respond to user stories: what users want and when they want it.