Agile Data Warehousing with Exadata and OBIEE: Introduction

December 21st, 2011 by

Over the last year, I’ve been speaking at conferences on one subject more than any other: Agile Data Warehousing with Exadata and OBIEE. Although I’ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take up. So I’ll use the next few blog posts to make my case for what I like to call Extreme BI: an Agile approach to data warehousing using the combination of Extreme Performance and Extreme Metadata.

In a standard data warehouse implementation, whether we are walking in the Inmon or Kimball camps, some portion of our data model will be dimensional in nature: a star schema with facts and dimensions. So let me pose a question, which I think will lend itself well to diving into the Extreme BI discussion: Why do we build dimensional models? The first reason is simplicity. We want to model our reporting structures in a way that makes sense to the business user. The standard OLTP data model that takes two of the four walls in the conference room to display is just never going to make sense to your average business user. At the end of a logical modeling exercise, I expect the end user to have a look at a completed dimensional model and say: “Yep… that’s our business alright”. The second reason we build dimensional models is performance. Denormalizing highly complex transactional models into simplified star schemas generally produces tremendous performance gains.
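To make the simplicity argument a little more concrete, here is a minimal sketch of a star schema and the kind of query it enables. It is only an illustration: the table and column names (sales_fact, product_dim, date_dim and their columns) and the sample rows are hypothetical, and an in-memory SQLite database in Python stands in for whatever platform you actually run.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE product_dim (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE date_dim (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT,
            calendar_year INTEGER
        );
        CREATE TABLE sales_fact (
            product_key INTEGER REFERENCES product_dim (product_key),
            date_key    INTEGER REFERENCES date_dim (date_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    # Made-up sample rows, purely for illustration.
    conn.executemany(
        "INSERT INTO product_dim VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO date_dim VALUES (?, ?, ?)",
        [(20110101, "2011-01-01", 2011), (20111221, "2011-12-21", 2011)],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(1, 20110101, 2, 20.0), (2, 20110101, 1, 15.0),
         (3, 20111221, 4, 40.0), (1, 20111221, 1, 10.0)],
    )


def sales_by_category(conn, year):
    """Total sales amount per product category for one calendar year."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN date_dim d ON d.date_key = f.date_key
        WHERE d.calendar_year = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (year,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_category(conn, 2011))
```

The business question ("sales by category for a year") maps onto one fact table and one join per dimension, which is the shape a business user can recognize. The equivalent query against a fully normalized transactional model would typically need many more joins.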

So my follow-up question: can the combination of Exadata and OBIEE, or Extreme BI, actually change the way we deliver projects? We’ve all seen the Exadata performance numbers that Oracle publishes, and I can tell you first hand the performance is impressive. Can this Extreme Performance combined with the Extreme Metadata that OBIEE provides give us a more compelling case for delivering data warehouses using Agile methodologies?

To start with, I’d like to paint a picture of what the typical waterfall data warehousing project looks like. The tasks we usually have to complete, in order, are the following:

  1. User interviews
  2. Construct requirement documents
  3. Create logical data model
  4. SQL prototyping of source transactional models
  5. Document source-to-target mappings
  6. ETL development
  7. Front-end development (analyses and dashboards)
  8. Performance tuning

Raise your hand if this looks familiar. We have to go through all these steps, which can take months, before end users see the fruits of our labor. To mitigate this scenario, organizations will attempt to deliver data warehouses using “Agile” methodologies. What this usually means, in my experience, is a simple repackaging of the same waterfall project plan into “iterations” or “sprints”, so that the project can be delivered iteratively. So the process might look like the following:

  1. Iteration 1: Interviews and user requirements
  2. Iteration 2: Logical modeling
  3. Iteration 3: ETL Development
  4. Iteration 4: Front-end development

But this, ladies and gentlemen, is not Agile. To get an understanding of what lies at the heart of Agile development, we need to look no further than the Agile Manifesto, or the history of the Agile Movement. When examining the different methodologies, there is one major theme that permeates all of them: working software delivered iteratively. It’s not enough to simply deliver the same old waterfall methodology in “sprints” or “iterations”, because, at the end of those iterations, we don’t have any working software… software that end users can actually use to improve their job or help them make better decisions. In the example above, we still require four iterations before we get any usable content. It doesn’t matter if we’ve written some complex ETL to load a fact table if the end user doesn’t have a working dashboard to go along with it.

To apply the Agile Manifesto to data warehouse delivery, the following key elements are required if we are to deliver in a true Agile spirit:

  1. User stories instead of requirements documents: a user asks for particular content through a narrative process, and includes in that story whatever process they currently use to generate that content.
  2. Time-boxed iterations: iterations always have a standard length, and we choose one or more user stories to complete in that iteration.
  3. Rework is part of the game: there aren’t any missed requirements… only those that haven’t been addressed yet.

I’ve been conscious not to prescribe any distinct Agile methodology, though I can’t help using more Scrum-like concepts in this formulation. However, I think this list is generic enough to apply to most methodologies. Over the next few posts, I’ll discuss the necessary puzzle pieces to engage in Extreme BI, as well as how we might implement new subject area content in a single iteration. Additionally, I’ll discuss how these implementations might be reworked, or “refactored”, over several iterations to produce data warehouses that respond to user stories: what users want and when they want it.

Follow-up Posts

Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces

Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration

Agile Data Warehousing with Exadata and OBIEE: ETL Iteration

Comments

  1. Besher Says:

    Hey Stewart
    It will be great to see how you apply Agile methodology to BI projects in the early phases. I have applied those kinds of methodologies in my projects (like ASD, Scrum, etc.), but frankly I used them after designing the DW (in the dashboard design and report creation phase).
    If you want to use Agile methodologies when designing the DW, I think this will require more effort and a lot of rework, because in most cases the customer doesn’t know exactly what he needs. Imagine you design the DW, perform your ETL, and generate some reports (reports are your deliverables), and then the customer asks you to add something to your star schema. You have to redesign and perform ETL again, and this will affect your schedule, right? And it will not end here: after each iteration he may ask for changes, and you can’t say no because those changes are in the scope of the project.
    We will see how you handle those things in your next blogs :)

  2. Stewart Bryson Says:

    @Besher: Keep reading. You’ll see my approach. :-)

  3. Tim Berry Says:

    I’m currently working on a project where we are delivering a transaction system using OWB and managing the development with Scrum, Agile, and standups.
    It hasn’t worked; it’s a bit of a mess, to be honest.
    This isn’t due to the Agile element but more due to the standard of the staff hiding behind processes.
    These days when I hear managers trumpeting new processes I know they probably don’t know what they are doing.
    I’m sure you’re different considering the organisation you work for, but in general the new buzzword-bingo process just maintains the same management for a further fiasco.
    Tim

  4. Stewart Bryson Says:

    @Tim

    Agile is more than just a methodology, or set of processes… it’s a mindset. The old mindset of CYA (cover your assets) with documentation and finger-pointing when things don’t work has to stop. As technologists, we have to partner with the business and say “these are our user stories” and “these are our solutions”.

    Agile is not right for every organization. When each division or group is still predominately concerned with turf wars, then Agile is the wrong choice.

  5. Julian Human Says:

    Hi,

    Very interesting approach, but isn’t there a danger that too much modelling within the Oracle app layer turns the business rules into a black box, inaccessible to other BI tools or consumers? Also, what happens if I want to bypass the semantic layer and write SQL for an ad hoc query – I need an in-depth understanding of the 3NF schema. Is there another blog entry to cover these off?

    Thanks

  6. Max ROY Says:

    Good discussion, Stewart.

    My take on this is that DW & BI is typically built for large enterprises where there are multiple sources of data into the DW, both internal as well as external.

    So before embarking on Agile iterations, IMHO, there has to be some up-front Architectural/Data Modeling work to define the interactions & dependencies between these many data sources so as to create a coherent, consistent, stable DW DB Schema(s). This can be either Normalized or Dimensional; but without this framework I think the subsequent Agile sprints will flounder and often redo work as there is no Referential Baseline within which to develop the User Dashboards/Reports.

    For small & simple DW’s the Agile teams can probably ‘evolve’ the Architecture/Model as they go along. But for the DW’s I’ve worked on, we required that initial High-Level Design or SAD that the Agile teams could reference.

    Comments welcome.
