Kent Graziano On Agile Methods And Data Warehousing

A paper that caught my eye at the recent ODTUG Desktop Conference was the one by Kent Graziano on Agile Methods and Data Warehousing. You have to be a conference attendee to view the presentations, but I got hold of Kent's email address and dropped him a line, and he was kind enough to send me a review copy of the conference paper. I subsequently noticed that Kent delivered the paper again at the RMOUG Training Days, and you can download the PowerPoint slides that accompanied the presentation (you can also get the paper if you purchase the conference proceedings, which I hear are a bargain). I've now had a look through the paper and presentation, and I thought it worth making a few notes on them.

The presentation abstract reads:

"Most people will agree that data warehousing and business intelligence projects take too long to deliver tangible results. Often by the time a solution is in place, the business needs have changed. With all the talk about Agile development methods and Extreme Programming, the question arises as to how these approaches can be used to deliver data warehouse and business intelligence projects faster. This paper will attempt to look at some of the principles behind the Agile Manifesto and see how they might be applied in the context of a traditional data warehouse project. We will also examine a new approach called the Data Vault to see if that methodology helps. The goal is to determine a method or methods to get a more rapid (2-4 week) delivery of portions of an enterprise data warehouse architecture."

Agile methods are a bit of a hot topic at the moment, and I took a look at them in the context of data warehousing in two articles early last year. My interest at the time was sparked by the promise of a more flexible, responsive and customer-driven approach to data warehousing projects, one that could give us an edge over competitors who were stuck with methodologies that discouraged requirements changes and only delivered benefits right at the end of the project. Those articles drew a couple of astute comments, particularly from Andy Todd and Guy Mortenson, with Guy making the observation:

"for context: my interest is the boundary conditions between, say, XP and datawarehousing.

It seems to me there are two different ways of approaching 'agile' and data warehousing projects. One way is to say 'this agile stuff looks interesting (http://agilemanifesto.org/principles.html), let's see if we can apply it to DW projects', or the other is to say, what was the epiphany that grew into agile and go through the same process to arrive at 'son of agile', or 'brother of agile'? Kent Beck, I have it on good authority, went through a process of 'this isn't working, there has to be a better way. What is the minimum we need to build software? We need source code. We need to know it works. We need design. We need to know what to build.' Thus, we arrive at continuous integration, test first, refactoring, customer on-site (etc). (Thanks to Alan Francis [http://www.twelve71.com/] for the insight).

In the datawarehousing world, I believe we're at a kind of 'this (often) doesn't work, there has to be a better way' stage. Which path is then taken depends on the appetite for people to "embrace change" in the first place and then to either adapt agile or take agile to its extreme and start agile anew."

In the presentation, Kent examines the twelve key principles behind the Agile Manifesto and looks at how they can be applied to data warehousing projects. Looking first at principle number one, "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software", Kent notes that we really need to consider the valuable software as being the ETL interfaces, data structures and reports, and the customers as being end users or "knowledge workers". Otherwise, all we'd be talking about is the process of delivering reports and dashboards to the final end users, ignoring all the work that has to go into putting the infrastructure in place. The second principle, around being able to welcome change, is the one of most interest to me, and Kent proposes that the best way to accommodate this is to build your warehouse in a normalized fashion, and to do so with a code generation tool such as Oracle Designer or Oracle Warehouse Builder.

I won't try to paraphrase the entire presentation, but one other principle that I was keen to get Kent's opinion on is the one around delivering software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale. In the conference paper Kent suggests that deliverables could be defined as:

"1. A fact table for a star schema
2. A dimension table
3. A complete star (fact and all dimensions)
4. One piece of ETL code that populates a fact table
5. A function needed by the ETL code
6. A new report or query"

and then argues that:

"Even if the original intent of Agile did not consider database and ETL type development efforts, can we not apply the principles anyway? The intent of this paper is to propose some new best practices for data warehouse development. So my proposition is that we can apply the concepts and principles of Agile as a means of organizing our work efforts and our teams to be more efficient and deliver something sooner rather than later. Is that not a good thing? Remember that Principle #3 indicates delivery in a couple of weeks to a couple of months, so a two week iteration, while desirable, is not mandatory to be considered Agile. Perhaps we (the Oracle and data warehouse community) need to be broader in our thinking and our interpretation of Agile methods. Why can't we use Agile methods and concepts to deliver a database structure quickly? Do we have to follow the letter of the law to reap benefit from Agile thinking? I think not."

The idea of using agile techniques when building databases is not new: Martin Fowler and Pramod Sadalage published "Evolutionary Database Design" early in 2003, and Scott Ambler has written a whole book on the subject, "Agile Database Techniques". OTN also published an article by Ambler last year, "Agile Development and the Developer/DBA Connection". And if you read any of the books by Ralph Kimball on dimensional data warehousing, what he's actually suggesting is an agile method that delivers a data warehouse in short, incremental steps based around subject areas.

So what are my thoughts? I'm all for it actually, tempered however by the reality that on most projects I get to work on, the client wants to define requirements up front (indeed, has often defined them by the time we get there) and isn't too keen on what they might consider to be an open-ended approach to development, especially if they see it as leading to a price tag (or delivery date) they can't predict in advance. I noticed that in the paper Kent suggests that a normalized data warehouse design lends itself well to agile development, so I dropped him an email to clarify this. What Kent is actually saying is not that dimensional warehouses aren't suited to agile development (indeed, they may be faster to develop than normalized ones); it's just that he feels that with dimensional, star schema warehouses you end up transforming the data multiple times to feed multiple fact tables, and this additional work makes it difficult to be agile. I guess I'd disagree with this, as it's been my experience that well-designed dimensional warehouses don't necessarily have lots of duplicated data, but I'll defer to Kent's experience on this one. I can certainly see why a code generation tool, coupled with repeatable processes and integrated regression testing, would make the process of implementing data model changes much more manageable. All in all, an excellent bit of analysis, and I'll be interested to hear at a later date how he got on with this approach.

The PowerPoint presentation can be downloaded from the RMOUG Training Days site, and if you were a Training Days attendee or you "attended" the Desktop 2005 conference, you should be able to get hold of the paper as well.