Kent Graziano On Agile Methods And Data Warehousing

February 28th, 2005 by Mark Rittman

A paper that caught my eye at the recent ODTUG Desktop Conference was the one by
Kent Graziano on Agile Methods and Data Warehousing. You have to be a conference
attendee to view the presentations, but I got hold of Kent’s email address and
dropped him a line, and he was kind enough to send me a review copy of the
conference paper. I subsequently noticed that Kent delivered the paper again at
the RMOUG Training Days, and you can actually download the PowerPoint slides
that came with the presentation (and can get the paper if you purchase the
conference proceedings, which are a bargain, I hear). I’ve had a look through
the paper and presentation now, and I thought it worth making a few notes on them.

The presentation abstract was stated as:

"Most people will agree that data warehousing and business
intelligence projects take too long to deliver tangible results. Often by
the time a solution is in place, the business needs have changed. With all
the talk about Agile development methods and Extreme Programming, the
question arises as to how these approaches can be used to deliver data
warehouse and business intelligence projects faster. This paper will attempt
to look at some of the principles behind the Agile Manifesto and see how
they might be applied in the context of a traditional data warehouse
project. We will also examine a new approach called the Data Vault to see if
that methodology helps. The goal is to determine a method or methods to get
a more rapid (2-4 week) delivery of portions of an enterprise data warehouse
architecture."

Agile methods are a bit of a hot topic at the moment and I took a look at
them, in the context of data warehousing, early last year in two articles. My
interest at the time was sparked by the promise of a more flexible, responsive
and customer-driven approach to data warehousing projects, as a way of giving us
an edge over competitors who were stuck with methodologies that discouraged
requirements changes and only delivered benefits right at the end of the
project. At the time there were a couple of astute comments, particularly by
Andy Todd and Guy Mortenson, with Guy making the observation:

"for context: my interest is the boundary conditions between, say, XP
and datawarehousing.

It seems to me there are two different ways of approaching ‘agile’ and data
warehousing projects. One way is to say ‘this agile stuff looks interesting
(http://agilemanifesto.org/principles.html), let’s see if we can apply it to
DW projects, or the other is to say, what was the epiphany that grew into
agile and go through the same process to arrive at ‘son of agile’, or
‘brother of agile’? Kent Beck, I have it on good authority, went through a
process of ‘this isn’t working, there has to be a better way. What is the
minimum we need to build software? We need source code. We need to know it
works. We need design. We need to know what to build.’ Thus, we arrive at
continuous integration, test first, refactoring, customer on-site (etc.).
(Thanks to Alan Francis [http://www.twelve71.com/] for the insight).

In the datawarehousing world, I believe we’re at a kind of ‘this (often)
doesn’t work, there has to be a better way’ stage. Which path is then taken
depends on the appetite for people to "embrace change" in the first place
and then to either adapt agile or take agile to its extreme and start agile
anew."

In the presentation, Kent examines the twelve key principles behind the agile
manifesto and looks at how they can be applied to data warehousing projects.
Looking first at principle number one, "Our highest priority is to satisfy the
customer through early and continuous delivery of valuable software", Kent notes
that we really need to consider the valuable software as being the ETL
interfaces, data structures and reports, and customers as being end users or
"knowledge workers"; otherwise all we’d be talking about is the process of
delivering reports and dashboards to the final end users, ignoring all the work
that has to go into putting the infrastructure in place. The second principle,
around being able to welcome change, is the one of most interest to me, and Kent
proposes that the best way of being able to accommodate this is to build your
warehouse in a normalized fashion, and to do so with a code generation tool such
as Oracle Designer or Oracle Warehouse Builder.

I won’t try and paraphrase the entire presentation, but one other principle
that I was keen to get Kent’s opinion on is the one around delivering software
frequently, from a couple of weeks to a couple of months, with a preference to
the shorter timescale. In the conference paper Kent suggests that deliverables
could be defined as:

"1. A fact table for a star schema
2. A dimension table
3. A complete star (fact and all dimensions)
4. One piece of ETL code that populates a fact table
5. A function needed by the ETL code
6. A new report or query"

and then argues that:

"Even if the original intent of Agile did not consider database and
ETL type development efforts, can we not apply the principles anyway? The
intent of this paper is to propose some new best practices for data
warehouse development. So my proposition is that we can apply the concepts
and principles of Agile as a means of organizing our work efforts and our
teams to be more efficient and deliver something sooner rather than later.
Is that not a good thing? Remember that Principle #3 indicates delivery in a
couple of weeks to a couple of months, so a two week interaction, while
desirable, is not mandatory to be considered Agile. Perhaps we (the Oracle
and data warehouse community) need to be broader in our thinking and our
interpretation of Agile methods. Why can’t we use Agile methods and concepts
to deliver a database structure quickly? Do we have to follow the letter of
the law to reap benefit from Agile thinking? I think not."

The idea of using agile techniques when building databases is not new, with
Martin Fowler and Pramod Sadalage publishing "Agile Database Design" early in
2003 and a whole book, "Agile Database Techniques", written on the subject. OTN
also published an article last year by Scott Ambler, the book’s author, on
"Agile Development and the Developer/DBA Connection", and if you read any of the
books by Ralph Kimball on dimensional data warehousing, what he’s actually
suggesting is an agile method that delivers a data warehouse in short,
incremental steps based around subject areas.

So what are my thoughts? I’m all for it actually, tempered however by the
reality that on most projects I get to engage on, the client wants to define
requirements up front (indeed, has often defined them by the time we get there)
and isn’t too keen on what they might consider to be an open-ended approach to
development, especially if they see it as leading to a price tag (or delivery
date) they can’t predict in advance. I noticed that in the paper, Kent suggests
that a normalized data warehouse design lends itself well to agile development,
and so I dropped him an email to clarify this – what Kent is actually saying is
not that dimensional warehouses aren’t suited to agile development (indeed they
may be faster to develop than normalized ones) – it’s just that he feels that
with dimensional, star schema warehouses you end up transforming the data
multiple times to feed multiple fact tables, and this additional work makes it
difficult to be agile. I guess I’d disagree with this, as it’s been my
experience that well designed dimensional warehouses don’t necessarily have lots
of duplicated data, but I’ll defer to Kent’s experience on this one. I can
certainly see why a code generating tool, coupled with
repeatable processes and integrated regression testing, would make the process
of implementing data model changes much more manageable. All in all, an excellent
bit of analysis and I’ll be interested to hear at a later date how he
got on with this approach.
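
To make the idea of small deliverables backed by repeatable regression checks a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not taken from Kent’s paper and has nothing to do with what Oracle Designer or Warehouse Builder would generate; the table names (dim_product, fact_sales), the staged rows and the functions build_star, load_fact_sales and regression_check are all hypothetical, made up purely for illustration. It builds one dimension and one fact table, runs one piece of ETL code, and then checks that the loaded facts reconcile with the source rows.

```python
import sqlite3


def build_star(conn):
    """Create one dimension and one fact table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_code TEXT UNIQUE NOT NULL
        );
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            sale_date   TEXT NOT NULL,
            amount      REAL NOT NULL
        );
        """
    )


def load_fact_sales(conn, staged_rows):
    """One piece of ETL: find or create the dimension key, then insert the fact."""
    for code, sale_date, amount in staged_rows:
        # The UNIQUE constraint makes this a no-op for codes already present.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (product_code) VALUES (?)", (code,)
        )
        (key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE product_code = ?", (code,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales (product_key, sale_date, amount) VALUES (?, ?, ?)",
            (key, sale_date, amount),
        )
    conn.commit()


def regression_check(conn, staged_rows):
    """Re-runnable check: totals reconcile and no fact row lacks a dimension row."""
    expected_total = sum(amount for _, _, amount in staged_rows)
    (loaded_total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_sales"
    ).fetchone()
    (orphans,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM fact_sales f
        LEFT JOIN dim_product d ON d.product_key = f.product_key
        WHERE d.product_key IS NULL
        """
    ).fetchone()
    return abs(loaded_total - expected_total) < 1e-9 and orphans == 0


# Hypothetical staged source rows: (product code, sale date, amount).
STAGED = [
    ("A1", "2005-02-01", 10.0),
    ("B2", "2005-02-01", 5.5),
    ("A1", "2005-02-02", 4.5),
]

conn = sqlite3.connect(":memory:")
build_star(conn)
load_fact_sales(conn, STAGED)
print("regression check passed:", regression_check(conn, STAGED))
```

The point of the sketch is only that each deliverable is small enough to rebuild and re-check from scratch whenever the data model changes; a real warehouse would need far more than a totals and orphan-key check.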

The PowerPoint presentation can be downloaded from the RMOUG Training Days site,
and if you were a Training Days attendee or you "attended" the Desktop 2005
conference, you should be able to get hold of the paper as well.

Comments

  1. Andy Todd Says:

    A great summation Mark, and in the last paragraph but one you’ve nailed what I call “the consultant’s dilemma”.
    Agile methods grew up because requirements change, and thus we have to adapt during the course of the project. But customers (and especially those in charge of controlling costs) want certainty – of budget and delivery date usually. The key challenge for those of us managing projects is to balance these competing requirements.
    And it’s not easy. Sometimes it’s downright impossible. Certainly setting expectations up front makes the project a lot easier, and it’s possible to allay fears about spiralling costs and disappearing delivery dates by pointing out that you will actually be delivering more, sooner. I usually pull the pragmatic programmer’s analogy of ‘tracer bullets’ into these discussions to illustrate the benefits of the approach – mainly that we shouldn’t be going down any dead end paths.
    Oooh, I can feel a post to my blog coming on.

  2. Mark Says:

    Thanks Andy, and I must get round to reading the Pragmatic Programmer – it’s been on my list for a while now but something else has always cropped up. Also, I’ll look out for your blog article.
    Mark

  3. Eric Worthy Says:

    What Kent talks about is what real IT shops have been doing for years. It’s funny how people can write papers about common sense and they become ‘experts’.
