Reasons to review data warehouse health

September 12th, 2007 by Peter Scott

A few weeks ago I was chatting to someone about ‘health checks’ for data warehouses and the sort of reason that data warehouse owners might have for commissioning them. We came up with three fundamental reasons:

  • doubts about the current system
  • fitness for future enhancement
  • an as check on the developers currently building a data warehouse.

In my experience the most common reason to have a health check carried out is that something is perceived to be wrong with the current system.

Sometimes the ETL processes just take too long for the batch window and availability gets squeezed but the usual reason to think something to be amiss is that the users (that is, the guys who paid for the data warehouse) are complaining the system is too slow or the results are wrong.

Systems that were once speedy can become slow with increasing data volumes and numbers of users; what was right-sized five years ago may not be so good now that more people use the system and we have 10 years of history and not 5. Likewise the things that people ask change; we have a customer that used to have only six of their users that needed to access one ‘raw’ data table and then only ran a few simple queries per week against it, now there are 200 users each rolling up 6 months of sales for individual customers hitting the table many times every day, and performance suffers because design compromises where used to balance storage against performance weighted by usage. Older data warehouses are often locked into older design principles. Even if the DBA team keeps on top of things with patching and database versions, the actual code behind the data warehouse may well be locked into the features that existed when the system was originally developed; old 8i data warehouses even when upgraded to 11g may not exploit features such as multi-table inserts, index organised tables and the whizzy partitioning options we have now because nobodyhas  revisited the code that maintains the DW.  Sometimes, of course, the data warehouse was never ever speedy, perhaps because the hardware is physically under specified for the job in hand or the design wastes the potential of the system.

Wrong results or more likely, untrusted results are in some ways more of a challenge to investigate: what basis does the doubt come from, is there scope in the process to omit, or double count data, do our data cleansing routines standup to scrutiny?

Fitness for future enhancement is self-evident. Successful data warehouses have enthused users that want to know more ; a fundamental question is will the more fit?

To be honest, I rarely get invited to review someone else’s data warehouse development whilst it is being built; a sort of “police the developer role”. I get asked to review plans and finished systems, but work in progress seems off limits to many organisations, perhaps it is just too political a topic

Comments

  1. Jon Mead Says:

    In my (humble) experience reviews are often triggered by (1) the ETL process extending beyond the required window or (2) reports taking too long to run. The notable thing here is they are less often triggered by data quality. I guess this could mean two things – first everyone has Data Warehouses with high quality data, or second that as long as users have something in front of them in a timely fashion they are happy :) (there’s always Excel to make those last minute adjustments),

    Jon

  2. Peter Scott Says:

    :-)

    I’ve seen users that have little trust in a DW because they feel data is missing – it’s the self-same users that extract the whole customer dimension to Access so they can do their own thing :-)

    And missing could mean rejected on quality grounds, or even out of scope for the data warehouse

Website Design & Build: tymedia.co.uk