Technical

Time for dates (again)

Peter Scott

Dec 12, 2006 • 1 min read

One of the themes I keep retuning to is the time dimension in Data Warehouses. I use the word "time" in the physics sense of something continuous and not the cyclical "clock" sense. Time (and in my sense, time could be measured in units far more coarse than hours and minutes) is almost ubiquitous in data warehouses; almost, as I have worked with an asset maintenance data warehouse for a water company where events for an asset could be tens of years apart - even then, I concede, time is present as a dimension but it is not one that is widely used in query predicates.

Time is a bit of an oddity amongst the data warehouse dimensions: it is not finite, it just goes on and on, more pertinently, much of the time dimension can be derived programmatically - now try doing that with telephone subscribers. I say, much, as there may well be some business rules in building some of the hierarchies: fiscal years, fiscal periods and even seasons don't always line up exactly the calendar year. The third oddity is that there is actually an inbuilt data type to support DATE - and using it can really help the query optimiser with questions of cardinality. The database just knows that there is one day between November 30 2006 and December 1 2006 - but code that as two strings or numbers and exactly how many values could appear between 20061130 & 20061201?

Although calendar values could be built on the fly, there is much merit of storing it as a denormalised table. It only needs occasional maintenance (perhaps removing some old dates and adding new ones each year) but mostly it will just sit there and be used. When I build table like this I tend to pepper them with really useful attributes such as offsets into quarters and day of month, they come into there own when you do date base aggregations such as month to date against same period last month, especially if you use a OLAP cube.

Sign up for more like this.