Time for dates (again)
December 12th, 2006 by Peter Scott
One of the themes I keep retuning to is the time dimension in Data Warehouses. I use the word “time” in the physics sense of something continuous and not the cyclical “clock” sense. Time (and in my sense, time could be measured in units far more coarse than hours and minutes) is almost ubiquitous in data warehouses; almost, as I have worked with an asset maintenance data warehouse for a water company where events for an asset could be tens of years apart - even then, I concede, time is present as a dimension but it is not one that is widely used in query predicates.
Time is a bit of an oddity amongst the data warehouse dimensions: it is not finite, it just goes on and on, more pertinently, much of the time dimension can be derived programmatically - now try doing that with telephone subscribers. I say, much, as there may well be some business rules in building some of the hierarchies: fiscal years, fiscal periods and even seasons don’t always line up exactly the calendar year. The third oddity is that there is actually an inbuilt data type to support DATE - and using it can really help the query optimiser with questions of cardinality. The database just knows that there is one day between November 30 2006 and December 1 2006 - but code that as two strings or numbers and exactly how many values could appear between 20061130 & 20061201?
Although calendar values could be built on the fly, there is much merit of storing it as a denormalised table. It only needs occasional maintenance (perhaps removing some old dates and adding new ones each year) but mostly it will just sit there and be used. When I build table like this I tend to pepper them with really useful attributes such as offsets into quarters and day of month, they come into there own when you do date base aggregations such as month to date against same period last month, especially if you use a OLAP cube.

December 12th, 2006 at 10:15 pm
When I build table like this I tend to pepper them with really useful attributes …
A couple of manufacturing companies I worked for (both food, strangely enough) worked on 13 month calendars, too, so a seperate table can help that contains information about which manufacturing month and which calendar month a day is in.
December 12th, 2006 at 11:55 pm
“The third oddity is that there is actually an inbuilt data type to support DATE - and using it can really help the query optimiser with questions of cardinality. The database just knows that there is one day between November 30 2006 and December 1 2006″
I’m not a great fan of this optimizer “feature”, quite frankly. The DATE datatype is totally inappropriate for the optimizer to make “day-based” extrapolations as far as cardinality goes. Fact is: there are 86400 secs in a day and anywhere in there it may show with a skew in a DATE column. To assume that such a column shows anything about calendar day cardinality is IMHO akin to assuming that the highest date will be 3999/Dec/31. Ie, it’s just dead wrong. Although *sometimes*, indeed, useful.
But the CBO folks reckon that’s not the case, so there…
Cool idea, the additional denormalisation for quarters, accounting periods, offsets. I’ll definitely use that at some stage, I reckon.
While I’m here: have you found much use for the TIMESTAMP datatype? I’ve been thinking about that one for a while and can’t for the life of me figure where it’d indeed be useful. Maybe in the DW sphere, somewhere? Dunno, beats me.
December 13th, 2006 at 12:40 am
While I’m here: have you found much use for the TIMESTAMP datatype?
(With tongue firmly lodged in cheek)
Well, it means you can use a TIMESTAMP for the insert_time in your Primary Key to make it ‘unique’, instead of a DATE. That way you don’t run into occasional errors with duplicate primary key values when inserting rows too quickly in a bulk load process.
I’ll get my coat.
December 13th, 2006 at 6:17 am
you mean sequences are “dead”?

December 13th, 2006 at 7:50 am
Noons - my world is a bit (lot) slower than yours - whole days are typical granularities for many of our DW systems. Where I wandered into the realms of date + time I admit that DATE may not always be a good choice (especially if you bitmap index it!)
I sometimes split date into a DAY reference key and the clock time as a fact measure but again that does not suit every case
Never used timestamps - they are big and don’t help my customers find the data they are looking for
December 13th, 2006 at 12:21 pm
I’ve used timestamps for a heavily used debug log table, but only so that the order of events could be pinpointed a bit more accurately than 1 second, although in the end I found that a pk based on a sequence did just as well.
I tried adding a unique constraint to a timestamp, but under load testing (i.e. 1000’s of concurrent transactions) we got the occasional unique constraint violation (and this is on a Unix box that provides timestamps accurate to 0.000001 of a second).
December 14th, 2006 at 4:40 pm
Pete-I agree that for most BI Reports that sit on a DW timestamp(atleast for the ones we have here) and date’s dont have much value as opposed to rolling them up to quarter’s/year/by month etc.
Siebel Analytics - does not support timestamp datatype - you will have to cast it to use it in the BI Tool. I am not too sure about other BI tools.
December 15th, 2006 at 3:20 pm
Pete,
The time dimension is a problem in OLAP in a number of ways:
1) It is actually a continuous variable, but OLAP’s maths relies only on discrete values.
2) You can have time as a dimension or as a measure and apply min and max to it - but you can’t do much using both these concepts together.
3) It goes on and on, but the classic OLAP operators assume a finite number of discrete members on a dimension.
…
Need a change in the mathematical modelling of the cube in my opinion. don’t know what to replace it with, but definitely need something else than a simple discrete hierarchy for time at some stage.