Dimensions on the side

March 6th, 2007 by Peter Scott

It is quite a while since I blogged the mini-series on data warehouse design (it started here) and was proceeded by a series with the grand title of DW Wisdom - where I strayed into talking about parallel queries, RAID5 and massive disks. But I did not really write much about dimensions. So to make up for it:

Thinking in dimensional terms is (fortunately) an easy thing for BI users to do; for example looking at sales data there is typically a “What” was sold, a “Where” was sold, a “When” and a often a to “Who”. Knowing the key values for these four Ws will enable a single transaction to be found, but it also gives the prospect of being able to aggregate on one or more of the dimensions to provide summary information that is useful to the business. Often these dimensions have a hierarchical structure - a store is on a street, in a city, in a state, in a region, in a country in a continent (and for Star Wars fans and others with special needs) on a planet, in a solar system, in a galaxy…

Sometimes there are more than one hierarchy within a dimension - date for example could have both a fiscal hierarchy and a calendar year hierarchy, sharing a common point of ‘DAY’. Other dimensions could have alternate navigation paths that re-meet at the total level - for example a store can be part of hierarchy that goes store -> state -> region -> all stores and also in a store -> store size -> all stores hierarchy.

But real data warehouse often have more than four dimensions. I’ll write about that next time (after I have thought up some more “W” words to name them)

Comments

  1. APC Says:

    >> after I have thought up some more “W” words to name them

    Why? And, er, hoW?

    Cheers, APC

  2. Peter Scott Says:

    Arfur C - - I knew that someone would write that… and I also knew it was likely to be you ;-)
    My wife tells me that “how” used to be ‘”whow” but then she has studied such things

  3. vidya Says:

    Pete,

    Having mentioned dimensions. just something that came to my mind and was curious to see what most in the Industry follow as well. For Eg: a Time Dimension I tend to snowflake it to month day and time.
    but I know a few just have multiple dimensions for year/month/day/ time.
    Is there a reason one would be better than the other?

  4. Peter Scott Says:

    Time is always the odd dimension. A lot of what you decide to do will come from the granularity of the information being stored. For example if you can get by on truncating the date/time value to the day and not bother with the time component then all well and good; it can certainly simplify things; you could build a date (aka time) dimension with hierarchies to navigate from day to month or quarter and you have the choice of presenting it as a denormalized dimension table or as a snowflake, your choice really comes down to the query tools you use and the database technology in play.
    But add in time and we can no longer (sensibly) have a conformed dimension (we would have to include all of the possible time values in our dimension table). Another question to ask is what to people want to establish with the use of time - if we are comparing 05:30 UTC over a set of days with the 19:37 UTC over the same dates there might be something to be said having clock time as a separate dimension to calendar date