Degenerate Dimensions Defined

April 21st, 2004 by Mark Rittman

Much of my work involves designing and reviewing dimensional data models, and
an interesting issue that often comes up is how to deal with data items such as
invoice number, order number and so on, that are not strictly facts - you’re not
going to want to add them up, or average them, or perform any other maths on
them - but they don’t seem to fit into existing dimensions.

Ralph Kimball coined the term
‘Degenerate
Dimensions’
for these data items, as they perform much the same function as
dimensions: they sit in the fact table and allow you to limit down or ’slice and
dice’ your fact table measures, but they aren’t foreign key links through to
dimension tables, as all the information you want - the invoice number, or the
order number - is contained in the degenerate dimension column itself.
Degenerate dimensions are useful as they tie the transactions, or events, in the
fact table back to real-life items - invoices, orders and so on - and they can
be a quick way to group together similar transactions for further analysis.

The key here is not to go overboard and make these degenerate dimensions into
full dimension tables - for example, an Invoice dimensions - as in all
likelihood this dimension table will grow at the same rate as your fact table.
If there is other interesting information to go with the invoice - for example,
who the customer was, what products were ordered - this is better placed in
specific dimensions for customers and products where it can be stored as a kind
of ‘master copy’, rather than storing it alongside each order in a balooning
Invoice dimension.

The other advantage with degenerate dimensions is that they’re a lot easier
to build and maintain when using ETL tools such as Oracle Warehouse Builder, as you don’t have to create dimension lookup tables, create
synthetic keys, sequences and so on. Indeed, if you’re loading your dimensional model into
a multidimensional database such as Oracle OLAP, your database will be much
smaller in size and easier to handle if you can keep the number of formal
dimensions to a minimum, as they tend to ‘explode’ in size the more dimensions
you add to the database.

Judicious use of degenerate dimensions keeps your dimensional model rational
and your database size reasonable, whilst allowing you to keep useful items in
the fact table that help us tie the data warehouse back to the original source
systems.

Comments

  1. John Emmer Says:

    Mark,
    I was wondering if in your work reviewing dimensional data models you have found any good resources or patterns for dealing with Degenerate Measures (fun name — actually embedded totals). We are designing a datamart product now, and our requirements group says that we will need to support ‘non-additive measures’. I can’t get a good example from them, but they claim there are some measures where there’s no way to derive the monthly number (or the ‘regional’ number) from the daily (or ‘group’) numbers. This makes it much harder (or impossible) to use materialized views for query rewrite, and we’re struggling with just what to do with this requirement if we can’t make it go away. Any suggestions or pointers to resources?
    Thanks!

  2. John Emmer Says:

    To respond to my own question, the best information I’ve found so far was a couple of articles linked on http://www.ralphkimball.com, which itself I found thanks to Mark’s discussion here. On this page:
    http://www.ralphkimball.com/html/articlesArchitecture/articlesAggregates.html
    you’ll find some articles on “Real-Life” datamart issues, and “Assymetric Aggregations”, both of which deal with this issue. This was particularly nice, as we are building a ‘key performance indicators’ application, and this is precisely where the author says these kinds of requirements often show up. My initial reaction to the requirements was something like “but that’s not a measure — measures are by definition additive numeric facts!” It’s nice to know others have faced the same issues!

  3. Mark Says:

    Hi John
    Sorry I didn’t get back to you earlier. I’ve been working away up in London for the last two weeks, and your question needed a bit more thought than the usual type.
    What I was going to suggest was Ralph Kimball’s site (as you found out), plus his book, “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling”, which has dimensional modelling case studies for different industries.
    http://www.amazon.co.uk/exec/obidos/ASIN/B000083KQI/qid=1083179888/sr=1-4/ref=sr_1_11_4/026-4880710-1467620
    Regards
    Mark

  4. John Emmer Says:

    Thanks, Mark. I borrowed that book from a co-worker, and it did have some pertitent examples. Previously, I had only been looking at Kimball’s ‘Lifecycle Toolkit’ book, but the ‘plain’ Toolkit book has some more specifics that are nice to see.