“What are the advantages of
"What are the advantages of star schemas over other design patterns? From a performance point of view, why are star schemas is better than snowflakes?" asks a recent poster on OTN.
Star vs. Snowflake schema is a classic question asked when putting together a dimensional datawarehouse. Basically, snowflaking is the process of removing low cardinality attributes from a dimension table, and placing them in a separate dimension table connected by a snowflake key. The aim of this is to reduce the total amount of storage needed for a dimension, by removing 'redundant' data.
This approach is often suggested by traditional database designers when building a data warehouse, as tradition tells you to normalise your data model to remove all redundant information. However, by splitting dimension information across several tables, you increase the amount of joins needed to perform a query, complicate the data model in the users' eyes, and often decrease query performance.
These days, disk space is cheap, and there isn't the imperitive that there once was to keep disk usage to a minimum. The main point behind your data warehouse design is to make query response as fast as possible, removing any unneccessary joins, and it should be a rare occurance indeed where it's worth the effort splitting out your dimensions into snowflakes to reduce disk storage.
Given the difference in size between a typical fact table, and the dimensions surrounding it, fact tables are where you should concentrate your effort to improve performance and reduce storage, given that most fact tables comprise 80% to 90% of your data warehouse.
Given that, there are a few situations where it's worth snowflaking;
- Very large customer dimensions (think telcos)
- Financial Product Dimensions (where each product has a host of special attributes not shared by other products)
- MultiEnterprise Calender Dimensions (because each organization has idiosyncratic fiscal periods, seasons, and holidays)
This situation is outlined in an article by Ralph Kimball called "A Trio Of Interesting Snowflakes".
OWB currently only allows true star schemas to be defined, with all of a dimension's attributes held in the one table. However, if you are happy to define the dimensions yourself, the CREATE_DIMENSION statement allows you to select individual levels, and their attributes, from multiple tables. Mark Van De Wiel also confirmed in a previous posting on OTN that future versions of OWB would allow snowflakes to be defined.