January 20th, 2010 by Venkatakrishnan J
Degenerate dimensions are quite common when we do a Data Warehouse implementation. They are nothing but a set of dimensions that can be classified as Fact attributes and are hence maintained in the fact table itself instead of separate dimension tables. One of the key advantages of using Degenerate dimensions is the fact that there is no need for doing a separate Surrogate lookup while creating the ETL mappings. Generally degenerate dimensions do not alter the grain of a fact table and hence are maintained within the Fact table itself. In some cases, they act as a reference between the data warehouse and the transactional system.
For example, Order Numbers, Invoice Number, Transaction Dates, Sale Dates, Credit-Debit Indicator etc can all be classified as degenerate dimensions. The interesting aspect of degenerate dimensions is, they can get a little bit tricky while modeling in BI EE. Let me take this through an example. Lets assume that we have 2 fact tables SALES and COST (the SH schema in Oracle). Sales fact table has a degenerate dimension called SALE_ID(which acts as a reference to a SALE made in the transactional system). How do we model this? I will cover the method that is normally used and describe the pros & cons of this approach. After that i will cover 2 separate methods that can prove useful while modeling degenerate dimensions.
The traditional method of modeling degenerate dimensions is to include them in the Logical Fact table itself as shown below without any aggregation.
From my experience modeling degenerate dimensions this way can be useful only if you have the following requirements
1. Have just a single fact table and all your reports go against only those measures in that single fact table.
2. Requirement to create logical calculations (not db calculations) out of these degenerate dimensions. For example, FILTER calculations on a measure require the logical column to be present.
But even when you have the requirements listed above, care must be taken while exposing this attribute to the presentation layer. One of the major drawbacks of the above technique is the fact that when we model it this way, BI Server cannot enforce conforming dimension joins.
Non-Conforming Degenerate Dimensions:
For example, lets create a report as shown below.
As you see, when we bring in the degenerate dimension in to the report, the UNIT_COST measure which is from the other fact table becomes NULL. One of the main reasons why this happens is, SALE_ID does not have an aggregation associated with it. For conforming dimension based joins to happen, we need dimensional attributes that can be pushed inside the GROUP BY clause of a SQL query. When we do not have an aggregate specified for a column belonging to a fact table, BI Server does not know where to push this column in the SQL query. Hence it basically nulls out the UNIT_COST measure thereby negating the conforming dimension based join. In this case, degenerate dimension SALE_ID does not conform to the COST fact. How do we resolve this?
To resolve this, there are some fundamental questions that needs to be answered. The basic question will be, when we expose degenerate dimensions to end users, we need to know what needs to happen to the measures from other fact tables. In the above example, UNIT_COST has no relation whatsoever with the degenerate dimension. In hindsight, the null in the above report seems correct as it does not have relation to the SALE_ID. But since we have modeled it as a fact attribute, we would expect the UNIT_COST measure to be calculated for each channel and then repeat itself (like a Level Based measure) for every SALE_ID as shown below.
This is done by modeling the degenerate dimension as a separate dimension in BI EE as shown below. Remember, from a BI EE perspective one physical table can act as dimension or fact or both depending on how you model it. The model is shown below
The new dimension Sale that we modeled is still not related to the COST measures. To enable that we need to assign the COST measures to the total level of the Sale dimension.
This will enable reporting on degenerate dimensions even on non-conforming facts. This technique will even more be useful when you have the same degenerate dimension in 2 different fact tables. Lets look at that next.
Conforming Degenerate Dimensions:
Lets assume that the fact tables FACT_COST and FACT_SALES have a degenerate dimension called BOUGHT_BY & PAID_BY respectively. Both these attributes can have only 2 values (Credit-Card or Cash). If we model them as degenerate dimensions using the traditional method, the biggest issue we will face will be the inability to use them for reporting across the fact tables. If we model it as a dimension, we can effectively use a single column for 2 differing perspectives depending on how we look at the data.
We start with basically creating a new dimension in the BMM called Payment as shown below.
The payment logical column will be mapped to the BOUGHT_BY degenerate dimension column of the COSTS table and PAID_BY degenerate dimension column of the SALES table.
In the reporting layer, one can use this single attribute to report on 2 different fact tables even though the attribute maps to 2 different columns in the physical source.
Treating degenerate dimensions as normal dimensions in BI EE can prove more beneficial than using it as a fact attribute. Especially when you have users using BI EE as an ad-hoc analysis tool(when you do not want your model to generate wrong numbers/errors).