Status changes over time
June 21st, 2007 by Peter Scott
Curiously I am working on two very similar data architecture projects at the moment, both small scale data warehouses, and both in similar subject domains. Most of the data warehouses I get involved in the are effectively a series of snapshots of a set of measures for discrete time slices be it average stock holding in a week, sales per day, calls per month. These two, however, are more aligned to track events that happen to an item over time.
Both of the current projects are based on CRM processes that monitor the progression of customer queries from first contact to resolution. This type of requirement is amazingly common and can occur in any business sector where a case is raised and tracked, common examples include support desks, insurance claims and customer complaints systems; closely similar systems exist in engineering where (say) maintenance activities on aircraft components are tracked over time.
Suppose, the system was for my burger bar owning customer, and that they had 100 odd outlets. Somewhere in their corporate office there would be a team that handled complaints from customers that broke teeth on bones in the meat, burnt their mouths on hot food or fell on wet floors. The complaints staff would log the problem in there complaints management system and periodically up date the complaint as it progresses to resolution. On the way the status of the complaint will change through various states such as logged, under investigation, settled (and perhaps a myriad of other steps on the way). But for reporting we want to know things such as complaints logged per day per outlet, number of open cases at month end or total compensation paid per quarter.
Dimensionally modeling such things can get a little tricky - certain things such as outlet and date drop out quite easily in my burger chain, maybe we can also use server if we are trying to spot the accident prone or perhaps customer if we are looking for fraudsters. But how do we handle the ‘case’? A single case has a lifespan, before it is first logged it does not exist in the data warehouse, after it is resolved it probably won’t reappear (this may not be the case for IT support calls
) We could create a case dimension and have case status as a measure, but this does not lead to clean looking queries as we will almost invariably use a predicate on the fact measure. Or we could make case status a dimension, or make…
The more I think about this the more I am tending to favour measureless fact tables, that is just a skinny set of dimension keys (including case status) and key to join to the detail of the case for those whose interest is more in mining than reporting.
April 24th, 2008 at 9:34 pm
This fact is called fact less fact. Case Number is; degenerated dimension, statusSeq will give the latest status and date is, when the status changed. The lowest grain is; Case Number,StatusSeq,DateSeq,ShopSeq. Counter is helpful to count for given status.
CaseFact
Case Number
StatusSeq
DateSeq
ShopSeq
Counter (Default = 1)
Dimension tables are:
StatusDIM
ShopDim
DataDim
April 25th, 2008 at 10:06 am
Thanks for the comment Farooq - this is roughly the same as implemented. If I had a fixed set of activities I would be tempted to model this as an ‘updateable’ fact table with a separate date dimension for each of the milestone activities - say call logged date, call response date, close date. We can then simply track changes in status and measure response times. Kimball uses this technique in his Data Warehouse Toolkit book.
April 25th, 2008 at 12:49 pm
Interesting! Could you please elaborate “separate date dimension” concept?
This site will give you good stuff for design challenges (just ignore it if you know it already)
http://www.stevehoberman.com/challenges.htm
Thanks
April 25th, 2008 at 7:31 pm
@Farooq - if we have a “fact” that has dates such as ‘open date’, ‘response date’ and ‘close date’ we should try to expose each of these dimensions as views over the same base date dimension - some query tools get very upset if you try joining to the same physical date table to more than one fact foreign key.
April 26th, 2008 at 12:24 am
Thank you! Peter. That is a good logic.
One more question for different scenario: If we need hours and minutes for fact table than what is the best way? Like: for call center, how many calls resolved by specialist per hour? What is peak time for receiving calls? Expected dates are: CaseReceived dateTime, caseComplete dateTime, stausChange dateTime. If we use dateTime 3 times in fact as fact value, its going to be huge?
Assumbtions:
The lowest grain for date dimension is; one record per day.
One million calls per month.