Like buses in London, you wait ages for one to arrive and then three come along at the same time. I started writing this piece a while back, and since then Beth has written on data quality, Jon Mead touched on conformed dimensions in a follow-up to his posting on dimensions, and Andy Whitehurst of Xansa spoke about Master Data Management at yesterday's UKOUG BIRT SIG meeting in London.
What I was going to write touches on all three topics but is hopefully far enough removed from them not to be seen as mere band-wagon hopping.
Quality of data has always been key to successful data warehouses. In database terms this could be related to traditional constraints: unique, to avoid duplicates; foreign key, to ensure that each child has a parent; not null, to make sure mandatory attributes are present. So far, so understandable.

But there is another form of duplication we need to avoid in data warehouses - the "same item, different name" duplicate. By this I don't mean where a product is simply renamed (such as when Marathon bars became Snickers in the UK), but where you expand a data warehouse to take in data from other functional groups and find that part of the business uses completely different terms for its reference data, or the case of mergers and acquisitions where the two companies effectively referred to identical items in different ways, perhaps with differing hierarchies. The challenge here is to create a common data model that allows the business to understand the data but also allows the proper amount of validation to ensure that the facts in the data warehouse add up.
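To make that last point concrete, here is a minimal sketch of one common approach: a cross-reference table that maps each source system's code for an item onto a single conformed key. Everything in it - the system names, product codes, keys, and the `conform` helper - is invented for illustration, not taken from any particular tool:

```python
# Two source systems refer to the same real-world product under
# different codes and names (e.g. after a merger).
source_a = {"PRD-001": "Widget, Large"}
source_b = {"W-LG": "Large Widget"}

# A maintained cross-reference maps every (system, code) pair to one
# conformed surrogate key, so facts from both systems roll up together.
xref = {
    ("A", "PRD-001"): 100,
    ("B", "W-LG"): 100,  # same item, different name
}

def conform(system: str, code: str) -> int:
    """Return the conformed key for a source code, failing loudly on
    unmapped codes rather than letting an unmatched fact slip through."""
    try:
        return xref[(system, code)]
    except KeyError:
        raise ValueError(f"unmapped reference data: {system}/{code}")

# Both source codes now resolve to the same conformed product.
assert conform("A", "PRD-001") == conform("B", "W-LG") == 100
```

The important design choice is that an unknown code raises an error instead of passing through: the validation is what stops facts from two halves of the business silently failing to add up.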