DW Design (part 1)

“Which comes first, the physical or logical design for an Oracle data warehouse?” has always been a question that I have struggled to come up with a convincing answer. To my thinking they both develop simultaneously. True, when designing a data warehouse there will be a set of design criteria that define the bounds of the project. Typically initial sizing parameters will be known, we know how many years of history we need to access, we know how many items of reference data we need hold and its projected growth rate, we also know from whence our source data comes and whether it is in our design remit to cleanse it. We may have knowledge of the division of time between data load, user access and the essential routine maintenance tasks such as backups. There will be many unknowns (such as summary sizing) but we have probably enough to start an iterative initial design.

For a long while I have favoured a three section data warehouse design: a staging area where raw fact and reference data is validated for referential integrity, a third-normal form layer to hold the reference data and historical fact, and finally a presentation layer to hold denormalised reference data and aggregated fact. The staging layer is ‘private’ to the data warehouse but user query access (subject to business security rules) to other layers is permitted. In some cases it will not be possible to use a denormalised layer; but if you can use one, you should.

In the next part I will look at staging in more detail.