Summaries

May 20th, 2006 by Peter Scott

David Aldridge, aka the OracleSponge has recently blogged on the three ‘P’s of Data Warehousing. David commented that the “pseudo P”, summarisation, was perhaps the most important. Well, I must say that I agree with him there.

In whatever form, be they conventional tables or materialized views, summaries can dramatically boost performance for user queries. True they come at the cost of additional storage and the time it takes to refresh the summary as part of the batch, but they do do two things: they move a potentially expensive (in terms of IO resource and computation) query out of the on-line day, and they allow it to happen just once.

The choice of summaries is sometimes a bit of an art, especially with data warehouses with multiple dimensions; sales by item is relatively easy to aggregate, but what when you also take in to account store, customer, and supplier then which combinations of aggregates are needed? One of our data warehouses has 9 dimensions on the sales table, and many of those have more than 3 levels in the dimension hierarchy. Obviously, some combinations are never queried (or at least not by sane analysts) but other combinations will be very frequently used.

One significant advantage of Oracle Materialized Views and, in 10g, rewrite equivalence is that we do not need to map these summary tables into the user query tool; query rewrite will take a query against the base fact table and look for the best-fit aggregation that either fully or partially answers the query. This ability to use tables at an intermediate level of aggregation can help us reduce the number of aggregates we need to create.

Next post I will talk about GROUP BY ROLLUP and sub-partitioning.

Comments

  1. David Aldridge Says:

    >> One of our data warehouses has 9 dimensions on the sales table …

    Pshaw … you don’t know you’re born, young Peter. I just counted forty-one dimensions on a financial transactions fact table, plus two degenerate dimensions (thus neatly straddling the magic number).

    Fortunately no-one seems to know what some of them mean, so we discard those from consideration for summarisation. Our most detailed summary table against that table has fifteen dimensions — more than enough to cover most situations.

    A methodology that I’ve settled on for identifying situations amenable to summary is to look at report durations and frequency — multiply them together to get an approximate total resource usage and cherry-pick the top ones as the prime candidates.

    Add in to those any infrequently run reports that take excessive time to execute, and reports aimed at senior managers who have the power to can the whole project, and Bob’s your uncle.

Website Design & Build: tymedia.co.uk