Materialized View and Partitioning "Gotchas", Part II

June 11th, 2009 by Stewart Bryson

If you follow the blog, then you know that Mark recently posted about materialized views for aggregation, located here.

One of the issues he encountered was how to refresh materialized views based on tables loaded with partition exchanges… a common approach to loading fact tables while minimizing the downtime for the reporting schema. Once the appropriate indexes are built on a staging table, that table can then become part of the partitioned table with little more than a dictionary update on Oracle’s part… which is a near instantaneous process.

Mark attempted to refresh the materialized view with two different approaches. First, he demonstrated a “FAST” refresh… which uses materialized view logs to maintain recent updates to the table. But this seemed to cause the following error:

“ORA-32313: REFRESH FAST of “SH”.”PRODUCT_SALES_MV” unsupported after PMOPs”.

PMOP stands for “partition maintenance operation”, and includes the standard fare of partition-based DDL operations: EXCHANGES, SPLITS, TRUNCATES, DROPS, etc. Mark’s deduction: a fast refresh cannot occur using materialized view logs on a table that just accepted a partition exchange. If you think about it… this makes perfect sense. How is a materialized view log, which stores incremental data via a trigger on the base table, supposed to handle a whole new partition of data that is brought in through a DDL operation? The short answer is… it can’t.

So in his usual fashion, Mark presented the preferred alternative. He shows us how one might refresh the aggregate using Partition Change Tracking (PCT). This is a really cool feature that uses the inherent metadata that Oracle stores when the base table or tables are partitioned. The database knows which partitions contain modified, or “new” data, and how the rows in the materialized view map to that underlying table. Refreshing the materialized view is then easy… just refresh what needs refreshing. Simple enough.

So we have the complete picture about which method to use and when, right? Well… maybe not. In reworking the original examples for a new article he’s writing, Mark encountered something rather unusual: he was unable to recreate the ORA-32313 that he received in the original test case. He conferred with me and Peter Scott, a materialized view expert working with us at Rittman Mead, to shed light on what now seemed like a flawed test case. No matter which database version we tried, or which combination of controls we mashed together, none of us could recreate the original error. Personally, I tried all the PMOP’s: EXCHANGES, TRUNCATES, SPLITS, DROPS, etc. I also tried all the different permutations of the EXCHANGE command: WITHOUT VALIDATION, INCLUDING INDEXES… everything. I tried exchanging full partitions and empty ones. No matter what… I couldn’t generate the original error:

SQL> alter table sales exchange partition SALES_Q1_1998 with table sales_stg;

Table altered.

Elapsed: 00:00:01.19
SQL> EXEC DBMS_MVIEW.REFRESH('product_sales_mv', 'F');

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.20
SQL>

But then it occurred to me… what if PCT was playing a part in this? As Mark demonstrated in the original posting… the main drawback to using PCT in a standard star schema is the inability to maintain dimensional attributes in the stored aggregate. In other words… whenever the materialized view contains attributes derived from non-partitioned dimension tables… and those dimension tables themselves have been modified since the last refresh… PCT can no longer use partition metadata to keep track of which rows have changed. The actual error the database gives us is rather non-descriptive:

"ORA-12052: cannot fast refresh materialized view SH.PRODUCT_SALES_MV".

Oh… thanks for that Oracle. But Mark works through what’s needed to reconcile this error. Instead, we should maintain only fact table attributes in the aggregate… and then Oracle doesn’t care whether the dimension tables have been updated or not. The database will simply do a “join back” to the dimension tables during the Query Rewrite phase, while still avoiding a join to the largest table at play: the fact table.

So what part was PCT playing in the FAST refresh portion of the original test case… the one that wasn’t considering PCT at all? Well, as it turns out (and as the documentation points out), a FAST refresh does not necessarily utilize the materialized view logs, as we initially expected. Actually, in cases where there have been PMOP’s on one of the base tables, a FAST refresh will actually use PCT if available. So when Mark, Peter and myself were all trying to generate the PMOP error, we were unable to because the database was fooling us by using PCT to refresh the aggregate. It wasn’t until I tried the FAST refresh after modifying data in the PRODUCTS dimension table that I finally generated the error:

SQL> alter table sales exchange partition SALES_Q1_1998 with table sales_stg;

Table altered.

Elapsed: 00:00:00.16
SQL>
SQL> insert into products values
  2         ( 10001,
  3           'test product',
  4           'test product',
  5           'Test Subcategory',
  6           10001,
  7           'Test Subcategory',
  8           'Test Category',
  9           10001,
 10           'Test Category',
 11           1,
 12           'U',
 13           'P',
 14           1,
 15           'STATUS',
 16           1.99,
 17           1.99,
 18           'TOTAL',
 19           1,
 20           NULL,
 21           SYSDATE,
 22           NULL,
 23           'A');

1 row created.

Elapsed: 00:00:00.00
SQL> EXEC DBMS_MVIEW.REFRESH('product_sales_mv', 'F');
BEGIN DBMS_MVIEW.REFRESH('product_sales_mv', 'F'); END;

*
ERROR at line 1:
ORA-32313: REFRESH FAST of "SH"."PRODUCT_SALES_MV" unsupported after PMOPs
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2537
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2743
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2712
ORA-06512: at line 1

Elapsed: 00:00:00.09
SQL> 

So what’s the moral of this story? I guess it depends on your requirements. You might be able to sum it up by saying “The database is really fond of using PCT”… and perhaps the database is right this time.

Comments

  1. David Aldridge Says:

    That’s a nice write-up, Stewart. I think it is a great illustration of the principle of “Just because something is logically possible, that doesn’t mean that it is implemented”.

    I’ve been very cautious about implementing materialised views against joined tables, mostly because of the complexity of refreshing MVs when there are multiple logging methods used for the changes that need to be applied. By this I mean that there are, to my knowledge, three mechanisms for the MV refresh process to determine what changes need to be applied.

    i) MV logs, for conventional DML (mostly for dimensional maintenance)
    ii) Partition tracking, for partition-based DDL (fact tables, mostly)
    iii) ALL_SUMDELTA, for direct path inserts (mostly fact tables)

    With an MV based on a join of a partitioned fact table to a dimension table you might load most of your new facts with a partition exchange, add a few straggling transactions with direct path inserts into old partitions, and maintain multiple types of SCD in the dimension table using updates and conventional path inserts. If you want to fast-refresh that MV there are all three logging mechanisms in play, and I’d not be surprised to find that the RDBMS throws its hands in the air and spits an error code.

    So I have a pet theory, not fully proven really, that it is wisest when mixing these logging types in a single ETL load to perform multiple fast refreshes after each load type has completed. In the above example, that would mean three fast refreshes of course.

    You might like to see if you can fast refresh in two stages in your example that throws the 32313 PMOP error.

  2. Peter Scott Says:

    @David
    Back in 9.2 days I did some builds that fast refreshed twice – once for the dimension and again for the fact load. So yes that technique works, it is also cited in that most marvellous of Oracle publications, the Data Warehousing Guide – that book should be obligatory reading to all DWers even if they use other databases!

    In the end we moved to “bespoke” or DIY refresh (lovingly hand-coded) as the customer had a slight problem with fast refresh of the dimensions – it was so unpredictable in duration, if someone made a lot of change then that 10 minute refresh hits 7 hours…

  3. Stewart Bryson Says:

    I wonder how many rows in the materialized view are being refreshed twice in the two-step refresh process. On the surface, it seems like any number from all to none… depending on a multitude of factors, including: number of SCD1 attributes, how often there are SCD2 changes, the granularity of the aggregation, etc. Perhaps I haven’t thought this through all the way, however.

    As I mentioned in a comment to the original posting, I have taken the route Mark takes by maintaining only the surrogate keys to the dimension tables in the materialized view. If you think about it, this is pretty much in line with how Kimball describes aggregate navigation… using aggregated fact measures and shrunken dimensions. I have also taken the DIY approach similar to Pete’s, using materialized views on prebuilt tables that are managed as part of the ETL load… typically a splitter in an OWB mapping where aggregates are loaded at the same time as the fact using a multi-table insert.

    @David: thanks for the informative comment… and of course, for reading.

Write a comment





Website Design & Build: tymedia.co.uk