Working Through Some SCD 2 and 3 Examples Using OWB10gR2

September 21st, 2006 by Mark Rittman

A couple of days ago I mentioned that I wanted to take a look at SCD2 and 3
handling using OWB10gR2. This was prompted by the fact that up until now I
hadn’t really looked in detail at this feature, and I’d been reading a few
articles and blog postings that suggested the way it works is a bit odd in the
first release. In particular, this blog posting by Roelant Vos suggests that it doesn’t
update non-trigger columns in the way you might expect, and corresponding
separately with Roelant there does appear to be an issue around how it handles
changes to parent-level attributes.

In this example then, I first of all create a new user called GLOBAL_SCD and
register it with OWB as a target schema. I then create a target module in OWB
and point it at this new user. The data I’m going to use as my test case is
based on the PRODUCT_DIM table in the Global Sample Schema, and I start off then
by dropping in to SQL*Plus and creating a copy of the table, leaving a few rows
out for insertion later on.

SQL> create table product_dim
  2  as
  3  select *
  4  from   global.product_dim
  5  where  item_id < 43
  6  /

Table created.

SQL> select count(*) from product_dim;

  COUNT(*)
----------
        30

This leaves me with 5 more rows left to insert, which I can use later on to
test how the SCD mechanism works with new as well as changed data.

Next I go into Warehouse Builder 10gR2 and define two dimensions:

  • PRODUCT_DIM_TYPE_2, defined as a SCD2 dimension, and
  • PRODUCT_DIM_TYPE_3, defined as a SCD3 dimension.

For anyone now familiar with the PRODUCT_DIM table in the Global Sample
Schema, it contains data corresponding to four levels (item, item family, item
class and total product), with the bottom item level having three additional
attributes (item package id, item buyer and item marketing manager). In my
scenario, I want to check out how both the SCD2 and SCD3-handling works, but
currently you can’t mix and match type 2 and type 3 attributes in the same
dimension, and so I’m going to set up PRODUCT_DIM_TYPE_2 to record all history
for the product family and product class levels, together with the item
marketing manager attribute; for the PRODUCT_DIM_TYPE_3 dimension, I’ll be
recording the previous and current values held in the item buyer field

Once I’ve created the basic definition for each dimension, creating the
dimension attributes, levels, hierarchy and so on the same for both dimension
versions, I then go into the Data Object Editor to fine-tune the SCD handling.

Starting with the Type 2 dimension, I click on the SCD tab and then click on
the "Settings…" tab next to the SCD2 radio button.

Then, when I view the settings, I can designate the DSC
attributes from the Family and Class levels, together with the Item Marketing
Manager attribute, as ones that trigger new rows to be written. Through a
process of trial and error I established that, if I designate an attribute to
trigger history in a level – class or family, in my example – you need to have
effective and expiration dates on every level concerned, having these dates just
once on each row isn’t enough. This is because of the way that OWB creates
dimensions such that they can be easily snowflaked – by having creation and
expiry dates for each level, you can easily separate them off in to separate
tables. When you do this through you’ll have to re-bind the dimension to the
table to create the additional columns.

Notice also that I’ve left item buyer and item package ID blank
- what will happen with these attributes if new values come in? – as I have also
with the ID columns for the class and family levels – these cannot be designated
as trigger columns as they are already business keys, the change in hierarchy
can only be detected through the description value changing.

I then go to the Type 3 dimension and edit the SCD settings for
this one. In this case all I do is designate the effective date column, and pick
"Previous Item Buyer" as the new attribute that old copies of Item
Buyer will be moved in to.

Notice how the SCD wizard has added in columns for all the other
attributes that I might want to track history on? To remove these, I go to the
Attributes tab for the dimension, right-click on the ones I don’t want, and
remove them from the attribute list.

Then, when I re-bind the dimension and view it again in the SCD
settings dialog, the additional, redundant PREVIOUS_ attributes have
disappeared.

Now I switch over to the mapping editor and build mappings to
load my two dimensions. For the Type 2 dimension, I map the columns from the
PRODUCT_DIM table to the relevant dimension attributes, and map a constant using
the expression SYSDATE to the effective date column – this provides a value for
‘starting off’ dimension rows if a trigger attribute changes. Warehouse Builder
takes care of the effective date attribute automatically, when it closes off an
expired row.

Notice how there’s nothing Type 2-specific about the attributes,
except of course there’s now these two new date attributes, effective and
expired date, and we just map to the dimension as normal. If you select the
dimension operator though, and click on the "Expand Child Graph"
button on the toolbar, you’ll find that the dimension operator is actually a
pluggable mapping that implements the splitting, merging, inserting and so on
that’s needed to implement this feature behind the scenes.

This is effectively what you’d have had to have built by hand
with earlier versions of OWB; now it’s just wrapped up as a pluggable mapping
that presents a dimensional interface to the rest of the mapping.

It’s similar with the Type 3 mapping. In this case however,
there’s a new PREVIOUS_ITEM_BUYER attribute that Warehouse Builder maintains
automatically (you can’t map to it) and an effective date attribute, which you
populate as with the Type 2 mapping.

Selecting the dimension operator and clicking on "Expand
Child Graph" shows a similar mapping that implements the Type 3 load.

So now we can perform the first load. After running the Type 2
and then Type 3 mappings from the Control Center, I bring up the dimension
viewer and take a look at the first run of the Type 2 dimension.

Looking good. Checking the Type 3 dimension gives me the same
view. Now if I use the data viewer against the underlying table, I see that it’s
created a "solved" dimension, where there are additional rows in the
table, with negative DIMENSION KEYs,  to represent the different higher
levels in each hierarchy. This allows Warehouse Builder to create fact tables
against any level in the dimension hierarchy, not just the bottom level, which
was a restriction in earlier versions of the product.

Notice that the effective date attributes for the higher level
rows (with a negative DIMENSION_KEY) are only populated for that level and
higher, so that for example the rows for the class level only have the
CLASS_EFFECTIVE_DATE column filled in, with the FAMILY_EFFECTIVE date and
ITEM_EFFECTIVE left blank.

Now comes the interesting bit. I’ll go back to the PRODUCT_DIM
table and make a few changes:

  • I’ll pull across two more rows, with ITEM_IDs of 43 and 44,
    from the source table in the Global Sample Schema.

  • I’ll amend the row with ITEM_ID = 20, to change the
    ITEM_MARKETING_MANAGER to "Rittman", the ITEM_PACKAGE_ID to
    "Home Multimedia", the FAMILY_ID to 13 and the FAMILY_DSC to
    "LCD Monitors", which should create a new entry in the Product
    hierarchy.

  • I’ll amend the row with ITEM_ID = 21, to change the
    ITEM_BUYER to "Berbatov", the ITEM_PACKAGE_ID to "Home
    Multimedia", and the FAMILY_ID and FAMILY_DSC to the same new values
    that I’ve used for ITEM_ID = 20.

Now what I’d expect here is the following.

  • There should be two new entries in the dimension for the two
    new ITEM_IDs.

  • The row that currently houses the dimension member with
    ITEM_ID = 20 should be closed off, and a new row written with the new
    ITEM_MARKETING_MANAGER and the new values for FAMILY_ID and FAMILY_DSC. I’d
    also expect the FAMILY_ID and FAMILY_DSC changes to propagate through the
    "solved" rows in the underlying table for the Family, Class and
    Total Product levels.

  • For the changes to the row with ITEM_ID = 21, I’d expect one
    row still to remain with the existing value for ITEM_BUYER moved to the
    PREVIOUS_ITEM_BUYER column, and the new value in it’s place. I don’t know
    what will happen with the other columns, we’ll have to wait and see.

I drop into SQL*Plus and make the changes to the PRODUCT_DIM
table.

SQL> update product_dim
  2  set item_marketing_manager='Rittman'
  3  ,   item_package_id = 'Home Multimedia'
  4  ,   family_id = 13
  5  ,   family_dsc = 'LCD Monitors'
  6  where item_id = 20
  7  /

1 row updated.

SQL> update product_dim
  2  set item_buyer = 'Berbatov'
  3  ,   item_package_id = 'Home Multimedia'
  4  ,   family_id = 13
  5  ,   family_dsc = 'LCD Monitors'
  6  where item_id = 21
  7  /

1 row updated.

SQL> insert into product_dim
  2  select *
  3  from   global.product_dim
  4  where  item_id in (43,44)
  5  /

2 rows created.

SQL> commit;

Commit complete.

Now I run the two mappings again to load in this new and updated
data, and then check out the results.

Starting off with the Type 2 dimension, I first check out the
dimension object viewer and notice that the entry with Monitor – 17" Super
VGA appears in both the new LCD Monitors family and also the old, now redundant,
Monitors family. The 19" Monitors one is OK though. What’s happened?

What’s different about the two rows? The row with a 17"
Super VGA monitor has an ITEM_ID of 20, the 19" Super VGA monitor is
ITEM_ID = 21. Looking back at the update statement, we changed the Marketing
Manager (the trigger column) and Package ID, as well as the Family ID and
Description, whilst for the row with an Item ID of 21 we changed just the Item
Buyer and Package ID, neither of which were trigger columns for the Item level,
plus the Family ID and Description. Presumably a new row has been written for
the first item as the trigger has ‘fired’, but it didn’t for the second, even
though the family ID and description changed. Let’s look at the data and find
out.

 

There you go. None of the changes propagated through for ITEM_ID
= 21, as none of the ITEM-level attributes that were triggers were changed. With
the row with ITEM_ID = 20, a new row has been written with the old one expired
off with an ITEM_EXPIRATION_DATE set. The dimension viewer still displays the
expired row though, it seems to ignore the expiry date for the level. But what
about the rows also in the dimension that are specifically for the FAMILY level?

Well the new member of the Family level has come through ok. So,
we’ve got two issues so far – as Roelant found out, attributes that aren’t
marked as trigger columns don’t have their values updated unless a trigger
attribute also in that level changes. Secondly, the dimension viewer in OWB
still seems to show members that have expired off, leading to duplicates when
working with SCD 2 handing.

Looking at the Type 3 dimension though, the two items we just
amended are now within the LCD Monitor family, and the Monitor family has now
disappeared.

That’s strange though as only Item Buyer is a trigger column,
and only the row with ITEM_ID = 21 had a change of Item Buyer. I wonder what
happened? Let’s take a look at the data again.

So with the row with the Item ID of 20, it’s still picked up on
the change of Family Description and Item Package ID, even though the trigger
column didn’t change. That’s what I’d want though so no issue there.

Now this first test is complete, I want to carry out some other
tests on issues that I’d heard mentioned:

  • If you make a change to a higher level description, say
    FAMILY_DSC in my case, OWB doesn’t close off the previous versions of the
    levels below this to reflect that they once were once children of a
    differently named dimension member, and

  • If you don’t mark an attribute as triggering history,
    changes to it won’t get reflected in the dimension entry.

Starting with the first observation, I go back to the
PRODUCT_DIM table and amend all the PRODUCT_DSC values that are currently
"Modems/Fax" to just "Modems". This should affect two rows,
with an ITEM_IDs of 24 and 33.

SQL> update product_dim
  2  set    family_dsc = 'Modems'
  3  where  family_dsc = 'Modems/Fax'
  4  /

2 rows updated.

SQL> commit;

Commit complete.

SQL> select distinct(family_dsc)
  2  from   product_dim
  3  /

FAMILY_DSC
--------------------
CD/DVD
Documentation
Accessories
Desktop PCs
Memory
Modems
Operating Systems
Portable PCs
LCD Monitors

9 rows selected.

Now when I load the Type 2 dimension, I would expect the the
ITEM level rows to be closed off and new ones written with the new FAMILY_DSC,
and the "solved" FAMILY, CLASS and TOTAL_PRODUCT levels to be
similarly affected. I run the load and see what’s happened.

First of all I check the dimension viewer, and this has worked
fine – the old family description of "Modems/Fax" has disappeared to
be replaced with just "Modems". So far so good.

Checking out the table underneath the dimension and firstly just checking the
rows with ITEM_ID in (24,33), I see that Roelant is correct – the record for
this item just shows the new family description, losing it’s earlier historic
value.

However, if I expand the view to look at all rows with FAMILY_ID
= 9, i.e. bring in the solved rows for the higher level, you can see the row
with "Modems/Fax" closed off and a new row written.

So functionally, whilst the user will see the correct new value
in the dimension viewer, the historical old value for the family description is
lost when only referring to the bottom level row, which is what most query tools
would only refer to.

On to the second issue, where attributes that aren’t marked as
trigger columns aren’t updated in the dimension – this is potentially quite a
serious issue, as you’d need to split these cases off from the SCD 2 and 3 load
and process them separately, or lose the changes. In my earlier example, this
indeed did seem to be the case with a Type 2 load, but the values did come
through with the Type 3 load.

To create a new test case, I go back to the PRODUCT_DIM table
and set the ITEM_MARKETING_MANAGER to "Robinson" (which should get
picked up by the type 2 load, as marketing manager is a trigger column) and the
ITEM_BUYER to "King" for the row where ITEM_ID = 31.

SQL> update product_dim
  2  set    item_marketing_manager = 'Robinson'
  3  ,      item_buyer = 'King'
  4  where  item_id = 31
  5  /

1 row updated.

SQL> commit;

Commit complete.

Let’s see what happens. I run the Type 2 load and then check out
the relevant rows using the data viewer.

That’s good – the new row, triggered by the change in item
marketing manager, also contains the new item buyer. But what if we just change
the item buyer for another row, and not a value that’s in one of the trigger
columns?

SQL> update product_dim
  2  set    item_buyer = 'Chimbonda'
  3  where  item_id = 33
  4  /

1 row updated.

SQL> commit;

Commit complete.

What happens now when I run the mapping?

Roelent’s right – the change in Item Buyer doesn’t get picked up
on, as it’s not a trigger column. What this means is that if you want the new
values for an attribute to be recorded at all, they’ve got to be a trigger
column – if they’re not they will get forgotten. Roelent suggests a method of
splitting off rows with changed non-trigger columns and updating them
separately, this would make sense but it’s a bit of an annoying workaround.

Let’s do a similar test with the Type 3 dimension. Item Buyer is
the trigger column here and so should trigger a change for ITEM_ID = 31, but
let’s create a new example with ITEM_ID = 34 where we just change the marketing
manager on it’s own.

SQL> update product_dim
  2  set    item_marketing_manager = 'Mido'
  3  where  item_id = 34
  4  /

1 row updated.

SQL> commit;

Commit complete.

Running the Type 3 load gives me the following rows in the data
viewer.

In this case, the Item Marketing Manager change got pulled
through OK, even though it’s not a trigger column and even when, in the case of
ITEM_ID = 34, the trigger column didn’t even change.

To sum up then, what I think I’ve found as follows:

  • For Type 2 loads, changes to non-trigger columns don’t get
    propagated through to the dimension rows, unless the trigger column for that
    level changed.

  • However for Type 3 loads, changes to non-trigger columns
    always get propagated, even if the trigger column doesn’t change.

  • For Type 2 loads, if attributes of a higher level change in
    a row, this doesn’t trigger a new entry for that particular row, instead the
    new higher-level attribute value just over-writes the old one. When you find
    the row elsewhere in the solved dimension specifically for that higher
    level, that row gets expired and a new one written, but this history is lost
    to the child row below it.

  • When a detail-level, child row gets expired off by the Type
    2 load process, the dimension viewer still displays it, creating duplicates
    in the dimension hierarchy when viewed through the dimension viewer. The
    data in the underlying table is still OK, but the viewer displays the
    hierarchy wrong.

So how do you work around this? Well as Roelant
suggests here
, if you’re doing a type 2 load and the possibility exists that
a non-trigger column can change without a trigger column changing in the same
level, you’ll have to update the dimension separately to handle these cases;
alternatively, make all the attributes in the level trigger attributes, which is
probably what Oracle would suggest you do.

With the issues around new rows not being written for child
dimension members when the parent level attribute changes – well that’s just a
shortcoming of the first implementation, as is the quirk with the dimension
viewer. I expect Oracle development are aware of this and it’ll be addressed in
a forthcoming patch release of OWB 10gR2. I’ve also uploaded the MDL file here
if you’re interested in working through the examples.

Comments

  1. Vladimir Andreev Says:

    I thought Berbatov went to England to play football, not to be an Item_Buyer ;-)
    Nice post!

  2. Tama McConnell Says:

    So you&#8217;re a spurs fan !
    I am glad I am not the only one who is a little &#8220;underwhelmed&#8221; by the SCD2/3 implementation in OWB10gR2.

  3. alex shirva Says:

    Mark:
    while I enjoyed reading your OWB-type2 case study, I am struggling to implement a 2 level dimension ( ALL, product level) using OWB 10gR2. seems like levels keys have to be
    connected ( parent level,child lvl) and in my case there is none ( &#8216;ALL&#8217; is lvl1 code and (1,2,3&#8230;.) are lvl2 codes)
    any way you pint me to a detail example. I looked at oracle reference still there is a disconnection to connect the levels and deploy them.
    many thanks
    alex

Website Design & Build: tymedia.co.uk