Generic problems

Drag and drop ETL tools sometimes encourage a generic approach. Load a flat file to stage, validate the data, apply some transforms and publish to the data warehouse is, after all, standard fare for a data warehouse. But what happens when there is some subtle difference in the process that the developer does not spot during design and implementation?

One recent example came to light in one of our Oracle Warehouse Builder built data warehouses. Almost all of the fact tables are partitioned, and the usual publish approach is to direct-path insert into an empty table and then partition exchange it in. But our problem fact table was a late addition to the DW - it came from a new requirement about a year into live use of the system. The original developers had moved on and the new team used current practice to model any changes. The new daily feed was around 2000 rows per day, and from our table stats the average row length is around 32 bytes. So we have small volumes to load each day, and the table is probably not worth partitioning.

And there is the problem: we are not loading into an empty partition, so the normal load mechanism of append is not appropriate, since it direct-path inserts the data into a new extent. And as each extent is 4MB, we are wasting a vast amount of space to load less than 100KB per day.
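To make the contrast concrete, here is a minimal SQL sketch of the two patterns; the table and column names are made up for illustration and are not the ones in the actual warehouse.

-- Usual publish: direct-path load into an empty exchange table, then swap it
-- into the partitioned fact table.
INSERT /*+ APPEND */ INTO fact_sales_exch
SELECT sale_date, product_key, store_key, sale_amount
FROM   stg_sales_daily;
COMMIT;

ALTER TABLE fact_sales
  EXCHANGE PARTITION p_latest WITH TABLE fact_sales_exch;

-- The new feed: the same style of hinted append, but into a small,
-- unpartitioned table that already holds data. Direct-path loads write above
-- the high-water mark, so free space left by earlier loads is never reused.
INSERT /*+ APPEND */ INTO fact_new_feed
SELECT feed_date, product_key, measure_value
FROM   stg_new_feed_daily;
COMMIT;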

So the fix: rebuild the OWB map to use a conventional-path insert, and rebuild the target table to recover the masses of unused storage.
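At the SQL level the fix amounts to something like the following (again with illustrative names; in practice OWB regenerates the insert once the loading hint is changed):

-- Conventional-path insert: rows go into existing blocks that still have
-- free space, so the tiny daily feed stops allocating fresh extents.
INSERT INTO fact_new_feed (feed_date, product_key, measure_value)
SELECT feed_date, product_key, measure_value
FROM   stg_new_feed_daily;
COMMIT;

-- One-off rebuild to hand back the unused extents; any indexes are left
-- UNUSABLE by the MOVE and must be rebuilt afterwards.
ALTER TABLE fact_new_feed MOVE;
ALTER INDEX fact_new_feed_pk REBUILD;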

As David Aldridge correctly noted, it is append parallel that causes the extra extents to be used. Looking at the generated code, this was indeed the case. And the cause: a default append parallel hint in the mapping's hint configuration!
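For anyone who wants to see the effect on their own system, here is a rough sketch (assumed names, not the exact OWB-generated code) of a hinted parallel append, plus a query to compare the space allocated to the segment with the space the rows really need:

-- Parallel DML must be enabled for the PARALLEL hint on an insert to take
-- effect.
ALTER SESSION ENABLE PARALLEL DML;

-- Roughly the shape of the generated load: with APPEND PARALLEL each
-- parallel slave loads into its own newly allocated extent(s), so even a
-- 2000-row feed can claim several 4MB extents per run.
INSERT /*+ APPEND PARALLEL(fact_new_feed, 4) */ INTO fact_new_feed
SELECT feed_date, product_key, measure_value
FROM   stg_new_feed_daily;
COMMIT;

-- Compare allocated space with the space the rows actually occupy.
SELECT s.bytes / 1024 / 1024             AS allocated_mb,
       t.num_rows * t.avg_row_len / 1024 AS used_kb
FROM   user_segments s
       JOIN user_tables t ON t.table_name = s.segment_name
WHERE  s.segment_name = 'FACT_NEW_FEED';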