Moving Data

I just looked at the Rittman Mead blog archive and saw that my last post was back in August 2010! Where has the time gone? Some of it has been spent preparing and giving conference and user group talks (as I write this I am finishing off a new talk on external data sources for Collaborate 11), but the majority of my time has been consumed by two large and very different ETL projects: developing a new data warehouse over JD Edwards for an international distribution company using Oracle Warehouse Builder, and my current project, moving large amounts of data around a company using Oracle Data Integrator with the odd touch of GoldenGate replication.

The more I use ODI the more I like it, especially the recent 11g release with its ability to build source data sets within the GUI using set operators such as UNION and MINUS. Developing ETL processes can be as simple as connecting columns on the mapping tab and then specifying the knowledge modules needed to load, validate and store the data. The real challenge for me (and I love to be challenged) comes from making the generated code do exactly what I want it to do, and here ODI gives me immense flexibility: I can specify where individual transformations occur (source, staging, target, or even a mix of all three); I can specify which columns are updated and which are inserted in an upsert-style incremental load; and, probably most significantly, I can adapt knowledge modules to do exactly what I want them to do.
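To make that upsert point concrete, here is a hand-written sketch of the kind of MERGE statement an Oracle incremental update knowledge module generates; the table and column names are invented for illustration, and a real KM would substitute the columns you flag for insert and update in the interface:

```sql
-- Illustrative sketch (invented schema) of a generated incremental upsert.
MERGE INTO dw.customers tgt
USING stg.customers_src src
   ON (tgt.customer_id = src.customer_id)
WHEN MATCHED THEN
  UPDATE SET                        -- only the columns flagged for update
    tgt.cust_name    = src.cust_name,
    tgt.cust_status  = src.cust_status,
    tgt.last_updated = SYSDATE      -- audit column maintained by the load
WHEN NOT MATCHED THEN
  INSERT (customer_id, cust_name, cust_status, created, last_updated)
  VALUES (src.customer_id, src.cust_name, src.cust_status, SYSDATE, SYSDATE);
```

Note how the UPDATE and INSERT column lists are independent: a column such as `created` appears only in the insert branch, which is exactly the kind of per-column control ODI exposes in the mapping.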

Sometimes a change to a knowledge module is as trivial as altering the name of the Oracle directory created for loading files as Oracle external tables, which matters if we need to process multiple external table locations within a single database schema. Other times we need to make more substantial alterations to the load, or even repurpose it. For example, the Oracle MERGE incremental load includes a step that uses SQL MINUS to detect changes between source and target; by swapping the table order in the MINUS operator so that we look for rows in the target that don't exist in the source, we can build a knowledge module that performs a "logical delete".
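The MINUS trick is easier to see in plain SQL. A sketch, again with invented table names, of the change-detection step and its reversed "logical delete" variant:

```sql
-- Standard change detection: source rows that are new or changed
-- relative to the target.
SELECT customer_id, cust_name, cust_status FROM stg.customers_src
MINUS
SELECT customer_id, cust_name, cust_status FROM dw.customers;

-- Reversed operand order for a "logical delete" knowledge module:
-- keys present in the target but missing from the source.
SELECT customer_id FROM dw.customers
MINUS
SELECT customer_id FROM stg.customers_src;
```

The second query's result set could then drive an UPDATE that flags those target rows as inactive rather than physically deleting them, which is the essence of the repurposed knowledge module.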

For me this means that ODI is not a cookie-cutter tool (though some people treat it as one) but one that lets me use my skill and judgement to do ETL in the way I think is most appropriate for the dataset.