Ten Tips for a Successful Oracle Warehouse Builder Project

March 21st, 2006 by Mark Rittman

I worked for a client the other week that asked me to come up with some tips
and best practices for Oracle Warehouse Builder. Without giving the game away
(otherwise you’d never pay to hire me) here’s ten tips for working with Oracle
Warehouse Builder 10g.

  1. Don’t skimp on the design and analysis phase. Just because Oracle
    Warehouse Builder is a graphical tool used for designing data structures and
    mappings, it doesn’t mean that you can skip the boring bit at the start
    where you do all the analysis and design. You still need to understand your
    source data, and you still need to specify how the data maps and how you
    deal with the transformations and cleansing. This is especially important
    when you consider that some customers use OWB as an opportunity to put the
    data migration work into the hands of less experienced developers, who won’t
    neccessarily be aware of design best practices and who won’t be aware of all
    the nuances of the source data. Every project I’ve ever worked on has said
    that they wished they’d been able to spend even more time on analysis and
    design, so don’t think that OWB means you can miss this stage out.
     
  2. One the same subject, for each mapping produce a specification and test
    plan on to the developer so that they know, item by item, what bit of data
    maps on to where, and how to test whether their mapping is working ok. It
    doesn’t matter where you do this, it can be on a piece of paper, in a Word
    document, in a spreadsheet or in a powerpoint slide.
     
  3. Make sure from day one that developer machines are sufficiently
    powerful. You need at least 1GB of RAM and a 1GHz CPU, and ideally your
    Design Repository database will be on a server with locally attached disks
    within your department, not on a SAN that’s being shared with your
    E-Business Suite implementation. Oracle Warehouse Builder is effectively an
    OLTP application that stores each change to the warehouse model as a
    transaction against the repository tables, and this application like any
    other OLTP application suffers when the disks it’s using are suffering
    contention.
     
  4. Use the Flashback technologies in Oracle 9i and 10g to implement mapping
    and process flow "transactions", so that your mapping or process flow stores
    the SCN (System Change Number) at the start of the mapping, and then rolls
    either the tables within the mapping, or the whole database, back using
    FLASHBACK TABLE or FLASHBACK DATABASE if the mapping or process flow fails.
    What we do is use FLASHBACK table to roll back the tables within a process
    flow if it fails, and use FLASHBACK DATABASE if we want to roll back the
    entire ETL process. It’s certainly quicker than performing a point-in-time
    recovery if the whole ETL process goes belly-up and it gives us the ability
    to pull together a number of mappings and processes into a single atomic
    package of work which we can reverse out if need be.
     
  5. Use collections to give yourself the ability to create daily ETL
    releases. Each day, collect together into a collection all of those mappings
    and other objects that have changed and been "checked out" by the
    developers, and use this to export those objects into the "Daily Build"
    environment…
     
  6. Create a "Daily Build" environment which contains the ETL process as at
    the end of the previous day, and which includes all the changed items in the
    collection you created yesterday. Then, using an automated deployment
    process built using OMB*Plus scripts, deploy all of the objects and mappings
    and run a "smoke test"
    to check that the build hasn’t been broken. If it has, get it fixed, and if
    it hasn’t, you know that you can create an ETL release if you need to.
    Thanks to Jon Mead and Donna Kelly for the last two tips.
     
  7. Record and review the run times of your various mappings and process
    flows. Once a week, list these out in order of run time, longest at the top,
    and in addition identify those whose run times have varied the most over
    time. Use the ALL_RT_AUDIT_EXECUTIONS and ALL_RT_AUDIT_MAP_RUNS runtime
    repository views to obtain your run times, and bring in additional
    statistics from statspack and the Unix server running the database to add to
    your diagnostic data.
     
  8. When tuning a mapping, look for the simple answers first. Assuming that
    you’ve designed your mapping properly in the first place, the reason it’s
    running slowly is probably because an index isn’t getting used, or you need
    to increase the size of HASH_AREA_SIZE, or because you’re joining too many
    tables together. Only when you’ve tried the obvious try techniques such as

    tracing
    , the danger with trying the complicated first is that you get
    drowned in diagnostic data and miss the obvious solution.
     
  9. Keep mappings simple, do one thing at a time. The danger with loading
    multiple tables, or loading one table after another after another, is that
    you’ll never be able to work out what’s gone wrong if the test figures come
    out wrong, and you’ll have the devil of a job tuning the mapping if you’ve
    got multiple levels of INSERT INTO .. SELECT in your mapping. Keep It Simple
    is the watchword here.
     
  10. If you’re starting out on a big project, you’ve got lots of data to
    integrate and aggressive deadlines, hire someone who’s
    done it before to give
    you a hand before it all gets out of control. OWB only goes so far when it
    comes to a data warehouse project, and the real skill comes in apply
    software development best practices to what you’re doing, and knowing when
    to use Warehouse Builder and when to step outside of the product and build
    your own framework. Better to spend a few days upfront than learn all the
    mistakes the hard way.

Thanks to Jon and Donna for providing input and ideas along the way.

Comments

  1. Amol Says:

    Spot on recommendations . I believe most of these could be applied to any data-integration effort.
    People most often pay less attention to #1 then hire in-experienced developers and conveniently lay the blame on the tool at a later stage.