Ten Tips for a Successful Oracle Warehouse Builder Project
I worked for a client the other week who asked me to come up with some tips and best practices for Oracle Warehouse Builder. Without giving the game away (otherwise you'd never pay to hire me), here are ten tips for working with Oracle Warehouse Builder 10g.
- Don't skimp on the design and analysis phase. Just because Oracle
Warehouse Builder is a graphical tool used for designing data structures and
mappings, it doesn't mean that you can skip the boring bit at the start
where you do all the analysis and design. You still need to understand your
source data, and you still need to specify how the data maps and how you
deal with the transformations and cleansing. This is especially important
when you consider that some customers use OWB as an opportunity to put the
data migration work into the hands of less experienced developers, who won't
necessarily be aware of design best practices or of all the nuances of the
source data. On every project I've ever worked on, the team has said they
wished they'd been able to spend even more time on analysis and design, so
don't think that OWB means you can miss this stage out.
- On the same subject, for each mapping give the developer a specification
and test plan so that they know, item by item, what bit of data maps on to
where, and how to test whether their mapping is working OK. It doesn't
matter where you do this: it can be on a piece of paper, in a Word document,
in a spreadsheet or in a PowerPoint slide.
- Make sure from day one that developer machines are sufficiently
powerful. You need at least 1GB of RAM and a 1GHz CPU, and ideally your
Design Repository database will be on a server with locally attached disks
within your department, not on a SAN that's being shared with your
E-Business Suite implementation. Oracle Warehouse Builder is effectively an
OLTP application that stores each change to the warehouse model as a
transaction against the repository tables, and like any other OLTP
application it suffers when the disks it's using are under contention.
- Use the Flashback technologies in Oracle 9i and 10g to implement mapping
and process flow "transactions": your mapping or process flow stores the
SCN (System Change Number) at the start of the run, and then rolls either
the tables within the mapping, or the whole database, back using FLASHBACK
TABLE or FLASHBACK DATABASE (both of which are 10g features) if the mapping
or process flow fails. What we do is use FLASHBACK TABLE to roll back the
tables within a process flow if it fails, and use FLASHBACK DATABASE if we
want to roll back the entire ETL process. It's certainly quicker than
performing a point-in-time recovery if the whole ETL process goes belly-up,
and it gives us the ability to pull together a number of mappings and
processes into a single atomic package of work which we can reverse out if
need be. There's a rough SQL sketch of this approach after this first batch
of tips.
- Use collections to give yourself the ability to create daily ETL
releases. Each day, gather into a collection all of the mappings and other
objects that have changed and been "checked out" by the developers, and use
this to export those objects into the "Daily Build"
environment...
- Create a "Daily Build" environment which contains the ETL process as at
the end of the previous day, and which includes all the changed items in the
collection you created yesterday. Then, using an automated deployment
process built using OMB*Plus scripts, deploy all of the objects and mappings
and run a "smoke test"
to check that the build hasn't been broken. If it has, get it fixed, and if
it hasn't, you know that you can create an ETL release if you need to.
Thanks to Jon Mead and Donna Kelly for the last two tips.
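
To give a feel for the flashback tip, here's a rough sketch of how you might wrap a mapping in a flashback "transaction". The table and variable names are made up for illustration, FLASHBACK TABLE needs ROW MOVEMENT enabled on the tables involved, and how far back you can go depends on your undo retention, so treat this as a starting point rather than a drop-in solution.

```sql
-- One-off setup on each table you want to be able to roll back
-- (FLASHBACK TABLE refuses to work without row movement enabled).
ALTER TABLE sales_fact ENABLE ROW MOVEMENT;

DECLARE
  l_start_scn NUMBER;
BEGIN
  -- Capture the SCN before the mapping / process flow starts
  l_start_scn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER;

  -- ... run the mapping here, e.g. via the generated PL/SQL package ...

EXCEPTION
  WHEN OTHERS THEN
    -- Put the tables touched by the mapping back to where they were.
    -- FLASHBACK TABLE can't be issued as static PL/SQL, hence the
    -- dynamic SQL.
    EXECUTE IMMEDIATE
      'FLASHBACK TABLE sales_fact TO SCN ' || l_start_scn;
    RAISE;
END;
/

-- To reverse out the whole ETL run instead (10g, database mounted and
-- flashback logging switched on):
--   FLASHBACK DATABASE TO SCN <scn_recorded_at_start_of_run>;
```
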
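And on the daily build tip, the "smoke test" doesn't have to be clever; a handful of sanity queries run after the nightly deployment will do. The table names below are hypothetical, and the runtime audit view columns can differ slightly between OWB releases, so check the view definitions in your own runtime repository before relying on them.

```sql
-- Did every mapping that ran overnight finish cleanly?
-- (RETURN_RESULT and CREATED_ON are as per the OWB 10g runtime audit
-- views; adjust to match your repository.)
SELECT execution_name, return_result
FROM   all_rt_audit_executions
WHERE  created_on > TRUNC(SYSDATE)
AND    return_result <> 'OK';

-- Did the main fact table actually get loaded today?
SELECT COUNT(*) AS rows_loaded
FROM   sales_fact
WHERE  load_date = TRUNC(SYSDATE);
```
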
- Record and review the run times of your various mappings and process
flows. Once a week, list these out in order of run time, longest at the top,
and in addition identify those whose run times have varied the most over
time. Use the ALL_RT_AUDIT_EXECUTIONS and ALL_RT_AUDIT_MAP_RUNS runtime
repository views to obtain your run times, and bring in additional
statistics from Statspack and the Unix server running the database to add to
your diagnostic data. (There's an example query at the end of the list.)
- When tuning a mapping, look for the simple answers first. Assuming that
you've designed your mapping properly in the first place, the reason it's
running slowly is probably that an index isn't getting used, or that you
need to increase HASH_AREA_SIZE, or that you're joining too many tables
together. Only when you've tried the obvious should you move on to
techniques such as tracing; the danger with trying the complicated stuff
first is that you get drowned in diagnostic data and miss the obvious
solution. (There's a short example at the end of the list.)
- Keep mappings simple: do one thing at a time. The danger with loading
multiple tables, or loading one table after another after another, is that
you'll never be able to work out what's gone wrong if the test figures come
out wrong, and you'll have the devil of a job tuning the mapping if you've
got multiple levels of INSERT INTO .. SELECT in your mapping. Keep It Simple
is the watchword here.
- If you're starting out on a big project with lots of data to integrate and aggressive deadlines, hire someone who's done it before to give you a hand before it all gets out of control. OWB only goes so far when it comes to a data warehouse project; the real skill comes in applying software development best practices to what you're doing, and in knowing when to use Warehouse Builder and when to step outside the product and build your own framework. Better to spend a few days upfront than learn all the mistakes the hard way.
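
On the run times tip, a query along these lines against the runtime audit views gives you the weekly league table. The column names are from the OWB 10g runtime views and might differ in your release, so describe the views in your own runtime repository first.

```sql
-- Weekly "league table" of run times, longest first, with a standard
-- deviation column to spot the ones that vary the most. ELAPSE_TIME is
-- in seconds in the 10g runtime audit views.
SELECT execution_name,
       COUNT(*)                   AS runs,
       ROUND(AVG(elapse_time))    AS avg_elapsed_secs,
       ROUND(MAX(elapse_time))    AS max_elapsed_secs,
       ROUND(STDDEV(elapse_time)) AS elapsed_stddev
FROM   all_rt_audit_executions
WHERE  created_on > SYSDATE - 7
GROUP  BY execution_name
ORDER  BY avg_elapsed_secs DESC;
```
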
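And on the tuning tip, the quickest first check is usually the execution plan of the SQL that the mapping generates. The statement below is just a stand-in for whatever your mapping produces, and bear in mind that HASH_AREA_SIZE only takes effect with manual workarea sizing; if you've set PGA_AGGREGATE_TARGET you'd size that instead.

```sql
-- Quick checks before reaching for trace files. Table names are made
-- up; in practice you'd paste in the INSERT ... SELECT that OWB
-- generates for the slow mapping.
EXPLAIN PLAN FOR
  SELECT s.product_id, SUM(s.amount)
  FROM   sales_stg s JOIN products p ON p.product_id = s.product_id
  GROUP  BY s.product_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- If the plan shows a big hash join spilling to disk and you're on
-- manual workarea sizing, a larger HASH_AREA_SIZE for the ETL session
-- can help:
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 104857600;  -- 100MB, for example
```
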
Thanks to Jon and Donna for providing input and ideas along the way.