Ten Tips for a Successful Oracle Warehouse Builder Project

I worked for a client the other week who asked me to come up with some tips and best practices for Oracle Warehouse Builder. Without giving the game away (otherwise you'd never pay to hire me), here are ten tips for working with Oracle Warehouse Builder 10g.

  1. Don't skimp on the design and analysis phase. Just because Oracle Warehouse Builder is a graphical tool for designing data structures and mappings, it doesn't mean you can skip the boring bit at the start where you do all the analysis and design. You still need to understand your source data, and you still need to specify how the data maps across and how you'll deal with transformations and cleansing. This is especially important when you consider that some customers use OWB as an opportunity to put the data migration work into the hands of less experienced developers, who won't necessarily be aware of design best practices or of all the nuances of the source data. On every project I've ever worked on, the team wished they'd been able to spend even more time on analysis and design, so don't think that OWB means you can miss this stage out.
     
  2. On the same subject, for each mapping produce a specification and test plan for the developer, so that they know, item by item, which bit of data maps to where and how to test whether their mapping is working OK. It doesn't matter what format you use: it can be a piece of paper, a Word document, a spreadsheet or a PowerPoint slide.
     
  3. Make sure from day one that developer machines are sufficiently powerful. You need at least 1GB of RAM and a 1GHz CPU, and ideally your Design Repository database will be on a server with locally attached disks within your department, not on a SAN that's being shared with your E-Business Suite implementation. Oracle Warehouse Builder is effectively an OLTP application that stores each change to the warehouse model as a transaction against the repository tables, and like any other OLTP application it suffers when its disks are under contention.
     
  4. Use the Flashback technologies in Oracle 9i and 10g to implement mapping and process flow "transactions": store the SCN (System Change Number) at the start of the mapping, then roll either the tables within the mapping, or the whole database, back using FLASHBACK TABLE or FLASHBACK DATABASE if the mapping or process flow fails. What we do is use FLASHBACK TABLE to roll back the tables within a process flow if it fails, and FLASHBACK DATABASE if we want to roll back the entire ETL process. It's certainly quicker than performing a point-in-time recovery if the whole ETL run goes belly-up, and it gives us the ability to pull together a number of mappings and processes into a single atomic package of work which we can reverse out if need be. There's a sketch of the approach after this list.
     
  5. Use collections to give yourself the ability to create daily ETL releases. Each day, gather into a collection all of the mappings and other objects that have changed and been "checked out" by the developers, and use this collection to export those objects into the "Daily Build" environment...
     
  6. Create a "Daily Build" environment which contains the ETL process as at the end of the previous day, and which includes all the changed items in the collection you created yesterday. Then, using an automated deployment process built using OMB*Plus scripts, deploy all of the objects and mappings and run a "smoke test" to check that the build hasn't been broken. If it has, get it fixed, and if it hasn't, you know that you can create an ETL release if you need to. Thanks to Jon Mead and Donna Kelly for the last two tips.
     
  7. Record and review the run times of your various mappings and process flows. Once a week, list them out in order of run time, longest at the top, and also identify those whose run times have varied the most over time (there's an example query after this list). Use the ALL_RT_AUDIT_EXECUTIONS and ALL_RT_AUDIT_MAP_RUNS runtime repository views to obtain your run times, and bring in additional statistics from Statspack and from the Unix server running the database to add to your diagnostic data.
     
  8. When tuning a mapping, look for the simple answers first. Assuming you designed the mapping properly in the first place, the reason it's running slowly is probably that an index isn't being used, HASH_AREA_SIZE needs increasing, or you're joining too many tables together. Only once you've tried the obvious should you reach for techniques such as tracing; the danger with trying the complicated stuff first is that you get drowned in diagnostic data and miss the obvious solution. A couple of starting points are sketched after this list.
     
  9. Keep mappings simple: do one thing at a time. The danger with loading multiple tables, or loading one table after another after another, is that you'll never be able to work out what's gone wrong if the test figures come out wrong, and you'll have the devil of a job tuning the mapping if you've got multiple levels of INSERT INTO .. SELECT in it. Keep It Simple is the watchword here (see the illustration after this list).
     
  10. If you're starting out on a big project, with lots of data to integrate and aggressive deadlines, hire someone who's done it before to give you a hand before it all gets out of control. OWB only goes so far on a data warehouse project; the real skill comes in applying software development best practices to what you're doing, and knowing when to use Warehouse Builder and when to step outside the product and build your own framework. Better to spend a few days upfront than learn all the mistakes the hard way.
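
For tip 4, here's a minimal sketch of the flashback approach. The control table, sequence and staging table names are invented for illustration, and note that the FLASHBACK TABLE and FLASHBACK DATABASE statements themselves are 10g features (on 9i you're limited to flashback query):

```sql
-- Record the SCN at the start of the process flow (etl_run_control,
-- etl_run_seq and the staging tables are hypothetical names)
INSERT INTO etl_run_control (run_id, start_scn)
VALUES (etl_run_seq.NEXTVAL, DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER);
COMMIT;

-- FLASHBACK TABLE requires row movement to be enabled beforehand
ALTER TABLE stg_orders    ENABLE ROW MOVEMENT;
ALTER TABLE stg_customers ENABLE ROW MOVEMENT;

-- On failure, roll the flow's tables back to the recorded SCN;
-- FLASHBACK TABLE is DDL, so issue it dynamically from PL/SQL
DECLARE
  l_scn NUMBER;
BEGIN
  SELECT MAX(start_scn) INTO l_scn FROM etl_run_control;
  EXECUTE IMMEDIATE
    'FLASHBACK TABLE stg_orders, stg_customers TO SCN ' || l_scn;
END;
/
```

Rolling back the entire ETL run is then a FLASHBACK DATABASE TO SCN, issued with the database mounted and flashback logging enabled, followed by ALTER DATABASE OPEN RESETLOGS.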
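
For the smoke test in tip 6, a simple query against the runtime audit views will tell you whether last night's build broke anything. Treat the column names and the 'OK' status value as indicative; they're from the 10g runtime repository views, so check them against your own release:

```sql
-- Flag any execution since midnight that didn't finish cleanly
SELECT execution_name,
       return_result,
       created_on
FROM   all_rt_audit_executions
WHERE  created_on >= TRUNC(SYSDATE)
AND    return_result <> 'OK'   -- status values may vary by OWB version
ORDER  BY created_on;
```

If this returns any rows, the build is broken and the release candidate doesn't go out.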
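
For tip 7, a weekly report along these lines gives you the longest-running executions and the most variable ones in one pass (again, ELAPSE_TIME and friends are the 10g view columns as I know them, so verify them for your version):

```sql
-- Average, worst and most variable run times over the last week,
-- longest-running at the top (ELAPSE_TIME is in seconds)
SELECT execution_name,
       COUNT(*)                   AS runs,
       ROUND(AVG(elapse_time))    AS avg_secs,
       MAX(elapse_time)           AS max_secs,
       ROUND(STDDEV(elapse_time)) AS stddev_secs
FROM   all_rt_audit_executions
WHERE  created_on > SYSDATE - 7
GROUP  BY execution_name
ORDER  BY avg_secs DESC;
```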
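
For tip 8, "the obvious" usually starts with the execution plan of the SQL the mapping generates, and with the session's hash join memory. A couple of starting points (the 100MB figure is just an example, not a recommendation):

```sql
-- Is the index actually being used? Explain the mapping's generated SQL.
EXPLAIN PLAN FOR
SELECT /* paste the mapping's generated SQL here */ * FROM dual;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Give hash joins more memory for this session. If automatic PGA
-- management is on (PGA_AGGREGATE_TARGET set), HASH_AREA_SIZE is
-- ignored unless you switch the session to manual work areas first.
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 104857600;  -- 100MB, as an example
```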
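
And to illustrate tip 9 (the source and target tables are invented), stage the joined, cleansed rows once, then load each target from the staged set, rather than stacking levels of INSERT INTO .. SELECT on nested inline views:

```sql
-- One thing at a time: land the joined rows first...
CREATE TABLE stg_sales AS
SELECT o.order_id,
       o.customer_id,
       l.product_id,
       l.qty * l.unit_price AS sale_amount
FROM   src_orders o
JOIN   src_order_lines l ON l.order_id = o.order_id;

-- ...then load each target from the staged set, where the row counts
-- can be reconciled and each step tuned on its own
INSERT INTO sales_fact (order_id, customer_id, product_id, sale_amount)
SELECT order_id, customer_id, product_id, sale_amount
FROM   stg_sales;
```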

Thanks to Jon and Donna for providing input and ideas along the way.