Here at Rittman Mead we have been working on some new methodology and design patterns for ETL. We have long realised that the bottleneck in Business Intelligence and Data Warehousing projects is ETL, so we have been prototyping new techniques to approaching this and trialling them at client's sites.
Taking a step back and looking at the ETL process we felt there was a lot of complexity unnecessarily created by decomposing the process into a number of program units or mappings. In our view this process creates the following problems:
- A large amount of processing time was wasted on the inter-communication of these mappings.
- Unnessary temporary storage objects and created and populated in the database.
- A separate technology is required to orchestrate all the mappings.
- It encouraged multiple developers to work on the ETL process thereby increasing the risk of mis-communication and mis-aligned interfaces.
We feel this approach adheres to some of the fundamental tennets of software development: encapsulation (everything is in the one mapping) and decoupling (there are no external dependencies). Further it completely negates the need for old bugbear re-usability, you now don't even need to re-use code, just use it once, all in the same mapping. Most importantly OMP will also provides a reduction in development costs: you now only need one developer.
Our extensive research has also developed a series of steps you can follow to deliver your One Mapping. You should note that the One Mapping that OMP generates will be extremely complex, only by following these can you address the complexity of the mapping that will be generated.
OMP follows a black hole development approach where it is crucial for the developer to do as much development as possible without any outside interfere from either peers or the business. This allows the developer to focus solely on the development task in hand, which is a must when developing extremely complex code. It is also essential that the developer is allowed to proceed as far through the process as possible without stopping for other distracting activities like testing. In order to follow the OMP I have built the following example using Oracle Warehouse Builder.
- Step 1: source objects - create new mapping a drag all your source objects onto the canvas - it is important to arrange these in a straight line on the left hand side of the canvas.
- Step 2: add all your join operators to combine the data. A couple of tips here, (1) add predicates into the join conditions to avoid using filter operators (2) keep the data transition lines as straight as possible for performance reasons.
- Step 3: add any expression or transformational operators required - these should really be added to the middle of the canvas.
- Step 4: add all your target tables - these are added to the right hand side of your canvas. You are in the home straight now, but you may find this the trickiest part and we recommend using at least a 29" monitor to complete this process.
- Step 5: unit test - note there is no orchestration or integration required, as you only have One Mapping.
- Step 6: release to production - you can just release you mapping straight into production, overwriting whatever was there before. There is no system or integration testing required as there is only one piece of code. UAT is further bypassed as your unit testing verifies whether the entire ETL process works or not.