February 20th, 2013 by Michael Rainey
Last week, I was in Denver, Colorado, presenting at the Rocky Mountain Oracle Users Group Training Days conference. This was my first time at RMOUG, and I have to say, with the number of great topics on offer in any given session slot, I hope to see Training Days expanded beyond two days in the future. Rittman Mead was well represented, presenting or co-presenting a total of 8 sessions at the conference.
The topic I presented was titled “GoldenGate and ODI – A Perfect Match for Real-Time Data Warehousing”, based on my inaugural Rittman Mead blog post. I described how to use the Oracle Next Generation Reference Data Warehouse Architecture, along with Oracle GoldenGate and Oracle Data Integrator, to form the perfect match for real-time data warehousing. Finally, we reviewed several solutions to common challenges in real-time ETL processing. If you missed it last week, I will also be presenting this session at KScope 13 in New Orleans, June 23-27.
I’ve decided to split the topic into two posts, focusing first on an introduction to the architecture and tools being used. In the second post, I will describe in more detail how it all fits together.
Oracle’s Reference Data Warehouse Architecture
The Reference Data Warehouse Architecture (recently updated to include Big Data) has been covered well in the Rittman Mead blog, and most recently in Peter Scott’s RMOUG presentation, so I won’t go into too much detail here. Basically, there are three layers to the architecture: Staging, Foundation, and Access and Performance. The Staging layer acts as the “landing pad” for incoming source data, with exact copies of the current source tables and data. The Foundation layer also retains a copy of the source tables, but rather than storing only the current records from each source, the entire transactional history is captured, stored, and tracked with audit columns such as SCN and commit date. Finally, the Access and Performance layer is where the traditional star schemas are created and business intelligence tools, such as OBIEE, will connect and access the data warehouse data.
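As a rough sketch of the idea, a Foundation-layer copy of a hypothetical ORDERS source table might look something like this. The table, schema, and audit column names here are purely illustrative, not prescribed by the reference architecture:

```sql
-- Illustrative only: a Foundation-layer history table for a hypothetical
-- ORDERS source. Every source change is inserted as a new row, so the
-- source primary key is no longer unique on its own.
CREATE TABLE fnd.orders (
  order_id         NUMBER,          -- source primary key
  customer_id      NUMBER,
  order_total      NUMBER(10,2),
  jrn_scn          NUMBER,          -- source database SCN of the change
  jrn_commit_date  TIMESTAMP,       -- commit timestamp of the transaction
  jrn_flag         VARCHAR2(1)      -- I/U/D: insert, update, or delete
);
```

With audit columns like these, the full transactional history remains queryable, which is what makes reloading a star schema (or prototyping against the Foundation layer) possible later on.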
The key to the architecture is the aptly named Foundation layer. The source transactional history has been stored here since the inception of the data warehouse and can be used to reload the target star schema should new attributes or measures be added. It can also be used to support reporting against transactional schemas, as Stewart Bryson has described both in person (one of his RMOUG presentations) and in a series of blog posts. For example, using OBIEE to build a logical star against the Foundation layer tables can provide the end user a prototype of the report, allowing us to further define requirements before the final ETL is constructed.
Oracle GoldenGate and Oracle Data Integrator
Oracle GoldenGate is the standard Oracle product for data replication, providing log-based change data capture, distribution, and delivery in real time. GoldenGate captures transactional data changes from the database log based on the configuration set in the Extract parameter file, and loads them into a Trail File, GoldenGate’s own log file, using a platform-independent universal data format. The data is then read from the extract Trail File and moved to the target server Trail File using a process called a Data Pump. Finally, the transactions are loaded into the target database tables from the target Trail File using the Replicat parameter file configuration. The entire process occurs with sub-second latency and minimal impact to the source and target systems.
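To make the three processes concrete, here is a minimal sketch of the corresponding parameter files. All process names, trail paths, hosts, and schema names are invented for illustration; a real configuration would of course differ:

```
-- Extract: captures changes from the source database log into a local trail
EXTRACT esrc
USERID gg_user, PASSWORD ********
EXTTRAIL ./dirdat/sa
TABLE src_schema.orders;

-- Data Pump: reads the local trail and ships it to the target server
EXTRACT psrc
RMTHOST targethost, MGRPORT 7809
RMTTRAIL ./dirdat/ta
TABLE src_schema.orders;

-- Replicat: applies changes from the target trail to the target tables
REPLICAT rtrg
USERID gg_user, PASSWORD ********
MAP src_schema.orders, TARGET stg_schema.orders;
```

The separation matters operationally: because the Data Pump reads from the local trail rather than the database log, a network outage to the target does not stall capture on the source.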
Oracle Data Integrator is Oracle’s strategic product for data integration. Using the E-LT (Extract, Load, Transform) methodology to gather data from the source, load it into a staging area, and then perform transformations while loading the final target, ODI uses the power of the database to perform the work, rather than an additional ETL engine. ODI also has its own form of change data capture, which is implemented using a Journalized Knowledge Module. The JKM generates the infrastructure for the CDC, creating journal tables that store the change rows and views that provide access to the change rows for use in ODI Interfaces. ODI CDC can be implemented using various capture processes, such as triggers on source tables, timestamps on rows, or mining of the database logs via Oracle Streams or, in our case, Oracle GoldenGate.
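In broad strokes, the journal infrastructure the JKM generates looks something like the following. The J$/JV$ prefixes follow ODI’s naming convention, but the datastore name and columns here are a simplified, representative example rather than the exact DDL ODI emits:

```sql
-- Simplified sketch of an ODI journal table for a hypothetical ORDERS
-- datastore. ODI names these J$<table>; columns are representative only.
CREATE TABLE j$orders (
  jrn_subscriber  VARCHAR2(400),  -- subscriber consuming the changes
  jrn_flag        VARCHAR2(1),    -- I = insert/update, D = delete
  jrn_date        DATE,           -- when the change was journalized
  order_id        NUMBER          -- primary key of the changed source row
);

-- A companion JV$ view joins the journal table back to the source table,
-- so an ODI Interface can read the full changed rows, not just the keys.
```

An Interface sourced from the journalized datastore then simply filters on the journal view, consuming only the rows that changed since the last run.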
The integration between GoldenGate and ODI begins with the Journalized Knowledge Module, “JKM Oracle to Oracle Consistent (OGG)”. The JKM, delivered with ODI, will create the GoldenGate parameter files, provide instructions on how to implement GoldenGate, and generate the ODI CDC tables and views. The JKM, added to the Model, uses the ODI metadata to generate the GoldenGate parameter file mappings. This alone saves quite a bit of manual work and, of course, reduces the risk of typos caused by human error.
In the next post, I will describe how Oracle Data Integrator can be used to generate the GoldenGate parameter files and begin replication from source to both the Staging and Foundation layers, ending with several solutions to some common real-time ETL challenges.