Extreme BI: Capturing Change with Oracle GoldenGate and ODI: Part 2 (Implementation)

In my previous post, I provided an introduction to the Extreme BI approach to change data capture from Source → Staging Layer and Source → Foundation Layer. I described the Oracle Next-Generation Reference DW Architecture and how we plan to implement an appropriate solution for agile data warehousing. Now it is time to get into the setup and implementation of Oracle GoldenGate via Oracle Data Integrator 11g.

GoldenGate Replication via ODI 11g: Getting Started

To recap where we left off, our solution begins with Oracle GoldenGate and its real-time replication capabilities, integrating with Oracle Data Integrator 11g and its journalizing change data capture implementation. The setup will be as follows, with the replication from GoldenGate loading both Staging and Foundation layers, and then the ODI CDC moving data through Staging into the Performance layer. We’ll focus on the GoldenGate piece of the solution in this post.

Oracle GoldenGate Solution

After the install and initial configuration of GoldenGate, installation of ODI, and setup of the databases representing each layer (source, staging, and foundation), I was ready to roll. For reference, I used one of the more detailed GoldenGate blogs, written by Gavin Soorma. The tutorial is very intuitive for the setup of GoldenGate in a Linux environment. Mark Rittman also touched on the subject, pre-ODI 11g, in a blog post here. To learn the integration between ODI and GoldenGate, I worked through a helpful Oracle By Example titled ODI11g: Combining Oracle Data Integrator and Oracle GoldenGate (NOTE: the OBE scripts were developed specifically for ODI 11.1.1.3). When all was installed and ready to go, I had created 3 models and 3 datastores, and was ready for replication.

Source, Staging, and Foundation Models

The three datastores are identical, with the exception of a few additional columns in the Foundation version of the table. These columns will be used to capture information related to each transactional change on the source.

EDW_SCN (System Change Number)
EDW_COMMIT_TIMESTAMP (when the change was committed)
EDW_TRANS_TYPE (DML type for the transaction)

As you can imagine, it might become a bit cumbersome to add 3 new columns to each datastore in your Foundation model, especially if you have hundreds of tables. To speed up development time, I was able to create a simple Groovy script within ODI 11g that loops through each datastore in the model, adding the appropriate columns. I'll save this one for a future post.

GoldenGate Replication via ODI 11g: Capturing Change

The key to the ODI / GoldenGate integration is a Journalizing Knowledge Module in ODI named "JKM Oracle to Oracle Consistent (OGG)". The knowledge module has a long list of commands to support the CDC functionality in ODI (Add to CDC, Add Subscriber, Start Journal, etc), including several very specific to the GoldenGate parameter file configuration (hence the OGG in the name). As you will come to see, ODI does most of the heavy lifting for you regarding the GoldenGate extract, datapump and replicat parameter file creation and setup.

By default, the OGG parameter files are set to propagate a change from the source to the target, keeping both tables in sync with each other. The great thing about ODI's Knowledge Modules is that they are very flexible and can be edited to perform almost any function. By adding a few lines of code to the replicat parameter file generation command, and mapping the transaction history capture columns, we can now use an enhanced version of the JKM to create the Source to Foundation layer replication. My modifications to the JKM were minimal: 1) additional code in one command to allow the tracking of history in the target database by inserting a new record for each change, and 2) an Option to determine whether we output this code during execution or not.

Below you’ll find the modified "Create apply pro (2)" command from my custom JKM. It is a little messy, but a quick comparison with the original and the description below should help you single out the code additions.

OdiOutFile "-FILE=<%= odiRef.getOption("LOCAL_TEMP_DIR") %>/<%= odiRef.getOption("SRC_OGG_OBJECT_GROUP") %>_to_<%= odiRef.getOption("STG_OGG_OBJECT_GROUP") %>/stg/dirprm/<%= odiRef.getOption("STG_OGG_OBJECT_GROUP") %>A#ODI_APPLY_NUMBER.prm" "-CHARSET_ENCODING=<%= odiRef.getOption("CHARSET_ENCODING") %>" -APPEND

<%if(odiRef.getOption("STG_OGG_TRACK_HISTORY_RM").equals("1"))
{%>
map <%= odiRef.getObjectName("L", odiRef.getJrnInfo("TABLE_NAME"),  odiRef.getOption("SRC_LSCHEMA"), "D") %>, TARGET <%= odiRef.getJrnInfo("FULL_TABLE_NAME") %>, KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, EDW_SCN)<%if (!odiRef.getOption("NB_APPLY_PROCESS").equals("1")) {%>, FILTER (@RANGE(#ODI_APPLY_NUMBER,<%= nbApplyProcesses %>,<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>))<% } %> INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
EDW_COMMIT_TIMESTAMP = @GETENV("GGHEADER" , "COMMITTIMESTAMP"),
EDW_SCN = @GETENV("TRANSACTION" , "CSN"),
EDW_TRANS_TYPE = @GETENV("GGHEADER" , "OPTYPE")
)
<%} else {%>

map <%= odiRef.getObjectName("L", odiRef.getJrnInfo("TABLE_NAME"),  odiRef.getOption("SRC_LSCHEMA"), "D") %>, TARGET <%= odiRef.getJrnInfo("FULL_TABLE_NAME") %>, KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>)<%if (!odiRef.getOption("NB_APPLY_PROCESS").equals("1")) {%>, FILTER (@RANGE(#ODI_APPLY_NUMBER,<%= nbApplyProcesses %>,<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>))<% } %>;
<%} %>;

The key to tracking history for each change is the INSERTALLRECORDS option, which tells GoldenGate to insert a new record into the target for each transaction. To make this work properly, be sure your database supplemental logging is turned on and set to capture all columns for each changed row.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS

To ensure the columns will be mapped properly, I set the USEDEFAULTS option in the COLMAP component, forcing like-named columns to map automatically. For the EDW_* column mappings, I used the @GETENV function, which returns information about the GoldenGate environment. The GGHEADER information type, from which I grab the COMMITTIMESTAMP (commit timestamp) and OPTYPE (DML operation/transaction type), provides info about the transaction environment of each record. The TRANSACTION information type, from which I get the CSN (the Commit Sequence Number, which for Oracle is the System Change Number), returns information about each source transaction.

I wrapped the creation of the map calls in an IF ELSE statement, using my newly created "STG_OGG_TRACK_HISTORY" option as the condition. This allows us to set the option to "true" when we want to track history and "false" when we do not. My updated JKM is saved as a copy of the original, with the name "JKM Oracle to Oracle Consistent (OGG) RM". The replicat parameter file created when tracking history is turned on will look like this:

replicat ODIF1A1
userid system,  password oracle
discardfile /epm/ogg_fnd/dirrpt/ODIF1.dsc, purge
sourcedefs /epm/ogg_fnd/dirdef/ODISC.def

map GGS_OWNER.SRC_CUSTOMER, TARGET FOUNDATION.SRC_CUSTOMER, KEYCOLS (CUSTID, EDW_SCN) INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
EDW_COMMIT_TIMESTAMP = @GETENV("GGHEADER" , "COMMITTIMESTAMP"),
EDW_SCN = @GETENV("TRANSACTION" , "CSN"),
EDW_TRANS_TYPE = @GETENV("GGHEADER" , "OPTYPE")
)
;

GoldenGate Replication via ODI 11g: Setup and Configuration

Out-of-the-box, the "JKM Oracle to Oracle Consistent (OGG)" Knowledge Module allows multiple targets to be loaded from the same source. We'll reuse this capability, with the new JKM Option "STG_OGG_TRACK_HISTORY" set appropriately for each model.

First, we set all of our options for the setup of the replication from Source to Staging. In the Model STAGING, select the Journalizing tab, set the Journalizing Mode to "Consistent Set", and choose the new "JKM Oracle to Oracle Consistent (OGG) RM" Knowledge Module, allowing us to set the options. Below I have only listed those options which need to be modified or are of importance to this setup, leaving any not listed as the default value. Note: depending on the OS you are using, the location of your OGG install, and your database names, my values may differ slightly from yours.

LOCAL_TEMP_DIR : /epm/TEMP
SRC_LSCHEMA : SOURCE
SRC_DB_USER : ggs_owner
SRC_DB_PASSWORD : ggs_owner
SRC_OGG_PATH : /epm/ogg
SRC_HOSTNAME : LOCALHOST
SRC_SETUP_OGG_PROCESSES : true
STG_OGG_OBJECT_GROUP : ODIT1
STG_MANAGER_PORT : 7910
STG_OGG_TRACK_HISTORY : false
STG_OGG_PATH : /epm/ogg_stg
COMPATIBLE : 10
ENABLE_ODI_CDC : true

We'll then add our table SRC_CUSTOMER to the CDC in ODI and add a subscriber, named "ODI", at the model level.

Finally, we start the journal at the STAGING model level. This will not only create all of the standard ODI journalizing objects, such as the J$ tables and views, but it will also generate the configuration (obey) and parameter files for use in GoldenGate replication. It also, conveniently I might add, creates a ReadMe.txt file that guides you through the GoldenGate replication setup. All files are generated in the temporary file location specified in your JKM options.

ODI creates a temp folder with GoldenGate files

After following the ReadMe.txt file instructions, we should be ready for replication. Now, when a record changes in the Source table SRC_CUSTOMER, the change will be propagated through to the Staging table SRC_CUSTOMER, with the change made in-place. Below, I change the Source version of my ADDRESS from "1221 Word St" to "1221 Word Ave". When I query the Staging table, the update has already been made.

To begin capturing these changes in the Foundation layer database, we need to follow the same steps as with the Staging, but with a few variations.

First, we setup the journalizing in the FOUNDATION model, just as we did for the STAGING model. Then, we set our JKM options.

SRC_SETUP_OGG_PROCESSES : false
STG_OGG_OBJECT_GROUP : ODIF1
STG_MANAGER_PORT : 7911
STG_OGG_TRACK_HISTORY : true
STG_OGG_PATH : /epm/ogg_fnd
ENABLE_ODI_CDC : false

Beyond the more obvious differences, such as the location of the Foundation GoldenGate install and the different Manager port, we have several important options to ensure our change capture replication works. We set the "SRC_SETUP_OGG_PROCESSES" to false, as the source extract is already in place. The new option, "STG_OGG_TRACK_HISTORY", set to true, will enable the additional code we added to the JKM to be implemented in the creation of the replicat parameter file. Finally, the "ENABLE_ODI_CDC" option is set to false. This will eliminate the creation of the J$ tables and views, and other objects that are used primarily by ODI for journalizing, since we have no plans to build ELT (Extract, Load, Transform) code using the Foundation layer data.

We then add the table to the CDC and start the journal at the FOUNDATION model level. No subscriber is necessary as we are not performing any CDC with the journalizing downstream. Follow the ReadMe.txt file instructions and the replication from Source to Foundation should be up and running.

Now, if I make another couple of changes to the SRC_CUSTOMER table, you can see that the update is again made to the Staging database table in-place. On the other hand, the old record with the previous value still exists in the Foundation database, and there is now a new record inserted into the SRC_CUSTOMER table for each change, with the EDW_* columns loaded with data about each transaction.

Source changes are captured as history in the foundation layer

With the transactions being replicated via Oracle GoldenGate, we now have a new and improved process for loading the Foundation layer and capturing change. The process setup was straightforward and much less time consuming than creating a set of Source to Foundation Interfaces. The ability for ODI to integrate with GoldenGate provides a simple, yet effective way to implement your change data capture in Extreme BI data warehousing.