Well, it’s been awhile since I’ve posted one of our Rittman Mead Data Integration Tips, so I thought a recent challenge might be the next great candidate. I was working through the Kimball ETL Subsystems and the Error Event Schema, Subsystem 5 if you’re familiar with the methodology, and attempting to build the schema from Oracle Data Integrator 12c (ODI 12c) metadata tables. If you caught my presentation “A Walk Through the Kimball ETL Subsystems with Oracle Data Integration” at Oracle OpenWorld 2015, then you’ll know a bit more about where I’m coming from. You can still catch my presentation on this topic at both IOUG Collaborate16 and ODTUG KScope16 (being invited to speak at each of these events is always an honor). Now this Data Integration Tip isn’t solely related to the Kimball ETL Subsystems, though the solution did prove rather useful for my presentation (and upcoming article in the IOUG Select Journal). It’s actually an interesting twist in how Oracle Data Integrator mapping metadata is stored differently between the ODI 11g and ODI 12c repositories.
The challenge is to find the target table, or Datastore, for a given Mapping in ODI 12c, or Interface in ODI 11g. When I say “find”, I mean query the ODI work repository tables and return the target table, or tables in 12c, for a given Mapping. Luckily, I’m equipped with some guidance from our friends at My Oracle Support. If you look at support document Oracle Data Integrator 11g and 12c Repository Description (Doc ID 1903225.1) you’ll find the data dictionaries for both ODI11g (184.108.40.206.0+) and ODI12c (12.1.2, 12.1.3, & 12.2.1). These are invaluable resources from Oracle - though the may be works in progress and somewhat incomplete!
Let’s take a look first at ODI 11g and how simple things used to be. Back when Interfaces were the mechanism for extracting, transforming, and loading, we were allowed only 1 single target Datastore. Ahh, those were the good ‘ol days! Digging into the repository we really only had one place to look for the target table - SNP_POP. This table in the ODI 11g repository, SNP_POP (which essentially stands for Synopsis - Populate), contains a column I_TABLE. This identifying column represents the target table for that particular Interface. Here’s a query that ties it all together.
select p.pop_name interface_name, t.table_name target_table, m.cod_mod model_code from snp_pop p inner join snp_table t on p.i_table = t.i_table inner join snp_model m on t.i_mod = m.i_mod;
As you can see, the key to capturing the target table for an Interface is simply in the SNP_POP.I_TABLE column. Because there is only one target, we can easily figure it out.
Now, ODI 12c is where the real challenge lies. As you may know, with the move from Interfaces in 11g to flow based Mappings in 12c, we were allowed to do new and exciting things, such as load multiple target tables from a single Mapping. We may also have a case where a Datastore component maps to a Filter component, which then maps to another Datastore component, etc. As you can see in the image below, we can have lots of tables, and lots of tables that may be sources or targets, but we’re only interested in the final target table (or tables, for that matter!).
Ok, so let’s dig into the Work Repository now. It seems an ODI Mapping can’t be that much more difficult than an Interface, right? Well…
First, there are quite a few tables related to the Mapping itself.
SNP_MAPPING SNP_MAP_ATTR SNP_MAP_ATTR_INFO SNP_MAP_COMP SNP_MAP_COMP_TYPE SNP_MAP_CONN SNP_MAP_CP SNP_MAP_CP_ROLE SNP_MAP_DATA_TYPE SNP_MAP_EXPR SNP_MAP_EXPR_REF SNP_MAP_PROP SNP_MAP_PROP_DEF SNP_MAP_REF SNP_MAP_REF_PP
Whoa…we’ve got some work to do. Lucky for you, I’ve already done the work. Let’s look at how it all fits together. We can start with the Mapping (SNP_MAPPING) table. We also have different components on the Mapping, such as Lookups, etc, so we can join in the table SNP_MAP_COMP as well. Here’s the information we’ll be able to see with that simple join.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping
That’s interesting, we’ve captured all components in the mapping. But there are still quite a lot of non-targets here. Ok, maybe if we add the connection points for each component we can find the input and output for each. Components each have an input connector point, allowing an output from a different component to flow into it.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp
That just added the connection point information for each component and whether it is an INPUT or OUTPUT. Not extremely useful by itself, so let’s dig a bit deeper. How about adding the SNP_MAP_REF table? This table seems to contain a reference to all types of other attributes about the Mapping and its Components. We also need to consider that Datastores, just as any other Component, will have both an input and output. Right now, the dataset shows both the input and output connectors for each Component. Let’s remove the input connection points to limit our result set.
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref where cp.direction = 'O' --output connection point only
Joining the reference table has now allowed us to focus on the Datastores in the Mapping and their OUTPUT connection point. What I really want is to see only the Datastores that do not have their OUTPUT connection point connected to an INPUT connector. Therefore, if the OUTPUT is empty, it must be the target table!
select ... from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref where cp.direction = ‘O' and --output connection point only. cp.i_map_cp not in (select i_start_map_cp from snp_map_conn) --not a starting connection point
The SNP_MAP_CONN, which stores the mapping connections, will allow me to limit the query to the components that only have an output, but not an input. The connections table will contain all component connections in the ODI 12c mappings. Here’s what we get as a result.
Hey, now we’re onto something here. In fact, this is what I was looking for! Target table(s) in a single Mapping in ODI12c. Not quite as simple as ODI11g, but with a bit of SQL and understanding of the repository tables, you can do it. Here’s the final query again, joining in the SNP_TABLE & SNP_MODEL tables to complete the dataset.
select m.name mapping_name, mr.qualified_name, mc.name datastore_alias, t.table_name target_table, mdl.cod_mod model_code from snp_mapping m inner join snp_map_comp mc on m.i_mapping = mc.i_owner_mapping inner join snp_map_cp cp on mc.i_map_comp = cp.i_owner_map_comp inner join snp_map_ref mr on mc.i_map_ref = mr.i_map_ref inner join snp_table t on mr.i_ref_id = t.i_table inner join snp_model mdl on t.i_mod = mdl.i_mod where cp.direction = 'O' and --output connection point cp.i_map_cp not in (select i_start_map_cp from snp_map_conn) --not a starting connection point ;
Please let me know if you find this Data Integration Tip useful or if you have a better way of accessing the target table in a mapping. One of my Rittman Mead colleagues asked, “why not just use the ODI Java API?”. For accessing the ODI repository, I do prefer using some Groovy script and the API. But in this case, I’m interested in building out a dimensional schema and writing ETL to load the dimensions and facts, which lends itself to SQL rather than Groovy script.
As always, if you’re team needs help around Oracle Data Integrator, or Oracle Data Integration Solutions in general, drop us a line at [email protected]. Or feel free to reach out to me directly via email ([email protected]) or Twitter (@mRainey). Cheers!