June 13th, 2011 by Mark Rittman
About a year or so ago I posted an entry on this blog highlighting the new architectural features of Oracle Data Integrator 11g. At the time, I talked about the new user interface, integration with WebLogic Server and Enterprise Manager, and how the Repository Creation Utility could be used to create the master and work repositories. As well as these high-level new features though, there were a number of more detail-level features that could prove useful for developers creating data integration routines with the tool.
One such new feature is the ability to join together two or more data sets using set operators, such as UNION, INTERSECT or MINUS. To take an example, consider a situation where you have some order data sitting in a table that you want to load into a staging area, like this:
Now imagine that you had some additional orders data, but this time it was sitting in a file, containing the same columns but obviously stored separately to the table data. You could read from the two sources over two separate interfaces, or in ODI 11g you can press the Add/Remove Dataset… button, like this:
Pressing this button brings up a dialog that lets you give each dataset a name, add a new dataset reference and then select the set operator to combine them, from the list of UNION, UNION ALL, MINUS and INTERSECT.
Once you’ve added this second dataset reference, ODI adds a tab to the bottom of the Source Datastore canvas, and you can then drag and drop the file datasource into your interface, with the two sources then being combined through your chosen set operator.
Another new feature is Temporary Interfaces. If you saw my postings on the 18.104.22.168 release of the Oracle BI Applications back in 2009, you’d have seen how Oracle reproduced the maplets feature within Informatica with temporary, or “yellow” interfaces. Temporary interfaces are different from regular ODI interfaces in that they can become a data source for another interface, allowing you to encapsulate mapping functionality and re-use it across wider ETL processes. Temporary interfaces have now made their way into ODI 11g and are a handy way of breaking down a more complex mapping procedure into more manageable chunks.
To take an example, in the example above we combined a table and file source into a single interface source through a set operator, which then provided us with some order line item informaton. Now consider a situation where we want the output of this interface to be available as a data source for another interface, so that we can combine the line-level data with some order-level information such as the order date, salesperson and ship date. To turn the original interface into a temporary interface, ensure that you don’t drag a target datastore into the interface, instead drag and drop the columns you require into the target datastore area, and in the Temporary Target Properties panel at the bottom of the screen, give the output table a name, for example ORDER_DETAILS_TEMP.
Now, when the temporary interface is executed, it creates an populates in this case a table called ORDER_DETAILS_TEMP, and if you include the temporary interface in another mapping, you can join to it and read from it just like any other datastore.
The default behaviour for temporary interfaces is to persist the intermediate results (the output of the temporary interface) in a database table, which in the example above, would make sense as part of the data is coming in from a file. But in the case where the temporary interface is only working with table data, it would be nice if we could render the temporary interface as just a SELECT statement, which we could embed in the calling interface as a sub-query or sub-select.
This is actually now possible with ODI 11g, using a feature called “Derived Select for Temporary Interfaces”. To take an example, consider a situation where you are building up a complex, multi-step interface where first, you aggregate orders by customer, and then you want to add an additional column to this dataset containing the customer order rank. You could start by creating a temporary interface that aggregated orders by customer, and then create a second interface that has the first temporary interface as a data source.
Then, instead of having the first temporary interface persist its output as a temporary table, you instead navigate to the Source Properties panel in the Interface Editor, and select the Use Temporary Interface as Derived Table (Sub-Select) checkbox, like this:
The benefit of going down this route is that you’ll cut down on the disk I/O, and disk space required, by removing the need to stage the intermediate results to a temporary table. The downside is that the SQL used in the main interface is going to be more complex, might take up more memory and might be a bit trickier to debug if there’s an error. But in general, if you’re using a fairly beefy database server to do your ETL, this new feature will make sense more often than not.
So there you go – some new interface and ETL features in the initial 22.214.171.124 release of ODI 11g. But hold on – as Peter Scott mentioned, there’s just been a new 126.96.36.199 release of ODI as well, so we’ll take a look tomorrow at two new features that come with this updated version – Load Plans, and OBIEE Lineage.