Thoughts on Oracle Data Integrator

March 26th, 2007 by Mark Rittman

I’ve been working fairly solidly with Oracle Data Integrator over the past few weeks, and I think I’ve used it enough now to form a few opinions on the product. I did some work with a client who wanted to evaluate it for it’s ability to handle Changed Data Capture, I’ve been writing an article for it for OTN and I’ve been scoping out the chapter on it for the forthcoming book. All of these tasks have required me to actually deliver something using it where I couldn’t just change the requirement if the tool didn’t do it well, so it’s been about as close an exercise as you can get to without actually using it on a real project. I’ve also had a chance to discuss how the tool is used with some of the ex-Sunopsis people who now work at Oracle, so hopefully I’ve got a good idea on how the tool is being positioned going in to the future.

Just to recap: Oracle Data Integrator is an ETL tool that Oracle acquired when it took over Sunopsis late last year. Data Integrator is being positioned as being the ETL tool to use as part of Oracle Fusion Middleware, as it’s just at home loading data into a non-Oracle database as an Oracle one, it can read from and publish to Web Services, it’s repository can reside on more or less any database, but it still uses the OWB paradigm of using the target database to do the ETL processing. In a way, it’s like OWB but written in a database-agnostic way, which I guess is why Oracle were interested in it in the first place. I wrote about ODI in it’s previous Sunopsis Data Conductor guise late last year, this link contains an archive of all the relevant blog postings.

If you drew a venn diagram of the functionality of OWB and the functionality of ODI, you’d get a considerable overlap, with the intersection being data integration, extracting from all different databases, loading to Oracle databases, data quality and error handling, support for OLAP concepts such as loading slowly changing dimensions and the creation of processes and workflows.

ODI would have as it’s unique capabilities the ability to load non-Oracle databases, native support for Changed Data Capture on both Oracle and non-Oracle databases, support for real-time ETL processes, built-in support for Web Services, and native support for extracting from non-Oracle databases (i.e. without the need for Oracle Gateways, or through ODBC). OWB would have, as it’s unique set of features, data modelling including OLAP data modelling, support for Oracle OLAP, a richer set of data mapping operators, the data profiler, transportable modules and change management/metadata change propagation.

So, one way of positioning ODI would be to say that it’s an alternative to OWB when performing the data integration and staging part of your warehouse build, particularly when you need to:

  • Capture new and changed data from a database
  • Source data from non-Oracle databases using native utilities
  • Capture data from, or publish transformed data to, a Web Service
  • Publish real-time ETL processes that respond to data pushed through them

One way of visualizing this is using the following diagram of the Oracle DW technology stack.

ODI as part of the Oracle DW Stack

As you can see, ODI is in a way a potential replacement, or supplement, to the Enterprise ETL Option for Warehouse Builder - especially as, in the 10.2.0.1 and 10.2.0.2 release of OWB, Slowly Changing Dimension support has a number of bugs, and the Change Data Capture and Real-Time mappings features originally in the beta version were pulled prior to the production release. In this case, it makes a fairly good case for replacing at least some of the more complex and technical aspects of OWB’s functionality, at least in the stages prior to loading into the warehouse structures proper, and it’s even more of a compelling story if you’re dealing with non-Oracle sources and targets or indeed a SOA environment.

So, where ODI fits in with the existing Oracle DW stack is as a complement, or an alternative, to some of the functionality OWB provides up until you get to the ODS and analytic layers of the warehouse. Beyond that, that’s where OWB comes in to it’s own; the OLAP data modelling features, support for building Discoverer EULs and BI Beans presentations, support for loading AWs. These are areas ODI leaves well alone, which means that, one option going into the future is to use ODI together with OWB on your more complex data warehouse projects, perhaps paying for ODI ($12k per target database CPU, $4k per source database CPU) by foregoing the Enterprise ETL license ($10k per CPU on each database OWB is deployed to). I suspect Oracle would prefer you to license them both, but I’d say if your data integration needs are more complex than OWB can handle, consider ODI as a way of building the staging layer of the data warehouse, with OWB used to build the layers from the ODS up.

Data Quality is another area that both tools handle well, but differently. As you probably know, OWB now comes with the Data Quality Option, which apart from the well-known Data Profiler, also allows you to define error-handling on all your target objects with error rows neatly moved off into error handling tables. Although it uses PL/SQL and the Exceptions Into clause, in 10g this sort of code is just as fast - if not faster - than LOG ERRORS and it also allows you to keep all your error handling in one place, defined as data rules, rather than scattered all over your mappings.

ODI takes a slightly different approach. What it does is allows you to define “virtual” constraints on tables that ODI enforces, which are then monitored by Control Knowledge Modules with erroneous rows again moved off in to error tables. What’s good about ODI’s approach is that it effectively “firewalls off” erroneous data contained in source tables, such that they get copied on to error tables in a separate, workarea schema on the source database, and never get so far as to be loaded into the target environment.

An ODI Check Control

This has the advantage of reducing the amount of data loading the target database has to carry out, as it doesn’t have to first load all the data, errors and all, into the target database before finding out which rows contain errors. This is particularly useful if your load window is small and you could do without unneccessarily processing rows that contain errors anyway - these are “firewalled off” by the virtual constraint and control knowledge module and never come near the target database. Where OWB comes in to it’s own though is the Data Profiler - with ODI, you need to know in advance what data rules (i.e. constraints) to apply, OWB on the other hand helps you determine them using the structure and semantics of your data.

Another area I found interesting with ODI was the support for Changed Data Capture. Some of the OWB old hands will know that CDC was originally going to be a feature of OWB10gR2, along with Real-Time mappings, but these were pulled from the product during the beta cycle, presumably so that they could actually get the product out of the door in a reasonable time. ODI however has pretty much full support for Changed Data Capture, both for Oracle databases and for other ones; for Oracle data sources, you can either capture changes using triggers, or using the Asynchronous Hotlog Changed Data Capture process I blogged about last year, with ODI acting as a graphical front-end for the process. I’m currently writing an article on this for OTN and so I won’t go into the full details, but in short the process for setting up CDC in this way on an Oracle 10g data source is as follows:

Firstly, enable your data server model (ODI terminology for an Oracle schema) for Journalizing, using either the simple trigger method, or the more featured consistent method that allows the capture of sets of tables using the LogMiner feature of Oracle 9i and 10g.

If you’re brave, ODI will even configure the source database to support Asynchronous Hotlog Change Data Capture, although it’s not recommended for production systems as it bounces the database half way through (that’ll keep your users/developers on their toes…)

Then, you select one or more tables to be part of the CDC set, and then use the Designer application to start the journal. Taking a look at the steps ODI takes to create the journal, you can see that it sets up CDC, in the standard way that Oracle 9i and 10g supports it, creates subscriber tables and sets up the capture and publish processes - pretty much all the things I did manually in my previous article.

CDC Setup 2

Now I’ll be honest here, in that it took me a pretty long time to get this running properly, but the problems were more problems with the complexity of CDC on Oracle - getting the capture processes set up properly, clearing out failed setups when I was working out how to do it - rather than issues with ODI per se, although one annoying aspect of ODI did make things unneccesarily difficult. Basically, when you create your connections through to the source Oracle database, you naturally create them using the credentials of the source Oracle user account. The problem comes though when ODI tries to write data to the workarea schema it creates on the source database; it needs the CREATE ANY TABLE, SELECT ANY TABLE, DELETE ANY TABLE etc priviledges, together in fact with the DBA role if you want it to configure the database for you, which in production is a bit of a no-no. In fact, to get it working properly, I had to create the source database connection using the SYSTEM account credentials; the advice from Oracle is to do this initially, then set it back to the credentials of the source schema after setup is complete, but it was a bit of a hassle, and something that prolonged the exercise by about a week, to be honest. So the advice here is, especially if you’re using CDC - when you create your Oracle data server, use the SYSTEM account as the login (apparently the password is encrypted when stored in the repository, so it’s not the end of the world).

Anyway, once you’ve set up journaling, actually selecting from the new and changed data is a doddle - none of the fiddling around selecting from subscriber views that you get if you set it up yourself. You still need to extend and then purge the subscriber window, which you can do manually, or incoporate into a process flow, but then you just tick a box to say “work with journaled data” when adding the data store to an interface, and bob’s your uncle.

CDC Setup 3

Beyond that point, you just deal with new and changed data the same way as data from the original table - a neat solution.

The obvious next question though, once you’ve got changed data capture working, is how to integrate the trickle of changes into your target database in real-time. In OWB, assuming you’d got CDC working through setting it up manually outside of OWB, you’d then probably design a mapping that read from the subscriber views and ran on a regular basis, say every minute or so, giving you latency of around a minute, depending on how long the CDC propagation and your mapping took. ODI takes a different approach though, in that you can define a process flow that reads from the journaled table, but then waits at the start for some data to appear in the journal before executing. Therefore, the process can be set to run, listen for changed data, and then kick-off when either a certain number of journal rows appear, or after a set period of time, whilst at the end the last step in the process kicks the process off again.

ODI Real-Time

So, in conclusion, ODI looks like a pretty nifty additional toolset when working with OWB on a data integration project, or when building a data warehouse when the integration and staging element is non-trivial. If you need to deal with non-Oracle sources, or you need to capture changed data (possibly in real-time) on Oracle or any platform, or if you’re doing data integration in a SOA environment, it’s a handy box of tricks that makes the Oracle stack a more capable data integration platform. Of course when Oracle isn’t the target platform, it makes things possible that in reality just aren’t practical with OWB, and reading the tea leaves ODI is going to be the integration point for the metadata used for OBI EE and for loading data in to Hyperiod Essbase. In one sense, it could be seen as an alternative to Oracle Gateways or the Enterprise ETL Option for OWB, but in reality, I hope on projects in the future it’s an additional piece of kit that we use alongside OWB’s features, as I think the two tools complement each other, once you get to know what each of them does well.

Comments

  1. Nathalie Roman Says:

    I’m glad you’ve made a comparison between OWB and ODI which is a comparison from the business intelligence perspective. I’ve made the same comparison but from another point of view, my question was which tool is most suited for my project, for my use case. I’ve listed up the different possibilities there were using Oracle and ended up with ODI being the solution that addressed my requirements the most.
    On a day by day basis I’m learning to work with ODI and getting to know the different features, etc. and until now I’m still happy with my decision ;o)
    It’s a little bit positioned in the SOA-world, integration with existing datasources, transformation to target-datasources, … the same you would accomplish when using ESB.
    If you’ve got other experiences with the tool you want to share or you’ve got advise for my migration path, my project feel free to comment on my blog. I’m frequently posting my experiences with ODI on the blog to get feedback on best practices, way of working, etc.

  2. Mark Rittman Says:

    Hi Nathalie,

    Thanks for getting in touch. I had noticed your earlier postings on ODI, I’m keeping an eye on them to see how your experiences correlate with mine.

    Good luck with evaluating ODI, and as you say, I’ll add comments to your postings if I can add anything of value.

    regards

    Mark

  3. Sangeetha Says:

    Hi

    I’m working on setting up CDC in ODI. Can you explain (with steps) how did you set up the package and interface in the example snapshots that you have covered in this article?

    Thanks
    Sangeetha

  4. Mark Rittman Says:

    Sangeetha,

    I actually put the CDC steps together for an OTN article that will be published in the next few weeks. Keep an eye on this site, and I’ll post a link to the article when it’s on OTN.

    regards

    Mark

  5. Shema John Says:

    Mark,

    Am part of Sangeetha’s team, where we are in the solution design phase for a client. We are keen on using CDC to capture changes to multiple tables (in lets say 1 min. intervals) and form a sinlge XML message with the captures and send it across to the SOA ESB/BPEL.

    We have managed to create a journalized table, but we are not able to proceed furthur to get a process(ESB?) poll the journalized table and pick.

    If you could give us clues to those couple of steps, would really help us confirm quickly to the client that CDC is a good option for changed data captures from multiple data sources.

    Regards
    Shema

  6. Mark Rittman Says:

    Shema, Sangeetha - I’ve sent you a document via email that explains how it works.

    regards

    Mark

  7. Ravi Says:

    Hi Mark,

    Any thoughts on using ODI to integrate with SAP R/3 system ? Do we need to use Iway adapter ?

    Any documentation you have would be appreciated.

    Thanks
    Ravi

  8. Neha Says:

    I am trying to access remote file structure , present in AIX server through my window ODI client .But I am unable to create file Data Store for the same.

    It’s showing error that “schema does not exist”.

    Please suggest

  9. Vikash Says:

    Hi Mark,

    I am trying to evalute the ODI tool for the conversion and interfaces purpose while implementing the R12 Oracle Enterprise Suite, am but skeptical of using this tool for this implementation as I am not sure if all my requirements will be met by ODI, can you give your inputs of utilising ODI for the above context, also would like to know the scenarios where this tool can be used .

    thnx

    Vikash

  10. Raghava Says:

    Good information about the OWB and ODI

  11. Jeff Says:

    Good article! One thing we debate at my work is choice of ETL vendor. We use OWB and oracle data warehouse and ODI appears to have some interesting features. Our debate goes like this “should the ETL tool integrate more closely with the database or the data access tools”. We use Business Objects for data access and some feel we should be considering their Data Integrator product as their being sold on the intgegration with the BO product stack. I feel it’s more important that the ETL and db work well together.
    Any opinions?
    jeff

  12. Peter Scott Says:

    Jeff - my take on this is that some of the integration tools from query tool vendors such as Cognos and BO are designed to work with any target database and have the “lowest common denominator” approach to coding this is often comes out as row-based simple SQL. I would prefer to implement good database specific code for my ETL process - it is here where I want the best use of my batch window so I favour product that target the code generated to platform

  13. Leo van Gaal Says:

    Great article! I downloaded ODI to solve an old problem: migration of a dBaseIII fiel with a memo field into an oracle table.
    are knowledge modules avaiable for this situation ?

    thans in advance.

    Leo

  14. mahesh Says:

    It’s great to read Mark’s evaluation on ODI. I am in a process of synchronizing data between two applications which do not expose internal data schemas but a web service and SOAP API list are provided for manipulating data. For this use case, as a POC I tried to invoke the provided web service and create a XML data server. But I could not able to invoke any method except login method as other method calls need authentication (other methods do not login info arguments). I still cannot find a way to store login() response (URL and sessionId) temporary and post subsequent method calls to that url (like we do in a proxy in web services client application).

    May be I’ve not explained well the situation, but anyway basically what I need is to create a ‘Data Server’ from data which acquire from an external web service and proceed with interfaces in ODI.

    I follow Nathalie’s blog too, but those areas have not been discussed yet.

    Thanks.
    -mahesh

  15. Simone Says:

    Hi, I’ve got a problem.
    When I try to journal a datastore which its name is 24 o 25 characters , I got the error
    “This datastoreis not journalized on this context”. Why?

  16. matt maxson Says:

    Hi Mark,

    Good article. Has anyone used the new Hyperion HFM and Essbase adaptors for ODI to push data from ODI to these source databases? Or are people using HFDM to do this from ODI?

  17. bogiadispacy Says:

    Hi
    I’m working on setting up Sync CDC in ODI but I don’t know to capture data . So can you sent me the document to explain how Sync CDC is set up in ODI
    Thanks

  18. Harsha Says:

    Hi,
    Whether ODI supports oracle 9i Database r not?