Using Data Pump and Transportable Tablespaces with OWB10gR2 “Paris”

One of the new features in Oracle Warehouse Builder 10gR2 "Paris" is "Transportable Modules". Transportable Modules are source modules that get their data from remote databases, with the data being brought into your warehouse environment using either transportable tablespaces if your source and target databases are Oracle 9i or higher, or data pump, if both source and target databases are 10g or higher. Note that both source and target database have to be the same version (different releases within the same version are OK though.)

The idea behind transportable modules is that they allow you to rapidly replicate data between your source databases and warehouse environment, without having to use flat files or manually set up FTP scripts. When you transport data from an Oracle 9i schema, Oracle Warehouse Builder 10gR2 uses transportable tablespaces to move your data, and when you're on 10g or higher, you can either still use transportable tablespaces, or you can use data pump, the successor to imp/exp. So how does this work?

In this example, we've got a schema in a remote database called HRDATA that contains a number of tables. We want to transport these tables into the warehouse environment so that we can include them in the warehouse load. We could just export them to flat files and copy them across, or even create a database link, but in this case we want to replicate the schema into our warehouse database and work on it from there. As both databases are 10g (the source database is 10.1.0.2, the target is 10.2.0.1) we'll use data pump to move the data across.

Going into Oracle Warehouse Builder 10gR2, the first step then is to right-click on the Transportable Modules node and select "New..."

Next, we name the module and provide a description:

The next step is to provide OWB with details on how to connect to the remote database. The user that we connect as has to have the EXP_FULL_DATABASE and ALTER_TABLESPACE roles - I connected as SYSTEM which had the former by default, but I had to explicitly grant it the latter.

Note also the FTP username and password that it requires. The Transportable Modules feature works by moving files around in the background via FTP, or at least it does when you choose to move your data via transportable tablespaces - when it uses data pump I expect it could just connect directly from source to target, therefore not needing FTP, but I'm not sure if OWB implements it using this quicker method. Also, when I first tried this, I connected as the schema owner that contained the data, but as you have to grant EXP_FULL_DATABASE and ALTER TABLESPACE, I thought it safer to leave the source schema with it's default permissions, and connect using SYSTEM instead.

Anyway, once you've entered the details for the source database, you can do the same for the target. Again, I connected as SYSTEM here, as the transportable module routine goes and creates the schema you're going to load into anyway, which will have the same name as the schema you're loading from. Also, even though I used the SYSTEM account, I had to grant a bunch of roles to the account before I could use it as the target location:

SQL> grant connect to system;

Grant succeeded.

SQL> grant resource to system;

Grant succeeded.

SQL> grant imp_full_database to system;

Grant succeeded.

SQL> grant execute_catalog_role to system with admin option;

Grant succeeded.

SQL> grant create materialized view to system with admin option;

Grant succeeded.

SQL> grant alter tablespace to system;

Grant succeeded.

Note that the module isn't going to load your source tables into the SYSTEM schema, it's just going to connect as SYSTEM when it brings data into your target database; when you set the configuration options later on, you can define the specific schema in your warehouse environment that you want to load data in to.

Once you've set this up, the wizard looks like this:

Now that you've defined the source and target connections, the next step is to pick the objects that you want to transport. The next dialog that comes up gives you a listing of the tablespaces in the source database, and you can then go and pick either a whole schema, or individual tables, to bring across. The fact that your initial listing is of tablespaces is because the primary use for this module is to facilitate transportable tablespaces, but it works just as well if you're going to use datapump, as we are.

Once you've completed all the details, OWB then goes away and reads the source table metadata, so that you can work with it in your project.

At this point, there's no source data in your target warehouse, all you've done is define the sources and targets and brought across the table metadata. The next step then is to deploy the module, but before you do that, you need to set some configuration options, by right-clicking on the module you've just created.

As we're not going to use transportable tablespaces and use data pump instead, I untick the "Transport Tablespaces" tickbox, and change "What To Deploy" to "Tables Only" as I only want to bring across tables of data, not all the other objects in the schema.

The second part to the configuration options lets you define what local schema the replicated data will go into; in my case, I want to redirect the data into a new schema called "HRDATA_SRC" which OWB will create for me when I deploy the module.

You can of course specify a schema that's already created, and if I had a general "STAGING" schema I wanted to load data into, I could specify it here. For what I'm doing though, I want to create a separate schema to hold the replicated data, so I give it the name, password and default tablespace, set the degree of parallelism for data pump, and press OK.

Now that the module has been defined, you can then view the table metadata in the Design Center, like this:

... then finally, use the Control Center Manager to deploy the module and copy the tables across.

Now, when I connect to the warehouse environment, I can use SQL*Plus to check that the data is there.

SQL> conn src_hrdata/password@ora10g
Connected.
SQL> select table_name
  2  from   user_tables
  3  /

TABLE_NAME

EMPLOYEES
JOBS
JOB_HISTORY
COUNTRIES
REGIONS
DEPARTMENTS
LOCATIONS

7 rows selected.

SQL> select *
2 from job_history
3 /

EMPLOYEE_ID START_DAT END_DATE JOB_ID DEPARTMENT_ID


    102 13-JAN-93 24-JUL-98 IT_PROG                                   60
    101 21-SEP-89 27-OCT-93 AC_ACCOUNT                               110
    101 28-OCT-93 15-MAR-97 AC_MGR                                   110
    201 17-FEB-96 19-DEC-99 MK_REP                                    20
    114 24-MAR-98 31-DEC-99 ST_CLERK                                  50
    122 01-JAN-99 31-DEC-99 ST_CLERK                                  50
    200 17-SEP-87 17-JUN-93 AD_ASST                                   90
    176 24-MAR-98 31-DEC-98 SA_REP                                    80
    176 01-JAN-99 31-DEC-99 SA_MAN                                    80
    200 01-JUL-94 31-DEC-98 AC_ACCOUNT                                90

10 rows selected.

Just to recap then, the Transportable Module in OWB "Paris" lets you bring in to your warehouse database, either via transportable tablespaces or data pump, data from a remote Oracle database. The reason you'd use this, as opposed to connecting via a database link, is that transportable tablespaces and data pump are far quicker at bringing across large amount of data, and the data then sits in your warehouse database, rather than you having to keep going back and forth to the remote database.

In a few days time, I'm going to follow this up by taking a look at setting up Change Data Capture with OWB "Paris".