Cardinality of Rowsources

One of the things I sometimes come across when looking at legacy ETL code for data loading is the misuse of query hints to "improve" data load performance. Sometimes DBAs or developers do not remember that "tuning the select" may not be the same thing as "tuning the insert"; Jonathan Lewis wrote about an example of this in December 2008.

The circumstances Jonathan describes is commonplace in some data warehouses; an insert into a DW table from a remote (database link) source joined to, or filtered on, some rows from a table on the target data warehouse. In the post's comments there was some debate on why Oracle needed to ignore the driving site hint of this type of query when obeying it would no effect on the insert (other than an improved query plan). There were also some suggestions to get around this such as creating a remote view to do the filter or join, which has the disadvantage of needing to create objects on the source database (views and database links) and the use of  pipelined functions on the target system to act as the rowsource for the insert.

I like the pipelined function approach and although there a few quirky things about parallel inserts it does present a good way forward that is not invasive on the source database. Well, it is a good way forward except that cardinality of the row source is often wrong. So I especially liked the piece just published by Adrian Billington on ways to set cardinality on piplined and table functions. If you join to table functions in your queries I urge you to take a look at what Adrian has to say.