Dynamic SQL

May 26th, 2009 by Peter Scott

One of my pet hates is the inappropriate use of dynamic SQL within ETL processes, for example building a command as VARCHAR2 string then using it in an “execute immediate” statement. Putting commands into strings whether hard-coded ones within the package or procedure, retrieved from some ‘code’ table in the database or even built on-the-fly can make support, enhancement and documentation tasks more complex – string content don’t show up in the “all_dependencies” view and that can make finding the code that manipulates data overly complicated.

I would be a hypocrite to say “never use dynamic SQL”. It does have its uses. A few years back I had to write my own materialized view refresh code for the special case of complete refresh of a single partition of a materialized view, or to be more precise the refresh of over 50 different partitioned materialized views – the pragmatic approach was to write a single procedure to use dynamic SQL to do the DML and to handle the refresh of a single partition.

Recently, I came across an even more compelling reason to use dynamic SQL – one of my customers needed to load several thousand flat files per day into their data warehouse. Each file had the same structure, we just needed to load them as fast as possible to a staging table and to persist the name of the original source file with the loaded data for data lineage purposes. Using Oracle external tables we have two viable options – rename the source files one-by-one to be the external table’s location name (there is also a similar approach of concatenating the files together to build a ‘massive’ external table) or to alter the external table location to match each of the incoming file names.

I prefer the ALTER TABLE approach as I don’t need to manipulate files in the OS and I can do all that I need in the database (albeit with a little touch of database Java to build a table of files to process). In pseudo code terms I:

insert a directory listing into a temporary table (using a database Java procedure)

for each filename that matches the filename format

    Alter the external table location to be the be the new filename

    insert the content of the external file into the staging table (append)

Loop to the next filename

Of course there are some other bits and pieces going on to log the activity and to move (or rename) files after processing so that they don’t get processed again if we need to rerun the process.

By processing both the ALTER TABLE and the INSERT INTO statements as dynamic SQL we avoid the overhead of repeatedly invalidating references to the external table and need to recompile any code that uses the external table;  and that is quite a saving in time over many, many files

Comments

  1. Gary Says:

    I prefer the rename of the underlying file option. You can do it with UTL_FILE.FRENAME (so no OS access needed). If you ALTER TABLE, you are using DDL so get an impliit commit and lose locks. Renaming the file won’t commit, release locks or invalidate code and will, with the OVERWRITE flag set to FALSE, fail if another process is already using the table for a different file.

  2. chet Says:

    Gary wins (I don’t know why everything’s a competition to me these days. :)

    I concur with Gary though. I tried it out before with the ALTER TABLE approach to change the location but found (realized?) there was the little implicit commit in there. Wrote it up last year even!

    http://www.oraclenerd.com/2008/05/validating-process-part-ii.html

  3. Peter Scott Says:

    Thanks for the input – it is so easy to forget the wealth of functionality in the supplied packages, and although I thought I knew UTL_FILE, the rename just slipped by.

    It will certainly be worth benchmarking the two approaches to see what is best for my circumstance. It is really a matter of the impact of the implicit commit – the code invalidation is not a problem for me as no code directly refers to the external table.

    Thanks again – it is feedback that makes blogging worthwhile!

  4. Adrian Billington Says:

    Changing the location of an external table is one of those “lightweight” ALTER TABLE operations that doesn’t invalidate your dependant code. I suppose if it did, the feature would be next to worthless. You could also wrap the ALTER TABLE in an autonomous transaction, but I think that’s been mentioned elsewhere and might be slower combined with an ALTER TABLE than the FRENAME method if you have thousands of files to deal with. Let us know the benchmark results!

  5. Peter Scott Says:

    I had a few minutes today to build a trivial test case – a three column source file (csv format, 10 rows) and cloned it 200 hundred times. The target table had the three source columns and columns for the original filename and a date stamp. Oracle 10.2.0.3 on a my “play” Windows VM

    The basic load procedure was to use a database Java call to insert the candidate files to load into a table and then loop with an implicit through the table processing each file in turn.

    The options:
    1) Dynamic SQL ALTER TABLE + Dynamic SQL INSERT
    2) Dynamic SQL ALTER TABLE + Dynamic SQL INSERT (Append hint)
    3) Dynamic SQL ALTER TABLE + SQL INSERT
    4) Dynamic SQL ALTER TABLE + SQL INSERT (Append hint)
    5) UTL_FILE.FRENAME + SQL INSERT + UTL_FILE.FRENAME
    6) UTL_FILE.FRENAME + SQL INSERT (Append hint) + UTL_FILE.FRENAME

    For the rename approach I renamed the file to the external table’s location then renamed again after loading – as this is what I would be doing in a real dataload

    The first thing to note if you use the append hint and conventional SQL in the loop is that you need to commit between each file as multiple direct path to the same table will fail – for multiple ‘small’ files the overhead of the commits may steal all the advantage (and more) of the direct path loading.

    So what approach is fastest in my very unreal 200 file load?
    Remember the files are ridiculously small and the load times of between 4.5 and 7 seconds gives an unreasonable degree of uncertainty, but fastest for me was number 1) followed 0.5 seconds later by 5) then option 3) which was slower again by 0.5 seconds – the real laggards where the direct path loads – I suspect that this may well change with increasing file size.

Write a comment





Website Design & Build: tymedia.co.uk