A Frustrating Couple of Days
If you read Tom Kyte's blog the other day you'd have seen the entry where he asks you to guess where he's currently staying, using a photo taken out of his hotel bedroom window. Well, in the same spirit, here's one taken out of the window of the office where I've been working for the last few days - can you guess where it is? A clue is the unusually-shaped cathedral in the top left-hand corner of the picture. Another clue is that it's rained for all of the last three days. Eat your heart out, Las Vegas.
Frustrating three days, actually. I was called up to help on a project where a bulk data load (for a data migration) was taking a disproportionate amount of time. The requirement was to load a few tens of thousands of records into existing tables in a transactional schema, the sort of thing that would normally only take a few minutes to run. For this data load though, just a couple of thousand records took many hours, the reason being that the tables being loaded into each had triggers attached to them that went on to populate a series of interface tables. The triggers worked fine for normal OLTP transactions; each trigger that fired maybe added a tenth of a second to the transaction time and no-one noticed the difference. But when you bulk loaded into the tables, performance was all over the place. Disabling the triggers wasn't an option as the interface had to be maintained, but leaving them in place meant the load couldn't happen in the batch window available. And just to add to the challenge, if I ran the load to see which particular tables took the longest, I had to ensure that the schema was returned to its previous "unloaded" state at the end, as it was needed for something else later in the week.
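To put some purely illustrative numbers on it (the real figures varied from table to table): 20,000 rows x 5 triggers x 0.1 seconds per firing is around 10,000 seconds, close to three hours, and that's before you count the triggers that cascade on to yet more tables.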
Looking at the triggers, the problem boiled down to two things. First, each trigger caused lots of separate selects, inserts and updates against other tables, using code that wasn't particularly efficient. Second, the triggers fired for every row that was inserted, updated or deleted, which kicked off a lot of database activity for each row of data that went into the schema. For simple, single transactions, as I said, the effect wasn't noticeable, but if you suddenly try and force through thousands of transactions in one go, you're out of luck.
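To give a flavour of the pattern (the table and column names here are made up for illustration, not taken from the actual system), each trigger looked roughly like this - a row-level trigger doing its own lookup and single-row insert for every row loaded:

    CREATE OR REPLACE TRIGGER trg_orders_air
    AFTER INSERT ON orders
    FOR EACH ROW                               -- fires once per inserted row
    DECLARE
      v_cust_name customers.cust_name%TYPE;
    BEGIN
      -- an extra lookup for every row that gets loaded
      SELECT cust_name
      INTO   v_cust_name
      FROM   customers
      WHERE  cust_id = :NEW.cust_id;

      -- plus a single-row insert into the interface table on top of that
      INSERT INTO orders_interface (order_id, cust_name, order_date)
      VALUES (:NEW.order_id, v_cust_name, :NEW.order_date);
    END;
    /

Multiply that by the number of triggers on each table, and by the number of tables each trigger touches in turn, and a single bulk insert turns into a flood of row-by-row recursive SQL.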
The first thing that struck me was how hard it is to sort out performance when your system has lots and lots of triggers. You can try and manually trace the path that the data goes through, but when each table that's touched potentially has triggers that touch other tables, it's not an easy task to work out the impact of a single DML operation. A further look showed that all our operations were inserts, but a large proportion of the triggers only affected other tables when the DML operations were updates or deletes. Tricky.
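One thing that did help map it out was the data dictionary; a query along these lines (table names again hypothetical) at least lists which triggers sit on the loaded tables and which DML events actually fire them:

    SELECT table_name, trigger_name, trigger_type, triggering_event, status
    FROM   user_triggers
    WHERE  table_name IN ('ORDERS', 'ORDER_LINES')
    ORDER  BY table_name, trigger_name;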
The plan I put together was to take the batch file that does the bulk insert, add a 10046 trace to the start, run the load through and then identify those parts of the load (and therefore those tables and triggers) that would most benefit from tuning. As I said though, I couldn't just run the load; I had to guarantee that I could restore the schema back to its original state. The first option was to take an export, create a new schema, load the export into that and run the test from there. I gave it a try, but the export file hit 400GB and then caused a segmentation fault - presumably I'd run out of disk quota. There was no immediate opportunity to get the DBAs to create an environment for me, so the next idea was to create a new schema, create empty copies of the tables that would be affected by the import - around 10 tables that I would directly load, plus 70 more that the triggers would insert into or update - then create synonyms to the procedures and reference tables held back in the original schema. I tried it out (after finally getting all the triggers to compile) but again was out of luck, as the triggers had the original schema name hardcoded into them in certain places, so that idea wouldn't fly either. The last idea was to create copies of the affected tables within the same schema, do the load, then truncate and reload those tables from the copies; but as the application wasn't documented I couldn't be 100% sure that I'd taken copies of every table the triggers affected, so I couldn't risk starting the load.
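For what it's worth, the mechanics I had in mind were nothing exotic; a rough sketch of the tracing and the backup/restore steps (table names hypothetical, and very much a sketch rather than the finished scripts) would look something like this:

    -- switch on extended SQL trace (event 10046) before the load runs
    ALTER SESSION SET tracefile_identifier = 'bulk_load_test';
    ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

    -- ... run the batch insert here, then switch tracing off again
    ALTER SESSION SET EVENTS '10046 trace name context off';

    -- CTAS copies of the affected tables, taken before the load ...
    CREATE TABLE orders_bkp AS SELECT * FROM orders;

    -- ... and used to put things back afterwards (triggers disabled for
    -- the reload so they don't repopulate the interface tables again)
    ALTER TABLE orders DISABLE ALL TRIGGERS;
    TRUNCATE TABLE orders;
    INSERT INTO orders SELECT * FROM orders_bkp;
    COMMIT;
    ALTER TABLE orders ENABLE ALL TRIGGERS;

Running the resulting trace file through tkprof would then show which recursive SQL coming out of the triggers was eating the time. The catch, as I said, was the restore step: you need to know for certain which tables to copy in the first place.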
So, in the end I ran out of time and had to admit defeat, for the time being at least. Not something that usually happens, but the combination of the triggers, the environment and the circumstances meant that I had to give up on this one, at least for now. A shame actually, as it's not nice coming away without a result, but it did remind me why you should try and avoid putting application logic into triggers: they're a nightmare to tune if you're anyone other than the person who put the code together. Anyway, off to Reading and Swindon now, which should be a chance to do a bit more writing for the new book chapter I'm putting together.