A Frustrating Couple of Days

June 15th, 2005 by Mark Rittman

If you read Tom Kyte’s blog the other day you’d have seen the entry where he
asks you to
guess where he’s currently staying
, using a photo taken out of his hotel
bedroom window. Well in the same spirit,
here’s one out of the
office window
where I’ve been working the last few days – can you guess
where it is? A clue is the unusually-shaped cathedral in the top left-hand
corner of the picture. Another clue is that it’s rained all of the last three
days. Eat your heart out, Las Vegas.

Frustrating three days actually. I was called up to help on a project where a
bulk data load (for a data migration) was taking a disproportionate amount of
time. The requirement was to load a few tens of thousand records into existing
tables in a transactional schema, the sort of thing that normally would only
take a few minutes to run. For this data load though, just a couple of thousand
would take many hours, the reason for which was the tables that were being
loaded into each had triggers attached to them that went on to populate a series
of interface tables. The triggers worked fine for normal OLTP transactions, each
trigger that fired maybe added a tenth of a second to the transaction time,
no-one noticed the difference, but when you bulk loaded into the table,
performance was all over the place. Disabling the triggers wasn’t an option as
the interface had to be maintained, but leaving them on there meant the load
couldn’t happen in the batch window available. And just to add to the challlenge
- if I ran the load, to see what particular tables took the longest – I had to
ensure that the schema was returned to it’s previous "unloaded" state at the
end, as it was needed for something else later in the week.

Looking at the triggers, the problem boiled down to two things. First, each
trigger caused lots of separate table selects to happen, lots of inserts and
updates and using code that wasn’t particularly efficient. Second, the triggers
fired for every row that was inserted, updated or deleted, which kicked off a
lot of database activity for each row of data that was inserted into the schema.
For simple, single transactions, as I said the effect wasn’t noticeable, but if
you suddenly try and force through thousands of transactions in one go, you’re
so out of luck.

First thing that struck me was how hard it is to try and sort out performance
when your system has lots and lots of triggers. You can try and manually trace
through the path that data goes through, but when each table that’s touched
potentially has triggers that touches other tables, it’s not an easy task to
work out the impact of a DML operation. Taking a further look showed that all
our operations were inserts, but a large proportion of the triggers only started
affecting other tables if the DML operations were updates or deletes. Tricky.

The plan I put together was to take the batch file that does the batch
insert, add a 10046 trace to the start, run the load though and then identify
those parts of the load (and therefore those tables/triggers) that could most
benefit from tuning. As I said though, I couldn’t just run the load, I had to
guarantee that I could restore the schema back to it’s original state. The first
option then was to take an export, create a new schema, load the export into
that and run the test from there. Gave it a try, but the export file hit 400GB
and then caused a segmentation fault – presumably, run out of disk quota. No
immediate opportunity to get the DBAs to create an environment for me, so the
next idea was to create a new schema, create empty copies of the tables that
would be affected by the import – around 10 tables that I would directly load,
plus 70 more that the triggers would insert into or update – then create
synonyms to the procedures and reference tables that were held back in the
original schema. Tried it out (after finally getting all the triggers to
compile) but again out of luck, as the triggers had the original schema name
hardcoded into them in certain places, wouldn’t fly as an idea. Last idea was to
create copies of the affected tables within the same schema, then do the load
and then truncate/reload those tables from the copies, but as the application
wasn’t documented I couldn’t be 100% sure that I’d taken copies of every table
that the triggers affected, therefore I couldn’t risk starting the load.

So, in the end I run out of time and had to admit defeat, for the time being
at least. Not something that usually happens, but the combination of the
triggers, the environment and the circumstances meant that I had to give up on
this one, at least for this time. Shame actually, not nice coming away without a
result, still it reminded me why you should try and avoid putting application
logic into triggers, they’re a nightmare to try and tune if you’re anyone other
than the person who put the code together. Anyway, off to Reading and Swindon
now, should be a chance to do a bit more writing for the new book chapter I’m
putting together.

Comments

  1. Michael Armstrong-Smith Says:

    Hi Mark
    My guess is that you are in Liverpool, England. Am I close?
    Best wishes
    Michael

  2. thierry Says:

    Why not loading n files instead of one big file? Kick in parallellism, and when the system scales a little, the load will fit within the available window. good luck.

  3. Pete_s Says:

    I would have guessed Liverpool (and without seeing the picture..) but only becuase StatCounter had you down as in Liverpool when you posted a comment on my Blog!
    I hate triggers… it is often the simple way for developers but a nightmare for people needing to make bulk data changes in the furture. It is so hard to follow things through when you get triggers triggering triggers.

  4. Mark Says:

    Yes it was Liverpool!
    The issue of triggers was an interesting one. The problem that they solved – how to feed changed data through to a set of interface files, for later applying to a second system – is one that’s quite suited to triggers. Triggers of course are how synchronous change data capture works, so there’s a precident for keeping change tables up to date. The problem I had though was that the triggers killed performance when carrying out bulk inserts, and the distributed/scattered nature of the interface logic, being embedded in lots of table triggers, made it difficult to work out where the bottlenecks were happening and where I could best apply my tuning work.
    What I was hoping for was to trace the flow of SQL by doing an 10046 trace, the problem I had though was that we couldn’t put a “safe” environment together in the few days I had to do the work. Frustrating, as I said…

  5. victor ladge Says:

    is that a reflection of a bed, a table, a television and a small bar of soap in the window?

  6. Jon Says:

    I think the problems you outlined are common among data migrations: you are trying to move a large mass of data from one transactional system to another. The target system has not been designed for this sort of processing, but has a number of interfaces/routines that could be used for audit, assigning account numbers etc. The question is do you rewrite the logic, in which case you lose the advantage of a production tested robust piece of code, or do you try and ‘hack’ something together that will accomodate the data sizes – the risk being you lose the rigour of the above process, and you have to rewrite 50% of a target system which itself took 12 months to build – no easy answer.

  7. markmal Says:

    Mark,
    You can try to
    1. disable triggers
    2. load data
    3. enable triggers, modifying INSERT triggers to UPDATE.
    4. update tableA set col=col;
    5. return triggers to only INSERT mode.
    The assumption is that the current cascading trigger’s logic allows this trick, and the “interface” part can be done outside of a maintenance window.
    The risk here is that the last part is going outside of maintenance window, in production hours and prevents a back out plan.

Website Design & Build: tymedia.co.uk