Real-world Feedback On OWB Performance Tuning

September 26th, 2005 by Mark Rittman

Earlier this year I was asked to take a look at
how we
might use 10046 trace files to help carry out OWB performance tuning
, the
idea being that by automatically generating trace files, we could identify why
certain mappings and

process flows
were taking too long to run. A reader was
interested in the technique at the time and tried it out on an OWB implementation he was
working on. The other day he sent me some feedback on how the approach went. I
thought it’d be interesting as a bit of "real-world experience" on carrying out
OWB performance tuning.

"Bit of follow up that I’ve been meaning to write re the statistics
gathering/event tracing in mappings that Mark wrote up on his blog a few
months back.

Further consideration has shown that this was not the correct approach for
this project, reasons as follows. We undertook a strategy to run a daily
smoke test (http://www.stevemcconnell.com/bp04.htm)
off the full process based around different test packs. This enabled us to
test volumes, specific cases etc etc. With daily event 10046 stats turned on
(the full process executed 300 sql scripts and ~100 mappings) we found the
following:

(1) There was too much information to review.

(2) We kept running out of disk space because of the Gb’s of log files -
thus actually stopping the jobs running as the database errored.

The approach now taken is the take timings against volumes for each of the
jobs run, and the extrapolate the estimated run time for full volume. We
then have a daily/weekly meeting to review the top 10 cases and on the basis
of this focus any performance and tuning activity as required. I believe
this to be a more practical approach, and also follows to a degree the
approach describe by C Millsap et al in the optimising Oracle book, i.e.
focus your tuning efforts to provide maximum benefit from the resource
(developer time) you assign to it. Depending of the mapping to be tuned we
will then determine if event 10046 tuning is required, it may be a case of
logic rewrite or peer review may solve the issue.

I do not mean to devalue the work we did earlier in the year, but like
everything theory and reality are often different, and sometimes the old
fashioned look at the query/explain plan and realise no indexes are being
used is enough.

The greatest asset has been the daily smoke tests this has give us a
historic record of timings, if we do get unexpected variances in runtimes we
can use these figures to determine if its related to database config and
utilisation, or whether it relates to code releases (all releases are logged
and versioned).

Mark - may be worth updating the blog - I can write something if you like,
or just quote the above if you think it’s appropriate; I think it would be
useful to share though."

Thanks for the feedback. I think what he is saying here is that you shouldn’t really by
default trace all mappings, as the hit on performance and on disk usage is too
great. If as he says though, you concentrate tracing on just those mappings
that are affecting "response time" - as he says, determined through
the smoke test approach he used - then it can still be a useful approach,
although you shouldn’t forget the simple approach of just generating an
intermediate mapping and then running an explain plan on it.

Comments

  1. Doug Burns Says:

    Hi Mark,
    I ran into similar problems when enabling 10046 on a number of mappings in an overnight batch. We ended up limiting the mappings we traced to a dozen or so that were the main culprits. Even then, there was a ton of trace information to wade through because we were using parallel execution. We could have done with some sort of trace aggregation tool to cut down the workload and time spent. I don’t think my boss believed the amount of time we needed to do the job properly!
    In the end, though, we were able to identify what the problem was. It was a library cache pin bug so the 10046 was essential because a simple explain plan wouldn’t have exposed this.
    So I’d agree that it’s still the only way sometimes but needs to be applied intelligently.
    Cheers,
    Doug

  2. Jeff Moss Says:

    I don’t think I’d consider putting in 10046 en masse either - I think it’s a tool for targeted tuning on a code unit(s) you have identified as problematic, either from wall clock timing, timing delta, volume delta or resource usage readings.
    Our approach is to analyse the stats of the OWB mapping executions to look for:
    High volumes
    Length execution times
    Wild variances in Volume/Execution time as compared to recent averages
    Anything that pops up on the radar gets looked at by those responsible for tuning…where a 10046 might then be used as one of the tools in the armoury.
    Another thing we look at is the Top 10 “Top SQL” reports off Enterprise Manager…but the only problem we have there is that we need to “read between the lines” just a little to know which mapping the statement sits in…it’s usually obvious but it would be nice if it was definitive…I’m still thinking about how to do that one.

  3. Mark Says:

    Doug, Jeff, cheers for the additional feedback.