A Discoverer Performance Dilemma

January 17th, 2006 by Mark Rittman

I’ve got an interesting dilemma with a piece of consulting I’m doing this
week. We’re working with charity up in London who wants to add some Discoverer
reporting to their OLTP application. They’d got licences for Discoverer but
never got around to deploying it, and so our task was to do a few days of EUL
and report development, then spend some time training the team on how we did it.
Their resources are limited and so the scope of the work is specifically to add
reporting to their existing application, not develop a data mart or data
warehouse and report off of that. They’re also wary of making changes to the
OLTP application to make reporting easier, as it works "just fine" at the moment
and again resources are an issue. It seems a fair request to me – not all
reporting has to be off of specially set up reporting databases, it’s perfectly
reasonable to want to add a bit of reporting to your line-of-business
application, in a way it’s what Discoverer was originally designed for.

The application was originally developed using Oracle Designer and makes a
lot of use of "domain value" lookup tables, a three column table with domain
key, code and description which acts as a central lookup table for all tables
that store just ID values. To take an example, the customer table might have IDs
for gender, age group, requirement, outcome and so on, whilst our central domain
lookup table would have a set of rows containing all gender values, all age
groups, all requirement types and so on, like this: 

Domain Key   Code      Description
----------   --------  -----------
GEN          M         Male
GEN          F         Female
AGEGROUP     1         0-10
AGEGROUP     2         10-16
AGEGROUP     3         17-60
REQTYPE      1         Advice
REQTYPE      2         Counselling
..etc

This works quite nicely for them as the application can dynamically add new
types of attributes, and new attribute values, but it’s a real issue when we
come to querying and accessing the data as:

  • The value column is dependent on the domain key column, which screws up
    the optimiser – it can’t accurately predict the selectivity of a "code"
    value as it needs to consider what "domain key" value is also being used;
  • The same table is being used all the time for key lookups, which will
    cause contention for the same block if everyone is referring to it all the
    time, and
  • (this is more down to the normalized design) a simple query involves
    about 20 joins back to this same table to retrieve attribute descriptions

There’s lots of other stuff like this as well – outer joins that are required
in order to return rows when part of the information is missing, and a data
model that’s fairly complicated to understand and navigate because it’s
primarily there to provide a flexible, extensible environment to do their
transaction processing.

Now I’ve read lots of articles why this sort of approach can even be an issue
for OLTP applications, not just DSS ones; the issue over the dependent columns
can lead to too many full table scans being run, or indexes being used when the
predicate isn’t actually selective enough, and the contention for that single
domain values table is going to at least cause lots of buffer busy waits and at
worst start serializing the application. However it’s the Discoverer bit I’m
thinking about now, and in particular – how can I speed up their reports whilst
causing them the least amount of development and additional administration work?
As an added bit of fun they’re still on Oracle 8.1.7 (soon to go to 9.2.0.5) and
currently using the Rule-Based Optimizer.

My thinking so far has been along these lines:

  1. Starting with a simple one or two folder (aka Table) query, retrieve the
    explain plan, run timings and then gradually add in more complexity, to see
    at what point the query starts to slow down. What I expect to happen is that
    joining to a particular type of table – say the domain value lookup one -
    will suddenly slow down the query, and this is where the effort should
    concentrate. In other words, find out where the issue is before charging in
    and making loads of changes.
  2. Assuming the domain value lookup table is the issue, split out the
    different domains so that there is instead a gender lookup table, an age
    bands lookup table, an requirements type lookup table and so on. This should
    be easier for the optimiser to deal with and will spread the demand for key
    lookup values over several database blocks. To minimise the amount of
    development work and administration, create these tables as materialized
    views based off of the domain values table, and refresh them on commit – new
    values are rarely added to it won’t be much of an overhead.
  3. Move them to the Cost-Based Optimiser as soon as possible, put in place
    statistics gathering, gather histograms (assuming #1 is done) and try and
    use some of the new feature that make joining large tables more efficient -
    hash joins, star transformations and so on.
  4. Create indexes where columns are regularly used as part of conditions,
    bearing in mind of course that it’s an OLTP system and there’s going to a
    cost in maintaining them.
  5. Perhaps create denormalized versions of the key tables used for
    reporting that add back in the attribute descriptions, again through
    materialized views, and run reports off of these rather than the normalized
    OLTP tables. This will remove the need for lots of key lookup joins, plus we
    can resolve the missing values problem that currently requires the use of
    outer joins.
  6. Once they move to 9.2.0.5, move any frequently used tables to the KEEP
    pool, make sure they stay in the buffer cache.
  7. Consider rebuilding the lookup tables as Index-Organized Tables, reduce
    the LIOs for a key lookup from 2 to 1.
  8. Make sure we aren’t negating any indexes through things like UPPER()
    being applied automatically to conditions, Default Date Hierarchies being
    used to make selections based on Year, Month and so on – if we find these
    the usual solution is to create some additional function-based indexes that
    match these derivations.
  9. For the big tables that are queried, consider partitioning them based on
    branch code or date of enquiry, so that if a full table scan is needed, at
    least most of the partitions should get pruned and less data will need to be
    worked with.

That’s it for the time being. None of these should need much maintenance -
once the branch partitions are created they won’t need much adding to, the
materialized views take care of themselves, the IOTs only need to be created and
the statistics can be gathered as part of a batch job – so hopefully it won’t
involve much additional administration. I don’t know yet whether they’ll go for
it, but I do hope so as I’m keen to see them get the most out of the work
they’re putting in. If we get anywhere with it I’ll report back on the results,
and if anyone’s done anything similar (within the same constraints) I’d be
interested in feedback. Remember – the reports need to run fast, the data really
isn’t suitable for MI-style reporting, but any improvements we put in have to
have little development or administration impact.

Comments

  1. Pete_S Says:

    I have a customer with a similar table, but their one is around 2m rows long and with a surrogate key!
    option 2 looks a good bet
    3 needs lots of test that you don’t break the OLTP app – I’m not sure about star transforms without bitmaps on the fact data.
    4 if adding or changing data in the domain table is RARE then perhaps a bitmap index – but could be risky
    7 you could make an IOT Mview if you wanted to

  2. Andy Todd Says:

    Ah, good old cg_ref_codes, how I miss it ;-)
    If it’s any consolation the problem is even worse in Oracle Applications, because in that lovely system there are two tables for these user definable sets of data;
    FND_FLEX_VALUE_SETS defines the domains, which are only visible by name in the application, but which has a unique internal key value.
    FND_FLEX_VALUES defines the actual values and their (translated) descriptions, and does not contain the flex value set’s name, only it’s hidden id.
    So you see an awful lot of code like;
    SELECT ffv.flex_value, ffv.description
    FROM fnd_flex_value_sets ffs
    JOIN fnd_flex_values_vl ffv ON ffs.flex_value_set_id = ffv.flex_value_set_id
    WHERE ffs.flex_value_set_name = ‘Trade Code’
    So we have the added bonus that outer joins are a bit problematic (it’s the number one problem that my team still haven’t properly grasped).
    In summary I’d suggest having a look at the Applications EUL provided by Oracle because they have probably already addressed this issue. I’m in the dark as the client I’m working for doesn’t like Discoverer and won’t ley us pollute our machines with it ;-)

  3. Mathew Butler Says:

    Regarding the generic type table see Cost Based Oracle Fundamentals by Jonathan Lewis p295 – p299 for a possible solution to your problem of obtaining correct cardinalities. Involves using a list partitiond table on Domain Key and referencing Domain Key as a literal. Of course there are issues around the licencing of the partitioning option and possibly the current use of the RBO.
    M.

  4. Steve C Says:

    Mark,
    I’ve developed several EUL’s on designer generated schemas using both Rule and Cost Based Optimisers. Most of our Discoverer EUL’s are on OLTP systems so I tend to encounter these problems quite a bit .
    The best way to address this I’ve found is
    1. Ensure that the CG_REF_CODES table has a compound index on Domain and code.
    2. Ensure that each FK column in the table with a lookup to CG_REF_CODES has an index on the code value. Create the indexes on a different tablespace/disk to the table so as to maximise the benefit of using more than one database writer process. This should cut down on the overheads of the indexes.
    3. In Discoverer create the main table as a simple folder.
    4. Create a version of CG_REF_CODES for each domain needed by importing a definition for the whole table and adding a mandatory condition of Domain = ‘Value’. This will apply when either inner or outer joins are used so is very useful. This is also hidden from the user so they don’t have to know the domain codes.
    5. Create Item classes by creating a hidden custom folder using
    SELECT *
    FROM CG_REF_CODES
    WHERE DOMAIN = ‘Value’
    Apply the item classes to the version of CG_REF_CODES created.
    6. Create joins between the main table and the different copies of CG_REF_CODES linking the code value to the correspoding code value in CG_REF_CODES. The mandatory condition applied above will add the extra condition needed to utilise the index on CG_REF_CODES.
    If you’re using custom folders, so long as the CG_REF_CODES table has a PK index and the associated FK has an index, you can use code such as
    SELECT col1, col2, col3 …
    FROM tab1, CG_REF_CODES gender
    where tab1.gender_code = gender.code(+) and ‘GEN’ = gender.domain(+)
    It is possible as above to outer join the literal value of ‘GEN’ as well as the database columns involved. This actually works very well, even under the rule based optimiser. If as in 4 above you have created the mandatory conditions for Domain = ‘GEN’ then the ‘GEN’ = gender.domain(+) condition will automatically be added by Discoverer in outer join operations between the simple folders.
    Although this might seem a simplistic approach compared with the list you mentioned, sometimes I find with Discoverer that simple is better. I had all this running on a 7.3.4 database in Discoverer 3 and all worked really quickly. I have since migrated the EUL’s to Discoverer 9.0.2 on the Cost Based Optimiser and this approach still works.

  5. Mark Says:

    Thanks everyone, a good set of suggestions. I went through them with the client yesterday and I’ll be getting some feedback soon on what worked and what didn’t. Thanks again for the help.
    Mark

  6. anonymous Says:

    Mark, the above suggestion is what I would suggest as well. similar to dimensional role playing. Joining to a bunch of folders that are different mandatory filters on the same base table will ensure that depending on the number of each domain, you may or may not full scan over and over. (assuming you index domain and code).
    #2 I don’t think will be a problem. It or it’s domain/code index should be cached since it will be so popular. It will be used based on RBO whether you want it to or not.
    Or create the views at the db level and then build the EUL off that. Either way, you have to logically split these out. Physical is up to you. Whether physical isolation will help is better left up to some real tests.
    As tom K has said, databases are designed to join data. make sure the MV approach (pre-join) saves more resources than it consumes.

Website Design & Build: tymedia.co.uk