Database problems

One of our customers is on Oracle 9.2, we would dearly love to move them to 10g but for certification reasons with their chosen query tool we can't upgrade them until the tool is also upgraded which in turn is linked to the version of JVM associated with their terminal server environment. We upgraded the development database environment to 9.2.0.8 back in November, but as the customer is in retail and the holiday season is the peak sales period we were not allowed to upgrade the live system until this week.

We have been looking forward to moving to the final patch set of 9.2, it fixed a few problems that afflicted some data warehouse users and was also claimed to resolve a horrible core dumping problem with certain star transformation queries.

The next day problem calls started to roll in; some, but by no means all existing reports, started to fail with errors relating to missing arguments to functions. So my team went in to fault finding mode.

When you provide support as a service to customers the first priority is to restore service, that is to identify a work around to minimise impact on users. Finding the root cause and fixing can come later if people are able to work.

Finding example queries that fail is not trivial, it is not like slow queries or wrong results problems where we can capture the query easily in the database, these queries fail to parse. Fortunately, the user tool has an option to show display the SQL being generated. The SQL captured is not very elegant, it uses lots of to_char conversions and a ugly looking in list. As the data warehouse uses materialized view query rewrite we decide to disable query rewrite in an SQL session and capture the query plan. As this gave a plan we conclude that the query is rewritten before it fails. We turn query rewrite on again and replace the in list with a single value, another successful plan captured, this time rewriting against a suitable table and using a temporary table star transformation. The next stage is to replace the single item with a one item in list and then keep adding items until the query fails to parse. Strangely, we get failures as soon as we have three values in the in list.

In the past, temporary table star transformation have had a bad press, there is even an option not to use them in the setting of the STAR_TRANSFORMATION_ENABLED database setting. So perhaps this is the problem happening here. Switching of the transformation completely we now get queries that complete and still in acceptable time; we now have a work around that will stop the failures whilst we continue to investigate the cause and prepare a simple test case for support