April 22nd, 2012 by Peter Scott
This week Stewart, Mark and I are at Collaborate 12 in Las Vegas – look us out if you are at the conference. Long flights (it’s about 10 hours from the UK to Las Vegas) give me plenty of blogging time so I decided to write a few new blogs. Here’s the first.
One part of my work here at Rittman Mead is to look at data warehouse performance either from the user query perspective or the ETL processes used to load the data warehouse. Sometimes, I look at existing systems, other times, I am working as part of the development team creating a new system.
Recently, I was asked to look at a table that uses some very complex aggregation and calculation logic in a near real-time environment. I had three design objectives to meet, generate as little redo logging as possible, as fast a refresh as possible and a keeping the data available for as long as possible – that is no periods during the refresh when data is not visible. I decided that for my purposes a partition exchange load using a truncate append insert into the exchange table was going to work best – add a couple of Exadata bells and whistles such as hybrid columnar compression on the target and away we go. For the insert I need to use a hint to tell Oracle to direct path load and a second hint to get the insert done in parallel.
Normally, I dislike using hints. To my mind using a hint in a query tells me that the cost based optimizer (CBO) has not got all the facts it needs; there could be a problem with the statistics (stale, inadequate, missing etc), a lack of database constraints (the more we can tell the CBO about the data the better) or, as in some cases that I have seen, the hint used is not actually helping the current situation; a hint put in when the database was developed in Oracle 8i days may be doing more harm than good now, likewise a hint put in “because we always do that for this type of query” or because “I found it on Google” may not give us what we need. Generally, I tend to use hints only whilst developing SQL, as a way of telling me that CBO is not doing a good a job as it could because I have kept it in the dark about some fact. I somehow feel disappointed if the CBO still needs a hint when code goes to production. However, there are some things that happen in data warehouses that the CBO has problems with – correlated columns are an example that springs to mind. Correlation can wreak havoc on cardinality estimates.
I often do, however, use hints that modify what is being asked of the database – it is perhaps unfortunate that these are called “hints” at all. For example APPEND is fundamentally a different way to insert data, it is far more than a subtlety of a query such as to whether a particular index is used or that we do a nested loop join. Likewise the GATHER_PLAN_STATISTICS hint is telling Oracle to do something more fundamental than modifying a query plan. I chose to append data because the characteristics of the append insert are appropriate for what I want to do, be it nologging inserts, data compression, or anything else that needs a direct path insert. For queries, I am not hung up on how it happens, just that it is as fast as possible and the results are correct.
One final observation about hints is that testing a hint on your development system may not give the same execution plan when put into a different system. Of course this is the same with un-hinted queries too. Never assume that what is good to use in one environment is equally good in another. Or to put it another way, validate that the code you intend to implement works in all environments.