Easter Roundup

It's the Easter break over here at the moment, and I'm off work for the next week with a bit of paternity leave. Things are a bit hectic as I'm sure you can imagine, with a two and a half-year old and a two week old baby, but it's all good fun and a nice break from travelling all over the country and staying in hotels. Although I've not been able to update the blog much, I've kept an eye on what's been going on and bookmarked a few sites to look at later on.

First up, if you took an interest in the predictive modelling debate on Asktom, Don Burleson's forum and the Dizwell forum, you might be interested in a similar debate going on at Don's forum around Empirical vs. Research DBAs. Personally I don't agree with Don's assertion, that Oracle research primarily conducted on laptops using SQL scripts and single-user systems does not translate to the world of multi-terabyte, highly concurrent systems, and therefore the best way to tune a database is to use observation and correlation to "tease out" the inner workings of the system, as my belief is that the danger in basing decisions on intuition, correlation and "hunches" is that you make the right decision for the wrong reasons, and then go on to apply this "lucky guess" to subsequent situations where in fact it doesn't work, and you don't know why because you don't know why it worked in the first place. The danger when you propose "silver bullets" without supporting evidence as to why they work is that the approach can't properly be reviewed, and if the internals of the Oracle database change, and you don't retest them against the new version, the danger is that this approach may no longer be valid and it runs the risk of becoming an Oracle myth.

The other problem I have with the empirical approach is that it relies on many, many years of experience and observation of different types of Oracle database, and the "sixth sense" you therefore develop for coming up with performance tuning fixes, and therefore it's an approach that by definition cannot be taught to novices in any reasonable amount of time, and cannot be carried out by anyone other than very experienced DBAs. By contrast, the scientific method, an example of which is the "response time" tuning method put forward by Cary Millsap and Jeff Holt in "Optimising Oracle Performance", can be taught and used successfully right from the start as it bases its approach on instrumenting your system at a low level, recording and measuring what's going on, applying just the right fix to address the problem, and then recording and measuring again. Unlike the empirical approach - observe at a high-level, make some correlations, try an approach, see what happens, if it works, great, otherwise try something else - the scientific approach, although it might involve a bit more work at the start, and a closer understanding of Oracle internals - at least involves basing recommendations on facts, not "lore", which sounds to me a better way to tune. The other problem as I see it with the empirical approach - which let's remember bases it's approach on "it works but I don't know why" - is that many of the solutions put forward by proponents of this approach are soon after discredited by members of the scientific community, with lots of documentary evidence to say why. It's all well and good to say that research done by DBAs using laptops and SQL scripts doesn't prove anything, but if they then prove that your empirically-derived solutions don't work, and you don't offer any evidence of your own to back up your original assertion and instead change the debate to one about credentials and education, then this just looks like a smokescreen and undermines the strength of your original argument. All credit though to Don for having this debate though, for taking the time to put forward his thoughts and for publishing opinions on his website that both support and contradict his approach - Don often attracts a lot of (unneccessary) flak but it's debates like this that help (relative) novices like me understand the bigger picture. Also, my gut feeling is that most of the time Don is actually right in that his approach did work, but what I can't understand is why (with his undoubted knowledge and experience) he doesn't just back up his assertions with reproducible test cases (even if these involved simulating an instance with a high concurrent load), which would validate his empirical observations and negate most of the arguments against what he's proposing. As I've said before, all of this is way beyond my level of experience, but it seems obvious to me that there's an easy way to address the concerns of the scientific Oracle community which is to provide test cases and evidence to back up every empirically derived solution, and until that's done there'll always be debate about whether a hunch or a correlation is in fact valid.

Moving on now, a bunch of new Oracle blogs have started up recently that are worth bookmarking and taking a look at. Mike Ault and Robert Freeman, who work with Don at BC Oracle Consulting, have started new blogs, and Don himself has got his own personal blog that talks about his life outside of oracle consulting. I met Don, Mike and John Garmany last year whilst at Openworld and whilst we operate in different areas and presumably have different approaches, they're all actually great fun and good company to go out with - thanks Don for the meal! Doug Burns, who you might remember as the author of "Suck It Dry - Tuning Parallel Execution", an excellent paper on parallel execution that I mentioned last year, has started a blog with one of the first postings on the art-vs.-science debate. Note also Doug's Technical Papers and Presentations page with a number of other interesting Oracle papers. Also, not new but new to me is a weblog by Jonathan Gennick, author of many OTN articles and a number of O'Reilly books.

Jonathan Lewis has put another article up on his Miscellaneous page, this time on "Bottlenecks", which as usual is worth a quick read and reminded me to take a proper look at hash clusters, which can improve the performance of applications that retrieve values from lookup tables - more details can be found in the online documentation, this article by Connor McDonald and this one by Don Burleson. This article also jogged my memory about something I'd read about star schema dimension tables as IOTs (index-organised tables) - this was originally a comment on one of my blog postings, and I subsequently looked it up at Asktom where it appears to be a valid approach to improve performance.

Rounding up those articles I've bookmarked and not had a chance to check out properly. "Introducing Materialized Views as mechanism for Business Rule implementation (complex declarative constraints)" on the Amis blog looks like another thorough investigation into Oracle SQL that could prove useful on a future project. "An Oracle 10g Upgrade Case Study: Looking at System Performance Before and After the Upgrade" would have been useful a few weeks ago for me when I was asked to detail the benefits (and potential issues) around an 8i to 10g migration. Robin Schumacher (who I also met at the same dinner with Don) recently wrote an article for OTN entitled "Response Time Analysis Made Easy in Oracle Database 10g" that examines some of the new instrumentation that you get with Oracle 10g, and over at Oracle-base.com there's an excellent round-up of the performance tuning enhancements in Oracle Database 10g.

Finally, mirroring the art-vs.-science debate in Oracle performance tuning circles, within the data warehousing world there's a similar, perennial argument going on around the best way to design data warehouses. One faction, based around the approach put forward by Bill Inmon, believes that data warehouses are no different to regular databases and are best designed in a third-normal-form, as this is the most flexible and performant way to build a relational database. The other faction, based around the writings of Ralph Kimball, believe that the most important thing is query performance and making the data model easy for users to navigate, and build their warehouses using star schemas instead. Me - I'm a Kimballite, if only because Oracle's data warehouse performance enhancements presuppose a star schema and carry out joins and data retrieval more efficiently when you've got one - but if you're interested there's a debate going on over at the B-Eye Network, and Kent Graziano (author of the Agile Methods paper I reviewed recently, and a supported of the Inmon "Enterprise Data Warehouse" architecture) pointed me in the direction of this article which sums up the Inmon EDW philosophy.

Back in a few days.