Look Before You Leap

A recent post on Howard Rogers' blog around the hype and enthusiasm around the launch of Oracle 11g, together with some feedback I got during the review process for my OU seminar slides, got me thinking about how we sometimes rush in to adopt just-released versions of software, and how we can sometimes over-emphasise new features and products forgetting what's already in place. Basically, just because something new is now possible, it doesn't mean that it's the best way to do something; similarly, when it comes to new software products and even new releases, there's often a lot of common sense in waiting before you adopt something, especially if what you've got works and what's coming is new and relatively untested.

To take an example; going on Howard's point, Oracle Database 11g is due out (on Linux) within the next few weeks, and according to the IOUG a large proportion of their members intend to adopt this new release within the next six months or so. On the face of it this looks commendable - you're adopting new features as they come onstream, maximizing your investment and so on - but if you've been working, as I have, in the Oracle BI&DW space for a few years now, you'll start to detect a pattern where software that's initially billed as being new and fresh out of exhausive beta testing actually only really works properly after a few patch releases.

To take some examples, in Oracle 9i the old Express Server engine was incorporated into the Oracle Database and was therefore able to take advantage of the scalability, managability and so on of the mainstream Oracle database. If you were an early adopter of this feature, whilst it worked (just about) in this first release, it wasn't really until 9iR2, or being honest, 10g before it was "production ready". Even incremental features for Oracle OLAP, such as compression, enabling for Discoverer access and so on really didn't become properly usable until one or two releases after their initial introduction. This isn't mean to beat on Oracle OLAP - lots of other BI&DW features of the database such as partitioning (partitioned views, which were followed by partitioned tables, star transformations and so on) had an initial introduction with (in hindsight) limited functionality that in the end were perfected in later releases. Of course there are honorable exceptions - the SQL Model clause seems to have worked more or less perfectly from the initlal 10gR1 release - but in most cases, the .1 release should be treated with a bit of caution and really not used in critical bits of code until it's been out in the field for at least 12 months or so.

Also, even if you work on the assumption that what's actually in this initial release is in isolation properly tested, it's still worth holding fire for at least a couple of patches (or, ideally, until the R2 version of the database) as it's unlikely that it's been possible to test this new feature in combination with all the other new features introduced with this new release. Realistically, even with the most thorough beta testing cycle and testing within product development, it's not until the new release gets into the field and gets tested within all the different production environments, combinations of database features used and so on before all the bugs will come out.

What this has meant for me is that, over the past few years or so, my advice to clients when it comes to new major database releases - 9i, 10g, now 11g, is to wait until the release 2 version of the software before you migrate from earlier versions. If you were particularly cautious you'd wait for the terminal patch release but in the data warehousing line of business I work in, you've got to balance getting hold of the new features with a reasonable amount of caution. By all means, for testing and development purposes, download and start working with 11g as soon as it's available, but unless the new feature you're looking to use is just an incremental improvement to a well established feature - interval partitioning is an example that springs to mind - in hindsight it's probably best to wait until 11gR2 before you start to adopt some of the radically new features. By the Release 2 version, most of the bugs will have been ironed out, and you'll probably benefit from a "rounding off" of the new features as product development add all the bits in that they couldn't include in time for the Release 1 version.

In a similar vein to Howard's article, I also got a bit of thoughtful feedback from a reviewer when I passed the slides around for my upcoming OU seminar. In the slides I went through all the ways in which you could now integrate data in real-time using the Oracle BI Server, and perform calculations on data using this BI middle tier. In a way, the Oracle BI Server almost allows you to do away with an Oracle Data Warehouse, and it's calculation capabilities give you a fair bit of the functionality traditionally found in a dedicated OLAP engine such as the Oracle Database OLAP Option: which is where the interesting review comment came in - Just because you can do things in this new way, it doesn't mean that you should do it this way.

To give an example: in my slides, I go through the process of joining data from, say Microsoft SQL Server to data coming from Oracle and present it to users as a single view. Now in some circumstances this will make sense - when, for example, a report is needed now or there aren't resources available to integrate the data into a data warehouse - but we shouldn't lose sight of the fact that, in the majority of circumstances, data used for reporting that needs to be integrated is better off held in a dedicated data warehouse, where you can integrate the data as it comes in, store it in a single database instance, store history, index and summarize the data and so on and so on. The integration capabilities of Oracle BI Server are particularly useful when you need to add a data source fast and there isn't time to integrate into the warehouse proper, or where the return on the development work involved doesn't justify reworking the warehouse ETL routines and so on, but in an average situation where you're building for the long-term and performance is key, you can't beat a proper data warehouse with a single integrated data store, ideally based on Oracle Database technology.

Similarly, just because the Oracle BI Server can do time-series and OLAP-style calculations, that doesn't mean that it's always the most efficient way of performing this type of analysis. Time-Series and multi-dimensional analysis is something Oracle OLAP does incredibly efficiently, due to its multi-dimensional storage arrays and query language and the fact that it runs right next to the data, and therefore if you're putting a BI architecture together and you fall into the trap of distregarding all the traditional data warehousing technology in favour of what's new and just come out, you run the risk of missing a trick and coming up with a less-than-optimal architecture. By all means use the BI Server if the back-end data source doesn't support time-series or OLAP functionality, or if you've not go access to the data source, but given a choice doing the calculations as near to the data as possible, and leveraging in-database analytic and OLAP functions is usually the most efficient course of action.

So I guess what I'm coming to with this posting is two points; firstly, exercise a bit of caution when jumping in with new product (especially database) releases as historically whatever's new typically performs much more solidly in the second release 18 months or so later; and secondly, whilst all the new BI tools and technologies give you lots of new functionality and ways to do things we've been doing in the past, don't forget the basic rules of BI & data warehousing - use the most efficient and appropriate tool to do the job, don't forget the basic benefits and payoffs of data warehousing, and consider when adopting a new feature whether in fact an existing bit of your technology stack performs the function more efficiently. Basically, use your tools wisely and whilst a new release of Oracle is like a new book from your favourite author, don't mix up enthusiasm for something new to read about with something you've got to implement as soon as it comes out..

Finally, after this over-long posting (Janet's away for three days and I've just got the kids off to bed, so I'm upstairs in the loft with a big glass of wine and a bit of peaceful silence), a quick update on the parrot story. Since I last posted, I've been getting unsoliciticed emails from bird food supplies websites asking if I want to exchange links (apparently I score highly on Google for parrots and fireplaces now), and the parrot itself, or at least the first of the parrots, got into the local newspaper. Other than that, I've been on my usual travels - a few days in one of the Gulf States earlier this week, I've never been in a place as hot in my life - and the first of the next round of BI seminars starts next week, at the Oracle City Office in London. If you're coming along, you'll get a chance to see all the wonderful new technology I've been talking about on this blog over the past few months, and you'll get to hear me talk about the situations where it's most appropriate to use them. Until then, have a good weekend, and as a parting link, here's a new Oracle BI blogger who's recently posted a "getting started with Hyperion Essbase" article - good stuff.