Oracle Partitioning 2007

August 4th, 2007 by Peter Scott

I see that Oracle have posted a series of white papers on the soon to arrive 11g release, the one on partitioning caught my eye.

But first an aside: Does anyone actually use slower storage for less-accessed data? The theory sounds good: put the-once-in-a-blue-moon stuff on slower, cheaper, disk and the “use all of the time” data on the high grade, expensive, disk or even solid-state disk… but in practice do people just slap the data on to their 100TB corporate SAN? When’s the last time you saw someone buying a couple of hundred GB JBOD array of SATA disk when they already the space on a corporate NAS /SAN?

I have always liked Oracle’s implementation of partitioning, it is flexible, and transparent to queries - that is we don’t need a special syntax to query partitioned data. I could do most things that I wanted to in the 9.2 and 10g releases but some things required a little creativity. But this time around there will be some nice features - and some should even appeal to the OLTP community. Significantly the number of options for partition key has significantly increased and for composite partitioning (sub partitions) almost any combination goes - I am not sure that HASH-HASH would be useful feature to many people though. New partitioning scheme types are:

  • Interval partitions, that is an auto-extending variant on range partitions (if new data arrives for a non-existing partition the partition is created). This could be a great feature for people new to partitioning and have not already developed code to create partitions as part of data load process; however if you are using some form of rolling data design in a data warehouse you will still need to write code to drop aged-out partitions
  • Ref partitions. Here we are partitioning on the partitioning strategy inherited from a parent table (as referenced by a foreign key join) This may not be quite as useful as it sounds in a DW situation where items can change parents (type 1 slow changing dimensions) but could be good in OLTP work where we have master / detail table pairs (such as order header / order line) and could partition both tables on order_date without having to denormalise the detail table to contain the order date just to act as partition key.
  • Virtual Column partitioning allows functions to be used on column data to define the partitions. For example some organisations generate codes with some form of semantic meaning - say the first three characters of a product code could be the supplier code, we could then partition our product containing table on supplier by creating a virtual partition key on the first three characters of product code

Second aside: I am not a great lover of hash partitions. Partitioning should be a tool to improve both manageability and performance, but to my thinking hash partitions are not about boosting performance in DW systems where we have many ‘multi partition key’ queries and end up hitting almost all of the hash partitions, although we might still get partition-wise joins and some parallelism gains we just don’t get the partition pruning

Comments

  1. Jeremy Schneider Says:

    I’m not sure a lot of people are using slower storage to implement in-database ILM right now but I do think that it’s a good forward-thinking feature. As data volumes grow it might become much more important; maybe in 5 years a lot more people will be doing it.

  2. Peter Scott Says:

    That’s a good point about market maturity - maybe 5 years time things will change. But is ILM a solution for a non-existent problem; is the cost of developing and deploying ILM disks cheaper than homogeneous corporate storage - I don’t know… honestly

  3. Jeremy Schneider Says:

    I think it depends on what the rate of data growth is over the next few years. Early-adopter industries would probably be those like medical imaging for example. One of the things causing an explosive data growth rate right now is media; higher-resolution images and documents, audio, and video… hey - I was just looking at hand-held HD camcorders last week for only a couple hundred bucks. As that data starts making its way into DW systems I think we might see greater demand for large-scale ILM - perhaps something like what Oracle has architected. But really this is all just speculation and guesswork on my part - who knows!

  4. joel garry Says:

    This takes me back to circa 1980, I was working on PDP inventory systems and taking a UCLA class on “online systems.” The class went on a field trip to a mysterious multistory building in downtown LA which was a large oil companies data center, big IBM iron with cooling radiators on the roof. They had a tape robot that was about 150 feet long, which would zip back and forth pulling tapes from racks and mounting them on tape drives - there were a few dropped on the floor. On one end, it had hexagonal data cartridges in a beehive looking contraption that would be plugged by a mechanical arm into a reader.

    Now, that was partitioning!

    I just bought two 500G USB boxes about the size of videotapes, for backing up the home computers. $109.99 each.