Investigating Oracle 11g Interval Partitioning

If you've kept abreast of what's new in the latest release of the database, you will probably be aware of three major enhancements to partitioning in Oracle 11g. Firstly, there's the new reference (ref) partitioning where you can partition one table based on a key value in another, so that for example you can partition the order details table based on the same key as the orders table. The second enhancement is partitioning on virtual columns, where you can define a column based on an expression and then partition the table based on it. The one I'm most interested in though is Interval Partitioning, the third enhancement, as it allows you to define a range partitioning scheme for a table that automatically creates partitions as they are needed.

If you haven't used partitioning before you'd actually probably think that this is how it works at the moment; you define a range partitioning scheme so that, for example, each month's data is placed in it's own partition which you can then back up, make read-only and so on, and as you load data into the table the partitions are created for you on demand. It's actually the case though that prior to 11g you needed to define partitions before you load data into them, leading to the situation where before each data load into a data warehouse, say, you had to run ALTER TABLE scripts to add the relevant partitions into your tables. It's not a huge amount of extra work but it does complicate the load process, and Oracle have addresed this in the 11g release by releasing INTERVAL partitioning, which creates new partitions as they are required.

To take an example, this script creates a table with four regular range partition and then sets up interval partitioning for dates past the 1st January 2007, which is referred to as the "transition point". Interval partitioned tables always need at least one range-based partition set up, with interval partitioning taking over beyond this transition point.

SQL> CREATE TABLE interval_sales
  2      ( prod_id        NUMBER(6)
  3      , cust_id        NUMBER
  4      , time_id        DATE
  5      , channel_id     CHAR(1)
  6      , promo_id       NUMBER(6)
  7      , quantity_sold  NUMBER(3)
  8      , amount_sold    NUMBER(10,2)
  9      )
 10    PARTITION BY RANGE (time_id)
 11    INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
 12      ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2005', 'DD-MM-YYYY')),
 13        PARTITION p1 VALUES LESS THAN (TO_DATE('1-1-2006', 'DD-MM-YYYY')),
 14        PARTITION p2 VALUES LESS THAN (TO_DATE('1-7-2006', 'DD-MM-YYYY')),
 15        PARTITION p3 VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')) );

Table created.


If you then take a look at USER_TAB_PARTITIONS, you can see the four regular range-based partitions that I've just set up.

SQL> Select partition_name from user_tab_partitions where table_name = 'INTERVAL_SALES';

PARTITION_NAME
------------------------------
P0
P1
P2
P3

Now if I try and add some data into this table beyond the transition point and then take a look at USER_TAB_PARTITIONS, you can see the new partition that's been set up. Note that Oracle has only created one partition to cover the the entire period from the transition period to the date of the transaction, and that the partition name is system-generated.

SQL> insert into interval_sales values (1,2,to_date('1-jan-2009', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> commit;

Commit complete.

SQL> Select partition_name, high_value from user_tab_partitions where table_name = 'INTERVAL_SALES';

PARTITION_NAME HIGH_VALUE


P0 TO_DATE(' 2005-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P1 TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P2 TO_DATE(' 2006-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P3 TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P161 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

So what happens if we insert more data, but this time for a date in-between? The interval partitioning scheme specifies one partition per month, what's going to happen?

SQL> insert into interval_sales values (1,2,to_date('1-jan-2008', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

Elapsed: 00:00:00.01
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL> Select partition_name, high_value from user_tab_partitions where table_name = 'INTERVAL_SALES';

PARTITION_NAME HIGH_VALUE


P0 TO_DATE(' 2005-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P1 TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P2 TO_DATE(' 2006-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P3 TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P161 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P162 TO_DATE(' 2008-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

6 rows selected.

That's interesting. It's created a single month partition to cover the new date, and the existing partition then covers from that date through to the date of the transaction. What about if we add a bunch of other transactions in several months over several years?

SQL> insert into interval_sales values (1,2,to_date('14-mar-2008', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('17-sep-2009', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('10-jun-2012', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('05-aug-2008', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('05-dec-2008', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('05-nov-2008','dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

SQL> insert into interval_sales values (1,2,to_date('05-oct-2008','dd-mon-rrrr'),'d',3,4,9.9);

Commit complete.

SQL> Select partition_name, high_value from user_tab_partitions where table_name = 'INTERVAL_SALES';

PARTITION_NAME HIGH_VALUE


P0 TO_DATE(' 2005-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P1 TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P2 TO_DATE(' 2006-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
P3 TO_DATE(' 2007-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P161 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P162 TO_DATE(' 2008-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P163 TO_DATE(' 2008-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P164 TO_DATE(' 2009-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P165 TO_DATE(' 2012-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P166 TO_DATE(' 2008-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P167 TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P168 TO_DATE(' 2008-12-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P169 TO_DATE(' 2008-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

13 rows selected.

Now it's a bit difficult to work out the ranges of the interval partitions because they're listed in the order they were created, but if you re-order the list it looks like this:

TO_DATE(' 2005-01-01 00:00:00',
TO_DATE(' 2006-01-01 00:00:00',
TO_DATE(' 2006-07-01 00:00:00',
TO_DATE(' 2007-01-01 00:00:00',
TO_DATE(' 2008-02-01 00:00:00',
TO_DATE(' 2008-04-01 00:00:00',
TO_DATE(' 2008-09-01 00:00:00',
TO_DATE(' 2008-11-01 00:00:00',
TO_DATE(' 2008-12-01 00:00:00',
TO_DATE(' 2009-02-01 00:00:00',
TO_DATE(' 2009-01-01 00:00:00',
TO_DATE(' 2009-10-01 00:00:00',
TO_DATE(' 2012-07-01 00:00:00',

So what's happening here is that the first four partitions are our range-based ones and cover up to the end of 2006; after that comes the interval partitions and they are defined with their high value being the first day of the next month, as they are month-based interval partitions. As our data is sparse these interval partitions end of covering several months, as further data is then added these ranges will end up going down as interval partitions for other months are created.

So that answers the question as to what the system does when you end up inserting data way beyond the transition point, as I was wondering whether lots of individual month-based partitions were created. What actually happens, as you can see from the example above, is that partitions are created as needed based on the the HIGH_VALUE that's relevant to the data being added.

The other question I had was around partition maintenance, and in particular partition exchange loading, where you load data into a staging table to the same design as an individual table partition, prepare your local indexes to go with it, then swap it with the (usually empty) partition in the main table using an ALTER TABLE .. EXCHANGE PARTITION command, like this:

ALTER TABLE sales_partitioned
  EXCHANGE PARTITION sales_200108 WITH TABLE sales_staging
  INCLUDING INDEXES
  WITHOUT VALIDATION;

But how does this work when you use interval partitioning, when the partition name is system-generated? How do you know the same of the partition to exchange? The answer to this is in another partitioning enhancement in Oracle 11g, where you can refer to a partition by the range that it covers, like this:

SQL> ALTER TABLE INTERVAL_SALES DROP PARTITION FOR (TO_DATE('3-3-2004','DD-MM-YYYY'));

Table altered.

So if I prepare staging table to match my partitioned table, like this:

SQL> CREATE TABLE interval_sales_staging
  2      ( prod_id        NUMBER(6)
  3      , cust_id        NUMBER
  4      , time_id        DATE
  5      , channel_id     CHAR(1)
  6      , promo_id       NUMBER(6)
  7      , quantity_sold  NUMBER(3)
  8      , amount_sold    NUMBER(10,2)
  9      )

and the add some data to it...

SQL> insert into interval_sales_staging values (1,2,to_date('05-oct-2008', 'dd-mon-rrrr'),'d',3,4,9.9);

1 row created.

and then try out an exchange partition, using this new FOR clause, it works:

SQL> ALTER TABLE interval_sales
  2  EXCHANGE PARTITION FOR (TO_DATE('05-10-2008','DD-MM-YYYY'))
  3  WITH TABLE interval_sales_staging
  4  INCLUDING INDEXES
  5  WITHOUT VALIDATION;

Table altered.

Not bad. But what if the partition I want to add in to doesn't exist? What if I insert some data into the staging table that is for just one month, and yet the partition that the FOR clause points to covers several months - will I lose all the data in the existing partition even though I'm probably expecting to exchange just a single month's partition? And what about if the new data is for a date beyond the current range of the interval partitioning, and there isn't therefore a partition to exchange with?

To start off with then I'm going to list out all the the data in the interval partitioned table, so we can see what's currently there:

SQL> select time_id, quantity_sold from interval_sales order by time_id;

TIME_ID QUANTITY_SOLD


05-OCT-08 4
01-JAN-08 4
14-MAR-08 4
05-AUG-08 4
05-OCT-08 4
05-NOV-08 4
05-DEC-08 4
01-JAN-09 4
17-SEP-09 4
10-JUN-12 4

10 rows selected.

So say now we have some new data for the month of Jun 2008, maybe we missed this data out in our initial load and we want to put it into the data warehouse now. In the interval partitioned table, data for Jun 08 is in a partition that goes from the 1st April 2008 to the 1st September 2008 and has a row in it dated 5th August. What happens if we exchange our staging table for this partition - will we lose the data for August?

SQL> insert into interval_sales_staging values (1,2,to_date('10-jun-2008', 'dd-mon-rrrr'),'d',3,5,9.9);

1 row created.

SQL>
SQL> commit;

Commit complete.

SQL> ALTER TABLE interval_sales
2 EXCHANGE PARTITION FOR (TO_DATE('10-06-2008','DD-MM-YYYY'))
3 WITH TABLE interval_sales_staging
4 INCLUDING INDEXES
5 WITHOUT VALIDATION;
ALTER TABLE interval_sales
*
ERROR at line 1:
ORA-02149: Specified partition does not exist

OK. What about if we try and do an exchange partition for data beyond the highest interval partition, does that cause a "partition does not exist" error as well?

SQL> truncate table interval_sales_staging;

Table truncated.

SQL> insert into interval_sales_staging values (1,2,to_date('10-jun-2018', 'dd-mon-rrrr'),'d',3,5,9.9);

1 row created.

SQL> commit;

Commit complete.

SQL> ALTER TABLE interval_sales
2 EXCHANGE PARTITION FOR (TO_DATE('10-06-2018','DD-MM-YYYY'))
3 WITH TABLE interval_sales_staging
4 INCLUDING INDEXES
5 WITHOUT VALIDATION;
ALTER TABLE interval_sales
*
ERROR at line 1:
ORA-02149: Specified partition does not exist

My guess here is that you need to insert some data into the interval partitioned table, for the partition you want to then exchange, before this will work. Let's give this a try.

SQL> truncate table interval_sales_staging;

Table truncated.

SQL> insert into interval_sales values (1,2,to_date('10-jun-2008', 'dd-mon-rrrr'),'d',3,5,9.9);

1 row created.

SQL> commit;

Commit complete.

SQL> ALTER TABLE interval_sales
2 EXCHANGE PARTITION FOR (TO_DATE('10-06-2008','DD-MM-YYYY'))
3 WITH TABLE interval_sales_staging
4 INCLUDING INDEXES
5 WITHOUT VALIDATION;

Table altered.


So there you have it. You can do partition exchange loading with an interval partitioned table, and you can reference the partition using the FOR clause, but you need to have created the partition first of all, by inserting a dummy row into the interval partitioned table, before this will work. The row can be anything as long as it satisfies any constraints, the idea is to then get Oracle to create the mostly empty interval partition so that you can then exchange it with the contents of your staging table.

Thanks to Peter Scott for the examples, the last tip and the information on the FOR clause, before we were aware of this we were thinking that the system-generated partition name was going to more or less eliminate interval partitioning for partition exchange loading, so it was nice to come across this feature after taking a look through the documentation.