Playing Around with Star Transformations and Bitmap Indexes

I’ve been doing some work this week with a client in London, who wants to put together a reporting data mart based on Oracle Database 10gR1. Although they and I were fairly sure a dimensional database design using bitmap indexes and star transformations would give them the query performance they required, there were a number of questions we had at the start that I wanted to get resolved:

  1. How easy would it be to get a star transformation to take place - in earlier versions of Oracle (8i, 9i) star transformations seemed a bit temperamental and I wanted to see how easily these took place in the latest version of the database.
  2. What do the execution plans look like for queries against the star schema with STAR_TRANSFORMATION_ENABLED set to TRUE and FALSE, how do these compare to execution plans for the same table with regular non-unique B*Tree indexes, and did the queries that used star transformations really return data that much faster?
  3. What was the difference in size, and time to create, for bitmap indexes compared to regular non-unique indexes?
  4. If we had a column with a bitmap index on it and we then updated the column (unusual, but this is one of their requirements), by how much did this increase the size of the index and how fast was the update compared to updating a column with a regular non-unique index on it? Also, if we were running a select against the same table, to what extent would the query be slowed down compared to running the same query against a table containing regular non-unique indexes (i.e. what is the impact of row-locking when updating a column with a bitmap index on it)
So, quite a few questions. In addition, the client expected each query against the fact table, which was likely to contain upwards of 20 or 30 million rows when completed, to only return at most a few thousand records, so we were keen to find out whether star transformations still had the same performance edge when returning only a small percentage of the fact table (as opposed to, say, scanning a large chunk of the fact table and then aggregating the data).

The data itself wouldn’t be available for another week or so, so I created a test user and copied across data from the SH Sample Schema. To create a representative set of data, I copied the SH.SALES table into itself a few times to give it around 7m rows, and created two versions of the table, SALES_IDX on which I’d create regular non-unique indexes and SALES_BIX, which would have the bitmap indexes on its dimension key columns. Then, I copied across the SH.CUSTOMERS, SH.PRODUCTS, SH.TIMES, SH.CHANNELS and SH.COUNTRIES tables across, created primary key constraints and unique indexes on their ID columns, bitmap indexes on their attribute columns, bitmap indexes on the dimension key columns in SALES_BIX and non-unique indexes on the corresponding SALES_IDX columns, and then gathered statistics on the schema in readiness for testing.

SQL> desc sales_idx
 Name                          Null?    Type
 ----------------------------- -------- ------------
 PROD_ID                       NOT NULL NUMBER
 CUST_ID                       NOT NULL NUMBER
 TIME_ID                       NOT NULL DATE
 CHANNEL_ID                    NOT NULL NUMBER
 PROMO_ID                      NOT NULL NUMBER
 QUANTITY_SOLD                 NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                   NOT NULL NUMBER(10,2)

SQL> desc sales_bix
 Name                          Null?    Type
 ----------------------------- -------- ------------
 PROD_ID                       NOT NULL NUMBER
 CUST_ID                       NOT NULL NUMBER
 TIME_ID                       NOT NULL DATE
 CHANNEL_ID                    NOT NULL NUMBER
 PROMO_ID                      NOT NULL NUMBER
 QUANTITY_SOLD                 NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                   NOT NULL NUMBER(10,2)

SQL> select table_name
  2  ,      index_name
  3  ,      index_type
  4  from   user_indexes
  5  where  table_name like 'SALES%'
  6  /

TABLE_NAME           INDEX_NAME           INDEX_TYPE
-------------------- -------------------- ------------
SALES_BIX            TIME_ID_BIX          BITMAP
SALES_BIX            CUST_ID_BIX          BITMAP
SALES_BIX            CHANNEL_ID_BIX       BITMAP
SALES_BIX            PROD_ID_BIX          BITMAP
SALES_IDX            PROD_ID_IDX          NORMAL
SALES_IDX            CUST_ID_IDX          NORMAL
SALES_IDX            TIME_ID_IDX          NORMAL
SALES_IDX            CHANNEL_ID_IDX       NORMAL
8 rows selected.

SQL> begin
  2
  3        DBMS_STATS.GATHER_SCHEMA_STATS (
  4          ownname => 'SH_STAR_TEST',
  5            estimate_percent => 10
  6            );
  7            end;
  8  /

PL/SQL procedure successfully completed.

Note that if you’re familiar with the SH schema you’ll have noticed that I’ve missed out on indexing the PROMOTIONS dimension - I forgot about this during the testing but as the schema doesn’t exist in the test schema I set up, and I don’t reference it in joins, it’ll effectively be ignored and treated as if it’s a fact table measure, so we can disregard it.

So now all the data was there, I ran a few select statements to see how the execution plan came out, firstly with STAR_TRANSFORMATION_ENABLED set to FALSE, and then to TRUE, against both the bitmap-indexed and regular-indexed tables. The results were interesting.

SQL> select sum(amount_sold), count(*)
  2  from   sales_bix s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:00.39

Execution Plan
----------------------------------------------------------
Plan hash value: 3700521350

----------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                           |     1 |    80 |   698   (1)| 00:00:09 |
|   1 |  SORT AGGREGATE                    |                           |     1 |    80 |            |          |
|*  2 |   HASH JOIN                        |                           |   601 | 48080 |   698   (1)| 00:00:09 |
|   3 |    TABLE ACCESS BY INDEX ROWID     | SALES_BIX                 |    20 |   500 |   546   (1)| 00:00:07
|   4 |     NESTED LOOPS                   |                           |  1181 | 75584 |   546   (1)| 00:00:07 |
|   5 |      MERGE JOIN CARTESIAN          |                           |    60 |  2340 |    27   (4)| 00:00:01 |
|   6 |       NESTED LOOPS                 |                           |     2 |    48 |     3   (0)| 00:00:01 |
|*  7 |        INDEX UNIQUE SCAN           | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|*  8 |        TABLE ACCESS FULL           | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|   9 |       BUFFER SORT                  |                           |    30 |   450 |    24   (5)| 00:00:01 |
|  10 |        TABLE ACCESS BY INDEX ROWID | TIMES                     |    30 |   450 |    27   (4)| 00:00:01 |
|  11 |         BITMAP CONVERSION TO ROWIDS|                           |       |       |            |          |
|  12 |          BITMAP AND                |                           |       |       |            |          |
|* 13 |           BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 14 |           BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX         |       |       |            |          |
|  15 |      BITMAP CONVERSION TO ROWIDS   |                           |       |       |            |          |
|  16 |       BITMAP AND                   |                           |       |       |            |          |
|* 17 |        BITMAP INDEX SINGLE VALUE   | TIME_ID_BIX               |       |       |            |          |
|* 18 |        BITMAP INDEX SINGLE VALUE   | PROD_ID_BIX               |       |       |            |          |
|* 19 |        BITMAP INDEX SINGLE VALUE   | CHANNEL_ID_BIX            |       |       |            |          |
|* 20 |    VIEW                            | index$_join$_002          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 21 |     HASH JOIN                      |                           |       |       |            |          |
|  22 |      BITMAP CONVERSION TO ROWIDS   |                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 23 |       BITMAP INDEX SINGLE VALUE    | CUST_STATE_PROVINCE_BIX   |       |       |            |
|  24 |      INDEX FAST FULL SCAN          | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."CUST_ID"="C"."CUST_ID")
   7 - access("CH"."CHANNEL_ID"=2)
   8 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  13 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  14 - access("T"."CALENDAR_YEAR"=1998)
  17 - access("S"."TIME_ID"="T"."TIME_ID")
  18 - access("S"."PROD_ID"="P"."PROD_ID")
  19 - access("S"."CHANNEL_ID"=2)
  20 - filter("C"."CUST_STATE_PROVINCE"='CA')
  21 - access(ROWID=ROWID)
  23 - access("C"."CUST_STATE_PROVINCE"='CA')

That's as expected, the query uses the bitmap indexes, does a number of hash joins and nested loop joins and returns the data in 0.39 seconds (not bad for 7m fact table rows) with a cost of 698. Running the query against the fact table with only regular non-unique indexes returns the following autotrace output:

SQL> select sum(amount_sold), count(*)
  2  from   sales_idx s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:20.96

Execution Plan
----------------------------------------------------------
Plan hash value: 2003962144

----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |     1 |    80 |  3175   (1)| 00:00:39 |
|   1 |  SORT AGGREGATE                 |                           |     1 |    80 |            |          |
|*  2 |   HASH JOIN                     |                           |   598 | 47840 |  3175   (1)| 00:00:39 |
|*  3 |    HASH JOIN                    |                           |  1176 | 75264 |  3023   (1)| 00:00:37 |
|   4 |     TABLE ACCESS BY INDEX ROWID | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|   5 |      BITMAP CONVERSION TO ROWIDS|                           |       |       |            |          |
|   6 |       BITMAP AND                |                           |       |       |            |          |
|*  7 |        BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|*  8 |        BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX         |       |       |            |          |
|*  9 |     TABLE ACCESS BY INDEX ROWID | SALES_IDX                 | 28620 |   698K|  1504   (1)| 00:00:19 |
|  10 |      NESTED LOOPS               |                           | 57239 |  2738K|  3011   (1)| 00:00:37 |
|  11 |       NESTED LOOPS              |                           |     2 |    48 |     3   (0)| 00:00:01 |
|* 12 |        INDEX UNIQUE SCAN        | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL        | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN          | PROD_ID_IDX               |   102K|       |   214   (2)| 00:00:03 |
|* 15 |    VIEW                         | index$_join$_002          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 16 |     HASH JOIN                   |                           |       |       |            |          |
|  17 |      BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 18 |       BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  19 |      INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - access("S"."TIME_ID"="T"."TIME_ID")
   7 - access("T"."CALENDAR_MONTH_NUMBER"=11)
   8 - access("T"."CALENDAR_YEAR"=1998)
   9 - filter("S"."CHANNEL_ID"=2)
  12 - access("CH"."CHANNEL_ID"=2)
  13 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  14 - access("S"."PROD_ID"="P"."PROD_ID")
  15 - filter("C"."CUST_STATE_PROVINCE"='CA')
  16 - access(ROWID=ROWID)
  18 - access("C"."CUST_STATE_PROVINCE"='CA')

OK, so this time the query took around 20 seconds with a cost of 3175, about ten times the cost of the query using bitmap indexes, around what I expected.

Now I enabled star transformations and ran the same queries again. Firstly, the table with bitmap indexes on it:

SQL> alter session set star_transformation_enabled = true;

Session altered.

Elapsed: 00:00:00.00
SQL> select sum(amount_sold), count(*)
  2  from   sales_bix s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:01.08

Execution Plan
----------------------------------------------------------
Plan hash value: 3776350793

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |     1 |    31 |   201   (5)| 00:00:03 |
|   1 |  SORT AGGREGATE                      |                           |     1 |    31 |            |          |
|   2 |   NESTED LOOPS                       |                           |    60 |  1860 |   201   (5)| 00:00:03 |
|*  3 |    INDEX UNIQUE SCAN                 | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID       | SALES_BIX                 |    60 |  1680 |   201   (5)| 00:00:03 |
|   5 |     BITMAP CONVERSION TO ROWIDS      |                           |       |       |            |          |
|   6 |      BITMAP AND                      |                           |       |       |            |          |
|   7 |       BITMAP MERGE                   |                           |       |       |            |          |
|   8 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|*  9 |         TABLE ACCESS FULL            | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|* 10 |         BITMAP INDEX RANGE SCAN      | PROD_ID_BIX               |       |       |            |          |
|  11 |       BITMAP MERGE                   |                           |       |       |            |          |
|  12 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|  13 |         TABLE ACCESS BY INDEX ROWID  | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|  14 |          BITMAP CONVERSION TO ROWIDS |                           |       |       |            |          |
|  15 |           BITMAP AND                 |                           |       |       |            |          |
|* 16 |            BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 17 |            BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX         |       |       |            |          |
|* 18 |         BITMAP INDEX RANGE SCAN      | TIME_ID_BIX               |       |       |            |          |
|* 19 |       BITMAP INDEX SINGLE VALUE      | CHANNEL_ID_BIX            |       |       |            |          |
|  20 |       BITMAP MERGE                   |                           |       |       |            |          |
|  21 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 22 |         VIEW                         | index$_join$_017          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 23 |          HASH JOIN                   |                           |       |       |            |          |
|  24 |           BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 25 |            BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  26 |           INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
|* 27 |         BITMAP INDEX RANGE SCAN      | CUST_ID_BIX               |       |       |            |          |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CH"."CHANNEL_ID"=2)
   9 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  10 - access("S"."PROD_ID"="P"."PROD_ID")
  16 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  17 - access("T"."CALENDAR_YEAR"=1998)
  18 - access("S"."TIME_ID"="T"."TIME_ID")
  19 - access("S"."CHANNEL_ID"=2)
  22 - filter("C"."CUST_STATE_PROVINCE"='CA')
  23 - access(ROWID=ROWID)
  25 - access("C"."CUST_STATE_PROVINCE"='CA')
  27 - access("S"."CUST_ID"="C"."CUST_ID")

Note
-----
   - star transformation used for this statement

Ok, so the execution time is slightly higher but the cost is lower, 201 to 698. Given the small data set size and the unscientific conditions, I wasn't too surprised about the slight time discrepency, but the execution plan cost was lower, which is what I expected. I'll run some more tests later and check that the star transform does infact return the data faster as well as generate a lower-cost plan. Note also the Note in the autotrace output, "-- Star transformation used for this statement".

Now I run the same test on the table with regular non-unique indexes.

SQL> select sum(amount_sold), count(*)
  2  from   sales_idx s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:01.88

Execution Plan
----------------------------------------------------------
Plan hash value: 747039580

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |     1 |    31 |  3625   (7)| 00:00:44 |
|   1 |  SORT AGGREGATE                      |                           |     1 |    31 |            |          |
|   2 |   NESTED LOOPS                       |                           |    60 |  1860 |  3625   (7)| 00:00:44 |
|*  3 |    INDEX UNIQUE SCAN                 | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS BY INDEX ROWID       | SALES_IDX                 |    60 |  1680 |  3625   (7)| 00:00:44 |
|   5 |     BITMAP CONVERSION TO ROWIDS      |                           |       |       |            |          |
|   6 |      BITMAP AND                      |                           |       |       |            |          |
|   7 |       BITMAP MERGE                   |                           |       |       |            |          |
|   8 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|   9 |         TABLE ACCESS BY INDEX ROWID  | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|  10 |          BITMAP CONVERSION TO ROWIDS |                           |       |       |            |          |
|  11 |           BITMAP AND                 |                           |       |       |            |          |
|* 12 |            BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 13 |            BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX         |       |       |            |          |
|  14 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 15 |          INDEX RANGE SCAN            | TIME_ID_IDX               |       |       |    14   (0)| 00:00:01 |
|  16 |       BITMAP MERGE                   |                           |       |       |            |          |
|  17 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 18 |         TABLE ACCESS FULL            | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|  19 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 20 |          INDEX RANGE SCAN            | PROD_ID_IDX               |       |       |   215   (2)| 00:00:03 |
|  21 |       BITMAP MERGE                   |                           |       |       |            |          |
|  22 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 23 |         VIEW                         | index$_join$_017          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 24 |          HASH JOIN                   |                           |       |       |            |          |
|  25 |           BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 26 |            BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  27 |           INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
|  28 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 29 |          INDEX RANGE SCAN            | CUST_ID_IDX               |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CH"."CHANNEL_ID"=2)
   4 - filter("S"."CHANNEL_ID"=2)
  12 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  13 - access("T"."CALENDAR_YEAR"=1998)
  15 - access("S"."TIME_ID"="T"."TIME_ID")
  18 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  20 - access("S"."PROD_ID"="P"."PROD_ID")
  23 - filter("C"."CUST_STATE_PROVINCE"='CA')
  24 - access(ROWID=ROWID)
  26 - access("C"."CUST_STATE_PROVINCE"='CA')
  29 - access("S"."CUST_ID"="C"."CUST_ID")

Note
-----
   - star transformation used for this statement

Now that's interesting. The data comes back faster than when we had star transformations disabled, the cost is slightly higher, and the autotrace output reports that a star transformation was in fact used on this query, even though the fact table didn't have bitmap indexes on its dimension columns. Looking at the execution plan (and I'm certainly no expert on interpreting explain plans here, so there's a good chance I've got the wrong end of the stick here) but it looks like the optimizer is creating on-the-fly bitmap indexes ("BITMAP CONVERSION FROM ROWIDS") and then scanning these bitmaps as part of a star transformation ("BITMAP CONVERSION TO ROWIDS"). Contrast this with the bitmap-indexed tables' execution plan, where only the "BITMAP CONVERSION TO ROWIDS" step takes place. I could well be wrong here (I suspect I am) but if anyone knows any more about this - I'm on 10g 10.2.0.3 Enterprise Edition on Windows XP SP2 - let me know as I'm definitely interested to know what's going on.

So, it looks like at least in terms of generating the most efficient execution plan, a fact table with bitmap indexes on the dimension key columns looks like being the optimal solution, although this apparent "on-the-fly" creation of bitmap indexes from regular B*Tree indexes looks interesting. So what about the question about index size?

SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX           122.88
CUST_ID_BIX              27.648
CUST_ID_IDX             131.072
PROD_ID_BIX               2.048
PROD_ID_IDX              122.88
TIME_ID_BIX               4.096
TIME_ID_IDX              163.84

8 rows selected.

So the normal non-unique indexes are currently over 100MB in size each, whilst the bitmap indexes (which are highly compressed) are below 5MB for the small dimensions, and around 27MB for the large, 55000-row dimension. Now we'll see how these are affected by inserts and updates to the tables.

SQL> insert into sales_bix
  2  select *
  3  from   sh.sales
  4  /

918843 rows created.

Elapsed: 00:00:56.67
SQL> commit
  2  /

Commit complete.

Elapsed: 00:00:00.39
SQL> insert into sales_idx
  2  select *
  3  from   sh.sales
  4  /

918843 rows created.

Elapsed: 00:09:55.17
SQL> commit;

Commit complete.

Elapsed: 00:00:00.02
SQL>
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX          139.264
CUST_ID_BIX              27.648
CUST_ID_IDX             180.224
PROD_ID_BIX               2.048
PROD_ID_IDX             147.456
TIME_ID_BIX               4.096
TIME_ID_IDX             196.608

8 rows selected.

Ok, so the table with bitmapped indexes took just under a minute to insert rows in to (1m rows into a table of about 7m rows) and the indexes stayed the same size, whilst the corresponding insert into the table with regular indexes took around 9 minutes and increased the index size by between 10% and 40%. Note that with this insert, no new dimension key values were used, it was the same set of data being inserted back into the test tables.

And now the same for updates. I'll do updates to the CUST_ID column as this has the largest amount of key values (55000-odd). Here are the results:

SQL> update sales_bix
  2  set    cust_id = 3001
  3  where  time_id between to_date('01-jan-2000','DD-MON-YYYY')
  4         and to_date('30-apr-2000','DD-MON-YYYY')
  5  and    channel_id =3
  6  /

461151 rows updated.

Elapsed: 00:00:45.51
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL> update sales_idx
  2  set    cust_id = 3000
  3  where  time_id between to_date('01-jan-1999','DD-MON-YYYY')
  4         and to_date('30-apr-1999','DD-MON-YYYY')
  5  and    channel_id = 2
  6  /

186120 rows updated.

Elapsed: 00:03:01.97
SQL> update sales_idx
  2  set    cust_id = 3001
  3  where  time_id between to_date('01-jan-2000','DD-MON-YYYY')
  4         and to_date('30-apr-2000','DD-MON-YYYY')
  5  and    channel_id =3
  6  /

461151 rows updated.

Elapsed: 00:06:18.26
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX          139.264
CUST_ID_BIX              28.672
CUST_ID_IDX             196.608
PROD_ID_BIX               2.048
PROD_ID_IDX             147.456
TIME_ID_BIX               4.096
TIME_ID_IDX             196.608

8 rows selected.

Ok, so the CUST_ID_IDX index has got bigger, but the bitmap indexes have stayed the same. What if I create another 5,000 customers, create another 100,000 sales records using these new customers and add them two the two fact tables?

SQL> insert into sales_bix
  2  select *
  3  from   new_sales
  4  /

100000 rows created.

Elapsed: 00:00:03.78
SQL> insert into sales_idx
  2  select *
  3  from   new_sales
  4  /

100000 rows created.

Elapsed: 00:00:33.18
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME              Size in MB
------------------------- ----------
CHANNEL_ID_BIX                 3.072
CHANNEL_ID_IDX               147.456
CUST_ID_BIX                   28.672
CUST_ID_IDX                  196.608
PROD_ID_BIX                    2.048
PROD_ID_IDX                  147.456
TIME_ID_BIX                    4.096
TIME_ID_IDX                  196.608

8 rows selected.

So, no difference then, and the bitmap-indexed table is still faster to insert data in to, compared to a regularly-indexed table.

What I've found then is that, with my data set and my laptop, there's no noticeable degradation when maintaining a fact table with bitmap indexes on the fact table columns, and in fact activities on this table are faster than the same activities on a table with B*Tree indexes. Now I'm of course open to the idea that my data set is skewed, or my environment is unrepresentative, but I'm surprised nonetheless as I expected all hell to break loose once I tried updating and inserting into column with bitmap indexes on them, and that's not what I got in my testing. Maybe my fact table is too small (7m rows) or my dimensions are too small (two are <100 rows, one is 1.5k and one is 55k rows), but there was a clear performance difference between the table with regular indexes and bitmap indexes, so we can't put it down to such a small data set that all activities ran in just a second or so. Strange.

Also, the star transformation taking place on the table with regular, non-unique indexes was strange as well. I had a quick email exchange with Peter Scott and he mentioned that DB/2, for example, also creates bitmap indexes on the fly, so this could be something that was introduced in 10gR1 or R2 and I didn't notice at the time. It was an interesting exercise though as I thought firstly that bitmap indexes being updated would be a performance killer, and it wasn't, and I was pleasantly surprised when the star transformation took place so easily (and in fact appears to have taken place even when I didn't have the neccessary bitmap indexes in place).

We'll be running all this on the real data set in a couple of weeks, including trying to update table columns with bitmap indexes on them, so it'll be interesting to see whether this test data set ends up being representative of the real-world data; also, if the bitmap indexes revert to type (which I suspect they will) it'll be interesting to see where I went wrong in my testing. I've you've spotted anything already (or indeed if this is in fact the correct behavior, especially the star transformation bit) let me know as I'm keen to get these unresolved bits sorted out. Also, I'm conscious that my testing is on 10g 10.2.0.3 whilst the client is on 10.1.0.5, so if this is a 10gR2-specific behavior, again let me know.