Using the Sparsity Advisor in Oracle OLAP 10gR2

I mentioned yesterday that I was looking to update the partitioning and sparsity handling sections of my Oracle OLAP Best Practices paper, prior to running it again at Collaborate'06. Yesterday I covered off partitioning in a lot more detail, and today I'm going to look at the new Sparsity Advisor that comes with Oracle Database 10g Release 2.

In the version of the OLAP Best Practices paper I delivered at the Desktop Conference 2006, I walked through an example where I calculated the sparsity of a cube based on the data in the fact table. To perform this calculation, I first of all worked out the product of all the lowest level members in each of the five dimensions:

39 customers x 22 products x 2 channels x 10 promotions x 2929 days = 50,260,640 potential cell combinations per measure

Then, I ran a select statement against the fact table, to see how many of these actual combinations (referred to as "tuples") actually existed in the source data:

SQL> SELECT count(distinct(to_char(custid)||
  2         to_char(prodid)||
  3         to_char(channelid)||
  4         to_char(orderdate)||
  5         to_char(promoid)) distinct_cells
  6  FROM   gs_sales
7  /

DISTINCT_CELLS

     11401</pre>

Then, by dividing the actual combinations by the potential ones, you come up with a figure for the sparsity of the cube:

11,401 actual / 50,261,640 potential * 100 = 0.02% sparse

which is a pretty sparse cube (30% is the general cut-off between sparse and dense)

The only problem with this approach though is that we are only considering the lowest level of the dimension. In reality, each dimension will have one or more hierarchies which themselves have one or more levels, and the actual cube density will be affected by how these levels rollup through their hierarchies. You could of course calculate this manually but it's very, very difficult and most people just work off of the rough figure I've used above.

At the end of the day though, the correct figure can be calculated, it's just quite tricky, and using this you should be able to automatically define whether dimensions are sparse or not, and also make a call about the order in which they should be listed, and funny enough a feature that does this is exactly what Oracle have provided in the lastest version of Oracle OLAP.

This feature is called the Sparsity Advisor and it takes the form of a number of subprograms within the DBMS_AW package. Going forward, my understanding is that this new functionality will be incorporated into Analytic Workspace Manager, I'll speculate on how that might take form later on.

Full details on the procedures and functions within the Sparsity Advisor can be found in the online documentation, but what I'll do rather than go through them one-by-one is work through an example, using the Global Widgets OLAP dataset that I've used previously.

The Global Widgets OLAP dataset has five dimensions - customer, product, time, promotion and channel - and a cube with two measures, order quantity and ship quantity. Each of the dimensions has a single hierarchy consisting of multiple levels, like this:

When you view the logical dimensional model in AWM, it looks like this:

The Sparsity Advisor works by analyzing the data in the source tables, and you pass to it the list of columns that go to make up the levels in the hierarchy. The first step then is to take a look at the tables that provide the data for your cube:

SQL> CONNECT gsw/password@ora10g
Connected.
SQL> SET ECHO ON
SQL> SET LINESIZE 300
SQL> SET PAGESIZE 300
SQL> SET SERVEROUTPUT ON FORMAT WRAPPED
SQL>
SQL> DESCRIBE gs_channel_dim
 Name
 --------------------------------------------
 CHANNELID
 NAME
 ALL_CHANNELS

SQL> DESCRIBE gs_promotion_dim
Name

PROMOID
PROMO_NAME
PROMO_CATEGORY_ID
PROMO_CATEGORY_NAME

SQL> DESCRIBE gs_product_dim
Name

PRODID
NAME
PRODSIZE
COST
SELL
PRODLINE_NAME
ALL_PRODUCTS

SQL> DESCRIBE gs_customer_dim
Name

CUSTID
NAME
CREDITLIMIT
ADDRESS
STATE
POSTCODE
CITY_NAME
DISTRICT_NAME
REGION_NAME
ALL_CUSTOMERS

SQL> DESCRIBE gs_time_dim
Name

DAYID
DAY_TIMESPAN
DAY_ENDDATE
MONTHID
MONTH_ENDDATE
MONTH_TIMESPAN
QUARTERID
QUARTER_ENDDATE
QUARTER_TIMESPAN
YEARID
YEAR_ENDDATE
YEAR_TIMESPAN

SQL> DESCRIBE gs_sales
Name

ORDID
CUSTID
PRODID
CHANNELID
DAYID
ORDERQTY
SHIPDATE
SHIPQTY
REQDATE
CANCELDATE
STATUS
SELLPRICE
COSTPRICE
PROMOID

Now that we've got a listing of tables and columns that provide the cube data, the next step is to use the DBMS_AW.SPARSITY_ADVICE_TABLE procedure to create a table within our schema to hold the advisor's output:

SQL>
SQL> -- Define and initialise and advice table
SQL> BEGIN
  2          dbms_aw.sparsity_advice_table();
  3  EXCEPTION
  4          when others then null;
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> TRUNCATE TABLE aw_sparsity_advice;

Table truncated.

Now, we need to put some PL/SQL together to tell the advisor how the columns in our source tables populate the dimensions in our cube. The way you do this is to go through the dimensions in your cube, and make calls to the DBMS_AW.ADD_DIMENSION_SOURCE procedure, passing to it

  1. the name of the dimension,
  2. the column in the source table that links through to the fact table,
  3. the name of the dimension source table, or a SELECT statement that provides the source data
  4. an indicator as to whether the dimension is parent-child, level based, a measure dimension or has no hierarchy, and
  5. a list of the columns that provide the IDs for the hierarchy levels, in order from the bottom to the top level.

Here's an example using the GSW_AW source tables:

SQL>
SQL> DECLARE
  2          dimsources      dbms_aw$_dimension_sources_t;
  3          dimlist         VARCHAR2(500);
  4          sparsedim       VARCHAR2(500);
  5          defs            CLOB;
  6  BEGIN
  7  -- Provide information about all dimensions in the cube
  8
  9          dbms_aw.add_dimension_source('channel','channelid',dimsources,
 10                  'gs_channel_dim', dbms_aw.hier_levels,
 11                  dbms_aw$_columnlist_t('channel_id','all_channels'));
 12
 13          dbms_aw.add_dimension_source('product','prodid',dimsources,
 14                  'gs_product_dim', dbms_aw.hier_levels,
 15                  dbms_aw$_columnlist_t('prodid','prodline_name','all_products'));
 16
 17          dbms_aw.add_dimension_source('customer','custid',dimsources,
 18                  'gs_customer_dim', dbms_aw.hier_levels,
 19                  dbms_aw$_columnlist_t('custid','city_name','district_name',
 20                  'region_name','all_customers'));
 21
 22          dbms_aw.add_dimension_source('promotion','promoid',dimsources,
 23                  'gs_promotion_dim', dbms_aw.hier_levels,
 24                  dbms_aw$_columnlist_t('promoid','promo_category_id'));
 25
 26          dbms_aw.add_dimension_source('time','dayid',dimsources,
 27                  'gs_time_dim',dbms_aw.hier_levels,
 28                  dbms_aw$_columnlist_t('dayid','monthid','quarterid','yearid'));
 29  

Once you've defined how your dimensions are sourced, you can then call the DBMS_AW.ADVISE_SPARSITY procedure to populate your Sparsity Advice table.

 30
 31  -- Analyze fact table and provide advicxe without partitioning
 32
 33          dbms_aw.advise_sparsity('gs_sales','sales_cube',
 34                                  dimsources, dbms_aw.advice_full,
 35                                  dbms_aw.partby_none);
 36
 37  commit;

Note that there are options as to whether your advice is full, fast or default, and whether you include any advice on measure partitioning (see yesterday's article for details), but for the meantime we'll generate full advice but leave partitioning for another time.

The final PL/SQL bit is to call the DBMS_AW.ADVISE_DIMENSIONALITY function and procedure to generate recommendations on the order for the dimensions, the definition for any composite (or compressed composite) that it recommends, and the definition of the measures.

 38
 39  -- Generate OLAP DML for composite and variable definitions
 40
 41          dimlist := dbms_aw.advise_dimensionality('sales_cube', sparsedim,
 42                          'sales_cube_composite');
 43          dbms_output.put_line('Dimension list:  ' || dimlist);
 44          dbms_output.put_line('Sparse dimension:  ' || sparsedim);
 45          dbms_aw.advise_dimensionality(defs, 'sales_cube');
 46          dbms_output.put_line('Definitions:  ');
 47          dbms_aw.printlog(defs);
 48
 49  END;
 50  /

When you run this, the recommendation looks like this:

Dimension list:  <sales_cube_composite<channel promotion product customer time>>
Sparse dimension:  DEFINE sales_cube_composite COMPOSITE COMPRESSED <channel promotion product custo
Definitions:
DEFINE sales_cube.cc COMPOSITE COMPRESSED <channel promotion product customer time>
DEFINE sales_cube NUMBER VARIABLE <sales_cube.cc<channel promotion product customer time>>

PL/SQL procedure successfully completed.

The way you read this is as follows:

  • The Dimension List line shows you that all of your dimensions will be included in a composite, in the order channel first, then promotion, then product, then customer and then time. This definition, with <composite_name<dim1 dim2 dim 3 dim4>> is actually OLAP DML notation (the old Express developers will recognise it) and in terms of Analytic Workspace Manager, it means that you'll mark all dimensions as sparse, and list them in the order of channel, then promotion, then product and so on.
  • The next line tells you that this composite will be defined as COMPOSITE COMPRESSED, as opposed to COMPOSITE. Again this is OLAP DML and in Analytic Workspace Manager terms, it means you should tick the "Create the cube compressed" tick box.
  • The remaining two lines with the DEFINEs at the start you can ignore if you're using Analytic Workspace Manager to define your cube; again what they are is the OLAP DML definitions for the compressed composite and the measure and you shouldn't need to worry about these.

Just to round it out, once you've run the Sparsity Advisor and turned to Analytic Workspace Manager to define your cube, based on these recommendations your "Implementation Details" tab should look like this:

The thing to remember here is that when it refers to a composite, what it means is that all of the dimensions within the composite should be marked as sparse. When it defines the composite as compressed, it means that you should tick the "Use Compression..." tickbox.

Once you've run the advisor, you can select from the AW_SPARSITY_ADVICE table to see how it arrived at its conclusions:

SQL> SELECT fact
  2  ,      dimension
  3  ,      dimcolumn
  4  ,      membercount nmem
  5  ,      leafcount   nleaf
  6  ,      advice
  7  ,      density
  8  FROM   aw_sparsity_advice
  9  WHERE  cubename='sales_cube'
 10  /

FACT DIMENSION DIMCOLUMN NMEM NLEAF ADVICE DENSITY


gs_sales channel channelid 2 2 COMPRESSED .00437484
gs_sales promotion promoid 10 3 COMPRESSED .00244991
gs_sales product prodid 22 22 COMPRESSED .0427859
gs_sales customer custid 39 39 COMPRESSED .15399423
gs_sales time dayid 2926 1549 COMPRESSED .85230554

There are more columns within this table that you can select from, for more details check out the docs.

Finally, one thing to be aware of here is that you need to have a column name match between the fact table and each of the dimension tables. In my case, the lowest level ID in the GS_TIME_DIM is called DAYID, but in the fact table this dimension is used several times, for the order date, the ship date and so on. To make the sparsity advisor work, I had to temporarily rename the ORDERDATE column in GS_SALES to DAYID so that it matched the lowest level key in the GS_TIME_DIM table, like this:

SQL> alter table gs_sales
  2  rename column orderdate to dayid
  3  /

Table altered.

otherwise,  you'll get an error to say that there's no link between one of the dimensions and the fact table.

Wrapping up then, it seems like this advisor is a pretty good way of working out what order to list your dimensions in, whether they should be marked as sparse or dense, and whether you should go one step further and use cube compression. It seems a little bit scary using this first of all, but you can use the code above as a template and just slot in the dimensions and measures you want to use, but going forward the obvious thing is for Oracle to include this functionality in Analytic Workspace Manager, either as a "generate recommendations" button, or even as something that gets run automatically when you go to define your cube. It'll be interesting to see how this feature is developed in future.