Using the Sparsity Advisor in Oracle OLAP 10gR2

April 4th, 2006 by Mark Rittman

I
mentioned yesterday that I was looking to update the partitioning and sparsity handling sections of my Oracle OLAP Best Practices paper, prior to
running it again at Collaborate’06. Yesterday I covered off partitioning in a
lot more detail, and today I’m going to look at the new
Sparsity Advisor that
comes with Oracle Database 10g Release 2.

In the version of the
OLAP Best Practices paper I delivered at the Desktop
Conference 2006
, I walked through an example where I calculated the sparsity of
a cube based on the data in the fact table. To perform this calculation, I first
of all worked out the product of all the lowest level members in each of the
five dimensions:

39 customers x 22 products x 2 channels x 10 promotions x 2929 days =
50,260,640 potential cell combinations per measure

Then, I ran a select statement against the fact table, to see how many of
these actual combinations (referred to as "tuples") actually existed in the source
data:

SQL> SELECT count(distinct(to_char(custid)||
  2         to_char(prodid)||
  3         to_char(channelid)||
  4         to_char(orderdate)||
  5         to_char(promoid)) distinct_cells
  6  FROM   gs_sales
7  /

DISTINCT_CELLS
--------------
         11401

Then, by dividing the actual combinations by the potential ones, you come up
with a figure for the sparsity of the cube:

11,401 actual / 50,261,640 potential * 100 = 0.02% sparse

which is a pretty sparse cube (30% is the general cut-off between sparse and
dense)

The only problem with this approach though is that we are only considering
the lowest level of the dimension. In reality, each dimension will have one or
more hierarchies which themselves have one or more levels, and the actual cube
density will be affected by how these levels rollup through their hierarchies.
You could of course calculate this manually but it’s very, very difficult and
most people just work off of the rough figure I’ve used above.

At the end of the day though, the correct figure can be calculated, it’s just
quite tricky, and using this you should be able to automatically define whether
dimensions are sparse or not, and also make a call about the order in which they
should be listed, and funny enough a feature that does this is exactly what
Oracle have provided in the lastest version of Oracle OLAP.

This feature is called the Sparsity Advisor and it takes the form of a number
of subprograms within the DBMS_AW package. Going forward, my understanding is
that this new functionality will be incorporated into Analytic Workspace
Manager, I’ll speculate on how that might take form later on.

Full details on the procedures and functions within the Sparsity Advisor can
be found in the

online documentation
, but what I’ll do rather than go through them
one-by-one is work through an example, using the Global Widgets OLAP dataset
that I’ve used previously.

The Global Widgets OLAP dataset has five dimensions – customer, product,
time, promotion and channel – and a cube with two measures, order quantity and
ship quantity. Each of the dimensions has a single hierarchy consisting of
multiple levels, like this:

When you view the logical dimensional model in AWM, it looks
like this:

The Sparsity Advisor works by analyzing the data in the source
tables, and you pass to it the list of columns that go to make up the levels in
the hierarchy. The first step then is to take a look at the tables that provide
the data for your cube:

SQL> CONNECT gsw/password@ora10g
Connected.
SQL> SET ECHO ON
SQL> SET LINESIZE 300
SQL> SET PAGESIZE 300
SQL> SET SERVEROUTPUT ON FORMAT WRAPPED
SQL>
SQL> DESCRIBE gs_channel_dim
 Name
 --------------------------------------------
 CHANNELID
 NAME
 ALL_CHANNELS

SQL> DESCRIBE gs_promotion_dim
 Name
 --------------------------------------------
 PROMOID
 PROMO_NAME
 PROMO_CATEGORY_ID
 PROMO_CATEGORY_NAME

SQL> DESCRIBE gs_product_dim
 Name
 --------------------------------------------
 PRODID
 NAME
 PRODSIZE
 COST
 SELL
 PRODLINE_NAME
 ALL_PRODUCTS

SQL> DESCRIBE gs_customer_dim
 Name
 --------------------------------------------
 CUSTID
 NAME
 CREDITLIMIT
 ADDRESS
 STATE
 POSTCODE
 CITY_NAME
 DISTRICT_NAME
 REGION_NAME
 ALL_CUSTOMERS

SQL> DESCRIBE gs_time_dim
 Name
 --------------------------------------------
 DAYID
 DAY_TIMESPAN
 DAY_ENDDATE
 MONTHID
 MONTH_ENDDATE
 MONTH_TIMESPAN
 QUARTERID
 QUARTER_ENDDATE
 QUARTER_TIMESPAN
 YEARID
 YEAR_ENDDATE
 YEAR_TIMESPAN

SQL> DESCRIBE gs_sales
 Name
 --------------------------------------------
 ORDID
 CUSTID
 PRODID
 CHANNELID
 DAYID
 ORDERQTY
 SHIPDATE
 SHIPQTY
 REQDATE
 CANCELDATE
 STATUS
 SELLPRICE
 COSTPRICE
 PROMOID                                        

Now that we’ve got a listing of tables and columns that provide
the cube data, the next step is to use the

DBMS_AW.SPARSITY_ADVICE_TABLE

procedure to create a table within our schema to hold the advisor’s output:

SQL>
SQL> -- Define and initialise and advice table
SQL> BEGIN
  2          dbms_aw.sparsity_advice_table();
  3  EXCEPTION
  4          when others then null;
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> TRUNCATE TABLE aw_sparsity_advice;

Table truncated.

Now, we need to put some PL/SQL together to tell the advisor how the columns
in our source tables populate the dimensions in our cube. The way you do this is
to go through the dimensions in your cube, and make calls to the


DBMS_AW.ADD_DIMENSION_SOURCE
procedure, passing to it

  1. the name of the dimension,
  2. the column in the source table that links through to the fact table,
  3. the name of the dimension source table, or a
    SELECT statement that
    provides the source data
  4. an indicator as to whether the dimension is parent-child, level based, a
    measure dimension or has no hierarchy, and
  5. a list of the columns that provide the IDs for the hierarchy levels, in
    order from the bottom to the top level.

Here’s an example using the GSW_AW source tables:

SQL>
SQL> DECLARE
  2          dimsources      dbms_aw$_dimension_sources_t;
  3          dimlist         VARCHAR2(500);
  4          sparsedim       VARCHAR2(500);
  5          defs            CLOB;
  6  BEGIN
  7  -- Provide information about all dimensions in the cube
  8
  9          dbms_aw.add_dimension_source('channel','channelid',dimsources,
 10                  'gs_channel_dim', dbms_aw.hier_levels,
 11                  dbms_aw$_columnlist_t('channel_id','all_channels'));
 12
 13          dbms_aw.add_dimension_source('product','prodid',dimsources,
 14                  'gs_product_dim', dbms_aw.hier_levels,
 15                  dbms_aw$_columnlist_t('prodid','prodline_name','all_products'));
 16
 17          dbms_aw.add_dimension_source('customer','custid',dimsources,
 18                  'gs_customer_dim', dbms_aw.hier_levels,
 19                  dbms_aw$_columnlist_t('custid','city_name','district_name',
 20                  'region_name','all_customers'));
 21
 22          dbms_aw.add_dimension_source('promotion','promoid',dimsources,
 23                  'gs_promotion_dim', dbms_aw.hier_levels,
 24                  dbms_aw$_columnlist_t('promoid','promo_category_id'));
 25
 26          dbms_aw.add_dimension_source('time','dayid',dimsources,
 27                  'gs_time_dim',dbms_aw.hier_levels,
 28                  dbms_aw$_columnlist_t('dayid','monthid','quarterid','yearid'));
 29  

Once you’ve defined how your dimensions are sourced, you can then call the

DBMS_AW.ADVISE_SPARSITY
procedure to populate your Sparsity Advice table.

 30
 31  -- Analyze fact table and provide advicxe without partitioning
 32
 33          dbms_aw.advise_sparsity('gs_sales','sales_cube',
 34                                  dimsources, dbms_aw.advice_full,
 35                                  dbms_aw.partby_none);
 36
 37  commit;

Note that there are options as to whether your advice is full, fast or default, and whether you include any advice on measure partitioning (see yesterday’s article for details), but for the meantime we’ll generate full advice but leave partitioning for another time.

The final PL/SQL bit is to call the DBMS_AW.ADVISE_DIMENSIONALITY

function
and

procedure
to generate
recommendations on the order for the dimensions, the definition for any
composite (or compressed composite) that it recommends, and the definition of
the measures.

 38
 39  -- Generate OLAP DML for composite and variable definitions
 40
 41          dimlist := dbms_aw.advise_dimensionality('sales_cube', sparsedim,
 42                          'sales_cube_composite');
 43          dbms_output.put_line('Dimension list:  ' || dimlist);
 44          dbms_output.put_line('Sparse dimension:  ' || sparsedim);
 45          dbms_aw.advise_dimensionality(defs, 'sales_cube');
 46          dbms_output.put_line('Definitions:  ');
 47          dbms_aw.printlog(defs);
 48
 49  END;
 50  /

When you run this, the recommendation looks like this:

Dimension list:  <sales_cube_composite<channel promotion product customer time>>
Sparse dimension:  DEFINE sales_cube_composite COMPOSITE COMPRESSED <channel promotion product custo
Definitions:
DEFINE sales_cube.cc COMPOSITE COMPRESSED <channel promotion product customer time>
DEFINE sales_cube NUMBER VARIABLE <sales_cube.cc<channel promotion product customer time>>

PL/SQL procedure successfully completed.

The way you read this is as follows:

  • The Dimension List line shows you that all of your dimensions will be
    included in a composite, in the order channel first, then promotion, then
    product, then customer and then time. This definition, with <composite_name<dim1
    dim2 dim 3 dim4>> is actually OLAP DML notation (the old Express developers
    will recognise it) and in terms of Analytic Workspace Manager, it means that
    you’ll mark all dimensions as sparse, and list them in the order of channel,
    then promotion, then product and so on.
  • The next line tells you that this composite will be defined as
    COMPOSITE COMPRESSED, as opposed to
    COMPOSITE. Again this is OLAP DML and in
    Analytic Workspace Manager terms, it means you should tick the "Create the
    cube compressed" tick box.
  • The remaining two lines with the DEFINEs at the start you can ignore if
    you’re using Analytic Workspace Manager to define your cube; again what they
    are is the OLAP DML definitions for the compressed composite and the measure
    and you shouldn’t need to worry about these.

Just to round it out, once you’ve run the Sparsity Advisor and turned to
Analytic Workspace Manager to define your cube, based on these recommendations
your "Implementation Details" tab should look like this:

The thing to remember here is that when it refers to a
composite, what it means is that all of the dimensions within the composite
should be marked as sparse. When it defines the composite as compressed, it
means that you should tick the "Use Compression…" tickbox.

Once you’ve run the advisor, you can select from the
AW_SPARSITY_ADVICE table to see how it arrived at its conclusions:

SQL> SELECT fact
  2  ,      dimension
  3  ,      dimcolumn
  4  ,      membercount nmem
  5  ,      leafcount   nleaf
  6  ,      advice
  7  ,      density
  8  FROM   aw_sparsity_advice
  9  WHERE  cubename='sales_cube'
 10  /

FACT                 DIMENSION            DIMCOLUMN                  NMEM      NLEAF ADVICE                  DENSITY
-------------------- -------------------- -------------------- ---------- ---------- ---------------
gs_sales             channel              channelid                     2          2 COMPRESSED            .00437484
gs_sales             promotion            promoid                      10          3 COMPRESSED            .00244991
gs_sales             product              prodid                       22         22 COMPRESSED             .0427859
gs_sales             customer             custid                       39         39 COMPRESSED            .15399423
gs_sales             time                 dayid                      2926       1549 COMPRESSED            .85230554

There are more columns within this table that you can select
from, for more details

check out the docs
.

Finally, one thing to be aware of here is that you need to have
a column name match between the fact table and each of the dimension tables. In my case, the lowest level ID in the
GS_TIME_DIM
is called DAYID, but in the fact table this dimension is used several times, for
the order date, the ship date and so on. To make the sparsity advisor work, I
had to temporarily rename the ORDERDATE column in
GS_SALES to
DAYID so that it
matched the lowest level key in the GS_TIME_DIM table, like this:

SQL> alter table gs_sales
  2  rename column orderdate to dayid
  3  /

Table altered.

otherwise,  you’ll get an error to say that there’s no link between one
of the dimensions and the fact table.

Wrapping up then, it seems like this advisor is a pretty good way of working
out what order to list your dimensions in, whether they should be marked as
sparse or dense, and whether you should go one step further and use cube
compression. It seems a little bit scary using this first of all, but you can
use the code above as a template and just slot in the dimensions and measures
you want to use, but going forward the obvious thing is for Oracle to include
this functionality in Analytic Workspace Manager, either as a "generate
recommendations" button, or even as something that gets run automatically when
you go to define your cube. It’ll be interesting to see how this feature is
developed in future.

Comments

  1. Mark Rittman &#187; New Sparsity Advisor within AWM 10.2.0.3 Says:

    [...] I covered the PL/SQL Sparsity Advisor earlier in the year, but at the time it was fairly tricky to run and now the functionality has been incorporated directly into Analytic Workspace Manager. What both these things do – AWM is just a GUI front-end for the PL/SQL version – is works out for you which of your dimensions are sparse, which are dense, and it recommends the order that you should list the dimensions, something that can have a fairly significant influence on the time it takes to load and query your cube. [...]

Website Design & Build: tymedia.co.uk