Using the Sparsity Advisor in Oracle OLAP 10gR2
April 4th, 2006 by Mark Rittman
I
mentioned yesterday that I was looking to update the partitioning and sparsity handling sections of my Oracle OLAP Best Practices paper, prior to
running it again at Collaborate’06. Yesterday I covered off partitioning in a
lot more detail, and today I’m going to look at the new
Sparsity Advisor that
comes with Oracle Database 10g Release 2.
In the version of the
OLAP Best Practices paper I delivered at the Desktop
Conference 2006, I walked through an example where I calculated the sparsity of
a cube based on the data in the fact table. To perform this calculation, I first
of all worked out the product of all the lowest level members in each of the
five dimensions:
39 customers x 22 products x 2 channels x 10 promotions x 2929 days =
50,260,640 potential cell combinations per measure
Then, I ran a select statement against the fact table, to see how many of
these actual combinations (referred to as "tuples") actually existed in the source
data:
SQL> SELECT count(distinct(to_char(custid)||
2 to_char(prodid)||
3 to_char(channelid)||
4 to_char(orderdate)||
5 to_char(promoid)) distinct_cells
6 FROM gs_sales
7 /
DISTINCT_CELLS
--------------
11401
Then, by dividing the actual combinations by the potential ones, you come up
with a figure for the sparsity of the cube:
11,401 actual / 50,261,640 potential * 100 = 0.02% sparse
which is a pretty sparse cube (30% is the general cut-off between sparse and
dense)
The only problem with this approach though is that we are only considering
the lowest level of the dimension. In reality, each dimension will have one or
more hierarchies which themselves have one or more levels, and the actual cube
density will be affected by how these levels rollup through their hierarchies.
You could of course calculate this manually but it’s very, very difficult and
most people just work off of the rough figure I’ve used above.
At the end of the day though, the correct figure can be calculated, it’s just
quite tricky, and using this you should be able to automatically define whether
dimensions are sparse or not, and also make a call about the order in which they
should be listed, and funny enough a feature that does this is exactly what
Oracle have provided in the lastest version of Oracle OLAP.
This feature is called the Sparsity Advisor and it takes the form of a number
of subprograms within the DBMS_AW package. Going forward, my understanding is
that this new functionality will be incorporated into Analytic Workspace
Manager, I’ll speculate on how that might take form later on.
Full details on the procedures and functions within the Sparsity Advisor can
be found in the
online documentation, but what I’ll do rather than go through them
one-by-one is work through an example, using the Global Widgets OLAP dataset
that I’ve used previously.
The Global Widgets OLAP dataset has five dimensions – customer, product,
time, promotion and channel – and a cube with two measures, order quantity and
ship quantity. Each of the dimensions has a single hierarchy consisting of
multiple levels, like this:
When you view the logical dimensional model in AWM, it looks
like this:

The Sparsity Advisor works by analyzing the data in the source
tables, and you pass to it the list of columns that go to make up the levels in
the hierarchy. The first step then is to take a look at the tables that provide
the data for your cube:
SQL> CONNECT gsw/password@ora10g Connected. SQL> SET ECHO ON SQL> SET LINESIZE 300 SQL> SET PAGESIZE 300 SQL> SET SERVEROUTPUT ON FORMAT WRAPPED SQL> SQL> DESCRIBE gs_channel_dim Name -------------------------------------------- CHANNELID NAME ALL_CHANNELS SQL> DESCRIBE gs_promotion_dim Name -------------------------------------------- PROMOID PROMO_NAME PROMO_CATEGORY_ID PROMO_CATEGORY_NAME SQL> DESCRIBE gs_product_dim Name -------------------------------------------- PRODID NAME PRODSIZE COST SELL PRODLINE_NAME ALL_PRODUCTS SQL> DESCRIBE gs_customer_dim Name -------------------------------------------- CUSTID NAME CREDITLIMIT ADDRESS STATE POSTCODE CITY_NAME DISTRICT_NAME REGION_NAME ALL_CUSTOMERS SQL> DESCRIBE gs_time_dim Name -------------------------------------------- DAYID DAY_TIMESPAN DAY_ENDDATE MONTHID MONTH_ENDDATE MONTH_TIMESPAN QUARTERID QUARTER_ENDDATE QUARTER_TIMESPAN YEARID YEAR_ENDDATE YEAR_TIMESPAN SQL> DESCRIBE gs_sales Name -------------------------------------------- ORDID CUSTID PRODID CHANNELID DAYID ORDERQTY SHIPDATE SHIPQTY REQDATE CANCELDATE STATUS SELLPRICE COSTPRICE PROMOID
Now that we’ve got a listing of tables and columns that provide
the cube data, the next step is to use the
DBMS_AW.SPARSITY_ADVICE_TABLE
procedure to create a table within our schema to hold the advisor’s output:
SQL> SQL> -- Define and initialise and advice table SQL> BEGIN 2 dbms_aw.sparsity_advice_table(); 3 EXCEPTION 4 when others then null; 5 END; 6 / PL/SQL procedure successfully completed. SQL> SQL> TRUNCATE TABLE aw_sparsity_advice; Table truncated.
Now, we need to put some PL/SQL together to tell the advisor how the columns
in our source tables populate the dimensions in our cube. The way you do this is
to go through the dimensions in your cube, and make calls to the
DBMS_AW.ADD_DIMENSION_SOURCE procedure, passing to it
- the name of the dimension,
- the column in the source table that links through to the fact table,
- the name of the dimension source table, or a
SELECT statement that
provides the source data - an indicator as to whether the dimension is parent-child, level based, a
measure dimension or has no hierarchy, and - a list of the columns that provide the IDs for the hierarchy levels, in
order from the bottom to the top level.
Here’s an example using the GSW_AW source tables:
SQL>
SQL> DECLARE
2 dimsources dbms_aw$_dimension_sources_t;
3 dimlist VARCHAR2(500);
4 sparsedim VARCHAR2(500);
5 defs CLOB;
6 BEGIN
7 -- Provide information about all dimensions in the cube
8
9 dbms_aw.add_dimension_source('channel','channelid',dimsources,
10 'gs_channel_dim', dbms_aw.hier_levels,
11 dbms_aw$_columnlist_t('channel_id','all_channels'));
12
13 dbms_aw.add_dimension_source('product','prodid',dimsources,
14 'gs_product_dim', dbms_aw.hier_levels,
15 dbms_aw$_columnlist_t('prodid','prodline_name','all_products'));
16
17 dbms_aw.add_dimension_source('customer','custid',dimsources,
18 'gs_customer_dim', dbms_aw.hier_levels,
19 dbms_aw$_columnlist_t('custid','city_name','district_name',
20 'region_name','all_customers'));
21
22 dbms_aw.add_dimension_source('promotion','promoid',dimsources,
23 'gs_promotion_dim', dbms_aw.hier_levels,
24 dbms_aw$_columnlist_t('promoid','promo_category_id'));
25
26 dbms_aw.add_dimension_source('time','dayid',dimsources,
27 'gs_time_dim',dbms_aw.hier_levels,
28 dbms_aw$_columnlist_t('dayid','monthid','quarterid','yearid'));
29
Once you’ve defined how your dimensions are sourced, you can then call the
DBMS_AW.ADVISE_SPARSITY procedure to populate your Sparsity Advice table.
30
31 -- Analyze fact table and provide advicxe without partitioning
32
33 dbms_aw.advise_sparsity('gs_sales','sales_cube',
34 dimsources, dbms_aw.advice_full,
35 dbms_aw.partby_none);
36
37 commit;
Note that there are options as to whether your advice is full, fast or default, and whether you include any advice on measure partitioning (see yesterday’s article for details), but for the meantime we’ll generate full advice but leave partitioning for another time.
The final PL/SQL bit is to call the DBMS_AW.ADVISE_DIMENSIONALITY
function and
procedure to generate
recommendations on the order for the dimensions, the definition for any
composite (or compressed composite) that it recommends, and the definition of
the measures.
38
39 -- Generate OLAP DML for composite and variable definitions
40
41 dimlist := dbms_aw.advise_dimensionality('sales_cube', sparsedim,
42 'sales_cube_composite');
43 dbms_output.put_line('Dimension list: ' || dimlist);
44 dbms_output.put_line('Sparse dimension: ' || sparsedim);
45 dbms_aw.advise_dimensionality(defs, 'sales_cube');
46 dbms_output.put_line('Definitions: ');
47 dbms_aw.printlog(defs);
48
49 END;
50 /
When you run this, the recommendation looks like this:
Dimension list: <sales_cube_composite<channel promotion product customer time>> Sparse dimension: DEFINE sales_cube_composite COMPOSITE COMPRESSED <channel promotion product custo Definitions: DEFINE sales_cube.cc COMPOSITE COMPRESSED <channel promotion product customer time> DEFINE sales_cube NUMBER VARIABLE <sales_cube.cc<channel promotion product customer time>> PL/SQL procedure successfully completed.
The way you read this is as follows:
- The Dimension List line shows you that all of your dimensions will be
included in a composite, in the order channel first, then promotion, then
product, then customer and then time. This definition, with <composite_name<dim1
dim2 dim 3 dim4>> is actually OLAP DML notation (the old Express developers
will recognise it) and in terms of Analytic Workspace Manager, it means that
you’ll mark all dimensions as sparse, and list them in the order of channel,
then promotion, then product and so on. - The next line tells you that this composite will be defined as
COMPOSITE COMPRESSED, as opposed to
COMPOSITE. Again this is OLAP DML and in
Analytic Workspace Manager terms, it means you should tick the "Create the
cube compressed" tick box. - The remaining two lines with the DEFINEs at the start you can ignore if
you’re using Analytic Workspace Manager to define your cube; again what they
are is the OLAP DML definitions for the compressed composite and the measure
and you shouldn’t need to worry about these.
Just to round it out, once you’ve run the Sparsity Advisor and turned to
Analytic Workspace Manager to define your cube, based on these recommendations
your "Implementation Details" tab should look like this:

The thing to remember here is that when it refers to a
composite, what it means is that all of the dimensions within the composite
should be marked as sparse. When it defines the composite as compressed, it
means that you should tick the "Use Compression…" tickbox.
Once you’ve run the advisor, you can select from the
AW_SPARSITY_ADVICE table to see how it arrived at its conclusions:
SQL> SELECT fact 2 , dimension 3 , dimcolumn 4 , membercount nmem 5 , leafcount nleaf 6 , advice 7 , density 8 FROM aw_sparsity_advice 9 WHERE cubename='sales_cube' 10 / FACT DIMENSION DIMCOLUMN NMEM NLEAF ADVICE DENSITY -------------------- -------------------- -------------------- ---------- ---------- --------------- gs_sales channel channelid 2 2 COMPRESSED .00437484 gs_sales promotion promoid 10 3 COMPRESSED .00244991 gs_sales product prodid 22 22 COMPRESSED .0427859 gs_sales customer custid 39 39 COMPRESSED .15399423 gs_sales time dayid 2926 1549 COMPRESSED .85230554
There are more columns within this table that you can select
from, for more details
check out the docs.
Finally, one thing to be aware of here is that you need to have
a column name match between the fact table and each of the dimension tables. In my case, the lowest level ID in the
GS_TIME_DIM
is called DAYID, but in the fact table this dimension is used several times, for
the order date, the ship date and so on. To make the sparsity advisor work, I
had to temporarily rename the ORDERDATE column in
GS_SALES to
DAYID so that it
matched the lowest level key in the GS_TIME_DIM table, like this:
SQL> alter table gs_sales 2 rename column orderdate to dayid 3 / Table altered.
otherwise, you’ll get an error to say that there’s no link between one
of the dimensions and the fact table.
Wrapping up then, it seems like this advisor is a pretty good way of working
out what order to list your dimensions in, whether they should be marked as
sparse or dense, and whether you should go one step further and use cube
compression. It seems a little bit scary using this first of all, but you can
use the code above as a template and just slot in the dimensions and measures
you want to use, but going forward the obvious thing is for Oracle to include
this functionality in Analytic Workspace Manager, either as a "generate
recommendations" button, or even as something that gets run automatically when
you go to define your cube. It’ll be interesting to see how this feature is
developed in future.

December 15th, 2006 at 10:14 pm
[...] I covered the PL/SQL Sparsity Advisor earlier in the year, but at the time it was fairly tricky to run and now the functionality has been incorporated directly into Analytic Workspace Manager. What both these things do – AWM is just a GUI front-end for the PL/SQL version – is works out for you which of your dimensions are sparse, which are dense, and it recommends the order that you should list the dimensions, something that can have a fairly significant influence on the time it takes to load and query your cube. [...]