Compressed Composites (Oracle 10g Compression) Explained
January 5th, 2005 by Mark Rittman
If you’ve an interest in the OLAP Option and you’ve read some of my
recent articles on the
new features in Oracle 10g OLAP, you’ve probably seen a feature called
“compression” mentioned. In this
DBAZine article I described compression as “a novel form of cube
compression, which promises to both enhance query performance (by retrieving
fewer blocks of data for a given logical amount of data) and drastically reduce
batch loading and aggregation times, saving disk space on the way. Compression
in Oracle 10g OLAP is more about improving performance and scalability than
saving disk space (although that’s a nice side effect). The net result of this
is that, in a given batch window, Oracle 10g OLAP can now load and aggregate
more data than before, and for a given amount of disk, can store more
information than before. This, plus big advances in scalability internally
around areas such as very big composites, makes Oracle 10g OLAP potentially a
very effective platform when building particularly large cubes.” So how does
this compression feature actually work?
Good question. I actually found out about the compression feature from
talking to a couple of people in the OLAP Product team, and at the time this was
news to me as I hadn’t seen this feature too well signposted in either the
recent (2003) batch of Open World papers or in the online documentation. Indeed
I was under
the impression that it was only coming with the 10.1.0.3 patch release, but
I’ve now
located this feature in the online documentation and it’s all a bit clearer
now.
It appears that what we’re talking about when referring to “compression” in
10g OLAP cubes is actually something called “Compressed Composites”. Composites
should be fairly familiar to anyone working with Express or Oracle OLAP and are
structures that you set up when you’ve got very sparse cubes. In cases where
there are few actual used combinations of dimensions for your variable, compared
to the potential set of valid combinations, setting up composites, which only
store within them the actual combinations of dimensions used, together with
indexes to the underlying dimensions, reduces the amount of NA values stored in
the variable and results in more efficient data storage.
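To make the idea concrete, here is a toy Python sketch of my own. It is not how Oracle OLAP implements composites internally, and the dimension members and values are made up. It just contrasts a dense layout, which reserves a cell for every possible combination, with a composite-style layout that keeps one position per combination actually used plus an index back to the base dimension values.

```python
# Toy model only: member names and values are made up for illustration.
TIMES = ["JAN", "FEB", "MAR", "APR"]
PRODUCTS = ["P1", "P2", "P3"]
CHANNELS = ["WEB", "STORE"]

# Only three of the 4 * 3 * 2 = 24 possible combinations carry data.
DATA = {
    ("FEB", "P1", "WEB"): 120.0,
    ("FEB", "P3", "STORE"): 45.0,
    ("APR", "P2", "WEB"): 80.0,
}

# Dense storage: one cell per possible combination, NA where unused.
dense_cell_count = len(TIMES) * len(PRODUCTS) * len(CHANNELS)

# Composite-style storage: one position per combination actually used,
# plus an index from the base dimension values to that position.
composite_tuples = sorted(DATA)
composite_index = {combo: pos for pos, combo in enumerate(composite_tuples)}
values = [DATA[combo] for combo in composite_tuples]


def lookup(time, product, channel):
    """Return the stored value, or None (standing in for NA) if unused."""
    pos = composite_index.get((time, product, channel))
    return None if pos is None else values[pos]


print(dense_cell_count, len(composite_tuples))
```

In this made-up case the dense layout needs 24 cells, 21 of them NA, while the composite holds just 3 tuples.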
So what are compressed composites? Probably the best description is the one in
the OLAP DML Reference R1 (10.1) section on compressed composites, which I’ll
just quote:
“In some cases, when you aggregate data in a variable dimensioned by a
composite defined with one or more hierarchical dimension, one parent node
may have only one descendant node and so on all the way up to the top
level. When a variable has a good deal of this type of sparsity, use a
compressed composite as the dimension of the variable. Dimensioning this
type of variable with a compressed composite creates the smallest possible
variable, composite, and composite index much smaller than if you dimension
a variable with a b-tree or hash composite.
This reduction in size does not occur at the detail level. Oracle OLAP
creates composite values for detail level the same way for all composites. A
composite contains one composite tuple for each set of base dimension values
that identifies non-NA detail data in the variables that use it.
The reduction in size occurs for those sets of base dimension values
that identify non-NA data at higher levels of hierarchical dimensions.
Oracle OLAP populates these higher-level values differently depending on
whether a variable is dimensioned by a b-tree, hash, or compressed
composite:
For variables dimensioned by b-tree and hash composites, Oracle OLAP
creates composite tuples for non-NA data at higher levels the same way that
it does for non-NA data at the detail level. There is one composite tuple
(with its own physical position) for each set of base dimension values that
identifies non-NA data. The composite index contains all of the index
entries needed to relate the composite tuple to the base dimension values.
For variables dimensioned by compressed composites, Oracle OLAP
reduces redundancy in the variable, composite, and composite index by using
the “intelligence” of the AGGREGATE command that populates the variable. For
sets of base dimension values that represent parent nodes, Oracle OLAP
creates a physical position in the composite only for those tuples that
represent a parent with more than one descendant. Oracle OLAP then creates
an index between this composite structure and the base dimensions and uses
this composite structure as the dimension of the variable. Since the actual
structure of a compressed composite is smaller than that of a b-tree or hash
composite, a variable dimensioned by a compressed composite is also smaller
than a variable dimensioned by a b-tree or hash composite. Also, since the
index for a compressed composite only has nodes for parents with more than
one descendant, the index of a compressed composite has fewer levels and is
smaller than the index of a b-tree composite.
Although performance varies depending on the depth of the hierarchies
and the order of the dimensions in the composite, aggregating variables
defined with compressed composites is typically much faster than aggregating
variables defined with b-tree or hash composites.”
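As a rough way of picturing the difference, here is a small Python sketch of my own. It is a deliberately simplified model, not Oracle's actual algorithm, and the hierarchies and member names are hypothetical. It assumes that a regular (b-tree or hash) composite holds one tuple for every non-NA cell at every level, and that a compressed composite only needs a separate physical position when a cell rolls up a different set of detail tuples from any cell beneath it. A parent with a single descendant covers exactly the same detail data as that descendant, so in this model the two share a position.

```python
from itertools import product

# Hypothetical child -> parent hierarchies, for illustration only.
TIME_PARENT = {"JAN": "Q1", "FEB": "Q1", "MAR": "Q1", "Q1": "2004"}
PRODUCT_PARENT = {"P1": "ALL", "P2": "ALL"}


def ancestors_and_self(member, parent_of):
    """Return the member followed by each ancestor up to the top level."""
    chain = [member]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def contributing_details(detail_tuples):
    """Map every non-NA cell (detail or aggregate) to the set of detail
    tuples that roll up into it."""
    cells = {}
    for time, prod in detail_tuples:
        for cell in product(ancestors_and_self(time, TIME_PARENT),
                            ancestors_and_self(prod, PRODUCT_PARENT)):
            cells.setdefault(cell, set()).add((time, prod))
    return cells


def regular_tuple_count(detail_tuples):
    """Simplified b-tree/hash model: one tuple per non-NA cell."""
    return len(contributing_details(detail_tuples))


def compressed_position_count(detail_tuples):
    """Simplified compressed model: cells that roll up exactly the same
    detail tuples share a single physical position."""
    cells = contributing_details(detail_tuples)
    return len({frozenset(details) for details in cells.values()})


one_stick = [("FEB", "P1")]
two_sticks = [("FEB", "P1"), ("MAR", "P2")]
print(regular_tuple_count(one_stick), compressed_position_count(one_stick))
print(regular_tuple_count(two_sticks), compressed_position_count(two_sticks))
```

With a single detail tuple, the regular model ends up with 6 tuples (3 time levels by 2 product levels) where the compressed model needs 1 position. With the two detail tuples it is 10 against 3. Note that the saving is all at the aggregate levels, since the detail tuples are stored either way.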
You can define a composite as compressed using the syntax:
DEFINE name COMPOSITE [AW workspace] COMPRESSED [SESSION]
Alternatively, with the forthcoming 10.1.0.4 patch release and the
accompanying new version of Analytic Workspace Manager, you can turn on
compression for a cube (which translates to one or more variables) using the
GUI.

This presumably tells AWM to create any composites with the COMPRESSED option
as detailed above.
One of the things that was mentioned to me when I was first told about
compression was that in this initial version, the scope for using it was quite
limited. This would appear to be borne out by the reference in the above
documentation that states:
“…when you aggregate data in a variable dimensioned by a composite
defined with one or more hierarchical dimension, one parent node may have
only one descendant node and so on all the way up to the top level. When
a variable has a good deal of this type of sparsity, use a compressed
composite as the dimension of the variable. Dimensioning this type of
variable with a compressed composite creates the smallest possible variable,
composite, and composite index much smaller than if you dimension a variable
with a b-tree or hash composite”
which would suggest that compression in this release is only of value when
you have the specific situation where you’re finding parent nodes with only one
descendant node. If anyone from the product team is reading, could you add any
more detail for this?
Now that this is a bit clearer, the next step is to get hold of a migrated
OES database that’s going to 10g, identify whether the specific conditions for
compressed composites are met, implement them and do some benchmarking. We’ve
got a couple of clients who are looking to implement this feature so I’ll report
back with results in due course. Once again full details on compressed
composites can be found in the
OLAP DML Reference available on OTN.
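As a first pass at the "are the conditions met" question, something like the following Python sketch could be run against an export of a hierarchy and the detail members that actually have data. This is my own assumption about a useful check, not an Oracle-supplied tool, and the hierarchy and member names are hypothetical. It measures, for one hierarchy, what fraction of the parents that receive data have exactly one child with data.

```python
def single_child_parent_ratio(parent_of, leaves_with_data):
    """Fraction of data-bearing parents that have exactly one data-bearing
    child. parent_of maps child -> parent for a single hierarchy."""
    children_with_data = {}
    frontier = set(leaves_with_data)
    while frontier:
        next_frontier = set()
        for member in frontier:
            parent = parent_of.get(member)
            if parent is not None:
                children_with_data.setdefault(parent, set()).add(member)
                next_frontier.add(parent)
        frontier = next_frontier
    if not children_with_data:
        return 0.0
    single = sum(1 for kids in children_with_data.values() if len(kids) == 1)
    return single / len(children_with_data)


# Hypothetical time hierarchy, for illustration only.
TIME_PARENT = {
    "JAN": "Q1", "FEB": "Q1", "MAR": "Q1",
    "APR": "Q2", "MAY": "Q2", "JUN": "Q2",
    "Q1": "2004", "Q2": "2004",
}

print(single_child_parent_ratio(TIME_PARENT, {"FEB"}))
print(single_child_parent_ratio(TIME_PARENT, {"FEB", "MAR", "APR"}))
```

A ratio close to 1 would suggest a lot of single-descendant parents in that hierarchy. It only looks at one hierarchy at a time, so it is a rough indicator rather than a substitute for proper benchmarking.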

January 7th, 2005 at 8:48 pm
Mark, the compression benefit does apply only when you tend to have “sticks” – for instance, if JAN, FEB, MAR roll up into Q1, but only FEB exists. We’ve seen that most data these days is sparse enough that compression is usually a benefit, and when it doesn’t help, it usually doesn’t hurt either. And when it does help, it’s often huge – we saw one customer data set (from manufacturing) where it brought build time down from about 2 hours to a scant 15 minutes.
The “limited” application is more about restrictions that come into play when you’re using compressed composites – in the initial release you can only aggregate compressed composites using SUM, and you can’t write data into a variable dimensioned by a compressed composite that has been aggregated without first clearing out the aggregated data. There are probably others, but those are the ones that jump to mind first. Like all features, compressed composites are being continually improved and you can be sure that future releases will relax or remove many of the current limitations.
January 7th, 2005 at 8:51 pm
…of course, I’m speaking for myself, comments should not be construed as company positions, these are all opinions, etc, etc, etc.