Oracle 11g Pivot

One of the things that I often come across is the "up-dateable fact", that is a fact that starts life "incomplete" and changes overtime. Examples include things such as support calls that start life as "open" then progress through "responded", "resolved" and finally "closed"; statuses in the sales cycle such as ordered, paid, shipped; stock movements in a warehouse - goods received and dispatched. Of course the business, rightly, needs to measure the times between stages or the number or value of transactions at each stage.

As a principle, I hate the idea of having to update a fact. A fact has happened, it is not going to change. I suppose to be more accurate a "change" is a new event, a new fact, a new fact occurring at a different time. So how to model this? - well instinctively I would go for a table that is only inserted (preferably appended to - think set based!) containing whatever dimensions are needed (don't forget the 'when' dimension) PLUS an 'EVENT' dimension (one row per expected status) and the measures (how many, how much etc). To report on this we need to rotate the table so that the events that belong to single item appear in the same row. Before Oracle 11g we would need to construct some SQL using a mix of case statements and analytic functions to rotate the data. But now we have a potentially better way the, Oracle 11g Pivot operator.

Here we define a set of dimensions for the row (similar to the dimensions in a Group BY clause), the aggregation operators for the pivoted measures - which of course could include MIN() or MAX() for the cases when want to pivot DATE types. We also need to define the dimensions we want to pivot by, and here we can actually choose multiple dimensions; this again is somewhat similar to the GROUP BY of traditional SQL. Remember though when we pivot we sometimes only expect to 'aggregate' a single row - if we want to pivot order date and dispatch date then we probably have just one of each!.

So how does it look? Well the Oracle 11g documentation describes the syntax and gives some examples - here I am showing a slightly more complicated case where we are pivoting by two dimensions, each with a known set of code values. This example is based on two of the examples in the Oracle 11g Data Warehousing Guide

	SELECT * FROM	(
		SELECT product, channel, quarter, quantity_sold FROM sales_view
		) PIVOT (SUM(quantity_sold) as SUMQ, SUM(amount_sold) as SUMS
			FOR (channel, quarter) IN
			((5, '02') AS CATALOG_Q2,
		 	(4, '01') AS INTERNET_Q1,
		 	(4, '04') AS INTERNET_Q4,
		 	(2, '02') AS PARTNERS_Q2,
		 	(9, '03') AS TELE_Q3
			) );

The query returns a column for the product and for each of the specified pairs of channel and quarter a column for each measure. So we get columns for:

PRODUCT, CATALOG_Q2_SUMQ, CATALOG_Q2_SUMS, INTERNET_Q1_SUMQ, INTERNET_Q1_SUMS, INTERNET_Q4_SUMQ, INTERNET_Q4_SUMS, PARTNERS_Q2_SUMQ, INTERNET_Q4_SUMS, TELE_Q3_SUMQ, and TELE_Q3_SUMS

Note how the the measure name is concatenated to the alias in the in list.
As you can see we don't need to specify each combination of channel and quarter - just the ones we want in our pivoted view. We also don't use a GROUP BY clause - we specify the columns we want to see (both the dimensions and the aggregations) and Oracle implicitly groups by all of the columns not in aggregated functions.

In my example I used SELECT * to wrap the inline pivot, in practice I would explicitly select the columns and perhaps alias them to more meaningful names than the concatenated ones generated by Oracle. I would also expose the pivot as database view and thus access it from OWB or OBIEE where it appears to be just another table or view.

Another point to note is that you might see null values in the pivoted measures and these can be due to one of two reasons: the value stored for that combination of dimensions (in our case channel and quarter) is actually NULL, or that the combination does not exist. If you need to (and you may not need to) you can differentiate by using a COUNT measure; if the count is zero then the combination does not exist in the source table, if one or more then the source has NULLs stored for the combination.

We used a similar pivot view to the one above to monitor stock movements in a warehouse - in this case we needed to track individual batches of product from multiple potential suppliers, so in addition to the product dimension we had dimensional columns for batch id (a degenerate dimension) and supplier. The view was then exposed to OWB to allow us to include the aggregated result set in our ETL process - we needed to calculate some additional measures based on the difference between two of the pivoted columns. The Pivot operator greatly simplified our ETL for this fact - we could easily write an ETL process with a straight aggregation then pivot the results with CASE statements or DECODES or whatever - but that would have been less clear and also increased the number of "moving parts".

We have had no problems with performance with our data set - 80 million rows pivoted on Exadata to just a few seconds. But it was not too slow on our non-exadata development machine either.