More on Oracle OLAP 11g and Query Rewrite

December 15th, 2007

Over the past few days I’ve been taking a closer look at how query rewrite works with the new 11g release of Oracle OLAP. As a quick recap, in Oracle 11g you can now use OLAP analytic workspaces as summaries within your data warehouse, with Oracle now able to register OLAP dimensions and cubes as “cube organized materialized views”, which can be used by the query optimizer as an alternative to regular materialized views. The advantage of using analytic workspaces instead of relational summaries is that a single multi-dimensional analytic workspace can take the place of multiple relational summaries, and an analytic workspace can be faster to aggregate and query than a relational summary.

My interest in this area is mainly due to Discoverer customers asking me over the years whether Oracle OLAP could be a solution to their query performance problems. In the past, whilst moving to an analytic workspace could provide performance benefits when working with large aggregated data sets, you would have had to start using a different version of Discoverer – Oracle Discoverer Plus OLAP – which had a slightly different interface to normal, relational Discoverer, stored its reports in a different metadata catalog, and had only a subset of the functionality of relational Discoverer, albeit with a spiffy new dimensional query builder. Now that the 11g release of Oracle OLAP allows analytic workspaces to be queried just like regular materialized views, it's possible to use regular Discoverer with them, which opens up the possibility of combining the performance of OLAP with the familiarity and maturity of regular SQL-based query tools.

So, my studies followed two main areas: firstly, to see whether Discoverer relational – actually the Windows-based Discoverer Desktop version – can make use of 11g OLAP cubes, and secondly, to take a closer look at how the rewrite process works. To perform the tests, I used the Videostore data set that gets installed in the VIDEO5 schema when you create an End User Layer using Discoverer Administrator, and created an analytic workspace containing measures, dimensions and hierarchies matching the Videostore Discoverer business area.

The Video store data set contains Store, Product and Times dimensions, and a Sales Fact table containing a number of measures.

The dimension folders contain items that are arranged into hierarchies, so that users can drill up and drill down in their worksheets.

This more or less translated straight over to an analytic workspace. A couple of extra steps were needed: firstly, I added a physical "Year" column to the Times table, so that I could create a day > year > all times hierarchy in the analytic workspace – in Discoverer, all of the time dimension attributes are generated at query time using a function, something I usually try to discourage on customer engagements, as applying a function to the fact table's time dimension key can stop that column's index from being used.
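To make the index point concrete, here's a minimal sketch of the two approaches; the table and column names are assumptions for illustration, not the actual VIDEO5 schema definitions:

```sql
-- Assumed illustrative schema: SALES_FACT(day_key, sales), TIMES(day_key, day_date).

-- Deriving the year with a function applied to the fact table's time key
-- typically prevents the optimizer from using an index on that column:
SELECT SUM(f.sales)
FROM   sales_fact f
WHERE  TO_CHAR(f.day_key, 'YYYY') = '2007';

-- Adding a physical Year column to the Times table keeps the predicate
-- index-friendly, and gives the analytic workspace a real level to build
-- the day > year > all times hierarchy from:
ALTER TABLE times ADD (year NUMBER(4));
UPDATE times SET year = EXTRACT(YEAR FROM day_date);

SELECT SUM(f.sales)
FROM   sales_fact f
       JOIN times t ON t.day_key = f.day_key
WHERE  t.year = 2007;
```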

The second step was to add additional "Total" columns to the Times, Product and Store physical tables, as the analytic workspace needs dimension members at the top level to store measures aggregated up to that level. If I was building my AW using Oracle Warehouse Builder I could add these columns as part of the AW load mapping, but for this example I just added them to the relational tables and then created the analytic workspace.

Once I had built the analytic workspace (which, incidentally, I built as a non-partitioned, compressed cube), I removed all of the existing constraints in the VIDEO5 schema and recreated them using the SQL script generated by the Materialized View Advisor, together with some relational dimensions, so that the query optimizer had the best chance of rewriting queries to use the cube organized materialized views that AWM then generated. Once the analytic workspace was in place and a simple check through SQL Developer had confirmed that query rewrite against it was working, I started my tests.
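That SQL Developer check can be as simple as explaining a representative aggregate query and looking for the cube MV in the plan; the item names below are assumptions based on the sample schema:

```sql
-- Rewrite needs to be enabled for the session; cube MVs generally need
-- TRUSTED or STALE_TOLERATED integrity rather than ENFORCED.
ALTER SESSION SET query_rewrite_enabled = TRUE;
ALTER SESSION SET query_rewrite_integrity = TRUSTED;

EXPLAIN PLAN FOR
  SELECT t.year, SUM(f.sales)
  FROM   sales_fact f JOIN times t ON t.day_key = f.day_key
  GROUP  BY t.year;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- If rewrite is working, the plan references the CB$ cube organized
-- materialized view (CB$SALES_CUBE in my case) rather than the
-- detail-level fact table.
```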

The first query I ran was a simple table of departments, regions and summarized sales.

Taking a look at the SQL that Discoverer generated (via the View SQL feature of Discoverer Desktop), it looked like a typical piece of Discoverer SQL.
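The generated statement itself isn't reproduced here, but Discoverer SQL for a table of departments, regions and summed sales typically has this shape (the folder and item names are assumptions for this sketch):

```sql
-- Typical Discoverer-generated aggregate query: detail-level fact table
-- joined to the dimension folders, grouped by the worksheet items.
SELECT p.department,
       s.region,
       SUM(f.sales)
FROM   sales_fact f,
       products   p,
       stores     s
WHERE  f.product_key = p.product_key
AND    f.store_key   = s.store_key
GROUP  BY p.department, s.region;
```

Note that the query is written entirely against the detail-level tables; it's the optimizer, not Discoverer, that decides to redirect it to the cube.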

Taking a look at the Explain Plan for the query, it looks like it’s been rewritten correctly, now using the cube organized materialized view over the analytic workspace. Moreover, it’s not joined back to the dimensions to get the dimension member names, something it was doing in my previous tests.

So what’s happened here in the background is that the Cost-Based Optimizer has decided that, for this particular query block, the analytic workspace (or more precisely, the cube organized materialized view) will return the aggregated data more quickly than the detail-level, relational tables originally specified in the query, so internally it’s re-written the query to go to the CB$SALES_CUBE cube organized materialized view. In this particular case, as all of the data it required could be retrieved from the cube organized materialized view – the query just requests what are actually dimension member IDs from the analytic workspace I created – it can get all it needs just from this summary.

Next I put together a crosstab query, to see how that worked.

Again, taking a look at the explain plan, it’s been rewritten to use the cube organized materialized view over the Analytic Workspace, but this time I’ve requested some product data that’s actually an attribute in the analytic workspace, as opposed to a dimension member, and so Oracle has had to join back to the dimension tables to get this additional data.

So far, I've just requested aggregated data that the Cost-Based Optimizer has determined can be most efficiently retrieved from the cube organized materialized view. This materialized view was created using SUM aggregations on the various measures, and contains aggregations for all the combinations of product, store and time dimension levels. How, though, would it work if we used an SQL function in our query, say one of the analytic function worksheets that comes with the Video Store sample data set?
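As a rough relational analogy for what the cube holds, think of a single summary covering every combination of levels, something a GROUPING SETS query would otherwise have to produce; column names here are again assumptions:

```sql
-- Relational sketch of the cube's contents: SUMs of the measures at
-- every combination of time, product and store levels, including the
-- grand total. The AW stores this full cross product of levels, so a
-- query at any of these grains can be answered from the cube.
SELECT t.year, p.department, s.region, SUM(f.sales) AS sales
FROM   sales_fact f
       JOIN times    t ON t.day_key     = f.day_key
       JOIN products p ON p.product_key = f.product_key
       JOIN stores   s ON s.store_key   = f.store_key
GROUP  BY GROUPING SETS (
         (t.year, p.department, s.region),
         (t.year, p.department),
         (t.year),
         () );
```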

I tried it out with the Rank of Sales worksheet.

Taking a look at the SQL, you can see the RANK analytic function being used.

Again, taking a look at the Explain Plan, it’s been re-written to use the cube organized materialized view.

Under the covers, the Cost-Based Optimizer has decided that the most efficient way to retrieve sales by city and year, and sales by region and year, is to first go to the cube organized materialized view; then, it uses the SQL engine to perform the ranking, finally returning the results to Discoverer so that it can display the worksheet values. This is exactly the same process that goes on when the query is rewritten to a relational materialized view, with the materialized view providing the summarized data and the SQL engine then ranking it using the WINDOW SORT operation. In this particular case, the query has also had to join back to the dimension tables again to get the year description to use in the worksheet page item.
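The two-stage processing described above looks roughly like this when written out by hand; item names are assumptions, and in the real worksheet it's Discoverer generating the equivalent SQL:

```sql
-- The inner query block is what gets rewritten to the CB$SALES_CUBE cube
-- organized materialized view; the outer RANK() is then evaluated by the
-- SQL engine as a WINDOW SORT over the summarized rows.
SELECT region,
       year,
       sales,
       RANK() OVER (PARTITION BY year ORDER BY sales DESC) AS sales_rank
FROM  (SELECT s.region, t.year, SUM(f.sales) AS sales
       FROM   sales_fact f
              JOIN stores s ON s.store_key = f.store_key
              JOIN times  t ON t.day_key   = f.day_key
       GROUP  BY s.region, t.year);
```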

The only issue I found with rewrite was, I think, a Discoverer-specific one: worksheets that use items taken from a complex folder (one made up of arbitrary items dragged into a folder) never got rewritten. I tried the same query using both regular and complex folders, and whilst the SQL for both queries appeared to be the same, they ended up with different Explain Plans, which I'm guessing is down to some additional hints, joins or something else that's actually getting added to queries against complex folders and is stopping rewrite from occurring. One to check out with an SQL trace at some point.
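One way to run that trace would be to enable extended SQL trace on the Discoverer session, re-run both worksheets, and compare the statements in the trace files with tkprof; the `:sid` and `:serial` bind values here are placeholders to be looked up in V$SESSION:

```sql
-- Enable extended trace (waits and binds) for the Discoverer session.
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => :sid,
    serial_num => :serial,
    waits      => TRUE,
    binds      => TRUE);
END;
/
-- ...run the regular-folder and complex-folder worksheets...
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE(
    session_id => :sid,
    serial_num => :serial);
END;
/
```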

Whilst we're on the subject of Explain Plans and Oracle OLAP, I was thinking about the SQL statements that my previous "Oracle OLAP and ApEx" posting used. In the example queries, I joined individual views over the analytic workspace dimensions and cubes, which in the 10g release would have been a pretty inefficient way to query OLAP data – it made the OLAP engine return all the dimension records that met the criteria and all the measure records, then had the SQL engine join them before returning the data, which was a bit crazy as the data is already "pre-joined" in the analytic workspace. To get around this, the recommended 10g way of accessing analytic workspaces through SQL was to create a single denormalized view over all the dimensions and measures, which, whilst efficient, was a bit difficult to work with. In 11g, though, there's a new CUBE JOIN explain plan step that allows you to query the analytic workspace views as individual items, with the join actually pushed back to the analytic workspace. To see whether this was working, I went back to one of my queries and generated an explain plan.
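The 11g-friendly pattern looks like an ordinary star-style join across the separate AW views; the view and column names below follow the general shape of AWM-generated views but are assumptions for this sketch:

```sql
-- Querying the individual dimension and cube views; in 10g the SQL
-- engine had to join these row sources itself, in 11g the optimizer
-- pushes the join back into the analytic workspace (visible as a
-- CUBE JOIN step in the plan).
SELECT t.long_description AS year,
       p.long_description AS department,
       f.sales
FROM   times_view      t,
       products_view   p,
       sales_cube_view f
WHERE  f.time_id    = t.dim_key
AND    f.product_id = p.dim_key;
```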

What’s happening here is that Oracle is aware that the views I’m querying are actually over an analytic workspace, and it therefore uses the new JOINED CUBE PARTIAL OUTER operation to join the data, passing the joining back to the analytic workspace which does it more efficiently as the data is already joined within its multi-dimensional storage array.

Anyway, it was an interesting bit of studying to do, and my thanks also go to Bud Endress from the OLAP Product Management Team for some of the discussions we've had around these new features. Like anything new, I'd use it initially with a bit of caution – the cost that the CBO attributes to analytic workspaces compared to regular relational storage is still going through a bit of calibration – but it's certainly got some possibilities for Discoverer and other data warehouse setups that need to regularly query large sets of aggregated data, and certainly maintaining a single analytic workspace is a lot less work than maintaining a large set of individual relational summaries. Watch this space for more details as we work through the 11g OLAP release, and if you're interested in working with us on prototyping some of these new features, drop us a line.

Comments

  1. Pete Scott Says:

    Mark, don't forget that Oracle 10 users could have built a relational view over the OLAP cube and then registered the view for query rewrite using the rewrite equivalence procedure. This is of course a bit clunky and may not give as many opportunities for successful rewrite – and the problem with failed rewrite is that we are using low (or no) aggregation source tables in our queries and relying on the rewrite engine to pick the summary that works for our query; when rewrite does not happen, our query has to do loads of aggregation in the database at query time.

  2. Borkur Steingrimsson Says:

    Mark,

    Did you try creating some regular materialized views next to this and see which way the CBO would prefer to go?

    Borkur

  3. Christopher Bridge Says:

    Hi Mark,

    Great write-up, as usual.

    “My interest in this area is mainly due to Discoverer customers asking me over the years whether Oracle OLAP could be a solution to their query performance problems.”

    Are the Discoverer customers with the performance issues using an OLAP model in the first place? or, is the real issue the need for them to convert a relational-based reporting schema into an OLAP model?

    The Video store example is an OLAP model example, making use of complex folders and SQL trimming in Discoverer, if I recall correctly.

    Chris.

  4. Mark Rittman Says:

    Hi Pete, Chris, Borkur,

    @Pete – yes, thanks for pointing this out. I covered query equivalence in an old article for DBAZine, actually – http://www.dbazine.com/datawarehouse/dw-articles/rittman2 – and as you say, it gives us a means of rewriting SQL queries against an AW. I think the limitation here is that it’d only cover individual SQL statements whereas regular query rewrite can be fairly intelligent about taking a range of SQL statements and rewriting them to use the AW. And of course, with 11g, it’s a hell of a lot easier to set up!

    @Borkur – no I haven’t, and it’s an interesting question. I guess for 11.1.0.6.0, the CBO would always use the AW MVs as their costing is artificially low – this is being addressed in the 11.1.0.7.0 version (and the intervening OLAP patch) and at that point it’d be an interesting test to run.

    @Chris – yes, you’re right, in the original draft of the article I did state that this wasn’t a solution for performance problems against OLTP databases, you’d need to get your data into a star schema in the first place to get this to work, which may in fact solve your problem without needing to use the AW anyway. Interesting point about the complex folders and SQL trimming – when I get a moment, I’ll run a proper SQL trace on the session and see what SQL is actually getting generated as opposed to what Discoverer is reporting – given that Disco says both queries are using the same SQL, but different explain plans are being generated, I suspect that under the covers slightly different SQL must in fact be being generated.
