Oracle Open World 2008, Day 4 : OBIEE Action Framework, and OBIEE Performance Tuning

I'm just catching up now on my Open World notes, as I'm staying on for an extra day to go to Redwood Shores tomorrow for a meeting with the OWB product development team. Wednesday (Day 4) was my first day without any presentations of my own, and around Larry's keynote in the early afternoon I managed to fit in a session on the OBIEE Action Framework in the morning and, at the end of the day, the session I'd been waiting all week for: OBIEE Performance Tuning.

I wrote about the forthcoming Action Framework in OBIEE 11g in a blog post last year, and this year Nick Tuson and Adam Bloom, two of the OBIEE product managers based just down the road from me in Bristol, gave an update on the progress of this feature. The basic idea with the Action Framework is that you can define, in the web catalog, "actions" that represent calls to web services, Java methods, scripts or URLs, which you can then attach to data items in your reports. Actions have three parts to them: an action definition, which contains the action name, parameter names, button names and so on; a set of parameter mappings; and an action implementation, which defines the Java class, web service call and so on that the action uses.

Adam also introduced another new addition to the web catalog in 11g, "conditions", which are defined against data items and can be used by actions and other processes to run checks like "is the customer profitable?", "do they have a checking account?" and so on. I used a variation on this idea when writing my BI and SOA articles, and it's similar to the conditions that you can define in the Discoverer EUL. In terms of dates, we're still looking at "some date in calendar year 2009" for OBIEE 11g, but it certainly sounds like they're packing in the features. I was also amused to see that the quote on Nick's slides around the Action Framework came from my blog post last year; I was certainly happy to endorse the new feature, as it's shaping up to be one of the most innovative features in this forthcoming release.
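To make the idea concrete: a condition essentially wraps a true/false test over a query result set. In plain SQL terms (the table and column names below are entirely my own invention for illustration, not anything Adam showed), an "is the customer profitable?" condition boils down to something like this:

```sql
-- Hypothetical illustration only: a "condition" reduces to a boolean
-- test over a result set, e.g. "is this customer profitable?"
SELECT CASE
         WHEN SUM(s.revenue - s.cost) > 0 THEN 'Y'
         ELSE 'N'
       END AS is_profitable
FROM   sales s
WHERE  s.customer_id = :customer_id;  -- parameter supplied by the caller
```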

Apart from live-blogging Larry's keynote just after lunch, I managed to get myself along to Bob Ertl's session on Optimizing Oracle Business Intelligence Enterprise Edition. I was particularly looking forward to this session as it was the only one that took a look at OBIEE internals; in previous years we've had Ed Suen doing sessions on the semantic object layer and Kurt Wolf doing ones on OBIEE architecture, but I understand both have left recently, and Bob therefore looks to have taken up the mantle of chief internals specialist for OBIEE. The session itself was really good and I really enjoyed it; Bob was leapt on a bit at the end by enthusiastic, questioning OBIEE architects and performance tuners, but I think this just illustrates the interest there is in this area. Anyway, here's a synopsis of what Bob had to say:

OBIEE queries can run slowly for three main reasons:

  1. They scan too many records on disk, typically caused by tables being too large, too much data needing to be aggregated, transformations being too complex, too many joins or poor SQL. The typical database solutions are to add indexes, partitioning, summary management and batch aggregation, to move the aggregation into the ETL process, to create star schemas and so on. In the OBIEE world, you typically address this by using aggregate navigation and caching, and by following BI data modeling best practices (there's a sketch of aggregate navigation after this list). Bob noted that customers often turn to caching as their first solution to problems in this area, but this only addresses part of the problem.
  2. They return too many records over the network, due to poor SQL, using unsupported database functions, doing a cross-database join (an OBIEE federated query) involving large tables, or poor prompt design. In the RDBMS world you can address this by centralizing your data into a data warehouse (thereby reducing the network traffic caused by cross-database joins), following data modeling best practices, using the driving table join feature, setting prompts appropriately, and using caching.
  3. Interface and processing bottlenecks, such as incorrect connection pool settings, setting logging levels too high, incorrect configuration settings, or insufficient hardware. Funnily enough, at least the first two of these three problem areas are addressed by the new Database Machine and Exadata storage servers announced yesterday, but on smaller systems you've got to address these things yourself.
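To illustrate the aggregate navigation point from item 1, here's a rough sketch of the physical SQL you might see in the query log (all table and column names are my own, for illustration only). If a month/brand aggregate table is mapped into the business model as an additional logical table source, the BI Server navigates to it instead of scanning the detail-level fact table:

```sql
-- Without an aggregate mapped: the detail-level fact table gets scanned
SELECT   t.month_name, p.brand, SUM(f.revenue)
FROM     sales_fact f
JOIN     time_dim t    ON f.day_key  = t.day_key
JOIN     product_dim p ON f.prod_key = p.prod_key
GROUP BY t.month_name, p.brand;

-- With a month/brand aggregate mapped as a logical table source,
-- the same logical request navigates to the much smaller aggregate:
SELECT   a.month_name, a.brand, SUM(a.revenue)
FROM     sales_agg_month_brand a
GROUP BY a.month_name, a.brand;
```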
Bob then went on to talk about BI Server query plans, for me the thing I was most interested in finding out about. My background is in data warehousing, where it's pretty important that you know about the various join types and execution plans that the database can generate for your queries; as the BI Server is a "virtual database engine", I'm equally curious about how it does its query parsing, plan generation and so on. In abstract form, BI tool query plans look something like this (I've sketched a SQL translation just after the list):
  1. Pre-aggregation functions and filters are applied to the data
  2. Aggregations are performed
  3. For the BI Server, multiple result sets from federated queries are "stitched" together, and
  4. Post-aggregation functions and filters are applied.
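As a rough translation of those four steps into a single SQL statement (my own sketch, not from Bob's slides): pre-aggregation filters become the WHERE clause, aggregation becomes GROUP BY, and post-aggregation filters become HAVING; the stitching step has no single-statement equivalent, as it happens in the BI Server's own memory:

```sql
SELECT   p.brand,
         SUM(f.revenue) AS total_revenue           -- step 2: aggregation
FROM     sales_fact f
JOIN     product_dim p ON f.prod_key = p.prod_key
WHERE    f.order_date >= DATE '2008-01-01'         -- step 1: pre-aggregation filter
GROUP BY p.brand
HAVING   SUM(f.revenue) > 1000000;                 -- step 4: post-aggregation filter
-- step 3 (stitching federated result sets together) takes place in the
-- BI Server's memory, not in the database
```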
Bob's advice (and this is something I also advocate) was to move as much of this functionality down the stack to the database as possible. If you can integrate, aggregate and summarize your data in the data warehouse then do so; only if you can't should you then consider federated queries and so on. The BI Server itself will push functions down to the underlying database so that, for example, if you run a ranking function in your query and you're using an Oracle database, the BI Server will "function ship" this down to the database rather than try to do it itself. If you're using Access, though, or using Oracle but have got your database features set incorrectly in the BI Administration tool, this calculation will need to be done in the BI Server's memory instead (using a feature called "functional compensation"), and it'll go and get all the rows from the database before it can run the calculation itself. The tip here, then, is to (a) make sure your database feature settings are correct in the BI Administration tool, using the Features tab in the database dialog:

Table Features

and (b) consider using a database that supports the maximum amount of function shipping; also consider using Essbase and the like if lots of aggregates or complex calculations are needed, as these can be done natively.
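To show what function shipping looks like in practice, here's a hedged sketch of the two physical SQL variants you might see in the query log for a logical ranking request (table names are made up; the exact SQL the BI Server generates will differ):

```sql
-- With the Oracle RANK feature enabled in the RPD, the ranking is
-- "function shipped" to the database as a native analytic function:
SELECT   p.brand,
         SUM(f.revenue)                             AS revenue,
         RANK() OVER (ORDER BY SUM(f.revenue) DESC) AS revenue_rank
FROM     sales_fact f
JOIN     product_dim p ON f.prod_key = p.prod_key
GROUP BY p.brand;

-- With the feature disabled (or a database that can't rank), only the
-- bare aggregate is pushed down; the BI Server pulls back every row and
-- computes the ranking itself via "functional compensation":
SELECT   p.brand, SUM(f.revenue) AS revenue
FROM     sales_fact f
JOIN     product_dim p ON f.prod_key = p.prod_key
GROUP BY p.brand;
```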

There was also talk about how the BI Server goes about stitching disparate result sets together. This is the situation where, for example, you define a fact table in the business model and mapping layer and then map it to two physical data sources. When a query runs, two separate SQL statements are generated and the BI Server "stitches" the data together. The problem comes when you try to do this with large data sets, as all the data is retrieved and loaded into memory before the BI Server joins it on a common key column. Bob talked about a BI Server feature called "parameterized nested loop joins", where you can set the driving table for a federated query, like this:

Driving Table

When you set this feature, the BI Server (if I remember it correctly) will query the driving table first, remember the set of key values returned by this query (bearing in mind any filters that are applied), and then separately query the second table, retrieving only the rows matching the key values collected by the first query. Bob also mentioned "eager group by" (I might have mis-heard this though), "subqueries with IN clauses" and "persisted connection pool subqueries"; he didn't go into any more detail on these, but did promise a series of tech notes from the BI Server team in due course to expand on them. I hope this comes through, and I'll certainly be dropping Bob a note about it, as internals and query processing details like these just aren't in the documentation and it'll be good to shed a bit more light on the subject.
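Here's my reading of how a parameterized nested loop join would play out, again with made-up table names and against two separate physical databases:

```sql
-- Query 1, against the driving table (database A): filters are applied
-- and the set of join key values is collected by the BI Server
SELECT c.customer_key, c.customer_name
FROM   crm_customers c
WHERE  c.region = 'WEST';

-- Query 2, against the second source (database B): probed only for the
-- keys collected above, batched into IN lists rather than a full scan
SELECT   o.customer_key, SUM(o.order_value)
FROM     dw_orders o
WHERE    o.customer_key IN (1001, 1002, 1003 /* ...keys from query 1... */)
GROUP BY o.customer_key;

-- the BI Server then joins the two result sets in memory on customer_key
```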

Bob then went on to talk about Aggregate Persistence (I covered this in a previous blog posting) and some features that might be coming in future releases, including parallel loading of aggregated data into the aggregate tables created by the wizard; automatic indexing (of the underlying data warehouse, or of the aggregate tables created by the Aggregate Persistence Wizard? I wasn't sure); an advisor function to suggest which aggregates to create and which to drop; selective deletion of aggregates (currently it's all or nothing); and incremental loads of aggregate tables, although this last one is a bit trickier. I also asked about future integration with Oracle materialized views; this apparently is being looked at, as of course they provide many of these features built into the (Oracle) database.
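For anyone who hasn't used it, the wizard's output is a script of logical SQL that you run through nqcmd against the BI Server; from memory it looks roughly like this (all the object names below are placeholders, so check the syntax against your own wizard output):

```sql
-- Approximate shape of an Aggregate Persistence Wizard script (10.1.3);
-- object names are placeholders; run via nqcmd against the BI Server
create aggregates "ag_sales_month"
for "Sales"."Fact Sales"("Revenue", "Units")
at levels ("Sales"."Time"."Month", "Sales"."Products"."Brand")
using connection pool "orcl"."Aggregates Connection Pool"
in "orcl".."AGG_SCHEMA";

-- and the all-or-nothing clean-up Bob referred to:
delete aggregates;
```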

The final section of the talk was on logging, and on the usage tracking repository tables and reports that ship with OBIEE and the BI Applications. Whilst logical and physical query logging is something you typically only turn on when diagnosing performance problems, usage tracking is something you can and should turn on permanently, the results of which you can store to file or, better, to tables, so that you can generate dashboards and reports off of the usage data generated. The usage tracking reports themselves contain data such as when reports are run, which reports are the most popular, and statistics such as time of day, total time, compile time, number of rows returned and so on; these can also be reported on by the BI Administration plug-in that you can obtain for Oracle Enterprise Manager. There are some best practices around setting up the usage tracking system: the BI Server should connect to the usage tracking tables via its own connection pool; writes should be batched (to every five minutes or so) rather than made row by row; and the usage tracking tables should sit on the same database server as the business content. Apparently an Oracle by Example tutorial is coming soon for the usage tracking system, so keep an eye out on OTN for this.
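Once the data is going into tables, reports like "top ten slowest requests this week" are a simple query away. A sketch (the table is S_NQ_ACCT; I'm quoting the column names from memory, so verify them against your own install):

```sql
-- Ten slowest requests over the last week, from the usage tracking
-- table; column names quoted from memory, so check your own install
SELECT *
FROM  (SELECT   user_name,
                saw_src_path AS report_path,
                row_count,
                total_time_sec,
                start_ts
       FROM     s_nq_acct
       WHERE    start_ts >= SYSDATE - 7
       ORDER BY total_time_sec DESC)
WHERE ROWNUM <= 10;
```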

Anyway, as I said it was an excellent session, and I suggested at the end that those who had come along and expressed an interest in the area should stay in contact. So, if you're interested in setting up an informal "OBIEE Performance Tuning User Group", let me know and I'll put us all in contact.