Digging into the Oracle BI Server Query Log File

As a follow-up to the posting I did last week on Oracle BI Enterprise Edition performance tuning, I thought I’d take a closer look a the query log file generated by the Oracle BI Server, the “virtual database engine” that ships with Oracle BI EE. For some time now I’ve been looking to see if there is any diagnostic data produced by the BI Server so that, for example, I can see the query plans put together when a query comes in, particularly in the situation where, for example, the BI Server joins together (or “stitches”, in Oracle BI Server terminology) data from two different database queries into one logical dataset. So, let’s take a simple example, where we have a fact table containing 14m rows of sales data and a product table containing 72 rows of product data, with the data all being stored in a single Oracle database and a single physical model describing the source tables and their relationships.

Running a simple query like this, against the source data:

Obiee Log1

Would generate the following log entries, at
$ORACLEBI_HOME/server/Log/NQQuery.log, with query logging set to 3:

So let’s step through this log entry and look at each part in detail.

The first section sets out the general details of the query, and contains details of the QUERY_SRC_CD variable containing your query, the name of the Oracle BI Server repository and subject area used for the query.

Notice that each entry is the log is also timestamped with the username of the BI Server user that was responsible for the query, and the timestamp for the log entry.

Next is the logical, “Request List” that sets out the columns that form the criteria for the report, and the grouping and aggregation applied to the data.

If my report criteria included a filter, the filter columns would appear in the request list as well, like this:

So at this point we’re dealing with the logical view of the data, which relates back to the Presentation Layer of the Oracle BI Server metadata layer. The next part of the log file shows the physical SQL that was generated to correspond with the logical request list of columns specified by the user. In this case, both logicals tables have logical table sources that come from the same physical Oracle database, and so a single physical SQL query is generated.

Notice the fact that Oracle BI Server automatically aggregates your data (this is behavoir specified in the default aggregation settings in the BI Server metadata layer, you can specify non-grouping, i.e. detail display of data instead), and applies a GROUP BY on the dimension data.

The next part of the log file details whether the query completed successfully, how long it took to return data, how many rows were returned and the response time of the underlying database.

So what we’ve got here is a fairly straightforward record of the logical query requested by the user (the request list of columns, the criteria for your request), the physical SQL generated to return the required list of columns, any filters, aggregation or grouping applied to the data, and statistics on how long the query took to run. Putting my Oracle Database hat on, we haven’t got anything equivalent to the “wait events” you get when tuning the Oracle database, where it tells you where the time has been spent when executing a query (waiting for disk I/O, waiting for network I/O, etc) but then again, in this case it’s the database that’s doing the vast majority of the work and there’s not really much overhead added by the BI Server here. The database took 194 seconds to return data, and the BI server only added a second to this, giving a total response time of 195.

There are actually several levels of logging that you can turn on for users and queries, with levels going from 0 (no logging) to 5 (lots and lots of logging), with the general advice being to only really turn this on when you’re debugging a query, as the logs are written to disk and can slow down a real-life system if left on. In a bit more detail, the logging levels currently available are as follows:

Level 0 = no logging, the default setting for a user or query

Level 1 = logs the logical SQL (the request list), the elapsed time, whether the cache was used, query status, user name and so on.

Level 2 = Everything in level 1, plus repository name, subject area name, physical SQL generated, number of rows returned and so on

Level 3 = Everything in level 2 plus some additional data on cache usage. This is the level I usually set, though the docs suggest you should just use level 2

Level 4 = Everything in level 3 plus the physical SQL execution plan

Level 5 = Everything in level 4 plus intermediate row counts at various stages, particularly useful when the BI Server is stitching data sets together.

So how does the above query look when run at level 5 logging? Let’s take a look, this time with the filter added in that I showed a minute ago. I’ve bolded the areas that are new in this level of logging:

So the execution plan that the log file contains is the logical execution plan, not the Oracle execution plan corresponding to the physical SQL that is generated. Let’s take a look at this logical execution plan in more details.

So what this contains is the request list as before, and then a list of “child nodes” with each node corresponding to a step in the execution plan (I think). Then we’ve got the filter applied to the nodes, the group by and the order by, and the query again takes 195 seconds, with the BI Server adding virtually nothing to the response time.

So far it all looks straightforward, but what happens if instead of both logical table sources pointing to the same one physical database, we source the dimension table from a second physical database, so that the BI Server generates a federated query and stitches the result set together in the BI Server’s memory. How would this extended query log look then? Let’s take a look.

So this looks a bit more interesting, as we can see that the BI Server is doing some of the work now, generating an SQL query for each physical data source and then stitching the data together. There’s also lot more nodes, which suggests that the BI Server is going through a lot more steps to return the data. Let’s a take a look at the more interesting parts of the log, starting with the logical execution plan, which now contains a number of child nodes; ones for each of the two database queries, further ones for the stitched-together data set and a node for the join between the two result sets .

Notice that each of the main nodes has IDs assigned to them (9981, 9964 and 9865) so that you can follow them through the rest of the log file.

Later on down the log file we can see the physical SQL generated for each node, and the row count resulting from each physical SQL execution. Then, after the data is retrieved from the various physical database connection, you can see how the BI Server then goes on to inner join the data (the binary join on node 10001), remove unneeded columns from the initial result set (projection), sort the data, again remove unneeded columns and then aggregate the data. You can also see what operations are carried out on the data before and after aggregation, where sorting is carried out and so on.

From looking this log file, you can pretty much determine the steps the BI Server is going through to add aggregations, sorting, result set stitching and so on to your data, if you’re interested in seeing how data from multiple results sets is brought together, I don’t know if this is publicly documented anywhere and whether other join types are possible (I’ve heard of nested loop joins also being performed), but this certainly looks interesting if you’re into Oracle BI Server internals. Something I’m certainly looking to do is to document these steps properly and come up with the complete set of steps that can be carried out, if anyone’s interested in helping then let me know as I said, I don’t think this is publicly documented.

So looking at the log file output, this query has actually taken just 173 seconds rather than the 195 required for the single-source query, the BI Server only adding a second to this. I wouldn’t place too much emphasis on the actual times taken for each variation on the query as I’m only running them once and on a single laptop, but you can see from the timings that Oracle BI Server doesn’t seem to add much overhead, even when you’re using the federated query feature (at least on my dataset, which isn’t all that big).

Now one of the things you can do in the BI Server metadata layer, is select the “driving table” for these types of cross-database joins. This is particularly useful if one of your tables is small (the suggested limit is 1000 rows, in my case it’s 70, and it tells the BI Server to firstly query the driving table to return a small amount of interesting rows, then it applies these these rows to the other table in batches so that only a much smaller set of the big table rows are then loaded into the BI Server memory.

Obiee Log2

The rationale here is that a normal federated query join would require both tables to be loaded into the BI Server memory, and if one of them is big, then this will require a full table scan of the physical source table and potentially take up a lot of memory in the BI Server. Using this driving table technique you can fairly quickly come up with a much smaller set of initial rows and row ids, this can then be used to filter the second table and therefore reduce the I/O usage and memory usage required by the federated query join. There are a couple of parameters you can set in the NQSCONFIG.INI file to determine the size of the batches that are applied to the larger table, and you can switch this feature off completely, but how does this look in the query log file, and does it make my query run faster? Let’s take a look.

The first difference in this “driving table” query log entry is the logical execution plan, than now retrieves data from the second, non-driving table via a set of parameter values.

You can specify the maximum amount of parameters in the table filter through the MAX_PARAMETERS_PER_DRIVE_JOIN parameter in the NQSCONFIG.INI file, with the limit really being down to how many filter values the back-end database can handle. The MAX_QUERIES_PER_DRIVE_JOIN parameter thereafter limits the total number of separate calls to the non-driving table that can take place (a function of the total number of rows returned by the initial driving table query divided by the MAX_PARAMETERS_PER_DRIVE_JOIN parameter setting), to stop run-away queries occuring that generate hundreds of individual calls to the non-driving table.

The other interesting part is the actual physical SQL sent to the non-driving table, which looks like this:

So in this case, the driving table feature will initially run the query against the smaller (dimension) table and return the rows specified by the filter on the query. Then, in batches of twenty, individual selects will be made against the larger, non-driving table, until all the interested rows are returned. These will then be joined in memory in the BI Server and the results returned to the dashboard, in theory faster than if we’d tried to load both tables into the BI Server memory and join them both there.

So now, to take this one step further, what does the query log file look like if we run the Aggregate Persistence Wizard on this data set, generate and aggregate at the PROD_SUBCATEGORY_DESC, plug the aggregates into the logical model and then run a query? Where (if anywhere) does aggregate navigation future in the logical execution plan and query steps?

Well, after running the wizard and registering the aggregates, I run the query again and here’s the output:

So, you can see the aggregate table being used, but there’s nothing in the log to say that aggregates were considered, the query is just generated against the logical table source that returns the smallest amount of rows.

Anyway, that’s a quick trip around the query log file, with a short diversion into driving tables and aggregates. Of course the $64,000 questions is “what do you do with the log file, in particular the knowledge of the various query execution steps”, and in reality if you’re getting to the point where federated queries are running slowly, your most obvious and productive means of sorting this out is to physically integrate your data into a single data mart or data warehouse rather than mucking around trying to decipher undocumented log files. Still, if you’re stuck with a federated query or just interested in how the internals work, it’s an interesting set of diagnostics you can generate, I’ll certainly be blogging more on this over time and if you’ve got anything to add, just let me know in the article comments.