Outer Join LTS Pruning… It’s Here!
A few months ago Charles Elliott and I were tasked with assessing OBIEE performance for a client here in the US. Queries were taking hours to run (literally) and obviously users weren’t happy. In the context of BI, there are many places where performance can be improved (database tuning, data warehouse modeling, query writing, etc.) but we decided to start with the RPD.
One of our resources at this client pointed to us that every query they ran used a database view (no, not a materialized view) which in turn sourced a handful of physical tables. These tables were all joined in the view using outer joins. So, every single query the BI Server ran, had to go through all the joins contained in this view, even if it didn’t need data from all the tables in question. We called this the “Inception Effect” (remember the movie?) because you always had to go 5 levels deep, and each one took longer than the last.
In this post, I want to demonstrate this problem and how we can “trick” the BI Server into behaving the way we want.
IMPORTANT: Please keep in mind that for this first example, we are using version 188.8.131.52.0. This detail will be very relevant later.
For this test, we’re obviously not using our client’s data. We will be working with data from the 2010 FIFA World Cup, and our tables are GOALS, PLAYERS and MATCHES. The GOALS table will be the source for both the logical dimension and fact, while PLAYERS and MATCHES will also provide additional information for our logical dimension. This is our physical model:
In our (very simple) business model, we have a fact table called Fact: Goals which uses the GOALS physical table as its only source:
And we also have a dimension table called Dim: GPM, which has one LTS with all three physical tables as sources (the columns in this table were named for easy identification of the source table). Note the join types for this logical table source:
So, let’s see what this looks like when we create a very simple query in Answers.
Note that we have a column from the GOALS table, one from the PLAYERS table and one from our fact. As expected, our query should include one outer join (remember that the LTS for our dimension included a couple of outer joins), but upon closer examination, we see that it actually includes both outer joins, even though it didn’t need to include the MATCHES table.
Furthermore, if we remove the ‘Players – Name’ column as well (leaving only columns that come from the GOALS table), the two outer joins still show up:
You may be wondering: Why do I care, as long as the result is still correct? Well, in our client assessment this was extremely important, because all the outer joins included in the queries all the time were causing them to run much slower than they should.
The solution in these situations is to “trick” the BI Server into using the outer-joined logical table sources only when needed. It requires a little more development time, but in the end it is well worth it. Let’s take a look at what our model looked like with this approach and what the results were.
As you can guess by the LTS names, the GOALS LTS includes only the GOALS table as its physical source, the GOALS_PLAYERS LTS includes the GOALS and PLAYERS tables, while the GOALS_PLAYERS_MATCHES LTS includes all three of them. The order of the logical table sources matters, as you want you first choice on top, and your last choice at the bottom. Alternatively, you can use Priority Groups to determine the order in which the BI Server is going to try and use each LTS based on the query criteria. Let’s see if it actually works.
In our multiple LTS test, I added an extra column to our model called ‘LTS ID’ that will help us identify in Answers the LTS being used for each query.
So this is our first result set:
This query includes only columns from the GOALS table, plus our LTS ID column, which shows the LTS being used. The important thing, however, is that when we look at the SQL issued by the BI Server, we see that no outer joins were included, indicating once more that the BI Server was smart enough to choose the most efficient LTS for this query.
If we include a column from the PLAYERS table to our request, this is what we see:
And the SQL validates our theory, including only one outer join:
Lastly, if we add a column from the MATCHES table to the request:
Our SQL should include both outer joins, as it is using the LTS seen in the request, with all 3 tables:
The New World Order
Recently, we heard a rumor from Oracle Product Management that with the 184.108.40.206 release of OBIEE, the BI Server would be a little smarter around pruning outer-joined logical table sources. We decided to put it to the test, and the results were encouraging.
We went back to the one LTS idea:
But in this case, if the theory is correct, the BI Server should be able to prune the unnecessary tables despite the outer joins.
When we run a request in Answers with columns from all three tables:
The BI Server will issue both outer joins in the query as expected:
However, if we remove the ‘Matches – Venue’ column:
The BI Server will remove the MATCHES table from the query, therefore eliminating one of the outer joins:
And finally, if we leave only columns from the GOALS table in our query, no outer joins are included:
And all this, using one single LTS in 220.127.116.11.0.
Now, I’m not claiming that the newer versions of OBIEE will be able to prune every unnecessary outer join in every single situation. This was a simple example, with only a few tables, but it does show that Oracle has been working on this. And that’s very good news if you are a RPD developer.
Until next time!