<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Mark Rittman</title>
	<atom:link href="http://www.rittmanmead.com/author/mark-rittman/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivered Intelligence</description>
	<lastBuildDate>Wed, 10 Mar 2010 08:49:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Inside the Oracle BI Server Part 3 : BI Server In-Memory Joins</title>
		<link>http://www.rittmanmead.com/2010/03/03/inside-the-oracle-bi-server-part-3-bi-server-in-memory-joins/</link>
		<comments>http://www.rittmanmead.com/2010/03/03/inside-the-oracle-bi-server-part-3-bi-server-in-memory-joins/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 09:00:10 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/02/28/inside-the-oracle-bi-server-part-3-bi-server-in-memory-joins/</guid>
		<description><![CDATA[In the previous two postings in this series, I looked at the architecture of the Oracle BI Server, and how it processes incoming queries from Oracle BI Answers. In the latter article I touched on the concept of BI Server in-memory joins, and in this article I want to expand on this topic and look [...]]]></description>
			<content:encoded><![CDATA[<p>In the previous two postings in this series, I looked at the architecture of the Oracle BI Server, and how it processes incoming queries from Oracle BI Answers. In the latter article I touched on the concept of BI Server in-memory joins, and in this article I want to expand on this topic and look at just what goes on when the BI Server is called upon to combine data from multiple sources.</p>
<p>When the BI Server executes a query plan, it handles the data in four separate stages:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis322.jpg" height="373" width="200" border="0" hspace="4" vspace="4" alt="Bis32" /></p>
<ul>
<li>Firstly, filters and functions are applied to the data from each data source</li>
<li>Then, the data from these data sources are aggregated as required</li>
<li>Then they are joined together (or &#8220;stitched&#8221; together), and</li>
<li>Then, any calculations and/or aggregations that are applied across data sources are applied</li>
</ul>
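<p>As a rough illustration only, the four stages listed above can be sketched in a few lines of Python. This is not how the BI Server is implemented; the row data, the zero-quantity filter and the function name <code>run_request</code> are all hypothetical, and the two lists simply stand in for two separate physical databases.</p>

```python
from collections import defaultdict

# Hypothetical rows standing in for two separate physical databases.
SALES = [  # (prod_id, quantity_sold)
    (1, 10), (1, 5), (2, 7), (3, 2), (3, 0),
]
PRODUCTS = [  # (prod_id, subcategory)
    (1, "Cameras"), (2, "Cameras"), (3, "Printers"),
]


def run_request(sales, products):
    # Stage 1: apply filters to each source (assumed filter: drop zero quantities).
    sales = [row for row in sales if row[1] != 0]

    # Stage 2: aggregate each source on its own (sum quantity by prod_id).
    qty_by_prod = defaultdict(int)
    for prod_id, qty in sales:
        qty_by_prod[prod_id] += qty

    # Stage 3: join ("stitch") the per-source result sets on prod_id.
    joined = [
        (subcat, qty_by_prod[prod_id])
        for prod_id, subcat in products
        if prod_id in qty_by_prod
    ]

    # Stage 4: apply aggregations that span both sources (sum by subcategory).
    totals = defaultdict(int)
    for subcat, qty in joined:
        totals[subcat] += qty
    return sorted(totals.items())


print(run_request(SALES, PRODUCTS))
```

<p>The point of the sketch is the ordering: each source is filtered and aggregated on its own before the join, and only the final roll-up works on the combined rows.</p>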
<p>In simple OBIEE environments, data used by a request will come from a single database, and therefore any joins that need to be performed by the BI Server will automatically be &#8220;pushed down&#8221; to the underlying database. Where more than one physical database provides data for a query, though, the join will instead need to be performed by the BI Server &#8220;in memory&#8221;. This ability to &#8220;federate&#8221; data sources, and therefore produce reports and analysis that span multiple data sources while presenting the data to users as if it were a single database, is one of the key unique features of OBIEE and sets it apart from tools like Discoverer, which are really restricted to reporting against single data sources.</p>
<p>So given this capability, how does it work under the covers? When does the BI Server perform a join in-memory, and when does it get done at the underlying database level? Where can we see what is happening, and can we predict what method the BI Server will use when performing a join? Finally, what algorithm does the BI Server use when performing these joins, and how does it use memory and disk during the process?</p>
<p>To illustrate how the process works, there are a number of join scenarios that we need to consider. Some relate to joining fact and dimension tables together, and others relate to joining fact tables that share conforming dimensions, or hold conforming data sets of differing granularity.</p>
<p><span style="font-size:14pt;"><strong>Joining Fact and Dimension Tables Together</strong></span></p>
<p>The BI Server semantic layer requires you to organize your business model and mapping layer into a star schema. This star schema may have one or more logical dimension tables, that join to one or more logical fact tables. The logical fact tables typically have conforming dimensions, so that you can create requests that span multiple fact tables and multiple dimension tables.</p>
<p>Taking for the moment joins between fact and dimension tables, depending on how the underlying physical or logical table source joins are set up in the semantic model, these may be either inner joins, left outer joins, right outer joins or full outer joins. The simple example to consider is a business model that is mapped to a single physical database, so that all logical table sources point to the same underlying data source, as shown in the screenshot below:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis25-2.jpg" width="500" border="0" hspace="4" vspace="4" alt="Bis25-2" /></p>
<p>In this case, if we issued a request against this business model that required data from a dimension table and a fact table, the BI Server would push the join between logical table sources down to the underlying database, a single SQL query would be generated and the execution plan from a level 5 query log entry would look like this:</p>
<pre>-------------------- Execution plan:

RqList &lt;&lt;2105&gt;&gt; [for database 3023:2820:orcl3,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2820,44],
sum(SALES.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): &lt;&lt;2136&gt;&gt; [for database 3023:2820:orcl3,44]
PRODUCTS T2874
SALES T2911
DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2820,44]
OrderBy: c1 asc [for database 3023:2820,44]
</pre>
<p>The same would apply to a left outer join between table sources in the same database, a right outer join or a full outer join. The BI Server doesn&#8217;t do any work here except to issue a single SQL query, and you can see just the one &#8220;RqList&#8221; (request list) in the execution plan, indicating again that the BI Server thinks it only needs to put together one query to satisfy the request.</p>
<p>If, however, one of the logical dimension tables had its logical table source re-pointed to a separate physical database, as shown in the screenshot below, the BI Server would now have to do the join itself, as it can&#8217;t be pushed down to the underlying database (as there are now two of them).</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis261.jpg" width="500" border="0" hspace="4" vspace="4" alt="Bis26" /></p>
<p>In this case, two SQL queries would be issued, one against each of the two physical databases, and the BI Server would do the join in-memory (or to disk, I&#8217;ll elaborate on this later on). The corresponding logical execution plan from a level 5 log file would now look like this:</p>
<pre>-------------------- Execution plan:

RqBreakFilter &lt;&lt;2465&gt;&gt;[1] [for database 0:0,0]
RqList &lt;&lt;2466&gt;&gt; [for database 0:0,0]
D1.c2 as c1 [for database 3023:2500,44],
sum(D1.c5 by [ D1.c2]  at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0]
Child Nodes (RqJoinSpec): &lt;&lt;2478&gt;&gt; [for database 0:0,0]
(
RqList &lt;&lt;2482&gt;&gt; [for database 0:0,0]
D902.c1 as c2 GB [for database 3023:2500,44],
D901.c2 as c3 [for database 3023:132,44],
D901.c3 as c5 [for database 3023:132,44]
Child Nodes (RqJoinSpec): &lt;&lt;2490&gt;&gt; [for database 0:0,0]

(
RqList &lt;&lt;2495&gt;&gt; [for database 3023:132:orcl,44]
SALES.PROD_ID as c2 [for database 3023:132,44],
sum(SALES.QUANTITY_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:132,44]
Child Nodes (RqJoinSpec): &lt;&lt;2504&gt;&gt; [for database 3023:132:orcl,44]
SALES T211
GroupBy: [ SALES.PROD_ID]  [for database 3023:132,44]
OrderBy: c2 asc [for database 3023:132,44]
) as D901
InnerJoin &lt;&lt;2492&gt;&gt; On D901.c2 = D902.c2; actual join vectors:  [ 0 ] =  [ 1 ]

(
RqList &lt;&lt;2517&gt;&gt; [for database 3023:2500:orcl2,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2500,44],
PRODUCTS.PROD_ID as c2 [for database 3023:2500,44]
Child Nodes (RqJoinSpec): &lt;&lt;2523&gt;&gt; [for database 3023:2500:orcl2,44]
PRODUCTS T2502
OrderBy: c2 asc [for database 3023:2500,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c1 asc [for database 0:0,0]
</pre>
<p>Notice the <strong>&#8220;InnerJoin &lt;&lt;2492&gt;&gt; On D901.c2 = D902.c2; actual join vectors:  [ 0 ] =  [ 1 ]&#8221;</strong> that is in the middle of the execution plan, between the two main RqLists &#8211; this tells you that the BI Server is doing the join, as it would only appear here if it couldn&#8217;t be pushed down to the underlying database. You might also find references to <strong>LeftOuterJoin</strong>, <strong>RightOuterJoin</strong> and <strong>FullOuterJoin</strong> here, depending on how the join between the tables is defined in the physical or logical table source joins in your semantic layer.</p>
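<p>If you want to check a log extract for these markers without reading it line by line, a small script can help. The sketch below is my own helper, not an Oracle tool: the function name, the regular expression and the abbreviated sample text (node ids omitted) are assumptions, and it only looks for the operator names and the &#8220;Sending query to database named&#8221; lines that appear in the extracts in this article.</p>

```python
import re

# Join operator names as they appear in the execution plans in this article.
JOIN_NODE = re.compile(
    r"\b(InnerJoin|LeftOuterJoin|RightOuterJoin|FullOuterJoin|FullOuterStitchJoin)\b"
)
# Each physical SQL statement in the log extracts is introduced by this text.
PHYSICAL_QUERY_MARKER = "Sending query to database named"


def summarize_log(log_text):
    """Return the join operators and the number of physical queries in a log extract."""
    return {
        "join_nodes": JOIN_NODE.findall(log_text),
        "physical_queries": log_text.count(PHYSICAL_QUERY_MARKER),
    }


# Abbreviated, hypothetical extract for one request (node ids left out).
sample = (
    "InnerJoin On D901.c2 = D902.c2\n"
    "-------------------- Sending query to database named orcl\n"
    "-------------------- Sending query to database named orcl2\n"
)
print(summarize_log(sample))
```

<p>Treat the two numbers together rather than separately: a join operator in the plan combined with more than one physical query for the same request is the pattern to look for, and the input should be the log entries for a single request.</p>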
<p><span style="font-size:14pt;"><strong>Joining Facts with Conforming Dimensions Together</strong></span></p>
<p>Another situation occurs when you are joining fact tables together that share conforming dimensions. A simple example of this is where you create a request that requires data from two or more fact tables that share conforming dimensions, such as those shown in the screenshot below:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis25-1.jpg" height="256" width="500" border="0" hspace="4" vspace="4" alt="Bis25-1" /></p>
<p>As requests such as these can potentially lead to &#8220;fan trap&#8221; issues (explained in <a href="http://www.rittmanmead.com/2008/08/26/resolving-fan-traps-and-circular-joins-using-obiee/">this blog post</a>), the BI Server knows that it has to generate two logical queries and join, or &#8220;stitch&#8221; them together to avoid the fan trap. If both fact tables are sourced from the same physical database, and this database supports subquery factoring (the &#8220;WITH&#8221; clause that you see in Oracle 10gR2/11g SQL statements) then it will generate the following execution plan, which has a FullOuterStitchJoin between the two inner RqList blocks:</p>
<pre>RqBreakFilter &lt;&lt;3571&gt;&gt;[3] [for database 0:0,0]
    RqList &lt;&lt;3462&gt;&gt; [for database 3023:2820:orcl3,46]
        D1.c1 as c1 GB [for database 3023:2820,46],
        D2.c1 as c2 GB [for database 3023:2820,46],
        case  when D1.c2 is not null then D1.c2 when D2.c2 is not null then D2.c2 end  as c3 GB [for database 3023:2820,46]
    Child Nodes (RqJoinSpec): &lt;&lt;3567&gt;&gt; [for database 3023:2820:orcl3,46]
        (
            RqList &lt;&lt;3474&gt;&gt; [for database 3023:2820:orcl3,46]
                sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46],
                PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46]
            Child Nodes (RqJoinSpec): &lt;&lt;3507&gt;&gt; [for database 3023:2820:orcl3,46]
                PRODUCTS T2874
                COSTS T2830
            DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0]
            GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2820,46]
        ) as D1 FullOuterStitchJoin &lt;&lt;3565&gt;&gt; On D1.c2 = D2.c2
        (
            RqList &lt;&lt;3511&gt;&gt; [for database 3023:2820:orcl3,46]
                sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46],
                PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46]
            Child Nodes (RqJoinSpec): &lt;&lt;3544&gt;&gt; [for database 3023:2820:orcl3,46]
                PRODUCTS T2874
                SALES T2911
            DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
            GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2820,46]
        ) as D2
    OrderBy: c3 asc [for database 3023:2820,46]
</pre>
<p>The BI Server Navigator then generates a single SQL statement from this execution plan, which queries both fact tables using subquery factoring, and then brings the results together in the main body of the statement:</p>
<pre>-------------------- Sending query to database named orcl3 (id: &lt;&lt;3462&gt;&gt;):
WITH
SAWITH0 AS (select sum(T2830.UNIT_COST) as c1,
     T2874.PROD_SUBCATEGORY_DESC as c2
from
     PRODUCTS T2874,
     COSTS T2830
where  ( T2830.PROD_ID = T2874.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC),
SAWITH1 AS (select sum(T2911.AMOUNT_SOLD) as c1,
     T2874.PROD_SUBCATEGORY_DESC as c2
from
     PRODUCTS T2874,
     SALES T2911
where  ( T2874.PROD_ID = T2911.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC)
select distinct SAWITH0.c1 as c1,
     SAWITH1.c1 as c2,
     case  when SAWITH0.c2 is not null then SAWITH0.c2 when SAWITH1.c2 is not null then SAWITH1.c2 end  as c3
from
     SAWITH0 full outer join SAWITH1 On SAWITH0.c2 = SAWITH1.c2
order by c3
</pre>
<p>If the physical database doesn&#8217;t support subquery factoring, such as Oracle Database 10gR1 or earlier, then the BI Server generates a slightly different execution plan, again with a FullOuterStitchJoin, like this:</p>
<pre>-------------------- Execution plan:

RqBreakFilter &lt;&lt;3115&gt;&gt;[3] [for database 0:0,0]
    RqList &lt;&lt;3006&gt;&gt; [for database 0:0,0]
        D903.c1 as c1 GB [for database 3023:2820,44],
        D903.c2 as c2 GB [for database 3023:2820,44],
        case  when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end  as c3 GB [for database 3023:2820,44]
    Child Nodes (RqJoinSpec): &lt;&lt;3117&gt;&gt; [for database 0:0,0]
        (
            RqList &lt;&lt;3160&gt;&gt; [for database 0:0,0]
                D901.c1 as c1 GB [for database 3023:2820,44],
                D902.c1 as c2 GB [for database 3023:2820,44],
                D901.c2 as c3 [for database 3023:2820,44],
                D902.c2 as c4 [for database 3023:2820,44]
            Child Nodes (RqJoinSpec): &lt;&lt;3163&gt;&gt; [for database 0:0,0]

                    (
                        RqList &lt;&lt;3018&gt;&gt; [for database 3023:2820:orcl3,44]
                            sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44],
                            PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44]
                        Child Nodes (RqJoinSpec): &lt;&lt;3051&gt;&gt; [for database 3023:2820:orcl3,44]
                            PRODUCTS T2874
                            COSTS T2830
                        DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0]
                        GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2820,44]
                        OrderBy: c2 asc [for database 3023:2820,44]
                    ) as D901 FullOuterStitchJoin &lt;&lt;3109&gt;&gt; On D901.c2 = D902.c2; actual join vectors:  [ 1 ] =  [ 1 ]

                    (
                        RqList &lt;&lt;3055&gt;&gt; [for database 3023:2820:orcl3,44]
                            sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44],
                            PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44]
                        Child Nodes (RqJoinSpec): &lt;&lt;3088&gt;&gt; [for database 3023:2820:orcl3,44]
                            PRODUCTS T2874
                            SALES T2911
                        DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
                        GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2820,44]
                        OrderBy: c2 asc [for database 3023:2820,44]
                    ) as D902
        ) as D903
    OrderBy: c3 asc [for database 0:0,0]
</pre>
<p>This is resolved for this database into two separate SQL statements, the results of which are then joined together &#8220;in-memory&#8221; by the BI Server:</p>
<pre>-------------------- Sending query to database named orcl3 (id: &lt;&lt;3018&gt;&gt;):

select sum(T2830.UNIT_COST) as c1,
     T2874.PROD_SUBCATEGORY_DESC as c2
from
     PRODUCTS T2874,
     COSTS T2830
where  ( T2830.PROD_ID = T2874.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC
order by c2

+++Administrator:2a0000:2a0005:----2010/02/28 15:05:31

-------------------- Sending query to database named orcl3 (id: &lt;&lt;3055&gt;&gt;):

select sum(T2911.AMOUNT_SOLD) as c1,
     T2874.PROD_SUBCATEGORY_DESC as c2
from
     PRODUCTS T2874,
     SALES T2911
where  ( T2874.PROD_ID = T2911.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC
order by c2
</pre>
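<p>To make the stitch step concrete, here is a minimal Python sketch of a full outer join between two aggregated result sets, producing the same column layout as the final select above (first measure, second measure, then the shared key). The function name and the figures are hypothetical, and the sketch assumes the key is never null and that each input has one row per key, which is what the two GROUP BY queries return.</p>

```python
def full_outer_stitch(left, right):
    """Full outer join of two {key: measure} result sets on their shared key.

    A key present on only one side still produces a row, with None standing in
    for the missing measure. Taking the key from whichever side has it plays the
    role of the "case when ... is not null" expression in the generated SQL.
    """
    keys = sorted(set(left) | set(right))
    return [(left.get(key), right.get(key), key) for key in keys]


# Hypothetical aggregated results, keyed by product subcategory.
unit_cost = {"Cameras": 120.0, "Printers": 80.0}
amount_sold = {"Cameras": 300.0, "Monitors": 50.0}

for row in full_outer_stitch(unit_cost, amount_sold):
    print(row)
```

<p>Because each side is aggregated before the two are combined, neither measure is multiplied by the row count of the other fact table, which is the fan trap that the stitch join is there to avoid.</p>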
<p><span style="font-size:14pt;"><strong>Joining Table Sources within a Logical Fact</strong></span></p>
<p>Another situation is where a fact table has more than one logical table source, because individual measures are sourced from different data sources or perhaps because measures are mapped in at differing levels of granularity (this blog post describes such a scenario). In this case, again the BI Server will initially try to push the join down to the underlying database, something that may be possible if a single physical database is used and we can use a technique like subquery factoring; more likely, though, it will require the BI Server to issue two or more physical SQL statements and then bring the results back together again using a FullOuterStitchJoin.</p>
<pre>-------------------- Execution plan:

RqList &lt;&lt;7829&gt;&gt; [for database 0:0,0]
    D1.c1 as c1 [for database 0:0,0],
    D1.c2 as c2 [for database 0:0,0],
    D1.c3 as c3 [for database 0:0,0],
    D1.c4 as c4 [for database 3023:4210,44]
Child Nodes (RqJoinSpec): &lt;&lt;7842&gt;&gt; [for database 0:0,0]
    (
        RqList &lt;&lt;7809&gt;&gt; [for database 0:0,0]
            D1.c1 as c1 [for database 0:0,0],
            D1.c2 as c2 [for database 0:0,0],
            D1.c3 as c3 [for database 0:0,0],
            D1.c4 as c4 [for database 3023:4210,44],
            D1.c5 as c5 [for database 0:0,0]
        Child Nodes (RqJoinSpec): &lt;&lt;7824&gt;&gt; [for database 0:0,0]
            (
                RqBreakFilter &lt;&lt;7808&gt;&gt;[1,2,5] [for database 0:0,0]
                    RqList &lt;&lt;7604&gt;&gt; [for database 0:0,0]
                        case  when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end  as c1 GB [for database 0:0,0],
                        case  when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end  as c2 GB [for database 0:0,0],
                        D903.c5 as c3 GB [for database 0:0,0],
                        D903.c6 as c4 GB [for database 3023:4210,44],
                        case  when D903.c7 is not null then D903.c7 when D903.c8 is not null then D903.c8 end  as c5 GB [for database 0:0,0]
                    Child Nodes (RqJoinSpec): &lt;&lt;7844&gt;&gt; [for database 0:0,0]
                        (
                            RqList &lt;&lt;7915&gt;&gt; [for database 0:0,0]
                                D901.c1 as c1 [for database 0:0,0],
                                D902.c1 as c2 [for database 3023:4210,44],
                                D902.c2 as c3 [for database 3023:4210,44],
                                D901.c2 as c4 [for database 0:0,0],
                                D901.c3 as c5 GB [for database 0:0,0],
                                D902.c3 as c6 GB [for database 3023:4210,44],
                                D901.c4 as c7 [for database 0:0,0],
                                D902.c4 as c8 [for database 3023:4210,44]
                            Child Nodes (RqJoinSpec): &lt;&lt;7918&gt;&gt; [for database 0:0,0]

                                    (
                                        RqList &lt;&lt;7851&gt;&gt; [for database 0:0,0]
                                            D1.c2 as c1 [for database 0:0,0],
                                            D1.c3 as c2 [for database 0:0,0],
                                            D1.c1 as c3 GB [for database 0:0,0],
                                            D1.c4 as c4 [for database 0:0,0]
                                        Child Nodes (RqJoinSpec): &lt;&lt;7854&gt;&gt; [for database 0:0,0]
                                            (
                                                RqBreakFilter &lt;&lt;7687&gt;&gt;[2,3] [for database 0:0,0]
                                                    RqList &lt;&lt;8040&gt;&gt; [for database 0:0,0]
                                                        D1.c1 as c1 [for database 0:0,0],
                                                        D1.c2 as c2 [for database 0:0,0],
                                                        D1.c3 as c3 [for database 0:0,0],
                                                        D1.c4 as c4 [for database 0:0,0]
                                                    Child Nodes (RqJoinSpec): &lt;&lt;8058&gt;&gt; [for database 0:0,0]
                                                        (
                                                            RqList &lt;&lt;7972&gt;&gt; [for database 3023:4483:Quotas,2]
                                                                sum(QUANTITY_QUOTAS.QUOTA by [ CATEGORY.CATEGORY, MONTHS.MONTH_MON_YYYY] ) as c1 [for database 3023:4483,2],
                                                                MONTHS.MONTH_MON_YYYY as c2 [for database 3023:4483,2],
                                                                CATEGORY.CATEGORY as c3 [for database 3023:4483,2],
                                                                MONTHS.MONTH_YYYYMM as c4 [for database 3023:4483,2]
                                                            Child Nodes (RqJoinSpec): &lt;&lt;7682&gt;&gt; [for database 3023:4483:Quotas,2]
                                                                CATEGORY T4486
                                                                MONTHS T4488
                                                                QUANTITY_QUOTAS T4492
                                                            DetailFilter: CATEGORY.CATEGORY = QUANTITY_QUOTAS.CATEGORY and MONTHS.MONTH_YYYYMM = QUANTITY_QUOTAS.MONTH_YYYYMM [for database 0:0]
                                                            GroupBy: [ CATEGORY.CATEGORY, MONTHS.MONTH_YYYYMM, MONTHS.MONTH_MON_YYYY]  [for database 3023:4483,2]
                                                        ) as D1
                                                    OrderBy: c2, c3 [for database 0:0,0]
                                            ) as D1
                                        OrderBy: c1 asc, c2 asc [for database 0:0,0]
                                    ) as D901 FullOuterStitchJoin &lt;&lt;7800&gt;&gt; On D901.c1 =NullsEqual D902.c1 and D901.c2 =NullsEqual D902.c2; actual join vectors:  [ 0 1 ] =  [ 0 1 ]

                                    (
                                        RqList &lt;&lt;7880&gt;&gt; [for database 3023:4210:orcl4,44]
                                            D2.c2 as c1 [for database 3023:4210,44],
                                            D2.c3 as c2 [for database 3023:4210,44],
                                            D2.c1 as c3 GB [for database 3023:4210,44],
                                            D2.c4 as c4 [for database 3023:4210,44]
                                        Child Nodes (RqJoinSpec): &lt;&lt;7883&gt;&gt; [for database 3023:4210:orcl4,44]
                                            (
                                                RqBreakFilter &lt;&lt;7760&gt;&gt;[2,3] [for database 3023:4210:orcl4,44]
                                                    RqList &lt;&lt;7989&gt;&gt; [for database 3023:4210:orcl4,44]
                                                        sum(ITEMS.QUANTITY by [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY] ) as c1 [for database 3023:4210,44],
                                                        TIMES.MONTH_MON_YYYY as c2 [for database 3023:4210,44],
                                                        PRODUCT.CATEGORY as c3 [for database 3023:4210,44],
                                                        TIMES.MONTH_YYYYMM as c4 [for database 3023:4210,44]
                                                    Child Nodes (RqJoinSpec): &lt;&lt;7755&gt;&gt; [for database 3023:4210:orcl4,44]
                                                        PRODUCT T4256
                                                        TIMES T4264
                                                        ITEMS T4239
                                                        ORDERS T4248
                                                    DetailFilter: ITEMS.ORDID = ORDERS.ORDID and ITEMS.PRODID = PRODUCT.PRODID and ORDERS.ORDERDATE = TIMES.DAY_ID [for database 0:0]
                                                    GroupBy: [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY, TIMES.MONTH_YYYYMM]  [for database 3023:4210,44]
                                            ) as D2
                                        OrderBy: c1 asc, c2 asc [for database 3023:4210,44]
                                    ) as D902
                        ) as D903
                    OrderBy: c1, c2, c5 [for database 0:0,0]
            ) as D1
        OrderBy: c5 asc, c2 asc, c4 asc [for database 0:0,0]
    ) as D1
</pre>
<p>Again, notice the FullOuterStitchJoin in the execution plan &#8211; this indicates that facts (as opposed to facts and dimensions) are being joined together.</p>
<p>This in turn leads to two separate SQL statements. The one against the &#8220;orcl4&#8221; database is more complex because the results then need to be mapped to the aggregation level that the second source, &#8220;Quotas&#8221;, comes in at:</p>
<pre>-------------------- Sending query to database named Quotas (id: &lt;&lt;7972&gt;&gt;):
select sum(T4492."QUOTA") as c1,
     T4488."MONTH_MON_YYYY" as c2,
     T4486."CATEGORY" as c3,
     T4488."MONTH_YYYYMM" as c4
from
     "CATEGORY" T4486,
     "MONTHS" T4488,
     "QUANTITY_QUOTAS" T4492
where  ( T4486."CATEGORY" = T4492."CATEGORY" and T4488."MONTH_YYYYMM" = T4492."MONTH_YYYYMM" )
group by T4486."CATEGORY", T4488."MONTH_YYYYMM", T4488."MONTH_MON_YYYY"

+++Administrator:2b0000:2b000a:----2010/02/24 17:18:51

-------------------- Sending query to database named orcl4 (id: &lt;&lt;7880&gt;&gt;):

select D2.c2 as c1,
     D2.c3 as c2,
     D2.c1 as c3,
     D2.c4 as c4
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3,
               D1.c4 as c4
          from
               (select sum(T4239.QUANTITY) as c1,
                         T4264.MONTH_MON_YYYY as c2,
                         T4256.CATEGORY as c3,
                         T4264.MONTH_YYYYMM as c4,
                         ROW_NUMBER() OVER (PARTITION BY T4256.CATEGORY, T4264.MONTH_MON_YYYY ORDER BY T4256.CATEGORY ASC, T4264.MONTH_MON_YYYY ASC) as c5
                    from
                         PRODUCT T4256,
                         TIMES T4264,
                         ITEMS T4239,
                         ORDERS T4248
                    where  ( T4239.ORDID = T4248.ORDID and T4239.PRODID = T4256.PRODID and T4248.ORDERDATE = T4264.DAY_ID )
                    group by T4256.CATEGORY, T4264.MONTH_MON_YYYY, T4264.MONTH_YYYYMM
               ) D1
          where  ( D1.c5 = 1 )
     ) D2
order by c1, c2
</pre>
<p>So, to summarize things so far:</p>
<ul>
<li>Where possible, the BI Server will try and generate a single SQL statement to resolve a request</li>
<li>And if possible, any joins that are required between tables will be pushed down to the database</li>
<li>If table data sources are located on separate physical databases, the BI Server will request the individual data source data blocks, and then join the results together in-memory using an inner, left outer, right outer or full outer join as appropriate</li>
<li>If facts (or measures within a fact) are being joined together, the BI Server will need to generate one logical query per logical table source, and bring the data together with a full outer stitch join</li>
<li>As mentioned above, if it&#8217;s possible to do this stitch join at the database level (using, for example,  a WITH clause), it&#8217;ll do so</li>
<li>Otherwise the BI Server will generate separate SQL statements and join the data together in-memory</li>
</ul>
<p>When an in-memory BI Server join happens between two tables, the BI Server brings back both sets of data from the two (or more) table sources and then performs a sort-merge join to bring the data together. If possible, it will push the sort back to the underlying database and just do the &#8220;merge&#8221; part of the join itself. It will also, in all likelihood, page some of the temporary data to TMP files in $ORACLEBIDATA/tmp, depending on the load on the server, available memory and the number of concurrent queries that it is running. The NQSConfig.INI BI Server parameter VIRTUAL_TABLE_PAGE_SIZE determines the point at which temporary data is paged to disk, and on a Unix server you can experiment with increasing it from its default setting if you have lots of unused memory available (the docs suggest that this will probably not have much of a positive effect, though).</p>
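<p>For anyone who hasn&#8217;t met the algorithm before, the &#8220;merge&#8221; half of a sort-merge join looks something like the Python sketch below. It is a textbook version rather than the BI Server&#8217;s actual code, the function name and sample rows are hypothetical, and it assumes both inputs arrive already sorted on the join key, as they would when the ORDER BY has been pushed down to each database.</p>

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) rows that are ALREADY sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # left key has no partner yet: advance the left input
        elif rkey < lkey:
            j += 1  # right key has no partner yet: advance the right input
        else:
            # Keys match: find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # ... and pair every left row for this key with each of them.
            while i < len(left) and left[i][0] == lkey:
                for _, rval in right[j:j_end]:
                    out.append((lkey, left[i][1], rval))
                i += 1
            j = j_end
    return out


# Hypothetical result sets, each sorted by prod_id.
sales = [(1, 15), (2, 7), (4, 9)]
products = [(1, "Cameras"), (2, "Cameras"), (3, "Printers")]
print(merge_join(sales, products))
```

<p>Each input is read once from start to finish, which is why getting the databases to return their rows pre-sorted leaves only a single pass of work for the merge itself.</p>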
<p><span style="font-size:14pt;"><strong>Fragmented Data Sources</strong></span></p>
<p>Another variation on a join that the BI Server can do is a &#8220;union&#8221; between two queries. This is most common when you have fragmented data sources, such as the example below where part of the data in the sales table comes from one table, and part from another.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis34-1.jpg" height="390" width="500" border="0" hspace="4" vspace="4" alt="Bis34-1" /></p>
<p>In this case, the logical execution plan will contain an RqUnion All node between the inner RqList request lists, showing that the BI Server knows it needs to UNION ALL the two queries:</p>
<pre>-------------------- Execution plan:

RqList &lt;&lt;7569&gt;&gt; [for database 3023:6594:orcl7,44]
    D3.c2 as c1 GB [for database 3023:6594,44],
    sum(D3.c3 by [ D3.c2] ) as c2 GB [for database 3023:6594,44]
Child Nodes (RqJoinSpec): &lt;&lt;7695&gt;&gt; [for database 3023:6594:orcl7,44]
    (
        RqList &lt;&lt;7613&gt;&gt; [for database 3023:6594:orcl7,44]
            PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44],
            SALES_UPTO_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44]
        Child Nodes (RqJoinSpec): &lt;&lt;7617&gt;&gt; [for database 3023:6594:orcl7,44]
            PRODUCTS T6596
            SALES T6629
        DetailFilter: PRODUCTS.PROD_ID = SALES_UPTO_2003.PROD_ID [for database 0:0]
        RqUnion All &lt;&lt;7690&gt;&gt; [for database 3023:6594:orcl7,44]
        RqList &lt;&lt;7668&gt;&gt; [for database 3023:6594:orcl7,44]
            PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44],
            SALES_BEYOND_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44]
        Child Nodes (RqJoinSpec): &lt;&lt;7672&gt;&gt; [for database 3023:6594:orcl7,44]
            PRODUCTS T6596
            SALES T6637
        DetailFilter: PRODUCTS.PROD_ID = SALES_BEYOND_2003.PROD_ID [for database 0:0]
    ) as D3
GroupBy: [ D3.c2]  [for database 3023:6594,44]
OrderBy: c1 asc [for database 3023:6594,44]
</pre>
<p>Then, depending on whether the BI Server can resolve this using a single query or needs multiple queries against separate data sources, either a single SQL statement like the one below will be issued, or separate statements will be issued and the BI Server will do the union all in memory.</p>
<pre>select D3.c2 as c1,
     sum(D3.c3) as c2
from
     ((select T6596.PROD_SUBCATEGORY_DESC as c2,
               T6629.AMOUNT_SOLD as c3
          from
               PRODUCTS T6596,
               SALES T6629 /* SALES_UPTO_2003 */
          where  ( T6596.PROD_ID = T6629.PROD_ID )
          union all
          select T6596.PROD_SUBCATEGORY_DESC as c2,
               T6637.AMOUNT_SOLD as c3
          from
               PRODUCTS T6596,
               SALES T6637 /* SALES_BEYOND_2003 */
          where  ( T6596.PROD_ID = T6637.PROD_ID ) )
     ) D3
group by D3.c2
order by c1
</pre>
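<p>For the second case, where each fragment lives in a different database, the in-memory work amounts to concatenating the two result sets and then applying the outer GROUP BY and ORDER BY. The Python sketch below illustrates that idea only; the function name and the (subcategory, amount) tuple layout are assumptions of mine, not anything taken from the BI Server.</p>

```python
from collections import defaultdict
from itertools import chain


def union_all_then_sum(fragment_results):
    """Combine per-fragment result sets and aggregate them.

    `fragment_results` is a list of result sets, one per fragment, where
    each row is a (subcategory, amount_sold) tuple. Illustrative only.
    """
    totals = defaultdict(float)
    # UNION ALL: simply read the fragments one after another, keeping duplicates.
    for subcategory, amount_sold in chain.from_iterable(fragment_results):
        # GROUP BY subcategory with SUM(amount_sold).
        totals[subcategory] += amount_sold
    # ORDER BY subcategory ascending.
    return sorted(totals.items())
```

<p>Calling it with one list of rows per fragment, for example the rows from SALES_UPTO_2003 and those from SALES_BEYOND_2003, returns one total per subcategory, mirroring the outer query block in the SQL above.</p>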
<p><span style="font-size:14pt;"><strong>Driving Tables (Parameterized Nested Loop Joins)</strong></span></p>
<p>I mentioned in the paragraph above that BI Server joins are typically done using the sort-merge algorithm. One variation on this though is when you set one of the two tables in a business model and mapping logical join to be a driving table, typically because you are federating fact and dimension tables and one table is much smaller than the other, as shown in the screenshot below.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis31-1.jpg" height="461" width="500" border="0" hspace="4" vspace="4" alt="Bis31-1" /></p>
<p>The first thing to understand with driving tables is that they are regarded as a &#8220;hint&#8221; by the BI Server, and the BI Server may well choose to ignore the setting if it makes more sense to perform the join as normal (presumably, when both tables are relatively small). If the driving table instruction is followed, though, the BI Server will always do the join in-memory, even if both tables come from logical table sources pointing to the same physical database. In the execution plan shown below, you can see the InnerJoin (left drive) that indicates a parameterized nested loop join (PNLJ) will be required, and as the name suggests the BI Server will perform a nested loop join rather than the sort-merge join that it usually uses to join tables together.</p>
<pre>-------------------- Execution plan:
RqBreakFilter &lt;&lt;8705&gt;&gt;[1] [for database 0:0,0]
    RqList &lt;&lt;8972&gt;&gt; [for database 0:0,0]
        D1.c2 as c1 [for database 3023:2500,44],
        sum(D1.c5 by [ D1.c2]  at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0]
    Child Nodes (RqJoinSpec): &lt;&lt;8984&gt;&gt; [for database 0:0,0]
        (
            RqList &lt;&lt;8463&gt;&gt; [for database 0:0,0]
                D901.c1 as c2 GB [for database 3023:2500,44],
                D902.c2 as c3 [for database 3023:5035,44],
                D902.c3 as c5 [for database 3023:5035,44]
            Child Nodes (RqJoinSpec): &lt;&lt;8707&gt;&gt; [for database 0:0,0]

                    (
                        RqList &lt;&lt;8757&gt;&gt; [for database 3023:2500:orcl2,44]
                            PRODUCTS.PROD_NAME as c1 GB [for database 3023:2500,44],
                            PRODUCTS.PROD_ID as c2 [for database 3023:2500,44]
                        Child Nodes (RqJoinSpec): &lt;&lt;8760&gt;&gt; [for database 3023:2500:orcl2,44]
                            PRODUCTS T2502
                        DetailFilter: PRODUCTS.PROD_NAME = '128MB Memory Card' or PRODUCTS.PROD_NAME = '3 1/2" Bulk diskettes, Box of 100' or PRODUCTS.PROD_NAME = '5MP Telephoto Digital Camera' or PRODUCTS.PROD_NAME = '64MB Memory Card' or PRODUCTS.PROD_NAME = 'Deluxe Mouse' or PRODUCTS.PROD_NAME = 'Envoy Ambassador' or PRODUCTS.PROD_NAME = 'Envoy External 8X CD-ROM' or PRODUCTS.PROD_NAME = 'Martial Arts Champions' or PRODUCTS.PROD_NAME = 'Model A3827H Black Image Cartridge' or PRODUCTS.PROD_NAME = 'Model C93822D Wireless Phone Battery' or PRODUCTS.PROD_NAME = 'Model CD13272 Tricolor Ink Cartridge' or PRODUCTS.PROD_NAME = 'PCMCIA modem/fax 28800 baud' or PRODUCTS.PROD_NAME = 'SIMM- 16MB PCMCIAII card' or PRODUCTS.PROD_NAME = 'Smash up Boxing' or PRODUCTS.PROD_NAME = 'Unix/Windows 1-user pack' [for database 0:0]
                        OrderBy: c2 asc [for database 3023:2500,44]
                    ) as D901
                InnerJoin  (left drive) &lt;&lt;8806&gt;&gt; On D901.c2 = D902.c2; actual join vectors:  [ 1 ] =  [ 0 ]

                    (
                        RqList &lt;&lt;8790&gt;&gt; [for database 3023:5035:orcl5,44]
                            SALES.PROD_ID as c2 [for database 3023:5035,44],
                            sum(SALES.AMOUNT_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:5035,44]
                        Child Nodes (RqJoinSpec): &lt;&lt;8793&gt;&gt; [for database 3023:5035:orcl5,44]
                            SALES T5126
                        DetailFilter: SALES.PROD_ID = ?1 or SALES.PROD_ID = ?2 or SALES.PROD_ID = ?3 or SALES.PROD_ID = ?4 or SALES.PROD_ID = ?5 or SALES.PROD_ID = ?6 or SALES.PROD_ID = ?7 or SALES.PROD_ID = ?8 or SALES.PROD_ID = ?9 or SALES.PROD_ID = ?10 or SALES.PROD_ID = ?11 or SALES.PROD_ID = ?12 or SALES.PROD_ID = ?13 or SALES.PROD_ID = ?14 or SALES.PROD_ID = ?15 or SALES.PROD_ID = ?16 or SALES.PROD_ID = ?17 or SALES.PROD_ID = ?18 or SALES.PROD_ID = ?19 or SALES.PROD_ID = ?20 [for database 0:0]
                        GroupBy: [ SALES.PROD_ID]  [for database 3023:5035,44]
                        OrderBy: c2 asc [for database 3023:5035,44]
                    ) as D902
            OrderBy: c2, c3 [for database 0:0,0]
        ) as D1
    OrderBy: c1 asc [for database 0:0,0]
</pre>
<p>This then leads to the following parameterized SQL statements being issued, with the first statement representing the &#8220;driving&#8221; query, and the second the &#8220;probing&#8221; one against the larger table.</p>
<pre>-------------------- Sending query to database named orcl2 (id: &lt;&lt;8757&gt;&gt;):

select T2502.PROD_NAME as c1,
     T2502.PROD_ID as c2
from
     PRODUCTS T2502
where  ( T2502.PROD_NAME in ('128MB Memory Card', '3 1/2" Bulk diskettes, Box of 100', '5MP Telephoto Digital Camera', '64MB Memory Card', 'Deluxe Mouse', 'Envoy Ambassador', 'Envoy External 8X CD-ROM', 'Martial Arts Champions', 'Model A3827H Black Image Cartridge', 'Model C93822D Wireless Phone Battery', 'Model CD13272 Tricolor Ink Cartridge', 'PCMCIA modem/fax 28800 baud', 'SIMM- 16MB PCMCIAII card', 'Smash up Boxing', 'Unix/Windows 1-user pack') )
order by c2

+++Administrator:2c0000:2c000a:----2010/02/24 21:06:47

-------------------- Sending query to database named orcl5 (id: &lt;&lt;8790&gt;&gt;):

select T5126.PROD_ID as c2,
     sum(T5126.AMOUNT_SOLD) as c3
from
     SALES T5126
where  ( T5126.PROD_ID in (:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, :PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, :PARAM18, :PARAM19, :PARAM20) )
group by T5126.PROD_ID
order by c2
</pre>
<p>In reality you rarely see driving table joins being used, as there are much better solutions for bringing small and large tables together &#8211; the main one being to co-locate the tables and then push the join down to the database, rather than bring both datasets together and have the BI Server join them in memory instead (this also applies to a lesser degree to all BI Server joins). But this could be a useful &#8220;quick fix&#8221; until such time as you can co-locate the data, and it&#8217;s useful to remember that these types of joins are always done by the BI Server due to the need to iterate through drive/probe operations.</p>
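<p>As a rough illustration of the drive/probe iteration, the Python sketch below takes the rows returned by the driving query, splits their join keys into batches, and calls a probe function once per batch. The batch size of 20 simply mirrors the twenty bind parameters in the logged probe query; the function names and the callable-based design are hypothetical, and the real BI Server logic is certainly more involved.</p>

```python
def chunks(values, size):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def driving_table_join(driving_rows, probe, batch_size=20):
    """Join a small driving row set to a large table via batched probes.

    `driving_rows` are (prod_id, prod_name) tuples from the driving query.
    `probe` is a hypothetical callable that takes a list of PROD_ID values
    and returns (prod_id, amount_sold) rows, standing in for the
    parameterized query sent to the database holding the large table.
    """
    names = dict(driving_rows)
    joined = []
    for batch in chunks(list(names), batch_size):
        # One round trip to the probed database per batch of keys.
        for prod_id, amount_sold in probe(batch):
            joined.append((names[prod_id], amount_sold))
    return joined
```

<p>The point of the design is that only rows matching the driving keys ever leave the larger database, at the cost of one round trip per batch.</p>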
<p><span style="font-size:14pt;"><strong>Persist Connection Pools</strong></span></p>
<p>One final variation on BI Server execution plans and join types is when you set up a &#8220;persist connection pool&#8221;. Persist connection pools are typically used in two scenarios; firstly, where Oracle/Siebel Marketing is being used, and secondly, where the underlying physical database doesn&#8217;t handle large numbers of values in an IN-list. In this case, you can set up a second connection pool within a physical database and specify it as the persist connection pool, as shown in the screenshot below:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis30-1.jpg" height="397" width="500" border="0" hspace="4" vspace="4" alt="Bis30-1" /></p>
<p>I&#8217;ve never encountered a persist connection pool &#8220;in the wild&#8221;, so to speak, but an example query log output from when one was used is shown below. In this instance, the first query was sent to a MS Analysis Services database, and a persist connection pool was used to materialize the in-list results into a database table which is then joined back to the ORDERS table in the final query, rather than have the BI Server do the join in-memory.</p>
<pre>-------------------- Sending query to database named FoodMart (id: &lt;&lt;10980&gt;&gt;):
With
  member [Measures].[YearAnc] as 'ancestor([Time].Currentmember,[Time].[Year]).name'
  set [Q] as '{{[Time].[Year].members}}'
  select
    {[measures].[YearAnc]} on columns,
    {[Q]} on rows
  from     [Sales]

-------------------- Sending query to database named SQLDB_Northwind (id: CreateTable TransGateway):
CREATE TABLE TTCH5C5DEL554110000020000003 ( column1 VARCHAR2(8) )

-------------------- Sending query to database named SQLDB_Northwind (id: &lt;&lt;11057&gt;&gt;):
select distinct TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') as c1
from
     Orders T1864
where  ( TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') in (select column1 from TTCH5C5DEL554110000020000003) )
</pre>
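<p>The pattern in this log (create a work table, load the values, then filter with a subquery rather than a long literal IN-list) can be sketched in Python using the standard sqlite3 module. Everything here is an assumption made for illustration: the table and column names are invented, SQLite stands in for the target database, and dropping the work table at the end is my own tidy-up step rather than something visible in the log.</p>

```python
import sqlite3


def filter_via_persisted_values(conn, years):
    """Filter ORDERS by a list of years using a work table, not an IN-list.

    `conn` is an open sqlite3 connection containing a hypothetical
    orders(order_year) table. Illustrative only.
    """
    # Step 1: materialize the values coming from the first query.
    conn.execute("CREATE TEMP TABLE persisted_values (column1 INTEGER)")
    conn.executemany(
        "INSERT INTO persisted_values (column1) VALUES (?)",
        [(year,) for year in years],
    )
    # Step 2: let the database do the join back to the work table.
    rows = conn.execute(
        "SELECT DISTINCT order_year FROM orders "
        "WHERE order_year IN (SELECT column1 FROM persisted_values) "
        "ORDER BY order_year"
    ).fetchall()
    # Step 3 (assumed): remove the work table once the query is done.
    conn.execute("DROP TABLE persisted_values")
    return [row[0] for row in rows]
```

<p>The trade-off is an extra write to the target database in exchange for keeping the join out of BI Server memory and avoiding any limit on IN-list size.</p>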
<p><span style="font-size:14pt;"><strong><br />
Conclusions</strong></span></p>
<p>So, there you have it. The join strategy of the BI Server, as is the case with functions and calculations, is to push joins down to the underlying database wherever possible. If this can&#8217;t be done, either because the database version doesn&#8217;t support features like subquery factoring, or because the data for the request is being sourced from more than one physical database, the BI Server will do the join itself, initially in-memory but usually with temporary data being paged to disk.</p>
<p>There are two main types of BI Server join: regular (inner, left outer, right outer and full outer) joins, for bringing together fact and dimension tables; and full outer stitch joins, for bringing together facts and measures. There are also variations for handling joins from very small tables to very large tables (driving tables, or parameterized nested loop joins), and for when the physical database doesn&#8217;t support large in-lists (persist connection pools); however, these issues are usually better handled by co-locating data or upgrading the database.</p>
<p>Finally, even though the BI Server is pretty clever at doing these types of joins, you&#8217;re usually better off investing your time in physically bringing your data together into a data mart or data warehouse than spending too much time fine-tuning these joins, though a knowledge of how they work (and how to read a level 5 execution plan) can be useful if you have to understand, or tune, an existing system. Of course the level 5 execution plan doesn&#8217;t really tell you anything you couldn&#8217;t determine by looking at the design of the RPD &#8211; there&#8217;s nothing that goes on beyond this that might change the execution plan for a certain set of data, unlike the Oracle database which changes the plan from database to database depending on the distribution and nature of the data &#8211; but it&#8217;s interesting to get a peek into the workings of the BI Server Navigator module.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/03/inside-the-oracle-bi-server-part-3-bi-server-in-memory-joins/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>OBIEE Content at ODTUG Kaleidoscope 2010</title>
		<link>http://www.rittmanmead.com/2010/03/02/obiee-content-at-odtug-kaleidoscope-2010/</link>
		<comments>http://www.rittmanmead.com/2010/03/02/obiee-content-at-odtug-kaleidoscope-2010/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 14:04:01 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/03/02/obiee-content-at-odtug-kaleidoscope-2010/</guid>
		<description><![CDATA[As well as organizing our own BI Forum in the UK, another event I&#8217;ve had a hand in is ODTUG Kaleidoscope 2010, which is due to run from June 27th &#8211; July 1st in Washington D.C. I&#8217;ve been the content lead for the BI, DW and Hyperion Reporting Tools stream, and if you were thinking [...]]]></description>
			<content:encoded><![CDATA[<p>As well as organizing our own BI Forum in the UK, another event I&#8217;ve had a hand in is <a href="http://www.odtugkaleidoscope.com/">ODTUG Kaleidoscope 2010</a>, which is due to run from June 27th &#8211; July 1st in Washington D.C. I&#8217;ve been the content lead for the <a href="http://www.odtugkaleidoscope.com/oraclebusinessintelligence.html">BI, DW and Hyperion Reporting Tools</a> stream, and if you were thinking of coming to the BI Forum but couldn&#8217;t make it over to the UK, this event has a similar level of OBIEE content and might be of interest to you.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/03/odtug.jpg" height="117" width="500" border="0" hspace="4" vspace="4" alt="Odtug" /></p>
<p>Like the BI Forum, we wanted to focus on the technical aspects of OBIEE, and as the event is running Stateside we were able to get some of the key product managers from Oracle to come and present, who don&#8217;t usually make it to general user group events like this but were keen to contribute to our OBIEE focus. Here&#8217;s a taster for what&#8217;s on the agenda:</p>
<p><strong>OBIEE 11g Integration with Oracle ADF Business Components</strong><br />
<em>Palaniappan Chidambaram, Oracle Corporation</em></p>
<p style="text-indent:20pt;"><em>With Oracle Middleware getting smarter and robust, it’s time to update the data source of OBIEE from relational database to Fusion ADF Business Model Components (light-weight Java View Objects). This is an efficient way to add pervasive BI to the enterprise applications built using Fusion ADF. Learn the new OBIEE 11g and Fusion ADF integration with added metadata on security, UI hints, and much more.</em></p>
<p>I met Palaniappan back at last year&#8217;s Open World, and he&#8217;s responsible for some of the OBIEE/ADF integration that&#8217;s coming with the new Fusion Release of Oracle Applications. Palaniappan will be talking about something that will be of interest to both OBIEE and ADF developers, and will be speaking about it from the perspective of someone responsible for its features and usage.</p>
<p><strong>Best Practices for Performance, Scalability, and Reliability with Oracle BI Enterprise Edition</strong><br />
<em>Mike Durran, Oracle Corporation</em>, and<br />
<strong>Oracle OBIEE Metadata Modeling Best Practices and Tips for Concurrent Development</strong><br />
<em>Alan Fuller, Oracle Corporation</em></p>
<p style="text-indent:20pt;"><em>The initial setup and configuration of an Oracle BI system can reap benefits in terms of ongoing performance and reliability. Mike&#8217;s session describes the creation of a system that can scale to your enterprise from the initial install, configuration for optimum performance, ongoing monitoring, and troubleshooting tips.</em></p>
<p style="text-indent:20pt;"><em>In Alan&#8217;s session, a senior member of the OBIEE product management team will cover best practices for metadata modeling in OBIEE including: using the power of the enterprise semantic layer for performance tuning and query optimization, rapid development processes, managing frequently encountered stumbling blocks, and making it easy for end users. Also covered will be tips for concurrent development of metadata across multiple developers.</em></p>
<p>From speaking with past attendees of Kaleidoscope, a regular bit of feedback is that people want to hear &#8220;best practice&#8221; sessions on their technology area of interest. Mike and Alan are therefore going to give Oracle&#8217;s view on best practices for OBIEE performance and RPD modeling, which will also give the audience the chance to discuss their own best practices and comment on the ones put forward by Oracle.</p>
<p><strong>Web Services and Application Integration with Oracle Business Intelligence EE Suite</strong><br />
<em>David Granholm, Oracle Corporation</em></p>
<p style="text-indent:20pt;"><em>A variety of techniques can be used to build novel applications which extend the core capabilities of the OBIEE Suite. Approaches include URL-based integration, Web services leveraging SOAP methods, portal and WebCenter integration, and custom applications built in Oracle’s Application Development Framework (ADF) using Oracle JDeveloper &#8211; Oracle&#8217;s main development tool for Java-based SOA applications. Each of these approaches and a few real-world examples will be discussed.</em></p>
<p>David is a really good speaker, and this session will complement Palaniappan&#8217;s and extend the discussion to SOAP and web services. Again, something of interest for general application developers as well as OBIEE developers.</p>
<p><strong>Oracle Business Intelligence Applications Essbase Integrator</strong><br />
<em>Alaric Thomas, Oracle Corporation</em></p>
<p style="text-indent:20pt;"><em>Oracle has a number of powerful business intelligence technologies in its portfolio, and we are rapidly integrating these technologies to provide more value and lower TCO for customers. In this session, Alaric Thomas will discuss how the Oracle BI Applications – Essbase Integrator will bring together the capabilities of the Oracle BI Applications and Oracle Essbase to further leverage a common applications information model delivering value to both IT and end users.</em></p>
<p>I blogged on this presentation a while ago, and this will be an excellent opportunity to see how the integration of Essbase and Oracle BI Apps is taking place, using technology available now rather than being based on the Fusion BI Apps.</p>
<p><strong>Oracle® Hyperion Smart View for Office, Fusion Edition</strong><br />
<em>Toufic Wakim, Oracle Corporation</em></p>
<p style="text-indent:20pt;"><em>For those working with a financial application, Oracle Essbase or Oracle Business Intelligence Suite Enterprise Edition Plus, Oracle Hyperion Smart View for Office brings access to data and report templates and the ability to author reports in the Microsoft Office framework. This session gives an overview of this product and demonstrates new features. It also shows how to use it with Oracle Hyperion Planning, Oracle Hyperion Financial Management, Oracle Business Intelligence Suite Enterprise Edition Plus, and Oracle Essbase.</em></p>
<p>SmartView is supposed to replace the BI Office add-in for OBIEE, but in its current incarnation doesn&#8217;t work too well. Apparently a new version that is more suited to OBIEE will be released shortly, and Toufic hopefully will discuss and demonstrate this version at Kaleidoscope. If you&#8217;re interested in the future of MS Office integration with OBIEE, this will be a must-see session.</p>
<p>Apart from these sessions, there&#8217;s a couple from myself on OBIEE 11g new features (assuming I can get sign-off to present, as 11g is doubtful for release by the time of Kaleidoscope), and a whole bunch of OWB, DW and ODI sessions as well as content aimed at Hyperion users upgrading to OBIEE. Outside of the sessions we&#8217;ll be organizing a social event specifically for the BI community, and there&#8217;ll of course be other great Essbase, Hyperion, Oracle and development sessions running concurrently.</p>
<p><a href="http://www.odtugkaleidoscope.com/registration.html">Registration is open now</a>, and if you&#8217;re serious about OBIEE and based Stateside (or even outside the USA, and fancy a trip to DC), this is definitely the event to attend in 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/02/obiee-content-at-odtug-kaleidoscope-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inside the Oracle BI Server Part 2 : How Is A Query Processed?</title>
		<link>http://www.rittmanmead.com/2010/03/01/inside-the-oracle-bi-server-part-2-how-is-a-query-processed/</link>
		<comments>http://www.rittmanmead.com/2010/03/01/inside-the-oracle-bi-server-part-2-how-is-a-query-processed/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 07:15:37 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/02/28/inside-the-oracle-bi-server-part-2-how-is-a-query-processed/</guid>
		<description><![CDATA[In the first article on this series about the Oracle BI Server, I looked at the architecture and functions within this core part of the OBIEE product set. In this article, I want to look closer at what happens when a query (or &#8220;request&#8221;) comes into the BI Server, and how it translates it into [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.rittmanmead.com/2010/02/25/inside-the-oracle-bi-server-part-1-the-bi-server-architecture/">first article on this series about the Oracle BI Server</a>, I looked at the architecture and functions within this core part of the OBIEE product set. In this article, I want to look closer at what happens when a query (or &#8220;request&#8221;) comes into the BI Server, and how it translates it into the SQL, MDX, file and XML requests that then get passed to the underlying data sources.</p>
<p>In the previous article, I laid out a conceptual diagram of the BI Server and talked about how the Navigator turned incoming queries into one or more physical database queries. As a recap, here&#8217;s the architecture diagram again:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis2-1.jpg" height="542" width="500" border="0" hspace="4" vspace="4" alt="Bis2-1" /></p>
<p>Now as we all know, the BI Server uses a three-layer metadata model that exposes one or more databases (or &#8220;subject areas&#8221;) for ODBC-compliant query tools to run queries against. Here&#8217;s a typical metadata model that takes a number of physical data sources, joins them together into a smaller number of business model and mapping models, and then presents them out to the query tool (usually, Oracle BI Answers) as a set of databases made up of relational tables, columns and joins.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis7.jpg" height="320" width="500" border="0" hspace="4" vspace="4" alt="Bis7" /></p>
<p>Usually you access this metadata model using Oracle BI Answers, which presents you with an initial choice of subject areas (databases in ODBC terminology) and then displays the contents of one of them as a list of tables and columns (in 11g, you&#8217;ll be able to include tables from multiple subject areas in queries as long as there are tables in common between them).</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis9.jpg" height="481" width="500" border="0" hspace="4" vspace="4" alt="Bis9" /></p>
<p>Other ODBC-compliant query tools, such as Microsoft Excel, Cognos or Business Objects, can access these subject areas and run queries against them just as if they were regular databases. Here&#8217;s Microsoft Excel 2007 building a query against the same subject area:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis8.jpg" height="357" width="500" border="0" hspace="4" vspace="4" alt="Bis8" /></p>
<p><span style="font-size:14pt;"><strong>What Happens When the BI Server Handles a Query?<br />
</strong></span><br />
So just what happens then, when a query (or &#8220;request&#8221;) comes in from one of these sources, and needs to be processed in order to return results to the user? As you&#8217;re probably aware, the BI Server doesn&#8217;t itself hold data (except cached results from other queries, when this feature is enabled); instead, it translates the incoming &#8220;logical&#8221; query into one or more outgoing &#8220;physical&#8221; queries against the relevant data sources. As such, a logical model presented to users might be mapped to data in an Oracle data warehouse, an E-Business Suite application, some data in a Teradata data warehouse, some measures in an Essbase cube and even some data in an Excel spreadsheet. The BI Server resolves this complexity by creating a simplified, star schema business model over these data sources so that the user can query it as if it&#8217;s a single source of data.</p>
<p>If you&#8217;re used to the Oracle database, you&#8217;ll probably know that it has various components that are used to resolve queries &#8211; the library cache, query rewrite, table and system statistics, etc &#8211; and both rule-based and cost-based optimizers that are used to generate a query plan. For most modern Oracle systems, a statistics-based cost-based optimizer (most famously documented by Jonathan Lewis in <a href="http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html">this book</a>) is used to generate a number of potential execution plans (which can be displayed in a 10053 trace), with the lowest cost being chosen to run the query. Now whilst the equivalent process isn&#8217;t really documented for the BI Server, what it appears to do is largely follow a rule-based approach with a small amount of statistics being used (or not used, as I&#8217;ll mention in a moment). In essence, the following sources of metadata information are consulted when creating the query plan for the BI Server:</p>
<ul>
<li>The presentation (subject area) layer to business model layer mapping rules;</li>
<li>The logical table sources for each of the business columns used in the request;</li>
<li>The dimension level mappings for each of the logical table sources;</li>
<li>The &#8220;Number of Elements at this Level&#8221; count for each dimension level (potentially the statistics bit, though anecdotally I&#8217;ve heard that these figures aren&#8217;t actually used by the BI Server);</li>
<li>Whether caching is enabled, and if so, whether the query can be found in the cache;</li>
<li>What physical features are enabled for the particular source database for each column (and whether they are relational, multi-dimensional, file, XML or whatever)</li>
<li>Specific rules for generating time-series queries, binning etc, and</li>
<li>Security settings and filters</li>
</ul>
<p>As far as I can tell, there are no indexes, no statistics (apart from the dimension level statistics mentioned above) and no hints; there is however query rewrite and aggregates, as the BI Server allows aggregate tables to be defined which are then mapped in to specific levels in a dimension hierarchy. Cleverly, the back-end data source doesn&#8217;t even have to be an SQL database, and can in fact be a multi-dimensional database such as Essbase, Oracle OLAP or Microsoft Analysis Services, with the multi-dimensional dataset that they return converted into a row-based dataset that can be joined to other data coming in from a more traditional relational database.</p>
<p><span style="font-size:14pt;"><strong>&#8220;A Day in the Life of a Query&#8221;<br />
</strong></span><br />
A good way of looking at what Oracle has termed &#8220;A day in the life of a query&#8221;, is to take a look at some slides from a presentation that Oracle used regularly around the time of the introduction of Oracle BI EE. I&#8217;ll go through it slide by slide and add some interpretation from myself.</p>
<p>1. A query comes in from Answers or any other ODBC query tool, asking for one or more columns from a subject area. Overall, the function within the BI Server that deals with this is called <strong>Intelligent Request Generation, </strong>marked in yellow in the diagram below.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis10.jpg" height="260" width="500" border="0" hspace="4" vspace="4" alt="Bis10" /></p>
<p>2. The query is then passed to the <strong>Logical Request Generation</strong> engine, marked in yellow in the diagram below. The request itself requires the Brand, Closed Revenue (ultimately held in the GL system), Service Requests (held in the CRM system) and Share of Revenue (a calculated, or derived, measure). As such it&#8217;s going to require multiple physical SQL queries and multi-pass calculations, all of which will be worked out by another part of the BI Server architecture, the Navigator.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis11.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis11" /></p>
<p>3. Once the logical request has been generated but before it&#8217;s passed off to the Navigator, a check is made (if this feature is enabled) as to whether the logical request can be found in the cache. <strong>Cache Services</strong> will do either a fast or a more comprehensive match of the incoming request against those stored in the query cache, and if found, return the results from there rather than have the BI Server run physical SQL against the business model&#8217;s data sources.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis12.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis12" /></p>
<p>For a more detailed look at what Cache Services does, the old Siebel Analytics Administration Tool documentation has a good flowchart that explains what goes on:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis13.jpg" height="452" width="450" border="0" hspace="4" vspace="4" alt="Bis13" /></p>
<p>The key bit is the Cache Hit step. In general, a cache hit will occur if the following conditions are met:</p>
<ul>
<li>Caching is enabled (CACHE=Y in the NQSConfig.INI file);</li>
<li>The WHERE clause in the logical SQL is semantically the same, or a logical subset of a cached statement;</li>
<li>All of the columns in the SELECT list have to exist in the cached query, or they must be able to be calculated from them;</li>
<li>It has equivalent join conditions, so that the resultant joined table of any incoming query has to be the same as (or a subset of) the cached results</li>
<li>If DISTINCT is used, the cached copy has to use this attribute as well</li>
<li>Aggregation levels have to be compatible, being either the same or more aggregated than the cached query</li>
<li>No further aggregation (for example, RANK) can be used in the incoming query</li>
<li>Any ORDER BY clause has to use columns that are also in the cached SELECT list</li>
</ul>
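<p>A few of these rules are simple enough to express as code. The Python sketch below is a deliberately simplified check that covers only some of the conditions above: it treats the WHERE clause as a set of filter strings that must match exactly (so the &#8220;logical subset&#8221; case isn&#8217;t handled), and it ignores join conditions, computed columns and aggregation levels altogether. The dictionary layout and function name are assumptions of mine; the real Cache Services logic is not documented at this level of detail.</p>

```python
def is_cache_hit(request, cached, caching_enabled=True):
    """Very rough cache-hit test covering a subset of the documented rules.

    Both `request` and `cached` are dicts with hypothetical keys:
      select   - list of column names in the SELECT list
      filters  - frozenset of WHERE-clause predicates, as strings
      distinct - True if SELECT DISTINCT was used
      order_by - list of column names in the ORDER BY clause
    """
    if not caching_enabled:
        return False
    # Simplification: require identical filters, not a logical subset.
    if request["filters"] != cached["filters"]:
        return False
    # Every requested column must exist in the cached SELECT list.
    if not set(request["select"]).issubset(cached["select"]):
        return False
    # If DISTINCT is used, the cached copy has to use it as well.
    if request["distinct"] and not cached["distinct"]:
        return False
    # ORDER BY columns must also appear in the cached SELECT list.
    return set(request["order_by"]).issubset(cached["select"])
```

<p>Even in this cut-down form you can see why small differences between two logical requests, such as an extra column or an added DISTINCT, are enough to turn a hit into a miss.</p>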
<p>In addition, there are two NQSConfig.INI parameters that I think were added in the last few releases (as I can&#8217;t find them mentioned in the Siebel Analytics documentation): USE_ADVANCED_HIT_DETECTION and MAX_SUBEXPR_SEARCH_DEPTH. The latter determines how many levels into an expression (for example, SUM(MAX(SIN(COS(TAN(ABS(TRUNC(PROFIT)))))))) the cache hit detector will go in trying to get a match, whilst the former turns on some additional cache hit searches that you might want to enable if caching is important but not otherwise happening. Unfortunately the docs don&#8217;t really expand on what these additional searches are or the performance impact that they can introduce, so if anyone has any more information on this, I&#8217;d be glad to hear.</p>
<p>4. If the cache can&#8217;t provide the answer to the request, the request then gets passed to the <strong>Navigator</strong>. The Navigator handles the logical request &#8220;decision tree&#8221; and determines how complex the request is, what data sources (logical table sources) need to be used, whether there are any aggregates that can be used, and overall what is the best way to satisfy the request, based on how you&#8217;ve set up the presentation, business model and mapping, and physical layers in your RPD.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis14.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis14" /></p>
<p>5. Within the Navigator, the <strong>Multi-Pass / Sub-Request Logic</strong> function analyzes the incoming request and works out the complexity of the query. It works out whether it requires multiple passes (for example, calculates the average of two aggregated measures), or whether the request is based on the results of another request (in other words, uses a sub-request). The BI Server then uses this information to work out the optimal way to gather the required data and do the calculations; in the example used in the slides, the revenue share calculation is based on the other two measures and is therefore considered &#8220;multi-pass&#8221;.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis15.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis15" /></p>
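<p>As a rough illustration of why a measure like revenue share needs more than one pass, the sketch below assumes that revenue share means each region&#8217;s revenue divided by the overall total; the slides don&#8217;t spell out the formula, so treat both that definition and the revenue_share function as assumptions. The point is simply that the second aggregation can only run once the first has finished.</p>

```python
from collections import defaultdict


def revenue_share(rows):
    """rows: iterable of (region, revenue) detail records.

    Hypothetical helper; assumes total revenue is non-zero.
    """
    # Pass 1: aggregate revenue to the requested grain (one row per region).
    by_region = defaultdict(float)
    for region, revenue in rows:
        by_region[region] += revenue
    # Pass 2: aggregate the result of pass 1 again to get the grand total.
    total = sum(by_region.values())
    # The derived measure is then calculated from the output of both passes.
    return {region: value / total for region, value in by_region.items()}
```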
<p>6. A measure used within the business model and mapping layer may be &#8220;fragmented&#8221;, which means that it is logically partitioned so that historic information, for example, comes from a data warehouse whilst current information comes from an OLTP application. The <strong>Fragment Optimization Engine</strong> within the Navigator sits between the incoming request and the Execution Engine and, where appropriate, transforms the base-level SQL into &#8220;fragmented&#8221; SQL against each of the data sources mapped into the fragmented measure. For more background information on fragmentation, check out <a href="http://www.rittmanmead.com/2007/06/19/obiee-data-modeling-tips-2-fragmentation/">this old blog post on the subject</a>.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis17.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis17" /></p>
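<p>The idea behind fragmentation can be sketched as a date-range lookup. The fragment definitions, source names and cut-over date below are all made up for the example; in the RPD the equivalent information lives in the fragmentation content of each logical table source.</p>

```python
from datetime import date

# Hypothetical fragments: each source covers a half-open date range [start, end).
FRAGMENTS = [
    {"source": "warehouse", "start": date(2000, 1, 1), "end": date(2010, 1, 1)},
    {"source": "oltp", "start": date(2010, 1, 1), "end": date(9999, 12, 31)},
]


def fragments_for(query_start, query_end):
    """Return (source, start, end) for every fragment overlapping [query_start, query_end)."""
    plan = []
    for fragment in FRAGMENTS:
        # Clip the requested range to the part this fragment can answer.
        start = max(fragment["start"], query_start)
        end = min(fragment["end"], query_end)
        if end > start:
            plan.append((fragment["source"], start, end))
    return plan
```

<p>A request spanning November 2009 to February 2010 would produce two entries in this plan, one per source, whereas a request for February 2010 alone would only touch the OLTP fragment.</p>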
<p>7. The final function within the Navigator is the <strong>Aggregate Navigator</strong>, which uses the logical table source mappings together (in theory) with the dimension level statistics to determine the most efficient table to fetch the data from (i.e. the table with the fewest records that can still fulfil the request).</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis16.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis16" /></p>
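<p>In its simplest form, that choice boils down to &#8220;of the sources that can answer at this level, take the smallest&#8221;. The sketch below assumes a flat list of sources with invented table names, levels and row counts; the real Aggregate Navigator works from the logical table source mappings and dimension level settings in the RPD.</p>

```python
# Hypothetical logical table sources: the levels each can answer, and its row count.
SOURCES = [
    {"name": "ITEMS", "levels": {"day", "month", "year"}, "rows": 5000000},
    {"name": "ITEMS_MONTH_AGG", "levels": {"month", "year"}, "rows": 60000},
    {"name": "ITEMS_YEAR_AGG", "levels": {"year"}, "rows": 5000},
]


def pick_source(requested_level, sources=SOURCES):
    """Return the name of the smallest source able to answer at requested_level."""
    candidates = [s for s in sources if requested_level in s["levels"]]
    if not candidates:
        raise LookupError("no source can answer at level: " + requested_level)
    # Fewest rows wins, on the assumption that fewer rows means a cheaper query.
    return min(candidates, key=lambda s: s["rows"])["name"]
```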
<p>8. The <strong>Optimized Query Rewrites</strong> function within the BI Server then takes the query plan generated by the Navigator and rewrites it to use the features of the underlying database engines, for example adding RANK() OVER () calculations if Oracle is being used (referred to as &#8220;function shipping&#8221;), or just getting the raw data and having the BI Server do the calculations afterwards if working with a database that doesn&#8217;t support analytic SQL functions. This part of the BI Server is also responsible for generating XML queries, or MDX queries for OLAP sources, which are then sent to the underlying physical databases, in parallel, so that they can retrieve their relevant data sets.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis18.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis18" /></p>
<p>9. Once the data is retrieved, the results combined together and any further calculations applied, the results are returned to the calling application via the ODBC interface, and also copied to the cache along with the logical SQL query if caching is enabled.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis21.jpg" height="264" width="500" border="0" hspace="4" vspace="4" alt="Bis21" /></p>
<p>The BI Server&#8217;s knowledge of what each source database can support, in terms of SQL functions, is determined by the contents of the DBFeatures.INI configuration file, which can in turn be overridden by the &#8220;Features&#8221; tab in the Database settings in the physical database model.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis19.jpg" height="523" width="500" border="0" hspace="4" vspace="4" alt="Bis19" /></p>
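<p>The function-shipping decision can be pictured as a lookup against a feature matrix. The sketch below is only loosely modelled on the idea behind DBFeatures.INI: the feature key, database names, table alias and both functions are invented for illustration and do not reflect the real file format or the SQL the BI Server generates.</p>

```python
# Hypothetical feature matrix; keys and values are invented for this example.
DB_FEATURES = {
    "oracle": {"RANK": True},
    "excel": {"RANK": False},
}


def plan_rank(database, column):
    """Return (sql_sent_to_database, work_left_for_the_bi_server)."""
    if DB_FEATURES.get(database, {}).get("RANK", False):
        # The database supports the function, so ship it in the physical SQL.
        sql = "select {0}, RANK() OVER (ORDER BY {0} DESC) as rnk from t".format(column)
        return sql, None
    # Otherwise fetch the raw column and leave the ranking to be done locally.
    return "select {0} from t".format(column), "rank"


def rank_locally(values):
    """Rank highest-first with gaps after ties (1, 1, 3, 4), as SQL RANK() does."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(value) + 1 for value in values]
```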
<p>I think I&#8217;ve also noticed that the way that time-series queries, for example, get resolved into physical SQL queries changes from release to release of OBIEE, as Oracle get better at generating efficient SQL queries to resolve complex calculations. It&#8217;s also the case that currently, for Essbase data sources, very few of the functions used by the BI Server get function-shipped to their equivalent MDX functions, though this is meant to be improving in the forthcoming 11g release (and in the meantime, you can use EVALUATE and EVALUATE_AGGR to call MDX functions directly).</p>
<p><span style="font-size:14pt;"><strong>Level 5 Logging, and Logical Execution Plans<br />
</strong></span><br />
You can see what goes on when a complex, multi-pass request that requires multiple data sources is sent through from Answers by looking at what gets logged in the NQQuery.log file with level 5 logging enabled. The query requests &#8220;quantity&#8221; information that is held in an Oracle database, &#8220;quotas&#8221; that come from an Excel spreadsheet, and &#8220;variance&#8221;, which is derived from quantity minus quotas. Both columns need to be aggregated before the variance calculation can take place, and you can see from the logs how the Navigator is used to resolve the query.</p>
<p>Starting off, this is the logical request coming through.</p>
<pre>
-------------------- Logical Request (before navigation):

RqList
    Times.Month Name as c1 GB,
    Quantity:[DAggr(Items.Quantity by [ Times.Month Name, Times.Month ID] )] as c2 GB,
    Quota:[DAggr(Items.Quota by [ Times.Month Name, Times.Month ID] )] as c3 GB,
    Quantity:[DAggr(Items.Quantity by [ Times.Month Name, Times.Month ID] )] - Quota:[DAggr(Items.Quota by [ Times.Month Name, Times.Month ID] )] as c4 GB,
    Times.Month ID as c5 GB
OrderBy: c5 asc
</pre>
<p>Then the Navigator breaks the query down, works out what sources, multi-pass calculations and aggregates can be used, and generates the logical query plan.</p>
<pre>
-------------------- Execution plan:

RqList &lt;&lt;993147&gt;&gt; [for database 0:0,0]
    D1.c1 as c1 [for database 0:0,0],
    D1.c2 as c2 [for database 3023:491167,44],
    D1.c3 as c3 [for database 0:0,0],
    D1.c4 as c4 [for database 0:0,0]
Child Nodes (RqJoinSpec): &lt;&lt;993160&gt;&gt; [for database 0:0,0]
    (
        RqList &lt;&lt;993129&gt;&gt; [for database 0:0,0]
            D1.c1 as c1 [for database 0:0,0],
            D1.c2 as c2 [for database 3023:491167,44],
            D1.c3 as c3 [for database 0:0,0],
            D1.c4 as c4 [for database 0:0,0],
            D1.c5 as c5 [for database 0:0,0]
        Child Nodes (RqJoinSpec): &lt;&lt;993144&gt;&gt; [for database 0:0,0]
            (
                RqBreakFilter &lt;&lt;993128&gt;&gt;[1,5] [for database 0:0,0]
                    RqList &lt;&lt;992997&gt;&gt; [for database 0:0,0]
                        case  when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end  as c1 GB [for database 0:0,0],
                        D903.c3 as c2 GB [for database 3023:491167,44],
                        D903.c4 as c3 GB [for database 0:0,0],
                        D903.c3 - D903.c4 as c4 GB [for database 0:0,0],
                        case  when D903.c5 is not null then D903.c5 when D903.c6 is not null then D903.c6 end  as c5 GB [for database 0:0,0]
                    Child Nodes (RqJoinSpec): &lt;&lt;993162&gt;&gt; [for database 0:0,0]
                        (
                            RqList &lt;&lt;993219&gt;&gt; [for database 0:0,0]
                                D902.c1 as c1 [for database 0:0,0],
                                D901.c1 as c2 [for database 3023:491167,44],
                                D901.c2 as c3 GB [for database 3023:491167,44],
                                D902.c2 as c4 GB [for database 0:0,0],
                                D902.c3 as c5 [for database 0:0,0],
                                D901.c3 as c6 [for database 3023:491167,44]
                            Child Nodes (RqJoinSpec): &lt;&lt;993222&gt;&gt; [for database 0:0,0]

                                    (
                                        RqList &lt;&lt;993168&gt;&gt; [for database 3023:491167:ORCL,44]
                                            D1.c2 as c1 [for database 3023:491167,44],
                                            D1.c1 as c2 GB [for database 3023:491167,44],
                                            D1.c3 as c3 [for database 3023:491167,44]
                                        Child Nodes (RqJoinSpec): &lt;&lt;993171&gt;&gt; [for database 3023:491167:ORCL,44]
                                            (
                                                RqBreakFilter &lt;&lt;993051&gt;&gt;[2] [for database 3023:491167:ORCL,44]
                                                    RqList &lt;&lt;993263&gt;&gt; [for database 3023:491167:ORCL,44]
                                                        sum(ITEMS.QUANTITY by [ TIMES.MONTH_MON_YYYY] ) as c1 [for database 3023:491167,44],
                                                        TIMES.MONTH_MON_YYYY as c2 [for database 3023:491167,44],
                                                        TIMES.MONTH_YYYYMM as c3 [for database 3023:491167,44]
                                                    Child Nodes (RqJoinSpec): &lt;&lt;993047&gt;&gt; [for database 3023:491167:ORCL,44]
                                                        TIMES T492004
                                                        ITEMS T491980
                                                        ORDERS T491989
                                                    DetailFilter: ITEMS.ORDID = ORDERS.ORDID and ORDERS.ORDERDATE = TIMES.DAY_ID [for database 0:0]
                                                    GroupBy: [ TIMES.MONTH_MON_YYYY, TIMES.MONTH_YYYYMM]  [for database 3023:491167,44]
                                            ) as D1
                                        OrderBy: c1 asc [for database 3023:491167,44]
                                    ) as D901 FullOuterStitchJoin &lt;&lt;993122&gt;&gt; On D901.c1 =NullsEqual D902.c1; actual join vectors:  [ 0 ] =  [ 0 ]

                                    (
                                        RqList &lt;&lt;993192&gt;&gt; [for database 0:0,0]
                                            D2.c2 as c1 [for database 0:0,0],
                                            D2.c1 as c2 GB [for database 0:0,0],
                                            D2.c3 as c3 [for database 0:0,0]
                                        Child Nodes (RqJoinSpec): &lt;&lt;993195&gt;&gt; [for database 0:0,0]
                                            (
                                                RqBreakFilter &lt;&lt;993093&gt;&gt;[2] [for database 0:0,0]
                                                    RqList &lt;&lt;993319&gt;&gt; [for database 0:0,0]
                                                        D1.c1 as c1 [for database 0:0,0],
                                                        D1.c2 as c2 [for database 0:0,0],
                                                        D1.c3 as c3 [for database 0:0,0]
                                                    Child Nodes (RqJoinSpec): &lt;&lt;993334&gt;&gt; [for database 0:0,0]
                                                        (
                                                            RqList &lt;&lt;993278&gt;&gt; [for database 3023:496360:Quotas,2]
                                                                sum(QUANTITY_QUOTAS.QUOTA by [ MONTHS.MONTH_MON_YYYY] ) as c1 [for database 3023:496360,2],
                                                                MONTHS.MONTH_MON_YYYY as c2 [for database 3023:496360,2],
                                                                MONTHS.MONTH_YYYYMM as c3 [for database 3023:496360,2]
                                                            Child Nodes (RqJoinSpec): &lt;&lt;993089&gt;&gt; [for database 3023:496360:Quotas,2]
                                                                MONTHS T496365
                                                                QUANTITY_QUOTAS T496369
                                                            DetailFilter: MONTHS.MONTH_YYYYMM = QUANTITY_QUOTAS.MONTH_YYYYMM [for database 0:0]
                                                            GroupBy: [ MONTHS.MONTH_YYYYMM, MONTHS.MONTH_MON_YYYY]  [for database 3023:496360,2]
                                                        ) as D1
                                                    OrderBy: c2 [for database 0:0,0]
                                            ) as D2
                                        OrderBy: c1 asc [for database 0:0,0]
                                    ) as D902
                        ) as D903
                    OrderBy: c1, c5 [for database 0:0,0]
            ) as D1
        OrderBy: c5 asc [for database 0:0,0]
    ) as D1
</pre>
<p>Notice the &#8220;FullOuterStitchJoin&#8221; in the middle of the plan? We&#8217;ll look into this more in the next posting in this series. For now though, this logical query plan is then passed to the Optimized Query Rewrites and Execution Engine, which in this case generate two physical SQL statements. The results of these are passed back to the BI Server and &#8220;stitch joined&#8221; there, before the post-aggregation calculation required for the variance measure is performed.</p>
<pre>-------------------- Sending query to database named ORCL (id: &lt;&lt;993168&gt;&gt;):

select D1.c2 as c1,
     D1.c1 as c2,
     D1.c3 as c3
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3
          from
               (select sum(T491980.QUANTITY) as c1,
                         T492004.MONTH_MON_YYYY as c2,
                         T492004.MONTH_YYYYMM as c3,
                         ROW_NUMBER() OVER (PARTITION BY T492004.MONTH_MON_YYYY ORDER BY T492004.MONTH_MON_YYYY ASC) as c4
                    from
                         CUST_ORDER_HISTORY.TIMES T492004,
                         CUST_ORDER_HISTORY.ITEMS T491980,
                         CUST_ORDER_HISTORY.ORDERS T491989
                    where  ( T491980.ORDID = T491989.ORDID and T491989.ORDERDATE = T492004.DAY_ID )
                    group by T492004.MONTH_MON_YYYY, T492004.MONTH_YYYYMM
               ) D1
          where  ( D1.c4 = 1 )
     ) D1
order by c1

+++Administrator:2b0000:2b000e:----2010/02/23 16:04:42

-------------------- Sending query to database named Quotas (id: &lt;&lt;993278&gt;&gt;):
select sum(T496369."QUOTA") as c1,
     T496365."MONTH_MON_YYYY" as c2,
     T496365."MONTH_YYYYMM" as c3
from
     "MONTHS" T496365,
     "QUANTITY_QUOTAS" T496369
where  ( T496365."MONTH_YYYYMM" = T496369."MONTH_YYYYMM" )
group by T496365."MONTH_YYYYMM", T496365."MONTH_MON_YYYY"
</pre>
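<p>What the BI Server then does with the two result sets can be sketched roughly as follows. This is an illustration of a full outer join on month name followed by the variance calculation, using a dictionary lookup rather than whatever algorithm the BI Server really uses; the function name and the sample month values in the usage note are made up.</p>

```python
def full_outer_stitch_join(quantities, quotas):
    """Join two result sets of (month_name, value, month_id) on month_name.

    Returns rows of (month_name, quantity, quota, variance, month_id),
    ordered by month_id. Hypothetical helper, not BI Server code.
    """
    left = {name: (qty, month_id) for name, qty, month_id in quantities}
    right = {name: (quota, month_id) for name, quota, month_id in quotas}
    rows = []
    # Full outer join: keep every month that appears on either side.
    for name in set(left) | set(right):
        qty, left_id = left.get(name, (None, None))
        quota, right_id = right.get(name, (None, None))
        # Mirrors the "case when ... is not null" coalescing seen in the plan.
        month_id = left_id if left_id is not None else right_id
        # As in SQL, the difference is null when either operand is missing.
        variance = qty - quota if qty is not None and quota is not None else None
        rows.append((name, qty, quota, variance, month_id))
    return sorted(rows, key=lambda row: row[4])
```

<p>Given quantities for JAN-2010 and FEB-2010 and quotas for JAN-2010 and MAR-2010, this returns three rows, with a variance only for January, because that is the only month present in both sources.</p>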
<p><span style="font-size:14pt;"><strong><br />
Memory Usage and Paging Files</strong></span></p>
<p>If you follow the BI Server at the process level during these steps, you&#8217;ll find that memory usage is largely determined at startup time by the size and complexity of the RPD that&#8217;s attached online, and then goes up by around 50MB when the first query is executed. After that, memory usage tends to go up as more concurrent sessions are run, and also when cross-database joins are performed. You&#8217;ll also find TMP files being created in the $ORACLEBIDATA/tmp directory, which are used by the BI Server to hold temporary data as it pages out from memory, again typically when cross-database joins are used but also when it needs to perform additional aggregations that can&#8217;t be put into the physical SQL query.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis20.jpg" height="478" width="500" border="0" hspace="4" vspace="4" alt="Bis20" /></p>
<p>These files can get fairly big (up to 2GB in some cases) and can be created even when a single data source is used, typically for grouping data or as we&#8217;ll see in the next posting, when joining data across fact tables. They are usually cleared down when the BI Server and Presentation Server are restarted, but bear in mind when creating complex calculations that they can get pretty I/O intensive on the BI Server hardware.</p>
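<p>If you want to keep an eye on how much space these files are taking up, a few lines of Python will do it. This is just a convenience sketch: the tmp_usage function is hypothetical, and it assumes the temporary files sit directly under the directory you pass in.</p>

```python
import os


def tmp_usage(directory):
    """Return (file_count, total_bytes) for regular files directly under directory."""
    count = 0
    total = 0
    for entry in os.scandir(directory):
        # Skip sub-directories; only regular files are counted.
        if entry.is_file():
            count += 1
            total += entry.stat().st_size
    return count, total
```

<p>You would call it with the path to the tmp directory under your own Oracle BI data directory, for example while a cross-database query is running.</p>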
<p>So that&#8217;s the basics of how queries are processed by the BI Server, and how the various BI Server components and engines handle the query as it goes through each stage. Again, if anyone knows any more, please add it as a comment, but for now that&#8217;s it and I&#8217;ll be back in a few days with part 3, on BI Server In-Memory Joins.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/01/inside-the-oracle-bi-server-part-2-how-is-a-query-processed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inside the Oracle BI Server Part 1 : The BI Server Architecture</title>
		<link>http://www.rittmanmead.com/2010/02/25/inside-the-oracle-bi-server-part-1-the-bi-server-architecture/</link>
		<comments>http://www.rittmanmead.com/2010/02/25/inside-the-oracle-bi-server-part-1-the-bi-server-architecture/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 22:11:30 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/02/25/inside-the-oracle-bi-server-part-1-the-bi-server-architecture/</guid>
		<description><![CDATA[The session that I&#8217;m giving at the BI Forum in Brighton in May is entitled &#8220;Inside the Oracle BI Server&#8221;, and I&#8217;m aiming to take a closer look at the architecture and functionality of this key OBIEE component. We&#8217;re all fairly aware of what the BI Server does at a high level, but I thought [...]]]></description>
			<content:encoded><![CDATA[<p>The session that I&#8217;m giving at the <a href="http://www.rittmanmead.com/biforum2010">BI Forum in Brighton in May</a> is entitled &#8220;Inside the Oracle BI Server&#8221;, and I&#8217;m aiming to take a closer look at the architecture and functionality of this key OBIEE component. We&#8217;re all fairly aware of what the BI Server does at a high level, but I thought it&#8217;d be interesting to take a closer look at what the BI Server does, particularly when it parses queries and joins datasets together.</p>
<p>At a very high level, the main function of the BI Server is to process inbound SQL requests against a virtual database model, build and execute one or more physical database queries, process the data and then return it to users. The BI Server is one part of the Oracle BI Enterprise Edition Plus product family, and presents itself to query tools as one or more databases in a simple relational (star schema) model, that can then point to a much more complex set of relational, multidimensional, file and XML data sources (and in 11g, ADF objects).</p>
<p>Taking the standard OBIEE architecture diagram, the BI Server sits in the middle of the OBIEE set of servers and provides the query capability, security, interfaces to data sources and calculation logic for OBIEE (all of this is based on the current, 10g set of products).</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis1-1.jpg" height="321" width="500" border="0" hspace="4" vspace="4" alt="Bis1-1" /></p>
<p>The BI Server communicates with the BI Presentation Server via ODBC, and then connects out to the various supported data sources through ODBC, OCI, XML/A, the Essbase Client API and other native protocols. A key function of the BI Server is to create a three-layer metadata model, stored in a file-based repository along with security settings, database passwords, BI Server settings, startup macros and variable definitions.</p>
<p><span style="font-size:14pt;"><strong>The BI Server Logical Components<br />
</strong></span><br />
Taking a look specifically at the BI Server, it has a number of logical components.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis2.jpg" height="542" width="500" border="0" hspace="4" vspace="4" alt="Bis2" /></p>
<ul>
<li>The <strong>ODBC interface</strong>, that is used by Oracle BI Answers and other third-party tools to pass requests to the BI Server, and to receive the output from queries;</li>
<li>The <strong>Logical Business Model</strong>, the three-layer metadata model that describes the data available for queries;</li>
<li>The <strong>Intelligent Request Generator</strong>, a module responsible for taking the incoming queries and turning them into physical queries against the connected data source, which is made up of several sub-components including:</li>
<li>The <strong>Navigator</strong>, probably the most important part of the BI Server, and the part that takes the incoming query, compares it against cached answers, navigates the logical model and generates the physical queries that will best return the data required for the query</li>
<li>Within the Navigator, there are modules for determining whether <strong>multiple physical queries</strong> are needed, whether stored <strong>aggregates</strong> can be used, and whether <strong>fragmented</strong> data sources can be used for partitioned measures;</li>
<li>An <strong>Optimized Query Rewrite</strong> engine for handling aggregate navigation and fragments, and for translating to the correct physical SQL dialect, and</li>
<li>An <strong>Execution Engine</strong> for firing off the queries to the relational, multi-dimensional, file and XML sources required to satisfy the query.</li>
<li><strong>Cache Services</strong> stores the results of previously run queries, matches incoming SQL against that used before and returns data from the cache rather than making the BI Server query the underlying databases again</li>
</ul>
<p>In addition, various supporting technologies, modules and services provide the infrastructure for the BI Server, including:</p>
<ul>
<li><strong>Data Source Adapters</strong> for Oracle, ODBC, SQL Server, DB/2, Teradata, file, XML and other sources;</li>
<li><strong>System and Performance Monitoring</strong> through JMX counters and other technologies;</li>
<li><strong>Security Services</strong> for setting up users and groups in the RPD, filters, subject area security, links to outside LDAP servers and custom authenticators;</li>
<li><strong>Query Governance</strong>, for placing limits on numbers of rows returned and length of query execution for users and groups;</li>
<li><strong>Load Balancing</strong>, and <strong>Session Management</strong></li>
</ul>
<p><span style="font-size:14pt;"><strong>Taking a Look at the BI Server Process<br />
</strong></span><br />
Now whilst the BI Server has many characteristics of a database, compared to running Oracle on Unix which exposes many of its components (SMON, PMON, MMON, LGWR etc) as separate processes, the BI Server is just a single executable that runs under the name NQSServer.exe (or just nqsserver under Unix). The screenshot below is a view of this service (along with sawserver.exe, the BI Presentation Server) as shown in the Windows Task Manager utility.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis3.jpg" height="465" width="426" border="0" hspace="4" vspace="4" alt="Bis3" /></p>
<p>We&#8217;ll get on to memory usage in a future posting in this series, but in general the amount of memory taken up by the BI Server is initially determined by the size and complexity of the repository (RPD) that is running online, with further chunks taken up by concurrent sessions and then intermittent spikes of memory when in-memory (stitch) joins take place between data sources. The BI Server creates TMP (temporary) files in the $ORACLEBIDATA/tmp directory as data is further totalled and calculated, and as cross-database joins are paged to file.</p>
<p>If you take a closer look at the NQSServer.exe process using a tool such as Microsoft&#8217;s Process Explorer utility, you can see that it&#8217;s a multi-threaded server application:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis4.jpg" height="522" width="500" border="0" hspace="4" vspace="4" alt="Bis4" /></p>
<p>You can see that the BI Server is a C++ application that uses the Microsoft Visual C++ runtime, whilst taking a look at one of the running threads shows the various DLLs that are being used:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis5.jpg" height="402" width="461" border="0" hspace="4" vspace="4" alt="Bis5" /></p>
<p><span style="font-size:14pt;"><strong>Another Conceptual View of the BI Server<br />
</strong></span><br />
Another conceptual view of the BI Server architecture can be found in the old Siebel Analytics Administration Tool documentation, which shows the BI Server (or the Siebel Analytics Server as it was called then) having several layered components:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/bis6.jpg" height="489" width="500" border="0" hspace="4" vspace="4" alt="Bis6" /></p>
<ul>
<li>The <strong>Security Model</strong>, presumably the users and groups in the RPD, plus the filters and subject area security in the repository;</li>
<li>The <strong>Business Model</strong>, the three-layer metadata model;</li>
<li><strong>Aggregate Navigation</strong>, for rewriting queries to use mapped in aggregate tables;</li>
<li><strong>SQL Generation Engine</strong> and <strong>Multi-database Query Processing</strong>, presumably the bit that takes the database capabilities matrix and generates the correct physical SQL for the various data sources;</li>
<li>The <strong>Computation Engine</strong>, for performing in-memory stitch joins, post-aggregation filters and functions, and sorting,</li>
<li>Query caching,</li>
<li>The <strong>Metadata Repositories</strong> that can be connected to the BI Server (with one marked as &#8220;default&#8221;), and</li>
<li>The various <strong>data sources</strong>, such as Oracle, DB/2, Informix and SQL Server</li>
</ul>
<p><span style="font-size:14pt;"><strong>Conclusions<br />
</strong></span><br />
So the BI Server has some of the characteristics of a BI tool (metadata model, connectivity to data sources, security etc) and some of a regular relational database (query processing, optimization, rewrite, aggregate navigation etc) but without OLTP database features such as transactions. Its primary job is to process incoming requests against this metadata model and translate them into the physical queries required to get the data from the underlying data sources, acting more as a query broker with no data being stored locally except that held in the cache. If you&#8217;re interested in a bit more history of the BI Server, including its origins as a search engine called the nQuire Query Server, take a look at <a href="http://www.rittmanmead.com/2007/09/18/a-potted-history-of-oracle-bi-suite-enterprise-edition/">this old blog post on the origins of Siebel Analytics and OBIEE</a> where I&#8217;ve written up some of the background to the OBIEE product set.</p>
<p>The BI Server has one main configuration file, held at $ORACLEBI/server/config/NQSConfig.INI, which contains parameter settings in plain text. The full set of possible parameters is documented in the Server Administrators&#8217; Guide within the Oracle docs, and this method of holding parameter settings looks like it&#8217;ll be carried across to 11g, although the settings themselves will be maintained through Enterprise Manager rather than the Administration tool as is the case with 10g and earlier.</p>
<p>For now though, that&#8217;s it for architecture and components and in the next posting, I&#8217;ll be looking at how the BI Server, and in particular the Navigator, handles incoming requests.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/25/inside-the-oracle-bi-server-part-1-the-bi-server-architecture/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>OWB, RMOUG and ODTUG in Denver</title>
		<link>http://www.rittmanmead.com/2010/02/19/owb-rmoug-and-odtug-in-denver/</link>
		<comments>http://www.rittmanmead.com/2010/02/19/owb-rmoug-and-odtug-in-denver/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 04:21:27 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4365</guid>
		<description><![CDATA[I&#8217;m writing this on the evening following the Rocky Mountains Oracle User Group Training Days Conference in Denver, Colorado, where Stewart Bryson and I delivered our session on OWB11gR2 New Features for DBAs and Developers. We were both pleased with the turnout, managed to deliver five demos across the new code template functionality, and took [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m writing this on the evening following the <a href="http://www.rmoug.org/training.htm">Rocky Mountains Oracle User Group Training Days</a> Conference in Denver, Colorado, where Stewart Bryson and I delivered our session on OWB11gR2 New Features for DBAs and Developers. We were both pleased with the turnout, managed to deliver five demos across the new code template functionality, and took some good questions from the audience at the end. Thanks to Peggy King and all the others at RMOUG for inviting us over, and here&#8217;s Stewart, up on stage and ready to go, just before starting our session:</p>
<p align="center"><img src="http://farm5.static.flickr.com/4034/4369723824_b64bdbc2a5_d.jpg" alt="" /></p>
<p>This was our first time at <a href="http://www.rmoug.org/index.html">RMOUG</a>, probably the biggest and best regional user group conference in the States and very similar to our UK Oracle User Group conference in terms of speakers, technical coverage and friendliness of the speakers. This year RMOUG took the opportunity to broaden their coverage into areas such as Hyperion, SOA and business intelligence, so it was also good to see people such as Edward Roske here delivering a number of Essbase and Hyperion sessions. We took our usual &#8220;OWB11gR2 New Features&#8221; presentation and gave it a DBA and database developer angle, and whilst most of the audience were new to OWB hopefully the new ODI-derived functionality still made a bit of sense. If you&#8217;re interested, the paper and presentation can be downloaded from <a href="http://www.rittmanmead.com/articles">our articles page</a>.</p>
<p>Whilst Stewart has now gone back to Atlanta, I&#8217;ve stayed on as I&#8217;m taking part in my first ever <a href="http://www.odtug.com">ODTUG</a> Board Meeting. I got elected to the board late last year and whilst I know most of the board members already (having been the BI&#038;DW SIG co-chair for the last few years), it&#8217;ll be very interesting to take part and contribute to some of the user group decisions, and to see how we can improve and build on the BI content going forward. Certainly, the BI stream agenda for this year&#8217;s <a href="http://www.odtugkaleidoscope.com/oraclebusinessintelligence.html">Kaleidoscope</a>, due to be held in Washington D.C. in June, is about the best BI agenda I&#8217;ve seen for a long time (including lots of sessions by the Oracle PMs responsible for the 11g release), and I&#8217;m looking forward to working with the board to increase this coverage in the future. So for me, it&#8217;s another three days&#8217; work going into the weekend, then I fly back to the UK on Sunday ready to deliver a course on Tuesday. Busy times, but all good fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/19/owb-rmoug-and-odtug-in-denver/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing the Rittman Mead BI Forum 2010, May 19th &#8211; 21st 2010, Hotel Seattle Brighton UK</title>
		<link>http://www.rittmanmead.com/2010/02/15/announcing-the-rittman-mead-bi-forum-2010-may-19th-21st-2010-hotel-seattle-brighton-uk/</link>
		<comments>http://www.rittmanmead.com/2010/02/15/announcing-the-rittman-mead-bi-forum-2010-may-19th-21st-2010-hotel-seattle-brighton-uk/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 07:55:24 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4352</guid>
		<description><![CDATA[I&#8217;m very pleased to announce that registration is now open for the Rittman Mead BI Forum 2010, to be held at the Hotel Seattle, Brighton on May 19th &#8211; 21st 2010.
Like last year&#8217;s event, the 2010 BI Forum is focused on Oracle BI Enterprise Edition and the technologies that support it, such as Essbase, ODI [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m very pleased to announce that registration is now open for the <a href="http://www.rittmanmead.com/biforum2010">Rittman Mead BI Forum 2010</a>, to be held at the Hotel Seattle, Brighton on May 19th &#8211; 21st 2010.</p>
<p>Like last year&#8217;s event, the 2010 BI Forum is focused on Oracle BI Enterprise Edition and the technologies that support it, such as Essbase, ODI EE and the Oracle Database. We have a number of the world&#8217;s top speakers on the subject, including names such as</p>
<ul>
<li>Gerard Braat (Oracle Corporation), on &#8220;Complex Modelling with OBIEE 11g&#8221;</li>
<li>Antony Heljula (Peak Indicators), on &#8220;BI EE Architectures and Sizing&#8221;</li>
<li>Venkatakrishnan J (Rittman Mead), on &#8220;Fusion Middleware 11g &#8211; ADF Business Components&#8221;</li>
<li>Craig Stewart (Oracle Corporation), on &#8220;ODI 11g: The New Generation of Data Integration&#8221;, and</li>
<li>John Minkjan (Ciber), on &#8220;OBIEE Customizations: When You Are In The Kitchen, Learn to Cook&#8221;</li>
</ul>
<p>For a full listing of all the sessions, check out the event website at <a href="http://www.rittmanmead.com/biforum2010">http://www.rittmanmead.com/biforum2010</a></p>
<p>As well as general sessions on the 20th and 21st May, we also have a keynote by Phil Bates (OBIEE Architect, Oracle Corporation) at the event reception on the evening of the 19th May, and a special one-day OBIEE Masterclass earlier in the day by none other than <a href="http://kpipartners.blogspot.com">Kurt Wolff</a>, one of the original nQuire team and an expert on repository modeling and dashboard design.</p>
<p>The pricing for the event is very reasonable and we&#8217;ve also subsidized the cost of two nights in the event hotel, so that we can all stay in the same place over the three days. There&#8217;s also the opportunity to stay for an extra night before and after the event, if you want to arrive the night before for Kurt&#8217;s masterclass, or your flight back doesn&#8217;t leave until the Saturday.</p>
<p>As with last year, we&#8217;ve taken a decision to limit the number of attendees to 50, to maximize the opportunities for networking and to allow us to keep the focus on intermediate-to-advanced OBIEE topics. Last year&#8217;s attendees have already been able to register for the past week, but we&#8217;re now opening up registrations publicly and there are around 25 places left. If you are interested in attending, check out the event website and <a href="www.regonline.com/biforum2010">register here</a>, and hopefully we&#8217;ll see you in Brighton in May!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/15/announcing-the-rittman-mead-bi-forum-2010-may-19th-21st-2010-hotel-seattle-brighton-uk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The BI Survey 9 &#8211; Fieldwork Closes on Sunday</title>
		<link>http://www.rittmanmead.com/2010/01/27/the-bi-survey-9-fieldwork-closes-on-sunday/</link>
		<comments>http://www.rittmanmead.com/2010/01/27/the-bi-survey-9-fieldwork-closes-on-sunday/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 10:58:38 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4190</guid>
		<description><![CDATA[[Fieldwork for the BI Survey 9, an independent survey of BI customers and partners, closes on Sunday 31st January. It's important that we get a suitable number of Essbase, OBIEE, OLAP and Discoverer users completing the survey, so please visit the site and fill it in if you've not done so already. Thanks [...]]]></description>
			<content:encoded><![CDATA[<p><em>[Fieldwork for the BI Survey 9, an independent survey of BI customers and partners, closes on Sunday 31st January. It's important that we get a suitable number of Essbase, OBIEE, OLAP and Discoverer users completing the survey, so please visit the site and fill it in if you've not done so already. Thanks - MR]</em></p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2009/11/dyna_banner_bisurvey9-1.jpg" height="75" width="500" border="1" hspace="4" vspace="4" alt="Dyna Banner Bisurvey9-1" /></p>
<p><strong>&#8220;The BI Survey 9: The Customer Verdict</strong></p>
<p>We would very much welcome your participation in &#8216;The BI Survey 9: The Customer Verdict&#8217;, the world&#8217;s largest survey of business intelligence (BI) and performance management (PM) users (formerly known as The OLAP Survey).</p>
<p>As a participant, you will:</p>
<ul>
<li>Receive a summary of the results from the full survey</li>
<li>Be entered into a draw to win one of ten $50 Amazon vouchers</li>
<li>Ensure that your experiences are included in the final analyses</li>
</ul>
<p>To take part in the survey on-line, visit: <a href="http://digiumenterprise.com/answer?link=249-KP9DYABR">http://digiumenterprise.com/answer?link=270-5J9MMB9M<br />
</a><br />
BARC&#8217;s annual survey obtains input from a large number of organizations in order to better understand their buying decisions, the implementation cycle and the business benefits achieved.</p>
<p>Both business and technical users, as well as vendors and consultants, are welcome to participate. If you are answering as a consultant, please answer the questions (including the demographic questions) from your client&#8217;s perspective; we will ask you separately about your own firm.</p>
<p>The BI Survey has always adopted a vendor-independent stance. While vendors assist by inviting users to participate in the Survey, Business Application Research Center (BARC) &#8211; the publisher &#8211; does not accept vendor sponsorship of the Survey, and the results are analyzed and published without any vendor involvement.</p>
<p>You will be able to answer questions on your usage of a BI product from any vendor. Your answers will only be used anonymously, and your personal details will never be passed on to vendors or other third parties.</p>
<p>The survey should take about 15-20 minutes to complete.</p>
<p>* BARC (Business Application Research Center) is a leading independent software industry analyst specializing in Data Management and Business Intelligence. For more information on BARC please visit <a href="http://www.barc.de/index.php?id=6&#038;L=1">The BARC website</a>, <a href="http://www.bi-survey.com">www.bi-survey.com</a> and <a href="http://www.bi-verdict.com">www.BI-Verdict.com</a>&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/01/27/the-bi-survey-9-fieldwork-closes-on-sunday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BI Forum 2010 Abstract Voting Now Open</title>
		<link>http://www.rittmanmead.com/2010/01/25/bi-forum-2010-abstract-voting-now-open/</link>
		<comments>http://www.rittmanmead.com/2010/01/25/bi-forum-2010-abstract-voting-now-open/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:18:34 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4184</guid>
		<description><![CDATA[If you&#8217;re considering coming along to the BI Forum 2010 we&#8217;ll be holding in Brighton in May 2010, we&#8217;d be interested in hearing your opinion on what sessions we&#8217;ll be running. The call for papers is now closed and we have 27 potential sessions, from speakers around the world. The content covers OBIEE, Essbase, ODI [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re considering coming along to the BI Forum 2010 we&#8217;ll be holding in Brighton in May 2010, we&#8217;d be interested in hearing your opinion on what sessions we&#8217;ll be running. The call for papers is now closed and we have 27 potential sessions, from speakers around the world. The content covers OBIEE, Essbase, ODI and OWB, with content aimed at experienced developers, and the abstract voting form can be found at the following address:</p>
<p><a href="http://www.zoomerang.com/Survey/?p=WEB22A5U9QD93G">http://www.zoomerang.com/Survey/?p=WEB22A5U9QD93G</a></p>
<p><strong>UPDATE:</strong> We missed one off &#8211; sorry:</p>
<p><a href="http://www.zoomerang.com/Survey/?p=WEB22A63S9HTWZ">http://www.zoomerang.com/Survey/?p=WEB22A63S9HTWZ</a></p>
<p>If you&#8217;re one of the potential speakers, please don&#8217;t vote for your own paper (this keeps scoring consistent across all speakers). We&#8217;ll keep the vote open for seven days from now, and contact the successful speakers at the start of February. Once all is in place we&#8217;ll then open the event for registration towards the middle of February. Thanks again to everyone who has shown interest, and to the potential speakers who have submitted some excellent abstracts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/01/25/bi-forum-2010-abstract-voting-now-open/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Rules and Error Handling in Warehouse Builder 10gR2/11gR2</title>
		<link>http://www.rittmanmead.com/2010/01/22/data-rules-and-error-handling-in-warehouse-builder-10gr211gr2/</link>
		<comments>http://www.rittmanmead.com/2010/01/22/data-rules-and-error-handling-in-warehouse-builder-10gr211gr2/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 09:12:59 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/01/22/data-rules-and-error-handling-in-warehouse-builder-10gr211gr2/</guid>
		<description><![CDATA[One of the less well-known features in recent releases of Warehouse Builder is Data Rules, and how they can help you gracefully handle consistency issues in your data. Data Rules act as virtual constraints on your data and add error handling tables to your mappings to allow you to divert off, or reprocess data that [...]]]></description>
			<content:encoded><![CDATA[<p>One of the less well-known features in recent releases of Warehouse Builder is Data Rules, and how they can help you gracefully handle consistency issues in your data. Data Rules act as virtual constraints on your data and add error handling tables to your mappings to allow you to divert off, or reprocess data that fails one or more consistency checks. Let&#8217;s take a look at an example to see how they work.</p>
<p>To create an example, we have a table called CUSTOMERS_TGT that has a column called GENDER. We want to enforce a data rule that says this column cannot be null. However we want our Warehouse Builder mapping to catch rows coming in that have null in this column, and replace the nulls with an &#8220;Unknown&#8221; value. To start this process we open up the target table in the Data Object Editor (I&#8217;m using version 11gR1 of Warehouse Builder, but the feature works the same in 10gR2 and 11gR2), click on the <strong>Data Rules</strong> tab at the bottom, and click on the <strong>Apply Rule</strong> button.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq1-1.jpg" height="404" width="500" border="0" hspace="4" vspace="4" alt="Dq1-1" /></p>
<p>I am then presented with the first page of a wizard, and then I&#8217;m shown a list of pre-defined data quality checks that I can select from. I select the <strong>NOT NULL</strong> check and press <strong>Next</strong> to continue. I then name the data quality check and am asked which table column the check should be &#8220;bound&#8221; to. I select the <strong>GENDER</strong> column and press <strong>Next</strong> to continue.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq3.jpg" height="400" width="500" border="0" hspace="4" vspace="4" alt="Dq3" /></p>
<p>Once the wizard is complete, my table now has an entry under Data Rules, with this Not Null rule shown as applying and bound to the GENDER column.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq4.jpg" height="406" width="500" border="0" hspace="4" vspace="4" alt="Dq4" /></p>
<p>I then use the Control Center Manager to redeploy this table, not because the new data rule has caused a physical NOT NULL constraint to be added to the target table (all of these constraints are &#8220;virtual&#8221; and handled within the Warehouse Builder mapping), but because of a &#8220;Shadow Table&#8221; that needs to be deployed alongside the target table to handle the errors. You don&#8217;t see this table listed in the set of tables within the OWB database module, it only appears as a script when you go to redeploy the table, but it&#8217;s important you deploy this (or create your own of the same definition) otherwise the mappings later on will fail.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq9.jpg" height="259" width="465" border="0" hspace="4" vspace="4" alt="Dq9" /></p>
<p>Note also that your target table shouldn&#8217;t have a real NOT NULL constraint on it, as this can cause your mapping to fail even when error handling is enabled in the mapping: the real constraint gets in there first, before your mapping can gracefully handle the error. The target table should however have a primary key, as this is used later on when moving incoming rows around that fail the error check.</p>
<p>Over to the mapping editor now, where I&#8217;ve got a simple mapping that loads from a source table into this target table. Note that my target table, the one above, has some extra rows in the table operator now due to the data rule that I&#8217;ve added to it.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq6.jpg" height="298" width="500" border="0" hspace="4" vspace="4" alt="Dq6" /></p>
<p>As you can see, the target table has some extra rows that are used to hold any incoming rows that fail data rule checks, with the ERR$$$ columns holding the reason for the error. Before the error handling will take place, though, you need to tell OWB, via the target table properties editor, to move the errors to the error handling table rather than just ignore or report on them.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq7.jpg" height="219" width="334" border="0" hspace="4" vspace="4" alt="Dq7" /></p>
<p>Note also the Error Table section of the table properties, which lets you substitute your own, pre-built error handling table if you want to, and to fine-tune how error records are stored in the table.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq10.jpg" height="205" width="330" border="0" hspace="4" vspace="4" alt="Dq10" /></p>
<p>So now we&#8217;re ready to go. I deploy the mapping (if the mapping deployment fails because of a table it can&#8217;t find, check that you&#8217;ve deployed the error (shadow) table mentioned above) and then run it. Even though my incoming data had one row with a null value in the GENDER column, the mapping completes successfully.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq11.jpg" height="150" width="445" border="0" hspace="4" vspace="4" alt="Dq11" /></p>
<p>However the Job Details page shows that some incoming data was diverted to my error handling table.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq12.jpg" height="164" width="533" border="0" hspace="4" vspace="4" alt="Dq12" /></p>
<p>SQL Developer shows that the new error handling table does in fact contain the row that&#8217;s failed the data quality check.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq13.jpg" height="157" width="500" border="0" hspace="4" vspace="4" alt="Dq13" /></p>
<p>Checking the target table shows that all of the remaining rows are compliant with the NOT NULL data rule.</p>
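<p>To make the behaviour concrete, here is a rough sketch in plain Python of what the mapping is effectively doing with the NOT NULL data rule. This is not the code that OWB generates; the CUST_ID column, the ERR_REASON field and the function name are all made up for the illustration, and only CUSTOMERS_TGT and GENDER come from the example above.</p>

```python
def apply_not_null_rule(rows, column):
    """Split rows into those that pass a NOT NULL check on `column`
    and those that fail it (hypothetical helper, not an OWB API)."""
    loaded, errors = [], []
    for row in rows:
        if row.get(column) is None:
            # Failing rows are diverted, with a reason recorded alongside them
            errors.append({**row, "ERR_REASON": f"{column} is null"})
        else:
            loaded.append(row)
    return loaded, errors


# Three incoming rows, one of which has a null GENDER
source = [
    {"CUST_ID": 1, "GENDER": "M"},
    {"CUST_ID": 2, "GENDER": None},
    {"CUST_ID": 3, "GENDER": "F"},
]

customers_tgt, customers_tgt_err = apply_not_null_rule(source, "GENDER")
print(len(customers_tgt), "rows loaded,", len(customers_tgt_err), "row diverted")
```

<p>The point of the sketch is that the load itself never fails: the offending row simply ends up in a second collection, which plays the role of the error (shadow) table.</p>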
<p>But what if we wanted to correct data that failed the check, as well as record it in this error handling table? Well, we can take the output of the error rows in the target table operator, apply a transformation to it and then load just these back into the target table, if there&#8217;s a &#8220;not known&#8221; value for example that we wish to apply to values that would otherwise be null.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq14.jpg" height="278" width="500" border="0" hspace="4" vspace="4" alt="Dq14" /></p>
<p>Make sure you set the error handling action of this new target table operator to also reject rows that fail the not null check (otherwise you might inadvertently add the row back with data that might still fail the data rule check).</p>
<p>Running the mapping again shows it executing correctly, but this time if I check SQL Developer, I can see a new row appearing that has had its otherwise null GENDER value converted to a &#8220;U&#8221;, for unknown.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq15.jpg" height="201" width="362" border="0" hspace="4" vspace="4" alt="Dq15" /></p>
<p>The mapping execution still added a row to the error table though for this row, so that I can see where the mapping has corrected data in the background.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/dq16.jpg" height="158" width="500" border="0" hspace="4" vspace="4" alt="Dq16" /></p>
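<p>The correct-and-reload flow can be sketched in the same way. Again this is only an illustrative Python model of the logic, not OWB output, and the CUST_ID column and the function name are assumptions: failing rows are recorded in the error table, given the &#8220;U&#8221; default, and then passed through the NOT NULL check a second time before being loaded.</p>

```python
def load_with_correction(rows, column, default):
    """Load rows into a target list, recording and then correcting any
    row whose `column` is null (hypothetical helper, not an OWB API)."""
    target, error_table = [], []
    for row in rows:
        if row.get(column) is None:
            # Keep a copy of the original failing row for auditing
            error_table.append(dict(row))
            # Apply the "unknown" default to the copy that will be reloaded
            row = {**row, column: default}
        # Second NOT NULL check: only rows that now pass reach the target
        if row.get(column) is not None:
            target.append(row)
    return target, error_table


source = [
    {"CUST_ID": 1, "GENDER": "M"},
    {"CUST_ID": 2, "GENDER": None},
    {"CUST_ID": 3, "GENDER": "F"},
]

target, error_table = load_with_correction(source, "GENDER", "U")
for row in target:
    print(row)
```

<p>All three rows reach the target, and the error table still holds the original null-GENDER row, which matches what the screenshots above show.</p>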
<p>The process works in the same way for other data rules you might apply, such as domain check rules, foreign key rules, &#8220;not number&#8221; rules and so forth. <del datetime="2010-01-27T21:53:04+00:00">One thing to bear in mind though is that these mappings run in PL/SQL (as opposed to SQL, set-based) mode and therefore might not perform as well as ones without data rules applied</del> <em>[This was fixed from 10.2.0.3, see the note from David Allen in the comments below, these mappings now run set-based in the same way as other mappings]</em>. However if you think about the benefits in terms of graceful error handling, the &#8220;self-documenting&#8221; nature of mappings that use this feature and so on, it&#8217;s certainly a good way of handling data quality issues in your staging or integration layer before bulk-loading the data into your main warehouse tables.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/01/22/data-rules-and-error-handling-in-warehouse-builder-10gr211gr2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Hybrid Columnar Compression in Oracle Exadata v2</title>
		<link>http://www.rittmanmead.com/2010/01/21/hybrid-columnar-compression-in-oracle-exadata-v2/</link>
		<comments>http://www.rittmanmead.com/2010/01/21/hybrid-columnar-compression-in-oracle-exadata-v2/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 13:24:26 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2010/01/21/hybrid-columnar-compression-in-oracle-exadata-v2/</guid>
		<description><![CDATA[Along with In-Memory Parallel Execution, another new feature that came along with release 11gR2 of the Oracle Database (or more correctly, version 2 of Exadata Storage Server) is Hybrid Columnar Compression. You&#8217;ll need Exadata to use this (though at one point it was part of the standard 11gR2 database beta, without an Exadata dependency), but [...]]]></description>
			<content:encoded><![CDATA[<p>Along with <a href="http://www.rittmanmead.com/2010/01/19/in-memory-parallel-execution-in-oracle-database-11gr2/">In-Memory Parallel Execution</a>, another new feature that came along with release 11gR2 of the Oracle Database (or more correctly, version 2 of Exadata Storage Server) is Hybrid Columnar Compression. You&#8217;ll need Exadata to use this (though at one point it was part of the standard 11gR2 database beta, without an Exadata dependency), but if you&#8217;ve got either the HP or Sun versions of the Exadata hardware, are running the v2 Exadata software and your database is running version 11gR2, then you can give this a spin.</p>
<p>If you&#8217;ve been following the wider analytic database market over the past few years (Curt Monash&#8217;s <a href="http://www.dbms2.com">DBMS2</a> website is a good place to start), you&#8217;ll probably be aware of products such as <a href="http://www.sybase.co.uk/products/datawarehousing/sybaseiq">Sybase IQ</a> and <a href="http://www.vertica.com/">Vertica Analytic Database</a> that store data in columns as opposed to rows. Interestingly, a lot of ex-Oracle data warehousing server tech people ended up at Vertica, including <a href="http://www.lilianhobbs.com/Welcome.html">Lilian Hobbs</a> (author of &#8220;Oracle 10g Database Data Warehousing&#8221;), and over the past couple of years vendors such as these have achieved some success in the marketplace with their column store approach. In fact <a href="http://www.rittmanmead.com/2008/12/15/interview-with-mike-stonebraker/">I ran an interview</a> with Mike Stonebraker, CTO of Vertica, on this blog a few months ago (prior to the release of Exadata 2) where he set out the case for column-based storage and the &#8220;shared nothing&#8221; architecture that his product uses.</p>
<p>Fast forward to the weeks before Open World, and Oracle announce the <a href="http://www.rittmanmead.com/2009/09/13/oracle-to-announce-new-oltp-focused-database-machine/">Sun Database Machine</a> and version 2 of the Exadata Storage software that comes with it. Apart from <a href="http://kevinclosson.wordpress.com/2009/12/10/pardon-me-where-is-that-flash-cache-part-i/">Flash Cache</a> (primarily aimed at OLTP environments) the major innovation from my perspective, and a bit of a volte-face from Oracle, was this halfway-house approach to column-based storage which they termed &#8220;Hybrid Columnar Compression&#8221;. The idea behind this is as follows:</p>
<ul>
<li>The Oracle database is still primarily a row-based engine, but there is a new type of segment compression that compresses data in columns</li>
<li>This is only available if you are using Exadata Storage Server &#8211; Kevin Closson says that this is <a href="http://kevinclosson.wordpress.com/2009/09/01/oracle-switches-to-columnar-store-technology-with-oracle-database-11g-release-2/">due to technical</a> (as opposed to marketing) reasons</li>
<li>It comes in two forms: Warehouse Compression (add COMPRESS FOR QUERY to your table definition script) and Archive Compression (add COMPRESS FOR ARCHIVE to your table script).</li>
<li>Warehouse Compression is optimized for DSS-style queries (the same market as Vertica/Sybase IQ), whilst Archive Compression burns up more CPU but achieves greater compression than Warehouse Compression.</li>
<li>It all works through a feature introduced in Exadata 2 / Oracle Database 11gR2 called &#8220;Compression Units&#8221;.</li>
</ul>
<p>Arup Nanda has <a href="http://www.oracle.com/technology/oramag/oracle/10-jan/o10compression.html">written a good article for Oracle Magazine</a> that goes through the technology and explains how rows relate to compression units, and compression units relate to blocks. Basically, regular blocks store data in rows, with each column next to the others in the row (more or less; row chaining can affect this), so that blocks conceptually look like this:</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/comp1.jpg" height="117" width="500" border="0" hspace="4" vspace="4" alt="Comp1" /></p>
<p>The compression units used by Hybrid Columnar Compression, by contrast, typically span several data blocks and, inside, organize data by column, with each compression unit holding all of the column values for a set of several rows.</p>
<p style="text-align:center;"><img src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/01/comp2.jpg" height="250" width="404" border="0" hspace="4" vspace="4" alt="Comp2" /></p>
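<p>As a way of picturing the layout, the following Python sketch groups a handful of rows into &#8220;units&#8221; and stores each unit column by column. It is purely conceptual: the function name, the sample data and the two-rows-per-unit size are assumptions for the illustration, and it says nothing about Oracle&#8217;s actual on-disk format.</p>

```python
def to_compression_units(rows, rows_per_unit):
    """Group rows into units, then store each unit column by column
    (a conceptual model only, not Oracle's storage format)."""
    units = []
    for start in range(0, len(rows), rows_per_unit):
        chunk = rows[start:start + rows_per_unit]
        # zip(*chunk) transposes the chunk, giving one sequence per column
        units.append([list(column) for column in zip(*chunk)])
    return units


# Four rows of (id, gender, city), as they would sit in a row-based block
rows = [
    (1, "M", "London"),
    (2, "F", "Leeds"),
    (3, "F", "London"),
    (4, "M", "Leeds"),
]

units = to_compression_units(rows, rows_per_unit=2)
for unit in units:
    print(unit)
```

<p>Each unit still contains every column for its rows, which is the &#8220;hybrid&#8221; part: the data is column-organized inside the unit, but a row never spans more than one unit.</p>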
<p>These compression units offer two potential advantages to data warehousing customers. Firstly, they compress better because columns are more likely to contain repeating values (genders, cities, account flags etc) than rows (where individual columns are often unrelated, at least in terms of the values they contain). Secondly, because you therefore pack more data in per block, it takes fewer blocks, and less disk I/O, to get hold of the data you are interested in &#8211; the same idea that makes regular compression attractive to DW users. Arup&#8217;s article goes into the syntax and the concepts in more detail, if you&#8217;re interested, and also has some test cases to show the kinds of benefits you can expect (though of course you&#8217;ll need Exadata 2 to try them out). Also, beware of the locking issues that compression units can bring &#8211; as each unit contains several rows of data, a lock on one row in the compression unit will lock the other rows as well, which makes the feature not really suitable for OLTP environments.</p>
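<p>The &#8220;columns repeat more than rows&#8221; argument can be shown with a toy example. The Python sketch below counts how many runs a simple run-length encoder would need for the same six rows laid out row by row and column by column. Run-length encoding is used here only because it is easy to reason about; it is an assumption for the illustration, not a description of the algorithms Oracle uses, and the sample data is made up.</p>

```python
from itertools import groupby


def run_count(values):
    """Number of runs of identical adjacent values, i.e. how many
    (value, count) pairs a simple run-length encoder would emit."""
    return sum(1 for _ in groupby(values))


# Six rows of (gender, city, account flag)
rows = [
    ("M", "London", "Y"),
    ("M", "London", "N"),
    ("F", "London", "Y"),
    ("F", "Leeds", "Y"),
    ("M", "Leeds", "Y"),
    ("M", "Leeds", "N"),
]

# Row-major: values in the order a row-based block would hold them
row_major = [value for row in rows for value in row]
# Column-major: all genders, then all cities, then all flags
column_major = [value for column in zip(*rows) for value in column]

row_runs = run_count(row_major)
column_runs = run_count(column_major)
print(row_runs, "runs row by row,", column_runs, "runs column by column")
```

<p>Laid out row by row, no two neighbouring values match, so nothing collapses; laid out column by column, the repeated genders and cities sit next to each other and the same data needs half as many runs.</p>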
<p>For me though, the interesting test of this will be to see how hybrid columnar compression compares to the &#8220;pure-columnar&#8221; approach used by vendors such as Vertica. I would imagine Vertica would argue that whilst it&#8217;s great that these compression units store their data organized by column, their approach would be superior for DSS customers as their equivalent of blocks stores just the data for individual columns, not all the columns for a set of rows. As such, their approach may well require even less disk I/O as you won&#8217;t be pulling back all the (compressed) columns that are stored with the columns you require (as you do with Oracle&#8217;s compression units), though I&#8217;m not close enough to either of the technologies to know if this is the case.</p>
<p>I doubt we&#8217;ll ever really see an apples-to-apples benchmark test of both technologies side-by-side (if only because most vendors&#8217; license terms prohibit publishing benchmarks), but for Oracle customers I guess this is by-the-by; even if columnar compression isn&#8217;t as DSS-efficient as pure column-based storage, it&#8217;s still a huge boost to queries and it keeps it all in the familiar, manageable Oracle environment, a benefit not to be dismissed if you have thousands of databases to manage and you&#8217;re aiming for vendor and hardware consolidation. For customers already sold on Exadata and SmartScan and for whom this is if anything a bit of a bonus, it&#8217;ll be interesting to try this out on your own data and see how much benefit it brings.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/01/21/hybrid-columnar-compression-in-oracle-exadata-v2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
