<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Oracle Database</title>
	<atom:link href="http://www.rittmanmead.com/category/oracle-database/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivering Oracle Business Intelligence</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:18:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Real-time BI: EDW with a Real-time Component</title>
		<link>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/</link>
		<comments>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 20:46:55 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8630</guid>
		<description><![CDATA[I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and [...]]]></description>
			<content:encoded><![CDATA[<p>I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and here goes with the conclusion.</p>
<p>I laid out the general vocabulary and considerations for Real-time BI in <a href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/">my first post</a> in this series, and then followed up with how to implement Real-time BI using <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">a federated approach</a> that relies on the metadata capabilities of OBIEE to blend two different environments into one. Now I&#8217;d like to discuss how we might implement a Real-time solution by relying on ETL instead of BI Tool metadata. I call this EDW with a Real-Time Component.</p>
<p>Whereas the Federated OLTP/EDW Reporting option provides us an option to layer real-time data into an otherwise classic batch-loaded EDW, delivering the EDW with a Real-Time Component requires designing an EDW from the ground up that supports real-time reporting. Specifically, we have to design our fact tables to support what Ralph Kimball calls the “real-time partition” in his book <em>The Kimball Group Reader</em>: “To achieve real-time reporting, we build a special partition that is physically and administratively separated from the conventional static data warehouse tables. Actually, the name partition is a little misleading. The real-time partition may be a separate table, subject to special rules for update and query.” We construct a separate section for each of our fact tables to facilitate the following 4 requirements, as defined by Kimball:</p>
<ol>
<li>Contain all activity since the last time the load was run</li>
<li>Link seamlessly to the grain of the static data warehouse tables</li>
<li>Be indexed so lightly that incoming data can “dribble in”</li>
<li>Support highly responsive queries</li>
</ol>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/real-time-partition.png" border="0" alt="Real time partition" width="600" height="375" /></p>
<p>So we modify our model to support the interaction of real-time and static data, but we also modify our ETL to support this. In fact, to construct an EDW with a Real-Time Component, we have to build some very intricate interaction between the database, the data model and ETL processes. The static fact table is partitioned on a date data-type using standard Oracle partitioning strategies. The real-time partition is structured in such a way as to be loadable throughout the day. In other words, there are no indexes or constraints enabled on the table. ETL against the real-time partition resembles a traditional load, but runs in micro-batches, as often as 100 times a day or more. Alternative methods include transactional-style, record-by-record loading, possibly using web services or a message-based system such as JMS queues.</p>
<p>We effectively want to build a single logical fact table out of the combination of the static EDW fact table and the real-time fact partition. There are several ways to do this. We could use OBIEE fragmentation for this, as we saw in the <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">last post.</a> This would work, but it&#8217;s not what I recommend. The reason we used fragmentation in the last post is that we were joining two completely different data sets across conformed dimensions into a unified model. With the real-time partition, however, we have two tables with exactly the same structure, both using the same surrogate keys to the same dimension tables, and separated into different segments only for performance reasons. In this case, I choose to UNION the two datasets with either a database view, or an opaque view in OBIEE.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/opaque-union-view.png" border="0" alt="Opaque union view" width="542" height="553" /></p>
<p>This works because we no longer have to control which source the rows come from in particular situations: we simply pull all the rows and use standard WHERE filters to limit them where applicable. Just as the BI Server pruned unneeded sources for us in the last post, the Oracle Database prunes unneeded partitions for us in this case. We can, however, still present the static fact tables in situations that merit it: I&#8217;m thinking of financial reports here. Accountants don&#8217;t usually like their reports giving different results every time they run them.</p>
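<p>As a minimal sketch of the database-view option, the two segments can be stacked like this. The view name SALES_FACT_ALL_V is hypothetical, the column list is taken from the SALES_FACT definition later in this post, and I use UNION ALL rather than a plain UNION on the assumption that the static table and the real-time partition never hold the same row:</p>
<pre>-- Hypothetical view name; assumes SALES_FACT_RT has the same columns as SALES_FACT
CREATE OR REPLACE VIEW gcbc_edw.sales_fact_all_v
AS
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
  FROM gcbc_edw.sales_fact        -- static, partitioned history
UNION ALL                         -- no duplicate elimination needed or wanted
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
  FROM gcbc_edw.sales_fact_rt     -- intra-day rows
/

-- Example usage: an ordinary filter on the partitioning key
SELECT SUM(amount)
  FROM gcbc_edw.sales_fact_all_v
 WHERE sales_date_key &gt;= TRUNC(SYSDATE) - 7
/</pre>
<p>A query that filters on SALES_DATE_KEY should let the optimizer push the predicate into both branches and prune partitions on the static side, though that is worth confirming in the execution plan for your own data.</p>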
<p>We have one issue with the load of the real-time partition: we are assuming that we receive all of our dimension data right along with our fact data in clean CDC subscription groups. That would likely be the case if we were pulling all the data for our data warehouse from a single source-system, but with enterprise data warehouses, that is rarely the case. Receiving dimension data early causes no problems with our load scenario; it doesn’t matter if we do the surrogate key lookup for the fact table load hours or days later than the dimensions. Receiving the fact table data early does present us with ETL logic issues: the correct dimension record may or may not be there when it’s time to load the facts.</p>
<p>There is a simple strategy to handle early-arriving facts. In our ETL, we implement a process to ensure that our facts are at least reportable intra-day:</p>
<ol>
<li>If a dimension record exists for the current business or natural key we are interested in, then grab the latest record. This is the best we can do at this point, and will usually be the correct value.</li>
<li>If no dimension record exists yet for the current natural key, then use a default record type equating to “Not Known Yet.” Though it’s not sexy for intra-day reporting, it at least makes the data available across the dimensions we do know about.</li>
<li>As we approach the end of the day and prepare to “close the books” for the current day, we should have run all dimension loads—even late arriving dimensions—so that our dimension tables are all up to date. At this point we run a corrective mapping to update all the fact records in the real-time partition with the right surrogate keys. This would likely be a MERGE statement, or a TRUNCATE/INSERT style mapping. From a performance perspective, my bet is on the latter.</li>
</ol>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1.png"><img class="size-large wp-image-8631 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1-1024x354.png" alt="" width="737" height="255" /></a></p>
<p>&nbsp;</p>
<p>The above mapping loads the real-time partition in a micro-batch style doing an outer join to the CUSTOMER_DIM table and writing the &#8220;Not Known Yet&#8221; row in case a customer is not found. Also, I am employing a Splitter Operator in OWB, but I tricked it out to force it to load all rows to BOTH tables: SALES_FACT_RT and SALES_STG_RT. The reason for this is that we don&#8217;t write dimension natural keys into our fact tables, though I&#8217;ve seen that technique employed in some real-time implementations. So when it&#8217;s time to run our corrective mapping to correct our fact table data, we just join the SALES_STG_RT table to the now-correct dimension tables and produce the right surrogate keys for each fact record, and load the results into SALES_FACT_RT.</p>
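<p>For readers who don&#8217;t use OWB, the same idea can be sketched in plain SQL with a multi-table insert. Everything not named in the post is an assumption made for illustration: the source table SALES_DELTA, the natural key column CUSTOMER_ID, the CURRENT_IND flag marking the latest dimension row, the use of -1 as the &#8220;Not Known Yet&#8221; key, and the idea that the other dimension keys have already been resolved upstream:</p>
<pre>INSERT ALL
   -- Reportable row: surrogate keys only
   INTO gcbc_edw.sales_fact_rt
        (customer_key, product_key, staff_key, store_key, sales_date_key,
         trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
   VALUES
        (customer_key, product_key, staff_key, store_key, sales_date_key,
         trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
   -- Staging row: keeps the natural key so the fact can be corrected later
   INTO gcbc_edw.sales_stg_rt
        (customer_id, product_key, staff_key, store_key, sales_date_key,
         trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
   VALUES
        (customer_id, product_key, staff_key, store_key, sales_date_key,
         trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
SELECT NVL(d.customer_key, -1) AS customer_key,   -- -1 = assumed "Not Known Yet" row
       s.customer_id,
       s.product_key,
       s.staff_key,
       s.store_key,
       TRUNC(s.sales_date)     AS sales_date_key,
       s.trans_id,
       s.trans_line_id,
       s.sales_date,
       s.unit_price,
       s.quantity,
       s.amount
  FROM gcbc_edw.sales_delta s                      -- hypothetical micro-batch source
  LEFT OUTER JOIN gcbc_edw.customer_dim d
    ON d.customer_id = s.customer_id               -- hypothetical natural key column
   AND d.current_ind = 'Y'                         -- hypothetical "latest record" flag
/</pre>
<p>The outer join keeps every incoming fact row, and the NVL supplies the default key when the customer has not arrived yet.</p>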
<p>When “closing the books” on the day, we build indexes and constraints on the real-time partition that match those on the partitioned fact table. Once this step is complete, we can use a partition-exchange operation to swap the real-time partition into the static fact table. In Oracle, this is a fast data dictionary update that occurs almost instantaneously.<br />
Obviously, our partitioning choice for the fact table will determine exactly how this partition-exchange will occur. If we agree to partition the fact table by DAY, then we can use Oracle Interval partitioning, available in Oracle 11gR1 and beyond. We have to make this concession because all the automatically created partitions in an Interval-partitioned table share a single interval: we can’t have some generated at the MONTH grain and others at the DAY grain, as we can when we define regular range partitions by hand. Using Interval partitioning is the easiest method, however, because it requires the least amount of partition maintenance as part of the load. For instance, consider the SALES_FACT table listed below, using Interval partitioning on the SALES_DATE_KEY, which we partition on at the DAY grain:</p>
<pre>CREATE TABLE sales_fact
       (
         customer_key           NUMBER           NOT NULL,
         product_key            NUMBER           NOT NULL,
         staff_key              NUMBER           NOT NULL,
         store_key              NUMBER           NOT NULL,
         sales_date_key         DATE             NOT NULL,
         trans_id               NUMBER,
         trans_line_id          NUMBER,
         sales_date             DATE,
         unit_price             NUMBER,
         quantity               NUMBER,
         amount                 NUMBER
       )
       partition BY range (sales_date_key)
       interval (numtodsinterval(1,'DAY'))
       (
         partition sales_fact_2006 VALUES less than (to_date('2007-01-01','YYYY-MM-DD'))
       )
       COMPRESS
/</pre>
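<p>The real-time partition itself has to be a non-partitioned table whose columns line up with SALES_FACT, otherwise the exchange will be rejected. One way to get there, offered only as a sketch and assuming the table lives in the same GCBC_EDW schema, is an empty CREATE TABLE AS SELECT:</p>
<pre>-- Copies the column definitions (and declared NOT NULL constraints) but no rows,
-- indexes or partitioning
CREATE TABLE gcbc_edw.sales_fact_rt
AS
SELECT *
  FROM gcbc_edw.sales_fact
 WHERE 1 = 0
/</pre>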
<p>Each time we load a record into SALES_FACT for which no partition currently exists, Oracle will spawn one for the table. But based on our real-time requirements, we will use a partition-exchange operation every day to close the books on the current day processing, so each day, we will need to spawn a clean, new partition to facilitate that partition-exchange. All we need to do to make this happen is issue an insert statement with a DATE value for the partitioning key that equates to TRUNC(SYSDATE). For instance, the following statement would generate a new partition that we can use for the exchange:</p>
<pre>SQL&gt; INSERT INTO gcbc_edw.sales_fact
  2         (
  3           customer_key,
  4           product_key,
  5           staff_key,
  6           store_key,
  7           sales_date_key,
  8           trans_id,
  9           trans_line_id,
 10           sales_date,
 11           unit_price,
 12           quantity,
 13           amount)
 14         VALUES
 15         (
 16           -1,
 17           -1,
 18           -1,
 19           -1,
 20           trunc(SYSDATE),
 21           -1,
 22           -1,
 23           SYSDATE,
 24           0,
 25           0,
 26           0
 27         )
 28  /

1 row created.

Elapsed: 00:00:00.01
SQL&gt;</pre>
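<p>To confirm that the insert really did spawn a partition, we can look in the data dictionary. This is just a sanity check; it assumes the table lives in the GCBC_EDW schema as above, and the system-generated partition names will differ on every database:</p>
<pre>SELECT partition_name,
       high_value,
       interval             -- 'YES' for partitions Oracle created automatically
  FROM all_tab_partitions
 WHERE table_owner = 'GCBC_EDW'
   AND table_name  = 'SALES_FACT'
 ORDER BY partition_position
/</pre>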
<p>Once the insert has created our new SYSDATE-based partition, we can exchange the real-time partition in for this new partition. We can use the new PARTITION FOR clause, which lets us reference a partition by a partition key value instead of by name, with a slight caveat. Though we can’t use SYSDATE explicitly in the DDL statement, we can reference it implicitly:</p>
<pre>SQL&gt; DECLARE
  2     l_date DATE := SYSDATE;
  3     l_sql  LONG;
  4  BEGIN
  5     l_sql :=   q'|alter table gcbc_edw.sales_fact exchange partition|'
  6             || chr(10)
  7             || q'|for ('|'
  8             || l_date
  9             || q'|') with table gcbc_edw.sales_fact_rt|';
 10
 11     dbms_output.put_line( l_sql );
 12     EXECUTE IMMEDIATE( l_sql );
 13  END;
 14  /

alter table gcbc_edw.sales_fact exchange partition
for ('03/01/2011 09:38:33 PM') with table gcbc_edw.sales_fact_rt

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL&gt;</pre>
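<p>One thing to watch in the block above is that the date is concatenated as a string, so the generated DDL depends on the session&#8217;s NLS_DATE_FORMAT. A variation that should behave the same way regardless of session settings, shown here as a sketch rather than the code I actually ran, formats the day explicitly and wraps it in TO_DATE:</p>
<pre>DECLARE
   -- Format the day explicitly so the DDL does not depend on NLS_DATE_FORMAT
   l_day VARCHAR2(10) := TO_CHAR(TRUNC(SYSDATE), 'YYYY-MM-DD');
   l_sql VARCHAR2(4000);
BEGIN
   l_sql :=   'alter table gcbc_edw.sales_fact exchange partition'
           || chr(10)
           || 'for (to_date(''' || l_day || ''',''YYYY-MM-DD''))'
           || ' with table gcbc_edw.sales_fact_rt';

   dbms_output.put_line( l_sql );
   EXECUTE IMMEDIATE l_sql;
END;
/</pre>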
<p>Using the preferred Interval partitioning option, the final “close the books” process flow is shown below. The first step is to run any late-arriving dimension mappings, in this example, the MAP_CUSTOMER_DIM mapping. Once all the dimensions are up-to-date, we can run the process that corrects all the dimension keys in the real-time partition. Remember, the real-time partition contains small data sets, so updating these records should not be resource intensive. In this scenario, the mapping MAP_CORRECT_SALES_FACT_RT issues an Oracle MERGE statement, but it is quite likely that a TRUNCATE/INSERT statement would work just as well. Once all the data in the real-time partition is correct and ready to go, we run the MAP_CREATE_PARTITION mapping, which uses an insert statement to spawn a new partition, and then the EXCHANGE_PARTITION PL/SQL procedure builds indexes and constraints, and completes the process by issuing the partition-exchange statement.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/corrective-process-flow1.png" border="0" alt="Corrective process flow" width="545" height="275" /></p>
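<p>To give a feel for the corrective step, here is roughly what a MERGE like the one behind MAP_CORRECT_SALES_FACT_RT might look like for the customer key. It is not the SQL that OWB generates. It assumes that TRANS_ID plus TRANS_LINE_ID identifies a row uniquely in both tables, and it reuses the hypothetical CUSTOMER_ID and CURRENT_IND columns:</p>
<pre>MERGE INTO gcbc_edw.sales_fact_rt f
USING (
        -- Re-run the lookup now that the dimension loads have finished
        SELECT s.trans_id,
               s.trans_line_id,
               d.customer_key
          FROM gcbc_edw.sales_stg_rt s
          JOIN gcbc_edw.customer_dim d
            ON d.customer_id = s.customer_id     -- hypothetical natural key column
           AND d.current_ind = 'Y'               -- hypothetical "latest record" flag
      ) src
   ON (    f.trans_id      = src.trans_id
       AND f.trans_line_id = src.trans_line_id )
 WHEN MATCHED THEN
   UPDATE SET f.customer_key = src.customer_key
    WHERE f.customer_key != src.customer_key     -- only touch rows that were wrong
/</pre>
<p>The TRUNCATE/INSERT alternative would empty SALES_FACT_RT and reload it from SALES_STG_RT with the same join, which avoids row-by-row updates at the cost of rewriting every row.</p>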
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Real-time BI: Federated OLTP/EDW Reporting</title>
		<link>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/</link>
		<comments>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/#comments</comments>
		<pubDate>Mon, 16 May 2011 16:42:41 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8243</guid>
		<description><![CDATA[The typical approach in Federated OLTP/EDW reporting environments is to use a BI tool such as OBIEE to do horizontal federation. This means combining data from multiple sources at the same grain in a single logical table. One note of clarification: my use of the word &#8220;federated&#8221; might be a misnomer, and I apologize in [...]]]></description>
			<content:encoded><![CDATA[<p>The typical approach in Federated OLTP/EDW reporting environments is to use a BI tool such as OBIEE to do horizontal federation. This means combining data from multiple sources at the same grain in a single logical table. One note of clarification: my use of the word &#8220;federated&#8221; might be a misnomer, and I apologize in advance. As I argued in the <a href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/">last post</a>, the best practice for performance reasons is to actually stream, or &#8220;GoldenGate&#8221; the source system data to a foundation layer on the data warehouse instance. But old habits die hard, so I&#8217;ll continue to refer to this as &#8220;federation&#8221; even though it may not be technically accurate. Thanks for the latitude.</p>
<p>One of the sources for federation is a classic, batch-loaded EDW, with ETL processes that load conformed dimension tables, followed by fact tables that store the measures and calculations for the enterprise. Oracle Warehouse Builder (OWB), the ETL tool built inside the Oracle Database, is a standard choice for data warehouses built on the Oracle Database, and below, I show a sample process flow of what that batch load might look like:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/batch-DW.png" alt="Batch DW" border="0" width="600" height="326" /></p>
<p>Logical table sources (LTS’s) are a key feature within the OBIEE semantic model but are often misunderstood. Each LTS represents a single location for data to exist for either a logical fact table, or logical dimension table. A logical table in the BMM can have multiple LTS’s for any of the following reasons:</p>
<p>1. Including different table sources into a single logical table at different levels of granularity. Tables containing data pre-aggregated at a different level in a hierarchy are a common example of this scenario, which is known as &#8220;vertical fragmentation&#8221;.</p>
<p>2. Including different table sources into a single logical table at the same level of granularity. Having data exist in two different locations, but wanting them to be combined in particular situations, is a common example of this scenario, and is known as &#8220;horizontal fragmentation&#8221;.</p>
<p>Using horizontal fragmentation in OBIEE, we can map a single logical fact table to multiple LTS’s. For example, suppose we had a physical fact table in our EDW called SALES_FACT. To represent that fact table in the semantic model, we would create a logical fact table in the BMM — called “Sales Fact Realtime” in this example — and create an LTS that maps to the SALES_FACT table. We would also map another LTS which presents this data in the source system as well. As the source system is transactional and likely exists in third-normal form (3NF), the LTS that maps to the transactional schema would likely not be a simple one-to-one relationship. In 3NF, we would likely have to join multiple tables in our source system to represent the logical fact table Sales Fact Realtime:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/source-to-target-fact.png" alt="Source to target fact" border="0" width="600" height="270" /></p>
<p>We would have to do something comparable with the Customer Dimension:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/source-to-target-dimension.png" alt="Source to target dimension" border="0" width="600" height="297" /></p>
<p>With the two LTS&#8217;s, we still need to configure the horizontal fragmentation. For this implementation, I have configured a repository variable called RV_REALTIME_THRESHOLD_DT, with an initialization block that keeps the value consistently at TRUNC(SYSDATE). I use this variable as the threshold between reporting against the EDW schema and the source system schema.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/init-block1.png" alt="Init block" border="0" width="530" height="439" /></p>
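<p>The query behind an initialization block like this can be very small. Assuming the block runs against an Oracle connection pool, something along these lines would return midnight of the current day:</p>
<pre>-- Returns today's date with the time portion set to 00:00:00
SELECT TRUNC(SYSDATE) FROM dual</pre>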
<p>Once I have the variable available, I can configure the fragmentation on the fact table to use the threshold to determine the appropriate source for a particular record. This is less complicated with the EDW LTS&#8230; simple fragmentation configured for all rows with a transaction date less than the threshold date:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/fragmentation-EDW.png" alt="Fragmentation EDW" border="0" width="432" height="508" /></p>
<p>Whereas only the source system contains the newer rows needed for layering in real-time data&#8230; both the EDW and the source system contain historic data, albeit the EDW data is likely transformed to a certain degree. So we have to configure fragmentation using the RV_REALTIME_THRESHOLD_DT variable, but we also have to use that variable as a filter on the source system LTS to make sure historic rows are not returned from both sources and counted twice.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/fragmentation-OLTP.png" alt="Fragmentation OLTP" border="0" width="436" height="507" /></p>
<p>What’s the result of all this complex mapping among different LTS’s in the BMM? OBIEE understands that each source schema is completely segmented, and the tables in each LTS never join to tables in the other LTS… but they do union. OBIEE will construct a complete query against the transactional schema, in this example, joining between the CUSTOMER_DEMOG_TYPES, CUSTOMERS, POS_TRANS and POS_TRANS_HEADER tables. Additionally, OBIEE will construct another complete query against the EDW schema, in this case, only the tables SALES_FACT and CUSTOMER_DIM. The BI Server then logically unions the results between the two source schemas into a single result set that is returned whenever a user builds a report against the logical tables Customer Dim and Sales Fact Realtime. So I run the following report against my fragmented Sales Fact Realtime:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/high-level-report-federated.png" alt="High level report federated" border="0" width="461" height="468" /></p>
<p>The interesting part is how OBIEE does the logical union. When the EDW and the transactional schema exist in separate databases, the BI Server issues two different database queries and combines them into a single result set in its own memory space. However, if the schemas exist within the same database, as the Oracle Next-Generation Reference Architecture recommends, then the BI Server is able to issue a single query, transforming the logical union into an actual physical union in the SQL statement, as demonstrated in the statement below. Notice that the SQL threshold has been applied, and the UNION was constructed with a single SQL statement pushed down from the BI Server to the Oracle Database holding the Foundation and Presentation and Access layers in our Oracle architecture:</p>
<pre>
WITH
SAWITH0 AS (select T44105.AMOUNT as c1,
     T44042.CUSTOMER_LAST_NAME as c2,
     T48199.CALENDAR_MONTH_NUMBER as c3,
     T48199.CALENDAR_YEAR as c4,
     T48199.SQL_DATE as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_EDW.CUSTOMER_DIM T44042,
     GCBC_EDW.SALES_FACT T44105
where  ( T44042.CUSTOMER_KEY = T44105.CUSTOMER_KEY and T44105.SALES_DATE_KEY = T48199.DATE_KEY ) ),
SAWITH1 AS (select T43971.SAL_AMT as c1,
     T43901.CUST_LAST_NAME as c2,
     T48199.CALENDAR_MONTH_NUMBER as c3,
     T48199.CALENDAR_YEAR as c4,
     T48199.SQL_DATE as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_CRM.CUSTOMERS T43901,
     GCBC_POS.POS_TRANS T43971,
     GCBC_POS.POS_TRANS_HEADER T43978
where  ( T43901.CUST_ID = T43978.CUST_ID
         and T43971.TRANS_ID = T43978.TRANS_ID
         <strong>and T48199.DATE_KEY =  TRUNC(T43978.TRANS_DATE)
         and T43978.TRANS_DATE &gt;= TO_DATE('2011-05-16 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') </strong>
       )),
SAWITH2 AS ((select concat(D0.c4, D0.c3) as c2,
     D0.c5 as c3,
     D0.c2 as c4,
     D0.c1 as c5
from
     SAWITH0 D0
union all
select concat(D0.c4, D0.c3) as c2,
     D0.c5 as c3,
     D0.c2 as c4,
     D0.c1 as c5
from
     SAWITH1 D0)),
SAWITH3 AS (select sum(D3.c5) as c1,
     D3.c2 as c2,
     D3.c3 as c3,
     D3.c4 as c4
from
     SAWITH2 D3
group by D3.c2, D3.c3, D3.c4)
select distinct 0 as c1,
     D2.c2 as c2,
     D2.c3 as c3,
     D2.c4 as c4,
     D2.c1 as c5
from
     SAWITH3 D2
order by c2, c4, c3
</pre>
<p>But OBIEE is also capable of doing the fragmentation equivalent of &#8220;partition pruning.&#8221; When the BI Server has enough information to know that the entire result set will come from a single source, then the SQL will be issued against only one of the LTS&#8217;s. For instance, if I click on one of the &#8220;SQL Date&#8221; attributes in the above report which will apply a filter on the fragmentation column, the BI Server will know that the result set only comes from the EDW:</p>
<pre>WITH
SAWITH0 AS (select sum(T44105.AMOUNT) as c1,
     concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER) as c2,
     T48199.DATE_KEY as c3,
     T48199.SQL_DATE as c4,
     T44042.CUSTOMER_LAST_NAME as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_EDW.CUSTOMER_DIM T44042,
                   GCBC_EDW.SALES_FACT T44105
where  ( T44042.CUSTOMER_KEY = T44105.CUSTOMER_KEY
         and T44042.CUSTOMER_LAST_NAME = 'Carr'
         and T44105.SALES_DATE_KEY = T48199.DATE_KEY
         <strong>and T48199.SQL_DATE = TO_DATE('2009-07-03' , 'YYYY-MM-DD')</strong>
         and concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER) = '200907' )
group by T44042.CUSTOMER_LAST_NAME,
         T48199.DATE_KEY,
         T48199.SQL_DATE,
         concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER))
select distinct 0 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c1 as c6
from
     SAWITH0 D1
order by c2, c5, c4, c3</pre>
<p>Before closing this section of the real-time discussion, I want to take a minute to identify the strengths and weaknesses of this approach. As far as strengths go, we have several items that register with this solution. First off&#8230; this is a low-latency solution. When using the Oracle Next-Generation Reference Architecture, we have the latency of streaming, or &#8220;GoldenGating,&#8221; the content from the source system to the DW database. With clients we&#8217;ve had in the past, this can run anywhere from a few seconds to several minutes, depending on the solution implemented. Additionally, there is no complex logical or physical data modeling and supporting ETL to deliver this solution, as there is with the EDW with a Real-Time Component, which we will explore in the next posting.</p>
<p>As far as weaknesses go, there will be a fair amount of complex RPD semantic-layer modeling. Obviously, the degree of difficulty depends on a number of factors: number of source systems integrated, number of subject areas, complexity of reports delivered, etc. Also, increased complexity of RPD modeling may introduce performance degradation as OLTP schemas have to be transformed &#8220;on the fly&#8221; to star schemas by the BI Server. But keep in mind&#8230; we are typically only doing this for at most a day&#8217;s worth of data, so with proper database tuning, this content can usually perform quite well.</p>
<p>Next up: EDW with a Real-Time Component</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Real-time BI: An Introduction</title>
		<link>http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/</link>
		<comments>http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/#comments</comments>
		<pubDate>Mon, 09 May 2011 20:25:23 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8137</guid>
		<description><![CDATA[Discussing real-time data warehousing is difficult because the meaning of real-time is dependent on context. A CIO of an organization that has weekly batch refresh processes might view an up-to-the-day dashboard as real-time, while another organization that already has daily refresh cycles might be looking for something closer to up-to-the-hour. In truth, an interval will [...]]]></description>
			<content:encoded><![CDATA[<p>Discussing real-time data warehousing is difficult because the meaning of real-time is dependent on context. A CIO of an organization that has weekly batch refresh processes might view an up-to-the-day dashboard as real-time, while another organization that already has daily refresh cycles might be looking for something closer to up-to-the-hour. In truth, an interval will always exist between the occurrence of a measurable event and our ability to process that event as a reportable fact. In other words, there will always be some degree of latency between the source-system record of an event happening, and our ability to report that it happened.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/reportable-fact1.png" alt="Reportable fact" border="0" width="600" height="513" /></p>
<p>For the purposes of this series of blog posts, I’m defining real-time as anything that pushes the envelope on the standard daily batch load window. We will explore some of the architectural options available in the standard Oracle BI stack (Oracle Database plus Oracle Business Intelligence) for the removal of the latency inherent in this well-established paradigm.</p>
<p>I&#8217;ve categorized 4 basic types of DW/BI systems as they relate to real-time BI.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/four-types1.png" alt="Four types" border="0" width="600" height="438" /></p>
<p>Our solution with the least amount of latency, but also with the worst performance, is simple reporting against the OLTP database. Our next approach, with more latency but better performance, is a federated approach, using a BI tool such as OBIEE to “combine” results from the data warehouse with fresh data from the OLTP source system schema. In this scenario, we have a typical data warehouse, loading in a daily batch cycle, but we layer in fresh data from the source for intra-day records. We improve upon query performance by using an optimized data warehouse for the majority of our data, with gains for real-time reporting by including the non-transformed data directly from the source system schema. Our next approach is using a traditional data warehouse that has been optimized to store the results of micro-batch loads. In this scenario, instead of running the batch load process once every 24 hours, we run it several times a day, usually between one and ten times an hour. We extend the standard data warehouse architecture to have a real-time component, which means we modify our fact tables, our dimension tables, and the ETL processes that load them to better handle the micro-batch processing. Finally, last on the list in terms of latency, but our best choice for pure performance, is the traditional, batch-loaded data warehouse.</p>
<p>As the image above suggests, I&#8217;m most interested in the two squares in the middle, as I think they are the only ones that qualify as both &#8220;real-time&#8221; and &#8220;BI&#8221;. In the series of blog posts to follow, I&#8217;ll be talking about these two solutions, how they compare to each other, and some of the keys to determining &#8220;readiness&#8221; for delivering them. Before moving on, however, I need to make a pitch for the Oracle Next-Generation Reference DW Architecture, which Mark first spoke about <a href="http://www.rittmanmead.com/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/">here</a>.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/next-gen1.png" alt="Next gen" border="0" width="600" height="347" /></p>
<p>Regardless of which of the two solutions we opt for, we should implement the staging, foundation and performance layers as the best overall approach to building and sustaining business intelligence. For example&#8230; even if we choose to use our source system schema to address some portion of our reporting requirements, we should still stream, or &#8220;GoldenGate,&#8221; our data from the actual live transactional system to the database where we do our reporting. But I&#8217;m not arguing in favor of simply having another &#8220;reporting&#8221; copy of our source system: I&#8217;m advocating that we use change data capture strategies to populate a foundation layer where we maintain a complete history of all the source system changes. Only then are we fully insulated against any change in user behavior and the possible change in reporting requirements that follows.</p>
<p>Be on the lookout for the first follow-up to this, where I will explain the EDW with Federated OLTP Data choice, and demonstrate how to achieve this using OBIEE 11g.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Oracle Data Warehouse Global Leaders Webcast</title>
		<link>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/</link>
		<comments>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 18:47:23 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=7593</guid>
		<description><![CDATA[I&#8217;m honored to be delivering a webcast for the Oracle Data Warehouse Global Leaders Program on Tuesday, March 22 at Noon EST. This is an elite program for key global data warehousing customers and is managed by the Oracle data warehousing product management team. It also provides a rich opportunity to network with peers, and these webcasts are one of [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m honored to be delivering a webcast for the Oracle Data Warehouse Global Leaders Program on Tuesday, March 22 at Noon EST. This is an elite program for key global data warehousing customers and is managed by the Oracle data warehousing product management team. It also provides a rich opportunity to network with peers, and these webcasts are one of the ways that Oracle delivers value to the program members. Anyone interested in the program or the webcast should <a href="mailto:dw-global-leaders_us@oracle.com" target="_blank">email the DW Global Leaders program</a>.</p>
<p>The subject matter will be &#8220;Agile Data Warehousing on Oracle Exadata and OBIEE 11g&#8221;. This is a subject I&#8217;ve been devoting a lot of time to lately, both in project delivery and in speaking. With two full data warehouse delivery projects on Exadata under my belt, and several other partial projects, I can say that the Database Machine is absolutely a paradigm shift. But the real tipping point comes when these DW capabilities are combined with a powerful metadata layer, such as exists in OBIEE 11g. Over the last few years, I&#8217;ve adjusted and re-adjusted long-standing beliefs about how data warehouses should be built and delivered. While I&#8217;ll talk about what makes Exadata and OBIEE different, my main focus is demonstrating how to use the features to deliver BI in accordance with standard Agile concepts. I also have a series of blog posts planned to dive into this subject in detail.</p>
<p>If you&#8217;re interested in homework, I&#8217;ll be discussing the <a title="Drilling Down in the Oracle Next-Generation Reference DW Architecture" href="http://www.rittmanmead.com/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/">Oracle Next-Generation Data Warehouse Reference Architecture</a>, Exadata Smart-Scan, and the OBIEE Semantic Model. Additionally, I&#8217;ll spend some time on the <a href="http://agilemanifesto.org/">Agile Manifesto</a>, the generic <a href="http://en.wikipedia.org/wiki/Agile_software_development">agile development movement</a>, and what effect they have on DW delivery methodologies.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transcend Part 2: Dimensions</title>
		<link>http://www.rittmanmead.com/2010/12/transcend-part-2-dimensions/</link>
		<comments>http://www.rittmanmead.com/2010/12/transcend-part-2-dimensions/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 05:17:03 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>
		<category><![CDATA[Rittman Mead]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=6790</guid>
		<description><![CDATA[In the last post, I described the concept of a &#8220;Mapping&#8221; entity in Transcend as a bundling of pre- and post-mapping processes that can be executed with a single call. The &#8220;Dimension&#8221; entity is very similar. In fact, in strict programming terms (for those of you who are interested&#8230; the rest I&#8217;ll bore for just [...]]]></description>
			<content:encoded><![CDATA[<p>In the last <a href="http://www.rittmanmead.com/2010/12/01/transcend-part-1-mappings/">post</a>, I described the concept of a &#8220;Mapping&#8221; entity in Transcend as a bundling of pre- and post-mapping processes that can be executed with a single call. The &#8220;Dimension&#8221; entity is very similar. In fact, in strict programming terms (for those of you who are interested&#8230; the rest I&#8217;ll bore for just a moment), a Dimension actually is a polymorphed Mapping for Transcend. We wrote Transcend using lots of object types, and a Dimension type actually inherits from&#8211;or is &#8220;under&#8221; in Oracle object-relational speak&#8211;a Mapping type. So what this means is that a Dimension is a special kind of Mapping. It uses the Mapping&#8217;s methods for bundling processes before and after an ETL mapping to facilitate the loading of hybrid Type 1 and Type 2 slowly-changing dimensions.</p>
<p>First, I need a dimension table that I can work with, and one that most of us are familiar with. I&#8217;m going to use the SH.PRODUCTS table&#8230; but with a few modifications. The dimension tables in the SH schema don&#8217;t use surrogate keys, so I&#8217;m going to add one. Additionally&#8230; I&#8217;m going to pare down the columns a little bit, while also updating the data slightly to be a little more standard:</p>
<pre>SQL&gt; create table stewart.product_dim
  2  as select
  3  prod_id product_key,
  4  prod_id,
  5  prod_name,
  6  prod_desc,
  7  prod_status,
  8  prod_eff_from,
  9  prod_eff_to,
 10  prod_valid
 11  from sh.products;

Table created.

SQL&gt; update stewart.product_dim
  2     set prod_eff_to='12/31/9999',
  3         prod_valid='Y'
  4   where prod_valid='A';

72 rows updated.

SQL&gt; update stewart.product_dim
  2     set prod_valid='N'
  3   where prod_valid='I';

0 rows updated.

SQL&gt;</pre>
<p>Notice that, with the SH.PRODUCTS table, there are only &#8216;Active&#8217; rows in the table. You&#8217;ll see how this degrades the quality of the test case later, but I&#8217;ll trudge on. I&#8217;ll also add a few indexes to demonstrate Transcend&#8217;s capability for handling those, just like we saw with the Mapping. I&#8217;ll do some dog-fooding and use the product to generate these indexes:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_indexes
  3     ( p_owner        =&gt; 'stewart',
  4       p_table        =&gt; 'product_dim',
  5       p_source_owner =&gt; 'sh',
  6       p_source_table =&gt; 'products',
  7       p_index_type   =&gt; 'bitmap'
  8     );
  9  END;
 10  /
1 index creation process executed for STEWART.PRODUCT_DIM

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>I&#8217;ll also need a source table, which would be the target table for our ETL mapping. Hold tight&#8230; let me explain. Transcend does not try to replace your ETL tool&#8230; it just tries to make some of the heavy-lifting easier. As the SCD implementation in most of these tools is either non-standard or non-performant, we tried to deliver a best practices approach that could be easily called from any tool. Obviously, this would work well with custom-developed ETL mappings as well. So all you have to do in your ETL mapping is construct the business logic for how the data set from your source systems needs to be joined and transformed to be ready to roll into your dimension. You drop that in a staging table, tell Transcend that this data needs to make its way into the dimension table, and then the product will do the rest. My holding table will be called PRODUCT_SRC, and needs a subset of the columns that exist in the dimension table. The only columns that aren&#8217;t required are the ones that are calculated and stored to facilitate SCD processing. For my table, these are: PRODUCT_KEY, PROD_EFF_TO, and PROD_VALID. I&#8217;ll add a few rows to this table from SH.PRODUCTS and modify them to look like source-system updates:</p>
<pre>SQL&gt; create table staging.product_src
  2  as select
  3  prod_id,
  4  prod_name,
  5  prod_desc,
  6  prod_status,
  7  prod_eff_from
  8  from sh.products
  9  where rownum &lt; 11;

Table created.

SQL&gt; update staging.product_src
  2  set prod_name = 'New '||prod_name;

10 rows updated.

SQL&gt;</pre>
<p>Now I need to configure the Dimension and how I want it loaded, much like I configured the Mapping in the last post. Actually, many of the parameters are the same, but there are some new ones in there as well:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.create_dimension
  3     ( p_mapping          =&gt; 'map_product_dim',
  4       -- dimension table
  5       p_owner            =&gt; 'stewart',
  6       p_table            =&gt; 'product_dim',
  7       -- SCHEMA for intermediate tables
  8       p_staging_owner    =&gt; 'staging',
  9       -- intermediate source table
 10       p_source_owner     =&gt; 'staging',
 11       p_source_table     =&gt; 'product_src',
 12       -- SEQUENCE for the dimension
 13       p_sequence_owner   =&gt; 'stewart',
 14       p_sequence_name    =&gt; 'product_key_seq',
 15       p_default_scd_type =&gt; 2,
 16       p_description      =&gt; 'load for PRODUCT_DIM',
 17       -- MANAGE indexes and constraints
 18       p_indexes          =&gt; 'both',
 19       p_index_type       =&gt; 'bitmap',
 20       p_constraints      =&gt; 'both'
 21     );
 22  END;
 23  /

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>So I&#8217;ve told Transcend about my dimension table, including the sequence to use for the surrogate key, the default SCD type (more on this later), and the staging (or work) schema to hold any intermediate tables that Transcend needs to create. I&#8217;m also registering the PRODUCT_SRC table which I described above, and I&#8217;m telling it how to handle constraints and indexes, which I described in the last post.</p>
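<p>The configuration above names a sequence, STEWART.PRODUCT_KEY_SEQ, that is never created in the listings shown here. A minimal sketch of creating it follows, on the assumption that the sequence has to exist before the dimension is loaded. The START WITH value is a guess and should be set above the largest PRODUCT_KEY already in the table:</p>
<pre>-- Check the highest surrogate key copied over from PROD_ID.
select max(product_key) from stewart.product_dim;

-- Assumption: 10000 is above that maximum; adjust after running the query.
create sequence stewart.product_key_seq
  start with 10000
  increment by 1
  nocache;</pre>
<p>Starting the sequence above the existing keys avoids collisions with the PROD_ID values that were reused as PRODUCT_KEY when the test table was built.</p>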
<p>With the basics around the dimension table configured, I now need to work on the columns. I&#8217;ve configured the default SCD type as a Type 2. That means that I don&#8217;t have to do anything with the columns that are regular dimensional attributes for which I want to capture change. So the only things I need to register with Transcend are the following: any Type 1 dimensional attributes, the surrogate key, the natural key, the current indicator column, the effective date and the expiration date:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.create_dim_attribs
  3     ( p_mapping       =&gt; 'map_product_dim',
  4       p_surrogate     =&gt; 'product_key',
  5       p_effective_dt  =&gt; 'prod_eff_from',
  6       p_expiration_dt =&gt; 'prod_eff_to',
  7       p_current_ind   =&gt; 'prod_valid',
  8       p_nat_key       =&gt; 'prod_id',
  9       p_scd1          =&gt; 'prod_status'
 10     );
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL&gt; commit;

Commit complete.

SQL&gt;</pre>
<p>It&#8217;s worth noting that the P_NAT_KEY and P_SCD1 parameters take comma-separated lists of values in case you need to pass multiple column names in. I could also use the P_SCD2 parameter if my default SCD type were 1.</p>
<p>Since I haven&#8217;t told Transcend the specific approach I want to take to get the rows into the dimension table, the default &#8216;merge&#8217; methodology is employed. This represents a change in the default behavior beginning in version 2.5 away from the old default of &#8216;exchange&#8217;, which I will demonstrate in a bit. A single &#8220;INSERT into&#8230; SELECT&#8230;&#8221; statement will be issued to pull all the rows with a PROD_VALID of &#8216;Y&#8217; from PRODUCT_DIM along with all the incoming rows from PRODUCT_SRC into an intermediate table, doing complex SQL analytics along the way to evaluate and process any Type 1 and Type 2 changes, as well as setting PROD_EFF_TO, PROD_EFF_FROM and PROD_VALID. The changes are then MERGED back into the dimension table. Because we use a single, set-based process, the performance gain over comparable row-by-row processing can be substantial. Executing the Dimension functionality is done as if it were a regular mapping:</p>
<pre>SQL&gt; exec trans_etl.start_mapping( 'map_product_dim' );
Pre-mapping processes beginning
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( 'map_product_dim' );
Post-mapping processes beginning
Table STAGING.STG$PRODUCT_DIM created
<strong>Number of records processed with analytics into STAGING.STG$PRODUCT_DIM: 82</strong>
5 constraint disablement processes for STEWART.PRODUCT_DIM executed
1 index and 0 local index partitions affected on table STEWART.PRODUCT_DIM
<strong>Number of SCD1 attributes updated with a MERGE in STEWART.PRODUCT_DIM: 0
Number of records merged into STEWART.PRODUCT_DIM: 82</strong>
1 index rebuild process for table STEWART.PRODUCT_DIM executed
5 constraint enablement processes for STEWART.PRODUCT_DIM executed
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>It&#8217;s not an overly impressive test case because, as I mentioned above, there are no historical rows in this table at all: PROD_VALID is &#8216;Y&#8217; for every record. Transcend brings all the current dimension rows into the working set for the following reasons: the effective dates and the current indicator might have to be modified, and all the current Type 2 attributes have to be in the working set to be able to determine whether there have been Type 2 changes. But before this, a separate MERGE statement executes to update all the Type 1 changes for the historical rows. A Type 1 change is seen as a correction, and requires updating ALL the rows in the dimension table for that entity, even ones that may be years old.</p>
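<p>The set-based analytics described above can be pictured roughly as follows. This is not the statement Transcend generates, only a simplified sketch of the technique: it derives the expiration date and current indicator from the order of effective dates, and it does not try to detect whether a Type 2 attribute actually changed:</p>
<pre>-- Illustration only: current dimension rows plus incoming rows,
-- evaluated together in one pass with analytic functions.
select prod_id,
       prod_name,
       prod_eff_from,
       -- the next version's start date closes this version;
       -- the latest version gets the open-ended date
       lead(prod_eff_from, 1, date '9999-12-31')
         over (partition by prod_id order by prod_eff_from) as prod_eff_to,
       -- only the latest version per natural key is current
       case
         when row_number()
                over (partition by prod_id order by prod_eff_from desc) = 1
         then 'Y'
         else 'N'
       end as prod_valid
  from ( select prod_id, prod_name, prod_eff_from
           from stewart.product_dim
          where prod_valid = 'Y'
         union all
         select prod_id, prod_name, prod_eff_from
           from staging.product_src );</pre>
<p>Because every row for a natural key is visible to the window functions at once, no row-by-row lookups or updates are needed to work out the date ranges.</p>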
<p>Now, I&#8217;ll reset the test case in preparation for using a PARTITION EXCHANGE instead of a MERGE. This will issue two insert statements: one identical to the insert above, and another that brings all the old dimension rows into the intermediate table as well. This is because we are replacing the PRODUCT_DIM table with a brand new version of the table. Exchanges are handy in cases where 100% uptime is required, and users can&#8217;t afford to be down for even the amount of time it takes to load dimension tables. Partition exchanges don&#8217;t disrupt queries to the dimension tables, as the load is done to the intermediate table while the dimension is still available. When that load is complete, the dimension table is just swapped out for the new version of the table in an instantaneous dictionary update. Transcend makes this possible by creating the intermediate table with a single partition using MAXVALUE so that the one partition contains all data, and swapping that single partition with the dimension table. Note, however, that we still have to process all the Type 1 updates to the historical rows the same as we did before. However, we do that in the intermediate table so it is done prior to being exchanged into the dimension table:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_dimension
  3     ( p_mapping        =&gt; 'map_product_dim',
  4       p_replace_method =&gt; 'exchange'
  5     );
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL&gt; exec trans_etl.start_mapping( 'map_product_dim' );
Pre-mapping processes beginning
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( 'map_product_dim' );
Post-mapping processes beginning
Table STAGING.STG$PRODUCT_DIM created
<strong>Number of history records inserted into STAGING.STG$PRODUCT_DIM: 0
Number of records processed with analytics into STAGING.STG$PRODUCT_DIM: 82
Number of SCD1 attributes updated with a MERGE in STAGING.STG$PRODUCT_DIM: 10</strong>
Statistics from STEWART.PRODUCT_DIM transferred to partition PMAX of STAGING.STG$PRODUCT_DIM
1 index creation process executed for STAGING.STG$PRODUCT_DIM
5 constraints built for STAGING.STG$PRODUCT_DIM
<strong>STEWART.PRODUCT_DIM exchanged for partition PMAX of table STAGING.STG$PRODUCT_DIM</strong>
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>Again&#8230; since there are no historical records in the PRODUCT_DIM table, the first insert brought over no records.</p>
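<p>For readers who have not used the technique, the single-partition exchange can be sketched in plain Oracle DDL as below. These are not the statements Transcend issues. The partitioning column (PRODUCT_KEY) is an assumption, and INCLUDING INDEXES assumes the intermediate table has local indexes matching those on the dimension table:</p>
<pre>-- Empty intermediate table with one catch-all partition.
create table staging.stg$product_dim
partition by range (product_key)
( partition pmax values less than (maxvalue) )
as
select * from stewart.product_dim where 1 = 0;

-- ... load STAGING.STG$PRODUCT_DIM while users keep querying the dimension ...

-- Swap the loaded partition with the live dimension table.
alter table staging.stg$product_dim
  exchange partition pmax
  with table stewart.product_dim
  including indexes
  without validation;</pre>
<p>The exchange only updates the data dictionary, so the new rows appear in STEWART.PRODUCT_DIM without any data being copied at swap time.</p>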
<p>And in case you aren&#8217;t confused enough as is&#8230; Transcend supports late-arriving dimensions as well. In the default mode, if you get an incoming dimension row that has an effective date before the effective date of the current row for that dimensional entity, then the PROD_EFF_TO and PROD_EFF_FROM date ranges, as well as the tracking of SCD 2 attributes, won&#8217;t work correctly. But if we turn on the late-arriving feature, Transcend will handle this seamlessly. It sets the effective dates and current indicator correctly over the entire range of data, but more importantly&#8230; it will represent the history of SCD 2 attributes correctly regardless of which order the rows came in. The complex analytics statement used to evaluate Type 1 and Type 2 attributes, as well as setting effective dates and current indicators, will process the entire dimension table along with the incoming rows. Though this sounds like it would be excruciatingly slow, the fact is that it isn&#8217;t. Compared to the typical row-by-row approach that ETL tools take, requiring lots and lots of updates, a pure set-based process, even if it reads the entire dimension table, is generally the winner. But remember, this is only required if late-arriving dimensions are a possibility. However, for many clients, the processing of the entire table combined with a partition exchange outperforms the MERGE approach of just loading the current records. Updates are expensive, folks.</p>
<p>You can see that the process is down to a single insert again:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_dimension
  3     ( p_mapping       =&gt; 'map_product_dim',
  4       p_late_arriving =&gt; 'yes'
  5     );
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL&gt; exec trans_etl.start_mapping( 'map_product_dim' );
Pre-mapping processes beginning
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( 'map_product_dim' );
Post-mapping processes beginning
Table STAGING.STG$PRODUCT_DIM created
<strong>Number of records processed with analytics into STAGING.STG$PRODUCT_DIM: 82</strong>
Statistics from STEWART.PRODUCT_DIM transferred to partition PMAX of STAGING.STG$PRODUCT_DIM
1 index creation process executed for STAGING.STG$PRODUCT_DIM
5 constraints built for STAGING.STG$PRODUCT_DIM
<strong>STEWART.PRODUCT_DIM exchanged for partition PMAX of table STAGING.STG$PRODUCT_DIM</strong>
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>The moral of this story: if you have any Type 1 attributes, don&#8217;t immediately go for the default MERGE approach with its multiple updates. Give the late-arriving approach a try, even if you don&#8217;t have any late-arriving dimensions. You might see the single insert statement outperform all the rest.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/12/transcend-part-2-dimensions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transcend Part 1: Mappings</title>
		<link>http://www.rittmanmead.com/2010/12/transcend-part-1-mappings/</link>
		<comments>http://www.rittmanmead.com/2010/12/transcend-part-1-mappings/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 04:28:12 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>
		<category><![CDATA[Rittman Mead]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=6632</guid>
		<description><![CDATA[If you read the blog regularly, then you know that Rittman Mead offers a product called Transcend that is designed to interact with the Oracle Database to make some of the heavy-lifting in ETL mappings very simple. Because we recently released version 2.5 of Transcend, I thought I&#8217;d talk about some of the features I [...]]]></description>
			<content:encoded><![CDATA[<p>If you read the blog regularly, then you know that Rittman Mead offers a product called Transcend that is designed to interact with the Oracle Database to make some of the heavy-lifting in ETL mappings very simple. Because we recently released version 2.5 of Transcend, I thought I&#8217;d talk about some of the features I haven&#8217;t blogged about before. In the next few posts, I&#8217;ll talk about &#8220;Mapping&#8221; entities, &#8220;Dimension&#8221; entities, and how these can be integrated with standard ETL tools like OWB and ODI. In the current post I&#8217;ll mainly be talking about Mapping entities.</p>
<p>When you think of a Mapping in Transcend, try not to think about the common source-to-target functionality that you would create in an ETL tool or a custom mapping. Instead, try to consider EVERYTHING ELSE that you might need to couple with the mapping that ETL tools don&#8217;t provide: <a href="http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/">index maintenance</a>, <a href="http://www.rittmanmead.com/2009/12/29/transcend-and-constraint-maintenance/">constraint maintenance</a>, <a href="http://www.rittmanmead.com/2010/03/23/transcend-and-segment-switching/">segment-switching</a>, etc. If you look at the above posts, you will see a lot of features that can be implemented with a series of calls. That&#8217;s not difficult, but it&#8217;s still code, right? But what if you could &#8220;bundle&#8221; all of the necessary calls that need to be made BEFORE and AFTER a mapping into one easy name, so that you could pass that name to just two calls&#8211;one before the mapping and one after&#8211;to get all the necessary features needed to roll that mapping? Well&#8230; then you&#8217;d have a Transcend Mapping!</p>
<p>In addition to all the heavy-lifting, we wanted to make sure Transcend did the simple things right, so we first created an instrumentation framework that could be used consistently throughout the product. This framework is affectionately called Evolve and includes auditing, logging (with different levels), process registration with the Oracle Database, and a DEBUG mode. So the very basic skeleton of a Mapping in Transcend is the ability to log messages in a logging table, set the MODULE and ACTION contexts with the database, and capture exception errors all the way back to their source. To put together a standard mapping, I would make the following call:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.create_mapping( p_mapping =&gt; 'map_sales_fact' );
  3  END;
  4  /

PL/SQL procedure successfully completed.

SQL&gt;</pre>
<p>Now, whenever I execute the MAP_SALES_FACT mapping in either my ETL tool or my custom ETL processing, I need to make a single call before the mapping runs, and a single call after:</p>
<pre>SQL&gt; exec trans_etl.start_mapping( p_mapping =&gt; 'map_sales_fact' );
Pre-mapping processes beginning
Pre-mapping processes completed

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL&gt; -- here you would execute the mapping
SQL&gt; -- notice that the session has been instrumented;
SQL&gt; select SYS_CONTEXT( 'USERENV', 'MODULE' ) module,
  2         SYS_CONTEXT( 'USERENV', 'ACTION' ) action
  3    from dual;

MODULE                         | ACTION
------------------------------ | --------------------------------
mapping map_sales_fact         | execute mapping

1 row selected.

Elapsed: 00:00:00.00
SQL&gt; exec trans_etl.end_mapping( p_mapping =&gt; 'map_sales_fact' );
Post-mapping processes beginning
Post-mapping processes completed

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL&gt; </pre>
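<p>The MODULE and ACTION values in the query above are ordinary session attributes, so the same kind of instrumentation can be sketched with standard Oracle packages. This is not Transcend&#8217;s internal code, and the service name &#8216;orcl&#8217; is a placeholder to replace with the real one:</p>
<pre>-- Register the session the way an instrumented mapping might.
begin
   dbms_application_info.set_module
   ( module_name =&gt; 'mapping map_sales_fact',
     action_name =&gt; 'execute mapping'
   );
end;
/

-- From another session: find the running mapping.
select sid, serial#, module, action
  from v$session
 where module = 'mapping map_sales_fact';

-- Trace every session that matches this MODULE and ACTION.
-- 'orcl' is a placeholder service name.
begin
   dbms_monitor.serv_mod_act_trace_enable
   ( service_name =&gt; 'orcl',
     module_name  =&gt; 'mapping map_sales_fact',
     action_name  =&gt; 'execute mapping',
     waits        =&gt; true,
     binds        =&gt; false
   );
end;
/</pre>
<p>Tracing by MODULE and ACTION means the trace can be enabled before the mapping starts, without knowing in advance which session will run it.</p>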
<p>So at the very least, Transcend provides the ability to instrument the mapping so the session can be identified easily in V$SESSION, as well as setting a consistent environment for tracing sessions with <a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_monitor.htm#ARPLS091">DBMS_MONITOR</a>, which allows for tracing sessions based on combinations of MODULE and ACTION. But there is a lot more we can do with Transcend. We can easily incorporate index maintenance by changing our configuration slightly. All the features previously described around <a href="http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/">index maintenance</a> can be turned on and off with a simple configuration change (NOTE: the &#8216;both&#8217; parameter specifies that we want to mark indexes UNUSABLE before the mapping and make them USABLE again after the mapping. We could have also passed &#8220;unusable&#8221;, &#8220;usable&#8221; or &#8220;ignore&#8221;):</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_mapping( p_mapping         =&gt; 'map_sales_fact',
  3                               p_table           =&gt; 'sales_fact',
  4                               p_owner           =&gt; 'stewart',
  5                               p_indexes         =&gt; 'both',
  6                               p_index_type      =&gt; 'bitmap',
  7                               p_idx_concurrency =&gt; 'yes'
  8                             );
  9  END;
 10  /

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>Now I simply execute the exact same calls I made before, and the new functionality is picked up:</p>
<pre>SQL&gt; exec trans_etl.start_mapping( p_mapping =&gt; 'map_sales_fact' );
Pre-mapping processes beginning
5 indexes and 0 local index partitions affected on table STEWART.SALES_FACT
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( p_mapping =&gt; 'map_sales_fact' );
Post-mapping processes beginning
Rebuild processes for unusable indexes on 28 partitions of table STEWART.SALES_FACT submitted to the Oracle scheduler
No matching unusable global indexes found
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt;  </pre>
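<p>Done by hand, the index maintenance that the two calls above wrap would look something like the sketch below. The index names are hypothetical, and Transcend additionally works out which indexes and partitions are affected and can submit the rebuilds to the scheduler, none of which is shown here:</p>
<pre>-- Hypothetical index names on STEWART.SALES_FACT.
-- Before the load: stop maintaining the indexes during DML.
alter index stewart.sales_fact_prod_bix unusable;   -- local bitmap index
alter index stewart.sales_fact_cust_idx unusable;   -- non-partitioned index

-- ... the mapping loads STEWART.SALES_FACT here ...

-- After the load: a local index is rebuilt one partition at a time,
alter index stewart.sales_fact_prod_bix
  rebuild partition sales_q4_2003;

-- while a non-partitioned index is rebuilt as a whole.
alter index stewart.sales_fact_cust_idx rebuild;</pre>
<p>Rebuilding once after the load is usually cheaper than maintaining bitmap indexes row by row during it, which is the point of the unusable and rebuild pattern.</p>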
<p>Additionally, all the constraint maintenance features described <a href="http://www.rittmanmead.com/2009/12/29/transcend-and-constraint-maintenance/">here</a> can be implemented with another call to MODIFY_MAPPING:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_mapping( p_mapping         =&gt; 'map_sales_fact',
  3                               p_constraints     =&gt; 'both',
  4                               p_constraint_type =&gt; 'P',
  5                               p_con_concurrency =&gt; 'no'
  6                             );
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL&gt; exec trans_etl.start_mapping( p_mapping =&gt; 'map_sales_fact' );
Pre-mapping processes beginning
5 indexes and 0 local index partitions affected on table STEWART.SALES_FACT
1 constraint disablement process for STEWART.SALES_FACT executed
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( p_mapping =&gt; 'map_sales_fact' );
Post-mapping processes beginning
Rebuild processes for unusable indexes on 28 partitions of table STEWART.SALES_FACT submitted to the Oracle scheduler
No matching unusable global indexes found
1 constraint enablement process for STEWART.SALES_FACT executed
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; </pre>
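<p>The constraint side of the same pattern can be sketched the same way. The constraint name is hypothetical, and KEEP INDEX is an optional choice made here so that the primary key index is not dropped along with the constraint:</p>
<pre>-- Hypothetical primary key name on STEWART.SALES_FACT.
-- Before the load: disable the constraint but keep its index.
alter table stewart.sales_fact
  disable constraint sales_fact_pk keep index;

-- ... the mapping loads STEWART.SALES_FACT here ...

-- After the load: re-enable and validate the constraint.
alter table stewart.sales_fact
  enable constraint sales_fact_pk;</pre>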
<p>And finally, we can implement <a href="http://www.rittmanmead.com/2010/03/23/transcend-and-segment-switching/">segment-switching</a> using partition exchanges, table renames&#8230; and now a new option in 2.5: MERGE statements. As described in the above post, Transcend Mappings can provide the functionality to facilitate getting the rows from one segment into another. So you could create a mapping that loads all the new rows for a fact table, and propagate those changes into another table using a trio of possibilities. Formerly, only the options to do this with partition exchanges and table renames made sense, because those were the most difficult to reproduce in an ETL tool, and thus required custom-coding. As I&#8217;ve worked with numerous clients since writing the initial version of Transcend, I&#8217;ve discovered that some of the non-Oracle ETL tools have a tough time replicating the functionality of a MERGE statement. So now we can MERGE all the rows from one segment into another at the end of an ETL mapping by simply configuring this feature with Transcend:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_mapping( p_mapping         =&gt; 'map_sales_fact',
  3                               p_replace_method  =&gt; 'merge',
  4                               p_staging_owner   =&gt; 'stewart',
  5                               p_staging_table   =&gt; 'sales_stg'
  6                             );
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL&gt; exec trans_etl.start_mapping( p_mapping =&gt; 'map_sales_fact' );
Pre-mapping processes beginning
0 indexes and 80 local index partitions affected on table STEWART.SALES_FACT
1 constraint disablement process for STEWART.SALES_FACT executed
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( p_mapping =&gt; 'map_sales_fact' );
Post-mapping processes beginning
Number of records merged into STEWART.SALES_FACT: 918843
Rebuild processes for unusable indexes on 16 partitions of table STEWART.SALES_FACT submitted to the Oracle scheduler
No matching unusable global indexes found
1 constraint enablement process for STEWART.SALES_FACT executed
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; </pre>
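<p>For anyone who has not written one by hand, the statement that the &#8220;merge&#8221; option saves you from coding looks roughly like the sketch below. This is not the SQL that Transcend generates (that is not shown here). It assumes both tables carry the SH.SALES column list, and the columns in the ON clause are an assumption as well: a real mapping would match on whatever key identifies a fact row, and each target row must match at most one staging row.</p>
<pre>-- Hand-written sketch only; the ON columns are assumed, not taken from Transcend
MERGE INTO stewart.sales_fact t
USING stewart.sales_stg s
   ON (    t.prod_id    = s.prod_id
       AND t.cust_id    = s.cust_id
       AND t.time_id    = s.time_id
       AND t.channel_id = s.channel_id
       AND t.promo_id   = s.promo_id )
 WHEN MATCHED THEN
      -- existing fact rows pick up the staged measure values
      UPDATE SET t.quantity_sold = s.quantity_sold,
                 t.amount_sold   = s.amount_sold
 WHEN NOT MATCHED THEN
      -- new fact rows are inserted as-is from staging
      INSERT ( prod_id, cust_id, time_id, channel_id, promo_id,
               quantity_sold, amount_sold )
      VALUES ( s.prod_id, s.cust_id, s.time_id, s.channel_id, s.promo_id,
               s.quantity_sold, s.amount_sold );</pre>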
<p>And we can still use the segment-switching options from the previous version, including the popular &#8220;exchange&#8221; option:</p>
<pre>SQL&gt; BEGIN
  2     trans_adm.modify_mapping( p_mapping         =&gt; 'map_sales_fact',
  3                               p_replace_method  =&gt; 'exchange'
  4                             );
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL&gt; exec trans_etl.start_mapping( p_mapping =&gt; 'map_sales_fact' );
Pre-mapping processes beginning
Pre-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; -- here you would execute the mapping
SQL&gt; exec trans_etl.end_mapping( p_mapping =&gt; 'map_sales_fact' );
Post-mapping processes beginning
6 index creation processes submitted to the Oracle scheduler for STEWART.SALES_STG
Creation of constraint SALES_STG_PK executed
1 constraint built for STEWART.SALES_STG
STEWART.SALES_STG exchanged for partition SALES_Q4_2003 of table STEWART.SALES_FACT
1 constraint dropped on STEWART.SALES_STG
6 indexes dropped on STEWART.SALES_STG
Post-mapping processes completed

PL/SQL procedure successfully completed.

SQL&gt; </pre>
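<p>The exchange step reported in that output corresponds to a statement along the following lines. Treat it as a minimal sketch: the table and partition names come from the log above, but the INCLUDING INDEXES and WITHOUT VALIDATION clauses are an assumption about a typical exchange, not something the log confirms.</p>
<pre>-- Sketch of a manual partition exchange; the clauses after WITH TABLE are assumed
ALTER TABLE stewart.sales_fact
   EXCHANGE PARTITION sales_q4_2003
   WITH TABLE stewart.sales_stg
   INCLUDING INDEXES
   WITHOUT VALIDATION;</pre>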
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/12/transcend-part-1-mappings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Partitioning Fact Tables, Part 1</title>
		<link>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/</link>
		<comments>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 00:39:09 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=5245</guid>
		<description><![CDATA[I&#8217;m dogmatic about certain aspects of data warehousing. For instance, fact tables should be range partitioned by DATE. I tell my clients all the time: you will have a very difficult time persuading me otherwise. But they always try: they argue about all the attributes that are more pervasive than DATE: customer classes, transaction types, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m dogmatic about certain aspects of data warehousing. For instance, fact tables should be range partitioned by DATE. I tell my clients all the time: you will have a very difficult time persuading me otherwise. But they always try: they argue about all the attributes that are more pervasive than DATE: customer classes, transaction types, etc., etc. But I&#8217;m just not buying it. We are building data warehouses, and the <a href="http://intelligent-enterprise.informationweek.com/030422/607warehouse1_1.jhtml?_requestid=5194">third rail</a> of the Soul of the Data Warehouse is how it handles time.</p>
<p>If you agree with me about this precept (and I really think you should), this is still not the end of the story. We must charge ahead into the lion&#8217;s den of a debate that has been raging in the Oracle data warehousing world for years: do we make the surrogate key of our date dimension a NUMBER, or do we make it a DATE? It&#8217;s funny&#8230; I remember this being the first question I ever posed to Mark years and years ago, and he did a blog entry that evolved out of our email communication. I don&#8217;t see the entry on the blog any more&#8230; it must have been lost in <a href="http://www.rittmanmead.com/category/the-great-blog-disaster/">The Great Blog Disaster</a>. Pity.</p>
<p>The choice between NUMBER and DATE bubbles up from the two streams at work in the Oracle Data Warehousing community: the data warehousing folks, and the Oracle folks. <a href="http://www.ralphkimball.com/">Ralph Kimball </a> argues that the surrogate key of the date dimension should be numeric. In the <a href="http://www.ralphkimball.com/html/booksDWLT2.html">Data Warehouse Lifecycle Toolkit</a> book (or at least, in my edition of it), Kimball basically makes the argument that numbers require less space than dates. That one never did too much for me. However, in his <a href="http://www.kimballgroup.com/html/designtipsPDF/KimballDT51LatestThinking.pdf">Latest Thinking on Time Dimension Tables</a> design tip, he makes a better argument: if our surrogate key is a DATE, then how do we handle &#8220;Not Applicable&#8221; type rows? This one has teeth, and I think that most designers who struggle with this decision point to this issue. If we use an actual DATE as our surrogate key, then what value can we use that actually means &#8220;no date at all&#8221;?</p>
<p>Oracle experts like <a href="http://asktom.oracle.com">Tom Kyte</a> argue that <a href="http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4632159445946">&#8220;dates belong in DATES&#8221;</a>. (If you look really hard at this post, you can see a younger and more naive version of myself weighing in on the debate&#8230; and also, apparently, not knowing how to gather histograms with DBMS_STATS. Oh well.) As Tom demonstrates on that post, the optimizer just plain works better when dates are stored in DATE datatypes.</p>
<p>I&#8217;ve typically been on Kyte&#8217;s side in this debate, both from a performance and a maintenance perspective. I&#8217;ve parted ways with Kimball on this point and urged my clients to build date dimensions with DATE surrogate keys, calling the column something like DATE_KEY. For the &#8216;NA&#8217; types of dimension records, I use a wacky DATE value for DATE_KEY, such as '12/31/9999' or '01/01/0001'. Think of this as the equivalent of -1 if the surrogate key were actually numeric. Being a surrogate key&#8230; it really doesn&#8217;t matter what value it contains: we just need to know the column name so we can construct the correct JOIN syntax. Then, I&#8217;ll build another DATE column in the table called SQL DATE, and this is the one that I expose to the reporting layer. Since SQL DATE does not have to serve as the primary key, it&#8217;s fine for it to be a NULL if desired.</p>
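<p>A minimal sketch of that design follows. The table names DATE_DIM and SALES_FACT, the DAY_NAME column and the partition names are hypothetical, and the reporting column is written as SQL_DATE with an underscore so that it is a legal unquoted identifier; only DATE_KEY and the &#8216;NA&#8217; key value come from the description above.</p>
<pre>-- Sketch only: table, column and partition names are illustrative
CREATE TABLE date_dim
( date_key   DATE         NOT NULL,   -- surrogate key, stored as a real DATE
  sql_date   DATE,                    -- exposed to reporting; NULL for the NA row
  day_name   VARCHAR2(10),
  CONSTRAINT date_dim_pk PRIMARY KEY (date_key)
);

-- The NA row: a wacky key value with no reportable date
INSERT INTO date_dim (date_key, sql_date, day_name)
VALUES (TO_DATE('12/31/9999','mm/dd/yyyy'), NULL, 'NA');

-- A regular row: key and reportable date hold the same value
INSERT INTO date_dim (date_key, sql_date, day_name)
VALUES (DATE '2010-08-13', DATE '2010-08-13', 'Friday');

-- Fact table range partitioned on the DATE surrogate key
CREATE TABLE sales_fact
( date_key     DATE NOT NULL REFERENCES date_dim (date_key),
  amount_sold  NUMBER(10,2)
)
PARTITION BY RANGE (date_key)
( PARTITION sales_2010_q3 VALUES LESS THAN (DATE '2010-10-01'),
  PARTITION sales_max     VALUES LESS THAN (MAXVALUE)
);</pre>
<p>With this layout, fact rows that point at the NA date fall into the catch-all partition, while queries that filter on real dates can still prune partitions on DATE_KEY.</p>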
<p>In subsequent posts, I&#8217;ll examine new partitioning features in 11g, including interval partitioning (which Pete Scott recently <a href="http://www.rittmanmead.com/2010/08/07/more-on-interval-partitioning/">blogged</a> about) and reference partitioning, and whether these enhancements provide more options for this historically binary choice.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Oracle OLAP 11g &#8211; Reporting in Excel using Simba MDX OLE-DB Provider</title>
		<link>http://www.rittmanmead.com/2010/08/oracle-olap-11g-reporting-in-excel-using-simba-mdx-ole-db-provider/</link>
		<comments>http://www.rittmanmead.com/2010/08/oracle-olap-11g-reporting-in-excel-using-simba-mdx-ole-db-provider/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 10:05:00 +0000</pubDate>
		<dc:creator>Venkatakrishnan J</dc:creator>
				<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle OLAP]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=5238</guid>
		<description><![CDATA[If you read my earlier blog entry here, you will have seen a way of reporting on Oracle OLAP 11g using the newly introduced Essbase XOLAP. As mentioned there, one of the biggest advantages of using Essbase is its tight integration with Excel through Smart View. Unfortunately, in the case of Oracle OLAP, the Excel [...]]]></description>
			<content:encoded><![CDATA[<p>If you read my earlier blog entry <a href="http://www.rittmanmead.com/2010/04/19/oracle-epm-11-1-1-3-oracle-olap-11g-reporting-on-oracle-olap-using-essbase-excel-add-insmartview-xolap/" target="_blank">here</a>, you will have seen a way of reporting on Oracle OLAP 11g using the newly introduced Essbase XOLAP. As mentioned there, one of the biggest advantages of using Essbase is its tight integration with Excel through Smart View. Unfortunately, in the case of Oracle OLAP, the Excel add-ins were based on the BI Beans technology, which is more or less deprecated now. Also, the Excel add-ins for Oracle OLAP were not as powerful as the Smart View add-in. In my previous blog entry I showed how XOLAP interpreted the MDX fired from Visual Explorer/Smart View and converted it into the corresponding SQL calls to Oracle OLAP. The SQL statements generated by XOLAP were OLAP-aware, i.e. multiple SQL statements were generated to hit the correct pre-aggregated intersections rather than doing aggregations through SQL. There are two big drawbacks with this approach:</p>
<p>1. It required an Essbase License<br />
2. Any change to the OLAP metadata required an XOLAP cube rebuild within Essbase</p>
<p>Some time <a href="http://www.oracle.com/us/corporate/press/036550" target="_blank">last year</a>, <a href="http://www.simba.com" target="_blank">Simba Technologies</a> announced an MDX OLE-DB provider for Oracle OLAP. Until now I had not had an opportunity to test it, though it looked promising. A couple of weeks back we got an evaluation copy from Simba to test the driver (my thanks to Simba and their Oracle OLAP &#8211; MDX provider team for providing it). This driver lets Excel users leverage the power of Oracle OLAP using Excel pivot tables, charts and so on. At a high level, the driver works as follows:</p>
<p>1. End users use native Excel functionality to create charts, pivot tables etc.<br />
2. The charts and pivot tables generate MDX (standard OLE-DB based Microsoft MDX)<br />
3. The Simba driver then converts the MDX to one or more SQL calls to Oracle OLAP</p>
<p>In this blog entry we will see how this driver works. The install process is quite straightforward: we are taken through a set of steps that set up the OLE-DB driver. The driver works only with Oracle OLAP versions 11.1.0.7 or above. We then start by setting up a DSN to connect to the Oracle OLAP database. Ensure that the Oracle client driver used in the DSN is at least version 11.1.0.7.</p>
<p style="clear: both"><img style="text-align: center; display: block; margin: 0pt auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_6-thumb.png" alt="" width="380" height="247" /></p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_7-thumb.png" alt="" width="380" height="227" />Once this is set up, use the Data Connection Wizard in Excel to set up an OLE-DB connection through the Simba MDX driver.</p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_8-thumb.png" alt="" width="380" height="271" /></p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_9-thumb.png" alt="" width="380" height="234" /></p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_10-thumb.png" alt="" width="368" height="465" />This should automatically connect us to the Oracle OLAP schemas.</p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_11-thumb.png" alt="" width="367" height="420" /></p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_12-thumb.png" alt="" width="380" height="270" />As you can see, we can save a connection to a cube, and the same connection can be reused later for creating more reports. Let&#8217;s start by creating a simple pivot table report (using native Microsoft Excel pivot tables).</p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_13-thumb.png" alt="" width="289" height="243" />After this, notice that the Oracle OLAP metadata is now exposed within the pivot table member selection panels. This is very similar to Hyperion Visual Explorer, where we are shown all the levels in a dimension and all the hierarchies as well. We can pick a specific level, or we can choose members from multiple levels by applying the appropriate filters. Let&#8217;s create a very simple report as shown below.</p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_14-thumb.png" alt="" width="242" height="521" /></p>
<p style="clear: both"><img style=" text-align: center; display: block; margin: 0 auto 10px;" src="http://www.rittmanmead.com/wp-content/uploads/2010/08/Picture_15-thumb.png" alt="" width="364" height="305" />As you can see, we now have the ability to drill using the native MS pivot table functionality. Let&#8217;s look at the MDX fired to generate the above report.</p>
<pre>SELECT
{[MEASURES].[SALES],[MEASURES].[SALES_YTD]}
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME ON COLUMNS ,
NON EMPTY
CrossJoin(Hierarchize({DrilldownLevel({[TIME].[CALENDAR].[ALL_YEARS].[ALL_YEARS]})}),
Hierarchize({DrilldownLevel({[PRODUCT].[STANDARD].[ALL_PRODUCTS].[ALL_PRODUCTS]})})) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,
[TIME].[CALENDAR].[CALENDAR_YEAR].[CALENDAR_YEAR_END_DATE],
[TIME].[CALENDAR].[CALENDAR_YEAR].[CALENDAR_YEAR_TIME_SPAN],
[TIME].[CALENDAR].[CALENDAR_YEAR].[CALENDAR_YEAR_LONG_DESCR],
[TIME].[CALENDAR].[CALENDAR_YEAR].[CALENDAR_YEAR_SHORT_DESC],
[TIME].[CALENDAR].[CALENDAR_YEAR].[END_DATE],
[TIME].[CALENDAR].[CALENDAR_YEAR].[TIME_SPAN],
[TIME].[CALENDAR].[CALENDAR_YEAR].[LONG_DESCRIPTION],
[PRODUCT].[STANDARD].[DEPARTMENT].[DEPARTMENT_LONG_DESCRIPT],
[PRODUCT].[STANDARD].[DEPARTMENT].[DEPARTMENT_SHORT_DESCRIP],
[PRODUCT].[STANDARD].[DEPARTMENT].[LONG_DESCRIPTION] ON ROWS
FROM
[SALES_CUBE] CELL PROPERTIES VALUE, FORMAT_STRING,
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS,2</pre>
<p style="clear: both">As you can see, the MDX retrieves all the necessary Oracle OLAP level properties as MDX intrinsic properties. This is very interesting. The MDX driver basically does a metadata-level mapping between MDX and Oracle OLAP. I am not sure how much of this is documented (in terms of the MDX-to-SQL conversion calls), but again this looks very promising. Now let&#8217;s look at the SQL that is fired back to Oracle OLAP. The driver can generate multiple SQL queries for a single MDX call. This is very similar to BI EE 11g (which I shall cover later, once BI EE 11g is GA), where drilling to multiple levels generates multiple SQL calls.</p>
<p style="clear: both">The first two SQL statements generated (for this report) construct the metadata, or the member list, for the dimensions that are part of the query:</p>
<pre>SELECT
'OLAPTRAIN' AS CATALOG_NAME,
'SALES_CUBE' AS CUBE_NAME,
members.DEPTH AS LEVEL_NUMBER,
members.HIER_ORDER AS MEMBER_ORDINAL,
members.DIM_KEY AS MEMBER_NAME,
1 AS MEMBER_TYPE,
SHORT_DESCRIPTION AS MEMBER_CAPTION,
1 AS CHILDREN_CARDINALITY,
CASE
WHEN (members.PARENT IS NULL) THEN NULL
ELSE members.DEPTH-1
END AS PARENT_LEVEL,
CASE
WHEN members."CALENDAR_QUARTER" IS NOT NULL
AND members.LEVEL_NAME != 'CALENDAR_QUARTER'
THEN '[TIME].[CALENDAR].[CALENDAR_QUARTER].[' || members.PARENT || ']'
WHEN members."CALENDAR_YEAR" IS NOT NULL
AND members.LEVEL_NAME != 'CALENDAR_YEAR'
THEN '[TIME].[CALENDAR].[CALENDAR_YEAR].[' || members.PARENT || ']'
WHEN members."ALL_YEARS" IS NOT NULL
AND members.LEVEL_NAME != 'ALL_YEARS' THEN '[TIME].[CALENDAR].[ALL_YEARS].[' || members.PARENT || ']'
ELSE (CAST (NULL AS VARCHAR2(1)))
END AS PARENT_UNIQUE_NAME,
(CAST (NULL AS VARCHAR2(1))) AS DESCRIPTION
, members.CALENDAR_YEAR_END_DATE AS PROPERTY_4
, members.CALENDAR_YEAR_TIME_SPAN AS PROPERTY_5
, members.CALENDAR_YEAR_LONG_DESCR AS PROPERTY_6
, members.CALENDAR_YEAR_SHORT_DESC AS PROPERTY_7
, members.END_DATE AS PROPERTY_20
, members.TIME_SPAN AS PROPERTY_21
, members.LONG_DESCRIPTION AS PROPERTY_22
, 'TIME' AS DIMENSION_NAME
, 'CALENDAR' AS HIERARCHY_NAME
, members.LEVEL_NAME AS LEVEL_NAME
FROM
"OLAPTRAIN".TIME_CALENDAR_VIEW members
WHERE
members.LEVEL_NAME = 'CALENDAR_YEAR'
ORDER BY MEMBER_ORDINAL, PARENT_UNIQUE_NAME, MEMBER_NAME</pre>
<pre>SELECT
'OLAPTRAIN' AS CATALOG_NAME,
'SALES_CUBE' AS CUBE_NAME,
members.DEPTH AS LEVEL_NUMBER,
members.HIER_ORDER AS MEMBER_ORDINAL,
members.DIM_KEY AS MEMBER_NAME,
1 AS MEMBER_TYPE,
SHORT_DESCRIPTION AS MEMBER_CAPTION,
1 AS CHILDREN_CARDINALITY,
CASE
WHEN (members.PARENT IS NULL) THEN NULL
ELSE members.DEPTH-1
END AS PARENT_LEVEL,
CASE
WHEN members."COUNTRY" IS NOT NULL AND members.LEVEL_NAME != 'COUNTRY'
THEN '[GEOGRAPHY].[REGIONAL].[COUNTRY].[' || members.PARENT || ']'
WHEN members."REGION" IS NOT NULL AND members.LEVEL_NAME != 'REGION'
THEN '[GEOGRAPHY].[REGIONAL].[REGION].[' || members.PARENT || ']'
WHEN members."ALL_REGIONS" IS NOT NULL AND members.LEVEL_NAME != 'ALL_REGIONS'
THEN '[GEOGRAPHY].[REGIONAL].[ALL_REGIONS].[' || members.PARENT || ']'
ELSE (CAST (NULL AS VARCHAR2(1)))
END AS PARENT_UNIQUE_NAME,
(CAST (NULL AS VARCHAR2(1))) AS DESCRIPTION
, members.ALL_REGIONS_SHORT_DESCRI AS PROPERTY_9
, members.ALL_REGIONS_LONG_DESCRIP AS PROPERTY_10
, members.LONG_DESCRIPTION AS PROPERTY_12
, 'GEOGRAPHY' AS DIMENSION_NAME
, 'REGIONAL' AS HIERARCHY_NAME
, members.LEVEL_NAME AS LEVEL_NAME
FROM
"OLAPTRAIN".GEOGRAPHY_REGIONAL_VIEW members
WHERE
members.LEVEL_NAME = 'ALL_REGIONS'
ORDER BY MEMBER_ORDINAL, PARENT_UNIQUE_NAME, MEMBER_NAME</pre>
<p style="clear: both">Then the final query will be for generating the measure values.</p>
<pre>SELECT
SALES_CUBE_VIEW.SALES, SALES_CUBE_VIEW.TIME,
SALES_CUBE_VIEW.PRODUCT
FROM
"OLAPTRAIN".SALES_CUBE_VIEW SALES_CUBE_VIEW
WHERE
SALES_CUBE_VIEW.TIME IN ('ALL_YEARS', 'CY2008', 'CY2010','CY2007','CY2009' )
AND SALES_CUBE_VIEW.PRODUCT IN ('ALL_PRODUCTS', '-518', '-519', '-520' )
AND SALES_CUBE_VIEW.CHANNEL = 'ALL_CHANNELS'
AND SALES_CUBE_VIEW.GEOGRAPHY = 'ALL_REGIONS'</pre>
<p style="clear: both">If you look at all the queries, they are all OLAP-aware, i.e. default member filters are applied properly and no additional aggregation is pushed through SQL. This is very interesting, and for customers using Oracle OLAP this is one driver that can potentially be put to good use for Excel-based reporting.</p>
<p style="clear: both">Currently there appears to be no way to fire custom MDX queries through the Excel 2007 that I have. So I am not sure how the driver will behave when we push custom MDX aggregations like AGGREGATE, SUM etc. Also, I am not sure whether a mapping exists from all MDX functions (like intersect, union etc.) to corresponding OLAP SQL calls. But I was told that custom MDX functions should also work well; it is just a case of Excel 2007 not supporting custom MDX queries for the native pivot tables.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/08/oracle-olap-11g-reporting-in-excel-using-simba-mdx-ole-db-provider/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Realtime Data Warehouses</title>
		<link>http://www.rittmanmead.com/2010/04/realtime-data-warehouses/</link>
		<comments>http://www.rittmanmead.com/2010/04/realtime-data-warehouses/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 08:08:59 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle GoldenGate]]></category>
		<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4624</guid>
		<description><![CDATA[In a little over two weeks I will be giving my Real Time Data Warehousing talk at Collaborate 10 in Las Vegas. I will cover a variety of techniques and not just change data capture. However, in my opinion, CDC plays a major part in the practical implementation of a realtime data warehouse. It is [...]]]></description>
			<content:encoded><![CDATA[<p>In a little over two weeks I will be giving my Real Time Data Warehousing talk at <a href="http://collaborate10.ioug.org/Education/BIWATrainingDays/tabid/83/Default.aspx">Collaborate</a> 10 in Las Vegas. I will cover a variety of techniques and not just change data capture. However, in my opinion, CDC plays a major part in the practical implementation of a realtime data warehouse. It is not the whole story, as CDC is about data propagation; we still need an ETL component to consume those changes and publish them to the reporting layer of the DW. Depending on what needs to be done with the data (such as processing slowly changing dimensions and building aggregations), this step can add significantly to the latency of the event-to-publish process. I will cover this in my talk &#8211; and here on the Rittman Mead blog after Collaborate has finished.</p>
<p>No talk including Oracle CDC would be complete without mention of GoldenGate; Mark has recently <a href="/2010/03/22/configuring-odi-10-1-3-6-to-use-oracle-golden-gate-for-changed-data-capture/" target="_blank">blogged</a> about consuming GoldenGate captured data with ODI. For those that don&#8217;t know, GoldenGate was a company, since acquired by Oracle, that specialised in data replication; its technology is very fast, able to handle large data volumes, and able to support heterogeneous source and target platforms, in terms of both database and operating system. One combination I aim to try in a few weeks&#8217; time is pushing changes from an Oracle database into an Oracle TimesTen in-memory database.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/04/realtime-data-warehouses/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Transcend and Segment-Switching</title>
		<link>http://www.rittmanmead.com/2010/03/transcend-and-segment-switching/</link>
		<comments>http://www.rittmanmead.com/2010/03/transcend-and-segment-switching/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 19:57:17 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Rittman Mead]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4548</guid>
		<description><![CDATA[As you read this most recent post on the capabilities of Transcend, I&#8217;d like to pose a question to you: would you consider using Transcend in your environment if it was open-sourced? Currently, we implement Transcend as part of the Rittman Mead Rapid Deployment Framework for our clients, but have been considering opening it up [...]]]></description>
			<content:encoded><![CDATA[<p>As you read this most recent post on the capabilities of Transcend, I&#8217;d like to pose a question to you: would you consider using Transcend in your environment if it was open-sourced? Currently, we implement Transcend as part of the Rittman Mead Rapid Deployment Framework for our clients, but have been considering opening it up to the community. We&#8217;ve made no iron-clad decisions, but I&#8217;m curious if our readers have any opinions.</p>
<p>Transcend supports the notion of what I like to call &#8220;segment-switching&#8221;: table renames and partition exchanges. Each is suited for different purposes, though in some situations it boils down to preference. I&#8217;ll demonstrate both methods, and also some of the ancillary features of the product while setting up my test case.</p>
<p>First, I&#8217;ll create a version of the SH.SALES table in another schema to act as my fact table, keeping the partition information, adding the indexes, and transferring the statistics:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'sales_fact',
  4                            p_owner          =&gt; 'target',
  5                            p_source_table   =&gt; 'sales',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'yes',
  9                            p_rows           =&gt; 'yes',
 10                            p_indexes        =&gt; 'yes',
 11                            p_constraints    =&gt; 'no',
 12                            p_statistics     =&gt; 'transfer'
 13                          );
 14  END;
 15  /
Table TARGET.SALES_FACT created
Number of records inserted into TARGET.SALES_FACT: 918843
Statistics from SH.SALES transfered to TARGET.SALES_FACT
Index SALES_FACT_CHANNEL_BIX built
Index SALES_FACT_CUST_BIX built
Index SALES_FACT_PROD_BIX built
Index SALES_FACT_PROMO_BIX built
Index SALES_FACT_TIME_BIX built
5 index creation processes executed for TARGET.SALES_FACT

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>Now I&#8217;ll create a staging table based on the same SALES table, but this time, I won&#8217;t build the indexes:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'sales_stg',
  4                            p_owner          =&gt; 'target',
  5                            p_source_table   =&gt; 'sales',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'yes',
  9                            p_rows           =&gt; 'yes',
 10                            p_indexes        =&gt; 'no',
 11                            p_constraints    =&gt; 'no',
 12                            p_statistics     =&gt; 'no'
 13                          );
 14  END;
 15  /
Table TARGET.SALES_STG created
Number of records inserted into TARGET.SALES_STG: 918843

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>Now that I have the staging table containing the rows I want to put into the target table, I&#8217;ll use the REPLACE_TABLE procedure to interchange the two tables, handling all index and constraint maintenance. The P_IDX_CONCURRENCY and P_CON_CONCURRENCY parameters determine whether indexes and constraints will be built sequentially &#8212; one after another in a loop &#8212; or whether Transcend will submit the DDL statements to the Oracle Scheduler so that they can run concurrently, and then wait for them all to complete:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.replace_table(
  3                              p_owner           =&gt; 'target',
  4                              p_table           =&gt; 'sales_fact',
  5                              p_source_table    =&gt; 'sales_stg',
  6                              p_idx_concurrency =&gt; 'yes',
  7                              p_con_concurrency =&gt; 'yes',
  8                              p_statistics      =&gt; 'transfer'
  9                          );
 10  END;
 11  /
Statistics from TARGET.SALES_FACT transferred to TARGET.SALES_STG
Oracle scheduler job BUILD_INDEXES841 created
Oracle scheduler job BUILD_INDEXES841 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES842 created
Oracle scheduler job BUILD_INDEXES842 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES843 created
Oracle scheduler job BUILD_INDEXES843 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES844 created
Oracle scheduler job BUILD_INDEXES844 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES845 created
Oracle scheduler job BUILD_INDEXES845 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for TARGET.SALES_STG
No matching constraints found on TARGET.SALES_FACT
No matching constraints to drop found on TARGET.SALES_FACT
5 indexes dropped on TARGET.SALES_FACT
TARGET.SALES_STG and TARGET.SALES_FACT table names interchanged
Index TARGET.SALES_STG_CHANNEL_BIX renamed to SALES_FACT_CHANNEL_BIX
Index TARGET.SALES_STG_CUST_BIX renamed to SALES_FACT_CUST_BIX
Index TARGET.SALES_STG_TIME_BIX renamed to SALES_FACT_TIME_BIX
Index TARGET.SALES_STG_PROMO_BIX renamed to SALES_FACT_PROMO_BIX
Index TARGET.SALES_STG_PROD_BIX renamed to SALES_FACT_PROD_BIX

PL/SQL procedure successfully completed.

SQL&gt; select count(*) from target.sales_fact;

  COUNT(*)
----------
    918843

1 row selected.

SQL&gt; </pre>
<p>Notice that the parameter P_SOURCE_OWNER is not available in this procedure. That&#8217;s because table renames cannot be done across schemas. Also, none, one or both of the tables can be partitioned; Transcend adjusts the DDL for the indexes accordingly. Actually&#8230; the functionality is not unlike the <a href="http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10577/d_redefi.htm">DBMS_REDEFINITION </a>package&#8230; though DBMS_REDEFINITION has a lot more functionality. I actually have a task on the product roadmap to see about rewriting REPLACE_TABLE to use DBMS_REDEFINITION&#8230; but for another day.</p>
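<p>Done by hand, the interchange step in that output amounts to something like the following. This is a sketch of the general technique rather than Transcend&#8217;s actual code, and SALES_TMP is a hypothetical interim name. Note that the new name in RENAME TO cannot carry a schema prefix, which is the reason there is no P_SOURCE_OWNER parameter.</p>
<pre>-- Sketch of a manual table swap; SALES_TMP is a hypothetical interim name
ALTER TABLE target.sales_fact RENAME TO sales_tmp;
ALTER TABLE target.sales_stg  RENAME TO sales_fact;
ALTER TABLE target.sales_tmp  RENAME TO sales_stg;

-- Indexes keep their old names through a table rename, so each is renamed separately
-- (the old SALES_FACT indexes were dropped first, so the name is free)
ALTER INDEX target.sales_stg_time_bix RENAME TO sales_fact_time_bix;</pre>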
<p>The more meaningful segment-switching process is the good old-fashioned partition exchange. This is useful in lots of load scenarios, especially when a fact table is involved. Partition exchange loading allows maintenance tasks such as index rebuilds and constraint validation to occur on a staging table without affecting the actual reporting table, so report queries can continue to run while these tasks are being completed.</p>
<p>I&#8217;ll create a new non-partitioned table without rows, and then I&#8217;ll re-insert all the rows from the SH.SALES table, but I&#8217;ll adjust them so that they correspond to the highest partition in the SALES_FACT table: 2003Q4.</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'sales_stg',
  4                            p_owner          =&gt; 'stage',
  5                            p_source_table   =&gt; 'sales',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'no',
  9                            p_rows           =&gt; 'no',
 10                            p_indexes        =&gt; 'no',
 11                            p_constraints    =&gt; 'no',
 12                            p_statistics     =&gt; 'ignore'
 13                          );
 14  END;
 15  /
Table STAGE.SALES_STG created

PL/SQL procedure successfully completed.

SQL&gt; SELECT dbms_metadata.get_ddl('TABLE','SALES_STG','STAGE') from dual;

DBMS_METADATA.GET_DDL('TABLE','SALES_STG','STAGE')
-----------------------------------------------------------------------

  CREATE TABLE "STAGE"."SALES_STG"
   (	"PROD_ID" NUMBER,
	"CUST_ID" NUMBER,
	"TIME_ID" DATE,
	"CHANNEL_ID" NUMBER,
	"PROMO_ID" NUMBER,
	"QUANTITY_SOLD" NUMBER(10,2),
	"AMOUNT_SOLD" NUMBER(10,2)
   ) SEGMENT CREATION DEFERRED
  PCTFREE 5 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING
  TABLESPACE "USERS"

1 row selected.

SQL&gt; insert /*+ APPEND */ into stage.sales_stg
  2  select prod_id, cust_id, to_date('10/15/2003','mm/dd/yyyy'),
  3  channel_id, promo_id, quantity_sold, amount_sold
  4  from sh.sales;

918843 rows created.

SQL&gt; commit;

Commit complete.

SQL&gt; </pre>
<p>As the DBMS_METADATA function above shows, Transcend converted the DDL from the SH.SALES table to make it a non-partitioned table. Now, I&#8217;ll perform the partition exchange:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.exchange_partition(
  3                                   p_table           =&gt; 'sales_fact',
  4                                   p_owner           =&gt; 'target',
  5                                   p_source_table    =&gt; 'sales_stg',
  6                                   p_source_owner    =&gt; 'stage',
  7                                   p_idx_concurrency =&gt; 'yes',
  8                                   p_statistics      =&gt; 'transfer'
  9                                 );
 10  END;
 11  /
Statistics from partition SALES_Q4_2003 of TARGET.SALES_FACT transferred to STAGE.SALES_STG
Oracle scheduler job BUILD_INDEXES871 created
Oracle scheduler job BUILD_INDEXES871 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES872 created
Oracle scheduler job BUILD_INDEXES872 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES873 created
Oracle scheduler job BUILD_INDEXES873 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES874 created
Oracle scheduler job BUILD_INDEXES874 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES875 created
Oracle scheduler job BUILD_INDEXES875 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for STAGE.SALES_STG
No matching constraints found on TARGET.SALES_FACT
STAGE.SALES_STG exchanged for partition SALES_Q4_2003 of table TARGET.SALES_FACT
No matching constraints to drop found on STAGE.SALES_STG
5 indexes dropped on STAGE.SALES_STG

PL/SQL procedure successfully completed.

SQL&gt; select count(*) from target.sales_fact;

  COUNT(*)
----------
   1837686

1 row selected.

SQL&gt; </pre>
<p>You can see that Transcend handled all of the indexes and even built them concurrently in the background. You may notice that I didn&#8217;t even specify which partition I wanted to exchange the table with. I could have specified this with the P_PARTNAME parameter, but when it is left null, Transcend assumes that I want the highest partition.</p>
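<p>To check which partition counts as the highest one when the partition name is left null, a dictionary query along these lines will do. It is only an approximation of the lookup, not Transcend&#8217;s actual code:</p>
<pre>-- the highest partition is the one with the largest PARTITION_POSITION
SELECT partition_name
  FROM ( SELECT partition_name
           FROM all_tab_partitions
          WHERE table_owner = 'TARGET'
            AND table_name  = 'SALES_FACT'
          ORDER BY partition_position DESC )
 WHERE rownum = 1;</pre>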
<p>Partition-exchange loading is recognized as an effective method for loading fact tables&#8230; but can it be used for dimension tables as well? I would argue it can, and actually Transcend uses partition-exchange loading to perform hybrid SCD Type 1 and Type 2 loading techniques (more on that in a future post). Partition exchanges can provide the same high-availability and background load scenarios used for fact tables&#8230; but the issue here is that dimension data is generally not distributed evenly over time the way fact data is: dimension tables usually aren&#8217;t even partitioned, and when they are, it&#8217;s normally not by range.</p>
<p>What I often do in these scenarios is use single-partition tables, so the partitioned table is little more than a container for a max partition. I create the staging table as the single-partition table, and keep the dimension table unpartitioned. Transcend actually doesn&#8217;t care whether the source table or the target table is the partitioned one: it only requires that exactly one of the two is partitioned, and it will raise an error if this is not the case. So let&#8217;s build our dimension table and our single-partition staging table:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'customer_dim',
  4                            p_owner          =&gt; 'target',
  5                            p_source_table   =&gt; 'customers',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'no',
  9                            p_rows           =&gt; 'yes',
 10                            p_indexes        =&gt; 'yes',
 11                            p_constraints    =&gt; 'yes',
 12                            p_statistics     =&gt; 'transfer'
 13                          );
 14  END;
 15  /
Table TARGET.CUSTOMER_DIM created
Number of records inserted into TARGET.CUSTOMER_DIM: 55500
Statistics from SH.CUSTOMERS transferred to TARGET.CUSTOMER_DIM
Index CUSTOMER_DIM_GENDER_BIX built
Index CUSTOMER_DIM_MARITAL_BIX built
Index CUSTOMER_DIM_YOB_BIX built
Index CUSTOMER_DIM_PK built
4 index creation processes executed for TARGET.CUSTOMER_DIM
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of constraint CUSTOMER_DIM_COUNTRY_FK executed
Creation of constraint CUSTOMER_DIM_PK executed
17 constraints built for TARGET.CUSTOMER_DIM

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>I&#8217;ll have to create the single-partition table manually, because Transcend doesn&#8217;t have the functionality to convert DDL from a non-partitioned table to a partitioned one (yet). It really doesn&#8217;t matter which column is used as the partitioning column, since the partitioned table is really only a container for the max partition. So typically, I&#8217;ll use the primary key:</p>
<pre>SQL&gt; CREATE TABLE STAGE.CUSTOMER_STG
  2     (    CUST_ID NUMBER,
  3          CUST_FIRST_NAME VARCHAR2(20),
  4          CUST_LAST_NAME VARCHAR2(40),
  5          CUST_GENDER CHAR(1),
  6          CUST_YEAR_OF_BIRTH NUMBER(4,0),
  7          CUST_MARITAL_STATUS VARCHAR2(20),
  8          CUST_STREET_ADDRESS VARCHAR2(40),
  9          CUST_POSTAL_CODE VARCHAR2(10),
 10          CUST_CITY VARCHAR2(30),
 11          CUST_CITY_ID NUMBER,
 12          CUST_STATE_PROVINCE VARCHAR2(40),
 13          CUST_STATE_PROVINCE_ID NUMBER,
 14          COUNTRY_ID NUMBER,
 15          CUST_MAIN_PHONE_NUMBER VARCHAR2(25),
 16          CUST_INCOME_LEVEL VARCHAR2(30),
 17          CUST_CREDIT_LIMIT NUMBER,
 18          CUST_EMAIL VARCHAR2(30),
 19          CUST_TOTAL VARCHAR2(14),
 20          CUST_TOTAL_ID NUMBER,
 21          CUST_SRC_ID NUMBER,
 22          CUST_EFF_FROM DATE,
 23          CUST_EFF_TO DATE,
 24          CUST_VALID VARCHAR2(1)
 25     )
 26         partition BY range (cust_id)
 27         ( partition max VALUES less than (MAXVALUE))
 28  /

Table created.

SQL&gt; </pre>
<p>Now I&#8217;ll perform the partition exchange in the other direction: where the <em>source</em> table is the partitioned table, and the <em>target</em> table is the non-partitioned table. For Transcend, the <em>source</em> and <em>target</em> concepts are determined by which segment is being loaded, not which one is being exchanged into:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.exchange_partition(
  3                                   p_table           =&gt; 'customer_dim',
  4                                   p_owner           =&gt; 'target',
  5                                   p_source_table    =&gt; 'customer_stg',
  6                                   p_source_owner    =&gt; 'stage',
  7                                   p_idx_concurrency =&gt; 'yes',
  8                                   p_con_concurrency =&gt; 'yes',
  9                                   p_statistics      =&gt; 'transfer'
 10                                 );
 11  END;
 12  /
Statistics from TARGET.CUSTOMER_DIM transferred to partition MAX of STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_INDEXES876 created
Oracle scheduler job BUILD_INDEXES876 enabled
Index CUSTOMER_STG_GENDER_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES877 created
Oracle scheduler job BUILD_INDEXES877 enabled
Index CUSTOMER_STG_MARITAL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES878 created
Oracle scheduler job BUILD_INDEXES878 enabled
Index CUSTOMER_STG_YOB_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES879 created
Oracle scheduler job BUILD_INDEXES879 enabled
Index CUSTOMER_STG_PK creation submitted to the Oracle scheduler
4 index creation processes submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_CONSTRAINTS880 created
Oracle scheduler job BUILD_CONSTRAINTS880 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS881 created
Oracle scheduler job BUILD_CONSTRAINTS881 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS882 created
Oracle scheduler job BUILD_CONSTRAINTS882 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS883 created
Oracle scheduler job BUILD_CONSTRAINTS883 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS884 created
Oracle scheduler job BUILD_CONSTRAINTS884 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS885 created
Oracle scheduler job BUILD_CONSTRAINTS885 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS886 created
Oracle scheduler job BUILD_CONSTRAINTS886 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS887 created
Oracle scheduler job BUILD_CONSTRAINTS887 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS888 created
Oracle scheduler job BUILD_CONSTRAINTS888 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS889 created
Oracle scheduler job BUILD_CONSTRAINTS889 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS890 created
Oracle scheduler job BUILD_CONSTRAINTS890 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS891 created
Oracle scheduler job BUILD_CONSTRAINTS891 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS892 created
Oracle scheduler job BUILD_CONSTRAINTS892 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS893 created
Oracle scheduler job BUILD_CONSTRAINTS893 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS894 created
Oracle scheduler job BUILD_CONSTRAINTS894 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS895 created
Oracle scheduler job BUILD_CONSTRAINTS895 enabled
Creation of constraint CUSTOMER_STG_COUNTRY_FK submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS896 created
Oracle scheduler job BUILD_CONSTRAINTS896 enabled
Creation of constraint CUSTOMER_STG_PK submitted to the Oracle scheduler
17 constraints submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
TARGET.CUSTOMER_DIM exchanged for partition MAX of table STAGE.CUSTOMER_STG
Constraint CUSTOMER_STG_PK dropped
Constraint SYS_C0021255 dropped
Constraint SYS_C0021256 dropped
Constraint SYS_C0021257 dropped
Constraint SYS_C0021263 dropped
Constraint SYS_C0021259 dropped
Constraint SYS_C0021260 dropped
Constraint SYS_C0021261 dropped
Constraint SYS_C0021262 dropped
Constraint SYS_C0021258 dropped
10 constraints dropped on STAGE.CUSTOMER_STG
4 indexes dropped on STAGE.CUSTOMER_STG

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>I know&#8230; there&#8217;s a lot going on here. First, the statistics are transferred from one segment to another. The other options for the P_STATISTICS parameter are &#8216;gather&#8217; and &#8216;ignore&#8217;. But basically, the &#8216;transfer&#8217; method is preferred, because it maintains continuity between automatic stats collection runs. All the indexes and constraints are built concurrently, the exchange is performed, and finally, these same indexes and constraints are dropped on the <em>new</em> source table in preparation for the next run.</p>
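<p>To give a feel for what a statistics transfer involves, here is a minimal sketch using DBMS_STATS. It copies only the table-level figures (row count, block count, average row length) and ignores column and index statistics, so it is an illustration of the idea and not how Transcend implements &#8216;transfer&#8217;:</p>
<pre>DECLARE
   l_numrows  NUMBER;
   l_numblks  NUMBER;
   l_avgrlen  NUMBER;
BEGIN
   -- read the table-level statistics from the dimension table
   dbms_stats.get_table_stats(
                               ownname  =&gt; 'TARGET',
                               tabname  =&gt; 'CUSTOMER_DIM',
                               numrows  =&gt; l_numrows,
                               numblks  =&gt; l_numblks,
                               avgrlen  =&gt; l_avgrlen
                             );
   -- write them onto the MAX partition of the staging table
   dbms_stats.set_table_stats(
                               ownname  =&gt; 'STAGE',
                               tabname  =&gt; 'CUSTOMER_STG',
                               partname =&gt; 'MAX',
                               numrows  =&gt; l_numrows,
                               numblks  =&gt; l_numblks,
                               avgrlen  =&gt; l_avgrlen
                             );
END;
/</pre>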
<p>Hopefully this demonstrates the segment-switching capabilities of Transcend, and paves the way for me to describe some of the more advanced features, especially handling slowly-changing dimensions in a set-based process, as well as configuring Transcend &#8220;mappings&#8221; to correspond with mappings that get executed as part of the ETL batch run.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/transcend-and-segment-switching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

