<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Data Warehousing</title>
	<atom:link href="http://www.rittmanmead.com/category/data-warehousing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivered Intelligence</description>
	<lastBuildDate>Wed, 17 Mar 2010 20:23:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Thoughts on Change Data Capture</title>
		<link>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/</link>
		<comments>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 09:31:54 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4497</guid>
		<description><![CDATA[In a little over a month I will be in Las Vegas speaking at Collaborate 10. There are a lot of BI / DW talks this year, and for the first time they carry BIWA Training Days branding. Rittman Mead will be there at the conference giving talks on each of the conference days. If you are [...]]]></description>
			<content:encoded><![CDATA[<p>In a little over a month I will be in Las Vegas speaking at Collaborate 10. There are a lot of BI / DW talks this year, and for the first time they carry <a href="http://collaborate10.ioug.org/Education/BIWATrainingDays/tabid/83/Default.aspx#view" target="_blank">BIWA Training Days branding</a>. Rittman Mead will be there at the conference giving talks on each of the conference days. If you are at the conference (or even just on vacation there) then come and say &#8216;Hi&#8217; to Stewart, Venkat, Mark and me.</p>
<p>My talk will be about Realtime Data Warehousing &#8211; it is an overview of reasons, techniques and pitfalls, but I do cover a lot of material in that hour. Of course, Change Data Capture (CDC) will be a major part of the talk; Oracle has so many options here including their recently acquired GoldenGate product set. As always, the slides will be here on the Rittman Mead site soon after I speak.</p>
<p>My colleague Stewart Bryson has also had some recent thoughts about change data capture over on the TDWI group at LinkedIn.com (group membership needed); he was quite perceptive (and on the money, in my opinion) with his comment &#8220;I would hesitate to let technical limitations dictate user requirements. In today&#8217;s BI/DW market, there are very few technical limitations that cannot be solved one way or another.&#8221;</p>
<p>One of the points I will make in my Realtime DW talk, and perhaps I need a few more slides to do it justice, is the need to profile the change you capture on the source system. Often there is a lot of &#8220;noise&#8221; that looks like change but is of no real interest to the data warehouse. Not all systems are &#8220;well behaved&#8221;; I have seen systems that always update a record even if nothing has changed, and even systems that update each column as a separate statement with its own commit. Of course, even systems that don&#8217;t have those vices can still update columns that have no DW significance, and we then see those changes filtered out at the data warehouse after we have already done a lot of work (processing, network bandwidth and the like) to get the data there.</p>
<p>The more I do this kind of work, the more I feel there is a need to switch CDC on against the live source for a while, see the typical pattern of change that occurs in a day, week, period or whatever, and then make decisions on how to handle this defensively downstream. Do we need to exclude certain columns that are just &#8220;noise&#8221;? What will be the impact of multiple, rapidly-occurring commits on how we handle SCD-2 dimensions? Of course we can predict what we will see and come up with a proposed solution, but the real source often has a few surprises up its sleeve &#8211; once a customer gave me the sequence of order statuses that an order passed through in its life-cycle, except that on the actual source system the sequence was not the same as in their documentation, and that would have impacted our reporting.</p>
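<p>As a rough illustration of that kind of profiling, here is a minimal sketch. It assumes a hypothetical change table, stage.orders_cdc (order_id, op_type, change_seq, commit_ts, status, amount), in which only status and amount matter to the warehouse; the table and all of its column names are invented for the example, and a real CDC product will expose its changes differently.</p>
<pre>-- Hypothetical change table: stage.orders_cdc
-- Counts captured changes per day and how many of them were updates
-- that left both DW-relevant columns (status, amount) unchanged.
SELECT change_day,
       COUNT(*)      AS captured_changes,
       SUM(is_noise) AS noise_changes
  FROM (SELECT TRUNC(commit_ts) AS change_day,
               CASE
                 -- DECODE treats two NULLs as equal, unlike "="
                 WHEN op_type = 'U'
                  AND DECODE(status, prev_status, 1, 0) = 1
                  AND DECODE(amount, prev_amount, 1, 0) = 1
                 THEN 1
                 ELSE 0
               END AS is_noise
          FROM (SELECT c.*,
                       LAG(status) OVER (PARTITION BY order_id
                                             ORDER BY change_seq) AS prev_status,
                       LAG(amount) OVER (PARTITION BY order_id
                                             ORDER BY change_seq) AS prev_amount
                  FROM stage.orders_cdc c))
 GROUP BY change_day
 ORDER BY change_day;</pre>
<p>A high ratio of noise_changes to captured_changes would suggest filtering closer to the source. Note that the first captured row for an order has no previous image, so an update with NULLs in both columns would be counted as noise; treat the numbers as indicative only.</p>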
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle 11g Pivot</title>
		<link>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/</link>
		<comments>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 12:02:16 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4401</guid>
		<description><![CDATA[One of the things that I often come across is the &#8220;up-dateable fact&#8221;, that is, a fact that starts life &#8220;incomplete&#8221; and changes over time. Examples include things such as support calls that start life as &#8220;open&#8221; then progress through &#8220;responded&#8221;, &#8220;resolved&#8221; and finally &#8220;closed&#8221;; statuses in the sales cycle such as ordered, paid, shipped; stock [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that I often come across is the &#8220;up-dateable fact&#8221;, that is, a fact that starts life &#8220;incomplete&#8221; and changes over time. Examples include things such as support calls that start life as &#8220;open&#8221; then progress through &#8220;responded&#8221;, &#8220;resolved&#8221; and finally &#8220;closed&#8221;; statuses in the sales cycle such as ordered, paid, shipped; stock movements in a warehouse &#8211; goods received and dispatched. Of course the business, rightly, needs to measure the times between stages or the number or value of transactions at each stage.</p>
<p>As a principle, I hate the idea of having to update a fact. A fact has happened; it is not going to change. I suppose, to be more accurate, a &#8220;change&#8221; is a new event: a new fact occurring at a different time. So how to model this? Well, instinctively I would go for a table that is only inserted into (preferably appended to &#8211; think set based!) containing whatever dimensions are needed (don&#8217;t forget the &#8216;when&#8217; dimension) PLUS an &#8216;EVENT&#8217; dimension (one row per expected status) and the measures (how many, how much etc). To report on this we need to rotate the table so that the events that belong to a single item appear in the same row. Before Oracle 11g we would need to construct some SQL using a mix of CASE statements and analytic functions to rotate the data. But now we have a potentially better way: the Oracle 11g PIVOT operator.</p>
<p>Here we define a set of dimensions for the row (similar to the dimensions in a GROUP BY clause) and the aggregation operators for the pivoted measures &#8211; which of course could include MIN() or MAX() for the cases when we want to pivot DATE types. We also need to define the dimensions we want to pivot by, and here we can actually choose multiple dimensions; this again is somewhat similar to the GROUP BY of traditional SQL. Remember though that when we pivot we sometimes only expect to &#8216;aggregate&#8217; a single row &#8211; if we want to pivot order date and dispatch date then we probably have just one of each!</p>
<p>So how does it look? Well, the Oracle 11g documentation describes the syntax and gives some examples &#8211; here I am showing a slightly more complicated case where we are pivoting by two dimensions, each with a known set of code values. This example is based on two of the examples in the Oracle 11g Data Warehousing Guide:</p>
<pre>	SELECT * FROM	(
		SELECT product, channel, quarter, quantity_sold, amount_sold FROM sales_view
		) PIVOT (SUM(quantity_sold) AS SUMQ, SUM(amount_sold) AS SUMS
			FOR (channel, quarter) IN
			((5, '02') AS CATALOG_Q2,
		 	(4, '01') AS INTERNET_Q1,
		 	(4, '04') AS INTERNET_Q4,
		 	(2, '02') AS PARTNERS_Q2,
		 	(9, '03') AS TELE_Q3
			) );</pre>
<p>The query returns a column for the product and for each of the specified pairs of channel and quarter a column for each measure. So we get columns for:</p>
<p>PRODUCT, CATALOG_Q2_SUMQ, CATALOG_Q2_SUMS, INTERNET_Q1_SUMQ, INTERNET_Q1_SUMS, INTERNET_Q4_SUMQ, INTERNET_Q4_SUMS, PARTNERS_Q2_SUMQ, PARTNERS_Q2_SUMS, TELE_Q3_SUMQ, and TELE_Q3_SUMS</p>
<p>Note how the measure alias is appended to the alias given in the IN list.<br />
As you can see, we don&#8217;t need to specify each combination of channel and quarter &#8211; just the ones we want in our pivoted view. We also don&#8217;t use a GROUP BY clause &#8211; we specify the columns we want to see (both the dimensions and the aggregations) and Oracle implicitly groups by all of the columns from the inner query that are not referenced in the PIVOT clause.</p>
<p>In my example I used SELECT * to wrap the inline pivot; in practice I would explicitly select the columns and perhaps alias them to more meaningful names than the concatenated ones generated by Oracle. I would also expose the pivot as a database view and thus access it from OWB or OBIEE, where it appears to be just another table or view.</p>
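<p>A minimal sketch of that approach follows, cut down to two of the channel and quarter pairs. The view name sales_pivot_v and the friendlier column aliases are invented for illustration; the inner query also selects amount_sold so that both measures are available to the PIVOT clause.</p>
<pre>-- Hypothetical view name and output aliases
CREATE OR REPLACE VIEW sales_pivot_v AS
SELECT product,
       catalog_q2_sumq  AS catalog_q2_quantity,
       catalog_q2_sums  AS catalog_q2_amount,
       internet_q1_sumq AS internet_q1_quantity,
       internet_q1_sums AS internet_q1_amount
  FROM (SELECT product, channel, quarter, quantity_sold, amount_sold
          FROM sales_view)
 PIVOT (SUM(quantity_sold) AS sumq, SUM(amount_sold) AS sums
        FOR (channel, quarter) IN
        ((5, '02') AS catalog_q2,
         (4, '01') AS internet_q1));</pre>
<p>Tools that read the view then see only the renamed columns, not the generated ones.</p>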
<p>Another point to note is that you might see null values in the pivoted measures and these can be due to one of two reasons: the value stored for that combination of dimensions (in our case channel and quarter) is actually NULL, or that the combination does not exist. If you need to (and you may not need to) you can differentiate by using a COUNT measure; if the count is zero then the combination does not exist in the source table, if one or more then the source has NULLs stored for the combination.</p>
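<p>The COUNT technique might look something like the sketch below, again using sales_view and a single channel and quarter pair to keep it short. The status labels are just illustrative text.</p>
<pre>SELECT product,
       catalog_q2_sumq,
       catalog_q2_cnt,
       CASE
         -- no source rows at all for this product/channel/quarter
         WHEN catalog_q2_cnt = 0 THEN 'combination not in source'
         -- rows exist, but every quantity_sold was NULL
         WHEN catalog_q2_sumq IS NULL THEN 'source holds NULLs'
         ELSE 'value present'
       END AS catalog_q2_status
  FROM (SELECT product, channel, quarter, quantity_sold
          FROM sales_view)
 PIVOT (SUM(quantity_sold) AS sumq, COUNT(*) AS cnt
        FOR (channel, quarter) IN
        ((5, '02') AS catalog_q2));</pre>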
<p>We used a similar pivot view to the one above to monitor stock movements in a warehouse &#8211; in this case we needed to track individual batches of product from multiple potential suppliers, so in addition to the product dimension we had dimensional columns for batch id (a degenerate dimension) and supplier. The view was then exposed to OWB to allow us to include the aggregated result set in our ETL process &#8211; we needed to calculate some additional measures based on the difference between two of the pivoted columns. The PIVOT operator greatly simplified our ETL for this fact &#8211; we could easily have written an ETL process with a straight aggregation and then pivoted the results with CASE statements or DECODEs or whatever &#8211; but that would have been less clear and would also have increased the number of &#8220;moving parts&#8221;.</p>
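<p>For comparison, here is a sketch of the CASE-based alternative, written against the sales_view example rather than our stock movement data and showing only two of the channel and quarter pairs. It uses plain aggregates with GROUP BY; the remaining pairs would follow the same pattern.</p>
<pre>-- Pre-11g style rotation: one CASE expression per pivoted column
SELECT product,
       SUM(CASE WHEN channel = 5 AND quarter = '02'
                THEN quantity_sold END) AS catalog_q2_sumq,
       SUM(CASE WHEN channel = 5 AND quarter = '02'
                THEN amount_sold END)   AS catalog_q2_sums,
       SUM(CASE WHEN channel = 4 AND quarter = '01'
                THEN quantity_sold END) AS internet_q1_sumq,
       SUM(CASE WHEN channel = 4 AND quarter = '01'
                THEN amount_sold END)   AS internet_q1_sums
  FROM sales_view
 GROUP BY product;</pre>
<p>Every extra pair or measure adds another hand-written CASE expression, which is where the extra &#8220;moving parts&#8221; come from.</p>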
<p>We have had no problems with performance with our data set &#8211; 80 million rows pivoted on Exadata in just a few seconds. But it was not too slow on our non-Exadata development machine either.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Warehouse Fault Tolerance Part 3: Restoring</title>
		<link>http://www.rittmanmead.com/2010/02/12/data-warehouse-fault-tolerance-part-3-restoring/</link>
		<comments>http://www.rittmanmead.com/2010/02/12/data-warehouse-fault-tolerance-part-3-restoring/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 02:47:53 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4327</guid>
		<description><![CDATA[Hopefully you&#8217;ve read the introduction, Part 1, and Part 2. Those posts detailed methods for building fault-tolerant ETL code, with a strong bias in favor of using Oracle Database features. Now I&#8217;ll drill into the backup and recovery aspect of data warehousing fault tolerance, and tackle the age-old question of whether to ARCHIVELOG or NOARCHIVELOG [...]]]></description>
			<content:encoded><![CDATA[<p>Hopefully you&#8217;ve read the <a href="http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/">introduction</a>, <a href="http://www.rittmanmead.com/2010/02/08/data-warehouse-fault-tolerance-part-1-resuming/">Part 1</a>, and <a href="http://www.rittmanmead.com/2010/02/10/data-warehouse-fault-tolerance-part-2-restarting/">Part 2</a>. Those posts detailed methods for building fault-tolerant ETL code, with a strong bias in favor of using Oracle Database features. Now I&#8217;ll drill into the backup and recovery aspect of data warehousing fault tolerance, and tackle the age-old question of whether to ARCHIVELOG or NOARCHIVELOG in a BI/DW environment.</p>
<p>When I engage with clients that have a data warehouse operating in NOARCHIVELOG mode, their usual reasoning for this decision is a perceived performance gain. This makes sense on the surface&#8230; because NOARCHIVELOG prevents the generation of all that unwanted and unneeded REDO, right?</p>
<p>Not exactly. There is a misconception about what NOARCHIVELOG actually means, and hopefully I can clear that up with a demonstration. I have a database in NOARCHIVELOG mode, and I&#8217;ll test to see whether my statements generate REDO:</p>
<pre>SQL&gt; SELECT log_mode
  2    FROM v$database;

LOG_MODE
------------
NOARCHIVELOG

1 row selected.

Elapsed: 00:00:00.00
SQL&gt;
SQL&gt; CREATE TABLE target.sales
  2      AS SELECT *
  3           FROM sh.sales
  4          WHERE 1=0;

Table created.

Elapsed: 00:00:00.59
SQL&gt;
SQL&gt; SET autotrace on statistics
SQL&gt;
SQL&gt; INSERT INTO target.sales
  2         SELECT *
  3           FROM sh.sales;

918843 rows created.

Elapsed: 00:00:02.92

Statistics
----------------------------------------------------------
       1897  recursive calls
      40779  db block gets
       7062  consistent gets
       1585  physical reads
   <strong>38832896  redo size</strong>
        742  bytes sent via SQL*Net to client
        958  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
     918843  rows processed

SQL&gt;
SQL&gt; ROLLBACK;

Rollback complete.

Elapsed: 00:00:01.32
SQL&gt;
SQL&gt; INSERT <strong>/*+ APPEND */</strong> INTO target.sales
  2         SELECT *
  3           FROM sh.sales;

918843 rows created.

Elapsed: 00:00:06.00

Statistics
----------------------------------------------------------
       1042  recursive calls
       5581  db block gets
       2874  consistent gets
       1052  physical reads
      <strong>92108  redo size</strong>
        732  bytes sent via SQL*Net to client
        975  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
     918843  rows processed

SQL&gt;
SQL&gt; ROLLBACK;

Rollback complete.

Elapsed: 00:00:00.00
SQL&gt;  </pre>
<p>The regular insert statement generated 38M of REDO in a NOARCHIVELOG database. Interesting. And the INSERT /*+ APPEND */ statement generated only 92K. Though it would appear that neither of these statements actually executed in NOLOGGING mode, the truth is that the APPEND statement did. All statements generate a little bit of REDO, because updates to the data dictionary are always logged.</p>
<p>So why do regular inserts generate REDO on a NOARCHIVELOG database? There is a myth in the Oracle world that NOARCHIVELOG means that no REDO is generated, but that is not the case. Choosing NOARCHIVELOG mode simply means that we are forgoing the option to use media recovery (restoring datafiles, rolling forward). Think about it: REDO is not simply for media recovery, it&#8217;s also for crash recovery. If all REDO generation were suspended, Oracle wouldn&#8217;t be able to open after a simple server crash. In NOARCHIVELOG mode, there are situations where we can suspend most of the REDO generated, and one of those situations involves using the INSERT /*+ APPEND */ statement. So why would the database allow these NOLOGGING operations? Because direct-path operations write blocks directly into datafiles, bypassing the buffer cache. We wouldn&#8217;t have to rely on the online REDO logs to recover those transactions, and so Oracle allows us to minimize the REDO generated.</p>
<p>So if you have your database in NOARCHIVELOG mode for performance reasons, but you are using ETL tools that don&#8217;t support true direct-path writes on Oracle (a lot of the third-party tools don&#8217;t), or you are using cursor-based, row-by-row load scenarios, the same amount of REDO is generated as if the database were in ARCHIVELOG mode. The only thing gained from operating in this manner is the privilege of having to shut down the database whenever a backup is needed.</p>
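<p>If the load runs through a tool where autotrace isn&#8217;t available, one way to check how much REDO a session is generating is to read the session statistic directly. This is only a sketch, and it assumes the account has been granted access to the V$ views involved:</p>
<pre>-- Run once before the load and once after it, in the same session;
-- the difference between the two values is the REDO (in bytes)
-- generated by that session in between.
SELECT n.name, s.value
  FROM v$mystat s
  JOIN v$statname n
    ON n.statistic# = s.statistic#
 WHERE n.name = 'redo size';</pre>
<p>A difference that is roughly the size of the data loaded suggests the tool is doing conventional inserts rather than direct-path writes.</p>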
<p>Perhaps another myth that gets perpetuated is that we can&#8217;t have the best of both worlds, but in fact we can. We can minimize the amount of REDO generated, we can operate in ARCHIVELOG mode, we can back up our database in online mode, and we would be able to restore from that backup. The solution: NOLOGGING tables and indexes. I&#8217;ll put the database in ARCHIVELOG mode, and rerun the test case above with one small change: I&#8217;ll change the table to be NOLOGGING:</p>
<pre>SQL&gt; startup mount
ORACLE instance started.

Total System Global Area  422670336 bytes
Fixed Size                  1336960 bytes
Variable Size             343935360 bytes
Database Buffers           71303168 bytes
Redo Buffers                6094848 bytes
Database mounted.
SQL&gt; alter database
  2  <strong>archivelog</strong>;

Database altered.

SQL&gt; alter database
  2  open;

Database altered.

SQL&gt; SELECT log_mode
  2    FROM v$database;

LOG_MODE
------------
ARCHIVELOG

1 row selected.

Elapsed: 00:00:00.06
SQL&gt;
SQL&gt; ALTER TABLE target.sales
  2        <strong>nologging</strong>;

Table altered.

Elapsed: 00:00:01.02
SQL&gt;
SQL&gt; SET autotrace on statistics
SQL&gt;
SQL&gt; INSERT INTO target.sales
  2         SELECT *
  3           FROM sh.sales;

918843 rows created.

Elapsed: 00:00:02.47

Statistics
----------------------------------------------------------
      15560  recursive calls
      33573  db block gets
      13861  consistent gets
       6260  physical reads
   <strong>38289752  redo size</strong>
        740  bytes sent via SQL*Net to client
        958  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
        154  sorts (memory)
          0  sorts (disk)
     918843  rows processed

SQL&gt;
SQL&gt; ROLLBACK;

Rollback complete.

Elapsed: 00:00:01.45
SQL&gt;
SQL&gt; INSERT <strong>/*+ APPEND */</strong> INTO target.sales
  2         SELECT *
  3           FROM sh.sales;

918843 rows created.

Elapsed: 00:00:03.51

Statistics
----------------------------------------------------------
          1  recursive calls
       4628  db block gets
       1718  consistent gets
         59  physical reads
       <strong>8072  redo size</strong>
        732  bytes sent via SQL*Net to client
        975  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
     918843  rows processed

SQL&gt;
SQL&gt; ROLLBACK;

Rollback complete.

Elapsed: 00:00:00.03
SQL&gt; </pre>
<p>We get the exact same behavior with a NOLOGGING table in ARCHIVELOG mode as we did in NOARCHIVELOG mode. But is having the database in ARCHIVELOG mode of any value when all of our ETL processes are NOLOGGING? We can perform an online backup, but would we even be able to restore from that backup if we have transactions that executed as NOLOGGING?</p>
<p>The answer is &#8220;yes&#8221; and &#8220;yes&#8221;. We just need one small change to our backup strategy: a well-placed incremental backup.</p>
<p>To increase the performance of our incremental backup, we need to create a block change tracking file. The database keeps a list of all changed blocks so that RMAN will know exactly what to back up during an incremental:</p>
<pre>SQL&gt; alter database enable block change tracking
  2  using file '/oracle/oradata/bidw1/change_blocks.bct';

Database altered.

Elapsed: 00:00:02.16
SQL&gt; select * from
  2  v$block_change_tracking;

STATUS       | FILENAME                                 |      BYTES
------------ | ---------------------------------------- | ----------
ENABLED      | /oracle/oradata/bidw1/change_blocks.bct  |   11599872

1 row selected.

Elapsed: 00:00:00.01
SQL&gt;  </pre>
<p>We start by taking the initial incremental level 0 backup:</p>
<pre>RMAN&gt; backup incremental
2&gt; level 0 database
3&gt; plus archivelog;

Starting backup at 11-FEB-10
current log archived
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=18 RECID=47 STAMP=710646180
input archived log thread=1 sequence=19 RECID=48 STAMP=710646955
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_annnn_TAG20100211T015555_5q7bhw0c_.bkp tag=TAG20100211T015555 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 11-FEB-10

Starting backup at 11-FEB-10
using channel ORA_DISK_1
<strong>channel ORA_DISK_1: starting incremental level 0 datafile backup set</strong>
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/oracle/oradata/bidw1/system01.dbf
input datafile file number=00002 name=/oracle/oradata/bidw1/sysaux01.dbf
input datafile file number=00003 name=/oracle/oradata/bidw1/undotbs01.dbf
input datafile file number=00004 name=/oracle/oradata/bidw1/users01.dbf
input datafile file number=00005 name=/oracle/oradata/bidw1/example01.dbf
input datafile file number=00007 name=/oracle/oradata/bidw1/target01.dbf
input datafile file number=00006 name=/oracle/oradata/bidw1/tdrep01.dbf
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd0_TAG20100211T015557_5q7bhz1o_.bkp tag=TAG20100211T015557 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:05:26
channel ORA_DISK_1: starting incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_ncsn0_TAG20100211T015557_5q7btbnf_.bkp tag=TAG20100211T015557 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:09
Finished backup at 11-FEB-10

Starting backup at 11-FEB-10
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=20 RECID=49 STAMP=710647302
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_annnn_TAG20100211T020143_5q7btr8x_.bkp tag=TAG20100211T020143 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 11-FEB-10

RMAN&gt;
</pre>
<p>Now I&#8217;ll load the SALES table with another INSERT /*+ APPEND */ to make sure we have a NOLOGGING operation since our last backup.</p>
<pre>SQL&gt; insert <strong>/*+ APPEND */</strong>
  2  into target.sales
  3  select * from
  4  sh.sales;

918843 rows created.

Elapsed: 00:00:21.06

Statistics
----------------------------------------------------------
       2780  recursive calls
       6081  db block gets
       2434  consistent gets
       5442  physical reads
     <strong>136036  redo size</strong>
       1536  bytes sent via SQL*Net to client
       1155  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
     918843  rows processed

SQL&gt; commit;

Commit complete.

Elapsed: 00:00:00.07
SQL&gt; </pre>
<p>This is the step in our process that requires a slight change to our backup and recovery strategy: we should take an incremental level 1 backup as soon as the load is complete. This will physically back up all blocks that have been affected by the load, and we wouldn&#8217;t need to logically apply the REDO logs that are missing the NOLOGGING operations. Since we have block change tracking enabled, this step will be extremely fast, and I recommend that the ETL process flow or main driving script execute the backup as the very last step in the batch load.</p>
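<p>To see which datafiles are exposed until that backup is taken, a query along these lines can help. It is a sketch rather than part of the original test case: V$DATAFILE records the time of the most recent unrecoverable (NOLOGGING) operation against each file, and as far as I know that timestamp is not cleared by a later backup, so it needs to be compared with the time of the last backup of the file. RMAN&#8217;s REPORT UNRECOVERABLE command does that comparison for you.</p>
<pre>-- Datafiles that have had a NOLOGGING operation, most recent last
SELECT file#, name, unrecoverable_change#, unrecoverable_time
  FROM v$datafile
 WHERE unrecoverable_time IS NOT NULL
 ORDER BY unrecoverable_time;</pre>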
<pre>RMAN&gt; backup incremental
2&gt; level 1 database
3&gt; plus archivelog;

Starting backup at 11-FEB-10
current log archived
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=30 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=18 RECID=47 STAMP=710646180
input archived log thread=1 sequence=19 RECID=48 STAMP=710646955
input archived log thread=1 sequence=20 RECID=49 STAMP=710647302
input archived log thread=1 sequence=21 RECID=50 STAMP=710648694
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_annnn_TAG20100211T022455_5q7d67t6_.bkp tag=TAG20100211T022455 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11-FEB-10

Starting backup at 11-FEB-10
using channel ORA_DISK_1
<strong>channel ORA_DISK_1: starting incremental level 1 datafile backup set</strong>
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/oracle/oradata/bidw1/system01.dbf
input datafile file number=00002 name=/oracle/oradata/bidw1/sysaux01.dbf
input datafile file number=00003 name=/oracle/oradata/bidw1/undotbs01.dbf
input datafile file number=00004 name=/oracle/oradata/bidw1/users01.dbf
input datafile file number=00005 name=/oracle/oradata/bidw1/example01.dbf
input datafile file number=00007 name=/oracle/oradata/bidw1/target01.dbf
input datafile file number=00006 name=/oracle/oradata/bidw1/tdrep01.dbf
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd1_TAG20100211T022457_5q7d6cgv_.bkp tag=TAG20100211T022457 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_ncsn1_TAG20100211T022457_5q7d6t16_.bkp tag=TAG20100211T022457 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11-FEB-10

Starting backup at 11-FEB-10
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=22 RECID=51 STAMP=710648715
channel ORA_DISK_1: starting piece 1 at 11-FEB-10
channel ORA_DISK_1: finished piece 1 at 11-FEB-10
piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_annnn_TAG20100211T022515_5q7d6vg7_.bkp tag=TAG20100211T022515 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11-FEB-10

RMAN&gt;</pre>
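<p>A quick way to confirm that the level 1 backup really did use the block change tracking file is to look at V$BACKUP_DATAFILE. This is a sketch I would run after the backup; the file# filter simply skips the control file entry.</p>
<pre>-- For each datafile in a level 1 backup: was change tracking used,
-- and how many blocks were read compared with the size of the file?
SELECT file#, incremental_level, used_change_tracking,
       blocks_read, datafile_blocks, completion_time
  FROM v$backup_datafile
 WHERE incremental_level = 1
   AND file# &gt; 0
 ORDER BY completion_time, file#;</pre>
<p>USED_CHANGE_TRACKING should show YES, and BLOCKS_READ should be a small fraction of DATAFILE_BLOCKS for files that saw little change.</p>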
<p>Now, let&#8217;s see if we can restore:</p>
<pre>RMAN&gt; startup mount

Oracle instance started
database mounted

Total System Global Area     422670336 bytes

Fixed Size                     1336960 bytes
Variable Size                356518272 bytes
Database Buffers              58720256 bytes
Redo Buffers                   6094848 bytes

<strong>RMAN&gt; restore database;</strong>

Starting restore at 11-FEB-10
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=18 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /oracle/oradata/bidw1/system01.dbf
channel ORA_DISK_1: restoring datafile 00002 to /oracle/oradata/bidw1/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00003 to /oracle/oradata/bidw1/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00004 to /oracle/oradata/bidw1/users01.dbf
channel ORA_DISK_1: restoring datafile 00005 to /oracle/oradata/bidw1/example01.dbf
channel ORA_DISK_1: restoring datafile 00006 to /oracle/oradata/bidw1/tdrep01.dbf
channel ORA_DISK_1: restoring datafile 00007 to /oracle/oradata/bidw1/target01.dbf
channel ORA_DISK_1: reading from backup piece /oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd0_TAG20100211T015557_5q7bhz1o_.bkp
channel ORA_DISK_1: piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd0_TAG20100211T015557_5q7bhz1o_.bkp tag=TAG20100211T015557
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:06:37
Finished restore at 11-FEB-10

<strong>RMAN&gt; recover database;</strong>

Starting recover at 11-FEB-10
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /oracle/oradata/bidw1/system01.dbf
destination for restore of datafile 00002: /oracle/oradata/bidw1/sysaux01.dbf
destination for restore of datafile 00003: /oracle/oradata/bidw1/undotbs01.dbf
destination for restore of datafile 00004: /oracle/oradata/bidw1/users01.dbf
destination for restore of datafile 00005: /oracle/oradata/bidw1/example01.dbf
destination for restore of datafile 00006: /oracle/oradata/bidw1/tdrep01.dbf
destination for restore of datafile 00007: /oracle/oradata/bidw1/target01.dbf
channel ORA_DISK_1: reading from backup piece /oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd1_TAG20100211T022457_5q7d6cgv_.bkp
channel ORA_DISK_1: piece handle=/oracle/flash_recovery_area/BIDW1/backupset/2010_02_11/o1_mf_nnnd1_TAG20100211T022457_5q7d6cgv_.bkp tag=TAG20100211T022457
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 11-FEB-10

<strong>RMAN&gt; alter database open;</strong>

database opened

RMAN&gt; </pre>
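<p>Opening the database doesn&#8217;t by itself prove that the NOLOGGING load survived, so as a final check I would scan the table. This is a sketch of that check, not output from the original test:</p>
<pre>-- A full count has to visit every block in the table. Blocks that were
-- loaded NOLOGGING and then recovered from REDO alone are marked corrupt,
-- and reading them raises ORA-01578 together with ORA-26040.
SELECT COUNT(*)
  FROM target.sales;</pre>
<p>If the query completes and the count matches the number of rows loaded, the level 1 backup covered the direct-path blocks.</p>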
<p>So that&#8217;s it for the Three &#8220;R&#8221;&#8217;s. I had a lot of fun revisiting the &#8220;operations&#8221; side of the house, and logging in as SYSDBA again. It&#8217;s amazing how it all just came back to me&#8230; I didn&#8217;t have to look at the manuals at all. Okay&#8230; maybe once.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/12/data-warehouse-fault-tolerance-part-3-restoring/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Warehouse Fault Tolerance Part 2: Restarting</title>
		<link>http://www.rittmanmead.com/2010/02/10/data-warehouse-fault-tolerance-part-2-restarting/</link>
		<comments>http://www.rittmanmead.com/2010/02/10/data-warehouse-fault-tolerance-part-2-restarting/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 05:50:13 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4321</guid>
		<description><![CDATA[In my last post, I described the First &#8220;R&#8221; in data warehouse fault tolerance: Resuming. As I mentioned in the introduction to this series, my goal is a triage approach where the simple things, such as space errors, are handled effortlessly without repercussions. But what happens when the errors are not so simple, and Oracle&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>In my last <a href="http://www.rittmanmead.com/2010/02/08/data-warehouse-fault-tolerance-part-1-resuming/">post</a>, I described the First &#8220;R&#8221; in data warehouse fault tolerance: Resuming. As I mentioned in the <a href="http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/">introduction</a> to this series, my goal is a triage approach where the simple things, such as space errors, are handled effortlessly without repercussions. But what happens when the errors are not so simple, and Oracle&#8217;s built-in resuming functionality can&#8217;t catch them? In these cases, the ETL processing will actually error or fail, and the cause will have to be corrected before the load can be restarted.</p>
<p>There are numerous approaches to crafting sustainable ETL; as a matter of fact, Peter Scott wrote a <a href="http://www.rittmanmead.com/2009/04/30/simple-steps-to-sustainable-etl/">post</a> by that very title. Jon Mead contributed <a href="http://www.rittmanmead.com/2008/05/31/resuming-your-etl-process-in-owb/">this one</a> about resuming ETL processes. Just a note of clarification: what Jon describes as &#8220;resuming&#8221; is what I am describing here as &#8220;code-controlled restarting&#8221;: building smart ETL process flows and instrumented mappings so that a record is kept of what&#8217;s already been run. This is a required component, and I recommend coding best practices such as these into all ETL processes. But the restartability feature I&#8217;m focusing on is &#8220;data management restartability&#8221;, which deals with controlling data sets after failures. So the feature that I&#8217;m plugging in for the Restartability phase is Oracle&#8217;s Flashback functionality.</p>
<p>Flashback provides the capability to revert the entire database, or smaller portions of it, to a particular point in time. For Oracle, a &#8220;point in time&#8221; is always referenced by the System Change Number (SCN), an internal clock for the Oracle Database. It auto-increments every time a transaction commits, but other sources, such as the SMON process, can increment the SCN as well. The current SCN can be viewed in many of the data dictionary tables, as well as through the DBMS_FLASHBACK server package.</p>
<pre>SQL&gt; select current_scn,
  2  dbms_flashback.get_system_change_Number
  3  from v$database;

CURRENT_SCN | GET_SYSTEM_CHANGE_NUMBER
----------- | ------------------------
    2536238 |                  2536238

1 row selected.

SQL&gt; </pre>
<p>We can convert from SCNs to timestamps and back again, but this conversion is not exact. The Oracle documentation states that the functions are precise to about 3 seconds, which is evident from this example:</p>
<pre>SQL&gt; select SCN_TO_TIMESTAMP(2536238) scn
  2  from dual;

SCN
------------------------------
02/09/2010 12.47.26000000000

1 row selected.

SQL&gt; select TIMESTAMP_TO_SCN('02/09/2010 12.47.26000000000') ts
  2  from dual;

        TS
----------
   2536237

1 row selected.

SQL&gt;</pre>
<p>Even though we access both Flashback Database and Flashback Table with the same general syntax and specify SCNs for both variants, the technical implementation under the hood is drastically different. Flashback Table is completely an UNDO operation, and is really not a new feature at all. Oracle has always used the UNDO space (rollback segments before that) to manage the state of tables as of a particular SCN to allow the robust multi-versioning that keeps reads and writes from blocking one another. Flashback Table is just an &#8220;opening&#8221; of the multi-version API, in a manner of speaking, so that any SCN can be viewed as long as sufficient UNDO exists.</p>
<p>Flashback Database, on the other hand, doesn&#8217;t use UNDO at all, instead using new instance file structures called flashback logs in conjunction with a little bit of archived redo. Flashback logs contain prior versions of changed blocks. Oracle takes the version of each block from just prior to the SCN of interest and puts it back in the datafiles, followed by redo log recovery to get the database to the exact point of the SCN.</p>
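<p>As a hedged illustration of that multi-version access, the same UNDO data can be read directly with a Flashback Query, a related UNDO-based feature. The sketch below assumes the TARGET.SALES table used in the examples in this post and reuses the SCN from the earlier output purely as a placeholder; it only works while the UNDO for that SCN is still available:</p>
<pre>-- read the table as it looked at a given SCN (placeholder value)
select count(*)
  from target.sales as of scn 2536238;

-- the same idea, expressed as a point in time instead of an SCN
select count(*)
  from target.sales as of timestamp (systimestamp - interval '15' minute);</pre>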
<p>So what part does Oracle&#8217;s Flashback technology play with data warehouse fault tolerance, specifically in the area of Restartability? Some aspect of the load will likely need to be &#8220;undone&#8221; before we can continue, and this is where Flashback fits in neatly, as demonstrated in the following examples.</p>
<p>I created copies of the CUSTOMERS, PRODUCTS and SALES tables from the SH schema and inserted the rows from there as well. Before I start, I need to enable row movement on the new SALES table. This would need to be implemented for all tables in the data warehouse that are a consideration for Flashback Table:</p>
<pre>SQL&gt; alter table target.sales enable row movement;

Table altered.

SQL&gt;
SQL&gt; SELECT count(*) FROM target.products;

  COUNT(*)
----------
       72

1 row selected.

SQL&gt; SELECT count(*) FROM target.customers;

  COUNT(*)
----------
     55500

1 row selected.

SQL&gt; SELECT count(*) FROM target.sales;

  COUNT(*)
----------
    918843

1 row selected.

SQL&gt; </pre>
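<p>Enabling row movement table by table gets tedious in a real warehouse. A minimal sketch of doing it for a whole schema follows, assuming the TARGET schema from this example and access to DBA_TABLES; some table types may reject the DDL, so treat this as a starting point rather than a finished script:</p>
<pre>BEGIN
   -- find permanent tables in the schema that do not yet allow row movement
   FOR t IN (SELECT owner, table_name
               FROM dba_tables
              WHERE owner = 'TARGET'
                AND temporary = 'N'
                AND row_movement = 'DISABLED')
   LOOP
      EXECUTE IMMEDIATE
         'alter table "' || t.owner || '"."' || t.table_name
         || '" enable row movement';
   END LOOP;
END;
/</pre>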
<p>Next I&#8217;ll get ready to execute my code. First, I&#8217;ll create what&#8217;s called a &#8220;restore point&#8221; in the database. This allows me to give an intelligent name to a particular SCN and is similar to tagging a release in Subversion. Before each new step in the process, I&#8217;ll create a restore point so that each phase has a tagged, referenceable SCN. As I&#8217;m using the concept of a unique, sequence-generated number for each batch that runs (Jon calls it an &#8220;execution ID&#8221; in his posting above), I&#8217;ll work that number into the name of my restore points. </p>
<pre>SQL&gt; create restore point dw_load_1001;

Restore point created.

SQL&gt;</pre>
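<p>Inside an ETL process flow, the restore point name would usually be built from the batch number rather than typed by hand. The following is only a sketch: the variable l_batch_id and the value 1002 are hypothetical stand-ins for whatever sequence-generated execution ID the load uses:</p>
<pre>DECLARE
   l_batch_id NUMBER := 1002;  -- hypothetical; would normally come from a sequence
BEGIN
   -- restore points are created with DDL, so dynamic SQL is needed in PL/SQL
   EXECUTE IMMEDIATE 'create restore point dw_load_' || l_batch_id;
END;
/

-- list the restore points that exist, with the SCN each one tags
select name, scn, time
  from v$restore_point
 order by scn;

-- once a batch has been signed off, its restore point can be removed
drop restore point dw_load_1002;</pre>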
<p>Next&#8230; I do the processing that moves the necessary files into place (if any), prepares and loads the ODS tables, etc. After that&#8230; I move into the load of the dimensional model itself:</p>
<pre>SQL&gt; create restore point load_customers_1001;

Restore point created.

SQL&gt; exec dw_load.load_customers;
Number of records loaded: 0

PL/SQL procedure successfully completed.

SQL&gt; create restore point load_products_1001;

Restore point created.

SQL&gt; exec dw_load.load_products;
Number of records loaded: 72

PL/SQL procedure successfully completed.

SQL&gt; create restore point load_sales_1001;

Restore point created.

SQL&gt; exec dw_load.load_sales;
5 indexes and 0 local index partitions affected on table TARGET.SALES
Number of records loaded: 699999
Rebuild processes for unusable indexes on 28 partitions of table TARGET.SALES executed
No matching unusable global indexes found

PL/SQL procedure successfully completed.

SQL&gt; </pre>
<p>The data warehouse load ran without error, so I can assume that it was successful, right? In looking back over the log, I see that no rows were actually loaded into the CUSTOMERS table. After researching the issue, I discover that the Change Data Capture process on the source system is experiencing errors, and there were no rows published to the CUSTOMERS change set. Since the load didn&#8217;t technically fail, the process continued to the load of the fact table, and it&#8217;s very likely that many of the rows in the fact table have the wrong surrogate key from the CUSTOMERS table.</p>
<p>In describing my triage approach from earlier postings, the &#8220;aftermath&#8221; is exactly what I&#8217;m trying to avoid. In my experience, ETL load failures and the subsequent aftermath (investigations, data corrections, and reloads) cause more downtime than any other hardware or software related issues. But with the approach I&#8217;ve put into place, this aftermath shouldn&#8217;t concern me, because now I can simply &#8220;undo&#8221; it (pun intended).</p>
<pre>SQL&gt; flashback table target.sales to restore point load_sales_1001;

Flashback complete.

SQL&gt; select count(*) from target.sales;

  COUNT(*)
----------
    918843

1 row selected.

SQL&gt; create restore point new_load_customers_1001;

Restore point created.

SQL&gt; exec dw_load.load_customers;
Number of records loaded: 99

PL/SQL procedure successfully completed.

SQL&gt; create restore point new_load_sales_1001;

Restore point created.

SQL&gt; exec dw_load.load_sales;
5 indexes and 0 local index partitions affected on table TARGET.SALES
Number of records loaded: 699999
Rebuild processes for unusable indexes on 28 partitions of table TARGET.SALES executed
No matching unusable global indexes found

PL/SQL procedure successfully completed.

SQL&gt; </pre>
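<p>Had more than one table needed to be undone, FLASHBACK TABLE also accepts a comma-separated list, so the tables are returned to the same point together. A sketch, assuming PRODUCTS had to go back as well and that enough UNDO still exists for the restore point (row movement has to be enabled on every table named):</p>
<pre>-- only SALES had row movement enabled earlier, so enable it here too
alter table target.products enable row movement;

-- flash both tables back to the point just before the PRODUCTS load
flashback table target.products, target.sales
   to restore point load_products_1001;</pre>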
<p>Instead of flashing back, I could try to sort out the issue. For instance, if I&#8217;m attaching the unique execution ID to every row in the fact table, either directly, or through an AUDIT dimension table, then I could probably identify the rows for this run. But why would I do this when the Flashback functionality is already available to me?</p>
<p>My test case above was a simple one; I was able to proceed just by flashing back a single table before restarting the process. However, in a large enterprise data warehouse, the effort involved in a typical aftermath could be staggering depending on how many fact tables are involved, how many dimension tables track history with SCD Type 2 changes, etc. Combine that with the possible need to flash back ODS tables, history tables, persistent staging tables, etc. I&#8217;ve seen numerous situations where the exact ramifications are tough to quantify: we know what broke, but we have no idea what needs to be fixed. Perhaps there was a hardware failure in the middle of an ETL load, and it&#8217;s hard to identify just exactly which tables were loaded and which ones weren&#8217;t. In this case, what I really need is the ability to do a complete &#8220;do-over&#8221;: put everything back the way it was prior to the beginning of the load, and just restart everything.</p>
<p>Enter Flashback Database. So I&#8217;ll demonstrate what&#8217;s required to enable this feature, and then I&#8217;ll replay the test case above and solve it from this angle.</p>
<p>I first need to put my database in Archive Log Mode, as archived redo is a required component of the feature:</p>
<pre>SQL&gt; startup mount
ORACLE instance started.

Total System Global Area  422670336 bytes
Fixed Size                  1336960 bytes
Variable Size             335546752 bytes
Database Buffers           79691776 bytes
Redo Buffers                6094848 bytes
Database mounted.
SQL&gt; alter database archivelog;

Database altered.

SQL&gt; archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     25
Next log sequence to archive   27
Current log sequence           27
SQL&gt; </pre>
<p>Next, I need to configure the Flash Recovery Area, which is a file system on the server where the database will create the flashback logs:</p>
<pre>SQL&gt; alter system set db_recovery_file_dest_size=3G;

System altered.

SQL&gt; Alter system set db_recovery_file_dest='/oracle/flash_recovery_area';

System altered.

SQL&gt;</pre>
<p>Finally, I need to set the <strong>db_flashback_retention_target</strong> parameter, which instructs the Flash Recovery Area on our needs for retention. This parameter is actually in minutes&#8230; thanks for the consistency, Oracle. The value of 2880 used here therefore means 48 hours. After that, I just enable flashback and open the database:</p>
<pre>SQL&gt; alter system set db_flashback_retention_target=2880;

System altered.

SQL&gt; alter database flashback on;

Database altered.

SQL&gt; alter database open;

Database altered.

SQL&gt; </pre>
<p>So, Flashback Database should be ready to use. I&#8217;ll take a quick look and see if the database thinks it&#8217;s ready:</p>
<pre>SQL&gt; select oldest_flashback_scn,
  2         oldest_flashback_time,
  3         startup_time
  4    from v$flashback_database_log
  5         cross join v$instance;

OLDEST_FLASHBACK_SCN | OLDEST_FLASHBACK_TIME  | STARTUP_TIME
-------------------- | ---------------------- | ----------------------
             2912097 | 02/10/2010 12:10:30 AM | 02/10/2010 12:09:08 AM

1 row selected.

Elapsed: 00:00:00.15
SQL&gt;  </pre>
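<p>A few other dictionary queries can help confirm the configuration and keep an eye on space. This is a sketch of the sort of checks I would expect a monitoring job to run, not a complete health check:</p>
<pre>-- confirm archive log mode and that Flashback Database is switched on
select log_mode, flashback_on
  from v$database;

-- retention target (minutes) plus current and estimated flashback log size (bytes)
select retention_target,
       flashback_size,
       estimated_flashback_size
  from v$flashback_database_log;

-- how full the Flash Recovery Area is
select name, space_limit, space_used, space_reclaimable
  from v$recovery_file_dest;</pre>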
<p>Now I&#8217;ll flash back the entire database to the very first restore point I created: dw_load_1001:</p>
<pre>SQL&gt; shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL&gt; startup mount
ORACLE instance started.

Total System Global Area  422670336 bytes
Fixed Size                  1336960 bytes
Variable Size             343935360 bytes
Database Buffers           71303168 bytes
Redo Buffers                6094848 bytes
Database mounted.
SQL&gt; flashback database to restore point dw_load_1001;

Flashback complete.

SQL&gt; alter database open resetlogs;

Database altered.

SQL&gt; SELECT count(*) FROM target.products;

  COUNT(*)
----------
       72

1 row selected.

SQL&gt; SELECT count(*) FROM target.customers;

  COUNT(*)
----------
     55500

1 row selected.

SQL&gt; SELECT count(*) FROM target.sales;

  COUNT(*)
----------
    918843

1 row selected.

SQL&gt;</pre>
<p>So the immediate downside of this approach is that it requires the involvement of the operations team because the database has to be in mount mode, and the data warehouse is not available during this slight outage. However, when compared with the time it might take to sort out and correct massive aftermath scenarios, this seems to be the preferable choice. Is the data warehouse really &#8220;available&#8221; if data corrections and data reloads are occurring? I would rather involve the operations team for a quick, concrete fix so the reload can complete as soon as possible.</p>
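<p>Because a normal restore point is only usable while the flashback logs still cover it, a guaranteed restore point is worth considering for the start of each load. The sketch below uses a hypothetical batch number of 1002 and assumes the archive log mode and Flash Recovery Area setup shown above; the flashback logs it pins are kept until the restore point is dropped, so the cleanup step matters:</p>
<pre>-- before the load: guarantee that the database can be flashed back to this point
create restore point dw_load_1002 guarantee flashback database;

-- check what is being retained for it (storage_size is in bytes)
select name, scn, guarantee_flashback_database, storage_size
  from v$restore_point;

-- after the load has been verified: release the retained flashback logs
drop restore point dw_load_1002;</pre>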
<p>The next &#8220;R&#8221; is Restoring, though it really involves putting the pieces in place for a scalable Backup strategy. And &#8220;Backup&#8221; doesn&#8217;t start with an &#8220;R&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/10/data-warehouse-fault-tolerance-part-2-restarting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Warehouse Fault Tolerance Part 1: Resuming</title>
		<link>http://www.rittmanmead.com/2010/02/08/data-warehouse-fault-tolerance-part-1-resuming/</link>
		<comments>http://www.rittmanmead.com/2010/02/08/data-warehouse-fault-tolerance-part-1-resuming/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 14:41:45 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4314</guid>
		<description><![CDATA[In the introduction to this series of posts, I spoke briefly about data warehouse fault tolerance and the unique challenges resulting from high data volumes combined with the batch load window required to create them. I then defined the goal: a layered approach allowing simple errors to be caught early before they turn into serious [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/">introduction</a> to this series of posts, I spoke briefly about data warehouse fault tolerance and the unique challenges resulting from high data volumes combined with the batch load window required to create them. I then defined the goal: a layered approach allowing simple errors to be caught early before they turn into serious conditions.</p>
<p>Resuming is the ability to continue effortlessly after an error. The important thing is that there should be no aftermath from the error: our process should pause gracefully until the error is corrected. The Oracle Database has offered out of the box functionality for resuming since version 9i in the form of Resumable Space Allocation. Resumable operations are supported for SELECT queries, DML and DDL, and can be enabled at either the system or the session level. To enable at the system level, the RESUMABLE_TIMEOUT database parameter should have a non-zero value.</p>
<pre>SQL&gt; alter system set resumable_timeout=3600;

System altered.

SQL&gt;</pre>
<p>To enable resumable operations at the session level, the statement follows this basic syntax, with the TIMEOUT and NAME clauses being optional:</p>
<p>ALTER SESSION ENABLE RESUMABLE &lt;TIMEOUT <em>n</em>&gt; &lt;NAME <em>string</em>&gt;;</p>
<p>The TIMEOUT value is specified in seconds, and if omitted, the default value of 7200 seconds (2 hours) is used. The NAME clause gives the resumable session a user-friendly name, so that when we monitor for resumable sessions (as we will see later) we can see which of our processes is suspended. Enabling resumable operations at the session level requires that the RESUMABLE system privilege has been granted:</p>
<pre>SQL&gt; grant resumable to stewart;

Grant succeeded.

SQL&gt;</pre>
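<p>With the privilege in place, the session-level statement can be used in a few variations. These are illustrative only; the timeout and the name are arbitrary values I picked for the sketch:</p>
<pre>-- wait up to one hour for a space problem to be fixed, under a recognisable name
alter session enable resumable timeout 3600 name 'nightly sales load';

-- both clauses are optional, in which case the defaults apply
alter session enable resumable;

-- switch the behaviour off again for the rest of the session
alter session disable resumable;</pre>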
<p>Resumable operations can also be enabled with the Oracle utilities&#8230; such as SQL*Loader, Export/Import and Data Pump. The command-line parameters RESUMABLE, RESUMABLE_NAME and RESUMABLE_TIMEOUT exist to mimic the functionality mentioned above.</p>
<p>Now for a demonstration. I&#8217;ll create a situation that is ripe for a space allocation error: I&#8217;ll put an empty copy of the SALES fact table from the SH schema in a tablespace with only 250K of space:</p>
<pre>SQL&gt; create tablespace target datafile '/oracle/oradata/bidw1/target01.dbf' size 250K;

Tablespace created.

SQL&gt; create table target.sales tablespace target as select * from sh.sales where 1=0;

Table created.

SQL&gt;</pre>
<p>Now I&#8217;ll load some records into the table, which should cause it to suspend. To prepare my session, I need to enable resumable operations. Since I always instrument my code, I&#8217;ll register my process with the database. After that, I have an easy way to guarantee consistency when referring to processes. Now, I can use the registered name for my resumable session as well:</p>
<pre>SQL&gt; exec dbms_application_info.set_module('SALES fact load','insert some rows');

PL/SQL procedure successfully completed.

SQL&gt;
SQL&gt; DECLARE
  2     l_module VARCHAR2(48) := sys_context('USERENV','MODULE');
  3  BEGIN
  4     EXECUTE IMMEDIATE
  5     'alter session enable resumable timeout 18000 name '''||l_module||'''';
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL&gt;</pre>
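<p>Since the same two steps are repeated for every process, they could be wrapped in a small helper. The procedure below is a sketch under my own assumptions: the name start_resumable_step and its parameters are hypothetical, and the owning schema would need the RESUMABLE privilege:</p>
<pre>CREATE OR REPLACE PROCEDURE start_resumable_step (
   p_module  IN VARCHAR2,
   p_action  IN VARCHAR2,
   p_timeout IN PLS_INTEGER DEFAULT 7200   -- seconds
)
IS
BEGIN
   -- register the process so monitoring shows a consistent name
   dbms_application_info.set_module(p_module, p_action);

   -- reuse the registered name for the resumable session;
   -- any single quotes in the name are doubled so the statement stays valid
   EXECUTE IMMEDIATE
      'alter session enable resumable timeout ' || p_timeout
      || ' name ''' || REPLACE(p_module, '''', '''''') || '''';
END start_resumable_step;
/

-- example call, mirroring the anonymous block above
exec start_resumable_step('SALES fact load', 'insert some rows', 18000);</pre>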
<p>I start loading the records in hopes of a suspended session:</p>
<pre>SQL&gt; insert into target.sales select * from sh.sales;</pre>
<p>So now, I open up another session, and I start another transaction against the TARGET.SALES table, just to pile on the TARGET tablespace:</p>
<pre>SQL&gt; exec dbms_application_info.set_module('SALES fact load2','insert more rows');

PL/SQL procedure successfully completed.

SQL&gt;
SQL&gt; DECLARE
  2     l_module VARCHAR2(48) := sys_context('USERENV','MODULE');
  3  BEGIN
  4     EXECUTE IMMEDIATE
  5     'alter session enable resumable timeout 18000 name '''||l_module||'''';
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL&gt; insert into target.sales select * from sh.sales;</pre>
<p>I&#8217;ll have a look in the DBA_RESUMABLE view (there is also a USER_RESUMABLE version) for my suspended sessions. Even though I could get all the following information with a single SQL statement, I broke it up for better visibility on the blog:</p>
<pre>SQL&gt; select name, start_time, suspend_time, status from dba_resumable;

NAME              | START_TIME           | SUSPEND_TIME         | STATUS
----------------- | -------------------- | -------------------- | ------------
SALES fact load2  | 02/06/10 10:33:33    | 02/06/10 10:33:33    | SUSPENDED
SALES fact load   | 02/06/10 10:29:03    | 02/06/10 10:29:03    | SUSPENDED

2 rows selected.

Elapsed: 00:00:00.07
SQL&gt; select name, sql_text from dba_resumable;

NAME              | SQL_TEXT
----------------- | -----------------------------------------------
SALES fact load2  | insert into target.sales select * from sh.sales
SALES fact load   | insert into target.sales select * from sh.sales

2 rows selected.

SQL&gt; select name, error_msg from dba_resumable;

NAME              | ERROR_MSG
----------------- | ------------------------------------------------------------------------
SALES fact load2  | ORA-01653: unable to extend table TARGET.SALES by 8 in tablespace TARGET
SALES fact load   | ORA-01653: unable to extend table TARGET.SALES by 8 in tablespace TARGET

2 rows selected.

SQL&gt;</pre>
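<p>For completeness, the single-statement version might look like the following. It is a sketch of the query a monitoring tool could poll, restricted to sessions that are currently suspended:</p>
<pre>select name,
       status,
       start_time,
       suspend_time,
       timeout,
       error_msg,
       sql_text
  from dba_resumable
 where status = 'SUSPENDED'
 order by name;</pre>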
<p>The Oracle Database also publishes server alerts concerning suspended transactions using the Server-Generated Alerts infrastructure. This infrastructure uses the AWR toolset, the server package DBMS_SERVER_ALERT for getting and setting metric thresholds, and the queue table ALERT_QUE to hold alerts that have been published from AWR. Custom processes could be written to mine ALERT_QUE for these alerts, but the easiest way to configure and view server alerts is using Oracle Enterprise Manager (OEM). On the Alerts section of the main OEM page, we can see three different alerts generated by the Oracle Database:</p>
<div style="text-align:center"><a href="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/all-alerts.png"><img class="aligncenter" src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/all-alerts.png" border="0" alt="all alerts.png" width="500" height="228" /></a></div>
<p>If we click on the &#8220;Session Suspended&#8221; link, we can see the multiple alerts generated in this category:</p>
<div style="text-align:center"><a href="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/suspend-alerts.png"><img class="aligncenter" src="http://www.rittmanmead.com/wp2/wp-content/uploads/2010/02/suspend-alerts.png" border="0" alt="suspend alerts.png" width="500" height="81" /></a></div>
<p>Another alert generated indirectly by the suspended transaction is the &#8220;Configuration&#8221; class event caused by our session &#8220;waiting&#8221; to proceed. The Oracle wait event interface can show us information about the suspend waits on the system:</p>
<pre>SQL&gt; SELECT event,
  2         SUM(time_waited) time_waited,
  3         SUM(total_waits) total_waits,
  4         AVG(average_wait) average_wait
  5    FROM gv$session_event
  6   WHERE lower(event) LIKE '%suspend%'
  7   GROUP BY event
  8   ORDER BY time_waited ASC
  9  /

EVENT                                          | TIME_WAITED | TOTAL_WAITS | AVERAGE_WAIT
---------------------------------------------- | ----------- | ----------- | ------------
statement suspended, wait error to be cleared  |      305373 |        1377 |       221.78

1 row selected.

SQL&gt;</pre>
<p>To free up the space issue, I&#8217;ll enable autoextend on the TARGET tablespace. Then, I&#8217;ll take a look and see if anything has changed:</p>
<pre>SQL&gt; alter database datafile '/oracle/oradata/bidw1/target01.dbf'
  2  autoextend on next 10M maxsize 1000M;

Database altered.

SQL&gt; select status, resume_time, name from dba_resumable;

STATUS       | RESUME_TIME          | NAME
------------ | -------------------- | -----------------
NORMAL       | 02/06/10 10:56:49    | SALES fact load2
NORMAL       | 02/06/10 10:56:49    | SALES fact load

2 rows selected.

SQL&gt;</pre>
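<p>To confirm that the datafile really is set to grow, a quick look at DBA_DATA_FILES is enough. A small sketch, assuming the TARGET tablespace from this example:</p>
<pre>-- bytes and maxbytes are in bytes; increment_by is expressed in database blocks
select file_name,
       bytes,
       autoextensible,
       increment_by,
       maxbytes
  from dba_data_files
 where tablespace_name = 'TARGET';</pre>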
<p>The Resumable Space Allocation feature includes the AFTER SUSPEND trigger, which allows the specification of a system-wide trigger that will fire whenever a transaction is suspended. The typical use for this functionality is alerting, as suspended operations don&#8217;t write anything to the alert log.</p>
<p><strong>UPDATE: I made a mistake here&#8230; suspended transactions do in fact cause entries in the alert log, and so does the RESUME process detailed below.</strong></p>
<p>There are some features in the DBMS_RESUMABLE package that may make sense when writing an AFTER SUSPEND trigger:</p>
<pre>SQL&gt; desc dbms_resumable
PROCEDURE ABORT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
FUNCTION GET_SESSION_TIMEOUT RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
FUNCTION GET_TIMEOUT RETURNS NUMBER
PROCEDURE SET_SESSION_TIMEOUT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
 TIMEOUT                        NUMBER                  IN
PROCEDURE SET_TIMEOUT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TIMEOUT                        NUMBER                  IN
FUNCTION SPACE_ERROR_INFO RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 ERROR_TYPE                     VARCHAR2                OUT
 OBJECT_TYPE                    VARCHAR2                OUT
 OBJECT_OWNER                   VARCHAR2                OUT
 TABLE_SPACE_NAME               VARCHAR2                OUT
 OBJECT_NAME                    VARCHAR2                OUT
 SUB_OBJECT_NAME                VARCHAR2                OUT

SQL&gt;</pre>
<p>This package adds functionality for writing custom processes in the AFTER SUSPEND trigger. The SPACE_ERROR_INFO function returns specifics about the table and tablespace affected by the space error. A series of checks could be coded enabling specific actions depending on which objects were affected. A suspended process can be ended prematurely with the ABORT procedure, or more time can be added using the SET_TIMEOUT procedure. I actually had one client explain how she had written an AFTER SUSPEND trigger that compiled information about the tablespace affected so that an &#8220;ALTER DATABASE&#8230; RESIZE&#8230;&#8221; command could be issued to add more space to the affected datafile. I didn&#8217;t have the heart to tell her that she had basically written a feature that already existed in the database: AUTOEXTEND.</p>
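<p>To make that concrete, here is a minimal sketch of an AFTER SUSPEND trigger. The trigger name, the tablespace check and the eight-hour value are my own assumptions for illustration; it simply gives sessions suspended on the TARGET tablespace more time before they fail:</p>
<pre>CREATE OR REPLACE TRIGGER dw_after_suspend
AFTER SUSPEND ON DATABASE
DECLARE
   l_found        BOOLEAN;
   l_error_type   VARCHAR2(64);
   l_object_type  VARCHAR2(64);
   l_object_owner VARCHAR2(64);
   l_tablespace   VARCHAR2(64);
   l_object_name  VARCHAR2(64);
   l_sub_object   VARCHAR2(64);
BEGIN
   -- ask the database which object and tablespace caused the suspension
   l_found := dbms_resumable.space_error_info(
                 l_error_type,
                 l_object_type,
                 l_object_owner,
                 l_tablespace,
                 l_object_name,
                 l_sub_object);

   -- for the warehouse tablespace, wait up to eight hours for space to be added
   IF l_found AND l_tablespace = 'TARGET' THEN
      dbms_resumable.set_timeout(28800);
   END IF;
END;
/</pre>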
<p>So what are the best practices to take away from this? Quite simply&#8230; all ETL mappings and flows, as well as database maintenance processes, should use Resumable Space Allocation, preferably using the NAME clause in conjunction with DBMS_APPLICATION_INFO. Setting a RESUMABLE_TIMEOUT value at the system level can be scary, because a single suspended transaction could cause locks that reverberate all the way through the system. But is this really a concern in a BI/DW environment? Are there any processes in our batch load window or with any of our operational maintenance processes that we wouldn&#8217;t want to enable for resumable operations, no matter how many processes back up waiting for them to complete? It could spell bad news if we used any kind of synchronous replication technology to move data to the DW instance, but short of that, I can&#8217;t think of any. Please let me know if you have alternative viewpoints.</p>
<p>I&#8217;ve never found much reason to use the AFTER SUSPEND trigger though. Data warehouses should have production-type monitoring running already, just like other production systems. OEM is more than satisfactory for basic monitoring and alerting, and with the Server-Generated Alerts introduced in 10g, forms a complete product for Oracle environments. But regardless of which monitoring solution is used, it should be able to issue simple queries against the database and alert based on the results of those queries. A select against the DBA_RESUMABLE view provides all the information required to send out an alert, and with features such as AUTOEXTEND, I just can&#8217;t see a requirement for the ability to issue procedural code because a transaction is suspended.</p>
<p><strong>UPDATE: as pointed out above, since suspended transactions do in fact show up in the alert log, this is good news for integrating Resumable Space Allocation into an existing environment. Assuming that there&#8217;s proper alert log monitoring with paging functionality already in place, implementing resumable operations can simply use that infrastructure already in place.</strong></p>
<p>Keep your eyes open for the next of the &#8220;Three R&#8217;s&#8221; in BI/DW fault tolerance: Restarting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/08/data-warehouse-fault-tolerance-part-1-resuming/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Data Warehouse Fault Tolerance: An Introduction</title>
		<link>http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/</link>
		<comments>http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 17:06:01 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4249</guid>
		<description><![CDATA[With so much of the blog devoted to OBIEE, OWB and Essbase lately, I felt like it was time to do a few database-related postings. In the past, when I&#8217;ve posted database content to the blog, I usually gravitate toward ETL-related features: those that waffle between database administration and ETL development. But this time I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[<p>With so much of the blog devoted to OBIEE, OWB and Essbase lately, I felt like it was time to do a few database-related postings. In the past, when I&#8217;ve posted database content to the blog, I usually gravitate toward ETL-related features: those that waffle between database administration and ETL development. But this time I&#8217;m going to take a very different route and discuss data warehouse fault tolerance, and so I&#8217;ll be doing a series of postings that discuss what it means to strive to be fault-free.</p>
<p>Fault tolerance isn&#8217;t disaster recovery exactly&#8230; though there&#8217;s a lot of overlap. Instead, fault tolerance is the ability to recover from errors, and those errors can result from hardware issues, software issues, general systems issues (network latency, out-of-space errors), and human mistakes. The main point is that BI/DW environments present unique challenges, both for operations and for the development team. I&#8217;m not preposing that the divide between transactional and reporting systems is necessarily vast&#8230; we still need redundant storage systems and dependable backup strategies. I am preposing, however, that one-size-fits-all approaches to fault-tolerance is problematic, and applying standards that evolved in support of transactional systems may not provide the best protection for BI/DW environments.</p>
<p>The operational teams (DBAs, Unix Admins, Storage Admins, etc.) and the development teams (source system extraction, ETL) have to work more closely in a BI/DW environment than perhaps they do in OLTP environments. Of course, OLTP developers have to write scalable code&#8230; but I think that&#8217;s within their control for the most part. ETL developers are thrashing around millions or billions of rows of data, and because of this, everything needs to be well-oiled: undo space needs to be available, temp space needs to be plentiful, standard operational jobs such as backup and recovery or statistics gathering need to keep the batch load window in mind, etc. Whereas OLTP code is exclusively SQL&#8230; ETL code is packed full of DDL: partition-exchange loads, index and constraint maintenance, table truncates, the whole gamut.</p>
<p>So when working with millions or billions of rows, we need to eliminate errors as best we can. Sounds simple enough, but the truth is that errors are going to happen, and there&#8217;s nothing we can do to wipe them out completely. But we can mitigate. So we need to introduce a triage process: catching and correcting errors as early as possible so that their damage is minimal. In essence: don&#8217;t let simple errors turn into weekend-long data correction issues, where millions of rows need to be updated or deleted. Let&#8217;s work smarter, not harder, using every solution available to us, including features present in the Oracle Database, best practices in ETL development, and possible modifications to our backup and recovery strategies.</p>
<p>I should note that, when speaking of BI/DW environments, I still have the batch load paradigm squarely in mind. Although the line in the sand is certainly moving in one constant direction, most data warehouses are still loaded with either batch or mini-batch processing. However, being a fan of near-real-time data warehouse techniques (as my colleague Peter Scott has written from time to time&#8230; only reporting from the source system itself is truly real-time), I&#8217;ll be sure to point out how some of these techniques differ the closer we get to the actual transaction.</p>
<p>I currently have three postings in mind that correlate to the Three R&#8217;s of Data Warehouse Fault Tolerance. Be on the lookout for the first installment coming soon: Resuming.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/02/data-warehouse-fault-tolerance-an-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Capturing Change (Part 2)</title>
		<link>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/</link>
		<comments>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 10:54:21 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=3985</guid>
		<description><![CDATA[In the previous part I outlined the business need for writing our own CDC routines and started to outline some of the issues we would need to resolve. Here I shall outline the approach we took and describe how to go about building the SQL needed to synthesise the rows that need to be added [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/" target="_blank">previous</a> part I outlined the business need for writing our own CDC routines and started to outline some of the issues we would need to resolve. Here I shall outline the approach we took and describe how to go about building the SQL needed to synthesise the rows that need to be added to our data warehouse.<br />
First, a recap of the types of records we will need to process:</p>
<pre>operation$...	PK	...	C1
 I		87		'J' -- a newly inserted row
UO		1		NULL
UN		1		'X' -- NULL value becomes 'X'
UO		99		'Y'
UN		99		'Z' -- 'Y' becomes 'Z'
UO		42		'A'
UN		42		NULL -- 'A' becomes NULL
UO		39		NULL
UN		39		NULL -- data value remains unchanged (not necessarily NULL)</pre>
<p>As I mentioned last time, we have no real interest in delete records (in fact my source system does not generate deletes); we are using CDC to populate a data warehouse and the fact that a record had existed is of interest to us. As I also mentioned, we are not using supplemental logging, so we can&#8217;t just apply the I and UN records in chronological order.</p>
<p>A point I did not emphasise last time is that change data capture does just that: it captures changes, in fact every commit of a change. This means that we may see changes that are of no interest to our data warehouse. It also means we may see &#8216;non-changes&#8217; where a record is &#8220;updated&#8221; and then committed without changes to data values being made. I would suggest that the best way to filter out these records (if filtering is appropriate for your business) is as part of the downstream ETL and not of the change capture process.</p>
<p>So how do we go about building a change capture view? The first thing is to realise that we can do this with analytic functions. The second is that we will need a lot of them (several for each column being processed), and this can have a major (negative) impact on query performance as we carry out many window sort operations on the same dataset.</p>
<p>Let&#8217;s start with the update row type. For an update we need a row to exist, therefore any change should be built on a prior version of the row. If we consider the first update for a row in the subscriber view, it must either be for a newly inserted row or for one we already hold in the target table. Subsequent updates in the same CDC subscriber view need to be applied in the order they occurred. As the CDC identifies changes by the row key, an SCN and an RSID value, we already have the bones of ordering our changes to apply; we only need to add a way of finding the original value in the target table if it exists. Again, analytics come to our rescue.</p>
<p>Whenever I write a query using analytics I try to work out how to partition and order the data to achieve my goal. With the updates I need to process pairs of UO and UN records for the same key value and change number and order them so that the UN record comes last. We then need to look for changes between the UO and UN record. To my thinking this is a simple LEAD() or LAG() to bring the before and after versions onto the same row and a CASE expression to determine if the column has become null in the captured change.</p>
<pre>select * from (
SELECT OPERATION$,
  CSCN$,
  COMMIT_TIMESTAMP$,
  RSID$,
  ORDER_ID,

/* Now for the changing columns */
  ORDER_STATUS,
  CASE
    WHEN lag(ORDER_STATUS) over (partition BY ORDER_ID, rsid$ order by OPERATION$ DESC ) IS NULL
    AND ORDER_STATUS                                                            IS NULL
    THEN NULL -- No change to value
    ELSE
      CASE
        WHEN ORDER_STATUS IS NULL
        THEN 2  -- 2 = value changed from NOT NULL to NULL
        ELSE 1  -- 1 = value changed to a non-NULL value
      END
  END c_ORDER_STATUS
/* repeat similar logic for each of the other columns of interest. */

  FROM CDC_ORDERS
  WHERE operation$ &lt;&gt; 'I' -- only looking at UO and UN operations
  )
WHERE operation$ = 'UN' -- we only need to process the final states of each change</pre>
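<p>To see the pairing logic in isolation, here is a minimal, self-contained sketch that runs the same CASE test against a few hand-made rows taken from the recap above. The CDC_SAMPLE name and its cut-down columns (OP, RSID, PK, C1) are made up for illustration; they stand in for OPERATION$, RSID$, the key and one tracked column of a real subscriber view.</p>
<pre>WITH cdc_sample AS (
  -- each UO/UN pair shares a key and a row sequence id
  SELECT 'UO' op, 1 rsid, 1 pk, CAST(NULL AS VARCHAR2(1)) c1 FROM dual UNION ALL
  SELECT 'UN', 1, 1,  'X'  FROM dual UNION ALL  -- NULL becomes 'X'
  SELECT 'UO', 2, 42, 'A'  FROM dual UNION ALL
  SELECT 'UN', 2, 42, NULL FROM dual UNION ALL  -- 'A' becomes NULL
  SELECT 'UO', 3, 39, NULL FROM dual UNION ALL
  SELECT 'UN', 3, 39, NULL FROM dual            -- column not touched
)
SELECT pk, c1, c_c1
FROM (
  SELECT op, pk, c1,
         CASE
           -- ordering by op DESC puts UO before UN, so LAG() on the UN row sees the UO value
           WHEN LAG(c1) OVER (PARTITION BY pk, rsid ORDER BY op DESC) IS NULL
            AND c1 IS NULL THEN NULL  -- no change to this column
           WHEN c1 IS NULL THEN 2     -- became NULL
           ELSE 1                     -- now holds a non-NULL value
         END c_c1
  FROM cdc_sample
)
WHERE op = 'UN'
ORDER BY pk;</pre>
<p>With these rows, the flag should come back as 1 for key 1, NULL for key 39 and 2 for key 42, which matches the three cases described in the recap.</p>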
<p>We now need to union this set of rows to the most recent stored version of the row; this is either the version in our data store with the most recent timestamp or the insert record in our CDC view. Picking the most recent value from the data store can be achieved by using the row_number function:</p>
<pre>	SELECT OPERATION$, CSCN$,COMMIT_TIMESTAMP$,RSID$,ORDER_ID, ORDER_STATUS, c_ORDER_STATUS ... from (
	SELECT 'X' OPERATION$, -- set a constant
	  -1 CSCN$, -- set to a constant so we can simply filter it out later -- we don't want to re-insert this record!
	  COMMIT_TIMESTAMP$,
	  RSID$,
	  ORDER_ID,

/* For each column of interest */
	  ORDER_STATUS,
	  1 c_ORDER_STATUS, -- set a constant
/* Repeat the above block */

	row_number() over (partition by order_id order by COMMIT_TIMESTAMP$ DESC,RSID$ DESC) RN  -- most recent version will be 1
	from  ODS_ORDERS
	where ORDER_ID in (select order_id from CDC_ORDERS) -- we only want order_id values that are in the CDC view
	) where RN = 1 -- only select the most recent</pre>
<p>or for the &#8216;I&#8217; record in our CDC subscriber view window.</p>
<pre>	SELECT OPERATION$,
	  CSCN$,
	  COMMIT_TIMESTAMP$,
	  RSID$,
	  ORDER_ID,

	/* Now for the changing columns */
	  ORDER_STATUS,
	  1 c_ORDER_STATUS -- set a constant
	/* repeat similar logic for each of the other columns of interest. */

	  FROM CDC_ORDERS
	  WHERE operation$ = 'I' -- only looking at I operations</pre>
<p>As the previous value for a given PK will be in one of two places (but not both), a simple UNION ALL suffices to provide all of the rows we need to build the history of changes.<br />
The next stage of the processing is to take the three UNION ALL sources and then, using analytics, &#8220;copy down&#8221; previous values to fill in blanks. Here I use the LAST_VALUE function to look back over an ordered window.</p>
<pre>SELECT
	OPERATION$,
    CSCN$,
    COMMIT_TIMESTAMP$,
    RSID$,
    ORDER_ID,
    CASE
      WHEN c_ORDER_STATUS IS NOT NULL  -- there is a new value of ORDER_STATUS
      THEN ORDER_STATUS
      ELSE -- look at the last change for this column
        CASE LAST_VALUE(c_ORDER_STATUS ignore nulls) over (partition BY ORDER_ID order by CSCN$, RSID$)
          WHEN 1 -- changed to a non null at the last change
          THEN LAST_VALUE(ORDER_STATUS ignore nulls) over (partition BY ORDER_ID order by CSCN$, RSID$)
          WHEN 2 -- became NULL at the last change
          THEN NULL
/*
we could use LAST_VALUE(ORDER_STATUS ) over (partition BY ORDER_ID order by CSCN$, RSID$) but this would add a sort to query plan
*/
        END
    END ORDER_STATUS,
/* similar code for the remaining columns, all selected from my UNION ALL view of ODS_ORDERS, CDC_ORDERS where operation$ = 'I', and CDC_ORDERS where operation$ is not 'I' */</pre>
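<p>The copy-down behaviour is easier to follow on a tiny data set. The sketch below is only an illustration: the CHANGES name and the SEQ ordering column are invented and stand in for the UNION ALL view and its CSCN$, RSID$ ordering, but the CASE and LAST_VALUE logic is the same as above.</p>
<pre>WITH changes AS (
  SELECT 1 seq, 7 order_id, 'OPEN' order_status, 1 c_order_status FROM dual UNION ALL
  SELECT 2, 7, NULL, NULL FROM dual UNION ALL  -- an update that did not touch ORDER_STATUS
  SELECT 3, 7, NULL, 2    FROM dual UNION ALL  -- ORDER_STATUS set to NULL
  SELECT 4, 7, NULL, NULL FROM dual            -- untouched again
)
SELECT seq,
       CASE
         WHEN c_order_status IS NOT NULL
         THEN order_status  -- this row carries its own value (possibly NULL)
         ELSE
           CASE LAST_VALUE(c_order_status IGNORE NULLS) OVER (PARTITION BY order_id ORDER BY seq)
             WHEN 1 THEN LAST_VALUE(order_status IGNORE NULLS) OVER (PARTITION BY order_id ORDER BY seq)
             WHEN 2 THEN NULL
           END
       END order_status
FROM changes
ORDER BY seq;</pre>
<p>Row 2 should inherit 'OPEN' from row 1, while rows 3 and 4 should both come back NULL: row 3 because it records the change to NULL, and row 4 because the last recorded change flag before it is 2.</p>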
<p>The final things to deal with are these. The COMMIT_TIMESTAMP$ column is a DATE, so we may get multiple rows for a given key and date if multiple commits occur in the same second. As far as I am concerned here, multiple commits in the same second are (in spirit) the same change, so we can take the last row in any given second; again we use the row_number function for this:</p>
<pre>ROW_NUMBER() OVER (PARTITION BY ORDER_ID, COMMIT_TIMESTAMP$ ORDER BY RSID$ DESC) RN</pre>
<p>The other is not re-inserting the &#8220;seed&#8221; rows we took from ODS_ORDERS; as we set CSCN$ to -1 for those rows, we simply filter on CSCN$ values greater than zero in our insert.</p>
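<p>Put together, those two final filters might look something like the sketch below. HISTORY_SAMPLE is a made-up stand-in for the big view, holding one seed row (CSCN$ of -1) and two changes committed within the same second.</p>
<pre>WITH history_sample AS (
  SELECT -1 cscn$, DATE '2009-12-01' commit_timestamp$, 0 rsid$, 99 order_id, 'Y' order_status FROM dual UNION ALL  -- seed row from the data store
  SELECT 500, DATE '2009-12-30', 10, 99, 'Z' FROM dual UNION ALL
  SELECT 501, DATE '2009-12-30', 11, 99, 'W' FROM dual
)
SELECT cscn$, commit_timestamp$, rsid$, order_id, order_status
FROM (
  SELECT h.*,
         -- number the rows within each key and second, latest RSID$ first
         ROW_NUMBER() OVER (PARTITION BY order_id, commit_timestamp$ ORDER BY rsid$ DESC) rn
  FROM history_sample h
)
WHERE rn = 1       -- keep the last change in any given second
  AND cscn$ &gt; 0;   -- drop the seed rows so they are not re-inserted</pre>
<p>Only the row with CSCN$ 501 should survive: the seed row is removed by the CSCN$ filter and the earlier change in the same second by the row_number filter.</p>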
<p>That&#8217;s basically it &#8211; a huge view with many, many analytics &#8211; but it performs quite well providing you are not processing too large a window.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transcend and Constraint Maintenance</title>
		<link>http://www.rittmanmead.com/2009/12/29/transcend-and-constraint-maintenance/</link>
		<comments>http://www.rittmanmead.com/2009/12/29/transcend-and-constraint-maintenance/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 18:55:07 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=3979</guid>
		<description><![CDATA[In this posting, I explained the Transcend product as a new offering here at Rittman Mead, and I went on to demonstrate its use with index maintenance. Now, I&#8217;d like to demonstrate another staple of data warehouse load routines: constraint maintenance.
I constructed a complete replica of the SH schema called SH_NEW for use in this [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/">this</a> posting, I explained the Transcend product as a new offering here at Rittman Mead, and I went on to demonstrate its use with index maintenance. Now, I&#8217;d like to demonstrate another staple of data warehouse load routines: constraint maintenance.</p>
<p>I constructed a complete replica of the SH schema called SH_NEW for use in this example. Assuming that this schema contains an actual data warehouse that we want to load data into, let&#8217;s suppose that we want to disable some constraints prior to loading one of the fact tables. We aren&#8217;t exactly sure which constraints we want to disable, and we&#8217;re a little lazy and don&#8217;t want to query the data dictionary. Transcend gives us what&#8217;s called Debug Mode, and it allows us to see what the framework WOULD DO with a particular command, without actually doing it. This works for all aspects of the framework, and is initiated with a single command:</p>
<pre>SQL&gt; BEGIN
  2
  3     trans_adm.start_debug;
  4
  5  END;
  6  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.45
SQL&gt;</pre>
<p>All Transcend packages and underlying code objects are written to understand that, while a session is in Debug Mode, no DDL statements should be executed inside that session. There is still plenty of recursive SQL that goes on inside that session: queries against Oracle dictionary objects and Transcend tables, auditing and logging, etc. However, the session is in &#8220;do no harm&#8221; mode, and everything the framework does in this mode is benign. We could go back to the default Runtime Mode by issuing the TRANS_ADM.STOP_DEBUG procedure.</p>
<p>Now I&#8217;ll use Debug Mode to see all the constraints that would get disabled with a particular command, so that I can evaluate whether I want to disable all of these, or perhaps just a subset:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.disable_constraints(
  3                                    p_table          =&gt; 'sales',
  4                                    p_owner          =&gt; 'sh_new'
  5                                  );
  6  END;
  7  /
SQL: alter table SH_NEW.SALES disable constraint SALES_CHANNEL_FK
Constraint SALES_CHANNEL_FK disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SALES_CUSTOMER_FK
Constraint SALES_CUSTOMER_FK disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SALES_PRODUCT_FK
Constraint SALES_PRODUCT_FK disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SALES_PROMO_FK
Constraint SALES_PROMO_FK disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SALES_TIME_FK
Constraint SALES_TIME_FK disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014014
Constraint SYS_C0014014 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014009
Constraint SYS_C0014009 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014010
Constraint SYS_C0014010 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014011
Constraint SYS_C0014011 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014012
Constraint SYS_C0014012 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014013
Constraint SYS_C0014013 disabled on SH_NEW.SALES
SQL: alter table SH_NEW.SALES disable constraint SYS_C0014008
Constraint SYS_C0014008 disabled on SH_NEW.SALES
12 constraint disablement processes for SH_NEW.SALES executed

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.13
SQL&gt; select distinct status from all_constraints where owner='SH_NEW';

STATUS
------------
ENABLED

1 row selected.

Elapsed: 00:00:00.11
SQL&gt;</pre>
<p>We are able to evaluate the DDL that the command generates without executing it. In the output above, we can see multiple check constraints, as well as multiple foreign key constraints. To give us three types of constraints to work with, I&#8217;ll go ahead and build a primary key on the SALES table, which is usually made up of the combination of all the foreign keys to the dimension tables:</p>
<pre>SQL&gt; alter table sh_new.sales add constraint sales_pk
  2  primary key (prod_id, cust_id, time_id, channel_id, promo_id);

Table altered.

Elapsed: 00:00:07.70
SQL&gt;</pre>
<p>Now, I can use the parameter P_CONSTRAINT_TYPE to determine which types of constraints I want to disable. This parameter actually accepts a regular expression, so it will match any of the constraint types from the ALL_CONSTRAINTS view, such as P, R, C, etc.; if I want to specify only two types of constraints, I can do that as well:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.disable_constraints(
  3                                    p_table             =&gt; 'sales',
  4                                    p_owner             =&gt; 'sh_new',
  5                                    p_constraint_type   =&gt; 'p|c'
  6                                  );
  7  END;
  8  /
Constraint SALES_PK disabled on SH_NEW.SALES
Constraint SYS_C0014008 disabled on SH_NEW.SALES
Constraint SYS_C0014009 disabled on SH_NEW.SALES
Constraint SYS_C0014014 disabled on SH_NEW.SALES
Constraint SYS_C0014011 disabled on SH_NEW.SALES
Constraint SYS_C0014012 disabled on SH_NEW.SALES
Constraint SYS_C0014013 disabled on SH_NEW.SALES
Constraint SYS_C0014010 disabled on SH_NEW.SALES
8 constraint disablement processes for SH_NEW.SALES executed

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.75
SQL&gt; select constraint_type, status, count(*) from all_constraints
  2  where owner='SH_NEW' and table_name='SALES' group by constraint_type, status;

C | STATUS       |   COUNT(*)
- | ------------ | ----------
R | ENABLED      |          5
C | DISABLED     |          7
P | DISABLED     |          1

3 rows selected.

Elapsed: 00:00:00.20
SQL&gt;</pre>
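<p>To preview which constraints an expression such as &#8216;p|c&#8217; would pick up, a plain dictionary query along the following lines can help. This is only a sketch of the idea using standard Oracle features; I&#8217;m assuming a case-insensitive match here, and it is not a copy of what Transcend runs internally.</p>
<pre>SELECT constraint_name, constraint_type, status
FROM   all_constraints
WHERE  owner = 'SH_NEW'
AND    table_name = 'SALES'
AND    REGEXP_LIKE(constraint_type, 'p|c', 'i')  -- 'i' makes the match case-insensitive
ORDER  BY constraint_type, constraint_name;</pre>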
<p>Enabling is just as easy&#8230; using different constraint types, or using constraint names as regular expressions. As I demonstrated in the <a href="http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/">previous post</a>, I can also use the Oracle Scheduler to queue processes up so they execute concurrently. However, this can be dangerous with constraints, so be sure the combination of constraints to be built won&#8217;t block one another.</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.enable_constraints(
  3                                   p_table                =&gt; 'sales',
  4                                   p_owner                =&gt; 'sh_new',
  5                                   p_constraint_regexp    =&gt; 'sys',
  6                                   p_concurrent           =&gt; 'yes'
  7                                  );
  8  END;
  9  /
Oracle scheduler job CONSTRAINT_MAINT101 created
Oracle scheduler job CONSTRAINT_MAINT101 enabled
Constraint SYS_C0014008 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT102 created
Oracle scheduler job CONSTRAINT_MAINT102 enabled
Constraint SYS_C0014009 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT103 created
Oracle scheduler job CONSTRAINT_MAINT103 enabled
Constraint SYS_C0014010 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT104 created
Oracle scheduler job CONSTRAINT_MAINT104 enabled
Constraint SYS_C0014014 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT105 created
Oracle scheduler job CONSTRAINT_MAINT105 enabled
Constraint SYS_C0014012 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT106 created
Oracle scheduler job CONSTRAINT_MAINT106 enabled
Constraint SYS_C0014013 enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT107 created
Oracle scheduler job CONSTRAINT_MAINT107 enabled
Constraint SYS_C0014011 enabled on SH_NEW.SALES
7 constraint enablement processes for SH_NEW.SALES submitted to the Oracle scheduler

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.16
SQL&gt;</pre>
<p>Another option with constraints is validating them. This is useful with the primary key of a fact table, because the underlying index of that primary key is all but useless: it&#8217;s not valuable in any typical star schema queries, and the space it consumes could be considerable. Keeping the constraint disabled is another option, but in some cases, validating that a constraint could be enabled goes a long way with data quality initiatives. So I&#8217;ll start by turning up the logging level so we can see the actual DDL that is generated. I&#8217;ll validate the primary key on the SALES table, and then attempt to insert some data into it:</p>
<pre>SQL&gt; BEGIN
  2
  3     trans_adm.set_module_conf(
  4                                p_logging_level=&gt; 3
  5                              );
  6
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL&gt;
SQL&gt; BEGIN
  2     trans_etl.validate_constraints(
  3                                     p_table              =&gt; 'sales',
  4                                     p_owner              =&gt; 'sh_new',
  5                                     p_constraint_type    =&gt; 'p',
  6                                     p_concurrent         =&gt; 'no'
  7                                  );
  8  END;
  9  /
SQL: alter table SH_NEW.SALES modify constraint SALES_PK validate
Constraint SALES_PK validated on SH_NEW.SALES
1 constraint process for SH_NEW.SALES executed

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.71
SQL&gt; insert into sh_new.sales values (1,1,trunc(sysdate),1,1,10,10);
insert into sh_new.sales values (1,1,trunc(sysdate),1,1,10,10)
*
ERROR at line 1:
ORA-25128: No insert/update/delete on table with constraint (SH_NEW.SALES_PK) disabled and validated

Elapsed: 00:00:00.02
SQL&gt;</pre>
<p>To be able to insert rows into a table with a constraint that is disabled and validated, we would have to either return it to the plain disabled (not validated) state (most likely) or enable it first.</p>
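<p>In plain Oracle DDL, leaving the disabled-and-validated state would look something like the following. This is a sketch of the underlying statements only; I haven&#8217;t shown the Transcend call that would generate them.</p>
<pre>-- Most likely choice before a load: disabled and not validated, so DML is allowed again
ALTER TABLE sh_new.sales MODIFY CONSTRAINT sales_pk DISABLE NOVALIDATE;

-- The alternative: enable the constraint, which also builds its underlying index
ALTER TABLE sh_new.sales MODIFY CONSTRAINT sales_pk ENABLE;</pre>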
<p>Finally, for our last example, I&#8217;ll demonstrate the parameter P_BASIS. Though it&#8217;s probably poorly named, what it actually means is &#8220;which object is the BASIS of the constraint maintenance: the table that holds the constraint, or the table that is referenced by the constraint?&#8221; So far, the default of &#8216;table&#8217; has been passed in all the examples, but we could pass either a value of &#8216;reference&#8217; or &#8216;all&#8217;. This means that we can pass the name of a dimension table to the DISABLE_CONSTRAINTS procedure when we actually want to disable constraints on a fact table. We may not know for sure which fact tables have foreign keys pointing to a particular dimension table, only that we want to disable any constraints that reference it. P_BASIS allows us to do that.</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.disable_constraints(
  3                                    p_table    =&gt; 'products',
  4                                    p_owner    =&gt; 'sh_new',
  5                                    p_basis    =&gt; 'reference'
  6                                  );
  7  END;
  8  /
Constraint SALES_PRODUCT_FK disabled on SH_NEW.SALES
Constraint COSTS_PRODUCT_FK disabled on SH_NEW.COSTS
2 constraint disablement processes related to SH_NEW.PRODUCTS executed

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
SQL&gt;
SQL&gt; BEGIN
  2     trans_etl.enable_constraints(
  3                                   p_table      =&gt; 'products',
  4                                   p_owner      =&gt; 'sh_new',
  5                                   p_basis      =&gt; 'reference',
  6                                   p_concurrent =&gt; 'yes'
  7                                  );
  8  END;
  9  /
Oracle scheduler job CONSTRAINT_MAINT131 created
Oracle scheduler job CONSTRAINT_MAINT131 enabled
Constraint SALES_PRODUCT_FK enabled on SH_NEW.SALES
Oracle scheduler job CONSTRAINT_MAINT132 created
Oracle scheduler job CONSTRAINT_MAINT132 enabled
Constraint COSTS_PRODUCT_FK enabled on SH_NEW.COSTS
2 constraint enablement processes related to SH_NEW.PRODUCTS submitted to the Oracle scheduler

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.71
SQL&gt;</pre>
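<p>For comparison, finding the foreign keys that reference a dimension table by hand means joining ALL_CONSTRAINTS to itself. The query below is a rough sketch of that lookup using the standard dictionary view; it is not taken from the Transcend source.</p>
<pre>SELECT fk.owner, fk.table_name, fk.constraint_name, fk.status
FROM   all_constraints fk
JOIN   all_constraints pk
ON     pk.owner = fk.r_owner
AND    pk.constraint_name = fk.r_constraint_name  -- the key the foreign key points at
WHERE  fk.constraint_type = 'R'
AND    pk.owner = 'SH_NEW'
AND    pk.table_name = 'PRODUCTS'
ORDER  BY fk.table_name, fk.constraint_name;</pre>
<p>In the SH_NEW schema used here, I would expect this to list the same two constraints shown in the output above: SALES_PRODUCT_FK and COSTS_PRODUCT_FK.</p>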
<p>More to come soon, including object cloning (of which we&#8217;ve seen a little bit already), stats maintenance, and eventually, complex scenarios such as partition exchanges, table replacements, and configured Mappings and Dimensions which are easily called from ETL tools such as OWB.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/29/transcend-and-constraint-maintenance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transcend and Index Maintenance</title>
		<link>http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/</link>
		<comments>http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 14:51:01 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Rittman Mead]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=3932</guid>
		<description><![CDATA[Before joining up with the guys across the pond, I ran my own consulting company based in Atlanta called Transcendent Data. With each new data warehouse I built, I realized I was engineering the same processes over and over again with each client. So I developed a framework of best-practices for ETL development, encompassing such [...]]]></description>
			<content:encoded><![CDATA[<p>Before joining up with the guys across the pond, I ran my own consulting company based in Atlanta called Transcendent Data. With each new data warehouse I built, I realized I was engineering the same processes over and over again with each client. So I developed a framework of best practices for ETL development, encompassing such things as auditing and logging, index and constraint maintenance, complex load scenarios, slowly-changing dimensions (SCDs), and other things. The product is called Transcend, and now we&#8217;ve decided to start offering it here at Rittman Mead, so I&#8217;ll be blogging about some of the functionality from time to time. It&#8217;s written entirely in PL/SQL, object-relational types, and Java stored procedures, so it installs completely within the database, supporting versions 10gR2 and forward.</p>
<p>Though the most requested aspect of Transcend is its support for loading SCDs in set-based mode, including combinations of Type 1 and Type 2 attributes in the same table, this aspect depends on a lot of core features in the product, so it&#8217;s prudent that I demonstrate some of these core features first, and then build up to the more advanced features in future postings. So the first thing I&#8217;d like to demonstrate is Transcend&#8217;s support for index maintenance in an ETL context. I&#8217;ll also use some of Transcend&#8217;s other features in setting up the test case.</p>
<p>First, I&#8217;ll build a table just like the SH.SALES table, but I&#8217;ll create it in another schema. I&#8217;ll include  the partitioning information, the indexes, and all the rows. The combination of parameters below dictates which table properties to include in the cloning process:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'sales_fact',
  4                            p_owner          =&gt; 'target',
  5                            p_source_table   =&gt; 'sales',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'yes',
  9                            p_rows           =&gt; 'yes',
 10                            p_indexes        =&gt; 'yes',
 11                            p_constraints    =&gt; 'no',
 12                            p_statistics     =&gt; 'transfer'
 13                          );
 14  END;
 15  /
Table TARGET.SALES_FACT created
Number of records inserted into TARGET.SALES_FACT: 918843
Statistics from SH.SALES transfered to TARGET.SALES_FACT
Index SALES_FACT_CHANNEL_BIX built
Index SALES_FACT_CUST_BIX built
Index SALES_FACT_PROD_BIX built
Index SALES_FACT_PROMO_BIX built
Index SALES_FACT_TIME_BIX built
5 index creation processes executed for TARGET.SALES_FACT

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.92
SQL&gt;</pre>
<p>I also need a version of the table without the rows, the indexes, or the partitioning information. Then, I&#8217;ll insert only the rows from the SH.SALES table from the year 1998:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.build_table(
  3                            p_table          =&gt; 'sales_stg',
  4                            p_owner          =&gt; 'target',
  5                            p_source_table   =&gt; 'sales',
  6                            p_source_owner   =&gt; 'sh',
  7                            p_tablespace     =&gt; 'users',
  8                            p_partitioning   =&gt; 'no',
  9                            p_rows           =&gt; 'no',
 10                            p_indexes        =&gt; 'no',
 11                            p_constraints    =&gt; 'no',
 12                            p_statistics     =&gt; 'transfer'
 13                          );
 14  END;
 15  /
Table TARGET.SALES_STG created
Statistics from SH.SALES transfered to TARGET.SALES_STG

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.77
SQL&gt; insert into target.sales_stg select * from sh.sales where to_char(time_id,'yyyy') = '1998';

178834 rows created.

Elapsed: 00:00:00.95
SQL&gt; commit;

Commit complete.

Elapsed: 00:00:00.04
SQL&gt;</pre>
<p>With these two tables set up, I&#8217;ll demonstrate some of the options Transcend provides for index maintenance. First is the ability to mark indexes on a particular table unusable according to a variety of attributes. To start, I&#8217;ll mark all bitmaps unusable, and then I&#8217;ll rebuild them:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.unusable_indexes(
  3                                 p_table          =&gt; 'sales_fact',
  4                                 p_owner          =&gt; 'target',
  5                                 p_index_type     =&gt; 'bitmap'
  6                          );
  7  END;
  8  /
5 indexes and 0 local index partitions affected on table TARGET.SALES_FACT

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.07
SQL&gt;
SQL&gt; BEGIN
  2     trans_etl.usable_indexes(
  3                               p_table          =&gt; 'sales_fact',
  4                               p_owner          =&gt; 'target'
  5                          );
  6  END;
  7  /
Rebuild processes for unusable indexes on 28 partitions of table TARGET.SALES_FACT executed
No matching unusable global indexes found

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.77
SQL&gt;
SQL&gt;</pre>
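<p>For a sense of what this saves, doing the same thing by hand usually means generating the DDL from the data dictionary. The two queries below are a rough sketch using standard Oracle views, assuming the indexes are local partitioned indexes as in this example; they only produce the statements and are not what Transcend itself executes.</p>
<pre>-- Generate the statements that mark every bitmap index on the table unusable
SELECT 'alter index ' || owner || '.' || index_name || ' unusable' ddl
FROM   all_indexes
WHERE  table_owner = 'TARGET'
AND    table_name  = 'SALES_FACT'
AND    index_type  = 'BITMAP';

-- Generate a rebuild statement for each unusable local index partition on that table
SELECT 'alter index ' || ip.index_owner || '.' || ip.index_name
       || ' rebuild partition ' || ip.partition_name ddl
FROM   all_ind_partitions ip
JOIN   all_indexes i
ON     i.owner = ip.index_owner
AND    i.index_name = ip.index_name
WHERE  i.table_owner = 'TARGET'
AND    i.table_name  = 'SALES_FACT'
AND    ip.status     = 'UNUSABLE';</pre>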
<p>Now, I&#8217;ll mark indexes unusable that match a particular regular expression:</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.unusable_indexes(
  3                                 p_table          =&gt; 'sales_fact',
  4                                 p_owner          =&gt; 'target',
  5                                 p_index_regexp   =&gt; 'prod'
  6                          );
  7  END;
  8  /
1 index and 0 local index partitions affected on table TARGET.SALES_FACT

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.25
SQL&gt;
SQL&gt; BEGIN
  2     trans_etl.usable_indexes(
  3                               p_table          =&gt; 'sales_fact',
  4                               p_owner          =&gt; 'target'
  5                          );
  6  END;
  7  /
Rebuild processes for unusable indexes on 28 partitions of table TARGET.SALES_FACT executed
No matching unusable global indexes found

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.69
SQL&gt;
SQL&gt;</pre>
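<p>Before passing a regular expression to P_INDEX_REGEXP, it can be useful to preview which index names it is likely to match. The following is a rough sketch using the data dictionary; the case-insensitive match is an assumption made here, and Transcend may apply the expression differently:</p>
<pre>-- Preview which indexes on TARGET.SALES_FACT have names matching 'prod'.
-- The 'i' flag makes the match case-insensitive (an assumption for this sketch).
select index_name, index_type
  from all_indexes
 where table_owner = 'TARGET'
   and table_name  = 'SALES_FACT'
   and regexp_like(index_name, 'prod', 'i')
 order by index_name;</pre>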
<p>Now, I just want to mark bitmaps unusable for a particular partition: SALES_Q4_2003. I also want to change how the indexes are rebuilt. Instead of rebuilding them one after another, I&#8217;d like to have them all rebuilt at the same time. Thankfully, Transcend supports this by sending the rebuild statements to the Oracle Scheduler, DBMS_SCHEDULER. The TRANS_ETL package waits for all the rebuild processes to finish before continuing. All of this is done by simply passing a value of &#8216;yes&#8217; to the P_CONCURRENT parameter.</p>
<pre>SQL&gt; BEGIN
  2     trans_etl.unusable_indexes(
  3                                 p_table          =&gt; 'sales_fact',
  4                                 p_owner          =&gt; 'target',
  5                                 p_partname       =&gt; 'sales_q4_2003'
  6                          );
  7  END;
  8  /
0 indexes and 5 local index partitions affected on table TARGET.SALES_FACT

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.67
SQL&gt;
SQL&gt; BEGIN
  2     trans_etl.usable_indexes(
  3                               p_table          =&gt; 'sales_fact',
  4                               p_owner          =&gt; 'target',
  5                               p_concurrent     =&gt; 'yes'
  6                          );
  7  END;
  8  /
Oracle scheduler job USABLE_INDEXES61 created
Oracle scheduler job USABLE_INDEXES61 enabled
Rebuild processes for unusable indexes on 1 partition of table TARGET.SALES_FACT submitted to the Oracle scheduler
No matching unusable global indexes found

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.19
SQL&gt;
SQL&gt;</pre>
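<p>For readers unfamiliar with DBMS_SCHEDULER, the general pattern of handing a rebuild statement to the scheduler looks roughly like the block below. This is a sketch of the technique only and not Transcend&#8217;s own code: the job name is hypothetical, and the rebuild statement is ordinary Oracle DDL for rebuilding the unusable local index partitions of one table partition.</p>
<pre>BEGIN
   dbms_scheduler.create_job(
      job_name   =&gt; 'REBUILD_SALES_Q4_2003',   -- hypothetical job name
      job_type   =&gt; 'PLSQL_BLOCK',
      job_action =&gt; q'[BEGIN
                         EXECUTE IMMEDIATE 'alter table TARGET.SALES_FACT modify partition SALES_Q4_2003 rebuild unusable local indexes';
                       END;]',
      enabled    =&gt; TRUE,    -- with no start date, the job runs as soon as it is created
      auto_drop  =&gt; TRUE     -- remove the job definition once it completes
   );
END;
/</pre>
<p>A call like this returns immediately, so a caller that needs to wait for the rebuilds (as TRANS_ETL does) would have to poll for completion, for example by checking the scheduler dictionary views.</p>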
<p>Finally, the most complicated bit in index maintenance&#8230; and the reason I built the staging table with only the rows from 1998 in it. When loading a fact table, we rarely want to affect all the local index partitions on that table. As a matter of fact, we usually want to mark unusable only a very small number of the local index partitions, and this is usually dependent on which rows we are loading into the fact table.</p>
<p>Transcend supports this notion by allowing the specification of a particular table or view, using P_SOURCE_OWNER and P_SOURCE_OBJECT, to determine which partitions to mark as unusable on the target table. Remember that the SALES_STG table contains only the rows from the SH.SALES table for 1998. First, I&#8217;ll turn up the logging level for Transcend slightly so we can see the actual DDL being generated, and then I&#8217;ll mark unusable the local index partitions on SALES_FACT that correspond to the rows in SALES_STG:</p>
<pre>SQL&gt; BEGIN
  2
  3     trans_adm.set_module_conf(
  4                                p_logging_level=&gt; 3
  5                              );
  6
  7  END;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL&gt;
SQL&gt; BEGIN
  2
  3     trans_etl.unusable_indexes(
  4                                 p_table          =&gt; 'sales_fact',
  5                                 p_owner          =&gt; 'target',
  6                                 p_source_object   =&gt; 'sales_stg',
  7                                 p_source_owner   =&gt; 'target'
  8                               );
  9
 10  END;
 11  /
SQL: alter index TARGET.SALES_FACT_CHANNEL_BIX modify partition SALES_Q1_1998 unusable
SQL: alter index TARGET.SALES_FACT_CHANNEL_BIX modify partition SALES_Q2_1998 unusable
SQL: alter index TARGET.SALES_FACT_CHANNEL_BIX modify partition SALES_Q3_1998 unusable
SQL: alter index TARGET.SALES_FACT_CHANNEL_BIX modify partition SALES_Q4_1998 unusable
SQL: alter index TARGET.SALES_FACT_CUST_BIX modify partition SALES_Q1_1998 unusable
SQL: alter index TARGET.SALES_FACT_CUST_BIX modify partition SALES_Q2_1998 unusable
SQL: alter index TARGET.SALES_FACT_CUST_BIX modify partition SALES_Q3_1998 unusable
SQL: alter index TARGET.SALES_FACT_CUST_BIX modify partition SALES_Q4_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROD_BIX modify partition SALES_Q1_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROD_BIX modify partition SALES_Q2_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROD_BIX modify partition SALES_Q3_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROD_BIX modify partition SALES_Q4_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROMO_BIX modify partition SALES_Q1_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROMO_BIX modify partition SALES_Q2_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROMO_BIX modify partition SALES_Q3_1998 unusable
SQL: alter index TARGET.SALES_FACT_PROMO_BIX modify partition SALES_Q4_1998 unusable
SQL: alter index TARGET.SALES_FACT_TIME_BIX modify partition SALES_Q1_1998 unusable
SQL: alter index TARGET.SALES_FACT_TIME_BIX modify partition SALES_Q2_1998 unusable
SQL: alter index TARGET.SALES_FACT_TIME_BIX modify partition SALES_Q3_1998 unusable
SQL: alter index TARGET.SALES_FACT_TIME_BIX modify partition SALES_Q4_1998 unusable
0 indexes and 20 local index partitions affected on table TARGET.SALES_FACT

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.65
SQL&gt;
SQL&gt; BEGIN
  2
  3     trans_etl.usable_indexes(
  4                               p_table          =&gt; 'sales_fact',
  5                               p_owner          =&gt; 'target',
  6                               p_concurrent     =&gt; 'yes'
  7                          );
  8  END;
  9  /
SQL: alter table TARGET.SALES_FACT modify partition SALES_Q1_1998 rebuild unusable local indexes
Oracle scheduler job USABLE_INDEXES62 created
Oracle scheduler job USABLE_INDEXES62 enabled
SQL: alter table TARGET.SALES_FACT modify partition SALES_Q2_1998 rebuild unusable local indexes
Oracle scheduler job USABLE_INDEXES63 created
Oracle scheduler job USABLE_INDEXES63 enabled
SQL: alter table TARGET.SALES_FACT modify partition SALES_Q3_1998 rebuild unusable local indexes
Oracle scheduler job USABLE_INDEXES64 created
Oracle scheduler job USABLE_INDEXES64 enabled
SQL: alter table TARGET.SALES_FACT modify partition SALES_Q4_1998 rebuild unusable local indexes
Oracle scheduler job USABLE_INDEXES65 created
Oracle scheduler job USABLE_INDEXES65 enabled
Rebuild processes for unusable indexes on 4 partitions of table TARGET.SALES_FACT submitted to the Oracle scheduler
No matching unusable global indexes found

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.88
SQL&gt;</pre>
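<p>One way to see why only the four 1998 partitions were touched is to look at which quarters the staged rows fall into. The query below is an illustrative sketch; it assumes SALES_FACT is partitioned by quarter on TIME_ID, as the partition names suggest, and says nothing about how Transcend derives the partition list internally:</p>
<pre>-- Show which calendar quarters are present in the staging table.
-- Each quarter returned should correspond to one SALES_Qn_YYYY partition.
select to_char(trunc(time_id, 'Q'), 'YYYY "Q"Q') as quarter,
       count(*)                                  as staged_rows
  from target.sales_stg
 group by trunc(time_id, 'Q')
 order by trunc(time_id, 'Q');</pre>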
<p>That&#8217;s pretty much it for unusable and usable index functionality. Transcend also supports cloning indexes and dropping indexes, with many of the same parameters mentioned above.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/21/transcend-and-index-maintenance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Capturing Change (Part 1)</title>
		<link>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/</link>
		<comments>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 20:31:51 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/</guid>
		<description><![CDATA[Shortly after I joined Rittman Mead I wrote a small article on real-time Business Intelligence; there is also a link to it on our &#8220;Articles&#8221; tab. One of the techniques I mentioned in passing was change data capture (CDC).
Although many people believe that change capture is a technique for real-time or near real-time data [...]]]></description>
			<content:encoded><![CDATA[<p>Shortly after I joined Rittman Mead I wrote a small <a href="http://www.rittmanmead.com/files/DataW_Getting_Real.pdf" target="_blank">article</a> on real-time Business Intelligence; there is also a link to it on our <a href="http://www.rittmanmead.com/articles/" target="_blank">&#8220;Articles&#8221; tab</a>. One of the techniques I mentioned in passing was change data capture (CDC).</p>
<p>Although many people believe that change capture is a technique for real-time or near real-time data acquisition, it also plays a role in batch-orientated ETL processes where, for whatever reason, you can&#8217;t directly query the source or where it is hard to identify new or changed data for loading into the data warehouse. Recently, we had to do just that: extract changes from an e-retailer&#8217;s system where there was no scope to modify or query the data source to generate conventional extracts.</p>
<p>Oracle has recently acquired GoldenGate, which had a framework for CDC, and I will be writing about GoldenGate later in December. For this project, however, we had a requirement to use asynchronous CDC with Oracle Warehouse Builder 11.1 and an Oracle 11gR1 (Exadata 1) target system. A further restriction was that we could not modify the change logging on the source system.</p>
<p>With asynchronous CDC we access data changes through subscriber views that effectively use system change numbers (SCNs) as a filter condition in the view definition. Our ETL workflow makes two calls to an Oracle package: DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW, which in essence sets the upper bound of the SCN range selector to the current SCN, and DBMS_CDC_SUBSCRIBE.PURGE_WINDOW, which sets the lower bound of the SCN filter to one above the current upper bound (i.e. the view returns no rows). The view itself contains, amongst others: an operation column to describe the type of change captured (&#8216;I&#8217; for insert, &#8216;D&#8217; for delete, and &#8216;UO&#8217; and &#8216;UN&#8217; for the old and new values of updated rows); columns to identify the order in which the changes occurred, such as timestamps, SCN values and update numbers; and, of course, the data changes. For this data warehouse we had no interest in deletes, but needed to know about new rows (type I) and changes (UO and UN). Because of the nature of the customer&#8217;s business it was very likely that many changes to a row could occur in a single CDC window, so the simple option of using CDC to identify the changed rows and then fetching them from source was not available to us.</p>
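<p>As a rough illustration of how those two calls can bracket a batch load, consider the block below. It is a sketch under stated assumptions rather than the project&#8217;s actual workflow: the subscription name SALES_SUB, the subscriber view SALES_CDC_V and the staging table SALES_CHANGES_STG are all hypothetical, and the staging table is assumed to have the same columns as the view.</p>
<pre>BEGIN
   -- Move the upper bound of the window up to the latest captured changes.
   dbms_cdc_subscribe.extend_window(subscription_name =&gt; 'SALES_SUB');

   -- Stage inserts and update images from the subscriber view; deletes are ignored.
   -- TRIM guards against OPERATION$ being blank-padded to two characters.
   INSERT INTO sales_changes_stg
   SELECT *
     FROM sales_cdc_v
    WHERE TRIM(operation$) IN ('I', 'UO', 'UN');

   -- Discard the changes just read so the next run starts with an empty window.
   dbms_cdc_subscribe.purge_window(subscription_name =&gt; 'SALES_SUB');

   COMMIT;
END;
/</pre>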
<p>CDC can be configured to log the whole of the source row (supplemental logging), in which case we only need to look at the &#8216;I&#8217; records and the &#8216;UN&#8217; records and apply the whole rows in order. But as we could not change the logging to track the whole row, we ended up with a source that contained a primary key plus data for the columns that had changed and NULLs for the columns that had not. This makes things a little harder for us, as we need to synthesise the whole row before processing updates. A further complication was that in some cases data could become NULL, and those NULLs need to be processed.</p>
<p>If we reduce the information in the CDC subscriber view to operation$, the primary key of the source table (PK) and just one source column that might change, we get a possibility matrix for updates:</p>
<pre>operation$...	PK...	C1
UO		1	NULL
UN		1	'X'	-- NULL value becomes 'X'
UO		99	'Y'
UN		99	'Z'	-- 'Y' becomes 'Z'
UO		42	'A'
UN		42	NULL	-- 'A' becomes NULL
UO		39	NULL
UN		39	NULL	-- data value remains unchanged (not necessarily NULL)</pre>
<p>So to process updates we need to retrieve the previous version of the row, apply changes to the updated columns and store a new (versioned) copy of the row. Where data does not change we need to &#8220;copy down&#8221; the previous value stored in our versioned data store or the next earliest version in our CDC view. We also wanted to keep this a set-based operation for performance reasons.</p>
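<p>To make the possibility matrix concrete, the query below shows one set-based way the UO and UN images could be paired and classified with an analytic function. It is only an illustration of the idea and not necessarily the approach the project adopted: the view SALES_CDC_V and the ordering column CHANGE_SEQ are hypothetical, and the sketch assumes each UO row is immediately followed by its UN row for the same key.</p>
<pre>-- Pair each new-value (UN) row with the old-value (UO) row that precedes it,
-- then classify what happened to column C1.
select pk,
       change_seq,
       c1_new,
       case
          when c1_old is null and c1_new is null then 'UNCHANGED'    -- copy down the prior value
          when c1_new is null                    then 'SET TO NULL'  -- a genuine change to NULL
          else                                        'NEW VALUE'
       end as c1_change
  from (select trim(operation$) as op,
               pk,
               change_seq,
               c1 as c1_new,
               lag(c1) over (partition by pk order by change_seq) as c1_old,
               lag(trim(operation$)) over (partition by pk order by change_seq) as prev_op
          from sales_cdc_v
         where trim(operation$) in ('UO', 'UN'))
 where op = 'UN'
   and prev_op = 'UO';</pre>
<p>Applied to the four cases in the matrix, keys 1 and 99 would be classified as NEW VALUE, key 42 as SET TO NULL and key 39 as UNCHANGED.</p>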
<p>In part two I will describe the approach we adopted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
