<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Peter Scott</title>
	<atom:link href="http://www.rittmanmead.com/author/peter-scott/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivered Intelligence</description>
	<lastBuildDate>Wed, 10 Mar 2010 08:49:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Thoughts on Change Data Capture</title>
		<link>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/</link>
		<comments>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 09:31:54 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[User Groups & Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4497</guid>
		<description><![CDATA[In a little over a month I will be in Las Vegas speaking at Collaborate 10. There are a lot of BI / DW talks this year and, for the first time, with BIWA Training Days branding. Rittman Mead will be there at the conference giving talks on each of the conference days. If you are [...]]]></description>
			<content:encoded><![CDATA[<p>In a little over a month I will be in Las Vegas speaking at Collaborate 10. There are a lot of BI / DW talks this year and, for the first time, with <a href="http://collaborate10.ioug.org/Education/BIWATrainingDays/tabid/83/Default.aspx#view" target="_blank">BIWA Training Days branding</a>. Rittman Mead will be there at the conference giving talks on each of the conference days. If you are at the conference (or even just on vacation there) then come and say &#8216;Hi&#8217; to Stewart, Venkat, Mark and me.</p>
<p>My talk will be about Realtime Data Warehousing &#8211; it is an overview of reasons, techniques and pitfalls, but I do cover a lot of material in that hour. Of course, Change Data Capture (CDC) will be a major part of the talk; Oracle has so many options here including their recently acquired GoldenGate product set. As always, the slides will be here on the Rittman Mead site soon after I speak.</p>
<p>My colleague, Stewart Bryson, has also had some recent thoughts about change data capture over on the TDWI group at LinkedIn.com (group membership needed); he was quite perceptive (and on the money, in my opinion) with his comment &#8220;I would hesitate to let technical limitations dictate user requirements. In today&#8217;s BI/DW market, there are very few technical limitations that cannot be solved one way or another.&#8221;</p>
<p>One of the points I will make in my Realtime DW talk, and perhaps I need a few more slides to do it justice, is the need to profile the change you capture on the source system. Often there is a lot of &#8220;noise&#8221; that looks like change but is of no real interest to the data warehouse. Not all systems are &#8220;well behaved&#8221;; I have seen systems that always update a record even if nothing has changed, and even systems that update each column as a separate statement with its own commit. Of course, even systems that don&#8217;t have those vices can still update columns that have no DW significance, and we then see those changes being filtered out at the data warehouse after we have already done a lot of work (processing, network bandwidth and the like) to get the data there.</p>
<p>The more I do this kind of work, the more I feel there is a need to switch on CDC on the live source for a while, see the typical pattern of change that occurs in a day, week, period or whatever, and then make decisions on how to handle this defensively downstream. Do we need to exclude certain columns that are just &#8220;noise&#8221;? What will be the impact of multiple, rapidly-occurring commits on how we handle SCD-2 dimensions? Of course we can predict what we will see and come up with a proposed solution, but the real source often has a few surprises up its sleeve &#8211; once a customer gave me the sequence of statuses that an order passed through in its life-cycle, except that on the actual source system the sequence was not the same as in their documentation, and that would have affected our reporting.</p>
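<p>As a starting point for that kind of profiling, something along these lines could be run against a CDC subscriber view once it has collected a representative period of change. This is only a sketch: CDC_ORDERS and ORDER_ID are assumed names for the subscriber view and its key, while OPERATION$ and COMMIT_TIMESTAMP$ are the usual CDC control columns.</p>
<pre>-- How much change is captured each day, by operation type?
-- CDC_ORDERS and ORDER_ID are assumed names, used for illustration only.
SELECT TRUNC(commit_timestamp$)  AS change_day,
       operation$,
       COUNT(*)                  AS captured_rows,
       COUNT(DISTINCT order_id)  AS distinct_orders
FROM   cdc_orders
GROUP  BY TRUNC(commit_timestamp$), operation$
ORDER  BY change_day, operation$;</pre>
<p>A day where captured_rows is many times distinct_orders is a hint that the source is committing the same row repeatedly, which is exactly the sort of behaviour worth knowing about before designing the downstream handling.</p>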
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/03/09/thoughts-on-change-data-capture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle 11g Pivot</title>
		<link>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/</link>
		<comments>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 12:02:16 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4401</guid>
		<description><![CDATA[One of the things that I often come across is the &#8220;updateable fact&#8221;, that is, a fact that starts life &#8220;incomplete&#8221; and changes over time. Examples include things such as support calls that start life as &#8220;open&#8221; then progress through &#8220;responded&#8221;, &#8220;resolved&#8221; and finally &#8220;closed&#8221;; statuses in the sales cycle such as ordered, paid, shipped; stock [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that I often come across is the &#8220;updateable fact&#8221;, that is, a fact that starts life &#8220;incomplete&#8221; and changes over time. Examples include things such as support calls that start life as &#8220;open&#8221; then progress through &#8220;responded&#8221;, &#8220;resolved&#8221; and finally &#8220;closed&#8221;; statuses in the sales cycle such as ordered, paid, shipped; stock movements in a warehouse &#8211; goods received and dispatched. Of course the business, rightly, needs to measure the times between stages or the number or value of transactions at each stage.</p>
<p>As a principle, I hate the idea of having to update a fact. A fact has happened; it is not going to change. To be more accurate, a &#8220;change&#8221; is a new event: a new fact occurring at a different time. So how do we model this? Instinctively I would go for a table that is only inserted into (preferably appended to &#8211; think set based!) containing whatever dimensions are needed (don&#8217;t forget the &#8216;when&#8217; dimension) PLUS an &#8216;EVENT&#8217; dimension (one row per expected status) and the measures (how many, how much etc). To report on this we need to rotate the table so that the events that belong to a single item appear in the same row. Before Oracle 11g we would need to construct some SQL using a mix of CASE statements and analytic functions to rotate the data. But now we have a potentially better way: the Oracle 11g PIVOT operator.</p>
<p>Here we define a set of dimensions for the row (similar to the dimensions in a GROUP BY clause) and the aggregation operators for the pivoted measures &#8211; which of course could include MIN() or MAX() for the cases when we want to pivot DATE types. We also need to define the dimensions we want to pivot by, and here we can actually choose multiple dimensions; this again is somewhat similar to the GROUP BY of traditional SQL. Remember, though, that when we pivot we sometimes only expect to &#8216;aggregate&#8217; a single row &#8211; if we want to pivot order date and dispatch date then we probably have just one of each!</p>
<p>So how does it look? Well, the Oracle 11g documentation describes the syntax and gives some examples &#8211; here I am showing a slightly more complicated case where we are pivoting by two dimensions, each with a known set of code values. This example is based on two of the examples in the Oracle 11g Data Warehousing Guide:</p>
<pre>	SELECT * FROM	(
		SELECT product, channel, quarter, quantity_sold, amount_sold FROM sales_view
		) PIVOT (SUM(quantity_sold) as SUMQ, SUM(amount_sold) as SUMS
			FOR (channel, quarter) IN
			((5, '02') AS CATALOG_Q2,
		 	(4, '01') AS INTERNET_Q1,
		 	(4, '04') AS INTERNET_Q4,
		 	(2, '02') AS PARTNERS_Q2,
		 	(9, '03') AS TELE_Q3
			) );</pre>
<p>The query returns a column for the product and, for each of the specified pairs of channel and quarter, a column for each measure. So we get columns for:</p>
<p>PRODUCT, CATALOG_Q2_SUMQ, CATALOG_Q2_SUMS, INTERNET_Q1_SUMQ, INTERNET_Q1_SUMS, INTERNET_Q4_SUMQ, INTERNET_Q4_SUMS, PARTNERS_Q2_SUMQ, PARTNERS_Q2_SUMS, TELE_Q3_SUMQ, and TELE_Q3_SUMS</p>
<p>Note how the measure name is concatenated to the alias in the IN list.<br />
As you can see, we don&#8217;t need to specify each combination of channel and quarter &#8211; just the ones we want in our pivoted view. We also don&#8217;t use a GROUP BY clause &#8211; we specify the columns we want to see (both the dimensions and the aggregations) and Oracle implicitly groups by all of the columns that are not referenced in the PIVOT clause.</p>
<p>In my example I used SELECT * to wrap the inline pivot; in practice I would explicitly select the columns and perhaps alias them to more meaningful names than the concatenated ones generated by Oracle. I would also expose the pivot as a database view and thus access it from OWB or OBIEE, where it appears to be just another table or view.</p>
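<p>A sketch of what that might look like, cut down to two of the channel and quarter pairs. The view name SALES_PIVOT_V and the friendlier column aliases are made up for illustration; the rest follows the example above.</p>
<pre>-- SALES_PIVOT_V and the *_QUANTITY / *_AMOUNT aliases are illustrative names only
CREATE OR REPLACE VIEW sales_pivot_v AS
SELECT product,
       catalog_q2_sumq  AS catalog_q2_quantity,
       catalog_q2_sums  AS catalog_q2_amount,
       internet_q1_sumq AS internet_q1_quantity,
       internet_q1_sums AS internet_q1_amount
FROM  (SELECT product, channel, quarter, quantity_sold, amount_sold
       FROM   sales_view)
PIVOT (SUM(quantity_sold) AS sumq, SUM(amount_sold) AS sums
       FOR (channel, quarter) IN ((5, '02') AS catalog_q2,
                                  (4, '01') AS internet_q1));</pre>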
<p>Another point to note is that you might see null values in the pivoted measures and these can be due to one of two reasons: the value stored for that combination of dimensions (in our case channel and quarter) is actually NULL, or that the combination does not exist. If you need to (and you may not need to) you can differentiate by using a COUNT measure; if the count is zero then the combination does not exist in the source table, if one or more then the source has NULLs stored for the combination.</p>
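<p>One way to write that check, again as a sketch based on the example query: add a COUNT(*) measure alongside the SUM and compare the two generated columns.</p>
<pre>SELECT *
FROM  (SELECT product, channel, quarter, quantity_sold
       FROM   sales_view)
PIVOT (SUM(quantity_sold) AS sumq,
       COUNT(*)           AS cnt
       FOR (channel, quarter) IN ((5, '02') AS catalog_q2,
                                  (4, '01') AS internet_q1));
-- CATALOG_Q2_CNT = 0                               : no source rows for that product, channel and quarter
-- CATALOG_Q2_CNT &gt; 0 and CATALOG_Q2_SUMQ IS NULL   : rows exist, but every quantity_sold is NULL</pre>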
<p>We used a similar pivot view to the one above to monitor stock movements in a warehouse &#8211; in this case we needed to track individual batches of product from multiple potential suppliers, so in addition to the product dimension we had dimensional columns for batch id (a degenerate dimension) and supplier. The view was then exposed to OWB to allow us to include the aggregated result set in our ETL process &#8211; we needed to calculate some additional measures based on the difference between two of the pivoted columns. The PIVOT operator greatly simplified our ETL for this fact &#8211; we could just as easily have written an ETL process with a straight aggregation and then pivoted the results with CASE statements or DECODEs or whatever &#8211; but that would have been less clear and would also have increased the number of &#8220;moving parts&#8221;.</p>
<p>We have had no problems with performance with our data set &#8211; 80 million rows pivoted on Exadata in just a few seconds. It was not too slow on our non-Exadata development machine either.</p>
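<p>For comparison, the CASE-based alternative for two of the channel and quarter pairs in the earlier SALES_VIEW example would look something like this (a sketch, not the code from the project):</p>
<pre>-- Pre-11g style pivot: one conditional aggregate per output column
SELECT product,
       SUM(CASE WHEN channel = 5 AND quarter = '02' THEN quantity_sold END) AS catalog_q2_sumq,
       SUM(CASE WHEN channel = 5 AND quarter = '02' THEN amount_sold   END) AS catalog_q2_sums,
       SUM(CASE WHEN channel = 4 AND quarter = '01' THEN quantity_sold END) AS internet_q1_sumq,
       SUM(CASE WHEN channel = 4 AND quarter = '01' THEN amount_sold   END) AS internet_q1_sums
FROM   sales_view
GROUP  BY product;</pre>
<p>Every extra pair adds one more CASE expression per measure, which is where the extra &#8220;moving parts&#8221; come from.</p>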
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/02/23/oracle-11g-pivot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Heat Maps in OBIEE</title>
		<link>http://www.rittmanmead.com/2010/01/22/heat-maps-in-obiee/</link>
		<comments>http://www.rittmanmead.com/2010/01/22/heat-maps-in-obiee/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 21:06:08 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4169</guid>
		<description><![CDATA[OBIEE 10.1.3.4.1 has a host of ways to add visual impact to reported data to aid its interpretation. But some things are not so easy to achieve, or at least not by just clicking on a few format buttons on the Answers web page.
One of my customers had the need to report a day&#8217;s figures [...]]]></description>
			<content:encoded><![CDATA[<p>OBIEE 10.1.3.4.1 has a host of ways to add visual impact to reported data to aid its interpretation. But some things are not so easy to achieve, or at least not by just clicking on a few format buttons on the Answers web page.</p>
<p>One of my customers needed to report a day&#8217;s figures and colour the results based on the change since yesterday. I also had three additional design challenges to meet: not to change the RPD (this is only a proof of concept, RPD changes might come later), to avoid having to laboriously conditionally format each coloured cell (the colour rules could become very complex and based on multiple columns), and to allow the use of &#8220;normal&#8221; and pivot tables on the dashboard.</p>
<p>Firstly, as I am looking at the difference in a measure between two days, I cannot use a pivot report for this, as it is not possible to calculate differences between measure columns in the pivot. But we can use the &#8216;FILTER&#8217; button on the column formula editor to restrict the values returned to just those that match the filter. By doing this twice, once for each day of interest, we get two columns in the same Answers Request TABLE view that we can do the maths on. Here we need to calculate the percentage difference between the two values.</p>
<p>Now I can calculate the values I need for my heat map &#8211; the day&#8217;s measures, and the percentage change since yesterday. Next I need to colour in the cell backgrounds.</p>
<p>Of course it is perfectly possible to just use conditional formatting, but this gets quite unwieldy with complex cell colour rules, and there was also the request to make the colour mapping reusable across multiple Answers requests and dashboards. I decided to tackle the problem by adding a text column to the request and changing its format to HTML. To change the background colour of a cell you need to make the cell contain HTML built along the following lines (here with random colour values):</p>
<pre>'&lt;div style="background-color: rgb(' || cast(255*RND() as varchar(3)) || ',' || cast(255*RND() as varchar(3)) || ',' || cast(255*RND() as varchar(3)) || ');"&gt;' || [some value to output] || '&lt;/div&gt;'</pre>
<p>You must output something in the cell as OBIEE does not colour in NULL values. I decided to use the RGB() form of the colour selector as it is probably simpler to calculate three numeric values than to build colour codes in hexadecimal.</p>
<p>Although it is feasible to code this colour mapping in OBIEE, it is a lot of effort. Here I simplified things greatly by writing a database function that takes the value to display and a value that controls the colour (here it is simply the percentage change over two days) as its two arguments and returns a varchar2 string that formats the cell. Although the colour coding algorithm can be as complex as you like, ultimately we populate a string and return it.</p>
<pre>[code omitted]
vHTML_STRING:='&lt;div style="background-color: rgb('||vRED||','||vGREEN||','||vBLUE||')" &gt;'||TO_CHAR(pValue,'FM999,999,999,999')||' &lt;/div&gt;';
return vHTML_STRING;</pre>
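<p>To give a feel for the overall shape, here is a minimal sketch of such a function. It is not the omitted code: the colour rule (tint towards green for a rise, towards red for a fall, strongest at a 20% change), the parameter name pChange and the argument order (taken from the description above) are all assumptions made for illustration.</p>
<pre>CREATE OR REPLACE FUNCTION f_my_colour_map (
  pValue  IN NUMBER,   -- the figure to display in the cell
  pChange IN NUMBER    -- percentage change that drives the colour (assumed name)
) RETURN VARCHAR2
IS
  -- Hypothetical rule: 255 for no change, falling to 100 for a change of 20% or more
  vSHADE       PLS_INTEGER := 255 - ROUND(155 * LEAST(ABS(NVL(pChange, 0)), 20) / 20);
  vRED         PLS_INTEGER := 255;
  vGREEN       PLS_INTEGER := 255;
  vBLUE        PLS_INTEGER := 255;
  vHTML_STRING VARCHAR2(200);
BEGIN
  IF pChange &gt; 0 THEN        -- rise: tint towards green
    vRED  := vSHADE;
    vBLUE := vSHADE;
  ELSIF pChange &lt; 0 THEN     -- fall: tint towards red
    vGREEN := vSHADE;
    vBLUE  := vSHADE;
  END IF;                    -- zero or NULL change: cell stays white

  -- Always output something, because an empty cell is not coloured
  vHTML_STRING := '&lt;div style="background-color: rgb(' || vRED || ',' || vGREEN || ',' || vBLUE || ')"&gt;'
                  || NVL(TO_CHAR(pValue, 'FM999,999,999,999'), '-')
                  || '&lt;/div&gt;';
  RETURN vHTML_STRING;
END f_my_colour_map;
/</pre>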
<p>To use this we simply set the cell contents to be the result of the OBIEE EVALUATE function, formatted as HTML:</p>
<pre>Evaluate('ORCL.F_my_colour_map(%1,%2)' as varchar(100), [some OBIEE expression that evaluates to the colour],[some other expression that gives the value to display])</pre>
<p>Of course, this method is only suitable for the current OBIEE release; older versions may not have 'EVALUATE', and the future OBIEE 11 may not behave in the same way when rendering HTML within Answers tables.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/01/22/heat-maps-in-obiee/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Collaborate 10</title>
		<link>http://www.rittmanmead.com/2009/12/30/collaborate-10/</link>
		<comments>http://www.rittmanmead.com/2009/12/30/collaborate-10/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 16:27:53 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=3995</guid>
		<description><![CDATA[Mark recently mentioned that he would be joining Stewart Bryson (the Managing Director of Rittman Mead America) as a speaker at ODTUG Kaleidoscope 2010. But before that June meeting both Mark and I will be joining Stewart at Collaborate 10 in Las Vegas. This year Collaborate will have a BIWA feel about it as it is running BIWA Training [...]]]></description>
			<content:encoded><![CDATA[<p>Mark recently mentioned that he would be joining Stewart Bryson (the Managing Director of Rittman Mead America) as a speaker at ODTUG Kaleidoscope 2010. But before that June meeting both Mark and I will be joining Stewart at <a href="http://collaborate10.ioug.org/" target="_blank">Collaborate 10</a> in Las Vegas. This year Collaborate will have a BIWA feel about it as it is running <a href="http://collaborate10.ioug.org/Education/BIWATrainingDays/tabid/83/Default.aspx" target="_blank">BIWA Training Days</a> (a sort of conference within a conference).</p>
<p>I will be giving a new presentation called &#8220;Getting Real &#8211; Data Warehouse Loading As It Happens&#8221; which will discuss some of the challenges of realtime data acquisition, starting with my premise that Real Time doesn&#8217;t really exist&#8230;.there will always be some lag, even if you query the source directly. I intend to touch on replication, direct query of source, change propagation and of course the things that GoldenGate now brings to the Oracle table. There will also be space for a bit of pragmatic cheating &#8211; how to make things look real-time when they are not, and of course fit in the warning that &#8220;just because you can do something doesn&#8217;t mean that you should&#8221;.</p>
<p>Mark will be busy with two sessions &#8211; a Deep Dive, all-day session on OBIEE on Sunday 18th and a look at OWB 11gR2 the next day; Stewart will talk about DW fault tolerance.</p>
<p>I am looking forward to being in Vegas &#8211; the chance to hear some good talks and to meet some old friends.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/30/collaborate-10/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Capturing Change (Part 2)</title>
		<link>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/</link>
		<comments>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 10:54:21 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=3985</guid>
		<description><![CDATA[In the previous part I outlined the business need for writing our own CDC routines and started to outline some of the issues we would need to resolve. Here I shall outline the approach we took and describe how to go about building the SQL needed to synthesise the rows that need to be added [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/" target="_blank">previous</a> part I outlined the business need for writing our own CDC routines and started to outline some of the issues we would need to resolve. Here I shall outline the approach we took and describe how to go about building the SQL needed to synthesise the rows that need to be added to our data warehouse.<br />
First, a recap of the types of records we will need to process:</p>
<pre>operation$...	PK	...	C1
 I		87		'J' -- a newly inserted row
UO		1		NULL
UN		1		'X' -- NULL value becomes 'X'
UO		99		'Y'
UN		99		'Z' -- 'Y' becomes 'Z'
UO		42		'A'
UN		42		NULL -- 'A' becomes NULL
UO		39		NULL
UN		39		NULL -- data value remains unchanged (not necessarily NULL)</pre>
<p>As I mentioned last time, we have no real interest in delete records (in fact my source system does not generate deletes); we are using CDC to populate a data warehouse and the fact that a record had existed is of interest to us. As I also mentioned, we are not using supplemental logging, so we can&#8217;t just apply the I and UN records in chronological order.</p>
<p>A point I did not emphasise last time is that change data capture does just that: it captures changes, in fact every commit of a change. This means that we may see changes that are of no interest to our data warehouse. It also means we may see &#8216;non-changes&#8217; where a record is &#8220;updated&#8221; and then committed without changes to data values being made. I would suggest that the best way to filter out these records (if filtering is appropriate for your business) is as part of the downstream ETL and not of the change capture process.</p>
<p>So how do we go about building a change capture view? The first thing is to realise that we can do this with analytic functions; the second is that we will need a lot of them (several for each column being processed), and this can have a major (negative) impact on query performance as we carry out many window sort operations on the same dataset.</p>
<p>Let&#8217;s start with the update row type. For an update we need a row to exist, therefore any change should be built on a prior version of the row. If we consider the first update for a row in the subscriber view, it must either be for a newly inserted row or for one we already hold in the target table. Subsequent updates in the same CDC subscriber view need to be applied in the order they occurred. As CDC identifies changes by the row key, an SCN and an RSID value, we already have the bones of an ordering for the changes we apply; we only need to add a way of finding the original value in the target table if it exists, and again analytics come to our rescue.</p>
<p>Whenever I write a query using analytics I try to work out how to partition and order the data to achieve my goal. With the updates I need to process pairs of UO and UN records for the same key value and change number and order them so that the UN record comes last. We then need to look for changes between the UO and UN records. To my thinking this is a simple LEAD() or LAG() to bring the before and after versions onto the same row and a CASE statement to determine whether the column has become NULL in the captured change.</p>
<pre>select * from (
SELECT OPERATION$,
  CSCN$,
  COMMIT_TIMESTAMP$,
  RSID$,
  ORDER_ID,

/* Now for the changing columns */
  ORDER_STATUS,
  CASE
    WHEN lag(ORDER_STATUS) over (partition BY ORDER_ID, rsid$ order by OPERATION$ DESC ) IS NULL
    AND ORDER_STATUS                                                            IS NULL
    THEN NULL -- No change to value
    ELSE
      CASE
        WHEN ORDER_STATUS IS NULL
        THEN 2  -- 2 = value has changed to NULL
        ELSE 1  -- 1 = value has changed to a non-NULL value
      END
  END c_ORDER_STATUS
/* repeat similar logic for each of the other columns of interest. */

  FROM CDC_ORDERS
  WHERE operation$ &lt;&gt; 'I' -- only looking at UO and UN operations
  )
WHERE operation$ = 'UN' -- we only need to process the final state of each change</pre>
<p>We now need to union this set of rows to the most recent stored version of the row; this is either the version in our data store with the most recent timestamp or the insert record in our CDC view. Picking the most recent value from the data store can be achieved by using the ROW_NUMBER function:</p>
<pre>	SELECT OPERATION$, CSCN$,COMMIT_TIMESTAMP$,RSID$,ORDER_ID, ORDER_STATUS, c_ORDER_STATUS ... from (
	SELECT 'X' OPERATION$, -- set a constant
	  -1 CSCN$, -- set to a constant so we can simply filter it out later -- we don't want to re-insert this record!
	  COMMIT_TIMESTAMP$,
	  RSID$,
	  ORDER_ID,

/* For each column of interest */
	  ORDER_STATUS,
	  1 c_ORDER_STATUS, -- set a constant
/* Repeat the above block */

	row_number() over (partition by order_id order by COMMIT_TIMESTAMP$ DESC,RSID$ DESC) RN  -- most recent version will be 1
	from  ODS_ORDERS
	where ORDER_ID in (select order_id from CDC_ORDERS) -- we only want order_id values that are in the CDC view
	) where RN = 1 -- only select the most recent</pre>
<p>or, for the &#8216;I&#8217; record in our CDC subscriber view window:</p>
<pre>	SELECT OPERATION$,
	  CSCN$,
	  COMMIT_TIMESTAMP$,
	  RSID$,
	  ORDER_ID,

	/* Now for the changing columns */
	  ORDER_STATUS,
	  1 c_ORDER_STATUS -- set a constant
	/* repeat similar logic for each of the other columns of interest. */

	  FROM CDC_ORDERS
	  WHERE operation$ = 'I' -- only looking at I operations</pre>
<p>As the previous value for a given PK will be in one of two places (but not both), a simple UNION ALL suffices to provide all of the rows we need to build the history of changes.<br />
The next stage of the processing is to take the three UNION ALL sources and then, using analytics, &#8220;copy down&#8221; previous values to fill in the blanks. Here I use the LAST_VALUE function to look back over an ordered window.</p>
<pre>SELECT
	OPERATION$,
    CSCN$,
    COMMIT_TIMESTAMP$,
    RSID$,
    ORDER_ID,
    CASE
      WHEN c_ORDER_STATUS IS NOT NULL  -- there is a new value of ORDER_STATUS
      THEN ORDER_STATUS
      ELSE -- look at the last change for this column
        CASE LAST_VALUE(c_ORDER_STATUS ignore nulls) over (partition BY ORDER_ID order by CSCN$, RSID$)
          WHEN 1 -- changed to a non null at the last change
          THEN LAST_VALUE(ORDER_STATUS ignore nulls) over (partition BY ORDER_ID order by CSCN$, RSID$)
          WHEN 2 -- became NULL at the last change
          THEN NULL
/*
we could use LAST_VALUE(ORDER_STATUS ) over (partition BY ORDER_ID order by CSCN$, RSID$) but this would add a sort to query plan
*/
        END
    END ORDER_STATUS,
/* similar code for the remaining columns, all selected from my UNION ALL view of ODS_ORDERS, CDC_ORDERS where operation$ = 'I' and CDC_ORDERS where operation$ is not 'I' */</pre>
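<p>The copy-down behaviour of LAST_VALUE with IGNORE NULLS is easy to see in isolation. A minimal, self-contained sketch follows; the inline rows and the column name CSCN are made up for illustration and are not from the real subscriber view.</p>
<pre>WITH changes AS (
  -- Four hypothetical changes for one order; NULL means "no new value captured"
  SELECT 1 AS order_id, 100 AS cscn, 'NEW' AS order_status FROM dual UNION ALL
  SELECT 1, 110, NULL   FROM dual UNION ALL
  SELECT 1, 120, 'PAID' FROM dual UNION ALL
  SELECT 1, 130, NULL   FROM dual
)
SELECT order_id,
       cscn,
       order_status,
       -- most recent non-NULL status at or before the current row
       LAST_VALUE(order_status IGNORE NULLS)
         OVER (PARTITION BY order_id ORDER BY cscn) AS filled_status
FROM   changes
ORDER  BY order_id, cscn;
-- filled_status comes back as NEW, NEW, PAID, PAID</pre>
<p>On its own this cannot tell &#8220;no change&#8221; apart from &#8220;changed to NULL&#8221;, which is why the real view carries the c_ORDER_STATUS flag alongside each column.</p>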
<p>There are two final things to deal with. First, the COMMIT_TIMESTAMP$ column is a DATE, so we may get multiple rows for a given key and date if multiple commits occur in the same second; as far as I am concerned here, multiple commits are (in spirit) the same change, so we can take the last row in any given second. Again we use the ROW_NUMBER function for this:</p>
<pre>ROW_NUMBER() OVER (PARTITION BY ORDER_ID, COMMIT_TIMESTAMP$ ORDER BY RSID$ DESC) RN</pre>
<p>Second, we must not reinsert the &#8220;seed&#8221; rows we took from ODS_ORDERS; as we set their CSCN$ to -1, we just filter for CSCN$ values greater than zero in our insert.</p>
<p>That&#8217;s basically it &#8211; a huge view with many, many analytics &#8211; but it performs quite well providing you are not processing too large a window.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/30/capturing-change-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Capturing Change (Part 1)</title>
		<link>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/</link>
		<comments>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 20:31:51 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/</guid>
		<description><![CDATA[Shortly after I joined Rittman Mead I wrote a small article on real-time Business Intelligence; there is also a link to it on our &#8220;Articles&#8221; tab. One of the techniques I mentioned in passing was change data capture, CDC.
Although many people believe that change capture is a technique for real-time or near real-time data [...]]]></description>
			<content:encoded><![CDATA[<p>Shortly after I joined Rittman Mead I wrote a small <a href="http://www.rittmanmead.com/files/DataW_Getting_Real.pdf" target="_blank">article</a> on real-time Business Intelligence; there is also a link to it on our <a href="http://www.rittmanmead.com/articles/" target="_blank">&#8220;Articles&#8221; tab</a>. One of the techniques I mentioned in passing was change data capture, CDC.</p>
<p>Although many people believe that change capture is a technique for real-time or near real-time data acquisition, it also plays a role in batch-orientated ETL processes where, for whatever reason, you can&#8217;t directly query the source or where it is hard to identify new or changed data for loading into the data warehouse. Recently, we had to do just that: extract changes from an e-retailer&#8217;s system where there was no scope to modify or query the data source to generate conventional extracts.</p>
<p>Oracle has recently acquired GoldenGate, which has its own framework for CDC, and I will be writing about GoldenGate later in December. For this project, though, we had a requirement to use asynchronous CDC with Oracle Warehouse Builder 11.1 and an Oracle 11gR1 (Exadata 1) target system. A further restriction was that we could not modify the change logging on the source system.</p>
<p>With asynchronous CDC we access data changes through subscriber views that effectively use system change numbers as a filter condition in the view definition. There are two calls to an Oracle package that are used in our ETL workflow: DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW which in essence sets the upper bound of the SCN range selector to the current SCN and DBMS_CDC_SUBSCRIBE.PURGE_WINDOW that sets the lower bound of the SCN filter to one above the current upper bound (i.e. returns no rows). The view itself contains, amongst others, an operation column to describe the type of change captured, &#8216;I&#8217; for insert, &#8216;D&#8217; for delete and &#8216;UO&#8217; and &#8216;UN&#8217; for the old and new values of updated rows;  columns to identify the order that the changes occurred &#8211; such as timestamps, SCN values and update numbers and, of course, the data changes. For this data warehouse we had no interest in deletes, but needed to know about new rows (type I) and changes (UO and UN). Because of the nature of the customer&#8217;s business it was very likely that many changes to a row could occur in a single CDC window &#8211; the simple option of using CDC to identify the changed rows and then fetch them from source was not available to us.</p>
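<p>In outline, the two calls bracket the ETL run something like this. It is a sketch only: the subscription name ORDERS_SUB is made up, and in practice the calls would sit in separate steps of the ETL workflow rather than in one block.</p>
<pre>BEGIN
  -- Move the upper bound of the window up to the current SCN
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name =&gt; 'ORDERS_SUB');

  -- ... run the ETL that reads the subscriber view here ...

  -- Discard the rows just processed so that the window is empty again
  DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name =&gt; 'ORDERS_SUB');
END;
/</pre>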
<p>CDC can be configured to log the whole of the source row (supplemental logging), in which case we only need to look at the &#8216;I&#8217; records and the &#8216;UN&#8217; records and apply the whole rows in order. But as we could not change the logging to track the whole row, we ended up with a source that contained a primary key plus data for the columns that have changed and NULLs for those that have not. This makes things a little harder for us, as we need to synthesise the whole row before processing updates. A further complication was that in some cases data could become NULL, and those NULLs need to be processed.</p>
<p>If we reduce the information in CDC subscriber view to operation$, the primary key of the source table (PK) and just one source column that might change we get a possibility matrix for updates:</p>
<pre>operation$...	PK...	C1
UO		1	NULL
UN		1	'X'	-- NULL value becomes 'X'
UO		99	'Y'
UN		99	'Z'	-- 'Y' becomes 'Z'
UO		42	'A'
UN		42	NULL -- 'A' becomes NULL
UO		39	NULL
UN		39	NULL -- data value remains unchanged (not necessarily NULL)</pre>
<p>So to process updates we need to retrieve the previous version of the row, apply changes to the updated columns and store a new (versioned) copy of the row. Where data does not change we need to &#8220;copy down&#8221; the previous value, taken either from our versioned data store or from the nearest earlier version in our CDC view. We also wanted to keep this a set-based operation for performance reasons.</p>
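<p>To make the matrix concrete, here is one way the three cases could be told apart in SQL. It is only an illustration and not necessarily the approach we adopted: the view name cdc_changes_v and the change_id column used to pair each UO row with its UN row are assumptions, so substitute whatever your subscriber view offers for matching the two halves of an update.</p>
<pre>
SELECT un.pk,
       CASE
         WHEN un.c1 IS NOT NULL THEN 'NEW VALUE'    -- NULL becomes 'X', 'Y' becomes 'Z'
         WHEN uo.c1 IS NOT NULL THEN 'BECAME NULL'  -- 'A' becomes NULL
         ELSE 'UNCHANGED'                           -- copy down the previous value
       END AS c1_change
FROM   cdc_changes_v uo
JOIN   cdc_changes_v un
ON     un.pk        = uo.pk
AND    un.change_id = uo.change_id                  -- hypothetical pairing column
WHERE  uo.operation$ = 'UO'
AND    un.operation$ = 'UN';
</pre>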
<p>In part two I will describe the approach we adopted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/12/08/capturing-change-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Oracle 11g Release 2 Analytics</title>
		<link>http://www.rittmanmead.com/2009/09/13/oracle-11g-release-2-analytics/</link>
		<comments>http://www.rittmanmead.com/2009/09/13/oracle-11g-release-2-analytics/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 15:55:52 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/09/13/oracle-11g-release-2-analytics/</guid>
		<description><![CDATA[Mark and Venkat have already been blogging about OWB 11g Release 2, but that was not the only new release to slip past Oracle&#8217;s doors in recent days; 11g Release 2 of the database is also out.
I have long loved analytic functions in Oracle &#8211; they can give a simple way to avoid sub-queries and [...]]]></description>
			<content:encoded><![CDATA[<p>Mark and Venkat have already been blogging about OWB 11g Release 2, but that was not the only new release to slip past Oracle&#8217;s doors in recent days; 11g Release 2 of the database is also out.</p>
<p>I have long loved analytic functions in Oracle &#8211; they can give a simple way to avoid sub-queries and can help reduce some fairly difficult ETL problems to a simple SQL statement, one that can be exposed as a database view or materialized view and then accessed by an ETL tool such as OWB or ODI. Of course, analytic functions are not a silver bullet; wrongly used, they can cause problems. By necessity the PARTITION BY and ORDER BY clauses require data sorts and, as I mentioned in a UKOUG talk a few years back, large sorts may not be cheap, especially if many analytic functions are used, each with differing sorting requirements.</p>
<p>It was nice to see in the <a href="http://download.oracle.com/docs/cd/E11882_01/server.112/e10881/chapter1.htm#NEWFTCH1">New Features Guide</a> for Oracle 11gR2 two new analytic functions and a useful enhancement to two existing ones. LAG and LEAD were particular favourites of mine: the ability to &#8220;copy&#8221; a value from the next or previous row is so useful when, say, dealing with opening and closing stock values. The big failing (to my mind) was how they handled NULL values; this is now addressed by the option to IGNORE NULLS.</p>
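<p>As a small illustration of the new option, the query below carries the last known closing stock forward as the opening value, skipping days where no closing figure was recorded. The table stock_levels and its columns are invented for the example.</p>
<pre>
SELECT product_id,
       stock_date,
       closing_stock,
       -- most recent non-NULL closing_stock from an earlier row
       LAG(closing_stock) IGNORE NULLS
         OVER (PARTITION BY product_id ORDER BY stock_date) AS opening_stock
FROM   stock_levels;
</pre>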
<p>One of the new analytic functions also looks at rows around the current row: NTH_VALUE, which is really a more general version of FIRST_VALUE and LAST_VALUE. The other new analytic function is LISTAGG. This allows the concatenation of a measure column; that is, a way of pivoting a column of data but presenting the results in a single column. Here is the example from the <a href="http://download.oracle.com/docs/cd/E11882_01/server.112/e10810/analysis.htm#DWHSG02015">Oracle 11g Data Warehousing Guide 11gR2</a>:</p>
<pre>
SELECT time_id, prod_id, MIN(amount_sold), LISTAGG(min(amount_sold),';')
WITHIN GROUP (ORDER BY prod_id) OVER (PARTITION BY time_id) cust_list
FROM sales WHERE time_id > '20-DEC-01' AND prod_id BETWEEN 120 AND 125
GROUP BY prod_id, time_id;

TIME_ID   PROD_ID   MIN(AMOUNT_SOLD)  CUST_LIST
-------   -------   ----------------  -----------
21-DEC-01     120            51.36    51.36;10.81
21-DEC-01     121            10.81    51.36;10.81
22-DEC-01     120            51.36    51.36;10.81;20.23;56.12;17.79;15.67
22-DEC-01     121            10.81    51.36;10.81;20.23;56.12;17.79;15.67
22-DEC-01     122            20.23    51.36;10.81;20.23;56.12;17.79;15.67
22-DEC-01     123            56.12    51.36;10.81;20.23;56.12;17.79;15.67
22-DEC-01     124            17.79    51.36;10.81;20.23;56.12;17.79;15.67
22-DEC-01     125            15.67    51.36;10.81;20.23;56.12;17.79;15.67
...
</pre>
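<p>For comparison, here is a sketch of NTH_VALUE against the same sales table; this query is an illustration and is not taken from the guide. The window is widened to the whole partition because the default window stops at the current row, which can leave the earliest rows of each partition with a NULL.</p>
<pre>
SELECT prod_id, time_id, amount_sold,
       -- second amount in date order for each product
       NTH_VALUE(amount_sold, 2)
         OVER (PARTITION BY prod_id ORDER BY time_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_amount
FROM   sales
WHERE  prod_id BETWEEN 120 AND 125;
</pre>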
<p>My colleagues and I will be writing more on the new features of Oracle 11g and a more in-depth view of OWB 11gR2, so keep watching the blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/09/13/oracle-11g-release-2-analytics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Slightly Fuzzy Lookups</title>
		<link>http://www.rittmanmead.com/2009/08/15/slightly-fuzzy-lookups/</link>
		<comments>http://www.rittmanmead.com/2009/08/15/slightly-fuzzy-lookups/#comments</comments>
		<pubDate>Sat, 15 Aug 2009 09:25:13 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/08/15/slightly-fuzzy-lookups/</guid>
		<description><![CDATA[Recently I had a requirement to selectively &#8220;translate&#8221; the data in one column of a table before loading into a data warehouse. In this case we had to &#8220;standardise&#8221; a list of countries. Normally, this is classic use of an outer join to a table containing the incorrect expression and its translation and where a [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I had a requirement to <em>selectively</em> &#8220;translate&#8221; the data in one column of a table before loading into a data warehouse. In this case we had to &#8220;standardise&#8221; a list of countries. Normally, this is a classic use of an outer join to a table containing the incorrect expression and its translation: where a match is found use the translation, or else keep the original expression.</p>
<pre>
select case when t.new_country is null then o.country else t.new_country end country_name
from
translation_table t, original_table o
where o.country = t.country(+);
</pre>
<p>But in this case the usage of the column and the spellings and abbreviations used were highly inconsistent. Sometimes we had the country name entered in the local language, sometimes additional words in the same column before or after the country name such as a town name or a post code, sometimes we had punctuation, occasionally people confused county with country, and in some cases there were even delivery instructions like &#8220;leave behind garage&#8221;. Although it is feasible to create a translation list based on all of the incorrect variations, it is not the best idea; there can be many variations and coding for all eventualities may not be sustainable.</p>
<p>One thing that can get forgotten is that it is possible to use an expression instead of an equality in the outer join where clause. For example we could use LIKE and in the translation_table store match strings (with % and _ characters as required) instead of the exact strings.</p>
<pre>
select case when t.new_country is null then o.country else t.new_country end country_name
from
translation_table t, original_table o
where o.country LIKE t.country(+);
</pre>
<p>But with Oracle 10g and later we have an even more flexible option, the use of regular expressions. This allows us to optionally match parts of strings, to define the length of match and even to specify alternative matching characters.<br />
We store the regular expressions in the translation_table.</p>
<pre>
select case when t.new_country is null then o.country else t.new_country end country_name
from
translation_table t, original_table o
where REGEXP_LIKE(o.country , t.country(+));
</pre>
<p>The key thing to watch out for is that a string might match more than one regular expression &#8211; for example the pattern &#8220;WALES&#8221; would match the string &#8220;WALES&#8221; (obviously) but also the string &#8220;NEW SOUTH WALES&#8221;, which may well have a pattern of its own &#8211; so choose the regular expressions with care.</p>
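<p>For instance, anchoring the patterns is one way to avoid the WALES double match. The rows below assume the translation_table columns used in the queries above (country holding the pattern, new_country the standard name); the patterns themselves are only examples.</p>
<pre>
-- '^' and '$' tie the pattern to the start and end of the string,
-- so 'NEW SOUTH WALES' no longer matches the pattern for Wales
INSERT INTO translation_table (country, new_country)
VALUES ('^WALES$', 'Wales');

INSERT INTO translation_table (country, new_country)
VALUES ('^(NSW|NEW SOUTH WALES)', 'Australia');

-- optional dots: matches UK, U.K. and U.K
INSERT INTO translation_table (country, new_country)
VALUES ('^U\.?K\.?$', 'United Kingdom');
</pre>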
<p>But can you use this in an ETL tool such as Oracle Warehouse Builder? There is no real problem with the use of LIKE, except that you should uncheck the &#8220;use ANSI SQL&#8221; check box in the mapping properties. But as <a href="http://blogs.oracle.com/warehousebuilder/2009/08/pattern_matching_conditions.html">David Allan mentions</a>, there is a difficulty in using the REGEXP_LIKE operator, as OWB does not know that the REGEXP_LIKE expression is boolean. David&#8217;s solution is quite simple &#8211; wrap the logic in a case expression and then check that for equality; simple and effective.</p>
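<p>The workaround amounts to a join condition along these lines (a sketch of the idea rather than David&#8217;s exact code, reusing the o and t aliases from the queries above):</p>
<pre>
-- CASE turns the boolean test into a number that can be compared with =
CASE WHEN REGEXP_LIKE(o.country, t.country) THEN 1 ELSE 0 END = 1
</pre>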
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/08/15/slightly-fuzzy-lookups/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Cardinality of Rowsources</title>
		<link>http://www.rittmanmead.com/2009/06/21/cardniality-of-rowsources/</link>
		<comments>http://www.rittmanmead.com/2009/06/21/cardniality-of-rowsources/#comments</comments>
		<pubDate>Sun, 21 Jun 2009 12:21:20 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/06/21/cardniality-of-rowsources/</guid>
		<description><![CDATA[One of the things I sometimes come across when looking at legacy ETL code for data loading is the misuse of query hints to &#8220;improve&#8221; data load performance. Sometimes DBAs or developers do not remember that &#8220;tuning the select&#8221; may not be the same thing as &#8220;tuning the insert&#8221;; Jonathan Lewis wrote about an example [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things I sometimes come across when looking at legacy ETL code for data loading is the misuse of query hints to &#8220;improve&#8221; data load performance. Sometimes DBAs or developers do not remember that &#8220;tuning the select&#8221; may not be the same thing as &#8220;tuning the insert&#8221;; Jonathan Lewis <a href="http://jonathanlewis.wordpress.com/2008/12/05/distributed-dml" target="_blank">wrote</a> about an example of this in December 2008.</p>
<p>The circumstances Jonathan describes are commonplace in some data warehouses: an insert into a DW table from a remote (database link) source joined to, or filtered on, some rows from a table on the target data warehouse. In the post&#8217;s comments there was some debate on why Oracle needed to ignore the driving site hint for this type of query when obeying it would have no effect on the insert (other than an improved query plan). There were also some suggestions to get around this, such as creating a remote view to do the filter or join, which has the disadvantage of needing to create objects on the source database (views and database links), and the use of pipelined functions on the target system to act as the rowsource for the insert.</p>
<p>I like the pipelined function approach and, although there are a few quirky things about parallel inserts, it does present a good way forward that is not invasive on the source database. Well, it is a good way forward except that the cardinality of the row source is often wrong. So I especially liked the piece just published by Adrian Billington on <a href="http://www.oracle-developer.net/display.php?id=427" target="_blank">ways to set cardinality on pipelined and table functions</a>. If you join to table functions in your queries I urge you to take a look at what Adrian has to say.</p>
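<p>To give a flavour of what is involved, one commonly used technique is a hint on the table function. The sketch below assumes a pipelined function called get_remote_orders and a local table order_filter, both hypothetical; note that CARDINALITY is an undocumented hint, so treat it with appropriate caution.</p>
<pre>
SELECT /*+ CARDINALITY(r 50000) */   -- ask the optimizer to expect about 50,000 rows
       r.order_id, r.amount
FROM   TABLE(get_remote_orders()) r
JOIN   order_filter f
ON     f.order_id = r.order_id;
</pre>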
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/06/21/cardniality-of-rowsources/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic SQL</title>
		<link>http://www.rittmanmead.com/2009/05/26/dynamic-sql/</link>
		<comments>http://www.rittmanmead.com/2009/05/26/dynamic-sql/#comments</comments>
		<pubDate>Tue, 26 May 2009 18:11:38 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2009/05/26/dynamic-sql/</guid>
		<description><![CDATA[One of my pet hates is the inappropriate use of dynamic SQL within ETL processes, for example building a command as VARCHAR2 string then using it in an &#8220;execute immediate&#8221; statement. Putting commands into strings whether hard-coded ones within the package or procedure, retrieved from some &#8216;code&#8217; table in the database or even built on-the-fly [...]]]></description>
			<content:encoded><![CDATA[<p>One of my pet hates is the inappropriate use of dynamic SQL within ETL processes, for example building a command as a VARCHAR2 string then using it in an &#8220;execute immediate&#8221; statement. Putting commands into strings, whether hard-coded ones within the package or procedure, retrieved from some &#8216;code&#8217; table in the database or even built on-the-fly, can make support, enhancement and documentation tasks more complex &#8211; string contents don&#8217;t show up in the &#8220;all_dependencies&#8221; view and that can make finding the code that manipulates data overly complicated.</p>
<p>I would be a hypocrite to say &#8220;never use dynamic SQL&#8221;. It does have its uses. A few years back I had to write my own materialized view refresh code for the special case of complete refresh of a single partition of a materialized view, or to be more precise the refresh of over 50 different partitioned materialized views &#8211; the pragmatic approach was to write a single procedure to use dynamic SQL to do the DML and to handle the refresh of a single partition.</p>
<p>Recently, I came across an even more compelling reason to use dynamic SQL &#8211; one of my customers needed to load several thousand flat files per day into their data warehouse. Each file had the same structure, we just needed to load them as fast as possible to a staging table and to persist the name of the original source file with the loaded data for data lineage purposes. Using Oracle external tables we have two viable options &#8211; rename the source files one-by-one to be the external table&#8217;s location name (there is also a similar approach of concatenating the files together to build a &#8216;massive&#8217; external table) or to alter the external table location to match each of the incoming file names.</p>
<p>I prefer the ALTER TABLE approach as I don&#8217;t need to manipulate files in the OS and I can do all that I need in the database (albeit with a little touch of database Java to build a table of files to process). In pseudo code terms I:</p>
<pre>
insert a directory listing into a temporary table (using a database Java procedure)

for each filename that matches the filename format

    Alter the external table location to be the new filename

    insert the content of the external file into the staging table (append)

Loop to the next filename</pre>
<p>Of course there are some other bits and pieces going on to log the activity and to move (or rename) files after processing so that they don&#8217;t get processed again if we need to rerun the process.</p>
<p>By processing both the ALTER TABLE and the INSERT INTO statements as dynamic SQL we avoid the overhead of repeatedly invalidating references to the external table and then needing to recompile any code that uses it; and that is quite a saving in time over many, many files.</p>
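<p>Fleshed out a little, the loop might look something like the sketch below. All of the object names (file_list, sales_ext, sales_stage and their columns) and the filename pattern are placeholders, and the logging and file housekeeping are left out.</p>
<pre>
BEGIN
  FOR f IN (SELECT filename
            FROM   file_list              -- filled by the Java directory listing;
                                          -- its rows must survive commits
            WHERE  filename LIKE 'sales%.csv'
            ORDER  BY filename)
  LOOP
    -- DDL cannot take bind variables, so the filename is concatenated in
    -- (with any single quotes doubled up)
    EXECUTE IMMEDIATE
      'ALTER TABLE sales_ext LOCATION ('''
      || REPLACE(f.filename, '''', '''''') || ''')';

    -- Dynamic SQL means this block holds no compile-time dependency on
    -- sales_ext; the filename is stored with each row for lineage
    EXECUTE IMMEDIATE
      'INSERT /*+ APPEND */ INTO sales_stage (source_file, col1, col2) '
      || 'SELECT :fname, col1, col2 FROM sales_ext'
      USING f.filename;

    COMMIT;  -- a direct-path insert must be committed before the table is used again
  END LOOP;
END;
/
</pre>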
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2009/05/26/dynamic-sql/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
