<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Data Warehousing</title>
	<atom:link href="http://www.rittmanmead.com/category/data-warehousing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivering Oracle Business Intelligence</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:18:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: ETL Iteration</title>
		<link>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/</link>
		<comments>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 04:22:31 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9954</guid>
		<description><![CDATA[This is the fourth entry in my series on Agile Data Warehousing with Exadata and OBIEE. To see all the previous posts, check the introductory posting which I have updated with all the entries in the series. In the last post, I described what I call the Model-Driven iteration, where we take thin requirements from the [...]]]></description>
			<content:encoded><![CDATA[<p>This is the fourth entry in my series on Agile Data Warehousing with Exadata and OBIEE. To see all the previous posts, check the <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/">introductory posting</a> which I have updated with all the entries in the series.</p>
<p>In the last post, I described what I call the Model-Driven iteration, where we take thin requirements from the end-user in the form of a user story and generate the access and performance layer, or our star schema, logically using the OBIEE semantic model. Our first several iterations will likely be Model-Driven as we work with the end user to fine-tune the content he or she wants to see on the OBIEE dashboards. As user stories are opened, completed and validated throughout the project, end users are prioritizing them for the development team to work on. Eventually, there will come a time when an end user opens a story that is difficult to model in the semantic layer. Processes to correct data quality issues are a good example, and despite having the power of Exadata at our disposal, we may find ourselves in a performance hole that even the Database Machine can&#8217;t dig us out of. In these situations, we reflect on our overall solution and consider the maxim of Agile methodology: &#8220;refactoring&#8221;, or &#8220;rework&#8221;.</p>
<p>For Extreme BI, the main form of refactoring is ETL. The pessimist might say: &#8220;Well, now we have to do ETL development, what a waste of time all that RPD modeling was.&#8221; But is that the case? First off&#8230; think about our users. They have been running dashboards for some time now with at least a portion of the content they need to get their jobs done. As the die-hard Agile proponent will tell you&#8230; some is better than none. But also&#8230; the process of doing the Model-Driven iteration puts our data modelers and our ETL developers in a favorable position. We&#8217;ve eliminated the exhaustive data modeling process, because we already have our logical model in the Business Model and Mapping layer (BMM).</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Full-Logical-Model.png"><img class="alignnone size-large wp-image-9976" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Full-Logical-Model-1024x559.png" alt="" width="614" height="335" /></a></p>
<p>But we have more than that. We also have our source-to-target information documented in the semantic metadata layer. We can see that information using the Admin Tool, as depicted below, or we can also use the &#8220;Repository Documentation&#8221; option to generate some documented source-to-target mappings.</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png"><img class="size-full wp-image-9883  alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png" alt="" width="671" height="219" /></a></p>
<p>When embarking on ETL development, it&#8217;s common to do SQL prototyping before starting the actual mappings to make sure we understand the particulars of granularity. However, we already have these SQL prototypes in the nqquery.log file&#8230; all we have to do is look at it. The combination of the source-to-target mapping and the SQL prototypes provides all the artifacts necessary to get started with the ETL.</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Query-Log.png"><img class="alignnone size-large wp-image-9982" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Query-Log-1024x598.png" alt="" width="645" height="377" /></a></p>
<p>When using ETL processing to &#8220;instantiate&#8221; our logical model into the physical world, we can&#8217;t abandon our Agile imperatives: we must still deliver the new content, and corresponding rework, within a single iteration. So whether the end user is opening the user story because the data quality is abysmal, or because the performance is just not good enough, we must vow to deliver the ETL Iteration time-boxed, in exactly the same manner that we delivered the Model-Driven Iteration. So, if we imagine that our user opens a story about data quality in our Customer and Product dimensions, and we decide that all we have time for in this iteration are those two dimension tables, does it make sense for us to deliver those items in a vacuum? With the image below depicting the process flow for an entire subject area, can we deliver it piecemeal instead of all at once?</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Piecemeal-Process-Flow.png"><img class="alignnone size-full wp-image-9968" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Piecemeal-Process-Flow.png" alt="" width="636" height="348" /></a></p>
<p>The answer, of course, is that we can. We&#8217;ll develop the model and ETL exactly as we would if our goal was to plug the dimensions into a complete subject area. We use surrogate keys as the primary key for each dimension table, facilitating joining our dimension tables to completed fact tables. But we don&#8217;t have completed fact tables at this point in our project&#8230; instead we have a series of transaction tables that work together to form the basis of a logical fact table. How can we use a dimension table with a surrogate key to join to our transactional &#8220;fact&#8221; table that doesn&#8217;t yet have these surrogate keys?</p>
<p>We fake it. Along with surrogate keys, the long-standing best practice of dimension table delivery has been to include the source system natural key, as well as effective dates, in all our dimension tables. These attributes are usually included to facilitate slowly-changing dimension (SCD) processing, but we&#8217;ll exploit them for our Agile piecemeal approach as well. So in our example, we have a properly formed Customer dimension that we want to join to our logical fact table, as depicted below:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Partial-Hybrid-Model-e1327470743307.png"><img class="alignnone size-full wp-image-9995" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Partial-Hybrid-Model-e1327470743307.png" alt="" width="596" height="200" /></a></p>
<p>We start by creating aliases to our transactional &#8220;fact&#8221; tables (called POS_TRANS_HYBRID and POS_TRANS_HEADER_HYBRID in the example above), because we don&#8217;t want to upset the logical table source (LTS) that we are already using for the pure transactional version of the logical fact table. We create a complex join between the customer source system natural key and transaction date in our hybrid alias, and the natural key and effective dates in the dimension table. We use the effective dates as well to make sure we grab the correct version of the customer entity in question in situations where we have enabled Type 2 SCD&#8217;s (the usual standard) in our dimension table.</p>
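<p>The join described above can be sketched outside the RPD as plain SQL. This is only a minimal illustration, run here through Python&#8217;s built-in sqlite3 module rather than Oracle: apart from POS_TRANS_HYBRID, every table name, column name and data value is an assumption made up for the example, and the SQL is not what the BI Server generates.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Type 2 customer dimension: surrogate key,
    -- source system natural key and effective dates.
    CREATE TABLE customer_dim (
        customer_key   INTEGER PRIMARY KEY,   -- surrogate key
        customer_id    TEXT,                  -- natural key
        customer_name  TEXT,
        effective_from TEXT,
        effective_to   TEXT
    );
    -- Hypothetical transactional table that carries only the natural key.
    CREATE TABLE pos_trans_hybrid (
        trans_id    INTEGER PRIMARY KEY,
        customer_id TEXT,
        trans_date  TEXT,
        amount      REAL
    );
    INSERT INTO customer_dim VALUES
        (1, 'C100', 'Acme Ltd',  '2011-01-01', '2011-06-30'),
        (2, 'C100', 'Acme Corp', '2011-07-01', '9999-12-31'),
        (3, 'C200', 'Bolt Inc',  '2011-01-01', '9999-12-31');
    INSERT INTO pos_trans_hybrid VALUES
        (10, 'C100', '2011-03-15', 50.0),
        (11, 'C100', '2011-09-01', 75.0),
        (12, 'C200', '2011-09-02', 20.0);
""")

# Complex join: natural key plus transaction date falling inside the
# effective date range, so each transaction picks up the dimension
# version that was current when it happened.
HYBRID_JOIN = """
    SELECT d.customer_key, d.customer_name, SUM(t.amount) AS total
    FROM pos_trans_hybrid t
    JOIN customer_dim d
      ON d.customer_id = t.customer_id
     AND t.trans_date BETWEEN d.effective_from AND d.effective_to
    GROUP BY d.customer_key, d.customer_name
    ORDER BY d.customer_key
"""
rows = conn.execute(HYBRID_JOIN).fetchall()
print(rows)
```

<p>In this sketch, the two transactions for customer C100 land on different versions of the customer row, which is the behaviour we want from a Type 2 dimension. The effective date ranges are assumed to be inclusive and non-overlapping for each natural key.</p>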
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline.png"><img class="alignnone size-large wp-image-10007" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-1024x869.png" alt="" width="574" height="486" /></a></p>
<p>This complex logic of using the natural key and effective dates is identical to the logic we would use in what Ralph Kimball calls the &#8220;surrogate pipeline&#8221;: the ETL processing used to replace natural keys with surrogate keys when loading a proper fact table. Using Customer and Sales attributes in an analysis, we can see the actual SQL that&#8217;s generated:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-SQL.png"><img class="alignnone size-large wp-image-10025" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-SQL-1024x510.png" alt="" width="645" height="321" /></a></p>
<p>We can view this hybrid approach as an intermediate step, but there is also nothing wrong with this as a long-term approach if the users are happy and Exadata makes our queries scream. If you think about it&#8230; a surrogate key is an easy way of representing the natural key of the table, which is the source system natural key plus the unique effective dates for the entity. A surrogate key makes this relationship much easier to envision, and certainly easier to code in SQL, but when we are insulated from the ugliness of the join with Extreme Metadata, do we really care? If our end users ever open a story asking for rework of the fact table, we may consider manifesting that table physically as well. Once complete, we would need to create another LTS for the Customer dimension (using an alias to keep it separate from the table that joins to the transactional tables). This alias would be configured to join directly to the new Sales fact table across the surrogate key&#8230; exactly how we would expect a traditional data warehouse to be modeled in the BMM. The physical model will look nearly identical to our logical model, and the generated SQL will be less interesting:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Fact-LTS.png"><img class="alignnone size-full wp-image-10033" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Fact-LTS.png" alt="" width="221" height="226" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Star-Schema-SQL.png"><img class="alignnone size-large wp-image-10029" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Star-Schema-SQL-1024x420.png" alt="" width="645" height="265" /></a></p>
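<p>To make the contrast concrete, here is a small sketch of a surrogate pipeline load followed by the simpler star join it enables. It uses Python&#8217;s built-in sqlite3 module as a stand-in for the real database, and all table names, column names and rows are hypothetical; a real ETL mapping would also have to handle late-arriving and unmatched keys, which this sketch ignores.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension, transactional source and physical fact table.
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY, customer_id TEXT,
        customer_name TEXT, effective_from TEXT, effective_to TEXT);
    CREATE TABLE pos_trans (
        trans_id INTEGER PRIMARY KEY, customer_id TEXT,
        trans_date TEXT, amount REAL);
    CREATE TABLE sales_fact (
        trans_id INTEGER PRIMARY KEY, customer_key INTEGER,
        trans_date TEXT, amount REAL);

    INSERT INTO customer_dim VALUES
        (1, 'C100', 'Acme Ltd',  '2011-01-01', '2011-06-30'),
        (2, 'C100', 'Acme Corp', '2011-07-01', '9999-12-31');
    INSERT INTO pos_trans VALUES
        (10, 'C100', '2011-03-15', 50.0),
        (11, 'C100', '2011-09-01', 75.0);

    -- Surrogate pipeline: resolve natural key plus transaction date
    -- to the surrogate key once, at load time.
    INSERT INTO sales_fact (trans_id, customer_key, trans_date, amount)
    SELECT t.trans_id, d.customer_key, t.trans_date, t.amount
    FROM pos_trans t
    JOIN customer_dim d
      ON d.customer_id = t.customer_id
     AND t.trans_date BETWEEN d.effective_from AND d.effective_to;
""")

# Query time: a plain equi-join across the surrogate key.
star_rows = conn.execute("""
    SELECT d.customer_name, f.amount
    FROM sales_fact f
    JOIN customer_dim d ON d.customer_key = f.customer_key
    ORDER BY f.trans_id
""").fetchall()
print(star_rows)
```

<p>The date-range logic has not gone away; it has simply moved from every query into the load, which is the trade we make when we manifest the fact table physically.</p>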
<p>Now that I&#8217;ve described the Model-Driven and ETL Iterations, it&#8217;s time to discuss what I call the Combined Iteration, which is likely what most of the iterations will look like when the project has achieved some maturity. In Combined Iterations, we work on adding new or refactored RPD content alongside new or refactored ETL content in the same iteration. Now the project really makes sense to the end user. We allow the user community&#8211;those who are actually consuming the content&#8211;to dictate to the developers with user stories what they want the developers to work on in the next iteration. The users will constantly open new stories, some asking for new content, and others requesting modifications to existing content. All Agile methodologies put the burden of prioritizing user stories squarely on the shoulders of the user community. Why should IT dictate to the user community where priorities lie? If we have delivered fabulous content sourced with the Model-Driven paradigm, and Exadata provides the performance necessary to make this &#8220;real&#8221; content, then there is no reason for the implementors to dictate to the users the need to manifest that model physically with ETL when they haven&#8217;t asked for it. If whole portions of our data warehouse are never implemented physically with ETL&#8230; do we care? The users are happy with what they have, and they think performance is fine&#8230; do we still force a &#8220;best practice&#8221; of a physical star schema on users who clearly don&#8217;t want it?</p>
<p>So that&#8217;s it for the Extreme BI methodology. At the outset of this series&#8230; I thought it would require five blog posts to make the case, but I was able to do it in four instead. So even when delivering blog posts, I can&#8217;t help but rework as I go along. Long live Agile!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration</title>
		<link>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/</link>
		<comments>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 05:32:10 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9825</guid>
		<description><![CDATA[After laying the groundwork with an introduction, and following up with a high-level description of the required puzzle pieces, it&#8217;s time to get down to business and describe how Extreme BI works. At Rittman Mead, we have several projects delivering with this methodology right now, and more in the pipeline. I&#8217;ll gradually introduce the different types of [...]]]></description>
			<content:encoded><![CDATA[<p>After laying the groundwork with an <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/" target="_blank">introduction</a>, and following up with a high-level description of the required <a title="Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces" href="http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/" target="_blank">puzzle pieces</a>, it&#8217;s time to get down to business and describe how Extreme BI works. At Rittman Mead, we have several projects delivering with this methodology right now, and more in the pipeline.</p>
<p>I&#8217;ll gradually introduce the different types of generic iterations that we engage in, focusing on what I call the &#8220;model-driven&#8221; iteration for this post. Our first few iterations are always model-driven. We begin when a user opens a user story requesting new content. For any request for new content, we require that all the following elements are included in the story:</p>
<ol>
<li>A narrative about the data they are looking for, and how they want to see it. We are not looking for requirements documents here, but we are looking for the user to give a complete picture of what it is that they need.</li>
<li>An indication of how they report on this content today. In a new data warehouse environment, this would include some sort of report that they are currently running against the source system, and in a perfect world, this would involve the SQL that is used to pull that report.</li>
<li>An indication of data sets that are &#8220;nice to haves&#8221;. This might include data that isn&#8217;t available to them in the current paradigm of the report, or was simply too complicated to pull in that paradigm. After an initial inspection of these nice-to-haves and the complexity involved with including them in this story, the project manager may decide to pull these elements out and put them in a separate user story. This, of course, depends on the Agile methodology used, and the individual implementation of that methodology.</li>
</ol>
<p>First we assign the story to an RPD developer, who uses the modeling capabilities in the OBIEE Admin Tool to &#8220;discover&#8221; the logical dimensional model tucked inside the user story, and develop that logical model inside the Business Model and Mapping (BMM) layer. Unlike a &#8220;pure&#8221; dimensional modeling exercise where we focus only on user requirements and pay very little attention to source systems, in model-driven development, we constantly shift between the source of the data, and how best the user story can be solved dimensionally. Instead of working directly against the source system though, we are working against the foundation layer in the Oracle Next-Generation Reference Data Warehouse Architecture. We work from a top-down approach, first creating empty facts and dimensions in the BMM, and mapping them to the foundation layer tables in the physical layer.</p>
<p>To take a simple example, we can see how a series of foundation layer tables developed in 3NF could be mapped to a logical dimension table as our Customer dimension:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Dimension-Join.png"><img class="size-full wp-image-9893 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Dimension-Join.png" alt="Model-Driven Development of Dimension Table" width="425" height="208" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png"><img class="size-full wp-image-9883 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png" alt="" width="671" height="219" /></a></p>
<p>I rearranged the layout from the Admin Tool to provide an &#8220;ETL-friendly&#8221; view of the mapping. All the way to the right, we can see the logical, dimensional version of our Customer table, and how it maps back to the source tables. This mapping could be quite complicated, with perhaps dozens of tables. The important thing to keep in mind is that this complexity is hidden from not only the consumer of the reports, but also from the developers. We can generate a similar example of what our Sales fact table would look like:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Fact-Join.png"><img class="size-full wp-image-9896 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Fact-Join.png" alt="" width="426" height="209" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Fact.png"><img class="size-full wp-image-9889 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Fact.png" alt="" width="664" height="276" /></a></p>
<p>Another way of making the same point is to look at the complex, transaction model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Physical-Model-Annotated.png"><img class="size-full wp-image-9904 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Physical-Model-Annotated.png" alt="" width="441" height="311" /></a></p>
<p>We can then compare this to the simplified, dimensional model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Logical-Model-Annotated.png"><img class="size-full wp-image-9905 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Logical-Model-Annotated.png" alt="" width="409" height="260" /></a></p>
<p>And finally, when we view the subject area during development of an analysis, all we see are facts and dimensions. The front-end developer can be blissfully ignorant that he or she is developing against a complex transactional schema, because all that is visible is the abstracted logical model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Astracted-View-for-Developer.png"><img class="alignnone size-full wp-image-9915" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Astracted-View-for-Developer.png" alt="" width="741" height="395" /></a></p>
<p>When mapping the BMM to complex 3NF schemas, the BI Server is very, very smart, and understands how to do more with less. Using the metadata capabilities of OBIEE is superior to other metadata products, or to a &#8220;roll-your-own metadata&#8221; approach using database views, because of the following:</p>
<ol>
<li>The generated SQL usually won&#8217;t involve self-joins, even when tables exist in both the logical fact table and the logical dimension table.</li>
<li>The BI Server will only include tables that are required to facilitate the intelligent request, either because it has columns mapped to the attributes being requested, or because the table is a required reference table to bring disparate tables together. Any tables not required to facilitate the request will be excluded.</li>
</ol>
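<p>The second point can be illustrated with a toy sketch. This is emphatically not the BI Server&#8217;s algorithm, which is far more sophisticated; it is just a few lines of Python showing the idea of including only the tables that hold requested columns, plus any tables needed to link them. The column mapping, table names and join graph below are all hypothetical.</p>

```python
from collections import deque

# Hypothetical mapping of logical columns to physical 3NF tables.
COLUMN_SOURCE = {
    "Customer Name": "CUSTOMER",
    "Customer City": "ADDRESS",
    "Product Name": "PRODUCT",
    "Sales Amount": "POS_TRANS",
}

# Hypothetical physical join graph (undirected adjacency lists).
JOINS = {
    "POS_TRANS": ["POS_TRANS_HEADER", "PRODUCT"],
    "POS_TRANS_HEADER": ["POS_TRANS", "CUSTOMER"],
    "CUSTOMER": ["POS_TRANS_HEADER", "ADDRESS"],
    "ADDRESS": ["CUSTOMER"],
    "PRODUCT": ["POS_TRANS"],
}


def join_path(start, goal):
    """Breadth-first search for the shortest join path between two tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in JOINS[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    raise ValueError(f"no join path from {start} to {goal}")


def tables_for(columns):
    """Tables holding the requested columns, plus any linking tables."""
    needed = sorted({COLUMN_SOURCE[c] for c in columns})
    required = set(needed)
    for other in needed[1:]:
        required.update(join_path(needed[0], other))
    return sorted(required)


print(tables_for(["Customer Name", "Sales Amount"]))
print(tables_for(["Product Name", "Sales Amount"]))
```

<p>In the first request, POS_TRANS_HEADER is pulled in only because it links CUSTOMER to POS_TRANS, while ADDRESS and PRODUCT are left out. In the second, neither customer table appears at all.</p>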
<p>Since the entire user story needs to be closed in a single iteration, the user who opened the story needs to be able to see the actual content. This means that the development of the analysis (or report) and the dashboard are also required to complete the story. It&#8217;s important to get something in front of the end user immediately, but it doesn&#8217;t have to be perfect. We should focus on a clear, concise analysis in the first iteration, so it&#8217;s easy for the end user to verify that the data is correct. In future iterations, we can deliver high-impact, eye-catching dashboards. Equally important to closing the story is being able to prove that it&#8217;s complete. In Agile methodologies, this is usually referred to as the &#8220;Validation Step&#8221; or &#8220;Showcase&#8221;. Since we have already produced the content, it&#8217;s easy to prove to the user that the story is complete. But suppose that we believed we couldn&#8217;t deliver new content in a single iteration. That would imply that we would have an iteration during our project that didn&#8217;t include actual end-user content. How would you go about validating or showcasing that content? How would we go about showcasing a completed ETL mapping, for instance, if we haven&#8217;t delivered any content to consume it?</p>
<p>What we have at the end of the iteration is a completely abstracted view of our model: a complex, transactional, 3NF schema presented as a star schema. We are able to deliver portions of a subject area, which is important for time-boxed iterations. The Extreme Metadata of OBIEE 11g allows us to remove this complexity in a single iteration, but it&#8217;s the performance of the Exadata Database Machine that allows us to build real analyses and dashboards and present them to the general user community.</p>
<p>In the next post, we&#8217;ll examine the ETL Iteration, and explore how we can gradually manifest our logical business model into a physical model over time. As you will see, the ETL iteration is an optional one&#8230; it will be absolutely necessary in some environments, and completely superfluous in others.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces</title>
		<link>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/</link>
		<comments>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/#comments</comments>
		<pubDate>Wed, 28 Dec 2011 19:39:56 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9637</guid>
		<description><![CDATA[In the previous post, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I discussed that the usual approach to Agile data warehousing is not Agile at all due to the violation of its main principle: working software delivered iteratively. If you haven&#8217;t already deduced [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/" target="_blank">previous post</a>, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I discussed that the usual approach to Agile data warehousing is not Agile at all due to the violation of its main principle: working software delivered iteratively.</p>
<p>If you haven&#8217;t already deduced from my first post &#8212; or if you haven&#8217;t already seen me speak on this topic &#8212; what I am recommending is bypassing, either temporarily or permanently, the inhibitors specific to data warehousing projects which limit our ability to deliver working software quickly. Specifically, I&#8217;m recommending that we wait to build and populate physical star schemas until a later phase, if at all. Remember the two reasons that we build dimensional models: model simplicity and performance. With our Extreme BI solution, we have tools to counter both of those reasons. We have OBIEE 11g, with a rich metadata layer that presents our underlying data model, even if it is transactional, as a star schema to the end user. This removes our dependency on a simplistic physical model to provide a simplistic logical model to end users. We also have Exadata, which delivers world-class performance against any type of model, and can bridge the performance gap afforded by star schemas. With these tools at our disposal, we can postpone the long process of building dimensional models, at least for the first few iterations. This is the only way to get working software in front of the end user in a single iteration, and, as I will argue, this is the best way to collaborate with an end user and deliver the content they are expecting.</p>
<p>Of the puzzle pieces we need to deliver this model, the first is the <a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/058925.pdf" target="_blank">Oracle Next-Generation Reference DW Architecture</a> (we need an acronym for that), which Mark has already written about in-depth <a title="Drilling Down in the Oracle Next-Generation Reference DW Architecture" href="http://www.rittmanmead.com/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/" target="_blank">here</a>. As you browse through this post, pay special attention to his formulation of the foundation layer, which is the most important layer for delivering Extreme BI.</p>
<div id="attachment_9672" class="wp-caption aligncenter" style="width: 673px"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/next-gen.png"><img class="size-large wp-image-9672    " src="http://www.rittmanmead.com/wp-content/uploads/2011/12/next-gen-1024x627.png" alt="" width="663" height="407" /></a><p class="wp-caption-text">Oracle Next-Generation Reference DW Architecture</p></div>
<h2>Foundation Layer</h2>
<p>This is our &#8220;process-neutral&#8221; layer, which means simply that it isn&#8217;t imbued with requirements about what users want and how they want it. Instead, the foundation layer has one job and one job only: tracking what happened in our source systems. Typically, the foundation layer logical model looks identical to the source systems, except that we have a few additional metadata columns on each record such as commit timestamps and Oracle Database system change numbers (SCN&#8217;s). There are other, more complex solutions for modeling the foundation layer when the 3NF from the source system or systems is not sufficient, such as <a title="Data Vault Modeling" href="http://en.wikipedia.org/wiki/Data_Vault_Modeling" target="_blank">data vault</a>. Our foundation layer is generally &#8220;insert-only&#8221;, meaning we track all history so that we are insulated from changing user requirements in the near and distant futures.</p>
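<p>As a rough sketch of what &#8220;insert-only&#8221; buys us, the example below keeps every version of a customer row along with a commit timestamp and SCN, and then reconstructs the state of the source as of any SCN. It runs on Python&#8217;s built-in sqlite3 module purely for illustration; the table name, column names, operation codes and SCN values are all assumptions, and deletes are left out to keep it short.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical insert-only foundation table: the source columns
    -- plus change metadata (operation, commit timestamp, SCN).
    CREATE TABLE fnd_customer (
        customer_id   TEXT,
        customer_name TEXT,
        dml_type      TEXT,      -- 'I' insert, 'U' update
        commit_ts     TEXT,
        commit_scn    INTEGER
    );
    -- Every source change is appended; nothing is updated in place.
    INSERT INTO fnd_customer VALUES
        ('C100', 'Acme Ltd',  'I', '2011-01-05 09:00:00', 1001),
        ('C200', 'Bolt Inc',  'I', '2011-02-10 14:30:00', 1050),
        ('C100', 'Acme Corp', 'U', '2011-07-01 08:15:00', 1200);
""")


def customers_as_of(scn):
    """Latest version of each customer committed at or before the SCN."""
    return conn.execute("""
        SELECT f.customer_id, f.customer_name
        FROM fnd_customer f
        WHERE f.commit_scn = (
            SELECT MAX(h.commit_scn)
            FROM fnd_customer h
            WHERE h.customer_id = f.customer_id
              -- SCNs are positive, so this means "at or before"
              AND h.commit_scn BETWEEN 0 AND ?
        )
        ORDER BY f.customer_id
    """, (scn,)).fetchall()


print(customers_as_of(1100))
print(customers_as_of(1300))
```

<p>Because history is never overwritten, the same table can answer both &#8220;what does the customer look like now&#8221; and &#8220;what did it look like then&#8221;, whichever the user stories end up asking for.</p>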
<p><strong>UPDATE: </strong> Kent Graziano, a major data vault evangelist, has started <a title="Oracle Data Warrior" href="http://kentgraziano.com/" target="_blank">blogging</a>. Perhaps with some pressure from the public, we could &#8220;encourage&#8221; him to blog on what data vault would look like in a standard foundation layer.</p>
<h2>Capturing Change</h2>
<p>Also required for delivering Extreme BI is a process for capturing change from the source systems and rapidly applying it to the foundation layer, which I described briefly in one of my posts on <a title="Real-time BI: An Introduction" href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/" target="_blank">real-time data warehousing</a>. We have a bit of a tug-of-war at this point between Oracle Streams and Oracle GoldenGate. GoldenGate is the stated platform of the future because it’s a simple, flexible, powerful and resilient replication technology. However, it does not yet have powerful change data capture functionality specific to data warehouses, such as easy subscriptions to raw changed data, or support for multiple subscription groups. You can, in general, work around these limitations using the INSERTALLRECORDS parameter and some custom code (perhaps fodder for a future blog post). Regardless of the technology, Extreme BI requires a process for capturing and applying source system changes quickly and efficiently to the foundation layer on the Exadata Database Machine.</p>
<h2>Extreme Performance</h2>
<p>Although I&#8217;ll drill into more detail in the next post, the reason we need Extreme Performance is to offset the performance gains we usually get from star schemas, since we won&#8217;t be building those, at least not in the initial iterations. Although Rittman Mead has deployed a variant of this methodology sans Exadata using a powerful Oracle Database RAC instead, there is no substitute for Exadata. Although the hardware on the Database Machine is superb, it&#8217;s really the software that is a game-changer. The most extraordinary features include <a title="Smart Scans Meet Storage Indexes" href="http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html" target="_blank">smart scan and storage indexes</a>, as well as hybrid columnar compression, which Mark talks about <a title="Hybrid Columnar Compression in Oracle Exadata v2" href="http://www.rittmanmead.com/2010/01/hybrid-columnar-compression-in-oracle-exadata-v2/" target="_blank">here</a>, referencing an article by Arup Nanda found <a title="Compressing Columns" href="http://www.oracle.com/technetwork/issue-archive/2010/10-jan/o10compression-082302.html" target="_blank">here</a>. For years now, with standard Oracle data warehouses, we&#8217;ve pushed the architecture to its limits trying to reduce IO contention at the cost of CPU utilization, using database features such as partitioning, parallel query and basic block compression. But Exadata Storage can eliminate the IO boogeyman using combinations of these standard features plus the Exadata-only features to elevate the query performance against 3NF schemas to a level on par with traditional star schemas and beyond.</p>
<p style="text-align: center"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/Terabytes-to-Gigabytes.png"><img class="aligncenter size-full wp-image-9739" src="http://www.rittmanmead.com/wp-content/uploads/2011/12/Terabytes-to-Gigabytes.png" alt="" width="617" height="352" /></a></p>
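<p>For illustration only, hybrid columnar compression is requested through a table&#8217;s compression clause. The sketch below assumes an existing table called ORDER_LINES (a made-up name) and storage that supports the feature, such as Exadata; the parallel degree of 8 is arbitrary.</p>
<pre>-- Hypothetical example: ORDER_LINES and ORDER_LINES_HCC are assumed names.
-- COMPRESS FOR QUERY HIGH asks for hybrid columnar compression tuned for
-- query workloads; it requires supporting storage such as Exadata.
CREATE TABLE order_lines_hcc
  COMPRESS FOR QUERY HIGH
  PARALLEL 8
AS
SELECT * FROM order_lines;</pre>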
<h2>Extreme Metadata</h2>
<p>Extreme performance is only half the battle&#8230; we also need Extreme Metadata to provide us the proper level of abstraction so that report and dashboard developers still have a simple model to report against. This is what OBIEE 11g brings to the table. We have also delivered a variant of this methodology without OBIEE, using Cognos instead, which has a metadata layer called <a title="Framework Manager" href="http://www.ironsidegroup.com/2010/07/08/best-practices-in-cognos-8-framework-manager-model-design/" target="_blank">Framework Manager</a>. As with Exadata, the BI Server has no equal in the metadata department, so my advice&#8230; don&#8217;t substitute ingredients.</p>
<p>Consider, for a moment, the evolution of dimensional modeling in deploying a data warehouse. Not too long ago, we had to solve most data warehousing issues with the logical model because BI tools were simplistic. Generally&#8230; there was no abstraction of the physical into the logical, unless you categorize the renaming of columns as abstraction. As these tools evolved, we often found ourselves with a choice: solve some user need in the logical model, or solve it with the feature set of the BI tool. The use of aggregation in data warehousing is a perfect example of this evolution. Designing aggregate tables used to be just another part of the logical modeling exercise, and the aggregates were generally represented in the published data model for the EDW. But now, building aggregates is more of a technical implementation than a logical one, as either the BI Server or the Oracle Database can handle the transparent navigation to aggregate tables.</p>
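<p>As a rough sketch of the database-side option, an Oracle materialized view with query rewrite enabled lets the optimizer redirect a detail-level query to the aggregate without the report knowing about it. The table and column names here (SALES, PRODUCT_ID, SALE_DATE, AMOUNT) are hypothetical and not taken from any schema in this series.</p>
<pre>-- Hypothetical monthly aggregate over an assumed SALES detail table.
CREATE MATERIALIZED VIEW sales_month_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount,
       COUNT(*)               AS row_count
  FROM sales
 GROUP BY product_id, TRUNC(sale_date, 'MM');</pre>
<p>A query that sums AMOUNT by product and month against SALES then becomes a candidate for rewrite against the materialized view, subject to the session&#8217;s query rewrite settings and how fresh the aggregate is.</p>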
<p>The metadata that OBIEE provides adds two necessary features for Agile delivery. First, we are able to report against complex transactional schemas, but still expose those schemas as simplified dimensional models. This allows us to bypass the complex ETL process at least initially so that we can get new subject areas into the users&#8217; hands in a single iteration. Second, OBIEE&#8217;s capability to map multiple Logical Table Sources (LTS&#8217;s) for the same logical table makes it easy to modify &#8212; or &#8220;remap&#8221; &#8212; the source of our logical tables over time. So, in later iterations, if we decide that it&#8217;s necessary to embark upon complex ETL processes to complete user stories, we can do this in the metadata layer without affecting our reports and dashboards, or changing the logical model that report developers are used to seeing.</p>
<div id="attachment_9754" class="wp-caption aligncenter" style="width: 612px"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/semantic-model.031.png"><img class="size-full wp-image-9754 " src="http://www.rittmanmead.com/wp-content/uploads/2011/12/semantic-model.031.png" alt="" width="602" height="378" /></a><p class="wp-caption-text">Flow of Data Through the Three-Layer Semantic Model</p></div>
<h2>More to Come&#8230;</h2>
<p>In the next post, I&#8217;ll describe what I call the Model-Driven Iteration, where we use OBIEE against the foundation layer to expose new subject areas in a single iteration. After that, I&#8217;ll describe ETL Iterations, where we transform a portion of our model iteratively using ETL tools such as ODI, OWB or Informatica. Finally, I&#8217;ll describe what I call Combined Iterations, where both Model-Driven activity and ETL activity are going on at the same time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Introduction</title>
		<link>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/</link>
		<comments>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/#comments</comments>
		<pubDate>Wed, 21 Dec 2011 15:48:55 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9597</guid>
		<description><![CDATA[Over the last year, I&#8217;ve been speaking at conferences on one subject more than any others: Agile Data Warehousing with Exadata and OBIEE. Although I&#8217;ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last year, I&#8217;ve been speaking at conferences on one subject more than any other: Agile Data Warehousing with Exadata and OBIEE. Although I&#8217;ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take up. So I&#8217;ll use the next few blog posts to make my case for what I like to call Extreme BI: an Agile approach to data warehousing using the combination of Extreme Performance and Extreme Metadata.</p>
<p>In a standard data warehouse implementation, whether we are walking in the Inmon or Kimball camps, some portion of our data model will be dimensional in nature; a star schema with facts and dimensions. So let me pose a question, which I think will lend itself well to diving into the Extreme BI discussion: Why do we build dimensional models? The first reason is simplicity. We want to model our reporting structures in a way that makes sense to the business user. The standard OLTP data model that takes two of the four walls in the conference room to display is just never going to make sense to your average business user. At the end of a logical modeling exercise, I expect the end-user to have a look at a completed dimensional model and say: &#8220;Yep&#8230; that&#8217;s our business alright&#8221;. The second reason we build dimensional models is for performance. Denormalizing highly complex transactional models into simplified star schemas generally produces tremendous performance gains.</p>
<p>So my follow-up question: can the combination of Exadata and OBIEE, or Extreme BI, <em>actually change the way we deliver projects? </em>We&#8217;ve all seen the Exadata performance numbers that Oracle publishes, and I can tell you first hand the performance is impressive. Can this Extreme Performance combined with the Extreme Metadata that OBIEE provides give us a more compelling case for delivering data warehouses using Agile methodologies?</p>
<p>To start with, I&#8217;d like to paint a picture of what the typical waterfall data warehousing project looks like. The tasks we usually have to complete, in order, are the following:</p>
<ol>
<li>User interviews</li>
<li>Construct requirement documents</li>
<li>Create logical data model</li>
<li>SQL prototyping of source transactional models</li>
<li>Document source-to-target mappings</li>
<li>ETL development</li>
<li>Front-end development (analyses and dashboards)</li>
<li>Performance tuning</li>
</ol>
<p>Raise your hand if this looks familiar. We would have to go through all these steps, which could take months, before end users can see the fruits of our labor. To mitigate this scenario, organizations will attempt to deliver data warehouses using &#8220;Agile&#8221; methodologies. What this usually means, from my experience, is a simple repackaging of the same waterfall project plan into &#8220;iterations&#8221; or &#8220;sprints&#8221;, so that the project can be delivered iteratively. So the process might look like the following:</p>
<ol>
<li>Iteration 1: Interviews and user requirements</li>
<li>Iteration 2: Logical modeling</li>
<li>Iteration 3: ETL Development</li>
<li>Iteration 4: Front-end development</li>
</ol>
<p>But this, ladies and gentlemen, is not Agile. To get an understanding of what lies at the heart of Agile development, we need to look no further than the <a title="The Agile Manifesto" href="http://agilemanifesto.org/" target="_blank">Agile Manifesto</a>, or the history of the <a title="The Agile Movement" href="http://en.wikipedia.org/wiki/Agile_software_development" target="_blank">Agile Movement</a>. When examining the different methodologies, there is one major theme that permeates all of them: working software delivered iteratively. It&#8217;s not enough to simply deliver the same old waterfall methodology in &#8220;sprints&#8221; or &#8220;iterations&#8221;, because, at the end of those iterations, we don&#8217;t have any working software&#8230; software that end users can actually use to improve their job or help them make better decisions. In the example above, we still require four iterations before we get any usable content. It doesn&#8217;t matter if we&#8217;ve written some complex ETL to load a fact table if the end user doesn&#8217;t have a working dashboard to go along with it.</p>
<p>To apply the Agile Manifesto to data warehouse delivery, the following key elements are required for us to deliver with a true Agile spirit:</p>
<ol>
<li>User stories instead of requirements documents: a user asks for particular content through a narrative process, and includes in that story whatever process they currently use to generate that content.</li>
<li>Time-boxed iterations: iterations always have a standard length, and we choose one or more user stories to complete in that iteration.</li>
<li>Rework is part of the game: there aren&#8217;t any missed requirements&#8230; only those that haven&#8217;t been addressed yet.</li>
</ol>
<p>I&#8217;ve been conscious not to prescribe any distinct Agile methodology, though I can&#8217;t help using more Scrum-like concepts in this formulation. However, I think this list is generic enough to apply to most methodologies. Over the next few posts, I&#8217;ll discuss the necessary puzzle pieces to engage in Extreme BI, as well as how we might implement new subject area content in a single iteration. Additionally, I&#8217;ll discuss how these implementations might be reworked, or &#8220;refactored&#8221;, over several iterations to produce data warehouses that respond to user stories: what users want and when they want it.</p>
<p><strong>Follow-up Posts</strong></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces" href="http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/" target="_blank">Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces</a></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration" href="http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/">Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration</a></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: ETL Iteration" href="http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/" target="_blank">Agile Data Warehousing with Exadata and OBIEE: ETL Iteration</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>More Notes on Right-Time BI</title>
		<link>http://www.rittmanmead.com/2011/11/more-notes-on-right-time-bi/</link>
		<comments>http://www.rittmanmead.com/2011/11/more-notes-on-right-time-bi/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 15:02:23 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Warehousing]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9066</guid>
		<description><![CDATA[Over the past couple of years Stewart Bryson and I have been looking into things &#8220;right time&#8221; (or is that realtime?). It is great to have him around to trade ideas (and graphics for presentations!). Most of what we have been discussing has been about &#8220;traditional reporting&#8221;, either with or without a data warehouse, and [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past couple of years Stewart Bryson and I have been looking into things &#8220;right time&#8221; (or is that realtime?). It is great to have him around to trade ideas (and graphics for presentations!). Most of what we have been discussing has been about &#8220;traditional reporting&#8221;, either with or without a data warehouse, and definitely in the realms of &#8220;how well have we done&#8221;. However, that is not the sole use case for right time BI.</p>
<p>I have long felt that BI is only done for one of three reasons &#8211; the law says we must report things, it saves us money, or it makes us money; so if knowing something sooner gives us a competitive advantage then surely that is a good thing. Knowing sooner is not enough though; it is also about being able to act on the information to facilitate a change in the organization that enhances return (or lowers costs). To my mind we are moving from the traditional &#8220;let&#8217;s look at this in aggregate&#8221; stance to a world where we ask &#8220;what is the significance of this newly observed fact&#8221;. This type of analysis requires a body of data to create a reference model and access to smart statistical tools to allow us to make judgments based on probabilities. Making such decisions based on dynamic events is not just for stock markets and bankers; the same principles apply in many sectors. I know of some restaurant chains that have investigated using centrally monitored sales across all outlets to dynamically adjust staff levels based on likely demand &#8211; staff are sent home, brought in, or moved between outlets based on a predictive model that uses past trading patterns across many outlets.</p>
<p>As usual, most of the building blocks we need to do this are available to us; we just need a bit of creativity to join them together into an architecture. For this kind of use I feel that messaging should be core to the data capture &#8211; we want to look at single items of &#8220;fact&#8221; and do some statistical analysis on them before adding them to the data warehouse (or whatever form our data repository takes) so that the new fact can become part of the base data set we use to analyze the next fact to arrive. Micro-batch loading of log-based change data is probably less suited here as we are:</p>
<ol>
<li>adding to the latency by using discrete loads at fixed intervals, and</li>
<li>processing many items at a time, which complicates the statistical analysis and alerting phases (after all, if we get 2453 credit card transactions in a batch, only a few will be potentially fraudulent).</li>
</ol>
<p>After we capture a message from the source system we can pass the information through a chain of processes to analyze the information, propagate alerts based on the statistical significance of the item and add the data to the data store so that it becomes part of the knowledge. This last stage of adding the message to the data will probably need to be in a micro-batch mode rather than one-row-at-a-time-as-it-arrives &#8211; the latencies of adding fact to a conformed OLAP system (database, cubes, whatever) are such that single row additions will just take too much time, even if our target is an in-memory system. Here the art of the designer is to balance the availability of data, the time to reprocess the OLAP structures, and the desire to keep the system up to date. It is always worth noting that for many data domains having to-the-second data is not that important, as any new rows are unlikely to change the statistical results; however, some subject domains will need access to all of the recent information, including that which has not yet made it to the data warehouse, and here the creativity comes in.</p>
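<p>To make the &#8220;analyze, then alert&#8221; step a little more concrete, here is a minimal sketch of one possible significance test: compare each newly arrived transaction against the mean and standard deviation of that card&#8217;s history. Everything in it is an assumption for illustration: the tables CARD_TXN_HISTORY and INCOMING_TXN, their columns, and the three-standard-deviation threshold.</p>
<pre>-- Hypothetical tables: INCOMING_TXN holds newly captured messages,
-- CARD_TXN_HISTORY is the reference data already in the data store.
SELECT i.txn_id, i.card_id, i.amount, h.avg_amount, h.sd_amount
  FROM incoming_txn i
  JOIN (SELECT card_id,
               AVG(amount)    AS avg_amount,
               STDDEV(amount) AS sd_amount
          FROM card_txn_history
         GROUP BY card_id) h
    ON h.card_id = i.card_id
 WHERE h.sd_amount &gt; 0
   -- flag amounts more than three standard deviations from the card's mean
   AND ABS(i.amount - h.avg_amount) &gt; 3 * h.sd_amount;</pre>
<p>A real reference model would likely be richer than a single mean and deviation, but the shape is the same: the history provides the baseline, and the new fact is judged against it before it joins that history.</p>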
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/11/more-notes-on-right-time-bi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oracle Warehouse Builder and Data Integrator</title>
		<link>http://www.rittmanmead.com/2011/10/oracle-warehouse-builder-and-data-integrator/</link>
		<comments>http://www.rittmanmead.com/2011/10/oracle-warehouse-builder-and-data-integrator/#comments</comments>
		<pubDate>Sun, 30 Oct 2011 13:22:29 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9043</guid>
		<description><![CDATA[Sometimes, when I am working with customers on data warehousing projects I am asked questions about Oracle Warehouse Builder and its future. I know no more on this than what I read in Oracle&#8217;s reposted a statement of direction from May 2011 and recent internet postings elsewhere which states that OWB 11gR2 will be the [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, when I am working with customers on data warehousing projects, I am asked questions about Oracle Warehouse Builder and its future. I know no more on this than what I read in Oracle&#8217;s <a title="Statement of Direction" href="http://www.oracle.com/technetwork/middleware/data-integrator/overview/sod-1-134268.pdf" target="_blank">reposted</a> statement of direction from May 2011 and recent internet postings elsewhere, which state that OWB 11gR2 will be the final release, although it will be patched to work with the Oracle 12 database when that comes along. To me this means that for existing OWB projects there is no hurry to migrate to ODI &#8211; Oracle have signaled in their statement of direction that a future ODI release will help smooth the migration path. However, I think that for new projects ODI should be considered as first choice &#8211; unless you only require the basic OWB functionality that is included with the Oracle database&#8217;s license, and even then I would be tempted to look at the advantages of using the enterprise-quality features you gain with the purchase of ODI.</p>
<p>One question that often comes up is &#8220;How is OWB different from ODI, after all they both do E-LT?&#8221; I have written a small series of blogs to be published over the next few months that look at this subject from the point of view of an OWB developer moving to ODI.</p>
<p>To start things off, here is the first of the series, where I look at OWB and ODI in high-level terms and point out some of the key differences and similarities. I will be considering the two current releases (OWB 11.2 and ODI 11.1.1.5). Later blogs will look in more detail at the actual development of ETL processes and how to orchestrate them.</p>
<p>Both ODI and OWB have a similar (I am being very simplistic here) three-component design of: a metadata repository, a development environment where the developer defines the processes and data flows and a runtime component that executes the code and flows. It is the &#8220;how&#8221; of these things that is different for the two tools.</p>
<p>Both are repository driven; that is, the metadata that describes the ELT processes, the data structures being accessed and a host of other things is held in a database schema. For OWB the repository is pre-installed (the user needs to create a workspace though) in an Oracle 11gR2 database; optionally, the OWB repository can be installed into another Oracle database if required. ODI&#8217;s repository is installed using Oracle Fusion Middleware&#8217;s Repository Creation Utility into a supported (and not necessarily Oracle) database. With ODI, the repository can be shared with other components that use the Fusion Middleware stack such as OBIEE 11g; whether this is desirable would depend on your circumstances and factors such as your organization&#8217;s software release process and network topology &#8211; just because it is possible to have all on one database does not make it desirable.</p>
<p>Cosmetically, there is a lot of similarity between the two development environments; they are both part of the same unified family of Java IDE applications as JDeveloper and SQLDeveloper. The look and feel is similar: for example, double-clicking on a tab has the same effect (it toggles the tab&#8217;s panel between full-sized and windowed). What is different, however, is the content of the windows and navigators, and that is a big topic for later postings. In practice, with OWB the key parts of the IDE are those for the development of MAPPINGS and (optionally) the design of process flows to orchestrate mappings. In the ODI world think INTERFACES for mappings and PACKAGES for process flows. This is simplistic though, as ODI also has PROCEDURES (code developed in one of the ODI supported languages) and LOAD PLANS (multiple packages orchestrated to execute in serial or parallel). OWB mappings require the developer to include all of the components needed to facilitate the mapping &#8211; we connect source columns to target columns through a logic flow of joiners, filters, expressions, aggregates and a whole palette of other activities. Typically, this would generate a single, but large, SQL statement with much use of in-line views. ODI interfaces are simply about connecting source columns to target columns in a logical relationship (we also create expressions, joins and filters here) and allowing the physical implementation to be supplied by a knowledge module.</p>
<p>In its most common usage mode, OWB deploys its executable code into PL/SQL packages in the target database. Even pure SQL set-based insert code is wrapped into a package that contains the control and audit methods that allow it to execute under the control of the Control Center and the OWB runtime. The code generated by ODI depends on the knowledge modules used and might be native SQL which is executed directly against the target database by the Java agent executing the code. Again this is a big topic and more will follow in later blogs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/10/oracle-warehouse-builder-and-data-integrator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Real-time BI: EDW with a Real-time Component</title>
		<link>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/</link>
		<comments>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 20:46:55 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8630</guid>
		<description><![CDATA[I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and [...]]]></description>
			<content:encoded><![CDATA[<p>I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and here goes with the conclusion.</p>
<p>I laid out the general vocabulary and considerations for Real-time BI in <a href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/">my first post</a> in this series, and then followed up with how to implement Real-time BI using <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">a federated approach</a> that relies on the metadata capabilities of OBIEE to blend two different environments into one. Now I&#8217;d like to discuss how we might implement a Real-time solution by relying on ETL instead of BI Tool metadata. I call this EDW with a Real-Time Component.</p>
<p>Whereas the Federated OLTP/EDW Reporting option provides us an option to layer real-time data into an otherwise classic batch-loaded EDW, delivering the EDW with a Real-Time Component requires designing an EDW from the ground up that supports real-time reporting. Specifically, we have to design our fact tables to support what Ralph Kimball calls the “real-time partition” in his book <em>The Kimball Group Reader</em>: “To achieve real-time reporting, we build a special partition that is physically and administratively separated from the conventional static data warehouse tables. Actually, the name partition is a little misleading. The real-time partition may be a separate table, subject to special rules for update and query.” We construct a separate section for each of our fact tables to facilitate the following 4 requirements, as defined by Kimball:</p>
<ol>
<li>Contain all activity since the last time the load was run</li>
<li>Link seamlessly to the grain of the static data warehouse tables</li>
<li>Be indexed so lightly that incoming data can “dribble in”</li>
<li>Support highly responsive queries</li>
</ol>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/real-time-partition.png" border="0" alt="Real time partition" width="600" height="375" /></p>
<p>So we modify our model to support the interaction of real-time and static data, but we also modify our ETL to support this. In fact, to construct an EDW with a Real-Time Component, we have to build some very intricate interaction between the database, the data model and ETL processes. The static fact table is partitioned on a date data-type using standard Oracle partitioning strategies. The real-time partition is structured in such a way as to be loadable throughout the day. In other words, there are no indexes or constraints enabled on the table. ETL against the real-time partition uses a process comparable to traditional load scenarios, but using micro-batch instead, running as often as 100 times a day or more. Alternative methods include transactional style, record-by-record loading, possibly using web services or a message-based system such as JMS queues.</p>
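<p>As a minimal sketch of what such a real-time partition might look like, the table below borrows its column list from the SALES_FACT example later in this post and simply leaves off the NOT NULL constraints, the indexes and the partitioning clause. The name SALES_FACT_RT matches the one used further down; treating it as a plain heap table is my assumption about the simplest workable form.</p>
<pre>-- Same columns as SALES_FACT, but deliberately bare:
-- no constraints, no indexes, no partitioning, so rows can "dribble in".
CREATE TABLE sales_fact_rt
       (
         customer_key           NUMBER,
         product_key            NUMBER,
         staff_key              NUMBER,
         store_key              NUMBER,
         sales_date_key         DATE,
         trans_id               NUMBER,
         trans_line_id          NUMBER,
         sales_date             DATE,
         unit_price             NUMBER,
         quantity               NUMBER,
         amount                 NUMBER
       )
/</pre>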
<p>We effectively want to build a single logical fact table out of the combination of the static EDW fact table and the real-time fact partition. There are several ways to do this. We could use OBIEE fragmentation for this, as we saw in the <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">last post.</a> This would work, but it&#8217;s not what I recommend. The reason we used fragmentation in the last post is because we were joining two completely different data sets across conformed dimensions into a unified model. However, with the real-time partition, we have two tables that have exactly the same structure—both using the same surrogate keys to the same dimension tables—just separated across different segments for performance reasons. In this case, I choose to UNION the two datasets with either a database view, or an opaque view in OBIEE.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/opaque-union-view.png" border="0" alt="Opaque union view" width="542" height="553" /></p>
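<p>The database-view flavor of this might look roughly like the sketch below. The view name SALES_FACT_V is made up, and the column list is taken from the SALES_FACT definition later in this post. I have used UNION ALL rather than a plain UNION on the assumption that the two segments never hold the same row, which avoids an unnecessary de-duplicating sort.</p>
<pre>-- Hypothetical view name; presents both segments as one logical fact table.
CREATE OR REPLACE VIEW sales_fact_v
AS
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
  FROM sales_fact
 UNION ALL
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
  FROM sales_fact_rt
/</pre>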
<p>This works because we no longer have to control which source the rows will come from in particular situations: we simply pull all the rows and use standard WHERE filters to limit the rows where applicable. Just as the BI Server pruned unneeded sources for us in the last post, the Oracle Database will do the pruning for us in this case. We can, however, still present the static fact tables in situations that merit it: I&#8217;m thinking of financial reports here. Accountants don&#8217;t usually like their reports giving different results every time they run them.</p>
<p>We have one issue with the load of the real-time partition: we are assuming that we receive all of our dimension data right along with our fact data in clean CDC subscription groups. That would likely be the case if we were pulling all the data for our data warehouse from a single source-system, but with enterprise data warehouses, that is rarely the case. Receiving dimension data early causes no problems with our load scenario; it doesn’t matter if we do the surrogate key lookup for the fact table load hours or days later than the dimensions. Receiving the fact table data early does present us with ETL logic issues: the correct dimension record may or may not be there when it’s time to load the facts.</p>
<p>There is a simple strategy to handle early-arriving facts. In our ETL, we implement a process to ensure that our facts are at least reportable intra-day:</p>
<ol>
<li>If a dimension record exists for the current business or natural key we are interested in, then grab the latest record. This is the best we can do at this point, and will usually be the correct value.</li>
<li>If no dimension record exists yet for the current natural key, then use a default record type equating to “Not Known Yet.” Though it’s not sexy for intra-day reporting, it at least makes the data available across the dimensions we do know about.</li>
<li>As we approach the end of the day and prepare to “close the books” for the current day, we should have run all dimension loads—even late arriving dimensions—so that our dimension tables are all up to date. At this point we run a corrective mapping to update all the fact records in the real-time partition with the right surrogate keys. This would likely be a MERGE statement, or a TRUNCATE/INSERT style mapping. From a performance perspective, my bet is on the latter.</li>
</ol>
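<p>The first two steps can be sketched in plain SQL as an outer join with a default key. This is only an illustration under stated assumptions: SALES_CDC is a made-up name for the incoming change data, CUSTOMER_ID and CURRENT_IND are assumed columns for the natural key and the &#8220;latest record&#8221; flag, -1 is an assumed surrogate key for the &#8220;Not Known Yet&#8221; row, and the other dimension keys are treated as already resolved for brevity.</p>
<pre>-- Hypothetical micro-batch load into the real-time partition.
INSERT INTO sales_fact_rt
       (customer_key, product_key, staff_key, store_key, sales_date_key,
        trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
SELECT NVL(c.customer_key, -1),     -- step 2: default to "Not Known Yet"
       s.product_key, s.staff_key, s.store_key,
       TRUNC(s.sales_date),         -- assumed derivation of the date key
       s.trans_id, s.trans_line_id, s.sales_date,
       s.unit_price, s.quantity, s.amount
  FROM sales_cdc s
  LEFT OUTER JOIN customer_dim c
    ON c.customer_id = s.customer_id
   AND c.current_ind = 'Y'          -- step 1: grab the latest record
/</pre>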
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1.png"><img class="size-large wp-image-8631 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1-1024x354.png" alt="" width="737" height="255" /></a></p>
<p>The above mapping loads the real-time partition in a micro-batch style doing an outer join to the CUSTOMER_DIM table and writing the &#8220;Not Known Yet&#8221; row in case a customer is not found. Also, I am employing a Splitter Operator in OWB, but I tricked it out to force it to load all rows to BOTH tables: SALES_FACT_RT and SALES_STG_RT. The reason for this is that we don&#8217;t write dimension natural keys into our fact tables, though I&#8217;ve seen that technique employed in some real-time implementations. So when it&#8217;s time to run our corrective mapping to correct our fact table data, we just join the SALES_STG_RT table to the now-correct dimension tables and produce the right surrogate keys for each fact record, and load the results into SALES_FACT_RT.</p>
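<p>A TRUNCATE/INSERT version of that corrective step might look something like the following. It assumes SALES_STG_RT carries the customer natural key in a column called CUSTOMER_ID alongside the fact columns, that CUSTOMER_DIM flags its latest row with CURRENT_IND = 'Y', and that -1 is the &#8220;Not Known Yet&#8221; key; none of those names come from the mapping itself, and only the customer lookup is shown.</p>
<pre>-- Hypothetical end-of-day correction: rebuild the real-time partition
-- from staging now that the dimension loads have caught up.
TRUNCATE TABLE sales_fact_rt
/

INSERT /*+ APPEND */ INTO sales_fact_rt
       (customer_key, product_key, staff_key, store_key, sales_date_key,
        trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
SELECT NVL(c.customer_key, -1),     -- should rarely fall back by now
       g.product_key, g.staff_key, g.store_key, g.sales_date_key,
       g.trans_id, g.trans_line_id, g.sales_date,
       g.unit_price, g.quantity, g.amount
  FROM sales_stg_rt g
  LEFT OUTER JOIN customer_dim c
    ON c.customer_id = g.customer_id
   AND c.current_ind = 'Y'
/

COMMIT
/</pre>
<p>With slowly changing dimensions that track history, the join would more likely pick the dimension row in effect on the sale date rather than the current one; the flag is used here only to keep the sketch short.</p>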
<p>When “closing the books” on the day, we build indexes and constraints on the real-time partition that match those on the partitioned fact table. Once this step is complete, we can then use a partition-exchange operation to make the real-time partition part of the static fact table. In Oracle, this is a fast dictionary update, and occurs almost instantaneously.<br />
Obviously, our partitioning choice for the fact table will determine exactly how this partition-exchange will occur. If we’ll agree to partition the fact table by DAY, then we can use Oracle Interval partitioning, available in Oracle 11gR1 and beyond. We have to make this concession because interval-partitioned tables cannot have partitions in the same table that contain different range-based boundaries. For instance, we can’t have some MONTH-based partitions, while also having some DAY-based partitions, as we can with regular range-based partitioning. Using Interval partitioning is the easiest method, however, because it requires the least amount of partition maintenance as part of the load. For instance, consider the SALES_FACT table listed below, using Interval partitioning on the SALES_DATE_KEY, which we partition on at the DAY grain:</p>
<pre>CREATE TABLE sales_fact
       (
         customer_key           NUMBER           NOT NULL,
         product_key            NUMBER           NOT NULL,
         staff_key              NUMBER           NOT NULL,
         store_key              NUMBER           NOT NULL,
         sales_date_key         DATE             NOT NULL,
         trans_id               NUMBER,
         trans_line_id          NUMBER,
         sales_date             DATE,
         unit_price             NUMBER,
         quantity               NUMBER,
         amount                 NUMBER
       )
       partition BY range (sales_date_key)
       interval (numtodsinterval(1,'DAY'))
       (
         partition sales_fact_2006 VALUES less than (to_date('2007-01-01','YYYY-MM-DD'))
       )
       COMPRESS
/</pre>
<p>Each time we load a record into SALES_FACT for which no partition currently exists, Oracle will spawn one for the table. But based on our real-time requirements, we will use a partition-exchange operation every day to close the books on the current day’s processing, so each day we will need to spawn a clean, new partition to facilitate that partition-exchange. All we need to do to make this happen is issue an insert statement with a DATE value for the partitioning key that equates to TRUNC(SYSDATE). For instance, the following statement would generate a new partition that we can use for the exchange:</p>
<pre>SQL&gt; INSERT INTO gcbc_edw.sales_fact
  2         (
  3           customer_key,
  4           product_key,
  5           staff_key,
  6           store_key,
  7           sales_date_key,
  8           trans_id,
  9           trans_line_id,
 10           sales_date,
 11           unit_price,
 12           quantity,
 13           amount)
 14         VALUES
 15         (
 16           -1,
 17           -1,
 18           -1,
 19           -1,
 20           trunc(SYSDATE),
 21           -1,
 22           -1,
 23           SYSDATE,
 24           0,
 25           0,
 26           0
 27         )
 28  /

1 row created.

Elapsed: 00:00:00.01
SQL&gt;</pre>
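<p>To confirm that the insert really did spawn a partition, one option is to query the data dictionary. The sketch below uses the standard ALL_TAB_PARTITIONS view and assumes the table is owned by GCBC_EDW, as in the statement above; partitions that Oracle created automatically show INTERVAL = 'YES' and carry system-generated names:</p>
<pre>-- List the partitions of SALES_FACT in order, flagging the auto-created ones
SELECT partition_name,
       high_value,
       interval
  FROM all_tab_partitions
 WHERE table_owner = 'GCBC_EDW'
   AND table_name  = 'SALES_FACT'
 ORDER BY partition_position
/</pre>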
<p>Once the insert has created our new SYSDATE-based partition, we can exchange the real-time partition in for this new partition. We can use the new PARTITION FOR clause, which lets us reference a partition by a partition key value instead of by name, with a slight caveat. Though we can’t use SYSDATE explicitly in the DDL statement, we can reference it implicitly by concatenating its value into a dynamic SQL statement:</p>
<pre>SQL&gt; DECLARE
  2     l_date DATE := SYSDATE;
  3     l_sql  LONG;
  4  BEGIN
  5     l_sql :=   q'|alter table gcbc_edw.sales_fact exchange partition|'
  6             || chr(10)
  7             || q'|for ('|'
  8             || l_date
  9             || q'|') with table gcbc_edw.sales_fact_rt|';
 10
 11     dbms_output.put_line( l_sql );
 12     EXECUTE IMMEDIATE( l_sql );
 13  END;
 14  /

alter table gcbc_edw.sales_fact exchange partition
for ('03/01/2011 09:38:33 PM') with table gcbc_edw.sales_fact_rt

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL&gt;</pre>
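<p>One thing to watch with the block above is that concatenating a DATE into a string relies on the session’s NLS_DATE_FORMAT, both when the string is built and when Oracle converts it back. A variation that spells out the format explicitly might look like the following; treat it as a sketch under the same schema and table names, not as the procedure used in the process flow:</p>
<pre>DECLARE
   -- Format the date explicitly so the DDL does not depend on NLS settings
   l_day VARCHAR2(10) := to_char(SYSDATE, 'YYYY-MM-DD');
   l_sql VARCHAR2(4000);
BEGIN
   l_sql :=   'alter table gcbc_edw.sales_fact exchange partition'
           || chr(10)
           || q'|for (to_date('|' || l_day || q'|','YYYY-MM-DD'))|'
           || chr(10)
           || 'with table gcbc_edw.sales_fact_rt';

   dbms_output.put_line( l_sql );
   EXECUTE IMMEDIATE l_sql;
END;
/</pre>
<p>Because the fact table is partitioned at the DAY grain, any value that falls within the current day identifies the same partition, so dropping the time component does not change which partition is exchanged.</p>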
<p>Using the preferred Interval partitioning option, the final “close the books” process flow is shown below. The first step is to run any late-arriving dimension mappings, in this example the MAP_CUSTOMER_DIM mapping. Once all the dimensions are up to date, we can run the process that corrects all the dimension keys in the real-time partition. Remember, the real-time partition contains small data sets, so updating these records should not be resource intensive. In this scenario, the mapping MAP_CORRECT_SALES_FACT_RT issues an Oracle MERGE statement, but it is quite likely that a TRUNCATE/INSERT statement would work just as well. Once all the data in the real-time partition is correct and ready to go, we run the MAP_CREATE_PARTITION mapping, which uses an insert statement to spawn a new partition, and then the EXCHANGE_PARTITION PL/SQL procedure builds indexes and constraints and completes the process by issuing the partition-exchange statement.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/corrective-process-flow1.png" border="0" alt="Corrective process flow" width="545" height="275" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>ODI 11g New Mapping and Interface Features &#8211; Part 3 &#8211; OBIEE Lineage</title>
		<link>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-3-obiee-lineage/</link>
		<comments>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-3-obiee-lineage/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 04:38:14 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8453</guid>
		<description><![CDATA[In the first two parts of this mini-series, I looked at the new mapping and interface features with ODI 11.1.1.3, and yesterday I took a look at Load Plans, a new feature that&#8217;s come with the 11.1.1.5 release of Oracle Data Integrator. Today, to finish off the series, I&#8217;ll be taking a look at something that [...]]]></description>
			<content:encoded><![CDATA[<p>In the first two parts of this mini-series, I looked at the <a href="http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-1/">new mapping and interface features with ODI 11.1.1.3</a>, and yesterday I took a look at <a href="http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-2-load-plans/">Load Plans</a>, a new feature that&#8217;s come with the 11.1.1.5 release of Oracle Data Integrator. Today, to finish off the series, I&#8217;ll be taking a look at something that intersects both ODI and OBIEE: <a href="http://download.oracle.com/docs/cd/E21764_01/integrate.1111/e12644/biee_lineage.htm#insertedID0">&#8220;Oracle Business Intelligence Data Lineage&#8221;</a>, a new feature that&#8217;s also just shipped with ODI 11.1.1.5.</p>
<p>Something that&#8217;s always been a weakness with Oracle&#8217;s BI&amp;DW tools (and indeed, with most vendors&#8217; tools) is the multiple, fragmented repositories that they use. When you create data objects and data mappings in ODI, there&#8217;s no in-built link between those and the repository objects you create with the OBIEE Administration tool, and it&#8217;s also tricky to trace data elements from analyses and dashboards through to the OBIEE repository, let alone the ETL tool and the underlying data source. The Oracle Business Intelligence Data Lineage feature with ODI 11.1.1.5 looks to address this by linking the ODI and OBIEE repositories and allowing you to query lineage and impact across both sources.</p>
<p>To install the OBIEE Data Lineage feature, you need to make sure you&#8217;re on ODI 11.1.1.5 and then download the ODI Companion CD along with ODI itself. Once you&#8217;ve done this, you run a Data Lineage Wizard that ships as part of the Companion CD, and you&#8217;re given three options to work with.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-17.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-17.png" border="0" alt="Sshot 17" width="600" height="455" /></p>
<p>You can set this up for either OBIEE 10g or 11g, but the install is a bit simpler and more automated with 11g. To set things up, select <strong>Install Lineage in OBIEE Server</strong>, make sure the BI Server and Presentation Server are both stopped, and pick an RPD for the wizard to merge the physical, logical and subject area contents into.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-18.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-18.png" border="0" alt="Sshot 18" width="600" height="455" /></p>
<p>The wizard installs objects in your repository (RPD), together with analyses in your presentation catalog. Once it&#8217;s run, you can take a look at what&#8217;s been created.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-26.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-26.png" border="0" alt="Sshot 26" width="600" height="390" /></p>
<p>There are a couple of steps to change the supplied connection pool details to point to your combined ODI and master repositories (this works best if you&#8217;ve used the RCU to create your ODI repositories). Once that&#8217;s done and you&#8217;ve checked the connections, you&#8217;ve got a set of tables to hold the combined ODI and OBIEE lineage data, which then need to be initially loaded using the wizard.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-22.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-22.png" border="0" alt="Sshot 22" width="600" height="455" /></p>
<p>Selecting the <strong>Export Metadata from OBIEE and Refresh Lineage</strong> option copies repository information from your RPD file, and analyses and dashboard information from your presentation catalog, into a set of new tables in your ODI repository. You need to run this step every time your OBIEE or ODI repository information changes, and there are supplied scripts that you can use to automate the process.</p>
<p>Once you&#8217;ve supplied the connection details to your OBIEE repository and catalog, you then match up the ODI models in your ODI repository with the physical databases in your OBIEE repository, so that the lineage links can be created.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-31.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-311.png" border="0" alt="Sshot 31" width="600" height="455" /></p>
<p>Once it&#8217;s all run and set up, you can then start querying your data lineage using the supplied analyses, or create your own. In the example below, I&#8217;ve selected a subject area table from the OBIEE repository, and the analysis initially shows me the columns that it contains.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-34.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-34.png" border="0" alt="Sshot 34" width="600" height="301" /></p>
<p>Pressing the <strong>Lineage</strong> button shows me some more information about a selected subject area column, including where it came from in the source database.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-35.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-35.png" border="0" alt="Sshot 35" width="600" height="148" /></p>
<p>A &#8220;Runtime Stats&#8221; analysis shows me the history of agent executions within my repository, allowing me to analyze past runs, identify long-running processes and spot where errors and warnings are getting raised.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-36.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-36.png" border="0" alt="Sshot 36" width="600" height="304" /></p>
<p>You can also add contextual links to analyses in your catalog, to allow users to display details on where the data items came from in their report. For example, in the dashboard below, I&#8217;ve added a link as a text item under the analysis, which the user can then click on to display lineage data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-37.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-37.png" border="0" alt="Sshot 37" width="600" height="517" /></p>
<p>Clicking on the link, which is actually some embedded HTML in a text item added to the dashboard, calls another report and passes across the name of the request we&#8217;re interested in, giving us a contextual lineage report.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-38.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-38.png" border="0" alt="Sshot 38" width="600" height="301" /></p>
<p>From that point on, I can drill down further through the business model and mapping columns and the physical layer columns, through to the ODI repository objects that were used to populate the request data.</p>
<p>So there we are, the last of the new ODI 11.1.1.5 features that I wanted to cover. Take a look through the <a href="http://www.oracle.com/technetwork/middleware/data-integrator/overview/odi-11115-newfeatures-overview-wp-394851.pdf">ODI 11.1.1.5 New Features PDF</a> if you&#8217;re interested in hearing what else is new with this release.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-3-obiee-lineage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ODI 11g New Mapping and Interface Features &#8211; Part 2 &#8211; Load Plans</title>
		<link>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-2-load-plans/</link>
		<comments>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-2-load-plans/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 06:09:57 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8439</guid>
		<description><![CDATA[Yesterday I posted on the blog an overview of some of the new interface (mapping) features in the original, 11.1.1.3 release of Oracle Data Integrator 11g. In today&#8217;s posting, I&#8217;m going to look at the first of two new features introduced with ODI 11.1.1.5 that are particularly interesting: Load Plans. The 11.1.1.5 release of [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I posted on the blog an overview of some of the <a href="http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-1/">new interface (mapping) features in the original, 11.1.1.3 release of Oracle Data Integrator 11g</a>. In today&#8217;s posting, I&#8217;m going to look at the first of two new features introduced with ODI 11.1.1.5 that are particularly interesting: Load Plans.</p>
<p>The 11.1.1.5 release of ODI is <a href="http://www.oracle.com/technetwork/middleware/data-integrator/downloads/index.html">available for download on OTN</a> and Edelivery, and you can either do a fresh install, or upgrade from either the 10g or 11.1.1.3 version of ODI. The 10g to 11g upgrade process, like OBIEE, is carried out using the Upgrade Assistant, whilst the 11.1.1.3 to 11.1.1.5 upgrade, again like OBIEE, is the awful &#8220;Fusion Middleware Patching&#8221; process that involves running lots of scripts, upgrading repositories and so on. Once you&#8217;re there though, one of the first major new features you&#8217;ll see in 11.1.1.5 is Load Plans.</p>
<p>As a quick recap, ETL routines using ODI are made up of Interfaces, which are typically organized into packages. You can either create packages, like this, where you drag individual interfaces onto the canvas, and each interface is then executed serially, one after the other as one session:</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-10.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-10.png" border="0" alt="Sshot 10" width="600" height="290" /></p>
<p>Or you can compile the individual interfaces into &#8220;scenarios&#8221;, and then use the scenarios in a package instead.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-11.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-111.png" border="0" alt="Sshot 11" width="600" height="215" /></p>
<p>The advantage of using scenarios rather than interfaces in a package, apart from the fact you&#8217;re working with a fixed, compiled version of the interface that in theory won&#8217;t change from run to run, is that each scenario can be run in its own session and execute asynchronously, with an ODI tool at the end that checks their statuses and fails over if one of the child sessions errors.</p>
<p>In fact running interfaces and packages in parallel, something fairly common in large ETL routines, was something fairly tricky in ODI before 11.1.1.5. Typically you&#8217;d use the compiled scenarios trick to get things running asynchronously, and you&#8217;d also set up and run several agents each of which handled a certain number of sessions before load-balancing off to another agent. Oracle did try and tackle this problem, along with sequencing and combining of sets of interfaces, with the 7.9.5.2 release of the BI Apps which used ODI, along with a tool called the Configuration Manager, to create load routines out of subject areas each of which had dependencies, run orders and so on. It wasn&#8217;t as powerful as the DAC, but combined with packages it went some of the way to meeting the BI Apps requirements, albeit in a pretty inflexible way.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="NewImage.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/NewImage18.png" border="0" alt="NewImage" width="500" height="300" /></p>
<p>ODI 11.1.1.5 introduces a new concept to handle this requirement, called <a href="http://download.oracle.com/docs/cd/E21764_01/integrate.1111/e12643/loadplans.htm#BABCCGIA">Load Plans</a>. Load Plans are executable objects that you create in your work repository, can be compiled themselves and imported into an Execution Work Repository, can run from the command-line, and are a way of adding nesting, sequencing, parallelism, error handling and restarting to your ODI project.</p>
<p>Going on the same example as before, where we first load some staging data and then insert/update into some warehouse tables, a typical ODI 11.1.1.5 Load Plan would look like this:</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-12.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-121.png" border="0" alt="Sshot 12" width="600" height="375" /></p>
<p>When you create a load plan, you add individual steps, each of which can be a Serial step, a Parallel step, a step to run an individual Scenario, or a Case step for conditional logic.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-13.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-13.png" border="0" alt="Sshot 13" width="437" height="219" /></p>
<p>In the example above, we have a serial step at the start to refresh some project variables, then we load in parallel the staging tables used to feed the dimensions. Later on, we have a CASE step that checks a project variable and then only executes the rest of the plan if DW_REFRESH=1.</p>
<p><a href="http://www.rittmanmead.com/2010/02/data-warehouse-fault-tolerance-part-2-restarting/">Something that will interest Stewart Bryson</a> is the ability to define resume/restart settings for individual steps and groups of steps. The default action for an interface scenario failing is for the whole load plan to fail, but you can define &#8220;exception&#8221; procedures and other steps to, for example, flash back a table or set of tables, or run some kind of clean-up routine.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-14.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-14.png" border="0" alt="Sshot 14" width="600" height="204" /></p>
<p>These exceptions can then be associated with steps and groupings, so that the exception runs after a step fails, and the load plan then either re-runs that step, restarts from the parent step, or does the same for the whole load plan.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-15.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-15.png" border="0" alt="Sshot 15" width="600" height="374" /></p>
<p>Presumably this is the sort of new feature that&#8217;s making its way into ODI because of requirements coming from the BI Apps, but it should be useful for most developers working with ODI. It doesn&#8217;t replace the concept of packages &#8211; packages are typically &#8220;transaction-style&#8221; ETL routines that don&#8217;t get edited after deployment and can also take advantage of loops, which load plans don&#8217;t have (but then again, load plans have conditional execution using CASE &#8230; WHEN &#8230; ELSE). Uli Bethke, who uses ODI extensively, came up with some <a href="http://www.business-intelligence-quotient.com/?p=1284">initial limitations that he found with load plans</a> which are worth taking a look at, and this posting by FX from the ODI team <a href="http://blogs.oracle.com/dataintegration/entry/what_s_new_with_oracle">goes into the feature in a bit more detail</a>.</p>
<p>To wrap-up this mini-series, tomorrow I&#8217;ll take a look at a new ODI 11.1.1.5 feature that&#8217;ll also be interesting to OBIEE developers &#8211; Oracle Business Intelligence Enterprise Edition Data Lineage.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-2-load-plans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ODI 11g New Mapping and Interface Features &#8211; Part 1</title>
		<link>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-1/</link>
		<comments>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-1/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 16:55:09 +0000</pubDate>
		<dc:creator>Mark Rittman</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Oracle Data Integrator]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8428</guid>
		<description><![CDATA[About a year or so ago I posted an entry on this blog highlighting the new architectural features of Oracle Data Integrator 11g. At the time, I talked about the new user interface, integration with WebLogic Server and Enterprise Manager, and how the Repository Creation Utility could be used to create the master and work [...]]]></description>
			<content:encoded><![CDATA[<p>About a year or so ago I posted an entry on this blog highlighting the <a href="http://www.rittmanmead.com/2010/09/odi-11g-now-available-for-download/">new architectural features of Oracle Data Integrator 11g</a>. At the time, I talked about the new user interface, integration with WebLogic Server and Enterprise Manager, and how the Repository Creation Utility could be used to create the master and work repositories. As well as these high-level new features though, there were a number of more detail-level features that could prove useful for developers creating data integration routines with the tool.</p>
<p>One such new feature is the ability to join together two or more data sets using set operators, such as UNION, INTERSECT or MINUS. To take an example, consider a situation where you have some order data sitting in a table that you want to load into a staging area, like this:</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-3.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-32.png" border="0" alt="Sshot 3" width="516" height="419" /></p>
<p>Now imagine that you had some additional orders data, but this time it was sitting in a file, containing the same columns but obviously stored separately to the table data. You could read from the two sources over two separate interfaces, or in ODI 11g you can press the <strong>Add/Remove Dataset&#8230;</strong> button, like this:</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-4.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-41.png" border="0" alt="Sshot 4" width="517" height="167" /></p>
<p>Pressing this button brings up a dialog that lets you give each dataset a name, add a new dataset reference and then select the set operator to combine them, from the list of UNION, UNION ALL, MINUS and INTERSECT.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-5.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-52.png" border="0" alt="Sshot 5" width="534" height="292" /></p>
<p>Once you&#8217;ve added this second dataset reference, ODI adds a tab to the bottom of the Source Datastore canvas, and you can then drag and drop the file datasource into your interface, with the two sources then being combined through your chosen set operator.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="sshot-6.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/sshot-61.png" border="0" alt="Sshot 6" width="467" height="395" /></p>
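<p>To give a feel for what the two datasets amount to once they&#8217;re combined, here&#8217;s a hand-written sketch of the kind of query you&#8217;d end up with for a UNION ALL. The table and column names are made up for illustration, and the SQL that ODI actually generates will depend on the knowledge modules you&#8217;ve chosen and on how the file data is staged:</p>
<pre>-- Dataset 1: order lines held in the database table (hypothetical name)
SELECT order_id, line_no, product_id, quantity, amount
  FROM order_lines
UNION ALL
-- Dataset 2: the same columns, read from the staged copy of the file
SELECT order_id, line_no, product_id, quantity, amount
  FROM order_lines_file_stg</pre>
<p>Swapping UNION ALL for UNION would remove duplicate rows across the two sources, while MINUS and INTERSECT would return only the rows missing from, or common to, the second dataset.</p>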
<p>Another new feature is Temporary Interfaces. If you saw my postings on the 7.9.5.2 release of the Oracle BI Applications back in 2009, you&#8217;d have seen how Oracle <a href="http://www.rittmanmead.com/2009/08/a-first-look-at-the-bi-apps-7-9-5-2-part-2-technology-changes-between-odi-and-informatica/">reproduced the maplets feature within Informatica with temporary, or &#8220;yellow&#8221; interfaces</a>. Temporary interfaces are different from regular ODI interfaces in that they can become a data source for another interface, allowing you to encapsulate mapping functionality and re-use it across wider ETL processes. Temporary interfaces have now made their way into ODI 11g and are a handy way of breaking down a more complex mapping procedure into more manageable chunks.</p>
<p>In the example above, we combined a table and file source into a single interface source through a set operator, which then provided us with some order line item information. Now consider a situation where we want the output of this interface to be available as a data source for another interface, so that we can combine the line-level data with some order-level information such as the order date, salesperson and ship date. To turn the original interface into a temporary interface, ensure that you don&#8217;t drag a target datastore into the interface; instead, drag and drop the columns you require into the target datastore area, and in the Temporary Target Properties panel at the bottom of the screen, give the output table a name, for example ORDER_DETAILS_TEMP.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="NewImage.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/NewImage14.png" border="0" alt="NewImage" width="600" height="416" /></p>
<p>Now, when the temporary interface is executed, it creates and populates a table, in this case called ORDER_DETAILS_TEMP, and if you include the temporary interface in another mapping, you can join to it and read from it just like any other datastore.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="NewImage.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/NewImage16.png" border="0" alt="NewImage" width="523" height="419" /></p>
<p>The default behaviour for temporary interfaces is to persist the intermediate results (the output of the temporary interface) in a database table, which in the example above, would make sense as part of the data is coming in from a file. But in the case where the temporary interface is only working with table data, it would be nice if we could render the temporary interface as just a SELECT statement, which we could embed in the calling interface as a sub-query or sub-select.</p>
<p>This is actually now possible with ODI 11g, using a feature called &#8220;Derived Select for Temporary Interfaces&#8221;. To take an example, consider a situation where you are building up a complex, multi-step interface where first, you aggregate orders by customer, and then you want to add an additional column to this dataset containing the customer order rank. You could start by creating a temporary interface that aggregated orders by customer, and then create a second interface that has the first temporary interface as a data source.</p>
<p>Then, instead of having the first temporary interface persist its output as a temporary table, you instead navigate to the Source Properties panel in the Interface Editor, and select the <strong>Use Temporary Interface as Derived Table (Sub-Select)</strong> checkbox, like this:</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="NewImage.png" src="http://www.rittmanmead.com/wp-content/uploads/2011/06/NewImage17.png" border="0" alt="NewImage" width="600" height="592" /></p>
<p>The benefit of going down this route is that you&#8217;ll cut down on the disk I/O, and disk space required, by removing the need to stage the intermediate results to a temporary table. The downside is that the SQL used in the main interface is going to be more complex, might take up more memory and might be a bit trickier to debug if there&#8217;s an error. But in general, if you&#8217;re using a fairly beefy database server to do your ETL, this new feature will make sense more often than not.</p>
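<p>As an illustration of the difference, the customer order rank example could end up as a single statement along these lines when the temporary interface is rendered as a derived table. This is a hand-written sketch with hypothetical table and column names (ORDERS, CUSTOMER_ID, AMOUNT), not the exact SQL that ODI generates:</p>
<pre>SELECT agg.customer_id,
       agg.total_amount,
       RANK() OVER (ORDER BY agg.total_amount DESC) AS customer_order_rank
  FROM (
         -- the temporary interface, rendered inline instead of as a table
         SELECT o.customer_id,
                SUM(o.amount) AS total_amount
           FROM orders o
          GROUP BY o.customer_id
       ) agg</pre>
<p>Without the checkbox selected, the inner SELECT would instead populate the temporary table first, and the outer query would read from that table in a second step.</p>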
<p>So there you go &#8211; some new interface and ETL features in the initial 11.1.1.3 release of ODI 11g. But hold on &#8211; as <a href="http://www.rittmanmead.com/2011/05/not-the-only-11-1-1-5-0-in-town/">Peter Scott mentioned</a>, there&#8217;s just been a new 11.1.1.5 release of ODI as well, so we&#8217;ll take a look tomorrow at two new features that come with this updated version &#8211; Load Plans, and OBIEE Lineage.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/06/odi-11g-new-mapping-and-interface-features-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

