<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rittman Mead Consulting &#187; Dimensional Modelling</title>
	<atom:link href="http://www.rittmanmead.com/category/dimensional-modelling/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rittmanmead.com</link>
	<description>Delivering Oracle Business Intelligence</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:18:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: ETL Iteration</title>
		<link>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/</link>
		<comments>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 04:22:31 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9954</guid>
		<description><![CDATA[This is the fourth entry in my series on Agile Data Warehousing with Exadata and OBIEE. To see all the previous posts, check the introductory posting which I have updated with all the entries in the series. In the last post, I describe what I call the Model-Driven iteration, where we take thin requirements from the [...]]]></description>
			<content:encoded><![CDATA[<p>This is the fourth entry in my series on Agile Data Warehousing with Exadata and OBIEE. To see all the previous posts, check the <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/">introductory posting</a> which I have updated with all the entries in the series.</p>
<p>In the last post, I described what I call the Model-Driven iteration, where we take thin requirements from the end user in the form of a user story and generate the access and performance layer (our star schema) logically, using the OBIEE semantic model. Our first several iterations will likely be Model-Driven as we work with the end user to fine-tune the content he or she wants to see on the OBIEE dashboards. As user stories are opened, completed and validated throughout the project, end users prioritize them for the development team. Eventually, an end user will open a story that is difficult to model in the semantic layer. Processes to correct data quality issues are a good example. And despite having the power of Exadata at our disposal, we may find ourselves in a performance hole that even the Database Machine can&#8217;t dig us out of. In these situations, we reflect on our overall solution and turn to a core tenet of Agile methodology: &#8220;refactoring&#8221;, or &#8220;rework&#8221;.</p>
<p>For Extreme BI, the main form of refactoring is ETL. The pessimist might say: &#8220;Well, now we have to do ETL development, what a waste of time all that RPD modeling was.&#8221; But is that the case? First off&#8230; think about our users. They have been running dashboards for some time now with at least a portion of the content they need to get their jobs done. As the die-hard Agile proponent will tell you&#8230; some is better than none. But also&#8230; the process of doing the Model-Driven iteration puts our data modelers and our ETL developers in a favorable position. We&#8217;ve eliminated the exhaustive data modeling process, because we already have our logical model in the Business Model and Mapping layer (BMM).</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Full-Logical-Model.png"><img class="alignnone size-large wp-image-9976" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Full-Logical-Model-1024x559.png" alt="" width="614" height="335" /></a></p>
<p>But we have more than that. We also have our source-to-target information documented in the semantic metadata layer. We can view that information in the Admin Tool, as depicted below, or we can use the &#8220;Repository Documentation&#8221; option to generate documentation of the source-to-target mappings.</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png"><img class="size-full wp-image-9883  alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png" alt="" width="671" height="219" /></a></p>
<p>When embarking on ETL development, it&#8217;s common to prototype in SQL before building the actual mappings, to make sure we understand the particulars of granularity. However, we already have these SQL prototypes in the nqquery.log file&#8230; all we have to do is look at it. Together, the source-to-target mapping and the SQL prototypes provide all the artifacts necessary to get started with the ETL.</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Query-Log.png"><img class="alignnone size-large wp-image-9982" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Query-Log-1024x598.png" alt="" width="645" height="377" /></a></p>
<p>When using ETL processing to &#8220;instantiate&#8221; our logical model in the physical world, we can&#8217;t abandon our Agile imperatives: we must still deliver the new content, and the corresponding rework, within a single iteration. So whether the end user opens the user story because the data quality is abysmal or because the performance is just not good enough, we must vow to deliver the ETL Iteration time-boxed, in exactly the same manner that we delivered the Model-Driven Iteration. Imagine that our user opens a story about data quality in our Customer and Product dimensions, and we decide that all we have time for in this iteration are those two dimension tables. Does it make sense for us to deliver those items in a vacuum? Given the process flow for an entire subject area depicted below, can we deliver it piecemeal instead of all at once?</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Piecemeal-Process-Flow.png"><img class="alignnone size-full wp-image-9968" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Piecemeal-Process-Flow.png" alt="" width="636" height="348" /></a></p>
<p>The answer, of course, is that we can. We&#8217;ll develop the model and ETL exactly as we would if our goal were to plug the dimensions into a complete subject area. We use surrogate keys as the primary key for each dimension table, which is what lets us join the dimension tables to completed fact tables. But we don&#8217;t have completed fact tables at this point in our project&#8230; instead we have a series of transaction tables that together form the basis of a logical fact table. How can we join a dimension table keyed on a surrogate key to a transactional &#8220;fact&#8221; table that doesn&#8217;t yet contain those surrogate keys?</p>
<p>We fake it. Along with surrogate keys, long-standing best practice for dimension tables has been to include the source system natural key, as well as effective dates, in every dimension table. These attributes are usually included to facilitate slowly-changing dimension (SCD) processing, but we&#8217;ll exploit them for our Agile piecemeal approach as well. In the example below, we have a properly formed Customer dimension that we want to join to our logical fact table:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Partial-Hybrid-Model-e1327470743307.png"><img class="alignnone size-full wp-image-9995" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Partial-Hybrid-Model-e1327470743307.png" alt="" width="596" height="200" /></a></p>
<p>We start by creating aliases of our transactional &#8220;fact&#8221; tables (called POS_TRANS_HYBRID and POS_TRANS_HEADER_HYBRID in the example above), because we don&#8217;t want to disturb the logical table source (LTS) that we already use for the pure transactional version of the logical fact table. We then create a complex join that matches the customer natural key and transaction date in our hybrid alias to the natural key and effective dates in the dimension table. The effective dates ensure that we pick the correct version of the customer in situations where we have enabled Type 2 SCDs (the usual standard) in our dimension table.</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline.png"><img class="alignnone size-large wp-image-10007" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-1024x869.png" alt="" width="574" height="486" /></a></p>
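<p>To make the shape of this complex join concrete outside the Admin Tool, here is a minimal sketch using Python&#8217;s standard-library sqlite3 module, with SQLite standing in for the Oracle database. The table and column names (customer_dim, pos_trans_hybrid, customer_id, effective_from, effective_to) are hypothetical stand-ins for the tables in the screenshots; the point is only the join condition: natural key equality plus the transaction date falling inside the dimension row&#8217;s effective date range.</p>

```python
import sqlite3

# Hypothetical tables; names are illustrative, not taken from the screenshots.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key   INTEGER PRIMARY KEY,  -- surrogate key
    customer_id    TEXT NOT NULL,        -- source system natural key
    customer_tier  TEXT,
    effective_from TEXT NOT NULL,        -- ISO dates compare correctly as text
    effective_to   TEXT NOT NULL         -- inclusive end of this version
);
CREATE TABLE pos_trans_hybrid (
    trans_id    INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,           -- natural key only, no surrogate yet
    trans_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
-- Two Type 2 versions of the same customer.
INSERT INTO customer_dim VALUES
    (1, 'C100', 'Silver', '2011-01-01', '2011-06-30'),
    (2, 'C100', 'Gold',   '2011-07-01', '9999-12-31');
INSERT INTO pos_trans_hybrid VALUES
    (10, 'C100', '2011-03-15', 50.0),
    (11, 'C100', '2011-09-02', 75.0);
""")

# Natural key plus transaction date within the version's effective range.
rows = conn.execute("""
SELECT d.customer_tier, SUM(t.amount)
FROM pos_trans_hybrid t
JOIN customer_dim d
  ON  t.customer_id = d.customer_id
  AND t.trans_date BETWEEN d.effective_from AND d.effective_to
GROUP BY d.customer_tier
ORDER BY d.customer_tier
""").fetchall()
print(rows)  # [('Gold', 75.0), ('Silver', 50.0)]
```

<p>Because the Type 2 date ranges for a given customer do not overlap, each transaction matches exactly one version of the customer, so each amount is attributed to the tier that was in effect on the transaction date.</p>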
<p>This complex logic of using the natural key and effective dates is identical to the logic we would use in what Ralph Kimball calls the &#8220;surrogate pipeline&#8221;: the ETL processing used to replace natural keys with surrogate keys when loading a proper fact table. Using Customer and Sales attributes in an analysis, we can see the actual SQL that&#8217;s generated:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-SQL.png"><img class="alignnone size-large wp-image-10025" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Surrogate-Pipeline-SQL-1024x510.png" alt="" width="645" height="321" /></a></p>
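<p>The surrogate pipeline itself can be sketched the same way. The Python below is a minimal, hypothetical illustration (not Kimball&#8217;s code or any ETL tool&#8217;s API): it finds the dimension version whose effective date range contains the transaction date and substitutes that version&#8217;s surrogate key for the natural key. Returning -1 for an unmatched natural key is an assumed &#8220;unknown member&#8221; convention; real pipelines handle late-arriving dimension members in a variety of ways.</p>

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DimVersion:
    surrogate_key: int
    natural_key: str
    effective_from: date
    effective_to: date  # inclusive end of the version's validity

UNKNOWN_MEMBER = -1  # assumed convention for unmatched natural keys

def build_lookup(versions):
    """Group dimension versions by source system natural key."""
    lookup = {}
    for version in versions:
        lookup.setdefault(version.natural_key, []).append(version)
    return lookup

def surrogate_for(lookup, natural_key, trans_date):
    """Surrogate key of the version in effect on trans_date."""
    for version in lookup.get(natural_key, []):
        if version.effective_from <= trans_date <= version.effective_to:
            return version.surrogate_key
    return UNKNOWN_MEMBER

def run_pipeline(transactions, lookup):
    """Swap each transaction's natural key for the matching surrogate key."""
    return [(surrogate_for(lookup, nk, d), d, amount) for nk, d, amount in transactions]

lookup = build_lookup([
    DimVersion(1, "C100", date(2011, 1, 1), date(2011, 6, 30)),
    DimVersion(2, "C100", date(2011, 7, 1), date(9999, 12, 31)),
])
facts = run_pipeline([
    ("C100", date(2011, 3, 15), 50.0),
    ("C100", date(2011, 9, 2), 75.0),
    ("C999", date(2011, 9, 2), 10.0),
], lookup)
print([row[0] for row in facts])  # [1, 2, -1]
```

<p>The lookup applies the same predicate as the hybrid join, only once at load time rather than on every query.</p>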
<p>We can view this hybrid approach as an intermediate step, but there is also nothing wrong with it as a long-term approach if the users are happy and Exadata makes our queries scream. If you think about it&#8230; a surrogate key is simply an easy way of representing the true natural key of a Type 2 dimension row, which is the source system natural key plus the effective dates of that version of the entity. A surrogate key makes this relationship much easier to envision, and certainly easier to code in SQL, but when Extreme Metadata insulates us from the ugliness of the join, do we really care? If our end users ever open a story asking for rework of the fact table, we may consider manifesting that table physically as well. Once that is complete, we would create another LTS for the Customer dimension (using an alias to keep it separate from the table that joins to the transactional tables). This alias would be configured to join directly to the new Sales fact table on the surrogate key&#8230; exactly how we would expect a traditional data warehouse to be modeled in the BMM. The physical model will look nearly identical to our logical model, and the generated SQL will be less interesting:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Fact-LTS.png"><img class="alignnone size-full wp-image-10033" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Fact-LTS.png" alt="" width="221" height="226" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Star-Schema-SQL.png"><img class="alignnone size-large wp-image-10029" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Star-Schema-SQL-1024x420.png" alt="" width="645" height="265" /></a></p>
<p>Now that I&#8217;ve described the Model-Driven and ETL Iterations, it&#8217;s time to discuss what I call the Combined Iteration, which is likely what most iterations will look like once the project has achieved some maturity. In Combined Iterations, we add new or refactored RPD content alongside new or refactored ETL content in the same iteration. This is where the project really makes sense to the end user. We allow the user community, the people actually consuming the content, to use user stories to tell the developers what to work on in the next iteration. The users will constantly open new stories, some asking for new content and others requesting modifications to existing content. All Agile methodologies put the burden of prioritizing user stories squarely on the shoulders of the user community. Why should IT dictate where the priorities lie? If we have delivered fabulous content sourced with the Model-Driven paradigm, and Exadata provides the performance necessary to make this &#8220;real&#8221; content, then there is no reason for the implementers to insist on manifesting that model physically with ETL when the users haven&#8217;t asked for it. If whole portions of our data warehouse are never implemented physically with ETL&#8230; do we care? The users are happy with what they have, and they think performance is fine&#8230; should we still force a &#8220;best practice&#8221; physical star schema on users who clearly don&#8217;t want it?</p>
<p>So that&#8217;s it for the Extreme BI methodology. At the outset of this series, I thought it would take five blog posts to make the case, but I was able to do it in four. So even when writing blog posts, I can&#8217;t help but rework as I go along. Long live Agile!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration</title>
		<link>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/</link>
		<comments>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 05:32:10 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9825</guid>
		<description><![CDATA[After laying the groundwork with an introduction, and following up with a high-level description of the required puzzle pieces, it&#8217;s time to get down to business and describe how Extreme BI works. At Rittman Mead, we have several projects delivering with this methodology right now, and more in the pipeline. I&#8217;ll gradually introduce the different types of [...]]]></description>
			<content:encoded><![CDATA[<p>After laying the groundwork with an <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/" target="_blank">introduction</a>, and following up with a high-level description of the required <a title="Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces" href="http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/" target="_blank">puzzle pieces</a>, it&#8217;s time to get down to business and describe how Extreme BI works. At Rittman Mead, we have several projects delivering with this methodology right now, and more in the pipeline.</p>
<p>I&#8217;ll gradually introduce the different types of generic iterations that we engage in, focusing on what I call the &#8220;model-driven&#8221; iteration for this post. Our first few iterations are always model-driven. We begin when a user opens a user story requesting new content. For any request for new content, we require that the story include all of the following elements:</p>
<ol>
<li>A narrative about the data they are looking for, and how they want to see it. We are not looking for requirements documents here, but we are looking for the user to give a complete picture of what it is that they need.</li>
<li>An indication of how they report on this content today. In a new data warehouse environment, this would include some sort of report that they currently run against the source system, and, in a perfect world, the SQL used to pull that report.</li>
<li>An indication of data sets that are &#8220;nice to haves&#8221;. This might include data that isn&#8217;t available to them in the report&#8217;s current paradigm, or that was simply too complicated to pull in that paradigm. After an initial inspection of these nice-to-haves and the complexity of including them in this story, the project manager may decide to pull these elements out and put them in a separate user story. This, of course, depends on the Agile methodology used, and on the individual implementation of that methodology.</li>
</ol>
<p>First we assign the story to an RPD developer, who uses the modeling capabilities in the OBIEE Admin Tool to &#8220;discover&#8221; the logical dimensional model tucked inside the user story, and develops that logical model inside the Business Model and Mapping (BMM) layer. Unlike a &#8220;pure&#8221; dimensional modeling exercise, where we focus only on user requirements and pay very little attention to source systems, in model-driven development we constantly shift between the source of the data and how best the user story can be solved dimensionally. Instead of working directly against the source system, though, we work against the foundation layer in the Oracle Next-Generation Reference Data Warehouse Architecture. We work top-down, first creating empty facts and dimensions in the BMM, and then mapping them to the foundation layer tables in the physical layer.</p>
<p>To take a simple example, we can see how a series of foundation layer tables developed in 3NF could be mapped to a logical dimension table as our Customer dimension:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Dimension-Join.png"><img class="size-full wp-image-9893 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Dimension-Join.png" alt="Model-Driven Development of Dimension Table" width="425" height="208" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png"><img class="size-full wp-image-9883 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Dimension.png" alt="" width="671" height="219" /></a></p>
<p>I rearranged the layout from the Admin Tool to provide an &#8220;ETL-friendly&#8221; view of the mapping. On the far right, we can see the logical, dimensional version of our Customer table, and how it maps back to the source tables. This mapping could be quite complicated, involving perhaps dozens of tables. The important thing to keep in mind is that this complexity is hidden not only from the consumers of the reports, but also from the front-end developers. We can produce a similar example of what our Sales fact table would look like:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Fact-Join.png"><img class="size-full wp-image-9896 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Fact-Join.png" alt="" width="426" height="209" /></a></p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Fact.png"><img class="size-full wp-image-9889 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Model-Driven-Map-Fact.png" alt="" width="664" height="276" /></a></p>
<p>Another way of making the same point is to look at the complex, transactional model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Physical-Model-Annotated.png"><img class="size-full wp-image-9904 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Physical-Model-Annotated.png" alt="" width="441" height="311" /></a></p>
<p>We can then compare this to the simplified, dimensional model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Logical-Model-Annotated.png"><img class="size-full wp-image-9905 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Logical-Model-Annotated.png" alt="" width="409" height="260" /></a></p>
<p>And finally, when we view the subject area while developing an analysis, all we see are facts and dimensions. The front-end developer can be blissfully ignorant that he or she is developing against a complex transactional schema, because all that is visible is the abstracted logical model:</p>
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2012/01/Astracted-View-for-Developer.png"><img class="alignnone size-full wp-image-9915" src="http://www.rittmanmead.com/wp-content/uploads/2012/01/Astracted-View-for-Developer.png" alt="" width="741" height="395" /></a></p>
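<p>For readers more at home in SQL than in the Admin Tool, the Customer mapping shown earlier is logically similar to the hypothetical view below, sketched in Python with sqlite3. The three tables (customer, address, region) are assumed stand-ins for a 3NF foundation layer, and customer_dim_v is the denormalized Customer dimension they are flattened into. This is exactly the kind of &#8220;roll-your-own&#8221; view-based metadata compared with the BI Server in the next paragraph, shown here only to illustrate what the BMM mapping expresses.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 3NF foundation-layer tables.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE address  (address_id INTEGER PRIMARY KEY, city TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       address_id INTEGER REFERENCES address(address_id));

INSERT INTO region   VALUES (1, 'South'), (2, 'West');
INSERT INTO address  VALUES (10, 'Atlanta', 1), (11, 'Denver', 2);
INSERT INTO customer VALUES (100, 'Acme', 10), (101, 'Globex', 11);

-- One denormalized row per customer: the logical Customer dimension.
CREATE VIEW customer_dim_v AS
SELECT c.customer_id, c.customer_name, a.city, r.region_name
FROM customer c
JOIN address a ON a.address_id = c.address_id
JOIN region  r ON r.region_id  = a.region_id;
""")

rows = conn.execute("SELECT * FROM customer_dim_v ORDER BY customer_id").fetchall()
for row in rows:
    print(row)
# (100, 'Acme', 'Atlanta', 'South')
# (101, 'Globex', 'Denver', 'West')
```

<p>A view like this hard-codes one mapping in the database, whereas the BMM can hold several logical table sources for the same logical table and choose among them per query.</p>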
<p>When mapping the BMM to complex 3NF schemas, the BI Server is very, very smart, and understands how to do more with less. Using the metadata capabilities of OBIEE is superior to other metadata products, and to a &#8220;roll-your-own metadata&#8221; approach using database views, for the following reasons:</p>
<ol>
<li>The generated SQL usually won&#8217;t involve self-joins, even when a table exists in both the logical fact table and the logical dimension table.</li>
<li>The BI Server will only include tables that are required to satisfy the request, either because a table has columns mapped to the attributes being requested, or because it is a reference table required to bring disparate tables together. Any table not required to satisfy the request is excluded.</li>
</ol>
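<p>The table pruning described in the second point can be illustrated with a small graph search. The Python below is a hypothetical sketch, not the BI Server&#8217;s actual algorithm: given which physical table owns each requested column and which tables can be joined to one another, it keeps the owning tables plus only the bridging tables on a shortest join path between them. The table and column names are assumptions loosely modeled on the POS example.</p>

```python
from collections import deque

# Hypothetical physical join graph (undirected) and column ownership.
JOINS = {
    "pos_trans": ["pos_trans_header", "product"],
    "pos_trans_header": ["pos_trans", "customer"],
    "customer": ["pos_trans_header", "address"],
    "address": ["customer", "region"],
    "region": ["address"],
    "product": ["pos_trans"],
}
COLUMN_OWNER = {
    "amount": "pos_trans",
    "customer_name": "customer",
    "region_name": "region",
    "product_name": "product",
}

def path_from_set(graph, sources, goal):
    """Breadth-first search from any already-chosen table to goal."""
    previous = {table: None for table in sources}
    queue = deque(sorted(sources))
    while queue:
        table = queue.popleft()
        if table == goal:
            path = []
            while table is not None:  # walk back to the set we started from
                path.append(table)
                table = previous[table]
            return path
        for neighbour in graph.get(table, ()):
            if neighbour not in previous:
                previous[neighbour] = table
                queue.append(neighbour)
    raise ValueError(f"no join path to {goal}")

def required_tables(requested_columns, column_owner=COLUMN_OWNER, graph=JOINS):
    """Owning tables for the requested columns plus bridging tables only."""
    owners = sorted({column_owner[col] for col in requested_columns})
    needed = {owners[0]}
    for table in owners[1:]:
        needed.update(path_from_set(graph, needed, table))
    return needed

print(sorted(required_tables(["customer_name", "amount"])))
# ['customer', 'pos_trans', 'pos_trans_header']
```

<p>Requesting customer_name and amount pulls in pos_trans_header as a bridging table but leaves address, region and product out of the query. Connecting each owner by a shortest path is a simple heuristic; finding the truly minimal set of connecting tables in a general join graph is the Steiner tree problem, which is much harder.</p>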
<p>Since the entire user story needs to be closed in a single iteration, the user who opened the story needs to be able to see the actual content. This means that developing the analysis (or report) and the dashboard is also required to complete the story. It&#8217;s important to get something in front of the end user immediately, but it doesn&#8217;t have to be perfect. We should focus on a clear, concise analysis in the first iteration, so it&#8217;s easy for the end user to verify that the data is correct. In future iterations, we can deliver high-impact, eye-catching dashboards. Equally important to closing the story is being able to prove that it&#8217;s complete. In Agile methodologies, this is usually referred to as the &#8220;Validation Step&#8221; or &#8220;Showcase&#8221;. Since we have already produced the content, it&#8217;s easy to prove to the user that the story is complete. But suppose we believed we couldn&#8217;t deliver new content in a single iteration. That would imply an iteration during our project that didn&#8217;t include actual end-user content. How would we go about validating or showcasing that iteration? How would we showcase a completed ETL mapping, for instance, if we haven&#8217;t delivered any content that consumes it?</p>
<p>What we have at the end of the iteration is a completely abstracted view of our model: a complex, transactional, 3NF schema presented as a star schema. We are able to deliver portions of a subject area, which is important for time-boxed iterations. The Extreme Metadata of OBIEE 11g allows us to remove this complexity in a single iteration, but it&#8217;s the performance of the Exadata Database Machine that allows us to build real analyses and dashboards and present them to the general user community.</p>
<p>In the next post, we&#8217;ll examine the ETL Iteration, and explore how we can gradually manifest our logical business model as a physical model over time. As you will see, the ETL Iteration is optional&#8230; it will be absolutely necessary in some environments, and completely superfluous in others.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces</title>
		<link>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/</link>
		<comments>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/#comments</comments>
		<pubDate>Wed, 28 Dec 2011 19:39:56 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9637</guid>
		<description><![CDATA[In the previous post, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I discussed that the usual approach to Agile data warehousing is not Agile at all due to the violation of its main principle: working software delivered iteratively. If you haven&#8217;t already deduced [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a title="Agile Data Warehousing with Exadata and OBIEE: Introduction" href="http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/" target="_blank">previous post</a>, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I discussed that the usual approach to Agile data warehousing is not Agile at all due to the violation of it&#8217;s main principle: working software delivered iteratively.</p>
<p>If you haven&#8217;t already deduced from my first post &#8212; or if you haven&#8217;t already seen me speak on this topic &#8212; what I am recommending is bypassing, either temporarily or permanently, the inhibitors specific to data warehousing projects which limit our ability to deliver working software quickly. Specifically, I&#8217;m recommending that we wait to build and populate physical star schemas until a later phase, if at all. Remember the two reasons that we build dimensional models: model simplicity and performance. With our Extreme BI solution, we have tools to counter both of those reasons. We have OBIEE 11g, with a rich metadata layer that presents our underlying data model, even if it is transactional, as a star schema to the end user. This removes our dependency on a simplistic physical model to provide a simplistic logical model to end users. We also have Exadata, which delivers world-class performance against any type of model, and can bridge the performance gap afforded by star schemas. With these tools at our disposal, we can postpone the long process of building dimensional models, at least for the first few iterations. This is the only way to get working software in front of the end user in a single iteration, and, as I will argue, this is the best way to collaborate with an end user and deliver the content they are expecting.</p>
<p>Of the puzzle pieces we need to deliver this model, the first is the <a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/058925.pdf" target="_blank">Oracle Next-Generation Reference DW Architecture</a> (we need an acronym for that), which Mark has already written about in-depth <a title="Drilling Down in the Oracle Next-Generation Reference DW Architecture" href="http://www.rittmanmead.com/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/" target="_blank">here</a>. As you browse through this post, pay special attention to his formulation of the foundation layer, which is the most important layer for delivering Extreme BI.</p>
<div id="attachment_9672" class="wp-caption aligncenter" style="width: 673px"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/next-gen.png"><img class="size-large wp-image-9672    " src="http://www.rittmanmead.com/wp-content/uploads/2011/12/next-gen-1024x627.png" alt="" width="663" height="407" /></a><p class="wp-caption-text">Oracle Next-Generation Reference DW Architecture</p></div>
<h2>Foundation Layer</h2>
<p>This is our &#8220;process-neutral&#8221; layer, which simply means that it isn&#8217;t imbued with requirements about what users want and how they want it. Instead, the foundation layer has one job and one job only: tracking what happened in our source systems. Typically, the foundation layer logical model looks identical to the source systems, except that each record carries a few additional metadata columns, such as commit timestamps and Oracle Database system change numbers (SCNs). When the 3NF model from the source system or systems is not sufficient, there are other, more complex approaches to modeling the foundation layer, such as <a title="Data Vault Modeling" href="http://en.wikipedia.org/wiki/Data_Vault_Modeling" target="_blank">data vault</a>. Our foundation layer is generally &#8220;insert-only&#8221;, meaning we track all history so that we are insulated from changing user requirements in both the near and the distant future.</p>
<p><strong>UPDATE: </strong> Kent Graziano, a major data vault evangelist, has started <a title="Oracle Data Warrior" href="http://kentgraziano.com/" target="_blank">blogging</a>. Perhaps with some pressure from the public, we could &#8220;encourage&#8221; him to blog on what data vault would look like in a standard foundation layer.</p>
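<p>To illustrate why an insert-only foundation layer insulates us from changing requirements, here is a minimal sketch in Python with sqlite3. The table customer_fl and its columns are hypothetical; commit_ts and source_scn stand for the metadata columns described above, holding whatever timestamp and SCN the capture process recorded. Because no row is ever updated in place, we can reconstruct the state of the source as of any SCN after the fact.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical foundation-layer table: source columns plus change metadata.
CREATE TABLE customer_fl (
    customer_id   INTEGER NOT NULL,
    customer_tier TEXT,
    commit_ts     TEXT    NOT NULL,  -- source commit timestamp
    source_scn    INTEGER NOT NULL,  -- source system change number
    PRIMARY KEY (customer_id, source_scn)
);
-- Insert-only: each change adds a row; nothing is updated in place.
INSERT INTO customer_fl VALUES
    (100, 'Silver', '2011-01-05 09:00:00', 5001),
    (101, 'Bronze', '2011-02-10 16:45:00', 5210),
    (100, 'Gold',   '2011-07-01 12:30:00', 7342);
""")

# State of each customer as of a given SCN: its latest version at or before it.
AS_OF = """
SELECT customer_id, customer_tier
FROM customer_fl f
WHERE source_scn = (
    SELECT MAX(source_scn) FROM customer_fl
    WHERE customer_id = f.customer_id AND source_scn <= ?
)
ORDER BY customer_id
"""
print(conn.execute(AS_OF, (6000,)).fetchall())  # [(100, 'Silver'), (101, 'Bronze')]
print(conn.execute(AS_OF, (8000,)).fetchall())  # [(100, 'Gold'), (101, 'Bronze')]
```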
<h2>Capturing Change</h2>
<p>Also required for delivering Extreme BI is a process for capturing change from the source systems and rapidly applying it to the foundation layer, which I described briefly in one of my posts on <a title="Real-time BI: An Introduction" href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/" target="_blank">real-time data warehousing</a>. We have a bit of a tug-of-war at this point between Oracle Streams and Oracle GoldenGate. GoldenGate is the stated platform of the future because it’s a simple, flexible, powerful and resilient replication technology. However, it does not yet have powerful change data capture functionality specific to data warehouses, such as easy subscriptions to raw changed data, or support for multiple subscription groups. You can, in general, work around these limitations using the INSERTALLRECORDS parameter and some custom code (perhaps fodder for a future blog post). Regardless of the technology, Extreme BI requires a process for capturing and applying source system changes quickly and efficiently to the foundation layer on the Exadata Database Machine.</p>
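<p>INSERTALLRECORDS is a GoldenGate Replicat parameter that applies every captured operation, including updates and deletes, as a new insert on the target. The Python below does not use GoldenGate; it is a hypothetical sketch of the resulting pattern. Each change is appended to the foundation table along with its operation type and SCN, and the &#8220;custom code&#8221; part is represented by a function that replays that history to derive the current image of each key. The Change record and its field names are assumptions for illustration.</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Change:
    op: str                       # 'I', 'U' or 'D', as captured from the source
    customer_id: int
    customer_tier: Optional[str]  # None for deletes in this sketch
    scn: int

def apply_insert_only(foundation, changes):
    """Append every change as a new row, whatever its operation type."""
    for change in sorted(changes, key=lambda c: c.scn):
        foundation.append({
            "customer_id": change.customer_id,
            "customer_tier": change.customer_tier,
            "op": change.op,
            "scn": change.scn,
        })
    return foundation

def current_state(foundation):
    """Replay the insert-only history in SCN order to get the latest image per key."""
    state = {}
    for row in sorted(foundation, key=lambda r: r["scn"]):
        if row["op"] == "D":
            state.pop(row["customer_id"], None)
        else:
            state[row["customer_id"]] = row["customer_tier"]
    return state

changes = [
    Change("I", 100, "Silver", 5001),
    Change("I", 101, "Bronze", 5210),
    Change("U", 100, "Gold", 7342),
    Change("D", 101, None, 8000),
]
foundation = apply_insert_only([], changes)
print(len(foundation))            # 4: no row was updated or deleted in place
print(current_state(foundation))  # {100: 'Gold'}
```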
<h2>Extreme Performance</h2>
<p>Although I&#8217;ll drill into more detail in the next post, the reason we need Extreme Performance is to offset the performance gains we usually get from star schemas, since we won&#8217;t be building those, at least not in the initial iterations. Rittman Mead has deployed a variant of this methodology without Exadata, using a powerful Oracle Database RAC instead, but there is no substitute for Exadata. While the hardware on the Database Machine is superb, it&#8217;s really the software that is the game-changer. The most extraordinary features include <a title="Smart Scans Meet Storage Indexes" href="http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html" target="_blank">smart scan and storage indexes</a>, as well as hybrid columnar compression, which Mark talks about <a title="Hybrid Columnar Compression in Oracle Exadata v2" href="http://www.rittmanmead.com/2010/01/hybrid-columnar-compression-in-oracle-exadata-v2/" target="_blank">here</a>, referencing an article by Arup Nanda found <a title="Compressing Columns" href="http://www.oracle.com/technetwork/issue-archive/2010/10-jan/o10compression-082302.html" target="_blank">here</a>. For years now, with standard Oracle data warehouses, we&#8217;ve pushed the architecture to its limits trying to reduce IO contention at the cost of CPU utilization, using database features such as partitioning, parallel query and basic block compression. But Exadata Storage can eliminate the IO boogeyman by combining these standard features with the Exadata-only features, raising query performance against 3NF schemas to a level on par with, or beyond, traditional star schemas.</p>
<p style="text-align: center"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/Terabytes-to-Gigabytes.png"><img class="aligncenter size-full wp-image-9739" src="http://www.rittmanmead.com/wp-content/uploads/2011/12/Terabytes-to-Gigabytes.png" alt="" width="617" height="352" /></a></p>
<h2>Extreme Metadata</h2>
<p>Extreme performance is only half the battle&#8230; we also need Extreme Metadata to provide the proper level of abstraction, so that report and dashboard developers still have a simple model to report against. This is what OBIEE 11g brings to the table. We have also delivered a variant of this methodology without OBIEE, using Cognos instead, which has a metadata layer called <a title="Framework Manager" href="http://www.ironsidegroup.com/2010/07/08/best-practices-in-cognos-8-framework-manager-model-design/" target="_blank">Framework Manager</a>. As with Exadata, though, the BI Server has no equal in the metadata department, so my advice&#8230; don&#8217;t substitute ingredients.</p>
<p>Consider, for a moment, the evolution of dimensional modeling in deploying a data warehouse. Not too long ago, we had to solve most data warehousing issues with the logical model because BI tools were simplistic. Generally&#8230; there was no abstraction of the physical into the logical, unless you categorize the renaming of columns as abstraction. As these tools evolved, we often found ourselves with a choice: solve some user need in the logical model, or solve it with the feature set of the BI tool. The use of aggregation in data warehousing is a perfect example of this evolution. Designing aggregate tables used to be just another part of the logical modeling exercise, and those tables were generally represented in the published data model for the EDW. But now, building aggregates is more of a technical implementation than a logical one, as either the BI Server or the Oracle Database can handle the transparent navigation to aggregate tables.</p>
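<p>On the database side, that transparent navigation comes from materialized view query rewrite. Here is a minimal sketch. It assumes a hypothetical SALES_FACT table with SALES_DATE_KEY (DATE), PRODUCT_KEY, AMOUNT and QUANTITY columns, and the default setting QUERY_REWRITE_ENABLED = TRUE. The view name SALES_MONTH_MV is also hypothetical.</p>
<pre>-- Monthly aggregate that the optimizer may use in place of SALES_FACT
CREATE MATERIALIZED VIEW sales_month_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT TRUNC(sales_date_key, 'MM') AS sales_month,
       product_key,
       SUM(amount)                 AS amount,
       SUM(quantity)               AS quantity,
       COUNT(*)                    AS row_count
FROM   sales_fact
GROUP  BY TRUNC(sales_date_key, 'MM'), product_key;

-- Written against the detail table. With rewrite enabled, the optimizer
-- can answer it from SALES_MONTH_MV by rolling up over PRODUCT_KEY.
SELECT TRUNC(sales_date_key, 'MM') AS sales_month,
       SUM(amount)                 AS amount
FROM   sales_fact
GROUP  BY TRUNC(sales_date_key, 'MM');</pre>
<p>The report query never names the aggregate. Running EXPLAIN PLAN on it shows whether the rewrite actually happened.</p>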
<p>The metadata that OBIEE provides adds two necessary features for Agile delivery. First, we are able to report against complex transactional schemas but still expose those schemas as simplified dimensional models. This lets us bypass the complex ETL process, at least initially, so that we can get new subject areas into the users&#8217; hands in a single iteration. Second, OBIEE&#8217;s ability to map multiple Logical Table Sources (LTS&#8217;s) to the same logical table makes it easy to modify, or &#8220;remap&#8221;, the source of our logical tables over time. So, in later iterations, if we decide that it&#8217;s necessary to embark upon complex ETL processes to complete user stories, we can absorb that change in the metadata layer without affecting our reports and dashboards, or changing the logical model that report developers are used to seeing.</p>
<div id="attachment_9754" class="wp-caption aligncenter" style="width: 612px"><a href="http://www.rittmanmead.com/wp-content/uploads/2011/12/semantic-model.031.png"><img class="size-full wp-image-9754 " src="http://www.rittmanmead.com/wp-content/uploads/2011/12/semantic-model.031.png" alt="" width="602" height="378" /></a><p class="wp-caption-text">Flow of Data Through the Three-Layer Semantic Model</p></div>
<h2>More to Come&#8230;</h2>
<p>In the next post, I&#8217;ll describe what I call the Model-Driven Iteration, where we use OBIEE against the foundation layer to expose new subject areas in a single iteration. After that, I&#8217;ll describe ETL Iterations, where we transform a portion of our model iteratively using ETL tools such as ODI, OWB or Informatica. Finally, I&#8217;ll describe what I call Combined Iterations, where both Model-Driven activity and ETL activity are going on at the same time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Agile Data Warehousing with Exadata and OBIEE: Introduction</title>
		<link>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/</link>
		<comments>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/#comments</comments>
		<pubDate>Wed, 21 Dec 2011 15:48:55 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Methodology]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=9597</guid>
		<description><![CDATA[Over the last year, I&#8217;ve been speaking at conferences on one subject more than any other: Agile Data Warehousing with Exadata and OBIEE. Although I&#8217;ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last year, I&#8217;ve been speaking at conferences on one subject more than any other: Agile Data Warehousing with Exadata and OBIEE. Although I&#8217;ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take up. So I&#8217;ll use the next few blog posts to make my case for what I like to call Extreme BI: an Agile approach to data warehousing using the combination of Extreme Performance and Extreme Metadata.</p>
<p>In a standard data warehouse implementation, whether we are walking in the Inmon or Kimball camps, some portion of our data model will be dimensional in nature; a star schema with facts and dimensions. So let me pose a question, which I think will lend itself well to diving into the Extreme BI discussion: Why do we build dimensional models? The first reason is simplicity. We want to model our reporting structures in a way that makes sense to the business user. The standard OLTP data model that takes two of the four walls in the conference room to display is just never going to make sense to your average business user. At the end of a logical modeling exercise, I expect the end-user to have a look at a completed dimensional model and say: &#8220;Yep&#8230; that&#8217;s our business alright&#8221;. The second reason we build dimensional models is for performance. Denormalizing highly complex transactional models into simplified star schemas generally produces tremendous performance gains.</p>
<p>So my follow-up question: can the combination of Exadata and OBIEE, or Extreme BI, <em>actually change the way we deliver projects? </em>We&#8217;ve all seen the Exadata performance numbers that Oracle publishes, and I can tell you first hand the performance is impressive. Can this Extreme Performance combined with the Extreme Metadata that OBIEE provides give us a more compelling case for delivering data warehouses using Agile methodologies?</p>
<p>To start with, I&#8217;d like to paint a picture of what the typical waterfall data warehousing project looks like. The tasks we usually have to complete, in order, are the following:</p>
<ol>
<li>User interviews</li>
<li>Construct requirement documents</li>
<li>Create logical data model</li>
<li>SQL prototyping of source transactional models</li>
<li>Document source-to-target mappings</li>
<li>ETL development</li>
<li>Front-end development (analyses and dashboards)</li>
<li>Performance tuning</li>
</ol>
<p>Raise your hand if this looks familiar. We would have to go through all these steps, which could take months, before end users can see the fruits of our labor. To mitigate this scenario, organizations will attempt to deliver data warehouses using &#8220;Agile&#8221; methodologies. What this usually means, from my experience, is a simple repackaging of the same waterfall project plan into &#8220;iterations&#8221; or &#8220;sprints&#8221;, so that the project can be delivered iteratively. So the process might look like the following:</p>
<ol>
<li>Iteration 1: Interviews and user requirements</li>
<li>Iteration 2: Logical modeling</li>
<li>Iteration 3: ETL Development</li>
<li>Iteration 4: Front-end development</li>
</ol>
<p>But this, ladies and gentlemen, is not Agile. To get an understanding of what lies at the heart of Agile development, we need to look no further than the <a title="The Agile Manifesto" href="http://agilemanifesto.org/" target="_blank">Agile Manifesto</a>, or the history of the <a title="The Agile Movement" href="http://en.wikipedia.org/wiki/Agile_software_development" target="_blank">Agile Movement</a>. When examining the different methodologies, there is one major theme that permeates all of them: working software delivered iteratively. It&#8217;s not enough to simply deliver the same old waterfall methodology in &#8220;sprints&#8221; or &#8220;iterations&#8221;, because, at the end of those iterations, we don&#8217;t have any working software&#8230; software that end users can actually use to improve their job or help them make better decisions. In the example above, we still require four iterations before we get any usable content. It doesn&#8217;t matter if we&#8217;ve written some complex ETL to load a fact table if the end user doesn&#8217;t have a working dashboard to go along with it.</p>
<p>To apply the Agile Manifesto to data warehouse delivery, we need the following key elements to deliver in a true Agile spirit:</p>
<ol>
<li>User stories instead of requirements documents: a user asks for particular content through a narrative process, and includes in that story whatever process they currently use to generate that content.</li>
<li>Time-boxed iterations: iterations always have a standard length, and we choose one or more user stories to complete in that iteration.</li>
<li>Rework is part of the game: there aren&#8217;t any missed requirements&#8230; only those that haven&#8217;t been addressed yet.</li>
</ol>
<p>I&#8217;ve been conscious not to prescribe any distinct Agile methodology, though I can&#8217;t help using more Scrum-like concepts in this formulation. However, I think this list is generic enough to apply to most methodologies. Over the next few posts, I&#8217;ll discuss the necessary puzzle pieces to engage in Extreme BI, as well as how we might implement new subject area content in a single iteration. Additionally, I&#8217;ll discuss how these implementations might be reworked, or &#8220;refactored&#8221;, over several iterations to produce data warehouses that respond to user stories: what users want and when they want it.</p>
<p><strong>Follow-up Posts</strong></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces" href="http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/" target="_blank">Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces</a></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration" href="http://www.rittmanmead.com/2012/01/agile-exadata-obiee-model-driven/">Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration</a></p>
<p><a title="Agile Data Warehousing with Exadata and OBIEE: ETL Iteration" href="http://www.rittmanmead.com/2012/01/agile-exadata-obiee-etl/" target="_blank">Agile Data Warehousing with Exadata and OBIEE: ETL Iteration</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/12/agile-data-warehousing-with-exadata-and-obiee-introduction/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Real-time BI: EDW with a Real-time Component</title>
		<link>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/</link>
		<comments>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 20:46:55 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8630</guid>
		<description><![CDATA[I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and [...]]]></description>
			<content:encoded><![CDATA[<p>I apologize for the long delay in getting this last portion of the Real-time discussion in place. Since I wrote the first two installments, we&#8217;ve had the BI Forum (US and UK versions), plus a flurry of activity around Rittman Mead in the US, followed up by KScope11. But a promise is a promise, and here goes with the conclusion.</p>
<p>I laid out the general vocabulary and considerations for Real-time BI in <a href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/">my first post</a> in this series, and then followed up with how to implement Real-time BI using <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">a federated approach</a> that relies on the metadata capabilities of OBIEE to blend two different environments into one. Now I&#8217;d like to discuss how we might implement a Real-time solution by relying on ETL instead of BI Tool metadata. I call this EDW with a Real-Time Component.</p>
<p>Whereas the Federated OLTP/EDW Reporting option gives us a way to layer real-time data into an otherwise classic batch-loaded EDW, delivering the EDW with a Real-Time Component requires designing an EDW from the ground up that supports real-time reporting. Specifically, we have to design our fact tables to support what Ralph Kimball calls the “real-time partition” in his book <em>The Kimball Group Reader</em>: “To achieve real-time reporting, we build a special partition that is physically and administratively separated from the conventional static data warehouse tables. Actually, the name partition is a little misleading. The real-time partition may be a separate table, subject to special rules for update and query.” We construct a separate section for each of our fact tables to satisfy the following four requirements, as defined by Kimball:</p>
<ol>
<li>Contain all activity since the last time the load was run</li>
<li>Link seamlessly to the grain of the static data warehouse tables</li>
<li>Be indexed so lightly that incoming data can “dribble in”</li>
<li>Support highly responsive queries</li>
</ol>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/real-time-partition.png" border="0" alt="Real time partition" width="600" height="375" /></p>
<p>So we modify our model to support the interaction of real-time and static data, but we also modify our ETL to support it. In fact, to construct an EDW with a Real-Time Component, we have to build some very intricate interaction between the database, the data model and the ETL processes. The static fact table is partitioned on a DATE column using standard Oracle partitioning strategies. The real-time partition is structured so that it can be loaded throughout the day. In other words, there are no indexes or constraints enabled on the table. ETL against the real-time partition uses a process comparable to traditional load scenarios, but in micro-batches instead, running as often as 100 times a day or more. Alternative methods include transactional-style, record-by-record loading, possibly using web services or a message-based system such as JMS queues.</p>
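<p>Here is a minimal sketch of creating that real-time table, assuming the SALES_FACT table shown later in this post and the table name SALES_FACT_RT. A CREATE TABLE AS SELECT with a false predicate copies the column definitions (including the NOT NULL constraints) but no rows, no indexes and no other constraints. That leaves the lightly indexed structure Kimball describes.</p>
<pre>-- Empty, non-partitioned copy of the fact table columns.
-- WHERE 1 = 0 copies the structure but returns no rows; no indexes
-- are created, so micro-batch inserts stay cheap during the day.
CREATE TABLE gcbc_edw.sales_fact_rt
AS
SELECT *
FROM   gcbc_edw.sales_fact
WHERE  1 = 0;</pre>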
<p>We effectively want to build a single logical fact table out of the combination of the static EDW fact table and the real-time fact partition. There are several ways to do this. We could use OBIEE fragmentation, as we saw in the <a href="http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/">last post.</a> This would work, but it&#8217;s not what I recommend. We used fragmentation in the last post because we were joining two completely different data sets across conformed dimensions into a unified model. With the real-time partition, however, we have two tables with exactly the same structure, both using the same surrogate keys to the same dimension tables, just separated across different segments for performance reasons. In this case, I choose to combine the two datasets with a UNION ALL, in either a database view or an opaque view in OBIEE.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/opaque-union-view.png" border="0" alt="Opaque union view" width="542" height="553" /></p>
<p>This works because we no longer have to control which source the rows come from in particular situations: we simply pull all the rows and use standard WHERE filters to limit them where applicable. The pruning that the BI Server did for us in the last post, the Oracle Database will do for us in this case. We can, however, still present the static fact tables on their own in situations that merit it: I&#8217;m thinking of financial reports here. Accountants don&#8217;t usually like their reports giving different results every time they run them.</p>
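<p>Here is a minimal sketch of the database-view option, assuming the SALES_FACT and SALES_FACT_RT tables used in this post. The view name SALES_FACT_ALL_V is hypothetical. UNION ALL is used rather than UNION because the two tables never hold the same rows, so there is nothing to de-duplicate, and a UNION would only add a sort.</p>
<pre>CREATE OR REPLACE VIEW gcbc_edw.sales_fact_all_v AS
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
FROM   gcbc_edw.sales_fact             -- static, partitioned history
UNION ALL
SELECT customer_key, product_key, staff_key, store_key, sales_date_key,
       trans_id, trans_line_id, sales_date, unit_price, quantity, amount
FROM   gcbc_edw.sales_fact_rt;         -- the current day, loaded in micro-batches

-- An ordinary date filter; the optimizer can push it into each branch,
-- so partition pruning still applies to SALES_FACT
SELECT SUM(amount)
FROM   gcbc_edw.sales_fact_all_v
WHERE  sales_date_key BETWEEN TRUNC(SYSDATE) - 7 AND TRUNC(SYSDATE);</pre>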
<p>We have one issue with the load of the real-time partition: we are assuming that we receive all of our dimension data right along with our fact data in clean CDC subscription groups. That would likely be the case if we were pulling all the data for our data warehouse from a single source-system, but with enterprise data warehouses, that is rarely the case. Receiving dimension data early causes no problems with our load scenario; it doesn’t matter if we do the surrogate key lookup for the fact table load hours or days later than the dimensions. Receiving the fact table data early does present us with ETL logic issues: the correct dimension record may or may not be there when it’s time to load the facts.</p>
<p>There is a simple strategy to handle early-arriving facts. In our ETL, we implement a process to ensure that our facts are at least reportable intra-day:</p>
<ol>
<li>If a dimension record exists for the current business or natural key we are interested in, then grab the latest record. This is the best we can do at this point, and will usually be the correct value.</li>
<li>If no dimension record exists yet for the current natural key, then use a default record type equating to “Not Known Yet.” Though it’s not sexy for intra-day reporting, it at least makes the data available across the dimensions we do know about.</li>
<li>As we approach the end of the day and prepare to “close the books” for the current day, we should have run all dimension loads, including any late-arriving dimensions, so that our dimension tables are all up to date. At this point we run a corrective mapping to update all the fact records in the real-time partition with the right surrogate keys. This would likely be a MERGE statement or a TRUNCATE/INSERT style mapping. From a performance perspective, my bet is on the latter.</li>
</ol>
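<p>Steps 1 and 2 amount to an outer join with a default key. Here is a minimal sketch, not the OWB mapping itself. It assumes a hypothetical micro-batch source SALES_SRC_BATCH that carries the customer natural key (CUSTOMER_ID) plus already-resolved product, staff and store keys. It also assumes a Type 2 CUSTOMER_DIM whose latest version per natural key is flagged with a hypothetical CURRENT_FLAG = 'Y', and a &#8220;Not Known Yet&#8221; row with surrogate key -1.</p>
<pre>INSERT INTO gcbc_edw.sales_fact_rt
       (customer_key, product_key, staff_key, store_key, sales_date_key,
        trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
SELECT NVL(c.customer_key, -1),    -- step 2: "Not Known Yet" when no match
       s.product_key,
       s.staff_key,
       s.store_key,
       TRUNC(s.sales_date),         -- DAY-grain partitioning key
       s.trans_id,
       s.trans_line_id,
       s.sales_date,
       s.unit_price,
       s.quantity,
       s.amount
FROM   sales_src_batch s
LEFT OUTER JOIN gcbc_edw.customer_dim c
       ON  c.customer_id  = s.customer_id
       AND c.current_flag = 'Y';    -- step 1: latest version of the customer</pre>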
<p><a href="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1.png"><img class="size-large wp-image-8631 alignnone" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/outer-join-mapping1-1024x354.png" alt="" width="737" height="255" /></a></p>
<p>The above mapping loads the real-time partition in a micro-batch style, doing an outer join to the CUSTOMER_DIM table and writing the &#8220;Not Known Yet&#8221; row in case a customer is not found. I am also employing a Splitter Operator in OWB, but I tricked it out to force it to load all rows to BOTH tables: SALES_FACT_RT and SALES_STG_RT. The reason for this is that we don&#8217;t write dimension natural keys into our fact tables, though I&#8217;ve seen that technique employed in some real-time implementations. So when it&#8217;s time to run our corrective mapping, we just join the SALES_STG_RT table to the now-correct dimension tables, produce the right surrogate keys for each fact record, and load the results into SALES_FACT_RT.</p>
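<p>Here is a sketch of the TRUNCATE/INSERT flavour of that corrective mapping. It assumes that SALES_STG_RT has the same columns as SALES_FACT_RT plus the customer natural key CUSTOMER_ID, and that CUSTOMER_DIM carries a hypothetical CURRENT_FLAG column. Because the staging table holds every row received during the day, the real-time table can simply be rebuilt from it once the late-arriving dimensions are loaded.</p>
<pre>-- Rebuild the real-time rows with corrected surrogate keys
TRUNCATE TABLE gcbc_edw.sales_fact_rt;

INSERT /*+ APPEND */ INTO gcbc_edw.sales_fact_rt
       (customer_key, product_key, staff_key, store_key, sales_date_key,
        trans_id, trans_line_id, sales_date, unit_price, quantity, amount)
SELECT NVL(c.customer_key, -1),    -- most rows should now find a match
       s.product_key,
       s.staff_key,
       s.store_key,
       s.sales_date_key,
       s.trans_id,
       s.trans_line_id,
       s.sales_date,
       s.unit_price,
       s.quantity,
       s.amount
FROM   gcbc_edw.sales_stg_rt s
LEFT OUTER JOIN gcbc_edw.customer_dim c
       ON  c.customer_id  = s.customer_id
       AND c.current_flag = 'Y';

-- A direct-path insert must be committed before the table is queried again
COMMIT;</pre>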
<p>When “closing the books” on the day, we build indexes and constraints on the real-time partition that match those on the partitioned fact table. Once this step is complete, we can use a partition-exchange operation to swap the real-time partition into the static fact table. In Oracle, this is a fast data-dictionary update that completes almost instantaneously.</p>
<p>Obviously, our partitioning choice for the fact table will determine exactly how this partition-exchange will occur. If we agree to partition the fact table by DAY, then we can use Oracle Interval partitioning, available in Oracle 11gR1 and beyond. We have to make this concession because every partition that Interval partitioning creates uses the same fixed interval. We can&#8217;t, for instance, have some MONTH-based interval partitions alongside some DAY-based ones, as we can when we manage regular range partitions ourselves. Using Interval partitioning is the easiest method, however, because it requires the least amount of partition maintenance as part of the load. For instance, consider the SALES_FACT table listed below, which uses Interval partitioning on SALES_DATE_KEY at the DAY grain:</p>
<pre>CREATE TABLE sales_fact
       (
         customer_key           NUMBER           NOT NULL,
         product_key            NUMBER           NOT NULL,
         staff_key              NUMBER           NOT NULL,
         store_key              NUMBER           NOT NULL,
         sales_date_key         DATE             NOT NULL,
         trans_id               NUMBER,
         trans_line_id          NUMBER,
         sales_date             DATE,
         unit_price             NUMBER,
         quantity               NUMBER,
         amount                 NUMBER
       )
       partition BY range (sales_date_key)
       interval (numtodsinterval(1,'DAY'))
       (
         partition sales_fact_2006 VALUES less than (to_date('2007-01-01','YYYY-MM-DD'))
       )
       COMPRESS
/</pre>
<p>Each time we load a record into SALES_FACT for which no partition currently exists, Oracle will spawn one for the table. But based on our real-time requirements, we will use a partition-exchange operation every day to close the books on the current day processing, so each day, we will need to spawn a clean, new partition to facilitate that partition-exchange. All we need to do to make this happen is issue an insert statement with a DATE value for the partitioning key that equates to TRUNC(SYSDATE). For instance, the following statement would generate a new partition that we can use for the exchange:</p>
<pre>-- Dummy row whose only purpose is to make Oracle create the interval
-- partition for the current day; apart from SALES_DATE_KEY the values
-- are placeholders
INSERT INTO gcbc_edw.sales_fact
       (
         customer_key,
         product_key,
         staff_key,
         store_key,
         sales_date_key,
         trans_id,
         trans_line_id,
         sales_date,
         unit_price,
         quantity,
         amount)
       VALUES
       (
         -1,
         -1,
         -1,
         -1,
         trunc(SYSDATE),   -- partitioning key: the current day at DAY grain
         -1,
         -1,
         SYSDATE,
         0,
         0,
         0
       )
/

-- SQL*Plus reports: 1 row created.</pre>
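<p>To confirm that the insert really did spawn a new interval partition, you can check the data dictionary. This small sketch assumes you are connected as the GCBC_EDW owner; otherwise, query ALL_TAB_PARTITIONS and filter on TABLE_OWNER. The INTERVAL column (11g onwards) shows YES for partitions Oracle created automatically. Those partitions get system-generated names, which is one reason to refer to them by key value rather than by name.</p>
<pre>SELECT partition_name,
       partition_position,
       interval,          -- YES for system-created interval partitions
       high_value         -- LONG column holding the upper-bound expression
FROM   user_tab_partitions
WHERE  table_name = 'SALES_FACT'
ORDER  BY partition_position;</pre>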
<p>Once the insert has created our new SYSDATE-based partition, we can exchange the real-time partition in for this new partition. We can use the new PARTITION FOR clause, which lets us reference a partition by a partition key value rather than by its name, with a slight caveat: though we can’t use SYSDATE explicitly in the DDL statement, we can reference it implicitly:</p>
<pre>DECLARE
   -- Format the current date explicitly, so the generated DDL does not
   -- depend on the session NLS_DATE_FORMAT
   l_day  VARCHAR2(10) := to_char( trunc( SYSDATE ), 'YYYY-MM-DD' );
   l_sql  VARCHAR2(4000);
BEGIN
   l_sql :=   q'|alter table gcbc_edw.sales_fact exchange partition|'
           || chr(10)
           || q'|for (to_date('|'
           || l_day
           || q'|', 'YYYY-MM-DD')) with table gcbc_edw.sales_fact_rt|';

   -- Print the statement (visible with SET SERVEROUTPUT ON), then run it
   dbms_output.put_line( l_sql );
   EXECUTE IMMEDIATE l_sql;
END;
/</pre>
<p>Using the preferred Interval partitioning option, the final “close the books” process flow is shown below. The first step is to run any late-arriving dimension mappings, in this example the MAP_CUSTOMER_DIM mapping. Once all the dimensions are up to date, we can run the process that corrects all the dimension keys in the real-time partition. Remember, the real-time partition contains small data sets, so updating these records should not be resource-intensive. In this scenario, the mapping MAP_CORRECT_SALES_FACT_RT issues an Oracle MERGE statement, but a TRUNCATE/INSERT approach would quite likely work just as well. Once all the data in the real-time partition is correct and ready to go, we run the MAP_CREATE_PARTITION mapping, which uses an insert statement to spawn a new partition. The EXCHANGE_PARTITION PL/SQL procedure then builds indexes and constraints and completes the process by issuing the partition-exchange statement.</p>
<p><img style="margin-left: auto;margin-right: auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/07/corrective-process-flow1.png" border="0" alt="Corrective process flow" width="545" height="275" /></p>
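<p>The EXCHANGE_PARTITION step might be sketched as follows. This is not the procedure from the process flow above, and the procedure and index names are hypothetical. It assumes SALES_FACT has a LOCAL B-tree index on CUSTOMER_KEY and builds the matching index on SALES_FACT_RT. That lets the exchange swap index segments along with the data (INCLUDING INDEXES) and skip row-by-row checking (WITHOUT VALIDATION). Skipping validation is safe only because the load guarantees that every row belongs to the current day&#8217;s partition. Constraint handling is omitted for brevity.</p>
<pre>-- Hypothetical procedure; assumes a LOCAL B-tree index on
-- SALES_FACT(CUSTOMER_KEY) that SALES_FACT_RT must match for the exchange
CREATE OR REPLACE PROCEDURE exchange_sales_partition AS
   l_day VARCHAR2(10) := TO_CHAR(TRUNC(SYSDATE), 'YYYY-MM-DD');
BEGIN
   -- 1. Index the real-time table to match the local index on SALES_FACT
   EXECUTE IMMEDIATE
      'CREATE INDEX gcbc_edw.sales_fact_rt_cust_idx '
   || 'ON gcbc_edw.sales_fact_rt (customer_key)';

   -- 2. Swap the new, nearly empty daily partition with the real-time table
   EXECUTE IMMEDIATE
      'ALTER TABLE gcbc_edw.sales_fact EXCHANGE PARTITION '
   || 'FOR (TO_DATE(''' || l_day || ''', ''YYYY-MM-DD'')) '
   || 'WITH TABLE gcbc_edw.sales_fact_rt '
   || 'INCLUDING INDEXES WITHOUT VALIDATION';

   -- 3. SALES_FACT_RT now holds the old partition segment (the dummy row);
   --    drop its index and empty it so it is ready for the next day
   EXECUTE IMMEDIATE 'DROP INDEX gcbc_edw.sales_fact_rt_cust_idx';
   EXECUTE IMMEDIATE 'TRUNCATE TABLE gcbc_edw.sales_fact_rt';
END exchange_sales_partition;
/</pre>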
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/07/real-time-bi-edw-with-a-real-time-component/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Geography Hierarchies</title>
		<link>http://www.rittmanmead.com/2011/05/geography-hierarchies/</link>
		<comments>http://www.rittmanmead.com/2011/05/geography-hierarchies/#comments</comments>
		<pubDate>Tue, 24 May 2011 17:28:22 +0000</pubDate>
		<dc:creator>Peter Scott</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/2011/05/geography-hierarchies/</guid>
		<description><![CDATA[I have been thinking about address a lot recently, in part it was moving house and all of the 1001 people that need to be notified. In the main, though, it was thoughts inspired by a data warehouse project I am working on. For this DWH people are geo-located by their street address, but for [...]]]></description>
			<content:encoded><![CDATA[<p>I have been thinking about address a lot recently, in part it was moving house and all of the 1001 people that need to be notified. In the main, though, it was thoughts inspired by a data warehouse project I am working on. For this DWH people are geo-located by their street address, but for most reporting we are only concerned with a grain of city. This all sounds so simple but how do we build a hierarchy from address, the line between the street where you live and the planet you live on. I remember as a child thinking it cool to address a letter to 23 Railway Cuttings, East Cheam then adding Surrey, England, Europe, The World, and however far I could get through navigating the solar system and the universe. To a child the hierarchy of address is relatively straightforward. But in data warehouse modelling things are not quite so simple.</p>
<p>Take the postal code (or zip code): where does it fit in the hierarchy? Well, the answer is that it might not fit at all. Postal codes were developed to help post offices deliver mail &#8211; and each postal authority did their own thing. The UK and the Netherlands have postal code systems that can identify a single street or even a cluster of houses within a street. Other countries work on a code per town or group of nearby towns &#8211; so straight away we have a difference in grain: a few houses in the UK, a few towns in France. In Germany, postal codes relate to geographic areas, but those areas are not aligned to the Bundesländer. France, on the other hand, ties postal codes to Departments, but there are anomalies. The most notable is where a river runs through a village and the opposite banks share a postal code but are in different Departments (and in one case, different regions). Some national postal codes are numeric and some are alphanumeric (like the Canadian and UK ones), and the length of the postcode varies between countries too.</p>
<p>Perhaps the sensible thing, especially if you are dealing with addresses from multiple countries, is not to use postal code as a level in geographical hierarchies. If you use postal codes at all, just make them an attribute of the address, and remember that they don&#8217;t always have geographical parents.</p>
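<p>In an Oracle database, one way to express that advice is to declare the geography hierarchy without a postal code level and attach the postal code to the address level as an attribute. Here is a minimal sketch, assuming a hypothetical denormalised ADDRESS_DIM table. The table uses surrogate keys for city and region because city names alone are not unique.</p>
<pre>-- Postal code is deliberately NOT a level: it is only an attribute of
-- the address, because it does not roll up cleanly into city or region.
CREATE DIMENSION address_geog_dim
   LEVEL address IS (address_dim.address_key)
   LEVEL city    IS (address_dim.city_key)
   LEVEL region  IS (address_dim.region_key)
   LEVEL country IS (address_dim.country_code)
   HIERARCHY geog_rollup (
      address CHILD OF
      city    CHILD OF
      region  CHILD OF
      country
   )
   ATTRIBUTE address DETERMINES (address_dim.postal_code);</pre>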
<p>I think the key point about modelling geography is that just because you know how addresses work in your own country, you can&#8217;t assume that they work like that in the country next door. If you have a requirement to report, for example, the efficiency of the postal service in delivering your goods by postal region, you need to ensure that your reporting handles the anomalies and exceptions. As always, knowing your data is key to creating a correct model.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/05/geography-hierarchies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Real-time BI: Federated OLTP/EDW Reporting</title>
		<link>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/</link>
		<comments>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/#comments</comments>
		<pubDate>Mon, 16 May 2011 16:42:41 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>
		<category><![CDATA[Oracle Warehouse Builder]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=8243</guid>
		<description><![CDATA[The typical approach in Federated OLTP/EDW reporting environments is to use a BI tool such as OBIEE to do horizontal federation. This means combining data from multiple sources at the same grain in a single logical table. One note of clarification: my use of the word &#8220;federated&#8221; might be a misnomer, and I apologize in [...]]]></description>
			<content:encoded><![CDATA[<p>The typical approach in Federated OLTP/EDW reporting environments is to use a BI tool such as OBIEE to do horizontal federation. This means combining data from multiple sources at the same grain in a single logical table. One note of clarification: my use of the word &#8220;federated&#8221; might be a misnomer, and I apologize in advance. As I argued in the <a href="http://www.rittmanmead.com/2011/05/real-time-bi-an-introduction/">last post</a>, the best practice for performance reasons is to actually stream, or &#8220;GoldenGate&#8221; the source system data to a foundation layer on the data warehouse instance. But old habits die hard, so I&#8217;ll continue to refer to this as &#8220;federation&#8221; even though it may not be technically accurate. Thanks for the latitude.</p>
<p>One of the sources for federation is a classic, batch-loaded EDW, with ETL processes that load conformed dimension tables, followed by fact tables that store the measures and calculations for the enterprise. Oracle Warehouse Builder (OWB), the ETL tool built inside the Oracle Database, is a standard choice for data warehouses built on the Oracle Database, and below, I show a sample process flow of what that batch load might look like:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/batch-DW.png" alt="Batch DW" border="0" width="600" height="326" /></p>
<p>Logical table sources (LTS’s) are a key feature of the OBIEE semantic model, but they are often misunderstood. Each LTS represents one physical location where the data for a logical fact table or logical dimension table exists. A logical table in the BMM can have multiple LTS’s for either of the following reasons:</p>
<p>1. Combining different table sources into a single logical table at different levels of granularity. Aggregate tables holding data pre-aggregated at a higher level of a hierarchy are a common example of this scenario, which is known as &#8220;vertical fragmentation&#8221;.</p>
<p>2. Combining different table sources into a single logical table at the same level of granularity. Data that exists in two different locations but needs to be combined in particular situations is a common example of this scenario, which is known as &#8220;horizontal fragmentation&#8221;.</p>
<p>Using horizontal fragmentation in OBIEE, we can map a single logical fact table to multiple LTS’s. For example, suppose we had a physical fact table in our EDW called SALES_FACT. To represent that fact table in the semantic model, we would create a logical fact table in the BMM, called “Sales Fact Realtime” in this example, and create an LTS that maps to the SALES_FACT table. We would also map a second LTS to the same data as it exists in the source system. Because the source system is transactional and likely exists in third normal form (3NF), the LTS that maps to the transactional schema would likely not be a simple one-to-one mapping. In 3NF, we would likely have to join multiple tables in our source system to represent the logical fact table Sales Fact Realtime:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/source-to-target-fact.png" alt="Source to target fact" border="0" width="600" height="270" /></p>
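<p>Here is a minimal sketch of the work the source system LTS has to do, using Python&#8217;s built-in sqlite3 module as a stand-in for the Oracle source schema. The table and column names (CUSTOMERS, POS_TRANS_HEADER, POS_TRANS, CUST_ID, TRANS_ID, TRANS_DATE, SAL_AMT, CUST_LAST_NAME) come from the generated SQL later in this post. The sample rows and function names are hypothetical, and CUSTOMER_DEMOG_TYPES is left out for brevity. The point is that a multi-table 3NF join is needed to present transactional data at the same grain as SALES_FACT joined to CUSTOMER_DIM:</p>
<pre>
import sqlite3


def build_source_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical 3NF point-of-sale schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_last_name TEXT);
        CREATE TABLE pos_trans_header (trans_id INTEGER PRIMARY KEY,
                                       cust_id INTEGER, trans_date TEXT);
        CREATE TABLE pos_trans (trans_id INTEGER, sal_amt REAL);
        INSERT INTO customers VALUES (1, 'Carr'), (2, 'Smith');
        INSERT INTO pos_trans_header VALUES (10, 1, '2011-05-16'), (11, 2, '2011-05-16');
        INSERT INTO pos_trans VALUES (10, 5.0), (10, 7.5), (11, 3.0);
    """)
    return conn


def source_fact_rows(conn: sqlite3.Connection) -> list:
    """Join the 3NF tables so each row has the shape of SALES_FACT joined to
    CUSTOMER_DIM: (customer last name, transaction date, sale amount)."""
    return conn.execute("""
        SELECT c.cust_last_name, h.trans_date, t.sal_amt
        FROM pos_trans t
        JOIN pos_trans_header h ON t.trans_id = h.trans_id
        JOIN customers c ON h.cust_id = c.cust_id
        ORDER BY c.cust_last_name, t.sal_amt
    """).fetchall()


print(source_fact_rows(build_source_schema()))
</pre>
<p>Run as-is, this prints three rows: two for Carr and one for Smith. Each row pairs the header&#8217;s TRANS_DATE with the line-level SAL_AMT.</p>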
<p>We would have to do something comparable with the Customer Dimension:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/source-to-target-dimension.png" alt="Source to target dimension" border="0" width="600" height="297" /></p>
<p>With the two LTS&#8217;s in place, we still need to configure the horizontal fragmentation. For this implementation, I have configured a repository variable called RV_REALTIME_THRESHOLD_DT, with an initialization block that keeps its value at TRUNC(SYSDATE) (midnight at the start of the current day). I use this variable as the threshold between reporting against the EDW schema and the source system schema.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/init-block1.png" alt="Init block" border="0" width="530" height="439" /></p>
<p>Once I have the variable available, I can configure the fragmentation on the fact table to use the threshold to determine the appropriate source for a particular record. This is simpler for the EDW LTS: its fragmentation content covers all rows with a transaction date earlier than the threshold date:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/fragmentation-EDW.png" alt="Fragmentation EDW" border="0" width="432" height="508" /></p>
<p>The source system LTS needs more care. Only the source system contains the newer rows needed to layer in real-time data, but both the EDW and the source system contain historic data, albeit the EDW data is likely transformed to a certain degree. So we configure fragmentation using the RV_REALTIME_THRESHOLD_DT variable, and we also use that variable as a filter on the source system LTS to make sure we don&#8217;t double-count the historic data.</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/fragmentation-OLTP.png" alt="Fragmentation OLTP" border="0" width="436" height="507" /></p>
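<p>The two fragmentation rules, plus the extra filter on the source system LTS, can be sketched as a simple routing decision in Python. This is only an analogy, not how the BI Server is implemented. realtime_threshold() stands in for the initialization block (the equivalent of TRUNC(SYSDATE)), and all function names are hypothetical. EDW rows are valid strictly before the threshold, and source system rows are valid on or after it:</p>
<pre>
from datetime import date, datetime, time


def realtime_threshold(today: date) -> datetime:
    """Equivalent of TRUNC(SYSDATE): midnight at the start of `today`."""
    return datetime.combine(today, time.min)


def serving_source(trans_ts: datetime, threshold: datetime) -> str:
    """Name the logical table source whose fragmentation content covers this row."""
    # EDW LTS: transaction date before the threshold.
    # Source system LTS: transaction date on or after the threshold.
    return "EDW" if trans_ts < threshold else "OLTP"


def filter_source_rows(rows: list, threshold: datetime) -> list:
    """Apply the threshold as a WHERE clause on the source system LTS so that
    historic rows, which the EDW already holds, are not counted twice.
    Each row is assumed to be a (timestamp, amount) tuple."""
    return [row for row in rows if row[0] >= threshold]


threshold = realtime_threshold(date(2011, 5, 16))
print(serving_source(datetime(2011, 5, 15, 23, 59), threshold))  # EDW
print(serving_source(datetime(2011, 5, 16, 9, 30), threshold))   # OLTP
</pre>
<p>The sample date matches the threshold literal in the generated SQL below. A timestamp one minute before midnight is routed to the EDW, and anything from midnight onward goes to the source system.</p>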
<p>What’s the result of all this complex mapping among different LTS’s in the BMM? OBIEE understands that each source schema is completely segmented, and the tables in each LTS never join to tables in the other LTS… but they do union. OBIEE will construct a complete query against the transactional schema, in this example, joining between the CUSTOMER_DEMOG_TYPES, CUSTOMERS, POS_TRANS and POS_TRANS_HEADER tables. Additionally, OBIEE will construct another complete query against the EDW schema, in this case, only the tables SALES_FACT and CUSTOMER_DIM. The BI Server then logically unions the results between the two source schemas into a single result set that is returned whenever a user builds a report against the logical tables Customer Dim and Sales Fact Realtime. So I run the following report against my fragmented Sales Fact Realtime:</p>
<p><img style="margin-left:auto;margin-right:auto" src="http://www.rittmanmead.com/wp-content/uploads/2011/05/high-level-report-federated.png" alt="High level report federated" border="0" width="461" height="468" /></p>
<p>The interesting part is how OBIEE does the logical union. When the EDW and the transactional schema exist in separate databases, the BI Server issues two different database queries and combines them into a single result set in its own memory space. However, if the schemas exist within the same database, as the Oracle Next-Generation Reference Architecture recommends, then the BI Server is able to issue a single query, transforming the logical union into an actual physical union in the SQL statement, as demonstrated in the statement below. Notice that the threshold filter has been applied to the source system branch, and that the UNION ALL was constructed in a single SQL statement pushed down from the BI Server to the Oracle Database holding the Foundation and Presentation and Access layers in our Oracle architecture:</p>
<pre>
WITH
SAWITH0 AS (select T44105.AMOUNT as c1,
     T44042.CUSTOMER_LAST_NAME as c2,
     T48199.CALENDAR_MONTH_NUMBER as c3,
     T48199.CALENDAR_YEAR as c4,
     T48199.SQL_DATE as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_EDW.CUSTOMER_DIM T44042,
     GCBC_EDW.SALES_FACT T44105
where  ( T44042.CUSTOMER_KEY = T44105.CUSTOMER_KEY and T44105.SALES_DATE_KEY = T48199.DATE_KEY ) ),
SAWITH1 AS (select T43971.SAL_AMT as c1,
     T43901.CUST_LAST_NAME as c2,
     T48199.CALENDAR_MONTH_NUMBER as c3,
     T48199.CALENDAR_YEAR as c4,
     T48199.SQL_DATE as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_CRM.CUSTOMERS T43901,
     GCBC_POS.POS_TRANS T43971,
     GCBC_POS.POS_TRANS_HEADER T43978
where  ( T43901.CUST_ID = T43978.CUST_ID
         and T43971.TRANS_ID = T43978.TRANS_ID
         <strong>and T48199.DATE_KEY =  TRUNC(T43978.TRANS_DATE)
         and T43978.TRANS_DATE &gt;= TO_DATE('2011-05-16 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') </strong>
       )),
SAWITH2 AS ((select concat(D0.c4, D0.c3) as c2,
     D0.c5 as c3,
     D0.c2 as c4,
     D0.c1 as c5
from
     SAWITH0 D0
union all
select concat(D0.c4, D0.c3) as c2,
     D0.c5 as c3,
     D0.c2 as c4,
     D0.c1 as c5
from
     SAWITH1 D0)),
SAWITH3 AS (select sum(D3.c5) as c1,
     D3.c2 as c2,
     D3.c3 as c3,
     D3.c4 as c4
from
     SAWITH2 D3
group by D3.c2, D3.c3, D3.c4)
select distinct 0 as c1,
     D2.c2 as c2,
     D2.c3 as c3,
     D2.c4 as c4,
     D2.c1 as c5
from
     SAWITH3 D2
order by c2, c4, c3
</pre>
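<p>To see why the threshold filter on the source system branch matters, here is a minimal Python sketch of what the pushed-down statement does. It performs a UNION ALL of the EDW rows with the filtered source system rows, then sums the amounts by month and customer last name, roughly as SAWITH2 and SAWITH3 do. The function name and sample data are hypothetical, and the month key is simplified to a YYYYMM string:</p>
<pre>
from collections import defaultdict
from datetime import date


def federated_totals(edw_rows: list, source_rows: list, threshold: date) -> dict:
    """Each row is a (sql_date, last_name, amount) tuple."""
    # Like SAWITH1: the source system branch keeps only rows on or after the threshold.
    recent = [row for row in source_rows if row[0] >= threshold]
    # Like SAWITH2: UNION ALL, so nothing is de-duplicated here.
    unioned = list(edw_rows) + recent
    # Like SAWITH3: SUM(amount) GROUP BY month, last name.
    totals = defaultdict(float)
    for sql_date, last_name, amount in unioned:
        totals[(sql_date.strftime("%Y%m"), last_name)] += amount
    return dict(totals)


edw = [(date(2011, 5, 2), "Carr", 10.0), (date(2011, 5, 15), "Carr", 5.0)]
source = [(date(2011, 5, 2), "Carr", 10.0),   # historic row, already in the EDW
          (date(2011, 5, 16), "Carr", 2.5)]   # today's row, only in the source
print(federated_totals(edw, source, date(2011, 5, 16)))  # {('201105', 'Carr'): 17.5}
</pre>
<p>Without the filter, the historic 10.0 would be counted twice and the total would come out as 27.5 instead of 17.5.</p>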
<p>But OBIEE is also capable of doing the fragmentation equivalent of &#8220;partition pruning.&#8221; When the BI Server has enough information to know that the entire result set will come from a single source, it issues SQL against only one of the LTS&#8217;s. For instance, if I click on one of the &#8220;SQL Date&#8221; values in the report above, which applies a filter on the fragmentation column, the BI Server knows that the result set comes only from the EDW:</p>
<pre>WITH
SAWITH0 AS (select sum(T44105.AMOUNT) as c1,
     concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER) as c2,
     T48199.DATE_KEY as c3,
     T48199.SQL_DATE as c4,
     T44042.CUSTOMER_LAST_NAME as c5
from
     GCBC_EDW.DATE_DIM T48199 /* CONFORMED_DATE_DIM */ ,
     GCBC_EDW.CUSTOMER_DIM T44042,
     GCBC_EDW.SALES_FACT T44105
where  ( T44042.CUSTOMER_KEY = T44105.CUSTOMER_KEY
         and T44042.CUSTOMER_LAST_NAME = 'Carr'
         and T44105.SALES_DATE_KEY = T48199.DATE_KEY
         <strong>and T48199.SQL_DATE = TO_DATE('2009-07-03' , 'YYYY-MM-DD')</strong>
         and concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER) = '200907' )
group by T44042.CUSTOMER_LAST_NAME,
         T48199.DATE_KEY,
         T48199.SQL_DATE,
         concat(T48199.CALENDAR_YEAR, T48199.CALENDAR_MONTH_NUMBER))
select distinct 0 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c1 as c6
from
     SAWITH0 D1
order by c2, c5, c4, c3</pre>
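<p>The pruning decision can be approximated as an interval-overlap check. The sketch below assumes that the report filter reduces to a single inclusive date range on the fragmentation column. The real BI Server evaluates the fragmentation content expressions defined on each LTS, and the function name here is hypothetical:</p>
<pre>
from datetime import date


def sources_for_range(start: date, end: date, threshold: date) -> list:
    """Return the logical table sources that can contribute rows to the
    inclusive filter range [start, end]."""
    sources = []
    if start < threshold:   # part of the range falls in the EDW fragment
        sources.append("EDW")
    if end >= threshold:    # part of the range falls in the real-time fragment
        sources.append("OLTP")
    return sources


threshold = date(2011, 5, 16)
print(sources_for_range(date(2009, 7, 3), date(2009, 7, 3), threshold))   # ['EDW']
print(sources_for_range(date(2011, 5, 1), date(2011, 5, 31), threshold))  # ['EDW', 'OLTP']
</pre>
<p>The first call corresponds to the drill on SQL Date = 2009-07-03 above. Only the EDW query is needed, so no UNION ALL is generated.</p>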
<p>Before closing this section of the real-time discussion, I want to take a minute to identify the strengths and weaknesses of this approach. This solution has several strengths. First, it is low-latency. When using the Oracle Next-Generation Reference Architecture, we have only the latency of streaming, or &#8220;GoldenGating,&#8221; the content from the source system to the DW database. With clients we&#8217;ve had in the past, this has ranged from a few seconds to several minutes, depending on the solution implemented. Additionally, there is no complex logical or physical data modeling and supporting ETL to deliver this solution, as there is with the EDW with a Real-Time Component, which we will explore in the next posting.</p>
<p>As far as weaknesses go, there will be a fair amount of complex RPD semantic-layer modeling. Obviously, the degree of difficulty depends on a number of factors: number of source systems integrated, number of subject areas, complexity of reports delivered, etc. Also, increased complexity of RPD modeling may introduce performance degradation as OLTP schemas have to be transformed &#8220;on the fly&#8221; to star schemas by the BI Server. But keep in mind&#8230; we are typically only doing this for at most a day&#8217;s worth of data, so with proper database tuning, this content can usually perform quite well.</p>
<p>Next up: EDW with a Real-Time Component</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/05/real-time-bi-federated-oltpedw-reporting/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Oracle Data Warehouse Global Leaders Webcast</title>
		<link>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/</link>
		<comments>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 18:47:23 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[BI (General)]]></category>
		<category><![CDATA[BI 2.0]]></category>
		<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Exadata]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=7593</guid>
		<description><![CDATA[I&#8217;m honored to be delivering a webcast for the Oracle Data Warehouse Global Leaders Program on Tuesday, March 22 at Noon EST. This is an elite program for key global data warehousing customers and is managed by the Oracle data warehousing product management team. It also provides a rich opportunity to network with peers, and these webcasts are one of [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m honored to be delivering a webcast for the Oracle Data Warehouse Global Leaders Program on Tuesday, March 22 at Noon EST. This is an elite program for key global data warehousing customers and is managed by the Oracle data warehousing product management team. It also provides a rich opportunity to network with peers, and these webcasts are one of the ways that Oracle delivers value to the program members. Anyone interested in the program or the webcast should <a href="mailto:dw-global-leaders_us@oracle.com" target="_blank">email the DW Global Leaders program</a>.</p>
<p>The subject matter will be &#8220;Agile Data Warehousing on Oracle Exadata and OBIEE 11g&#8221;. This is a subject I&#8217;ve been devoting a lot of time to lately, both in project delivery and in speaking. With two full data warehouse delivery projects on Exadata under my belt, and several other partial projects, I can say the Database Machine is absolutely a paradigm shift. But the real tipping point comes when these DW capabilities are combined with a powerful metadata layer, such as exists in OBIEE 11g. Over the last few years, I&#8217;ve adjusted and re-adjusted long-standing beliefs about how data warehouses should be built and delivered. While I&#8217;ll talk about what makes Exadata and OBIEE different, my main focus is demonstrating how to use the features to deliver BI in accordance with standard Agile concepts. I also have a series of blog posts planned to dive into this subject in detail.</p>
<p>If you&#8217;re interested in homework, I&#8217;ll be discussing the <a title="Drilling Down in the Oracle Next-Generation Reference DW Architecture" href="http://www.rittmanmead.com/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/">Oracle Next-Generation Data Warehouse Reference Architecture</a>, Exadata Smart-Scan, and the OBIEE Semantic Model. Additionally, I&#8217;ll spend some time on the <a href="http://agilemanifesto.org/">Agile Manifesto</a>, the generic <a href="http://en.wikipedia.org/wiki/Agile_software_development">agile development movement</a>, and what effect they have on DW delivery methodologies.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2011/03/oracle-data-warehouse-global-leaders-webcast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Partitioning Fact Tables, Part 1</title>
		<link>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/</link>
		<comments>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 00:39:09 +0000</pubDate>
		<dc:creator>Stewart Bryson</dc:creator>
				<category><![CDATA[Data Warehousing]]></category>
		<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle Database]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=5245</guid>
		<description><![CDATA[I&#8217;m dogmatic about certain aspects of data warehousing. For instance, fact tables should be range partitioned by DATE. I tell my clients all the time: you will have a very difficult time persuading me otherwise. But they always try: they argue about all the attributes that are more pervasive than DATE: customer classes, transaction types, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m dogmatic about certain aspects of data warehousing. For instance, fact tables should be range partitioned by DATE. I tell my clients all the time: you will have a very difficult time persuading me otherwise. But they always try: they argue about all the attributes that are more pervasive than DATE: customer classes, transaction types, etc., etc. But I&#8217;m just not buying it. We are building data warehouses, and the <a href="http://intelligent-enterprise.informationweek.com/030422/607warehouse1_1.jhtml?_requestid=5194">third rail</a> of the Soul of the Data Warehouse is how it handles time.</p>
<p>If you agree with me about this precept (and I really think you should), this is still not the end of the story. We must charge ahead into the lion&#8217;s den of a debate that has been raging in the Oracle data warehousing world for years: do we make the surrogate key of our date dimension a NUMBER, or do we make it a DATE? It&#8217;s funny&#8230; I remember this being the first question I ever posed to Mark years and years ago, and he did a blog entry that evolved out of our email communication. I don&#8217;t see the entry on the blog any more&#8230; it must have been lost in <a href="http://www.rittmanmead.com/category/the-great-blog-disaster/">The Great Blog Disaster</a>. Pity.</p>
<p>The choice between NUMBER and DATE bubbles up from the two streams at work in the Oracle Data Warehousing community: the data warehousing folks, and the Oracle folks. <a href="http://www.ralphkimball.com/">Ralph Kimball</a> argues that the surrogate key of the date dimension should be numeric. In the <a href="http://www.ralphkimball.com/html/booksDWLT2.html">Data Warehouse Lifecycle Toolkit</a> book (or at least, in my edition of it), Kimball basically makes the argument that numbers require less space than dates. That one never did too much for me. However, in his <a href="http://www.kimballgroup.com/html/designtipsPDF/KimballDT51LatestThinking.pdf">Latest Thinking on Time Dimension Tables</a> design tip, he makes a better argument: if our surrogate key is a DATE, then how do we handle &#8220;Not Applicable&#8221; type rows? This one has teeth, and I think that most designers who struggle with this decision point to this issue. If we use an actual DATE as our surrogate key, then what value can we use that actually means &#8220;no date at all&#8221;?</p>
<p>Oracle experts like <a href="http://asktom.oracle.com">Tom Kyte</a> argue that <a href="http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4632159445946">&#8220;dates belong in DATES&#8221;</a>. (If you look really hard at this post, you can see a younger and more naive version of myself weighing in on the debate&#8230; and also, apparently, not knowing how to gather histograms with DBMS_STATS. Oh well.) As Tom demonstrates on that post, the optimizer just plain works better when dates are stored in DATE datatypes.</p>
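<p>A quick worked example shows one commonly cited reason. Without histograms, an optimizer that assumes values are spread evenly between a column&#8217;s minimum and maximum estimates the selectivity of a range predicate roughly as (high - low) / (max - min). If dates are stored as YYYYMMDD numbers, each year boundary creates a large numeric gap that contains no real dates. The sketch below uses Python with hypothetical function names. It compares the two encodings for a range spanning New Year, on a column holding 2009 and 2010:</p>
<pre>
from datetime import date


def as_number_key(d: date) -> int:
    """Encode a date as a YYYYMMDD number, as a numeric surrogate key would."""
    return d.year * 10000 + d.month * 100 + d.day


def range_fraction(lo: int, hi: int, col_min: int, col_max: int) -> float:
    """Linear selectivity estimate for lo..hi on an evenly spread column."""
    return (hi - lo) / (col_max - col_min)


col_min, col_max = date(2009, 1, 1), date(2010, 12, 31)
lo, hi = date(2009, 12, 31), date(2010, 1, 1)

numeric_est = range_fraction(*(as_number_key(d) for d in (lo, hi, col_min, col_max)))
date_est = range_fraction(*(d.toordinal() for d in (lo, hi, col_min, col_max)))
print(as_number_key(hi) - as_number_key(lo))  # 8870, for a one-day gap
print(round(numeric_est, 3), round(date_est, 4))  # 0.797 0.0014
</pre>
<p>For a range covering only New Year&#8217;s Eve and New Year&#8217;s Day, the numeric encoding produces an estimate of about 80% of the column, while the DATE encoding gives about 0.14%.</p>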
<p>I&#8217;ve typically been on Kyte&#8217;s side in this debate, both from a performance and a maintenance perspective. I&#8217;ve parted ways with Kimball on this point and urged my clients to build date dimensions with DATE surrogate keys, calling the column something like DATE_KEY. For the &#8216;NA&#8217; types of dimension records, I use a wacky DATE value for DATE_KEY, such as &#8216;12/31/9999&#8217; or &#8216;01/01/0001&#8217;. Think of this as the equivalent of -1 if the surrogate key were actually numeric. Because DATE_KEY is a surrogate key, it really doesn&#8217;t matter what value it contains: we just need to know the column name so we can construct the correct JOIN syntax. Then, I&#8217;ll build another DATE column in the table called SQL_DATE, and this is the one that I expose to the reporting layer. Since SQL_DATE does not have to serve as the primary key, it&#8217;s fine for it to be NULL if desired.</p>
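<p>Here is a minimal sketch of that design. It is written in Python rather than DDL, and all names are hypothetical. The &#8220;Not Applicable&#8221; row gets a sentinel DATE_KEY of 9999-12-31, and its SQL_DATE is left empty (the equivalent of NULL). A fact with no date is pointed at the sentinel, so the join to the date dimension never fails:</p>
<pre>
from datetime import date
from typing import Optional

NA_DATE_KEY = date(9999, 12, 31)  # sentinel key, the DATE equivalent of -1

# DATE_KEY -> row attributes; only SQL_DATE is exposed to reporting.
date_dim = {
    NA_DATE_KEY: {"sql_date": None},                 # 'NA' row: SQL_DATE is NULL
    date(2010, 8, 12): {"sql_date": date(2010, 8, 12)},
}


def date_key_for(event_date: Optional[date]) -> date:
    """Surrogate key a fact row should carry for a possibly missing date."""
    return NA_DATE_KEY if event_date is None else event_date


for d in (date(2010, 8, 12), None):
    print(date_key_for(d), date_dim[date_key_for(d)]["sql_date"])
</pre>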
<p>In subsequent posts, I&#8217;ll examine new partitioning features in 11g, including interval partitioning (which Pete Scott recently <a href="http://www.rittmanmead.com/2010/08/07/more-on-interval-partitioning/">blogged</a> about) and reference partitioning, and whether these enhancements provide more options for this historically binary choice.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Oracle BI EE 10.1.3.4.1 &#8211; Currency Conversions &amp; FX Translations &#8211; Part 1</title>
		<link>http://www.rittmanmead.com/2010/06/oracle-bi-ee-10-1-3-4-1-currency-conversions-fx-translations-part-1/</link>
		<comments>http://www.rittmanmead.com/2010/06/oracle-bi-ee-10-1-3-4-1-currency-conversions-fx-translations-part-1/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 07:22:02 +0000</pubDate>
		<dc:creator>Venkatakrishnan J</dc:creator>
				<category><![CDATA[Dimensional Modelling]]></category>
		<category><![CDATA[Oracle BI Suite EE]]></category>

		<guid isPermaLink="false">http://www.rittmanmead.com/?p=4962</guid>
		<description><![CDATA[One of the common requirements when implementing BI EE is the ability to handle multiple input currencies. This is a pretty common requirement especially in business scenarios where multiple countries/currencies are involved. In such cases, many of the finance related measures like Sales etc will come in local currencies. So, as part of the BI [...]]]></description>
			<content:encoded><![CDATA[<p>One of the common requirements when implementing BI EE is the ability to handle multiple input currencies. This is a pretty common requirement especially in business scenarios where multiple countries/currencies are involved. In such cases, many of the finance related measures like Sales etc will come in local currencies. So, as part of the BI EE setup we need to ensure that such local currency transactions are converted to a common reporting currency. There are 2 types of currency conversions</p>
<p>1. Local Currency to Reporting Currency Conversion – This is the most common requirement, where individual transactions are converted into a common reporting currency and then rolled up for reporting.<br />
2. Reporting Currency Restatements – This is generally a finance requirement, where data in the common reporting currency (assuming the input data itself arrives in the reporting currency) has to be analyzed against varying rates. I will cover this in the next blog post.</p>
<p>In this blog post I will cover the first requirement, i.e. converting local currency to reporting currency. I shall be using a modified form of the Oracle Sample SH schema. The high-level physical schema diagram is given below:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 5" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture5.png" border="0" alt="Picture 5" width="422" height="315" /></p>
<p>Basically, every transaction in the SALES fact table was made in an individual country. For example, this fact table will have an AMOUNT_SOLD of, say, 100 EUR if the customer of the transaction is from Belgium (i.e. the product was bought in Belgium). The same fact table will also have an AMOUNT_SOLD of, say, 150 USD if the product was bought by a customer in the United States.</p>
<p>Every country has a single local currency (USD, EUR, GBP, etc.), so the Countries table above has CURRENCY_CODE as an attribute of a country.</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 6" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture6.png" border="0" alt="Picture 6" width="362" height="315" /></p>
<p>Finally, we have a rates table called CURRENCY_RATES, which stores the daily fluctuating rates. For the purposes of this blog post, I will assume that there is only one common reporting currency, USD.</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 7" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture7.png" border="0" alt="Picture 7" width="504" height="276" /></p>
<p>There are two ways to do currency conversion:</p>
<p>1. Do the Rate Multiplication only at the grain of the rates (Time &amp; Customer dimension) and not for every transaction.<br />
2. Do the Rate Multiplication to each and every transaction.</p>
<p>I will discuss both of the above techniques here. I generally prefer the first one, as in many cases it turns out to be much faster than the second.</p>
<p><strong>Rate Multiplication at Grain of Rates:</strong></p>
<p>Consider the following query:</p>
<pre>SELECT
CURRENCY_CODE,
COUNTRY_NAME,
A.TIME_ID,
AMOUNT_SOLD
FROM
SALES A,
TIMES B,
CUSTOMERS C,
COUNTRIES D
WHERE
A.TIME_ID = B.TIME_ID AND
A.CUST_ID = C.CUST_ID AND
C.COUNTRY_ID = D.COUNTRY_ID
ORDER BY 2,3</pre>
<p>The above query produces the following data.</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 8" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture8.png" border="0" alt="Picture 8" width="407" height="282" /></p>
<p>Our rates table has one rate for every Day/Currency combination:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 9" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture9.png" border="0" alt="Picture 9" width="404" height="142" /></p>
<p>As you can see above, we can arrive at the FX-converted amount in two ways. We can multiply each row in the SALES fact table by the rate and then do the roll-ups. For example:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 11" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture11.png" border="0" alt="Picture 11" width="478" height="164" /></p>
<p>Or we can roll up the transactions at the Day/Currency level and then multiply by the rates. In plain math, all we are relying on is the distributive law:</p>
<p align="center"><em><strong>a*d + b*d + c*d = (a+b+c)*d</strong></em></p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 12" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture12.png" border="0" alt="Picture 12" width="474" height="44" /></p>
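<p>Here is a quick numeric check of that identity, as a Python sketch with made-up amounts and a made-up rate. Decimal is used so that both sides compare exactly instead of picking up binary floating-point rounding. The rolled-up form needs one multiplication per Day/Currency instead of one per transaction:</p>
<pre>
from decimal import Decimal

# Hypothetical EUR transactions for one Day/Country, and that day's EUR-to-USD rate.
amounts = [Decimal("100"), Decimal("250"), Decimal("75")]
rate = Decimal("1.2345")

per_transaction = sum(a * rate for a in amounts)  # a*d + b*d + c*d: three multiplications
rolled_up = sum(amounts) * rate                     # (a+b+c)*d: one multiplication

print(per_transaction, rolled_up)  # 524.6625 524.6625
</pre>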
<p>We start with the latter, i.e. doing the multiplication once the roll-ups are done (but at the grain of the rates). To implement this in the repository, we need to model the rates as a separate fact table. The RATE measure can use Average as its aggregation rule. Any aggregation other than None works, because we always force the multiplication to happen at the lowest level, so the choice does not matter. Since RATE does not conform to other dimensions like Product, Promotions &amp; Channels, we need to assign the measure to the Total level of each of the non-conforming dimensions. We also need to assign RATE to the Day level of the Time dimension and the Country level of the Customer dimension. This ensures that we get a single, correct rate. To test whether the rates work correctly, let’s create a very simple report as shown below:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 15" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture15.png" border="0" alt="Picture 15" width="320" height="315" /></p>
<p>As you can see, we are able to produce RATE values even for the non-conforming CHANNEL dimension. Also, all the cities within a country get the same RATE. This is the effect of assigning levels to the RATE measure. Now create a logical column that multiplies RATE by the Sales measure:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 16" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture16.png" border="0" alt="Picture 16" width="409" height="250" /></p>
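<p>Outside the repository, the same &#8220;roll up first, then multiply&#8221; pattern can be sketched with Python&#8217;s sqlite3 module. This is a simplified, hypothetical version of the schema with a made-up rate. Sales carry a COUNTRY_ID directly instead of going through CUSTOMERS, and an inner join replaces the full outer join that the BI Server generates. Sales are summed to the Day/Country grain and rates are averaged to the same grain. Only then are the two multiplied, mirroring the SAWITH0, SAWITH1 and SAWITH2 blocks in the generated SQL later in this post:</p>
<pre>
import sqlite3

SETUP = """
CREATE TABLE countries (country_id INTEGER, country_name TEXT, currency_code TEXT);
CREATE TABLE sales (time_id TEXT, country_id INTEGER, amount_sold REAL);
CREATE TABLE currency_rates (rate_date TEXT, from_currency TEXT, rate REAL);
INSERT INTO countries VALUES (1, 'Belgium', 'EUR'), (2, 'United States', 'USD');
INSERT INTO sales VALUES ('2010-06-01', 1, 100), ('2010-06-01', 1, 50),
                         ('2010-06-01', 2, 150);
INSERT INTO currency_rates VALUES ('2010-06-01', 'EUR', 1.25), ('2010-06-01', 'USD', 1.0);
"""

CONVERT = """
WITH sales_at_rate_grain AS (           -- roll sales up to Day/Country first
    SELECT time_id, country_id, SUM(amount_sold) AS amount
    FROM sales GROUP BY time_id, country_id),
rates_at_rate_grain AS (                -- one rate per Day/Country
    SELECT r.rate_date AS time_id, c.country_id, AVG(r.rate) AS rate
    FROM currency_rates r JOIN countries c ON c.currency_code = r.from_currency
    GROUP BY r.rate_date, c.country_id)
SELECT s.time_id, s.country_id, s.amount, s.amount * r.rate AS amount_usd
FROM sales_at_rate_grain s
JOIN rates_at_rate_grain r ON s.time_id = r.time_id AND s.country_id = r.country_id
ORDER BY s.country_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
for row in conn.execute(CONVERT):
    print(row)
</pre>
<p>Belgium&#8217;s two EUR sales are summed to 150 before a single multiplication by 1.25, giving 187.5 USD.</p>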
<p>Since we are enforcing the levels on the RATE measure, the same enforcement happens for the resulting measure as well. Let’s create a report as shown below:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 19" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture19.png" border="0" alt="Picture 19" width="361" height="315" /></p>
<p>If we look at the SQL query, we will notice that the Rates and the Sales measure are joined only at the RATE grain (through an outer query block), rather than for every individual transaction, which would be unnecessary work.</p>
<pre>WITH
SAWITH0 AS (select D1.c1 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c6 as c6,
     D1.c7 as c7,
     D1.c8 as c8,
     D1.c9 as c9
from
     (select sum(T12697.AMOUNT_SOLD) as c1,
               T12623.CHANNEL_CLASS as c2,
               T12638.CUST_CITY as c3,
               T12623.CHANNEL_CLASS_ID as c4,
               T12710.FISCAL_YEAR as c5,
               T13704.COUNTRY_NAME as c6,
               T12710.TIME_ID as c7,
               T12710.FISCAL_YEAR_ID as c8,
               T13704.COUNTRY_ID as c9,
               ROW_NUMBER() OVER (PARTITION BY T12623.CHANNEL_CLASS_ID, T12638.CUST_CITY, T12710.TIME_ID, T13704.COUNTRY_ID ORDER BY T12623.CHANNEL_CLASS_ID ASC, T12638.CUST_CITY ASC, T12710.TIME_ID ASC, T13704.COUNTRY_ID ASC) as c10
          from
               TIMES T12710,
               COUNTRIES T13704,
               CUSTOMERS T12638,
               CHANNELS T12623,
               SALES T12697
          where  ( T12638.COUNTRY_ID = T13704.COUNTRY_ID and T12623.CHANNEL_ID = T12697.CHANNEL_ID and T12638.CUST_ID = T12697.CUST_ID and T12697.TIME_ID = T12710.TIME_ID )
          group by T12623.CHANNEL_CLASS, T12623.CHANNEL_CLASS_ID, T12638.CUST_CITY, T12710.FISCAL_YEAR, T12710.FISCAL_YEAR_ID, T12710.TIME_ID, T13704.COUNTRY_ID, T13704.COUNTRY_NAME
     ) D1
where  ( D1.c10 = 1 ) ),
SAWITH1 AS (select D1.c1 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c6 as c6
from
     (select avg(T17009.RATE) as c1,
               T12710.FISCAL_YEAR as c2,
               T13704.COUNTRY_NAME as c3,
               T12710.TIME_ID as c4,
               T12710.FISCAL_YEAR_ID as c5,
               T13704.COUNTRY_ID as c6,
               ROW_NUMBER() OVER (PARTITION BY T12710.TIME_ID, T13704.COUNTRY_ID ORDER BY T12710.TIME_ID ASC, T13704.COUNTRY_ID ASC) as c7
          from
               TIMES T12710,
               COUNTRIES T13704,
               CURRENCY_RATES T17009
          where  ( T12710.TIME_ID = T17009.RATE_DATE and T13704.CURRENCY_CODE = T17009.FROM_CURRENCY )
          group by T12710.FISCAL_YEAR, T12710.FISCAL_YEAR_ID, T12710.TIME_ID, T13704.COUNTRY_ID, T13704.COUNTRY_NAME
     ) D1
where  ( D1.c7 = 1 ) ),
SAWITH2 AS (select D1.c1 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c6 as c6,
     D1.c7 as c7,
     D1.c8 as c8,
     D1.c9 as c9,
     D1.c10 as c10
from
     (select SAWITH0.c2 as c1,
               case  when SAWITH0.c5 is not null then SAWITH0.c5 when SAWITH1.c2 is not null then SAWITH1.c2 end  as c2,
               case  when SAWITH0.c6 is not null then SAWITH0.c6 when SAWITH1.c3 is not null then SAWITH1.c3 end  as c3,
               SAWITH0.c3 as c4,
               SAWITH0.c1 as c5,
               SAWITH0.c1 * SAWITH1.c1 as c6,
               case  when SAWITH1.c4 is not null then SAWITH1.c4 when SAWITH0.c7 is not null then SAWITH0.c7 end  as c7,
               SAWITH0.c4 as c8,
               case  when SAWITH0.c8 is not null then SAWITH0.c8 when SAWITH1.c5 is not null then SAWITH1.c5 end  as c9,
               case  when SAWITH1.c6 is not null then SAWITH1.c6 when SAWITH0.c9 is not null then SAWITH0.c9 end  as c10,
               ROW_NUMBER() OVER (PARTITION BY SAWITH0.c2, SAWITH0.c3, SAWITH0.c4, case  when SAWITH0.c5 is not null then SAWITH0.c5 when SAWITH1.c2 is not null then SAWITH1.c2 end , case  when SAWITH0.c6 is not null then SAWITH0.c6 when SAWITH1.c3 is not null then SAWITH1.c3 end , case  when SAWITH0.c8 is not null then SAWITH0.c8 when SAWITH1.c5 is not null then SAWITH1.c5 end , case  when SAWITH1.c4 is not null then SAWITH1.c4 when SAWITH0.c7 is not null then SAWITH0.c7 end , case  when SAWITH1.c6 is not null then SAWITH1.c6 when SAWITH0.c9 is not null then SAWITH0.c9 end  ORDER BY SAWITH0.c2 ASC, SAWITH0.c3 ASC, SAWITH0.c4 ASC, case  when SAWITH0.c5 is not null then SAWITH0.c5 when SAWITH1.c2 is not null then SAWITH1.c2 end  ASC, case  when SAWITH0.c6 is not null then SAWITH0.c6 when SAWITH1.c3 is not null then SAWITH1.c3 end  ASC, case  when SAWITH0.c8 is not null then SAWITH0.c8 when SAWITH1.c5 is not null then SAWITH1.c5 end  ASC, case  when SAWITH1.c4 is not null then SAWITH1.c4 when SAWITH0.c7 is not null then SAWITH0.c7 end  ASC, case  when SAWITH1.c6 is not null then SAWITH1.c6 when SAWITH0.c9 is not null then SAWITH0.c9 end  ASC) as c11
          from
               SAWITH0 full outer join SAWITH1 On SAWITH0.c7 = SAWITH1.c4 and SAWITH0.c9 = SAWITH1.c6
     ) D1
where  ( D1.c11 = 1 ) )
select SAWITH2.c1 as c1,
     SAWITH2.c2 as c2,
     SAWITH2.c3 as c3,
     SAWITH2.c4 as c4,
     SAWITH2.c5 as c5,
     SAWITH2.c6 as c6
from
     SAWITH2
order by c1, c2, c3, c4</pre>
<p>Although the query above looks large, it performs well because the multiplication is applied only to the already aggregated rows, not to every transaction. There is one downside to this approach, though. If the report does not include the Time or Customer dimension, the currency-converted measure is still returned at the Day and Country grain, as shown below:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 20" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture20.png" border="0" alt="Picture 20" width="165" height="315" /></p>
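<p>Rolling these rows up to the report grain amounts to converting at the Day and Country grain first and summing afterwards. Below is a minimal physical-SQL sketch of that idea for a report that shows only Channel Class. It reuses the table and column names from the generated query above, but it is hand-written, not BI Server output. It also uses inner joins instead of the full outer join that BI Server generates, so any day that has sales but no rate is dropped.</p>
<pre>-- Hand-written sketch (not BI Server output): convert at Day / Country, then roll up.
WITH sales_day_country AS (
  -- Measure aggregated to the grain at which the rate is enforced, plus the report column
  SELECT ch.CHANNEL_CLASS, s.TIME_ID, cu.COUNTRY_ID, SUM(s.AMOUNT_SOLD) AS amount_sold
  FROM SALES s
  JOIN CHANNELS ch  ON ch.CHANNEL_ID = s.CHANNEL_ID
  JOIN CUSTOMERS cu ON cu.CUST_ID = s.CUST_ID
  GROUP BY ch.CHANNEL_CLASS, s.TIME_ID, cu.COUNTRY_ID
),
rate_day_country AS (
  -- One rate per day and country, mirroring SAWITH1 in the generated SQL
  SELECT r.RATE_DATE AS TIME_ID, co.COUNTRY_ID, AVG(r.RATE) AS rate
  FROM CURRENCY_RATES r
  JOIN COUNTRIES co ON co.CURRENCY_CODE = r.FROM_CURRENCY
  GROUP BY r.RATE_DATE, co.COUNTRY_ID
)
-- Multiply at Day / Country grain, then sum up to Channel Class only
SELECT sdc.CHANNEL_CLASS,
       SUM(sdc.amount_sold * rdc.rate) AS amount_converted
FROM sales_day_country sdc
JOIN rate_day_country rdc
  ON rdc.TIME_ID = sdc.TIME_ID
 AND rdc.COUNTRY_ID = sdc.COUNTRY_ID
GROUP BY sdc.CHANNEL_CLASS
ORDER BY sdc.CHANNEL_CLASS</pre>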
<p>In such cases, the only way to roll these rows up is to use a Pivot Table (or custom Logical SQL). BI Server currently cannot roll up a level-based measure once its levels have been enforced. The other option is to enforce a filter whenever the Time or Customer dimension is not chosen.</p>
<p>The biggest advantage of this method, though, is in currency conversions for YTD and MTD measures. For these measures, the requirement may be to multiply by the latest rate for that month or year rather than by the rate for each day. In such cases, all we need to do is create Rate YTD and Rate MTD fact tables, as shown below:</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 21" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture21.png" border="0" alt="Picture 21" width="220" height="154" /></p>
<p>The rate measure in each of these separate logical fact tables is then assigned to the Month and Year level respectively.</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 22" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture22.png" border="0" alt="Picture 22" width="413" height="180" /></p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 23" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture23.png" border="0" alt="Picture 23" width="411" height="180" /></p>
<p>Their respective Logical Table Sources have filters applied, as shown below for the YTD case.</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 24" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture24.png" border="0" alt="Picture 24" width="500" height="146" /></p>
<p>This lets us control which rate is multiplied with which measure.</p>
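<p>The "latest rate for the year" logic can also be expressed in plain SQL. The sketch below is not the Logical Table Source filter from the screenshot. It assumes that "latest" means the rate with the greatest RATE_DATE within each fiscal year for each currency, and it multiplies that rate with each country's sales total for the year. The CTE names are hypothetical, and the tables and columns are the ones seen in the generated SQL above. For a year still in progress, the total is effectively year-to-date.</p>
<pre>-- Sketch: one rate per fiscal year and currency, taken from the latest RATE_DATE in the year
WITH year_end_rate AS (
  SELECT fiscal_year_id, from_currency, rate
  FROM (
    SELECT t.FISCAL_YEAR_ID AS fiscal_year_id,
           r.FROM_CURRENCY  AS from_currency,
           r.RATE           AS rate,
           -- rn = 1 marks the most recent rate within the fiscal year
           ROW_NUMBER() OVER (PARTITION BY t.FISCAL_YEAR_ID, r.FROM_CURRENCY
                              ORDER BY r.RATE_DATE DESC) AS rn
    FROM CURRENCY_RATES r
    JOIN TIMES t ON t.TIME_ID = r.RATE_DATE
  )
  WHERE rn = 1
),
sales_year_country AS (
  -- Sales aggregated to the Year / Country grain at which the YTD rate applies
  SELECT t.FISCAL_YEAR_ID   AS fiscal_year_id,
         co.COUNTRY_ID      AS country_id,
         co.CURRENCY_CODE   AS currency_code,
         SUM(s.AMOUNT_SOLD) AS amount_sold
  FROM SALES s
  JOIN TIMES t      ON t.TIME_ID = s.TIME_ID
  JOIN CUSTOMERS cu ON cu.CUST_ID = s.CUST_ID
  JOIN COUNTRIES co ON co.COUNTRY_ID = cu.COUNTRY_ID
  GROUP BY t.FISCAL_YEAR_ID, co.COUNTRY_ID, co.CURRENCY_CODE
)
-- One multiplication per year and country, using the year's latest rate
SELECT syc.fiscal_year_id,
       syc.country_id,
       syc.amount_sold * yer.rate AS amount_converted_ytd
FROM sales_year_country syc
JOIN year_end_rate yer
  ON yer.fiscal_year_id = syc.fiscal_year_id
 AND yer.from_currency  = syc.currency_code
ORDER BY syc.fiscal_year_id, syc.country_id</pre>
<p>An MTD variant would follow the same pattern, partitioning by a month key of TIMES instead of FISCAL_YEAR_ID. Which month column to use depends on your time dimension and is not shown in the original model.</p>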
<p><strong>Rate Multiplication for every transaction:</strong></p>
<p>This method suits cases where the rate is stored as an attribute of the fact rows themselves, rather than in a separate fact table with a different grain. Sometimes we genuinely need to multiply the rate for each and every transaction. In such cases, we can bring the rates table in as a table inner-joined to the main fact (or model it as a dimension, depending on what is required).</p>
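<p>Two hypothetical transactions show why the order of multiplication and aggregation matters. The amounts and rates are made up for illustration, and <code>dual</code> makes the query runnable on Oracle without any tables. Converting each transaction and then summing does not give the same result as summing first and multiplying by an averaged rate.</p>
<pre>-- Two hypothetical transactions for one country, on two days with different rates
WITH sample_sales AS (
  SELECT 100 AS amount_sold, 1.30 AS rate FROM dual UNION ALL
  SELECT 200 AS amount_sold, 1.20 AS rate FROM dual
)
SELECT SUM(amount_sold * rate)      AS converted_per_transaction, -- 130 + 240 = 370
       SUM(amount_sold) * AVG(rate) AS converted_after_rollup     -- 300 * 1.25 = 375
FROM sample_sales</pre>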
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 25" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture25.png" border="0" alt="Picture 25" width="402" height="315" /></p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 26" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture26.png" border="0" alt="Picture 26" width="504" height="263" /></p>
<p>Now, if we create a report using this model,</p>
<p align="center"><img style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" title="Picture 27" src="http://www.rittmanmead.com/wp-content/uploads/2010/06/Picture27.png" border="0" alt="Picture 27" width="194" height="315" /></p>
<p>you will notice that the rates table is joined directly in the main fact query, with the rate applied inside the aggregate (sum(AMOUNT_SOLD * RATE)). This can sometimes perform better, especially when appropriate filters are applied. The SQL generated for this report is shown below:</p>
<pre>WITH
SAWITH0 AS (select T12623.CHANNEL_CLASS as c1,
     T13704.COUNTRY_NAME as c2,
     sum(T12697.AMOUNT_SOLD * T18227.RATE) as c3,
     T12623.CHANNEL_CLASS_ID as c4,
     T13704.COUNTRY_ID as c5
from
     COUNTRIES T13704,
     CUSTOMERS T12638,
     CHANNELS T12623,
     SALES T12697,
     CURRENCY_RATES_FACT T18227
where  ( T12623.CHANNEL_ID = T12697.CHANNEL_ID and T12638.COUNTRY_ID = T13704.COUNTRY_ID and T12638.CUST_ID = T12697.CUST_ID and T12697.CUST_ID = T18227.CUST_ID and T12697.TIME_ID = T18227.TIME_ID )
group by T12623.CHANNEL_CLASS, T12623.CHANNEL_CLASS_ID, T13704.COUNTRY_ID, T13704.COUNTRY_NAME)
select SAWITH0.c1 as c1,
     SAWITH0.c2 as c2,
     SAWITH0.c3 as c3
from
     SAWITH0
order by c1, c2</pre>
<p>The important point to note, though, is that this approach cannot easily provide the MTD and YTD rate conversions that we saw in the first method. It is possible, but it takes some work to make it perform well.</p>
<p>As you can see, both methods have their own pros and cons. Your actual scenario may of course be completely different (for example, you might store both the local currency and the reporting currency in the DW). Still, this should be a useful starting point for anyone facing a currency conversion requirement in BI EE.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rittmanmead.com/2010/06/oracle-bi-ee-10-1-3-4-1-currency-conversions-fx-translations-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

