December 14th, 2012 by Stewart Bryson
You guessed it… more on transactional schemas. For a quick recap, here are the posts to date:
In the last post on Simple Dimensions and Facts, we discussed logical facts and dimensions, factless fact tables, and fake count measures. In constructing the simple dimension in that post, we saw the collapsing of normalized relationships into a single conformed dimension table when building Dim – Customer. Another important entity for reporting in the Customer Tracking application is the Contact entity. At first glance, it might seem that we should simply build another dimension called Dim – Contact, and use analyses to combine our Customer and Contact dimensions along with our Activity fact table to analyze Customer and Contact behavior. When we look again at the data model for the Customer Tracking application below, Customer and Contact are clearly two separate entities in this application, and so this probably means that we should continue to report on them this way, and structure them as such in the OBIEE Business Model and Mapping layer. Right?
Is this the way that business users think about their data? Although it makes perfect sense in a normalized data model to have separate entities for Customer and Contact, the business user likely won’t see it this way, especially when it comes to defining reporting requirements. Contacts are clearly specific instances of a Customer at a lower level in the Customer-Contact hierarchy, and our end users will expect to be able to drill cleanly from Customer on a report down to the specific Contact that is associated with a particular activity. Although OBIEE supports building hierarchies across multiple logical dimension tables so we can provide the drill-down behavior mentioned here, the end user is still confronted with two entities when building the report. This is perhaps the most challenging part of reporting against transactional schemas: deciding not to settle for the model that the OLTP designers have given us. Instead of resigning to have both Dim – Customer and Dim – Contact in our logical BMM, we should attempt to build the appropriate dimensional rendering of this model: a single Dim – Contact dimension of which Customer is simply another element in the hierarchy.
The first step in this process is to rename Dim – Customer to Dim – Contact, as it’s always best to name dimension tables according to the lowest level in the hierarchy: the declared grain of the dimension table. Then I add the physical contact tables to the Dim – Contact logical table source (renaming that as well for clarity purposes), and map all the new Contact-specific attributes to the appropriate physical table.
I also do some renaming of the Customer dimension, and include Contact Detail as the new, lowest level in the hierarchy.
I consider this merging of entities into a single logical dimension table to be more profound than our original constructing of Dim – Customer in the Simple Fact and Dimension post. Although we combined multiple normalized tables to get our original Customer dimension, those tables represented labels and descriptors, such as Customer Category and Customer Status. They didn’t represent true entities, which I argue is the distinction that exists when combining Customer and Contact. This type of high-level distinction comes into play when we try to think about our source model in dimensional terms instead of transactional ones.
An important feature of OBIEE when considering whether to build logical table sources with increasing amounts of physical tables added to them is performance. It might seem that combining the Customer and Contact entities in this way might have a negative impact on performance, because of all the tables that have to be joined to return records for the Dim – Contact dimension. Perhaps keeping the two dimension tables separate would mean less work that the database has to do. What if our query only requires Contact or Customer attributes in the result set, but not both? Doesn’t the BI Server have to include all the joins in the logical table source — in this case six tables — every time this dimension is used in an analysis? The answer is no. The BI Server is very, very smart, and in many ways acts like a database optimizer when structuring the physical SQL for an analysis. In each Intelligent Request Generation, it is aware of the smallest number of physical tables required to produce the result set in the logical query, and only includes those tables in the physical SQL. For instance, a sample analysis that returns only Customer attributes from our new Dim – Contact logical dimension only includes three tables in the SQL issued against the database:
Building a Date Dimension
One consistent element of every BI system is the necessity to handle dates effectively. For this purpose, when building a conformed data warehouse, we always build a date dimension of some kind or another. Ralph Kimball describes the special place that a date (or time) dimension plays in the data warehouse. We may have to split the difference here with Kimball, as you will see later. Regardless, we usually don’t have a date dimension table in our transactional schema. If you recall from earlier posts in this series, the application code for transactional systems usually exists in API’s designed specifically for single record “gets” and “puts”. For this reason, it’s very easy to develop a series of API’s designed to insert or return specific date attributes, and rarely do these systems deal with date conceptually as a hierarchy. It’s simply not part of what they are designed for. It’s not just dimensional models that require date dimensions: OBIEE as a product requires them in particular situations. If we want to capitalize on the built-in feature of automatic drill-down, then we have to build a logical dimension table that contains our levels: year, to quarter, to month, to day, for example. Additionally, the time-series capabilities that OBIEE has also require that our date attributes be configured in a logical hierarchy. We have two options for producing the non-existent date dimension:
- We can build a date dimension in our transactional database, or in our foundation layer, and create aliases for this physical table in the OBIEE physical layer. We can then use these aliases in our BMM to form one or more role-playing dimension tables for our facts.
- We can use the functionality in the OBIEE business model to logically “construct” a date dimension using a DATE attribute from the source table for the logical fact, and then use calculated attributes to flesh out the remainder of the required level-specific attributes.
The first bullet point is an attractive option, especially if the reporting environment is a replicated version of the transactional system, or better-yet, a foundation layer (described in this post), where we are able to inject additional physical structures specifically for reporting purposes. The join between the date dimension and the source table is constructed using a CAST of DATETIME to DATE for Oracle databases, which will generate the necessary TRUNC command in the join syntax. Although I recommend using DATE datatypes as the surrogate key in date dimensions, it’s not required, as date dimensions usually have a DATE dataype in them somewhere, and this column can be used as the join criteria in the physical model. Don’t be tempted into using OBIEE’s federation capabilities to use a date dimension from another database for this purpose. Though it may seem like the ideal solution on the surface, keep in mind that date-based filters would not be pushed down to the reporting database in this scenario, as these filters would have to be applied logically in memory by the BI Server. This produces a performance nightmare, and inevitably, the production DBA will blame OBIEE for not applying date filters on it’s queries.
Though it’s tempting to dive into the first bullet point further (feel free to leave comments about this), it seems like a fairly standard approach for developers familiar with using OBIEE against traditional data warehouses. We will instead focus on the second bullet point: constructing a date dimension out of sheer nothingness. We start by building an empty dimension table called Dim – Activity Date and mapping only one single attribute from our source table containing the DATE column, in this case, EBA_CUST_CUST_ACTIVITY.ACTIVITY_DATE. This will be the only attribute that we map back to a source table; all the remaining attributes will be calculated measures in the BMM. The OBIEE SQL language contains a series of date conversion functions specifically designed to extract date elements from a DATE datatype. Any functions that OBIEE can’t extract on it’s own can be extracted using the EVALUATE function to push database-specific syntax transparently down to the underlying database engine. I prefer to use the logical option Derived from existing columns using an expression as opposed to Derived from physical mappings whenever possible because we can reuse the Dim – Activity Date dimension with other facts, and we will only have to map a single column in the logical table source: the DATE attribute.
If we choose to use physical mappings instead, we will have to map all the attributes each time we add an additional logical table source to the logical dimension table. When our date dimension is complete, we’ll have a complete logical date dimension with accompanying dimension (hierarchy) ready to be used for drill-down, time-series, etc.
When we are all done with the BMM mapping, we can see how the BI Server generates the physical SQL against an Oracle Database. As we saw earlier, the BI Server doesn’t need to join to the EBA_CUST_CUST_ACTIVITY more than once, even though it’s now part of the logical table source for three logical tables: Dim – Customer Activity, Dim – Activity Date and Fact – Customer Activity.
In the next post, we’ll describe Complex Fact Tables… and see some really, really cool stuff. I promise!