Using OBIEE against Transactional Schemas Part 4: Complex Dimensions

December 14th, 2012 by

You guessed it… more on transactional schemas. For a quick recap, here are the posts to date:

Using OBIEE against Transactional Schemas Part 1: Introduction

Using OBIEE against Transactional Schemas Part 2: Aliases

Using OBIEE against Transactional Schemas Part 3: Simple Dimensions and Facts

Combining Entities

In the last post on Simple Dimensions and Facts, we discussed logical facts and dimensions, factless fact tables, and fake count measures. In constructing the simple dimension in that post, we saw the collapsing of normalized relationships into a single conformed dimension table when building Dim – Customer. Another important entity for reporting in the Customer Tracking application is the Contact entity. At first glance, it might seem that we should simply build another dimension called Dim – Contact, and use analyses to combine our Customer and Contact dimensions along with our Activity fact table to analyze Customer and Contact behavior. When we look again at the data model for the Customer Tracking application below, Customer and Contact are clearly two separate entities in this application, and so this probably means that we should continue to report on them this way, and structure them as such in the OBIEE Business Model and Mapping layer. Right?

Is this the way that business users think about their data? Although it makes perfect sense in a normalized data model to have separate entities for Customer and Contact, the business user likely won’t see it this way, especially when it comes to defining reporting requirements. Contacts are clearly specific instances of a Customer at a lower level in the Customer-Contact hierarchy, and our end users will expect to be able to drill cleanly from Customer on a report down to the specific Contact that is associated with a particular activity. Although OBIEE supports building hierarchies across multiple logical dimension tables so we can provide the drill-down behavior mentioned here, the end user is still confronted with two entities when building the report. This is perhaps the most challenging part of reporting against transactional schemas: deciding not to settle for the model that the OLTP designers have given us. Instead of resigning to have both Dim – Customer and Dim – Contact in our logical BMM, we should attempt to build the appropriate dimensional rendering of this model: a single Dim – Contact dimension of which Customer is simply another element in the hierarchy.

The first step in this process is to rename Dim – Customer to Dim – Contact, as it’s always best to name dimension tables according to the lowest level in the hierarchy: the declared grain of the dimension table. Then I add the physical contact tables to the Dim – Contact logical table source (renaming that as well for clarity purposes), and map all the new Contact-specific attributes to the appropriate physical table.

I also do some renaming of the Customer dimension, and include Contact Detail as the new, lowest level in the hierarchy.

I consider this merging of entities into a single logical dimension table to be more profound than our original constructing of Dim – Customer in the Simple Fact and Dimension post. Although we combined multiple normalized tables to get our original Customer dimension, those tables represented labels and descriptors, such as Customer Category and Customer Status. They didn’t represent true entities, which I argue is the distinction that exists when combining Customer and Contact. This type of high-level distinction comes into play when we try to think about our source model in dimensional terms instead of transactional ones.

An important feature of OBIEE when considering whether to build logical table sources with increasing amounts of physical tables added to them is performance. It might seem that combining the Customer and Contact entities in this way might have a negative impact on performance, because of all the tables that have to be joined to return records for the Dim – Contact dimension. Perhaps keeping the two dimension tables separate would mean less work that the database has to do. What if our query only requires Contact or Customer attributes in the result set, but not both? Doesn’t the BI Server have to include all the joins in the logical table source — in this case six tables — every time this dimension is used in an analysis? The answer is no. The BI Server is very, very smart, and in many ways acts like a database optimizer when structuring the physical SQL for an analysis. In each Intelligent Request Generation, it is aware of the smallest number of physical tables required to produce the result set in the logical query, and only includes those tables in the physical SQL. For instance, a sample analysis that returns only Customer attributes from our new Dim – Contact logical dimension only includes three tables in the SQL issued against the database:

Building a Date Dimension

One consistent element of every BI system is the necessity to handle dates effectively. For this purpose, when building a conformed data warehouse, we always build a date dimension of some kind or another. Ralph Kimball describes the special place that a date (or time) dimension plays in the data warehouse. We may have to split the difference here with Kimball, as you will see later. Regardless, we usually don’t have a date dimension table in our transactional schema. If you recall from earlier posts in this series, the application code for transactional systems usually exists in API’s designed specifically for single record “gets” and “puts”. For this reason, it’s very easy to develop a series of API’s designed to insert or return specific date attributes, and rarely do these systems deal with date conceptually as a hierarchy. It’s simply not part of what they are designed for. It’s not just dimensional models that require date dimensions: OBIEE as a product requires them in particular situations. If we want to capitalize on the built-in feature of automatic drill-down, then we have to build a logical dimension table that contains our levels: year, to quarter, to month, to day, for example. Additionally, the time-series capabilities that OBIEE has also require that our date attributes be configured in a logical hierarchy. We have two options for producing the non-existent date dimension:

  • We can build a date dimension in our transactional database, or in our foundation layer, and create aliases for this physical table in the OBIEE physical layer. We can then use these aliases in our BMM to form one or more role-playing dimension tables for our facts.
  • We can use the functionality in the OBIEE business model to logically “construct” a date dimension using a DATE attribute from the source table for the logical fact, and then use calculated attributes to flesh out the remainder of the required level-specific attributes.

The first bullet point is an attractive option, especially if the reporting environment is a replicated version of the transactional system, or better-yet, a foundation layer (described in this post), where we are able to inject additional physical structures specifically for reporting purposes. The join between the date dimension and the source table is constructed using a CAST of DATETIME to DATE for Oracle databases, which will generate the necessary TRUNC command in the join syntax. Although I recommend using DATE datatypes as the surrogate key in date dimensions, it’s not required, as date dimensions usually have a DATE dataype in them somewhere, and this column can be used as the join criteria in the physical model. Don’t be tempted into using OBIEE’s federation capabilities to use a date dimension from another database for this purpose. Though it may seem like the ideal solution on the surface, keep in mind that date-based filters would not be pushed down to the reporting database in this scenario, as these filters would have to be applied logically in memory by the BI Server. This produces a performance nightmare, and inevitably, the production DBA will blame OBIEE for not applying date filters on it’s queries.

Though it’s tempting to dive into the first bullet point further (feel free to leave comments about this), it seems like a fairly standard approach for developers familiar with using OBIEE against traditional data warehouses. We will instead focus on the second bullet point: constructing a date dimension out of sheer nothingness. We start by building an empty dimension table called Dim – Activity Date and mapping only one single attribute from our source table containing the DATE column, in this case, EBA_CUST_CUST_ACTIVITY.ACTIVITY_DATE. This will be the only attribute that we map back to a source table; all the remaining attributes will be calculated measures in the BMM. The OBIEE SQL language contains a series of date conversion functions specifically designed to extract date elements from a DATE datatype. Any functions that OBIEE can’t extract on it’s own can be extracted using the EVALUATE function to push database-specific syntax transparently down to the underlying database engine. I prefer to use the logical option Derived from existing columns using an expression as opposed to Derived from physical mappings whenever possible because we can reuse the Dim – Activity Date dimension with other facts, and we will only have to map a single column in the logical table source: the DATE attribute.

If we choose to use physical mappings instead, we will have to map all the attributes each time we add an additional logical table source to the logical dimension table. When our date dimension is complete, we’ll have a complete logical date dimension with accompanying dimension (hierarchy) ready to be used for drill-down, time-series, etc.

When we are all done with the BMM mapping, we can see how the BI Server generates the physical SQL against an Oracle Database. As we saw earlier, the BI Server doesn’t need to join to the EBA_CUST_CUST_ACTIVITY more than once, even though it’s now part of the logical table source for three logical tables: Dim – Customer Activity, Dim – Activity Date and Fact – Customer Activity.

In the next post, we’ll describe Complex Fact Tables… and see some really, really cool stuff. I promise!

Comments

  1. Emil Traicu Says:

    Hi Stewart,

    I’ve read with very high interest your posts regarding the use of OBIEE against transactional schemas. Very interesting. There are very few detailed posts regarding the use of OBIEE against transactional schema.
    As I’ve been tasked to create an OBIEE metadata model against transactional schemas, in particular a merchandising module of our ERP suite, I’d like to make 2 comments:

    A. in reference to part 3 [Simple Dimensions and Facts] – there looks to be at least 3 ways of building logical dimensions:
    1. by modeling the relationships between the tables that are making up the dimension in the BMM layer, as you suggested. This way all tables will map to a single LTS. One advantage is that outer joins between LTSS-s can be properly modeled.
    2. by modeling the relationships between the tables that are making up the dimension in the Physical Layer. This way every table will map to its own LTS in BMM. One disadvantage is that outer joins can’t be modeled here.
    Obviously there’d be a hybrid method as a combination of #1 and #2, i.e. all tables that are making up a level in the hierarchy will be modeled as LTSS-s for an LTS corresponding to that hierarchy level. This way the model would be more readable.
    3. by building views in the back end DB/schema that are already assembling the tables that are making up the dimension.
    Basically any of the aforementioned methods should work. However it looks like if one wants to make use of Aggregate Navigation with level-based fragmentation then the first 2 methods fail to work, especially when multiple hierarchies are being defined for dimensions. The only way I manage to make it work was to define back end DB views that are assembling all the tables making up the dimension and furthermore defining views for related shrunken dimensions.
    Maybe I missed something but I tried all the possible combinations of the first 2 aforementioned methods. As soon as I was trying to implement the aggregation navigation mechanism the BI server was throwing weird errors, despite that the consistency check have been successfully passed.
    So the question is – have you ever tried the implement the Aggregate Navigation with level-based fragmentation against transactional schema? If not I am afraid that ad-hoc queries running against transactional schema will take quite long to aggregate data.
    OBIEE 11.1.1.6.2 BP1 has been used

    B. in reference to part 4 [Complex Dimensions] – you merged the 2 dimensions, i.e. Dim-Customer and Dim-Contact, into Dim-Contact, with declared grain of Contact. I wonder how ‘Fact–Customer Activity’ relates to this new dimension. It looks like its granularity is Customer. Is it hooked up at the Customer level? How this originates in the physical layer? I don’t have enough info about the Customer Tracking application/schema but I am having hard time figuring out how you could define activities at the contact granularity. Besides a quick glance at the diagram in your post leads me to the conclusion that there’s a many to many relationship between customer and contacts which makes me questioning the hierarchical compliance of your merged dimension, unless there are additional business rules that would ensure this compliance.

    Thanks,

    Emil

  2. Stewart Bryson Says:

    @Emil… thanks for your verbose comments. I can only assume you got something out of this by all your feedback, so thanks for reading.

    A lot of people tend to build views in the backend (or opaque views in OBIEE… same difference). But hopefully, if these posts have shown anything, they’ve shown that this is not preferable, though admittedly sometimes necessary. The power of OBIEE is that it’s kind of a “dynamic view generator”… only including the smallest number of tables necessary to facilitate the query. Creating views would eliminate that functionality for the most part. To answer your question… yes, I’ve had a lot of success with Aggregate Navigation using this methodology. I actually prescribe Aggregate Navigation as the first step to solving performance for some clients before taking the plunge into a full-blown data warehouse, depending on requirements, budget, etc.

    The Contact belongs to one and only one Customer… at least with the data set that I created to use in the presentation. I didn’t look closely to see if I could add a contact to multiple Customers… I didn’t even try. The app is a bit dated, so I was really only interested in the schema. For my purposes, there was many-to-one from Contact to Customer. If you wanted to support many-to-many between Contact and Customer, then you are correct… this would not be a standard level hierarchy and you would need to separate the two entities.

    Stewart

Website Design & Build: tymedia.co.uk