I've finally gotten around to finishing this series... I believe it's been a year in the making. I'm planning to be more proactive about getting content on the blog. In truth, the main reason I'm putting this one to bed is that I have some exciting posts planned for the coming months, and I'd feel guilty writing those without closing this one off. For a recap, here are the previous entries:
In this post, I want to talk about complex facts. The OBIEE metadata layer is a very powerful force indeed, and I can't demonstrate all the possibilities for shifting a normalized schema into a dimensional one. Instead, I thought I would show one really interesting example... something extremely powerful, in hopes that others reading this series might find inspiration to try something daring themselves.
OLTP developers design their systems with one task in mind: facilitating the functions of the transactional system. Even when standard reporting is delivered as part of an OLTP system, the flexibility of the schema for reporting is usually an afterthought, because the stated goal of these systems is (and should be) processing transactions as efficiently as possible. It should come as no surprise to BI developers when these systems store unrelated activities (unrelated at least from a source-system perspective) in completely separate tables. We often want to combine these activities in a coherent structure, usually an activity-based fact table, so that we can GROUP BY the Activity Type in a single report. It’s usually these advanced reporting requirements that cause us to pull out SQL Developer Data Modeler and start designing a data warehouse, which is probably a good idea. But for the reasons mentioned in the introductory blog post in this series, a data warehouse isn’t always in the cards. Desperate for their data, business analysts pull out that old Swiss Army knife, Excel, and start combining data sets and pivoting them to generate the reports that the business needs. We have to understand the need to deliver content to the business quickly: the business can't always wait for mounds of ETL code before making very important decisions.
The Business Model and Mapping layer in the OBIEE semantic layer gives us the ability to present a logical fact table, such as Fact - Customer Activity, as a shell for many types of activities, and even to combine those activities in intelligent and performant ways. The example we’ll construct from the Customer Tracker application (which has been the basis for all the previous entries) involves reporting against the activities stored in the EBA_CUST_CUST_ACTIVITY table. As mentioned earlier, this table tracks explicit CRM activities related to particular Customers and Contacts, including meetings, phone calls, etc. There are several implicit activities that would be useful to combine with these explicit events, such as Customer creation dates, Customer inactivity dates, etc. These implicit activities would look great combined in our Fact - Customer Activity table, so that we could include the type of activity in our GROUP BY list and return the results when we drill down to the detail level for a particular customer. We could try to build this integration in the dashboard itself, showing the account creation date from Dim - Contact on the same dashboard as the dates of CRM activities. But we should all admit that it's a better solution to build this functionality in the Business Model if possible. Is this feasible without ETL? Or would we have to stitch it together using presentation variables, or worse, have business analysts dump the results of two separate analyses into an Excel spreadsheet and produce the report they need outside of OBIEE?
So we want to add an implicit event to our Fact - Customer Activity logical table: the creation of a Customer account, which is represented in Customer Tracker by the CREATED_ON column in the EBA_CUST_CUSTOMERS table. We’ll start by adding another logical table source based on this source table and mapping the same hard-coded value of 1 to the Activity Count measure:
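As a rough illustration of why a constant 1 is enough to act as a measure, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table, its columns, and its rows are invented stand-ins for the real source table, and the query is only an approximation of the idea, not what the BI Server generates.

```python
import sqlite3

# Hypothetical stand-in for the customer source table; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, created_on TEXT);
    INSERT INTO customers VALUES
        (1, 'Acme',    '2011-03-01 09:15:00'),
        (2, 'Globex',  '2011-03-17 16:40:00'),
        (3, 'Initech', '2011-04-12 11:05:00');
""")

# The table has no numeric measure, so each row contributes a constant 1.
# Summing that constant yields a count of creation events per month.
creation_counts = conn.execute("""
    SELECT strftime('%Y-%m', created_on) AS activity_month,
           SUM(1)                        AS activity_count
    FROM customers
    GROUP BY activity_month
    ORDER BY activity_month
""").fetchall()

print(creation_counts)
```

Because every source row carries the same constant, the measure aggregates cleanly at whatever grain the analysis asks for.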
Remember: this is still a factless fact table, so we have to provide a measure to the OBIEE semantic layer to allow aggregate queries. We have a little more to do with these logical table sources, but first we need to make similar changes to several of our logical dimension tables. I'll preview two logical table sources below (both called Customers), and then explain them further down. The one on the left is a new logical table source for the Dim - Customer Activity table, while the one on the right is for the Dim - Activity Date table:
In the Dim - Customer Activity logical dimension table, we'll create a new logical table source that is also based on the EBA_CUST_CUSTOMERS table. As this is a dimension table and requires that we map a value for the primary key, we simply use the physical primary key from the EBA_CUST_CUSTOMERS table. Notice that we have constructed a new logical column called Activity Source Type. This attribute allows us to differentiate our explicit activities, those sourced directly from the EBA_CUST_CUST_ACTIVITY table, from the new implicit activity that we are constructing from the EBA_CUST_CUSTOMERS table. We also map hard-coded values to several other attributes in this dimension table, to compensate for the fact that our implicit activities have no values for them.
We also need to provide an additional logical table source for our date dimension, Dim - Activity Date. This is where the magic starts to happen. The creation of the Customer account is the event that we are interested in reporting on, so it’s this date that ties all the activities together. We’ll map the CREATED_ON date from EBA_CUST_CUSTOMERS to the single Activity Date column that we have defined in the logical table source, and let the other calculated columns provide the remaining attributes necessary in the logical dimension table. However, since the CREATED_ON column in the EBA_CUST_CUSTOMERS table is defined as a DATETIME attribute in the physical model (and we want it to remain that way when we view it as a dimensional attribute), we need to modify the expression slightly in the logical table source to remove the time element. As the calculation is not visible in the image above, I've listed it here:
Cast("orcl".""."CUST_TRACK"."EBA_CUST_CUSTOMERS"."CREATED_ON" AS DATE )
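To see why the time element has to go, consider this small Python sketch. The function name and the sample timestamps are hypothetical; the point is only that two events on the same day must resolve to the same day-grain value before they can share a row in a date dimension.

```python
from datetime import date, datetime

def to_activity_date(created_on: datetime) -> date:
    """Drop the time-of-day so the value matches a day-grain date key."""
    return created_on.date()

# Two invented account creations on the same day, at different times.
morning = datetime(2011, 3, 1, 9, 15)
evening = datetime(2011, 3, 1, 16, 40)

# As full timestamps the two values differ, so they could never share a date row.
timestamps_match = morning == evening

# Once the time is removed, both events land on the same activity date.
dates_match = to_activity_date(morning) == to_activity_date(evening)

print(timestamps_match, dates_match)
```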
The only remaining dimension table is Dim - Contact, but we don’t need to make any changes here, as the EBA_CUST_CUSTOMERS table is already adequately represented in its logical table source. Because we are bringing a logical table source associated with this logical dimension table into our logical fact table, the BI Server already understands how to construct the join (or more correctly, the lack of a join) with this logical dimension.
Now, we can return to the logical table sources for Fact - Customer Activity to exploit one final piece of sheer magic from the BI Server. For each logical table source, we select the Content tab and make the following changes:
There are a few really important bits that we are unlocking here. First, we need to check the option next to This source should be combined with other sources at this level. Ordinarily, the BI Server selects logical table sources using an OR evaluation: one LTS per combination of fact and dimension joins. (There are exceptions to this, but I'm distilling the content down a bit to hopefully make this easier to follow.) This setting instead dictates that the LTSs should be evaluated with an AND. We are instructing the BI Server to combine these two logical table sources as if they existed in the same table. This is done using a logical union, which manifests itself as an actual UNION statement in the physical SQL when both sources exist in the same database. We can see this behavior by examining the physical SQL generated by an analysis using Fact - Customer Activity:
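What follows is not the BI Server's physical SQL, only a minimal sketch of the shape of such a combined query, run against SQLite from Python. The table names, the rows, and the source-type labels 'CRM Activity' and 'Customer Creation' are all invented for illustration. I've also used UNION ALL in the sketch so that identical rows from the two branches are not collapsed.

```python
import sqlite3

# Hypothetical stand-ins for the two source tables; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, created_on TEXT);
    CREATE TABLE cust_activity (
        id INTEGER PRIMARY KEY, customer_id INTEGER, activity_date TEXT
    );
    INSERT INTO customers VALUES
        (1, '2011-03-01 09:15:00'),
        (2, '2011-03-02 16:40:00');
    INSERT INTO cust_activity VALUES
        (10, 1, '2011-03-05 10:00:00'),
        (11, 1, '2011-03-09 14:30:00'),
        (12, 2, '2011-03-10 08:45:00');
""")

# Each branch plays the role of one logical table source: it supplies the
# constant measure plus a hard-coded label saying where the row came from.
combined_sql = """
    SELECT activity_source_type, SUM(activity_count) AS activity_count
    FROM (
        SELECT 'CRM Activity' AS activity_source_type, 1 AS activity_count
        FROM cust_activity
        UNION ALL
        SELECT 'Customer Creation' AS activity_source_type, 1 AS activity_count
        FROM customers
    )
    GROUP BY activity_source_type
    ORDER BY activity_source_type
"""
activity_by_source = conn.execute(combined_sql).fetchall()

print(activity_by_source)
```

Both kinds of activity come back in one result set, grouped by their source type, which is the behavior we want from the single logical fact table.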
The more impressive functionality comes when we make use of the expression specified in Fragmentation content. This logic instructs the BI Server to do a kind of partition pruning, similar to the behavior of the Oracle Database optimizer when dealing with partitioned tables. What we have constructed here is really a form of logical partitioning, with the source for the Fact - Customer Activity logical fact table existing in two physical sources that act as logical partitions. So far, our query wasn’t precise enough to allow the BI Server to prune down to a single logical table source. However, when we filter on the Activity Source Type logical column (the same column we defined in our Fragmentation content section), either directly or by drilling down, the BI Server removes the UNION statement and generates a query against a single logical table source:
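The pruning decision itself can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the BI Server's algorithm: the `Fragment` class, the `build_query` function, the source-type labels, and the table names are all hypothetical, and real fragmentation content can be an arbitrary expression rather than a single equality.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Fragment:
    source_type: str  # the value this source declares in its fragmentation content
    sql: str          # the query that reads this source

# Hypothetical fragments; labels and table names are invented for illustration.
FRAGMENTS = [
    Fragment("CRM Activity", "SELECT 1 AS activity_count FROM cust_activity"),
    Fragment("Customer Creation", "SELECT 1 AS activity_count FROM customers"),
]

def build_query(source_type: Optional[str] = None) -> str:
    """Union every fragment that could satisfy the filter; prune the rest."""
    selected = [
        f for f in FRAGMENTS
        if source_type is None or f.source_type == source_type
    ]
    if not selected:
        raise ValueError(f"no fragment declares source type {source_type!r}")
    return "\nUNION ALL\n".join(f.sql for f in selected)

# Without a filter, both fragments are needed, so the query contains a union.
unfiltered = build_query()

# With a filter on the source type, only one fragment can hold matching rows.
pruned = build_query("Customer Creation")

print(unfiltered)
print(pruned)
```

The filter on the fragmentation column is what tells the model that the other source cannot contribute rows, so it can be left out of the query entirely.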
It’s still a best practice to build conformed data warehouses, and transactional reporting should be seen as a means to an end. Regardless, there will always be reasons to do transactional reporting, and the power of the Business Model and Mapping layer provides us with capabilities to deliver analyses and dashboards to replace the Excel spreadsheets that often form the cornerstone of transactional reporting. I hope you've enjoyed this series... and perhaps the long delays between each entry kept you on the edge of your seat for each new installment. Perhaps not. Regardless... drop me some comments and let me know what you think.