Is A Dimensional Model Suitable For All Types Of Reporting (Even Operational)?

June 15th, 2005 by Mark Rittman

Next week I’m off to see a client in Ireland, who’s using us to help put
together the design for their data warehouse. They’re looking to use tools such
as Business Objects, Discoverer and so forth, there’s a certain element of OLAP-style
decision support reporting required, but the vast bulk of the reporting they’re
going to need to do is simple, tabular reporting. Their current tool is Cognos
Impromptu, running against a normalised set of OLTP-style tables, and they want
the warehouse to take the load off of their existing systems and satisfy all of
their reporting needs, both operational and analytic/forecasting.

The question is, should the warehouse I put together for them be dimensional
in design? Or should it be a normalised, "Inmon"-style warehouse? Now normally,
I would stick to my guns when putting a dimensional data warehouse together, as
I’m of the opinion that even though the warehouse is designed around facts and
dimensions, it can still meet all reporting needs - there’s not something
special about dimensional designs that leave them unable to meet certain types
of operational reporting requirements. Sure, you’ve got denormalized data, and you might
need to set up a new fact table to provide a new set of reports where the data
doesn’t currently exist, but if you’re building a reporting system, regardless
of whether most of the reports are operational (and not OLAP, or crosstabs) - a
dimensional data model will still fit the bill. Or does it?

Now of course you could say that the operational reporting could be met from
the ODS. Certainly if the warehouse was only loaded weekly, and the ODS was
loaded daily or more frequently, I’d agree with you. But if the ODS (which I
usually consider as being the data integration layer, not the reporting layer)
is as up-to-date as the dimensional star schemas fed from it, why not do the
reporting off of the star schema?

So, what’s your opinion?

Comments

  1. raul Says:

    Mark, from a data point of view, sure you could report from either. However, there is always more than data, namely: the timeliness of data, system resources, and also the SLAs.
    If your ODS gets data throughout the day, and the warehouse is not refreshed until the night batch cycle happens (or is this a real-time data warehouse), then you won’t be able to meet certain reporting criteria from the data warehouse (e.g: today’s sales, as they happen). On the other hand, certain groups might want the data reported after it’s fully loaded, and validated - that means data warehouse.
    There are situations when you don’t want to subject your busy ODS to the random (timewise) heavy querying typical for reporting usage. Typically, the ODS supports tons of operational systems, which have higher priority then reporting.
    We are building 2 reporting environments at Canadian Tire, one will be tactical reporting from a 3NF ODS, the other one is the classical dimensional Cognos. For sure, reporting off the ODS will be the most exciting project.
    Good luck
    Raul

  2. Deepak Puri Says:

    Hi Mark,
    This is largely based on our Telecom Call Center Operational Reporting and Dashboarding experiences, using the SQL Server OLAP engine: we’ve been able to accommodate near real-time (upto 5-minute update cycle) requirements using incremental updates of OLAP cubes (typically taking a few seconds).
    Obviously, our operational Telecom OLAP cubes are much simpler than the cubes used for Customer Experience analysis (which are only updated nightly), and don’t need as much integration with “enterprise” dimensions. And even the operational cubes are rebuilt nightly, to allow for deletion of old data and for dimension maintenance.
    Regards,
    - Deepak

  3. Pete_s Says:

    I would get all the queries into one place - the DW!
    I would also go for dimensions and denormalized reference data. “Starflake” (halfway between star and snowflake) may be worth looking at but could cause a few problems in Business Objects Universe creation - not big problems, but the ones that make queries messy
    One problem to consider is how trickle (or micro-batch) data feeds of fact will work. I guess that a large number of aggregate tables is out - probably because of the need to maintain them in real time (especially for changes in reference data across history). But if you rely on DW methods such as bitmap indexes you might well have locking problems with continuous insert of fact.

  4. Mark Says:

    Thanks everyone for the feedback.
    One question though - what do you think about the suitability of a star schema for operational, non-OLAP reporting? My instinct says that a star schema is just another form of relational schema, and there’s nothing about it that precludes basic, operational reporting, but from our initial analysis, there’s not a lot of “crosstab” style reporting going on, and not many “facts”. Most of the data that will be reported on is textual, dimension attribute stuff - people’s names, addresses, courses they’re on and so forth.
    Now of course you can always put fact tables together - the fact could be people attending courses, or people buying products, or people registering change of addresses, with the dimensions holding the details of customers, products, courses and so on - but is this stretching the dimensional model too far, and should infact the schema be designed as a “normalised” data warehouse, not a dimensional one? As I say, my instinct is to go with the dimensional model regardless, but I’m just wondering what other people’s opinions are…

  5. Pete_s Says:

    Further thoughts - still go for a dimensional model but for the ‘fat’ base reference tables (looks like customer in your case) implement it ‘normalised’ (or partially denormalised) and put the higher levels of the customer hierarchy in a more traditional denormalised table to help roll-up queries.
    If your query tool baulks at single table queries (say, all customers who moved house this month) then cheat! Make a single row ‘dummy fact table’ and cartesian join it to the dimension table.

  6. Gabe Says:

    “They’re looking to use tools such as Business Objects, Discoverer and so forth, there’s a certain element of OLAP-style decision support reporting required, but the vast bulk of the reporting they’re going to need to do is simple, tabular reporting. Their current tool is Cognos Impromptu, running against a normalised set of OLTP-style tables, and they want the warehouse to take the load off of their existing systems and satisfy all of their reporting needs, both operational and analytic/forecasting.”
    One option would be to leave the existing operational reports as is

  7. Tricia Says:

    Mark - stick to the dimensional model. One thing to remember with reporting projects (and I’ve done several) is that the end users change their minds on what and how to report when they have different possibilities. OLAP takes advantage of pre-aggregating data, and that can positively impact both report performance and the users ability to abstract. When moving from an operational reporting to an analytic reporting culture, it takes users a while to understand that they can slice and dice at any level of a dimensional hierarchy and how to combine those different dimensions to show trends, patterns or items of interest. If you use database views to support reporting, you can always include some ODS data if needed to provide more context to dimensional data. Good luck!

  8. Mark Says:

    Thanks everyone for the advice.
    I’ve taken this all in and have got a plan now. With a bit of luck I’ll be able to post a follow-up article towards the end of next week / start of the week after letting you know how I’ve got on. Thanks again for all the advice.

  9. Kiriti Mukherjee Says:

    Hi Mark,
    I had similar doubts about creating a star schema having fact tables with lots of textual information. My client wants to have operational reporting (like invoices, order details etc) as well as some analytical capabilities (top 5 orders for the day by product by geography - and get details of the order - lots of textual attribute info).
    After a lot of debate, we decided to go for a star schema, and we have been hugely rewarded. I had the Orders fact table with about 10 facts and 100 attrubute fields and I let the users loose. I was amazed by the diversity of their analyses. We have the standard dimensions - customers (bill_to, ship_to etc), products, time, order type etc, but beyond that, the attribute fields helped the users get a lot of meaningful operational information. Yes, a case can be made for “junk” dimensions, but we realized too many of them will create just that - junk. If you have proper indices and partition your fact table, I can say from very recent experience, that it will work. I have now built fact tables for Orders, Invoices, Contracts and General Ledger from an Oracle Apps 11.5.9 database and data from other systems like BPCS and Great Plains.
    Thanks,
    Kiriti

  10. Mark Says:

    Hi Kiriti,
    Just to clarify something. In your reply you say that “I had the Orders fact table with about 10 facts and 100 attrubute fields and I let the users loose.”
    Does this mean that you put attributes (customer type, product size, postcode and so on) directly in the fact table? If so, why did you do this and not put them in the dimensions as normal? Was there some benefit to doing this? If you took attributes out of the dimensions and placed them directly in the fact table, how did you still support rolling up dimension members through hierarches, if the user wasn’t selecting attributes from a dimension record?
    I’d just be curious on how you approached this.

  11. David Aldridge Says:

    Late to the party, as usual.
    The key issue for me in deciding whether to go for a star or a more normalized model is the levels at which the different facts exist. If you have related metrics on annual purchase requirements, monthly inventory changes, daily deliveries and timestamp sales, and you have to run reports to compare annual purchase requirements with year-to-date deliveries then that’s going to be tough out of a single star, and I drift towards 3NF.

  12. Kiriti Says:

    Hi Mark,
    No, I did not put dimensional attributes on the fact table. For example, the product class, type etc that make up the product hierarchy belong to the product dimension. But if there are attributes of the order like customer’s PO number, payment cheque number, shipment waybill number etc - these are atributes of the order (the degenerate dimension, if you will) itself. These are put in the fact table. Customers who do a lot of analysis with atomic data find this kind of fact table design very useful.
    This kind of design, although very uncommon (this is the first time I have tried it with success) offers the benefits of both worlds - additive facts that can be sliced/diced/rolled up by various dimensions (like product, bill-to customers, ship-to customers, warehouse, order type etc) as well as detailed fact-attributes (like, for example, descriptive flexfields - these are pure text fields where customers store whatever information they want) for atomic level data analysis. I must admit that this kind of design will not be good cube material, but for that, we have aggregates built as materialized views.
    Thanks,
    Kiriti

  13. Vantueil Says:

    Thanks Kiriti– your design may not be so uncommon. I am in the clinical healthcare domain, and my “measures” (more properly “fact attributes”) are almost totally structured text attributes on atomic services. Analysis is interested in working with these attributes. The only real numerics are counts of services meeting certain conditions. A star schema approach has been historically resisted, but my intuition (thinking along lines similar to Kiriti) is that with properly designed dimensions the current analysis requirements can be met, and the model will be in place to support more complex dimensional analysis to come.

  14. Doo don Says:

    If you have non-OLAP reporting and you don’t have adhoc reporting requirements why should you built the dimensional model?. Converting data from relational (OLTP system) to dimensional format is generally complex and time consuming. Tuning the database for star transformation is lot more complex. Why somebody needs to go through all these if all the report queries are ‘known’ and we can properly index the database for those queries?

  15. Vantueil Says:

    The OLTP system in this particular environment is non-relational. The transformations getting to an analysis/reporting relational DW are horrendous already. And we have new business directions that convince us that upcoming query needs are far from known. But point taken– it will be a lot of work.

  16. Doo don Says:

    If one schema is used for both operational and OLAP reporting I would suggest to add the natural keys to fact tables for all slowly changing dimensions. In the initial posting Mark says his users are using Cognos Impromptu for reporting. I have seen a advantage for Power play cube building if star schema is used. But for Impromptu is was the opposite.

  17. Doo Don Says:

    I attended the ODTUG 2005 conference last week at New Orleans. One of the presentations was on Designing data warehouse for high performance by Vincent Chazhoor. The question ‘Relational Model or Dimensional Model’ was addressed in detail in the presentation. ODTUG usually posts the power point slids at their website every year. So I think it will happen this year also.

  18. Simon Kirk Says:

    In this instance, I would tend to agree with a previous posting (Doo Don).
    By “this instance”, I’m assuming the following conditions are met as part of
    your analysis:
    * Reporting requirements are known and static (little or no ad-hoc)
    * They are on the whole transactional, without the need for aggregation
    * The daily generation of these reports does not compromise NFR’s, SLA,s
    data consistency or ODS loading.
    * The scope of the reporting embraces the majority of the enterprise data
    held in the ODS.
    * The ODS is manageable to navigate and query by those identified to use it.
    Under these conditions, a dimension modelling process would surely be an
    unnecessary and manually intensive exercise to in effect, replicate the
    entire ODS data, with little or no added value at this time.
    An iterative assessment of the most frequently targeted ODS data, in
    conjunction with any “likely” ad-hoc and/or summary requirements in the near
    future could drive dimensional modelling of these key areas, on an as need
    basis.
    It depends I guess on the size of the business and diversity of business
    function.
    I have read this posting with interest, as we are starting an impact
    analysis of how best to improve our own warehouse. We are in the situation
    where we have a complex warehouse structure (resulting from multiple source
    merges and related timing issues, data consistency, and history tracking
    requirements). This makes it an environment unsuitable to all but a few key
    analysts. We wish therefore, to formally rename it as an ODS (which it is in
    all but name), restrict access to all but a few pre-designated expert users
    as gatekeepers (CIF - Inmon), and expand the existing suite of marts (in
    terms of granularity and business function) to plug the reporting gap that
    this creates.
    The difficulty - and this is where it relates to your situation - is that
    there will always be a need for granular ad-hoc analysis (in our world at
    least) for cross-referencing single customers (for example) meeting unusual
    criteria across any number of business functions at transaction level. The
    effort to pre-empt, and implement the potential scope of this in the marts
    is at least formidable, if not impossible. It is not necessarily desirable
    either; where multiple fact table joins can prove confusing to users,
    especially where inconsistent granularity exists. Also the ETL overhead is
    likely to create NFR issues. Thus a trade off is inevitable, where ODS
    reporting at transaction level is justified to meet the shortfall in mart
    scope.