Does OWB Create Logical, or Physical, Data Models (or a Bit of Both?)

July 10th, 2006 by

I’m just working through the Data Modelling part of my upcoming seminar, and
I’m at the point where I’m thinking about logical and physical modelling within
Warehouse Builder 10.2. Warehouse builder seems to mix up the concepts of
logical and physical modeling, and therefore I thought it worthwhile to work
through what I’m thinking, and perhaps get some input from other OWB users.

Most people’s understanding of logical and physical data warehouse data
modeling goes along these lines:

  • Your logical warehouse model consists of entities, relationships, keys,
    items (or columns) and so forth, is usually concerned with the normalized
    view of the warehouse (usually the staging or ODS layer), and is held in
    either spreadsheets or a tool such as Oracle Designer. You might also
    annotate your model with useful information ("metadata") and
    perhaps business rules, although this will usually go in the functional
    specification.
  • You might also have a logical model of the dimensional part of your
    warehouse, which contains elements such as dimensions, hierarchies, levels,
    attributes, measures and so on. It is product-neutral and again might be
    stored in Excel or Designer.
  • The physical model, usually automatically derived from the logical model,
    contains the same items, perhaps denormalized, but with physical attributes
    – datatype, index details, partitioning details, storage clauses and so on -
    added on. This will certainly cover both the normalized and dimensional
    views of the warehouse, and it may in fact be the first point at which
    dimensional objects are mentioned.

Now with warehouse builder, the physical and logic models of the warehouse
appear to be, to my mind, mixed-up together. One way you could look at it would
be that the logical model is what’s in the OWB repository, whilst the physical
model is the result of the logical model’s deployment – but that doesn’t give
you a phyiscal model as such, just a logical model and the results of it’s
deployment.

In fact the model that’s in the OWB repository contains both logical and
physical elements. It contains metadata and business rules (a.k.a. "data
rules") which could certainly be considered as part of the logical model,
and OLAP constructs that are held at a level of abstraction above the physical
implementation, to support the ability to deploy a dimension or cube to either
relational or multi-dimensional storage. And yet, what the OWB repository looks
most to me like is a model of the physical implementation of the warehouse with
elements of the logical model thrown in, but to work it through it’s probably
worth walking through some examples of how objects are represented in the
repository, to see if this idea holds water.

A first example is where the physical definition of objects in the OE schema
are imported into a data source model. In this case, the objects from the OE
schema appear in a data source module, and they’re clearly a straight
representation of the physical implementation of the objects, i.e. this is a
physical database model.

There’s no entities, relationships and so on. It’s a straight
physical model.

Now, say we do some data profiling on the OE schema, carry out
some corrections and define some data rules. A data rule is a logical construct
in Warehouse Builder that translates into a physical check constraint on the
physical table, and some instructions to OWB to wrap up the table, whenever it
is mapped to, in a construct that deals with rows that fail the data rule. So,
our model now contains logical information – the data rule – together with the
physical table design. It also contains functional information – a mapping to
implement a correction based on this data rule – and some additional tables that
the mapping will use when processing the correction.

Next, we define a dimension that takes it’s data from the
CUSTOMERS table. Some people would say that a dimension is a physical
implementation of the logical object, Customer, whilst others would argue that
the dimension object is a logical object in it’s own right. When we come to
create the customer dimension, it sits under the database node along with
tables, cubes, external tables and sequences, which would suggest that it’s a
physical object that’s held alongside the other physical objects in what must be
the physical model of the warehouse.

And yet – when you view the dimension in the Data Object Editor,
you can change the deployment type for the object from relational to
multi-dimensional, which suggest that the object we’re dealing with is a logical
representation of the dimension, with the physical implementation being hidden
away internally so as to present a unified approach to the developer. I can see
why this has been done – to keep away from the developer the complexities of an
OLAP deployment – but it does mean that OLAP objects in this model are logical
representations, not physical (although you can specify details of the physical
implementation, such as the AW name and tablespace within this logical model,
once you’ve chosen a storage method).

So my conclusion is, that OWB is a physical database modeler,
with certain details abstracted out (data rules, OLAP storage) to make life
"easier" for the developer. It’s not really a logical modeller,
although you could consider the representation of the warehouse in the OWB
repository as being the logical model, although I’m sure a user of Designer
would disagree.

Does anyone else have any views on this, perhaps a different
perspective? Add a comment if you have.

Comments

  1. Scott Powell Says:

    Mark, I agree 100%. OWB 10gR2 definitely seems to be a blend of both physical and logical items.
    There are really two items where I think things could have been done a bit better. The first is “locations”, they seem to logically represent an items such as “where to deploy my data warehouse”, but then there doesn’t appear to be an easy way to repoint it someplace new (for instance, from dev to production). Maybe I’m missing something, but I’d be a lot happier (I think…) if they had a purely 100% logical “location”, and then tied that to purely 100% physical servers / schemas.
    The other thing that I think is a bit clunky is that, when you “logically” design a dimension or cube using the wizards, it creates physical objects such as tables, sequences, etc. However, if you go back and make changes to the logical item – that doesn’t carry through to the physical objects – you have to go back and edit those manually.
    What do you think?
    p.s. Other than these two items, I haven’t really run into many problems with the logical vs. physical pieces. I think they’ve done a pretty good job at balancing the two.

  2. Robert Gauf Says:

    You’re right Mark, it is a mix, and not a clear cut one either (for an old Designer hack). You’ve actually got bits of three levels though. Try this thought:
    - Logical model elements are as you mentioned product neutral. And definitely implementation neutral. The dimensions / cubes are the best example (and rules too). I like starting with the cube and dimension description and then pushing down. Because I can talk to techie business folks about them. And directly relate them to business needs (like KPIs).
    - the physical model is product dependent, but still implementation neutral (or indifferent at least). The modeling of the cube as relational or OLAP fits here. As well as constraints, keys, indexes, views, MVs, sequences, etc to support all the content and transform / extract modules. Its the layer the module developer works at. And its the layer I add things into (like views and MVs) to make it easy for the business user (who thinks at the logical model layer).
    —> So confusion can arise with relational elements. Because a physical table may represent a logical element directly — OR — a physical view may better represent the logical element. But the process has probably been to consider the logical element (but not document it), define the table(s) to support it, and then create the view to physicalize it directly. Messy.
    - The third layer is the implementation / environment / deployment model. Scott’s comment hits this point. The tablespaces, the storage clauses, the server nodes, “grid”ed partitions. These all need modeling to properly deploy. But they quite often fall below the developer’s need (i.e. change index > no change to load flow, change partition > maybe change load). Important to see though, that they can be changed by the storage guru (aka DBA) without changing the upper level physical model (i.e. table is the same regardless of partitioning).
    So OWB does support all three levels. Does a good job of physical and implementation level. Supports a logical model, but not with the degree of model separation that Designer has. And improvements could be made at each level. Ah well.

  3. kosour Says:

    Mark, strictly speaking, unless you’ve deployed dwh object, you are in logical model enviroment. It doesn’t matter how many details about physical implementation you’ve pointed out. OWB can or can not decide to use this or other detail of real implementations. Right? :) So, the problem is how can I say to owb to use this or others details. I mean, OWB may not to have place to store particular object’s property. It’s obviously, because OWB is one step behind Oracle Database.

  4. Mark Says:

    Scott – two good points, thanks. I can’t help thinking, with locations (and indeed with all connections to Oracle and (particularly) non-Oracle datasources, the connection feature could have been done much better. Contrast to ETL tools such as Datastage, even DTS, which connect via ODBC and it’s only a matter of picking one connection, then picking another one to redirect the mapping. I know ODBC is traditionally Windows-only and isn’t really a suitable solution for Oracle, but there has to be a better way than the current setup, which as you say is clunky and doesn’t redirect well.
    Robert – good points, I’ll add it to my commentary. Thanks for this.
    Kosour – I have to disagree that anything in the model is by definition “logical” – you can have a model of a physical implementation, this is fairly standard stuff in Designer. One thing that you did pick up on (I think) though is having different object configurations for different locations – say a particular configuration when deployed to dev, another to prod to reflect the larger environment. Now that’d be useful.
    cheers
    Mark

  5. Robert van Roon Says:

    Mark,
    I am currently working in a data warehouse project using OWB r1. We currently had a discussion in my group about where should we document the different objects of the data warehouse: tables, views, the cubes, the hierarchies, indexes, mappings, but also the design decisions and business rules. This discussion came up with the question in mind how do we maintain the documentation if the data warehouse is changing in time.
    Most people within my group agreed that OWB represents the physical model of the data warehouse. The logical data model of the data warehouse should be made by using another tool, like Designer, PowerDesigner or Erwin Data Modeler, they said. At this point even the question came up why still using OWB for physical data modeling if you can make use of those tools that generates the physical data model.
    What’s your opinion? Do you know a way to generate the documentation from the Design Repository?
    Kind regards,
    Robert van Roon.

  6. Jean-Pierre Says:

    Hey Mark,
    Just to comment on the multiple configurations comment you made. I’m sure you know but that is supported in OWB 10.2. It allows you to keep the logical model separate from the storage for example (small for dev, larger for prod). It also throught the control center allows you to restrict who touches what.
    Adding configuration to design gets you the generated code for the various environments and this works for all objects (tables, mappings etc.).
    Thanks,
    JP

Website Design & Build: tymedia.co.uk