Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces

December 28th, 2011 by Stewart Bryson

In the previous post, I laid the groundwork for describing Extreme BI: a combination of Exadata and OBIEE delivered with an Agile spirit. I discussed that the usual approach to Agile data warehousing is not Agile at all due to the violation of it’s main principle: working software delivered iteratively.

If you haven’t already deduced from my first post — or if you haven’t already seen me speak on this topic — what I am recommending is bypassing, either temporarily or permanently, the inhibitors specific to data warehousing projects which limit our ability to deliver working software quickly. Specifically, I’m recommending that we wait to build and populate physical star schemas until a later phase, if at all. Remember the two reasons that we build dimensional models: model simplicity and performance. With our Extreme BI solution, we have tools to counter both of those reasons. We have OBIEE 11g, with a rich metadata layer that presents our underlying data model, even if it is transactional, as a star schema to the end user. This removes our dependency on a simplistic physical model to provide a simplistic logical model to end users. We also have Exadata, which delivers world-class performance against any type of model, and can bridge the performance gap afforded by star schemas. With these tools at our disposal, we can postpone the long process of building dimensional models, at least for the first few iterations. This is the only way to get working software in front of the end user in a single iteration, and, as I will argue, this is the best way to collaborate with an end user and deliver the content they are expecting.

Of the puzzle pieces we need to deliver this model, the first is the Oracle Next-Generation Reference DW Architecture (we need an acronym for that), which Mark has already written about in-depth here. As you browse through this post, pay special attention to his formulation of the foundation layer, which is the most important layer for delivering Extreme BI.

Oracle Next-Generation Reference DW Architecture

Foundation Layer

This is our “process-neutral” layer, which means simply that it isn’t imbued with requirements about what users want and how they want it. Instead, the foundation layer has one job and one job only: tracking what happened in our source systems. Typically, the foundation layer logical model looks identical to the source systems, except that we have a few additional metadata columns on each record such as commit timestamps and Oracle Database system change numbers (SCN’s). There are other, more complex solutions for modeling the foundation layer when the 3NF from the source system or systems is not sufficient, such as data vault. Our foundation layer is generally “insert-only”, meaning we track all history so that we are insulated from changing user requirements in the near and distant futures.

UPDATE: Kent Graziano, a major data vault evangelist, has started blogging. Perhaps with some pressure from the public, we could “encourage” him to blog on what data vault would look like in a standard foundation layer.

Capturing Change

Also required for delivering Extreme BI is a process for capturing change from the source systems and rapidly applying it to the foundation layer, which I described briefly in one of my posts on real-time data warehousing. We have a bit of a tug-of-war at this point between Oracle Streams and Oracle GoldenGate. GoldenGate is the stated platform of the future because it’s a simple, flexible, powerful and resilient replication technology. However, it does not yet have powerful change data capture functionality specific to data warehouses, such as easy subscriptions to raw changed data, or support for multiple subscription groups. You can, in general, work around these limitations using the INSERTALLRECORDS parameter and some custom code (perhaps fodder for a future blog post). Regardless of the technology, Extreme BI requires a process for capturing and applying source system changes quickly and efficiently to the foundation layer on the Exadata Database Machine.

Extreme Performance

Although I’ll drill into more detail in the next post, the reason we need Extreme Performance is to offset the performance gains we usually get from star schemas, since we won’t be building those, at least not in the initial iterations. Although Rittman Mead has deployed a variant of this methodology sans Exadata using a powerful Oracle Database RAC instead, there is no substitute for Exadata. Although the hardware on the Database Machine is superb, it’s really the software that is a game-changer. The most extraordinary features include smart scan and storage indexes, as well as hybrid columnar compression, which Mark talks about here and references an article by Arup Nanda found here. For years now, with standard Oracle data warehouses, we’ve pushed the architecture to it’s limits trying to reduce IO contention at the cost of CPU utilization, using database features such as partitioning, parallel query and basic block compression. But Exadata Storage can eliminate the IO boogeyman using combinations of these standard features plus the Exadata-only features to elevate the query performance against 3NF schemas on par with traditional star schemas and beyond.

Extreme Metadata

Extreme performance is only half the battle… we also need Extreme Metadata to provide us the proper level of abstraction so that report and dashboard developers still have a simplistic model to report against. This is what OBIEE 11g brings to the table. We have also delivered a variant of this methodology without OBIEE, using Cognos instead, which has a metadata layer called Framework Manager. As with Exadata, the BI Server has no equal in the metadata department, so my advice… don’t substitute ingredients.

Consider, for a moment, the evolution of dimensional modeling in deploying a data warehouse. Not too long ago, we had to solve most data warehousing issues with the logical model because BI tools were simplistic. Generally… there was no abstraction of the physical into the logical, unless you categorize the renaming of columns as abstraction. As these tools evolved, we often found ourselves with a choice: solve some user need in the logical model, or solve it with the feature set of the BI tool. The use of aggregation in data warehousing is a perfect example of this evolution. Designing aggregate tables used to be just another part of the logical modeling exercise, and were generally represented in the published data model for the EDW. But now, building aggregates is more of a technical implementation than a logical one, as either the BI Server or the Oracle Database can handle the transparent navigation to aggregate tables.

The metadata that OBIEE provides adds two necessary features for Agile delivery. First, we are able to report against complex transactional schemas, but still expose those schemas as simplified dimensional models. This allows us to bypass the complex ETL process at least initially so that we can get new subject areas into the users hands in a single iteration. But OBIEE’s capability to map multiple Logical Table Sources (LTS’s) for the same logical table makes it easy to modify — or “remap” — the source of our logical tables over time. So, in later iterations, if we decide that it’s necessary to embark upon complex ETL processes to complete user stories, we can do this in the metadata layer without affecting our reports and dashboards, or changing the logical model that report developers are used to seeing.

Flow of Data Through the Three-Layer Semantic Model

More to Come…

In the next post, I’ll describe what I call the Model-Driven Iteration, where we use OBIEE against the foundation layer to expose new subject areas in a single iteration. After that, I’ll describe ETL Iterations, where we transform a portion of our model iteratively using ETL tools such as ODI, OWB or Informatica. Finally, I’ll describe what I call Combined Iterations, where both Model-Driven activity and ETL activity are going on at the same time.

Season’s Greetings from Rittman Mead!

December 23rd, 2011 by Mark Rittman

Well, it’s almost Christmas Eve now, so we’re finishing up on client projects, getting the new office fitted out, planning a few mini-R&D projects over the Christmas break (or at least, I am anyway…), and traveling back to our families to take some well-earned rest over the holiday break. It’s been an exciting year, with a number of new projects starting up around the world, a new office opened in Melbourne, and growth to around fifty staff across our offices in Brighton, Brussels, Atlanta, Bangalore and Melbourne.

Next year looks pretty exciting already, with the 11.1.1.6 release of OBIEE due pretty soon, as well as the general availability of Exalytics, Oracle’s new hardware/software combination for in-memory analytics. Expect to see lots of blog posts on 11.1.1.6 and Exalytics as soon as the products go live, plus lots of content on Oracle BI, OLAP, data warehousing, ETL and project methodology on our blog, on our Facebook page, on Twitter and of course, in-person at conferences around the world. And – finally – my book should be out next year ;-)

So, it’s just for me to say thank you to all our customers and partners, and especially to our staff around the world, without whom of course none of this would be possible. Have a great holiday season, and we look forward to catching up with many of you in the New Year!

Seasons greetings,

Mark Rittman & Jon Mead

Agile Data Warehousing with Exadata and OBIEE: Introduction

December 21st, 2011 by Stewart Bryson

Over the last year, I’ve been speaking at conferences on one subject more than any others: Agile Data Warehousing with Exadata and OBIEE. Although I’ve been busy with client work and growing the US business, I realize I need to dedicate more time to blogging again, and this seemed like the logical subject to take up. So I’ll use the next few blog posts to make my case for what I like to call Extreme BI: an Agile approach to data warehousing using the combination of Extreme Performance and Extreme Metadata.

In a standard data warehouse implementation, whether we are walking in the Inmon or Kimball camps, some portion of our data model will be dimensional in nature; a star schema with facts and dimensions. So let me pose a question, which I think will lend itself well to diving into the Extreme BI discussion: Why do we build dimensional models? The first reason is simplicity. We want to model our reporting structures in a way that makes sense to the business user. The standard OLTP data model that takes two of the four walls in the conference room to display is just never going to make sense to your average business user. At the end of a logical modeling exercise, I expect the end-user to have a look at a completed dimensional model and say: “Yep… that’s our business alright”. The second reason we build dimensional models is for performance. Denormalizing highly complex transactional models into simplified star schemas generally produces tremendous performance gains.

So my follow-up question: can the combination of Exadata and OBIEE, or Extreme BI, actually change the way we deliver projects? We’ve all seen the Exadata performance numbers that Oracle publishes, and I can tell you first hand the performance is impressive. Can this Extreme Performance combined with the Extreme Metadata that OBIEE provides give us a more compelling case for delivering data warehouses using Agile methodologies?

To start with, I’d like to paint a picture of what the typical waterfall data warehousing project looks like. The tasks we usually have to complete, in order, are the following:

  1. User interviews
  2. Construct requirement documents
  3. Create logical data model
  4. SQL prototyping of source transactional models
  5. Document source-to-target mappings
  6. ETL development
  7. Front-end development (analyses and dashboards)
  8. Performance tuning

Raise your hand if this looks familiar. We would have to go through all these steps, which could take months, before end users can see the fruits of our labor. To mitigate this scenario, organizations will attempt to deliver data warehouses using “Agile” methodologies. What this usually means, from my experience, is a simple repackaging of the same waterfall project plan into “iterations” or “sprints”, so that the project can be delivered iteratively. So the process might look like the following:

  1. Iteration 1: Interviews and user requirements
  2. Iteration 2: Logical modeling
  3. Iteration 3: ETL Development
  4. Iteration 4: Front-end development

But this, ladies and gentlemen, is not Agile. To get an understanding of what lies at the heart of Agile development, we need to look no further than the Agile Manifesto, or the history of the Agile Movement. When examining the different methodologies, there is one major theme that permeates all of them: working software delivered iteratively. It’s not enough to simply deliver the same old waterfall methodology in “sprints” or “iterations”, because, at the end of those iterations, we don’t have any working software… software that end users can actually use to improve their job or help them make better decisions. In the example above, we still require four iterations before we get any usable content. It doesn’t matter if we’ve written some complex ETL to load a fact table if the end user doesn’t have a working dashboard to go along with it.

To apply the Agile Manifesto to data warehouse delivery, it’s the following key elements that are required for us to deliver with a true Agile spirit:

  1. User stories instead of requirements documents: a user asks for particular content through a narrative process, and includes in that story whatever process they currently use to generate that content.
  2. Time-boxed iterations: iterations always have a standard length, and we choose one or more user stories to complete in that iteration.
  3. Rework is part of the game: there aren’t any missed requirements… only those that haven’t been addressed yet.

I’ve been conscious not to prescribe any distinct Agile methodology, though I can’t help using more Scrum-like concepts in this formulation. However, I think this list is generic enough to apply to most methodologies. Over the next few posts, I’ll discuss the necessary puzzle pieces to engage in Extreme BI, as well as how we might implement new subject area content in a single iteration. Additionally, I’ll discuss how these implementations might be reworked, or “refactored”, over several iterations to produce data warehouses that respond to user stories: what users want and when they want it.

Follow-up Posts

Agile Data Warehousing with Exadata and OBIEE: Puzzle Pieces

Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration

Agile Data Warehousing with Exadata and OBIEE: ETL Iteration

Event Triggers in BI Publisher 11g

December 20th, 2011 by Robin Moffatt

Event Triggers in BI Publisher 11g give the facility to call a function in Oracle either before or after a data set is refreshed. The function must return a boolean (true/false), and if it returns false the data model will abort execution.

In this article I will demonstrate how to pass a parameter through to a function in the database, and write this parameter to a table in the database. This could be useful for auditing the use of the system. The parameter that I will use is a system variable, User locale, that can be accessed through the BI Publisher System Variable :xdo_user_report_locale. There are several others available (see list here), and you can also reference your own parameters that you define in the dataset.

In this very simple example, we will write an entry to an audit table every time the data set is refreshed. The table we will use is defined as follows:

--Create table
CREATE  TABLE bip_audit_log  (action_desc VARCHAR(255), action_ts DATE);

The interface between the BI Publisher Event Trigger, and writing to the Database table, is a custom PL/SQL package. You can write your own depending on what you want to achieve with the Event Trigger. If you use parameters, then make sure they are declared globally (and see the note below about matching up parameter names, if you are trying to use parameters defined within the data model).

Here is the example package:

-- This is the package declaration.
-- The global variable is essential if you want to pass parameters in.
CREATE OR REPLACE PACKAGE bip_audit AS
g_input varchar2(255);
FUNCTION beforeDatasetRefreshWithInput(g_input IN VARCHAR2)
  RETURN BOOLEAN;
END;
/

-- This is the package body - the function
CREATE OR REPLACE PACKAGE body bip_audit AS
FUNCTION beforeDatasetRefreshWithInput(g_input IN VARCHAR2)
  RETURN BOOLEAN AS
BEGIN
  INSERT  INTO bip_audit_log
    (
      action_desc,
      action_ts
    )
    VALUES
    (
      'Locale : '
      || g_input,
      sysdate
    );
  COMMIT;
  RETURN TRUE;
EXCEPTION
WHEN OTHERS THEN
  RETURN false;
END;

END;
/

Before you use your package, you should test it:

-- test function
-- Use this to invoke the function
DECLARE
  result BOOLEAN;
BEGIN
  result := bip_audit.beforedatasetrefreshwithinput('foobar');
END;
/
-- Now select from the table, and make sure we got an entry
SELECT action_desc,         TO_CHAR(action_ts,'YYYY-MM-DD HH24:MI:SS')
from bip_audit_log;
/

Once you have created the PL/SQL package, create a New -> Data Model in BI Publisher/OBIEE and go to the Data Model Properties page and set the Oracle DB Default Package to that which you have created. Make sure that the PL/SQL package is accessible from the Default Data Set connection that is defined for your Data Model.

Build your Data Set as required, and save the Data Model.

Now go to Event Triggers, and click on Create new Event Trigger.

Name the event trigger Write locale to audit log. Set the Type to Before Data, and Language as PL/SQL.

In the lower pane of the window, under Available Functions you should see the name of the package that you set in the Properties page. If you don’t, check that the package exists as defined in Properties, and can be accessed by the user and connection defined as the Default Data Source for the Data Model.

Underneath the package, you should see the function that you defined, listed in upper case. Highlight the function and click the right arrow to move it into the Event Trigger box on the right.

Now manually edit the Event Trigger text to replace the part in brackets:

G_INPUT:VARCHAR2

with

:xdo_user_report_locale

(note the colon prefix)


Save the changes to your data model, and then click on view XML, and click Run.

When you clicked Run, the Event Trigger should have fired, and you’ll see the results of the Data Set as normal. To confirm the success of the Event Trigger, check the contents of the audit table:

SELECT action_desc,         TO_CHAR(action_ts,'YYYY-MM-DD HH24:MI:SS') from bip_audit_log;

Result:

ACTION_DESC          TO_CHAR(ACTION_TS,'
-------------------- -------------------
Locale : en_US       2011-12-01 16:26:56

This completes the simple example, having shown how to pass a parameter through to a function via an Event Trigger. However, read on below to see details of the problems you may expect to encounter deviating from the very narrow example above.

Gotcha!

The above works fine, unless you add your own parameter to the dataset (regardless of whether you try to use it in the event trigger).

If you try to use the above Event Trigger, in a Data Model in which a Parameter exists, you’ll get this error:

PLS-00302: component 'NEW_PARAMETER_2' must be declared
ORA-06550: line 2, column 1:
PL/SQL: Statement ignored

(where New_Parameter_2 is the name of the parameter that exists in the data set).

I would imagine that the workaround for this would be to amend the package function to also accept the parameter names also defined in the report, even if they’re not used by the function.

Using your own parameters

If you want to use your own parameter (rather than a system variable as above), then from my testing it appears that the issue described here in 2007 (MOS DocID 859980.1) still holds: your Data Set parameter name must match the name of a global parameter defined in your package function.

So in the above example, a parameter defined in the data set as g_input would work as input to the Event Trigger :

BIP_AUDIT.BEFOREDATASETREFRESHWITHINPUT(:g_input)

But changing the parameter name in the Data Set, and its corresponding reference in the Event Trigger, would not:

BIP_AUDIT.BEFOREDATASETREFRESHWITHINPUT(:my_new_parm_name)

This throws the error :

Message	oracle.xdo.XDOException: oracle.xdo.XDOException: oracle.xdo.XDOException: java.sql.SQLException: ORA-06550: line 2, column 11:
Supplemental Detail	PLS-00302: component 'MY_NEW_PARM_NAME' must be declared
ORA-06550: line 2, column 1:
PL/SQL: Statement ignored

Why this doesn’t apply to passing through system variables (for example, xdo_user_report_locale seen above), I don’t know.

Troubleshooting

Watch out for unhelpful error handling within BI Publisher. If there’s a problem with the XML generation (for example – but not limited to – the Event Trigger call being incorrect), expect to get this kind of thing:

XML Parsing Error: mismatched tag. Expected: </img>.
Location: http://rm-win01:9704/xmlpserver/servlet/xdo
Line Number 2, Column 580:<table style='background=c[...]

The annoying thing about this is that the actual error is tantalisingly close – if you scroll to the end of the line, you’ll see that it truncates:

oracle.xdo.XDOException: java.sql.SQLException: ORA-06550: line 4, column 51:

So to find out the actual error, go to Enterprise Manager (FMC) and under your Web Logic Domain, navigate to your bi_server and right-click, Logs -> View Logs

Within here you should find the error, and by taking advantage of the ECID (Execution Context ID), you can drill through to related log entries. In this instance, that includes both the error at the BI Publisher level and that reported back from the database.

Version

The above was all tested on BI Publisher 11.1.1.5.

Documentation

Event Triggers are covered in the documentation here

Rittman Mead BI Forum 2012, Brighton & Atlanta May 2012 : Call for Papers Now Open!

December 12th, 2011 by Mark Rittman

I’m pleased to announce the call for papers for the fourth annual Rittman Mead BI Forum, which like last year is running in Brighton, UK and Atlanta, GA in May 2012.

Last year’s event was the best attended ever, with a mix of technical and project/methodology talks based around OBIEE (Oracle Business Intelligence Enterprise Edition), and technologies that support it such as ODI, Essbase, Oracle OLAP and Exadata. This year, we’ll of course be covering the new Exalytics product, and we have last year’s US Best Presenter winner, Kevin McGinley from Accenture, providing the optional Masterclass on the Wednesday before each event.

This year, we’re opening up the presentation formats a bit, so that as well as the regular one-hour presentations, we’ll also be inviting presentations in these additional formats:

  • One hour debates, along the lines led by Stewart and myself last year, along topical and controversial topics where the audience will vote at the end – for example, “You probably don’t need Exalytics”, “
  • Ten-minute “TED”-style talks, where no slides are allowed and you can speak on your favourite topic in a short, lightning session, and
  • Ten-minute “Ignite”-style talks, where you have 20 slides, each of which automatically advances every thirty seconds

Feedback from previous years indicated that sessions that got the audience involved went down the best, and there were also suggestions for shorter, differently-presented sessions. We will be giving free passes to speakers accepted for the one-hour sessions, and 25% discounts for speakers providing the ten minute sessions.

If you’re wondering what sorts of topics might be of interest to attendees, here’s some requests and thoughts from delegates from last year’s events:

Suggestions for Technology-Focused Sessions

  • Integration between ODI and OBIEE (both ways)
  • OBIEE Performance Tuning – including case studies
  • Search, and Unstructured Data
  • Golden rules for deployments and multi-developer projects
  • ODI, Essbase and OBIEE working together
  • Introduction to Endeca
  • BI Applications topics
  • OBIEE and ADF: what are the limits ? When is it a must to use ?
  • OBIEE 11g security – what works, what doesn’t, how does it all fit together?
  • Mobile BI – including designing dashboards for delivery via Mobile
  • Multi-tier performance tuning including web tiers, clustering etc
  • Mapping “nuts and bolts”
  • Knowing/administering/troubleshooting the 11g Weblogic/EM layer and the complexities surrounding that
  • Options for using OBIEE for real time or near real time operational reporting
  • Exalytics – Understanding installation/configuration; how managing the caching if you have over 1 terabyte of data; go over new features, etc.

Planning & Methodology

  • Strategy Maps and Scorecards – How and Where to Use Them?
  • How to present information overload for end-users – how do you convince someone they don’t need 45 reports?
  • How effectively and quickly are people able to implement various Oracle BI Applications in real world scenarios? And how are they structuring the project to do so?
  • Exalytics – making the business case. What makes a customer choose Exayltics? As of which customer size/complexity/maturity does it make sense?
  • Cloud Delivery Models
  • Data Governance Frameworks
  • Design principles, with regards regards to design, usability, etc.

So what are you waiting for? The abstract submission form is now online, and will close on January 31st 2012, with speakers notified in early February 2012. Get submitting those session abstracts now, and hopefully we’ll see you in Brighton or Atlanta in 2012!

Website Design & Build: tymedia.co.uk