Data Warehouse Fault Tolerance Part 1: Resuming

February 8th, 2010 by Stewart Bryson

In the introduction to this series of posts, I spoke briefly about data warehouse fault tolerance and the unique challenges resulting from high data volumes combined the batch load window required to create them. I then defined the goal: a layered approach allowing simple errors to be caught early before they turn in to serious conditions.

Resuming is the ability to continue effortlessly after an error. The important thing is that there should be no aftermath from the error: our process should pause gracefully until the error is corrected. The Oracle Database has offered out of the box functionality for resuming since version 9i in the form of Resumable Space Allocation. Resumable operations are supported for SELECT queries, DML and DDL, and can be enabled at either the system or the session level. To enable at the system level, the RESUMABLE_TIMEOUT database parameter should have a non-zero value.

SQL> alter system set resumable_timeout=3600;

System altered.

SQL>

To enable resumable operations at the session level, the statement follows this basic syntax, with the TIMEOUT and NAME clauses being optional:

ALTER SESSION ENABLE RESUMABLE <TIMEOUT n> <NAME string>;

The TIMEOUT value is specified in seconds, and if omitted, the default value of 7200 is used, or 2 hours. The NAME clause gives the resumable session a user-friendly name for when we are monitoring for resumable sessions (as we will see later) to see which of our processes is suspended. Enabling resumable operations for the session level requires that the RESUMABLE permission has been granted:

SQL> grant resumable to stewart;

Grant succeeded.

SQL>

Resumable operations can also be enabled with the Oracle utilities… such as SQL-Loader, Export/Import and Datapump. The command-line parameters RESUMABLE, RESUMABLE_NAME and RESUMABLE_TIMEOUT exist to mimic the functionality mentioned above.

Now for a demonstration. I’ll create a situation that is ripe for a space allocation error: I’ll put an empty copy of the SALES fact table from the SH schema in a tablespace with only 250K of space:

SQL> create tablespace target datafile '/oracle/oradata/bidw1/target01.dbf' size 250K;

Tablespace created.

SQL> create table target.sales tablespace target as select * from sh.sales where 1=0;

Table created.

SQL>

Now I’ll load some records into the table, which should cause it to suspend. To prepare my session, I need to enable resumable operations. Since I always instrument my code, I’ll register my process with the database. After that, I have an easy way to guarantee consistency when referring to processes. Now, I can use the registered name for my resumable session as well:

SQL> exec dbms_application_info.set_module('SALES fact load','insert some rows');

PL/SQL procedure successfully completed.

SQL>
SQL> DECLARE
  2     l_module VARCHAR2(48) := sys_context('USERENV','MODULE');
  3  BEGIN
  4     EXECUTE IMMEDIATE
  5     'alter session enable resumable timeout 18000 name '''||l_module||'''';
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL>

I start loading the records in hopes of a suspended session:

SQL> insert into target.sales select * from sh.sales;

So now, I open up another session, and I start another transaction against the TARGET.SALES table, just to pile on the TARGET tablespace:

SQL> exec dbms_application_info.set_module('SALES fact load2','insert more rows');

PL/SQL procedure successfully completed.

SQL>
SQL> DECLARE
  2     l_module VARCHAR2(48) := sys_context('USERENV','MODULE');
  3  BEGIN
  4     EXECUTE IMMEDIATE
  5     'alter session enable resumable timeout 18000 name '''||l_module||'''';
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> insert into target.sales select * from sh.sales;

I’ll have a look in the DBA_RESUMABLE view (there is also a USER_RESUMABLE version) for my suspended sessions. Even though I could get all the following information with a single SQL statement, I broke it up for better visibility on the blog:

SQL> select name, start_time, suspend_time, status from dba_resumable;

NAME              | START_TIME           | SUSPEND_TIME         | STATUS
----------------- | -------------------- | -------------------- | ------------
SALES fact load2  | 02/06/10 10:33:33    | 02/06/10 10:33:33    | SUSPENDED
SALES fact load   | 02/06/10 10:29:03    | 02/06/10 10:29:03    | SUSPENDED

2 rows selected.

Elapsed: 00:00:00.07
SQL> select name, sql_text from dba_resumable;

NAME              | SQL_TEXT
----------------- | -----------------------------------------------
SALES fact load2  | insert into target.sales select * from sh.sales
SALES fact load   | insert into target.sales select * from sh.sales

2 rows selected.

SQL> select name, error_msg from dba_resumable;

NAME              | ERROR_MSG
----------------- | ------------------------------------------------------------------------
SALES fact load2  | ORA-01653: unable to extend table TARGET.SALES by 8 in tablespace TARGET
SALES fact load   | ORA-01653: unable to extend table TARGET.SALES by 8 in tablespace TARGET

2 rows selected.

SQL>

The Oracle Database also publishes server alerts concerning suspended transactions using the Server-Generated Alerts infrastructure. This infrastructure uses the AWR toolset, the server package DBMS_SERVER_ALERT for getting and setting metric threshholds, and the queue table ALERT_QUE to hold alerts that have been published from AWR. Custom processes could be written to mine ALERT_QUE for these alerts, but the easiest way to configure and view server alerts is using Oracle Enterprise Manager (OEM). On the Alerts section of the main OEM page, we can see three different alerts generated by the Oracle Database:

all alerts.png

If we click on the “Session Suspended” link, we can see the multiple alerts generated in this category:

suspend alerts.png

Another alert generated indirectly by the suspended transaction is the “Configuration” class event caused by our session “waiting” to proceed. The Oracle wait event interface can show us information about the suspend waits on the system:

SQL> SELECT event,
  2         SUM(time_waited) time_waited,
  3         SUM(total_waits) total_waits,
  4         AVG(average_wait) average_wait
  5    FROM gv$session_event
  6   WHERE lower(event) LIKE '%suspend%'
  7   GROUP BY event
  8   ORDER BY time_waited ASC
  9  /

EVENT                                          | TIME_WAITED | TOTAL_WAITS | AVERAGE_WAIT
---------------------------------------------- | ----------- | ----------- | ------------
statement suspended, wait error to be cleared  |      305373 |        1377 |       221.78

1 row selected.

SQL>

To free up the space issue, I’ll enable autoextend on the TARGET tablespace. Then, I’ll take a look and see if anything has changed:

SQL> alter database datafile '/oracle/oradata/bidw1/target01.dbf'
  2  autoextend on next 10M maxsize 1000M;

Database altered.

SQL> select status, resume_time, name from dba_resumable;

STATUS       | RESUME_TIME          | NAME
------------ | -------------------- | -----------------
NORMAL       | 02/06/10 10:56:49    | SALES fact load2
NORMAL       | 02/06/10 10:56:49    | SALES fact load

2 rows selected.

SQL>

The Resumable Space Allocation features includes the AFTER SUSPEND trigger, which allows the specification of a system-wide trigger that will fire whenever a transaction is suspended. The typical use for this functionality is alerting as suspended operations don’t write anything to the alert log.

UPDATE: I made a mistake here… suspended transactions do in fact cause entries in the alert log, and so does the RESUME process detailed below.

There are some features in the DBMS_RESUMABLE package that may make sense when writing an AFTER SUSPEND trigger:

SQL> desc dbms_resumable
PROCEDURE ABORT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
FUNCTION GET_SESSION_TIMEOUT RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
FUNCTION GET_TIMEOUT RETURNS NUMBER
PROCEDURE SET_SESSION_TIMEOUT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSIONID                      NUMBER                  IN
 TIMEOUT                        NUMBER                  IN
PROCEDURE SET_TIMEOUT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TIMEOUT                        NUMBER                  IN
FUNCTION SPACE_ERROR_INFO RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 ERROR_TYPE                     VARCHAR2                OUT
 OBJECT_TYPE                    VARCHAR2                OUT
 OBJECT_OWNER                   VARCHAR2                OUT
 TABLE_SPACE_NAME               VARCHAR2                OUT
 OBJECT_NAME                    VARCHAR2                OUT
 SUB_OBJECT_NAME                VARCHAR2                OUT

SQL>

This package adds functionality for writing custom processes in the AFTER SUSPEND trigger. The SPACE_ERROR_INFO function returns specifics about the table and tablespace affected by the space error. A series of checks could be coded enabling specific actions depending on which objects were affected. A suspended process can be ended prematurely with the ABORT procedure, or more time can be added using the SET_TIMEOUT procedure. I actually had one client explain how she had written an AFTER SUSPEND trigger that compiled information about the tablespace affected so that an “ALTER DATABASE… RESIZE…” command could be issued to add more space to the affected datafile. I didn’t have the heart to tell her that she had basically written a feature that already existed in the database: AUTOEXTEND.

So what are the best practices to take away from this? Quite simply… all ETL mappings and flows, as well as database maintenance processes, should use Resumable Space Allocation, preferably using the NAME clause in conjunction with DBMS_APPLICATION_INFO. Setting a RESUMABLE_TIMEOUT value at the system level can be scary, because a single suspended transaction could cause locks that reverberate all the way through the system. But is this really a concern in a BI/DW environment? Are there any processes in our batch load window or with any of our operational maintenance processes that we wouldn’t want to enable for resumable operations, no matter how many processes back up waiting for them to complete? It could spell bad news if we used any kind of synchronous replication technology to move data to the DW instance, but short of that, I can’t think of any. Please let me know if you have alternative viewpoints.

I’ve never found much reason to use the AFTER SUSPEND trigger though. Data warehouses should have production-type monitoring running already, just like other production systems. OEM is more than satisfactory for basic monitoring and alerting, and with the Server-Generated Alerts introduced in 10g, forms a complete product for Oracle environments. But regardless of which monitoring solution is used, it should be able to issue simple queries against the database and alert based on the results of those queries. A select against the DBA_RESUMABLE table provides all the information required to send out an alert, and with features such as AUTOEXTEND, I just can’t see a requirement for the ability to issue procedural code because a transaction is suspended.

UPDATE: as pointed out above, since suspended transactions do in fact show up in the alert log, this is good news for integrating Resumable Space Allocation into an existing environment. Assuming that there’s proper alert log monitoring with paging functionality already in place, implementing resumable operations can simply use that infrastructure already in place.

Keep your eyes open for the next of the “Three R’s” in BI/DW fault tolerance: Restarting.

Oracle BI EE 10.1.3.4.1 – Modeling Measure Dimensions on Relational Sources – Essbase Style

February 3rd, 2010 by Venkatakrishnan J

One of the biggest differences between Essbase and most other reporting/OLAP tools like BI EE/Oracle OLAP etc is in the way Measures are treated(comparison only in terms of data modeling and reporting and not on physical implementation). In Essbase, every data/value(cell) is attributed by a set of dimensions. Even a measure is considered as an attribute of another dimension. But in the case of Oracle OLAP, BI EE etc, one requires to specifically identify a set of measures and load/report against them. The problem with this approach is the fact that in many cases while doing the reporting we would like to treat the measures like a normal dimension and vice versa. Just to illustrate this point, the most commonly used component in BI EE Answers is the Pivot View component. A Pivot View basically converts a set of dimension attributes as a set of measures(typical analysis) by spreading all the dimension attributes in the column section.

image

Similarly, in many cases, we want to treat the measures as a set of dimension attributes as shown below

image

Again pivot table component can be used to achieve this. Both SQL and MDX can swap the attributes to Columns and Rows pretty efficiently(basically one can manually handcraft SUM, FIRST etc kind of aggregation on any valid column). But once we designate a set of static measures within the reporting tool, the model can become constrained to an extent. Of course, there are ways around it like using Pivot tables, Answers based aggregation, UNIONs etc. The main drawback with a measure based model is the fact that one cannot efficiently drill on Measure Hierarchies. The other drawback is since our database related data structures are more oriented towards static measure modeling (facts and dimensions), it is very difficult to switch the measure hierarchy for reporting.

For example, if you had gone through or attended our training days event last year, we had shown how to go about switching measure hierarchies in BI EE for an Essbase source. Its a very common requirement when you have Essbase as a data source. Many users might choose Accounts dimension as a measure dimension for reporting. Some users might choose Scenario as the measure dimension for reporting. Essbase provides that flexibility and the same can be leveraged from BI EE as well. But in the case of relational sources, it can be quite difficult to achieve this. There are a couple of reasons for this

1. If we model a separate measure dimension in our warehouse, the fact tables will contain only one measure. The size of the fact table will grow multiplied by the number of measures in the measure dimension.

2. Its not practical(& to an extent does not make sense) to model it this way for relational data models at the database level.

Though we might stick with a fact/dimensional model in the data warehouse, i think the reporting tools should provide the capability to switch the measure dimensions even for relational sources(PIVOT and UNPIVOT operations should be supported from the RPD). Currently we do not have this capability in BI EE (from a repository standpoint). Though BI EE does not support this currently, the BI Server repository can be tweaked to an extent to achieve this using Conforming Dimensions. Lets look at a couple of scenarios. One is modeling the measure dimension for relational sources and the other is in switching the measure dimension itself.

Measure Dimension:

Lets understand this through our usual example SH schema that gets shipped with oracle database. Most common way of designing a repository for this SH schema is given below

image

Its a very simple repository containing a Sales measure, a cost measure and all the dimensions. If you look at this, UNIT_COST and AMOUNT_SOLD are designated as measures(which is how it exists in the physical database). Lets assume that a requirement is to treat both these measures as attributes of a measure column i.e we want to be in a position to treat both the measures as part of a single dimension. The report below basically explains what is needed

image

The above report has a column called Measure which will basically un-pivot the measure values. To achieve this, we start with creating a new custom dimension within the RPD called as Measure Dim. This will basically have one column called as Measure which will be fed through 2 logical table sources (one is Sales and the other is Costs). This is to signify that each LTS will be used to serve each measure

image

image

image

Then we create 2 custom fact tables (one for sales measure and the other for cost measure) and then force the conforming dimension join as shown below

image

To combine both the measures through a common single fact, we create another logical fact table with one logical column. This will be fed through a logical calculation of UNIT_COST and AMOUNT_SOLD as shown below

image

image

This will provide an extra dimension for your reporting (measure dimension). Remember, this will work as long as you have measures from different physical tables. If you have measures from the same physical table then you will have to alias the table as many times as the number of measures (which is not recommended). Also, currently we cannot make BI EE to push the PIVOT and UNPIVOT operators directly from the BMM.

image

Swapping Measure Dimensions:

The next requirement is, now that we have converted the measures as a set of attributes derived from a column, to convert each member of the CHANNEL_CLASS column as a measure i.e we want 3 measures Direct, Indirect and Others. To do this, replicate the measure in each fact table 3 times. Basically each measure will have a FILTER function applied to filter only on the necessary attributes. For example, the Sales and Cost fact tables are shown below

image

The Direct measure will have a FILTER function applied as shown below

image

Similar filters will be applied for Indirect and Others measures as well for both the fact tables. Then in the main fact table, we need to combine the individual measures again.

image

This will allow you to create a report containing the actual measures as Dimension attributes and the actual dimension attributes as measures (all of this in a normal table view)

image

If you look at the SQL, you will notice that there will be conforming dimension join done on the Amount Sold and Unit Price attributes and then they are combined using a main query

WITH
SAWITH0 AS (select sum(case  when T1776.CHANNEL_CLASS = 'Direct' then T1929.AMOUNT_SOLD end ) as c1,
     sum(case  when T1776.CHANNEL_CLASS = 'Indirect' then T1929.AMOUNT_SOLD end ) as c2,
     sum(case  when T1776.CHANNEL_CLASS = 'Others' then T1929.AMOUNT_SOLD end ) as c3,
     'Amount Sold' as c4,
     T1916.PROMO_NAME as c5
from
     PROMOTIONS T1916,
     CHANNELS T1776,
     SALES T1929
where  ( T1776.CHANNEL_ID = T1929.CHANNEL_ID and T1916.PROMO_ID = T1929.PROMO_ID and (T1776.CHANNEL_CLASS in ('Direct', 'Indirect', 'Others')) )
group by T1916.PROMO_NAME),
SAWITH1 AS (select sum(case  when T1776.CHANNEL_CLASS = 'Direct' then T1784.UNIT_COST end ) as c1,
     sum(case  when T1776.CHANNEL_CLASS = 'Indirect' then T1784.UNIT_COST end ) as c2,
     sum(case  when T1776.CHANNEL_CLASS = 'Others' then T1784.UNIT_COST end ) as c3,
     'Unit Price' as c4,
     T1916.PROMO_NAME as c5
from
     PROMOTIONS T1916,
     CHANNELS T1776,
     COSTS T1784
where  ( T1776.CHANNEL_ID = T1784.CHANNEL_ID and T1784.PROMO_ID = T1916.PROMO_ID and (T1776.CHANNEL_CLASS in ('Direct', 'Indirect', 'Others')) )
group by T1916.PROMO_NAME)
select distinct case  when SAWITH0.c4 is not null then SAWITH0.c4 when SAWITH1.c4 is not null then SAWITH1.c4 end  as c1,
     case  when SAWITH1.c5 is not null then SAWITH1.c5 when SAWITH0.c5 is not null then SAWITH0.c5 end  as c2,
     nvl(SAWITH1.c1 , 0) + nvl(SAWITH0.c1 , 0) as c3,
     nvl(SAWITH1.c2 , 0) + nvl(SAWITH0.c2 , 0) as c4,
     nvl(SAWITH1.c3 , 0) + nvl(SAWITH0.c3 , 0) as c5
from
     SAWITH0 full outer join SAWITH1 On SAWITH0.c4 = SAWITH1.c4 and SAWITH0.c5 = SAWITH1.c5
order by c1, c2

Though this will work in most cases, there are some drawbacks with this approach

1. The SQLs generated will not be the best performing ones. As the number of measures grow, the number of conforming dimension joins will increase thereby producing poor performance. Unfortunately BI EE cannot push the PIVOT/UNPIVOT operators supported from Oracle 11g

2. If there are a lot of measures from a single fact table then this requires multiple fact table aliases to force the conforming dimension join across all dimension sources (For the measure dim). This might make the RPD un-necessarily big.

You might not find this technique to be of practical use, but this demonstrates the capabilities of the BI Server to model some complex requirements. I am hoping that sometime in the near future we should see more of pure multi-dimensional modeling on relational sources to be supported by BI EE.

Data Warehouse Fault Tolerance: An Introduction

February 2nd, 2010 by Stewart Bryson

With so much of the blog devoted to OBIEE, OWB and Essbase lately, I felt like it was time to do a few database-related postings. In the past, when I’ve posted database content to the blog, I usually gravitate toward ETL-related features: those that waffle between database administration and ETL development. But this time I’m going to take a very different route and discuss data warehouse fault tolerance, and so I’ll be doing a series of postings that discuss what it means to strive to be fault-free.

Fault tolerance isn’t disaster recovery exactly… though there’s a lot of overlap. Instead, fault tolerance is the ability to recover from errors, and those errors can result from hardware issues, software issues, general systems issues (network latency, out-of-space errors), and human mistakes. The main point is that BI/DW environments present unique challenges, both for operations and for the development team. I’m not preposing that the divide between transactional and reporting systems is necessarily vast… we still need redundant storage systems and dependable backup strategies. I am preposing, however, that one-size-fits-all approaches to fault-tolerance is problematic, and applying standards that evolved in support of transactional systems may not provide the best protection for BI/DW environments.

The operational teams (DBAs, Unix Admins, Storage Admins, etc.) and the development teams (source system extraction, ETL) have to work closer in a BI/DW than perhaps they do in OLTP environments. Of course, OLTP developers have to write scalable code… but I think that’s within their control for the most part. ETL developers are thrashing around millions or billions of rows of data, and because of this, everything needs to be well-oiled: undo spaces need to be available, temp space needs to be plentiful, standard operational jobs such as backup and recovery or statistics gathering need to keep the batch load window in mind, etc. Whereas OLTP code is exclusively SQL… ETL code is packed full of DDL: partition-exchange loads, index and constraint maintenance, table truncates, the whole gamut.

So when working with millions or billions of rows, we need to eliminate errors as best we can. Sounds simple enough, but the truth is that errors are going to happen, and there’s nothing we can do to wipe them out completely. But we can mitigate. So we need to introduce a triage process: catching and correcting errors as early as possible so that their damage is minimal. In essence: don’t let simple errors turn into weekend-long data correction issues, where millions of rows need to be updated or deleted. Let’s work smarter, not harder, using every solution available to use, including features present in the Oracle Database, best practices in ETL development, and possible modifications to our backup and recovery strategies.

I should note that, when speaking of BI/DW environments, I still have the batch load paradigm squarely in mind. Although the line in the sand is certainly moving in one constant direction, most data warehouses are still loaded with either batch or mini-batch processing. However, being a fan of near-real-time data warehouse techniques (as my colleague Peter Scott has written from time to time… only reporting from the source system itself is truly real-time), I’ll be sure to point out how some of these techniques differ the closer we get to the actual transaction.

I currently have three postings in mind that correlate to the Three R’s of Data Warehouse Fault Tolerance. Be on the lookout for the first installment coming soon: Resuming.

Fantasy Football – When is a fact not a Fact?

February 2nd, 2010 by Adrian Ward

As you will see from Venkat’s Football blog, we ran a demo of the OBIEE system at last years UKOUG.  The demo was based upon the English Football Premiership, and although I prefer rugby, using sports in samples is a great way to highlight the features of OBIEE.  It seems that some sports, such as Cricket, American Football and Tennis seem to compete on the number of statistics they can find in a match!

You don’t need to understand the English football league to understand the demo, however, for football managers there are dozens of facts that can be analysed, and can be related too.
For example, you can run a request to show teams that are level at half time, but go on to lose the game in the second half.  As a coach you may draw from that fitness needs to be improved, along with concentration (more drinks?), and a better half time talk.

So you can use the ’statistics’ to help identify where things are going wrong, and when you are doing the right things, for example losing at away teams far away (change the travel arrangements?).

The great thing with sports stats is that you can also analyse your opponent to find their weaknesses – do they concede early goals (we have to come out strong and attacking from the whistle).

The other ‘interesting’ thing about using sports results in OBIEE is that they clearly demonstrate how facts are sometimes measures and sometimes dimensional attributes…..

Reporting on Fact based Dimensional Attributes

Lets use an example to explore this:
The exam question is this:

How many games was my team losing at half time, but then went on to win.

We will then drill into these games to try to understand the cause of this.

Remember the question for later!

The measures we have employed include:

  • Goals scored
  • Home Team Score at Half Time
  • Home Team Score at Full time
  • Away Team Score at Half Time
  • Away Team Score at Full time
  • Home Team Red cards,
  • Home Team Yellow Cards,
  • etc

These measures are set with an aggregation of SUM, but we could also set many of the other aggegations, such as AVERAGE.
If you run the following request,

Home Stats

the SQL generated is:

SELECT T6428.YEAR AS C1,
T6428.HOMETEAM AS C2,
‘HOME’ AS C3,
SUM(T6428.FTHG) AS C4,
SUM(T6428.FTAG) AS C5
FROM
PREMIER_LEAGUE T6428 /* HOME TEAM – PREMIER LEAGUE */
WHERE ( T6428.YEAR = ‘2009′ )
GROUP BY T6428.HOMETEAM, T6428.YEAR
ORDER BY C1, C2, C3
** FTHG = Full Time Home Team Goals
** FTAG = Full Time Away Team Goals

Note the “SUM(T6428.FTHG) AS C4″ which is from the Sum Aggregate in the logical Column.

Quick Data Modelling Aside

Before I go further a quick note on how the data could be modelled.

There are essentially three ways (I can quickly think of) we could model the ‘result’ of a match.

  1. Capture the result in the ‘Match’ fact record.
    Two Examples
    a) We could add a field called HTR – Half Time Result – and enter words or codes such as ‘Home Win’ or ‘HW’.
    b) We could add field called HW_HT (short for Home Win Half Time) and put a 0 or 1 in if it true
  2. No Results code – just capture the goals (let the reporting layer do the work)
  3. Place the result codes in a fact table of their own.
    Create a table called MatchResults, with columns

MATCH_ID,   RESULT_CODE,   RESULT

a sample set of record for a match could be -

123, HW_HT, 0

123, AW_HT, 0

123, DR_HT, 1

Back to the scenario

Let’s look at a Scenario 1(a) method (which is the chosen method for our Demo).

In the ETL Venkat has created a field called FTR – Full Time Result, which contains the values H (Home Team Wins), D (Draw) and A (Away Team Wins)

If we use this column in a report

6

WITH
SAWITH0 AS (select sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c1,
T6428.YEAR as c2,
T6428.HOMETEAM as c3,
T6428.AWAYTEAM as c4,
T6428.DATE1 as c5
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.AWAYTEAM, T6428.DATE1, T6428.HOMETEAM, T6428.YEAR)
select distinct SAWITH0.c2 as c1,
SAWITH0.c3 as c2,
SAWITH0.c4 as c3,
SAWITH0.c5 as c4,
‘Home’ as c5,
SAWITH0.c1 as c6
from
SAWITH0
order by c1, c2, c3, c4, c5

In Summary this is:

7

select T6428.YEAR as c1,
T6428.HOMETEAM as c2,
‘Home’ as c3,
sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c4
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.HOMETEAM, T6428.YEAR
order by c1, c2, c3

Now let’s take a scenario 2 where the ETL for football scores does not populate a set of result fields. This is often the case where database developers are not warehouse or star friendly, and only store the raw data.

So, in order to determine who is winning at half time we have to run a calculation. The choices of where you do this are numerous:

  1. In the database (Create a materialised or normal View)
  2. In the rpd – Create a calculated field which looks at the scores
  3. In a request – In a particular report

The database option probably being the longest one to implement, but it will also be the most performant (if using an MV) when you come to running reports. Creating the calculation in the rpd means it is re-usable. Creating the calculation in the report is probably the easiest to get done in a short time.

Lets have a look at creating a request of home wins at full time.

Add, Year, Team Name, Wins at Full Time.

Get Wins

Wins at Full time =

Case Win

(CASE WHEN ((Home Team Score at Full Time – AWAY Team Score at Full Time)>0) THEN 1 ELSE 0 END)

Which created the following SQL.

WITH
SAWITH0 AS (SELECT SUM(T6428.FTAG) AS C1,
SUM(T6428.FTHG) AS C2,
T6428.YEAR AS C3,
T6428.HOMETEAM AS C4
FROM
PREMIER_LEAGUE T6428 /* HOME TEAM – PREMIER LEAGUE */
WHERE ( T6428.YEAR = ‘2009′ )
GROUP BY T6428.HOMETEAM, T6428.YEAR)
SELECT DISTINCT SAWITH0.C3 AS C1,
SAWITH0.C4 AS C2,
SAWITH0.C2 AS C3,
SAWITH0.C1 AS C4,
CASE WHEN SAWITH0.C2 – SAWITH0.C1 > 0 THEN 1 ELSE 0 END AS C5
FROM
SAWITH0
ORDER BY C1, C2

AH!, So the SQL generated did not give the right answer.

Is that what you expected? Did you definately know it was going to do that?

As you can see it SUMMED the Home Goals and Away goals for the season then looked at the case if sum(Home goals) is greater than sum(away goals). Therefore you do not get an answer for each match, so cannot get total matches won in the table.

Why is this?

The key difference between Scenario 1 and 2 is shown in the statements

The first method analysis of the Result runs on each Match record in the table, then sums the field

sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c4

The second method runs an analysis the summed fields

CASE WHEN SAWITH0.C2 – SAWITH0.C1 > 0 THEN 1 ELSE 0 END AS C5

Now lets see if we can get a Scenario 2 to work.

Here is the query again, but with a secret weapon – The Pivot table

Get wins

.SQL produced is the same,

WITH
SAWITH0 AS (select sum(T6428.FTAG) as c1,
sum(T6428.FTHG) as c2,
T6428.YEAR as c3,
T6428.HOMETEAM as c4,
T6428.DATE1 as c5,
T6428.AWAYTEAM as c6
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.AWAYTEAM, T6428.DATE1, T6428.HOMETEAM, T6428.YEAR),
SAWITH1 AS (select distinct SAWITH0.c1 as c1,
SAWITH0.c2 as c2,
SAWITH0.c3 as c3,
SAWITH0.c4 as c4,
SAWITH0.c5 as c5,
‘Home’ as c6,
SAWITH0.c6 as c7
from
SAWITH0)
select distinct SAWITH1.c3 as c1,
SAWITH1.c4 as c2,
SAWITH1.c5 as c3,
SAWITH1.c6 as c4,
SAWITH1.c7 as c5,
SAWITH1.c2 as c6,
SAWITH1.c1 as c7,
case when SAWITH1.c2 – SAWITH1.c1 > 0 then 1 else 0 end as c8
from
SAWITH1
order by c1, c2, c3, c4, c5

This time the query is the same, but I have put all the details into the table so that each match record is analysed, then used a pivot table to display the results

The Pivot

So the secret is to make sure that the calculations are done at the detail row level to get the right answer, then have the BI server add the rows for you.

The exam question can therefore be answered by creating a query with all the necessary calculations, Half time losing (make =1) minus full time winning (make = 0). sum the result in a pivot.

So the data can be analysed by treating the fact like an attribute of the match.  My favourite method to model this is to create the resultant fields – HT_HW (0/1), FT_HW(0/1) on the table and add these to a logical dimension. Then expose the dimension table and, Hey Presto your facts are acting like dimensional attributes.

Oracle Fusion Middleware 11g – BI ADF Components – First impressions

January 28th, 2010 by Venkatakrishnan J

One of the few things that i have been working on this week is in understanding how the recently released Fusion Middleware 11g will influence Oracle BI EE in the near future. As was demonstrated & presented in the last Open World by Oracle, the strategic direction of Oracle is to use Fusion Middleware 11g as the base for all Oracle products including Business Intelligence going forward. FMW 11g has an excellent set of components like Visualizations, Custom Graphs, Table & Pivot View Components, Hierarchy Viewer etc which can all add value to the existing feature set of BI EE. If Oracle can somehow make them as pluggable components in future releases, it will make it a lot easier to extend BI EE.

One of the first integrations that i noticed between Fusion Middleware and BI EE was when JDeveloper 11g Tech Preview 4 was released almost 2 years back. I had blogged about it sometime back here. This integration was very promising wherein every report in BI EE Web Catalog was exposed as a view object. So in effect, from within JDeveloper one can easily apply ADF visualizations to an existing BI EE report/dashboard. This integration was very similar to the BI Publisher – BI EE integration where instead of a template, ADF visualization is applied. Unfortunately, this feature does not exist anymore in the production release of JDeveloper 11g. Though it does not exist as a feature now, there are ways to still expose them within JDeveloper. I will cover them in future blog entries.

Picture 12

Picture 13

In the current release of Fusion Middleware 11g, there are quite a few BI specific visualizations that have been made available. All of these can be used even on existing BI EE 10g installations. But the only downside is, there will be quite a bit of coding involved to tie BI EE 10g with the FMW 11g graphs. Hopefully this should be made easier with BI EE 11g.

I will basically highlight some of the interesting ADF Visualization components specific to Business Intelligence in this blog entry here. Many of the graphs shown below are available as part of a sample application that needs to be obtained from OTN. All these(base components for the charts/graphs) are available out of the box with the FMW 11g installation.

Custom Gauges:

FMW 11g adds some significant number of Gauge components. I have listed the important ones below. All these are very interactive like the Corda Charts that we are used to seeing in BI EE. ADF Gauges are based on SVG and hence the drills are supported here as well.

Picture 14

Picture 15

Graph Visualizations:

FMW 11g supports 2 main features in any graph

1. Custom Annotations can be added
2. Individual data points can be moved (drag and drop) for better visualization

Picture 17

Picture 17

Custom panel splitters are supported as well. One can essentially have 4 graphs in a page and then hide/unhide them in a pane layout as shown below

Picture 18

Its very interesting to note that time based interactivity in the graphs have been added. For example, one can choose a time period within a master graph, the child graphs will change automatically.

Picture 19

There is also an inline graph component called Spark Charts. These are charts that are embedded as a cell object in a table/pivot view.

Picture 20

Pivot Tables:

This is one area where i am seeing significant changes to the way BI EE 10g pivot table and FMW11g Pivots. Some interesting features of FMW11g pivot tables are

1. Pivot Table Heat Maps & Conditional Formatting

Picture 21

2. Ability to embed charts (Spark Charts & Gauges) in the pivot tables

Picture 22

3. Ability to drag and drop the columns outside of the edit layout. Also ability to edit cells (potential for writebacks)

Picture 24

Gantt Charts:

This is a significant addition to the list of supported graphs in the FMW 11g. Quite a lot of visualizations are supported on the Gantt Charts

Picture 23

Picture 24

There is another nice add-on to the normal Gantt charts i.e one can track Resource utilization as a stacked bar against Time as the base.

Picture 25

Hierarchy Viewer:

This visualization is probably one of the highlights of FMW 11g. Instead of a normal hierarchy drills from Pivots/tables, this visualization can provide visual hierarchical drills as shown below

Picture 26

Picture 28

Mapviewer Integration:

This is another area where FMW 11g adds a lot of value. It is now possible to directly reuse Maps built out of Mapviewer and render them along with BI components within the FMW framework

Picture 28

There are other areas as well where every graph/chart has improved. For example, now FMW 11g supports a set of attributes that can be modified uniformly for all the graphs. It also provides Master-Detail forms/reports. On a whole, to me, it looks like Oracle has indeed spent quite a bit of time and effort to get the BI framework in FMW ready. Now if only these can be reused out of the box directly from within the BI EE in future releases, then it could add significant dimension to the way we do BI reporting. I guess only time will tell on how these components eventually get into BI EE.

Website Design & Build: tymedia.co.uk