EII, Predictive Modelling & RMOUG Training Day Presentations
February 26th, 2005 by Mark Rittman
Some more interesting articles and papers.
"EII - Dead On Arrival" by Andy Hayler for
DMReview looks at the current buzz around Enterprise Information Integration
- "lightweight BI" based around the synthesis of XML feeds - and finds it
lacking:
"Some people who should know better have swallowed this EII mirage
hook, line and sinker, and a number of start-up companies have been funded
flaunting "EII for business intelligence" messages. The only problem with
this new futurist approach is that it is absolutely and utterly flawed.
Let’s consider the problem again. You have data in dozens of incompatibly
structured source systems. Your new EII software is somehow going to build a
presumably fairly complex set of distributed queries that will zip off to
the source transaction systems, interrogate them and bring back a result set
that will somehow produce a consistent answer.
The first problem is: how exactly does the EII software know what the
linkages are between the differently coded source system structures?
Somewhere it is going to have a catalog which will translate the
differences, rather like a dictionary to translate words from one language
to another. This sounds suspiciously like a metadata dictionary of the type
that data warehouses have to construct, but let’s leave that aside for the
moment.
What exactly happens when those distributed queries make their way through
to the source systems? For a start, the unpredictable nature of queries will
upset the careful load balancing done by operations departments to optimize
online throughput. Or rather, it won’t, because no systems managers are
going to allow this technology anywhere near their delicately balanced
systems, at least not after the first time it brings the ordering system to
a grinding halt.
The next problem with the EII approach is that there is no history. For
transaction systems, you want to archive data quickly in order to maintain
high performance (there is no need to worry about what your account balance
was last year, just what it is now; last year’s balance can be archived).
However for an inquiry: show me the trend in account withdrawals over the
last year in the Southeast region, this does require historical data.
Next, do these vendors really think that all the analysis hierarchies needed
are embedded within the ERP systems? To take the example of marketing, there
are normally complex segmentation hierarchies for analysis purposes that are
usually held in entirely separate places from the core transaction systems
and are not stored along with each order or invoice.
Just as importantly, the EII tools entirely ignore the tedious problem of
data quality. It may be news to vendors who have more experience producing
PowerPoint slides than production code, but the quality of data lurking in
the transaction systems is not what it might be. This is why there is an
industry of products to assist with improving data quality, and why a
significant chunk of any data warehouse project budget is associated with
data quality. Oh that’s right; you don’t need a data warehouse any more, so
I guess you may as well ignore that pesky data quality problem as well."
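To make Hayler's "catalog" point concrete, here is a minimal Python sketch of what an EII layer has to do at query time. It assumes two hypothetical source systems, an order system and a billing system, that code the same customers and regions differently. The data, the `CATALOG` rules and `federated_customer_summary` are invented for illustration and do not reflect any particular EII product. The catalog is the same kind of hand-built metadata dictionary a data warehouse needs.

```python
# Minimal sketch of an EII-style "catalog" (all names and data are hypothetical).

# Source system 1: the order system keys customers by a numeric id
# and codes regions with two-letter codes.
ORDER_SYSTEM = [
    {"cust_id": 101, "region_cd": "SE", "order_total": 250.0},
    {"cust_id": 102, "region_cd": "NE", "order_total": 100.0},
    {"cust_id": 101, "region_cd": "SE", "order_total": 75.0},
]

# Source system 2: billing keys the same customers by a string account
# number and spells regions out in full.
BILLING_SYSTEM = [
    {"account_no": "A-0101", "region": "Southeast", "amount_due": 40.0},
    {"account_no": "A-0102", "region": "Northeast", "amount_due": 0.0},
]

# The catalog: per-source rules that translate local codes into one common
# vocabulary. Someone has to build and maintain this by hand.
REGION_CODES = {"SE": "Southeast", "NE": "Northeast"}
CATALOG = {
    "orders": {
        "customer": lambda row: f"CUST-{row['cust_id']}",
        "region": lambda row: REGION_CODES[row["region_cd"]],
    },
    "billing": {
        "customer": lambda row: f"CUST-{int(row['account_no'].split('-')[1])}",
        "region": lambda row: row["region"],
    },
}


def federated_customer_summary():
    """Read both sources, translate each row via the catalog, and merge by customer."""
    summary = {}
    sources = [
        ("orders", ORDER_SYSTEM, "order_total", "ordered"),
        ("billing", BILLING_SYSTEM, "amount_due", "due"),
    ]
    for source, rows, value_col, target in sources:
        rules = CATALOG[source]
        for row in rows:  # a real EII tool would push a query to the live system here
            entry = summary.setdefault(
                rules["customer"](row),
                {"region": rules["region"](row), "ordered": 0.0, "due": 0.0},
            )
            entry[target] += row[value_col]
    return summary


print(federated_customer_summary())
```

A row with a code the catalog does not know, such as a new "SW" region, raises a `KeyError` in this sketch. That is a small-scale version of the maintenance and data quality problems the article describes.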
I took a look at this concept late last year when I posted a link to a
colleague’s presentation on
"Executive BI Dashboards With XML, XQuery And XDS", and Ferenc Palyi raised the
issue of historic data in the article comments. I think Andy Hayler has hit the nail on
the head with his article, although you should also note that he works for
Kalido, who sell very expensive full-service packaged data warehouses, the
antithesis of what EII vendors provide. My thinking nowadays is that EII systems are
more of a point solution for integrating disparate current data in real time than a
replacement for a traditional data warehouse; a point solution is probably what the
originators of the idea first had in mind, before the vendors arrived on the scene.
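A small sketch of the current-versus-history distinction follows, using Python's built-in `sqlite3` module. It assumes two hypothetical tables. The first, `account`, stands in for a transaction-system table that only ever holds the current balance. The second, `account_snapshot`, stands in for a warehouse table that appends a dated row each month. `record_month` and the figures are illustrative only. A real-time query against the source can answer "what is it now?", but only the snapshot table can answer a trend question like the one in Hayler's example.

```python
import sqlite3

# Hypothetical schema: an OLTP table holding only current balances, and a
# warehouse table that keeps one dated snapshot row per account per month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY, region TEXT, balance REAL);
    CREATE TABLE account_snapshot (
        snapshot_month TEXT, account_id INTEGER, region TEXT, balance REAL,
        PRIMARY KEY (snapshot_month, account_id));
""")


def record_month(month, balances):
    """Apply a month's balances to the OLTP table and snapshot them for the warehouse."""
    for account_id, region, balance in balances:
        # The transaction system overwrites: only the latest state survives.
        conn.execute("INSERT OR REPLACE INTO account VALUES (?, ?, ?)",
                     (account_id, region, balance))
        # The warehouse appends: every month's state is kept.
        conn.execute("INSERT INTO account_snapshot VALUES (?, ?, ?, ?)",
                     (month, account_id, region, balance))
    conn.commit()


record_month("2004-10", [(1, "Southeast", 500.0), (2, "Southeast", 300.0)])
record_month("2004-11", [(1, "Southeast", 450.0), (2, "Southeast", 250.0)])
record_month("2004-12", [(1, "Southeast", 400.0), (2, "Southeast", 100.0)])

# A real-time query against the source only sees the current state.
current_total = conn.execute(
    "SELECT SUM(balance) FROM account WHERE region = 'Southeast'").fetchone()[0]

# The trend question needs the history that only the snapshot table kept.
monthly_trend = conn.execute("""
    SELECT snapshot_month, SUM(balance) FROM account_snapshot
    WHERE region = 'Southeast'
    GROUP BY snapshot_month ORDER BY snapshot_month
""").fetchall()

print(current_total)  # 500.0
print(monthly_trend)  # [('2004-10', 800.0), ('2004-11', 700.0), ('2004-12', 500.0)]
```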
There’s a
rather entertaining thread currently running on
AskTom around
Don Burleson’s challenge for someone to create "a reliable predictive
model that will suggest tables and indexes which will measurably benefit from
reorganization". Don is suggesting that it may be possible to build a
model that can predict when a table or index would benefit from rebuilding, and
bases this model on a number of assumptions that Tom subsequently states are
fundamentally flawed. I won’t pretend to be an expert in this area or try
to paraphrase the argument, but take a look at the
original challenge, the
thread on AskTom, and the additional ones on
Don’s Oracle DBA board
and Howard J. Rogers’s
Dizwell Forum.
Note also a couple of good postings,
one by
Robert Freeman and one by
Jonathan Lewis, that discuss whether such a model is in fact possible and
what sort of assumptions it would have to be built on. Fascinating stuff.
Update: Don Burleson asks
"Are All
Oracle Scientists Created Equal" whilst Jonathan Lewis’ Spanish
Correspondent, "Don Quixote", asks
"Can You Spend Too
Much Cash?". See also postings on the
AskTom thread by
Mike Ault,
Howard J. Rogers,
Tim Hall and Tom’s follow-up to
Mark Cunningham’s posting.
The RMOUG Training Days recently took place, and there’s a whole load of
presentations and papers available for download that’ll be of interest to BI&DW
developers. Some interesting ones that I noticed were:
- Oracle 10g Wait Event Tuning with the Automated Session History Tables
- Oracle Discoverer V4 The Basics Uncovered and Administration Setup
- Quick Web Development using JDeveloper 10g
- Oracle 10g SQL Tuning Secrets
- Blowing the Whistle on REF Cursors, Implicit Cursors, and Native Dynamic SQL: When and Why Each of These PL/SQL Constructs Comes In Handy
- Oracle JDeveloper for Database Developers and DBAs
- Analytic SQL Functions - Teaching an Old Dog New Tricks
- Oracle/Unix Scripting Tips and Techniques
- Oracle9i PL/SQL Fundamentals
- Oracle 10g/9i for Developers: What You Need to Know
- Oracle Spatial and Location Technologies
- Profiling Oracle: How It Works
- PL/SQL Debugging: Going Beyond DBMS_OUTPUT
- Data Warehousing 101
- Agile Methods and Data Warehousing
- Data Mining Options in Oracle 10g
- Get the Bigger Picture with OLAP-Enabled Oracle BI 10g - Unified, Strategic, and Extensible
- Bridging the Gap Between Structured and Unstructured Data - All
- Using STATSPACK as a Performance DW
- No Bikinis? Working with SQL’s Model - Intermediate
- Materialized Views in Action
- Supercharging Star Transformations
- Data Vault - What’s the Combination?
- Using the Oracle Metabase Plus Language to Build and Deploy Mappings and Workflows
- XML Survival Skills for DBAs
- Resource Mapping: A Wait Time-Based Methodology for Performance Analysis
- Zeroing In on Performance in Oracle 10g
- Gathering Statistics: How Often and How Precise
- Index Organized Tables - Are They Right for You?
- Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN
- Use EXPLAIN PLAN and TKPROF to Tune your Applications
- BIS and Discoverer EUL in 11.5.9
- Oracle Applications Reporting - How Can I Get What I Need?
Finally, Chris Lawson has
written a useful article on the performance issues that can sometimes arise when
adding PL/SQL functions to an SQL statement.
"How Functions Can Wreck
Performance" should be of interest to any ETL developer looking to use
functions as a way of getting around a tricky SQL problem, and looks in
particular at the effect of calling such PL/SQL functions repeatedly. Well worth
a five-minute read.
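Lawson's article is about Oracle PL/SQL, but the basic effect shows up in any database that lets you call a user-defined function from SQL. The sketch below uses Python's `sqlite3` module and `Connection.create_function` as a stand-in; it is not Oracle and not Lawson's own example. The `region` and `sales` tables, the `region_name` function and the call counter are all hypothetical. The function sits in the SELECT list, so the engine calls it once for every row returned. The equivalent join gets the same answer with no function calls at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (region_cd TEXT PRIMARY KEY, region_name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region_cd TEXT, amount REAL);
""")
conn.executemany("INSERT INTO region VALUES (?, ?)",
                 [("SE", "Southeast"), ("NE", "Northeast")])
# 1,000 hypothetical sales rows alternating between the two regions.
conn.executemany("INSERT INTO sales (region_cd, amount) VALUES (?, ?)",
                 [("SE" if i % 2 else "NE", 10.0) for i in range(1000)])

# Hypothetical in-memory lookup standing in for the query a PL/SQL
# function would typically run on each call.
REGION_NAMES = dict(conn.execute("SELECT region_cd, region_name FROM region"))
call_count = 0


def region_name(code):
    """User-defined function invoked from SQL; counts how often the engine calls it."""
    global call_count
    call_count += 1
    return REGION_NAMES[code]


conn.create_function("region_name", 1, region_name)

# Version 1: the function in the SELECT list is evaluated once per row returned.
via_function = conn.execute(
    "SELECT sale_id, region_name(region_cd) FROM sales ORDER BY sale_id").fetchall()
calls_for_function_version = call_count

# Version 2: the same answer as a plain join, with no user function involved.
via_join = conn.execute("""
    SELECT s.sale_id, r.region_name
    FROM sales s JOIN region r ON r.region_cd = s.region_cd
    ORDER BY s.sale_id
""").fetchall()

print(calls_for_function_version, via_function == via_join)
```

In Oracle each call costs more than in this sketch. Every invocation switches between the SQL and PL/SQL engines, and any query inside the function runs separately for each row. That repeated-call effect is what the article looks at.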
