EII, Predictive Modelling & RMOUG Training Day Presentations
Some more interesting articles and papers.
"EII - Dead On Arrival" by Andy Hayler for DMReview looks at the current buzz around Enterprise Information Integration - "lightweight BI" based around the synthesis of XML feeds - and finds it comes up lacking:
"Some people who should know better have swallowed this EII mirage hook, line and sinker, and a number of start-up companies have been funded flaunting "EII for business intelligence" messages. The only problem with this new futurist approach is that it is absolutely and utterly flawed.
Let's consider the problem again. You have data in dozens of incompatibly structured source systems. Your new EII software is somehow going to build a presumably fairly complex set of distributed queries that will zip off to the source transaction systems, interrogate them and bring back a result set that will somehow produce a consistent answer.
The first problem is: how exactly does the EII software know what the linkages are between the differently coded source system structures? Somewhere it is going to have a catalog which will translate the differences, rather like a dictionary to translate words from one language to another. This sounds suspiciously like a metadata dictionary of the type that data warehouses have to construct, but let's leave that aside for the moment.
What exactly happens when those distributed queries make their way through to the source systems? For a start, the unpredictable nature of queries will upset the careful load balancing done by operations departments to optimize online throughput. Or rather, it won't, because no systems managers are going to allow this technology anywhere near their delicately balanced systems, at least not after the first time it brings the ordering system to a grinding halt.
The next problem with the EII approach is that there is no history. For transaction systems, you want to archive data quickly in order to maintain high performance (there is no need to worry about what your account balance was last year, just what it is now; last year's balance can be archived). However, for an inquiry - show me the trend in account withdrawals over the last year in the Southeast region - this does require historical data.
Next, do these vendors really think that all the analysis hierarchies needed are embedded within the ERP systems? To take the example of marketing, there are normally complex segmentation hierarchies for analysis purposes that are usually held in entirely separate places from the core transaction systems and are not stored along with each order or invoice.
Just as importantly, the EII tools entirely ignore the tedious problem of data quality. It may be news to vendors who have more experience producing PowerPoint slides than production code, but the quality of data lurking in the transaction systems is not what it might be. This is why there is an industry of products to assist with improving data quality, and why a significant chunk of any data warehouse project budget is associated with data quality. Oh that's right; you don't need a data warehouse any more, so I guess you may as well ignore that pesky data quality problem as well."
I took a look at this concept late last year when I posted a link to a colleague's presentation on Executive BI Dashboards With XML, XQuery And XDS, and the issue around historic data was raised by Ferenc Palyi in the article comments. I think Andy Hayler has hit the nail on the head with his article - although you should also note that he works for Kalido, who sell very expensive, full-service packaged data warehouses, the antithesis of what EII vendors provide. My own thinking nowadays is that EII systems are more of a point solution for integrating disparate current data in real time than a replacement for a traditional data warehouse, which is probably what the originators of the idea had in mind before the vendors arrived on the scene.
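To make the mechanics Hayler describes a little more concrete, here is a minimal sketch of the kind of federated query an EII-style tool would have to generate behind the scenes - pulling current rows from two differently coded source systems over database links and reconciling them through a mapping table. The database links, table names and REGION_XREF crosswalk are all hypothetical, invented purely for illustration:

```sql
-- Hypothetical federated query: combine current order data from two
-- source systems that code regions differently. ORDERS_EAST and
-- ORDERS_WEST are made-up database links, and REGION_XREF is the
-- "dictionary" table that translates each system's codes to a
-- common value - the metadata catalogue Hayler refers to.
SELECT x.std_region,
       SUM(src.order_value) AS total_order_value
FROM   (SELECT o.region_cd AS src_region, o.order_value
        FROM   orders@orders_east o              -- source system 1
        UNION ALL
        SELECT o.rgn        AS src_region, o.net_amt AS order_value
        FROM   sales_order@orders_west o         -- source system 2
       ) src,
       region_xref x                             -- code-to-code crosswalk
WHERE  x.src_region = src.src_region
GROUP  BY x.std_region;
```

Even in this toy form you can see where the real cost lies: the crosswalk table has to exist and be maintained somewhere, and both remote queries hit the live transaction systems - which is precisely Hayler's point.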
There's a rather entertaining thread currently running on AskTom around Don Burleson's challenge for someone to create "a reliable predictive model that will suggest tables and indexes which will measurably benefit from reorganization". Don is suggesting that it may be possible to build a model that can predict when a table or index would benefit from rebuilding, and bases this model on a number of assumptions that Tom subsequently states are fundamentally flawed. I won't pretend to be an expert in this area or try to paraphrase the argument, but take a look at the original challenge, the thread on AskTom, and the additional ones on Don's Oracle DBA board and Howard J. Rogers' Dizwell Forum. Note also a couple of good postings, one by Robert Freeman and one by Jonathan Lewis, that discuss whether such a model is in fact possible and what sort of assumptions it would have to be built on. Fascinating stuff.
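For context, the raw material such a predictive model would presumably work from is the structural statistics Oracle already exposes - for example the INDEX_STATS view populated by ANALYZE INDEX ... VALIDATE STRUCTURE. Whether figures like deleted leaf rows actually predict a measurable benefit from a rebuild is precisely what the argument is about; the snippet below (using a made-up index name) just shows how the numbers are obtained:

```sql
-- Gather structural statistics for one index. Note that VALIDATE
-- STRUCTURE locks the underlying table while it runs, so this is
-- not something to fire off casually against a busy system.
ANALYZE INDEX sales_pk VALIDATE STRUCTURE;

-- INDEX_STATS holds a single row for the index just analysed.
-- Rules of thumb based on these columns (height, deleted leaf rows
-- as a percentage of leaf rows, and so on) are exactly the sort of
-- assumptions being challenged on the AskTom thread.
SELECT name,
       height,
       lf_rows,
       del_lf_rows,
       ROUND(100 * del_lf_rows / NULLIF(lf_rows, 0), 1) AS pct_deleted
FROM   index_stats;
```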
Update: Don Burleson asks "Are All Oracle Scientists Created Equal" whilst Jonathan Lewis' Spanish Correspondent, "Don Quixote", asks "Can You Spend Too Much Cash?". See also postings on the AskTom thread by Mike Ault, Howard J. Rogers, Tim Hall and Tom's follow-up to Mark Cunningham's posting.
The RMOUG Training Days recently took place, and there's a whole load of presentations and papers available for download that'll be of interest to BI&DW developers. Some interesting ones that I noticed were:
- Oracle 10g Wait Event Tuning with the Automated Session History Tables
- Oracle Discoverer V4: The Basics Uncovered and Administration Setup
- Quick Web Development using JDeveloper 10g
- Oracle 10g SQL Tuning Secrets
- Blowing the Whistle on REF Cursors, Implicit Cursors, and Native Dynamic SQL: When and Why Each of These PL/SQL Constructs Comes In Handy
- Oracle JDeveloper for Database Developers and DBAs
- Analytic SQL Functions-Teaching an Old Dog New Tricks
- Oracle/Unix Scripting Tips and Techniques
- Oracle9i PL/SQL Fundamentals
- Oracle 10g/9i for Developers: What You Need to Know
- Oracle Spatial and Location Technologies
- Profiling Oracle: How It Works
- PL/SQL Debugging: Going Beyond DBMS_OUTPUT
- Data Warehousing 101
- Agile Methods and Data Warehousing
- Data Mining Options in Oracle 10g
- Get the Bigger Picture with OLAP-Enabled Oracle BI 10g - Unified, Strategic, and Extensible
- Bridging the Gap Between Structured and Unstructured Data
- Using STATSPACK as a Performance DW
- No Bikinis? Working with SQL's Model
- Materialized Views in Action
- Supercharging Star Transformations
- Data Vault-What's the Combination?
- Using the Oracle Metabase Plus Language to Build and Deploy Mappings and Workflows
- XML Survival Skills for DBAs
- Resource Mapping: A Wait Time-Based Methodology for Performance Analysis
- Zeroing In on Performance in Oracle 10g
- Gathering Statistics: How Often and How Precise
- Index Organized Tables-Are They Right for You?
- Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN
- Use EXPLAIN PLAN and TKPROF to Tune your Applications
- BIS and Discoverer EUL in 11.5.9
- Oracle Applications Reporting-How Can I Get What I Need?
Finally, Chris Lawson has written a useful article on the performance issues that can arise when PL/SQL functions are added to a SQL statement. "How Functions Can Wreck Performance" should be of interest to any ETL developer looking to use functions as a way around a tricky SQL problem, and looks in particular at the effect of calling such functions repeatedly. Well worth a five-minute read.
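The trap Chris describes is a PL/SQL function in the SELECT list or WHERE clause that ends up being executed once per row, paying the SQL-to-PL/SQL context switch (and often its own lookup query) every time. A common mitigation, sketched below with made-up table and function names, is to wrap the call in a scalar subquery so that Oracle can cache results for repeated input values, and to flag genuinely deterministic functions as DETERMINISTIC:

```sql
-- Hypothetical lookup function that runs its own SELECT on every call.
CREATE OR REPLACE FUNCTION get_region_name (p_region_id IN NUMBER)
  RETURN VARCHAR2
  DETERMINISTIC       -- only safe if the result depends solely on the input
IS
  v_name regions.region_name%TYPE;
BEGIN
  SELECT region_name INTO v_name
  FROM   regions
  WHERE  region_id = p_region_id;
  RETURN v_name;
END get_region_name;
/

-- Called like this, the function fires once for every row in SALES:
SELECT s.sale_id,
       get_region_name(s.region_id) AS region_name
FROM   sales s;

-- Wrapped in a scalar subquery, Oracle can cache the result for
-- repeated region_id values, often cutting the number of calls
-- (and context switches) dramatically:
SELECT s.sale_id,
       (SELECT get_region_name(s.region_id) FROM dual) AS region_name
FROM   sales s;
```

Of course, where the lookup can be expressed as a plain join, doing so usually beats both versions.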