Neil Raden On ETL, Real-Time Data Warehousing and the Semantic Web
April 25th, 2005 by Mark Rittman
Neil Raden, whose article on Model-Driven Approaches For BI Projects I linked
to last year, dropped me a line to tell me about some new real-time ETL
articles he’d written:
"Mark,
I’ve written some articles and white papers about real-time data
warehousing:
http://ww.hiredbrains.com/knowout.html
…but I think the really interesting part of it is not data warehousing,
per se, but abstraction and real-time analytics. Abstraction can provide the
logical-to-physical layer between a data warehouse and a BI app, but it can
also provide the kind of rich meaning we need for our machines to do some
reasoning, something all current data warehousing and BI concepts lack.
For that, I’m investigating the competing approaches of the Semantic Web and
Emergent Semantics, though I’m leaning toward the latter.
With good semantics and Moore’s Law, much of data warehousing becomes
irrelevant. As for BI, most of it is parlor tricks. I’m hoping to see a new
batch of tools that can reason and learn, at least in the limited domains of
commerce, supply chain, CRM, etc. That’s where real time will show some
returns."
Apart from the list of Neil’s ETL and data warehousing articles, there are a
couple of good (free) e-Books linked to on Neil’s site, including one on ETL and
Data Integration (including articles on Kimball vs. Inmon, the model-driven
BI project and two on real-time data warehousing), and another one specifically
about real-time data warehousing. You have to go through an annoying registration
process to get the books, and they’re Windows-only executables, but it looks
like there’s some interesting content there. See also
"Implementing Real-Time Data Warehousing Using Oracle 10g" on DBAZine.
Real-time data warehousing is an interesting area, and one that’s addressed
by some of the new features in OWB "Paris" – the ability to accept data from
Advanced Queues and web services, and the ability to publish out to the same,
such that you can publish an OWB mapping that "listens" for ETL data and then
transforms and hands it off in real time. The reality though, at least in my
experience in the UK, is that the market isn’t really clamouring for this
at the moment, at least not in any volume. What is of interest though is
reducing ETL load time to as close to zero as possible, with as little
impact as possible on users who are accessing the system, and it’s this
requirement that’s driving my interest in this area. Whilst it’s still probably
a while off before a significant number of OWB users use technologies such as
web services and AQs to process their ETL jobs, RDBMS technologies such as
external tables, table functions/pipelining and change data capture are already
getting take-up and are starting to become a normal feature of OWB projects.
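To make the "listening" idea a bit more concrete, here is a minimal Python sketch of a pipeline that picks rows off a queue, transforms them as they stream past and hands them on to a target. It is only an analogy: the in-memory `queue.Queue` stands in for an Advanced Queue, the generator plays the role of a pipelined table function, and every name in it (`listen`, `transform`, `load`, `run_demo`, the `customer` and `amount` fields) is hypothetical rather than anything from OWB or the Oracle APIs.

```python
import queue
from typing import Iterator, List

# Hypothetical end-of-stream marker; a real message queue would normally
# just block and wait for the next message instead.
SENTINEL = None


def listen(q: queue.Queue) -> Iterator[dict]:
    """Yield rows as they arrive on the queue until the sentinel is seen."""
    while True:
        row = q.get()
        if row is SENTINEL:
            return
        yield row


def transform(rows: Iterator[dict]) -> Iterator[dict]:
    """Clean each row as it streams past, without staging the whole batch."""
    for row in rows:
        yield {
            "customer": row["customer"].strip().upper(),
            "amount": round(float(row["amount"]), 2),
        }


def load(rows: Iterator[dict], target: List[dict]) -> int:
    """Append transformed rows to the target and return how many were loaded."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


def run_demo():
    """Push two made-up rows through the pipeline and return the result."""
    q = queue.Queue()
    q.put({"customer": " acme ", "amount": "10.50"})
    q.put({"customer": "globex", "amount": "3"})
    q.put(SENTINEL)

    warehouse: List[dict] = []
    loaded = load(transform(listen(q)), warehouse)
    return loaded, warehouse
```

Because each stage is a generator, a row is transformed and loaded as soon as it is read from the queue, which is the same reason pipelining helps push ETL load time towards zero.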
I must also take a proper look at this "semantics" stuff – it’s been
cropping up a lot recently and I need to get a better understanding of what it is all about…

