New Business Intelligence Articles And Links
February 20th, 2005 by Mark Rittman
A few articles and papers I came across recently, that you might find
interesting.
First up is a number of in-depth articles written by
Howard Rogers, on some Oracle
10g features that BI&W developers might find interesting. The first article is
on materialized
views, which Howard describes as "amongst the cleverest, neatest, most
productive features Oracle has ever come up with, and they continue to impress
me every time I use them, even though they ve been around for many, many Oracle
versions (including a stint in ancient versions as something known as a
snapshot table )" Materialized views are the cornerstone of Oracle’s
relational OLAP offering and this article goes through the manual creation of
materialized views, the use of the new DBMS_ADVISOR which writes the
materialized view creation code for you, and the miscellaneous small
improvements in materialized views that come with 10g. The other two articles,
on writable
external tables and the new
workload
repository, are also useful from an ETL and tuning standpoint, and whilst
you’re there I’d recommend taking a look at the
rest of the site and in
particular Howard’s
blog, as he’s a veteran of the various Oracle usenet groups and a recently
"ex-member"
of the Oaktable. Always well written and worth taking the time out to read.
Oracle’s Antony Reynolds recently posted a blog article about
scaling J2EE
applications, which caught my interest as I recently worked on a couple of
short engagements that centred around the stress-testing of a J2EE application.
According to Antony, the factors that usually limit applications are resource
contention, non-linear scaling, cost and manageability and mentions two common "gotcha’s"
when scaling J2EE applications - mismatched connection pools and unnecessary
container managed transactions. A good, fairly concise article that gives a good
heads up on what can limit the scalability of a J2EE app and how to deal with
this.
There was an interesting posting on the
AMIS blog the other day about
model-driven design on
ETL functions. The posting references an article in Database Magazine by
Mark Zwijsen which proposes that "by structuring the functional design of the
ETL process, it should be possible to automatically
generate about 80% of the actual ETL code".
Unfortunately there isn’t a link to an online version of the article and I can’t
locate it with a Google search (maybe it’s in Dutch?) but there’s a bit more
detail in the blog posting and a diagram of a possible ETL structure.
Nigel Pendse recently published the results of the
OLAP Survey 4 and following on from
that wrote an article for DMReview that
went through some of the findings. A couple of points that I found
interesting were firstly, that Oracle Discoverer users report the worst query
performance, and that secondly, Oracle Express users report the best data
quality. Looking at the comment about Discoverer in particular, I can’t help
thinking that this is because, with later versions of Discoverer (9ias and up)
it’s considerably more difficult to get summaries working than with earlier
versions; with Discoverer 3i, Discoverer itself managed the summary tables, and
redirected it’s queries to summaries if they would speed up queries. With
current versions of Discoverer, you use the query rewrite/materialized views
mechanism, which means that the DBA has to get the rewrite mechanism working
(changing initialisation parameters for the database), grant certain permissions
to both the Discoverer administrator, the End User Layer and the Discoverer
users so that query rewrite can take place, and even then with early versions of
Discoverer 9iAS running against 8i and 9i databases, it was hit and miss as to
whether a materialized view would actually get used. With current versions, it’s all a
lot more reliable, but it still takes DBA intervention and changes to the
database parameters, and in most cases the Discoverer administrator never even
knows about these issues and what needs to be done to enable summaries, and just
assumes that performance will always be this bad. Coupled with that, for most
Discoverer reporting projects, unlike projects that use dedicated, proprietary
OLAP servers, there usually isn’t a section in the project plan to create and
manage summaries, as they’re not mandatory (as they are with OLAP servers such
as Express or Powerplay) and the system seems to work just fine without them, at
least at the start. In short, there’s no technical reason why Discoverer users
should have to put up with poor query performance, but because of the
combination of a difficult to (initially) set up summary management mechanism,
and the fact that summaries are "optional", in fact most Discoverer
installations do run pretty slowly.
BI Blogs describes itself as "Bringing
together Business Intelligence information from across the blogosphere!" and
aggregates feeds from a number of Oracle, Microsoft and other BI-focussed blogs
including ones by
Mosha Pasumansky, the
Hitachi Consulting Yukon Team,
Jonathan
Weitz and yours truly. I thought this was an interesting development as I
keep a close eye on what Microsoft, Cognos, Business Objects and the like are
doing with business intelligence & OLAP, and it’s particularly interesting to
see how the Yukon business intelligence features are shaping up. One to
bookmark.
Lastly, there’s a trio of new articles in
IntelligentEnterprise that’ll
be of interest to readers of this site. Ralph Kimball and Margy Ross look at
slowly changing dimensions, and
how types 1, 2 and 3 sometimes need to be adapted for more analytically mature
warehouse users; Seth Grimes looks at
unstructured data and domain-rooted semantics, whilst Michael Gonzales looks
at
using rules-based audits and proofs of concept to avoid potential BI project
pitfalls.

February 23rd, 2005 at 3:42 am
Just wanted to comment on the section about poor Discoverer query performance. In general the client server version performs fairly well, but Oracle’s Plus version from 902 upwards seems to be a dog. We currently have 10.1.2 of Plus in test mode and it has made substantial improvements in performance.
As far as the summaries go, another obstacle seems to be that the SQL that Discoverer writes is so convoluted and complex, it defeats almost every attempt at creating summaries to redirect the queries to. Even using the “Summary Advisor” in the admin tool yields mediocre results. In particular, alternate sorts applied to data seem to throw almost all summaries for a loop and “go around” the query redirection mechanism.
An effective technique I have seen done is a single table summary based on the fact table, along with indexes on the summaries. The dimension tables were also converted to IOT’s which yielded much better performance.
Just my 2 cents..
February 23rd, 2005 at 8:48 pm
Mark,
Good point, thanks for this. Interesting comment about the IOT dimensions as well, I’ll take a look at that for the next system I set up.
thanks
Mark