Is RAC Suitable For Data Warehousing?
May 30th, 2005 by Mark Rittman
Another one of our clients is looking to put some hardware in place for a new
data warehousing project. The two hardware options are:
- Buy a single, large Unix server (probably a HP box running HP-UX) with 8
CPUs - Buy two or more commodity "Lintel" boxes, with 2 or 4 processors,
running Red Hat Advanced Server, and use RAC to create a single database
instance.
But just how suitable is Real Application Clusters for data warehousing? The reason I started doubting RAC’s suitability was an article I came across entitled
"Oracle Real Application
Clusters and Data Warehouse Applications" which stated the following:
"Oracle has marketed the Real Application Clusters (RAC) option as
appropriate for all types of applications, but there is one very important
exception, Oracle data warehouses. Oracle RAC is used primarily for
continuous availability of mission-critical systems (i.e. banking
applications) and for transparent scalability of massive online transactions
systems (i.e. Amazon). However, it is not always the best architecture for
data warehouse that require large materialized view rollups because they
often require large, cohesive RAM regions (large data sorts) and
tightly-coupled CPU s (fast parallel query) … In sum, Oracle Real
Application Clusters is a wonderful tool for mission-critical database that
must have continuous availability (using RAC with Transparent Application
Failover (TAF)) and for scalability of super-large OLTP systems, Oracle RAC
is not the best choice for data warehouse applications that require
high-speed table scan performance."
Now I know Don and Mike have spent quite a bit of time working with RAC, so I
thought it wise to look into this in a bit more detail. What I did in the end
was
post a link to the article on the
Dizwell Forum, and ask the forum participants for their experiences with RAC
and data warehousing. I won’t repeat the whole thread (which went on for three
pages, so it’s obviously an area of interest) but if you’re considering RAC for
a data warehousing project,
take a look at the thread and
Don’s
article, as they’re both useful bits of background reading. My conclusion? A
large, single server and a collection of clustered servers aren’t
interchangeable concepts; there could be issues with RAC with "edge cases", for
instance (as mentioned in the forum comments) where a parallel query needs to
run over multiple nodes, and the "large cohesive memory area" issue I still
think could be an problem if you need more memory than a single small server can
provide. The lesson I guess though is that there’s no "black and white" rule for
this sort of thing (not that the original article was suggesting this) and it’s
worth getting a bit of peer advice if you’re looking to get an answer to a
question such as this, with presumably big implications if you get it wrong.

June 1st, 2005 at 2:35 am
Mark,
I see Don updated the article. It used to be rather black and white. It is good to see we’ve taken the rather long leap from:
http://www.phpbbserver.com/phpbb/viewtopic.php?t=52&mforum=dizwellforum&sid=e69fcb5ba2c0e5a02b6f083e4f069eb8
However, it is not an appropriate architecture for data warehouse
To the more subtle
However, it is not always the best architecture for data warehouse
as you’ve quoted today. Very dramatic what a couple of words can do to the implication of a statement.
Public comment is a good thing. I would encourage all to read the thread on the Dizwell forum. There they will hear about many instances of data warehouses successfully running on RAC, instead of inferences that to do so would not be a good thing.
June 4th, 2005 at 1:10 pm
Tom,
Good observation. Didn’t actually notice at the time that the article had changed, as you say the original version was more “black and white”, hence my posting to the Dizwell forum. Thanks for pointing this out.