Open World Paper on RAC, DW and 32-bit Linux

If you read my posting a couple of months ago asking whether RAC was suitable for data warehousing, and the debate beforehand on the Dizwell Forum, you might be interested in a white paper I've just come across on last year's Open World download site. Entitled "20TB on a Linux Cluster Today: How To Build a Multiterabyte Data Warehouse, Using Linux and RAC" (no author name is given), it covers some of the questions that I raised in the original postings.

The questions I had at the time were mainly about whether a set of individual servers could put enough (contiguous) RAM together to satisfy big queries, and whether replacing one big server with lots of smaller ones would mean that parallel queries would run faster. Page 15 of the paper has a section on the memory question:

"One of the largest concerns when implementing a database system on 32-bit Linux has always been how much memory could be addressed. Oracle 10g out of the box can only have a 1.7GB of shared memory for its SGA on a generic 2.4.x 32-bit Linux kernel. However, it is possible to increase this size, using one of the following options: change the memory mapped_base, using a shared memory file system or implement hugetlbfs. For more information about these option please see Appendix C of the Database Administrator’s Reference 10g Release 1 (10.1) for UNIX Systems: AIX-Based Systems, hp HP-UX PA-RISC (64-bit), hp Tru64 UNIX, Linux x86, and Solaris Operating System (SPARC).

However, most data warehouses don’t need a large SGA. Data warehouse applications are dominated by queries scanning a lot of data, performing join, large sort and bitmap operations at a high degree of parallelism. Memory for hash joins and sort operations is allocated out of the PGA (Program Global Area), not the SGA. PGA memory is not bound by the 1.7GB SGA limit. A 4 CPU system running a degree of parallelism of 8 typically uses less than 3.2GB of PGA. To tune the PGA please refer to the Oracle® Database Performance Tuning Guide 10g Release 1 (10.1)."
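Just to put those PGA numbers in context, here's a rough sketch of how you'd set the instance-wide PGA target on 10g and then check what's actually been allocated - the 3.2GB figure is simply the paper's 4 CPU / DOP 8 example rather than any sort of recommendation:

-- Set an aggregate PGA target of roughly 3.2GB (assumes an spfile is in use)
ALTER SYSTEM SET pga_aggregate_target = 3200M SCOPE = BOTH;

-- Compare the target with what the instance has actually allocated
SELECT name, ROUND(value/1024/1024) AS mb
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'maximum PGA allocated');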

Page 16 then has a section on parallel query:

"The largest demand for interconnect traffic in data warehouse applications comes from inter process communication (IPC). When performing join, aggregation or load operations involving multiple nodes it might be necessary to re-distribute data and send control messages from one node to another. Processes, which are also called Parallel Servers, communicate with each other using the Interconnect. The amount of interconnect traffic depends on the operation and the number of nodes participating in the operation. Join operations tend to utilize the interconnect traffic more than simple aggregations because of possible communication between Parallel Servers. The amount of interconnect traffic can vary significantly depending on the distribution method. Which distribution method is used can be found in the PQ Distrib column of the query plan. Cases where one side of the join is broadcasted or both sides are hash-distributed result in the largest interconnect traffic. Partial Partition Wise Joins in which only one side of the join is redistributed result in less interconnect traffic, while Full Partition Wise Joins in which no side needs to be redistributed result in the least interconnect traffic.

The amount of interconnect traffic also depends on how many nodes participate in a join operation. The more nodes that participate in a join operation, the more data needs to be distributed to remote nodes. For instance, in a 4-node RAC cluster with 4 CPUs each, to maximize load performance with external tables one needs to set the DOP to 32 on both the external and internal tables. This will result in 8 Parallel Servers performing read operations from the external table on each node, as well as 8 Parallel Servers performing table creation statements on each node. On the other hand, if there are on average 4 users on the system issuing queries, it is very likely that each user's query runs locally on one node, reducing the amount of remote data distribution to zero."

All of which backs up what was said on the Forum. There are also some useful sections on setting up storage arrays, measuring I/O throughput and tuning the interconnect protocols.
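One thing I was curious about was how you'd actually see which distribution method the optimizer has picked - the paper just says to look at the PQ Distrib column of the query plan. Running EXPLAIN PLAN on a parallel query and formatting the output with DBMS_XPLAN shows it; the tables and columns below are just made-up examples:

-- Hypothetical fact and dimension tables, enabled for parallel query
ALTER TABLE sales     PARALLEL 8;
ALTER TABLE customers PARALLEL 8;

-- A typical DW-style parallel join and aggregation
EXPLAIN PLAN FOR
  SELECT /*+ PARALLEL(s 8) PARALLEL(c 8) */
         c.cust_region, SUM(s.amount)
  FROM   sales s, customers c
  WHERE  s.cust_id = c.cust_id
  GROUP  BY c.cust_region;

-- The formatted plan includes TQ, IN-OUT and PQ Distrib columns against the
-- parallel steps, e.g. HASH, BROADCAST or PART (KEY) distributions
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);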

The interesting point for me here was that the PGA isn't subject to the same 1.7GB limit as the SGA - it can in fact go up to the 4GB limit for a single process - and that this PGA allocation is in addition to the SGA total, which means that you can take advantage of RAM amounts greater than 4GB on a server. Of course with a 64-bit server (x86-64, Itanium etc.) this is a non-issue, but if you're looking to use cheapo 4-processor commodity boxes, it looks like this addresses the issue of whether enough RAM is available.

It's also worth bearing in mind if you're looking to use OWB10g on Linux x86-64; OWB10g (in 64-bit mode) is only certified on AMD Opteron 64-bit chips, not Intel 64-bit chips, and then only when using SUSE SLES9, which could mean that you end up running both the database and OWB in 32-bit mode even when you've got 64-bit processors. Anyway, the paper's here and the Dizwell discussion's here if you want to read more.
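Incidentally, if you want to see how close individual parallel server processes are getting to that per-process ceiling, the PGA columns on v$process record a high-water mark for each one - this isn't from the paper, just a quick sanity check:

-- Per-process PGA usage, largest high-water mark first
SELECT program,
       ROUND(pga_used_mem  / 1024 / 1024) AS used_mb,
       ROUND(pga_alloc_mem / 1024 / 1024) AS alloc_mb,
       ROUND(pga_max_mem   / 1024 / 1024) AS max_mb
FROM   v$process
ORDER  BY pga_max_mem DESC;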