Being Too Clever For Your Own Good

August 24th, 2005 by Mark Rittman

I was with a client the other week and was asked to look at a particular SQL
statement that kept failing due to lack of TEMP space. It looked something like
this (names changed to protect the innocent, etc.)

CREATE TABLE lookup_table
AS
SELECT
    destn,
	studref,
    min(pi) pi
fROM (
SELECT
     pct1.DES       as container_type
    ,pc1.ID                 as pi
    ,pc1.SPN as pn
    ,pct2.DES       as Desc2
    ,pc2.DES        as studref
    ,pct3.des       as desc3
    ,pc3.dest_code        as destn
FROM
     container pc1
    ,element  e1
    ,container_type pct1
    ,element  e2
    ,container pc2
    ,container_type pct2
    ,element  e3
    ,container pc3
    ,container_type pct3
WHERE
    pc1.CONTAINER_TYPE_ID = 3
AND e1.CONTAINER_ID       = pc1.id
AND pct1.ID                       = pc1.CONTAINER_TYPE_ID
AND e2.id                         = e1.PARENT_ID
AND pc2.ID                        = e2.CONTAINER_ID
AND pct2.id                       = pc2.CONTAINER_TYPE_ID
AND e3.id                         = e2.PARENT_ID
AND pc3.ID                        = e3.CONTAINER_ID
AND pct3.id                       = pc3.CONTAINER_TYPE_ID
) pages
GROUP BY destn, studref
;

which when executed gave the following error:

CREATE TABLE lookup_table
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P030
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Now of course my first reaction was "get the TEMP tablespace extended" but it
had already just been extended to 8GB (made up of 4 tempfiles). So what could be
causing the issue?

The first thing I did was to run an explain plan on the query, as my
suspicion was that the joins were perhaps going to be hash joins, which can make
heavy use of the TEMP tablespace if the hash can’t be built in memory (defined
by the HASH_AREA_SIZE parameter). The explain plan looked like this:

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

 PLAN_TABLE_OUTPUT
 ----------------------------------------------------------------------------------------------------

 ----------------------------------------------------------------------------------------------------
 | Id  | Operation                                | Name                      | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
 ----------------------------------------------------------------------------------------------------
 |   0 | CREATE TABLE STATEMENT                   |                           |     1 |   120 |   406 |        |      |            |
 |   1 |  LOAD AS SELECT                          |                           |       |       |       |        |      |            |
 |   2 |   PX COORDINATOR                         |                           |       |       |       |        |      |            |
 |   3 |    PX SEND QC (RANDOM)                   | :TQ10004                  |     1 |   120 |   406 |  Q1,04 | P->S | QC (RAND)
 |   4 |     SORT GROUP BY                        |                           |     1 |   120 |   406 |  Q1,04 | PCWP |            |
 |   5 |      PX RECEIVE                          |                           |     1 |   120 |   405 |  Q1,04 | PCWP |            |
 |   6 |       PX SEND HASH                       | :TQ10003                  |     1 |   120 |   405 |  Q1,03 | P->P | HASH       |
 |   7 |        NESTED LOOPS                      |                           |     1 |   120 |   405 |  Q1,03 | PCWP |            |
 |   8 |         HASH JOIN                        |                           |     1 |   112 |   405 |  Q1,03 | PCWP |            |
 |   9 |          PX RECEIVE                      |                           |     1 |   100 |   305 |  Q1,03 | PCWP |            |
 |  10 |           PX SEND BROADCAST              | :TQ10002                  |     1 |   100 |   305 |  Q1,02 | P->P | BROADCAST  |
 |  11 |            NESTED LOOPS                  |                           |     1 |   100 |   305 |  Q1,02 | PCWP |            |
 |  12 |             NESTED LOOPS                 |                           |     1 |    87 |   305 |  Q1,02 | PCWP |            |
 |  13 |              NESTED LOOPS                |                           |     1 |    77 |   305 |  Q1,02 | PCWP |            |
 |  14 |               HASH JOIN                  |                           |     1 |    65 |   305 |  Q1,02 | PCWP |            |
 |  15 |                PX RECEIVE                |                           |     1 |    47 |   206 |  Q1,02 | PCWP |            |
 |  16 |                 PX SEND BROADCAST        | :TQ10001                  |     1 |    47 |   206 |  Q1,01 | P->P | BROADCAST  |
 |  17 |                  HASH JOIN               |                           |     1 |    47 |   206 |  Q1,01 | PCWP |            |
 |  18 |                   BUFFER SORT            |                           |       |       |       |  Q1,01 | PCWC |            |
 |  19 |                    PX RECEIVE            |                           |     1 |    26 |     0 |  Q1,01 | PCWP |            |
 |  20 |                     PX SEND BROADCAST    | :TQ10000                  |     1 |    26 |     0 |        | S->P | BROADCAST  |
 |  21 |                      NESTED LOOPS        |                           |     1 |    26 |     0 |        |      |            |
 |  22 |                       INDEX UNIQUE SCAN  | PK_CONTAINER_TYPE         |     1 |    13 |     0 |        |      |
 |  23 |                       INDEX FULL SCAN    | PK_CONTAINER_TYPE         |     1 |    13 |       |        |      |
 |  24 |                   PX BLOCK ITERATOR      |                           |  2708K|    54M|   202 |  Q1,01 | PCWC |            |
 |  25 |                    TABLE ACCESS FULL     | CONTAINER        	      |  2708K|    54M|   202 |  Q1,01 | PCWP |            |
 |  26 |                PX BLOCK ITERATOR         |                           |  2340K|    40M|    96 |  Q1,02 | PCWC |            |
 |  27 |                 TABLE ACCESS FULL        | ELEMENT                   |  2340K|    40M|    96 |  Q1,02 | PCWP |            |
 |  28 |               TABLE ACCESS BY INDEX ROWID| ELEMENT                   |     1 |    12 |     2 |  Q1,02 | PCW
 |  29 |                INDEX UNIQUE SCAN         | PK_ELEMENT                |     1 |       |     1 |  Q1,02 | PCWP |
 |  30 |              TABLE ACCESS BY INDEX ROWID | CONTAINER        	      |     1 |    10 |     2 |  Q1,02 | PC
 |  31 |               INDEX UNIQUE SCAN          | PK_CONTAINER              |     1 |       |     1 |  Q1,02 | PCWP |
 |  32 |             INDEX UNIQUE SCAN            | PK_CONTAINER_TYPE         |     1 |    13 |     0 |  Q1,02 | PCWP |
 |  33 |          PX BLOCK ITERATOR               |                           |  2340K|    26M|    96 |  Q1,03 | PCWC |            |
 |  34 |           TABLE ACCESS FULL              | ELEMENT                   |  2340K|    26M|    96 |  Q1,03 | PCWP |            |
 |  35 |         TABLE ACCESS BY INDEX ROWID      | CONTAINER                 |     1 |     8 |     2 |  Q1,03 | PCWP |
 |  36 |          INDEX UNIQUE SCAN               | PK_CONTAINER              |     1 |       |     1 |  Q1,03 | PCWP |            |
----------------------------------------------------------------------------------------------------

A couple of things jumped out at me: firstly, parallel query is being used
(spot the PX RECEIVE, PX SEND BROADCAST and PX BLOCK ITERATOR operations), and
secondly, there’s lots of hash joins going on (three to be precise, which
matches the number of tables in the query). So, a couple more bits of
information to find out : roughly how big are the tables that we’re joining (the
smallest of which will determine the size of hash table built) and how big is
the HASH_AREA_SIZE?

SQL> show parameter hash_area_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
hash_area_size                       integer     131072

SQL> select bytes from dba_segments where owner='STAGING' and segment_name = 'CONTAINER';

     BYTES
----------
 318767104

SQL> select bytes from dba_segments where owner='STAGING' and segment_name = 'ELEMENT';

     BYTES
----------
 150994944
SQL> select bytes from dba_segments where owner='STAGING' and segment_name = 'CONTAINER_TYPE';

     BYTES
----------
 65536

So, from looking at these figures, the HASH_AREA_SIZE isn’t too big, but the
smallest table (CONTAINER_TYPE) isn’t that big anyway, and according to this
article, you only need about 1.6 x the size of the smallest (driving) table
available in the hash area.

From speaking to a couple of people, the main finger of suspicion was
pointing towards parallel query. The consensus was that parallel query can
multiply the actual amount of hash joins going on by the number of parallel
slaves being used, and this could well end up taking more memory than if a
single hash table was built. Looking at the initialization parameters, parallel
query was enabled and with quite a high PARALLEL_MAX_SERVERS:

SQL> show parameter parallel_max_servers

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
parallel_max_servers integer 240

SQL> show parameter parallel_threads_per_cpu

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
parallel_threads_per_cpu integer 2

The server we were using had 12 SPARC processors, and all the tables and
indexes were defined using PARALLEL = DEFAULT, so they should pick up the
default degree of parallelism for the instance, which should be 12 (processors)
x 2 (parallel threads per CPU) = 24. So what was happening when we ran our
query? I ran it again, but in a separate session queried V$PX_PROCESS to see how
many slaves were running:

SQL> SELECT * FROM v$px_process;

SERV STATUS           PID SPID                SID    SERIAL#
---- --------- ---------- ------------ ---------- ----------
P023 IN USE            52 11852               318        441
P012 IN USE            38 11830               327       7723
P000 IN USE            24 10078               348       2064
P004 IN USE            29 10086               357       1433
P006 IN USE            31 10090               366       2312
P007 IN USE            32 10092               367        431
P018 IN USE            44 11842               371       1366
P003 IN USE            28 10084               372        292
P008 IN USE            33 10095               373        453
P014 IN USE            40 11834               374        500
P001 IN USE            26 10080               381        348
P011 IN USE            36 10111               401       1170
P017 IN USE            43 11840               414       1002
P020 IN USE            46 11846               416      21671
P002 IN USE            27 10082               445       7451
P019 IN USE            45 11844               465        344
P016 IN USE            42 11838               474        626
P009 IN USE            34 10097               489        723
P010 IN USE            35 10106               503        254
P013 IN USE            39 11832               504        265
P015 IN USE            41 11836               505       1848
P021 IN USE            47 11848               512       6012
P005 IN USE            30 10088               519        170
P022 IN USE            49 11850               546       1334

24 rows selected.

which was what I was expecting given the above settings; however running the
query at this degree of parallelism was causing the statement to fail, running
out of temp space. So what if we reduced the DOP, would that have any effect?

SQL>
drop table lookup_table;

Table
dropped.

Elapsed: 00:00:00.04
SQL> CREATE TABLE lookup_table parallel (degree 24)
  2  AS
  3  SELECT
  4      destn,
  5   studref,
  6      min(pi) pi
  7  fROM (
  8  SELECT
  9       pct1.DES       as container_type
 10      ,pc1.ID                 as pi
 11      ,pc1.SPN as pn
 12      ,pct2.DES       as Desc2
 13      ,pc2.DES        as studref
 14      ,pct3.des       as desc3
 15      ,pc3.dest_code        as destn
 16  FROM
 17       container pc1
 18      ,element  e1
 19      ,container_type pct1
 20      ,element  e2
 21      ,container pc2
 22      ,container_type pct2
 23      ,element  e3
 24      ,container pc3
 25      ,container_type pct3
 26  WHERE
 27      pc1.CONTAINER_TYPE_ID = 3
 28  AND e1.CONTAINER_ID       = pc1.id
 29  AND pct1.ID                       = pc1.CONTAINER_TYPE_ID
 30  AND e2.id                         = e1.PARENT_ID
 31  AND pc2.ID                        = e2.CONTAINER_ID
 32  AND pct2.id                       = pc2.CONTAINER_TYPE_ID
 33  AND e3.id                         = e2.PARENT_ID
 34  AND pc3.ID                        = e3.CONTAINER_ID
 35  AND pct3.id                       = pc3.CONTAINER_TYPE_ID
 36  ) pages
 37  GROUP BY destn, studref
 38  ;
CREATE TABLE lookup_table parallel (degree 24)
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P030
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

What was interesting here was that when I queried V$PX_PROCESS in a different
session whilst this statement was running, 48 slaves were running. Doug Burns
explains why this is in his
Parallel Query paper
- basically Oracle will actually kick off sets of PX
slaves for a given action, one as a producer of rows, and one as a consumer,
which led to 48 PX slaves being listed in V$PX_PROCESS, and the query still ran
out of temp space. So what if we ran it again, with a DOP of 12?

SQL> drop table lookup_table;

SQL> CREATE TABLE lookup_table parallel (degree 12)
  2  AS
  3  SELECT
  4      destn,
  5   studref,
  6      min(pi) pi
  7  fROM (
  8  SELECT
  9       pct1.DES       as container_type
 10      ,pc1.ID                 as pi
 11      ,pc1.SPN as pn
 12      ,pct2.DES       as Desc2
 13      ,pc2.DES        as studref
 14      ,pct3.des       as desc3
 15      ,pc3.dest_code        as destn
 16  FROM
 17       container pc1
 18      ,element  e1
 19      ,container_type pct1
 20      ,element  e2
 21      ,container pc2
 22      ,container_type pct2
 23      ,element  e3
 24      ,container pc3
 25      ,container_type pct3
 26  WHERE
 27      pc1.CONTAINER_TYPE_ID = 3
 28  AND e1.CONTAINER_ID       = pc1.id
 29  AND pct1.ID                       = pc1.CONTAINER_TYPE_ID
 30  AND e2.id                         = e1.PARENT_ID
 31  AND pc2.ID                        = e2.CONTAINER_ID
 32  AND pct2.id                       = pc2.CONTAINER_TYPE_ID
 33  AND e3.id                         = e2.PARENT_ID
 34  AND pc3.ID                        = e3.CONTAINER_ID
 35  AND pct3.id                       = pc3.CONTAINER_TYPE_ID
 36  ) pages
 37  GROUP BY destn, studref
 38  ;

Table created.

Elapsed: 00:05:51.07

and this time it is created without any problem. So now that we know that a
DOP of 12, and consequently 24 PX slaves, is OK, how about taking the DEGREE
clause of of the CREATE TABLE statement and instead limiting down the
PARALLEL_MAX_SERVERS parameter to "throttle" it to a maximum of 24. Would that
work?

SQL> conn sys/password@orcl as sysdba
Connected.
SQL> alter system set parallel_max_servers = 24 scope=memory;

System altered.

Elapsed: 00:00:00.00
SQL> conn STAGING/STAGING@orcl
Connected.

SQL> CREATE TABLE lookup_table
  2  AS
  3  SELECT
  4      destn,
  5   studref,
  6      min(pi) pi
  7  fROM (
  8  SELECT
  9       pct1.DES       as container_type
 10      ,pc1.ID                 as pi
 11      ,pc1.SPN as pn
 12      ,pct2.DES       as Desc2
 13      ,pc2.DES        as studref
 14      ,pct3.des       as desc3
 15      ,pc3.dest_code        as destn
 16  FROM
 17       container pc1
 18      ,element  e1
 19      ,container_type pct1
 20      ,element  e2
 21      ,container pc2
 22      ,container_type pct2
 23      ,element  e3
 24      ,container pc3
 25      ,container_type pct3
 26  WHERE
 27      pc1.CONTAINER_TYPE_ID = 3
 28  AND e1.CONTAINER_ID       = pc1.id
 29  AND pct1.ID                       = pc1.CONTAINER_TYPE_ID
 30  AND e2.id                         = e1.PARENT_ID
 31  AND pc2.ID                        = e2.CONTAINER_ID
 32  AND pct2.id                       = pc2.CONTAINER_TYPE_ID
 33  AND e3.id                         = e2.PARENT_ID
 34  AND pc3.ID                        = e3.CONTAINER_ID
 35  AND pct3.id                       = pc3.CONTAINER_TYPE_ID
 36  ) pages
 37  GROUP BY destn, studref
 38  ;

Table created.

Elapsed: 00:08:34.07

OK, that’s good, although the query time has gone up - this could just be
down to load on the server though. The principle is fine, by setting
PARALLEL_MAX_SERVERS to a lower amount, we can limit the amount of parallelism
taking place. How about going the whole hog and turning of PQ completely? How
would that affect it?

SQL> conn sys/password@orcl as sysdba
Connected.
SQL> alter system set parallel_max_servers = 0 scope=memory;

System altered.

Elapsed: 00:00:00.00
SQL> conn STAGING/STAGING@orcl
Connected.
SQL> drop table lookup_table;

Table dropped.

Elapsed: 00:00:00.01
SQL> CREATE TABLE lookup_table
  2  AS
  3  SELECT
  4      destn,
  5   studref,
  6      min(pi) pi
  7  fROM (
  8  SELECT
  9       pct1.DES       as container_type
 10      ,pc1.ID                 as pi
 11      ,pc1.SPN as pn
 12      ,pct2.DES       as Desc2
 13      ,pc2.DES        as studref
 14      ,pct3.des       as desc3
 15      ,pc3.dest_code        as destn
 16  FROM
 17       container pc1
 18      ,element  e1
 19      ,container_type pct1
 20      ,element  e2
 21      ,container pc2
 22      ,container_type pct2
 23      ,element  e3
 24      ,container pc3
 25      ,container_type pct3
 26  WHERE
 27      pc1.CONTAINER_TYPE_ID = 3
 28  AND e1.CONTAINER_ID       = pc1.id
 29  AND pct1.ID                       = pc1.CONTAINER_TYPE_ID
 30  AND e2.id                         = e1.PARENT_ID
 31  AND pc2.ID                        = e2.CONTAINER_ID
 32  AND pct2.id                       = pc2.CONTAINER_TYPE_ID
 33  AND e3.id                         = e2.PARENT_ID
 34  AND pc3.ID                        = e3.CONTAINER_ID
 35  AND pct3.id                       = pc3.CONTAINER_TYPE_ID
 36  ) pages
 37  GROUP BY destn, studref
 38  ;

Table created.

Elapsed: 00:04:31.08

So Parallel Query’s the culprit then, eh? If you turn it down a notch, the
query now runs within the TEMP space, and if you turn it off completely, it
actually runs faster than if it’s enabled. Er… well no actually, there’s one
more twist to it.

Whilst we were performing heroics with PARALLEL_MAX_SERVERS and explain
plans, one of the other DBAs noticed that the CONTAINER_TYPE table was redundant
in the SQL statement - it was included in the join, but the columns it provided
could be sourced entirely from the other two tables. CONTAINER_TYPE could
therefore be removed from the join, potentially reducing the amount of memory
and temp space taken up by the hash join. The SQL was rewritten to remove this
table:

CREATE TABLE lookup_table
AS
SELECT
     pc3.dest_code        as destn
    ,pc2.DES        as studref
    ,MIN(pc1.ID)            as pi
FROM
     container pc1
    ,element  e1
    ,element  e2
    ,container pc2
    ,element  e3
    ,container pc3
WHERE
    pc1.CONTAINER_TYPE_ID = 3
AND e1.CONTAINER_ID       = pc1.id
AND e2.id                         = e1.PARENT_ID
AND pc2.ID                        = e2.CONTAINER_ID
AND e3.id                         = e2.PARENT_ID
AND pc3.ID                        = e3.CONTAINER_ID
GROUP BY pc3.dest_code, pc2.des
;

and then run again - with an successful execution and a run time of just 1
min 17 seconds. Which was a bit of improvement. This of course was with
PARALLEL_MAX_SERVERS set to 0, so what would happen if it was increased back to
the original value, 240?

SQL> conn sys/password@orcl as sysdba
Connected.
SQL> alter system set parallel_max_servers = 240 scope=memory;

System altered.

Elapsed: 00:00:00.02
SQL> conn STAGING/STAGING@orcl
Connected.
SQL> CREATE TABLE lookup_table
  2  AS
  3  SELECT
  4       pc3.dest_code        as destn
  5      ,pc2.DES        as studref
  6      ,MIN(pc1.ID)            as pi
  7  FROM
  8       container pc1
  9      ,element  e1
 10      ,element  e2
 11      ,container pc2
 12      ,element  e3
 13      ,container pc3
 14  WHERE
 15      pc1.CONTAINER_TYPE_ID = 3
 16  AND e1.CONTAINER_ID       = pc1.id
 17  AND e2.id                         = e1.PARENT_ID
 18  AND pc2.ID                        = e2.CONTAINER_ID
 19  AND e3.id                         = e2.PARENT_ID
 20  AND pc3.ID                        = e3.CONTAINER_ID
 21  GROUP BY pc3.dest_code, pc2.des
 22  ;

Table created.

Elapsed: 00:01:12.04
SQL>

And there you go - it runs within the temp space constraints, and runs (just
about) faster than with PARALLEL_MAX_SERVERS set to 0.

So, the moral of the story? Well, I guess if you’re having issues with temp
space, and you’re hash joining several (big) tables with parallel query enabled,
and you’ve got lots of processors, consider limiting down the degree of
parallelism as a quick fix way to limit the amount of temp space being used.
Alternatively, consider increasing HASH_AREA_SIZE if you’ve got memory spare.
However, and I guess this is the true moral of the story - before you get all
fancy and start playing around with initialisation parameters, DOPs and so
forth, don’t ignore the little guy over in the corner, waving his hand and
saying "perhaps this statement could be written better", as probably the biggest
improvement you’ll ever get is taking the time out to write the query more
efficiently in the first place, rather than rely on clever quick fixes to paper
over the problem.

Any comments, aspects I’ve missed or misinterpreted etc, let me know and I’ll
update as neccessary.

Comments

  1. Ralph Says:

    nice problem solving….
    I ran permanently into this problem on a DWH (4Proc) - I just stepped down the DoP… and of course ignored the little guy who keept rewriting the statements =) as I was usually faster at the first (but the little guy rewrote the statement anyway - so I implemented it afterwards and increased the DoP after that
    =)
    .Ralph

  2. patrick Says:

    Good one! Gives me some insight to a problem I am working on right now. Thanks.
    Patrick

  3. Doug Burns Says:

    Nice posting, Mark. It reminded me of the problems we used to run into with temp space usage which it’s easy to forget about with px (and would be a worthwhile addition to the article). It also emphasised what I was trying to say with this bit :-
    “Although you may be able to use Parallel Execution to make an inefficient SQL statement run many times faster, that would be incredibly stupid. It s essential that you tune the SQL first. In the end, doing more work than you should be, but more quickly, is still doing more work than you should be! To put it another way, don t use PX as a dressing for a poorly designed application. Reduce the workload to the minimum needed to achieve the task and then start using the server facilities to make it run as quickly as possible. “
    Cheers,
    Doug