Improving PL/SQL Performance By Using Collections

January 30th, 2004 by Mark Rittman

Yesterday I looked at
some code
written in Oracle 7-style PL/SQL, and compared it with what is
considered best practice with Oracle 8i and 9i. The more up-to-date code
generally runs around 3-5 times faster than the old style code, and, as part
of understanding where the changes to PL/SQL have occurred, I’ve started off
by looking at how Oracle now handles a concept known as ‘Collections’. 

The online Oracle 9i PL/SQL User Guide introduces
collections as
;

"A collection is an ordered group of elements, all of the same
type. It is a general concept that encompasses lists, arrays, and other
familiar datatypes. Each element has a unique subscript that determines its
position in the collection."

So what are these used for? Well, when you first start using PL/SQL, you
generally end up declaring variables, then retrieving a value from a table to
go into the variable. If you’re going to retrieve a range of values from a
table, you declare a cursor, retrieve the values one at a time from the
database, and process them sequentially. Sometimes though, you want to
retrieve a load of values all in one go, and load them into an array, so that
you can carry out some sort of operation on the group of values as a whole.

With Oracle 7, you could create what was called an index-by table, or
‘PL/SQL Table’, that consisted of a series of value pairs; an index value, and
a scalar datatype (such as varchar2, or number). You referred to an individual
PL/SQL Table entry by using the index, i.e. CUSTOMER_NAME(10).
What made it interesting was that, as well as using scalar datatypes, you
could also create PL/SQL tables using PL/SQL records, which could consist of a
number of individual columns. By creating a PL/SQL record type based off of an
existing table (for example, by using the SCOTT.EMP%ROWTYPE),
you could load a table row, or an entire table, into a variable and process it
within your PL/SQL package. 

For example

DECLARE
TYPE EmpTabTyp IS TABLE OF emp%ROWTYPE
    INDEX BY BINARY_INTEGER;
emp_tab EmpTabTyp;

BEGIN
/* Retrieve employee record. */
SELECT * INTO emp_tab(7468) FROM emp WHERE empno = 7468;

END;

would set up a PL/SQL table type that consists of PL/SQL records with the
same definition as the columns in SCOTT.EMP, and then create a variable based
on this type. It would then retrieve a single row from the EMP table, then
place it into the EMP_TAB variable, using the index value of 7468. Note that
you don’t need to initialise the variable before it’s used, and you can
arbitrarily put values into the variable using any valid BINARY_INTEGER value.

As of Oracle 8, PL/SQL Tables were renamed ‘Collections’ and supplemented
by two new composite types: Nested Tables, and VARRAYs

Nested tables extend the functionality of index-by tables by adding extra
collection methods (known as table attributes for index-by tables), and, in a
new development, nested tables can also be store in database tables and can be
directly manipulated using SQL. Collectively, both types are known known as
PL/SQL Tables.

To declare a nested table, you use the syntax (note the lack
of ‘INDEX BY BINARY INTEGER’)

TYPE type_name IS TABLE OF element_type
[NOT NULL];

There are some key differences between traditional index-by tables, and
nested tables, and some good reasons why you’d want to start using nested
tables, which detailed in the
online
documentation
. In practical terms, one major difference between nested
tables and index-by tables, is that you have to initialise nested tables,
using a constructor (like you get in java), definining how many elements can
initially be stored in it (although you can later EXTEND the nested table);
however, as mentioned above, you can store nested tables within the database
(embedded in database columns), which is a distinct advantage over index-by
tables. So, if you want to put together a database that is object orientated,
and you need the data to be persistent, nested tables are the way to go.

The other new type of collection, as of Oracle 8, is VARRAYs. A varray
(variable length array) is a datatype that would be familiar to java or C
programmers. A varray has a maximum size, which you must specify in its type
definition. Its index has a fixed lower bound of 1 and an extensible upper
bound. Thus, a varray can contain a varying number of elements, from zero
(when empty) to the maximum specified in its type definition.

So, how do you choose which of these three types (index-by tables, nested
tables and varrays) to use. According
to the docs
(which also gives detailed reasons to choose nested tables
over associative arrays, and between nested tables and varrays);

"If you already have code or business logic that uses some other
language, you can usually translate that language’s array and set types
directly to PL/SQL collection types.

  • Arrays in other languages become VARRAYs in
    PL/SQL.

  • Sets and bags in other languages become
    nested tables in PL/SQL.

  • Hash tables and other kinds of unordered
    lookup tables in other languages become associative arrays in PL/SQL."

So, going back to our
original examples
, why do the answers suggest using collections - in this
case, nested tables?

The original method for loading the table as put forward in the question
was:

CREATE TABLE t1 AS
SELECT *
FROM all_objects
WHERE 1=0;

CREATE OR REPLACE PROCEDURE test_proc
IS

BEGIN
    FOR x IN (SELECT * FROM all_objects)
    LOOP
        INSERT INTO t1
        (owner, object_name,
subobject_name, object_id,
        data_object_id, object_type,
created, last_ddl_time,
        timestamp, status, temporary,
generated, secondary)
        VALUES
        (x.owner, x.object_name,
x.subobject_name, x.object_id,
        x.data_object_id, x.object_type,
x.created,
        x.last_ddl_time, x.timestamp,
x.status, x.temporary,
        x.generated, x.secondary);
    END LOOP;
COMMIT;
END test_proc;

 Ignoring the initial table creation script, the test_proc producedure
does three things;

  1. Declares a cursor that points to the resultset from SELECT * FROM ALL_OBJECTS
  2. Starts at record one, and inserts into the t1 table the columns from the
    first row in the cursor

  3. Then, it loops back and gets the next row of data, until all rows from
    the cursor have been retrieved.

The data is then committed, and the procedure ends.

The first solution put forward uses a nested table to hold the data from
the ALL_OBJECTS table, and does something called BULK COLLECT to load all of
the source tables’ data into the nested table.

CREATE OR REPLACE PROCEDURE fast_proc (p_array_size
IN PLS_INTEGER DEFAULT 100)
IS

TYPE ARRAY IS TABLE OF
all_objects%ROWTYPE;
l_data ARRAY;

CURSOR c IS
SELECT *
FROM all_objects;

BEGIN
    OPEN c;
    LOOP
    FETCH c BULK COLLECT INTO l_data LIMIT p_array_size;

    FORALL i IN
1..l_data.COUNT
    INSERT INTO t2 VALUES l_data(i);

    EXIT WHEN c%NOTFOUND;
    END LOOP;
    CLOSE c;
END fast_proc;
/

Ignore the fact that the table’s called ARRAY - it’s a nested table, not a
varray.

The second example answer is a variation on this, that does much the same
thing with slightly more compact code;

SQL> create or replace procedure
fast_proc is
2         type TObjectTable is table
of ALL_OBJECTS%ROWTYPE;
3         ObjectTable$ TObjectTable;
4         begin
5         select
6                    
* BULK COLLECT INTO ObjectTable$
7         from ALL_OBJECTS;
8
9         forall x in
ObjectTable$.First..ObjectTable$.Last
10       insert into t1 values ObjectTable$(x)
;
11       end;
12 /

Again, a nested table is declared, but this time the cursor is dispensed
with, and the data is just bulk loaded directly into the nested table. Again,
the FORALL statement is used afterwards to run through the nested table, and
I’ll go into how BULK COLLECT and FORALL work tomorrow.

So what is the message here? Well, first of all, if you need to process
lots of rows in one go, loading the data into memory first, use a collection
and BULK COLLECT the data into them. As Daniel Morgan pointed out when putting
his answer together, "The point I would hope more junior developers
take away from this is that while cursors definitely have their uses … they
should not be the first thing one thinks of any more.".
Together with
BULK COLLECT, use FORALL to loop through the data, and i’ll be going into more
detail on  two features tomorrow. In the meantime, if you want to do a
bit more reading up on collections, and see how they are being further
improved with Oracle 10g, take a look at this
OTN article by Steven Feuerstein
.

Comments

  1. Yves Bergeron Says:

    Hi,
    Unless there is some very complex data manipulation to do between the select and the insert, there is no need for a procedure here.
    A simple
    [code]
    INSERT INTO T1
    SELECT * FROM ALL_OBJECTS
    [/code]
    should do the job.
    I think the BULK COLLECT and FORALL features are really needed when the data must reside in memory for some special treatment. Also, care of memory usage must be taken if the table volume is very high.
    Sometimes, a lot of PL/SQL coding can be replaced by simple SQL DML.
    Just my 2 cents.

  2. Mark Rittman Says:

    Hi Yves,
    Thanks for the feedback. Yes - you’re right, a straight insert into … select is going to be faster, and this was covered in the previous article http://www.rittman.net/archives/000754.html . The point I was illustrating was that collections (together with BULK COLLECT and FORALL) are faster than retrieving rows one by one, if you have to process the rows within PL/SQL. Thanks for the feedback anyway!

  3. Quoc Thai Says:

    I’ve tried fast_proc on oracle 8.1.7 but it won’t worked and raised an exception.
    Please help me !

  4. Quoc Thai Says:

    I’ve tried fast_proc on oracle 8.1.7 but it won’t worked and raised an exception.
    Please help me !