Performance Issues with DML Error Logging and Conventional Path Inserts

I've recently been putting together an article for Oracle Magazine on the new DML Error Logging feature in Oracle 10gR2, which allows you to add a LOG ERRORS clause to most DML statements so that rows that would otherwise cause the statement to fail are instead copied to an error logging table. A typical use of LOG ERRORS is like this:

INSERT /*+ APPEND */
INTO   table_a
SELECT *
FROM   table_b
LOG ERRORS
REJECT LIMIT UNLIMITED;

The rows that would cause an error are instead written to a table called ERR$_TABLE_A, which is created for you by the DBMS_ERRLOG.CREATE_ERROR_LOG procedure. I won't go into the full details here, but Tim Hall has put together a nice write-up that you can view here.
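For reference, creating the error logging table is a one-liner, and as well as copies of the base table's columns the ERR$_ table gets a set of ORA_ERR_% control columns describing each failure. A minimal sketch, reusing the TABLE_A name from the example above:

BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TABLE_A');  -- creates ERR$_TABLE_A by default
END;
/

SELECT ora_err_number$    -- the ORA- error number
,      ora_err_mesg$      -- the full error message
,      ora_err_optyp$     -- operation type: I(nsert), U(pdate), D(elete)
,      ora_err_tag$       -- optional tag supplied in the LOG ERRORS clause
FROM   err$_table_a;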

Anyway, for the article I put a set of examples together, where I used the data in the SH.SALES table to create a "source table" copy in a new schema, created an empty copy of this as a "target table", then introduced some errors into the source table and constraints into the target so that I could show how the feature worked. Finally, I used the DBMS_ERRLOG package to create the error logging table.

SQL*Plus: Release 10.2.0.1.0 - Production on Sun Dec 4 12:38:54 2005

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create sequence sales_id_seq;

Sequence created.

SQL> create table sales_src
2 as
3 select sales_id_seq.nextval as "SALES_ID"
4 , prod_id
5 , cust_id
6 , channel_id
7 , time_id
8 , promo_id
9 , amount_sold
10 , quantity_sold
11 from sh.sales
12 /

Table created.

SQL> create table sales_target
2 as
3 select *
4 from sales_src
5 where 1=0
6 /

Table created.

SQL> alter table sales_src
2 modify promo_id null
3 /

Table altered.

SQL> select min(sales_id)
2 , max(sales_id)
3 from sales_src
4 /

MIN(SALES_ID) MAX(SALES_ID)
------------- -------------
            1        918843

SQL> update sales_src
2 set promo_id = null
3 where sales_id in (5000,6000,7000)
4 /

3 rows updated.

SQL> update sales_src
2 set amount_sold = 0
3 where sales_id between 1000 and 1005
4 /

6 rows updated.

SQL> commit;

Commit complete.

SQL> alter table sales_target
2 add constraint amount_sold_chk
3 check (amount_sold > 0)
4 enable
5 validate
6 /

Table altered.

SQL> desc sales_src
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 SALES_ID                                           NUMBER
 PROD_ID                                   NOT NULL NUMBER
 CUST_ID                                   NOT NULL NUMBER
 CHANNEL_ID                                NOT NULL NUMBER
 TIME_ID                                   NOT NULL DATE
 PROMO_ID                                           NUMBER
 AMOUNT_SOLD                               NOT NULL NUMBER(10,2)
 QUANTITY_SOLD                             NOT NULL NUMBER(10,2)

SQL> desc sales_target
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 SALES_ID                                           NUMBER
 PROD_ID                                   NOT NULL NUMBER
 CUST_ID                                   NOT NULL NUMBER
 CHANNEL_ID                                NOT NULL NUMBER
 TIME_ID                                   NOT NULL DATE
 PROMO_ID                                  NOT NULL NUMBER
 AMOUNT_SOLD                               NOT NULL NUMBER(10,2)
 QUANTITY_SOLD                             NOT NULL NUMBER(10,2)

SQL> begin
2 dbms_errlog.create_log_table('SALES_TARGET');
3 end;
4 /
dbms_errlog.create_log_table('SALES_TARGET');
*
ERROR at line 2:
ORA-06550: line 2, column 16:
PLS-00302: component 'CREATE_LOG_TABLE' must be declared
ORA-06550: line 2, column 4:
PL/SQL: Statement ignored

SQL> begin
2 dbms_errlog.create_error_log('SALES_TARGET');
3 end;
4 /

PL/SQL procedure successfully completed.

Next, I demonstrated a direct path insert from the source into the target table, using the LOG ERRORS clause.

SQL> insert /*+ APPEND */
  2  into   sales_target
  3  select *
  4  from   sales_src
  5  log errors
  6  reject limit unlimited
  7  /

918834 rows created.

Elapsed: 00:00:05.20
SQL> truncate table sales_target;

Table truncated.

Elapsed: 00:00:00.39
SQL> truncate table err$_sales_target;

Table truncated.

Elapsed: 00:00:00.07

I then compared it to a PL/SQL routine that you'd normally use to handle this sort of situation. 

SQL> DECLARE
  2    TYPE array IS TABLE OF sales_target%ROWTYPE
  3        INDEX BY BINARY_INTEGER;
  4     sales_src_arr   ARRAY;
  5     errors          NUMBER;
  6     bulk_error      EXCEPTION;
  7     l_cnt           NUMBER := 0;
  8     PRAGMA exception_init
  9           (bulk_error, -24381);
 10     CURSOR c IS
 11        SELECT * FROM sales_src;
 12    BEGIN
 13     OPEN c;
 14     LOOP
 15       FETCH c
 16          BULK COLLECT
 17          INTO sales_src_arr
 18          LIMIT 100;
 19       BEGIN
 20          FORALL i IN 1 .. sales_src_arr.count
 21                   SAVE EXCEPTIONS
 22            INSERT INTO sales_target VALUES sales_src_arr(i);
 23       EXCEPTION
 24       WHEN bulk_error THEN
 25         errors :=
 26            SQL%BULK_EXCEPTIONS.COUNT;
 27         l_cnt := l_cnt + errors;
 28         FOR i IN 1..errors LOOP
 29           DBMS_OUTPUT.PUT_LINE
 30           ('Error iteration ' ||
 31       SQL%BULK_EXCEPTIONS(i).ERROR_INDEX ||
 32            ' Oracle error is ' ||
 33       SQL%BULK_EXCEPTIONS(i).ERROR_CODE );
 34         END LOOP;
 35       END;
 36       EXIT WHEN c%NOTFOUND;
 37
 38    END LOOP;
 39     CLOSE c;
 40     DBMS_OUTPUT.PUT_LINE
 41     ( l_cnt || ' total errors' );
 42    END;
 43  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:26.01

The example results came back as you'd expect - the direct path insert using DML error logging took about 5 seconds for 900k rows, whilst the PL/SQL routine, which used BULK COLLECT, FORALL and SAVE EXCEPTIONS, took about 26 seconds. As an experiment, I also ran the LOG ERRORS version using just a conventional path insert, like this:

SQL> insert into sales_target
  2  select *
  3  from   sales_src
  4  log errors
  5  reject limit unlimited
  6  /

918834 rows created.

Elapsed: 00:00:58.79
SQL> truncate table sales_target;

Table truncated.

Elapsed: 00:00:02.46

The conventional path version came back in just under a minute - considerably longer than the PL/SQL version. I must admit I put this down to the fact that I wasn't in direct path mode, coupled with the number of rows I was loading, and just mentally filed it away as one of those things.

I passed the article out for review to Tim Hall, Doug Burns and Tom Kyte, as this was the first "server tech" article I'd written for Oracle Magazine and I thought it worth getting a couple of extra opinions before submitting it for publication. The reviews were OK, except that Tom picked up on how slowly the conventional path version ran - it seemed too slow to him and something didn't look right. Tom said he'd take a look at it, but also suggested that I contact Jonathan Lewis, who might have an idea on this.

I got in touch with Jonathan, who said that he'd not yet had a chance to take a proper look at DML error logging, but that the figures suggested the conventional path load might have fallen back into some single-row processing mode for undo, redo and latching. Jonathan suggested that I run the tests again, but this time using a view and a statistics gathering package that he'd written, to get a better idea of the activity during the DML execution. I've had a look on Jonathan's site and I think the view and code that I used are on this "Snapshots Revisited" page, so you can take a look and run the tests yourselves if you like.
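If you can't get at Jonathan's package, the basic idea is straightforward enough to sketch for yourself: capture the session statistics from v$mystat (joined to v$statname for the statistic names) before the statement, then report the deltas afterwards. Something along these lines - my own rough version, not Jonathan's actual code, and it assumes your account can query the v$ views:

CREATE TABLE stats_snap AS        -- take the "before" snapshot
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#;

-- ... run the statement you want to measure ...

SELECT sn.name                    -- report what changed
,      ms.value - s.value AS delta
FROM   v$mystat ms, v$statname sn, stats_snap s
WHERE  ms.statistic# = sn.statistic#
AND    sn.name = s.name
AND    ms.value - s.value > 0
ORDER  BY delta DESC;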

Going back to SQL*Plus, I installed the view and package from Jonathan's code and ran the tests again, this time snapshotting the stats to see whether any clues suggested themselves:

SQL> exec snap_my_stats.start_snap;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL> insert /*+ APPEND */
2 into sales_target
3 select *
4 from sales_src
5 log errors
6 reject limit unlimited
7 /

918834 rows created.

Elapsed: 00:00:07.00
SQL> exec snap_my_stats.end_snap;

Session stats - 04-Dec 12:57:32
Interval:- 43 seconds

Name                                                         Value
----                                                         -----
opened cursors cumulative 178
user calls 12
recursive calls 2,496
recursive cpu usage 30
session logical reads 12,823
CPU used when call started 296
CPU used by this session 298
DB time 703
user I/O wait time 218
session uga memory 65,464
session pga memory 393,216
enqueue requests 303
enqueue releases 301
physical read total IO requests 73
physical read total multi block requests 56
physical read total bytes 10,158,080
physical write total IO requests 213
physical write total multi block requests 211
physical write total bytes 41,500,672
db block gets 6,814
db block gets from cache 1,748
db block gets direct 5,066
consistent gets 6,009
consistent gets from cache 6,009
consistent gets - examination 246
physical reads 1,230
physical reads cache 1,230
physical read IO requests 68
physical read bytes 10,076,160
db block changes 1,439
consistent changes 57
physical writes 5,066
physical writes direct 5,066
physical write IO requests 213
physical write bytes 41,500,672
physical writes non checkpoint 5,066
free buffer requested 1,438
commit cleanouts 119
commit cleanouts successfully completed 119
CR blocks created 110
physical reads cache prefetch 1,162
calls to kcmgas 175
calls to get snapshot scn: kcmgss 560
redo entries 1,196
redo size 178,172
undo change vector size 43,096
data blocks consistent reads - undo records applied 57
no work - consistent read gets 5,424
rollbacks only - consistent read gets 57
deferred (CURRENT) block cleanout applications 114
IMU Flushes 1
table scans (short tables) 55
table scans (long tables) 2
table scan rows gotten 933,374
table scan blocks gotten 5,253
table fetch by rowid 8
cluster key scans 116
cluster key scan block gets 226
rows fetched via callback 6
index fetch by key 122
heap block compress 1
session cursor cache hits 170
cursor authentications 1
buffer is not pinned count 318
workarea executions - optimal 2
parse time cpu 6
parse time elapsed 48
parse count (total) 178
parse count (hard) 3
execute count 187
bytes sent via SQL*Net to client 794
bytes received via SQL*Net from client 1,225
SQL*Net roundtrips to/from client 8

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.07
SQL> truncate table sales_target;

Table truncated.

Elapsed: 00:00:00.46
SQL> truncate table err$_sales_target;

Table truncated.

Well that was the direct path version. What about the conventional path version?

Elapsed: 00:00:00.21
SQL> exec snap_my_stats.start_snap;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.15
SQL> insert into sales_target
2 select *
3 from sales_src
4 log errors
5 reject limit unlimited
6 /

918834 rows created.

Elapsed: 00:00:57.45
SQL> exec snap_my_stats.end_snap;

Session stats - 04-Dec 12:59:34
Interval:- 86 seconds

Name                                                         Value
----                                                         -----
opened cursors cumulative 232
user calls 12
recursive calls 2,927
recursive cpu usage 105
session logical reads 983,732
CPU used when call started 2,793
CPU used by this session 2,793
DB time 5,752
user I/O wait time 522
messages sent 264
enqueue requests 890
enqueue releases 888
physical read total IO requests 120
physical read total multi block requests 116
physical read total bytes 26,796,032
db block gets 973,216
db block gets from cache 973,216
consistent gets 10,516
consistent gets from cache 10,516
consistent gets - examination 4,002
physical reads 3,271
physical reads cache 3,271
physical read IO requests 120
physical read bytes 26,796,032
db block changes 1,872,467
consistent changes 57
change write time 1,511
free buffer requested 16,066
pinned buffers inspected 2
hot buffers moved to head of LRU 45
free buffer inspected 11,997
commit cleanouts 119
commit cleanouts successfully completed 119
CR blocks created 110
write clones created in foreground 7
physical reads cache prefetch 3,151
calls to kcmgcs 5,048
calls to kcmgas 181
calls to get snapshot scn: kcmgss 1,076
redo entries 945,196
redo size 263,407,300
redo buffer allocation retries 16
redo log space requests 19
redo log space wait time 1,014
undo change vector size 58,811,884
data blocks consistent reads - undo records applied 57
no work - consistent read gets 5,532
rollbacks only - consistent read gets 57
rollback changes - undo records applied 6
deferred (CURRENT) block cleanout applications 114
active txn count during cleanout 3,756
cleanout - number of ktugct calls 3,756
IMU Flushes 1
table scans (short tables) 109
table scans (long tables) 2
table scan rows gotten 934,130
table scan blocks gotten 5,361
table fetch by rowid 8
cluster key scans 116
cluster key scan block gets 226
rows fetched via callback 6
index fetch by key 122
session cursor cache hits 224
buffer is pinned count 913,776
buffer is not pinned count 318
parse time cpu 5
parse time elapsed 5
parse count (total) 232
parse count (hard) 3
execute count 241
bytes sent via SQL*Net to client 810
bytes received via SQL*Net from client 1,210
SQL*Net roundtrips to/from client 8

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.06
SQL> truncate table sales_target;

Table truncated.

Elapsed: 00:00:02.15
SQL> truncate table err$_sales_target;

Table truncated.

Note the difference in the redo generated between the two versions - around 178KB of redo for the direct path insert, compared to around 263MB for the conventional path insert.

I then ran a couple of extra tests. First, I removed the constraints from the target table and ran the conventional path version again, this time without the LOG ERRORS clause:

SQL> alter table sales_target
  2  modify column promo_id null
  3  /
modify column promo_id null
                       *
ERROR at line 2:
ORA-00905: missing keyword

Elapsed: 00:00:00.09
SQL> alter table sales_target
2 modify promo_id null
3 /

Table altered.

Elapsed: 00:00:00.34
SQL> alter table sales_target
2 drop constraint amount_sold_chk
3 /

Table altered.

Elapsed: 00:00:00.07
SQL> exec snap_my_stats.start_snap;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.15
SQL> insert into sales_target
2 select *
3 from sales_src
4 /

918843 rows created.

Elapsed: 00:00:14.07
SQL> exec snap_my_stats.end_snap;

Session stats - 04-Dec 13:02:10
Interval:- 54 seconds

Name                                                         Value
----                                                         -----
opened cursors cumulative 197
user calls 12
recursive calls 2,776
recursive cpu usage 38
session logical reads 59,375
CPU used when call started 458
CPU used by this session 458
DB time 1,418
user I/O wait time 783
messages sent 43
enqueue requests 766
enqueue releases 764
physical read total IO requests 221
physical read total multi block requests 197
physical read total bytes 41,443,328
db block gets 46,747
db block gets from cache 46,747
consistent gets 12,628
consistent gets from cache 12,628
consistent gets - examination 1,371
physical reads 5,059
physical reads cache 5,059
physical read IO requests 221
physical read bytes 41,443,328
db block changes 38,770
consistent changes 57
change write time 77
free buffer requested 10,653
dirty buffers inspected 590
hot buffers moved to head of LRU 7
free buffer inspected 8,689
commit cleanouts 110
commit cleanouts successfully completed 110
CR blocks created 110
physical reads cache prefetch 4,838
shared hash latch upgrades - no wait 19
calls to kcmgcs 5,048
calls to kcmgas 165
calls to get snapshot scn: kcmgss 5,816
redo entries 28,400
redo size 43,719,772
redo buffer allocation retries 3
redo log space requests 3
redo log space wait time 126
undo change vector size 2,463,156
data blocks consistent reads - undo records applied 57
no work - consistent read gets 5,474
rollbacks only - consistent read gets 57
deferred (CURRENT) block cleanout applications 110
active txn count during cleanout 1,120
cleanout - number of ktugct calls 1,120
IMU Flushes 1
table scans (short tables) 58
table scans (long tables) 2
table scan rows gotten 933,416
table scan blocks gotten 5,259
table fetch by rowid 24
cluster key scans 115
cluster key scan block gets 225
index fetch by key 116
index scans kdiixs1 19
session cursor cache hits 182
buffer is pinned count 913,776
buffer is not pinned count 372
workarea executions - optimal 2
parse time cpu 11
parse time elapsed 125
parse count (total) 182
parse count (hard) 1
execute count 198
bytes sent via SQL*Net to client 810
bytes received via SQL*Net from client 1,176
SQL*Net roundtrips to/from client 8
sorts (memory) 7
sorts (rows) 16

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.10
SQL> truncate table sales_target;

Table truncated.

This time the conventional path insert generated around 43MB of redo - this can be taken as the baseline figure for a conventional path insert, which means the LOG ERRORS clause was generating roughly 220MB of extra redo, and that explains why it takes so much longer to run. What about if I remove all the errors from the data and then reinstate the LOG ERRORS clause - will having the clause there at all cause the extra redo to be generated?

Elapsed: 00:00:05.03
SQL> alter table sales_target
  2  modify promo_id not null
  3  /

Table altered.

Elapsed: 00:00:00.09
SQL> alter table sales_target
2 add constraint amount_sold_chk
3 check (amount_sold > 0)
4 enable
5 validate
6 /

Table altered.

Elapsed: 00:00:00.09
SQL> delete from sales_src
2 where amount_sold <=0
3 /

6 rows deleted.

Elapsed: 00:00:03.09
SQL> delete from sales_src
2 where promo_id is null
3 /

3 rows deleted.

Elapsed: 00:00:00.35
SQL> commit;

Commit complete.

Elapsed: 00:00:00.03
SQL> exec snap_my_stats.start_snap;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.18
SQL> insert into sales_target
2 select *
3 from sales_src
4 log errors
5 reject limit unlimited
6 /

918834 rows created.

Elapsed: 00:01:11.04
SQL> exec snap_my_stats.end_snap;

Session stats - 04-Dec 13:08:45
Interval:- 109 seconds

Name                                                         Value
----                                                         -----
opened cursors cumulative 230
user calls 12
recursive calls 2,909
recursive cpu usage 103
session logical reads 983,504
CPU used when call started 2,920
CPU used by this session 2,922
DB time 7,113
user I/O wait time 1,659
messages sent 258
enqueue requests 877
enqueue releases 875
physical read total IO requests 225
physical read total multi block requests 199
physical read total bytes 41,590,784
db block gets 973,166
db block gets from cache 973,166
consistent gets 10,338
consistent gets from cache 10,338
consistent gets - examination 3,812
physical reads 5,077
physical reads cache 5,077
physical read IO requests 225
physical read bytes 41,590,784
db block changes 1,872,411
consistent changes 57
change write time 1,788
free buffer requested 17,868
hot buffers moved to head of LRU 26
free buffer inspected 11,199
commit cleanouts 110
commit cleanouts successfully completed 110
CR blocks created 110
write clones created in foreground 7
physical reads cache prefetch 4,852
calls to kcmgcs 5,048
calls to kcmgas 172
calls to get snapshot scn: kcmgss 1,076
redo entries 945,174
redo size 263,400,952
redo buffer allocation retries 24
redo log space requests 18
redo log space wait time 853
undo change vector size 58,810,660
data blocks consistent reads - undo records applied 57
no work - consistent read gets 5,536
rollbacks only - consistent read gets 57
deferred (CURRENT) block cleanout applications 110
active txn count during cleanout 3,587
cleanout - number of ktugct calls 3,587
IMU Flushes 1
table scans (short tables) 113
table scans (long tables) 2
table scan rows gotten 934,186
table scan blocks gotten 5,369
table fetch by rowid 5
cluster key scans 110
cluster key scan block gets 220
rows fetched via callback 1
index fetch by key 111
session cursor cache hits 224
cursor authentications 1
buffer is pinned count 913,767
buffer is not pinned count 306
parse time cpu 6
parse time elapsed 90
parse count (total) 230
parse count (hard) 2
execute count 231
bytes sent via SQL*Net to client 810
bytes received via SQL*Net from client 1,210
SQL*Net roundtrips to/from client 8

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.03
SQL> select count(*)
2 from err$_sales_target;

  COUNT(*)
----------
         0

Elapsed: 00:00:00.12
SQL>

So yes, just adding the LOG ERRORS clause causes the extra redo to be generated, even though no errors have been written to the error logging table. According to Jonathan, what is in effect happening is that when you switch to conventional path with LOG ERRORS, the "block at a time" optimisation for undo and redo disappears. In effect, the conventional path insert switches to single-row mode, which you can tell because (see the quick check after this list):

  • the session logical reads in the stats roughly equal the number of rows
  • redo entries roughly equal the number of rows
  • db block changes are roughly 2 x the number of rows (one change for the row, one for the undo entry)
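As a quick sanity check on a run of your own (again assuming you can query the v$ views), you can pull those three statistics straight out of v$mystat right after the insert and compare them with the row count reported by SQL*Plus:

SELECT sn.name
,      ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('session logical reads'
,                  'redo entries'
,                  'db block changes');

-- for the 918,834-row conventional path insert above, these came out at
-- 983,732, 945,196 and 1,872,467 respectively - close to n, n and 2n.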

It's worth saying here that I'm working with 10.2.0.1 on Windows XP SP2 in noarchivelog mode (MR - I originally had this as 10.1.0.2, a typo I corrected on 5/11). I'll raise it as a bug on Metalink on Monday (assuming it's a bug and not an intended feature) and we'll see what happens. Thanks again to Jonathan and Tom for pointing it out - very interesting, and it shows the value of getting your articles peer reviewed.