The Cost Of Database Independence

Reading through the comments following on from my "Is Oracle A Legacy Technology" posting, and in particular Mogens Nørgaard's reference to Jonathan Lewis' comments about database independence, made me think about another client I recently worked with.

This client had an application that, when a transaction was about to take place, would request from the database a set of sequence numbers that would be used to record the transaction. These sequence numbers would be used to record the customer, the account, the product, the individual order items and the address that the order was to be sent to. The problem they were having was that, as the number of concurrent users began to scale up, the application slowed down dramatically, and around once each day the application actually froze, with users locked out and no action possible except to restart the application server. They had a suspicion it was down to a locking issue, and asked us to take a look and make some recommendations.

As we had access to the application coders (it was an in-house application) and they had a development environment, it was fairly easy to insert event 10046 extended trace code into the bit of the application that processed orders, process a few sample orders using a single logged-in user, then examine the TKPROF-formatted trace file to take a look at the SQL being emitted by the application, and at any wait events that slowed the code down. Next, we ran the same process with 100 simulated concurrent users, to see if we could spot any lock contention or other wait events that appeared when multiple users tried to use the system.
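For anyone who hasn't used it before, the tracing hook itself is nothing more elaborate than a couple of ALTER SESSION calls issued from the application session, along these lines (level 8 captures wait events as well as the SQL; exactly where you put them in the application code is obviously site-specific):

-- switch on extended SQL trace, including wait events (level 8)
ALTER SESSION SET timed_statistics = TRUE;
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- ... run the order-processing code under test ...

-- and switch it off again
ALTER SESSION SET EVENTS '10046 trace name context off';

The raw trace file then turns up in user_dump_dest, and gets formatted with something like "tkprof <tracefile> <outputfile> sys=no".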

Looking first at the SQL used by the application, it soon became clear where the problems might be occurring. The application used a table, which we'll call SEQUENCE_NUMBERS, which had one row per sequence type and stored the highest value so far used by that sequence:

CREATE TABLE sequence_numbers
(
seq_num_type varchar2(50),
seq_num_value number
);

The table had around 20 seq_num_types, one for customer_id, one for product_id, and so on. When the application required a new customer ID, it would issue code along the following lines:

UPDATE sequence_numbers SET seq_num_value = seq_num_value + 1 WHERE seq_num_type = 'CUSTOMER';
SELECT seq_num_value FROM sequence_numbers WHERE seq_num_type = 'CUSTOMER';
COMMIT;

Ignoring at this point alternative ways to do this, the first things that came to mind when I looked at this were: firstly, we're going to have a lot of contention for this particular table and its associated block within the datafile, as all of the requests are going to be for the same few rows. Once we update our particular row, Oracle will block any further updates to that row until our change is committed or rolled back, so we're going to get some issues around locking, and we're also going to have issues with multiple processes all trying to access this particular block in the buffer cache. Also, every one of these commits forces the log writer to flush the redo log to disk (the "log file sync" wait), which is going to effectively serialise our work whilst this goes on.
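As a quick sanity check of that theory on a running system, a query along these lines against V$SYSTEM_EVENT shows how much time the instance has spent on those waits since startup (the event names are the Oracle 9i ones, matching the TKPROF output below):

-- instance-wide wait totals since instance startup
SELECT event, total_waits, time_waited
  FROM v$system_event
 WHERE event IN ('buffer busy waits', 'enqueue', 'log file sync')
 ORDER BY time_waited DESC;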

Looking at the relevant bit from the TKPROF report, we found the following waits:

Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  buffer busy waits                              11        0.06          0.13
  latch free                                      3        0.01          0.04
  enqueue                                         1        0.00          0.00

What this showed was that, whilst we did get one lock wait, most of our waits were in fact down to the same rows within the same table being "fought over" by all our processes at the same time.

Just afterwards, we had a call through from the customer help desk saying that the application had locked up again, and by running some code we found over on AskTom against the V$LOCK view, we quickly found that one of our application users was blocking another:

select (select username from v$session where sid = a.sid) blocker,
       a.sid,
       ' is blocking ',
       (select username from v$session where sid = b.sid) blockee,
       b.sid
  from v$lock a, v$lock b
 where a.block = 1
   and b.request > 0
   and a.id1 = b.id1
   and a.id2 = b.id2;


BLOCKER         SID 'ISBLOCKING'  BLOCKEE         SID
-------- ---------- ------------- -------- ----------
APPUSER            8  is blocking APPUSER           9

Speaking again to the application developer, and looking at some logging code he'd put into the application, it appeared that one part of the code was requesting a new sequence number, then failing to complete the transaction, leaving the SEQUENCE_NUMBERS row locked. All other users then eventually ground to a halt waiting for that transaction to commit.

The customer's next idea was to wrap the request for these sequence numbers up in a function, which would declare a cursor for update, retrieve the number and commit the transaction, all in one atomic process that the application couldn't muck up. The code for the function looked something like this:

create or replace function get_customer_id
return number
is
  cursor sequence_cur is
    select seq_num_value from sequence_numbers
     where seq_num_type = 'CUSTOMER' for update;
  l_value sequence_numbers.seq_num_value%type;
begin
  -- lock the row, read the current value, then increment and publish it
  open sequence_cur;
  fetch sequence_cur into l_value;
  update sequence_numbers set seq_num_value = l_value + 1
   where current of sequence_cur;
  close sequence_cur;
  commit;
  return l_value + 1;
end;

However, when we ran this concurrently with 100 users and took a look at the TKPROF results, the performance was even worse:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  enqueue                                        12        0.98          2.03
  latch free                                      6        0.01          0.10
  buffer busy waits                               5        0.02          0.05

Now we were really having problems with locks. The SELECT ... FOR UPDATE was taking its row lock earlier and holding it for longer than our previous code had, so the whole process was now taking much longer just to do the same job. We might have solved the application lock-up problem, but the application definitely wasn't going to scale.

Of course at this point anyone who's familiar with Oracle will know that the customer should really have been using sequences, instead of maintaining a list of values in a table and manually updating, selecting and committing each request. Sequence values (or in fact, the next value for a particular sequence) are cached in the SGA, only require disk I/O when the sequence cache has been exhausted, and don't involve row locks, buffer busy waits or table contention. Sequences are one of the features within Oracle that allow applications to scale, and their use in this situation is described in Jonathan Lewis' chapter in Oracle Insights: Tales of the OakTable:

"The primary purpose of Oracle sequences is to supply unique numbers rapidly ... So what do you do if you don't want to use Oracle sequences? Typically,something like this:

select seq_value
into :local_variable
from seq_table
where seq_name = 'SLOW_SEQ'
for update;

update seq_table
set seq_value = seq_value + 1
where seq_name = 'SLOW_SEQ';
commit;

The commit is necessary to reduce the time that other sessions have to sit waiting for you to get out of the way ... More significantly, you have reduced your scalability quite dramatically. Sessions will be queueing, and you've now achieved three round trips, three redo records, and six block changes, something that Oracle could do in one (or zero) round trips, no redo records, and no block changes"

The obvious next step was to recommend that all of this code be replaced by simple sequence lookups, which would get rid of the segment contention, the contention for block buffers, the locking and the extra redo. However, the problem was that the customer's management required that all applications be "database independent", so that they weren't locked into a single database vendor.
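As a sketch of what that recommendation looks like (the sequence name and cache size here are illustrative, not taken from the client's schema), each of the twenty-or-so sequence types becomes a real sequence, and the whole update/select/commit dance collapses into a single call:

-- one sequence per ID type; CACHE keeps the next 1000 values in the SGA
CREATE SEQUENCE customer_id_seq CACHE 1000;

-- the application then just asks for the next value
SELECT customer_id_seq.NEXTVAL FROM dual;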

The reasoning behind this was that the client had some associated companies, not all of which used Oracle as their RDBMS, and the software might well end up being used by these other companies on their own databases. In addition, the company had got a particularly good deal from Oracle when they last licensed the database, and had migrated from Sybase onto Oracle because of this deal. If they didn't get the same deal when they came to renew their license, they might well move to a different database vendor, or even look at an open source database such as MySQL.

If you read AskTom or have got Tom Kyte's Effective Oracle By Design, you'll know Tom's view on database independence:

"Here's something that you might find controversial : Database dependence (regardless of the database you are using, this does not apply just to Oracle) should be your goal, not something to avoid. You want to maximise the investment in your database. You want to develop the best software in the least amount of time against that database. The only way to do that is to fully exploit what the database has to offer.

The truth is that, with the exception of trivial applications, achieving database independence is not only extremely hard, but is also extremely costly and consumes a huge amount of resources ... your application would not use SQL beyond simple "keyed reads" ... in short, you would end up writing your own database."

Of course a cynic would say "He would say that, wouldn't he, working for a database vendor", but at the same time, you've paid a lot of money for Oracle, and it seems a shame not to use the very features that Oracle charges a lot of money for. Jonathan Lewis, in the same OakTable book chapter, warns however that, money aside, if you don't use these features your application just won't scale:

"An application that is database independent is automatically an application that will perform badly at high levels of concurrency or large data volumes. It may do the job adequately, but you may spend all your time and money struggling to push the software into meeting Service Level Agreements (SLAs) and the users' expectations ... Database developers put a lot of effort into making their technology efficient, and basically assume that if they design a good feature into their systems you will be prepared to pay a small performance penalty in one part of their code in order to win a big performance gain elsewhere. Database-independent code doesn't use features, it uses vanilla-flavored, lowest common denominator elements - so you pay every penalty, but never pick up the benefits."

Also, at what point do you declare database independence? Do you allow stored procedures? (Most databases have them, but their syntax differs.) What about analytic functions? They're in the ANSI SQL standard, but not everyone implements them. SQL Server has identity columns, and Access has autonumber fields - can we use sequences now, if we equate them with these other implementations? It's difficult to come up with a workable definition of database independence that doesn't leave us, as Tom Kyte suggests, with little more than tables and indexes.
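One possible compromise, and this is purely a sketch rather than anything this client had adopted, is to draw the independence line at a stored procedure interface: the application always calls GET_CUSTOMER_ID, each platform implements that signature natively, and the Oracle version quietly uses the sequence from the sketch above. Ironically, this is exactly the wrapper the customer had already written, just with a scalable body:

-- hypothetical Oracle implementation of an agreed GET_CUSTOMER_ID interface;
-- other platforms would implement the same signature in their own dialect
create or replace function get_customer_id
return number
is
  l_id number;
begin
  select customer_id_seq.nextval into l_id from dual;
  return l_id;
end;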

So, effectively, we were stuck with this sequence-table workaround, which we had to keep in place to satisfy the requirement for database independence, but which limited the scalability of the application and had Oracle working at only a fraction of its possible capacity. The irony of the situation was that the client could end up paying more for hardware to support the inefficient application, and was already paying out additional money to have someone like myself take a look at it. But, by doing this, they were able to switch from vendor to vendor whenever they wished, other companies within the group could run the application on their own database platforms, and perhaps the deal they might get in future would outweigh the additional cost of running the application in this way.

Anyway, I thought this was an interesting bit of work in the end, as it illustrated the impact of not doing things the Oracle-optimised way, and also showed why a customer might choose, for logical business reasons, to go down the database independence route. I'm due to go back in there early next year, so it'll be interesting to see whether sequences were eventually adopted and what impact they had, or, if they weren't, what additional cost and effort went into making the application scalable by other means.

Comments?