The Cost Of Database Independence

November 24th, 2004 by

Reading through the comments following on from my "Is Oracle A Legacy Technology" posting, and in particular Mogens Nørgaard's reference to Jonathan Lewis' comments about database independence, made me think about another client I recently worked with.

This client had an application that, when a transaction was about to take
place, would request from the database a set of sequence numbers that would be
used to record the transaction. These sequence numbers would be used to record
the customer, the account, the product, the individual order items and the
address that the order was to be sent to. The problem they were having was that,
as the number of concurrent users began to scale up, the application slowed down
dramatically, and around once each day the application actually froze, with
users locked out and no action possible except to restart the application
server. They had a suspicion it was down to a locking issue, and asked us to
take a look and make some recommendations.

As we had access to the application coders (it was an in-house application) and they had a development environment, it was fairly easy to insert event 10046 extended trace code into the bit of the application that processed orders, process a few sample orders using a single logged-in user, then examine the TKPROF-formatted trace file and take a look at the SQL being emitted by the application, and any wait events that slowed down the code. Next, we ran the same process with 100 simulated concurrent users, to see if we could spot any lock contention or other wait events that came when multiple users tried to use the system.

Looking first at the SQL used by the application, it soon became clear where the problems might be occurring. The application used a table, which we'll call SEQUENCE_NUMBERS, which had one row per sequence number used and stored the highest value issued for that sequence:

CREATE TABLE sequence_numbers
(
seq_num_type varchar2(50),
seq_num_value number
);

The table had around 20 seq_num_types, one for customer_id, one for product_id, and so on. When the application required a new customer ID, it would issue code along the following lines:

UPDATE sequence_numbers SET seq_num_value
= seq_num_value + 1 WHERE seq_num_type = 'CUSTOMER';
SELECT seq_num_value FROM sequence_numbers WHERE seq_num_type = 'CUSTOMER';
COMMIT;

Ignoring at this point alternative ways to do this, the first things that came to mind when I looked at this were: firstly, we're going to have a lot of contention for this particular table and its associated block within the datafile, as all of the requests are going to be for the same few rows. Once we start updating our particular row, Oracle is going to block any further updates until that change is committed or rolled back, so we're going to get some issues around locking, and we're also going to have issues with multiple processes all trying to access this particular block in the buffer cache. Also, every one of these commits forces the log writer to flush the redo buffer to disk (a "log file sync" wait), which is going to effectively serialise our work whilst this goes on.

Looking at the relevant bit from the TKPROF report, we found the following
waits:


Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  buffer busy waits                              11        0.06          0.13
  latch free                                      3        0.01          0.04
  enqueue                                         1        0.00          0.00

What this showed was that, whilst we did get one lock wait, most of our waits
were in fact down to the same rows within the same table being "fought over" by
all our processes at the same time.

Just afterwards, we had a call through from the customer help desk saying that the application had locked up again, and by running some code we found over on AskTom against the V$LOCK view, we quickly found out that one of our application users was blocking another:

  
select (select username from v$session where sid=a.sid) blocker,
       a.sid,
       ' is blocking ',
       (select username from v$session where sid=b.sid) blockee,
       b.sid
  from v$lock a, v$lock b
 where a.block = 1
   and b.request > 0
   and a.id1 = b.id1
   and a.id2 = b.id2;

BLOCKER         SID 'ISBLOCKING'  BLOCKEE         SID
-------- ---------- ------------- -------- ----------
APPUSER           8  is blocking  APPUSER           9

Speaking again to the application developer, and looking at some logging code he'd put into the application, it appeared that one part of the code was requesting a new sequence number somewhere, then failing to complete the transaction, leaving the SEQUENCE_NUMBERS row locked. All other users then gradually ground to a halt, waiting for this transaction to eventually commit.

The customer’s next idea was to wrap up the request of these sequence numbers
into a function, which would declare a cursor for update, retrieve the number
and commit the transaction, all in one atomic process that the application
couldn’t muck up. The code for the function looked something like:

create function GET_CUSTOMER_ID
return number
is
  cursor SEQUENCE_CUR is
    SELECT seq_num_value FROM sequence_numbers WHERE seq_num_type = 'CUSTOMER'
    FOR UPDATE;
begin
  {logic to retrieve sequence number}
end;
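Fleshed out, and assuming the body simply incremented the stored value and returned it (we were only shown the outline above, so this is a sketch of the approach, not their exact code), the function might have looked something like this:

```sql
CREATE OR REPLACE FUNCTION get_customer_id
RETURN NUMBER
IS
  l_value sequence_numbers.seq_num_value%TYPE;
BEGIN
  -- Lock the CUSTOMER row and compute the next value
  SELECT seq_num_value + 1
    INTO l_value
    FROM sequence_numbers
   WHERE seq_num_type = 'CUSTOMER'
     FOR UPDATE;

  -- Record the new high-water mark, then commit to release the lock
  -- as quickly as possible
  UPDATE sequence_numbers
     SET seq_num_value = l_value
   WHERE seq_num_type = 'CUSTOMER';
  COMMIT;

  RETURN l_value;
END;
/
```

Note that even in this form every caller still serialises on the same row; wrapping the logic in a function only shortens the window during which the lock is held.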

However, when we ran this concurrently with 100 users and took a look at the
TKPROF results, the performance was even worse:


Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  enqueue                                        12        0.98          2.03
  latch free                                      6        0.01          0.10
  buffer busy waits                               5        0.02          0.05

Now we were really having problems with locks. The SELECT … FOR UPDATE was taking its row locks out earlier and holding them for longer than our previous code, such that the whole process now took much longer just to do the same job. We might have solved our deadlock problem, but the application definitely wasn't going to scale.

Of course, at this point anyone who's familiar with Oracle will know that the customer should really have been using sequences, instead of maintaining a list of values in a table and manually updating, selecting and committing each request. Sequence values (or in fact, the next value for a particular sequence) are held in the SGA, only require disk I/O when the sequence cache has been exhausted, and don't involve any locks, buffer busy waits or table contention. Sequences are one of the features within Oracle that allow applications to scale, and their use in this situation is described in Jonathan Lewis' chapter in Oracle Insights: Tales of the OakTable:

"The primary purpose of Oracle sequences is to supply unique numbers rapidly … So what do you do if you don't want to use Oracle sequences? Typically, something like this:

select seq_value
into :local_variable
from seq_table
where seq_name = 'SLOW_SEQ'
for update;

update seq_table
set seq_value = seq_value + 1
where seq_name = 'SLOW_SEQ';
commit;

The commit is necessary to reduce the time that other sessions have to sit waiting for you to get out of the way … More significantly, you have reduced your scalability quite dramatically. Sessions will be queueing, and you've now achieved three round trips, three redo records, and six block changes, something that Oracle could do in one (or zero) round trips, no redo records, and no block changes"
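For comparison, the sequence-based equivalent is a one-off piece of DDL and then a single call per request (the sequence name and cache size here are illustrative only, not taken from the client's system):

```sql
-- One-off DDL: the cache keeps the next 1000 values in the SGA,
-- so disk I/O only happens once every 1000 requests
CREATE SEQUENCE customer_seq START WITH 1 CACHE 1000;

-- Each new customer ID is then a single, lock-free call
SELECT customer_seq.NEXTVAL FROM dual;
```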

The obvious step then was to recommend that all of this code be replaced by simple sequence lookups, which would get rid of the segment contention, the contention for block buffers, the locking and the extra redo. However, the problem was that the customer's management required that all applications be "database independent", so that they weren't locked into a single database vendor.

The reasoning behind this was that the client had some
associated companies, not all of which used Oracle as their RDBMS, and the
software might possibly be used by these other companies on their database. In
addition, the company had got a particularly good deal from Oracle when they
last licensed the database, and migrated from Sybase onto Oracle because of this
deal. If they didn’t get the same deal when they came to renew their license,
they might well move to a different database vendor, or even look at an open
source database such as MySQL.

If you read AskTom or have got Tom Kyte's Effective Oracle By Design, you'll know Tom's view on database independence:

"Here’s something that you might find controversial :
Database dependence (regardless of the database you are using, this does not
apply just to Oracle) should be your goal, not something to avoid. You want to
maximise the investment in your database. You want to develop the best
software in the least amount of time against that database. The only way to do
that is to fully exploit what the database has to offer.

The truth is that, with the exception of trivial
applications, achieving database independence is not only extremely hard, but
is also extremely costly and consumes a huge amount of resources … your
application would not use SQL beyond simple "keyed reads" … in short, you
would end up writing your own database."

Of course a cynic would say that "He would say that,
wouldn’t he, working for a database vendor" but at the same time, you’ve paid a
lot of money for Oracle, and it seems a shame not to use the very features that
Oracle can charge a lot of money for. Jonathan Lewis, in the same Oaktable book
chapter, warns however that money aside, if you don’t use these features, your
application just won’t scale:

"An application that is database independent is
automatically an application that will perform badly at high levels of
concurrency or large data volumes. It may do the job adequately, but you may
spend all your time and money struggling to push the software into meeting
Service Level Agreements (SLAs) and the users’ expectations … Database
developers put a lot of effort into making their technology efficient, and
basically assume that if they design a good feature into their systems you
will be prepared to pay a small performance penalty in one part of their code
in order to win a big performance gain elsewhere. Database-independent code
doesn’t use features, it uses vanilla-flavored, lowest common denominator
elements – so you pay every penalty, but never pick up the benefits."

Also, at what point do you declare database independence - do you allow stored procedures? (Most databases have them, but their syntax differs.) What about analytic functions? They're in SQL-99, but not everyone implements them. SQL Server has identity columns, and Access has autonumber fields - can we use sequences now, if we equate them with these other implementation methods? It's difficult to come up with a workable definition of database independence that doesn't leave us, as Tom Kyte suggests, with just tables and indexes.
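To illustrate how little common ground there is, here is roughly the same "give me the next customer ID" operation on three platforms (the table, column and sequence names are illustrative only):

```sql
-- Oracle: a sequence object, fetched explicitly
SELECT customer_seq.NEXTVAL FROM dual;

-- SQL Server: an IDENTITY column assigns the value on insert
INSERT INTO customers (cust_name) VALUES ('Smith');
SELECT SCOPE_IDENTITY();

-- MySQL: an AUTO_INCREMENT column, retrieved after the insert
INSERT INTO customers (cust_name) VALUES ('Smith');
SELECT LAST_INSERT_ID();
```

Each achieves the same end, but the code, and even the point at which the number becomes available, differs on every platform.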

So we were stuck with this sequence workaround, which we had to keep in place to satisfy the requirement for database independence, but which effectively limited the scalability of the application and had Oracle working at only a fraction of its possible capacity. The irony of the situation was that the client could end up paying more for hardware to support the inefficient application, and was already paying out additional money to have someone like myself take a look at it. But, by doing this, they were able to switch from vendor to vendor whenever they wished, other companies within the group could use the application on their own database platforms, and perhaps the deal they might get in future would outweigh the additional cost of running the application in this way.

Anyway, I thought this was an interesting bit of work in the end, as it illustrated the impact of not doing things the Oracle-optimised way, and also showed why a customer might choose, for logical business reasons, to go down the database independence route. I'm due to go back in there early next year, so it'll be interesting to see if sequences were eventually considered, what impact they might have had, and, if they weren't, what additional cost and effort was incurred in making the application scalable by other means.

Comments?

Comments

  1. Andy Todd Says:

    Great piece Mark, well written and argued. I’d go one step further and say that every database solves this problem in a different way, each of which is more efficient than using a table of sequences.
    In MySQL, for instance, they have the auto_increment column. A side effect of this is some arguably not-great database designs where every table has an artificial key even if it doesn't require one. This is usually because the person designing the schema *knows* what the definition of a primary key column is and that every table needs one.
    [1] http://dev.mysql.com/doc/mysql/en/example-AUTO_INCREMENT.html

  2. Mr. Ed Says:

    “Database independent” seems to mean that an application _works at all_. No one seems to include the notion that the app has to also _work with reasonable performance_. It’s like saying that you can stop programming once you get the program to compile.
    You can have severe problems on other databases. For example, DB2’s locking mechanism is vastly different from Oracle’s, and an elaborate scheme to reproduce a feature can lock up hard (think: an Oracle app meets lock escalation on DB2). I have seen companies switch from one database to another, and things like this always nip them in the behind.
    A better goal is to only use SQL/features/code that can be easily reproduced on other database types. Every database seems to have the equivalent of a sequence, so there’s nothing wrong with using it. But if you get into some of the fancier Oracle, DB2, or Sybase features, you start to dramatically increase your work if you need to convert.
    Companies will spend a lot more money trying to make their "database independent" code work on every database than they would by simply using the constructs that the database vendor has provided. After all, you're not creating the index leaves yourself anymore, right?

  3. Sebastiano Pilla Says:

    They could’ve achieved a good compromise by encapsulating the sequence retrieval function in a module (very loosely speaking, I don’t know the programming language used), and using one implementation for Oracle (Oracle sequences) and another implementation for another database. The amount of code that differs from one database to another would still be very small and they could obtain satisfactory performance.

  4. Alan Green Says:

    There’s a balance here between a working application and remaining reasonably flexible to choose between RDBMS implementations. Rather than completely embracing vendor specific extensions, we tend to write fairly generic SQL for the easy 80% of an application and then do whatever it takes to make the hard 20% work well.

  5. Kim Nørby Andersen Says:

    Allow me to disagree to some point. Actually database independence is a good thing – and I’m right out of school. My point is that to ensure gaining a broader market, one must be able to deploy “anywhere”, using most large vendors.
    However, I seek it through an alternative strategy, compared to the "tables and indexes" way. I believe that Object Oriented Programming should supply the application with a data abstraction layer, with a complete interface (here, I mean the interface keyword, as used in C++, Java or C#), allowing you to use the "Oracle app layer", the "MSSQL layer" or the "tables and indexes" layer.
    Anyways, I’m ready to debate this in further detail, if anybody should wish so :)
    Yours
    Kim

  6. Dan Clamage Says:

    Is database independence what they’re teaching in universities these days? Some professors are really out of touch with the real world, where we have to make things work quickly and scale well. A well-known Best Practice (that has been proven to work) is to encapsulate your SQL in an interface layer, so when you have to go changing things for another target database, that’s the only thing you have to muck with. The application layer (however you wish to implement it, 2-tier, 3-tier or n-tier) shouldn’t have to change. This is how we design and build highly maintainable systems (where you make few or no changes to the software whenever business requirements change).
    What would really aid database migration projects is a translator that takes PL/SQL for example and turns it into T-SQL. But I think a cross-platform, robust translator is a long ways off.

  7. Stew Stryker Says:

    As Sebastiano Pilla wrote:
    They could’ve achieved a good compromise by encapsulating the sequence retrieval function in a module (very loosely speaking, I don’t know the programming language used), and using one implementation for Oracle (Oracle sequences) and another implementation for another database. The amount of code that differs from one database to another would still be very small and they could obtain satisfactory performance.
    I agree entirely.
    If you’ll pardon the metaphor, here’s my view of “database independence”. Let’s say you buy a car. You ski, snowboard and kayak, so you buy a Thule roof rack with all the doodads to attach them to this car. A few years later, you trade the car in for a different brand and type. If you expect to be able to strap the old roof rack on your new vehicle, you’re going to be disappointed because 8 times out of 10, it ain’t going to work. But you don’t have to buy a whole new rack and ski, snowboard and kayak attachments. You just have to buy new feet for the rack bar.
    So those vehicle-specific feet are equivalent to those database-specific modules that you’ll have to maintain when you move to a new database. Thinking you’re going to get 100% future compatibility with a new database (or new car) is foolish.

  8. H.Tonguc YILMAZ Says:

    One of the most interesting examples of database independence may be seen in JDBC programming, in my opinion. Since JDBC is a standard, Java people talk about this as an advantage, but here are some test results for improving the performance of J2EE applications with very simple JDBC tips:
    Total time for inserting 2000 rows into a simple Oracle table was 9499 milliseconds initially.
    Disabling auto-commit mode,
    With Prepared Statement,
    Batching with batch size = 30
    Total time for inserting 2000 rows was 311 milliseconds, a ~97% gain…
    Even with PL/SQL bulk operations the same operation can be done in under 50 milliseconds.
    (Other simple features like row prefetching, defining the types of result set columns and statement caching also affect performance very much – Oracle 9i JDBC Programming by Jason Price)
    In my organization these independence efforts, and the black-box syndrome that Kyte mentions (Expert One-on-One by Thomas Kyte), minimize our investment in Oracle and make me a magician, since with the very small amount of knowledge which is available everywhere I can improve an application's performance 2-3 times.
    Lately the Java developers I work with needed a search tool for their internet application, and the project manager started to investigate the available tools. When I heard about this I talked with the manager of the group and suggested they use Oracle Text, since they are running on Oracle9iR2. The result was as usual: I was once again a magician :o))
    Developers who work with Oracle must know what is available to them; if not, "database independence" guides people to run Oracle like an Excel sheet and reinvent the wheel again and again in every new project.
    Best regards.

  9. Savitha Malve Says:

    Why re-invent the wheel when we already have one built in? There is no point in discarding built-in features after buying a product which is efficient in handling the things we want, and then wasting a couple of days finding other options.

  10. David Aldridge Says:

    It's an interesting paradox: forcing database independence is intended to make the enterprise more cost-effective by reducing potential licensing costs, but the method makes applications less cost-effective themselves by requiring more expensive hardware, and more CPUs mean higher licensing costs whatever platform is chosen.

    So it doesn’t actually make economic sense as well as being technologically a poor practice.
