Feedback On Recent Articles
Over the last couple of weeks I posted two articles that generated a lot of feedback, so much so that I thought it worth posting an update summarising some of the comments I received. Both articles were on subjects that tend to divide the Oracle community, so it was no surprise that opinions in both cases diverged widely.
The first article was entitled "Is Oracle A Legacy Technology" and was based on a couple of articles by Mogens Nørgaard that argued that databases are now, like operating systems and disk storage, a commodity technology and that the Oracle RDBMS in particular is actually now a "legacy" technology. The article was linked to on a number of blogs and generated a lot of referrals from the Firebird (an open-source RDBMS) news site.
Joshua Allen, who works for Microsoft but has often commented astutely on Oracle postings, made this comment:
"I think RDBMS has been sliding toward commodity for at least 5 years now. I recall major initiatives in companies where I consulted as far back as 8 years ago to offshore Oracle DBA work.
On the other hand, I would not equate RDBMS with what Oracle, Microsoft and IBM offer. The vendors have recognized the commoditization for some time and have been aggressively moving up the value chain. Competition is not about RDBMS or perf anymore; these components of the Database account for a tiny fraction of the incremental sales. Most people buy Oracle, IBM, etc. for the other functionality now; and I think analytics is a big part of this. While skills like performance tuning and relational design are commodity now; skills like dimensional modeling and ETL design are not. And although I was personally involved in offshoring some OLAP projects as far back as 5 years ago, I think there is still a huge demand for skilled people who know these things at home and who can stay at the high end of the value chain. The quality of ETL design still varies wildly, and even when this becomes a commodity skill, there is a desperate need for people who understand analytics -- and note that the bulk of IT spending, CRM, homeland security, etc. is being spent in these areas rather than on RDBMS or Oracle perf tuning."
Andy Todd broadly agreed, commenting "Yep, the database is becoming more legacy by the day. Joshua - Oh how I wish that relational design was a commodity skill. You obviously haven't seen some of the data models that I have ;-)", but then a poster called Billy came up with a differing opinion:
"... Today databases are TB's. VLTs are measured in billions of rows. Welcome to the Information Age. Or more accurately, to the dawn of the Information Age. The sun has just peeked out behind the horizon.
We ain't seen nothing yet.
Who can honestly say that any Open Source database can provide the technology, scalability and performance of a commercial database? Or that something like SQL-Server (years behind in technology) can ever catch up with Oracle? Oracle has pretty much become the trail breaker in RDBMS technology.
J2EE, app tiers and all that? I remember the first buzz about OO. I recall buzzwords such as JAD, CASE and many more that would have shaped the face of the 90s.. no more developers or programmers. CASE and OO will herald something very new and different.
Er.. yeah.. right. How many more developers do we have today in comparison to back then?
Applications come and go. Application Technologies come and go. Thin-clients are not new. App tiers are not new. Been there and done that in the 80's when most J2EE prophets were still in diapers. J2EE has come.. and will go. New today. Legacy tomorrow.
What has remained constant? Our thirst for more and more data. Data is forever. Applications get replaced.
The database has always been the core. Is still the core. Fact. Not opinion.
As long as Oracle leads the pack in technology and features, while being price competitive, it will still be around for a long time to come.. Open Source or SQL-Server be damned."
Given that I had liberally quoted from Mogens' articles I actually then got in touch with him to get his take on the debate. Mogens subsequently commented:
"Great comments. Thank you! When James Morle and I started the BAARF Party (www.baarf.com) some time ago, sales of RAID-5 (and 4) SANs took off. When I wrote the paper "You Probably Don't Need RAC" some two years ago, sales of RAC really took off.
So when I claim Oracle to be legacy, I would buy Oracle stock now :).
Nevertheless, I'd like to make a few comments, which hopefully will provoke some debate(s).
The young people coming out of the schools these days have been taught the Database Independent design principles. All they want - all they request - from a database is "persistent storage" (aka tables) and the ability to tune (aka indexes). No kidding.
As Jonathan Lewis points out in his chapter (10) of the Tales Of The OakTable book, database independent design will suffer if you either experience increased data volume or concurrency. Until then, it might work. If things go wrong, however, it's back to the database specific features.
But the real horror is that these days a lot of databases will be asked to do very silly things without anyone bothering, not even during a crisis.
Relational Design: Ah yes. We've been hoping that people would do it the right way for the last 20 years or more. They don't. We don't. It has become an easy way of casting blame on anybody else than us (who don't do it). So instead of still hoping in vain for a change of mind all over the World, perhaps we ought to lower our blood pressure and accept that very, very few people are willing to do it this way? It's not going to become a universal thing to do, so let's be happy those few times when we actually see it at work. 20 years of not really seeing it ought to be enough for most :).
Finally, it's interesting to ponder the "SQLserver is years behind" argument.
Yes, it is. The two big questions are: It's behind, but is it good enough? And do customers need all the new features of Oracle?
There's a very different mood/atmosphere in SQLserver Development than in Oracle Development these days. The SQLserver guys are trying to catch up. They're trying really hard, and it would appear they're succeeding. The wait interface stuff in Yukon (now called SQLserver 2005) is called Dynamic Management Views (DMV's) and would appear to be superior in several aspects to Oracle's wait interface.
SQLserver 2005 has got read consistency. They call it row versioning or something, but it's there, it can be turned on, and it seems to work.
So fine. 10g has got fantastic stuff. They've also got a lot of features that perhaps UPS or eBay need. But why should most DBA's care about all this new stuff if the system runs fine?"
Mogens' comments about database independence then led me on to the second article, entitled "The Cost Of Database Independence". This followed on from an engagement earlier in the year, where I'd worked on a tuning project for a client who wasn't allowed to use any features that were specific to a particular vendor (in this case Oracle). The reasoning behind this was that they were coming up to their license renewal soon and didn't want to be tied in to Oracle, and they also wanted to port their application to other companies in the group who didn't use Oracle. The problem however was that they also wanted the application to scale, which it wasn't doing at present whilst written in a "database independent" manner.
The feedback on this article split pretty much down the line of DBAs and Java developers. Edward Stangler's comment was typical from the DBA side and proposed loosening the database independence idea to allow features that were implemented in some form or another by all vendors:
""Database independent" seems to mean that an application _works at all_. No one seems to include the notion that the app has to also _work with reasonable performance_. It's like saying that you can stop programming once you get the program to compile.
You can have severe problems on other databases. For example, DB2's locking mechanism is vastly different from Oracle's, and an elaborate scheme to reproduce a feature can lock up hard (think: an Oracle app meets lock escalation on DB2). I have seen companies switch from one database to another, and things like this always nip them in the behind.
A better goal is to only use SQL/features/code that can be easily reproduced on other database types. Every database seems to have the equivalent of a sequence, so there's nothing wrong with using it. But if you get into some of the fancier Oracle, DB2, or Sybase features, you start to dramatically increase your work if you need to convert.
Companies will spend a lot more money trying to make their "database independent" code work on every database than to simply use the constructs that the database vendor has provided. After all, you're not creating the index leaves yourself, anymore, right?"
whilst Sebastiano Pilla suggested another good compromise:
"They could've achieved a good compromise by encapsulating the sequence retrieval function in a module (very loosely speaking, I don't know the programming language used), and using one implementation for Oracle (Oracle sequences) and another implementation for another database. The amount of code that differs from one database to another would still be very small and they could obtain satisfactory performance. "
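A minimal sketch of the encapsulation Sebastiano describes might look like the following. It is an illustration only: the class names, the `customer_seq` sequence and the `id_counter` table are hypothetical, and the Oracle variant assumes nothing more than a standard Python DB-API connection. The portable variant is exercised here against an in-memory SQLite database purely so that the example runs.

```python
import sqlite3
from abc import ABC, abstractmethod


class IdSource(ABC):
    """Hypothetical interface: application code only ever calls next_id()."""

    @abstractmethod
    def next_id(self, conn) -> int:
        ...


class OracleSequenceIdSource(IdSource):
    """Oracle implementation: delegates to a native sequence.

    The sequence name is an assumption made for this sketch.
    """

    def __init__(self, sequence_name: str = "customer_seq"):
        self.sequence_name = sequence_name

    def next_id(self, conn) -> int:
        cur = conn.cursor()
        cur.execute(f"SELECT {self.sequence_name}.NEXTVAL FROM dual")
        return cur.fetchone()[0]


class KeyTableIdSource(IdSource):
    """Portable implementation: a one-row counter table (shown on SQLite)."""

    def __init__(self, conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS id_counter (next_id INTEGER NOT NULL)"
        )
        if conn.execute("SELECT COUNT(*) FROM id_counter").fetchone()[0] == 0:
            conn.execute("INSERT INTO id_counter VALUES (1)")
        conn.commit()

    def next_id(self, conn) -> int:
        cur = conn.cursor()
        # Bump the counter, then read back the value that was just consumed.
        cur.execute("UPDATE id_counter SET next_id = next_id + 1")
        cur.execute("SELECT next_id - 1 FROM id_counter")
        value = cur.fetchone()[0]
        conn.commit()
        return value


# Usage: the caller does not know which implementation it was handed.
conn = sqlite3.connect(":memory:")
ids = KeyTableIdSource(conn)
first, second = ids.next_id(conn), ids.next_id(conn)
```

Only the small class chosen at start-up differs between databases; everything that asks for an ID stays the same.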
Phillip J. Eby, commenting on Andy Todd's reference to the article, came up with another approach again:
"It's important to distinguish between database-independent SQL, and database-independent code. Your SQL can be database-specific, because it pretty much *has* to be, for all the practical reasons discussed in the article.
However, your *code* should not be database specific; it should call functions that are named according to the domain-specific things you need to do, like "findCustomerByName()". Better yet, put those functions as methods on a "CustomerDatabase" class. Then, your application *code* is not tied to the database, even though you have a class that's tightly coupled to a specific database.
And, even if you don't change databases, you'll *still* be glad you did it, when you discover that the new version of your database's optimizer doesn't handle that query correctly without a "force" or "hint", and you have only one place that you have to look for or change it. Not only that, but your code will be more comprehensible to somebody reading it and seeing 'findCustomerByName()' instead of a bunch of SQL.
Speaking as someone who's migrated a significant SQL-based application from Sybase to Oracle, I wish I had taken this approach when the app was first being written. But because SQL was embedded directly in the code, I had to define a bunch of low-level functions (similar to JDBC escapes, in effect) and go around sticking them inside the SQL in a lot more places than I would have with the approach above.
Also, when the Oracle admins asked for sample queries, we could've just said, "here's a copy of the class with *all* the SQL in it.""
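To make Phillip's suggestion concrete, here is a rough sketch of a "CustomerDatabase" class. The table layout and method names are invented for illustration, and SQLite stands in for the real database so the example is runnable. The `COLLATE NOCASE` clause is deliberately SQLite-specific: it is the kind of vendor-specific SQL that stays hidden behind a domain-named method.

```python
import sqlite3


class CustomerDatabase:
    """All customer SQL lives in this one class; callers never see a query."""

    def __init__(self, conn):
        self._conn = conn

    def create_schema(self) -> None:
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        self._conn.commit()

    def add_customer(self, name: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO customer (name) VALUES (?)", (name,)
        )
        self._conn.commit()
        return cur.lastrowid

    def find_customer_by_name(self, name: str) -> list:
        # Vendor-specific SQL (SQLite's NOCASE collation) is confined here,
        # so a port to another database means rewriting this class only.
        cur = self._conn.execute(
            "SELECT id, name FROM customer "
            "WHERE name = ? COLLATE NOCASE ORDER BY id",
            (name,),
        )
        return cur.fetchall()


# Application code talks in domain terms, not SQL.
db = CustomerDatabase(sqlite3.connect(":memory:"))
db.create_schema()
new_id = db.add_customer("Ada Lovelace")
matches = db.find_customer_by_name("ada lovelace")
```

If a query later needs a hint, or the application moves to another database, this class is the only place that has to change.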
However, the J2EE developers were pretty much in agreement that database independence is actually, contrary to my assertion, a good thing. Kim made the following point:
"Allow me to disagree to some point. Actually database independence is a good thing - and I'm right out of school. My point is that to ensure gaining a broader market, one must be able to deploy "anywhere", using most large vendors.
However, I seek it through an alternative strategy, compared to the "tables and indexes" way. I believe that Object Oriented Programming should supply the application with a data abstraction layer, with a complete interface (here, I mean the interface keyword, as used in C++, Java or C#), and allowing you to use the "Oracle app layer", "MSSQL layer" or the "Tables and indexes" layer."
Whereas David Warnock was not impressed and thought I was just wrong:
"Yes I agree that the way this particular company were allocating ID's was not very good from a performance point of view. But the article then makes two huge and untenable leaps.
Firstly, that there is not a better way to do this and keep the same level of database independence (ie use no database specific techniques such as sequences). This is just not true. We and I am sure many others have been doing it for years. One option is for each client to grab a block of ID's. The client can then allocate ID's sequentially from the block they are given without needing to talk to the server at all. Tune this by allocating blocks of ID's to clients according to their expected volumes and you could reduce the use of the sequence number table by a huge factor.
I once did this for a system that used a dotted id like an ip address. Each part was allocated by one level so that ID's could be allocated anywhere and still be unique. If I remember the parts were siteid.serverid.clientid.id each time a server started it took the next serverid for its site. When a client started it got the next clientid from its server then the client allocated id's sequentially. If it ran out it got the next clientid and started again.
Secondly, the other wrong assumption throughout the post is that you can't have database independence and use database specific features. Again it is just wrong, OJB as one example (for Java, or SQLObject for Python and of course there are many many others) handles sequences correctly for a wide range of databases. We have done it ourselves for Firebird, PostgreSQL and MySQL in the past. It is not rocket science.
If I think cynically then I am suspicious of anything that encourages us to get the performance by sacrificing database independence. In my working life that has been very valuable to allow flexibility in scale of deployment, in platform for deployment, in switching between business models (shrink wrapped vs ASP vs Open Source) and in protecting us from changes in policy/direction etc of the database supplier. When you have used a database where the supplier got taken over and the new owner dropped the product quickly you set a high value on database independence as a form of business insurance."
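The block-allocation scheme David mentions can be sketched roughly as below. This is not his code: the class names are made up, and the "server" is an in-memory stand-in for what would presumably be a single update against the sequence number table. It shows only the core idea, that a client reserves a range once and then hands out IDs locally.

```python
class BlockServer:
    """Stand-in for the shared sequence number table (hypothetical)."""

    def __init__(self, block_size: int = 100):
        self.block_size = block_size
        self._next_start = 1
        self.requests = 0  # counts round trips to the shared counter

    def reserve_block(self) -> range:
        start = self._next_start
        self._next_start += self.block_size
        self.requests += 1
        return range(start, start + self.block_size)


class ClientIdAllocator:
    """Hands out IDs locally, going back to the server only when empty."""

    def __init__(self, server: BlockServer):
        self._server = server
        self._block = iter(())  # start with an exhausted block

    def next_id(self) -> int:
        try:
            return next(self._block)
        except StopIteration:
            # Current block is used up: reserve a fresh one and continue.
            self._block = iter(self._server.reserve_block())
            return next(self._block)


# Two clients share one server with a deliberately tiny block size.
server = BlockServer(block_size=3)
client_a = ClientIdAllocator(server)
client_b = ClientIdAllocator(server)
a_ids = [client_a.next_id() for _ in range(4)]
b_ids = [client_b.next_id() for _ in range(2)]
```

With a realistic block size, the shared counter is touched once per block rather than once per row. The trade-off is that IDs are no longer gap-free or strictly ordered across clients, since any unused part of a block is lost when a client stops.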
Both articles had as their theme the concern that the job of the DBA, and in particular the job of the Oracle DBA, was becoming marginalized as all the action moved towards simpler databases and more business logic in the mid-tier. The comments were good in that, whilst many agreed with this assertion, often the same ones recognised that, for a system to scale and to cope with the increased size of databases we're working with now, there is a definite role for enterprise-class databases and the DBAs that service them. It was also interesting to get the opinion of the J2EE developers on how they'd achieve database independence whilst still allowing the system to scale up. I'll have to try some of these approaches out when I next come across a customer looking to go down the route of "database independence".