Accidents waiting to happen

December 28th, 2005 by Peter Scott

A while back I blogged about a data warehouse review we are doing for a customer. At the time I was less than impressed by the consistency of the approach to DW design and the lack of knowledge of how Oracle works and what is ‘built-in’ as functionality. I have now moved on to another of their legacy systems. On the surface things look better. Source code has version control headers, they are using a source control tool (OK, it is not one that I like, but if it works for them, so what?), even the variable names are intelligible (self documenting) and there are comments in the code. Lots of ticks in the good practice checklist, or is this just too good to be true?
One of the modules investigated populates staging tables from a source (non-Oracle) database and then applies this to the ‘published’ layers of the data warehouse (I would have probably specified two modules, one to stage the data and one to publish – but I am not here to split hairs). Looking at the code in more detail I see variable names such as reject_count, commit_frequency. Could this mean that someone has written set-based code that also can handle data exceptions efficiently? Sadly, no; all of these carefully crafted, well named variables appear only once in the procedure, just before the ‘BEGIN’ statement!
Worse still was the technique used to populate the staging table – the developers used a cursor loop to identify the tables to populate but held the source table names in a hard coded array accessed by loop counter. An accident waiting to happen!

Comments

  1. Patrice Borne Says:

    I have been working on different systems with Oracle as a back-end and my opinion is that the data (and the database) is the critical piece. Nobody cares what technology you use in front of the DB (J2EE, ASP.NET, PHP, whatever…) as long as it’s working and able to generate decent screens, but, everybody cares about the data.

    The integration of your system with “the rest of the world” is paramount (along with the design of your DB of course). My favorite situation is when PPT managers draw boxes on a slide (to represent systems) and arrows in between to mean that these systems are somehow integrated. If the next slides don’t explain how these magic arrows will be built, it’s a red flag to me. It means people on the project are more concerned about how their screens look like than how their data looks like.

    I had to work on a project 3 years ago where I could not control the integration design process. The code we received was a joke! The usual stuff: open a cursor, parse the result set (sometimes hundreds of thousands of rows) one row at a time, check if the row exists in the destination table, if yes, update, else, insert etc.

    I ended up rewriting chunks of it myself!

    Lately, I have been working on a complete rewrite of an other system from scratch and I was fully in charge (design of the DB, integration with Tibco, the Mainframe and other Oracle databases, design of the pages with ASP.NET) and I decided to do it right. I built everything based on pure SQL and the set approach. I used a “staging area” to load the external data before processing it.

    It took me a bit of time to get started at first, because I had a lot of incoherent data and I had to build my watchdogs. But, let me tell you that the result is way superior to the open cursor approach. First, I estimate the processing to be between 50 and 100 times faster than the cursor approach. Second, I can catch data inconsistencies more reliably, by leveraging Oracle for what it was designed to do (you put constraints on your tables and let the engine tell you when there is a problem).

    When I am on a project, the first thing I look at is who is in charge of the integration and how they do it. This tells me if there is a good chance for success or failure, more than anything else.

  2. Pete_S Says:

    Thanks for the input, Patrice

    I could not agree more with the fact that is the data is the critical piece. All serious database vendors have invested in developing systems that handle data integrity and validation – why do I come always find systems put together by developers that think they can do it better than Oracle, SQL Server or DB2? Data is for databases.

Website Design & Build: tymedia.co.uk