Are data warehouses dinosaurs?

Two statements

  1. Modern query tools allow multiple data sources to be interrogated in a single query => there is no need to consolidate information in a monolithic data warehouse
  2. The size of databases are increasing. Last year's Wintercorp survey illustrated it very clearly, as did a straw poll during a VLDB presentation at UK OUG 2006 that showed around 50% of the audience had databases over 5TB, a significant increase in proportion since the presenter last asked the same question a year or so ago.
So on the face of it we have a conflict - "you don't have to have everything in the same place to query it" and "people are storing more in single locations". But in reality this is not a conflict at all, it is far more subtle than that

Query tool vendors realise that in an enterprise many disparate information systems exist, some at the global or regional corporate level, some existing just within one operating unit and yet others are are point solutions at a personal or at most departmental level; coupled with the notion that many organisations have a hotch-potch of technologies from various mergers and acquisitions or incarnations of a corporate IT strategy and the challenges of consolidating data in a single place become evident. Joining data on the fly has many attractions. But there are many counter arguments to this approach

  • OLTP sources such as ERP systems are optimised to give fast and robust access to single row data. The systems are not designed to trawl through masses of data
  • Retrieving large data sets and then aggregating them away in a user query tool can waste precious network bandwidth - is it not better to move and aggregate just once?
  • Many operational systems have only a sense of 'now' - the amount of history is limited
Furthermore, legislation (SOx, accounting rules, national security etc.) may mandate long-term storage away from the transactional systems. I suspect the outcome is that the data warehouses will not become extinct, they will just evolve into data warehouses!

The data warehouse is dead, long live the data warehouse