Not such good ideas

January 26th, 2007 by Peter Scott

One of our consultants dropped by my desk to ask me some Oracle questions around data life cycle management. One of his customers is moving a store based system to a central system and like a lot of systems in the UK needs to keep 7 years of transactional data available. I explained all of the standard stuff: partitioning for ease of maintenance, use of lower spec disk (slower) for the old stuff you don’t need often, table compression, indexing techniques.

He came back a couple days later to run past me the ideas from the customer’s external consultant. The consultant proposed that the data was partitioned by geographic region. Although this idea sounds like it has merit, after all most queries will concern single stores, it rang alarm bells in my mind.

  • There is a need to keep a rolling 7 year history, old data needs to be removed, new added. Maintaining partitions through delete and insert operations is not efficient. Dropping the oldest partition and adding a new one is very efficient, especially if we partition exchange to populate
  • Not using time as part of the partition key prevents access to features such as readonly tablespaces
  • Although each store is today allocated to a region, who’s to say that next month the business won’t reorganise and drop a couple of regions and reallocate the stores affected to the remaining regions. It is not sensible let data move between partitions. Ideally, a partition key should be immutable and region clearly is not

It would seem that the consultant took the positive attributes of partitioning: ease of DBA management, improvement of ETL process and enhanced query performance through partition elimination and decided that the first two were unimportant. I think that manageability and batch is very important and query performance might be better addressed through indexing. But if partition elimination is desirable based on store I would go with either list or hash sub-partitions on store_id, and leave the region well alone

Comments

  1. David Aldridge Says:

    Hmmm, it’s a very strategic business this partitioning. I’d think that with a reasonably static relationship between store and region then list subpartitions could be organized to place single regions in single subpartitions, or at least avoid multiple regions being represented in the same subpartition.

  2. Peter Scott Says:

    David, in this case the company moves stores between regions at least once a year!
    And as the reporting is always by store - the system replaces a store system in each outlet then hash or list on store id seems a better choice then partitioning on a level that is not often used and is subject to change

  3. David Aldridge Says:

    How do they handle that change Pete? Preserving history of region-store relationships or just storing current one?

  4. Peter Scott Says:

    It’s not a DW as we would know it, David. None of this SCD stuff for them. Basically they are replacing n-hundred independent store systems with a single central web-based one. All queries are by store so what is the point partitioning by region.

  5. herodt Says:

    I am not a DW person by any stretch, and what I am about to say may simply be a slow and painful way of inserting my foot into my mouth…

    but wouldn’t creating a key based on the year-region-store be the way to go?

    Then it is by year for the rolling data Possibly set it year-month if that is necessary, if a region is dropped or restructured then that can be dealt with if it is necessary, but with the store in there, that should allow the partion to be picked during the query.

    The partions that each store searches should be limited to a single year normally assuming normal retail setup, and if the search is necessary to go back 7 years, then there are still only 7 partitions.

  6. Peter Scott Says:

    herodt:
    I suspect (given the way the current store based system is used) that the only queries will be single store, all years. This is not a BI system. Stores will never be compared against each other for this type of data so organising by region seems a like creating work to handle store-region migrations when in reality hash or list sub-partitioning on store will do the same (functional) job with no additional steps.

    Another downside to not using date as the partition key is that we will not be able to make effective use table compression as we will be dealing with deletes from blocks when we remove old data