Data Warehouse Fault Tolerance Part 3: Restoring

February 12th, 2010 by

Hopefully you’ve read the introduction, Part 1, and Part 2. Those posts detailed methods for building fault-tolerant ETL code, with a strong bias in favor of using Oracle Database features. Now I’ll drill into the backup and recovery aspect of data warehousing fault tolerance, and tackle the age-old question of whether to ARCHIVELOG or NOARCHIVELOG in a BI/DW environment.

When I engage with clients that have a data warehouse operating in NOARCHIVELOG mode, their usual reasoning for this decision is a perceived performance gain. This makes sense on the surface… because NOARCHIVELOG prevents the generation of all that unwanted and unneeded REDO, right?

Not exactly. There is misconception about what NOARCHIVELOG actually means, and hopefully, I can clear that up with a demonstration. I have a database in NOARCHIVELOG, and I’ll test to see whether my statements generate REDO:

The regular insert statement generated 38M of REDO in a NOARCHIVELOG database. Interesting. And the INSERT /*+ APPEND */ statement generated only 92K. Though it would appear that neither of these statements actually executed in NOLOGGING mode, the truth is that the APPEND statement did. All statements generate a little bit of REDO, because updates to the data dictionary are always logged.

So why do regular inserts generate REDO on a NOARCHIVELOG database? There is a myth in the Oracle world that NOARCHIVELOG means that no REDO is generated, but that is not the case. Choosing NOARCHIVELOG mode simply means that we are foregoing the option to use media recovery (restoring datafiles, rolling forward). Think about it: REDO is not simply for media recovery, it’s also for crash recovery. If all REDO generation was suspended, Oracle wouldn’t be able to open after a simple server crash. In NOARCHIVELOG mode, there are situations where we can suspend most of the REDO generated, and one of those situations involves using the INSERT /*+ APPEND */ statement. So why would the database allow these NOLOGGING operations? Because direct-path operations write blocks directly into datafiles, bypassing the buffer cache. We wouldn’t have to rely on the online REDO logs to recover those transactions, and so Oracle allows us to minimize the REDO generated.

So if you have your database in NOARCHIVELOG mode for performance reasons, but you are using ETL tools that don’t support true direct-path writes on Oracle (a lot of the third-party tools don’t), or you are using cursor-based, row-by-row load scenarios, the same amount of REDO is generated if the database was in ARCHIVELOG mode. The only thing gained from operating in this manner is the privilege of having to shut down the database whenever a backup is needed.

Perhaps another myth that gets perpetuated is that we can’t have the best of both worlds, but in fact we can. We can minimize the amount of REDO generated, we can operate in ARCHIVELOG mode, we can backup our database in online mode, and we would be able to restore from that backup. The solution: NOLOGGING tables and indexes. I’ll put the database in ARCHIVELOG mode, and rerun the test case above with one small change: I’ll change the table to be NOLOGGING:

We get the exact same behavior with a NOLOGGING table in ARCHIVELOG mode than we did with NOARCHIVELOG mode. But is having the database in ARCHIVELOG mode of any value when all of our ETL processes are NOLOGGING? We can perform an online backup, but would we even be able to restore from that backup if we have transactions that executed as NOLOGGING?

The answer is “yes” and “yes”. We just need one small change to our backup strategy: a well-placed incremental backup.

To increase the performance of our incremental backup, we need to create a block change tracking file. The database keeps a list of all changed blocks so that RMAN will know exactly what to backup during an incremental:

We start by taking the initial incremental level 0 backup:

Now I’ll load the SALES table with another INSERT /*+ APPEND */ to make sure we have a NOLOGGING operation since our last backup.

This is the step in our process that requires a slight change to our backup and recovery strategy: we should get an incremental level 1 backup as soon as the load is complete. This will physically backup all blocks that have been affected by the load, and we wouldn’t need to logically apply the REDO logs that are missing the NOLOGGING operations. Since we have changed block tracking, this step will be extremely fast, and I recommend that the ETL process flow or main driving script execute the backup as the very last step in the batch load.

Now, let’s see if we can restore:

So that’s it for the Three “R”‘s. I had a lot of fun revisiting the “operations” side of the house, and logging in as SYSDBA again. It’s amazing how it all just came back to me… I didn’t have to look at the manuals at all. Okay… maybe once.

Comments

  1. rnm1978 Says:

    Great series of articles Stewart. Clearly explained and good examples.

  2. Tom Says:

    Just want to correct a possible misunderstanding from Mr. Stewart, actually no one says (or at least I’v never heard of it) “NOARCHIVELOG means that no REDO is generated”. Stewart’s test results are absolutely correct that Oracle in both archivelog mode and nonarchivelog mode does write REDO. However the difference is that in archivelog mode, oracle does not only write redo but also generates archive logs, which means basically when one redo log reaches its max size and before it could be reusable all contents in this REDO log file have to be archived into another archive log file. So in this situation duplicated disk writes occur, which could result in double disk I/Os and slow down the processes. Also, since huge amount of archive log will be generated you have to allocate huge disk space to accommodate those archive log files (for a billion rows data loading, it might require 100GB space for only one table) and in case of occurrence of insufficient space for archive log files the database will hang to await space to be freed up. So both performance and maintenance of archivelog files are key points to be considered. Anyway still many thanks for this valuable article.

  3. Tom Says:

    Further to above, of couse if all or most ETL processes could really perform DIRECT PATH data loading, there won’t be any concerns or worries about archivelog mode as it doesn’t write REDO and thereby it doesn’t generate archiv log files as well.

  4. Tom Says:

    The last point, archiving is also very CPU deintensive — it involves copying a file from one disk to another disk. But If you have the right number of high performance devices and
    IO channels — no worries, no degradation of ETL performance.

Website Design & Build: tymedia.co.uk