Data Warehouse Fault Tolerance Part 3: Restoring

Hopefully you’ve read the introduction, Part 1, and Part 2. Those posts detailed methods for building fault-tolerant ETL code, with a strong bias in favor of using Oracle Database features. Now I’ll drill into the backup and recovery aspect of data warehousing fault tolerance, and tackle the age-old question of whether to ARCHIVELOG or NOARCHIVELOG in a BI/DW environment.

When I engage with clients that have a data warehouse operating in NOARCHIVELOG mode, their usual reasoning for this decision is a perceived performance gain. This makes sense on the surface… because NOARCHIVELOG prevents the generation of all that unwanted and unneeded REDO, right?

Not exactly. There is misconception about what NOARCHIVELOG actually means, and hopefully, I can clear that up with a demonstration. I have a database in NOARCHIVELOG, and I’ll test to see whether my statements generate REDO:

The regular insert statement generated 38M of REDO in a NOARCHIVELOG database. Interesting. And the INSERT /*+ APPEND */ statement generated only 92K. Though it would appear that neither of these statements actually executed in NOLOGGING mode, the truth is that the APPEND statement did. All statements generate a little bit of REDO, because updates to the data dictionary are always logged.

So why do regular inserts generate REDO on a NOARCHIVELOG database? There is a myth in the Oracle world that NOARCHIVELOG means that no REDO is generated, but that is not the case. Choosing NOARCHIVELOG mode simply means that we are foregoing the option to use media recovery (restoring datafiles, rolling forward). Think about it: REDO is not simply for media recovery, it’s also for crash recovery. If all REDO generation was suspended, Oracle wouldn’t be able to open after a simple server crash. In NOARCHIVELOG mode, there are situations where we can suspend most of the REDO generated, and one of those situations involves using the INSERT /*+ APPEND */ statement. So why would the database allow these NOLOGGING operations? Because direct-path operations write blocks directly into datafiles, bypassing the buffer cache. We wouldn’t have to rely on the online REDO logs to recover those transactions, and so Oracle allows us to minimize the REDO generated.

So if you have your database in NOARCHIVELOG mode for performance reasons, but you are using ETL tools that don’t support true direct-path writes on Oracle (a lot of the third-party tools don’t), or you are using cursor-based, row-by-row load scenarios, the same amount of REDO is generated if the database was in ARCHIVELOG mode. The only thing gained from operating in this manner is the privilege of having to shut down the database whenever a backup is needed.

Perhaps another myth that gets perpetuated is that we can’t have the best of both worlds, but in fact we can. We can minimize the amount of REDO generated, we can operate in ARCHIVELOG mode, we can backup our database in online mode, and we would be able to restore from that backup. The solution: NOLOGGING tables and indexes. I’ll put the database in ARCHIVELOG mode, and rerun the test case above with one small change: I’ll change the table to be NOLOGGING:

We get the exact same behavior with a NOLOGGING table in ARCHIVELOG mode than we did with NOARCHIVELOG mode. But is having the database in ARCHIVELOG mode of any value when all of our ETL processes are NOLOGGING? We can perform an online backup, but would we even be able to restore from that backup if we have transactions that executed as NOLOGGING?

The answer is “yes” and “yes”. We just need one small change to our backup strategy: a well-placed incremental backup.

To increase the performance of our incremental backup, we need to create a block change tracking file. The database keeps a list of all changed blocks so that RMAN will know exactly what to backup during an incremental:

We start by taking the initial incremental level 0 backup:

Now I’ll load the SALES table with another INSERT /*+ APPEND */ to make sure we have a NOLOGGING operation since our last backup.

This is the step in our process that requires a slight change to our backup and recovery strategy: we should get an incremental level 1 backup as soon as the load is complete. This will physically backup all blocks that have been affected by the load, and we wouldn’t need to logically apply the REDO logs that are missing the NOLOGGING operations. Since we have changed block tracking, this step will be extremely fast, and I recommend that the ETL process flow or main driving script execute the backup as the very last step in the batch load.

Now, let’s see if we can restore:

So that’s it for the Three “R”‘s. I had a lot of fun revisiting the “operations” side of the house, and logging in as SYSDBA again. It’s amazing how it all just came back to me… I didn’t have to look at the manuals at all. Okay… maybe once.