Getting To The Bottom Of NOLOGGING, NOARCHIVELOG and Standby Databases
May 28th, 2005 by Mark Rittman
One of the customers I work with is looking to us for advice on how to set
their data warehousing database up. They’re looking to use the Enterprise
Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle
Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will
have two database instances, one to hold the OWB Design repository (with an 8k
block size) and one to hold the target data warehouse and the OWB Runtime
Repository (with a 16K block size). Setting up the design repository instance
isn’t an issue, it’s just a regular database set up using the default database
template, but the target data warehouse instance is a bit more interesting as
we’re looking to minimize the amount of redo log generated during the warehouse
initial build and then refresh.
Most people are aware that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces which under certain circumstances
causes redo generation to be switched off when the object is loaded. You’re also
probably aware that the database itself can be placed in NOARCHIVELOG mode,
which disables the process of redo log archiving which can otherwise slow down a
bulk load into a warehouse. However, what’s often not clear is which of these
options should be used and why you would use one rather than the other, and I
therefore thought it worthwhile going back over the documentation and trying to
set out what the optimal set of choices are. As usual, if you’re reading this
and you spot something that I’ve got wrong, or can add anything that makes the
process clearer, add a comment at the end and I’ll update the posting.
To put this in context, the client runs a number of other Oracle databases,
almost exclusively as the backend for their transactional systems, and the
general rule is that these databases are run in ARCHIVELOG mode, incremental hot
backups are taken during the week using RMAN followed by full hot backups at the
weekend, and the archived redo logs are stored off site and also used to refresh
their DR databases. Given a preference, they would rather run the new data
warehouse database in ARCHIVELOG mode, so that it can fit in with their existing
arrangements, but they are willing to perform the warehouse load and refresh in
NOLOGGING mode as long as we give them some downtime after the load to do a
full, cold backup of the warehouse.
What we would like to do is to somehow disable redo log generation for
initial loads into our dimension tables, and for initial and then subsequent
loads into the fact tables. The rationale for this is that the initial dimension
load, and all loads into the fact table, load a very large amount of data and we
can speed up this process if we don’t have to write redo information into the
redo log files – in other words, we dramatically reduce the amount of disk I/O
we need to perform to load a given object. If our database is running in
ARCHIVELOG mode, we get an additional benefit in that we don’t then need to
archive off these redo log files, again reducing the time to perform the load as
we don’t have to wait around for the ARCn process to archive the filled-up redo
log files. The downside of doing this though is that these data loads are
unrecoverable – that is, if the instance crashes or we have a media (disk)
failure, we wouldn’t be able to recoverer these objects. Whilst that isn’t a bit
deal for the objects we’re working with – we can just reload them from the
source files – it means that we’re going to have problems with any other objects
that are also in that datafile. What this means in practice is that we have to
do a full, cold backup of the datafiles that contain these objects before we can
consider the database "recoverable".
I mentioned earlier on that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces. Ignoring tablespaces for the
moment, what this means in practice is that you can create the table (for
example) using the NOLOGGING clause:
CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)
NOLOGGING
;
Then, you can carry out bulk loads into this table in NOLOGGING mode. For
example:
INSERT
/*+ APPEND */
INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);
One point to note here is that the NOLOGGING clause doesn’t mean that
every DML operations on the table works in NOLOGGING mode – it’s only bulk
operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take
advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo,
even when you specify NOLOGGING. This is a popular misconception as logic would
suggest that the NOLOGGING table creation clause would suppress all redo log
generation, not just direct path loads, but all the NOLOGGING clause does is
specify NOLOGGING for the table creation process itself (in case you’re using
CREATE TABLE … AS SELECT) and for any subsequent direct path insertions.
Another misconception is that NOLOGGING suppresses all redo, even when using
direct path loads. What actually happens is that NOLOGGING operations generate
minimal redo, not no redo whatsoever, with this minimal redo generated so that
the data dictionary itself is preserved. It’s only a relatively small amount,
but it’s worth noting that some will be produced, whatever steps you take.
With indexes and partition creation, it’s a similar story – you can specify
NOLOGGING in the object creation clause, and then certain operations (in the
case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.
So what about tablespaces? What do you get when you specify the NOLOGGING
clause when creating a tablespace?
CREATE TABLESPACE dims
NOLOGGING
DATAFILE ‘c:\oradata\dims01.dbf’ SIZE 50M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K
SEGMENT SPACE MANAGEMENT AUTO
ONLINE;
What this actually does is ensure that the default LOGGING|NOLOGGING
attribute for objects created in this tablespace is set to NOLOGGING. You can
still subsequently create a table as LOGGING in a tablespace you define as
NOLOGGING, all the NOLOGGING attribute does for a tablespace is set the default
for objects later created in the tablespace.
Once you’ve bulk loaded your data, created your table using CREATE TABLE …
AS SELECT or rebuilt your index, the important next step is to perform a
baseline backup of the datafiles that contain your objects; once you’ve done
this, and until such time as you carry out any other unrecoverable DML, you can
then recover your objects like any other. This baseline backup
can be a hot backup (as long as you’re in ARCHIVELOG mode, which in our case
we are) and this backup then ensures that we have a baseline copy of the object
within our backup set if we then need to perform a full recovery.
So where does the ARCHIVELOG setting for databases come into things?
The general advice for setting up warehouse target databases is to put the
database into NOARCHIVELOG mode. NOARCHIVELOG suppresses the archiving of redo
log files and improves the performance of data loads as you don’t have the ARCn
processes running around in the background and holding things up as they copy
the redo log files off into your archive locations. The obvious cost of this is
that you will only have limited options when recovering your instance -
basically if you’ve overwritten your redo log files and not had the contents
archived off, and you have a media failure, you can only restore your instance
back to the last full cold backup you took (as the redo required to roll forward
from this point has probably been overwritten, and you didn’t archive it off
beforehand). However, for data warehouses, this isn’t usually an issue, as you
can just reload from the source files or databases, and the benefits of your
data load being quicker are usually greater than the (slim) chance that you’ll
need to perform a media recovery.
What we have here then are two separate concepts – ARCHIVELOG and
NOARCHIVELOG mode, which determines whether redo logs are archived off when they
fill up, or just overwritten and the redo then lost, and LOGGING | NOLOGGING,
where we either allow all redo to be generated as normal, or in certain
circumstances (the important ones, when we need to load lots of data) suppress
the majority of redo, at the cost of our operation being unrecoverable and
subsequently needing to reload the objects from the source files. These two
concepts interact in two ways:
- If we’re looking to suppress as much redo as possible, chances are we’d
like to suppress archiving as well (and we might as well, as the objects
we’re loading are unrecoverable anyway), and - If you specify NOARCHIVELOG mode for a database, all indexes, tables and
partitions are
automatically set to NOLOGGING, and therefore all INSERT /*+ APPEND */
and SQL*Loader direct load operations will automatically generate minimal
redo.
However, in our instance, the DBAs want to keep the database in ARCHIVELOG
mode, so that they can manage the database in the same way as the OLTP databases
and ship the archivelogs off to their DR site to maintain their standby
databases. As I was reading through the various articles and questions on Asktom,
I noticed
this answer which suggested that there could be some complications when
using NOLOGGING operations when you later on wanted to ship the archivelogs to a
standby database:
"when using a standby database for disaster recovery — you cannot
perform nologging operations without performing additional, manual work (in
fact, in 9i,
we can set a flag that says "ignore the request for nologging — log it
anyway" on the database to avoid this issue alltogether).Please read the standby database documention — it does explain this in some
detail:"
The answer then points through to the
Oracle 8.1.6 documentation for Standby Database which talks about the
problems you’ll get when you try and apply the (minimal) redo logs that were
written for our unrecoverable transactions to your standby database:
"In some SQL statements, the user has the option of specifying
the NOLOGGING clause, which indicates that the database operation is not
logged in the redo log file. Even though the user specifies the NOLOGGING
clause, a redo log record is still written to the redo log. However, when
the redo log file is transferred to the standby site and applied to the
standby database, a portion of the datafile is unusable and marked as being
unrecoverable. When you either activate the standby database, or open the
standby database with the read-only option, and attempt to read the range of
blocks that are marked as "UNRECOVERABLE," you will see error messages
similar to the following:ORA-01578: ORACLE data block corrupted (file # 1, block # 2521)
ORA-01110: data file 1: '/vobs/oracle/dbs/stdby/tbs_1.f'
ORA-26040: Data block was loaded using the NOLOGGING option"
That doesn't sound good, and certainly I can't imagine our DBA chuckling to
himself when at 2.30 in the morning he's called to recover our database and that
error message comes up. It's not as bad as it seems actually, as the article
then goes on to explain how you recover from this situation, but the point that
Tom was trying to make is that NOLOGGING isn't something you should do and then
tell no-one about - you should understand what it does, talk it through with the
DBA and have a plan in mind for when you do eventually need to perform a
recovery and work around the fact that you may have some objects that are
unrecoverable - if you've done your baseline backup after your bulk load, it
shouldn't affect you, but if your data centre gets hit by a meteorite then it
may well affect the process of starting up your standby database. Given that
Standby Database is now Data Guard with 10g, I checked the 9i Data Guard
documentation (the 10g documentation seems to be offline at the moment) and
found this comment, which suggests that in fact we might want
to actually reconsider our use of NOLOGGING operations:
"For logical standby databases, when SQL apply operations
encounter a redo log record for an operation performed with theNOLOGGINGclause, it skips over the record and continues applying
changes from later records. Later, if an attempt is made to access one of
the records that were updated withNOLOGGINGin effect, the
following error is returned:ORA-01403 no data foundTo recover after the
NOLOGGINGclause is
specified, re-create one or more tables from the primary database, as
described in
Section 9.1.6.Note: In general, use of the
NOLOGGING
clause is not recommended. Optionally, if you know in advance
that operations using theNOLOGGINGclause will be
performed on certain tables in the primary database, you might
want to prevent the application of SQL statements associated
with these tables to the logical standby database by using the
DBMS_LOGSTDBY.SKIP procedure."
So, in summary, in our situation where we want to minimise the amount of redo
generated for our dimension and fact table loads, but we want to work in
ARCHIVELOG mode and ship our logs to a standby database, I'd say the best
approach would be to:
- Keep the database in ARCHIVELOG mode
- Create separate tablespaces, with their own datafiles, for the
dimensions and facts that we are going to bulk load, and create these
tablespaces as NOLOGGING - Create the tables to implement these dimensions and facts as normal
(i.e. without the NOLOGGING clause, as this is already the default for their
tablespaces) - Use INSERT /*+ APPEND */ to
bulk load our data into these tables - Once the loads have completed, do a baseline hot backup of the datafiles
that relate to the tables/tablespaces we've just loaded - At this point, our main database is completely recoverable.
- If we need to recover the standby database, be prepared for the fact
that some manual work might be needed to activate the standby database, and
in fact be prepared for the DBA to actually stop us from using NOLOGGING
through the use of the DBMS_LOGSTDBY.SKIP procedure.
Any comments?


May 29th, 2005 at 10:02 am
Is the load going to be direct-path, regardless of the logging issue? That is, is direct-path part of the baseline to judge whether NOLOGGING helps the load process, or is that being introduced at the same time as NOLOGGING?
May 29th, 2005 at 2:50 pm
Hi Mark,
I’d be wanting to ask questions around why the DBAs wish to have a, presumably fully functional, standby database for this application (I’m assuming its a classic datawarehouse and the data comes in via the data loads only) in the event of disaster; but at the same time wish to disable logging and do direct mode loading.
In my mind a standby is for mission critical systems that you wish to failover to as soon as possible in a disaster, if you don’t have them the business loses x,000 per minute. A DW is for analysis and intelligence, critical yes – but not time critical. The correct ’standby’ solution for me would almost certainly be to rerun the loads – not replay the logs. If you must have two dw available, why not just do the data loads twice, once in each location.
if the db is important enough that it requires a standby in the event of failure, it would seem highly likely that its important enough to mandate logging for operations on it. to my mind you only ever choose to bypass the redo logs, or run in noarchivelog mode, if you are confident that you can just recreate the data.
May 29th, 2005 at 6:15 pm
Well, I have never had a customer that needed a standby DW, quick recovery yes.
The normal strategy for us is to restore to the last complete backup set and re-run the batch(s) to catch up. Worst case is to rebuild from tape on the DR truck and apply any missing batchs; generally that takes about a day (well when we test DR it does – if we did it for real the truck would get lost and…)
Personally I would go no archive log, minimal redo (well, if you won’t use it…) supported by REGULAR (daily?) back-ups of the database to disk followed by copy to tape. If you use SATA disk you can get this for about 5-10K/terabyte (Even Sun sells SATA disk at 10,000 for 2 terabytes!). You may find that read-only tablespaces help (this has changed somewhat on 10g) in reducing the size of backup sets
May 30th, 2005 at 11:11 am
Ed, Niall, Pete
Thanks for the comments. Ed – yes, we should test the benefits of direct path with logging, before turning off logging, to isolate the effect of each part of what we’re trying to do. Good call.
Niall, Pete. Yes, you’re right, standby database probably isn’t suitable for this sort of application. I guess the issue here is that every other database uses standby database for DR, and they’d prefer to use this method for our application. As you say though, there are other ways of achieving the same thing that wouldn’t have issues with NOLOGGING, so I’ll raise this possibility with them in our next meeting. Thanks for the advice, much appreciated.
regards
Mark