Constraints
April 14th, 2007 by Peter Scott
Constraints do just that; when enforced they constrain data such that rules are obeyed. But even when not enforced the existence of an Oracle RELY constraint does something really useful, it tells the CBO, and especially the query rewrite engine, that certain facts about the data can be presumed to be true. Of course, if we don’t tell the truth to the database we can into all sorts of problems.
But why should we want to tell the database to trust us, wouldn’t it be better to let the database examine the data and use, real, enforced constraints? Real unique constraints are enforced by b-tree indexes, and for large fact tables, this can be a problem; typically we would need an index based on each dimension column in the fact table and furthermore for a partitioned table this should be a global rather than local index. In one of my systems we have a 600 million row table spread over 800 partitions and each row is uniquely defined by a combination of 10 dimensional attributes. Having an index of that size is no joke – we need masses of storage; we either slow data load as each item is checked against the constraint or we spend time rebuilding the constraint post data load. And to what purpose, it is not as if we would often run queries that target specific rows by way of the composite index.
For my money, real constraints are used in the staging process to validate data; when the data is proven clean and published to main data warehouse it is the time for the rely constraints to take over.


April 15th, 2007 at 12:45 am
Interesting feature. It is probably most welcome in the Java crowd that prefers to manage data integrity in the application (instead of the DB).
April 15th, 2007 at 8:38 pm
I’ll happily go along with this in a read-only warehousing environment, which is what it appears you’re referring to. When we get to transactional apps my paranoia begins to kick in; I don’t trust the middle- and client-tire programs to consistently enforce the rules. I get particularly nervous when I think about an evolving application in which maintenance programmers ‘adjust’ the code 5 years down the road.
April 15th, 2007 at 9:13 pm
Beth, yes I am talking about data warehouses – should have linked back to the original post to make that clearer.
Rob – as Beth said, can you really trust a developer to manage programmatic constraints… ops, I used to be a developer so perhaps shouldn’t have said that ;-)
April 16th, 2007 at 9:17 pm
I wasn’t making a statement whether it was desirable, just that it would be welcome from the crowd that does trust the programmer. (For what it’s worth, no, I’m not in that crowd).
April 17th, 2007 at 3:44 am
Saw an interesting point in an analytics session here in Vegas. If your data tends to come in with lots of constraint violations like duplicate keys, it can be faster to scrub them in staging tables with with analytics than go through the “rejects table” routines.
April 17th, 2007 at 8:40 am
From comments I received, both here and in private email, I must emphasise that RELY is a way of telling the database “trust me, I am telling you that this is how it is.” If we lie to the database, then don’t be surprised if features such as materialized view query rewrite selects inappropriate summary tables.
Rob Vollman’s comment about desirability of use in a transactional environment is very true (but there again he is a pragmatic guy) – just because you can do something does not mean it is the right thing to do
Joel – wish I had heard that talk. I am still a bit “old school” on reference loads, I disable constraints, load data and then re-enable the constraints with an exceptions into clause. But for fact load I use outer joins and look for nulls (for foreign keys) or not nulls (for duplicates) – but the analytics could be a good way to go here
April 17th, 2007 at 11:07 am
Does this mean you still maintain the B-tree unique index in your staging area on the 600 million row table spread over 800 partitions, uniquely defined by a combination of 10 dimensional attributes? By IOT? You just switch to RELY constraint on the main DW?
April 17th, 2007 at 11:36 am
DJ – For staging I use a lightweight check and not the whole history of the database. We have a business rule where we only load data that is no more than six weeks old – this immediately reduces the size of the data set to check against by almost 95%. We tend to represent this as a IOT copy of the key columns. Once we prove the data to be clean we publish to the main DW tables (raw data & summary tiers) and use RELY contraints for then on
April 17th, 2007 at 4:03 pm
Thanks for the reply. It’s nice to know what the industry experts are doing. Keep up the good working.