Validation of dimensions
August 26th, 2005 by Peter Scott
If you have an Oracle data warehouse and are using materialized views for aggregations then Oracle dimension objects would have almost certainly been defined. These objects describe the hierarchical, attribute and join relationships for dimensions and together with database constraints enable the query rewrite mechanism to successfully rewrite queries that navigate dimensional hierarchies. Dimensional objects are declarative and are not enforced by the database.
One Oracle supplied PL/SQL procedure that is often overlooked is the one that validates dimensions. In Oracle 10g this procedure is DBMS_DIMENSION.VALIDATE_DIMENSION but in earlier versions it was DBMS_OLAP.VALIDATE_DIMENSION. The parameters passed to either version of the procedure are very similar and are documented in Supplied PL/SQL Packages reference and the Data Warehousing guide. The procedure navigates the dimensional hierarchy and checks that all of the relationships defined in the dimension object are valid. The rowid (and reason for failure) of data items that fail validation are written to an exceptions table. This exception table can then be used to identify reference items that fail validation. The real utility of this procedure is that it provides a quick way of checking dimensional integrity on DW systems that use RELY constraints.


August 26th, 2005 at 5:41 pm
It has always seemed to me that some of the functionality of the validate dimension procedure could be taken over by materlialized views in a multirow constraint-checking configuration.
Taking the customers_dim example here: http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_5006.htm#SQLRF01206
It seems that for each level in the dimension you could create an MV to ensure integrity. So for the “customer” level you could have an MV that ensures that there are no customers that have more than one value for the cust_first_name, cust_last_name, cust_gender, cust_marital_status, cust_year_of_birth, cust_income_level, or cust_credit_limit.
Something like an MView with a query …
Select cust_id, count(*)
from customers
group by cust_id
having count(distinct cust_city) > 1 and …
… with a 1=0 constraint on it, or …
Select cust_id,
count(distinct cust_city) c_cust_city,
…
from customers
group by cust_id
… with a c_cust_city = 1 constraint on it. It would have to be fast refreshable on commit to be useful, I suppose.
Now if you put in place such a methodology, would there then be any value in the validation of the dimension? I’m notaware that it has any function in telling the optimizer that the dimension data is valid with respect to the definition of the dimension … or does it?
August 26th, 2005 at 10:13 pm
The dimension objects are really used as metadata for query rewrite and as part of the dimension definition used by OLAP. You are right, David, they don’t directly influence the query optimiser.
A lot of data warehouses use RELY constraints on the foreign keys to the dimensions; design effort goes into the data load layer to ensure that only referentially valid data is applied to the DW. When you use RELY you are telling the database “trust me, I know what I am doing” but sometimes we get it wrong, and VALIDATE_DIMENSION is a good way to check that the reference is consistant.
I suppose that you could roll-your-own code to validate dimesnions (perhaps using your MV approach) but the package is there, it works and is someone elses problem (Oracle’s) to maintain it
August 29th, 2005 at 1:43 pm
RELY is a good analogy — no enforcement of contraints but a means of documenting their existence to the optimizer. The difference in general would be in the amount of data involved I suppose. You could probably afford to have some form of validation of the dimensional data at the time that it is modified because there is not too much of it, unlike in a foreign key from a fact table.
One decision-maker for me would be how much I trust my ETL process (both technologyand people) to make sure that the validation gets done. If there was any doubt over it then I might consider enforcing dimensions with an MV constraint mechanism, but more likely I wouldn’t. It’s just one of those theoretical possibilities.