Metadata Modeling in the Database with Analytic Views

12.2, the latest Oracle database release provides a whole set of new features enhancing various aspects of the product including JSON support, Auto-List Partitioning and APEX news among others.
One of the biggest news in the Data Warehousing / Analytics area was the introduction of the Analytic Views, that as per Oracle's definition are

Metadata objects that enable the user to quickly and easily create complex hierarchical and dimensional queries on data in database tables and views

tl;dr

If you are on rush, here is an abstract of what you'll find in this looooong blog post:

Metadata modeling can now be done directly in the database using Analytic Views, providing to end users a way of querying database objects without needing a knowledge of joining conditions, aggregation functions or order by clauses.
This post will guide you through the creation of an analytic view that replicates a part of a OBIEE's Sampleapp business model. The latest part of the post is dedicated to understanding the usage of analytic views and the benefits for end users especially in cases when self-service BI tools are used.

If you are still around and interested in the topic please take a drink and sit comfortably, it will be a good journey.

Metadata Modeling

What are then the Analytics Views in detail? How are they going to improve end user's ability in querying data?

To answer above question I would take a step back. Many readers of this blog are familiar with OBIEE and its core: the Repository. The repository contains the metadata model from the physical sources till the presentation areas and includes the definition of:

  • Joins between tables
  • Hierarchies for dimensions
  • Aggregation rules
  • Security settings
  • Data Filters
  • Data Sources

This allows end users to just pick columns from a Subject Area and display them in the appropriate way without needing to worry about writing SQL or knowing how the data is stored. Moreover definitions are held centrally providing the famous unique source of truth across the entire enterprise.

The wave of self-service BI tools like Tableau or Oracle's Data Visualization Desktop provided products capable of querying almost any kind of data sources in a visual and intuitive way directly in the end user hands. An easy and direct access to data is a good thing for end user but, as stated above, requires knowledge of the data model, joins and aggregation methods.
The self-service tools can slightly simplify the process by providing some hints based on column names, types or values but the cruel reality is that the end-user has to build the necessary knowledge of the data source before providing correct results. This is why we've seen several times self-service BI tools being "attached" to OBIEE: get corporate official data from the unique source of truth and mash them up with information coming from external sources like personal Excel files or output of Big Data processes.

Analytics Views

Analytic Views (AV) take OBIEE's metadata modeling concept and move it at database level providing a way of organizing data in a dimensional model so it can be queried with simpler SQL statements.
The Analytical Views are standard views with the following extra options:

  • Enable the definition of facts, dimensions and hierarchies that are included in system-generated columns
  • Automatically aggregate the data based on pre-defined calculations
  • Include presentation metadata

Analytics views are created with a CREATE ANALYTIC VIEW statement, some privileges need to be granted to the creating user, you can find the full list in Oracle's documentation.

Every analytical view is composed by the following metadata objects:

  • Attribute dimensions: organising table/view columns into attributes and levels.
  • Hierarchies: defining hierarchical relationships on top of an attribute dimension object.
  • Analytic view objects: defining fact data referencing both fact tables and hierarchies.

With all the above high level concepts in mind it's now time to try how Analytical Views could be used in a reporting environment.

Database Provisioning

For the purpose blog post I used Oracle's 12.2.0.1 database Docker image, provided by Gerald Venzl, the quickest way of spinning up a local instance. You just need to:

  • Install Docker
  • Download database installer from Oracle's website
  • Place the installer in the proper location mentioned in the documentation
  • Build Oracle Database 12.1.0.2 Enterprise Edition Docker image by executing
./buildDockerImage.sh -v 12.1.0.2 -e
  • Running the image by executing
docker run --name db12c -p 1521:1521 -p 5500:5500 -e ORACLE_SID=orcl -e ORACLE_PDB=pdborcl -e ORACLE_CHARACTERSET=AL32UTF8 oracle/database:12.2.0.1-ee

The detailed parameters definition can be found in the GitHub repository. You can then connect via sqlplus to your local instance by executing the standard

sqlplus sys/pwd@//localhost:1521/pdborcl as sysdba

The password is generated automatically during the first run of the image and can be found in the logs, look for the following string

ORACLE AUTO GENERATED PASSWORD FOR SYS, SYSTEM AND PDBAMIN: XXXXxxxxXXX

Once the database is created it's time to set the goal: I'll try to recreate a piece of the Oracle's Sampleapp RPD model in the database using Analytic Views.

Model description

In this blog post I'll look in the 01 - Sample App business model and specifically I'll try to replicate the logic behind Time, Product and the F0 Sales Base Measures using Analytic Views.

Dim Product

The Sampleapp's D1 - Products (Level Based Hierarchy) is based on two logical table sources: SAMP_PRODUCTS_D providing product name, description, LOB and Brand and the SAMP_PROD_IMG_D containing product images. For the purpose of this test we'll keep our focus on SAMP_PRODUCTS_D only.
The physical mapping of Logical columns is shown in the image below.

Attribute Dimension

The first piece we're going to build is the attribute dimension, where we'll be defining attributes and levels. The mappings in above image can "easily" be translated into an attributes with the following SQL.

CREATE OR REPLACE ATTRIBUTE DIMENSION D1_DIM_PRODUCT
USING SAMP_PRODUCTS_D
ATTRIBUTES
 (PROD_KEY as P0_Product_Number
    CLASSIFICATION caption VALUE 'P0 Product Number',
  PROD_DSC as P1_Product
    CLASSIFICATION caption VALUE 'P1 Product',
  TYPE as P2_Product_Type
    CLASSIFICATION caption VALUE 'P2 Product Type',
  TYPE_KEY as P2k_Product_Type
    CLASSIFICATION caption VALUE 'P2k Product Type',
  LOB as P3_LOB
    CLASSIFICATION caption VALUE 'P3 LOB',
  LOB_KEY as P3k_LOB
    CLASSIFICATION caption VALUE 'P3k LOB',
  BRAND as P4_Brand
    CLASSIFICATION caption VALUE 'P4 Brand',
  BRAND_KEY as P4k_Brand
    CLASSIFICATION caption VALUE 'P4k Brand',
  ATTRIBUTE_1 as P5_Attribute_1
    CLASSIFICATION caption VALUE 'P5 Attribute 1',
  ATTRIBUTE_2 as P6_Attribute_2
    CLASSIFICATION caption VALUE 'P6 Attribute 2',
  SEQUENCE as P7_Product_Sequence
    CLASSIFICATION caption VALUE 'P7 Product Sequence',
  TOTAL_VALUE as P99_Total_Value
    CLASSIFICATION caption VALUE 'P99 Total Value')

Few pieces to note:

  • CREATE OR REPLACE ATTRIBUTE DIMENSION: we are currently defining a dimension, the attributes and levels.
  • USING SAMP_PRODUCTS_D: defines the datasource, in our case the table SAMP_PRODUCTS_D. Only one datasource is allowed per dimension.
  • PROD_KEY as P0_Product_Number: using the standard notification as we can easily recaption columns names
  • CLASSIFICATION CAPTION ... several options can be added for each attribute like caption or description

The dimension definition is not complete with only attribute declaration, we also need to define the levels. Those can be taken from OBIEE's hierarchy

For each level we can define:

  • The level name, caption and description
  • The Key
  • the Member Name and Caption
  • the Order by Clause

Translating above OBIEE's hierarchy levels into Oracle SQL

LEVEL BRAND
  CLASSIFICATION caption VALUE 'BRAND'
  CLASSIFICATION description VALUE 'Brand'
  KEY P4k_Brand
  MEMBER NAME P4_Brand
  MEMBER CAPTION P4_Brand
  ORDER BY P4_Brand
LEVEL Product_LOB
  CLASSIFICATION caption VALUE 'LOB'
  CLASSIFICATION description VALUE 'Lob'
  KEY P3k_LOB
  MEMBER NAME P3_LOB
  MEMBER CAPTION P3_LOB
  ORDER BY P3_LOB
  DETERMINES(P4k_Brand)
LEVEL Product_Type
  CLASSIFICATION caption VALUE 'Type'
  CLASSIFICATION description VALUE 'Type'
  KEY P2k_Product_Type
  MEMBER NAME P2_Product_Type
  MEMBER CAPTION P2_Product_Type
  ORDER BY P2_Product_Type
  DETERMINES(P3k_LOB,P4k_Brand)
LEVEL Product_Details
  CLASSIFICATION caption VALUE 'Detail'
  CLASSIFICATION description VALUE 'Detail'
  KEY P0_Product_Number
  MEMBER NAME P1_Product
  MEMBER CAPTION P1_Product
  ORDER BY P1_Product
  DETERMINES(P2k_Product_Type,P3k_LOB,P4k_Brand)
ALL MEMBER NAME 'ALL PRODUCTS';

There is an additional DETERMINES line in above sql for each level apart from Brand, this is how we can specify the relationship between level keys. If we take the Product_LOB example, the DETERMINES(P4k_Brand) defines that any LOB in our table automatically determines a Brand (in OBIEE terms that LOB is a child of Brand).

Hierarchy

Next step is defining a hierarchy on top of the attribute dimension D1_PRODUCTS defined above. We can create it just by specifying:

  • the attribute dimension to use
  • the list of levels and the relation between them

which in our case becomes

CREATE OR REPLACE HIERARCHY PRODUCT_HIER
  CLASSIFICATION caption VALUE 'Products Hierarchy'
USING D1_DIM_PRODUCT
  (Product_Details CHILD OF
   Product_Type CHILD OF
   Product_LOB CHILD OF
   BRAND);

When looking into the hierarchy Product_hier we can see that it's creating an OLAP-style dimension with a row for each member at each level of the hierarchy and extra fields like DEPT, IS_LEAF and HIER_ORDER

The columns contained in Product_hier are:

  • One for each Attribute defined in attribute dimension D1_PRODUCTS like P0_PRODUCT_NUMBER or P2K_PRODUCT_TYPE
  • The member name, caption and description and unique name
  • The level name in the hierarchy and related depth
  • The relative order of the member in the hierarchy
  • A field IS_LEAF flagging hierarchy endpoints
  • References to the parent level

Member Unique Names

A particularity to notice is that the MEMBER_UNIQUE_NAME of Cell Phones is [PRODUCT_TYPE].&[101] which is the concatenation of the LEVEL and the P2K_PRODUCT_TYPE value.
One could expect the member unique name being represented as the concatenation of all the preceding hierarchy members, Brand and LOB, and the member key itself in a string like [PRODUCT_TYPE].&[10001]&[1001]&[101].

This is the default behaviour, however in our case is not happening since we set the DETERMINES(P3k_LOB,P4k_Brand) in the attribute dimension definition. We Specified that Brand ([10001]) and LOB ([1001]) can automatically be inferred by the Product Type so there is no need to store those values in the member key. We can find the same setting in OBIEE's Product Type logical level

Dim Date

The basic D0 Dim Date can be built starting from the table SAMP_TIME_DAY_D following the same process as above. Like in OBIEE, some additional settings are required when creating a time dimension:

  • DIMENSION TYPE TIME: the time dimension type need to be specified
  • LEVEL TYPE <LEVEL_NAME>: each level in the time hierarchy needs to belong to a precise level type chosen from:
  • YEARS
  • HALF_YEARS
  • QUARTERS
  • MONTHS
  • WEEKS
  • DAYS
  • HOURS
  • MINUTES
  • SECONDS

Attribute Dimension

Taking into consideration the additional settings, the Dim Date column mappings in above image can be translated in the following attribute dimension SQL definition.

CREATE OR REPLACE ATTRIBUTE DIMENSION D0_DIM_DATE
DIMENSION TYPE TIME
USING SAMP_TIME_DAY_D
ATTRIBUTES
 (CALENDAR_DATE AS TOO_CALENDAR_DATE,
  PER_NAME_MONTH AS T02_PER_NAME_MONTH,
  PER_NAME_QTR AS T03_PER_NAME_QTR,
  PER_NAME_YEAR AS T04_PER_NAME_YEAR,
  DAY_KEY AS T06_ROW_WID,
  BEG_OF_MTH_WID AS T22_BEG_OF_MTH_WID,
  BEG_OF_QTR_WID AS T23_BEG_OF_QTR_WID
  )
    LEVEL CAL_DAY
      LEVEL TYPE DAYS
      KEY TOO_CALENDAR_DATE
      ORDER BY TOO_CALENDAR_DATE
      DETERMINES(T22_BEG_OF_MTH_WID, T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
    LEVEL CAL_MONTH
      LEVEL TYPE MONTHS
      KEY T22_BEG_OF_MTH_WID
      MEMBER NAME T02_PER_NAME_MONTH
      ORDER BY T22_BEG_OF_MTH_WID
      DETERMINES(T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
    LEVEL CAL_QUARTER
      LEVEL TYPE QUARTERS
      KEY T23_BEG_OF_QTR_WID
      MEMBER NAME T03_PER_NAME_QTR
      ORDER BY T23_BEG_OF_QTR_WID
      DETERMINES(T04_PER_NAME_YEAR)
    LEVEL CAL_YEAR
      LEVEL TYPE YEARS
      KEY T04_PER_NAME_YEAR
      MEMBER NAME T04_PER_NAME_YEAR
      ORDER BY T04_PER_NAME_YEAR
    ALL MEMBER NAME 'ALL TIMES';

You may have noticed a different mapping of keys, member names and order by attributes. Let's take the CAL_MONTH as example. It's defined by two columns

  • BEG_OF_MTH_WID: used for joins and ordering
  • PER_NAME_MONTH: used as "display label"

PER_NAME_MONTH in the YYYY / MM format could be also used for ordering, but most of the times end user requests months in the MM / YYYY format. Being able to set a ordering column different from the member name allows us to properly manage the hierarchy.

Hierarchy

Time hierarchy follows the same rules as the product one, no additional settings are required.

CREATE OR REPLACE HIERARCHY TIME_HIER
USING D0_DIM_DATE
  (CAL_DAY CHILD OF
   CAL_MONTH CHILD OF
   CAL_QUARTER CHILD OF
   CAL_YEAR);

Fact Sales

The last step in the journey is the definition of the analytic view of the fact table that as per Oracle's documentation

An analytic view specifies the source of its fact data and defines measures that describe calculations or other analytic operations to perform on the data. An analytic view also specifies the attribute dimensions and hierarchies that define the rows of the analytic view.

The analytic view definition contains the following specifications:

  • The data source: the table or view that will be used for the calculation
  • The columns: which columns from the source objects to use in the calculations
  • The attribute dimensions and hierarchies: defining both the list of attributes and the levels of the analysis
  • The measures: a set of aggregations based on the predefined columns from the data source.

Within analytical views definition a materialized view can be defined in order to store aggregated values. This is a similar to OBIEE's Logical Table Source setting for aggregates.

Analytic View Definition

For the purpose of the post I'll use SAMP_REVENUE_F which is one of the sources of F0 Sales Base Measures in Sampleapp. The following image shows the logical column mapping.

The above mappings can be translated in the following SQL

CREATE OR REPLACE ANALYTIC VIEW F0_SALES_BASE_MEASURES
USING SAMP_REVENUE_F
DIMENSION BY
  (D0_DIM_DATE
    KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE
    HIERARCHIES (
      TIME_HIER DEFAULT),
   D1_DIM_PRODUCT
    KEY PROD_KEY REFERENCES P0_Product_Number
    HIERARCHIES (
      PRODUCT_HIER DEFAULT)
   )
MEASURES
 (F1_REVENUE FACT REVENUE AGGREGATE BY SUM,
  F10_VARIABLE_COST FACT COST_VARIABLE AGGREGATE BY SUM,
  F11_FIXED_COST FACT COST_FIXED AGGREGATE BY SUM,
  F2_BILLED_QTY FACT UNITS,
  F3_DISCOUNT_AMOUNT FACT DISCNT_VALUE AGGREGATE BY SUM,
  F4_AVG_REVENUE FACT REVENUE AGGREGATE BY AVG,
  F21_REVENUE_AGO AS (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1))
  )
DEFAULT MEASURE F1_REVENUE;

Some important parts need to be highlighted:

  • USING SAMP_REVENUE_F: defines the analytic view source, in our case the table SAMP_REVENUE_F
  • DIMENSION BY: this section provides the list of dimensions and related hierarchies to take into account
  • KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE: defines the join between the fact table and attribute dimension
  • HIERARCHIES (TIME_HIER DEFAULT): multiple hierarchies can be defined on top of an attribute dimension and used in an analytical view, however like in OBIEE only one will be used by default
  • F1_REVENUE FACT REVENUE AGGREGATE BY SUM: defines the measure with alias, source column and aggregation method
  • F2_BILLED_QTY FACT UNITS: if aggregation method is not defined it replies on default SUM
  • F21_REVENUE_AGO: new metrics can be calculated based on previously defined columns replicating OBIEE functions like time-series. The formula (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1)) calculates the equivalent of the OBIEE's AGO function for each level of the hierarchy.
  • DEFAULT MEASURE F1_REVENUE: defines the default measure of the analytic view

Using Analytic Views

After the analytic view definition, it's time to analyse what benefits end users have when using them. We are going to take a simple example: a query to return the Revenue and Billed Qty per Month and Brand.

Using only the original tables we would have the following SQL

SELECT D.CAL_MONTH,
  D.BEG_OF_MTH_WID,
  P.BRAND,
  SUM(F.REVENUE) AS F01_REVENUE,
  SUM(F.UNITS)   AS F02_BILLED_QTY
FROM SAMP_REVENUE_F F
JOIN SAMP_PRODUCTS_D P
ON (F.PROD_KEY = P.PROD_KEY)
JOIN SAMP_TIME_DAY_D D
ON (F.BILL_DAY_DT = D.CALENDAR_DATE)
GROUP BY D.CAL_MONTH,
  D.BEG_OF_MTH_WID,
  P.BRAND
ORDER BY D.BEG_OF_MTH_WID,
  P.BRAND;

The above SQL requires the knowledge of:

  • Aggregation methods
  • Joins
  • Group by
  • Ordering

Even if this is an oversimplification of the analytic view usage you can already spot that some knowledge of the base data structure and SQL language is needed.

Using the analytic views defined above, the query can be written as

SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,
  PRODUCT_HIER.MEMBER_NAME   AS PRODUCT_SLICE,
  F1_REVENUE,
  F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES
WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_MONTH')
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')
ORDER BY TIME_HIER.HIER_ORDER,
  PRODUCT_HIER.HIER_ORDER;

As you can see, there is a simplification of the SQL statement: no more aggregation, joining conditions and group by predicates are needed. All the end-user has to know is the analytical view name, and the related hierarchies that can be used.

The additional benefit is that if we want to change the level of granularity of the above query we just need to change the WHERE condition. E.g. to have the rollup per Year and LOB we just have to substitute

WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_MONTH')
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')

with

WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_YEAR')
AND PRODUCT_HIER.LEVEL_NAME IN ('LOB')

without touching granularity, group by and order by statements.

Using Analytic Views in DVD

At the beginning of my blog post I wrote that Analytic Views could be useful when used in conjunction with self-service BI tools. Let's have a look at how the end user journey is simplified in the case of Oracle's Data Visualization Desktop.

Without AV the end-user had two options to source the data:

  • Write the complex SQL statement with joining condition, group and order by clause in the SQL editor to retrieve data at the correct level with the related dimension
  • Import the fact table and dimensions as separate datasources and join them together in DVD's project.

Both options require a SQL and joining conditions knowledge in order to being able to present correct data. Using Analytic Views the process is simplified. We just need to create a new source pointing to the database where the analytic views are sitting.
Next step is retrieve the necessary columns from the analytic view. Unfortunately analytic views are not visible from DVD object explorer (only standard table and views are shown)

We can however specify with a simple SQL statement all the informations we need like Time and Member Slice, the related levels and the order in hierarchy.

SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,
  PRODUCT_HIER.MEMBER_NAME   AS PRODUCT_SLICE,
  TIME_HIER.LEVEL_NAME AS TIME_LEVEL,
  PRODUCT_HIER.LEVEL_NAME AS PRODUCT_LEVEL,
  TIME_HIER.HIER_ORDER AS TIME_HIER_ORDER,
  PRODUCT_HIER.HIER_ORDER AS PRODUCT_HIER_ORDER,
  F1_REVENUE,
  F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES
ORDER BY TIME_HIER.HIER_ORDER,
  PRODUCT_HIER.HIER_ORDER;

You may have noted that I'm not specifying any WHERE clause for level filtering: as end user I want to be able to retrieve all the necessary levels by just changing a filter in my DVD project. After including the above SQL in the datasource definition and amending the measure/attribute definition I can start playing with the analytic view data.

I can simply include the dimension's MEMBER_NAME in the graphs together with the measures and add the LEVEL_NAME in the filters. In this way I can change the graph granularity by simply selecting the appropriate LEVEL in the filter selector for all the dimensions available.

One particular to notice however is that all the data coming from various columns like date, month and year are "condensed" into a single VARCHAR column. In case of different datatypes (like date in the time dimension) this will prevent a correct usage of some DVD's capabilities like time series or trending functions. However if a particular type of graph is needed for a specific level, either an ad-hoc query or a casting operation can be used.

Conclusion

In this blog post we analysed the Analytic Views, a new component in Oracle Database 12.2 and how those can be used to "move" the metadata modeling at DB level to provide an easier query syntax to end-users.

Usually metadata modeling is done in reporting tools like OBIEE that offers additional set of features on top of the one included in analytic views. However centralized reporting tools like OBIEE are not present everywhere and, with the wave of self-service BI tools, analytic views represent a perfect method of enabling users not familiar with SQL to simply query their enterprise data.

If you are interested in understanding more about analytic views or metadata modeling, don't hesitate to contact us!
If you want to improve the SQL skills of your company workforce, check out our recently launched SQL for beginners training!