Regular Expressions in OBIEE

When reporting against an OLTP system, in many ways, OBIEE works like an ETL tool, transforming the source system data and presenting it as a star schema. After years of writing ETL code, if there's one thing I hate to be without, it's regular expressions. So, when working on a project to map an OLTP source system to a logical model in OBIEE, I came across the following issue, and knew immediately that I would need regular expressions to solve it.

The CONTACT table in the source system had two columns storing the name: FIRST_NAME and LAST_NAME. There were two different processes that wrote entries in the CONTACT table... and one of them was faulty, writing the entire name in the FIRST_NAME column, though it was never corrected. So the following is a decent representation of what the data looked like:

SQL> create table CONTACT (contact_id NUMBER, first_name varchar2(50), last_name varchar2(50));

Table created.

Elapsed: 00:00:00.02
SQL> insert into CONTACT values (1, 'Bryson, Stewart W.', NULL);

1 row created.

Elapsed: 00:00:00.02
SQL> insert into CONTACT values (2, 'Mead, Jon', NULL);

1 row created.

Elapsed: 00:00:00.00
SQL> insert into CONTACT values (3, 'Mark', 'Rittman');

1 row created.

Elapsed: 00:00:00.00
SQL> select * from CONTACT;

CONTACT_ID | FIRST_NAME           | LAST_NAME
---------- | -------------------- | --------------------
         1 | Bryson, Stewart W.   |
         2 | Mead, Jon            |
         3 | Mark                 | Rittman

3 rows selected.

Elapsed: 00:00:00.00
SQL>

I needed to map the BMM such that I could return FIRST_NAME and LAST_NAME regardless of whether the entire name was concatenated into the FIRST_NAME, or whether it was correctly distributed across both columns. Additionally, the fact that the middle initial needed to be included with FIRST_NAME also proved a little troubling. At the end of the day, this is what I came up with in SQL:

SQL> SELECT CASE
  2           WHEN last_name IS null THEN trim(regexp_substr(first_name,'[^,]+$'))
  3           ELSE first_name
  4         END first_name,
  5         CASE
  6           WHEN last_name IS null THEN regexp_substr(first_name,'^([^,]+)')
  7           ELSE last_name
  8         END last_name
  9    FROM contact
 10  /

FIRST_NAME           | LAST_NAME
-------------------- | --------------------
Stewart W.           | Bryson
Jon                  | Mead
Mark                 | Rittman

3 rows selected.

Elapsed: 00:00:00.01
SQL>

To explain a bit, I'll start with how I extracted the first name information from the FIRST_NAME column. I needed to start at the comma and then get the entire string until the end of the column. So I used the [^] structure in regular expressions, which basically says, return anything EXCEPT the character between the brackets and after the carrot (^). The plus (+) instructs the RegEx engine to return one or more instances of the previous structure. And at the end, the dollar sign ($) dictates that the entire string must run to the end of the column value. So taken all together, [^,]+$ instructs the RegEx engine to:

"Start at the first character that is not a comma, and return all non-comma characters all the way to the end of the column value."

The only kludge introduced here is that the first non-comma character was actually a space, and to remove it, I simply used a TRIM. If some one has a way to do this without a TRIM, then I'd be glad to hear it.

To extract the last name information from the FIRST_NAME column, I used a similar mechanism, except that, instead of using the dollar sign ($) at the end, I put the carrot (^) at the beginning. It's the same concept: it means that the expression returned has to begin at the start of the column value. So, the ^([^,]+ instructs the RegEx engine to:

"Start at the beginning of the column value, and return the whole string until a comma is encountered."

Easy enough.

Now I want OBIEE to accept this SQL in the BMM. The only issue here is that OBIEE does not support regular expressions in it's SQL language, so I have to use the EVALAUTE command to pass Oracle's regular expression syntax back through to the database. So I'll demonstrate how to do this, but first I'll need to create a fact table to join to the CONTACT table in OBIEE.

SQL> CREATE TABLE activity (contact_id number, activity_date date, num_calls NUMBER);

Table created.

Elapsed: 00:00:00.04
SQL> INSERT INTO activity VALUES (1, SYSDATE-2, 10);

1 row created.

Elapsed: 00:00:00.00
SQL> INSERT INTO activity VALUES (2, SYSDATE-1, 20);

1 row created.

Elapsed: 00:00:00.00
SQL> INSERT INTO activity VALUES (3, sysdate, 30);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> SELECT * FROM activity;

CONTACT_ID | ACTIVITY_DATE          |  NUM_CALLS
---------- | ---------------------- | ----------
         1 | 12/16/2009 10:21:34 AM |         10
         2 | 12/17/2009 10:21:34 AM |         20
         3 | 12/18/2009 10:21:34 AM |         30

3 rows selected.

Elapsed: 00:00:00.01
SQL>

To demonstrate the whole process in OBIEE, I first built the BMM to bring the data in how it is from the database:

non-regexp data.png

To generate the correct data, using the regular expressions developed above, here is how I mapped the First Name attribute:

CASE
WHEN "bidw".""."STEWART"."CONTACT"."LAST_NAME" IS NULL
  THEN  Trim(BOTH ' ' FROM Evaluate('regexp_substr(%1,''[^,]+$'')', "bidw".""."STEWART"."CONTACT"."FIRST_NAME" ))
ELSE
  "bidw".""."STEWART"."CONTACT"."FIRST_NAME"
END
first_name column mapping.png

And here is how I mapped the Last Name attribute:

CASE
WHEN "bidw".""."STEWART"."CONTACT"."LAST_NAME" IS NULL
  THEN Evaluate('regexp_substr(%1,''^[^,]+'')', "bidw".""."STEWART"."CONTACT"."FIRST_NAME" )
ELSE
  "bidw".""."STEWART"."CONTACT"."LAST_NAME"
END
last_name column mapping.png

And finally... the results:

regexp data.png