Interview with Kevin McGinley, BI Content Lead for Kscope 12

January 29th, 2012 by Stewart Bryson

Recently, I sat down (virtually) with Kevin McGinley of Accenture to discuss the upcoming ODTUG Kscope 12. I was on the content selection committee, and immediately recognized how lucky ODTUG was to have Kevin coordinating this process. We had tough choices to make around content — this is always the case, as I’ve participated in this capacity before in the past. But Kevin always took us in the right direction, and after the process was over, I knew I wanted to have a discussion with him on the blog so our readers could see what awaits them at Kscope 12.

Kevin recently blogged about Kscope 12 on the ODTUG Blog, so perhaps that is a nice introduction to our interview here. I’d like to thank Kevin for taking a little time to do this interview, and I’d also like to thank Accenture for allowing him to appear here.

[Stewart Bryson] This is only your second Kscope, but already you are a winner of the Editor’s Choice award for your whitepaper at Kscope 11, and now, are the BI content lead for Kscope 12. What do you think it is about ODTUG and Kscope that you have connected with?

[Kevin McGinley] I was amazed by three things at Kscope 11.  First, the ODTUG community is a very warm, welcoming community of people who were very easy to engage with, both on a professional and personal level.  Second, I was pleased with the type of content presented at Kscope versus a larger conference like Open World.  The sessions feel very real, the presenters are very approachable, and the level of discussion/interaction is much higher.  Lastly, I was very impressed with the level of organization at Kscope.  The conference flowed very smoothly, there were a lot of interesting activities outside the core sessions, and the entertainment was top-notch.

[Stewart Bryson] For those folks who have never attended Kscope before, how would you describe the event, perhaps drawing comparisons or differences with other conferences?

[Kevin McGinley] As I alluded to above, Kscope is much more communal than a larger conference like Open World.  Open World is a mad dash against 40,000+ strangers to get from place to place.  You are exhausted by the end of the week, and the practical knowledge you take away can be limited.  Kscope is a more manageable pace, the practical knowledge you gain from the sessions is much higher, and there is greater emphasis on interaction and discussion.

[Stewart Bryson] Thinking specifically about the BI Stream, what would you say to Kscope Alumni about the BI Stream this year that might encourage them to give the conference another try?

[Kevin McGinley] I would say two things to this.  First, BI keeps growing at Kscope – we have about 50% more sessions than we did last year!  This is great because you get to offer more variety in the content and you also get to balance the “intro” audiences against the “technical” audiences – satisfying both.  Second, Kscope has a tremendous EPM presence – quite possibly the biggest EPM conference around – and with BI and EPM converging the way they are, this offers attendees a tremendous opportunity to start looking at how to maximize their Oracle investments in these two areas and expand the value they provide to their businesses.

[Stewart Bryson] What can you tell us about the content selection process? Did you have a particular focus or goal in mind when selecting and scheduling the presentations?

[Kevin McGinley] Because OBIEE 11g was introduced before Kscope 11, it had a very strong presence that year due to the sheer magnitude of the release.  It was necessary to insure that the ODTUG community was well informed about OBIEE 11g.  Now that OBIEE 11g has settled in the marketplace, we can explore/return to other areas like the packaged BI Applications, data integration with ODI and Golden Gate, EPM integration, more BI Publisher, and the recently announced Exalytics.  We tried to make sure we still covered relevant areas of OBIEE, but left room to cover more of the Oracle offerings around OBIEE, since it’s rarely used by itself in a vacuum.

[Stewart Bryson] Any particular BI sessions that you are looking forward to?

[Kevin McGinley] I see what you’re trying to do here, Stewart – you’re looking for me to plug your two presentations! In all seriousness, there are a lot of great sessions that I’m excited about.  I also love that we have a great balance between customer speakers, boutique consulting companies, large consulting companies, independents, and Oracle ACEs.  I think that’s important.  To answer your question, though, I’m really excited to hear from customers like JC Penny, Eaton Vance, General Dynamics, and Clark Construction covering topics like OBI/EPM integration, rolling-out mobility to executives, and project testing strategies.

[Stewart Bryson] Being involved with content selection can be very time-consuming. How supportive has Accenture been with your dedication to Kscope?

[Kevin McGinley] Accenture has been great.  I think no matter where you work, you’re often pretty busy, so it helps to have an employer who is supportive with the extra time required to make sure Kscope is a great experience for everyone.  Accenture really recognizes the value of a smaller, more intimate conference like Kscope – we host a similar conference for our Oracle customers – and encourages its employees to engage in the industry community where possible.

[Stewart Bryson] Personally, I think Kscope provides a great opportunity to step outside my usual focus on BI and see some sessions in other streams. Last year I attended sessions on Exadata, PL/SQL development, and APEX. Has anything outside of the BI stream caught your eye?

[Kevin McGinley] The great thing about BI is that it complements other tracks nicely.  You can’t get very far in BI without a data store of some sort, so both the Database track and the Essbase track offer sessions that would be attractive to BI attendees.  Both data stores require optimization for BI to perform, and each track has very practical sessions on how to accomplish this.  I’m excited about that.  Another track I find interesting is the EPM Business Content, a new track this year.  Geared more towards a director or senior manager, this track can really help a BI person understand how EPM can fit into their environment and drive additional value.

As you can see, Business Intelligence is in good hands at Kscope 12. Hopefully, we’ll see you there!

Agile Data Warehousing with Exadata and OBIEE: ETL Iteration

January 27th, 2012 by Stewart Bryson

This is the fourth entry in my series on Agile Data Warehousing with Exadata and OBIEE. To see all the previous posts, check the introductory posting which I have updated with all the entries in the series.

In the last post, I describe what I call the Model-Driven iteration, where we take thin requirements from the end-user in the form of a user story and generate the access and performance layer, or our star schema, logically using the OBIEE semantic model. Our first several iterations will likely be Model-Driven as we work with the end user to fine-tune the content he or she wants to see on the OBIEE dashboards. As user stories are opened, completed and validated throughout the project, end users are prioritizing them for the development team to work on. Eventually, there will come a time when an end user opens a story that is difficult to model in the semantic layer. Processes to correct data quality issues are a good example, and despite having the power of Exadata at our disposal, we may find ourselves in a performance hole that even the Database Machine can’t dig us out of. In these situations, we reflect on our overall solution and consider the maxim of Agile methodology: “refactoring”, or “rework”.

For Extreme BI, the main form of refactoring is ETL. The pessimist might say: “Well, now we have to do ETL development, what a waste of time all that RPD modeling was.” But is that the case? First off… think about our users. They have been running dashboards for some time now with at least a portion of the content they need to get their jobs done. As the die-hard Agile proponent will tell you… some is better than none. But also… the process of doing the Model-Driven iteration puts our data modelers and our ETL developers in a favorable position. We’ve eliminated the exhaustive data modeling process, because we already have our logical model in the Business Model and Mapping layer (BMM).

But we have more than that. We also have our source-to-target information documented in the semantic metadata layer. We can see that information using the Admin Tool, as depicted below, or we can also use the “Repository Documentation” option to generate some documented source-to-target mappings.

When embarking on ETL development, it’s common to do SQL prototyping before starting the actual mappings to make sure we understand the particulars of granularity. However, we already have these SQL prototypes in the nqquery.log file… all we have to do is look at it. The combination of the source-to-target-mapping and the SQL prototypes provide all the artifacts necessary to get started with the ETL.

When using ETL processing to “instantiate” our logical model into the physical world, we can’t abandon our Agile imperatives: we must still deliver the new content, and corresponding rework, within a single iteration. So whether the end user is opening the user story because the data quality is abysmal, or because the performance is just not good enough, we must vow to deliver the ETL Iteration time-boxed, in exactly the same manner that we delivered the Model-Driven Iteration. So, if we imagine that our user opens a story about data quality in our Customer and Product dimensions, and we decide that all we have time for in this iteration are those two dimension tables, does it make sense for us to deliver those items in a vacuum? With the image below depicting the process flow for an entire subject area, can we deliver it piecemeal instead of all at once?

The answer, of course, is that we can. We’ll develop the model and ETL exactly as we would if our goal was to plug the dimensions into a complete subject area. We use surrogate keys as the primary key for each dimension table, facilitating joining our dimension tables to completed fact tables. But we don’t have completed fact tables at this point in our project… instead we have a series of transaction tables that work together to form the basis of a logical fact table. How can we use a dimension table with a surrogate key to join to our transactional “fact” table that doesn’t yet have these surrogate keys?

We fake it. Along with surrogate keys, the long-standing best practice of dimension table delivery has been to include the source system natural key, as well as effective dates, in all our dimension tables. These attributes are usually included to facilitate slowly-changing dimension (SCD) processing, but we’ll exploit them for our Agile piecemeal approach as well. So in our example below, we have a properly formed Customer dimension that we want to join to our logical fact table, as depicted below:

We start by creating aliases to our transactional “fact” tables (called POS_TRANS_HYBRID and POS_TRANS_HEADER_HYBRID in the example above), because we don’t want to upset the logical table source (LTS) that we are already using for the pure transactional version of the logical fact table. We create a complex join between the customer source system natural key and transaction date in our hybrid alias, and the natural key and effective dates in the dimension table. We use the effective dates as well to make sure we grab the correct version of the customer entity in question in situations where we have enabled Type 2 SCD’s (the usual standard) in our dimension table.

This complex logic of using the natural key and effective dates is identical to the logic we would use in what Ralph Kimball calls the “surrogate pipeline”: the ETL processing used to replace natural keys with surrogate keys when loading a proper fact table. Using Customer and Sales attributes in an analysis, we can see the actual SQL that’s generated:

We can view this hybrid approach as an intermediate step, but there is also nothing wrong with this as a long-term approach if the users are happy and Exadata makes our queries scream. If you think about it… a surrogate key is an easy was of representing the natural key of the table, which is the source system natural key plus the unique effective dates for the entity. A surrogate key makes this relationship much easier to envision, and certainly code using SQL, but when we are insulated from the ugliness of the join with Extreme Metadata, do we really care? If our end users ever open a story asking for rework of the fact table, we may consider manifesting that table physically as well. Once complete, we would need to create another LTS for the Customer dimension (using an alias to keep it separate from the table that joins to the transactional tables). This alias would be configured to join directly to the new Sales fact table across the surrogate key… exactly how we would expect a traditional data warehouse to be modeled in the BMM. The physical model will look nearly identical to our logical model, and the generated SQL will be less interesting:

Now that I’ve described the Model-Driven and ETL Iterations, it’s time to discuss what I call the Combined Iteration, which is likely what most of the iterations will look like when the project has achieved some maturity. In Combined Iterations, we work on adding new or refactored RPD content alongside new or refactored ETL content in the same iteration. Now the project really makes sense to the end user. We allow the user community–those who are actually consuming the content–to dictate to the developers with user stories what they want the developers to work on in the next iteration. The users will constantly open new stories, some asking for new content, and others requesting modifications to existing content. All Agile methodologies put the burden of prioritizing user stories squarely on the shoulders of the user community. Why should IT dictate to the user community where priorities lie? If we have delivered fabulous content sourced with the Model-Driven paradigm, and Exadata provides the performance necessary to make this “real” content, then there is no reason for the implementors to dictate to the users the need to manifest that model physically with ETL when they haven’t asked for it. If whole portions of our data warehouse are never implemented physically with ETL… do we care? The users are happy with what they have, and they think performance is fine… do we still force a “best practice” of a physical star schema on users who clearly don’t want it?

So that’s it for the Extreme BI methodology. At the onset of this series… I thought it would require five blog posts to make the case, but I was able to do it in four instead. So even when delivering blog posts, I can’t help but rework as I go along. Long live Agile!

Hyperion Profitability & Cost Management – Overview

January 26th, 2012 by Venkatakrishnan J

Recently i have been doing lot more work on the Oracle EPM stack than on the Oracle BI stack. So, i will be writing more on the various Oracle EPM products like HFM, Planning, FDQM etc in the forthcoming weeks. To sort of kick start the series of postings, i thought i will begin an article on Hyperion Profitability & Cost Management also popularly known as HPCM. It is one of those products that is often overlooked, due to the overlap of features it has with other products like Essbase & Planning. It is sort of a targeted product with a solid technical foundation and uses Hyperion Essbase as its backend. On the outset, HPCM primarily provides Functional Users with the ability to automatically allocate Costs & Revenue to various departments, accounts thereby giving the ability to do proper & complete profitability reporting.

HPCM primarily has 3 main sweet spots

1. Every company will have indirect costs. For example, a Consulting company where Revenue is obtained through driving projects will have a lot of indirect costs like HR Costs, Admin Costs etc. HPCM provides an ability to allocate the costs back to the projects so that proper project profitability is derived. How the costs are allocated will be defined through HPCM itself. For example, if a company is running say 3 consulting projects with 20, 30 & 40 resources each, then the indirect costs are allocated back to the project based on the number of people(or time logged etc) in each project.

2. Every company will be storing the incoming Revenue & Costs in a Ledger. Due to various reasons, even the direct costs & revenue might not actually be tied back to a project (Consulting company example above). So, there might be a need to allocate the project based Direct Costs & Revenue as well to the Projects (allocation possibly by head count etc).

3. Allocation of Costs is pretty dynamic in nature depending on the type of business. It can vary quite frequently. So, the key is to ensure that the allocation logic can be changed frequently and easily. In addition, one more key point is to find the lineage back to the source on how the costs are obtained.

HPCM provides all the 3 above. If you are an Essbase or a Planning person, you could argue that we can do the same thing using these 2 products itself. Though true, in many cases Cost & Revenue allocation rules are defined by Functional Users. So, it is not possible for Functional users to create Business Rules/Calculation Scripts every time there is a change. In addition though Essbase is very good, it is very difficult to do a data lineage from a calculation script, to find out how the costs are allocated. Thats the main reason why, HPCM is a solution positioned primarily at Business/Functional users for providing that cost & Revenue Breakdown.

Though i have mentioned that HPCM is a functional tool, its underlying technology is very interesting. It has a relational metadata that stores the metadata related to HPCM. In addition each HPCM application will have 2 Essbase databases. One is Block Storage cube which will be used for the allocation & calculations. The other is a reporting Aggregate Storage cube which will be used for reporting. Data push from BSO and ASO is automatically available out of the box from HPCM. Also, one important point to note, change to dimensions, change to metadata, pushing data from BSO to ASO are all achieved within HPCM without writing any external code/scripts. Everything is done out of the box. This architecture is shown below.

In addition, the most interesting aspect of HPCM is the way it handles dimensions. This is what we will be covering in this article today. HPCM uses EPMA for managing its dimensions & attributes. HPCM as an application has 3 types of dimensions

1. System Dimensions – There are 2 System Dimensions – Measures & AllocationType. Generally while creating a HPCM application through the Wizard we can pre-create these 2 dimensions. AllocationType is used by HPCM internally for doing allocations. It is generally not needed to make any changes to this dimension. But Measures dimension is the most important dimension that HPCM uses for pushing costs & allocating them. We can create custom members in the Measure dimension if needed.

2. POV Dimensions – HPCM supports upto 4 point of view dimensions. These dimensions are generally for storing Year, Period, Scenario & Version. True to their names, they generally are used as POVs and are not used directly in any allocation (the POVs are always fixed in the calculation scripts).

3. Business Dimensions – Business Dimensions are those dimensions where allocations happen. These dimensions drive the allocation logic.

In addition HPCM also supports alias and attribute dimensions. For this article, i will use a simple case of demonstrating how to go about allocating HR Costs in a Consulting Company recognising its revenue through Projects. Lets make an assumption that on a monthly basis we record the HR Costs (including Salary paid to HR, other misc costs) etc. Lets also assume that we have 3 projects running in the company with the following break-up

a. Project A – 300 people full time
b. Project B – 500 people full time
c. Proejct C – 200 people full time

We will start off with creating the application through the Application Wizard (pre-create System Dimensions) and then we shall define the necessary dimensions.

We basically have 2 business dimensions – one for Accounts which will hold the HR Costs. Then we have the Project dimension which will record the revenue and costs specific to the project.

 

Lets deploy this application and then login to the application through Workspace.

In HPCM all allocation happens through stages. Stages is where allocation happen. HPCM supports uppto 9 stages. Each stage also supports intra stage allocation. Lets try to understand what this means from a Multi-Dimensional Essbase database Standpoint. In our example, we will have 2 stages. The first stage will have just the Accounts dimension – basically HRCosts in Accounts dimension will flow from Stage 1 to Stage 2 and will get allocated in Stage 2. So, Stage 2 will contain both the Accounts and Project dimensions.

After creating these 2 stages(ensure you also have a POV defined) lets go ahead and deploy this to Essbase from the Manage Database screen(both Calculation & Reporting Database). What you will notice is 3 things

1. There will be 2 essbase databases one suffixed with letter C and the other suffixed with letter R. C database is the Block Storage database that is used for allocation. R database is the Aggregated Storage database that will be used for reporting.

2. You will notice that for each stage there will be a corresponding set of dimensions prefixed by the Stage prefix given at the time of creation. So effectively, if there are 2 stages with 2 dimensions each, then Essbase will have 4 dimensions (though the 2 dimensions might be the same in EPMA).

3. You will also notice that each dimension will have a dummy member called NoMember. This is one of the most important members that controls the grain of the data. This member is the key in loading multi-grain data for allocation into HPCM.


Now that we have the Essbase cubes deployed as well, lets try to understand how the allocation logic works. To begin with lets assume for the month of Jan 2011, the HR Costs for the Consulting Company is say 1000 USD as shown below

This is the input data into Stage 1. So to load this in we will have be creating a text file and load it directly into Essbase. There are 3 options to load data into HPCM

1. Manual Data Entry – HPCM provides a screen where we can update data manually.

2. Staging Tables – We can load the data temporarily into a set of staging tables, and then from within HPCM we can push the data from Staging tables into Essbase.

3. Directly loading data into Essbase

In our case, we will load the data directly into Essbase as that will give more clarity on how HPCM works. For doing data load for Stage 1, remember we have a total of 9 dimensions in Essbase (Measures,AllocationType,Year, Scenario, Period, Version, ACAccount,PRAccount,PRProject). But our input data of HRCosts comes at a grain of only 7 dimensions (Measures,AllocationType,Year, Scenario, Period, Version, ACAccount). So, load this in we will have use the NoMember intersection of the remaining 2 dimensions (PRAccount & PRProject). The input data file to Essbase is shown below

CostInput,DirectAllocation,2011,Jan,Actual,Working,HRCosts,[PRAccounts].[NoMember],[PRProgram].[NoMember],1000
Our idea is to allocate the 1000 USD down to the 3 projects for the January Month. So, our end result should look like this

If you notice, the 1000 USD is split across the 3 projects based on the overall number of resources(Resource Count for each project/Total Resources*HRCost) in each project. So basically for the allocation to happen, we need to load the Resource Count data. Resource Count data for all projects and the individual projects have to be loaded as shown below

FixedDriverValue,DirectAllocation,2011,Jan,Working,Actual,HRCosts,HRCosts,Project A,300
FixedDriverValue,DirectAllocation,2011,Jan,Working,Actual,HRCosts,HRCosts,Project B,500
FixedDriverValue,DirectAllocation,2011,Jan,Working,Actual,HRCosts,HRCosts,Project C,200
Weight,DirectAllocation,2011,Jan,Working,Actual,HRCosts,[PRAccounts].[NoMember],[PRProgram].[NoMember],1000

In the above data file there are 2 things we can notice

1. We have used a measure called FixedDriverValue and Weight. We will see their significance a little bit later. For now think of them as intersections that will hold the Resource Count Data.

2. The first 3 records above have HRCosts repeated twice to load into both the Accounts dimension we have in the Essbase Cube. Again we will see the significance of why we are doing this below.

So far we have loaded all the necessary data into the Essbase Cubes and have also setup the stages. The next step is in defining the Allocation Logic. This is done through a concept called Drivers. From the Perspective of HPCM, drivers define how the allocation values get pushed from source to target. HPCM supports different allocations like Activity Based Costing etc through the concept of Drivers. In our case, the Driver for allocation is the Resource Count. Just to recap, within Essbase now we have dimensions catering to two stages – Stage 1 and Stage 2. This is illustrated below. So basically we have 3 separate sub-cubes each having its own intersection.

In the above diagram, whenever we want to do allocation there can be different ways of doing it. The 3 most common ways are

1. Allocation Based on Source – Here all the driver values are obtained from the source stage and data from source stage is then assigned to the Destination Stage based on the Source driver values.

2. Allocation Based on Destination – Here all the driver values are obtained from Destination stage and data from source stage is then assigned to the Destination Stage based on the Destination driver values.

3. Allocation Based on Source & Destination – Also known as Assignments – Here all the driver values are obtained from the intersection of Source & Destination and data from source stage is then assigned to the Destination Stage based on the Assignment driver values.

If you look at our Driver data above (3 records containing FixedDriverValue & 1 containing Weight), you can see that our driver (Resource Count) is loaded in 2 ways. First the FixedDriverValue is loaded at the intersection of Source & Destination stages. The second driver Weight is loaded at the Source Stage intersection. We have chosen these 2 measures (FixedDriverValue & Weight). We could have chosen any other measure like Rate, Volume etc. But each measure has a logical meaning and it is better to stick to the ones we think is logically close to what we are trying to do. If we are not able to use the existing measures and if they don’t relate to our driver names then we can create custom measures. So, this allocation handling through drivers is defined through the Driver Definition screen in HPCM.

As you see in our formula we are basically doing a division of each Resource Count from Stage 2 by the Total Resource Count in Stage 1. HPCM will automatically multiply the resulting values of the driver to the CostInput measure (input data) thereby allocating.

Once we have defined the driver the next step is to assign the driver to both the stages. HPCM allows us to have multiple drivers for each stage. And even within a single dimension in a stage we can define exceptions so that multiple drivers can be assigned within the same dimension. This way we can have allocation logic based not only on Resource Count but also on say Clocked Time of a resource in certain cases. Driver assignment is done through the Driver Selection screen in HPCM.

After the assignments, one other important feature of HPCM is its ability to assign allocations on a cell by cell basis. So, what we can do is, for each cell in the Source Stage we can assign the Destination cell(s) in the Destination Stage. That way we can clearly find out what intersections will get affected by what allocation logic. If you had to do them through say Essbase Calculation Scripts alone, it would have been such a laborious task not only while doing the allocations but also in maintaining them. In our example, lets assign the Data intersection of HRCosts in Stage 1 to the 3 Destination Projects in Stage 2 as shown below

Now lets run the Calculation & then immediately transfer the data to the ASO cube from the Manage Calculation screen.

If you now look at the data you can clearly see that each project now will have the corresponding costs allocated.

To validate how the costs have flown through the stages, HPCM provides an option to do a stage balancing report. This will show us how the costs have moved from stage to stage. In addition one biggest advantage of HPCM is, if there are any unassigned costs in the source, those will automatically move into Idle Costs (or we can configure it throw an error to ensure all costs are always allocated).

In terms of reporting, the ASO cube that HPCM provides will be more than sufficient. But based on what i have seen, we will have a lot of un-necessary dimensions especially when we are not concerned about the inter-stage essbase dimensions. In such cases, we can build our own ASO cube and extract only the necessary intersections that we need. But for now, i will showcase how BI EE 11g can be used for reporting again the native HPCM ASO cube. I will quickly show the lineage breakup reports as well

We start off with importing the ASO cube into BI EE which is pretty straight-forward as shown below.

Then lets start with building the report just for looking at our Stage 1 data. Remember Stage 1 input HRCosts data is loaded against NoMember intersections of the other dimensions. So we will have to explicitly filter to arrive at the right data.

Similarly to get the final allocated data we will have to apply filters as shown below

In the same way we can get the driver values directly from the cube using a simple BI EE report. All of these are shown in the form of a dashboard – showing how the costs from stage to stage using the drivers.

Configuring OBI 11g to use port 80

January 23rd, 2012 by Robin Moffatt

Introduction

By default, OBI 11g will listen for connections on port 7001 or port 9704 depending on your installation type. There are several reasons you may want to change this, and one of them would be to make it use port 80, the default http port. If you use port 80, then you don’t have to specify it in the URL, so you can use http://biserver/analytics/ as your URL instead of http://biserver:9704/analytics/.

The port configuration is a simple change to be made to the Server configuration in Web Logic Console. However, if you want to use port 80 and are running Linux/Unix there is an additional consideration. On Linux/Unix (hereafter, I’ll just refer to Linux), any port < 1024 is considered privileged and you must run the program binding to it as root.

At this point, if you don't have root access on your box already and are beholden to a sysadmin team, I would suggest validating your reasons for wanting to run on port 80, since you are going to have to go begging for some kind of privileged access. If you're up for that, then read on. If not, then why not stick with 9704, it's a nice number anyway....

Overview of the change to listen on port 80

The configuration changes are actually very simple. They are just within Web Logic Server, we don’t touch the BI components at all.

We change the Listen Port property of the Server to port 80.

Since we are using port 80, we will have to start the server process as root. But, we can get Web Logic to revert back to the OBI user&group once the port has been bound to (since running a process as root is generally a Bad Idea)

To do this, we configure Post-Bind UID in the Machine properties.

WLS – Server configuration

Depending on whether you have a Enterprise or Simple OBI installation, the configuration will differ slightly. See Mark’s post here for explanation of the difference between the two installation types. In an Enterprise installation, you need to modify the properties of the bi_server1 Managed Server, whereas a Simple installation has all the bits within AdminServer.

For both however, you need to login to Web Logic Admin Console, and locate the Servers screen. Click on either AdminServer (Simple installation) or bi_server1 (Enterprise installation) to view the Server settings.

Click on Lock & Edit

Locate Listen Port on the Server configuration (Configuration -> General, if not already displayed). Change this to 80.

Click on Save.

WLS – Machine configuration

Still logged into Web Logic Admin Console, with the above Listen Port change pending, click on the Machines link under Environment in the Domain Structure menu. Click on your server name.

Now, tick the two items Enable Post-Bind UID and Enable Post-Bind GID.

To determine your userid and group, login with your normal OBI user, and type id

$ id
uid=300(oracle) gid=301(oinstall)

Enter these values in Post-Bind UID and Post-Bind GID

Click on Save.

Activate the changes

In Web Logic Admin Console, click on Activate Changes.

When this has completed successfully, shutdown the stack and then see below for how to start it back up.

Starting up the stack to listen on port 80

You don’t need to run all of the components with root privilege, only those that are binding to port 80. In addition, in an Enterprise installation you have the Node Manager component which is responsible for invoking Managed Servers, and since we need the Managed Server bi_server1 to be run with root privileges, Node Manager must be too.

‘Simple’ installation

  1. Start WebLogic, as root
  2. Start OPMN and BI Components (not as root)

‘Enterprise’ installation

  1. Start NodeManager, as root
    • If you don’t start this as root, you won’t be able to start up the Managed Server bi_server1 from within Web Logic Admin Console
  2. Start WebLogic (AdminServer) (not as root)
  3. Start WebLogic Managed Server (bi_server1), as root
  4. Start OPMN and BI Components (not as root)

Validate the changes

You should now be able to access OBI at port 80, which as the default HTTP port doesn’t need specifying in your URL. So if previously it was http://localhost:9704/analytics/ you should now be able to see it at http://localhost/analytics

If you look in the Web Logic server log (AdminServer.log / bi_server1.log respectively), you should see this kind of entry:

<Notice> <Server> <BEA-002613> <Channel "Default[1]" is now listening on 192.168.69.0:80 for protocols iiop, t3, ldap, snmp, http.>
<Notice> <Server> <BEA-002613> <Channel "Default[2]" is now listening on 127.0.0.1:80 for protocols iiop, t3, ldap, snmp, http.>

And netstat should show something like:

$ netstat -a|grep LISTEN|grep 80
[…]
biserver01.80             *.*                0      0 49152      0 LISTEN
localhost.80               *.*                0      0 49152      0 LISTEN
[…]

or you might see the port named instead of numbered, thus:

$ netstat -a|grep LISTEN|grep http
tcp        0      0 rm-oel02.localdomain:http   *:*                         LISTEN
tcp        0      0 localhost.localdomain:http  *:*                         LISTEN

Possible errors

If you don’t start the process as root, then it will fail to bind to port 80, and you’ll see this error in the log:

 <Error> <Server> <BEA-002606> <Unable to create a server socket for listening on channel "Default[1]". The address 127.0.0.1 might be incorrect or another process is using port 80: java.net.BindException: Permission denied.>

sudo

What is sudo? Well, any chance to quote this xkcd cartoon is a good one:

xkcd - sudo make me a sandwich

What sudo does is let you run a command under root (superuser) privileges, without actually being root. This is very useful in two ways: (1) your sysadmins don’t have kittens when you ask for root access, because they can give it in a very granular way which is logged every time it is used (2) you’re not let loose on a server with unfettered root access where a simple slip of the keyboard can serious knacker things up (hence your sysadmin’s kittens).

sudo can be granted globally for a user, that is, any command the user wants to run as root can be done so. Alternatively, sudo can be allowed for one or many pre-specified commands. This means that in this case your sysadmins can give you the right to run the required WLS processes as root, and absolutely nothing else.

This is an example of such an entry in the sudo configuration file, /etc/sudoers :

Cmnd_Alias BISERVER = /app/fmw11115/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.sh bi_server1, /app/fmw11115/wlserver_10.3/server/bin/startNodeManager.sh
# This defines a command alias, BISERVER, with the web logic startup script and arguments to call, and the command to start Node Manager
oracle     localhost=NOPASSWD BISERVER
# user oracle, on machine localhost, can run BISERVER command alias, without being re-prompted for their password

Whenever you invoke sudo, it’ll be logged in /var/log/secure, eg:

Jan  3 17:07:19 rm-oel02 sudo:      rnm : TTY=pts/2 ; PWD=/home/rnm ; USER=root ; COMMAND=/app/fmw11115/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.sh bi_server1

Alternative to running root/sudo

An interesting alternative is to leave OBI listening on its default port, and configure the server to forward requests on port 80 to it, for example using iptables. I found this suggestion here, amongst other places. This way you’d only need a one-time privileged change made on the server, instead of needing privileged rights continually (however limited they may be)

I’ve not had chance to try it, and in keeping to the KISS principle would probably only consider it if all other options were exhausted. Thinking of someone coming to the server after me, would their first thought around OBI port configuration really be to look at iptables? Whereas Listen Port is a standard configuration item within WLS.

Agile Data Warehousing with Exadata and OBIEE: Model-Driven Iteration

January 16th, 2012 by Stewart Bryson

After laying the groundwork with an introduction, and following up with a high-level description of the required puzzle pieces, it’s time to get down to business and describe how Extreme BI works. At Rittman Mead, we have several projects delivering with this methodology right now, and more in the pipeline.

I’ll gradually introduce the different types of generic iterations that we engage in, focusing on what I call the “model-driven” iteration for this post. Our first few iterations are always model-driven. We begin when a user opens a user story requesting new content. For any request for new content, we require that all the following elements are including in the story:

  1. A narrative about the data they are looking for, and how they want to see it. We are not looking for requirements documents here, but we are looking for the user to give a complete picture of what it is that they need.
  2. An indication of how they report on this content today. In a new data warehouse environment, this would include some sort of report that they are currently running against the source system, and in a perfect world, this would involve the SQL that is used to pull that report.
  3. An indication of data sets that are “nice to haves”. This might include data that isn’t available to them in the current paradigm of the report, or was simply too complicated to pull in that paradigm. After an initial inspection of these nice-to-haves and the complexity involved with including them in this story, the project manager may decide to pull these elements out and put them a separate user story. This, of course, depends on the Agile methodology used, and the individual implementation of that methodology.

First we assign the story to an RPD developer, who uses the modeling capabilities in the OBIEE Admin Tool to “discover” the logical dimensional model tucked inside the user story, and develop that logical model inside the Business Model and Mapping (BMM) layer. Unlike a “pure” dimensional modeling exercise where we focus only on user requirements and pay very little attention to source systems, in model-driven development, we constantly shift between the source of the data, and how best the user story can be solved dimensionally. Instead of working directly against the source system though, we are working against the foundation layer in the Oracle Next-Generation Reference Data Warehouse Architecture. We work from a top-down approach, first creating empty facts and dimensions in the BMM, and mapping them to the foundation layer tables in the physical layer.

To take a simple example, we can see how a series of foundation layer tables developed in 3NF could be mapped to a logical dimension table as our Customer dimension:

Model-Driven Development of Dimension Table

I rearranged the layout from the Admin Tool to provide an “ETL-friendly” view of the mapping. All the way to the right, we can see the logical, dimensional version of our Customer table, and how it maps back to the source tables. This mapping could be quite complicated, with perhaps dozens of tables. The important thing to keep in mind is that this complexity is hidden from not only the consumer of the reports, but also from the developers. We can generate a similar example of what our Sales fact table would look like:

Another way of making the same point is to look at the complex, transaction model:

We can then compare this to the simplified, dimensional model:

And finally, when we view the subject area during development of an analyses, all we see are facts and dimensions. The front-end developer can be blissfully ignorant that he or she is developing against a complex transactional schema, because all that is visible is the abstracted logical model:

When mapping the BMM to complex 3NF schemas, the BI Server is very, very smart, and understands how to do more with less. Using the metadata capabilities of OBIEE is superior to other metadata products, or superior to a “roll-you-own metadata” approach using database views, because of the following:

  1. The generated SQL usually won’t involve self-joins, even when tables exists in both the logical fact table, and the logical dimension table.
  2. The BI Server will only include tables that are required to facilitate the intelligent request, either because it has columns mapped to the attributes being requested, or because the table is a required reference table to bring disparate tables together. Any tables not required to facilitate the request will be excluded.

Since the entire user story needs to be closed in a single iteration, the user who opened the story needs to be able to see the actual content. This means that the development of the analyses (or report) and the dashboard are also required to complete the story. It’s important to get something in front of the end user immediately, but it doesn’t have to be perfect. We should focus on a clear, concise analyses in the first iteration, so it’s easy for the end user to verify that the data is correct. In future iterations, we can deliver high-impact, eye-catching dashboards. Equally important to closing the story is being able to prove that it’s complete. In Agile methodologies, this is usually referred to as the “Validation Step” or “Showcase”. Since we have already produced the content, then it’s easy to prove to the user that the story is complete. But suppose that we believed we couldn’t deliver new content in a single iteration. That would imply that we would have an iteration during our project that didn’t include actual end-user content. How would you go about validating or showcasing that content? How would we go about showcasing a completed ETL mapping, for instance, if we haven’t delivered any content to consume it?

What we have at the end of the iteration is a completely abstracted view of our model: a complex, transactional, 3NF schema presented as a star schema. We are able to deliver portions of a subject area, which is important for time-boxed iterations. The Extreme Metadata of OBIEE 11g allows us to remove this complexity in a single iteration, but it’s the performance of the Exadata Database Machine that allows us to build real analyses and dashboards and present it to the general user community.

In the next post, we’ll examine the ETL Iteration, and explore how we can gradually manifest our logical business model into a physical model over time. As you will see, the ETL iteration is an optional one… it will be absolutely necessary in some environments, and completely superflous in others.

Website Design & Build: tymedia.co.uk