Visualising OBIEE DMS metrics with Graphite

March 30th, 2014

Assuming you have set up obi-metrics-agent and collectl as described in my previous post, you have a wealth of data at your disposal for graphing and exploring in Graphite, including:

  • OS (CPU, disk, network, memory)
  • OBIEE’s metrics
  • Metrics about DMS itself
  • Carbon (Graphite’s data collector agent) metrics

In this post I’ll show you some of the techniques we can use to put together a simple dashboard.

Building graphs

First off, let’s see how Graphite actually builds graphs. When you select a data series from the Metrics pane it is added to the Graphite composer, which can hold multiple metrics at once. They’re listed in a legend, and if you click on Graph Data you can see the full list.

Data held in Graphite (or technically, held in whisper) can be manipulated and pre-processed in many ways before Graphite renders it. This can be mathematical transforms of the data (e.g. Moving Average), but also how the data and its label are shown. Here I’ll take the example of several of the CPU metrics (via collectl) to see how we can manipulate them.

To start with, I’ve just added idle, wait and user from the cputotals folder, giving me a nice graph thus:

We can do some obvious things like add in a title, from the Graph Options menu

Graphite functions

Looking at the legend there’s a lot of repeated text (the full qualification of the metric name) which makes the graph more cluttered and less easy to read. We can use a Graphite function to fix this. Click on Graph Data, and use ctrl-click to select all three metrics:

Now click on Apply Function -> Set Legend Name By Metric. The aliasByMetric function is wrapped around the metrics, and the legend on the graph now shows just the metric names which is much smarter:

You can read more about Graphite functions here.
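For reference, applying a function just wraps the metric expression in the graph’s target line. The metric paths below are assumptions based on the collectl hostname prefix used in this series (obieesample) and may differ on your server:

```
# Before -- the fully-qualified name clutters the legend:
obieesample.cputotals.idle

# After Apply Function -> Set Legend Name By Metric:
aliasByMetric(obieesample.cputotals.idle)
```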

Another useful technique is being able to graph out metrics using a wildcard. Consider the ProcessInfo group of metrics that DMS provides about some of the OBIEE processes:

Let’s say we want a graph that shows cpuTime for each of the processes (not all are available). We could add each metric individually:

But that’s time consuming, and assumes there are only two processes. What if DMS gives us data for other processes? Instead we can use a wildcard in place of the process name:

You can do this by selecting a metric and then amending it in the Graph Data view, or from the Graph Data view itself click on Add and use the auto-complete to manually enter it.

But now the legend is pretty unintelligible, and this time using the aliasByMetric function won’t help because the metric name is constant (cpuTime). Instead, use the Set Legend Name By Node function. In this example we want the third node (the name of the process). Combined with a graph title this gives us:

This aliasByNode method works well for Connection Pool data too. However it can be sensitive to certain characters (including brackets) in the metric name, throwing an IndexError: list index out of range error. The latest version of obi-metrics-agent works around this by modifying the metric names before sending them to carbon.
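As a sketch, the wildcarded target and its aliasByNode wrapper look like this. The metric path here is an assumption for illustration; note that Graphite counts nodes from zero, so the index you pass is the zero-based position of the process name in your actual path:

```
# One series per process, without naming each explicitly:
OBI.ProcessInfo.*.cpuTime

# Label each series by the process-name node (index 2 in this assumed path):
aliasByNode(OBI.ProcessInfo.*.cpuTime, 2)
```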

The above graph shows a further opportunity for using Graphite functions. The metric is a cumulative one – the total amount of CPU time that the process has used. More useful would be to show the delta between each occurrence. For this, the derivative function is appropriate:

Sometimes you’ll get graphs with gaps in; maybe the server was busy and the collector couldn’t keep up.


To “gloss over” these, use the Keep Last Value function:
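These transforms compose, so a cumulative metric with occasional gaps can be smoothed and converted to per-sample deltas in a single target (the metric path is again an assumption):

```
keepLastValue(derivative(OBI.ProcessInfo.*.cpuTime))
```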


Saving graphs

You don’t have to log in to Graphite by default, but to save and return to graphs and dashboards between sessions you’ll want to. If you used the obi-metrics-agent installation script then Graphite will have a user oracle with password Password01. Click the Login button in the top right of the Graphite screen and enter the credentials.

Once logged in, you should see a Save icon (for you young kids out there, that’s a 3.5″ floppy disk…).

You can return to saved graphs from the Tree pane on the left:


As well as the standard Graphite graphing described above, you also have the option of using flot, which is available from the link in the top-right options, or the icon on an existing graph:


Graphlot/Flot is good for things like examining data values at specific times:


Creating a dashboard

So far we’ve seen individual graphs in isolation, which is fine for ad-hoc experimentation but doesn’t give us an overall view of a system. Click on Dashboard in the top-right of the Graphite page to go to the dashboards area, ignoring the error about the default theme.

You can either build Graphite dashboards from scratch, or you can bring in graphs that you have prepared already in the Graphite Composer and saved.

At the top of the Graphite Dashboard screen are the metrics available to you. Clicking on them drills down the metric tree, as does typing in the box underneath.

Selecting a metric adds it in a graph to the dashboard, and selecting a second adds it into a second graph:

You can merge graphs by dragging and dropping one onto the other:

Metrics within a graph can be modified with functions in exactly the same way as in the Graphite Composer discussed above:

To add in a graph that you saved from Graphite Composer, use the Graphs menu

You can resize the graphs shown on the dashboard, again using the Graphs menu:

To save your dashboard, use the Dashboard -> Save option.

Example Graphite dashboards

Here are some examples of obi-metrics-agent/Graphite being used in anger. Click on an image to see the full version.

  • OS stats (via collectl)
    OS stats from collectl
  • Presentation Services sessions, cache and charting
    Presentation Services sessions, cache and charting
  • BI Server (nqserver) Connection and Thread Pools
    BI Server (nqserver) Connection and Thread Pools
  • Response times vs active users (via JMeter)
    Response times vs active users (via JMeter)

Built-In OBIEE Load Testing with nqcmd

March 28th, 2014

nqcmd ships with all installations of OBIEE and includes some very useful hidden functionality – the ability to generate load tests against OBIEE. There are lots of ways of generating load against OBIEE, but most require third party tools of varying degrees of complexity to work with.

It’s easy to try this out. First set the OBIEE environment:  [I'm using SampleApp v309R2 as an example; your FMW_HOME path will vary]
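The environment setup would look something like this (the FMW_HOME path is a guess at the SampleApp layout; adjust it to your install):

```shell
# Assumed paths -- adjust FMW_HOME for your environment
export FMW_HOME=/app/oracle/biee
BI_INIT=$FMW_HOME/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/bi-init.sh
# source the OBIEE environment if the script is present
if [ -f "$BI_INIT" ]; then . "$BI_INIT"; fi
```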

and then the “open sesame” setting which enables the hidden nqcmd functionality:

On Windows, run set SA_NQCMD_ADVANCED=YES instead. If you don’t set this environment variable then nqcmd just throws an error if you try to use one of the hidden options.
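On Linux the equivalent is:

```shell
# enable the hidden nqcmd load-test options
export SA_NQCMD_ADVANCED=YES
```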

Now if you list the available options for nqcmd you’ll see lots of new options in addition to the usual ones:

You’re on your own figuring the new options out as they’re not documented (and therefore presumably not supported and liable to change or be dropped at any time). What I’ve done below is my best guess at how to use them – don’t take this as gospel. The one source that I did find is a post on Oracle’s CEAL blog: OBIEE 11.1.1 – Advanced Usage of nqcmd command, from which I’ve taken some of the detail below.

Let’s have a look at how we can generate a load test. First off, I’ll create a very simple query:

and from the Advanced tab extract the Logical SQL from it:

This Logical SQL I’ve saved to a file, report01.lsql.

To run this Logical SQL from nqcmd I use the standard (documented) syntax, passing the Logical SQL filename with the -s flag:

Adding the -q flag will do the same, but suppress the data output:
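A sketch of the two invocations (the DSN and credentials here are assumptions; substitute your own):

```
nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s report01.lsql
nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s report01.lsql -q
```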

The basic parameters for load testing are

  • -t – how many threads [aka Virtual Users]
  • -td – test duration
  • -ds – how frequently to write out load test statistics
  • -T – enable time statistics [without this they will not be reported correctly]

You also need to supply -o with an output filename. Even if you’re not writing the data returned from the query to disk (which you shouldn’t, and -q disables), nqcmd needs this in order to be able to write its load test statistics properly (I got a lot of zeros and nan otherwise). In addition, the -T (Timer) flag should be enabled for accurate timings.

So to run a test for a minute with 5 threads, writing load test stats to disk every 5 seconds, you’d run:
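Putting those flags together would give something like this (again, DSN and credentials are assumptions):

```
nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s report01.lsql -q -T \
      -t 5 -td 60 -ds 5 -o loadtest
```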

The load test stats are written to a file based on the name given in the -o parameter, with a _Counters.txt suffix:

Using obi-metrics-agent to pull out the OBIEE metrics and Graphite to render them we can easily visualise what happened when we ran the test. The Oracle_BI_General.Total_sessions metric shows:


Ramping Up the Load

nqcmd also has a -i parameter, to specify the ramp up per thread. Most load tests should incorporate a “ramp up”, whereby the load is introduced gradually. This is important so that you don’t overwhelm a server all at once. It might be that the server will not support the total number of users planned, so by using a ramp-up period you can examine the server’s behaviour as the load increases gradually, spotting the point at which the wheels begin to come off.

The -i parameter for nqcmd is the delay between each thread launching, and this has an interesting effect on the duration of the test. If you specify a test duration (-td) of 5 seconds, five threads (-t), and a rampup (-i) of 10 seconds, the total elapsed time will be c.55 seconds (5×10 + 5).

I’ve used the standard time command on Linux to validate this by specifying it before the nqcmd call.

So basically the -td is the “Steady State” once all threads are ramped up, and the literal test duration is equal to (rampup × number of threads) + (desired steady state).
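A quick sanity check of that arithmetic for the example above:

```shell
THREADS=5     # -t
RAMPUP=10     # -i, seconds between each thread launching
TD=5          # -td, steady-state duration in seconds
TOTAL=$(( THREADS * RAMPUP + TD ))
echo "$TOTAL" # 55
```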

The above ramp-up can be clearly seen:


BTW a handy trick I’ve used here is to use a timestamp for the output name, so that the Counter.txt from one test doesn’t overwrite another, by specifying date using an inline bash command:
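For example:

```shell
# timestamped output name, so each run writes its own _Counters.txt
OUTFILE="loadtest_$(date +%Y-%m-%d-%H%M%S)"
echo "$OUTFILE"
```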

Whilst we’re at it for tips & tricks – if you want to stop nqcmd running but Ctrl-C isn’t instant enough for you, the following will stop it in its tracks:
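One blunt but effective option (assuming pkill is available on your system) is to force-kill the processes:

```
pkill -9 nqcmd
```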

Wait a Moment…

…or two. Wait time, or “think time”, is also important in producing a realistic load test. Unless you want to hammer your server just for the lulz to see how fast you can overload it, you’ll want to make sure the workload you’re simulating represents how it is actually used — and in reality users will be pausing (thinking) between report requests. The -w flag provides this option to nqcmd.

In this test below, whilst the Total Sessions is as before (no ramp up), the Connection Pool shows far fewer busy connections. On previous tests the busy connections were equal to the number of active threads, because the server was continuously running queries.


And the CPU, which in the previous test was exhausted at five users with no wait time, now is a bit more relaxed


For comparison, this was the CPU in the first test we ran (5 threads, no wait time, no ramp up). Note that ‘idle’ drops to zero, i.e. the CPU is flat-out.


Load Test in Action

Let’s combine ramp up and wait times to run a load test and see what we can see in the underlying OBIEE metrics. I’m specifying:

  • Write the output to a file with the current timestamp (date, in the format YYYY-MM-DD HH:MM:SS)
  • 20 threads
  • 10 second gap between starting each new thread
  • 5 second wait between each thread submitting a new query
  • Run for a total of 230 seconds (20 threads × 10 second ramp up = 200 seconds, plus 30 seconds steady state)

Here’s what happened.

  • At first, as the users ramp up the Connection Pool gets progressively busier
  • However, when we hit c.14 threads, things start to go awry. The busy count stays at 10, even though the user count is increasing:
    (This was displayed in flot which you can get to on the /graphlot URL of your Graphite server)
  • So the user count is increasing, but we’re not seeing increasing activity on the Connection Pool… so what does that do for the response times?
    OK, so the Average Query Elapsed Time is a metric I’d normally be wary of, but this is a dedicated server running just my load test workload (and a single query within it) so in this case it’s a valid indicator — and it’s showing that the response time is going up. Why’s it going up?
  • Looking more closely at the Connection Pool we can see a problem — we’re hitting the capacity of ten connections, and requests are starting to queue up:
    Note how once the Current Busy Connection Count hits the Capacity of ten, the Current Queued Requests value starts to increase — because the number of users is increasing, trying to run more queries, but having to wait.

So this is a good example of where users would see slow performance, but some of the usual “Silver Bullets” around hardware and the database would completely miss the target, because the bottleneck here is actually in the configuration of the Connection Pool.

If you’re interested in hearing more about this subject, make sure you register for the BI Forum in Brighton, 7-9 May, where I’m delighted to be speaking for the second time, presenting “No Silver Bullets : OBIEE Performance in the Real World”.

Data Integration Tips: ODI 12c – Substitution API GUIDs

March 27th, 2014

With the release of Oracle Data Integrator 12c and Oracle GoldenGate 12c still fresh in our minds, I thought I would start an ongoing series focused on Data Integration Tips – specifically around the latest version of Oracle’s data integration tools. The goal is to provide insight into some of the hidden, undocumented, or lesser known features of these products. Most of these tips and tricks will come from our experiences when solving challenges for our clients. I’m sure some of my data integration colleagues at Rittman Mead will jump on board and share their knowledge as well.

The Scenario

While working with a client who is using ODI 12c, we were attempting to capture the session ID (known as the Batch ID) for each execution of a Load Plan. The process uses an ODI Variable, which, when refreshed, would capture the identifier from the following substitution API call. This variable would then be mapped to the audit dimension key column in each dimension mapping for use in ETL process auditing.

Much to my surprise, this code produced an error. Even more surprising, the ODI 12c documentation showed that I had the syntax and parameter value correctly defined for this API call. The error received was different from any I had seen before:

java.lang.Exception: The application script threw an exception: Exception getLoadPlanInstance("BATCH_ID") : ### KEY com.sunopsis.res.gen / ODI-17769: OdiRef: Internal ID 5 of IObjects SnpLoadPlanInstance should not be used when the repository is 12c mode.### BSF info: null at line: 0 column: columnNo

The message states that this code “should not be used when the repository is 12c mode”. Hmm… something about this call is clearly not compatible with ODI 12c.

Here’s the Tip…

As you may recall, ODI 11g used numeric IDs, made up of the object ID and the Work Repository ID, to uniquely identify objects stored in the repository. This had the potential for issues when migrating objects between repositories, should the repository ID be the same across environments. This led Oracle to change how objects are stored in Oracle Data Integrator 12c – switching from numeric IDs to GUIDs.

During my research of this error, I was reminded of this change and directed to the ODI 12c “Upgrading your environment” documentation. Throughout this document, the decision on which upgrade path to take depends on the use of “legacy numeric identifiers” in odiRef API calls versus the new GUID approach in ODI 12c. This is a hint that certain substitution API calls using identifiers are not going to work in a standard ODI 12c installation.

I was on the right track, but still hadn’t found anything that looked like a solution. I reached out on Twitter, and once again Mr. David Allan of Oracle’s Data Integration team had the “unwritten answer” to which the documentation was referring.

Now we’re getting somewhere. I made the change in the ODI variable call to use the GUID, and it worked as expected.


Final thought – if you’re using any “ID” parameter calls in your substitution API code, be sure to modify them to use GUIDs. And, as in this case, make sure you make the change to any logical and physical table attributes to handle the difference in datatypes (numeric vs varchar2).

More Data Integration Tips will be coming soon – I look forward to your feedback on the blog series.

Installing obi-metrics-agent, Graphite, and collectl

March 27th, 2014

In the previous post I introduced obi-metrics-agent, an open-source tool from Rittman Mead that can collect DMS metrics from OBIEE and write them to several formats including graphite. Graphite is a tool for storing and graphing time-based metrics.

Now in this post we will look at how to install the tools and get them up and running on your server. The install process has been tested on both OL5 and OL6 (and so by extension, CentOS and RHEL). There’s no reason why it oughtn’t work on other *nixes, but it’s not been tested.

I’m using Oracle’s SampleApp v309R2 server which is built on Oracle Linux (OL) 5 so that anyone can try this out at home running it on VirtualBox (or VMWare Fusion).

There are three parts to the obi-metrics-agent installation.

  1. obi-metrics-agent – python script to extract DMS metrics from OBIEE
  2. [optional] Graphite – store and graph data, including that extracted from obi-metrics-agent and collectl
  3. [optional] collectl – OS metrics viewer. Can send data for graphing in Graphite

Only one of these is mandatory – the installation of obi-metrics-agent itself. That in itself is just a case of cloning the repository in order to get the obi-metrics-agent python script on the server. However, the remaining articles in this blog series will assume that graphite and collectl have also been installed, so these are covered too.

Assumptions & pre-requisites

  1. Server has internet access (try ping to check)
  2. Installation is running as the oracle user
  3. Installation path is /home/oracle/graphite
  4. User has sudo rights

Before you run this on SampleApp v309R2, you must first run the following to fix a bug in yum’s configuration. If you’re not running SampleApp, you don’t need to do this:

You need the EPEL yum repository setup (for packages such as git, etc):

  • OL 5/CentOS 5/RHEL 5 (including SampleApp v309R2):
  • OL 6/CentOS 6/RHEL 6:


The bulk of the installation is done by a couple of shell scripts. To get these, you should clone the git repository. The following steps will do this.

Now run the following to install git and clone the repository:
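The steps would be along these lines (the repository URL is an assumption; check Rittman Mead’s GitHub account for the current location):

```
sudo yum install -y git
git clone https://github.com/RittmanMead/obi-metrics-agent.git
```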

Then launch the installation:

  • OL5:
  • OL6:

This will install git (if not present), and create a folder called obi-metrics-agent in the installation folder of OBIEE (FMW Home). You can put obi-metrics-agent wherever you want; this location is just a suggestion.

The installation for Graphite builds a standalone python environment using virtualenv in which graphite will run under Python 2.6. The reason for this is that OL5 ships with Python 2.4 (Noah was seen coding in this shortly before boarding his ark) and the OS’s Python version cannot be easily upgraded without causing all sorts of complications. Virtualenv is generally a good way to deploy python applications in isolation from others, so is also used for the OL6 version of the install.

The script will take a few minutes to run. Once complete, you can test the installation following the steps below. Do note that Graphite is somewhat notorious for installation problems, so whilst these instructions have been tested, you may hit quirks on your own server that may need a bit of Google-Fu to resolve. If you want to follow step-by-step instructions detailing each step, they are provided:

If you’re on a different *nix platform then feel free to adapt the install scripts and submit a pull request. Graphite works flawlessly on Debian-based distributions, and there is no requirement to run it local to the OBIEE server on which obi-metrics-agent is running.

Testing the installation

Testing obi-metrics-agent

First, start up the OBIEE stack and make sure that you can log in to OBIEE. Once it is running, you can run obi-metrics-agent by entering:

Expected output:

For more advanced options, including sending data to graphite, see below.

Testing Graphite

Go to http://<server>/ and you should see the Graphite launch page:


By default Graphite will collect data about its own performance via the carbon agent. This means you can easily test that Graphite’s graphs are working by selecting one of the carbon metrics to graph. Expand the Metrics tree and click cpuUsage, which will add it into the graph in the main pane:


Graphite is installed with a default user oracle with password Password01.

Testing collectl

From the command line enter collectl:

(press Ctrl-C to exit)

To check that collectl is successfully sending data to Graphite go to Graphite and under the Metrics folder you should see your machine’s hostname (for example obieesample for SampleApp). Under that should be the metrics that collectl is sending to Graphite:


Graphing OBIEE DMS metrics in Graphite

Now that we’ve gone through the installation, let us see how to use the three components together, graphing out some OBIEE DMS data in Graphite.

Run as you did in the test above, but this time specify the output carbon and carbon-server parameters
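Something like the following (the exact flag names are assumptions; check USAGE.MD in the repository for the precise syntax):

```
python obi-metrics-agent.py --opmnbin $FMW_HOME/instances/instance1/bin/opmnctl \
       --output carbon --carbon-server localhost
```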

Now go to Graphite at http://<server>/ and under Metrics you should see all of the OBIEE DMS metrics in folders. Note that it may take a couple of minutes for the metrics to first appear in Graphite. If you don’t see them, or the complete list, wait a minute and then refresh.


(the DMS folder is metrics relating to DMS itself – the OBIEE DMS metrics are under OBI)

Find the metric for Presentation Services logged in sessions, Oracle_BI_PS_Sessions.Sessions_Logged_In and click on it to add it to the graph. By default the graph shows a large time period so click on the Select Recent Data icon


Set the time period to 5 minutes, and click the Auto-Refresh button at the bottom of the graph.

Now log in to OBIEE, and go back to the Graphite page – you should see that the metric has increased accordingly.


I’ll look at using Graphite in detail in a later blog post; this is just to demonstrate that the collection and rendering is working.

Different ways of using obi-metrics-agent

obi-metrics-agent has three modes of operation:

  1. Collect and parse metrics to output
    • Near-real-time rendering of OBI metrics in Graphite
    • Near-real-time load of data into Oracle (via CSV & external table)
  2. Collect metrics to disk only
    • For analysis at a later date
    • If graphite/carbon is not available to send data to
    • If outbound network bandwidth is constrained (or could be, by OBIEE)
    • For lowest host system overhead
  3. Parse existing metrics from disk
    • Parse previously collected data and output to Graphite or Oracle (via CSV & external table)


For further details and syntax examples, see the USAGE.MD file in the github repository.

Introducing obi-metrics-agent – an Open-Source OBIEE Metrics Collector

March 26th, 2014

Understanding what is going on inside OBIEE is important for being able to diagnose issues that arise, monitor its health, and dig deep into its behaviour under stress in a load test. OBIEE exposes a set of metrics through the Dynamic Monitoring Service (DMS), which are viewable through Enterprise Manager (EM) Fusion Middleware Control. EM is a great tool but doesn’t meet all requirements for accessing these metrics, primarily because it doesn’t retain any history.

obi-metrics-agent is a tool that extracts OBIEE’s performance metrics from the Dynamic Monitoring Service (DMS) functionality. Venkat wrote the original version, which I have rewritten in python and added additional functionality to. It polls DMS on a specified interval and outputs the data to a variety of formats. It was written to aid OBIEE performance monitoring either as part of testing or longer-term use. Its features include:

  • Multiple output options, including CSV, XML, and Carbon (for rendering in Graphite etc)
  • Parse data as it is collected, or write to disk
  • Parse data collected previously


How does it work?

obi-metrics-agent is written in Python, and uses the documented OPMN functionality for exposing DMS metrics (opmnctl metric op=query). We experimented with the WLST route but found the overhead was too great. OPMN supplies the DMS data as a large XML message, which obi-metrics-agent can either store raw, or parse out into the constituent metrics. It can write these to CSV or XML files, generate INSERT statements for sending them to a database, or send them to graphite (see below).

obi-metrics-agent can also parse previously-extracted raw data, so if you want to store data in graphite but don’t have the server to hand at execution time it can be loaded retrospectively.


On which platforms does it work?

  • Tested thoroughly on Oracle Linux 5 and 6
  • Works on Windows 2003, should work on later versions

Which OBI metrics are collected?

All of the ones that OPMN supports. Currently, BI Server and BI Presentation Services, plus the opmn process metrics (such as CPU time of each OBI component)

To explore the DMS metrics available, you can use Enterprise Manager, or the DMS Spy servlet that is installed by default with OBIEE and available at http://<obi-server>:7001/dms/ (assuming your AdminServer is on port 7001).


I have used DMS metrics primarily when investigating OBIEE’s behaviour under stress in performance testing, but some of the higher-level metrics are useful for day-to-day monitoring too. The DMS metrics let you peer into OBIEE’s workings and deduce or hypothesise the cause of behaviour you are seeing.

  • How many users does OBIEE see as logged in?
  • How many active requests are there from Presentation Services to BI Server (nqserver)?
  • How many active connections are there from each Connection Pool to the database? How many queued connections?
  • What’s the average (careful…) response time by database?
  • What’s the error rate for Presentation Services queries?
  • How does the memory profile of each OBIEE component behave during testing?
  • How are the BI Server’s internal thread pools coping with the load? Do they need resizing?
  • How many queries per second are being run on each database?
  • How is the graphing engine behaving? Is it queuing requests?
  • What’s the Presentation Services and BI Server cache hit rate?

Sounds great, where do I get it?

Rittman Mead have released obi-metrics-agent as open source. You can find it on GitHub:

Simply clone the repository and run the python script. You need to install the lxml library first – full details are supplied in the repository.

See the next blog post for a step-by-step on getting it set up and running using SampleApp v309R2 as an example server, including with Graphite and Collectl for visualising the data and collecting OS stats too.

Visualising the collected data – Graphite

Graphite is an open-source graphing tool that comes with a daemon called carbon that receives incoming data and stores it in its own time-series database (called whisper). Graphite is a very popular tool, meaning there’s lots of support out there for it and additional tools written to complement it, some of which I’ll be exploring in later blog posts. It’s also very easy to get data into graphite, and because it stores it in a time series you can then display OBIEE DMS data alongside anything else you may have – for example, OS metrics from collectl, or jmeter performance test counters.

obi-metrics-agent architecture

Whilst obi-metrics-agent can be used on its own, with data stored to CSV for subsequent parsing or accessed in Oracle as an external table, the focus of this and subsequent blog posts will primarily be on using obi-metrics-agent to write data to graphite and the benefits this brings when it comes to visualising it.

“Graphite?! Haven’t we got a visualisation tool already in OBIEE?”

You can graph this data out through OBIEE, if you want. The OBI metric data can be loaded in by external table from the CSV files, or using the generated INSERT statements.

The benefit of using Graphite is twofold:

  1. Its primary purpose is graphing time-based metrics. Metrics in, time-based graphs out. You don’t need to build an RPD or model a time dimension. It also supports one-click rendering of wildcarded metric groups (for example, current connection count on all connection pools), as well as one-click transformations such as displaying deltas of a cumulative measure.
  2. It gives an alternative to a dependency on OBIEE. If we use OBIEE we’re then rendering the data on the system that we’re also monitoring, thus introducing the significant risk of just measuring the impact of our monitoring! If we instead set up a second OBIEE server just for rendering graphs, that opens up the question of what’s the best graphing tool for this particular job, and Graphite is a strong option here.

Graphite just works very well in this particular scenario, which is why I use it….YMMV.


There are several obvious features that could be added so do please feel free to fork the repository and submit your own pull requests! Ideas include:

  • support for scaled-out clusters
  • init.d / run as a daemon
  • selective metric collection

Known Issues

In some versions of OBIEE there is a bug in the opmn process which sometimes produces corrupt XML – in some cases as much as 15% of samples. On corrupt samples, the datapoint is just dropped.

The FMW patch from Oracle for this issue is 13055259.


What next?

There are several posts on this subject to come, including:

  1. Installing obi-metrics-agent, graphite, and collectl.
  2. Exploring the graphite interface, building simple dashboards
  3. Alternative front-ends to graphite