I’m honoured to have been accepted to speak at this year’s Rittman Mead BI Forum, the sixth year of this expert-level conference that draws some of the best Oracle BI/DW minds together from around the world. It’s running May 8th-9th in Brighton, and May 15th-16th in Atlanta, with an optional masterclass from Cloudera’s Lars George the day before the conference itself at each venue.
My first visit to the BI Forum was in 2009, where I presented Performance Testing OBIEE, and now five years later (five years!) I’m back, like a stuck record, talking about the same thing – performance. That I’m still talking about it means that there’s still an audience for it, and this time I’m looking beyond just testing performance to how it’s approached by people working with OBIEE. For an industry built around 1s and 0s, computers doing just what you tell them to and nothing else, there is a surprising amount of suspect folklore and “best practices” used when it comes to “fixing” performance problems.
Getting good performance with OBIEE is just a matter of being methodical. Understanding where to look for information is half the battle. By understanding where the time goes, improvements can be targeted where they will be most effective. Heavily influenced by Cary Millsap and his Method-R approach to performance, I will look at how to apply this practically to OBIEE. Most of the information needed to build up a full picture is readily available from OBIEE’s log files.
I’ll also dig a bit deeper into OBIEE, exploring how to determine how the system’s behaving “under the covers”. The primary technique for this is OBIEE’s DMS metrics, which I have written about recently in relation to the new Rittman Mead open-source tool obi-metrics-agent, and which I am using day-to-day to rapidly examine and resolve performance problems that clients see.
I’m excited to be presenting again on this topic, and I hope to see you in Brighton next month. The conference always sells out, so don’t delay – register today!
Continuing with our Data Integration Tips series, here’s one that applies to both Oracle Data Integrator 11g and 12c. This “tip” was actually discovered as a part of a larger issue involving GoldenGate, Oracle Datapump, and ODI. Maybe a future post will dive deeper into those challenges, but here I’m going to focus just on the ODI bit.
During our setup of GoldenGate and ODI, it was discovered that the source and target databases were set to use different character sets.
Target (DW): AL32UTF8
During my research, I found that the source is a single-byte character set and the target is multi-byte. What this means is that a special character, such as “Ǣ”, for example, may take up more than one byte when stored in a column with a VARCHAR2 datatype (as described in the Oracle documentation – “Choosing a Character Set”). When attempting to load a column of datatype VARCHAR2(1) containing the text “Ǣ”, the load would fail with an error similar to the one below.
The difference in character sets is clearly the issue, but how do we handle this when performing a load between the two databases? Reading through the Oracle doc referenced above, we can see that it all depends on the target database column length semantics. Specifically, for attributes of VARCHAR2 datatype, we need to use character semantics in the target, “VARCHAR2(1 CHAR)”, rather than byte semantics, “VARCHAR2(1 BYTE)”. With character semantics the column length is measured in characters, so a multi-byte character still counts as one; with byte semantics the length is measured in bytes. Looking back at the example, the character “Ǣ” inserted into a column using byte semantics (the default, in this case, when BYTE or CHAR is not specified) requires 2 bytes, exceeding the one-byte limit and thus causing the error.
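You can see why a single multi-byte character overflows a VARCHAR2(1 BYTE) column with a quick check in any language with a UTF-8 codec (AL32UTF8 is Oracle’s implementation of UTF-8). A Python sketch:

```python
# "Ǣ" is one character, but under UTF-8 (Oracle's AL32UTF8) it encodes
# to two bytes - too big for a column declared VARCHAR2(1 BYTE).
ch = "Ǣ"
print(len(ch))                   # character count: 1
print(len(ch.encode("utf-8")))   # byte count: 2
```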
Here’s the Tip…
The overall solution is to modify any VARCHAR2 columns that may have special characters inserted to use character semantics in the target database. Quite often we cannot determine which columns may or may not contain certain data, requiring the modification of all columns to use character semantics. Using the database system tables, the alter table script to make the necessary changes to existing columns can be generated and executed. But what about new columns generated by ODI? Here we’ll need to use the power of the Oracle Data Integrator metadata to create a new datatype.
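As a sketch of that generation step (the table and column names below are hypothetical; in a real run the list would come from a query against ALL_TAB_COLUMNS, filtering on DATA_TYPE = 'VARCHAR2' and CHAR_USED = 'B' for byte semantics):

```python
# Hypothetical sample of (table, column, length) rows; in practice this
# would be the result set of a query against ALL_TAB_COLUMNS.
columns = [
    ("CUSTOMER_DIM", "FIRST_NAME", 30),
    ("CUSTOMER_DIM", "LAST_NAME", 30),
    ("PRODUCT_DIM", "PRODUCT_DESC", 100),
]

# Emit one ALTER TABLE ... MODIFY per column, switching it to character
# semantics while keeping the declared length the same.
for table, column, length in columns:
    print(f"ALTER TABLE {table} MODIFY ({column} VARCHAR2({length} CHAR));")
```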
In the ODI Topology, under the Physical Architecture accordion, the technologies that can be used as a data source or target are listed. Each technology, in turn, has a set of datatypes defined that may be used as Datastore Attributes when the technology is chosen in a Model.
Further down in the list, you will find the VARCHAR2 datatype. Double-click the name to open the object. In the SQL Generation Code section we will find the syntax used when DDL is generated for a column of type VARCHAR2.
As you can see, the default is to omit the type of semantics used in the datatype syntax, which most likely means BYTE semantics are used, as this is typically the default in an Oracle database. This syntax can be modified to always produce character semantics by adding the CHAR keyword after the length substitution value.
Before making the change to the “out of the box” VARCHAR2 datatype, you may want to think about how this datatype will be used on Oracle targets and sources. Any DDL generated by ODI will use this syntax when VARCHAR2 is selected for an attribute datatype. In some cases, this might be just fine as the ODI tool is only used for a single target data warehouse. But quite often, ODI is used in many different capacities, such as data migrations, data warehousing, etc. To handle both forms of semantics, the best approach is to duplicate the VARCHAR2 datatype and create a new version for the use of characters.
Now we can assign the datatype VARCHAR2 (CHAR) to any of our Datastore columns. I recommend the use of a Groovy script if changing Attributes in multiple Datastores.
Now when Generate DDL is executed on the Model, the Create Table step will have the appropriate semantics for the VARCHAR2 datatype.
As you can see, the power of Oracle Data Integrator and the ability to modify and customize its metadata provided me with the solution in this particular situation. Look for more Data Integration Tips from Rittman Mead – coming soon!
In this post I’ll show you some of the techniques we can use to put together a simple dashboard.
First off, let’s see how Graphite actually builds graphs. When you select a data series from the Metrics pane it is added to the Graphite composer where you can have multiple metrics. They’re listed in a legend, and if you click on Graph Data you can see the list of them.
Data held in Graphite (or technically, held in whisper) can be manipulated and pre-processed in many ways before Graphite renders it. This can be mathematical transforms of the data (e.g. a moving average), but also how the data and its label are shown. Here I’ll take the example of several of the CPU metrics (via collectl) to see how we can manipulate them.
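Under the covers these functions are just transforms over a list of datapoints. As an illustration of the idea (a sketch, not Graphite’s actual implementation), a trailing moving average over the last N points could be written as:

```python
def moving_average(points, n):
    """Trailing moving average: each point becomes the mean of the last
    n values seen so far (fewer at the very start of the series)."""
    out = []
    for i in range(len(points)):
        window = points[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

# CPU-style sample data: smoothing damps the dip at 60.
print(moving_average([80, 82, 90, 60, 70], 3))
```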
To start with, I’ve just added idle, wait and user from the cputotals folder, giving me a nice graph thus:
We can do some obvious things like add in a title, from the Graph Options menu
Looking at the legend there’s a lot of repeated text (the full qualification of the metric name) which makes the graph more cluttered and less easy to read. We can use a Graphite function to fix this. Click on Graph Data, and use ctrl-click to select all three metrics:
Now click on Apply Function -> Set Legend Name By Metric. The aliasByMetric function is wrapped around the metrics, and the legend on the graph now shows just the metric names which is much smarter:
You can do this by selecting a metric and then amending it in the Graph Data view, or from the Graph Data view itself by clicking Add and using the auto-complete to enter it manually.
But now the legend is pretty unintelligible, and this time using the aliasByMetric function won’t help because the metric name is constant (cpuTime). Instead, use the Set Legend Name By Node function. In this example we want the third node (the name of the process). Combined with a graph title this gives us:
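For reference, aliasByNode simply picks one dot-separated segment (“node”, counted from zero) of the metric path to use as the legend label. A sketch, with a hypothetical metric path:

```python
def alias_by_node(metric_path, node):
    # Nodes are the dot-separated segments of the metric path, zero-indexed.
    return metric_path.split(".")[node]

# Hypothetical metric path; here node 2 is the process name.
print(alias_by_node("obi.sampleapp.sawserver.cpuTime", 2))  # → sawserver
```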
This aliasByNode method works well for Connection Pool data too. However it can be sensitive to certain characters (including brackets) in the metric name, throwing an IndexError: list index out of range error. The latest version of obi-metrics-agent should work around this by modifying the metric names before sending them to carbon.
The above graph shows a further opportunity for using Graphite functions. The metric is a cumulative one – the amount of CPU time that the process has used, in total. What would be more useful would be if we could show the delta between each occurrence. For this, the derivative function is appropriate:
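The derivative transform amounts to subtracting each datapoint from the next one (a sketch of the idea; the first point has no predecessor, so it comes back null):

```python
def derivative(points):
    out = [None]  # no previous value to subtract for the first point
    for prev, cur in zip(points, points[1:]):
        out.append(cur - prev)
    return out

# A cumulative CPU-time counter becomes per-interval usage:
print(derivative([100, 103, 109, 109, 115]))  # → [None, 3, 6, 0, 6]
```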
Sometimes you’ll get graphs with gaps in; maybe the server was busy and the collector couldn’t keep up.
To “gloss over” these, use the Keep Last Value function:
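What Keep Last Value does is repeat the most recent non-null datapoint across any gaps; roughly:

```python
def keep_last_value(points):
    out, last = [], None
    for p in points:
        if p is None:
            out.append(last)   # fill the gap with the last value seen
        else:
            out.append(p)
            last = p
    return out

print(keep_last_value([5, None, None, 7, None]))  # → [5, 5, 5, 7, 7]
```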
You don’t have to log in to Graphite by default, but to save graphs and dashboards and return to them between sessions you’ll want to. If you used the obi-metrics-agent installation script then Graphite will have a user oracle with password Password01. Click the Login button in the top right of the Graphite screen and enter the credentials.
Once logged in, you should see a Save icon (for you young kids out there, that’s a 3.5″ floppy disk…).
You can return to saved graphs from the Tree pane on the left:
As well as the standard Graphite graphing described above, you also have the option of using flot, which is available from the link in the top-right options, or the icon on an existing graph:
Graphlot/Flot is good for things like examining data values at specific times:
Creating a dashboard
So far we’ve seen individual graphs in isolation, which is fine for ad-hoc experimentation but doesn’t give us an overall view of a system. Click on Dashboard in the top-right of the Graphite page to go to the dashboards area, ignoring the error about the default theme.
You can either build Graphite dashboards from scratch, or you can bring in graphs that you have prepared already in the Graphite Composer and saved.
At the top of the Graphite Dashboard screen are the metrics available to you. Clicking on them drills down the metric tree, as does typing in the box underneath.
Selecting a metric adds it in a graph to the dashboard, and selecting a second adds it into a second graph:
You can merge graphs by dragging and dropping one onto the other:
Metrics within a graph can be modified with functions in exactly the same way as in the Graphite Composer discussed above:
To add in a graph that you saved from Graphite Composer, use the Graphs menu
You can resize the graphs shown on the dashboard, again using the Graphs menu:
To save your dashboard, use the Dashboard -> Save option.
Example Graphite dashboards
Here are some examples of obi-metrics-agent/Graphite being used in anger. Click on an image to see the full version.
OS stats (via collectl)
Presentation Services sessions, cache and charting
nqcmd ships with all installations of OBIEE and includes some very useful hidden functionality – the ability to generate load tests against OBIEE. There are lots of ways of generating load against OBIEE, but most require third-party tools of varying degrees of complexity.
It’s easy to try this out. First set the OBIEE environment: [I'm using SampleApp v309R2 as an example; your FMW_HOME path will vary]
You’re on your own figuring the new options out, as they’re not documented (and therefore presumably not supported and liable to change or be dropped at any time). What I’ve done below is my best guess at how to use them – don’t take this as gospel. The one source that I did find is a post on Oracle’s CEAL blog: OBIEE 11.1.1 – Advanced Usage of nqcmd command, from which I’ve taken some of the detail below.
Let’s have a look at how we can generate a load test. First off, I’ll create a very simple query:
and from the Advanced tab extract the Logical SQL from it:
"A - Sample Sales"."Products"."P2 Product Type" s_1,
"A - Sample Sales"."Base Facts"."1- Revenue" s_2
FROM "A - Sample Sales"
ORDER BY 1, 2 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY
This Logical SQL I’ve saved to a file, report01.lsql.
To run this Logical SQL from nqcmd I use the standard (documented) syntax, passing the Logical SQL filename with the -s flag:
-ds – how frequently to write out load test statistics
-T – enable time statistics [without this they will not be reported correctly]
You also need to supply -o with an output filename. Even if you’re not writing the data returned from the query to disk (which you shouldn’t, and which -q disables), nqcmd needs this in order to write its load test statistics properly (I got a lot of zeros and nan otherwise). In addition, the -T (Timer) flag should be enabled for accurate timings.
So to run a test for a minute with 5 threads, writing load test stats to disk every 5 seconds, you’d run:
Using obi-metrics-agent to pull out the OBIEE metrics and Graphite to render them we can easily visualise what happened when we ran the test. The Oracle_BI_General.Total_sessions metric shows:
Ramping Up the Load
nqcmd also has a -i parameter, to specify the ramp-up per thread. Most load tests should incorporate a “ramp up”, whereby the load is introduced gradually. This is important so that you don’t overwhelm a server all at once. It might be that the server will not support the total number of users planned, so by using a ramp-up period you can examine the server’s behaviour as the load increases gradually, spotting the point at which the wheels begin to come off.
The -i parameter for nqcmd is the delay between each thread launching, and this has an interesting effect on the duration of the test. If you specify a test duration (-td) of 5 seconds, five threads (-t), and a rampup (-i) of 10 seconds the total elapsed will be c.55 seconds (5×10 + 5).
I’ve used the standard time command on Linux to validate this by specifying it before the nqcmd call.
So basically the -td is the “steady state” once all threads are ramped up, and the literal test duration is equal to (rampup * number of threads) + (desired steady state).
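That arithmetic is easy to sanity-check:

```python
def total_duration(threads, rampup_secs, steady_state_secs):
    # Each thread starts rampup_secs after the previous one, and the
    # steady state (-td) only begins once all threads are running.
    return threads * rampup_secs + steady_state_secs

print(total_duration(5, 10, 5))   # → 55, the c.55 seconds observed above
```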
The above ramp-up can be clearly seen:
BTW a handy trick I’ve used here is to use a timestamp for the output name so that the Counter.txt from one test doesn’t overwrite another, by specifying the date with an inline bash command:
Whilst we’re at it for tips & tricks – if you want to stop nqcmd running but Ctrl-C isn’t instant enough for you, the following will stop it in its tracks:
Wait a Moment…
…or two. Wait time, or “think time”, is also important in producing a realistic load test. Unless you want to hammer your server just for the lulz to see how fast you can overload it, you’ll want to make sure the workload you’re simulating represents how it is actually used — and in reality users will be pausing (thinking) between report requests. The -w flag provides this option to nqcmd.
In this test below, whilst the Total Sessions is as before (no ramp up), the Connection Pool shows far fewer busy connections. On previous tests the busy connections were equal to the number of active threads, because the server was continuously running queries.
And the CPU, which in the previous test was exhausted at five users with no wait time, now is a bit more relaxed
For comparison, this was the CPU in the first test we ran (5 threads, no wait time, no ramp up). Note that ‘idle’ drops to zero, i.e. the CPU is flat-out.
Load Test in Action
Let’s combine ramp up and wait times to run a load test and see what we can see in the underlying OBIEE metrics. I’m specifying:
Write the output to a file with the current timestamp (date, in the format YYYY-MM-DD HH:MM:SS)
10 second gap between starting each new thread
5 second wait between each thread submitting a new query
Run for a total of 230 seconds (20 thread x 10 second ramp up = 200 seconds, plus 30 second steady state)
At first, as the users ramp up the Connection Pool gets progressively busier
However, when we hit c.14 threads, things start to go awry. The busy count stays at 10, even though the user count is increasing:
(This was displayed in flot which you can get to on the /graphlot URL of your Graphite server)
So the user count is increasing, but we’re not seeing increasing activity on the Connection Pool… so what does that do for the response times?
OK, so the Average Query Elapsed Time is a metric I’d normally be wary of, but this is a dedicated server running just my load test workload (and a single query within it) so in this case it’s a valid indicator – and it’s showing that the response time is going up. Why’s it going up?
Looking more closely at the Connection Pool we can see a problem — we’re hitting the capacity of ten connections, and requests are starting to queue up:
Note how once the Current Busy Connection Count hits the Capacity of ten, the Current Queued Requests value starts to increase — because the number of users is increasing, trying to run more queries, but having to wait.
So this is a good example of where users would see slow performance, but some of the usual “Silver Bullets” around hardware and the database would completely miss the target, because the bottleneck here is actually in the configuration of the Connection Pool.
If you’re interested in hearing more about this subject, make sure you register for the BI Forum in Brighton, 7-9 May where I’m delighted to be speaking for the second time, presenting “No Silver Bullets : OBIEE Performance in the Real World“.
With the release of Oracle Data Integrator 12c and Oracle GoldenGate 12c still fresh in our minds, I thought I would start an ongoing series focused on Data Integration Tips – specifically around the latest version of Oracle’s data integration tools. The goal is to provide insight into some of the hidden, undocumented, or lesser known features of these products. Most of these tips and tricks will come from our experiences when solving challenges for our clients. I’m sure some of my data integration colleagues at Rittman Mead will jump on board and share their knowledge as well.
While working with a client who is using ODI 12c, we were attempting to capture the session ID (known as the Batch ID) for each execution of a Load Plan. The process uses an ODI Variable, which, when refreshed, would capture the identifier from the following substitution API call. This variable would then be mapped to the audit dimension key column in each dimension mapping for use in ETL process auditing.
Much to my surprise, this code produced an error. Even more surprising, the ODI 12c documentation showed that I had the syntax and parameter value correctly defined for this API call. The error received was different from any I had seen before:
java.lang.Exception: The application script threw an exception: com.sunopsis.tools.core.exception.SnpsSimpleMessageException: Exception getLoadPlanInstance("BATCH_ID") : ### KEY com.sunopsis.res.gen / ODI-17769: OdiRef: Internal ID 5 of IObjects SnpLoadPlanInstance should not be used when the repository is 12c mode.### BSF info: null at line: 0 column: columnNo
The message states that this code “should not be used when the repository is in 12c mode”. Hmm…something about this call is clearly not compatible with ODI 12c.
Here’s the Tip…
As you may recall, ODI 11g used numeric IDs, made up of the object ID and the Work Repository ID, to uniquely identify objects stored in the repository. This had the potential for issues when migrating objects between repositories, should the repository ID be the same across environments. This led Oracle to change how objects are stored in Oracle Data Integrator 12c – switching from numeric IDs to GUIDs.
During my research of this error, I was reminded of this change and directed to the ODI 12c “Upgrading your environment” documentation. Throughout this document, the decision on which upgrade path to take depends on the use of “legacy numeric identifiers” in odiRef API calls versus the new GUID approach in ODI 12c. This is a hint that certain substitution API calls using identifiers are not going to work in a standard ODI 12c installation.
I was on the right track, but still hadn’t found anything that looked like a solution. I reached out on Twitter, and once again Mr. David Allan of Oracle’s Data Integration team had the “unwritten answer” to which the documentation was referring.
@mRainey you have to use odiRef.getLoadPlanInstance(“BATCH_GUID”) in #odi12c … Guids(12c) versus Ids(11g)
Now we’re getting somewhere. I made the change in the ODI variable call to use the GUID, and it worked as expected.
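For reference, the working variable refresh expression (per the tweet above) is the standard ODI substitution call with the GUID parameter:

```
<%=odiRef.getLoadPlanInstance("BATCH_GUID")%>
```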
Final thought – if you’re using any “ID” parameter calls in your substitution API code, be sure to modify them to use GUIDs. And, as in this case, make sure you make the change to any logical and physical table attributes to handle the difference in datatypes (numeric vs varchar2).
More Data Integration Tips will be coming soon – I look forward to your feedback on the blog series.