RM BI Forum 2014 Brighton is a Wrap – Now on to Atlanta!

May 12th, 2014 by

I’m writing this sitting in my hotel room in Atlanta, having flown over from the UK on Saturday following the end of the Rittman Mead BI Forum 2014 in Brighton. I think it’s probably true to say that this year was our best ever – an excellent masterclass on the Wednesday followed by even more excellent sessions over the two main days, and now we’re doing it all again this week at the Renaissance Atlanta Midtown Hotel in Atlanta, GA.

Wednesday’s guest masterclass was by Cloudera’s Lars George, and covered the worlds of Hadoop, NoSQL and big data analytics over a frantic six-hour session. Lars was a trooper; despite a mistake over the agenda where I’d listed his sessions as being just an hour each when he’d planned (and been told by me) that they were an hour-and-a-half each, he managed to cover all of the main topics and take the audience through Hadoop basics, data loading and processing, NoSQL and analytics using Hive, Impala, Pig and Spark. Roughly half the audience had some experience with Hadoop with the others just being vaguely acquainted with it, but Lars was an engaging speaker and stuck around for the rest of the day to answer any follow-up questions.


For me, the most valuable parts of the session were Lars’ real-world experiences in setting up Hadoop clusters, and his views on what approaches were best to analyse data in a BI and ETL context – with Spark clearly being in favour now compared to Pig and basic MapReduce. Thanks again Lars, and to Justin Kestelyn from Cloudera for organising it, and I’ll get a second chance to sit through it at the event in Atlanta this week.

The event proper kicked off in the early evening with a drinks reception in the Seattle bar, followed by the Oracle keynote and then dinner. Whilst the BI Forum is primarily a community (developer and customer)-driven event, we’re very pleased to have Oracle also take part, and we traditionally give the opening keynote over to Oracle BI Product Management to take us through the latest product roadmap. This year, Matt Bedin from Oracle came over from the States to deliver the Brighton keynote, and whilst the contents aren’t under NDA there’s an understanding we don’t blog and tweet the contents in too much detail, which then gives Oracle a bit more leeway to talk about futures and be candid about where their direction is (much like other user group events such as BIWA and ODTUG).


I think it’s safe to say that the current focus for OBIEE over the next few months is the new BI in the Cloud Service (see my presentation from Collaborate’14 for more details on what this contains), but we were also given a preview of upcoming functionality for OBIEE around data visualisation, self-service and mobile – watch this space, as they say. Thanks again to Matt Bedin for coming over from the States to deliver the keynote, and for his other session later in the week where he demo’d BI in the Cloud and several usage scenarios.

We were also really pleased to be joined by some of the top OBIEE, Endeca and ODI developers from around the US and Europe, including Michael Rainey (Rittman Mead) and Nick Hurt (IFPI), Truls Bergersen, Emiel van Bockel (CB), Robin Moffatt (Rittman Mead), Andrew Bond (Oracle) and Stewart Bryson (Rittman Mead), and none other than Christian Berg, an independent OBIEE / Essbase developer who’s well-known to the community through his blog and through his Twitter handle, @Nephentur. We’ll have all the slides from the sessions up on the blog once the US event is over, and congratulations to Robin for winning the “Best Speaker” award for Brighton for his presentation “No Silver Bullets: OBIEE Performance in the Real World”.


We had a few special overseas guests in Brighton too; Christian Screen from Art of BI Software came across (he’ll be in Atlanta too this week, presenting this time), and we were also joined by Oracle’s Reiner Zimmerman, who some of you from the database/DW-side will know from the Oracle DW Global Leaders’ Program. For me though one of the highlights was the joint session with Oracle’s Andrew Bond and our own Stewart Bryson, where they presented an update to the Oracle Information Management Reference Architecture, something we’ve been developing jointly with Andrew’s team and which now incorporates some of our thoughts around the agile deployment of this type of architecture. More on this on the blog shortly, and look out for the white paper and videos Andrew’s team are producing which should be out on OTN soon.


So that’s it for Brighton this year – and now we’re doing it all again in Atlanta this week at the Renaissance Atlanta Midtown Hotel. We’ve got Lars George again delivering his masterclass, and an excellent – dare I say it, even better than Brighton’s – array of sessions including ones on Endeca, the In-Memory Option for the Oracle Database, TimesTen, OBIEE, BI Apps and Essbase. There are still a few places left so if you’re interested in coming, you can book here and we’ll see you in Atlanta later this week!


Adding Geocoding Capabilities to Pig through Custom UDFs

May 4th, 2014 by

In the previous two posts in this series, I’ve used Hive and Pig to process and analyse data from log files generated by the Rittman Mead Blog webserver. In the article on Hive, I created a relational table structure over a directory containing the log files, using a regular expression SerDe to split each log line up into its constituent parts (IP address, page requested, status code and so on). I then brought in another table of data, containing IP address ranges and countries they were assigned to, so that I could determine what parts of the world were accessing our site over time.

In the second example, I took the same log files but processed them this time using Apache Pig. I used Pig’s dataflow-style approach to loading and analysing data to progressively filter, pivot and analyse the log file dataset, and then joined it to an export of pages and authors from WordPress, the CMS that we used to run the website, so I could determine who wrote the most popular blog articles over the period covered by the logs.

But the first example, where I joined the log file data to the geocoding table, had a bit of an issue that only came up when I tested it with a larger set of data than I used at the end of that article. In the article example, I limited the number of log file rows to just five, to keep the example readable on the blog, but when I tried it later on with the full dataset, the query eventually failed with an out-of-memory error from the Hadoop cluster. Now in practice, I could probably have increased the memory (Java heap space) or otherwise got the query through, but geo-tagging my data in this way – as a big table join, and using an in-memory database engine (Impala) to do it – probably isn’t the most sensible way to do a single-value lookup as part of a Hadoop transformation. Instead, this is probably something better done through what’s called a “user-defined function”.

Both Hive and Pig support user-defined functions (UDFs), and a quick Google search brought up one for Hive called GeocodeIP, on Github, that looks like it might do the job. Sticking with Pig for the moment though, we thought this might be a good opportunity to see how UDFs for Pig are created, and so my colleague, Nelio Guimaraes, put together the following example to walk through how a typical one might be created. Before we start though, a bit of background.

The problem we have is that we need to match an IP address in a webserver log file with an IP address range in a lookup table. For example, a given IP address in the log file needs to be located within the IP address range in the lookup database that is allocated to a particular country – Poland, for example. The lookup database itself comes from Maxmind, and there’s a formula they use to convert IP addresses to integers, like this:


so that you can do a simple BETWEEN in an SQL join to locate the range that matches the incoming IP address.
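As a rough illustration of the kind of conversion involved, here is a minimal Python sketch of the standard dotted-quad-to-integer calculation. The function name ip_to_int is just something I've made up for this example, and the address used is a reserved documentation address rather than one from our logs.

```python
def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string such as '192.0.2.1' to an integer."""
    # Split into the four octets and weight each by its position (base 256).
    o1, o2, o3, o4 = [int(part) for part in ip.split('.')]
    return 16777216 * o1 + 65536 * o2 + 256 * o3 + o4


if __name__ == '__main__':
    print(ip_to_int('192.0.2.1'))
```

With both the incoming address and the range boundaries held as integers, checking whether an address falls inside a range becomes a plain numeric comparison.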


Except Pig, like Hive, can’t normally support non-equijoins, which leads us to UDFs and other approaches to getting the country for our IP address. Pig, again like Hive, is however extensible and it’s relatively easy to add Pig UDFs either yourself, or through UDF libraries like Pig’s Piggybank. The best language to write UDFs in is Java as it gives access to the largest amount of Pig native functionality (such as the ability to write custom data loaders and unloaders), but for what we’re doing Python will be fine, so let’s put one together in Python to do our IP address geocoding and show how the process works.
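To make the idea of a per-row lookup function a bit more concrete, here is a small Python sketch of what such a function does in place of a BETWEEN join: it converts the address to an integer and binary-searches a sorted list of ranges. Everything in it is an assumption for illustration only; the ranges are reserved documentation blocks, the country labels are placeholders, and the function names are hypothetical. It isn't the UDF we build below, which leaves the lookup to the pygeoip library.

```python
import bisect


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to an integer."""
    o1, o2, o3, o4 = [int(part) for part in ip.split('.')]
    return 16777216 * o1 + 65536 * o2 + 256 * o3 + o4


# Hypothetical lookup table: (range_start, range_end, country), sorted by start.
RANGES = sorted([
    (ip_to_int('192.0.2.0'), ip_to_int('192.0.2.255'), 'Country A'),
    (ip_to_int('198.51.100.0'), ip_to_int('198.51.100.255'), 'Country B'),
    (ip_to_int('203.0.113.0'), ip_to_int('203.0.113.255'), 'Country C'),
])
STARTS = [start for start, _end, _country in RANGES]


def country_for(ip):
    """Return the country whose range contains ip, or None if there is none."""
    n = ip_to_int(ip)
    # Find the last range whose start is <= n, then check its upper bound.
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and n <= RANGES[i][1]:
        return RANGES[i][2]
    return None


if __name__ == '__main__':
    print(country_for('198.51.100.7'))
    print(country_for('10.0.0.1'))
```

Because each call only touches one address, the lookup happens row by row inside the map tasks, rather than as a join across two full datasets.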

Going back to the Pig scripts we put together yesterday, they started off by declaring a relation that loaded the raw log files in from a directory in HDFS, and then used another relation to parse the first one via a regular expression, so we had each of the log file elements in its own column, like this:

At this point we can run off a list of the top 5 browser types based on page access, for example:

See the previous post, and this other blog article, for more background on how you do grouping and aggregating in Pig, if you’re not familiar with the syntax.

But what if we want to group by country? That’s where the geocoding comes in, and the Pig UDF we’re going to create. As we’re going to create this example using Python, we’ll be using the pygeoip Python API that you install through pip, the Python package manager, and the GeoLite Country database (.dat) file from Maxmind, who make this basic version available for download free. Follow these steps to set the Python UDF up:

1. On the master node on your Hadoop cluster (I’m using a three-node CDH4.6 cluster) where Pig and Hive run, install pip, and then download the pygeoip API, like this:

2. Copy the GeoIP.dat file to somewhere on the Hadoop master node, for example /home/nelio/. Make a note of the full path to the GeoIP.dat file, and then copy the file to the same location on all of the worker nodes – there’s probably a way to cache this or otherwise automatically distribute the file (suggestions welcome), but for now this will ensure that each worker node can get access to a copy of this file.

3. Using a text editor, create the Python script that will provide the Python UDF, like this, substituting the path to your GeoIP.dat file if it’s somewhere else. Once done, save the file as python_geoip.py to the same directory (/home/nelio, in this example) – note that this file only needs to go on the main, master node in your cluster, and Pig/MapReduce will distribute it to the worker nodes when we register it in our Pig script, later on.

Note that the sys.path.append line in the script is there so that Jython knows where to look for the new Python module, pygeoip, when starting up.
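For anyone wanting a starting point, here is a sketch of the general shape a python_geoip.py file like this could take. Treat it as an outline under stated assumptions rather than the exact script: the site-packages path and the function name getCountry are assumptions, the two try/except blocks are only there so the file can also be loaded in a plain Python interpreter, and the only pygeoip calls used are GeoIP() and country_name_by_addr().

```python
import sys

# Assumption: pip put pygeoip into the system Python's site-packages directory.
# Jython does not search that directory by default, so it is appended by hand;
# adjust the path to match your own node.
sys.path.append('/usr/lib/python2.6/site-packages')

try:
    import pygeoip
except ImportError:
    pygeoip = None  # allows the file to load where pygeoip is not installed

# Pig supplies the outputSchema decorator when it registers a Jython UDF.
# This stand-in is only used when the file is run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

GEOIP_DAT = '/home/nelio/GeoIP.dat'  # location used in step 2
_cache = {}  # holds the opened database so it is read once per task


@outputSchema('country:chararray')
def getCountry(ip):
    """Return the country name for an IP address, or None if it can't be found."""
    if pygeoip is None:
        return None
    try:
        if 'reader' not in _cache:
            _cache['reader'] = pygeoip.GeoIP(GEOIP_DAT)
        return _cache['reader'].country_name_by_addr(ip) or None
    except Exception:
        # Malformed addresses or a missing .dat file yield a null in Pig.
        return None
```

Returning None rather than raising means a bad log line shows up as a null country in the relation instead of failing the whole job, which seemed the safer choice for a sketch.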

4. Let’s start another Pig session now and try and use this UDF. I exit back to the OS command prompt, and change directory to where I’ve stored the python file and the GeoIP.dat file, and start another Grunt shell session:

Now if I want to use this Python Pig UDF in a Grunt shell session, I need to register it as a scripting UDF either at the Grunt shell or in a Pig script I run, so let’s start with that, and then bring in the log data as before:

I’ll now define a new relation (alias) projecting just the IP addresses from the combined log files, and then filter that into another alias so we only have valid IP addresses to work with:

Now we get to use the Python UDF. We’ll pass to it these IP addresses, with the UDF then returning the country that the IP address is located in, based on Maxmind’s IP address ranges.

So this function will have converted all of the IP addresses in the logs to country names; let’s now group, count, order and select the top five from that list.

Of course all this has done so far is set up a data flow in Pig, telling it how to move the data through the pipeline and arrive at the final output I’m interested in; let’s now run the process by using the “dump” command:

So that’s a simple example of a Pig UDF, in this instance written in Python. There are other ways to extend Pig beyond UDFs – Pig Streaming is the obvious alternative, where the entire relation goes through the streaming interface to be processed and then output back into Pig, and hopefully we’ll cover this at some point in the future – or then again, maybe it’s now time to take a proper look at Spark.

Simple Hadoop Dataflows using Apache Pig and CDH4.6

May 2nd, 2014 by

The other day I took some logs from the Apache webserver that runs the Rittman Mead website, and analysed them using Hadoop CDH5, Apache Hive and Impala to get some basic metrics on number of hits per month, where the hits came from and so on. Hive and Impala are great for analysing data sitting on HDFS on a Hadoop cluster, but like SQL compared to PL/SQL or C++, everything you do is declarative and set-based whereas sometimes, you want to build up your dataset using a dataflow-type approach, particularly if you’ve come from a programming vs. a data warehousing background.

If you’ve been looking at Hadoop for a while, you’ll probably therefore know there’s another basic high-level-language approach to querying Hadoop data to accompany Hive, and it’s called “Pig”. Pig, like Hive, is an Apache project and provides an engine for creating and executing data flows, in parallel, on Hadoop. Like Hive, jobs you create in Pig eventually translate into MapReduce jobs (with the advantages and disadvantages that this brings), and Pig has concepts that are similar – but just that little bit different – to relational flows such as filters, joins and sorts.

It’s often called a “procedural” language (as opposed to Hive’s declarative language), but really it’s not – it’s a “data flow language” that has you specifically set out the data flow as the main part of a Pig program, rather than it being a by-product of the if/then/elses and control structures of a procedural language. For people like me who come from an Oracle data warehousing background, in most cases we’d feel more comfortable using Hive’s set-based transformations to do our data loading and transformation on Hadoop, but in some cases – particularly when you’re querying data interactively, building up a data pipeline and working with nested data sets – Pig can be more appropriate.

Connecting to the Pig Console, and Pig Execution Options

Iteratively examining and analysing data from webserver log files is a great example of where Pig could be useful, as you naturally hone-down and pivot the data as you’re looking at it, and in-effect you’re looking to create a data pipeline from the raw logs through to whatever summary tables or files you’re looking to create. So let’s go back to the same input log files I used in the previous post on Hive and Impala, and this time bring them into Pig. 

Within CDH (Cloudera Distribution including Hadoop) you can run Pig scripts either interactively from the Pig command-line shell, called “Grunt”, or you can submit them as workflow jobs using the Hue web interface and the Oozie workflow scheduler; the advantage of working with the interactive Grunt shell when you’re starting out is that you can run your commands one-by-one and examine the metadata structures that you create along the way, so let’s use that approach first and move onto batch scheduling later on.

I’ll start by SSH’ing into one of the CDH4.6 nodes and starting the Grunt shell:

Even from within the Grunt shell, there are two ways I can then run Pig. The default way is to have Grunt run your Pig commands as you’d expect, converting them in the end to MapReduce jobs which then run on your Hadoop cluster. Or, you can run in “local mode”, which again uses MapReduce but only runs on the machine you’re logged in to and only single-threaded, but can often be faster when you’re just playing around with a local dataset and you want to see results fast (you can turn on local mode by adding an ‘-x local’ flag when starting Grunt). In my example though, I’m going to run Grunt in regular MapReduce mode.

Loading and Parsing the Weblog Files

I then define my first Pig relation, analogous to a relational table and technically a named Pig “bag”, like this:

Compared to the Hive table DDL script in the previous article example I posted, we declare the incoming dataset much more programmatically – the first row of the script creates a relation called “raw_logs”, analogous to a table in Hive, and declares it as having a single column (“line:chararray”) that maps onto a directory of files in HDFS (“/user/root/logs”). You can ask Pig (through the Pig command-line client, which I’m using now) to list out the structure of this relation using the “describe” command:

In this form the logs aren’t too useful though as each row contains all the data we want, as a single field. To take a look at what we’re working with currently, let’s create another relation that limits down the dataset to just five rows, and use the DUMP command to display the relation’s data on the screen:

What I’ve omitted for clarity in the above output is the MapReduce console output – what you’ll see if you run this in MapReduce mode is the process starting up, and then running, to retrieve 5 rows effectively at random from the whole set of log files, process them through the Map > Shuffle > Reduce process and then return them to the Grunt shell.

What would be really good though, of course, is if we could split these single log row columns into multiple ones, one for each part of the log entry. In the Hive example I posted the other day, I did this through a Hive “SerDe” that used a regular expression to split the file, and I can do something similar in Pig; Pig has a function called REGEX_EXTRACT_ALL that takes a regular expression and creates a column for each part of the expression, and so I can use it in conjunction with another relational operator, GENERATE FLATTEN, to take the first set of data, run it through the regular expression and come out with another set of data that’s been split as I want it:

GENERATE in Pig tells it to create (or “project”) some columns out of an incoming dataset; FLATTEN eliminates any nesting in the resulting output (we’ll see more of FLATTEN and nesting in a moment). Notice how the DESCRIBE command afterwards now shows individual columns for the log elements, rather than just one single “line:chararray” column.
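If regular expression capture groups are new to you, the same idea can be tried out in plain Python before going anywhere near Pig. The sketch below is only an analogy for what REGEX_EXTRACT_ALL followed by FLATTEN does: one capture group per column, with the matched groups laid out as a flat row. The pattern assumes the usual Apache “combined” log layout, and the field names and the sample log line are made up for this illustration.

```python
import re

# One capture group per log element, assuming the Apache "combined" format.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)

# Hypothetical column names; pick whatever suits your own schema.
FIELDS = ('remote_addr', 'remote_logname', 'user', 'time', 'request',
          'status', 'bytes_sent', 'referer', 'user_agent')


def parse_line(line):
    """Split one raw log line into named fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


if __name__ == '__main__':
    # Made-up sample line, not taken from the real logs.
    sample = ('192.0.2.10 - - [02/Apr/2014:10:15:32 +0000] '
              '"GET /2014/04/example-post/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0"')
    print(parse_line(sample))
```

Lines that don’t fit the pattern come back as None here, which is worth checking for on real logs before relying on the split columns.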

Using Pig to Interactively Filter the Dataset

So now we’ve got a more useful set of rows and columns in the Pig relation, and like an Oracle table, unless we do something to order them later, they’re effectively held in random order. Something we can do now is filter the dataset, for example creating another relation containing just those log entries where the request 404’d, and then further filter that dataset to those 404’d requests that were made by users using IE6:

So how many of our website users are on IE6 and getting page not available errors? To find out, I create another relation that groups the entries up in a single row, and then generates a count of those rows that were aggregated:

and I can do a similar thing for all of the 404s:

You can see these Pig scripts running in CDH’s Cloudera Manager web application, with the screenshot below showing one of them at the point where 92% of the Mapper parts have completed, waiting to hand-off to the Reducers; the console output in Grunt will show you the status too, the output of which I removed from the above two statements for clarity.


Grouping, Subsetting and Aggregating Data using Pig

How we generate counts and other aggregates is interesting in Pig. Pig has a relational operator called GROUP as we’ve seen before, and when you GROUP a relation by a column, or a group of columns, it creates a new relation that contains two columns; one called “group” that has the same datatype as whatever you grouped on (or a “tuple” made up of multiple columns, if you grouped-on more than one column), and a second column that’s named after whatever you grouped, i.e. the original relation. To take an example, if we grouped the logs_base relation on status code, you’d see the following if you then describe the resulting relation:

What’s interesting though about a Pig GROUP, and conceptually different to SQL (and therefore Hive)’s GROUP BY, is that this second column is actually in Pig terms a “bag”, a bag of rows (or “tuples”) that are unaltered compared to the original relation, i.e. they’ve not been aggregated up by the grouping, but are still at the same detail level. So Pig gives you, apart from its step-by-step data flow method of working with data, this ability to group data whilst still preserving the detail of the individual grouped rows, leaving any summation or other aggregation step to something you do afterwards. So for example, if I wanted to see how many 200s, 404s and so on my log file dataset contained in total, I then tell Pig to iterate through these bags, project out the columns I’m interested in (in this case, just the status) and also perform aggregation over the grouping buckets specified in the GROUP relational operator:

So in that example, we told Pig to list out all of the groupings (i.e. the distinct list of status codes), and then run a count of rows against each of those groupings, giving us the output we’re interested in. We could, however, choose not to aggregate those rows at this point and instead treat each “bucket” formed by the grouping as a sub-selection, allowing us to, for example, investigate in more detail when and why the 301 errors – “Moved Permanently” – were caused. Let’s use that now to find out what the top 10 requests were that led to HTTP 301 errors, starting by creating another relation that just contains the ‘301’ group:

Looking at the structure of the relation this has created though, you can see that the rows we’ve grouped are all contained within a single tuple called “logs_base”, and to do anything interesting with that data we’ll need to flatten it, which takes that tuple and un-nests it:

Notice also how I referenced the two columns in the by_status_301 relation by positional notation ($0 and $1)? This is handy when either you’ve not got a proper schema defined for your data (all part of the “pigs eat anything” approach for Pig, in that it even handles data you don’t yet have a formal schema for), or when it’s just easier to refer to a column by position than work out its formal name.
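If the group, bag and flatten ideas are still a bit abstract, they can be mimicked in a few lines of plain Python. This is only an analogy with made-up rows, not Pig itself: grouping produces (group, bag) pairs where the bag still holds the original detail rows, counting is a separate step done afterwards, and flattening un-nests the bag again, with pair[0] and pair[1] playing the role of $0 and $1.

```python
from collections import defaultdict

# Made-up detail rows of (request, status), standing in for logs_base.
logs_base = [
    ('/old-page', '301'),
    ('/index', '200'),
    ('/old-page', '301'),
    ('/missing', '404'),
    ('/older-page', '301'),
]


def group_by(rows, key_index):
    """Return (group, bag) pairs; each bag keeps its rows unaggregated."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[key_index]].append(row)
    return sorted(bags.items())


# Group on status: the second element of each pair is the bag of detail rows.
by_status = group_by(logs_base, 1)

# Aggregating is a separate, later step: count the rows in each bag.
status_count = [(group, len(bag)) for group, bag in by_status]

# Or skip the aggregation and keep just the '301' bucket as a sub-selection.
by_status_301 = [pair for pair in by_status if pair[0] == '301']

# Flatten: one output row per row in the bag, prefixed with the group key.
flattened = [(pair[0],) + row for pair in by_status_301 for row in pair[1]]

if __name__ == '__main__':
    print(status_count)
    print(flattened)
```

The point to take away is that nothing is lost at the grouping step, so the same grouped relation can feed both a count and a detailed drill-down.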

So now we’ve got our list of log entries that have recorded HTTP 301 “permanently moved” error messages, let’s use another relation to project just the columns we want – the date and the requests – and also use some Pig string functions to extract the day, month and year as well, and also split the request field up into its constituent method, URI and protocol fields:

All of these statements just set up the data flow, and no actual processing takes place until we choose to dump, or store, the results of the data flow – which again makes Pig great for iteratively building up a data flow, or in BI terms maybe an ETL flow, before finally pulling the trigger at the end and generating the end result. Let’s do that now using the dump command:

So we had around 85k “page permanently moved” errors in April, only a few in February, and a much larger amount in March 2014. So which web page requests in March 2014 were the biggest cause of this error? Let’s focus on just that month and list out the top ten page requests that hit this error:

Joining Datasets in Pig 

So far we’ve worked with just a single set of data – the Apache weblog files that we’ve then filtered, subsetted, parsed, analysed and so forth. What would be really interesting though would be if we could bring in some additional reference or other lookup data to help us make more sense of the log activity on our website. One of the motivators for the people behind Pig, right at the start, was to give Hadoop the ability to join datasets, which up until then was really hard to do with just Java and MapReduce; as we’ll see later on there are still a lot of restrictions on how these joins take place, but Pig gives you the ability to join two or more datasets together, which we’ll do now in another example where we’ll look at the most popular blog posts, and blog authors, over the period covered by our logs.

Let’s start by taking the full set of logs, parsed into the separate elements of the log file entry, and add in additional columns for month and the request elements:

One thing you’re taught with Pig is “project early, and often”, so let’s remove the method and protocol columns from that dataset and then filter the remaining page requests to remove those that are blank or aren’t blog post requests:

Let’s now reduce that list down to the top ten page requests, the way we did before with pages causing 301 errors:

Not bad. What would be even better though would be if I could retrieve the full names of these posts in WordPress, on which our website runs, and also the author name. I’ve got a text file export of post names, URLs and authors that’s been previously exported from our WordPress install, so let’s declare another relation to hold initially the raw rows from that file, like this:

Then split that file by the semicolon that delimits each of the entries (author, post name etc):

I’ll now take that relation and project just the columns I’m interested in:

Now I’ll do the join, and then take that join and use it to generate a combined list of pages and who wrote them:

and then finally, output the joined set of data to a comma-separated file in HDFS:

Once that’s run, I can use Grunt’s “cat” command to output the contents of the file I just created:

But What About More Complex Joins and Transformations … Enter, Pig Extensibility

This is of course great, but going back to my previous Hive example I also managed to geo-code the log file entries, converting the IP addresses into country names via a lookup to a geocoding database. What made that example “interesting” though was the need to join the Hive table of log posts to the geocode table via a BETWEEN, or > and < than operators, so that I could locate each IP address within the ranges given by the geocoding database – and the reason it got interesting was that Hive can only do equi-joins, not non-equijoins or joins involving greater than, BETWEEN and so on. Impala *could* do it, and on a small set of input rows – five in my example – it worked fine. Try and scale the Impala query up to the full dataset though, and the query fails, because it runs out of memory; and that’s potentially the issue with Impala, and set-based queries, as Impala does everything in-memory, and most Hadoop systems are designed for fast I/O, not lots of memory. 

So can Pig help here? Well, it’s actually got the same limitation – non-equijoins are actually quite difficult to do in Hadoop because of the way MapReduce works, but where Pig could perhaps help is through its extensibility – you can stream Pig data, for example IP addresses, through Perl and Python scripts to return the relevant country, or you can write Pig UDFs – User-Defined Functions – to return the information we need in a similar way to how PL/SQL functions in Oracle let you call-out to arbitrary functions to return the results of a more complex look-up. But this is also where things get a bit more complicated, so we’ll save this to the next post in this series, where I’ll also be joined by my colleague Nelio who’s spent the best part of this last week VPNd into my VMWare-based Hadoop cluster getting this last example working.

New ODI12c Article, and Details of our Inaugural ODI12c Course in Brighton, May 12th-14th 2014

May 1st, 2014 by


Oracle have just published the May/June 2014 edition of Oracle Magazine, and my business analytics column this time round is on the new 12c release of Oracle Data Integrator.

In “Go with the Flow” I look at how this new editor supports OWB-style multi-step mappings, and how new features like “deployment specifications” allow you to choose different load strategies depending on whether you’re doing a full, or an incremental load. On the same topic, you might also want to take a look at my colleague Stewart Bryson’s recent article on the Oracle Technology Network, “Making the Move from Oracle Warehouse Builder to Oracle Data Integrator 12c”, where he takes an in-depth look at what’s involved in migrating from, and interoperating with, Oracle Warehouse Builder, and what’s in store for OWB developers when they upgrade to the new 12c release of ODI.

This is actually excellent timing, as we’re just about to launch our new ODI12c training, with our initial course being a three-day ODI12c bootcamp that’s running for the first time in Brighton, UK, from May 12th-14th 2014. Based on this latest release of Oracle Data Integrator, this three-day course assumes no prior knowledge and takes you through everything you need to know to get started with ODI12c.

From setting up the topology through to creating mappings, packages and load plans, this course features modules and labs covering many aspects of ODI 12c functionality. As with all of our courses, we bring our trainer to you and teach all of your team, together, how to make the most of Oracle’s premier data integration tool, with one of our experienced consultants leading the sessions and sharing their project experience.

This first run of the course will be taught by the course author, Oracle ACE Edel Kammermann, accompanied by Jerome Francoisse, our lead beta-tester for ODI12c and speaker at events such as Oracle Openworld and RMOUG training days. If you’ve been looking to get trained-up on the new 12c release of Oracle Data Integrator, this is an excellent opportunity to learn the basics in just three days, down in sunny Brighton in May! Course details are as follows:

  • Duration : 3 Days
  • Course Delivery : Instructor-led with labs, on-site at customer location
  • Who Should Attend : Developers, consultants, project managers, technical leads, DBAs
  • Prerequisites : None

Detailed Course Agenda :

  • Getting Started with ODI 12c
  • ODI Topology
  • ODI Projects
  • Models and Datastores
  • Data Quality in a Model
  • Introduction to ODI Mappings
  • ODI Procedures, Variables, Sequences, and User Functions
  • Advanced ODI Mappings
  • ODI Packages
  • Scenarios in ODI
  • The ODI Debugger
  • ODI Load Plans

To book a place on the course, running in Brighton, UK on May 12th-14th 2014, just click on this link - we’ll be running the course in the US shortly afterwards. Finally, if you’ve got any questions about this course or any of our other OBIEE, ODI, Oracle BI Apps or OWB courses, just drop us a line at training@rittmanmead.com.

Previewing TimesTen, Endeca and Oracle DW Sessions at the Brighton BI Forum 2014

May 1st, 2014 by

It’s under a week now to the first of the two Rittman Mead BI Forum 2014 events, with Brighton running next week at the Hotel Seattle and then Atlanta the week after, at the Renaissance Atlanta Midtown Hotel. Earlier in the week I went through a more detailed agenda for the Lars George Cloudera Hadoop Masterclass, and the week before Stewart covered off some of the Oracle sessions at the Atlanta event, but as the final preview in this series I just wanted to talk about three sessions running at next week’s Brighton event.


Someone I’ve got to know pretty well over the last year is Oracle’s Chris Jenkins, who’s the face of TimesTen development in the UK. I first ran into Chris, and his colleague Susan Cheung, late last year when I posted a series of articles on TimesTen vs. Essbase ASO, and then Chris presented alongside myself and Peak Indicators’ Tony Heljula at last year’s Oracle Openworld, on TimesTen on Exalytics Best Practices. Chris kindly agreed to come along to the Brighton BI Forum and share some of his tips and techniques on TimesTen development, and also answer some of the questions from members of the audience implementing TimesTen as part of their OBIEE setup. Over to Chris:

“Since the launch of Exalytics, TimesTen has been at its heart, delivering high performance access to relational data to support the ‘speed of thought’ experience. But it hasn’t all been plain sailing; each use case has its own specific challenges, and correct configuration, adopting best operational practice and properly tuning the TimesTen database to support the workload are essential to getting the best results. When working with customers I often come across situations where things are not set up quite as well as they might be or where a less than optimal approach has been adopted, and this can negatively affect performance or manageability.

In my session I will highlight the most common pitfalls and show how to avoid them. I will also discuss best practices for operation and data loading and look at how to optimise the TimesTen database for your workload. And of course there is the opportunity to ask questions! By the end of the session I hope that you will have a good understanding of how to get the best out of TimesTen for your particular use case.”


Another speaker speaking for the first time at the BI Forum is Truls Bergersen, but Truls will of course be well-known to the European user group community through his work with the Norwegian Oracle User Group, who run the excellent OUGN conference cruise each year around April. Truls has been working with Oracle’s BI and data warehousing tools for many years, but more recently has been taking a close look at Endeca Information Discovery, the search and data discovery tool Oracle added to their BI portfolio a couple of years ago. According to Truls …

“It’s been almost two and a half years now, since Oracle acquired Endeca, and in that period the tool has been given a few enhancements. E.g. improvements have been done to the look-and-feel of the UI, it has been added support for loading JSON and OBI presentation tables, and the tool can now be installed on Weblogic. My two favorite things, however, are the self service provisioning and eBS extensions.

The goal of my presentation is to give the audience a good overview of the tool from a data architect’s point of view, and how the tool fits in with and extends your existing BI platform. I will not go into details about installation and other too technical aspects, but rather look at the tool’s capabilities from a data point of view – how can Endeca utilize OBIEE and vice versa, what can be done in terms of self-service, etc.”


Finally, we’re really pleased to be joined by none other than Reiner Zimmerman, who heads-up Oracle’s Data Warehouse Global Leaders’ Program. Rittman Mead are one of the European partner sponsors of the DW Global Leaders Forum, which brings together the top customers and partners working with Oracle’s data warehousing, engineered systems and big data products several times a year in EMEA, the Americas and APAC.  Reiner’s also the person most likely to take the “last man standing” award from our own Borkur and Ragnar, so before that happens, over to Reiner:

“The DW & Big Data Global Leaders program is an Oracle development driven strategic customer program to establish a platform for Oracle DW and Big Data customers to discuss best practices and experience with Oracle Product Management and Product Development and our Associate Partners like Rittman Mead.

Our current focus is Big Data and Advanced Analytics and we seek to create best practices around Big Data architectures in terms of Manageability, High Availability and Monitoring Big Data Systems. Learn what the program is, what it can bring to you, how you can participate and what other customers are doing.”

The Rittman Mead Brighton BI Forum 2014 runs next week (May 7th-9th 2014) at the Hotel Seattle, Brighton, and there are still a few places left if you register now. Straight after, we’re going over to Atlanta to do it all again at the Renaissance Atlanta Midtown Hotel, with full details of the event agendas here, and the event homepage including registration instructions, here. Hopefully see some of you in Brighton or Atlanta later in May!
