OBIEE 11.1.1.9: New Features for Front-End Developers and Business Users

May 26th, 2015

OBIEE 11.1.1.9 was released this May, and Robin and Mark wrote great posts about its new features for system administrators and its new support for HiveServer2 and Cloudera Impala respectively. In this post, we will look at some useful new features for front-end development. Some of these features, like the treemap and the new styles, were included in the OBIEE 11.1.1.7.10 release, but I'm including them here anyway because if you are moving from an earlier release of 11.1.1.7 you probably haven't seen them yet.

Search & Sort options inside the Subject Area

There is a new option to search inside the subject area when we create an analysis. If we press the magnifying glass icon in the Subject Areas pane, a search box appears and OBIEE returns all the objects that contain the text we entered. We can also use the % wildcard.


In addition, there is an option to sort the objects inside the subject area, ordering both the folders and the presentation columns inside them. The default is to show the objects in the order in which they are saved in the repository.


Save Calculated Items in the Web Catalog

A very useful feature for business users is the ability to create a calculated item in an analysis and save it to the catalog, so it can be reused in other analyses for the same subject area.

In the Results tab there is a new icon for creating a new measure. The calculation is created in the same way as column formulas.


After the new measure is created you can go to the Criteria tab and, in the column pop-up menu, select the Save Column As option. In this manner you save the new measure as a new object in the catalog, to be reused in other analyses. The ability to create a new calculated measure for tables and pivot tables was included in 11.1.1.7.10, but the option to save the column to the catalog is new in OBIEE 11.1.1.9.


You can also change the properties of the new column in the Criteria tab, but be aware that conditional formatting and conditional action links for the column are not saved to the catalog; regular action links and format properties are saved with the column.

The saved column can be edited from the catalog pane and the catalog page. Any changes you make in the catalog are applied to every analysis that uses the column; if you modify it inside a particular analysis, those changes are valid only for the analysis you are working on.


To save new measures as columns you need the Save Column privilege.


Global Variables

In this version we find a new type of variable: the global variable. Global variables are defined in the context of one analysis and can be used in other analyses, which makes them useful for intermediate, reusable calculations.

To create a new global variable, select the Edit Formula option for the required column. When you press the Variable button in the Formula Editor you will see the new Global option. The Insert Global Variable dialog appears, and you can either select an existing global variable to use in the formula or create a new one. To create a new one you need to enter a name, type and value. If you want to use an expression such as Min(Revenue) as the value, the data type should be Text.


To reference a global variable you need to use the fully qualified name including the context: @{global.variables.variablename}.
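
For example, with a global variable named gv_min_revenue (both the variable and column names here are purely illustrative) holding the expression Min(Revenue), a column formula comparing each row's revenue with the overall minimum might look like this:

    "Base Facts"."Revenue" - @{global.variables.gv_min_revenue}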


To manage global variables you need the Manage Global Variables privilege.

New Visualisation: Treemap

There has been a new visualisation called treemap since OBIEE 11.1.1.7.10. A treemap groups the dimensional data selected in the analysis into tiles. By default, tile size is based on the first measure of the analysis and tile colour on the second measure; if the analysis has only one measure, it is used for both size and colour.


You can edit it as you would any other view, changing the group-by information, the measures that drive the size and colour of the tiles, and the range of colours to use. You can also choose between a style that shows the information in coloured bins and one that uses a continuous colour fill; if you select the former, you can also choose how many differently coloured bins to use.


New Preview Styles

When you want to preview an analysis in the Results tab, instead of immediately showing how the analysis will look, OBIEE offers you different styles in which to view it. This feature has also been available since 11.1.1.7.10. If you want to create a custom style, Oracle recommends using the new Skyros style as a starting point.


Horizontal Layout for Radio Button and Check Box Dashboard Prompts

Radio button and check box dashboard prompts can now be shown horizontally. When you create a new dashboard prompt using check boxes or radio buttons, you can choose between the Horizontal and Vertical layout options that appear under Options.


Enhancements in Export & Print Options

In this version you will find more options in the Print and Export Options dialog. You can choose whether to include charts, images and formatting, and specify column properties such as the column width and whether to wrap text in columns.


In summary, these features are simple but quite useful for business users and front-end developers, and give more flexibility to create better data visualisations.

 

Deploying Oracle BI Mobile HD within Good, Oracle OMSS and Other Secure MDM Containers

May 24th, 2015

A few months ago ODTUG announced their ODTUG Mobile Day in Utrecht, Netherlands and asked if I'd like to put forward an abstract for a presentation. Something that's been on my mind for a while is the issues some of our customers have been reporting trying to run Oracle BI Mobile HD with the Good Mobile Device Management toolkit, so I thought this might be a good opportunity to look into the topic in more detail and try to come up with some recommendations on how to get this working. As the ODTUG Mobile Day was for all Oracle developers, not just BI ones, I thought it'd be a good opportunity to look into iOS and Android development in general as it relates to Oracle, and in particular what's involved in deploying into the "container" environments that many large enterprise customers insist on when deploying mobile applications. The presentation is available on Slideshare if you want to download it, but I thought I'd expand on some of the concepts and tips in a separate blog post as it's all actually quite interesting.

As a bit of background, OBIEE itself can of course display its standard dashboards through mobile web browsers with graphics and charts switching to iOS and Android-friendly formats rather than trying to render in Flash. But standard dashboards and analyses look too small and are hard to work with in mobile browsers, so using this approach to mobile-enable your dashboards isn’t generally recommended. Instead of course, as most readers will already be aware, Oracle released a number of mobile clients for OBIEE over the years, with Oracle BI Mobile and Oracle BI Mobile App Designer available in the latest 11.1.1.9 and suiting most end-user and customer needs for mobile access to their business analytics data.


Oracle BI Mobile App Designer apps can be deployed using SSL and SSO, whilst BI Mobile HD takes advantage of iOS and Android platform security features such as password storage in encrypted on-device keystores and remote wipe of devices. But for some organizations that need higher levels of device control and data security, this platform-level security isn't enough, as:

  • Unknown and unsecured devices accessing their networks (for example, via VPN) is considered an unacceptable security risk
  • For Android devices in particular, there is known malware on the internet that can compromise "rooted" devices
  • It’s possible to cut-and-paste sensitive information from BI Mobile applications into other applications
  • Users don’t always set secure passwords, and lost or stolen devices can potentially expose BI data stored on the device to unauthorised users

To address these issues, a number of Mobile Device Management (MDM) vendors provide enterprise-level solutions, typically by having the customer specially sign the applications they wish to deploy, to indicate they've been authorised for use, and by deploying those applications within managed containers that keep them separate from the general mobile apps on the users' devices. As well as specially-signed and deployed applications such as BI Mobile HD, these MDM suites typically also provide secure, containerised web browsers and email clients, both of which need to work with Oracle BI and Oracle BI Mobile App Designer. Vendors in this space include Good Technology and MobileIron, and Oracle have their own MDM solution after acquiring Bitzer Mobile back in 2013. So how do these solutions work, why do some customers have trouble getting them working, and what's the "preferred" approach if a customer asks you to just get it working for them?

To start with the simplest approach, the tested and recommended way to run Oracle BI Mobile HD, and Oracle BI Mobile App Designer through an MDM-supplied web browser, is to use Oracle Mobile Security Suite (OMSS), based on the technology Oracle acquired from Bitzer Mobile. OMSS is a complete platform for deploying mobile apps in a secure, managed container environment, and takes customer-signed iOS and Android applications and deploys them onto enterprise users’ mobile devices using a centralized console and management service.


To support deployment of Oracle BI Mobile into OMSS and other vendors' MDM solutions, Oracle have made an un-packed and un-signed version of Oracle BI Mobile available for download on OTN as the "Oracle Business Intelligence Mobile Security Toolkit"; taking the iOS version as an example, you can then compile this in Apple Xcode and sign it for deployment in your organisation's OMSS installation. For some organisations this can be a bit of a challenge as they don't generally use Macs and don't do iOS development, but for Rittman Mead testing this out was pretty straightforward as we all use Macs for our consulting work and some of us play around with Xcode and iOS development in our spare time. What can also be a challenge is setting up an iOS Developer Account so that you can sign the BI Mobile HD application with your organization's own certificate, but we set up such an account a couple of years ago and were able to get it all set up with just a couple of hours' work. What you will need to get this running (apart from the OMSS part that I'll cover in a moment) are the following bits of hardware and software, in this case for the iOS version of BI Mobile HD:

  • An Apple Mac that can run recent versions of Xcode
  • An Apple Developer account that can develop and sign iOS applications, in your organization's name and not an individual developer's – note that you generally need to be registered with Dun and Bradstreet's business verification service to set this up, which can take a few weeks if your entry is out-of-date or doesn't match your current company details
  • Oracle Business Intelligence Mobile Security Toolkit
  • Apple Xcode


Then it’s a case of setting up a new project in Xcode, selecting Single View Application and Universal as the device type, entering your project and organization’s information, and then merging the Oracle Mobile Security Toolkit’s project files with the empty project you just created and setting any other project properties.


At this point you should be able to run the application within the Xcode project and test it using Xcode’s built-in iPhone and iPad simulators, so that you can check all the features still work as expected.


Now comes the point where you sign the app you've just created so that you can then deploy it into OMSS for distribution to your users. This point is important to security-conscious customers as it's now the customer, rather than Oracle, that has signed the app, and you can be much more certain that no malware or other backdoors have been introduced into the code before compiling and distribution. Signing of the app takes place within Xcode, with a similar process being used for the Android version of BI Mobile HD within the Android Studio IDE.
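
If you'd rather script the archive and signing steps than click through Xcode, something along these lines should work with recent versions of Xcode's command-line tools (the project, scheme and plist names here are purely illustrative):

    # Build an archive of the customised BI Mobile HD project
    xcodebuild archive -project BIMobileHD.xcodeproj -scheme BIMobileHD \
        -archivePath build/BIMobileHD.xcarchive

    # Export a signed .ipa using your organisation's distribution settings
    xcodebuild -exportArchive -archivePath build/BIMobileHD.xcarchive \
        -exportOptionsPlist ExportOptions.plist -exportPath build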


The resulting compiled .app and .apk files are then uploaded into OMSS's catalog for distribution to users, with provisioning taking place via emails sent to corporate users that provide download links to these containerised, managed and secured versions of Oracle's BI Mobile HD app.


So all of this looks pretty straightforward, albeit maybe a bit more complicated for organisations that don't use Macs and don't generally develop iOS applications – but Oracle partners such as ourselves can help with this if you'd like to offload this part of the process to a specialist team. Where things do sometimes get a bit more complicated is when other MDM vendors' technologies are used, particularly the Good Dynamics MDM solution, which works in a slightly different way to Oracle Mobile Security Suite.

Unlike OMSS’s approach where it has you compile and sign BI Mobile HD within Apple’s own iOS application, Good requires you to build and export the unsigned Oracle Mobile Security Toolkit project in Xcode as an .ipa file, and then copy it along with your iOS Developer Program certificate and the certificate password into Good’s own Good Control Management Console. There your application is then combined with Good’s security libraries, signed with your certificate password and deployed as a “wrapped application” to then be distributed to users using a similar method to the one OMSS takes; however all mobile application access then goes through a Good proxy server, typically placed outside the main company network and providing secure communications between these managed applications running outside of the company firewall into that company’s secure servers – in this case, OBIEE11g.


There’s nothing inherently wrong with this compared to how OMSS does it, and organisations often pick Good Dynamics over other MDM solutions because of the extra functionality Good’s approach provides through the insertion of their security SDK into your mobile application; but its when organisations take advantage of these features to provide custom integration with their own security platform that problems can sometimes occur.

For example, a common issue we hear about when deploying Oracle BI Mobile HD using Good is when the customer tries to integrate their SSO solution into the user authentication process. Good's security SDK makes it possible to intercept user login events and route the request to the customer's SSO server, but it's essential that control is passed back to the BI Server as if this re-routing hadn't taken place, returning the authentication details the BI Server expects; if the custom login process doesn't quite do this, authentication fails. Another issue we heard about recently involved recent versions of iOS (iOS 7) deprecating synchronous API calls that BI Mobile HD was still making; in this case Oracle supplied a patch and all calls are now made asynchronously, but until then deployment in the Good environment mysteriously failed.

What makes these issues doubly tricky to identify and resolve is the restrictions most security-conscious enterprise customers place around disclosing details of their network setup, to the point where they often aren't allowed to tell partners or Oracle Support the details needed to work out how traffic passes around the network and over to OBIEE and the Good MDM environment. What troubleshooting often comes down to, then, is good old-fashioned packet-sniffing and investigation by someone familiar with OBIEE (in particular, the BI Server's authentication process), Good's security SDK and the customer's network setup.


So given all of this, what is our recommendation for customers looking to implement mobile OBIEE11g clients within an MDM container solution? My advice would be: where the customer doesn't currently have an MDM solution and wants the easiest way to deploy Oracle BI Mobile within a secure container, go for the Oracle Mobile Security Suite option – it's what Oracle support and test for, and as an Oracle solution it's obviously going to be easier to go down this route than to troubleshoot another vendor's solution if things go wrong. But if an organization is insisting on deploying Oracle BI Mobile in a secure container, it's unlikely this is the first time they've had to do it, so there's most probably already an MDM solution in place and it's likely to be from Good.

In this case, first of all remember that it should work and the two products are compatible; what you need to be aware of, though, is the correct way of linking BI Mobile HD in this environment to your corporate SSO and security platform, and to work together with your network security team to quickly identify where the issue is coming from if things don't work first time. Engaging with an experienced OBIEE partner such as Rittman Mead can of course help, and Oracle's own product development and support teams have most probably seen most of the issues that can come up and can help if things get tricky. The team here at Rittman Mead have several customers successfully using Good and other vendors' MDM solutions along with Oracle BI Mobile, and of course we can help with the app signing and deployment process if your organization doesn't usually work with Macs or have experience with Xcode and the Oracle Mobile Security Toolkit.

Finally, thanks to Chris Redgrave from the Rittman Mead team, and Oracle's Matt Milella and Jacques Vigeant, who helped me with the background research for this article and the ODTUG BI Mobile Day presentation. As I mentioned earlier, the presentation from the ODTUG event is available on Slideshare, and there are also walkthroughs for deploying BI Mobile HD within Oracle OMSS on iOS and Android on the OTN website.

New Oracle Magazine article on Oracle BI Cloud Service

May 24th, 2015

The May/June 2015 edition of Oracle Magazine is now out, and my Business Intelligence article in this edition is on Oracle BI Cloud Service (BICS). In "Upload, Model, Analyze and Report" I focus on the "departmental power-user" use-case, where someone with a small team wants to share data and dashboards with others in the department, is familiar with OBIEE but wants to get something up-and-running quickly without having to involve IT or the formal development process. By just uploading a spreadsheet of data and quickly modeling it into a star schema using BICS's simple-to-use web-based tools, you can create reports that can be shared with others in your team using OBIEE's familiar dashboard interface.

Rittman Mead offer a number of services around Oracle BI Cloud Service and now have our own BICS "pod" for demonstration and evaluation purposes. Now that Oracle have released Visual Analyser early for BICS, we're expecting a lot of interest and demand for services, support and training around Oracle's cloud version of OBIEE, so if you're interested in moving part of your OBIEE estate into the cloud, or you're a departmental BI lead looking to run OBIEE within your department without the need to get IT involved, drop us a line at enquiries@rittmanmead.com and we'll be pleased to help.

Connecting OBIEE 11.1.1.9 to Hive, HBase and Impala Tables for a DW-Offloading Project

May 22nd, 2015

In two previous posts this week I talked about a client request to offload part of their data warehouse to Hadoop, taking data from a source application and loading it into Hive tables on Hadoop for subsequent reporting by OBIEE11g. In the first post I talked about hosting the offloaded data warehouse elements on Cloudera Hadoop CDH5.3, how I used Apache Hive and Apache HBase to support insert/update/delete activity on the fact and dimension tables, and how we'd copy the Hive-on-HBase fact table data into optimised Impala tables stored in Parquet files to make sure reports and dashboards ran fast.


In the second post I got into the detail of how we’d keep the Hive-on-HBase tables up-to-date with new and changed data from the source system, using HiveQL bulk-inserts to load up the initial table data and a Python script to handle subsequent inserts, updates and deletes by working directly with the HBase Client and the HBase Thrift Server. Where this leaves us at the end then is with a set of fact and dimension tables stored as optimised Impala tables and updatable Hive-on-HBase tables, and our final step is to connect OBIEE11g to it and see how it works for reporting.


As I mentioned in another post a week or so ago, the new 11.1.1.9 release of OBIEE11g supports Cloudera Impala connections from Linux servers to Hadoop, with the Linux Impala drivers being shipped by Oracle as part of the Linux download and the Windows ones used for the Admin Tool workstation downloadable directly from Cloudera. Once you’ve got all the drivers and OBIEE software setup, it’s then just a case of setting up the ODBC connections on the Windows and Linux environments, and you should then be in a position to connect it all up.
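
On the Linux server, for example, the Impala DSN in odbc.ini ends up looking something like this – a sketch only, as the driver path, host name and exact keys depend on the version of the Cloudera driver you install:

    [ODBC Data Sources]
    impala_dsn=Cloudera ODBC Driver for Impala

    [impala_dsn]
    Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
    HOST=bigdatalite.localdomain
    PORT=21050
    Database=default
    AuthMech=0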


On the Impala side, I first need to create a copy of the Hive-on-HBase table I've been using to load the fact data from the source system, after running the INVALIDATE METADATA command to refresh Impala's view of the Hive metastore.
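
In impala-shell, that looks something like this (the table names are illustrative ones I'll use throughout these examples):

    INVALIDATE METADATA;
    CREATE TABLE flight_delays_fact STORED AS PARQUET
    AS SELECT * FROM hbase_flight_delays;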

Next I import the Hive-on-HBase tables and the Impala table through the Impala ODBC connection – even though only one of the tables (the main fact table snapshot copy) was created using Impala, I still get the Impala speed benefit for the other three tables created in Hive (against the HBase source, no less). Once the table metadata is imported into the RPD physical layer, I can then create a business model and subject area as I would normally, so my final RPD looks like this:

[Image: the final RPD, with the imported tables mapped into a business model and subject area]

Now it’s just a case of saving the repository online and creating some reports. If you’re using an older version of Impala you may need to disable the setting where a LIMIT clause is needed for every GROUP BY (see the docs for more details, but recent (CDH5+) versions will work fine without this). Something you’ll also need to do back in Impala is compute statistics for each of the tables, like this:

Fetched 1 row(s) in 0.01s

Apart from being generic good practice and giving the Impala query optimizer better information with which to form a query plan, computing statistics matters because you might hit the error below in OBIEE if you don't do it.

[Image: OBIEE error message caused by missing table statistics]

If you do hit this error, go back to the Impala Shell or Hue and compute statistics, and it should go away next time. Then, finally, you can go and create some analyses and dashboards and you should find the queries run fine against the various tables in Hadoop, and moreover the response time is excellent if you use Impala as the main query engine.


I did a fair bit of testing of OBIEE 11.1.1.9 running against Cloudera Impala, and my findings were that all of the main analysis features worked (prompts, hierarchies, totals and subtotals etc.) and the response time was comparable with a well-tuned data warehouse, maybe even Exalytics levels of speed. If you take a look at the nqquery.log file for the Impala SQL queries OBIEE is sending to Impala, you can see they get fairly complex (which is good, as I didn't hit any errors when running the dashboards), and you can also see where the BI Server takes a simpler approach to creating subtotals, nested queries etc. compared to the GROUP BY … GROUPING SETS that you get when using a full Oracle database.

Not bad though for a data warehouse offloaded entirely to Hadoop, and it’s good to see such a system handling full updates and deletes to data as well as insert appends, and it’s also good to see OBIEE working against an Impala datasource and with such good response times. If any of this interests you as a potential customer, feel free to drop me an email at mark.rittman@rittmanmead.com, or check-out our Big Data Quickstart page on the website.

Loading, Updating and Deleting From HBase Tables using HiveQL and Python

May 21st, 2015

Earlier in the week I blogged about a customer looking to offload part of their data warehouse platform to Hadoop, extracting data from a source system and then incrementally loading it into HBase and Hive before analysing it using OBIEE11g. One of the potential complications for this project was that the fact and dimension tables weren't append-only; Hive and HDFS are generally considered write-once, read-many systems, where data is inserted or appended into a file or table but generally can't then be updated or overwritten without deleting the whole file and writing it again with the updated dataset.

To get around this problem we loaded our incoming data into HBase tables; HBase is a NoSQL key/value-store database that also runs on Hadoop and HDFS but permits update and delete operations on rows as well as selects and inserts. Later on we took the main fact table stored in Hive-on-HBase and copied its contents into Impala to considerably improve the response time of queries against this table and the still-Hive-on-HBase dimension tables. But going back to the insert, update and delete operations on the HBase tables: how exactly does this work, and what's the most efficient way to do it?

Taking a step back for a moment, HBase is a NoSQL, key/value-type database where each row has a key (for example, "SFO" for San Francisco airport) and then a number of columns, grouped into column families. In the Flight Delays dataset that we used in the previous blog post, an HBase table of origin airports might have a few thousand entries, with each entry, or row, keyed on a particular airport code like this:

[Image: example HBase rows keyed on airport code]

(Note that at the start, these key values won’t be there – they’re more for illustrative purposes)

At the time of HBase table definition, you specify one or more "column families". These are group headers for columns you might add later, and in the case of my origin airport table I might just use the column family name "origin", so that the HBase table DDL looks like this:
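
A sketch of that DDL in the HBase Shell, with an illustrative table name:

    create 'geog_origin', 'origin'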

and the conceptual view of the table would look like this:

[Image: conceptual view of the newly-created HBase table]

Now what’s neat about NoSQL-style databases like this (and Endeca Server is the same) is that you can define individual columns just by using them. For example, I could create columns for the airport name, airport city, airport state and airport code just by using their name in a data load, prefixing those column names with the named of a previously-defined column family. Using the HBase Shell, for example, I could issue the following PUT commands to insert the first row of data into this HBase table, like this:

Now my HBase table conceptually looks like this:

[Image: conceptual view of the HBase table after the first row is added]

If I then want to use another column under the “origin” column family for LAX, I can just do so by using it in the next set of PUT commands, like this:
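
In this sketch, note the origin:country column, used here for the first time without any prior definition:

    put 'geog_origin', 'LAX', 'origin:airport_name', 'Los Angeles International'
    put 'geog_origin', 'LAX', 'origin:city', 'Los Angeles'
    put 'geog_origin', 'LAX', 'origin:state', 'California'
    put 'geog_origin', 'LAX', 'origin:country', 'USA'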

[Image: conceptual view of the HBase table with the new column added]

Each column within a column family has its values individually set, retrieved and deleted using PUT, GET and DELETE commands, and as long as you prefix the column name with one of the previously-defined column-family names and provide the key value for the row you're interested in, HBase database tables are very flexible; they were designed for simple product catalog-type applications running on hundreds of sharded server nodes for companies the likes of Amazon, Google and Facebook (see this HBase "Powered By" page for more examples of organizations using HBase).

But what HBase very much isn't is a relational database like Oracle, Microsoft SQL Server or even Apache Hive, the databases we're much more likely to store data warehouse-type data in. In the previous post I showed how Hive table structures can in fact be put over HBase tables, mapping HBase columns to Hive columns, and how HiveQL INSERT INTO TABLE … SELECT commands can then be used to bulk-load these HBase tables with initial sets of data. So back to the original question – what's the best way to then incrementally load and refresh these HBase tables, and can I still use HiveQL for this?

In my original post, I defined Hive tables over my HBase ones using the Hive-on-HBase (yum install hive-hbase) package and its associated Hive storage handler; for example, the Hive table that provided SQL access over the flight_delays HBase table was defined like this:
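
A sketch of that definition – the Hive column names are illustrative, with the dims and measures column families holding the dimension and measure columns respectively:

    CREATE EXTERNAL TABLE hbase_flight_delays
      (key string, year string, carrier string, orig string, dest string,
       flights int, late int, cancelled int)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' =
      ':key,dims:year,dims:carrier,dims:orig,dims:dest,measures:flights,measures:late,measures:cancelled')
    TBLPROPERTIES ('hbase.table.name' = 'flight_delays');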

The underlying HBase table is defined with a key and two column families, one for dimension columns and one for fact (measure) ones; the key is a sequence number that I added to the source dataset to give each row a unique identifier.

To initially populate the table, I've created another Hive table with the initial set of source data in it, and I just insert its values into the Hive-on-HBase table, like this:
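
With an illustrative staging table name, the bulk insert is just:

    INSERT INTO TABLE hbase_flight_delays
    SELECT * FROM flight_delays_initial_load;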

This initial load of 200,000 rows took 40 seconds; not bad, and certainly acceptable for this particular project. Imagine now that for every day after this we typically added another 500 or so flight records; in regular Hive this would be straightforward, and we'd use the LOAD DATA or INSERT INTO TABLE … SELECT commands to add the new file data to the Hive table's underlying HDFS directories. And we can do this with the Hive-on-HBase table too, with the INSERT INTO TABLE command adding the incoming data as new rows/cells in the HBase table. Let's check the row count and min/max ID values in the Hive-on-HBase table at the start, like this:
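
Casting the string key to an integer so that min and max sort numerically, the check looks something like this:

    SELECT COUNT(*), MIN(CAST(key AS int)), MAX(CAST(key AS int))
    FROM hbase_flight_delays;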

I can see that there’s 200,000 rows in the HBase table, starting at key value 1 and ending at key value 200,000. The table containing new data has key values going from 200,001 to 200,500, so let’s insert that new data into the Hive-on-HBase table:

Not bad – 26 seconds for the 500 rows, not quite as fast as the initial load but acceptable. Let’s also check that the data went in OK:

As I’d hoped, the number of rows has increased by 500 and the maximum key value is now 200,500. But how do we apply updates to the data in the table? I’ve got another source table that this time contains 1,000 randomly-selected rows from the initial data load dataset, where I’ve set the LATE column value to ‘999’:

In fact the way you apply these updates is just to INSERT INTO TABLE … SELECT again, and the incoming values create new versions of existing rows/cells where needed. Some versions of HBase automatically keep a number of versions of each cell value (typically three), but the version of HBase that comes with CDH5.2 and higher keeps only one version by default (you can increase this number per table, or system-wide, using the steps in the CDH5.2 release notes). Let's try this out now, first using the HBase shell to see the values and timestamps currently held for one particular key value I know should be updated by the next dataset:
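
Using key value 31 as an example row (the key chosen here is illustrative):

    get 'flight_delays', '31'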

I’ll now use Hive to apply the updates, like this:

Notice how this third INSERT didn't create any new rows: the max key ID in the follow-up query hasn't increased since the previous insert of new data. Querying one of the rows that I know was changed by this new table of data updates, I can see that the LATE column value has been changed:
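
Querying the same example row as before:

    SELECT key, late FROM hbase_flight_delays WHERE key = '31';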

Let’s go into the HBase shell now and take a look at the columns cells for that same key ID:

Notice how the timestamp for each of the cells has now updated? If I had more than the default one version of each cell enabled, I could query the previous versions to see the old values and timestamps. So this works pretty well, and all I need to do is use HiveQL and INSERT INTO TABLE … SELECT to initially populate, append to and even update values in the table. But what if I want to update HBase more "programmatically", maybe as part of a process that reads directly from a source application (for example, Salesforce or a web service) and then writes directly into HBase without the intermediate step of landing the incoming data in a file? For this we can use the HBase Client API, for which there are libraries in many languages, the most popular being the Java API. If Java is too much, though, and you'd rather interact with HBase using a language such as Python, then as this Cloudera blog post explains you can use either a REST API interface to HBase or one using the Thrift interface, and work with languages such as Python.

In my case, my preferred way of working programmatically with HBase is to use Python and a developer library called Happybase, where I can also bring in other libraries, such as ones to work with Hive and even ones to work with OBIEE and Fusion Middleware, and do my work at a much higher level of abstraction. To show how this might work, I'm going to use Python, the HBase Client API and Happybase to programmatically read from my update Hive tables (in real life I'd probably connect directly to a web service if going down this more complicated route) and write a routine that reads rows from the Hive table and loads them into HBase.

Again I’m using the Oracle Big Data Lite 4.1 VM which has Python 2.7.6 already installed, and to get ready to install the Happybase library I first need to install pip, the “preferred installer program” for Python. As per the pip installation instructions, first download pip and then install it from the command-line:

Then use pip to install Happybase:
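
    sudo pip install happybase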

Whist you’re there you might as well install “pyhs2”, another python package that in this case lets us easily connect to Hive tables via the HiveServer2 interface found on CDH5+ and the Big Data Lite 4.1 VM.

Now I can put together a Python program such as the one below that, in this case, creates a connection to a Hive table, selects all rows from it into a cursor and then PUTs these rows into the HBase table, via a batch process that sends data to HBase through the Thrift interface every 10,000 rows:
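
A sketch of such a routine follows, using the illustrative table names from earlier, the default HiveServer2 (10000) and HBase Thrift (9090) ports, and the Big Data Lite demo credentials; error handling is omitted for brevity:

    import happybase
    import pyhs2

    HIVE_HOST  = 'bigdatalite.localdomain'   # illustrative host names
    HBASE_HOST = 'bigdatalite.localdomain'
    BATCH_SIZE = 10000

    # Connect to the HBase Thrift server and point at the target table
    hbase_conn = happybase.Connection(HBASE_HOST, port=9090)
    hbase_table = hbase_conn.table('flight_delays')

    # Connect to Hive via HiveServer2 and select the rows to be applied
    with pyhs2.connect(host=HIVE_HOST, port=10000, authMechanism='PLAIN',
                       user='oracle', password='welcome1',
                       database='default') as conn:
        with conn.cursor() as cur:
            cur.execute('select key, year, carrier, orig, dest, '
                        'flights, late, cancelled from flight_delays_updates')

            # PUT the rows into HBase, flushing to the Thrift server
            # every BATCH_SIZE rows
            batch = hbase_table.batch(batch_size=BATCH_SIZE)
            for key, year, carrier, orig, dest, flights, late, cancelled in cur.fetch():
                batch.put(str(key), {
                    'dims:year':          str(year),
                    'dims:carrier':       str(carrier),
                    'dims:orig':          str(orig),
                    'dims:dest':          str(dest),
                    'measures:flights':   str(flights),
                    'measures:late':      str(late),
                    'measures:cancelled': str(cancelled)})
            batch.send()   # flush any remaining puts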

which I can then run from the command-line like this:
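
Assuming the program was saved as load_updates_into_hbase.py:

    python load_updates_into_hbase.py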

As I said, using this approach I could just as easily connect to a web service or read in data via Flume or Kafka, and I can delete rows as well as insert and update them, and add any other logic. From my testing it's not all that much faster than going via HiveQL and INSERT INTO TABLE … SELECT scripts (most probably because I'm still going into HBase indirectly, via the Thrift interface), but it does offer the possibility of direct inserts into HBase (and therefore Hive) from the source application, without the intermediate step of writing files to disk.

So to finish this short series, tomorrow I’ll look at how well these Hive-on-HBase tables, and the Impala table I created in the previous example, work when queried from OBIEE11g. Back tomorrow.
