Thoughts on Running OBIEE in the Cloud : Part 2 - Data Sources and ETL

In yesterday’s post on running OBIEE in the cloud, I looked at a number of options for hosting the actual OBIEE element; hosting it in a public cloud service such as Amazon EC2, using Oracle’s upcoming BI-as-a-Service offering, or partner offerings such as our upcoming ExtremeBI in the Cloud service. But the more you think about this sort of thing, the more you realise that the OBIEE element is actually the easy part - it’s what you do about data storage, security, LDAP directories and ETL that makes things more complicated.

Take the example I gave yesterday where OBIEE was run in the cloud, with the multi-tenancy option enabled, the main data warehouse in the cloud, and data sourced from cloud and on-premise sources.

NewImage

In this type of setup, there’s a number of things you need to consider beyond how OBIEE is hosted. For example:

  • If your corporate LDAP directory is on-premise, how do we link OBIEE to it? Or does the LDAP server also need to be in the cloud?
  • What sort of database do we use if we’re hosting it in the cloud. Oracle? If so, self-hosted in a public cloud, or through one of the Oracle DB-in-the-cloud offerings?
  • If not Oracle database, what other options are available?
  • And how do we ETL data into the cloud-based data warehouse? Do we continue to use a tool like ODI, or use a cloud-based option - or even a service such as Amazon’s AWS Data Pipeline?

What complicates things at this stage in the development of “cloud”, is that most companies won’t move 100% to cloud in one go; more likely, individual application and systems might migrate to the cloud, but for a long time we’ll be left with a “hybrid” architecture where some infrastructure stays on premise, some might sit in a public cloud, others might be hosted on third-party private clouds.  So again, what are the options?

Well Oracle’s upcoming BI-as-a-service offering works at one extreme end-of-the-spectrum; the only data source it’ll initially work with is Oracle’s own database-as-a-service, which in its initial incarnation provides a single schema, with no SQL*Net access and with data instead uploaded via a web interface (this may well change when Oracle launch their database instance-as-a-service later in 2014). No SQL*Net access means no ETL tool access though, in practice, as they all use SQL*Net or ODBC to connect to the database, so this offer to my mind is aimed at either (a) small BI applications where it’s practical to upload the data via Excel files etc, or (b) wider Oracle Cloud-based systems that might use database-as-a-service to hold their data, Java-as-a-service for the application and so forth. What this service does promise though is new capabilities within OBIEE where users can upload their own data, again via spreadsheets, to the cloud OBIEE system, and have that “mashed-up” with the existing corporate data - the aim being to avoid data being downloaded into Excel to do this type of work, and with user metrics clearly marked in the catalog so they’re distinct from the corporate ones.

But assuming you’re not going for the Oracle cloud offer, what are the other options over data? Well hosting OBIEE in the cloud is conceptually no different from hosting anywhere else, in that it can connect to various data sources via the various connection methods, so in-principle you’ve got just the same options open to you if running on premise. But the driver for moving OBIEE into the cloud might be that your applications, data etc are already in the cloud, and you might also be looking to take advantage of cloud features in your database such as dynamic provisioning and scaling, or indeed use one of the new cloud-native databases such as Amazon Redshift.

NewImage

I covered alternative databases for use with OBIEE a few months ago in a blog post, and Amazon Redshift at the time looked like an interesting option; based on ParAccel, a mature analytic database offering, column-store and tightly integrated in with Amazon’s other offerings, a few customers have asked us about this as an option. And they’re certainly interesting - in practice, not all that different in pricing to Oracle database as a source but with some interesting analytic features - but they all suffer from the common issue that they’re not officially supported as data sources. Amazon Redshift, for example, uses Postgres-derviced ODBC drivers to connect to it, but Postgres itself isn’t officially supported as a source, which means you could well get sub-optimal queries and you certainly won’t get specific support from Oracle for that source. But if it works for you - then this could be an option, along with more left-field data source such as Hadoop.

But to my mind, it’s the ETL element that’s the most interesting, and most challenging, part of the equation. Going back to Openworld, Oracle made a few mentions of ETL in their general Cloud Analytics talks, including talk about an upcoming data source adapter for the BI Apps that’ll enable loading from Fusion Apps in the cloud, like this:

NewImage

There were also a number of other deployment options discussed, including hybrid architectures where some sources were in the cloud, some on-premise, but all involved running the ETL elements of the BI Apps - ODI or Informatica - on-premise, the same way as installs today. And to my mind, this is where the Oracle cloud offering is the weakest, around cloud-based and cloud-native ETL and data integration - the only real option at the moment is to run ODI agents in the cloud and connect them back to an on-premise ODI install, or move the whole thing into the cloud in what could be quiet a heavyweight data integration architecture.

Other vendors are, in my opinion, quite a way further forward with their cloud data integration tools strategy than Oracle, who instead seem to be still focused on on-premise (mainly), database-to-database (mostly) ETL. To take two examples; Informatica have an Informatica Cloud service which appears to be a platform-as-a-service play, with customers presumably signing-up for the service, designing their ETL flows and then paying for what they use, with a focus on cloud APIs as well as database connectivity, and full services around data quality, MDM and so forth.

NewImage

Another vendor in this space is SnapLogic, a pure-play cloud ETL vendor selling a component-based product with a big focus on cloud, application and big data sources. What’s interesting about this and other similar vendor’s approaches though are they they appear to be “cloud-first” - written for the cloud, sold as a service, as much focused on APIs as database connectivity - a contrast to Oracle’s current data integration tools strategy which to my mind still assumes an on-premise architecture. What’s more concerning is the lack of any announcement around ETL-in-the-cloud at the last Openworld - if you look at the platform-as-a-service products announced at the event, whilst database, BI, documents, BPM and so forth-as-a-service were announced, there was no mention of data integration:

NewImage

What I’d like to see added to this platform in terms of data integration would be something like:

  • On-demand data integration, sold as a service, available as a package along with database, Java and BI
  • Support for Oracle and non-Oracle application APIs - for example Salesforce.com, Workday and SAP - see for example what SnapLogic support in this area.
  • No need for an install - it’s already installed and it’s a shared platform, as they’re doing with OBIEE
  • Good support for big data, unstructured and social data sources

I think it’s pretty likely this will happen - whilst products such as the BI Apps can have their ETL in the cloud, via ODI in the BI Apps 11g version for example, these are inherently single-tenant, and I’d fully expect Oracle plan at some time to offer BI Apps-as-a-service, with a corresponding data integration element designed from the ground-up to work cloud-native and integrate with the rest of Oracle's platform-as-a-service offering.

So there we have it - some thoughts on the database and ETL elements in an OBIEE-in-the-cloud offering. Keep an eye on the blog over the next few months as I built-out a few examples, and I’ll be presenting on the topic at the upcoming BIWA 2014 event in San Francisco in January - watch this space as they say.