The vast majority of our customers use Oracle Database as the underlying data source for OBIEE 11g, sometimes on Exadata but almost always the Enterprise Edition of the database. OBIEE typically sells best into customers that already extensively use Oracle products, and often it's often bundled-in with a wider technology or apps deal including database and middleware products. But for some customers, the cost of the Oracle Database as a data source can be prohibitive and in many cases, they're also looking at purchasing and deployment options that don't involve upfront perpetual license deals and running the software on-premise. One of our longest-standing customers came to us with just such a scenario a few weeks ago, and so I thought it'd be interesting to take a look at what alternatives there are to licensing Oracle Database Enterprise Edition, and whether running these alternatives in the cloud was a practical option.
To set the context: imagine that you wanted to deploy a small-to-medium sized data warehouse on two servers:
- A "production" server with 64GB RAM, 4 x 4 core servers, Intel x86_64 architecture and running Oracle Linux 6 (for licensing purposes, 8 processors)
- A "dev/test" server with the same spec, onto which we'd install two separate databases
For now we'll ignore the ETL element, but if we were licensing Oracle Data Integrator Enterprise Edition, we'd need to license it for these two servers too at around $23k/processor.
So what are the alternatives to Oracle Database Enterprise Edition as a data source? To my mind, there are three main options:
- Trading down to fewer processors but keeping Oracle Database EE, or going down to a lower-specced version such as Standard Edition or Standard Edition One
- Using an alternative RDBMS, probably either open-source or a "community edition"
- The left-field option - using Hadoop, NoSQL, Hive and the other new "big data" technologies.
Let's take a look at the Oracle options first of all. Based on 8 processors (4 x 4 Intel x86_64 cores, x0.5 multiplication factor), the cost of licensing Oracle DB EE + commonly-used database options based on the current Tech Price List would be as follows (at list price):
- Oracle Database Enterprise Edition ($47.5k x 8 = $380k)
- Partitioning Option ($11.5k x 8 = $92k)
- EM Tuning Pack ($5k x 8 = $40k)
- EM Diagnostics Pack ($5k x 8 = $40k)
Giving us a total of $552k for 8 processors (covering both of our servers). Add onto that the standard 22% support and software updates, and you're looking at an initial payment of $673k or the perpetual license, and an ongoing $121k/year for annual maintenance. Factor in a DBA and the hardware as well, and you'll probably not get much change out of $800k in year one, and $250k in the years afterwards.
You could switch-down to one of the cheaper editions of Oracle Database, which also don't come with the add-on packs, which could take you down to $5.8k/processor for SE1 or $17.5k for SE, but you'll then have to do without most of the high-end data warehousing features in the Oracle database, and work within socket restrictions when it comes to SE1. Now of course you get what you pay for with Oracle Database, including potentially lower ongoing admin costs because much of the tuning and optimisation work is either unnecessary or automated, but if price is the issue and the alternative is a cut-down version of Oracle, what else is on offer - and that works with OBIEE?
If you start to look around at the alternatives to Oracle, and assuming you're discounting products like IBM DB/2 or Microsoft SQL Server, you'll find there are two main "conventional" types of RBDMS that you could use with OBIEE:
- General purpose RBDMS's, such as mySQL and PostgreSQL
- "Analytical" databases, typically in-memory, column-store and shared-nothing, such as Greenplum and Vertica
If you look under the covers with the analytical databases, in most cases you'll find that the core RBDMS technology they use is in fact PostgreSQL, with the (open-source) code then forked and adapted to work with the particular technology angle each vendor chooses to emphasise. Most developers tend to think that PostgreSQL is a more "enterprise-scale" free RDBMS than mySQL, and so given that it also provides the core DB technology in the other products, how might this work with OBIEE?
As a bit of background, PostgreSQL is an open-source database that's of a similar vintage to Oracle Database, and you can download it for free or install it as part of most Linux distributions. Many developers swear by PostgreSQL itself as being all they need for data warehouse scenarios, but a couple of vendors have taken the PostgreSQL code and commercialised it, adding their own management tools and support options around the core open-source code. The most famous of these is EnterpriseDB, who package up PostgreSQL into a number of options one of which is called Postgres Plus Advanced Server. If you're looking for an open-source based, low-cost but mature RBDMS engine for your database, this is more or less the flagship option, so I set about creating a new OBIEE 188.8.131.52 VM for testing purposes, with the aim of using Postgres Plus Advanced Server as my data source.
In fact, the first issue you hit when trying to go "non-Oracle" for the database, is that you need a Oracle, SQL Server or IBM DB/2 database to install the RCU schemas into. PostgreSQL isn't an option, but in reality most customers using OBIEE will have a small Oracle database available somewhere, so I cheated here and installed the RCU schemas into a spare Oracle database, and then went on with the install.
Installing Postgres Plus Advanced Server itself was pretty straightforward, with the installer separately downloading the core PostgreSQL files and the Apache HTTP server used for its web-based admin tools. On Windows, once you've installed the software there's a new Start Menu entry with the EnterpriseDB-specific (value-add) tools, including a console-based Enterprise Manager-type tool called pgAdmin.
Opening up pgAdmin gives you an environment that's similar to that you'd find in tools such as TOAD, SQL*Developer and the old Java thick-client based Enterprise Manager, and using the tool I was able to connect to my PostgreSQL database, create a few tables, and run some test SQL queries.
It's pretty basic stuff compared to Oracle's Enterprise Manager product, but it may be all you need if you're looking for a tactical, or departmental solution. EnterpriseDB price Postgres Plus Advanced Server at just under $5k/processor/year for the core database (one processor = one socket, however many cores), with their Enterprise Manager product costing $1500/year for two monitored hosts, and their "Solution Pack" another $5k/year, so based on the same server sizing as we used for the Oracle costings, the price of an EnterpriseDB solution looks a bit like this:
- Postgres Plus Advanced Server ($5k x 8 = $40k)
- Enterprise Manager EM Tuning Pack ($1.5k)
- Solutions Pack ($5k)
Coming out at around $47k/year for the two servers. Now the Oracle licenses were perpetual not annual, but even so, the total cost of the on-premise PostgreSQL solution is about 1/3rd of just the annual maintenance cost of the Oracle software, let alone the software license cost, so it's considerably cheaper. So what don't you get with PostgreSQL - either core, or commercialised via EnterpriseDB, that you get with Oracle?
Now as a disclaimer, I'm mainly an Oracle person and can't really speak with any great depth on PostgreSQL, but the major things that are missing in PostgreSQL compared to Oracle Database 11g are:
- Proper, in-built support for parallel query - PostgreSQL has an open-source GridSQL project but this looks more like a federated, shared-nothing solution rather than PQ within a single server
- Built-in ETL in the form of Oracle Warehouse Builder
- Anything like Oracle's built in MOLAP server, Advanced Analytics and the like
- All the DR options, security options and so forth
And no doubt the core DW features - bitmap indexing, cost-based optimiser, support for VLDBs and so on - are not nearly as developed, and easy to manage, as with the Oracle database. But - given that not all customers need these features, and that many customers even on 10g and 11g are still using the database as if it were Oracle 7 or 8i - lack of such features may not be a show-stopper if money is tight.
Now so far we've been talking about running PostgreSQL "on-premise", more or less a a direct substitute for Oracle Database; however, it's also possible to run Postgres Plus Advanced Server on Amazon's AWS cloud platform, with no local install of the software, simplified administration but access to the same core Postgres Plus Advanced Server features billed at an hourly rate, giving you an architecture like the diagram below:
EnterpriseDB and AWS provide a free 24-hour trial, so I set up another database this time on the Amazon cloud, provisioned it and created the same set of tables.
Connecting to the cloud-based Postgres Plus database was the same as connecting to a regular, on-premise one, and in fact this is more-or-less just commercial PostgreSQL on Amazon AWS. Instead of charging for software and support up-front as with regular Oracle or on-premise Postgres Plus Advanced Server, you're instead charged, AWS-style, an hourly rate based on the size of instance - one comparable to one of the Oracle servers mentioned beforehand comes in at $6.48/hour or $56k/year, or $112k for the two servers we'd need, still overall less than even the Oracle software maintenance charge but covering hosting as well (though you'll need to pay Amazon storage and bandwidth charges on top, too).
Connecting OBIEE 11g to these two databases is also fairly straightforward - the Data Direct Postgres ODBC drivers that come with OBIEE work fine, with the connection process involving creating an ODBC DSN to the Postgres database either on-premise on on AWS, then importing the tables into the BI Repository and modelling them as normal. Connecting to the AWS cloud as a data source worked fine as wasn't laggy, but you'd obviously need to test it at greater volume and with a realistic network connection. But - it works and seems to perform OK, at least with three rows...
So something like Postgres Plus Advanced Server in the cloud would probably appeal to a customer with a temporary requirement, a departmental budget, or an infrastructure that's already in the cloud, most probably on Amazon AWS. But you can also run Oracle Database on Amazon's AWS cloud as well, through a partnership between Oracle and Amazon's RDS ("Relational Database Service"). In this case, you go to the Amazon AWS website, sign-up for the service, put in your credit card details and within a few minutes, you've got a provisioned, Standard Edition One database ready to connect to via all the usual tools. And what this is, again, is the standard Oracle Database (or with EnterpriseDB's offering, Postgres Plus Advanced Server) but hosted on AWS, charged hourly, with two main options for pricing:
- A license-included option (Oracle Database Standard Edition One) at $3.14/hour for a reasonably specced-instance, coming out at about $27k/year, or $54k for the two servers we'd need
- A "bring-your-own licence" (BYOL) option, where you provide the Oracle licenses and Amazon AWS hosts it, at around $19k/year for the same size instance, $38k/year for the two servers we'd need
So if you work on the assumption that Oracle Database SE1 is roughly equivalent to Postgres Plus Advanced Server in terms of DW features, it's not a bad option, and probably a better one if you're already heavily invested in Oracle and have lots of skills there. The BYOL option is really just about hosting, but at $38k/year to host the two servers, back them up and so on, it's not a bad option compared to buying some servers and putting them somewhere.
So, with Amazon RDS or Postgres Plus Advanced Server in the cloud, if you're happy to take the hit on features and scaleability it's possible to bring your database costs down considerably, which may be an option when your project first starts off, and with the database platform reviewed say in a couple of years' time to see whether it's worth trading up to Oracle Database Enterprise Edition, or bring the whole thing back in-house. But Postgres and Oracle aren't the only players in the analytical database market - there are other on-premise vendors such as Greenplum and Vertica who'll sell you either a full-priced, Oracle-comparable database and who have cut-down, community editions available for single-node installs, or there are new players such as Amazon again who are bringing out cloud-native analytic databases again based on instance sizes and hourly charging - so lets look at one of them, Amazon's "Redshift".
Amazon Redshift is actually based on an analytic database product called ParAccel, a column-store database in the same vein as Vertica but with major bits cut-out in it's Amazon Redshift form. That said - it's a lot more of an interesting starting point than core PostgreSQL or Oracle Database SE1, an given that it's a core Amazon AWS product, it's no doubt going to receive a lot of investment, developer evangelism and usage over the next few years. It's also Postgres-compatible in terms of ODBC drivers, so viable for OBIEE, though like Postgres unsupported, so you'll not get much in the way of BI Server optimisations or help from Oracle Support if it goes wrong.
To connect OBIEE to Amazon Redshift there's a set of ODBC and JDBC drivers that you can download, and once you've opened up the port to Redshift in the AWS firewall, you can create an ODBC connection on the OBIEE side just like most other PostgreSQL connections, and then import the Redshift metadata into OBIEE's BI Repository.
Redshift is priced around the same mark as Postgres Plus Advanced Server, but without the Amazon AWS storage and bandwidth charges. It's a bit higher-risk than Postgres Plus, and it's not got the Oracle Database compatibility features EnterpriseDB added to core Postgres, but it's column-store, cloud-native and backed fully by Amazon.
So - with all of this in mind, what are the conclusions? Well it's fair to say that, based on what turned-up today and OBIEE's pre-existing heterogenous database support, you can connect non-Oracle databases to OBIEE as well as Oracle ones, but you'll need to bear in mind that the newer cloud-based ones, and the open-source ones, won't necessarily come with support from Oracle. If you're looking to keep as close to an Oracle solution as possible but cut costs dramatically, using Amazon's RDS service particularly with the license-included SE1 option could bring down the cost considerably, from $800k or so to around $50k/year, but you'll need to work within the product constraints of SE/SE1, so no materialised views, partitioning option and the like - so it's probably a matter of preference whether you'd go for Oracle SE1 or a Postgres-based solution such as EnterpriseDB's Postgres Plus Advanced Server (I'd probably go with Amazon RDS and Oracle Database SE1, if only because I can upgrade in-time to Oracle Database Enterprise Edition and my Oracle skills can be re-used).
Going forward - analytic databases-as-a-service certainly sound interesting, with Amazon Redshift looking particularly interesting. How much customers will take up cloud-based data warehousing databases in-general though will probably be limited until OBIEE itself goes into the cloud - either through cloud-friendly licensing policies that charge by the hour and instance capacity, or with a cloud-native version of OBIEE designed to interoperate with Oracle's Cloud Database offering, which up until now doesn't provide external SQL*Net access and is really designed for powering cloud-based Java apps. Watch this space as they say - and it'll be interesting to hear what Oracle announce in this area at next months' Oracle Openworld.