Introduction to Oracle BI Cloud Service : Provisioning Data

September 23rd, 2014 by

In the first post in this series I looked at the new Oracle BI Cloud Service, which went GA over the weekend and which Rittman Mead have been using these past few weeks as part of a beta release. That post covered what BICS is and who it’s aimed at in this initial release, and went through the features at a high level; over the rest of the week I’ll be looking at the features in detail, starting today with the data upload and provisioning process. Here are the links to the rest of the series, with the items getting updated over the week as I post each entry in the series:

As I mentioned in that first post, “Introduction to Oracle BI Cloud Service : Product Overview”, BICS in this initial release is, to my mind, aimed at departmental use-cases where someone wants to quickly upload and analyse an offline dataset and share the results with other members of their team. BICS comes bundled with Oracle Database Schema Service and 50GB of storage, and OBIEE in this setup reports only against this data source, with no ability to reach out dynamically to other data sources or blend those sources with the main one in Oracle’s cloud database. It’s really aimed at users with a single source of data to work with, who’ve probably obtained it as an export from some other system and just want to be able to report against it, though as we’ll see later in this post it is possible to link to other SaaS sources with a bit of PL/SQL wizardry.

So the first task you’re likely to perform when working with BICS is to upload some data to report on. There are three main options for uploading data to BICS: two are browser-based and aimed at end-users, and one uses SQL*Developer and is aimed more at developers. BICS itself comes with a menu item on the home page for uploading data, and this is what we think most users will use, as it’s built into the tool and fairly prominent.

Clicking on this menu item launches an ApEx application hosted in the Database Schema Service that comes with BICS, which allows you to upload and parse XLS and delimited files into the cloud database instance and then store their contents in database tables.

Oracle Database Schema Service also comes with Application Express (ApEx) as a front-end, and ApEx has similar tools for uploading datasets into the service, with additional features for creating views and PL/SQL packages to process and manipulate the data; we used these in our beta program example to connect to Salesforce.com and download data using its REST API. In theory you shouldn’t need to use these features much, but SIs and partners such as ourselves will no doubt use ApEx a lot to build out the loading infrastructure, data cleansing and other features that you might want for a packaged cloud app, so get your PL/SQL books out and brush up on ApEx development.

The other way to get data into BICS is to use Oracle SQL*Developer, which has a special Oracle Cloud connector type that lets you view and work with cloud database objects as if they were regular database ones, and upload data to the cloud in the form of “carts”. I’d imagine these options will get extended over time, either through tools or utilities Oracle release for this v1.0 BICS release, or by BICS eventually supporting the full Oracle Database Instance Service, which will support regular SQL*Net connections from ETL tools.

So once you’ve got some data uploaded into Database Schema Service, you’ll end up with a set of source tables from which you can create your BI Repository. Check back tomorrow for more details on how BICS’s new thin-client data modeller works and how you create your business model against this cloud data source, including how the repository editing and checkout process works in this new, potentially multi-user, development environment.

 

Introduction to Oracle BI Cloud Service : Product Overview

September 22nd, 2014 by

Long-term readers of this blog will probably know that I’m enthusiastic about the possibilities around running OBIEE in the cloud, and over the past few weeks Rittman Mead have been participating in the beta program for release one of Oracle’s Business Intelligence Cloud Service (BICS). BICS went GA over the weekend and is now live on Oracle’s public cloud site, so all of this week we’ll be running a special five-part series on what BI Cloud Service is, how it works and how you go about building a simple application. I’m also presenting on BICS and our beta program experiences at Oracle OpenWorld next week (Oracle BI in the Cloud: Getting Started, Deployment Scenarios, and Best Practices [CON2659], Monday Sep 29, 10:15 AM – 11:00 AM, Moscone West 3014), so if you’re at the event and want to hear our thoughts, come along.

Over the next five days I’ll be covering the following topics, and I’ll update the list with hyperlinks once the articles are published:

So what is Oracle BI Cloud Service, and how does it relate to regular, on-premise OBIEE11g?

On the Oracle BI Cloud Service homepage, Oracle position the product as “Agile Business Intelligence in the Cloud for Everyone”, and there are a couple of key points in this positioning that describe the product well.

The “agile” part refers to the fact that, being cloud-based, there’s no on-premise infrastructure to stand up, so you can get started a lot quicker than if you needed to procure servers, get the infrastructure installed, configure the software and get it accepted by the IT department. Agile also refers to the fact that you don’t need to purchase perpetual or one/two-year term licenses for the software, so you can use OBIEE for more tactical projects without having to worry about expensive long-term license deals. The final way that BICS is “agile” is in the simplified, user-focused tools that you use to build your cloud-based dashboards, with BICS adopting a more consumer-like user interface that in theory should mean you don’t have to attend a course to use it.

BICS is built around standard OBIEE 11g, with an updated user interface that’ll roll out across on-premise OBIEE in the next release, and with the standard Analysis Editor, Dashboard Editor and repository (RPD) under the covers. Your initial OBIEE homepage is a modified version of the standard OBIEE homepage that lists standard developer functions down the left-hand side as a series of menu items, and the BI Administration tool is replaced with an online, thin-client repository editor that provides a subset of the full BI Administration tool functionality.

Customers who license BICS in this initial release get two environments (or instances) to work with: a pre-prod or development environment to create their applications in initially, and a production environment into which they deploy each release of their work. BICS is also bundled with Oracle Database Schema Service, a single-schema Oracle Database service with an ApEx front-end, which stores the data that BICS reports on; both ApEx and BICS itself have tools to upload data into it. This is, however, the only data source that BICS in version 1 supports, so any data that your cloud-based dashboards report on has to be loaded into Database Schema Service before you can use it, and you have to use Oracle’s provided tools to do this, as regular ETL tools won’t connect. We’ll get onto the data provisioning process in the next article in this five-part series.

BICS dashboards and reports currently support a subset of what’s available in the on-premise version. The Analysis Editor (“Answers”) is the same as on-premise OBIEE, with the catalog view on the left-hand side, tabs for Results and so on, and the same set of view types (and in fact a new one, for heat maps). There’s currently no access, though, to Agents, Scorecards, BI Publisher or any other Presentation Services features that require a database back-end, nor to the Essbase database that you get in the background with on-premise OBIEE 11.1.1.7+.

What does become easier to deploy, though, is Oracle BI Mobile HD, as every BICS instance is, by definition, accessible over the internet. Last time I checked, the current version of BI Mobile HD on Apple’s App Store couldn’t yet connect, but I’m presuming an update will be out shortly to deal with BICS’s login process, which asks you to enter a BICS username and password along with an “identity domain” that specifies the particular company tenant ID that you use.

I’ll cover the thin-client data modeller later in this series in more detail, but at a high-level what this does is remove the need for you to download and install Oracle BI Administration to set up your BI Repository, something that would have been untenable for Oracle if they were serious about selling a cloud-based BI tool. The thin-client data modeller takes the most important (to casual users) features of BI Administration and makes them available in a browser-based environment, so that you can create simple repository models against a single data source and add features like dimension hierarchies, calculations, row-based and subject-area security using a point-and-click environment.

Features that are excluded in this initial release include the ability to define multiple logical table sources for a logical table, creating multiple business areas, creating calculations using physical (vs. logical) tables and so on, and there’s no way to upload on-premise RPDs to BICS, or download BICS ones to use on-premise, at this stage. What you do get with BICS is a new import and export format called a “BI Archive” which bundles up the RPD, the catalog and the security settings into a single archive file, and which you use to move applications between your two instances and to store backups of what you’ve created.

So what market is BICS aimed at in this initial release, and what can it be used for? I think it’s fair to say that in this initial release it’s not a drop-in replacement for on-premise OBIEE 11g, with only a subset of the on-premise features initially supported and some fairly major limitations, such as only being able to report against a single database source and having no access to Agents, BI Publisher, Essbase and so on. But like the first iteration of the iPhone or any consumer version of a previously enterprise-only tool, it’s trying to do a few things well and aiming at a particular market: in this case, departmental users who want to stand up an OBIEE environment quickly, maybe only for a limited amount of time, and who are familiar with OBIEE and would like to carry on using it. In some ways its target market is those OBIEE customers who might otherwise have used Qlikview, Tableau or one of the new SaaS BI services such as Good Data, who most probably have some data exports in the form of Excel spreadsheets or CSV documents, and who want to upload them to a BI service without getting all of IT involved and then share the results in the form of dashboards and reports with their team. Pricing-wise this appears to be who Oracle are aiming the service at (minimum 10 users, $3500/month including 50GB of database storage), and with the product being so close to standard OBIEE functionality in terms of how you use it, it’s most likely to appeal to customers who already use OBIEE 11g in their organisation.

That said, I can see partners and ISVs adopting BICS to deliver cloud-based SaaS BI applications to their customers, either as stand-alone analysis apps or as add-ons to other SaaS apps that need reporting functionality. Oracle BI Cloud Service is part of the wider Oracle Platform-as-a-Service (PaaS) that includes Java (WebLogic), Database, Documents, Compute and Storage, so I can see companies such as ourselves developing reporting applications for the likes of Salesforce, Oracle Sales Cloud and other SaaS apps and then selling them, hosting included, through Oracle’s cloud platform; I’ll cover our initial work in this area, developing a reporting application for Salesforce.com data, later in this series.

Of course it’s been possible to deploy OBIEE in the cloud for some while, with this presentation of mine from BIWA 2014 covering the main options; indeed, Rittman Mead host OBIEE instances for customers in Amazon AWS and do most of our development and training in the cloud including our exclusive “ExtremeBI in the Cloud” agile BI service; but BICS has two major advantages for customers looking to cloud-deploy OBIEE:

  • It’s entirely thin-client, with no need for local installs of BI Administration and so forth. There’s also no need to get involved with Enterprise Manager Fusion Middleware Control for adding users to application roles, defining application role mappings and so on
  • You can license it monthly, including data storage. No other on-premise license option lets you do this, with the shortest term license being one year

such that we’ll be offering it as an alternative to AWS hosting for our ExtremeBI product, for customers who in particular want the monthly license option.

So, an interesting start. As I said, I’ll be covering the detail of how BICS works over the next five days, starting with the data upload and provisioning process in tomorrow’s post – check back tomorrow for the next instalment.

Getting The Users’ Trust – Part 2

September 18th, 2014 by

Last time I wrote about the performance aspects of a BI system and how they could affect a user’s confidence. I concluded by mentioning that incorrect data might be generated by poorly coded ETL routines causing data loss or duplication. This time I am looking more at the quality of the data we load (or don’t load).

Back in the 1990s I worked with a 4.5 TB DWH that had a single source for fact and reference data; that is, the data loaded was self-consistent. These days a single-source DWH is increasingly rare; we are adding multiple data sources (both internal and external). Customers can now appear on CRM, ERP, social media, credit referencing, loyalty, and a whole host of other systems. This proliferation of data sources gives rise to a variety of issues we need to be at least aware of and, in reality, should be actively managing. Some of these issues require us to work out processing rules within our data warehouse, such as what we do with fact data that arrives before its supporting reference data. I once had a system where our customer source could only be extracted once a week, but purchases made by new customers would appear in our fact feed immediately after customer registration. Obviously, it is a business call whether we publish facts that involve yet-to-be-loaded customers straight away or defer those loads until the customer has been processed in the DWH. In the case of my example we needed to auto-create new customers in the data warehouse with just the minimum of data, the surrogate key and the business key, and then do an SCD type 1 update when the full customer data profile was loaded the following week. Technical issues such as these are trivial: we formulate and agree a business rule to define our actions and we implement it in our ETL or, possibly, the reporting code. In my opinion the bigger issues to resolve are in Data Governance and Data Quality.
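
To make that rule concrete, here is a minimal Python sketch of the “auto-create, then overwrite” behaviour, often called an inferred or placeholder dimension member. It assumes an in-memory dimension keyed by business key; the class and method names (CustomerDimension, lookup_or_infer, apply_weekly_load) are hypothetical, and in a real DWH this logic would live in the ETL mappings against the customer dimension table.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    business_key: str
    name: Optional[str] = None
    segment: Optional[str] = None
    inferred: bool = True  # True until the weekly customer extract supplies the full profile


class CustomerDimension:
    """Hypothetical in-memory customer dimension keyed on the source business key."""

    def __init__(self) -> None:
        self._rows: dict[str, CustomerRow] = {}
        self._next_key = count(1)  # surrogate key generator (a sequence in a real DWH)

    def lookup_or_infer(self, business_key: str) -> int:
        """Return the surrogate key for a fact row, auto-creating a minimal
        'inferred' customer (surrogate key + business key only) if the
        customer has not been loaded yet."""
        row = self._rows.get(business_key)
        if row is None:
            row = CustomerRow(next(self._next_key), business_key)
            self._rows[business_key] = row
        return row.surrogate_key

    def apply_weekly_load(self, business_key: str, name: str, segment: str) -> int:
        """SCD type 1: overwrite the attributes in place, so facts already loaded
        against the inferred row keep pointing at the same surrogate key."""
        key = self.lookup_or_infer(business_key)
        row = self._rows[business_key]
        row.name, row.segment, row.inferred = name, segment, False
        return key

    def get(self, business_key: str) -> CustomerRow:
        return self._rows[business_key]


# A purchase arrives before the weekly customer extract...
dim = CustomerDimension()
fact = {"customer_key": dim.lookup_or_infer("C-1001"), "amount": 42.0}
# ...and a week later the full profile overwrites the placeholder attributes.
dim.apply_weekly_load("C-1001", "Jane Smith", "Retail")
```

Because the later update is type 1, the placeholder is simply overwritten and facts loaded against it keep their surrogate key; whether that is acceptable, rather than a type 2 change, is exactly the kind of business rule that needs agreeing.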

Some people combine Data Quality and Governance into a single topic and believe that a single solution will put everything right. However, to my mind, they are completely separate issues. Data quality is about the content of the data, and governance is about ownership, provenance and business management of the data. Today, Data Governance is increasingly becoming a regulatory requirement, especially in finance.

Governance is much more than the data lineage tools we might access in ETL tools such as ODI and even OWB. ETL lineage is about source to target mappings; our ability to say that ‘bank branch name’ comes from this source attribute, travels through these multiple ODI mappings and finally updates that column in our BANK_BRANCH dimension table. In true Data Governance we probably do some or all of these:

  • Create a dictionary of approved business terms. This will define every attribute in business terms and also provide translations between geographic and business-unit-centric ways of viewing data. In finance one division may talk about “customer”, another about “investor” and a third about “borrower”; in all three cases we are really talking about the same kind of object, a person. This dictionary should go down to the level of individual attributes and measures and include the type of data being held, such as text, currency or date-time; these are logical types, not the physical types seen on the actual sources. It is important that this dictionary is shared throughout the organisation and is “the true definition” of what is reported.
  • Define ownership (or stewardship) for the approved business data item.
  • Map business data sources and targets to our approved list of terms (at attribute level). It is very possible that some attributes will have multiple potential sources, in such cases we must specify which source will be the master source.
  • Define processes to keep our business data aligned.  
  • Define ownership of the sources for change accountability: for design changes and, for static data such as ISO country codes, for content changes. Possibly integrate this into the change-notification mechanism of the change process.
  • Define data release processes for approved external reference data.
  • Define data access and redaction rules for compliance purposes.
  • Build in audit and control.
As you can see we are not, in the main, talking about data content; instead we are improving our description of the business data over and above what is already held in database data dictionaries and XSD files. This is still metadata and is almost certainly best managed in some kind of Data Governance application. One tool we might consider for this is Oracle Data Relationship Manager from the Hyperion family of products. If we want to go more DIY, it may be possible to leverage some of the data responsibility features of Oracle SQL Developer Data Modeller.
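
For the DIY route, the core of such a dictionary can be sketched as a set of approved terms, each with a logical type, a steward, a designated master source and the divisional names that map onto it. The Python below is a hypothetical illustration only (the classes, the “Person Name” term and its steward are invented for the example); it is not how Data Relationship Manager or SQL Developer Data Modeller store this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTerm:
    name: str
    logical_type: str                  # logical, not physical, type: "text", "currency", "date-time"...
    steward: str                       # the accountable owner of the definition
    master_source: str                 # the one source designated as master where several exist
    synonyms: frozenset = frozenset()  # divisional names that mean the same thing


class BusinessGlossary:
    """Hypothetical glossary resolving divisional vocabulary to one approved term."""

    def __init__(self, terms):
        self._index = {}
        for term in terms:
            for label in {term.name, *term.synonyms}:
                key = label.lower()
                existing = self._index.get(key)
                if existing is not None and existing != term:
                    # one label must never point at two different approved terms
                    raise ValueError(f"label {label!r} maps to more than one term")
                self._index[key] = term

    def resolve(self, label: str) -> BusinessTerm:
        """Case-insensitive lookup of any approved name or synonym."""
        return self._index[label.lower()]


glossary = BusinessGlossary([
    BusinessTerm(
        name="Person Name",
        logical_type="text",
        steward="Customer data steward",
        master_source="CRM",
        synonyms=frozenset({"Customer Name", "Investor Name", "Borrower Name"}),
    ),
])
```

Raising an error on a label that maps to two terms is a deliberate choice in this sketch: an ambiguous business term is a governance problem to resolve with the data stewards, not something to guess at.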

Whereas governance is about using the right data and having processes and people to guarantee it is correctly sourced, Data Quality is much finer in grain and looks at the actual content. Here a tool such as Oracle Enterprise Data Quality (OEDQ) is invaluable. By the way, I have noticed that OEDQ version 12 has recently been released; I have a blog post on it in the pipeline.

I tend to divide Data Quality into three disciplines:

  • Data Profiling is always going to be our first step. Before we fix things we need to know what to fix! Generally, we profile a sample of the data and assess it column by column, row by row, to build a picture of the actual content. Typically we look at data range, nulls, number of distinct values and, in the case of text data, the character types used (alpha, letter case, numeric, accents, punctuation etc.) and the patterns the values follow, expressed as regular expressions. From this we develop a plan to tackle quality; for example, on a data entry web page we may want to tighten processing rules to prevent certain “anticipated” errors, but more usually we come up with business rules to apply in our next stage.
  • Data Assessment. Here we test the full dataset against the developed rules to identify data that conforms or needs remedy. This remedy could be referring the data back to the source system owner for correction, providing a set of data fixes to apply to the source which can be validated and applied as a batch, creating processes to “fix” data on the source at initial data entry, or (and I would strongly advise against this for governance reasons) dynamically fixing it in an ETL process. The reason I am against fixing data downstream in ETL is that the data we report on in our Data Warehouse will then not match the source, and this will be problematic when we try to validate whether our data warehouse fits reality.
  • Data de-duplication. This final discipline of our DQ process is the most difficult: identifying data that is potentially duplicated in our data feed. In data quality terms a duplicate is where two or more rows refer to what is probably (statistically) the same item; this is a lot fuzzier than an exact match in database terms. People miskey data, call centre staff mishear names, companies merge and combine data sets, and I have even seen customers register a new email address because they could not be bothered to reset their password on an e-selling website. De-duplication is important to improve the accuracy of BI in general, and it is nigh-on mandatory for organisations that need to manage risk and prevent fraud.
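
As a rough illustration of the first two disciplines, the Python sketch below profiles a column (nulls, distinct values, range and a character “shape” per value, where A, a and 9 stand for upper-case, lower-case and digit) and then assesses every row against rules of the kind profiling might suggest. The rows, the rule names and the three-code country list are hypothetical, and a tool such as OEDQ does far more than this.

```python
import re
from collections import Counter


def value_shape(text: str) -> str:
    """Reduce a value to a character pattern: A=upper-case letter, a=lower-case
    letter (accented letters included), 9=digit, anything else kept as-is."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Data profiling: nulls, distinct values, range and the patterns in use.
    Assumes the non-null values are mutually comparable (one type per column)."""
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present, default=None),
        "max": max(present, default=None),
        "shapes": Counter(value_shape(str(v)) for v in present),
    }


def assess(rows, rules):
    """Data assessment: test every row against every rule and split the rows
    into those that conform and those that need remedy (with the failed rules)."""
    conforming, needs_remedy = [], []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            needs_remedy.append((row, failed))
        else:
            conforming.append(row)
    return conforming, needs_remedy


EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
VALID_COUNTRIES = {"GB", "US", "FR"}  # hypothetical subset of ISO 3166-1 alpha-2 codes

RULES = {
    "email_format": lambda r: r["email"] is not None and EMAIL_PATTERN.fullmatch(r["email"]) is not None,
    "country_code_valid": lambda r: r["country"] in VALID_COUNTRIES,
}

rows = [
    {"id": 1, "email": "ann@example.com", "country": "GB"},
    {"id": 2, "email": "bob@example", "country": "gb"},
    {"id": 3, "email": None, "country": "US"},
]

country_profile = profile_column([r["country"] for r in rows])
conforming, needs_remedy = assess(rows, RULES)
```

Here the profile shows a lower-case “gb” alongside upper-case codes, which is the sort of finding that turns into a rule; the assessment then lists row 2 (both rules) and row 3 (missing email) for remedy at source, rather than quietly fixing them in ETL.
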
Data Quality is vital to trusted BI; without it we run the risk that our dimensions do not roll up correctly and that we under-report by splitting what should be a single item across its duplicates. However, being correct in the data warehouse is only part of the story; these corrections also need to be made on the sources, and to do that we have to implement processes and disciplines throughout the organisation.
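
De-duplication is the hardest of the three to sketch honestly, but the basic idea of scoring pairs of records for similarity can be shown with Python’s standard difflib module. This is a plain string-similarity measure swapped in for illustration, and the 0.85 threshold and sample names are assumptions; dedicated matching in a tool such as OEDQ combines several attributes, phonetic and transposition rules, and avoids comparing every pair.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Cheap standardisation before matching: lower-case, drop full stops,
    collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Compare every pair (O(n^2), fine for a sketch) and return those scoring
    at or above the threshold as candidate duplicates for human review."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


customers = [(1, "Jon Smith"), (2, "John Smith"), (3, "J. Smyth"), (4, "Mary Jones")]
matches = candidate_duplicates(customers)
```

With this threshold the “Jon Smith” and “John Smith” pair (score 0.95) is flagged for review, while “J. Smyth” scores only about 0.75 against “Jon Smith” and is missed; choosing the threshold is a trade-off between missed duplicates and false matches.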
 
For BI that users can trust we need to combine both data management disciplines. From governance we need to be sure that we are using the correct business terms for all attributes and that the data displayed in those attributes has made the correct journey from the original source. From quality we gain confidence that we are correctly aggregating data in our reporting.
 
At the end of the day we need to be right to be trusted.

 

 

Getting The Users’ Trust – Part 1

September 17th, 2014 by

Looking back over some of my truly ancient Rittman Mead blogs (so old in fact that they came with me when I joined the company soon after Rittman Mead was launched), I see recurrent themes on why people “do” BI and what makes for successful implementations. After all, why would an organisation wish to invest serious money in a project if it does not give value, either in terms of cost reduction or increased profitability through smart decisions? This requires technology to provide answers and a workforce that is both able to use this technology and has faith that the answers returned allow them to do their jobs better. Giving users this trust in the BI platform generally boils down to resolving three issues: ease of use of the reporting tool, quickness of data return and “accuracy” or validity of the response. These last two issues are a fundamental part of my work here at Rittman Mead and underpin all that I do in terms of BI architecture, performance and data quality. Even today, as we adapt our BI systems to include Big Data and Advanced Analytics, I follow the same sound approaches to ensure usable, reliable data and the ability to analyse it in a reasonable time.

Storage is cheap, so don’t aggregate away your knowledge. If my raw data feed is sales by item by store by customer by day and I only store it in my data warehouse as sales by month by state, I can’t go back to do any analysis on my customers, my stores or my products. Remember that the UNGROUP BY only existed in my April Fools’ post. Where you choose to store your ‘unaggregated’ data may well be different these days, with Hadoop and schema-on-read paradigms often being a sensible approach. Mark Rittman has been looking at architectures where both the traditional DWH and Big Data happily co-exist.

When improving performance I tend to avoid tuning specific queries; instead I aim to make frequent access patterns work well. Tuning individual queries is rarely a sustainable approach in BI; this week’s hot, ‘we need the answer immediately’ query may have no business focus next week. Indexes that we create to make a specific query fly may have no positive effect on other queries; indeed, indexes may degrade other aspects of BI performance, such as increasing data load times, and have subtle effects such as changing a query plan cost so that groups of materialized views are no longer candidates for query rewrite (this is especially true when you use nested views and the base view is no longer accessed).

My favoured performance improvement techniques are correctly placing the data, be it through clustering, partitioning, compression, table pinning, in-memory or whatever, and making sure that the query optimiser knows all about the nature of the data; again and again, “right” optimiser information is key to good performance. Right is not just about running DBMS_STATS.gather_XXX over tables or schemas every now and then; it is also about telling the optimiser about relationships between data items. Constraints describe the data, for example which columns allow NULL values and which columns are part of parent-child relationships (foreign keys). Extended table statistics can help describe relationships between columns in a single table; for example, in a product dimension table the product sub-category and product category columns are interdependent, and without that knowledge cardinality estimates can be very wrong and favour nested-loop style plans that could perform very poorly on large data sets.
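
The cost of missing that relationship can be shown with a simplified model of the optimiser’s arithmetic. The Python sketch below assumes uniform data, no histograms and the textbook 1/NDV (number of distinct values) selectivity for an equality predicate; the real optimiser applies more rules than this. Multiplying per-column selectivities treats category and sub-category as independent, whereas a column-group statistic supplies the NDV of the pair. In Oracle such a column group is typically created with DBMS_STATS.CREATE_EXTENDED_STATS and populated at the next statistics gather; the table and values here are invented.

```python
def ndv(values) -> int:
    """Number of distinct values."""
    return len(set(values))


def build_products(n_categories=10, subs_per_category=10, rows_per_sub=100):
    """A hypothetical product dimension where each sub-category belongs to
    exactly one category, i.e. the two columns are fully dependent."""
    rows = []
    for c in range(n_categories):
        for s in range(subs_per_category):
            rows.extend([(f"CAT{c}", f"SUB{c}_{s}")] * rows_per_sub)
    return rows


def estimate_independent(rows) -> float:
    """Estimated rows for `category = :c AND sub_category = :s` when each
    column's selectivity (1/NDV) is multiplied, i.e. assuming independence."""
    return len(rows) / (ndv(r[0] for r in rows) * ndv(r[1] for r in rows))


def estimate_column_group(rows) -> float:
    """Same predicate, using the NDV of the (category, sub_category) pair,
    which is the information a column-group statistic supplies."""
    return len(rows) / ndv(rows)


products = build_products()
actual = sum(1 for r in products if r == ("CAT3", "SUB3_7"))
print(estimate_independent(products), estimate_column_group(products), actual)
# 10.0 100.0 100
```

With ten categories of ten sub-categories each, the independence assumption estimates 10 rows for a predicate that actually returns 100, a tenfold underestimate, and the error grows with the number of sub-categories per category. It is underestimates like this that make nested loops look cheap.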

Sometimes we will need to create aggregates to answer queries quickly; I tend to build ‘generic’ aggregates, those that can be used by many queries. Often I find that although data is loaded frequently, even in near-real-time, many business users wish to look at larger time windows such as week, month or quarter; in practice I see little need for day-level aggregates over the whole data warehouse timespan, although there will always be specific cases that require day-level summaries. If I build summary tables or use Materialized Views, I aim to make tables that are at least 80% smaller than the base table and to avoid aggregates that partially roll up many dimensional hierarchies; customer category by product category by store region by month would probably not be the ideal aggregate for most real-user queries. That said, Oracle does allow us to use fancy grouping semantics in building aggregates (GROUPING SETS, GROUP BY ROLLUP and GROUP BY CUBE). The in-database Oracle OLAP cube functionality is still alive and well (and was given a performance boost in Oracle 12c); it may be more appropriate to aggregate in a cube (or relational look-alike) rather than in individual summaries.
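
The 80% rule of thumb can be checked before building anything, since an aggregate can have no more rows than the base table or than the product of the distinct counts of its GROUP BY columns. The Python sketch below applies that upper bound to a hypothetical 1,000,000-row fact table and also emulates what GROUP BY ROLLUP returns; the cardinalities and sales rows are invented for illustration.

```python
from collections import defaultdict
from math import prod


def aggregate_upper_bound(base_rows: int, group_by_ndvs) -> int:
    """An aggregate can have no more rows than the base table, nor more than
    the product of the distinct counts of its GROUP BY columns."""
    return min(base_rows, prod(group_by_ndvs))


def meets_reduction_target(base_rows: int, group_by_ndvs, target: float = 0.8) -> bool:
    """True if the aggregate is guaranteed to be at least `target` (80%)
    smaller than the base table."""
    return 1 - aggregate_upper_bound(base_rows, group_by_ndvs) / base_rows >= target


def group_by_rollup(rows, dims, measure):
    """Emulate SQL GROUP BY ROLLUP(dims): subtotals for each leading prefix of
    dims down to the grand total; rolled-up levels are shown as None."""
    totals = defaultdict(float)
    for row in rows:
        for level in range(len(dims), -1, -1):
            key = tuple(row[d] if i < level else None for i, d in enumerate(dims))
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical 1,000,000-row sales fact table.
print(meets_reduction_target(1_000_000, [36, 5]))           # month x store region -> True
print(meets_reduction_target(1_000_000, [50, 200, 5, 36]))  # cust cat x prod cat x region x month -> False

sales = [
    {"month": "2014-01", "region": "North", "amount": 10},
    {"month": "2014-01", "region": "South", "amount": 5},
    {"month": "2014-02", "region": "North", "amount": 7},
]
subtotals = group_by_rollup(sales, ["month", "region"], "amount")
```

The month-by-region summary passes easily, while the customer category by product category by store region by month aggregate could be as large as the base table. The bound is conservative: sparse data may make the real aggregate smaller, so count the actual rows before deciding. In the rollup emulation, None plays the part that NULL plays in SQL, where the GROUPING() function is what distinguishes a subtotal row from a genuine NULL.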

Getting the wrong results quickly is no good; we must be sure that the results we display are correct. As professional developers we test to prove that we are not losing or gaining data through incorrect joins and filters, but ETL coding is often the smallest factor in “incorrect results”, and this brings me to part 2, Data Quality.
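
As a sketch of the kind of join test meant here, assuming a hypothetical fact feed and customer dimension held as Python dicts, the reconciliation below compares a join’s output with its input on row count, measure total and rows per business key. In the example a duplicated dimension key gains a row while a missing one loses a row.

```python
from collections import Counter


def inner_join(facts, dimension, key):
    """Plain inner join of fact rows to dimension rows on `key`."""
    by_key = {}
    for d in dimension:
        by_key.setdefault(d[key], []).append(d)
    return [{**f, **d} for f in facts for d in by_key.get(f[key], [])]


def reconcile(source, target, key, measure):
    """Compare a load's output with its input: total rows, measure total,
    and row counts per business key (non-zero differences only)."""
    src, tgt = Counter(r[key] for r in source), Counter(r[key] for r in target)
    return {
        "row_diff": len(target) - len(source),
        "measure_diff": sum(r[measure] for r in target) - sum(r[measure] for r in source),
        "per_key_diff": {k: tgt[k] - src[k] for k in src | tgt if tgt[k] != src[k]},
    }


facts = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 20},
    {"customer_id": 3, "amount": 30},
]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 2, "name": "Bob (old record)"},  # duplicate key: the join gains a row
    # customer 3 is missing: its fact is silently dropped by the inner join
]
result = reconcile(facts, inner_join(facts, customers, "customer_id"), "customer_id", "amount")
```

Here the row counts balance (one row gained, one lost), so a row-count check alone would pass; the measure total and the per-key counts are what expose the problem.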

Rittman Mead/Oracle Data Integration Speakeasy @ Oracle Open World

September 11th, 2014 by

If you are attending Oracle Open World this year and fancy a bit of a different experience, come and join Rittman Mead and Oracle’s Data Integration teams for drinks and networking at 7pm on Tuesday 30th September at the Local Edition speakeasy on Market Street.

We will be providing a couple of hours of free drinks, with the opportunity to quiz our leading data integration experts and Oracle’s data integration team about any aspect of the data integration toolset, architecture and our innovative implementation approaches, and to relax and kick back at the end of a long day. So whether you want to know how ODI can facilitate your big data strategy, or how to implement data quality and data governance across your enterprise data architecture, please come along.

The Local Edition is located at 691 Market St, San Francisco, CA and the event runs from 7pm to 9pm. Please register here.

For further information on this event and the sessions we are presenting at Oracle Open World contact us at info@rittmanmead.com.
