Taking a Look at the Oracle Database 12c In-Memory Option

August 29th, 2014 by Mark Rittman

The In-Memory Option for Oracle Database 12c became available a few weeks ago with the 12.1.0.2 database patchset, adding column-store and in-memory capabilities to the Oracle Database. Unlike pure in-memory databases such as Oracle TimesTen, the in-memory option adds an in-memory column store alongside the regular row-based storage in the Oracle database, creating in-memory copies of selected row-store tables in a compressed, column-based storage format. The whole process is automatic and enabled by issuing just a couple of commands – to the point where my feedback at the end of beta testing was that it was “almost boring” – said slightly tongue-in-cheek…


But of course adding in-memory capabilities to the Oracle database is anything but boring – whilst TimesTen on Exalytics has given us the ability to store aggregate tables and “hot data” tables in a dedicated in-memory database co-located with the BI mid-tier, TimesTen is another set of skills to learn and another product to manage. Moreover, you’ve got to somehow get the data you want to aggregate or otherwise cache out of your data warehouse and into TimesTen, and then you’ve got the even more difficult task of keeping that TimesTen dataset in sync with your main data warehouse data. Most importantly though, even with the 2TB and 4TB versions of Exalytics many data warehouses won’t fit entirely into memory, so you’re going to end up with some of your data in TimesTen and some in your data warehouse, with only the TimesTen dataset benefiting from in-memory speeds.

So what if you could enable the in-memory option on your data warehouse, and then just run it all in-memory there? To see how this worked, I thought it’d be interesting to take the flight delays dataset that comes with the latest OBIEE 11g v406 SampleApp, and copy that across to an Oracle 12.1.0.2.0 database to test it out with the in-memory option. This dataset comes with a set of dashboard pages and analyses and contains around 6m rows of data, and in SampleApp is accelerated by an aggregate schema, also stored in Oracle, that generally reduces query times down to just a few seconds.


So what I thought would be interesting to check would be, first, whether enabling the in-memory option sped up the dashboards, and second, whether having the in-memory option removed the need for the aggregate tables altogether. I was also interested to see whether putting the aggregate tables into memory improved their access time significantly, as anecdotally I’d heard that there wasn’t much clock-time difference between accessing aggregates in a TimesTen database vs. creating them in the same data warehouse database as the source data. To set this all up I created the new 12.1.0.2.0 database, exported the BI_AIRLINES and BI_AIRLINES_AGG schemas out of the 12c database that comes with SampleApp v406, imported them into the same schema names in the new 12.1.0.2.0 instance on a new VM (rather than on the SampleApp VM), and then amended the connection pool details in the SampleApp RPD to point to this new, external Oracle 12.1.0.2.0 database instance. At this point I checked the memory allocation on the 12.1.0.2.0 database:
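
From SQL*Plus, something like this shows the relevant parameters (the values you see will depend on how you’ve sized your own instance):

    -- check the current memory-related initialisation parameters (run as a DBA user)
    show parameter sga_target
    show parameter pga_aggregate_target
    show parameter inmemory_size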

So I don’t end-up comparing the larger database instance on my 12.1.0.2.0 VM with the smaller one the airlines data came from on the VM, I created my baseline response time measurements on this new VM and with none of the in-memory features enabled, and ran some of the dashboard pages and clicked-on some of the prompt values – and the response time wasn’t bad, with just the odd analysis that took an excessive time to return. Mostly though, the results for each dashboard came back straight-away, what you’d expect when summary tables have been deployed to speed-up reporting.

[Chart: baseline dashboard response times with the BI_AIRLINES_AGG aggregate schema enabled]

Looking at the SQL being generated by the BI Server, you can quickly see why this is the case; the BI Server is using aggregate navigation and actually querying the BI_AIRLINES_AGG schema to return the aggregated results for the dashboard, as you can see from one of the SQL statements retrieved from the NQQuery.log file below:
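
The important thing to spot is that the physical SQL is written against the BI_AIRLINES_AGG schema rather than the detail tables, something along these lines (the aggregate table and column names here are illustrative rather than the actual SampleApp ones):

    -- illustrative only: aggregate table and column names are made up for this example
    select sum(T1.FLIGHTS)  as c1,
           T2.DEST_REGION   as c2
    from   BI_AIRLINES_AGG.AGG_FLIGHT_FACTS T1,
           BI_AIRLINES_AGG.AGG_DEST_REGION  T2
    where  T1.DEST_REGION_KEY = T2.DEST_REGION_KEY
    group by T2.DEST_REGION
    order by c2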

OBIEE will always use these aggregate tables if they’re available in the repository, so to enable me to test the reports with these aggregates disabled I borrowed the technique Robin introduced in his blog on testing aggregate navigation, and added a request variable prompt to the dashboard page that allows me to pass an INACTIVE_SCHEMAS value to the logical SQL queries issued for the dashboard analyses, and therefore tell the BI Server to ignore the aggregate schema and just use the detail-level BI_AIRLINES schema instead.
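
With the prompt in place, the logical SQL for each analysis gets the request variable prepended to it, along these lines (the SELECT itself, and the presentation table and column names, are simplified examples):

    -- the SET VARIABLE prefix is the important part; the SELECT is a simplified example
    SET VARIABLE INACTIVE_SCHEMAS='"BI_AIRLINES_AGG"';
    SELECT "Geography"."Dest Region" saw_0,
           "Flight Facts"."Flights"  saw_1
    FROM   "Airlines Delay"
    ORDER BY saw_0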


I then tested the dashboards again, this time with the BI_AIRLINES_AGG schema disabled; this time, every dashboard took a while to display properly, with every analysis showing the “spinning clock” for twenty or thirty seconds at a time. Compare the response times to the ones a moment ago when aggregates were enabled, and you can see the difference.

[Chart: dashboard response times with the aggregate schema disabled, compared to the aggregate-enabled baseline]

So let’s set-up the in-memory option and see how it affects these two scenarios. The in-memory option for Oracle Database 12c 12.1.0.2.0 is installed by default as part of the core database, but doesn’t start working until you set a value for the INMEMORY_SIZE initialisation parameter – as you can see from the SQL*Plus output below, it’s currently set to zero, effectively disabling the feature:

The memory Oracle allocates to the in-memory option, via this initialisation parameter, comes out of the SGA, so before setting INMEMORY_SIZE it’d be a good idea to increase the SGA_TARGET value to accommodate the 1G I’m then going to assign to INMEMORY_SIZE. I do both from the SQL*Plus command prompt and then bounce the database to bring the new values into use:
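
Along these lines (the SGA_TARGET figure is just an example; size it to suit your own system):

    -- make room in the SGA for the in-memory area, then size the in-memory area itself
    ALTER SYSTEM SET sga_target = 4G SCOPE = SPFILE;
    ALTER SYSTEM SET inmemory_size = 1G SCOPE = SPFILE;

    -- restart the database to bring the new settings into effect
    SHUTDOWN IMMEDIATE
    STARTUP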

Now we’re at the point where we can enable the tables for in-memory access. Tables to be copied into memory can either be created like that at the start, or you can ALTER TABLE them afterwards and specify that they go into memory (or you can configure the in-memory settings at the tablespace level, or even specify particular columns or partitions to go into memory). The underlying data still gets stored row-wise on disk, but enabling a table for in-memory access tells the Oracle database to create column-store in-memory representations of the table and direct suitable queries to those copies, all the time in the background keeping the copy in-sync with the row-store base data.

I’ll start by enabling all of the BI_AIRLINES schema for in-memory access, as the biggest pay-off would be if then meant we didn’t need to maintain the aggregate tables. After running a SELECT table_name FROM user_tables to list out the table names, I then run a script to enable all the tables for in-memory access, with the in-memory copies being populated immediately:

Looking at USER_TABLES from this schema, I can see all of the tables I selected now marked for in-memory access, for immediate loading:
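
For example:

    SELECT table_name, inmemory, inmemory_priority, inmemory_compression
    FROM   user_tables
    ORDER  BY table_name;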

And I can track the progress of the tables being copied into memory using the V$IM_SEGMENTS v$ view, like this:
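
For example, showing how much of each segment has made it into memory so far:

    -- how much of each segment has been populated into the in-memory column store
    SELECT segment_name,
           populate_status,
           ROUND(bytes / 1024 / 1024)         AS disk_mb,
           ROUND(inmemory_size / 1024 / 1024) AS inmem_mb,
           bytes_not_populated
    FROM   v$im_segments
    ORDER  BY segment_name;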

Note that most of the tables went into memory immediately, but one (PERFORMANCE) is taking a while because it’s so big. Also note that not all tables are listed in the v$ view yet, as the database hasn’t got around to adding them into memory, or it might choose not to populate them if it decides the memory could be used better elsewhere. You can alter the priority of these in-memory copy processes if you want, and decide whether the copying happens immediately or when the table is first scanned (accessed).

Running the dashboards again, with the request variable prompt set to disallow the aggregate schema, gave me the response times below. The in-memory queries were much faster than the row-based, non in-memory ones, but they weren’t down to the response times of the dashboards right at the start, where all data was stored row-wise but we had aggregate tables to speed up the queries (note that I’ve reordered the bars so the non in-memory queries with no aggregate tables, the slowest of all the approaches, are on the left of each series).

[Chart: dashboard response times with the detail schema in-memory, compared to the non in-memory runs with and without aggregates]

Taking a look at one of the physical SQL queries for a cross-tab (hierarchical columns) analysis, you can see that in-memory table access is happening:
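
The tell-tale sign is the TABLE ACCESS INMEMORY FULL operation in the execution plan, and you can check any statement the same way (the query below is just a placeholder for the BI Server’s physical SQL):

    EXPLAIN PLAN FOR
      SELECT /* placeholder for the BI Server-generated physical SQL */ COUNT(*)
      FROM   bi_airlines.performance;

    -- look for "TABLE ACCESS INMEMORY FULL" against the in-memory-enabled tables
    SELECT * FROM TABLE(dbms_xplan.display);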

Indeed, looking at the queries the BI Server is sending to the database, it’s not too surprising that the in-memory difference in this case wasn’t too dramatic. According to the docs (emphasis mine):

Storing a database object in the IM column store can improve performance significantly for the following types of operations performed on the database object:

  • A query that scans a large number of rows and applies filters that use operators such as the following: =, <, >, and IN
  • A query that selects a small number of columns from a table or materialized view with a large number of columns, such as a query that selects five columns from a table with 100 columns
  • A query that joins a small table to a large table
  • A query that aggregates data

The IM column store does not improve performance for the following types of operations:

  • Queries with complex predicates
  • Queries that select a large number of columns
  • Queries that return a large number of rows
  • Queries with multiple large table joins

and our query certainly has complex predicates, returns a fair few rows, and has lots of large-table joins. Taking a simpler query of the sort you’d likely write yourself when querying a data warehouse, you can see the in-memory table access being used again, but with a much simpler and cheaper explain plan:
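
By way of illustration, a simple aggregation of this sort (the column names here are made up rather than the actual SampleApp ones):

    -- a typical hand-written DW query against the detail-level fact table;
    -- column names are illustrative only
    SELECT dest_state,
           COUNT(*)       AS flights,
           AVG(arr_delay) AS avg_arrival_delay
    FROM   bi_airlines.performance
    WHERE  flight_year = 2013
    GROUP  BY dest_state
    ORDER  BY flights DESC;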

If I then turn off the in-memory feature and regenerate the execution plan, you can see that without in-memory the plan is around 5x as expensive:
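
One way to do that is to disable in-memory query execution for just the session, leaving the INMEMORY attributes on the tables alone:

    -- switch off in-memory query execution for this session, then regenerate the plan
    ALTER SESSION SET inmemory_query = DISABLE;

    -- and switch it back on afterwards
    ALTER SESSION SET inmemory_query = ENABLE;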

Running the actual queries in this case gives me a wall-time of around 4 seconds for the in-memory version, and 14 seconds when in-memory query is disabled … but the response time still isn’t anywhere near the initial run where we had data stored row-wise but with aggregate tables, so let’s finish off the testing by putting the aggregate tables in-memory too, and see if that makes a difference.

And of course, it absolutely flew:

[Chart: dashboard response times with both the detail and aggregate schemas in-memory]

So to conclude from my look at the Oracle Database 12c In-Memory option with OBIEE11g v406 SampleApp, I’d say the following based on my initial tests:

  • For BI-type reporting where you’re typically summarising lots of data, the in-memory option doesn’t remove the need for aggregate tables – in my observation you’ll still benefit significantly from having them
  • Where the in-memory option does benefit you is when you’re querying the detail-level data – it helps with aggregation, but its main strength is fast filtering against subsets of columns
  • Some of the more complex SQL queries issued by OBIEE’s BI Server, for example when creating lots of subtotals and totals against a dataset, reduce the effectiveness of the in-memory option – you’ll get the biggest speed improvement, at least at the moment, from queries with simpler predicates and less complex joins

There might be more to the aggregation story in the end, though. Looking at the Oracle Database 12c In-Memory Option Technical White Paper, the in-memory option should in fact help with aggregation through a new optimiser transformation called “vector group by”, a transformation that’s likened to a star transformation and that uses CPU-efficient algorithms and a multi-dimensional array created on-the-fly in the PGA called an “in-memory accumulator”.


In fact, what we’ve heard is that many of the old Oracle OLAP team have moved over to the in-memory option team and were responsible for this feature, so I’ll be taking a closer look at in-memory aggregation in this new feature over the next few months. In my examples though, I didn’t see any examples of vector group by in the query execution plans, so I’m assuming either conditions weren’t right for it, or like star transformations there’s some combination of setting and query factors that need to be in place before it’ll appear in the execution plan (and queries presumably run that much faster).

For now though – that’s my first run-through of the 12c In-Memory Option in the context of a typical BI workload. Check back later in the year for more in-memory option postings, including hopefully something more on the in-memory aggregation feature.

Comments

  1. Peter Scott Says:

    Nice post, Mark – I have another 12c new features blog post coming out next week – it’s not just in-memory that is of interest to the DW/BI set :)

    We should also have a piece about in-memory in the next edition of the UKOUG magazine – but as I wrote it, it has fewer pictures

  2. Matthijs Schram Says:

    Hi Mark,

    Thanks for the post, but I am confused.

    According to the documentation, MEMCOMPRESS FOR QUERY LOW gives the best query performance?

    Thanks,
    Matthijs

    MEMCOMPRESS FOR QUERY LOW
    This method results in the best query performance.

  3. Mark Rittman Says:

    @Matthijs – you’re correct, that was an error by me in interpreting the documentation. I’ve updated the article accordingly now.

    Mark

  4. Bud Endress Says:

    Neither of the plans for the example queries uses the vector transform plan (that is, the in-memory aggregation feature). The queries appear to be eligible for the vector transform plan, so it might just be a costing issue. You might try using the VECTOR_TRANSFORM hint for each of the query blocks that aggregate data. Also, if you see the bitmap indexes used when you have a vector transform plan, you might try making the indexes invisible.

    Generally speaking, you can expect more benefit from the vector transform plan with more joins (that is, more dimension tables joined to a fact table) and more rows aggregated. These examples include only two joins and not very many rows aggregated.

  5. Mark Rittman Says:

    Hi Bud

    Thanks for this, a couple of questions if I may?

    1. You said “not many rows aggregated” – there were 6m rows in the table, how many would you expect to see for vector group by to be relevant?

    2. If we don’t use vector group by, how performant is the in-memory option for aggregation without that? Was I right in saying it’s not a substitute for aggregate tables? Is it a substitute though with vector group by?

    3. Are there any other internal rules about whether a vector group by can take place? Like PKs on the dim tables, FK joins, minimum number of dim joins etc?

    thanks

    Mark

  6. Bud Endress Says:

    You will probably see some benefit with only 6m rows and 2 joins in your VM, but the benefits increase as the rows aggregated and number of joins increases. As a general rule, the number of joins is more important than the number of rows aggregated.

    For example, in one of the tests I typically run, 4-dimensional queries are 4x faster and 9-dimensional queries are 10x faster (on average). This ratio holds pretty steady. (This is with a 500m-row table on a modest blade server.)

    The performance of vector group by vs. hash group by is more difficult to pin down, in part because some vector transformation plans (IMA) will fail over to hash group by depending on the data (e.g., the Cartesian product of the filtered dimension tables).

    In my opinion, efficient access to pre-aggregated data will almost always be faster than dynamic aggregation. This makes perfect sense given that query performance is directly proportional to the number of rows aggregated (all else being equal).

    For highly ad-hoc query workloads, for example where there are many possible grouping columns and/or queries tend to filter then aggregate (e.g., SELECT quarter WHERE month_in_quarter in (1,2)), aggregate tables are not practical and the vector transformation plan can be much faster than alternative plans. That’s where in-memory really shines.

    If the query workload tends to access data that can be pre-aggregated (e.g., unfiltered quarters), a summary table or materialized view can still be very useful. In that case, the benefit of in-memory aggregation will be a reduction in the time (and CPU) needed to create or update the summary table or materialized view. Again, in my tests materialized views were able to significantly reduce query times and in-memory aggregation significantly reduced the time to create or maintain the materialized view.

    In-memory purists might squawk at the idea of using a summary table or materialized view. As I see it, it’s just another tool that the database provides. If in-memory technology makes the maintenance of the summary table or MV 10x faster and queries are improved by another 5-20x, I think it can be a useful strategy.

    At present, the most important consideration for whether a query is vector transformation eligible is the form of the grouping clause. 12.1.0.2 supports GROUP BY; GROUPING SETS, CUBE and ROLLUP are not yet supported. Optimizer costing is another consideration, which is why I always use the VECTOR_TRANSFORM hint.

  7. Steve devine Says:

    Great informative article as ever. Can we have a chart with coloured bars though!

  8. dieluigu Says:

    Mark, really good blog post however (agree with Steve) I would suggest bringing more color to the bar charts.

  9. Mark Rittman Says:

    @dieluigu, Steve – good point about the chart colours. I’ll see if I can redo them in colour.

  10. Mark Rittman Says:

    There you go – added a bit of colour now.
