July 4th, 2013 by Peter Scott
In part 2 I introduced TimesTen Columnar Compression and explained that it creates-column value look up tables so a longer data value can be replaced with a short token. If we take a traditional Fact table we have a set of (usually) integer keys and some measures (again most likely numeric). Replace a 2 or 4 bytes long dimension key with a token of the same size will not save space on compression, in fact we use more memory as we also have to store the token look up table that gets created as part of the compression; I don’t think that compression is useful for most fact tables. On the other hand with dimensions we do get repeating longish columns for example many customers live in the city “San Francisco” we can replace the whole string with a single token of two or four bytes. Using Column Groups we could include STATE with CITY to get “San Francisco, California” reduced to single token. Column Groups can also be used where we have multiple attributes at dimension grain. Taking customer again we may have attributes for GENDER, MARITAL STATUS, HOME OWNER FLAG. For the example columns I gave the product of the number of distinct values for each column is under 255 so we could compress these three columns to a single byte value in the dimension table. In my opinion TimesTen Compression is more suited for dimensions than facts.
So far I have only looked at estimated sizes on empty tables. These are just estimates, real sizing and relative performance evaluation comes from the use of representative data loaded into actual TimesTen tables. Having created tables we can look at loading them. We have many options here:
- ttLoadFromOracle – requires TimesTen 220.127.116.11 or higher,
- ttBulkCp – loads data from flat file (similar to SQL/Loader).
- ttImportFromOracle – creates scripts to transfer data (18.104.22.168 or later) – I introduced this in part 2.
- OBIA ETL target changes – supports non-Oracle sources.
- ODI – supports non-Oracle sources.
- Oracle GoldenGate – replicate changes on source system (OBIA data warehouse warehouse tables) to TimesTen.
- Custom code that uses ODBC/JDBC.
- SQLDevloper – Export as insert statements and execute in TimesTen session. Good for quick and dirty on low data volumes.
Old school, but effective, ttBulkCp works well. We spool flat files from our source tables and then use the utility to load the data into TimesTen. If you use SQLDeveloper you will see that ‘ttbulkcp’ is one of the supported export formats. The great thing about using SQL developer is that we can impose a where clause on to the export to move only a subset of data, a great feature if we are only using a slice of data or developing some form of incremental load into TimesTen. As is often the case there are are a few things to watch out for: date formats need to be TimesTen format or we need to supply a Date format mask to the ttBulkCp command and watch out for value limits on any of the native TimesTen data types.
Load From Oracle
The new kid on the block, allows parallel data loading which can be a big advantage for loading large tables. This feature was new in TimesTen 22.214.171.124 and currently does not support columnar compression, it is expected that compression will be available in a subsequent release. Unlike ttBulkCp, ttLoadFromOracle makes a direct connection to the source database base and executes a SQL query to extract the data set required. The ability to define SQL in the select will be especially useful when we develop processes such as data fragmentation or incremental loading.
Data Load Tests
Firstly, let me emphasise that these are not benchmark timings, I am using artificial test data in a fairly small environment; however they give a general indication for what works and what does not.
In part 2 of this series I showed that using the same DDL on TimesTen as for the original OBIA data warehouse tables gave a massive increase in table size (RAM required compared to disk utilised in Oracle) Changing the data types to TimesTen native types reduced the space requirement, but still came in as larger than the original OBIA usage. For my dimension loads I extracted the source table DDL and modified it to use native TimesTen data types. As I am using TimesTen as storage for a data warehouse style database I added the OPTIMIZED FOR READ directive to my CREATE statements. I exported the data from the source tables using SQLDeveloper to create ttBulkCp files. No data filtering was used on extract. Each file was in turn loaded into pre-created TimesTen Tables. Two additional dimensions were created to experiment with compression (compressed address and compressed product). I also tried a compressed fact table and as expected I saw no space saving (in fact a small increase in memory usage). The compress fact was not used in further tests
For the first set of tests I duplicated the whole fact table to TimesTen, again using ttBulkCp. After loading the tables with data TimesTen object statistics were gathered. Just as in its traditional cousin the TimesTen Cost Based Optimizer needs accurate table statistics to do a good job. I then ran a few queries at differing levels of aggregation over the OBIA Oracle 11g database, the TimesTen copy and the TimesTen copy with compressed dimensions. For full tables at no aggregation the best query performance was seen with the original OBIA data warehouse. Of the two TimesTen table structures the best performance (nearly as good as the OBIA database) was seen with compressed dimensions.
Both in terms of database size and query performance it would appear that using TimesTen as replacement for a detail grain data warehouse is not viable. We do have some other options to exploit the fragmentation or federation features of OBI to allow us to utilize TimesTen with OBIA. We can follow the approach of the Exalyitics Summary Advisor and build aggregated tables in TimesTen or we can use TimesTen to store a narrow time-based slice of data at the same level of grain as the original fact table. In this blog posting I am not going to map my test tables into OBI, instead I am going to look at things from SQL.
For this section I will be using LoadFromOracle and building my aggregate or timeslice as a query over the source database, in the real world I would look at some form of incremental load rather than a full refresh. Building a time-slice table is perhaps the simplest extract; we add a where clause to the extract that limits the select to a subset of rows. In this case I am extracting a particular month from my fact table. The resulting TimesTen table is of course much smaller than before and queries very quickly. This approach will work well if a large amount of user queries are looking at current month but for queries over longer time frames OBI will potentially need to combine the results from both the OBIA database for historic data and TimesTen for current, this happens in the OBI server.
The final test were to create summary tables at a higher level of aggregation, again I have used the SQL to build the aggregates as part of the ttLoadFromOracle call. Again our aggregates are much smaller than the original source and execute very quickly as much less data needs to be looped through.
In conclusion I would say that the use of TimesTen is viable over OBIA apps providing you only work with data subsets. The choice between using a recent time-slice or an aggregate table will depend on you reporting needs. If most queries require base data (that is more operational reporting) it is probably best to use a recent time-slice in TimesTen, on the other hand if people mainly look at management dashboards then targeted summary aggregates are the way to go. However any aggregate in TimesTen must be a lot smaller than source table or else there is no advantage in using the aggregate