mySQL As A Data Warehousing Platform?
May 24th, 2004 by Mark Rittman
Julian Ford passed on a link
to an article by Michael J. Radwin on Buildng
Data Warehouses With mySQL. The article refers to a presentation by John
Ashenfelter who’s also written a book
on mySQL and data warehousing which is due out in July 2004.
Data warehousing using mySQL on one hand sounds counter-intuitive but also
has a few things going for it. On the plus side, data warehouses don’t need
to be 24×7x365 available, and transaction integrity and support is less of an
issue, which are usually seen as weak areas for mySQL. On the other hand, mySQL
is unlikely to scale up to the terabyte-size data warehouses you often come
across now, and it lacks features such as materialized views and query rewrite,
star joins, bitmap indexes, upserts, views, subqueries, incremental or hot
backups, stored procedures (for ETL) and so on. I’m reluctant to criticize mySQL
but I’m not sure I’d recommend it for a data warehousing project.
From looking another mySQL and data warehousing presentation ("Large
Volume Data Transformation and Warehousing Using MySQL" by a Swedish
company called Lentus) it’s interesting that
they concentrate on using mySQL for the data staging and ETL process, rather
than querying and analysis. This would fit in well with the strengths of mySQL
(simple database, not overly complex, no need for complex transactional support)
although Oracle again would score points through it’s built in ETL functions
such as streams, external
tables, upserts, table functions and so on. Where you’re going to run into
problems is when you try to query a very large data set using mySQL - Oracle has
a number of enhancements that make star
transformations (queries run against star schemas) very efficient and has
mechanisms for redirecting
user queries from base tables to summary
tables without user intervention.
Having said all that, there probably is a role for mySQL in data warehousing,
most probably as a reporting database for other mySQL databases. If you’re
familiar with mySQL it’s more a case of learning the data warehousing process
and then applying it to your knowledge of mySQL, and in all likelihood your
database isn’t going to be all that large anyway - it’s more about bringing data
together and getting a 360 degree view of your organization rather than dealing
with very large datasets. Interestingly and nonetheless, there’s a bit of a
cottage industry springing up around reporting and analysis around mySQL and
open source, with a product called OLAP4All
offering a relational OLAP solution with a mySQL backend, Mondrian
being an open-source Java multidimensional OLAP engine that implements
Microsoft’s MDX query language, and JPivot,
a JSP tag library for delivering Mondrian OLAP data over the web.
Has anyone used mySQL or Mondrian for data warehousing projects? If so, get
in touch as I’d be interested in how things turned out.
June 3rd, 2004 at 4:57 am
Mark,
I had been looking at some open source technologies for BI/ETL for quite some time. My company lab installed and examined Mondrian… We had no real opinion — it seemed to work as described.
I came across a whitepaper/solution a while ago about some clustered parallel query work on postgresql. It’s been developed by a company, metapa (http://www.metapa.com). They have a free whitepaper which has some information (vague and non technical) about their clustered database solution.
Best I can see, it’s some proprietary extensions to postgresql to achieve partial clustering. They’re well funded, and nodes are inexpensive (i believe around 5k) so it could be interesting if they make some headway.
Nick
June 5th, 2004 at 2:15 pm
Hi Nick,
Thanks for that. I’ll take a look at metapa.com in the near future.
Mark
August 18th, 2004 at 5:32 am
I am cracking Mondrian at the moment. I tot a search in Google can lead me to clues, and it led me here. haha, will update u then if i crack it. Its just that i followed the Mondrian setup guide about changing the jdbc url from odbc to myssql (been smart
i use oracle instead of mysql but keep getting unable to load oracle.driver..). In the properties, i specify driver as oracle.jdbc.driver.OracleDriver, and the url in my case is jdbc:oracle:thin:@REDHUAN:1521:foodmart/foodmart@compiere. Hmm, i should double check that it classpath to classes12.zip, hmm, i tot i did. Have pity on this dummy.
February 28th, 2007 at 1:04 pm
How to connect from Oracle to Mondrian. What all xml files need to be modified.
Thanks