Blog

Posts tagged with "column-oriented databases".

Running analytics using Infobright

Sid

05 Mar 2010 16:10

Last month I went to an interesting seminar given by MySQL where one of the presentations was by Infobright. It’s an open source analytics solution, and their sales guy was plausible enough to get me to download it.

I’m currently using it for some analytics/Business Intelligence testing, running it against historical data from a data warehouse we’ve built. I’m asking it things that I know relational databases struggle with, or at least need a lot of tuning to handle nicely (e.g. top-n queries to show the most valuable customers).
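To make "top-n query" concrete, here is a minimal sketch of the kind of question I mean. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it is not Infobright, and the `sales` table, its columns and the sample rows are all made up for illustration.

```python
import sqlite3

def top_customers(conn, n):
    """Return the n customers with the highest total spend."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM sales
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("acme", 120.0),
        ("bolt", 75.5),
        ("acme", 30.0),
        ("cord", 200.0),
        ("bolt", 10.0),
    ],
)

top_two = top_customers(conn, 2)
print(top_two)
```

The GROUP BY plus ORDER BY over an aggregate is the part that tends to hurt on a big fact table, because the database has to total every customer before it can pick the top few.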

At the moment the performance improvements are pretty significant (15x quicker) with only a month’s worth of data. I’m hoping that once we load a quarter’s worth, a year’s worth or several years’ worth, the improvements will ramp up even more.

The nice thing about it is that it’s built on MySQL so you can run regular SQL against it and it does all the clever stuff.

The weird thing about it from a relational viewpoint is that there are no indexes or anything like that. It’s a column-oriented database and hence great for group and aggregate functions – the mainstay of analytics.
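A toy sketch of why the column layout suits aggregates: if each column is kept as its own list, a grouped total only has to read the two columns it needs and never touches the rest of the row. This is just an illustration of the general idea in plain Python, not a description of how Infobright stores data internally, and the sample rows and the `total_by_group` helper are invented for the example.

```python
from collections import defaultdict

# Row-oriented view: each record holds every field together.
rows = [
    {"customer": "acme", "region": "north", "amount": 120},
    {"customer": "bolt", "region": "south", "amount": 75},
    {"customer": "acme", "region": "north", "amount": 30},
    {"customer": "cord", "region": "south", "amount": 200},
]

# Column-oriented view: one list per column, all in the same row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_by_group(columns, group_col, value_col):
    """Sum value_col per distinct value of group_col, reading only those two columns."""
    totals = defaultdict(int)
    for key, value in zip(columns[group_col], columns[value_col]):
        totals[key] += value
    return dict(totals)

totals = total_by_group(columns, "region", "amount")
print(totals)
```

Here the `customer` column is never read at all, which is the whole point when a table has dozens of columns and a query only groups on one and sums another.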

Once I’ve done a bit more formal testing I’ll publish the queries we used along with a comparison against the traditional relational database the warehouse currently runs on.

Quick update:
Just finished the first round of testing with around 625,000 rows of data, which represents about a third of a year. Infobright was faster on all the queries, by between 15x and 30x.

Tagged in: infobright, analytics, business intelligence, data warehousing, columnar databases, column-oriented databases

Results of Infobright evaluation

Sid

18 Mar 2010 18:06

In my last post I was evaluating Infobright’s column-based database (ICE) against some data in a warehouse we’ve built for one of our clients. We compared storage space and query timings, and the results were impressive enough that I thought I’d share them.

Versus an unoptimised MySQL InnoDB database, ICE was between 15 and 40 times faster on the four analytical queries we selected. In addition, ICE used less than a tenth of the storage space.

We also tested ICE against a database that was optimised heavily in favour of the queries we were running. This test was pretty unrealistic, as no warehouse I’ve ever worked on has been tuned to fit every query that runs against it. Despite this, ICE was 2 to 6 times faster than the optimised database and around 17 times smaller.

The test case we used only had 1.3 million rows but the results were impressive nonetheless. Probably the best thing about it was how easy it was to get it up and running and performing to that level – compare that with the brainpower and effort needed to optimise a database. I know this was a limited test but it was still on real data and asking real business questions.

Anyway, we’ve put up the full report and approach on Scribd, so have a read if you’re interested and either comment here or get back to me for more information.

Tagged in: infobright, analytics, business intelligence, data warehousing, columnar databases, column-oriented databases