Quote from DarthSidious:
I am still trying to understand how using infobright or any other column based database is helping you. A good paper on this topic can be found here. Quoting from that article: "If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns" says Gartner analyst Donald Feinberg. "When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."
Now, your typical query would bring back all data from all row all the time, wouldn't it? Except, well, if you are just comparing close prices of 2 or more instruments, or querying close prices only. In those cases, you can leave out the OHL data. But say for building bars etc., theoretically, there is no benefit.
Maybe some benefit based on the fact that some of the columns are compressed? (again, see the article).
I would be very interested in some hard numbers for comparison purposes. For example,
a) total # of rows, total # of rows for MSFT, total # of rows returned for "select * where ticker="MSFT" and from=thisdate and to=thatdate (say for 2 years)
b) same as above, but comparison between 2 stocks
Time taken for the above queries, and of course your hardware.
I think the benefit actually comes from compression, because the storage on disk after compression is substantially less than without compression. The results are returned to me quite quickly, in under a second. 16GB vs much less.
I'm not sure this is a long-term solution, but for my purposes, it's ok for now. I avoid sophisticated queries and use 'merge' in R, post-query, where date+time are in a range. I only use the database to fish queries between a certain date, like in your example. So I'm not really even using any sort of query optimizer or anything.