Yeah, that's the big complaint with the free *NIXes - you have to build your environment yourself out of parts. Which is also it's biggest positive if you know what you want and know what you're doing..
I've taken a different approach than blah. My database schema is extremely simple:
Instruments table and Ticks table. I compute bars on the fly. So far it hasn't given me issues, but I'll probably create an indexed view if that's a problem. (Here's where I love Postgres: Materialized views (in Postgres) are a lot more forgiving than Indexed views (in SQL Server) at the cost of manually triggering an update.)
Right now, I only have 1 instrument in there for 75 million rows and 3GB of space. I don't have other instruments cause I'm running out of space. I'll get a bigger drive and more instruments once I start making money on this one.
I've compared the speed of SQL vs CSV. They're nearly the same speed. The biggest difference is that for CSV, I have to wait forever to start testing anywhere except the middle. For SQL, it's instantaneous.
Also, storing the results of the trades in SQL is very important for me, since I query that thing in a bunch of ways. Maybe I should stick it in a cube some day?
My schema for results look like this:
A strings table and a Trades table. The trades table records prices, times, and identifiers so I can separate out which strategy did what, and why it decided to do so.
I found most of the data issues came from storing trades too slowly. So I bulk load the trades in batches, and use a separate thread to do that.
I like your scheme. I'm sure it's incredibly fast. Since you know your application very well, you don't need to provide a general data interface that SQL provides.
I think, though, it's not worth it at this stage for me to use such a scheme. Once I put in any non-trivial strategy into my backtest system, however, data acquisition drops down to 2% of total time.
So I think if we were looking at database organization purely from a performance standpoint, the binary store would be best. However, once we account for the limited time I have to spend on this project (about an hour a day), development time starts to dominate.
For what it's worth, I still haven't spent the time to import some of my CSV data. I just wait the 5 minutes for it to run through the old data. I spend that 5 minutes working on refining my strategy.
Imagine the following schema:
Header[x bytes, fixed size] + Datapacket1 + Datapacket2 ...., where Datapacket = [DateTime as millisecond long integer, datapointA (such as BidPrice), datapointB(AskPrice), .....]
Each Datapacket within a given times series has the same fixed size in bytes. You use a "StoragePlugin" that shares the same interface with other storage plugins and the interface enables you to serialize datapoints, deserialize them, and it stores knowledge of the size of each data packet. In that way you can essentially store mixed data in the very same database. You can store time series with datapoints of type [timestamp, bid, ask] such as fx tick based data, or you could store [timestamp, last trade, volume] for a futures quote, or [timestamp, open, high, low,close]. The Header will provide information about the data structure such as size of file, number of data points, size of each data packet, symbol information,...
With such structure you can now easily run a binary search algo over an arbitrarily large time series and it will take a fraction of a millisecond to retrieve the matching position in your binary time series. From that datapoint you can simply read x number datapoints, convert on the fly to compressed data or run any other type of query. Sure, you do not have the benefits a relational db offers but actually you still do because you can simply read the data points in question and run relational queries in memory. That is basically anyway how SQL Server does it unless you heavily index your data. But you can essentially do the same by caching your queries or optimize queries in memory (I can provide more info on that if requested but it will be more in-depth).