GAT, Arctic does look very cool. It's column-oriented and compressed. I wonder at what granularity they store the data. I also wonder if they support queries with arbitrary start/end times, and if so, how they do it. If I can figure out how to access the data from MATLAB, then I'll install it and give it a try.
nitro, I agree. Collecting realtime ticks for many/all instruments is daunting. Even just maintaining 100% uptime seems hard. My next step will be to buy daily TAQ updates that I can process overnight, to keep my backtests up-to-date. Then I can just discard whatever ticks I process throughout the day.
Butterfly, perhaps the title is misleading. I'm not talking about collecting realtime ticks. I'm talking about creating a disk-based data store that makes it fast to read in all trades for a ticker on a given day. For that purpose, I don't believe there is a faster method. The data can be read using C/C++, but getting it into Python is also fast, since zlib is a compiled library. Of course, applying backtest logic to the ticks once they're in Python will not be fast, but that isn't the point.
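To make the idea concrete, here is a minimal Python sketch of that kind of store: one zlib-compressed file per ticker per day, read back with a single decompress call. The record layout (`<QII`), the directory naming, and the helper names `day_path`, `write_day`, and `read_day` are all assumptions for illustration, not a description of any particular production format.

```python
import struct
import zlib
from pathlib import Path

# Hypothetical fixed-width trade record (an assumption, not a standard):
# timestamp in microseconds since midnight (uint64),
# price in 1/10000 of a dollar (uint32), size in shares (uint32).
RECORD = struct.Struct("<QII")


def day_path(root, ticker, date):
    # Assumed layout: one file per ticker per day, e.g. root/20240102/AAPL.z
    return Path(root) / date / f"{ticker}.z"


def write_day(root, ticker, date, trades):
    """Pack all trades for one ticker/day and store them as one zlib blob."""
    raw = b"".join(RECORD.pack(ts, px, sz) for ts, px, sz in trades)
    path = day_path(root, ticker, date)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw, 6))


def read_day(root, ticker, date):
    """Read one file, decompress it in one call, and unpack every record."""
    raw = zlib.decompress(day_path(root, ticker, date).read_bytes())
    return list(RECORD.iter_unpack(raw))
```

The read path is one file read plus one call into compiled zlib code, which is why getting the bytes into Python is quick. Unpacking into Python tuples is the slower part; loading the decompressed buffer into a typed array instead would avoid most of that cost.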
Has anyone here tried HDF5? I tried it once a few years ago, and it was very slow. I must have done something wrong, though, because others say it is fast.