I guess I have to do some real cost/benefit analysis before I pick a direction. The inputs are as follows:
- very small team, one techy young monkey and one useless old fart that can barely boot up his MacBook; so we can’t support heavy technology stack
Are you the old fart here?
- small number of assets that do require tick data work at this point in time, across a fairly small number of assets (probably talking a few hundred symbols, only 10-20 used concurrently); so organizational aspects are not crucial, I can probably use file-system based structure
I'll say this: if you're still at the point that a filesystem based structure continues to scale then you're probably not at the point where you need an industrial grade solution. This isn't a bad thing, it's actually a good thing because it means your dataset (or hopefully you're projected one) is intending to stay reasonable.
- at some point, I might move in the direction of doing more of the latency sensitive stuff; so flexibility is important
Unless you've got something novel, model-wise, don't you think you're probably outgunned from the start here? My mind would be blown if commercial entities in the latency sensitive space are using any interpreted languages whatsoever at runtime (of which python is an interpreted language [and a slow one at that]). You'd have to be calculating something they're just not even remotely interested in in order to win that war - and who knows, that might just be the case. I'd say if you can avoid this entire space for as long as possible it's probably for the better.
- main requirements are rapid reading and writing of rather large blocks of data (intraday we dump ticks into a text file) for research and back testing
What instruments are you guys able to dump ticks into text files every day and have those not be massive? On top of that how are you preserving floating point precision if you're emitting into ascii data at some point? Just straight dumping doubles via printf or something along those lines? Also, how many days of intraday data? The text file dump of ticks won't scale for anything highly liquid and encompassing months worth of data. If it's just 24h worth of data then it'll probably work but I don't know how you'd backtest anything with just that.
Is there a reason I don’t want to go with bcolz given the above?
There's a lot of talk these days about "columnar" data stores and the like, typically with the vibe that it's some kind of bullet that will make all your numeric processing somehow better than a row oriented data store. In reality it is not a new idea whatsoever - just as "NoSQL" isn't new whatsoever. What matters is your data access patterns and how your models/backtests use the actual data you store. If you're not accessing the data from the context of a single column then it's not really going to do anything for you. For instance, let's say you use symbol as key, OHLC as columns. Your backtesting logic then consults close for every single tick. A column-oriented DBMS will perform better there because locality of reference is high for successive close values (hence they can be bulk retrieved faster than a row-oriented approach). However, let's say you want open+close, now that's going more in the direction of a row-oriented access pattern and a column-oriented structure, while it will still perform decently, is showing less of an advantage. BTW: The typical approach for something like this with a standard row-oriented approach is to index those columns to produce your own column-oriented table (the index). If you were constantly backtesting open+close but stored OHLC data per row then it behooves you to create an index on open+close for that table. More background:
https://en.wikipedia.org/wiki/Column-oriented_DBMS
Also, if you're compressing data to save space you *will* take a hit in access time. It may not be much of a hit if the access patterns are streaming/bulk-reading but the second anything resembles random seek then compression will kill you (as each block will need to be uncompressed in order to read the individual columns or rows within a block - and some of that data may be unrelated).
bcolz may work fine for you, I'm just skeptical of the scalability of any database implemented on top of an interpretive language because they simply don't perform or scale. They eventually all hit bottlenecks requiring use of underlying components written in native languages in order to keep scaling. Generally they're good for scaffolding or prototyping things but they all eventually hit a wall (with unfortunately too many people throwing hardware at the problem rather than changing the algorithm [in this context, the implementation language]).