Okay, maybe my solution is technically a bit ambitious given that you won't have unlimited capacity/patience for IT on your team. I think the hosted pub/sub + cluster-of-subscribers approach is sexy but might be overkill given the above. Certainly untested.

I guess I have to do some real cost/benefit analysis before I pick a direction. The inputs are as follows:
- very small team: one techy young monkey and one useless old fart who can barely boot up his MacBook, so we can't support a heavy technology stack
- only a small number of assets require tick-data work at this point (probably a few hundred symbols, with only 10-20 used concurrently), so organizational aspects are not crucial and I can probably use a file-system-based structure
- at some point I might move toward more latency-sensitive work, so flexibility is important
- the main requirement is rapid reading and writing of fairly large blocks of data (intraday we dump ticks into a text file) for research and backtesting
Is there a reason I don’t want to go with bcolz given the above?
Personally, I can't comment on bcolz, but it does seem interesting. I'll also say that, given you really have a log-replay requirement rather than an ad hoc query requirement, I think a file system of binary tick-data files with the proper abstractions on top is a viable, low-headache approach. Obviously you'd want to write an API that abstracts away all the headaches of text files. I'd say test bcolz, though. Worst case, you've probably made migrating to whatever DB you try next easier than it would be from plain text files. If I were in your shoes, I'd probably give MariaDB some thought as well.
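To make the "binary files behind an API" idea concrete, here's a rough sketch in Python/NumPy, not tested against real market data. Everything in it is an assumption for illustration, not bcolz or any existing library: the three-field tick layout (`TICK_DTYPE`), the `<root>/<symbol>/<YYYY-MM-DD>.npy` directory layout, the hypothetical `TickStore` class and its method names, and comma-separated `ts_ns,price,size` lines for the intraday text dump. It writes one sorted binary block per symbol per day, memory-maps files on read, and replays ticks in time order across days.

```python
import os
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical tick record layout; adjust the fields to match your feed.
TICK_DTYPE = np.dtype([
    ("ts_ns", "<i8"),  # epoch timestamp in nanoseconds
    ("price", "<f8"),
    ("size", "<i4"),
])


class TickStore:
    """Hypothetical file-system tick store laid out as <root>/<symbol>/<YYYY-MM-DD>.npy."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, symbol, day):
        return self.root / symbol / f"{day}.npy"

    def write_day(self, symbol, day, ticks):
        """Write one day of ticks as a single binary block, sorted by timestamp."""
        arr = np.asarray(ticks, dtype=TICK_DTYPE)
        arr = arr[np.argsort(arr["ts_ns"], kind="stable")]
        path = self._path(symbol, day)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        with open(tmp, "wb") as f:
            np.save(f, arr)
        os.replace(tmp, path)  # rename into place so readers never see a half-written file

    def import_text(self, symbol, day, text_path):
        """Convert a comma-separated intraday dump (assumed ts_ns,price,size per line) to binary."""
        ticks = np.loadtxt(text_path, dtype=TICK_DTYPE, delimiter=",", ndmin=1)
        self.write_day(symbol, day, ticks)

    def read_day(self, symbol, day, mmap=True):
        """Load one day; memory-mapping avoids reading the whole file up front."""
        return np.load(self._path(symbol, day), mmap_mode="r" if mmap else None)

    def days(self, symbol):
        """List stored days for a symbol; ISO date names sort chronologically."""
        d = self.root / symbol
        return sorted(p.stem for p in d.glob("*.npy")) if d.is_dir() else []

    def replay(self, symbol, start=None, end=None):
        """Yield ticks in time order across days (the log-replay access pattern)."""
        for day in self.days(symbol):
            if (start is not None and day < start) or (end is not None and day > end):
                continue
            yield from self.read_day(symbol, day)


if __name__ == "__main__":
    # Usage example against a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        store = TickStore(root)
        store.write_day("ES", "2024-01-02", [(2, 100.25, 3), (1, 100.0, 5)])
        for tick in store.replay("ES"):
            print(int(tick["ts_ns"]), float(tick["price"]), int(tick["size"]))
```

One file per symbol per day seems a reasonable fit for a few hundred symbols with 10-20 active at a time. Since callers only touch `write_day`/`read_day`/`replay`, swapping the `.npy` files for bcolz or a database later should mostly mean changing the class internals.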