Storing time series data

For what it's worth, I use binary files of tick data indexed by symbol and date.
When you need to test across years of tick data with hundreds of symbols, CSV just isn't going to cut it.
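A minimal sketch of what a binary tick layout indexed by symbol and date could look like in Python. The record format (int64 nanosecond timestamp, float64 price, uint32 size), the directory layout, and the function names are all assumptions for illustration, not the poster's actual format.

```python
import struct
from pathlib import Path

# Hypothetical fixed-width record: epoch-nanosecond timestamp (int64),
# price (float64), size (uint32). "<" means little-endian with no padding,
# so every record is exactly 20 bytes.
TICK = struct.Struct("<qdI")


def tick_path(root: Path, symbol: str, date: str) -> Path:
    """One file per symbol and date, e.g. root/AAPL/2020-01-02.bin."""
    return root / symbol / f"{date}.bin"


def append_ticks(path: Path, ticks) -> None:
    """Append (timestamp_ns, price, size) tuples to the end of the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("ab") as f:
        for ts_ns, price, size in ticks:
            f.write(TICK.pack(ts_ns, price, size))


def read_ticks(path: Path):
    """Read the whole file back as a list of (timestamp_ns, price, size)."""
    return list(TICK.iter_unpack(path.read_bytes()))
```

Because every record has the same width, the n-th tick sits at byte offset `n * TICK.size`, which is a large part of why this kind of file is faster to scan than CSV: there is no text parsing and no searching for line breaks.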
Me too. I use a binary format for daily and 1-minute data for thousands of symbols.
 
Why for, "worst"?? That looked like a template that would have my Functional Programming friends all tingly with joy!

Haha thanks, but it's the worst because there are so many possibilities for bugs. It's completely uncommented and extremely fragile.
 
Me too. I use a binary format for daily and 1-minute data for thousands of symbols.

And I assume you are using some kind of 'master' file to store symbol-specific information such as file name and location, plus general symbol data such as the exchange and, for futures data for example, other 'contract specs'?
 
And I assume you are using some kind of 'master' file to store symbol-specific information such as file name and location, plus general symbol data such as the exchange and, for futures data for example, other 'contract specs'?
Thanks for asking, it's a good question. Files are indexed by symbol name (and by date for intraday data). Besides quote data, all other symbol info, including company profile and fundamentals, is stored in data files. In my system, daily data is split into per-symbol files and minute data into per-symbol, per-day files. The data files live on a high-performance file system (usually with a hash table used to locate a file within the directory structure). Active data is also cached in RAM. As an example, retrieving basic info for a list of 1000 symbols takes less than 0.1 seconds. I have tools to conveniently update the binary data files, adding new data points daily and/or in real time.
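One way the "hash to locate a file, cache active data in RAM" idea could be sketched in Python. The shard scheme (first two hex characters of a SHA-256 digest), the `daily`/`minute` directory names, and the cache size are assumptions made up for this example, not a description of the system above.

```python
import hashlib
from functools import lru_cache
from pathlib import Path


def shard_dir(symbol: str) -> str:
    """Map a symbol to one of 256 subdirectories via a stable hash."""
    return hashlib.sha256(symbol.encode("utf-8")).hexdigest()[:2]


def daily_path(root: Path, symbol: str) -> Path:
    """Daily data: one file per symbol."""
    return root / "daily" / shard_dir(symbol) / f"{symbol}.bin"


def minute_path(root: Path, symbol: str, date: str) -> Path:
    """Minute data: one file per symbol and per day."""
    return root / "minute" / shard_dir(symbol) / symbol / f"{date}.bin"


@lru_cache(maxsize=1024)
def load_bytes(path: Path) -> bytes:
    """Keep recently used files in RAM; repeat reads skip the disk."""
    return path.read_bytes()
```

Sharding by hash keeps any single directory from holding thousands of entries, and the path is computed directly from the symbol, so no separate master index is needed just to find a file. Note that the cache in this sketch would serve stale bytes after a file is updated, so a real tool would need to invalidate entries (for example with `load_bytes.cache_clear()`) after each write.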
 
Custom solution built on top of RocksDB. I store raw tick data, though, so this is overkill for most.

I'm thinking I want to do this as well, not to use it directly, but so I can transform it later. Why did you choose RocksDB over CSV, PostgreSQL, or SQLite?
 
I chose RocksDB because it's a simple key/value store optimized for append operations, and I don't need to modify old data. For my access patterns, which are scans for backtesting, it's the best fit. Most key/value and columnar stores are better suited to this type of work than relational databases (e.g. PostgreSQL, SQLite). RocksDB is very low level and is not for the novice. Higher-level alternatives that could be used include InfluxDB (which I have moved off of for performance reasons) or kdb+, for example. I have years of raw tick data, so this works best for me. If you are not storing data at a granularity finer than 1 minute, then any relational DB will be fine.
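To illustrate why an ordered key/value store suits append-then-scan workloads, here is a rough Python sketch. The key layout (symbol, a NUL separator, then a big-endian timestamp) is an assumption for illustration, and `SortedKV` is a toy in-memory stand-in, not the RocksDB API.

```python
import bisect
import struct


def make_key(symbol: str, ts_ns: int) -> bytes:
    """Symbol + NUL + big-endian uint64 timestamp.

    Big-endian encoding makes byte-wise key order match time order,
    so one symbol's ticks form a contiguous, time-sorted key range.
    """
    return symbol.encode("ascii") + b"\x00" + struct.pack(">Q", ts_ns)


class SortedKV:
    """Toy ordered key/value store (hypothetical, in-memory only)."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def put(self, key: bytes, value) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._vals[i]
            i += 1


def scan_symbol(db: SortedKV, symbol: str, t0: int, t1: int):
    """All values for one symbol with t0 <= timestamp < t1."""
    return [v for _, v in db.scan(make_key(symbol, t0), make_key(symbol, t1))]
```

With a key like this, a backtest over one symbol and a time window becomes a single sequential range scan, and new ticks always land at the end of that symbol's range, which matches the append-only pattern described above.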
 