Storing time series data

aqtrader · Feb 21, 2019

Craig66 said:
For what it's worth, I use binary files of tick data indexed by symbol and date.
When you need to test across years of tick data with hundreds of symbols, CSV just isn't going to cut it.

Me too. I use a binary format for daily and 1-min data for thousands of symbols.
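To make the binary-vs-CSV point concrete, here is a minimal Python sketch of one possible fixed-width binary layout for 1-min bars. The record layout (`<q4dq`: int64 epoch seconds, four float64 prices, int64 volume) and the function names are assumptions for illustration, not the format either poster actually uses. Because every record has the same size, loading is plain unpacking with no text parsing, and record *i* can be reached with a single seek.

```python
import struct
from typing import List, Tuple

# Hypothetical fixed-width layout for one 1-min bar (an assumption):
# little-endian int64 epoch seconds, float64 open/high/low/close, int64 volume.
# "<" disables padding, so each record is exactly 8 + 4*8 + 8 = 48 bytes.
BAR = struct.Struct("<q4dq")

Bar = Tuple[int, float, float, float, float, int]

def write_bars(path: str, bars: List[Bar]) -> None:
    """Append bars to a per-symbol file, packed back to back."""
    with open(path, "ab") as f:
        for bar in bars:
            f.write(BAR.pack(*bar))

def read_bars(path: str) -> List[Bar]:
    """Load a whole file: fixed-size unpacking, no text parsing."""
    with open(path, "rb") as f:
        data = f.read()
    return list(BAR.iter_unpack(data))

def read_bar_at(path: str, index: int) -> Bar:
    """Random access: seek directly to record `index`."""
    with open(path, "rb") as f:
        f.seek(index * BAR.size)
        return BAR.unpack(f.read(BAR.size))
```

Usage, with a hypothetical per-symbol file name: `write_bars("AAPL_1min.bin", bars)` followed by `read_bar_at("AAPL_1min.bin", 0)`. For bulk loads, `numpy.fromfile` with a matching structured dtype is a common alternative to unpacking record by record.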

nooby_mcnoob · Feb 21, 2019

tommcginnis said:
Why for, "worst"?? That looked like a template that would have my Functional Programming friends all tingly with joy!

Haha thanks, but it's the worst because there are so many possibilities for bugs. It's completely uncommented and extremely fragile.

H2O · Feb 22, 2019

aqtrader said:
Me too. I use a binary format for daily and 1-min data for thousands of symbols.

And I assume you are using some kind of 'master' file to store symbol-specific information, such as file name/location and general symbol data like the exchange, or 'contract specs' in the case of futures data?

aqtrader · Feb 24, 2019

H2O said:
And I assume you are using some kind of 'master' file to store symbol-specific information, such as file name/location and general symbol data like the exchange, or 'contract specs' in the case of futures data?

Thanks for asking; good question. Files are indexed by symbol name (and by date for intraday data). All other symbol info, including company profile and fundamentals in addition to quote data, is stored in data files. In my system, daily data are divided into per-symbol files, and minute data are divided into per-symbol, per-day files. Data files are stored in a high-performance file system (usually using a hash table to locate a file within the directory structure), and active data are cached in RAM. As an example, retrieving basic info for a list of 1,000 symbols takes less than 0.1 seconds. I have tools to conveniently update the binary data files, adding new data points daily and/or in real time.
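A rough Python sketch of the layout described above: per-symbol daily files, per-symbol per-day minute files, a hash to spread symbols across subdirectories, and an in-RAM cache for active files. The directory scheme, the bucket count, the use of `zlib.crc32` as the hash, and `functools.lru_cache` as the cache are all assumptions for illustration; the post does not say which hash or cache the author uses.

```python
import os
import zlib
from functools import lru_cache

# Hypothetical layout (an assumption, not the poster's actual scheme):
#   <root>/daily/<bucket>/<SYMBOL>.bin
#   <root>/minute/<bucket>/<SYMBOL>/<YYYYMMDD>.bin
N_BUCKETS = 256

def bucket(symbol: str) -> str:
    """Hash the symbol into one of N_BUCKETS subdirectories so that thousands
    of symbols are spread out instead of sitting in one huge directory."""
    return f"{zlib.crc32(symbol.encode('ascii')) % N_BUCKETS:03d}"

def daily_path(root: str, symbol: str) -> str:
    # One file per symbol for daily data.
    return os.path.join(root, "daily", bucket(symbol), f"{symbol}.bin")

def minute_path(root: str, symbol: str, day: str) -> str:
    # One file per symbol per day for minute data; `day` is "YYYYMMDD".
    return os.path.join(root, "minute", bucket(symbol), symbol, f"{day}.bin")

@lru_cache(maxsize=1024)
def load_file(path: str) -> bytes:
    """Read a whole data file; recently used ("active") files stay in RAM."""
    with open(path, "rb") as f:
        return f.read()

def append_record(path: str, record: bytes) -> None:
    """Append already-encoded records (e.g. new daily or real-time points)."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(record)
    load_file.cache_clear()  # crude invalidation: cached contents may be stale
```

The cache invalidation here is deliberately crude (any append clears the whole cache); a production tool would more likely invalidate or update only the file that changed.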

HobbyTrading · Feb 24, 2019

I use CSV files. My data files are rather small because I only use daily OHLC price data for the last 2-3 years.
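For a dataset this small, Python's standard library is enough. A minimal sketch, assuming one CSV per symbol with a `Date,Open,High,Low,Close` header and ISO-formatted dates (both assumptions, not details from the post):

```python
import csv
from datetime import date
from typing import List, NamedTuple, TextIO

class DailyBar(NamedTuple):
    day: date
    open: float
    high: float
    low: float
    close: float

def read_daily_csv(f: TextIO) -> List[DailyBar]:
    """Parse a CSV with an assumed header: Date,Open,High,Low,Close (ISO dates)."""
    bars = []
    for row in csv.DictReader(f):
        bars.append(DailyBar(
            day=date.fromisoformat(row["Date"]),
            open=float(row["Open"]), high=float(row["High"]),
            low=float(row["Low"]), close=float(row["Close"]),
        ))
    bars.sort(key=lambda b: b.day)  # ensure chronological order
    return bars
```

Usage: `with open("SPY.csv", newline="") as f: bars = read_daily_csv(f)`. At roughly 252 trading days per year, 2-3 years is only about 500-750 rows per symbol.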

T0pH4t · Feb 27, 2019

Custom solution built on top of RocksDB. I store raw tick data, though, so this is overkill for most.
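One common way to lay out tick data in a byte-ordered key/value store such as RocksDB is to make keys sort in (symbol, time) order, so a backtest over one symbol becomes a single sequential range scan. The Python sketch below shows such a key encoding. The key layout is an assumption, not T0pH4t's actual schema, and `SortedKV` is a tiny hypothetical in-memory stand-in used instead of a RocksDB binding so the example runs without extra dependencies.

```python
import bisect
import struct
from typing import Iterator, List, Tuple

# Hypothetical key layout (an assumption):
#   symbol bytes + b"\x00" + big-endian uint64 timestamp (ns) + big-endian uint32 seq
# Big-endian integers compare byte-wise in numeric order, so keys sort by
# symbol, then time, then sequence number (for ticks sharing a timestamp).
def encode_key(symbol: str, ts_ns: int, seq: int = 0) -> bytes:
    return symbol.encode("ascii") + b"\x00" + struct.pack(">QI", ts_ns, seq)

def decode_key(key: bytes) -> Tuple[str, int, int]:
    sym, rest = key.split(b"\x00", 1)
    ts_ns, seq = struct.unpack(">QI", rest)
    return sym.decode("ascii"), ts_ns, seq

class SortedKV:
    """In-memory stand-in for a byte-ordered KV store (NOT RocksDB itself)."""
    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._vals: List[bytes] = []

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in byte order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._vals[i]
            i += 1

def scan_ticks(db: SortedKV, symbol: str, t0_ns: int, t1_ns: int):
    """All ticks for `symbol` with t0_ns <= ts < t1_ns, in time order."""
    return db.scan(encode_key(symbol, t0_ns), encode_key(symbol, t1_ns))
```

The `b"\x00"` separator keeps a symbol like `AAP` from matching keys of `AAPL` during a scan. With RocksDB itself, the equivalent is positioning an iterator with `Seek()` at the start key and stopping at the end key (for example via `ReadOptions::iterate_upper_bound`).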

nooby_mcnoob · Feb 27, 2019

T0pH4t said:
Custom solution built on top of RocksDB. I store raw tick data, though, so this is overkill for most.

I'm thinking I want to do this as well, not to use it directly but so I can transform it later. Why did you choose RocksDB over CSV, PostgreSQL, or SQLite?

T0pH4t · Feb 27, 2019

I chose RocksDB because it's a simple key/value store optimized for append operations where old data doesn't need to be modified. For my access pattern, which is scans for backtesting, it's the optimal choice. Most key/value and columnar stores are better suited to this type of work than relational databases (e.g. PostgreSQL, SQLite). RocksDB is very low level and is not for the novice. Higher-level stores that could be used include InfluxDB (which I have moved off of for performance reasons) or kdb+. I have years of raw tick data, so this works best for me. If you are not storing data at a granularity finer than 1 min, then any relational DB will be fine.
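To illustrate the last point, here is a minimal sketch of 1-min bars in SQLite using Python's built-in `sqlite3` module. The table name, columns, and epoch-seconds timestamps are assumptions. The composite primary key on `(symbol, ts)` in a `WITHOUT ROWID` table stores rows ordered by symbol and time, so a per-symbol time-range query is an index range scan.

```python
import sqlite3
from typing import List, Tuple

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WITHOUT ROWID clusters the table on its primary key (symbol, ts).
    conn.execute("""
        CREATE TABLE IF NOT EXISTS bars_1min (
            symbol TEXT    NOT NULL,
            ts     INTEGER NOT NULL,  -- epoch seconds, bar start
            open   REAL, high REAL, low REAL, close REAL,
            volume INTEGER,
            PRIMARY KEY (symbol, ts)
        ) WITHOUT ROWID
    """)
    return conn

def insert_bars(conn: sqlite3.Connection, rows: List[Tuple]) -> None:
    with conn:  # one transaction per batch; commits on success
        conn.executemany(
            "INSERT OR REPLACE INTO bars_1min VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

def load_bars(conn: sqlite3.Connection, symbol: str, t0: int, t1: int) -> List[Tuple]:
    """Bars for `symbol` with t0 <= ts < t1, oldest first."""
    cur = conn.execute(
        "SELECT ts, open, high, low, close, volume FROM bars_1min "
        "WHERE symbol = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (symbol, t0, t1))
    return cur.fetchall()
```

Usage: `conn = open_db("bars.db")`, then `insert_bars(conn, rows)` for each batch and `load_bars(conn, "SPY", t0, t1)` for a backtest window.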

T0pH4t · Feb 27, 2019

Maybe take a look at this thread. I posted a couple of times and there is a lot of good info in it. https://www.elitetrader.com/et/threads/time-series-db.316394/