Tick Database Implementations

NetTecture · Jun 8, 2012

Quote from PocketChange:

Just store time stamp as string... What timer do you plan to use for ns precision?

Maybe he does what I do

Millisecond precision, BUT using the 100 ns Windows-level tick to ORDER timestamps that share the same millisecond. This lets me keep the order on retrieval, based on the event stream coming from the interface.
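
A minimal sketch of that ordering scheme, assuming each event carries a millisecond timestamp plus a 100 ns counter captured on arrival. The names `Tick`, `ts_ms` and `arrival_ticks` are hypothetical and not taken from the poster's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tick:
    ts_ms: int          # feed timestamp, millisecond precision
    arrival_ticks: int  # hypothetical 100 ns counter read when the event arrived
    price: float

def in_arrival_order(ticks):
    """Sort by millisecond timestamp, breaking ties with the 100 ns counter."""
    return sorted(ticks, key=lambda t: (t.ts_ms, t.arrival_ticks))

ticks = [
    Tick(1339142400123, 7002, 101.25),
    Tick(1339142400123, 7001, 101.24),  # same millisecond, arrived earlier
    Tick(1339142400122, 6990, 101.23),
]
ordered = in_arrival_order(ticks)
```

Storing the counter as a second column (rather than widening the timestamp itself) keeps the stored time honest about its real precision while still giving a stable retrieval order.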

nitro · Jun 8, 2012

Quote from PocketChange:

Just store time stamp as string... What timer do you plan to use for ns precision?

Storing timestamps as strings would be fine, but it often breaks the rest of the software you are trying to use, at least out of the box. If this were the only project I had to do, I would probably deal with it. But I don't have time to dick around adapting myself to a solution. I need the solution to adapt to me right now.

As for high resolution timing, we are using:

http://www.symmetricom.com/products...TimeProvider-5000-and-TimeProvider-Expansion/

with the PCIe1000 cards.

januson · Jun 12, 2012

Quote from amazingIndustry:

with all due respect but I think you did not get my point. I have no problem implementing scan algorithms or other apps. I am looking to exchange ideas about efficient database structures for tick data and custom time series for read and write purposes with APIs that expose rich query functionality. Rolling my own is a huge time waster especially when something already exists. Even my own binary data store and reader implements open source components that I did not develop myself. I am wondering whether anyone has experience using Redis and RavenDb and what they have to say about their efficiency regarding time series data storage and data retrieval.

I can assure you that RavenDB is terribly slow compared to a normal RDBMS, and that is caused by the nature of ticks.

I have experimented with MongoDB, which is far faster than RavenDB for both reads and writes.
MongoDB is 10x faster than the RDBMS for writes and about the same speed for data retrieval.

My findings are based on 15,000,000 ticks indexed on timestamp. MongoDB had journaling enabled.

Furthermore, RavenDB totally misses the opportunity for aggregation over time, quite unlike MongoDB, which has just developed a small aggregation framework.

I would without any doubt choose MongoDB for writes.

emg · Jun 12, 2012

Quote from fatrat:

I'm curious what automated system traders who write in C/C++ do for storing their tick data for market analysis. After having written much code for hedge fund automated trading systems for quants, I'm attempting to duplicate (author from scratch) what I've seen for my own benefit.

5 1/2 years later. What is the verdict? Did you get sued for violating a non-compete clause? Did you blow up? Or are you now an HFT trader?

amazingIndustry · Jun 12, 2012

Thanks for sharing your RavenDB experience. I am looking for a solution whose read times are more than 10x faster than an RDBMS. I guess that would rule out RavenDB. Redis is still a contender, but I need to run more performance tests. Have you had the chance to run read performance tests on MongoDB as well, not just writes?

Thanks

Quote from januson:

I can assure you that RavenDB is terribly slow compared to a normal RDBMS, and that is caused by the nature of ticks.

I have experimented with MongoDB, which is far faster than RavenDB for both reads and writes.
MongoDB is 10x faster than the RDBMS for writes and about the same speed for data retrieval.

My findings are based on 15,000,000 ticks indexed on timestamp. MongoDB had journaling enabled.

Furthermore, RavenDB totally misses the opportunity for aggregation over time, quite unlike MongoDB, which has just developed a small aggregation framework.

I would without any doubt choose MongoDB for writes.

PocketChange · Jun 12, 2012

I've got a lot of experience in these matters, and most implementations start from archives of tick tapes, a.k.a. NxCore or CME Data Mine. Essentially a recording of the whole market, message by message.

For real time and tick by tick market replay you can run message by message.

If you're able to intelligently process and store these messages, you can develop a tick-accurate consolidated database. You can shrink the 37M daily tick messages sent for a symbol down to a daily 20K - 50K price tick consolidation. Rinse and repeat for each symbol and your library of tick tapes.
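
The post does not say how the consolidation is done. One plausible reading, sketched here purely as an assumption, is to collapse runs of consecutive trades at the same price into a single tick and sum their sizes. The function name `consolidate` and the `(ts, price, size)` tuple layout are hypothetical.

```python
def consolidate(messages):
    """Collapse runs of consecutive same-price trades into one tick.

    `messages` is an iterable of (ts, price, size) tuples in tape order.
    Returns a list of [first_ts, price, total_size] entries.
    """
    out = []
    for ts, price, size in messages:
        if out and out[-1][1] == price:
            out[-1][2] += size             # same price as last tick: add volume
        else:
            out.append([ts, price, size])  # price changed: start a new tick
    return out

tape = [(1, 100.0, 5), (2, 100.0, 3), (3, 100.5, 1), (4, 100.0, 2), (5, 100.0, 4)]
consolidated = consolidate(tape)
```

A real tape carries quotes, cancels and corrections as well as trades, so an actual consolidation pass would need per-message-type handling that this sketch leaves out.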

You can further summarize the consolidations into helper bars, e.g. 1 min OHLC bars, 1 hour, etc.
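
As a rough illustration of the helper bars, here is a small sketch that rolls time-ordered ticks up into fixed-width OHLC bars. The function name `ohlc_bars` and the `(ts_seconds, price)` layout are assumptions made for the example.

```python
def ohlc_bars(ticks, bar_seconds=60):
    """Group time-ordered (ts_seconds, price) ticks into OHLC bars.

    Returns a dict keyed by bar start time.
    """
    bars = {}
    for ts, price in ticks:
        start = ts - ts % bar_seconds      # floor the timestamp to its bar
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price           # last tick seen so far in this bar
    return bars

ticks = [(0, 10.0), (15, 10.5), (59, 9.8), (60, 9.9), (61, 10.2)]
bars = ohlc_bars(ticks)
```

Calling it again with `bar_seconds=3600` gives the hourly bars from the same input.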

Now a two-step SQL query can locate any tick in your DB, typically in < 100 ms.
The key is storing pre-processed consolidations, which lets you quickly drill down to the bar and fetch its ticks without having to process the tapes.
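
A minimal sketch of what such a two-step lookup could look like, using Python's built-in `sqlite3` module. The schema (`ticks`, `bars`) and the helper `first_tick_at_or_above` are hypothetical; the post does not describe the actual tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ticks (ts INTEGER, price REAL);
    CREATE INDEX ticks_ts ON ticks (ts);
    CREATE TABLE bars (bar_start INTEGER PRIMARY KEY, high REAL, low REAL);
""")
sample = [(0, 10.0), (15, 10.5), (59, 9.8), (60, 9.9), (61, 10.2), (125, 11.0)]
con.executemany("INSERT INTO ticks VALUES (?, ?)", sample)
# Pre-compute 1-minute helper bars once, at load time.
con.execute("""
    INSERT INTO bars
    SELECT ts - ts % 60, MAX(price), MIN(price) FROM ticks GROUP BY ts - ts % 60
""")

def first_tick_at_or_above(level):
    # Step 1: scan the small bars table for the first bar that reached the level.
    row = con.execute(
        "SELECT bar_start FROM bars WHERE high >= ? ORDER BY bar_start LIMIT 1",
        (level,),
    ).fetchone()
    if row is None:
        return None
    # Step 2: fetch only that bar's ticks and take the first match.
    return con.execute(
        "SELECT ts, price FROM ticks WHERE ts >= ? AND ts < ? AND price >= ? "
        "ORDER BY ts LIMIT 1",
        (row[0], row[0] + 60, level),
    ).fetchone()

hit = first_tick_at_or_above(10.2)
```

The first query only touches one row per bar, so the expensive tick scan in step 2 is limited to a single bar's worth of data.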

Hadoop, Hive, Hue, etc. allow you to scale up.

amazingIndustry · Jun 12, 2012

Fair points, but what do you use to persist the data, and especially to read persisted data with high throughput AND intelligent query logic? What are your contenders?

Quote from PocketChange:

I've got a lot of experience in these matters, and most implementations start from archives of tick tapes, a.k.a. NxCore or CME Data Mine. Essentially a recording of the whole market, message by message.

For real time and tick by tick market replay you can run message by message.

If you're able to intelligently process and store these messages, you can develop a tick-accurate consolidated database. You can shrink the 37M daily tick messages sent for a symbol down to a daily 20K - 50K price tick consolidation. Rinse and repeat for each symbol and your library of tick tapes.

You can further summarize the consolidations into helper bars, e.g. 1 min OHLC bars, 1 hour, etc.

Now a two-step SQL query can locate any tick in your DB, typically in < 100 ms.
The key is storing pre-processed consolidations, which lets you quickly drill down to the bar and fetch its ticks without having to process the tapes.

Hadoop, Hive, Hue, etc. allow you to scale up.

PocketChange · Jun 13, 2012

Quote from amazingIndustry:

Fair points, but what do you use to persist the data, and especially to read persisted data with high throughput AND intelligent query logic? What are your contenders?

We load consolidations into in-memory SQLite DBs. Our SQLite tick DBs are processed, indexed, and stored as CEROD (compressed encrypted read-only), chunked at 250 MB. The chunking allows efficient integration with Cloudera (Hadoop, HDFS, MapReduce).
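
A rough sketch of loading one on-disk chunk into an in-memory SQLite database, using the backup API in Python's standard `sqlite3` module. CEROD is a separately licensed SQLite extension and is not part of stock builds, so this example uses a plain, uncompressed file as a stand-in; the file name `chunk_0001.db` and the helper `load_into_memory` are hypothetical.

```python
import os
import sqlite3
import tempfile

def load_into_memory(path):
    """Copy an on-disk SQLite database into a new in-memory connection."""
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)   # copies all pages via SQLite's online backup API
    disk.close()
    return mem

# Build a tiny stand-in chunk so the sketch runs end to end.
chunk_path = os.path.join(tempfile.mkdtemp(), "chunk_0001.db")
src = sqlite3.connect(chunk_path)
src.execute("CREATE TABLE ticks (ts INTEGER, price REAL)")
src.executemany("INSERT INTO ticks VALUES (?, ?)", [(1, 100.0), (2, 100.5)])
src.commit()
src.close()

mem = load_into_memory(chunk_path)
count = mem.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
```

Once the copy finishes, queries against `mem` never touch the disk, which is presumably the point of loading the consolidations in memory.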

amazingIndustry · Jun 13, 2012

Would you know whether any of your solutions provide .NET-compatible APIs? To my knowledge, Hadoop does not...

Quote from PocketChange:

We load consolidations into in-memory SQLite DBs. Our SQLite tick DBs are processed, indexed, and stored as CEROD (compressed encrypted read-only), chunked at 250 MB. The chunking allows efficient integration with Cloudera (Hadoop, HDFS, MapReduce).

PocketChange · Jun 13, 2012

http://system.data.sqlite.org/index.html/doc/trunk/www/index.wiki
https://www.hadooponazure.com/
http://www.jnbridge.com/labs/?utm_source=dz1202&utm_medium=link&utm_campaign=dzone

Quote from amazingIndustry:

Would you know whether any of your solutions provide .NET-compatible APIs? To my knowledge, Hadoop does not...