I'm curious what automated-system traders who write in C/C++ use to store their tick data for market analysis. Having written a lot of code for hedge-fund automated trading systems used by quants, I'm attempting to duplicate (author from scratch) what I've seen, for my own benefit. The hedge fund I worked for didn't actually maintain a tick database; it used conventional data structures to make markets. This led to headaches, so I want to attempt a slightly different implementation.
I'm not a DBA, but I set up SQL Server 2005 with a series of stored procs for storing market information. Because database access is slow, I created a quote- and tick-posting queue: market-data threads post their information to it asynchronously, and worker threads commit the queued entries to the DB later. The software also keeps a sliding window that serves as a short-term cache of recent tick data, so lookups of past ticks don't hit the database directly. As a result, my models that depend heavily on short-term data don't suffer a performance penalty when reaching back for historical data, provided it fits in the sliding window.
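To make the posting-queue idea concrete, here is a minimal C++ sketch, assuming a single worker thread and a hypothetical commit callback that stands in for the SQL Server stored-proc calls (no database API is shown). `Tick`, `TickPostingQueue`, `post()`, and the batch-per-wakeup policy are illustrative choices, not taken from the original system.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical tick record; field names are illustrative only.
struct Tick {
    std::string symbol;
    std::int64_t timestamp_ns;  // exchange timestamp in nanoseconds
    double price;
    std::int64_t size;
};

// Sketch of an asynchronous posting queue: market-data threads call post(),
// which only holds the lock long enough to enqueue; one worker thread drains
// everything queued so far and hands it to a commit callback (where a real
// system would call its stored procs).
class TickPostingQueue {
public:
    using CommitFn = std::function<void(const std::vector<Tick>&)>;

    explicit TickPostingQueue(CommitFn commit) : commit_(std::move(commit)) {
        assert(commit_ && "a commit callback is required");
        // Start the worker only after every member is initialized.
        worker_ = std::thread([this] { run(); });
    }

    ~TickPostingQueue() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker commits any remaining ticks before exiting
    }

    // Called from market-data threads; never touches the database.
    void post(Tick t) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(t));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::vector<Tick> batch;
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty() && stopping_) return;
            // Move everything queued so far into a local batch...
            batch.assign(std::make_move_iterator(pending_.begin()),
                         std::make_move_iterator(pending_.end()));
            pending_.clear();
            // ...and commit it without holding the lock, so producers never
            // wait on the database.
            lock.unlock();
            commit_(batch);
            batch.clear();
            lock.lock();
        }
    }

    CommitFn commit_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Tick> pending_;
    bool stopping_ = false;
    std::thread worker_;
};
```

Committing whatever has accumulated on each wake-up is one way to amortize the cost of a database round-trip across many ticks; whether it helps in practice depends on how the stored procs are invoked.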
I modeled the tick-data management system like a processor cache, with the database playing the role of main memory. So far, so good: my program was able to keep up with NASDAQ's QQQQ tick for tick, including the order adds and removes for the Level-2 book.
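The processor-cache analogy can be sketched as a sliding window with a read-through fallback. This minimal C++ version assumes ticks arrive in timestamp order and uses a hypothetical `fetch_from_db` callback in place of the real stored-proc query; `TickWindow`, `range()`, and the hit/miss counters are illustrative names, not part of the original system.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tick record for this sketch only.
struct Tick {
    std::int64_t timestamp_ns;
    double price;
};

// The most recent `capacity` ticks stay in memory (the "cache"); older ranges
// are fetched through a callback standing in for a database query (the
// "main memory").
class TickWindow {
public:
    // Stand-in for a DB query returning ticks with from <= timestamp <= to.
    using FetchFn = std::function<std::vector<Tick>(std::int64_t, std::int64_t)>;

    TickWindow(std::size_t capacity, FetchFn fetch_from_db)
        : capacity_(capacity), fetch_(std::move(fetch_from_db)) {
        assert(capacity_ > 0 && fetch_);
    }

    // Assumes ticks arrive in non-decreasing timestamp order.
    void append(const Tick& t) {
        if (window_.size() == capacity_) window_.pop_front();  // evict the oldest tick
        window_.push_back(t);
    }

    // Returns ticks with from <= timestamp_ns <= to.
    std::vector<Tick> range(std::int64_t from, std::int64_t to) {
        if (from > to) return {};
        // Count a hit only if the window provably covers `from`: every evicted
        // tick has a timestamp <= the current front, so from > front is safe.
        if (!window_.empty() && from > window_.front().timestamp_ns) {
            ++hits_;
            auto lo = std::lower_bound(window_.begin(), window_.end(), from,
                [](const Tick& t, std::int64_t ts) { return t.timestamp_ns < ts; });
            auto hi = std::upper_bound(lo, window_.end(), to,
                [](std::int64_t ts, const Tick& t) { return ts < t.timestamp_ns; });
            return std::vector<Tick>(lo, hi);
        }
        ++misses_;
        return fetch_(from, to);  // "cache miss": fall back to the database
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t capacity_;
    FetchFn fetch_;
    std::deque<Tick> window_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

The strict `from > front` check is deliberately conservative: a query that starts exactly at the oldest cached timestamp goes to the database rather than risk missing an evicted tick with the same timestamp.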
There's room for improvement, however. I have not yet tested any products other than QQQQ, and I'm now trying to address the scalability issues of handling multiple products. Rather than reinvent the wheel and spend considerable time and energy implementing a robust tick database, I'd like to hear what alternatives other people are using.
Are you guys using any third-party solutions for tick databases? Do you have in-memory database solutions? What solutions have you come up with to (1) minimize latency and (2) provide efficient access to historical data without significant overhead?
In addition, would any of you be willing to share an efficient DB schema for storing ticks, or do you know of freely available schemas that help improve tick-DB performance? Do you keep a separate database for each product, or do you throw all products into one database? Do you run your modeling software on a different machine than your database server? What is your network topology with regard to the database server?
Where have your bottlenecks come from in the past? How do you handle a database server failure? Do you use redundant data storage in case of a failure?