Hi Makis, I really appreciate your response and your candid, well-informed feedback. I realize I too often criticize and do not praise good things enough on this board, so please note my gratitude.
a) I do not have issues with serialization/deserialization; I think I could write quotes fast enough into pure binary file storage if I went the file route rather than the database route.
b) You really hit a good point here, and I should have clarified: I originally intended to use the quote database as an access point for other applications, spreadsheets, etc., and that would have been a good argument for implementing a database. Since my last posts I have thought about it, and I think I need to clearly separate the business logic here. One part is my trading app, which needs access to stored quotes if the connection goes down, so that it can restart quickly, reload the last available time series, and recalculate indexes, indicators, and so on.
An entirely different application would be a data server/warehouse that exposes access to a database. Later on, I could simply have my trading app connect to the data server over TCP.
c) I agree with your last comments; I had come to the same conclusion. So far I have run a lightweight OMS, PMS, and Risk module within each trading strategy container, each with its own execution gateway and data feed adapters. Several such containers can run in parallel on different markets and asset classes. I am now looking to pull out the OMS/PMS/Risk modules to create a global one that aggregates the whole book (not just each individual strategy container set), and also to pull out the market data feed separately. I have already run tests with the open-source ZeroMQ messaging library, and I am able to send messages even out of process over TCP at about 12 million messages per second, which is more than fast enough to run the data feed, with its aggregation/consolidation modules, as a separate entity. This should provide several benefits, but I am drifting away from the database issue.
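For points a) and b), the file route could look roughly like the minimal Python sketch below: quotes are appended as fixed-size binary records with the standard `struct` module, and on restart the app seeks straight to the tail to reload only the most recent history. The record layout and the names `QUOTE`, `append_quotes`, `load_tail`, and `mid_sma` are hypothetical assumptions, not any vendor's format.

```python
import os
import struct
from typing import Iterable, List, Tuple

# Hypothetical fixed-size quote record (little-endian, no padding):
# int64 timestamp (ns), float64 bid, float64 ask, int32 bid size, int32 ask size.
QUOTE = struct.Struct("<qddii")
Quote = Tuple[int, float, float, int, int]


def append_quotes(path: str, quotes: Iterable[Quote]) -> None:
    """Append quotes to a binary file as fixed-size records."""
    with open(path, "ab") as f:
        f.write(b"".join(QUOTE.pack(*q) for q in quotes))


def load_tail(path: str, n: int) -> List[Quote]:
    """Reload the last n complete records after a restart.

    Fixed-size records let us seek directly to the tail instead of scanning
    the whole file; a partially written final record (e.g. from a crash) is
    ignored because only whole records are counted.
    """
    complete = os.path.getsize(path) // QUOTE.size
    first = max(0, complete - n)
    with open(path, "rb") as f:
        f.seek(first * QUOTE.size)
        data = f.read((complete - first) * QUOTE.size)
    return list(QUOTE.iter_unpack(data))


def mid_sma(quotes: List[Quote]) -> float:
    """Hypothetical example indicator recomputed from reloaded quotes: mean mid price."""
    mids = [(bid + ask) / 2 for _, bid, ask, _, _ in quotes]
    return sum(mids) / len(mids)
```

The fixed record size is the design choice that makes restart cheap: the offset of record k is simply k * `QUOTE.size`, so no index file is needed.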
I guess the database comes in handy as the storage and query layer of a separate data server. I am still undecided about the exact type of database technology. On the one hand, I want to store and, especially, quickly retrieve time-series data (preferably with a columnar database); on the other hand, I would like to store transactions and TCA results and query them in an RDBMS/SQL way, meaning running queries across values, not just keys.
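To make the two access patterns concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in (not a recommendation of SQLite for tick capture): a composite index on (symbol, ts) serves time-range pulls, while a second query filters and aggregates on a non-key value column. The `fills` table, its columns, and the `slippage_bps` metric are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical fills table: ts in epoch nanoseconds, slippage_bps as a TCA metric.
    conn.execute(
        "CREATE TABLE fills ("
        " symbol TEXT NOT NULL, ts INTEGER NOT NULL, price REAL NOT NULL,"
        " qty INTEGER NOT NULL, slippage_bps REAL NOT NULL)"
    )
    # Composite index serves the time-series pattern: one symbol, a time range.
    conn.execute("CREATE INDEX fills_symbol_ts ON fills (symbol, ts)")
    return conn


def time_range(conn: sqlite3.Connection, symbol: str, start: int, end: int) -> List[Tuple]:
    """Key-style access: pull a symbol's series between two timestamps."""
    return conn.execute(
        "SELECT ts, price FROM fills WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (symbol, start, end),
    ).fetchall()


def high_slippage(conn: sqlite3.Connection, min_bps: float) -> List[Tuple]:
    """Value-style access: filter and aggregate on a non-key column."""
    return conn.execute(
        "SELECT symbol, COUNT(*), AVG(slippage_bps) FROM fills "
        "WHERE slippage_bps > ? GROUP BY symbol ORDER BY symbol",
        (min_bps,),
    ).fetchall()
```

A columnar store would typically do the first query far faster at scale; the point here is only the shape of the two query types the paragraph describes.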
Your comments helped me decide to go with a pure file-dump approach within the trading app until I have separately come up with a fast database solution that can store real-time feed data frequently and make it available in the event of a crash. I prioritize the trading app for obvious reasons, so the database may have to wait a little, but please comment if you have ideas on how to get the best of both worlds: pulling time series and running SQL-like queries across values, not just keys.
Thanks
Quote from Makis:
A lot depends on the specifics of your design, but from the info so far your best bet would be to serialize each message and dump it onto a file (or an mmap if available, though mmaps would need some extra work).
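The "extra work" for the mmap variant is mostly that the file must be preallocated and the writer must track its own offset and capacity. A minimal sketch with Python's standard `mmap` and `struct` modules follows; the record layout and the class name `MmapTickWriter` are hypothetical.

```python
import mmap
import struct

# Hypothetical record: int64 timestamp (ns), float64 price, int32 size, 4 pad bytes.
RECORD = struct.Struct("<qdi4x")


class MmapTickWriter:
    """Append fixed-size records into a preallocated, memory-mapped file."""

    def __init__(self, path: str, capacity: int) -> None:
        length = capacity * RECORD.size
        self._file = open(path, "w+b")
        self._file.truncate(length)           # preallocate the whole region up front
        self._map = mmap.mmap(self._file.fileno(), length)
        self._capacity = capacity
        self.count = 0                         # records written so far

    def append(self, ts: int, price: float, size: int) -> None:
        if self.count >= self._capacity:
            raise OverflowError("mapping full; roll over to a new file")
        # Write directly into the mapped memory at this record's offset.
        RECORD.pack_into(self._map, self.count * RECORD.size, ts, price, size)
        self.count += 1

    def close(self) -> None:
        self._map.flush()                      # push dirty pages to the file
        self._map.close()
        self._file.close()
```

Because the tail of the preallocated file is zero-filled, a reader needs the record count from somewhere (for example, a small header), which is part of the extra work mentioned above.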
I have worked on several market data platforms, and all provide native serialize() and deserialize() methods to convert messages to buffers. Chances are that whichever vendor API you use has similar functionality. Other feeds (e.g. IQFeed) give you raw buffers directly. Dump those buffers onto a file on a dedicated thread. Alternatively, you can write your own serialize method and only dump messages after you aggregate them into your own format. There are pros and cons either way.
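A minimal Python sketch of the dedicated-thread idea, assuming the feed handler already has each message as a `bytes` buffer (from a vendor serialize() call or a raw feed): the handler only enqueues, and a separate thread does all file I/O. Buffers are length-prefixed because their sizes may vary. The names `CaptureWriter` and `read_buffers` are hypothetical.

```python
import queue
import struct
import threading
from typing import List, Optional

_LEN = struct.Struct("<I")  # 4-byte length prefix so variable-size buffers can be split again


class CaptureWriter:
    """Writes message buffers to a file on a dedicated thread."""

    def __init__(self, path: str) -> None:
        self._q: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._run, name="capture-writer", daemon=True)
        self._thread.start()

    def submit(self, buf: bytes) -> None:
        """Called from the feed handler; only enqueues, never touches the file."""
        self._q.put(buf)

    def _run(self) -> None:
        while True:
            buf = self._q.get()
            if buf is None:                    # sentinel: shut down after draining
                break
            self._file.write(_LEN.pack(len(buf)))
            self._file.write(buf)
        self._file.flush()
        self._file.close()

    def close(self) -> None:
        self._q.put(None)
        self._thread.join()


def read_buffers(path: str) -> List[bytes]:
    """Read back each length-prefixed buffer, e.g. to pass to a deserialize() call."""
    out: List[bytes] = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if len(header) < _LEN.size:
                break
            (n,) = _LEN.unpack(header)
            body = f.read(n)
            if len(body) < n:                  # torn final record
                break
            out.append(body)
    return out
```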
A database would be overkill for your purpose, but it would also give you the option to use the data in other ways, so it may be worth your time. I have never used it, but I recently came across MonetDB, which looks promising for real-time tick data capture.
A better solution (but time-consuming to implement) would be to run your OMS and market data in separate processes, and further decouple market data aggregation from data capture, because that eliminates the possibility of data capture dragging down the most critical components during high volume and data spikes. That would also depend on which asset classes and how many symbols you subscribe to.
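Short of a full process split, one way to keep capture from ever stalling the critical path is a bounded hand-off where the producer never blocks: if capture falls behind during a spike, messages are dropped and counted instead of back-pressuring the feed. Whether dropping captured data is acceptable is an assumption that depends on the use case; the class `NonBlockingHandoff` and its drop policy are hypothetical illustrations built on Python's standard `queue` module.

```python
import queue
from typing import List


class NonBlockingHandoff:
    """Bounded hand-off from a latency-critical producer to a capture consumer."""

    def __init__(self, maxsize: int) -> None:
        self._q: "queue.Queue[bytes]" = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # messages discarded because the buffer was full

    def offer(self, msg: bytes) -> bool:
        """Producer side: never blocks; returns False if the message was dropped."""
        try:
            self._q.put_nowait(msg)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, limit: int) -> List[bytes]:
        """Consumer side: take up to `limit` queued messages without waiting."""
        out: List[bytes] = []
        while len(out) < limit:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out
```

The `dropped` counter makes any data loss visible, so the buffer size can be tuned against the observed message rates for the symbols and asset classes subscribed.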