Thanks for all the responses guys, I appreciate it. After considering some of the other database choices (some very interesting alternatives), I think I am going to stick with MySQL, mostly because I am already very familiar with it and ways to optimize it.
Now that that is settled, I am ready to begin my trek to design the database. After scratching the surface on this task, I realized that I was not completely sure how I wanted to go about doing this. Since I don't want my design to be myopic in nature, I figured it would be best to ask those who are more knowledgeable than I. Here are some initial questions I have about this:
1. Which storage engine would be more suitable for storing quotes? MyISAM or InnoDB? It turns out that MyISAM may be a bit faster, but InnoDB is transactional and supports full ACID.
2. What types of data should I store? As of now I only plan to store tick data. For example:
symbol,
tradeTime,
tradePrice,
tradeSize. For those of you who have recorded data for a while, how long does simple data like this last you? Does storing something like market depth or bid/ask prices benefit you in the future? I don't have any strategies at the moment that would take advantage of this data, but I'm guessing that sometime later on it would be useful. Plus, disk space is cheap so amassing a huge amount of data wouldnt be too bad. Aside from what I mentioned, what other types of information have you guys found useful to record?
3. How precise should the trade times be? MySQL has a built-in
UNIX_TIMESTAMP() function that returns the time in seconds. Would something more precise actually make a difference? Although it would be possible, it would be a pain to write a function that returned millisecond precision (I dont think MySQL has one built-in).
4. Right now I'm planning on making one table per symbol, and then possibly running a script to update a symbol_EOD table nightly (not sure about this yet). This would easily amount to over 100 tables in my database, or twice that with EOD tables too. Have you guys found it useful to compile other tables from the tick data, like EOD? As of now, I have something like:
Code:
INSERT INTO [i]symbol[/i] VALUES (ID,UNIX_TIMESTAMP(),[i]price[/i],[i]size[/i]);
5. I've had a bit of experience in the past with database clusters, and although this may be far in the future, how practical would setting up a cluster or parallel system actually be?
6. One last, and fairly important question I have, is how reliable would this be for realtime trading? I'm sure it'd be fine for backtesting, but trading realtime with this data is a whole different thing. I was considering using memory-based tables to store possibly the current day's data, thereby speeding up the processing time for recent quotes. This would require a bit of work to get it done reliably, is it worth the extra effort, or would standard tables be sufficient?
If anyone could give me some pointers on the above questions (or something I left out), I'd really appreciate it. Also, for those of you who already have something setup and running, I'm real interested to hear about what kinda setup you have, like what kind of data you you store, how much data you have total or how much you accumulate daily (# records or MB), and mostly, how well it is working for you, and any pitfalls you've encountered along the way. Thanks a lot!
-- Paccc