Quote from CloroxCowboy:
I use Sql Server as part of my setup... several hundred thousand possible pairs, with thousands of datapoints per pair...
I have been through this. Had to test a strategy and needed a lot of data (EOD) which is 20,212 trading days since 1928 for DJIA. MS SQL do not have regression, sliding windows, etc. so I coded in in VB.NET and used MS SQL for data storage. The algorithm generates ~1,000 columns so it turns out MS SQL 2008 can hold only 500 so I had to make two tables. To use this later you have to use a join which slows down the query a lot. The problem was the regression, sliding windows, etc. reached the limits of .NET - i cannot process more than 10,000 data points back. For 10,000 points it takes an hour of processing time.
Then tried with Sybase Anywhere 11. It has regression, sliding windows, etc. But still all data are processed for a lot of time ~10 minutes on my server.
Then I tested 500 stocks and while there not so much history back the processing time was prohibitive for any real use ~1 hour - e.g. at EOD I have to process the data fast enough to be able to trade next day. And there is no time for testing different strategies. My understanding is that to be able to test I have to be able to calculate all parameters as they used to be at the trading day and to have at least 10 days back and 10 days forward of calculated gain in order know when such and such factors were at effect, you have such and such gain up to 10 days after. These have to be calculated for every single trading day back in the history to be useful.
If you consider the cost - to get the maximum performance you have to use the enterprise versions - it is a very substantial investment if you want for example to speed up the calculations using 2 CPU with Sybase - it costs two times more.
So it turned out this is not possible to do with Conventional Off the Shelf databases (COTS).
After these benchmarks it turns out parallel computing is the way to do it as well a specialized and highly tuned database. We are doing this for more than 2 years and our internal report shows 8-10 times speedup on benchmark load/store of 200MB generated database, load/store being the slowest operations. Computations show up to 100 times speedup.
So basically this is our experience so far with using COTS.