Next, the topic of daily trade signal generation.
I am having trouble understanding the benchmark descriptions that I quote from your post. However, one thing I take from the quoted part is that your goal was to calculate daily trade signals for 500 stocks in 2 hours at most, and that Sybase was not able to accomplish that on a high speed server (Dell PowerEdge 2850 2XCPU XEON 3 GHZ, 4 GB RAM, RAID 2 x 75 GB HDD). Furthermore, Microsoft SQL was 8 - 10 times slower than Sybase.
So let me modify my test program from my previous post above to adapt to this benchmark as closely as I can. Ok, so I am still using my test strategy that calculates 358 "sliding window" indicator calculations per day, even though I contend 358 input variables is way more than 99% of actual trading strategies would use. I trimmed down the number of stocks from 1,800 to 500.
So, the benchmark test is a moving average crossover system with 356 extra moving average calculations each day, one for every length from 5 to 360 inclusive. Together with the two averages of the crossover itself, that makes 358 moving average calculations for each stock, each day. I am running this on 500 stocks with 13 years of EOD data. The test also applies a fixed fractional portfolio level money management strategy and calculates combined trading results for the portfolio of 500 stocks over the 13 years (or, more exactly, the trading period is a little less than 12 years because of the 360 day ramp up of the moving average calculation).
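To make the kind of work in this test concrete, here is a minimal sketch of a sliding window moving average for one stock, for every length from 5 to 360. This is not PowerST code; the function names `simple_moving_average` and `all_moving_averages` are made up for illustration, and I am assuming the averages are plain simple moving averages over closing prices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not PowerST code: simple moving average of `closes`
// over a sliding window of `length` bars. A running sum keeps the cost per
// bar constant regardless of the window length.
// Entries before the window is full (the ramp up period) are left at 0.0.
std::vector<double> simple_moving_average(const std::vector<double>& closes,
                                          std::size_t length) {
    assert(length > 0);
    std::vector<double> averages(closes.size(), 0.0);
    double window_sum = 0.0;
    for (std::size_t i = 0; i < closes.size(); ++i) {
        window_sum += closes[i];               // newest bar enters the window
        if (i >= length) {
            window_sum -= closes[i - length];  // oldest bar leaves the window
        }
        if (i + 1 >= length) {
            averages[i] = window_sum / static_cast<double>(length);
        }
    }
    return averages;
}

// Hypothetical helper: one series of averages per window length, for every
// length from `shortest` to `longest` inclusive.
std::vector<std::vector<double>> all_moving_averages(
        const std::vector<double>& closes,
        std::size_t shortest,
        std::size_t longest) {
    assert(shortest > 0 && shortest <= longest);
    std::vector<std::vector<double>> result;
    result.reserve(longest - shortest + 1);
    for (std::size_t length = shortest; length <= longest; ++length) {
        result.push_back(simple_moving_average(closes, length));
    }
    return result;
}
```

Calling `all_moving_averages(closes, 5, 360)` yields 356 series for one stock, which matches the count of extra averages in the test. The real program may well organize this differently; the point is only that each average costs one add, one subtract and one divide per bar.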
On my $479.99 consumer level Windows Vista laptop with 3 GB of memory, the above calculation finished in 43 seconds. Less than 1 minute.
There is no database involved. Rather, I am calculating on the fly using the PowerST software, which is a C++ application written from scratch.
Take 43 seconds / 60 seconds = 0.7167 minutes.
Take 2 hours = 120 minutes. Take 120 minutes / 0.7167 minutes = 167.43.
It works out that PowerST is 167.43 times faster than your Sybase benchmark.
Then you said that Sybase is 8 - 10 times faster than MS... Let's call it 9 times faster. That makes PowerST 1,507 times faster than MS.
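For anyone who wants to check the arithmetic, here is a small sketch. The function name `speedup` is mine, and the 2 hour figure is the assumed Sybase finish time, not a measured one. Dividing the seconds directly gives about 167.44 rather than 167.43; the small difference comes from rounding 0.7167 minutes first.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many times faster a measured run is than a
// baseline run, with both durations given in seconds.
double speedup(double baseline_seconds, double measured_seconds) {
    assert(measured_seconds > 0.0);
    return baseline_seconds / measured_seconds;
}

// Hypothetical helper: chains two speedup factors, e.g. PowerST versus
// Sybase and Sybase versus MS, and rounds to the nearest whole number.
long combined_speedup(double first_factor, double second_factor) {
    return std::lround(first_factor * second_factor);
}
```

With a baseline of 2 hours (7,200 seconds) and a measured 43 seconds, `speedup(7200.0, 43.0)` is roughly 167.44, and `combined_speedup` of that figure with the assumed factor of 9 rounds to 1,507.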
I am as surprised by these comparison numbers as I imagine that you are.
Also, the above are conservative numbers, highly biased against PowerST, because there are multiple worst-case assumptions in my comparison test.
You are saying Sybase couldn't finish in 2 hours, whereas my calculations assume that it did finish in 2 hours.
You were running on a high speed server and I am running on a cheap consumer level laptop.
This is processing 13 years of EOD data, where for daily trade signal generation 1 to 3 years of data is usually enough.
Still, I am 167 times faster than the fastest of the (to use your term) Conventional Off the Shelf databases that you found in your testing.
This seems hard to believe, but in my interpretation of the performance you are describing and my best approximation test of what you describe, these are the numbers.
Certainly you can't still say that my software design approach as described in my first post on this topic would be too slow. The reality seems to be that it is more than 167 times faster than the fastest commercial SQL database you were able to find.
What am I missing? Can you see flaws in my reasoning or tests?
Well, to do some analysis of my own: a simple moving average is not very calculation intensive. More calculation intensive input variables would take more time, but that is only CPU grinding. My test above already does all of the data handling, including combined portfolio processing, so a more calculation intensive application would add only whatever time is actually spent in CPU grinding. Besides, calculation intensive input variables can be pre-calculated, and my view is that many (probably most) stock trading applications don't need massive calculation of hundreds of input variables.
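As a rough illustration of what I mean by pre-calculating, here is a minimal sketch of a cache that computes an expensive input variable once per stock and window length and reuses it afterwards. The class `IndicatorCache` and its interface are hypothetical, not part of PowerST, and a real program could just as well store the pre-calculated values in files.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not PowerST code: stores each indicator series under
// a (symbol, window length) key so the expensive calculation runs only once.
class IndicatorCache {
public:
    using Key = std::pair<std::string, std::size_t>;
    using Series = std::vector<double>;

    // Returns the cached series, calling `compute` only on the first request.
    const Series& get(const std::string& symbol,
                      std::size_t length,
                      const std::function<Series()>& compute) {
        assert(length > 0);
        const Key key{symbol, length};
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, compute()).first;
            ++computations_;
        }
        return it->second;
    }

    // Number of times an indicator actually had to be calculated.
    std::size_t computations() const { return computations_; }

private:
    std::map<Key, Series> cache_;
    std::size_t computations_ = 0;
};
```

Asking for the same symbol and length twice runs the calculation once, so the CPU grinding is paid up front and not on every test run.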
On balance, it would seem that this conclusively proves the speed potential of custom C++ code versus SQL. In fact, that would seem to be an understatement.
- Bob Bolotin
Quote from thstart:
We wanted to be able to test at least the 500 S&P stocks. It turned out the extensive tests with all factors we wanted are not possible to finish in a reasonable time....
So it turns out the basic requirements are these - no matter what the processing and analysis are - it has to finish up to 1 hour plus max 2 hours. It turned out with the hardware mentioned - a high speed server Dell PowerEdge 2850 2XCPU XEON 3 GHZ, 4 GB RAM, RAID 2 x 75 GB HDD, which is a reasonable investment, you cannot do a lot with the MS or even Sybase tools - it is too slow....
For comparison if you take just one operation - LOAD data store - MS tools (MS SQL , .NET) show exponential dependency from the amount of data you LOAD in the DB, Sybase shows relatively linear dependency and is 8-10 times faster than MS, with THStart (our software) it is linear and 10 times faster than Sybase and you can figure out about how much faster relative to MS.
This is just your assumption.