Quote from vincegata:
How do you scale to use more computers/cores for backtesting? Is it like this: you run strategy one (housed in an executable) against a first set of symbols for the whole backtest period on a first core, you run the same strategy against a second set of symbols for the whole backtest period on a second core, you run strategy two against the first set of symbols for the whole backtest period on a third core, and so on? When you add more servers, you just decrease the number of symbols to run against each strategy. Something like that? By a strategy I mean it can be a task such as calculating correlation between symbols.
Did you read my posts? It seems you have a problem understanding simple sentences. I did explain this.
We split the backtest into tasks - a task is always
* One week (Sunday to Saturday - we are always flat on weekends)
* Max. x combinations (128 at the moment).
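A minimal sketch of that split, under one reading of the two rules: a task is one Sunday-to-Saturday week combined with a chunk of at most 128 parameter combinations. The names `week_starts` and `split_into_tasks` and the idea of numbering combinations by index are made up for illustration; this is not the actual code.

```python
from datetime import date, timedelta

def week_starts(first_day, last_day):
    """Yield the Sunday that starts each Sunday-to-Saturday week touching the period."""
    # date.weekday(): Monday=0 ... Sunday=6, so step back to the previous Sunday.
    sunday = first_day - timedelta(days=(first_day.weekday() + 1) % 7)
    while sunday <= last_day:
        yield sunday
        sunday += timedelta(days=7)

def split_into_tasks(first_day, last_day, n_combos, max_combos=128):
    """Return (week_start, combo_index_range) tasks: one week, at most max_combos each."""
    tasks = []
    for sunday in week_starts(first_day, last_day):
        for lo in range(0, n_combos, max_combos):
            hi = min(lo + max_combos, n_combos)
            tasks.append((sunday, range(lo, hi)))
    return tasks
```

For instance, `split_into_tasks(date(2012, 1, 1), date(2012, 1, 14), 300)` covers two weeks with three chunks each (128, 128 and 44 combinations), so it yields six independent tasks.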
The tasks go into a central queue, and every node takes tasks from it. Add more computers and they just take tasks from the same queue. In backtests we are single threaded (WAY faster than multi threaded) and thus parallelize across tasks - i.e. a node takes one task per core. This avoids dealing with thread synchronization in the backtest and still uses the CPU 100%.
Due to the queue system I can easily scale this to a thousand computers if I need to - or more. TOTALLY classical HPC (High Performance Computing).
Every task writes the results and all relevant information into the central database for analysis.
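The queue setup described above can be sketched roughly as follows for a single node. This is only an illustration with assumptions: Python's `multiprocessing` queues stand in for the central queue and the central database, and `run_backtest`, `worker` and `run_node` are hypothetical names, not the real system.

```python
import multiprocessing as mp

def run_backtest(task):
    """Hypothetical placeholder for the single-threaded backtest of one task."""
    week, combos = task
    return {"week": week, "combos": len(combos)}

def worker(task_queue, result_queue):
    """Single-threaded loop: take a task, run it, write the result, repeat."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: no more work for this worker
            return
        result_queue.put(run_backtest(task))

def run_node(tasks, n_workers=None):
    """Start one worker process per core; no threads are shared between tasks."""
    n_workers = n_workers or mp.cpu_count()
    task_queue, result_queue = mp.Queue(), mp.Queue()
    for task in tasks:
        task_queue.put(task)
    for _ in range(n_workers):
        task_queue.put(None)  # one sentinel per worker
    procs = [mp.Process(target=worker, args=(task_queue, result_queue))
             for _ in range(n_workers)]
    for proc in procs:
        proc.start()
    # Collect results before joining so the workers never block on a full pipe.
    results = [result_queue.get() for _ in tasks]
    for proc in procs:
        proc.join()
    return results
```

Because each worker is its own process running one task at a time, nothing inside the backtest needs locks. On platforms that spawn processes rather than fork them, `run_node(...)` has to be called from under an `if __name__ == "__main__":` guard. In a real grid the queue would live on a server that all nodes can reach, so adding a machine only means starting more workers against the same queue.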
For example, we are right now doing a retest of a lot of stuff due to data issues (we had bad exports).
One particular example:
Optimization - XXXXX_SI, 4620 combos
(that is a particular strategy in silver - name is XXXd out - having 4620 parameter combinations).
It has not started yet, so:
Scheduled:4620
That is for a 29-month period. As you can see, that is 4620 tasks for the grid to take up and work on. So, in theory, I could have up to 4620 cores working on this particular optimization at the same time.
What we cannot do right now is genetic optimization - but we are working on it, with a more complex task structure and additional tasks (i.e. you get ONE task to generate the tasks for generation 1, then another one to generate the tasks for the next generation).
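A toy sketch of how such generator tasks could look, purely as an assumption about the idea and not the planned design: a "generate" task enqueues the backtest tasks for its generation and then one more "generate" task for the next generation. Everything here is hypothetical, including the one-parameter `fitness` function that stands in for a backtest result, and the single in-process FIFO queue that guarantees a generation is finished before the next one is generated (a real grid would need an explicit "wait until done" rule).

```python
import random
from collections import deque

def fitness(params):
    """Hypothetical stand-in for a backtest score: higher is better, best at x = 3."""
    return -(params["x"] - 3.0) ** 2

def make_population(gen, results, rng, size):
    """Build the parameter sets for one generation from the previous results."""
    if gen == 0:
        return [{"x": rng.uniform(-10.0, 10.0)} for _ in range(size)]
    ranked = sorted(results[gen - 1], key=lambda r: r[0], reverse=True)
    parents = [params for _, params in ranked[: max(1, size // 2)]]
    children = [parents[0]]  # elitism: the best individual survives unchanged
    while len(children) < size:
        children.append({"x": rng.choice(parents)["x"] + rng.gauss(0.0, 1.0)})
    return children

def run_genetic(n_generations=5, size=8, seed=0):
    rng = random.Random(seed)
    tasks = deque([("generate", 0)])  # ONE task that creates generation 0
    results = {}                      # generation -> list of (score, params)
    while tasks:
        kind, payload = tasks.popleft()
        if kind == "generate":
            gen = payload
            for params in make_population(gen, results, rng, size):
                tasks.append(("backtest", (gen, params)))
            if gen + 1 < n_generations:
                tasks.append(("generate", gen + 1))  # runs after the backtests
        else:
            gen, params = payload
            results.setdefault(gen, []).append((fitness(params), params))
    return results
```

The point of the structure is that the queue never needs to know about generations: it only ever sees tasks, and the generator task is just another task.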
Again, this is all totally standard - any of the supercomputers of the last generation works similarly, with an HPC setup working on work orders from a central queue.