Quote from greaterreturn:
The challenge is Part B of your question: how we'll handle this when dealing with ticks from multiple instruments at the same time.
There are two parts to discussing this.
1. I just realized that the way I originally planned to handle multiple instruments won't work. Basically, I imagined the instruments running separately, in parallel on separate threads. But that won't work because they won't stay in sync for statistics reporting. Plus, you couldn't refer to "USD/GBP" data while inside a "USD/JPY" model.
Try looking at this from an actor-paradigm perspective. It makes more sense to have one actor that controls all tick input and forwards ticks to the other actors, so the data stays synchronized. That actor may delegate work to sub-collectors, but don't forget that in the end your bottleneck will be your non-parallel I/O (namely, your data stream).
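A minimal Python sketch of the single-coordinator idea, assuming each strategy is a thread with its own FIFO inbox. The names `Tick`, `StrategyActor`, and `TickCoordinator` are hypothetical, not part of any existing framework. Because only one actor reads the feed and each inbox is FIFO, every strategy sees the ticks in the same global order.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    time: float     # assumed: seconds since session start
    symbol: str
    price: float


class StrategyActor(threading.Thread):
    """Hypothetical strategy actor: private inbox, one message at a time."""

    def __init__(self, name):
        super().__init__(name=name)
        self.inbox = queue.Queue()
        self.seen = []  # ticks in the order this actor processed them

    def run(self):
        while True:
            tick = self.inbox.get()
            if tick is None:        # sentinel: the coordinator has no more ticks
                break
            self.seen.append(tick)  # a real strategy would update its indicators here


class TickCoordinator:
    """Hypothetical single actor that owns all tick input and fans it out."""

    def __init__(self, strategies):
        self.strategies = strategies

    def run(self, ticks):
        for s in self.strategies:
            s.start()
        for tick in ticks:          # one reader means one global tick order
            for s in self.strategies:
                s.inbox.put(tick)
        for s in self.strategies:   # shut down and wait for every actor to drain
            s.inbox.put(None)
            s.join()


ticks = [Tick(1.0, "USD/JPY", 110.20), Tick(2.0, "USD/GBP", 0.76), Tick(3.0, "USD/JPY", 110.25)]
strategies = [StrategyActor("macd"), StrategyActor("stoch")]
TickCoordinator(strategies).run(ticks)
```

The strategies still process ticks concurrently with each other; the coordinator only guarantees ordering, and the single read loop is exactly where the non-parallel I/O bottleneck shows up.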
Quote from greaterreturn:
Here's the challenge. With that amount of data, you would hope to use quad-core, multi-CPU systems to maximize throughput.
However, that's not easily parallelized.
In other words, those ticks must be streamed through the strategies in time-sequence order.
You don't want a USD/GBP tick for 3:23 pm going through the model long before the 2:38 tick from USD/JPY.
Again, using one actor to synchronize would help. Also, a binary heap keyed on timestamp would give you good performance (O(log k) per tick for k instruments) and would let you restrict strategies to receiving only the 'next' tick.
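The heap-based merge can be sketched with Python's standard `heapq` module, assuming each instrument's ticks already arrive in time order. `Tick` and `merge_by_time` are illustrative names. The heap holds at most one pending tick per instrument, so each step costs O(log k) for k instruments.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Tick(NamedTuple):
    time: int       # assumed: minutes after midnight
    symbol: str
    price: float


def merge_by_time(*streams: Iterable[Tick]) -> Iterator[Tick]:
    """Lazily merge per-instrument tick streams (each already time-ordered)."""
    iters = [iter(s) for s in streams]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            # idx breaks timestamp ties, so Tick objects are never compared
            heapq.heappush(heap, (first.time, idx, first))
    while heap:
        _, idx, tick = heapq.heappop(heap)  # earliest pending tick across all instruments
        yield tick
        nxt = next(iters[idx], None)        # refill from the stream we just consumed
        if nxt is not None:
            heapq.heappush(heap, (nxt.time, idx, nxt))


usd_jpy = [Tick(14 * 60 + 38, "USD/JPY", 110.20), Tick(15 * 60 + 30, "USD/JPY", 110.25)]
usd_gbp = [Tick(15 * 60 + 23, "USD/GBP", 0.76)]
merged = list(merge_by_time(usd_jpy, usd_gbp))
```

With these inputs the 2:38 USD/JPY tick comes out before the 3:23 USD/GBP tick. Python's standard library also offers `heapq.merge(*streams, key=lambda t: t.time)`, which performs the same lazy k-way merge; the explicit loop just makes the heap visible.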
Quote from greaterreturn:
They must come in the same time sequence as from the real time feed.
Parallelizing depends on being able to break a process into parts that can be processed separately in parallel, with the results rejoined at the end.
But how do you break a 5-year test of 10 instruments across 4 CPUs?
I do have an idea, but it's fairly complex and not fleshed out.
The problem comes because any given strategy calculation, exit strategy, statistics gathering, or money management will depend on past bars and data during the test. And THOSE depend on past bars.
What you are trying to do is run the same computation on four cores with four different data sets. What if you instead ran four separate computations on different cores with the same data set? The issue arises when computations begin relying on one another, so the algorithms would have to be designed with parallelism in mind (OpenMP-style). That would probably require specialized interpretation of user scripts, and I think you were planning on keeping them standard C#, so this might be out. Ultimately, if the user ever has computations that rely on the past, you probably can't easily split the data.
So you have two options: either try to hide the parallelism from the user, or force the user to be aware of it and design their algorithms to take advantage of it. In my opinion, it depends on who your target audience is. Since it also seems you are having users write their code in C#, it may be difficult for you to perform 'vertical' parallelism on their behalf. The only way I can see you doing it is by going horizontal, which puts you back at square one with your issues.
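The "different computations, same data set" idea can be sketched in Python with `concurrent.futures`. The indicator functions and the 20/9/16 periods are illustrative assumptions. Each job walks the full history on its own, so its dependence on past bars stays intact. Threads are used here for simplicity; in CPython the GIL limits CPU parallelism for pure-Python code, so `ProcessPoolExecutor` (with top-level functions and an `if __name__ == "__main__":` guard) would be the usual swap.

```python
from concurrent.futures import ThreadPoolExecutor


def sma(values, period):
    """Simple moving average; None until `period` bars are available."""
    out, window_sum = [], 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            window_sum -= values[i - period]   # drop the bar leaving the window
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def ema(values, period):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


close = [float(x) for x in range(1, 31)]

# Each job is a different computation over the SAME, complete data set, so each
# keeps its own dependence on past bars and needs no coordination with the others.
jobs = {"sma20": (sma, 20), "ema9": (ema, 9), "ema16": (ema, 16)}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(fn, close, period) for name, (fn, period) in jobs.items()}
    results = {name: f.result() for name, f in futures.items()}
```

This only stays simple while the jobs are independent; once one indicator consumes another's output, the jobs need ordering again.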
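Why going horizontal breaks down can be shown with a small path-dependent indicator. In this Python sketch (prices and the 3-bar EMA are made up for illustration), the history is split in half as if each half went to a separate core, and the second half's EMA restarts with no knowledge of earlier bars.

```python
def ema(values, period):
    """Exponential moving average seeded with the first value it sees."""
    alpha = 2.0 / (period + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


close = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 106.0, 108.0]

sequential = ema(close, 3)  # one pass over the whole history

# "Horizontal" split: each half handled independently, as if on a separate core.
half = len(close) // 2
split = ema(close[:half], 3) + ema(close[half:], 3)
```

The first half matches, but at bar 4 the split version restarts at 105.0 while the sequential EMA is 103.4375, and the gap only shrinks by a factor of (1 - alpha) per bar afterward. Any state carried from the past has to be handed across the chunk boundary, which is the serial dependency you were trying to avoid.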
A quick thought (not fully fleshed out): what if you had your users write their strategies in separate blocks? I could have my MACD computation code in one block, Stoch code in another, et cetera. These blocks run in parallel. Then you could provide a couple of functions to synchronize the blocks, allowing them to share data and whatnot. For example...
(Note that the following code is just pseudocode.)
In one part...
Code:
sma = SMA(close, 20)
share("SMA", sma)
synchronize(:one)
macd = get("MACD")
In the other...
Code:
macd = MACD(close, 9, 16)
share("MACD", macd)
synchronize(:one)
sma = get("SMA")
This might allow for some parallelism without too much hassle... sort of an actor-paradigm solution with message passing. I dunno. Tough one.
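The `share`/`synchronize`/`get` pseudocode can be made runnable in Python with a lock-protected dictionary and a `threading.Barrier`. These helpers are hypothetical, written only to mirror the pseudocode; the barrier size of 2 assumes exactly two blocks meet at point "one", and `macd` here returns only the MACD line (fast EMA minus slow EMA) for brevity.

```python
import threading

# Hypothetical helpers matching the pseudocode's share/get/synchronize.
_shared = {}
_lock = threading.Lock()
_points = {"one": threading.Barrier(2)}  # 2 = number of blocks that meet at point "one"


def share(name, value):
    with _lock:
        _shared[name] = value


def get(name):
    with _lock:
        return _shared[name]


def synchronize(point):
    _points[point].wait()  # blocks until every participating block has arrived


def sma(values, period):
    return [sum(values[i - period + 1:i + 1]) / period if i >= period - 1 else None
            for i in range(len(values))]


def ema(values, period):
    alpha = 2.0 / (period + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def macd(values, fast, slow):
    # MACD line only (fast EMA minus slow EMA); no signal line, for brevity
    return [f - s for f, s in zip(ema(values, fast), ema(values, slow))]


close = [float(x) for x in range(1, 41)]
results = {}


def block_one():
    share("SMA", sma(close, 20))
    synchronize("one")
    results["macd_seen_by_one"] = get("MACD")


def block_two():
    share("MACD", macd(close, 9, 16))
    synchronize("one")
    results["sma_seen_by_two"] = get("SMA")


threads = [threading.Thread(target=block_one), threading.Thread(target=block_two)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each block publishes before it reaches the barrier, so after `synchronize("one")` returns, the other block's value is guaranteed to be present.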
Maybe check out the <a href="http://en.wikipedia.org/wiki/Dataflow_programming">Dataflow programming</a> paradigm?
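One way to read the dataflow suggestion: describe indicators as a dependency graph and let a scheduler run every node whose inputs are ready. A minimal Python sketch using the standard `graphlib.TopologicalSorter` (Python 3.9+) with a thread pool; the node names and the simplified MACD line are illustrative. This version runs the graph in ready batches, a simplification of true dataflow, where each node would fire the moment its own inputs arrive.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def ema(values, period):
    alpha = 2.0 / (period + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


close = [float(x) for x in range(1, 41)]

# Hypothetical indicator graph: node -> (function, upstream node names).
# Each function receives its upstream outputs as positional arguments.
nodes = {
    "close": (lambda: close, []),
    "ema9": (lambda c: ema(c, 9), ["close"]),
    "ema16": (lambda c: ema(c, 16), ["close"]),
    "macd": (lambda fast, slow: [f - s for f, s in zip(fast, slow)], ["ema9", "ema16"]),
}


def run_dataflow(nodes, max_workers=4):
    sorter = TopologicalSorter({name: deps for name, (_, deps) in nodes.items()})
    sorter.prepare()
    outputs = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # nodes whose inputs are all available
            futures = {
                name: pool.submit(nodes[name][0], *[outputs[d] for d in nodes[name][1]])
                for name in ready
            }
            for name, fut in futures.items():  # independent nodes ran concurrently
                outputs[name] = fut.result()
                sorter.done(name)
    return outputs


outputs = run_dataflow(nodes)
```

Here `ema9` and `ema16` land in the same ready batch and run side by side, while `macd` waits for both; the parallelism falls out of the graph instead of being written by hand.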