Open Source Black Box Trade Platform in C#?

greaterreturn · Dec 12, 2008

Pippi, I just had a great idea, and a challenge too, from your earlier question.

1. Good news, I just realized how simple it is to switch to "streaming" ticks. (Better term than rolling.) That's because TickZOOM already streams ticks from the real time server when running live.

All I have to do is make the file loader switch to stream the data the same way. It will actually simplify the code, since the engine will process ticks only by streaming.

That's so easy I can do it tonight. So it will have streaming tick technology.

Then the only parameter TickZOOM will need is how much memory you can allot to the tick buffer.
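To make the "memory for the tick buffer" idea concrete, here is a minimal sketch in Python (not TickZOOM code). It assumes a made-up binary record layout of an 8-byte timestamp plus a 4-byte integer price; the names `TICK`, `stream_ticks` and `buffer_bytes` are hypothetical. The point is only that the loader reads one buffer's worth at a time instead of the whole file.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Hypothetical record layout: 8-byte timestamp (ms) + 4-byte integer price.
TICK = struct.Struct("<qi")

def stream_ticks(source: BinaryIO, buffer_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp_ms, price) pairs, reading about buffer_bytes at a time."""
    # Round the buffer down to a whole number of records, but never below one.
    chunk_size = max(TICK.size, buffer_bytes - buffer_bytes % TICK.size)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # A trailing partial record (only possible at end of file) is ignored.
        usable = len(chunk) - len(chunk) % TICK.size
        for record in TICK.iter_unpack(chunk[:usable]):
            yield record

# Example: three ticks streamed through a buffer that holds only two records.
sample = b"".join(TICK.pack(t, p) for t, p in [(1000, 9512), (1001, 9513), (1005, 9511)])
ticks = list(stream_ticks(io.BytesIO(sample), buffer_bytes=24))
```

The same generator could wrap a live feed, so the engine would see one kind of input either way.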

My only concern will be how this impacts optimizing on parameters. Which by the way I haven't used in a while so we may have to fix a few things with that.

NOW, here's another real design challenge. In fact, let me handle that in a separate post.

Wayne

greaterreturn · Dec 12, 2008

The challenge is Part B of your question, as to how we'll handle this when dealing with ticks from multiple instruments at the same time.

There's 2 parts to discussing this.

1. I just realized that the way I originally planned for multiple instruments won't work. Basically, I imagined the instruments running separately, in parallel on separate threads. But that won't work because they won't be in synchronization for statistics reporting. Plus you couldn't refer to a "USD/GBP" while inside a "USD/JPY" model.

Obviously, it will be nice to access ANY instrument from any strategy just like I have any bar interval from any strategy.

Here's the challenge. With that amount of data, you would hope to use quad-core, multiple-CPU systems to maximize the throughput.

However, that's not easily parallelized.

In other words, those ticks must be streamed through the strategies in their time sequence order.

You don't want a USD/GBP tick for 3:23 pm going through the model long before the 2:38 tick from USD/JPY.

They must come in the same time sequence as from the real time feed.

Parallelizing depends on being able to break a process into parts that can be processed separately in parallel and then rejoin the results at the end.

But how do you break a 5 year test of 10 instruments into 4 CPUs?

I do have an idea but it's fairly complex and not fleshed out.

The problem comes because any given strategy calculation, exit strategy, statistics gathering, or money management will depend on past bars and data during the test. And THOSE depend on past bars.

Take something as simple as the equity curve. How useful can it be in a 5 year test for money management if you start in the middle?

It seems that some trading models can be parallelized better than others.

For example, my day trading system resets all the indicators at midnight every day. That way, NONE of the individual days depends on previous days' information.

As far as money management, that's just a % of the current account value.

So that can be easily broken into separate years and processed separately.
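Here is a rough Python sketch of why that kind of model splits cleanly (an illustration, not TickZOOM code). It assumes each day starts with fresh indicators and each trade's result is a fraction of the current account value, so a day reduces to a growth factor that can be computed without knowing the equity it started from. The names `day_growth` and `backtest` are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod
from typing import List, Sequence

def day_growth(trade_returns: Sequence[float]) -> float:
    """Growth factor for one day; needs nothing from any earlier day."""
    growth = 1.0
    for r in trade_returns:
        growth *= 1.0 + r  # each trade is a fraction of current equity
    return growth

def backtest(days: List[Sequence[float]], start_equity: float, workers: int = 4) -> float:
    """Compute each day's factor independently, then rejoin by multiplying."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        factors = list(pool.map(day_growth, days))
    return start_equity * prod(factors)

# Example: three days of fractional trade results (the third day had no trades).
days = [[0.01, -0.005], [0.02], []]
final_equity = backtest(days, start_equity=10_000.0)

# Same answer as walking through every trade in order on one core.
sequential = 10_000.0
for day in days:
    for r in day:
        sequential *= 1.0 + r
```

Threads are used here only to keep the sketch short. In CPython a process pool would be needed for real CPU parallelism, and a model with a fixed lot size or indicators that carry over between days would not reduce to independent factors like this.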

But other models may have greater or lesser dependencies.

It will require some research.

But at least with streaming data, you won't hit a wall like in NeoTicker. Still, some more advanced analysis and technology is needed for TickZOOM to automatically figure out a strategy to parallelize a back test.

Wayne

greaterreturn · Dec 12, 2008

Wow. I just found some innovative research on how to parallelize processing on multiple CPUs.

I'm fully committed to that technology since it will make TickZOOM live up to its name and make it powerfully competitive with other technology.

Sincerely,
Wayne

Corey · Dec 12, 2008

Quote from greaterreturn:

The challenge is Part B of your question, as to how we'll handle this when dealing with ticks from multiple instruments at the same time.

There's 2 parts to discussing this.

1. I just realized that the way I originally planned for multiple instruments won't work. Basically, I imagined the instruments running separately, in parallel on separate threads. But that won't work because they won't be in synchronization for statistics reporting. Plus you couldn't refer to a "USD/GBP" while inside a "USD/JPY" model.

Try looking at this from an actor paradigm perspective. It makes more sense to have one actor that controls all tick input and output to other actors to ensure that data is getting synchronized appropriately. It itself may delegate behavior to sub-collectors, but don't forget that at the end of the day, your bottleneck will be your non-parallel I/O (namely, your data stream).

Quote from greaterreturn:

Here's the challenge. With that amount of data, you would hope to use quad-core, multiple-CPU systems to maximize the throughput.

However, that's not easily parallelized.

In other words, those ticks must be streamed through the strategies in their time sequence order.

You don't want a USD/GBP tick for 3:23 pm going through the model long before the 2:38 tick from USD/JPY.

Again, using one actor to synchronize would help. Also, using a binary heap sorted by timestamp would give you pretty good performance and allow you to limit strategies to getting only the 'next' tick.
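A minimal Python sketch of the binary-heap idea, assuming each instrument's own stream is already in time order. The standard library's `heapq.merge` keeps one pending tick per stream in a heap and always hands out the earliest one. The function name `merge_ticks`, the prices, and the use of minutes since midnight as timestamps are made up for the example.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

Tick = Tuple[int, str, float]  # (timestamp, symbol, price)

def merge_ticks(streams: Dict[str, Iterable[Tuple[int, float]]]) -> Iterator[Tick]:
    """Merge per-instrument tick streams into a single time-ordered stream."""
    def tagged(symbol: str, stream: Iterable[Tuple[int, float]]) -> Iterator[Tick]:
        for timestamp, price in stream:
            yield (timestamp, symbol, price)
    # heapq.merge is lazy: it never loads a whole stream into memory.
    return heapq.merge(*(tagged(sym, st) for sym, st in streams.items()))

# Timestamps are minutes since midnight: 878 is 2:38 pm, 923 is 3:23 pm.
streams = {
    "USD/GBP": [(870, 0.67), (923, 0.68)],
    "USD/JPY": [(878, 91.2), (930, 91.1)],
}
merged = list(merge_ticks(streams))
```

With this ordering the 2:38 USD/JPY tick always reaches the strategies before the 3:23 USD/GBP tick, which is what the live feed would do.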

Quote from greaterreturn:

They must come in the same time sequence as from the real time feed.

Parallelizing depends on being able to break a process into parts that can be processed separately in parallel and then rejoin the results at the end.

But how do you break a 5 year test of 10 instruments into 4 CPUs?

I do have an idea but it's fairly complex and not fleshed out.

The problem comes because any given strategy calculation, exit strategy, statistics gathering, or money management will depend on past bars and data during the test. And THOSE depend on past bars.

What you are trying to do is run the same computations on four cores with 4 different data sets. What if instead you ran four separate computations on different cores with the same data-set? The issue arises when computations begin relying on one another... so in this manner, the algorithms would have to be designed with parallelism in mind (OpenMP-style). This would probably require specialized interpretation of user scripts -- and I think you were just planning on keeping it standard C#. So this might be out. But ultimately, if the user ever has computations that rely on the past ... you probably can't easily split the data.

So you have two options: either try to hide the parallelism from the user, or force the user to be aware of it and design their algorithms to take advantage of it. In my opinion, it depends on who your target audience is. Considering it also seems like you are having users write their code in C#, it may be difficult for you to perform 'vertical' parallelism on their behalf -- the only way I can see you doing it is by going horizontal, which puts you back at square one with your issues.

A quick thought ... not fully fleshed out ... but what if you have your users write their strategies in different blocks. I could have my MACD computation code in one block, Stoch code in another, et cetera. These blocks run parallel. Then you could have a couple functions to synchronize the parallelism -- allowing them to share data and whatnot. For example...

(note that the following code is just pseudo code...)

In one part...

Code:

sma = SMA(close, 20)
share("SMA", sma)
synchronize(:one)
macd = get("MACD")

In the other...

Code:

macd = MACD(close, 9, 16)
share("MACD", macd)
synchronize(:one)
sma = get("SMA")

This might allow for some parallelism without too much hassle... sort of an actor-paradigm solution with message passing. I dunno. Tough one.
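For what it's worth, the share / synchronize / get idea can be made runnable with ordinary threads. Below is a hedged Python sketch, not a proposal for the actual API: `SharedBoard` is a made-up name, the named sync point `:one` is simplified to a single barrier, and a simple momentum value stands in for MACD to keep it short.

```python
import threading

class SharedBoard:
    """Hypothetical helper: blocks publish values, then meet at a barrier."""
    def __init__(self, parties: int):
        self._values = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(parties)

    def share(self, name, value):
        with self._lock:
            self._values[name] = value

    def synchronize(self):
        # Returns only after every block has reached this point.
        self._barrier.wait()

    def get(self, name):
        with self._lock:
            return self._values[name]

def sma(prices, n):
    window = prices[-n:]
    return sum(window) / len(window)

def run_blocks(prices):
    board = SharedBoard(parties=2)
    seen = {}

    def sma_block():
        board.share("SMA", sma(prices, 3))
        board.synchronize()
        seen["sma_block_saw"] = board.get("MOMENTUM")

    def momentum_block():
        # Stand-in for the MACD block: last price minus first price.
        board.share("MOMENTUM", prices[-1] - prices[0])
        board.synchronize()
        seen["momentum_block_saw"] = board.get("SMA")

    threads = [threading.Thread(target=f) for f in (sma_block, momentum_block)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

seen = run_blocks([1.0, 2.0, 4.0, 6.0])
```

Because both blocks share before they synchronize, neither can call get() before the other's value is on the board.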

Maybe check out the Dataflow programming paradigm (http://en.wikipedia.org/wiki/Dataflow_programming)?

greaterreturn · Dec 12, 2008

Hey. Corey, thanks. You generated some ideas! Here's several things to realize.

1. An important design principle of TickZOOM is to attempt to keep all this kind of complexity out of the custom strategies and indicators.

2. Disk will not be a bottleneck for a long time. Let me explain. First, I mentioned 20 seconds of loading time for 11 million ticks. But I reviewed that code and realized it can be cut to a small fraction, like 2 seconds. That's because a new feature I added is eating up CPU, but I know how to optimize that out of the data load now. Whereas processing those 11 million ticks (with int data types) takes 30 seconds.

So CPU is definitely the bottleneck. I see the quad core pegged at 25% for the entire 30 seconds.
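A quick back-of-the-envelope check of those numbers, as a Python sketch. It assumes "pegged at 25%" on a quad core means exactly one core is busy, and it treats perfect four-way scaling as a best case that real code would not fully reach.

```python
ticks = 11_000_000
seconds_on_one_core = 30.0
cores = 4

# One busy core out of four shows up as 25% total CPU.
utilisation = 1 / cores

# Current single-core throughput in ticks per second.
ticks_per_second = ticks / seconds_on_one_core

# Ideal case if the same work were spread evenly over all four cores.
best_case_seconds = seconds_on_one_core / cores
```

So the single-core run handles roughly 367,000 ticks per second, and the ideal four-core figure would be about 7.5 seconds for the same 11 million ticks.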

Now for you, or anyone to assist, we need to understand the tick loop in the tick engine.

Here's a description of the main tick loop in the engine, in pseudocode:

-----------------------------------

1. Does this tick force any new bars to be created in any timeframes?
If so, invoke all strategies or indicators that requested to be notified that the bars are closing, for each appropriate bar interval.

2. Now pass this tick to all indicators or strategies that want tick by tick updates.

3. Update the actual bar data with the new tick which also closes or creates new bars as appropriate.

4. If live or replay then update the chart (but not for optimizing or backtest)

5. Again, did this tick force any new bars? If so, call all the strategies and indicators that requested notification for each bar interval that there's a NEW bar available now.
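The five steps above can be sketched as runnable Python. This is a simplification under stated assumptions, not the TickZOOM engine: it tracks a single bar interval instead of many timeframes, `Recorder` is a stand-in strategy that only logs calls, and the method names `end_period`, `process_tick`, `new_period` and `redraw_chart` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    start: int
    open: float
    high: float
    low: float
    close: float

class Recorder:
    """Stand-in for a strategy or indicator; it just logs the calls it gets."""
    def __init__(self):
        self.events = []
    def end_period(self, bar):
        self.events.append(("end_period", bar.start))
    def process_tick(self, timestamp, price):
        self.events.append(("tick", timestamp))
    def new_period(self, bar):
        self.events.append(("new_period", bar.start))

class TickEngine:
    def __init__(self, interval, listeners, live=False):
        self.interval = interval    # bar length in seconds
        self.listeners = listeners  # assumed to be in dependency order already
        self.live = live
        self.bar = None

    def on_tick(self, timestamp, price):
        start = timestamp - timestamp % self.interval
        forces_new_bar = self.bar is None or start != self.bar.start
        # Step 1: tell listeners the current bar is closing.
        if forces_new_bar and self.bar is not None:
            for listener in self.listeners:
                listener.end_period(self.bar)
        # Step 2: tick-by-tick updates.
        for listener in self.listeners:
            listener.process_tick(timestamp, price)
        # Step 3: update the bar data, creating a new bar if needed.
        if forces_new_bar:
            self.bar = Bar(start, price, price, price, price)
        else:
            self.bar.high = max(self.bar.high, price)
            self.bar.low = min(self.bar.low, price)
            self.bar.close = price
        # Step 4: charts only when live or replaying.
        if self.live:
            self.redraw_chart()
        # Step 5: tell listeners a NEW bar is available.
        if forces_new_bar:
            for listener in self.listeners:
                listener.new_period(self.bar)

    def redraw_chart(self):
        pass  # placeholder; no charting in this sketch

recorder = Recorder()
engine = TickEngine(interval=60, listeners=[recorder])
for ts, px in [(0, 100.0), (30, 101.0), (60, 99.5)]:
    engine.on_tick(ts, px)
```

Feeding it ticks at 0, 30 and 60 seconds shows the ordering: the bar that started at 0 is closed before the tick at 60 is delivered, and the new bar is announced only after it has been created.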

NOTE: TickZOOM preprocesses the custom strategies to work out the order of dependencies and calls them in that order, since strategies can be chained with other strategies and include indicators, which can in turn include other indicators.

So, in that loop, as you can see, everything depends tightly on the step before it. You can't call a dependency out of order or update the bars before notifying all the strategies, etc. At first, it appears there's no way to break any of those steps out to run in parallel.
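Working out a call order from dependencies is a topological sort. Here is a small Python sketch using the standard library's `graphlib` (Python 3.9+). How TickZOOM actually does it is not shown in this thread, and the model names below are made up.

```python
from graphlib import TopologicalSorter

def call_order(depends_on):
    """Return the models ordered so each one comes after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())

# Each key maps to the set of models it depends on (hypothetical names).
depends_on = {
    "Portfolio": {"TrendStrategy"},
    "TrendStrategy": {"SMA", "ATR"},
    "SMA": set(),
    "ATR": set(),
}
order = call_order(depends_on)
```

The two indicators may come out in either order, but both always come before the strategy that uses them, and the strategy comes before the portfolio. A circular dependency makes `TopologicalSorter` raise `graphlib.CycleError`.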

But WAIT!

Think of the next tick in the pipeline!!!

In theory, after one tick has finished a step, the next tick in the pipeline can start working on that step.

But there's issues with that. We don't want to force users to make strategies and indicators that are thread safe.

Plus, if your strategy is inside the ProcessTick() method and another thread calls the EndPeriod() method, you could have some multithread concurrency problems.
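For anyone who wants to picture generic pipelining, here is a Python sketch. It is a textbook pattern and not the idea hinted at in this post. Each stage runs on its own thread and only that thread ever touches the stage's state, so the stage code itself does not need locks. The catch is exactly the one described above: it only works if no two stages share an object. `run_pipeline`, `RunningHigh` and `at_new_high` are made-up names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the tick stream

def run_pipeline(ticks, stages):
    """Pass every tick through each stage in order, one thread per stage."""
    queues = [queue.Queue(maxsize=1024) for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(stage(item))  # FIFO queues keep ticks in time order

    def feed():
        for tick in ticks:
            queues[0].put(tick)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

class RunningHigh:
    """Stage 1: stateful, but only ever called from its own thread."""
    def __init__(self):
        self.high = float("-inf")
    def __call__(self, price):
        self.high = max(self.high, price)
        return (price, self.high)

def at_new_high(pair):
    """Stage 2: True when the price equals the running high."""
    price, high = pair
    return price >= high

results = run_pipeline([1.0, 3.0, 2.0, 5.0], [RunningHigh(), at_new_high])
```

While stage 2 is working on one tick, stage 1 can already be working on the next. In CPython the interpreter lock limits the real speedup for pure-Python stages, so this shows only the shape of the design.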

So it's certainly not obvious how to do this.

WAIT WAIT WAIT, I THINK I GOT IT.

OMG. I have a killer idea. I gotta go, I'll leave it for homework and DAZZLE all of you with the result.

Imagine what it would be like to get a full year of gain tick data for USD/JPY of 11 million ticks in 8 seconds.

EIGHT seconds! I know how to do it. And it's easy to actually implement now that I realize it.

Man alive. This is going to be cooler than sliced bread.

Wayne

greaterreturn · Dec 12, 2008

This idea is so good that I feel secretive suddenly. *smile* It'll take several days to implement in my spare time.

But maybe it's good enough if it works to charge money for it. 11 Million ticks in 8 seconds. Plus streaming ticks so there's no memory limitations.

Think of the possibilities.

Sincerely,
Wayne

greaterreturn · Dec 12, 2008

Oh and adding multiple instruments will take about the same work. But that's easy.

Custom models now request in their constructor which bar intervals they need to use (minute, 10 second, etc.).

And they have the concept of a default interval that gets handed down from the parent which they can use or override plus add others.

We can just replicate that to add the feature to request which instruments the strategy wants to be updated for also.

It can also have a default instrument, which is handed down by the parent, or it can override it.

By "parent", I don't mean as inheritance but in the object graph of chaining and dependencies.
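A small Python sketch of that hand-down idea, with made-up names throughout (`Model`, `add`, `requested`). A model that does not set its own interval or instrument walks up the chain to its parent in the object graph, not up a class hierarchy.

```python
class Model:
    """Hypothetical strategy/indicator node in the chaining graph."""
    def __init__(self, name, interval=None, instrument=None,
                 extra_intervals=(), extra_instruments=()):
        self.name = name
        self._interval = interval        # None means "use the parent's default"
        self._instrument = instrument    # None means "use the parent's default"
        self.extra_intervals = tuple(extra_intervals)
        self.extra_instruments = tuple(extra_instruments)
        self.parent = None
        self.children = []

    def add(self, child):
        """Chain a child model under this one and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def _inherited(self, attr):
        value = getattr(self, attr)
        if value is not None:
            return value                 # overridden here
        if self.parent is None:
            raise ValueError(f"{self.name} has no default for {attr[1:]}")
        return self.parent._inherited(attr)

    @property
    def interval(self):
        return self._inherited("_interval")

    @property
    def instrument(self):
        return self._inherited("_instrument")

    def requested(self):
        """Everything this model wants to be updated for."""
        return {
            "intervals": (self.interval,) + self.extra_intervals,
            "instruments": (self.instrument,) + self.extra_instruments,
        }

portfolio = Model("portfolio", interval="1 minute", instrument="USD/JPY")
strategy = portfolio.add(Model("strategy", extra_intervals=("10 second",)))
indicator = strategy.add(Model("sma", instrument="USD/GBP"))
```

Here the strategy inherits both defaults and adds a 10 second interval, while the indicator overrides the instrument but still inherits the 1 minute interval from two levels up.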

Sincerely,
Wayne

greaterreturn · Dec 12, 2008

Musings. . .

Folks, just checked my trading account results for the week.

I'm sort of torn what to do now.

Let me explain.

Last week I rolled out a live black box strategy that made 30 pips every
single day, all week long. It was up last Monday too before I started this thread.

But at the end of this week it is down 60 pips for the week.

FYI, it has a daily cap of 30 and does about 70% daily profitability
in back testing.

So you see, Tuesday, I was feeling relaxed and happy ready to share my tools
and generally high on the world.

I felt I didn't need to spend so much time for a while researching and testing strategies.

That's why I posted asking if people thought an open source system was a good idea
because it would be fun for me and help us all, etc. as you well know.

I still love this idea.

But right now, I feel I need to put my nose back to the grindstone and
spend time working on another model to add to this one.

My goal is 100% weekly profitability. And I won't stop till I get there.
Any week the results are negative, it motivates me to go back to the drawing
board. I keep the existing strategy or strategies running while working
on another one or a replacement.

I'm sure I will do the optimizations I just discussed, simply because it
will speed up my own research, except for the multiple instruments.
I'm okay with just one for now.

I'm open to discussion, ideas, as always.

Oh, and I will finish setting up the tickzoom site and forums. In fact,
the admins are switching the domain name servers now.

However, I don't feel I can afford the time to release the code
and deal with the questions and support until my own trading is meeting
my minimum expectations.

I have a financial plan that I have to meet X amount for a family need before March. So I'm on a tight schedule.

Sincerely,
Wayne

Random.Capital · Dec 12, 2008

Quote from greaterreturn:

...does about 70% daily profitability
in back testing.

What does "70%" mean? That 2 out of 3 trades are profitable, or that trades show on average a 70% gain?

Random.Capital · Dec 12, 2008

Quote from greaterreturn:

I have a financial plan that I have to meet X amount for a family need before March. So I'm on a tight schedule.

A couple of reminders from one of the legends...

"What does a man do when he sets out to make the stock market pay for a sudden need? Why, he merely hopes. He gambles. He therefore runs much greater risks than he would if he were speculating intelligently, in accordance with opinions or beliefs logically arrived at after a dispassionate study of underlying conditions."

And...

"In fact, of all hoodoos in Wall Street I think the resolve to induce the stock market to act as a fairy godmother is the busiest and most persistent."

And...

"There isn't a man in Wall Street who has not lost money trying to make the market pay for an automobile or a bracelet or a motor boat or a painting."