Interesting blog on BlueTrend fund and system development

Sergio77 · Feb 28, 2015

BlueTrend has returned about 12% annualized since inception. Interesting references to comments about system design and data-mining by the famous Leda Braga, who runs the fund. Anyone with knowledge of the details of bootstrap tests, please comment on the relevance of the p-values calculated for the fund. The author makes some interesting comments about hypothesis testing and its relation to data-mining.

Blog link

acrary · Feb 28, 2015

From what I read his reference to bootstrap testing is another name for Monte Carlo Testing. Looks like he took monthly returns; randomized them in 500,000 passes and totaled the annual performance. Then he compared the actual results with the ranked sampled returns. For p-testing you usually care about 95% significance or 99% significance. In the first case the actual results were better than more than 99.7% of samples. In the second case the actual results were only better than 84% of the results. If the fund had existed prior to the first data samples then you could conclude he had a very good edge that has recently deteriorated. If the fund has only been around for 5 years then he likely was curvefitting the data and the results are pretty much random. The large drawdowns in trend following are likely the result of putting on too much size initially and accepting too much heat (too many positions in correlated markets). If they had stepped into the trends, the drawdowns would have been much lower. He mentioned the addition of countertrend trades to reduce drawdowns, however that could have easily been added to the trend following systems to sharpen the initial trades.
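For anyone asking about the mechanics, here is a minimal sketch of one common way to set up this kind of test. It is not the blog author's actual code: the function name `bootstrap_p_value`, the zero-mean null hypothesis, the default of 10,000 passes and the made-up monthly returns are all assumptions for illustration. It shifts the monthly returns to zero mean, resamples them with replacement, and counts how often the resampled average is at least as good as the real one.

```python
import random


def bootstrap_p_value(returns, n_passes=10_000, seed=0):
    """One-sided bootstrap p-value for the null hypothesis of zero mean return.

    Hypothetical helper for illustration; not taken from the blog.
    """
    rng = random.Random(seed)
    n = len(returns)
    observed = sum(returns) / n
    # Impose the null: shift the returns so their mean is zero,
    # while keeping their shape (skew, fat tails) intact.
    centered = [r - observed for r in returns]
    hits = 0
    for _ in range(n_passes):
        sample = rng.choices(centered, k=n)  # resample with replacement
        if sum(sample) / n >= observed:
            hits += 1
    # Fraction of "no edge" histories that did at least as well as reality
    return hits / n_passes


# Made-up monthly returns, purely for demonstration
monthly = [0.02, -0.01, 0.03, 0.01, -0.005, 0.025] * 10
print(bootstrap_p_value(monthly))
```

A p-value below 0.05 or 0.01 corresponds to the 95% or 99% significance levels mentioned above. Note that resampling individual months throws away any serial correlation in the returns.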

globalarbtrader · Feb 28, 2015

The p-value you get from bootstrapping returns will be a function of how many observations you have and what your realised Sharpe Ratio is. So for example if your Sharpe is really 0.5 then on average with 20 years of data you'll have a p-value of 2.5%. Bootstrapping deals well with non-Gaussian returns as you get a non-parametric p-value. You get higher p-values for a positive skew strategy like trend following.

So this is just a fancy way of saying that trend following SR was lower in the past.
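The link between Sharpe Ratio, length of history and p-value can be sketched with a simple Gaussian approximation rather than a full bootstrap. This is an assumption-laden illustration: the function name `approx_p_value` is hypothetical, returns are treated as independent and normal, and the t-statistic is taken to be the annualised Sharpe Ratio times the square root of the number of years.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(annual_sharpe, years, two_sided=True):
    """Approximate p-value for 'mean return is zero' under a Gaussian assumption.

    Hypothetical helper: t-statistic ~= annualised Sharpe * sqrt(years).
    """
    t_stat = annual_sharpe * sqrt(years)
    tail = 1.0 - NormalDist().cdf(t_stat)  # upper-tail probability
    return 2.0 * tail if two_sided else tail


# Sharpe 0.5 over 20 years, the example from the post above
print(round(approx_p_value(0.5, 20), 4))                   # two-sided
print(round(approx_p_value(0.5, 20, two_sided=False), 4))  # one-sided
```

With a Sharpe of 0.5 and 20 years, the two-sided figure comes out at roughly 2.5%, which matches the number quoted above; the one-sided figure is about half that. A lower realised Sharpe or a shorter history pushes the p-value up.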

“There’s a creative moment when you think of a hypothesis, maybe it’s that interest rate data drives currency rates. So we think about that first before mining the data. We don’t mine the data to come up with ideas.”

I call this ideas first testing. Starting with the data, and getting the model, I call 'data first' testing.

Both methods have their advantages and drawbacks.

The following is an excerpt from my forthcoming book:

Systematic trading assumes that the future will be like the past. Hence we should create rules that would have worked historically, and hope that they will continue to work in the future.

But there are at least two different ways to find rules. One common method, which I call data first, is to analyse some data, find some profitable patterns and create some trading rules to exploit them. This is sometimes called data mining. The alternative, ideas first, is to come up with an idea, then create a rule, which is then tested on data to see if it works.

(There is a third method which is to use an idea which you cannot or will not test on historical data. This falls outside the scope of this book.)

Designing an ideas first system is like saying:

“I want to design a system that captures this kind of market behavior or source of return. I hope this behavior or source of return is still around in the future”.

Whereas for a data first system:

“Here is a system that was profitable in the past given the patterns in the market (which I won't try and explain or understand). I hope these market patterns persist in the future”.

Advantages of ideas first

If an idea works, no further dangerous fitting is required.

Any fitting will probably be done in a small subset of alternatives.

We tend to get simpler and more intuitive trading rules with ideas first.

We can construct rules that make intuitive sense, with a story behind them.

It's easier to classify the trading rule, and work out where its profits are coming from.

Advantages of data first

There will be a bias with ideas first to testing things that we know will work, either because of 'market lore' or academic studies. This is a form of hidden over-fitting. This is essentially the problem highlighted in the blog.

With ideas first it is tempting to try a large number of ideas to find the ones that work. This is also a form of over-fitting.

All the fitting that is done is explicit, so the degree of over-fitting can be controlled.

A compelling theory or story does not guarantee that the source of returns is repeatable, and could give a false sense of security.

Clever data analysis might unearth novel strategies that were previously unknown.

Given my preference for things I can trust and understand, I favor the ideas first method. This usually results in intuitive, simpler and more transparent rules. As long as a small number of ideas are tested, over-fitting is less likely.
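The point about the number of ideas tested can be made concrete with a small calculation. This is only a sketch and not part of the book excerpt: the function name `prob_false_discovery` is hypothetical, and it assumes every idea has no real edge and that the tests are independent, which is rarely true in practice.

```python
def prob_false_discovery(n_ideas, alpha=0.05):
    """Chance that at least one of n_ideas worthless rules passes a test at level alpha.

    Hypothetical helper; assumes independent tests and no true edge anywhere.
    """
    return 1.0 - (1.0 - alpha) ** n_ideas


# How quickly the risk of a spurious "winner" grows with the number of ideas tried
for n in (1, 5, 20, 100):
    print(n, round(prob_false_discovery(n), 3))
```

Under these assumptions, testing 20 worthless ideas at the 5% level gives roughly a 64% chance that at least one of them looks significant.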

But in some situations the data first process is better; for example in high frequency trading where there is plenty of data, rules can be refitted regularly and novel ideas are more likely to be found as market structure evolves.

The important thing is to be aware of the strengths and weaknesses of each method, and use them appropriately.

(Disclosure - I know a few of the BT guys so might not be an unbiased observer)

justrading · Feb 28, 2015

acrary said:
.......The large drawdowns in trend following are likely the result of putting on too much size initially and accepting too much heat (too many positions in correlated markets). If they had stepped into the trends, the drawdowns would have been much lower.

I am so pleased to read this. One constantly sees absolute statements that scaling in is inferior behaviour, that it is mathematically proven that scaling in is inferior. Of course it is mathematically proven, if you know the outcome in advance.

I think of myself as a swing trader, but given the wide stops I tend to use until I see a need to tighten, I am somewhat of a trend follower. In a difficult year for my trading style, I had a 38% win rate but managed to squeeze out 26% profit. On analysis, my straight losers had the smallest size, my biggest winners (in percentage terms) had the largest size.

I typically scale in, close out everything in one go unless I have the slightest doubt, in which case I will leave a partial runner with a wide but still profitable stop.

Visaria · Feb 28, 2015

My own method sounds similar to yours. May I ask whether you use profit targets? Do you scale out of trades?

justrading · Feb 28, 2015

Visaria, if your question is directed at me these are my answers.

I have not used profit targets until now, because they cap profits when we really do not know when the run would stop. An early lesson for me was a stock that over a few weeks moved up 30%, and I reaped 24% of it. Had I taken profit, it would have been much less.

I am now researching my methods on FX and I find that typically price tends to move in a limited range, unless a central bank action or fiscal policy initiative moves the rate significantly. So, for the first time I am forced to research profit targets, because trailing stops completely destroy my approach. I suspect this would hold true for any mean reverting instrument.

Edit: sorry missed this, I usually do not scale out as a trailing stop takes me out. I am researching this now for FX as I see the need.

Sergio77 · Mar 1, 2015

Thanks acrary. I think the data were from monthly returns of the fund since inception. Anyway, I think you are right and it looks like Monte Carlo testing.

So I gather that these fund managers do not like data-mining, but they may be doing it without realizing it. Do you think they actually have some edge, or have they been lucky?

acrary said:
From what I read his reference to bootstrap testing is another name for Monte Carlo Testing. Looks like he took monthly returns; randomized them in 500,000 passes and totaled the annual performance. Then he compared the actual results with the ranked sampled returns. For p-testing you usually care about 95% significance or 99% significance. In the first case the actual results were better than more than 99.7% of samples. In the second case the actual results were only better than 84% of the results. If the fund had existed prior to the first data samples then you could conclude he had a very good edge that has recently deteriorated. If the fund has only been around for 5 years then he likely was curvefitting the data and the results are pretty much random. The large drawdowns in trend following are likely the result of putting on too much size initially and accepting too much heat (too many positions in correlated markets). If they had stepped into the trends, the drawdowns would have been much lower. He mentioned the addition of countertrend trades to reduce drawdowns, however that could have easily been added to the trend following systems to sharpen the initial trades.

Sergio77 · Mar 1, 2015

globalarbtrader said:
So this is just a fancy way of saying that trend following SR was lower in the past.

Do you mean Sharpe Ratio? From the fund performance it seems that it was higher in the past, not lower.

globalarbtrader · Mar 1, 2015

Sergio77 said:
Do you mean Sharpe Ratio? From the fund performance it seems that it was higher in the past, not lower.

Yes SR - Sharpe Ratio. And I meant higher in the past. My bad.

Sergio77 · Mar 7, 2015

I am curious how they will do this month. It seems to me that hopes are high for a continuing trend in stocks, and most funds have moved away from commodities because of a lack of liquidity and now follow the stock trend.