Backtesting Metrics

Quote from TD80:

The first thing I look at after statistical significance is % average profit. If it isn't really good, the system gets tossed into the archives. Of course the exact number will be up to the developer as to what constitutes "really good".


Could you explain a bit more "% average profit"? Is it the average % of Win amount / Account balance at time of win, as inverse of "2% of account balance for a stop"? (No intention to question or argue, only seeking more info)
 
Quote from zedDoubleNaught:

Could you explain a bit more "% average profit"? Is it the average % of Win amount / Account balance at time of win, as inverse of "2% of account balance for a stop"? (No intention to question or argue, only seeking more info)

It is the average return on the amount risked. If I buy $50K of a stock and make a $2,500 profit, then my return on risked capital is 5%. So average all of those profits/losses for every trade and you get your average return per trade.
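
As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name and the second and third trades are made up for the example; only the $50K / $2,500 trade comes from the post.

```python
from statistics import mean

def average_return_per_trade(trades):
    """trades: list of (capital_committed, profit_or_loss) pairs in dollars."""
    # Return on the capital put at risk, trade by trade (losses are negative).
    returns = [pnl / capital for capital, pnl in trades]
    return mean(returns)

# First trade is the example from the post; the other two are hypothetical.
trades = [(50_000, 2_500), (50_000, -1_000), (40_000, 1_200)]
avg = average_return_per_trade(trades)
print(f"average return per trade: {avg:.2%}")  # prints: average return per trade: 2.00%
```

Note that this is a simple unweighted average of per-trade returns, so a small trade counts as much as a large one.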

The shorter the time frame for the trade, the better you can compound, but your costs (commissions/slippage) eat away at your return.

The higher the average return, the more "buffer" you have to pause/stop should your strategy start to become too popular, but odds are you are in the trade longer, so you take more risk and compound at a slower rate.

If you ride the (optimal) edge with slim margins, it could take only one big player to come in and turn your strategy into a losing proposition in a flash.

Stay to the right of optimal when it comes to your ratio of profit versus cost over time. Beware that thinly traded, volatile stocks look like they offer a paradigm-breaking opportunity of both high average returns and very short holding periods, but the slippage will kill you once you get some size.
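
To make the trade-off between compounding speed and cost "buffer" concrete, here is a small Python sketch. All numbers are hypothetical, and it assumes every trade earns exactly the average return and is fully reinvested, which real trading will not do.

```python
def net_growth(avg_return, cost_per_trade, trades_per_year):
    """Growth multiple after one year of compounding the net return per trade."""
    return (1 + avg_return - cost_per_trade) ** trades_per_year

# Hypothetical strategies: many thin-margin trades vs. fewer fat-margin trades.
for cost in (0.001, 0.002):  # costs (commissions + slippage) as a fraction of capital
    fast = net_growth(avg_return=0.002, cost_per_trade=cost, trades_per_year=250)
    slow = net_growth(avg_return=0.02, cost_per_trade=cost, trades_per_year=25)
    print(f"cost {cost:.1%}: thin-margin x{fast:.3f}, fat-margin x{slow:.3f}")
```

In this toy setup, doubling the cost from 0.1% to 0.2% wipes out the whole edge of the thin-margin strategy, while the fat-margin one only loses a fraction of its growth.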
 
Quote from nLepwa:

The trade frequency is irrelevant if you suffer from regime shift.
I have mean-reversion systems with thousands of trades per month that perform extremely well from 1990 to 1998 and then crash.

1998 really seems to be a turning point for a lot of mean-reversion strategies. Since you don't know when the next turning point will come, backtesting back to 1997-1998 gives you a nice overview of how things could turn out...

If your strategy exploits some kind of mean-reversion even remotely (and most intra-day strategies do) I would suggest backtesting at least 15 years.
Personally I backtest 25 years. And I expect my strategies to work for (at least) another 25 years. Trading isn't a get-rich-quick scheme.

Ninna

Interesting. I did a ton of work trying to figure out what that shift was. I'm pretty sure I know the answer now, and have incorporated it as part of my strategy.
Nothing like backtesting as far back as you can go; sometimes the insights you get can be very revealing.
 
I am new to system design, and have read Pardo's book.
I have a question for more knowledgeable system designers about the degrees of freedom in your strategy. Does the inclusion of more indicators/parameters decrease the degrees of freedom, and can this be overcome by increasing your test window size to compensate?
Pardo suggests, as a rule of thumb, that the remaining degrees of freedom stay above 90%. (What's the optimum parameter you have found?)


Thanks
 
Quote from DustyFoot:

Hi Everyone. I have been working on developing several ATS and I was wondering if any of you could let me know what kind of metrics you use for your backtesting results before you decide to execute a strategy. Things such as drawdown vs. profit, profit factor, number of executions and how far back you look in order to consider your strategy robust enough. I would imagine this has been asked and answered but I was unable to find a thread. Any insight would be appreciated.
Thanks

Hi DustyFoot,

Fighting overfitting is the single greatest concern a strategist has. So before metrics, it is important to make sure that the experimental model will not allow <b>curve fitting</b> (which is by no means trivial).

As to metrics, apart from the classic metrics that have been suggested [which should also include the Sharpe ratio, if only because it's the first thing any hedge fund will ask for], you may also take a look at my simulation results, where I list several indicators that I personally find useful in strategy assessment:

<a href="http://www.datatime.eu/public/gbot/Strats%20G-BOT/default.htm"> my sims </a>

(click on the various links Strategy ... : these are automated sim output)

Personally, once you are within the max target drawdown, I find very useful the ratio of the <b>average PNL (for instance daily) over the Maximum Drawdown</b> ever seen (where by "max drawdown" I mean the greatest PNL decrease from a local PNL maximum). The ratio of the average PNL (for instance daily) over the maximum (absolute) position ever held is also useful. I usually multiply the first ratio by 100K, because it then expresses the strategy's <b>Avg Daily Profit for each 100K of max drawdown.</b>
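
For readers who want to compute the first ratio themselves, here is a minimal Python sketch. It is not the code behind the linked sims; the function names and the daily PNL series are hypothetical, and max drawdown is taken, as defined above, as the largest drop in cumulative PNL from an earlier peak.

```python
def max_drawdown(cumulative_pnl):
    """Largest decrease of cumulative PNL from any earlier maximum (>= 0)."""
    peak = cumulative_pnl[0]
    worst = 0.0
    for value in cumulative_pnl:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst

def avg_daily_profit_per_100k_drawdown(daily_pnl):
    """Average daily PNL divided by max drawdown, scaled to 100K of drawdown."""
    cumulative = [0.0]  # start the PNL curve at zero before the first day
    for pnl in daily_pnl:
        cumulative.append(cumulative[-1] + pnl)
    avg_daily = cumulative[-1] / len(daily_pnl)
    drawdown = max_drawdown(cumulative)
    if drawdown == 0:
        return float("inf")  # no drawdown observed, ratio is undefined
    return avg_daily / drawdown * 100_000

# Hypothetical daily PNL in dollars.
daily_pnl = [500, -200, 300, -900, 400, 700, -100, 600]
ratio = avg_daily_profit_per_100k_drawdown(daily_pnl)
print(f"avg daily profit per 100K of max drawdown: {ratio:,.0f}")  # prints 18,056
```

Here the cumulative PNL peaks at 600 and falls to -300, so the max drawdown is 900, and the average daily PNL is 1300 / 8 = 162.5.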

More useful than looking at single indicators is to look at their distributions over a large number of sessions (bootstrapping).
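
One possible reading of this, sketched in Python: resample the daily PNL with replacement many times and look at the distribution of a metric instead of its single backtest value. Resampling individual days is an assumption made here for simplicity, and it ignores any serial correlation in the PNL; the function names and data are hypothetical.

```python
import random

def max_drawdown_from_pnl(daily_pnl):
    """Largest drop of cumulative PNL from an earlier peak, starting at zero."""
    total = peak = worst = 0.0
    for pnl in daily_pnl:
        total += pnl
        peak = max(peak, total)
        worst = max(worst, peak - total)
    return worst

def bootstrap_metric(daily_pnl, metric, n_resamples=1000, seed=0):
    """Sorted metric values over resampled PNL series of the original length."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(daily_pnl)
    values = []
    for _ in range(n_resamples):
        sample = [daily_pnl[rng.randrange(n)] for _ in range(n)]
        values.append(metric(sample))
    return sorted(values)

def percentile(sorted_values, q):
    """Simple index-based percentile for q in [0, 1]."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

# Hypothetical daily PNL in dollars.
daily_pnl = [500, -200, 300, -900, 400, 700, -100, 600]
dist = bootstrap_metric(daily_pnl, max_drawdown_from_pnl)
print("median max drawdown:", percentile(dist, 0.5))
print("95th percentile max drawdown:", percentile(dist, 0.95))
```

The spread between the median and the upper percentiles gives a feel for how much worse the drawdown could plausibly have been with the same daily results in a different order or mix.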


Tom
 
Quote from slavduja:

I am new to system design, and have read Pardo's book.
I have a question for more knowledgeable system designers about the degrees of freedom in your strategy. Does the inclusion of more indicators/parameters decrease the degrees of freedom, and can this be overcome by increasing your test window size to compensate?
Pardo suggests, as a rule of thumb, that the remaining degrees of freedom stay above 90%. (What's the optimum parameter you have found?)


Thanks

I don't see this as a proper use of DoF. All trading systems in the world have 3 DoF: long, short and flat. A plane has 6 DoF but can have many control parameters. The same goes for a trading system: 3 DoF, but possibly many control parameters. These authors use the wrong terminology. If you use many redundant control parameters, you risk the controller becoming unstable. The same holds for a trading system: if you use too many parameters, the system becomes unprofitable.
 
Quote from fullautotrading:

Hi DustyFoot,

Fighting overfitting is the single greatest concern a strategist has. So before metrics, it is important to make sure that the experimental model will not allow <b>curve fitting</b> (which is by no means trivial).

As to metrics, apart from the classic metrics that have been suggested [which should also include the Sharpe ratio, if only because it's the first thing any hedge fund will ask for], you may also take a look at my simulation results, where I list several indicators that I personally find useful in strategy assessment:

<a href="http://www.datatime.eu/public/gbot/Strats%20G-BOT/default.htm"> my sims </a>

(click on the various links Strategy ... : these are automated sim output)

Personally, once you are within the max target drawdown, I find very useful the ratio of the <b>average PNL (for instance daily) over the Maximum Drawdown</b> ever seen (where by "max drawdown" I mean the greatest PNL decrease from a local PNL maximum). The ratio of the average PNL (for instance daily) over the maximum (absolute) position ever held is also useful. I usually multiply the first ratio by 100K, because it then expresses the strategy's <b>Avg Daily Profit for each 100K of max drawdown.</b>

More useful than looking at single indicators is to look at their distributions over a large number of sessions (bootstrapping).


Tom

Thanks for sharing this. Great stuff.
 
Quote from slavduja:

Does the inclusion of more indicators/parameters decrease the degrees of freedom, and can this be overcome by increasing your test window size to compensate?
...

Thanks

Parameters are not necessarily trading "indicators". (Personally, I believe indicators are generally useless for trading and, obviously, the more indicators one has, the better one can "curve fit" to past data.)

"Parameter" can take another meaning. For instance, the order size, the scalp size, etc. can be "parameters". And some parameters do not change the nature (profitable/unprofitable) of the strategy or the order of magnitude of performance indicators, due to obvious scale invariance.

There are other methods and countermeasures to fight curve fitting, and these are generally learned over time and painfully. You may well curve fit even with the simplest strategy in the world, with just one indicator.
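
A toy Python sketch of that last point, under stated assumptions: the "market" is pure random noise, so no rule has a real edge, and the only thing being tuned is one lookback parameter of a hypothetical momentum rule. Picking the lookback that did best in-sample is curve fitting by construction.

```python
import random

MAX_LOOKBACK = 50

def noise_returns(n, seed):
    """Independent daily returns with zero mean: no rule has a true edge here."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

def momentum_pnl(returns, lookback, start):
    """Long if the previous `lookback` returns sum to > 0, else short."""
    pnl = 0.0
    for t in range(start, len(returns)):
        signal = 1 if sum(returns[t - lookback:t]) > 0 else -1
        pnl += signal * returns[t]
    return pnl

returns = noise_returns(2000, seed=42)
in_sample, out_of_sample = returns[:1000], returns[1000:]

# Evaluate every lookback on the same days so the comparison is fair.
lookbacks = range(1, MAX_LOOKBACK + 1)
in_pnl = {lb: momentum_pnl(in_sample, lb, MAX_LOOKBACK) for lb in lookbacks}
best = max(in_pnl, key=in_pnl.get)

print("best in-sample lookback:", best)
print("in-sample PNL with best lookback:", round(in_pnl[best], 4))
print("out-of-sample PNL, same lookback:",
      round(momentum_pnl(out_of_sample, best, MAX_LOOKBACK), 4))
```

The in-sample winner is simply the luckiest of 50 tries on noise, so whatever it earns out-of-sample is luck as well. Running it with different seeds shows how little the "optimal" lookback carries over.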

Curve fitting is one of the things computers and humans do best. The capacity to adapt is normally an expression of intelligence. Unfortunately, in this case it's not an expression of intelligence but of naivety, because one realization (even of several decades) of past data is in no way useful for future profitability, but it's especially great for curve fitting. What can barely be inferred are just very general indications about order of magnitude, ranges, volatility, correlations, etc.

Tom
 