Backtesting Metrics

HowardCohodas · Jan 21, 2011

Even for the aware, it is still easy to fall victim to curve fitting. I now go considerably out of my way to avoid it.

The first thing to do is divide the historical data into two non-contiguous parts. On one set you experiment (train) with your rules to find something worth validating. Then you process the part of the data set that your rules have never seen through your rule set. Only if the results are comparable can you have confidence you have something worth the next stage of testing. This method is well known.

I take it two steps further. I divide the data set into smaller blocks, each at least as long as the longest time a trade would be in play. I then randomly pick which block goes in which set for each back-test run I perform. Furthermore, I vary the relative size of the two sets. One set contains from 40% to 60% of the data and the other set contains the rest. The program randomly chooses the amount of data in the training set as well as which blocks go into the set.
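
A minimal Python sketch of the randomized block split described above. The function name `random_block_split` and its parameters are hypothetical, and it assumes the history is just a run of bar indices; the 40% to 60% range is applied to the number of blocks, so the share of bars is only approximate when the last block is short.

```python
import random

def random_block_split(n_bars, block_size, min_frac=0.4, max_frac=0.6, seed=None):
    """Split bar indices 0..n_bars-1 into training and validation sets by whole blocks.

    block_size is assumed to be at least the longest time a trade is in play.
    """
    rng = random.Random(seed)
    # Cut the history into consecutive blocks; the last block may be shorter.
    blocks = [list(range(start, min(start + block_size, n_bars)))
              for start in range(0, n_bars, block_size)]
    # Randomly choose how much of the data (counted in blocks) is used for training.
    train_frac = rng.uniform(min_frac, max_frac)
    n_train = round(train_frac * len(blocks))
    # Randomly choose which blocks go into the training set.
    train_ids = set(rng.sample(range(len(blocks)), n_train))
    train = [i for b, blk in enumerate(blocks) if b in train_ids for i in blk]
    valid = [i for b, blk in enumerate(blocks) if b not in train_ids for i in blk]
    return train, valid

# A fresh split for every back-test run.
train_idx, valid_idx = random_block_split(n_bars=1000, block_size=20)
```

Calling it again without a seed gives a different training fraction and a different choice of blocks, which is the point of the exercise.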

I get a lot fewer candidates for downstream validation, but I avoid curve fitting. Since historical data in general, and EOD data in particular, have severe limitations when testing some strategies, many strategies still fail at the next step, which is paper trading.

slavduja · Jan 21, 2011

Quote from intradaybill:

I don't see this as a proper use of DoF. All trading systems in the world have 3 DoF: long, short and flat. A plane has 6 DoF but can have many parameters for control. So it is with a trading system: 3 DoF, but it can have many control parameters. These authors use the wrong terminology. If you use many redundant control parameters you risk the controller becoming unstable. It is the same in a trading system: if you use too many parameters the system becomes unprofitable.

Thanks for the reply intradaybill,

Could you please provide an example or two of "redundant control parameters?"

thanks

slavduja · Jan 21, 2011

Quote from fullautotrading:

Parameters are not necessarily trading "indicators". (Personally, I believe indicators are generally useless for trading and, obviously, the more indicators one has, the better the "curve fitting" to past data can be.)

"Parameter" can take another meaning. For instance, the order size, the scalp size, etc. can be "parameters". And some parameters do not change the nature (profitable/unprofitable) of the strategy or the order of magnitude of performance indicators, due to obvious scale invariance.

There are other methods and countermeasures to fight curve fitting, and these are generally learned in time and painfully. You may well curve fit even with the simplest strategy in the world, with just one indicator.

Tom

Thanks for the reply Tom. Excellent sim link!
OK, let me perhaps be more specific. In Pardo's book he speaks of degrees of freedom and of the trade sample size being large enough for the back-tests to be taken as statistically reliable.
His definition of a degree of freedom was "each data point in the sample represents 1 DoF"; then he follows up and says "A degree of freedom then is said to be consumed or used by each trading rule and by every data point necessary to calculate indicators." His example from page 292 (second edition) is a trading system using a data sample which is a two-year price history with four data points per day (opens, highs, lows, closes), for a total of 2,080 points. Example one is a trading strategy that uses a 10-day average of the highs and a 50-day average of the lows. Average one uses 11 degrees of freedom: 10 highs plus one more as a rule. Average two uses 51 degrees of freedom: 50 lows plus one as a rule. The total is 62 degrees of freedom used. 62/2080 = 3%, which leaves DoF > 90%, which is acceptable.
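
To check the arithmetic of the example quoted above, here is a small Python sketch. The helper name `dof_remaining_pct` is made up for illustration, and it reads the "DoF > 90" threshold as a percentage of the data points left over, which is an assumption. It only reproduces the bookkeeping; it says nothing about whether this is a sound way to measure degrees of freedom.

```python
def dof_remaining_pct(total_points, lookbacks, rules_per_indicator=1):
    """Return (DoF used, percent of DoF remaining) under the counting rule quoted above.

    Each indicator is charged its lookback length plus one for its rule.
    """
    used = sum(n + rules_per_indicator for n in lookbacks)
    remaining_pct = 100.0 * (total_points - used) / total_points
    return used, remaining_pct

# 2,080 data points; a 10-day average of highs and a 50-day average of lows.
used, remaining = dof_remaining_pct(2080, [10, 50])
print(used, round(remaining, 1))  # 62 97.0
```

So the 62 points used are about 3% of the sample and roughly 97% remains, which is where the "more than 90" figure comes from.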

My question: is this an accurate way to determine DoF or not? Should one just discard this and focus on trade sample size? I'd welcome your opinions and experience.
If one is to add a breadth indicator to the above strategy, are its data points to be added to the 2,080 or counted separately, since they are not in the form of open, high, low and close per se?

Since I am new, I do have a few more questions about testing for randomness. As stated in this thread, every trading strategy can be either long, short or flat at any given time. I have seen some people test against randomness by taking each trade done in the backtest, randomly assigning it 1 of 3 possible options (go long, go short or remain flat), and comparing the resulting equity curves with your own to gauge the effectiveness of your edge.
Does anybody use such a method, or the one Pardo presents in the book, called correlation between equity curve and perfect profit?
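
A rough Python sketch of the random-assignment test described above, under simplifying assumptions that are mine: each trade is reduced to the market move over its holding period (the P&L of being long one unit), P&L is additive, and costs are ignored. The names `random_benchmark` and `fraction_beaten` are hypothetical.

```python
import random

def random_benchmark(market_moves, n_runs=1000, seed=None):
    """Final equity of n_runs random long/short/flat assignments over the same trade windows.

    market_moves[i] is the P&L of being long one unit during trade window i.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_runs):
        equity = 0.0
        for move in market_moves:
            # +1 = long, -1 = short, 0 = flat, each chosen with equal probability.
            equity += rng.choice((1, -1, 0)) * move
        finals.append(equity)
    return finals

def fraction_beaten(strategy_final, finals):
    """Share of random runs that ended below the strategy's final equity."""
    return sum(f < strategy_final for f in finals) / len(finals)

# Hypothetical per-trade market moves and the strategy's own positions.
moves = [1.0, -2.0, 0.5, 3.0, -1.5]
positions = [1, -1, 0, 1, 1]
strategy_final = sum(p * m for p, m in zip(positions, moves))
share = fraction_beaten(strategy_final, random_benchmark(moves, seed=1))
```

A share close to 1 means few random assignments did as well over the same windows; with only a handful of trades the comparison is not very informative.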

Slavduja

intradaybill · Jan 21, 2011

Quote from slavduja:

Thanks for the reply intradaybill,

Could you please provide an example or two of "redundant control parameters?"

thanks

Anything that is a derivative of price is a redundant parameter. That does not mean it is useless but it is redundant.

intradaybill · Jan 21, 2011

Quote from slavduja:

Example one is a trading strategy that uses a 10-day average of the highs and a 50-day average of the lows. Average one uses 11 degrees of freedom: 10 highs plus one more as a rule. Average two uses 51 degrees of freedom: 50 lows plus one as a rule. The total is 62 degrees of freedom used. 62/2080 = 3%, which leaves DoF > 90%, which is acceptable.

These are not degrees of freedom. These are functions of degrees of freedom. Anyone who does not understand the difference cannot be taken seriously.

fullautotrading · Jan 21, 2011

Quote from slavduja:

Thanks for the reply Tom. Excellent sim link!

Thanks to you guys. I am glad to have your favor and appreciation.

OK, let me perhaps be more specific. In Pardo's book he speaks of degrees of freedom and of the trade sample size being large enough for the back-tests to be taken as statistically reliable.
His definition of a degree of freedom was "each data point in the sample represents 1 DoF"; then he follows up and says "A degree of freedom then is said to be consumed or used by each trading rule and by every data point necessary to calculate indicators." His example from page 292 (second edition) is a trading system using a data sample which is a two-year price history with four data points per day (opens, highs, lows, closes), for a total of 2,080 points. Example one is a trading strategy that uses a 10-day average of the highs and a 50-day average of the lows. Average one uses 11 degrees of freedom: 10 highs plus one more as a rule. Average two uses 51 degrees of freedom: 50 lows plus one as a rule. The total is 62 degrees of freedom used. 62/2080 = 3%, which leaves DoF > 90%, which is acceptable.

Yes, sure, for someone who wants to lose even his underwear it certainly is :)
Acceptable? Why don't you tell him to trade it with his own money instead of dispensing these truths through books?

My question: is this an accurate way to determine DoF or not? Should one just discard this and focus on trade sample size? I'd welcome your opinions and experience.
If one is to add a breadth indicator to the above strategy, are its data points to be added to the 2,080 or counted separately, since they are not in the form of open, high, low and close per se?

I have not read the book mentioned, but from the concepts you report, it seems more on the delirium side than on the rational side.
He is trying to "generalize" (improperly) a well-known concept (Lagrange interpolation). That is, through 2 points passes a line.
Through 3 points passes a parabola. Through n points (with distinct x values) passes a polynomial of degree at most n-1. This means that if you have at least n parameters (the coefficients of the polynomial), you can make the curve touch all n points.

cf: http://en.wikipedia.org/wiki/Lagrange_polynomial
http://www.ibiblio.org/e-notes/Splines/Lagrange.htm
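
The interpolation fact mentioned above can be seen in a few lines of Python. This is only an illustration with made-up points; `lagrange_eval` is a hypothetical helper that evaluates the Lagrange polynomial directly, and it assumes the x values are distinct.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree <= n-1 through the n points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other node.
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Five arbitrary "price" points: a degree-4 polynomial reproduces them all.
xs = [0, 1, 2, 3, 4]
ys = [3.0, -1.0, 4.0, 1.0, 5.0]
fitted = [lagrange_eval(xs, ys, x) for x in xs]
print(fitted == ys)  # True
```

With as many coefficients as points the fit to the past is perfect whatever the data are, which is why a perfect fit on its own proves nothing about the next point.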

But, frankly, the context of trading is far more complex, and that kind of statement sounds to me like unnecessary pseudomathematical bullshit from someone who needs to impress a naive public.

Since I am new, I do have a few more questions about testing for randomness. As stated in this thread, every trading strategy can be either long, short or flat at any given time.

Nope it's not so simple. That's a naive concept. It depends on the strategies. (For instance I personally implement *only* strategies which "conceptually" can have all the above states simultaneously.)

I have seen some people test against randomness by taking each trade done in the backtest, randomly assigning it 1 of 3 possible options (go long, go short or remain flat), and comparing the resulting equity curves with your own to gauge the effectiveness of your edge.
Does anybody use such a method, or the one Pardo presents in the book, called correlation between equity curve and perfect profit?

I think if one uses his own intelligence and gains concrete experience he will do much better than by following those "inventions" from a writer who, I just suspect from what you report, has commercial needs rather than really knowing what he is talking about.

Mike805 · Jan 21, 2011

Quote from fullautotrading:
I think if one uses his own intelligence and gains concrete experience he will do much better than by following those "inventions" from a writer who, I just suspect from what you report, has commercial needs rather than really knowing what he is talking about.

Even though he's an author, I don't think Pardo has any commercial needs. Of course everyone should think for themselves... but I do think it is shortsighted of you to suspect what you do of Pardo.

Pardo doesn't offer inventions, he offers a guideline, and a good one at that. The "system" described is a means to facilitate testing, not something to "lose one's underwear" with...

Your assessment of the Pardo literature is wrong and you're forming some narrow opinions here.

I understand everyone has their own way of succeeding in this business, but, in general, forming opinions *after proper research* is what allows for innovation and progress. I suggest you do some proper research.

slavduja · Jan 22, 2011

Quote from Mike805:

Even though he's an author, I don't think Pardo has any commercial needs. Of course everyone should think for themselves... but I do think it is shortsighted of you to suspect what you do of Pardo.

Pardo doesn't offer inventions, he offers a guideline, and a good one at that. The "system" described is a means to facilitate testing, not something to "lose one's underwear" with...

I agree and definitely like the book. Pardo definitely knows what he is doing. I was just unsure about the degrees of freedom explanation and its accuracy.
http://www.futuressourcebook.com/Pa...rsary-with-Upper-echelon-Ranking-pr284459.htm

Slav

fullautotrading · Jan 22, 2011

Quote from Mike805:

...

Your assessment of the Pardo literature is wrong and you're forming some narrow opinions here.

I understand everyone has their own way of succeeding in this business, but, in general, forming opinions *after proper research* is what allows for innovation and progress. I suggest you do some proper research.

Hi Mike805, I have no problem with the fact that my mere suspicion is unfounded. I am actually happy to hear that. As I said, I neither know the author nor have I read the book.

On the other hand, I don't feel it is very fair to say that I need to do "proper research", for months, on an author in order to express my opinion about a concept which has been described accurately, and on which my personal opinion has been asked.

Let me give a concrete example. If I told you that the Earth is a cube or, to sound better, a hypercube with 3 DOF, would you need to do proper research on me to point out that I am talking bullshit?

The biggest problem of people approaching trading is not a lack of "theoretical" knowledge but, more simply, a lack of simple logic and consistency. Also, there are "stages" in a trader's development and, for instance, over time I too have abandoned many concepts that I once believed in. Always remember that when someone has a problem with real content, he sometimes hides behind sophisticated jargon and cryptic terms, even if they are out of place.

I will tell you why that concept is flawed, in my opinion. And that clearly does not necessarily mean that the author is always wrong. But in this case I personally think he is highly misleading and, in good conscience, I would not advise a client to go down that road. One can't deterministically link the tendency to curve fit with the amount of observed data.

But let me give a concrete example, instead of discussing "the sex of angels" and DOF. Assume you are backtesting a strategy. Assume, for the sake of example, you are given access to *only* 2 sets of data to tune your strategy:

1. 5 years of tickdata of a stock which has been steadily decreasing during that time (say a steady monotonic decrease).

2. 5 different sets of tickdata, 1 month each, of a stock which has been having during those months different behaviors, for instance: bullish, bearish, sideways, small reversions, large reversions

Would you prefer to use the 5 years of tickdata or the 5 months of tickdata?

Tom

spread_trader11 · Jan 22, 2011

Quote from Arthur Deco:

Six months of data. Profits greater than 2.5X losses. Wins at least one time out of three. Results consistent for twelve rolling six month periods. Scalable.

Agree with all. As for scalable: if it actually is a "trading" method, not overnight positioning for example, scale always comes into play, so develop more than one method.