Difference between Overfitting and Optimization to Current Environment

Does it really matter if the edge you have discovered is because of randomness (survivorship bias) or intelligent design?

Think of a fair coin and a loaded coin. If you analyze the history of the flips using the fair coin, you may think that you've discovered some "patterns", such as "heads-heads-tails-tails", for example. These patterns are spurious, of course. There is no predictive value in them at all. On the other hand, the patterns of the loaded coin (such as 55% heads, 45% tails) are persistent and exploitable, as long as the coin remains loaded.

It's the same thing with the market edge. The spurious edge is short-lived and useless for extrapolating into the future. The true edge is persistent and exploitable. It's true that the market can change and render the edge unusable, but until that time, it's certainly better than the spurious edge.
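The coin analogy above can be checked with a small simulation. This is only a sketch: the function names, the sample size and the seed are made up for illustration, and the 55% bias is the figure from the post. It measures how often heads follows the "heads-heads-tails-tails" pattern, and the overall heads rate, first on a "history" and then on fresh flips from the same coin.

```python
import random

def flip_sequence(p_heads, n, rng):
    """Return n flips as a string of 'H'/'T', where P(H) = p_heads."""
    return "".join("H" if rng.random() < p_heads else "T" for _ in range(n))

def heads_rate(flips):
    """Overall share of heads in a flip string."""
    return flips.count("H") / len(flips)

def heads_rate_after(flips, pattern):
    """Share of heads on the flip right after each occurrence of pattern.

    Returns None if the pattern never occurs with a flip following it.
    """
    width = len(pattern)
    followers = [
        flips[i + width]
        for i in range(len(flips) - width)
        if flips[i:i + width] == pattern
    ]
    if not followers:
        return None
    return followers.count("H") / len(followers)

if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed, chosen only for repeatability
    for name, p_heads in (("fair", 0.5), ("loaded", 0.55)):
        history = flip_sequence(p_heads, 10_000, rng)
        future = flip_sequence(p_heads, 10_000, rng)
        print(name, "overall heads:",
              heads_rate(history), "->", heads_rate(future))
        print(name, "heads after HHTT:",
              heads_rate_after(history, "HHTT"), "->",
              heads_rate_after(future, "HHTT"))
```

With the fair coin, any deviation from 50% after a pattern in the history has no reason to carry over to the future flips. With the loaded coin, the overall bias should show up in both samples for as long as the coin stays loaded.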
 
Nice explanation.
 
Simply,

Curve fitting uses the factors that actually determine the curve.

Over-fitting uses factors, in whole or in part, that appear to determine the curve but don't really.
 
I think I understand what you mean, and this is a nuance often lost in discussion.

In my mind, curve fitting is typically adjusting the parameters of some kind of indicator until they all line up historically. Thinking they've revolutionized trading, a newbie fails to see that this historical "curve" is but one of many possible curves, and so believes that this set of parameters is applicable to every curve (i.e. other time periods or instruments). It is not. Depending on the assumptions, formula and trading rules, it _might_ be applicable to the current conditions in the instrument for some time, though. It might also just be statistical noise if there's no basis underlying the process you've used.

So saying that "curve fitting" is a bad idea in general just shows a lack of clarity or understanding. It depends on the basis on which you've adapted to the curve, what you plan to use it for, and vice versa. On the other hand, making successful adaptive systems for something as chaotic as markets is very, very hard and prone to failures and instability. Many choose to just use fixed periods they find work for them and learn to deal with the failures instead.
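To illustrate the "statistical noise" case, here is a rough sketch that tunes an indicator parameter on data with no edge at all. Everything in it is an assumption made for the example: the random-walk price series, the moving-average rule, the lookback range and the seed. It picks the lookback that lines up best with one historical "curve" and then applies it to a different curve from the same process.

```python
import random

def random_walk(n, rng):
    """Prices built from fair +/-1 steps, so no real edge exists by construction."""
    price, prices = 100.0, []
    for _ in range(n):
        price += rng.choice((-1.0, 1.0))
        prices.append(price)
    return prices

def ma_rule_pnl(prices, lookback):
    """P&L of a toy rule: long for the next step when the price is above
    the average of the previous `lookback` prices, short otherwise."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        average = sum(prices[t - lookback:t]) / lookback
        position = 1.0 if prices[t] > average else -1.0
        pnl += position * (prices[t + 1] - prices[t])
    return pnl

def best_lookback(prices, lookbacks):
    """Lookback with the highest P&L on the given prices (plain grid search)."""
    return max(lookbacks, key=lambda k: ma_rule_pnl(prices, k))

if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary seed
    history = random_walk(2000, rng)
    unseen = random_walk(2000, rng)
    lookbacks = range(2, 51)
    chosen = best_lookback(history, lookbacks)
    print("chosen lookback:", chosen)
    print("P&L on the fitted history:", ma_rule_pnl(history, chosen))
    print("P&L on an unseen curve:  ", ma_rule_pnl(unseen, chosen))
```

The fitted P&L is the best of 49 tries on one curve, so it tends to look good whether or not anything real is there. The unseen P&L is the number that says whether the parameter had any basis.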
 
I am backtesting an intraday strategy for trading oil futures. It is technically driven and takes about 3 trades a day during a constrained trading period. I only have detailed data back to 2010, so that is as far back as my testing goes. The system works great over the past 2 years and is weaker in the 3rd and 4th. It blows up 5 years ago.

I understand that the effectiveness of the system depends on it being complementary to the market environment. That varies with the other algorithms dominant at the time, AND I feel the price of oil is a factor as well, since the exposure per contract changes the volumes traded. The driving forces for oil have also changed, as have the patterns in how the market digests that information.

Anyway... is it wiser to use the parameters that give perhaps less flattering but more consistent results over all trials, OR is it better to use the parameters that yield the best results over the past couple of years? Of course, I understand I would have to recognize when the environment is changing so I can keep my parameters current (if recent optimal parameters are indeed the best choice for implementation).

Basically I am confused because I know I'm not supposed to "over fit"... but wouldn't the recent trading environment be the one that matters most anyway? Why should I care if it worked 3 years ago when I am actively going to be on the watch for environmental changes that would cause the system to break down in the present?

I appreciate any perspective or advice. Thanks!
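The two choices in the question can be written down as two selection rules. This is a sketch only: the parameter-set names and the yearly P&L figures are invented for illustration and are not results from the poster's system, and "worst single year" is just one possible way to define consistency.

```python
def recent_best(results, recent_years):
    """Parameter set with the highest total P&L over recent_years only."""
    return max(results, key=lambda p: sum(results[p][y] for y in recent_years))

def most_consistent(results):
    """Parameter set whose worst single year is least bad (a maximin choice)."""
    return max(results, key=lambda p: min(results[p].values()))

# Hypothetical yearly P&L per parameter set, keyed by "years ago".
# "tuned_recent" mimics the pattern described: great for 2 years,
# weaker in years 3 and 4, and a blow-up 5 years ago.
results = {
    "tuned_recent": {1: 35, 2: 30, 3: 10, 4: 5, 5: -40},
    "steady":       {1: 11, 2: 12, 3: 9, 4: 8, 5: 5},
}

if __name__ == "__main__":
    print("best over the last 2 years:", recent_best(results, [1, 2]))
    print("best worst-year result:    ", most_consistent(results))
```

Putting the two rules side by side makes the trade-off explicit: the recent-best choice stakes everything on spotting the regime change in time, while the consistent choice gives up recent performance to limit the damage if the change is spotted late.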

If you are trying to create a trading model using historical data, you will want to split your data into two sets: a training set and a testing set. The data in each set should be randomized so you can determine whether your setup really works. When developing your trading setup, determine which factors are most important for predicting an outcome, and use as few variables as possible. Once you have trained your model, you can use it to make predictions on your test data.
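A minimal sketch of that workflow, under several assumptions: each sample is a hypothetical (signal, went_up) pair, the "model" is a single threshold on one variable, and the helper names are invented for the example. Passing an `rng` shuffles the samples before splitting, as described above; leaving it out keeps the time order, so the test set is the most recent data. That ordered variant is an addition here, not part of the advice above.

```python
import random

def split_train_test(samples, test_fraction=0.3, rng=None):
    """Split samples into (train, test). Shuffles first only if rng is given."""
    items = list(samples)
    if rng is not None:
        rng.shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    cut = len(items) - n_test
    return items[:cut], items[cut:]

def hit_rate(rows, threshold):
    """Share of rows where 'signal > threshold' matches whether price went up."""
    correct = sum((signal > threshold) == went_up for signal, went_up in rows)
    return correct / len(rows)

def fit_threshold(train_rows, candidates):
    """'Train' the one-variable model: keep the best threshold on the training set."""
    return max(candidates, key=lambda t: hit_rate(train_rows, t))

if __name__ == "__main__":
    rng = random.Random(5)  # arbitrary seed
    # Hypothetical data: a signal in [0, 1) and an unrelated up/down outcome.
    rows = [(rng.random(), rng.random() < 0.5) for _ in range(1000)]
    train, test = split_train_test(rows, test_fraction=0.3, rng=rng)
    threshold = fit_threshold(train, [i / 10 for i in range(10)])
    print("train hit rate:", hit_rate(train, threshold))
    print("test hit rate: ", hit_rate(test, threshold))
```

Only the test hit rate says anything about how the setup might behave on data it was not tuned on.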
 
To ensure that your system is not benefiting from a set of market conditions that will fade away... just make sure your testing period (in sample) is long enough. Mine starts in 1999... and covers a lot of changing market conditions. Regards and good trading
 