Is this over-fitting?

You're right if your definition of a forward test is one that uses real-time data. So in your example, yes, you would need to wait 7.5 years, which is not very practical.
I should have given more details.
A strategy is optimized using an optimization data set.
Once you find one with good results, you then test it on a validation data set, one that was not used for the optimization step.
But it could happen that the strategy gives good results on both data sets (optimization and validation) and still fails when it is used in real time.
If profitable trading was easy, everybody would be doing it...
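For anyone who wants to see the optimization/validation split spelled out, here is a minimal sketch. The toy momentum rule, the 70/30 split, and all function names are assumptions made up for illustration, not anyone's actual system.

```python
def split_chronologically(prices, optimization_fraction=0.7):
    """Split a price series into an earlier optimization set and a later validation set."""
    cut = int(len(prices) * optimization_fraction)
    return prices[:cut], prices[cut:]


def momentum_return(prices, lookback):
    """Toy rule (hypothetical): hold the next bar if price is above its level `lookback` bars ago."""
    total = 0.0
    for t in range(lookback, len(prices) - 1):
        if prices[t] > prices[t - lookback]:
            total += prices[t + 1] / prices[t] - 1.0
    return total


def optimize_then_validate(prices, lookbacks, optimization_fraction=0.7):
    """Pick the best lookback on the optimization set only, then score it on unseen data."""
    opt, val = split_chronologically(prices, optimization_fraction)
    best = max(lookbacks, key=lambda lb: momentum_return(opt, lb))
    return best, momentum_return(opt, best), momentum_return(val, best)
```

The validation number is only meaningful if the validation data played no part in choosing the parameter. As noted above, even a good validation result does not guarantee real-time performance.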
I work in data science, so I understand what you mean. One could also use k-fold cross-validation for that matter.
However, when the (hypothetical) equity curve is smooth, this would essentially mean the same thing, imo.
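A rough sketch of what the k-fold idea looks like in terms of indices, using contiguous folds and only the standard library. The function name is made up for illustration; libraries such as scikit-learn provide their own splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, non-overlapping folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

One caveat, stated as a general assumption rather than a rule: with price data, folds that train on bars after the test period can leak information, so many people prefer a walk-forward variant that only trains on earlier data.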
 
Let's say I have a fixed basket of 50 ETFs that I like to backtest and trade. I run a backtest, get good results, and decide to trade the strategy, but I notice that about 10 of the 50 ETFs perform poorly with the chosen parameters.

1. Would it be considered over-fitting to exclude those 10 ETFs from live trading?

2. Doubling down here: if I did exclude the 10 poor performers, how bad would it be to re-optimize on the remaining 40 ETFs?

You should be focused on "what's working". If something is not, eliminate it. You need a certain amount of diversification to protect against "single issue adverse event risk", but that's all. You don't need "diversification across all spectrums".

"Relative Strength" and "sector rotation" are big parts of premium performance.
 
if I did exclude the 10 poor performers

Can you tell your broker that this stock sucks and you want your money back? Or ask them to take your orders in the middle of the chart and not at the right edge (a quote from Alexander Elder)?
 
Let's say I have a fixed basket of 50 ETFs that I like to backtest and trade. I run a backtest, get good results, and decide to trade the strategy, but I notice that about 10 of the 50 ETFs perform poorly with the chosen parameters.

1. Would it be considered over-fitting to exclude those 10 ETFs from live trading?

2. Doubling down here: if I did exclude the 10 poor performers, how bad would it be to re-optimize on the remaining 40 ETFs?

The only way you could convince me it's not overfitting is if you had a good reason why those 10 didn't work. Is there something in common between them that would plausibly lead to them performing poorly with your strategy? If so, what's the mechanism? If you can't answer that... probably an overfit.
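To make the selection effect concrete, here is a small simulation sketch. It assumes 50 hypothetical ETFs whose strategy returns are pure noise with zero true edge, so every number it produces is synthetic and says nothing about real ETFs.

```python
import random


def simulate_exclusion(n_etfs=50, n_dropped=10, n_periods=250, seed=0):
    """Drop the worst in-sample performers from a zero-edge basket and compare averages."""
    rng = random.Random(seed)

    def noise_total():
        # Sum of per-period returns with zero mean: no real edge by construction.
        return sum(rng.gauss(0.0, 0.01) for _ in range(n_periods))

    def mean(values):
        return sum(values) / len(values)

    in_sample = [noise_total() for _ in range(n_etfs)]
    out_of_sample = [noise_total() for _ in range(n_etfs)]

    # Keep everything except the n_dropped worst in-sample results.
    ranked = sorted(range(n_etfs), key=lambda i: in_sample[i])
    kept = ranked[n_dropped:]

    return {
        "in_sample_all": mean(in_sample),
        "in_sample_kept": mean([in_sample[i] for i in kept]),
        "out_of_sample_kept": mean([out_of_sample[i] for i in kept]),
    }
```

Dropping the worst performers always lifts the in-sample average, even when there is nothing to find. The out-of-sample average of the kept ETFs is the figure to compare it against. If there is no mechanism behind the exclusion, the in-sample improvement is what you would expect from noise alone.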
 
What is in the basket?

You cannot dump very different ETFs into a basket.
Just for example, don't dump the slow-moving grain ETF in with the fast-moving oil ETF.

They have very different behavior/characteristics/characters/personalities.

I solved that problem by always using the same system, but in different timeframes. Faster moving means shorter timeframes; slower moving means longer timeframes.
I can use the same system in any commodity and in any market. Don't have to change anything.
 
Every backtest is an over-fit.

Not necessarily.
I backtested my system while it was in development without doing any optimization. I searched for situations where my system got into trouble and tried to find a global improvement in the basics of the system. So I did not test variable parameters and optimize the system for that specific set of data.
A system should be built on logic. If something does not work, it means the logic is not good, so you don't fix that by optimizing parameters. You improve your logic.
To me, a good system should perform well in any market without changing any parameter.

Within certain limits parameters in a good system can be changed without influencing the performance. That shows that the system is well balanced, and does not get affected by "noise".
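One way to check this in practice is to see whether the best parameter sits on a plateau or on a spike. A minimal sketch, assuming you already have a backtest score for each integer parameter value; the function name and the thresholds are made up for illustration.

```python
def is_on_plateau(scores, best_param, radius, max_relative_drop):
    """Return True if every tested neighbour within `radius` scores close to the best value.

    scores: dict mapping an integer parameter value to its backtest score.
    """
    best = scores[best_param]
    if best <= 0:
        return False  # a relative drop is not meaningful for a non-positive score
    neighbours = [
        scores[p]
        for p in range(best_param - radius, best_param + radius + 1)
        if p in scores and p != best_param
    ]
    if not neighbours:
        return False  # nothing to compare against
    return all((best - s) / best <= max_relative_drop for s in neighbours)
```

For example, scores of 0.95, 0.98, 1.0, 0.97, 0.94 around the chosen value look like a plateau, while 0.1, 0.2, 1.0, 0.15, 0.1 look like a spike that noise could have produced.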

The basic logic of my system has not changed in the last 25 years and it still works. There is some kind of logic in market behavior; markets are not random. The question is how to find what that logic is.
I tried to find a logical solution, not a mathematical one. And when the logic is OK, the math in general is OK too, as math is about logic.
 
Let's say I have a fixed basket of 50 ETFs that I like to backtest and trade. I run a backtest, get good results, and decide to trade the strategy, but I notice that about 10 of the 50 ETFs perform poorly with the chosen parameters.
Why did the 10 of 50 perform poorly? Different sectors, different asset classes, or was there anything that stood out?
Did you run the backtests with different starting dates?
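A quick sketch of the start-date check, assuming you already have the strategy's per-period returns as a plain list. In a real backtest the signals themselves may shift with the start date, which this simplification ignores. The function name is hypothetical.

```python
def results_by_start(returns, start_offsets):
    """Compound a per-period return series from each start offset and report the total return."""
    results = {}
    for offset in start_offsets:
        equity = 1.0
        for r in returns[offset:]:
            equity *= 1.0 + r
        results[offset] = equity - 1.0
    return results
```

If the total flips from good to bad when the start moves by a bar or two, the result probably depends on a handful of early trades rather than on the rule itself.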
 
The only way you could convince me it's not overfitting is if you had a good reason why those 10 didn't work. Is there something in common between them that would plausibly lead to them performing poorly with your strategy? If so, what's the mechanism? If you can't answer that... probably an overfit.
That is really strict, as you said, and in fact I think so too, but it is really difficult to do.
Instead, I sometimes train a strategy on "AAPL", and I would trust it most if the trained strategy also runs well on "IBM" (for at least several trades), even if it runs less well on another symbol.
Is that OK?
 
Not necessarily.
I backtested my system while it was in development without doing any optimization. I searched for situations where my system got into trouble and tried to find a global improvement in the basics of the system. So I did not test variable parameters and optimize the system for that specific set of data.
A system should be built on logic. If something does not work, it means the logic is not good, so you don't fix that by optimizing parameters. You improve your logic.
To me, a good system should perform well in any market without changing any parameter.

Within certain limits parameters in a good system can be changed without influencing the performance. That shows that the system is well balanced, and does not get affected by "noise".

The basic logic of my system has not changed in the last 25 years and it still works. There is some kind of logic in market behavior; markets are not random. The question is how to find what that logic is.
I tried to find a logical solution, not a mathematical one. And when the logic is OK, the math in general is OK too, as math is about logic.
Do you mean the logic as several "rules"?
 