I totally agree.
I was just saying that the example in the paper is about as simple, robust, and legitimate a trading system as one could come up with. If one were to come up with the hypothesis that "when markets go up, they tend to continue to go up", they would take the last 4 decades of market data to test their initial premise. They would find that, regardless of lookback window, a basic trend-following system outperforms buy and hold to a significant degree.
This would confirm their initial hypothesis (or rather, reject the null hypothesis with some level of confidence). Assuming that you believe this behavior will continue (which you should if you're going to do any type of testing on historical data), you trade it going forward.
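For what it's worth, here is a rough sketch of the kind of test I mean: a long-or-flat moving-average rule compared against buy and hold over several lookback windows. Everything in it is my own assumption and not taken from the paper: the function names, the moving-average rule itself, and the synthetic random-walk prices. The synthetic series is only there so the code runs, so its output says nothing about real markets; you would swap in actual historical closes.

```python
import random

def simulate_prices(n_days=2500, drift=0.0003, vol=0.01, seed=0):
    """Synthetic daily closes: a random walk with drift (made-up parameters)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days):
        prices.append(prices[-1] * (1.0 + rng.gauss(drift, vol)))
    return prices

def buy_and_hold_return(prices, start):
    """Total return from holding between index `start` and the last price."""
    return prices[-1] / prices[start] - 1.0

def trend_following_return(prices, lookback, start):
    """Long for the next day when today's close is above its `lookback`-day
    average, otherwise flat. The signal at day t only uses prices up to t."""
    if start < lookback - 1:
        raise ValueError("start must leave room for one full lookback window")
    equity = 1.0
    for t in range(start, len(prices) - 1):
        window = prices[t - lookback + 1 : t + 1]  # `lookback` closes ending at t
        sma = sum(window) / lookback
        if prices[t] > sma:
            equity *= prices[t + 1] / prices[t]    # earn the next day's return
    return equity - 1.0

if __name__ == "__main__":
    prices = simulate_prices()
    lookbacks = [20, 50, 100, 200]
    start = max(lookbacks)  # same evaluation period for every lookback
    print("buy and hold:", round(buy_and_hold_return(prices, start), 4))
    for lb in lookbacks:
        print("lookback", lb, ":", round(trend_following_return(prices, lb, start), 4))
```

Starting every variant at the same index keeps the comparison on one common period, so a longer lookback is not penalized or flattered just because it begins trading later.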
If the system is unprofitable going forward, there is nothing one could have done during the system development process to prevent this. If you believed a shift in market behavior was coming in the near future, you shouldn't be using past data for testing in the first place.
I guess I just don't understand the point the paper is trying to make. Obviously, if the market behavior that's driving your profits ceases, your system will fail.
The point should be in the Abstract, but the paper's full text focuses a lot on overfitting and the limitations of backtesting. I don't see them backing up this claim from the Abstract, but I don't disagree with it either:
In this paper we present two examples that demonstrate the limitation of quantitative evaluation of trading strategies and we claim that the most effective way of guarding against overfitting and selection bias is by limiting the applications of backtesting to a class of strategies that employ similar but simple predictors of price. We claim that determining when market conditions change is in many cases fundamentally more important than any quantitative claims about trading strategy evaluation.
The Conclusion has some more explanation, mostly of the common-sense kind, although it is not really backed up by the paper either. E.g.:
“There’s a creative moment when you think of a hypothesis, maybe it’s that interest rate data drives currency rates. So we think about that first before mining the data. We don’t mine the data to come up with ideas.”
Only naive practitioners feed data to a machine learning model in hope that it will generate a significant result. Quantitative analysis shows that results from multiple trials can be misleading.
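The "multiple trials" point is easy to see with a toy simulation. This is only my own sketch, not anything from the paper: it generates strategies whose daily returns are pure noise with zero edge (the 1% daily volatility, the 252-day year and the function names are all assumptions) and reports the best annualized Sharpe ratio among them.

```python
import random
import statistics

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a return series, risk-free rate taken as 0."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * periods_per_year ** 0.5

def best_of_n_sharpe(n_trials, n_days=252, seed=0):
    """Simulate `n_trials` strategies with no edge and return
    (best Sharpe, list of all Sharpes)."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]  # zero-mean noise
        sharpes.append(sharpe(daily))
    return max(sharpes), sharpes

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        best, _ = best_of_n_sharpe(n)
        print(n, "trials -> best Sharpe", round(best, 2))
```

Since none of these strategies has any edge, whatever the best one shows is selection alone, and the best figure can only go up as more variants are tried on the same data.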
For me, at least, such research can be interesting at times, but spending so much time writing and studying papers on such simple systems, built on weak trading premises that ignore risk management, seems like a bit of a waste of time if one really wants to make something worthwhile. As I see it, the Conclusion says, in a convoluted but perhaps more "correct" way, much the same things I stated in my previous post. Ironically, the paper's Conclusion seems to agree with this too:
Although the academic community has contributed significantly in raising awareness about certain issues, it cannot provide a framework for generating those “creative moments” Leda Braga referred to above but only investigate whether a moment was not as creative as was expected. Although this is partly progress, it is far from a solution to the problem, if such a solution exists at all.
In my mind, it's the quality of the feedback loop across development, testing, and executing real trades that determines whether one is optimizing on potentially false premises, and with purely technical data you never really know anyway. That higher earnings may drive prices higher in the future is a much more concrete hypothesis, but it has less to do with trading and is more part of an investment strategy.