1) nerdy way to avoid overfitting https://seekingalpha.com/article/4126281-optimizing-trading-strategies-without-overfitting
2) Normie way to overfit less-ish https://blog.quantopian.com/parameter-optimization/
Thanks for these; I'll have a read.
OK, so I didn't go all the way through Chan's article (too much econometrics jargon for me to understand without some serious background research), but I did want to comment on the first sentence: "Optimizing the parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance."
I trade UPRO but build the model with S&P E-mini futures (ES). ES has 5491 past trading days (as of 4/26/2019). One of my buy signals is when the 10-year US Treasury futures price (ZN) is above its 50-day MA. This signal was active on 1188 of those testing days, or 22% of the time. So I think this strategy certainly clears the statistical significance bar!
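For anyone who wants to reproduce that kind of count, here is a minimal sketch of how "days the signal was active" could be tallied. The function names and the choice to count warm-up days in the denominator are my assumptions, not the poster's actual code. As a sanity check on the quoted numbers, 1188 / 5491 is about 21.6%, which rounds to the 22% above.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def signal_active_fraction(prices, window=50):
    """Count days where price closes above its own moving average.

    Returns (active_days, total_days, fraction). Warm-up days with no
    moving average yet are counted as inactive (an assumption).
    """
    ma = sma(prices, window)
    active = [m is not None and p > m for p, m in zip(prices, ma)]
    n_active = sum(active)
    return n_active, len(prices), n_active / len(prices)
```

With real data, `prices` would be the daily ZN closes and `window` would stay at 50.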
The Quantopian article criticizes strategies which don't train the model on the most recent data. I might only train my model every few weeks, but it doesn't seem like omitting a few weeks would corrupt parameters obtained with 13 years of past data! I can see how a high-frequency system might suffer greatly from omitting a few days of training, though.
I think the biggest criticism that I'd levy at my approach is that it's kind of boring! A previous poster showed an intraday RSI strategy for AMZN with a 7x return in 2 years. There's no way that my approach (daily trades) is going to get those kinds of returns. For securities with a strong upward bias, like UPRO and TECL, my approach is almost always in the market, but succeeds in avoiding the worst drawdowns. Here's the UPRO profit curve (six strategies traded). It looks very similar to the UPRO price, except the drawdowns are less severe.
View attachment 201587
There's no way I'm going to use my approach to turn $10K into $1MM in a couple years, like the hypester ads claim. But I bet I can consistently get 25-40%. That's worth something!
My worry for you is that you're not doing this, so you won't see that occasionally, at day n+m, rule RM may stop working, and you won't know how to deal with it. I strongly suspect that simply ignoring a rule that worked before but recently stopped working is sufficient, but I'm not sure you're testing this.
OK, so I did a cross-validation test of my VIX and UPRO strategies, where I had a randomly selected 200-trading-day out-of-sample period.
Here's a plot of a VIX strategy (buy when the 5-day past return is between -22% and +2%). You are looking at 50 profit curves. The flat spots in the profit curves indicate where the out-of-sample period was.
The profit curves aren't diverging, which is good. The clustering about the mean looks fairly benign to me. On the other strategies that I've tested, I've not seen any naughty behavior.
View attachment 201635
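A rough sketch of the kind of test described above: hold out a random 200-day window, fit on everything else, and repeat 50 times. The poster didn't say how the parameters are fit, so the grid search over candidate return bands (`fit_band`) is purely my assumption, as are all the function names. The curve is kept flat inside the held-out window to mirror the flat spots mentioned in the plot.

```python
import random


def five_day_returns(prices):
    """Trailing 5-day return per day; None for the first 5 days."""
    return [None if i < 5 else prices[i] / prices[i - 5] - 1.0
            for i in range(len(prices))]


def daily_pnl(prices, lo, hi):
    """Per-day P&L (one share): hold over the next day when today's
    trailing 5-day return lies in [lo, hi]."""
    r5 = five_day_returns(prices)
    pnl = [0.0] * len(prices)
    for i in range(5, len(prices) - 1):
        if lo <= r5[i] <= hi:
            pnl[i + 1] = prices[i + 1] - prices[i]
    return pnl


def fit_band(prices, in_sample, candidates):
    """Pick the (lo, hi) band with the highest total in-sample P&L.
    Hypothetical fitting step; the original method is not described."""
    def score(band):
        pnl = daily_pnl(prices, *band)
        return sum(p for p, keep in zip(pnl, in_sample) if keep)
    return max(candidates, key=score)


def holdout_profit_curves(prices, candidates, n_runs=50, holdout=200, seed=0):
    """Return a list of (holdout_start, fitted_band, profit_curve)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        start = rng.randrange(0, len(prices) - holdout + 1)
        in_sample = [not (start <= i < start + holdout)
                     for i in range(len(prices))]
        band = fit_band(prices, in_sample, candidates)
        pnl = daily_pnl(prices, *band)
        curve, total = [], 0.0
        for p, keep in zip(pnl, in_sample):
            total += p if keep else 0.0  # flat while out of sample
            curve.append(total)
        results.append((start, band, curve))
    return results
```

If the fitted band jumps around a lot from run to run, or the curves fan out, that would be the "naughty behavior" to look for. Candidates could include the band from the post, e.g. `(-0.22, 0.02)`.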
Here's one strategy that may be a victim of overfitting. It simply doesn't have that many trades, so the spread between profit curves is pretty large.
View attachment 201636
What are the yellow and red spots all over the place in the first chart? Is one VIX and the other volume or something?
Another way to figure out whether you have overfitting is to look at your R^2 in and out of sample. If you have a high R^2 in sample, but a low R^2 out of sample, you are probably overfitting.
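To make the R^2 comparison concrete, here is a minimal sketch that assumes the model is a one-variable linear regression (say, next-day return on some signal value). That is an assumption for illustration only; a rule-based strategy may have nothing regression-like to fit. The line is fit in sample, then R^2 is measured on both sets, using each set's own mean for the total sum of squares.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """R^2 of a fixed line on (x, y). It can go negative out of sample,
    meaning the line does worse than just predicting the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def in_and_out_of_sample_r2(x_train, y_train, x_test, y_test):
    """Fit on the training set only, then score both sets."""
    intercept, slope = fit_line(x_train, y_train)
    return (r_squared(x_train, y_train, intercept, slope),
            r_squared(x_test, y_test, intercept, slope))
```

A large gap between the two numbers is the warning sign described above.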
Sorry, the yellow dots are the VIX share price. The red dots represent cash in the market for one of the 50 realizations.
For the R^2 metric that you envision, what would I be fitting? I do compute an RMS error between the actual profit curve and a straight line connecting (t0,0) to (tmax,pmax), where t0=start time, tmax=end time, pmax=profit at tmax. I call it the "Deviation" and use it as a metric when I'm culling strategies. Here's a screenshot out of my culling spreadsheet (lower deviation is better).
View attachment 201665
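The "Deviation" described above can be sketched in a few lines. This follows the description in the post (RMS error against the line from (t0,0) to (tmax,pmax)); the function name and the lack of any normalization are my assumptions, since the spreadsheet formula itself isn't shown.

```python
import math


def deviation(times, profit):
    """RMS error between a profit curve and the straight line that runs
    from (t0, 0) to (tmax, pmax). Lower means a steadier curve."""
    t0, tmax = times[0], times[-1]
    pmax = profit[-1]
    squared_error = 0.0
    for t, p in zip(times, profit):
        line = pmax * (t - t0) / (tmax - t0)  # value of the reference line at t
        squared_error += (p - line) ** 2
    return math.sqrt(squared_error / len(times))
```

A curve that climbs in a perfectly straight line scores 0. Note that the raw value is in profit units, so comparing strategies of very different size would probably need some scaling (e.g. dividing by pmax).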