How many years of backtesting do you use for testing stocks?

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2308659

[Image: QSPZ3bx.png]

Before you posted this, I was searching for the deflated Sharpe ratio... One of the graphs plots the relationship between overfitting and how much data you use.
 
Thank you for the graph. From the graph, a good guideline is to have at least 600 trials in the backtest to avoid over-fitting. Unfortunately, I am not sure what exactly is meant by each trial. Does 1 trial refer to 1 trade?

So, if the time frame is daily, around 10 years of data history is needed. Does that mean that for an intraday backtest on a one-minute time frame, only a few weeks of data history is enough?

No, trials are not the data in the backtest. The number of trials is how many times you fitted the data. Basically, the more fits you do, the greater the chance of finding a spurious result, and so the more data you need. So unlike data points, you want fewer trials. Also, this particular plot would look different depending on the underlying Sharpe ratio.
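
To put rough numbers on "more fits, more data": the sketch below uses the expected-maximum approximation usually attributed to Bailey and Lopez de Prado for N independent trials that all have a true Sharpe ratio of zero. It is only an illustration, not necessarily how the graph above was produced. The function names are made up for this example, and independence of the trials is an assumption that rarely holds exactly in practice.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def max_z_factor(n_trials: int) -> float:
    """Approximate expected maximum of n_trials independent standard normals."""
    if n_trials < 2:
        raise ValueError("need at least 2 trials")
    inv = NormalDist().inv_cdf
    return ((1 - EULER_GAMMA) * inv(1 - 1 / n_trials)
            + EULER_GAMMA * inv(1 - 1 / (n_trials * math.e)))


def expected_max_sharpe(n_trials: int, years: float) -> float:
    """Expected best in-sample annualized Sharpe when every trial is pure noise.

    Assumes the estimated annualized Sharpe of a zero-skill strategy has a
    standard deviation of about 1 / sqrt(years).
    """
    return max_z_factor(n_trials) / math.sqrt(years)


def min_backtest_years(n_trials: int, target_sharpe: float) -> float:
    """Years of data needed so the expected best noise Sharpe equals target_sharpe."""
    return (max_z_factor(n_trials) / target_sharpe) ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "trials:",
              round(expected_max_sharpe(n, years=10), 2), "best noise Sharpe on 10y,",
              round(min_backtest_years(n, target_sharpe=1.0), 1), "years needed for SR 1")
```

Under these assumptions the years required grow with the number of trials and shrink with the square of the target Sharpe ratio, which is why the curve depends on the underlying Sharpe ratio.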

Also, it definitely isn't the case that having more frequent data reduces the history of data required, since the noise and the parameter variability both scale with the square root of time. If you need 20 years of daily data for statistical significance, you will also need about 20 years of one-minute data.*

[the exception clearly is if you have data that is 'too slow' for your trading system; of course an HFT would benefit from having tick data rather than daily data]

GAT

* [Technical note] Very slightly less, because the t-distribution converges on a normal distribution, so the critical value of t falls a tiny amount going from 2500 observations to several million.
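
A quick arithmetic check of the square-root-of-time point, as a minimal sketch assuming i.i.d. returns: the t-statistic of the mean return equals the per-period Sharpe ratio times the square root of the number of observations, so the sampling frequency cancels out. The function name and the 390 bars per session are illustrative assumptions.

```python
import math


def t_stat(annual_sharpe: float, years: float, periods_per_year: int) -> float:
    """t-statistic of the mean return, assuming i.i.d. returns."""
    # Sharpe per period shrinks with the square root of the sampling frequency...
    per_period_sharpe = annual_sharpe / math.sqrt(periods_per_year)
    # ...while the observation count grows linearly with it.
    n_obs = years * periods_per_year
    return per_period_sharpe * math.sqrt(n_obs)


DAILY = 252
MINUTE = 252 * 390  # assumption: 390 one-minute bars per trading session

if __name__ == "__main__":
    print("daily :", t_stat(0.5, 20, DAILY))
    print("minute:", t_stat(0.5, 20, MINUTE))
```

Both lines print the same value, annual_sharpe * sqrt(years), so only the length of history moves the t-statistic under these assumptions.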
 
In my experience, the amount of time is less important than selecting for a specific list of market conditions, and accounting for those conditions in my trading rules.

Do I need to be concerned about volatility, range, momentum, bull or bear markets? If yes, then I look for the periods of time in which the markets exhibit those conditions. For me, that covers it.
%%
True.
But I still like all the data,
going back to 1927-37 + 1776 [Edit: all the data on larger timeframes]
 
Just forget about the graph. The reason Marcos Lopez de Prado formulated the false strategy theorem is that he wants to raise awareness of overfitting from multiple backtests.

There are many ways to calculate the probability of overfitting, such as the deflated Sharpe ratio, the family-wise error rate, etc.
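
For the family-wise error rate, the textbook calculation is short. This is a minimal sketch assuming the N tests are independent, which correlated backtests are not; the function names are illustrative.

```python
def family_wise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, "tests:",
              "FWER", round(family_wise_error_rate(n), 3),
              "| Bonferroni per-test alpha", bonferroni_alpha(n))
```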

However, they don't define what a trial is. Some backtesters make only a small change to the code between runs, while others make a big change. In that situation, how can they have the same overfitting probability?

That's why the deflated Sharpe ratio is useless in some scenarios.

But yes, you should be careful about multiple testing. More importantly, your strategy needs to be explainable: don't use too many variables, and run LASSO to reduce overfitting.
 
2K trades to be sure it's not overfitted

It also depends on the distribution of your trades. If your trades tend to cluster in particular periods, for example you backtest 2000-2020 and more than 1.5k trades (>75%) are completed in 2008 and 2020, your ML may not learn the general data structure.
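
One way to check for that kind of clustering is to count trades per calendar year and see what share falls in the busiest years. This is a rough sketch; the function name, the top_n parameter, and the synthetic trade list are made up for illustration.

```python
from collections import Counter
from datetime import date


def top_year_share(trade_dates, top_n=2):
    """Return (share of trades in the top_n busiest years, those years)."""
    counts = Counter(d.year for d in trade_dates)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trades given")
    top = counts.most_common(top_n)
    share = sum(count for _, count in top) / total
    return share, [year for year, _ in top]


# Synthetic example: 2000 trades, most of them bunched in 2008 and 2020.
trades = ([date(2008, 10, 1)] * 800
          + [date(2020, 3, 16)] * 750
          + [date(2010, 6, 1)] * 450)

share, years = top_year_share(trades)
print(f"{share:.1%} of trades fall in {years}")
```

A high share in one or two years suggests the backtest is mostly a test of those regimes, however many trades it contains.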
 
I think the data you are using for your backtesting is more important than the amount of time. You should take into account the different market conditions that can simulate your trading plans.
 