Quote from braincell:
Run a statistics test to avoid curve fitting.
A statistics test is as follows:
-Take a larger number of inputs, for example 500. Right now you're using only RSI 14, RSI 21, etc. Add any inputs you think are significant, whatever they may be, like MACD h value for different periods, etc.
-For each of the inputs you need around 10 to 50 million runs. So with 500 inputs you should do 5 to 25 billion runs in total (500 x 10 to 50 million), depending on the total input count in your code.
-Rank systems by how they perform out of sample. Discard those that have fewer than 500 trades in-sample.
-Every ranked system should add its total success out of sample (i.e. NetProfit) to running totals that track the significance of each input value. So a system that made a NetProfit of 20k out of sample should add 20k to the total of every input value that was used in it (for example RSI 14, etc).
-Once you have run all of the billions of systems, you will see a picture emerging where only some of the input values keep reappearing.
-If you have a poor search algo, you may need to do many more tests. If you are creating systems purely randomly (no hill climbing of any kind, etc) you will need hundreds of billions of runs.
-Once you select the top 50 input values from the statistics test, creating systems should be much easier. Meaning, you should be getting a much higher success rate in OOS results compared to using randomly selected input values. If this is not the case, the statistics test itself was curve fit and perhaps your search algo needs to be improved.
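
The tallying steps above can be sketched in a few lines of Python. This is only a rough illustration, not braincell's actual code: the dictionary keys (`inputs`, `is_trades`, `oos_net_profit`), the function names and the sample numbers are all made up for the example. Only the 500-trade in-sample cutoff, the NetProfit credit per input, and the top-N selection come from the post.

```python
from collections import defaultdict

MIN_IN_SAMPLE_TRADES = 500  # cutoff taken from the post


def tally_input_significance(systems, min_trades=MIN_IN_SAMPLE_TRADES):
    """Add each system's out-of-sample NetProfit to every input it used."""
    scores = defaultdict(float)
    for system in systems:
        # Discard systems with too few in-sample trades.
        if system["is_trades"] < min_trades:
            continue
        for name in system["inputs"]:
            scores[name] += system["oos_net_profit"]
    return dict(scores)


def top_inputs(scores, n=50):
    """Return the n input names with the highest summed OOS NetProfit."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical backtest results, for illustration only.
systems = [
    {"inputs": ["RSI 14", "MACD h 12"], "is_trades": 800, "oos_net_profit": 20000.0},
    {"inputs": ["RSI 14", "RSI 21"], "is_trades": 650, "oos_net_profit": -5000.0},
    {"inputs": ["RSI 21"], "is_trades": 120, "oos_net_profit": 90000.0},  # too few trades
]

scores = tally_input_significance(systems)
best = top_inputs(scores, n=2)
print(best)
```

In a real run the `systems` list would be replaced by a stream of billions of backtest results, so the tally would be updated incrementally rather than built from a list held in memory.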
As you can see, running billions of backtests takes time, hardware, or very fast search programs. However, it's the best (if not only) way to improve data mining of any kind, in terms of OOS results.
Also, you shouldn't trust any OOS results unless the in-sample has at least 1000 trades in it (like someone said earlier) and the OOS shows the same trade frequency as the in-sample. Without this, you really can't say you've found anything "statistically significant".
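
A quick sanity check along these lines might look like the sketch below. The function name, its parameters, and especially the 25% tolerance on trade frequency are assumptions added for illustration; the post only asks for at least 1000 in-sample trades and an unchanged trade frequency out of sample, without saying how close "unchanged" has to be.

```python
def oos_is_trustworthy(is_trades, is_days, oos_trades, oos_days,
                       min_is_trades=1000, freq_tolerance=0.25):
    """Check the in-sample trade count and compare trades per day IS vs OOS.

    freq_tolerance is an assumed relative tolerance, not a value from the post.
    """
    if is_days <= 0 or oos_days <= 0:
        raise ValueError("period lengths must be positive")
    if is_trades < min_is_trades:
        return False
    is_freq = is_trades / is_days
    oos_freq = oos_trades / oos_days
    # Accept only if the OOS frequency stays within the tolerance of the IS frequency.
    return abs(oos_freq - is_freq) <= freq_tolerance * is_freq


# Hypothetical numbers: 1200 trades over 1000 days in-sample, 290 over 250 days OOS.
print(oos_is_trustworthy(1200, 1000, 290, 250))
```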