Quote from dom993:
1- do you have enough setups in the "bad" part of the range to think it is statistically significant?
2- how stable in time is the performance in the "bad" part of the range? Breakeven (BE) overall means little: for example, it could have been negative in the 1st half of your backtesting and positive in the 2nd half, which could have several implications
3- do you have any market dynamics "theory" that can explain the results of that filter?
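A rough way to look at question 2 is to compare the average outcome in the first and second chronological halves of the backtest, and to track the running average trade by trade. The sketch below assumes the per-trade results are available as a plain list in time order; the function names and the sample numbers are hypothetical and are not data from this thread.

```python
from statistics import mean


def split_half_means(pnl):
    """Return the mean P&L of the first and second chronological halves."""
    if len(pnl) < 2:
        raise ValueError("need at least two trades")
    mid = len(pnl) // 2
    return mean(pnl[:mid]), mean(pnl[mid:])


def running_mean(pnl):
    """Return the cumulative average P&L after each trade, in order."""
    total = 0.0
    averages = []
    for count, value in enumerate(pnl, start=1):
        total += value
        averages.append(total / count)
    return averages


# Hypothetical illustration only: breakeven overall,
# but losing in the first half and winning in the second.
example = [-2.0, -1.0, -3.0, 2.0, 1.0, 3.0]
first_half, second_half = split_half_means(example)
overall = mean(example)
```

If the two halves disagree in sign while the overall figure sits near zero, the "BE overall" number is hiding a regime change rather than describing a stable result.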
Quote from Random.Capital:
Why worry about curve-fitting?

Quote from logic_man:
Hi,
1. I think so, but the set-up isn't exactly easy to backtest over a decade or so, so I've really only got accurate data with all of the relevant metrics for 81 examples from the "bad" part and 149 from the "good" part. The 81 examples from the "bad" part average about -0.25 ES points per trade, while the 149 examples from the "good" part average about 3 ES points per trade. This leads me to conclude that the mean outcomes of these two populations are actually different, since I think that if the means were the same, they would have converged by now. That's based on the heuristic I was taught in stats classes that 30 examples of something are typically sufficient to start drawing some tentative conclusions.
2. It's actually very stable. The average outcome for the "bad" part of the range has been negative virtually from the beginning of the data series.
3. Yes, if I were to explain to you why I think this phenomenon exists, it would be very intuitive to you. I've explained it to two people, one trader and one non-trader, and both "got it" very easily. In fact, that's why I begin my strategy optimization by focusing on this specific parameter.
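The claim in point 1 (that the two means really differ) can be checked more directly than with the 30-sample rule of thumb, for instance with Welch's unequal-variance t statistic. This is only a sketch: it assumes the per-trade results of each group are available as lists, the function name is hypothetical, and the sample numbers are made up for illustration because the thread gives only the group averages, not the individual trades.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two trades")
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical per-trade results in ES points, NOT the thread's data.
good = [3.0, 5.0, 1.0, 4.0, 2.0]
bad = [0.0, -1.0, 1.0, -2.0, 2.0]
t_stat, dof = welch_t(good, bad)
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(good, bad, equal_var=False)` performs the same test and returns a p-value. Note that trade outcomes are often fat-tailed and serially dependent, so the result is a guide, not a proof.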
Quote from dom993:
I doubt the 30-sample heuristic applies to trading, especially to backtesting. But your filter impacts 35% of the setups (81 of 230), and it appears to be stable throughout the backtesting period; this is good.
What is somewhat contradictory is that, although you believe the filter to be based on a real market phenomenon, the average outcome for the filtered trades is about BE, which says "no better than random". One reading of this could be that the negative edge spotted by the filter balances the positive edge of your basic system. Another reading could be that your basic system has no edge, but the backtesting is lucky on the subset of trades outside the filter.
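One way to probe the "lucky subset" reading is a permutation test: shuffle the good/bad labels across all trades many times and count how often a random split produces a gap in average outcome at least as large as the one observed. The sketch below is a minimal version under stated assumptions: the function name, the number of shuffles, and the sample numbers are all hypothetical, and it does not correct for having tried several filters before settling on this one.

```python
import random
from statistics import mean


def permutation_p_value(good, bad, n_perm=10000, seed=0):
    """One-sided p-value for mean(good) - mean(bad) under random relabeling."""
    rng = random.Random(seed)
    observed = mean(good) - mean(bad)
    pooled = list(good) + list(bad)
    n_good = len(good)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_good]) - mean(pooled[n_good:])
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-trade results in ES points, NOT the thread's data.
good = [3.0, 5.0, 1.0, 4.0, 2.0]
bad = [0.0, -1.0, 1.0, -2.0, 2.0]
p_value = permutation_p_value(good, bad)
```

A small p-value says the gap between the two groups is unlikely to come from a random split of the same trades. It does not say which of the two readings above is correct, only that the filter separates the trades better than chance labeling would.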
I suggest doing some additional work to assess the value of that filter. If the "bad" part of the filter is detrimental to your basic system, it could be good for a system working off the opposite paradigm (for example, if your system looks for reversals, try using the bad part of the filter on a trend-continuation system).
One last comment ... are these 230 setups all you have for your basic system, or is this just the subset for which you have access to the information required for the filter? If it is only a subset, then the obvious thing to do would be to get the information required by the filter for your entire backtesting period. THAT would be good OOS testing for that filter.
Quote from dom993:
Have you tried inverting the other filter and testing your new filter on the 170 setups that are currently discarded? I would expect the "good" part of the new filter to perform better than the "bad" part on those 170 setups, even though "better" might just be "not as negative" in that case.
I would also re-analyze the old filter in the light of the new one, i.e., does the old filter do any good assuming you use the new filter?
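Both checks amount to a 2x2 breakdown: trade count and average outcome for every combination of old-filter and new-filter verdicts. Here is a small sketch of that table, assuming each trade is recorded as a tuple of (passes old filter, passes new filter, result); that layout, the function name, and the sample trades are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cross_tab(trades):
    """Map (old_pass, new_pass) to (trade count, mean P&L) for that cell."""
    cells = defaultdict(list)
    for old_pass, new_pass, pnl in trades:
        cells[(old_pass, new_pass)].append(pnl)
    return {key: (len(values), mean(values)) for key, values in cells.items()}


# Hypothetical trades, NOT the thread's data: (old_pass, new_pass, pnl).
trades = [
    (True, True, 4.0),
    (True, True, 2.0),
    (True, False, -1.0),
    (False, True, -0.5),
    (False, False, -3.0),
]
table = cross_tab(trades)
```

Comparing the `(False, True)` and `(False, False)` cells shows whether the new filter still separates the setups the old filter discards. Comparing `(True, True)` with `(False, True)` shows whether the old filter adds anything once the new one is in use.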