Execution risk, liquidity, etc.
Example I: (not taking liquidity into account)
I once saw a back-test for options trading that did extremely well at capturing volatility on spread trades. The data was solid and everything looked excellent. I could NOT find a single issue with the back-test or the model. We tried to break it, but couldn't.
Then they went into production with program trading of this strategy, and it traded in such small size that it was useless.
It turned out that when the volatility skew anomaly occurred, it was only on small trades (usually small retail orders), and there was really very little liquidity at those levels. There was no way to deploy any reasonable amount of money behind the strategy because the order flow wasn't there.
The problem was that they didn't take liquidity into consideration. The strategy worked, but only on a fraction of the capital that needed to be deployed.
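One way a back-test could account for this, as a minimal sketch rather than what that firm actually did: cap every simulated fill at a fraction of the size that was really quoted when the signal fired. The `Signal` fields, the `max_participation` parameter, and the sample numbers below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    edge_per_contract: float   # expected profit per contract (hypothetical units)
    desired_contracts: int     # size the strategy would like to trade
    displayed_contracts: int   # size actually quoted at the signal price


def capped_fill(sig, max_participation=0.25):
    """Contracts we can realistically fill: a fraction of the displayed size."""
    liquidity_limit = int(sig.displayed_contracts * max_participation)
    return max(0, min(sig.desired_contracts, liquidity_limit))


def backtest_pnl(signals, max_participation=None):
    """Total expected P&L. With max_participation=None, liquidity is ignored."""
    total = 0.0
    for sig in signals:
        if max_participation is None:
            size = sig.desired_contracts          # naive: assume full fill
        else:
            size = capped_fill(sig, max_participation)
        total += size * sig.edge_per_contract
    return total


# Illustrative data: the anomaly shows up only on small quotes.
signals = [Signal(12.0, 500, 20), Signal(8.0, 500, 8), Signal(15.0, 500, 40)]
naive = backtest_pnl(signals)
realistic = backtest_pnl(signals, max_participation=0.25)
print(naive, realistic)
```

With these made-up numbers the naive figure is dozens of times larger than the liquidity-capped one, which is the same kind of gap we saw between the back-test and production.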
===============================
Example II: (not taking special terms into account)
I helped design a covered-call scanner for a trading firm. It worked perfectly: we back-tested the model and it seemed to meet their expectations. Then it went into production and we took some bad hits on a few trades, because we had forgotten to eliminate spin-offs, splits, and certain deal terms. We corrected the (stupid) error - one that we should NOT have missed. However, when we revisited the back-tested results, we found the model had made some incorrect assumptions about static yield because of special deal terms. That changed the results so considerably that covered-call scanners quickly went into the circular file, and I stopped working on them in the late 90s.
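The kind of filter we had left out can be sketched roughly as follows. This is not the original scanner: the `Candidate` fields and the `min_yield` threshold are hypothetical, and static yield is taken here to mean the premium divided by the net cost of the position, assuming an out-of-the-money call and ignoring dividends and commissions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    symbol: str
    stock_price: float
    call_premium: float
    multiplier: int = 100        # shares per contract; 100 assumed standard
    adjusted: bool = False       # contract adjusted for a split or spin-off
    pending_deal: bool = False   # underlying subject to a merger or tender


def is_standard(c):
    """Reject anything whose terms differ from a plain 100-share contract."""
    return c.multiplier == 100 and not c.adjusted and not c.pending_deal


def static_yield(c):
    """Return if the stock is unchanged at expiry (OTM call, no dividends)."""
    return c.call_premium / (c.stock_price - c.call_premium)


def scan(candidates, min_yield=0.02):
    """Symbols that pass the special-terms filter and the yield threshold."""
    return [c.symbol for c in candidates
            if is_standard(c) and static_yield(c) >= min_yield]


# Made-up tickers and prices, for illustration only.
candidates = [
    Candidate("AAA", 50.0, 2.0),
    Candidate("BBB", 30.0, 3.0, adjusted=True),      # rich premium, odd terms
    Candidate("CCC", 40.0, 0.5),                     # standard but low yield
    Candidate("DDD", 20.0, 1.0, pending_deal=True),
]
print(scan(candidates))
```

Note that the highest apparent yield in this sample belongs to an adjusted contract. Without the `is_standard` check, the special situations are exactly what such a scanner ranks first.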
===================================
Example III: (It works until it doesn't.)
I helped work on a very large pairs-trading system based on spread value. There was actually a pure arbitrage on paper. Unfortunately, the spread continued to widen because the terms of the "b" shares had a provision that allowed for no conversion. Again - this was a fundamental flaw - not in the data - but in the legal aspect of the relationship.
-------------------------------------------------------------------------
We try to back-test in the blind.
Start with a hypothesis and then test for it.
However, we also forward test everything - to account for execution risk, etc.
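As a rough illustration of what a forward test can measure, the sketch below compares the price the back-test assumed with the price actually obtained in forward testing. The function name, the tuple layout, and the sample trades are hypothetical and do not describe our actual tooling.

```python
def slippage_report(trades):
    """Summarize execution slippage per unit traded.

    trades: list of (assumed_price, fill_price, side) tuples, where side is
    +1 for a buy and -1 for a sell. Positive slippage means the real fill
    was worse than the back-test assumed.
    """
    costs = [side * (fill - assumed) for assumed, fill, side in trades]
    return {
        "mean_slippage": sum(costs) / len(costs),
        "worst_slippage": max(costs),
    }


# Made-up forward-test fills: one buy filled high, one sell filled low,
# one buy filled exactly at the assumed price.
report = slippage_report([
    (10.00, 10.05, +1),
    (20.00, 19.90, -1),
    (5.00, 5.00, +1),
])
print(report)
```

If the mean slippage eats most of the edge the back-test showed, the strategy fails on execution even though the back-test itself was sound.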
I have worked on everything from complex OTC trades to arbitrage, convertible trades, etc. I have made every kind of mistake imaginable and I expect to make more in the future. I do learn from them.
We have some successful trading systems, but for each one we have 10 failures. We keep those failures to learn from them, see WHY they fail, and how to plan for the next test.
Some strategies only work for a particular deal, time frame, or condition - but are EXPECTED to fail over time - because those variables that create the situation are not expected to last.