Is Walk-Forward (out of sample) testing simply an illusion?

tommcginnis · Oct 23, 2017

userque said:
You guys are failing to consider the x-axis:

You guys are trying to apply the model as though the domains are equal for all three segments.

I agree with @pursuit

Hopefully, I don't have to expound.

Not only has no one written that the domain between sub-segments must be the same, but I have repeatedly referenced its being variable as a relevant factor.

That said, great exhibits (labeling aside).

Simples · Oct 23, 2017

Simple models, and simple changes to them, can yield vastly different results. A question that may help: how many reasons do you have for believing your solutions are not overfit? It doesn't matter what the reasons are, but how you establish them matters greatly. These reasons may even be superior to out-of-sample and forward testing, because if they're right, the strategy should work regardless of these tests, though the tests could still act as a tool for model validation.

Complex models, on the other hand, may be overfit already, simply because of how they became so complex in the first place (in order to fit the data, perhaps?). They're often characterized by a lack of robustness and fickle dependencies (e.g. bad data quality).

It's a mindbender and topic of exploration that may take lifetimes.

userque · Oct 23, 2017

tommcginnis said:
(labeling aside).

Lol...yeah...coulda did a much better job with just a little bit more effort.

Macca1 · Oct 23, 2017

pursuit said:
So if you're not here to discuss and explain the reasoning for your opinions, I'm just curious what are you here for?

I'm here for trading related entertainment.

pursuit said:
If we test a bunch of strategies on data segment 1 and then data segment 2 and then keep the ones that do well on both..........

Isn't that the same as testing them on segment 3 which is a combination of 1 and 2 and keeping "the good ones"?

We'll arrive to the same choice of strats in both cases, no?

-Segment 1 (70% of data) turns out to be based on a strong bull market,
-Segment 2 (30% of data) turns out to be based on a rapid decline.
-Segment 3 (100% of data)

*we have a long only strategy
*we are blinded and have no idea what the data in segment 2 looks like

A) If we tested strategies based only on segment 1, then the equity curves could significantly under-perform on segment 2, making the strategies no longer viable. If some still performed as expected (even after a regime change), then we know what to investigate further.

B) If we were unblinded and tested strategies across all data, 1+2 (Segment 3), our strategy design could have already compensated for the decline seen in segment 2 (in fact, we might have decided that a long-only strategy was no longer a viable option). Either way, we have opened ourselves up to curve fitting, or at least increased its likelihood.

When Segment 1 has vastly different characteristics from Segment 2, the strategies we arrive at in (B) are going to be different from the strategies we arrived at in (A). Even though the strategies that performed well in (A) will still perform the same in (B), they could easily get overlooked in favor of better-performing strategies derived only from (B). Therefore we will not arrive at the same choice of strategies in both cases.
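
To make the (A) versus (B) comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four strategy names, the per-segment returns (invented numbers, in percent), the `top_k` cutoff, and the shortcut of adding segment returns instead of compounding them. It only shows that the two procedures can end up with different shortlists, not how often they do.

```python
# Hypothetical per-segment returns in percent. The numbers are invented for illustration.
returns = {
    "long_only":    {"seg1": 40.0, "seg2": -25.0},
    "trend_follow": {"seg1": 20.0, "seg2": 10.0},
    "short_bias":   {"seg1": -5.0, "seg2": 45.0},
    "mean_revert":  {"seg1": 8.0,  "seg2": 6.0},
}

def select_blinded(returns, top_k=2):
    """(A): rank on segment 1 only, then keep the picks that stay profitable on segment 2."""
    ranked = sorted(returns, key=lambda s: returns[s]["seg1"], reverse=True)[:top_k]
    return [s for s in ranked if returns[s]["seg2"] > 0]

def select_unblinded(returns, top_k=2):
    """(B): rank on the whole history (segment 3), approximated as seg1 + seg2."""
    def total(s):
        return returns[s]["seg1"] + returns[s]["seg2"]
    return sorted(returns, key=total, reverse=True)[:top_k]

choice_a = select_blinded(returns)
choice_b = select_unblinded(returns)
print("A:", choice_a)  # A: ['trend_follow']
print("B:", choice_b)  # B: ['short_bias', 'trend_follow']
```

In this toy case the short-biased strategy never makes the segment 1 shortlist in (A), yet it tops the ranking in (B) because the decline in segment 2 dominates its total.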

userque · Oct 24, 2017

Macca1 said:
When Segment 1 has vastly different characteristics from Segment 2, the strategies we arrive at in (B) are going to be different from the strategies we arrived at in (A).

You are missing that the x-axis values are different in segment 1 vs segment 2.
Hypothetically, the best strategy for segment 1 can also be the best strategy for segment 2.

sle · Oct 24, 2017

userque said:
"Hey! I should *build* my models on all possible data, and not leave some data out for testing/validating!"

As a matter of fact, if
- a strategy is built on a good prior hypothesis
- the effect has good statistical significance
- and the number of free parameters is low (preferably none)
it's a perfectly OK thing to do. In fact, you would be better served building a collection of simple strategies this way vs going in circles optimizing something complex.
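
One rough way to put a number on the "good statistical significance" criterion is a t-statistic of the mean return against zero. The sketch below is only that: the helper name `t_stat` and the sample returns are hypothetical, and it assumes the returns are independent and identically distributed, which market returns often are not.

```python
from math import sqrt
from statistics import mean, stdev

def t_stat(returns):
    """t-statistic of the mean return against a null hypothesis of zero mean."""
    n = len(returns)
    # Sample mean divided by its standard error (sample stdev / sqrt(n)).
    return mean(returns) / (stdev(returns) / sqrt(n))

# Hypothetical per-trade returns, invented purely for illustration.
sample = [0.012, -0.004, 0.009, 0.003, -0.007, 0.011, 0.006, -0.002, 0.008, 0.004]
print(round(t_stat(sample), 2))
```

A strategy with no free parameters only has to clear this kind of bar once; every extra parameter tried multiplies the number of chances to clear it by luck.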

userque · Oct 24, 2017

sle said:
As a matter of fact, if
- a strategy is built on a good prior hypothesis
- the effect has good statistical significance
- and the number of free parameters is low (preferably none)
it's a perfectly OK thing to do. In fact, you would be better served building a collection of simple strategies this way vs going in circles optimizing something complex.

Oh, of course. I agree. I was merely pointing out that that's not the conclusion that can be drawn from this particular hypothetical.

Macca1 · Oct 24, 2017

userque said:
You are missing that the x-axis values are different in segment 1 vs segment 2.
Hypothetically, the best strategy for segment 1 can also be the best strategy for segment 2.

What are you talking about? Hypothetically, sure, the best strategy for segment 1 can also be the best strategy for segment 2. However, it can just as easily not be the best strategy.

userque · Oct 24, 2017

Macca1 said:
What are you talking about? Hypothetically, sure, the best strategy for segment 1 can also be the best strategy for segment 2. However, it can just as easily not be the best strategy.

I wanted to expound, but had to stop my analysis of your post. (See below).

Macca1 said:
I'm here for trading related entertainment.

I know right.

Macca1 said:
-Segment 1 (70% of data) turns out to be based on a strong bull market,
-Segment 2 (30% of data) turns out to be based on a rapid decline.
-Segment 3 (100% of data)

Ok.

Macca1 said:
*we have a long only strategy
*we are blinded and have no idea what the data in segment 2 looks like

This is not what the OP says. The OP says that we pick one of the available strategies that does well in segment 2 as well as in segment 1. So I guess I must stop here, since your hypothetical requires something different.

Macca1 said:
A) If we tested strategies based only on segment 1, then the equity curves could significantly under-perform on segment 2, making the strategies no longer viable. If some still performed as expected (even after a regime change), then we know what to investigate further.

B) If we were unblinded and tested strategies across all data, 1+2 (Segment 3), our strategy design could have already compensated for the decline seen in segment 2 (in fact, we might have decided that a long-only strategy was no longer a viable option). Either way, we have opened ourselves up to curve fitting, or at least increased its likelihood.

When Segment 1 has vastly different characteristics from Segment 2, the strategies we arrive at in (B) are going to be different from the strategies we arrived at in (A). Even though the strategies that performed well in (A) will still perform the same in (B), they could easily get overlooked in favor of better-performing strategies derived only from (B). Therefore we will not arrive at the same choice of strategies in both cases.

pursuit · Oct 28, 2017

Optimizing on seg1 and then picking only strats that look pretty on seg2 will result in a similar selection of strats as optimizing on the whole seg3. If we are testing a non-optimized strat - same thing. We end up with a similar selection regardless of whether we explicitly optimize some parameters or not. By selecting only pretty equity curves we are "optimizing".

It's really not that hard to understand (or I guess it is for some people, judging from some of the replies on the thread). The out-of-sample thing is a fallacy and great for marketing, especially to retail traders.

It proves nothing and does nothing to increase the likelihood of success live. Other tests of robustness must be implemented.
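
A small Monte Carlo sketch of the selection effect described above, in Python. All of it is assumed for illustration: the strategies are pure noise (zero-mean daily returns with no edge), the segment lengths, volatility and seed are arbitrary, and "looks pretty" is reduced to "made money". It covers only the no-edge case, so it cannot say how the two procedures compare when some strategies do have a real edge.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

N_STRATEGIES = 2000
SEG1_DAYS, SEG2_DAYS, LIVE_DAYS = 175, 75, 75  # 70/30 split plus unseen "live" data

def segment_return(days):
    """Sum of zero-mean daily returns: a strategy with no edge at all."""
    return sum(rng.gauss(0.0, 0.01) for _ in range(days))

# Each strategy is a tuple of (seg1 return, seg2 return, live return).
strategies = [
    (segment_return(SEG1_DAYS), segment_return(SEG2_DAYS), segment_return(LIVE_DAYS))
    for _ in range(N_STRATEGIES)
]

# Keep strategies that made money on seg1 AND on seg2 (the out-of-sample filter).
two_stage = {i for i, (s1, s2, _) in enumerate(strategies) if s1 > 0 and s2 > 0}
# Keep strategies that made money on the combined history (seg3).
whole_sample = {i for i, (s1, s2, _) in enumerate(strategies) if s1 + s2 > 0}

# Average return of each shortlist on data neither filter has seen.
live_two_stage = mean(strategies[i][2] for i in two_stage)
live_whole = mean(strategies[i][2] for i in whole_sample)

print("survivors:", len(two_stage), "two-stage,", len(whole_sample), "whole-sample")
print("mean live return:", live_two_stage, "vs", live_whole)
```

Every two-stage survivor is also a whole-sample survivor by construction, since two positive numbers sum to a positive number, and for noise strategies both shortlists should average roughly zero on the unseen data.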