Out of sample testing

Specterx · Oct 24, 2018

Dhalsim said:
I use random out-of-sample periods, e.g. when testing from 1997 to 2018, a random 30% of the quarters are held out as out of sample.

Also, is it not wise to have as much test sample data as possible? This gives the systems more data to train on and should improve predictive power.

Do you ever have a market inefficiency you want to exploit, and then test the basic entry principle behind this potential inefficiency? If this passes the in-sample test across multiple correlated markets, do you then immediately test it on the out-of-sample data? This way you don't waste weeks or months trying to build a system around a rule which had no edge to begin with. Is this okay to do, or should one finish all the entry criteria (which can take weeks) and then test the finished entry model on the out-of-sample data?
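As an aside, the random-quarter holdout described in the quote above could be sketched roughly as follows. This is only an illustration: the function name `split_quarters`, the fixed seed, and the rounding of 30% of the quarters to a whole number are assumptions, not details from the quoted post.

```python
import random

def split_quarters(start_year, end_year, oos_fraction=0.30, seed=42):
    """Randomly assign calendar quarters to in-sample or out-of-sample."""
    # Every (year, quarter) pair in the test window, e.g. (1997, 1) ... (2018, 4)
    quarters = [(y, q) for y in range(start_year, end_year + 1) for q in range(1, 5)]
    rng = random.Random(seed)  # assumed fixed seed so the split is reproducible
    n_oos = round(len(quarters) * oos_fraction)
    oos = set(rng.sample(quarters, n_oos))
    in_sample = [q for q in quarters if q not in oos]
    return in_sample, sorted(oos)

in_sample, out_of_sample = split_quarters(1997, 2018)
print(len(in_sample), len(out_of_sample))  # 62 26 (88 quarters in total)
```

The split is drawn once, before any results are seen, so the held-out quarters are not chosen with knowledge of how a system performs on them.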

Well, there's not necessarily a clear line between "basic entry principle" and the final entry criteria. You start with the fact that the market can be a buy or a sell at any instant, and then apply filters layer by layer to isolate the +EV periods/events. Every such filter should raise expectancy and reduce the number of trades.

The point of out-of-sample testing is to avoid mining coincident patterns in the data that happen to hold over the training and test periods. Each filter you apply increases the risk of this somewhat, and a variety of additional factors can increase it further:

- Use of "magic numbers", especially where small changes to the number result in large changes to system performance, number of signals, etc.
- Use of filters with no clear connection to the inefficiency/tendency being exploited, or a clear reason why they "should" work
- Excessive layers of filters
- Filters which reduce the number of trades excessively
- Identifying filters or filter parameters by testing large sets of them and discarding those which don't work
- Failing to test in a variety of market conditions, volatility levels, etc.

The more of these factors which apply to your system, the more important it is to validate on out-of-sample periods.
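To make the "magic numbers" point concrete, here is a minimal sketch of a sensitivity check: evaluate a backtest metric at a parameter value and at its neighbours, and see how much the metric moves. The helper `sensitivity` and the two toy metrics are hypothetical stand-ins for a real backtest, chosen only to show the contrast between a broad plateau and a single lucky value.

```python
def sensitivity(metric, center, step, n_steps=2):
    """Worst relative change in the metric when the parameter is nudged."""
    base = metric(center)
    neighbours = [center + k * step for k in range(-n_steps, n_steps + 1) if k != 0]
    worst = max(abs(metric(p) - base) for p in neighbours)
    return worst / abs(base) if base != 0 else float("inf")

# Hypothetical metrics as a function of one parameter (e.g. a lookback length)
def smooth_metric(lookback):
    return 1.0 - 0.001 * (lookback - 20) ** 2  # broad plateau around 20

def spiky_metric(lookback):
    return 1.0 if lookback == 20 else 0.1      # only works at exactly 20

print(sensitivity(smooth_metric, 20, 1))  # small: performance barely moves
print(sensitivity(spiky_metric, 20, 1))   # large: a likely "magic number"
```

A large value is not proof of curve fitting, but it is one of the warning signs listed above and a reason to lean harder on out-of-sample validation.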

bashatrader · Oct 30, 2018

I still remember the teacher in my data mining class screaming about repeated out-of-sample validations not making any statistical sense.

Q.E.D. · Dec 15, 2018

bashatrader said:
I still remember the teacher in my data mining class screaming about repeated out-of-sample validations not making any statistical sense.

My two cents, after 50 years of trading, including a number of them backtesting and managing as a CTA: while testing trading ideas is crucial, out-of-sample testing is faulty logic.

Briefly, my reasoning: if one created one system, backtested it with favorable results, then tested it on OOS data, and then abandoned the idea if that failed, then OOS would make sense.

However, nobody does that. If a system fails in OOS, virtually everybody reworks the system, tests, and then goes to OOS again. Basically, the OOS just becomes another part of the backtesting data, with no reason to test it separately.

After millions of tests, and trading using 50+ systems, it is always clear that your largest drawdown lies ahead.

Dhalsim · Dec 17, 2018

Q.E.D. said:
My two cents, after 50 years of trading, including a number of them backtesting and managing as a CTA: while testing trading ideas is crucial, out-of-sample testing is faulty logic.

Briefly, my reasoning: if one created one system, backtested it with favorable results, then tested it on OOS data, and then abandoned the idea if that failed, then OOS would make sense.

However, nobody does that. If a system fails in OOS, virtually everybody reworks the system, tests, and then goes to OOS again. Basically, the OOS just becomes another part of the backtesting data, with no reason to test it separately.

After millions of tests, and trading using 50+ systems, it is always clear that your largest drawdown lies ahead.

I understand that, but my question is still: what is the correct manner to use OOS?

E.g. I have a basic system designed entirely around a non optimised basic entry principle. So we have 1000s of trades, as there are very few rules to this system. Now let's add and test entry filters A, B and C. All are unique non optimised entry filters.

- Say I test entry filter A on the IS data while still trying to get a large sample of trades. If entry filter A works well on the IS data, can I then test it on the OOS data to see if there is an edge? Let's assume A fails on the OOS, so we disregard this rule.

- We now test filter B and it works well on the IS data. Can we now use the same OOS data to test this rule?

Is it even correct methodology to test the entry filters in this manner on the OOS data? Keep in mind that if the OOS fails then we disregard this rule entirely. Moreover, all rules created are completely unique and non optimised and not created by forcing a system to learn on past data. Essentially each rule is created with tacit knowledge of what might work based on past market experience.

In this example at most we might test 10 unique entry filters, non optimised and based on tacit trading knowledge. Can we use the OOS data to test the results of each rule without affecting the legitimacy of the OOS data? At no point do we adjust a rule to force it to work on OOS data. If the filter fails it is completely disregarded and will never be tweaked.

If we can use the OOS data this way then we will get a complete system much quicker and waste far less time in the process. I understand that technically we can test 10,000 different rules and 1 will fit the OOS data; however, please keep in mind that the rules I am discussing do not have optimising parameters and are non-indicator based. Also, they are based on actual live market experience and each one is unique. At most 10 to 15 rules will be tested.
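For a rough sense of scale on the 10 to 15 rules, one can work out the chance that at least one rule with no real edge passes the OOS check purely by luck. The sketch below rests on assumptions that are not established anywhere in this thread: that each OOS check behaves like a significance test with a 5% false-positive rate, and that the checks are independent (filters layered on the same base system probably are not). The Bonferroni adjustment shown is a standard, conservative correction, named here only as one option.

```python
def familywise_false_positive(n_rules, alpha=0.05):
    """Chance that at least one no-edge rule passes, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_rules

def bonferroni_alpha(n_rules, alpha=0.05):
    """Stricter per-rule threshold that keeps the overall rate at or below alpha."""
    return alpha / n_rules

for n in (1, 10, 15):
    print(n, round(familywise_false_positive(n), 3), round(bonferroni_alpha(n), 4))
# 1 0.05 0.05
# 10 0.401 0.005
# 15 0.537 0.0033
```

Under these assumptions, reusing one OOS set for 10 to 15 rules is very different from testing 10,000, but the chance of a lucky pass is still far above 5% unless the pass threshold is tightened.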

Any thoughts?

userque · Dec 17, 2018

Dhalsim said:
not created by forcing a system to learn on past data

Interesting post.

In my non-expert opinion:

Assuming, as you state, the entry filters are not trained or optimized with any of the data, then you should be able to use all of the data, IS and OOS, to determine whether an edge exists.

Separating data into chunks for training, testing, and validation is used when the data itself is used to create or optimize the trading rules. Since, in your example, your rules are not derived or modified by the data, there should be no need for such separations.
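One simple way to check for an edge over the full data set is a one-sample t-statistic on per-trade returns. This is a generic technique offered as a sketch, not something specified in this thread; the function name `t_statistic` and the sample returns are made up for illustration.

```python
import math
import statistics

def t_statistic(trade_returns):
    """t-statistic of the mean trade return against a true mean of zero."""
    n = len(trade_returns)
    mean = statistics.fmean(trade_returns)
    sd = statistics.stdev(trade_returns)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))

# Made-up per-trade returns, purely illustrative
returns = [1.0, -1.0, 2.0, 0.0, 1.0, -1.0, 2.0, 0.0]
print(round(t_statistic(returns), 3))  # 1.183
```

A larger sample of trades tightens the estimate, which is the practical benefit of not setting data aside when the rules were fixed in advance.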

Q.E.D. · Dec 18, 2018

Dhalsim said:
I understand that but my question is still what is the correct manner to use OOS.

E.g. I have a basic system designed entirely around a non optimised basic entry principle. So we have 1000s of trades, as there are very few rules to this system. Now let's add and test entry filters A, B and C. All are unique non optimised entry filters.

- Say I test entry filter A on the IS data while still trying to get a large sample of trades. If entry filter A works well on the IS data, can I then test it on the OOS data to see if there is an edge? Let's assume A fails on the OOS, so we disregard this rule.

- We now test filter B and it works well on the IS data. Can we now use the same OOS data to test this rule?

Is it even correct methodology to test the entry filters in this manner on the OOS data? Keep in mind that if the OOS fails then we disregard this rule entirely. Moreover, all rules created are completely unique and non optimised and not created by forcing a system to learn on past data. Essentially each rule is created with tacit knowledge of what might work based on past market experience.

In this example at most we might test 10 unique entry filters, non optimised and based on tacit trading knowledge. Can we use the OOS data to test the results of each rule without affecting the legitimacy of the OOS data? At no point do we adjust a rule to force it to work on OOS data. If the filter fails it is completely disregarded and will never be tweaked.

If we can use the OOS data this way then we will get a complete system much quicker and waste far less time in the process. I understand that technically we can test 10,000 different rules and 1 will fit the OOS data; however, please keep in mind that the rules I am discussing do not have optimising parameters and are non-indicator based. Also, they are based on actual live market experience and each one is unique. At most 10 to 15 rules will be tested.

Any thoughts?

Just because you do not optimize your parameters does not mean the system is not fitted to the data. Take a simple case: assume there are 100 clones of you, each coming up with the same system, but each using a slightly different variable/condition. None of the 100 of you has optimized, but viewed at the macro level, there are 100 optimizations.
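The clone argument can be illustrated with a small simulation. In this sketch every clone's variant has zero true edge (trade returns are drawn from a standard normal distribution, which is an assumption made purely for illustration), yet the best of the 100 still looks profitable. The function name, trade count, and seed are all hypothetical.

```python
import random
import statistics

def best_of_clones(n_clones=100, n_trades=200, seed=1):
    """Mean trade return of the luckiest clone vs. the average across clones."""
    rng = random.Random(seed)  # assumed fixed seed for reproducibility
    means = []
    for _ in range(n_clones):
        # Zero true edge: each trade return is N(0, 1) noise
        trades = [rng.gauss(0.0, 1.0) for _ in range(n_trades)]
        means.append(statistics.fmean(trades))
    return max(means), statistics.fmean(means)

best, average = best_of_clones()
print(f"best clone mean: {best:.3f}, average clone mean: {average:.3f}")
```

No single clone optimized anything, but picking the best result after the fact is a selection step across all 100 variants.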

I suggest the hatred of optimization is not well founded, & all the large algos do millions of optimizations, IMO.

userque · Dec 18, 2018

Q.E.D. said:
...Take a simple case: assume there are 100 clones of you, each coming up with the same system, but each using a slightly different variable/condition. None of the 100 of you has optimized, but viewed at the macro level, there are 100 optimizations.

The OP clearly stated [emphasis added]:

Dhalsim said:
...All are unique non optimised entry filters.

...all rules created are completely unique and non optimised and not created by forcing a system to learn on past data. Essentially each rule is created with tacit knowledge...

In this example at most we might test 10 unique entry filters, non optimised and based on tacit trading knowledge.

So, you either didn't understand the OP. Or you believe the OP doesn't understand what they wrote...and you instead, know what the OP meant to write. Or you believe the OP is lying.

Dhalsim · Dec 18, 2018

userque said:
The OP clearly stated [emphasis added]:

So, you either didn't understand the OP. Or you believe the OP doesn't understand what they wrote...and you instead, know what the OP meant to write. Or you believe the OP is lying.

Yes, this is correct.

@userque your previous post is spot on and I certainly believe if the data is not being trained upon then you do not require OOS and IS periods.

However, if we were to change the example slightly and now assume we train filters A, B and C on the data - is it okay to check the OOS for each filter? Still, keep in mind each filter is completely unique and very different to one another.

@Q.E.D - I have nothing against optimisations. I am just looking at the correct methodology in all scenarios for using OOS data.

userque · Dec 18, 2018

Dhalsim said:
However, if we were to change the example slightly and now assume we train filters A, B and C on the data - is it okay to check the OOS for each filter? Still, keep in mind each filter is completely unique and very different to one another.

I think I follow you:

You have unique filters (A, B, and C).
You train each on the same IS?
Then you want to know if it is proper to compare and test the performance of each filter on the same OOS?

In this simple example, my opinion is yes--this is proper.

Q.E.D. · Dec 19, 2018

userque said:
The OP clearly stated [emphasis added]:

So, you either didn't understand the OP. Or you believe the OP doesn't understand what they wrote...and you instead, know what the OP meant to write. Or you believe the OP is lying.

There is no such thing in this universe as "unique non optimised entry filters." But we are entering the field of epistemology.