Is this over-fitting?

My basket of ETFs is very diverse. Basically it is all the big leveraged ETFs. Different countries, sectors, commodities, inverse...
 
Not necessarily.
I backtested my system during development without doing any optimization. I looked for situations where the system got into trouble and tried to find a general improvement in its basic logic, rather than testing variable parameters and optimizing the system for that specific set of data.
A system should be built on logic. If something does not work, it means the logic is not good, and you don't fix that by optimizing parameters. You improve your logic.
To me, a good system should perform well in any market without changing any parameter.

Within certain limits parameters in a good system can be changed without influencing the performance. That shows that the system is well balanced, and does not get affected by "noise".
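One way to check this kind of parameter stability is to rerun the same backtest with the parameters nudged up and down and look at how far the results spread. The sketch below is only an illustration: the moving-average crossover rule, the function names and the synthetic random-walk prices are all assumptions made for the example, not the system described above.

```python
import random

def sma_crossover_return(prices, fast, slow):
    """Total return of a long-only moving-average crossover (hypothetical toy rule)."""
    equity = 1.0
    for t in range(slow, len(prices) - 1):
        fast_ma = sum(prices[t - fast + 1:t + 1]) / fast
        slow_ma = sum(prices[t - slow + 1:t + 1]) / slow
        if fast_ma > slow_ma:
            # Hold a long position over the next bar
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0

def sensitivity(prices, fast, slow, step=2):
    """Backtest the chosen parameters and their immediate neighbours."""
    results = {}
    for f in (fast - step, fast, fast + step):
        for s in (slow - step, slow, slow + step):
            if 1 <= f < s:
                results[(f, s)] = sma_crossover_return(prices, f, s)
    return results

# Synthetic random-walk prices, for demonstration only
rng = random.Random(42)
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] * (1 + rng.gauss(0.0003, 0.01)))

grid = sensitivity(prices, fast=10, slow=50)
spread = max(grid.values()) - min(grid.values())
print(f"return at (10, 50): {grid[(10, 50)]:.2%}, spread across neighbours: {spread:.2%}")
```

A small spread relative to the central result suggests the system is not sitting on a lucky parameter value. A large spread suggests the result depends on noise.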

The basic logic of my system has not changed in the last 25 years and it still works. There is some kind of logic in market behavior; markets are not random. The question is how to find what that logic is.
I tried to find a logical solution, not a mathematical one. And when the logic is OK, the math in general is OK too, as math is about logic.

I agree completely. You start with an idea that makes sense and then look at the market's past to see whether it would have made money. But 99% of the people who do backtesting just look at the data and then try to find a system that would work on that data. That is already over-fitting...
 
Not necessarily true, data mining is an industry.

The biggest problems arise when this is mixed with hope and cognitive bias.
 
That is only 10 percent of the problem. The biggest problem is that people think the data they are using represents the market, as if a daily chart (or any time frame) with an open, high, low, close and bid/ask volume represents the market. That is mistake number one in the whole backtesting industry. The second mistake is using that data as the basis for all sorts of calculations. All that data leaves out the biggest reason why price moves. And if they do find a system that works with the data they have, it works not because of the data or the analysis but because of the position sizing they are using.
 
Datamining misses the most important factor: creativity, thinking out of the box.

In fact, datamining does not think at all. It can make huge amounts of calculations, but that's all it can do. The information that humans give it defines the rate of success of datamining. The computer will never have its own new ideas or think out of the box.

If you ask a computer "why this result?", it will "tell" you that all it can do is math. So it is a mathematical result. But markets are not mathematical. That's why price can be overbought or oversold.

Behavioral finance is very important in trading, and that's something you don't use in datamining. Datamining is an exact science; markets are not.
 
I understand how you think, but behavioral and social science actually rely on math. That's why you see so many people with this background in data science. Math is not necessarily 'linear'.

Data mining can also spark ideas; at least it did for me. So it is not as black and white as you would think. Perhaps using data mining to create ideas is, in a sense, also thinking out of the box ;)
 
Let's say I have a set basket of 50 ETFs that I like to backtest and trade with. I run a backtest, get good results and decide to trade a strategy, but I notice that about 10 of the 50 ETFs perform poorly with the chosen parameters.

1. Would it be considered over-fitting to exclude those 10 ETFs from live trading?

2. Doubling down here: if I did exclude the 10 poor performers, how bad would it be to re-optimize on the remaining 40 ETFs?
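To get a feel for what excluding the worst performers does to backtest statistics, here is a small simulation sketch. It assumes 50 hypothetical assets whose returns are pure noise with zero true edge; the function name, the return distribution and all numbers are made up for illustration and have nothing to do with real ETFs.

```python
import random
import statistics

def simulate_selection_bias(n_assets=50, n_drop=10, n_days=250, seed=1):
    """Drop the worst in-sample performers and compare averages before and after."""
    rng = random.Random(seed)
    # Every asset has zero true edge: in-sample and out-of-sample returns are independent noise
    in_sample = [[rng.gauss(0, 0.01) for _ in range(n_days)] for _ in range(n_assets)]
    out_sample = [[rng.gauss(0, 0.01) for _ in range(n_days)] for _ in range(n_assets)]
    in_means = [statistics.mean(r) for r in in_sample]
    out_means = [statistics.mean(r) for r in out_sample]
    # Rank by in-sample result and keep everything except the worst n_drop
    ranked = sorted(range(n_assets), key=lambda i: in_means[i])
    kept = ranked[n_drop:]
    return {
        "in_all": statistics.mean(in_means),
        "in_kept": statistics.mean([in_means[i] for i in kept]),
        "out_kept": statistics.mean([out_means[i] for i in kept]),
    }

result = simulate_selection_bias()
for name, value in result.items():
    print(f"{name}: {value:+.5f}")
```

By construction, the in-sample average of the kept assets is higher than the average of the full basket, even though no asset has any edge. The out-of-sample figure for the same kept assets is the number that says whether the exclusion captured anything real.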


When you optimize, set the end date of your data to one year ago, then check how your system performed over the past year. The past year becomes real out-of-sample data. Also, when you optimize, have your system run both long and short, and optimize over a time period in which the market has had some major bull and bear runs. Otherwise, you are just curve fitting to a particular type of market. If you optimize for long in a bull market, the system is not likely to work well, or at all, in a bear market, and vice versa. If you can backtest multiple symbols, test the 3x ETF pairs, like TQQQ and SQQQ, both long and short.
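The one-year holdout described above can be sketched as a simple date split. The helper name, the `(date, close)` bar layout and the synthetic price series are assumptions for the example; any real backtesting platform will have its own way of setting the optimization end date.

```python
from datetime import date, timedelta

def split_in_out_of_sample(bars, holdout_days=365):
    """Split date-sorted (date, close) bars into optimization data and a final holdout."""
    cutoff = bars[-1][0] - timedelta(days=holdout_days)
    in_sample = [b for b in bars if b[0] <= cutoff]      # optimize only on this part
    out_of_sample = [b for b in bars if b[0] > cutoff]   # touch only once, for the final check
    return in_sample, out_of_sample

# Synthetic daily bars, for demonstration only
start = date(2015, 1, 1)
bars = [(start + timedelta(days=i), 100.0 + i * 0.01) for i in range(3000)]

in_sample, out_of_sample = split_in_out_of_sample(bars)
print(len(in_sample), "bars for optimization,", len(out_of_sample), "bars held out")
```

The holdout only stays out of sample if it is looked at once. Going back to re-optimize after seeing the holdout result turns it into in-sample data.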
 
To summarize, you are looking for a representative sample.

Suppose you have a method using options and want to replicate a bear market like 2000/2001 (no option data). The closest thing, I think, is to take some index declines and 'lengthen' them, combined with randomly shuffling the days. Of course this is only an approximation, but since there is no data, there are no other possibilities I am aware of. Then again, a 'backtest' is in a sense always an approximation. What do you think about this approach?
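A rough sketch of the 'lengthen and shuffle' idea, assuming it is done by sampling daily returns from a decline with replacement. The function name and the return values are made up for illustration and are not real index data.

```python
import random

def lengthen_decline(decline_returns, target_days, seed=0):
    """Build a longer synthetic decline by resampling daily returns with replacement."""
    rng = random.Random(seed)
    # Sampling with replacement both lengthens the series and shuffles the day order
    sampled = [rng.choice(decline_returns) for _ in range(target_days)]
    path = [1.0]
    for r in sampled:
        path.append(path[-1] * (1 + r))
    return sampled, path

# Made-up daily returns standing in for an observed index decline
decline = [-0.021, 0.004, -0.013, -0.030, 0.011, -0.008, -0.017, 0.006]

sampled, path = lengthen_decline(decline, target_days=500)
print(f"synthetic decline over {len(sampled)} days ends at {path[-1]:.4f} of the start value")
```

One caveat with this kind of resampling: shuffling days independently removes any volatility clustering and day-to-day dependence present in the original decline, which matters for an options method. Sampling blocks of consecutive days instead of single days is one way to keep some of that structure.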
 