A real edge hardly requires any testing

I think you are confusing "hardly requires any testing" with "don't test at all".

IMO the process should be:
1. you have a model that everything about it says it should work.
2. you test it on data
a. it works and gives reasonable results in line with expectations before the test
b. it fails, you scrap the model and start over

B is the problem. You can't do tweak/test/fail, tweak/test/fail, tweak/test/fail, tweak/test/success!

The hysterical thing to me is that this is exactly what I thought the point of backtesting was when I started.
You're right that B is a problem, but not necessarily always a problem. Reading through this thread it's apparent that there are 2 groups talking past each other a bit. Group 1 is rightly cautious of data mining, which is what you're describing. As a result, they feel that using testing to find an edge is always bad. It generally is bad using the standard model most of us use that you describe well, the tweak/test/fail, tweak/test/fail.... model.
There is, however, another group out there, including many professionals, who use an entirely different model, somewhat like a Monte Carlo simulation: they simultaneously test millions of random permutations of a strategy on a data set. They then use techniques such as out-of-sample tests and testing across different market conditions to winnow that down to a few hundred strategies. From there they either examine the survivors to determine whether there's a thesis to support them or whether they're just data mining results, or they throw money at all of them, provided they've done some statistical analysis to show that it's OK to have some spurious data-mined results in there. Nothing wrong with this technique; a few of the successful big hedge funds use it. It's just probably not something most of us would do. So when group 2 on this thread talks about backtesting for a strategy as viable, they're probably talking about something like this. And it's perfectly OK for group 2's strategy to be viable while simultaneously saying the tweak/test/fail model isn't generally viable.
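To make the data mining risk concrete, here is a minimal sketch, not anyone's actual process. It assumes a pure-noise market and "strategies" that are nothing more than random long/short positions; the names `sharpe` and `random_strategy_search` are made up for illustration. It picks the best performers in-sample and then checks the same ones out-of-sample.

```python
import random
import statistics

def sharpe(returns):
    """Per-period mean over standard deviation (not annualised)."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def random_strategy_search(n_strategies=1000, n_periods=500, keep=20, seed=0):
    rng = random.Random(seed)
    # Assumption: a pure-noise market, so no strategy has a real edge.
    market = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
    split = n_periods // 2
    scored = []
    for _ in range(n_strategies):
        # A hypothetical "strategy" is a random long/short position per period.
        positions = [rng.choice((-1, 1)) for _ in range(n_periods)]
        pnl = [p * r for p, r in zip(positions, market)]
        scored.append((sharpe(pnl[:split]), sharpe(pnl[split:])))
    # Keep the strategies that looked best on the first half of the data.
    scored.sort(key=lambda s: s[0], reverse=True)
    survivors = scored[:keep]
    in_sample = statistics.fmean(s[0] for s in survivors)
    out_sample = statistics.fmean(s[1] for s in survivors)
    return in_sample, out_sample

in_sample, out_sample = random_strategy_search()
print(f"survivors, in-sample Sharpe:     {in_sample:.3f}")
print(f"survivors, out-of-sample Sharpe: {out_sample:.3f}")
```

The survivors look good in-sample purely because they were selected for it, which is why the out-of-sample and thesis checks described above carry so much of the weight.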
 
Well I don't know how you can disagree with something you haven't read and what you said about ML is just not correct.
If we are talking about the Advances in Financial Machine Learning book, I have read it and found it mildly entertaining. Some of the specific topics were over my head, some generic stuff was full of truisms, some was "ho-hum" and most was "WTF is he talking about". It's impossible to deny that a lot of what he writes is tinted by his own approach. Anyway, as I said before, the guy is a fantastic self promoter but pretty much failed in extracting real alpha from the markets. To quote a buddy of mine, "he was running a PnL-neutral strategy".

What you are describing is actually what people are doing who do not use ML because you can't measure what is actually contributing to the prediction like you can with ML.
What exactly is the "what people do" you are referring to? If you are using a real-life market phenomenon as a base for your hypothesis (my preferred approach), you do not need ML at all and instead can use simpler things like impact models or various heuristics.

FWIW, these days I do use a fair number of ML techniques (tree models for my higher frequency alphas, for example) and have formulated my own opinion about their strengths and weaknesses.
 
I agree that imagination is very important and in trading, doing something uniquely is far more important than being technically proficient. Yet, even that quote is partly gibberish IMO.



Similarly I could say that you have no chance against the many thousands of gifted discretionary traders in various institutions. They might generate 10 times better ideas and 10 times faster.
With drive and dedication, a lot is possible.
Don't underestimate @ma_trader, I think he/she is real.
 
Even the construction of mathematical proofs requires tinkering. You don’t design a system by pure derivation. But it’s okay.

There is no proof without refutation. And time is the best challenge for a system to overcome.
I agree. Starting with derivation and concentrating more on derivation seems to be a better way than starting with testing as the main source for the edge.
Reminds me of a conversation between a physicist and an engineer.

Take options as an example.

A physicist would ask: What are the basic principles behind the pricing of options and what can cause the dislocations of that price? That dislocation is then tradable.

An engineer would ask: I notice volatility tends to revert to the mean. It is tradable but may need constant tweaking.

@ma_trader I assumed you derived your winning method from basic principles? If you have a real edge, backtesting should validate your few months' profitable trading record.
 
In the end, it's all about judgement. I've deployed plenty of strategies without any backtesting and plenty of strategies with rigorous backtesting - it would be hard to say which one is preferable without caveats.


Here is a concrete example. You're a UHF market maker and planning to deploy a new order book strategy, let's say if you see a certain pattern you penny the bid. My prior would be that it's impossible to backtest because of the feedback (i.e. by trading you modify the order book and thus change the market you're working in). So if you have a good model and have done the studies, trying to create a backtest would be a waste of time.

There are other cases when backtesting does not make sense: if you are dealing with an arbitrage strategy (because you know it works and most of the problem is in implementation), or if you are dealing with a highly asymmetric distribution of returns (e.g. backtesting many risk premium strategies is a waste of time).

Thanks for sharing those. Can't really speak to the UHF, as I don't have much experience there. Something like a pattern though, I would think would be amenable to testing. I get the part about altering the outcome by impact, but I wonder how you arrive at the analysis that the pattern yields some kind of outcome to begin with. Isn't that some kind of testing/analysis based on past behavior (even if implicitly by experience/intuition)?
 
I think you are confusing "hardly requires any testing" with "don't test at all".

IMO the process should be:
1. you have a model that everything about it says it should work.
2. you test it on data
a. it works and gives reasonable results in line with expectations before the test
b. it fails, you scrap the model and start over

B is the problem. You can't do tweak/test/fail, tweak/test/fail, tweak/test/fail, tweak/test/success!

The hysterical thing to me is that this is exactly what I thought the point of backtesting was when I started.

Since you are on the topic of ML: few in ML would approach testing the "tweak/test/fail, tweak/test/fail, tweak/test/fail, tweak/test/success!" way to begin with.
There is something called cross-validation and generalization in the field that is very well known, and is the antithesis of what you describe.

Secondly, we use ensemble/averaging methods to overcome some of that as well.
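For anyone who hasn't seen cross-validation applied to time-ordered data, here is a minimal sketch of one common variant, an expanding-window walk-forward split. The poster does not say which scheme they use, so this choice is an assumption, and `walk_forward_splits` is a made-up helper name. The point is that every test fold lies strictly after the data used for fitting.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in time order.

    Each training window starts at index 0 and ends where its test
    fold begins, so the model never sees data from the future.
    """
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)

for train, test in walk_forward_splits(n_samples=100, n_folds=4, min_train=20):
    print(f"fit on [0, {train.stop}), evaluate on [{test.start}, {test.stop})")
```

Averaging the score over the folds, rather than re-tweaking until one fold looks good, is what separates this from the tweak/test/fail loop.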

There are still a lot of limitations with known/published ML though. I reviewed DePrado's book, and I would say a lot of it is overly theoretical vs. hands-on.
As an example, he talks about failures of MPT and how HRP is a sound alternative, yet shows no real data to demonstrate it. When you actually look at real data, there is very little discrepancy in out-of-sample performance. That's a lot of work to validate. It would be great if they had a practical/empirical companion.

Can't say I understand the adulation all that much.
 
Isn't that some kind of testing/analysis based on past behavior?
Let’s distinguish a “backtest” from a “model calibrated to historical data”. A regression that takes some outcome (e.g. a level of the book being taken out), regresses it on some input (e.g. the ratio of size at the top of the book to an exponential average of all levels), and produces some values used in forecasting is a model but not a backtest. A rolling simulation of using said regression to trade in a specific way is a backtest. You see the difference?
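A rough sketch of that distinction, on synthetic data only. It assumes a single made-up signal `x` that is known at decision time and an outcome `y` realised afterwards; `fit_ols` and `rolling_backtest` are hypothetical helper names, and the simulation ignores costs and market impact entirely.

```python
import random

def fit_ols(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rolling_backtest(x, y, window):
    """Refit on the trailing window, trade the sign of the forecast."""
    pnl = []
    for t in range(window, len(x)):
        a, b = fit_ols(x[t - window:t], y[t - window:t])
        forecast = a + b * x[t]            # uses only data up to time t
        position = 1 if forecast > 0 else -1
        pnl.append(position * y[t])        # no costs, no market impact
    return pnl

# Synthetic data with a known relationship (an assumption for illustration).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(600)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_ols(x, y)           # the "model calibrated to history"
pnl = rolling_backtest(x, y, window=100)   # the "backtest"
print(f"calibrated slope: {slope:.3f}")
print(f"backtest periods: {len(pnl)}, total P&L: {sum(pnl):.2f}")
```

The first call only estimates a relationship. The second commits to trading rules, and it is this part that silently assumes your own orders would not have changed the outcomes.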
 
Let’s distinguish a “backtest” from a “model calibrated to historical data”. A regression that takes some outcome (e.g. a level of the book being taken out), regresses it on some input (e.g. the ratio of size at the top of the book to an exponential average of all levels), and produces some values used in forecasting is a model but not a backtest. A rolling simulation of using said regression to trade in a specific way is a backtest. You see the difference?

I don't really distinguish those cases the way that you do, especially with respect to this thread. When people on the forums say that backtesting is a waste of time, I think of backtesting as an integrated process, much like R&D. From that perspective, I don't see that process as a waste of time.

I don't necessarily separate this process into first building a specific model and calibrating to historical data, then consider trading around this model in a rolling simulation, as a separate "backtest" component. Both of those steps (to me) are components of backtesting. And they don't even need to be separated.

Suppose I had some hypothesis model(s) that is (are) completely unknown to me. I only have the input data, some method of quantitatively generating a set of hypotheses (maybe trillions), and my objective criteria. My models themselves could (and often do) include trading decisions and responses (even bet sizing). Maybe I consider optimizing my objective over rolling or segmented windows as superior to using all historical data (what you might call calibrating the model), so I use that criterion as an input to my model generator and optimization criterion (I could even choose how to validate as part of the optimization criterion). Fitting around an anchored historical data set might be a sub-optimal way of fitting my model. I'm interested in the fit, fits, or even ensemble of fits that trade off between best optimizing my criterion and giving me the most confidence that the hypotheses I choose over the data I have are a good representation of the behavior I expect to see on unseen data. There are many variations of this process, but again, I consider backtesting to comprise all of these. When I have new data, if it behaves very differently than I expect, I need to go back and understand why my assumptions were so wrong. But part of backtesting involves trying to properly gauge that beforehand.

I am glad to hear some of your descriptions (and especially examples), because it helps me to understand how other people might be perceiving these concepts differently.
 
Backtesting is absolutely a research tool. If it helps, remove the word back, and just think about testing. You should run tests over data to draw any conclusions about it.

I have yet to see someone describe a (legal) edge that would not benefit from some sort of quantitative analysis or testing. If anything, it could help to determine whether your qualitative edge wasn't really any kind of edge to begin with, but rather luck.

Could those proponents of no testing give some kind of concrete examples to help us see your reasoning?
It's not a no-testing-at-all sort of thing. But the concept to exploit does not come from testing; it comes from thinking and imagination, and does not require testing to qualify as an edge.

The edges I have discovered are confidential, cannot talk about them here.
 
I don't necessarily separate this process into first building a specific model and calibrating to historical data, then consider trading around this model in a rolling simulation, as a separate "backtest" component. Both of those steps (to me) are components of backtesting. And they don't even need to be separated.
I'd think there is a big difference between a historical study and a backtest. Let's take a different example: let's say someone out there is picking out volatility trades. She can look at the history of implied volatility and the history of realized volatility to find trades with positive expectation. However, neither of the two is a tradeable asset, and this study is not a backtest.
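As a small illustration of what such a historical study computes, here is a sketch with made-up numbers. The prices and the implied volatility quote are invented for the example, `realized_vol` is a hypothetical helper, and annualising with 252 periods is an assumption.

```python
import math

def realized_vol(prices, periods_per_year=252):
    """Annualised sample standard deviation of log returns."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(log_returns) / len(log_returns)
    var = sum((r - mean) ** 2 for r in log_returns) / (len(log_returns) - 1)
    return math.sqrt(var * periods_per_year)

# Made-up daily closes and a made-up implied volatility quote.
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0]
implied = 0.25

spread = implied - realized_vol(prices)
print(f"implied minus realized: {spread:.3f}")
```

Nothing here says how the spread would be traded (which options, what hedging, what costs), which is why it stays a study rather than a backtest.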
 