Tonight I'm doing a few quick calculations on the subject of risk and edge uncertainty.
Let's start with risk of ruin.
The baseline calculation is very straightforward: for a given strategy, backtest over a reasonable sample of trades, and consider this representative of the "pool" of possible trades the system will generate. We will account for uncertainty in our edge later and the implications for uncertainty in our risk of ruin, but this is the starting point.
The risk of ruin calculation from there is very simple: using a Monte Carlo approach, select trades randomly from the sample and transact from a starting bankroll (sorry for my crass gambling terminology) for a large number of trades, say, 2000. We should choose a number big enough that the results don't change much if we continue to make it larger. Since we have an edge according to the trade pool distribution, we expect that if we aren't broke after enough trades we'll be out of danger.
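Here is a minimal sketch of that resampling procedure in Python. The function name `risk_of_ruin`, its parameters, and the little `example_pool` list are all made up for illustration; the pool is not the actual backtest, just a stand-in so the code runs. Ruin here means the balance touching zero or below.

```python
import random

def risk_of_ruin(trade_pool, bankroll, n_trades=2000, n_runs=1000, seed=0):
    """Fraction of simulated runs in which the balance reaches zero or below."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_runs):
        balance = bankroll
        for _ in range(n_trades):
            # Draw one trade result at random from the pool, with replacement.
            balance += rng.choice(trade_pool)
            if balance <= 0:
                ruined += 1
                break
    return ruined / n_runs

# Made-up per-trade P&L in dollars (mean $56.25, half winners).
# This is NOT the real backtest, only a placeholder pool.
example_pool = [400, -300, 250, -200, 600, -350, 150, -100]
print(risk_of_ruin(example_pool, bankroll=5000))
```

With small ruin probabilities the estimate is noisy at 1000 runs, so in practice you would raise `n_runs` (and `n_trades`) until the answer stops moving.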
So let's start with some sample numbers. I have a mediocre system I'm playing with on ER2 that backtests with an EV of $54 per trade with 52% winners over 325 trades on two years of 5-minute data.
For a $5k account, this baseline calculation gives a risk of ruin (ROR) of about 0.2% to 0.3%.
I'm not taking into account margin calls here, just a zeroed balance (as if you could put up additional margin but just didn't want to lose more than $5k before calling it quits).
Ok, so let's go trade! We're safe as can be!
Well, not so fast. The backtest data may or may not represent the real behavior of the market, of course. Sometimes a system "gets lucky" and shows great returns over some time period that aren't sustainable. In fact, we have probably optimized our system in such a way that it's showing the "luckiest" response we can find over our sample.
The next question I'm going to ask is: how likely are we to see the mean result of $54 a trade if our trade samples are coming from a "true" distribution that is worse than our sample would lead us to believe? In other words, how likely is it that our backtest is getting lucky?
I don't want to get into the inevitable squabbles about how it is impossible to know the "true distribution" or whether such a thing even exists. Believe me, I have had that debate countless times in the poker community and I'm not going through it again here. We're going to make some assumptions for this exercise. Take it or leave it.
So we have to construct some "true" distribution of trades, given only our backtest data which we believe provides a basic outline of the distribution, but may be a "lucky" peek into the real underlying distribution.
Let's start simple. I'm going to take the sampled distribution and discount every trade by a constant amount. That just shifts the whole distribution to the left. You could get fancier, but I won't bother for now.
We're then going to start simulating trade series of the same sample size as our backtest, and we're going to see how likely it is that our $54/trade EV was the result of a lucky sample from a less profitable "true" distribution.
Clearly if we use the original distribution, we are going to get $54/trade or more about half the time in our simulated runs (exactly half only if the distribution of sample means is symmetric).
If our model distribution is worse than that, we are going to get decreasing probability of seeing the $54/trade EV.
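The shift-and-resample idea can be sketched like this. Everything named here (`prob_lucky_sample`, `discount`, the synthetic `example_pool`) is hypothetical; the pool is drawn from a normal distribution with an assumed $350 standard deviation purely so there is something to run, and it is not the real trade list.

```python
import random

def prob_lucky_sample(trade_pool, discount, n_runs=2000, seed=0):
    """Chance that a backtest-sized sample from the shifted pool
    averages at least as much as the observed backtest mean."""
    rng = random.Random(seed)
    n = len(trade_pool)
    observed_mean = sum(trade_pool) / n
    # Assumed "true" distribution: every trade is worse by a constant amount.
    shifted = [t - discount for t in trade_pool]
    lucky = 0
    for _ in range(n_runs):
        resampled = rng.choices(shifted, k=n)  # same size as the backtest
        if sum(resampled) / n >= observed_mean:
            lucky += 1
    return lucky / n_runs

# Synthetic stand-in pool: 325 trades, mean about $54, std dev about $350.
# Both the shape and the $350 figure are assumptions, not backtest data.
pool_rng = random.Random(42)
example_pool = [pool_rng.gauss(54, 350) for _ in range(325)]
for discount in (0, 10, 20, 30):
    print(discount, prob_lucky_sample(example_pool, discount))
```

Plotting the printed probability against the discount gives the kind of roll-off curve discussed next.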
Here's a sample curve:
This is telling us that if our "true" EV were actually $34, there would still be a 15% chance of drawing a lucky sample over our 325 backtested trades, one that fools us into thinking our EV is actually $54.
So let's relate that back now to the risk of ruin. Since our edge is smaller, we know our risk of ruin is higher. For each of those discounted distributions, we can look at an implied risk of ruin for that distribution by repeating the process described earlier for a sufficiently large number of simulated trades.
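A rough sketch of that sweep, assuming the same zeroed-balance definition of ruin as before. The names `risk_of_ruin`, `ror_curve`, and `example_pool` are placeholders of mine, and the pool values are invented, not taken from the backtest.

```python
import random

def risk_of_ruin(trade_pool, bankroll, n_trades, n_runs, rng):
    """Fraction of simulated runs in which the balance reaches zero or below."""
    ruined = 0
    for _ in range(n_runs):
        balance = bankroll
        for _ in range(n_trades):
            balance += rng.choice(trade_pool)
            if balance <= 0:
                ruined += 1
                break
    return ruined / n_runs

def ror_curve(trade_pool, bankroll, discounts, n_trades=2000, n_runs=300, seed=0):
    """Implied risk of ruin for each assumed per-trade discount."""
    curve = {}
    for d in discounts:
        shifted = [t - d for t in trade_pool]  # the discounted "true" pool
        curve[d] = risk_of_ruin(shifted, bankroll, n_trades, n_runs,
                                random.Random(seed))
    return curve

# Invented per-trade P&L in dollars, for illustration only.
example_pool = [400, -300, 250, -200, 600, -350, 150, -100]
for d, ror in ror_curve(example_pool, 5000, discounts=(0, 20, 40)).items():
    print(d, ror)
```

The run count is kept small so this finishes quickly; tiny ruin probabilities need far more runs to resolve.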
Here's how that works out:
Initially we computed our risk of ruin at 0.25% on our $5k bankroll with this system. As we can see here, that's a very optimistic estimate. It's not terribly unlikely that our risk of ruin could be in the 5-10% range, instead of fractions of a percent, where we'd like it to be.
And now here's a key thing. Just eyeballing now, we can consider the product of the two curves in the second chart in terms of a weighted average of ROR based on the likelihood of each assumed "true" distribution. Clearly then the rate at which the first curve "rolls off" is important in protecting us from those disastrous risks of ruin towards the right of the chart.
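To make the eyeballing a little more concrete, here is one way to write that weighted average. Treating the "lucky sample" probabilities as weights and normalizing them is my simplifying assumption, not a rigorous posterior, and the numbers below are placeholders rather than values read off the charts.

```python
def weighted_ror(weights, rors):
    """Average the RORs, weighting each by the likelihood of its distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * r for w, r in zip(weights, rors)) / total

# Placeholder values, one entry per assumed discount (illustration only):
# likelihood of still seeing the backtest EV, and the implied ROR.
likelihoods = [0.50, 0.30, 0.15]
rors = [0.0025, 0.01, 0.05]
print(weighted_ror(likelihoods, rors))
```

The faster the likelihoods fall off, the less the large RORs at the far end contribute to the average.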
That rolloff rate is a strong function of the sample size. Believe it? Let's try again, only this time we are going to use a backtest with 53 trades in it. 53 trades isn't a lot, but it should be good enough to get an idea, right? Let's try it.
Here's the chart with a 53 trade backtest:
Now we're a 1 in 4 shot to have a disastrous ~10% risk of ruin! As for me, I pass on that gamble, and now I have a pretty good basis for facing the fact that a 53-trade backtest doesn't mean much. 325 isn't perfect either, but it's certainly more confidence-inspiring than 50. As a rule, 50-trade backtests belong in the round file.
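The sample-size effect can also be seen without any simulation, using a normal approximation to the sample mean in place of the Monte Carlo resampling. The standard error of the mean shrinks like 1/sqrt(n), so the same discount is many more standard errors away with a bigger backtest. The function name and the $350 per-trade standard deviation are assumptions for illustration, not figures from the backtest.

```python
import math

def prob_lucky_normal(discount, sigma, n):
    """Normal-approximation chance that an n-trade sample mean beats
    the true mean by at least `discount`."""
    standard_error = sigma / math.sqrt(n)
    z = discount / standard_error
    # Upper tail of the standard normal: 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2))

ASSUMED_SIGMA = 350.0  # hypothetical per-trade standard deviation in dollars
for n in (53, 325):
    print(n, prob_lucky_normal(20.0, ASSUMED_SIGMA, n))
```

Under these assumptions the 53-trade probability comes out well above the 325-trade one for the same $20 discount, which is the slower roll-off in the chart.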
All of this is just back-of-the-envelope estimation to develop a general feel for what you're dealing with wrt risk of ruin and confidence in an edge from a backtest. These estimates are really best considered lower bounds, not upper bounds, on risk and uncertainty. There are lots of alternate ways to generate trial probability distributions, and you could even generalize this method to account for non-stationary distributions if you really wanted to, but that's as much detail as I'm going to get into for now.
Cheers,
Fletch