Just wanted to submit this since I've found it useful for my own work.
TL;DR: compute returns at signal time for various holding periods and be satisfied that it's 80% accurate
One of the issues with backtesting is that it can be time-consuming. Backtesting is never 100% accurate, and you shouldn't try to make it so: if a backtest looks 100% accurate, you're probably wrong somewhere.
I usually aim for 80-90% accuracy in anything I try because the remaining 10-20% will likely not give me any huge returns. If you want 100% accuracy, this will likely be useless.
To backtest a strategy, you have to decide what you want out of it. For me, the question is: is this strategy worth looking into further?
I don't want to wait for 10 minutes to backtest against 15 years of data, so quick turnaround is important. 15 seconds is my cutoff.
This means event-based backtesters are probably out of the question since they need to go through each bar one at a time. We need something that can work faster.
Vectorization is (basically) performing multiple operations at the same time, usually across a whole array. If you have an N-element array and want to multiply it by 3.5, there are two ways to go about it:
1. Loop through each element and multiply it by 3.5
2. Multiply each element by 3.5 at the same time
The latter requires hardware support (SIMD instructions) or, at the very least, a loop that runs in native code instead of the Python interpreter.
With Pandas, operations are vectorized when they can be applied to a whole Series or DataFrame at once (broadcast). Fortunately, many operations in Pandas work this way, including the basic arithmetic operations. So the above operation (A*3.5) runs in compiled NumPy code rather than a Python-level loop, and may use SIMD hardware where available. This is what makes Pandas faster than looping over Python lists.
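To make the difference concrete, here is a minimal sketch comparing the two approaches on the same data. The array size and the use of `time.perf_counter` are assumptions made for illustration, and the actual timings will depend on your machine.

```python
import time

import numpy as np
import pandas as pd

n = 1_000_000  # illustrative size, not from the original post
a = pd.Series(np.arange(n, dtype="float64"))

# 1. Loop through each element in Python and multiply it by 3.5
start = time.perf_counter()
looped = [x * 3.5 for x in a.tolist()]
loop_seconds = time.perf_counter() - start

# 2. Let Pandas/NumPy multiply the whole array in compiled code
start = time.perf_counter()
vectorized = a * 3.5
vector_seconds = time.perf_counter() - start

# Both approaches produce the same numbers
same = np.allclose(vectorized.to_numpy(), np.array(looped))
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s, same result: {same}")
```

On typical hardware the second version should be noticeably faster, which is the whole reason for avoiding bar-by-bar loops here.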
Now, in order to backtest a strategy, you need to know your entries and subsequent returns. Exits can be done this way as well, but I haven't bothered with it yet.
My process is something like this:
1. Identify positioning: -1 for short, np.nan for none, 1 for long (position sizing can be done too, I haven't bothered)
Code:
import numpy as np
import pandas as pd

bars = pd.DataFrame(....)  # bars with close, ema15 and ema30 columns
long = bars.ema15 > bars.ema30  # or whatever
short = bars.ema15 < bars.ema30
bars['signal'] = np.nan  # NaN means no position
bars.loc[long, 'signal'] = 1
bars.loc[short, 'signal'] = -1
2. Identify subsequent returns
Code:
N = 10  # longest holding period to look at (example value, pick your own)
# want to look at returns after holding for 1..N periods
for i in range(1, N + 1):
    # return after holding for i periods, signed by position
    # Note the negative shift: that looks into the future. OMG.
    bars[f'return_{i}'] = bars.signal * (bars.close.shift(-i) - bars.close) / bars.close
3. ???
4. ???
5. Profit? Maybe? Probably not.
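Steps 1 and 2 can be tried end to end without real data. The sketch below is only an illustration under several assumptions: the closes are a synthetic random walk, the EMAs are computed with `ewm()` (the post does not say how they are produced), `N = 5` is arbitrary, and the summary at the end is just one possible way to eyeball the result, not a prescribed step.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closes stand in for real bars (assumption: illustration only)
rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))
bars = pd.DataFrame({"close": close})

# Assumption: EMAs computed with ewm(); swap in your own indicator columns
bars["ema15"] = bars.close.ewm(span=15, adjust=False).mean()
bars["ema30"] = bars.close.ewm(span=30, adjust=False).mean()

# Step 1: positioning as -1 / NaN / 1
bars["signal"] = np.nan
bars.loc[bars.ema15 > bars.ema30, "signal"] = 1
bars.loc[bars.ema15 < bars.ema30, "signal"] = -1

# Step 2: signed return after holding for 1..N bars (negative shift looks ahead)
N = 5
for i in range(1, N + 1):
    bars[f"return_{i}"] = bars.signal * (bars.close.shift(-i) - bars.close) / bars.close

# One possible summary: mean return and hit rate per holding period
return_cols = [f"return_{i}" for i in range(1, N + 1)]
summary = pd.DataFrame({
    "mean_return": bars[return_cols].mean(),
    "hit_rate": (bars[return_cols] > 0).sum() / bars[return_cols].notna().sum(),
})
print(summary)
```

The last `i` rows of each `return_{i}` column are NaN because there is no future bar to compare against, and rows with no position stay NaN as well, so both drop out of the averages.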
I usually name any columns that look into the future with a 'return_' or 'future_' prefix so I don't accidentally use them anywhere else.
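The prefix convention also makes it easy to filter those columns out mechanically. A minimal sketch, assuming a hypothetical helper named `usable_columns` and a few made-up rows of data:

```python
import pandas as pd

FUTURE_PREFIXES = ("return_", "future_")  # the naming convention described above


def usable_columns(frame: pd.DataFrame) -> list[str]:
    """Return the columns that do not look into the future (hypothetical helper)."""
    return [c for c in frame.columns if not str(c).startswith(FUTURE_PREFIXES)]


# Made-up values, only to show the filtering
bars = pd.DataFrame({
    "close": [100.0, 101.0, 99.5],
    "signal": [1.0, 1.0, -1.0],
    "return_1": [0.01, -0.0149, None],
    "future_high": [101.0, 101.0, None],
})

features = bars[usable_columns(bars)]
print(list(features.columns))  # ['close', 'signal']
```

Anything that builds signals or indicators can then work from `features` instead of `bars`, so a look-ahead column cannot slip in by accident.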
Hope this is useful to someone. I'd love to hear any criticisms.