A Simple Approach To Find The Hedge Ratio For A Pairs Trade

TheBigShort · Oct 29, 2019

Craig66 said:
I guess the overall message I'm trying to get across (because I've been there) is keep it as simple as possible.

Thanks for the feedback, Craig! I see you were there at one point (almost 10 years ago!). https://www.elitetrader.com/et/threads/pair-trading-question.171595/

I am looking for a simple approach; however, I have yet to be led astray by Kevin, so I will most likely go down the route he is suggesting.

Hey Kevin,

I found this link really helpful in explaining how you went from PCA to TLS (in case anyone else wants more explanation of why you used princomp in the code): https://stats.stackexchange.com/que...ogonal-regression-total-least-squares-via-pca Minimizing orthogonal errors makes a lot of sense to me for this problem. The function worked great.
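For readers who want to see the PCA-to-TLS step in code, here is a minimal base-R sketch on simulated prices. The data and the helper name `tls_hedge_ratio` are hypothetical (this is not the tlsHedgeRatio function used in the thread): the slope of the first principal axis of the centred pair is the TLS hedge ratio.

```r
# Minimal sketch, assuming simulated log prices; tls_hedge_ratio() is a
# hypothetical helper, not the tlsHedgeRatio function used in this thread.
set.seed(1)
n <- 1000
x <- cumsum(rnorm(n, sd = 0.01))           # leg X: a random-walk log price
y <- 0.8 * x + rnorm(n, sd = 0.005)        # leg Y: cointegrated with X, beta = 0.8

tls_hedge_ratio <- function(y, x) {
  # PCA on the centred pair: the first principal axis is the line that
  # minimises the sum of squared orthogonal distances (total least squares)
  pc <- prcomp(cbind(x = x, y = y), center = TRUE, scale. = FALSE)
  v <- pc$rotation[, 1]
  beta <- unname(v["y"] / v["x"])           # slope of the first principal axis
  alpha <- mean(y) - beta * mean(x)          # the axis passes through the means
  list(alpha = alpha, beta = beta, spread = y - alpha - beta * x)
}

fit <- tls_hedge_ratio(y, x)
c(alpha = fit$alpha, beta = fit$beta)
```

Because the fit is symmetric, swapping the legs returns exactly the reciprocal slope, which an OLS fit does not do. Leaving `scale. = FALSE` matters: standardizing the columns first would fit a different line.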

It's not that I want to give up on VAR; it's that every time I see the model in use, it's always Y1:Yt where t is usually greater than 3. I thought it might make more sense for dispersion trading, where you are trading (say) SPY vs AAPL, MSFT, GOOGL, AMZN, NFLX. But you are saying that for a pair of vectors (Y1, Y2) it's still a viable model (performs better than TLS/GLS)? Even Diebold was looking at connectedness between multiple assets. But if you are saying it's good for a pairs trade, I will be more than happy to dive deeper.

I really like the idea of trading just the error-correcting leg. Maybe we can talk about this further down the road, because I am sure I will have some questions with regard to selection. For example, if long SPY is the error-correcting leg on one pair and long QQQ is the error-correcting leg on another pair, I end up with too much market exposure. Am I thinking about that right?
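One rough way to see which leg does the error correcting, sketched here on simulated data where only y is built to correct (so the numbers are illustrative, not SPY/QQQ): regress each leg's change on the lagged spread s = y - beta*x. A leg corrects when its loading has the restoring sign (negative for y, positive for x) and is clearly non-zero.

```r
# Minimal sketch on simulated data (not SPY/QQQ): only y error-corrects here.
set.seed(42)
n <- 2000
x <- numeric(n); y <- numeric(n)
for (i in 2:n) {
  x[i] <- x[i - 1] + rnorm(1)                                 # x: pure random walk
  y[i] <- y[i - 1] - 0.2 * (y[i - 1] - x[i - 1]) + rnorm(1)   # y: pulled back toward x
}

# Step 1: cointegrating regression and the lagged spread
beta <- coef(lm(y ~ x))[["x"]]
s <- y - beta * x
s_lag <- s[-n]                      # s_{t-1}, aligned with the differences below

# Step 2: regress each leg's change on the lagged spread (error-correction loadings)
dy <- diff(y); dx <- diff(x)
alpha_y <- coef(summary(lm(dy ~ s_lag)))["s_lag", ]
alpha_x <- coef(summary(lm(dx ~ s_lag)))["s_lag", ]
rbind(y_leg = alpha_y, x_leg = alpha_x)
```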

Craig66 · Oct 30, 2019

No problem, go with what you're comfortable with.
I wouldn't draw too many conclusions from my old posts; I was so clueless.

Kevin Schmit · Oct 30, 2019

TheBigShort said:
It's not that I want to give up on VAR; it's that ... for a pair of vectors (Y1, Y2) it's still a viable model (performs better than GLS)?

To see why that might be the case, let's reduce the two-vector case to a single vector Y* = Y1* - Y2*, where Y1* and Y2* are the standard scores of Y1 and Y2 computed with your smoothed, individual instantaneous vol estimates. Y* is a single series that matches what you would be trading: when it goes down, the pair is converging; when it goes up, diverging.

If you run a GLS of Y* on its lag(s) and fit an ARMA/ARIMA model to Y*, you should get the same or nearly the same Y*hat vector. But there are two main reasons to prefer the AR(I)MA model: 1) in R it has a useful feature for predicting a few periods ahead; and 2) it is easier to calculate the impulse response function (armimp in the timsac package), which relates to my point about directional cross entropy (in the VAR case).
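A quick base-R illustration of that point, assuming a simulated AR(1) series in place of Y* and using plain OLS rather than GLS for the lag regression: the lag regression and arima() give nearly the same coefficient, but the arima() fit also gives multi-step forecasts and, through stats::ARMAtoMA, the impulse response weights (ARMAtoMA is swapped in here for the timsac function mentioned above).

```r
# Minimal sketch, assuming a simulated AR(1) series stands in for Y*.
set.seed(7)
n <- 1500
ystar <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))

# (1) Regression of Y* on its own lag (plain OLS here, in place of GLS)
phi_ols <- coef(lm(ystar[-1] ~ ystar[-n]))[[2]]

# (2) AR(1) with mean, fitted by arima()
fit <- arima(ystar, order = c(1, 0, 0))
phi_arima <- coef(fit)[["ar1"]]

# Multi-step forecasts with standard errors come for free
fc <- predict(fit, n.ahead = 5)

# Impulse response: MA(infinity) weights psi_1..psi_10 (base R, not timsac)
irf <- ARMAtoMA(ar = phi_arima, ma = numeric(0), lag.max = 10)

c(phi_ols = phi_ols, phi_arima = phi_arima)
```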

The Y* single-vector representation of the pair trade problem is also useful in case you want to fit a regime-shift model to the process, say an HMM with OU emissions where, in the Euler-Maruyama version of OU, you'd have regimes of perhaps strongly positive, weakly positive, and negative lambda.

If long SPY is the error correcting leg on one pair and long QQQ is the error correcting leg on another pair, I end up with too much market exposure. Am I thinking about that right?

Yes, that is correct. That is why you want to trade enough of those low-volume ETF pairs that you can balance out longs and shorts, at least to neutralize the market factor.

TheBigShort · Oct 30, 2019

Quick question as I study VECM.

When I test for cointegration, Yt - B*Xt = et, where et should be stationary, should I be using OLS, TLS or GLS to find the error term to feed into the cointegration test? Do you have a preferred cointegration test/function? There seem to be quite a few in R.

Originally I thought I would use lm(y ~ x), where x is the stock1 price and y is the stock2 price, to get the error term, but autocorrelation will affect my results. So maybe I should use GLS for the error term to test for cointegration?
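A sketch of the Engle-Granger two-step on simulated data, under the usual textbook setup: plain OLS is typically fine for the cointegrating regression itself (the estimate is superconsistent even with autocorrelated errors), and the autocorrelation is usually dealt with in the second step by adding lagged differences to the Dickey-Fuller regression on the residuals.

```r
# Minimal sketch of the Engle-Granger two-step test on simulated data.
set.seed(123)
n <- 1000
x <- cumsum(rnorm(n))                                            # I(1) leg
y <- 2 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.7), n = n))  # cointegrated, autocorrelated errors

# Step 1: plain OLS cointegrating regression
e <- residuals(lm(y ~ x))

# Step 2: augmented Dickey-Fuller regression on the residuals (one lagged difference
# soaks up the residual autocorrelation; no constant because OLS residuals are mean zero)
de <- diff(e)
k <- length(de)
d_e   <- de[2:k]          # Delta e_t
lag_e <- e[2:k]           # e_{t-1}
lag_d <- de[1:(k - 1)]    # Delta e_{t-1}
adf <- lm(d_e ~ lag_e + lag_d - 1)
t_stat <- coef(summary(adf))["lag_e", "t value"]

# Compare with Engle-Granger/MacKinnon critical values (about -3.34 at 5% for two
# variables with a constant), not the ordinary Dickey-Fuller ones
t_stat < -3.34
```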

bone · Oct 30, 2019

As was mentioned previously, what I have personally found to work best as a “quick and dirty” approach is to monetize the volatility and average trading range of each spread leg on a rolling 20-day average. That will get you plenty close enough, especially for shorter-term timeframes.
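A minimal sketch of that kind of volatility-matched ratio, assuming simulated prices and using the 20-day mean absolute daily price change as a stand-in for the average trading range (with OHLC data you would use high minus low or the true range instead). The ratio says how many shares of leg B to trade per share of leg A so both legs carry similar dollar volatility.

```r
# Minimal sketch on simulated prices; the range proxy is an assumption.
set.seed(99)
n <- 250
pa <- 100 + cumsum(rnorm(n, sd = 2.0))   # leg A price (moves ~$2/day)
pb <-  40 + cumsum(rnorm(n, sd = 0.5))   # leg B price (moves ~$0.50/day)

win <- 20
rolling_mean <- function(v, k) as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))

# Dollar "range" per share: mean absolute daily change over the last 20 days
dvol_a <- rolling_mean(abs(c(NA, diff(pa))), win)
dvol_b <- rolling_mean(abs(c(NA, diff(pb))), win)

# Shares of B to trade against one share of A so that both legs carry
# roughly the same dollar volatility
shares_b_per_a <- dvol_a / dvol_b
tail(shares_b_per_a, 1)
```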

I think you will find reliable and efficient execution to be a much bigger and more formidable challenge than getting your beta ratio tuned to the Nth degree. Don’t assume that your broker will automatically have the shares you want to short available at the precise time you need to enter a sell-short order. And getting decent fills on simultaneous spread legs in the real live stock market (not a simulator) is QUITE a challenge; that’s where your real test will come.

TheBigShort said:
Hello, I am looking at pairs trading from a very high level. I would like to build a simple model that gives me a decent hedge ratio.

My first look was at a VAR (vector autoregressive model) because I played around with it a few months back. The problem is, it seems like it is most useful when we have more than 2 vectors.

From recent readings, the Kalman filter looks like a very popular tool to measure the hedge ratio. However, we are adding more parameters to the model that need estimating.

For my current situation, I think a rolling beta will do the job. With that being said, I am having a bit of a hard time understanding it. Below is some R code with commentary as I try to work through an example. Before I begin: I found this series on using the Kalman filter and I was wondering what others thought about it. https://robotwealth.com/kalman-filter-pairs-trading-r/
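For comparison with the rolling beta, here is a minimal Kalman filter sketch in base R. It assumes the intercept and hedge ratio follow random walks, with the state noise set through a `delta` parameter as is common in pairs-trading write-ups; `kalman_hedge`, `delta` and `R` are illustrative names and values, and the prices are simulated. The state noise and observation noise are exactly the extra parameters that need estimating.

```r
# Minimal sketch of a Kalman filter for a time-varying hedge ratio,
# assuming random-walk dynamics for (alpha, beta) and simulated prices.
kalman_hedge <- function(y, x, delta = 1e-4, R = 0.25) {
  n <- length(y)
  Q <- delta / (1 - delta) * diag(2)   # state noise: how fast alpha and beta may drift
  theta <- c(0, 0)                     # state estimate (alpha, beta)
  P <- diag(2)                         # state covariance
  out <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("alpha", "beta")))
  for (i in seq_len(n)) {
    P <- P + Q                                   # predict: random walk keeps theta, adds Q
    H <- c(1, x[i])                              # observation: y = alpha + beta * x + noise
    S <- drop(crossprod(H, P %*% H)) + R         # innovation variance
    K <- drop(P %*% H) / S                       # Kalman gain
    theta <- theta + K * (y[i] - sum(H * theta)) # update with the innovation
    P <- P - outer(K, H) %*% P                   # (I - K H') P
    out[i, ] <- theta
  }
  out
}

set.seed(2019)
n <- 500
x <- 50 + cumsum(rnorm(n, sd = 0.5))
beta_true <- seq(0.5, 1.5, length.out = n)        # hedge ratio drifts over time
y <- 1 + beta_true * x + rnorm(n, sd = 0.5)

est <- kalman_hedge(y, x)
tail(cbind(est, beta_true), 3)
```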

In my example, I will be looking at the GLD/GDX spread.

Let's load the necessary packages and get the data:

Code:

library(urca)
library(MASS)
library(quantmod)
library(zoo)
getSymbols(c("GDX", "GLD"))

Test for cointegration using Engle-Granger.
First, find the error terms; we will use a robust regression with method set to "MM".

Code:

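# merge the adjusted closes and fit a robust (MM) regression of GLD on GDX
new.df = merge(GLD$GLD.Adjusted, GDX$GDX.Adjusted)
mod = rlm(GLD.Adjusted ~ GDX.Adjusted, data = as.data.frame(new.df), method = "MM")
# the residuals are the candidate spread for the Engle-Granger test
plot(mod$residuals, type = "l")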

[Attachment 212178: plot of the rlm residuals]
It looks like there was a trend in the residuals for the first 1500 data points before it flattened out.
I am not even going to run the test. Due to the trend, we will most likely see that the assets are not cointegrated. I could adjust for the trend, but I do not think it's necessary for this example. Correct me if I am wrong.

R has the dlm package, which fits dynamic linear models. Hopefully someone who has used this package before can touch on it, but for this example we will use a simple OLS regression to measure the hedge ratio.

Code:

regression.window = 60
# daily log returns (ROC from TTR, loaded with quantmod)
GLD.rets = ROC(GLD$GLD.Adjusted)
GDX.rets = ROC(GDX$GDX.Adjusted)
rets.df = na.omit(merge(GLD.rets, GDX.rets))
colnames(rets.df) = c("GLD", "GDX")
head(rets.df)
# rolling 60-day OLS of GLD returns on GDX returns: (intercept, slope) per window
mod.coef = rollapply(zoo(rets.df), width = regression.window,
                     FUN = function(Z) {
                       fit = lm(GLD ~ GDX, data = as.data.frame(Z))
                       return(coef(fit))
                     },
                     by.column = FALSE, align = "right")
tail(mod.coef)

This is what the rolling beta looks like.
[Attachments 212180, 212179 and 212182]
The next part is where I am having the issue.

If I flip GDX and GLD so that the formula is lm(GDX ~ GLD) (instead of GLD ~ GDX), our current beta is 2.03. Why is it not 3 (the inverse of 0.33)?

Since we are in return space, the current beta of 0.337 would mean that if I buy $10,000 worth of GLD, I sell short $3,370 worth of GDX?
If we now regress GDX on GLD, we have a current beta of 2.03, which would mean that if we are long $10,000 worth of GLD, we are short roughly $5,000 (10,000 / 2.03 ≈ 4,926) worth of GDX.
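The two slopes are not reciprocals because each OLS fit divides the same covariance by a different variance, so their product is R² rather than 1. With the numbers in the post, 0.337 × 2.03 ≈ 0.68, which would be the R² over that 60-day window. A small check on simulated returns (not the actual GLD/GDX data):

```r
# Minimal sketch on simulated returns (not the actual GLD/GDX data).
set.seed(3)
n <- 60                                     # same length as the 60-day window above
gdx <- rnorm(n, sd = 0.02)
gld <- 0.35 * gdx + rnorm(n, sd = 0.008)

b_gld_on_gdx <- coef(lm(gld ~ gdx))[["gdx"]]   # cov / var(gdx)
b_gdx_on_gld <- coef(lm(gdx ~ gld))[["gld"]]   # cov / var(gld)
r2 <- cor(gld, gdx)^2

c(product = b_gld_on_gdx * b_gdx_on_gld, r_squared = r2, naive_inverse = 1 / b_gld_on_gdx)
```

A symmetric fit such as TLS does return reciprocal slopes when the legs are swapped.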

So, in short, I am looking to construct a simple pairs strategy (I will be venturing into the micro-cap ETF space) where I can easily estimate a decent hedge ratio. I will also be keeping an eye on idiosyncratic and hidden-factor risks.

Kevin Schmit · Oct 30, 2019

TheBigShort said:
Do you have a preferable cointegration test/function?

I generally prefer Box-Tiao as elucidated by Bewley, Yang, et al. in a series of papers starting in about 1985. In this case, though, the particular test is not that important; take your pick. You'd be fine with the original Engle-Granger two-stage method based on simple regression.

Your choice of test dictates the answer to your first question (OLS vs TLS ...), but again, it's not that important, because if you have a valid ECM, your vectors are cointegrated and there is Granger causality lurking somewhere in there, even if time-varying, and you'll want to know the strength and direction of that causality. Unit root tests don't tell you those two things: distance from a critical value is not a measure of the strength of the cointegrating relationship, just as a p-value is neither power nor size.

So, in the end, you'll have to fit a model of some kind anyway, so why not just test the model?

I also suggest you look for group or basket (more than two assets) cointegration relationships. A good rule of thumb is that you will need at least one more asset than the number of risk factors you want neutralized in your trading. For example, if you're looking for a cointegrated basket of rate instruments and you want to neutralize the usual three components (level, slope, curvature), you'll need at least 4 reasonably widely separated maturities to fit your cointegrating basket. In the ETF space, you might want to neutralize the market and industry factors, or perhaps some or all of the Fama-French risk factors.

Edit: obviously, if you're arbing off-the-run vs on-the-run, the maturities are so close that you don't have much exposure to the three components, and my rule of thumb above doesn't hold.
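The rule of thumb can be illustrated with a PCA sketch on simulated yield changes (the loadings below are made up, not estimated): with four maturities and three factors, exactly one weight vector is left over that the factors do not load on, namely the eigenvector with the smallest eigenvalue. This shows the factor-neutral direction only; it is not itself a cointegration test.

```r
# Minimal sketch with made-up factor loadings and simulated daily yield changes.
set.seed(11)
n <- 2000
L <- cbind(level     = c( 1.0,  1.0, 1.0,  1.0),   # assumed loadings of the
           slope     = c(-1.0, -0.3, 0.3,  1.0),   # 2y, 5y, 10y, 30y points
           curvature = c(-0.5,  0.5, 0.5, -0.5))
f <- matrix(rnorm(n * 3), n, 3) %*% diag(c(0.08, 0.04, 0.02))   # factor shocks
dy <- f %*% t(L) + matrix(rnorm(n * 4, sd = 0.005), n, 4)      # + idiosyncratic noise
colnames(dy) <- c("2y", "5y", "10y", "30y")

# With 4 instruments and 3 factors, exactly one direction is left that the
# factors do not load on: the eigenvector with the smallest eigenvalue
eig <- eigen(cov(dy), symmetric = TRUE)
w <- setNames(eig$vectors[, 4], colnames(dy))

# Basket exposure to each factor (close to zero) vs. an equally weighted basket
rbind(basket = drop(crossprod(L, w)), equal_weight = drop(crossprod(L, rep(0.5, 4))))
```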

Also, I should have mentioned this earlier about TLS: if you want to use a Kalman filter to model the time-varying coefficients, as I think you mentioned in your original post, don't use princomp or eigs for the TLS computation.

TheBigShort · Nov 1, 2019

Kevin, I have finished my first round of studying and will make my first attempt at a simple VECM model. I would greatly appreciate it if you could evaluate my work. Next will be SVECM and then basket cointegration. If you don't mind explaining it as if you were talking to a 10-year-old, that would really help solidify the information.

Here are some links I found very useful (the edX course was great):
https://courses.edx.org/courses/cou...9b5f9d8b832f4fa4837d09447db2dd2c/?child=first

https://rpubs.com/simasiami/384720
https://ses.library.usyd.edu.au/bit...d=FC2AFBEDBA9C129F1427C06F6F834248?sequence=1

https://stats.stackexchange.com/questions/tagged/vecm
https://stats.stackexchange.com/questions/tagged/var
https://cran.r-project.org/web/packages/vars/vignettes/vars.pdf
Going through the questions on Cross Validated really helped me. Richard does a great job of breaking VAR and VECM down.

You originally mentioned to use SVECM so hopefully after I get the okay from you on VECM we can move on.

On a side note, if I have two assets that are not cointegrated and are non-stationary, I can't use VAR or VECM. So for pairs like KO/PEP or GDX/GLD, where trends and large structural shifts exist, should I avoid them completely?

Before I get into this, I would also like to mention for future readers: this was a great exercise and I learned a lot. The VAR family is applicable to many fields in finance. If any of you want my notes, send me a PM and I will share my Google Doc with you.

For simplicity we are going to look at SPY/QQQ. The goal is to identify when the spread is too large, how fast it will mean revert, which is the error correcting leg and what our hedge ratio is.

Code:

# load the packages and import the data (log adjusted prices)
library(quantmod)
library(urca)
library(vars)
getSymbols(c("SPY", "QQQ"), from = "2010-01-01")
df = merge(log(SPY$SPY.Adjusted), log(QQQ$QQQ.Adjusted))
ts.plot(df, col = c("red", "blue"))

Screen Shot 2019-11-01 at 4.29.19 PM.png

There is a bit of a trend in both series, but if we take a quick look at the spread using the tlsHedgeRatio function, there does seem to be a linear combination of the two that is an I(0) process.

Code:

ksSpread = tlsHedgeRatio(df$SPY.Adjusted, df$QQQ.Adjusted)

plot(ksSpread$spread)

Screen Shot 2019-11-01 at 4.32.25 PM.png

So right off the bat these assets seem cointegrated. Let's do a Johansen test just to make sure.

Code:

cointegration <- ca.jo(df, type="trace",ecdet="trend",spec="transitory")
summary(cointegration)

cointegration@teststat

Screen Shot 2019-11-01 at 4.35.43 PM.png

It seems we can reject r = 0 but not r <= 1, i.e. there is one cointegrating relationship. (r = 0 would mean no cointegration; r = 2, full rank, would mean both our series were already I(0).)

Next we look for causality: does SPY cause QQQ, and vice versa? For this we will use the Granger causality test.
To do this, we first calculate the daily (log) returns.

Code:

diff.spy = diff(df$SPY.Adjusted)
diff.qqq = diff(df$QQQ.Adjusted)
df.rets = na.omit(cbind(diff.spy, diff.qqq))

We then fit a VAR model, letting the function choose the lag order that minimizes the AIC. On a side note, I am not 100% sure about the intuition for choosing among c("both", "trend", "const"), but I have chosen to use "both" here.

Code:

rets.var = VAR(df.rets, type = "both", lag.max = 8, ic = "AIC")
causality(rets.var, cause = "QQQ.Adjusted")$Granger
causality(rets.var, cause = "SPY.Adjusted")$Granger

Screen Shot 2019-11-01 at 4.49.12 PM.png

It seems we cannot reject the null, so neither SPY nor QQQ Granger-causes the other.

Next we estimate our VAR model, take a look at the summary statistics, and build our VECM.

Code:

# let the model choose the optimal lag length; the AIC-minimizing choice is n = 2
VARselect(df, lag.max = 8, type = "both")$selection
# Kevin, could you give some intuition for type = "eigen" vs "trace"? I have only seen the math formulas and I can't get my head around them.
# next we build the VECM with lag length 2
var1 = ca.jo(df, K = 2, type = "eigen",
                 ecdet = "const", spec = "transitory")

# the VECM reports only 1 lagged difference here because a VECM has one lag fewer than the VAR in levels
vecm = cajorls(var1)
summary(vecm$rlm)
# since neither SPY nor QQQ Granger-causes the other, I did not add any restrictions; what do you think?
# should I be adding any restrictions to this model?
v1.VAR = vec2var(var1)
# make a forecast 10 days ahead and plot it
v1.VAR.fcst = predict(v1.VAR, n.ahead = 10)
plot(v1.VAR.fcst)

Voila! We have our hedge ratio and constant! I am using ect1 for this, so our hedge ratio if we go long SPY is short 0.7762 QQQ, with a constant of +1.6. Kev, what's the intuition here? Is this dollar-weighted or share-weighted? If it is not dollar-weighted, I am not too sure how to interpret the constant coefficient for my hedge ratio.
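Since the VECM above is fitted to log prices, here is one way to read the coefficient (a sketch under that assumption, not an answer from the thread): a one-day change in s = log(SPY) - beta*log(QQQ) is approximately r_SPY - beta*r_QQQ, which is the return on $1 long SPY against $beta short QQQ. So beta behaves like a dollar (notional) ratio that has to be re-set as prices move, and the constant only shifts the level of the spread. A numeric check with made-up prices:

```r
# Hypothetical check: beta_log is the coefficient quoted in the post; prices and
# returns are made up. Assumes the cointegrating vector applies to log prices.
beta_log <- 0.7762
notional_spy <- 10000                     # $ long SPY
notional_qqq <- beta_log * notional_spy   # $ short QQQ

p0 <- c(spy = 300, qqq = 200)             # made-up prices today
r  <- c(spy = 0.004, qqq = -0.002)        # made-up one-day log returns
p1 <- p0 * exp(r)

shares <- c(spy = notional_spy / p0[["spy"]], qqq = -notional_qqq / p0[["qqq"]])
pnl <- sum(shares * (p1 - p0))

# Change in the log spread; the constant (+1.6 in the post) cancels in the difference
spread <- function(p) log(p[["spy"]]) - beta_log * log(p[["qqq"]])
d_spread <- spread(p1) - spread(p0)

c(pnl = pnl, notional_times_d_spread = notional_spy * d_spread)
```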

Screen Shot 2019-11-01 at 5.00.10 PM.png

Here is how well our rlm does.

Screen Shot 2019-11-01 at 4.57.54 PM.png

Screen Shot 2019-11-01 at 4.58.03 PM.png

And here is our (zoomed in) forecast for 10 days ahead.

Screen Shot 2019-11-01 at 4.15.06 PM.png

The main areas where I currently need some help are:
1) My code (am I writing it correctly?)
2) Where do I find the error-correction coefficients in the VECM output? Once I find them, how do I tell which leg is more likely to correct (assuming both are non-zero; or is one always zero?)
3) I would like to forecast the spread; however, it seems I am forecasting each leg on its own. Does the vars package have a function where I can graph the spread with its forecast, or do I have to construct the spread using the coefficients provided above and fit an ARMA/ARIMA model?
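On question 3, one simple route, sketched on simulated log prices: build the spread from the cointegrating coefficient, fit a univariate AR(1) with arima(), and forecast it directly. This treats beta as known and ignores its estimation error; the coefficients only mimic the numbers quoted above.

```r
# Minimal sketch on simulated log prices; the coefficients mimic the post's numbers
# but the data are made up.
set.seed(5)
n <- 1000
l_qqq <- cumsum(rnorm(n, sd = 0.012))                            # log QQQ stand-in
u <- as.numeric(arima.sim(list(ar = 0.95), n = n, sd = 0.004))  # stationary deviation
l_spy <- 1.6 + 0.7762 * l_qqq + u                                 # log SPY stand-in

beta <- 0.7762                     # in practice: the cointegrating coefficient from the VECM
spread <- l_spy - beta * l_qqq     # the constant stays in; arima()'s mean term absorbs it

fit <- arima(spread, order = c(1, 0, 0))          # AR(1) with mean
fc  <- predict(fit, n.ahead = 10)                 # point forecasts and standard errors
phi <- coef(fit)[["ar1"]]
half_life <- log(0.5) / log(phi)                  # days for a deviation to halve

# last 100 observations plus the forecast and a +/- 2 s.e. band
ts.plot(window(ts(spread), start = n - 99), fc$pred,
        fc$pred + 2 * fc$se, fc$pred - 2 * fc$se, lty = c(1, 2, 3, 3))
```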

Sorry for all these questions, Kevin, but I am not comfortable taking advice about this stuff from my professors. The last time I asked about vol being in backwardation going into earnings, I got the answer "because farther-dated options are less liquid and most of the demand is for the near-dated options".

Kevin Schmit · Nov 2, 2019

TheBigShort said:
if I have two assets that are not cointegrated and non-stationary, I can't use VAR ...

I'll answer your full post later when I have time, but for now I'd like to clear up this misapprehension. If you regress y on X and its lags (as we used to do with distributed-lag models pre-VAR), with both series non-stationary and not cointegrated, and without including lags of y on the right-hand side, then yes, the model will be spurious and inconsistent. However, if lags of y are included, as in a VAR, the model is consistent and useful as-is, especially for forecasting/prediction. It is perfectly OK to run a VAR in levels on non-stationary, non-cointegrated series. The standard errors will need adjustment, IRFs are tricky, and tests (e.g. for Granger causality) are problematic, but forecasts are consistent and unbiased. In the econometrics literature, VAR in levels on non-stationary series is quite common.
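A quick simulation of the distinction, assuming two independent random walks: regressing y on x alone leaves residuals with a unit root (the classic spurious regression), while adding lagged y to the right-hand side, as in the y-equation of a VAR(1), leaves roughly white-noise residuals.

```r
# Minimal sketch: two independent random walks (non-stationary, not cointegrated).
set.seed(10)
n <- 2000
x <- cumsum(rnorm(n))
y <- cumsum(rnorm(n))

# Static regression of y on x only: residuals keep the unit root (spurious regression)
static <- lm(y ~ x)

# y-equation of a VAR(1) in levels: lagged y and lagged x on the right-hand side
y_t <- y[-1]; y_lag <- y[-n]; x_lag <- x[-n]
dynamic <- lm(y_t ~ y_lag + x_lag)

acf1 <- function(e) acf(e, lag.max = 1, plot = FALSE)$acf[2]
res <- c(static_resid_acf1  = acf1(residuals(static)),
         dynamic_resid_acf1 = acf1(residuals(dynamic)),
         coef_y_lag         = coef(dynamic)[["y_lag"]])
res
```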

Modeling shocks/IRFs is difficult because, theoretically, the Wold decomposition doesn't exist (the matrix is not invertible), but if it is close, you can still get decent IRF estimates. Tests can still be run jointly; Sims wrote on this 30 years ago, and IIRC a couple of Japanese researchers expanded on it (I don't remember their names).

So for pairs like KO/PEP or GDX/GLD where trends and large structural shifts exist, should I completely avoid them?

I wouldn't give up on them quite yet. I would be very surprised if there were no tradable cointegrating relationship in those two pairs. For time, seasonal, or deterministic trends, remove them first. A stochastic trend is not a big problem for a VAR (and remember that any VECM has a VAR(1) representation). Model shocks like structural shifts in the VAR model (in fact, the degenerate Wold case (MA representation) implies that some shocks have permanent effects). Remember that most cointegration tests have low power when the series is very close to I(1) but actually is not.

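A note on TLS: it may be useful to think of TLS as a close relative of ridge regression. Both modify the X'X matrix by a small multiple of the identity: ridge adds it, while TLS subtracts it (the squared smallest singular value of [X y]), so TLS behaves like ridge with a negative penalty.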
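A small numeric check of the TLS/ridge connection, assuming a single centred regressor and simulated data: the TLS slope from the SVD of [X y] equals the ridge-type formula (X'X + lambda*I)^-1 X'y with lambda set to minus the squared smallest singular value of [X y].

```r
# Minimal sketch on simulated data, one regressor, centred so no intercept is needed.
set.seed(8)
n <- 500
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.3)
X  <- matrix(x - mean(x), ncol = 1)
yc <- y - mean(y)

# TLS from the SVD of the augmented matrix [X y]
sv <- svd(cbind(X, yc))
v <- sv$v[, 2]                      # right singular vector of the smallest singular value
beta_tls_svd <- -v[1] / v[2]
s2 <- sv$d[2]^2                     # squared smallest singular value

# Ridge-type closed form: (X'X + lambda I)^{-1} X'y
ridge <- function(lambda) drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, yc)))

c(ridge = ridge(s2), ols = ridge(0), tls = ridge(-s2), tls_svd = beta_tls_svd)
```

Ordinary ridge shrinks the slope toward zero, while the negative penalty pushes it the other way, which is how TLS corrects for noise in the regressor.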

Edit: you can always run a chain-weighted OU fit like Carr and López de Prado do in a recent paper (complete with Python code!). These are two of my least favorite researchers, but even a blind squirrel finds an acorn occasionally. OU (in Euler form) can easily be fit with simple OLS.
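A minimal sketch of the plain (not chain-weighted) OLS fit of an Euler-discretized OU process, on simulated data with assumed parameters: regress the increment on the lagged level, then map the coefficients back to the mean-reversion speed, long-run mean, volatility and half-life.

```r
# Minimal sketch: simulate dX = theta * (mu - X) dt + sigma dW with Euler-Maruyama,
# then recover the parameters with OLS. Parameter values are assumptions.
set.seed(21)
n <- 20000; dt <- 1
theta <- 0.05; mu <- 1; sigma <- 0.1
x <- numeric(n); x[1] <- mu
for (i in 2:n) {
  x[i] <- x[i - 1] + theta * (mu - x[i - 1]) * dt + sigma * sqrt(dt) * rnorm(1)
}

# Euler form: x_t - x_{t-1} = a + b * x_{t-1} + e, with a = theta*mu*dt, b = -theta*dt
dx <- diff(x); x_lag <- x[-n]
fit <- lm(dx ~ x_lag)
a <- coef(fit)[[1]]; b <- coef(fit)[[2]]

theta_hat <- -b / dt
mu_hat    <- -a / b
sigma_hat <- sd(residuals(fit)) / sqrt(dt)
half_life <- log(2) / theta_hat
c(theta = theta_hat, mu = mu_hat, sigma = sigma_hat, half_life = half_life)
```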

Kevin Schmit · Nov 3, 2019

Attached is the Toda and Yamamoto paper I referenced above. The method is fully described in the abstract. It can be implemented in the vars package by including the extra (maximum-integration-order) lags of x and y (e.g. GLD and GDX) as exogenous variables in your VAR (the exogen parameter in the VAR call), and then using the causality method to test for Granger causality with the Wald test (strictly, the null is no Granger causality).
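For readers without the paper at hand, here is the same idea written out in base R instead of through vars (lag_mat and ty_wald are hypothetical helpers; the data are simulated). Fit the levels equation with k + d_max lags, then Wald-test only the first k lags of the causing variable; the extra d_max lags stay unrestricted, which is what keeps the chi-square reference distribution valid for I(1) data.

```r
# Minimal sketch of a Toda-Yamamoto test in base R on simulated data; lag_mat()
# and ty_wald() are hypothetical helpers, not functions from vars.
lag_mat <- function(v, p, name) {
  m <- vapply(seq_len(p), function(j) c(rep(NA_real_, j), head(v, -j)),
              numeric(length(v)))                  # column j = v lagged j periods
  colnames(m) <- paste0(name, "_l", seq_len(p))
  m
}

ty_wald <- function(dep, cause, k, d_max) {
  p <- k + d_max                                   # VAR in levels with k + d_max lags
  dat <- data.frame(dep = dep, lag_mat(dep, p, "dep"), lag_mat(cause, p, "cause"))
  fit <- lm(dep ~ ., data = na.omit(dat))          # the 'dep' equation of that VAR
  idx <- paste0("cause_l", seq_len(k))             # restrict only the first k lags
  b <- coef(fit)[idx]
  V <- vcov(fit)[idx, idx]
  wald <- drop(t(b) %*% solve(V, b))
  c(wald = wald, p_value = pchisq(wald, df = k, lower.tail = FALSE))
}

set.seed(17)
n <- 1000
x <- cumsum(rnorm(n))                              # I(1), so d_max = 1
y <- numeric(n)
for (i in 3:n) y[i] <- y[i - 1] + 0.5 * (x[i - 1] - x[i - 2]) + rnorm(1)   # x drives y

ty_wald(y, x, k = 2, d_max = 1)   # null: x does not Granger-cause y
ty_wald(x, y, k = 2, d_max = 1)   # null: y does not Granger-cause x
```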

TheBigShort · Nov 3, 2019

Thanks for the paper(s), Kevin. I am going to get started on the VAR one right now. It looks a bit dense, so it might take me a few days to work through the equations. After that I'll check back in with you to pick your brain a bit more on this topic.

In the meantime, I am going to migrate over to CV/QF to ask more questions (and give you a bit of a breather).

