Hello, I am looking at pairs trading from a very high level. I would like to build a simple model that gives me a decent hedge ratio.
My first look was at a VAR (vector autoregression) because I played around with it a few months back. The problem is that it seems most useful when we have more than two series.
From recent reading, the Kalman filter looks like a very popular tool for estimating the hedge ratio. However, it adds more parameters to the model that need estimating.
For my current situation, I think a rolling beta will do the job. That said, I am having a bit of a hard time understanding it. Below is some R code with commentary as I work through an example. Before I begin: I found this series on using the Kalman filter and was wondering what others thought about it. https://robotwealth.com/kalman-filter-pairs-trading-r/
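For comparison with the rolling beta, here is a minimal Kalman filter for a time-varying hedge ratio in base R. It is a sketch under stated assumptions, not the code from the linked series: it assumes the intercept and slope each follow a random walk, and the two extra parameters mentioned above (the observation noise variance Ve and the state noise variance Vw) are fixed by hand rather than estimated. The function name kalman_hedge and the simulated prices are hypothetical.

```r
# Minimal Kalman filter for a time-varying hedge ratio (base R, sketch).
# Assumed state-space model:
#   y_t = alpha_t + beta_t * x_t + v_t,                       v_t ~ N(0, Ve)
#   (alpha_t, beta_t) = (alpha_{t-1}, beta_{t-1}) + w_t,     w_t ~ N(0, Vw * I)
# Ve and Vw are fixed by hand here (assumption); the dlm package's dlmMLE()
# is one way to estimate such variances by maximum likelihood instead.
kalman_hedge <- function(y, x, Ve = 1e-3, Vw = 1e-5) {
  n <- length(y)
  theta <- c(0, 0)                     # state estimate: (alpha, beta)
  P <- diag(1, 2)                      # state covariance (vague prior)
  out <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("alpha", "beta")))
  for (i in seq_len(n)) {
    P <- P + diag(Vw, 2)               # predict step: random-walk states
    Fi <- c(1, x[i])                   # observation loadings (intercept, x)
    e <- y[i] - sum(Fi * theta)        # one-step-ahead forecast error
    Q <- drop(crossprod(Fi, P %*% Fi)) + Ve   # forecast error variance
    K <- drop(P %*% Fi) / Q            # Kalman gain
    theta <- theta + K * e             # update state estimate
    P <- P - outer(K, Fi) %*% P        # update state covariance
    out[i, ] <- theta
  }
  out
}

# Usage on simulated prices (hypothetical data, true beta = 1.5):
set.seed(1)
n <- 500
x <- 50 + cumsum(rnorm(n))
y <- 0.5 + 1.5 * x + rnorm(n, sd = 0.5)
est <- kalman_hedge(y, x, Ve = 0.25, Vw = 1e-6)
tail(est, 3)
```

A larger Vw lets beta adapt faster, roughly like a shorter rolling window; with Vw = 0 the filter reduces to an expanding-window regression with a prior on the coefficients.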
In my example, I will be looking at the GLD/GDX spread.
Let's load the necessary packages and get the data:
Code:
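library(urca)      # unit-root and cointegration tests
library(MASS)      # rlm() for robust regression
library(quantmod)  # getSymbols(); also loads TTR, which provides ROC()
library(zoo)       # rollapply()
getSymbols(c("GDX", "GLD"))  # downloads daily data into objects GDX and GLD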
Test for cointegration using Engle-Granger.
First, find the error terms; we will use a robust regression with method set to "MM".
Code:
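# align adjusted closes on common dates and drop dates missing from either series
new.df = na.omit(merge(GLD$GLD.Adjusted, GDX$GDX.Adjusted))
# robust (MM-estimator) regression of GLD prices on GDX prices
mod = rlm(GLD.Adjusted ~ GDX.Adjusted, data = as.data.frame(new.df), method = "MM")
# the residuals are the estimated spread; plot them to eyeball stationarity
plot(mod$residuals, type = "l")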
It looks like there was a trend in the residuals for the first 1500 data points before it flattened out.
I am not even going to run the test. Because of the trend, we will most likely find that the assets are not cointegrated. I could adjust for the trend, but I do not think it's necessary for this example. Correct me if I am wrong.
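If you do want to run the test, the Engle-Granger two-step procedure is short enough to write out in base R. This is a sketch: eg_stat is a hypothetical helper, lags = 1 is an arbitrary choice, and the data are simulated. urca's ur.df() could run step 2, but its printed critical values are for an observed series, not for residuals from an estimated regression, so the statistic should be compared with Engle-Granger (MacKinnon) critical values instead, roughly -3.34 at 5% for two variables with a constant.

```r
# Engle-Granger two-step cointegration check (sketch, base R only).
# Step 1: OLS of y on x in levels; keep the residuals e_t.
# Step 2: ADF-style regression on the residuals, with no constant:
#   diff(e)_t = gamma * e_{t-1} + sum_{j=1..lags} phi_j * diff(e)_{t-j} + u_t
# Returns the t-statistic on gamma; more negative = stronger evidence of cointegration.
eg_stat <- function(y, x, lags = 1) {
  e <- residuals(lm(y ~ x))
  de <- diff(e)                        # de[k] = e[k + 1] - e[k]
  idx <- (lags + 1):length(de)         # drop the first `lags` rows (no lagged diffs there)
  # e[idx] is the lagged level e_{t-1} that goes with de[idx]
  dat <- data.frame(de = de[idx], e_lag = e[idx])
  for (j in seq_len(lags)) dat[[paste0("de_lag", j)]] <- de[idx - j]
  fit <- lm(de ~ 0 + ., data = dat)
  coef(summary(fit))["e_lag", "t value"]
}

# Usage on a simulated cointegrated pair (hypothetical data):
set.seed(123)
n <- 1000
x <- cumsum(rnorm(n))                                  # random walk
y <- 2 * x + as.numeric(arima.sim(list(ar = 0.5), n))  # stationary spread around 2 * x
stat <- eg_stat(y, x, lags = 1)
stat < -3.34   # approx. 5% Engle-Granger critical value (two variables, constant)
```

Adding a time trend to the step-1 regression is one way to handle the trend seen in the residuals, but it changes the critical values (they become more negative).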
R has the dlm package, which fits dynamic linear models. Hopefully someone who has used this package can comment on it, but for this example we will use a simple rolling OLS regression to estimate the hedge ratio.
Code:
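regression.window = 60
# daily log returns (ROC defaults to type = "continuous")
GLD.rets = ROC(GLD$GLD.Adjusted)
GDX.rets = ROC(GDX$GDX.Adjusted)
rets.df = na.omit(merge(GLD.rets, GDX.rets))
colnames(rets.df) = c("GLD", "GDX")
head(rets.df)
# right-aligned rolling OLS of GLD returns on GDX returns; each row is (intercept, slope)
# (lm() has no na.rm argument; NAs were already removed above)
mod.coef = rollapply(zoo(rets.df),
                     width = regression.window,
                     FUN = function(Z) {
                       fit = lm(GLD ~ GDX, data = as.data.frame(Z))
                       coef(fit)
                     },
                     by.column = FALSE, align = "right")
tail(mod.coef)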
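With one regressor plus an intercept, each window's OLS slope is simply cov(x, y) / var(x), so the rollapply() call above computes a ratio whose denominator is the variance of whichever series sits on the right-hand side of the formula. A base-R sketch of the same right-aligned rolling estimate (rolling_beta is a hypothetical helper; the returns are simulated, not GLD/GDX data):

```r
# Rolling OLS slope of y on x, computed as cov(x, y) / var(x) in each window.
# Matches the slope from lm(y ~ x) fitted on the same window.
rolling_beta <- function(y, x, width) {
  n <- length(y)
  out <- rep(NA_real_, n)              # first width - 1 values stay NA (right-aligned)
  for (i in width:n) {
    w <- (i - width + 1):i             # indices of the current window
    out[i] <- cov(x[w], y[w]) / var(x[w])
  }
  out
}

# Usage on hypothetical simulated daily returns:
set.seed(42)
n <- 300
gdx <- rnorm(n, sd = 0.02)
gld <- 0.4 * gdx + rnorm(n, sd = 0.01)
b <- rolling_beta(gld, gdx, width = 60)   # analogue of lm(GLD ~ GDX) inside rollapply
tail(b)
```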
The next part is where I am having the issue.
Since we are in return space, the current beta of 0.337 would mean that if I buy $10,000 worth of GLD, I sell short $3,370 worth of GDX?
If I flip GDX and GLD so that the formula is
lm(GDX ~ GLD) (instead of GLD ~ GDX)
our current beta is 2.03. Why is it not roughly 3 (the inverse of 0.337)?
Regressing GDX on GLD with a current beta of 2.03 would mean that if we are long $10,000 worth of GLD, we are short about $5,000 (10,000 / 2.03 ≈ 4,926) worth of GDX.
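For a one-regressor OLS with an intercept, the slope of y on x is cov / var(x) and the slope of x on y is cov / var(y), so their product is always the squared correlation, not 1. With the numbers above, 0.337 × 2.03 ≈ 0.684, implying a correlation of about 0.83 over the last 60-day window; the slopes are reciprocals only when the correlation is exactly ±1. Separately, under the usual minimum-variance criterion, the per-dollar hedge of a long GLD position with GDX is cov / var(GDX), i.e. the GLD ~ GDX slope. The sketch below checks both facts on simulated returns (hypothetical data, not GLD/GDX).

```r
# Product of the two OLS slopes equals the squared correlation (sketch, simulated data).
#   slope(gld ~ gdx) = cov / var(gdx),  slope(gdx ~ gld) = cov / var(gld)
set.seed(7)
n <- 250
gld <- rnorm(n, sd = 0.01)                  # hypothetical daily GLD returns
gdx <- 2 * gld + rnorm(n, sd = 0.015)       # hypothetical GDX returns, imperfectly correlated
b_gld_on_gdx <- unname(coef(lm(gld ~ gdx))[2])
b_gdx_on_gld <- unname(coef(lm(gdx ~ gld))[2])
rho2 <- cor(gld, gdx)^2

b_gld_on_gdx * b_gdx_on_gld   # equals rho2 (< 1 here)
1 / b_gld_on_gdx              # larger than b_gdx_on_gld whenever rho2 < 1 and cov > 0

# Per-dollar hedge ratios for a long GLD position, one from each regression:
h1 <- b_gld_on_gdx            # from gld ~ gdx (the $3,370 case above)
h2 <- 1 / b_gdx_on_gld        # inverting gdx ~ gld (the ~$5,000 case above)
notional <- 10000
c(short_gdx_h1 = notional * h1, short_gdx_h2 = notional * h2)

# h1 minimizes the sample variance of the hedged return gld - h * gdx:
c(var_h1 = var(gld - h1 * gdx), var_h2 = var(gld - h2 * gdx))
```

Note that h2 / h1 = 1 / rho2, which matches the ratio of the two dollar amounts above (about 4,926 / 3,370 ≈ 1 / 0.684).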
So, in short, I am looking to construct a simple pairs strategy (I will be venturing into the micro-cap ETF space) where I can easily estimate a decent hedge ratio. I will also be keeping an eye on idiosyncratic and hidden-factor risks.