Kevin, I have finished my first round of studying and will make my first attempt at a simple VECM model. I would greatly appreciate it if you could evaluate my work. Next will be SVECM and then basket cointegration. If you don't mind explaining it as if you were talking to a 10-year-old, that would really help solidify the information.
Here are some links I found very useful (the edx course was great).
https://courses.edx.org/courses/cou...9b5f9d8b832f4fa4837d09447db2dd2c/?child=first
https://rpubs.com/simasiami/384720
https://ses.library.usyd.edu.au/bit...d=FC2AFBEDBA9C129F1427C06F6F834248?sequence=1
https://stats.stackexchange.com/questions/tagged/vecm
https://stats.stackexchange.com/questions/tagged/var
https://cran.r-project.org/web/packages/vars/vignettes/vars.pdf
Going through the questions on Cross Validated really helped me. Richard does a great job of breaking VAR and VECM down.
You originally mentioned using SVECM, so hopefully after I get the okay from you on VECM we can move on.
On a side note, if I have two assets that are not cointegrated and non-stationary, I can't use VAR or VECM. So for pairs like KO/PEP or GDX/GLD, where trends and large structural shifts exist, should I completely avoid them?
Before I get into this I would also like to mention for future readers - This was a great exercise and I learnt a lot. The VAR family is applicable to many fields in finance. If any of you want my notes, send me a PM and I will share my Google doc with you.
For simplicity we are going to look at SPY/QQQ. The goal is to identify when the spread is too large, how fast it will mean revert, which is the error correcting leg and what our hedge ratio is.
Code:
# let's import the data
library(quantmod)
getSymbols(c("SPY", "QQQ"), from = "2010-01-01")
# work with the log of the adjusted closes
df = merge(log(SPY$SPY.Adjusted), log(QQQ$QQQ.Adjusted))
ts.plot(df, col = c("red", "blue"))
There is a bit of a trend in the residuals, but if we take a quick look at the spread using the tlsHedgeRatio function, there does seem to be a linear combination that turns the residuals into an I(0) process.
Code:
ksSpread = tlsHedgeRatio(df$SPY.Adjusted, df$QQQ.Adjusted)
plot(ksSpread$spread)
So right off the bat these assets seem cointegrated. Let's do a Johansen test just to make sure.
Code:
library(urca)
# Johansen trace test with a trend in the cointegrating relation
cointegration <- ca.jo(df, type = "trace", ecdet = "trend", spec = "transitory")
summary(cointegration)
cointegration@teststat
The test rejects r = 0 but not r <= 1, so it seems we have exactly one cointegrating relationship. r = 0 would mean no cointegration at all, while r = 2 (full rank) would mean both our assets were already I(0).
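Here is a minimal sketch of how I read the test output. It uses simulated data rather than SPY/QQQ, and the series `x`, `y` and their parameters are made up for illustration. Each statistic is compared with its critical values, starting at r = 0 and stopping at the first hypothesis that cannot be rejected.

```r
library(urca)

set.seed(1)
n <- 500
x <- cumsum(rnorm(n, sd = 0.01))            # simulated random walk, I(1)
y <- 0.5 + 0.8 * x + rnorm(n, sd = 0.005)   # built to be cointegrated with x
sim <- cbind(y = y, x = x)

jo <- ca.jo(sim, type = "trace", ecdet = "const", K = 2, spec = "transitory")

# test statistics next to the 10%, 5% and 1% critical values;
# a statistic above the critical value rejects that row's hypothesis
tab <- cbind(stat = jo@teststat, jo@cval)
print(tab)
```

Reading the table from the r = 0 row towards r <= 1 gives the estimated rank.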
Next we look for causality - Does SPY cause QQQ, and vice versa? For this we will use the Granger causality test.
To do this we first calculate the stock returns.
Code:
diff.spy = diff(df$SPY.Adjusted)
diff.qqq = diff(df$QQQ.Adjusted)
df.rets = na.omit(cbind(diff.spy, diff.qqq))
We then fit a VAR model. For the VAR model we allow the function to minimize the AIC score. On a side note, I am not 100% sure about the intuition on choosing between c("both", "trend", "const"). But I have chosen to use "both" here.
Code:
library(vars)
# lag order chosen by AIC, up to 8 lags
rets.var = VAR(df.rets, type = "both", lag.max = 8, ic = "AIC")
causality(rets.var, cause = "QQQ.Adjusted")$Granger
causality(rets.var, cause = "SPY.Adjusted")$Granger
It seems we cannot reject the null in either direction, so neither SPY nor QQQ Granger-causes the other.
Next we estimate our VAR model, take a look at the summary statistics, and build our VECM.
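Here is a small sketch of how the decision can be read off the test object. It uses two simulated, unrelated return series (`a` and `b` are made-up names), so treat it as an illustration only. The null hypothesis is that the `cause` variable does not Granger-cause the other series, and a small p-value rejects it.

```r
library(vars)

set.seed(3)
# two independent simulated return series
rets <- matrix(rnorm(1000, sd = 0.01), ncol = 2,
               dimnames = list(NULL, c("a", "b")))

fit <- VAR(rets, type = "const", lag.max = 8, ic = "AIC")

g <- causality(fit, cause = "a")$Granger
print(g$p.value)          # p-value of the F-type test
print(g$p.value < 0.05)   # TRUE would mean we reject "a does not cause b"
```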
Code:
#here we let the model choose the optimal lag length. We want to minimize the AIC so we choose n = 2
VARselect(df, lag.max = 8, type = "both")$selection
#Kevin could you give some intuition on the difference between type = "eigen" and type = "trace"? I have only seen math formulas for the reasoning and I can't get my head around it.
#next we build vecm with lag length 2
var1 = ca.jo(df, K = 2, type = "eigen",
ecdet = "const", spec = "transitory")
#The VECM spits out only 1 lag here because VECM must have 1 less lag than VAR
vecm = cajorls(var1)
summary(vecm$rlm)
#since neither SPY nor QQQ causes the other, I did not add any restrictions. What do you think?
#should I be adding any restrictions to this model?
v1.VAR = vec2var(var1)
#let's make a forecast for 10 days ahead and plot
v1.VAR.fcst = predict(v1.VAR, n.ahead = 10)
Voila! We have our hedge ratio and constant! I am using ect1 for this, so our hedge ratio if we go long SPY is short .7762 QQQ +1.6. Kev, what's the intuition here? Is this dollar weighted or stock weighted? If it is not dollar weighted, I am not too sure how to interpret the constant coefficient for my hedge ratio.
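For reference, here is a minimal sketch of where these numbers come from and how the spread can be assembled from them. It uses simulated data (the series and the true coefficients 0.8 and 0.5 are made up), so the printed values will not match the SPY/QQQ ones. The weights apply to whatever units go into the model, which here are log prices.

```r
library(urca)

set.seed(42)
n <- 500
x <- cumsum(rnorm(n, sd = 0.01))            # simulated I(1) log price
y <- 0.5 + 0.8 * x + rnorm(n, sd = 0.005)   # cointegrated partner
sim <- cbind(y = y, x = x)

jo  <- ca.jo(sim, type = "eigen", ecdet = "const", K = 2, spec = "transitory")
fit <- cajorls(jo, r = 1)

# cointegrating vector, normalised so the first series has weight 1;
# rows are the two lagged levels followed by the constant
beta <- fit$beta
print(beta)

# spread = 1 * y + beta[2] * x + beta[3]
spread <- cbind(sim, 1) %*% beta
plot(spread, type = "l")
```

With these simulated inputs the second weight should come out close to -0.8, which is the "short" leg of the hedge.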
Here is how well our rlm does.
And here is our (zoomed in) forecast for 10 days ahead.
The main areas where I currently need some help are:
1) My code (am I writing it correctly?)
2) Where do I find the error correcting constant in the VECM output? Once I find it, how do I tell which leg is more likely to correct (assuming neither is 0, or is one always 0)?
3) I would like to forecast the spread; however, it seems I am forecasting each leg on its own. Does the vars package have a function where I can graph the spread with its forecast, or do I have to construct the spread using the coefficients provided above and run an ARMA or ARIMA model?
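To make questions 2 and 3 concrete, here is a rough sketch of what I have in mind, again on simulated data with made-up parameters. It assumes the `ect1` row of the restricted regression holds the adjustment coefficients, and that a point forecast of the spread can be formed by combining the per-leg forecasts with the cointegrating vector. I am not sure this is the right way to do it.

```r
library(urca)
library(vars)

set.seed(7)
n <- 500
x <- cumsum(rnorm(n, sd = 0.01))            # simulated I(1) log price
y <- 0.5 + 0.8 * x + rnorm(n, sd = 0.005)   # cointegrated partner
sim <- cbind(y = y, x = x)

jo  <- ca.jo(sim, type = "eigen", ecdet = "const", K = 2, spec = "transitory")
fit <- cajorls(jo, r = 1)

# question 2: coefficient on ect1 in each equation
alpha <- coef(fit$rlm)["ect1", ]
print(alpha)

# question 3: forecast each leg in levels, then combine with beta
fc   <- predict(vec2var(jo, r = 1), n.ahead = 10)
legs <- sapply(fc$fcst, function(m) m[, "fcst"])   # 10 x 2 point forecasts
spread_fc <- cbind(legs, 1) %*% fit$beta
print(spread_fc)
```

In this simulation `y` is built to do the correcting, so its coefficient should be the larger one in absolute value.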
Sorry I have all these questions for you Kevin, but I am not comfortable taking advice about this stuff from my professors. Last time I asked about vol being in backwardation going into earnings, I got the answer "because farther dated options are less liquid and most of the demand is for the near dated options".