Hey everyone, I thought ET might do well with a stats/betting thread. Although Cross Validated is very helpful in answering my questions, data scientists outside finance sometimes have a hard time connecting the two fields. We are lucky to have some very smart people on this forum from whom I and many others would love to learn. For obvious reasons, I will be hiding some of the variables used in my models going forward.
A couple of days ago I came across an interesting variable that does a decent job of predicting the ratio of 1-month implied vol to the vol the market actually realized over the following 30 days (SPX), i.e. log(IV t0/RV t1). The variable is interest rate swap vol (SRVIX). Here is the first model. The data runs from 2012 through yesterday.
iv_rv = log(IV t0/RV t1)
TenYearVol = SRVIX.Index
iv_rv ~ TenYearVol
Plot1 = regular graph
Plot2 = residuals
Plot3 = QQ plot
Plot4 = summary
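For anyone who wants to reproduce the setup, here is a rough sketch of how the target and the first regression could be built. It is not my actual code: the function names, the 21-trading-day window standing in for "30 days", the zero-mean realized vol estimator, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def forward_realized_vol(close, window=21):
    """Annualized realized vol (in percent) over the NEXT `window` trading days.

    Assumption: 21 trading days approximates 30 calendar days, and the
    zero-mean estimator sqrt(252 * mean(r^2)) is used.
    """
    close = np.asarray(close, dtype=float)
    log_ret = np.diff(np.log(close))  # log_ret[i] = return from day i to day i+1
    out = np.full(len(close), np.nan)
    for t in range(len(close) - window):
        out[t] = 100.0 * np.sqrt(252.0 * np.mean(log_ret[t:t + window] ** 2))
    return out

def log_iv_rv(iv, close, window=21):
    """Target: log(IV at t0 / RV realized over the following window)."""
    rv = forward_realized_vol(close, window)
    return np.log(np.asarray(iv, dtype=float) / rv)

def ols_fit(x, y):
    """Plain least squares of y on [1, x]; returns coefficients, residuals, R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = ~np.isnan(x) & ~np.isnan(y)  # drop the tail where RV is not yet known
    X = np.column_stack([np.ones(mask.sum()), x[mask]])
    beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
    resid = y[mask] - X @ beta
    ss_tot = np.sum((y[mask] - y[mask].mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    return beta, resid, r2

# Synthetic stand-ins for SPX closes, 1-month IV, and SRVIX (NOT real data).
rng = np.random.default_rng(0)
spx_close = 2000.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 500)))
implied_vol = rng.uniform(12.0, 25.0, 500)
ten_year_vol = rng.uniform(60.0, 100.0, 500)

iv_rv = log_iv_rv(implied_vol, spx_close)
beta, resid, r2 = ols_fit(ten_year_vol, iv_rv)
print("intercept, slope:", beta, "R^2:", r2)
```

The residuals returned by `ols_fit` are what you would feed into the residual and QQ plots. Note that the target at t0 uses returns after t0, so the last `window` rows have no value yet.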
From the residuals we can see a lot of heteroskedasticity, and the QQ plot shows heavy tails, so the residuals are not normally distributed. So I tried 2 different approaches: the first was a Box-Cox transformation, the second was a generalized linear model with a gamma distribution. The gamma model was the better fit, so I ended up going with that. Here are the stats. We also have a quasi R^2 of .20 (1 - residual deviance/null deviance).
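To make the quasi R^2 concrete, here is a minimal sketch of a gamma GLM fitted by iteratively reweighted least squares, with the deviance-based R^2 computed at the end. Several things here are assumptions and not taken from my actual model: the log link, the function names, and the synthetic data. A gamma response must also be strictly positive, so the sketch assumes a positive response such as the raw ratio IV/RV instead of its log.

```python
import numpy as np

def gamma_deviance(y, mu):
    """Gamma deviance: 2 * sum((y - mu)/mu - log(y/mu))."""
    return 2.0 * np.sum((y - mu) / mu - np.log(y / mu))

def fit_gamma_glm_log_link(x, y, n_iter=50, tol=1e-10):
    """Gamma GLM with log link (an assumed link choice), fitted by IRLS.

    With a log link the IRLS weights are constant, so each step is an
    ordinary least-squares fit to the working response z.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("gamma response must be strictly positive")
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)  # starting values
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, np.exp(X @ beta)

def deviance_r2(y, mu):
    """Quasi R^2 = 1 - residual deviance / null deviance."""
    y = np.asarray(y, dtype=float)
    null_mu = np.full_like(y, y.mean())  # intercept-only fit is the sample mean
    return 1.0 - gamma_deviance(y, mu) / gamma_deviance(y, null_mu)

# Synthetic example (NOT real data): hypothetical SRVIX levels and IV/RV ratios.
rng = np.random.default_rng(1)
ten_year_vol = rng.uniform(40.0, 120.0, 500)
true_mu = np.exp(0.1 + 0.004 * ten_year_vol)
iv_over_rv = rng.gamma(shape=4.0, scale=true_mu / 4.0)

beta, mu_hat = fit_gamma_glm_log_link(ten_year_vol, iv_over_rv)
print("coefficients:", beta, "quasi R^2:", deviance_r2(iv_over_rv, mu_hat))
```

If you fit this in a stats package instead, check which link it defaults to; the coefficients are not comparable across links even when the deviance R^2 is similar.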
What do you guys think? Is this tradeable? Maybe not enough data?
For those interested, I also added a dummy variable (1 = SPX above its SMA50, 0 = SPX below its SMA50); it only marginally increased the R^2.
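The SMA50 dummy can be sketched roughly like this. The helper names are made up for illustration, and the sketch assumes the dummy is built from daily closes using only data up to and including t0.

```python
import numpy as np

def sma(values, window=50):
    """Simple moving average; NaN until `window` observations are available."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) >= window:
        csum = np.cumsum(np.insert(values, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def above_sma_dummy(close, window=50):
    """1.0 where close > SMA(window), 0.0 otherwise, NaN while the SMA is undefined."""
    close = np.asarray(close, dtype=float)
    avg = sma(close, window)
    dummy = np.where(close > avg, 1.0, 0.0)
    dummy[np.isnan(avg)] = np.nan
    return dummy

def design_matrix(ten_year_vol, dummy):
    """Columns [1, TenYearVol, dummy], keeping only rows where the dummy is defined."""
    ten_year_vol = np.asarray(ten_year_vol, dtype=float)
    dummy = np.asarray(dummy, dtype=float)
    keep = ~np.isnan(dummy)
    X = np.column_stack([np.ones(keep.sum()), ten_year_vol[keep], dummy[keep]])
    return X, keep

# Synthetic example (NOT real data).
rng = np.random.default_rng(2)
spx_close = 2000.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 300)))
ten_year_vol = rng.uniform(60.0, 100.0, 300)

dummy = above_sma_dummy(spx_close)
X, keep = design_matrix(ten_year_vol, dummy)
print("rows kept:", X.shape[0], "share above SMA50:", X[:, 2].mean())
```

The `keep` mask is returned so the response can be filtered to the same rows before fitting.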
