Overfitting Jump Candidates in Vol Trading

TheBigShort · Mar 26, 2019

Hi all,

I have been modelling the effect economic indicators/earnings have on equities. The equities I am looking at are not the ones in the limelight during the event. An example (I currently have a trade on) is how DHI will react to LEN's earnings announcement. Another example is how a company will react to an economic release (currently long MA vol into GDP numbers).

My process certainly overfits the data, especially for the economic indicator trades, and I was wondering if I could get some advice on how to reduce the bias. Once I get my data into a data frame, I work through each ticker in the top/bottom percentile to find the best candidate for the trade (time consuming and definitely not efficient).
On a side note, if anyone knows of a cheaper source than Trading Economics please lmk.

Process for earnings: This is mostly done on Bberg. I look at the supply chain of each company and scan for the highest $ relationship divided by the supplier's/customer's mkt cap. I also use peers. Once a list of stocks is created, I compare how much the stocks move on the target company's earnings dates vs how much they move on the target company's NON-earnings dates. Any interesting candidates get looked into further. The problem lies in how significant the move really is. I'll post code and the simple math down below.
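One way to put a number on "how significant the move really is" is a permutation test: compare the event vs non-event log ratio with the same ratio computed on randomly chosen dates. The sketch below is only an illustration under stated assumptions. The returns are simulated, and `event.idx` and `perm.test` are hypothetical names that do not come from the code in this post.

```r
## Sketch: is the event-day move bigger than what random days would give?
## All data are simulated; event.idx and perm.test are hypothetical names.
set.seed(1)
n.days = 500
abs.ret = abs(rnorm(n.days, mean = 0, sd = 0.015))  ## simulated |daily returns| with no real event effect
event.idx = sort(sample(n.days, 8))                 ## pretend these rows are 8 earnings dates of the target

perm.test = function(abs.ret, event.idx, n.perm = 5000) {
  ## same statistic as the screen: log(event mean / non-event mean)
  observed = log(mean(abs.ret[event.idx]) / mean(abs.ret[-event.idx]))
  k = length(event.idx)
  null.stats = replicate(n.perm, {
    fake = sample(length(abs.ret), k)               ## k random "event" days
    log(mean(abs.ret[fake]) / mean(abs.ret[-fake]))
  })
  ## one-sided p-value: share of random date sets with a ratio at least as large
  p.value = (1 + sum(null.stats >= observed)) / (1 + n.perm)
  list(observed = observed, p.value = p.value)
}

res = perm.test(abs.ret, event.idx)
print(res)
```

With real data, `abs.ret` would be one column of the absolute return series and `event.idx` the rows matching the target's earnings dates. With only a handful of event dates the p-value will be coarse, which is itself a useful warning.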

Process for Econ Indicators: This is really biased and probably my biggest issue. We have GDP numbers this week, and what I have done is scan for the stocks that move the most on GDP announcement days. I use the mean, median and STD of the returns. I am currently using the free data (last 4 quarters) at Trading Economics.

The code is in R, but I will write comments after ## so you can follow along. This is screening for trades around economic releases. I am only using the 4 historical dates from Trading Economics (I usually use Bberg, but I will not have access to it forever).

Code:

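library(rvest)     ## read_html, html_nodes, html_table (also re-exports %>%)
library(stringr)   ## str_replace
library(quantmod)  ## getSymbols, Ad, ROC

##Get all unique tickers from the S&P 500, the Nasdaq 100 and the names that have weeklies. But let's get rid of BRK/B.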
get.tickers = function(){
 
  spx.url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
  nas.url = "https://en.wikipedia.org/wiki/NASDAQ-100"
  cboe.url = "http://www.cboe.com/products/weeklys-options/available-weeklys"
 
  table.s = read_html(spx.url)%>%html_nodes("table")%>%.[1]%>%html_table(fill = T)
  spx.ticker = as.data.frame(table.s)$Symbol
 
  table.n = read_html(nas.url)%>%html_nodes("table")%>%.[3]%>%html_table(fill=T)
  nas100 = as.data.frame(table.n)$Ticker
 
  table.w = as.data.frame(read_html(cboe.url)%>%html_nodes("table")%>%.[5]%>%html_table(fill = T))
  colnames(table.w) = table.w[1, ] # the first row will be the header
  table.w = table.w[-1, ]     
  tickers = str_replace(table.w$Ticker, '\\*', '')
  tickers = tickers[1:which(tickers == "ZTS")]
 
  all.tickers = unique(c(tickers, spx.ticker, nas100))
  return(all.tickers)
}

tickers = get.tickers()
tickers = tickers[tickers != "BRK/B"]  ## logical index: safe even when there is no match

Code:

##GDP QoQ dates
dates.gdp = c("2018-10-26", "2018-11-28", "2018-12-21", "2019-02-28")

##Get the stock price data and returns for all dates since the start of 2018 (the first GDP date is in 2018), then pick out the GDP release days.
##I am not using parallel programming for this call because I cannot get the data into an environment/list if I do.

hub = new.env()
lapply(tickers, getSymbols, from = "2018-01-01", env = hub)

adjusted = lapply(hub, Ad)
returns = do.call(merge,  lapply(adjusted, ROC))
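abs.returns = na.omit(abs(returns))  ## note: drops every date on which any ticker has an NA
is.event = index(abs.returns) %in% as.Date(dates.gdp)
##Non-event stats exclude the GDP days so the baseline is not contaminated by the event itself
non.event.mean = apply(abs.returns[!is.event, ], 2, mean)
non.event.median = apply(abs.returns[!is.event, ], 2, median)
event.mean = apply(abs.returns[is.event, ], 2, mean)
event.median = apply(abs.returns[is.event, ], 2, median)
sd.event.mean = apply(abs.returns[is.event, ], 2, sd)  ## SD of the event-day absolute returns
##Log ratio of event-day to non-event-day absolute moves; > 0 means bigger moves on GDP days
median.dif = log(event.median/non.event.median)
mean.dif = log(event.mean/non.event.mean)
all.data = as.data.frame(cbind(median.dif, mean.dif, sd.event.mean))
View(all.data)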

Here are the top long vol trades for the GDP announcement. Then I need to make sure none of the stocks had earnings on the GDP dates, etc. You can see this becomes daunting. Any help here would be MUCH appreciated, thank you!
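The earnings-overlap check can be automated once an earnings calendar is available as a table. The following is a minimal sketch, assuming a data frame with one row per ticker and earnings date. The tickers "AAA", "BBB", "CCC" and their dates are made up for illustration, and `clashes` and `window.days` are hypothetical names.

```r
## GDP release dates from the screen above
dates.gdp = as.Date(c("2018-10-26", "2018-11-28", "2018-12-21", "2019-02-28"))

## Hypothetical earnings calendar; tickers and dates are made up for illustration
earnings = data.frame(
  ticker = c("AAA", "AAA", "BBB", "CCC"),
  date = as.Date(c("2018-10-26", "2019-01-25", "2018-11-02", "2019-02-27")),
  stringsAsFactors = FALSE
)
candidates = c("AAA", "BBB", "CCC")
window.days = 1  ## also flag earnings within one calendar day of a release

## TRUE if any earnings date of the ticker falls within window.days of a GDP date
clashes = function(tk) {
  e = earnings$date[earnings$ticker == tk]
  if (length(e) == 0) return(FALSE)
  any(abs(outer(as.numeric(e), as.numeric(dates.gdp), "-")) <= window.days)
}

contaminated = candidates[sapply(candidates, clashes)]
clean = setdiff(candidates, contaminated)
print(contaminated)
print(clean)
```

A stricter variant would drop only the contaminated dates for a ticker rather than the whole ticker, although with four event dates that leaves very little data.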

srinir · Mar 27, 2019

Trading Economics is what, $200/mo for Excel download or $400/mo for Python or R download?

That seems expensive. You get a lot more from MetaStock Xenith for $150/mo. I think you can use some of the Eikon API to access the data, but I am not so sure. Excel downloads do come with the $150/mo plan.

In my opinion, it is better to evaluate the impact of the economic release on sector ETFs than on individual securities. There is a lot more idiosyncratic risk in individual stocks, which makes the effect of the release harder to see. This is good enough for my use (from Bespoke)

tommcginnis · Mar 27, 2019

What I noticed long ago was that if the market didn't respond as I thought 'it should' to given upbeat or downbeat news, that 'we need to drop some' weight remained around -- maybe 4-5 days, to a decreasing degree. So, if a Reversion-to-Mean framework uses a pendulum as a handy mental rubric, the inertia that I sometimes observed in news flow changed that rubric to one that has informed my market assessments for years now: a brick-on-a-spring -- which lurches about, soaks up inertia, pauses at unwelcome moments, and then lets fly -- *eventually* taking the same path as a fixed *pendulum* concept.

So, how to code a pendulum-on-a-string? Lagged variables. And more, lagged via an EMA structure, such that the 'effect' under study diminishes dramatically over time. An error-reducing function (or likelihood-maximizing function) would fit right into any ML framework you can think of.
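One possible reading of "lagged via an EMA structure" in R is sketched below, assuming a daily series of news surprises: build a single regressor that is an exponentially weighted average of the last few days' surprises, then fit it against returns. The helper `ema.lag`, the decay of 0.5, the 5-day window and the simulated data are all assumptions made for illustration.

```r
## Exponentially weighted average of the previous n.lags values of x.
## Lag 1 gets the largest weight; each older lag is multiplied by `decay`.
ema.lag = function(x, n.lags = 5, decay = 0.5) {
  w = decay^(0:(n.lags - 1))
  w = w / sum(w)
  ## sides = 1 weights x[t], x[t-1], ...; shifting by one day keeps only past values
  smoothed = stats::filter(x, w, sides = 1)
  c(NA, head(as.numeric(smoothed), -1))
}

set.seed(7)
n = 400
surprise = rnorm(n)                       ## simulated news surprise (hypothetical input)
lagged = ema.lag(surprise, n.lags = 5, decay = 0.5)
ret = 0.3 * lagged + rnorm(n, sd = 0.5)   ## simulated return that reacts to old news with a delay

fit = lm(ret ~ lagged)                    ## lm drops the leading NA rows by default
print(coef(fit))
```

The decay and window length are free parameters. Choosing them by minimizing out-of-sample error is where the "error-reducing function" would come in, and it is also another place where overfitting can creep in.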

But if "the market" (as captured by major indices) reliably moves individual equities by 65%-80%, then tracking the *expected* movement of the indexes, and where it stalls and lurches, is important.

A thought, anyway....

sle · Mar 27, 2019

TheBigShort said:
View attachment 199612

Definitely looks spurious. It might not be overfit, might be just a result of a family-wise error - do you correct for that?
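To illustrate the family-wise error point: when hundreds of tickers are screened on four dates, some will look like big movers by chance alone. The sketch below simulates returns with no event effect at all, runs one test per ticker, then applies `p.adjust` from base R. The one-sided Wilcoxon test and all variable names are assumptions for illustration, not anything used earlier in this thread.

```r
set.seed(42)
n.tickers = 600
n.days = 300
n.events = 4

## Simulated |returns| with NO event effect: every apparent "hit" is a false positive
abs.ret = matrix(abs(rnorm(n.days * n.tickers, sd = 0.015)), nrow = n.days)
event.rows = sample(n.days, n.events)

## One-sided rank test per ticker: are event-day moves larger than other days?
p.raw = apply(abs.ret, 2, function(x)
  wilcox.test(x[event.rows], x[-event.rows], alternative = "greater")$p.value)

p.holm = p.adjust(p.raw, method = "holm")  ## controls the family-wise error rate
p.bh = p.adjust(p.raw, method = "BH")      ## controls the false discovery rate

n.raw = sum(p.raw < 0.05)    ## tickers that look significant without correction
n.holm = sum(p.holm < 0.05)  ## tickers that survive the Holm correction
print(c(raw = n.raw, holm = n.holm, bh = sum(p.bh < 0.05)))
```

Sorting by the raw statistic and taking the top of the list, as in the screen above, amounts to reading off the smallest uncorrected p-values, so the correction has to account for every ticker that was scanned, not just the ones that made the table.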

TheBigShort · Mar 28, 2019

srinir said:
Trading Economics is what, $200/mo for Excel download or $400/mo for Python or R download?

$75 a month using Quandl. I originally took a look at the major ETFs, but did not find anything interesting. Have you read a paper that talks about how indices react to different events? I love that table, I'll probably build something similar. Thanks!

tommcginnis said:
What I noticed long ago was that if the market didn't respond as I thought 'it should' to given upbeat or downbeat news, that 'we need to drop some' weight remained around -- maybe 4-5 days, to a decreasing degree. So, if a Reversion-to-Mean framework uses a pendulum as a handy mental rubric, the inertia that I sometimes observed in news flow changed that rubric to one that has informed my market assessments for years now: a brick-on-a-spring -- which lurches about, soaks up inertia, pauses at unwelcome moments, and then lets fly -- *eventually* taking the same path as a fixed *pendulum* concept.

So, how to code a pendulum-on-a-string? Lagged variables. And more, lagged via an EMA structure, such that the 'effect' under study diminishes dramatically over time. An error-reducing function (or likelihood-maximizing function) would fit right into any ML framework you can think of.

But if "the market" (as captured by major indices) reliably moves individual equities by 65%-80%, then tracking the *expected* movement of the indexes, and where it stalls and lurches, is important.

A thought, anyway....

Tom, I love your enthusiasm and I am sure your ideas are great, but I always have a hard time understanding what you write. If possible, when you're responding to my questions, could you reduce the amount of *#&-- and maybe use regular terms rather than metaphors? I am not a very smart guy, so you would be doing me a favour.

sle said:
Definitely looks spurious. It might not be overfit, might be just a result of a family-wise error - do you correct for that?

Yea, no doubt. MA moved the least in the whole S&P today LOL. I have not looked into family-wise error, but it has gone to the top of my list. What is your opinion on the table @srinir posted above? Do you have something similar on your desk?

sle · Mar 28, 2019

TheBigShort said:
$75 a month using Quandl. I originally took a look at the major ETFs, but did not find anything interesting. Have you read a paper that talks about how indices react to different events?

What paper is that?

TheBigShort said:
Yea, no doubt. MA moved the least in the whole S&P today LOL. I have not looked into family-wise error, but it has gone to the top of my list. What is your opinion on the table @srinir posted above? Do you have something similar on your desk?

I have some event studies that I use, but in my experience the statistical significance of these things is relatively low. My prior would be that at any given point in time you get a different "market setup" and the resulting reaction could be different. Some of the more obvious things like "buy risk premia before everyone else starts buying them into the numbers" take a lot of tinkering to iron out.

djames · Mar 29, 2019

+1 for what paper is that?

TheBigShort · Mar 29, 2019

I was asking @srinir if he has read any papers on how indices move during these events.

srinir · Mar 29, 2019

TheBigShort said:
I was asking @srinir if he has read any papers on how indices move during these events.

Sorry, didn't realize there was a question for me. No, I haven't read any papers. I remember @globalarbtrader mentioned in his thread that his studies found negligible effects from macro data.

Doobs789 · Apr 2, 2019

The RCH Stock Market Excel plugin has a lot of data from numerous providers. Not sure if it will help, but it's free.

http://ogres-crypt.com/SMF/