Joe Doaks' Data Analysis

Equalizer · Jan 22, 2008

Hypo, differentiating or differencing? Surely you imply the latter.

Joe Doaks · Jan 22, 2008

With discrete (not continuous) data, the same thing. Call it sample to sample change if you like. Keep me dishonest.

infolode · Jan 22, 2008

Quote from Joe Doaks:

I cannot decide whether nobody gives a shit about what I am trying to show you, or if you are just too innumerate to understand. I suspect the latter. But for the sake of closure I shall continue.

When you have two variables which you suspect may be related to each other by causation, or to a third unknown variable, a common approach is to perform a cross-correlation of the two. This consists of leading or lagging one of the variables and calculating the sum of the products of the overlapped series. If there is causation, and the processes have been sampled at the correct interval, lags or leads by only a few samples will suggest whether or not they might be related. Attached is the cross-correlation of A (the slowly varying variable) and B (the fast varying variable). Negative shifts reveal whether variable B might lead variable A and therefore possibly cause its variation. Positive shifts investigate whether A might lead B. The numerical values on the left are so low as to likely be random results, and it is unlikely that changes in B cause changes in A. Values on the right are higher, but not sufficiently so to suggest that A is a strong causative factor in changes in B. Thinking that the two series might be oversampled, I tried cross-correlation at intervals of five samples. I found moderate correlations on the order of +0.5 in a broad range around 200 shifts, but the broadness of the correlation suggests only that there is some cyclicality in the two series. So there is no evidence of correlation between variable A and variable B.

Perhaps you could apply frequency doubling to this signature.

Joe Doaks · Jan 22, 2008

Very insightful! In fact, given that variable B never goes negative, to compare the difference data for A (which WILL have negative samples) we must in effect frequency double the difference values by taking their absolute value, in effect full-wave rectifying. I am several steps ahead of my posting to make sure I make no mis-steps. I thank you for your comment. And it is fun for me that you see where I am going. Do you already know the answer? Half of ET will believe it and the other half won't, so in the end I will have accomplished nothing but having a little mathemagical fun with the great innumerate unwashed.
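For anyone following along, the differencing and full-wave rectification described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample prices are made up, and it is not the spreadsheet actually used in this thread.

```python
def differences(series):
    """Sample-to-sample change: x[t] - x[t-1] for each adjacent pair."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def rectify(series):
    """Full-wave rectification: replace every sample by its absolute value."""
    return [abs(x) for x in series]


# Hypothetical prices, chosen only to show positive and negative changes.
prices = [100.0, 101.5, 101.0, 103.0, 102.0]

changes = differences(prices)   # signed changes, one fewer sample than prices
rectified = rectify(changes)    # non-negative, comparable to a never-negative series
```

Differencing shortens the series by one sample, so the rectified changes would need to be aligned with the second sample onward of the other variable before comparing the two.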

DaemonTrader · Jan 23, 2008

Ok, I'll bite.

The chart of lagged correlations between variable A (Price) and variable B (Volume) confused me. So much so, that I decided to recreate your analysis.

For the life of me, my brain can't comprehend why the values have such an orderly increase from lags 0 to 30.

As our lag window increases, why should price level become a better predictor of volume? A price level 30 periods in the future predicts volume better than one 10 periods in the future? WTF?

I suppose price level would have some predictive value, however I'd expect to see much greater noise in the correlations from lags 0 to 30.

Perhaps it is because when we lag a time series, we are excluding a portion of the data? That portion happens to be the beginning part of the day when the volatility is greatest...
Care to enlighten me? I need to brush up on my statistics.

Thanks,

DaemonTrader · Jan 23, 2008

Excel file for anyone interested...

Joe Doaks · Jan 23, 2008

You're giving me the DT's, DT. I'll have to get back to you, setting up for trading right now. Thanks for posting!

Joe Doaks · Jan 23, 2008

DT, you done broke de code! (Very old joke, called Four Roses, whose punch line is "Dass a likker, ain't it?")

I believe you have correctly identified the mystery data as being the January 18th minute-by-minute closing price and volume data of the NASDAQ 100 CME E-Mini future with March 2008 expiration, known to some as NQ H8.

That information sheds valuable light on my analysis consulting assignment. I'll get back to you!

Joe Doaks · Jan 23, 2008

DT, there are several answers to the questions you pose:

- as the analyses show, it is nearly meaningless to analyze pure price and volume, because what is important is sample-to-sample price change (you could look at volume change, but that's a waste of time, been there)

- knowing that the sample interval is a minute, and recognizing that 30 minutes shift either way in time should be enough to show a tradeable causality, the correlation results clearly are too weak to suggest causality (I did not say, but I normalized both series to a maximum value of 1 for numerical convenience, so when you see 0.3 correlation, it is at best only slightly better than random)

- analysis at shifts of five minute increments shows only a broad cyclicality for this data set

- in pure price data, there is only the merest hint that price might lead volume, but a near certainty that it cannot be the other way 'round.
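A rough Python sketch of the shifted sum-of-products calculation discussed in this thread, for readers who want to try it on their own data. Several things here are assumptions and not the method actually used above: the function names are hypothetical, the sum of products is divided by the number of overlapping samples so that different shifts are comparable, and the two short series are toy data. A textbook correlation coefficient would also subtract each series' mean first, which this sketch does not do.

```python
def normalize_to_unit_max(series):
    """Scale a series so its largest absolute value is 1."""
    peak = max(abs(x) for x in series)
    return [x / peak for x in series] if peak else list(series)


def lagged_product_mean(a, b, shift):
    """Mean of a[t] * b[t + shift] over the samples where both exist.

    Positive shift pairs A with later B (does A lead B?).
    Negative shift pairs A with earlier B (does B lead A?).
    Dividing by the overlap length is an assumption of this sketch.
    """
    n = min(len(a), len(b))
    if shift >= 0:
        pairs = zip(a[:n - shift], b[shift:n])
    else:
        pairs = zip(a[-shift:n], b[:n + shift])
    products = [x * y for x, y in pairs]
    return sum(products) / len(products) if products else 0.0


# Toy data: a single pulse in A at sample 1, echoed in B two samples later.
series_a = normalize_to_unit_max([0.0, 5.0, 0.0, 0.0, 0.0])
series_b = normalize_to_unit_max([0.0, 0.0, 0.0, 2.0, 0.0])

scan = {k: lagged_product_mean(series_a, series_b, k) for k in range(-3, 4)}
best_shift = max(scan, key=scan.get)
```

Note that the overlap shrinks as the shift grows, so each value in the scan is computed from a slightly different stretch of the data. With a real one-day series that means large shifts drop samples from one end of the session.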

Price change and volume correlations to come.

(A hint to impatient readers: it's really not worth hanging in here. If there were an edge I certainly wouldn't post it. My objectives are purely iconoclastic. Iconoplastic? For plastic icons?)

Joe Doaks · Jan 23, 2008

Attached is the distribution of the bar-to-bar change in the close. The mean is -0.029, the standard deviation is 1.985, the kurtosis is 0.985, and the skewness is 0.371. That is about as close to a Gaussian (normal) distribution as you get with real-world data. So for this sample set, price change is approximately normally distributed. That is not to say necessarily random. On ET the only thing truly random is opinions about market direction. Cross correlation of price change with volume to come.
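The four summary statistics quoted above can be computed along these lines. This is a sketch under stated assumptions: it uses plain population moments and reports excess kurtosis (zero for a Gaussian), whereas the post does not say which formulas or software produced its figures, so values from a spreadsheet's sample-corrected functions may differ slightly. The function name and the sample data are hypothetical.

```python
import math


def summary_moments(values):
    """Return (mean, std, skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n   # variance
    m3 = sum(d ** 3 for d in deviations) / n   # third central moment
    m4 = sum(d ** 4 for d in deviations) / n   # fourth central moment
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0       # 0 for a normal distribution
    return mean, std, skewness, excess_kurtosis


# Hypothetical bar-to-bar changes, symmetric about zero for easy checking.
sample_changes = [-2.0, -1.0, 0.0, 1.0, 2.0]
mean, std, skewness, excess_kurtosis = summary_moments(sample_changes)
```

For a normal distribution both skewness and excess kurtosis are zero, so the further the computed values sit from zero, the weaker the case for treating the changes as Gaussian.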
