Can linear regression analysis really predict the future?

MAESTRO · Nov 16, 2009

Quote from raker:

I wonder if Maestro could help answer a question regarding the centre of gravity, or central tendency, of a data set: which one is a better measure, or are they all valid?

There are a few ways to measure the centre of a data set from which the standard deviation can be calculated.

1. Using the basic mean(average) of the data set.

2. A linear regression line can be used, which is basically a least-squares fit of the data set.

3. The midpoint of the data set (halfway between the high and low for that time of day), or the median.

There are others, but each is a valid measure of the centre or average of the data set, and each will give a different standard deviation calculation.
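To make the point about different centres giving different deviations concrete, here is a minimal sketch in plain Python. The price list is made up, the function names are hypothetical, and the "deviation" is taken as the root-mean-square distance from each centre line, which is one reasonable reading of the post rather than the poster's actual calculation.

```python
import math

def mean_center(ys):
    # Flat line at the arithmetic mean.
    m = sum(ys) / len(ys)
    return [m] * len(ys)

def regression_center(ys):
    # Ordinary least-squares line of ys against the bar index 0..n-1.
    n = len(ys)
    xs = range(n)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in xs]

def midpoint_center(ys):
    # Flat line halfway between the high and the low.
    mid = (max(ys) + min(ys)) / 2
    return [mid] * len(ys)

def deviation(ys, center):
    # Root-mean-square deviation of the data from the chosen centre line.
    return math.sqrt(sum((y - c) ** 2 for y, c in zip(ys, center)) / len(ys))

# Made-up prices, for illustration only.
prices = [100.0, 101.0, 102.5, 102.0, 104.0, 105.5, 105.0, 107.0]
for name, fn in [("mean", mean_center),
                 ("regression", regression_center),
                 ("midpoint", midpoint_center)]:
    print(name, round(deviation(prices, fn(prices)), 4))
```

On trending data the regression line hugs the series, so its deviation comes out much smaller than the deviation around the flat mean or midpoint. That is the practical reason the three centres do not give interchangeable bands.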

I used your example (Maestro) from a previous thread, where you pointed out that the time series can be made more Gaussian, or normally distributed, by calculating the 30-day average true range of the series, putting a 30-day linear regression line on it, and then measuring the deviations from it.

Along those lines, I have tried to normalize the data and make it more normally distributed by experimenting with intraday data from different time series, such as a basket of stocks above and below the current day's open, normalizing the data into a percentage, and then looking at the 10-, 20- and 30-day average ranges of the data set for a particular time of day (like a VWAP but without the volume).

I have created "variable period regression lines", "mean averages" and "midpoint of the day" measures of the time series and calculated the deviations from them.

It seems from empirical observation alone (not statistically tested) that the analogy of mass behaviour applies to the stock indexes intraday, such as the Dow 30 and the S&P 500: the flock of birds changing direction, or the metronome example where they all start moving in sync with each other. It happens when there is a confluence in the time series, that is, when price hits the standard deviation created by the regression line as well as the deviations created by the mean and the midpoint. When all these different central tendencies hit their deviations at the same time, it seems to create meaningful or large reversions. It is as if all the algorithms used for trading stocks on the buy and sell sides of the institutions see the same benchmark and act in unison, creating a kind of "herding effect of computer models".

It appears that you are on the right track. Now try to use the natural spline and measure the standard deviations from it. You will see that you have almost perfect Gaussian distribution. The next step is to make a decision what and when to follow: i.e. the mean or the regression to it.

raker · Nov 16, 2009

Thanks for replying, Maestro. I have been trying to improve my knowledge in this area by reading some of the papers from J. Doyne Farmer (Prediction Company) and my personal favourite, one of the fathers of stat arb or mean reversion, the original Ed Thorp.

I can't decide whether to play from the mean to the deviation or vice versa. I am trying to see if one can do both, but am finding it difficult to work out the amount of error around the mean.

As far as splines are concerned, at my stage of programming development I find the formula a bit too involved to calculate or program into my software. (I wish there was a way of showing a simplified calculation, such as the one for standard deviation on the Wikipedia website.)

MAESTRO · Nov 16, 2009

Quote from raker:

...I can't decide whether to play from the mean to the deviation or vice versa...

Try to incorporate the first derivative and use a threshold on it to switch from following the mean to mean reversion.
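A rough sketch of what such a switch could look like, under assumptions the post does not spell out: the "first derivative" is approximated by the bar-to-bar change in the centre line, a steep slope is read as "follow the mean", a flat slope as "mean reversion", and the threshold of 0.3 is an arbitrary illustrative value. The function names are hypothetical.

```python
def first_difference(center):
    # Discrete first derivative: change in the centre line from one bar to the next.
    return [b - a for a, b in zip(center, center[1:])]

def choose_mode(center, threshold):
    # One label per bar after the first:
    # 'follow' when the centre line is moving fast, 'revert' when it is nearly flat.
    return ["follow" if abs(d) > threshold else "revert"
            for d in first_difference(center)]

# Made-up centre line (for example a regression or spline value per bar).
center = [100.0, 100.1, 100.1, 100.6, 101.3, 101.4]
print(choose_mode(center, threshold=0.3))
```

In practice the threshold would have to be tuned and tested on the instrument in question; nothing in the thread says what value is appropriate.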

raker · Nov 16, 2009

Thanks for that, Maestro, I will look at it. Another observation I have made, and this only pertains to the stock indexes and the various baskets of stocks that make up the indexes, is that as price touches or nears various VWAP benchmarks or the mean of the day, the data gets very noisy. It is almost as if all the algorithms are seeing perceived value and are kicking off buy and sell orders around these areas, making the data noisier.

Again, this is just an empirical observation and not backed up with tests, but it seems a reasonable theory that the closer you are to the mean, or a 50/50 scenario, the more chance the data can go up or down.

One possible solution I am looking at is equal-range or momentum bars, which can eliminate a lot of the variance around these areas, as the standard deviation or variance of range bars is smaller than with time-based charts.
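For anyone unfamiliar with range bars, a simplified sketch of how they can be built from a price series follows. It assumes a bar closes as soon as its high minus low reaches the chosen range; real charting packages usually cap each bar at exactly the range and handle gaps differently, so treat this as an approximation. The function name and the sample prices are made up.

```python
def range_bars(prices, bar_range):
    # Group consecutive prices into bars that close once high - low >= bar_range.
    bars = []
    current = None
    for p in prices:
        if current is None:
            # Start a new bar at this price.
            current = {"open": p, "high": p, "low": p, "close": p}
        else:
            current["high"] = max(current["high"], p)
            current["low"] = min(current["low"], p)
            current["close"] = p
        if current["high"] - current["low"] >= bar_range:
            bars.append(current)
            current = None
    # An unfinished final bar is dropped in this sketch.
    return bars

# Made-up tick prices, for illustration only.
ticks = [100.0, 100.5, 101.0, 101.25, 100.25, 100.0, 99.0]
for bar in range_bars(ticks, bar_range=1.0):
    print(bar)
```

Because every completed bar spans roughly the same price range, quiet chop around the mean produces few bars while fast moves produce many, which is why the bar-to-bar variance looks smaller than on a time-based chart.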

Transatlantic99 · Nov 16, 2009

my 2 cents...

OLS regressions are heavily flawed BUT you will see certain forms of regressions used in certain endeavors...

* Cointegration: where we are more concerned with the stationarity of the residual series, which is an artifact of a regression amongst different instruments

* Ridge Regressions: Frequently mentioned in white papers and I believe more often used in the stat arb space or where the covariance matrix is quite large

With regard to the issue of time-varying parameters, there are methods to deal with this that are superior to standard OLS regressions...such as a Kalman filter overlay for example...
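As an illustration of the Kalman idea, here is a minimal scalar sketch that tracks a time-varying regression slope. It assumes a single regressor with no intercept, a slope that follows a random walk, and hand-picked noise variances q and r; all names and values are hypothetical and not taken from the post.

```python
def kalman_beta(xs, ys, q=1e-4, r=1.0, beta0=0.0, p0=1.0):
    # Model: beta_t = beta_{t-1} + w_t  (w ~ N(0, q)),  y_t = beta_t * x_t + v_t  (v ~ N(0, r)).
    beta, p = beta0, p0
    betas = []
    for x, y in zip(xs, ys):
        p = p + q                      # predict: slope uncertainty grows by q
        innovation = y - beta * x      # what the current slope failed to explain
        s = x * p * x + r              # innovation variance
        k = p * x / s                  # Kalman gain
        beta = beta + k * innovation   # update the slope estimate
        p = (1.0 - k * x) * p          # update the slope uncertainty
        betas.append(beta)
    return betas

# Made-up data: the true slope jumps from 2 to 3 halfway through.
xs = [float(i) for i in range(1, 51)]
ys = [2.0 * x if i < 25 else 3.0 * x for i, x in enumerate(xs)]
betas = kalman_beta(xs, ys)
print(round(betas[24], 3), round(betas[-1], 3))
```

A single OLS fit over the whole sample would return one blended slope, whereas the filter re-estimates the slope bar by bar. A larger q makes it adapt faster at the cost of a noisier estimate.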

Yisterwald · Nov 16, 2009

Quote from auspiv:

I've heard that price "gravitates" towards areas of peak volume and it tends to shy away from areas of low volume.

Heh -- I've heard that volume gravitates toward areas of attractive pricing, and it tends to shy away from areas of unattractive pricing.

daveb351 · Nov 16, 2009

Quote from Yisterwald:

Heh -- I've heard that volume gravitates toward areas of attractive pricing, and it tends to shy away from areas of unattractive pricing.

It's called Market Profile...

Craig66 · Nov 16, 2009

Quote from MAESTRO:

It appears that you are on the right track. Now try to use the natural spline and measure the standard deviations from it. You will see that you have almost perfect Gaussian distribution. The next step is to make a decision what and when to follow: i.e. the mean or the regression to it.

Hi MAESTRO,
I am still confused by the usage of the term 'natural spline'. Will a natural cubic spline not interpolate all the given points in a set? If so, how can one measure the deviations?
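The issue can be checked numerically. The sketch below builds a natural cubic spline (second derivative zero at both ends) in plain Python and fits it twice: once through every point, where the deviations are zero by construction, and once through every fourth point only, where the remaining points do deviate from the curve. Using a sparse set of knots is only one possible reading of the suggestion, not something MAESTRO confirms, and the prices are made up.

```python
import bisect

def natural_spline(xs, ys):
    # Return a function evaluating the natural cubic spline through (xs, ys).
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; both ends stay 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination on the tridiagonal system.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution for the interior second derivatives.
        m[n - 1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Locate the segment containing x, clamped to the first/last segment.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

# Made-up prices, for illustration only.
prices = [100.0, 100.8, 100.3, 101.5, 101.1, 102.4, 101.9,
          102.2, 103.0, 102.1, 103.4, 103.9, 103.2]
times = [float(t) for t in range(len(prices))]

full = natural_spline(times, prices)                # knots at every point
coarse = natural_spline(times[::4], prices[::4])    # knots at every 4th point

resid_full = [p - full(t) for t, p in zip(times, prices)]
resid_coarse = [p - coarse(t) for t, p in zip(times, prices)]
print(max(abs(r) for r in resid_full))
print([round(r, 3) for r in resid_coarse])
```

So deviations are only measurable if the curve is fitted to fewer points than the data contains, or if a smoothing spline with some penalty is used instead of a pure interpolating one.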

dtrader98 · Nov 16, 2009

Quote from dtrader98:
"Patterns are the fool's gold of the financial markets. The power of chance suffices to create spurious patterns that...for all the world appear predictable and bankable... They are the inevitable consequence of the human need to find patterns in the patternless." Benoit Mandelbrot

This over-tendency of the mind to fit spurious patterns is precisely why we need computational intelligence to help augment our decisions. Unlike human minds, which by their very nature attempt to arrive at subjective conclusions based on their ability to extrapolate and generalize, computer algorithms allow us to process all of the data in a more robust statistical manner.

For those of you expressing some interest in many of the ideas regarding pattern interpolation and behavioral bias, I highly suggest taking a look at this recent book. Very easy to digest, and many great gems embedded within (+ a bargain for the cost).

It's not a book about trading systems, but a very good book about psychology and risk perceptions.

trackstar · Nov 17, 2009

MAESTRO!!!! THANK YOU!!!

Best thread on ET ever.