Can linear regression analysis really predict the future?

Craig66 · Nov 13, 2009

Quote from dtrader98:
----------------------------
If posters are bored, here's something to play with.

Since many posters are looking at this from a mean reversion perspective, see if you can understand the reversion directional probability rule.

Take a ts generated from a Gaussian process (a large one, like 1000 steps).
Then count how many times the following occur:

a) p(t-1) < median AND sign(p(t)-p(t-1)) = POS

b) p(t-1) > median AND sign(p(t)-p(t-1))= NEG

Sum both results and divide by the total number of trials. Run it over and over. Does the directional indicator only give a 50% result? Remember, you are generating a probability rule based on random walk distribution.

Is the medium "the medium for the last n innovations"? If so then isn't the value of n going to heavily affect the result?

Here are some sample runs with 1000 innovations, median length 40, repeated 10x.

POS:24.75 NEG:25.75
POS:25.58 NEG:25.62
POS:26.37 NEG:24.77

Setting aside the possibility that I coded this incorrectly, I'm not exactly sure what this proves.

dtrader98 · Nov 13, 2009

Quote from Craig66:

Is the medium "the medium for the last n innovations"? If so then isn't the value of n going to heavily affect the result?

I try to be clear, but so far, I see I haven't been. Some things should be readily obvious to people who have familiarity with time series and statistics.

Couple of things.

1) The counts a and b, and the number of observations N (time values), should all come out positive, since a and b are counts of outcomes meeting the rules shown. So if someone gets a negative value for the result, there is something wrong with their interpretation or my description.

2) I wasn't too clear on how the ts sequence was generated. I never said it was a cumsum random walk. I simply said it was a random Gaussian value generated at each time instant (distribution parameters are fixed). You can think of each step as an innovation if you like, although in this case they are not innovations.

3) The series should be stationary by inspection. If not, something is wrong.

4) The median (not medium) is the median of the entire time series you run: 1000, 100000, 1 million, whatever; it doesn't affect the results. The more the better (central limit in action). That's the beauty.

Cheers.

Clue: what you are trying to verify is whether the frequency of occurrence of outcomes generated by your rules is greater or less than 50%, by how much, and whether the result applies to any randomly generated sequence of arbitrarily large length as described in the last 2 posts.
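A minimal sketch of the test as clarified in points 2) to 4) above, assuming independent standard Gaussian draws and a single median taken over the whole series (no windowing). The names `RuleFrequencies`, `series_median` and `run_rule`, and the seed parameter, are made up for illustration and do not come from any poster's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fraction of transitions on which each rule fired (hypothetical type).
struct RuleFrequencies {
    double below_then_up;    // rule a): p(t-1) < median and p(t) > p(t-1)
    double above_then_down;  // rule b): p(t-1) > median and p(t) < p(t-1)
};

// Median of the entire series. Takes a copy because std::nth_element
// reorders its input.
double series_median(std::vector<double> values) {
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (values.size() % 2 == 1) return upper;
    // Even length: average the two middle values.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return 0.5 * (lower + upper);
}

// Generate n i.i.d. N(0, 1) values (assumed parameters) and apply both
// rules to each of the n - 1 consecutive pairs.
RuleFrequencies run_rule(std::size_t n, unsigned seed) {
    assert(n >= 2);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> p(n);
    for (double& x : p) x = gauss(rng);

    const double median = series_median(p);
    std::size_t a = 0;
    std::size_t b = 0;
    for (std::size_t t = 1; t < n; ++t) {
        if (p[t - 1] < median && p[t] > p[t - 1]) ++a;
        if (p[t - 1] > median && p[t] < p[t - 1]) ++b;
    }
    const double trials = static_cast<double>(n - 1);
    return {a / trials, b / trials};
}
```

Calling `run_rule(1000000, 1)` and adding the two fields gives the summed frequency the rule asks for; repeating the call with different seeds shows how stable that figure is from run to run.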

Craig66 · Nov 13, 2009

Ok, that clears things up a bit, here are some runs...

POS:37.41 NEG:37.84
POS:37.455 NEG:37.535
POS:37.3067 NEG:37.5267
POS:37.3 NEG:37.68
POS:37.336 NEG:37.602
POS:37.48 NEG:37.5167
POS:37.5314 NEG:37.4857
POS:37.5038 NEG:37.44
POS:37.5078 NEG:37.4411
POS:37.533 NEG:37.496

Also, attached is a single run, is this looking correct now?

dtrader98 · Nov 13, 2009

Quote from Craig66:

Ok, that clears things up a bit, here are some runs...

POS:37.41 NEG:37.84
POS:37.455 NEG:37.535
POS:37.3067 NEG:37.5267
POS:37.3 NEG:37.68
POS:37.336 NEG:37.602
POS:37.48 NEG:37.5167
POS:37.5314 NEG:37.4857
POS:37.5038 NEG:37.44
POS:37.5078 NEG:37.4411
POS:37.533 NEG:37.496

Also, attached is a single run, is this looking correct now?

No. Blue looks ok. I don't know what the red line is, but if it is the median, it should be flat. There is no windowing in my description.

austrijec · Nov 13, 2009

My results are multiplied by -1 but confirm bias.

Is my logic in Excel correct? Sim runs for 1 min or so on my PC.

Craig66 · Nov 13, 2009

Quote from dtrader98:

No. Blue looks ok. I don't know what the red line is, but if it is the median, it should be flat. There is no windowing in my description.

Stupid question, the median is going to pretty much be zero for any stationary time series right? So I replaced 'median' with '0', but I still get pretty much the same result...just re-checking the other stuff now.

Edit:

Here is the code

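#include <iostream>

// Random::GetGaussian() and SignOf() are helpers defined elsewhere (not shown).
int main(int argc, char *argv[])
{
    int pos = 0;
    int neg = 0;
    int num_trials = 0;
    // The counters are never reset, so each printed line is a running average.
    for (int z = 0; z < 10; ++z)
    {
        for (int j = 0; j < 10; ++j)
        {
            double y = 0;
            double last_y = 0;
            for (int i = 0; i < 1000; ++i)
            {
                // On the first pass last_y is the placeholder 0, so neither
                // rule fires but the trial is still counted.
                last_y = y;
                y = Random::GetGaussian();
                // 'median' replaced with 0
                if (last_y < 0 && SignOf(y - last_y) == 1) ++pos;
                if (last_y > 0 && SignOf(y - last_y) == -1) ++neg;
                ++num_trials;
            }
        }
        double pos_chance = pos / (double)num_trials;
        double neg_chance = neg / (double)num_trials;
        std::cout << "POS:" << pos_chance * 100 << " NEG:" << neg_chance * 100 << std::endl;
    }
}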

dtrader98 · Nov 13, 2009

Quote from austrijec:

My results are multiplied by -1 but confirm bias.

Is my logic in Excel correct? Sim runs for 1 min or so on my PC.

I don't know what your logic is; it should be the rules I described.

The results should converge to a stable value (much as you'd expect percentage of heads to asymptotically converge to 50% after many fair coin tosses).

dtrader98 · Nov 13, 2009

Quote from Craig66:

Stupid question, the median is going to pretty much be zero for any stationary time series right?

Not necessarily. It could be any value, but should be constant. Anything other than zero is considered bias, but doesn't affect the results. You are concerned with dispersion around the median (and remember, reversion).

I think you can get it. But so far, results are not at all what I expect.

number22 · Nov 13, 2009

Ok, I haven't gone through all the posts from the top; if someone else has mentioned fluid dynamic design with linear regression analysis, hats off to you. The Fed, ECB and other central banks designed a pond with cash as liquidity, and interest rates are very much predictable, so I think it is possible to predict the impact of liquidity flows from the stock, bond or commodities markets.

The difficult part is that you need accurate data for input; however, a lot of data is either manipulated or delayed, which will make your analysis flawed from the beginning. To reduce these problems you need manpower to collect your own data.

Craig66 · Nov 13, 2009

Ok, reworked it a bit...

Code:

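#include <iostream>
#include <vector>

// Random::GetGaussian(), Utility::Median() and SignOf() are helpers
// defined elsewhere (not shown).
int main(int argc, char *argv[])
{
    int pos = 0;
    int neg = 0;
    int num_trials = 0;
    // The counters are never reset, so each printed line is a running average.
    for (int z = 0; z < 10; ++z)
    {
        for (int j = 0; j < 10; ++j)
        {
            // Generate the whole run first so the median covers all of it.
            std::vector<double> run;
            for (int i = 0; i < 1000; ++i)
            {
                run.push_back(Random::GetGaussian());
            }
            double median = Utility::Median(run);
            // 1000 values give 999 consecutive pairs.
            for (int i = 0; i < 999; ++i)
            {
                if (run[i] < median && SignOf(run[i + 1] - run[i]) == 1) ++pos;
                if (run[i] > median && SignOf(run[i + 1] - run[i]) == -1) ++neg;
                ++num_trials;
            }
        }
        double pos_chance = pos / (double)num_trials;
        double neg_chance = neg / (double)num_trials;
        std::cout << "POS:" << pos_chance * 100 << " NEG:" << neg_chance * 100 << std::endl;
    }
}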

Results (still the same):

POS:37.4975 NEG:37.6577
POS:37.4775 NEG:37.7477
POS:37.5909 NEG:37.5742
POS:37.6451 NEG:37.5501
POS:37.5275 NEG:37.4635
POS:37.5209 NEG:37.4241
POS:37.5189 NEG:37.4432
POS:37.495 NEG:37.4862
POS:37.4797 NEG:37.4497
POS:37.5095 NEG:37.4625

I'm obviously still missing something...
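
For reference, here is a worked calculation of what the two frequencies should be. It assumes the draws are independent and identically distributed from a continuous distribution, as the series is described earlier in the thread, and it treats the sample median of a long series as the true median. The symbols are introduced only for this sketch: F is the distribution function, m the median, and u = F(p(t-1)) is uniform on (0, 1).

```latex
\begin{aligned}
P(a) &= P\bigl(p_{t-1} < m,\ p_t > p_{t-1}\bigr)
      = \int_{0}^{1/2} (1-u)\,du = \tfrac{1}{2} - \tfrac{1}{8} = \tfrac{3}{8},\\
P(b) &= P\bigl(p_{t-1} > m,\ p_t < p_{t-1}\bigr)
      = \int_{1/2}^{1} u\,du = \tfrac{1}{2} - \tfrac{1}{8} = \tfrac{3}{8},\\
P(a) + P(b) &= \tfrac{3}{4}.
\end{aligned}
```

Under those assumptions each rule fires about 37.5% of the time and the two together about 75%, which is consistent with the POS and NEG figures in the runs above once they are summed.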
