Random Trading

Traden4Alpha · Aug 16, 2002

Originally posted by jperl
Perhaps you would tell us how you arrived at your percentages.
They don't look like a binomial distribution to me. BKuerbs pointed out that 75% of the data should fall within 2 standard deviations (which for the 100 coin toss is +-10). Your data for the bins 41 to 60 add up to 95.4%, which looks more like a gaussian (i.e. normal) distribution. Is BKuerbs incorrect about this?

I used the formula for the binomial distribution from the CRC Math Handbook, which is the exact same formula quoted by BKuerbs on page 3 of this thread. The probability of observing exactly x heads in N tosses is:

C(N,x) * p^x * (1-p)^(N-x)

In this formula, p is the probability of the event (0.50 for a fair coin) and C is the Combinations function ( C(N,x) = N!/((N-x)!*x!) ) which counts the number of different patterns for x heads in a sequence of N tosses.
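As a rough check of the formula, here is a minimal Python sketch that evaluates it directly. The function names binomial_pmf and prob_between are my own, chosen for illustration; the 40-to-60 range is the one discussed in this thread.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x heads in n tosses: C(n,x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_between(lo, hi, n, p):
    """Probability that the number of heads falls in lo..hi, inclusive."""
    return sum(binomial_pmf(x, n, p) for x in range(lo, hi + 1))


# Fair coin, 100 tosses: how much probability sits within +-10 heads of 50?
p_40_60 = prob_between(40, 60, 100, 0.5)
print(f"P(40 <= heads <= 60) = {p_40_60:.4f}")
```

Summing the exact terms like this avoids any normal approximation, so it is a fair way to see how close the binomial is to the gaussian at 100 tosses.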

I do not know exactly where BKuerbs gets the 75% figure (maybe for smaller numbers of tosses, the tails of the binomial distribution are a little funky). But at 100 tosses, the distribution is close to normal. I also did a quick simulation by generating 102 sequences of 100 tosses each and looking at the distribution. It looked "normal" to me, with much more than 75% falling inside the +- 2 standard deviation interval.
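For anyone who wants to repeat that kind of simulation, a sketch along these lines should do. It is not the code I actually ran; the function name, the seed, and the use of Python's random module are all assumptions made for illustration.

```python
import random


def simulate_heads(num_sequences=102, tosses=100, seed=0):
    """Return the number of heads in each simulated sequence of fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(tosses))
        for _ in range(num_sequences)
    ]


heads = simulate_heads()

# Fraction of sequences within +-2 standard deviations (40 to 60 heads).
within = sum(40 <= h <= 60 for h in heads) / len(heads)
print(f"{within:.1%} of {len(heads)} sequences fell in the 40-to-60 range")
```

With only 102 sequences the fraction will bounce around from seed to seed, so treat any single run as a sanity check rather than a measurement.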

Enjoying this thread,
-Traden4Alpha

jperl · Aug 16, 2002

Okay, it appears that 102 data points is enough to satisfy the central limit theorem, so the distribution for coin flips should be well approximated by a gaussian (normal). With this in mind, then, for the NASDAQ data I computed the mean to be 43.90, so that for 102 points this would give a single up-day probability of 43.90/102 or 0.430, and a standard deviation of sqrt(102*(0.430)*(1-0.430)) = 5.00. So the data is skewed to the downside of a normal distribution.
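The arithmetic above can be reproduced in a few lines of Python. The variable names are mine; the inputs 43.90 and 102 are the figures quoted in this post.

```python
from math import sqrt

mean_up_days = 43.90  # mean number of up days, as quoted above
n_days = 102          # number of points, as quoted above

# Single up-day probability implied by the mean.
p_up = mean_up_days / n_days

# Binomial standard deviation: sqrt(N * p * (1 - p)).
sd = sqrt(n_days * p_up * (1 - p_up))

print(f"p_up = {p_up:.3f}, sd = {sd:.2f}")
```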

BKuerbs · Aug 17, 2002

The numbers I stated are derived from Chebycheff's formula. In words: at least 75% of the data lie within two standard devs of the mean, at least 89% within 3 standard devs, and at least 94% within 4 standard devs. It is an estimate involving a ">=", which means at least xx% within.....

The beauty of the formula is that you may apply it without knowing much about the underlying distribution. The drawback is that it is somewhat rough: compare with the 68% within *one* standard dev and 99.7% within *three* standard devs for a normal distribution.
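To put the two sets of numbers side by side, here is a small Python sketch. The function names are made up for this example; the normal figure uses the standard identity that the probability within k standard devs is erf(k/sqrt(2)).

```python
from math import erf, sqrt


def chebyshev_min_within(k):
    """Chebycheff lower bound on the fraction within k standard devs: 1 - 1/k^2."""
    return 1 - 1 / k**2


def normal_within(k):
    """Fraction of a normal distribution within k standard devs of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3, 4):
    print(
        f"k={k}: Chebycheff >= {chebyshev_min_within(k):.1%}, "
        f"normal = {normal_within(k):.2%}"
    )
```

At k=1 the Chebycheff bound is 0%, which is why it says nothing useful about the one-standard-dev interval.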

jperl was so kind as to send me the raw data he used to prepare his table. I used this data to test the fit of a normal and a binomial distribution.

I wrote down the results in a little document, see http://home.t-online.de/home/Bernd.Kuerbs/Dokumente/Fitting 102 Nasdaq Stocks.pdf

Regards

Bernd Kuerbs

Traden4Alpha · Aug 17, 2002

Nice analysis, BKuerbs. I should have recognized the Chebycheff's inequality numbers (the probability of a random variable differing from its mean by s or more standard deviations is less than or equal to 1/s^2). Chebycheff's inequality is very conservative and will overestimate the probability of extreme events and underestimate the probability of mainstream events for random variables that are normally or binomially distributed. In fact, Chebycheff's will tell you that the probability of being outside +- 1 standard deviation could be as much as 100%.

Regarding the "disagreement" between the 95% and 75% numbers, it's a matter of what one is willing to assume. IF one assumes or "knows" that the number of heads in a 100-toss sequence is binomially distributed, then one would expect 96.48% of the sequences to be in the 40-to-60 heads/sequence range (assuming a normal distribution would put this number at 95.45%). But, if one refuses to make assumptions about the statistical distribution of the number of up-days in a 100-day sequence, then one would want to use Chebycheff's to conclude that as few as 75% of the sequences might be in the 40-to-60 up-days/sequence range. So, we are both right -- BKuerbs is being more conservative, whereas I am making more assumptions.

That said, as a trader, I should probably use Chebycheff's inequality for risk management calculations. From a risk standpoint, I probably should NOT assume that price moves or trading system returns follow a normal distribution (with its 95% of outcomes inside +-2 standard deviations). As BKuerbs points out, Chebycheff's is very useful when one does not know (or should not assume) anything about the distribution. No matter how pathological (heavy-tailed) the distribution is, if the variance is defined, it will obey Chebycheff's inequality. So, I should follow BKuerbs's example and use Chebycheff's for more conservative estimates of the probability of worst-case events. (I knew there was a good reason to keep reading this thread!)
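To see that the 75% figure is not just slack in the formula, here is a sketch of a standard textbook worst case: a three-point distribution that actually puts the full 1/k^2 of its probability at k standard deviations from the mean. The distribution and the helper names are illustrative assumptions, not anything taken from market data.

```python
from fractions import Fraction


def extremal_distribution(k):
    """Three-point distribution on {-k, 0, +k} with variance 1 that attains the bound."""
    tail = Fraction(1, 2 * k * k)
    return {-k: tail, 0: 1 - 2 * tail, k: tail}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


def prob_at_least(dist, k):
    """Probability of landing k or more standard deviations from the mean."""
    m, var = mean(dist), variance(dist)
    # Compare squared distances so no square root is needed.
    return sum(p for x, p in dist.items() if (x - m) ** 2 >= k * k * var)


dist = extremal_distribution(2)
print("variance:", variance(dist))
print("P(at least 2 sd from mean):", prob_at_least(dist, 2))
```

A normal distribution would put only about 4.55% of its probability that far out, so a risk estimate built on the normal assumption could be low by a factor of five or so if returns behaved like this worst case.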

I'm going to practice safe trading and use Chebycheff's,
-Traden4Alpha