Quote from bulat:
And how can you tell if something is curve fit or not? It seems to me, that's the most important question.
bulat
Quote from harrytrader:
30 has to do with the practical recommended size for a sample of size n (n = 30 items here) in which each individual item follows any probability law BUT IS INDEPENDENT OF EVERY OTHER ITEM, so that the SAMPLE as a whole tends towards the normal (Laplace-Gauss) law. THAT IS TO SAY, IT CONCERNS THE LAW OF THE MEAN, not the law of the INDIVIDUAL ITEM.
This implies that with only 30 trades you won't learn much about the consistency of the mean if you decide to take a sample of size 30, since you have only one SINGLE SAMPLE of 30. It would in fact be better to reduce the number thirty to get more samples than the contrary (in quality control one often uses MULTIPLE samples of only 4 or 5 items). Secondly, the premise is important: INDEPENDENCE. If your trades are too close together in time, for example, independence will probably be fake. Even taking "diversified" contracts won't guarantee you independence all the time, because they follow cycles that can hide their dependence, and in some risky situations this dependence will show up by surprise. It is a rare event, but it is just when such events occur that all your statistical calculations are proved false. That is not the fault of statistics; it is your fault for not having rigorously taken into account the conditions under which the statistical law applies.
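A rough way to look at the independence premise is the lag-1 autocorrelation of consecutive trade results. This is only a sketch: the function name and the two example series are made up for illustration, and a value near zero does not prove independence, it only fails to show the simplest kind of dependence.

```python
from statistics import mean

def lag1_autocorrelation(pnl):
    """Sample lag-1 autocorrelation of a sequence of trade results."""
    m = mean(pnl)
    # Denominator: sum of squared deviations from the mean
    denom = sum((x - m) ** 2 for x in pnl)
    if denom == 0:
        raise ValueError("all trade results are identical")
    # Numerator: sum of products of consecutive deviations
    num = sum((a - m) * (b - m) for a, b in zip(pnl, pnl[1:]))
    return num / denom

# Hypothetical results (+1 = win, -1 = loss) that come in streaks
streaky = [1, 1, 1, 1, -1, -1, -1, -1] * 2
# Hypothetical results that strictly alternate
alternating = [1, -1] * 8

print(lag1_autocorrelation(streaky))      # clearly positive
print(lag1_autocorrelation(alternating))  # clearly negative
```

Values far from zero in either direction suggest consecutive trades are not independent, so the normal-law arithmetic should not be trusted on them.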
If you are an investor, the mean is enough. If you are a speculator, the mean is not enough: it is the variance that is the problem and the primary source of risk. And it is not the variance of the mean, it is the variance of a single item, and the variance of a single item is always greater than the variance of a mean. That's why, if you don't have much capital, you will undergo the maximum variation of a single item and risk ruin with near certainty if you only care about the mean. Even Nobel Prize B&S fell into the trap, hee hee! Because they thought that efficiency means mean reversion, whereas efficiency in the stock market is the contrary in my opinion: it means maximisation of uncertainty and so of variance. See http://www.elitetrader.com/vb/showthread.php?s=&threadid=19770&perpage=6&pagenumber=3
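To make the difference between the law of the mean and the law of the single item concrete, here is a small simulation sketch. The trade distribution (a shifted exponential), the sample counts and the seed are arbitrary assumptions chosen only to get skewed, independent, non-normal items.

```python
import random
from statistics import mean, pstdev

def spread_of_item_vs_mean(n_per_sample=30, n_samples=2000, seed=0):
    """Return (std. dev. of single items, std. dev. of sample means)."""
    rng = random.Random(seed)
    # Skewed, independent "trade results": exponential with mean 1, shifted to mean 0
    samples = [[rng.expovariate(1.0) - 1.0 for _ in range(n_per_sample)]
               for _ in range(n_samples)]
    items = [x for s in samples for x in s]
    means = [mean(s) for s in samples]
    return pstdev(items), pstdev(means)

sd_item, sd_mean = spread_of_item_vs_mean()
# For independent items the ratio should come out near sqrt(30), roughly 5.5
print(sd_item, sd_mean, sd_item / sd_mean)
```

The mean of 30 trades is far more stable than any single trade, which is exactly why an account that can be ruined by a few individual trades cannot rely on the mean alone.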
"I will profit from this example to give you the intuition of what efficiency really means in truth."
Another way of saying the same thing is that the series is not stationary, i.e. not I(0). In other words, the "rules" that generate the time series change over time, and these rule changes can occur without any means of detecting the change using statistical or other (known) measures. One of the problems with trading is that all the past information is a subset of some unknowable future distribution.
The normal distribution is not terrible at modeling the 97% of the curve at the "hump." It is at the tails that it is really off. Some modelers then use the (log) normal distribution to model the hump, and the Pareto distribution to model the tails... The amount of price movement over a period of time also does not conform to a normal distribution so stats can only be used for rough estimation. I've posted this before, but I think this is a good thread to repost it.
I am not sure that the markets are following "one ultimate" distribution. It may be a linear combination of them, or even a nonlinear combination of them. I suppose the whole of them can be thought of as one "grand" distribution... No matter what test you do, the trades are going to only be a sample of the ultimate distribution.
The t-distribution may be more appropriate with such a small number of samples... If it shows 60% winners for the past 10 years, that may be the mean or only a skewed result from your tests. Here's something to stress test the sample.
To estimate the error in a system test sample:
(Can be used for % win because, with enough trades, the frequency of wins/losses is approximately normally distributed.)
Error estimate
E = (z * std. dev. of sample) / sqrt(number of samples in test)
E = error estimate
z = number of std. devs. of the normal distribution for the confidence level needed.
z = 3.08 = 99.8% confidence level
z = 2.58 = 99.0% confidence level
z = 1.96 = 95.0% confidence level
z = 1.645 = 90.0% confidence level
Ex. 50 trades in test (1 = win 0 = loss)
sample mean = 40% winners or .40
sample std. dev. = .25
If we want to know the estimate of the mean to the 99% level then:
E = (2.58 * .25) / sqrt(50)
E = .0912
So with 99% certainty the mean winning % lies in the range .40 +- .0912 (you can expect to see wins between 30.88% and 49.12% in the future). If that's not acceptable, either do more tests or work on a system with a tighter standard deviation of wins versus losses.
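The error estimate above can be sketched in a few lines of Python. The function names are my own, and the z value is computed from the standard library's NormalDist rather than read from a table. The std. dev. of .25 is taken from the post as given; for pure 1/0 outcomes with 40% wins the std. dev. would be sqrt(.4 * .6), about .49, which would widen the range.

```python
from math import sqrt
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value for a confidence level, e.g. 0.99 -> about 2.58."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def error_estimate(sample_std, n, confidence):
    """E = z * std. dev. of sample / sqrt(number of samples)."""
    return z_for_confidence(confidence) * sample_std / sqrt(n)

# Numbers from the example: 50 trades, mean .40, std. dev. .25, 99% level
e = error_estimate(sample_std=0.25, n=50, confidence=0.99)
low, high = 0.40 - e, 0.40 + e
print(round(e, 4), round(low, 4), round(high, 4))
```

With the unrounded z of about 2.576 instead of 2.58, the error comes out near .0911 rather than .0912, so the range is essentially the same.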
So how many samples do we need to be 99% certain of the mean?
n = ((z**2) * (std. dev. of sample**2)) / (( 1 - confidence level required)**2)
n = number of tests we need to run
z = same as above
std. dev. of sample = std. dev. from sample size we have seen
1 - confidence level required = how exact we want the estimate to be (this term plays the role of the acceptable error):
.90 confidence = 1 - .9 or .1 for the formula
.95 confidence = 1 - .95 or .05 for the formula
.99 confidence = 1 - .99 or .01 for the formula
.998 confidence = 1 - .998 or .002 for the formula
in this case we want 99% confidence
n = ((2.58**2)* (.25**2)) / (.01**2)
n = (6.6564 * .0625) / .0001
n = 4,160 tests needed to prove the mean at the 99% confidence level is really 40% winners.
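The same sample-size arithmetic as a sketch. The name `tolerance` is my own label for the "1 - confidence level required" term, which acts as the acceptable error on the mean; the result is rounded up because a fraction of a trade cannot be tested.

```python
from math import ceil

def trades_needed(z, sample_std, tolerance):
    """n = z**2 * std**2 / tolerance**2, rounded up to a whole trade."""
    return ceil((z ** 2) * (sample_std ** 2) / (tolerance ** 2))

# Numbers from the example: z = 2.58, std. dev. = .25, tolerance = .01
n = trades_needed(z=2.58, sample_std=0.25, tolerance=0.01)
print(n)  # 4160.25 before rounding up

# A looser tolerance of .05 needs far fewer trades
print(trades_needed(z=2.58, sample_std=0.25, tolerance=0.05))
```

Because the tolerance is squared in the denominator, halving the acceptable error quadruples the number of trades needed.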
After you've done the test for win% you can also do it for win size and loss size (independently). Usually the win size will not correspond to a normal distribution. If you're cutting losses short and letting profits run, then you should have some outlier trades in the win size distribution. For the test to be valid you need to eliminate the outliers. I've found that removing the top 5% of winning trades (the best 5 out of each 100) has been enough to move the distribution to a more normal bell curve.
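A minimal sketch of removing the top 5% of winners before running the win-size test. The function name and the trade list are hypothetical; the percentage is passed as a whole number so that "5 out of each 100" is computed exactly.

```python
from statistics import mean

def drop_top_winners(wins, percent=5):
    """Return the wins sorted ascending with the largest `percent`% removed."""
    k = len(wins) * percent // 100  # e.g. 5 trades out of 100
    ordered = sorted(wins)
    return ordered[:-k] if k else ordered

# Hypothetical win sizes: 95 ordinary winners plus 5 large outliers
wins = [500] * 95 + [3000, 4000, 5000, 6000, 7000]
trimmed = drop_top_winners(wins)

print(mean(wins), mean(trimmed), len(trimmed))
```

The trimmed list is what goes into the error estimate for win size; the outliers still matter for the real account, they are only left out of the normal-law test.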
When you've done the tests on the win size and loss size, you'll end up with something like:
win size mean $500 +- $100 at the 99% confidence level
loss size mean $250 +- $50 at the 99% confidence level
Then you compute a pessimistic expectation using the low end of win % and win size and the high end of the loss size. If it shows any profit, then you've probably got a winner (as long as it wasn't curve fit).
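Ex. (low end of win size $500 - $100 = $400, high end of loss size $250 + $50 = $300, and a win % of .5)
Expectation = (400 * .5) - (300 * .5)
Expectation = 50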
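The pessimistic expectation can be sketched as a small function. The parameter names are assumptions of mine; each input is a mean and its +- error at the chosen confidence level, and the loss rate is taken as 1 minus the low-end win rate.

```python
def pessimistic_expectation(win_rate, win_rate_err,
                            win_size, win_size_err,
                            loss_size, loss_size_err):
    """Expectation per trade using the unfavourable end of every range."""
    p = win_rate - win_rate_err        # low end of win %
    w = win_size - win_size_err        # low end of win size
    l = loss_size + loss_size_err      # high end of loss size
    return p * w - (1 - p) * l

# The example above: win % of .5, win $500 +- $100, loss $250 +- $50
print(pessimistic_expectation(0.5, 0.0, 500, 100, 250, 50))

# Same sizes with the earlier win % result of .40 +- .0912
print(pessimistic_expectation(0.40, 0.0912, 500, 100, 250, 50))
```

The first call reproduces the 50 from the example; the second comes out negative, so a system with those win-rate numbers would not pass this test.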