Originally posted by jperl
To bring this thread back to the center, here are four NASDAQ stocks whose number of updays(close to close) over the past 100 days is very low:
percent updays
LRCX 37%
TMPW 37
AAPL 36
LRCX 34
Question: Would this data influence your decision to short these stocks at the close of today? Or if your system tester suggested a long entry tomorrow, would the above info influence your decision not to take the long?
You pose an excellent question! It illustrates a major issue in trading: the power of screening to mislead traders (that you pose the question at all suggests you are aware of what I am talking about). Even if these four stocks show a seemingly non-random excess of downdays, that is no guarantee that they are destined for further downdays.
<b>When Coins Seem Nonrandom: The Danger of Survivorship Bias</b>
The analogy for screening the N100 for down days is to collect 100 different coins, flip each one 100 times, and record how many heads each coin generates. Out of the 100 coins you WILL find that some seem to generate far too many heads, while others generate far too many tails. Moreover, if you run a standard test for statistical significance, you will find that some coins appear to have statistically significant non-random results (even if the coins truly are random). The problem is that the usual 2-sigma or 5% significance threshold implies that 1 out of 20 random data sets will pass the test. Thus, testing the data from 100 stocks, one should find about 5 stocks that "look nonrandom" (as with all things statistical, one will not always see exactly 5 significantly nonrandom coins or stocks out of 100). Flip enough coins, enough times, and you will find strange results.
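For anyone who wants to try the coin experiment rather than take my word for it, here is a minimal Python sketch. The function name, the seeds, and the choice of 200 repetitions are my own assumptions for illustration. It counts how many of 100 fair coins land 2 sigma or more away from 50 heads in 100 flips, then averages that count over many repeats of the whole experiment.

```python
import random

def count_extreme_coins(n_coins=100, n_flips=100, n_sigma=2.0, seed=None):
    """Flip n_coins fair coins n_flips times each; count the coins whose
    head count lies n_sigma or more standard deviations from the mean."""
    rng = random.Random(seed)
    mean = n_flips * 0.5              # expected heads for a fair coin
    sigma = (n_flips * 0.25) ** 0.5   # binomial standard deviation
    extreme = 0
    for _ in range(n_coins):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - mean) >= n_sigma * sigma:
            extreme += 1
    return extreme

# Repeat the whole 100-coin experiment many times (200 is an arbitrary choice).
counts = [count_extreme_coins(seed=s) for s in range(200)]
average_extreme = sum(counts) / len(counts)
print(f"average 'nonrandom-looking' coins per 100: {average_extreme:.2f}")
print(f"fewest / most in a single experiment: {min(counts)} / {max(counts)}")
```

Every coin here is fair by construction, so any coin the screen flags is a false positive.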
So, what is one to make of the coin that produced the most heads on the first 100 flips (out of the original set of 100 coins)? Probably nothing, because the coin is an example of survivorship bias: by random chance some coins (or stocks) will survive a test for nonrandomness. Despite the heady performance of that coin in the first set of flips, its probability of heads will remain 50%.
The point is, if you look at enough stocks you are guaranteed to find some that seem to have non-random patterns or fit a trading system extremely well. As one increases the number of stocks in a screening process (or the number of different trading systems in a backtest), one has to implement ever-stricter statistical tests to filter out the high chance that some of the set of stocks (or coins) has, by totally random chance, produced a pattern that looks non-random.
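To put rough numbers on "ever-stricter", here is a sketch using the Bonferroni correction, which divides the significance threshold by the number of tests. That is only one of several possible corrections and my own choice for illustration, as are the helper name and the assumption that each day is a fair 50/50 coin. The up-day counts are the four from the quoted post.

```python
from math import comb

def two_sided_binom_p(k, n=100):
    """Exact two-sided p-value for k up-days in n days under a fair coin.
    The fair-coin distribution is symmetric, so double the smaller tail."""
    tail = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

alpha = 0.05        # the usual single-test threshold
n_tests = 100       # one test per stock in the screen
bonferroni_alpha = alpha / n_tests   # stricter per-stock threshold

# Up-day counts (out of 100 days) taken from the quoted post.
for up_days in (37, 37, 36, 34):
    p = two_sided_binom_p(up_days)
    print(f"{up_days} up-days: p = {p:.4f}, "
          f"passes 5%: {p < alpha}, "
          f"passes Bonferroni ({bonferroni_alpha:.4f}): {p < bonferroni_alpha}")
```

A count can look striking against the single-test 5% threshold and still fail once the threshold accounts for having screened 100 stocks.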
<b>Does the First 50 Days' Data Predict the Second 50 Days' Data?</b>
A better way to look at the data is to consider whether the number of up-days seen in the first half of the data predicts (or correlates with) the number of up-days in the second half of the data. A positive correlation suggests trending or momentum (so you should short the stocks with excessive numbers of down days). A negative correlation suggests mean reversion (so you should buy stocks with excessive numbers of down days).
There are a number of ways to statistically check the relationship between outcomes in the first 50 days and those of the second 50 days: regular correlation, Spearman rank correlation, contingency tables (from 2x2 on upward). The only caution, in doing these tests, is that one should make the data disjoint by skipping a day of data between the two halves, to guard against errors in the closing price data. One might also want to look at a number of successive 50-day periods. If the pattern of correlation does not hold across a number of successive disjoint datasets, then there is no exploitable pattern.
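Here is a rough sketch of the split-half check using regular (Pearson) correlation. The function names are my own, and the 100 "stocks" are simulated coin flips standing in for real close-to-close data, so treat it as a template and not a result. Each stock gets 101 days so that one day can be skipped between the two 50-day halves.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def split_half_updays(up_flags, half=50, gap=1):
    """Count up-days in the first `half` days and in the `half` days
    that follow after skipping `gap` days between the two halves."""
    first = sum(up_flags[:half])
    second = sum(up_flags[half + gap: half + gap + half])
    return first, second

# Hypothetical data: 100 purely random "stocks", 101 days each.
rng = random.Random(0)
stocks = [[rng.random() < 0.5 for _ in range(101)] for _ in range(100)]

pairs = [split_half_updays(flags) for flags in stocks]
first_half = [p[0] for p in pairs]
second_half = [p[1] for p in pairs]
r = pearson(first_half, second_half)
print(f"first-half vs second-half up-day correlation: {r:.3f}")
```

With real data you would replace `stocks` with one up/down sequence per ticker and rerun the check over several successive disjoint windows before trusting any sign of momentum or mean reversion.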
Modern computers, massive data sets, and powerful software give traders enough rope to shoot themselves in the foot with. The harder you look for a pattern, the more often you find patterns that do not actually exist.
Trade carefully,
-Traden4Alpha