Quote from Sparohok:
The multiple hypothesis problem can be addressed relatively easily with methods such as the Bonferroni correction:
http://mathworld.wolfram.com/BonferroniCorrection.html
That's true, but it brings up a whole new set of problems. Bonferroni correction simply requires an ever more stringent level of statistical significance as the number of hypotheses tested goes up: to keep the overall false-positive rate at alpha across m tests, each individual test must pass at alpha/m. But if you are doing large-scale data mining, where you are searching billions of patterns, then the only system that would ever pass a significance test with Bonferroni correction is one that makes truly astronomical profits. So even if there are valid patterns in the data, they would never pass your test.
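To put rough numbers on that hurdle, here is a minimal Python sketch (my own illustration, not from the linked page). It computes the Bonferroni per-test threshold alpha/m and the one-sided z-score a single test would need to beat it, assuming normally distributed test statistics. The choice of alpha = 0.05 and the values of m are just illustrative.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level under Bonferroni correction for m tests."""
    return alpha / m


def required_z(alpha: float, m: int) -> float:
    """One-sided z-score a single test must exceed to stay significant after correction."""
    # Take the lower-tail quantile and flip the sign; this is numerically
    # safer than inv_cdf(1 - p) when p is tiny.
    return -NormalDist().inv_cdf(bonferroni_threshold(alpha, m))


if __name__ == "__main__":
    # Show how the bar rises as the number of patterns searched grows.
    for m in (1, 1_000, 1_000_000, 1_000_000_000):
        print(
            f"m={m:>13,}  per-test alpha={bonferroni_threshold(0.05, m):.1e}  "
            f"z needed={required_z(0.05, m):.2f}"
        )
```

Going from one test to a billion roughly quadruples the z-score a pattern has to produce, and in trading terms that means the pattern's edge has to be several times larger relative to its noise.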
In other words, you trade a Type I error (mistaking random patterns for meaningful ones) for a Type II error (mistaking meaningful patterns for random ones), because the test becomes a hurdle that no pattern, real or random, can clear.
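The Type II side can be sketched the same way. Assume, hypothetically, that a real pattern produces a z-statistic centered at effect_z standard errors above zero. The power of the corrected test is then the chance that this statistic clears the Bonferroni bar. This is a simplified one-sided z-test model, not a model of any actual trading system, and effect_z = 3 is just a value picked for illustration.

```python
from statistics import NormalDist


def required_z(alpha: float, m: int) -> float:
    """One-sided Bonferroni cutoff on the z-statistic for m tests."""
    return -NormalDist().inv_cdf(alpha / m)


def power(effect_z: float, alpha: float, m: int) -> float:
    """Chance that a pattern whose z-statistic is N(effect_z, 1) passes the corrected test.

    With effect_z = 0 (a purely random pattern) this is the per-test Type I rate, alpha/m.
    With effect_z > 0 (a real pattern), 1 - power is the Type II error rate.
    """
    return 1.0 - NormalDist().cdf(required_z(alpha, m) - effect_z)


if __name__ == "__main__":
    # A hypothetical real edge worth 3 standard errors.
    for m in (1, 1_000, 1_000_000, 1_000_000_000):
        print(f"m={m:>13,}  power={power(3.0, 0.05, m):.4f}")
```

With effect_z = 3, a real edge that a single test would catch roughly 90% of the time is almost never caught once m is in the billions.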
-bulat
