Is data mining for trading patterns impossible?

Quote from Sparohok:


The multiple hypothesis problem can be addressed relatively easily with methods such as the Bonferroni correction:
http://mathworld.wolfram.com/BonferroniCorrection.html

That's true, but it brings up a whole new set of problems. Bonferroni correction simply requires an ever more stringent level of statistical significance as the number of hypotheses tested goes up. But if you are doing large scale data mining, where you are searching billions of patterns, then the only system that would ever pass a significance test with Bonferroni correction is one that makes virtually astronomical profits. So even if there are valid patterns in the data, they would never pass your test.
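To put rough numbers on the hurdle described above, here is a minimal sketch of how the Bonferroni threshold tightens as the number of tested patterns grows. The function names `bonferroni_alpha` and `required_z` are hypothetical, and the sketch assumes a one-sided test on an approximately normal test statistic with a family-wise level of 0.05; none of these choices come from the thread.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


def required_z(alpha: float, num_tests: int) -> float:
    """One-sided z-score a single pattern must exceed to pass the corrected test.

    Assumes the test statistic is approximately standard normal under the null.
    """
    # Negating the lower-tail quantile avoids computing 1 - (a tiny number).
    return -NormalDist().inv_cdf(bonferroni_alpha(alpha, num_tests))


if __name__ == "__main__":
    for m in (1, 1_000, 1_000_000, 1_000_000_000):
        print(
            f"{m:>13,} tests: per-test alpha = {bonferroni_alpha(0.05, m):.1e}, "
            f"required z = {required_z(0.05, m):.2f}"
        )
```

The required z-score only says how far from chance a pattern must be; how much profit that implies depends on the number of trades and their volatility, which this sketch does not model.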

In other words, you go from a Type I error (mistaking random patterns for meaningful ones) to a Type II error (mistaking meaningful patterns for random ones), because the test becomes an impossible hurdle for any pattern, real or random.

-bulat
 
Quote from mind:

if it was not alan i would say that something like this:

http://www.elitetrader.com/vb/showt...9&highlight=system+stopped+working#post249509

is the pure curve fit. many variables, all of them with only loose results by themselves added up.
i believe alan is true but i can hardly understand why. i think his edge test is no guarantee against curve fit.

You are obviously correct that it is pure curve fit, since the system performs poorly on all the out-of-sample data, both before and after the test period that was posted.

-bulat
 
Well here you are again. And again you drive by the intersection where you should have turned. Patterns don't work because the underlying data series is random a significant part of the time.

If instead of trying to find a pattern within random data, you looked for periods when the data is not random (and then find recurring patterns), you would have a chance to make money.

Some of you must have the ability for abstract thought. Go get yourself a copy of "The mathematics of technical analysis" by Cliff Sherry. Do the exercises and figure it out. Sheesh.

Ordinarily I would wish you "good luck". Instead I will just hope you learn to identify profitable patterns within a non-random data series before your money runs out.

:D

Edit:

Please no PMs. This is the basic fucking Stats & Probabilities from school. Actually it is taught in the 200 level or second half of the first year of statistics.
 
Quote from Lefty62151:

Well here you are again. And again you drive by the intersection where you should have turned. Patterns don't work because the underlying data series is random a significant part of the time.

If instead of trying to find a pattern within random data, you looked for periods when the data is not random (and then find recurring patterns), you would have a chance to make money.

Some of you must have the ability for abstract thought. Go get yourself a copy of "The mathematics of technical analysis" by Cliff Sherry. Do the exercises and figure it out. Sheesh.

Ordinarily I would wish you "good luck". Instead I will just hope you learn to identify profitable patterns within a non-random data series before your money runs out.

As much as I enjoy feedback from people without a clue, yet a feeling of great self-importance, please don't post to this thread if you intend to be rude while adding absolutely nothing useful to the discussion.

Please no PMs. This is the basic fucking Stats & Probabilities from school. Actually it is taught in the 200 level or second half of the first year of statistics.

Why would anyone PM you about this? It's pretty obvious that you have nothing useful to share here.
 
Quote from bulat:

I've spent a lot of time thinking about using various automated methods to discover trading patterns/strategies and I'm coming to the conclusion that it's mathematically impossible.

When you apply some automated discovery mechanism you either have no preconceived notion or only a very general notion of what you are looking for. You then use some search technique to look at a huge number of possible relationships and indicator/price permutations. If you are using a dumb random search approach you can easily search millions of possibilities. If you use a smart directed search algorithm (e.g. a genetic algorithm), you can search the equivalent of billions or tens of billions of permutations.

When you look at so many possibilities, you are guaranteed to find quite a few methods that work incredibly well simply by chance. Even worse, some of these methods will pass any statistical test you throw at them (Sharpe ratio, t-test, edge test, correlation test, etc.), since all these tests compare the observed results with what you'd expect from chance. But when you look at 1,000,000 random samples, you will obviously have some that perform better than 99.999% of random ones, thus passing any possible test you throw at them.

So even if there are meaningful patterns that your search discovers, they will be intermingled with numerous patterns that work simply by chance. And there is absolutely no way to actually separate them out.

I'd be curious to hear if anyone sees a flaw in this reasoning.

-bulat
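The argument in the quoted post can be illustrated with a small simulation. This is only a sketch under stated assumptions: every "strategy" is pure Gaussian noise with zero mean (so it has no real edge by construction), the significance cutoff is a one-sided 5% level approximated by t > 1.645, and the names `t_stat` and `count_false_discoveries` are hypothetical.

```python
import random
from statistics import mean, stdev


def t_stat(returns):
    """t-statistic of the mean return against a true mean of zero."""
    return mean(returns) / (stdev(returns) / len(returns) ** 0.5)


def count_false_discoveries(num_strategies=2000, num_days=250,
                            critical_t=1.645, seed=0):
    """Simulate strategies with no edge and count how many look significant.

    Returns (number passing the uncorrected test, best t-statistic seen).
    """
    rng = random.Random(seed)
    passed = 0
    best = float("-inf")
    for _ in range(num_strategies):
        # Pure noise: zero-mean daily returns, so any apparent edge is luck.
        returns = [rng.gauss(0.0, 0.01) for _ in range(num_days)]
        t = t_stat(returns)
        best = max(best, t)
        if t > critical_t:
            passed += 1
    return passed, best


if __name__ == "__main__":
    passed, best = count_false_discoveries()
    print(f"{passed} of 2000 no-edge strategies passed; best t = {best:.2f}")
```

Roughly 5% of the noise strategies should clear the uncorrected cutoff, and the best of them will look far better than that, which is the selection effect the post describes.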

It is possible
http://www.trade-ideas.com/Help.html#GBBOT

and works very well
 
Quote from bulat:

In other words, you go from a Type I error (mistaking random patterns for meaningful ones) to a Type II error (mistaking meaningful patterns for random ones), because the test becomes an impossible hurdle for any pattern, real or random.

If your data mining method produces no significant results, you should look for a different data mining method rather than relaxing your statistical requirements. For example, do not test billions of possible models. This is a recipe for failure. There are alternatives. Develop strategies incrementally rather than doing an exhaustive search. Look for a smooth local parameter space where similar strategies give similar results. Use separate testing and validation datasets. There are many possibilities. But you ignore the multiple hypothesis problem at your peril.
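One of the suggestions above, separate testing and validation datasets, can be sketched as follows. The setup is an assumption for illustration only: each candidate strategy is zero-mean noise, the first half of its history is used to pick the winner, and the second half is held back. The names `t_stat` and `select_then_validate` are hypothetical.

```python
import random
from statistics import mean, stdev


def t_stat(returns):
    """t-statistic of the mean return against a true mean of zero."""
    return mean(returns) / (stdev(returns) / len(returns) ** 0.5)


def select_then_validate(num_strategies=500, num_days=500, seed=1):
    """Pick the best strategy on the first half, then score it on the second.

    Returns (in-sample t of the winner, its t on the held-out half).
    """
    rng = random.Random(seed)
    split = num_days // 2
    best_in, best_out = float("-inf"), None
    for _ in range(num_strategies):
        # No-edge strategy: zero-mean noise over the whole history.
        returns = [rng.gauss(0.0, 0.01) for _ in range(num_days)]
        in_t = t_stat(returns[:split])
        if in_t > best_in:
            best_in = in_t
            # Held-out data is only scored for the current in-sample winner.
            best_out = t_stat(returns[split:])
    return best_in, best_out


if __name__ == "__main__":
    in_t, out_t = select_then_validate()
    print(f"winner in-sample t = {in_t:.2f}, held-out t = {out_t:.2f}")
```

Because the held-out half played no part in the selection, its t-statistic is an honest test of a single hypothesis, whereas the in-sample figure is inflated by having been the best of 500.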

As I said, human intuition is subject to the same flaw as the t-test. We want to believe that our results are more significant than they actually are. This is why the vast majority of backtested trading strategies underperform when they are tested post discovery.

Martin
 
Quote from bulat:

I've spent a lot of time thinking about using various automated methods to discover trading patterns/strategies and I'm coming to the conclusion that it's mathematically impossible.

When you apply some automated discovery mechanism you either have no preconceived notion or only a very general notion of what you are looking for. You then use some search technique to look at a huge number of possible relationships and indicator/price permutations. If you are using a dumb random search approach you can easily search millions of possibilities. If you use a smart directed search algorithm (e.g. a genetic algorithm), you can search the equivalent of billions or tens of billions of permutations.

When you look at so many possibilities, you are guaranteed to find quite a few methods that work incredibly well simply by chance. Even worse, some of these methods will pass any statistical test you throw at them (Sharpe ratio, t-test, edge test, correlation test, etc.), since all these tests compare the observed results with what you'd expect from chance. But when you look at 1,000,000 random samples, you will obviously have some that perform better than 99.999% of random ones, thus passing any possible test you throw at them.

So even if there are meaningful patterns that your search discovers, they will be intermingled with numerous patterns that work simply by chance. And there is absolutely no way to actually separate them out.

I'd be curious to hear if anyone sees a flaw in this reasoning.

-bulat

would you rather ride in a car driven by a 16 year old kid, or by software written by the greatest minds?

i dont think you can really beat developing your best judgement, and looking at everything in every situation
 
after having witnessed and participated in a number of discussions similar to this one, after having traded several quantitative strategies with several million dollars quite successfully for some time before they started to lose their edge, after having listened to different sorts of traders, who i think do well yet are on very different levels of sophistication, i come to the conclusion that we are taking a very important variable out of the equation: our own neural net and our own consciousness.

the posts within this thread indicate it very well IMHO. it is not necessarily the criteria that decide whether we make it or not. it is the way, intensity and decisiveness of our search. if alan did well with his way of searching it was probably because he knew so much about the market and tested so many, many setups, that he finally found tradeable setups within the randomness and could tell by his experience, and found some way to "prove" this "experience".

i always used sharpe ratio as my main criterion to tell about validity of an approach. now i tend to think that the number of trades is very important. if i have 4000 days and i trade on a third of them by entering at the open and getting out at the close, and i use just two or three parameters, i am very confident that there is "something", even if my sharpe ratio is below 1.
 
Quote from mind:

i always used sharpe ratio as my main criterion to tell about validity of an approach. now i tend to think that the number of trades is very important. if i have 4000 days and i trade on a third of them by entering at the open and getting out at the close, and i use just two or three parameters, i am very confident that there is "something", even if my sharpe ratio is below 1.

Of course the number of trades is important. Once again it is a matter of statistical validity. If a system gives you one entry a year which you hold for a full year, and at the end of the year you are ahead of the market, you literally cannot draw any statistically valid conclusion about your system's real world performance. If you make 1000 trades a year with a 60% win rate, you know with extremely high confidence that your system did not get those results by chance. This is known in statistics as the law of large numbers.
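The 1000-trade example above can be checked with an exact binomial tail. This is a rough sketch with its own assumptions: trades are treated as independent coin flips with a 50% chance of winning under the null, only the win rate is considered (not the size of wins and losses), and the system is taken to be a single one fixed in advance rather than the best of many. The name `chance_probability` is hypothetical.

```python
from math import comb


def chance_probability(wins: int, trades: int) -> float:
    """P(at least `wins` winners in `trades` fair coin flips).

    Integer arithmetic keeps the sum exact until the final division.
    """
    favourable = sum(comb(trades, k) for k in range(wins, trades + 1))
    return favourable / 2 ** trades


if __name__ == "__main__":
    # One winning trade out of one: no evidence at all.
    print(f"1 of 1 trades:        p = {chance_probability(1, 1):.3f}")
    # 60% winners over 10 trades: still easily explained by luck.
    print(f"6 of 10 trades:       p = {chance_probability(6, 10):.3f}")
    # 60% winners over 1000 trades: very hard to explain by luck.
    print(f"600 of 1000 trades:   p = {chance_probability(600, 1000):.2e}")
```

The same 60% win rate goes from unremarkable at 10 trades to extremely unlikely under chance at 1000, which is the point about sample size being made here.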

Martin
 