Is data mining for trading patterns impossible?

prophet · Apr 22, 2005

Quote from vikana:

When I worked on this extensively (~year 2000) i never managed to find anything with a significant statistical edge. I did find a few strategies that were net positive (after commissions etc), but not nearly as good as what I could design myself.

In the end I "retired" the approach.

You didn't find a statistical edge in terms of papertrading results? What about statistical edges measured in terms of forecast correlation? What about edges that papertrade with zero spread costs? I have found such edges quite easy to find. The challenge is getting them to trade profitably with spread costs (or with enough signal lead time to use limit orders) and at useful size, while maintaining a decent trade frequency for statistical confidence.
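To make the distinction concrete, here is a minimal sketch on synthetic data, not anyone's actual system. It builds a forecast that is weakly correlated with the next return, then compares the average trade with zero spread against the same trade after a fixed cost. The names `signal_strength` and `spread_cost`, and all the numbers, are assumptions chosen for illustration; returns are in arbitrary units with unit noise.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=50000, signal_strength=0.05, spread_cost=0.1, seed=1):
    """Return (forecast correlation, mean gross trade, mean net trade).

    All parameters are hypothetical. Each bar we trade one unit in the
    direction of the forecast and pay spread_cost per trade.
    """
    rng = random.Random(seed)
    forecasts, returns = [], []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        # The next return is mostly noise plus a small forecastable part.
        r = signal_strength * f + rng.gauss(0.0, 1.0)
        forecasts.append(f)
        returns.append(r)
    corr = pearson(forecasts, returns)
    gross = [(1 if f > 0 else -1) * r for f, r in zip(forecasts, returns)]
    net = [g - spread_cost for g in gross]
    return corr, statistics.fmean(gross), statistics.fmean(net)

corr, gross_edge, net_edge = simulate()
print(f"forecast correlation: {corr:.3f}")
print(f"mean trade, zero spread: {gross_edge:.4f}")
print(f"mean trade, with spread: {net_edge:.4f}")
```

With these assumed numbers the expected gross edge per trade is roughly 0.04 units, which is smaller than the 0.1 cost, so the edge is statistically real and still untradeable once costs are included.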

MAESTRO · Apr 22, 2005

A degree of freedom is a parameter that yields a different system
for every value allowed. For example, a moving average based on
10 days will yield different results from a moving average based on
24 days. Thus, the length of a moving average represents one
degree of freedom. People tend to want as many degrees of freedom
as possible in their systems. The more indicators you add, the
better you can describe historical market prices. The more degrees
of freedom you have in a system, the more likely that system will
fit itself to a series of prices. Unfortunately, the more a system fits
the data upon which it was developed, the less likely it will be to
produce profits in the future.
System development software (most of it, that is) encourages
the degrees-of-freedom bias. Give a system developer enough leeway
and that person will have a system that perfectly predicts the
moves in the market and makes thousands of dollars (on paper,
with certain historical markets, that is). Most software allows people
to optimize to their heart's content. Eventually, they will end up
with a meaningless system that makes a fortune on the data from
which it was obtained, but performs miserably in real trading.
Most system development software is designed because people
have this bias. They want to know the perfect answer to the markets.
They want to be able to predict the markets perfectly. As a result, you
can buy software now for a few hundred dollars that will allow you
to overlay numerous studies over past market data. Within a few
minutes, you can begin to think that the markets are perfectly
predictable. And that belief will stay with you until you attempt to
trade the real market instead of the historically optimized market.
No matter how much I mention this bias, most of you will still
give in to it. You'll still want to optimize your systems as much as
possible. As a result, let me give you several precautions in such
optimization. First, understand the concept you are using so well
that you will not even feel that you need to optimize. The more you
understand the concept you are trading, the less need you have to
do historical testing.
I would strongly suggest that you think about various mental
scenarios that might happen in the market. For example, you might
imagine the next war, the advent of a nuclear terrorist attack, the
adoption of a common currency in Europe, the adoption of a common
currency in Asia, China and Japan joining together as a common
power, an unemployment report that jumps 120 percent, etc.
Some of these ideas might seem wild, but if you can understand
how your system concept would handle these events if they actually
happened, then you understand your concept very well.
No matter how much traders and investors learn about the
dangers of overoptimization, they still want to optimize. Thus, I
strongly recommend that you not use more than four or five
degrees of freedom in your system. So if you use two indicators
(one degree of freedom each) and two filters in your complete system,
that's probably all you can tolerate.

Dr. VAN K. THARP
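
The optimization trap described above can be sketched with a single degree of freedom. The example below is an illustration under stated assumptions, not something from the book: it generates a pure random walk (so no moving-average rule has any true edge), picks the moving-average length with the best in-sample result, then runs that length on unseen data. The helper names `random_walk`, `ma_system_pnl` and `optimize` are hypothetical.

```python
import random

def random_walk(n, seed):
    """Synthetic price path with independent Gaussian steps (no real edge exists)."""
    rng = random.Random(seed)
    price, path = 100.0, []
    for _ in range(n):
        price += rng.gauss(0.0, 1.0)
        path.append(price)
    return path

def ma_system_pnl(prices, length, start):
    """Long above the trailing moving average, short otherwise; PnL in price points.

    The position at bar t uses prices up to and including t and earns the
    change from t to t + 1, so there is no look-ahead.
    """
    pnl = 0.0
    for t in range(start, len(prices) - 1):
        ma = sum(prices[t - length + 1 : t + 1]) / length
        position = 1 if prices[t] > ma else -1
        pnl += position * (prices[t + 1] - prices[t])
    return pnl

def optimize(prices, lengths):
    """Try every length on the same bars and return the best one plus all results."""
    start = max(lengths) - 1  # same trading span for every length
    results = {n: ma_system_pnl(prices, n, start) for n in lengths}
    best = max(results, key=results.get)
    return best, results

prices = random_walk(2000, seed=7)
in_sample, out_sample = prices[:1000], prices[1000:]
lengths = range(2, 61)

best_len, in_results = optimize(in_sample, lengths)
out_pnl = ma_system_pnl(out_sample, best_len, max(lengths) - 1)

print(f"best in-sample length: {best_len}")
print(f"in-sample PnL at best length: {in_results[best_len]:.1f}")
print(f"out-of-sample PnL at that length: {out_pnl:.1f}")
```

Because the steps are independent, every length has an expected PnL of zero, so whatever the winning length earned in-sample is selection among 59 noisy results. Adding more parameters only gives the search more ways to find such a winner.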

Diamondtrim · Apr 22, 2005

Quote from mind:

really? when i saw the site i thought it was crap. you are trading based on this thinking?

FYI the pattern recognition engine that TI made automated in real-time what Andrew Lo discovered about chart patterns and their effectiveness. http://web.mit.edu/alo/www/

prophet · Apr 22, 2005

Quote from bulat:

Here's an interesting academic paper which also talks about the near impossibility of automatically identifying valid trading patterns, even when you know for sure that they exist in the solution space that you are searching (which you actually don't know).

http://www.intranet.management.mcgill.ca/homepage/profs/sarkissian/papers/pspur.pdf

Take the paper with a grain of salt. They used monthly data, 30+ year testing periods and public domain automated trading methods, i.e. trend following. Anyone who's spent a few years testing systems will know you might as well beat your head against a wall as try to find statistically significant or stable results given those parameters.

Why? Markets change dramatically in even a few years. Most systems will need some kind of implicit or explicit adaptation built in to work for more than 5 years, if at all. From experience, monthly data is just not rich enough to build a system on, for any length of testing period or any number of markets. OK, it's not impossible, just very hard to do, especially with public domain, non-adaptive systems.

prophet · Apr 22, 2005

Quote from Lawrence Chan:

The paper described in non-layman's term what I said in my previous message

So I will just summarize it here,

The more explosive patterns (i.e. fits the tested data better) are chosen by most market participants as these patterns are more obvious to find (i.e. simple data mining) thus will not last (i.e. not working in the future).

No, the paper talks about very long term, low frequency systems, and its conclusions cannot be generalized to high frequency systems.

Patterns that "fit the tested data better" (or overfit) typically have many degrees of freedom and/or less testing data. Patterns that are obvious to find have few degrees of freedom and/or a lot of testing data. You can't equate these two. I don't understand exactly what you are saying.

Quote from Lawrence Chan:
As I mentioned before, one important area of data mining in financial data that is not really touched by both academics and actual market participants is the behaviour of smaller time frames (e.g. 15-sec, 5-sec, etc.) due to various reasons like not enough computing power, hard disk space, etc., so this area will be an area that gives an edge to the smaller players in the market.

Yes they don't touch on it, for reasons I don't fully understand (lack of access to sufficient markets or years to make a well-rounded paper?). It is certainly not due to CPU or hard disk limitations, if one knows what they are doing. Yes, it has always provided edges for many players.

Quote from Lawrence Chan:
Notice that many successful independent daytraders are tape readers - the lowest time frame of all data, even though they use many other forms of technical analysis to assist them in their final decisions.

It is very telling, and the reason why papers like this don't generalize to short time frames.

prophet · Apr 22, 2005

Quote from Lefty62151:

The reason that patterns do not have a significant edge is that the underlying price series does not exhibit stationarity. It's interesting that with all the extensive background that you folks claim to have, you don't (none of you) mention this very simple statistical "fact of life".

If you would learn to characterize the data series properly so that you know when price exhibits stationarity (or is likely to exhibit stationarity) you could then look for patterns with confidence that they are the result of non-random behavior.

Basic examples of stationarity occur at or around earnings reports, economic reports, end of month (also known as "window dressing"), bond and note auctions, etc. This is quite basic stuff.

I see the connection you are making between randomness and whether price is stationary or not. Here is a clarification for some of us. As you know, a non-stationary price series can be forced into being stationary by subtracting out a moving average. That won't make price easier to trade, because we trade price, not price minus MA. It is merely a different way to look at price, and something helpful for automated system analysis, especially regression and AI methods.
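
A minimal sketch of the detrending step, assuming a simple trailing moving average (the post does not say which kind of average is meant). The function name `detrend` and the toy price series are hypothetical.

```python
def detrend(prices, length):
    """Return price minus its trailing simple moving average.

    The first length - 1 bars are dropped because the average is not yet
    defined there. This is one possible choice of average, picked for
    illustration only.
    """
    out = []
    for t in range(length - 1, len(prices)):
        ma = sum(prices[t - length + 1 : t + 1]) / length
        out.append(prices[t] - ma)
    return out

# A steadily rising price: non-stationary, since its level never settles.
trend = [100.0 + 0.5 * t for t in range(50)]
residual = detrend(trend, length=5)

print(trend[0], trend[-1])        # the raw series keeps climbing
print(min(residual), max(residual))  # the detrended series stays at one level
```

For a straight-line trend the residual is a constant, because a trailing average lags the latest price by a fixed amount. On a random walk the residual is a finite weighted sum of recent price changes, so it is stationary whenever the changes are, which is why the transformation helps regression-style analysis without making the price itself any easier to trade.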

There is an interesting connection between autocorrelation/random walk, stationary/non-stationary and tradeability. Pure random walk is non-stationary by definition, and not tradeable. If it were stationary, it would have a definite range, and you could trade (fade) excursions and hold the position until they mean revert...100% profitability. That's what you are talking about... stationary periods have non-random patterns (eg a definite range). That is mostly true. If you can find or predict periods of stationary price, you can make great money.

However, market price has periods of stationary and non-stationary character. Yes, both can be traded... if you know which is present. It is not easy to predict when price will transition between a stationary and non-stationary period. A system designed for stationary (mean reverting) price will generate losses in non-stationary (trending) periods, and vice versa.

Here is where I disagree somewhat with your post. Non-stationary does not imply random walk, or hard to trade. Price can be both non-stationary and easy to trade. It is actually the autocorrelation (likelihood of trending or countertrending) of price (or system returns) that is useful to examine. Specifically you want a strong positive or negative autocorrelation, as close to +1 or -1 and as far as possible from zero. Strong positive means trends will persist strongly. Strong negative means trends will reverse predictably, or price will oscillate and one can fade or counter-trend-trade price (or the system returns). Zero autocorrelation means random walk, and difficult to trade, at least using lagging price as a guide to predict its future.

All of this applies to system returns as well as price. Good systems have strong positive autocorrelation in their returns, so you know when to trade a system based on its recent profitability history. Strong negative autocorrelation of system returns can be dealt with by fading the system. Zero autocorrelation of system returns means system returns are unpredictable. There may be no way to judge whether to trade or not trade the system going forward. That can be a dangerous situation.
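
As a rough sketch of how this could be measured, the code below computes the lag-1 autocorrelation of a return series (price changes or per-period system returns). Applying it to returns rather than to raw price levels is an assumption about what is meant here. The synthetic AR(1) generator `ar1_returns` and its `phi` parameter are hypothetical and exist only to produce trending, counter-trending and random-walk-like examples.

```python
import random
import statistics

def autocorr(xs, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    mean = statistics.fmean(xs)
    num = sum((xs[t] - mean) * (xs[t - lag] - mean) for t in range(lag, len(xs)))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def ar1_returns(phi, n=5000, seed=0):
    """Synthetic returns where each value is phi times the previous one plus noise.

    phi > 0 gives persistence (trending), phi < 0 gives reversal
    (counter-trending), phi = 0 gives independent returns.
    """
    rng = random.Random(seed)
    r, out = 0.0, []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out

trending = autocorr(ar1_returns(0.6))
reverting = autocorr(ar1_returns(-0.6))
random_like = autocorr(ar1_returns(0.0))

print(f"trending:         {trending:+.2f}")
print(f"counter-trending: {reverting:+.2f}")
print(f"random walk:      {random_like:+.2f}")
```

The same `autocorr` call can be pointed at a list of a system's per-trade or per-day returns to check whether recent profitability says anything about the next period.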

Just to add to the confusion... it is also useful to look at autocorrelation of autocorrelation (a-of-a). If a-of-a of price is strongly positive, then you will be able to predict what type of system (trending / counter-trending / no system) is best to trade at any moment.
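
One way to read a-of-a, sketched under assumptions: measure lag-1 autocorrelation of returns in consecutive non-overlapping blocks, then take the lag-1 autocorrelation of that block series. Non-overlapping blocks are used here because overlapping rolling windows share data and would look persistent for mechanical reasons. The regime-switching generator `regime_returns`, the block size and every parameter value are hypothetical.

```python
import random
import statistics

def autocorr(xs, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    mean = statistics.fmean(xs)
    num = sum((xs[t] - mean) * (xs[t - lag] - mean) for t in range(lag, len(xs)))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def block_autocorr(returns, block):
    """Lag-1 autocorrelation measured separately in each non-overlapping block."""
    last_start = len(returns) - block
    return [autocorr(returns[i : i + block]) for i in range(0, last_start + 1, block)]

def regime_returns(n=20000, regime_len=1000, phi=0.5, seed=3):
    """Synthetic returns that alternate between trending and counter-trending regimes."""
    rng = random.Random(seed)
    r, out = 0.0, []
    for t in range(n):
        sign = 1 if (t // regime_len) % 2 == 0 else -1
        r = sign * phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out

returns = regime_returns()
local_ac = block_autocorr(returns, block=100)
a_of_a = autocorr(local_ac)

print(f"blocks measured: {len(local_ac)}")
print(f"autocorrelation of autocorrelation: {a_of_a:+.2f}")
```

A strongly positive value says the character measured in the last block tends to carry into the next one, which is the condition under which choosing trend-following or fading from recent behavior makes sense. In this synthetic series that holds by construction, since regimes last ten blocks each.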

Some rough relationships:

zero autocorrelation = random-walk = non-stationary

positive autocorrelation = trending character = non-stationary

negative autocorrelation = counter trending character = stationary

Note: These are not absolutes, just general, intuitive relationships. One can also have different autocorrelation and different degrees of stationarity for each time scale.

nononsense · Apr 22, 2005

Quote from Diamondtrim:

FYI the pattern recognition engine that TI made automated in real-time what Andrew Lo discovered about chart patterns and their effectiveness. http://web.mit.edu/alo/www/

I am still wondering about what Andrew Lo 'discovered about chart patterns and their effectiveness'.

In spite of the habitual exuberance about this kind of gizmo at ET, things remain extremely quiet about these supposed discoveries. Anybody making money with these?

prophet · Apr 22, 2005

Quote from MAESTRO:

A degree of freedom is a parameter that yields a different system for every value allowed. For example, a moving average based on 10 days will yield different results from a moving average based on 24 days....

Excellent post

For convenience.... http://www.elitetrader.com/vb/showthread.php?s=&postid=734309#post734309

Boy Plunger · Apr 22, 2005

Quote from nononsense:

Hi alex,

My experience somewhat parallels yours, except that I got plenty of software experience from a professional background.

Be good,
nononsense

Thanks for your reply. Some excellent info.

I use basically 3 time frames for equities when i trade; within each i can employ directional or range trading along with other strategies, with some better suited than others during certain conditions, e.g. bond market up/down. My reason for attempting to systemize what i already do is twofold, but i imagine it will expand as i progress in this endeavor: i need to visually see my attack/setup, and i want to eliminate those infrequent 'sabotage' trades that lie outside my realm of control and will power. Bottom line is, i'm hoping to learn more about myself and the way i trade more than creating an unbiased system. I want to be able to use the indicators and math i use in an uncanned software platform running in realtime. In realtick i believe you can program this (API?), but i'm not looking to start a new career as a programmer. The quant guy i use, i make sure he knows nothing about the market, just numbers; plus, he is employed by a firm and travels a great deal. I use him for some longer term stuff and statistical feedback on my trading. Problem is, i have to do this myself. The way i want it.

thanks.

alex

MAESTRO has some good info above. Very true...

man · May 13, 2005

Quote from OddTrader:

Q

HARDING, David, ... For the standard deviation to be a meaningful statistic at all the return time series must be generated from a process that is both stationary and parametric." UQ

dear David

why do i think you are an academic?

peace

PS if i knew the "process" i would not need any kind of risk management ...