Data mining

nitro · Dec 23, 2007

Quote from Indrionas:

A squiggly line by itself doesn't tell me anything. But the value of an indicator might tell me something.

I do not dispute that it says something, but acting on that information does not necessarily give you a positive expectancy. I value simple and exponential moving averages highly, but that doesn't mean blindly using them carries a positive edge.

Anyway, the only indicator-based rule I've added to my rules database is ADX(14)>40 (and its opposite). What it tells me is the strength of directional movement (above highs and below lows). This rule might be useful for patterns that are used in range extension strategies.

That may be a worthwhile thing to do. What I suggest to you is that you take every signal where that is true. If you don't make $$ with it, then it has no edge. If you add other rules, some of them discretionary, then you are leaving the realm of empirical science.
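nitro's test ("take every signal where that is true") can be written down mechanically. The sketch below assumes you already have the ADX(14) reading for each bar and the P&L of one fixed, pre-defined trade opened on that bar; how that trade is defined is not specified in the thread. `evaluate_rule` and `RuleStats` are hypothetical names, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RuleStats:
    signals: int       # bars on which the rule fired
    total_pnl: float   # sum of P&L over every signal taken
    mean_pnl: float    # expectancy per signal
    win_rate: float    # fraction of signals with positive P&L


def evaluate_rule(rule_fired: Sequence[bool], trade_pnl: Sequence[float]) -> RuleStats:
    """Take EVERY signal where the rule is true, with no discretionary filter.

    rule_fired[i]: whether the rule (e.g. ADX(14) > 40) holds on bar i
    trade_pnl[i]:  P&L of the fixed, pre-defined trade opened on bar i
    """
    if len(rule_fired) != len(trade_pnl):
        raise ValueError("rule_fired and trade_pnl must have the same length")
    taken = [pnl for fired, pnl in zip(rule_fired, trade_pnl) if fired]
    if not taken:
        return RuleStats(0, 0.0, 0.0, 0.0)
    total = sum(taken)
    wins = sum(1 for pnl in taken if pnl > 0)
    return RuleStats(len(taken), total, total / len(taken), wins / len(taken))


# Hypothetical example: ADX(14) readings and per-bar trade P&L.
adx = [35.0, 42.0, 45.0, 38.0, 41.0]
pnl = [1.0, -0.5, 2.0, 0.3, -1.0]
stats = evaluate_rule([a > 40 for a in adx], pnl)
```

If `mean_pnl` is not positive across all signals, the rule on its own has no edge in the sense nitro describes.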

I'm sorry, I forgot to mention that I'm only interested in daily bar patterns. I doubt you can find any meaningful patterns in intraday data,

Oh, I never said that either. You definitely can. Indicators just aren't those patterns, though.

my basis for that is that intraday volatility is very unpredictable and there is too much noise, and also I'm not an intraday trader. More precisely, I'm looking for patterns to predict increased volatility.

Right, volatility is easier to predict than direction.

IMO it's impossible to predict the direction of tomorrow's price (it's 50/50). What I'm looking for are patterns that suggest a high probability of a wider range tomorrow, so I can take a risk and trade in some direction (long or short).
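"Wider range tomorrow" has to be turned into a concrete target before anything can be mined for it. One possible definition, sketched below purely as an assumption: day t is a target if day t+1's high-low range exceeds some multiple of the average range over the preceding days. The `lookback` and `multiple` defaults are hypothetical and are not values from the thread.

```python
from typing import List, Optional, Sequence


def label_range_expansion(
    highs: Sequence[float],
    lows: Sequence[float],
    lookback: int = 10,     # hypothetical: days in the average-range baseline
    multiple: float = 1.5,  # hypothetical: how much wider counts as "wider"
) -> List[Optional[bool]]:
    """Label day t True if day t+1's range exceeds `multiple` times the mean
    range of the `lookback` days ending at t. Days that lack enough history,
    or have no following day, are labelled None and should be excluded."""
    ranges = [h - l for h, l in zip(highs, lows)]
    labels: List[Optional[bool]] = [None] * len(ranges)
    for t in range(lookback - 1, len(ranges) - 1):
        baseline = sum(ranges[t - lookback + 1 : t + 1]) / lookback
        labels[t] = ranges[t + 1] > multiple * baseline
    return labels
```

Only information available at the close of day t goes into the baseline, so the label does not leak the future into the pattern side.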

I do not believe it is impossible to predict tomorrow's direction with greater than 50/50 probability.

I believe this approach to trading is valid. There is a great example of this concept in Toby Crabel's book "Day Trading With Short Term Price Patterns and Opening Range Breakout". Also, a similar approach was used by Monroe Trout (he mentions pattern mining in his interview in The New Market Wizards). Both Crabel and Trout are known for very consistent low-risk results, and both of them came from the same Niederhoffer school.

Crabel does things that are extremely sophisticated and are not mentioned in those books. You cannot hope to come anywhere close to emulating them without lots of capital. He employs something on the order of several hundred strategies _at_once_. I do not know much about Trout's approach.

And then there's this guy Acrary (Alan Crary) who mentioned these two trading gurus, and his approach to model building seems similar. Instead of using association-rule mining he used neural networks and genetic algorithms to mine patterns. Somewhere in this forum he posted TradeStation code with an example of what a pattern-based strategy looks like (post ID 249509) http://www.elitetrader.com/vb/showthread.php?s=&postid=249511#post249511 . This is where I took my ideas from.

I will look into that. Interestingly, I see lots of people that point to acrary's articles on ET. I never found anything of use in them, but that may say more about me than about acrary.

So I guess this is applied to intraday trading.

Thanks for the response. I will try to read that PDF file you linked.

yw.

nitro

jficquette · Dec 23, 2007

Quote from Indrionas:

Let's suppose price patterns are mined from price data (data sample).

So we come up with a set of patterns that conforms to our preset requirements.

These requirements could be:
1) support - how many times the pattern showed up in our data sample: s(A)=50 would mean that pattern A showed up 50 times.

2) confidence - % hit rate: the percentage of the pattern's occurrences in which it predicted the target correctly, i.e. A->B (pattern A led to target B), so it's basically s(A,B)/s(A). An example could be 80% accuracy.

3) interest - the minimum confidence that is of interest to you. I'll try to explain it with a simple example. Suppose we're analysing a data sample containing 1000 elements. We mark 400 elements as our targets.
Now, if you tried simple random prediction (guessing), you would expect an accuracy of 40%.
Suppose we mine patterns and get three that conform to our minimum support requirement: pattern A has a confidence of 48%, pattern B has a confidence of 65%, and pattern C has a confidence of 32%.
How do you know if these patterns are significant? They should beat random guessing by some preset threshold. Random guess accuracy is 40%, so pattern A has an advantage of 48%-40%=8%, pattern B has an advantage of 65%-40%=25%, and pattern C has a negative advantage of 32%-40%=-8%, so we automatically reject pattern C.
If we set the interest threshold to 10%, pattern A is rejected (8%<10%) and pattern B is accepted (25%>10%).
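The three requirements above can be expressed directly in code. A minimal sketch, assuming patterns and targets are stored as one boolean flag per data element; the `min_support=50` default is a hypothetical threshold (the post uses 50 only as an example support value), and `make_pattern` is a hypothetical helper that rebuilds the post's 1000-element example.

```python
from typing import List, Sequence

EPS = 1e-12  # tolerance so that e.g. an advantage of exactly 10% is not lost to float rounding


def support(flags: Sequence[bool]) -> int:
    """s(X): number of elements on which X is true."""
    return sum(1 for f in flags if f)


def confidence(pattern: Sequence[bool], target: Sequence[bool]) -> float:
    """s(A,B) / s(A): share of the pattern's occurrences that hit the target."""
    s_a = support(pattern)
    s_ab = support([a and b for a, b in zip(pattern, target)])
    return s_ab / s_a if s_a else 0.0


def advantage(pattern: Sequence[bool], target: Sequence[bool]) -> float:
    """Confidence minus the random-guess accuracy (the target's base rate)."""
    return confidence(pattern, target) - support(target) / len(target)


def accept(pattern: Sequence[bool], target: Sequence[bool],
           min_support: int = 50, min_advantage: float = 0.10) -> bool:
    """Keep a pattern only if it meets both the support and interest thresholds."""
    return (support(pattern) >= min_support
            and advantage(pattern, target) >= min_advantage - EPS)


# The example from the post: 1000 elements, 400 of them marked as targets.
N, N_TARGETS = 1000, 400
target = [i < N_TARGETS for i in range(N)]


def make_pattern(hits: int, misses: int) -> List[bool]:
    """Hypothetical pattern firing on `hits` targets and `misses` non-targets."""
    return [i < hits or N_TARGETS <= i < N_TARGETS + misses for i in range(N)]


pattern_a = make_pattern(48, 52)  # confidence 48%
pattern_b = make_pattern(65, 35)  # confidence 65%
pattern_c = make_pattern(32, 68)  # confidence 32%
```

With these inputs `advantage(pattern_b, target)` is 0.25 and `accept` keeps only pattern B.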

The problem I see here is that validating patterns this way is not enough, because data mining for patterns produces a large amount of garbage. So the subject I would like to discuss is PATTERN VALIDATION.
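The "garbage" problem is easy to demonstrate. The sketch below builds patterns that are random by construction, so none has any real relation to the target, and counts how many still pass the 10% interest filter from the example. It reuses the 1000-element / 400-target setup; the 100-occurrence support per pattern and the number of patterns are assumptions chosen for illustration.

```python
import random

EPS = 1e-12  # float-rounding tolerance for the threshold comparison


def count_chance_patterns(n: int = 1000, n_targets: int = 400,
                          n_patterns: int = 1000, pattern_support: int = 100,
                          min_advantage: float = 0.10, seed: int = 1) -> int:
    """Generate patterns that are random by construction (no relation to the
    target) and count how many still pass the confidence-minus-base-rate filter."""
    rng = random.Random(seed)
    targets = set(rng.sample(range(n), n_targets))  # base rate exactly n_targets / n
    base_rate = n_targets / n
    passed = 0
    for _ in range(n_patterns):
        occurrences = rng.sample(range(n), pattern_support)
        hits = sum(1 for i in occurrences if i in targets)
        if hits / pattern_support - base_rate >= min_advantage - EPS:
            passed += 1
    return passed
```

A small fraction of the random patterns (on the order of a few percent with these settings) clears the threshold, and every one of them is a false discovery. The more candidates you mine, the more of these you collect.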
One known technique is out-of-sample testing: test the patterns on data that was not used for mining and see if they still conform to our preset requirements.
Even here it's unclear how much data there should be in the training (where we mine) and testing (out-of-sample) samples. What ratio? In our example we used 1000 data elements to mine patterns, but we could have 3000 data elements in total, so the out-of-sample data set would have 2000 elements, and the training:testing sample size ratio would be 1:2. It's clear that the smaller this ratio, the better, but on the other hand, the training sample has to be big enough that you can actually mine something meaningful out of it. So what's the optimal ratio? And of course, the training sample should be wide enough to cover different market conditions (uptrend, downtrend, ranging, low volatility, high volatility etc.).
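A minimal sketch of the out-of-sample check described above, assuming the same one-flag-per-element representation. The split is chronological (mine on the earlier block, test on the later one) rather than random, a choice made here because market data are time-ordered. Applying the same `min_support` to both samples is a simplification; with a 1:2 ratio you might scale it. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(n: int, train_size: int) -> Tuple[range, range]:
    """Earlier block for mining, later block for out-of-sample testing."""
    if not 0 < train_size < n:
        raise ValueError("train_size must be between 1 and n - 1")
    return range(0, train_size), range(train_size, n)


def passes(pattern: Sequence[bool], target: Sequence[bool], idx: range,
           min_support: int, min_advantage: float) -> bool:
    """Apply the support and interest requirements on the elements in idx only."""
    s_a = sum(1 for i in idx if pattern[i])
    if s_a < min_support:
        return False
    s_ab = sum(1 for i in idx if pattern[i] and target[i])
    base_rate = sum(1 for i in idx if target[i]) / len(idx)
    return s_ab / s_a - base_rate >= min_advantage - 1e-12  # tolerance for float rounding


def out_of_sample_survivors(patterns: Dict[str, Sequence[bool]],
                            target: Sequence[bool], train_size: int,
                            min_support: int, min_advantage: float) -> List[str]:
    """Names of patterns that meet the requirements in BOTH samples."""
    train, test = chronological_split(len(target), train_size)
    return [name for name, pat in patterns.items()
            if passes(pat, target, train, min_support, min_advantage)
            and passes(pat, target, test, min_support, min_advantage)]
```

The 1000-of-3000 example corresponds to `train_size=1000` on a 3000-element series.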

This one technique is widely known and used. But are there any other pattern validation techniques out there? Would anyone experienced in statistical data analysis and/or data mining care to share their knowledge?

You need to understand that ALL price patterns are shadows of patterns in higher time frames. Generally 8 times higher.

For example, a "Flag" on a 5 min chart is nothing more than an inside bar on the 40-45 min chart.

Price behavior is nothing more than meanderings across the statistical boundaries of higher time frames.
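The factor of 8 and the flag-to-inside-bar correspondence are jficquette's claims. The sketch below only gives a way to check the higher-time-frame side of that claim on your own data: it merges eight 5-minute bars into one 40-minute bar and tests for an inside bar. Grouping by bar count rather than aligning to clock times is a simplifying assumption, and detecting the flag itself is not attempted.

```python
from typing import List, NamedTuple, Sequence


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def aggregate(bars: Sequence[Bar], factor: int = 8) -> List[Bar]:
    """Merge each consecutive group of `factor` bars into one higher-time-frame
    bar (e.g. eight 5-minute bars -> one 40-minute bar). A trailing partial
    group is dropped."""
    merged = []
    for start in range(0, len(bars) - factor + 1, factor):
        chunk = bars[start:start + factor]
        merged.append(Bar(open=chunk[0].open,
                          high=max(b.high for b in chunk),
                          low=min(b.low for b in chunk),
                          close=chunk[-1].close))
    return merged


def is_inside_bar(prev: Bar, cur: Bar) -> bool:
    """Strict inside bar: lower high and higher low than the previous bar."""
    return cur.high < prev.high and cur.low > prev.low
```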

John

Indrionas · Dec 23, 2007

Quote from nitro:

I do not dispute that it says something, but acting on that information does not necessarily give you a positive expectancy. I value simple and exponential moving averages highly, but that doesn't mean blindly using them carries a positive edge.

That may be a worthwhile thing to do. What I suggest to you is that you take every signal where that is true. If you don't make $$ with it, then it has no edge. If you add other rules, some of them discretionary, then you are leaving the realm of empirical science.

Indicators are not a big issue for me at all. Out of 183 binary technical rules I use as inputs for pattern mining, only 8 are based on indicators: 2 on ADX and 6 on simple moving averages. What I'm looking for are patterns consisting of 1-3 rules. I'm not interested in indicators as stand-alone signals.
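The size of the search space matters for validation, so it helps to count it. A sketch, assuming a pattern is an AND-combination of 1 to 3 distinct binary rules (the post does not say how rules are combined), with hypothetical function names:

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterator, List, Sequence, Tuple


def count_candidate_patterns(n_rules: int, max_len: int = 3) -> int:
    """Number of distinct patterns made of 1..max_len different rules."""
    return sum(comb(n_rules, k) for k in range(1, max_len + 1))


def enumerate_patterns(rules: Dict[str, Sequence[bool]], max_len: int = 3
                       ) -> Iterator[Tuple[Tuple[str, ...], List[bool]]]:
    """Yield (rule names, occurrence flags) for every AND-combination of
    1..max_len rules. Combining rules with AND is an assumption here."""
    if not rules:
        return
    names = sorted(rules)
    n = len(rules[names[0]])
    for k in range(1, max_len + 1):
        for combo in combinations(names, k):
            flags = [all(rules[name][i] for name in combo) for i in range(n)]
            yield combo, flags
```

For 183 rules, `count_candidate_patterns(183)` gives 183 + 16,653 + 1,004,731 = 1,021,567 candidate patterns. Some of them (a rule combined with its own opposite) can never fire, but the count shows why chance patterns are a serious issue.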

Right, volatility is easier to predict than direction.

I do not believe it is impossible to predict tomorrow's direction with greater than 50/50 probability.

It might be possible to predict tomorrow's direction to some extent. But is this prediction useful if the price move is small? Gotta look for fat-tail events instead.

Crabel does things that are extremely sophisticated and are not mentioned in those books. You cannot hope to come anywhere close to emulating them without lots of capital. He employs something on the order of several hundred strategies _at_once_. I do not know much about Trout's approach.

I'm not trying to emulate his trading. I was only referring to the contents of the book. And I believe the key to consistency is trading multiple uncorrelated strategies at once. No need for several hundreds, a few is enough for a small trader.

Since the start of this thread I have already developed a technique that filters out chance patterns to a great extent. I construct my hypotheses in such a way that I am able to apply the Bonferroni correction. Also, for the mining to be valid, it is very important to choose proper targets (what you're mining for), but that's beyond the scope of this topic.
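Indrionas doesn't say which test he corrects, so the following is only an assumed setup: a one-sided exact binomial test of "confidence > base rate" for each pattern, with the Bonferroni correction dividing the significance level by the number of hypotheses tested. Note that the binomial model assumes independent outcomes, which overlapping or autocorrelated market data may violate. Function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space
    so large n does not overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def bonferroni_significant(hits: int, support: int, base_rate: float,
                           n_hypotheses: int, alpha: float = 0.05) -> bool:
    """One-sided test of 'confidence > base rate', at family-wise level alpha."""
    p_value = binomial_sf(hits, support, base_rate)
    return p_value <= alpha / n_hypotheses
```

Suppose pattern B's 65% from the earlier example came from 65 hits in 100 occurrences (a hypothetical support) against a 40% base rate. It passes when 1,000 hypotheses are tested, but not when roughly a million are, as with every combination of 1-3 of 183 rules.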

PredictorX · Dec 24, 2007

Quote from gaidaros:

I agree out-of-sample testing is only a partial solution. The more statistically savvy may correct me, but you are talking about a mass-univariate approach to data mining.

Can you explain what you mean by "a mass-univariate approach"?

Thanks,
Will
