Data mining

gaidaros · Oct 10, 2007

i agree out-of-sample testing is only a partial solution. the more statistically savvy may correct me, but you are talking about a mass-univariate approach to data mining. this has limitations due to the assumptions of your model. the fundamental correlations proposition mentioned above is one step towards a multivariate approach, which essentially captures more variability than a mass-univariate approach. you could try a Bonferroni correction etc. to filter out many chance signals, but the bottom line remains:

you cannot get more out of the data than the tools and assumptions of your model allow. the reverse approach might offer some edge: first you notice a pattern yourself (wild example: price of oil correlated with number of topless females on beach A, and how it behaves in winter versus summer), then you try to data-mine that pattern together with other patterns in a multivariate approach.

that is why I think computers will never take the human factor out of the equation: computers can compute models but not develop the models themselves. that is up to us (thankfully).

my 2cts

George

Indrionas · Oct 10, 2007

Quote from PredictorX:

Solutions to this problem need not be arbitrary. This subject has been well-studied and documented in the literature. Other resampling methods include k-fold cross validation and bootstrapping. I recommend Weiss and Kulikowski's Computer Systems That Learn.

-Will

Thank you for your advice.
I investigated these two methods you suggested.

As far as I understand, the k-fold cross-validation method makes the opposite assumption regarding the relative sizes of the training and validation samples. In K-fold, the training sample is (K-1)/K of the data, which means the bigger K is, the closer that fraction is to 1 (while in my first post I was proposing that a ratio closer to 0 would be better). However, during this test the validation sample "migrates" through all the data, and I believe that's a good way to test a pattern's consistency (robustness).
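A minimal sketch of the fold bookkeeping described above, assuming contiguous folds over 100 samples (the function name `k_fold_splits` and the sample count are made up for illustration). It only shows how the training share grows with K and how the validation window moves through the data; it does not fit any model.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


if __name__ == "__main__":
    for k in (2, 5, 10):
        train, validation = next(k_fold_splits(100, k))
        # Training share of the data is (k - 1) / k.
        print(k, len(train) / 100, len(validation) / 100)
```

Every sample ends up in the validation set exactly once, which is the "migration" mentioned above.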

The bootstrapping idea is very interesting. So it generates many samples out of the one sample you analyse, to simulate taking random samples out of the population. I don't know how suitable this method is for testing the prediction models I mentioned above (any ideas?). I would take this path: generate sub-samples from my training data sample and calculate the distribution of my prediction model's accuracy. Also, to make the bootstrapping method valid, you would have to generate sub-samples from a data sample that covers as many different market conditions as possible (so as to simulate the market as closely as possible).
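A rough sketch of that path, under loud assumptions: each prediction is recorded as 1 (correct) or 0 (wrong), the `outcomes` list below is invented, and resampling is plain i.i.d. with replacement, which ignores any serial dependence between trading days. The names `bootstrap_accuracy` and `percentile_interval` are hypothetical.

```python
import random


def bootstrap_accuracy(outcomes, n_resamples=2000, seed=0):
    """Return a sorted list of accuracies from resampling outcomes with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    accuracies = []
    for _ in range(n_resamples):
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(resample) / n)
    accuracies.sort()
    return accuracies


def percentile_interval(sorted_values, lower=0.025, upper=0.975):
    """Pick the lower/upper percentile values from an already sorted list."""
    n = len(sorted_values)
    return sorted_values[int(lower * n)], sorted_values[min(int(upper * n), n - 1)]


if __name__ == "__main__":
    # Hypothetical record: 60 correct and 40 wrong predictions.
    outcomes = [1] * 60 + [0] * 40
    distribution = bootstrap_accuracy(outcomes)
    low, high = percentile_interval(distribution)
    print(f"observed accuracy 0.60, bootstrap 95% interval {low:.2f} to {high:.2f}")
```

If the interval comfortably includes 0.5, the observed accuracy is hard to tell apart from coin flipping on this sample.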

Indrionas · Oct 10, 2007

Quote from NoWorries:

The out-of-sample testing (and cross-validation in general) is only a partial solution.

It works nice if your out-of-sample results are similar to your in-sample results every time.

More likely they are not. If it looks bad, you reject your hypothesis (=trading idea) and start over again with a new idea.

If you do this many, many times you are vulnerable to the same data dredging fallacy, b/c eventually you will find a trading rule that looks good both in-sample and out-of-sample. Unfortunately, it looks good just by chance in that case.

In other words, you should impose a certain discipline on yourself: Work as hard as possible on a trading rule using in-sample data. When you are 100% convinced you have something really robust, test it out-of-sample. If it looks bad, reject the rule and start over. If you have to reject your rules frequently and don't see any improvement over time, terminate your trading career and choose another profession.

I agree that out-of-sample and cross-validation tests are not enough. That's why I created this thread - to discuss this problem.

What I think is that these objective methods would really help to weed out garbage results to a great extent. I understand that subjective analysis is a must. A pattern must make sense, and no algorithm can decide that. But again, it helps if the garbage is filtered as much as possible before moving on to subjective analysis.
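To illustrate how much garbage such a filter has to deal with, here is a small simulation sketch. Everything in it is an assumption for illustration: 200 rules that are pure coin flips, 250 trading days, and a one-sided exact binomial test against 50% accuracy. It counts how many worthless rules look "significant" at a naive 5% level and how many survive the Bonferroni correction mentioned earlier in the thread (alpha divided by the number of rules tested).

```python
import math
import random


def binomial_p_value(hits, n, p=0.5):
    """One-sided P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


def count_survivors(n_rules=200, n_days=250, alpha=0.05, seed=1):
    """Test n_rules random (edge-free) rules; return (naive, bonferroni) pass counts."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_rules):
        # Each rule "predicts" correctly with probability 0.5, i.e. no edge at all.
        hits = sum(rng.random() < 0.5 for _ in range(n_days))
        p_values.append(binomial_p_value(hits, n_days))
    naive = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / n_rules for p in p_values)
    return naive, bonferroni


if __name__ == "__main__":
    naive, bonferroni = count_survivors()
    print(f"rules passing at alpha=0.05: {naive}, after Bonferroni: {bonferroni}")
```

Bonferroni is conservative, so it can also throw away real but weak patterns; it is a filter before the subjective look, not a replacement for it.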

So, are there any other objective validation methods?

Indrionas · Oct 10, 2007

Quote from QuantPlus:

IMO...
As someone with a successful trading business for 15 years...
Built entirely on quantitative analysis using proprietary software...

(1) You have to approach this differently.
You have to start with rational correlations...
Such as oil stocks vs oil price, gold stocks vs gold price, various bond market relationships, etc...
And then look for EXPLOITABLE INEFFICIENCIES in a fairly narrow way.

(2) Doing #1 requires a significant amount of fundamental knowledge and trading experience.

Hi,

I'm a software developer, I create my own software for analysis. Have never bought any software related to trading.

(1) What you're saying is basically "add these specific rules". As I already wrote, I have a set of about 200 rules ready. It is no problem for me to add any rule I can think of. For example, suppose we're building a model for an oil stock and I come up with an idea to look at the oil stock-oil price relationship. So I simply program an indicator that calculates the relationship between these two variables. For example: how much the currently analysed stock rose/fell last week relative to its ATR(20), divided by how much the oil futures price rose/fell relative to its ATR(20) (note - I only use the ATR to normalize price changes). So I get a simple stock-to-oil-price indicator. Now I can simply add a few boolean rules to my existing rule set, for example: indicator > 0.5, indicator > 1.0, indicator > 2.0, etc. And then mine the patterns. What really bothers me is how to discern valid patterns from garbage patterns (chance, curve fit, call it what you like). Now there are a few methods mentioned in this thread. And of course the last step would be to take a subjective look at the pattern and decide if it makes any sense.
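A sketch of how such an indicator could look, not my actual code. Assumptions: bars are `(high, low, close)` tuples in date order, "last week" means the last 5 bars, ATR(20) is a simple average of the last 20 true ranges (Wilder's smoothing would differ), and all function names are made up.

```python
def true_range(high, low, prev_close):
    """Largest of the bar's range and its gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(bars, period=20):
    """Simple average of the last `period` true ranges."""
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    recent = bars[-(period + 1):]
    # recent[i] is the bar before recent[i + 1], so recent[i][2] is the previous close.
    ranges = [true_range(h, l, recent[i][2]) for i, (h, l, _) in enumerate(recent[1:])]
    return sum(ranges) / period


def normalized_change(bars, lookback=5, period=20):
    """Close-to-close change over `lookback` bars, in units of ATR (assumes ATR > 0)."""
    change = bars[-1][2] - bars[-1 - lookback][2]
    return change / atr(bars, period)


def stock_to_oil_indicator(stock_bars, oil_bars, lookback=5, period=20):
    """Stock's normalized move divided by oil's; None if oil did not move."""
    oil_move = normalized_change(oil_bars, lookback, period)
    if oil_move == 0:
        return None
    return normalized_change(stock_bars, lookback, period) / oil_move


def boolean_rules(value, thresholds=(0.5, 1.0, 2.0)):
    """Turn the indicator value into threshold rules for the rule set."""
    return {f"indicator > {t}": value is not None and value > t for t in thresholds}
```

The dictionary returned by `boolean_rules` is the part that would be appended to the existing rule set before mining.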

GTG · Oct 10, 2007

The book "Evidence Based Technical Analysis" deals with this topic quite a bit....it's been a while since I read it, but I think most of the book was essentially about how to measure the probability that a pattern discovered through data-mining has a genuine edge. The author of the book uses the bootstrapping method as one of his tools.

Indrionas · Oct 10, 2007

Quote from GTG:

The book "Evidence Based Technical Analysis" deals with this topic quite a bit....it's been a while since I read it, but I think most of the book was essentially about how to measure the probability that a pattern discovered through data-mining has a genuine edge. The author of the book uses the bootstrapping method as one of his tools.

Yes, I read about it on this forum; it was hyped as supposedly "the best book about TA". The problem is, I live in Eastern Europe and buying such books is a risky and expensive thing. It can take up to two months just for it to be delivered, and then I'm not guaranteed to find anything useful in the book, or anything I didn't already know. Or it may be that only 3-5 pages or even fewer discuss the subject I'm interested in (thus not worth the money and time). An e-book would be a nice and welcome alternative.

nitro · Oct 10, 2007

http://en.wikipedia.org/wiki/Null_hypothesis

No pattern based on squiggly lines on your screen will ever make sense.

The goal of all pattern trading should be to distinguish noise traders from signal traders, given a certain time frame. Then you are not even half done.

nitro

Quote from Indrionas:

(1) ...What really bothers me is how to discern valid patterns from garbage patterns (chance, curve fit, call it what you like). Now there are a few methods mentioned in this thread. And of course the last step would be to take a subjective look at the pattern and decide if it makes any sense.

Indrionas · Oct 10, 2007

Quote from nitro:
No pattern based on squiggly lines on your screen will ever make sense.

Hi nitro,

What is the basis of your argument? Could you give an example?
From what I understand, those squiggly lines (I bet you're referring to indicators) can be transformed into discrete boolean values by using a simple threshold. Now the threshold, I believe, should not be an exact science.
Also, I assume you're rejecting indicators (squiggly lines) and accepting price patterns, but the price itself is a squiggly line, and what you see on the screen (price bars) is just an approximation of that squiggly line. You can easily transform any indicator into a bar/candlestick appearance too.

Quote from nitro:
The goal of all pattern trading should be to distinguish noise traders from signal traders, given a certain time frame. Then you are not even half done.

nitro

I don't think I understand what exactly you mean by your first sentence... Could you clarify?
I understand that this is not even half; trade management and risk management come later. I'm currently interested in the pre-development process.

nitro · Oct 10, 2007

Quote from Indrionas:

Hi nitro,

What is the basis of your argument? Could you give an example?
From what I understand, those squiggly lines (I bet you're referring to indicators) can be transformed into discrete boolean values by using a simple threshold. Now the threshold, I believe, should not be an exact science.

Tell me what a squiggly line is supposed to tell me about the market. Tell me the underlying dynamic that a squiggly line represents.

Also, I assume you're rejecting indicators (squiggly lines) and accepting price patterns, but the price itself is a squiggly line, and what you see on the screen (price bars) is just an approximation of that squiggly line. You can easily transform any indicator into a bar/candlestick appearance too.

I reject all of it. Some long-term technical analysis indicators make sense, e.g., Dow theory. But to apply squiggly lines to the intraday market is nonsensical imo. If you can't convincingly summarize in ten words why a squiggly line on your screen has anything to do with an edge, either mathematical or based on underlying psychology, then it is worthless. I suggest you study http://www.nuff.ox.ac.uk/users/hendry/book/ans04.pdf as an example of how applying math to nonsense leads to more nonsense.

I don't think I understand what exactly you mean by your first sentence... Could you clarify?
I understand that this is not even half; trade management and risk management come later. I'm currently interested in the pre-development process.

A signal trader is a trader that has left a footprint on the market, and is capable of continuing to move the market, or not if he leaves. A noise trader is a trader whose market actions carry no further information. The noise trader has no memory.

You can model either the noise trader and make money, or the signal trader and make money, or both and make even more money.

nitro

Indrionas · Oct 10, 2007

Quote from nitro:

Tell me what a squiggly line is supposed to tell me about the market. Tell me the underlying dynamic that a squiggly line represents.

I reject all of it. Some long-term technical analysis indicators make sense, e.g., Dow theory. But to apply squiggly lines to the intraday market is nonsensical imo. If you can't summarize in ten words why a squiggly line on your screen has anything to do with an edge, either mathematical or based on underlying psychology, then it is worthless. I suggest you study http://www.nuff.ox.ac.uk/users/hendry/book/ans04.pdf as an example of how applying math to nonsense leads to more nonsense.

A squiggly line itself doesn't tell me anything. But the value of an indicator might tell me something. Anyway, the only indicator-based rule I've added to my rules database is ADX(14)>40 (and its opposite). What it tells me is the strength of directional movement (above highs and below lows). This rule might be useful for patterns that are used in range extension strategies.
I'm sorry, I forgot to mention that I'm only interested in daily bar patterns. I doubt you can find any meaningful patterns in intraday data; my basis for that is that intraday volatility is very unpredictable and there is too much noise, and also I'm not an intraday trader. More precisely, I'm looking for patterns to predict increased volatility. IMO it's impossible to predict the direction of tomorrow's price (it's 50/50). What I'm looking for are patterns that suggest a high probability of a wider range tomorrow, so I can take a risk and trade in some direction (long or short). I believe this approach to trading is valid. There is a great example of this concept in Toby Crabel's book "Day Trading With Short Term Price Patterns and Opening Range Breakout". Also, a similar approach was used by Monroe Trout (he mentions pattern mining in his interview in The New Market Wizards). Both Crabel and Trout are known for very consistent low-risk results, and both of them came from the same Niederhoffer school. And then there's this guy Acrary (Alan Crary) who mentioned these two trading gurus, and his approach to model building seems similar. Instead of using association-rule mining he used neural networks and genetic algorithms to mine patterns. Somewhere in this forum he posted TradeStation code with an example of what a pattern-based strategy looks like (post ID 249509) http://www.elitetrader.com/vb/showthread.php?s=&postid=249511#post249511 . This is where I took my ideas from.

Quote from nitro:

A signal trader is a trader that has left a footprint on the market, and is capable of continuing to move the market, or not if he leaves. A noise trader is a trader whose market actions carry no further information. The noise trader has no memory.

nitro

So I guess this applies to intraday trading.

Thanks for the response. I will try to read the PDF file you linked.