Quote from Jerry030:
You're welcome.
It is without a doubt the best open source data mining package out there, probably equivalent to a $4K to $6K commercial package. Their business model is very creative: give the package away for free, then make a profit by selling consulting services. People soon realize it takes many hundreds of hours to reach a highly skilled level of usage, and since time is money, once you know what you need it's often more cost-effective to pay to get it done in a few weeks than to spend months doing it yourself.
By system examples, do you mean tutorials or real projects?
You can find some tutorials here: http://www.neuralmarkettrends.com/tutorials/
I don't know of published studies.
Idea: if there are a few of us who want to explore data mining with Rapid Miner and the financial markets, let's start a collaborative group.
For example:
1) Pick 3 or 4 markets and several time frames.
2) Create standard training and test data sets.
3) Put those on a private site.
4) Each month, pick a characteristic to model: entry points, stops, profit targets, trend start, trend stop, and so on. There are over a dozen logical trading-system components for any market or system strategy.
5) Everybody tries to create an optimal result using their preferred method: NN, decision tree, and so on. With Rapid Miner there are lots of potential methods and mixtures of design components.
6) At the end of the month, everybody posts their best model as a RapidMiner file.
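To make step 2 a little more concrete, here is a minimal Python sketch of how a shared training/test split might be produced. It assumes each market's history is a list of (date, close) rows; the `split_chronologically` name, the row format, and the 70/30 ratio are hypothetical choices for illustration, not anything RapidMiner prescribes. The split is chronological rather than random, so every test row lies after every training row in time.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical row format: (trading day, closing price).
Bar = Tuple[date, float]


def split_chronologically(bars: List[Bar], train_fraction: float = 0.7) -> Tuple[List[Bar], List[Bar]]:
    """Split a price history into an earlier training set and a later test set.

    Keeping the test set strictly after the training set avoids leaking
    future information into a model, which a random shuffle would do.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(bars, key=lambda bar: bar[0])  # oldest first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Ten made-up daily closes, only to show the shape of the output.
    history = [(date(2008, 1, day), 100.0 + day) for day in range(1, 11)]
    train, test = split_chronologically(history)
    print(len(train), len(test))      # 7 3
    print(train[-1][0] < test[0][0])  # True
```

If every participant runs the same split on the same files, results posted at the end of the month are at least measured on identical out-of-sample data.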
What one person overlooks, someone else may discover. In any case, each month the best solution becomes a kind of benchmark for further independent research or for incorporation into your own trading system, or at minimum a tutorial lesson for those trying to learn that method. One person might already have a great method for trade exits but be less than optimal at stop-loss placement. Their participation in the group may really pay off if the collaborative research leads to a better stop-loss strategy they can add to their trading system.
What does everyone think?
My thinking is that the work, at least in terms of sharing monthly Rapid Miner model files and statistical results, needs to be a private process for those contributing something to the effort.
Otherwise in typical Internet forum fashion you get 3 people posting anything useful, 110 people reading for what they can learn but contributing nothing and 27 people insisting their own ideas are much better but also contributing nothing useful. For examples, look at most threads on ET or similar groups: much talk, lots of people doing a kind of social-network pecking-order shuffle, and little concrete value.
So with this idea it would be the reverse: let your Rapid Miner model do the talking, cut out the pontificating and posturing, and focus on objective results. This approach is standard in academic circles, where researchers exchange data sets and experimental designs for peer review and validation of their theories.
Jerry030
"Otherwise in typical Internet forum fashion you get 3 people posting anything useful, 110 people reading for what they can learn but contributing nothing and 27 people insisting their own ideas are much better but also contributing nothing useful"
Don't forget the 50 that pop in and try to shred your ideas, but never back it up nor add anything useful to the discussion.
I like the way you think. Might be worthwhile to pursue, but you'd have to find someone to moderate the 'private' logistics.
Regarding examples, there was a good tutorial on financial markets at one of the sites you mentioned (it was on gold). I stepped through it, but the author didn't really draw any conclusions or show any projections.
It was basically "and so here you can look at correlations visually," and that was about it (lesson 5 or 6, I think).
I'd like to see something like that, but with more of a conclusion about predictability and results, much like the kind of system methodology you described earlier.
I plan to play with it a bit more and see if I can come up with something useful.
What I really like is how many of the functions are already instantiated and can be quickly pulled into a graphical, tree-like environment, letting us try many AI-type functions and prototype quickly.
Regarding papers, I see a few that look at things like hit rate, but they don't seem very comprehensive (i.e., they only look at a small slice of the data universe). Also, they tend to be proprietary in their approach.
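For anyone comparing their own results against figures like that, "hit rate" is commonly taken to mean the fraction of predictions whose direction matches the market's actual move. Below is a minimal Python sketch under that assumption; the `hit_rate` name and the convention of skipping zero-change periods are hypothetical choices for this sketch, not a standard taken from any particular paper.

```python
from typing import Sequence


def hit_rate(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of periods where the predicted direction matches the actual move.

    Both sequences hold per-period changes (e.g. next-day returns); only the
    sign is compared. Periods where either value is exactly zero are skipped,
    which is one convention among several (an assumption of this sketch).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    scored = [(p, a) for p, a in zip(predicted, actual) if p != 0 and a != 0]
    if not scored:
        raise ValueError("no non-zero periods to score")
    hits = sum(1 for p, a in scored if (p > 0) == (a > 0))
    return hits / len(scored)


if __name__ == "__main__":
    forecast = [0.4, -0.2, 0.1, 0.3, -0.5]  # model's predicted next-day changes (made up)
    realized = [0.6, 0.1, 0.2, -0.4, -0.3]  # what actually happened (made up)
    print(hit_rate(forecast, realized))     # 0.6
```

A hit rate alone says little about profitability: a system can be right 60% of the time and still lose money if its losing trades are larger than its winners.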
