Quote from dloyer:
A few years ago, I used RapidMiner on a test set of normalized EOD data to look at the simple problem of when a stock should be held overnight.
I used the built-in cross-validation tools and had several years of data for > 800 liquid Nasdaq symbols.
I used free libraries to predict intraday data.
NN = FANN
SVM = libsvm and tinysvm
RVM = dlib
The data preprocessing and analysis were done in C++. With intraday data you have many samples to train on, and you can get a higher correlation out of sample (OOS). But there are too many tricks involved in getting the best results.
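To make the OOS point concrete, here is a minimal sketch in Python (standard library only) of a chronological train/test split followed by a correlation check on the held-out part. The helper names, the toy data, and the least-squares line are all assumptions for illustration; the line is only a stand-in for whatever NN/SVM/RVM model is actually trained.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def chronological_split(samples, train_fraction=0.5):
    """Split time-ordered samples without shuffling, so the test part is strictly later."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b. A stand-in model, not the original setup."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx if vx else 0.0
    return a, my - a * mx

# Hypothetical toy data: (feature, next-bar return) pairs in time order.
samples = [(0.1 * i, 0.05 * i + (0.3 if i % 2 else -0.3)) for i in range(20)]

train, test = chronological_split(samples, 0.5)
a, b = fit_line([x for x, _ in train], [y for _, y in train])

# Score only on data the model never saw during fitting.
oos_pred = [a * x + b for x, _ in test]
oos_corr = pearson(oos_pred, [y for _, y in test])
print("OOS correlation:", round(oos_corr, 3))
```

The important part is that the split is by time, not random: shuffling intraday bars before splitting leaks neighbouring samples into the training set and inflates the score.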
For example, take a simple NN. Which algorithm should you use to train it: incremental, Quickprop, RPROP? What should the learning rate and steepness be? Which neurons should you use: sigmoid, tanh, or others? How many layers? How many neurons in the hidden layers? Too many questions, and I do not have the best answers, even though I spent a lot of time on it.
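To show how quickly those questions multiply, here is a small Python sketch that enumerates one possible search grid. Every value in it is an illustrative assumption, not a recommendation, and the labels are plain strings rather than identifiers from FANN or any other library.

```python
from itertools import product

# Hypothetical search space built from the questions above.
search_space = {
    "training_algorithm": ["incremental", "quickprop", "rprop"],
    "learning_rate": [0.01, 0.1, 0.7],
    "activation": ["sigmoid", "tanh"],
    "activation_steepness": [0.25, 0.5, 1.0],
    "hidden_layers": [1, 2],
    "neurons_per_hidden_layer": [4, 8, 16, 32],
}

def iter_configs(space):
    """Yield one dict per combination of the listed values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))  # 3 * 3 * 2 * 3 * 2 * 4 = 432 networks to train and validate
```

Even this coarse grid means 432 training runs, and each of them has to be judged on OOS data, so a full search gets expensive fast.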
On the other hand, I think 90% of success is in correct data preprocessing and analysis. Which data inputs should you use? TA indicators, wavelets, logarithms, ratios, differences? What should you predict? Up/down, % movement, some trading signal, or...?
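A few of those input and target choices can be sketched in plain Python. The function names, the SMA ratio as the example "ratio", and the toy prices are assumptions for illustration only, not the preprocessing actually used.

```python
import math

def log_returns(prices):
    """ln(p[t+1] / p[t]) for each consecutive pair of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def differences(prices):
    """p[t+1] - p[t] for each consecutive pair of prices."""
    return [b - a for a, b in zip(prices, prices[1:])]

def ratio_to_sma(prices, window):
    """Price divided by its trailing simple moving average.

    The first value corresponds to index window - 1 of the input.
    """
    out = []
    for i in range(window - 1, len(prices)):
        sma = sum(prices[i - window + 1 : i + 1]) / window
        out.append(prices[i] / sma)
    return out

def up_down_labels(prices):
    """Target: 1 if the next price is higher than the current one, else 0."""
    return [1 if b > a else 0 for a, b in zip(prices, prices[1:])]

# Hypothetical toy price series.
prices = [100.0, 101.0, 99.0, 102.0, 102.0]

print(differences(prices))      # [1.0, -2.0, 3.0, 0.0]
print(up_down_labels(prices))   # [1, 0, 1, 0]
print(log_returns(prices))
print(ratio_to_sma(prices, 2))
```

Whatever inputs are chosen, a feature computed at time t has to be paired with the label for the move from t to t+1; using the move that ends at t as the target leaks the answer into the input.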
I looked through many scientific papers that claim to get extra % over Buy&Hold, and mostly they were a piece of crap. They contain math, but they either do not describe how they preprocessed the data or do not provide OOS results.
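For reference, the "extra % over Buy&Hold" number is cheap to compute once OOS returns and signals are available. This is a minimal Python sketch with made-up returns and signals; it assumes a long-or-flat strategy and ignores transaction costs and slippage, which a real comparison should not.

```python
def compound(returns):
    """Total return from compounding a sequence of simple per-period returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def excess_over_buy_and_hold(returns, signals):
    """Strategy total return minus Buy&Hold total return.

    returns: per-period simple returns of the asset (OOS segment only).
    signals: 1 = hold the asset for that period, 0 = stay flat.
    """
    strategy = [r * s for r, s in zip(returns, signals)]
    return compound(strategy) - compound(returns)

# Hypothetical OOS data, for illustration only.
oos_returns = [0.01, -0.02, 0.03, -0.01]
oos_signals = [1, 0, 1, 0]

excess = excess_over_buy_and_hold(oos_returns, oos_signals)
print(round(excess, 6))
```

A paper that reports this figure only on the data the model was fitted to has not really reported it at all.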
Maybe it is true. I am getting the new DLPAL product from Price Action Lab, with the PRO option announced yesterday. It constructs features from price history that are not so well known to the market, and there is potential for some arb there. Always looking for new products to get a nice boost and may retire like Grtz Danny