Quote from dtrader98:
Thanks, although I'm pretty familiar with the basics. I am more interested in experience-based specifics that work for you, such as architecture, OOS hit rate, input factors, etc. (as explicitly mentioned in the 1st post). Almost as useful would be what didn't work for you.
Sounds like you have mentioned machine learning in some other threads. Feel free to comment on specifics of other types of learners you have worked on or built.
That was my point: for model development data, there are no hard and fast rules. I have no idea what type of data or software tools you have available. Asking for specifics like this is like asking "How many bricks will I need to build a house?" I don't know, how many rooms do you want? Do you want a chimney? How about a garage?
Testing data is another matter. Though this will vary depending on how precisely you need to assess model performance, in most cases only a few thousand cases will be needed for testing each model. Common rules such as "train on 70% and test on 30%" miss the mark because what matters is the absolute number of test cases, not their share of the data. Once the test set reaches a certain size, the measured performance is probably already pretty close to the true value, and adding more test cases improves the precision of the estimate only very slowly. Some authors (Weiss, Indurkhya, Kulikowski, Sklansky, Wassel) make a strong theoretical case for something around 5,000 test cases.
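To put rough numbers on the diminishing returns, here is a minimal Python sketch using the normal approximation to the binomial for a hit rate measured on n independent test cases. The 70% accuracy figure and the function names are my own illustrative assumptions, not taken from the authors cited above, and the independence assumption is often optimistic for market data.

```python
import math

def accuracy_standard_error(p, n):
    """Standard error of an accuracy estimate from n independent test cases."""
    return math.sqrt(p * (1.0 - p) / n)

def ci_half_width(p, n, z=1.96):
    """Approximate 95% confidence half-width (normal approximation)."""
    return z * accuracy_standard_error(p, n)

# Assumed true hit rate, chosen only for illustration.
ASSUMED_ACCURACY = 0.70

for n in (100, 500, 1000, 5000, 20000, 100000):
    hw = ci_half_width(ASSUMED_ACCURACY, n)
    print(f"n = {n:>6}: 95% half-width ~ +/-{100 * hw:.2f} percentage points")
```

Under these assumptions the half-width at 5,000 test cases is about 1.3 percentage points, and because it shrinks with the square root of n, quadrupling the test set to 20,000 only halves it.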
-Will Dwinnell
Data Mining in MATLAB