Quote from rdg:
what if it uses color as the determining factor? if it only knows apples and oranges, then any orange fruit must be an orange and every non-orange fruit must be an apple. and when you put it to work, it thinks all bananas are apples. and all watermelons. and kiwi. and strawberries. and grapes. so now you have a bot that can't really identify any fruit correctly even though you think it can identify apples and oranges perfectly.
Well, not quite what I meant. It can classify apples and oranges, but it doesn't classify every non-orange as an apple. It knows what an apple looks like and what an orange looks like, and it doesn't attempt to classify anything that doesn't look like one of the two.
So when bananas come down the tape, it just goes and flies a kite. That's what I was trying to get at in my post: a regular classification system is judged on how it performs on every element of the test data, so 'I don't know' isn't usually a valid option. With trading you can say 'I don't know' as much as you want and make your profits on the subsets of the problem your system handles very well. So curve-fitting to the apple / orange curves turns out to be beneficial.
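To make the 'I don't know' option concrete, here is a rough Python sketch of a classifier that is allowed to abstain and is only scored on the items it actually answers. The feature choice (hue and roundness), the prototype numbers and the `MAX_DISTANCE` cut-off are all made up for illustration, not taken from any real system.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical prototypes: (hue in degrees, roundness 0..1). Made-up numbers.
PROTOTYPES = {"apple": (0.0, 0.85), "orange": (30.0, 0.95)}
MAX_DISTANCE = 0.2  # assumed cut-off; anything farther away is "I don't know"


def classify(hue: float, roundness: float) -> Optional[str]:
    """Return the nearest known fruit, or None when nothing is close enough."""
    best_label, best_dist = None, math.inf
    for label, (p_hue, p_round) in PROTOTYPES.items():
        # Scale hue to roughly 0..1 so both features carry similar weight.
        dist = math.hypot((hue - p_hue) / 360.0, roundness - p_round)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= MAX_DISTANCE else None


def score(samples: List[Tuple[float, float, str]]) -> Tuple[float, float]:
    """Return (coverage, accuracy), where accuracy ignores abstentions."""
    answered = correct = 0
    for hue, roundness, truth in samples:
        guess = classify(hue, roundness)
        if guess is None:
            continue  # abstaining costs nothing in this scoring
        answered += 1
        correct += guess == truth
    coverage = answered / len(samples)
    accuracy = correct / answered if answered else 0.0
    return coverage, accuracy


tape = [
    (5.0, 0.80, "apple"),
    (28.0, 0.95, "orange"),
    (55.0, 0.20, "banana"),       # too far from both prototypes
    (120.0, 0.70, "watermelon"),  # too far from both prototypes
]
coverage, accuracy = score(tape)
print(coverage, accuracy)
```

On this toy tape the classifier answers only half the items but gets every one it answers right, which is the trade-off being argued for.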
Take the case of color. Say it finds in the training set that apples are red and oranges are... orange, and that is its whole hypothesis. Sure, it might classify a strawberry or a dragon fruit as an apple and get it wrong. But if over a year of back data you saw 100% profitable days with small drawdowns, maybe misclassifying strawberries and dragon fruits is just the cost of doing business. You base your estimate of how many strawberries and dragon fruits to expect on what you saw in your large sample set. Of course it could be wrong, but the only hope you have of predicting the future is looking at the past. If you are going to assume a huge change in the number of dragon fruits etc., there is no point trying to learn the market, whether with ATSs or discretionary trading.
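As a back-of-the-envelope version of the color example, the sketch below runs a color-only rule over a toy tape and tallies right, wrong and skipped calls. The fruit counts and the +1 / -1 payoffs are invented for illustration; the point is only that a few misclassified strawberries can be an affordable cost if the mix in the sample holds up.

```python
from collections import Counter
from typing import List, Optional, Tuple


def color_rule(color: str) -> Optional[str]:
    """The whole hypothesis: red means apple, orange means orange, else pass."""
    if color == "red":
        return "apple"
    if color == "orange":
        return "orange"
    return None


def backtest(tape: List[Tuple[str, str]], win: float = 1.0, loss: float = 1.0):
    """Return (net P&L, tally) with assumed payoffs of +win / -loss per call."""
    pnl = 0.0
    tally = Counter()
    for fruit, color in tape:
        guess = color_rule(color)
        if guess is None:
            tally["skipped"] += 1
        elif guess == fruit:
            tally["right"] += 1
            pnl += win
        else:
            tally["wrong"] += 1
            pnl -= loss
    return pnl, tally


# Made-up historical mix: strawberries are red, so they get called apples.
history = (
    [("apple", "red")] * 40
    + [("orange", "orange")] * 40
    + [("strawberry", "red")] * 5
    + [("banana", "yellow")] * 15
)
pnl, tally = backtest(history)
print(pnl, dict(tally))

# If the future tape suddenly contains far more strawberries, the same rule loses.
regime_change = history + [("strawberry", "red")] * 100
print(backtest(regime_change)[0])
```

The second call shows the caveat from the post: the rule is only as good as the assumption that the future fruit mix resembles the sample.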
Quote from sulli:
Walter, are you saying that you believe if one were to optimize over a very large sample of data, that it would be hard to over optimize?
Just trying to get a handle on your point?
Ah, no, that's not what I was trying to say. What I meant was that it is possible to over-optimize on data of any size, depending on the granularity of your AI's grammar. But even if you do over-optimize on past data, that is not necessarily a bad thing in trading. As long as the optimization picks out some correct subsets of profitable trades (100% profitable days seen) and not too many incorrect subsets (relatively small drawdowns), it will be a good system regardless of the trades you are missing or of being over-optimized for very specific elements.
I guess the nutmeat of the idea is that curve fitting to past data when building a regular classification system can often be very bad, but with trading the same kind of over-fitting is acceptable as long as you a) have a large sample set, b) make enough trades during the backtest, and c) have a small grammar and VC dimension. Basically, allow 'I don't know' as a valid classification option as long as the system does attempt to classify sometimes (and turns a profit on those classifications).
In this case we can accept someone as a good boxer solely because he fights weak opponents, regardless of his actual boxing ability. Otherwise known as my baby punching lemma.
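Conditions a) to c) could be turned into a simple acceptance check along these lines. This is only a sketch: the `Backtest` fields, the use of a parameter count as a crude stand-in for grammar size / VC dimension, and every threshold are assumptions picked for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backtest:
    daily_pnl: List[float]  # one entry per day in the sample, 0.0 if no trade
    trades: int             # number of trades taken during the backtest
    rule_params: int        # crude stand-in for grammar size / VC dimension


def max_drawdown(daily_pnl: List[float]) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    peak = equity = worst = 0.0
    for pnl in daily_pnl:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def acceptable(bt: Backtest, min_days: int = 250, min_trades: int = 100,
               max_params: int = 5, max_dd: float = 10.0) -> bool:
    """All thresholds are hypothetical defaults, not recommendations."""
    return (
        len(bt.daily_pnl) >= min_days                # a) large sample set
        and bt.trades >= min_trades                  # b) enough trades
        and bt.rule_params <= max_params             # c) small hypothesis space
        and sum(bt.daily_pnl) > 0                    # profitable when it does trade
        and max_drawdown(bt.daily_pnl) <= max_dd     # small drawdowns
    )


long_test = Backtest(daily_pnl=[1.0, -0.5] * 150, trades=300, rule_params=2)
short_test = Backtest(daily_pnl=[1.0] * 10, trades=10, rule_params=2)
print(acceptable(long_test), acceptable(short_test))
```

Days where the system says 'I don't know' simply show up as 0.0 in `daily_pnl`, so abstaining never counts against it here; only too few trades overall does.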