Currently, I am learning / researching / testing / confirming what's been mentioned by Maestro, dtrader and others in the following thread:
http://www.elitetrader.com/vb/showthread.php?s=&postid=2645481#post2645481
As part of my research, I've been running some tests on random data, its distribution, and the technical tendencies that supposedly run against it. There isn't much output to show, so I'm not going to bother posting any csv or xls, but here are a few things. (The model generation and the tests were automated, so there's not much I can post without hitting the file size limit.)
Anyway, the basic, over-simplified tests I ran were:
1. First, I took a year of ES tick data and got its distribution (simply the average and std dev.) across different aspects of the data (price change in points, in %, sequence, etc.; basically, my slip-forward routine). Then I used the Mersenne Twister to generate random data and adjusted the values so that they stay within the range of the ES data. Once I had about 20 types of data, I ran a bunch of models on them to see how well the models performed. After that, I checked how the models performed out-of-sample for a year. I also added a few more factors to give me a better view of how I should deal with them.
2. Obviously, there's going to be a relationship between the models and their performance, so to add some flavor to the test, I had some of the models developed from other sources like S&P 500 Rebalance (I have a few tests and results, so maybe I should post them sometime... hrmmm), VIX, and Market Sentiment (PREM, ADV, DEC, etc.).
Finally, I added a few static models that are not auto-generated.
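A rough sketch of the random-data step in 1, for anyone who wants to try something similar. This is not the actual routine: it assumes Gaussian draws matched to the average and std dev. of the tick-to-tick changes, and it reads "stays within the range" as simple clamping to the observed min/max. The function name and the tiny sample series are made up for illustration. Python's built-in random module is a Mersenne Twister, so no extra library is needed.

```python
import random
import statistics

def synthetic_prices(real_prices, seed=42):
    """Build a pseudo-random series whose tick changes share the mean and
    std dev. of the real series, kept inside the real price range.
    (Hypothetical helper; Gaussian draws and clamping are assumptions.)"""
    changes = [b - a for a, b in zip(real_prices, real_prices[1:])]
    mu = statistics.mean(changes)
    sigma = statistics.stdev(changes)
    lo, hi = min(real_prices), max(real_prices)

    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    out = [real_prices[0]]
    for _ in changes:
        step = rng.gauss(mu, sigma)
        # Clamp so the synthetic series never leaves the observed range
        out.append(min(hi, max(lo, out[-1] + step)))
    return out

# Made-up stand-in for a year of ES ticks
real = [1100.00, 1100.25, 1100.00, 1100.50, 1100.75, 1100.50, 1100.25, 1100.75]
fake = synthetic_prices(real)
print(fake)
```

Changing the seed, or swapping point changes for % changes, gives a different "type" of data to run the models on. Clamping piles values up at the range edges, so a different adjustment may well be preferable.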
My results and conclusions:
1. Based on this simple test alone, you can categorize models into 2 types: distribution-dependent models and those that aren't.
2. Most (87%) technical models that use market price as their sole source of information (like trend-following, RTM, swings) are distribution dependent. Taking it further, using technical analysis hurts performance more than it helps. As an example, I have a trend-following model that uses some chart-based breakouts, and I have another model that works directly off the distribution of the market character (ex. ORB and EOD Vola.); the simple outlier Vola. model performed better. Meaning there's no point in using a bunch of indicators and patterns. Within this type of model, simple = better. Also, it's safe to say that almost all models that use market prices are dist. dependent in some way.
3. If you're using walk-forward optimization, the important parameter to consider is how long the existing distribution is sustained. This is easy to get: run the test >>> get the kurtosis and skewness values of the distribution >>> find the upper/lower bounds of the values >>> get the frequency and duration.
4. Outliers... outliers... outliers... You can go ahead and split this category further into models that rely on a normal distribution and the outlier-friendly ones.
- The tricky part is the non-dependent ones:
5. Either I haven't grasped the market character that corresponds to the model, or it's an edge. And what people consider edges usually get placed in here. Personally, I'd say 80% of edge-based trading gets thrown in here, and these edges get missed due to their nature.
6. Problem? 97% of the 12.46% (non-dependents) are curve-fits. The rate at which these models hold up out-of-sample is extremely low relative to the 26% for the dependents. I still haven't found any pure mathematical / computational logic to extract the good ones, which leaves me using my own logic and reasoning. Even if the concepts behind the models are sound to me personally, that doesn't guarantee a high rate of success. (Haven't tested... don't know how...)
7. So... I took the better performing ES models that showed zero relevance to the pseudo-random data and ran them on different markets. This actually helped filter "some" of the good systems. But it just as easily dismissed the models that performed better out-of-sample.
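Going back to point 3, the chain (kurtosis/skewness >>> bounds >>> frequency and duration) can be sketched roughly like this. It is only one way to read that step: the rolling window, the bound values, and the function names are all made up for illustration, and the moments are the plain population versions (excess kurtosis, so a normal distribution comes out near 0).

```python
import statistics

def skew_kurt(xs):
    """Population skewness and excess kurtosis of a sample."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        return 0.0, 0.0  # flat window: no shape to measure
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def regime_runs(changes, window, skew_bounds, kurt_bounds):
    """Lengths of consecutive rolling windows whose skewness and kurtosis
    both stay inside the given (lower, upper) bounds."""
    runs, current = [], 0
    for i in range(window, len(changes) + 1):
        s, k = skew_kurt(changes[i - window:i])
        inside = (skew_bounds[0] <= s <= skew_bounds[1]
                  and kurt_bounds[0] <= k <= kurt_bounds[1])
        if inside:
            current += 1
        elif current:
            runs.append(current)  # the distribution just left the bounds
            current = 0
    if current:
        runs.append(current)
    return runs

# Made-up price changes with one outlier in the middle
changes = [1, -1, 1, -1, 1, -1, 10, -1, 1, -1, 1, -1]
runs = regime_runs(changes, window=4,
                   skew_bounds=(-0.5, 0.5), kurt_bounds=(-2.5, 0.0))
frequency = len(runs)                    # how often a stable stretch occurs
avg_duration = statistics.fmean(runs)    # how long it lasts, in windows
print(runs, frequency, avg_duration)
```

The average run length is the kind of number that could feed the walk-forward re-optimization interval; how the bounds themselves get picked is left open here.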