Breaking the conventional knowledge...

TSGannGalt · Aug 13, 2009

Another set of test results which was interesting:

Drilling down from what I did with the previous test, I have 12 models running tests the same way, except... this time around, I kept the test period static year by year and confirmed validity by tracking performance for the 6 months after each period.

The average performance of each measure was around the range of the previous test, but if you look at performance year by year, you can see the efficiency of the measures decaying as time passes. (I should have output the results from old to new, but we all love to see an upward curve... I hope.)

Anyway... just as the markets change and models' performance decays with time, the tools we use decay too, and along a cleaner curve.
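One simple way to put a number on that decay, offered as a hedged sketch rather than the method used in these tests, is to fit a least-squares line to a measure's year-by-year efficiency; a negative slope is the average decay per year. The yearly values in the usage line are made up for illustration.

```python
def ols_slope(ys):
    """Least-squares slope of ys regressed on x = 0, 1, 2, ... (change per year)."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

# Hypothetical yearly efficiency of one validation measure, oldest first.
decay_per_year = ols_slope([0.60, 0.55, 0.50, 0.45])
```

Tracking this slope per measure would give one concrete input for deciding when a validation tool needs adjusting.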

Again, this is just a test of a single instance. But it's something to consider and a good chance for people to start thinking about the validity of "how" they develop models as much as the models themselves.

So... a question arises... if the significance of a validation tool decays with time, how would you adjust your tool?

I TEST EVERYTHING.

TSGannGalt · Aug 13, 2009

OK... Walkforward...

As mentioned previously... the attached xls is the validity test for different types of walk-forward (WF) analysis.

The test is done similarly to the previous 2 tests, using 12 models....

Windowed WF is... running the WF logic at a constant interval. I have the models re-optimize themselves every year and use the best parameter values for the next 6 months. The results are similar to the original data size/out-sample test: a bit over 50%.
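The windowed schedule can be sketched as a generator of date windows. This is an assumption-laden illustration: it reads "optimize every year, use the parameters for the next 6 months" as a 12-month in-sample window followed by a 6-month out-sample window, rolled forward by 6 months so the out-sample periods tie together. The `add_months` helper is hypothetical and clamps the day of month to 28 to keep dates valid.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is clamped to 28 so it is always valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))

def windowed_wf(start: date, end: date, in_sample_months=12, out_sample_months=6):
    """Return (in_sample_start, in_sample_end, out_sample_end) triples for a rolling WF."""
    windows = []
    is_start = start
    while True:
        is_end = add_months(is_start, in_sample_months)
        oos_end = add_months(is_end, out_sample_months)
        if oos_end > end:
            break
        windows.append((is_start, is_end, oos_end))
        # Roll forward by the out-sample length so each out-sample follows the last.
        is_start = add_months(is_start, out_sample_months)
    return windows

# Usage: optimize on [is_start, is_end), trade the best parameters on [is_end, oos_end).
schedule = windowed_wf(date(2000, 1, 1), date(2002, 1, 1))
```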

Market Driven WF is.... triggered when an outlier event occurs in the market, regardless of the model's status (P/L... open/closed positions, etc.). In this case, I took the market tendency that the model tries to exploit and used it to trigger a WF. As an example... say I have an EOD trend-following model, and the validity of its signal depends closely on the trendiness of the market. I compute the 2-sigma range of the cumulative trendiness over a time period, and when the current trendiness crosses outside that range, it triggers a WF routine...
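The post does not say how "trendiness" is measured. As a hedged sketch, the code below stands in a Kaufman-style efficiency ratio (net move divided by total path length) as a hypothetical trendiness measure, builds the mean ± 2-sigma band from its history, and fires the trigger when the current value leaves the band.

```python
import statistics

def efficiency_ratio(closes):
    """Net price move divided by total path length: 1.0 = straight trend, near 0 = chop."""
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return abs(closes[-1] - closes[0]) / path if path else 0.0

def rolling_trendiness(closes, lookback):
    """Efficiency ratio over each trailing window of lookback + 1 closes."""
    return [efficiency_ratio(closes[i - lookback:i + 1]) for i in range(lookback, len(closes))]

def market_driven_trigger(history, current, k=2.0):
    """True when the current trendiness falls outside mean +/- k * stdev of its history."""
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    return not (mu - k * sd <= current <= mu + k * sd)
```

The same band test works for any market statistic the model depends on; only the input measure changes.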

Equity Curve WF is... triggered when the equity curve moves into an outlier position outside the 2-sigma band of the 1-year test it ran previously.
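A minimal sketch of an equity-curve trigger, assuming (this is not stated in the thread) that the band comes from the mean and standard deviation of daily P/L in the prior 1-year test, scaled as mean·t ± 2·sd·√t for a random-walk equity curve. It returns the first live day on which cumulative P/L leaves the band.

```python
import math
import statistics

def equity_curve_trigger(prior_daily_pnl, live_daily_pnl, k=2.0):
    """Index of the first live day whose cumulative P/L leaves the expected band,
    or None if it never does. Band: mu*t +/- k*sd*sqrt(t), from the prior test."""
    mu = statistics.mean(prior_daily_pnl)
    sd = statistics.stdev(prior_daily_pnl)
    cum = 0.0
    for t, pnl in enumerate(live_daily_pnl, start=1):
        cum += pnl
        if abs(cum - mu * t) > k * sd * math.sqrt(t):
            return t - 1  # zero-based day index that fires the WF
    return None
```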

Measure Triggered WF is... triggered when the cumulative value of a specific measure becomes an outlier within that measure's past distribution.... So let's say the Avg. %DD is 20% and the 2-sigma range of %DD is 15% at the top and 25% at the bottom. The WF is triggered when the open-position net P/L hits that range.

So... Windowed is obvious. Market Driven definitely added value to the models. Equity Curve WF helps too, though not as much as Market Driven. Measure Triggered WF improved each of the measures by around 5% or so...

This test was quite interesting...
Windowed, Market Driven and Equity Curve WF used NetProfit to pick the parameters for the WF. Each Measure Triggered WF used its own designated measure. All of the Measure Triggered WFs underperformed the generalized WF techniques.

So what was the difference between the top 3 WFs and the Measure Triggered WF? The main and most significant underlying issue is...

"How does the market change?"

djmanu · Aug 14, 2009

Let's say you optimize on 1 year and then forward test only the following month: what would your success rate be?
I'd expect a significantly higher success rate.
You said it's 50% for 6 months, but the market could have changed halfway through those 6 months.

phattails · Aug 16, 2009

Quote from TSGannGalt:

First...

Testing with large sets of data.

I took six profitable models that I trade. I picked these 6 because I was about to run a parametric optimization on them and they have quantifiable sustainability after each test run.

I took 40 years of US equities, commodity futures and cash index data (it's trivial that it's cash, but I needed the data). For each iteration, the computer randomly picks a start date and a symbol and runs the models over a set amount of history: 1, 3, 5, 7, 10, 15 or 20 years of historical data.
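The sampling step described above can be sketched as follows. This is a hypothetical harness, not the author's application: years are indexed 1 to 40, the symbol names are placeholders, and the only constraint enforced is that the in-sample window plus a 1-year out-sample fit inside the available history.

```python
import random

def sample_iteration(rng, symbols, first_year, last_year, length_years, oos_years=1):
    """Pick a random symbol and start year so the in-sample window (length_years)
    and the out-sample window (oos_years) both fit in [first_year, last_year]."""
    latest_start = last_year - length_years - oos_years + 1
    if latest_start < first_year:
        raise ValueError("not enough history for this window length")
    start = rng.randint(first_year, latest_start)  # randint includes both ends
    symbol = rng.choice(symbols)
    in_sample = (start, start + length_years - 1)
    out_sample = (start + length_years, start + length_years + oos_years - 1)
    return symbol, in_sample, out_sample

def fraction_profitable(out_sample_returns):
    """Share of iterations whose out-sample return was positive."""
    return sum(r > 0 for r in out_sample_returns) / len(out_sample_returns)

# Usage with hypothetical symbols over year indices 1..40.
rng = random.Random(42)
draws = [sample_iteration(rng, ["SYM_A", "SYM_B", "SYM_C"], 1, 40, n)
         for n in [1, 3, 5, 7, 10, 15, 20]]
```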

1. The first set of results is the average annualized return over all the test iterations. This serves as the baseline figure for the test.

2. The second set is the average performance of the models after the test above. In other words, after a single round of optimization, each model takes its best-performing parameters and is tested with them for another year... So... the 1-year test sample for Model 1 returns an average of 185%, while the average return in the year after the test is 25%.

What can be concluded from the tests is that a large data set doesn't really help the models sustain their performance for another year of trading.

3. So... I dug a bit deeper: I take the best parameters from set 1 and run an out-sample as in set 2. Then I pick out the parameters for which both Test 1 and Test 2 were positive (Test 1 is the main test, and Test 2 acts as an out-sample... forward testing). There's really no pattern or viable confirmation that lets me conclude that it works.

Anyway, I see no viable confirmation that running over a long stretch of data and out-sampling gives an advantage.

4. Finally, the 4th set of data is the % of model/parameter combinations that were actually positive after the given data size and a 1-year out-sample.

Fraction of models profitable, by in-sample data size (each followed by a 1-year out-sample):

1 year of data: 0.593210268
3 years of data: 0.51020825
5 years of data: 0.529947475
7 years of data: 0.529165128
10 years of data: 0.56812038
15 years of data: 0.565841906
20 years of data: 0.585818864

So profitability is very close to 50/50. I'm starting with profitable models to begin with, so every % is above 1/2. What's more, there's no clean curve relative to the sample size.
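The "no clean curve" observation can be checked with a rank correlation on the seven reported figures. This sketch computes Spearman's rho by hand (the simple formula is valid here because no values are tied); a value near 0 means profitability does not rise or fall consistently with data length.

```python
lengths = [1, 3, 5, 7, 10, 15, 20]  # years of in-sample data
profitable = [0.593210268, 0.51020825, 0.529947475, 0.529165128,
              0.56812038, 0.565841906, 0.585818864]  # reported fractions

def ranks(xs):
    """Rank values 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rho = spearman(lengths, profitable)
print(round(rho, 3))  # 0.179
```

A rho of about 0.18 on seven points is a weak relationship, consistent with the conclusion that more data does not clearly help.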

So I conclude that:

Testing on a large data size is trivial.
Forward testing / out-sample is trivial.

Adding:
I ran Gen. Op.-like tests without carrying over information from the previous generation. The test ran 100,000 generations over 1000 genes.

The fitness function was Net Profit because it keeps the tests very simple and I don't have to worry about the viability of the function affecting the selection process.
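A genetic optimization that carries nothing over between generations (no selection, crossover or mutation from the previous population) behaves like plain random search. The sketch below shows that reading; mapping "100,000 generations over 1000 genes" onto `generations` and `population` is a guess, and the fitness function in the test is a hypothetical stand-in for Net Profit.

```python
import random

def random_restart_search(fitness, bounds, generations, population, seed=0):
    """Each generation draws a fresh random population, with nothing inherited
    from the previous one, so this is random search. Returns (best_score, best_genome)."""
    rng = random.Random(seed)
    best = None
    for _ in range(generations):
        for _ in range(population):
            genome = [rng.uniform(lo, hi) for lo, hi in bounds]
            score = fitness(genome)
            if best is None or score > best[0]:
                best = (score, genome)
    return best
```

Keeping the fitness function simple, as the post does, means any difference in results comes from the search procedure rather than the scoring.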

I TEST EVERYTHING.

1) Try running significance tests for your variances. You can't deduce anything without accounting for the return distributions.

2) For your set 1 data, did you omit the time periods you are not using for the second set and third set (you are never able to use that first year if that is your in-sample optimization period)?

How many variables are you optimizing? How many trades per time period? What degree of significance is each optimization? How have you dealt with price shocks? What does the distribution set look like for your fitness function? What about using other optimization engines? How significant are your test variables? How stable should their distributions be (what are you assuming, and is that okay)? What does the distribution of set 1 look like for each market? Should the selected variables be optimizable in general? What about for your fitness function? What about randomizing the test periods instead of using fixed intervals? When you say trivial, do you mean insignificant, or significant but with only a small added value?
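For the significance test suggested in point 1), a minimal starting point is an exact one-sided binomial test: if each out-sample were a coin flip, how likely is it to see at least k profitable results out of n? The thread does not report n, so it is left as an input; this tests the hit rate only, not the return distributions.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-sided p-value for observing
    at least k profitable out-samples if the true rate were p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```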

I'm not trying to criticize, but I personally hate having to redo some test result because I wasn't thorough. I guess I've gotten anal because of necessity.

Finally, what if, instead of optimizing, you bought all of the parameter sets, weighted by (individual returns)/(total returns) (use compounded returns), and rebalanced every time period? Brute force for that scenario may be a computational problem; time to break out the calculus books.
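One hedged reading of that proposal: weight each parameter set by its compounded growth factor (ending equity / starting equity) divided by the sum of all growth factors, which keeps weights positive, and rebalance every period. The function names and the equal-weight first period are assumptions.

```python
def allocation_weights(growth_factors):
    """Weight each parameter set by its compounded growth factor over the total."""
    total = sum(growth_factors)
    return [g / total for g in growth_factors]

def run_rebalanced(period_returns):
    """period_returns[t][j] is the simple return of parameter set j in period t.
    Weights for each period use compounded growth up to the previous period
    (equal weights at the start). Returns the portfolio's growth factor."""
    n = len(period_returns[0])
    growth = [1.0] * n
    equity = 1.0
    for rets in period_returns:
        w = allocation_weights(growth)
        equity *= 1 + sum(wj * rj for wj, rj in zip(w, rets))
        growth = [g * (1 + r) for g, r in zip(growth, rets)]
    return equity
```

This simulation is cheap per path; the computational burden comes from the number of parameter sets and any search over rebalancing rules.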

infiniwang · Aug 16, 2009

Some threads are full of typed words and fury, but signify nothing.

intradaybill · Aug 16, 2009

Quote from infiniwang:

Some threads are full of typed words and fury, but signify nothing.

High entropy

intradaybill · Aug 16, 2009

Quote from TSGannGalt:

I tend to see way too many people caught up with outdated and factless information.

It works to your advantage, so I would keep quiet if I were you.

Some people set a paradigm for trading system development in the 1980s that was adopted by the majority as the way to go, without questioning it. A tiny minority understood what was wrong with it and went a different way, collectively making fortunes. It is to the advantage of the minority that the original flawed paradigm is kept alive.

TSGannGalt · Aug 16, 2009

Here's another set of typed words that signifies nothing for others...

Actually, I'm not breaking anything here, it pretty much supports the "conventional knowledge". It's about Risk Management...

So... I attached another set of (insignificant) test results...

First, I have 6 systems that are similar to what I've done previously. I optimize a set of parameters, get the best fitted parameters, run each of the test cases (3 types).

The top set of results is the success rate, the 2nd set is the average annualized return, and the 3rd set is the average peak %DD experienced during the test.
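For reference, a sketch of the standard peak %DD calculation on a single equity curve; averaging it across test runs would give a figure like the 3rd set. The sample curve is hypothetical.

```python
def max_drawdown_pct(equity):
    """Largest peak-to-trough decline, as a percentage of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100)
    return worst

print(max_drawdown_pct([100, 120, 90, 130, 117]))  # 25.0
```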

The top row of each set is the result of the model, using a single lot, no position sizing.

The 2nd row is the result of the model when you apply general money management schemes... I say general because it's a set of rules that can be implemented in Tradestation or WealthLab (such as MAE/MFE stops, % of capital, etc.)
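As an example of a general "% of capital" rule, here is a sketch of fixed-fractional sizing: risk a fixed fraction of current equity per trade, given the stop distance. The parameters in the test (2% risk, 4-point stop, $50 per ES point) are illustrative, not the author's settings.

```python
import math

def fixed_fractional_contracts(equity, risk_fraction, stop_points, point_value):
    """Contracts such that hitting the stop loses at most risk_fraction of equity
    (rounded down, never negative)."""
    risk_per_contract = stop_points * point_value
    if risk_per_contract <= 0:
        raise ValueError("stop distance and point value must be positive")
    return max(0, math.floor(equity * risk_fraction / risk_per_contract))
```

Because position size scales with equity, results differ from the single-lot row even with identical signals.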

The 3rd row uses a set of custom MM rules based on how each model works. (Before the actual test runs, the app runs a test to figure out which MM profile best fits the particular instance.)

Anyway, the numbers show that MM is very important. It's like a model realizing its probability of profit over a large number of trades... only this time, we're dealing with it at the portfolio level.

I TEST EVERYTHING.

TSGannGalt · Aug 17, 2009

I'm going off-topic, completely unrelated to the topic of this thread...

So there's been a discussion about Volume and how significant/worthless it is. And conveniently, while I was doing this, there was another thread about Price Action / Backtesting... When I started planning this thread, I figured I should "somewhat" give people hope that Price Action is somewhat transferable into a system...

Posted is a 5-day record of what happened. (I don't trade discretionarily that often... maybe 1-2 days a week...) Anyway, from Aug. 10 to the 15th, I personally traded ES using a blank chart and a DOM. Of course, I had a bunch of symbols in the Quote Datagrid... and some charts of the majors...

- So the first column is the Date.

- 2nd column is how many handles I made using only my discretion, scalping and occasionally momo/swing.

- SR_MP1 is an automated model. It trades the Support/Resistance using Market Profile (Auction Theory) like indications checking for the strength of the S/R.

- SR_MP2 is an automated model. It also trades the S/R using Market Profile like indications. In addition, this model goes against sell-side algos... trying to break VWAP, TWAP, etc.

- MP1 is an automated model. A pure Auction Theory based model. It keeps track of the TOS and Bid/Ask cumulatively and tries to trade the strength of the market regardless of S/R.

- MP/Volume is an automated model. Uses Bid/Ask and Volume(TOS) to make decisions. This filters chart patterns...

- SR_Volume is an automated model. Uses S/R price patterns (not TA patterns like double/triple tops/bottoms...) with Volume. It's the only model that runs on the 5 min. bar chart, because it requires a lot of calculation (the others use tick data).

- Volume. Simple momo system using Volume.
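The models above are described only at a high level, so this is an illustration of two of the inputs they mention, not the author's logic. It computes a running cumulative delta from time & sales, classifying each print as buyer- or seller-initiated by comparing it with the prevailing bid/ask (a common convention, assumed here), and a session VWAP. The `Trade` record and prices are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    price: float
    size: int
    bid: float  # best bid when the trade printed
    ask: float  # best ask when the trade printed

def cumulative_delta(trades):
    """Running signed volume: +size at/above the ask, -size at/below the bid,
    unchanged for prints between the quotes."""
    delta, out = 0, []
    for t in trades:
        if t.price >= t.ask:
            delta += t.size
        elif t.price <= t.bid:
            delta -= t.size
        out.append(delta)
    return out

def vwap(trades):
    """Volume-weighted average price over the given trades."""
    volume = sum(t.size for t in trades)
    return sum(t.price * t.size for t in trades) / volume
```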

1. Take MP and Volume out of any of the models and they will not work. The issue, though, is that all the Price Action based models I have consume a lot of memory... Because I can do a better job manually... I don't bother running them with much weight...

2. Humans do a human task better than a CPU. CPUs do a CPU task better than a human. Still, all the tests and development helped make my own discretionary trading better. I have a clearer view of the market and I know what to look for. (Unlike most people, I don't have discipline problems... and don't believe in psychological issues... my lack of knowledge stays a lack of knowledge... not some unknown voodoo)...

3. Price Action is only price action. The only difference between backtesting/developing models using regular TA and using PA is the capability of the developer/trader. It can be tested and turned into a model. There's nothing fancy about it.

TSGannGalt · Aug 17, 2009

Anyways...

1. I'll be out of ET for a while (real-life professional obligations + compliance issues), so I posted a few things I hoped to post before I leave.

2. Run your own tests and confirm.

3. Do what you test, not what you think. What you think is what needs to be tested. TEST EVERYTHING.

4. I don't know what kind of development "paradigm" people use to develop models, but considering most of what I read on ET... it has a lot of issues. My intention is to get people to rethink their trading, so I kept the test results the way they are. Seriously, if you don't agree with me... run tests... If you agree with me... also run tests...

Finally, good luck trading.
