Optimization, curve-fitting and probability

droth,

Read 'Optimization, The Double-Edged Sword'
Quote from the article (page 3):

"One of my observations over years of strategy development is that profitability of a strategy is inversely proportional to its complexity. Keeping this rule in mind, you should avoid too many signals. Each additional signal you add to the strategy increases the possibility that all of the signals together, in combination, are curve-fitted for the particular historical data. So keep the number of signals in your strategy to a minimum to assure that in combination they are not over-optimized on the data"


I also really like the following statement written by vegasoul:

"A robust system makes very little assumptions about the nature of the market environment and use only very universal and general rules in its design"

The length of the backtesting period is equally important. The idea is to backtest systems that are as simple as possible over a period that is as long as possible. Doing the opposite increases the risk of curve-fitting.
 
Quote from OddTrader:



Only a thought:

Basically the overall dynamics of market prices consists of nine (maybe more) combinations due to three common patterns: Uptrend, Downtrend, and Sideways.

We need to think further: each of these common patterns would possibly have its own dynamics in terms of, for example, time duration, rate of change, etc.

Would it be a very challenging job to derive an optimised system by means of curve-fitting, whether based on historical or randomly generated data set, and selecting an ideal set of parameters (what are they?)?

:confused:

You are going to take over for Jack someday
 
Quote from maxpi:



You are going to take over for Jack someday

Thanks for your good suggestion! :D

But without Jack, I'm diminishing. :mad:

Hmm, wait a minute. Are you implying Jack=Odd? :confused:
 
>Katz/McCormick focused me again on the t-test.
>Scenario:
>The in-sample test of a strategy in this book resulted in
> - #trades: 118

The t-test is normally used for small samples, although it's OK for big samples also, of course, since the Student law converges towards the normal law. But each (individual) variable of the sample should follow a normal law of the same variance and must be independent. In some phenomena the hypotheses are rather reasonable (if you measure a table with the same instrument and the same protocol, it is reasonable to assume that the error is due to randomness) but not in unknown phenomena. As I like to say, checking the validity of the premises is much more important than calculating the significance of the test, since this significance has no significance at all if the basic hypotheses are not checked.

>The statistical significance is adjusted: 1 - power(1-0.0184, 20) = 0.3103 = 31.03%
>(Detail: Can anybody explain the concept behind the last step? It's somewhat unclear to me.)
Supposing that the premises of the Student test (more generally, of a parametric test) are reasonable (which is not evident, as said above, but let's do as if they were), this says that if "the (true but unknown) mean is equal to 0" (called the null hypothesis H0), the probability of getting a result this good by pure chance in a single test is alpha = 0.0184. The basic axiom of probability says that Prob(A) + Prob(not A) = 1 (since 1 is the probability of certainty :) ), so the probability of not getting such a result by chance in a single test is 1 - 0.0184.
After that, if an experiment E is repeated independently, the probability is P(E1*E2*E3*..*En) = P(E1)*P(E2)*...*P(En); this explains power(1-0.0184, 20), the probability that none of the 20 tests gives such a result by chance. Subtracting it from 1 gives the probability that at least one of them does, which is the adjusted significance.
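The multiplication rule for independent tests can be checked with a small simulation. This is only a sketch under stated assumptions: the single-test probability is taken as 0.0184 (the 1.84% figure), the 20 tests are assumed fully independent, and under H0 each test's p-value is assumed uniform on [0, 1). The function name and the seed are made up for the illustration.

```python
import random

def at_least_one_false_positive(alpha, n_tests, trials=100_000, seed=1):
    """Estimate P(at least one of n_tests independent tests has p < alpha) under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under H0, each independent test's p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / trials

simulated = at_least_one_false_positive(0.0184, 20)
exact = 1 - (1 - 0.0184) ** 20  # product rule for independent events
print(f"simulated = {simulated:.3f}, exact = {exact:.3f}")
```

The two numbers should agree to within sampling noise. If the tests are not independent (optimization steps on the same data usually are not), the product rule no longer applies exactly.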

>Now my idea:
>Improving statistical significance could be the target of an optimization process. It could make sense to not only focus on
>good past performance but also (or only!!) look for maximum statistical significance. This might be a way to overcome
>(or at least make less painful) the problem of over-optimization and curve-fitting.
"your" idea is already the foundation of statistical decision theory : The theory of statistical estimator is based on the concept of efficiency like in stock market. Nevertheless efficiency here is not as fuzzy :D. In statiscal theory optimality of estimation of a parameter means 3 things: consistancy (when the size of the sample grows, the estimator must converge towards the true parameter), no bias (for example the empirical standard deviation (of the sample) is a biased estimation of the true standard deviation (of the population) so one must correct by square-root of (n-1)), and efficiency which the smallest variance (if you vary the parameter this variance will also vary so that there is an optimum). The best accepted method to find this optimal is the maximum likelyhood which uses the formula above of multiplication of independant probability to find the roots of partial differential equations which will give the optimal value parameters. THE PROBLEM IS THE PROBABILITY LAW MUST BE KNOWN. This is common sense: you can't create knowledge from thin air :D

Quote from droth:

Hi,

I'm currently reading the book Katz/McCormick "The Encyclopedia of Trading Strategies" which brought me back
to one of my main concerns:

Trying to find a strongly systematic approach to trading, I have to deal with adjusting strategies by
applying specific parameter sets. Selecting specific parameter sets may improve the strategy performance, which
is called optimization. My main concern - as for many traders - is:

"How far do I have to optimize to make a strategy profitable, and when does over-optimization or
curve-fitting start?"

There are many approaches out there to solve this problem, but reading this book an idea came to my mind:

Katz/McCormick focused me again on the t-test.
Scenario:
The in-sample test of a strategy in this book resulted in
- #trades: 118
- Avg profit per trade: 740.9664
- StdDev of profit: 3811.355
(comment: I prefer using avgprofit as percent value, but that's personal style ...)

The StdError ("Expected SD of Mean") is 3811.355/sqrt(118) = 350.8637. Executing a t-test yields a probability of 1.84% that
the results are just due to luck. This does sound really good, but it has to be adjusted for the number of optimization steps
that had to be performed to achieve this result. The results in this example were achieved by executing 20
optimization steps.
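The numbers above can be reproduced with a few lines of code. This is a sketch, not the book's procedure: it assumes a one-sided test of "mean profit = 0" with n-1 degrees of freedom, which is the variant that appears to match the quoted 1.84%. The tail probability is integrated numerically with the standard library only, and the helper names are made up for the illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """One-sided P(T > t) for t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n_trades = 118
avg_profit = 740.9664
std_dev = 3811.355

std_error = std_dev / math.sqrt(n_trades)   # "Expected SD of Mean"
t_stat = avg_profit / std_error
p_one_sided = t_upper_tail(t_stat, n_trades - 1)

print(f"std error     = {std_error:.4f}")
print(f"t statistic   = {t_stat:.4f}")
print(f"p (one-sided) = {p_one_sided:.4f}")
```

A two-sided test would roughly double the probability, so which variant is appropriate matters for the conclusion.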

The statistical significance is adjusted: 1 - power(1-0.0184, 20) = 0.3103 = 31.03%
(Detail: Can anybody explain the concept behind the last step? It's somewhat unclear to me.)
This number is far worse than 1.84%. Anyhow: the book states that, due to serial dependencies, the real error may be less.
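To see how quickly the adjustment grows, here is a small sketch of the formula. It takes 1.84% as the fraction 0.0184, which is the value that reproduces the 0.3103 result, and it assumes the optimization steps behave like independent tests. The function name is made up for the illustration.

```python
def adjusted_significance(p_single, n_tests):
    """Chance that at least one of n_tests independent tests reaches p_single by luck."""
    return 1.0 - (1.0 - p_single) ** n_tests

p_single = 0.0184  # 1.84% expressed as a fraction
for n_steps in (1, 5, 20, 100):
    p_adj = adjusted_significance(p_single, n_steps)
    print(f"{n_steps:>3} optimization steps -> adjusted probability {p_adj:.4f}")
```

With 20 steps this gives about 31%, and with 100 steps the chance of at least one "lucky" result is already well above one half.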

So far, all this stuff can be found in this book on page 63ff, unfortunately I didn't give birth to those ideas :-).

Now my idea:
Improving statistical significance could be the target of an optimization process. It could make sense to not only focus on
good past performance but also (or only!!) look for maximum statistical significance. This might be a way to overcome
(or at least make less painful) the problem of over-optimization and curve-fitting.
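As a rough sketch of what such an optimizer's objective could look like: rank candidate parameter sets by the t-statistic of their per-trade profits instead of by total profit. The trade lists below are made-up numbers and the names are hypothetical. This says nothing about whether the approach holds up on unseen data.

```python
import math
import statistics

def t_statistic(trades):
    """Mean profit per trade divided by its standard error."""
    std_error = statistics.stdev(trades) / math.sqrt(len(trades))
    return statistics.mean(trades) / std_error

# Hypothetical per-trade profits for two parameter sets (made-up numbers).
candidates = {
    "params_A": [900, -700, 1500, -1200, 2000, -800, 1100, -600],
    "params_B": [150, 200, -50, 180, 220, 90, -30, 160],
}

best_by_profit = max(candidates, key=lambda k: sum(candidates[k]))
best_by_t = max(candidates, key=lambda k: t_statistic(candidates[k]))

print("best by total profit:", best_by_profit)
print("best by t-statistic: ", best_by_t)
```

In this toy data the set with the larger total profit is not the one with the larger t-statistic, because its profits are much more erratic. The resulting significance would still need to be adjusted for the number of parameter sets tried.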

Is there any edge in it? Has anybody tried to evaluate past performance based on this kind of t-test and tried to execute the results on unseen data?
Has anybody implemented (and probably executed) optimizers based on this principle?

Any comments are highly appreciated.

Dierk Droth
 
Quote from bdixon619:

The more systems you build, the fewer optimization steps it takes to find a winner.

Dear bdixon,

You gave away the big secret.

Let me restate it my way: "If you're still optimizing, you're not there yet."

Be good,

nononsense
 
Quote from harrytrader:

THE PROBLEM IS THE PROBABILITY LAW MUST BE KNOWN. This is common sense: you can't create knowledge from thin air :D

Dear Harrytrader,

It is good you state this once again. A lot of confusion results in some threads from people acting smart while not really knowing what they are talking about.

Be good,

nononsense

P.S. You have very neat looking graphs!
 
Quote from nononsense:

A lot of confusion results in some threads from people acting smart while not really knowing what they are talking about.

Be good,

nononsense
Yeah, exactly. Like you tooting around on the scalping thread that you're doing roughly 500 trades per day, amongst other BS all over ET.
No offence, it just crossed my mind.

Be good,

Scientist
 
droth

I mentioned in another post George Pruitt's article in August Active Trader magazine on systems. It might save you some time. It may be available online.
 
Quote from Scientist:

Yeah, exactly. Like you tooting around on the scalping thread that you're doing roughly 500 trades per day, amongst other BS all over ET.
No offence, it just crossed my mind.

Be good,

Scientist

You big fool, learn how to read first before BS'ing around here. Go get some more lessons at bubba's.

Be good,

nononsense
 