did i apply curve fitting to my system

intradaybill · May 5, 2009

Quote from nephos:

Of course you can curve-fit and over-optimize that way like crazy. Change the identifiers OHLC, change how many bars ago, change the combination of the (un)equations.

I am not saying that data mining like that inevitably leads to over-optimization, but it certainly is a possibility.

You confuse curve-fitting with selection bias. You can not curve fit C > O but you can select C > H instead.

I think he makes that clear in that link. Trading systems can be modeled as discrete sequences of entry and exit points generated by some process. Any curve fitting must eventually be reflected in how these sequences are distributed over time. Things like C > O AND O > H, etc. cannot be redistributed over a given time series. There are no parameters available for doing that.

Selection bias can be dealt with up to a certain point, but curve fitting cannot. I hope you understand the difference. Both are bad, but there is a fundamental difference. With curve fitting you can essentially specify a posteriori who won the lottery, which is as absurd as it sounds. With selection bias your only claim is that because someone won, you can also win.

I don't think many people understand the difference. Those who do not understand it have no chance of getting ahead in this business. Take out your probability book and review the concepts. Try to understand the difference between absurd claims (like curve fitting) and probabilistic claims (like selection bias). They have no connection whatsoever, and whoever told you they do lied to you.

TSGannGalt · May 5, 2009

Quote from intradaybill:

You confuse curve-fitting with selection bias. You can not curve fit C > O but you can select C > H instead.

I think he makes that clear in that link. Trading systems can be modeled as discrete sequences of entry and exit points generated by some process. Any curve fitting must eventually be reflected in how these sequences are distributed over time. Things like C > O AND O > H, etc. cannot be redistributed over a given time series. There are no parameters available for doing that.

Selection bias can be dealt with up to a certain point, but curve fitting cannot. I hope you understand the difference. Both are bad, but there is a fundamental difference. With curve fitting you can essentially specify a posteriori who won the lottery, which is as absurd as it sounds. With selection bias your only claim is that because someone won, you can also win.

I don't think many people understand the difference. Those who do not understand it have no chance of getting ahead in this business. Take out your probability book and review the concepts. Try to understand the difference between absurd claims (like curve fitting) and probabilistic claims (like selection bias). They have no connection whatsoever, and whoever told you they do lied to you.

Agreed.

nephos completely missed the point.

The choice of OHLC field, which is the data the hypothesis is derived from, is a completely different matter from the initial hypothesis itself.

In other words, you'll be dealing with 2 separate systems if one system uses Open and the other uses Close. That puts them out of scope for a relative analysis of whether something is "curve-fitted".

If you were to do what nephos does, you'd need to run another series of tests to define a relationship between the values of the data (OHLC) on top of the hypothesis (models and systems). But again, the OHLC selection would not be considered "curve-fitted" by definition.

nephos · May 5, 2009

Quote from intradaybill:

You confuse curve-fitting with selection bias. You can not curve fit C > O but you can select C > H instead.

It appears to me the confusion is on your side. OHLC is just a traditional way of sampling the data to save computing resources. You can replace OHLC with tick resolution that contains all information and build the same models, it will just be more difficult to handle. Selection bias happens before everything I was writing about. Selecting the symbol, testing period etc. happens before.

Quote from TSGannGalt:

Agreed.

nephos completely missed the point.

Some people here seem to have reading comprehension issues. I was writing about models derived from OHLC analysis like complex price patterns. My point was that you can curve-fit with rules and logic as you can with parameters. It ultimately amounts to the same. You can also parameterize rules with switch-case.

Quote from intradaybill:

There are no parameters available for doing that.

In fact all parameters for those price patterns can be optimized:

1. Action to take like buy or sell
2. Price identifier like open, high, low or close (can be emulated with tick only)
3. Price delay like [0] or [2] or [45] bars ago
4. Relational operator like > or =
5. Logical operator like AND or OR
6. How many (un)equations
7. Order of (un)equations

You can combine several patterns optimized that way to perfectly curve-fit any time series.
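To make the size of that search space concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names (`FIELDS`, `LAGS`, `make_bars`, `rule_pnl`, `best_rule`), the random-walk bars, and the payoff (hold for one bar, close to close). It also restricts the list above to a single (un)equation with no AND/OR, so the real space is far larger. It enumerates every rule of the form "field[lag] op field[lag], then buy or sell" and picks the best one on data that has no edge by construction.

```python
import itertools
import random

FIELDS = ("O", "H", "L", "C")          # item 2: price identifier
LAGS = (0, 1, 2)                       # item 3: bars ago (kept tiny on purpose)
OPS = {">": lambda a, b: a > b,        # item 4: relational operator
       "<": lambda a, b: a < b}
SIDES = (1, -1)                        # item 1: buy (+1) or sell (-1)

def make_bars(n, seed):
    """Random-walk OHLC bars; by construction there is no edge to find."""
    rng = random.Random(seed)
    bars, price = [], 100.0
    for _bar in range(n):
        path = [price]
        for _step in range(4):
            path.append(path[-1] + rng.gauss(0, 1))
        bars.append({"O": path[0], "H": max(path), "L": min(path), "C": path[-1]})
        price = path[-1]
    return bars

def rule_pnl(rule, bars):
    """Sum of next-bar close-to-close moves taken whenever the rule fires."""
    field_a, lag_a, op, field_b, lag_b, side = rule
    total = 0.0
    for t in range(max(LAGS), len(bars) - 1):
        if OPS[op](bars[t - lag_a][field_a], bars[t - lag_b][field_b]):
            total += side * (bars[t + 1]["C"] - bars[t]["C"])
    return total

# Every single-condition rule: 4 * 3 * 2 * 4 * 3 * 2 combinations.
ALL_RULES = list(itertools.product(FIELDS, LAGS, OPS, FIELDS, LAGS, SIDES))

def best_rule(bars):
    """The 'optimized' rule: whichever one made the most on this data."""
    return max(ALL_RULES, key=lambda r: rule_pnl(r, bars))

in_sample = make_bars(500, seed=1)
out_sample = make_bars(500, seed=2)    # a fresh, independent random series
best = best_rule(in_sample)
print("rules searched:", len(ALL_RULES))
print("best rule:", best)
print("in-sample pnl:", round(rule_pnl(best, in_sample), 2))
print("out-of-sample pnl:", round(rule_pnl(best, out_sample), 2))
```

The in-sample figure for the winner can never be negative here, because every rule has a mirror image with the opposite side. Whether it holds up on the second series is what the sketch lets you check for yourself.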

TSGannGalt · May 5, 2009

Quote from nephos:

It appears to me the confusion is on your side. OHLC is just a traditional way of sampling the data to save computing resources. You can replace OHLC with tick resolution that contains all information and build the same models, it will just be more difficult to handle. Selection bias happens before everything I was writing about. Selecting the symbol, testing period etc. happens before.

Some people here seem to have reading comprehension issues. I was writing about models derived from OHLC analysis like complex price patterns. My point was that you can curve-fit with rules and logic as you can with parameters. It ultimately amounts to the same. You can also parameterize rules with switch-case.

In fact all parameters for those price patterns can be optimized:

1. Action to take like buy or sell
2. Price identifier like open, high, low or close (can be emulated with tick only)
3. Price delay like [0] or [2] or [45] bars ago
4. Relational operator like > or =
5. Logical operator like AND or OR
6. How many (un)equations
7. Order of (un)equations

You can combine several patterns optimized that way to perfectly curve-fit any time series.

...

1. There is no confusion. You're only lacking the background knowledge on which intradaybill and I are basing our thoughts.

2. OHLC != Tick datasets. It's the basic Aristotle plausible reasoning about logic...

If A is true, then B is true. If B is false what is A?

3. Run tests.

4. I'm not your Math teacher and it's rather pointless to discuss this, until you go back and read some books on Probability and Optimization Theory. Some topics are good for discussion but some are not. If you insist on going on... then so be it...

nephos · May 5, 2009

Quote from TSGannGalt:

2. OHLC != Tick datasets. It's the basic Aristotle plausible reasoning about logic...

Any other interval, like 358-second OHLC, 55-minute OHLC or 1-year OHLC, can be computed from time-stamped 1-tick data. The tick interval contains all the information of the higher time frames, and 1 tick is C only, not OHLC.

You really seem not to understand that OHLC really is nothing but a data sampling technique that can as well be skipped in model development by using the pure source data. Not my problem though.
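A minimal sketch of that computation in Python, assuming ticks arrive as time-ordered `(timestamp_seconds, price)` pairs. The `Bar` type, the `ticks_to_ohlc` name and the sample ticks are made up for illustration. Bars are aligned to multiples of the period, and periods with no ticks simply produce no bar.

```python
from typing import List, NamedTuple, Tuple

class Bar(NamedTuple):
    start: int    # bar start time in seconds
    open: float
    high: float
    low: float
    close: float

def ticks_to_ohlc(ticks: List[Tuple[int, float]], period: int) -> List[Bar]:
    """Aggregate time-ordered (timestamp_seconds, price) ticks into OHLC bars."""
    bars: List[Bar] = []
    for ts, price in ticks:
        start = ts - ts % period               # bucket this tick falls into
        if bars and bars[-1].start == start:   # same bar: update H, L and C
            last = bars[-1]
            bars[-1] = Bar(start, last.open,
                           max(last.high, price), min(last.low, price), price)
        else:                                  # first tick of a new bar
            bars.append(Bar(start, price, price, price, price))
    return bars

# Hypothetical ticks, for illustration only.
ticks = [(0, 100.0), (120, 101.5), (300, 99.0),
         (357, 100.5), (358, 102.0), (700, 101.0)]

bars_358s = ticks_to_ohlc(ticks, 358)    # the odd 358-second interval
bars_1h = ticks_to_ohlc(ticks, 3600)     # one hourly bar from the same ticks
print(bars_358s)
print(bars_1h)
```

The same tick list feeds both calls, and only the `period` argument changes. Every O, H, L and C value that comes out is one of the original tick prices.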

newguy05 · May 6, 2009

Not to derail this passionate scholarly discussion on OHLC,

but does anyone know what some of the key barometers are for measuring a system? Is there a website or past discussion that lists them with some explanation? I think most of it is just common sense, like % winners/losers vs. positive expectancy etc., but I just want the complete list so I can measure my system.

Also, does anyone have problems testing with IB demo account data? I finished coding my wrapper around the IB Java API. Everything is great, except the ES mini does NOT move in the IB demo. I am not talking about the API, just TWS: the bid and ask stay the same for a good 20 minutes. There are constant trades happening that hit either the bid, ask, or mid, but the bid and ask do NOT move for a long time.

That is definitely not the behavior of the real ES mini.

lolatency · May 6, 2009

Quote from DT-waw:

very good question.
the one not curve-fitted at all is the system which generates no profit over the long run.
the more profit or the more consistent returns it produces - the more it is fitted to the curve.

yet 99.9% of traders are unable to grasp this simple concept!

It's called the bias-variance trade-off and it is the bane of every statistician.

TSGannGalt · May 6, 2009

Quote from lolatency:

It's called the bias-variance trade-off and it is the bane of every statistician.

Ding, ding, ding...

We have the answer!!!

intradaybill · May 6, 2009

Quote from nephos:

Any other interval, like 358-second OHLC, 55-minute OHLC or 1-year OHLC, can be computed from time-stamped 1-tick data. The tick interval contains all the information of the higher time frames, and 1 tick is C only, not OHLC.

You really seem not to understand that OHLC really is nothing but a data sampling technique that can as well be skipped in model development by using the pure source data. Not my problem though.

Again you stubbornly miss the point. Curve fitting has to do with finding a curve that approximates a data series in some optimal way.

Selecting a 7 min OHLC series from tick data is not curve-fitting either. All the information of the former series is in the latter. The OHL information is there but it is a function of C:

O = C{at time To} = C(O)
H = max{C, period} = C(H)
L = min{C, period} = C(L)

The free parameters have to do with time, and they are To and period. The three new parameters that emerge depend on time only. Note that C does not depend on time, and in the new OHLC series C is part of the original series. O, H and L are also part of the original series.

The new OHLC series contains no new information, and it exactly matches the original series at the points O, H, L and C. It is compressed in the sense that it conveys less information than the original series. But do not try to find a way to curve-fit from tick data that way, because it already fits as well as it can: it is just a time-based selection of the original series.

So, to answer your naive claims, the 7 min OHLC series from C tick data is actually:

C(O) C(H) C(L) C

So the bias that is introduced here is time-based selection. The law of large numbers tells us that deviations in returns caused by this kind of selection will average to 0 in the long term.

I don't think you should worry about time-based selection bias if you are back testing correctly. Actually, if your system checks for the stop-loss first, then you are not introducing a winning bias. As it turns out in this case, OHLC time-based selection results in a bias towards lower success rates, and that is the opposite of curve fitting.
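One way to read "checks for the stop-loss first" is sketched below in Python. This is only an illustration under stated assumptions: a long position, a bar given as a dict with `"O"`, `"H"`, `"L"`, `"C"` keys, fills exactly at the stop or target level, and no handling of gaps through the stop. The function name `resolve_long_exit` is made up.

```python
def resolve_long_exit(bar, stop, target):
    """Exit price for a long position inside one OHLC bar, or None if still open.

    When both the stop and the target lie inside the bar's range, OHLC data
    cannot tell which was touched first, so the stop is assumed to fill first.
    That is the pessimistic choice: ambiguous bars are always booked as losers.
    """
    if bar["L"] <= stop:       # stop checked first
        return stop
    if bar["H"] >= target:     # target counts only if the stop was not touched
        return target
    return None                # neither level reached, position stays open

# Hypothetical bars, for illustration only.
both_hit = {"O": 100.0, "H": 103.0, "L": 97.0, "C": 101.0}
target_only = {"O": 100.0, "H": 103.0, "L": 99.0, "C": 102.0}
neither = {"O": 100.0, "H": 101.0, "L": 99.0, "C": 100.0}

print(resolve_long_exit(both_hit, stop=98.0, target=102.0))
print(resolve_long_exit(target_only, stop=98.0, target=102.0))
print(resolve_long_exit(neither, stop=98.0, target=102.0))
```

With this ordering, the back test can only understate the win rate on ambiguous bars, never overstate it.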

Needless to say, in some time-based selections the interval has real significance, like daily data or time frames watched by many traders such as 5, 60 or 240 min.

I am saying that time-based selection can be easily dealt with but in order to understand these issues you must have a lot of empirical knowledge.

nephos · May 6, 2009

I am not entirely sure what point exactly I may have missed. I am however pretty sure that this does not contradict anything I wrote:

Quote from intradaybill:

Again you stubbornly miss the point. Curve fitting has to do with finding a curve that approximates a data series in some optimal way.

Selecting a 7 min OHLC series from tick data is not curve-fitting either.

Who claimed it was since you feel the need to clarify? No one. My claim was that selection bias happens before you work with a selected piece of data:

Quote from nephos:

You can replace OHLC with tick resolution that contains all information and build the same models, it will just be more difficult to handle. Selection bias happens before everything I was writing about. Selecting the symbol, testing period etc. happens before.

So of course you can choose to work with sampled data instead of the source data. Then you have already selected the data you work with, and you move on to curve-fitting. You now try to find any autocorrelation in your given piece of data with some sort of algorithm like complex price patterns. This is the prime example of curve-fitting.

You, however, claimed that selection bias was even involved in the latter process of finding any autocorrelation in an already given piece of (sampled) data:

Quote from intradaybill:

You confuse curve-fitting with selection bias. You can not curve fit C > O but you can select C > H instead.