Avoiding Curve Fitting

Quote from Rocko Bonaparte:

I've seen neural networks used for classification by encoding the outputs to represent different situations. Say, detecting letters in an image would be a classification problem, and the neural network could have a different output neuron for each letter.

The "regression" method as in the Wikipedia article is something I tried once by cobbling together some random indicators with their first- and second-order derivatives (derivatives in the differential-equations sense, not the financial sense). It didn't work, and I'm pretty sure it was because the indicators weren't good to begin with. None of them was profitable alone, nor did I ever find a combination that helped. The neural network was a red herring that sidetracked me for months because of a few bugs in its implementation; it was fool's gold in the end.
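On bar data, the "first and second derivatives" of an indicator are usually approximated with finite differences. A minimal sketch, assuming the indicator is already a plain list of floats; the function name `add_derivative_features` is hypothetical. Backward differences are used so each row depends only on the current and earlier bars.

```python
from typing import List, Tuple


def add_derivative_features(values: List[float]) -> List[Tuple[float, float, float]]:
    """Return (value, first difference, second difference) for each bar.

    The first two bars are skipped because a second difference needs
    three consecutive values. Only current and past bars are used.
    """
    features = []
    for i in range(2, len(values)):
        d1 = values[i] - values[i - 1]                      # backward first difference
        d2 = values[i] - 2 * values[i - 1] + values[i - 2]  # backward second difference
        features.append((values[i], d1, d2))
    return features


print(add_derivative_features([1.0, 4.0, 9.0, 16.0]))  # [(9.0, 5.0, 2.0), (16.0, 7.0, 2.0)]
```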

I could see a neural network curve fitting in a problematic way if it's applied to a narrow set of data without enough data to counter it. But that's also where curve fitting would be a problem with any other method.

Your experience is common. Successful usage in this very complex area often requires an MS in Computer Science. Kicking around a few data files seldom does it.

To your point on data: a large training set is required, along with smaller sets for testing and validation. For hourly EUR/USD, about 24,000 bars work very well for a trade window of up to 12 hours into the future.
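A minimal sketch of a chronological train/validation/test split for a series of about 24,000 hourly bars. The 70/15/15 proportions and the name `chronological_split` are assumptions for illustration, not figures from this thread. The bars are not shuffled, so validation and test data always come after the training data in time.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    bars: Sequence[T], train_frac: float = 0.70, val_frac: float = 0.15
) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Split time-ordered bars into train, validation and test sets without shuffling.

    The oldest bars go to training and the newest to testing, so the model
    is always evaluated on data that comes after what it was fitted on.
    """
    if not (0.0 < train_frac and 0.0 < val_frac and train_frac + val_frac < 1.0):
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(bars)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return bars[:train_end], bars[train_end:val_end], bars[val_end:]


# Hypothetical stand-in for ~24,000 hourly EUR/USD bars (just their indices here).
train, val, test = chronological_split(list(range(24_000)))
print(len(train), len(val), len(test))  # 16800 3600 3600
```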
 
Quote from Jerry030:

Your experience is common. Successful usage in this very complex area often requires an MS in Computer Science. Kicking around a few data files seldom does it.

To your point on data: a large training set is required, along with smaller sets for testing and validation. For hourly EUR/USD, about 24,000 bars work very well for a trade window of up to 12 hours into the future.

To be fair, I do have a BS in computer engineering. I was using NEAT since it's what I could readily find without having to come up with a neuroevolving network from scratch. Despite this, after I ironed out all the bugs, it tended to saturate. Generally it decided on each training pass that it was better to just saturate and take the hit rather than try to guess and potentially score worse.

I'm intending to try again when I have indicators that have either proven to work well together or are profitable on their own. So it would be optimization more so than discovery.
 
Quote from Random.Capital:

You need a mirror, badly. It is 100% clear you do not understand what you are talking about. Regression is exactly how "classification" NNs are trained.

I'm out.

Cheers.

Wonder what the odds are of a product-pump coming next...?

Yep, and regression as implemented in a NN is iterative adaptation, or curve fitting, to map the input neurons to the dependent or output variable. Philosophically, if we ignore the hidden nodes, weight functions and a few other things, this is exactly what happens in most trading system optimizations. You can do it in Excel in 20 minutes with Solver, etc., if you want: find the perfect fit between your goal and the data elements in your trading system.
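To make the "training is curve fitting" point concrete, here is a minimal sketch that fits a straight line y = a*x + b by repeatedly stepping downhill on the mean squared error, then checks the result against the closed-form least-squares answer. Plain gradient descent is a stand-in here; Excel Solver and NN trainers use more sophisticated optimizers. The data, learning rate and step count are made-up assumptions.

```python
from typing import List, Tuple


def fit_line_gd(xs: List[float], ys: List[float],
                lr: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Fit y = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def fit_line_closed_form(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares solution for the same straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up data
print(fit_line_gd(xs, ys))
print(fit_line_closed_form(xs, ys))
```

Both approaches land on essentially the same line; the iterative version is simply "search until the error stops shrinking", which is the sense in which optimization, Solver and NN training are all curve fitting.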

Before you run off to hide, could you offer an opinion on why so many people do this on ET? In most other groups in technical or academic areas, folks at least have the courtesy to say, "Oops, I see your point," when you cite source material like Wikipedia or journal sources that deflate the other party's balloon.

Here folks run away like kids on the playground in junior high. Why is ET like that?
 
Quote from Rocko Bonaparte:

To be fair, I do have a BS in computer engineering. I was using NEAT since it's what I could readily find without having to come up with a neuroevolving network from scratch. Despite this, after I ironed out all the bugs, it tended to saturate. Generally it decided on each training pass that it was better to just saturate and take the hit rather than try to guess and potentially score worse.

I'm intending to try again when I have indicators that have either proven to work well together or are profitable on their own. So it would be optimization more so than discovery.


I've found that in some cases a NN will learn more when it is given the mathematical components used to calculate an indicator along with the indicator itself. Keep in mind that indicators were invented to dumb down complex market dynamics so they can be shown on a two-dimensional chart. NNs are inherently N-dimensional and don't have this limitation. For example, the components of the RSI model much better than the RSI itself.
 
Quote from Jerry030:

I've found that in some cases a NN will learn more when it is given the mathematical components used to calculate an indicator along with the indicator itself. Keep in mind that indicators were invented to dumb down complex market dynamics so they can be shown on a two-dimensional chart. NNs are inherently N-dimensional and don't have this limitation. For example, the components of the RSI model much better than the RSI itself.

What are the components of RSI? Only the last close and the length of the smoothing factor.
 
Quote from Muskoka Joe:

What are the components of RSI? Only the last close and the length of the smoothing factor.

RSI Calculation

RSI = 100 - 100 / (1 + RS)

RS = Average Gain / Average Loss

Average Gain = [(previous Average Gain) x 13 + current Gain] / 14
First Average Gain = Total of Gains during past 14 periods / 14

Average Loss = [(previous Average Loss) x 13 + current Loss] / 14
First Average Loss = Total of Losses during past 14 periods / 14

Note: "Losses" are reported as positive values.


To simplify our explanation of the formula, the RSI has been broken down into its basic components which are the RS, the Average Gain, and the Average Loss.

To calculate RSI values for a given dataset, first find the magnitude of all gains and losses for the 14 periods prior to the time where you wish to start the calculation. (Note: 14 is the standard number of periods used when calculating the RSI. If a different number is specified, just substitute that number in for "14" throughout this discussion.)

It is important to understand that the RSI is a "running" calculation and the accuracy of the calculation depends on how long ago the calculations started. The first RSI value is an estimate; subsequent values improve on that estimate. You should calculate at least 14 values prior to the start of any values that you will rely on; going back 28+ periods is even better.
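Why the warm-up matters can be read off the smoothing step described below: each update multiplies the previous average by 13/14, so whatever starting value the calculation was seeded with still carries weight (13/14)^n after n updates, roughly 0.35 after 14 updates and roughly 0.13 after 28. A small sketch, assuming Wilder smoothing with a general period; the function name `seed_weight` is hypothetical.

```python
def seed_weight(updates: int, period: int = 14) -> float:
    """Weight the initial (seed) average still carries after `updates` Wilder smoothing steps.

    Each step computes (prev * (period - 1) + current) / period, so the seed's
    contribution shrinks by a factor of (period - 1) / period per step.
    """
    return ((period - 1) / period) ** updates


for n in (0, 14, 28, 100):
    print(n, round(seed_weight(n), 4))
```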

To start the running calculation, the First Average Gain is calculated as the total of all gains during the past 14 periods divided by 14. Similarly, the First Average Loss is calculated as the total magnitude of all losses during the past 14 periods divided by 14. The next values for the "averages" are calculated by taking the previous value, multiplying it by 13, adding in the next Gain (or Loss), and then dividing by 14. This is Wilder's modified "smoothing" technique in action.

The RS value is simply the Average Gain divided by the Average Loss for each period.

Finally, the RSI is simply the RS converted into an oscillator that goes between zero and 100 using this formula: 100 - (100 / (1 + RS)).

Here's an Excel spreadsheet that shows the start of an RSI calculation in action.

When the Average Gain is greater than the Average Loss, the RSI rises because RS will be greater than 1. Conversely, when the Average Loss is greater than the Average Gain, the RSI declines because RS will be less than 1. The last part of the formula ensures that the indicator oscillates between 0 and 100. Note: If the Average Loss ever becomes zero, RSI becomes 100 by definition.
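The steps above can be sketched in Python as follows. This is a minimal illustration under the stated conventions (first averages are simple means of the first 14 gains and losses, later averages use Wilder's smoothing, RSI is 100 when the Average Loss is zero); the function name `wilder_rsi` is hypothetical, and charting packages may differ slightly in how they seed or round.

```python
from typing import List, Optional


def wilder_rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI with Wilder's smoothing; one entry per close.

    The first `period` entries are None because the first averages
    need `period` price changes.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    rsi: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return rsi

    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]  # losses reported as positive values

    def to_rsi(avg_gain: float, avg_loss: float) -> float:
        if avg_loss == 0:
            return 100.0  # 100 by definition when the Average Loss is zero
        rs = avg_gain / avg_loss
        return 100.0 - 100.0 / (1.0 + rs)

    # First averages: simple mean of the first `period` gains and losses.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    rsi[period] = to_rsi(avg_gain, avg_loss)

    # Later averages: (previous * (period - 1) + current) / period.
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rsi[i + 1] = to_rsi(avg_gain, avg_loss)
    return rsi


print(wilder_rsi([1.0, 2.0, 1.0, 2.0], period=2))  # [None, None, 50.0, 75.0]
```

Note that the only inputs are the closes and the period, which is the point made in the next reply.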
 
So the two components are the close and the exponential smoother.


 
Quote from Rocko Bonaparte:

The argument still seems to be about good versus bad curve fitting, and whether it's completely bad in the first place. And then you can complicate it by accounting for trading style: maybe it's a bad idea for one person's trading style but essential for another trader's.

The strategy I'm trying to develop would basically run swing trades from market open to market open. I can take historical data and map out from that the "perfect" indicator that would, if I acted on a particular day, show what my percent gain would be. Of course I'm time shifting to make that function, so it's never available for forward testing. But if there's something to try to fit a curve to, it's that.
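A minimal sketch of that "perfect" hindsight indicator: the percent gain from one session's open to the next session's open. It is built from future prices, which is exactly why it can only serve as a training target (label) and never as an input. The function name and the use of None for the last, not-yet-known value are assumptions for illustration.

```python
from typing import List, Optional


def forward_open_to_open_pct(opens: List[float]) -> List[Optional[float]]:
    """Percent change from each open to the next open.

    Entry i uses opens[i + 1], i.e. information from the future, so the
    final entry is None: that value is not known at trade time.
    """
    labels: List[Optional[float]] = []
    for i in range(len(opens)):
        if i + 1 < len(opens):
            labels.append((opens[i + 1] - opens[i]) / opens[i] * 100.0)
        else:
            labels.append(None)
    return labels


print(forward_open_to_open_pct([100.0, 102.0, 51.0]))
```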

I think a big problem would be if I found a way to fit, say, 20 consecutive points, but outside of that it's complete nonsense. That would be curve fitting in the bad way. If instead I found something that generally was able to approximate the function, then I've found my magic button.
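To illustrate that bad kind of fit: a polynomial passing exactly through 20 consecutive points has zero error on those points but can be wildly wrong one step later, while a plain straight-line fit is imperfect in-sample but stays sensible. A sketch using exact rational arithmetic; the data (a 2x + 1 trend plus alternating +/-0.5 "noise") and the helper names are made up for the illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def lagrange_eval(xs: List[Fraction], ys: List[Fraction], x: Fraction) -> Fraction:
    """Evaluate the unique polynomial of degree < len(xs) that passes through every point."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def line_fit(xs: List[Fraction], ys: List[Fraction]) -> Tuple[Fraction, Fraction]:
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# 20 in-sample points: trend 2x + 1 with alternating +/-0.5 noise.
xs = [Fraction(i) for i in range(20)]
ys = [2 * x + 1 + (Fraction(1, 2) if i % 2 == 0 else Fraction(-1, 2))
      for i, x in enumerate(xs)]

x_next = Fraction(20)
y_next = 2 * x_next + 1 + Fraction(1, 2)  # next point follows the same pattern (20 is even)

in_sample_err = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
slope, intercept = line_fit(xs, ys)
print("interpolant in-sample max error:", in_sample_err)  # 0
print("interpolant error at x=20:", float(abs(lagrange_eval(xs, ys, x_next) - y_next)))  # 524288.0
print("line error at x=20:", float(abs(slope * x_next + intercept - y_next)))
```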

BTW I doubt I'd ever find that magic button. So long as it's "good enough" I'm happy.

The reason is that you are using linear math in an attempt to model a non-linear system. Try taking your current logical design and running it through a non-linear predictive analytics application.

See: http://www.kdnuggets.com/software/index.html
 
So the two components are the close and the exponential smoother.


Components of a 14-day RSI:
1-14: daily gain for each day
15-28: daily loss for each day
29: Average Gain
30: Average Loss

Total components = 30
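A sketch of assembling those 30 inputs from the last 15 closes (which give 14 daily changes), following the list above. Two assumptions not stated in the post: the two averages here are simple 14-day means (the "First Average" form) rather than a long-running Wilder average, and days without a gain enter the gain slots as 0 (likewise for losses). The function name `rsi_components` is hypothetical.

```python
from typing import List


def rsi_components(closes: List[float], period: int = 14) -> List[float]:
    """Return [gains (period), losses (period), average gain, average loss]."""
    if len(closes) < period + 1:
        raise ValueError(f"need at least {period + 1} closes")
    window = closes[-(period + 1):]  # the most recent period + 1 closes
    changes = [window[i] - window[i - 1] for i in range(1, len(window))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]  # losses as positive values
    avg_gain = sum(gains) / period
    avg_loss = sum(losses) / period
    return gains + losses + [avg_gain, avg_loss]


features = rsi_components([float(i) for i in range(15)])
print(len(features))  # 30
```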


 
Quote from Jerry030:


Components of a 14-day RSI:
1-14: daily gain for each day
15-28: daily loss for each day
29: Average Gain
30: Average Loss

Total components = 30

==================================

I'll just disagree with this, as it seems totally off to me, but maybe I just can't figure it out. Sometimes I wish I were smarter, but I wonder whether it would help in an arena that is really not a very intellectual pursuit.
 