Recommendations on time-series price prediction models?

upload_2021-4-21_22-26-39.png

Ok, stopped it here. Here's what it came up with:
Code:
0.100239*(row-(1.87231*tan(cos(sqrt(5.69164+row)+0.832774)/0.662045)+atanh(cos(-250.997*row-0.00765016))+1.89037*atanh(cos(-0.2448*row))))+56.9427

Max. complexity left at default setting: 60
No validation.

Next, I'll see what the forecast looks like, and compare it to actual data.
I'm expecting horrible results.

Later, I'll try it with validation; and then limiting it to whatever you use (a few sine or cosine waves?) with and without validation.
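
For anyone who wants to reproduce the forecast, the expression above can be evaluated directly as a function of the row number. This is only a sketch: the function name `evolved_model` is made up for illustration, and it assumes rows are numbered from 1. At row 0 the last term becomes atanh(cos(0)) = atanh(1), which is undefined, so Python's `math.atanh` raises a `ValueError` there.

```python
import math

def evolved_model(row: float) -> float:
    """Evaluate the evolved expression quoted above for one row number."""
    # 1.87231*tan(cos(sqrt(5.69164+row)+0.832774)/0.662045)
    term_a = 1.87231 * math.tan(
        math.cos(math.sqrt(5.69164 + row) + 0.832774) / 0.662045
    )
    # atanh(cos(-250.997*row-0.00765016))
    term_b = math.atanh(math.cos(-250.997 * row - 0.00765016))
    # 1.89037*atanh(cos(-0.2448*row))
    term_c = 1.89037 * math.atanh(math.cos(-0.2448 * row))
    return 0.100239 * (row - (term_a + term_b + term_c)) + 56.9427

# Assumption: rows start at 1; rows past the fitted window are the forecast.
values = [evolved_model(row) for row in range(1, 104)]
```

Because the only input is the row number, any future bar can be evaluated directly without forecasting the bars in between.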
 
Actual in blue this time, out of sample forecast in orange.

upload_2021-4-21_22-42-17.png


Code:
CLOSE  FORECAST  ERROR
66.68  66.48     0.20
66.41  66.83     0.42
66.53  66.51     0.02
66.86  66.44     0.42
66.99  66.40     0.59
67.12  66.36     0.76
67.25  66.15     1.10
66.67  66.41     0.26
67.12  66.53     0.59
67.44  66.61     0.83
68.00  66.68     1.32
67.91  66.73     1.18
67.83  66.76     1.07
67.74  66.71     1.03
TOTAL ERROR = 9.80
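
The error column above is the absolute difference between close and forecast. A minimal sketch of that arithmetic, using the two-decimal values from the table, is below. The mean absolute error and the signed bias are extra summaries not in the original post, and the variable names are illustrative. Summing the rounded values gives 9.79, a cent below the posted 9.80; the likely cause (a guess) is that the posted total was computed from unrounded numbers.

```python
close = [66.68, 66.41, 66.53, 66.86, 66.99, 67.12, 67.25,
         66.67, 67.12, 67.44, 68.00, 67.91, 67.83, 67.74]
forecast = [66.48, 66.83, 66.51, 66.44, 66.40, 66.36, 66.15,
            66.41, 66.53, 66.61, 66.68, 66.73, 66.76, 66.71]

# Absolute error per bar, as in the ERROR column.
abs_errors = [abs(c - f) for c, f in zip(close, forecast)]
total_abs_error = sum(abs_errors)

# Extra summaries (not in the original table).
mean_abs_error = total_abs_error / len(abs_errors)
# Positive bias means the forecast sits below the actual close on average.
bias = sum(c - f for c, f in zip(close, forecast)) / len(close)

print(f"total={total_abs_error:.2f} mae={mean_abs_error:.3f} bias={bias:.3f}")
```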

Your fitted data had only 101 rows, while the seen and unseen data total 103 rows. Since your data included only the close, I couldn't tell how to line the two up, so I left it out of the chart.

*** My Apologies to the OP! I've moved my discussion to:
https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/
 
Thanks for posting this.

I agree that genetic programming that creates models like these (mine too) probably won't extrapolate well. It's just too easy for the model to overfit. The parabolic trend plus the sum of a few sinusoids method is harder to overfit with and has some theory behind it (prices oscillate to form a channel around a trend).
 
BTW, I'll be updating this at:
https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/

Don't want to trample all over OP's thread. I just realized that the only input I can give it is the row number, not additional features like I planned. If I did give it features, then I wouldn't be able to forecast more than one bar ahead (with random access into any bar in the future), so I see why you did it that way now. Plus, it reduces the overfitting potential.

Currently running the algo using the last 20% of the seen data to validate.
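
Holding out the last 20% of the seen data can be sketched as a chronological split like the one below. The helper name `chronological_split` and the rounding rule are assumptions for illustration; the point is only that the validation rows come from the end of the series and are never shuffled.

```python
def chronological_split(series, validation_fraction=0.2):
    """Split a time-ordered list into (train, validation) without shuffling."""
    # At least one row is held out; the rounding rule is an arbitrary choice.
    n_val = max(1, round(len(series) * validation_fraction))
    return series[:-n_val], series[-n_val:]

# Example with 89 row numbers standing in for the seen data.
rows = list(range(1, 90))
train_rows, validation_rows = chronological_split(rows)
```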
 
Hey Phil, I tried to replicate the parabolic + cosines method you outlined. However, I got stuck on the GA step and wonder if you can give me any help. I'm using Python's geneticalgorithm package (https://pypi.org/project/geneticalgorithm/), and the results (after 1 hour) are much worse than the results you got. I attached a comparison.

I thought that this was a plain nonconvex optimization problem and that a commonly available package like geneticalgorithm, with default parameters, would be able to crack it easily. After reading a few of your other posts, I realize that you are probably using a hand-tuned GA solver. What kind of tweaks or knowledge do you think are essential to add to the solver in order to get a good fit?


The model is not created from a Fourier transform, and Fourier transforms (at least the common Fast Fourier Transform) are not good choices for modeling prices according to John Ehlers' "Rocket Science for Traders: Digital Signal Processing Applications."


This method models asset prices as a parabolic trend (often close to linear) plus a cyclical part (sum of a few sinusoids). To create the function, I choose the window of data (89 closing prices in my example) and the number of sinusoids (3 in my example). Then software finds the remaining parameters through genetic optimization to attempt to get a good fit. The number of sinusoids is small to keep the fit fairly smooth.
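
As a rough sketch of what the optimizer has to work with, the model and a fitness function could look like the code below. Several things here are assumptions rather than details from the post: the parameter layout, the use of x = 0, 1, 2, ... as the bar index, and the sum of squared errors as the fitness measure. The function names are made up for illustration.

```python
import math

def trend_plus_cycles(x, params, n_cycles=3):
    """Parabolic trend plus a sum of cosines.

    params = [a, b, c, amp1, period1, phase1, amp2, period2, phase2, ...]
    """
    if len(params) != 3 + 3 * n_cycles:
        raise ValueError("expected 3 trend parameters plus 3 per sinusoid")
    a, b, c = params[:3]
    y = a + b * x + c * x * x
    for k in range(n_cycles):
        amp, period, phase = params[3 + 3 * k: 6 + 3 * k]
        y += amp * math.cos(2 * math.pi / period * x + phase)
    return y

def sum_squared_error(params, closes, n_cycles=3):
    """Fitness to minimize: squared error over the data window (assumed)."""
    return sum(
        (trend_plus_cycles(x, params, n_cycles) - close) ** 2
        for x, close in enumerate(closes)
    )
```

With 3 sinusoids this is a 12-parameter search. Any optimizer that minimizes a function of a parameter vector could be pointed at `sum_squared_error`.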

Another example, this time for MDYV (SPDR S&P 400 Mid Cap Value ETF): 89 calendar days from 20210107 through 20210405, using data adjusted for dividends and splits, with close prices interpolated across non-trading days. The raw chart is
View attachment 256288

The fitted function for the close prices is
Code:
y = 56.4195823669  +  0.1046799496 * x  +  0.0002184126 * x^2
    +  1.0908385515 * cos(twopi / 28.9732589925 * x  +  4.5311441422)
    +  0.8659873009 * cos(twopi / 64.9392941518 * x  +  0.3379080296)
    +  0.7130651474 * cos(twopi / 16.9317009049 * x  +  0.6330339909) ;
View attachment 256289

The prices and fitted curve with the parabolic trend subtracted are
View attachment 256290
This suggests going long tomorrow 20210406 and exiting on 20210416 to capture the next cyclic segment predicted to rise.
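
The fitted function and its detrended cyclical part can be evaluated with a few lines of Python. The coefficients below are copied from the fitted function above. The function names are illustrative, and how x maps to calendar dates (for example, whether the first day of the window is x = 0 or x = 1) is not stated in the post, so treat the indexing as an assumption.

```python
import math

# (constant, linear, quadratic) coefficients of the parabolic trend
TREND = (56.4195823669, 0.1046799496, 0.0002184126)
# (amplitude, period in calendar days, phase in radians) for each cosine
CYCLES = [
    (1.0908385515, 28.9732589925, 4.5311441422),
    (0.8659873009, 64.9392941518, 0.3379080296),
    (0.7130651474, 16.9317009049, 0.6330339909),
]

def trend(x):
    a, b, c = TREND
    return a + b * x + c * x ** 2

def cyclical(x):
    """Sum of the three cosines, i.e. the fit with the trend subtracted."""
    return sum(
        amp * math.cos(2 * math.pi / period * x + phase)
        for amp, period, phase in CYCLES
    )

def fitted(x):
    return trend(x) + cyclical(x)

# Detrended curve over the window and a few days past it (indexing assumed).
detrended = [cyclical(x) for x in range(0, 100)]
```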

The three cyclical parts are
View attachment 256291
Notice there is no single, dominant cycle (conventional cycle analysis often assumes there is one). Also, the cycle with the largest period (64.9392941518 calendar days) is more than half the data size (89 calendar days). A Fourier transform would not be able to find a period more than half the data size.


I haven't been using this method very long. And like everything else, it works -- sometimes.
 

Attachments: comparison.png
Did you standardize/normalize the multiple outputs ... as well as the inputs?

Yes, I did. I was debating whether I should normalize it to a normal distribution or something else.
Eventually I did:

Xmean = mean from input X
X = (X - Xmean) / Xmean
output Y = (Y - Xmean) / Xmean

so everything is normalized to the mean of the input data. But I'm not sure if this is the best way to normalize it.
 
I would standardize, rather than normalize, each input independently.
If you decide to go with multiple outputs, I would also standardize those independently before processing, then invert the standardization to obtain the forecast values in their original units.
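
A minimal sketch of per-series standardization and its inverse, in contrast to the (X - Xmean) / Xmean scaling described earlier, might look like this. The helper names are hypothetical, the population standard deviation is an arbitrary choice, and the sample values are just a few closes borrowed from the table earlier in the thread.

```python
import statistics

def fit_standardizer(values):
    """Return (mean, std) estimated from one series, e.g. the training data."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an arbitrary choice
    if std == 0:
        raise ValueError("cannot standardize a constant series")
    return mean, std

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

def inverse_standardize(z_values, mean, std):
    """Map standardized values back to the original units."""
    return [z * std + mean for z in z_values]

# Each input or output series gets its own (mean, std) pair.
y_train = [66.68, 66.41, 66.53, 66.86, 66.99]
y_mean, y_std = fit_standardizer(y_train)
y_scaled = standardize(y_train, y_mean, y_std)
y_restored = inverse_standardize(y_scaled, y_mean, y_std)
```

The same (mean, std) pair that was fitted on the training outputs would be reused to convert the model's standardized forecasts back to prices.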
 
BTW, regarding your original question: since you use Python, I believe sktime can forecast time series recursively.