Recommendations on time-series price prediction models?

userque · Apr 20, 2021

ph1l said:
I wrote the genetic programming part in C++ and OpenCL. The calculations for the function are done in OpenCL with single-precision floating-point arithmetic. The controlling part is Perl and shell (bash). The images are from gnuplot.

The only input to the function is time in the form of number of bars relative to the start of the data (0 through 88 calendar days for the example's data that was fitted). This allows the function to be applied for any time.

The attached inputData.csv has the input data in comma-separated format:
<TICKER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOLUME>,<UNADJCLOSE>,<UNADJVOLUME>
Calendar days in the data when U.S. stock markets were closed are linearly interpolated from the previous trading close price.
The candlestick chart has the <OPEN>,<HIGH>,<LOW>,<CLOSE> columns.
The function is fitted on the <CLOSE> column only.
The predicted data (12 bars) is the part of the fitted data, and of its parabolic least-squares trend, that extends past the candlesticks.

The raw fitted data, including the extra 12 predicted bars, is in the attached fitted.txt. This data looks like it has more precision than the actual data because Perl converts the single-precision floating-point values to double precision.

The actual future data is in the attached unseendata.csv. Since this is recent data for an ETF, there isn't too much of it. This data wasn't used in any calculations or measurements.
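
For readers following along, the "parabolic, least squares trend" in the quoted post can be sketched roughly as below. This is only an illustration in Python, not ph1l's C++/OpenCL/Perl code; the function name fit_parabola and the sample numbers are made up, and the bar range 89 through 100 assumes the 12 predicted bars directly follow the 0 through 88 fitted bars.

```python
def fit_parabola(ts, ys):
    """Least-squares fit of y = c0 + c1*t + c2*t^2 via the normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]            # sums of t^0 .. t^4
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coeffs = []
    for col in range(3):                 # Cramer's rule, one column at a time
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        coeffs.append(det3(mc) / d)
    return coeffs                        # [c0, c1, c2]


# Made-up series standing in for the fitted values at bars 0..88.
bars = list(range(89))
values = [40.0 + 0.1 * t - 0.0005 * t * t for t in bars]
c0, c1, c2 = fit_parabola(bars, values)

# Extend the trend over the next 12 bars (89..100).
trend = [c0 + c1 * t + c2 * t * t for t in range(89, 101)]
print(trend)
```

Cramer's rule is used here only to keep the sketch dependency-free; a library least-squares routine would be the more usual choice.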

Thanks for that. But still ... where is the time input variable? I only see the R variables:

y =
0: R4 = R2 * cos (-81.5485)
1: R0 = 50.7399 - R4
2: R4 = R4 * cos (0.199104)
3: R3 = sqrt (R4)
4: R4 = 41.8048 * cos (R3)
5: R4 = 4.3255 / R4
6: R4 = atan2 (R4 / R0)
7: R0 = R0 * cos (87.6408)
8: R0 = R4 + R0
9: R4 = R4 * sin (-4.13973)
10: R0 = R4 + R0
11: R2 = -44.0889 * sinh (R0)
12: R4 = abs (R0)
13: R3 = log (R4)
14: R4 = atan2 (R2 / 70.0589)
15: R0 = R4 + R0
16: R2 = asinh (R0)
17: R1 = 13.707 * sin (R3)
18: R1 = R1 * sin (-37.162)
19: R4 = 1.39025 * sin (R4)
20: R1 = R1 * cos (68.1254)
21: R3 = tanh (R1 * R3 + R4)
22: R2 = R2 - R3
23: R4 = asin (R1)
24: R1 = sigmoid (-12.9007 * R1 + R4)
25: R1 = R1 * cosh (R3)
26: R2 = R1 + R2
27: R2 = R2 * sin (1.16991)
28: R2 = R2 * sin (-79.8879)
29: R0 = 62.65 - R2
return R0

This formula, as it sits, will only output a single 'y' value. It offers no way to vary 'y' based on 't,' or based on any other variable. ??!!

If 't' is 1, where is the '1' represented in this formula?
If 't' is 2, where would the '2' be represented in this formula?

ph1l · Apr 20, 2021

All the registers get initialized with the single input value before any of the statements run. So for the 5 registers in this example, that's equivalent to R0 = R1 = R2 = R3 = R4 = t where t is 0 for the first value, 1 for the second, ...
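
A minimal sketch of that evaluation scheme, in Python rather than the C++/OpenCL actually used. The names run_program and first_two are hypothetical, and only instructions 0 and 1 of the posted listing are transcribed, so the output is not the fitted price.

```python
import math


def run_program(t, program, n_registers=5):
    """Run a straight-line register program with every register preset to t."""
    regs = [float(t)] * n_registers      # R0 = R1 = R2 = R3 = R4 = t
    for dest, fn in program:             # top to bottom, no branching
        regs[dest] = fn(regs)
    return regs[0]                       # the listing ends with "return R0"


# Instructions 0 and 1 of the posted listing only, as an illustration.
first_two = [
    (4, lambda r: r[2] * math.cos(-81.5485)),   # 0: R4 = R2 * cos (-81.5485)
    (0, lambda r: 50.7399 - r[4]),              # 1: R0 = 50.7399 - R4
]

for t in (0, 1, 2):
    print(t, run_program(t, first_two))
```

Because R2 already holds t when instruction 0 runs, the result changes with t even though t never appears by name in the listing.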

userque · Apr 20, 2021

So, the first line,

Code:

0: R4 = R2 * cos (-81.5485)

is essentially,

Code:

0: R4 = t * cos (-81.5485)  ??

And I assume execution is from top to bottom?

Do you always start with five registers, in all cases?

ph1l · Apr 20, 2021

Yes, yes, and yes (does that make me a yes man?).

R2 is initialized to t before the instructions run.

Execution is top-to-bottom. I didn't put in any conditional control flow because then the genetic part might cause the fittest function to closely match the input data but be useless for extrapolation.

I configured the number of registers to 5, and the fitted function uses them all. It's possible a run on different data might not use all the configured registers.

When an operation would result in a non-finite or illegal value (e.g., square root of a negative number), the result of the operation gets set to -888888888. I chose an arbitrary value instead of NaN (not a number) to avoid the degenerate case when all rules return NaN.
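
A rough Python sketch of that kind of protected operation. The helper name protected is hypothetical, and Python's double-precision math raises exceptions where single-precision OpenCL would typically produce NaN or infinity, so this only mirrors the idea, not the actual implementation.

```python
import math

SENTINEL = -888888888.0   # the arbitrary value mentioned in the post


def protected(fn, *args):
    """Apply fn; return SENTINEL if it raises or yields a non-finite value."""
    try:
        result = fn(*args)
    except (ValueError, OverflowError, ZeroDivisionError):
        return SENTINEL
    return result if math.isfinite(result) else SENTINEL


print(protected(math.sqrt, 4.0))      # 2.0
print(protected(math.sqrt, -1.0))     # illegal input, gives the sentinel
print(protected(math.log, 0.0))       # illegal input, gives the sentinel
print(protected(math.sinh, 1000.0))   # overflow, gives the sentinel
```

A finite sentinel keeps every candidate function comparable by fitness, since a large error is still a number, whereas NaN would propagate through every later instruction.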

userque · Apr 20, 2021

Ok.

Later today/tonight, I'll run the data through my software, just as you did.

But normally, I use far more features than just the closing price.

rolloff · Apr 20, 2021

jublin said:
Hi everyone, I'm new to the field of price prediction modeling and am currently experimenting with an LSTM model. I found a few posts online, like this one: https://towardsdatascience.com/lstm...stock-prices-using-an-lstm-model-6223e9644a2f . However, they all suffer from the problem that their predictions are not recursive. I took the trained model from the above-mentioned post and ran a recursive prediction; the results are wrong and not usable.

Wonder if anyone can recommend a few time-series price prediction models?

Thanks!

Models that I have seen applied for time series predictions:

1) Linear regression
2) ARIMA
3) Gaussian Dynamic Boltzmann Machines
4) Dynamic Linear models
5) Time-Varying Autoregression
6) LightGBM / CatBoost / Gaussian Boosting / Entropy-based Boosting
7) LSTMs, multilayer-LSTMs, rescaled LSTMs, phased LSTMs / LSTM gate variants
8) GARCH / TGARCH / NN-GARCH for volatility
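
Whichever model is chosen, the recursive prediction jublin asks about means feeding each forecast back in as the input for the next step. Below is a minimal sketch using about the simplest autoregressive model, an AR(1) fitted by ordinary least squares. The function names and the sample series are made up for illustration; this is not any poster's code.

```python
def fit_ar1(series):
    """Least-squares fit of x[k+1] = a + b * x[k]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def recursive_forecast(series, steps):
    """Forecast several steps ahead, reusing each forecast as the next input."""
    a, b = fit_ar1(series)
    last = series[-1]
    forecasts = []
    for _ in range(steps):
        last = a + b * last          # the previous forecast is the next input
        forecasts.append(last)
    return forecasts


closes = [41.2, 41.5, 41.1, 41.9, 42.3, 42.0, 42.6]   # made-up prices
print(recursive_forecast(closes, 5))
```

Errors compound in this loop, because each step inherits the error of the step before it. That is one reason a model that looks fine one step ahead can drift badly when run recursively.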

Poljot · Apr 20, 2021

MS Excel has an Exponential Smoothing (ETS) algorithm. The function name is FORECAST.ETS, and it handles seasonality.

userque · Apr 21, 2021

Had to run some unexpected calculations from yesterday, into market close today. Forgot about this post.

Setting it up now ... should have a very preliminary function in several minutes.

userque · Apr 21, 2021

After a few minutes:

userque · Apr 21, 2021

A few more minutes: