Profit model correlation

OK, so here is a basic RM methodology: I attempted to create an NN using the following as inputs.
1) 533 days' worth of closing data for QQQQ (I know it's small, but we're learning and testing tool methodology here), stored in an .xls file. It also has 5 delayed versions (day t-n) in 5 columns; these will be the 5 input nodes to the NN, and the actual closing data column is the output training control variable. Thus it is the label id (col 7) in the xls read-in reference of the model.
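
For anyone who wants to rebuild that input layout outside the spreadsheet, here is a minimal Python sketch of the delayed-column idea. It is only an illustration, not the RM setup itself: the function name make_lagged_rows and the sample prices are made up for the example.

```python
def make_lagged_rows(closes, n_lags=5):
    """Build (inputs, label) rows from a list of closing prices.

    inputs holds the n_lags previous closes (1-day delay first),
    label is the close on the current day.
    """
    rows = []
    for t in range(n_lags, len(closes)):
        # closes[t - 1] is the 1-day delay, closes[t - n_lags] the oldest delay
        inputs = [closes[t - k] for k in range(1, n_lags + 1)]
        rows.append((inputs, closes[t]))
    return rows


# Made-up prices, purely for illustration
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3]
rows = make_lagged_rows(closes, n_lags=5)
for inputs, label in rows:
    print(inputs, "->", label)
```

Note that the first n_lags days produce no row, because they do not have a full set of delayed values.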

The rest is explained on the attached jpg.
The model successfully created an NN with a 5-node input layer, 1 hidden layer (4 neurons), and 1 output layer (1 neuron). I'm still not sure how to set the number of hidden-layer neurons; the program chose it automatically.

While it showed some of the predicted outputs, I have not (yet) succeeded at the next step: opening a new RM file that loads the NN model the first run writes out, together with an Excel sheet minus the output column (since I now expect RM to write the prediction column). I was able to do the gold tutorial successfully, but I can't get to this step with my own model.

If anyone wants to play along and collaborate, here is your chance. I will help anyone get to the step I am at if you need help to get here. If this is successful, and worthy of pursuit in a collaborative forum, I should expect some feedback and furthering of results at this point. I will continue if I see interest and work. If not, I will continue to pursue this solo.

xnrsxs.jpg


Remember, my main goal is to get up to speed on RM and see whether it is useful for rapidly prototyping some stuff (rather than hand coding). I am still not proficient enough to translate system concepts (such as the ones Jerry wants, e.g. a basic breakout system) into RM for a prototype.


2yltqas.jpg

Here is the regression results screen. Nothing pretty yet, but getting here is a checkpoint (it means you at least have an error-free prototype).
 
Quote from Trader922:

All- I am certainly still interested in collaboratively working on a Rapid Miner project.

Wheat seems like a good choice to me for a market to study based on the recent events in the market.

Also, I have a dotnetnuke site I would be willing to customize to facilitate the project and storage of the data.

Jerry - How much data would you typically use for a project like this when using EOD data?

Regards,
Eric

Eric,

Thanks for the offer.

We could have some large files. For 5-minute bars and the like, one could have 50,000 to 100,000 rows, with date, time, and OHLC. Then you need to add the TIs. Let's say 20 of those to start.

If we have a mix of a daily and a short-term series on one market instrument from each of the stock, futures, and Forex sectors, it will amount to a bit of data.

Until we see where this will go, it may be premature to set up a permanent library. We can use FileSend to transfer a 100 MB file between the 4 or 5 of us to start.

If Wheat sounds good I'll prep an XLS file of daily bars from 1960 to 2006 and post a link for people to download.

What indicators would people like to test?

Here is a list of the ones I've got set up

If people have others they want to add we'll need a calculation method: either a software app or some code functions or the formula.

Vector Trigonometric ACos
Chaikin A/D Line
Vector Arithmetic Add
Chaikin A/D Oscillator
Average Directional Movement Index
Average Directional Movement Index Rating
Absolute Price Oscillator
Aroon
Aroon Oscillator
Vector Trigonometric ASin
Vector Trigonometric ATan
Average True Range
Average Price
Bollinger Bands
Beta
Balance Of Power
Commodity Channel Index
Two Crows
Three Black Crows
Three Inside Up/Down
Three-Line Strike
Three Outside Up/Down
Three Stars In The South
Three Advancing White Soldiers
Abandoned Baby
Advance Block
Belt-hold
Breakaway
Closing Marubozu
Concealing Baby Swallow
Counterattack
Dark Cloud Cover
Doji
Doji Star
Dragonfly Doji
Engulfing Pattern
Evening Doji Star
Evening Star
Up/Down-gap side-by-side white lines
Gravestone Doji
Hammer
Hanging Man
Harami Pattern
Harami Cross Pattern
High-Wave Candle
Hikkake Pattern
Modified Hikkake Pattern
Homing Pigeon
Identical Three Crows
In-Neck Pattern
Inverted Hammer
Kicking
Kicking - bull/bear determined by the longer maru
Ladder Bottom
Long Legged Doji
Long Line Candle
Marubozu
Matching Low
Mat Hold
Morning Doji Star
Morning Star
On-Neck Pattern
Piercing Pattern
Rickshaw Man
Rising/Falling Three Methods
Separating Lines
Shooting Star
Short Line Candle
Spinning Top
Stalled Pattern
Stick Sandwich
Takuri (Dragonfly Doji with very long lower shado
Tasuki Gap
Thrusting Pattern
Tristar Pattern
Unique 3 River
Upside Gap Two Crows
Upside/Downside Gap Three Methods
Vector Ceil
Chande Momentum Oscillator
Pearson's Correlation Coefficient (r)
Vector Trigonometric Cos
Vector Trigonometric Cosh
Double Exponential Moving Average
Vector Arithmetic Div
Directional Movement Index
Exponential Moving Average
Vector Arithmetic Exp
Vector Floor
Hilbert Transform - Dominant Cycle Period
Hilbert Transform - Dominant Cycle Phase
Hilbert Transform - Phasor Components
Hilbert Transform - SineWave
Hilbert Transform - Instantaneous Trendline
Hilbert Transform - Trend vs Cycle Mode
Kaufman Adaptive Moving Average
Linear Regression
Linear Regression Angle
Linear Regression Intercept
Linear Regression Slope
Vector Log Natural
Vector Log10
Moving average
Moving Average Convergence/Divergence
MACD with controllable MA type
Moving Average Convergence/Divergence Fix 12/26
MESA Adaptive Moving Average
Moving average with variable period
Highest value over a specified period
Index of highest value over a specified period
Median Price
Money Flow Index
MidPoint over period
Midpoint Price over period
Lowest value over a specified period
Index of lowest value over a specified period
Lowest and highest values over a specified period
Indexes of lowest and highest values over a speci
Minus Directional Indicator
Minus Directional Movement
Momentum
Vector Arithmetic Mult
Normalized Average True Range
On Balance Volume
Plus Directional Indicator
Plus Directional Movement
Percentage Price Oscillator
Rate of change : ((price/prevPrice)-1)*100
Rate of change Percentage: (price-prevPrice)/prev
Rate of change ratio: (price/prevPrice)
Rate of change ratio 100 scale: (price/prevPrice)
Relative Strength Index
Parabolic SAR
Parabolic SAR - Extended
Vector Trigonometric Sin
Vector Trigonometric Sinh
Simple Moving Average
Vector Square Root
Standard Deviation
Stochastic
Stochastic Fast
Stochastic Relative Strength Index
Vector Arithmetic Subtraction
Summation
Triple Exponential Moving Average (T3)
Vector Trigonometric Tan
Vector Trigonometric Tanh
Triple Exponential Moving Average
True Range
Triangular Moving Average
1-day Rate-Of-Change (ROC) of a Triple Smooth EMA
Time Series Forecast
Typical Price
Ultimate Oscillator
Variance
Weighted Close Price
Williams' %R
Weighted Moving Average
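
As an example of the kind of formula needed for any added indicator, here is a minimal Python sketch of two of the rate-of-change variants exactly as written in the list above. The function names and the period parameter are assumptions for illustration; the list itself only gives the one-step price/prevPrice form.

```python
def rate_of_change(prices, period=1):
    """Rate of change: ((price / prevPrice) - 1) * 100."""
    return [
        ((prices[i] / prices[i - period]) - 1) * 100
        for i in range(period, len(prices))
    ]


def rate_of_change_ratio(prices, period=1):
    """Rate of change ratio: price / prevPrice."""
    return [prices[i] / prices[i - period] for i in range(period, len(prices))]


# Made-up prices, purely for illustration
prices = [100.0, 110.0, 99.0]
roc = rate_of_change(prices)
rocr = rate_of_change_ratio(prices)
print(roc)
print(rocr)
```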




Jerry
 
Excellent contribution.

If you'd zip up the input dataset and the RM-related files (aml, etc.) and post them, then any of us should be able to duplicate and expand on it.

RE: models...you need to do a save model, then set up a new template to read the saved model and your Out of Sample data set and then apply to make the prediction.
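
The save / read / apply flow can be sketched outside RM as follows. This is a minimal Python illustration, not RapidMiner's operators: the toy one-variable line fit, the JSON file format, and the names fit_line, save_model, load_model, and apply_model are all assumptions made for the example.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in for any learner)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    with open(path) as handle:
        return json.load(handle)


def apply_model(model, xs):
    """Predict for rows that have inputs but no label column."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Step 1: train on in-sample data and save the model (made-up data, y = 2x + 1)
model_path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
save_model(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]), model_path)

# Step 2: in a separate run, read the saved model and apply it out of sample
loaded = load_model(model_path)
predictions = apply_model(loaded, [5.0, 6.0])
print(predictions)
```

The point of splitting it this way is that the second step never sees the training data or the label, only the saved model and the new inputs.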

Jerry030
 
Alright. So, I will work on zipping it up (haven't zipped in a while, but I think I have pkzip somewhere).

Good news is I figured out why I wasn't getting the output file. I needed to add a model applier instance to the final output file. Success.

Now the bad/debug...
Results were horrible:
1) It created basically a constant output as the predictor.
2) For some reason, it seemed to link only 4 of the input neurons and 2 hidden layers to the output in the graphical representation... could be because I'm not reading it correctly.
3) Also, I'm not certain how it normalizes the input range. I set it to linear scale, but I don't have much visibility into what it's doing or whether it's doing it right.

The results could just be bad, since the 5 prior days were just delayed versions, although it shouldn't be constant.
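
On the normalization question in point 3, one way to take the guesswork out is to rescale the inputs yourself before they reach the tool. Below is a minimal Python sketch of linear (min-max) scaling, assuming the range is fitted on the training data only and then reused for out-of-sample data. The function names and numbers are made up, and this is not a description of what RM does internally.

```python
def fit_min_max(values):
    """Return the (low, high) range of the training values."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale a constant series")
    return low, high


def scale(values, low, high):
    """Map values linearly so that low -> 0.0 and high -> 1.0."""
    return [(v - low) / (high - low) for v in values]


# Made-up prices, purely for illustration
train = [40.0, 42.0, 45.0, 50.0]
out_of_sample = [48.0, 55.0]

low, high = fit_min_max(train)
scaled_train = scale(train, low, high)
scaled_oos = scale(out_of_sample, low, high)
print(scaled_train)
print(scaled_oos)
```

Anything in the out-of-sample set that lands outside 0 to 1 is a price the network never saw during training, which is worth checking before blaming the network itself.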

-------------------------------------------
Lastly, I think we are jumping to testing many of the models you mentioned.
I'm still having trouble conceptualizing how to translate those tests to RM.

Anyway, I'll zip this up and try to post it when I get a chance.
 
The files included should be:

1) qqqq_run1.xml
Load and run to create the nn.

2) qqqq_nn.mod
(not necessary, as the 1st creates it, but here for comparison).

3) qqqq_wrapper1.xml
Run this file separately, after the 1st has completed. It is your comparison file: it instantiates the NN you created and compares it against out-of-sample data.

For some reason, I had a problem zipping the Excel file, so I will include it in the next post.
 

Attachments

here is the excel file.

Note: you will need to change all the RM entries that reference the Excel file to point to the local directory where you store it. You also need to redirect the model file that the 1st sim outputs to your local directory.

I look forward to seeing replication and discussion.
 

Attachments

Quote from Jerry030:

Time is very limited today so sorry for the very short reply below

1) Neuromaster...thanks for the info. That makes sense. The folks who made Rapid Miner free (open source) didn't do it to benefit humanity but to make money. Hence the general consulting service offering on their web site, and hence also the neuromaster offering. That's fair and fine, IMO. We all need to make a living.

Keep in mind, though, that they have identified the market well. Most traders are people in a hurry. So why learn a very complex process (predictive modeling) when you can buy a packaged solution?

2) Specific on RM for the markets. The real issue isn't the software app but the process. Can a NN, or Decision Tree or any of the several dozen data mining paradigms RM incorporates work in the market? If the paradigm is implemented accurately the same method will work with any package. Think paradigm not package. Will a NN work? Yes or no...Decision tree ..Yes or no.

3) "I also spoke to someone pretty knowledgeable on RM, who mentioned in a few ways, that predicting via RM is pretty complicated and subject to many errors (such as curve fitting, normalizing input improperly with wide change in scaling regimes during learning/training vs. validating, etc...)."

Yep...these are issues and skills needed with most packages. The corporate grade stuff does all this automatically but we are talking software that is in the multiple 10s of thousands of dollars retail. For a free package you've got to understand and do this yourself. Lack of understanding is why most people who try predictive modeling with NN fail. There is an IT term GIGO -Garbage In, Garbage Out.

4)"I think you are jumping a bit over my capability on the use of the tool by jumping to designing a system with specific trading goals(breakout, etc.).
I'm just not certain of how to translate those goals via RM, yet, although an example would help."

Sorry about that.....you'll find I'll do that...keep reminding me not to.

"From my personal limited knowledge with rapidminer, I was expecting to start with a set we could all run to begin with that is not too complex. This way everyone would be able to get a common say xml model to verify I/O conditions and response."

An idea here may be to start with stuff that is easy to model, learn the process, and then get into the markets. RM has some test datasets, and there are also public data mining datasets out there that are used to benchmark software. Or we can do a very simple market-related test...let me know what people think.

"Like say a simple daily equity model (like Qs for instance) nn, say with maybe 1,000 days of data for training and the end result would be to validate the one day prediction over some period and quantify the success over an out of sample set.
This is not so much to validate a system, for me, but to gain a rudimentary knowledge of using rapidminer and ironing out the bugs with some others attempting and looking at potential pitfalls, etc..."

1000 bars is on the small side. You’d be better off with 10,000, even if that means using hourly bars. You may not trade hourly bars, but you can learn with them.

Also perhaps forget the NN part to start as it is the hardest to master. Start with say a Decision Tree or Rule problem. Use performance as a base then get the NN to improve performance.

Or conversely, take the known structure of a simple trading system (there are many, e.g. MACD crossover with RSI confirmation and ADX for exit) and model a prediction of its success or failure on each trade. That way you free yourself from the complexity of data normalization, as most TIs are by their nature normalized: 0 to 100 for the RSI.
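
To show why an indicator like the RSI needs no rescaling, here is a minimal Python sketch of it. It assumes the plain-average variant, RSI = 100 - 100 / (1 + gains / losses) over a rolling window, rather than the smoothed version most charting packages use; the function name, the default period, the flat-window convention, and the prices are all made up for the example.

```python
def simple_rsi(closes, period=14):
    """RSI from plain sums of gains and losses over a rolling window."""
    values = []
    for end in range(period, len(closes)):
        window = closes[end - period:end + 1]
        changes = [b - a for a, b in zip(window, window[1:])]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            # No down moves: 100 by definition; 50 for a flat window is
            # a convention chosen here, not part of the formula
            values.append(100.0 if gains > 0 else 50.0)
        else:
            values.append(100.0 - 100.0 / (1.0 + gains / losses))
    return values


# Made-up prices, purely for illustration
closes = [44.0, 44.5, 44.2, 44.9, 45.4, 45.1, 45.8]
rsi = simple_rsi(closes, period=3)
print(rsi)
```

Whatever the price level of the instrument, the output stays between 0 and 100, so it can be fed to a learner without a separate scaling step.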

Jerry030

Just caught this, and agree with everything here (GI = GO). I used a very small sample set for the simple reason that I don't want to sit around and debug through each iteration over 10,000 epochs, as it would slow down my (and other neophytes') initial progress in coming up to speed on RM.
However, once I get it running to the point that I understand what I am doing a bit, I can increase the sample space.

I am slowly starting to fathom the concepts you are talking about in terms of translating trading methods to RM. I still think I need to get more of a grasp of RM's myriad functions.

And of course, examples always help:D
 
last post unless something major is discovered.

summary: reran with about 4,000 data pts. Maximum training validation.

Figured out how to specify input layers and neurons/layer. Playing with it.
(Have to manually add it to add list).

Trying to incorporate output log to monitor convergence and stop wasting time on long epochs.
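
The idea of watching the log and cutting a run short can be sketched like this. It is a minimal Python illustration that assumes you have one validation error per epoch; the function name, the patience and min_delta parameters, and the error values are made up, and it is not an RM feature.

```python
def early_stop_epoch(val_errors, patience=3, min_delta=1e-4):
    """Return the epoch index at which training would be stopped.

    Stops once the error has failed to improve on the best value by
    more than min_delta for `patience` epochs in a row.
    """
    best = float("inf")
    stale = 0
    for epoch, err in enumerate(val_errors):
        if err < best - min_delta:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_errors) - 1  # never triggered: ran to the end


# Made-up per-epoch validation errors, purely for illustration
errors = [0.9, 0.5, 0.4, 0.41, 0.40, 0.42, 0.39]
stop_at = early_stop_epoch(errors, patience=3)
print(stop_at)
```

With a small patience the run is cut as soon as the error flattens out, at the cost of possibly missing a later improvement.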

Added input nodes (10 delayed samples).

Output is still horrific. For some reason the predicted outputs are all settling to practically a fixed value around the training mean, rather than tracking the expected future response. It's possible that the nn is saturated due to input scaling.
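
One quick sanity check for output that sits near the training mean is to score it against two naive predictors: the training mean itself and "tomorrow equals today". Here is a minimal Python sketch, assuming RMSE as the error measure; all numbers and names are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Made-up out-of-sample closes and the prior day's close for each row
actual = [10.0, 11.0, 12.0, 11.0, 13.0]
previous = [9.0, 10.0, 11.0, 12.0, 11.0]
train_mean = 10.5  # hypothetical mean of the training closes

mean_baseline = rmse(actual, [train_mean] * len(actual))
persistence_baseline = rmse(actual, previous)
print(mean_baseline, persistence_baseline)
```

A network whose error is no better than mean_baseline has learned nothing beyond the average, and one that cannot beat persistence_baseline is not adding anything over simply repeating the last delayed input.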

I'm still confused about how it normalizes the input range, as
1) Most of the examples I've encountered (and believe me when I say they are few) do not preprocess the input range. There was a multilayer perceptron example with a wide range of VIX values that did not do any type of preprocessing, and the results converged.
2) The documentation is extremely sparse (you get what you pay for =).

I'm going to try to download the newest version, as I'm running 4.0.
They mention there are some improvements.

Don't see too many downloads yet.
Are others really interested in this (i.e. collaborating)?
 
Dt,

The reason a NN will have the horrible results you experience is that it couldn’t find any predictive information in the independent variables (IV) which in this case are open, high, low and close.
Price change in itself looks random to the human brain; hence the large number of people who think the markets are random, in discussions on ET and in books like "A Random Walk Down Wall Street". The NN needs something to learn from. People invented Technical Indicators as a way to add information content to raw price data in order to make trading decisions. The NN will need some as well.

If you want, pick some from the list I posted yesterday and I'll post a version of your file with them.

Jerry030
 
Did you review the input data? I attached all of the files. It is not open high low close data. Each vector contains delayed versions of the adjusted close. And I have had this input data converge before on a different platform. My problem is seeing if RM is worthwhile to prototype these types of simulations faster. As I said, I am still not sure how or if RM rescales the input data automatically. The example on the web I looked at did not pre-process, which is why I didn't.

I can trouble-shoot this and get it to work, however, unfortunately, I'm not seeing much collaboration here (what happened to all the posters that said they wanted to work on it?). Feel free to take my model files and apply any types of the data you mention, if it converges that will at least be more progress in the right direction.

dtrader98
 