Quote from PowerST:
thstart -
I find this thread very interesting...
Thanks - it is a lot effort to formulate the problem and solve it - it took more than three years to figure out how to make a good screener and back testing platform.
Let me try to explain better.
After spending a lot of time at learning technical analysis and strategies I found the way computations are usually made are not satisfactory for me.
To make the long story short for example the overused moving average is not calculated right and I believe watching a long time at the real time charts it is actually readjusted post factum. I don't believed this at first but double checked many times using the Schwab real time tools - I mark a point where the moving average used to be, the chart scrolls and after some time, the already calculated point moves up or down - it readjust itself as more data are coming. Moving average is just one example of a simple but a followed indicator.
Next - using this simple moving average example, a simple trading strategy is if you have two moving averages, the fastest one crossing the slowest from down to up means buy, the fastest one crossing the slowest from up to down means sell. This is very simplified strategy for the sake of illustration. It is also very popular. I like the simplicity but there was two problems:
1) The way it was calculated would give a wrong signals.
2) It is very hard to write a robust good screener and back testing software the way it is described usually. A good I mean to have at least some basic scientific principles involved. The way it is made today - one day ago if fast MA it is below slow MA today, etc. or similar methods for event recognition is very approximate, hard to code and to give a repeatable results. It has to be a data driven approach.
3) Just slow and fast MA are not enough factors to analyse. Man needs better periodicity indicator, more time windows and also more independent factors, not only MA.
4) A very overlooked concept is the decimal computation. I personally have not seen a product to perform screening and back test using decimal computations - it is calculated using a floating point. If computations are inputted to next computations, etc. if you use a floating point there is an error which accumulated, and the results will be incorrect.
I would add that some of the methods described are patent pending.
In order to back test a trading strategy I believe what is needed is as follows:
1) A better way to analyse the stock data time series.
2) A better method for classifying, querying and comparing, filtering and visualization of temporal and spatial relationships in time series data
3) A way for synchronized displaying, controlling, and visualization of information.
4) A very fast high performance parallel Computations, Storage, and Retrieval engine optimized for large scale time series.
I believe we created an innovative way for:
1) Analysis of time series based on
1.1) OHLC data discovering periodicity relationships in one stock or more stocks.
1.1.1) A sliding window calculations of 5,10,20,40,80,160,240,360 trading days. These calculations are based on regression on each of OHLC with a very fast proprietary algorithm. Also there are 5 more calculations based on these - their Total, and several combinations of them. Also a calculations of the difference between the price and these values for each trading day.
1.1.2) A rate of change calculations on each of OHLC and 1.1.1) with a very fast proprietary algorithm.
1.2) Volume data - a rate of change calculations
1.3) Options volume data - a rate of change calculations for 3 groups - Market Makers, Firms, Retail.
1.4) Insider trading info with aggregate total number of shares info rated and divided depending of the type of each participant - Officer, Director, etc.
1.5) Bank information - the ratio between deposits and loans, etc.
1.6) Interest rates - rate of change
1.7) Economic indicators - rate of change and similar to 1.1 computations.
1.8) OHLC relationships in the same day and a day before-
Like (H-L)/O, (H-L)/O1DayAgo and many others - Pivot Points for example.
1-7 are relative long term, 3, 8 - a short term indicators.
I believe they give a good mix of data
influencing the trading decision from different independent point of views.
It approaches the scientific way of analysis as close as possible.
2) Having all these data which are basically time series EOD data points they have to be sliced on factors in order to make a factor analysis. The factor analysis basically says that such and such factors influence what happened. This is not in the sense of causality but it is easier to represent that way:
When we have such and such factors at D-Day, D-D1Day Ago...D10DaysAgo - this happened at D1DayAfter.... D10 DaysAfter D-Day.
D-Day is the day of interest.
In order to analyse what happened after up to 10Days after D-Day we added calculations of the Gain for each of the 10 Days after D-Day.
Up to now we have the following calculations for every single trading day back to as much history data available:
1) ~10 columns for Gains based on Closing price - e.g. together with D-Day it is stored the 1DayAfter...10DaysAfter Gains precalculated - so you can make the screener easily later.
2) ~400 columns data Based on 1.1.1) and 1.1.2) for D-Day, D-D1Day Ago...D10DaysAgo
3) ~400 columns data based 1.2-1.8)
The remaining ~200 columns cover point 2) - a way of classifying,querying and comparing, filtering, etc. of data for
for D-Day, D-D1Day Ago...D10DaysAgo
Basically this is a very simple method
for representing the order between points in D-Day - the day of interest.
If you look at the attached screenshot you can see the letters abcdefghij - 10 letters representing the 5,10,20,40,80,160,240,360 data points for the currently analysed D-Day and also D-D1Day Ago...D10DaysAgo.
9876543210 represent the days 0 being D-Day.
The Orange cross point shows the D-Day of interest intersection. If you look at the two pictures you can see that these D-Days had a similar characterisits - after them there was a significant gain. Now if you look at the letters at righ of each you can see a similar patterns, not the same of course. the most prominent similarity is the upper box brown and black c & a. Then in the middle box h, g & f are at the top.
These letters representing the order are some the factors I am talking about.
Now the final idea is this: for each of the trading days of 1 instrument there are about ~1,000 columns of calculations. Not all of them are used, some of them are intermediate but needed further. These computations have to be made in a reasonable time in order to have a trading decision for the next day. This can be made only with a high performance computing if you want to make these computations for more than one stock - say 5,000 stocks. We wanted to perform a high performance computing but on the available technology today, not supercomputers. I believe if we have a powerful workstation it is still underutilized. For example vectorized SIMD parallelization introduced 10 years ago and available in every Intel or AMD processors today is very underutilized and used mostly for graphic applications. But there is a way to use it for general computing with appropriate programming. Also we can use NVIDIA CUDA enabled card if available for superior performance for a general purpose computations. They are not expensive but have up to 240 processors today.
Now after we have these data as factors, we can have a screener based on a factors analysis. In the screener you select for example some specific letter order on Closing price, say "cabd", 5% Gain 5 Days after D-Day. The screener made this way is very fast because it basically is searching letters ordered in specific way in our case "cabd". The result of screening is a list of all instruments and the D-Days when this happened. you click on selected D-Day and you can see it on the chart.
Now all these data have to be visualized easily 3) A way for synchronized displaying, controlling, and visualization of information.
If you look at the screenshot at right you can see the idea. You want to analyse one instrument at a different time frames at once.
You select the instrument and 4 windows are opened simultaneously with predetermined time frames, say window #:
1) 1 Week period
2) 1 Month period
3) 1 Quarter period
4) 1 Year Period
You click on one of them with the left mouse button to select a D-Day of interest. The chart cursor is positioned automatically at all remaining 3 windows at the same D-Day but you can see it better in relationship with the correpsonding time period.
Then you click at one window with the left mouse button and select a D-Day of interest. The remaining 3 windows a synchronized similarly
The same way you can analyse several instruments each at different time frame in a different window. When you click at one D-Day in one window the cursor is automatically updated at all windows simultaneously. For this you need again a high performance computing to be fast enough.
I believe you understand better now what we accomplished.