Originally posted by aphexcoil
I'll try to make this as simple as possible for all the non-programming types.
Here is what I am doing:
I have a routine in my program that takes 5 snapshots per second of the last price as well as the volume.
I want to take a two minute average of this data, so naturally that would be 120x5 data points -- or 600 datapoints.
Now, I want to take the average of these data points -- the first 600 of them, and make a number. We'll call that number Y.
Now, the subroutine continues to collect data, and next we average 2-601 and call that Y's of 2. We then average 3-602 and call it Y's of 3. This goes on and on ...
Now, I want to graph Y, which is generated every 200 milliseconds. I want to graph Y's of 2 minus Y's of 1 as my first data point (should be a very small difference). I then want to plot Y's of 3 minus Y's of 2 as my second data point.
Now here is my theory. This little routine will filter out a lot of the noise from the market (I haven't even gotten into the statistical deviation stuff yet).
The baseline of this indicator is 0. If the next Y is larger than the previous Y, this suggests the market is trending upwards. Likewise, the reverse would also hold.
I have no idea what this will look like, how it will relate to price data or anything -- but all I can do is fool around with it until I hack something out that looks like it could be a cool indicator.
Any suggestions?
aphie
Ps: This type of MA sampling of the market would be far faster than anything I've seen on other indicators.
<b>Cool idea, but expand the formulas and see what you really have here:</b> It's always a good idea to expand and combine the mathematical formulas for what you are doing. Often you discover that your effective formula is much simpler than you thought when a bunch of terms cancel out. This can be good if you like to keep things simple. It can be bad, though, if you thought the formula was compensating for something that it really isn't. (I've gotten more than a few chuckles out of some so-called TA guru's complex "new" formula that just simplifies down to some other well-known TA indicator.)
<b>Velocity as the difference of two moving averages:</b> Let's look at your calculation of the velocity of the price (the difference in the Y's). By your description of the program (which is very clearly written, by the way), each successive value of Y is the moving average of 600 raw price datapoints taken from successive, overlapping windows. Thus, for example:
Y[1] = AVERAGE(X[1] to X[600])
Y[2] = AVERAGE(X[2] to X[601])
where X is the raw price data, Y is the moving average, and the square-bracket notation refers to one element of the list (X[5] is the fifth raw datapoint). This moving average can be rewritten as a sum that is then divided by the number of datapoints being averaged:
Y[1] = SUM(X[1] to X[600])/600
Y[2] = SUM(X[2] to X[601])/600
We can expand these sums to a few terms to get a feel for them:
Y[1] = (X[1] + X[2] + X[3] + ... + X[600])/600
Y[2] = (X[2] + X[3] + X[4] + ... + X[601])/600
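For anyone who wants to play with this, here is a minimal Python sketch of the moving average as written above. The function name and the made-up numbers are mine, not from the original program, and note that Python lists start at 0, so the first element returned corresponds to Y[1].

```python
def moving_average(x, window=600):
    """Simple moving average: one value per full window of `window` points.

    Element 0 of the result is Y[1] in the post's 1-based notation.
    """
    return [sum(x[k:k + window]) / window
            for k in range(len(x) - window + 1)]


# Tiny made-up example with a 3-point window instead of 600.
demo = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(demo)
```

Recomputing every sum from scratch is slow for long lists, but it mirrors the formulas line for line, which is the point here.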
So, what happens when we calculate the velocity of the price using these moving average values? The per-minute estimate of velocity (using moving average values spaced at 200-millisecond intervals) is:
V[1] = 300*(Y[2] - Y[1])
The "300" comes from the fact that the raw difference of the Y's is the change in price over a very short period of 200 milliseconds, and there are 300 such periods in a minute. Multiplying by 300 gives us the velocity as a $/minute trend rate. Expanding the formulas for the Y's gives us:
V[1] = 300 * (SUM(X[2] to X[601])/600 - SUM(X[1] to X[600])/600)
V[1] = SUM(X[2] to X[601])/2 - SUM(X[1] to X[600])/2
<b>All the intermediate terms cancel out:</b> But think about what happens in the subtraction of these two sums. Both sums overlap extensively on the list of X's. Indeed, both sums share X[2] through X[600] for terms. In the subtraction, these shared terms all cancel out. The result is that the velocity formula that is left is only:
V[1] = (X[601] - X[1])/2
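If you would rather see the cancellation than take my algebra on faith, a rough Python check might look like the following. The random-walk prices, the seed, and all function names are made up for illustration; the only things taken from the post are the 600-point window and the factor of 300.

```python
import random


def moving_average(x, window):
    return [sum(x[k:k + window]) / window
            for k in range(len(x) - window + 1)]


def velocity_from_averages(x, window=600, samples_per_minute=300):
    """Velocity as the scaled difference of successive moving averages."""
    y = moving_average(x, window)
    return [samples_per_minute * (y[k + 1] - y[k])
            for k in range(len(y) - 1)]


def velocity_from_endpoints(x, window=600, samples_per_minute=300):
    """Simplified form: only the two ends of the window survive."""
    scale = samples_per_minute / window  # 300/600 = 1/2
    return [scale * (x[k + window] - x[k])
            for k in range(len(x) - window)]


# Made-up random-walk prices, purely for the comparison.
random.seed(0)
prices = [100.0]
for _ in range(1999):
    prices.append(prices[-1] + random.gauss(0, 0.01))

v_avg = velocity_from_averages(prices)
v_end = velocity_from_endpoints(prices)
max_gap = max(abs(a - b) for a, b in zip(v_avg, v_end))
print(len(v_avg), len(v_end), max_gap)
```

The two lists should agree to within floating-point rounding, so max_gap should be tiny compared with the velocities themselves.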
This formula for the per-minute velocity may not be as smooth as you would like. Any noise in either X[601] or X[1] will appear in your velocity estimate. Worse, any information about the price dynamics contained in terms X[2] through X[600] is lost by this method of velocity calculation. I'm not saying that this approach is wrong, only that it has some behaviors that do not seem quite "perfect."
<b>Testing this empirically:</b> Admittedly, it was easy to expand and simplify these formulae for estimating price velocity from the moving average of the price. As you progress to more sophisticated moving averages, it may not be easy to replicate this analysis. But you can replicate the analysis empirically by testing how the list of velocity estimates changes in response to a change in a single datapoint someplace. Empirical testing would also let you examine how a blackbox automagical moving average works.
For example, say we calculate the list of velocity values using this difference-of-moving-averages algorithm. Then we perturb the raw data by adding $1 to, for example, datapoint X[602] and recompute the list of velocities. Next, take the difference of the perturbed and unperturbed velocity data. We will find that V[2] is $0.50/minute higher than before and that V[602] is $0.50/minute lower than before. All other values of the velocity are unchanged, reflecting the fact that X[602] really only contributes to estimating the velocity at two points due to the cancelation of terms.
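A sketch of that perturbation test in Python, treating the velocity routine as a black box. The price ramp is invented test data and the helper names are hypothetical; the index argument is 1-based to match the X[602] notation used here.

```python
def moving_average(x, window):
    return [sum(x[k:k + window]) / window
            for k in range(len(x) - window + 1)]


def velocity(x, window=600, samples_per_minute=300):
    y = moving_average(x, window)
    return [samples_per_minute * (y[k + 1] - y[k])
            for k in range(len(y) - 1)]


def perturbation_response(x, index, bump=1.0, tol=1e-6):
    """Bump X[index] (1-based) and report which velocities changed.

    Returns {1-based velocity index: change in $/minute}, skipping
    entries whose change is within `tol` (floating-point rounding).
    """
    bumped = list(x)
    bumped[index - 1] += bump
    base = velocity(x)
    pert = velocity(bumped)
    return {k + 1: round(p - b, 6)
            for k, (b, p) in enumerate(zip(base, pert))
            if abs(p - b) > tol}


# Made-up price ramp, long enough that V[602] exists.
prices = [100.0 + 0.001 * k for k in range(1500)]
response = perturbation_response(prices, index=602)
print(response)
```

With a $1 bump at X[602], the only entries reported should be V[2] and V[602], with changes of +0.5 and -0.5. The same harness works on any velocity function you swap in, which is where it earns its keep.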
By empirically testing the entire system, you learn how one small change in the data someplace can lead to a change in the system someplace else. For more complex nonlinear, adaptive systems, you might try both small and large perturbations, both negative and positive perturbations, and perturbations at various features in the price stream (e.g. at gaps, in the middle of trends, etc.).
<b>What are you really after?</b> Maybe creating a smooth estimate of the price is not the best way to go. As it turns out, you really do not care about estimating the price, you only care about estimating the velocity. Think about how you might estimate velocities directly and how you might filter or process those estimates to smooth the noise out of the velocity estimates while avoiding excessive lag.
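One possible starting point, offered only as a sketch and not as a recommendation: take the raw tick-to-tick velocities and smooth them with unequal weights. An equal-weight average of 600 raw differences telescopes to the same (X[601] - X[1])/2 as before, so this example uses exponential weighting instead. The smoothing constant alpha and both function names are my own assumptions.

```python
def raw_velocity(x, samples_per_minute=300):
    """Tick-to-tick price change, scaled to $/minute."""
    return [samples_per_minute * (x[k + 1] - x[k])
            for k in range(len(x) - 1)]


def exponential_smooth(values, alpha=0.01):
    """Exponentially weighted average; larger alpha means less lag
    but more noise. alpha=0.01 is an arbitrary illustrative choice."""
    out = []
    level = None
    for v in values:
        level = v if level is None else alpha * v + (1 - alpha) * level
        out.append(level)
    return out


# Made-up prices: a steady climb of $0.001 per 200 ms tick.
prices = [100.0 + 0.001 * k for k in range(1000)]
smoothed = exponential_smooth(raw_velocity(prices))
print(smoothed[-1])
```

Every datapoint contributes to the estimate here, with recent ticks weighted most heavily. Whether that trade-off between smoothness and lag suits your indicator is something only testing will tell.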
You are on to some interesting ideas here, and I'm sure we are all enjoying the mental exercise.
Thanx for sharing everyone,
Traden4Alpha
P.S. Don't bother to zip the data files. A new internal 160 GB hard disk costs only about $400 and will provide enough space for 64 years of daily data at 10 MB/file (that's a price of $0.0025/MB, or 2.5 cents to store a copy of the file). Zipping and unzipping will only delay your work. (After all, what good is Moore's law if it can't make your life easier?)