I would like to introduce you to my web application.

In the simplest case, you will predict the S&P 500 based on the history of the S&P 500 (close/volume). I am very interested in exploring the combinations of different data predictors (either user selected or automatically searched).
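The post doesn't say how the app frames the prediction. One common framing is to turn the close/volume history into lagged predictors and predict the next close. The sketch below assumes that framing. `make_lagged_dataset` and the `lags` window are hypothetical names for illustration, not the app's code.

```python
def make_lagged_dataset(close, volume, lags=3):
    """Build (features, target) pairs: the previous `lags` closes and volumes predict the next close.

    Hypothetical helper for illustration; the app's actual feature set is not described.
    """
    if len(close) != len(volume):
        raise ValueError("close and volume must have the same length")
    X, y = [], []
    for t in range(lags, len(close)):
        # Predictors: the last `lags` closes followed by the last `lags` volumes.
        X.append(close[t - lags:t] + volume[t - lags:t])
        # Target: the close on day t, which the predictors do not include.
        y.append(close[t])
    return X, y


X, y = make_lagged_dataset([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], lags=2)
print(X)  # [[1, 2, 10, 20], [2, 3, 20, 30], [3, 4, 30, 40]]
print(y)  # [3, 4, 5]
```

Adding other predictor series (user selected or searched for) amounts to concatenating more lagged columns in the same way.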

Genetic programming essentially builds a program/expression/formula automatically to model or predict some problem. It is very open-ended in its applicability.
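To make the idea concrete, here is a minimal sketch of tree-based genetic programming on a toy symbolic-regression target (y = x² + x). It is not the web application's engine. The primitive set, population size, tournament size, crossover rate and depth limit are all assumptions chosen to keep the example small.

```python
import math
import random

# Assumed primitive set: three binary operators, the input "x", and two constants.
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, max_depth):
    """Grow a random expression: a terminal, or a tuple (op, left, right)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def evaluate(tree, x):
    """Recursively compute the value of an expression tree at input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def fitness(tree, xs, ys):
    """Mean squared error on the training data; lower is better."""
    try:
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    except OverflowError:
        return math.inf
    return err if math.isfinite(err) else math.inf


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def replace_random_subtree(rng, tree, new):
    """Walk down a random path and splice `new` in where the walk stops."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return new
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, replace_random_subtree(rng, left, new), right)
    return (op, left, replace_random_subtree(rng, right, new))


def crossover(rng, a, b):
    # Replace a random subtree of `a` with a random subtree taken from `b`.
    return replace_random_subtree(rng, a, rng.choice(list(subtrees(b))))


def mutate(rng, tree):
    # Replace a random subtree with a freshly grown one.
    return replace_random_subtree(rng, tree, random_tree(rng, 2))


def evolve(xs, ys, pop_size=50, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    cache = {}

    def score(tree):
        # Memoise fitness: tournaments re-score the same trees repeatedly.
        if tree not in cache:
            cache[tree] = fitness(tree, xs, ys)
        return cache[tree]

    def pick():
        # Tournament selection: the fittest of three random individuals.
        return min(rng.sample(pop, 3), key=score)

    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    best = min(pop, key=score)
    for _ in range(generations):
        children = [best]  # elitism: the current best always survives
        while len(children) < pop_size:
            if rng.random() < 0.7:
                child = crossover(rng, pick(), pick())
            else:
                child = mutate(rng, pick())
            if depth(child) <= max_depth:  # crude guard against bloat
                children.append(child)
        pop = children
        best = min(pop, key=score)
    return best


xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + x for x in xs]
best = evolve(xs, ys)
print(best, fitness(best, xs, ys))
```

For forecasting, the single input `x` would be replaced by terminals for each predictor series (for example lagged closes and volumes), and the evolved expression becomes the model.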
 
Interesting that you mention close + volume. What volume do you use? Consolidated tape or primary-listing venue? Do you incorporate pre- and post-market volume? Do you reduce volume for cancelled trades?
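Each of these questions changes the number you end up with. The sketch below illustrates this with a hypothetical `Trade` record and made-up trades. The venue and session labels are illustrative assumptions, not any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    venue: str       # illustrative labels, e.g. "NYSE" (primary listing) or "ARCA"
    session: str     # "pre", "regular" or "post"
    shares: int
    cancelled: bool = False


def daily_volume(trades, *, venue=None, include_extended=False, drop_cancelled=True):
    """Sum share volume under explicit choices about venue, session and cancellations."""
    total = 0
    for t in trades:
        if venue is not None and t.venue != venue:
            continue  # primary-listing venue only, rather than consolidated tape
        if not include_extended and t.session != "regular":
            continue  # ignore pre- and post-market prints
        if drop_cancelled and t.cancelled:
            continue  # cancelled trades should not count
        total += t.shares
    return total


# Made-up trades for one hypothetical stock on one day.
TRADES = [
    Trade("NYSE", "regular", 1000),
    Trade("ARCA", "regular", 500),
    Trade("NYSE", "pre", 200),
    Trade("NYSE", "regular", 300, cancelled=True),
    Trade("ARCA", "post", 100),
]

print(daily_volume(TRADES))                         # 1500
print(daily_volume(TRADES, include_extended=True))  # 1800
print(daily_volume(TRADES, venue="NYSE"))           # 1000
print(daily_volume(TRADES, drop_cancelled=False))   # 1800
```

Four defensible definitions give three different totals for the same day, which is why the source's definition matters.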

Understanding the source and validity of your data is VERY important.
 
Yes, but do you have any idea what volume they use? You are using garbage volume data, which will change once you get a data source you understand. Just citing Quandl as the source doesn't make it good.

Let me give you a hint.

Volume on Quandl for the S&P 500 for 2017-03-17 shows 5,178,040,000. The actual volume of the S&P 500 was nowhere near this!

In case you can't work it out, Yahoo's S&P 500 volume is actually the total volume of the NYSE-listed shares (not including NYSE Mkt/NYSE Arca).

The volume data has nothing to do with S&P 500!

It's complete and utter garbage to publish a volume figure for an index when that figure has nothing to do with the index! Quandl, Yahoo and CSI should be ashamed.
 
Of course it's correlated.

But it's still wrong.

Some would argue that volume on an index is garbage anyway, since it counts every share equally, no matter the stock's price or its weighting in the index, so it rises disproportionately for trades in low-priced stocks. For example, consider the effect of AAPL's 7:1 stock split in 2014: it would have caused a significant increase in volume with no real change in trading activity.
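The split arithmetic can be sketched as follows. The prices and share counts are hypothetical round numbers, not AAPL's actual 2014 figures. `split_adjust` is a hypothetical helper that rescales pre-split share volume so the series stays comparable. Dollar volume (shares × price) is unaffected by the split.

```python
def dollar_volume(shares, price):
    """Value traded; a split changes share count and price but not their product."""
    return shares * price


def split_adjust(volumes, split_index, ratio):
    """Scale share volumes recorded before a ratio:1 split (hypothetical helper)."""
    return [v * ratio if i < split_index else v for i, v in enumerate(volumes)]


# Hypothetical figures: a $700 stock splits 7:1 and then trades at $100.
pre_shares, pre_price = 10_000_000, 700.0
post_shares, post_price = 70_000_000, 100.0

print(post_shares / pre_shares)  # 7.0 -> share volume jumps sevenfold
print(dollar_volume(pre_shares, pre_price) == dollar_volume(post_shares, post_price))  # True
print(split_adjust([10, 10, 70, 70], split_index=2, ratio=7))  # [70, 70, 70, 70]

# Summing raw share counts across stocks over-weights cheap stocks:
# $1M traded in a $10 stock is 100,000 shares, but in a $1,000 stock only 1,000.
print(1_000_000 / 10.0, 1_000_000 / 1000.0)  # 100000.0 1000.0
```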
 