Quote from opt789:
Look at SPX for 9-23-03, 7-5-05 through 7-13-05, and 12-27-05: the data is missing because they don't have it. The top of the screen says "No Data", and they just fill in the previous day's data, so if you aren't paying attention you will use fake data and not realize it.
And on random days like 6-24-04 they are missing a whole bunch of the strikes.
If you used this data, then you must not have checked it very well, or you don't need consistent and complete data for your tests.
I am not sure what you are asking about tick data. The EOD data on SPX alone from 2003 to 2010 is well over 100MB.
To optimize an option trading idea you can't have preconceived notions. You have to check the various possibilities and figure out which has the best risk/reward ratio for your personal trading preferences and tolerances. That means you have to check many different strikes and various months: which option spread do you trade, when do you trade it, how far away is it, how do you hedge it (with options or the underlying) and when do you hedge it, do you roll, are you always in a trade or just sometimes, how are stops and fast markets handled? These are just a small sample of the questions that have to be tested to optimize a trading idea. So backtesting even on EOD data means searching through a lot of data many, many, many times as the very large, multidimensional matrix of possibilities is tested.
I honestly don't know how large a decade of SPX option data would be if it recorded each and every intraday bid and ask change of every option. If we say the EOD data is about 150MB and you take just one snapshot every minute, then with 405 minutes per day you would have about 60 gigabytes of data for a decade of SPX options. I can search through a decade of EOD SPX data in Excel VBA in a matter of milliseconds because I can load it all into memory. Loading 100MB and searching it in a few milliseconds is no big deal, but multiply that by 405 and it is a different story, and that is just for 1-minute data.
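The size estimate above is simple multiplication (EOD size times 405 snapshots per day), and the parameter search multiplies it again, since every combination of choices is another pass over the data. A minimal Python sketch of both, assuming the post's rough figures of 150MB and 405 minutes; the grid axes and their values are hypothetical, picked only to show how quickly the matrix grows.

```python
from itertools import product

EOD_MB = 150           # the post's rough size of a decade of SPX EOD option data
MINUTES_PER_DAY = 405  # one snapshot per trading minute, as in the post

def one_minute_size_mb(eod_mb: float = EOD_MB, snapshots: int = MINUTES_PER_DAY) -> float:
    """Scale EOD size by snapshots per day: one EOD-sized copy per minute."""
    return eod_mb * snapshots

# Hypothetical parameter axes, for illustration only (not from the post).
GRID = {
    "spread": ["iron_condor", "put_credit", "call_credit"],
    "days_to_expiry": [30, 45, 60, 90],
    "strike_distance_pct": [3, 5, 7, 10],
    "hedge": ["none", "options", "underlying"],
    "roll": [False, True],
}

# Every combination of one value per axis is one full backtest pass.
combos = list(product(*GRID.values()))
size = one_minute_size_mb()
print(f"1-minute data: {size:,.0f} MB (~{size / 1000:.1f} GB)")  # 1-minute data: 60,750 MB (~60.8 GB)
print(f"{len(combos)} combinations -> {len(combos) * size / 1e6:.1f} TB scanned")  # 288 combinations -> 17.5 TB scanned
```

Even this small five-axis grid means 288 passes; at 1-minute resolution that is on the order of terabytes read per optimization run, which is why fitting the data in memory matters so much.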
Thanks for the heads-up. I used RUT and NDX data, but I'll check to see if I have the same difficulties. I was using 2005 through 2009 data.
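One way to run that check automatically is to scan for the two problems opt789 describes: days that are an exact copy of the previous day (the forward-filled "No Data" days) and days where a large share of the strikes disappears. A minimal Python sketch, assuming each day's chain is already loaded into memory as a dict of (strike, option_type) to (bid, ask); that layout, the function names, and the 0.8 cutoff are hypothetical. Exact equality is only a heuristic; if the export carries the vendor's "No Data" marker, checking for it directly is more reliable.

```python
from datetime import date

# Hypothetical in-memory layout: date -> {(strike, option_type): (bid, ask)}.
def find_stale_days(days):
    """Days whose full quote set is an exact copy of the previous day's.

    A vendor that forward-fills "No Data" days leaves exactly this fingerprint.
    """
    ordered = sorted(days)
    return [cur for prev, cur in zip(ordered, ordered[1:])
            if days[cur] and days[cur] == days[prev]]

def find_thin_days(days, threshold=0.8):
    """Days listing fewer than `threshold` x the strikes of the last good day,
    mapped to the strikes that went missing. 0.8 is an arbitrary cutoff."""
    thin = {}
    baseline = None  # strike set of the most recent day not flagged as thin
    for day in sorted(days):
        strikes = {strike for strike, _ in days[day]}
        if baseline is not None and len(strikes) < threshold * len(baseline):
            thin[day] = sorted(baseline - strikes)
        else:
            baseline = strikes
    return thin

# Made-up sample data for illustration only.
sample = {
    date(2005, 7, 1): {(1200.0, "C"): (10.5, 11.0), (1225.0, "C"): (3.0, 3.4),
                       (1250.0, "C"): (0.8, 1.0)},
    date(2005, 7, 5): {(1200.0, "C"): (10.5, 11.0), (1225.0, "C"): (3.0, 3.4),
                       (1250.0, "C"): (0.8, 1.0)},  # copied forward
    date(2005, 7, 6): {(1200.0, "C"): (9.8, 10.3)},  # most strikes missing
}
print(find_stale_days(sample))  # [datetime.date(2005, 7, 5)]
print(find_thin_days(sample))   # {datetime.date(2005, 7, 6): [1225.0, 1250.0]}
```

Comparing against the last good day rather than simply the previous day keeps a run of several thin days from hiding behind one another.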
I generally do my backtesting with a PHP application aimed at specific strategies. Sometimes I leave the data in CSV files, but usually I populate a database. My only significant penalty for data set size is execution time. Although I used to be pretty good at VB6, it never really occurred to me to do it in VBA.
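For anyone who wants to try the CSV-to-database route without PHP, here is a minimal Python sketch using the standard-library `csv` and `sqlite3` modules. It is not the application described above; the CSV columns, the `quotes` table, and `load_quotes` are hypothetical names. The index on quote date lets a backtest pull one day's chain without rescanning every file.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout with made-up quotes; real vendor files will differ.
SAMPLE_CSV = """quote_date,expiration,strike,option_type,bid,ask
2005-07-01,2005-07-16,1200,C,10.5,11.0
2005-07-01,2005-07-16,1200,P,3.1,3.4
2005-07-05,2005-07-16,1200,C,9.8,10.3
"""

def load_quotes(conn: sqlite3.Connection, csv_file) -> int:
    """Create the table if needed, bulk-insert rows from a CSV file object, return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS quotes (
               quote_date TEXT, expiration TEXT, strike REAL,
               option_type TEXT, bid REAL, ask REAL)"""
    )
    # Index so a backtest can fetch one day's chain without a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_day ON quotes (quote_date, expiration, strike)"
    )
    rows = [(r["quote_date"], r["expiration"], float(r["strike"]), r["option_type"],
             float(r["bid"]), float(r["ask"])) for r in csv.DictReader(csv_file)]
    with conn:  # one transaction for the whole batch, committed on success
        conn.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
load_quotes(conn, io.StringIO(SAMPLE_CSV))
chain = conn.execute(
    "SELECT strike, option_type, bid, ask FROM quotes "
    "WHERE quote_date = ? ORDER BY strike, option_type",
    ("2005-07-01",),
).fetchall()
print(chain)  # [(1200.0, 'C', 10.5, 11.0), (1200.0, 'P', 3.1, 3.4)]
```

Inserting in one `executemany` call inside a single transaction is usually far faster than committing row by row, which matters when the file holds years of chains.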