Quote from PocketChange:
Raw market message data averages closer to 20 GB per day compressed across OPRA, futures, equities, etc. You'll need about 12 TB just to store the compressed files, and roughly 4x that, plus a lot of I/O and CPUs, to process the data.
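A quick back-of-envelope check of the storage figures, as a minimal Python sketch. It assumes 1 TB = 1,000 GB, reads "4x that" as four times the compressed archive, and uses 252 trading days per year (a common US convention, not from the post). All names are hypothetical.

```python
# Back-of-envelope storage estimate from the figures quoted in the post.
# Assumptions (not from the post): 1 TB = 1000 GB, "4x that" means 4x the
# compressed archive, and 252 trading days per year.
GB_PER_DAY_COMPRESSED = 20
ARCHIVE_TB = 12
PROCESSING_MULTIPLIER = 4
TRADING_DAYS_PER_YEAR = 252

def days_of_history(archive_tb: float, gb_per_day: float) -> float:
    """Trading days of compressed data that fit in the archive."""
    return archive_tb * 1000 / gb_per_day

def processing_footprint_tb(archive_tb: float, multiplier: float) -> float:
    """Rough space needed while processing, as a multiple of the archive."""
    return archive_tb * multiplier

days = days_of_history(ARCHIVE_TB, GB_PER_DAY_COMPRESSED)
print(f"{days:.0f} trading days (~{days / TRADING_DAYS_PER_YEAR:.1f} years)")
print(f"~{processing_footprint_tb(ARCHIVE_TB, PROCESSING_MULTIPLIER):.0f} TB to process")
```

At 20 GB/day, 12 TB holds about 600 trading days, a bit under two and a half years of history.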
It takes 5 to 25 minutes to run through a single day of more than a billion messages. Take a look at Nanex; they are a decent source of historical whole-market message data.
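The quoted run times imply a processing throughput. The sketch below assumes exactly one billion messages per day; the post says "a billion plus", so real rates would be somewhat higher. The function names are hypothetical.

```python
# Implied processing throughput for the run times quoted in the post.
# Assumption: exactly 1e9 messages per day ("a billion plus" in the post).
MESSAGES_PER_DAY = 1_000_000_000

def implied_rate(messages: int, minutes: float) -> float:
    """Messages per second needed to get through `messages` in `minutes`."""
    return messages / (minutes * 60)

def seconds_needed(messages: int, rate_per_sec: float) -> float:
    """Wall-clock seconds to process `messages` at a sustained rate."""
    return messages / rate_per_sec

for minutes in (5, 25):
    print(f"{minutes:>2} min -> {implied_rate(MESSAGES_PER_DAY, minutes):,.0f} msg/s")
print(f"at 5M msg/s: {seconds_needed(MESSAGES_PER_DAY, 5_000_000):.0f} s")
```

Five minutes works out to about 3.3M messages/s and 25 minutes to about 0.67M messages/s. A processor sustaining the 5M messages/s mentioned at the end of the post would need roughly 200 s per billion messages.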
A better alternative would be tick-accurate consolidations built from the raw message data. Instead of processing a billion-plus messages on every run through a day, it would be nice if someone offered just the bid/ask changes and the related trade data.
As an example, this would reduce SPY from about 10M messages on a typical day to about 10K consolidated price-tick records. There is no need to process historical Level II liquidity, and Time and Sales alone doesn't give an accurate picture of "executable" market conditions.
Tick-accurate bid/ask changes plus a snapshot of traded activity within each price-change interval would cut data sizes and processing times by a factor of about 1,000 and provide a clear, concise view of the market.
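One way to build that kind of consolidation, as a minimal Python sketch rather than any vendor's actual format. It assumes a single symbol, a time-ordered stream of top-of-book quotes and trades, and hypothetical Quote/Trade record types. Each output record starts at a bid/ask change and carries the trade count, volume, VWAP and high/low of trades printed while that quote prevailed. Quotes that repeat the current bid/ask are dropped, and trades that arrive before the first quote are skipped.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Union

# Hypothetical input records: a time-ordered feed for ONE symbol.
@dataclass(frozen=True)
class Quote:
    ts: int       # message timestamp (units are up to the feed)
    bid: float    # best bid price
    ask: float    # best ask price

@dataclass(frozen=True)
class Trade:
    ts: int
    price: float
    size: int

@dataclass
class ConsolidatedTick:
    """One record per bid/ask change, plus the trades that printed under it."""
    start_ts: int
    bid: float
    ask: float
    trade_count: int = 0
    volume: int = 0
    notional: float = 0.0
    high: Optional[float] = None
    low: Optional[float] = None

    @property
    def vwap(self) -> Optional[float]:
        return self.notional / self.volume if self.volume else None

def consolidate(messages: Iterable[Union[Quote, Trade]]) -> Iterator[ConsolidatedTick]:
    current: Optional[ConsolidatedTick] = None
    for msg in messages:
        if isinstance(msg, Quote):
            if current is not None and (msg.bid, msg.ask) == (current.bid, current.ask):
                continue  # bid/ask unchanged: no new record
            if current is not None:
                yield current  # close the previous interval
            current = ConsolidatedTick(start_ts=msg.ts, bid=msg.bid, ask=msg.ask)
        elif current is not None:
            # A trade: fold it into the interval of the prevailing bid/ask.
            # (Trades before the first quote never reach this branch.)
            current.trade_count += 1
            current.volume += msg.size
            current.notional += msg.price * msg.size
            current.high = msg.price if current.high is None else max(current.high, msg.price)
            current.low = msg.price if current.low is None else min(current.low, msg.price)
    if current is not None:
        yield current  # flush the last open interval

if __name__ == "__main__":
    feed = [
        Quote(1, 100.00, 100.01),
        Trade(2, 100.01, 200),
        Quote(3, 100.00, 100.01),  # same bid/ask: dropped
        Trade(4, 100.00, 100),
        Quote(5, 100.01, 100.02),
        Trade(6, 100.02, 50),
    ]
    # Two records: ts 1 (2 trades, 300 shares) and ts 5 (1 trade, 50 shares).
    for tick in consolidate(feed):
        print(tick.start_ts, tick.bid, tick.ask, tick.trade_count, tick.volume, tick.vwap)
```

Dropping unchanged quotes and Level II depth is what produces the large reduction described in the post. Tracking bid/ask sizes as well would be a straightforward extension, at the cost of emitting more records.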
Load these tick consolidations into a SQL database and you'll have a fast, accurate backtesting and analysis platform. SQL queries can return result sets in milliseconds, versus writing a C++ message processor that handles 5M messages/s and still has to churn through the compressed raw data files.
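A minimal sketch of the SQL side using Python's built-in sqlite3 module. The table and column names are hypothetical, and SQLite simply stands in for whatever database you'd actually use. The composite primary key on (symbol, start_ts) gives SQLite an index for the two typical backtest queries shown: the prevailing quote as of a given time, and summary stats over a time window.

```python
import sqlite3

# Hypothetical schema for consolidated ticks; column names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ticks (
    symbol      TEXT    NOT NULL,
    start_ts    INTEGER NOT NULL,   -- start of the bid/ask interval
    bid         REAL    NOT NULL,
    ask         REAL    NOT NULL,
    trade_count INTEGER NOT NULL,
    volume      INTEGER NOT NULL,
    vwap        REAL,               -- NULL when no trades in the interval
    PRIMARY KEY (symbol, start_ts)
)
"""

def load_ticks(conn: sqlite3.Connection, rows) -> None:
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

def quote_as_of(conn: sqlite3.Connection, symbol: str, ts: int):
    """Prevailing (bid, ask) at time ts, or None if no quote yet."""
    return conn.execute(
        "SELECT bid, ask FROM ticks WHERE symbol = ? AND start_ts <= ? "
        "ORDER BY start_ts DESC LIMIT 1",
        (symbol, ts),
    ).fetchone()

def window_stats(conn: sqlite3.Connection, symbol: str, start: int, end: int):
    """(interval count, total volume, mean quoted spread) for start <= ts < end."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(volume), 0), AVG(ask - bid) FROM ticks "
        "WHERE symbol = ? AND start_ts >= ? AND start_ts < ?",
        (symbol, start, end),
    ).fetchone()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_ticks(conn, [
        ("SPY", 1, 100.00, 100.01, 2, 300, 100.0067),
        ("SPY", 5, 100.01, 100.02, 1, 50, 100.02),
        ("SPY", 9, 100.02, 100.04, 0, 0, None),
    ])
    print(quote_as_of(conn, "SPY", 7))  # (100.01, 100.02)
    print(window_stats(conn, "SPY", 0, 10))
```

The as-of lookup is the piece a backtest leans on most: for each simulated order time it returns the bid/ask that was actually executable at that moment.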