This might be a FAQ, but given a historical data feed like opentick (specifically, quote and trade tick streams), how do you define "price"?
- bid/ask midpoint
- last trade
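
To make the two candidates concrete, here is a rough sketch of how I picture deriving each from a merged tick stream. The `Tick` record and its field names are my own invention for illustration, not opentick's actual message layout, and skipping crossed or one-sided quotes is just one possible filtering choice.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Tick:
    # Hypothetical record layout, not opentick's real schema.
    ts: float                      # timestamp in seconds
    kind: str                      # "quote" or "trade"
    bid: Optional[float] = None    # set on quotes
    ask: Optional[float] = None    # set on quotes
    price: Optional[float] = None  # set on trades


def price_series(
    ticks: Iterable[Tick],
) -> Iterator[Tuple[float, Optional[float], Optional[float]]]:
    """Yield (ts, midpoint, last_trade) after every tick.

    Each value is None until the first usable quote or trade is seen.
    """
    mid: Optional[float] = None
    last: Optional[float] = None
    for t in ticks:
        if t.kind == "quote":
            # Ignore one-sided or crossed quotes (an assumption; other
            # filters are equally defensible).
            if t.bid is not None and t.ask is not None and t.ask >= t.bid > 0:
                mid = (t.bid + t.ask) / 2.0
        elif t.kind == "trade":
            last = t.price
        yield t.ts, mid, last


ticks = [
    Tick(1.0, "quote", bid=10.0, ask=10.5),
    Tick(2.0, "trade", price=10.5),
    Tick(3.0, "quote", bid=10.0, ask=11.0),
]
series = list(price_series(ticks))
```

In this toy stream the last trade sits at the ask and then goes stale while the midpoint moves with the next quote, which is exactly the kind of difference I am asking about.
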
There is the additional complication of multiple trading venues: the primary listing exchange, regional exchanges, ECNs, etc. Which do you find most meaningful for your models?
- primary listing venue
- NBBO, or some reasonable approximation of the US "composite market"
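
By "some reasonable approximation" I mean something like the sketch below: keep the latest quote per venue and take the highest bid and lowest ask across them. This is only an approximation under my own assumptions. It ignores quote conditions, sizes, and staleness, so it is not the official NBBO, and the venue labels and tuple layout are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical quote layout: (timestamp, venue, bid, ask)
Quote = Tuple[float, str, float, float]


def composite_bbo(
    quotes: Iterable[Quote],
) -> Iterator[Tuple[float, Optional[float], Optional[float]]]:
    """Yield (ts, best_bid, best_ask) across venues after each quote update."""
    book: Dict[str, Tuple[float, float]] = {}  # latest (bid, ask) per venue
    for ts, venue, bid, ask in quotes:
        book[venue] = (bid, ask)
        # Treat non-positive prices as "no quote on that side" (an assumption).
        bids = [b for b, _ in book.values() if b > 0]
        asks = [a for _, a in book.values() if a > 0]
        best_bid = max(bids) if bids else None
        best_ask = min(asks) if asks else None
        yield ts, best_bid, best_ask


quotes = [
    (1.0, "NYSE", 10.00, 10.10),
    (2.0, "ARCA", 10.02, 10.12),
    (3.0, "NYSE", 9.98, 10.06),
]
bbo = list(composite_bbo(quotes))
```

Restricting `quotes` to a single venue gives the primary-listing view from the same code, so the two options differ only in which messages are fed in.
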
My interest here is in modeling historical price relationships between multiple instruments (factor models, that sort of thing) at intraday frequencies, so what "price" means requires some thought before blindly storing a few hundred gigabytes of data.
(I am not interested in charting/OHLC data for multiple reasons, one of them being that OHLC data is derived from the original raw data which is fundamentally quotes and trades.)
Thanks.

