Thanks for sharing your thoughts on your Python implementation. I may not agree that Python is the right tool for this, but that aside, I have a couple of points on which I disagree with your blog. I hope you will take the criticism below constructively, because I have added an explanation and rationale to each point:
* You said "I am trading futures, in a fully automated system, which is relatively slow and where latency is not an issue, using only price data. "
-> I am not sure I agree: algorithmic futures trading is anything but slow, and latency is of utmost importance. Why else do you think there are microwave links in place between Aurora and Mahwah? Of course you can choose to implement an architecture that is only capable of trading latency-insensitive strategies, but futures market structure and capability are certainly more in the realm of microseconds.
* "Proactive or passive tick response" -> This should be a non-issue. Virtually every broker API and trading architecture is built on an event-processing model. Incoming price updates are the defined events, and your system ought to react in response to them. I have not seen any rationale on your end for even considering a "pull implementation" where you ask a broker or data vendor for prices (other than, of course, for backfill purposes or to acquire historical series for back-testing). You can internally implement a timer if you so wish and build candles that way.
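To illustrate what I mean by reacting to events, here is a minimal Python sketch. The names `CandleBuilder` and `on_tick` are my own invention and not part of any broker API; I am assuming the feed handler calls `on_tick` once per incoming trade with a timestamp and a price.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Candle:
    start: datetime
    open: float
    high: float
    low: float
    close: float


class CandleBuilder:
    """Aggregates pushed ticks into fixed-width candles (hypothetical helper)."""

    def __init__(self, width: timedelta):
        self.width = width
        self.current = None   # candle currently being built
        self.completed = []   # finished candles, oldest first

    def _bucket_start(self, ts: datetime) -> datetime:
        # Floor the timestamp to the start of its time bucket.
        epoch = datetime(1970, 1, 1)
        n = (ts - epoch) // self.width
        return epoch + n * self.width

    def on_tick(self, ts: datetime, price: float) -> None:
        """Callback the feed handler would invoke for every incoming trade."""
        start = self._bucket_start(ts)
        if self.current is None or start != self.current.start:
            # The tick belongs to a new bucket: close the old candle, open a new one.
            if self.current is not None:
                self.completed.append(self.current)
            self.current = Candle(start, price, price, price, price)
        else:
            c = self.current
            c.high = max(c.high, price)
            c.low = min(c.low, price)
            c.close = price


# Made-up ticks standing in for a live stream.
builder = CandleBuilder(timedelta(minutes=1))
ticks = [
    (datetime(2024, 1, 2, 9, 30, 5), 100.0),
    (datetime(2024, 1, 2, 9, 30, 40), 100.5),
    (datetime(2024, 1, 2, 9, 30, 55), 99.75),
    (datetime(2024, 1, 2, 9, 31, 2), 100.25),
]
for ts, px in ticks:
    builder.on_tick(ts, px)
```

Nothing here ever asks anyone for a price: the candle is simply a by-product of the events as they arrive. In this sketch a candle is only closed when the first tick of the next bucket arrives; a timer could close it on the clock instead.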
* "Open, Close and intraday" -> I do not fully understand what you are trying to say here. If you want to develop and test a strategy then you need historical data to test over. In the case of futures contracts that would be the historical data of each single contract from its first day of trading until its expiration. Tick-based historical data is nowadays very easy to come by, so I am not sure why you make the latency of your system a function of the availability of your captured data. You can cheaply purchase high-precision exchange data for futures contracts. Also, closing prices are NOT untrustworthy. Closing prices are what they are: official closing prices. And trading at the next session's open just because you cannot get a fill at the closing price may not be the best idea: if you truly care to close your position before the end of the trading session in a given contract, then there are a myriad of other, better ways to close the position at or near the closing price. By the way, the following US exchanges all offer a Market on Close order type: CBOT, CME, KCBOT, MGE, NYBOT, COMEX, NYMEX.
* When and how often? -> As said above, if you deal with intraday data then you should not pull data but subscribe to events so that incoming data is streamed to your platform.
* Irregular timeseries -> You think about this the wrong way around: to build daily compressed time series from intraday time series you need to either have the complete set of intraday data (down to the tick) or else accept that your daily compressed bars will be inaccurate. Why you want to re-invent the wheel in the first place is beyond me (data providers that offer tick-based intraday data, live and historical, as well as daily data are very cheap nowadays, especially when limiting oneself to futures data). But building daily compressed data points from only intraday snapshots is a horrible way to go about things, especially with a sampling frequency of 1 hour: you will almost necessarily be off by several ticks on both your high and your low of the day, because highs and lows occur with very high probability in between your sampled snapshots. The way you described it, you make this tragically complex and error-prone.
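A small Python sketch of the high/low problem. The prices are made up purely for illustration, and `daily_bar` and `last_price_at` are hypothetical helpers of mine:

```python
# Hypothetical trades for one session: (minutes since the open, price).
ticks = [
    (0, 100.00), (25, 100.75), (60, 100.25), (95, 99.50),
    (120, 99.75), (150, 101.25), (180, 100.50), (215, 99.25),
    (240, 100.00),
]


def daily_bar(prices):
    """Open, high, low, close of a price sequence."""
    return prices[0], max(prices), min(prices), prices[-1]


def last_price_at(ticks, minute):
    """Last traded price at or before `minute` (what an hourly snapshot sees)."""
    return [p for t, p in ticks if t <= minute][-1]


# Daily bar built from every tick.
true_bar = daily_bar([p for _, p in ticks])

# Daily bar built from one snapshot per hour (minutes 0, 60, 120, 180, 240).
hourly = [last_price_at(ticks, m) for m in range(0, 241, 60)]
sampled_bar = daily_bar(hourly)

print(true_bar)     # (100.0, 101.25, 99.25, 100.0)
print(sampled_bar)  # (100.0, 100.5, 99.75, 100.0)
```

With these invented numbers the open and close happen to agree, but the sampled high is 0.75 too low and the sampled low 0.50 too high, because both extremes traded between snapshots.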
* "TimeStamps": Why are you concerning yourself with this issue at all? You are dealing with exchange-traded futures contracts, hence the price data you receive should already contain the official exchange timestamp: not a broker timestamp, not your own timestamp, but an official exchange timestamp. Simple as that, done!
* "Getting synchronised tradeable intraday prices isn't easy, except when an explicit market in the spread is quoted (as for calendar spreads in certain markets, like Eurodollar)."
-> This is not true: you simply subscribe to all the open Eurodollar contracts and receive live streaming prices for each of them. Simple as that. No need to synchronize. For back-testing on historical data you simply store the timestamped ticks of each contract and read them back in timestamp order.
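Reading stored contracts back in timestamp order could be sketched like this in Python. The symbols, integer timestamps and prices are made up, and `replay` is a hypothetical helper; I am assuming each contract's ticks are already stored sorted by exchange timestamp.

```python
import heapq

# Hypothetical stored ticks per contract: (exchange timestamp, symbol, price),
# each list already sorted by timestamp.
edh = [(1, "EDH", 97.500), (4, "EDH", 97.505), (9, "EDH", 97.500)]
edm = [(2, "EDM", 97.350), (4, "EDM", 97.355), (7, "EDM", 97.360)]


def replay(*streams):
    """Yield ticks from all contracts merged into timestamp order."""
    yield from heapq.merge(*streams, key=lambda tick: tick[0])


last = {}      # most recent price seen per contract
spreads = []   # (timestamp, calendar spread) after each tick
for ts, symbol, price in replay(edh, edm):
    last[symbol] = price
    if len(last) == 2:  # both legs have traded at least once
        spreads.append((ts, round(last["EDH"] - last["EDM"], 3)))
```

Each entry in `spreads` uses the latest known price of both legs as of that timestamp, which is exactly what a live system subscribed to both contracts would have seen.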
* Spikes and cleaning -> There should not be different options. When you receive a price of zero, or a price that lies more than x standard deviations away from the previously traded prices, then that is an erroneous quote, period. You filter it out and are done.
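A minimal Python sketch of such a filter. The function name `filter_spikes` and the parameters `window`, `x` and `min_history` are my own assumptions; what values to use for them is a tuning choice I am not prescribing here.

```python
from collections import deque
from statistics import mean, stdev


def filter_spikes(prices, window=20, x=5.0, min_history=5):
    """Drop non-positive prices and prices more than `x` standard deviations
    away from the mean of the last `window` accepted prices."""
    accepted = []
    recent = deque(maxlen=window)  # rolling history of accepted prices only
    for p in prices:
        if p <= 0:
            continue  # a zero or negative price is treated as erroneous
        if len(recent) >= min_history:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(p - mu) > x * sigma:
                continue  # too far from recent prices: treat as a spike
        recent.append(p)
        accepted.append(p)
    return accepted


# Made-up price stream containing a zero and an obvious spike.
raw = [100.0, 100.25, 100.0, 99.75, 100.0, 100.25, 0.0, 100.0, 150.0, 100.25]
clean = filter_spikes(raw)
print(clean)  # [100.0, 100.25, 100.0, 99.75, 100.0, 100.25, 100.0, 100.25]
```

Rejected prices are never added to the rolling history, so a bad print cannot distort the statistics used to judge the next one.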
* Volumes – beware: You say "One general reason is that as a rule volume data doesn't seem to be as reliable as price data."
-> That is not true: the volume reported with each timestamped trade is exactly the volume that traded. Hence the cumulative volume during a given trading session can be determined with 100% accuracy.