Quote from Mike805:
We can always tweak the indicator for more/less granularity once we move a bit further along. However, assuming a 5min time scale, does an hourly lag make sense? Should we weight recent vola. more heavily?
For now, the gap problem is a good one. I believe gaps introduce a non-linear effect in the return series. If we're going to analyze intraday vola, does it make sense to use information from a non-linear overnight effect? IMO, the information provided by a gap is important, but does it offer any tangible benefit to the indicator as is? All the gap will do is inflate the opening vola...
What I'm saying is that we can utilize the information from the gap, but maybe not in this particular measure of clustering? Does this make sense?
I suppose if your focus is only on intraday activity (with a specific window in mind), it might make sense to remove the gaps, and with them any bias you're concerned about. Ultimately, it really depends on what the end game is. Once you've identified more specifically what you are looking for, it might make sense to run both cases and see whether including the gap biases the results or excluding it throws away useful information.
Here's an example study with gap omission:
"This paper compares various measures and forecasts of volatility in equity markets. In the absence of overnight trading it is shown that the daily volatility is best measured by the sum of intraday squared 5-min returns, excluding the overnight return. In the absence of overnight trading, the best daily forecast of volatility is produced by modeling overnight volatility differently from intraday volatility"
http://www3.interscience.wiley.com/journal/92013893/abstract?CRETRY=1&SRETRY=0
Many of the papers I've seen sum all of the squared 5-min returns for one day and use that as a proxy for daily vol.
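For what it's worth, here is a rough sketch of that calculation in Python, so both cases (gap included vs. omitted) can be compared side by side. The function name, the `(day, price)` input layout, and the sample prices are my own assumptions for illustration, not anything taken from the paper; it also uses log returns, which is a common choice but not the only one.

```python
import math

def realized_variance(bars, include_overnight=False):
    """Sum of squared 5-min log returns per day.

    bars: chronological list of (day_label, close_price), one entry per 5-min bar.
    include_overnight: if True, the close-to-open return is added to the new day.
    Returns a dict {day_label: realized variance}.
    """
    rv = {}
    prev_day, prev_price = None, None
    for day, price in bars:
        rv.setdefault(day, 0.0)
        if prev_price is not None:
            r = math.log(price / prev_price)
            # Same-day returns always count; the first return of a new day
            # is the overnight gap and only counts if requested.
            if day == prev_day or include_overnight:
                rv[day] += r * r
        prev_day, prev_price = day, price
    return rv

# Made-up prices: day "d2" opens with a gap up from 100.5 to 103.
bars = [("d1", 100.0), ("d1", 101.0), ("d1", 100.5),
        ("d2", 103.0), ("d2", 102.5)]

intraday_only = realized_variance(bars)
with_gap = realized_variance(bars, include_overnight=True)

for day in intraday_only:
    # Square root turns the daily variance proxy into a daily vol proxy.
    print(day, math.sqrt(intraday_only[day]), math.sqrt(with_gap[day]))
```

With the gap included, the whole overnight move lands in the first bar of the new day, which is the "inflated opening vola" effect discussed above. Keeping it as a switch makes it easy to look at both versions before deciding which one the indicator should use.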