Standard deviation for irregularly spaced timeseries

trade4succes · Jan 11, 2021

Standard deviation with N=10 measures the same quantity as with N=100, so the differing counts shouldn't matter in themselves (make sure to divide by N-1, because you are working with a sample). The estimate from fewer observations is just less precise. What you could also consider is taking the standard deviation not by time but by number of observations: instead of measuring every minute, measure every, say, 100 observations.
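A minimal sketch of both options, using only the Python standard library. The synthetic ticks, the function names `std_by_minute` and `std_by_count`, and the choice of log returns are assumptions for illustration, not something from the thread. `statistics.stdev` already divides by N-1.

```python
import math
import random
from statistics import stdev  # sample standard deviation (divides by N-1)

def log_returns(prices):
    """Tick-to-tick log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def std_by_minute(times, prices):
    """Per-minute (count, sample std) of tick returns; times are in seconds."""
    rets = log_returns(prices)
    buckets = {}
    for t, r in zip(times[1:], rets):  # each return is stamped with its end time
        buckets.setdefault(int(t // 60), []).append(r)
    return {m: (len(rs), stdev(rs))
            for m, rs in sorted(buckets.items()) if len(rs) >= 2}

def std_by_count(prices, block=100):
    """Sample std over consecutive, non-overlapping blocks of `block` returns."""
    rets = log_returns(prices)
    return [stdev(rets[i:i + block])
            for i in range(0, len(rets) - block + 1, block)]

# Synthetic ticks (assumed data): gaps of 2-20 ms, as in the original question.
random.seed(0)
times, prices = [0.0], [100.0]
while times[-1] < 120.0:
    times.append(times[-1] + random.uniform(0.002, 0.020))
    prices.append(prices[-1] * math.exp(random.gauss(0.0, 1e-5)))

per_minute = std_by_minute(times, prices)      # {minute: (n_obs, std)}
per_block = std_by_count(prices, block=100)    # one std per 100 returns
```

Keeping the observation count next to each per-minute value makes it easy to see how much data each estimate rests on before comparing two minutes.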

stochastix · Jan 11, 2021

MarkBrown said:
Well, that is true, but when trying to decipher data, sometimes smoothed is better than raw for observation.

Yes, that is true, but standard deviation is only a valid measure for series with zero autocorrelation, etc. The vast majority of times when people say they are smoothing something, they are interpolating where the interpolation isn't valid. Proper handling requires non-linear filtering and a signal and noise model, or something LIKE that. A smoother is just a low-pass filter, which adds lag. We could go into information theory, the Nyquist limit and Shannon's entropy theorems, but that would be an ass kicking.
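To make the lag point concrete, here is a small sketch for the simplest smoother, a trailing simple moving average. It only illustrates the lag; it is not the non-linear filtering mentioned above, and the function name `moving_average` is an assumption for illustration. On a steadily rising input, the average trails the input by (window - 1) / 2 samples.

```python
def moving_average(xs, window):
    """Trailing (causal) simple moving average; output starts once the window is full."""
    out = []
    running = 0.0
    for i, x in enumerate(xs):
        running += x
        if i >= window:
            running -= xs[i - window]   # drop the sample that left the window
        if i >= window - 1:
            out.append(running / window)
    return out

window = 5
ramp = [float(i) for i in range(20)]    # input rises by 1 per sample
smoothed = moving_average(ramp, window)

# smoothed[0] is the average available at sample index window - 1.
# On a ramp, the gap between input and average equals the lag in samples.
lag = ramp[window - 1] - smoothed[0]    # (window - 1) / 2 = 2.0 here
```

A longer window smooths more but lags more, which is the trade-off behind the objection to reading smoothed series as if they were the raw data.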

longandshort · Jan 11, 2021

kroxobor said:
Hi fellows,

Suppose I have a collection of irregularly spaced time series (tick data with time_obs(i+1) - time_obs(i) ranging from 2ms to 20ms). I want to calculate 1-minute standard deviation of returns.

The question is: should the number of observations be the same in every 1-minute sample? For example, the first minute contains 50 observations and the second minute has 100 observations. After calculating standard deviations for both, is it legitimate to claim that the volatility of price in the first minute is higher than in the second because std1 > std2?

Thanks in advance.

Here's a primer to help you navigate this: show.cgi (uvt.nl)

Craig66 · Jan 11, 2021

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=208278

trendmomentum · Jan 11, 2021

kroxobor said:
Hi fellows,

Suppose I have a collection of irregularly spaced time series (tick data with time_obs(i+1) - time_obs(i) ranging from 2ms to 20ms). I want to calculate 1-minute standard deviation of returns.

The question is: should the number of observations be the same in every 1-minute sample? For example, the first minute contains 50 observations and the second minute has 100 observations. After calculating standard deviations for both, is it legitimate to claim that the volatility of price in the first minute is higher than in the second because std1 > std2?

Thanks in advance.

Whatever you do, you will need to normalise the data before you apply any stats to it. Is time really that important to you, or are you just trying to get a measure of volatility? Tick data is normally full of noise, and backtest results are very hard to replicate with real-time tick data, especially across different data providers.

Consider working with range intervals instead.

Good luck.
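One possible reading of "range intervals" is range bars: a bar closes when its high-low span reaches a fixed size, so time drops out and the number of ticks per bar becomes a rough activity measure. The sketch below assumes that reading; the function name `range_bars`, the bar fields and the sample prices are hypothetical.

```python
def range_bars(prices, bar_range):
    """Group ticks into bars; a bar closes once its high-low span reaches bar_range."""
    bars = []
    current = None
    for p in prices:
        if current is None:
            current = {"open": p, "high": p, "low": p, "close": p, "ticks": 0}
        current["high"] = max(current["high"], p)
        current["low"] = min(current["low"], p)
        current["close"] = p
        current["ticks"] += 1
        if current["high"] - current["low"] >= bar_range:
            bars.append(current)
            current = None
    return bars  # an unfinished final bar is dropped

# Made-up prices for illustration only.
prices = [100.0, 100.5, 101.0, 102.0, 101.5, 101.0, 100.0, 99.5, 100.5, 101.5]
bars = range_bars(prices, bar_range=2.0)
ticks_per_bar = [b["ticks"] for b in bars]   # fewer ticks per bar = faster moves
```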

stochastix · Jan 12, 2021

trendmomentum said:
Whatever you do, you will need to normalise the data before you apply any stats to it. Is time really that important to you, or are you just trying to get a measure of volatility? Tick data is normally full of noise, and backtest results are very hard to replicate with real-time tick data, especially across different data providers.

Consider working with range intervals instead.

Good luck.

Data is not noisy at all. Unless you have a shoddy provider, it's all information. Brains actually operate with noise on a 1/f spectrum (see stochastic resonance).

stochastix · Jan 14, 2021

stochastix said:
Yes, that is true, but standard deviation is only a valid measure for series with zero autocorrelation, etc. The vast majority of times when people say they are smoothing something, they are interpolating where the interpolation isn't valid. Proper handling requires non-linear filtering and a signal and noise model, or something LIKE that. A smoother is just a low-pass filter, which adds lag. We could go into information theory, the Nyquist limit and Shannon's entropy theorems, but that would be an ass kicking.

see my article
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.9564&rep=rep1&type=pdf

kroxobor · Jan 18, 2021

stochastix said:
see my article
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.9564&rep=rep1&type=pdf

I'm afraid I'll find it difficult to code properly.

What I am trying to learn is how mean reversion on 1-minute timeframes depends on the near-term preceding volatility of returns. Using BTC data, I wanted to pick some random days in 2020 and construct from the tick data a time series with an interval of 0.5-1 s. After computing 1-minute volatility for a particular day, I wanted to see how the behaviour of volatility in the N preceding minutes affects the ability of price to revert after a breakout of, let's say, 20 USD in the current minute. A breakout event is when price moves by T dollars in, for example, 0.3-0.4 s.
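A rough sketch of that pipeline, assuming previous-tick sampling onto a 0.5 s grid, timestamps in seconds starting at 0, and a breakout measured over one grid step (0.5 s rather than 0.3-0.4 s). All function names, the synthetic random-walk prices and the 10 USD threshold are assumptions for illustration; the reversion measure itself is left out.

```python
import math
import random
from bisect import bisect_right
from statistics import stdev

def previous_tick_grid(times, prices, step):
    """Last known price at t = step, 2*step, ... (previous-tick sampling)."""
    grid = []
    for k in range(1, int(times[-1] // step) + 1):
        t = k * step
        i = bisect_right(times, t) - 1      # last tick at or before t
        grid.append((t, prices[i]))
    return grid

def minute_volatility(grid):
    """Sample std of grid log returns per minute -> {minute: std}."""
    buckets = {}
    for (t0, p0), (t1, p1) in zip(grid, grid[1:]):
        # A return belongs to the minute in which its interval starts.
        buckets.setdefault(int(t0 // 60), []).append(math.log(p1 / p0))
    return {m: stdev(r) for m, r in buckets.items() if len(r) >= 2}

def breakouts(grid, threshold, horizon_steps=1):
    """(time, move) for every move of at least `threshold` over `horizon_steps` steps."""
    events = []
    for (t0, p0), (t1, p1) in zip(grid, grid[horizon_steps:]):
        if abs(p1 - p0) >= threshold:
            events.append((t1, p1 - p0))
    return events

def preceding_vol(vol_by_minute, event_time, n_minutes):
    """Mean 1-minute volatility over the n complete minutes before the event's minute."""
    m = int(event_time // 60)
    prior = [vol_by_minute[k] for k in range(m - n_minutes, m) if k in vol_by_minute]
    return sum(prior) / len(prior) if len(prior) == n_minutes else None

# Synthetic ticks (assumed data): 2-20 ms gaps, random-walk price near 30000.
random.seed(1)
times, prices = [0.0], [30000.0]
while times[-1] < 300.0:
    times.append(times[-1] + random.uniform(0.002, 0.020))
    prices.append(prices[-1] + random.gauss(0.0, 0.5))

grid = previous_tick_grid(times, prices, step=0.5)
vol = minute_volatility(grid)
events = breakouts(grid, threshold=10.0)
# Each breakout paired with the average volatility of the 3 minutes before it.
pairs = [(t, move, preceding_vol(vol, t, n_minutes=3)) for t, move in events]
```

Because every minute on the regular grid holds the same number of returns, the per-minute standard deviations are directly comparable, which sidesteps the unequal-count question from the start of the thread.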

Sorry, this idea is still crude and poorly explained, but I would appreciate any ideas and comments on it.