Exploring the Potential of Synthetic Data in Trading Strategies

Oddly enough, it is. I use the TimeGAN package and built my own “similarity” metrics based on a variety of characteristics, and by those metrics it does seem to be. That said, I (obviously) only use it for stuff that is heavily path dependent, not for alpha research: things like thresholds for delta hedging, hysteresis bands, stop losses and take profits. Still, it’s a worthwhile investment.

I experienced the same. I looked at marginal distributions, correlations between the OHLC fields, and correlations across time. They were all similar to the real data. In addition to that, I looked at characteristics that are present in specific markets, such as fat tails, volatility clustering, and the leverage effect. The synthetic data generated by GANs showed these as well. TimeGAN is a good model, but it lags a bit on these market-specific characteristics.
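For anyone who wants to run the same kind of check, here is a rough sketch of three of those diagnostics on a plain list of returns. The function names and the lag-1 choices are my own assumptions, not anything from TimeGAN, and the i.i.d. Gaussian series at the bottom is only a baseline that should show none of the effects.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def excess_kurtosis(r):
    # Fat tails: clearly positive for heavy-tailed returns, about 0 for Gaussian.
    m = mean(r)
    m2 = mean([(x - m) ** 2 for x in r])
    m4 = mean([(x - m) ** 4 for x in r])
    return m4 / m2 ** 2 - 3.0

def vol_clustering(r, lag=1):
    # Volatility clustering: autocorrelation of squared returns.
    sq = [x * x for x in r]
    return corr(sq[:-lag], sq[lag:])

def leverage_effect(r, lag=1):
    # Leverage effect: correlation of today's return with a later squared
    # return; typically negative in equity data.
    sq = [x * x for x in r]
    return corr(r[:-lag], sq[lag:])

# Baseline: i.i.d. Gaussian returns, where all three numbers should be near 0.
rng = random.Random(0)
baseline = [rng.gauss(0.0, 0.01) for _ in range(20000)]
print(excess_kurtosis(baseline), vol_clustering(baseline), leverage_effect(baseline))
```

Running the same three functions on the real series and on the synthetic one, then comparing the numbers side by side, is one simple way to do the comparison.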
 
In real market data, the price variance (vol) is always changing. In basic statistics, a random variable is typically generated from a probability density function via the probability integral transform (inverse transform sampling). The problem with this approach to data generation is that, in these simple models, the variance is a fixed constant.

If you allow the density function to use functional or stochastic variances, things very quickly become too complex for basic statistics to deal with. Even simple and natural assumptions about how real market price variance evolves can have wildly complex outcomes in terms of the data being generated.

You can experiment with it in Excel. You simply put a uniform variable into the inverse cumulative distribution function of a given pdf, and then let the variance be any combination of things. The statistics become unwieldy, but the data looks much more like market data.
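The same experiment can be sketched in Python instead of a spreadsheet. This is only an illustration: the normal inverse CDF, the AR(1) process on log-variance, and every parameter value (1% base vol, 0.95 persistence, 0.3 shock size) are assumptions I picked for the demo, not a recommended model.

```python
import math
import random
from statistics import NormalDist

def sample_returns(n, stochastic_vol, seed=0):
    rng = random.Random(seed)
    std_normal = NormalDist()            # standard normal, used for its inverse CDF
    mean_log_var = math.log(0.01 ** 2)   # assumed long-run vol of 1% per step
    log_var = mean_log_var
    out = []
    for _ in range(n):
        if stochastic_vol:
            # Hypothetical variance dynamics: AR(1) on log-variance.
            log_var = (mean_log_var
                       + 0.95 * (log_var - mean_log_var)
                       + 0.3 * rng.gauss(0.0, 1.0))
        sigma = math.exp(0.5 * log_var)
        # Inverse transform sampling: uniform draw -> inverse CDF -> scale by vol.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        out.append(sigma * std_normal.inv_cdf(u))
    return out

def excess_kurtosis(r):
    m = sum(r) / len(r)
    m2 = sum((x - m) ** 2 for x in r) / len(r)
    m4 = sum((x - m) ** 4 for x in r) / len(r)
    return m4 / m2 ** 2 - 3.0

constant_vol = sample_returns(20000, stochastic_vol=False)
changing_vol = sample_returns(20000, stochastic_vol=True)
print("excess kurtosis, constant variance:  ", excess_kurtosis(constant_vol))
print("excess kurtosis, stochastic variance:", excess_kurtosis(changing_vol))
```

With constant variance the output is just Gaussian noise. Letting the variance move is what produces fatter tails, even though each individual draw still goes through the same normal inverse CDF.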

https://en.wikipedia.org/wiki/Doubly_stochastic_model
https://en.wikipedia.org/wiki/Compound_probability_distribution

Have you ever looked at GARCH? It is really good at predicting future volatility/variance, and the packages in R and Python are really easy to use.
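To show what the model does under the hood, here is a minimal GARCH(1,1) sketch in plain Python. The parameter values (omega, alpha, beta) are made up for illustration and not fitted to anything. In practice one of the packages would estimate them from data.

```python
import math
import random

def simulate_garch11(n, omega=1e-6, alpha=0.10, beta=0.85, seed=0):
    # GARCH(1,1): var_t = omega + alpha * r_{t-1}^2 + beta * var_{t-1}
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(var)
        var = omega + alpha * r * r + beta * var
    return returns, variances

def forecast_next_variance(last_return, last_variance,
                           omega=1e-6, alpha=0.10, beta=0.85):
    # One-step-ahead conditional variance implied by the same recursion.
    return omega + alpha * last_return ** 2 + beta * last_variance

returns, variances = simulate_garch11(1000)
next_var = forecast_next_variance(returns[-1], variances[-1])
print("next-step vol forecast:", math.sqrt(next_var))
```

The forecast only needs the last return and the last conditional variance, which is also why the simulated series shows volatility clustering: a large move raises the next step's variance.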

https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity
 