I use correlations in two ways:
- to calculate the IDM on an expanding window
- in a bootstrap framework to calculate instrument weights
For the former I'm using an exponentially weighted MA correlation estimate with a lookback of maybe a few years and, let's say, a minimum period of 250 days. So we get a situation where an instrument has returns but isn't yet factored into the IDM. This means the IDM is probably a little bit too low. Heck, I can live with that for 250 days.
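To make that concrete, here is a rough sketch in pandas, not the pysystemtrade code. The function names, the `span=750` lookback, the equal weights and the choice to renormalise the weights over instruments that already have an estimate are all my assumptions for illustration. The IDM formula is the usual 1 / sqrt(w'Hw), where H is the correlation matrix.

```python
import numpy as np
import pandas as pd


def latest_ewma_corr(returns, span=750, min_periods=250):
    """Most recent exponentially weighted correlation matrix.

    Any pair with fewer than min_periods joint observations comes back as NaN.
    span=750 is an assumed stand-in for "a few years" of daily data.
    """
    all_corr = returns.ewm(span=span, min_periods=min_periods).corr()
    return all_corr.loc[returns.index[-1]]


def idm_from_corr(corr, weights):
    """IDM = 1 / sqrt(w' H w), using only instruments with a valid estimate.

    Assumption: instruments without an estimate are dropped and the
    remaining weights are renormalised to sum to one.
    """
    valid = corr.index[np.isfinite(np.diag(corr.to_numpy()))]
    w = (weights[valid] / weights[valid].sum()).to_numpy()
    h = corr.loc[valid, valid].to_numpy()
    return 1.0 / np.sqrt(w @ h @ w)


# Made-up data: three instruments sharing a common factor, 600 business days.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2015-01-01", periods=600)
common = rng.normal(0.0, 0.01, size=600)
returns = pd.DataFrame(
    {name: common + rng.normal(0.0, 0.01, size=600) for name in ["A", "B", "C"]},
    index=dates,
)
# Instrument C is new: it only has the last 100 days of returns.
returns.iloc[:500, returns.columns.get_loc("C")] = np.nan

corr = latest_ewma_corr(returns)
weights = pd.Series(1.0 / 3.0, index=returns.columns)
idm = idm_from_corr(corr, weights)
print(corr.round(2))
print(round(idm, 3))
```

In this toy example C has returns but fewer than 250 of them, so its row and column are NaN and the IDM is computed from A and B only.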
If you can't live with a slightly low IDM, then you can use a little trick. You backfill the returns of the missing assets with the average return of that asset class plus some Gaussian noise. The standard deviation of the noise should be calibrated so that the correlation of the new asset with everything else is sensible. This means that new instruments will tend towards having a similar weight to the rest of their asset class.
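A minimal sketch of the backfill trick, under some assumptions of mine: the function name and arguments are made up, and I calibrate the noise against a target correlation with the asset class average rather than with every other instrument. If the class average has standard deviation s and the noise is independent with standard deviation n, the correlation between average-plus-noise and the average is s / sqrt(s² + n²), so n = s · sqrt(1/ρ² − 1) gives a correlation of ρ.

```python
import numpy as np
import pandas as pd


def backfill_with_class_average(returns, new_col, class_cols, target_corr=0.8, seed=0):
    """Fill missing returns of new_col with the class average plus Gaussian noise.

    target_corr is the assumed correlation between the synthetic series and
    the asset class average; real returns are kept wherever they exist.
    """
    if not 0.0 < target_corr <= 1.0:
        raise ValueError("target_corr must be in (0, 1]")
    rng = np.random.default_rng(seed)
    class_avg = returns[class_cols].mean(axis=1)
    noise_std = class_avg.std() * np.sqrt(1.0 / target_corr**2 - 1.0)
    synthetic = class_avg + rng.normal(0.0, noise_std, size=len(returns))
    filled = returns.copy()
    filled[new_col] = returns[new_col].fillna(synthetic)
    return filled


# Made-up data: two established bonds and a new one with 100 days of history.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2015-01-01", periods=500)
common = rng.normal(0.0, 0.01, size=500)
returns = pd.DataFrame(
    {name: common + rng.normal(0.0, 0.005, size=500) for name in ["BOND1", "BOND2", "NEW"]},
    index=dates,
)
returns.iloc[:400, returns.columns.get_loc("NEW")] = np.nan

filled = backfill_with_class_average(returns, "NEW", ["BOND1", "BOND2"], target_corr=0.8)
print(filled.corr().round(2))
```

Because the real returns overwrite the synthetic ones wherever they exist, the fake data matters less and less as history accumulates.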
Of course, if the new instrument is the first of its asset class, you've got a problem. You could . This means new assets will tend towards being equally weighted.
For the latter we might be in a situation where at the start of the year (I normally do weights annually) I don't have data for an instrument that is going to appear sometime during the year. What I do is mark up any instrument for which I definitely want a correlation, and then proceed with my bootstrapping. Let's say that for each sample I'm bootstrapping 250 days of data from the whole history to date.
I then measure the correlation of each bootstrap sample. There's a very good chance that the new instrument won't appear in a particular bootstrap sample (heck, there's a chance that even an instrument which has been around for donkey's years won't appear in a particular sample). If so, I use the average return plus noise trick (I precalculate a big data frame of these before I start, otherwise you'd slow the bootstrapping down to a crawl).
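Here is a rough sketch of the bootstrap step, again with assumptions flagged. The function name is hypothetical, I draw individual days with replacement (blocks of consecutive days would be another option), I use the backfill row by row wherever real data is missing, and I just average the correlation matrices so there is something to look at. The weight optimisation for each sample is left out.

```python
import numpy as np
import pandas as pd


def bootstrap_correlations(returns, backfill, n_samples=100, sample_len=250, seed=0):
    """Average correlation matrix over bootstrap samples of sample_len days.

    backfill is a precomputed frame of "class average plus noise" returns,
    built once up front so nothing expensive happens inside the loop.
    """
    rng = np.random.default_rng(seed)
    combined = returns.fillna(backfill).to_numpy()  # real data where it exists
    n_cols = returns.shape[1]
    total = np.zeros((n_cols, n_cols))
    for _ in range(n_samples):
        rows = rng.integers(0, len(combined), size=sample_len)  # days, with replacement
        total += np.corrcoef(combined[rows], rowvar=False)
    return pd.DataFrame(total / n_samples, index=returns.columns, columns=returns.columns)


# Made-up data: NEW only has the last 100 of 1000 days.
rng = np.random.default_rng(7)
dates = pd.bdate_range("2012-01-01", periods=1000)
common = rng.normal(0.0, 0.01, size=1000)
returns = pd.DataFrame(
    {name: common + rng.normal(0.0, 0.005, size=1000) for name in ["BOND1", "BOND2", "NEW"]},
    index=dates,
)
returns.iloc[:900, returns.columns.get_loc("NEW")] = np.nan

# Precalculated backfill frame: class average plus noise, one column per instrument.
class_avg = returns[["BOND1", "BOND2"]].mean(axis=1)
backfill = pd.DataFrame(
    {name: class_avg + rng.normal(0.0, 0.0075, size=1000) for name in returns.columns},
    index=dates,
)

avg_corr = bootstrap_correlations(returns, backfill)
print(avg_corr.round(2))
```

Most samples here will be dominated by synthetic rows for NEW, but that share shrinks each year as more real returns arrive.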
The nice thing about this method is that as you get more history and more information you start using the real data more and more.
If that sounds too complicated, then this is something that will be appearing in pysystemtrade in due course.
GAT
Hi,
I have been calculating a covariance matrix using 10 years of historical data for various instruments. I have noticed that the length of the price series varies considerably between instruments, meaning that one does not have time series of equal length from which to calculate correlations. How do you typically handle this? Do you cut every time series down to the length of the shortest one? Are there other ways of handling this?