Help with Correlation Analysis ->

Rodney King · Jan 25, 2012

Quote from rosy2:

looks complicated compared to my approach which took about 3 minutes.

Exactly right. Excel is a spreadsheet. It's intended for accountants who need to add columns of numbers. Many push Excel beyond its comfort zone and winding up with an undebuggable, unverifiable mess, then have to backtrack and start over in another programming environment. Re your pandas/py solution, the same thing is doable in a few lines of R also.

rosy2 · Jan 25, 2012

Quote from Rodney King:

Exactly right. Excel is a spreadsheet. It's intended for accountants who need to add columns of numbers. Many push Excel beyond its comfort zone and winding up with an undebuggable, unverifiable mess, then have to backtrack and start over in another programming environment. Re your pandas/py solution, the same thing is doable in a few lines of R also.

library(quantmod)
getSymbols("YHOO",src="google")
getSymbols("GOOG",src="google")
getSymbols("MSFT",src="google")
df = data.frame( dailyReturn(YHOO), dailyReturn(GOOG), dailyReturn(MSFT) )
cor(df)

took less than 10 minutes but for me over 3. I am not quite as fast with R

Kevin Schmit · Jan 29, 2012

Quote from rosy2:

looks complicated compared to my approach which took about 3 minutes.

The mere correlation matrix, leaving out the internal mechanics, can be got in Excel with a couple fo clicks, assuming the user has turned on the Analysis Toolpack. From the "Data" ribbon, click "Data Analysis" on the far right of the ribbon. In the pop-up dialog choose "Correlation" and click the OK button. Click in the "Input Range:" box and then highlight your data columns. Then click OK. The correlation matrix will appear in a new sheet.

Less than 3 minutes and the OP would not have to install and learn Python.

My previous complicated Excel reply illustrates the internal mechanics of calculating the correlation and conditional correlation matrices, something your Python script does not do.

Kevin Schmit · Jan 29, 2012

Quote from Rodney King:

Excel is a spreadsheet. ...an undebuggable, unverifiable mess, ...same thing is doable in a few lines of R.

The OP seems barely literate in Excel. Trying to learn R would be a complete waste of time for him and an excercise in frustration.

According to this post from Xignite, OP has a lot of company:

http://www.hftreview.com/pg/blog/xi...nology-challenges-all-blown-away-by-the-cloud

Read down to Tech Challenge #5, where it is claimed that "Letâs face it most of the hedge fund industryâs important activities such as modeling and reporting are still done in Microsoft Excel."

Soon2Bgreat · Jan 29, 2012

zedDoubleNaught · Jan 30, 2012

Quote from rosy2:

library(quantmod)
getSymbols("YHOO",src="google")
getSymbols("GOOG",src="google")
getSymbols("MSFT",src="google")
df = data.frame( dailyReturn(YHOO), dailyReturn(GOOG), dailyReturn(MSFT) )
cor(df)

took less than 10 minutes but for me over 3. I am not quite as fast with R

I really like Python now. In just a few hours over the last few days I've gone from nothing to some lines of code that can get a value from a specific cell in a list of old spreadsheet files I made last year, then make a Series and get some stats on them, so they're no longer dead data. Before I didn't look at them, the spreadsheets looked like walls of numbers, and too cumbersome to open them all. Very fast and easy to get something working with Python, thanks again for pointing me in the right direction.

It's probably doable in R too, I haven't looked into R though.

Rationalize · Feb 1, 2012

Quote from rosy2:

library(quantmod)
getSymbols("YHOO",src="google")
getSymbols("GOOG",src="google")
getSymbols("MSFT",src="google")
df = data.frame( dailyReturn(YHOO), dailyReturn(GOOG), dailyReturn(MSFT) )
cor(df)

took less than 10 minutes but for me over 3. I am not quite as fast with R

Wow .. that was quick.

daily.returns daily.returns.1 daily.returns.2
daily.returns 1.0000000 0.3759001 0.3630186
daily.returns.1 0.3759001 1.0000000 0.5659315
daily.returns.2 0.3630186 0.5659315 1.0000000

I'm guessing the OP wants to long/short a combination of something and something else around an estimated mean.

How about a 3 minute particle filter?

Rodney King · Feb 1, 2012

Quote from Rationalize:

Wow .. that was quick.

daily.returns daily.returns.1 daily.returns.2
daily.returns 1.0000000 0.3759001 0.3630186
daily.returns.1 0.3759001 1.0000000 0.5659315
daily.returns.2 0.3630186 0.5659315 1.0000000

If you want to beautify a bit, you could replace the last line of the script with

> names(df)<- c("YHOO","GOOG","MSFT"); round(cor(df),2)
<pre> YHOO GOOG MSFT
YHOO 1.00 0.38 0.36
GOOG 0.38 1.00 0.57
MSFT 0.36 0.57 1.00</pre>

maninjapan · Apr 8, 2013

I know this is an old thread, but hope some of the contributors are still around. I am looking at creating an indicator based on the correlation of a moving period between 2 futures contracts (same way a moving average works). This seems to be easily achievable in excel using correl function but I have a couple of questions regarding this.

1. will this provide relatively accurate calculations. I have read that excel correl function is not provide accurate results ( hence recommendations to use R squared or similar for more accurate results)

2. Am I better off using close to close as the comparison basis or daily returns? from memory I believe using the daily returns is the best way to compare futures contracts. (not that my memory is anything to go by...)

Thanks in advance

fullautotrading · Apr 13, 2013

May use the SCX index:

http://www.datatime.eu/public/gbot/MetricsForAlgorithmicTrading.htm#SCX

It captures any form dependence. Not only the "linear" component (as r does).
(And it is *much* more efficient computationally speaking)