best distributed file system for homebrew hedge funds

Quote from lolatency:

I'm trying to speed up data-mining. Right now, I have R-code that runs; however, R is not multi-threaded. So it runs really slow, or I have to manually cut the files and run multiple instances of R. Also, R is really bad with large files.

What I want to do is write the data to a network file system and then use something like Gearman (http://www.gearman.org/doku.php) to do the functions on the data from the file.

You can get Python and the IPython shell, which has parallel support, plus the rpy2 module, which will let you talk to R. Then you can run as many workers as you have resources for.
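A rough sketch of the worker idea, using only Python's standard library rather than IPython's parallel machinery. The names `split_into_chunks`, `analyze_chunk` and `run_parallel` are made up for illustration, and `analyze_chunk` is only a placeholder: it sums numbers where a real worker would presumably hand its chunk to R (for example through rpy2).

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split a list of rows into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def analyze_chunk(chunk):
    """Placeholder for the per-chunk work (hypothetical).

    This is where a worker would pass the chunk to R. Here it only
    sums the numbers so the sketch stays self-contained.
    """
    return sum(chunk)


def run_parallel(rows, n_workers=4):
    """Process the chunks in separate OS processes, results in chunk order."""
    chunks = split_into_chunks(rows, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(analyze_chunk, chunks))
```

Each chunk goes to its own process, so a single-threaded interpreter per worker is not a bottleneck. On Windows, call `run_parallel(...)` from under an `if __name__ == "__main__":` guard in a script, since worker processes re-import the main module there.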
 
Quote from lolatency:

I'm trying to speed up data-mining. Right now, I have R-code that runs; however, R is not multi-threaded. So it runs really slow, or I have to manually cut the files and run multiple instances of R. Also, R is really bad with large files.

What I want to do is write the data to a network file system and then use something like Gearman (http://www.gearman.org/doku.php) to do the functions on the data from the file.

Samba can do what I want, but what bugs me about Samba is that when I am on Windows, it tries to reconnect those drives, and if a box is down, I end up with some really terrible boot-ups. Maybe I can use Samba?

I looked at AFS before. AFS looks hard to maintain and looks complex. Is there a decent tutorial? I -love- programming, but I hate system administration.

Why don't you use MySQL and have R read directly from the database?

You can also run R from the command line on Linux, as many instances as you want, and just look at the output from Windows.
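If you would rather not start each instance by hand, something like the following could launch them for you. It is only a sketch: it assumes the `Rscript` command is on the PATH, that the R script takes its data file as a command-line argument, and the helper names `build_commands` and `run_all` are invented here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(script, data_files, interpreter="Rscript"):
    """One command line per data file: <interpreter> <script> <data_file>."""
    return [[interpreter, script, path] for path in data_files]


def run_all(commands, max_parallel=4):
    """Run the commands as separate OS processes, at most max_parallel at once.

    Threads are only used to wait on the child processes; the real work
    happens in the children. Returns (exit_code, stdout) per command,
    in the same order as the input.
    """
    def run_one(cmd):
        done = subprocess.run(cmd, capture_output=True, text=True)
        return done.returncode, done.stdout

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

For example, `run_all(build_commands("analysis.R", ["part_1.csv", "part_2.csv"]))` would start one R process per file, where `analysis.R` and the `part_*.csv` names are hypothetical.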
 