I'm trying to speed up some data mining. Right now I have R code that runs, but R is not multi-threaded, so it either runs really slowly or I have to manually split the files and run multiple instances of R. R also handles large files badly.
What I want to do is write the data to a network file system and then use something like Gearman (http://www.gearman.org/doku.php) to run the functions on the data from the file.
You can get Python and the IPython shell, which has parallel support, plus the rpy2 module, which lets you talk to R from Python. Then you can run as many workers as you have resources for.
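To give a rough idea of the "many workers" pattern, here is a minimal sketch. It uses the standard library's multiprocessing module rather than IPython's parallel machinery, just to keep it self-contained. The names read_chunks, process_chunk and run_parallel are made up for illustration, and the input is assumed to be a text file with one number per line. process_chunk is only a stand-in: with rpy2 installed, that is where each worker would hand its chunk to your R code.

```python
import multiprocessing as mp
import sys
from itertools import islice


def read_chunks(path, lines_per_chunk):
    """Yield lists of lines so the whole file never sits in memory at once."""
    with open(path) as handle:
        while True:
            chunk = list(islice(handle, lines_per_chunk))
            if not chunk:
                break
            yield chunk


def process_chunk(lines):
    """Hypothetical stand-in for the real analysis.

    Assumption: one number per line. A real worker would pass the chunk
    to R (for example through rpy2) instead of summing it here.
    """
    return sum(float(line) for line in lines if line.strip())


def run_parallel(path, lines_per_chunk=1000, workers=4):
    """Fan the chunks out to a pool of worker processes and combine results."""
    with mp.Pool(processes=workers) as pool:
        partials = pool.imap(process_chunk, read_chunks(path, lines_per_chunk))
        return sum(partials)


if __name__ == "__main__":
    # The guard matters on Windows, where worker processes re-import this file.
    if len(sys.argv) > 1:
        print(run_parallel(sys.argv[1]))
```

Saved as a script, it would be run with the data file path as its only argument. This replaces the manual file cutting: the file is read in pieces, each piece goes to whichever worker is free, and the partial results are combined at the end. Whether your analysis can be combined from per-chunk results like this depends on what the R functions compute.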
Samba can do what I want, but what bugs me about Samba is that when I'm on Windows, it tries to reconnect those drives, and if a box is down I end up with really terrible boot times. Maybe I can still use Samba?
I looked at AFS before, and it looks complex and hard to maintain. Is there a decent tutorial? I -love- programming, but I hate system administration.