Monday, October 25, 2010

I must post this in the midst of my elation before I run into another crippling error

I was composing an email to the R mailing list to ask for help on the issues I described below. Wanting to seem very serious and well-researched, I double-checked everything I was doing before sending it. In the process, I found the error. Maybe I should take that as an approach to problem solving in general: write out my problems as if I'm addressing the ultimate experts on the issue. That way, if I fail to figure it out, I have a nice message ready to send, and if I do figure it out, then I figure it out!

Here's the email:

I'm trying to set up a large cluster of computers at my university
lab. All clients are running XP SP3, and I'm working from an OS X
10.6.4 machine set up as the master; using R 2.11.1 on all machines.

I've succeeded at getting password-less SSH access to the client
machines using Cygwin, but I'm having issues with initializing sleighs
that seem to stem from the "DOS vs UNIX" path-style issues.
I've tried what seem to be two different correct approaches, and get
two different errors.

1. The unix-style way; accessing the contents of the C drive through
/cygdrive/c/...:

s=sleigh(
nodeList=c("lab211-18.mydomain.edu"),
launch=sshcmd,
user="authorizedUser",
scriptExec=envcmd,
scriptDir="'/cygwin/c/Program Files/R/R-2.11.1/library/nws/bin/'",
scriptName="RNWSSleighWorker.py",
verbose=TRUE)

2. The DOS-style way, where the changes are to the "scriptExec" and
"scriptDir" options:

s=sleigh(
nodeList=c("lab211-18.mydomain.edu"),
launch=sshcmd,
user="authorizedUser",
scriptExec=scriptcmd,
scriptDir="'c:\\Program Files\\R\\R-2.11.1\\library\\nws\\bin\\'",
scriptName="RNWSSleighWorker.py",
verbose=TRUE)

I've verified that the "RNWSSleighWorker.py" exists in the path
"C:\Program Files\R\R-2.11.1\library\nws\bin\" on the client
machine...

At this point I was going to paste in the exact errors I was seeing
and I realized I was missing a directory ("\R\") from all the paths
above, and that I was doing the unix style one with "cygwin" instead
of "cygdrive". Now, the "dos-style way" seems to be working completely
and totally.
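
For reference, here's a rough sketch of how the two path styles line up. `dos_to_cygdrive` is a made-up helper name (not part of Cygwin or the nws package). Cygwin's own `cygpath -u` is the proper tool for this conversion, so treat the function as an illustration only.

```shell
# Hypothetical helper: turn a DOS-style path such as C:\Program Files\R
# into the /cygdrive form that Cygwin exposes for Windows drives.
dos_to_cygdrive() {
    # Lower-case the drive letter, which is the first character of the path.
    drive=$(printf '%s\n' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Drop the leading "C:" and turn every backslash into a forward slash.
    rest=$(printf '%s\n' "$1" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

# The directory from the email, written in single quotes so the shell
# leaves the backslashes alone.
dos_to_cygdrive 'C:\Program Files\R\R-2.11.1\library\nws\bin'
# prints /cygdrive/c/Program Files/R/R-2.11.1/library/nws/bin
```

Note the prefix is `/cygdrive/c`, not `/cygwin/c`, which was one of the two mistakes in the email.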

So now I can supposedly do parallel computation. I'm almost afraid to take the next step, for fear of what horrors inevitably await!

unix R and windows not playing nice.


This is beginning to have a very quixotic feeling to it. One of the lab consultants who's been away for five weeks on an internship stopped by today, and as I was updating him on my progress on the project it struck me how little I've actually accomplished in five weeks. True, I've overcome some major barriers and discovered some interesting things, but ultimately I'm still not doing distributed computing.

So far, my accomplishments include figuring out passwordless SSH login with RSA encryption, and getting the client machines to respond to unix commands from the host. Before (when I was using COPSSH), I kept getting an error saying that the 'env' command couldn't be found on the remote machine. 'env' is a really basic unix command, but it seems that it wasn't included in the version of Cygwin that was packaged with COPSSH. I resolved the issue by just installing Cygwin directly. The first time through, I didn't realize how enormous the full Cygwin set of packages is and ended up installing 6 GB of mostly unknown material to a computer. I was gratified to see that SSH still works with just the minimal subset of Cygwin packages.

The problem I'm having now is even more frustrating, and lends credence to the recommendation from ReVolution not to mess with Windows clients. The issue is with the quoting of file paths, of course. Unix uses forward slashes and Windows uses backslashes, and in most programming languages the backslash is an escape character. Why oh why would Microsoft have chosen to ignore this reality? Unix and C were around and well established before they came on the scene with their DOS-style paths... could they have done it explicitly to make it difficult to operate between platforms (and thus increase switching costs)?


The crux of the issue is in the last line, right after the "\nws\bin\" bit. See that extra forward slash that follows? It seems that's inserted automatically by the stored function, regardless of what I enter as a parameter. Incidentally, this is the call I'm making:

s=sleigh(
nodeList=c("coblab211-18.business.uc.edu"),
launch=sshcmd,
user="murphytj1",
scriptExec=scriptcmd,
scriptDir="'c:\\Program Files\\R-2.11.1\\library\\nws\\bin\\'",
scriptName="RNWSSleighWorker.py",
verbose=TRUE)


The double backslashes are what's required to overcome the escape character issue. The "scriptDir" is supposed to be the path to the directory on the remote machine where the NWS client script is located. But despite the fact that I'm supposedly executing it with a Windows-style command (what the scriptExec option scriptcmd does), the function appends the forward slash. It is, I think, an error on the part of the writers of the NWS package. Now I'm trying to dig through their code to see if I can fix it; chances are slim.
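
As a small aside in shell rather than R, the same doubling rule and the mixed-separator ending can be reproduced in a few lines. This is only an illustration of the escaping idea under my own assumptions; it says nothing about what the nws package does internally.

```shell
# Inside double quotes a literal backslash has to be written as two,
# while single quotes keep every character exactly as typed.
doubled="c:\\Program Files\\R"
literal='c:\Program Files\R'

printf '%s\n' "$doubled"    # prints c:\Program Files\R
[ "$doubled" = "$literal" ] && echo "same string"

# Joining a directory that already ends in a backslash with "/" gives
# the mixed ending described above.
dir='\nws\bin\'
printf '%s\n' "$dir/RNWSSleighWorker.py"
# prints \nws\bin\/RNWSSleighWorker.py
```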

Equally grim news on the Snowfall front; I can't even get sfCluster running on the host machine. Many many dependencies are apparently missing; starting with the gcc compiler (which I fixed). Even more frustrating, when I call make install, I get another error that looks like it was an issue with the coding:

Ok. Now call 'make install'.
sh-3.2# make install
Making/probing required folders...
mkdir -pv /usr/local/bin
mkdir -pv /usr/local/etc/sfCluster
mkdir -pv /usr/local/lib/sfCluster
mkdir -pv /usr/local/share/sfCluster
mkdir -pv --mode=700 /usr/local/var/sfCluster/tmp
mkdir: illegal option -- -
usage: mkdir [-pv] [-m mode] directory ...
make: *** [install] Error 64
sh-3.2#
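
My reading of that error, offered as a guess rather than a verified fix: `--mode=700` is the GNU long form of the option, while the usage message printed by the OS X mkdir only lists the short `-m mode` form. The sketch below tries the short form on a scratch directory. The path is invented for the demo and is not the real install location.

```shell
# Scratch location for illustration only, not /usr/local/var/sfCluster/tmp.
target="${TMPDIR:-/tmp}/sfcluster-demo.$$/var/sfCluster/tmp"

# Short option form: -p creates missing parents, -m sets the mode of the
# final directory (the parents get default permissions).
mkdir -p -m 700 "$target"

# Show the resulting permissions; the listing should start with drwx------
ls -ld "$target"
```

If that holds, changing `--mode=700` to `-m 700` in the install step would presumably get past this point, though that is untested here.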

What I should probably do is just drop both the MacBook and the Windows machines and try it on my Ubuntu laptop.