I'm working on code that runs on the Epiphany processor (http://www.parallella.org/), and the host-side program needs sudo privileges to run Epiphany code. There is no way around sudo.
Now I need to run this code across several nodes. I'm using MPI for that, but MPI does not work properly under sudo:
#sudo mpirun -n 12 --hostfile hosts -x LD_LIBRARY_PATH=${ELIBS} -x EPIPHANY_HDF=${EHDF} ./hello-mpi.elf
Even a simple program that only does node communication fails. Every rank comes back as 0 if I use sudo. Communication between threads works, but not across nodes. This matters because I want to divide the workload properly across the cards.
Here is the simple code:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);   /* total number of ranks in the job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* rank of this process */
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Hello World from MPI Process %d on machine %s\n", rank, processor_name);
    MPI_Finalize();
    return 0;
}
This code should print a different rank on each node, but it does not when run with sudo.
Any help on this would be great
Here is the output from running the above code without sudo.
mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest
output:
Hello world from processor work1, rank 1 out of 3 processors
Hello world from processor command, rank 0 out of 3 processors
Hello world from processor work2, rank 2 out of 3 processors
This is as expected.
Here is the output from running the above code with sudo.
sudo mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest
output:
Hello world from processor command, rank 0 out of 1 processors
Hello world from processor work1, rank 0 out of 1 processors
Hello world from processor work2, rank 0 out of 1 processors
This is not.
Edit:
I think @Hristo Iliev has the right answer, but I'm not going to be able to test it.
Short answer: instead of sudo mpirun -n 12 ... ./hello-mpi.elf, the command should be:
mpirun -n 12 ... sudo -E ./hello-mpi.elf
For that to work properly, you have to modify the sudo configuration (via visudo) on all hosts and enable passwordless operation for your user. The entry has to name the program that sudo actually launches, which in the command above is the MPI executable rather than mpirun:
username ALL = NOPASSWD:SETENV: /path/to/hello-mpi.elf
This entry allows your user to run the executable through sudo without authenticating first, which is important since only the standard input of rank 0 is redirected. It also allows you to execute sudo with the -E option so that it passes the special Open MPI variables (OMPI_...) to the executable. Without those variables in the environment, the executables cannot connect to each other and instead run as singletons.
Long answer: Running mpirun with sudo results in the former being executed with effective user root. The way mpirun creates an MPI job is by first launching the requested number of executables and then waiting for them to get to know each other during the MPI_Init call. Depending on the content of the host list file, mpirun either spawns a child process (for host entries that match the host mpirun is executed on) or starts a process remotely using rsh, ssh or some other mechanism (e.g. many cluster resource management systems have their own mechanisms for that). When the rsh/ssh mechanism is used, since the program runs as root, mpirun attempts to log into the other host(s) as root. This usually fails for one or both of two reasons:
That's why you see rank 0 coming up (it's a local fork()-based spawn) and the other ranks missing. Since enabling remote root login is considered a security risk by many, I would rather go the way described in the short answer.
Another option would be to make hello-mpi.elf owned by root and set the Set UID bit via chmod u+s hello-mpi.elf. Then you won't need sudo at all. This will not work if the filesystem is mounted with the nosuid option or if some other security mechanism is active. Also, root-owned suid binaries pose security risks since they always execute with root permissions, no matter what user runs them.
I wonder why you need root permissions in order to talk to the Epiphany board. Is the SDK doing some fancy privileged operations, or is it simply accessing a device file in /dev that is only writable by root? If it's the latter, perhaps the device node could be created with different permissions.