I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but I am unable to specify which user the job should be submitted as. I would like to configure a single system user to be used for all MapReduce jobs.
Currently I am unable to specify any user; no matter what, the Hadoop job runs under the username of the user currently logged in on the client system. This causes an error with the message:
Permission denied: user=alice, access=WRITE, inode="staging":hduser:supergroup:rwxr-xr-x
... where "alice" is the local, logged in user on the client machine.
I have tried

1. `UserGroupInformation` instances (both proxies and normal users),
2. `-Duser.name=hduser`,
3. changing the `USER` environment variable, and
4. a hard-coded `System.setProperty("user.name", "hduser")` call

... all to no avail. Regarding 1), I admit to having no clue how these classes are supposed to be used (see the sketch below for roughly what I tried). Also, please note that changing the Java system property is obviously not a real solution for use in the web application.
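For reference, this is roughly what I attempted for 1). Treat it as a sketch of my (possibly wrong) understanding of the API rather than a working recipe, since it did not change the submitting user for me; the job name and configuration details are placeholders:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiAttempt {

    public void submitAsHduser(final Configuration conf) throws Exception {
        // Build a UGI for "hduser" without credentials (simple auth, no Kerberos).
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");

        // Submit the job inside doAs() so that Hadoop should pick up that identity.
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            @Override
            public Void run() throws Exception {
                Job job = Job.getInstance(conf, "example-job");
                // ... set mapper/reducer, input/output paths here ...
                job.submit();
                return null;
            }
        });
    }
}
```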
Does anybody know how to specify which user Hadoop uses to connect to a remote system?
PS: Hadoop is using the default configuration, meaning that no authentication is used when connecting to the cluster and that Kerberos is not used in communicating with the remote machines.
Finally I stumbled on the constant `static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";` in the `UserGroupInformation` class.
Setting this either as an environment variable, as a Java system property on startup (using `-D`), or programmatically with `System.setProperty("HADOOP_USER_NAME", "hduser");` makes Hadoop use whatever username you want for connecting to the remote Hadoop cluster.
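In my case the programmatic route looks roughly like this (a sketch; the cluster address and job name are placeholders). The key point is to set the property before the first call that touches `UserGroupInformation`, because Hadoop resolves the login user once and caches it:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteJobSubmitter {

    public void submit() throws Exception {
        // Set the identity before any other Hadoop call; the login user is
        // resolved once and then cached by UserGroupInformation.
        System.setProperty("HADOOP_USER_NAME", "hduser");

        Configuration conf = new Configuration();
        // Placeholder address of the remote cluster's namenode.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");

        Job job = Job.getInstance(conf, "submitted-as-hduser");
        // ... configure mapper, reducer, and input/output paths ...
        job.submit();
    }
}
```

Alternatively, exporting `HADOOP_USER_NAME=hduser` in the environment of the web application's JVM, or passing `-DHADOOP_USER_NAME=hduser` at startup, achieves the same thing without touching application code.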