Set hadoop system user for client embedded in Java webapp

Christoffer Soop · Jun 14, 2012

I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but I am unable to specify which user the job should be submitted as. I would like to configure a single system user to be used for all MapReduce jobs.

Currently I am unable to specify any user: no matter what I do, the Hadoop job runs under the username of the user currently logged in on the client system. This causes an error with the message

Permission denied: user=alice, access=WRITE, inode="staging":hduser:supergroup:rwxr-xr-x

... where "alice" is the local, logged in user on the client machine.

I have tried

  1. various combinations of creating UserGroupInformation instances (both proxy and normal users) and
  2. setting the Java system property with -Duser.name=hduser, changing the USER environment variable, and a hard-coded System.setProperty("user.name", "hduser") call.

... to no avail. Regarding 1), I admit to having no clue how these classes are supposed to be used. Also, please note that changing the Java system property is obviously not a real solution for use in the web application.

Does anybody know how to specify which user Hadoop uses to connect to a remote system?

PS: Hadoop is using the default configuration, meaning that no authentication is used when connecting to the cluster and that Kerberos is not used in communicating with the remote machines.

Answer

Christoffer Soop · Jun 16, 2012

Finally I stumbled on the constant

static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

in the UserGroupInformation class.

Setting this either as an environment variable, as a Java system property on startup (using -D), or programmatically with System.setProperty("HADOOP_USER_NAME", "hduser"); makes Hadoop use whatever username you want when connecting to the remote Hadoop cluster.
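A minimal sketch of the programmatic variant, for illustration only. The class name HadoopUserConfig and the method useHadoopUser are hypothetical helpers, not part of Hadoop; only the property key "HADOOP_USER_NAME" comes from the constant above. The sketch assumes the default (non-Kerberos) configuration described in the question, and it assumes the property has to be set before the web application first touches any Hadoop client class, since the login user appears to be resolved once and then reused.

```java
// Hypothetical helper, not part of Hadoop: sets the system property that
// UserGroupInformation consults under the default (non-Kerberos) setup.
final class HadoopUserConfig {

    // Same key as the HADOOP_USER_NAME constant in UserGroupInformation.
    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

    private HadoopUserConfig() {
        // static helper only, no instances
    }

    /**
     * Sets the user name Hadoop should connect as.
     * Assumption: this runs before the first Hadoop client call in the JVM.
     * Returns the previous value of the property, or null if it was unset.
     */
    static String useHadoopUser(String user) {
        if (user == null || user.trim().isEmpty()) {
            throw new IllegalArgumentException("Hadoop user name must not be empty");
        }
        return System.setProperty(HADOOP_USER_NAME, user);
    }

    public static void main(String[] args) {
        useHadoopUser("hduser");
        // Jobs submitted from this JVM after this point should run as "hduser".
        System.out.println(System.getProperty(HADOOP_USER_NAME));
    }
}
```

In a web application, one plausible place to call useHadoopUser("hduser") is application startup code (for example a context listener), so that it runs once before any job is submitted. Note that a system property is JVM-wide, so every Hadoop client in the same JVM would share this user.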