Getting 'sudo: unknown user: hadoop' and 'sudo: unable to initialize policy plugin' errors on Google Cloud Platform while running a Hadoop cluster

user2602096 · Nov 4, 2014

I am trying to deploy the sample Hadoop app provided by Google at https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop on Google Cloud Platform.

I followed all the setup instructions given there step by step. I was able to set up the environment and start the cluster successfully, but I am not able to run the MapReduce part. I am executing this command in my terminal:

./compute_cluster_for_hadoop.py mapreduce <project ID> <bucket name> [--prefix <prefix>] \
    --input gs://<input directory on Google Cloud Storage> \
    --output gs://<output directory on Google Cloud Storage> \
    --mapper sample/shortest-to-longest-mapper.pl \
    --reducer sample/shortest-to-longest-reducer.pl \
    --mapper-count 5 \
    --reducer-count 1
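If you drive the tool from a script, one option is to assemble the same invocation as an argument list, so that each flag is its own element and nothing depends on trailing backslashes. This is a minimal sketch: `build_mapreduce_argv` is a hypothetical helper (not part of the sample app), the flag names are copied from the command above, and the values in the usage block are placeholders.

```python
import shlex
from typing import List, Optional


def build_mapreduce_argv(project_id: str, bucket: str, input_dir: str,
                         output_dir: str, mapper: str, reducer: str,
                         mapper_count: int, reducer_count: int,
                         prefix: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the 'mapreduce' command as an argv list.

    Flag names mirror the command shown in the question; the function
    itself is an illustration, not part of compute_cluster_for_hadoop.py.
    """
    argv = ["./compute_cluster_for_hadoop.py", "mapreduce", project_id, bucket]
    if prefix is not None:
        # --prefix is optional in the original command.
        argv += ["--prefix", prefix]
    argv += [
        "--input", input_dir,
        "--output", output_dir,
        "--mapper", mapper,
        "--reducer", reducer,
        "--mapper-count", str(mapper_count),
        "--reducer-count", str(reducer_count),
    ]
    return argv


if __name__ == "__main__":
    # Placeholder values (hypothetical); print the equivalent shell command.
    argv = build_mapreduce_argv("my-project", "my-bucket",
                                "gs://my-bucket/input", "gs://my-bucket/output",
                                "sample/shortest-to-longest-mapper.pl",
                                "sample/shortest-to-longest-reducer.pl", 5, 1)
    print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv)` directly, which avoids shell quoting and line-continuation issues altogether.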

And I am getting the following error:

sudo: unknown user: hadoop
sudo: unable to initialize policy plugin
Traceback (most recent call last):
  File "./compute_cluster_for_hadoop.py", line 230, in <module>
    main()
  File "./compute_cluster_for_hadoop.py", line 226, in main
    ComputeClusterForHadoop().ParseArgumentsAndExecute(sys.argv[1:])
  File "./compute_cluster_for_hadoop.py", line 222, in ParseArgumentsAndExecute
    params.handler(params)
  File "./compute_cluster_for_hadoop.py", line 51, in MapReduce
    gce_cluster.GceCluster(flags).StartMapReduce()
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 545, in StartMapReduce
    input_dir, output_dir)
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 462, in _StartScriptAtMaster
    raise RemoteExecutionError('Remote execution error')
gce_cluster.RemoteExecutionError: Remote execution error

Since I followed all the steps exactly as given, I don't understand why this issue arises.

Was the 'hadoop' user never created by the setup scripts that ran earlier, or is there a problem with user permissions? Or is the problem somewhere else?

Any help with this error would be appreciated. I am stuck here and can't proceed further.

Answer

Dennis Huo · Nov 4, 2014

The setup process is normally expected to create the user 'hadoop' automatically; this is done in startup-script.sh on lines 75-76:

# Set up user and group
groupadd --gid 5555 hadoop
useradd --uid 1111 --gid hadoop --shell /bin/bash -m hadoop
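Because `sudo -u hadoop` fails with "unknown user" when the account cannot be looked up, one way to check whether those two commands took effect is to inspect the account on the master instance. A minimal sketch using only Python's standard `pwd` and `grp` modules: `check_account` is a hypothetical helper, and its defaults simply mirror the name `hadoop`, uid 1111 and gid 5555 from the lines above.

```python
import grp
import pwd
from typing import List


def check_account(user: str = "hadoop", uid: int = 1111,
                  group: str = "hadoop", gid: int = 5555) -> List[str]:
    """Hypothetical diagnostic: list problems with a user/group pair.

    An empty list means the account matches what the groupadd/useradd
    lines are supposed to create.
    """
    problems: List[str] = []
    try:
        g = grp.getgrnam(group)
    except KeyError:
        problems.append(f"group {group!r} does not exist")
    else:
        if g.gr_gid != gid:
            problems.append(f"group {group!r} has gid {g.gr_gid}, expected {gid}")
    try:
        p = pwd.getpwnam(user)
    except KeyError:
        # A failed lookup like this is what sudo reports as "unknown user".
        problems.append(f"user {user!r} does not exist")
        return problems
    if p.pw_uid != uid:
        problems.append(f"user {user!r} has uid {p.pw_uid}, expected {uid}")
    if p.pw_gid != gid:
        problems.append(f"user {user!r} has primary gid {p.pw_gid}, expected {gid}")
    return problems


if __name__ == "__main__":
    issues = check_account()
    print("\n".join(issues) if issues else "hadoop account looks as expected")
```

If this reports that the user does not exist on the master, the startup script most likely did not run to completion there.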

It's possible that some portion of the setup actually failed, so the user was never created on the master.
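The traceback ends in `_StartScriptAtMaster` raising `RemoteExecutionError`, right after sudo's messages. A hedged sketch of that general pattern, assuming the tool runs a command on the master and raises when the command exits with a nonzero status: `run_on_master` is hypothetical, runs the command locally instead of over SSH, and is not the sample's actual implementation.

```python
import subprocess
import sys
from typing import List


class RemoteExecutionError(Exception):
    """Raised when a command run on the (here: simulated) master fails."""


def run_on_master(argv: List[str]) -> str:
    """Hypothetical stand-in for a remote-execution step.

    Runs argv locally, forwards its stderr (where a message such as
    "sudo: unknown user: hadoop" would appear), and converts a nonzero
    exit status into RemoteExecutionError.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.stderr:
        sys.stderr.write(result.stderr)
    if result.returncode != 0:
        raise RemoteExecutionError("Remote execution error")
    return result.stdout


if __name__ == "__main__":
    # Simulate sudo failing for a missing user: write to stderr, exit 1.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('sudo: unknown user: hadoop\\n'); sys.exit(1)"]
    try:
        run_on_master(failing)
    except RemoteExecutionError as exc:
        print(f"caught: {exc}")
```

In this pattern the exception itself carries no detail; the useful clue is the stderr output printed just before it, which here points at the missing account.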

That said, the sample you're referencing is deprecated as a way to deploy Hadoop on Google Compute Engine, although it is still a useful starting point if you're writing your own Python application that interacts with the GCE API directly. If you actually want to use Hadoop, you should use the Google-supported deployment tool bdutil and its associated quickstart. The cluster it deploys is similar in some respects, including the creation of a 'hadoop' user. A key difference, however, is that bdutil also includes and configures the GCS connector for Hadoop, so your MapReduce jobs can operate directly on data in GCS rather than having to copy it into HDFS first.