Pig keeps trying to connect to job history server (and fails)

Question 1

Pig keeps trying to connect to job history server (and fails)

hadoop apache-pig

badroit · Apr 22, 2015 · Viewed 7.2k times · Source

Answer

Answer

It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.

The solution was to add the hostname of the server into the configuration in mapred-site.xml to make sure it could be accesses from the other machines. (In my version of the file, the lines had to be added as "new" ... there were no previous settings.)

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cm:10020</value>
  <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>

Then restart the job history server:

mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver

If you get a bind exception (port in use), it means the stop didn't work. Either

Use ps ax | grep -e JobHistory to get the process and kill it manually with kill -9 [pid]. Then call the start command above again. Or
Use a different port in the configuration

Pig should pick up the new settings automatically. Run a Pig script and hope for the best.

Question 2

I'm running a Pig job that fails to connect to the Hadoop job history server.

The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:

2015-04-21 19:05:22,825 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

It then continues for a while retrying the connection. Sometimes it precedes further with the job. Othertimes it throws this exception:

2015-04-21 19:05:55,822 [main] WARN  org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from  cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
    at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)

I found this question here but in my case the job history server is started. If I run netstat, I find:

tcp        0      0 0.0.0.0:10020           0.0.0.0:*               LISTEN      12073/java       off (0.00/0/0)

Where 12073 is ...

12073 pts/4    Sl     0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer

I tried opening the port 10200 in case it was a firewall issue:

ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:10020

... but no luck.

After a few minutes, some of the tasks just arbitrarily continue to the next part.

I'm using Hadoop 2.3 and Pig 0.14.

My question is:

1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?

... or failing that ...

2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?

Pig keeps trying to connect to job history server (and fails)

Answer

Related questions