I have written my own hadoop program and I can run using pseudo distribute mode in my own laptop, however, when I put the program in the cluster which can run example jar of hadoop, it by default launches the local job though I indicate the hdfs file path, below is the output, give suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen as your configuration most probably has the mapred.job.tracker
property set to local
or has not been set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not, a file called mapped-site.xml.template was there). In that file (or create it if it doesn't exist) make sure it has the following property:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop
and trace any symlinks you come across.
Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1