My hadoop version is 1.0.2. Now I want at most 10 map tasks running at the same time. I have found 2 variable related to this question.
a) mapred.job.map.capacity
but in my hadoop version, this parameter seems abandoned.
b) mapred.jobtracker.taskScheduler.maxRunningTasksPerJob (http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.0.2/mapred-default.xml)
I set this variable like below:
Configuration conf = new Configuration();
conf.set("date", date);
conf.set("mapred.job.queue.name", "hadoop");
conf.set("mapred.jobtracker.taskScheduler.maxRunningTasksPerJob", "10");
DistributedCache.createSymlink(conf);
Job job = new Job(conf, "ConstructApkDownload_" + date);
...
The problem is that it doesn't work. There is still more than 50 maps running as the job starts.
After looking through the hadoop document, I can't find another to limit the concurrent running map tasks. Hope someone can help me ,Thanks.
=====================
I hava found the answer about this question, here share to others who may be interested.
Using the fair scheduler, with configuration parameter maxMaps to set the a pool's maximum concurrent task slots, in the Allocation File (fair-scheduler.xml). Then when you submit jobs, just set the job's queue to the according pool.
You can set the value of mapred.jobtracker.maxtasks.per.job
to something other than -1 (the default). This limits the number of simultaneous map or reduce tasks a job can employ.
This variable is described as:
The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.
I think there were plans to add mapred.max.maps.per.node
and mapred.max.reduces.per.node
to job configs, but they never made it to release.