How to restrict the number of concurrently running map tasks?

HaiWang · Jan 17, 2013 · Viewed 15.4k times

My Hadoop version is 1.0.2. I want at most 10 map tasks running at the same time. I have found two variables related to this question.

a) mapred.job.map.capacity

but in my Hadoop version, this parameter seems to have been abandoned.

b) mapred.jobtracker.taskScheduler.maxRunningTasksPerJob (http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.0.2/mapred-default.xml)

I set this variable as follows:

Configuration conf = new Configuration();
conf.set("date", date);
conf.set("mapred.job.queue.name", "hadoop");
conf.set("mapred.jobtracker.taskScheduler.maxRunningTasksPerJob", "10");

DistributedCache.createSymlink(conf);
Job job = new Job(conf, "ConstructApkDownload_" + date);
...

The problem is that it doesn't work. There are still more than 50 map tasks running when the job starts.

After looking through the Hadoop documentation, I can't find another way to limit the number of concurrently running map tasks. I hope someone can help. Thanks.

=====================

I have found the answer to this question and am sharing it here for others who may be interested.

Use the Fair Scheduler and set the maxMaps parameter in the allocation file (fair-scheduler.xml) to cap a pool's maximum number of concurrent map task slots. Then, when you submit a job, set the job's queue to the corresponding pool.
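As a rough sketch of what such an allocation file might look like: the pool name "hadoop" is only an illustration chosen to match the queue name used in the code above, and the value 10 is the limit asked for in the question. This also assumes the Fair Scheduler is already enabled on the JobTracker and that mapred.fairscheduler.poolnameproperty is set to mapred.job.queue.name, so that the job's queue name selects the pool.

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (allocation file); pool name and limit are illustrative -->
<allocations>
  <!-- Jobs submitted with mapred.job.queue.name=hadoop land in this pool,
       assuming mapred.fairscheduler.poolnameproperty=mapred.job.queue.name -->
  <pool name="hadoop">
    <!-- At most 10 map tasks from this pool run concurrently -->
    <maxMaps>10</maxMaps>
  </pool>
</allocations>
```

Note that the cap applies to the pool as a whole, so if several jobs share the pool they share the 10 map slots between them.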

Answer

Dave · Apr 2, 2013

You can set the value of mapred.jobtracker.maxtasks.per.job to something other than -1 (the default). This limits the number of simultaneous map or reduce tasks a job can employ.

This variable is described as:

The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.

I think there were plans to add mapred.max.maps.per.node and mapred.max.reduces.per.node to job configs, but they never made it to release.