Aggregate Resource Allocation for a job in YARN

blackfury · Nov 23, 2015

I am new to Hadoop. When I run a job, I see the aggregate resource allocation for that job as 251248654 MB-seconds, 24462 vcore-seconds. However, when I look at the cluster details, they show 888 VCores-Total and 15.90 TB Memory-Total. Can anyone tell me how these are related? What do MB-seconds and vcore-seconds refer to for the job?

Is there any material online that explains these? I tried searching but didn't find a proper answer.

Answer

Manjunath Ballur · Nov 23, 2015
VCores-Total: Indicates the total number of VCores available in the cluster
Memory-Total: Indicates the total memory available in the cluster.

For example, I have a single-node cluster where I have set the memory per container to 1228 MB (determined by the config yarn.scheduler.minimum-allocation-mb) and the vCores per container to 1 (determined by the config yarn.scheduler.minimum-allocation-vcores).

I have set yarn.nodemanager.resource.memory-mb to 9830 MB, so a node can hold at most 8 containers (9830 / 1228 ≈ 8.005, rounded down to 8).

So, for my cluster:

VCores-Total = 1 (node) * 8 (containers) * 1 (vCore per container) = 8 
Memory-Total = 1 (node) * 8 (containers) * 1228 MB (memory per container) = 9824 MB = 9.59375 GB = 9.6 GB
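
The arithmetic above can be sketched in a few lines of Python. This only reproduces the calculation in this answer; the function name and its parameters are made up for illustration and are not part of any YARN API.

```python
def cluster_totals(nodes, node_memory_mb, container_memory_mb, container_vcores):
    """Reproduce the answer's arithmetic (hypothetical helper, not a YARN API)."""
    # Only whole containers fit on a node, so use integer division.
    containers_per_node = node_memory_mb // container_memory_mb
    total_containers = nodes * containers_per_node
    return {
        "containers_per_node": containers_per_node,
        "vcores_total": total_containers * container_vcores,
        "memory_total_mb": total_containers * container_memory_mb,
    }


# Values taken from the single-node example above.
totals = cluster_totals(
    nodes=1, node_memory_mb=9830, container_memory_mb=1228, container_vcores=1
)
memory_total_gb = totals["memory_total_mb"] / 1024
print(totals, memory_total_gb)
```

With the values from this answer, the sketch gives 8 containers, 8 vCores and 9824 MB (9.59375 GB).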

The figure below shows my cluster metrics: [figure: cluster metrics screenshot]

Now let's see "MB-seconds" and "vcore-seconds". As per the description in the code (ApplicationResourceUsageReport.java):

MB-seconds: The aggregated amount of memory (in megabytes) the application has allocated times the number of seconds the application has been running.

vcore-seconds: The aggregated number of vcores that the application has allocated times the number of seconds the application has been running.

The description is self-explanatory (remember the keyword: Aggregated).
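
To make "aggregated" concrete, here is a minimal sketch of how the two figures add up across containers. It assumes each container can be described by its allocated memory, its vCores and the number of seconds it ran; the function name and the sample containers are hypothetical and are not taken from YARN's code.

```python
def aggregate_usage(containers):
    """Sum memory*time and vcores*time over a list of containers.

    containers: list of (memory_mb, vcores, seconds_running) tuples.
    This is an illustrative helper, not YARN's actual implementation.
    """
    mb_seconds = sum(mem * secs for mem, _, secs in containers)
    vcore_seconds = sum(vc * secs for _, vc, secs in containers)
    return mb_seconds, vcore_seconds


# Hypothetical job: two 1228 MB / 1 vCore containers running 100 s and 50 s.
example = [(1228, 1, 100), (1228, 1, 50)]
mb_seconds, vcore_seconds = aggregate_usage(example)
print(mb_seconds, "MB-seconds,", vcore_seconds, "vcore-seconds")
```

So a job's MB-seconds grow with both the size of its containers and how long they stay allocated, which is why the number can look large next to the cluster's total memory.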

Let me explain this with an example. I ran a DistCp job (which spawned 25 containers), for which I got the following:

Aggregate Resource Allocation: 10361661 MB-seconds, 8424 vcore-seconds

Now, let's do some rough calculation on how much time each container took:

For memory:
10361661 MB-seconds / 25 (containers) / 1228 MB (memory per container) = 337.51 seconds ≈ 5.63 minutes

For CPU:
8424 vcore-seconds / 25 (containers) / 1 (vCore per container) = 336.96 seconds = 5.616 minutes

This indicates that, on average, each container ran for about 5.6 minutes.
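
The same rough calculation as a small Python sketch. It assumes, as the calculation above does, that all 25 containers had the same size (1228 MB, 1 vCore) and ran for about the same time, which is only an approximation for a real job. The function name is made up for illustration.

```python
def average_container_seconds(aggregate, containers, per_container):
    """Average run time per container, assuming equally sized containers.

    aggregate: MB-seconds or vcore-seconds reported for the job.
    per_container: MB or vCores allocated to each container.
    """
    return aggregate / containers / per_container


# Figures from the DistCp job above.
mem_secs = average_container_seconds(10361661, 25, 1228)
cpu_secs = average_container_seconds(8424, 25, 1)
print(round(mem_secs, 2), "s from memory,", round(cpu_secs, 2), "s from CPU")
```

The two estimates agree to within about a second, which is what you would expect when every container has the same memory and vCore allocation.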

I hope this makes it clear. You can execute a job and confirm it yourself.