I've been running some Hive scripts on an AWS EMR 4.8
cluster with Hive 1.0 and Tez 0.8.
My configurations look like this:
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.execution.engine=tez;
set hive.merge.mapfiles=false;
SET hive.default.fileformat=Orc;
set tez.task.resource.memory.mb=5000;
SET hive.tez.container.size=6656;
SET hive.tez.java.opts=-Xmx5120m;
set hive.optimize.ppd=true;
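This minimal Python sketch checks the settings above. It assumes the statements are pasted in verbatim as a string, parses them, and compares the task heap to the task container. It also confirms that none of these keys sizes the Tez ApplicationMaster. The helper names parse_set_statements and heap_mb are made up for this illustration.

```python
import re

# The session settings from the question, pasted in verbatim.
SETTINGS = """\
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.execution.engine=tez;
set hive.merge.mapfiles=false;
SET hive.default.fileformat=Orc;
set tez.task.resource.memory.mb=5000;
SET hive.tez.container.size=6656;
SET hive.tez.java.opts=-Xmx5120m;
set hive.optimize.ppd=true;
"""

def parse_set_statements(text):
    """Return {key: value} for each 'SET key=value;' line (SET matched case-insensitively)."""
    settings = {}
    for line in text.splitlines():
        m = re.match(r"\s*set\s+([\w.]+)\s*=\s*(.*?)\s*;?\s*$", line, re.IGNORECASE)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings

def heap_mb(java_opts):
    """Return the -Xmx value from a JVM options string in MB (m/g suffixes only), or None."""
    m = re.search(r"-Xmx(\d+)([mg])", java_opts, re.IGNORECASE)
    if not m:
        return None
    value, unit = int(m.group(1)), m.group(2).lower()
    return value * 1024 if unit == "g" else value

conf = parse_set_statements(SETTINGS)
container_mb = int(conf["hive.tez.container.size"])
xmx_mb = heap_mb(conf["hive.tez.java.opts"])
print(f"task container {container_mb} MB, heap {xmx_mb} MB, ratio {xmx_mb / container_mb:.2f}")

# None of the session settings sizes the Tez ApplicationMaster container.
print("tez.am.resource.memory.mb" in conf)  # False
```

With these values the task heap is about 77% of the task container. That is close to the common rule of thumb of roughly 80%, so the task side looks consistent.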
And my global configs are:
hadoop-env.export HADOOP_HEAPSIZE 4750
hadoop-env.export HADOOP_DATANODE_HEAPSIZE 4750
hive-env.export HADOOP_HEAPSIZE 4750
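The hadoop-env.export notation looks like EMR's nested configuration classifications. Assuming that is how these values were applied, the sketch below builds the equivalent configuration JSON. The emr_env_classification helper is hypothetical. These exports set the heap of JVMs launched by the Hadoop and Hive shell scripts. YARN containers, including the one being killed, get their size from their own resource requests instead.

```python
import json

def emr_env_classification(classification, exports):
    """Build one EMR configuration object that sets shell exports in an *-env file."""
    return {
        "Classification": classification,
        "Properties": {},
        "Configurations": [
            {"Classification": "export", "Properties": dict(exports)}
        ],
    }

# The global settings from the question, expressed as EMR configuration objects.
configurations = [
    emr_env_classification("hadoop-env", {
        "HADOOP_HEAPSIZE": "4750",
        "HADOOP_DATANODE_HEAPSIZE": "4750",
    }),
    emr_env_classification("hive-env", {"HADOOP_HEAPSIZE": "4750"}),
]

print(json.dumps(configurations, indent=2))
```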
While running my script, I get the following error:
Container [pid=19027,containerID=container_1477393351192_0007_02_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.9 GB of 5 GB virtual memory used. Killing container.
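This sketch pulls the container ID and physical-memory figures out of the kill message above. In YARN's container ID format (container_<cluster timestamp>_<application>_<attempt>_<container sequence>), sequence number 1 within an attempt is the ApplicationMaster's container. The helper describe_kill is hypothetical, and it handles only the ID format shown here, not the newer container_e<epoch>_... form.

```python
import re

LOG = ("Container [pid=19027,containerID=container_1477393351192_0007_02_000001] "
       "is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB "
       "physical memory used; 1.9 GB of 5 GB virtual memory used. Killing container.")

def describe_kill(log_line):
    """Extract the container ID, attempt, and physical-memory usage/limit from a YARN kill message."""
    cid = re.search(r"containerID=(container_\d+_\d+_(\d+)_(\d+))", log_line)
    phys = re.search(r"([\d.]+) GB of ([\d.]+) GB physical memory used", log_line)
    return {
        "container_id": cid.group(1),
        "attempt": int(cid.group(2)),
        # Container sequence number 1 of an application attempt is the ApplicationMaster.
        "is_app_master": int(cid.group(3)) == 1,
        "used_gb": float(phys.group(1)),
        "limit_gb": float(phys.group(2)),
    }

print(describe_kill(LOG))
```

For this message the killed container is the ApplicationMaster of attempt 2, with a 1 GB physical limit.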
After searching for this error, I read that setting tez.task.resource.memory.mb
would raise the physical memory limit, but clearly I was mistaken. What am I missing?
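I have run into this problem a lot. Changing
Set hive.tez.container.size=6656;
Set hive.tez.java.opts=-Xmx4g;
does not fix it for me, because those settings only size the task containers. The container in your error (ID ending in _000001) is the Tez ApplicationMaster, and its 1 GB limit matches the default tez.am.resource.memory.mb of 1024 MB. What does fix it is raising the AM size: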
set tez.am.resource.memory.mb=4096;
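You may also want to pin the AM's heap inside its larger container. This minimal sketch derives both statements from one number. The helper name am_settings is hypothetical, and the 0.8 heap fraction is an assumed rule of thumb that leaves headroom for non-heap memory. Setting tez.am.launch.cmd-opts replaces whatever value it already had, so merge in any existing options rather than copying this verbatim.

```python
def am_settings(am_container_mb, heap_fraction=0.8):
    """Return SET statements sizing the Tez AM container and a heap inside it.

    heap_fraction=0.8 is an assumption (a common rule of thumb), not a required value.
    """
    heap_mb = int(am_container_mb * heap_fraction)
    return [
        f"set tez.am.resource.memory.mb={am_container_mb};",
        f"set tez.am.launch.cmd-opts=-Xmx{heap_mb}m;",
    ]

for stmt in am_settings(4096):
    print(stmt)
# set tez.am.resource.memory.mb=4096;
# set tez.am.launch.cmd-opts=-Xmx3276m;
```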