Processes exceeding thread stack size limit on RedHat Enterprise Linux 6?

Rory · Nov 1, 2012

I have a couple of processes running on RHEL 6.3, but for some reason they appear to be exceeding their thread stack size limits.

For example, the Java process is started with a thread stack size of -Xss256k, and the C++ process sets a thread stack size of 1MB using pthread_attr_setstacksize() in the code.

For some reason however, these processes are not sticking to these limits, and I'm not sure why.

For example, when I run

pmap -x <pid> 

for the C++ and Java process, I can see hundreds of 'anon' mappings for each (which I have confirmed correspond to the internal worker threads created by each of these processes), but each is allocated 64MB (each pair of mappings below adds up to 168 + 65368 = 65536 kB, i.e. 64MB), not the limits set above:

00007fa4fc000000 168 40 40 rw--- [ anon ] 
00007fa4fc02a000 65368 0 0 ----- [ anon ] 
00007fa500000000 168 40 40 rw--- [ anon ] 
00007fa50002a000 65368 0 0 ----- [ anon ] 
00007fa504000000 168 40 40 rw--- [ anon ] 
00007fa50402a000 65368 0 0 ----- [ anon ] 
00007fa508000000 168 40 40 rw--- [ anon ] 
00007fa50802a000 65368 0 0 ----- [ anon ] 
00007fa50c000000 168 40 40 rw--- [ anon ] 
00007fa50c02a000 65368 0 0 ----- [ anon ] 
00007fa510000000 168 40 40 rw--- [ anon ] 
00007fa51002a000 65368 0 0 ----- [ anon ] 
00007fa514000000 168 40 40 rw--- [ anon ] 
00007fa51402a000 65368 0 0 ----- [ anon ] 
00007fa518000000 168 40 40 rw--- [ anon ] 
...

But when I run the following against the same process with all the 64MB 'anon' mappings

cat /proc/<pid>/limits | grep stack 

Max stack size 1048576 1048576 bytes 

it shows a maximum stack size of 1MB, so I am a bit confused as to what is going on here. The script that calls these programs also sets 'ulimit -s 1024'.

It should be noted that this only seems to occur on very high-end machines (e.g. 48GB RAM, 24 CPU cores). The issue does not appear on less powerful machines (e.g. 4GB RAM, 2 CPU cores).

Any help understanding what is happening here would be much appreciated.

Answer

Rory · Nov 5, 2012

It turns out that the glibc shipped with RHEL 6 (glibc >= 2.10) changed the malloc behaviour so that, where possible, each thread is allocated its own malloc arena, and on 64-bit each arena reserves up to 64MB of virtual address space. The maximum number of arenas is also much higher on 64-bit (8 x the number of cores, versus 2 x on 32-bit), which is why the effect only shows up on the larger machines. In other words, the 64MB 'anon' mappings are per-thread malloc arenas, not thread stacks; the stacks themselves still respect the limits set above.

The fix for this was to add

export LD_PRELOAD=/path/to/libtcmalloc.so 

to the script that starts the processes, so that tcmalloc is used in place of the glibc malloc.

Some more information on this is available from:

Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage https://www.ibm.com/developerworks/mydeveloperworks/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

glibc bug malloc uses excessive memory for multi-threaded applications http://sourceware.org/bugzilla/show_bug.cgi?id=11261

Apache Hadoop fixed the problem by setting MALLOC_ARENA_MAX https://issues.apache.org/jira/browse/HADOOP-7154