I have a web app running on tomcat which has max heap size set to 8GB.
If no users log into the app, non-cleanable memory (which resides after a garbage collection) is quite low at about 1GB. In this situaiton I see constant memory growth for about 4GB and then garbage collector runs and memory again drops to about 1 GB. This pattern continues if no user gets logged in.
The GC logs show a Full GC takes 11 seconds, a considerable amount of time, compared to minor GC, which takes ~1 second:
2017-02-14T15:30:44.553+0530: 591.922: [GC (Allocation Failure) [PSYoungGen: 1501051K->631966K(1833472K)] 2392189K->1523112K(3030016K), 1.5100144 secs]
...[Times: user=1.49 sys=0.01, real=1.51 secs]
2017-02-14T15:31:20.335+0530: 627.705: [GC (Allocation Failure) [PSYoungGen: 1553054K->595007K(1842176K)] 2444200K->1570521K(3038720K), 1.3050284 secs]
...[Times: user=1.27 sys=0.04, real=1.31 secs]
2017-02-14T15:33:33.682+0530: 761.052: [GC (Allocation Failure) [PSYoungGen: 1516095K->556800K(1842176K)] 2491609K->1596474K(3038720K), 1.6957154 secs]
...[Times: user=1.67 sys=0.03, real=1.69 secs]
2017-02-14T15:33:35.378+0530: 762.748: [Full GC (Ergonomics) [PSYoungGen: 556800K->365446K(1842176K)] [ParOldGen: 1039673K->1196476K(2018304K)] 1596474K->1561923K(3860480K), [Metaspace: 70472K->70472K(1114112K)], 11.2779843 secs]
...[Times: user=11.13 sys=0.09, real=11.28 secs]
2017-02-14T15:34:56.232+0530: 843.602: [GC (Allocation Failure) [PSYoungGen: 1286534K->216613K(1842176K)] 2483011K->1609875K(3860480K), 1.4938761 secs]
...[Times: user=1.45 sys=0.05, real=1.50 secs]
Because all other threads gets suspended during GC, if a user tries to access the webapp during a Full GC, server does not respond. What triggers this Full GC?
According to the log, minor GC events are due to allocation failures, and Full GC is due to Ergonomics
. What does this mean?
There is plenty of free heap space left, and I want to delay Full GC from happening until there is no significant memory reduction due to minor GC. Can I prevent Full GC from happening in this case?
My VM args are below:
export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx8192m -XX:+UseConcMarkSweepGC"
I would point you to the Parallel Collector link. The "ergonomics" is a method for auto tune the collector with the specific behavior of an application.
Most of the time the auto-tuning is fine. In your case it seems that it ends in too long GC. You can fix it by adjusting yourself the parameter of the collector.
Quoting the documentation :
Maximum Garbage Collection Pause Time
The maximum pause time goal is specified with the command-line option -XX:MaxGCPauseMillis=. This is interpreted as a hint that pause times of milliseconds or less are desired; by default, there is no maximum pause time goal. If a pause time goal is specified, the heap size and other parameters related to garbage collection are adjusted in an attempt to keep garbage collection pauses shorter than the specified value. These adjustments may cause the garbage collector to reduce the overall throughput of the application, and the desired pause time goal cannot always be met.
Throughput
The throughput goal is measured in terms of the time spent doing garbage collection versus the time spent outside of garbage collection (referred to as application time). The goal is specified by the command-line option -XX:GCTimeRatio=, which sets the ratio of garbage collection time to application time to 1 / (1 + ).
For example, -XX:GCTimeRatio=19 sets a goal of 1/20 or 5% of the total time in garbage collection. The default value is 99, resulting in a goal of 1% of the time in garbage collection.
Footprint
Maximum heap footprint is specified using the option -Xmx. In addition, the collector has an implicit goal of minimizing the size of the heap as long as the other goals are being met.