Adaptive Size Policy for Best Latency

Java GC Tutorials - Herong's Tutorial Examples

∟Parallel Collector - "+XX:+UseParallelGC"

∟Adaptive Size Policy for Best Latency

This section provides a tutorial on how Adaptive Size Policy is used by the Parallel collector to our best latency goal.

According to Java documentation, the Adaptive Size Policy used by the Parallel collector allows you to set 3 performance goals and let the collector to decide the GC configuration to meet those goals.

In this first test, let's ask the Parallel collector to give us the best latency performance using these JVM options:

-XX:MaxGCPauseMillis=1
   We want the best latency:
   Give us the shortest application pause time or < 1 ms.

-XX:GCTimeRatio=1
   We don't care about throughput:
   You can spend 50% (or 1/(1+1)) of time doing GC

-Xmx1200m
   We don't care about footprint:
   You can use as much as 1.2GB heap

Here is what we got by looking GC pause log messages:

herong> java -XX:+UseParallelGC -XX:MaxGCPauseMillis=1 \
   -XX:GCTimeRatio=9 -Xmx1200m -XX:AdaptiveSizePolicyOutputInterval=1 \
   -Xlog:gc=debug,gc+heap=debug,gc+ergo=trace GarbageCollection2

GC(0) Pause Young (Allocation Failure) 31M->10M(119M) 8.004ms
GC(1) Pause Young (Allocation Failure) 40M->29M(118M) 11.618ms
GC(2) Pause Young (Allocation Failure) 59M->47M(119M) 11.935ms
GC(3) Pause Young (Allocation Failure) 77M->60M(119M) 7.807ms
GC(4) Pause Young (Allocation Failure) 90M->73M(119M) 7.974ms
GC(5) Pause Full (Ergonomics) 73M->33M(139M) 17.960ms
...
GC(99) Pause Young (Allocation Failure) 53M->47M(102M) 2.767ms
GC(100) Pause Young (Allocation Failure) 67M->54M(98M) 4.201ms
GC(101) Pause Young (Allocation Failure) 74M->63M(101M) 4.442ms
GC(102) Pause Young (Allocation Failure) 83M->72M(96M) 4.847ms
GC(103) Pause Full (Ergonomics) 72M->33M(101M) 12.733ms
...
GC(198) Pause Young (Allocation Failure) 50M->44M(88M) 2.230ms
GC(199) Pause Young (Allocation Failure) 61M->54M(88M) 3.995ms
GC(200) Pause Young (Allocation Failure) 71M->61M(88M) 3.996ms
GC(201) Pause Full (Ergonomics) 61M->33M(89M) 11.862ms
...
GC(301) Pause Young (Allocation Failure) 47M->43M(84M) 2.119ms
GC(302) Pause Young (Allocation Failure) 57M->49M(84M) 3.203ms
GC(303) Pause Young (Allocation Failure) 63M->55M(84M) 3.153ms
GC(304) Pause Full (Ergonomics) 55M->33M(83M) 12.098ms
...
GC(399) Pause Young (Allocation Failure) 40M->39M(79M) 1.350ms
GC(400) Pause Young (Allocation Failure) 46M->44M(69M) 2.282ms
GC(401) Pause Young (Allocation Failure) 51M->51M(77M) 2.582ms
GC(402) Pause Full (Ergonomics) 51M->33M(77M) 11.485ms
...
GC(499) Pause Young (Allocation Failure) 35M->35M(64M) 0.694ms
GC(500) Pause Young (Allocation Failure) 37M->37M(57M) 0.983ms
GC(501) Pause Young (Allocation Failure) 39M->38M(64M) 0.950ms
GC(502) Pause Young (Allocation Failure) 40M->40M(64M) 0.932ms
GC(503) Pause Young (Allocation Failure) 42M->42M(63M) 1.052ms
GC(504) Pause Young (Allocation Failure) 44M->44M(56M) 1.052ms
GC(505) Pause Young (Allocation Failure) 46M->45M(63M) 0.900ms
GC(506) Pause Young (Allocation Failure) 47M->47M(56M) 0.917ms
GC(507) Pause Young (Allocation Failure) 49M->49M(62M) 0.926ms

Not too bad. The Adaptive Size Policy did a good job:

The initial GC pause time was about 9ms per Young GC and 18ms per Full GC.
But after 500 rounds of GC, the Parallel collector was able to adjust heap space sizes and tenuring thresholds bring Young GC pause time to 1 ms.

Let's review GC(500) log messages to see how the 1ms pause time was achieved:

GC(500) Heap before GC invocations=501 (full 106): PSYoungGen
          total 14336K, used 4155K [...
GC(500)   eden space 3072K, 68% used [...
GC(500)   from space 11264K, 18% used [...
GC(500)   to   space 11264K, 0% used [...
GC(500)  ParOldGen       total 51712K, used 34622K [...
GC(500)   object space 51712K, 66% used [...
...
GC(500) PSYoungGen: 4155K->3072K(6656K)
GC(500) ParOldGen: 34622K->35646K(51712K)
GC(500) Pause Young (Allocation Failure) 37M->37M(57M) 0.983ms
...
GC(500) Heap after GC invocations=501 (full 106): PSYoungGen
          total 6656K, used 3072K [...
GC(500)   eden space 3072K, 0% used [...
GC(500)   from space 3584K, 85% used [...
GC(500)   to   space 10752K, 0% used [...
GC(500)  ParOldGen       total 51712K, used 35646K [...
GC(500)   object space 51712K, 68% used [...

As you can see, the Parallel actually, reduced "eden" to only 3072K, so that more frequent Young GC were performed, and each Young GC was very short.

Of course, the best latency was achieved with a big price to pay on throughput. The GC overhead reached about 50% at GC(498), see the log messages below:

GC(498)                        GC overhead (%)
GC(498)     Young generation:       60.74	  (no change)
GC(498)     Tenured generation:     48.54	  (attempted to shrink)
GC(498) Tenuring threshold: (attempted to decrease to balance GC costs) = 1