GCPerfP99.java - 99th Percentile Performance

This section provides a GC test program, GCPerfP99.java, that uses 99th percentile performance measurements.

From previous tutorials, we learned that a long system interruption has a huge impact on latency but only a small impact on throughput. This is because latency is defined by the worst (slowest) run, while throughput is defined by the average execution time over all runs.
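To make the contrast concrete, here is a minimal sketch with made-up numbers. The class name WorstVsAverage, the 10 ms run time and the 500 ms pause are illustrative assumptions, not measurements from the test programs. It compares 1000 quiet runs with the same runs where a single run is hit by a long pause:

```java
import java.util.Arrays;

class WorstVsAverage {
   // Worst-case latency: the longest single run, in milliseconds.
   static long worstLatency(long[] runMillis) {
      return Arrays.stream(runMillis).max().getAsLong();
   }

   // Throughput: total objects divided by total time, in objects/second.
   static long throughput(long[] runMillis, int objectsPerRun) {
      long total = Arrays.stream(runMillis).sum();
      return (1000L * objectsPerRun * runMillis.length) / total;
   }

   public static void main(String[] args) {
      long[] quiet = new long[1000];
      Arrays.fill(quiet, 10);              // 1000 runs of 10 ms each
      long[] interrupted = quiet.clone();
      interrupted[500] = 510;              // one run hit by a 500 ms pause

      System.out.println("Quiet:       latency=" + worstLatency(quiet)
         + " ms, throughput=" + throughput(quiet, 32) + " obj/s");
      System.out.println("Interrupted: latency=" + worstLatency(interrupted)
         + " ms, throughput=" + throughput(interrupted, 32) + " obj/s");
   }
}
```

With these assumed numbers, the worst-case latency jumps from 10 ms to 510 ms, while the throughput only drops from 3200 to 3047 objects/second.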

A better way to define latency is the 99th percentile (or P99) latency, which throws away the worst 1% of the runs, then takes the worst latency among the remaining 99% of the runs.

P99 latency is a better measurement, because if system interruptions affect less than 1% of the runs, all of the affected runs fall into the discarded 1%, and the P99 latency is not distorted by them at all.
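The definition above can be sketched in a few lines. This is only an illustration under assumed numbers: the class name P99Sketch, the helper withPauses() and the 10 ms / 510 ms run times are hypothetical. The index arithmetic (runs*99)/100 is the same one used in GCPerfP99.java:

```java
import java.util.Arrays;

class P99Sketch {
   // Returns the slowest run left after dropping the worst 1% of the runs.
   static long p99Latency(long[] runMillis) {
      long[] sorted = runMillis.clone();
      Arrays.sort(sorted);                      // sorted low to high
      int kept = (sorted.length * 99) / 100;    // number of runs kept
      if (kept == 0) {
         throw new IllegalArgumentException("need at least 2 runs");
      }
      return sorted[kept - 1];
   }

   // Hypothetical test data: 'runs' runs of 10 ms, 'paused' of them
   // delayed by an extra 500 ms.
   static long[] withPauses(int runs, int paused) {
      long[] samples = new long[runs];
      Arrays.fill(samples, 10);
      for (int i = 0; i < paused; i++) {
         samples[i] = 510;
      }
      return samples;
   }

   public static void main(String[] args) {
      System.out.println("0.5% interrupted: P99 latency = "
         + p99Latency(withPauses(1000, 5)) + " ms");
      System.out.println("2% interrupted:   P99 latency = "
         + p99Latency(withPauses(1000, 20)) + " ms");
   }
}
```

With 5 of 1000 runs interrupted, all 5 are dropped and the P99 latency stays at 10 ms. With 20 of 1000 runs interrupted, only 10 are dropped and the P99 latency becomes 510 ms.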

I have created another GC test program, GCPerfP99.java, that uses the P99 latency measurement:

/* GCPerfP99.java
 * Copyright (c) HerongYang.com. All Rights Reserved.
 */
class GCPerfP99 {
   static long startTime = System.currentTimeMillis();
   static MyList objList = null;
   static int objSize = 1024;  // in KB, default = 1 MB
   static int baseSize = 32;   // # of objects in the base
   static int chunkSize = 32;  // # of objects per run chunk
   static int warmup = 64;     // warmup loops: 64*32 = 2GB
   static int runs = 1000;     // number of runs
   public static void main(String[] arg) {
      System.out.println("["
         +((System.currentTimeMillis()-startTime)/1000.0)
         +"s] main() started");
      if (arg.length>0) objSize = Integer.parseInt(arg[0]);
      if (arg.length>1) baseSize = Integer.parseInt(arg[1]);
      if (arg.length>2) chunkSize = Integer.parseInt(arg[2]);
      if (arg.length>3) warmup = Integer.parseInt(arg[3]);
      if (arg.length>4) runs = Integer.parseInt(arg[4]);
      System.out.println("Parameters:");
      System.out.println("   Size="+objSize+"KB"
         +", Base="+baseSize +", Chunk="+chunkSize
         +", Warmup="+warmup+", Runs="+runs);
      objList = new MyList();
      myTest();
   }
   public static void myTest() {
      for (int m=0; m<baseSize; m++) {
         objList.add(new MyObject());
      }
      for (int k=0; k<warmup; k++) {
         for (int m=0; m<chunkSize; m++) {
            objList.add(new MyObject());
         }
         for (int m=0; m<chunkSize; m++) {
            objList.removeTail();
         }
      }

      long[] times = new long[runs+1];
      times[0] = System.currentTimeMillis();
      for (int i=0; i<runs; i++) {
         System.out.println("["
            +((System.currentTimeMillis()-startTime)/1000.0)
            +"s] Run start "+(i+1));
         for (int m=0; m<chunkSize; m++) {
            objList.add(new MyObject());
         }
         for (int m=0; m<chunkSize; m++) {
            objList.removeTail();
         }
         times[i+1] = System.currentTimeMillis();
         System.out.println("["+((times[i+1]-startTime)/1000.0)
            +"s] Run end "+(i+1)+": "+(times[i+1]-times[i])+"ms");
      }

      long[] samples = new long[runs];
      for (int i=0; i<runs; i++) {
         samples[i] = times[i+1] - times[i];          // in millis
      }
      java.util.Arrays.sort(samples);        // sorted low to high

      int p99 = (runs*99)/100;                  // 99th percentile
      long duration = 0;
      for (int i=0; i<p99; i++) {
         duration += samples[i];
      }
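      // Very fast runs can measure as 0 ms; cap at 999999 to avoid "/ by zero"
      long avePerf = 999999;
      if (duration>0) avePerf = (1000L*p99*chunkSize)/duration; // obj/second
      long maxPerf = 999999;
      if (samples[0]>0) maxPerf = (1000*chunkSize)/samples[0];
      long minPerf = 999999;
      if (samples[p99-1]>0) minPerf = (1000*chunkSize)/samples[p99-1];
      long latency = 1000000/minPerf;           // millis/1000 obj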
      System.out.println("Results:");
      System.out.println("   Total execution time = "
         +(duration/1000)+" seconds");
      System.out.println("   Total objects processed = "
         +(runs*chunkSize));
      System.out.println("   Average time per run = "
         +(duration/p99)+" milliseconds");
      System.out.println("   Throughput = "
         +avePerf+" objects/second");
      System.out.println("   Latency = "
         +latency+" milliseconds/1000 objects");
      System.out.println("   Throughput (max, ave, min) = ("
         +maxPerf+", "+avePerf+", "+minPerf+")");
      System.out.println("   Latency (min, ave, max) = ("
         +(1000000/maxPerf)+", "+(1000000/avePerf)+", "
         +(1000000/minPerf)+")");

      System.out.println("1% worst runs dropped:");
      for (int i=p99; i<runs; i++) {
         System.out.println("   Run, Time, Throughput = "
            +(i+1)+", "+samples[i]+", "
            +(samples[i]>0 ? (1000*chunkSize)/samples[i] : 999999));
      }

      System.err.println("Press ENTER to end...");
      try {
         System.in.read();
      } catch (Exception e) {
      }
   }

   static class MyObject {
      private long[] obj = null;
      public MyObject next = null;
      public MyObject prev = null;
      public MyObject() {
         obj = new long[objSize*128];          // 128*8=1024 bytes
         for (int i=0; i<objSize*128; i++) {
            obj[i] = i/2+i/3+i/4+i/5;            // some work load
         }
      }
   }
   static class MyList {
      MyObject head = null;
      MyObject tail = null;
      void add(MyObject o) {
         if (head==null) {
            head = o;
            tail = o;
         } else {
            o.prev = head;
            head.next = o;
            head = o;
         }
      }
      void removeTail() {
         if (tail!=null) {
            if (tail.next==null) {
               tail = null;
               head = null;
            } else {
               tail = tail.next;
               tail.prev = null;
            }
         }
      }
   }
}
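The unit conversions used in the result section may be easier to follow with a worked example. The sketch below repeats the two formulas from the program in a hypothetical helper class, PerfUnits, and applies them to an assumed run of 32 objects that took 16 ms:

```java
class PerfUnits {
   // Objects per second for one run of 'chunkSize' objects
   // that took 'runMillis' milliseconds.
   static long throughput(int chunkSize, long runMillis) {
      if (runMillis <= 0) {
         throw new IllegalArgumentException("run time must be > 0 ms");
      }
      return (1000L * chunkSize) / runMillis;
   }

   // Milliseconds needed to process 1000 objects at the given throughput.
   static long latencyPer1000(long objPerSecond) {
      if (objPerSecond <= 0) {
         throw new IllegalArgumentException("throughput must be > 0");
      }
      return 1000000L / objPerSecond;
   }

   public static void main(String[] args) {
      long perf = throughput(32, 16);           // 32 objects in 16 ms
      System.out.println("Throughput = " + perf + " objects/second");
      System.out.println("Latency = " + latencyPer1000(perf)
         + " milliseconds/1000 objects");
   }
}
```

For this assumed run, the throughput is 2000 objects/second and the latency is 500 milliseconds/1000 objects, which matches 16 ms / 32 objects = 0.5 ms per object. Both formulas use integer division, so results are truncated.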

Changes made on the test program:
