JVM Garbage Collector settings investigation Tigase, Inc.
1. Objective

Investigate the current JVM Garbage Collector settings, which result in high Heap usage, and propose new, optimised ones. The investigation was prompted by the memory usage observed on an installation that was not under heavy load (https://projects.tigase.org/issues/3248).

(Note: the Memory Usage section of Tigase Monitor reports usage of the OldGen Heap region.)

The usage graph shows Heap usage ramping up slowly until a Full GC is performed. This indicates premature promotion of short-lived objects. The best possible usage pattern is a relatively low number of stop-the-world collections, each as short as possible.

2. Investigation

The following tools were used to analyse and compare JVM memory management performance:

- internal JVM tooling for logging and debugging GC performance
- Tigase Monitor (observing load, checking OldGen heap region utilisation)
- VisualVM (with the VisualGC add-on)
- GCViewer (https://github.com/chewiebug/gcviewer; built from sources, because the currently available release contains a bug that appears when the JVM is configured with a particular set of flags: -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:logs/jvm.log -verbose:gc)
- other tools, used for comparison (e.g. gceasy.io)

The options used are described in more detail in the Tigase Server - JVM settings sections.

2.1. sure.im installation (VM machines, regular traffic)

All machines receive roughly the same traffic and their Tigase configuration is exactly the same. The JVM was configured with Xms=5G and Xmx=5G (both the initial and the maximum Heap size set to 5G). Load on all machines is relatively low, both in packets per second and in number of connections. With the CMS garbage collector and relatively default settings, Tigase Monitor shows the Memory Usage percentage (which takes its values from the OldGen region only) rising steadily, with memory being cleared once occupancy of that region reaches roughly 80-99%.
The G1 and Parallel collectors show more stable OldGen usage. Because Tigase Monitor displays only OldGen region metrics, actual Heap usage is different, especially when only the percentage is considered. Looking at the actual sizes, the amount of memory used by OldGen differs depending on the collector in use.
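The OldGen percentage can also be read directly from a running JVM and set against the other regions. The following is a minimal sketch, assuming the JDK's jstat tool is on the PATH and that its -gcutil output starts with a header row containing an O column (old generation utilisation in percent). The function name old_gen_pct is hypothetical and is not part of Tigase.

```shell
#!/bin/sh
# old_gen_pct: read `jstat -gcutil` output on stdin and print the value of
# the O column (old generation utilisation, percent) for each sample row.
# The column is located by its header name rather than by position, because
# the set of columns differs between JVM versions.
old_gen_pct() {
    awk '
        NR == 1 {
            # Header row: remember which field is labelled "O".
            for (i = 1; i <= NF; i++) if ($i == "O") col = i
            next
        }
        col { print $col }
    '
}

# Usage (assumes PID holds the process id of the running JVM):
#   jstat -gcutil "$PID" | old_gen_pct
```

Comparing this value with the E (Eden) column of the same output gives a quick impression of how much of the reported usage sits in OldGen rather than in the whole Heap.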
[Figure: G1GC (-XX:+UseG1GC -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=2 -XX:InitiatingHeapOccupancyPercent=35 -XX:MaxGCPauseMillis=1000 -XX:MaxHeapSize=5368709120 -XX:ParallelGCThreads=4)]

[Figure: CMS with default settings from etc/tigase.conf enabled (-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelCMSThreads=2 -XX:-ReduceInitialCardMarks)]
[Figure: Default GC (Parallel), without enabling any GC settings]

2.2. c0x installation I (VM machines, 2 cores & 4GB, high-load traffic)

All machines had the same Tigase and JVM heap size settings (Xms=Xmx=3.5G) and used different GC settings:

- c01 - Default GC (Parallel collector for both Young and Old generations, i.e. -XX:+UseParallelGC -XX:+UseParallelOldGC)
- c02 - G1GC collector (runs collections in both Young and Old generations, i.e. -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=2 -XX:ParallelGCThreads=2 -XX:ConcGCThreads=2)
- c03 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:-ReduceInitialCardMarks -XX:NewRatio=2)
- c04 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with default Tigase settings (i.e. -XX:+UseParNewGC -XX:+UseConcMarkSweepGC)

On the surface, we can observe that:

- the default Parallel collector causes very uneven usage of OldGen, which suggests a lot of premature promotions;
- the G1GC collector, in addition to uneven allocation, also imposes higher CPU usage, which costs processing time and results in queues overflowing;
- the CMS collector offers more even use of OldGen space and, on average, lower CPU usage.

In general, CMS appears to be the better solution. Looking into the details of memory allocation and GC operation, we can make a couple of observations:
[Figure: c01]

[Figure: c02]

[Figure: c03]
[Figure: c04]

Ignoring the internal operation of the GC completely, the most numerous and the longest pauses in total were caused by the G1GC collector (almost 7 minutes), followed by CMS with default Tigase settings (5m11s) and CMS with an enforced Young Generation size (1m9s). The fewest and shortest pauses were caused by the default Parallel collector, which would seem to be the best choice.
However, analysing the operation of the GC in each run reveals additional information:

[Figure: c01]

[Figure: c02]

[Figure: c03]
[Figure: c04]

Again, G1GC offers the lowest GC performance rate while making the most pauses and taking the most time. The default Parallel collector displays the highest rate of cleared memory as well as the fewest pauses and the shortest cumulative pause time (45s); however, all of them were Stop-The-World (STW) pauses, which means that all application threads were stopped.

Lastly, the statistics for the CMS collectors reveal the huge impact of Young Generation sizing. With the default, small Young Generation (c04; the default sizing seems to have changed between JVM7 and JVM8), the total pauses took 3m20s and only 22s of those GC activities were not STW pauses (i.e. the GC proceeded concurrently with the application threads); additionally, GC performance was relatively slow and only slightly better than with G1GC. On the other hand, enforcing the Young Generation ratio by setting NewRatio to 2 (c03) decreased GC pauses to 70s. While that is still higher than for the Parallel collector, it has the advantage that only 22s of those pauses were STW pauses, which is roughly half the time for which the Parallel collector stopped the application.

2.3. hw1/hw2 installation (real hardware, Xeon W3530 4c/8t 2.8 GHz, 48 GB RAM, high load)

All machines had the same Tigase and JVM heap size settings (Xms=Xmx=5G) and used different GC settings:

A. Parallel GC (hw1) vs CMS with explicit YoungGeneration size (hw2)

- hw1 - Default GC (Parallel collector for both Young and Old generations, i.e. -XX:+UseParallelGC -XX:+UseParallelOldGC)
- hw2 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:-ReduceInitialCardMarks -XX:NewRatio=2)
[Figure: HW1]

[Figure: HW2]
[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: While the allocation patterns are quite similar, a closer look at the GC statistics shows that CMS with the YoungGeneration set to 1/3 of the Heap size, running on real hardware (with more threads available), performs better than the default Parallel GC: it needs less time (almost 15s less, 52s vs 66s) and, in addition, only roughly half of that time was spent in STW pauses.

B. CMS with default settings (hw1) vs CMS with explicit YoungGeneration size (hw2)

- hw1 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with default Tigase settings (i.e. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)
- hw2 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=3 -XX:NewRatio=2 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)
[Figure: HW1]

[Figure: HW2]
[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: On real hardware with the CMS collector, the JVM seems to allocate more heap space to the Young Generation (ratio-wise) than on a VM, yet still less than the defaults given in the Java documentation suggest. Enforcing NewRatio=2 cuts the pause time in half, from 110s to 55s, decreases the pause count by more than half and, in addition, results in better overall GC performance.
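NewRatio is the ratio of the Old to the Young generation, so NewRatio=2 gives the Young Generation one third of the heap. The arithmetic for the 5G heap used on hw1/hw2 can be sketched as below. The function names young_gen_mb and old_gen_mb are hypothetical, and the result is only an approximation because the JVM applies its own alignment when sizing generations.

```shell
#!/bin/sh
# young_gen_mb HEAP_MB NEW_RATIO
# NewRatio is old:young, so young = heap / (NewRatio + 1).
# Integer arithmetic only; treat the result as an approximation.
young_gen_mb() {
    echo $(( $1 / ($2 + 1) ))
}

# old_gen_mb HEAP_MB NEW_RATIO: whatever is left after the Young Generation.
old_gen_mb() {
    echo $(( $1 - $1 / ($2 + 1) ))
}

# 5G heap (Xms=Xmx=5G) with -XX:NewRatio=2
young_gen_mb 5120 2   # prints 1706
old_gen_mb 5120 2     # prints 3414
```

For the 3.5G heap of the c0x machines the same calculation gives a Young Generation of roughly 1194 MB.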
C. G1GC vs CMS with explicit NewRatio

- hw1 - G1 garbage collector (i.e. -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=2 -XX:InitiatingHeapOccupancyPercent=35 -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=4 -XX:+UseG1GC)
- hw2 - Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=3 -XX:NewRatio=2 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)

Tigase Monitor shows higher average CPU usage with the G1 collector (roughly 20 p.p. higher than with CMS), which also raises the risk of queues overflowing.
[Figure: HW1]

[Figure: HW2]
[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: In addition to the higher CPU utilisation observed in Tigase Monitor, we can also see that G1 activity is higher under load (around 5%). Pause-time-wise, G1 has an almost 20 times higher pause time while offering almost 20 times lower GC performance, which results in lower throughput.

3. Conclusions and recommendations

Summarising all the above and adding a couple of pointers:

- Garbage collection is faster the more dead objects occupy a given space; on a high-traffic installation it is therefore better to have a rather large YoungGen, which results in fewer objects being promoted to OldGen;
- Using a Heap size adjusted to actual usage is better: the larger the heap, the larger the spaces over which collection has to be performed, which results in longer pauses. For huge heaps, the G1 collector may be a better solution to avoid longer pauses;
- With JVM8 the default sizing of the Young / Old generations changed, even though NewRatio still defaults to 2:

    $ java -server -XX:+PrintFlagsFinal -version | grep "NewRatio"
    intx NewRatio = 2 {product}
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

    $ java -server -XX:+PrintFlagsFinal -version | grep "OldSize"
    uintx OldSize = 5439488 {product}
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

    $ /usr/lib/jvm/jdk1.8.0_11/bin/java -server -XX:+PrintFlagsFinal -version | grep "NewRatio"
    uintx NewRatio = 2 {product}
    Java(TM) SE Runtime Environment (build 1.8.0_11-b12)

    $ /usr/lib/jvm/jdk1.8.0_11/bin/java -server -XX:+PrintFlagsFinal -version | grep "OldSize"
    uintx OldSize := 63700992 {product}
    Java(TM) SE Runtime Environment (build 1.8.0_11-b12)

- The Statistics API in Tigase was not optimised and therefore (especially by retaining statistics history) increased the promotion rate to Tenured space.
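The comparison of default flag values between JVM versions can be repeated on any machine with a small helper that extracts a single value from the -XX:+PrintFlagsFinal output. This is a minimal sketch: the function name flag_value is hypothetical, and it assumes each flag line has the four leading fields seen in the output quoted in this section (type, name, "=" or ":=", value).

```shell
#!/bin/sh
# flag_value NAME: read -XX:+PrintFlagsFinal output on stdin and print the
# value of the flag called NAME. Each flag line is assumed to look like
#   <type> <name> = <value> {<kind>}
# with ":=" in place of "=" when the value was set on the command line or
# by JVM ergonomics rather than left at its built-in default.
flag_value() {
    awk -v name="$1" '$2 == name { print $4 }'
}

# Usage (the JDK 8 path is the one quoted in this section; adjust as needed):
#   java -server -XX:+PrintFlagsFinal -version 2>/dev/null | flag_value OldSize
#   /usr/lib/jvm/jdk1.8.0_11/bin/java -server -XX:+PrintFlagsFinal -version 2>/dev/null | flag_value OldSize
```

Matching on the exact flag name avoids the partial matches that grep "OldSize" or grep "NewRatio" could return for other flags containing the same substring.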