JVM Garbage Collector settings investigation



Tigase, Inc.

1. Objective

Investigate the current JVM Garbage Collector settings, which result in high Heap usage, and propose new, optimised ones. The investigation was prompted by the memory usage of an installation that was not under heavy load (https://projects.tigase.org/issues/3248). (Note: the Memory Usage section of Tigase Monitor reports usage of the OldGen Heap region.)

Tigase Monitor showed Heap usage ramping up slowly until a FullGC was performed, which would indicate premature promotion of short-lived objects. The best possible usage pattern is a relatively low number of stop-the-world collections, each as short as possible.

2. Investigation

The following tools were used to analyse and compare JVM memory management performance:

- internal JVM tooling for logging and debugging GC performance
- Tigase Monitor (observing load, checking OldGen heap region utilisation)
- VisualVM (with the VisualGC add-on)
- GCViewer (https://github.com/chewiebug/gcviewer), built from sources because the currently available release contains a bug that shows up when the JVM is configured with a particular set of flags: -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:logs/jvm.log -verbose:gc
- other tools, used for comparison (e.g. gceasy.io)

The options used are described in more detail in the Tigase Server - JVM settings sections.

2.1. sure.im installation (VM machines, regular traffic)

All machines handle roughly the same traffic and their Tigase configuration is identical. The JVM was configured with Xms=5G and Xmx=5G (both initial and maximum Heap set to 5G). Load on all the machines is relatively low, both in packets per second and in number of connections.

With the CMS garbage collector and near-default settings, Tigase Monitor shows a steady rise in the Memory Usage percentage (which is taken from the OldGen region only), followed by a memory clean-up once occupancy of that region reaches roughly 80-99%.
The G1 and Parallel collectors show more stable OldGen usage. Because Tigase Monitor displays only OldGen region metrics, actual Heap usage is different, especially when only the percentage is considered. Looking at the actual sizes, we notice that the OldGen uses a different amount of memory depending on the GC in use.
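To see how OldGen occupancy relates to the other Heap regions, the per-pool figures can be read from inside the JVM. The following is a minimal sketch, assuming the standard java.lang.management API; the class name HeapPoolReport and the helper usedPercent are hypothetical and are not part of Tigase Monitor. Pool names depend on the collector in use (for example "PS Old Gen", "CMS Old Gen" or "G1 Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Tigase: prints usage of every heap pool.
final class HeapPoolReport {

    /** Percentage of the pool in use, or -1 when the pool reports no maximum. */
    static double usedPercent(MemoryUsage usage) {
        long max = usage.getMax(); // -1 when the maximum is undefined
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * usage.getUsed() / max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip Metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.printf("%-28s used=%6d MB  occupancy=%5.1f%%%n",
                    pool.getName(), usage.getUsed() >> 20, usedPercent(usage));
        }
    }
}
```

Comparing the Old Gen line with the Eden and Survivor lines makes it clear why an OldGen-only percentage can look alarming while overall Heap usage is moderate.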

G1GC (-XX:+UseG1GC -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=2 -XX:InitiatingHeapOccupancyPercent=35 -XX:MaxGCPauseMillis=1000 -XX:MaxHeapSize=5368709120 -XX:ParallelGCThreads=4):

CMS with default settings from etc/tigase.conf enabled (-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelCMSThreads=2 -XX:-ReduceInitialCardMarks):

Default GC (Parallel), without enabling any GC settings:

2.2. c0x installation I (VM machines, 2 cores & 4 GB, high load traffic)

All machines had the same Tigase and JVM heap size settings (Xms=Xmx=3.5G) but used different GC settings:

- c01: Default GC (Parallel collector for both Young and Old generations, i.e. -XX:+UseParallelGC -XX:+UseParallelOldGC)
- c02: G1GC collector (runs collections in both Young and Old generations, i.e. -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=2 -XX:ParallelGCThreads=2 -XX:ConcGCThreads=2)
- c03: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:-ReduceInitialCardMarks -XX:NewRatio=2)
- c04: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with default Tigase settings (i.e. -XX:+UseParNewGC -XX:+UseConcMarkSweepGC)

On the surface we can observe that:

- the default Parallel collector causes very uneven usage of OldGen, which would suggest a lot of premature promotions;
- the G1GC collector, in addition to uneven allocation, imposes higher CPU usage, which takes away processing time and results in queues overflowing;
- the CMS garbage collector offers more even use of OldGen space and, on average, lower CPU usage.

In general CMS appears to be the better solution. Looking into the details of memory allocation and GC operation, we can make a couple of observations:

[Figure: c01]

[Figure: c02]

[Figure: c03]

[Figure: c04]

Ignoring the internal operations of the GC completely, the most numerous and longest pauses in total were caused by the G1GC collector (almost 7 minutes), followed by CMS with default Tigase settings (5m11s) and CMS with an enforced Young Generation size (NewRatio=2, 1m9s). The fewest and shortest pauses were inflicted by the default Parallel collector, which would seem to make it the best choice.
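Totals of this kind can also be sampled from a running JVM rather than from GC logs. The sketch below assumes the standard GarbageCollectorMXBean API; the class name GcPauseReport and the helper formatMillis are hypothetical. Note that getCollectionTime() returns the approximate accumulated elapsed collection time, which for concurrent collectors is not necessarily pure stop-the-world time, so the figures are only a rough counterpart of the log-based numbers above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints collection counts and accumulated time per collector.
final class GcPauseReport {

    /** Formats milliseconds in the "5m11s" / "45s" style used in this report. */
    static String formatMillis(long millis) {
        long totalSeconds = millis / 1000;
        long minutes = totalSeconds / 60;
        long seconds = totalSeconds % 60;
        return minutes > 0 ? minutes + "m" + seconds + "s" : seconds + "s";
    }

    public static void main(String[] args) {
        long totalMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            long count = Math.max(0, gc.getCollectionCount());
            long millis = Math.max(0, gc.getCollectionTime());
            totalMillis += millis;
            System.out.printf("%-24s collections=%d time=%s%n",
                    gc.getName(), count, formatMillis(millis));
        }
        System.out.println("Total accumulated GC time: " + formatMillis(totalMillis));
    }
}
```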

However, if we analyse the operations of the GC in each run, additional information is revealed:

[Figure: c01]

[Figure: c02]

[Figure: c03]

[Figure: c04]

Again, G1GC offers the lowest GC performance rate while making the most pauses and taking the most time. The default Parallel collector displays the highest rate of cleared memory as well as the fewest pauses and the shortest cumulative pause time (45s); however, all of them were Stop-The-World (STW) pauses, which means that all application threads were stopped.

Lastly, the statistics for the CMS garbage collectors reveal the huge impact of Young Generation sizing. With the default, small size of Young Gen (c04; the default seems to have changed between JVM7 and JVM8), total pauses took 3m20s and only 22s of those GC activities were not STW pauses (i.e. the GC proceeded concurrently with application threads); additionally, GC performance was relatively slow and only slightly better than with G1GC. On the other hand, enforcing the Young Generation ratio by setting NewRatio to 2 decreased GC pauses to 70s. While this is still higher than for the Parallel collector, it has the advantage that only 22s of those pauses were STW pauses (roughly half the time for which the Parallel collector stopped the application).

2.3. hw1/hw2 installation (real hardware, Xeon W3530 4c/8t 2.8 GHz, 48 GB RAM, high load)

All machines had the same Tigase and JVM heap size settings (Xms=Xmx=5G) but used different GC settings:

A. Parallel GC (hw1) vs CMS with explicit YoungGeneration size (hw2)

- hw1: Default GC (Parallel collector for both Young and Old generations, i.e. -XX:+UseParallelGC -XX:+UseParallelOldGC)
- hw2: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:-ReduceInitialCardMarks -XX:NewRatio=2)

[Figure: HW1]

[Figure: HW2]

[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: While allocation patterns are quite similar, a closer look at the GC statistics shows that CMS with YoungGeneration set to 1/3 of the Heap size, operating on real hardware (with more available threads), performs better than the default Parallel GC: it needs less time than the Parallel collector (almost 15s less, 52s vs 66s) and, in addition, only roughly half of this time was spent in STW pauses.

B. CMS with default Tigase settings (hw1) vs CMS with explicit YoungGeneration size (hw2)

- hw1: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with default Tigase settings (i.e. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)
- hw2: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=3 -XX:NewRatio=2 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)

[Figure: HW1]

[Figure: HW2]

[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: On real hardware with the CMS collector, the JVM seems to allocate more heap space for the Young Generation (ratio-wise) than on a VM, yet still less than the defaults that the javadoc suggests. Enforcing NewRatio=2 cuts pause time in half, from 110s to 55s, reduces the pause count by more than half and, in addition, results in better overall GC performance.
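For reference, NewRatio is the ratio of Old to Young Generation size, so NewRatio=2 gives the Young Generation one third of the Heap. The arithmetic can be sketched as follows; the class and method names are hypothetical, and the JVM additionally aligns generation boundaries, so real sizes may differ slightly from these nominal values.

```java
// Hypothetical helper illustrating the nominal NewRatio split of the heap.
final class GenerationSizing {

    /** Nominal Young Generation size: heap / (NewRatio + 1). */
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Nominal Old Generation size: whatever the Young Generation leaves. */
    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 5L * 1024 * 1024 * 1024; // Xms=Xmx=5G as on hw1/hw2
        int newRatio = 2;                    // -XX:NewRatio=2
        System.out.printf("Young: %d MB, Old: %d MB%n",
                youngGenBytes(heap, newRatio) >> 20,
                oldGenBytes(heap, newRatio) >> 20);
    }
}
```

With a 5G Heap this yields roughly 1.67G for the Young Generation and 3.33G for the Old Generation.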

C. G1GC vs CMS with explicit NewRatio

- hw1: G1 garbage collector (i.e. -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=2 -XX:InitiatingHeapOccupancyPercent=35 -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=4 -XX:+UseG1GC)
- hw2: Concurrent Mark and Sweep (CMS) enabled (applies to Tenured space only) with NewRatio explicitly set to its default value of 2 (i.e. -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=3 -XX:NewRatio=2 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC)

Tigase Monitor shows higher average CPU usage with the G1 collector (roughly 20 p.p. higher than with CMS), which also raises the risk of queues overflowing.

[Figure: HW1]

[Figure: HW2]

[Figure: GC statistics from HW1 (above) and HW2 (below)]

Comments: In addition to the higher CPU utilisation observed in Monitor, we can also see that G1 activity is higher under load (around 5%). In terms of pauses, G1 has an almost 20 times higher pause time while offering almost 20 times lower GC performance, resulting in lower throughput.

3. Conclusions and recommendations

Summarising all the above and adding a couple of pointers:

- Garbage Collection is faster the more dead objects occupy a given space; therefore on a high-traffic installation it's better to have a rather large YoungGen, resulting in fewer objects being promoted to the OldGen;
- Using a Heap size adjusted to actual usage is better: the larger the heap, the larger the spaces over which collection needs to be performed, resulting in longer pauses. For huge heaps the G1 collector may be a better solution to avoid longer pauses;
- With JVM8 the default sizing of the Young / Old generations changed, even though NewRatio still defaults to 2:

    $ java -server -XX:+PrintFlagsFinal -version | grep "NewRatio"
    intx NewRatio = 2 {product}
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

    $ java -server -XX:+PrintFlagsFinal -version | grep "OldSize"
    uintx OldSize = 5439488 {product}
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

    $ /usr/lib/jvm/jdk1.8.0_11/bin/java -server -XX:+PrintFlagsFinal -version | grep "NewRatio"
    uintx NewRatio = 2 {product}
    Java(TM) SE Runtime Environment (build 1.8.0_11-b12)

    $ /usr/lib/jvm/jdk1.8.0_11/bin/java -server -XX:+PrintFlagsFinal -version | grep "OldSize"
    uintx OldSize := 63700992 {product}
    Java(TM) SE Runtime Environment (build 1.8.0_11-b12)

- The Statistics API in Tigase was not optimised and therefore (especially by retaining statistics history) increased the promotion rate to Tenured space.
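The same sizing flags can be checked from inside a running JVM, which is handy when comparing installations that run different Java versions. This is a minimal sketch, assuming a HotSpot-based JVM that ships the com.sun.management API; the class name FlagReport and the method describe are hypothetical. The reported origin (for example DEFAULT or ERGONOMIC) gives a rough counterpart of the "=" versus ":=" distinction in the -XX:+PrintFlagsFinal output.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the effective value of selected HotSpot flags.
final class FlagReport {

    /** Returns "name = value (origin: ...)" or a marker when the flag is unavailable. */
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return flagName + " = <no HotSpot diagnostic bean>";
        }
        try {
            VMOption option = bean.getVMOption(flagName);
            return flagName + " = " + option.getValue()
                    + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when this JVM does not know the flag.
            return flagName + " = <unknown flag>";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"NewRatio", "OldSize", "NewSize", "MaxNewSize"}) {
            System.out.println(describe(flag));
        }
    }
}
```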