Monitoring and Managing a JVM


Monitoring and Managing a JVM Erik Brakkee & Peter van den Berkmortel

Overview
- About Axxerion
- Challenges and example
- Troubleshooting
- Memory management
- Tooling
- Best practices
- Conclusion

About Axxerion
- Axxerion is an Integrated Workplace Management System that aims to:
  - make organizations more efficient
  - enable collaboration
  - adapt to an organization
- The Axxerion organization aims to offer employees a stable and innovative workplace

Axxerion Metrics
- 10 developers
- 30 consultants
- 100 virtual and physical servers
- 14,000 monitored items
- 62 items per second
- 6 clusters
- 300+ clients in 14 countries
- 80,000 users

Middleware Stack
- CentOS 6 & 7
- KVM virtualization (standard Linux)
- MySQL 5.6
- JBoss 5
- Java EE 5
- Java 8
- EclipseLink (JPA 2)

Challenges
- Multi-tenancy
  - Trace back issues
- Load
  - Some issues only occur under load
  - Difficult to reproduce in test
- After-the-fact troubleshooting
- Monitoring many variables
  - Is there a problem?
  - Distinguish between cause and effect
  - Pro-active
  - Self-defeating monitoring

Example of an Incident
One year of troubleshooting
- Crashes in native code
  - Replace native libraries
- Application logs freeze
  - Automatic reload of log configuration
- Heap space problems
  - Finalizer queue size monitoring
  - Introduced our own log appender
- Server generates 25 million exceptions per hour
  - Introduced exception log analysis

Example of an Incident
One year of troubleshooting
- Server log fragments end up in unexpected places
- File descriptor errors
- Final solution found through monitoring and reverse thinking
- All of this was caused by an object cloner

Troubleshooting
After-the-fact
- Which actions led to a problem?
- Users do not remember what they were doing
- Therefore, we log a lot
  - Every login & logout
  - Every user action
  - Every update of a field of an object
  - Anything else that can be useful

Troubleshooting
Event logging
- Start and end statements
- At one-minute intervals
- Number of bytes allocated
- CPU usage
- Allows troubleshooting of individual threads

Troubleshooting
Event logging (based on ThreadMXBean)

2015-11-03 13:29:23,509 FINE [com.axxerion.performance] (rp-worker-3821) performance @id 395972 @name rp-worker-3821 @usertime 0 @cputime 231809 @allocated 1336 @state RUNNABLE @blockedcount 0 @blockedtime -1 @waitedcount 0 @waitedtime -1 @lockinfo null @subevent workertask.start @realtime 9615399862973957
2015-11-03 13:29:23,559 FINE [com.axxerion.performance] (rp-worker-3821) performance @id 395972 @name rp-worker-3821 @usertime 10000000 @cputime 20074642 @allocated 2361888 @state RUNNABLE @blockedcount 11 @blockedtime -1 @waitedcount 1 @waitedtime -1 @lockinfo null @subevent workertask.end @realtime 9615399912328419
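The fields in these log lines map closely onto what the standard ThreadMXBean exposes. The following is a minimal sketch of how such a line could be produced for the current thread; the class name ThreadPerformanceEvent and the method snapshot are hypothetical, and the allocated-bytes counter relies on the HotSpot-specific com.sun.management.ThreadMXBean extension, so it is reported as -1 on JVMs that do not provide it. It is not the Axxerion implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that formats one performance event line for the current thread. */
final class ThreadPerformanceEvent {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Returns a line in the style of the log sample, e.g. with subevent "workertask.start". */
    static String snapshot(String subevent) {
        Thread current = Thread.currentThread();
        long id = current.getId();
        ThreadInfo info = THREADS.getThreadInfo(id);

        // CPU and user time are reported in nanoseconds; -1 marks "not available".
        boolean cpuSupported = THREADS.isCurrentThreadCpuTimeSupported();
        long cpu = cpuSupported ? THREADS.getCurrentThreadCpuTime() : -1;
        long user = cpuSupported ? THREADS.getCurrentThreadUserTime() : -1;

        // Allocated bytes per thread is a HotSpot extension, not part of the standard interface.
        long allocated = -1;
        if (THREADS instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) THREADS;
            if (hotspot.isThreadAllocatedMemorySupported() && hotspot.isThreadAllocatedMemoryEnabled()) {
                allocated = hotspot.getThreadAllocatedBytes(id);
            }
        }

        // Blocked and waited times stay at -1 unless thread contention monitoring is enabled.
        return "performance @id " + id
                + " @name " + current.getName()
                + " @usertime " + user
                + " @cputime " + cpu
                + " @allocated " + allocated
                + " @state " + info.getThreadState()
                + " @blockedcount " + info.getBlockedCount()
                + " @blockedtime " + info.getBlockedTime()
                + " @waitedcount " + info.getWaitedCount()
                + " @waitedtime " + info.getWaitedTime()
                + " @lockinfo " + info.getLockInfo()
                + " @subevent " + subevent
                + " @realtime " + System.nanoTime();
    }
}
```

Calling snapshot("workertask.start") before a task and snapshot("workertask.end") after it gives two lines whose difference in @cputime and @allocated describes the cost of that task on that thread.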

Troubleshooting
Exception logging
- Bulk approach
- Treat exceptions as similar when their stack frames match
  - Ignore message
- Custom log appender/handler
  - Periodically write hash data
  - Use hash as key
  - Store hash on disk with full (first seen) exception

Troubleshooting

com.axxerion.fault: error_no_administrator_defined: client axdsr1
    at com.axxerion.server.b.b.execute(directmessagequeueentry.java:180)
    at com.axxerion.server.b.b.execute(directmessagequeueentry.java:31)
    at com.axxerion.r.execute(runnableexecutablewrapper.java:38)
    ... 10 more
Caused by: Fault: error_no_administrator_defined: client axdsr1
    at com.axxerion.server.util.contactutil.getadministratorsystemuserid(contactutil.java:384)
    ... 8 more

- Git principle: content-addressable storage (hash == object)
- Secure hash based on exception class, stack frames, cause of exception (recurse)
- Log the hash in exception.log file
- Simple hash script can add up similar hashes
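The hashing idea can be sketched as follows. This is an illustration under assumptions, not the speakers' code: the class name ExceptionHasher is hypothetical, SHA-1 is chosen only because the 40-character hashes shown in the talk have that length, and the cause chain is walked with a bounded loop instead of true recursion so that a cyclic cause chain cannot hang the logger. The exception message is deliberately left out of the hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hashes an exception by class, stack frames and causes, ignoring messages. */
final class ExceptionHasher {
    private static final int MAX_CAUSE_DEPTH = 32; // assumed safety limit for cause chains

    static String hash(Throwable throwable) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            int depth = 0;
            for (Throwable cur = throwable; cur != null && depth < MAX_CAUSE_DEPTH; cur = cur.getCause()) {
                update(sha1, cur.getClass().getName());
                for (StackTraceElement frame : cur.getStackTrace()) {
                    update(sha1, frame.getClassName() + "." + frame.getMethodName() + ":" + frame.getLineNumber());
                }
                depth++;
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    private static void update(MessageDigest digest, String part) {
        digest.update(part.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0); // separator so adjacent parts cannot run together
    }
}
```

Two exceptions thrown from the same code path with different messages (for example, different client names) produce the same hash, which is what makes counting per hash meaningful.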

Troubleshooting
Script output

Processing /var/ /log/exceptions.log...
Most occurring exceptions
Occurrences: 1647
Hash: 2d4eed091276af79d526e88115f68a82bbb0c1de
First exception of this kind seen:
Time: 2015-09-28 23:30:20.954301678 +0200
Level: INFO
Log file sample
2015-11-02 16:28:16,184 INFO [com.axxerion.exceptiontracker] stats 2d4eed091276af79d526e88115f68a82bbb0c1de INFO occurrences delta 2 cumulative 1101
xyz.svc.webservices.data.serviceresponseexception: The specified object was not found
    at xyz.svc.ws.data.serviceresponse.internalthrow(serviceresponse.java:266)
    at xyz.svc.ws.data.request.execute(request.java:152)
    at xyz.svc.ws.data.svcservice.lookupitems(service.java:1364)
    at xyz.svc.ws.data.svcservice.bindtoitem(service.java:1407)
Total number of unique exceptions: 160
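Adding up occurrences per hash from such "stats" lines needs very little code. The sketch below assumes the line layout shown in the log file sample (hash, level, then "occurrences delta N cumulative M"); the class name ExceptionStats and the regular expression are illustrative assumptions, and the original script may well have been written in another language.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical aggregator that sums the "delta" counts per exception hash. */
final class ExceptionStats {
    // Assumed layout: "... stats <40 hex chars> <LEVEL> occurrences delta <n> cumulative <m>"
    private static final Pattern STATS =
            Pattern.compile("stats ([0-9a-f]{40}) \\w+ occurrences delta (\\d+) cumulative (\\d+)");

    static Map<String, Long> sumDeltas(List<String> logLines) {
        Map<String, Long> totals = new HashMap<>();
        for (String line : logLines) {
            Matcher m = STATS.matcher(line);
            if (m.find()) {
                totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
            // Lines without a stats record (stack frames, other log output) are skipped.
        }
        return totals;
    }
}
```

Sorting the resulting map by value in descending order would give a "most occurring exceptions" report of the kind shown in the script output.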

Troubleshooting
Troubleshooting experience
- Ad-hoc scripting
- Some issues take days, others might take years
- Issues getting harder to find
Psychology
- Relax
- Switch to stand-by
- Gather data
  - This might be your only chance
- Get out of the denial phase
- Reverse thinking

Memory Management
Some essential tips
- GC logs
- Heap usage
- Heap fragmentation
  - Initiating occupancy fraction
  - Maximum chunk size
- Trigger full GCs
  - jmap -histo:live

Monitoring
Identify core parameters
- User experience
  - Response time
  - Outstanding requests
- Technical
  - Multiple full GCs within a short time frame
  - Failing scheduled tasks (e.g. backups, restores)
See also techblog.netflix.com for several ideas and tooling around monitoring.
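The "multiple full GCs within a short time frame" signal can be expressed as a sliding-window check. The following is a minimal sketch; the class name FullGcBurstDetector and the threshold and window parameters are assumptions chosen for illustration, and the talk does not say how the actual trigger is implemented in Zabbix.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical detector: flags when at least `threshold` full GCs fall within `windowMillis`. */
final class FullGcBurstDetector {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> recent = new ArrayDeque<>();

    FullGcBurstDetector(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    /** Records one full GC at the given time and reports whether a burst is in progress. */
    synchronized boolean record(long timestampMillis) {
        recent.addLast(timestampMillis);
        // Drop events that have fallen out of the window ending at the newest event.
        while (timestampMillis - recent.peekFirst() > windowMillis) {
            recent.removeFirst();
        }
        return recent.size() >= threshold;
    }
}
```

One possible source of timestamps is a periodic poll of GarbageCollectorMXBean.getCollectionCount() for the old-generation collector, recording an event each time the count increases; which bean represents full collections depends on the garbage collector in use.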

Tooling
Zabbix
- Flexible
- Easily customizable
- Monitor large numbers of servers

Tooling
Technical approach
- Middleware, OS, Database
  - Use standard items
  - Use your own scripts
- Application level
  - Use JMX as much as possible
  - Internal statistics service

Tooling
JStack
- Annotate and sort threads with script
- Deadlock detection and debugging
- Stack dump (jstack -l) has negligible disruption
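Deadlock detection is also available from inside the JVM through ThreadMXBean, which can complement jstack when the check should run periodically. The sketch below is an in-process alternative, not what the talk describes; the class name DeadlockCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-process deadlock check based on the standard ThreadMXBean. */
final class DeadlockCheck {
    /** Returns one description per deadlocked thread, or an empty list when there is no deadlock. */
    static List<String> deadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks, where the JVM supports that.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result; // both methods return null when no deadlock is found
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info != null) {
                result.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }
}
```

The result could be exposed as a monitored item, so that a non-empty list raises an alert and prompts a full jstack -l dump for debugging.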

Tooling
JMap
- Heap dumps
  - jmap -dump:format=b,file=$dumpfile $pid: stop-the-world up to 3 minutes
  - After disaster
- Histogram
  - jmap -histo: negligible disruption
  - After-the-fact analysis
- Trigger full GC
  - jmap -histo:live: stop-the-world up to 30 seconds
  - Avoid heap fragmentation

Tooling
VisualVM (jvisualvm)
- MBean monitoring
- CPU sampling (5s interval)
- Do not use
  - Memory sampling
  - Profiling

Tooling
Eclipse Memory Analyzer
- Works with 17 GB memory dumps
- Has saved the day
- Run on separate machine

Wishlist
Missing JVM features
- Stop running threads
- Deallocated bytes per thread
- Metaspace GC can trigger full GC
Garbage collection
- Compacting
- Predictable performance
- Guaranteed no stop-the-world

Best Practices
- Simple statements can kill your server
  - System.getProperty()
  - exception.toString()
  - InetAddress.getLocalHost().getHostName()
- Third-party libraries
  - Reflection bottleneck
- Risk management
  - Runtime control over new features
- Assure a loop or recursion breaks off
- Do not use finalizers
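A call such as InetAddress.getLocalHost().getHostName() looks harmless but can involve a name-service lookup each time it runs, which hurts when it sits on a hot path such as a log statement. One way to defuse it, sketched here under the assumption that the host name does not change while the server is running, is to resolve it once and reuse the result. The class name HostNameCache and the fallback value are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical cache: resolves the local host name once, on first use. */
final class HostNameCache {
    /** Initialization-on-demand holder: the JVM runs resolve() once, thread-safely. */
    private static final class Holder {
        static final String NAME = resolve();
    }

    static String get() {
        return Holder.NAME;
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // assumed fallback so that callers never have to handle failure
        }
    }
}
```

The trade-off is that a host name change is not noticed until the JVM restarts, which is usually acceptable for logging and monitoring purposes.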

Conclusion
Results
- Server stability (99.998% uptime)
- Getting better
Troubleshooting
- A lot of tools required
- Do not assume anything
- Go top-down
- Sometimes more issues than one
- Cause or effect
There is no silver bullet

techblog.axxerion.com
www.axxerion.com/nl/careers/
Please rate my talk in the official J-Fall app