More Coherence War Stories. Patrick Peralta, Oracle

Similar documents
Practical Performance Understanding the Performance of Your Application

Tomcat Tuning. Mark Thomas April 2009

WEBLOGIC ADMINISTRATION

Oracle Corporation Proprietary and Confidential

TDA - Thread Dump Analyzer

Apache Tomcat Tuning for Production

Hands-On Microsoft Windows Server 2008

An Oracle White Paper September Advanced Java Diagnostics and Monitoring Without Performance Overhead

THE BUSY DEVELOPER'S GUIDE TO JVM TROUBLESHOOTING

Resource Aware Scheduler for Storm. Software Design Document. Date: 09/18/2015

J2EE-JAVA SYSTEM MONITORING (Wily introscope)

Monitoring and Managing a JVM

1 How to Monitor Performance

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Tuning WebSphere Application Server ND 7.0. Royal Cyber Inc.

Oracle JRockit Mission Control Overview

Web Server (Step 1) Processes request and sends query to SQL server via ADO/OLEDB. Web Server (Step 2) Creates HTML page dynamically from record set

Delivering Quality in Software Performance and Scalability Testing

Holly Cummins IBM Hursley Labs. Java performance not so scary after all

Zing Vision. Answering your toughest production Java performance questions

Troubleshoot the JVM like never before. JVM Troubleshooting Guide. Pierre-Hugues Charbonneau Ilias Tsagklis

How To Improve Performance On An Asa 9.4 Web Application Server (For Advanced Users)

MID-TIER DEPLOYMENT KB

What s Cool in the SAP JVM (CON3243)

Transaction Performance Maximizer InterMax

1 How to Monitor Performance

Essentials of Java Performance Tuning. Dr Heinz Kabutz Kirk Pepperdine Sun Java Champions

Oracle WebLogic Server 11g Administration

CHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS

A Practical Method to Diagnose Memory Leaks in Java Application Alan Yu

PeopleSoft Online Performance Guidelines

my forecasted needs. The constraint of asymmetrical processing was offset two ways. The first was by configuring the SAN and all hosts to utilize

Memory Profiling using Visual VM

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

BridgeWays Management Pack for VMware ESX

Java Troubleshooting and Performance

AGENDA. Introduction About Weblogic Server Weblogic Server Administration Top Ten Concepts Q & A

talent. technology. true business value

Resource Monitoring During Performance Testing. Experience Report by Johann du Plessis. Introduction. Planning for Monitoring

Java Performance. Adrian Dozsa TM-JUG

Bellwether metrics for diagnosing

ZooKeeper Administrator's Guide

Troubleshooting Citrix MetaFrame Procedures

Java Monitoring. Stuff You Can Get For Free (And Stuff You Can t) Paul Jasek Sales Engineer

Weblogic Server Administration Top Ten Concepts. Mrityunjay Kant, AST Corporation Scott Brinker, College of American Pathologist

Monitoring.NET Framework with Verax NMS

IBM Tivoli Monitoring Version 6.3 Fix Pack 2. Infrastructure Management Dashboards for Servers Reference

Whitepaper: performance of SqlBulkCopy

Oracle WebLogic Server 11g: Monitor and Tune Performance

Apache and Tomcat Clustering Configuration Table of Contents

An Oracle White Paper September, Enterprise Manager 12c Cloud Control: Monitoring and Managing Oracle Coherence for High Performance

Open Mic on IBM Notes Traveler Best Practices. Date: 11 July, 2013

Tuning Your GlassFish Performance Tips. Deep Singh Enterprise Java Performance Team Sun Microsystems, Inc.

Resource Utilization of Middleware Components in Embedded Systems

Code:1Z Titre: Oracle WebLogic. Version: Demo. Server 12c Essentials.

Monitoring Agent for Microsoft Exchange Server Fix Pack 9. Reference IBM

MagDiSoft Web Solutions Office No. 102, Bramha Majestic, NIBM Road Kondhwa, Pune Tel: /

Network Security. Marcus Bendtsen Institutionen för Datavetenskap (IDA) Avdelningen för Databas- och Informationsteknik (ADIT)

Enterprise Application Performance Monitoring with JENNIFER

Agile Performance Testing

ELIXIR LOAD BALANCER 2

WebLogic Server System Administration Top Ten Fundamentals Concepts Session ID# 11579

IBM WebSphere Server Administration

Advanced Performance Forensics

Fortinet Network Security NSE4 test questions and answers:

Java Mission Control

HeapStats: Your Dependable Helper for Java Applications, from Development to Operation

Performance Optimization For Operational Risk Management Application On Azure Platform

WINDOWS SERVER MONITORING

Top 10 Performance Tips for OBI-EE

Sample. WebCenter Sites. Go-Live Checklist

Informatica Master Data Management Multi Domain Hub API: Performance and Scalability Diagnostics Checklist

High-Availability. Configurations for Liferay Portal. James Min. Senior Consultant / Sales Engineer, Liferay, Inc.

EMC KAZEON E-DISCOVERY: PERFORMANCE TUNING GUIDELINES AND BEST PRACTICES

Copyright 1

Oracle WebLogic Thread Pool Tuning

System Administration Guide

TOE2-IP FTP Server Demo Reference Design Manual Rev1.0 9-Jan-15

Liferay Performance Tuning

PERFORMANCE TUNING ORACLE RAC ON LINUX

Veeam ONE What s New in v9?

How To Monitor A Server With Zabbix

FileNet System Manager Dashboard Help

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

PTC System Monitor Solution Training

McAfee Web Gateway 7.4.1

WebSphere Performance Monitoring & Tuning For Webtop Version 5.3 on WebSphere 5.1.x

Monitoring HP OO 10. Overview. Available Tools. HP OO Community Guides

Java Performance Tuning

WebSphere Server Administration Course

JProfiler: Code Coverage Analysis Tool for OMP Project

Advanced Liferay Architecture: Clustering and High Availability

Caching SMB Data for Offline Access and an Improved Online Experience

WebSphere Architect (Performance and Monitoring) 2011 IBM Corporation

Understand Troubleshooting Methodology

Introduction to Spark and Garbage Collection

Mission-Critical Java. An Oracle White Paper Updated October 2008

Implementing a Well- Performing and Reliable Portal

How To Use Java On An Ipa (Jspa) With A Microsoft Powerbook (Jempa) With An Ipad And A Microos 2.5 (Microos)

Contents. Digital Video Surveillance Basic Troubleshooting Guide

Transcription:

More Coherence War Stories Patrick Peralta, Oracle

Have you ever seen this? Experienced a 4811 ms communication delay

Or this? Timeout while delivering a packet

Why does it happen?

Packet Delivery 35 34 33 3 6 ACK (33,34,35)

Packet Delay 35 34 33 3 6 ACK (33,35) 34 3 6 ACK (34)

Packet Timeout 3 6 33 33 33

Witness Protocol Witness 2 6 Suspect Accuser 3 1 Witness

Why does it happen? Packet delivery failure But what are the common root causes?

Network Disconnect

Let s take this off-line Customer had communication delays and timeouts every night at the same time It turned out that firewall rules are configured every night which resulted in a network disconnect

Lessons Learned Avoid assumptions about the network TALK to your network infrastructure team Run the datagram test to ensure your network will provide adequate performance for your application

Network Bandwidth Exceeded

The perfect storm 120 node cluster Near cache with invalidation strategy all Large keys (100K) Calling clear caused flood of invalidation events, exceeding network capacity

Lessons Learned Monitor the amount of traffic passing through the network interface Coherence assumes small keys; don t use large objects as keys Avoid heavy map listeners that listen to all events, consider a lite listener and/or MapEvent filters

Extreme CPU Consumption

Missing index A missing index was not detected during application testing In production, the missing index resulted in very high CPU usage

Lessons Learned Ensure that indexes are created for filters Monitor CPU usage on the box High CPU usage from other processes may also result in packet loss

Swapping

Tag team A commerce customer uses two application server clusters for their site While one of the clusters is live, the other cluster is updated with site changes At midnight, the load balancer switches to the passive cluster and the active cluster is shut down

Tag team Each box contained two application servers - one for each cluster There wasn t enough physical memory to run both at the same time When both servers were running during switchover at midnight, Coherence would report packet loss

Why is swapping bad? GC algorithms scan the entire heap and move objects around Generational GC Compaction / defragmentation If the collector has to wait for memory to be paged in from disk, it will take longer for the collector to finish

GC while swapping 2008/09/17 10:19:53 [GC 931996K->856503K(1017024K), 0.0758996 secs] 2008/09/17 10:19:55 [GC 659228K->589962K(1014848K), 0.0395841 secs] 2008/09/17 10:19:59 [GC 931006K->859644K(1020928K), 0.0287045 secs] 2008/09/17 10:20:26 [GC 938176K->865107K(1021376K), 19.7179554 secs] 2008/09/17 10:20:29 [GC 660588K->591451K(1014912K), 2.5831922 secs] 2008/09/17 10:20:33 [GC 943524K->861061K(1030528K), 3.8399128 secs] 2008/09/17 10:20:34 [GC 947829K->863184K(1030208K), 0.0404099 secs]

Lessons Learned Ensure that JVM memory allocation does not exceed physical memory Remember that JVM heap size < JVM memory footprint

Garbage Collection

Reduce, Reuse, Recycle Modern 1.6 VMs are very good at GC Most trouble is caused by running with a full heap (> 66%) Full heaps cause the garbage collector to work harder

Full Heap

GC s cousin: OOME OutOfMemoryError is a common cause for havoc in production environment It may be caused by Running out of heap Running out of off-heap (NIO) storage

Causes of heap OOME we ve seen Filling the cache and/or near cache without a proper high-units setting Application memory leaks Slow consumers causing a queue backlog Accessing a HashMap from multiple threads without synchronization

What to do with OOME If OOME is thrown because heap is full, don t waste your time speculating why! Instead, generate a heap dump - this is the fastest way to resolve OOME issues

Heap Dump

Lessons Learned

JVM Upgrade to JDK 1.6 Improved GC Heaps can be larger (4 to 6 GB) But don t fill them to capacity!

JVM Flags Log verbose GC - with timestamps Generate heap dump when an OutOfMemoryError is thrown JVM kill switch OutOfMemoryError See your JVM s manual or the Coherence production checklist for specific flags

Summary of Best Practices Coherence Logging JMX / JMX Reporter OS Monitoring

Log files Some problems with Coherence require multiple logs to track down and resolve Best practice: submit log files for the entire cluster when submitting a service request Don t just cherry pick the lines you think are important!

Log files Logs go to STDERR by default If running a stand alone JVM, consider using Log4j to rotate logs Don t combine log files from different cluster members - this makes it harder for support to analyze

Script files You should have script files to Gather logs from the entire cluster Generate thread dumps on demand Always take more than one

JMX Reporter JMX Reporter is a feature in Coherence that will log JMX statistics to CSV files This information can be very useful in diagnosing cluster issues

JMX Reporter

OS Monitoring OS monitoring is a must! CPU Swap file usage (should be 0%) NIC utilization

Unix/Linux top vmstat

SAR and ksar

Windows perfmon sysinternals

Conclusion JVMs don t work in a vacuum - be aware of their surroundings! Monitoring of JVM and OS is key The more data you capture, the better

Tools Eclipse MAT (http://www.eclipse.org/mat/) JVisualVM (https://visualvm.dev.java.net/) ksar (http://ksar.atomique.net/) sar (http://pagesperso-orange.fr/sebastien.godard/)