WAS Performance on i5/os Lisa Wellman peace@us.ibm.com May 2010
A simplified view: major WAS functions widely used
- Administered Java runtime environment
- HTTP request routing
- Web container
  - Web thread pool
  - Servlet (and JSP) lifecycle
- Database connections
  - Pooling connections, prepared statements
- Security
  - Authentication, authorization
- Administration, application transaction control
- EJB container
- JMS services
- Web services
- Etc.
WAS is middleware; it doesn't do anything without an application.
Diagram: HTTP Server -> WebSphere Application Server (Web container, Database connection pool) -> Database
WebSphere Application Server performance
WAS Queue Funnel
- Queues
  - HTTP Server threads
  - Web container threads
  - ORB pool
  - Data source pools
  - Etc.
- Queues get smaller as the request goes deeper into the system
  - Better to wait near the network
  - Don't overload the system; bigger is not always better
  - Some requests are serviced without backend resources
- Reference: http://publib.boulder.ibm.com/infocenter/wsdoc400/v6r0/index.jsp
  - Tuning performance > Tuning the application server environment > Queuing network
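The funnel rule above (each queue no larger than the one in front of it) can be checked mechanically. The following is a minimal sketch; the pool names and sizes are illustrative assumptions, not recommended settings, and `funnel_violations` is a hypothetical helper.

```python
# Hypothetical pool sizes, ordered from the network edge inward.
# The numbers are illustrative assumptions, not tuning advice.
funnel = [
    ("HTTP server threads", 75),
    ("Web container threads", 50),
    ("Data source connections", 20),
]


def funnel_violations(queues):
    """Return (outer, inner) name pairs where the deeper queue is larger."""
    problems = []
    for (outer_name, outer_size), (inner_name, inner_size) in zip(queues, queues[1:]):
        if inner_size > outer_size:
            problems.append((outer_name, inner_name))
    return problems


print(funnel_violations(funnel))
```

An empty list means every queue is the same size or smaller than the one before it, so requests wait near the network rather than deep in the system.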
Typical web application queues
Performance Tools for WAS Environments
Tools for underlying infrastructure (Java, OS) plus:
- Performance Monitoring Infrastructure (PMI)
- Tivoli Performance Viewer (TPV)
- i5/os Web Admin GUI
  - HTTP real time stats
  - Web performance advisor (WPA)
  - Web performance monitor (WPM)
Tivoli Performance Viewer (TPV)
- TPV is a way of viewing PMI data
- Impact of each level
  - See documentation for each counter: Monitoring > Monitoring overall system health > Performance Monitoring Infrastructure (PMI) > PMI data organization (link at bottom of page) > counter pages (links at bottom of page), overhead column
  - Never use JVMPI Level All, any JVM subcategories
- Recommend enabling PMI service
  - Levels can be dynamically set as needed
  - Use basic or custom levels
- Demo
i5/os Web Administration GUI (port 2001) HTTP server real time statistics Web performance advisor (WPA) Web performance monitor (WPM)
HTTP data also in Collection Services Average response time in sec
Web performance advisor (WPA) Looks at static configuration information Not tailored to your load Checks basic settings on system, HTTP server, WAS Gives recommendations and allows acceptance Advice gives information Like having a performance expert review your configuration, and provide a report Import/export capability
Web performance advisor (WPA) HW, TCP, etc HTTP, WAS config
Web performance monitor (WPM) Uses ARM under the covers Restarts the HTTP server and WAS in order to enable ARM Restart again to turn it off ARM overhead ~20% See performance at different layers HTTP, WAS, DB2 Can filter data (e.g. a client IP)
Web performance monitor (WPM)
- Turn off autostart first
- HTTP and Application Server are restarted!!!
- ARM is enabled (20% overhead)
Web performance monitor (WPM) Look at CPU and response time Hit Refresh Threads Transactions
Web performance monitor (WPM) Monitor transaction for specific IP (Client)
Web performance monitor (WPM) Use it for clients with long response times, to see where time is spent (HTTP, application, DB)
Other Java-based tools
- ITCAM (IBM Tivoli Composite Application Manager)
- WSAD profiler
- 3rd party, such as Wily Introscope
- Open source / freeware?
Review Monitor WAS queuing network with Performance Monitoring Infrastructure (PMI) Tivoli Performance Viewer (TPV) i5/os Web Admin GUI HTTP real time stats Web performance advisor (WPA) Web performance monitor (WPM)
Java performance
IBM i JVM options
WAS 6.0 EOS: September 30, 2010

WAS \ i5/os   V5R3      V5R4                 V6R1                            V7R1
6.0           Classic   Classic              Classic                         NA
6.1           Classic   Classic, IT 32-bit   Classic, IT 32-bit, IT 64-bit   IT 32-bit, IT 64-bit
7.0           -         Classic, IT 32-bit   Classic, IT 32-bit, IT 64-bit   IT 32-bit, IT 64-bit
32-bit vs. 64-bit
- Maximum heap size
  - 32-bit gives limited heap size due to limited pointer addressability
    - ~4GB theoretical limit for the entire job
    - In reality, WAS limits Java heap to ~2GB or less (1.5GB is safe)
  - 64-bit gives unlimited heap from a practical perspective
- Runtime heap size
  - Smaller pointers result in a smaller heap
  - Smaller heap means better performance
Reduced memory requirements Classic JVM (64-bit) Footprint can be large due to a few factors 64-Bit JVM requires internal pointers to be twice as large Asynchronous GC can cause heap to get larger Implementation could not move Java objects to compact heap IBM Technology for Java JVM (32-Bit) IT JVM has about a 40% smaller memory footprint 32-Bit JVM has smaller addressability Stop-the-world GCs are performed when heap approaches max Implementation can move Java objects, allowing a heap compaction
IT JVM Garbage Collection
GC Policies
Specify with -Xgcpolicy, for example -Xgcpolicy:optavgpause
- optthruput (default)
  - Gives best performance overall
  - No concurrent mark or sweep (completely STW)
- optavgpause
  - Use to reduce STW GC pause times
  - Uses concurrent mark and sweep
- gencon
  - Generational Concurrent Garbage Collector
  - Gencon has been working well for WAS environments
  - Good for apps with many short-lived objects
  - Objects are created in the nursery, which is further split into allocate and survivor areas; "scavenge" is the term used for cleaning (GC) done in this area
  - Copying scheme of nursery reduces fragmentation
  - Adaptive size and tilting ratio
  - Objects are moved to the tenured area after reaching a threshold age
  - Uses concurrent mark in the tenured area; does not use concurrent sweep
- subpool
  - Scales well on very large multi-processor machines
  - Reduces contention on allocation lock by using many size-based free lists
  - No concurrent mark or sweep
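As a small illustration of how the policy names on this slide map to the JVM argument, here is a sketch that builds the option string and rejects names that are not on the slide. The helper name `gc_policy_argument` is hypothetical; the one-line descriptions are summaries of the slide text.

```python
# Policy names as listed on the slide for the IT JVM; descriptions are summaries.
GC_POLICIES = {
    "optthruput": "default; stop-the-world, no concurrent mark or sweep",
    "optavgpause": "concurrent mark and sweep to reduce pause times",
    "gencon": "generational; nursery plus tenured area",
    "subpool": "size-based free lists for very large multi-processor machines",
}


def gc_policy_argument(policy):
    """Build the generic JVM argument for one of the policies listed above."""
    if policy not in GC_POLICIES:
        raise ValueError(f"unknown GC policy: {policy}")
    return f"-Xgcpolicy:{policy}"


print(gc_policy_argument("gencon"))
```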
Tuning garbage collection
- Reasonable to just try different policies and measure throughput / pause times
- Other strategy is to turn on verbosegc and interpret the resulting data
  - -verbose:gc (or -verbosegc) writes its output to the standard error stream (native_stderr for WAS)
  - Use -Xverbosegclog:filename to direct output elsewhere
    - Best, since other output does not mess up the verbose GC format, so tools can read it
    - For WAS, I like to put the output in the logs folder, so I use -Xverbosegclog:logs/verbosegc
  - You must manage these files, WAS does not!
Recommendations
- Maximum heap size
  - Look at used heap, maximum value
  - Add 25% and set the value for the JVM
  - Max for WAS is 1.5-2.0GB
- Pause times
  - Check whether pause times are too long
  - If so, consider another GC policy
- Time between garbage collections
  - Look at intervals between garbage collections
  - If they are short (GC runs almost continuously), increase max heap size
- Compaction times
  - Look at compaction times
  - If there are too many compactions, try the gencon policy
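The maximum heap size rule (peak used heap plus 25%) is simple arithmetic. A minimal sketch follows; the function name, the sample values, and the optional cap are assumptions for illustration. The 1536 MB default cap reflects the "1.5GB safe" figure for 32-bit WAS mentioned earlier in the deck; pass `None` for a 64-bit JVM.

```python
def recommend_max_heap_mb(used_heap_samples_mb, headroom=0.25, cap_mb=1536):
    """Peak used heap plus headroom, optionally capped (cap_mb=None disables the cap)."""
    peak = max(used_heap_samples_mb)
    suggested = round(peak * (1 + headroom))
    if cap_mb is not None:
        suggested = min(suggested, cap_mb)
    return suggested


# Hypothetical "used heap after GC" readings in MB, e.g. taken from verbose GC output.
print(recommend_max_heap_mb([400, 512, 480]))
```

With these sample readings the peak is 512 MB, so the suggestion is 640 MB.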
Java Memory Usage Java in particular is adversely affected by paging GC must touch every object in the heap Disparate workloads result in more paging than similar workloads Separating workloads facilitates performance monitoring and tuning Additional memory, if any, is minimal Performance characteristics worth the cost
Java Memory Usage Separate workloads with memory pools, LPAR Do NOT allow automatic memory adjustment (system value QPFRADJ) Prefer to move memory with scheduled jobs if required (e.g. nightly batch jobs) If enabled, protect WAS pool with a sufficient minimum value (WRKSHRPOOL F11) or use a private pool Determine memory requirements by adding JVM sizes (heap AND native memory) Monitor paging, heap sizes, and GC cycles
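Determining memory requirements by adding JVM sizes can be sketched as below. The JVM names and sizes are made-up examples, and `pool_minimum_mb` is a hypothetical helper; the point is only that heap and native memory are both counted for every JVM in the pool.

```python
# Hypothetical JVMs sharing one memory pool; all sizes in MB are assumptions.
jvms = [
    {"name": "server1", "max_heap_mb": 1024, "native_mb": 300},
    {"name": "server2", "max_heap_mb": 768, "native_mb": 250},
]


def pool_minimum_mb(jvm_list):
    """Sum heap AND native memory for every JVM that runs in the pool."""
    return sum(jvm["max_heap_mb"] + jvm["native_mb"] for jvm in jvm_list)


print(pool_minimum_mb(jvms))
```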
Performance Tools for (IT) Java workloads CL commands for Java SST jvminfo macro Traces or dumps / IBM support assistant (ISA) tools verbosegc / GC and Memory Visualizer Java dump / Thread Analyzer Heap dump / MDD4J System dump PEX TPROF idoctor JobWatcher PTDV Normal i5/os tools like collection services and i5/os commands
V6R1 CL commands WRKJVMJOB PRTJVMJOB GENJVMDMP Use to work with IT JVMs (not Classic JVMs)
jvminfo SST macro
V5R4 PTFs: MF42160, MF42128, SI28174, SI28142

Parms                    Description
<none>                   dumps JVM addresses
-gccycles <vm>           last 300 GCs
-threadsl <vm>           stacks and locks
-java <vm>               javacore file
-heap <vm>               heapdump (phd file)
-system <vm>             core file
-verbosegc <vm> [off]    turn on/off verbose GC
-help
IBM Support Assistant (ISA) The convergence spot for all tools and information from IBM. Based on Eclipse technology and product updater. Support documentation and troubleshooting guides Tools Problem submission into IBM Free, download from http://www.ibm.com/software/support/isa/
Cross-platform IT JVM tools
- IT JVM is cross-platform, and so are the tools
- Diagnostics are primarily dump and analyze
  1. Generate a dump of data
  2. Use an ISA tool to analyze the dump
- There are additional ISA tools not covered; these are currently the strategic ones / the ones I find most useful.
Trace: Verbose GC
Full name: Verbose Garbage Collection
Type of tool: JVM log, mid-level analysis
How to get it: In WAS use the checkbox in the console, otherwise use the -verbose:gc JVM option. Output goes to the native_stderr file unless you use -Xverbosegclog:logs/verbosegc. Or turn on/off dynamically with WRKJVMJOB or SST.
Complexity: Moderate
Overhead: Minimal
What to use it for: Monitor garbage collector behavior, and check for object leaks.
Key things to look for:
  - Cycles which begin for a reason other than threshold allocation reached
  - Heap growth over time (live objects or current heap size)
  - Long collection time, especially if one cycle starts as soon as the previous one ends
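Two of the key things to look for (long or back-to-back collections, and heap growth over time) can be expressed as simple checks once the verbose GC data has been parsed. This sketch assumes the log has already been reduced to one record per cycle; the record fields (`gap_before_ms`, `pause_ms`, `used_after_mb`) and the thresholds are hypothetical and do not correspond to the actual verbose GC file format.

```python
def gc_warnings(cycles, long_pause_ms=1000, min_gap_ms=100):
    """Flag long pauses, back-to-back cycles, and steadily growing used heap."""
    warnings = []
    for index, cycle in enumerate(cycles):
        if cycle["pause_ms"] > long_pause_ms:
            warnings.append(f"cycle {index}: long pause")
        if cycle["gap_before_ms"] < min_gap_ms:
            warnings.append(f"cycle {index}: started right after the previous cycle")
    used = [cycle["used_after_mb"] for cycle in cycles]
    # Used heap that rises after every single collection hints at a leak.
    if len(used) >= 3 and all(later > earlier for earlier, later in zip(used, used[1:])):
        warnings.append("used heap after GC grows every cycle: possible leak")
    return warnings


# Made-up sample data for illustration only.
sample = [
    {"gap_before_ms": 5000, "pause_ms": 200, "used_after_mb": 300},
    {"gap_before_ms": 4000, "pause_ms": 250, "used_after_mb": 350},
    {"gap_before_ms": 50, "pause_ms": 1500, "used_after_mb": 420},
]
for line in gc_warnings(sample):
    print(line)
```

In practice the GC and Memory Visualizer does this analysis for you; the sketch only makes the criteria concrete.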
Tool: GC Visualizer (input: Verbose GC output)
Full name: IBM Monitoring and Diagnostic Tools for Java - Garbage Collection and Memory Visualizer
Type of tool: Parsing tool of a Verbose GC collection
How to get it: Part of ISA
Complexity: Simple to moderate
Overhead: Minimal (Verbose GC only)
What to use it for: Detecting object leaks and monitoring heap usage. Compare different runs.
Key things to look for: High level info and recommendation in report, details in line plots, focus on Used heap (after collection)
Notes:
  - This tool is supposed to be strategic and supported
  - On line plot, change axis for different views and use VGC Data menu for data points; report gives executive overview; data is summary of GCs
Dump: Javacore
Full name: Javacore, JavaDump, or thread dump
Type of tool: Dump of the current status of the JVM (human-readable)
How to get it: Mechanism included in J9 JVM. The javacore will be generated (by default) when:
  - JVM terminates unexpectedly
  - Signal sent via kill -QUIT <pid>
  - User code calls com.ibm.jvm.Dump.JavaDump()
  - SST jvminfo -java <task>
  - GENJVMDMP *JAVA (V6R1)
Complexity: Moderate
Overhead: Minimal
What to use it for: Dump information about a running JVM, including the classpath, basic heap information and thread information (state, locks and stacks).
Key things to look for:
  - Current heap size
  - Threads which are stuck (stack information)
Tool: Thread and Monitor Dump Analyzer (input: Javacore dump)
Full name: IBM Thread and Monitor Dump Analyzer (TMDA)
Type of tool: Parsing tool of a javacore
How to get it: Part of ISA
Complexity: Simple
Overhead: Minimal (client post processing of a javacore file)
What to use it for: Detecting Java hangs and delays. Can compare dumps.
Key things to look for:
  - Java thread state and stacks out of place
  - Deadlock situations that are occurring
  - Thread leaks
Dump: HeapDump
Full name: Heapdump file (phd files)
Type of tool: Binary file only readable by parsing programs. Use opts=classic for human-readable form (e.g. -Xdump:heap:opts=CLASSIC+PHD)
How to get it: Mechanism included in J9 JVM. The heap dump will be generated (by default) when:
  - OutOfMemoryError occurs in the JVM
  - Specify -Xdump:heap for other options, including signal option with kill -QUIT <pid>
  - User code calls com.ibm.jvm.Dump.HeapDump()
  - SST jvminfo -heap <task>
  - GENJVMDMP *HEAP (V6R1)
  - wsadmin for WAS
Overhead: Heavy, client overhead very heavy
What to use it for: Analyze the file with tools, such as MAT. Debug object leaks.
Key things to look for: Continuous growth of objects
Heap analysis tools Heap/memory analysis is very hard Tools sometimes help Tools are resource intensive (memory) MDD4J intended for relatively simple, first-pass analysis, target casual users Will remain beta, no enhancements MAT (Memory Analyzer Tool) for more complex analysis, target experts
ISA Tools: Java strategy
- Health Center
  - Newer, may have promise
- IBM's JVM team is converging on a family of tools: IBM Monitoring and Diagnostic Tools for Java.
- These have the best chance of being strategic and supported.
normal IBM i performance tools / interfaces WRKACTJOB WRKSYSSTS WRKSYSACT WRKDSKSTS Collection Services Management Central Monitors Performance Data Investigator idoctor (JW, PA, PTDV) PEX
IBM i Performance Tools and IT JVM

Tool                                              Works with IT JVM?
PEX TProf                                         Yes, with V5R4 PTFs, but no stacks
PEX Java events (object creates, entry/exit)      No
SST macros                                        Yes, with V5R4 PTFs
DMPJVM, ANZJVM                                    No
idoctor JobWatcher                                Yes, in V6R1. V5R4 limited to jobs/threads only
idoctor HeapAnalysis                              No
WRKJOB command (stacks)                           Yes, with V5R4 PTFs
PEX TPROF
- Can also use PEX Analyzer
- Identifies users of CPU by sampling
    ADDPEXDFN DFN(TPROF5) TYPE(*PROFILE) PRFTYPE(*JOB) JOB(*ALL) TASK(*ALL) MAXSTG(100000) INTERVAL(5) TEXT('TProf - 5 ms sampling interval')
    STRPEX / ENDPEX
  - Trace for 500K events, or as long as possible
- Use PRTPEXRPT or PTDV for analysis
    PRTPEXRPT MBR(TEST) LIB(MYLIB) TYPE(*PROFILE) PROFILEOPT(*SAMPLECOUNT *PROCEDURE) ORDER(*ASCENDING)
  - Also try leaving out the PROFILEOPT parameter to use the default *PROGRAM value instead of *PROCEDURE
- Measure Java GC
  - Target 15% or less
- Identify application problems
  - Operations that either run frequently or are processor intensive
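The "target 15% or less" check for Java GC is a ratio of samples. Here is a minimal sketch, assuming you have already counted from the TPROF report how many CPU samples landed in GC code and how many were taken in total; both counts and the helper names are hypothetical.

```python
def gc_cpu_share(gc_samples, total_samples):
    """Fraction of CPU samples attributed to garbage collection."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return gc_samples / total_samples


def gc_within_target(gc_samples, total_samples, target=0.15):
    """True when GC uses no more than the target share of CPU samples."""
    return gc_cpu_share(gc_samples, total_samples) <= target


# Made-up sample counts for illustration.
print(gc_within_target(120, 1000), gc_within_target(200, 1000))
```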
idoctor
- 4 components
  - Job Watcher
  - Collection Services Investigator
  - Disk Watcher
  - PEX Analyzer
  - Heap Analysis
  - PTDV (Performance Trace Data Visualizer)
- Legend: fee (free 45-day trial) or free
- Client and server components
- Command and GUI interfaces
- https://www-912.ibm.com/i_dir/idoctor.nsf
JobWatcher IT JVM graphs
TPROF analysis with PEX Analyzer
IT JVM Tools
Legend: ISA tools have this background
Columns: Data/Tool | Cost | Complexity | What it is used for

- i tools such as PEX, Collection Services, Performance Investigator | Free | Moderate to Complex
    System resource usage such as CPU, memory pools, disk and IO
- idoctor | Free & fee | Moderate to Complex
    Monitor heap / GC; waits, run signature, CPU users and more
- SST macros, V6R1 commands | Free | Moderate
    Display / dump various information
- IBM Support Assistant | Free | Simple
    IBM portal for solving both functional and performance issues. Work in progress as tools are added. Provides searching, problem reporting, updating tools and managing dumps.
- Verbose GC | Free | Moderate
    At every GC cycle, information is logged about the Java heap and GC functions. Useful to determine if your application has memory leaks, monitor your current heap size, frequency and length of GC cycles, etc.
- GC and Memory Visualizer | Free | Moderate
    Monitor heap size and GC behavior

IT JVM Tools (cont)
Columns: Data/Tool | Cost | Complexity | What it is used for

- javacore file | Free | Moderate
    Also referred to as a JavaDump. The Javacore shows information about threads within the JVM (state, stack, locking)
- Thread and Monitor Dump Analyzer | Free | Simple
    javacore parsing tool; compare thread stacks between dumps; monitor (Java lock) analysis
- ThreadAnalyzer | Free | Simple
    javacore parsing tool; analysis to get a very high level view of work via grouping similar thread stacks; monitor (Java lock) analysis
- Heapdump file | Free | Complex
    Binary dump file with the contents of the Java heap. Feed into tools to parse the output.
- MAT | Free | Complex
    Analyze heap dumps; various analysis and report options
- MDD4J | Free | Complex
    Analyze heap dumps; pinpoint object leaks and who is rooting the object
- core file | Free | Complex
    Generated by JVM when a serious error occurs; let IBM do the analysis
Current Tool Summary
- Monitoring GC: verbosegc and Garbage Collection and Memory Visualizer, WRKJVMJOB
- Stack dumps: javacore and ThreadAnalyzer or Thread and Monitor Dump Analyzer, WRKJVMJOB
- Heap dumps: heapdump and MDD4J or Memory Analyzer
- CPU usage: Health Center, PEX TPROF
Review Java and memory IT JVM GC behavior and tuning IT JVM Performance tools V6R1 CL commands jvminfo SST macro Traces or dumps / IBM support assistant (ISA) tools verbosegc / GC and Memory Visualizer Java dump / Thread & Monitor Dump Analyzer Heap dump / MDD4J & MAT System dump PEX idoctor Normal i5/os tools like collection services, i5/os commands, Systems Director Support for IT JVM in i5/os tools
Performance Roadmap: IT JVM
- High CPU
  - WRKACTJOB
  - In WAS jobs: TPROF, PTDV
  - In DB jobs: DBMon
- Other Problems
  - Look at GC health: SST/WRKJVMJOB, Verbose GC
    - Tune GC (Policies, heap sizes)
    - Leak (HeapDump, MAT)
  - Javacore file (Thread & Monitor Dump Analyzer)
  - Performance Monitoring Infrastructure (PMI)
  - idoctor JobWatcher
  - Collection Services, WRKDSKSTS
5-minute Data Collection: IT JVM If you only have 5 minutes to collect data (e.g. need to shut down and recover) Dump GcCycles with SST or WRKJVMJOB (V6R1) to printer Not needed if verbosegc is on Check CPU, if high run a TPROF Create javacore file (or several) Run a JobWatcher trace Grab WAS logs
HTTP server Performance
General Performance Tips
- Minimize the number of requests per page
  - Each resource reference on a page is a separate request
- Flatten the directory structure and use short paths
- Configure FRCA caching of static content
- Configure memory caching of SSL static content
- Configure Server Side Includes (SSI) only in the scope of where they are used
- Do not configure DNS client hostname lookups for logging
- Do not use .htaccess files
  - Set AllowOverride directive to None
- Build CGI programs in "named" activation group
- Use StartCGI directive to "pre-start" CGI jobs
Tune ThreadsPerChild Directive Controls the number of concurrent requests that the server can process Default is 40 More is not necessarily going to be better Too many can cause processor thrashing It is best to tune this in a controlled environment With some simulation tool driving transactions Or, change it and let it run for a day Use access log reports to analyze your traffic And fine tune over a period of time
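One way to use access log reports when tuning ThreadsPerChild is to estimate how many requests were in flight at once. The sketch below assumes the log has already been parsed into (start time, duration) pairs in seconds; that input shape and the function name are assumptions, not something the server produces in this form.

```python
def peak_concurrency(requests):
    """Largest number of requests in flight at once, from (start, duration) pairs."""
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # requests that merely touch are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up request timings for illustration.
print(peak_concurrency([(0, 2), (1, 2), (1.5, 1), (3, 1)]))
```

Comparing the observed peak with the configured thread count shows whether there is headroom or whether requests are queuing in front of the server.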
Use Persistent Connections The System i has specific code that allows for effective use of persistent connections (Keepalive). Asynchronous I/O support avoids having a thread tied up with a single request. Allow persistent connections use a single "socket" connection for multiple requests from a single client Time to wait between requests Time the server keeps the "socket" connection open waiting for another request Maximum requests per connection Number of requests allowed before the server closes the connection
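To see how the two persistent connection settings interact, the following simplified model counts how many socket connections a single client opens for a series of requests. It ignores request duration and network effects, and the function and parameter names are hypothetical; it is a sketch of the idea, not of the server's implementation.

```python
def connections_needed(request_times, keepalive_timeout, max_requests):
    """Count sockets one client opens for requests at the given times (seconds)."""
    connections = 0
    served_on_current = 0
    last_time = None
    for when in sorted(request_times):
        # The server closed the socket if the client waited too long...
        expired = last_time is None or (when - last_time) > keepalive_timeout
        # ...or if the connection already served its maximum number of requests.
        exhausted = served_on_current >= max_requests
        if expired or exhausted:
            connections += 1
            served_on_current = 0
        served_on_current += 1
        last_time = when
    return connections


times = [0, 1, 2, 10, 11]  # made-up request times
print(connections_needed(times, keepalive_timeout=5, max_requests=100))
print(connections_needed(times, keepalive_timeout=0, max_requests=100))
```

With a 5 second wait the five requests share two connections; with no wait every request opens its own.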
JDBC access of DB2/400 Lisa Wellman peace@us.ibm.com October 2008
JDBC drivers Native JDBC driver Best performance when database is local (same partition) to client Toolbox JDBC driver Best performance when database is remote from client Use XA drivers only when required
DSPACTPJ Toolbox JDBC Driver Native JDBC driver
Database Performance is a huge topic Performance analysis starting from the backend DB is very effective DB tuning can result in large gains Workload elimination also possible A good way to understand the application at a high level without reading code
Optimizing Performance with Caching
Cache strategy Cache as close to the network as possible Cache configuration requires application and business knowledge Benefit can be large, but effort is required (some caches more than others). Weigh costs and potential benefits; target those with the most potential.
Caching layers Cache as close to the edge as possible e.g. Edge Server cache HTTP server Use FRCA for public content Local caches for secure content ESI caching in WAS HTTP plugin WAS Dynamic cache Application Smart coding
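The "cache as close to the edge as possible" idea can be illustrated with a layered lookup that checks caches from the network inward and only then goes to the back end. This is a toy model: the layer names, the use of plain dictionaries, and the choice to fill every layer on a miss are all assumptions, and real layers apply their own rules (for example, public versus secure content).

```python
def lookup(key, layers, load_from_backend):
    """Check caches from the edge inward; fall back to the back end on a miss."""
    for name, cache in layers:
        if key in cache:
            return cache[key], name
    value = load_from_backend(key)
    # Fill every layer so the next request is answered nearest the network.
    for _, cache in layers:
        cache[key] = value
    return value, "backend"


# Hypothetical layers, ordered from the edge inward.
layers = [("edge", {}), ("http", {}), ("dynacache", {})]
first = lookup("/logo.gif", layers, lambda key: f"content of {key}")
second = lookup("/logo.gif", layers, lambda key: f"content of {key}")
print(first[1], second[1])
```

The first request is served by the back end and the second by the edge layer, which is the behavior the slide is aiming for.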
Performance Tuning Parameters for Java/WAS environments
Tuning Areas System HTTP WAS Database Tune when you have major changes (e.g. new applications, upgrades, or more users), and when you have performance concerns
i5/os
- System values
  - QPFRADJ, QPRCMLTTSK, QQRYDEGREE, QMAXACT, etc.
- Pools
  - Separate different workloads
    - Probably database
    - Especially Java
  - Memory, max active
  - Min memory (WRKSHRPOOL) if QPFRADJ is enabled
HTTP server Number of threads Compression Useful for slow networks, has CPU cost Cache settings FRCA, local, etc. Logging and tracing
WAS Java Queuing network Web container threads ORB threads Data source connection pools Session persistence Performance tools Trace, PMI, ARM Cache settings Class reloading Security Isolation levels Transaction boundaries Topology IT (vs. Classic) 32 vs. 64 bit Policy Maximum heap size Minimum heap size Usually only the queuing network, Java, and perhaps topology need to be adjusted, otherwise defaults are good nearly universally. Of course you can change other values, and you have a good probability of causing problems if you do!
Summary
- Tune the request flow queues
- Ensure there is enough (dedicated) memory for Java
  - Separate memory pool
  - QPFRADJ=0 or minimum on memory pool
- Tune Java GC
- Leave everything else alone
  - Most defaults are best for almost all applications and environments
  - Don't adjust anything you do not fully understand
- Identify problem areas before adjusting anything; remember the back ends (e.g. DB)
WebSphere Application Server on i5/os Performance Monitoring and Tuning
Primary Metrics to Monitor
- Paging rates: WRKSYSSTS, Collection Services
- CPU consumption: WRKACTJOB, WRKSYSSTS, Collection Services
- Java garbage collection (GC) health: DMPJVM, SST macros, verbosegc, WRKJVMJOB
- Database server jobs: DSPACTPJ
- WAS pools: Tivoli Performance Viewer (TPV)
Legend: Classic JVM, IT JVM, Both
Monitoring strategy
- Always run
  - Collection Services
  - Job Watch Monitors
- Run the other tools
  - When problems arise
  - Occasionally, to know what's normal
- Monitor all the time to understand trends and have some data when problems occur
Roadmap
- High CPU
  - WRKACTJOB
  - In WAS jobs: TPROF, PTDV
  - In DB jobs: DBMon
- Paging in WAS jobs (with IT JVM this becomes part of "other")
  - Look at GC health (SST, JobWatcher, verbosegc, DSPJVMJOB)
  - Look for heap growth (leaks)
- Neither of the above, but slow responses
  - DMPJVM, javacore dump
  - TPV
  - idoctor JobWatcher
  - Collection Services, WRKDSKSTS, etc.
Legend: Classic JVM, IT JVM, Both
5 minute data collection
If you only have 5 minutes to collect data (i.e. need to shut down and recover):
- Dump GcCycles to printer (SST or PRTJVMJOB for IT in V6R1)
  - Not necessary if verbosegc is on
- Check CPU; if high, run a TPROF
- Classic JVM: check paging; if NOT high, do a DMPJVM
- IT JVM: get a javacore via kill -QUIT, SST jvminfo -java, or GENJVMDMP
  - Ideally get several dumps
- Run a JobWatcher trace
- Save WebSphere logs
Loadtests
When running load/stress tests, measure these metrics using these tools:
- CPU: TPROF
- GC / Heap Size: JobWatcher (IT V6R1+), Verbose GC / GC and Memory Visualizer
- Pools (Web Threads, Connections): PMI / TPV
- Database Tuning: DBMon