How to analyse your system to optimise performance and throughput in IIBv9 Dave Gorman gormand@uk.ibm.com 2013 IBM Corporation
Overview
  - The purpose of this presentation is to demonstrate how to find the cause of poor performance for a node (broker) for two different types of problem.
  - The examples were obtained on a Windows system, but the principles of investigation and problem determination apply equally on all platforms. The system-level tools will differ, though.
Agenda
  - Introduction
  - Tools
  - Techniques
  - Demonstration
What are the main performance costs in message flows?
  - Parsing
  - Tree Navigation, e.g. Root.Body.Level1.Level2.Level3.Description.Line[1];
  - Tree Copying, e.g. Set OutputRoot = InputRoot;
  - Resource Access
  - Processing Logic
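The tree navigation cost can be illustrated with a minimal sketch. This is not IIB code: the message tree is modelled as nested Python dicts, and `build_tree`, `StepCounter` and the two reader functions are hypothetical names introduced only for this illustration. It counts the parent-to-child steps taken when the full path is re-walked for every field, compared with walking it once and holding a reference.

```python
from functools import reduce

# Path taken from the slide: Root.Body.Level1.Level2.Level3.Description.Line[n]
PATH = ["Body", "Level1", "Level2", "Level3", "Description"]


def build_tree(lines):
    """Build a hypothetical message tree as nested dicts."""
    tree = {"Line": list(lines)}
    for name in reversed(PATH):
        tree = {name: tree}
    return tree


class StepCounter:
    """Counts each parent-to-child step taken while navigating."""

    def __init__(self):
        self.steps = 0

    def child(self, node, name):
        self.steps += 1
        return node[name]

    def walk(self, root, path):
        return reduce(self.child, path, root)


def read_lines_full_path(root, count):
    """Re-walk the whole path from the root for every Line element."""
    counter = StepCounter()
    values = [
        counter.child(counter.walk(root, PATH), "Line")[i] for i in range(count)
    ]
    return values, counter.steps


def read_lines_with_reference(root, count):
    """Walk the path once, then reuse the reference to the Line list."""
    counter = StepCounter()
    lines = counter.child(counter.walk(root, PATH), "Line")
    values = [lines[i] for i in range(count)]
    return values, counter.steps
```

In this model, reading three lines costs 18 steps with full-path navigation and 6 with a held reference; the gap grows with the number of fields read and the depth of the path.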
Integration Bus Processes
  Task                 Function
  bipservice           Administration agent availability
  bipbroker            Administrative agent
  biphttplistener      HTTP Listener (broker wide)
  DataFlowEngine [n]   Execution group, provides message flow runtime environment
  Important to understand:
  - Topology of broker
  - SOAP and HTTP listeners
Which resources and how much
  - In busy times expect to use what is needed (!)
      - Exactly what will depend on the configuration and the applications
      - Typical to use CPU and memory plus I/O to some level
  - In quiet times
      - Message Broker and MQ processes should use very little CPU
      - Should use very little I/O capacity
      - Will retain memory unless memory utilisation is very high
  - The amount of CPU and memory used will depend on the situation
      - A complex configuration (many MQ channels, hundreds of message flows) will use much more memory and CPU than a single message flow
  - Some memory sizes
      - bipservice: 4.6 MB
      - bipbroker: 162 MB
      - biphttplistener: 48 MB
      - DataFlowEngine: 191 MB for JavaTransform samples; can use from ~100 MB to gigabytes depending on the number of flows, the complexity of the message flow and the size of the messages
  - MQ processes
      - Expect it to be less than Message Broker (192 MB for simple queue manager on xlinux)
      - Will depend on number of open queues, channels, queue buffer sizes etc.
Tools That are Needed
  - Monitoring Tools
      - At the operating system level to observe
          - System resource usage: CPU, memory, I/O activity
          - Heaviest resource users
      - At the component level to observe
          - Behaviour within the particular component (MQ/Integration Bus)
      - Both types of tools are needed
          - They have different views of the world
          - They are complementary
  - Driving Tools
      - Needed to generate a continuous workload
      - Important to assess performance after warm-up, during sustained activity
UNIX Tools
  - vmstat

    System Configuration: lcpu=64 mem=8192mb

    kthr      memory               page                faults         cpu
    ----- --------------- ------------------------ -------------- -----------
     r  b     avm   fre   re  pi  po  fr   sr  cy   in   sy   cs  us sy id wa
     1  0 1977672 25823    0   0   0   0    0   0    3  958  696   4  0 96  0
     1  0 1977838 25719    0   2   0  98  100   0   29 2941 2250   4  0 96  0
     1  0 1977685 25872    0   0   0   0    0   0    2  636  483   4  0 96  0

  - iostat

    System configuration: lcpu=64 drives=5 paths=6 vdisks=2

    tty:   tin   tout    avg-cpu:  % user  % sys  % idle  % iowait
           0.0   29.5                 3.6    0.1    96.2       0.0

    Disks:    % tm_act   Kbps   tps   Kb_read   Kb_wrtn
    hdisk3         0.0    0.0   0.0         0         0
    hdisk2         0.0    0.0   0.0         0         0
    hdisk0         0.0    4.0   1.0         8         0
    hdisk1         0.0    0.0   0.0         0         0
    cd0            0.0    0.0   0.0         0         0

  - nmon
  - filemon
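As a rough sketch of how such output might be summarised over a sustained run, the following parses the three vmstat sample rows shown on this slide and averages CPU busy time. It assumes the AIX-style layout above, where the last four columns of each data row are us, sy, id and wa; the function names are hypothetical.

```python
# The three data rows from the slide, with the column header line.
VMSTAT_SAMPLE = """\
 r  b     avm   fre   re  pi  po  fr   sr  cy   in   sy   cs  us sy id wa
 1  0 1977672 25823    0   0   0   0    0   0    3  958  696   4  0 96  0
 1  0 1977838 25719    0   2   0  98  100   0   29 2941 2250   4  0 96  0
 1  0 1977685 25872    0   0   0   0    0   0    2  636  483   4  0 96  0
"""


def cpu_samples(text):
    """Extract the us/sy/id/wa percentages from each vmstat data row.

    Assumption: data rows start with a digit and the last four fields
    are the cpu columns, as in the layout shown on the slide.
    """
    samples = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip header and separator lines
        us, sy, idle, wa = (int(value) for value in fields[-4:])
        samples.append({"us": us, "sy": sy, "id": idle, "wa": wa})
    return samples


def average_busy(samples):
    """Average of user plus system CPU percentage across the samples."""
    return sum(s["us"] + s["sy"] for s in samples) / len(samples)
```

For the sample rows, `average_busy(cpu_samples(VMSTAT_SAMPLE))` gives 4.0, i.e. the machine is almost idle.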
Process Explorer on Windows
  - Watch system activity in detail on Windows
  - Watch
      - CPU Usage
      - Commit Charge
      - I/O Activity
      - Physical Memory History
      - Summary Information
      - Individual Processes
  - Download from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
Process Explorer
  - DataFlowEngine.exe is the Execution Group
  - amqzlaa0.exe is the MQ agent for the broker
  - Can quickly see system is busy
  - Customise by selecting columns of choice
z/OS Tools
  - SDSF
  - RMFMON II
Key Tools at the Component Level
  - Integration Bus
      - User Trace
      - Trace Nodes
      - WebUI: compare flow statistics at the node (broker), server (execution group), container (application or library) or individual message flow level
      - IIB Explorer
          - View resource use at the execution group level
          - View Activity Log
  - MQ Explorer
  - Java Healthcenter
Statistics Scope
  Node (broker); Server (execution group); Thread; Message Flow; Resource Statistics; Accounting & Statistics; Message Model; Node; Terminals
WebUI Statistics
  Using the WebUI in Integration Bus v9:
  - Control statistics at all levels
  - Easily view and compare flows, helping to understand which are processing the most messages or have the highest elapsed time
  - Easily view and compare nodes, helping to understand which have the highest CPU or elapsed times
  - View all statistics metrics available for each flow
  - View historical flow data
Integration Bus Explorer & Resource Statistics
  - View resource statistics for resource managers in IIB such as JVM, ODBC, JDBC etc.
Integration Bus Explorer & Activity Log
  - View activity as it happens using Explorer
  - Filter by resource managers
MQ Explorer
  - MQ Explorer
  - Trace
  - Statistics
IBM Support Assistant and Java Health Centre
  - Java Health Centre is provided as part of the IBM Support Assistant
  - A very low overhead monitoring tool
  - Runs alongside an IBM Java application
  - Get visibility, monitoring and profiling in the following application areas:
      Performance Java method profiling Lock analysis Garbage collection Memory System Java Class File input and Object
  - Enable the application JVM prior to use: IBM_JAVA_OPTIONS=-Xhealthcenter
Demonstration of Analysing Performance Issues
  - Identify problems in two message flows using
      - Process Explorer
      - WebUI Statistics
      - MQ Explorer
      - Java Healthcenter
  - Message Flows
      - Coordinated Request Reply
      - JavaComputeTransformNoXpath
Demo 1: Analysing a performance problem in the Coordinated Request Reply Sample
Coordinated Request Reply flows
  Consists of three message flows:
  - Request
      - Converts incoming message from XML to CWF
      - Saves the incoming message in a queue for subsequent reply processing
      - Writes a message for the back end reply message flow
  - Backend Application
      - Sets the completion time in the message
      - Writes a reply message
  - Reply
      - Reads the message from the back end message flow
      - Retrieves the original message saved by the request message flow
      - Writes an output message
Coordinated Request Reply queues
  The queues used by each flow:
  - Request: GET_REQREP_IN, GET_REPTO_STORE, GET_BACKEND_REQ
  - BackendReplyApp: GET_BACKEND_REQ, GET_BACKEND_REP
  - Reply: GET_BACKEND_REP, GET_REPTO_STORE, GET_REQREP_OUT
Run and Investigate Steps
  1. Ensure all components are started and the application works as expected
       - Message flows, databases, external applications etc.
  2. Start a load generator [JMSPerfharness in this case]
  3. Look at activity
       - Is processing happening at the expected rate?
       - Is CPU usage as expected?
       - Is memory usage as expected?
  4. If things do not seem as expected
       - Look for a build up of messages
       - Poor service times
  5. Enable and view statistics
  6. Analyse statistics
  7. Examine message flows
Step 1: Check flows are running using the WebUI
  - Check the server is running
  - Check the flows are running
  - Check the event/sys log for any errors
  - Processing messages and no errors
Step 2: Start a load generator
  - Run Perfharness
  - Use 10 threads
  - All threads start successfully
  - Each thread PUTs a message then GETs a message, so there should be no messages on queues for any period of time
  - Check event/sys log for any error messages
Step 3: Look at CPU activity
  - Messages are being processed, but
      - Rate is low, much lower than expected
      - Very little CPU being used
      - Execution group does not register any CPU activity
Step 4: Look for a build up of messages
  - Key queues are
      - Request: GET_REQREP_IN, GET_REPTO_STORE, GET_BACKEND_REQ
      - BackendReplyApp: GET_BACKEND_REQ, GET_BACKEND_REP
      - Reply: GET_BACKEND_REP, GET_REPTO_STORE, GET_REQREP_OUT
  - Build up of msgs on queues:
      - GET_REPTO_STORE
      - GET_BACKEND_REQ
  - What does this mean?
Step 4: Look for a build up of messages
  Looking at the flows:
  - Queue GET_REPTO_STORE is used by the Request and Reply message flows
  - Queue GET_BACKEND_REQ is used by the BackendReplyApp message flow
  - GET_REPTO_STORE is used mid-flow (so flows using this are less likely to be the problem)
  - GET_BACKEND_REQ is the input queue for the BackendReplyApp
      - Indicates the flow is not running fast enough or not enough instances are allocated
  - Need to investigate what is happening with BackendReplyApp
      - For this use WebUI flow statistics
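The build-up check in this step can be sketched as a small helper that takes queue depths sampled at intervals and reports which queues keep growing. This is a minimal illustration only: the function name, the `min_growth` parameter and the depth values in the usage note are hypothetical, and how the depths are collected (for example by reading them off MQ Explorer) is outside the sketch.

```python
def growing_queues(samples, min_growth=1):
    """Return queues whose depth never drops and grows overall.

    samples: list of {queue_name: depth} dicts in time order.
    min_growth: assumed minimum overall increase to count as a build up.
    """
    result = []
    for queue in samples[0]:
        depths = [sample[queue] for sample in samples]
        never_drops = all(b >= a for a, b in zip(depths, depths[1:]))
        if never_drops and depths[-1] - depths[0] >= min_growth:
            result.append(queue)
    return result
```

With made-up depths such as GET_REPTO_STORE and GET_BACKEND_REQ rising from 2 to 9 across three samples while the other queues stay near zero, the helper returns those two queues, matching the pattern seen in the demonstration. A queue that is an input to a flow and keeps growing points at that flow as the bottleneck.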
Step 5: Enable flow statistics
  Start and stop statistics using the WebUI for:
  - All flows in a server
  - All flows in a container
  - Individual flows
Step 5: View statistics
  - Select the statistics view
  - Start by comparing flows (flow comparison views)
  - Drill down to the problem flow
  - Flow Analysis view for most detail
Step 6: Compare flows
  - Compare flows to determine which one might be causing the problem
  - We can see that the BackendReplyApp flow has an average elapsed time of 1010.1 milliseconds. It has only 1 active thread and has processed 20 messages in the 20 second statistical snapshot period.
  - This matches the rate we see from PerfHarness!
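The arithmetic behind this observation can be written out as a short sketch. It assumes each flow instance (thread) handles one message at a time, so the rate is bounded by instances divided by average elapsed time; the function names are hypothetical.

```python
def expected_rate(instances, avg_elapsed_ms):
    """Upper bound on messages per second for a flow.

    Assumes each instance processes one message at a time and spends
    avg_elapsed_ms on each message.
    """
    return instances * 1000.0 / avg_elapsed_ms


def expected_messages(instances, avg_elapsed_ms, interval_s):
    """Messages expected within one statistics snapshot interval."""
    return expected_rate(instances, avg_elapsed_ms) * interval_s
```

With the figures from the slide, one thread at 1010.1 ms per message gives just under 1 message per second, or about 20 messages per 20 second snapshot, which is what the statistics show.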
Step 6: Analyse the flow
  - Display historical flow details such as message rate, CPU and elapsed time
  - View all nodes within the flow and sort by average elapsed and CPU times
  - The compute node Modify_CompletionTime seems to be a problem!
  - What does high elapsed time and low CPU time suggest the problem might be?
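One way to reason about the question on this slide is to compare a node's CPU time with its elapsed time: if almost none of the elapsed time is CPU, the node is mostly waiting (sleeping, blocked on I/O or on an external service). The sketch below is illustrative only; the function name and both thresholds are assumptions, not values taken from the product.

```python
def classify_node(avg_elapsed_ms, avg_cpu_ms, wait_threshold=0.2, cpu_threshold=0.8):
    """Classify a node from its average elapsed and CPU times.

    The thresholds are arbitrary assumptions for illustration:
    a CPU/elapsed ratio at or below wait_threshold is "waiting",
    at or above cpu_threshold is "cpu-bound", otherwise "mixed".
    """
    if avg_elapsed_ms <= 0:
        raise ValueError("avg_elapsed_ms must be positive")
    ratio = avg_cpu_ms / avg_elapsed_ms
    if ratio <= wait_threshold:
        return "waiting"
    if ratio >= cpu_threshold:
        return "cpu-bound"
    return "mixed"
```

A node with roughly 1000 ms elapsed and well under 1 ms CPU per message would be classified as "waiting", which points towards a sleep or a slow external call rather than expensive processing.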
Step 7: Review the code
  - Having worked out which node is causing the problem
  - We can quickly see why the node is taking 1 sec elapsed time but little CPU
Problem Found!
  - A 1 second sleep in the compute node within the message flow is causing slow processing times and no CPU usage
  - Matches the observations at the start: low CPU and low rate
  - Unlikely to be so easy in future, but slow service times, such as a slow synchronous web service invocation, would have the same effect
  - If it was a slow web service response, then allocate more instances to improve the processing rate
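The effect of allocating more instances to a flow that spends its time waiting can be sketched with a thread pool. This is a simulation under stated assumptions, not IIB code: `time.sleep` stands in for a slow synchronous web service, worker threads stand in for additional flow instances, and all names and the 0.05 second delay are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_service_call(delay_s):
    """Stand-in for a slow synchronous service: waits, uses almost no CPU."""
    time.sleep(delay_s)
    return delay_s


def process_messages(message_count, instances, delay_s=0.05):
    """Process message_count messages with the given number of instances.

    Returns the total wall-clock time taken in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # Each worker handles one message at a time, like a flow instance.
        list(pool.map(slow_service_call, [delay_s] * message_count))
    return time.perf_counter() - start
```

Comparing `process_messages(8, 1)` with `process_messages(8, 4)` shows the total time dropping as instances are added, because the work is wait-bound. The same change would not help a CPU-bound node on a machine that is already busy.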
Summary of Steps for this Investigation
  - Use a systematic approach
  - Key steps used were
      1. Ensure all components are started and the application works as expected
           - Message flows, databases, external applications etc.
      2. Start a load generator
      3. Look at activity
           - Is processing happening at the expected rate?
           - Is CPU usage as expected?
           - Is memory usage as expected?
      4. If things do not seem as expected
           - Look for build up of messages
      5. Analyse accounting and statistics
      6. Examine message flows
  - It is very important to
      - Use tools at both system level and component level
      - Start at a high system level and then close in on the problem
Demo 2: Analysing a performance problem in the JavaNoXPath Sample
Demonstration: JavaComputeTransformNoXpath
  - Consists of one message flow, JavaComputeTransformNoXpath
      - Reads an XML message
      - Transforms to a different format using a Java Compute node
  - Queues: JAVACOMPUTE.TRANSFORMNOXPATH.IN, JAVACOMPUTE.TRANSFORMNOXPATH.OUT
What is the problem we need to solve?
  - The problem is characterised by
      - Low message rate
      - High CPU usage at both system and Execution Group level
      - Sufficient messages on the input queue
  - Likely issue is one of high CPU usage in a message flow
  - But which flow and which node?
Compare the flows
  - All of the elapsed and CPU time is in JavaComputeTransformNoXpathFlow, so continue investigation of this flow
Finding the Processing Node for Investigation
  - The majority of the elapsed and CPU time within the flow is spent in the JavaComputeTransformNoXpath node
  - What might cause this?
  - As this is a Java Compute Node, continue the investigation using the Java Healthcenter
Find the Execution Group JVM Health Port for Java Health Center
  - Environment variable: IBM_JAVA_OPTIONS=-Xhealthcenter
  - Open ports start at 1974; the port for the JavaComputeTransformNoXPathFlow DataFlowEngine is 1976
Alternate Method for Finding the port number
  - Find the Process ID of the execution group with mqsilist and netstat
Invoking the Java Health Center
Attaching to the Execution Group JVM
Connect to a port
Connection Complete and Ready to Analyse
Analysis and Recommendations - Classes
Analysis and Recommendations - Environment
Analysis and Recommendations - Garbage Collection
Analysis and Recommendations - Native Memory
Analysis and Recommendations - Profiling
The cause
  - Having worked out which node is causing the problem
  - We can quickly see why the node is consuming a lot of CPU
  - A call to the method bubble_sort() just before propagating out of the node is sorting the entire output message
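The source does not show the body of bubble_sort(), so the following is a textbook bubble sort written as an assumption, instrumented to count comparisons. It is a sketch of why sorting a whole message this way burns CPU: the comparison count grows with the square of the number of elements.

```python
def bubble_sort(items):
    """Textbook bubble sort (assumed implementation, not the flow's code).

    Returns a sorted copy of items and the number of comparisons made.
    """
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                # Swap adjacent elements that are out of order.
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons
```

For n elements this version always makes n(n-1)/2 comparisons: 4950 for 100 elements and 19900 for 200, so doubling the message size roughly quadruples the work. That is consistent with a method profile in which one sort method dominates CPU time.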
If you suspect there is a product problem
  - Identify the problem as best you can
  - Find the simplest test that recreates the problem
  - Collect the data identified in the Must Gather list
      - For IIB: http://www.ibm.com/support/docview.wss?rs=849&uid=swg21209857
      - For MQ: http://www.ibm.com/support/docview.wss?uid=swg21229861#mg6
Summary
  - Wide range of tools available covering operating system and component performance
  - Expect to use multiple tools
      - It is important to understand what is happening at different levels
      - The demonstration has shown how to use the key tools for MQ and IIB to debug a problem
  - Practice beforehand
      - Being familiar with the tools is a great help in a crisis
      - Learning a new tool and solving a crisis is not a good combination
  - Know your applications and systems
      - What is normal in terms of processing rate, CPU usage etc.
      - This information allows you to know whether there is a problem and to what extent
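Knowing what is normal can be made concrete by recording a baseline and comparing current readings against it. The sketch below is one possible way to do that; the metric names, the 20 percent tolerance and the function names are all hypothetical.

```python
def deviation_from_baseline(baseline, observed):
    """Percentage change of each observed metric relative to its baseline."""
    return {
        name: (observed[name] - baseline[name]) / baseline[name] * 100.0
        for name in baseline
    }


def out_of_range(baseline, observed, tolerance_pct=20.0):
    """Names of metrics that differ from baseline by more than tolerance_pct."""
    deviations = deviation_from_baseline(baseline, observed)
    return sorted(name for name, pct in deviations.items() if abs(pct) > tolerance_pct)
```

For example, with a made-up baseline of 100 messages per second at 40 percent CPU, a reading of 1 message per second at 2 percent CPU flags both metrics, which is the low rate, low CPU pattern seen in the first demonstration.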
Additional Information
  - WebSphere Message Broker: Designing for Performance
    http://www.ibm.com/support/docview.wss?rs=849&uid=swg24006518
  - WebSphere Message Broker V7-Message display, test & performance utilities (IH03)
    http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg24000637
  - IBM Monitoring and Diagnostic Tools for Java - Getting started with Health Center
    http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/getting_started.html
  - IBM Monitoring and Diagnostic Tools for Java - Health Center Version
    http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
  - IBM Monitoring and Diagnostic Tools for Java - Health Center
    http://publib.boulder.ibm.com/infocenter/hctool/v1r0/index.jsp