NetView for z/OS V6.1 Packet Trace Analysis

Introduction

This paper provides insights into the Packet Trace Analysis feature delivered in IBM Tivoli NetView for z/OS V6.1, including an explanation of the types of errors analyzed and use cases that demonstrate the value of this new feature.

Analyzing a packet trace is a process of sifting through data to find the clues that lead us to the problem. We search for events that have occurred, or are occurring, that could indicate problems. With these clues, we determine the sequences and patterns that lead us to an understanding of the problem and what we can do to resolve it.

Packet Trace Analysis Explained

In analyzing a packet trace, we look for some key indicators. These are error flags associated with packets, each indicating that an event has occurred. NetView processes six types of error flags: Zero Window Size, Window Probes, Retransmissions, Duplicate Acknowledgements, Delayed Acknowledgements, and Session Reset flags.

Not all of these flags indicate real problems. Some, such as Duplicate or Delayed Acks, often occur in the normal course of data transmission. However, the frequency and timing of errors (whether there are many, or several close together) can indicate a congestion problem. A Reset flag for a session is a fairly certain indication that the session ended abnormally, and seeing Reset flags on many sessions that share a common endpoint could indicate an application failure.

NetView also looks for Unacknowledged Syns. No error flag is captured for this, but it is a case where requests are sent to an endpoint and no acknowledgement is
returned. This is also an indication of an application failure, or possibly that an application or port is not active.

NetView's Packet Trace Analysis function simplifies network problem determination by quickly gathering and presenting trace data in a summarized, easy-to-access format. It processes the gathered trace data, finds the error flags and Unacknowledged Syns in it, and displays a summary of them. The NetView IPTRACE command provides an easy method of managing Communications Server packet traces: fill-in-the-blanks panel input and program function keys issue the Communications Server commands to start, stop, or modify packet traces.

Analysis results are summarized as shown in Figure 1.

Fig. 1. The Packet Trace Analysis summary screen
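The summarizing step described above can be illustrated with a minimal Python sketch. This is not NetView code, and NetView does not expose such an interface. The `Packet` and `Session` classes, the flag labels in `ERROR_FLAGS`, and the rule used to spot an Unacknowledged Syn are all assumptions made for illustration. The sketch also assumes that the summary counts sessions, not packets, for each error type.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical labels for the six error flags named in the text.
ERROR_FLAGS = (
    "zero_window", "window_probe", "retransmission",
    "duplicate_ack", "delayed_ack", "reset",
)


@dataclass
class Packet:
    direction: str                  # "out" = from the initiator, "in" = from the peer
    syn: bool = False
    ack: bool = False
    errors: frozenset = frozenset()  # subset of ERROR_FLAGS attached to this packet


@dataclass
class Session:
    local: str
    remote: str
    packets: list = field(default_factory=list)


def has_unacknowledged_syn(session):
    """Assumed rule: a SYN was sent and the peer never returned a packet with ACK set."""
    sent_syn = any(p.syn and p.direction == "out" for p in session.packets)
    got_ack = any(p.ack and p.direction == "in" for p in session.packets)
    return sent_syn and not got_ack


def summarize(sessions):
    """Count how many sessions show each error type at least once."""
    summary = Counter({name: 0 for name in ERROR_FLAGS})
    summary["unacknowledged_syn"] = 0
    for session in sessions:
        seen = set()
        for packet in session.packets:
            seen |= packet.errors
        for name in seen:
            summary[name] += 1
        if has_unacknowledged_syn(session):
            summary["unacknowledged_syn"] += 1
    return summary
```

Counting sessions per error type, not raw flags, matches the way the summary screen leads to a list of sessions for each type, but the real product may count differently.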
To see a list of sessions containing each type of error, move the cursor to the appropriate field and press F4. (Note: UDP and ICMP sessions are also collected, but no analysis is done on these.) The resulting list of all TCP sessions is shown in Figure 2.

Fig. 2. Listing of all TCP sessions

The list of sessions can be used to find trends, such as a specific host or port with an excessive number of sessions showing a particular error type, or multiple error flags across multiple sessions. Individual sessions in the list can be selected for more detailed analysis. The result of selecting a specific session is shown in Figure 3.
Fig. 3. Session Analysis summary for a specific session

This detailed view gives you a full picture of what is happening, or has happened, in the session. It provides access to the error flags, the details about the session, and the individual packets that make up the session. From this detailed analysis, you can view the Communications Server Detailed Session report for the session, or select individual packets for a detailed view of the data they contain. Packets that contain any of the error flags are color-coded in the summary lists so you can find them more easily.

To view a detailed demo of the NetView Packet Trace Analysis function, go to the Tivoli NetView for z/OS section in the IBM Tivoli Media Gallery (http://www.ibm.com/developerworks/wikis/display/tivolimediagallery/home).

Use Cases
Below are a couple of use cases where NetView Packet Trace Analysis helps in resolving network-related issues.

Why is response time soooooo slow?

The Problem: You are receiving calls that network response time is slow. No specific host or application is noted.

How NetView can help: Start a packet trace using IPTRACE. Use the ANALYZE function key in the IPTRACE display screen to analyze the sessions captured in the trace. Many of the error types summarized during analysis are associated with performance, including Retransmissions and Zero Window Size. Look for high concentrations of a specific error type and list the sessions. Is there a pattern, such as a specific host or port that shows up consistently? Do any of the sessions show a very high error rate (the count of flags compared with the count of packets in the list)? Select individual sessions and drill down into the details of those sessions.

I'm unable to connect to the billing application.

The Problem: You have received a call at the help desk that users are not able to connect to the billing application. You verify that the application is running and that you are able to access the host where it is running. There could be a problem in the network, or perhaps the application was not working earlier but is working now.

How NetView can help: Use IPTRACE to start a packet trace for the application host IP address and port. Collect trace data and use the ANALYZE function key to analyze the attempted connections. In the packet trace analysis summary, look for the Unacknowledged Syns count. If sessions are listed here, the application is not responding to connection requests. If there are no Unacknowledged Syns, check the Reset flag errors, or Zero Window Size and Window Probes. In either case, drill down into the details of the individual sessions to see what data is being transferred and what errors are occurring.
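The reasoning in the two use cases can be sketched in Python as a rough illustration. The first function ranks endpoints by error rate, as in the slow-response case. The second follows the order of checks in the billing-application case. None of this is NetView code: the dictionary keys, the function names, and the fallback message returned when no indicator is found are assumptions for illustration only.

```python
from collections import defaultdict


def rank_endpoints(sessions):
    """Rank remote endpoints by flags per packet, highest first.

    `sessions` is an iterable of dicts with hypothetical keys:
    "remote" (host:port string), "packets" (int), "flags" (int).
    """
    totals = defaultdict(lambda: [0, 0])  # remote -> [packets, flags]
    for session in sessions:
        totals[session["remote"]][0] += session["packets"]
        totals[session["remote"]][1] += session["flags"]
    ranked = [
        (remote, flags / packets)
        for remote, (packets, flags) in totals.items()
        if packets > 0
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def next_step(summary):
    """Follow the order of checks in the billing-application use case.

    `summary` maps hypothetical indicator names to session counts.
    """
    if summary.get("unacknowledged_syn", 0) > 0:
        return "application is not responding to connection requests"
    for name in ("reset", "zero_window", "window_probe"):
        if summary.get(name, 0) > 0:
            return "inspect sessions flagged with " + name
    # Fallback wording is an assumption, not taken from the paper.
    return "no listed indicator found; inspect individual sessions"


if __name__ == "__main__":
    sample = [
        {"remote": "10.0.0.2:80", "packets": 100, "flags": 5},
        {"remote": "10.0.0.3:443", "packets": 50, "flags": 20},
    ]
    for remote, rate in rank_endpoints(sample):
        print(remote, round(rate, 2))
    print(next_step({"unacknowledged_syn": 0, "reset": 3}))
```

Aggregating by endpoint before ranking makes a host or port that is only mildly affected in each session, but affected across many sessions, stand out. This is the kind of pattern the first use case asks you to look for.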
About the Author

This paper was written by Paul Koch, a software developer on the IBM Tivoli NetView for z/OS product.