Application Latency Monitoring using nprobe Luca Deri <deri@ntop.org>
Problem Statement
- Users demand service measurements.
- Network boxes provide simple, aggregated network measurements.
- You cannot always install the measurement box wherever you want (cabling problems, privacy issues).
- New protocols appear every month, while measurement protocols are very static and slow to evolve.
Some Measurement Metrics
- Availability
- Response Time
- Accuracy
- Throughput
- Utilization
- Latency and Jitter
12th May 2011 - Bolzano
Response Time
- Response time is the time it takes a system to react to a given input.
- Example: in an interactive transaction, it may be defined as the time between the user's last keystroke and the beginning of the resulting display by the computer.
- Short response time is desirable.
- It is necessary for interactive applications (e.g. telnet/ssh) and less important for batch applications (e.g. file transfer, peer-to-peer).
Network Latency [1/2]
- Network latency (or delay): the amount of time (ms) it takes a packet to travel from source to destination (one-way).
- It cannot be measured from a single point of observation.
- Very important for interactive applications (e.g. online games).
- Round-trip latency (source -> destination -> source) is less accurate but more popular, as it can be measured from a single point.
Network Latency [2/2]
Network latency is affected by:
- Media type (fibre vs. satellite communications).
- Memory buffers:
  - Operating system buffering (sockets, queues).
  - Network device buffering (I/O ports).
The more network elements a packet has to traverse, the more latency it can accumulate.
Application Latency [1/2]
- Companion of network latency, measured at application level instead of network level.
- It computes the delay added by application processing to the packet journey.
- In essence, it measures the time taken by the application to serve the request (e.g. SQL query execution time).
Application Latency [2/2]
- Application latency varies according to the issued request (e.g. an HTTP request for a static file is served faster than one for a dynamic page).
[Diagram labels: Client, Server, Request, Response, Time]
Jitter
- Jitter: variance of latency on a mono-directional link.
- Expressed in msec, it is very important for multimedia applications (e.g. internet telephony or video broadcast).
- In a nutshell, jitter measures how latency changes over time.
- Ideally, jitter should be as close as possible to zero.
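The slide does not name a specific jitter estimator, so the following is only a minimal sketch of two common choices that both yield a value in milliseconds: the standard deviation of the latency samples, and the mean absolute difference between consecutive samples. The function names and the sample values are hypothetical and are not part of nprobe.

```python
from statistics import pstdev


def jitter_stddev_ms(latencies_ms):
    """Population standard deviation of one-way latency samples, in ms."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    return pstdev(latencies_ms)


def jitter_mean_delta_ms(latencies_ms):
    """Mean absolute difference between consecutive latency samples, in ms."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    deltas = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(deltas) / len(deltas)


# Hypothetical one-way latency samples (ms), for illustration only.
samples_ms = [10.0, 12.0, 11.0, 15.0]
print(jitter_stddev_ms(samples_ms))
print(jitter_mean_delta_ms(samples_ms))
```

A perfectly steady link gives zero with either estimator, which matches the "as close as possible to zero" goal above.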
Measurement Goals
- Monitor important metrics such as latency and jitter over time.
- This is necessary because latency changes with system and network load.
- Do not provide just generic host-to-host metrics, but rather a per-request report that allows network administrators to pinpoint problems.
nprobe Overview
What is nprobe?
nprobe is an open-source flow:
- Probe
- Proxy
- Collector
It was designed to be open to extensions via dynamically loadable plugins.
Available as a software application and also as a network appliance.
Capacity up to 10 Gbit.
nprobe Components
[Diagram label: Würth-Phoenix NetEye]
nprobe Placement
- Insert nprobe/nbox on spanned or mirrored switch ports.
- Typically watch uplink traffic.
[Diagram labels: NetFlow Collector, NetFlow v9 or IPFIX exports, Deployed nprobes]
Measuring Latency
Measuring Latency
- Client Network Delay
- Server Network Delay
- Application Latency
Server and Client Network Delay [1/2]
Server and Client Network Delay [2/2]
- SERVER_NW_DELAY and CLIENT_NW_DELAY are determined when nprobe observes the TCP flags in a transaction.
- A simple three-packet transaction is used (TCP only).
- The time delta is divided by two, because the network latency is assumed to be half the round-trip time.
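The halving step can be sketched as follows. This is an illustration under stated assumptions, not nprobe's actual code: it assumes the probe sits between client and server and timestamps the three handshake packets (SYN, SYN+ACK, ACK) as they pass by. The `Handshake` type and the `network_delays` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Handshake:
    syn_ts: float     # time the probe sees the client's SYN (seconds)
    synack_ts: float  # time the probe sees the server's SYN+ACK (seconds)
    ack_ts: float     # time the probe sees the client's ACK (seconds)


def network_delays(h):
    """Return (client_delay, server_delay) in seconds.

    SYN -> SYN+ACK is one probe <-> server round trip, and
    SYN+ACK -> ACK is one probe <-> client round trip. Each one-way
    delay is assumed to be half of the corresponding round trip.
    """
    server_rtt = h.synack_ts - h.syn_ts
    client_rtt = h.ack_ts - h.synack_ts
    return client_rtt / 2, server_rtt / 2


# Hypothetical timestamps: 40 ms to the server and back, 10 ms to the client and back.
client_delay, server_delay = network_delays(Handshake(0.000, 0.040, 0.050))
print(client_delay, server_delay)
```

The half-RTT assumption only holds when the forward and return paths are roughly symmetric.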
Application Latency [1/2]
- Application latency is computed as the time needed by an application to react to a client request.
- For TCP connections, application latency is computed on the first packet after the three-way handshake.
- For UDP connections, it is computed on the first client -> server and server -> client packets.
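A minimal sketch of this idea, assuming the packets of one flow are already available in time order with a direction tag. The tuple layout, the direction labels `"c2s"`/`"s2c"`, and the choice to skip packets with no payload (pure ACKs) are assumptions made for illustration; the slides do not describe how nprobe handles these details.

```python
def application_latency(packets):
    """Estimate application latency for one flow, in seconds.

    `packets` is a time-ordered iterable of (timestamp, direction, payload_len)
    tuples observed after the TCP handshake (or from the start of a UDP flow).
    The result is the time between the first client -> server packet and the
    first server -> client packet that follows it, or None if either is missing.
    """
    request_ts = None
    for ts, direction, payload_len in packets:
        if payload_len == 0:
            continue  # assumption: ignore pure ACKs that carry no data
        if request_ts is None:
            if direction == "c2s":
                request_ts = ts  # first client request
        elif direction == "s2c":
            return ts - request_ts  # first server response after the request
    return None


# Hypothetical flow: request at t=1.000, bare ACK, first response data at t=1.180.
flow = [(1.000, "c2s", 120), (1.001, "s2c", 0), (1.180, "s2c", 1460)]
print(application_latency(flow))
```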
Application Latency [2/2]
Troubleshooting Latency
- Network delay is crucial for some online services such as SSH and transaction services.
- Application latency estimates the time a server needs to answer a client request.
- Both should be constantly monitored in order to detect slow-downs and network congestion that degrade overall service performance.
nprobe Latency Monitoring
[NFv9 57554][IPFIX 35632.82] %CLIENT_NW_DELAY_SEC    Network latency client <-> nprobe (sec)
[NFv9 57555][IPFIX 35632.83] %CLIENT_NW_DELAY_USEC   Network latency client <-> nprobe (usec)
[NFv9 57556][IPFIX 35632.84] %SERVER_NW_DELAY_SEC    Network latency nprobe <-> server (sec)
[NFv9 57557][IPFIX 35632.85] %SERVER_NW_DELAY_USEC   Network latency nprobe <-> server (usec)
[NFv9 57558][IPFIX 35632.86] %APPL_LATENCY_SEC       Application latency (sec)
[NFv9 57559][IPFIX 35632.87] %APPL_LATENCY_USEC      Application latency (usec)
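On the collector side, each metric arrives as a pair of fields. The sketch below assumes that the `_SEC` and `_USEC` fields are the seconds and microseconds parts of a single value (as in a `timeval`), which the slide suggests but does not state. The `flow` dictionary is a hypothetical decoded record, and `delay_ms` is an illustrative helper, not an nprobe API.

```python
def delay_ms(sec, usec):
    """Combine a (seconds, microseconds) pair into milliseconds."""
    return (sec * 1_000_000 + usec) / 1000


# Hypothetical decoded flow record, keyed by the template field names above.
flow = {
    "CLIENT_NW_DELAY_SEC": 0, "CLIENT_NW_DELAY_USEC": 4500,
    "SERVER_NW_DELAY_SEC": 0, "SERVER_NW_DELAY_USEC": 21000,
    "APPL_LATENCY_SEC": 1, "APPL_LATENCY_USEC": 250000,
}

for metric in ("CLIENT_NW_DELAY", "SERVER_NW_DELAY", "APPL_LATENCY"):
    value = delay_ms(flow[metric + "_SEC"], flow[metric + "_USEC"])
    print(metric, value, "ms")
```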
Application Latency Monitoring
In addition to generic latency monitoring, nprobe allows selected protocols to be analyzed:
- HTTP(S)
- Database (MySQL)
- VoIP (Voice over IP)
- DNS
Data can be exported via NetFlow and also dumped to log files.
Application Latency: HTTP
Client:          192.168.1.92
Server:          images.apple.com
Protocol:        http
Method:          GET
URL:             /global/nav/images/globalsearch_bg.png
HTTPReturnCode:  304
Location:        www.apple.com
Referer:         www.apple.com/itunes
UserAgent:       Mozilla/5.0 (Macintosh;...)
ContentType:     image/png
Bytes:           3175
BeginTime:       1304148157.834
EndTime:         1304148159.623
Flow Hash:       2527515623
Cookie:          155664
Terminator:      C
ApplLatency:     0.178
Reporting
Latency Measurement [1/4]
Latency Measurement [2/4]
Latency Measurement [3/4]
Latency Measurement [4/4]
HTTP Latency Measurement [1/2]
HTTP Latency Measurement [2/2]
VoIP Latency Measurement [1/3]
VoIP Latency Measurement [2/3]
VoIP Latency Measurement [3/3]
Integration with Würth-Phoenix NetEye
5/16/2011
High-Speed Latency Monitoring
2011 - ntop.org
Packet Juggling [1/2]
- Application monitoring requires payload inspection and is therefore CPU intensive.
- Filtering packets and balancing them across CPU cores (packet steering) is a prerequisite for good performance.
- FPGA-based cards dramatically enhance packet capture speed, but they are quite costly and thus not suitable for everyone.
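To illustrate the balancing idea only, the sketch below maps each packet to a core by hashing its flow 5-tuple, ordering the two endpoints first so that both directions of a connection land on the same core (latency measurement needs to see requests and responses together). This is a generic technique shown under assumptions; it is not a description of how nprobe or the nbox hardware steers packets, and `core_for_packet` is a hypothetical name.

```python
import zlib


def core_for_packet(src_ip, src_port, dst_ip, dst_port, proto, num_cores):
    """Pick a core index for a packet using a direction-independent flow hash."""
    # Sort the two endpoints so that A -> B and B -> A produce the same key.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}-{second[0]}:{second[1]}-{proto}".encode()
    # CRC32 is used here only as a simple, deterministic hash.
    return zlib.crc32(key) % num_cores


# Both directions of the same hypothetical TCP connection map to one core.
forward = core_for_packet("192.168.1.92", 51000, "10.0.0.5", 80, 6, 4)
reverse = core_for_packet("10.0.0.5", 80, "192.168.1.92", 51000, 6, 4)
print(forward, reverse)
```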
Packet Juggling [2/2]
- Using network taps increases both network complexity and cost.
- It is now possible to combine tap, filtering, and packet steering in a single card, integrated into the nbox.
Summary
- nprobe/nbox is an open-source, highly scalable solution for distributed collection of network traffic information.
- nprobe is the first solution to export latency information about end-to-end communications using NetFlow v9 or IPFIX.
- Würth-Phoenix NetEye and Plixer Scrutinizer can be used for reporting application and network latency.