Network Performance Evaluation
Throughput

by Troy Bennett

Computer Science Department
School of Engineering
California Polytechnic State University
1998

Date Submitted:
Advisor:
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
THROUGHPUT
TCP/IP HISTORY
LAYERS
APPLICATION LAYER
TCP/IP
ETHERNET
TEST PLATFORM
NETWORK HARDWARE
TESTING TOOLS
  Netperf
  Internet Advisor
  Surveyor
EXPERIMENTS
EXPECTATIONS
EXPERIMENT 1
EXPERIMENT 2
EXPERIMENT 3
EXPERIMENT 4
NOTES TO FUTURE STUDENT RESEARCHERS
BIBLIOGRAPHY
Abstract

This document discusses the value of throughput as a performance measurement and presents the outcomes of a series of experiments conducted to measure network throughput. The objective was to determine the network bandwidth overheads associated with packet processing. Some other performance measures of interest to the 3Com project are also reported.
LIST OF FIGURES

Figure 1 TCP Packet Format
Figure 2 IP Datagram Format
Figure 3 802.3 Ethernet frame
Figure 4 RFC 894: Ethernet Encapsulation
Figure 5 Throughput Graph : 10MBPS
Figure 6 CPU Utilization : 10MBPS
Figure 7 Throughput Graph : 100MBPS
Figure 8 CPU Utilization : 100MBPS

Note: Figures 1-3 are from the CSc 404 lecture notes of Dr. S. Ron Oliver.
LIST OF TABLES

Table I Throughput Loss : 10MBPS
Table II Throughput Loss : 100MBPS
Throughput

What is throughput? Why use it as a criterion in the evaluation of performance? Throughput is defined as the "number of user information bits transmitted per unit time" (Dr. S. Ron Oliver, CSc 404 Notes, Lecture 2). The key words here are user and, by extension, user information. A user can be a person or an application. Who or what the user is depends on where you are looking in the data flow model. User information does not include things such as packet headers attached to the user's information by a lower level. While a link may be said to be capable of transmitting 1,000 BPS, if the process of transmission uses 200 BPS, the maximum throughput for the user is only 800 BPS (the link would be said to have an 80% efficiency).

Throughput is a useful measure for this project because we are trying to determine how much user data is getting through at various levels in the TCP/IP - Ethernet transmission scheme. By comparing the throughput at different levels, we can see how much capacity is lost at each layer. By comparing the measured throughput to theoretical values based on the number of header bytes added to a packet by the protocols in the layers involved, we can also determine the time used at each layer for overhead.
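The 1,000 BPS example above can be written out as a short calculation. This is only an illustrative sketch; the function names are hypothetical and do not come from any tool used in this project.

```python
def user_throughput(link_rate_bps: float, overhead_bps: float) -> float:
    """Bits per second of user information left after transmission overhead."""
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    if not 0 <= overhead_bps <= link_rate_bps:
        raise ValueError("overhead must be between 0 and the link rate")
    return link_rate_bps - overhead_bps


def link_efficiency(link_rate_bps: float, overhead_bps: float) -> float:
    """Fraction of the link rate that carries user information."""
    return user_throughput(link_rate_bps, overhead_bps) / link_rate_bps


# The example from the text: a 1,000 BPS link that spends 200 BPS on overhead.
print(user_throughput(1000, 200))   # 800
print(link_efficiency(1000, 200))   # 0.8
```

The same two quantities are what the later experiments compare across layers: the rate seen by the user of a layer and the rate available to that layer.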
TCP/IP History

In 1969 the Defense Advanced Research Projects Agency (DARPA) started research into a packet switching network for use by defense agencies. In 1975 this network (ARPANET) went from an experiment to an accepted production tool. The TCP/IP suite of protocols was adopted as a military standard in 1983, and all hosts on the ARPANET (part of which became the MILNET) were required to convert to TCP/IP.

Layers

Network protocols have evolved from initially requiring each program to handle its own communication into a layered design. The TCP/IP protocol suite, sponsored by the Department of Defense, provided the ability for applications to communicate across a wide variety of different computer types without having to understand the intricacies of each type.

Application Layer

The Application layer sits above the TCP/IP protocols. It is in this layer that a particular program such as Telnet interfaces with the network without needing to understand the specifics of the services provided below. In terms of the OSI model, the TCP/IP Application layer encompasses both the Application and Presentation layers and consists of the programs themselves.

TCP/IP

TCP: The Transmission Control Protocol is a connection-oriented transport layer protocol for connections between hosts. Reliability is provided at this level by positive acknowledgments and, if needed, retransmission. TCP delivers the data received from IP to applications based on a port number. There is also an unreliable transport layer service, known as the User Datagram Protocol (UDP), that is used for non-critical or real-time data transport where missed packets are tolerable, as in network device status monitoring.
Figure 1 TCP Packet Format

IP: At the network layer, the Internet Protocol supports the basic communication of independent hosts over a network. It is in this layer that routing using Internet addresses is performed. IP routes the packets to the appropriate destination host. IP also performs fragmentation of datagrams as needed. Note: IP does not provide error control or acknowledgment and relies on Transport layer protocols to initiate connections and perform error-checking if needed.
Figure 2 IP Datagram Format

Ethernet

Ethernet is a Medium Access Control (MAC) layer protocol that uses Carrier Sense Multiple Access with Collision Detection (CSMA/CD). This means that any Ethernet node that wishes to transmit data checks for a quiet wire (or other medium) before transmission and then sends its packet. If a collision with another packet occurs, it is detected, and a random time interval, chosen by the exponential back-off algorithm, is selected before the next transmission attempt takes place. An Ethernet packet consists of the following parts:

Figure 3 802.3 Ethernet frame

Preamble: A 7 octet pattern used for bit synchronization
SFD: Start of Frame Delimiter
DA: Destination Address
SA: Source Address
Length: Length in octets
I-Field: Contains the information being transmitted
PAD: Used as needed to maintain the 64 octet minimum packet size for collision detection
FCS: Frame Check Sequence
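The random retransmission interval mentioned in the Ethernet description above can be sketched as follows. The constants (a slot time of 512 bit times, a back-off exponent capped at 10, and a limit of 16 attempts) are taken from the IEEE 802.3 standard for 10 and 100 MBPS Ethernet, not from the measurements in this report, and the function names are hypothetical.

```python
import random

SLOT_TIME_BITS = 512   # one slot time in bit times (10 and 100 MBPS Ethernet)
BACKOFF_LIMIT = 10     # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16     # the frame is discarded on the 16th collision


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Pick the number of slot times to wait after the given collision count."""
    if collisions < 1:
        raise ValueError("back-off only applies after at least one collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("too many collisions: the frame is discarded")
    exponent = min(collisions, BACKOFF_LIMIT)
    # Uniform choice from 0 .. 2**exponent - 1 slot times.
    return rng.randrange(2 ** exponent)


def backoff_delay_seconds(collisions: int, bit_rate_bps: float,
                          rng: random.Random) -> float:
    """Convert the chosen number of slots into a delay at the given bit rate."""
    return backoff_slots(collisions, rng) * SLOT_TIME_BITS / bit_rate_bps


rng = random.Random(1998)  # fixed seed so the sketch is repeatable
for n in (1, 2, 3, 10, 15):
    print(n, backoff_delay_seconds(n, 10_000_000, rng))
```

Because the waiting range doubles with each successive collision, a heavily loaded shared segment spends an increasing share of its time idle, which is one reason the experiments below use a single connection between two machines.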
Test Platform

Network Hardware

The LAN used for our tests consists of a Hewlett Packard Netserver LH Pro and four HP Vectra VL series 4 workstations. Each machine contains a 3C905 Fast Ethernet XL network adapter capable of 10 and 100MBPS speeds. The machines all run Windows NT. The 10 MBPS experiments were run using a 3C16700 hub and the 100 MBPS tests used the 8/TP100 hub, both of which are 8 port BASE-T hubs. The workstations are configured as follows:

Workstation
200MHZ Pentium
96 MB RAM
Quantum Fireball 2.4GB Hard Drive
Matrox Millennium 2MB video card
Windows NT Workstation 4.0

Tests were run between workstations and did not directly involve the server.

Testing Tools

Netperf

Netperf is an application level benchmark program which measures throughput for variable size messages. Netperf communicates with its companion, Netserver, using either TCP/IP or UDP/IP as selected by the user. Netserver is set up on the host machine with parameters to listen on a specific port. Netperf, the client, is then given the address of the Netserver machine, the desired protocol, the message size and the length of the trial to be run. In the following experiments TCP/IP and the default run time of 10 seconds are always used.
Internet Advisor

The Hewlett Packard Internet Advisor is a laptop personal computer configured to run HP's proprietary network analysis software. The Internet Advisor is connected to a port on the Ethernet hub and monitors all the frames on the network. The Internet Advisor is capable of analyzing Ethernet and Token Ring frames as well as deciphering higher layer protocols such as TCP/IP and IPX. In addition, it is capable of reporting network usage in terms of percent utilization.

Surveyor

Shomiti Surveyor is a software protocol analyzer that uses the Network Interface Card (NIC) of a Windows 95 or NT machine to perform functions similar to those of the Internet Advisor. Since the version of Internet Advisor at Cal Poly is not capable of connecting to 100MBPS Ethernet networks, Surveyor was substituted to monitor the 100MBPS trials.

Experiments

The overall purpose of our experiments is to determine throughput at different layers. The experiments are not concerned with overall network performance per se. Since Ethernet is a CSMA/CD protocol, the tests are conducted on a single connection between two machines in order to avoid a large number of collisions. In order to acquire data at each network level under a consistent load, tests at different layers are conducted simultaneously when possible. Whenever two or more benchmark measurements cannot be obtained concurrently, separate tests are run with loads as closely matched as possible.

Expectations

It is always useful to make a theoretical prediction of the results expected from an experiment. This prevents one from blindly accepting improbable outcomes and provides a basis for any necessary revision of our hypotheses. For example, at the base level the Ethernet connection has a stated speed of 10Mbps. Assuming this is correct, we can then work up the protocol stack.
Ethernet uses a header of 26 octets per packet (208 bits) and has a maximum data field of 1,500 octets (12,000 bits).
Hence approximately 1.70% (208/12,208) of the available bandwidth is used for headers when frames are full, resulting in a maximum throughput of about 9.83Mbps for payload data carrying upper-layer packets. Throughput at the higher layers depends on the packet size used. TCP and IP each add 20 bytes per Ethernet frame, and each TCP/IP message is typically split across a number of Ethernet frames.

Experiment 1

Throughput loss between the Transport and the Network Access Layers on a 10MBPS LAN

The first experiment compares the throughput at the transport layer, as measured by Netperf, to the throughput at the MAC layer, as measured by the Internet Advisor. Since the Internet Advisor does not provide a direct throughput measurement, the test was conducted using the measure of utilization over the intervals of the test. Utilization is the percentage of the maximum capacity of the network used over a time interval. The resulting values were then multiplied by the available bandwidth (Throughput = Utilization * Bandwidth), and the loss in throughput was calculated. Besides the overhead associated with bits lost to headers, there is also the time required for encapsulation of the higher layer packets into data frames. Our test results, presented below, indicate that the average loss between the Transport and MAC layers is 550 KBPS.

[Table I Throughput Loss : 10MBPS (table data not available)]

For these tests continuous sampling was used. The display of data on the Internet Advisor was halted after each Netperf run to collect the data, since the default buffer size of the Internet Advisor could not handle even the 10 second default trial time of Netperf. The message size used was the default 8,192 bytes. Using the standard statistics functions in Microsoft Excel to analyze the data collected, we see that there is a 95% probability that the average loss will fall between 528 and 565 KBPS. How much of this is due to the differences in packet sizes and how much is due to the time it takes to process the packets in each layer?
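The conversion used in this experiment, Throughput = Utilization * Bandwidth, and the resulting loss can be sketched as below. The readings in the example are hypothetical placeholders, not the measured values behind Table I, and the function names are assumptions made for illustration.

```python
from statistics import mean


def mac_throughput_kbps(utilization_percent: float, bandwidth_kbps: float) -> float:
    """Throughput = Utilization * Bandwidth, with utilization given in percent."""
    return utilization_percent / 100.0 * bandwidth_kbps


def throughput_loss_kbps(utilization_percent: float, transport_kbps: float,
                         bandwidth_kbps: float) -> float:
    """Difference between MAC layer throughput and transport layer throughput."""
    return mac_throughput_kbps(utilization_percent, bandwidth_kbps) - transport_kbps


# Hypothetical (analyzer utilization %, Netperf KBPS) pairs for three runs.
runs = [(87.2, 8170.0), (87.0, 8150.0), (87.1, 8165.0)]
losses = [throughput_loss_kbps(u, t, 10_000.0) for u, t in runs]
print(f"average loss: {mean(losses):.0f} KBPS")
```

Keeping the bandwidth as an explicit argument lets the same calculation serve the 100MBPS trials, where only the analyzer and the bandwidth change.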
The expected loss due to the differences in packet size is calculated as follows:
From the transmission point of view we look at the message size of 8,192 bytes. From my initial reading it seemed that a single TCP header of 20 bytes and a single 20 byte IP header were added, for a total of 8,232 bytes, and that this would be split into 1,500 byte data packets. However, this did not agree with the frames I analyzed. My conclusion is that on our network the IEEE 802 standard Ethernet frame is not in use; after some investigation I found that Ethernet Encapsulation (RFC 894) is used instead. The latter frame format uses six bytes each for the destination and source addresses, 2 bytes for the type and a 4 byte CRC, for a total of 18 bytes of header. The maximum amount of data that an Ethernet frame can carry under this standard is 1,500 bytes, with a minimum data size of 46 bytes. This makes the minimum frame size 64 (46+18) bytes. Thus, a TCP/IP acknowledgment, which is 40 bytes in length, is padded with 6 bytes to make up the 46 byte minimum.

Figure 4 RFC 894: Ethernet Encapsulation

Each Ethernet frame carries the TCP/IP headers for reassembly at the destination. Each 8,192 byte message generated by Netperf is split into 5 full frames (1,518 bytes each) and a frame with an 892 byte data field (950 bytes). The data overhead in transmission should therefore be

((TCP:20 + IP:20 + Ethernet:18) * 6) / (Message Size:8,192 + (20 + 20 + 18) * 6) = 348/8,540

or approximately 4.07%. For our average Ethernet level throughput this would translate into a loss of [value missing] KBPS instead of the 550 KBPS indicated. However, the TCP acknowledgments of the packets must also be taken into consideration. Adding six 64 byte acknowledgments to our figures, the loss between Netperf and the analyzer would be ([value missing]/[value missing]), or [value missing], which still only accounts for a loss of 374 KBPS.

Assuming the destination receive window is large enough, the rest of the loss, 176 KBPS (550-374), can be assumed to be due to the latency between successive network layers on the computer and in the NIC.
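The header arithmetic for a single 8,192 byte message can be reproduced with a few lines of code. This is a sketch of the calculation described above only: it counts the 20 byte TCP header, 20 byte IP header and 18 bytes of RFC 894 framing per frame, and it leaves out acknowledgments. The names are hypothetical.

```python
TCP_HEADER = 20          # bytes per frame
IP_HEADER = 20           # bytes per frame
ETHERNET_OVERHEAD = 18   # RFC 894: 6 + 6 address bytes, 2 type bytes, 4 CRC bytes
MAX_DATA_FIELD = 1500    # largest Ethernet data field in bytes

PER_FRAME_OVERHEAD = TCP_HEADER + IP_HEADER + ETHERNET_OVERHEAD   # 58 bytes
MAX_USER_BYTES = MAX_DATA_FIELD - TCP_HEADER - IP_HEADER          # 1,460 bytes


def frame_sizes(message_bytes: int) -> list[int]:
    """On-the-wire size of each frame needed to carry one message."""
    full, rest = divmod(message_bytes, MAX_USER_BYTES)
    user_bytes = [MAX_USER_BYTES] * full + ([rest] if rest else [])
    return [n + PER_FRAME_OVERHEAD for n in user_bytes]


def header_overhead(message_bytes: int) -> tuple[int, int, float]:
    """Return (overhead bytes, total bytes sent, overhead fraction)."""
    frames = frame_sizes(message_bytes)
    overhead = len(frames) * PER_FRAME_OVERHEAD
    total = sum(frames)
    return overhead, total, overhead / total


print(frame_sizes(8192))            # [1518, 1518, 1518, 1518, 1518, 950]
overhead, total, fraction = header_overhead(8192)
print(overhead, total)              # 348 8540
print(f"{fraction:.2%}")            # 4.07%
```

The final frame here is well above the 46 byte minimum data field, so no padding is needed for this message size.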
Experiment 2

The impact of message size on throughput and CPU utilization on a 10MBPS Ethernet

This experiment studies the relation among message size, the amount of processor activity and the network throughput. The throughput numbers are those reported by Netperf. The CPU % data was gathered using PERFMON, the Windows NT built-in performance monitor. The CPU utilization numbers reported are averages over the time period of the experiment.

From our previous analysis, we expect to get the best throughput when the message size is an exact multiple of 1,460 bytes, since the TCP and IP headers each occupy 20 bytes per frame and the maximum data size is 1,500 bytes. In other words, 1,460 is the application message size with the minimum overhead.

Figure 5 Throughput Graph : 10MBPS (Throughput vs Message Size; x-axis: Message Size in bytes, y-axis: Throughput in MBPS)

Unexpectedly, the message sizes with the best throughput were both approximately multiples of one half of 1,460 (730): 8,192 (5.6 times 1,460) and 32,768 (22.4 times 1,460). We do not have an explanation for this observation at this time.

We expect CPU utilization to decrease as the message size increases, because successive datagrams are generated farther apart in time and the relative overhead per message decreases up to the 1,460 byte message size.
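The expectation that exact multiples of 1,460 bytes give the least overhead can be checked with a small model of user bytes divided by bytes on the wire. It is only a sketch under stated assumptions: every message is sent on its own, split into frames of at most 1,460 user bytes, with 40 bytes of TCP/IP headers and 18 bytes of RFC 894 framing per frame, and short data fields padded to 46 bytes. It ignores acknowledgments, the preamble, and any coalescing of small messages by the TCP implementation. The function names are hypothetical.

```python
TCP_IP_HEADERS = 40      # 20 byte TCP header + 20 byte IP header
ETHERNET_OVERHEAD = 18   # RFC 894 addresses, type and CRC
MIN_DATA_FIELD = 46      # shorter data fields are padded to this size
MAX_SEGMENT = 1460       # 1,500 byte data field minus the TCP/IP headers


def wire_bytes(message_bytes: int) -> int:
    """Total frame bytes needed to carry one message under the assumptions above."""
    if message_bytes <= 0:
        raise ValueError("message size must be positive")
    full, rest = divmod(message_bytes, MAX_SEGMENT)
    segments = [MAX_SEGMENT] * full + ([rest] if rest else [])
    return sum(max(s + TCP_IP_HEADERS, MIN_DATA_FIELD) + ETHERNET_OVERHEAD
               for s in segments)


def payload_efficiency(message_bytes: int) -> float:
    """Fraction of the transmitted bytes that are user data."""
    return message_bytes / wire_bytes(message_bytes)


for size in (1, 64, 730, 1460, 1461, 2920, 8192, 32768):
    print(f"{size:>6} bytes: {payload_efficiency(size):.4f}")
```

In this model the efficiency peaks at each multiple of 1,460 and drops just past it, since one extra byte costs a whole additional frame. The model does not explain the peaks observed at 8,192 and 32,768 bytes.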
Figure 6 CPU Utilization : 10MBPS (Message Size vs. CPU Utilization; x-axis: Message Size in bytes, y-axis: CPU %)

There were no surprises in this trial: the CPU utilization decreases as expected. Interestingly, the CPU utilization levels off as the message size is increased beyond 10,000 bytes. The presumed explanation for the leveling off is that once the NIC is saturated, the CPU generates a constant stream of datagrams to fill the available bandwidth.

Experiment 3

Throughput loss between the Transport and Network Access Layers on a 100MBPS LAN

The Internet Advisor at Cal Poly is not equipped to measure 100MBPS traffic. As an alternate method, I acquired a demo version of Surveyor, a software protocol analyzer from Shomiti Software at www.shomiti.com. The software was installed on R100B4, a workstation not directly involved in the test, so as to avoid overburdening one CPU or NIC during the test. The software does not output a discrete percent utilization table as the Internet Advisor does, so averages for utilization were taken manually from the graphic display. While this method is not as precise as that of experiment one, the data may be accurate enough for estimating loss percentages. It should also be noted that since this is a software
tool using a standard Ethernet card, the possibility exists that there may be packet loss due to the lack of data buffering on the card. The results obtained are presented below.

[Table II Throughput Loss : 100MBPS (table data not available)]

Note that the average loss indicated by the table is less than 2%. Since the utilization is not as close to the maximum bandwidth as it was in experiment one, we might expect fewer collisions and a lower percent loss than in experiment one; however, a loss of 2% is theoretically not possible. The theoretical minimum loss due to overhead is the same as given in experiment one: 4.30%. The expected loss from the Transport Layer to the Network Access Layer at 61 MBPS would be 2.5 MBPS minimum. The packets seen in Surveyor use the same Ethernet Encapsulation as in experiment one. It is clear that the graph readings are not accurate enough to produce a meaningful loss test.

Experiment 4

100MBPS variable size messages: throughput and CPU utilization

This experiment is a repeat of Experiment 2 using the 3Com 100MBPS hub. Data was gathered using Netperf and the Windows NT performance monitor. What is interesting to note is the difference in CPU utilization. The processor has to work harder to keep up with the increased bandwidth, and its utilization does not level out as quickly as during the 10MBPS trials. The first hint of dropping below 100% utilization was a blip
in the 512 byte trial. Since the CPU utilization was not as quick to drop, the message sizes tested were extended to 512kB.

Figure 7 Throughput Graph : 100MBPS (Throughput vs Message Size; x-axis: Message Size in bytes, y-axis: Throughput in MBPS)

Figure 8 CPU Utilization : 100MBPS (Message Size vs. CPU Utilization; x-axis: Message Size in bytes, y-axis: CPU %)
Notes to future student researchers

To measure the throughput losses between the Transport (TCP) and Internet (IP) layers, a thorough look at how TCP/IP is implemented in Windows NT will be helpful if benchmarks are to be developed to measure the losses between them.

A simpler way to measure the IP throughput would perhaps be to use UDP datagrams. This would eliminate the acknowledgments and therefore the dependence on the destination window.

Shomiti Surveyor can decode each of the layers and has a plug-in module, Packet Blaster, to generate network traffic, which should be useful. It is as yet unknown whether Surveyor can measure interlayer throughput.

Check the accuracy of the Surveyor throughput tests by analyzing the capture buffer for frame overflow. The NIC in the test system does not have an especially large buffer (8K).
Bibliography

Craig Hunt, TCP/IP Network Administration, Sebastopol; O'Reilly & Associates, Inc., 1993

Averill M. Law & David Kelton, Simulation Modeling and Analysis, New York; McGraw-Hill, Inc., 1991

W. Richard Stevens, TCP/IP Illustrated, Volume 1 : The Protocols, New York; Addison-Wesley Publishing Company, 1994

William Stallings, Ph.D., Data and Computer Communications, New York; MacMillan Publishing Company, 1994