AN 10.12 Performance Analysis and Software Optimization on Systems Using the LAN91C111 1 Introduction This application note describes one approach to analyzing the performance of a LAN91C111 implementation and fine-tuning the software to increase system throughput. Often, the user looks to the device driver for an answer to boost the overall Ethernet packet processing (receiving) speed in the system. While optimizations in the driver may well be sufficient to enhance the performance so that the system throughput requirements can be met, it by no means can be guaranteed. It is useful to understand all parameters that play a part in affecting the observed system performance. It is important to understand where the bottleneck is, and it is important to recognize the point where one has done all one could in software optimization. 1.1 Some Queuing Theory The first step to establishing a framework for calculating the theoretical throughput is to adopt a mathematical model used commonly in network performance analysis. For this, the author went back to a textbook he used years ago. The reference to the textbook is given at the end of this application note. There are lots, and lots, of books on the subject of queuing theory. Only the results of the applicable model are presented here. If the reader is interested in the derivation, please enjoy yourself by referring to one of these books. The mathematical model that is most applicable to our case is known as an Μ/Μ/1/Κ queuing system. This notation denotes a system as follows: λ µ K Figure 1.1 Μ/Μ/1/Κ Queuing System 1. There is one input stream of packets, as there is only one PHY in our LAN91C111. 2. There is one process which retrieves the packets from the LAN91C111. 3. A certain number, Κ, of buffers form a queue, are responsible for receiving packets. SMSC AN 10.12 Revision 1.2 (01-13-04)
The packet traffic at the input of the queue is characterized by the average number of packets per second, λ, at which they arrive at the queue. The device driver takes the packets out of the queue at an average rate of µ packets per second. Both of these processes are modeled by a probabilistic distribution called the Poisson distribution. To cut to the chase, the following expressions give, in the steady state, the probability p K, that a packet arrives at the input of the system and finds the queue full, at which the point the packet will be dropped by the LAN91C111. ( 1 a) K a p K = for λ µ K + 1 1 a 1 p K = for λ = µ K + 1 where a = λ/µ. Figure 1.2 Poisson Distribution The throughput is related to the probability that a packet arrives at the queue and finds the queue not full, or 1 - p K. This is proportional to the number of packets not dropped in a given period. Revision 1.2 (01-13-04) 2 SMSC AN 10.12
2 Performance as a Function of µ It is apparent that the performance characteristics of the system depends on several variables: how fast the system removes the packet from the LAN91C111, how many buffers are allocated in the LAN91C111 for receiving packets, and, less intuitively, how fast the packets are arriving the LAN91C111. The following figure depicts the system throughput, expressed as the ratio between the rate of packets not dropped by the system and the rate of total packets arriving, for the given scenario. This we define as the bandwidth utilization, which is the same as 1 - p K. 1.00 0.90 0.80 0.70 Utilization 0.60 0.50 0.40 0.30 λ = 100 λ = 1000 λ = 10000 λ = 50000 λ = 100000 0.20 0.10 0.00 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 µ Figure 2.1 Bandwidth Utilization for K=1 It easy to see that the faster the CPU is able to take packets out of the LAN91C111 buffers, the better the system is able to receive all or most of the input packets. It seems obvious, but one can never expect to receive more than µ packets per second, on the average. SMSC AN 10.12 3 Revision 1.2 (01-13-04)
3 Performance as a function of K The number of buffers allocated for receiving packets also affects the performance of the system. Figure 3.1 correlates the utilization to the number of receiving buffers for a particular packet arrival process characteristic. 1.00 0.90 0.80 0.70 Utilization 0.60 0.50 0.40 Κ=1 Κ=2 Κ=3 Κ=4 0.30 0.20 0.10 0.00 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 µ Figure 3.1 Bandwidth Utilization for λ = 10000 It can be seen that, given our mathematical model, increasing the number of buffers in the queue beyond 2 does not greatly affect the system throughput when the system s processing speed is high compared to the packet arrival rate. Revision 1.2 (01-13-04) 4 SMSC AN 10.12
4 System Performance Analysis Procedure Follow the steps below to analyze the system before attempting to optimize. Knowing where the bottleneck is gives you the most bang for the buck in terms of using your efforts wisely. 1. Are packets currently being discarded by the driver (or anywhere else in software) because the TCP/IP stack input buffer is full? If so, the application software or the operating system is too slow. Optimizing the driver will not help to increase the system performance. 2. Determine the number of buffers allocated for receiving in the LAN91C111. This is your K. 3. Attempt to measure how much time, in seconds, it takes to process one packet in the Interrupt Service Routine (ISR). This includes the following times: a. Interrupt latency b. Time required to copy the packet data from the LAN91C111 to main memory The reciprocal of this number is your µ. You will need to instrument your software to measure this over a large number of packets. µ is the average processing rate. 4. Plot the equation on the first page for 1 - p K, as a function of λ, with the Κ and µ values found from the two steps above. It should look something like one of the following curves. 1.00 0.90 0.80 Utilization 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00 µ = 100 µ = 1000 µ = 10000 µ = 50000 µ = 100000 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 λ Figure 4.1 Bandwidth Utilization for K=3 5. Measure how quickly the packets arrive at your system. Over a period of time, calculate the average number of packets arriving per second. This is your actual λ. Since your system is dropping packets (or else, you might not care to do this analysis in the first place), you should not use your software to do the measurement. You will want to do this measurement by using a network sniffer or such similar equipment. SMSC AN 10.12 5 Revision 1.2 (01-13-04)
6. Measure the number of packets received by the software. Correlate λ to the chart from Step 3. Does it make sense in terms of how many packets are currently being dropped? 5 System Performance Optimization Recommendations Try the following techniques in software to increase the system throughput performance. Do them one at a time. Each time after modifying the software, reassess the new performance by repeating Step 6 described in the System Performance Analysis Procedure. 1. Allocate the right number of buffers, Κ, in the LAN91C111 for receiving. Larger is better, but take care to not negatively affect packet transmission performance to the point of not meeting system requirements. The effectiveness of varying this parameter depends on the burstiness of the packets arriving at the input of the system, which in turn is dependent on the network application and the instantaneous network condition. 2. Rewrite the ISR to make copying packet data from the LAN91C111 buffers to main memory more efficient. Consider rewriting it in assembly language if necessary. Copy all packets present in the buffers for each ISR invocation. There is no downside to taking this optimization action (except that assembly code is less readable, harder to maintain, and more difficult to port). 3. Increase the interrupt priority for packet receives to lower the interrupt latency. Shortening the interrupt latency lowers the buffer depth, Κ, requirement. Take care to prevent starving other (nonnetworking) processes of CPU cycles. If you still need more performance gain after taking the actions above, there are usually things that can be tried to modify the hardware design in order to increase µ.. However, it is difficult to make specific recommendations without seeing the particular designs. We heard stories that involve the LAN91C111 on a 16-bit bus with 20 wait states there is no software in the world that is going to fix that! As our rule of thumb, the raw bus bandwidth should be about twice the maximum data bandwidth through an I/O device. For the LAN91C111, that means 50 MBps, or 80 ns cycle time on a 32-bit bus if you want to do 100 Mbps full-duplex. 6 References Schwartz, Mischa, Computer-Communication Network Design and Analysis, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1977. Revision 1.2 (01-13-04) 6 SMSC AN 10.12
80 Arkay Drive Hauppauge, NY 11788 (631) 435-6000 FAX (631) 273-3123 Copyright SMSC 2004. All rights reserved. Circuit diagrams and other information relating to SMSC products are included as a means of illustrating typical applications. Consequently, complete information sufficient for construction purposes is not necessarily given. Although the information has been checked and is believed to be accurate, no responsibility is assumed for inaccuracies. SMSC reserves the right to make changes to specifications and product descriptions at any time without notice. Contact your local SMSC sales office to obtain the latest specifications before placing your product order. The provision of this information does not convey to the purchaser of the described semiconductor devices any licenses under any patent rights or other intellectual property rights of SMSC or others. All sales are expressly conditional on your agreement to the terms and conditions of the most recently dated version of SMSC's standard Terms of Sale Agreement dated before the date of your order (the "Terms of Sale Agreement"). The product may contain design defects or errors known as anomalies which may cause the product's functions to deviate from published specifications. Anomaly sheets are available upon request. SMSC products are not designed, intended, authorized or warranted for use in any life support or other application where product failure could cause or contribute to personal injury or severe property damage. Any and all such uses without prior written approval of an Officer of SMSC and further testing and/or modification will be fully at the risk of the customer. Copies of this document or other SMSC literature, as well as the Terms of Sale Agreement, may be obtained by visiting SMSC s website at http://www.smsc.com. SMSC is a registered trademark of Standard Microsystems Corporation ( SMSC ). Product names and company names are the trademarks of their respective holders. SMSC DISCLAIMS AND EXCLUDES ANY AND ALL WARRANTIES, INCLUDING WITHOUT LIMITATION ANY AND ALL IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND AGAINST INFRINGEMENT AND THE LIKE, AND ANY AND ALL WARRANTIES ARISING FROM ANY COURSE OF DEALING OR USAGE OF TRADE. IN NO EVENT SHALL SMSC BE LIABLE FOR ANY DIRECT, INCIDENTAL, INDIRECT, SPECIAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES; OR FOR LOST DATA, PROFITS, SAVINGS OR REVENUES OF ANY KIND; REGARDLESS OF THE FORM OF ACTION, WHETHER BASED ON CONTRACT; TORT; NEGLIGENCE OF SMSC OR OTHERS; STRICT LIABILITY; BREACH OF WARRANTY; OR OTHERWISE; WHETHER OR NOT ANY REMEDY OF BUYER IS HELD TO HAVE FAILED OF ITS ESSENTIAL PURPOSE, AND WHETHER OR NOT SMSC HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SMSC AN 10.12 7 Revision 1.2 (01-13-04)