51-20-78 DATA COMMUNICATIONS MANAGEMENT

IMPLEMENTING VOICE OVER IP

Gilbert Held

INSIDE
Latency is the Key; Compression; Interprocessing Delay; Network Access at Origin; Network Transmission Delay; Network Egress at Destination; Buffer; Decompression

PAYOFF IDEA
Unlike the weather, which people like to talk about but over which they have minimal control, a voice-over-IP application can be given a good potential for success through many tools and techniques. This author uses the term "good" because a public IP network, such as the Internet, is currently incapable of reserving bandwidth. This means that the random process of packet flow can turn a viable solution today into an impractical one tomorrow. However, by using some well-known TCP/IP application tools and a handful of logical networking techniques, one can stack the deck in one's favor and enhance the chance of a successful voice-over-IP implementation, which is the focus of this article.

LATENCY IS THE KEY
The ability to implement a voice-over-IP solution depends on the one-way latency, or delay, between the originator and recipient of the call. This delay is variable and depends on several parameters, some of which are under one's control and others that, when using a public network, may be beyond one's control. Exhibit 1 lists the major components associated with the end-to-end analog-to-digital coding, transmission, and conversion back to voice of a packet carrying a digitized portion of a conversation. Note that the fixed and variable delays associated with the end-to-end transmission of a packet can generally range between approximately 60 and 289 ms. To put this time in perspective, a typical human ear can accept up to 250 ms of delay every once in a while before the conversation becomes annoying.

EXHIBIT 1 Packet Network Delays

Fixed and variable delays (ms)       Minimum   Maximum
Compression (voice coding)             10.00     45
Inter-process (origin)                 10.00     10
Network access at origin                0.25      7
Network transmission delay             20.00    200
Network egress at destination           0.25      7
Buffer (configurable)                  10.00     10
Decompression                          10.00     10
Total fixed and variable delays        60.50    289

While full-duplex transmission is desirable for most data applications, it is not useful and in fact creates problems when voice is carried over a network. This is because rational humans do not hold a conversation by both talking at the same time. Instead, they wait for one party to finish talking before the other party begins a response. If the latency begins to exceed a quarter of a second for significant portions of a conversation, the conversation will begin to resemble a CB-radio conversation, with each party having to say "over" to inform the other party it is OK to talk. Otherwise, the delay will result in one party periodically thinking the other has finished talking when they have not, resulting in a full-duplex conversation that requires one party to stop and the other party to begin anew.

In fact, the International Telecommunication Union (ITU) standard for one-way delay for a voice call specifies a maximum latency of 150 ms, which ensures that the call will not turn into a full-duplex conversation. For most organizations, a maximum latency of 200 ms and a mean latency approaching or under 150 ms should be sufficient to provide good quality of reconstructed voice.

For those curious about the latency associated with the use of the public switched telephone network (PSTN), it can be considered almost negligible. Telephone company switches introduce delay measured in microseconds (µs), each one thousandth of a millisecond (ms). Similarly, pulse code modulation (PCM) and adaptive differential pulse code modulation (ADPCM) have a coding delay of a few microseconds, roughly a thousandth of the delay associated with low-bit-rate voice encoding methods.

With an appreciation for the amount of one-way delay a voice-over-IP application can tolerate, examine the components of delay listed in Exhibit 1. In doing so, note that there are several techniques one can use to modify or adjust certain delays.

COMPRESSION
There are numerous low-bit-rate compression or coding methods one can select. In fact, most equipment vendors typically support between four and six voice coding methods.
Most of these methods are members of the code excited linear prediction (CELP) family of voice coders. The original CELP voice coder was a hybrid, incorporating both waveform coding similar to PCM and ADPCM and linear predictive coding. Although the first CELP voice coder operated at a relatively low bit rate in comparison to PCM and ADPCM, its coding delay was relatively high, approaching 60 ms. Since the first CELP voice coder was introduced, numerous variations have been developed, with approximately half a dozen now standardized by the ITU. Two of the more popular members of the CELP family are the low-delay (LD) CELP coder (G.728), which operates at 16 Kbps but has a delay of 10 ms, and the G.723.1 multirate coder, which operates at both 5.3 and 6.3 Kbps and has a coding delay of approximately 30 ms. Other coders have delays up to 45 ms.

By carefully examining the specification sheets associated with different voice coding techniques, one will note that there is normally a direct relationship between the coder bit rate and its mean opinion score (MOS), the latter a measurement of voice quality; that is, the higher the bit rate, the higher the MOS. There is also an inverse relationship between the coder's bit rate and its coding delay; that is, the higher the bit rate produced by the coder, the lower the coding delay.

Thus, when there is too much latency for a voice-over-IP application to work, one technique to consider is changing the voice coding method. In general, changing the coder from one that operates at 5.3/6.3 Kbps to one that operates at 8 Kbps will reduce the coding delay by approximately 10 ms. Changing to an LD-CELP coder can remove an additional 10 ms of delay. While 20 ms is not earth shattering, it may be enough to ensure that the quality of reconstructed voice is acceptable for one's voice-over-IP application. In addition, if one saves a few milliseconds here and a few milliseconds with other techniques, the savings add up. The cumulative effect may be sufficient to provide an acceptable level of reconstructed voice in a situation where excessive latency would otherwise preclude one's organization from implementing a voice-over-IP application.
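As a rough illustration of how a coder change shifts the end-to-end budget, the following Python sketch sums the seven Exhibit 1 components and then compares a 30 ms coder with a 10 ms coder. The function names, the default values (56 Kbps access and egress lines, 10 ms buffer), and the 80 ms network transmission figure are assumptions chosen for illustration, not measurements from this article.

```python
# Exhibit 1 values in milliseconds: component -> (minimum, maximum)
EXHIBIT_1 = {
    "compression (voice coding)": (10.0, 45.0),
    "inter-process (origin)": (10.0, 10.0),
    "network access at origin": (0.25, 7.0),
    "network transmission delay": (20.0, 200.0),
    "network egress at destination": (0.25, 7.0),
    "buffer (configurable)": (10.0, 10.0),
    "decompression": (10.0, 10.0),
}


def total_range(budget):
    """Return the (minimum, maximum) end-to-end delay for a budget table."""
    low = sum(minimum for minimum, _ in budget.values())
    high = sum(maximum for _, maximum in budget.values())
    return low, high


def one_way_delay(coder_ms, network_ms, access_ms=7.0, egress_ms=7.0,
                  interprocess_ms=10.0, buffer_ms=10.0, decompress_ms=10.0):
    """Sum one concrete scenario; the defaults are illustrative assumptions."""
    return (coder_ms + interprocess_ms + access_ms + network_ms
            + egress_ms + buffer_ms + decompress_ms)


if __name__ == "__main__":
    print(total_range(EXHIBIT_1))
    # Hypothetical 80 ms network delay with a 30 ms coder, then a 10 ms coder.
    for coder_ms in (30.0, 10.0):
        total = one_way_delay(coder_ms, network_ms=80.0)
        print(f"coder {coder_ms:.0f} ms -> total {total:.1f} ms, "
              f"within 150 ms target: {total <= 150.0}")
```

Under these assumed figures, the 20 ms saved by the coder change is what moves the total from just over the 150 ms target to under it.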

INTERPROCESSING DELAY
The interprocessing delay represents the time required to form an IP datagram from each segment of digitized voice produced by a voice coder and route the resulting datagram to a router connected to the IP network. The total interprocessing delay is approximately 10 ms and represents a series of delays. It is variable because the level of utilization of the LAN that connects the voice gateway to the router governs the gateway-to-router delay. In addition to the gateway-to-router delay, the gateway itself has some processing delay as it forms an IP datagram. Similarly, the router must process each datagram prior to sending it on its way toward its destination.

Although this author used 10 ms for the total interprocess delay at the origin, it is possible to slightly reduce this time. To do so, consider obtaining a faster processor for the voice gateway. Most voice gateways are modular devices, with voice processing boards that are inserted into a PC. Although each voice processing board contains its own processor, which enables support for different voice coding methods and actually performs the analog-to-digital conversion, the formation and movement of datagrams through the gateway is a function of its main processor. This means that a Pentium III 700 MHz system will have less delay and move datagrams through the gateway and onto the LAN faster than a Pentium III 500 MHz-based system. From tests performed by this author, replacing the gateway system with one having a faster processor can be expected to shave 1 ms per extra 100 MHz of processor power.

A second method one can use to reduce the interprocess delay is to upgrade the LAN if it is operating at a level of utilization beyond 50 percent. Based on this author's experience, replacing an Ethernet LAN operating at slightly over 50 percent utilization with a Fast Ethernet network reduced delay by approximately 1 ms. Thus, by upgrading the gateway processor from a 500 MHz system to a 700 MHz system and replacing the Ethernet network that connects the gateway to the router with a Fast Ethernet network, one may be able to shave approximately 3 ms off the end-to-end delay.

NETWORK ACCESS AT ORIGIN
A digitized voice conversation consists of a stream of very small datagrams to reduce the impact of one or more being lost. If the router is connected to the Internet at 56 Kbps and the average datagram length is 49 bytes, then the network access delay at the origin becomes (49 bytes × 8 bits/byte) / 56 Kbps, or 7.0 ms. Of the 49 bytes in a typical datagram, 20 represent the IP header and 8 represent the UDP header, leaving 21 bytes that actually transport digitized voice.

Although one may encounter references to the use of both TCP and UDP for transporting voice, in actuality TCP is used for the connection setup while UDP is used for the actual transport of digitized voice. Because lost or erroneous voice datagrams cannot usefully be retransmitted during a real-time conversation, UDP, a connectionless, best-effort protocol, is used to transport digitized voice.
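The access-delay arithmetic can be sketched in a few lines of Python. The function names are hypothetical; the 20-byte IP header, 8-byte UDP header, 21-byte voice payload, and 56 Kbps line rate are the article's figures, and 1.536 Mbps is the usable T1 payload rate the article applies to a T1 access line.

```python
IP_HEADER_BYTES = 20   # IP header size cited in the article
UDP_HEADER_BYTES = 8   # UDP header size cited in the article


def datagram_bytes(voice_payload_bytes):
    """Total datagram length for a given number of digitized-voice bytes."""
    return IP_HEADER_BYTES + UDP_HEADER_BYTES + voice_payload_bytes


def access_delay_ms(datagram_len_bytes, line_rate_bps):
    """Time in ms to clock one datagram onto an access line of the given rate."""
    return datagram_len_bytes * 8 * 1000 / line_rate_bps


if __name__ == "__main__":
    size = datagram_bytes(21)
    print(size)                                        # datagram length in bytes
    print(access_delay_ms(size, 56_000))               # 56 Kbps access line
    print(round(access_delay_ms(size, 1_536_000), 2))  # T1 payload rate
```

Note that the 28 bytes of headers outweigh the 21 bytes of voice, so more than half of the access delay in this example is spent transmitting overhead.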

Because it is extremely important to ensure that call setup information, such as the number dialed, is received correctly, TCP, which is a connection-oriented, reliable protocol, is used to transport call setup information.

If the access line is upgraded to a T1 line operating at 1.544 Mbps, then the network access delay becomes (49 bytes × 8 bits/byte) / 1.536 Mbps, or 0.25 ms. Note that a divisor of 1.536 Mbps was used instead of 1.544 Mbps for the T1 line because 8 Kbps is used for framing and is not available for data transfer.

In examining the network access delay at the origin, it becomes possible to reduce the latency associated with moving datagrams from the local network into the Internet by upgrading the access line. As shown by the previous computations, moving from a 56 Kbps access line to a T1 access line can reduce latency by almost 7 ms. If the cost of a T1 is prohibitive, one can consider different types of fractional T1 (FT1) access lines, which operate at rates between 56 Kbps and 1.544 Mbps and could be used to reduce network access delays.

NETWORK TRANSMISSION DELAY
Because it is very rare to use voice-over-IP for intra-city communications, assume that the IP network will route data between dissimilar geographical areas. Because there will then be at least two backbone routers involved in the routing process, one can expect a minimum delay of 20 ms. Because most network transmission delays are under 200 ms, the range of delays is listed in Exhibit 1 as 20 to 200 ms.

One of the key tools that can be used ahead of time to determine the viability of a voice-over-IP application is the Ping utility program. Exhibit 2 illustrates the basic format of Ping as implemented under different versions of Microsoft Windows. Note that one can simply enter Ping with a host name or IP address, or one can add one or more of the options listed in Exhibit 2. Although the primary use of Ping is to determine the operational status of the target host or address, it also provides the round-trip delay from originator to destination.

Exhibit 3 illustrates the use of the Ping utility to determine the round-trip delay between the author's computer and the Web server operated by Yale University. In examining Exhibit 3, note that the Microsoft implementation of Ping transmits a sequence of four echo-request packets to the target. The target responds with an echo-reply to each echo-request, with the originating station computing the round-trip delay. If a response is not received within 250 ms, which is the default timeout value, the originating station will consider its request to have timed out and will generate its next echo-request.
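Round-trip times reported by Ping can also be summarized programmatically rather than by eye. The Python sketch below is a minimal example that assumes reply lines in the Windows Ping style ("time=156ms") and timeout lines containing "timed out". The function name is hypothetical, the address 192.0.2.1 and the TTL value in the sample lines are placeholders, and the round-trip times are the ones this article reports for Exhibit 3. The first sample is skipped by default because it may include host name resolution time.

```python
import re
from statistics import mean

# Matches "time=156ms" as well as "time<10ms" in a Windows-style reply line.
REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def summarize_ping(lines, skip_first=True):
    """Collect round-trip times and timeouts from saved Ping output lines."""
    rtts = []
    timeouts = 0
    for line in lines:
        match = REPLY_TIME.search(line)
        if match:
            rtts.append(int(match.group(1)))
        elif "timed out" in line.lower():
            timeouts += 1
    if skip_first and rtts:
        rtts = rtts[1:]  # first reply may include name-resolution time
    return {
        "samples": len(rtts),
        "timeouts": timeouts,
        "mean_ms": mean(rtts) if rtts else None,
        "peak_ms": max(rtts) if rtts else None,
    }


# Illustrative lines only: placeholder address and TTL, article's RTT values.
SAMPLE = """\
Reply from 192.0.2.1: bytes=32 time=156ms TTL=240
Reply from 192.0.2.1: bytes=32 time=125ms TTL=240
Request timed out.
Reply from 192.0.2.1: bytes=32 time=110ms TTL=240
""".splitlines()

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

The same function could be fed the lines of a saved Ping output file to look for periods of the day when delays escalate.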

EXHIBIT 2 Displaying the Format of Ping by Entering the Command Without Any Options

EXHIBIT 3 Pinging the Web Server at Yale University

In Exhibit 3, note that the initial round-trip delay was 156 ms, while the second and fourth round-trip delays were computed to be 125 ms and 110 ms, respectively. One of the most common mistakes made when using Ping is to rely on the first round-trip delay time. In actuality, if one enters a host name that was not previously resolved, the resolution of the host name into an IP address will result in the first round-trip delay being longer than subsequent delays. The only exception to this is when traffic or processing at a router reaches the point where it results in significant delays that cause Ping to time out. This is illustrated by the third line after the "Pinging" message in Exhibit 3. Thus, when using Ping to determine the round-trip delay through a network, one should always discard the first delay, as it can include time for host-name-to-IP-address resolution.

In addition, one should consider running Ping throughout the day to determine if there are one or more periods of time when delays escalate. One can either write a script to run Ping at different times, or use the -t option to run it continuously during the day. If one redirects the results to a file, one can then read the file into a spreadsheet and easily determine the mean and peak values, as well as other statistics about round-trip delay. For example, the following DOS command would run Ping on the target www.yale.edu continuously and redirect its output to the file test:

    Ping -t www.yale.edu >test

If the response to a series of Pings issued over a period of time indicates the network delay is too great to allow a voice-over-IP application to have an overall delay under 200 ms, many people would be tempted to give up on the application. Instead, one should consider the use of a second TCP/IP application tool built into Windows and other operating systems. That tool is the traceroute application, which is named tracert under Windows.

Tracert operates by initially transmitting a series of packets with the time-to-live (TTL) field value set to 1. Each router automatically decrements the TTL field value of a packet it handles. If the value then equals zero, the packet is discarded and the router returns an error message to the originator, along with the IP address of the router and, many times, textual information about the router and its network connection. The originator then transmits a new series of packets with the TTL field value incremented by 1, to 2. These packets flow through the first router on the path to the destination, where the TTL field value in each packet is decremented by 1, to 1. Thus, the second router in the path to the destination discards the series of packets and returns error messages to the originator. This process continues until a sequence of packets either reaches its destination or the maximum TTL value is reached. Under Windows, the default maximum TTL value is 30 hops. Exhibit 4 illustrates the format of the Microsoft Windows version of traceroute, called tracert.
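The TTL mechanism that tracert relies on can be illustrated with a small simulation. The Python sketch below sends no real packets; it only models routers decrementing a TTL counter, and the router and host names are hypothetical.

```python
def probe(routers, destination, ttl):
    """Simulate one probe; return (responder, reached_destination)."""
    for router in routers:
        ttl -= 1                      # every router decrements the TTL field
        if ttl == 0:
            return router, False      # router discards the probe and reports back
    return destination, True          # probe survived every router on the path


def trace(routers, destination, max_hops=30):
    """Raise the TTL one step at a time, recording who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, reached = probe(routers, destination, ttl)
        hops.append((ttl, responder))
        if reached:
            break
    return hops


if __name__ == "__main__":
    # Hypothetical three-router path to a hypothetical destination host.
    for ttl, responder in trace(["router-a", "router-b", "router-c"], "web-host"):
        print(ttl, responder)
```

Each successive TTL value reveals one more router on the path, which is why the destination in this model answers on the probe whose TTL is one greater than the number of routers. The max_hops parameter plays the role of the 30-hop default limit.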
Note that one can change the maximum number of hops in one's search for the route to a target via the -h option.

EXHIBIT 4 The Format of the Microsoft Windows Version of Traceroute, Called Tracert

Because Ping was used to determine the round-trip delay to the Web server at Yale University, now use tracert to trace the path to the Web server. Exhibit 5 illustrates the resulting display from the use of tracert. Note that a total of 15 hops were required to reach the destination. Also note that the Microsoft version of traceroute issues a sequence of three packets for each TTL value used, resulting in three round-trip delay computations. On the 12th hop, the second packet had a response time that exceeded 250 ms, resulting in an asterisk (*) being displayed to indicate a timeout condition.

EXHIBIT 5 Tracing the Route to the Yale University Web Server

If the use of Ping indicates a periodic delay that exceeds the amount of latency required for a voice-over-IP application to work, one can use tracert to determine if there are one or more routers abnormally contributing to the delay. If so, it may be possible to ask one's Internet service provider (ISP) to reroute the path packets must take, or perhaps the ISP would be willing to upgrade or replace a router that acts as a bottleneck. In carefully examining Exhibit 5, one notes that the routers at hops 11, 12, and 13 represent bottlenecks. If planning a voice-over-IP application to the Yale campus, this author would certainly bring this to the attention of his ISP.

NETWORK EGRESS AT DESTINATION
Returning to Exhibit 1, note that the latency for datagrams flowing from the Internet to the destination is shown to be between 0.25 ms and 7 ms. Once again, this range represents the difference in delay between a T1 access line (0.25 ms) and a 56 Kbps access line (7 ms). Thus, by upgrading one or both access lines, it becomes possible to reduce end-to-end delay.

BUFFER
As a person talks, a voice coder produces a series of datagrams with a uniform time delay between each datagram. As the datagrams flow through the Internet, they experience random delays at each node in the network based on the flow of traffic to the router as well as the state of its queues. By the time the datagrams exit the Internet and flow toward the destination gateway, the gaps between datagrams vary randomly. If these datagrams were directly used to reconstruct voice, the result would be awkward-sounding gaps between small segments of voice. Thus, instead of being directly converted back to analog voice, the datagrams are first moved into a jitter buffer. They are then removed in order with a uniform time delay between extractions, resulting in natural-sounding voice when the contents of the datagrams are converted back into analog voice.

Most jitter buffers can be set from 0 (disabled) to a maximum of 255 ms. Because one needs to smooth out the random delays between received datagrams, the jitter buffer should never be set to a value of 0.
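The effect of the jitter buffer setting can be seen in a small playout model. The Python sketch below assumes a coder that emits one datagram every 20 ms and a fixed playout schedule that starts when the first datagram has waited in the buffer for the configured delay. The function name, the 20 ms interval, and the per-datagram network delays are all illustrative assumptions, not measurements.

```python
def late_datagrams(arrival_ms, interval_ms, buffer_ms):
    """Return indexes of datagrams that arrive after their playout time."""
    start = arrival_ms[0] + buffer_ms      # playout begins after the buffer delay
    late = []
    for index, arrived in enumerate(arrival_ms):
        due = start + index * interval_ms  # uniform spacing between extractions
        if arrived > due:
            late.append(index)             # too late to be played in sequence
    return late


if __name__ == "__main__":
    interval_ms = 20                          # assumed coder output interval
    network_delay_ms = [50, 58, 47, 62, 51]   # hypothetical per-datagram delays
    arrivals = [i * interval_ms + d for i, d in enumerate(network_delay_ms)]
    for buffer_ms in (0, 10, 15):
        late = late_datagrams(arrivals, interval_ms, buffer_ms)
        print(f"buffer {buffer_ms} ms -> late datagrams: {late}")
```

In this model a larger buffer absorbs more delay variation, but every millisecond of buffer is added to the end-to-end latency, so the setting is a trade-off rather than a value to maximize.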

Similarly, because the human ear can only tolerate 200 ms of delay and there are many other delays that must be considered, the jitter buffer should never be configured anywhere near its maximum value. Instead, a good rule of thumb is to initially set its value to a delay of 10 ms. If the total delay, including the jitter buffer delay, is well under 200 ms, one can then experiment and gradually increase the jitter buffer delay to determine if doing so makes reconstructed voice sound better.

DECOMPRESSION
Unlike voice coding delays, which can vary significantly based on the coding technique, decompression is relatively fixed at approximately 10 ms, regardless of the method used. Thus, while there are minor differences in decompression time between voice coding methods, a good rule of thumb is to use a 10 ms delay for decompression.

RECOMMENDED COURSE OF ACTION
As indicated in this article, there are seven major components associated with the end-to-end delay or latency of datagrams transporting voice. While some components are relatively fixed and there is little to be gained by altering their parameters, other delay components are quite variable and there are several techniques for altering their contribution to overall delay. In addition, Ping and tracert provide the tools to examine the major contributing factor to end-to-end delay: network transmission delay. By carefully examining each component of delay and using applicable tools, it becomes possible to successfully implement voice-over-IP applications that otherwise might never become a reality.

Gilbert Held is an award-winning author and lecturer. Gil is the author of over 40 books and 250 technical articles. Some of Gil's recent publications include Internetworking Voice and Data, 2nd edition, and Cisco Security Architecture (co-authored with Kent Hundley), published by McGraw-Hill, and Managing TCP/IP and Frame Relay Networking, published by John Wiley & Sons. Gil can be reached via e-mail at gil_held@yahoo.com.