Cairo University
Faculty of Computers and Information
Information Technology Department

Master's thesis summary

Performance Evaluation of VoIP Quality Schemes

Submitted by Esa Sabry El-Metwally El-Metwally

Voice over Internet Protocol (VoIP) provides real-time speech communication between two users in a way that closely resembles a face-to-face conversation. The promise of less expensive phone calls with comparable quality and better features than Public Switched Telephone Networks (PSTNs) has accelerated VoIP's adoption, both in businesses and homes. Its better integration with various forms of collaborative communication, such as instant messaging, email, and voicemail, has made it suitable for future communication solutions, but there remain several issues that are unique to VoIP systems and require further analysis. The quality of a conversation depends on two factors that are directly or indirectly perceived by users: the quality and the latency of the one-way speech segments received.

This thesis evaluates the Conversational Voice Communication Quality (CVCQ) of VoIP systems from both the user and the system perspectives. It identifies the metrics for CVCQ, which include Listening-Only Speech Quality (LOSQ), Conversational Interactivity (CI), and Conversational Efficiency (CE). These metrics depend on the mouth-to-ear delays (MEDs) between the two clients. The thesis then investigates the trade-offs among these metrics with respect to the system-controllable MEDs. The work is based on packet traces collected on PlanetLab and on the dynamics of human interactive speech.

Name and signature:
1. Prof. Dr. Fathy Ahmed Amer

January 2012, Egypt
Acknowledgement

First, I would like to sincerely thank Allah the Almighty, who has blessed me with all that is good throughout my life; however much I thank Him, my thanks will never be enough. I am deeply indebted and grateful to my supervisor, Prof. Dr. Fathy Amer, for his continuous guidance, supervision, and warm support, which helped me stay on the right track from the first day of my research. Finally, I would like to express my sincere appreciation to my family, especially my parents and my wife, for their complete support and their patience with me while I carried out this research.
Abstract

Voice over Internet Protocol (VoIP) provides real-time speech communication between two users in a way that closely resembles a face-to-face conversation. The promise of less expensive phone calls with comparable quality and better features than Public Switched Telephone Networks (PSTNs) has accelerated VoIP's adoption, both in businesses and homes. Its better integration with various forms of collaborative communication, such as instant messaging, email, and voicemail, has made it suitable for future communication solutions, but there remain several issues that are unique to VoIP systems and require further analysis. One issue is that a VoIP node can reside on any one of several types of hardware interfaces, such as a laptop, Personal Digital Assistant (PDA), smartphone, or dedicated handset. Another issue is that when a conversation is conducted over the Internet, speech segments can experience delays, jitter, and losses.

The quality of a conversation depends on two factors that are directly or indirectly perceived by users: the quality and the latency of the one-way speech segments received. In contrast to face-to-face conversations, the delays incurred in the reception of VoIP speech segments can lead to asymmetry in the silence durations between turns and cause inefficiency in communication. In these cases, each user experiences speech segments that are separated by silence periods of alternating long and short durations. This asymmetry might lead to a perception that the other user is responding slowly. Because network behavior is path-dependent, nondeterministic, and non-stationary, the factors that affect conversational quality might vary over time and counteract each other. For example, the one-way quality and the delay incurred in transmitting speech segments from the mouth of a speaker to the ear of a listener (the Mouth-to-Ear Delay, MED) counteract each other in their effects on conversational quality.
On the one hand, speech segments have a higher chance of being received, and consequently provide better one-way quality, if the receiver waits longer. On the other hand, this additional delay results in a longer MED, which leads to lower perceptual quality. The impact of delays on conversational quality also depends on the turn-switching frequency. For instance, an MED of 300 milliseconds (ms) can significantly degrade a conversation's symmetry and efficiency if participants take frequent turns, but will be virtually imperceptible if users take a long time (say, 10 seconds) in each turn. Variations in one-way
quality and latency might cause double-talk and interruptions that further degrade conversational quality. Ideally, the trade-offs between delay and quality should be dynamic and respond to changing conditions, with the receiver either adapting its MED in order to achieve a consistent speech quality or keeping a consistent MED but allowing the speech quality to vary.

While the evaluation of speech communication systems has been an important field for both academia and industry for decades, the introduction of VoIP systems has created a new set of issues that require new evaluation methods. To date, there have been only a small number of comprehensive evaluations of commonly used VoIP systems. Previous work has attempted to take on this task with increasing levels of analytical sophistication. The most straightforward analysis approach is through subjective tests. However, subjective tests cannot be used in large-scale experiments because of their large overhead, the high cost of expert listeners, and their unrepeatable nature. In addition, the evaluation of some commercial VoIP systems is hampered by their proprietary nature. Most of these systems use codecs and algorithms that are not freely available for testing. As a result, it is impossible to obtain some of the critical parameters, such as the number of packets unavailable at the decoder due to network losses or delays. We therefore must evaluate current systems by treating them as black boxes whose input and output waveforms are the only information available. Even so, both subjective and objective metrics are important in the evaluations because each alone is inadequate.

This research evaluates the Conversational Voice Communication Quality (CVCQ) of VoIP systems from both the user and the system perspectives. The thesis identifies the metrics for CVCQ, which include Listening-Only Speech Quality (LOSQ), Conversational Interactivity (CI), and Conversational Efficiency (CE).
These metrics depend on the mouth-to-ear delays (MEDs) between the two clients. The thesis investigates the trade-offs among them with respect to the system-controllable MEDs. The work is based on packet traces collected on PlanetLab and on the dynamics of human interactive speech.
Publications

[1] Esa Sabry, Prof. Dr. Fathi Amer: Performance evaluation of Conversational Voice Communication Quality (CVCQ) of VoIP systems, Al-Azhar Engineering Eleventh International Conference (AEIC 2010), December 21-23, 2010, Cairo, Egypt.

[2] Esa Sabry, Prof. Dr. Fathi Amer: Factors that affect Conversational Voice Communication Quality (CVCQ) of VoIP systems, the 2011 World Congress on Computer Science and Information Technology (WCSIT'11), January 24-27, 2011, Cairo, Egypt.
Contents

Acknowledgement
Abstract
Publications
Contents
List of Figures
List of Tables
List of Abbreviations

Chapter 1: Introduction
1.1 Background
1.2 VoIP vs. PSTN
1.3 Components of VoIP Systems
1.4 Different Forms of VoIP
1.4.1 ATA
1.4.2 IP Phones
1.4.3 Computer-to-Computer
1.5 Motivations
1.6 Problem Statement
1.7 Thesis Outline

Chapter 2: Conversational Voice Quality
2.1 Effect of Delay on Conversation
2.2 Human Response Delay and Mutual Silence
2.3 Objective Metrics of Interactive VoIP
2.3.1 Conversational Interactivity
2.3.2 Conversational Efficiency
2.4 Effect of Double Talk on Conversational Quality
2.4.1 Double Talk
2.4.2 Adaptation of Human Behavior
2.5 VoIP Client Architecture
2.5.1 Play-Out Scheduling
2.5.2 Loss Concealment
2.5.3 Speech Codecs

Chapter 3: Related Work on Evaluation of Conversational Quality
3.1 Quantifying Quality
3.2 Measuring Quality
3.3 Quality Tolerances
3.4 Quality and Noise
3.5 Effect of MED on Quality
3.6 Subjective and Objective Evaluations
3.7 Objective Measures on Conversational Quality
3.7.1 The E-Model
3.7.2 Perceptual Evaluation of Speech Quality (PESQ)
3.7.3 The Call Clarity Index (CCI)
3.7.4 Other Measures
3.8 Subjective Measures on Conversational Quality

Chapter 4: Proposed Methodology for Evaluating Conversational Quality
4.1 A Model of Interactive Conversation
4.2 Test-Bed of the Experiment
4.3 Evaluation of Conversational Quality Parameters
4.4 Control of MED Level
4.5 Comparing the Performance of Two Conversations

Chapter 5: The Effect of Main Factors on Conversational Quality Parameters
5.1 The Effect of MED
5.1.1 The Effect of MED on LOSQ
5.1.2 The Effect of MED on CI
5.1.3 The Effect of MED on CE
5.2 The Effect of HRD
5.2.1 The Effect of HRD on LOSQ
5.2.2 The Effect of HRD on CI and CE
5.3 The Effect of Network Condition
5.3.1 The Effect of Network Condition on LOSQ
5.3.2 The Effect of Network Condition on CI
5.3.3 The Effect of Network Condition on CE
5.4 Trade-offs on Metrics of Conversational Quality
5.5 Approaches of Main VoIP Systems

Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work

References
Appendix A
List of Figures

Figure 1.1 Non-adaptive VoIP system
Figure 1.2 Adaptive VoIP system
Figure 1.3 Different Forms of VoIP Systems
Figure 1.4 ATA converter
Figure 2.1 Asymmetric mutual silences
Figure 2.2 Trade-off between MS variations and LOSQ
Figure 2.3 The occurrence of a double-talk due to a lack of adequate system reaction to network-delay spikes
Figure 2.4 Architecture showing interactions among the VoIP clients, the network, and the communicating humans
Figure 3.1 ROI and Impairment Calculations
Figure 3.2 The PESQ processing structure
Figure 3.3 In-service Non-intrusive Measurement Device (INMD)
Figure 4.1 MS experienced and the next HRD
Figure 4.2 Our test-bed to emulate a two-way interactive voice communication using traces collected in the PlanetLab
Figure 4.3 MED over VoIP systems
Figure 5.1 Factors affect conversational quality and its parameters
Figure 5.2 The effect of MED on PESQ under (a) various network conditions (b) various conversations types
Figure 5.3 Some samples of LHL network in slow conversations
Figure 5.4 The effect of MED on CI
Figure 5.5 The effect of MED on CE
Figure 5.6 The effect of HRD on CQ in different network conditions
Figure 5.7 The effect of network condition on CI
Figure 5.8 The effect of network condition on CE
Figure 5.9 Effect on CI and CE when MED changes
Figure 5.10 The planes representing the conditions of the conversational type
Figure 5.11 PESQ-MED: Mean values and contours of regions containing 90% of the samples for four VoIP systems
List of Tables

Table 1.1 A qualitative comparison of voice over PSTN and over IP
Table 2.1 Statistics of three face-to-face conversations
Table 3.1 ITU P.800.1 terminology on telephone transmission quality
Table 3.2 The ITU's E-model and MOS scores
Table 3.3 A quality degradation scale
Table 4.1 Objective evaluations of four VoIP systems tested under six Internet and one ideal connections
Table 5.1 Linear regression table between MED and CI of fast conversation (Type 1)
Table 5.2 Linear regression table between MED and CE of fast conversation (Type 1)
Table 5.3 HRD used in business conversations
Table 5.4 HRD used in social conversations
Table 5.5 Common VoIP Service Quality Thresholds
List of Abbreviations

AIN    Advanced Intelligent Network
ATA    Analog Telephone Adaptor
CCI    Call Clarity Index
CE     Conversational Efficiency
CI     Conversational Interactivity
CQ     Conversational Quality
CQS    Conversational Quality Subjective
CVCQ   Conversational Voice Communication Quality
DSL    Digital Subscriber Line
HRD    Human Response Delay
HRS    Human Response Simulator
INMD   In-service Non-intrusive Measurement Device
IP     Internet Protocol
ISP    Internet Service Provider
ITU    International Telecommunication Union
kbps   kilobits per second
LC     Loss Concealment
LOSQ   Listening-Only Speech Quality
MED    Mouth-to-Ear Delay
MNRU   Modulated Noise Reference Unit
MOS    Mean Opinion Score
MS     Mutual Silence
PBX    Private Branch Exchange
PDA    Personal Digital Assistant
PESQ   Perceptual Evaluation of Speech Quality
PLC    Packet Loss Compensation
PSDN   Packet Switched Data Network
PSTN   Public Switched Telephone Network
RJ     Registered Jack
ST     Segment of Talk
UDP    User Datagram Protocol
VoIP   Voice over Internet Protocol
Wi-Fi  Wireless Fidelity
Chapter One Introduction
The global evolution of the Internet and the widespread growth of networks have made the Internet part of everyday life. As a result, interest in and demand for different applications have increased, and this rise in demand has produced many new applications. Voice over Internet Protocol (VoIP) technology has become a potential alternative to, and supplement of, traditional telephony over the Public Switched Telephone Network (PSTN), providing a versatile, flexible, and cost-effective solution for speech communication.

This chapter introduces VoIP systems and compares them with PSTN systems. It then describes the different forms of VoIP systems, followed by the goal of this thesis and the problem definition. Finally, it summarizes the remaining chapters of the thesis.

1.1 Background

VoIP refers to the real-time transmission of voice signals as packetised data across networks using the Internet Protocol. The main advantage of VoIP is its low cost structure compared to traditional telephone services, especially for long-distance calls. The Internet Protocol network was originally designed for non-real-time data communications, offering a best-effort service with no Quality of Service guarantee.

Speech quality is an important aspect of voice communication. On the one hand, service providers must continually assess the quality of the service they offer to maintain their competitiveness. On the other hand, subscribers are constantly comparing the quality and cost structure of various voice services.
VoIP has many advantages, chief of which is its cost-effectiveness. However, to compete with traditional telecommunications technologies, the quality of speech over a VoIP connection must be comparable to or better than that of the PSTN. Compared to the traditional PSTN, VoIP introduces new impairments, including packet loss, temporal clipping, delay, jitter, and codec distortion. Over the last several years, experience with voice-over-IP technologies has shown that the quality of voice transmission over the Internet remains a primary obstacle to the broader adoption of VoIP services.

VoIP has moved from being an interesting and cheap application for enthusiasts to a public service for everybody, where speech quality requirements have significant importance. Many people are not satisfied with the quality of service offered by VoIP providers, which is often lower than the quality of traditional PSTN telephony. One of the main causes of the problem is that the Internet was initially designed to transport bursty data and was not optimized for real-time traffic. Voice requires real-time handling from the network and from the end-points, and is very sensitive to many factors. IP call quality can be affected by noise, distortion, signal volume that is too high or too low, echo, gaps in speech, and a variety of other problems.

When measuring call quality, three basic categories are studied: listening quality, conversational quality, and transmission quality. Listening quality refers to how users rate what they "hear" during a call, while conversational quality refers to how users rate the overall quality of a call based on listening quality and their ability to converse during the call. This includes any echo- or delay-related difficulties that may affect the conversation. Transmission quality refers to the quality of the network connection used to carry the voice signal. This is a measure of network service quality as opposed to the specific call quality.
The objective of call quality measurement is to obtain a reliable estimate of one or more of the above categories using either subjective or objective testing methods, i.e., using human test subjects or computer-based measurement tools.

1.2 VoIP vs. PSTN

The public switched telephone network (PSTN) has been evolving ever since Alexander Graham Bell made the first voice transmission over wire in 1876. In traditional telephony, devices can communicate only with the devices to which they are directly connected, and the telephone companies and their protocols must handle all location and routing features. Traditional telephony uses circuit-switched networks [1]. The PSTN relies on circuit switching, in which the network establishes a dedicated end-to-end connection between two hosts. The resources needed for communication between these end systems are reserved for the duration of the communication session. The main disadvantage of circuit switching is that the dedicated circuits are idle during silent periods, so network resources are wasted during these periods [2].

Internet telephony (VoIP) is a revolutionary technology that has the potential to completely rework the world's phone systems. Internet telephony is the digital transmission of voice signals from one party to another over a packet-switched data network (PSDN). The first documented Internet telephony experiments were conducted on the then ARPANET (the forerunner of the Internet) by researchers at the Massachusetts Institute of Technology in the mid-1970s, resulting in the publication of an Internet protocol specification for the Network Voice Protocol in 1977 [3]. These experiments achieved audio transmission over packet networks but were limited to academic environments.
Computers of that era did not have the power to compress audio data below 64 kbps or 56 kbps, and sound input and output devices had to be custom-built because none were available to buy.
Later, by 1993, once computing power was sufficient to compress speech below 14.4 kbps, the first commercial Internet phone application appeared [3].

Unlike the PSTN, VoIP uses packet switching, which sends digitized voice data packets over the Internet along many possible paths. The packets are reassembled at their destination to regenerate the voice signal [4]. In packet switching, network resources are not reserved; a session's messages use the resources on demand and, as a consequence, may have to wait for access to a communication link. A packet is sent into the network without any bandwidth being reserved for it. If one of the links is congested because other packets need to be transmitted over it at the same time, the packet has to wait in a queue at the sending side of the transmission link and hence suffers a delay. The Internet makes its best effort to deliver packets in a timely manner, but it does not make any guarantees [2].

According to [5], the following is a qualitative comparison of voice over PSTN and voice over IP.

Table 1.1: A qualitative comparison of voice over PSTN and over IP

Concept | Voice over PSTN | Voice over IP
Switching | Circuit switched (end-to-end dedicated circuit set up by circuit switches). | Packet switched (statistical multiplexing of several connections over links).
Bit rate | 64 or 32 kbps. | 14 kbps with overhead (only when speaker is talking).
Latency | <100 ms. | 200-700 ms, depending on the total traffic on the IP network. Lower latencies are possible with private IP networks.
Bandwidth | Dedicated. | Dynamically allocated.
Cost of access/billing | Business customers: monthly charge for line, plus per-minute charge for long distance, cost of Private Branch Exchange (PBX) and other telephony equipment. Residential customers: monthly charge for line, plus per-minute charge for long distance, cost of simple phone. | Business customers: cost of IP infrastructure, hybrid IP/PBX, and IP phones. Residential customers: monthly charge for line, plus monthly charge for ISP, cost of computer and other equipment.
Equipment | Dumb terminal (less expensive); intelligence in the network. | Integrated smart programmable terminal (expensive); intelligence not in the network.
Additional features and services | Require programming or changes in the network design, but fast enough to add if Advanced Intelligent Networks (AIN) are in use. | Easy to add without major changes, due to flexible protocol support, but standards are needed for traditional users' services.
Quality of Service | High (extremely low loss). | Low and variable; traffic is sensitive to the packet loss and delay experienced.
Authorization and authentication | Only once, when service is installed. | Potentially required on a per-call basis.
Regulations | Many, at federal and state levels. | Few yet, but regulatory uncertainty; future regulations may reduce the cost advantages of VoIP.
Network availability | 99.999% uptime. | Level of reliability is not known.
Electrical power failure at customer premises | Not a problem; powered by a separate source from the phone company. | Will have problems, as equipment may be down. Power from other sources is not easy to obtain.
Security | High level of security because one line is dedicated to one call. | Possible eavesdropping at routers.
Standard/status | Mature (simplified internetworking among equipment from different vendors). | Emerging; possible problems in internetworking.
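The "Bit rate" row of Table 1.1 can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from this thesis: it assumes each voice packet carries IPv4, UDP, and RTP headers (20 + 8 + 12 bytes; the use of RTP is an assumption, as it is not named in the text above), and the function name, the 20 ms packetization interval, the 8 kbps codec rate, and the 50% talk activity are hypothetical values chosen only for illustration.

```python
# Illustrative sketch: one-way bandwidth of a packetized voice stream.
# Assumption: every packet carries IPv4 (20 B) + UDP (8 B) + RTP (12 B) headers.
HEADER_BYTES = 20 + 8 + 12


def packetized_rate_kbps(codec_kbps, frame_ms, activity=1.0):
    """Average on-the-wire rate in kbps for one direction of a call.

    codec_kbps: codec output rate in kbps
    frame_ms:   milliseconds of speech carried in each packet
    activity:   fraction of time packets are sent (1.0 = no silence suppression)
    """
    payload_bytes = codec_kbps * 1000 / 8 * frame_ms / 1000
    packets_per_second = 1000 / frame_ms
    wire_bits_per_second = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second
    return wire_bits_per_second / 1000 * activity


# 64 kbps speech in 20 ms packets, sent continuously.
print(packetized_rate_kbps(64, 20))               # 80.0
# Hypothetical 8 kbps codec, packets sent only while the speaker talks (50%).
print(packetized_rate_kbps(8, 20, activity=0.5))  # 12.0
```

Under these assumptions, header overhead adds 16 kbps to every continuously sent stream, so the saving of packetized voice comes from the lower codec rate and from not transmitting during silence, which a dedicated circuit cannot exploit.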
1.3 Components of VoIP Systems

Figure 1.1 shows the traditional (non-adaptive) VoIP system. The system consists of three main components: a sender (a source of VoIP traffic), a receiver (a destination point of a VoIP stream), and the network. The sender can represent an individual user or a group of calls. Speech is encoded on the sender side, and the voice stream typically travels through a traditional data network together with other types of traffic, or through a dedicated VoIP-only network. The jitter buffer removes delay variation on the destination side. Different voice decoding mechanisms are used to convert the digital stream to analog form.

Figure 1.1: Non-adaptive VoIP system

To make this system adaptive (that is, to manage the quality of communication on the sender side in real time according to some criteria), two components must be designed: (1) objective mechanisms for real-time speech quality assessment and (2) adaptive speech quality management algorithms (Figure 1.2).
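The role of the jitter buffer in this section can be illustrated with a short sketch. It assumes the simplest possible policy, a fixed playout delay applied to every packet, and it is not the scheduling scheme of any system evaluated in this thesis. The function name `count_playout` and the delay trace are hypothetical, and the playout delay stands in for the network part of the mouth-to-ear delay only.

```python
def count_playout(trace, playout_delay_ms):
    """Return (played, late) for a fixed playout delay.

    trace is a list of (send_time_ms, network_delay_ms) pairs. A packet is
    played only if it arrives no later than send_time + playout_delay_ms;
    otherwise it misses its playout slot and counts as lost.
    """
    played = late = 0
    for send_ms, delay_ms in trace:
        arrival_ms = send_ms + delay_ms
        deadline_ms = send_ms + playout_delay_ms
        if arrival_ms <= deadline_ms:
            played += 1
        else:
            late += 1
    return played, late


# Hypothetical trace: one packet every 20 ms, with two delay spikes.
delays_ms = [40, 55, 120, 45, 200, 60]
trace = [(i * 20, d) for i, d in enumerate(delays_ms)]

for playout_delay_ms in (60, 150, 250):
    played, late = count_playout(trace, playout_delay_ms)
    print(playout_delay_ms, played, late)
# 60 4 2
# 150 5 1
# 250 6 0
```

A longer playout delay lets more packets arrive in time, at the cost of a longer delay before the listener hears the speech, which is the kind of trade-off an adaptive system has to manage.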
Figure 1.2: Adaptive VoIP system

1.4 Different Forms of VoIP Systems

An interesting aspect of VoIP is that there is more than one way to place a call. Three different "flavors" of VoIP service are in common use today. The most common form of Voice over IP service today is the Analog Telephone Adapter (ATA). The ATA is an analog-to-digital converter and is very easy to use: in most cases, the ATA is taken out of the box and plugged into the phone and a broadband connection, and the phone is ready to make VoIP calls. This is shown in the top right corner of Figure 1.3.

The second VoIP call option is computer-to-computer calling, which Valdes calls the easiest way to use Voice over IP. Computer-to-computer calls require only a microphone, speakers, a sound card, free or very inexpensive software, and a broadband Internet connection. This is shown in the upper left corner of Figure 1.3.

The final type of Voice over IP calling involves special IP phones, which contain all of the required software onboard and only need to be plugged into a router connected to a broadband connection. This third VoIP solution is more commonly found in businesses because the units are pricey; a Cisco 7940G IP phone costs $175 on Amazon.com (Amazon, 2011).