ADAPTIVE CODEC MODE ASSIGNMENT FOR VOICE OVER IP WITH AMR SPEECH CODEC

ABSTRACT

In this paper, we propose a new VoIP system that employs an AMR speech codec, which can vary its bit-rate from 12.2 kbits/s to 4.75 kbits/s on a frame basis depending on the network channel conditions. To monitor the channel conditions every frame, we calculate the interarrival jitter using the timestamp information in the RTP header. We then develop AMR speech codec mode assignment algorithms for VoIP that use the estimated channel conditions as follows: 1) the interarrival jitter range is simply divided linearly into 8 states; 2) the range of each codec mode is changed adaptively every RTCP packet interval according to the long-term channel conditions, which are estimated from both the cumulative number of packets lost field in the RTCP packet header and the distribution of the codec modes assigned during that period. Experimental results demonstrate that the VoIP system with an AMR speech codec outperforms conventional VoIP systems.

1. INTRODUCTION

1.1 Motivation of the Research

Since voice communication over the Internet became a reality in February 1995, when Vocaltec Inc. introduced its Internet phone software, Internet telephony, i.e., voice over Internet protocol (VoIP), has been deployed very rapidly in a number of applications[1]. In addition, VoIP is being developed as a solution for unified communication. VoIP is the technology that allows digitized voice signals to be transmitted over the Internet; in other words, VoIP is the real-time delivery of voice traffic in packets between two or more parties across networks. Traditional telephony is based on a fixed-bandwidth, circuit-switched end-to-end connection between two telephones, whereas VoIP is based on IP networking, which offers much more potential than just telephony. VoIP initially focused on cheap telephony, offering toll bypass and PC-to-phone communication. Now it is used to combine all service types, such as telephony, voice e-mail, and fax, in real time over the Internet without constraints of time, place, or terminal[2]. One of the key issues for the success of VoIP is a standard way of providing connectivity, from call control to media encoding and administrative controls. Protocol standards for real-time multimedia communication such as H.323, the session initiation protocol (SIP), and the media gateway control protocol (MGCP) provide the foundation for global interoperability and enable future connectivity expansion[3]. Recommendation H.323 from the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) is the pioneering umbrella specification for implementing packet-based multimedia conferencing over local area networks (LANs), though it does not guarantee quality of service (QoS)[4]. In spite of the rapid progress of VoIP applications, VoIP still has problems with reliability and sound quality, due primarily to the limitations of both Internet bandwidth and current compression technology[1]. Bandwidth shortage and the high bit-rates of speech codecs are the main causes of interarrival jitter and packet loss.
Many researchers have worked on improving the speech quality of VoIP, but most of them focus on network protocols that reduce packet loss, such as the resource reservation protocol (RSVP), differentiated services (DiffServ), and multiprotocol label switching (MPLS)[2]. However, applying these methods to the public network costs a great deal of money and time. Therefore, to improve the speech quality of VoIP, we focus on a speech codec that can change its bit-rate depending on the channel condition.

1.2 Description of the Research

Packet loss, jitter, and delay in packet networks are caused mainly by a shortage of network bandwidth, which results from queuing and routing in the intermediate nodes of the network. In a packet network whose available bandwidth changes very rapidly over time depending on the number of users and the data traffic, e.g., the Internet, controlling the peak transmission bit-rate of VoIP according to the channel conditions could help make better use of the available bandwidth. Adapting the packet size to the channel conditions can reduce packet loss and thereby improve speech quality. However, the speech codecs adopted in H.323 for VoIP, such as G.711 (PCM), G.723.1 (MP-MLQ and ACELP), G.729 (CS-ACELP), and GSM-FR, have fixed bit-rates, so they cannot adapt flexibly to channel conditions that may change very rapidly. The AMR speech codec[5], on the other hand, is a multirate speech codec that can change its bit-rate from 12.2 kbits/s to 4.75 kbits/s every frame. In this paper, we propose a new VoIP system that employs an AMR speech codec. To apply an AMR speech codec to VoIP, algorithms for AMR codec mode assignment must be developed together with a method for estimating the channel conditions. To monitor the channel conditions, we calculate the interarrival jitter using the timestamp information in the RTP header[6]. We then develop AMR codec mode assignment algorithms for VoIP as follows: 1) the interarrival jitter range is simply divided linearly into 8 states, i.e., the range of each codec mode is fixed; 2) the interarrival jitter range of each codec mode is changed adaptively every RTCP packet interval according to the long-term estimated channel conditions. We perform voice transmission experiments between several locations over the real Internet.
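The two assignment rules just outlined can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: the initial thresholds (0, 5, 10, 15, 20, 25, 30 and 40 msec), the eight mode bit-rates, step = 2, and the cumulative-loss and M_max update rules are taken from the initial-range table of Section 3.2 and from Fig. 4, while the names (`ModeAssigner`, `assign`, `end_of_rtcp_interval`), the half-open state boundaries, the initial cumulative loss of 0, and the way the ordering constraint is enforced are our own assumptions.

```python
from collections import Counter

# Codec modes k = 1..8 (index k-1), highest to lowest AMR bit-rate in kbits/s.
MODE_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]
# Initial lower bounds range[1..8] of Jit_i in msec (initial-range table, Section 3.2).
INITIAL_RANGE_MS = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
STEP_MS = 2.0     # "step = 2" in Fig. 4
MIN_GAP_MS = 1.0  # hypothetical spacing that keeps range[i] < range[i+1]


class ModeAssigner:
    """Hypothetical sketch: AMR 1 (fixed ranges) or AMR 2 (ranges adapted per Fig. 4)."""

    def __init__(self, adaptive: bool) -> None:
        self.adaptive = adaptive
        self.range_ms = list(INITIAL_RANGE_MS)
        self.prev_cum_loss = 0   # CumLoss before the first RTCP report (assumed 0)
        self.counts = Counter()  # how often each mode was assigned this interval

    def assign(self, jit_ms: float) -> int:
        """Map Jit_i to codec mode k (1 = 12.2 kbits/s ... 8 = 4.75 kbits/s)."""
        k = 1  # values below range[2] fall into state 1
        for idx in range(1, 8):
            if jit_ms >= self.range_ms[idx]:  # half-open states [range[k], range[k+1])
                k = idx + 1
        self.counts[k] += 1
        return k

    def end_of_rtcp_interval(self, cum_loss: int) -> None:
        """Called once per RTCP report carrying the cumulative number of packets lost."""
        if self.adaptive:
            # Loop 1 of Fig. 4: shift every range by the trend in cumulative loss.
            if cum_loss > self.prev_cum_loss:
                self.range_ms = [r + STEP_MS for r in self.range_ms]
            elif cum_loss < self.prev_cum_loss:
                self.range_ms = [r - STEP_MS for r in self.range_ms]
            # Loop 2 of Fig. 4: move ranges around M_max to spread the assigned modes.
            if self.counts:
                m_max = self.counts.most_common(1)[0][0]
                for k in range(1, 9):
                    if k > m_max:
                        self.range_ms[k - 1] -= STEP_MS
                    elif k < m_max:
                        self.range_ms[k - 1] += STEP_MS
            # Constraints of Fig. 4; clamping them like this is our assumption.
            self.range_ms[0] = max(self.range_ms[0], 0.0)
            for idx in range(1, 8):
                self.range_ms[idx] = max(self.range_ms[idx], self.range_ms[idx - 1] + MIN_GAP_MS)
        self.prev_cum_loss = cum_loss
        self.counts.clear()
```

With the initial ranges, `assign(4.0)` returns mode 1 (12.2 kbits/s) and `assign(14.0)` returns mode 3 (7.95 kbits/s), the two examples given in Section 3.2. An AMR 2 instance additionally receives `end_of_rtcp_interval` with the cumulative-loss value of each RTCP report, roughly every 5 seconds.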
To evaluate the performance of the proposed VoIP system, we investigate the packet loss rate (PLR), the average interarrival jitter, and the distribution of packet loss in the experiments, and compare them with those of conventional VoIP systems with fixed bit-rate speech codecs. We also perform a mean opinion score (MOS) test as a subjective measure of speech quality. Experimental results demonstrate that the VoIP system with an AMR speech codec outperforms conventional VoIP systems. The rest of this paper is organized as follows. Chapter 2 provides an overview of the structure of conventional VoIP systems and a brief description of the H.323 protocol and the RTP/RTCP header information; the speech codecs used in conventional VoIP systems are also addressed. Chapter 3 gives an overview of the AMR speech codec and the method of applying it to a VoIP system, and then explains the proposed adaptive codec mode assignment algorithms in detail. Experimental results are presented with our findings and discussion in Chapter 4. Finally, Chapter 5 summarizes the research results with concluding remarks and further studies.

2. CONVENTIONAL VOICE OVER IP SYSTEM

2.1 Background

Signaling is one of the most important functions in the telecommunications infrastructure because it enables various network components to communicate with each other to set up and tear down calls. Significant efforts were undertaken in past decades to develop the signaling protocols used in today's telephone network, also known as the PSTN. These protocols, such as SS7 and Q.931, are defined in large, detailed specifications developed by various standardization organizations[7]. Similar efforts are now being undertaken to define VoIP signaling. Since the very beginning of the VoIP industry, signaling protocols for VoIP have been the focal point of industry debate. The ITU-T started work on standardizing VoIP signaling protocols in May 1995. In June 1996, Study Group 16 of the ITU-T decided on H.323 version 1, a standard for real-time videoconferencing over LANs without guaranteed QoS. An example of a VoIP system is shown in Fig. 1. It is a hybrid model based on the assumption that two or more users have access to personal computers (PCs) or telephones connected to the Internet[8].

Fig. 1 Structure of VoIP based on integrated service

2.2 H.323 Protocol

The H.323[4] standard provides a foundation for audio, video, and data communications across IP-based networks, including the Internet. H.323 is an umbrella recommendation from the ITU-T that sets standards for multimedia communications over LANs that do not provide guaranteed QoS[9]. H.323 applications are set to grow into the mainstream market for several reasons. First, H.323 sets multimedia standards for the existing infrastructure, i.e., IP-based networks. Designed to compensate for the effect of highly variable LAN latency, H.323 allows customers to use multimedia applications without changing their network infrastructure. Second, IP LANs and PCs are becoming more powerful, thanks to Ethernet bandwidth migrating from 10 Mbps to 100 Mbps, faster processors, enhanced instruction sets, and powerful multimedia accelerator chips. Third, by providing device-to-device, application-to-application, and vendor-to-vendor interoperability, H.323 allows customers' products to interoperate with other H.323-compliant products. Finally, H.323 provides standards for interoperability between LANs and other networks. The H.323 specification was approved in 1996 by ITU-T Study Group 16, and Version 2 was approved in January 1998. The standard is broad in scope and covers both stand-alone devices and embedded PC technology, as well as point-to-point and multipoint conferences. H.323 also addresses call control, multimedia management, and bandwidth management, as well as interfaces between LANs and other networks[10]. H.323 provides the system and component descriptions, call model descriptions, and call signaling procedures. Fig. 2 shows the structure of an H.323 terminal[4]. Like other ITU standards, H.323 does not specify audio or video equipment, data applications, or the network interface; however, it does mandate certain capabilities in order to provide a minimum level of interoperability. H.225.0 describes media (audio and video) stream packetization, media stream synchronization, control stream packetization, and control message formats. H.245 describes the messages and procedures used for opening and closing logical channels for audio, video, and data, as well as capability exchange, mode requests, control, and indications[9]. These are the recommendations that govern the operation of H.323 equipment and the communication between H.323 endpoints. For audio and video coding, other recommendations are referenced. For audio coding, G.711 is mandatory, while G.722, G.728, G.723.1, and G.729 are optional. For video coding, H.261 QCIF mode is mandatory, while H.261 CIF and all H.263 modes are optional. The T.120 series of recommendations is used for data applications. H.323 relies on the real-time transport protocol and real-time transport control protocol (RTP/RTCP) of the Internet Engineering Task Force (IETF) for sequencing audio and video packets.

Fig. 2 Structure of H.323 terminal

2.3 RTP/RTCP

RTP[6] provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio or video, over multicast or unicast network services. RTP neither addresses resource reservation nor guarantees QoS for real-time services. The data transport is augmented by RTCP, which allows the data delivery to be monitored in a manner scalable to large multicast networks and provides minimal control and identification functionality. The calculated interval between RTCP packets is generally required to be at least 5 seconds, so that bursts of RTCP packets do not exceed the allowed bandwidth when the number of participants is small and the traffic is not smoothed according to the law of large numbers. RTP and RTCP are designed to be independent of the underlying transport and network layers, and they provide functionality such as sequencing and loss detection. Fig. 3 shows the format of the RTP header[11]. The RTP header format is designed to support different types of payloads, such as the ITU-T G.711 audio standard and the JPEG video standard. The RTP protocol data unit (PDU) is carried in user datagram protocol (UDP) and IP PDUs, with the headers of these protocols forming part of the complete data unit.

  bits:  0-1   2   3   4-7   8   9-15   16-31
         V=2   P   X   CC    M   PT     Sequence Number
         Timestamp
         Synchronization source (SSRC) identifier
         Contributing source (CSRC) identifiers (optional)
         Payload data (variable)
Fig. 3 The format of an RTP header

2.4 Speech Codecs in the VoIP System

A speech codec transforms analog voice into a digital bitstream and vice versa. In addition, most speech codecs use compression techniques, removing redundant or less important information to reduce the required transmission bandwidth. Compression in particular is a balancing act between voice quality, local computation power, delay, and network bandwidth requirements[12]. Conventional VoIP systems based on the H.323 protocol support several fixed bit-rate speech codecs, such as G.711, G.723.1, G.729, and GSM-FR. The lengths of the IP, UDP, and RTP headers are always fixed, but the length of the voice payload, which contains the coded speech data, varies with the speech codec. For example, if 64 kbit/s pulse code modulation (PCM) is used with a frame size of 20 msec, the voice payload is 160 bytes long, whereas with 8 kbit/s conjugate-structure algebraic CELP (CS-ACELP) it is 20 bytes long. With these speech codecs, the payload length stays fixed during communication. Table 1 shows the properties of the speech codecs used in conventional VoIP systems and of the AMR speech codec[13]. The quality of a speech codec is a function of bit-rate, complexity, and processing delay; there is usually a strong interdependence among these attributes, and they may have to be traded off against each other.

Table 1 Properties of some speech codecs

Property               G.711   G.723.1   GSM-FR   AMR
Bit-rate (kbits/s)     64      5.3/6.3   13       12.2~4.75
Frame size (msec)      20      30        20       20
Lookahead (msec)       0       7.5       0        0
CPU load               low     high      high     high
Packet size (bytes)    120     14/16     33       12~31
MOS                    4.1     3.65      3.7      4.0

3. PROPOSED VOIP SYSTEM WITH AMR SPEECH CODEC

3.1 Overview of AMR Speech Codec

Following the successful standardization of the GSM-EFR speech codec in 1996, ETSI conducted a feasibility study for next-generation speech services. The goal was to provide high-quality speech using the half-rate traffic channel and highly error-robust operation in the full-rate traffic channel. The study concluded that the only feasible way to meet these targets was an AMR concept, in which the speech codec bit-rate is continuously adapted to the radio channel conditions; no fixed-rate solution would meet all the requirements. In October 1998, the AMR speech codec developed jointly by Ericsson, Nokia, and Siemens was selected[14]. The AMR speech codec is based on the ACELP algorithm and consists of the multirate speech codec, a source controlled rate (SCR) scheme including a voice activity detector (VAD) and a comfort noise generation system, and an error concealment mechanism to combat the effects of transmission errors and lost packets[5]. The multirate speech codec is a single integrated speech codec with 8 source rates from 4.75 kbits/s to 12.2 kbits/s and a low-rate background noise encoding mode. The codec can switch its bit-rate every 20-msec speech frame upon command. The codec modes are integrated in a common structure in which bit-rate scalability is realized mainly by altering the quantization schemes according to the codec mode. The frame size is 20 msec, with 4 subframes of 5 msec. The bit-rate of each codec mode is shown in Table 2. The silence descriptor, SID, denotes the parameters for background noise characteristics.

Table 2 Source codec bit-rates for an AMR speech codec

Codec mode    Source codec bit-rate
AMR_12.20     12.20 kbits/s (GSM-EFR)
AMR_10.20     10.20 kbits/s
AMR_7.95       7.95 kbits/s
AMR_7.40       7.40 kbits/s (IS-641)
AMR_6.70       6.70 kbits/s (PDC-EFR)
AMR_5.90       5.90 kbits/s
AMR_5.15       5.15 kbits/s
AMR_4.75       4.75 kbits/s
AMR_SID        1.80 kbits/s (*)
(*) Assuming SID frames are continuously transmitted
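The payload arithmetic of Section 2.4 extends directly to the AMR modes listed above. The sketch below converts a bit-rate and frame duration into payload bytes and into the size of the complete IP packet. It is only an illustration under stated assumptions: one speech frame per packet, IPv4, UDP and fixed RTP headers of 20, 8 and 12 bytes (no IP options, no CSRC entries), and bits rounded up to whole bytes. It ignores the small payload header that the AMR RTP payload format (RFC 4867) adds, so real AMR packets are slightly larger; the function names are hypothetical.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header, no options/CSRCs

# Source bit-rates (kbits/s) of the eight AMR codec modes, from the table above.
AMR_MODES_KBPS = {
    "AMR_12.20": 12.20, "AMR_10.20": 10.20, "AMR_7.95": 7.95, "AMR_7.40": 7.40,
    "AMR_6.70": 6.70, "AMR_5.90": 5.90, "AMR_5.15": 5.15, "AMR_4.75": 4.75,
}


def frame_bits(kbps: float, frame_ms: float) -> int:
    """Speech bits in one frame: kbits/s multiplied by msec gives bits directly."""
    return round(kbps * frame_ms)  # round() absorbs binary floating-point error


def payload_bytes(kbps: float, frame_ms: float) -> int:
    """Bytes needed to carry one frame, rounded up to whole octets."""
    return math.ceil(frame_bits(kbps, frame_ms) / 8)


def packet_bytes(kbps: float, frame_ms: float) -> int:
    """Total IP packet size when each packet carries exactly one frame."""
    return payload_bytes(kbps, frame_ms) + IP_UDP_RTP_HEADER_BYTES


if __name__ == "__main__":
    for name, kbps in AMR_MODES_KBPS.items():
        print(f"{name:10s} {payload_bytes(kbps, 20):3d} B payload, "
              f"{packet_bytes(kbps, 20):3d} B packet")
```

For example, `payload_bytes(64, 20)` is 160 and `payload_bytes(8, 20)` is 20, as in Section 2.4. Across the eight AMR modes the payload runs from 12 bytes (4.75 kbits/s) to 31 bytes (12.2 kbits/s), while the 40 header bytes stay constant, so switching from the highest to the lowest mode shrinks a 71-byte packet to 52 bytes.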

3.2 Application of an AMR Speech Codec to VoIP

Most of the factors that degrade the speech quality of VoIP systems, such as packet loss, jitter, and delay, are caused by a shortage of network bandwidth, which results from queuing and routing in the intermediate nodes of the packet network. To guarantee a certain QoS even under critical conditions, the peak transmission bit-rate must be controlled according to the network conditions. Controlling the bit-rate of a speech codec, i.e., multirate coding, means changing the size of the packets to be transmitted, and adapting the packet size to the channel conditions can reduce packet loss and thereby improve speech quality. The speech codecs adopted in H.323 for VoIP include G.711, G.723.1, G.729, and GSM-FR. These codecs have fixed bit-rates, as shown in Section 2.4, so they cannot adapt flexibly to channel conditions that may change very rapidly. The AMR speech codec, on the other hand, is a multirate speech codec that can change its bit-rate from 12.2 kbits/s to 4.75 kbits/s every frame, i.e., on a packet basis. In addition, since the AMR speech codec is a standard for IMT-2000, no transcoding is needed in the gateway when AMR-coded voice traffic is exchanged between an IMT-2000 wireless network and an IP network. This could decrease the overall transmission delay and complexity and improve speech quality. Thus, in this paper, we propose a new VoIP system that employs an AMR speech codec. To select one of the eight codec modes of the AMR speech codec on a packet basis, the channel condition must be estimated; for this we use the timestamp information in the RTP header[6]. To determine the bit-rate of the AMR codec in a VoIP system, we define the parameter Jit_i, given in Eq. (1), as a measure for monitoring the channel conditions.
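Eq. (1), stated next, needs only two values per packet: the arrival time R_i, read from the receiver's clock, and the timestamp S_i, carried in the 32-bit field that follows the sequence number in the fixed RTP header of Fig. 3. The sketch below parses that 12-byte header with Python's `struct` module and evaluates Jit_i for two consecutive packets. It assumes arrival times in msec and an 8000 Hz RTP clock (the rate used by the AMR RTP payload format, RFC 4867); rewriting Eq. (1) as (R_i - R_(i-1)) - (S_i - S_(i-1)) lets the timestamp difference be taken modulo 2^32 so that wrap-around of the field is harmless. The names `RtpHeader`, `parse_rtp_header` and `jit_ms` are hypothetical.

```python
import struct
from typing import NamedTuple

RTP_CLOCK_HZ = 8000  # assumed RTP timestamp rate for narrowband AMR


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,             # V: bits 0-1
        padding=bool(b0 & 0x20),     # P: bit 2
        extension=bool(b0 & 0x10),   # X: bit 3
        csrc_count=b0 & 0x0F,        # CC: bits 4-7
        marker=bool(b1 & 0x80),      # M: bit 8
        payload_type=b1 & 0x7F,      # PT: bits 9-15
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def jit_ms(arrival_ms: float, ts: int, prev_arrival_ms: float, prev_ts: int) -> float:
    """Eq. (1): (R_i - S_i) - (R_(i-1) - S_(i-1)), with timestamps converted to msec."""
    # Signed 32-bit difference of the RTP timestamps, robust to wrap-around.
    d_ticks = ((ts - prev_ts + 2**31) % 2**32) - 2**31
    return (arrival_ms - prev_arrival_ms) - d_ticks * 1000.0 / RTP_CLOCK_HZ
```

For two packets whose timestamps are 160 ticks (20 msec) apart but which arrive 24 msec apart, `jit_ms(24.0, 160, 0.0, 0)` is 4.0 msec, the value used as the first example in the mode-assignment discussion below.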
$$\mathrm{Jit}_i = (R_i - S_i) - (R_{i-1} - S_{i-1}) \qquad (1)$$

where R_i and S_i denote the arrival time at the receiver and the timestamp in the RTP header of packet i, respectively. By computing Eq. (1) at the receiver, the interarrival jitter of the data packets, i.e., the channel condition, can be estimated. With the initial de-jitter buffer size set to 40 msec at the receiver, we simply divide the range of Jit_i linearly into 8 states, as shown in Table 3. Using both the interarrival jitter of the received packet estimated by Eq. (1) and Table 3, we assign the AMR codec mode of the packet to be transmitted. This procedure is performed for every transmitted packet. For example, if Jit_i is 4 msec, the 12.2 kbits/s mode is assigned to the packet to be transmitted; if Jit_i is 14 msec, the 7.95 kbits/s mode is assigned.

Table 3 Initial range for AMR codec mode assignment

Range of Jit_i (msec)                        AMR codec mode
 0 (range[1]) < Jit_i <  5 (range[2])        12.2 kbits/s
 5 (range[2]) < Jit_i < 10 (range[3])        10.2 kbits/s
10 (range[3]) < Jit_i < 15 (range[4])         7.95 kbits/s
15 (range[4]) < Jit_i < 20 (range[5])         7.40 kbits/s
20 (range[5]) < Jit_i < 25 (range[6])         6.70 kbits/s
25 (range[6]) < Jit_i < 30 (range[7])         5.90 kbits/s
30 (range[7]) < Jit_i < 40 (range[8])         5.15 kbits/s
40 (range[8]) < Jit_i                         4.75 kbits/s

3.3 Adaptive Codec Mode Assignment Algorithms

If the range of each state for codec mode assignment in Table 3 is fixed, as described in Section 3.2, packet loss can increase when poor channel conditions last for a long time. Moreover, when the network condition is very good, the assigned codec modes may concentrate around the high bit-rates, which reduces the available network bandwidth. Considering these cases, adapting the range of each state to the channel conditions is more desirable in the long term. We therefore propose adaptive codec mode assignment algorithms for VoIP with an AMR speech codec that adapt better to the channel conditions. To monitor the long-term channel conditions, we consider two parameters: one is the cumulative number of packets lost, carried in the header of the RTCP packet, which is transmitted about every 5 seconds; the other is the distribution of the codec modes assigned during that RTCP period. The proposed adaptive codec mode assignment algorithm for the AMR speech codec is summarized in Fig. 4. The cumulative number of packets lost field in the RTCP header represents the total number of RTP data packets from the sender that have been lost since the beginning of reception[6]. We adjust the range of each codec-mode state given in Table 3 according to the received cumulative number of packets lost at every RTCP packet interval. If this value in the current RTCP packet is higher than that in the previous RTCP packet, which implies that the channel condition may be poor in the long term, we increase the range of each codec-mode state; if it is lower, we decrease it. The distribution of the assigned codec modes is also used to keep the AMR codec mode from being driven to the highest or lowest codec mode. First, we determine M_max, i.e., the codec mode assigned most frequently during an RTCP interval. Depending on M_max, we decrease or increase the range of each state to spread the occurrences of the assigned codec modes. Hereafter, we refer to the codec mode assignment method with the fixed state ranges of Table 3 as AMR 1, and to the adaptive codec mode assignment algorithm with backward channel condition monitoring as AMR 2.

3.4 Methods of Speech Quality Test

To compare the objective speech quality of different speech codecs, the SNR, the cepstral distortion measure, the likelihood ratio measure, and so on[15] are generally used.
However, since the speech quality in a packet network depends largely on the number of lost packets, especially contiguously lost packets, and on the interarrival jitter[16], we use QoS factors of the packet network, such as packet loss, jitter, and the distribution of packet loss, as objective measures of speech quality. We therefore investigate the PLR and the interarrival jitter for each codec of the VoIP systems during the voice transmission experiments. We also measure the distribution of lost packets that occur during voice transmission, in order to verify how well the proposed algorithm handles contiguously lost packets. In addition, we perform the MOS test[17] as a subjective measure of speech quality. The MOS is the average of the opinion scores given by a set of untrained subjects, and it is used to assess the naturalness of speech quality directly. Each listener rates each set of utterances on a scale from 1 (unacceptable quality) to 5 (excellent quality). An MOS of 4.0 or higher defines good, or toll, quality, where the reconstructed speech signal is generally indistinguishable from the original signal. An MOS between 3.5 and 4.0 defines communication quality, which is sufficient for telephone communication[15].

Every RTCP interval:
    FOR i = 1 TO 8
        IF CumLoss_j > CumLoss_(j-1) THEN range[i] += step
        ELSE IF CumLoss_j < CumLoss_(j-1) THEN range[i] -= step
    END
    Calculate M_max
    FOR k = 1 TO 8
        IF k > M_max THEN range[k] -= step
        ELSE IF k < M_max THEN range[k] += step
    END
where
    step = 2
    CumLoss: cumulative number of lost packets
    j: RTCP packet count
    M_max: the codec mode assigned most frequently during an RTCP interval
    k: index of each codec mode, from k = 1 (12.2 kbits/s) to k = 8 (4.75 kbits/s)
    constraints: range[1] >= 0 and range[i] < range[i+1]
Fig. 4 Adaptive codec mode assignment algorithm

4. EXPERIMENTS AND DISCUSSION

4.1 Experimental Conditions

In our preliminary experiment[18], we constructed an H.323 terminal with an AMR speech codec by applying the AMR codec released by 3GPP[19] to Openphone, an open-source Internet telephony application distributed by Equivalence Pty Ltd.[20]. In this paper, we modified that terminal and carried out several voice transmission experiments in full-duplex mode over the public Internet to evaluate the performance of the proposed codec mode assignment algorithms described in Sections 3.2 and 3.3. The experimental setup is shown in Fig. 5. We selected GSM-FR and G.723.1 as the conventional speech codecs for comparison with the AMR speech codec, because the bit-rate of the highest AMR mode is nearly the same as that of GSM-FR, and G.723.1 is a popular speech codec recommended by the VoIP group, a consortium of vendors backed by Intel and Microsoft that recommends standards for telephony and audioconferencing over the Internet[1]. We carried out the experiments three times, at different locations and on different dates. The first was carried out between Daegu and Masan in Korea on July 13. To consider bad channel conditions such as a high PLR and large delay, the second was carried out between Daegu in Korea and Oxford in England on September 10. One more experiment in Korea was carried out between Daegu and Jeju on October 14. In each experiment, each party simultaneously transmitted and received recorded speech files about 15 minutes long, three times. The total number of transmitted packets in each trial is about 43,000. We did not use VAD with any speech codec, in order to transmit as many speech packets as possible. We then analyzed the PLR, the average interarrival jitter, the average transmission bit-rate, and the distribution of the assigned AMR codec modes, as well as that of contiguously lost packets. We also performed an informal listening test using segments of the reconstructed speech selected to have a similar average PLR. The following sections present the experimental results and our discussion.

Fig. 5 Experimental setup for voice transmission over Internet

4.2 Experiment #1 (Daegu-Masan)

As described in Section 4.1, the first experiment was carried out between our laboratory in Daegu and a PC game room, a sort of Internet café, in Masan. Table 4 shows the average PLR for each trial. In the table, Da and Ma denote measurements at Daegu and Masan, respectively, and the attached numbers 1 to 3 denote the order of the three trials. Fig. 6 compares the overall average PLR of the three trials at each party for each speech codec. In the measurements at Masan, AMR 2 had the lowest PLR of all the codecs, as expected. At Daegu, the differences in PLR between the speech codecs are less pronounced than at Masan; however, AMR 1 and AMR 2 show slightly lower PLR than the conventional speech codecs on the whole.

AMR 1 shows a larger PLR than G.723.1, probably because of the burst transmission errors that occurred during the first trial, as the Ma1 and Da1 entries of the table show. From the PLR table and Fig. 6, we can say that a VoIP system with an AMR speech codec outperforms the conventional VoIP systems, and that AMR 2 (adaptive codec mode assignment) is better than AMR 1 (fixed codec mode assignment). The average PLR measured at Masan is 1.79%, and that at Daegu is 5.02%. This means that the channel from Masan to Daegu was worse than the channel from Daegu to Masan; in other words, the forward and backward paths had different channel conditions.

Fig. 6 Average PLR for experiment #1

Table. Average PLR for experiment #1 (unit: %)

Speech codec   Ma1     Ma2     Ma3     Da1     Da2     Da3
GSM-FR         0.209   6.121   5.605   6.768   4.382   4.876
G.723.1        0.986   1.235   0.91    6.199   4.025   4.215
AMR 1          5.781   0.225   0.027   8.117   3.807   4.146
AMR 2          0.167   0.255   0.016   5.573   3.806   4.366
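The overall averages quoted above (1.79% at Masan, 5.02% at Daegu) can be reproduced from the per-trial table if they are taken as the unweighted mean of all twelve entries at each party (four codecs by three trials). A short Python check under that assumption; the dictionary names are illustrative:

```python
from statistics import mean

# Per-trial PLR (%) for experiment #1, copied from the table above.
PLR_MASAN = {
    "GSM-FR": [0.209, 6.121, 5.605],
    "G.723.1": [0.986, 1.235, 0.91],
    "AMR 1": [5.781, 0.225, 0.027],
    "AMR 2": [0.167, 0.255, 0.016],
}
PLR_DAEGU = {
    "GSM-FR": [6.768, 4.382, 4.876],
    "G.723.1": [6.199, 4.025, 4.215],
    "AMR 1": [8.117, 3.807, 4.146],
    "AMR 2": [5.573, 3.806, 4.366],
}


def overall_average(table):
    """Unweighted mean over every codec and trial measured at one party."""
    return mean([plr for trials in table.values() for plr in trials])


def per_codec_average(table):
    """Mean PLR over the three trials, per speech codec."""
    return {codec: mean(trials) for codec, trials in table.items()}


print(f"Masan {overall_average(PLR_MASAN):.2f}%, Daegu {overall_average(PLR_DAEGU):.2f}%")
# Masan 1.79%, Daegu 5.02%
masan = per_codec_average(PLR_MASAN)
print(min(masan, key=masan.get))  # AMR 2
```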

The average interarrival jitter for each trial is given in the jitter table for experiment #1, and the overall average over the three trials for each speech codec is shown in Fig. 7. The average interarrival jitter measured at Masan is 6.61 msec, while that at Daegu is 3.18 msec, nearly half of the Masan value. This may be a consequence of the large PLR at Daegu: packets whose jitter exceeds the de-jitter buffer size are discarded and are therefore not included in the jitter estimate. Although smaller interarrival jitter does not imply a lower PLR, it helps reduce the playout delay. The table and Fig. 7 show that AMR has interarrival jitter similar to or less than that of the conventional speech codecs.

Table. Average interarrival jitter for experiment #1 (unit: msec)

Speech codec   Ma1     Ma2     Ma3     Da1      Da2     Da3
GSM-FR         3.091   9.328   8.168   3.573    3.23    2.883
G.723.1        10.5    11.24   9.471   2.8696   3.958   2.571
AMR 1          8.882   4.901   3.249   3.597    2.472   1.995
AMR 2          2.848   4.421   2.107   3.058    4.484   3.532

Fig. 7 Average interarrival jitter for experiment #1
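The jitter values above are derived from RTP timestamps. The Python sketch below implements the running estimator defined in RFC 1889 [6], J = J + (|D| - J)/16, where D is the change in relative transit time between consecutive packets. It assumes an 8 kHz RTP clock, arrival times logged in milliseconds and packets processed in arrival order; whether the authors' terminal used exactly this form is an assumption, and the function name is illustrative.

```python
def interarrival_jitter(packets, clock_rate_hz=8000):
    """Running interarrival jitter estimate as defined in RFC 1889.

    packets: iterable of (arrival_time_ms, rtp_timestamp) in arrival order.
    Returns the jitter estimate, in milliseconds, after each packet.
    """
    jitter = 0.0            # kept in RTP timestamp units, as in RFC 1889
    prev_transit = None
    estimates = []
    for arrival_ms, rtp_ts in packets:
        # Relative transit time: arrival converted to timestamp units minus RTP timestamp.
        transit = arrival_ms * clock_rate_hz / 1000 - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0   # J = J + (|D| - J) / 16
        prev_transit = transit
        estimates.append(jitter * 1000.0 / clock_rate_hz)
    return estimates


# 20 ms frames (160 timestamp units at 8 kHz); the second packet arrives 8 ms late.
print(interarrival_jitter([(0, 0), (28, 160), (40, 320)]))  # [0.0, 0.5, 0.96875]
```

The 1/16 gain makes the estimate react to a single late packet only gradually, which is why it can serve as a smoothed per-frame channel indicator.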

The transmission bit-rates of GSM-FR and G.723.1 are fixed at 13 kbits/s and 6.3 kbits/s, respectively. The overall average transmission bit-rates of AMR 1 and AMR 2 in this experiment were 11.16 kbits/s and 9.14 kbits/s, respectively. From this result and the PLR table, the adaptive codec mode assignment method (AMR 2) appears to cope with rapidly changing channel conditions better than AMR 1, GSM-FR and G.723.1.

4.3 Experiment #2 (Daegu-Oxford)

The second experiment was carried out between our laboratory and an Internet café in Oxford, England. The experimental setup and conditions were the same as in the first experiment. Fig. 8 shows the overall average PLR over the three trials at each party for each speech codec. The PLR measured at Oxford is 6.25%, about as large as that at Daegu, which means that the network conditions in both directions were similar and poor. Although the differences in PLR among the speech codecs are smaller than in the previous experiment, the adaptive codec mode assignment method, AMR 2, still performs better than GSM-FR and G.723.1. The average PLR for each trial is given in the following table, where Ox denotes measurements at Oxford in England.

Fig. 8 Average PLR for experiment #2

Table. Average PLR for experiment #2 (unit: %)

Speech codec   Ox1      Ox2     Ox3     Da1     Da2     Da3
GSM-FR         10.227   5.094   4.963   6.436   7.817   8.275
G.723.1        8.27     5.227   4.893   6.209   8.114   3.915
AMR 1          8.445    5.709   4.66    6.355   8.358   4.115
AMR 2          8.226    5.724   4.687   6.141   7.69    3.666

The average interarrival jitter for each trial is given in the table below, and the overall average over the three trials is shown in Fig. 9. The average interarrival jitter measured at Oxford is 11.74 msec, while that at Daegu is 5.24 msec, less than half of the Oxford value. Since the PLRs at both sides are nearly the same, this suggests that, while this experiment was carried out, the channel from Daegu to Oxford was somewhat worse than the reverse direction.

Table. Average interarrival jitter for experiment #2 (unit: msec)

Speech codec   Ox1      Ox2      Ox3      Da1     Da2     Da3
GSM-FR         11.841   13.621   11.853   6.782   6.257   3.564
G.723.1        10.907   13.602   11.576   6.875   6.026   3.529
AMR 1          11.913   13.325   10.48    6.874   5.527   4.145
AMR 2          10.705   13.043   10.102   5.701   5.26    3.886

Fig. 9 Average interarrival jitter for experiment #2

The PLR is the most important factor affecting speech quality in a packet network. For a given PLR, packet losses should preferably be few in a row and sporadic, since existing error concealment techniques work reasonably well for one or two lost packets. We therefore examined the occurrence of contiguously lost packets in each trial. The table below gives an example, measured at Oxford in the first trial. It shows that AMR 1 and AMR 2 have fewer occurrences of contiguously lost packets than GSM-FR and G.723.1. Comparing AMR 1 with AMR 2, the latter has fewer burst errors, i.e., fewer long runs of contiguously lost packets. This indicates that the proposed AMR codec mode assignment algorithms can adapt to rapidly changing channel conditions.

Table. Occurrence of contiguously lost packets for experiment #2 (Da1-Ox1)

Contiguously lost packets   GSM-FR   G.723.1   AMR 1   AMR 2
1                           3705     3115      3180    3180
2                           571      461       480     457
3                           85       75        90      77
4                           20       28        7       6
5                           15       30        3       2
6                           8        15        8       2
7                           9        11        3       0
8                           5        4         0       0
Greater than 8              0        0         0       0
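Tables of contiguously lost packets such as the one above count runs of consecutive missing packets. Assuming the receiver logs the RTP sequence numbers it receives, already extended past the 16-bit wrap-around as RFC 1889 describes, and knows the first and last sequence numbers the sender used, the histogram can be built as in this Python sketch (the function name and inputs are illustrative):

```python
from collections import Counter


def loss_run_histogram(received_seq, first_seq, last_seq):
    """Histogram of contiguous-loss run lengths.

    received_seq: extended (already unwrapped) RTP sequence numbers that arrived.
    first_seq, last_seq: inclusive range of sequence numbers the sender used.
    Returns a Counter mapping run length -> number of runs of that length.
    """
    received = set(received_seq)
    runs = Counter()
    run = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            if run:                 # a loss run just ended
                runs[run] += 1
                run = 0
        else:
            run += 1                # one more consecutive missing packet
    if run:                         # losses at the very end of the range
        runs[run] += 1
    return runs


# Packets 2, 5-6 and 9-11 are missing: one run each of length 1, 2 and 3.
print(sorted(loss_run_histogram([0, 1, 3, 4, 7, 8, 12], 0, 12).items()))
# [(1, 1), (2, 1), (3, 1)]
```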

The overall average transmission bit-rates of AMR 1 and AMR 2 in this experiment were 10.14 kbits/s and 8.89 kbits/s, respectively. Fig. 10 shows an example of the occurrence of assigned codec modes, measured at Oxford in the third trial. Most codec modes assigned in AMR 1 fall in the high bit-rate range, which is undesirable from the standpoint of reducing packet size. In contrast, the codec modes assigned in AMR 2 are spread from high to low bit-rates, because the proposed algorithm keeps the codec mode assignment from being driven to the highest or lowest bit-rate.

Fig. 10 Distribution of assigned bit-rate in AMR (Da3-Ox3)

4.4 Experiment #3 (Daegu-Jeju)

Since Jeju is the farthest domestic location from Daegu, the third experiment was carried out between our laboratory and a PC game room in Jeju. The experimental setup and conditions were the same as in the second experiment. Fig. 11 shows the overall average PLR over the three trials at each party, revealing completely different channel conditions on the forward and backward paths. The PLR measured at Jeju is 4.91%, whereas the PLR at Daegu is nearly zero. This means that the channel from Daegu to Jeju was very poor while the channel from Jeju to Daegu was excellent during the experiment. As expected, the PLR of AMR 2 at Jeju is lower than that of AMR 1 by 2.5 percentage points, which corresponds to a 48% reduction in packet loss. The overall average PLR of AMR 2 is lower than those of GSM-FR and G.723.1 by 5.59 and 3.66 percentage points, respectively. This indicates that the proposed VoIP system outperforms the conventional VoIP systems. The average PLR for each trial is given in the table below, where Je denotes measurements at Jeju in Korea.

Fig. 11 Average PLR for experiment #3

Table. Average PLR for experiment #3 (unit: %)

Speech codec   Je1     Je2     Je3      Da1     Da2   Da3
GSM-FR         2.726   3.454   18.808   0       0     0.004
G.723.1        2.823   2.536   13.836   0.008   0     0.011
AMR 1          1.507   2.162   12.128   0       0     0.004
AMR 2          0.551   0.934   6.816    0       0     0.002

The average interarrival jitter for each trial is given in the table below, and the overall average over the three trials is shown in Fig. 12. The average interarrival jitter measured at Jeju is 12.18 msec, while that at Daegu is 5.22 msec. As in the second experiment, the average interarrival jitter at Daegu is less than half of that at Jeju. However, the jitter does not differ much among the speech codecs.

Table. Average interarrival jitter for experiment #3 (unit: msec)

Speech codec   Je1      Je2      Je3      Da1     Da2     Da3
GSM-FR         11.766   10.983   15.918   5.613   4.531   5.98
G.723.1        9.649    14.368   17.81    6.21    4.894   6.531
AMR 1          9.751    10.434   14.633   4.973   4.824   5.477
AMR 2          9.86     10.461   13.41    4.854   5.089   5.043

Fig. 12 Average interarrival jitter for experiment #3

An example of the occurrence of contiguously lost packets, measured at Jeju in the first trial, is given in the table of contiguously lost packets for experiment #3. It shows that AMR 2 does not suffer large burst errors, whereas GSM-FR and G.723.1 do. Comparing AMR 1 with AMR 2, the latter has fewer burst errors, i.e., fewer long runs of contiguously lost packets. This result agrees with that of the previous experiment.
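The 48% figure for experiment #3 is a relative reduction: the absolute drop in PLR divided by the AMR 1 value. Taking the Jeju columns of the per-trial PLR table and assuming each codec's average is the plain mean of its three trials, a short Python check gives about 2.5 percentage points and about 47.5%, consistent with the rounded values quoted in the text:

```python
from statistics import mean

# Per-trial PLR (%) measured at Jeju (Je1-Je3), from the experiment #3 PLR table.
PLR_JEJU = {
    "GSM-FR": [2.726, 3.454, 18.808],
    "G.723.1": [2.823, 2.536, 13.836],
    "AMR 1": [1.507, 2.162, 12.128],
    "AMR 2": [0.551, 0.934, 6.816],
}

avg = {codec: mean(trials) for codec, trials in PLR_JEJU.items()}
gain_pts = avg["AMR 1"] - avg["AMR 2"]   # absolute reduction, in percentage points
gain_rel = gain_pts / avg["AMR 1"]       # reduction relative to AMR 1
print(f"{gain_pts:.2f} points, {100 * gain_rel:.1f}% relative")
# 2.50 points, 47.5% relative
```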

Table. Occurrence of contiguously lost packets for experiment #3 (Da1-Je1)

Contiguously lost packets   GSM-FR   G.723.1   AMR 1   AMR 2
1                           45       40        34      29
2                           30       28        30      16
3                           12       19        15      9
4                           11       9         6       2
5                           9        11        10      6
6                           9        6         6       3
7                           4        3         2       1
8                           3        5         3       1
9                           3        2         3       1
10                          2        3         2       0
11                          0        1         1       0
12                          3        3         1       1
13                          3        2         2       2
14                          1        2         1       0
15                          2        0         3       0
16                          2        2         2       0
17                          3        2         0       0
Greater than 18             17       17        5       1

The overall average transmission bit-rates of AMR 1 and AMR 2 in this experiment were 10.33 kbits/s and 8.82 kbits/s, respectively. Fig. 13 shows an example of the occurrence of assigned codec modes, measured at Jeju in the third trial. The codec modes assigned in AMR 1 are much more spread out than in the second experiment, which reflects the very poor channel conditions in this experiment, as shown in Fig. 11.
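The average transmission bit-rates reported for AMR 1 and AMR 2 follow from how often each codec mode was assigned, which is what Fig. 10 and Fig. 13 show. A minimal Python sketch, assuming the reported figures count speech payload only (the convention behind the 13 kbits/s and 6.3 kbits/s given for GSM-FR and G.723.1) and that every frame lasts 20 ms, as AMR frames do; the helper name and the counts in the usage line are hypothetical, not data from the figures:

```python
# AMR codec mode bit-rates in kbit/s, highest (12.2) to lowest (4.75).
AMR_MODES_KBPS = (12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75)


def average_bitrate_kbps(mode_counts):
    """Average speech bit-rate from the number of frames sent in each AMR mode.

    All frames have the same 20 ms duration, so the time-averaged bit-rate is
    the frame-count-weighted mean of the mode bit-rates. RTP/UDP/IP header
    overhead is not included.
    """
    if len(mode_counts) != len(AMR_MODES_KBPS):
        raise ValueError("expected one frame count per AMR mode")
    total = sum(mode_counts)
    if total == 0:
        raise ValueError("no frames counted")
    return sum(n * rate for n, rate in zip(mode_counts, AMR_MODES_KBPS)) / total


# Illustrative counts only: half the frames at 12.2 kbit/s, half at 4.75 kbit/s.
print(average_bitrate_kbps([500, 0, 0, 0, 0, 0, 0, 500]))
```

With this weighting, shifting even a modest share of frames from the top modes to the bottom ones lowers the average noticeably, which is consistent with AMR 2 reporting lower average bit-rates than AMR 1 in every experiment.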

Fig. 13 Distribution of assigned bit-rate in AMR (Da3-Je3)

4.5 MOS Test

We carried out an informal listening test to assess the performance of the proposed VoIP system. First, we extracted from the received speech data files five speech segments that were synchronized with one another and had similar PLR values. Then an informal listening test, i.e., a MOS test, was carried out with 12 persons as a subjective measure of speech quality. The MOS for each speech segment is shown in the MOS results table. As expected, the proposed VoIP system scored higher than the conventional VoIP systems, and AMR 2 achieved the highest average score. The advantage of the AMR speech codec is more pronounced at low PLR than at high PLR. This indicates that AMR basically provides better speech quality than GSM-FR and G.723.1, as shown in the table, and that appropriate error concealment techniques are needed in packet networks to mitigate burst errors.
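The Average column of the MOS results table is the plain mean of the five segment scores. The Python sketch below recomputes it and places each average relative to the 3.5 to 4.0 "communication quality" band cited from [15] at the start of this section; the labels outside that band are descriptive only, and the names are illustrative.

```python
from statistics import mean

# Per-segment MOS values (segments 1-5) from the MOS results table.
MOS = {
    "GSM-FR": [3.09, 3.36, 3.36, 2.27, 2.55],
    "G.723.1": [3.09, 3.55, 3.00, 2.09, 2.18],
    "AMR 1": [3.82, 4.64, 4.18, 2.18, 2.45],
    "AMR 2": [4.45, 4.18, 4.18, 2.45, 2.45],
}


def quality_band(score):
    """Position of a score relative to the 3.5-4.0 communication-quality band."""
    if score < 3.5:
        return "below communication quality"
    if score <= 4.0:
        return "communication quality"
    return "above communication quality"


for codec, scores in MOS.items():
    avg = mean(scores)
    print(f"{codec:8s} {avg:.2f} {quality_band(avg)}")
```

On this reading, only the AMR 2 average (3.54) falls inside the communication-quality band, although AMR 1 and AMR 2 both exceed it on the low-PLR segments.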

Table. Results of MOS test for selected speech segments

Codec     Seg. 1   Seg. 2   Seg. 3   Seg. 4   Seg. 5   Average
GSM-FR    3.09     3.36     3.36     2.27     2.55     2.93
G.723.1   3.09     3.55     3.00     2.09     2.18     2.78
AMR 1     3.82     4.64     4.18     2.18     2.45     3.45
AMR 2     4.45     4.18     4.18     2.45     2.45     3.54

5. CONCLUSION

In this paper, we proposed a new VoIP system that employs an AMR speech codec, which can vary its bit-rate from 12.2 kbits/s to 4.75 kbits/s on a frame basis depending on the network channel conditions. To apply an AMR speech codec to a VoIP terminal, algorithms for AMR codec mode assignment must be developed together with a method for estimating the channel conditions. To monitor the channel conditions every frame, we calculated the interarrival jitter using the timestamp information in the RTP header. Using the estimated channel conditions, we then developed the following AMR codec mode assignment algorithms for VoIP: 1) simply divide the interarrival jitter range linearly into 8 states, so that the range of each mode is fixed; 2) adjust the interarrival jitter range of each codec mode adaptively at every RTCP packet interval, based on the long-term channel conditions estimated from the cumulative number of packets lost field in the RTCP packet header and on the distribution of the codec modes assigned during that period.

We performed voice transmission experiments at several locations over the real Internet. To evaluate the proposed VoIP system, we examined the PLR, the average interarrival jitter and the distribution of packet losses from the experiments, and compared them with those of conventional VoIP systems using fixed bit-rate speech codecs. We also performed a MOS test as a subjective measure of speech quality. The experimental results demonstrated the superiority of VoIP with an AMR speech codec over the conventional VoIP systems. A VoIP system with an AMR speech codec has a lower packet loss rate and interarrival jitter than VoIP with fixed bit-rate speech codecs such as GSM-FR and G.723.1, even though the average bit-rate of AMR is higher than that of the G.723.1 speech codec. Among the VoIP systems with an AMR speech codec, the adaptive codec mode assignment algorithm showed lower PLR than the algorithm with a fixed range for each state.

The estimated channel condition actually represents the quality of data transmission from sender to receiver, i.e., the channel condition as seen from the sender side. However, each terminal uses the channel condition it estimates as a receiver to determine the codec mode of the packets it transmits, assuming that the channel conditions of the forward and backward paths are similar. As further study, we are therefore investigating methods to feed the estimated channel condition back to the sender, so that the codec mode of each packet can be determined at the sender.

REFERENCE

[1] Internet Telephony Tutorial, http://www.webproforum.com/siemens2/full.html.
[2] IP Telephony Signalling, http://www.ericsson.com/datacom/emedia/ip_telephony.pdf.
[3] J. Toga, J. Ott, ITU-T Standardization Activities for Interactive Multimedia Communications on Packet-based Networks: H.323 and Related Recommendations, Journal of Computer Networks, Vol. 31, pp. 205-223, 1999.
[4] ITU-T Recommendation H.323, Visual Telephone Systems and Terminal Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service.
[5] ETSI Draft EN 301 704, Digital Cellular Telecommunication System (Phase 2+); Adaptive Multi-Rate (AMR) Speech Transcoding.
[6] RFC 1889, RTP: A Transport Protocol for Real-Time Applications.
[7] H. Liu, P. Mouchtaris, Voice over IP Signaling: H.323 and Beyond, IEEE Communications Magazine, pp. 142-148, Oct. 2000.

[8] T. J. Kostas, M. S. Borella, I. Sidhu, G. M. Schuster, J. Grabiec, J. Mahler, Real-Time Voice Over Packet-Switched Networks, IEEE Network, pp. 18-27, Jan./Feb. 1998.
[9] G. A. Thom, H.323: The Multimedia Communications Standard for Local Area Networks, IEEE Communications Magazine, pp. 52-56, Dec. 1996.
[10] A DataBeam Corporation White Paper, A Primer on the H.323 Series Standard, 1998.
[11] U. Black, Internet Telephony, Prentice-Hall, 2001.
[12] B. Douskalis, IP Telephony, Hewlett-Packard, 2000.
[13] A. M. Kondoz, Digital Speech, John Wiley & Sons, 1994.
[14] E. Ekudden et al., The Adaptive Multi-Rate Speech Coder, IEEE Workshop on Speech Coding Proceedings, pp. 117-119, 1999.
[15] X. Huang, A. Acero, H. Hon, Spoken Language Processing, Prentice Hall, 2001.
[16] J. Davidson, J. Peters, Voice over IP Fundamentals, Cisco Press, 2001.
[17] S. Furui, M. M. Sondhi, Advances in Speech Signal Processing, Marcel Dekker, 1991.
[18] J. W. Seo, S. J. Woo, K. S. Bae, A Study on the Application of an AMR Speech Codec to VoIP, in Proc. ICASSP, Vol. 3, pp. 1373-1376, 2001.
[19] ETSI Draft EN 301 712, Digital Cellular Telecommunication System (Phase 2+); Adaptive Multi-Rate (AMR) Speech; ANSI-C Code for the AMR Speech Codec.
[20] http://www.openh323.org.