TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS



Similar documents
TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS

FRAUNHOFER INSTITUTE FOR INTEGRATED CIRCUITS IIS AUDIO COMMUNICATION ENGINE RAISING THE BAR IN COMMUNICATION QUALITY

Fraunhofer Institute for Integrated Circuits IIS. Director Prof. Dr.-Ing. Albert Heuberger Am Wolfsmantel Erlangen

IETF 84 RTCWEB. Mandatory To Implement Audio Codec Selection

high-quality surround sound at stereo bit-rates

The AAC audio Coding Family For

Nokia Networks. Voice over LTE (VoLTE) Optimization

Just like speaking face to face.

FAST FACTS. Fraunhofer Institute for Integrated Circuits IIS

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

Understanding the Transition From PESQ to POLQA. An Ascom Network Testing White Paper

The Voice Evolution VoLTE, VoHSPA+, WCDMA+ and Quality Evolution. April 2012

The 3GPP Enhanced Voice Services (EVS) codec

Active Monitoring of Voice over IP Services with Malden

ARIB STD-T V Codec for Enhanced Voice Services (EVS); Voice Activity Detection (VAD) (Release 12)

DAB + The additional audio codec in DAB

SIP Trunking: Enabling Wideband Audio for the Enterprise

HD VoIP Sounds Better. Brief Introduction. March 2009

Mobile Wireless Overview

ARIB STD-T64-C.S0042 v1.0 Circuit-Switched Video Conferencing Services

How QoS differentiation enhances the OTT video streaming experience. Netflix over a QoS enabled

VoLTE with SRVCC: White Paper October 2012

VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.)

BRINGING VOIP TO THE CONFERENCE ROOM: HOW IT MANAGERS CAN ENHANCE THE USER EXPERIENCE

HISO Videoconferencing Interoperability Standard

Evolved HD voice for LTE

Imre Földes THE EVOLUTION OF MODERN CELLULAR NETWORKS

Basic principles of Voice over IP

APTA TransiTech Conference Communications: Vendor Perspective (TT) Phoenix, Arizona, Tuesday, VoIP Solution (101)

Company and Market Overview

Mobility and cellular networks

VoIP QoS. Version 1.0. September 4, AdvancedVoIP.com. Phone:

WEB-BASED VIDEO CONFERENCING

Supporting operators as they introduce Voice over LTE

Achieving PSTN Voice Quality in VoIP

Application Note. Introduction. Definition of Call Quality. Contents. Voice Quality Measurement. Series. Overview

How To Understand How Bandwidth Is Used In A Network With A Realtime Connection

Mobile VoIP Audio Quality CRASH COURSE THE INS AND OUTS OF MOBILE VOIP

NSN White paper November From Voice over IP to Voice over LTE

TECHNICAL REPORT End to End Network Architectures (E2NA); Location of Transcoders for voice and video communications

Video Conferencing Glossary of Terms

PQ v3.0. Voice over Wi-Fi. Datasheet

APPLICATION BULLETIN AAC Transport Formats

Voice over Internet Protocol (VoIP) systems can be built up in numerous forms and these systems include mobile units, conferencing units and

Goal We want to know. Introduction. What is VoIP? Carrier Grade VoIP. What is Meant by Carrier-Grade? What is Meant by VoIP? Why VoIP?

VOICE OVER IP AND NETWORK CONVERGENCE

VoIP Shim for RTP Payload Formats

VoIP over Wireless Opportunities and Challenges

History of Mobile. MAS 490: Theory and Practice of Mobile Applications. Professor John F. Clark

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

White Paper ON Dual Mode Phone (GSM & Wi-Fi)

Performance Evaluation of VoIP Services using Different CODECs over a UMTS Network

Voice, Video and Data Convergence > A best-practice approach for transitioning your network infrastructure. White Paper

Quality of Service (QoS) and Quality of Experience (QoE) VoiceCon Fall 2008

Choosing the Right Audio Codecs for VoIP over cdma2000 Networks:

Voice Quality with VoLTE

HRPD/1XRTT and 3GPP E-UTRAN (LTE) Interworking and Inter-Technology Handoff

ETSI TS V1.1.1 ( )

Measurement of V2oIP over Wide Area Network between Countries Using Soft Phone and USB Phone

VoIP Conferencing. The latest in IP technologies deliver the next level of service innovation for better meetings. Global Collaboration Services

Appendix A: Basic network architecture

INTELLIGENT NETWORK SERVICES MIGRATION MORE VALUE FOR THE

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

High Definition Wideband

Experiencing MultiChannel Sound in

Push-to-talk Over Wireless

Network administrators must be aware that delay exists, and then design their network to bring end-to-end delay within acceptable limits.

Emerging Wireless Technologies

Pradipta Biswas Roll No. 04IT6007 M. Tech. (IT) School of Information Technology Indian Institute of Technology, Kharagpur

Implementation of Video Voice over IP in Local Area Network Campus Environment

Software-Powered VoIP

3GPP Wireless Standard

USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE

Receiving the IP packets Decoding of the packets Digital-to-analog conversion which reproduces the original voice stream

Possible Applications

Wireless Technologies for the 450 MHz band

HSPA: High Speed Wireless Broadband From HSDPA to HSUPA and beyond. HSPA: High Speed Wireless Broadband From HSDPA to HSUPA and Beyond

HSPA, LTE and beyond. HSPA going strong. PRESS INFORMATION February 11, 2011

Cisco Unified IP Phone 7962G and 7942G

LTE, WLAN, BLUETOOTHB

MPEG-4. The new standard for multimedia on the Internet, powered by QuickTime. What Is MPEG-4?

Polycom Video Communications

THE EVOLUTION OF EDGE

ACCESS Rack & Portable. Small, compact and powerful IP audio codec...

CYBER PHYSICAL IIS

VoIP to 20 khz: Codec Choices for High Definition Voice Telephony

NTRC Dominica April 28 to May 2, 2014 Fort Young Hotel

Transcription:

TECHNICAL PAPER The Future of Communication: Full-HD Voice powered by EVS and the AAC-ELD Family We have grown accustomed to HD Everywhere by consuming high fidelity content in most aspects of our lives. State-of-the-art audio and video codecs such as MPEG AAC and H.264 have set our expectations by assuring the highest rich media quality at very low bitrates. The only real exception to the omnipresence of high-quality sound is the phone call, which is still largely tied to the limitations of technologies derived from the last century. With Full-HD Voice, a new era of audio quality for the telecommunications market has begun. Full-HD Voice offers an unsurpassed level of quality, resulting in calls that sound as clear as talking to someone in the same room, or listening to high-quality digital audio. The codecs enabling Full-HD Voice audio quality include the AAC-ELD audio codec family, tailored for over-the-top (OTT) voice services and already used in millions of calls today, and the new next-generation 3GPP communication codec Enhanced Voice Services (EVS), specifically designed to improve mobile phone calls in managed networks. Fraunhofer Institute for Integrated Circuits IIS Director Prof. Dr.-Ing. Albert Heuberger Am Wolfsmantel 33 91058 Erlangen www.iis.fraunhofer.de Contact Matthias Rose Phone +49 9131 776-6175 matthias.rose@iis.fraunhofer.de Contact China Toni Fiedler Phone +86 138 1165 4675 china@iis.fraunhofer.de Contact Japan Fahim Nawabi Phone: +81 90-4077-7609 fahim.nawabi@iis.fraunhofer.de Contact USA Fraunhofer USA, Inc. Digital Media Technologies* Phone +1 408 573 9900 codecs@dmt.fraunhofer.org * Fraunhofer USA Digital Media Technologies, a division of Fraunhofer USA, Inc., promotes and supports the products of Fraunhofer IIS in the U. S.

1. phone calls today - Phone Calls Tomorrow It is no secret that the vast majority of phone calls sound muffled compared to other sources of audio. Calls today have shortcomings that can make it difficult to understand conversations especially in noisy or reverberant environments, listen to talkers with soft or whispered speech and follow conversations in a non-native language or with an accent. For example, the low audio bandwidth makes distinguishing between certain consonants such as f and s quite difficult. Both share a similar low frequency spectral envelope, but the s phoneme is characterized by its significant energy in 10 khz frequency range. Most telephone calls employ low audio bandwidth speech codecs which model the human speech system and can only reproduce the human voice reasonably well. Phone calls are therefore limited to speech only, shutting out more natural communication options that include multiple speakers, ambient sounds, or music. Also, delays lead to involuntary interruptions, impacting natural conversation flow. Codecs used in telecommunication applications such as POTS, ISDN and mobile phone calls are mostly limited to an upper frequency of 3.4 khz - narrowband. Because people are able to hear audio signals of much higher frequencies (between 14 and 20 khz at normal listening levels), most phone services are simply deleting at least three quarters of the audible spectrum. This causes the muffled sound in everyday telephony. The HD Voice services recently introduced by some telephony providers have raised audio bandwidth from 3.4 to around 7 khz, resulting in wideband transmission. The speech codecs used in these services provide audibly better quality compared to legacy calls, but still only transmit less than half of the fully audible audio spectrum. Now, with new technologies providing audio bandwidths of 14 khz and higher, Full-HD Voice is available. The new codecs offer coding of superwideband speech and audio (14-16kHz bandwidth) or even fullband audio (20 khz bandwidth) and deliver an unprecedented level of performance for voice calls, raising the quality to the level experienced in most digital media today. This is achieved by using optimized audio codecs (AAC-ELD) or hybrid solutions uniting both worlds of audio and speech coding (EVS). Figure 1: Bandwidth comparison of POTS, HD Voice and Full-HD Voice 2 / 7

For medium bit rates, as used in professional applications and OTT audio and video services, the AAC-ELD family consisting of AAC-LD (Low Delay AAC), AAC-ELD (Enhanced Low Delay AAC) and AAC-ELD v2 fulfills these high demands. All three codecs are based on the highly successful AAC audio codec and are used for video conferencing systems as well as IP based telecommunication services such as Apple FaceTime. However, the AAC-ELD family is not suitable to enable Full-HD Voice at the low bitrates typically used by operators of managed mobile networks. With the new communication codec EVS developed within 3GPP, there is now a highly efficient audio solution for mobile communication available. 2. Understanding the Codecs Behind Full-HD Voice 2.1 EVS for mobile communication 2.1.1 Mobile telephony on the fast lane Since Long Term Evolution (LTE), the fourth generation of mobile network standards, has been introduced, cellular phone networks are starting to switch to IP based transmission. LTE is based on the older, established GSM and UMTS standards, offering an all-ip architecture and low latencies. It requires the deployment of all-ip voice services or Voice-over-LTE (VoLTE) and in turn opens up the prospect of moving all voice services onto IP networks, eventually phasing out the legacy-switched services based on GSM, UMTS, CDMA networks, and wired public switched telephone networks. With the help of Full-HD Voice technologies service providers can shake off the limitations of these legacy services, including very limited audio bandwidth and the use of speech-centric codecs. With this goal in mind, 3GPP, the international standardization organization for mobile telephony, decided to develop and standardize the new communication codec EVS. It targets packet based systems such as VoLTE (Voice over LTE) or VoWifi (Voice over Wifi), but may be available for 3G/circuit switched systems as well in the future. 2.1.2 EVS the all-rounder Introduced in 2014, EVS is the first 3GPP codec to cover the complete audible audio spectrum of up to 20 khz, pushing mobile phone calls to a new level. Combining state-of-the-art speech and audio compression technologies, EVS enables unprecedented audio quality for speech, music and mixed content. EVS offers a wide range of bit rates from 5.9 kbit/s to 128 kbit/s, allowing service providers to optimize network capacity and call quality as desired for their service. Bit rates for narrow- and wideband start at 5.9 kbit/s, while superwideband Full-HD Voice audio quality is supported from 9.6 kbit/s on. EVS significantly improves the audio quality over legacy codecs at popular mobile bit rates such as 13.2 kbit/s and 24 kbit/s. Another advantage is the codec s backward compatibility to AMR-WB which is enabled by an interoperability mode. It allows seamless switching between VoLTE and circuit switched networks when network conditions warrant a transition. 3 / 7

Mobile network services are often victims of packet loss issues with an unmistakably negative effect on speech intelligibility. Several unique concealment techniques, including a so-called channel-aware mode (CAM) using partial redundancy to improve concealment technologies, have shaped EVS to be a robust audio codec that minimizes errors and quickly recovers from lost packets. Its highly efficient jitter buffer management tops off the allrounder package for a high quality communication experience. The following diagrams compare codec quality on the basis of the mean opinion score (MOS). Figure 2: Quality comparison for clean speech between EVS (narrow-, wideand superwideband) and AMR-NB (narrowband) and AMR-WB (wideband). Today s standard quality equals the pictured AMR-NB values. AMR-WB applies to today s HD Voice quality. The Full-HD Voice superwideband service enabled by EVS outperforms AMR-WB even at very low bit rates such as 9.6 kbit/s. Source: 3GPP TR 26.952, EVS Performance Characterization, Experiment M1. Figure 3: Quality comparison between EVS (narrow-, wide- and superwideband) and AMR-NB (narrowband)/amr- WB (wideband) for music and mixed content. There s an evident quality gap between AMR/AMR-WB and EVS at low bit rates. EVS-WB and EVS-SWB significantly outperform AMR-WB, e.g.: The quality delivered by EVS at 9.6 kbit/s is still higher than that of AMR-WB at 23.85 kbit/s. Source: 3GPP TR 26.952, EVS Performance Characterization, Experiment M3b. 4 / 7

Figure 4: Robustness comparison between AMR-WB and EVS (CAM on and off). EVS maintains its high quality even at a frame error rate (FER) of up to 3% (CAM off). EVS performance at a FER of 6.2% is comparable to AMR-WB s performance at a FER of 3.3%. The same comparison applies for CAM enabled EVS at 9.2%. Source: 3GPP TR 26.952, EVS Performance Characterization, Experiment W1. 2.2 The AAC-ELD codec family The AAC-ELD family consisting of AAC-LD (Low Delay AAC), AAC-ELD (Enhanced Low Delay AAC) and AAC-ELD v2 fulfills the high demands of communication applications in terms of quality and latency. All three codecs are based on the highly successful AAC audio codec, which has been used by Apple in its itunes music store since 2001 and today is deployed in billions of devices. 2.2.1 High efficiency, low latency They support mono, stereo and multichannel signals all with latencies as low as 15-20 ms. Furthermore, the audio codecs extend the application area from clean voice to a broad variety of source material, including voice and singing, music and ambient sounds. The result of all this is natural, real-time communication. The audio quality and operating point of the AAC-ELD family members is described in Figure 5 for stereo audio. While AAC-LD is a very good choice for bit-rates above 96 kbit/s, AAC-ELD improves the audio quality down to 48 kbit/s. Below this bit-rate, AAC-ELD v2 is the best choice to keep the audio quality high. For mono applications, a similar relationship between AAC-ELD and AAC-LD at half bit-rate can be expected, whereas AAC-ELD v2 delivers identical audio quality to AAC-ELD. 5 / 7

Figure 5: AAC-ELD stereo operating points 2.2.2 Application Scenarios Over-the-Top Services and IP Video Telephony Although OTT and VoIP services have been around since the mid to late 1990s, or even earlier with some Packet Radio experiments in Silicon Valley, credit for mass acceptance across services must go to Skype. However, there are limits to what a speech codec, such as Skype s SILK, can deliver in terms of quality. To overcome the limitations of speech codecs, Apple s OTT peer-to-peer video telephony service FaceTime is based on the Full-HD Voice codec, AAC-ELD. As FaceTime is available on most Apple devices, such as iphone, ipad, and Mac, the service can be used today on more than 200 million devices and is growing rapidly. As AAC-ELD is natively supported in Android and ios, Full-HD Voice is readily available in these operating systems. Video Conferencing and Telepresence Video conferencing and telepresence services have been all-ip-based technologies for many years. In these markets, the user expectations of both video and audio quality are demanding. As a result, Full-HD Voice has long been a default choice for these providers. Most companies are offering Full-HD Voice, and a majority of these products are based on the TIP standard, assuring interoperability between devices from different manufacturers. The TIP standard chose AAC-LD as the only mandatory codec besides G.711, which is the legacy voice codec used in narrowband telephony. 3. More information Read more about the AAC-ELD codec family in the EDN Network article Full-HD Voice: Understanding the AAC codecs behind a new era in communication here: http://www.edn.com/design/consumer/4405424/full-hd-voice--understanding-the- AAC-codecs-behind-a-new-era-in-communication. 6 / 7

INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. INFORMATION IN THIS DOCUMENT IS OWNED AND COPYRIGHTED BY THE FRAUNHOFER- GESELLSCHAFT AND MAY BE CHANGED AND/OR UPDATED AT ANY TIME WITHOUT FURTHER NOTICE. PERMISSION IS HEREBY NOT GRANTED FOR RESALE OR COMMERCIAL USE OF THIS SERVICE, IN WHOLE OR IN PART, NOR BY ITSELF OR INCORPORATED IN ANOTHER PRODUCT. Copyright 2015 Fraunhofer-Gesellschaft ABOUT FRAUNHOFER IIS When it comes to advanced audio technologies for the rapidly evolving media world, Fraunhofer IIS stands alone. For more than 25 years, digital audio technology has been the principle focus of the Audio and Multimedia division of Fraunhofer Institute for Integrated Circuits (IIS). From the creation of mp3 and the co-development of AAC to the future of audio entertainment for broadcast, Fraunhofer IIS brings innovations in sound to reality. Today, technologies such as Fraunhofer Cingo for virtual surround sound, Fraunhofer Symphoria for automotive 3D audio, AAC-ELD for telephone calls with CD-like audio quality, and Dialogue Enhancement that allows television viewers to adjust dialogue volume to suit their personal preferences are among the division s most compelling new developments. Fraunhofer IIS technologies enable more than eight billion devices worldwide. The audio codec software and applicationspecific customizations are licensed to more than 1,000 companies. The division s mp3 and AAC audio codecs are now ubiquitous in mobile multimedia systems. Fraunhofer IIS is based in Erlangen, Germany and is an institute of Fraunhofer- Gesellschaft. With 24,000 employees worldwide, Fraunhofer-Gesellschaft is comprised of 66 institutes making it Europe s largest research organization. For more information, contact Matthias Rose, matthias.rose@iis.fraunhofer.de, or visit www.iis.fraunhofer.de/audio. 7 / 7