Implementing Low-Delay Communication Audio Coding on ARM Processors Marc Gayer (marc.gayer@iis.fraunhofer.de) Fraunhofer Institute for Integrated Circuits IIS
Overview The Fraunhofer Institute for Integrated Circuits IIS Low-Delay Audio Coding Basics MPEG-4 AAC Low-Delay Enhanced AAC Low-Delay with SBR Implementations on ARM-powered platforms Conclusions 2
The Fraunhofer Gesellschaft - FhG Bremen Itzehoe Hannover Berlin Braunschweig Golm Oberhausen Magdeburg Dortmund Duisburg Schmallenberg Aachen St. Augustin LeipzigDresden Euskirchen Jena Ilmenau Chemnitz DarmstadtWürzburg Kaiserslautern Erlangen St. Ingbert Saarbrücken Pfinztal Karlsruhe Stuttgart Freising Largest private research organization in Europe Non-profit organization, founded 1949 58 Institutes in 40 locations in Germany Offices in Europe, USA (San Jose) and Asia Permanent staff 12 700, mainly scientists and engineers Freiburg 3
Fraunhofer IIS Institute for Integrated Circuits Home of Founded in 1985, staff ~ 450 in 4 locations Audio, Video, Multimedia at IIS and IDMT More than 100 engineers in Erlangen and Ilmenau doing research on audio and video technologies MPEG Audio Encoders and Decoders MPEG-4 Advanced Audio Coding (AAC-LC, HE-AAC v2, Low-Delay AAC) MPEG Layer-3 (MP3) MPEG Surround HD-AAC (MPEG-4 SLS, Scalable Lossless) MPEG-4 Video Codecs, AV Streaming Technologies Virtual Acoustics, Metadata, Music Recognition Digital Rights Management Spin-offs: Opticom, Coding Technologies, MusicTrace 4
Low-Delay Audio Coding Why? For two-way interactive communication Long delay feels un-natural Example: Audio Conferencing Audio must be synchronized with video Example: Video Conferencing Where speakers can hear the coded signal Speakers find speech difficult when they hear themselves with > 25 ms delay Example: Phone call without an echo canceller Where acoustic delay is important 1 ms ~= 1 foot of sound propagation Example: Wireless speakers and microphones 5
Why a Special Low-Delay Codec? Traditional speech codecs do not work well with music or background noise. General Audio Codecs or Music Codecs are not typically used in interactive situations, and delay is unimportant. usually long latency Codec MP3 AAC-LC HE-AAC HE-AAC v2 Typical Delay (HW or DSP) 140 ms 210 ms 360 ms Typical Application Music Player Music Player, Broadcasting (ipod, itunes, ISDB) Mobile Music, Satellite and Digital Radio, IPTV (XM Radio, 3GPP, Digital Radio Mondial, DAB+) All delay figures for implementations with 100% encoder workload and continuous bitstream transmission 6
Goals of Next-Gen Conferencing Systems We need the fidelity of a music codec... and the latency of a speech codec. The vision of new systems is to move from just delivering intelligible signals to high-fidelity, entertainment-grade performance Intelligibility Natural sound quality No annoying coding artifacts or noise Full audio bandwidth Robust with tandem coding (cascaded codecs, MCUs) Speaker Separation Be able to hear several people talking at once Multi-channel capability Ambience Sounds come from the room, not a speaker 7
Fraunhofer s Low Delay Codecs Realtime communication MPEG-4 AAC Low Delay, AAC-LD Bidirectional communication, VoIP Delay: 20 ms algorithmic, 31 ms implementation AAC-ELD: Enhanced Low-Delay AAC Based on AAC-LD and Spectral Band Replication Delay: 15-32 ms algorithmic Scheduled as international standard: 1/2008 Wireless Audio ULD: Ultra Low Delay Audio Codec Wireless audio: microphones, loudspeakers, hearing-aids, VoIP, music jam sessions Delay: 6 ms algorithmic, 10 ms implementation 8
The AAC Codec Family Part of ISO/IEC MPEG-2 and MPEG-4 Standards: MPEG-2 AAC: (Japanese ISDB) AAC-LC: Music Coding (ipod, itunes) HE-AAC: Low-bitrate music (XM Radio) HE-AAC v2: Lower-bitrate music (3GPP, digital radio and TV services) AAC-LD: Low-bitrate communication New AAC-ELD: Lower-bitrate communication 9
AAC-LD Basics Configuration or Side Information Input Time Signal MDCT TNS Intensity/ Coupling PNS Mid/ Side Scale Factors Quantization Huffman Coding Spectral Processing Perceptual Model Bitrate / Distortion Controller Quantization and Coding Bitstream Payload Formatter, Bit Reservoir Not Shown Output Bitstream AAC-LD is based on AAC Low-Complexity with several modifications to decrease end-to-end delay 10
AAC-LD Delay Optimizations Lower number of subbands Reduced filter bank delay, only 2 x 480 samples No block switching (compensated by TNS) No look ahead delay Reduced bit reservoir size For example < 100 bits/channel Result: 20 ms algorithmic delay (48 khz, frame length 480) Real-time implementation ~31 ms delay Audio quality comparable to MP3 at same bitrate Up to fs/2 audio bandwidth Tuned for best performance at 16 khz bandwidth for 64 kbps/channel Large range of usable bitrates 32-80 kbps/channel 11
Systems using AAC-LD and/or AAC-LC Tandberg MXP Tandberg MPS 200/800 Sony PCS-TL50P Vcon HD4000/HD5000 Lifesize Telos Zephyr Xstream Musicam Netstar Mayah CENTAURI Source Elements Codian MCU 4200 Comrex Access Cisco Telepresence Apple ichat Several others in pipeline Tandberg MXP 2006: Comrex Access 2006: Codian MCU 2007: Apple ichat 2006: Cisco Telepresence 12
Enhanced Low-Delay AAC (AAC ELD) ISO/MPEG Status (5/2007): FPDAM Scheduled International Standard 1/2008 Modifications to AAC-LD: Delay optimized AAC core Optional Low delay Spectral Bandwidth Replication Key Features: Bandwidth up to 16 khz (and more) Algorithmic delay: 15 ms w/o SBR 32 ms with SBR Typcial bitrate: 24 to 48 kbps/channel with SBR Frame length 20 ms with SBR 13
AAC-LD and AAC-ELD: Delay Analysis Codec AAC-LD AAC-ELD Delay Sources MDCT+IMDCT LD-Filterbank Delay at 48kHz 20 ms 15 ms Low-delay window reduces delay from 960 samples (MDCT) to 720 samples (for a frame size of 480 samples) Low delay filterbank does not increase computational complexity Perfect reconstruction and similar frequency response 14
Spectral Band Replication in AAC-ELD Spectral Band Replication (SBR) Energy Energy Transposition Frequency Envelope Adjustment Frequency Core coder (AAC-LC or AAC-LD) is band limited Transposition of low band to high band Shaping and envelope adjustment of high band with additional SBR side info Add noise or additional sinusoidal frequency components that were detected in the SBR encoder 15
AAC-ELD with LD-SBR: Delay Analysis Codec Delay Sources Delay at 48kHz AAC-LD AAC-ELD AAC-ELD + LD-SBR MDCT+IMDCT LD-Filterbank LD-Filterbank QMF -> CLDFB (new after 80th MPEG meeting) SUM 20 ms 15 ms 30 ms 12 ms 1.3 ms 31.3 ms 16
AAC-ELD: Audio Quality Listening test results: MPEG2007/M14723, July 2007 MPEG critical test items: speech, music, single instruments 10 expert listeners MUSHRA test with 12 test items Bitrate: 32 kbps Comparison codecs: AAC-ELD with SBR, 48 khz MPEG-4 AAC-LD, 24 khz ITU-T G.722.1-C, 32 khz AMR-WB (G.722.2, @ 24kbps), 16 khz 17
AAC-LD/ELD Listening Test Result (32 kbps) 18
Implementing AAC-LD and AAC-ELD on ARM-powered Processors 19
Fraunhofer Core Design Kit (CDK) Development Flow Work done by Memory and runtime optimization Fixed-point knowhow Fraunhofer Licensee or Fraunhofer Assembler optimization ARM Implementation #1 Floating-point Reference not bit-exact Fixed-point Template Code bit-exact ARM Implementation #2 Optimized transcendent functions Quality & stability tests Platform specific cache and memory managment ARM Implementation #n 20
ARM Specific Software Features #if defined( CC_ARM) && defined( TARGET_ARCH_5TE) asm { smulwb result, b, a } #if defined(_win32_wce) && defined(_arm_) #if _MSC_VER >= 0x4b1 #if (_M_ARM==5) #include <armintrin.h> #include <cmnintrin.h> Depending on ARM instruction set and compiler tools use: inline assembler: SMULL, SMULWB, SMLAL, SMLAWB, CLZ assembler intrinsics: _MulHigh, _SmulWLo_SW_SL, _SmulAddWLo_SW_SL, _CountLeadingZeros for a variety of 16/32-bit MPY/MAC and other instructions Support for: ARM RealView/ADS ARM gcc for embedded Linux MS Visual C++ or embedded Visual C++ 21
ARM-powered Test Platforms Logic/Freescale i.mx31 LITEKIT board Freescale i.mx31 Application Development System Both based on ARM 1136JS -powered Freescale i.mx31 at up to 532 MHz Intel IXP420 XScale based on ARMv5TE NSLU2 (Linksys) with embedded Linux for ARM 266 MHz CPU speed, 8 MB flash, 32 MB SDRAM Ethernet, 2 x USB, Serial port Various ARM/Xscale-powered PDAs/mobiles 22
Processing Power Requirements on ARM9E Pure C code with ARM inline assembler/intrinsics, 1-channel mono, 48 khz, ARM RVDS simulator Encoder Decoder ARM946E-S ARM920T ARM946E-S ARM920T MP3 (for comparison) 42 MHz 53 MHz 16 MHz 20 MHz MPEG-4 AAC-LC (TNS + PNS) 40 MHz 52 MHz 11 MHz 13 MHz MPEG-4 AAC Low Delay 55 MHz 72 MHz 17 MHz 21 MHz MPEG-4 HE-AAC (SBR+TNS) 70 MHz 81 MHz 24 MHz 32 MHz AAC-ELD Enhanced AAC Low-Delay 65 MHz 73 MHz 26 MHz 35 MHz 23
Processing Power Requirements ARM946E-S Why an increase in processing power consumption of >35% from AAC-LC to AAC-LD encoding when we don t have short blocks in AAC-LD? 70 MHz 60 50 37.5% higher workload 40 30 Decoder Encoder 20 10 0 Encoder Decoder mp3 AAC-LC AAC-LD HE-AAC AAC-ELD 24
AAC-LC vs. AAC-LD processing power complexity AAC-LC frame length usually 1024 samples AAC-LD frame length 480 or 512 samples AAC-LD routines called twice as often during one time unit: function call overhead Psychoacoustic calculations in encoder based on scalefactor bands (sfbs) AAC-LC @ 48 khz: 49 sfbs (frame length 1024) AAC-LD @ 48 khz: 36 sfbs (frame length 512) 47% more scalefactor band-wise calculations per time unit But: AAC-LD does not use processing power consuming short blocks 25
Conclusions MPEG-4 AAC Low-Delay used by a large number of high-quality audio/video conferencing systems AAC-ELD with optional SBR extension currently under standardization in ISO/MPEG End-to-end delay reduced to only 15 ms w/o SBR Even higher coding efficiency with still very low delay using SBR extension Optimized implementations for all ARMpowered processor platforms available at Fraunhofer IIS 26
Please ask now Come to our booth # 605 www.iis.fraunhofer.de/amm mailto: marc.gayer@iis.fraunhofer.de Fraunhofer office in San Jose 27