Multimedia Communications: Coding, Systems, and Networking. Prof. Tsuhan Chen MPEG Audio

Similar documents
AUDIO CODING: BASICS AND STATE OF THE ART

MPEG-1 / MPEG-2 BC Audio. Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de

Digital Audio Compression: Why, What, and How

Audio Coding Introduction

The Theory Behind Mp3

Audio Coding, Psycho- Accoustic model and MP3

MP3 AND AAC EXPLAINED

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder.

White Paper: An Overview of the Coherent Acoustics Coding System

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

A Comparison of the ATRAC and MPEG-1 Layer 3 Audio Compression Algorithms Christopher Hoult, 18/11/2002 University of Southampton

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN.

Digital terrestrial television broadcasting Audio coding

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

Video Coding Basics. Yao Wang Polytechnic University, Brooklyn, NY11201

Audio Coding Algorithm for One-Segment Broadcasting

Quality Estimation for Scalable Video Codec. Presented by Ann Ukhanova (DTU Fotonik, Denmark) Kashaf Mazhar (KTH, Sweden)

Analog Representations of Sound

An Optimised Software Solution for an ARM Powered TM MP3 Decoder. By Barney Wragg and Paul Carpenter

Digital Audio Compression

Speech Signal Processing: An Overview

Figure 1: Relation between codec, data containers and compression algorithms.

PCM Encoding and Decoding:

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics:

Basic principles of Voice over IP

Multichannel stereophonic sound system with and without accompanying picture

A Review of Algorithms for Perceptual Coding of Digital Audio Signals

MPEG Layer-3. An introduction to. 1. Introduction

For Articulation Purpose Only

High-Fidelity Multichannel Audio Coding With Karhunen-Loève Transform

APPLICATION BULLETIN AAC Transport Formats

Convention Paper 5553

A Framework for Robust and Scalable Audio Streaming

Video-Conferencing System

ISO/IEC JTC 1/SC 29. ISO/IEC :/Amd.1:1999(E) Date: ISO/IEC JTC 1/SC 29/WG 11

(51) Int Cl.: H04S 3/00 ( )

Multimedia Communications

VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.)

Starlink 9003T1 T1/E1 Dig i tal Trans mis sion Sys tem

Digital Speech Coding

4 Digital Video Signal According to ITU-BT.R.601 (CCIR 601) 43

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Aperceptual audio coder is a frequency domain coder which

TECHNICAL OPERATING SPECIFICATIONS

Dream DRM Receiver Documentation

Figure1. Acoustic feedback in packet based video conferencing system

Voice Encryption over GSM:

Video Coding Standards. Yao Wang Polytechnic University, Brooklyn, NY11201

DOLBY SR-D DIGITAL. by JOHN F ALLEN

Introduction to Packet Voice Technologies and VoIP

THE EMERGING JVT/H.26L VIDEO CODING STANDARD

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

ARIB STD-T64-C.S0042 v1.0 Circuit-Switched Video Conferencing Services

Technical Paper. Dolby Digital Plus Audio Coding

HISO Videoconferencing Interoperability Standard

Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids

Appendix C GSM System and Modulation Description

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet

FREE TV AUSTRALIA OPERATIONAL PRACTICE OP 60 Multi-Channel Sound Track Down-Mix and Up-Mix Draft Issue 1 April 2012 Page 1 of 6

Datasheet EdgeVision

Multichannel Audio Line-up Tones

SPEECH SIGNAL CODING FOR VOIP APPLICATIONS USING WAVELET PACKET TRANSFORM A

Super Video Compact Disc. Super Video Compact Disc A Technical Explanation

Introduction to Medical Image Compression Using Wavelet Transform

Technical Advances in Digital Audio Radio Broadcasting

Transform-domain Wyner-Ziv Codec for Video

H.264/MPEG-4 AVC Video Compression Tutorial

RECOMMENDATION ITU-R BR *, ** Parameters for international exchange of multi-channel sound recordings with or without accompanying picture ***

FAST MIR IN A SPARSE TRANSFORM DOMAIN

HE-AAC v2. MPEG-4 HE-AAC v2 (also known as aacplus v2 ) is the combination of three technologies:

Main Ways to Enhance Throughput

TCOM 370 NOTES 99-6 VOICE DIGITIZATION AND VOICE/DATA INTEGRATION

Matlab GUI for WFB spectral analysis

Multichannel stereophonic sound system with and without accompanying picture

Digital Audio and Video Data

Advanced Signal Processing 1 Digital Subscriber Line

Image Transmission over IEEE and ZigBee Networks

encoding compression encryption

ATSC Standard A/153 Part 8 HE AAC Audio System Characteristics

Ogg Vorbis Audio Decoder Jon Stritar and Matt Papi December 14, 2005

JPEG Image Compression by Using DCT

Study and Implementation of Video Compression standards (H.264/AVC, Dirac)

high-quality surround sound at stereo bit-rates

Lossless Medical Image Compression using Predictive Coding and Integer Wavelet Transform based on Minimum Entropy Criteria

C Implementation & comparison of companding & silence audio compression techniques

DTS Enhance : Smart EQ and Bandwidth Extension Brings Audio to Life

Understanding the Transition From PESQ to POLQA. An Ascom Network Testing White Paper

Classes of multimedia Applications


For version p (September 4, 2012)

Real-Time DMB Video Encryption in Recording on PMP

Note monitors controlled by analog signals CRT monitors are controlled by analog voltage. i. e. the level of analog signal delivered through the

Michael W. Marcellin and Ala Bilgin

MPEG-1 and MPEG-2 Digital Video Coding Standards

Packet Loss Concealment for Audio Streaming

Propagation Channel Emulator ECP_V3

ETSI TS V1.1.1 ( )

Transcription:

18-796 ultimedia Communications: Coding, Systems, and Networking Prof. Tsuhan Chen tsuhan@ece.cmu.edu PEG Audio 1

Outline Basics Psychoacoustics Subband coding PEG-1 audio Layer I and II Layer III Frame structure and packetization PEG-2 audio ultichannel audio Backward compatible coding Non backward compatible coding Digital Audio Telephone Speech Wideband Speech ediumband Audio Wideband Audio Frequency Band (Hz) Sampling Rate (khz) Bits per Sample Raw Bitrate (kbits/s) 300~3400 8 8 64 50~7000 16 8 128 10~11000 24 16 384 10~22000 48 16 768 CD: 44.1 khz 16 bits 2 channels = 1.411 bits/s 2

Psychoacoustics Threshold in quiet 26 critical bands 0~24 khz Frequency masking in the same critical band Frequency asking SR (Signal-to-ask Ratio) 3

Temporal asking Post-asking: 50~200ms Pre-asking: 1/10 of post-masking Subband Coding H 1 (z) Q F 1 (z) H 2 (z) Q F 2 (z) H (z) Q F (z) Analysis Filterbank Synthesis Filterbank aximal downsampling Q should be based on signal-to-masking ratio (SR) Ear s critical bands are not uniform, but logarithmic The filter bank should match the critical bands Tree-structure filter bank (to be derived on board) 4

Subband Coding vs. DCT z -1 E(z) R(z) z z -1 z Polyphase Representation When E(z) = DCT matrix, this becomes DCT No overlap; blocking artifact odified DCT (DCT) 50% overlap; less blocking artifact PEG-1 Audio ISO/IEC 11172-3 (1988~1991) First high quality audio compression standard Sampling rates: 32, 44.1, 48 khz CD quality two-channel audio at ~256 kbits/s CD: 44.1 khz 16 bits 2 = 1.411 bits/s Quality demonstration (PEG-1 Layer II) Stereo 44.1 khz at 64 kbits/s Stereo 44.1 khz at 128 kbits/s Stereo 44.1 khz at 192 kbits/s Stereo 44.1 khz at 256 kbits/s 5

Block Diagram PC audio samples 32, 44.1, 48 khz analysis filterbank quantizer and coding frame packing encoded bitstream psychoacoustic model 11172-3 ancillary data Decoder Block Diagram encoded bitstream frame unpacking reconstruction synthesis filterbank PC audio samples 32, 44.1, 48 khz 11172-3 Decoder ancillary data 6

Layers Increasing complexity, delay, and quality Layer I: ~384 kbits/s for perceptually lossless quality (4:1) Layer II: ~192 kbits/s for perceptually lossless quality (8:1) Layer III: ~128 kbits/s for perceptually lossless quality (12:1) (for two channels) 100% perceptual lossless Layer I and II 32 Analysis Filterbank 512-tap Scaler & Quantizer ux FFT 512-pt for Layer I 1024-pt for Layer II/III asking Threshold Generator Dynamic Bit Allocator Coder 7

Block-Based Coding 12 12 12 Analysis Filterbank... Block: Layer I Superblock: Layer II/III 12 samples for Layer I, 36 samples for Layer II/III Block companding: Each block normalized by scalefactor For Layer II, up to 3 scalefactors, with 2-bit scalefactor select Each block/superblock receives one bit allocation Layer III Analysis Filterbank 6 or 18 with overlap DCT Scaler & Quantizer Huffman Coding ux FFT asking Threshold Generator Coding 8

Features in Layer III Hybrid filterbank DCT with filterbank Long/short window switching Short for better temporal resolution (to prevent pre-echoes) Long for better frequency resolution Nonuniform quantization Entropy coding Run-length and Huffman coding Bit reservoir (buffer) Frame Structure Header Info Side Info Subband Sanples Aux Data Header info: Sync bits, system info, CRC (cyclic redundancy code) Side info: bit allocation, scalefactor, (and scalefactor select for Layer II and III) Subband samples: 32 12 for Layer I, 32 36 for Layer II and III Packetization: 4-byte header, 184-byte payload 9

Stereo Redundancy Coding Four modes: mono, stereo, dual with two separate channel, joint stereo Joint stereo mode Human stereo perception > 2kHz is based on envelope Intensity stereo coding > 2kHz Encode (L + R) Assign independent left- and right- scalefactors Layer III supports (L+R) and (L R) coding PEG-2 Audio ISO/IEC 13818-3 Allows lower sampling rates 16, 22.05, and 24 khz: about half of PEG-1 From wideband speech to mediumband audio Higher frequency resolution Layer I, II, and III ultichannel coding 2~5 channels; surround sound, multilingual, for visual/hearing-impaired Backward compatible and non-backward compatible coding (13818-7: PEG-2 AAC) 10

ultichannel Audio 2/0-stereo 3/0 3/1 Surround 3/2 3/2 with woofer (5.1 system) LFE: Low-frequency enhancement (woofer) 15~120 Hz Can be anywhere Compatibility Forward compatibility A new decoder can decode an old bitstream Usually simple to achieve Backward compatibility An old decoder can decode a new bitstream, at least partially Usually limits the coding efficiency 11

PEG-2 Backward Compatible Audio Coding PEG-1 Header PEG-1 Data PEG-1 Ancillary Data PEG-1/2 Frame PEG-2 Header PEG-2 Data L C R LS RS atrix L0 R0 T3 T4 T5 PEG-1 PEG-2 Extension ux L0 = α ( L + R0 = α ( R + β C + δ LS) β C + δ RS) α = 1+ 1 1 ; β = δ = or α = 1; β = δ = 0 2 2 Backward Compatible Audio Coding (cont.) L C R LS RS atrix L0 R0 T3 T4 T5 PEG-1 PEG-2 Extension ux Demux PEG-1 Decoder PEG-2 Extension Decoder L0 R0 T3 T4 T5 Inverse atrix L C R LS RS atrixing Dematrixing 12

Non Backward Compatible (NBC) Coding PEG-2 Advanced Audio Coding (AAC) ISO/IEC 13818-7 (April 1997) 320~384 kbits/s for 5 channels, 64kbits/channel NBC at 320 kbits/s as good as BC coding at 640 kbits/s 1~48 audio channels, 0~16 LFEs, 0~16 data streams Same framework (perceptual subband coding) as PEG-1, with some enhancements PEG-2 AAC Enhancements Preprocessing High resolution filterbanks Legend Data Control Noiseless Decoding Inverse Quantizer 1024-line DCT / 128 Temporal noise shaping (TNS): time-dependent quantization Coupling channel 13818-7 Coded Audio Stream Bitstream Demultiplex Scale Factors /S Prediction Intensity multichannel coding Backward adaptive prediction in subbands /S stereo coding Noiseless coding (entropy coding): Huffman coding Intensity/ Coupling TNS Filter Bank Gain Control Output Time Signal 13

Input time signal Perceptual odel Gain Control Legend Filter Bank Data Control TNS Intensity/ Coupling Iteration Loops Quantized Spectrum of Previous Frame Prediction /S Bitstream ultiplex 13818-7 Coded Audio Stream Scale Factors Rate/Distortion Control Process Quantizer Noiseless Coding PEG-2 AAC Profiles ain Low Complexity ain profile Best quality, highest complexity 1024 or 128 DCT Low-complexity profile No temporal noise shaping, no prediction Scalable sampling-rate profile Scalable output sampling rates and complexity Uses hybrid filterbanks (like PEG-1 Layer III) No prediction, no coupling channel Scaleable Sampling Rate 20 khz 18 khz 12 khz 6 khz 14

Simcast To achieve backward compatibility at the cost of higher bitrate L0 R0 PEG-1 PEG-1 Decoder L0 R0 L C R LS RS PEG-2 AAC ux Demux PEG-2 AAC Decoder L C R LS RS References Peter Noll, PEG digital audio coding, IEEE Signal Processing agazine, Sept. 1997, pp. 59-81 D. Pan, A tutorial on PEG/audio compression, IEEE ultimedia, v. 2, no. 2, 1995, pp. 60-74 http://www.mpeg.org/peg/audio.html http://www.cselt.it/mpeg/faq/faq-audio.htm http://www.tnt.uni-hannover.de/project/mpeg/audio/ 15