Chapter 14. MPEG Audio Compression




14.1 Psychoacoustics
14.2 MPEG Audio
14.3 Other Commercial Audio Codecs
14.4 The Future: MPEG-7 and MPEG-21
14.5 Further Exploration

Li & Drew © Prentice Hall 2003

14.1 Psychoacoustics

- The range of human hearing is about 20 Hz to about 20 kHz.
- The frequency range of the voice is typically only about 500 Hz to 4 kHz.
- The dynamic range, the ratio of the maximum sound amplitude to the quietest sound that humans can hear, is on the order of 120 dB.
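To get a feel for what 120 dB means, the figure can be converted back to a plain amplitude ratio. The sketch below assumes the usual amplitude convention dB = 20 log10(ratio), which the slide does not spell out; the function names are illustrative only.

```python
import math

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to dB (20*log10 convention)."""
    return 20 * math.log10(ratio)

# Dynamic range of hearing quoted on the slide.
ratio = db_to_amplitude_ratio(120)
print(f"120 dB corresponds to an amplitude ratio of about {ratio:.0f}:1")
```

Under that convention, 120 dB corresponds to an amplitude ratio of about one million to one.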

Frequency Masking

- Lossy audio data compression methods, such as MPEG/Audio encoding, remove some sounds that are masked anyway.
- The general situation in regard to masking is as follows:
  1. A lower tone can effectively mask (make us unable to hear) a higher tone.
  2. The reverse is not true: a higher tone does not mask a lower tone well.
  3. The greater the power in the masking tone, the wider its influence, that is, the broader the range of frequencies it can mask.
  4. As a consequence, if two tones are widely separated in frequency, then little masking occurs.

Threshold of Hearing

A plot of the threshold of human hearing for a pure tone:

[Fig. 14.2: Threshold of human hearing, for pure tones. Horizontal axis: frequency in Hz (logarithmic, 10^2 to 10^4); vertical axis: level in dB.]

Threshold of Hearing (cont'd)

- The threshold of hearing curve: if a sound is above the dB level shown, then the sound is audible.
- Turning up a tone so that it equals or surpasses the curve means that we can then distinguish the sound.
- An approximate formula exists for this curve, with f in Hz:

$$\mathrm{Threshold}(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4 \qquad (14.1)$$

- The threshold units are dB; the frequency for the origin (0,0) in formula (14.1) is 2,000 Hz: Threshold(f) is approximately 0 at f = 2 kHz.
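Formula (14.1) is easy to evaluate directly. The following is a minimal sketch; the function name and the sample frequencies are chosen here for illustration and are not part of the slides.

```python
import math

def threshold_db(f_hz: float) -> float:
    """Approximate threshold of hearing in dB for a pure tone at f_hz (formula 14.1)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Evaluate the curve at a few illustrative frequencies.
for f in (100, 1000, 2000, 3300, 10000):
    print(f"{f:>6} Hz: {threshold_db(f):6.1f} dB")
```

The curve comes out close to 0 dB at 2 kHz, dips below 0 dB around 3.3 kHz where hearing is most sensitive, and rises steeply toward both ends of the audible range.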

Frequency Masking Curves

- Frequency masking is studied by playing a particular pure tone, say 1 kHz, at a loud volume, and determining how this tone affects our ability to hear tones nearby in frequency.
  - One would generate a 1 kHz masking tone at a fixed sound level of 60 dB, and then raise the level of a nearby tone, e.g., 1.1 kHz, until it is just audible.
- The threshold in Fig. 14.3 plots the audible level for a single masking tone (1 kHz).
- Fig. 14.4 shows how the plot changes if other masking tones are used.

[Fig. 14.3: Effect on threshold for 1 kHz masking tone. Horizontal axis: frequency (kHz), 0 to 15; vertical axis: level (dB). The plot marks one audible tone above the raised threshold and one inaudible tone below it.]

[Fig. 14.4: Effect of masking tone at three different frequencies (1, 4, and 8 kHz). Horizontal axis: frequency (kHz), 0 to 15; vertical axis: level (dB).]

Temporal Masking

- Phenomenon: any loud tone will cause the hearing receptors in the inner ear to become saturated and require time to recover.
- The following figures show the results of masking experiments:

[Fig. 14.6: The louder the test tone, the less time it takes for our hearing to recover from the masking. Horizontal axis: delay time (ms, logarithmic, up to 1000); vertical axis: level (dB); curves labeled "Mask tone" and "Test tone".]

[Fig. 14.7: Effect of temporal and frequency masking, depending on both time and closeness in frequency. Axes: time, frequency, and level (dB). Tones below the surface are inaudible.]

14.2 MPEG Audio

- MPEG audio compression takes advantage of psychoacoustic models, constructing a large multi-dimensional lookup table to transmit masked frequency components using fewer bits.

MPEG Audio Overview

1. Applies a filter bank to the input to break it into its frequency components.
2. In parallel, applies a psychoacoustic model to the data; the result drives the bit-allocation block.
3. The number of bits allocated is used to quantize the information from the filter bank, providing the compression.

MPEG Layers

MPEG audio offers three compatible layers:
- Each succeeding layer is able to understand the lower layers.
- Each succeeding layer offers more complexity in the psychoacoustic model and better compression for a given level of audio quality.
- Each succeeding layer, with its increased compression effectiveness, is accompanied by extra delay.

The objective of the MPEG layers: a good tradeoff between quality and bit-rate.

MPEG Layers (cont'd)

- Layer 1 quality can be quite good provided a comparatively high bit-rate is available.
  - The Digital Compact Cassette (DCC) uses Layer 1 at around 192 kbps per channel.
- Layer 2 has more complexity; it was proposed for use in Digital Audio Broadcasting.
- Layer 3 (MP3) is the most complex, and was originally aimed at audio transmission over ISDN lines.
- Most of the complexity increase is at the encoder, not the decoder, which accounts for the popularity of MP3 players.

MPEG Audio Strategy

- The MPEG approach to compression relies on:
  - Quantization
  - The fact that the human auditory system is not accurate within the width of a critical band (perceived loudness and audibility of a frequency)
- The MPEG encoder employs a bank of filters to:
  - Analyze the frequency ("spectral") components of the audio signal by calculating a frequency transform of a window of signal values
  - Decompose the signal into subbands by using a bank of filters (Layers 1 and 2: "quadrature-mirror"; Layer 3: adds a DCT; psychoacoustic model: Fourier transform)

MPEG Audio Strategy (cont'd)

- Frequency masking: a psychoacoustic model is used to estimate the just-noticeable noise level:
  - The encoder balances the masking behavior and the available number of bits by discarding inaudible frequencies.
  - Quantization is scaled according to the sound level that is left over, above the masking levels.
- The encoder may take into account the actual width of the critical bands:
  - For practical purposes, audible frequencies are divided into 25 main critical bands (Table 14.1).
  - For simplicity, it adopts a uniform width for all frequency analysis filters, using 32 overlapping subbands.
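The balancing act described above can be sketched as a greedy loop. This is a deliberately simplified illustration, not the allocation procedure of the MPEG standard: the function name, the per-subband levels, the bit pool counted in bits per subband sample, and the rule of thumb of roughly 6.02 dB of noise reduction per quantizer bit are all assumptions made here.

```python
def allocate_bits(signal_db, mask_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy bit allocation over subbands (illustrative sketch only).

    signal_db / mask_db: hypothetical per-subband signal and masking levels in dB.
    Subbands whose signal lies at or below the mask receive no bits at all.
    """
    smr = [s - m for s, m in zip(signal_db, mask_db)]  # signal-to-mask ratio
    bits = [0] * len(smr)
    while bit_pool > 0:
        # Estimated quantization noise still sticking out above the mask.
        audible = [smr[i] - db_per_bit * bits[i] for i in range(len(smr))]
        candidates = [i for i in range(len(smr))
                      if audible[i] > 0 and bits[i] < max_bits]
        if not candidates:
            break  # all remaining noise is masked; leftover bits stay unused
        worst = max(candidates, key=lambda i: audible[i])
        bits[worst] += 1
        bit_pool -= 1
    return bits

# Four hypothetical subbands; the last two are fully masked.
print(allocate_bits([60, 45, 30, 10], [35, 40, 32, 20], bit_pool=6))
```

In this toy input the two subbands whose signal lies below the mask get zero bits, which is the "discarding inaudible frequencies" step, while the remaining bits go to whichever subband currently has the most audible noise.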

MPEG Audio Compression Algorithm

[Fig. 14.9: Basic MPEG Audio encoder and decoder.
Encoder: Audio (PCM) input -> Time-to-frequency transformation -> Bit allocation, quantizing, and coding -> Bitstream formatting -> Encoded bitstream. Psychoacoustic modeling runs alongside and tells the bit-allocation stage what to drop.
Decoder: Encoded bitstream -> Bitstream unpacking -> Frequency sample reconstruction -> Frequency-to-time transformation -> Decoded PCM audio.]

Basic Algorithm (cont'd)

- The algorithm proceeds by dividing the input into 32 frequency subbands, via a filter bank.
  - This is a linear operation taking 32 PCM samples, sampled in time; the output is 32 frequency coefficients.
- In the Layer 1 encoder, the sets of 32 PCM values are first assembled into a set of 12 groups of 32.
  - This introduces an inherent time lag in the coder, equal to the time needed to accumulate 384 (i.e., 12 × 32) samples.
- Fig. 14.11 shows how samples are organized.
  - A Layer 2 or Layer 3 frame actually accumulates more than 12 samples for each subband: a frame includes 1,152 samples.
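The frame sizes translate directly into the minimum buffering delay. The sketch below works this out, assuming a 44.1 kHz sampling rate, which is an illustrative choice and not a figure given on the slides; the constant and function names are likewise hypothetical.

```python
SUBBANDS = 32
LAYER1_FRAME = 12 * SUBBANDS          # 384 samples per Layer 1 frame
LAYER23_FRAME = 3 * LAYER1_FRAME      # 1,152 samples per Layer 2/3 frame

def frame_duration_ms(samples_per_frame: int, sample_rate_hz: float) -> float:
    """Time needed to accumulate one frame of PCM samples, in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz

def subband_width_hz(sample_rate_hz: float) -> float:
    """Width of each of the 32 uniform subbands covering 0 .. sample_rate/2."""
    return sample_rate_hz / 2 / SUBBANDS

fs = 44_100  # assumed sampling rate in Hz
print(f"Layer 1 frame:   {frame_duration_ms(LAYER1_FRAME, fs):.2f} ms")
print(f"Layer 2/3 frame: {frame_duration_ms(LAYER23_FRAME, fs):.2f} ms")
print(f"Subband width:   {subband_width_hz(fs):.1f} Hz")
```

At this sampling rate a Layer 1 frame spans a little under 9 ms and a Layer 2 or Layer 3 frame about 26 ms, which is one source of the extra delay of the higher layers.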

[Fig. 14.11: MPEG Audio Frame Sizes. Audio (PCM) input feeds subband filters 0 through 31; each subband filter produces 1 sample out for every 32 in. A Layer 1 frame holds one group of 12 samples per subband; a Layer 2 or Layer 3 frame holds three groups of 12 samples per subband.]

Mask calculations are performed in parallel with subband filtering, as in Fig. 14.13:

[Fig. 14.13: MPEG-1 Audio Layers 1 and 2. Main path: PCM audio signal -> Filter bank: 32 subbands -> Linear quantizer -> Bitstream formatting -> Coded audio signal. Side path: 1,024-point FFT -> Psychoacoustic model -> Side-information coding.]

Layer 2 of MPEG-1 Audio

Main differences:
- Three groups of 12 samples are encoded in each frame, and temporal masking is brought into play as well as frequency masking.
- Bit allocation is applied to window lengths of 36 samples instead of 12.
- The resolution of the quantizers is increased from 15 bits to 16.

Advantage:
- A single scaling factor can be used for all three groups.
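The idea of one scaling factor covering all three groups can be illustrated with a plain uniform quantizer. This is a simplified sketch under assumptions made here: the real codec selects scale factors from a table and uses its own quantizer classes, and the function names and synthetic subband samples below are hypothetical.

```python
import math

def quantize_block(samples, bits):
    """Uniform symmetric quantization of a block with one shared scale factor."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a symmetric quantizer")
    half = (2 ** bits - 2) // 2            # largest code magnitude
    scale = max(abs(s) for s in samples)   # shared scale factor (block peak)
    if scale == 0.0:
        return 0.0, [0] * len(samples)
    codes = [round(s / scale * half) for s in samples]
    return scale, codes

def dequantize_block(scale, codes, bits):
    """Rebuild approximate sample values from the codes and the scale factor."""
    half = (2 ** bits - 2) // 2
    return [scale * c / half for c in codes]

# Three groups of 12 synthetic samples for one subband, as in a Layer 2 frame.
groups = [[0.8 * math.sin(0.3 * (12 * g + n)) for n in range(12)] for g in range(3)]
frame = [s for group in groups for s in group]

scale, codes = quantize_block(frame, bits=4)   # one scale factor for all 36
decoded = dequantize_block(scale, codes, bits=4)
max_error = max(abs(a - b) for a, b in zip(frame, decoded))
print(f"scale={scale:.3f}, max error={max_error:.4f}")
```

Sharing the scale factor saves side information when the three groups have similar peaks; when they differ a lot, a shared factor wastes resolution on the quieter groups, so the saving is a tradeoff rather than a free gain.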

Layer 3 of MPEG-1 Audio

Main differences:
- Employs a similar filter bank to that used in Layer 2, except using a set of filters with non-equal frequencies.
- Takes into account stereo redundancy.
- Uses the Modified Discrete Cosine Transform (MDCT), which addresses problems that the DCT has at the boundaries of the window used, by overlapping frames by 50%:

$$F(u) = 2 \sum_{i=0}^{N-1} f(i) \cos\left[ \frac{2\pi}{N} \left( i + \frac{N/2 + 1}{2} \right) \left( u + \frac{1}{2} \right) \right], \quad u = 0, \ldots, N/2 - 1 \qquad (14.7)$$
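Formula (14.7) can be evaluated directly for a single block. The sketch below is a straightforward O(N^2) transcription of the sum, with no windowing and no overlap handling; the function name is illustrative, and real encoders use fast algorithms instead.

```python
import math

def mdct(f):
    """Direct evaluation of formula (14.7): N input samples -> N/2 coefficients."""
    n = len(f)
    if n % 2:
        raise ValueError("block length N must be even")
    shift = (n / 2 + 1) / 2  # the (N/2 + 1)/2 phase term
    return [
        2.0 * sum(f[i] * math.cos(2 * math.pi / n * (i + shift) * (u + 0.5))
                  for i in range(n))
        for u in range(n // 2)
    ]

print(mdct([1.0, 0.0, 0.0, 0.0]))   # an impulse gives nonzero coefficients
print(mdct([1.0, 1.0, 1.0, -1.0]))  # a nonzero block that transforms to ~0
```

Because N samples map to only N/2 coefficients, a single block cannot be inverted on its own: the second example is a nonzero block whose coefficients are numerically zero. The 50% overlap between successive blocks is what supplies the missing information at reconstruction time.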

[Fig. 14.14: MPEG-Audio Layer 3 Coding. Main path: PCM audio signal -> Filter bank: 32 subbands -> M-DCT -> Nonuniform quantization -> Huffman coding -> Bitstream formatting -> Coded audio signal. Side path: 1,024-point FFT -> Psychoacoustic model -> Side-information coding.]

Table 14.2 shows various achievable MP3 compression ratios:

Table 14.2: MP3 compression performance

Sound quality            Bandwidth   Mode     Compression ratio
Telephony                3.0 kHz     Mono     96:1
Better than short-wave   4.5 kHz     Mono     48:1
Better than AM radio     7.5 kHz     Mono     24:1
Similar to FM radio      11 kHz      Stereo   26-24:1
Near-CD                  15 kHz      Stereo   16:1
CD                       > 15 kHz    Stereo   14-12:1
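A compression ratio only becomes a bit-rate once the uncompressed reference is fixed. Table 14.2 does not state that reference, so the sketch below assumes 16-bit PCM sampled at 48 kHz; both that reference format and the function names are assumptions made for illustration.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit-rate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

def compressed_kbps(ratio: float, channels: int,
                    sample_rate_hz: int = 48_000, bits_per_sample: int = 16) -> float:
    """Bit-rate after compression by `ratio`, relative to the assumed PCM reference."""
    return pcm_kbps(sample_rate_hz, bits_per_sample, channels) / ratio

# A few rows of Table 14.2 under the assumed reference format.
for label, ratio, channels in [("Telephony", 96, 1),
                               ("Near-CD", 16, 2),
                               ("CD (12:1)", 12, 2)]:
    print(f"{label:<10} {compressed_kbps(ratio, channels):6.1f} kbps")
```

Under this assumption the telephony row works out to 8 kbps and the 12:1 CD row to 128 kbps; a different reference format would shift these figures proportionally.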

14.5 Further Exploration

In the Further Exploration section for Chapter 14 on the text website, a number of useful links are given:
- Excellent collections of MPEG Audio and MP3 links.
- The official MPEG Audio FAQ.
- MPEG-4 Audio implements "Tools for Large Step Scalability". An excellent reference is given by the Fraunhofer-Gesellschaft research institute, "MPEG 4 Audio Scalable Profile".