Audio Coding, Psycho- Accoustic model and MP3

Similar documents

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

Audio Coding Introduction

MPEG-1 / MPEG-2 BC Audio. Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de

AUDIO CODING: BASICS AND STATE OF THE ART

Digital Audio Compression: Why, What, and How

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics:

Analog-to-Digital Voice Encoding

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

For Articulation Purpose Only

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

The Theory Behind Mp3

A Comparison of the ATRAC and MPEG-1 Layer 3 Audio Compression Algorithms Christopher Hoult, 18/11/2002 University of Southampton

Preservation Handbook

A Review of Algorithms for Perceptual Coding of Digital Audio Signals

A Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman

Electronic Communications Committee (ECC) within the European Conference of Postal and Telecommunications Administrations (CEPT)

Example/ an analog signal f ( t) ) is sample by f s = 5000 Hz draw the sampling signal spectrum. Calculate min. sampling frequency.

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN.

Digital Audio Compression

Analog Representations of Sound

4 Digital Video Signal According to ITU-BT.R.601 (CCIR 601) 43

TCOM 370 NOTES 99-6 VOICE DIGITIZATION AND VOICE/DATA INTEGRATION

GSM speech coding. Wolfgang Leister Forelesning INF 5080 Vårsemester Norsk Regnesentral

Introduction to image coding

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

MP3 AND AAC EXPLAINED

Simple Voice over IP (VoIP) Implementation

Introduction to Packet Voice Technologies and VoIP

Analysis/resynthesis with the short time Fourier transform

Digital Audio and Video Data

A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder.

Appendix C GSM System and Modulation Description

Tutorial about the VQR (Voice Quality Restoration) technology

PCM Encoding and Decoding:

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Digital Speech Coding

MPEG Layer-3. An introduction to. 1. Introduction

Audio Coding Algorithm for One-Segment Broadcasting

VoIP Bandwidth Calculation

White Paper: An Overview of the Coherent Acoustics Coding System

14: FM Radio Receiver

HD Radio FM Transmission System Specifications Rev. F August 24, 2011

Experiment 3 MULTIMEDIA SIGNAL COMPRESSION: SPEECH AND AUDIO

Spike-Based Sensing and Processing: What are spikes good for? John G. Harris Electrical and Computer Engineering Dept

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS

Speech Signal Processing: An Overview

A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques

Starlink 9003T1 T1/E1 Dig i tal Trans mis sion Sys tem

Figure 1: Relation between codec, data containers and compression algorithms.

Classes of multimedia Applications

An Optimised Software Solution for an ARM Powered TM MP3 Decoder. By Barney Wragg and Paul Carpenter

Digitizing Sound Files

CM0340 SOLNS. Do not turn this page over until instructed to do so by the Senior Invigilator.

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Convention Paper 5553

HISO Videoconferencing Interoperability Standard

encoding compression encryption

SPEECH SIGNAL CODING FOR VOIP APPLICATIONS USING WAVELET PACKET TRANSFORM A

RECOMMENDATION ITU-R BS *,** Audio quality parameters for the performance of a high-quality sound-programme transmission chain

PRIMER ON PC AUDIO. Introduction to PC-Based Audio

Voice Encryption over GSM:

Dream DRM Receiver Documentation

Estimation of Loudness by Zwicker's Method

Chapter 6: Broadcast Systems. Mobile Communications. Unidirectional distribution systems DVB DAB. High-speed Internet. architecture Container

Compression techniques

Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids

Network Traffic #5. Traffic Characterization

Real-Time Audio Watermarking Based on Characteristics of PCM in Digital Instrument

Boost Your Skills with On-Site Courses Tailored to Your Needs

Voice over IP (VoIP) Part 1

Physical Layer, Part 2 Digital Transmissions and Multiplexing

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

UNIFORM POLYPHASE FILTER BANKS FOR USE IN HEARING AIDS: DESIGN AND CONSTRAINTS

Sampling and Interpolation. Yao Wang Polytechnic University, Brooklyn, NY11201

Analog vs. Digital Transmission

Advanced Signal Processing 1 Digital Subscriber Line

School Class Monitoring System Based on Audio Signal Processing

INTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA

Video compression: Performance of available codec software

Digital Transmission of Analog Data: PCM and Delta Modulation

Spectrum Level and Band Level

Matlab GUI for WFB spectral analysis

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP

Lab 5 Getting started with analog-digital conversion

Image Compression through DCT and Huffman Coding Technique

Video Coding Basics. Yao Wang Polytechnic University, Brooklyn, NY11201

TECHNICAL OPERATING SPECIFICATIONS

New Methods of Stereo Encoding for FM Radio Broadcasting Based on Digital Technology

FREE TV AUSTRALIA OPERATIONAL PRACTICE OP 60 Multi-Channel Sound Track Down-Mix and Up-Mix Draft Issue 1 April 2012 Page 1 of 6

Web-Conferencing System SAViiMeeting

EE3414 Multimedia Communication Systems Part I

Objectives. Lecture 4. How do computers communicate? How do computers communicate? Local asynchronous communication. How do computers communicate?

Introduction to Digital Audio

Transcription:

INF5081: Multimedia Coding and Applications Audio Coding, Psycho- Accoustic model and MP3, NR Torbjørn Ekman, Ifi Nils Christophersen, Ifi Sverre Holm, Ifi What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1

Threshold for audible sound Reference 20 µpa = 2 10-5 N/m 2 Analogue audio frequencies: 20Hz - 20kHz mono: x(t) scalar stereo: xr ( t) x( t) = xl ( t) 2

Dynamics compression A-Law A abs( S) sign( S) S' = 1+ ln A 1+ ln( A abs( S)) sign( S) 1+ ln A for else abs(s) 1 A µ-law 1+ ln(1 + µ abs( S)) S' = sign( S), µ = 255 ln(1 + µ ) Audio Compression small files, low data rate at transmission reconstruction must be (as much as possible) similar to original signal redundancy (lossless coding) irrelevancy (do not code what you cannot hear) 3

Data Rates Stereo CD Audio 2 = 1411.2 10-3 1 3 16 bit 44.1 10 s s bit Quality Sample Rate Bit/Sample Channels Data Rate kb/s Frequency Telephone 8.000 8 Mono 64,00 200-3400 MW 11.025 8 Mono 88,00 UKW 22.050 16 Stereo 705,60 CD 44.100 16 Stereo 1411,00 20-20000 DAT 48.000 16 Stereo 1536,00 20-20000 The inner human ear 4

Bandpass-filter in the ear Frequency to position Digital representation of sound 110 1 Kvantiserings nivåer 011 010 001 000 111 110 101 time Diskretisert variabel 0.8 0.6 0.4 0.2 0-0.2-0.4 16 bit 4 bit Using 16 bit we get 2*2*2* 2 = 2 16 = 65 536 levels giving: 16*44100 = 705 600 bits/sek -0.6-0.8-1 -1-0.8-0.6-0.4-0.2 0 0.2 0.4 0.6 0.8 1 Kontinuerlig variabel 5

Example: Beethoven, 5th symphony 0.15 Beethoven, Bethovens 5th symphony, 9., samplet sampled med with 44.1 44.1 khz khz 0.1 0.05 16 bit kvantisering 4 bit kvantisering 0 Amplitude -0.05-0.1 16 bit quantizing -0.15-0.2-0.25-0.3 0 20 40 60 80 100 120 Tid 4 bit quantizing ( dum og dårlig ) Sampling When x(t) is bandwidth-limited: then with f > ω x( f ) = 0 n= [] n x ( t) = x g( t n t) = 1 < 1 2ω t x[ n] = x( n t) f s sin(2πωt) g( t) = 2πωt 6

Quantisation x Q(x) k bits { y, K 1, y n } i L = 2 j k representations x y x y Q( x) = y i PCM = Pulse Code Modulation Sampling: Quantisation: Coding: { x( t) } { x[ n] } { x[ n] } { Q( x[ n] )} Q( { x[ n] }) { } n i redundancy irrelevancy Play: y ( t) Q( x[ n ]) g( t n t) = i i 7

Masking Masking Threshold for human ear Threshold changes: neighbouring frequencies (Example 0.5, 1, 4, 8 khz) in time 8

Masking Absolute threshold of hearing. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous masking Noise Masking Tone Tone Masking Noise Noise Masking Noise 2. Nonsimultaneous masking Pre masking (2 ms) Post masking (100 ms) Noise Masking Tone Filtered Noise Center 410 Hz Width 111 Hz Tone 1, 820 Hz 5 db below noise Tone 2, 410 Hz 5 db below noise Noise + Tone 1 Noise + Tone 2 Not masked Masked You can not hear a sinusoid that lies in the same critical band as a filtered noise if the soundpreasure level is below a certain threshold. This effect also stretches out beyond the critical band. 9

Tone Masking Noise Filtered Noise Center 1 khz Width 162 Hz 15 db below Tone 1, 2 khz Tone 2, 1 khz Noise + Tone 1 Noise + Tone 2 Not masked Masked You can not hear a filtered noise that lies in the same critical band as a sinusoid if the soundpreasure level is below a certain threshold. This effect also stretches out beyond the critical band. Exploit Masking If a sound is masked we can t hear it. Make a frequency analyze of the signal and find the masking threshold. Put the quantization noise under the masking threshold and we don t hear the quantization. 10

Pre echo distortion The original sound of a castanet. The abruptness in time domain result in all frequencies being involved. The data is split into windows of finite length. The quantization noise is spread over a entire window. This makes the castanets sound less distinct. Audible effects can be avoided with shorter windows, exploiting premasking. Scale factors and Quantization When the dynamics change over time, only a small subset of the quantization steps are used in regions with low magnitudes. Use scale factors instead: Take a window of data. Find the max magnitude in this window. Use the next larger scale factor from a table. Normalize with the scale factor. Quantize. Now the whole dynamic range of the quantizer is used. Send scale factor and quantized samples. 11

MPEG compression factors MPEG 1 Audio: PCM 32, 44.1, 48 khz, max 448 kbit/s MPEG 2 Audio: PCM 16, 22.05, 24, 32, 44.1, 48 khz, max 384 KBit/s MPEG Audio Layer I,II,III Layer I Layer II Digital TV Layer III MP3 12

MP3 - MPEG 1 Audio Layer 3 Sampling: 16 khz - 48 khz Bit rate: 32 kb/s - 192 kb/s (CD Audio: 44.1 khz, 1411 kb/s) www.iis.fhg.de/amm/gallery/index.html Karlheinz Brandenburg: MP3 and AAC explained http://www.exp-math.uni-essen.de/~dreibh/diplom/bra99.pdf Psychoacoustics in the Encoder 13

perceptual encoding / decoding Filterbank 14

Ideal sub-band coder impossible: ideal sub-band coder downsampling aliasing possible: nearly perfect H m 1 for f Dm, m = 1, K, M ( f ) = 0 else Downsampling from M f s back to sub-bandwidth B, upper frequency is multiple of B f s can sample at f s = 2B = 2M B f s (instead of ) x m [] n [ k] y m M [ k] = x [ k M ] m y m 15

Filterbank in MPEG-1 audio layer 1-3 Polyphase filterbank 32 subbands 512 tap FIR-filters 80 + and * per output Equal width Not perfect reconstruction Frequency overlap A closer look The subbands overlap at 3 db to the adjacent bands. The leakage to the other bands is small. The total response almost adds up to one (0 db). 16

White noise The white noise run through the filterbank. The samples from each band are played in the order of the subbands. The subsampled filtered sequence. The samples from each band are played in the order of the subbands. The reconstruction error is 84 db. Nonideal filterbanks Y( e M 1 n= 1 jω X( e ) = X( e jω 2 πn j ω M In a perfect filterbank the first part is the only part. M 1 1 R jω A jω ) Hk ( e ) Hk ( e ) + k= 0 14 M 444 24444 3 1 M 1 1 j ω R jω A M ) Hk ( e ) Hk ( e ) k= 0 14 M 4444 244444 3 0 2πn The second part consists of the aliasing terms. The filterbank is designed so that the aliasing is small. 17

Tubthumper, a time domain view The red line is the reconstruction error after splitting the signal in subbands, down sampling and applying the synthesis filterbank. The reconstruction error is 84 db and sounds like Tubthumper, frequency view Subband 1 2 4 8 16 32 Center frequency [khz] No subsampling 0.3 1.0 2.4 5.2 10.7 21.7 Subsampled 32 times 18

Filterbank MPEG polyphase 12 samples 12 samples 12 samples filterbank band 0 band 1... Layer I frame 384 samples band 31 Layer II/III frame 1152 samples Critical Bands Heinrich Barkhausen (1881-1956) psycho-acoustic width measured in bark f /100 1bark = 9 + 4 log( f /1000) for else f < 500 19

MPEG - Sub bands Layer I: 32 bands, 625 Hz each, Fourier transform Layer II: 32 bands, three frames, time masking Layer III: Division according to critical bands MPEG masking Psycho-acoustic model masking of neighbouring bands signals are coded when above masking threshold MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding and Multiplexing) Layer I: simplified, Layer II: entirely, Layer III: with other methods 20

Example: Masking MPEG Audio band level masking coding 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 8 12 10 6 2 10 60 35 20 15 2 3 5 3 1?????? 12 x 15????????????? - x x??????? Bit Allocation and Masking The masking threshold in each subband gives the Just Noticeable Distortion (JND) limit for that band. Bits are assigned subbands so that the quantization noise falls below or as little over the JND as possible. 21

Castanets and Guitar Bit allocation with 2 bits per sample 22

Bit allocation with 4 bits per sample Signal to Quantization Noise Ratio and the Just Noticeable Distortion Frame at t=0.6 s Frame at t=1.1 s 23

Examples on compression Compression 2 4 8 MP1 4 bit 2 bit MP1 error (SQR) 22 db 11 db Direct Quantization 8 bit 4bit 2 bit Direct Quantization Error (SQR) 31 db 7.8 db 1.1 db Downsampling to 22 khz bandwidth and quantization 16 bit 8 bit 4 bit MPEG-1 Layer 3 encoder 24

MP3 Filter bank - sub bands Series MDCT fine grain frequency resolution non-uniform quantisation perception model Huffman coding MP3 (vs. Layer I/II) modified DCT (Series MDCT vs. FFT) critical bands Huffman coding entropy reduction dynamics compression difference and sum of stereo signals 25

MPEG Audio Layer I,II,III Layer I: 19 ms delay, FFT, 384 samples, frequency masking, equal bands Layer II: 35 ms delay, FFT, 1152 samples, frequency masking, time simulated, equal bands Layer III: 59 ms delay, DCT, 1152 samples, frequency and time masking, bands as in bark scale MPEG Layer I, II, III Data rates subj. quality bandwidth compression 1 min audio Audio CD CD 1400 1:1 10.58 MB MPEG1 Layer I CD 384 3.6:1 2.88 MB MPEG1 Layer II CD 256 5.5:1 1.92 MB MPEG1 Layer III CD 128 11:1 962 kb MPEG2 Layer III Radio 64 22:1 481 kb MPEG2 Layer III Telephone 16 88:1 120 kb CS-ACELP Speech 5,30 264:1 40 kb 26

MPEG-2 AAC Audio Formats PCM - Pulse Code Modulation ITU G.711; speech data 4kHz bandwidth, 64 kb/s data rate ADPCM (Adaptive Differential PCM) ITU G.726, G.727; 16, 24, 32, 40 kbit/s. Standard for CCITT G.721 SB-ADPCM (Sub-Band ADPCM) ISDN, G.722; 7 khz bandwidth in 64 kbit/s streams 27

Audio Formats AIFF - Audio Interchange File Format Apple (extension from IFF by Electronic Arts) Wave (by Microsoft and IBM) Part of RIFF (Resource Interchange File Format) NeXT/Sun Audio File Format! big endian Proprietary Audio Formats AT&T Proprietary Compression Algorithm EPAC (Bell Labs) Microsoft Windows Media Audio (WMA) AC-3 Audio Code No. 3 - Dolby Digital Surround 28

Speech compression formats GSM 06-10: 160 13-bit values in 260 Bit (33 Byte) are compressed; 8000 samples/s result in data rate of 1650 Byte/s CELP (Code Excited Linear Prediction): analytical model LD-CELP (Low Delay CELP): G.728 LPC-10E (Linear Prediction Coder (Enhanced): military coder, analytical model, 2.4 kbit/s understandable, but low quality. End of Part Thank you for your attention! 29