Audio Coding Introduction




Audio Coding Introduction
Lecture, WS 2013/2014
Prof. Dr.-Ing. Karlheinz Brandenburg, bdg@idmt.fraunhofer.de
Prof. Dr.-Ing. Gerald Schuller, shl@idmt.fraunhofer.de
Page Nr. 1

Organizational Details - Overview
Lectures: 14 lectures read by Prof. Brandenburg and Prof. Schuller
Practice lessons:
 - Instructors: Dr. Andreas Franck, M.Sc. Javier Frutos-Bonilla
 - Periodic homework assignments, which count 30% towards the final grade
 - Small groups (2-3 people) solve the homework and deliver a single solution for the whole group
 - Homework presentation during the lessons (laptop with Octave or Matlab running)
Exam:
 - Written exam at the end of the semester, 90 minutes
Agree to this method by signing the document that is passed around.
Page Nr. 2

Organizational Details - Time and Place
 - Lectures: Monday, 03:00-04:30 pm, Room Sr K 2026
 - Practice lessons: Monday, 7:15-8:45 am, Room Sr K 2003B, odd weeks (bi-weekly)
 - Suggestion: shift to another time, for instance Thursday, 01:00-02:30 pm, Room K 2026
Page Nr. 3

Organizational Details - Timeline
Lecture / Date / Read by:
 1. Introduction / 14.10.2013 / Prof. Brandenburg
 2. Psychoacoustics / 21.10.2013 / Dipl.-Ing. Werner
 3. Basics of Multirate Signal Processing / 28.10.2013 / Prof. Schuller
 4. Filter Banks 1 / 04.11.2013 / Prof. Schuller
 5. Filter Banks 2 / 11.11.2013 / Prof. Schuller
 6. Quantization & Coding / 18.11.2013 / Prof. Brandenburg
 7. MPEG 1 / MPEG 2 BC Audio / 25.11.2013 / Prof. Brandenburg
 8. MPEG 2 / 4 AAC / 02.12.2013 / Prof. Brandenburg
 9. Prediction and Lossless Audio Coding / 09.12.2013 / Prof. Schuller
 10. Audio Coding for Communication (ULD) / 16.12.2013 / Prof. Schuller
 11. Coding of Stereophonic Signals / 06.01.2014 / Prof. Brandenburg
 12. Parametric Coding of High-Quality Audio / 13.01.2014 / Prof. Brandenburg
 13. Dolby AC3, DTS / 20.01.2014 / Prof. Schuller
 14. SAOC and USAC / 27.01.2014 / Dr. Franck
Page Nr. 4

Current Applications (1)
Digital audio broadcasting
 - Eureka 147 (EU 147, Layer 2)
 - WorldSpace (Layer 3)
 - XM Radio (HE-AAC)
ISDN transmission of audio
Digital TV
 - MPEG-1/2 Layer 2
 - Dolby AC-3 multichannel coding
 - MPEG-2 AAC
Storage of large music volumes (archives)
DVD
 - Dolby Digital
 - DTS
Page Nr. 5

Current Applications (2)
Internet and network audio
 - MPEG-1/2 Layer 3 (.mp3, all software players)
 - AAC (Apple iTunes Music Store)
 - AAC-LD (real-time video conference systems)
 - others (WMA)
Audio on portable phones
 - .mp3
 - HE-AAC (recommended by 3GPP)
Solid-state portable music players (mp3, AAC, WMA)
Page Nr. 6

Basics of High Quality Audio Coding
The goal: transparent coding of music signals
 - The source is not known in advance
 - Use information about the sink, not the source
The solution: modeling of the masking threshold of the ear
 - The quantization noise has to be kept below the masked threshold
Page Nr. 7

Psychoacoustics (Masked Threshold)
[Figure: masked threshold level L_T in dB (axis from 0 to 80 dB) versus test-tone frequency f_T (0.02 to 20 kHz), with curves for masker frequencies f_m = 0.25 kHz, 1 kHz, and 4 kHz]
Page Nr. 8

Demo: The "13 dB miracle"
 - Original signal
 - Original + white noise, SNR = 13.6 dB
 - Original + noise at threshold, SNR = 13.6 dB
 - Difference (modulated white noise)
 - Difference (noise at threshold)
Page Nr. 9
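The second demo item mixes white noise into the original at a fixed SNR. A minimal sketch of how such a mix might be produced is given here; the function names, the 440 Hz test tone, and the 48 kHz sampling rate are illustrative assumptions, not taken from the lecture. Shaping the noise to follow the masked threshold would additionally need a psychoacoustic model and is not shown.

```python
import math
import random

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, target_snr_db, seed=0):
    """Return (noisy, noise) with white Gaussian noise scaled to the target SNR."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Noise power required for the requested signal-to-noise ratio.
    target_noise_power = power(signal) / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    noise = [gain * v for v in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise

def measured_snr_db(signal, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10 * math.log10(power(signal) / power(noise))

# Hypothetical test signal: 0.1 s of a 440 Hz tone sampled at 48 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
noisy, noise = add_white_noise(signal, 13.6)
print(round(measured_snr_db(signal, noise), 1))  # 13.6
```

Because the noise is rescaled to the exact target power, the measured SNR matches the requested value regardless of the random seed.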

The Basic Paradigm of T/F Domain Audio Coding
[Block diagram: digital audio input -> filter bank -> bit or noise allocation (quantized samples) -> bitstream formatting -> encoded bitstream; in parallel, a psychoacoustic model computes the signal-to-mask ratio that steers the bit or noise allocation]
Page Nr. 10
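To make the role of the signal-to-mask ratio (SMR) concrete, here is a minimal sketch of a greedy bit allocation. It is an illustration under stated assumptions, not the allocation loop of any particular standard: each extra quantizer bit is assumed to improve the SNR of a band by about 6.02 dB, and the SMR values per band are made up.

```python
DB_PER_BIT = 6.02  # assumed SNR gain per quantizer bit (uniform quantizer rule of thumb)

def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedily give bits to the band whose noise exceeds the masked threshold most."""
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Noise-to-mask ratio per band; a positive value means audible noise.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # noise is at or below the masked threshold in every band
        bits[worst] += 1
    return bits

# Hypothetical SMR values (dB) for four bands.
print(allocate_bits([20.0, 12.0, 3.0, -5.0], bit_budget=10))  # [4, 2, 1, 0]
```

The band with a negative SMR receives no bits at all, since its signal already lies below the masked threshold; the loop also stops early once every band is masked, leaving part of the budget unused.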

Differences between Audio and Speech Coding (1)
Generic audio coding is similar to speech coding except:
 - Larger bandwidth: speech coders usually use up to 7 kHz bandwidth
 - Fewer audible artifacts
 - Use of a psycho-acoustic model for irrelevancy removal
Page Nr. 11

Differences between Audio and Speech Coding (2)
Different requirements for bitrate:
 - speech aims for as small as possible (e.g. GSM: <= 13 kbps)
 - audio demands more for quality (>= 64 kbps, decreasing)
Not specialized to a speech model
Page Nr. 12

History of Audio Coding
 - 1979: the Critical Band Coder
 - 1982: classic ATC for Music
 - 1985: MSC
 - 1987: OCF
 - 1987: MASCAM
 - 1987: PXFM
 - 1990: ASPEC, MUSICAM
 - 1992: MPEG 1
 - 1996: ePAC
 - 1997: MPEG 2 AAC
 - 1999: MPEG 4 AAC
 - 2002: HE AAC
 - 2012: USAC
 - MPEG-H: coding for 3D audio
Page Nr. 13

The timeline for near-CD quality
 - 1990, 256 kbit/s: ASPEC, MUSICAM (would fail today's listening tests)
 - 1992, 192 kbit/s: MPEG-1 Layer-3
 - 1994, 128 kbit/s: MPEG-1 Layer-3 (".mp3") including combined joint stereo coding (bad quality for some signals)
 - 1997, 96 kbit/s: MPEG-2 Advanced Audio Coding (better than MP3 at 128 kbit/s, not fully transparent)
 - 2000, 64 kbit/s: AAC-based MPEG-4
 - 2003, 48 kbit/s: MPEG-4 HE-AAC (AAC+ in 2000), e.g. used for XM Radio
Page Nr. 14

What quality can be reached today?
Define the quality to reach for first:
 - High end: don't call it transparent (hard to prove)
   - best listening conditions
   - listeners need years to be trained
   - large number of samples for statistics
 - Near-CD quality: defined as "good enough", no formal definition
   - much more important for practical purposes
   - example: mp3 at 128 kbit/s for stereo
Page Nr. 15

Demo: Can you hear it (Version 4, 2000)?
Each "?" corresponds to either O (Original, 1536 kbit/s for two channels) or C (Coded, 48 kbit/s for two channels) (HE-AAC, demo provided by Coding Technologies)
 - Trumpet solo: O ? ? ?
 - Speech: O ? ? ?
 - Abba: O ? ? ?
Page Nr. 16
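The 1536 kbit/s figure for the original is consistent with two channels of 16-bit PCM sampled at 48 kHz; the slide does not state these parameters, so they are an assumption here. A short sketch of the arithmetic and the resulting compression ratio:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# Assumed source format: 48 kHz, 16 bit, stereo.
original_kbps = pcm_bitrate_kbps(48000, 16, 2)
coded_kbps = 48  # HE-AAC bitrate quoted on the slide

print(original_kbps)               # 1536.0
print(original_kbps / coded_kbps)  # 32.0
```

Under these assumptions the coded version uses 1/32 of the original bitrate.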

Did you hear it?
O (Original, 1536 kbit/s for two channels) or C (Coded, 48 kbit/s for two channels) (HE-AAC, demo provided by Coding Technologies)
 - Trumpet solo: (O) _
 - Speech: (O) _
 - Abba: (O) _
Page Nr. 17

Extra Material Page Nr. 18


History of Audio Coding
 - 1979: the Critical Band Coder
 - 1982: classic ATC for Music
 - 1985: MSC
 - 1987: OCF
 - 1990: MUSICAM
 - 1990: ASPEC
 - 1992: MPEG 1
 - 1996: PAC
 - 1997: MPEG 2 AAC
 - 1999: MPEG 4 AAC
 - 2002: HE AAC
 - 2012: USAC
 - MPEG-H: coding for 3D audio
Page Nr. 21

The Critical Band Coder
 - M.A. Krasner, MIT Lincoln Laboratories, 1979
 - First coder to use a psycho-acoustic model
 - Sampling rate of 30 kHz
 - Analysis/synthesis filter: QMF filter tree of depth 2 to 7
 - Filter bandwidths ranging from 117 Hz to 3.75 kHz
 - No calculation of the threshold in quiet, just looked at worst-case scenarios
 - Quantization with block companding, fixed bit distribution from psycho-acoustic criteria
 - Bitrate of 123.8 kbps
Page Nr. 22
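The quoted bandwidth range follows from the tree depth, assuming each QMF stage splits its input band into two equal halves (the function name is illustrative):

```python
def qmf_bandwidth_hz(sample_rate_hz, depth):
    """Bandwidth of a leaf band after `depth` two-way splits of the 0..fs/2 range."""
    return (sample_rate_hz / 2) / (2 ** depth)

for depth in (2, 7):
    print(depth, qmf_bandwidth_hz(30000, depth))
# 2 3750.0
# 7 117.1875
```

At a 30 kHz sampling rate, depth 2 gives 3.75 kHz and depth 7 gives about 117 Hz, matching the two ends of the range on the slide.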

Classic ATC for Music
 - Universität Erlangen-Nürnberg, 1982
 - First real-time music coder
 - Sampling rate between 30 and 32 kHz
 - Does not use a psycho-acoustic model: bad quality for some music pieces
 - Block length of 128 samples (about 4 ms)
 - Bitrate: 3 bits/sample (about 100 kbps)
Page Nr. 23
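The "about 4 ms" and "about 100 kbps" figures can be checked against the two ends of the stated sampling-rate range. A small sketch, assuming a single channel (the slide does not state the channel count):

```python
def block_duration_ms(block_length, sample_rate_hz):
    """Duration of one transform block in milliseconds."""
    return 1000.0 * block_length / sample_rate_hz

def bitrate_kbps(bits_per_sample, sample_rate_hz):
    """Bitrate of one channel in kbit/s."""
    return bits_per_sample * sample_rate_hz / 1000.0

for fs in (30000, 32000):
    print(fs, round(block_duration_ms(128, fs), 2), bitrate_kbps(3, fs))
# 30000 4.27 90.0
# 32000 4.0 96.0
```

So the rounded slide values correspond to roughly 4.0 to 4.3 ms per block and 90 to 96 kbit/s.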

MSC (Multiple Adaptive Spectral Audio Coding)
 - Krahe and others, Universität Duisburg, 1985
 - First coder to use both a psycho-acoustic model and transform coding
 - Analysis/synthesis: FFT with conversion to amplitude and phase
   - window length of 1024 samples
   - window ends sine-tapered with an overlap of 64 samples
 - Threshold estimation uses only in-band masking
 - Quantization uses block companding with 2 bits per sample
Page Nr. 24

OCF (Optimum Coding in Frequency Domain)
 - Brandenburg, Universität Erlangen, 1987, 1988
 - MDCT filter bank with window length of 1024 or 512
 - Explicit calculation of the masking threshold with a simple model
   - Calculation per critical band
   - No tonality criteria used
   - Maximum calculation instead of convolution
 - Non-uniform quantization (quantization noise dependent on amplitude)
 - Huffman coding of pairs of spectral values
Page Nr. 25

ASPEC - Adaptive Spectral Perceptual Entropy Coding (1)
 - Uni Erlangen, FhG, AT&T Bell Labs, Deutsche Thomson-Brandt, CNET, 1990
 - Analysis/synthesis: MDCT with switchable block lengths
 - Use of 2 psycho-acoustic models
   - Simple: like OCF
   - Better: like PXFM + 1/3 frequency grouping resolution + local tonality criteria (like Hybrid)
 - Quantization/coding: like OCF
   - Choice of Huffman code books
   - Further division of the spectrum
 - Control of window length (switching the number of bands)
Page Nr. 26

MUSICAM - Masking-pattern Universal Subband Integrated Coding and Multiplexing (1)
 - IRT, CCETT, Philips, Matsushita, 1990
 - Subband coding, that is, good time resolution but bad frequency resolution
 - First version used a QMF tree as filter bank
 - Newest version uses a 32-channel polyphase filter bank
 - Parallel FFT for fine calculation of masking
 - Tonality criteria by local comparison of the spectral values
 - Block companding of the subband signal
Page Nr. 27

MPEG-1 (1)
Layer I
 - Window length: 384 samples (8 ms)
 - Frequency resolution: 32 subbands
 - Quantization: block companding (12 samples)
Layer II
 - Window length: 1152 samples (24 ms)
 - Frequency resolution: 32 subbands
 - Quantization: block companding (12 samples)
 - Use of scalefactor select information (SFSI)
Page Nr. 28

MPEG-1 (2)
Layer III
 - Window length: 1152 samples (24 ms)
 - Frequency resolution: 576/192 subbands
 - Quantization: non-uniform with Huffman coding
 - Use of scalefactor select information
Page Nr. 29
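The millisecond values on the MPEG-1 slides imply a 48 kHz sampling rate (384 / 48000 = 8 ms). Under that assumption, and assuming uniformly spaced bands, the window durations and the nominal spacing of the subbands can be sketched as follows; the function names are illustrative:

```python
SAMPLE_RATE_HZ = 48000  # assumption: implied by the 8 ms / 24 ms figures on the slides

def window_duration_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of a window of `samples` samples in milliseconds."""
    return 1000.0 * samples / sample_rate_hz

def band_spacing_hz(num_bands, sample_rate_hz=SAMPLE_RATE_HZ):
    """Nominal width of each of `num_bands` uniform bands covering 0..fs/2."""
    return (sample_rate_hz / 2) / num_bands

print(window_duration_ms(384))          # 8.0   (Layer I)
print(window_duration_ms(1152))         # 24.0  (Layer II and III)
print(band_spacing_hz(32))              # 750.0 (Layer I and II)
print(round(band_spacing_hz(576), 2))   # 41.67 (Layer III, long blocks)
print(band_spacing_hz(192))             # 125.0 (Layer III, short blocks)
```

At 44.1 kHz the same window lengths would last slightly longer (about 8.7 ms and 26.1 ms).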

MPEG (1)
 - December 1988: first meeting of the Audio Expert Group
 - July 1989: Call for Proposals (14 proposals received)
 - Fall 1989: clustering of similar proposals
 - July 1990: listening tests of coders
 - December 1990: adoption of the Committee Draft
Page Nr. 30

MPEG (2)
 - The results of the Stockholm tests showed that 2 proposals were best: ASPEC and MUSICAM
 - Listening tests showed that ASPEC is better, especially at low bitrates
 - In the comparison of complexity parameters, MUSICAM is better
 - RESULT: collaboration between ASPEC and MUSICAM in a layered solution (hence Layer 1, Layer 2, and Layer 3)
Page Nr. 31

PAC
 - Resulted from split of AT&T and Lucent Technologies
 - Branched off from MPEG-AAC, proprietary instead of standardized technology
 - Used in American satellite broadcast systems (XM, Sirius)
Page Nr. 32

MPEG 2 AAC (1)
 - First named MPEG-2 NBC (non-backwards compatible), later named AAC (Advanced Audio Coding)
 - MPEG-2 AAC (ISO/IEC 13818-7) offers very high quality compressed audio
 - Allows 1 to 48 channels and sampling rates from 8 to 96 kHz, with multi-channel, multi-lingual, and multi-program possibilities
 - AAC works at bitrates from 8 kbit/s for mono speech signals up to 160 kbit/s per channel for very high quality; allows tandem coding
Page Nr. 33

MPEG 2 AAC (2)
 - 3 AAC profiles with varying levels of complexity and scalability
 - The joint stereo mode is more flexible than in MP3: it is switchable for individual scalefactor bands, whereas in MP3 it was only switchable for the whole spectrum
Page Nr. 34

MPEG 2 AAC Basic Features
 - High frequency resolution filter-bank-based coder (1024-subband MDCT with 50% overlap)
 - 1:8 block switching (1024/128-subband MDCT)
 - Non-uniform quantizer
 - Noise shaping in half critical bands (scalefactor bands)
 - Huffman coding of scalefactors and spectral coefficients
Page Nr. 35
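The "MDCT with 50% overlap" can be illustrated with a small direct-form sketch. It is not the AAC filter bank itself: it uses a tiny block size (N = 8 instead of 1024), a plain sine window, and an O(N^2) evaluation, all chosen for readability. It shows that overlapping windowed blocks, each yielding only N coefficients for 2N input samples, still reconstruct the signal through time-domain aliasing cancellation.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * window[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased output samples."""
    N = len(coeffs)
    return [window[n] * (2.0 / N) *
            sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(signal, N):
    """Transform blocks of 2N samples with hop size N (50% overlap), then overlap-add."""
    window = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        frame = imdct(mdct(signal[start:start + 2 * N], window), window)
        for n in range(2 * N):
            out[start + n] += frame[n]
    return out

N = 8
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(4 * N)]
out = analysis_synthesis(signal, N)
# Samples covered by two overlapping blocks are reconstructed; the first and
# last N samples are covered by only one block and still contain aliasing.
max_err = max(abs(out[i] - signal[i]) for i in range(N, 3 * N))
print(max_err < 1e-9)  # True
```

Since each hop of N new samples produces exactly N coefficients, the transform is critically sampled despite the 50% overlap.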

HE AAC
 - Combination of the MPEG-4 AAC Low Complexity (LC) object and the MPEG-4 Spectral Band Replication (SBR) object
 - SBR: parametric coding of the high-frequency envelope with a small amount of control data
 - Parametric stereo and multi-channel coding
 - Backwards compatible to AAC
 - 5.1 surround sound at 128 kbps
 - Good quality stereo at 32 kbps or above
Page Nr. 36