Speech, Audio, Image, and Video Coding

Douglas L. Jones
ECE 497, Spring 2000
4/11/00 Source Coding D.L. Jones

Outline
Speech Coding
Speech Recognition
Audio Coding
Image Coding
Video Coding

Goals for this Lecture
Learn basic principles underlying speech, audio, image, and video coding
Understand why coding is important
Understand some roles of media processing in embedded system design

Speech Coding
Speech coding is important for:
  reduced data rate in digital communications, such as cell phones
  reduced memory and storage requirements for digital answering machines, speech databases, etc.

Speech Processing Parameters
Typical sampling rate: 8 kHz
Typical quantization: 8-bit mu-law or A-law
Base rate: 64,000 bits per second (64 kbps)
Compressed rates range from 2.4 to 32 kbps
Modern compression methods are based on ADPCM or LPC
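The base rate and the effect of mu-law companding can be checked with a short sketch. It uses the continuous mu-law formula with mu = 255 and a plain uniform 8-bit quantizer; this is an illustrative assumption, not the bit-exact segmented encoding used by telephone codecs, and all function names are hypothetical.

```python
import math

MU = 255  # assumed companding constant (common choice for mu-law)

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_8bit(y):
    """Uniformly quantize a value in [-1, 1] to an integer code 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))

def dequantize_8bit(code):
    """Return the midpoint of the quantization cell for a code 0..255."""
    return (code + 0.5) / 256 * 2.0 - 1.0

def mu_law_round_trip(x):
    """Compress, quantize to 8 bits, dequantize, and expand one sample."""
    return mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(x))))

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
base_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 8000 * 8 = 64000 bits/s
```

Companding spends more of the 256 codes on small amplitudes, so quiet samples come back with a much smaller error than the 2/256 step of a uniform quantizer would allow.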

ADPCM
Adaptive Differential Pulse Code Modulation (ADPCM) is an old compression standard that remains the best method for half-rate (32 kbps) compression
ADPCM is a waveform compression method that tries to preserve the actual speech signal waveform as much as possible

ADPCM codes the difference signal; that is, d(n) = x(n) - x(n-1)
Since the speech waveform is primarily a low-frequency signal, the difference is usually small, so it requires fewer bits to represent
Adaptive DPCM adjusts the quantization step size to track the amplitude of the difference

A short adaptive prediction filter also reduces the size of the difference
ADPCM at half rate (32 kbps) sounds almost indistinguishable from 64 kbps speech
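A minimal sketch of the idea, under stated assumptions: the predictor is simply the previous reconstructed sample (a fixed first-order predictor rather than the short adaptive filter mentioned above), the difference is quantized to 4 bits, and the step-size rule (grow by 1.5 on large codes, shrink by 0.9 on small ones) is a hypothetical choice for illustration, not the rule of any ADPCM standard. At 8000 samples per second, 4 bits per sample gives the 32 kbps half rate.

```python
def _adapt(predicted, step, code, levels):
    """State update shared by encoder and decoder so both stay in lockstep."""
    predicted += code * step
    # Grow the step after a large code, shrink it after a small one.
    step *= 1.5 if abs(code) >= levels // 2 else 0.9
    step = min(max(step, 1.0), 2048.0)
    return predicted, step

def adpcm_encode(samples, bits=4):
    """Quantize the difference between each sample and its prediction."""
    levels = 1 << (bits - 1)          # 4 bits -> codes in -8..7
    predicted, step = 0.0, 16.0
    codes = []
    for x in samples:
        code = round((x - predicted) / step)
        code = max(-levels, min(levels - 1, code))
        codes.append(code)
        predicted, step = _adapt(predicted, step, code, levels)
    return codes

def adpcm_decode(codes, bits=4):
    """Rebuild the waveform by accumulating the dequantized differences."""
    levels = 1 << (bits - 1)
    predicted, step = 0.0, 16.0
    output = []
    for code in codes:
        predicted, step = _adapt(predicted, step, code, levels)
        output.append(predicted)
    return output
```

The encoder predicts from its own reconstructed output rather than from the original samples, so the decoder, which only sees the codes, follows exactly the same state and quantization errors do not accumulate.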

Linear Prediction Coding
Linear Prediction Coding (LPC) is a fundamentally different, model-based approach to speech coding
Based on an acoustic tube model of human speech production
Models short speech segments as either white noise (unvoiced) or an impulse train (voiced) input to an all-pole (IIR) filter

LPC-10
Input amplitude, voiced/unvoiced decision, pitch period, and 10th-order filter coefficients are computed for 20-30 ms blocks
Instead of transmitting speech, send only the filter coefficients and other parameters!
Rerun the filter at the receiving end to reconstruct the speech

Produces artificial-sounding but understandable speech reconstructions at rates as low as 2400 bits/sec
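The filter coefficients for a block are commonly obtained from the block's autocorrelation by the Levinson-Durbin recursion. The sketch below is one textbook way to do it, not necessarily the procedure LPC-10 itself specifies. It assumes the predictor convention x_hat(n) = a[1] x(n-1) + ... + a[p] x(n-p), and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of frame[n] * frame[n - lag]."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (coefficients a[1..order], remaining prediction-error energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:      # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error
```

For a 10th-order model one would call levinson_durbin(autocorrelation(frame, 10), 10) on each 20-30 ms block, where frame is a list of samples.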

Enhanced LPC Methods
LPC-10 achieves excellent compression, but its quality is insufficient for most telephony applications
Enhanced LPC methods have been developed with higher rates and better performance
LPC-based approaches dominate speech coding for rates at and below 16 kbps

RELP
Residual Excited Linear Prediction (RELP) retains and sends the residual (prediction error) as well
Sending the residual back through the prediction model reconstructs the original waveform (in the absence of quantization)
The rate remains fairly high, since the residual requires many bits
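The reconstruction property can be shown directly: the residual is what remains after subtracting the prediction, and feeding it through the matching all-pole synthesis filter returns the input. This sketch assumes unquantized values and the predictor convention x_hat(n) = a[1] x(n-1) + ... + a[p] x(n-p); the function names are hypothetical.

```python
def prediction_residual(samples, coeffs):
    """Analysis filter: e(n) = x(n) - sum over k of a[k] * x(n-k)."""
    residual = []
    for n, x in enumerate(samples):
        predicted = sum(
            a * samples[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(x - predicted)
    return residual

def synthesize(residual, coeffs):
    """Synthesis filter: x(n) = e(n) + sum over k of a[k] * x(n-k)."""
    output = []
    for n, e in enumerate(residual):
        predicted = sum(
            a * output[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        output.append(e + predicted)
    return output
```

With samples [4.0, 6.0, 5.0, 1.0] and coefficients [0.5, 0.25], the residual is [4.0, 4.0, 1.0, -3.0], and synthesize returns the original four samples.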

CELP
Code Excited Linear Prediction (CELP) selects an excitation sequence from a codebook of possible choices
Transmit a code indicating the selection, rather than the residual
Greatly reduced rate, with only a modest loss in performance

There are many flavors of CELP; the better lower-rate methods are based on this concept
Cell phones tend to use rates from about 4.8 to 9.6 kbps
Quality is noticeably inferior to the telephone, but deemed acceptable
Allows 3-6 times as many users in a cell!
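The codebook selection can be sketched as an exhaustive search: each candidate excitation is passed through the synthesis filter, scaled by its best gain, and compared with the target block. This is a simplified illustration under stated assumptions; practical CELP coders add perceptual weighting, adaptive (pitch) codebooks, and fast search methods, none of which appear here. All names are hypothetical.

```python
def synthesis_filter(excitation, coeffs):
    """All-pole filter: y(n) = excitation(n) + sum over k of a[k] * y(n-k)."""
    output = []
    for n, e in enumerate(excitation):
        feedback = sum(
            a * output[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        output.append(e + feedback)
    return output

def search_codebook(target, codebook, coeffs):
    """Return (index, gain, squared_error) of the best-matching excitation."""
    best = None
    for index, code_vector in enumerate(codebook):
        synthesized = synthesis_filter(code_vector, coeffs)
        energy = sum(y * y for y in synthesized)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        correlation = sum(t * y for t, y in zip(target, synthesized))
        gain = correlation / energy  # least-squares optimal gain
        error = sum((t - gain * y) ** 2 for t, y in zip(target, synthesized))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Only the index and the gain are transmitted for each block, which is why the rate is far lower than sending the residual itself.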

Hardware note...
Speech coding/decoding is the primary reason for the DSP µP in digital cell phones!
A DSP µP is nearly ideal for speech coding algorithms (an ASIC wouldn't be better)
Since it's there anyway, the DSP µP is also used for many other functions

Speech Recognition
Speech recognition is expected to become a very important component of many future embedded systems
Convenient, natural user interface for:
  very small embedded systems (e.g., wristwatch cell phone, Palm-Pilot X)
  non-critical systems (e.g., car radio, windshield wipers)

Speech Recognition Methods
Modern speech recognition is based on short-time spectral analysis
Spectral estimates are usually constructed from linear prediction followed by further processing
Hidden Markov Models (HMMs) perform a statistical comparison with a database of words and language models

System Requirements
Memory and computational requirements:
  Small vocabulary, isolated word recognition: a few MIPS and KBs
  Large vocabulary, continuous speech: 100s of MIPS, 100s of MBs

Audio Coding
Quality expectations are considerably higher than with speech
16-bit, 44.1 kHz stereo is the CD standard
Modern audio coding methods (e.g., MP3) are based on perceptual coding tricks
Exploit limitations of human hearing to reduce the rate while minimizing audible artifacts

Split the signal into different frequency bands according to the sensitivities of human hearing
Exploit masking to remove data from bands made inaudible by loud neighbors
Shape the quantization noise to lie in masked regions
Obtain near-CD quality at 128-256 kbps
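The rates quoted above imply the following compression ratios relative to uncompressed CD audio. This is a quick arithmetic check and counts audio payload bits only; the variable and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2  # stereo

# Uncompressed CD audio rate in bits per second.
cd_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # 1,411,200

def compression_ratio(coded_rate_bps):
    """Ratio of the CD rate to a given coded rate."""
    return cd_rate_bps / coded_rate_bps

ratio_at_128k = compression_ratio(128_000)  # about 11.0
ratio_at_256k = compression_ratio(256_000)  # about 5.5
```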

Image Coding
Many emerging embedded system applications:
  Digital cameras
  Security (e.g., fingerprint ID)
  Medical record storage
Image usually acquired with a CCD imaging sensor

Requirements
Typical image: ~512x512 pixels, 3 colors each at 8 bits
Or binary black-and-white
Two types of compression:
  Lossless: maximum compression ratios of 2-3
  Lossy: high quality with compression ratios of 10-30
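For the typical image above, the storage figures work out as follows. This is a rough arithmetic sketch; the helper names are hypothetical and container overhead is ignored.

```python
WIDTH, HEIGHT = 512, 512
CHANNELS = 3
BITS_PER_CHANNEL = 8

raw_bits = WIDTH * HEIGHT * CHANNELS * BITS_PER_CHANNEL
raw_bytes = raw_bits // 8  # 786,432 bytes, i.e. 768 KB uncompressed

def compressed_bytes(ratio):
    """Approximate size in bytes after compressing by the given ratio."""
    return raw_bytes / ratio

def bits_per_pixel(ratio):
    """Average coded bits per pixel for a 24-bit color image."""
    return CHANNELS * BITS_PER_CHANNEL / ratio
```

A lossy ratio of 10-30 therefore brings the image down to roughly 26-79 KB, or about 0.8-2.4 bits per pixel.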

Image Compression Standards
Binary images: JBIG/FAX standards
  Primarily based on run-length coding (i.e., number of black or white pixels in succession)
8-bit images: JPEG standard
  DCT-based
Emerging standards are wavelet-based (EZW, SPIHT, JPEG-2000)
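Run-length coding of a binary scan line can be sketched in a few lines. This shows only the run-counting step; the fax standards additionally entropy-code the run lengths, which is omitted here. The function names are hypothetical.

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse a scan line into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into a scan line."""
    return [value for value, count in runs for _ in range(count)]
```

For example, the line [0, 0, 0, 1, 1, 0] encodes to [(0, 3), (1, 2), (0, 1)]. Long runs of white or black, which dominate typical documents, shrink to a single pair each.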

Principles of JPEG
Image is segmented into 8x8 blocks of pixels
A 2-D Discrete Cosine Transform (DCT) is computed for each block
Most of the resulting frequency components are typically very small and can be coarsely quantized or discarded
Quantized data is entropy-coded
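The transform and quantization steps can be sketched for a single block. This is a direct, slow implementation of the orthonormal 8x8 DCT-II with a single uniform step size; JPEG itself uses a per-frequency quantization table, a level shift, and fast transforms, all omitted here as simplifying assumptions. The names are hypothetical.

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (
                        block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * N))
                    )
            coeffs[u][v] = (2.0 / N) * scale(u) * scale(v) * total
    return coeffs

def quantize(coeffs, step):
    """Uniformly quantize every coefficient; small ones become zero."""
    return [[round(c / step) for c in row] for row in coeffs]
```

A perfectly flat block puts all of its energy into the single DC coefficient, so after quantization only one of the 64 values is nonzero. Smooth image regions behave similarly, which is what the entropy coder exploits.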

JPEG Characteristics
At a compressed rate of about 1 bit per pixel (bpp), quality loss is usually small
Below about 0.5 bpp, blocking artifacts begin to appear; much below this, quality is usually unacceptable

Emerging Methods
New methods based on wavelets are emerging
Frequency decomposition by successive subband filtering
Small coefficients are discarded
Artifacts are generally less objectionable

Exploitation of tree structure and dependencies yields further compression
The JPEG-2000 standard will be based on these methods
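Successive subband filtering and the discarding of small coefficients can be illustrated in one dimension with the Haar wavelet, the simplest possible filter pair. This is an assumption made for brevity; the methods named above use longer filters, work in two dimensions, and add tree-structured coding. The input length is assumed to be divisible by 2 for each level, and the names are hypothetical.

```python
def haar_step(signal):
    """Split a signal into half-length averages (low band) and differences (high band)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Repeatedly split the low band; return (final_approx, details per level)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.extend([a + d, a - d])
        signal = rebuilt
    return signal

def discard_small(details, threshold):
    """Zero every detail coefficient whose magnitude is below the threshold."""
    return [[d if abs(d) >= threshold else 0.0 for d in band] for band in details]
```

Decomposing [4, 6, 10, 12] over two levels gives the approximation [8.0] and details [[-1.0, -1.0], [-3.0]]. Dropping the details smaller than 2 reconstructs [5.0, 5.0, 11.0, 11.0], a smoothed version of the input rather than a blocky one.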

Video Coding
Embedded system examples:
  HDTV
  Satellite TV
  Set-top boxes
  Security systems
  Multimedia devices

Motion-Based Coding Methods
Modern video coding methods exploit frame-to-frame similarities to further compress video
Similar to JPEG, except that motion-compensated difference frames are coded with the DCT
Motion vectors encode the change in location of blocks
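A motion vector for one block can be found by exhaustive block matching: try every offset in a small window of the previous frame and keep the one with the smallest sum of absolute differences (SAD). This is a minimal sketch under stated assumptions; real encoders use faster search strategies and sub-pixel accuracy, and the function name and argument layout are hypothetical. Frames are lists of rows of pixel values.

```python
def find_motion_vector(current, reference, top, left, size, search_range):
    """Return ((dy, dx), sad) for the block at (top, left) in the current frame.

    (dy, dx) is the offset to the best-matching block in the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best_vector, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r0, c0 = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue
            sad = sum(
                abs(current[top + i][left + j] - reference[r0 + i][c0 + j])
                for i in range(size)
                for j in range(size)
            )
            if best_sad is None or sad < best_sad:
                best_vector, best_sad = (dy, dx), sad
    return best_vector, best_sad
```

The encoder then sends the vector plus the DCT-coded difference between the block and its match; when the match is good, that difference is close to zero and costs very few bits.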

Video Coding Standards
MPEG-2 and MPEG-4 are the leading standards for high (television) quality video coding
H.263 is the primary standard for low-rate video coding (video phones)
Compression ratios of 30-50 with good quality are usually obtained

Summary
Source coding is essential to reduce the memory requirements and bandwidth of multimedia data
Complex DSP algorithms obtain great data reductions with little loss in quality
Coding algorithms have characteristics common to other DSP computations
Source coding is likely to play an increasingly important role in many embedded systems