Speech, Audio, Image, and Video Coding

1 Speech, Audio, Image, and Video Coding Douglas L. Jones ECE 497 Spring /11/00 Source Coding D.L. Jones 1

2 Outline: Speech Coding; Speech Recognition; Audio Coding; Image Coding; Video Coding

3 Goals for this Lecture: Learn basic principles underlying speech, audio, image, and video coding. Understand why coding is important. Understand some roles of media processing in embedded system design.

4 Speech Coding: Speech coding is important for a reduced data rate in digital communications, such as cell-phones, and for reduced memory and storage requirements in digital answering machines, speech databases, etc.

5 Speech Processing Parameters: Typical sampling rate is 8 kHz. Typical quantization is 8-bit mu-law or A-law. Base rate is 64k bits per second (64 kbps). Compressed rates range from kbps. Modern compression methods are based on ADPCM or LPC.
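The base rate and the benefit of mu-law quantization can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the continuous mu-law formula with mu = 255 and a plain uniform 8-bit quantizer applied after companding; the function names are illustrative, and real telephony codecs use a segmented approximation of this curve rather than the formula directly.

```python
import math

MU = 255  # assumed companding constant for 8-bit mu-law

def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_8bit(v):
    """Uniformly quantize a value in [-1, 1] to one of 256 integer codes."""
    return min(255, int((v + 1.0) / 2.0 * 256))

def dequantize_8bit(code):
    """Return the midpoint of the quantization cell for an 8-bit code."""
    return (code + 0.5) / 256 * 2.0 - 1.0

# Base rate: 8000 samples per second at 8 bits per sample.
sample_rate_hz = 8000
bits_per_sample = 8
base_rate_bps = sample_rate_hz * bits_per_sample

# Compare reconstruction of a quiet sample with and without companding.
x = 0.01
x_mulaw = mulaw_expand(dequantize_8bit(quantize_8bit(mulaw_compress(x))))
x_uniform = dequantize_8bit(quantize_8bit(x))

print(base_rate_bps, abs(x_mulaw - x), abs(x_uniform - x))
```

Companding spends more of the 256 codes on small amplitudes, so the quiet sample is reconstructed more accurately than with uniform quantization at the same bit depth.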

6 ADPCM: Adaptive Differential Pulse Code Modulation (ADPCM) is an old compression standard that remains the best method for half-rate (32 kbps) compression. ADPCM is a waveform compression method that tries to preserve the actual speech signal waveform as much as possible.

7 ADPCM codes the difference signal; that is, d(n) = x(n) - x(n-1). Since the speech waveform is primarily a low-frequency signal, the difference is usually small, so it requires fewer bits to represent. Adaptive DPCM adjusts the quantization step size to track the difference amplitude.

8 A short adaptive prediction filter also reduces the size of the difference. ADPCM at half rate (32 kbps) sounds almost indistinguishable from 64 kbps speech.
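The difference coding and step-size adaptation described above can be sketched as follows. This is a toy illustration, not a standardized codec: it assumes a 4-bit code per sample, a first-order predictor (the previous reconstructed sample), and made-up adaptation factors of 1.6 and 0.9. All names and constants are hypothetical.

```python
import math

def _adapt(step, code):
    """Grow the step after large codes, shrink it after small ones."""
    factor = 1.6 if abs(code) >= 4 else 0.9   # assumed adaptation factors
    return min(max(step * factor, 1e-4), 1.0)

def adpcm_encode(samples, step=0.02):
    """Code each sample as a 4-bit quantized difference from the prediction."""
    codes, pred = [], 0.0
    for x in samples:
        diff = x - pred
        code = max(-8, min(7, round(diff / step)))  # 16 levels = 4 bits
        codes.append(code)
        pred += code * step        # track what the decoder will reconstruct
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes, step=0.02):
    """Mirror the encoder: accumulate differences with the same step rule."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code)
    return out

# 200 Hz tone sampled at 8 kHz, standing in for a low-frequency speech signal.
signal = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
codes = adpcm_encode(signal)
decoded = adpcm_decode(codes)
max_err = max(abs(a - b) for a, b in zip(signal, decoded))
bit_rate_bps = 4 * 8000   # 4 bits per sample at 8 kHz
print(bit_rate_bps, max_err)
```

Because the encoder predicts from its own reconstructed output, the decoder can repeat the same step-size updates from the codes alone, and no side information about the step needs to be sent.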

9 Linear Prediction Coding: Linear Prediction Coding (LPC) is a fundamentally different, model-based approach to speech coding. It is based on an acoustic tube model of human speech production. It models short speech segments as either white noise (unvoiced) or an impulse train (voiced) input to an all-pole (IIR) filter.

10 LPC-10: Input amplitude, voiced/unvoiced decision, pitch period, and 10th-order filter coefficients are computed for ms blocks. Instead of transmitting speech, send only the filter coefficients and other parameters! Rerun the filter at the receive end to reconstruct speech.

11 Produces artificial-sounding but understandable speech reconstructions at rates as low as 2400 bits/sec.
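The analysis and synthesis steps of the LPC model can be sketched as below. This is a minimal illustration, assuming the autocorrelation method with the Levinson-Durbin recursion for the filter coefficients (one common choice; the slides do not name a method). The second-order test filter, the segment length, and the pitch period are hypothetical values chosen only to keep the example small.

```python
import random

def autocorrelation(x, max_lag):
    """r[k] = sum over n of x(n) * x(n-k), for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return predictor coefficients a[0..order-1], where x(n) is predicted
    as sum_k a[k] * x(n-1-k), together with the residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def synthesize(a, excitation):
    """All-pole filter: y(n) = e(n) + sum_k a[k] * y(n-1-k)."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n)))
        out.append(e + past)
    return out

random.seed(0)
true_a = [1.3, -0.6]                        # hypothetical stable "vocal tract"
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
signal = synthesize(true_a, noise)          # unvoiced-style: white noise input

# Analysis: only these few numbers would be transmitted, not the waveform.
est_a, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)

# Synthesis at the receiver, voiced-style: impulse train input.
pitch_period = 64                           # samples; hypothetical
impulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(256)]
voiced = synthesize(est_a, impulses)
print(est_a)
```

The estimated coefficients land close to the filter that generated the data, which is the sense in which the receiver can "rerun the filter" from a handful of parameters.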

12 Enhanced LPC Methods: LPC-10 achieves excellent compression, but insufficient quality for most telephony applications. Enhanced LPC methods have been developed with higher rates and performance. LPC-based approaches dominate speech coding for rates at and below 16 kbps.

13 RELP: Residual Excited Linear Prediction (RELP) retains and sends the residual (prediction error) as well. Sending the residual back through the prediction model reconstructs the original waveform (in the absence of quantization). The rate remains fairly high, since the residual requires many bits.
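The reconstruction claim can be verified numerically. The sketch below assumes a fixed, hypothetical second-order predictor and a synthetic test signal; it computes the residual with the analysis (FIR) filter, then runs it through the synthesis (all-pole) filter, first unquantized and then with a coarse uniform quantizer.

```python
import math

def prediction_residual(x, a):
    """Analysis filter: e(n) = x(n) - sum_k a[k] * x(n-1-k)."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(len(a), n)))
            for n in range(len(x))]

def reconstruct(residual, a):
    """Synthesis filter: x(n) = e(n) + sum_k a[k] * x(n-1-k)."""
    out = []
    for n, e in enumerate(residual):
        out.append(e + sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n))))
    return out

a = [1.3, -0.6]   # hypothetical predictor coefficients
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(200)]

residual = prediction_residual(x, a)
exact = reconstruct(residual, a)
exact_err = max(abs(u - v) for u, v in zip(x, exact))

# With a quantized residual the reconstruction is only approximate.
step = 0.05
quantized = [round(e / step) * step for e in residual]
approx = reconstruct(quantized, a)
approx_err = max(abs(u - v) for u, v in zip(x, approx))
print(exact_err, approx_err)
```

The unquantized path returns the input up to floating-point rounding, while the quantized path shows why the residual needs many bits: its quantization error is fed through the recursive filter and accumulates.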

14 CELP: Code Excited Linear Prediction (CELP) selects an excitation sequence from a codebook of possible choices. Transmit a code indicating the selection, rather than the residual. Greatly reduced rate, with only a modest loss in performance.
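The codebook selection can be illustrated with a brute-force search. This is a simplified sketch under stated assumptions: a random Gaussian codebook of 64 entries, a hypothetical second-order synthesis filter, and a plain squared-error criterion. Practical CELP coders add refinements such as perceptual weighting that are omitted here.

```python
import random

def synthesize(excitation, a):
    """All-pole filter: y(n) = e(n) + sum_k a[k] * y(n-1-k)."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n))))
    return out

def search_codebook(target, codebook, a):
    """Try every codebook entry through the filter; return the index and
    gain that minimize the squared error against the target segment."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[0]:
            best = (err, index, gain)
    return best[1], best[2]

random.seed(1)
a = [1.3, -0.6]                                   # hypothetical filter
codebook = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]

# Build a target that entry 37 can reproduce exactly, then search for it.
target = [0.8 * v for v in synthesize(codebook[37], a)]
index, gain = search_codebook(target, codebook, a)
print(index, gain)
```

With 64 entries the selection costs 6 bits per 40-sample segment plus the gain, instead of a quantized value for every residual sample.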

15 There are many flavors of CELP; the better lower-rate methods are based on this concept. Cell-phones tend to use rates from about 4.8 to 9.6 kbps. Quality is noticeably inferior to telephone, but deemed acceptable. Allows 3-6 times as many users in a cell!

16 Hardware note... Speech coding/decoding is the primary reason for the DSP µP in digital cell-phones! A DSP µP is nearly ideal for speech coding algorithms (an ASIC wouldn't be better). Since it's there anyway, the DSP µP is also used for many other functions.

17 Speech Recognition: Speech recognition is expected to become a very important component of many future embedded systems. It is a convenient, natural user interface for very small embedded systems (e.g., wristwatch cell-phone, Palm-Pilot X) and for non-critical systems (e.g., car radio, windshield wipers).

18 Speech Recognition Methods: Modern speech recognition is based on short-time spectral analysis. Spectral estimates are usually constructed from linear prediction followed by further processing. Hidden Markov Models (HMMs) perform statistical comparison with a database of words and language models.

19 System Requirements: Memory and computational requirements: small vocabulary, isolated word recognition needs a few MIPS and kBs; large vocabulary, continuous speech needs 100s of MIPS and 100s of MBs.

20 Audio Coding: Quality expectations are considerably higher than with speech. 16-bit, 44.1 kHz stereo is the CD standard. Modern audio coding methods (e.g., mp3) are based on perceptual coding tricks. They exploit limitations of human hearing to reduce rate while minimizing audible artifacts.

21 Split the signal into different frequency bands according to the sensitivities of human hearing. Exploit masking to remove data from bands made inaudible by loud neighbors. Shape quantization noise to lie in masked regions. Obtain near-CD quality at kbps.

22 Image Coding: Many emerging embedded system applications: digital cameras, security (e.g., fingerprint ID), medical record storage. The image is usually acquired with a CCD imaging sensor.

23 Requirements: Typical image ~ 512x512 pixels, 3 colors each at 8 bits, or binary black-and-white. Two types of compression. Lossless: maximum compression ratios of 2-3. Lossy: high quality with compression ratios of

24 Image Compression Standards: Binary images: JBIG/FAX standards, primarily based on run-length coding (i.e., the number of black or white pixels in succession). 8-bit images: JPEG standard, DCT-based. Emerging standards are wavelet-based (EZW, SPIHT, JPEG-2000).
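Run-length coding of a binary scan line can be sketched in a few lines. This is only the basic idea, with an assumed (value, length) pair representation chosen for readability; actual fax standards go on to assign variable-length codes to the run lengths.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse a scan line into (pixel value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]

def rle_decode(runs):
    """Expand (pixel value, run length) pairs back into a scan line."""
    return [value for value, length in runs for _ in range(length)]

# A mostly white line (0) with one short black stroke (1).
row = [0] * 10 + [1] * 3 + [0] * 5
runs = rle_encode(row)
print(runs)   # [(0, 10), (1, 3), (0, 5)]
```

Eighteen pixels become three runs here; the gain comes from the long uniform stretches that are typical of scanned text.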

25 Principles of JPEG: The image is segmented into 8x8 blocks of pixels. The 2-D Discrete Cosine Transform (DCT) of each block is computed. Most of these frequency components are typically very small and can be coarsely quantized or discarded. The quantized data is entropy-coded.
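The transform and quantization steps can be sketched on a single block. This is a simplified illustration, assuming an orthonormal 8x8 DCT and one uniform step size of 16 for every coefficient; JPEG itself uses a per-frequency quantization table and follows this with entropy coding, both omitted here. The smooth test block is made up.

```python
import math

N = 8

def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# BASIS[k][n]: orthonormal DCT-II basis function k evaluated at sample n.
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

def dct_2d(block):
    """Transform rows, then columns."""
    tmp = [[sum(BASIS[k][n] * block[r][n] for n in range(N)) for k in range(N)]
           for r in range(N)]
    return [[sum(BASIS[k][r] * tmp[r][c] for r in range(N)) for c in range(N)]
            for k in range(N)]

def idct_2d(coeffs):
    """Invert the column transform, then the row transform."""
    tmp = [[sum(BASIS[k][r] * coeffs[k][c] for k in range(N)) for c in range(N)]
           for r in range(N)]
    return [[sum(BASIS[k][n] * tmp[r][k] for k in range(N)) for n in range(N)]
            for r in range(N)]

# A smooth gradient block with zero mean (hypothetical pixel data).
block = [[5 * r + 3 * c - 28 for c in range(N)] for r in range(N)]
coeffs = dct_2d(block)

STEP = 16   # assumed uniform quantizer step
quantized = [[round(v / STEP) for v in row] for row in coeffs]
nonzero = sum(1 for row in quantized for v in row if v != 0)

restored = idct_2d([[q * STEP for q in row] for row in quantized])
max_err = max(abs(block[r][c] - restored[r][c])
              for r in range(N) for c in range(N))
print(nonzero, max_err)
```

For a smooth block like this one, only a handful of the 64 quantized coefficients are nonzero, which is what the entropy coder then exploits.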

26 JPEG Characteristics: At compression rates of 1 bit per pixel, quality loss is usually small. Below about 0.5 bpp, blocking artifacts begin to appear; much below this is usually unacceptable.

27 Emerging Methods: New methods based on wavelets are emerging. Frequency decomposition is performed by successive subband filtering. Small coefficients are discarded. Artifacts are generally less objectionable.

28 Exploitation of tree structure and dependencies yields further compression. The JPEG-2000 standard will be based on these methods.
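Successive subband filtering and coefficient discarding can be sketched in one dimension. This example assumes the Haar wavelet, the simplest possible filter pair, which is not the filter used by the standards named above; the test signal and threshold are made up. The tree-structured coding of the surviving coefficients is not shown.

```python
import math

S = math.sqrt(2.0)

def haar_step(x):
    """Split a signal into a half-length lowpass and highpass band."""
    approx = [(x[2 * i] + x[2 * i + 1]) / S for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / S for i in range(len(x) // 2)]
    return approx, detail

def haar_decompose(x, levels):
    """Repeatedly split the lowpass band; return it plus all detail bands."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def haar_reconstruct(approx, details):
    """Undo haar_decompose, coarsest detail band first."""
    x = list(approx)
    for d in reversed(details):
        merged = []
        for a, b in zip(x, d):
            merged.extend([(a + b) / S, (a - b) / S])
        x = merged
    return x

def threshold(details, t):
    """Discard (zero out) detail coefficients smaller than t."""
    return [[c if abs(c) >= t else 0.0 for c in d] for d in details]

# Two flat regions with a small ripple on top (hypothetical data).
signal = [(10.0 if n < 8 else 20.0) + 0.1 * (-1) ** n for n in range(16)]
approx, details = haar_decompose(signal, 3)

kept = threshold(details, 0.5)
kept_count = sum(1 for d in kept for c in d if c != 0.0)
restored = haar_reconstruct(approx, kept)
max_err = max(abs(u - v) for u, v in zip(signal, restored))
print(kept_count, max_err)
```

Here every detail coefficient falls below the threshold, so the 16 samples are represented by the 2 remaining lowpass values at the cost of losing the small ripple.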

29 Video Coding: Embedded system examples: HDTV, satellite TV, set-top boxes, security systems, multimedia devices.

30 Motion-Based Coding Methods: Modern video coding methods exploit frame-to-frame similarities to further compress video. They are similar to JPEG, except that motion-compensated difference frames are coded with the DCT. Motion vectors encode the change in location of blocks.
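Motion estimation for one block can be sketched with an exhaustive search. This is a minimal illustration, assuming 8x8 blocks, a search range of 4 pixels, and the sum of absolute differences as the matching cost; the frame contents, sizes, and function names are hypothetical. The difference block computed at the end is what would then go through the DCT.

```python
import random

def sad(cur, prev, top, left, ptop, pleft, size):
    """Sum of absolute differences between a current and a previous block."""
    return sum(abs(cur[top + i][left + j] - prev[ptop + i][pleft + j])
               for i in range(size) for j in range(size))

def find_motion_vector(prev, cur, top, left, size=8, search=4):
    """Return ((dy, dx), cost): the offset into the previous frame whose
    block best matches the current block at (top, left)."""
    height, width = len(prev), len(prev[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, prev, top, left, y, x, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec, best_cost

random.seed(0)
H = W = 32
prev = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
# Current frame: previous frame shifted down by 1 and right by 2 pixels.
cur = [[prev[(y - 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

vector, cost = find_motion_vector(prev, cur, top=8, left=8)
dy, dx = vector
residual = [[cur[8 + i][8 + j] - prev[8 + dy + i][8 + dx + j]
             for j in range(8)] for i in range(8)]
print(vector, cost)   # (-1, -2) 0
```

For a pure translation the motion-compensated difference block is all zeros, so only the motion vector needs to be sent for that block.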

31 Video Coding Standards: MPEG-2 and MPEG-4 are the leading standards for high (television) quality video coding. H.263 is the primary standard for low-rate video coding (video-phones). Compression ratios of with good quality are usually obtained.

32 Summary: Source coding is essential to reduce the memory requirements and bandwidth of multimedia data. Complex DSP algorithms obtain great data reductions with little loss in quality. Coding algorithms have characteristics common to other DSP computations. Source coding is likely to play an increasingly important role in many embedded systems.
