A MATLAB Software Tool for the Introduction of Speech Coding Fundamentals in a DSP Course

Edward Painter and Andreas Spanias
Department of Electrical Engineering, Telecommunications Research Center
Arizona State University, Tempe, Arizona 85287-7206
spanias@asu.edu, painter@asu.edu

Abstract

An educational software tool on speech coding is presented. Portions of this program are used in our senior-level DSP class at Arizona State University to expose undergraduate students to speech coding and present speech analysis/synthesis as an application paradigm for many DSP fundamental concepts. The simulation software provides an interactive environment that allows students to investigate and understand speech coding algorithms for a variety of input speech records. Time- and frequency-domain representations of input and reconstructed speech can be graphically displayed and played back on a PC equipped with a standard 16-bit sound card. The program has been developed for use in the MATLAB environment and includes implementations of the FS-1015 LPC-10e, the FS-1016 CELP, the ETSI GSM, the IS-54 VSELP, the G.721 ADPCM, the G.722 subband, and the G.728 LD-CELP speech coding algorithms, integrated under a common graphical interface.

1. Introduction

Speech coding is an application area in signal processing concerned with obtaining compact representations of speech signals for efficient transmission or storage. This requires analysis and modeling of digital speech signals, which are usually represented by a compact set of quantized filter, excitation, and spectrum parameters. As such, speech coding uses many fundamental signal processing tools and concepts which are taught in undergraduate DSP classes.
It can therefore be used as an application paradigm for demonstrating the utility of DSP tools such as digital filtering, random signal processing, autocorrelation and PSD estimation, handling of nonstationarities, windowing, quantization of filter coefficients, estimation of periodicity, and time-varying signal modeling. Exposure to speech coding in an undergraduate DSP course is also motivated by the emergence of new computer and mobile communication applications that require young electrical engineers to have some fundamental speech processing knowledge in the context of DSP.

We recently started to introduce speech coding towards the end of the senior-level four-credit DSP course by devoting four lectures, two homework assignments, and one computer project to this important application area. As part of this effort, we developed an educational simulation program in MATLAB that can be used to provide knowledge on speech coding algorithms and demonstrate the utility of several important DSP concepts. The codecs which have been implemented include the FS-1015 LPC-10e, the FS-1016 CELP, the IS-54 VSELP, the ETSI GSM, the G.728 LD-CELP, the G.721 ADPCM, and the G.722 subband coder. These programs provide a unified exposition of the algorithms by bringing them together into a common simulation framework under MATLAB. In addition, a unified user-friendly interface has been developed which enables users to experiment with a variety of input signals, examine graphical representations of analysis/synthesis parameters, play back reconstructed output speech, and compare the quality of output speech associated with the different coding standards. Graphical outputs may provide information to the user about underlying algorithm mechanisms.
Simulations have been coded in an expository style to serve as template programs which supply working examples and important details often omitted from the general literature.

The MATLAB environment offers several advantages. First, users are able to generate a variety of signal and parameter plots, experiment with the effects of channel noise and network tandeming, and modify algorithm parameters in an environment where algorithms are easily manipulated. Second, MATLAB code is compact, thereby simplifying algorithm understanding. Third, MATLAB is widely used in academic institutions to support linear systems and DSP courses. Finally, the MATLAB codecs will run on a variety of computers, e.g., DOS, Mac, and UNIX.

This paper describes the educational software tool and gives sample simulations that can be used to assist undergraduates in the understanding of speech coding algorithms.

2. MATLAB Codec Simulations

Simulations accept input samples from .wav input files, run analysis at the transmitter, transmit parameters through a simulated channel, run synthesis at the receiver, and then generate .wav output files. Speech files contain 16-bit linear PCM data, sampled at 8 kHz.

A. Time- and Frequency-Domain Viewing Windows

A time-domain viewing window allows comparisons between input and reconstructed output waveforms (Fig. 1a). One is able to see the differences in waveform matching behavior between a hybrid algorithm (e.g., CELP) and a vocoder (e.g., LPC-10e). Comparisons are enhanced by a facility which allows examination of the reconstruction error. Users can also observe the bit-rate/performance tradeoff: higher bit-rate algorithms generate small errors, while low bit-rate algorithms produce larger errors. A frequency-domain viewing window is also available (Fig. 1b), allowing comparison of magnitude spectra between input and reconstructed output speech. Magnitude spectral estimates are generated using a 1024-point FFT.
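As a rough illustration of the 1024-point magnitude spectral estimate mentioned above, the following sketch computes a dB magnitude spectrum for one speech frame. It is written in Python with only the standard library rather than in MATLAB, and it is not taken from the tool itself. The Hamming window, the 240-sample frame length, the synthetic 1 kHz test tone, and all function names are assumptions made for the example; the paper does not state which window or frame size the viewing window uses.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum_db(frame, nfft=1024):
    """Hamming-window a frame, zero-pad to nfft, return dB magnitudes for bins 0..nfft/2."""
    n = len(frame)
    if not 2 <= n <= nfft:
        raise ValueError("frame length must be between 2 and nfft")
    # Hamming window (an assumption; the source does not name the window).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed + [0.0] * (nfft - n))
    # Small offset avoids log10(0) for empty bins.
    return [20 * math.log10(abs(c) + 1e-12) for c in spectrum[: nfft // 2 + 1]]


FS = 8000  # sampling rate in Hz, as stated for the speech files
# Hypothetical test input: 240 samples (30 ms) of a 1 kHz tone.
frame = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(240)]
mags = magnitude_spectrum_db(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * FS / 1024  # bin spacing is FS / 1024 = 7.8125 Hz
```

Running the same function on an input frame and on the corresponding reconstructed frame gives the two curves that a frequency-domain comparison would plot; their difference in dB is one possible form of spectral error display.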
The LPC envelope, corresponding to quantized predictor coefficients received by the decoder, is superimposed on both the input and output spectral plots. One can observe spectral matching properties; e.g., a vocoder such as LPC-10e exhibits reasonable spectral matching despite low SNR. A spectral error display is also available.

In all LPC coding methods, short-term spectral characteristics are captured in an all-pole synthesis filter. It is the excitation models which differ among these algorithms in terms of complexity, performance, and bit rate. Our excitation viewing window allows observation of excitation sequences in time and frequency (Fig. 1c). Comparisons help users to understand different excitation methodologies. After observing voiced LPC-10e excitations, for example, a glottal pulse shape invariance becomes evident; excitation changes between voiced frames occur only in the number of pulse repetitions (pitch) and the added noise. Observing GSM excitations clarifies the concept of regular pulse excitation (RPE), in which each frame of regularly spaced pulses has a distinctly different amplitude pattern than its predecessor. Users can observe that RPE excitations achieve performance gains relative to the simplistic two-state model used in LPC-10e. In CELP (Fig. 1c), random vectors are combined with lag search vectors to obtain an optimal excitation.

We have elected to present pole locations of the decoder's LPC synthesis filter through a Z-plane view (Fig. 1d). This window also allows pole trajectory tracking and animated playback, and provides information about formant locations.

Fig. 1. Viewing Windows (CELP): (a) Time-Domain, (b) Frequency-Domain, (c) Excitation, (d) Z-Plane.

B. Quality Measures and Speech File View Utility

Many objective quality measures have been proposed to quantify coding distortion [1]. Our simulations incorporate spectral and temporal distortion measures in a quality display. Furthermore, there is a frame-by-frame speech file viewing utility which generates 3-D spectrograms using FFT or LP-based spectral analysis (Fig. 2).

Fig. 2. File Viewer 3-D LPC Spectrogram.

3. MATLAB Simulation Exercises

A. CELP Codebook Search

Excitation optimization in CELP involves (in most cases) exhaustively searching two vector codebooks. Codebooks are searched sequentially, adaptive first and then stochastic. During the search process, candidate excitations are used to synthesize speech and generate error signals. Excitation vectors (gain-shape VQ) are chosen to minimize a perceptually weighted error measure. Our software enables users to examine codebook (CB) search procedures. We show candidate adaptive CB vectors corresponding to the minimum and maximum match scores obtained from a 256-vector search space (Figs. 3a,c). Using these excitation sequences, we can synthesize and evaluate speech waveforms as shown in Figs. 3b,d, respectively. Output records are plotted with input speech to allow comparisons. SNRs are provided to give an objective performance measure. From Fig. 3, we observe that higher match scores correspond to higher quality excitations and higher SNR. By developing plots like Fig. 3, students are able to observe the correspondence between match scores and excitation quality. Furthermore, they gain knowledge on the nature of VQ excitations.

Fig. 3. Adaptive Excitation Vectors Associated with (a) Min. and (c) Max. Match Scores; Output Speech (b,d).

B. CELP Perceptual Weighting Filter

CELP CB search procedures minimize a perceptually weighted error. Weighting is achieved through an IIR filter which shapes the error spectrum to exploit masking properties of the ear. In particular, CELP algorithms exploit the fact that humans have a limited ability to detect small errors in frequency bands where the speech signal has high energy, such as the formant regions. Therefore the CELP weighting filter de-emphasizes formant regions in the error spectrum. The transfer function of the weighting filter is of the form

\[ W(z) = \frac{A(z)}{A(z/\gamma)} = \frac{1 - \sum_{i=1}^{p} a_i z^{-i}}{1 - \sum_{i=1}^{p} a_i \gamma^{i} z^{-i}}, \qquad \gamma = 0.8 \qquad (1) \]

where 1/A(z) is the short-term LP synthesis filter and a_i are the predictor coefficients. The parameter γ expands formant bandwidths by moving poles radially inward towards the center of the unit circle. Our software enables users to examine pole/zero and frequency response plots for the perceptual weighting filter (PWF) (Fig. 4).

Fig. 4. CELP Perceptual Weighting Filter: (a) Poles/Zeros, (b) Magnitude Response and LP Envelope.
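The weighting filter W(z) = A(z)/A(z/γ) described above can be sketched in a few lines. The example below is in Python with only the standard library, not MATLAB, and is not the tool's own code. It assumes the sign convention A(z) = 1 - Σ a_i z^{-i}; the second-order predictor (a single resonance at radius 0.95 and angle π/4) and all function names are hypothetical and chosen only so that the radial pole movement is easy to check by hand.

```python
import cmath
import math


def bandwidth_expand(a, gamma=0.8):
    """Return a_i * gamma**i for i = 1..p, the coefficients of A(z/gamma)."""
    return [ai * gamma ** i for i, ai in enumerate(a, start=1)]


def weighting_filter(x, a, gamma=0.8):
    """Filter x through W(z) = A(z) / A(z/gamma), assuming A(z) = 1 - sum a_i z^-i.

    Difference equation:
        y[n] = x[n] - sum_i a_i x[n-i] + sum_i a_i gamma^i y[n-i]
    """
    ag = bandwidth_expand(a, gamma)
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += -a[i - 1] * x[n - i] + ag[i - 1] * y[n - i]
        y.append(acc)
    return y


def quadratic_roots(a1, a2):
    """Roots of z^2 - a1*z - a2 = 0, i.e. the zeros of 1 - a1 z^-1 - a2 z^-2."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return (a1 + disc) / 2, (a1 - disc) / 2


# Hypothetical second-order predictor with one resonance (a stand-in "formant").
r, theta = 0.95, math.pi / 4
a = [2 * r * math.cos(theta), -r * r]

zero_radius = abs(quadratic_roots(*a)[0])                    # roots of A(z)
pole_radius = abs(quadratic_roots(*bandwidth_expand(a))[0])  # roots of A(z/gamma)
h = weighting_filter([1.0, 0.0, 0.0, 0.0], a)                # first impulse-response samples
```

In this toy case the zeros of W(z) stay at the original resonance radius, while its poles sit at γ times that radius, which is the radial inward movement the text refers to. A real codec would use the tenth-order quantized predictor of each frame instead of a fixed second-order one.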
Users may also process speech records with and without the weighting filter. Comparing output records provides insight into the net effects of the PWF. One can observe that subjective speech quality improves with the filter, despite the drop in SNR. This exercise demonstrates both weighting filter behavior and the poor subjective quality measurement inherent in SNR.

C. LPC-10e Voicing Detection

The voicing detection scheme in LPC-10e uses a sophisticated linear discriminant analysis procedure in which several signal parameters are linearly combined and then smoothed to generate a voicing decision for each half-frame. Our software enables students to examine the evolution of these parameters with time (Fig. 5).

Fig. 5. LPC-10e Voicing Decision Discriminant Analysis: "Mable stood on the rock."

D. Robustness to Channel Errors and Tandeming

Codec bit streams in wireless applications are subjected to channel errors, which are characterized in terms of bit error rate (BER). Coding algorithms should tolerate bit errors with minimal perceptual degradation. Our software is equipped with BER and tandeming controls (Fig. 6) that enable students to contrast error tolerances between the different algorithms. As illustrated in the CELP segmental signal-to-noise ratio (SSNR) penalty plots of Fig. 7, one can investigate algorithmic performance in the presence of channel errors. Fig. 7a shows individual bit sensitivities for the standard CELP frame bits. The reference level at 5.7 dB corresponds to the SSNR achieved over a clear channel. Vertical penalty lines for each bit indicate the mean SSNR penalty incurred when the corresponding bit is inverted with unity probability. The family of curves in Fig. 7b illustrates parametric error sensitivities measured at BERs of 0.1%, 0.5%, 1%, 5%, and 10%. For each curve, bits for the specified parameter are randomly corrupted, while the remaining parameters are left undisturbed. Users can also employ our tools to perform subjective evaluations.

Fig. 6. BER and Tandeming Controls.

In addition to channel errors, a robust coding algorithm must also tolerate tandem encoding without excessive compromises in output quality. Our simulations enable users to examine algorithmic responses to multiple synchronous tandems. The software allows, e.g., five-stage configurations (T0 to T5). Objective figures of merit are reported in terms of SNR, SSNR, and cepstral distance (CD). Example scores reported here reflect mean results after processing frames at each of six tandem nodes (T0 to T5). For Mean Opinion Score (MOS) trials, trained listeners could be asked to judge test sentences on a five-point MOS scale.

Fig. 8. Penalty Associated with CELP Tandem Encoding: (a) SSNR, (b) MOS.

Tandeming scores for CELP are shown in Figs. 8a and 8b. In our example, experimentally obtained MOS values for CELP are biased by an average of -0.2 with respect to MOS_BK [2]. MOS_BK is a biased version of the MOS predictor proposed by Kitawaki et al., which is evaluated by our simulation tools:

\[ \mathrm{MOS}_{BK} = 0.04\,CD^{2} - 0.80\,CD + 4.86 \qquad (2) \]

The preceding exercises represent the testing capabilities of our software.
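The channel-error and quality measurements discussed in this section can be approximated with a short sketch. It is written in Python with only the standard library, not MATLAB, and is not the tool's implementation. The 160-sample frame length, the rule of skipping frames with zero signal energy or zero error, the 144-bit test frame, and the function names are all assumptions. The predictor is taken as MOS_BK = 0.04 CD^2 - 0.80 CD + 4.86, with a minus sign on the linear term, since the sign is not legible in the source.

```python
import math
import random


def flip_bits(bits, ber, rng):
    """Invert each bit independently with probability `ber` (0.0 to 1.0)."""
    return [b ^ 1 if rng.random() < ber else b for b in bits]


def segmental_snr_db(clean, coded, frame_len=160):
    """Mean of per-frame SNRs in dB over whole frames.

    Frames with zero signal energy or zero error are skipped (an assumption;
    other implementations clamp per-frame values instead).
    """
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = coded[start:start + frame_len]
        sig = sum(s * s for s in ref)
        err = sum((s - c) ** 2 for s, c in zip(ref, out))
        if sig > 0 and err > 0:
            snrs.append(10 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")


def mos_bk(cd):
    """Biased MOS predictor as a function of cepstral distance CD (assumed sign)."""
    return 0.04 * cd ** 2 - 0.80 * cd + 4.86


rng = random.Random(0)           # fixed seed so runs are repeatable
frame_bits = [0, 1] * 72         # hypothetical 144-bit coded frame
all_flipped = flip_bits(frame_bits, 1.0, rng)   # every bit inverted
untouched = flip_bits(frame_bits, 0.0, rng)     # clear channel

# Two synthetic frames with constant error amplitudes of 0.1 and 0.01.
clean = [1.0] * 320
coded = [0.9] * 160 + [0.99] * 160
ssnr = segmental_snr_db(clean, coded)
```

A sensitivity curve like those described for the parametric case would repeat the encode, corrupt, decode, and measure cycle at each BER while applying `flip_bits` only to the bit positions of one parameter; the bit allocation needed for that is codec specific and is not shown here.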
Other beneficial topics of investigation include comparisons of performance with different input signals/speakers, examination of parametric variations and performance tradeoffs, and evaluations of algorithmic robustness to acoustic background noise.

Fig. 7. Penalty Associated with CELP Channel Errors: (a) Single Bit, (b) Parametric.

4. Conclusion

We have presented new educational speech coding simulation software developed to supplement our speech coding and DSP lecture courses with hands-on experiments. We have also described a laboratory hardware/software environment and outlined simulation exercises. In future work, we will incorporate additional coding algorithms, including a sinusoidal transform coder.

5. References

1. A. Gray and J. Markel, "Distance Measures for Speech Processing," IEEE Trans. ASSP-24, Oct. 1976.
2. N. Kitawaki, et al., "Objective Quality Evaluation for Low-Bit-Rate Speech Coding Systems," IEEE J. on Sel. Areas in Comm., pp. 242-248, Feb. 1988.

MATLAB is a trademark of The MathWorks, Inc.