Solutions to Exam in Speech Signal Processing EN2300




Date: Thursday, Dec 20, 8:00-13:00.
Place: Q34, Q36.
Allowed: Beta Math Handbook (or corresponding), calculator with empty memory.
Grades: A: 43p; B: 37p; C: 31p; D: 25p; E: 19p, out of 48p total.
Language: Optional: Swedish or English.
Solutions: To be published on the course web page.
Results: Tuesday, Jan 15, 2008.
Review: At KTH-EE/STEX, Osquldas v. 10.

Good Luck! Please do the Course Evaluation! See the course web page.

Please start by browsing through the exam problems. If you do not have enough time to complete tedious numerical computations, just show that you understood the principles. Use shortcuts and reasonable approximations where appropriate.

Problems

1. Determine for each of the following statements whether it is true or false (giving +1p for each correct answer, -1p for incorrect). Please note that you can get a negative total result.

(a) In a speech transmission system using closed-loop adaptive linear prediction with a high data rate, the quantization noise is approximately white at the output of the decoder in the receiver.
Solution: True.

(b) For a source signal with uncorrelated samples, i.e. a white signal spectrum, an optimal scalar quantizer (SQ) achieves exactly the same performance as an optimal vector quantizer (VQ), at any total bit-rate, equal for both quantizers.
Solution: False.

(c) A system using the standardized A-law non-uniform quantization gives a signal-to-quantization-noise ratio which becomes approximately independent of the input signal amplitude in the limit when the signal amplitude decreases towards very low values.
Solution: False.

(d) At time t a hidden Markov model (HMM) has an internal hidden state denoted as S_t and an observable real-valued output x_t. Using a complete observed sequence x_1, ..., x_T, the Viterbi algorithm can be used to calculate an exact value of the conditional state probability at time t, P[S_t | x_1, ..., x_T].
Solution: False. The Viterbi algorithm finds the single most probable state sequence, but other states can still have non-zero probabilities, at any time t.

(e) The frequency of the first formant is, on average, about twice as large in female speech, compared to male speech.
Solution: False. The vocal tract is on average only slightly shorter for women than for men.

(f) A spectrogram, analyzed with a Hamming window of 256 samples, at a sampling rate of 8000 samples/s, can show the pitch harmonics in voiced male speech.
Solution: True. (A 256-sample window at 8000 samples/s is 32 ms long, spanning several pitch periods of a typical male voice, so the individual pitch harmonics can be resolved.)

2. Fig. 1 shows three synthetic vowel spectra and three pole-zero configurations for LP whitening filters that may or may not correspond to the displayed vowel spectra. All plots were produced using a sampling rate of 8000 Hz.

Figure 1: Synthetic vowel spectra (panels a, c, e; spectrum level in dB versus frequency in Hz) and whitening-filter pole-zero plots in the z-plane (panels b, d, f) that may or may not correspond to the vowel spectra, i.e. the mapping is not necessarily one-to-one.

(a) Indicate for each vowel spectrum the most likely corresponding z-plane zero plot. (3p) Note: the mapping is not necessarily one-to-one, i.e. more than one spectrum may correspond to the same pole-zero plot!
Solution: a-f, c-d, e-b.

(b) Indicate for each vowel spectrum if the speaker was most probably a man or a woman. (3p)
Solution: Spectrum (a) female, (c) male, (e) female.

(c) Which of the illustrated vowels has the lowest first-formant frequency? What is, approximately, the first-formant frequency in your selected vowel? (2p)
Solution: Vowel (e) has the lowest first formant, at about 300 Hz.

3. A stationary random signal X(n) has a Gaussian probability density function with zero mean and known variance \sigma^2. We want to quantize this signal using a rate of 1 bit/sample.

(a) We first design an optimal scalar quantizer for this signal, to achieve minimum total distortion power, including both granular and overload distortion. Determine the optimal quantization intervals and reconstruction levels. (3p)

Solution: The probability density function for each signal sample is

f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}

Because of the symmetry of this distribution around zero, we must design a midrise quantizer with decision threshold x_0 = 0 and symmetric reconstruction levels \hat{x}_1 = -a and \hat{x}_2 = a. To determine the optimal reconstruction levels, we choose the centroid of each quantization interval:

a = \frac{\int_0^{\infty} x f_X(x)\,dx}{\int_0^{\infty} f_X(x)\,dx} = 2\int_0^{\infty} x f_X(x)\,dx = \sigma\sqrt{\frac{2}{\pi}}

(b) Calculate the resulting signal-to-noise ratio in dB for this scalar quantizer. (3p)

Solution: The quantization distortion power is

q^2 = 2\int_0^{\infty} (x-a)^2 f_X(x)\,dx = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2a\int_0^{\infty} 2x f_X(x)\,dx + 2a^2\int_0^{\infty} f_X(x)\,dx = \sigma^2 - 2a^2 + a^2 = \sigma^2 - a^2 = \sigma^2\left(1 - \frac{2}{\pi}\right)

using the optimal choice for a. Thus the signal-to-quantization-noise ratio is

\mathrm{SNR} = 10\lg\frac{1}{1 - 2/\pi} \approx 4.4\ \mathrm{dB}

(c) We also consider designing an optimal vector quantizer (VQ) with the same rate of 1 bit per sample, by encoding each pair of consecutive samples using 2 bits. If the input signal is exactly white, does the VQ reduce the quantization noise, compared to the scalar quantizer? Qualitative motivation is required, but no formal calculations. (2p)

Solution: With the 2-bit VQ we have 4 reconstruction points in the 2-dimensional (x_1, x_2) plane, placed symmetrically with one point in each quadrant. If the signal is white, i.e. the signal samples are uncorrelated, then the distortion contribution is exactly the same in both dimensions, and the total distortion is exactly the same as with the scalar quantizer. This is an exception for this special case with only two reconstruction levels in each dimension. Generally, the VQ is better, even for a white signal.
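As a quick numerical check of parts (a) and (b) (an addition to the original solution; it assumes NumPy is available), the sketch below simulates the 1-bit midrise quantizer with reconstruction levels +/- a = +/- sigma*sqrt(2/pi) and compares the empirical distortion and SNR with the closed-form values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.standard_normal(2_000_000) * sigma   # zero-mean Gaussian samples

# Optimal 1-bit midrise quantizer: threshold at 0, reconstruction levels +/- a
a = sigma * np.sqrt(2.0 / np.pi)
xhat = np.where(x >= 0.0, a, -a)

q2 = np.mean((x - xhat) ** 2)                # empirical distortion power
q2_theory = sigma**2 * (1.0 - 2.0 / np.pi)   # sigma^2 - a^2 from the derivation
snr_db = 10.0 * np.log10(sigma**2 / q2)
print(q2, q2_theory, snr_db)                 # SNR comes out near 4.4 dB
```

With two million samples the empirical distortion agrees with sigma^2(1 - 2/pi) to within a fraction of a percent.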
4. Consider a 2-dimensional random vector X = (X_1, X_2)^T with a distribution as indicated in Fig. 2.

Figure 2: Probability density for a two-dimensional random vector. The probability density is uniform in the shaded rectangle (with side lengths 2a and a), and zero otherwise.

(a) Discuss briefly why direct scalar quantization of the vector elements X_1 and X_2 would not be suitable in this case. (1p)

Solution: The vector elements X_1 and X_2 are correlated, and therefore a VQ will give much better performance.

(b) Sketch an optimally rotated orthonormal coordinate system in which the transformed vectors Y = (Y_1, Y_2)^T have a distribution that is maximally suitable for element-wise scalar quantization. (1p)

Solution: Rotating the coordinate axes 45 degrees in either direction, e.g. as in Fig. 3, will make the transformed coordinates (Y_1, Y_2) statistically independent. Then two uniform scalar quantizers will have a good chance to perform well (although a VQ would still be slightly better, because of its more flexible space-filling properties).

(c) Design two scalar quantizers, one for element Y_1 and the other for Y_2, without any overload distortion. You can use a total data rate of R bits per vector, so you are free to allocate w bits for element Y_1 and R - w bits for element Y_2. What is the optimal choice of w for a general value of R? Calculate specifically the optimal w for R = 3. (3p)

Solution: The total quantization noise variance is

\sigma_q^2 = \frac{\Delta_1^2}{12} + \frac{\Delta_2^2}{12} = \frac{(2a)^2}{12}\,2^{-2w} + \frac{a^2}{12}\,2^{-2(R-w)}

Figure 3: Rotated coordinate system (y_1, y_2) corresponding to Fig. 2.

If w were a continuous-valued variable, we would obtain the minimum noise variance when

\frac{d\sigma_q^2}{dw} = \frac{d}{dw}\left[\frac{(2a)^2}{12}\,2^{-2w} + \frac{a^2}{12}\,2^{-2(R-w)}\right] = 0

with solution

4 \cdot 2^{-2w} = 2^{-2(R-w)}; \quad 2 - 2w = -2R + 2w; \quad w = (R+1)/2; \quad R - w = (R-1)/2

For simple symmetry reasons, we might have seen directly that we should spend 1 bit more for the dimension with range 2a. The derived result gives exact integer-valued solutions for any odd R. For R = 3 we obtain w = 2; R - w = 1. For even values of R we must choose the best of the two closest alternatives:

If we choose w = R - w = R/2,

\sigma_q^2 = \frac{(2a)^2\,2^{-R} + a^2\,2^{-R}}{12} = \frac{5a^2}{12}\,2^{-R}

If we choose w = (R+2)/2; R - w = (R-2)/2,

\sigma_q^2 = \frac{(2a)^2\,2^{-R-2} + a^2\,2^{-R+2}}{12} = \frac{5a^2}{12}\,2^{-R}

Thus, both solutions are equally good.
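The bit allocation above is easy to verify by enumeration. The following sketch (an added check, not part of the original solution; plain Python, with a = 1 as an arbitrary scale) confirms that w = 2 is optimal for R = 3, and that the two candidate allocations tie for even R:

```python
# Uniform quantizer noise Delta^2/12 per dimension; Y1 has range 2a, Y2 has range a.
def sq_noise(w, R, a=1.0):
    d1 = 2 * a / 2**w        # stepsize for Y1 with w bits
    d2 = a / 2**(R - w)      # stepsize for Y2 with R - w bits
    return d1**2 / 12 + d2**2 / 12

# Odd R: the optimum is w = (R+1)/2
R = 3
noise = {w: sq_noise(w, R) for w in range(R + 1)}
w_best = min(noise, key=noise.get)
print(w_best)                             # 2, matching w = (R+1)/2

# Even R: w = R/2 and w = (R+2)/2 give the same noise, (5a^2/12) 2^{-R}
R = 4
print(sq_noise(2, R), sq_noise(3, R))     # equal
```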

(d) How large are the quantization stepsizes \Delta_1 and \Delta_2 for the two scalar quantizers in the special case of R = 3 bits per vector? (2p)

Solution: \Delta_1 = \Delta_2 = a/2.

(e) Instead of the coordinate transformation and scalar quantization, we could have achieved roughly the same performance with a vector quantizer. Argue briefly why the transformation and scalar coding may still be preferred (especially at high data rates). (1p)

Solution: The uniform scalar quantizer is computationally much simpler than the general VQ solution. The VQ encoder must search through the entire VQ codebook for every input vector.

5. You will now design a Wiener filter for speech enhancement. The filter input is a noisy speech signal y(n) = x(n) + v(n), where x(n) represents the clean speech and v(n) is the additive noise. The speech and noise are assumed to be mutually independent and stationary random processes. The noise is white with power spectral density \Phi_{vv}(\omega) = 1, and the speech signal has a power spectral density

\Phi_{xx}(\omega) = 1 + \cos(\omega)

(a) Determine the autocorrelation sequence \phi_{xx}(k) for the speech signal. (2p)

Solution:

\phi_{xx}(k) = \int_{-\pi}^{\pi} \Phi_{xx}(\omega) e^{jk\omega} \frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \left(1 + \frac{e^{+j\omega} + e^{-j\omega}}{2}\right) e^{jk\omega} \frac{d\omega}{2\pi} = \begin{cases} 1, & k = 0 \\ 1/2, & k = \pm 1 \\ 0, & \text{otherwise} \end{cases}

(b) Design a causal FIR linear-phase Wiener filter with transfer function

H(z) = w_0 + w_1 z^{-1} + w_0 z^{-2}

to produce an optimal estimate of the clean speech signal, in the sense that it minimises the distortion measure

Q = E\left[(x(n-1) - \hat{x}(n))^2\right]

where \hat{x}(n) is the output signal from the Wiener filter. (6p) (Note that the delay in this distortion measure is relevant, because the linear-phase filter has a group delay of 1 sample, and we do not regard this filter delay as distortion.)

Solution: We will need the correlation functions

\phi_{xy}(k) = E[x(n+k)y(n)] = E[x(n+k)(x(n) + v(n))] = \phi_{xx}(k);
\phi_{yy}(k) = E[y(n+k)y(n)] = E[(x(n+k) + v(n+k))(x(n) + v(n))] = \phi_{xx}(k) + \phi_{vv}(k);
\phi_{vv}(k) = \delta(k)

The Wiener filter output signal is

\hat{x}(n) = w_0 y(n) + w_1 y(n-1) + w_0 y(n-2) = w_0 (y(n) + y(n-2)) + w_1 y(n-1)

The necessary conditions for minimal distortion are then

0 = \frac{\partial}{\partial w_0} E\left[(x(n-1) - \hat{x}(n))^2\right] = E\left[-2(x(n-1) - \hat{x}(n)) \frac{\partial \hat{x}(n)}{\partial w_0}\right]
= E\left[-2(x(n-1) - w_0(y(n) + y(n-2)) - w_1 y(n-1))(y(n) + y(n-2))\right]
= -2(\phi_{xy}(-1) + \phi_{xy}(1)) + 4w_0(\phi_{yy}(0) + \phi_{yy}(2)) + 2w_1(\phi_{yy}(-1) + \phi_{yy}(1))
= -4\phi_{xx}(1) + 4w_0(\phi_{yy}(0) + \phi_{yy}(2)) + 4w_1\phi_{yy}(1);

0 = \frac{\partial}{\partial w_1} E\left[(x(n-1) - \hat{x}(n))^2\right]
= E\left[-2(x(n-1) - w_0(y(n) + y(n-2)) - w_1 y(n-1))\,y(n-1)\right]
= -2\phi_{xx}(0) + 4w_0\phi_{yy}(1) + 2w_1\phi_{yy}(0)

Thus, we obtain the optimal filter coefficients from the linear equations

\begin{pmatrix} \phi_{yy}(0) + \phi_{yy}(2) & \phi_{yy}(1) \\ 2\phi_{yy}(1) & \phi_{yy}(0) \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} \phi_{xx}(1) \\ \phi_{xx}(0) \end{pmatrix}; \quad
\begin{pmatrix} 2 & 1/2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}

with solution

\begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} 1/7 \\ 3/7 \end{pmatrix}
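The 2x2 normal equations above can be verified numerically. The sketch below (an added check, assuming NumPy) fills in the correlation values for this problem, phi_xx(0) = 1, phi_xx(1) = 1/2, phi_yy(0) = 2, phi_yy(1) = 1/2, phi_yy(2) = 0, and solves for the coefficients:

```python
import numpy as np

# Correlation values for this problem: phi_yy(k) = phi_xx(k) + delta(k)
phi_yy0, phi_yy1, phi_yy2 = 2.0, 0.5, 0.0
phi_xx0, phi_xx1 = 1.0, 0.5

# Normal equations from setting both partial derivatives to zero
A = np.array([[phi_yy0 + phi_yy2, phi_yy1],
              [2 * phi_yy1,       phi_yy0]])
b = np.array([phi_xx1, phi_xx0])
w0, w1 = np.linalg.solve(A, b)
print(w0, w1)  # 1/7 and 3/7
```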

6. The probability density function of a two-dimensional random vector X = (X_1, X_2)^T is defined by an M-component Gaussian mixture model (GMM), with a known probability density function as illustrated in Fig. 4:

p_X(x) = \sum_{m=1}^{M} \frac{w_m}{(2\pi)^{2/2}\sqrt{\det C_m}} e^{-\frac{1}{2}(x - \mu_m)^T C_m^{-1} (x - \mu_m)}

Figure 4: GMM probability density function plotted with darker gray shading indicating greater probability density. The thin black curves connect points where the probability density equals 0.9, 1/\sqrt{e} \approx 0.6, and 0.2 times the maximal probability density.

(a) Estimate approximate values for all the GMM parameters, i.e. weights w_m, means \mu_m, and covariance matrices C_m, that define the probability density function shown in Fig. 4. Crude approximations are sufficient, if your theoretical reasoning is correct. (4p)

Solution: The symmetry shows that there are M = 2 Gaussian components, with equal weights w_1 = w_2 = 0.5. The two mean points are located approximately at the two maxima of the pdf,

i.e. approximately at

\mu_1 = \begin{pmatrix} -1.5 \\ 0 \end{pmatrix}; \quad \mu_2 = \begin{pmatrix} 1.5 \\ 0 \end{pmatrix}

The iso-density curves indicate that X_1 and X_2 are uncorrelated, with equal and diagonal covariance matrices C_1 = C_2. Any of the iso-density curves can be used, but it is slightly easier to use the second curve. For any scalar Gaussian distribution, the probability density is reduced from the maximum value by the factor

e^{-\frac{(x - \mu)^2}{2\sigma^2}}

This factor equals 1/\sqrt{e} when |x - \mu| = \sigma. Therefore, the standard deviations can be easily estimated from the second iso-density curve in the graph. Thus, \sigma_1 \approx 2.5 - 1.5 = 1 and \sigma_2 \approx 2, and the covariance matrices are

C_1 = C_2 = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}

(b) Determine the overall mean vector \mu = E[X] and covariance matrix C = \mathrm{cov}[X] for the random vector X. Express the result in terms of the general parameters w_m, \mu_m, C_m of the given GMM, so that the result is valid regardless of your numerical estimates of these parameters. (2p)

Solution: We can regard X as generated in a two-step procedure: First, a discrete random variable U is generated, with probability mass

P(U = m) = w_m; \quad m \in \{1, 2\}

This variable indicates which of the two Gaussian components is to be used to generate a vector with the conditional probability density of the selected component. Thus,

\mu = E[X] = E[X \mid U = 1] P(U = 1) + E[X \mid U = 2] P(U = 2) = \mu_1 w_1 + \mu_2 w_2

Similarly,

C = \mathrm{cov}[X] = E\left[(X - \mu)(X - \mu)^T\right] = \sum_{m=1}^{2} E\left[(X - \mu)(X - \mu)^T \mid U = m\right] P(U = m)
= \sum_{m=1}^{2} E\left[(X - \mu_m + \mu_m - \mu)(X - \mu_m + \mu_m - \mu)^T \mid U = m\right] P(U = m)
= \sum_{m=1}^{2} E\left[(X - \mu_m)(X - \mu_m)^T + (\mu_m - \mu)(\mu_m - \mu)^T \mid U = m\right] P(U = m)
+ \underbrace{\sum_{m=1}^{2} E\left[(X - \mu_m)(\mu_m - \mu)^T + (\mu_m - \mu)(X - \mu_m)^T \mid U = m\right] P(U = m)}_{= 0}
= \sum_{m=1}^{2} \left(C_m + (\mu_m - \mu)(\mu_m - \mu)^T\right) w_m
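The general formulas for the overall moments can be evaluated with the approximate parameters from part (a). The sketch below is an added illustration (it assumes NumPy, and the numbers are the rough estimates read off Fig. 4, so treat them as placeholders):

```python
import numpy as np

# GMM parameters as estimated in part (a) (approximate values from Fig. 4)
w = [0.5, 0.5]
mu = [np.array([-1.5, 0.0]), np.array([1.5, 0.0])]
C = [np.diag([1.0, 4.0]), np.diag([1.0, 4.0])]

# mu = sum_m w_m mu_m
mu_tot = sum(wm * mum for wm, mum in zip(w, mu))

# C = sum_m w_m (C_m + (mu_m - mu)(mu_m - mu)^T)
C_tot = sum(wm * (Cm + np.outer(mum - mu_tot, mum - mu_tot))
            for wm, mum, Cm in zip(w, mu, C))
print(mu_tot)
print(C_tot)
```

Note how the between-component spread adds 1.5^2 = 2.25 to the first diagonal entry of the covariance, on top of the within-component variance.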

(c) Now assume that two random vectors X_A and X_B are independent of each other and have identical density functions equal to the previously defined GMM density p_X(x) in Fig. 4. A new random vector is defined as

Y = X_A + X_B

Show that the probability density function for Y, denoted p_Y(y), can be expressed exactly by a GMM, and determine all parameters for this GMM, expressed in terms of the given parameters w_m, \mu_m, C_m for p_X(x). Numerical values are required only for the mixture weights in p_Y(y). (4p)

Hint, may be used without proof: If any two random vectors both have Gaussian distributions, not necessarily identical, then the sum of the two vectors also has a Gaussian distribution. Furthermore, if the vectors are independent, the mean of the sum is always equal to the sum of the two means, and the covariance of the sum is equal to the sum of the two covariance matrices.

Solution: Each of X_A and X_B may be generated from either component 1 or 2 of the original GMM. We use two discrete random variables U_A and U_B, each with outcomes 1 or 2, to denote these possibilities. Thus there are three distinct ways that Y can be generated:

1. Both X_A and X_B are generated from component 1, i.e. (U_A = 1, U_B = 1). This happens with probability w_{Y,1} = 0.25, and then

\mu_{Y,1} = E[X_A + X_B \mid U_A = U_B = 1] = 2\mu_1; \quad C_{Y,1} = \mathrm{cov}[X_A + X_B \mid U_A = U_B = 1] = 2C_1

2. Both X_A and X_B are generated from component 2, i.e. (U_A = 2, U_B = 2). This happens with probability w_{Y,2} = 0.25, and then

\mu_{Y,2} = E[X_A + X_B \mid U_A = U_B = 2] = 2\mu_2; \quad C_{Y,2} = \mathrm{cov}[X_A + X_B \mid U_A = U_B = 2] = 2C_2

3. X_A and X_B are generated from different components, i.e. (U_A = 1, U_B = 2) or (U_A = 2, U_B = 1). This happens with probability w_{Y,3} = 0.5, and then

\mu_{Y,3} = E[X_A + X_B \mid U_A \neq U_B] = \mu_1 + \mu_2; \quad C_{Y,3} = \mathrm{cov}[X_A + X_B \mid U_A \neq U_B] = C_1 + C_2

Thus, the probability density for Y is exactly a 3-component GMM, with weight factors w_{Y,1} = 0.25, w_{Y,2} = 0.25, and w_{Y,3} = 0.5.
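The three-component structure can also be seen by enumerating the component pairs of (X_A, X_B) and merging pairs that produce identical Gaussian components. The sketch below is an added illustration (assuming NumPy, with the part (a) estimates as placeholder parameters):

```python
import numpy as np

# p_X component parameters (approximate values from part (a))
w = [0.5, 0.5]
mu = [np.array([-1.5, 0.0]), np.array([1.5, 0.0])]
C = [np.diag([1.0, 4.0]), np.diag([1.0, 4.0])]

# Y = X_A + X_B: each pair (m_A, m_B) yields a Gaussian component with
# weight w[m_A]*w[m_B], mean mu[m_A]+mu[m_B], covariance C[m_A]+C[m_B].
# The cross pairs (1,2) and (2,1) give identical components and merge.
comps = {}
for mA in (0, 1):
    for mB in (0, 1):
        key = tuple(sorted((mA, mB)))
        if key in comps:
            comps[key][0] += w[mA] * w[mB]
        else:
            comps[key] = [w[mA] * w[mB], mu[mA] + mu[mB], C[mA] + C[mB]]

weights = sorted(c[0] for c in comps.values())
print(len(comps), weights)  # 3 [0.25, 0.25, 0.5]
```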