Solutions to Exam in Speech Signal Processing EN2300




Date: Thursday, Dec 20, 8:00-13:00.
Place: Q34, Q36.
Allowed: Beta Math Handbook (or corresponding), calculator with empty memory.
Grades: A: 43p; B: 37p; C: 31p; D: 25p; E: 19p, out of 48p total.
Language: Optional: Swedish or English.
Solutions: To be published on the course web page.
Results: Tuesday, Jan 15, 2008.
Review: At KTH-EE/STEX, Osquldas v. 10.

Good Luck! Please do the Course Evaluation! See the course web page.

Please start by browsing through the exam problems. If you do not have enough time to complete tedious numerical computations, just show that you understood the principles. Use shortcuts and reasonable approximations where appropriate.

Problems

1. Determine for each of the following statements whether it is true or false (giving +1p for each correct answer, -1p for incorrect). Please note that you can get a negative total result.

(a) In a speech transmission system using closed-loop adaptive linear prediction with a high data rate, the quantization noise is approximately white at the output of the decoder in the receiver.
Solution: True.

(b) For a source signal with uncorrelated samples, i.e. a white signal spectrum, an optimal scalar quantizer (SQ) achieves exactly the same performance as an optimal vector quantizer (VQ), at any total bit-rate, equal for both quantizers.
Solution: False.

(c) A system using the standardized A-law non-uniform quantization gives a signal-to-quantization-noise ratio which becomes approximately independent of the input signal amplitude in the limit when the signal amplitude decreases towards very low values.
Solution: False.

(d) At time t a hidden Markov model (HMM) has an internal hidden state denoted as S_t and an observable real-valued output x_t. Using a complete observed sequence x_1, ..., x_T, the Viterbi algorithm can be used to calculate an exact value of the conditional state probability at time t, P[S_t | x_1, ..., x_T].
Solution: False. The Viterbi algorithm finds the single most probable state sequence, but other states can still have non-zero probabilities, at any time t.

(e) The frequency of the first formant is, on average, about twice as large in female speech, compared to male speech.
Solution: False. The vocal tract is on average only slightly shorter for women than for men.

(f) A spectrogram, analyzed with a Hamming window of 256 samples, at a sampling rate of 8000 samples/s, can show the pitch harmonics in voiced male speech.
Solution: True. (A 256-sample window at 8000 samples/s is 32 ms long, spanning several pitch periods of a typical male voice, so the individual pitch harmonics can be resolved.)

2. Fig. 1 shows three synthetic vowel spectra and three pole-zero configurations for LP whitening filters that may or may not correspond to the displayed vowel spectra. All plots were produced using a sampling rate of 8000 Hz.

Figure 1: Synthetic vowel spectra (panels a, c, e; spectrum level in dB versus frequency in Hz) and whitening-filter pole-zero plots in the z-plane (panels b, d, f) that may or may not correspond to the vowel spectra, i.e. the mapping is not necessarily one-to-one.

(a) Indicate for each vowel spectrum the most likely corresponding z-plane zero plot. (3p) Note: the mapping is not necessarily one-to-one, i.e. more than one spectrum may correspond to the same pole-zero plot!
Solution: a-f, c-d, e-b.

(b) Indicate for each vowel spectrum if the speaker was most probably a man or a woman. (3p)
Solution: Spectrum (a) female, (c) male, (e) female.

(c) Which of the illustrated vowels has the lowest first-formant frequency? What is, approximately, the first-formant frequency in your selected vowel? (2p)
Solution: Vowel (e) has the lowest first formant, at about 300 Hz.

3. A stationary random signal X(n) has a Gaussian probability density function with zero mean and known variance \sigma^2. We want to quantize this signal using a rate of 1 bit/sample.

(a) We first design an optimal scalar quantizer for this signal, to achieve minimum total distortion power, including both granular and overload distortion. Determine the optimal quantization intervals and reconstruction levels. (3p)

Solution: The probability density function for each signal sample is

f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}

Because of the symmetry of this distribution around zero, we must design a midrise quantizer with decision threshold x_0 = 0 and symmetric reconstruction levels \hat{x}_1 = -a and \hat{x}_2 = a. To determine the optimal reconstruction levels, we choose the centroid of each quantization interval:

a = \frac{\int_0^{\infty} x f_X(x)\,dx}{\int_0^{\infty} f_X(x)\,dx} = 2\int_0^{\infty} x f_X(x)\,dx = \sigma\sqrt{\frac{2}{\pi}}

(b) Calculate the resulting signal-to-noise ratio in dB for this scalar quantizer. (3p)

Solution: The quantization distortion power is

q^2 = 2\int_0^{\infty} (x-a)^2 f_X(x)\,dx = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2a\int_0^{\infty} 2x f_X(x)\,dx + 2a^2\int_0^{\infty} f_X(x)\,dx = \sigma^2 - 2a^2 + a^2 = \sigma^2 - a^2 = \sigma^2\left(1 - \frac{2}{\pi}\right)

using the optimal choice for a. Thus the signal-to-quantization-noise ratio is

\mathrm{SNR} = 10\lg\frac{1}{1 - 2/\pi} \approx 4.4\ \mathrm{dB}

(c) We also consider designing an optimal vector quantizer (VQ) with the same rate of 1 bit per sample, by encoding each pair of consecutive samples using 2 bits. If the input signal is exactly white, does the VQ reduce the quantization noise, compared to the scalar quantizer? Qualitative motivation is required, but no formal calculations. (2p)

Solution: With the 2-bit VQ we have 4 reconstruction points in the 2-dimensional (x_1, x_2) plane, placed symmetrically with one point in each quadrant. If the signal is white, i.e. the signal samples are uncorrelated, then the distortion contribution is exactly the same in both dimensions, and the total distortion is exactly the same as with the scalar quantizer. This is an exception for this special case with only two reconstruction levels in each dimension. Generally, the VQ is better, even for a white signal.
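As a quick numerical check of parts (a) and (b) (an addition to the original solution; it assumes NumPy is available), the sketch below simulates the 1-bit midrise quantizer with reconstruction levels +/- a = +/- sigma*sqrt(2/pi) and compares the empirical distortion and SNR with the closed-form values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.standard_normal(2_000_000) * sigma   # zero-mean Gaussian samples

# Optimal 1-bit midrise quantizer: threshold at 0, reconstruction levels +/- a
a = sigma * np.sqrt(2.0 / np.pi)
xhat = np.where(x >= 0.0, a, -a)

q2 = np.mean((x - xhat) ** 2)                # empirical distortion power
q2_theory = sigma**2 * (1.0 - 2.0 / np.pi)   # sigma^2 - a^2 from the derivation
snr_db = 10.0 * np.log10(sigma**2 / q2)
print(q2, q2_theory, snr_db)                 # SNR comes out near 4.4 dB
```

With two million samples the empirical distortion agrees with sigma^2(1 - 2/pi) to within a fraction of a percent.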
4. Consider a 2-dimensional random vector X = (X_1, X_2)^T with a distribution as indicated in Fig. 2.

Figure 2: Probability density for a two-dimensional random vector. The probability density is uniform in the shaded rectangle (with side lengths 2a and a), and zero otherwise.

(a) Discuss briefly why direct scalar quantization of the vector elements X_1 and X_2 would not be suitable in this case. (1p)

Solution: The vector elements X_1 and X_2 are correlated, and therefore a VQ will give much better performance.

(b) Sketch an optimally rotated orthonormal coordinate system in which the transformed vectors Y = (Y_1, Y_2)^T have a distribution that is maximally suitable for element-wise scalar quantization. (1p)

Solution: Rotating the coordinate axes 45 degrees in either direction, e.g. as in Fig. 3, will make the transformed coordinates (Y_1, Y_2) statistically independent. Then two uniform scalar quantizers will have a good chance to perform well (although a VQ would still be slightly better, because of its more flexible space-filling properties).

(c) Design two scalar quantizers, one for element Y_1 and the other for Y_2, without any overload distortion. You can use a total data rate of R bits per vector, so you are free to allocate w bits for element Y_1 and R - w bits for element Y_2. What is the optimal choice of w for a general value of R? Calculate specifically the optimal w for R = 3. (3p)

Solution: The total quantization noise variance is

\sigma_q^2 = \frac{\Delta_1^2}{12} + \frac{\Delta_2^2}{12} = \frac{(2a)^2}{12}\,2^{-2w} + \frac{a^2}{12}\,2^{-2(R-w)}

Figure 3: Rotated coordinate system (y_1, y_2) corresponding to Fig. 2.

If w were a continuous-valued variable, we would obtain the minimum noise variance when

\frac{d\sigma_q^2}{dw} = \frac{d}{dw}\left[\frac{(2a)^2}{12}\,2^{-2w} + \frac{a^2}{12}\,2^{-2(R-w)}\right] = 0

with solution

4 \cdot 2^{-2w} = 2^{-2(R-w)}; \quad 2 - 2w = -2R + 2w; \quad w = (R+1)/2; \quad R - w = (R-1)/2

For simple symmetry reasons, we might have seen directly that we should spend 1 bit more for the dimension with range 2a. The derived result gives exact integer-valued solutions for any odd R. For R = 3 we obtain w = 2; R - w = 1. For even values of R we must choose the best of the two closest alternatives:

If we choose w = R - w = R/2,

\sigma_q^2 = \frac{(2a)^2\,2^{-R} + a^2\,2^{-R}}{12} = \frac{5a^2}{12}\,2^{-R}

If we choose w = (R+2)/2; R - w = (R-2)/2,

\sigma_q^2 = \frac{(2a)^2\,2^{-R-2} + a^2\,2^{-R+2}}{12} = \frac{5a^2}{12}\,2^{-R}

Thus, both solutions are equally good.
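The bit allocation above is easy to verify by enumeration. The following sketch (an added check, not part of the original solution; plain Python, with a = 1 as an arbitrary scale) confirms that w = 2 is optimal for R = 3, and that the two candidate allocations tie for even R:

```python
# Uniform quantizer noise Delta^2/12 per dimension; Y1 has range 2a, Y2 has range a.
def sq_noise(w, R, a=1.0):
    d1 = 2 * a / 2**w        # stepsize for Y1 with w bits
    d2 = a / 2**(R - w)      # stepsize for Y2 with R - w bits
    return d1**2 / 12 + d2**2 / 12

# Odd R: the optimum is w = (R+1)/2
R = 3
noise = {w: sq_noise(w, R) for w in range(R + 1)}
w_best = min(noise, key=noise.get)
print(w_best)                             # 2, matching w = (R+1)/2

# Even R: w = R/2 and w = (R+2)/2 give the same noise, (5a^2/12) 2^{-R}
R = 4
print(sq_noise(2, R), sq_noise(3, R))     # equal
```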

(d) How large are the quantization stepsizes \Delta_1 and \Delta_2 for the two scalar quantizers in the special case of R = 3 bits per vector? (2p)

Solution: \Delta_1 = \Delta_2 = a/2.

(e) Instead of the coordinate transformation and scalar quantization, we could have achieved roughly the same performance with a vector quantizer. Argue briefly why the transformation and scalar coding may still be preferred (especially at high data rates). (1p)

Solution: The uniform scalar quantizer is computationally much simpler than the general VQ solution. The VQ encoder must search through the entire VQ codebook for every input vector.

5. You will now design a Wiener filter for speech enhancement. The filter input is a noisy speech signal y(n) = x(n) + v(n), where x(n) represents the clean speech and v(n) is the additive noise. The speech and noise are assumed to be mutually independent and stationary random processes. The noise is white with power spectral density \Phi_{vv}(\omega) = 1, and the speech signal has a power spectral density

\Phi_{xx}(\omega) = 1 + \cos(\omega)

(a) Determine the autocorrelation sequence \phi_{xx}(k) for the speech signal. (2p)

Solution:

\phi_{xx}(k) = \int_{-\pi}^{\pi} \Phi_{xx}(\omega) e^{jk\omega} \frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \left(1 + \frac{e^{+j\omega} + e^{-j\omega}}{2}\right) e^{jk\omega} \frac{d\omega}{2\pi} = \begin{cases} 1, & k = 0 \\ 1/2, & k = \pm 1 \\ 0, & \text{otherwise} \end{cases}

(b) Design a causal FIR linear-phase Wiener filter with transfer function

H(z) = w_0 + w_1 z^{-1} + w_0 z^{-2}

to produce an optimal estimate of the clean speech signal, in the sense that it minimises the distortion measure

Q = E\left[(x(n-1) - \hat{x}(n))^2\right]

where \hat{x}(n) is the output signal from the Wiener filter. (6p) (Note that the delay in this distortion measure is relevant, because the linear-phase filter has a group delay of 1 sample, and we do not regard this filter delay as distortion.)

Solution: We will need the correlation functions

\phi_{xy}(k) = E[x(n+k)y(n)] = E[x(n+k)(x(n) + v(n))] = \phi_{xx}(k);
\phi_{yy}(k) = E[y(n+k)y(n)] = E[(x(n+k) + v(n+k))(x(n) + v(n))] = \phi_{xx}(k) + \phi_{vv}(k);
\phi_{vv}(k) = \delta(k)

The Wiener filter output signal is

\hat{x}(n) = w_0 y(n) + w_1 y(n-1) + w_0 y(n-2) = w_0 (y(n) + y(n-2)) + w_1 y(n-1)

The necessary conditions for minimal distortion are then

0 = \frac{\partial}{\partial w_0} E\left[(x(n-1) - \hat{x}(n))^2\right] = E\left[-2(x(n-1) - \hat{x}(n)) \frac{\partial \hat{x}(n)}{\partial w_0}\right]
= E\left[-2(x(n-1) - w_0(y(n) + y(n-2)) - w_1 y(n-1))(y(n) + y(n-2))\right]
= -2(\phi_{xy}(-1) + \phi_{xy}(1)) + 4w_0(\phi_{yy}(0) + \phi_{yy}(2)) + 2w_1(\phi_{yy}(-1) + \phi_{yy}(1))
= -4\phi_{xx}(1) + 4w_0(\phi_{yy}(0) + \phi_{yy}(2)) + 4w_1\phi_{yy}(1);

0 = \frac{\partial}{\partial w_1} E\left[(x(n-1) - \hat{x}(n))^2\right]
= E\left[-2(x(n-1) - w_0(y(n) + y(n-2)) - w_1 y(n-1))\,y(n-1)\right]
= -2\phi_{xx}(0) + 4w_0\phi_{yy}(1) + 2w_1\phi_{yy}(0)

Thus, we obtain the optimal filter coefficients from the linear equations

\begin{pmatrix} \phi_{yy}(0) + \phi_{yy}(2) & \phi_{yy}(1) \\ 2\phi_{yy}(1) & \phi_{yy}(0) \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} \phi_{xx}(1) \\ \phi_{xx}(0) \end{pmatrix}; \quad
\begin{pmatrix} 2 & 1/2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}

with solution

\begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} 1/7 \\ 3/7 \end{pmatrix}
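The 2x2 normal equations above can be verified numerically. The sketch below (an added check, assuming NumPy) fills in the correlation values for this problem, phi_xx(0) = 1, phi_xx(1) = 1/2, phi_yy(0) = 2, phi_yy(1) = 1/2, phi_yy(2) = 0, and solves for the coefficients:

```python
import numpy as np

# Correlation values for this problem: phi_yy(k) = phi_xx(k) + delta(k)
phi_yy0, phi_yy1, phi_yy2 = 2.0, 0.5, 0.0
phi_xx0, phi_xx1 = 1.0, 0.5

# Normal equations from setting both partial derivatives to zero
A = np.array([[phi_yy0 + phi_yy2, phi_yy1],
              [2 * phi_yy1,       phi_yy0]])
b = np.array([phi_xx1, phi_xx0])
w0, w1 = np.linalg.solve(A, b)
print(w0, w1)  # 1/7 and 3/7
```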

6. The probability density function of a two-dimensional random vector X = (X_1, X_2)^T is defined by an M-component Gaussian mixture model (GMM), with a known probability density function as illustrated in Fig. 4:

p_X(x) = \sum_{m=1}^{M} \frac{w_m}{(2\pi)^{2/2}\sqrt{\det C_m}} e^{-\frac{1}{2}(x - \mu_m)^T C_m^{-1} (x - \mu_m)}

Figure 4: GMM probability density function plotted with darker gray shading indicating greater probability density. The thin black curves connect points where the probability density equals 0.9, 1/\sqrt{e} \approx 0.6, and 0.2 times the maximal probability density.

(a) Estimate approximate values for all the GMM parameters, i.e. weights w_m, means \mu_m, and covariance matrices C_m, that define the probability density function shown in Fig. 4. Crude approximations are sufficient, if your theoretical reasoning is correct. (4p)

Solution: The symmetry shows that there are M = 2 Gaussian components, with equal weights w_1 = w_2 = 0.5. The two mean points are located approximately at the two maxima of the pdf,

i.e. approximately at

\mu_1 = \begin{pmatrix} -1.5 \\ 0 \end{pmatrix}; \quad \mu_2 = \begin{pmatrix} 1.5 \\ 0 \end{pmatrix}

The iso-density curves indicate that X_1 and X_2 are uncorrelated, with equal and diagonal covariance matrices C_1 = C_2. Any of the iso-density curves can be used, but it is slightly easier to use the second curve. For any scalar Gaussian distribution, the probability density is reduced from the maximum value by the factor

e^{-\frac{(x - \mu)^2}{2\sigma^2}}

This factor equals 1/\sqrt{e} when |x - \mu| = \sigma. Therefore, the standard deviations can be easily estimated from the second iso-density curve in the graph. Thus, \sigma_1 \approx 2.5 - 1.5 = 1 and \sigma_2 \approx 2, and the covariance matrices are

C_1 = C_2 = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}

(b) Determine the overall mean vector \mu = E[X] and covariance matrix C = \mathrm{cov}[X] for the random vector X. Express the result in terms of the general parameters w_m, \mu_m, C_m of the given GMM, so that the result is valid regardless of your numerical estimates of these parameters. (2p)

Solution: We can regard X as generated in a two-step procedure: First, a discrete random variable U is generated, with probability mass

P(U = m) = w_m; \quad m \in \{1, 2\}

This variable indicates which of the two Gaussian components is to be used to generate a vector with the conditional probability density of the selected component. Thus,

\mu = E[X] = E[X \mid U = 1] P(U = 1) + E[X \mid U = 2] P(U = 2) = \mu_1 w_1 + \mu_2 w_2

Similarly,

C = \mathrm{cov}[X] = E\left[(X - \mu)(X - \mu)^T\right] = \sum_{m=1}^{2} E\left[(X - \mu)(X - \mu)^T \mid U = m\right] P(U = m)
= \sum_{m=1}^{2} E\left[(X - \mu_m + \mu_m - \mu)(X - \mu_m + \mu_m - \mu)^T \mid U = m\right] P(U = m)
= \sum_{m=1}^{2} E\left[(X - \mu_m)(X - \mu_m)^T + (\mu_m - \mu)(\mu_m - \mu)^T \mid U = m\right] P(U = m)
+ \underbrace{\sum_{m=1}^{2} E\left[(X - \mu_m)(\mu_m - \mu)^T + (\mu_m - \mu)(X - \mu_m)^T \mid U = m\right] P(U = m)}_{= 0}
= \sum_{m=1}^{2} \left(C_m + (\mu_m - \mu)(\mu_m - \mu)^T\right) w_m
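The general formulas for the overall moments can be evaluated with the approximate parameters from part (a). The sketch below is an added illustration (it assumes NumPy, and the numbers are the rough estimates read off Fig. 4, so treat them as placeholders):

```python
import numpy as np

# GMM parameters as estimated in part (a) (approximate values from Fig. 4)
w = [0.5, 0.5]
mu = [np.array([-1.5, 0.0]), np.array([1.5, 0.0])]
C = [np.diag([1.0, 4.0]), np.diag([1.0, 4.0])]

# mu = sum_m w_m mu_m
mu_tot = sum(wm * mum for wm, mum in zip(w, mu))

# C = sum_m w_m (C_m + (mu_m - mu)(mu_m - mu)^T)
C_tot = sum(wm * (Cm + np.outer(mum - mu_tot, mum - mu_tot))
            for wm, mum, Cm in zip(w, mu, C))
print(mu_tot)
print(C_tot)
```

Note how the between-component spread adds 1.5^2 = 2.25 to the first diagonal entry of the covariance, on top of the within-component variance.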

(c) Now assume that two random vectors X_A and X_B are independent of each other and have identical density functions equal to the previously defined GMM density p_X(x) in Fig. 4. A new random vector is defined as

Y = X_A + X_B

Show that the probability density function for Y, denoted p_Y(y), can be expressed exactly by a GMM, and determine all parameters for this GMM, expressed in terms of the given parameters w_m, \mu_m, C_m for p_X(x). Numerical values are required only for the mixture weights in p_Y(y). (4p)

Hint, may be used without proof: If any two random vectors both have Gaussian distributions, not necessarily identical, then the sum of the two vectors also has a Gaussian distribution. Furthermore, if the vectors are independent, the mean of the sum is always equal to the sum of the two means, and the covariance of the sum is equal to the sum of the two covariance matrices.

Solution: Each of X_A and X_B may be generated from either component 1 or 2 of the original GMM. We use two discrete random variables U_A and U_B, each with outcomes 1 or 2, to denote these possibilities. Thus there are three distinct ways that Y can be generated:

1. Both X_A and X_B are generated from component 1, i.e. (U_A = 1, U_B = 1). This happens with probability w_{Y,1} = 0.25, and then

\mu_{Y,1} = E[X_A + X_B \mid U_A = U_B = 1] = 2\mu_1; \quad C_{Y,1} = \mathrm{cov}[X_A + X_B \mid U_A = U_B = 1] = 2C_1

2. Both X_A and X_B are generated from component 2, i.e. (U_A = 2, U_B = 2). This happens with probability w_{Y,2} = 0.25, and then

\mu_{Y,2} = E[X_A + X_B \mid U_A = U_B = 2] = 2\mu_2; \quad C_{Y,2} = \mathrm{cov}[X_A + X_B \mid U_A = U_B = 2] = 2C_2

3. X_A and X_B are generated from different components, i.e. (U_A = 1, U_B = 2) or (U_A = 2, U_B = 1). This happens with probability w_{Y,3} = 0.5, and then

\mu_{Y,3} = E[X_A + X_B \mid U_A \neq U_B] = \mu_1 + \mu_2; \quad C_{Y,3} = \mathrm{cov}[X_A + X_B \mid U_A \neq U_B] = C_1 + C_2

Thus, the probability density for Y is exactly a 3-component GMM, with weight factors w_{Y,1} = 0.25, w_{Y,2} = 0.25, and w_{Y,3} = 0.5.
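The three-component structure can also be seen by enumerating the component pairs of (X_A, X_B) and merging pairs that produce identical Gaussian components. The sketch below is an added illustration (assuming NumPy, with the part (a) estimates as placeholder parameters):

```python
import numpy as np

# p_X component parameters (approximate values from part (a))
w = [0.5, 0.5]
mu = [np.array([-1.5, 0.0]), np.array([1.5, 0.0])]
C = [np.diag([1.0, 4.0]), np.diag([1.0, 4.0])]

# Y = X_A + X_B: each pair (m_A, m_B) yields a Gaussian component with
# weight w[m_A]*w[m_B], mean mu[m_A]+mu[m_B], covariance C[m_A]+C[m_B].
# The cross pairs (1,2) and (2,1) give identical components and merge.
comps = {}
for mA in (0, 1):
    for mB in (0, 1):
        key = tuple(sorted((mA, mB)))
        if key in comps:
            comps[key][0] += w[mA] * w[mB]
        else:
            comps[key] = [w[mA] * w[mB], mu[mA] + mu[mB], C[mA] + C[mB]]

weights = sorted(c[0] for c in comps.values())
print(len(comps), weights)  # 3 [0.25, 0.25, 0.5]
```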