L6: Short-time Fourier analysis and synthesis

Similar documents
L9: Cepstral analysis

Analysis/resynthesis with the short time Fourier transform

Short-time FFT, Multi-taper analysis & Filtering in SPM12

Lecture 1-10: Spectrograms

Lecture 8 ELE 301: Signals and Systems

SGN-1158 Introduction to Signal Processing Test. Solutions

How To Understand The Nyquist Sampling Theorem

Design of FIR Filters

Auto-Tuning Using Fourier Coefficients

B3. Short Time Fourier Transform (STFT)

Speech Signal Processing: An Overview

5 Signal Design for Bandlimited Channels

FFT Algorithms. Chapter 6. Contents 6.1

Frequency Response of FIR Filters

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS

The continuous and discrete Fourier transforms

The Fourier Analysis Tool in Microsoft Excel

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

Review of Fourier series formulas. Representation of nonperiodic functions. ECE 3640 Lecture 5 Fourier Transforms and their properties

Analog and Digital Signals, Time and Frequency Representation of Signals

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.

Time Series Analysis: Introduction to Signal Processing Concepts. Liam Kilmartin Discipline of Electrical & Electronic Engineering, NUI, Galway

Probability and Random Variables. Generation of random variables (r.v.)

SIGNAL PROCESSING & SIMULATION NEWSLETTER

Frequency Response of Filters

Applications of the DFT

Adding Sinusoids of the Same Frequency. Additive Synthesis. Spectrum. Music 270a: Modulation

The Algorithms of Speech Recognition, Programming and Simulating in MATLAB

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

The Effective Number of Bits (ENOB) of my R&S Digital Oscilloscope Technical Paper

Lecture 1-6: Noise and Filters

PYKC Jan Lecture 1 Slide 1

Introduction to IQ-demodulation of RF-data

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

1.4 Fast Fourier Transform (FFT) Algorithm

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

FAST Fourier Transform (FFT) and Digital Filtering Using LabVIEW

EE 179 April 21, 2014 Digital and Analog Communication Systems Handout #16 Homework #2 Solutions

Implementation of Digital Signal Processing: Some Background on GFSK Modulation

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP

Doppler. Doppler. Doppler shift. Doppler Frequency. Doppler shift. Doppler shift. Chapter 19

SAMPLE SOLUTIONS DIGITAL SIGNAL PROCESSING: Signals, Systems, and Filters Andreas Antoniou

Chapter 8 - Power Density Spectrum

MODULATION Systems (part 1)

RF Measurements Using a Modular Digitizer

chapter Introduction to Digital Signal Processing and Digital Filtering 1.1 Introduction 1.2 Historical Perspective

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Angle Modulation, II. Lecture topics FM bandwidth and Carson s rule. Spectral analysis of FM. Narrowband FM Modulation. Wideband FM Modulation

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics:

Matlab GUI for WFB spectral analysis

AN Application Note: FCC Regulations for ISM Band Devices: MHz. FCC Regulations for ISM Band Devices: MHz

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS. UIUC Physics 193 POM

Window function From Wikipedia, the free encyclopedia For the term used in SQL statements, see Window function (SQL)

10 Discrete-Time Fourier Series

Solutions to Exam in Speech Signal Processing EN2300

Artificial Neural Network for Speech Recognition

(2) (3) (4) (5) 3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts

PeakVue Analysis for Antifriction Bearing Fault Detection

SWISS ARMY KNIFE INDICATOR John F. Ehlers

Lab 1. The Fourier Transform

SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY

Appendix D Digital Modulation and GMSK

MATRIX TECHNICAL NOTES

The front end of the receiver performs the frequency translation, channel selection and amplification of the signal.

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

UNIVERSITY OF CALIFORNIA, SAN DIEGO Electrical & Computer Engineering Department ECE Fall 2010 Linear Systems Fundamentals

The Calculation of G rms

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Understanding CIC Compensation Filters

Spectrum Level and Band Level

Realtime FFT processing in Rohde & Schwarz receivers

Time series analysis Matlab tutorial. Joachim Gross

AM Receiver. Prelab. baseband

Correlation and Convolution Class Notes for CMSC 426, Fall 2005 David Jacobs

Aliasing, Image Sampling and Reconstruction

PCM Encoding and Decoding:

CONVOLUTION Digital Signal Processing

Thirukkural - A Text-to-Speech Synthesis System

Introduction to Digital Filters

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

9 Fourier Transform Properties

Jitter Measurements in Serial Data Signals

Digital Transmission (Line Coding)

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

4 Digital Video Signal According to ITU-BT.R.601 (CCIR 601) 43

Real-Time Spectrum Analyzer Fundamentals

How to avoid typical pitfalls when measuring Variable Frequency Drives

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer

T = 10-' s. p(t)= ( (t-nt), T= 3. n=-oo. Figure P16.2

Agilent Creating Multi-tone Signals With the N7509A Waveform Generation Toolbox. Application Note

(Refer Slide Time: 01:11-01:27)

T = 1 f. Phase. Measure of relative position in time within a single period of a signal For a periodic signal f(t), phase is fractional part t p

Section 1.1 Linear Equations: Slope and Equations of Lines

Transcription:

L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude This lecture is based on chapter 7 of [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 1

Recap from previous lectures Overview Discrete time Fourier transform (DTFT) Taking the expression of the Fourier transform X jω = x(t)e jωt dt, the DTFT can be derived by numerical integration X e jω = x n e jωn where x n = x nt S and ω = 2πF F S Discrete Fourier transform (DFT) The DFT is obtained by sampling the DTFT at N discrete frequencies ω k = 2πF s N, which yields the transform X k = N 1 n=0 x n e j2π N kn Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 2

Why is another Fourier transform needed? The spectral content of speech changes over time (non stationary) As an example, formants change as a function of the spoken phonemes Applying the DFT over a long window does not reveal transitions in spectral content To avoid this issue, we apply the DFT over short periods of time For short enough windows, speech can be considered to be stationary Remember, though, that there is a time-frequency tradeoff here x(t) 1.5 1 0.5 0-0.5 X(f) 50 40 30 20 SFTF (Hz) 490 390 290 200-1 10 100-1.5 50 100 150 200 time (sa.) 0 500 1000 frequency (Hz) 0 20 40 60 time (frames) Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 3

The short-time Fourier transform in a nutshell Define analysis window (e.g., 30ms narrowband, 5 ms wideband) Define the amount of overlap between windows (e.g., 30%) Define a windowing function (e.g., Hann, Gaussian) Generate windowed segments (multiply signal by windowing function) Apply the FFT to each windowed segment [Sethares, 2007] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 4

Windowing function STFT: Fourier analysis view To localize the speech signal in time, we define a windowing function w n, τ, which is generally tapered at its ends to avoid unnatural discontinuities in the speech segment Any window affects the spectral estimate computed on it The window is selected to trade off the width of its main lobe and attenuation of its side lobes The most common are the Hann and Hamming windows (raised cosines) w n, τ = 0.54 0.4 cos w n, τ = 0.5 1 cos 2π n τ N w 1 2π n τ N 1 http://en.wikipedia.org/wiki/window_function Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 5

Rectangular Hann Hamming http://en.wikipedia.org/wiki/window_function Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 6

Discrete-time Short-time Fourier transform The Fourier transform of the windowed speech waveform is defined as X n, ω = x m w n m e jωn m= where the sequence f n m = x m w n m is a short-time section of the speech signal x m at time n Discrete STFT By analogy with the DTFT/DFT, the discrete STFT is defined as X n, k = X n, ω 2π ω= N k The spectrogram we saw in previous lectures is a graphical display of the magnitude of the discrete STFT, generally in log scale S n, k = log X n, k 2 This can be thought of as a 2D plot of the relative energy content in frequency at different time locations Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 7

For a long window w n, the result is the narrowband spectrogram, which exhibits the harmonic structure in the form of horizontal striations For a short window w n, the result is the wideband spectrogram, which exhibits periodic temporal structure in the form of vertical striations 1 5000 narrowband 5000 wideband 4000 4000 x(t) 0.5 0 SFTF (Hz) 3000 2000 SFTF (Hz) 3000 2000 1000 1000-0.5 500 1000 1500 2000 2500 time (sa.) 0 50 100 150 200 time (frames) 0 50 100 150 200 250 time (frames) Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 8

STFT: filtering view The STFT can also be interpreted as a filtering operation In this case, the analysis window w n plays the role of the filter impulse response To illustrate this view, we fix the value of ω at ω 0, and rewrite X n, ω 0 = x m e jω0m w n m m= which can be interpreted as the convolution of the signal x n e jω 0n with the sequence w n : X n, ω 0 = x n e jω0n w n and the product x n e jω 0n can be interpreted as the modulation of x n up to frequency ω 0 (i.e., per the frequency shift property of the FT) Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 9

[Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 10

Alternatively, we can rearrange as (without proof) X n, ω 0 = e jω 0n x n w n e jω 0n In this case, the sequence x n is first passed through the same filter (with a linear phase factor e jω 0n ), and the filter output is demodulated by e jω 0n [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 11

This later rearrangement allows us to interpret the discrete STFT as the output of a filter bank X n, k = e j2π N kn x n w n e j2π N kn Note that each filter is acting as a bandpass filter centered around its selected frequency Thus, the discrete STFT can be viewed as a collection of sequences, each corresponding to the frequency components of x n falling within a particular frequency band This filtering view is shown in the next slide, both from the analysis side and from the synthesis (reconstruction) side Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 12

analysis [Quatieri, 2002] synthesis Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 13

Examples ex6p1.m Generate STFT using Matlab functions ex6p2.m Generate filterbank outputs using the filtering view of the STFT ex6p3.m Time-frequency resolution tradeoff (Quatieri fig 7.8) Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 14

Short-time synthesis Under what conditions is the STFT invertible? The discrete-time STFT X n, ω is generally invertible Recall that X n, ω = f n m e jωn m= with f n m = x m w n m Evaluating f n [m] at m = n we obtain f n [n] = x n w 0 So assuming that w 0 0, we can estimate x n as x n = 1 X n, ω e jωn dω 2πw 0 This is known as a synthesis equation for the DT STFT π π Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 15

Amplitude Redundancy of the discrete-time STFT There are many synthesis equations that map X n, ω uniquely to x n Therefore, the STFT is very redundant if we move the analysis window one sample at a time n = 1,2,3 For this reason, the STFT is generally computed by decimating over time, that is, at integer multiples n = L, 2L, 3L For large L, however, the DT STFT may become non-invertible As an example, assume that w n is nonzero over its length N w In this case, when L > N w, there are some samples of x n that are not included in the computation of X n, ω Thus, these samples can have arbitrary values yet yield the same X kl, ω Since X kl, ω is not uniquely defined, it is not invertible N w Unaccounted temporal samples L 2L 3L n Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 16

Amplitude Likewise, the discrete STFT x n, k is not always invertible Consider the case where w n is band-limited with bandwidth B If the sampling interval 2π N is greater than B, some of the frequency components in x n do not pass through any of the filters of the STFT Thus, those frequency components can have any arbitrary values yet produce the same discrete STFT In consequence, depending on the frequency sampling resolution, the discrete STFT may become non invertible B Lost spectral region 2π N 2 2π N 3 2π N ω [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 17

Synthesis: filter bank summation FBS is based on the filtering interpretation of the STFT As we saw earlier, according to this interpretation the discrete STFT is considered to be the set of outputs from a bank of filters In the FSB method, the output of each filter is modulated with a complex exponential, and these outputs are summed to recover the original signal y n = 1 Nw 0 m= X n, k e j2π N nk [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 18

Under which conditions does FBS yield exact synthesis? It can be shown that y n = x n if either 1. The length of w n is less than or equal to the no. of filters N w > N, or 2. For N w > N: N 1 W ω 2π k = Nw 0 k=0 N The latter is known as the BFS constraint, and states that the frequency response of the analysis filters should sum to a constant across the entire bandwidth [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 19

Synthesis: Overlap-add OLA is based on the Fourier transform view of the STFT In the OLA method, we take the inverse DFT for each fixed time in the discrete STFT In principle, we could then divide by the analysis window This method is not used, however, as small perturbations in the STFT can become amplified in the estimated signal y n Instead, we perform an OLA operation between the sections This works provided that w n is designed such that the OLA effectively eliminates the analysis windows from the synthesized sequence The intuition is that the redundancy within overlapping segments and the averaging of the redundant samples averages out the effect of windowing Thus, the OLA method can be expressed as y n = 1 N 1 X p, k e j2π N kn W 0 k=0 p= where the term inside the square brackets is the IDFT Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 20

Under which conditions does OLA yield exact synthesis? It can be shown that if the discrete STFT has been decimated by a factor L, the condition y n = x n is met when which holds when either p= w pl n = W 0 1. The analysis window has finite bandwidth with maximum frequency ω c less than 2π/L, or 2. The sum of all the analysis windows (obtained by sliding w n with L-point increments) adds up to a constant In this case, x n can then be resynthesized as x n = L W 0 p= 1 N N 1 k=0 L X pl, k e j2π N kn [Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 21

[Quatieri, 2002] Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 22

STFT magnitude The spectrogram (STFT magnitude) is widely used in speech For one, evidence suggests that the human ear extracts information strictly from a spectrogram representation of the speech signal Likewise, trained researchers can visually read spectrograms, which further indicates that the spectrogram retains most of the information in the speech signal (at least at the phonetic level) Hence, one may question whether the original signal x n can be recovered from X n, ω, that is, by ignoring phase information Inversion of the STFTM Several methods may be used to estimate x n from the STFTM Here we focus on a fairly intuitive least-squares approximation Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 23

Least-squares estimation from the STFT magnitude In this approach, we seek to estimate a sequence x e n whose STFT magnitude X e n, ω is closest (in a least-squared-error sense) to the known STFT magnitude X n, ω The iteration takes place as follows An arbitrary sequence (usually white noise) is selected as the first estimate x e 1 n We then compute the STFT of x 1 e n and modify it by replacing its magnitude by that of X n, ω X 1 m, ω = X m, ω X e i m, ω i X e m, ω From this, we obtain a new signal estimate as i 1 x i m= w m n g m n e n = w 2 m n m= whereg m i 1 n is the inverse DFT of X i 1 m, ω And the process continues iteratively until convergence or a stopping criterion is met Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 24

It can be shown that this process reduces the distance between X e n, ω and X n, ω at each iteration Thus, the process converges to a local minimum, though not necessarily a global minimum All steps in the iteration can be summarized as (Quatieri, 2002; p. 342) w m n 1 π x i+1 m= X e n = 2π i m, ω e jωn dω π w 2 m n m= where X i m, ω = X m, ω X e i m,ω X e i m,ω Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 25

Example ex6p4.m Estimate a signal from its STFT magnitude Introduction to Speech Processing Ricardo Gutierrez-Osuna CSE@TAMU 26