Speech Signal Processing: An Overview

S. R. M. Prasanna
EMST Lab, Department of Electronics and Electrical Engineering (EEE)
Indian Institute of Technology Guwahati (IITG)
December 20, 2012

Organization
- Introduction
- Sampling frequency and bit resolution
- Non-stationary nature
- Short term processing
- STFT and Spectrogram
- Energy and Pitch
- Cepstral analysis
- Linear prediction analysis
- Speech processing tasks
- Summary

Introduction
- Speech: fundamental and effortless mode of communication among humans
- Speech communication: talker, listener and channel
- Speech production process: message formulation, language coding, neuro-muscular commands, movement of speech production organs, acoustic pressure variations
- Speech perception process: acoustic pressure variations, movement of speech perception organs, neuro-muscular commands, message comprehension

What is present in Speech Signal?
- Message
- Speaker
- Emotion
- Language
- Dialect
- Sensor
- Channel
How do we analyze, extract and model this information?

Sampling Frequency
- Acoustic pressure variations are converted to an electrical signal using a microphone
- Digitization for storage, analysis and processing on a digital machine: sampling, quantization and encoding
- Sampling theorem: the sampling frequency should be at least twice the maximum frequency present in the signal
- Audio frequency range: 20 Hz to 20 kHz
- Speech components extend up to 14 kHz, but the whole audio range can be considered
- Minimum recommended sampling frequency: 40 kHz
- Including some guard band: 48 kHz
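
The sampling theorem on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `min_sampling_rate` and `alias_frequency` are made up for this example, and the 16 kHz rate in the last line is an assumed value chosen to show what happens when the rate is too low for a 14 kHz component.

```python
def min_sampling_rate(f_max_hz):
    """Lowest sampling frequency permitted by the sampling theorem."""
    return 2 * f_max_hz


def alias_frequency(f_hz, fs_hz):
    """Frequency at which a real tone at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz            # sampling cannot tell f apart from f + k * fs
    return min(f, fs_hz - f)    # fold into the representable band [0, fs / 2]


print(min_sampling_rate(20_000))         # 40000
print(alias_frequency(14_000, 48_000))   # 14000, the component is preserved
print(alias_frequency(14_000, 16_000))   # 2000, the component is aliased
```

At 48 kHz every component in the 20 Hz to 20 kHz audio range lies below half the sampling frequency, so none of it folds back.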

Bit Rate
- Bit resolution: number of bits per sample
- Number of quantization levels: 2^B for B bits per sample
- A minimum of 16 bits per sample is recommended
- Bit rate = sampling frequency × number of bits per sample
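
The quantities on this slide are related by simple formulas, sketched below. The function names are hypothetical. The signal-to-quantization-noise figure uses the common approximation of about 6.02 dB per bit plus 1.76 dB, which assumes a uniform quantizer driven by a full-scale sinusoid; it is an idealized estimate, not a property of any particular recording chain.

```python
def quantization_levels(bits_per_sample):
    """Number of distinct amplitude levels for a given bit resolution."""
    return 2 ** bits_per_sample


def bit_rate_bps(fs_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels


def sqnr_db(bits_per_sample):
    """Approximate SQNR of a uniform quantizer for a full-scale sinusoid."""
    return 6.02 * bits_per_sample + 1.76


print(quantization_levels(16))      # 65536
print(bit_rate_bps(48_000, 16))     # 768000
print(round(sqnr_db(16), 2))        # 98.08
```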

Non-Stationary Nature
- Signal, system, and signals and systems
- Stationary vs non-stationary signal
- Significance of non-stationary nature of speech

Short Term Processing
- Need for short term processing
- Approach for short term processing
- Frame size and frame shift
- Short term time domain processing
- Short term frequency domain processing
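
Frame size and frame shift can be made concrete with a small framing routine. This is a minimal sketch: the 20 ms frame size, 10 ms frame shift and 8 kHz sampling frequency are assumed values that are common in practice but are not stated on the slide, and `ms_to_samples` and `split_frames` are hypothetical helper names.

```python
def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * fs_hz / 1000.0))


def split_frames(signal, frame_size, frame_shift):
    """Cut a signal into overlapping frames.

    Trailing samples that do not fill a complete frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += frame_shift
    return frames


fs_hz = 8000                              # assumed sampling frequency
frame_size = ms_to_samples(20, fs_hz)     # assumed 20 ms frame
frame_shift = ms_to_samples(10, fs_hz)    # assumed 10 ms shift (50% overlap)
signal = [0.0] * fs_hz                    # placeholder: one second of samples
frames = split_frames(signal, frame_size, frame_shift)
print(frame_size, frame_shift, len(frames))   # 160 80 99
```

Each frame is short enough that speech can be treated as approximately stationary within it, which is what makes the time domain and frequency domain parameters meaningful.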

Short Term Time Domain Parameters
- Short term energy
- Short term zero crossing rate
- Short term autocorrelation
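
The three parameters listed above can be computed directly from one frame. The sketch below uses common textbook definitions (sum of squares, fraction of sign changes, and the sum of lagged products); the slide does not give formulas, so these definitions, the function names, and the synthetic 100 Hz test frame at an assumed 8 kHz sampling frequency are all assumptions made for illustration.

```python
import math


def short_term_energy(frame):
    """Sum of squared sample values in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_autocorrelation(frame, lag):
    """Sum of products of samples that are `lag` positions apart."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


fs_hz = 8000     # assumed sampling frequency
f0_hz = 100      # synthetic tone, so one period is 80 samples
# Half-sample offset keeps every sample away from an exact zero.
frame = [math.sin(2 * math.pi * f0_hz * (n + 0.5) / fs_hz) for n in range(160)]

energy = short_term_energy(frame)
zcr = zero_crossing_rate(frame)
r_period = short_term_autocorrelation(frame, 80)   # lag of one period
r_half = short_term_autocorrelation(frame, 40)     # lag of half a period
print(round(energy, 3), round(zcr, 4), round(r_period, 3), round(r_half, 3))
```

The autocorrelation is positive at a lag of one period and negative at half a period, which is the property that autocorrelation based pitch estimation relies on.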

Short Term Frequency Domain Parameters
- Short term Fourier transform
- DTFT, STFT, DFT, FFT
- Spectrogram
  - Wideband spectrogram
  - Narrowband spectrogram
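
A spectrogram is the magnitude of the short term Fourier transform laid out over time, and the wideband versus narrowband distinction comes down to the window length. The sketch below is a minimal illustration under stated assumptions: it evaluates the DFT directly (an FFT would give the same values faster), uses a Hamming window, and picks 32 and 256 samples at an assumed 8 kHz sampling frequency (4 ms and 32 ms) as stand-ins for a short and a long window. All function names are hypothetical.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2, evaluated directly in O(N^2)."""
    n_samples = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, x in enumerate(frame)))
            for k in range(n_samples // 2 + 1)]


def stft_magnitude(signal, frame_size, frame_shift):
    """One magnitude spectrum per windowed frame (rows of a spectrogram)."""
    window = hamming(frame_size)
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, frame_shift):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectra.append(dft_magnitude(frame))
    return spectra


fs_hz = 8000   # assumed sampling frequency
signal = [math.sin(2 * math.pi * 1000 * n / fs_hz) for n in range(512)]

results = {}
for frame_size in (32, 256):   # short window (wideband), long window (narrowband)
    spectra = stft_magnitude(signal, frame_size, frame_size // 2)
    first = spectra[0]
    peak_bin = max(range(len(first)), key=first.__getitem__)
    bin_spacing_hz = fs_hz / frame_size
    results[frame_size] = (len(spectra), bin_spacing_hz, peak_bin * bin_spacing_hz)
    print(frame_size, *results[frame_size])
```

The short window yields many frames with coarse frequency spacing (250 Hz per bin here), while the long window yields few frames with fine spacing (31.25 Hz per bin), which is the time versus frequency resolution trade-off between the two kinds of spectrogram.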

Cepstral Analysis of Speech
- Separation of source and system components in cepstral domain
- Feature extraction stage of automatic speech processing systems
- Also used in estimation of pitch: cepstrum pitch determination
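
Cepstrum pitch determination can be sketched as follows: take the log magnitude spectrum of a windowed frame, transform it back, and look for a peak at high quefrency, where the periodic excitation shows up separately from the slowly varying vocal tract envelope. Everything specific here is an assumption made for illustration: the synthetic "voiced" frame (an impulse train with an 80-sample period through a single two-pole resonator near 700 Hz), the 8 kHz sampling frequency, the 60 to 400 Hz search range, and the function names.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT; an FFT would give the same values faster."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]


def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of a Hamming-windowed frame."""
    n_samples = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1)))
                for n, x in enumerate(frame)]
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(windowed)]  # guard log(0)
    # The log magnitude is real and even, so the inverse DFT is a cosine sum.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n_samples)
                for k in range(n_samples)) / n_samples
            for q in range(n_samples)]


fs_hz = 8000                                   # assumed sampling frequency
period = 80                                    # assumed pitch period (100 Hz)
radius, theta = 0.9, 2 * math.pi * 700 / fs_hz  # assumed resonance near 700 Hz

# Synthetic voiced speech: impulse train (source) through a resonator (system).
signal = []
for n in range(800):
    excitation = 1.0 if n % period == 0 else 0.0
    y1 = signal[-1] if n >= 1 else 0.0
    y2 = signal[-2] if n >= 2 else 0.0
    signal.append(excitation + 2 * radius * math.cos(theta) * y1 - radius ** 2 * y2)

cepstrum = real_cepstrum(signal[400:800])      # 50 ms frame after the start-up
q_min, q_max = fs_hz // 400, fs_hz // 60       # assumed 60 to 400 Hz pitch range
pitch_period = max(range(q_min, q_max + 1), key=cepstrum.__getitem__)
pitch_hz = fs_hz / pitch_period
print(pitch_period, pitch_hz)
```

The low quefrency part of the same cepstrum carries the system (envelope) information, which is why truncated cepstral coefficients are used as features.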

Linear Prediction Analysis of Speech
- Separation of source and system components in time domain
- Filter coefficients for speech coding
- Pitch estimation by SIFT (simplified inverse filter tracking)
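
Linear prediction models each sample as a weighted sum of the previous samples; the weights describe the system, and the prediction error (residual) approximates the source. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard way to obtain the coefficients but is not named on the slide. The test signal is the impulse response of an assumed two-pole filter, chosen because the expected coefficients are then known; the 8 kHz sampling frequency and all names are likewise assumptions.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation values R(0)..R(max_lag) of a frame."""
    return [sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns the coefficients a[1..order] of the predictor
    x_hat[n] = sum_k a[k] * x[n - k], and the final prediction error energy.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


fs_hz = 8000                                    # assumed sampling frequency
radius, theta = 0.9, 2 * math.pi * 700 / fs_hz  # assumed resonance near 700 Hz
true_a = [2 * radius * math.cos(theta), -radius ** 2]

# Impulse response of the all-pole system: the source is a single impulse.
x = []
for n in range(400):
    x1 = x[-1] if n >= 1 else 0.0
    x2 = x[-2] if n >= 2 else 0.0
    x.append((1.0 if n == 0 else 0.0) + true_a[0] * x1 + true_a[1] * x2)

lp_coeffs, residual_energy = levinson_durbin(autocorrelation(x, 2), 2)

# Inverse filtering removes the system and leaves an estimate of the source.
residual = [x[n] - sum(a * x[n - k]
                       for k, a in enumerate(lp_coeffs, start=1) if n - k >= 0)
            for n in range(len(x))]
print(lp_coeffs, true_a)
print(residual[0], max(abs(e) for e in residual[1:]))
```

In this idealized case the estimated coefficients match the filter that generated the signal and the residual collapses back to the single input impulse. With real speech the separation is only approximate, and the residual retains the pitch pulses that inverse filter based pitch estimators look for.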

Automatic Speech Processing Tasks
- Speech recognition
- Speaker recognition
- Speech synthesis
- Language identification