A Multichannel Feature Compensation Approach for Robust ASR in Noisy and Reverberant Environments

Similar documents
Integration of Short-Time Fourier Domain Speech Enhancement and Observation Uncertainty Techniques for Robust Automatic Speech Recognition

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Advanced Signal Processing and Digital Noise Reduction

4F7 Adaptive Filters (and Spectrum Estimation) Least Mean Square (LMS) Algorithm Sumeetpal Singh Engineering Department sss40@eng.cam.ac.

ADAPTIVE ALGORITHMS FOR ACOUSTIC ECHO CANCELLATION IN SPEECH PROCESSING

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 23, NO. 9, SEPTEMBER

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

ACOUSTIC BEAMFORMING FOR HEARING AID APPLICATIONS

L9: Cepstral analysis

Background 2. Lecture 2 1. The Least Mean Square (LMS) algorithm 4. The Least Mean Square (LMS) algorithm 3. br(n) = u(n)u H (n) bp(n) = u(n)d (n)

Developing an Isolated Word Recognition System in MATLAB

Analysis/resynthesis with the short time Fourier transform

Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs

An introduction to OBJECTIVE ASSESSMENT OF IMAGE QUALITY. Harrison H. Barrett University of Arizona Tucson, AZ

TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

Component Ordering in Independent Component Analysis Based on Data Power

The Algorithms of Speech Recognition, Programming and Simulating in MATLAB

Short-time FFT, Multi-taper analysis & Filtering in SPM12

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

Since it is necessary to consider the ability of the lter to predict many data over a period of time a more meaningful metric is the expected value of

MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments

Handling attrition and non-response in longitudinal data

Computational Foundations of Cognitive Science

Probability and Random Variables. Generation of random variables (r.v.)

A Statistical Framework for Operational Infrasound Monitoring

Artificial Neural Network for Speech Recognition

Kalman Filter Applied to a Active Queue Management Problem

Understanding and Applying Kalman Filtering

High Quality Integrated Data Reconstruction for Medical Applications

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

Computer exercise 2: Least Mean Square (LMS)

Sequential Tracking in Pricing Financial Options using Model Based and Neural Network Approaches

High-Resolution Angle Estimation for an Automotive FMCW Radar Sensor

Emotion Detection from Speech

Adaptive HSI Data Processing for Near-Real-time Analysis and Spectral Recovery *

Kristine L. Bell and Harry L. Van Trees. Center of Excellence in C 3 I George Mason University Fairfax, VA , USA kbell@gmu.edu, hlv@gmu.

Ericsson T18s Voice Dialing Simulator

ADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING

Solutions to Exam in Speech Signal Processing EN2300

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

Java Modules for Time Series Analysis

5 Tips For Making the Most Out of Any Available Opportunities

MIMO CHANNEL CAPACITY

Object Recognition and Template Matching

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Project Report. Noise cancellation in a teleconference

Hardware Implementation of Probabilistic State Machine for Word Recognition

CM0340 SOLNS. Do not turn this page over until instructed to do so by the Senior Invigilator.

THE SIMULATION OF MOVING SOUND SOURCES. John M. Chowning

4F7 Adaptive Filters (and Spectrum Estimation) Kalman Filter. Sumeetpal Singh sss40@eng.cam.ac.uk

Bildverarbeitung und Mustererkennung Image Processing and Pattern Recognition

High Performance GPU-based Preprocessing for Time-of-Flight Imaging in Medical Applications

Security Based Data Transfer and Privacy Storage through Watermark Detection

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

Real-time audio signal enhancement for hands-free speech applications

Question 2: How do you solve a matrix equation using the matrix inverse?

Stability of the LMS Adaptive Filter by Means of a State Equation

Speech Signal Processing: An Overview

Available from Deakin Research Online:

Computational Optical Imaging - Optique Numerique. -- Deconvolution --


ECE 468 / CS 519 Digital Image Processing. Introduction

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Real-time Visual Tracker by Stream Processing

Patch-Based Near-Optimal Image Denoising Priyam Chatterjee, Student Member, IEEE, and Peyman Milanfar, Fellow, IEEE

Part II Redundant Dictionaries and Pursuit Algorithms

Em bedded DSP : I ntroduction to Digital Filters

How to Improve the Sound Quality of Your Microphone

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Denoising Convolutional Autoencoders for Noisy Speech Recognition

Admin stuff. 4 Image Pyramids. Spatial Domain. Projects. Fourier domain 2/26/2008. Fourier as a change of basis

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

Speech Recognition on Cell Broadband Engine UCRL-PRES

B3. Short Time Fourier Transform (STFT)

Quantifying Seasonal Variation in Cloud Cover with Predictive Models

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

Active Vibration Isolation of an Unbalanced Machine Spindle

C. IDENTIFICATION AND SIGNIFICANCE OF THE PROBLEM OR OPPORTUNITY

Alignment and Preprocessing for Data Analysis

A WEB BASED TRAINING MODULE FOR TEACHING DIGITAL COMMUNICATIONS

Transmit Subaperturing for MIMO Radars With Co-Located Antennas Hongbin Li, Senior Member, IEEE, and Braham Himed, Fellow, IEEE

Imputing Missing Data using SAS

FFT Algorithms. Chapter 6. Contents 6.1

A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

Classic EEG (ERPs)/ Advanced EEG. Quentin Noirhomme

Analyzing Portfolio Expected Loss

COMPARISON OF SPATIAL AUDIO TECHNIQUES FOR USE IN STAGE ACOUSTIC LABORATORY EXPERIMENTS

HowHow To Choose A Good Stock Broker

Dynamic Neural Networks for Actuator Fault Diagnosis: Application to the DAMADICS Benchmark Problem

PTE505: Inverse Modeling for Subsurface Flow Data Integration (3 Units)

5 Signal Design for Bandlimited Channels

Lecture 8: Signal Detection and Noise Assumption

The Filtered-x LMS Algorithm

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.

RF Measurements Using a Modular Digitizer

How To Create A Beat Tracking Effect On A Computer Or A Drumkit

Transcription:

A Multichannel Feature Compensation Approach for Robust ASR in Noisy and Reverberant Environments Ramón F. Astudillo 1 Sebastian Braun 2 Emanuël A. P. Habets 2 1 Spoken Language Systems Laboratory, INESC-ID-Lisboa Lisboa, Portugal 2 International Audio Laboratories Erlangen Am Wolfsmantel 33, 91058 Erlangen, Germany

Overview of the Proposed System The approach integrates STFT-domain enhancement with the ASR system through Uncertainty Propagation. Three main components detailed: Joint reverberation and noise reduction by informed spatial filtering applied in STFT domain. Multichannel MMSE-MFCC estimator with different STFT configurations for enhancement and recognition domains. Model-based feature enhancement using the MSE of the MMSE-MFCC estimator and Modified Imputation.

Joint reverberation and noise reduction Signal model: single source S(k, n), propagation vector d(k, n), reverberation r(k, n) and additive noise v(k, n) y(k, n) = d(k, n)s(k, n) + r(k, n) + v(k, n) All components mutually uncorrelated with variances equal to Φ y (k, n) = φ S (k, n) d(k, n)d H (k, n) + φ R (k, n) Γ diff (k) + Φ v (k, n) Multichannel minimum MSE (M-MMSE) source estimate: Ŝ M-MMSE (k, n) = arg min Ŝ(k,n) E { S(k, n) Ŝ(k, n) 2} = H MMSE (k, n) h MVDR (k, n) H y(k, n) }{{} h M-MMSE (k,n)

Joint reverberation and noise reduction Optional use of multichannel MMSE Amplitude (M-STSA) estimate: Ŝ M-STSA (k, n) = H STSA (k, n) h MVDR (k, n) H y(k, n) }{{} h M-STSA (k,n) Parameter estimation per time-frequency DOA for d(k, n): Beamspace root-music (circular array) [Zoltowski et al. 1992] Diffuse PSD φ R (k, n): maximum likelihood estimator [Braun 2013 et al.] Noise covariance matrix Φ v (k, n): speech presence probability based recursive estimation [Souden 2011 et al.]

Joint reverberation and noise reduction... STFT... Multichannel MMSE-STSA STFT MFCC ISTFT ASR SE Stage ASR Stage

M-MMSE-MFCC estimator In the context of ASR, MMSE-MFCC estimators [Yu 2008], [Astudillo 2010], [Stark 2011], bring interesting advantages Same signal model as STFT domain estimators e.g. Wiener, MMSE-STSA, MMSE-LSA. The approach in [Astudillo 2010], here used, also provides the minimum MSE in MFCC domain. The same approach can be applied to derive a M-MMSE-MFCC estimator from the M-MMSE

M-MMSE MFCC Estimator The posterior distribution for the M-MMSE is given by ) p(s(k, n) y(k, n)) N C (ŜM-MMSE (k, n), λ(k, n), where the variance is equal to the minimum MSE λ(k, n) = E { S(k, n) ŜM-MMSE(k, n) 2} = φ S (k, n)(1 h H M-MMSE(k, n)d(k)) In theory, the posterior for the M-MMSE-MFCC can be obtained by Uncertainty Propagation as p(c(i, n) y(n)) N C (ĉm-mmse-mfcc (i, n), λ c (i, n) ).

M-MMSE MFCC Estimator In practice, we need to propagate variances through the STFT. Let φ(n) be the variance of speech or noise, the variance after ISTFT+STFT is given by φ(n ) = R n n 2 φ(n), n Ov(n ) R n n is built by multiplying the inverse Fourier and Fourier matrices truncated to the corresponding overlap. Summing over all overlapping frames Ov attenuates variance artifacts (STFT consistency). Correlations induced by overlapping windows ignored.

Model-based feature enhancement Since the minimum MSE of the M-MMSE-MFCC is available we can apply observation uncertainty techniques. Modified Imputation [Kolossa 2005] showed the best performance, this is given by ĉ MI q (i, n ) = + Σ q (i) Σ q (i) + λ c (i, n )ĉm-mmse (i, n ) λ c (i, n ) Σ q (i) + λ c (i, n ) µ q(i), (1) where µ q and Σ q are the mean and variances of the q-th ASR Gaussian mixture.

Proposed System Characteristics M-MMSE-MFCC with optional use of MI as described. System is real-time capable, per-frame batch if CMS used. To improve performance, speech variance φ S (k, n) re-estimated using the M-STSA. Implementation M-STSA, M-MMSE-MFCC implemented in Matlab. Modified version of HTK used for MI.

Proposed System... SE Stage ASR Stage STFT STFT... M-MMSE MVDR MFCC ISTFT ASR Beamformed signal: Z(k, n) = h MVDR(k, n) H y(k, n) Residual variance: φ U (k, n) = h H MVDR(φ R Γ diff + Φ v)h MVDR

REVERB 2014 Results HTK baseline, development set results for clean training Simulated Data Room 1 Room 2 Room 2 Avg. Near Far Near Far Near Far No Proc. 14.43 25.15 43.46 86.64 52.20 88.40 51.67 MSTSA 19.25 27.65 18.68 36.55 24.60 47.16 28.97 M-MFCC 16.94 23.57 17.20 33.47 20.80 44.29 26.03 +MI 15.34 21.85 16.96 33.67 20.99 45.03 25.64 Recorded Data Room 1 Avg. Near Far No Proc. 88.33 87.56 87.94 MSTSA 58.27 61.18 59.71 M-MFCC 54.15 54.41 54.27 +MI 51.72 50.31 51.02

REVERB 2014 Results HTK baseline, development set results for multi-condition training Simulated Data Room 1 Room 2 Room 2 Avg. Near Far Near Far Near Far No Proc. 16.54 18.88 23.37 43.18 27.40 46.79 29.34 MSTSA 15.46 17.75 17.23 26.13 18.40 30.91 20.97 M-MFCC 15.73 16.79 14.81 21.99 18.05 27.35 19.11 +MI 14.70 16.74 14.30 23.05 17.80 27.42 19.00 Recorded Data Room 1 Avg. Near Far No Proc. 52.90 50.79 51.85 MSTSA 42.48 41.49 41.98 M-MFCC 40.61 39.23 39.92 +MI 39.74 37.18 38.46

Conclusions Improvements over M-STSA by integration with ASR. Results for real data worse compared to simulated data, but consistent across methods. The use of observation uncertainty (MI) yields good results in highly mismatched situations. ISTFT+STFT propagation simplifies integration with well established STFT-domain methods.

Thank You! MMSE-MFCC Matlab code available under https://github.com/ramon-astudillo/stft up tools MI HTK patches available under http://www.astudillo.com/ramon/research/stft-up/