Energy-based multi-speaker voice activity detection with an ad hoc microphone array

Energy-based multi-speaker voice activity detection with an ad hoc microphone array
Alexander Bertrand, Marc Moonen
Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven
ICASSP 2010
A. Bertrand, M. Moonen (K.U.Leuven), Multi-speaker VAD with ad hoc array, ICASSP 2010

Outline
1 Motivation: problem statement; data model
2 Solving non-negative BSS (NBSS): NBSS with well-grounded sources; M-NICA
3 Results

1 Motivation: problem statement

Problem statement
[Figure: positions of the speech sources and the microphones (legend: speech source, microphone); both axes run from 0 to 5.]
Goal: individual voice activity detection (VAD) for multiple simultaneous speakers, using an ad hoc microphone array.
Assumptions: the speakers are in the near field (so the speech power varies over the microphones); the speakers are mutually independent; noise and reverberation are limited.

Problem statement
Advantages: the array geometry and the speaker positions need not be known. The approach is energy-based, so it needs only a low data rate, and synchronization of the sampling clocks is not crucial.
By-product: the power of each speaker at each microphone.
Applications: binaural hearing aids (head shadow), video conferencing, ad hoc acoustic sensor networks, ...
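
The low-data-rate advantage can be made concrete with a little arithmetic. The sketch below assumes a 16 kHz sampling rate (inferred from "L = 480 (i.e. 30 ms)" on the results slide). It also uses hypothetical word sizes of 16 bits per raw audio sample and 32 bits per transmitted power value; neither word size comes from the slides.

```python
# Rough per-microphone data-rate comparison: raw audio vs. block powers.
# Only L = 480 appears on the slides; the other constants are assumptions.
FS_HZ = 16_000          # assumed sampling rate (480 samples = 30 ms)
BLOCK_LEN = 480         # L: samples summarized by one power value
BITS_PER_SAMPLE = 16    # hypothetical raw audio word size
BITS_PER_POWER = 32     # hypothetical word size of one power value (float32)


def raw_audio_rate_bps(fs: float = FS_HZ, bits: int = BITS_PER_SAMPLE) -> float:
    """Bits per second needed to stream the raw microphone signal."""
    return fs * bits


def power_rate_bps(fs: float = FS_HZ, block_len: int = BLOCK_LEN,
                   bits: int = BITS_PER_POWER) -> float:
    """Bits per second needed to stream one power value per block of L samples."""
    return fs / block_len * bits


if __name__ == "__main__":
    raw = raw_audio_rate_bps()
    pw = power_rate_bps()
    print(f"raw audio: {raw:.0f} bit/s, block powers: {pw:.1f} bit/s, "
          f"reduction: {raw / pw:.0f}x")
```

Under these assumptions the reduction factor is L times the ratio of the word sizes, here 480 x 16 / 32 = 240.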

1 Motivation: data model

Data model
N speakers and J microphones, with $J \geq N$.
Speech signal n: $\tilde{s}_n[t]$; microphone signal j: $\tilde{y}_j[t]$.
Instantaneous speech power (L = block length, k = frame index):
$$s_n[k] = \frac{1}{L}\sum_{l=0}^{L-1} \tilde{s}_n[kL+l]^2$$
Instantaneous microphone signal power:
$$y_j[k] = \frac{1}{L}\sum_{l=0}^{L-1} \tilde{y}_j[kL+l]^2$$
Stack the $s_n[k]$ and the $y_j[k]$ in $\mathbf{s}[k]$ and $\mathbf{y}[k]$, respectively.
Data model: $\mathbf{y}[k] \approx \mathbf{A}\,\mathbf{s}[k], \ \forall k \in \mathbb{N}$, where $\mathbf{A}$ is a $J \times N$ mixing matrix.
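
The frame powers and the model can be checked numerically. The following sketch computes $s_n[k]$ and $y_j[k]$ as defined above and measures how closely $\mathbf{y}[k] \approx \mathbf{A}\,\mathbf{s}[k]$ holds on synthetic data. Several choices are assumptions made only for this illustration: white Gaussian sources, random positive amplitude gains, and mixing with no delays and no reverberation. Under those assumptions the power-domain mixing matrix is the element-wise square of the gains.

```python
import numpy as np


def block_power(x: np.ndarray, block_len: int) -> np.ndarray:
    """Mean-square power per non-overlapping block of `block_len` samples.

    x has shape (channels, T); samples that do not fill a whole block are
    dropped. Returns shape (channels, T // block_len), one column per frame k.
    """
    channels, T = x.shape
    K = T // block_len
    blocks = x[:, :K * block_len].reshape(channels, K, block_len)
    return np.mean(blocks ** 2, axis=2)


def simulate_power_mixture(N=3, J=5, T=48_000, block_len=480, seed=0):
    """Relative error of y[k] ~= A s[k] on synthetic, delay-free mixtures."""
    rng = np.random.default_rng(seed)
    sources = rng.standard_normal((N, T))        # time-domain sources (assumed white)
    gains = rng.uniform(0.1, 1.0, size=(J, N))   # hypothetical amplitude gains
    mics = gains @ sources                       # time-domain microphone signals
    s = block_power(sources, block_len)          # s[k] stacked in columns
    y = block_power(mics, block_len)             # y[k] stacked in columns
    A = gains ** 2                               # power-domain mixing matrix
    rel_err = np.linalg.norm(y - A @ s) / np.linalg.norm(y)
    return A, s, y, rel_err


if __name__ == "__main__":
    _, _, _, err = simulate_power_mixture()
    print(f"relative model error: {err:.3f}")
```

The residual error comes from the cross-terms between sources within each block. Those cross-terms average out only as the block length grows.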

Data model
$\mathbf{y}[k] \approx \mathbf{A}\,\mathbf{s}[k], \ \forall k \in \mathbb{N}$
Remarks: the model assumes independent sources and no reverberation. It also requires a good choice of L, which trades time resolution against model mismatch. Noise can either be incorporated in $\mathbf{s}$ or subtracted.
Goal: find $\mathbf{s}$ (and $\mathbf{A}$), i.e., track the power of each source. This is a blind source separation problem with non-negative source signals (NBSS).
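
Why L matters: suppose a microphone receives $a\,\tilde{s}_1[t] + b\,\tilde{s}_2[t]$. Its block power is then $a^2 s_1[k] + b^2 s_2[k] + 2ab\,c_{12}[k]$, with $c_{12}[k] = \frac{1}{L}\sum_{l=0}^{L-1}\tilde{s}_1[kL+l]\,\tilde{s}_2[kL+l]$. Independence makes the cross-term vanish only on average, so a longer block shrinks it (less model mismatch) at the cost of time resolution. The sketch below estimates the typical size of $c_{12}[k]$. It uses two independent unit-variance white Gaussian signals, an assumption standing in for real speech.

```python
import numpy as np


def mean_abs_cross_term(block_len: int, num_blocks: int = 1000, seed: int = 0) -> float:
    """Average |c_12[k]| over many blocks for two independent unit-variance
    white Gaussian signals (an assumption; real speech is not white)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal((num_blocks, block_len))
    x2 = rng.standard_normal((num_blocks, block_len))
    c12 = np.mean(x1 * x2, axis=1)   # one cross-term per block
    return float(np.mean(np.abs(c12)))


if __name__ == "__main__":
    for L in (30, 120, 480, 960):
        print(L, round(mean_abs_cross_term(L), 4))
```

For this white Gaussian case, $\mathbb{E}|c_{12}[k]| \approx \sqrt{2/(\pi L)}$, so the mismatch decays roughly as $1/\sqrt{L}$. The other side of the trade-off, the loss of time resolution for on-off speech, is not captured by this sketch.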

2 Solving non-negative BSS (NBSS): NBSS with well-grounded sources

NBSS with well-grounded sources
Exploit non-negativity: this allows simpler algorithms than classic ICA.
Exploit the well-groundedness of the source signals (non-vanishing pdf at zero): $\mathbf{s}$ is well-grounded due to the on-off behavior of speech.
Possible choice of algorithm: non-negative PCA (NPCA) [1]. To avoid a step-size search: multiplicative non-negative ICA (M-NICA) [2].
[1] E. Oja and M. Plumbley, "Blind separation of positive sources using non-negative PCA," in Proc. 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, 2003.
[2] A. Bertrand and M. Moonen, "Blind separation of non-negative source signals using multiplicative updates and subspace projection," accepted for publication in Signal Processing.
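
Well-groundedness can be checked empirically. A non-negative source is well-grounded if it takes values arbitrarily close to zero with non-zero probability, i.e., $\Pr(s < \delta) > 0$ for every $\delta > 0$. The sketch below estimates that probability for a hypothetical on-off power sequence. Silent frames have tiny exponentially distributed powers, and active frames have exponentially distributed powers of unit scale. For comparison, the sketch also checks the same sequence shifted by a constant offset, which is not well-grounded. The on-off model and all of its parameters are assumptions made for illustration.

```python
import numpy as np


def on_off_powers(num_frames=10_000, p_active=0.4, seed=0):
    """Hypothetical frame-power sequence with speech-like on-off behavior.

    Silent frames: exponential powers with scale 1e-4.
    Active frames: exponential powers with scale 1.0.
    """
    rng = np.random.default_rng(seed)
    active = rng.random(num_frames) < p_active
    scale = np.where(active, 1.0, 1e-4)
    return rng.exponential(scale)


def fraction_below(s: np.ndarray, delta: float) -> float:
    """Empirical estimate of Pr(s < delta)."""
    return float(np.mean(s < delta))


if __name__ == "__main__":
    s = on_off_powers()
    for delta in (1e-2, 1e-3):
        print(delta, fraction_below(s, delta), fraction_below(s + 1.0, delta))
```

For the on-off sequence, the estimate stays clearly positive as δ shrinks, because silent frames keep landing near zero. The offset sequence never drops below 1, so its estimate is exactly zero for any δ ≤ 1.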

2 Solving non-negative BSS (NBSS): M-NICA

M-NICA: main idea
An orthogonal mixture of non-negative, well-grounded, independent signals that preserves non-negativity is a permutation of the original signals [M. Plumbley, 2002].
M-NICA idea: (1) decorrelate while preserving non-negativity; (2) restore the signal subspace (projection step).
Multiplicative updating preserves non-negativity and requires no user-defined learning rate.
Notation: the columns of $\mathbf{S}$ and $\mathbf{Y}$ hold M samples of $\mathbf{s}[k]$ and $\mathbf{y}[k]$, respectively, i.e., $\mathbf{Y} = \mathbf{A}\mathbf{S}$. $\bar{\mathbf{S}}$ holds the mean of each row of $\mathbf{S}$, i.e., $\bar{\mathbf{S}} = \frac{1}{M}\mathbf{S}\mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is the all-ones vector of length M.
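
The main idea can be illustrated numerically. In the sketch below, two hypothetical independent, well-grounded, non-negative sources are mixed first by a permutation matrix and then by a rotation. The sources are on-off signals with many exact zeros, an assumption made for illustration. The permutation keeps every sample non-negative. The rotation is also orthogonal, yet it produces negative samples, which is consistent with the result quoted above. The sketch does not prove that result. It also computes $\bar{\mathbf{S}}$ with the slide's formula and compares it with the row means.

```python
import numpy as np


def on_off_sources(N=2, M=1000, p_active=0.5, seed=0):
    """Hypothetical non-negative, well-grounded sources: exact zeros when
    'silent', exponentially distributed values when 'active'."""
    rng = np.random.default_rng(seed)
    active = rng.random((N, M)) < p_active
    return np.where(active, rng.exponential(1.0, size=(N, M)), 0.0)


def rotation(theta: float) -> np.ndarray:
    """2 x 2 orthogonal (rotation) matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


if __name__ == "__main__":
    S = on_off_sources()
    P = np.array([[0.0, 1.0], [1.0, 0.0]])   # permutation matrix (orthogonal)
    R = rotation(0.3)                        # rotation by 0.3 rad (also orthogonal)
    print("min of P @ S:", (P @ S).min())    # non-negative
    print("min of R @ S:", (R @ S).min())    # negative: some samples leave the orthant
    M = S.shape[1]
    ones = np.ones((M, 1))
    S_bar = S @ ones @ ones.T / M            # slide formula (1/M) S 1 1^T
    print("S_bar equals row means:", np.allclose(S_bar, S.mean(axis=1, keepdims=True)))
```

The negative samples appear exactly where one source is silent (zero) and the other is active. Well-groundedness guarantees that such samples occur.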

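The M-NICA algorithm (batch)
1. Initialize $\mathbf{S} \leftarrow \mathbf{Y}_{1:N,:}$, i.e., the first N rows of $\mathbf{Y}$.
2. Decorrelation step (preserves non-negativity): for $n = 1, \ldots, N$ and $m = 1, \ldots, M$, multiply $[\mathbf{S}]_{nm}$ by the ratio of the $(n,m)$-th entries of two three-term sums of matrix products involving $\mathbf{S}$, $\bar{\mathbf{S}}$ and a matrix $\mathbf{D}$ (the exact update expression is not recoverable from this transcription). ERRATUM: the numerator and denominator are switched in the paper.
3. Signal subspace projection step: for $n = 1, \ldots, N$ and $m = 1, \ldots, M$,
$$[\mathbf{S}]_{nm} \leftarrow \max\left(\left[\mathbf{P}_{\mathrm{rowspan}\{\mathbf{Y}\}}(\mathbf{S})\right]_{nm},\, 0\right),$$
where $\mathbf{P}_{\mathrm{rowspan}\{\mathbf{Y}\}}(\mathbf{S})$ projects each row of $\mathbf{S}$ onto the row space of $\mathbf{Y}$.
4. Return to step 2.
Note: this is the batch-mode version of the algorithm.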
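
Steps 1 and 3 are fully specified above and can be sketched directly. Step 2, the multiplicative decorrelation update, is deliberately omitted: its exact expression should be taken from the M-NICA paper. The sketch assumes $\mathbf{Y}$ is a NumPy array of shape (J, M). It implements the projection onto the row space of $\mathbf{Y}$ as $\mathbf{S}\mathbf{Y}^{+}\mathbf{Y}$, with $\mathbf{Y}^{+}$ the Moore-Penrose pseudoinverse. It also adds a hypothetical helper, `offdiag_cov_energy`, that measures how far the rows of $\mathbf{S}$ are from being decorrelated. This helper could be used to monitor the iterations.

```python
import numpy as np


def mnica_init(Y: np.ndarray, N: int) -> np.ndarray:
    """Step 1: S <- Y_{1:N,:}, the first N rows of Y (requires J >= N)."""
    return Y[:N, :].copy()


def project_rowspan_nonneg(S: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Step 3: project every row of S onto rowspan{Y}, then clip at zero.

    pinv(Y) @ Y is the orthogonal projector onto the row space of Y, so
    S @ pinv(Y) @ Y projects each row of S. Grouping it as (S @ pinv(Y)) @ Y
    avoids forming the M x M projector explicitly.
    """
    return np.maximum((S @ np.linalg.pinv(Y)) @ Y, 0.0)


def offdiag_cov_energy(S: np.ndarray) -> float:
    """Hypothetical monitor: sum of squared off-diagonal entries of the row
    covariance (1/M)(S - S_bar)(S - S_bar)^T. Zero means decorrelated rows."""
    S_bar = S.mean(axis=1, keepdims=True)   # same as (1/M) S 1 1^T, via broadcasting
    Sc = S - S_bar
    C = Sc @ Sc.T / S.shape[1]
    return float(np.sum(C ** 2) - np.sum(np.diag(C) ** 2))
```

When $\mathbf{Y} = \mathbf{A}\mathbf{S}$ exactly and $\mathbf{A}$ has full column rank, the true non-negative $\mathbf{S}$ already lies in the row space of $\mathbf{Y}$. In that case the projection step leaves the true sources unchanged.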

Results
[Figure: positions of the speech sources and microphones in the room (legend: speech source, microphone); both axes run from 0 to 5.]
Setup: cubical room of 5 m × 5 m × 5 m; L = 480 (i.e., 30 ms); sliding window of length K = 200.
Assessment by the signal-to-error ratio (SER):
$$\mathrm{SER} = \frac{1}{JN}\sum_{j,n} 10\log_{10}\frac{\sum_k \left([\mathbf{A}]_{jn}\, s_n[k]\right)^2}{\sum_k \left([\hat{\mathbf{A}}]_{jn}\,\hat{s}_n[k] - [\mathbf{A}]_{jn}\, s_n[k]\right)^2}$$
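
The SER formula translates directly into array code. The sketch below compares each reconstructed contribution $[\hat{\mathbf{A}}]_{jn}\hat{s}_n[k]$ with the true one $[\mathbf{A}]_{jn}s_n[k]$ and averages the per-pair SERs over all J·N microphone/source pairs. It assumes the permutation ambiguity has already been resolved, so that row n of the estimate corresponds to source n. The array shapes and the function name `ser_db` are illustrative choices.

```python
import numpy as np


def ser_db(A: np.ndarray, S: np.ndarray, A_hat: np.ndarray, S_hat: np.ndarray) -> float:
    """SER in dB, averaged over all microphone/source pairs (j, n).

    A, A_hat: (J, N) true and estimated mixing matrices.
    S, S_hat: (N, K) true and estimated source powers, one frame per column.
    Assumes row n of S_hat is already matched to row n of S.
    """
    # contrib[j, n, k] = [A]_jn * s_n[k]
    true_contrib = A[:, :, None] * S[None, :, :]
    est_contrib = A_hat[:, :, None] * S_hat[None, :, :]
    signal = np.sum(true_contrib ** 2, axis=2)                    # sum over k
    error = np.sum((est_contrib - true_contrib) ** 2, axis=2)     # sum over k
    return float(np.mean(10.0 * np.log10(signal / error)))        # (1/JN) sum over j, n
```

Because the SER compares products $[\hat{\mathbf{A}}]_{jn}\hat{s}_n[k]$, it is insensitive to the scaling ambiguity of BSS. Scaling $\hat{s}_n$ by c and column n of $\hat{\mathbf{A}}$ by 1/c leaves the products, and therefore the SER, unchanged. For example, an estimate that is uniformly 10% too large gives an error of 0.1 times the signal, i.e., 20 dB.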

Results
[Figure, two panels versus time (0 to 30 s): top, original source energy and the source energy estimated by M-NICA; bottom, SER [dB] of M-NICA and of NPCA with step sizes η = 0.5, 1, 1.5 and 2.]

Results
[Figure, three panels: original source energy and the source energy estimated by M-NICA for sources 1, 2 and 3; horizontal axis from 0 to 450.]

Results: effect of reverberation
[Figure: SER [dB] versus reflection coefficient (0.1 to 0.9), for L = 480 and L = 960.]

Results: effect of residual noise
[Figure: SER [dB] versus SNR in the best microphone (0 to 10 dB).]

Results: reconstruction (limited reverberation, limited noise)
[Figure, three panels comparing the original energy of source 1 with the M-NICA estimate, horizontal axis from 0 to 60: top, no reverberation and no residual noise; middle, reflection coefficient = 0.7; bottom, residual noise with an SNR of 5 dB in the best microphone.]

Summary
Track the power of simultaneous speakers with an ad hoc microphone array of unknown geometry.
Energy-based: relies on near-field speakers, needs only a low data rate, and imposes weak synchronization constraints.
Solved as a non-negative BSS problem with the M-NICA algorithm.
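
The slides stop at tracking the source powers, so per-speaker VAD decisions still require a decision rule. The sketch below uses a simple rule that is a hypothetical choice, not taken from the slides: speaker n is declared active in frame k when $\hat{s}_n[k]$ exceeds a fixed fraction of that speaker's largest estimated power. The function name `vad_from_powers` and the parameter `rel_threshold` are illustrative.

```python
import numpy as np


def vad_from_powers(S_hat: np.ndarray, rel_threshold: float = 0.1) -> np.ndarray:
    """Hypothetical per-speaker VAD from estimated powers.

    S_hat: (N, K) non-negative power estimates, one row per speaker.
    Returns a boolean (N, K) array that is True where
    S_hat[n, k] > rel_threshold * max_k S_hat[n, k].
    """
    peak = S_hat.max(axis=1, keepdims=True)   # largest power of each speaker
    return S_hat > rel_threshold * peak
```

The decisions have the same frame rate as the power estimates, i.e., one decision per speaker every L samples.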