Energy-based multi-speaker voice activity detection with an ad hoc microphone array


1 Energy-based multi-speaker voice activity detection with an ad hoc microphone array
Alexander Bertrand, Marc Moonen
Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven
ICASSP 2010
A. Bertrand, M.Moonen (K.U.Leuven) Multi-speaker VAD with ad hoc array ICASSP / 20

2 Outline
1 Motivation: Problem statement; Data model
2 Solving non-negative BSS (NBSS): NBSS with well-grounded sources; M-NICA
3 Results


4 Problem statement
[Figure: speech sources and microphones]
Goal: individual voice activity detection (VAD) for multiple simultaneous speakers, using an ad hoc microphone array.
Assumptions: speakers in the near field (the speech power varies over the microphones); speakers mutually independent; limited noise and limited reverberation.

7 Problem statement
Advantages: the array geometry and the speaker positions may be unknown; the method is energy-based, hence it requires only a low data rate, and synchronization of the sampling clocks is not crucial.
By-product: the power of each speaker at each microphone.
Applications: binaural hearing aids (head shadow), video conferencing, ad hoc acoustic sensor networks, ...



10 Data model
N speakers, J microphones, J ≥ N.
Speech signal n: $\tilde{s}_n[t]$; microphone signal j: $\tilde{y}_j[t]$.
Instantaneous speech power (L = block length, k = frame index):
$$s_n[k] = \frac{1}{L}\sum_{l=0}^{L-1} \tilde{s}_n[kL + l]^2$$
Instantaneous microphone signal power:
$$y_j[k] = \frac{1}{L}\sum_{l=0}^{L-1} \tilde{y}_j[kL + l]^2$$
Stack $s_n[k]$ and $y_j[k]$ in $s[k]$ and $y[k]$, respectively.
Data model: $y[k] \approx A\,s[k], \ \forall k \in \mathbb{N}$, where A is a J × N mixing matrix.
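A minimal NumPy sketch of the block-power definitions and of the power-domain model. The mixture is an assumption made only for illustration: each microphone receives a scaled copy of each source without delay or reverberation, so that $[A]_{jn}$ is the squared gain. `frame_power` and `power_model` are hypothetical helper names, and white Gaussian noise stands in for speech.

```python
import numpy as np

def frame_power(x, L):
    """Block power p[k] = (1/L) * sum_{l=0}^{L-1} x[kL + l]**2 over complete frames."""
    x = np.asarray(x, dtype=float)
    K = len(x) // L                    # complete frames only; trailing samples are dropped
    frames = x[: K * L].reshape(K, L)  # row k holds x[kL], ..., x[kL + L - 1]
    return np.mean(frames ** 2, axis=1)

def power_model(src, G, L):
    """Return (y, A, s): microphone powers, power mixing matrix, source powers.

    ASSUMPTION: idealized mixture without delays or reverberation,
    mic_j[t] = sum_n G[j, n] * src_n[t], hence A[j, n] = G[j, n]**2.
    """
    mics = G @ src                                    # J x T microphone signals
    s = np.vstack([frame_power(x, L) for x in src])   # N x K source powers
    y = np.vstack([frame_power(x, L) for x in mics])  # J x K microphone powers
    return y, G ** 2, s

rng = np.random.default_rng(0)
src = rng.standard_normal((2, 48000))     # N = 2 independent zero-mean sources
G = rng.uniform(0.2, 1.0, size=(3, 2))    # J = 3 microphones, hypothetical gains
y, A, s = power_model(src, G, L=480)
rel_err = np.linalg.norm(y - A @ s) / np.linalg.norm(y)
print(f"relative model mismatch: {rel_err:.3f}")
```

The residual mismatch comes from the cross terms $2[G]_{jn}[G]_{jn'}\frac{1}{L}\sum_l \tilde{s}_n\tilde{s}_{n'}$, which only vanish on average for independent sources.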

14 Data model
$$y[k] \approx A\,s[k], \quad \forall k \in \mathbb{N}$$
Remarks:
The model assumes independent sources and no reverberation, and requires a good choice of L.
Trade-off in the size of L: time resolution vs. model mismatch.
Noise: incorporate it in s, or subtract it.
Goal: find s (and A), i.e. track the power of each source. This is a blind source separation problem with non-negative source signals (NBSS).
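A small numerical sketch of the trade-off in L, assuming white Gaussian noise as a stand-in for independent speech sources and a delay-free mixture with hypothetical gains G (so A = G squared elementwise). Cross terms between independent sources average out as L grows, so the relative mismatch of y ≈ A s shrinks, while the number of frames, and with it the time resolution, drops by the same factor. `block_power` and `model_mismatch` are hypothetical names.

```python
import numpy as np

def block_power(x, L):
    """Per-row block power over complete frames of length L (last axis is time)."""
    K = x.shape[-1] // L
    frames = x[..., : K * L].reshape(*x.shape[:-1], K, L)
    return np.mean(frames ** 2, axis=-1)

def model_mismatch(src, G, L):
    """Relative error of y ~ A s for a delay-free mixture mics = G @ src (A = G**2)."""
    y = block_power(G @ src, L)
    s = block_power(src, L)
    return np.linalg.norm(y - (G ** 2) @ s) / np.linalg.norm(y)

rng = np.random.default_rng(1)
src = rng.standard_normal((2, 96000))     # stand-ins for 2 independent sources
G = rng.uniform(0.2, 1.0, size=(3, 2))    # hypothetical gains, 3 microphones
for L in (16, 120, 480, 1920):
    frames = src.shape[1] // L            # fewer frames = coarser time resolution
    print(f"L = {L:5d}: {frames:5d} frames, mismatch = {model_mismatch(src, G, L):.4f}")
```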


18 NBSS with well-grounded sources
Exploit non-negativity: simpler algorithms (compared to classic ICA).
Exploit well-groundedness of the source signals (non-vanishing pdf at zero): s is well-grounded due to the on-off behavior of speech.
Possible choice of algorithm: non-negative PCA (NPCA) [1].
To avoid a step-size search: multiplicative non-negative ICA (M-NICA) [2].
[1] E. Oja and M. Plumbley, "Blind separation of positive sources using non-negative PCA," in Proc. of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan.
[2] A. Bertrand and M. Moonen, "Blind separation of non-negative source signals using multiplicative updates and subspace projection," accepted for publication in Signal Processing.
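A quick empirical check of well-groundedness in the usual sense: a non-negative source is well-grounded if P(s < δ) > 0 for every δ > 0. The on-off power sequence and the gamma-distributed powers of active frames are hypothetical stand-ins for speech, and `zero_mass` is a hypothetical helper.

```python
import numpy as np

def zero_mass(s, delta):
    """Empirical P(s < delta): fraction of frames whose power is below delta.

    For a well-grounded source this stays bounded away from zero as delta -> 0;
    for a source that never goes silent it drops to zero once delta falls
    below the source's minimum power.
    """
    s = np.asarray(s, dtype=float)
    return np.mean(s < delta)

rng = np.random.default_rng(2)
K = 1000
active = rng.random(K) < 0.5                       # hypothetical on-off pattern
speech_like = np.where(active, rng.gamma(2.0, 1.0, K), 1e-6 * rng.random(K))
always_on = 1.0 + rng.gamma(2.0, 1.0, K)           # never silent, not well-grounded
for delta in (1e-1, 1e-3, 1e-5):
    print(delta, zero_mass(speech_like, delta), zero_mass(always_on, delta))
```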


23 M-NICA
Main idea: an orthogonal mixture of non-negative, well-grounded, independent signals that preserves non-negativity is a permutation of the original signals [M. Plumbley, 2002].
M-NICA idea:
1 decorrelate while preserving non-negativity;
2 restore the signal subspace (projection step).
Multiplicative updating: preserves non-negativity and needs no user-defined learning rate.
Notation: S and Y contain M samples of s[k] and y[k] in their columns, i.e. $Y = AS$; $\bar{S}$ replaces each row of S by its mean, i.e. $\bar{S} = \frac{1}{M} S\,\mathbf{1}\mathbf{1}^T$, with $\mathbf{1}$ the all-ones vector of length M.
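A small NumPy illustration (not a proof) of the notation and of the main idea, using hypothetical well-grounded sources that are exactly zero in silent frames. It checks that with $\bar{S} = \frac{1}{M} S\,\mathbf{1}\mathbf{1}^T$ the sample covariance equals $(SS^T - \bar{S}\bar{S}^T)/M$, and shows that a proper rotation of the sources produces negative entries (in frames where one source is silent and the other active), whereas a permutation does not.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 2, 2000
# Hypothetical well-grounded sources: each frame is silent (exactly 0) w.p. 0.4.
S = rng.gamma(2.0, 1.0, (N, M)) * (rng.random((N, M)) > 0.4)

ones = np.ones((M, 1))
S_bar = S @ ones @ ones.T / M             # row n of S_bar repeats the mean of row n
cov = (S - S_bar) @ (S - S_bar).T / M     # sample covariance of the rows of S
assert np.allclose(cov, (S @ S.T - S_bar @ S_bar.T) / M)

def rotate(S, theta):
    """Apply a 2 x 2 rotation (orthogonal, not a permutation for 0 < theta < pi/2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ S

P = np.array([[0.0, 1.0], [1.0, 0.0]])    # permutation: orthogonal and keeps S >= 0
for theta in (0.1, np.pi / 4):
    print(f"theta = {theta:.3f}: min entry after rotation = {rotate(S, theta).min():.3f}")
print("min entry after permutation:", (P @ S).min())
```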

26 The M-NICA algorithm (batch)
1 Initialize $S \leftarrow Y_{1:N,:}$ (the first N rows of Y).
2 Decorrelation step (preserves non-negativity): for $n = 1, \ldots, N$ and $m = 1, \ldots, M$, $[S]_{nm}$ is multiplied by the ratio of two non-negative matrix expressions evaluated at entry (n, m), each a sum of three terms built from $S$, $\bar{S}$, their transposes and a matrix $D$. ERRATUM: switch numerator and denominator in paper!
3 Signal subspace projection step: for $n = 1, \ldots, N$ and $m = 1, \ldots, M$:
$$[S]_{nm} \leftarrow \max\left(\left[P_{\mathrm{rowspan}\{Y\}}(S)\right]_{nm},\ 0\right)$$
i.e. each row of S is projected onto the row space of Y, and negative entries are set to zero.
4 Return to step 2.
(PS: batch mode)
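The following NumPy sketch follows the step structure of the batch algorithm (initialization from the first N rows of Y, multiplicative decorrelation, projection onto the signal subspace with clipping). The decorrelation update is an assumption, not taken from the slides or from Bertrand and Moonen's paper: it is derived here from the cost $\tfrac{1}{4}\lVert D^{-1/2} R D^{-1/2} - I\rVert_F^2$ with $R = (S-\bar{S})(S-\bar{S})^T$ and $D = \mathrm{diag}(R)$ held fixed within an iteration. Up to a positive row scaling its gradient is $(RD^{-1} - I)(S - \bar{S})$; putting the negative terms in the numerator and the positive terms in the denominator gives a multiplicative update. The row space of Y is assumed to have rank N (noise-free data), and `m_nica_sketch` is a hypothetical name.

```python
import numpy as np

def m_nica_sketch(Y, N, n_iter=200, eps=1e-12):
    """Batch sketch of a multiplicative NBSS iteration with subspace projection.

    ASSUMPTION: the decorrelation update is derived from the cost
    0.25 * ||D^{-1/2} R D^{-1/2} - I||_F^2 with R = (S - S_bar)(S - S_bar)^T
    and D = diag(R) held fixed per iteration; it is not taken from the paper.
    """
    Y = np.asarray(Y, dtype=float)
    M = Y.shape[1]
    # Orthonormal basis of the row space of Y, assumed to have rank N (no noise).
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:N].T                                       # M x N
    S = Y[:N, :].copy()                                # step 1: S <- Y_{1:N,:}
    for _ in range(n_iter):
        S_bar = np.repeat(S.mean(axis=1, keepdims=True), M, axis=1)
        Sc = S - S_bar
        Dinv = np.diag(1.0 / (np.sum(Sc ** 2, axis=1) + eps))  # D = diag(R)
        SSt, SbSbt = S @ S.T, S_bar @ S_bar.T          # R = SSt - SbSbt
        # Negative / positive parts of the gradient (R D^-1 - I)(S - S_bar);
        # every term is elementwise non-negative while S >= 0.
        num = SSt @ Dinv @ S_bar + SbSbt @ Dinv @ S + S
        den = SSt @ Dinv @ S + SbSbt @ Dinv @ S_bar + S_bar
        S = S * num / (den + eps)                      # step 2: keeps S >= 0
        S = np.maximum(S @ V @ V.T, 0.0)               # step 3: project rows, clip
    return S

rng = np.random.default_rng(4)
N, J, M = 3, 5, 1000
# Hypothetical on-off source powers and a random non-negative mixing matrix.
S_true = rng.gamma(2.0, 1.0, (N, M)) * (rng.random((N, M)) > 0.5)
A = rng.uniform(0.1, 1.0, (J, N))
Y = A @ S_true
S_hat = m_nica_sketch(Y, N)
# Inspect how each estimated row correlates with each true row
# (this sketch carries no convergence guarantee).
corr = np.corrcoef(np.vstack([S_true, S_hat]))[:N, N:]
print(np.round(np.abs(corr), 2))
```

Because both numerator and denominator scale linearly when S is scaled, the update is invariant to the scaling ambiguity of BSS, and no learning rate has to be chosen.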

27 Results
[Figure: room with speech sources and microphones]
Cubical room: 5 m × 5 m × 5 m.
L = 480 (i.e. 30 ms).
Sliding window with length K = …
Assessment by signal-to-error ratio (SER):
$$\mathrm{SER} = \frac{1}{JN}\sum_{j,n} 10\log_{10}\frac{\sum_k \left([A]_{jn}\, s_n[k]\right)^2}{\sum_k \left([\hat{A}]_{jn}\,\hat{s}_n[k] - [A]_{jn}\, s_n[k]\right)^2}$$
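A direct NumPy transcription of the SER formula, offered as a sketch. It assumes the estimated sources have already been matched in order to the true ones (BSS leaves a permutation ambiguity), and `ser_db` is a hypothetical name. Because the formula compares the products $[\hat{A}]_{jn}\hat{s}_n[k]$, a scale exchanged between a column of $\hat{A}$ and the corresponding estimated source does not change the SER.

```python
import numpy as np

def ser_db(A, S, A_hat, S_hat):
    """SER in dB, averaged over all (microphone j, source n) pairs.

    Compares the per-microphone source contributions [A]_jn s_n[k] with their
    estimates [A_hat]_jn s_hat_n[k]; sources are assumed already matched in order.
    """
    true = A[:, :, None] * S[None, :, :]          # J x N x K contributions
    est = A_hat[:, :, None] * S_hat[None, :, :]
    num = np.sum(true ** 2, axis=2)               # sum over frames k
    den = np.sum((est - true) ** 2, axis=2)
    return np.mean(10 * np.log10(num / den))      # (1/JN) * sum over j, n

rng = np.random.default_rng(5)
J, N, K = 4, 2, 500
A = rng.uniform(0.1, 1.0, (J, N))
S = rng.gamma(2.0, 1.0, (N, K))
print(ser_db(A, S, A, 1.1 * S))          # uniform 10% amplitude error
print(ser_db(A, S, 2 * A, 0.55 * S))     # same products, hence the same SER
```

A uniform 10% error makes every error term 0.01 times the corresponding signal term, so both calls give an SER of $10\log_{10}(100) = 20$ dB up to rounding.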

28 Results
[Figure: top, original and estimated (M-NICA) source energy vs. time [s]; bottom, SER [dB] vs. time [s] for M-NICA and for NPCA with η = 0.5, η = 1, η = 1.5 and η = …]

29 Results
[Figure: original and estimated (M-NICA) source energy for source 1, source 2 and source 3]

30 Results: effect of reverberation
[Figure: SER [dB] vs. reflection coefficient, for L = 480 and L = …]

31 Results: effect of residual noise
[Figure: SER [dB] vs. SNR in the best microphone [dB]]

32 Results: reconstruction (limited reverberation, limited noise)
[Figure: original and estimated (M-NICA) source energy of source 1 in three cases: no reverberation and no residual noise; reflection coefficient = 0.7; residual noise with an SNR of 5 dB in the best microphone]

33 Summary
Track the power of simultaneous speakers.
Ad hoc microphone array (unknown geometry).
Energy-based: near-field, low data rate, weak synchronization constraints.
Solved as a non-negative BSS problem, with the M-NICA algorithm.