Study on the Effects of Intrinsic Variation using i-vectors in Text-Independent Speaker Verification



1 Study on the Effects of Intrinsic Variation using i-vectors in Text-Independent Speaker Verification
Sheng Chen, Mingxing Xu, Emlyn Pratt
Department of Computer Science and Technology, Tsinghua University, Beijing, China
June 26

2 Outline
1 Introduction
   Main Challenges
   Problem and Proposal
2 i-vector Framework for Intrinsic Variation Modeling
   i-vector Framework
   Compensation for Intrinsic Variability
3 Experiments
   Data partitions for training and testing
   Description of Speaker Verification systems
   Experimental Results
4 Conclusions & Future Work

3 Main Challenges in Speaker Verification
Extrinsic Variability
   Mismatched Channels
   Environmental Noise
   ...
Intrinsic Variability
   Speaking Style
   Emotion
   Speech Volume
   State of Health
   ...

4 Problem and Proposal
Problem we focus on: The performance of speaker verification systems is adversely affected by intrinsic variability.
Questions:
   How does a speaker verification system perform when enrollment and testing are done in mismatched conditions due to intrinsic variability?
   How well do techniques that model the total variability address the effects of intrinsic variability in speaker verification?
Proposal: Model the intrinsic variability with the i-vector framework.

5 How to define variation forms?
Neutral: Spontaneous speech at normal rate and volume in Chinese
Speaking Style: Reading
Speaking Language: English
Emotional State: Angry, Happy
Speaking Rate: Fast, Slow
Speaking Volume: Loud, Soft, Whispered
Physical Status: Mumbled, Denasalized

6 i-vector Framework for Intrinsic Variation Modeling
Application of i-vector modeling: Effective for Channel Compensation [1]
i-vector Framework
   Total Variability: $M = m + Tw$
   Cosine Similarity Scoring: $\mathrm{score}(w_{target}, w_{test}) = \dfrac{\langle w_{target}, w_{test} \rangle}{\lVert w_{target} \rVert \, \lVert w_{test} \rVert}$
Idea: How about modeling the Intrinsic Variability with the i-vector Framework?
[1] Front-end factor analysis for speaker verification, N Dehak, PJ Kenny, R Dehak,
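The cosine similarity scoring on this slide can be sketched in a few lines. This is a minimal illustration that assumes i-vectors are plain Python sequences of floats; the function names and the decision threshold are hypothetical and do not come from the slides.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors of equal dimension."""
    if len(w_target) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_target * norm_test)


def verify(w_target, w_test, threshold):
    """Accept the trial when the score reaches the threshold.

    The threshold is an assumed tuning parameter, not a value from the slides.
    """
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on the angle between the two vectors, rescaling either i-vector leaves the decision unchanged.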

7 How to remove the effects of intrinsic variations?
Linear Discriminant Analysis (LDA)
   Idea: Minimize the within-speaker variability while maximizing the between-speaker variability
   $S_B v = \lambda S_W v$
Within-Class Covariance Normalization (WCCN)
   Idea: Deemphasize the direction of high intra-speaker variability
   $W^{-1} = B B^t$
Nuisance Attribute Projection (NAP)
   Idea: Remove the nuisance direction
   $P = I - V V^t$
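Two of the compensation steps above, WCCN and NAP, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the authors' implementation: i-vectors are lists of floats, W is taken to be the average within-speaker covariance, the nuisance directions in V are assumed orthonormal, and all function names are hypothetical. For WCCN it uses the Cholesky factor L of W (W = L L^t) and takes B = L^{-t}, which satisfies W^{-1} = B B^t, so applying B^t to a vector amounts to solving L y = w.

```python
import math


def within_class_covariance(ivectors_by_speaker):
    """Average within-speaker covariance W (input: one list of vectors per speaker)."""
    n_speakers = len(ivectors_by_speaker)
    dim = len(ivectors_by_speaker[0][0])
    W = [[0.0] * dim for _ in range(dim)]
    for vectors in ivectors_by_speaker:
        n = len(vectors)
        mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        for v in vectors:
            c = [v[d] - mean[d] for d in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    W[i][j] += c[i] * c[j] / (n * n_speakers)
    return W


def cholesky(A):
    """Lower-triangular L with A = L L^t; A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def wccn_transform(L, w):
    """Return B^t w with B = L^{-t}, computed by forward substitution on L y = w."""
    y = []
    for i in range(len(w)):
        y.append((w[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def nap_project(V, w):
    """Apply P = I - V V^t, where V is a list of orthonormal nuisance directions."""
    out = list(w)
    for v in V:
        coeff = sum(a * b for a, b in zip(v, w))
        out = [o - coeff * a for o, a in zip(out, v)]
    return out
```

After `wccn_transform`, the plain dot product of two transformed vectors equals w1^t W^{-1} w2, so cosine scoring can be applied to the transformed i-vectors directly. In practice W may need regularization before factorization if too few training vectors are available.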

8 Experiments
Experimental Data
   Intrinsic Variation Corpus
   Data partitions for training and testing
Description of Speaker Verification Systems
   GMM-UBM baseline system
   i-vector based speaker verification systems
Experimental Results

9 Intrinsic Variation Corpus
Type | Description
Number of variation forms | 12
Number of Subjects | 110 (46 males, 64 females)
Format | WAVE
Duration | 180 s
Sample Rate | 8 kHz
Resolution | 8 bits
Soundtrack | Mono

10 Data partitions in the intrinsic variation corpus
Function | Source | Description
UBM training data | 30 speakers | 18 hours, 12 variation forms
Training data used for total variability space | 30 speakers | 18 hours, 12 variation forms
Training data used for LDA, WCCN and NAP | 20 speakers | 12 hours, 12 variation forms
Testing data | 20 speakers | 2400 utterances, 12 variation forms

11 Description of Speaker Verification systems
GMM-UBM (Baseline System)
   $P(x \mid \lambda) = \sum_{i=1}^{M} \omega_i \, g(x, \mu_i, \sigma_i)$
   $S(U) = \log p(U \mid \lambda_{TAR}) - \log p(U \mid \lambda_{UBM})$
   Feature: 39-dimensional MFCC
   UBM: 512 Gaussian mixtures
i-vector based Speaker Verification Systems
   i-vector + LDA
   i-vector + WCCN
   i-vector + NAP
   i-vector + LDA + WCCN
   200-dimensional i-vector
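The GMM-UBM score above is a log-likelihood ratio between the target speaker model and the UBM. The sketch below illustrates it under several assumptions that the slide does not state: diagonal covariances, frames treated as independent, and the ratio averaged over frames (a common normalization, whereas the slide's S(U) is written for the whole utterance). The function names and the `(weights, means, variances)` model layout are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log P(x | lambda) = log sum_i w_i g(x, mu_i, sigma_i), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, target_gmm, ubm):
    """Average per-frame log-likelihood ratio between the target model and the UBM.

    Each model is a (weights, means, variances) tuple; this layout is an assumption.
    """
    total = 0.0
    for x in frames:
        total += gmm_log_likelihood(x, *target_gmm) - gmm_log_likelihood(x, *ubm)
    return total / len(frames)
```

A positive score favors the target speaker hypothesis and a negative score favors the UBM. In the system described on the slide, each frame would be a 39-dimensional MFCC vector and the UBM would have 512 components.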

12 EERs (%) for each enrollment condition when testing utterances contain the twelve variation forms
Columns: Speech Variation | Variation Form | GMM-UBM | LDA | WCCN | NAP | LDA+WCCN
Rows:
   Base Case | Spontaneous
   Speaking Style | Reading
   Speaking Volume | Loud, Soft, Whispered
   Speaking Rate | Fast, Slow
   Emotional State | Angry, Happy
   Physical Status | Denasalized, Mumbled
   Speaking Language | English

13 Performances of i-vector based systems
Overall EER (%) of Speaker Verification systems in the intrinsic variation corpus
Columns: System | EER (%) | Relative Reduction (%)
Rows: GMM-UBM (baseline), i-vector+LDA, i-vector+WCCN, i-vector+NAP, i-vector+LDA+WCCN
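The two metrics in this table, EER and relative reduction against the baseline, can be computed as sketched below. This is a simple threshold-sweep approximation of the EER, not necessarily the procedure used by the authors; the function names are hypothetical and no result values from the slides are assumed.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER (%) by sweeping the threshold over all observed scores.

    A trial is accepted when its score is at least the threshold. The EER is
    taken at the threshold where miss and false-alarm rates are closest.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return 100.0 * eer


def relative_reduction(baseline_eer, system_eer):
    """Relative EER reduction (%) of a system with respect to the baseline."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer
```

With few trials the miss and false-alarm curves may not cross exactly, which is why the sketch averages the two rates at the closest point rather than interpolating.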

14 DET curve for GMM-UBM based system and four i-vector based systems.
[Figure: Speaker Detection Performance. x-axis: False Alarm probability (in %); y-axis: Miss probability (in %). Curves: GMM-UBM, i-vector+LDA, i-vector+NAP, i-vector+WCCN, i-vector+LDA+WCCN]

15 Comparison between GMM-UBM and i-vector in matched and mismatched conditions

16 EERs (%) for each testing condition when spontaneous utterances are used for enrollment
Columns: Speech Variation | Variation Form | GMM-UBM | LDA | WCCN | NAP | LDA+WCCN
Rows:
   Base Case | Spontaneous
   Speaking Style | Reading
   Speaking Volume | Loud, Soft, Whispered
   Speaking Rate | Fast, Slow
   Emotional State | Angry, Happy
   Physical Status | Denasalized, Mumbled
   Speaking Language | English

17 EERs (%) for each testing condition when whispering utterances are used for enrollment
Columns: Speech Variation | Variation Form | GMM-UBM | LDA | WCCN | NAP | LDA+WCCN
Rows:
   Base Case | Spontaneous
   Speaking Style | Reading
   Speaking Volume | Loud, Soft, Whispered
   Speaking Rate | Fast, Slow
   Emotional State | Angry, Happy
   Physical Status | Denasalized, Mumbled
   Speaking Language | English

18 Conclusions & Future Work
Conclusions
   Mismatches in intrinsic variations cause sharp degradation in speaker verification performance.
   The i-vector framework performs better than GMM-UBM in modeling intrinsic variations.
   Whispered utterances cause the largest degradation in speaker verification performance.
Future Work
   More techniques for intrinsic variation compensation.
   Improvements in the feature domain.

19 Q & A
Thanks!
