Discriminative Decision Function Based Scoring Method Used in Speaker Verification
Chinese Journal of Electronics, Vol.21, No.4, Oct. 2012

LIANG Chunyan, ZHANG Xiang and YAN Yonghong
(The Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China)

Abstract: The log-likelihood-ratio decision function derived from classical hypothesis testing theory is widely used in Gaussian mixture model based speaker recognition systems. This paper introduces a discriminative decision function based scoring method for speaker recognition with the state-of-the-art Joint factor analysis (JFA) system. In the scoring module of the JFA system, an approximate form of the decision function is proposed. Based on this approximation, we present a discriminative decision function that re-estimates the contribution of each speech sound unit to the decision function, in order to further improve speaker verification performance. The discriminative decision function exploits the individual Gaussian components for better classification. The experiments are carried out on the core conditions of the National Institute of Standards and Technology (NIST) 2010 speaker recognition evaluation data. The experimental results show that the proposed scoring method outperforms the conventional frame-by-frame strategy on the whole.

Key words: Speaker verification, Joint factor analysis (JFA), Discriminative decision function.

I. Introduction

The task of speaker verification is to determine whether a given segment of speech is spoken by the hypothesized speaker [1,2]. The task can be treated as a hypothesis-testing problem. Given a trial consisting of a test utterance and a target speaker, a True or False decision is made by comparing the log-likelihood score of the trial with a threshold. Gaussian mixture models (GMMs) have long been the dominant approach in speaker verification [1].
In this approach, GMMs model the data distribution, and the Log likelihood ratio (LLR) derived from hypothesis testing theory is used as the decision function. In recent years, Joint factor analysis (JFA) [3,4] has become the state-of-the-art technique in speaker verification. It was proposed to address speaker and session variability within the GMM framework. Many sites used JFA in the latest NIST evaluations, and several scoring strategies exist [5-7]. Frame-by-frame scoring is the most conventional one: the whole feature file of each utterance is processed with a full GMM log-likelihood evaluation, which treats the GMM simply as a probability density function of the feature vectors from a target speaker. In this study, we propose a scoring method based on a discriminative decision function, which expands a single GMM into a set of individual Gaussian components. In the proposed method, we re-estimate the contribution of each speech sound unit to the decision function to further improve speaker verification performance.

The rest of this paper is organized as follows. We briefly introduce the theory of JFA in Section II. The traditional frame-by-frame scoring method is presented in Section III. We propose the discriminative decision function based scoring strategy in Section IV. Experimental results are shown in Section V. Finally, we give the conclusion in Section VI.

II. Joint Factor Analysis

JFA has attracted wide attention during the last few years and has become the state-of-the-art system in the field of speaker recognition. The JFA model addresses speaker and session variability within the GMM framework. In this model, the speaker- and channel-dependent mean supervector $M$ is represented as the sum of two supervectors:

$$M = s + c \tag{1}$$

where $s$ is the speaker supervector and $c$ is the channel supervector, both of which are normally distributed.
They can be respectively represented by

$$s = m + Vy + Dz \tag{2}$$

$$c = Ux \tag{3}$$

where $m$ is the speaker-independent mean supervector, i.e. the mean supervector of the Universal background model (UBM), $V$ is the speaker loading matrix (eigenvoices with high speaker variability), $D$ is the diagonal loading matrix describing the remaining speaker variability not covered by $V$, and $U$ is the channel loading matrix (eigenchannels with high intersession variability). $y$, $z$ and $x$ are the speaker factor, diagonal factor and channel factor respectively, all of which are assumed to be standard normally distributed random variables. The underlying task in JFA is to train the hyperparameters $U$, $V$ and $D$ on a large training set. In the Bayesian framework, the posterior distribution of the factors given their priors can be computed using the enrollment data. The likelihood of test utterance $\chi$ is then computed by integrating over the posterior distribution of $y$ and $z$, and the prior distribution of $x$ [8].

Manuscript Received June 2011; Accepted Apr. This work is supported by the National Natural Science Foundation of China No , No , No , No , No and the Strategic Priority Research Program of the Chinese Academy of Sciences No.XDA

III. Traditional Frame-by-Frame Scoring Method

The frame-by-frame scoring method is based on a full GMM log-likelihood evaluation [7]. The log-likelihood of test utterance $\chi$ given model $s$ is computed as an average frame log-likelihood:

$$\log P(\chi \mid s) = \frac{1}{T}\sum_{t=1}^{T} \log \sum_{c=1}^{C} \omega_c \, \mathcal{N}(o_t; s'_c, \Sigma_c) = \frac{1}{T}\sum_{t=1}^{T} \log p(o_t \mid s') \tag{4}$$

where $o_t$ is the feature vector at frame $t$, $T$ is the length in frames of test utterance $\chi$, $C$ is the number of Gaussians in the GMM, $\omega_c$ and $\Sigma_c$ are the weight and covariance of component $c$, and $s' = s + Ux$ is the supervector of the target model after channel adaptation, with $Ux$ the channel supervector for the test utterance. Similarly, when calculating the log-likelihood of utterance $\chi$ given the UBM, the mean supervector of the UBM is also compensated as $m' = m + Ux$. This is equivalent to moving the mean supervectors of both the target model and the UBM into the channel space in which the test utterance lies, which effectively reduces the acoustic mismatch between the training and test environments. The average verification score is then obtained as the log-likelihood ratio between the compensated target speaker model $s'$ and the compensated UBM $m'$ for the test utterance $\chi$:

$$\Lambda(\chi) = \frac{1}{T}\sum_{t=1}^{T}\left[\log p(o_t \mid s') - \log p(o_t \mid m')\right] \tag{5}$$

IV. Discriminative Decision Function Based Scoring Method

1. The approximation of decision function

If we define $p(o_t)$ to denote the total probability of both the speaker model $s'$ and the UBM $m'$ given a feature frame $o_t$, that is

$$p(o_t) = p(o_t \mid s') + p(o_t \mid m') \tag{6}$$

then Eq.5 can be written as

$$\Lambda(\chi) = \frac{1}{T}\sum_{t=1}^{T}\left[\log\frac{p(o_t \mid s')}{p(o_t)} - \log\frac{p(o_t \mid m')}{p(o_t)}\right] = \frac{1}{T}\sum_{t=1}^{T}\left[\log\frac{p(o_t \mid s')}{p(o_t \mid s') + p(o_t \mid m')} - \log\frac{p(o_t \mid m')}{p(o_t \mid s') + p(o_t \mid m')}\right] \tag{7}$$

In a GMM $\lambda$, the probability $p(o_t \mid \lambda)$ for an observed feature frame $o_t$ is

$$p(o_t \mid \lambda) = \sum_{c=1}^{C} \omega_c \, p(o_t \mid \lambda_c) = \sum_{c=1}^{C} g(o_t \mid \lambda_c) \tag{8}$$

Two terms of the Taylor series, $\log x \approx x - 1$, are used to obtain the approximation of Eq.7, and we discard the 1 since the change will not affect the classification accuracy:

$$\Lambda(\chi) \approx \frac{1}{T}\sum_{t=1}^{T}\left[\frac{p(o_t \mid s')}{p(o_t \mid s') + p(o_t \mid m')} - \frac{p(o_t \mid m')}{p(o_t \mid s') + p(o_t \mid m')}\right] = \frac{1}{T}\sum_{t=1}^{T}\sum_{c=1}^{C}\frac{g(o_t \mid s'_c) - g(o_t \mid m'_c)}{\sum_{j=1}^{C}\left[g(o_t \mid s'_j) + g(o_t \mid m'_j)\right]} \tag{9}$$

If we define

$$\Phi_c = \frac{1}{T}\sum_{t=1}^{T}\frac{g(o_t \mid s'_c) - g(o_t \mid m'_c)}{\sum_{j=1}^{C}\left[g(o_t \mid s'_j) + g(o_t \mid m'_j)\right]}, \quad c = 1, 2, \ldots, C \tag{10}$$

as the difference of average occupation probability over the whole observation series for Gaussian component $c$ between the adapted speaker model and the UBM, Eq.9 can be rewritten as the inner product

$$\Lambda(\chi) = w \cdot b_\eta \tag{11}$$

where $w = [1, \ldots, 1]$ is a weight vector of all ones and $b_\eta = [\Phi_1, \ldots, \Phi_C]^t$ denotes the difference vector of occupation probability for a trial $\eta$.

From Eq.11 we can see that, given a trial $\eta$, the value of the decision function, and hence the True or False decision for the trial, is completely determined by a weight vector $w$ and a difference vector $b_\eta$. The average occupation probability $\Phi_c$ can be thought of as the occurrence frequency of Gaussian mixture component $c$ over the whole observation sequence. We call the difference vector $b_\eta$ the trial's information vector; it maps the trial into a vector. The values in the weight vector $w$ can be viewed as the contributions to the decision function of the corresponding elements in the trial's information vector. Hence we name $w$ the contribution factor, which can also be considered a classifier between the true information vectors and the false ones.
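The mapping from a trial to its information vector (Eq.10) and the two scores (Eq.5 and Eq.11) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs whose weights and variances are shared by the speaker model and the UBM, uses synthetic frames, and all function and variable names are hypothetical.

```python
import numpy as np

def weighted_component_likelihoods(frames, weights, means, variances):
    """g[t, c] = w_c * N(o_t; mu_c, diag(var_c)) for diagonal-covariance Gaussians."""
    diff = frames[:, None, :] - means[None, :, :]                        # (T, C, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (C,)
    log_expo = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, C)
    return weights[None, :] * np.exp(log_norm[None, :] + log_expo)       # (T, C)

def exact_llr(g_spk, g_ubm):
    """Eq.5: average per-frame log-likelihood ratio."""
    return float(np.mean(np.log(g_spk.sum(axis=1)) - np.log(g_ubm.sum(axis=1))))

def information_vector(g_spk, g_ubm):
    """Eq.10: Phi_c, the per-component difference of average occupation."""
    total = (g_spk + g_ubm).sum(axis=1, keepdims=True)                   # (T, 1)
    return np.mean((g_spk - g_ubm) / total, axis=0)                      # (C,)

# Synthetic toy setup (assumption): 4 components, 3-dimensional features.
rng = np.random.default_rng(0)
C, D, T = 4, 3, 200
weights = np.full(C, 1.0 / C)
variances = np.ones((C, D))
ubm_means = rng.normal(scale=2.0, size=(C, D))
spk_means = ubm_means + rng.normal(scale=0.3, size=(C, D))

# Draw test frames from the (already compensated) speaker model.
component = rng.integers(0, C, size=T)
frames = spk_means[component] + rng.normal(size=(T, D))

g_spk = weighted_component_likelihoods(frames, weights, spk_means, variances)
g_ubm = weighted_component_likelihoods(frames, weights, ubm_means, variances)

phi = information_vector(g_spk, g_ubm)   # trial information vector b_eta
w = np.ones(C)                           # all-ones weight vector of Eq.11
approx_score = float(w @ phi)            # Eq.11
llr = exact_llr(g_spk, g_ubm)            # Eq.5
print(f"exact LLR (Eq.5): {llr:.4f}   approximate score (Eq.11): {approx_score:.4f}")
```

Because every per-frame term of Eq.9 is a difference of two shares that sum to one, the approximate score always lies strictly between -1 and 1, whereas the exact LLR is unbounded.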
In Eq.11, all values in $w$ are the same, which means that the differences of average occupation probability of all Gaussian components contribute equally. In GMMs for speaker verification, the Gaussian components can be considered to model the underlying broad phonetic sounds that characterize a person's voice [1]. Hence $\Phi_c$, $c = 1, \ldots, C$, can be thought of as the difference between the average occupation probability that a feature vector of the test utterance is accounted for by the corresponding speech sound unit of the target speaker model and that of the UBM. The contributions of the sound units to the decision function are determined by $w$. In practice, some sound units carry more speaker-discriminative information and should be given larger weights, while less discriminative sound units should be given smaller weights. In the following, we show how to obtain a discriminative contribution factor $w$ to further improve speaker verification performance.

2. MSE criterion

Suppose we have a training set consisting of $N_+ + N_-$ trials, in which true trials are denoted as $\{x_i\}$, $i = 1, \ldots, N_+$, and false trials as $\{y_j\}$, $j = 1, \ldots, N_-$. Each trial is mapped into a difference vector of occupation probability, $b_{x_i}$, $i = 1, \ldots, N_+$, and $b_{y_j}$, $j = 1, \ldots, N_-$. The score of the decision function for a trial $x$ can thus be written as $\mathrm{score} = w^t b_x$. We first obtain the discriminative contribution factor $w$ using the minimum sum-of-squares error (MSE) criterion [9]:

$$w^* = \arg\min_w E\left\{\left|w^t b_x - y_x\right|^2\right\} \tag{12}$$

where $E$ denotes expectation and $y_x$ is the ideal output for trial $x$. Let the ideal output be 1 for true trial vectors and 0 for false trial vectors, i.e. $y_{\mathrm{true}} = 1$ and $y_{\mathrm{false}} = 0$. The criterion above can then be approximated on the training set as

$$w^* = \arg\min_w \left[\sum_{i=1}^{N_+}\left|w^t b_{x_i} - 1\right|^2 + \sum_{j=1}^{N_-}\left|w^t b_{y_j}\right|^2\right] \tag{13}$$

We construct the matrices $M_+$ and $M_-$ from the information vectors of the true and false trials respectively:

$$M_+ = \begin{bmatrix} b_{x_1}^t \\ b_{x_2}^t \\ \vdots \\ b_{x_{N_+}}^t \end{bmatrix}, \quad M_- = \begin{bmatrix} b_{y_1}^t \\ b_{y_2}^t \\ \vdots \\ b_{y_{N_-}}^t \end{bmatrix} \tag{14}$$

and we define

$$M = \begin{bmatrix} M_+ \\ M_- \end{bmatrix} \tag{15}$$

Then the problem of Eq.13 becomes

$$w^* = \arg\min_w \left\|Mw - o\right\|^2 \tag{16}$$

where $o$ is a vector consisting of $N_+$ ones followed by $N_-$ zeros, i.e. the ideal outputs for the training trials. The problem of Eq.16 can be solved using the method of normal equations

$$M^t M w = M^t o \tag{17}$$

and Eq.17 can be rearranged as

$$M^t M w = M_+^t \mathbf{1} + M_-^t \mathbf{0} = M_+^t \mathbf{1} \tag{18}$$

where $\mathbf{1}$ is the vector of all ones and $\mathbf{0}$ is an all-zeros vector.
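The normal equations above can be solved directly with a linear solver. The sketch below is only an illustration on synthetic information vectors; the function name and the optional `ridge` term (a small diagonal loading for ill-conditioned data) are assumptions that do not appear in the paper.

```python
import numpy as np

def mse_contribution_factor(true_vectors, false_vectors, ridge=0.0):
    """Solve M^t M w = M_+^t 1 (Eq.18) for the contribution factor w.

    true_vectors:  (N_plus, C) information vectors of true trials (rows of M_+)
    false_vectors: (N_minus, C) information vectors of false trials (rows of M_-)
    ridge:         hypothetical regularizer added to the diagonal, 0 by default
    """
    M_plus = np.asarray(true_vectors, dtype=float)
    M_minus = np.asarray(false_vectors, dtype=float)
    M = np.vstack([M_plus, M_minus])                 # Eq.15
    R = M.T @ M + ridge * np.eye(M.shape[1])         # left-hand side matrix
    rhs = M_plus.T @ np.ones(M_plus.shape[0])        # M_+^t 1
    return np.linalg.solve(R, rhs)

# Synthetic training trials (assumption): 5 Gaussian components.
rng = np.random.default_rng(0)
true_b = rng.normal(loc=0.1, scale=0.05, size=(40, 5))
false_b = rng.normal(loc=-0.1, scale=0.05, size=(30, 5))

w = mse_contribution_factor(true_b, false_b)
true_scores = true_b @ w      # score = w^t b for each true trial
false_scores = false_b @ w
print("mean true score:", true_scores.mean(), " mean false score:", false_scores.mean())
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse, and the result coincides with an ordinary least-squares fit of the targets (ones for true trials, zeros for false trials).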
If we define $R = M^t M$, $w$ can be obtained by

$$w = R^{-1} M_+^t \mathbf{1} \tag{19}$$

Under the MSE criterion the classifier weights all training samples equally instead of concentrating on those that are easily misclassified, so the discriminability of the $w$ trained by Eq.19 is limited. Building on Eq.19, we then use the Generalized linear discriminant sequence (GLDS) kernel based Support vector machine (SVM) to obtain the optimal $w$.

3. GLDS kernel method for the discriminative training of contribution factor w

Combining the solution of Eq.19 with the scoring equation Eq.11, we have

$$\mathrm{score} = b^t w = b^t R^{-1} M_+^t \mathbf{1} \tag{20}$$

The above equation can be written as

$$\mathrm{score} = b^t \bar{R}^{-1} \bar{b}_+ \tag{21}$$

where $\bar{b}_+ = (1/N_+) M_+^t \mathbf{1}$ and $\bar{R} = (1/N_+) R$. We compare two trials $x$ and $y$ by first mapping them into trial information vectors $b_x$ and $b_y$ and then computing the GLDS kernel as [10]

$$K_{GLDS} = b_x^t \bar{R}^{-1} b_y \tag{22}$$

To reduce training time, we factor $\bar{R}^{-1} = U^t U$ using the Cholesky decomposition. Then

$$K_{GLDS} = (U b_x)^t (U b_y) \tag{23}$$

If we transform all the trial information vectors by $U b_x$, the kernel is a simple inner product. This dramatically reduces the time used in SVM training. Finally, the SVM training procedure finds the corresponding $\alpha_i$ for each support vector $b_i$ and a universal constant $d$. The optimal contribution factor $w$ is then

$$w = \sum_{i=1}^{l} \alpha_i y_i \bar{R}^{-1} b_i + \mathbf{d} \tag{24}$$

where $\mathbf{d} = [d \; 0 \; \cdots \; 0]^t$. Given a new trial $z$, we first convert it to the corresponding information vector $b_z$. The discriminative decision function based on the optimal contribution factor $w$ can then be expressed as

$$\mathrm{score} = \left(\sum_{i=1}^{l} \alpha_i y_i \bar{R}^{-1} b_i + \mathbf{d}\right)^t b_z \tag{25}$$

V. Experiments

1. Experiments setup

The experiments compare JFA systems using two scoring methods, the traditional frame-by-frame method and the proposed discriminative decision function based method, and are carried out on the NIST 2010 speaker recognition evaluation corpus.
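Returning briefly to the kernel of Eqs.22 and 23 in Section IV: the equivalence between the GLDS kernel and a plain inner product of transformed vectors can be checked numerically. This is a minimal sketch on synthetic information vectors, assuming the Cholesky factorization is applied to the inverse of the normalized matrix so that Eq.23 follows from Eq.22; the variable `U` below is the Cholesky factor, not the channel loading matrix, and all names are hypothetical.

```python
import numpy as np

# Synthetic information vectors (assumption): 6 Gaussian components.
rng = np.random.default_rng(1)
n_true, n_false, C = 40, 60, 6
M_plus = rng.normal(loc=0.05, scale=0.1, size=(n_true, C))
M_minus = rng.normal(loc=-0.05, scale=0.1, size=(n_false, C))
M = np.vstack([M_plus, M_minus])

# Normalized correlation matrix and its inverse (symmetrized for numerical safety).
R_bar = (M.T @ M) / n_true
R_bar_inv = np.linalg.inv(R_bar)
R_bar_inv = 0.5 * (R_bar_inv + R_bar_inv.T)

# NumPy returns a lower-triangular L with R_bar_inv = L @ L.T,
# so U = L.T satisfies R_bar_inv = U.T @ U.
L = np.linalg.cholesky(R_bar_inv)
U = L.T

def glds_kernel(b_x, b_y):
    """Eq.22: kernel between two trial information vectors."""
    return float(b_x @ R_bar_inv @ b_y)

# Transform every information vector once; row i holds (U b_i)^t.
mapped = M @ U.T

b_x, b_y = M[0], M[-1]
k_direct = glds_kernel(b_x, b_y)          # Eq.22
k_mapped = float(mapped[0] @ mapped[-1])  # Eq.23
print(f"direct kernel: {k_direct:.6f}   inner product after mapping: {k_mapped:.6f}")
```

After the one-off transformation, any linear SVM trainer can be run on the rows of `mapped`, which is the source of the training-time saving described in Section IV.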
NIST SRE 2010 is similar to SRE 2008 but differs from prior evaluations in that the training and test conditions of the core test include not only conversational telephone speech recorded over ordinary telephone channels, but also such speech recorded over a room microphone channel, and conversational speech from an interview scenario recorded over a room microphone channel. We name these three conditions telephone, microphone and interview for short. In this study, we focus on three types of trials: telephone-telephone, interview-interview and interview-telephone. Equal error rate (EER) and the minimum Decision cost function (minDCF) are used as evaluation metrics [11,12].

In our experiments, we use Mel-frequency cepstral coefficients (MFCCs) as the acoustic features. 18 cepstral coefficients are computed, and first order derivatives over 5 frames are appended to each feature vector, which results in a dimensionality of 36. These feature vectors are modeled using GMMs, and JFA is used to handle speaker and session variability. The gender dependent UBMs with 1024 mixture components are trained using the NIST SRE side training corpus. The Switchboard II and Switchboard Cellular corpora as well as the telephone data from the NIST SRE 2005 and 2006 corpora are used to train the speaker loading matrix with 300 speaker factors. The NIST SRE 2004 corpus is used to train the diagonal matrix. For the channel loading matrix, a telephone loading matrix with 100 channel factors is trained on the phone data from the NIST SRE 2004, 2005 and 2006 corpora for the telephone-telephone condition. A common channel loading matrix, also with 100 channel factors, for both the interview-interview and interview-telephone conditions is trained on the telephone and microphone data from the NIST SRE 2004, 2005 and 2006 corpora as well as the MIXER5 interview development corpus. The true and false trials for the telephone-telephone, interview-interview and interview-telephone conditions provided in NIST SRE 2008 are used to train the contribution factor $w$ for the corresponding test conditions in NIST SRE 2010.

2. Experiments of Taylor series approximation

Since the discriminative decision function based scoring method is derived from an approximate decision function, the effect of using the Taylor series should be examined.
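A quick numerical check of the per-frame approximation is sketched below; it is an illustration rather than part of the reported experiments. Here `x` stands for the share of the compensated speaker model in the total probability of Eq.6 for one frame, so the exact per-frame term of Eq.7 is log x - log(1 - x), and the two-term Taylor approximation reduces it to x - (1 - x). The grid of `x` values is an assumption chosen for illustration.

```python
import numpy as np

# Share of the speaker model in the total per-frame probability (0 < x < 1).
x = np.linspace(0.05, 0.95, 19)

exact_term = np.log(x) - np.log(1.0 - x)   # per-frame term of Eq.7
approx_term = x - (1.0 - x)                # per-frame term of Eq.9

# Pearson correlation between the two forms over the grid.
correlation = float(np.corrcoef(exact_term, approx_term)[0, 1])

# Near x = 0.5 the exact term behaves like twice the approximate term.
x_near = np.array([0.49, 0.51])
ratio_near_half = (np.log(x_near) - np.log(1.0 - x_near)) / (2.0 * x_near - 1.0)

print(f"correlation over the grid: {correlation:.3f}")
print("exact / approximate near x = 0.5:", ratio_near_half)
```

Both terms increase monotonically with `x`, and they differ by roughly a constant factor when the two models explain a frame about equally well, which is in line with a near-linear relationship between the two scores.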
Fig.1 shows the relationship between the LLR score obtained from the traditional decision function and the approximate form using two terms of the Taylor series. We tested on utterances for male and female speakers respectively, and each utterance is scored with both Eq.5 and Eq.11. The relationship between the scores from the two scoring forms is nearly linear, which means that, for the purpose of classification, the effect of using the Taylor series can be ignored.

3. Experiments on NIST SRE 2010

In this subsection, we list the results of JFA systems using the frame-by-frame and Discriminative decision function (DDF) based scoring methods on the three test conditions in NIST SRE 2010. Table 1 lists the performance of the JFA systems based on the two scoring methods for the telephone-telephone condition. From Table 1, we can see that the proposed scoring method outperforms the conventional frame-by-frame strategy for both male and female speakers. Our system achieves a 14.85% relative improvement in EER and a 5.53% relative improvement in minDCF for male speakers, and relative gains of 16.12% in EER and 16.12% in minDCF for female speakers.

Table 1. Comparison of different scoring methods for the telephone-telephone task
                  EER%    minDCF    EER%    minDCF
Frame-by-frame
DDF

The performance of the JFA systems based on our method and the traditional frame-by-frame one for the interview-interview task is shown in Table 2. As can be seen from Table 2, our method achieves relative improvements of 11.27% in EER and 7.28% in minDCF for male speakers, and 6.21% in EER and 4.22% in minDCF for female speakers.

Table 2. Comparison of different scoring methods for the interview-interview task
                  EER%    minDCF    EER%    minDCF
Frame-by-frame
DDF

Table 3 compares the proposed system with the frame-by-frame one for the interview-telephone condition.
It demonstrates that, except for the EER for male speakers, the performance of our proposed system is comparable to or better than that of the frame-by-frame one. Relative gains of 5.54% in minDCF for male speakers and 8.39% in EER for female speakers are obtained. We have noticed that the results for male speakers on the interview-telephone task are less favorable. This may be due to the fact that the number of interview-telephone trials (both true and false) from NIST SRE 2008 is too small to train the contribution factor $w$ well.

Table 3. Comparison of different scoring methods for the interview-telephone task
                  EER%    minDCF    EER%    minDCF
Frame-by-frame
DDF

Fig. 1. Relationship of scores obtained from traditional decision function and approximate form. (a); (b)

4. Speed

The aim of this experiment was to measure the approximate scoring time of the two systems in order to compare their complexity. The time measured included reading the necessary data connected with the trial and computing the likelihood ratio. Each measurement was repeated 5 times and averaged. Table 4 shows the average scoring time per trial. From Table 4, we can see that the proposed scoring method is faster than the traditional frame-by-frame one.

Table 4. Comparison of average scoring time per trial using frame-by-frame and DDF based scoring methods
Scoring method    Time cost (s)
Frame-by-frame    3.75
DDF               2.01

VI. Conclusion

In this paper, we have introduced a discriminative decision function based scoring method for speaker verification with the JFA system. Experiments show that the proposed method is effective and outperforms the traditional frame-by-frame scoring method on the whole. The proposed method is also faster, with an average scoring time per trial of 2.01 s against 3.75 s for frame-by-frame scoring.

References
[1] D.A. Reynolds, T.F. Quatieri and R.B. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, Vol.10, No.1-3, pp.19-41.
[2] X. Zhang, X. Xiao, H. Wang, J. Zhang and Y. Yan, Multiclass maximum a posteriori linear regression for speaker verification, Chinese Journal of Electronics, Vol.19, No.4.
[3] M.H. Sanchez, L. Ferrer, E. Shriberg and A. Stolcke, Constrained cepstral speaker recognition using matched UBM and JFA training, Proc. of Interspeech, Florence, Italy.
[4] P. Kenny, P. Ouellet, N. Dehak, V. Gupta and P. Dumouchel, A study of inter-speaker variability in speaker verification, IEEE Trans. on Audio, Speech and Language Processing, Vol.16, No.5.
[5] N. Dehak, P. Kenny, R. Dehak, P. Ouellet and P. Dumouchel, Front-end factor analysis for speaker verification, IEEE Trans. on Audio, Speech and Language Processing, Vol.19, No.4.
[6] N. Brümmer, L. Burget, J. Cernocky, O. Glembek et al., Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006, IEEE Trans. on Audio, Speech and Language Processing, Vol.15, No.7.
[7] O. Glembek, L. Burget, N. Dehak, N. Brümmer and P. Kenny, Comparison of scoring methods used in speaker recognition with joint factor analysis, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan.
[8] P. Kenny and P. Dumouchel, Experiments in speaker verification using factor analysis likelihood ratios, Proceedings of Odyssey 2004, Toledo, Spain.
[9] R. Duda and P. Hart, Pattern Classification and Scene Analysis, Wiley, New York.
[10] W. Campbell, Generalized linear discriminant sequence kernels for speaker recognition, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Orlando, Florida, USA, Vol.1.
[11] The NIST year 2008 speaker recognition evaluation plan.
[12] The NIST year 2010 speaker recognition evaluation plan.

LIANG Chunyan received the B.E. degree in Communication Engineering from Shandong Normal University. She is now an M.S. & Ph.D. candidate in the Key Laboratory of Speech Acoustics and Content Understanding at the Institute of Acoustics, Chinese Academy of Sciences. Her research interests include speaker recognition and language recognition. (Email: liangchunyan@hccl.ioa.ac.cn)

ZHANG Xiang received the B.E. degree in Electronic Information Engineering from Shandong University in 2006 and the Ph.D. degree from the Key Laboratory of Speech Acoustics and Content Understanding at the Institute of Acoustics, Chinese Academy of Sciences. His research interests include speaker recognition, language identification, speaker diarization, and audio watermarking.

YAN Yonghong received the B.E. degree from Tsinghua University in 1990, and the Ph.D. degree from Oregon Graduate Institute (OGI). He worked at OGI as an Assistant Professor (1995), Associate Professor (1998) and Associate Director (1997) of the Center for Spoken Language Understanding.
He worked in Intel from 1998 to 2001, chaired Human Computer Interface Research Council, worked as Principal Engineer of Microprocessor Research Laboratory and Director of Intel China Research Center. Currently he is a professor and director of Think IT Laboratory. His research interests include speech processing and recognition, language/speaker recognition, and human computer interface. He has published more than 100 papers and holds 40 patents.
More informationALIZE/SpkDet: a state-of-the-art open source software for speaker recognition
ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition Jean-François Bonastre 1, Nicolas Scheffer 1, Driss Matrouf 1, Corinne Fredouille 1, Anthony Larcher 1, Alexandre Preti 1,
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationMUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
More informationLog-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network
Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationNTT DOCOMO Technical Journal. Shabette-Concier for Raku-Raku Smartphone Improvements to Voice Agent Service for Senior Users. 1.
Raku-Raku Smartphone Voice Agent UI Shabette-Concier for Raku-Raku Smartphone Improvements to Voice Agent Service for Senior Users We have created a new version of Shabette-Concier for Raku-Raku for the
More informationTowards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial
More informationClassifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
More informationAUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
More informationQMeter Tools for Quality Measurement in Telecommunication Network
QMeter Tools for Measurement in Telecommunication Network Akram Aburas 1 and Prof. Khalid Al-Mashouq 2 1 Advanced Communications & Electronics Systems, Riyadh, Saudi Arabia akram@aces-co.com 2 Electrical
More informationSpeech Recognition on Cell Broadband Engine UCRL-PRES-223890
Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda
More informationAcknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
More informationROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING SHORT TEST AND TRAINING SESSIONS
18th European Signal Processing Conference (EUSIPCO-21) Aalborg, Denmark, August 23-27, 21 ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING SHORT TEST AND TRAINING SESSIONS Christos Tzagkarakis and
More informationAutomatic Emotion Recognition from Speech
Automatic Emotion Recognition from Speech A PhD Research Proposal Yazid Attabi and Pierre Dumouchel École de technologie supérieure, Montréal, Canada Centre de recherche informatique de Montréal, Montréal,
More informationGeneral Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
More informationTagging with Hidden Markov Models
Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,
More informationFault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationCircle Object Recognition Based on Monocular Vision for Home Security Robot
Journal of Applied Science and Engineering, Vol. 16, No. 3, pp. 261 268 (2013) DOI: 10.6180/jase.2013.16.3.05 Circle Object Recognition Based on Monocular Vision for Home Security Robot Shih-An Li, Ching-Chang
More informationUNIVERSAL SPEECH MODELS FOR SPEAKER INDEPENDENT SINGLE CHANNEL SOURCE SEPARATION
UNIVERSAL SPEECH MODELS FOR SPEAKER INDEPENDENT SINGLE CHANNEL SOURCE SEPARATION Dennis L. Sun Department of Statistics Stanford University Gautham J. Mysore Adobe Research ABSTRACT Supervised and semi-supervised
More informationLess naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationNote on growth and growth accounting
CHAPTER 0 Note on growth and growth accounting 1. Growth and the growth rate In this section aspects of the mathematical concept of the rate of growth used in growth models and in the empirical analysis
More informationTRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY
4 4th International Workshop on Acoustic Signal Enhancement (IWAENC) TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY Takuya Toyoda, Nobutaka Ono,3, Shigeki Miyabe, Takeshi Yamada, Shoji Makino University
More informationSemantic Video Annotation by Mining Association Patterns from Visual and Speech Features
Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationThe CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
More informationHow to Improve the Sound Quality of Your Microphone
An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies Andreas Maier, Julian Exner, Stefan Steidl, Anton Batliner, Tino Haderlein, and Elmar Nöth Universität Erlangen-Nürnberg,
More informationNonparametric Tests for Randomness
ECE 461 PROJECT REPORT, MAY 2003 1 Nonparametric Tests for Randomness Ying Wang ECE 461 PROJECT REPORT, MAY 2003 2 Abstract To decide whether a given sequence is truely random, or independent and identically
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationAutomatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior
Automatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior Kenji Yamashiro, Daisuke Deguchi, Tomokazu Takahashi,2, Ichiro Ide, Hiroshi Murase, Kazunori Higuchi 3,
More informationSpeech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction
: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)
More informationPERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS
The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS Dan Wang, Nanjie Yan and Jianxin Peng*
More informationInternet Traffic Prediction by W-Boost: Classification and Regression
Internet Traffic Prediction by W-Boost: Classification and Regression Hanghang Tong 1, Chongrong Li 2, Jingrui He 1, and Yang Chen 1 1 Department of Automation, Tsinghua University, Beijing 100084, China
More informationSpeech Signal Processing: An Overview
Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech
More informationAuthor Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationVEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS
VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James
More informationEricsson T18s Voice Dialing Simulator
Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationIntegration of Negative Emotion Detection into a VoIP Call Center System
Integration of Negative Detection into a VoIP Call Center System Tsang-Long Pao, Chia-Feng Chang, and Ren-Chi Tsao Department of Computer Science and Engineering Tatung University, Taipei, Taiwan Abstract
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationThe Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
More informationReliable and Cost-Effective PoS-Tagging
Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationDUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
More informationAnalysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationA Novel Decentralized Time Slot Allocation Algorithm in Dynamic TDD System
A Novel Decentralized Time Slot Allocation Algorithm in Dynamic TDD System Young Sil Choi Email: choiys@mobile.snu.ac.kr Illsoo Sohn Email: sohnis@mobile.snu.ac.kr Kwang Bok Lee Email: klee@snu.ac.kr Abstract
More informationSpeech recognition for human computer interaction
Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices
More informationNeovision2 Performance Evaluation Protocol
Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.
More informationSeparation and Classification of Harmonic Sounds for Singing Voice Detection
Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay
More informationForecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network
Forecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network Dušan Marček 1 Abstract Most models for the time series of stock prices have centered on autoregressive (AR)
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS
ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ImpostorMaps is a methodology developed by Auraya and available from Auraya resellers worldwide to configure,
More information