Building a Speaker Recognition System with One Sample
ISBN (print), (CD-ROM). Proceedings of the Second International Symposium on Computer Science and Computational Technology (ISCSCT '09), Huangshan, P. R. China, 26-28 Dec. 2009. © 2009 Academy Publisher, AP-PROC-CS-09CN.

Mansour Alsulaiman, Ghulam Muhammad, Yousef Alotaibi, Awais Mahmood, and Mohamed Abdelkader Bencherif
Computer Engineering Dept., College of Computer and Information Sciences, King Saud University, Saudi Arabia
msulaiman@ksu.edu.sa; ghulam@ccis.ksu.edu.sa; yaalotaibi@ksu.edu.sa; awais.mahmood@gmail.com; mbencherif1@yahoo.com

Abstract—Speaker recognition is the process of automatically recognizing a person from his/her speech. To recognize a speaker correctly, the system needs many speech samples, recorded at different times from each speaker. However, in some applications, such as forensics, the number of samples per speaker is very limited. In this paper, a method is proposed to train a speaker recognition system from only one speech sample; from that one sample, other samples are generated. The intent is to provide a complete speaker recognition system without requiring the speaker to record speech samples at different times. For this purpose, the speech sample is modified without altering the pitch or the speaker-dependent features. Several techniques are used to generate new samples and apply them to a recognition system based on the hidden Markov model. The system is built using the HTK software, a hidden Markov model toolkit, and the best recognition rate is 85.86%.

Index Terms—Speaker recognition; sample generation; hidden Markov model.

I. INTRODUCTION

Biometric systems are roughly divided into behavioral and physical pattern measurements. Many countries are producing valuable scientific reports on feasible and viable methods to be used in access or recognition systems.
The complexity comes from several issues, the most important of which are: the non-reproducibility of the registered pattern; a low data collection error rate; high user acceptability [1, 2]; the size of the database [9]; and the technology that must be embedded into the terminal capture points. These major strategic points classify the different biometric modalities and weight some techniques over others. Among the dynamic methods, sometimes considered as changing over time, speech is strongly affected, as fingerprints are, by data collection error: many sessions must be executed in order to get a candidate set of samples. The beauty of speech is its non-invasive nature, i.e. it can be recorded without the person's consent, or sometimes without his/her physical attendance. This is also subject to the available recording technology: sophisticated microphones, channels such as landline or mobile phones, or TV interviews and radio broadcasts. Unfortunately, sometimes there is not enough recorded speech, which leads to a lack of training data for the model to be trained correctly, and that results in a very low recognition rate.

Some methods of speech lengthening are used in human speech recognition for the benefit of speech perception, as a source of information for understanding the prosodic organization of speech [3], and for word discrimination by children in kindergartens [4]. Regarding the above-mentioned problem, different techniques are proposed in this paper for generating new samples from one sample. One method is an expansion, or meaningful lengthening, obtained by modifying the existing sample in order to strengthen the template establishment during training. All the other original samples are used for testing the system.
The paper is organized as follows: Section II describes the database and the selection of data; Section III defines the modeling technique used in this paper; Section IV introduces the front-end processing part of the system; Section V illustrates the different generation methods explored in this paper; Section VI describes the experiments performed, with the results given in Section VII; Section VIII analyzes the results; finally, Section IX concludes the paper and gives suggestions for future work.

II. DATABASE

This research has been conducted with a local dataset recorded at King Saud University, College of Computer and Information Sciences (CCIS), during the year 2007 [5]. The dataset consists of 91 speakers pronouncing the word نعم, which stands for the English word "yes", in 5 different occurrences. The speaker recognition system is phoneme based, and uses the phonemes of the word نعم to recognize the speaker. The main characteristics of this word are of two aspects. The first is that approximately all Arab speakers frequently say "yes" (in Arabic) in any discussion. The second is the richness of its phonetic structure: it contains the nasal phoneme ن, a very pertinent phoneme ع, and at last a bilabial phoneme م, allowing the capture of the energy of the whole word. It also contains two occurrences of the vowel fatha (فتحة). This richness, plus the fact that it is a commonly pronounced word, makes it a good choice for our investigation. The samples are denoted as follows. First original sample: O1.
Four other original samples are used for testing: O2, O3, O4, O5. In this work, a part of the database is used, consisting of 25 different male speakers (20 adults + 5 children), all native Arabic speakers. Each speaker uttered the same isolated word نعم five times; the speakers recorded their speech samples in one or two sessions.

III. MODELING TECHNIQUE AND SPEECH FEATURES

In text-dependent applications, where there is strong prior knowledge of the spoken text, additional temporal knowledge can be incorporated by using Hidden Markov Models (HMMs). An HMM is a stochastic modeling approach used for speech/speaker recognition, similar to a finite state machine. Each state (node) has an associated probability density function (PDF) for the feature vectors, and moving from one state to another is subject to a transition probability. The first and last states are non-emitting: the machine always starts in the first state and always ends in the last, so there are no incoming transitions into the start state and no outgoing transitions from the end state. Every emitting state has a set of outgoing transitions whose probabilities sum to one, since a transition out of a non-final state must always occur [6, 7, 8].

The HMM system is built using the HTK (Hidden Markov Toolkit) software, developed by Steve Young at Cambridge University. Our models have three active (emitting) states in a left-to-right topology, and each state has one mixture. Each phoneme of the keyword is modeled by a separate model for each speaker; that is, even the same linguistic sound is modeled differently for different speakers. These models can then be used to find the speaker identity. A silence model is also included in the model set; in a later step, a short-pause model is created from, and tied to, the silence model.
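The left-to-right topology described above can be sketched as a transition matrix: a non-emitting entry state, three emitting states, and a non-emitting exit state. The probabilities below are illustrative placeholders only (HTK estimates the real values during training); the structural constraints are what matter.

```python
# Left-to-right HMM transition matrix for one phoneme model.
# State 0 = non-emitting entry, states 1-3 = emitting, state 4 = non-emitting exit.
# Probabilities here are hypothetical stand-ins, not the paper's trained values.
N = 5
A = [[0.0] * N for _ in range(N)]
A[0][1] = 1.0              # entry state moves straight into the first emitting state
for i in (1, 2, 3):
    A[i][i] = 0.6          # self-loop: stay in the same state for another frame
    A[i][i + 1] = 0.4      # advance to the next state (state 3 advances to the exit)
# State 4 (exit) has no outgoing transitions.

# Structural checks: every non-final state's outgoing mass sums to one,
# and no transition ever moves backwards.
for i in range(4):
    assert abs(sum(A[i]) - 1.0) < 1e-12
    assert all(A[i][j] == 0.0 for j in range(i))
print("left-to-right structure OK")
```

The assertions encode exactly the two properties stated in the text: outgoing probabilities of each non-final state sum to one, and the chain only moves forward.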
IV. FRONT-END PROCESSING

This step deals with feature extraction, where the speech is reduced to a smaller set of important characteristics represented by vectors, such as the Mel Frequency Cepstral Coefficients (MFCC). Cepstral features are the most widely used in speaker recognition for several reasons: their robustness to noise distortion, their capability to filter the sound as the human cochlear system does, and their degree of de-correlation. The parameters of the system are a 16 kHz sampling rate with 16-bit sample resolution, a 25-millisecond Hamming window with a step size of 10 milliseconds, and 12 MFCC coefficients as features. This front end is similar to our original work as presented in [5].

V. PROPOSED METHODS

Several techniques are proposed to generate new speech samples from one original speech sample. These new samples can be used for training speaker recognition systems without altering the speaker identity, i.e. without modifying the pitch or the speaker-dependent features. All the samples are generated by modifying the first speech sample of each speaker in the time domain using the PRAAT software. The new samples are generated by any one, or a combination, of the following methods:

A. Copying a part of the speech and concatenating it
Samples are generated by copying a small part of the initial speech sample and inserting it just after the selection. This is done on the first, middle, and last parts of the sample, resulting in three additional samples. The copied part is around 20 to 30 milliseconds for the first group of three new samples, and 40 to 60 milliseconds for the second group.

B. Reversing the word
In this category four samples are generated. The first is the original sample reversed. The second, third and fourth are generated by copying a small part (approximately 20 to 30 ms) of the phonemes ن, ع and م and inserting it just after the selection in the reversed word.

C. Adding noise at different SNR
A total of six samples are generated. The first three are generated by adding babble noise at 5, 10 and 20 dB SNR, respectively; the other three by adding train noise at 5, 10 and 20 dB SNR, respectively.

VI. EXPERIMENTS

To confirm that the newly generated samples contain supplementary information about the speakers, a test experiment is first performed in which the system is trained and tested with the same original sample.

Figure 1. Original and generated samples using concatenation (panels: original sample; lengthening ن; lengthening ع; lengthening م).
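The time-domain generation methods A and B above can be sketched on a raw sample array. This is a pure-Python illustration, not the paper's PRAAT procedure; the segment position and length are arbitrary stand-ins for the phoneme centres and the 20-60 ms windows.

```python
def insert_copy(samples, start, length):
    """Method A: copy a short segment and re-insert it immediately after
    itself, lengthening the sound without changing its pitch."""
    segment = samples[start:start + length]
    return samples[:start + length] + segment + samples[start + length:]

def reverse_sample(samples):
    """Method B: play the whole utterance backwards."""
    return samples[::-1]

# Toy 'waveform': at 16 kHz, a 20-30 ms segment would be 320-480 samples;
# a short list is used here so the effect is visible.
x = [0, 1, 2, 3, 4, 5, 6, 7]
lengthened = insert_copy(x, start=2, length=3)   # duplicates the segment [2, 3, 4]
reversed_x = reverse_sample(x)

print(lengthened)   # [0, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7]
print(reversed_x)   # [7, 6, 5, 4, 3, 2, 1, 0]
```

The rev4 samples combine both operations: reverse first, then apply `insert_copy` to the phoneme segments of the reversed word.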
This experiment is named E1. The recognition rate is 10%, which is very low, as expected: there is not enough information in one sample. The system is then trained and tested with the original and generated samples (experiments E2 and E3, using the samples from Section V.A), and 100% accuracy is obtained. This high recognition rate is due to the supplementary information contributed during training by the newly generated samples. However, this is not a real test, because the system should be tested with other original samples, as in the following experiments.

A. Concatenation
Samples S5, S6 and S7 are generated by copying the central part (approximately 20 to 30 ms) of each of the phonemes ن, ع and م of the original sample O1, and inserting it just after the selected part, respectively. The vertical dotted lines in Fig. 1 show the inserted parts. This group is named conc1. Samples S8, S9 and S10 are generated by copying a longer part, 40 to 60 ms, of each phoneme of O1 and inserting it just after the selected part. This group is named conc2, as listed in Table 1.

B. Generating samples by reversing
The first sample in this group, S11, is generated by reversing the sample O1; it is named rev1. The second, third and fourth samples, S12, S13 and S14, are generated by copying a part of approximately 20 to 30 ms of each phoneme of S11 and inserting it just after the selected part, respectively. Note that the order of the phonemes is reversed, leading to a new word meaning "all together". This group is named rev4.

C. Generating samples by adding noise
A total of six samples are generated in this last category.
The samples S15, S16 and S17 are generated by adding babble noise at 5, 10 and 20 dB SNR, respectively; this group is named nois1. The other three samples, S18, S19 and S20, are generated by adding train noise at 5, 10 and 20 dB SNR, respectively; this group is named nois2.

VII. RESULTS

Table 2 summarizes the results of the conducted experiments, performed using the 25 speakers of the database (Section II). In all experiments the training samples are O1 together with the groups of generated samples listed in Table 1.

A. Effect of concatenation
Three experiments are conducted in this part:
1. E4 trains the system on the samples of conc1; the recognition rate is 50%.
2. E5 trains the system on the samples of conc2; the recognition rate is 40%, 10% lower than E4.
3. In E6, both types of concatenation are included (more information), and the recognition rate increases to 83%.
These results indicate that combining different types of concatenation, i.e. more samples, improves the recognition rate beyond what a single type of concatenation achieves.

TABLE 1. Techniques for generating samples

  Sample           Category   Method to generate the new sample
  O1               -          Original sample.
  S5, S6, S7       conc1      A small part (approx. 0.02 to 0.03 s) of the first, second and third phonemes (ن, ع and م) is copied and inserted just after the selection.
  S8, S9, S10      conc2      As conc1, but with a longer copied part (approx. 0.04 to 0.06 s).
  S11              rev1       Reverse of O1.
  S12, S13, S14    rev4       A small part (approx. 0.02 to 0.03 s) of the first, second and third phonemes of S11 is copied and inserted just after the selection.
  S15, S16, S17    nois1      Babble noise added to the original signal O1 at 5, 10 and 20 dB SNR.
  S18, S19, S20    nois2      Train noise added to the original signal O1 at 5, 10 and 20 dB SNR.
  O6, O7           -          Reverse of the original samples O2 and O3, respectively.

TABLE 2. Experimental results (the test samples are the remaining original samples O2-O5)

  Exp   Technique             Training samples        Rec. rate
  E4    conc1                 O1, S5-S7               50%
  E5    conc2                 O1, S8-S10              40%
  E6    conc1, conc2          O1, S5-S10              83%
  E7    conc1, nois1          O1, S5-S7, S15-S17      82.11%
  E8    nois1, nois2, conc1   O1, S15-S20, S5-S7      82%
  E9    conc1, rev4           O1, S5-S7, S11-S14      -
  E10   conc1, conc2, rev4    O1, S5-S14              76.77%
  E11   conc1, conc2, rev1    O1, S5-S11              85.86%
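The nois1 and nois2 groups fix the signal-to-noise ratio when mixing noise into O1. A sketch of the standard power-ratio scaling follows; the assumption (not stated in the paper) is that its dB figures refer to this usual definition, SNR = 10·log10(P_signal / P_noise).

```python
import math

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db,
    then mix it into the signal sample by sample."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Required noise power, and the amplitude scale that achieves it.
    target_p_noise = p_signal / (10 ** (snr_db / 10.0))
    scale = math.sqrt(target_p_noise / p_noise)
    return [s + scale * n for s, n in zip(signal, noise)]

signal = [1.0, -1.0, 1.0, -1.0]    # toy signal, power 1.0
noise = [0.5, 0.5, -0.5, -0.5]     # toy noise, power 0.25
noisy = add_noise_at_snr(signal, noise, snr_db=20)

# Verify: the injected noise component should carry power 0.01 (20 dB below 1.0).
mixed_noise = [y - s for y, s in zip(noisy, signal)]
p = sum(n * n for n in mixed_noise) / len(mixed_noise)
print(round(10 * math.log10(1.0 / p)))   # 20
```

Lower snr_db values (e.g. the 5 dB condition) simply scale the same noise up, which is why the 5 dB samples are the most corrupted.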
B. Effect of noise
Two experiments are conducted in this part:
1. E7 trains the system on the samples of conc1 and nois1; the recognition rate is 82.11%.
2. E8 trains the system on the samples of nois1 and nois2; the recognition rate is 82%.

C. Effect of reversing
In this part, the samples used are generated as described in Section V.B. Three experiments are conducted:
1. E9 trains the system on the samples of conc1 and rev4.
2. E10 trains the system on the samples of conc1, conc2 and rev4; the recognition rate is 76.77%.
3. In E11, the samples of conc1, conc2 and rev1 are used; the recognition rate increases to 85.86%.

VIII. DISCUSSION

Experiment E1 sets the baseline for this work: without enough information from different samples, the HMM cannot build a model and recognize the speaker, and repeating the same sample does not add any new information. Experiments E4 and E5 then show that, by careful modification of a sample, new samples can be generated that give the HMM more information and allow it to build an improved model with better recognition rates. These three experiments (E1, E4 and E5) are the bootstrap of our work.

Figure 2. Recognition rates per experiment (E6-E11).

From experiment E6, it can be seen that complementing one generation method with another raises the recognition rate from 40-50% to 83%. A similar conclusion follows from E7, where concatenation is complemented with added noise. Experiment E8 not only reinforces this conclusion but is also a major result, since it achieves a high recognition rate with samples generated by adding noise plus only a small alteration of the original sample (conc1). Two consequences follow: lengthening the vowel duration may increase the recognition rate, and adding noise together with lengthening vowels may reduce the error rate.
E9 and E10 do not give results as good as E6-E8. This can be attributed to co-articulation: the phonemes are context dependent, affected by the preceding and following phonemes. In E11, conc1, conc2 and rev1 are used; this experiment outperforms the other methods, attaining a recognition rate of 85.86%. Although conc1 and conc2 are generated with the same technique, they use different lengths and therefore give different information, while rev1 is generated by reversing the word. Hence three methods complement each other and result in the best recognition rate. In other words, reversing the sample in the time domain may have a positive effect on the recognition rate.

This concept of complementary methods giving better results can be clarified by an analogy: by looking at a view from different angles, we can produce a better picture of it, or even a complete 360-degree picture. Similarly, using different methods of generating new samples gives the HMM a better representation, so a better model is built. This suggests investigating other ways of generating speech samples and exploring different combinations.

IX. CONCLUSION

Different techniques to generate new samples from one original sample are proposed, to overcome the problem of a limited database. Experimental results showed that even adding different types of noise at different SNR levels to the original sample during training significantly improved the recognition accuracy. The experiments also demonstrate that using different methods that complement each other increases the recognition rate; the highest obtained recognition rate is 85.86%, using samples from three different methods. In the future, we will investigate these techniques on a larger number of speakers, as well as improve the accuracy using other possible techniques.
Initial results with 50 speakers are encouraging. The work could be extended to find the minimum number of words, with some specific phonemes, that keeps the recognition rate as high as possible. This could be used to select a very accurate set of words (phonemes) that characterizes the speaker, without long recording sessions; from this selected set of basic words (phonemes), new samples can be generated using the methods presented in this paper.

REFERENCES
[1] J. Wayman, A. Jain, D. Maltoni, D. Maio, Biometric Systems: Technology, Design, and Performance Evaluation, Springer.
[2] T. Ruggles, Comparison of Biometric Techniques.
[3] Cao Jianfen, Restudy of Segmental Lengthening in Mandarin Chinese, ISCA 2004.
[4] E. Segers, L. Verhoeven, Effects of Lengthening the Speech Signal on Auditory Word Discrimination in Kindergartners with SLI, Journal of Communication Disorders, vol. 38, no. 6, Nov-Dec.
[5] S. S. Al-Dahri, Y. H. Al-Jassar, Y. A. Alotaibi, M. M. Alsulaiman, K. A. B. Abdullah-Al-Mamun, A Word-Dependent Automatic Arabic Speaker Identification System, IEEE Int. Symposium on Signal Processing and Information Technology (ISSPIT).
[6] L. R. Rabiner, B. H. Juang, An Introduction to Hidden Markov Models, IEEE ASSP Magazine, vol. 3, pp. 4-16, Jan. 1986.
[7] L. R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[8] J. Olsson, Text Dependent Speaker Verification with a Hybrid HMM/ANN System, thesis project in Speech Technology, /exjobb_jolsson.pdf.
[9] F. Botti, A. Alexander, A. Drygajlo, An Interpretation Framework for the Evaluation of Evidence in Forensic Automatic Speaker Recognition with Limited Suspect Data, Proc. Odyssey 2004, pp. 63-68, Toledo, Spain, 2004.
More informationMarathi Interactive Voice Response System (IVRS) using MFCC and DTW
Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Manasi Ram Baheti Department of CSIT, Dr.B.A.M. University, Aurangabad, (M.S.), India Bharti W. Gawali Department of CSIT, Dr.B.A.M.University,
More informationHOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION. Victor Bisot, Slim Essid, Gaël Richard
HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Victor Bisot, Slim Essid, Gaël Richard Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, 37-39 rue Dareau, 75014
More informationQMeter Tools for Quality Measurement in Telecommunication Network
QMeter Tools for Measurement in Telecommunication Network Akram Aburas 1 and Prof. Khalid Al-Mashouq 2 1 Advanced Communications & Electronics Systems, Riyadh, Saudi Arabia akram@aces-co.com 2 Electrical
More informationAutomatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations
Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ
More informationOnline Diarization of Telephone Conversations
Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of
More informationThe Development of a Pressure-based Typing Biometrics User Authentication System
The Development of a Pressure-based Typing Biometrics User Authentication System Chen Change Loy Adv. Informatics Research Group MIMOS Berhad by Assoc. Prof. Dr. Chee Peng Lim Associate Professor Sch.
More informationWireless Remote Monitoring System for ASTHMA Attack Detection and Classification
Department of Telecommunication Engineering Hijjawi Faculty for Engineering Technology Yarmouk University Wireless Remote Monitoring System for ASTHMA Attack Detection and Classification Prepared by Orobh
More informationTEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE
TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,
More informationSecuring Electronic Medical Records using Biometric Authentication
Securing Electronic Medical Records using Biometric Authentication Stephen Krawczyk and Anil K. Jain Michigan State University, East Lansing MI 48823, USA, krawcz10@cse.msu.edu, jain@cse.msu.edu Abstract.
More informationThe Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
More informationDevelopment of Academic Attendence Monitoring System Using Fingerprint Identification
164 Development of Academic Attendence Monitoring System Using Fingerprint Identification TABASSAM NAWAZ, SAIM PERVAIZ, ARASH KORRANI, AZHAR-UD-DIN Software Engineering Department Faculty of Telecommunication
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationCONATION: English Command Input/Output System for Computers
CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India
More informationWeighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for
More informationComp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition
Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The
More informationhave more skill and perform more complex
Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such
More informationMISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationOCR-Based Electronic Documentation Management System
OCR-Based Electronic Documentation Management System Khalaf S. Alkhalaf, Abdulelah I. Almishal, Anas O. Almahmoud, and Majed S. Alotaibi Abstract Optical character recognition (OCR) is one of the latest
More informationThings to remember when transcribing speech
Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely
More informationLog-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network
Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering
More informationHow To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3
Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web. By C.Moreno, A. Antolin and F.Diaz-de-Maria. Summary By Maheshwar Jayaraman 1 1. Introduction Voice Over IP is
More informationUser Authentication using Combination of Behavioral Biometrics over the Touchpad acting like Touch screen of Mobile Device
2008 International Conference on Computer and Electrical Engineering User Authentication using Combination of Behavioral Biometrics over the Touchpad acting like Touch screen of Mobile Device Hataichanok
More informationAn Experimental Study of the Performance of Histogram Equalization for Image Enhancement
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Special Issue-2, April 216 E-ISSN: 2347-2693 An Experimental Study of the Performance of Histogram Equalization
More informationHow To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
More informationCloud User Voice Authentication enabled with Single Sign-On framework using OpenID
Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID R.Gokulavanan Assistant Professor, Department of Information Technology, Nandha Engineering College, Erode, Tamil Nadu,
More informationOn the Operational Quality of Fingerprint Scanners
BioLab - Biometric System Lab University of Bologna - ITALY http://biolab.csr.unibo.it On the Operational Quality of Fingerprint Scanners Davide Maltoni and Matteo Ferrara November 7, 2007 Outline The
More informationClassification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach
International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid
More informationFactors Influencing the Adoption of Biometric Authentication in Mobile Government Security
Factors Influencing the Adoption of Biometric Authentication in Mobile Government Security Thamer Omar Alhussain Bachelor of Computing, Master of ICT School of Information and Communication Technology
More informationUnlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics
Unlocking Value from Patanjali V, Lead Data Scientist, Anand B, Director Analytics Consulting, EXECUTIVE SUMMARY Today a lot of unstructured data is being generated in the form of text, images, videos
More informationDenoising Convolutional Autoencoders for Noisy Speech Recognition
Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University mkayser@stanford.edu Victor Zhong Stanford University vzhong@stanford.edu Abstract We propose the use of
More informationThe LENA TM Language Environment Analysis System:
FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September
More informationGrant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013
Coordinating Beneficiary: UOP Associated Beneficiaries: TEIC Project Coordinator: Nikos Fakotakis, Professor Wire Communications Laboratory University of Patras, Rion-Patras 26500, Greece Email: fakotaki@upatras.gr
More informationMaximum Likelihood Estimation of ADC Parameters from Sine Wave Test Data. László Balogh, Balázs Fodor, Attila Sárhegyi, and István Kollár
Maximum Lielihood Estimation of ADC Parameters from Sine Wave Test Data László Balogh, Balázs Fodor, Attila Sárhegyi, and István Kollár Dept. of Measurement and Information Systems Budapest University
More informationCBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC
CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC 1. INTRODUCTION The CBS Records CD-1 Test Disc is a highly accurate signal source specifically designed for those interested in making
More informationIntroduction to Digital Audio
Introduction to Digital Audio Before the development of high-speed, low-cost digital computers and analog-to-digital conversion circuits, all recording and manipulation of sound was done using analog techniques.
More informationRoom Acoustic Reproduction by Spatial Room Response
Room Acoustic Reproduction by Spatial Room Response Rendering Hoda Nasereddin 1, Mohammad Asgari 2 and Ayoub Banoushi 3 Audio Engineer, Broadcast engineering department, IRIB university, Tehran, Iran,
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationSASSC: A Standard Arabic Single Speaker Corpus
SASSC: A Standard Arabic Single Speaker Corpus Ibrahim Almosallam, Atheer AlKhalifa, Mansour Alghamdi, Mohamed Alkanhal, Ashraf Alkhairy The Computer Research Institute King Abdulaziz City for Science
More informationSolutions to Exam in Speech Signal Processing EN2300
Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.
More informationMusic Genre Classification
Music Genre Classification Michael Haggblade Yang Hong Kenny Kao 1 Introduction Music classification is an interesting problem with many applications, from Drinkify (a program that generates cocktails
More informationNon-Data Aided Carrier Offset Compensation for SDR Implementation
Non-Data Aided Carrier Offset Compensation for SDR Implementation Anders Riis Jensen 1, Niels Terp Kjeldgaard Jørgensen 1 Kim Laugesen 1, Yannick Le Moullec 1,2 1 Department of Electronic Systems, 2 Center
More informationMUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
More informationFunctional Auditory Performance Indicators (FAPI)
Functional Performance Indicators (FAPI) An Integrated Approach to Skill FAPI Overview The Functional (FAPI) assesses the functional auditory skills of children with hearing loss. It can be used by parents,
More informationApplication Note. Using PESQ to Test a VoIP Network. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN
Using PESQ to Test a VoIP Network Application Note Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com
More informationBandwidth analysis of multimode fiber passive optical networks (PONs)
Optica Applicata, Vol. XXXIX, No. 2, 2009 Bandwidth analysis of multimode fiber passive optical networks (PONs) GRZEGORZ STEPNIAK *, LUKASZ MAKSYMIUK, JERZY SIUZDAK Institute of Telecommunications, Warsaw
More informationSecuring Electronic Medical Records Using Biometric Authentication
Securing Electronic Medical Records Using Biometric Authentication Stephen Krawczyk and Anil K. Jain Michigan State University, East Lansing MI 48823, USA {krawcz10,jain}@cse.msu.edu Abstract. Ensuring
More informationRecognition of Emotions in Interactive Voice Response Systems
Recognition of Emotions in Interactive Voice Response Systems Sherif Yacoub, Steve Simske, Xiaofan Lin, John Burns HP Laboratories Palo Alto HPL-2003-136 July 2 nd, 2003* E-mail: {sherif.yacoub, steven.simske,
More informationHMM-based Breath and Filled Pauses Elimination in ASR
HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science
More informationTHE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE*
THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE* Victor Zue, Nancy Daly, James Glass, David Goodine, Hong Leung, Michael Phillips, Joseph Polifroni, Stephanie Seneff, and Michal
More informationVoice Authentication for ATM Security
Voice Authentication for ATM Security Rahul R. Sharma Department of Computer Engineering Fr. CRIT, Vashi Navi Mumbai, India rahulrsharma999@gmail.com Abstract: Voice authentication system captures the
More informationA Digital Audio Watermark Embedding Algorithm
Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, 3008, China tangxh@hziee.edu.cn,
More informationMPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music
ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final
More information