Building a Speaker Recognition System with one Sample


Proceedings of the Second International Symposium on Computer Science and Computational Technology (ISCSCT '09), Huangshan, P. R. China, 26-28 Dec. 2009.

Mansour Alsulaiman, Ghulam Muhammad, Yousef Alotaibi, Awais Mahmood, and Mohamed Abdelkader Bencherif
Computer Engineering Dept., College of Computer and Information Sciences, King Saud University, Saudi Arabia
msulaiman@ksu.edu.sa; ghulam@ccis.ksu.edu.sa; yaalotaibi@ksu.edu.sa; awais.mahmood@gmail.com; mbencherif1@yahoo.com

Abstract: Speaker recognition is the process of automatically recognizing a person from his/her speech. To recognize a speaker correctly, the system needs many speech samples recorded at different times from each speaker. However, in some applications, such as forensics, the number of samples available per speaker is very limited. In this paper, a method is proposed to train a speaker recognition system from only one speech sample: from that one sample, further samples are generated. The intent is to provide a complete speaker recognition system without requiring the speaker to record speech samples at different times. For this purpose, the speech samples are modified without altering the pitch or the speaker-dependent features. Several techniques are used to generate new samples and apply them to a recognition system based on the hidden Markov model. The system is built using the HTK software, a hidden Markov model toolkit, and the best recognition rate obtained is 85.86%.

Index Terms: speaker recognition; sample generation; hidden Markov model.

I. INTRODUCTION

Biometric systems are roughly divided into behavioral and physical pattern measurements. Many countries are producing valuable scientific reports on the feasible and viable methods to be used in access or recognition systems.
The complexity comes from many issues, the most important of which concern: the non-reproducibility of the registered pattern; a low data collection error rate; high user acceptability [1,2]; the size of the database [9]; and the technology needed to embed into the terminal capture points. These major strategic points tend to classify the different biometric issues into classes, and to weight some techniques over others. Among the dynamic methods, sometimes considered as changing over time, speech is strongly affected, as fingerprints are, by data collection error: many sessions must be executed in order to get a candidate set of samples. The beauty of speech is its non-invasive nature, i.e. it can be recorded without the person's acceptance, or sometimes without his/her physical attendance. This is also subject to the existing speech recording technology: sophisticated microphones, channels such as landline or mobile phones, or TV interviews and radio broadcasts. Unfortunately, sometimes there is not enough recorded speech data, which leads to a lack of training data for the model to be correctly trained and results in a very low recognition rate. Methods of speech lengthening are used in human speech recognition for the benefit of speech perception, as a source of information in understanding the prosodic organization of speech [3], and also for word discrimination by children in kindergartens [4]. Regarding the above-mentioned problem, different techniques are proposed in this paper for generating new samples from one sample. One method is an expansion, or meaningful lengthening, obtained by modifying one of the existing samples, in order to strengthen template establishment during training. All other original samples are used for testing the system.
The paper is organized as follows: Section II describes the database and the selection of data; Section III defines the modeling technique used in this paper; Section IV introduces the front-end processing part of the system; Section V illustrates the different generation methods explored in this paper; Section VI describes the experiments performed, with the results given in Section VII; in Section VIII, the results are analyzed. Finally, Section IX concludes the paper and gives suggestions for future work.

II. DATABASE

This research has been conducted with a local dataset recorded at King Saud University, College of Computer and Information Sciences (CCIS), during the year 2007 [5]. The dataset consists of 91 speakers, each pronouncing the word نعم, which stands for the English word "yes", in 5 different occurrences. The speaker recognition system is phoneme based, and uses the phonemes of the word نعم for recognizing the speaker. The main characteristics of this word are of two aspects. The first aspect is that nearly all Arabic speakers frequently say نعم ("yes") in any discussion. The second aspect is the richness of this word in its phonetic structure: it contains the nasal phoneme [ن], a very pertinent phoneme [ع], and lastly a bilabial phoneme [م], allowing the capture of the energy of the whole word. It also contains two occurrences of the vowel (فتحة). This richness, plus the fact that it is a commonly pronounced word, makes it a good choice for our investigation. The samples will be denoted as follows. First original sample: O1.

Four other original samples are used for testing: O2, O3, O4, O5. In this work, a part of the database is used. This part consists of 25 different male speakers (20 adults + 5 children). All are native Arabic speakers. Each speaker uttered the same isolated word نعم five times. The speakers recorded their speech samples in one or two sessions.

III. MODELING TECHNIQUE AND SPEECH FEATURES

In text-dependent applications, where there is strong prior knowledge of the spoken text, additional temporal knowledge can be incorporated by using Hidden Markov Models (HMMs). The HMM is a stochastic modeling approach used for speech/speaker recognition. It is similar to a finite state machine. Each state (node) has an associated probability density function (PDF) for the feature vectors. Moving from one state to another is subject to a transition probability. The first and the last states are non-emitting: the first state is always where the state machine starts and the last state is where it always ends, i.e. there are no incoming transitions into the start state and no outgoing transitions from the end state. Every emitting state has a set of outgoing transitions, and the sum of the probabilities of those transitions is equal to one, since a transition from a non-final state must always occur [6, 7, 8]. The HMM system is built using the HTK (Hidden Markov Toolkit) software, developed by Steve Young at Cambridge University. Our models have three active states, left to right, and each state has one mixture. Each phoneme in the keyword is modeled by one model per speaker; that is, for a given speaker, each phoneme is modeled separately, even when dealing with the same linguistic sound. These models are used to find the speaker identity. The silence model is also included in the model set. In a later step, the short-pause model is created from, and tied to, the silence model.
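The left-to-right topology described above can be illustrated with a small transition matrix. This is a sketch, not the authors' HTK configuration: the self-loop and advance probabilities below are illustrative values, and in practice HTK re-estimates them during training.

```python
import numpy as np

# 5-state left-to-right HMM: state 0 (entry) and state 4 (exit) are
# non-emitting; states 1-3 are the three active emitting states.
n_states = 5
A = np.zeros((n_states, n_states))
A[0, 1] = 1.0          # entry state always moves into the first emitting state
for s in (1, 2, 3):    # each emitting state either self-loops or advances
    A[s, s] = 0.6      # illustrative self-loop probability
    A[s, s + 1] = 0.4  # illustrative advance probability
# state 4 is the exit state: no outgoing transitions

# every state except the exit state has outgoing probabilities summing to 1
assert np.allclose(A[:4].sum(axis=1), 1.0)
```

The zero upper-left-to-lower-right structure enforces the left-to-right constraint: a state can never transition backwards, matching the temporal order of the phoneme's acoustic segments.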
This system is similar to our original work as presented in [5].

V. PROPOSED METHODS

In this section, methods to generate new speech samples from one original speech sample are proposed. These new samples can be used for training speaker recognition systems without altering the speaker identity, i.e., without modifying the pitch and/or the speaker-dependent features. All the samples are generated by modifying the first speech sample of each speaker in the time domain, using the PRAAT software. The new samples are generated by any one, or a combination, of the following methods:

A. Copying a part of the speech and concatenating it

The samples are generated by copying a small part from the initial speech sample and then inserting it just after the selection. This is done on the first, middle, and last parts of the sample, resulting in three different additional samples. The copied part is around 20 to 30 milliseconds for the first group of three new samples, and 40 to 60 milliseconds for the second group.

B. Reversing the word

In this category, four different samples are generated. The first sample is generated by reversing the original sample. The second, third and fourth samples are generated by copying a small part (approximately 20 ms to 30 ms) from the phonemes ن, ع and م, then inserting it just after the selection in the reversed word.

C. Adding noise at different SNR

A total of six samples are generated. The first three samples are generated by adding babble noise at 5 dB, 10 dB and 20 dB SNR, respectively. The other three samples are generated by adding train noise at 5 dB, 10 dB and 20 dB SNR, respectively.

VI. EXPERIMENTS

In order to confirm that the newly generated samples contain supplementary information about the speakers, initially a test experiment is performed in which a system is trained and tested with the same original sample.
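The copy-and-insert lengthening of method A can be sketched as follows. The paper performs this step in PRAAT; the helper below is an illustrative NumPy equivalent, and the segment position and duration are example values, not the authors' exact selections.

```python
import numpy as np

SR = 16000  # sampling rate (Hz), matching the paper's front-end

def lengthen(signal: np.ndarray, start_s: float, dur_s: float = 0.025) -> np.ndarray:
    """Duplicate the segment [start_s, start_s + dur_s) in place,
    lengthening the sound without changing its pitch."""
    a = int(start_s * SR)
    b = int((start_s + dur_s) * SR)
    segment = signal[a:b]
    # re-insert the copied segment immediately after the selection
    return np.concatenate([signal[:b], segment, signal[b:]])

# toy example: a 0.5 s dummy signal, lengthened by a 25 ms copy at t = 0.1 s
x = np.random.randn(SR // 2)
y = lengthen(x, start_s=0.1)
assert len(y) == len(x) + int(0.025 * SR)  # 400 extra samples at 16 kHz
```

Because the inserted material is an exact local copy, the short-time spectrum (and hence the pitch and vocal-tract characteristics) inside the phoneme is preserved; only its duration grows.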
IV. FRONT-END PROCESSING

This step deals with the extraction of features, where speech is reduced to a smaller set of important characteristics represented by feature vectors, such as the Mel Frequency Cepstral Coefficients (MFCC). Cepstral features are the most widely used in speaker recognition, for many reasons: their robustness to noise distortion, their capability to filter sound as the human cochlear system does, and their degree of de-correlation. The parameters of the system are a 16 kHz sampling rate with 16-bit sample resolution, a 25 ms Hamming window with a step size of 10 ms, and 12 MFCC coefficients as features.

Figure 1. Original and generated samples using concatenation: the original sample, and the samples with lengthened ن, ع and م.
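The framing stage of the front end described above (16 kHz, 25 ms Hamming window, 10 ms step) can be sketched as below. The actual MFCC extraction in the paper is done by HTK; this fragment only illustrates how those window parameters slice a signal into overlapping frames.

```python
import numpy as np

SR = 16000
WIN = int(0.025 * SR)   # 25 ms window -> 400 samples
STEP = int(0.010 * SR)  # 10 ms step   -> 160 samples

def frames(signal: np.ndarray) -> np.ndarray:
    """Slice the signal into overlapping Hamming-windowed frames."""
    n = 1 + (len(signal) - WIN) // STEP
    # index matrix: row i selects samples [i*STEP, i*STEP + WIN)
    idx = np.arange(WIN)[None, :] + STEP * np.arange(n)[:, None]
    return signal[idx] * np.hamming(WIN)

x = np.random.randn(SR)  # one second of dummy audio
F = frames(x)
assert F.shape == (98, 400)  # 98 frames of 400 samples each
```

Each of these 400-sample frames would then pass through the FFT, mel filterbank, log, and DCT stages to yield the 12 MFCC coefficients per frame used as the feature vector.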

This experiment is named E1. The recognition rate is 10%, which, as expected, is very low: there is not enough information in one sample. Next, the system is trained and tested with the original and generated samples (experiments E2 and E3, from Section V.A), and 100% accuracy is obtained. This high recognition rate is due to the supplementary information contributed during training by the newly generated samples. However, this is not a real test, because the system must be tested with other original samples, as in the following experiments.

A. Concatenation

Samples S5, S6 and S7 are generated in this section. They are generated by copying the central part (approximately 20 ms to 30 ms) of each phoneme ن, ع and م of the original sample O1, then inserting it just after the selected part, respectively. The vertical dotted lines in Fig. 1 show the inserted part. This group is named conc1. The samples S8, S9 and S10 are generated by copying a longer part, 40 ms to 60 ms, of each phoneme ن, ع and م of O1 and inserting it just after the selected part. This group is named conc2, as listed in Table 1.

B. Generating Samples by Reversing

Four samples are generated in this part. The first sample in this group, S11, is generated by reversing the sample O1. The second, third and fourth samples, S12, S13 and S14, are generated by copying a part of approximately 20 ms to 30 ms of each phoneme ن, ع and م of S11, then inserting it just after the selected part, respectively. Note that the order of the phonemes is reversed, leading to a new word meaning "all together". This group is named rev4, while S11 alone is named rev1.

C. Generating samples by adding noise

A total of six samples are generated in this last category.
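Mixing noise into clean speech at a chosen SNR, as used to create the noise-based sample groups, can be sketched as below. This is an illustrative implementation, not the authors' exact tool: the noise is scaled so that the ratio of speech power to noise power matches the target SNR.

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` at the target SNR (in dB)."""
    noise = noise[:len(clean)]          # trim noise to the speech length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # scale so that 10*log10(p_clean / p_scaled_noise) == snr_db
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)          # dummy 1 s utterance
noisy = add_noise(speech, rng.standard_normal(16000), snr_db=10)

# verify the achieved SNR
achieved = 10 * np.log10(np.mean(speech**2) / np.mean((noisy - speech)**2))
assert abs(achieved - 10) < 1e-6
```

Running the same helper with recordings of babble and train noise at 5, 10 and 20 dB would produce the six noisy variants described in this section.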
The samples S15, S16 and S17 are generated by adding babble noise at 5 dB, 10 dB and 20 dB SNR, respectively. This group is named nois1. The three other samples, S18, S19 and S20, are generated by adding train noise at 5 dB, 10 dB and 20 dB SNR, respectively; nois2 is the name of this group.

TABLE 1. Techniques for generating samples

Sample | Category code | Method to generate the new sample
O1 | - | Original sample.
S5, S6, S7 | conc1 | A small part (approx. 0.02 to 0.03 s) of the first, second and third phonemes (ن, ع and م) is copied and inserted just after the selection.
S8, S9, S10 | conc2 | A small part (approx. 0.04 to 0.06 s) of the first, second and third phonemes (ن, ع and م) is copied and inserted just after the selection.
S11 | rev1 | Reverse of O1.
S12, S13, S14 | rev4 | A small part (approx. 0.02 to 0.03 s) of the first, second and third phonemes of S11 is copied and inserted just after the selection, respectively.
S15, S16, S17 | nois1 | Babble noise added at 5, 10 and 20 dB SNR to the original speech signal O1.
S18, S19, S20 | nois2 | Train noise added at 5, 10 and 20 dB SNR to the original speech signal O1.
O6, O7 | - | Reverse of the original samples O2 and O3, respectively.

VII. RESULTS

Table 2 presents the results of the different experiments, performed using the 25 speakers of the database (Section II). In all experiments, the training samples are O1 plus the groups of generated samples listed in Table 1, and testing uses the other original samples O2-O5 (plus the reversed originals O6, O7 where reversed samples are trained).

TABLE 2. Experimental results

Exp. | Technique | Training samples | Rec. rate
E4 | conc1 | O1, S5-S7 | 50%
E5 | conc2 | O1, S8-S10 | 40%
E6 | conc1, conc2 | O1, S5-S10 | 83%
E7 | conc1, nois1 | O1, S5-S7, S15-S17 | 82.11%
E8 | nois1, nois2, conc1 | O1, S15-S20, S5-S7 | 82%
E9 | conc1, rev4 | O1, S5-S7, S11-S14 | -
E10 | conc1, conc2, rev4 | O1, S5-S14 | 76.77%
E11 | conc1, conc2, rev1 | O1, S5-S10, S11 | 85.86%

A. Effect of concatenation

Three experiments are conducted in this part:
1. E4 trains the system using the samples of conc1; the recognition rate is 50%.
2. E5 trains the system using the samples of conc2; the recognition rate is 40%, 10% lower than the previous rate.
3. In experiment E6, it is observed that when both types of concatenation (more information) are included, the recognition rate increases to 83%. These results indicate that combining different types of concatenation, i.e. more samples, improves the recognition rate over a single type of concatenation.

B. Effect of Noise

Two experiments are conducted in this part:
1. E7 trains the system using the samples of conc1 and nois1; the recognition rate is 82.11%.
2. E8 trains the system using the samples of nois1 and nois2; the recognition rate is 82%.

C. Effect of Reverse

In this part, the samples generated as described in Section V.B are used, in the following experiments:
1. E10 trains the system using the samples of conc1, conc2 and rev4; the recognition rate is 76.77%.
2. In E11, the samples of conc1, conc2 and rev1 are used; the recognition rate increases to 85.86%.

VIII. DISCUSSION

Experiment E1 sets the baseline for this work, since it shows that without enough information in different samples, the HMM cannot build a usable model: repeating the same sample does not give any new information. Experiments E4 and E5 then show that, by careful modification of a sample, new samples can be generated; these give the HMM more information and allow it to build an improved model that enhances the recognition rate. These three experiments (E1, E4 and E5) are the bootstrap of our work.

Figure 2. Recognition rates per experiment.

From experiment E6, it can be seen that by complementing one method of generation with another, the recognition rate increased from 40-50% to 83%. A similar conclusion can be drawn from E7, where concatenation is complemented with added noise. Experiment E8 not only reinforces this conclusion but is also a major result, since it gives a high recognition rate with the samples generated by adding noise and with only a little alteration of the original sample (conc1). These results lead to the following conclusions: lengthening the vowel duration may increase the recognition rate, and adding noise together with lengthening vowels may reduce the error rate.
E9 and E10 do not give results as good as E6-E8. This can be attributed to a co-articulation effect: the phonemes are context dependent, affected by the preceding and following phonemes. In E11, conc1, conc2 and rev1 are used; this experiment outperforms the other methods, attaining a recognition rate of 85.86%. Although conc1 and conc2 are generated using the same technique, they are of different lengths, giving different information, while rev1 is generated by reversing the word. Hence, three methods complement each other, resulting in the best recognition rate. In other words, reversing the sample in the time domain may have a positive effect on the recognition rate. The benefit of complementary methods can be clarified by the following analogy: by looking at a view from different angles, we can produce a better picture of the view, or even a complete 360-degree picture. Similarly, using different methods of generating new samples gives the HMM a better representation, so a better model is built. This suggests investigating other ways of generating speech samples and exploring different combinations.

IX. CONCLUSION

Different techniques to generate new samples from one original sample, to overcome the problem of a limited database, are proposed. Experimental results showed that even adding different types of noise at different SNR levels to the original sample during training significantly improved the recognition accuracy. The experiments also demonstrate that using different methods to complement each other leads to an increase in the recognition rate. The highest obtained recognition rate is 85.86%, using samples from three different methods. In the future, we will investigate these techniques on a larger number of speakers, as well as improve the accuracy using other possible techniques.
Initial results with 50 speakers are encouraging. The work could be extended to look for the minimum number of words, with some specific phonemes, that keeps the recognition rate as high as possible. This might be used to select a very accurate set of words (phonemes) that characterizes the speaker, without requiring long recording sessions. From this selected set of basic words (phonemes), one can generate new samples using the methods presented in this paper.

REFERENCES
[1] J. Wayman, A. Jain, D. Maltoni, and D. Maio, Biometric Systems: Technology, Design and Performance Evaluation, Springer.
[2] T. Ruggles, Comparison of Biometric Techniques.
[3] Cao Jianfen, "Restudy of segmental lengthening in Mandarin Chinese," ISCA 2004.
[4] E. Segers and L. Verhoeven, "Effects of Lengthening the Speech Signal on Auditory Word Discrimination in

Kindergartners with SLI," Journal of Communication Disorders, vol. 38, no. 6, Nov.-Dec.
[5] S.S. Al-Dahri, Y.H. Al-Jassar, Y.A. Alotaibi, M.M. Alsulaiman, and K.A.B. Abdullah-Al-Mamun, "A Word-Dependent Automatic Arabic Speaker Identification System," in Proc. IEEE Int. Symp. on Signal Processing and Information Technology (ISSPIT).
[6] L. R. Rabiner and B. H. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, pp. 4-16, Jan. 1986.
[7] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[8] J. Olsson, "Text Dependent Speaker Verification with a Hybrid HMM/ANN System," Thesis Project in Speech Technology, /exjobb_jolsson.pdf.
[9] F. Botti, A. Alexander, and A. Drygajlo, "An interpretation framework for the evaluation of evidence in forensic automatic speaker recognition with limited suspect data," in Proceedings of 2004: A Speaker Odyssey, pp. 63-68, Toledo, Spain, 2004.


More information

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER Gholamreza Anbarjafari icv Group, IMS Lab, Institute of Technology, University of Tartu, Tartu 50411, Estonia sjafari@ut.ee

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Discriminative Multimodal Biometric. Authentication Based on Quality Measures

Discriminative Multimodal Biometric. Authentication Based on Quality Measures Discriminative Multimodal Biometric Authentication Based on Quality Measures Julian Fierrez-Aguilar a,, Javier Ortega-Garcia a, Joaquin Gonzalez-Rodriguez a, Josef Bigun b a Escuela Politecnica Superior,

More information

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Kai Sun and Junqing Yu Computer College of Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China

More information

Speech recognition technology for mobile phones

Speech recognition technology for mobile phones Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such

More information

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,

More information

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31 Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the

More information

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES Zhen Qin, Allard Jongman Department of Linguistics, University of Kansas, United States qinzhenquentin2@ku.edu, ajongman@ku.edu

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

Marathi Interactive Voice Response System (IVRS) using MFCC and DTW

Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Manasi Ram Baheti Department of CSIT, Dr.B.A.M. University, Aurangabad, (M.S.), India Bharti W. Gawali Department of CSIT, Dr.B.A.M.University,

More information

HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION. Victor Bisot, Slim Essid, Gaël Richard

HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION. Victor Bisot, Slim Essid, Gaël Richard HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Victor Bisot, Slim Essid, Gaël Richard Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, 37-39 rue Dareau, 75014

More information

QMeter Tools for Quality Measurement in Telecommunication Network

QMeter Tools for Quality Measurement in Telecommunication Network QMeter Tools for Measurement in Telecommunication Network Akram Aburas 1 and Prof. Khalid Al-Mashouq 2 1 Advanced Communications & Electronics Systems, Riyadh, Saudi Arabia akram@aces-co.com 2 Electrical

More information

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ

More information

Online Diarization of Telephone Conversations

Online Diarization of Telephone Conversations Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of

More information

The Development of a Pressure-based Typing Biometrics User Authentication System

The Development of a Pressure-based Typing Biometrics User Authentication System The Development of a Pressure-based Typing Biometrics User Authentication System Chen Change Loy Adv. Informatics Research Group MIMOS Berhad by Assoc. Prof. Dr. Chee Peng Lim Associate Professor Sch.

More information

Wireless Remote Monitoring System for ASTHMA Attack Detection and Classification

Wireless Remote Monitoring System for ASTHMA Attack Detection and Classification Department of Telecommunication Engineering Hijjawi Faculty for Engineering Technology Yarmouk University Wireless Remote Monitoring System for ASTHMA Attack Detection and Classification Prepared by Orobh

More information

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,

More information

Securing Electronic Medical Records using Biometric Authentication

Securing Electronic Medical Records using Biometric Authentication Securing Electronic Medical Records using Biometric Authentication Stephen Krawczyk and Anil K. Jain Michigan State University, East Lansing MI 48823, USA, krawcz10@cse.msu.edu, jain@cse.msu.edu Abstract.

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

Development of Academic Attendence Monitoring System Using Fingerprint Identification

Development of Academic Attendence Monitoring System Using Fingerprint Identification 164 Development of Academic Attendence Monitoring System Using Fingerprint Identification TABASSAM NAWAZ, SAIM PERVAIZ, ARASH KORRANI, AZHAR-UD-DIN Software Engineering Department Faculty of Telecommunication

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

CONATION: English Command Input/Output System for Computers

CONATION: English Command Input/Output System for Computers CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India

More information

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for

More information

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The

More information

have more skill and perform more complex

have more skill and perform more complex Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such

More information

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,

More information

Email Spam Detection Using Customized SimHash Function

Email Spam Detection Using Customized SimHash Function International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

More information

OCR-Based Electronic Documentation Management System

OCR-Based Electronic Documentation Management System OCR-Based Electronic Documentation Management System Khalaf S. Alkhalaf, Abdulelah I. Almishal, Anas O. Almahmoud, and Majed S. Alotaibi Abstract Optical character recognition (OCR) is one of the latest

More information

Things to remember when transcribing speech

Things to remember when transcribing speech Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely

More information

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering

More information

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3 Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web. By C.Moreno, A. Antolin and F.Diaz-de-Maria. Summary By Maheshwar Jayaraman 1 1. Introduction Voice Over IP is

More information

User Authentication using Combination of Behavioral Biometrics over the Touchpad acting like Touch screen of Mobile Device

User Authentication using Combination of Behavioral Biometrics over the Touchpad acting like Touch screen of Mobile Device 2008 International Conference on Computer and Electrical Engineering User Authentication using Combination of Behavioral Biometrics over the Touchpad acting like Touch screen of Mobile Device Hataichanok

More information

An Experimental Study of the Performance of Histogram Equalization for Image Enhancement

An Experimental Study of the Performance of Histogram Equalization for Image Enhancement International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Special Issue-2, April 216 E-ISSN: 2347-2693 An Experimental Study of the Performance of Histogram Equalization

More information

How To Filter Spam Image From A Picture By Color Or Color

How To Filter Spam Image From A Picture By Color Or Color Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among

More information

Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID

Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID R.Gokulavanan Assistant Professor, Department of Information Technology, Nandha Engineering College, Erode, Tamil Nadu,

More information

On the Operational Quality of Fingerprint Scanners

On the Operational Quality of Fingerprint Scanners BioLab - Biometric System Lab University of Bologna - ITALY http://biolab.csr.unibo.it On the Operational Quality of Fingerprint Scanners Davide Maltoni and Matteo Ferrara November 7, 2007 Outline The

More information

Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach

Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid

More information

Factors Influencing the Adoption of Biometric Authentication in Mobile Government Security

Factors Influencing the Adoption of Biometric Authentication in Mobile Government Security Factors Influencing the Adoption of Biometric Authentication in Mobile Government Security Thamer Omar Alhussain Bachelor of Computing, Master of ICT School of Information and Communication Technology

More information

Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics

Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics Unlocking Value from Patanjali V, Lead Data Scientist, Anand B, Director Analytics Consulting, EXECUTIVE SUMMARY Today a lot of unstructured data is being generated in the form of text, images, videos

More information

Denoising Convolutional Autoencoders for Noisy Speech Recognition

Denoising Convolutional Autoencoders for Noisy Speech Recognition Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University mkayser@stanford.edu Victor Zhong Stanford University vzhong@stanford.edu Abstract We propose the use of

More information

The LENA TM Language Environment Analysis System:

The LENA TM Language Environment Analysis System: FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September

More information

Grant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013

Grant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013 Coordinating Beneficiary: UOP Associated Beneficiaries: TEIC Project Coordinator: Nikos Fakotakis, Professor Wire Communications Laboratory University of Patras, Rion-Patras 26500, Greece Email: fakotaki@upatras.gr

More information

Maximum Likelihood Estimation of ADC Parameters from Sine Wave Test Data. László Balogh, Balázs Fodor, Attila Sárhegyi, and István Kollár

Maximum Likelihood Estimation of ADC Parameters from Sine Wave Test Data. László Balogh, Balázs Fodor, Attila Sárhegyi, and István Kollár Maximum Lielihood Estimation of ADC Parameters from Sine Wave Test Data László Balogh, Balázs Fodor, Attila Sárhegyi, and István Kollár Dept. of Measurement and Information Systems Budapest University

More information

CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC

CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC 1. INTRODUCTION The CBS Records CD-1 Test Disc is a highly accurate signal source specifically designed for those interested in making

More information

Introduction to Digital Audio

Introduction to Digital Audio Introduction to Digital Audio Before the development of high-speed, low-cost digital computers and analog-to-digital conversion circuits, all recording and manipulation of sound was done using analog techniques.

More information

Room Acoustic Reproduction by Spatial Room Response

Room Acoustic Reproduction by Spatial Room Response Room Acoustic Reproduction by Spatial Room Response Rendering Hoda Nasereddin 1, Mohammad Asgari 2 and Ayoub Banoushi 3 Audio Engineer, Broadcast engineering department, IRIB university, Tehran, Iran,

More information

Analecta Vol. 8, No. 2 ISSN 2064-7964

Analecta Vol. 8, No. 2 ISSN 2064-7964 EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,

More information

SASSC: A Standard Arabic Single Speaker Corpus

SASSC: A Standard Arabic Single Speaker Corpus SASSC: A Standard Arabic Single Speaker Corpus Ibrahim Almosallam, Atheer AlKhalifa, Mansour Alghamdi, Mohamed Alkanhal, Ashraf Alkhairy The Computer Research Institute King Abdulaziz City for Science

More information

Solutions to Exam in Speech Signal Processing EN2300

Solutions to Exam in Speech Signal Processing EN2300 Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Michael Haggblade Yang Hong Kenny Kao 1 Introduction Music classification is an interesting problem with many applications, from Drinkify (a program that generates cocktails

More information

Non-Data Aided Carrier Offset Compensation for SDR Implementation

Non-Data Aided Carrier Offset Compensation for SDR Implementation Non-Data Aided Carrier Offset Compensation for SDR Implementation Anders Riis Jensen 1, Niels Terp Kjeldgaard Jørgensen 1 Kim Laugesen 1, Yannick Le Moullec 1,2 1 Department of Electronic Systems, 2 Center

More information

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

MUSICAL INSTRUMENT FAMILY CLASSIFICATION MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.

More information

Functional Auditory Performance Indicators (FAPI)

Functional Auditory Performance Indicators (FAPI) Functional Performance Indicators (FAPI) An Integrated Approach to Skill FAPI Overview The Functional (FAPI) assesses the functional auditory skills of children with hearing loss. It can be used by parents,

More information

Application Note. Using PESQ to Test a VoIP Network. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN

Application Note. Using PESQ to Test a VoIP Network. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN Using PESQ to Test a VoIP Network Application Note Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com

More information

Bandwidth analysis of multimode fiber passive optical networks (PONs)

Bandwidth analysis of multimode fiber passive optical networks (PONs) Optica Applicata, Vol. XXXIX, No. 2, 2009 Bandwidth analysis of multimode fiber passive optical networks (PONs) GRZEGORZ STEPNIAK *, LUKASZ MAKSYMIUK, JERZY SIUZDAK Institute of Telecommunications, Warsaw

More information

Securing Electronic Medical Records Using Biometric Authentication

Securing Electronic Medical Records Using Biometric Authentication Securing Electronic Medical Records Using Biometric Authentication Stephen Krawczyk and Anil K. Jain Michigan State University, East Lansing MI 48823, USA {krawcz10,jain}@cse.msu.edu Abstract. Ensuring

More information

Recognition of Emotions in Interactive Voice Response Systems

Recognition of Emotions in Interactive Voice Response Systems Recognition of Emotions in Interactive Voice Response Systems Sherif Yacoub, Steve Simske, Xiaofan Lin, John Burns HP Laboratories Palo Alto HPL-2003-136 July 2 nd, 2003* E-mail: {sherif.yacoub, steven.simske,

More information

HMM-based Breath and Filled Pauses Elimination in ASR

HMM-based Breath and Filled Pauses Elimination in ASR HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science

More information

THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE*

THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE* THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE* Victor Zue, Nancy Daly, James Glass, David Goodine, Hong Leung, Michael Phillips, Joseph Polifroni, Stephanie Seneff, and Michal

More information

Voice Authentication for ATM Security

Voice Authentication for ATM Security Voice Authentication for ATM Security Rahul R. Sharma Department of Computer Engineering Fr. CRIT, Vashi Navi Mumbai, India rahulrsharma999@gmail.com Abstract: Voice authentication system captures the

More information

A Digital Audio Watermark Embedding Algorithm

A Digital Audio Watermark Embedding Algorithm Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, 3008, China tangxh@hziee.edu.cn,

More information

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final

More information