APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA
Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti**
*Crime Laboratory, NBI, Finland
**Dept. of Computer Science, Univ. of Joensuu, Finland

Abstract

The Speaker Profiler computer program for automatic speaker recognition has been developed in a research project funded by the Finnish Technology Agency. A vector quantization (VQ) matching approach is used, in which the dissimilarity of an unknown speech sample is computed against codebooks created with the K-means algorithm. This study tests recognition reliability on two databases, one constructed from Finnish band-limited GSM speech and one from authentic crime-case speech. Material for the first test was recorded with a GSM phone and a laptop computer, and spontaneous speech was tested against read speech. The program should pick the right person from the database on the basis of independent, non-verbatim speech samples; 51 of the 107 samples (47.7 %) were ranked first correctly. Some very poor-quality speech files were used in training, and the mother tongue of some speakers was not Finnish; when these samples were excluded, the result was better. The second part of this study consists of real crime investigation cases. The speaker database was constructed from known speech samples (suspects), and unknown samples recorded at the crime scene were matched against the database. Of the 61 matched samples, 68.9 % were ranked first correctly. The accuracy is sufficient for creating shortlists in forensics.

Keywords: speaker recognition, forensics, automatic recognition, real-time recognition, MFCC

1. The aim of the study

The aim of this study is to test how reliably an automatic speaker recognition program performs when the input speech is band-limited (GSM speech) and authentic (real crime-case material). A test situation in which speech files have been recorded in good laboratory conditions with high-quality microphones is far from the reality of forensics.
This is why GSM phone recordings were used for this study; most of the speaker identification cases handled annually at the Crime Laboratory consist of GSM speech material only. The tests simulate a typical speaker recognition case. Usually the speech of a criminal (the unknown sample) is spontaneous, whereas the speech of the suspect usually consists of both reading and semi-spontaneous speech.

2. Speech material and recordings

Table 1 lists some statistics describing the overall structure of the data sets used in this study. These figures are explained in detail in sections 2.1 and 2.2.
Table 1. The number of samples and their duration in GSM and forensic data

Test set   Subset          Male  Female  Total   Min. (s)  Max. (s)  Avg. (s)
GSM        Read             47     60     107      140       252        -
GSM        Spontaneous      47     60     107       48       300        -
GSM        Both together    94    120     214        -         -        -
Forensic   Suspect           -      -      28        -         -        -
Forensic   Crime scene       -      -      61        -         -        -
Forensic   Both together     -      -      89        -       379        -

2.1 GSM data

Recordings for the first part of the tests were collected during 2001 under The Joint Project of Finnish Speech Technology, supported by the National Technology Agency (TEKES agreements 40285/00, 40406/01, 40238/02), in its sub-project titled Speaker Recognition (University of Helsinki Project No ). For the present study, 107 speakers (60 female and 47 male), recorded using a GSM phone and a portable computer, were selected. The samples included 16 noisy recordings as well as 12 samples spoken by persons whose mother tongue was Estonian, Russian or Swedish. The duration of the samples varied from 48 to 300 seconds in spontaneous speech and from 140 to 252 seconds in the text reading task.

2.2 Authentic crime case data

Real crime investigation speech data is used in the second part of the study. The known and unknown speech samples in several crime investigation cases were recorded during the years either via GSM or land-line phones. The phone type could not be limited to one, because the criminals use the phones they have. One speaker (3 samples) is female; the others are male. The female has a very low-pitched voice and was included for testing reasons. Only speech files not containing extra background noise were considered in the test; some of the files did still contain some noise, e.g. computer hum and traffic sounds. A total of 61 unknown and 28 known samples were used. The durations of the samples varied from a few seconds to 379 seconds.

3. Winsprofiler speaker recognition software

The Speaker Profiler (sprofiler) is a portable software engine for creating, managing, and recognising voice profiles based on speech samples.
The Winsprofiler program used in the tests of this study is a realization of sprofiler augmented with a Windows graphical user interface. It supports both training and recognition from sound files or directly from a PC microphone, as well as drag-and-drop input of files. Recognition is fully automated, and the software supports both offline matching of previously recorded samples and real-time matching of a speech input stream from the microphone of the PC sound card. The software was developed in a research project funded by the Finnish Technology Agency (TEKES agreements 40437/03 and 40398/04). Speaker Profiler uses mel-frequency cepstral coefficients (MFCC) as the acoustic features. We have compared different spectral features for automatic speaker identification on several databases, including subband analysis, mel- and Bark-warped cepstra, LPC-based features and formant frequencies, along with their delta features (Kinnunen & al. 2004a, 2004b). MFCCs have shown high accuracy in our experiments for both laboratory- and telephone-quality speech.
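As a concrete illustration, the MFCC front-end and VQ matching described in this section can be sketched in Python with numpy. This is a minimal sketch, not the Profiler's actual implementation: the 27 mel filters, 30 ms frames with 10 ms shift, 12 coefficients (0th excluded), and size-64 K-means codebooks follow the paper, while the 8 kHz sampling rate, 512-point FFT, Hamming window, Euclidean distortion measure, and min-max score normalization are assumptions filled in for the example.

```python
import numpy as np

# ---- MFCC front-end (27 mel filters, 30 ms frames, 10 ms shift, 12 coeffs) ----

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=27, n_fft=512, sr=8000):
    # Triangular filters equispaced on the mel scale, mapped back to FFT bins.
    hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=8000, n_coeffs=12, n_filters=27):
    """Return 12 MFCCs per frame; the 0th (intensity-dependent) coefficient is dropped."""
    frame_len, shift, n_fft = int(0.030 * sr), int(0.010 * sr), 512
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    q = np.arange(1, n_coeffs + 1)[:, None]                  # DCT-II rows 1..12
    n = np.arange(n_filters)[None, :] + 0.5
    dct = np.cos(np.pi * q * n / n_filters)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        spec = np.abs(np.fft.rfft(signal[start:start + frame_len] * window, n_fft)) ** 2
        loge = np.log(np.maximum(fb @ spec, 1e-12))          # log mel-filterbank energies
        frames.append(dct @ loge)
    return np.array(frames)

# ---- VQ speaker models (K-means codebooks of size 64) ----

def train_codebook(feats, size=64, iters=20, seed=0):
    """Train a speaker codebook with plain K-means over the training MFCC vectors."""
    rng = np.random.default_rng(seed)
    cb = feats[rng.choice(len(feats), size, replace=False)]
    for _ in range(iters):
        nearest = np.linalg.norm(feats[:, None] - cb[None], axis=2).argmin(axis=1)
        for j in range(size):
            members = feats[nearest == j]
            if len(members):
                cb[j] = members.mean(axis=0)
    return cb

def rank_speakers(test_feats, codebooks):
    """Average quantization distortion per speaker, mapped to [0, 1] similarity
    (larger = better match) and returned as a ranked list."""
    dis = {spk: np.linalg.norm(test_feats[:, None] - cb[None], axis=2).min(axis=1).mean()
           for spk, cb in codebooks.items()}
    lo, hi = min(dis.values()), max(dis.values())
    scores = {spk: 1.0 - (d - lo) / ((hi - lo) or 1.0) for spk, d in dis.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Note that `rank_speakers` uses a min-max normalization over the candidate set to map dissimilarities into [0, 1]; the paper states the normalization's range and direction but not its form, so this is an illustrative choice.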
Speaker Profiler uses the 12 lowest MFCC coefficients, excluding the 0th coefficient, which is not a robust parameter because it depends on the intensity. The filterbank employed in the MFCC computation consists of 27 triangular filters equispaced on the mel frequency scale. A frame rate of 100 frames per second is used, with a 20 ms overlap between adjacent frames (30 ms frame length, 10 ms frame shift). Speaker matching is based on a vector quantization (VQ) approach (Kinnunen & al. 2004c). For each enrolled speaker, a codebook of size 64 is created using the K-means algorithm. During matching, an unknown sample is scored against the stored codebooks, giving a dissimilarity value for each speaker. For easy interpretation, the scores are normalized into the interval [0,1] so that a larger score means a better match. The program displays a ranked list of similarity values. Figure 1 shows a Winsprofiler screenshot: Juhani is speaking into a microphone connected to the PC sound card, and feature vectors computed from the speech are scored on-line against a database of 8 models, 7 of which were trained from sound files; Juhani's model was trained via the PC sound card.

Figure 1. Winsprofiler real-time matching: Juhani's speech is recorded and matched on-line against 8 database entries, including Juhani's voice model

4. Test results with GSM data

In the first test situation, spontaneous speech was compared against read speech. The question here was whether the program finds the right person from the database when two different, non-verbatim speech samples of the same speaker are compared. First the database was created from the spontaneous speech samples; then, for reliability reasons, the database was also created from the text reading samples. When the speaker database was constructed from the spontaneous speech samples and the text reading samples were matched against it, 51 out of 107 tested samples were ranked first (47.7 %), 68.2 % were ranked among the first 3 samples, and 76.6 %
among the first 5 samples. The database contained 16 samples that were somewhat noisy or in which the voice of the speaker was either creaky or had very low pitch. After removing these 16 samples from the database, a total of 91 samples formed a new database. The upper part of Table 2 shows the results of this test. When both the noisy and the non-native speakers' samples were removed, 53.2 % were ranked first, 73.4 % were ranked among the first 3 samples, and 81 % among the first 5 samples.

The speaker database was also formed from the text reading samples to check the reliability of the results. A total of 107 samples formed the database. When spontaneous speech samples of the same speakers were used as the reference, 25 samples out of 107 tested were ranked first (23.4 %), 39.3 % were ranked among the first 3 samples, and 51.5 % among the first 5 samples; see Table 2 (lower part). When the noisy and non-native speakers' samples were removed, 62 % were ranked first, 76 % were ranked among the first 3 samples, and 87.3 % among the first 5 samples (see Table 2).

Table 2. Rankings of the correct speaker. The database is constructed from spontaneous speech samples and the testing material from text reading of the same speakers (upper part), and vice versa (lower part)

                                  1st      Top 3    Top 5
Read speech
  All samples                    47.7 %   68.2 %   76.6 %
  Noisy samples removed             -     70.3 %   80.2 %
  Noisy + non-native removed     53.2 %   73.4 %   81.0 %
Spontaneous
  All samples                    23.4 %   39.3 %   51.5 %
  Noisy samples removed             -     73.6 %   83.5 %
  Noisy + non-native removed     62.0 %   76.0 %   87.3 %

The poor quality of some of the recordings lowered the recognition scores and the rankings. Noisy samples can affect many matching situations in which the recognition is not clear-cut. Some tested speech samples were uttered in Finnish although the speaker's mother tongue is Russian, Estonian, or Swedish (many Finns have Swedish as their mother tongue). This had some effect on the results: the reading and spontaneous speech samples of these speakers did not match as often as those of the others.

5. Test results with real crime data

The second part of this study consists of real crime investigation cases. The known speech samples (suspects) form a speaker database, and the unknown speech samples recorded at the crime scene were used as testing material. The test was also done vice versa: the unknown crime scene speech samples formed the speaker database, and the known speech samples from the suspects were used as testing material. In the first case, 28 known speech samples from suspects formed the database, and 61 different phone calls from 13 different criminal cases were matched against the database. All the speech material in the criminal cases was recorded via GSM or land-line phones. Of the matched 61 samples, 68.9 % were ranked first, 82 % among the first three, and 85.2 % among the first five. Three cases had either very short samples or noisy samples; these results are shown separately in Table 3. The testing with the forensic data was done twice. When the database consisted of 68 unknown criminal speech samples from 13 different crime investigation cases, and
the test material consisted of 25 known speech samples from suspects, the result was much worse: only a few samples of suspects and criminals matched. This situation is not typical, however; databases consisting of unknown speakers are not used in forensics.

Table 3. Rankings of the correct speakers and the corresponding correct identification rates. The database is constructed from the 28 known samples (upper part) and from the 68 unknown samples (lower part)

                                     1st      Top 3    Top 5
Known samples
  All samples                       68.9 %   82.0 %   85.2 %
  Noisy and short samples removed      -     86.2 %   89.7 %
Unknown samples
  All samples                          -     52.0 %   68.0 %
  Noisy and short samples removed      -     59.0 %   72.7 %

6. Conclusion

There were some interesting findings in the spontaneous and text reading tests. The creaky-voiced female speakers were often recognized as someone else, and males with very low-pitched and/or creaky voices fared similarly. Some speakers in the database were ranked first many times when samples from other speakers were matched against the database; the reason for this should be studied. All the speech material in the criminal cases (test 2) was recorded via GSM or land-line phones, and no extra background noise was found. This could be an error source for the study. Still, the question arises whether the matching uses more information from the background than from the speech signal itself. Different results arise in the two opposite forensic tests, even though the same samples are used in both. When the database is formed from the known, usually long, speech samples, the result is better than with a database consisting of the unknown speech samples, which are usually quite short. Is the sample duration the reason for this difference? The database sizes were also different, and the recognition result was worse for the larger database. Currently the Speaker Profiler implements baseline MFCC-VQ recognition, which in our experience is quite accurate for matched conditions.
However, more accuracy is desired in acoustically mismatched conditions, such as technical mismatches in recording. Further studies are therefore needed to detect the critical conditions and to employ methods for noise, channel, or score normalization. The recognition performance of the Speaker Profiler is nowhere near perfect. However, it can already be used as such to create shortlists, i.e. to pick substantially smaller sets of best-ranked speaker candidates. The correct speaker is consistently found in the shortlist. This property already makes it a valuable tool for a crime investigator.

Acknowledgements

The Joint Project of Finnish Speech Technology was supported by TEKES, the Civil Aviation Administration in Finland, The Finnish Defence Forces, the National Bureau of Investigation in Finland, the Accident Investigation Board Finland and Scando Oy.
References

Alexander, A., Botti, F., Dessimoz, D. and Drygajlo, A. (2004). The effect of mismatched recordings on human and automatic speaker recognition in forensic applications. Forensic Science International 146S, S95-S99.

Botti, F., Alexander, A. and Drygajlo, A. (2004). On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition. Forensic Science International 146S, S101-S106.

Broeders, A.P.A. The role of automatic speaker recognition techniques in forensic investigation. Proceedings of the XIIIth International Congress of Phonetic Sciences, vol. 3, Stockholm: KTH & Stockholm University.

Iivonen, A., Harinen, K., Keinänen, L., Kirjavainen, J., Meister, E. & Tuuri, L. (2003). Development of a multiparametric speaker profile for speaker recognition. 15th International Congress of Phonetic Sciences, Barcelona, 3-9 Aug. 2003.

Kinnunen, T. (2004a). Spectral features for automatic text-independent speaker recognition. Licentiate's Thesis, Dept. of Computer Science, Univ. of Joensuu.

Kinnunen, T., Hautamäki, V., and Fränti, P. (2004b). Fusion of spectral feature sets for accurate speaker identification. Proc. 9th Int. Conference on Speech and Computer (SPECOM'2004), St. Petersburg, Russia, September 20-22.

Kinnunen, T., Karpov, E., and Fränti, P. (2004c). Real-time speaker identification and verification. Accepted for publication in IEEE Transactions on Speech and Audio Processing.

Künzel, H. and Gonzáles-Rodríguez, J. (2003). Combining automatic and phonetic-acoustic speaker recognition techniques for forensic applications. 15th International Congress of Phonetic Sciences, Barcelona, 3-9 Aug. 2003.

Niemi-Laitinen, T. Automatic speaker recognition: is it possible or not? In S. Ojala & J. Tuomainen (eds), Papers from the 21st Meeting of Finnish Phoneticians, Turku. Publications of the Department of Finnish Language and General Linguistics, University of Turku 67.

TUIJA NIEMI-LAITINEN is a researcher at the Crime Laboratory of the National Bureau of Investigation, Finland. Her research interests concern speech prosody and speaker recognition. [email protected] or [email protected]

JUHANI SAASTAMOINEN is a project manager in the Department of Computer Science at the University of Joensuu, Finland. His current research interest is numerical methods in speech analysis. [email protected]

TOMI KINNUNEN is a doctoral student in the Department of Computer Science at the University of Joensuu. His research topic is automatic speaker recognition. [email protected]

PASI FRÄNTI is a professor of Computer Science at the University of Joensuu, Finland. His primary research interests are in image compression, pattern recognition and data mining. [email protected]
Proceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania [email protected]
Securing Electronic Medical Records using Biometric Authentication
Securing Electronic Medical Records using Biometric Authentication Stephen Krawczyk and Anil K. Jain Michigan State University, East Lansing MI 48823, USA, [email protected], [email protected] Abstract.
Annotated bibliographies for presentations in MUMT 611, Winter 2006
Stephen Sinclair Music Technology Area, McGill University. Montreal, Canada Annotated bibliographies for presentations in MUMT 611, Winter 2006 Presentation 4: Musical Genre Similarity Aucouturier, J.-J.
ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS
ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ImpostorMaps is a methodology developed by Auraya and available from Auraya resellers worldwide to configure,
INNOVATION. Campus Box 154 P.O. Box 173364 Denver, CO 80217-3364 Website: http://cam.ucdenver.edu/ncmf
EDUCATION RESEARCH INNOVATION Campus Box 154 P.O. Box 173364 Denver, CO 80217-3364 Website: http://cam.ucdenver.edu/ncmf Email: [email protected] Phone: 303.315.5850 Fax: 303.832.0483 JEFF M. SMITH, m.s.
SWING: A tool for modelling intonational varieties of Swedish Beskow, Jonas; Bruce, Gösta; Enflo, Laura; Granström, Björn; Schötz, Susanne
SWING: A tool for modelling intonational varieties of Swedish Beskow, Jonas; Bruce, Gösta; Enflo, Laura; Granström, Björn; Schötz, Susanne Published in: Proceedings of Fonetik 2008 Published: 2008-01-01
Objective Speech Quality Measures for Internet Telephony
Objective Speech Quality Measures for Internet Telephony Timothy A. Hall National Institute of Standards and Technology 100 Bureau Drive, STOP 8920 Gaithersburg, MD 20899-8920 ABSTRACT Measuring voice
Simple Voice over IP (VoIP) Implementation
Simple Voice over IP (VoIP) Implementation ECE Department, University of Florida Abstract Voice over IP (VoIP) technology has many advantages over the traditional Public Switched Telephone Networks. In
Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN
PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,
Solutions to Exam in Speech Signal Processing EN2300
Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.
NETWORK REQUIREMENTS FOR HIGH-SPEED REAL-TIME MULTIMEDIA DATA STREAMS
NETWORK REQUIREMENTS FOR HIGH-SPEED REAL-TIME MULTIMEDIA DATA STREAMS Andrei Sukhov 1), Prasad Calyam 2), Warren Daly 3), Alexander Iliin 4) 1) Laboratory of Network Technologies, Samara Academy of Transport
Loudspeaker Equalization with Post-Processing
EURASIP Journal on Applied Signal Processing 2002:11, 1296 1300 c 2002 Hindawi Publishing Corporation Loudspeaker Equalization with Post-Processing Wee Ser Email: ewser@ntuedusg Peng Wang Email: ewangp@ntuedusg
Recent advances in Digital Music Processing and Indexing
Recent advances in Digital Music Processing and Indexing Acoustics 08 warm-up TELECOM ParisTech Gaël RICHARD Telecom ParisTech (ENST) www.enst.fr/~grichard/ Content Introduction and Applications Components
Simulative Investigation of QoS parameters for VoIP over WiMAX networks
www.ijcsi.org 288 Simulative Investigation of QoS parameters for VoIP over WiMAX networks Priyanka 1, Jyoteesh Malhotra 2, Kuldeep Sharma 3 1,3 Department of Electronics, Ramgarhia Institue of Engineering
Colorado School of Mines Computer Vision Professor William Hoff
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Introduction to 2 What is? A process that produces from images of the external world a description
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Single voice command recognition by finite element analysis
Single voice command recognition by finite element analysis Prof. Dr. Sc. Raycho Ilarionov Assoc. Prof. Dr. Nikolay Madzharov MSc. Eng. Georgi Tsanev Applications of voice recognition Playing back simple
Speech Recognition System for Cerebral Palsy
Speech Recognition System for Cerebral Palsy M. Hafidz M. J., S.A.R. Al-Haddad, Chee Kyun Ng Department of Computer & Communication Systems Engineering, Faculty of Engineering, Universiti Putra Malaysia,
MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music
ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final
MUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
QMeter Tools for Quality Measurement in Telecommunication Network
QMeter Tools for Measurement in Telecommunication Network Akram Aburas 1 and Prof. Khalid Al-Mashouq 2 1 Advanced Communications & Electronics Systems, Riyadh, Saudi Arabia [email protected] 2 Electrical
TOOLS for DEVELOPING Communication PLANS
TOOLS for DEVELOPING Communication PLANS Students with disabilities, like all students, must have the opportunity to fully participate in all aspects of their education. Being able to effectively communicate
A Network Simulation Experiment of WAN Based on OPNET
A Network Simulation Experiment of WAN Based on OPNET 1 Yao Lin, 2 Zhang Bo, 3 Liu Puyu 1, Modern Education Technology Center, Liaoning Medical University, Jinzhou, Liaoning, China,[email protected] *2
