Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
1 Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
Wolfgang Macherey, Lars Haferkamp, Ralf Schlüter, Hermann Ney
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, Germany
Lisbon
Error Minimizing Training Criteria
2 Contents
1. Introduction
2. Overview of Discriminative Training Criteria
3. Minimum Classification Error Training for LVCSR
4. Comparative Experiments
5. Conclusions
3 Introduction
Discriminative training criteria such as Maximum Mutual Information (MMI) are by now established for large-scale speech recognition tasks [Woodland 2002]. Recently, there has been special interest in error minimizing training criteria.
Types of criteria in general:
- Criteria aimed at optimal distribution estimation (e.g. ML, MMI).
- Criteria representing the expectation of the error rate taken on the training data, e.g. the Minimum Word Error (MWE) criterion and the Minimum Phone Error (MPE) criterion. These significantly outperform MMI on many tasks [Povey 2002, Povey 2004].
- Criteria representing the (smoothed) empirical error rate on the training data, e.g. Minimum Classification Error (MCE). MCE aims at minimizing the smoothed sentence error on the training data and gives consistently better results than MMI on small vocabulary tasks.
4 Discriminative Criteria
Maximum Mutual Information (MMI) criterion:
$$F_{\mathrm{MMI}}(\theta) = \frac{1}{R}\sum_{r=1}^{R} \log p_\theta(W_r \mid X_r) = \frac{1}{R}\sum_{r=1}^{R} \log \frac{p_\theta(X_r \mid W_r)\, p(W_r)}{\sum_{W} p_\theta(X_r \mid W)\, p(W)}$$
- competing model includes the correct class
- sensitive to outliers
Minimum Classification Error (MCE) criterion:
$$F_{\mathrm{MCE}}(\theta) = \frac{1}{R}\sum_{r=1}^{R} \frac{1}{1 + \left[\dfrac{p_\theta^\alpha(X_r \mid W_r)\, p^\alpha(W_r)}{\sum_{W \neq W_r} p_\theta^\alpha(X_r \mid W)\, p^\alpha(W)}\right]^{2\varrho}}$$
- competing model excludes the correct class
- approximates the sentence error rate on the training data
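The difference between the two denominators can be made concrete with a small sketch. It assumes the competing hypotheses are available as an explicit list with joint scores p_θ(X_r | W) p(W); the function names, the toy scores, and the averaging helper are illustrative assumptions and not part of the slides.

```python
import math

def mmi_term(joint, ref):
    """Log posterior of the reference; the denominator sums over ALL hypotheses."""
    return math.log(joint[ref] / sum(joint.values()))

def mce_term(joint, ref, alpha=1.0, rho=1.0):
    """Smoothed sentence error; the competing sum EXCLUDES the reference."""
    correct = joint[ref] ** alpha
    competing = sum(p ** alpha for w, p in joint.items() if w != ref)
    return 1.0 / (1.0 + (correct / competing) ** (2.0 * rho))

def criterion(term, utterances, **kwargs):
    """Average a per-utterance term over the R training utterances."""
    return sum(term(joint, ref, **kwargs) for joint, ref in utterances) / len(utterances)

# Hypothetical joint scores p_theta(X_r | W) * p(W) for two toy utterances.
utterances = [
    ({"a b c": 0.6, "a b b": 0.3, "a d c": 0.1}, "a b c"),  # reference scores best
    ({"x y": 0.2, "x z": 0.8}, "x y"),                      # reference is outscored
]

f_mmi = criterion(mmi_term, utterances)                      # to be maximized
f_mce = criterion(mce_term, utterances, alpha=1.0, rho=0.5)  # to be minimized
print(f_mmi, f_mce)
```

A larger ϱ makes the per-utterance MCE term approach a hard 0/1 sentence error, which is one way to read "approximates the sentence error rate".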
5 Discriminative Criteria
Minimum Word Error (MWE) criterion:
$$F_{\mathrm{MWE}}(\theta) = \frac{1}{R}\sum_{r=1}^{R} \frac{\sum_{W} p_\theta(X_r \mid W)\, p(W)\, A(W, W_r)}{\sum_{W} p_\theta(X_r \mid W)\, p(W)}$$
- The criterion is the expectation of an approximation of the word accuracy on the training data.
- A(W, W_r) approximates the raw accuracy of hypothesis W.
- Instead of a Levenshtein alignment, A(W, W_r) is defined locally on the word level via the time overlap t(w, w_r) of a hypothesis word w with a reference word w_r:
$$A(w, w_r) = \begin{cases} t(w, w_r) & \text{if } w = w_r \\ -1 + t(w, w_r) & \text{if } w \neq w_r \end{cases}$$
[Figure: example of a reference and a hypothesis word sequence aligned in time, with the time overlap t of the hypothesis words]
Minimum Phone Error (MPE) criterion: replace the word accuracy by a phone accuracy measure.
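The local accuracy and the resulting expectation can be sketched as follows. Two points are assumptions made only for this illustration, since the slide does not spell them out: the overlap t is taken as the fraction of the reference word's duration covered by the hypothesis word, and the sentence accuracy is the sum over hypothesis words of the best local accuracy against any reference word. All names and numbers are hypothetical.

```python
def overlap_fraction(hyp, ref):
    """Fraction of the reference word's duration covered by the hypothesis word.
    Words are (label, start, end) tuples. This normalisation is an assumption."""
    _, h_start, h_end = hyp
    _, r_start, r_end = ref
    shared = max(0.0, min(h_end, r_end) - max(h_start, r_start))
    return shared / (r_end - r_start)

def local_accuracy(hyp, ref):
    """A(w, w_r): t if the labels match, -1 + t otherwise."""
    t = overlap_fraction(hyp, ref)
    return t if hyp[0] == ref[0] else -1.0 + t

def sentence_accuracy(hyp_words, ref_words):
    """Assumed aggregation: best local accuracy per hypothesis word, summed."""
    return sum(max(local_accuracy(h, r) for r in ref_words) for h in hyp_words)

def mwe_term(joint, accuracy):
    """Expected accuracy of one utterance under the joint hypothesis scores."""
    total = sum(joint.values())
    return sum(p * accuracy[w] for w, p in joint.items()) / total

reference = [("a", 0.0, 3.0), ("b", 3.0, 6.0), ("c", 6.0, 9.0)]
hypotheses = {
    "a b c": [("a", 0.0, 3.0), ("b", 3.0, 6.0), ("c", 6.0, 9.0)],
    "a b b": [("a", 0.0, 3.0), ("b", 3.0, 6.0), ("b", 6.0, 9.0)],  # one substitution
}
accuracy = {w: sentence_accuracy(words, reference) for w, words in hypotheses.items()}
joint = {"a b c": 0.6, "a b b": 0.4}  # hypothetical p_theta(X_r | W) * p(W)
expected_accuracy = mwe_term(joint, accuracy)
print(accuracy, expected_accuracy)
```

Because the accuracy is computed per word from time overlaps, it can be accumulated on lattice arcs without a Levenshtein alignment of whole sentences.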
6 Extended Unifying Approach
$$F\left(\theta; f, \alpha, G, \{M_r\}\right) = \frac{1}{R}\sum_{r=1}^{R} f\left(\log\left[\frac{\sum_{W} p_\theta^\alpha(X_r \mid W)\, p^\alpha(W)\, G(W, W_r)}{\sum_{W \in M_r} p_\theta^\alpha(X_r \mid W)\, p^\alpha(W)}\right]^{1/\alpha}\right)$$

criterion | smoothing function f(z) | alternative word sequences M_r | exponent α | gain function G(W, W_r)
ML | z | - | | δ(W, W_r)
MMI | z | all (recognized) | 1 | δ(W, W_r)
CT | z | best (recognized) | | δ(W, W_r)
MCE | -1 / (1 + e^{2ϱz}) | all without W_r | free | δ(W, W_r)
FT | -1 / (1 + e^{2ϱz}) | best (recognized) ≠ W_r | | δ(W, W_r)
Diversity | -β^{-1} (1 - e^{βz}) | all (recognized) | free | δ(W, W_r)
Jeffreys | [illegible: 1 z z] | all (recognized) | 1 | δ(W, W_r)
MWE/MPE | exp(z) | all (recognized) | 1 | A(W, W_r)

Properties of the Diversity Index:
- equals MCE with ϱ = 1/2 for β = 1
- equals MMI for β → 0
In the case of MPE, A(W, W_r) gives the phone accuracy.
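The unifying form can be checked numerically on a toy hypothesis list. The sketch below is only an illustration under stated assumptions: hypotheses are enumerated explicitly, the joint scores and sentence accuracies are made-up numbers, and `unified_term`, `delta`, and `diversity` are hypothetical helper names. It evaluates one utterance with the MMI, MCE, Diversity, and MWE settings and confirms the two Diversity Index properties for α = 1.

```python
import math

def unified_term(joint, ref, f, alpha, gain, competitors):
    """One utterance of the unified criterion: f(log[num / den] ** (1 / alpha))."""
    num = sum(p ** alpha * gain(w, ref) for w, p in joint.items())
    den = sum(joint[w] ** alpha for w in competitors)
    return f(math.log(num / den) / alpha)

def delta(w, ref):
    """Kronecker delta gain: only the spoken word sequence contributes."""
    return 1.0 if w == ref else 0.0

def diversity(beta):
    """f(z) = -(1 / beta) * (1 - exp(beta * z)), written with expm1 for accuracy."""
    return lambda z: math.expm1(beta * z) / beta

# Hypothetical joint scores and sentence accuracies for one utterance.
joint = {"a b c": 0.6, "a b b": 0.3, "a d c": 0.1}
accuracy = {"a b c": 3.0, "a b b": 2.0, "a d c": 2.0}
ref = "a b c"
all_hyps = list(joint)
without_ref = [w for w in joint if w != ref]

mmi = unified_term(joint, ref, lambda z: z, 1.0, delta, all_hyps)
mce = unified_term(joint, ref, lambda z: -1.0 / (1.0 + math.exp(2 * 0.5 * z)),
                   1.0, delta, without_ref)           # rho = 1/2
div_one = unified_term(joint, ref, diversity(1.0), 1.0, delta, all_hyps)
div_small = unified_term(joint, ref, diversity(1e-6), 1.0, delta, all_hyps)
# MWE needs a positive expected accuracy here, otherwise the log is undefined.
mwe = unified_term(joint, ref, math.exp, 1.0, lambda w, r: accuracy[w], all_hyps)
print(mmi, mce, div_one, div_small, mwe)
```

With these numbers the Diversity Index at β = 1 coincides with MCE at ϱ = 1/2, and at a very small β it approaches the MMI value, matching the listed properties.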
7 MCE Criterion
Few publications investigate the use of MCE on large vocabulary tasks.
Reason: MCE requires exclusion of the correct class from the set of competing classes. In ASR, this means excluding the spoken word sequence from the set of all possible word sequences.
This is difficult if the set of competing word sequences is encoded as a word lattice:
- The lattice may contain multiple alignments and pronunciation variants of the spoken utterance.
- Arcs may not be uniquely assignable to correct or competing sentences without changing the lattice structure.
Remedies:
- Use N-best lists. Problem: coverage is considerably reduced.
- Use finite state machines to restructure the training lattices. Problem: in general, the lattice density increases.
- Exclude after computing the statistics (this work). Advantage: all statistics needed can be extracted from the original training lattices.
8 MCE Optimization
Optimization of the MCE criterion leads to expressions containing word-posterior-like weights. However, the corresponding summations exclude the spoken word sequence:
$$q_{[t_b, t_e]}(w \mid X_r) = \frac{\sum_{W \in M_r:\; W \neq W_r,\; w_{[t_b, t_e]} \in W} p_\lambda^\alpha(X_r, W)}{\sum_{V \in M_r:\; V \neq W_r} p_\lambda^\alpha(X_r, V)}$$
Goal: efficient lattice-based computation of the MCE weights q_{[t_b, t_e]}(w | X_r).
Idea: exclude the partial sum over the spoken word sequence numerically.
9 Efficient Lattice-Based MCE Algorithm
Efficient computation of MCE weights for training:
$$q_{[t_b, t_e]}(w \mid X_r) = \frac{\sum_{W \in M_r:\; W \neq W_r,\; w_{[t_b, t_e]} \in W} p_\lambda^\alpha(X_r, W)}{\sum_{V \in M_r:\; V \neq W_r} p_\lambda^\alpha(X_r, V)} = \frac{\sum_{W \in M_r:\; w_{[t_b, t_e]} \in W} p_\lambda^\alpha(X_r, W) \;-\; \sum_{W \in M_r:\; W = W_r,\; w_{[t_b, t_e]} \in W} p_\lambda^\alpha(X_r, W)}{\sum_{V \in M_r} p_\lambda^\alpha(X_r, V) \;-\; \sum_{V \in M_r:\; V = W_r} p_\lambda^\alpha(X_r, V)}$$
Algorithm:
1. Label all alignments of the spoken word sequence in the denominator lattice. The corresponding sublattice is equivalent to the numerator lattice.
2. Compute arc posteriors in the numerator and denominator lattices using the forward-backward algorithm (similar to MMI).
3. Subtract the corresponding numerator arc posteriors from the posteriors of the labeled arcs in the denominator lattice.
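The identity behind the algorithm can be verified on a tiny example. The sketch below is not the lattice implementation: it assumes the lattice is small enough to enumerate its paths explicitly, whereas in practice the four sums would come from forward-backward passes over the denominator lattice and its labeled (numerator) sublattice. Path scores, arc times, and function names are hypothetical.

```python
def mce_weight_direct(paths, arc, alpha=1.0):
    """q(arc): sums restricted to paths that are NOT the spoken word sequence."""
    num = sum(s ** alpha for arcs, s, spoken in paths if not spoken and arc in arcs)
    den = sum(s ** alpha for arcs, s, spoken in paths if not spoken)
    return num / den

def mce_weight_by_subtraction(paths, arc, alpha=1.0):
    """Same weight from unrestricted sums minus the sums over the spoken paths."""
    all_with_arc = sum(s ** alpha for arcs, s, _ in paths if arc in arcs)
    spoken_with_arc = sum(s ** alpha for arcs, s, spoken in paths if spoken and arc in arcs)
    all_total = sum(s ** alpha for _, s, _ in paths)
    spoken_total = sum(s ** alpha for _, s, spoken in paths if spoken)
    return (all_with_arc - spoken_with_arc) / (all_total - spoken_total)

# Each path: (arcs, joint score p(X_r, W), is it an alignment of the spoken W_r?).
# An arc is (word, t_begin, t_end). The spoken sequence "a b" has two alignments.
paths = [
    ((("a", 0, 3), ("b", 3, 6)), 0.30, True),
    ((("a", 0, 4), ("b", 4, 6)), 0.20, True),
    ((("a", 0, 3), ("c", 3, 6)), 0.25, False),
    ((("d", 0, 4), ("b", 4, 6)), 0.15, False),
    ((("d", 0, 4), ("c", 4, 6)), 0.10, False),
]

for arc in [("a", 0, 3), ("b", 4, 6), ("b", 3, 6)]:
    print(arc, mce_weight_direct(paths, arc), mce_weight_by_subtraction(paths, arc))
```

The arc ("b", 3, 6) occurs only on a spoken path, so its weight is zero in both computations, while arcs shared between spoken and competing paths keep only the competing share.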
10 Experiments on Wall Street Journal (WSJ0) Task
Initial ML-trained acoustic models for WSJ0:
- 16 cepstral coefficients
- … ms frame shift
- LDA (±2 frames, …)
- 2000 generalized triphone states + 1 silence state
- 6-state HMM
- within-word triphone models
- gender-independent Gaussian mixtures
- 1 pooled variance, 149k Gaussian densities

Corpus statistics (WSJ0):
                  | train | dev  | eval
acoustic data [h] | 15:17 | 0:46 | 0:40
# speakers        |       |      |
# sentences       |       |      |
# running words   |       |      |
# lexicon words   |       |      |

Results on ARPA WSJ0 Nov. 92:
criterion | dev WER | dev SER | eval WER | eval SER
ML        |         |         |          |
MMI       |         |         |          |
MCE       |         |         |          |
MWE       |         |         |          |
11 Experiments on North American Business (NAB) Task
Initial ML-trained acoustic models for WSJ0+1:
- 16 cepstral coefficients
- … ms frame shift
- LDA (±1 frames, 99 → 32)
- 7000 generalized triphone states + 1 silence state
- 6-state HMM
- across-word triphone models
- gender-independent Gaussian mixtures
- pooled variance, 412k Gaussian densities

Corpus statistics:
                  | WSJ0+1 train | NAB Nov. 94 dev | NAB Nov. 94 eval
acoustic data [h] | 81:23        | 0:48            | 0:53
# speakers        |              |                 |
# sentences       |              |                 |
# running words   |              |                 |
# lexicon words   |              |                 |

Results on NAB Nov. 94 (WER and SER for each set):
criterion | NAB-20k dev | NAB-20k eval | NAB-65k dev | NAB-65k eval
ML        |             |              |             |
MMI       |             |              |             |
MCE       |             |              |             |
MWE       |             |              |             |
MPE       |             |              |             |
12 Conclusions
- Minimum Classification Error (MCE) training was applied to large vocabulary speech recognition.
- Efficient lattice-based computation of MCE training statistics.
- No need for N-best lists or restructuring of word lattices.
- Common representation and performance comparison of:
  - Maximum Mutual Information (MMI) criterion,
  - Minimum Classification Error (MCE) criterion,
  - Minimum Word/Phone Error (MWE/MPE) criterion.
- MCE showed the same performance gains as MWE.
Acknowledgments
This work was partially funded by the European Union under the integrated project TC-STAR, Technology and Corpora for Speech to Speech Translation, IST-2002-FP
13 References
D. Povey, P. C. Woodland, "Minimum phone error and i-smoothing for improved discriminative training," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Orlando, FL, May 2002.
D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Dept. of Eng., Cambridge Univ., Cambridge, August.
B.-H. Juang, S. Katagiri, "Discriminative learning for minimum error classification," IEEE Transactions on Signal Processing, vol. 40, no. 12, December 1992.
W. Chou, C.-H. Lee, B.-H. Juang, "Minimum error rate training based on N-best string models," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, Minneapolis, MN, USA, April 1993.
R. Schlüter, W. Macherey, "Comparison of discriminative training criteria," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Seattle, WA, May 1998.
R. Schlüter, W. Macherey, B. Müller, H. Ney, "Comparison of discriminative training criteria and optimization methods for speech recognition," Speech Communication, vol. 34, no. 1, May.
E. McDermott, T. J. Hazen, "Minimum classification error training of landmark models for real-time continuous speech recognition," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004.
14 References (cont'd)
E. McDermott, S. Katagiri, "Minimum classification error for large scale speech recognition tasks using weighted finite state transducers," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Philadelphia, PA, March 2005.
K. K. Paliwal, M. Bacchiani, Y. Sagisaka, "Minimum classification error training algorithm for feature extractor and pattern classifier in speech recognition," in Europ. Conf. on Speech Communication and Technology, vol. 1, Madrid, Spain, September 1995.
W. Macherey, "Implementation and comparison of discriminative training methods for automatic speech recognition," Diploma Thesis, Lehrstuhl für Informatik VI, RWTH Aachen University, Aachen, November.
D. S. Pallett, J. G. Fiscus, W. M. Fisher, J. S. Garofolo, B. A. Lund, M. A. Przybocki, "1994 Benchmark test for the ARPA spoken language program," in ARPA Human Language Technology Workshop, Austin, TX, January 1995.
F. Kubala, "Design of the 1994 CSR benchmark tests," in ARPA Human Language Technology Workshop, Austin, TX, January 1995.
W. Macherey, R. Schlüter, H. Ney, "Discriminative training with tied covariance matrices," in 8th Int. Conf. on Spoken Language Processing, vol. 1, Jeju Island, Korea, October 2004.
More informationClass-specific Sparse Coding for Learning of Object Representations
Class-specific Sparse Coding for Learning of Object Representations Stephan Hasler, Heiko Wersing, and Edgar Körner Honda Research Institute Europe GmbH Carl-Legien-Str. 30, 63073 Offenbach am Main, Germany
More informationExperiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis
Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Fabio Tesser, Giacomo Sommavilla, Giulio Paci, Piero Cosi Institute of Cognitive Sciences and Technologies, National
More informationTED-LIUM: an Automatic Speech Recognition dedicated corpus
TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr
More informationSupporting Collaborative Transcription of Recorded Speech with a 3D Game Interface
Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface Saturnino Luz 1, Masood Masoodian 2, and Bill Rogers 2 1 School of Computer Science and Statistics Trinity College Dublin
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationAnalysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
More informationData a systematic approach
Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationWeighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for
More informationAutomatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations
Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ
More informationSpeech recognition for human computer interaction
Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices
More informationADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2
3rd European ignal Processing Conference (EUIPCO) ADAPTIVE AND ONLINE PEAKER DIARIZATION FOR MEETING DATA Giovanni oldi, Christophe Beaugeant and Nicholas Evans Multimedia Communications Department, EURECOM,
More informationA TOOL FOR TEACHING LINEAR PREDICTIVE CODING
A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering
More informationApplication of discriminant analysis to predict the class of degree for graduating students in a university system
International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application
More informationTraining Universal Background Models for Speaker Recognition
Odyssey 2010 The Speaer and Language Recognition Worshop 28 June 1 July 2010, Brno, Czech Republic Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar and Jason Pelecanos IBM
More informationReliable and Cost-Effective PoS-Tagging
Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve
More information