Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

1 [...], Lisbon
Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
Wolfgang Macherey, Lars Haferkamp, Ralf Schlüter, Hermann Ney
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, Germany
Error Minimizing Training Criteria 1

2 Contents
1. Introduction
2. Overview of Discriminative Training Criteria
3. Minimum Classification Error Training for LVCSR
4. Comparative Experiments
5. Conclusions

3 Introduction
- Discriminative training criteria such as Maximum Mutual Information (MMI) are by now established for large-scale speech recognition tasks [Woodland 2002].
- Recently: special interest in error minimizing training criteria.
- Types of criteria in general:
  - Criteria aimed at optimal distribution estimation (e.g. ML, MMI).
  - Criteria representing the expectation of the error rate taken on the training data, e.g. the Minimum Word Error (MWE) criterion and the Minimum Phone Error (MPE) criterion. These significantly outperform MMI on many tasks [Povey 2002, Povey 2004].
  - Criteria representing the (smoothed) empirical error rate on the training data, e.g. Minimum Classification Error (MCE). MCE aims at minimizing the smoothed sentence error on the training data and gives consistently better results than MMI on small vocabulary tasks.

4 Discriminative Criteria
Maximum Mutual Information (MMI) criterion:
$$F_{\mathrm{MMI}}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \log p_\theta(W_r \mid X_r) = \frac{1}{R} \sum_{r=1}^{R} \log \frac{p_\theta(X_r \mid W_r)\, p(W_r)}{\sum_{W} p_\theta(X_r \mid W)\, p(W)}$$
- competing model includes correct class
- sensitive to outliers
Minimum Classification Error (MCE) criterion:
$$F_{\mathrm{MCE}}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{1 + \left[ \dfrac{p_\theta^{\alpha}(X_r \mid W_r)\, p^{\alpha}(W_r)}{\sum_{W \neq W_r} p_\theta^{\alpha}(X_r \mid W)\, p^{\alpha}(W)} \right]^{2\varrho}}$$
- competing model excludes correct class
- approximates sentence error rate on training data
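As a rough illustration of the two criteria on this slide, the sketch below evaluates the per-utterance MMI and MCE terms from an explicit list of hypothesis scores. The N-best-style input, the function names, the toy scores, and the default values of α and ϱ are assumptions made for this example; the slides themselves work with word lattices.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mmi_term(log_joint, ref):
    """log p(W_r | X_r): the competing sum includes the correct hypothesis."""
    return log_joint[ref] - logsumexp(list(log_joint.values()))

def mce_term(log_joint, ref, alpha=1.0, rho=0.5):
    """Smoothed sentence error 1 / (1 + ratio**(2*rho)).

    The competing sum excludes the correct hypothesis `ref`.
    """
    competitors = [alpha * s for w, s in log_joint.items() if w != ref]
    log_ratio = alpha * log_joint[ref] - logsumexp(competitors)
    return 1.0 / (1.0 + math.exp(2.0 * rho * log_ratio))

# Hypothetical joint log scores log p(X_r | W) + log p(W) for three hypotheses.
scores = {"a b c": -10.0, "a b b": -11.0, "a d c": -12.5}
mmi = mmi_term(scores, "a b c")
mce = mce_term(scores, "a b c")
```

With α = 1 and ϱ = 1/2 the MCE term reduces to one minus the posterior of the spoken word sequence, which makes the relation between the two criteria easy to check on toy data.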

5 Discriminative Criteria
Minimum Word Error (MWE) criterion:
$$F_{\mathrm{MWE}}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \frac{\sum_{W} p_\theta(X_r \mid W)\, p(W)\, A(W, W_r)}{\sum_{W} p_\theta(X_r \mid W)\, p(W)}$$
- criterion: expectation of an approximation of the word accuracy on the training data
- A(W, W_r) approximates the raw accuracy of hypothesis W
- instead of a Levenshtein alignment, A(W, W_r) is defined locally on the word level:
$$A(w, w_r) = \begin{cases} -1 + 2\,t(w, w_r) & \text{if } w = w_r \\ -1 + t(w, w_r) & \text{if } w \neq w_r \end{cases}$$
[Figure: reference a b c; hypothesis words a, b, b, d; time overlap t per hypothesis word]
Minimum Phone Error (MPE) criterion: replace the word accuracy by a phone accuracy measure.
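The local accuracy and the MWE term can be sketched as follows. The sketch assumes the Povey-style definition of A (−1 + 2t for a matching word, −1 + t otherwise, with t the time overlap as a fraction between 0 and 1); the signs are hard to read in this transcript, so treat that form as an assumption. The function names and the flat list of (score, accuracy) pairs are also hypothetical.

```python
import math

def local_accuracy(hyp_word, ref_word, overlap):
    """A(w, w_r): -1 + 2t if the words match, else -1 + t (t = time overlap)."""
    return -1.0 + (2.0 * overlap if hyp_word == ref_word else overlap)

def mwe_term(hyps):
    """Posterior-weighted mean accuracy for one utterance.

    hyps: list of (log_joint_score, sentence_accuracy) pairs, one per hypothesis.
    """
    m = max(score for score, _ in hyps)            # shift for numerical stability
    weights = [math.exp(score - m) for score, _ in hyps]
    weighted = sum(w * acc for w, (_, acc) in zip(weights, hyps))
    return weighted / sum(weights)

# A fully overlapping correct word scores 1, a fully overlapping wrong word 0.
full_match = local_accuracy("b", "b", 1.0)
full_mismatch = local_accuracy("b", "c", 1.0)
```

Under this definition a wrong word that fully overlaps the reference counts as a substitution (accuracy 0), while a word with no overlap counts as an insertion (accuracy −1).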

6 Extended Unifying Approach
$$F(\theta; f, \alpha, G, \{M_r\}) = \frac{1}{R} \sum_{r=1}^{R} f\left( \log \left[ \frac{\sum_{W} p_\theta^{\alpha}(X_r \mid W)\, p^{\alpha}(W)\, G(W, W_r)}{\sum_{W \in M_r} p_\theta^{\alpha}(X_r \mid W)\, p^{\alpha}(W)} \right]^{1/\alpha} \right)$$
with smoothing function f(z), alternative word sequences M_r, exponent α, and gain function G(W, W_r).

criterion    f(z)                     M_r                        α      G(W, W_r)
ML           z                        -
MMI          z                        all (recognized)           1
CT                                    best (recognized)
MCE          -1/(1 + e^{2ϱz})         all without W_r            free   δ(W, W_r)
FT           -1/(1 + e^{2ϱz})         best (recognized) ≠ W_r
Diversity    -β^{-1} (1 - e^{βz})     all (recognized)           free
Jeffreys     1 z z (as extracted)     all (recognized)           1
MWE/MPE      exp(z)                   all (recognized)           1      A(W, W_r)

Properties of the Diversity Index:
- equals MCE with ϱ = 1/2 for β = 1
- equals MMI for β → 0
In case of MPE, A(W, W_r) gives the phone accuracy.
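The unifying criterion can be sketched for a single utterance as below. This is a minimal illustration over an explicit hypothesis list, not the authors' implementation: the function names and toy scores are hypothetical, and the minus signs in the smoothing functions are an assumption (they are hard to read in this transcript), chosen so that every criterion is maximized. The gain is restricted to the Kronecker delta so the logarithm stays defined.

```python
import math

def unified_term(log_joint, ref, f, alpha, gain, competitors):
    """f( (1/alpha) * log( sum_W p^alpha * G(W, W_r) / sum_{W in M_r} p^alpha ) )."""
    num = sum(math.exp(alpha * s) * gain(w, ref) for w, s in log_joint.items())
    den = sum(math.exp(alpha * log_joint[w]) for w in competitors)
    return f(math.log(num / den) / alpha)

def delta(w, ref):
    """Kronecker delta gain: 1 for the spoken word sequence, 0 otherwise."""
    return 1.0 if w == ref else 0.0

# Hypothetical joint log scores for three hypotheses of one utterance.
scores = {"a b c": -1.0, "a b b": -2.0, "a d c": -3.5}
ref = "a b c"
everything = list(scores)                        # M_r = all hypotheses
without_ref = [w for w in scores if w != ref]    # M_r = all without W_r

mmi = unified_term(scores, ref, lambda z: z, 1.0, delta, everything)
# MCE with rho = 1/2, written in the sign convention where larger is better.
mce = unified_term(scores, ref, lambda z: -1.0 / (1.0 + math.exp(z)),
                   1.0, delta, without_ref)
# Diversity index with beta = 1.
diversity = unified_term(scores, ref, lambda z: -(1.0 - math.exp(z)),
                         1.0, delta, everything)
```

On this toy input the diversity index with β = 1 and MCE with ϱ = 1/2 give the same value, which matches the first property listed on the slide.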

7 MCE Criterion
- Few publications investigate the use of MCE on large vocabulary tasks.
- Reason: MCE requires exclusion of the correct class from the set of competing classes.
- ASR: exclusion of the spoken word sequence from the set of all possible word sequences.
- Difficult if the set of competing word sequences is encoded as a word lattice:
  - The lattice may contain multiple alignments and pronunciation variants of the spoken utterance.
  - Arcs may not be uniquely assignable to correct or competing sentences without changing the lattice structure.
- Remedies:
  - Use N-best lists. Problem: coverage is considerably reduced.
  - Use finite state machines to restructure the training lattices. Problem: in general the lattice density increases.
  - Exclusion after computing statistics: this work. Advantage: all statistics needed can be extracted from the original training lattices.

8 MCE Optimization
- Optimization of the MCE criterion leads to expressions containing word-posterior-like weights.
- But: the corresponding summations exclude the spoken word sequence:
$$q_{[t_b, t_e]}(w \mid X_r) = \frac{\sum_{W \in M_r,\; W \neq W_r:\; w_{[t_b, t_e]} \in W} p_\lambda^{\alpha}(X_r, W)}{\sum_{V \in M_r,\; V \neq W_r} p_\lambda^{\alpha}(X_r, V)}$$
- Goal: efficient lattice-based computation of the MCE weights q_{[t_b, t_e]}(w | X_r).
- Idea: exclude the partial sum over the spoken word sequences numerically.

9 Efficient Lattice-Based MCE Algorithm
Efficient computation of MCE weights for training:
$$q_{[t_b, t_e]}(w \mid X_r) = \frac{\sum_{W \in M_r,\; W \neq W_r:\; w_{[t_b, t_e]} \in W} p_\lambda^{\alpha}(X_r, W)}{\sum_{V \in M_r,\; V \neq W_r} p_\lambda^{\alpha}(X_r, V)}$$
is expressed through the arc posterior in the denominator lattice,
$$\frac{\sum_{W \in M_r:\; w_{[t_b, t_e]} \in W} p_\lambda^{\alpha}(X_r, W)}{\sum_{V \in M_r} p_\lambda^{\alpha}(X_r, V)},$$
and the arc posterior in the numerator lattice,
$$\frac{\sum_{W \in M_r,\; W = W_r:\; w_{[t_b, t_e]} \in W} p_\lambda^{\alpha}(X_r, W)}{\sum_{V \in M_r,\; V = W_r} p_\lambda^{\alpha}(X_r, V)}.$$
Algorithm:
1. Label all alignments of the spoken word sequence in the denominator lattice. The corresponding sublattice is equivalent to the numerator lattice.
2. Compute arc posteriors in the numerator and denominator lattices using the forward-backward algorithm (similar to MMI).
3. Reduce the posteriors of the labeled arcs in the denominator lattice by the corresponding numerator arc posteriors.
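The idea of excluding the spoken word sequence numerically can be sketched on a toy lattice whose paths are enumerated explicitly. This is one reading of the idea under stated assumptions, not the authors' implementation: a real system would obtain the sums from forward-backward passes rather than path enumeration, the exact normalization used on the slide is not recoverable from this transcript, and all names and scores below are hypothetical. Scores are taken to be already scaled by α.

```python
import math

def mce_weight_by_exclusion(paths, arc):
    """Competitor-only arc weight from sums over all paths minus reference sums.

    paths: list of (is_reference_alignment, set_of_arcs, log_score).
    """
    total = sum(math.exp(s) for _, _, s in paths)
    total_ref = sum(math.exp(s) for is_ref, _, s in paths if is_ref)
    through = sum(math.exp(s) for _, arcs, s in paths if arc in arcs)
    through_ref = sum(math.exp(s) for is_ref, arcs, s in paths
                      if is_ref and arc in arcs)
    return (through - through_ref) / (total - total_ref)

def mce_weight_direct(paths, arc):
    """Same weight, computed by summing over competing paths only."""
    competing = [(arcs, s) for is_ref, arcs, s in paths if not is_ref]
    through = sum(math.exp(s) for arcs, s in competing if arc in arcs)
    return through / sum(math.exp(s) for _, s in competing)

# Toy lattice: arcs are (word, t_begin, t_end); two alignments of the spoken
# sequence "a b" and two competing sequences.
paths = [
    (True,  {("a", 0, 3), ("b", 3, 6)}, -1.0),
    (True,  {("a", 0, 4), ("b", 4, 6)}, -1.5),
    (False, {("a", 0, 3), ("c", 3, 6)}, -2.0),
    (False, {("d", 0, 3), ("b", 3, 6)}, -2.5),
]
arc = ("a", 0, 3)
by_exclusion = mce_weight_by_exclusion(paths, arc)
direct = mce_weight_direct(paths, arc)
```

Both functions return the same weight; the first one only needs quantities that are available from the full lattice and from its labeled sublattice, so the lattice structure itself is left unchanged.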

10 Experiments on Wall Street Journal (WSJ0) Task
Initial ML-trained acoustic models for WSJ0:
- 16 cepstral coefficients
- [...] ms frame shift
- LDA (±2 frames, [...])
- 2000 generalized triphone states + 1 silence state
- 6-state HMM
- within-word triphone models
- gender-independent Gaussian mixtures
- 1 pooled variance, 149k Gaussian densities

corpus WSJ0          train    dev     eval
acoustic data [h]    15:17    0:46    0:40
# speakers
# sentences
# running words
# lexicon words

corpus ARPA WSJ0     dev            eval
Nov. 92              WER    SER     WER    SER
ML
MMI
MCE
MWE

11 Experiments on North American Business (NAB) Task
Initial ML-trained acoustic models for WSJ0+1:
- 16 cepstral coefficients
- [...] ms frame shift
- LDA (±1 frames, 99 → 32)
- 7000 generalized triphone states + 1 silence state
- 6-state HMM
- across-word triphone models
- gender-independent Gaussian mixtures
- pooled variance, 412k Gaussian densities

corpus               WSJ0+1    NAB Nov. 94
                     train     dev     eval
acoustic data [h]    81:23     0:48    0:53
# speakers
# sentences
# running words
# lexicon words

corpus NAB           NAB-20k                       NAB-65k
Nov. 94              dev            eval           dev            eval
                     WER    SER     WER    SER     WER    SER     WER    SER
ML
MMI
MCE
MWE
MPE

12 Conclusions
- Minimum Classification Error (MCE) applied to large vocabulary speech recognition.
- Efficient lattice-based computation of MCE training statistics.
- No need for N-best lists or restructuring of word lattices.
- Common representation and performance comparison of:
  - Maximum Mutual Information (MMI) criterion,
  - Minimum Classification Error (MCE) criterion,
  - Minimum Word/Phone Error (MWE/MPE) criterion.
- MCE showed the same performance gains as MWE.

Acknowledgments
This work was partially funded by the European Union under the integrated project TC-STAR, Technology and Corpora for Speech to Speech Translation (IST-2002-FP[...]).

13 References
- D. Povey, P. C. Woodland, "Minimum phone error and i-smoothing for improved discriminative training," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Orlando, FL, May 2002.
- D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Dept. of Eng., Cambridge Univ., Cambridge, August.
- B.-H. Juang, S. Katagiri, "Discriminative learning for minimum error classification," IEEE Transactions on Signal Processing, vol. 40, no. 12, December 1992.
- W. Chou, C.-H. Lee, B.-H. Juang, "Minimum Error Rate Training based on N-Best String Models," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 2, Minneapolis, MN, USA, April 1993.
- R. Schlüter, W. Macherey, "Comparison of discriminative training criteria," in Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, Seattle, WA, May 1998.
- R. Schlüter, W. Macherey, B. Müller, H. Ney, "Comparison of discriminative training criteria and optimization methods for speech recognition," Speech Communication, vol. 34, no. 1, May.
- E. McDermott, T. J. Hazen, "Minimum classification error training of landmark models for real-time continuous speech recognition," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, Montreal, Canada, May 2004.

14 References (cont'd)
- E. McDermott, S. Katagiri, "Minimum classification error for large scale speech recognition tasks using weighted finite state transducers," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, Philadelphia, PA, March 2005.
- K. K. Paliwal, M. Bacchiani, Y. Sagisaka, "Minimum classification error training algorithm for feature extractor and pattern classifier in speech recognition," in Europ. Conf. on Speech Communication and Technology, vol. 1, Madrid, Spain, September 1995.
- W. Macherey, "Implementation and comparison of discriminative training methods for automatic speech recognition," Diploma Thesis, Lehrstuhl für Informatik VI, RWTH Aachen University, Aachen, November.
- D. S. Pallett, J. G. Fiscus, W. M. Fisher, J. S. Garofolo, B. A. Lund, M. A. Przybocki, "1994 Benchmark test for the ARPA spoken language program," in ARPA Human Language Technology Workshop, Austin, TX, January 1995.
- F. Kubala, "Design of the 1994 CSR benchmark tests," in ARPA Human Language Technology Workshop, Austin, TX, January 1995.
- W. Macherey, R. Schlüter, H. Ney, "Discriminative training with tied covariance matrices," in 8th Int. Conf. on Spoken Language Processing, vol. 1, Jeju Island, Korea, October 2004.


More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Annotated bibliographies for presentations in MUMT 611, Winter 2006

Annotated bibliographies for presentations in MUMT 611, Winter 2006 Stephen Sinclair Music Technology Area, McGill University. Montreal, Canada Annotated bibliographies for presentations in MUMT 611, Winter 2006 Presentation 4: Musical Genre Similarity Aucouturier, J.-J.

More information

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,

More information

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas

More information

How To Identify A Churner

How To Identify A Churner 2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

More information

ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION. Horacio Franco, Luciana Ferrer, and Harry Bratt

ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION. Horacio Franco, Luciana Ferrer, and Harry Bratt ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION Horacio Franco, Luciana Ferrer, and Harry Bratt Speech Technology and Research Laboratory, SRI International, Menlo Park, CA

More information

Estonian Large Vocabulary Speech Recognition System for Radiology

Estonian Large Vocabulary Speech Recognition System for Radiology Estonian Large Vocabulary Speech Recognition System for Radiology Tanel Alumäe, Einar Meister Institute of Cybernetics Tallinn University of Technology, Estonia October 8, 2010 Alumäe, Meister (TUT, Estonia)

More information

Victoria Kostina Curriculum Vitae - September 6, 2015 Page 1 of 5. Victoria Kostina

Victoria Kostina Curriculum Vitae - September 6, 2015 Page 1 of 5. Victoria Kostina Victoria Kostina Curriculum Vitae - September 6, 2015 Page 1 of 5 Victoria Kostina Department of Electrical Engineering www.caltech.edu/~vkostina California Institute of Technology, CA 91125 vkostina@caltech.edu

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Workshop. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Engineering & Health Informa2on Science

Workshop. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Engineering & Health Informa2on Science Engineering & Health Informa2on Science Engineering NLP Solu/ons for Structured Informa/on from Clinical Text: Extrac'ng Sen'nel Events from Pallia've Care Consult Le8ers Canada-China Clean Energy Initiative

More information

Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

The LIMSI RT-04 BN Arabic System

The LIMSI RT-04 BN Arabic System The LIMSI RT-04 BN Arabic System Abdel. Messaoudi, Lori Lamel and Jean-Luc Gauvain Spoken Language Processing Group LIMSI-CNRS, BP 133 91403 Orsay cedex, FRANCE {abdel,gauvain,lamel}@limsi.fr ABSTRACT

More information

Rethinking Speech Recognition on Mobile Devices

Rethinking Speech Recognition on Mobile Devices Rethinking Speech Recognition on Mobile Devices Anuj Kumar 1, Anuj Tewari 2, Seth Horrigan 2, Matthew Kam 1, Florian Metze 3 and John Canny 2 1 Human-Computer Interaction Institute, Carnegie Mellon University,

More information

tance alignment and time information to create confusion networks 1 from the output of different ASR systems for the same

tance alignment and time information to create confusion networks 1 from the output of different ASR systems for the same 1222 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 7, SEPTEMBER 2008 System Combination for Machine Translation of Spoken and Written Language Evgeny Matusov, Student Member,

More information

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti** *Crime Laboratory, NBI, Finland **Dept. of Computer

More information

Class-specific Sparse Coding for Learning of Object Representations

Class-specific Sparse Coding for Learning of Object Representations Class-specific Sparse Coding for Learning of Object Representations Stephan Hasler, Heiko Wersing, and Edgar Körner Honda Research Institute Europe GmbH Carl-Legien-Str. 30, 63073 Offenbach am Main, Germany

More information

Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis

Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Fabio Tesser, Giacomo Sommavilla, Giulio Paci, Piero Cosi Institute of Cognitive Sciences and Technologies, National

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr

More information

Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface

Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface Saturnino Luz 1, Masood Masoodian 2, and Bill Rogers 2 1 School of Computer Science and Statistics Trinity College Dublin

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Data a systematic approach

Data a systematic approach Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

More information

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for

More information

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ

More information

Speech recognition for human computer interaction

Speech recognition for human computer interaction Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

More information

ADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2

ADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2 3rd European ignal Processing Conference (EUIPCO) ADAPTIVE AND ONLINE PEAKER DIARIZATION FOR MEETING DATA Giovanni oldi, Christophe Beaugeant and Nicholas Evans Multimedia Communications Department, EURECOM,

More information

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering

More information

Application of discriminant analysis to predict the class of degree for graduating students in a university system

Application of discriminant analysis to predict the class of degree for graduating students in a university system International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application

More information

Training Universal Background Models for Speaker Recognition

Training Universal Background Models for Speaker Recognition Odyssey 2010 The Speaer and Language Recognition Worshop 28 June 1 July 2010, Brno, Czech Republic Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar and Jason Pelecanos IBM

More information

Reliable and Cost-Effective PoS-Tagging

Reliable and Cost-Effective PoS-Tagging Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve

More information