Automatic slide assignation for language model adaptation
Applications of Computational Linguistics
Adrià Agustí Martínez Villaronga
May 23

1 Introduction

Online multimedia repositories are growing rapidly and establishing themselves as fundamental knowledge assets. This is particularly true in education, where large repositories of video lectures are being built, making education accessible to a wide community of potential students. As in many other repositories, most lectures are not transcribed, because efficient solutions to obtain transcriptions at a reasonable level of accuracy are still lacking. However, transcribing video lectures is clearly necessary to make them more accessible. Transcriptions would also facilitate lecture searchability and analysis, including classification, summarisation, and plagiarism detection. In addition, people with hearing disabilities would be able to follow the lectures just by reading the transcriptions.

Manual transcription of these repositories is excessively expensive and time-consuming, and current state-of-the-art automatic speech recognition (ASR) has not yet demonstrated its potential to provide acceptable transcriptions on large-scale collections of audiovisual objects. However, in this type of video the speaker often presents with some kind of background slides, and a strong correlation can be observed between the slides and the speech. These slides therefore provide an interesting opportunity to adapt general-purpose ASR models with lecture-specific knowledge. In [1] we proposed an adaptation technique that obtains an adapted language model for each video from its slides. Results showed an improvement of up to 3.6 absolute WER points when the slides were used.

For the present work we assume that we are given a set of videos together with their slides.
The slides, however, are not labeled, so we cannot directly obtain a video-adapted model for each video; we first need to assign one of the slide sets to each video. In this work we explore the automatic assignation of slides and study its impact on the final WER, comparing the resulting transcriptions with those obtained using the correct slides.
In this work we focus on the polimedia repository. polimedia [2] was created for the production and distribution of multimedia educational content at the Universitat Politècnica de València. Lecturers are able to record lectures under controlled conditions, and the recordings are distributed along with time-aligned slides.

2 Language model adaptation background

Language model adaptation in the context of video transcription consists in using a specific language model for each video (or set of videos). In the adaptation method we propose, one language model is trained from the text in the slides. This model is then interpolated [3] with models trained on other corpora (out-of-domain and in-domain) to obtain a more general and more powerful model that is still adapted to the specific lecture. All language models are standard n-gram models, smoothed using modified Kneser-Ney [4]. Training was performed with the SRILM toolkit [5], which is free for academic purposes.

2.1 Corpora description

Several corpora were used to train the language model: up to 9 out-of-domain corpora (Tables 1 and 2), and the polimedia corpus as in-domain corpus.

Table 1: Basic statistics of out-of-domain corpora

  Corpus            # sentences    # words    Vocabulary
  EPPS                     132K       0.9M          27K
  news-commentary          183K       4.6M          78K
  TED                      316K       2.3M          69K
  UnitedNations            448K      10.8M         105K
  Europarl-v                  K      54.9M         155K
  El Periódico           2 695K      45.4M         313K
  news (07-11)           8 627K     217.2M         775K
  UnDoc                  9 968K     318.0M         472K

Table 2: Basic statistics of Google's Ngram corpus (v1)

  # unigrams    # pages    # books    Vocabulary
           M       128M       521K          292K

The polimedia corpus comprises more than 100 hours of manually transcribed videos, divided into a training set, a development set and a test set. The polimedia corpus also provides manual transcriptions of the slides for the dev and test sets. Tables 3 and 4 provide more detailed statistics on the polimedia corpus.
Table 3: Basic statistics of the polimedia corpus

          Videos    Time (hours)    # sentences    # words    Vocabulary
  train                                       K      96.8K          28K
  dev                                         K        34K         4.5K
  test                                        K      28.7K           4K

Table 4: Basic statistics of slides in the polimedia corpus

          Videos    # slides    # sentences    # words    Vocabulary
  dev                                     K       3.5K
  test                                    K       2.9K

The vocabulary is formed by the 50k most frequent words from all out-of-domain corpora plus all the words in the polimedia training set.

3 Automatic slide assignation

We are given a set of video transcriptions (correct or with errors) and the slide text for all of the videos. The slides, however, are unlabeled, and we do not know which slide set belongs to each video. We propose a simple yet very effective technique to assign slides to all of the videos: to decide which slide set best suits a specific video, we train a 3-gram model for each slide set and use its perplexity over the video transcription as a score. The slides assigned to the video are those with the lowest perplexity.

4 Experiments

The aim of this work is twofold: on the one hand, we want to check whether the proposed method assigns the slides to the right video with a low error rate; on the other hand, we want to explore the impact of this assignation on the transcription error when adapting with slides. With these goals in mind, we performed the following experiments.

4.1 Automatic assignation experiments

Using the technique described in Section 3, we carried out two types of automatic assignation experiments: assignation using correctly transcribed videos, and assignation using automatic transcriptions. For each type of transcription we tested the technique with subsets of different sizes, and for each size we repeated the experiment multiple times with different random subsets, in order to make the results independent of the chosen subset.
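The assignation rule of Section 3 can be sketched as follows. This is a minimal illustration, not the system used in the experiments: it uses add-one-smoothed bigrams instead of the paper's 3-gram models with modified Kneser-Ney smoothing, and the slide sets and transcript are hypothetical.

```python
import math
from collections import Counter

def train_bigram(words):
    """Add-one-smoothed bigram model: unigram counts, bigram counts, vocabulary."""
    return Counter(words), Counter(zip(words, words[1:])), set(words)

def perplexity(model, words):
    """Perplexity of the model over a word sequence."""
    uni, bi, vocab = model
    v = len(vocab) + 1  # +1 event to cover unseen words
    logp = sum(math.log((bi[(w1, w2)] + 1) / (uni[w1] + v))
               for w1, w2 in zip(words, words[1:]))
    return math.exp(-logp / (len(words) - 1))

def assign_slides(transcript, slide_sets):
    """Index of the slide set whose model yields the lowest
    perplexity on the video transcription (Section 3)."""
    models = [train_bigram(s.split()) for s in slide_sets]
    ppls = [perplexity(m, transcript.split()) for m in models]
    return min(range(len(ppls)), key=ppls.__getitem__)

# Hypothetical slide sets for two lectures and one transcript.
slides = ["support vector machines maximise the margin between classes",
          "quicksort partitions the array around a pivot element"]
transcript = "today we study the margin of support vector machines"
print(assign_slides(transcript, slides))  # -> 0 (the first slide set)
```

Because the score is a ranking, the smoothing method mainly needs to be consistent across slide sets; the shared-word overlap between transcript and slides is what drives the decision.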
The assignation errors are shown in Table 5, which reports the absolute number of incorrectly assigned videos as well as the percentage, for both automatic and correct transcriptions.
Table 5: Assignation error (absolute and relative)

                              5 videos     10 videos    25 videos    Full corpus
                              Abs.    %    Abs.    %    Abs.    %    Abs.     %
  Correct transcriptions
  Automatic transcriptions

We can observe that, independently of the size of the subset, it is common to find some errors in the assignation, even when correct transcriptions are used. An analysis of the wrongly classified videos may clarify the causes of these misassignments.

Table 6: Wrongly assigned videos. Videos in italics were wrongly assigned only when using automatic transcriptions.

  Video      # words in the slides
  M54.B
  M03.B
  M62.B02                       17
  M62.B
  M62.B04                        0
  M62.B05                        0

Looking at the results in Table 6, we can observe that the slides of three of the videos are empty or almost empty. These three videos are the only assignation errors when correct transcriptions are used.

4.2 LM adaptation with automatically assigned slides

The proposed language model adaptation techniques are measured in terms of both perplexity and the WER obtained with a state-of-the-art ASR system [6]. The acoustic model was trained on the polimedia corpus (Table 3), employing triphonemes inferred with a conventional CART with almost 3900 leaves. Each triphoneme was trained with up to 64 mixture components, 4 iterations per mixture, and 3 states per phoneme with the typical left-to-right topology without skips. Additionally, speaker adaptation was performed by applying CMLLR feature normalisation (full transformation matrices).

The baseline language model was computed by interpolating all the out-of-domain corpora with the polimedia corpus. The adapted language models were computed as discussed in Section 2, by interpolating all the previous corpora with the model trained from the video slides. Table 7 shows the results in terms of WER and PPL for both adapted models (correct and automatic assignation) as well as for the baseline model.
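A linearly interpolated model of the kind described in Section 2 can be sketched as below. This is a toy unigram illustration under assumed distributions, not the actual SRILM setup: in practice the interpolation weight is optimised on the development set, which the grid search at the end imitates.

```python
import math

def interpolate(p_slide, p_general, lam):
    """Linear interpolation: p(w) = lam * p_slide(w) + (1 - lam) * p_general(w)."""
    vocab = set(p_slide) | set(p_general)
    return {w: lam * p_slide.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

def perplexity(p, words):
    """Perplexity of a word-probability table over a word sequence."""
    logp = sum(math.log(max(p.get(w, 0.0), 1e-12)) for w in words)
    return math.exp(-logp / len(words))

# Toy unigram models: one estimated from the slides, one general-purpose.
p_slide = {"gradient": 0.5, "descent": 0.4, "the": 0.1}
p_general = {"the": 0.6, "a": 0.3, "gradient": 0.1}

# Pick the weight that minimises perplexity on a (toy) development text.
dev = ["the", "gradient", "descent"]
best_lam = min((l / 10 for l in range(1, 10)),
               key=lambda l: perplexity(interpolate(p_slide, p_general, l), dev))
```

The interpolated table is still a proper distribution (the weights sum to one), so the adapted model never assigns less mass to general language than the slide model alone would.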
Results show that adaptation with automatically assigned slides is slightly worse than adaptation with the correct slides, but it is still significantly better than no adaptation at all.
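The WER values reported here are word-level edit distances between hypothesis and reference, normalised by the reference length. A minimal sketch (the experiments themselves used a standard scoring pipeline):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / # reference
    words, computed with word-level Levenshtein distance. Returned as a %."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + sub) # substitution or match
    return 100.0 * dp[-1][-1] / len(ref)

print(wer("the margin between classes", "the margins between classes"))  # -> 25.0
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is reported alongside perplexity rather than alone.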
Table 7: WER (%) and PPLs

                                               Development          Test
                                               PPL     WER      PPL     WER
  (a) Baseline
  (b) Adapted model (correct assignation)
  (c) Adapted model (automatic assignation)

If we look in more detail at the results for the videos whose slides were incorrectly assigned, we can observe that in most cases the transcriptions using the real slides are slightly better than those using the assigned slides. However, in the cases where the slides were empty or almost empty, the differences are in general smaller, and for one of the videos the transcription is even better with the automatic assignation.

Table 8: WER comparison for the incorrectly assigned videos

  Video    WER corr. sld.    WER ass. sld.    WER transcr.
  M54.B
  M03.B
  M62.B
  M62.B
  M62.B
  M62.B

5 Conclusions and future work

The methodology described has proven very effective at automatically assigning slides: despite a small number of assignation errors, transcription results are almost as good as when using the correct slides. The experiments assigning slides in differently-sized sets show that correct assignation does not depend on the size of the set, but on the properties of each slide set. An interesting experiment derived from this work would be to use the described technique to select one or more documents from the internet with which to train a new language model, in order to improve the adaptation.

References

[1] A. Martínez-Villaronga, M. A. del Agua, J. Andrés-Ferrer, and A. Juan, "Language model adaptation for video lectures transcription," in Proc. ICASSP, Vancouver, Canada, May.

[2] polimedia: Video lectures from the Universitat Politècnica de València.
[3] Frederick Jelinek and Robert L. Mercer, "Interpolated estimation of Markov source parameters from sparse data," in Proceedings of the Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands: North-Holland, May 1980.

[4] Stanley F. Chen and Joshua Goodman, "An empirical study of smoothing techniques for language modeling," Computer Speech & Language, vol. 13, no. 4.

[5] A. Stolcke, "SRILM - an extensible language modeling toolkit," in Proc. Intl. Conf. on Spoken Language Processing, Denver, Colorado, September.

[6] The transLectures-UPV Team, The transLectures-UPV toolkit (TLK) for Automatic Speech Recognition.