Automatic slide assignation for language model adaptation


Applications of Computational Linguistics
Adrià Agustí Martínez Villaronga
May 23,

1 Introduction

Online multimedia repositories are growing rapidly and establishing themselves as fundamental knowledge assets. This is particularly true in education, where large repositories of video lectures are being built, making education accessible to a wide community of potential students. As with many other repositories, most lectures are not transcribed because of the lack of efficient solutions to obtain transcriptions at a reasonable level of accuracy. However, transcription of video lectures is clearly necessary to make them more accessible. It would also facilitate lecture search and analysis, including classification, summarisation, or plagiarism detection. In addition, people with hearing disabilities would be able to follow the lectures just by reading the transcriptions.

Manual transcription of these repositories is excessively expensive and time-consuming, and current state-of-the-art automatic speech recognition (ASR) has not yet demonstrated its potential to provide acceptable transcriptions on large-scale collections of audiovisual objects. However, in this type of video the speaker often presents with slides in the background. In these cases, a strong correlation can be observed between the slides and the speech. Consequently, the slides provide an interesting opportunity to adapt general-purpose ASR models by massive adaptation from lecture-specific knowledge.

In [1] we proposed an adaptation technique that obtains an adapted language model for each video using its slides. The reported results show an improvement of up to 3.6 absolute WER points when using slides.

For the present work we assume that we are given a set of videos together with their slides. The slides, however, are not labeled, so we cannot directly obtain an adapted model for each video; we first need to assign one of the slide sets to each video. In this work, we explore the automatic assignation of slides and study its impact on the final WER, comparing the resulting transcriptions with the ones obtained using the correct slides.

In this work we focus on the polimedia repository. polimedia [2] was created for the production and distribution of multimedia educational content at the Universitat Politècnica de València. Lecturers are able to record lectures under controlled conditions, and the lectures are distributed along with time-aligned slides.

2 Language model adaptation background

Language model adaptation in the context of video transcription consists in using a specific language model for each video (or set of videos). In the adaptation method we propose, one language model is trained from the text in the slides. This model is then interpolated [3] with models trained on other corpora (out-of-domain and in-domain) to obtain a more general and more powerful model that is still adapted to the specific lecture.

All language models are standard n-gram models, smoothed using modified Kneser-Ney [4]. Training was performed with the SRILM toolkit [5], which is free for academic purposes.

2.1 Corpora description

Several corpora were used to train the language model. We used up to 9 out-of-domain corpora (Tables 1 and 2), and the polimedia corpus as the in-domain corpus.

Table 1: Basic statistics of out-of-domain corpora

Corpus            # sentences   # words   Vocabulary
EPPS              132K          0.9M      27K
news-commentary   183K          4.6M      78K
TED               316K          2.3M      69K
UnitedNations     448K          10.8M     105K
Europarl-v        K             54.9M     155K
El Periódico      2 695K        45.4M     313K
news (07-11)      8 627K        217.2M    775K
UnDoc             9 968K        318.0M    472K

Table 2: Basic statistics of Google's Ngram corpus (v1)

# unigrams   # pages   # books   Vocabulary
M            128M      521K      292K

The polimedia corpus comprises more than 100 hours of manually transcribed videos. The corpus is divided into a training set, a development set and a test set. The polimedia corpus also provides manual transcriptions of the slides for the development and test sets. Tables 3 and 4 provide more detailed information about the polimedia corpus.

Table 3: Basic statistics of the polimedia corpus

Videos   Time (hours)   # sentences   # words   Vocabulary
train   K   96.8K   28K
dev   K   34K   4.5K
test   K   28.7K   4K

Table 4: Basic statistics of slides in the polimedia corpus

Videos   # slides   # sentences   # words   Vocabulary
dev   K   3.5K
test   K   2.9K

The vocabulary is formed by the 50k most frequent words from all out-of-domain corpora plus all the words in the polimedia training set.

3 Automatic slide assignation

We are given a set of video transcriptions (correct or with errors) and the slide text for all of the videos. The slides, however, are unlabeled, and we do not know which slide set belongs to each video. We propose a simple yet very effective technique to assign slides to all of the videos.

To decide which slide set best suits a specific video, we train a 3-gram model for each slide set and use its perplexity over the video transcription as a score. The slide set assigned to the video is the one with the lowest perplexity.

4 Experiments

The aim of this work is twofold. On the one hand, we want to check whether the proposed method assigns the slides to the right video with a low error rate. On the other hand, we want to explore the impact of this assignation on the transcription error when adapting with slides. With these goals in mind, we performed the following experiments.

4.1 Automatic assignation experiments

Using the technique described in Section 3, we carried out two types of automatic assignation experiments: assignation using correctly transcribed videos, and assignation using automatic transcriptions. For each type of transcription we tested the technique with subsets of different sizes. For each size we repeated the experiments multiple times using different random subsets, in order to make the results independent of the chosen subset.
Table 5 shows the assignation errors: the absolute number of incorrectly assigned videos and the corresponding percentage, for both automatic and correct transcriptions.

Table 5: Assignation error (absolute and relative)

                           5 videos    10 videos   25 videos   Full corpus
                           Abs.  %     Abs.  %     Abs.  %     Abs.  %
Correct transcriptions
Automatic transcriptions

We can observe that, independently of the size of the subset, it is usual to find some errors in the assignation, even when correct transcriptions are used. An analysis of the wrongly assigned videos may clarify the causes of these misassignments.

Table 6: Wrongly assigned videos. Videos in italics are wrongly assigned only when using automatic transcriptions.

Video     # words in the slides
M54.B
M03.B
M62.B02   17
M62.B
M62.B04   0
M62.B05   0

Looking at the results in Table 6, we can observe that the slides of three of the videos are empty or almost empty. These three videos are the only assignation errors when correct transcriptions are used.

4.2 LM adaptation with automatically assigned slides

The proposed techniques for language model adaptation are measured in terms of both perplexity and WER obtained with a state-of-the-art ASR system [6].

The acoustic model was trained on the polimedia corpus (Table 3), employing triphonemes inferred using the conventional CART with almost 3900 leaves. Each triphoneme was trained with up to 64 Gaussian components per mixture, 4 iterations per mixture and 3 states per phoneme, with the typical left-to-right topology without skips. Additionally, speaker adaptation was performed by applying CMLLR feature normalisation (full transformation matrices).

The baseline language model was computed by interpolating all the out-of-domain corpora with the polimedia corpus. The adapted language models were computed as discussed in Section 2, by interpolating all the previous corpora with the model trained from the video slides.

Table 7 shows the results in terms of WER and PPL for both adapted models (correct and automatic assignation) as well as for the baseline model. The results show that adaptation with automatically assigned slides is slightly worse than adaptation with the correct slides, but still significantly better than no adaptation at all.
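The interpolation step can be sketched as follows. This is a hedged illustration in Python, assuming plain linear interpolation in the sense of [3] with weights estimated by expectation-maximisation on held-out text; the text does not state how the weights were obtained, and in practice a toolkit such as SRILM would be used instead. The function names and the input format (one list of per-model word probabilities for each held-out word) are hypothetical.

```python
import math


def interpolate(component_probs, weights):
    """Linear interpolation: p(w | h) = sum_i lambda_i * p_i(w | h)."""
    return sum(lam * p for lam, p in zip(weights, component_probs))


def estimate_weights(stream_probs, iterations=50):
    """Estimate interpolation weights with EM on held-out data.

    stream_probs[t][i] is the probability that component model i assigns to
    the t-th held-out word given its history. Assumption: every component is
    smoothed, so each mixture probability is strictly positive.
    """
    n_models = len(stream_probs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(iterations):
        expected = [0.0] * n_models
        for probs in stream_probs:
            mix = interpolate(probs, weights)
            for i, p in enumerate(probs):
                # Posterior probability that model i generated this word.
                expected[i] += weights[i] * p / mix
        weights = [count / len(stream_probs) for count in expected]
    return weights


def mixture_perplexity(stream_probs, weights):
    """Perplexity of the interpolated model on the held-out stream."""
    log_sum = sum(math.log(interpolate(probs, weights)) for probs in stream_probs)
    return math.exp(-log_sum / len(stream_probs))
```

In this sketch, the column for the slide model would receive a larger weight on videos whose speech follows the slides closely, which is the intuition behind adapting one model per video.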

Table 7: WER (%) and PPLs

                                             Development    Test
                                             PPL   WER      PPL   WER
(a) Baseline
(b) Adapted model (correct assignation)
(c) Adapted model (automatic assignation)

Looking in more detail at the results for the videos whose slides were incorrectly assigned, we can observe that in most cases the transcriptions using the real slides are slightly better than the ones using the assigned slides. However, in the cases where the slides were empty or almost empty, the differences are in general smaller, and for one of the videos the transcription is even better with the automatic assignation.

Table 8: WER comparison for the incorrectly assigned videos

Video   WER corr. sld.   WER with ass. sld.   WER transcr.
M54.B
M03.B
M62.B
M62.B
M62.B
M62.B

5 Conclusions and future work

The methodology described has proved very effective for automatically assigning slides, and despite a small number of errors in the assignation, transcription results are almost as good as when using the correct slides.

The experiments assigning slides in sets of different sizes show that correct assignation does not depend on the size of the set, but on the properties of each slide set.

An interesting experiment derived from this work would be to use the technique described to select one or more documents from the internet to train a new language model in order to improve the adaptation.

References

[1] A. Martínez-Villaronga, M. A. del Agua, J. Andrés-Ferrer, and A. Juan, "Language model adaptation for video lectures transcription," in Proc. ICASSP, Vancouver, Canada, May
[2] polimedia: Videolectures from the Universitat Politecnica de Valencia,
[3] Frederick Jelinek and Robert L. Mercer, "Interpolated estimation of Markov source parameters from sparse data," in Proceedings of the Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands: North-Holland, May 1980, pp
[4] Stanley F. Chen and Joshua Goodman, "An empirical study of smoothing techniques for language modeling," Computer Speech & Language, vol. 13, no. 4, pp ,
[5] A. Stolcke, "SRILM - an extensible language modeling toolkit," Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, September
[6] The translectures-upv Team. The translectures-upv toolkit (TLK), translectures-upv toolkit (TLK) for Automatic Speech Recognition,

Language Model Adaptation for Video Lecture Transcription

Language Model Adaptation for Video Lecture Transcription UNIVERSITAT POLITÈCNICA DE VALÈNCIA DEPARTAMENT DE SISTEMES INFORMÀTICS I COMPUTACIÓ Language Model Adaptation for Video Lecture Transcription Master Thesis - MIARFID Adrià A. Martínez Villaronga Directors:

More information

Document downloaded from: http://hdl.handle.net/10251/35190. This paper must be cited as:

Document downloaded from: http://hdl.handle.net/10251/35190. This paper must be cited as: Document downloaded from: http://hdl.handle.net/10251/35190 This paper must be cited as: Valor Miró, JD.; Pérez González De Martos, AM.; Civera Saiz, J.; Juan Císcar, A. (2012). Integrating a State-of-the-Art

More information

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM Simon Wiesler 1, Kazuki Irie 2,, Zoltán Tüske 1, Ralf Schlüter 1, Hermann Ney 1,2 1 Human Language Technology and Pattern Recognition, Computer Science Department,

More information

Strategies for Training Large Scale Neural Network Language Models

Strategies for Training Large Scale Neural Network Language Models Strategies for Training Large Scale Neural Network Language Models Tomáš Mikolov #1, Anoop Deoras 2, Daniel Povey 3, Lukáš Burget #4, Jan Honza Černocký #5 # Brno University of Technology, Speech@FIT,

More information

Scaling Shrinkage-Based Language Models

Scaling Shrinkage-Based Language Models Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com

More information

LIUM s Statistical Machine Translation System for IWSLT 2010

LIUM s Statistical Machine Translation System for IWSLT 2010 LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr

More information

MLLP Transcription and Translation Platform

MLLP Transcription and Translation Platform MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino

More information

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu

More information

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Matthias Paulik and Panchi Panchapagesan Cisco Speech and Language Technology (C-SALT), Cisco Systems, Inc.

More information

IBM Research Report. Scaling Shrinkage-Based Language Models

IBM Research Report. Scaling Shrinkage-Based Language Models RC24970 (W1004-019) April 6, 2010 Computer Science IBM Research Report Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM Research

More information

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department,

More information

EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN

EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN J. Lööf (1), D. Falavigna (2),R.Schlüter (1), D. Giuliani (2), R. Gretter (2),H.Ney (1) (1) Computer Science Department, RWTH Aachen

More information

Language technologies for Education: recent results by the MLLP group

Language technologies for Education: recent results by the MLLP group Language technologies for Education: recent results by the MLLP group Alfons Juan 2nd Internet of Education Conference 2015 18 September 2015, Sarajevo Contents The MLLP research group 2 translectures

More information

Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0

Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0 Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0 Tim Schlippe, Lukasz Gren, Ngoc Thang Vu, Tanja Schultz Cognitive Systems Lab, Karlsruhe Institute

More information

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings Haojin Yang, Christoph Oehlke, Christoph Meinel Hasso Plattner Institut (HPI), University of Potsdam P.O. Box

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan 7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation

More information

Slovak Automatic Transcription and Dictation System for the Judicial Domain

Slovak Automatic Transcription and Dictation System for the Judicial Domain Slovak Automatic Transcription and Dictation System for the Judicial Domain Milan Rusko 1, Jozef Juhár 2, Marian Trnka 1, Ján Staš 2, Sakhia Darjaa 1, Daniel Hládek 2, Miloš Cerňak 1, Marek Papco 2, Róbert

More information

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1

More information

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The

More information

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果 Effect of Captioning Lecture Videos For Learning in Foreign Language VERI FERDIANSYAH 1 SEIICHI NAKAGAWA 1 Toyohashi University of Technology, Tenpaku-cho, Toyohashi 441-858 Japan E-mail: {veri, nakagawa}@slp.cs.tut.ac.jp

More information

Portuguese Broadcast News and Statistical Machine Translation (SMT)

Portuguese Broadcast News and Statistical Machine Translation (SMT) Statistical Machine Translation of Broadcast News from Spanish to Portuguese Raquel Sánchez Martínez 1, João Paulo da Silva Neto 2, and Diamantino António Caseiro 1 L²F - Spoken Language Systems Laboratory,

More information

Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures

Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures J. D. Valor Miró, R. N. Spencer, A. Pérez González de Martos, G. Garcés Díaz-Munío, C. Turró, J. Civera

More information

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, João Neto, and Isabel Trancoso L 2 F Spoken Language Systems Lab INESC-ID

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Estonian Large Vocabulary Speech Recognition System for Radiology

Estonian Large Vocabulary Speech Recognition System for Radiology Estonian Large Vocabulary Speech Recognition System for Radiology Tanel Alumäe, Einar Meister Institute of Cybernetics Tallinn University of Technology, Estonia October 8, 2010 Alumäe, Meister (TUT, Estonia)

More information

Perplexity Method on the N-gram Language Model Based on Hadoop Framework

Perplexity Method on the N-gram Language Model Based on Hadoop Framework 94 International Arab Journal of e-technology, Vol. 4, No. 2, June 2015 Perplexity Method on the N-gram Language Model Based on Hadoop Framework Tahani Mahmoud Allam 1, Hatem Abdelkader 2 and Elsayed Sallam

More information

CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN

CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN Min Tang, Bryan Pellom, Kadri Hacioglu Center for Spoken Language Research University of Colorado at Boulder Boulder, Colorado

More information

Evaluation of Interactive User Corrections for Lecture Transcription

Evaluation of Interactive User Corrections for Lecture Transcription Evaluation of Interactive User Corrections for Lecture Transcription Henrich Kolkhorst, Kevin Kilgour, Sebastian Stüker, and Alex Waibel International Center for Advanced Communication Technologies InterACT

More information

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Michael J. Witbrock and Alexander G. Hauptmann Carnegie Mellon University ABSTRACT Library

More information

The TCH Machine Translation System for IWSLT 2008

The TCH Machine Translation System for IWSLT 2008 The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental

More information

Generating Training Data for Medical Dictations

Generating Training Data for Medical Dictations Generating Training Data for Medical Dictations Sergey Pakhomov University of Minnesota, MN pakhomov.sergey@mayo.edu Michael Schonwetter Linguistech Consortium, NJ MSchonwetter@qwest.net Joan Bachenko

More information

Adapting General Models to Novel Project Ideas

Adapting General Models to Novel Project Ideas The KIT Translation Systems for IWSLT 2013 Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik and Alex Waibel Institute for Anthropomatics KIT - Karlsruhe

More information

Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

More information

Building A Vocabulary Self-Learning Speech Recognition System

Building A Vocabulary Self-Learning Speech Recognition System INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Utilizing Automatic Speech Recognition to Improve Deaf Accessibility on the Web

Utilizing Automatic Speech Recognition to Improve Deaf Accessibility on the Web Utilizing Automatic Speech Recognition to Improve Deaf Accessibility on the Web Brent Shiver DePaul University bshiver@cs.depaul.edu Abstract Internet technologies have expanded rapidly over the past two

More information

Survey: Retrieval of Video Using Content (Speech &Text) Information

Survey: Retrieval of Video Using Content (Speech &Text) Information Survey: Retrieval of Video Using Content (Speech &Text) Information Bhagwant B. Handge 1, Prof. N.R.Wankhade 2 1. M.E Student, Kalyani Charitable Trust s Late G.N. Sapkal College of Engineering, Nashik

More information

Adaptation to Hungarian, Swedish, and Spanish

Adaptation to Hungarian, Swedish, and Spanish www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,

More information

The KIT Translation system for IWSLT 2010

The KIT Translation system for IWSLT 2010 The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of

More information

SEGMENTATION AND INDEXATION OF BROADCAST NEWS

SEGMENTATION AND INDEXATION OF BROADCAST NEWS SEGMENTATION AND INDEXATION OF BROADCAST NEWS Rui Amaral 1, Isabel Trancoso 2 1 IPS/INESC ID Lisboa 2 IST/INESC ID Lisboa INESC ID Lisboa, Rua Alves Redol, 9,1000-029 Lisboa, Portugal {Rui.Amaral, Isabel.Trancoso}@inesc-id.pt

More information

Slovak Automatic Dictation System for Judicial Domain

Slovak Automatic Dictation System for Judicial Domain Slovak Automatic Dictation System for Judicial Domain Milan Rusko 1(&), Jozef Juhár 2, Marián Trnka 1, Ján Staš 2, Sakhia Darjaa 1, Daniel Hládek 2, Róbert Sabo 1, Matúš Pleva 2, Marián Ritomský 1, and

More information

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas

More information

IMPLEMENTING SRI S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A SMART PHONE

IMPLEMENTING SRI S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A SMART PHONE IMPLEMENTING SRI S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A SMART PHONE Jing Zheng, Arindam Mandal, Xin Lei 1, Michael Frandsen, Necip Fazil Ayan, Dimitra Vergyri, Wen Wang, Murat Akbacak, Kristin

More information

Collaborative Machine Translation Service for Scientific texts

Collaborative Machine Translation Service for Scientific texts Collaborative Machine Translation Service for Scientific texts Patrik Lambert patrik.lambert@lium.univ-lemans.fr Jean Senellart Systran SA senellart@systran.fr Laurent Romary Humboldt Universität Berlin

More information

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty 1 Project Number: DM3 IQP AAGV Understanding Video Lectures in a Flipped Classroom Setting A Major Qualifying Project Report Submitted to the Faculty Of Worcester Polytechnic Institute In partial fulfillment

More information

Large Language Models in Machine Translation

Large Language Models in Machine Translation Large Language Models in Machine Translation Thorsten Brants Ashok C. Popat Peng Xu Franz J. Och Jeffrey Dean Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94303, USA {brants,popat,xp,och,jeff}@google.com

More information

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统 SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

A STATE-SPACE METHOD FOR LANGUAGE MODELING. Vesa Siivola and Antti Honkela. Helsinki University of Technology Neural Networks Research Centre
