Speech Processing for Audio Indexing

Size: px
Start display at page:

Download "Speech Processing for Audio Indexing"

Transcription

1 Speech Processing for Audio Indexing Lori Lamel with J.L. Gauvain and colleagues from Spoken Language Processing Group CNRS-LIMSI LIMSI GoTAL Gothenburg, Sweden, August 25-27,

2 Introduction Huge amounts of audio data produced (TV, radio, Internet, telephone) Automatic structuring of multimedia, multilingual data Much accessible content in the audio and text streams Speech and language processing technologies are key components for indexing Applications: digital multimedia libraries, media monitoring, News on Demand, Internet watch services, speech analytics,... LIMSI GoTAL Gothenburg, Sweden, August 25-27,

3 Processing Audiovisual Data Language recognition Speaker recognition signal Audio Segmentation Speech transcription Emotions Named entities,... XML Acoustic phonetic modeling Corpus based studies Perceptual tests Translation Q&A Audio Analysis Dialog Generation Synthesis signal LIMSI GoTAL Gothenburg, Sweden, August 25-27,

4 Some Recent Projects Conversational interfaces & intelligent assistants: Computers in the Human Interaction Loop (chil.server.de) Human-Human communication mediated by the machine multimodality (speech, image, video) who & where speaker identification and tracking what linguistic content (far-field ASR) why & how style, attitude, emotion Speech translation: Technology and Corpora for Speech to Speech Translation ( Global Autonomous Language Exploitation (DARPA GALE) Developing multimedia and multilingual indexing and management tools QUAERO ( Speech analytics: LIMSI GoTAL Gothenburg, Sweden, August 25-27,

5 Why Is Speech Recognition Difficult? Automatic conversion of spoken words in text Text: I do not know why speech recognition is so difficult Continuous: Idonotknowwhyspeechrecognitionissodifficult Spontaneous: Idunnowhyspeechrecnitionsodifficult Pronunciation: YdonatnowYspiCrEkxgnISxnIzsodIfIk lt YdonowYspiCrEknISNsodIfxk l YdontnowYspiCrEkxnISNsodIfIk lt YdxnowYspiCrEknISNsodIfxk lt Important variability factors: Speaker Acoustic environment physical characteristics (gender, background noise (cocktail party,...) age,...), accent, emotional state, room acoustic, signal capture situation (lecture, conversation, (microphone, channel,...) meeting,...) [after lecture Speech Recognition and Understanding by Tanja Schultz and John Kominek] LIMSI GoTAL Gothenburg, Sweden, August 25-27,

6 Audio Samples (1) WSJ Read stop PREVIOUSLY HE WAS PRESIDENT OF CIGNA S PROPERTY CASUAL GROUP FOR FIVE YEARS AND SERVED AS CHIEF FINANCIAL OFFICER WSJ Spontaneous ACCORDING TO KOSTAS STOUPIS AN ECONOMIST WITH THE GREEK GOV- ERNMENT IT IS IT IS NOT YET POSSIBLE FOR GREECE TO TO MAINTAIN AN EXPORT LEVEL THAT IS EQUIVALENT TO EVEN THE THE SMALLEST LEVEL OF THE OTHER WESTERN NATIONS Broadcast News ON WORLD NEWS TONIGHT THIS WEDNESDAY TERRY NICHOLS AVOIDS THE DEATH SENTENCE FOR HIS ROLE IN THE OKLAHOMA CITY BOMBING. ONE OF THE WORLD S MOST POPULAR AIRPLANES THE GOVERNMENT ORDERS AN IMMEDIATE SAFE TY INSPECTION. AND OUR REPORT ON THE CLONING CONTROVERSY BECAUSE ONE AMERICAN SCIENTIST SAYS HE S READY TO GO. LIMSI GoTAL Gothenburg, Sweden, August 25-27,

7 Conversational Telephone Speech Audio Samples (2) stop Hello. Yes, This is Bill. Hi Bill. This is Hillary. So this is my first call so I haven t a clue as to - - what we re supposed to do yeah. It s pretty funny that we re Bill and Hillary talking on the phone though isn t it? Ah I didn t even think about that. {laughter} uhm we re supposed to talk about did you get the topic. Yes. So we re supposed to talk about what changes if any you have made since... CHIL, spontaneous, non native I just give you a brief overview of [noise] what s going on in uh audio and why we bother with all these microphones in the the materials of the walls, the carpet and all that... LIMSI GoTAL Gothenburg, Sweden, August 25-27,

8 Audio Samples (3) EPPS stop Mister President, in these questions we see the European social model in flashing lights. The European social model is a mishmash which pleases no one: a bit of the free market here and a bit of welfare state there, mixed with a little green posturing. EPPS, spontaneous Thank you very much, Mister Chairman. Hum When I came to this room I I have seen and heard homophobia but sort of vaguely ; through T. V. and the rest of it. But I mean listening to some of the Polish colleagues speak here today, especially Roszkowski, Pek, Giertych and Krupa,... EPPS, non native My main my main problem with the environmental proposals of the Commission is that they do not correspond to the objectives of the Sixth Environmental Action Plan. As far as the traffic is concerned, the sixth E. A. P. stressed the decoupling of transport growth and G. D. P. growth. LIMSI GoTAL Gothenburg, Sweden, August 25-27,

9 Indicative ASR Performance Task Condition Word Error Dictation read speech, close-talking mic. 3-4% (humans 1%) read speech, noisy (SNR 15dB) 10% read speech, telephone 20% spontaneous dictation 14% read speech, non-native 20% Found audio TV & radio news broadcasts 10-15% (humans 4%) TV documentaries 20-30% Telephone conversations 20-30% (humans 4%) Lectures (close mic) 20% Lectures (distant mic) 50% EPPS 8% LIMSI GoTAL Gothenburg, Sweden, August 25-27,

10 t 10 t Sw itc hboar d ht Me Eng lis Me ting Rea Me ting Sw itc hboar di in CTSAra bicdar itc hboar dce lu lar Broah ting= IHM News Man dar in10ara bic10 X Var ie(non\ English) hones ATraveK h her (UL) News Eng lis h1 X News Eng lis h10 X News Eng lis hun lim ite d No isy )%( R E W k % 4 % 2 % 10 from NIST hmar ktes His tory NISTSTBenc SDM MDM ing Spec (UL)CTSMan (UL)0MeXNewsCTSFis Conversa iona lspec h (Non= h) Sw dcas tspec dmicrop % dspec h k20 ir lp lan ing ios kspec k5 % LIMSI GoTAL Gothenburg, Sweden, August 25-27,

11 Some Research Topics Development of generic technology Speaker independent, task independent (large vocabulary > 200k words) Robust (noise, microphone,...) Improving model accuracy Processing found audio: varied speaking styles, uncontrolled conditions Multilinguality: low e-resourced languages Developing systems with less supervision Model adaptation: keeping models up-to-date Annotation of metadata (speaker, language, topic, emotion, style...) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

12 Internet Languages Internet users in Millions per language Increase (%) in Internet users p internetworldstats.com, Nov 30, 2007 language from 2000 to 2007 LIMSI GoTAL Gothenburg, Sweden, August 25-27,

13 ASR Systems Text Corpus Speech Corpus Training Normalization Manual Transcription Feature Extraction N gram Estimation Training Lexicon HMM Training Decoding Language Model Recognizer Lexicon Acoustic Models P(W) P(H W) f(x H) Speech Sample Y Acoustic Front end X Decoder W * Speech Transcription LIMSI GoTAL Gothenburg, Sweden, August 25-27,

14 System Components Acouctic models: HMMs (10-20M parameters), allophones (triphones), discriminative features, discriminative training Pronunciation dictionary with probabilities Language models: statistical N-gram models (10-20M N-grams), model interpolation, connectionist LMs, text normalization Decoder: multipass decoding, unsupervised acoustic model adaptation, system combination (Rover, cross-system adaptation) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

15 Multi Layer Perceptron Front-End Features incorporate longer temporal information than cepstral features MLP NN projects the high-dimensional space onto phone state targets LP-TRAP features (500ms), 4 layer bottle-neck architecture (HMM state targets, 3rd layer size is 39), PCA [Hermansky & Sharma, ICSLP 98; Fousek, 2007] Studied various configurations & ways to combine with cepstral features Main results: bnat06 WER (%) Features PLP MLP wlp PLP + MLP wlp No adaptation SAT+CMLLR+MLLR Signifiant WER reduction without acoustic model adaptation Best combination via feature vector concatenation (78 features) Acoustic model adaptation is less effective for MLP Rover (PLP,PLP+MLP) further reduces the WER LIMSI GoTAL Gothenburg, Sweden, August 25-27,

16 Pronunciation Modeling Pronunciation dictionary is an important system component Usually represented with phones + non-speech units (silence, breath, filler) Well-suited to recognition of read and prepared speech Modeling variants particularly important for spontaneous speech multiwords I don t know Explore (semi-)automatic methods to generate pronunciation variants Explicit use of phonetic knowledge to hypothesize rules anti, multi: /i/ - /ay/ variation inter: /t/ vs nasal flap, retroflex schwa vs schwa stop deletion/insertion y-insertion (coupon, news, super) Data driven: confront pronunciations with very large amounts of data Estimate probabilities of pronunciation variants (very important for Arabic) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

17 Example: coupon kyupan kupan LIMSI GoTAL Gothenburg, Sweden, August 25-27,

18 HMM a 11 a 22 a 33 1 a 12 a f(. s 1) f(. s 2) f(. s ) 3 x 1 x 2 x 3 x 4 x 5 x 6 x 7 x8 LIMSI GoTAL Gothenburg, Sweden, August 25-27,

19 Example: interest InXIst IntrIst LIMSI GoTAL Gothenburg, Sweden, August 25-27,

20 Pronunciation Variants - BN vs CTS 100 hours of data Word Pronunciation BN CTS Word Pronunciation BN CTS interest IntrIst interests IntrIss IntXIst 3 33 IntrIsts InXIst 0 11 IntXIsts 3 2 IntXIss 3 1 interested IntrIstxd interesting IntrIst G IntXIstxd 3 80 IntXIst G InXIstxd InXIst G Similar but less frequent words: interestingly, disinterest, disinterested, intercom, interconnected, interfere... Estimate pronunciation probabilities, but no generalization LIMSI GoTAL Gothenburg, Sweden, August 25-27,

21 Morphological Decomposition Morphological decomposition proposed to address problem of lexical variety (Arabic, German, Turkish, Finnish, Amharic,...) Rule-based vs data-driven approaches Reduces out-of-vocabulary rate Improve n-gram estimation of infrequent words Tendency to create errors due to acoustic confusions of small units Some proposed solutions: - inhibit decomposition of frequent words - incorporate linguistic knowledge (recognizer confusions, assimilation) Recognition performace of optimized systems is typically comparable to and can be combined with word based systems LIMSI GoTAL Gothenburg, Sweden, August 25-27,

22 Continuous Space Language Models Project words onto a continuous space (about 300) Estimate projection and n-gram probabilities with neural network Better generalization for unknown n-grams (compared to general back-off methods) Efficient training with a resampling algorithm Only model the N most common words (10-12k) Interpolated with a standard 4-gram backoff LM 20% perplexity reduction, % reduction in WER across a variety of languages and tasks [Schwenk, 2007] LIMSI GoTAL Gothenburg, Sweden, August 25-27,

23 Reducing Training Costs - Motivation State-of-the-art speech recognizers trained on hundreds to thousands hours of transcribed audio data hundreds of million to several billions of words of texts large recognition lexicons Less e-represented languages Over 6000 languages, about 800 written Poor representation in accessible form Lack of economic and/or political incentives PhD theses: Vietnamese, Khmer [Le, 2006], Somali [Nimaan, 2007], Amharic, Turkish [Pellegrini, 2008] SPICE: Afrikaans, Bulgarian, Vietnamese, Hindi, Konkani, Telugu, Turkish, but also English, German, French [Schultz, 2007] LIMSI GoTAL Gothenburg, Sweden, August 25-27,

24 Questions How much data is needed to train the models? What performance can be expected with a given amount of resources? Are audio or textual training data more important for ASR in less e-represented languages (languages with texts available)? Assess the impact on WER of the quantity of Speech material (acoustic modeling) Texts (transcriptions) Texts (newspapers, newswires available on the Web) Some studies on Amharic (low e-resourced language) [Pellegrini, 2008] LIMSI GoTAL Gothenburg, Sweden, August 25-27,

25 Amharic, 2 hour development set OOV and WER Rates 50 2h 10h 35h h 5h 10h 35h OOV(%) 25 log WER(%) k 100k 500k 1M 4.6M 10k Text corpus size (# words) Audio transcriptions closer to development data Even with 4.6M texts, more audio transcripts help 100k 500k 1M Text corpus size (# words) 4.6M LIMSI GoTAL Gothenburg, Sweden, August 25-27,

26 Amharic, 2 hour development set Influence of Transcriptions in LM h 10h 35h MA2h ML10h MA2h ML35h 50 10h MA10h ML35h 35h log WER(%) 50 log WER(%) k 100k 500k Text corpus size (# words) 1M 4.6M 2h of audio training data is not sufficient Good operating point: 10h audio, 1M word texts 10k 100k 500k Test corpus size (# words) Collecting texts is more efficient than transcribing audio at this operating point 1M 4.6M LIMSI GoTAL Gothenburg, Sweden, August 25-27,

27 Reducing Training Supervision Use of quick transcriptions (less detailed, less costly to produce) low-cost approximate annotations (commercial transcripts, closed-captions) raw data with closely related contemporary texts raw data with complementary texts (only newspaper, newswire,...) Lightly supervised and unsupervised training using speech recognizer [Zavaliagkos & Colthurst 1998; Kemp & Waibel 1999; Jang & Hauptmann, 1999, Lamel, Gauvain & Adda 2000] Standard MLE training assumes that transcriptions are perfect, and are aligned with audio Flexible alignment using full N-gram decoder and garbage model Can skip over unknown words, transcription errors Implicit modeling of short vowels in Arabic LIMSI GoTAL Gothenburg, Sweden, August 25-27,

28 Reducing Training Supervision - Non-speech Use existing models (trained on manual annotations) to automatically detect fillers and breath noises Add these to reference transcriptions Small reduction in word error rate breath filler Manual (50h) 8.1% 0.6% BN (967h) 4.9% 3.4% BC (162h) 5.4% 6.8% LIMSI GoTAL Gothenburg, Sweden, August 25-27,

29 Reducing Supervision: Pronunciation Generation Pronunciation generation in Arabic is relatively straightforward from a vocalized word form Most sites working on Arabic use the Buckwalter morphological analyzer to generate all possible vocalizations But many words are not handled Introduced generic vowel (@) model as a placeholder Developed rules to automatically generate pronunciations with a generic vowel from non-vocalized word form Enables training with partial information ktb k@t@b@ k@t@b@n kt@b@ kt@b@n k@tb@ k@tb@n k@t@b Reduces OOV of recognition lexicon LIMSI GoTAL Gothenburg, Sweden, August 25-27,

30 Alignment Example: li AntilyAs n l y A s LBC NEWS PLAY LIMSI GoTAL Gothenburg, Sweden, August 25-27,

31 Towards Applications Transcription for Translation (TC-STAR) Using the WEB as a Source of Training Data Indexing Internet audio-video data: Audiosurf Open Domain Question/Answering qast/2008 Open domain Interactive Information Retrieval by Telephone (RITel) ritel.limsi.fr Speech Analytics (Infom@gic/Callsurf) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

32 ASR Transcripts Versus Texts Spoken language, disfluencies (hesitations, fillers,...) No document/sentence boundaries Lack of capitalization, punctuation Transcription errors (substitutions, deletions, insertions) Strong correlation between WER and IE metrics (NIST) Slightly more errors on NE tagged words Vocabulary limitation essentially affects person names + Known vocabulary, i.e. no orthographic errors, no need for normalization + Time codes + Confidence measures LIMSI GoTAL Gothenburg, Sweden, August 25-27,

33 Transcription for Translation (TC-STAR) Three languages (EN, ES, MAN), 3 tasks (European Parliament, Spanish Parliament, BN) Improving modeling and decoding techniques audio segmentation, discriminative features, duration models, automatic learning, discriminative training, pronunciation models, NN language model, decoding strategies,... Model adaptation methods speaker adaptive trainining, acoustic and language model adaptation, cross-system adaptation, adaptation to noisy environments,... Integration of ASR and Translation transcription errors, confidence scores, word graph interface punctuation for MT transcription normalization (98 vs. ninety eight, hesitations,...) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

34 TC-Star ASR Progress 30 Best TCStar LIMSI OCT04-BN More task-specific data More complex models Improved modeling and decoding techniques Automatic learning, discriminative training, pronunciation modeling Demo LIMSI GoTAL Gothenburg, Sweden, August 25-27,

35 Using the WEB as a Source of Training Data (Work done in collaboration with Vecsys Research) How to build a system without speaking the language (e.g Russian) and without transcribed data? Main steps Get texts from the WEB, and basic linguistic knowledge for letter-tosound rules Get BN audio data with incomplete and approximate transcripts Normalize texts/transcripts and build n-gram LM Start with seed AMs from other languages Align incomplete transcripts and audio, discard badly aligned data Train AMs Iterate last 2 steps to minimize the overestimated WER LIMSI GoTAL Gothenburg, Sweden, August 25-27,

36 Results: Approximate WER Russian WER Russian WER select Estimated WER using approximate transcripts Approximate WER reduced from 30% to 10% Similar experiments underway with Spanish and Portuguese LIMSI GoTAL Gothenburg, Sweden, August 25-27,

37 Indexing Internet audio/video Processed data 259 sources 80h/day, 33 languages Transcribing and indexing 203 sources 6 languages AUDIOSURF System English French German Arabic Spanish Mandarin All h 10735h 964h 1756h 60h h h 14950h 1639h 3383h 1369h h h 18675h 1970h 4634h 2337h h h 19943h 1972h 4736h 2401h h LIMSI GoTAL Gothenburg, Sweden, August 25-27,

38 LIMSI GoTAL Gothenburg, Sweden, August 25-27,

39 Open Domain Question/Answering in Audio data Automatic structurization of audio data Detect semantically meaningful units Specific entity taggers (named, task, linguistic entities) - rewrite rules (work like local grammars with specific dictionaries) - named and linguistic entity tagging are language dependent but task independent - task entity tagging is language independent but task dependent Projects: RITel (ritel.limsi.fr), Chil (chil.server.de), Quaero ( Participation in CLEF track QAst07, QAst08 ( qast/ 2008) [Rosset et al, ASRU 2007] Compare QA performance on manual and ASR transcripts of varied audio Best accuracy about , generally degradation with increasing WER LIMSI GoTAL Gothenburg, Sweden, August 25-27,

40 RITel System Example stop LIMSI GoTAL Gothenburg, Sweden, August 25-27,

41 RITel System Information Retrieval by Telephone [Rosset et al, 2005,2006] ritel.limsi.fr Example stop Fully integrated open domain Interactive Information Retrieval System Speech detection and recognition Non-Contextual analysis (for texts and speech transcripts) General Information Retrieval IR on databases and dialogic interactions Management of dialog history Natural Language Generation Data collection platform (natural, real-time) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

42 LIMSI GoTAL Gothenburg, Sweden, August 25-27,

43 LIMSI GoTAL Gothenburg, Sweden, August 25-27,

44 Question: What is the Vlaams Blok? Manual transcript: the Belgian Supreme Court has upheld a previous ruling that declares the Vlaams Blok a criminal organization and effectively bans it. Answer: criminal organisation Extracted portion of an automatic transcript (CTM file format): (...) EN SAT Vlaams EN SAT Blok EN SAT a EN SAT criminal EN SAT organisation EN SAT and (...) Answer: LIMSI GoTAL Gothenburg, Sweden, August 25-27,

45 Speech Analytics Statistical analyses of huge speech corpora Client satisfaction, service efficiency,... Extraction of metadata (content, topics, quality of interaction...) Confidentiality and privacy issues Anonymized transcriptions LIMSI GoTAL Gothenburg, Sweden, August 25-27,

46 CallSurf stop Speech Analytics Sample Extract A: oui bonjour Monsieur, EDF pro à l appareil. C: oui. ah oui oui oui. A: ouais je je vous appelle Monsieur, je vous avais eu il y a quelques jours au téléphone, C: tout à fait. A: et je vous avais transféré vers euh des conseillers euh nouvelle offre qui m ont renvoyé un mail ce matin, en me disant de vous recontacter puisque la demande de mise en service par le premier conseiller que vous aviez eu à l origine n avait pas été faite. C: ah bon A: ouais. donc euh bon c est pas coupé sur place. c est pas le problème... LIMSI GoTAL Gothenburg, Sweden, August 25-27,

47 Detailed Transcription edfp h5 fre spk <o,f0,male> allò oui edfp h5 fre spk <o,f0,female> {fw} Monsieur Dupont edfp h5 fre spk <o,f0,male> oui edfp h5 fre spk <o,f0,female> oui Ma(demoiselle) Mademoiselle Brun EDF pro edfp h5 fre spk <o,f0,male> oui edfp h5 fre spk <o,f0,female> {breath} {fw} je vous rappelle je viens d avoir le service technique {fw} concernant {fw} notre discussion {fw} de ce matin edfp h5 fre spk <o,f0,female> donc {fw} après vérification en fait le prédécesseur qui était donc un client {fw} particulier en référence particulier {breath} a été... LIMSI GoTAL Gothenburg, Sweden, August 25-27,

48 Quick Anonymized Transcription A/ Sylviane XX EDF Pro bonjour. C/ allò bonjour A/ oui C/ je vous donne ma référence client? A/ s il vous plait C/ rrrrr A/ rr rrr C/ rrr A/ oui C/ rrr tiret r r LIMSI GoTAL Gothenburg, Sweden, August 25-27,

49 Training with Reduced Information Transcriptions assumed to be imperfect Alignment allowing all reference words to be optionally skipped Filler words and breath can be inserted between words Anonymized forms absorbed by a generic model Training uses a word lattice generated by the decoder Word error rate currently about 30% LIMSI GoTAL Gothenburg, Sweden, August 25-27,

50 Some Perspectives State-of-the-art transcription systems have WER ranging from 10-30% depending upon the tasks Comparable results obtained on several languages (English, French, German, Mandarin, Spanish, Dutch...) Performance is sufficient for some tasks (indexing huge amounts of audio/video, spoken document retrieval, speech-to-speech translation) Requires substantial training resources but recent research has demonstrated the potential of semi- and unsupervised methods Processing of more varied data (podcasts, voice mail, meetings) Dealing with different speaker accents (regional and non-native accents) Keeping models up-to-date Developing systems for oral languages (without a written form) LIMSI GoTAL Gothenburg, Sweden, August 25-27,

51 Over 25 Years of Progress Voice commands single speaker 2 30 words Controled dialog speaker indep words Conversational Telephone Speech speaker indep. Speech Analytics Audio mining Isolated words single speaker 2 30 words Isolated word dictation single speaker 20k words Transcription for indexation TV & radio Unlimited Domain Transcription Internet audio Connected words speaker indep 10 words Continuous dictation single speaker 60k words Unlimited Domain Speech to speech Translation LIMSI GoTAL Gothenburg, Sweden, August 25-27,

Speech Transcription

Speech Transcription TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Evaluation of speech technologies

Evaluation of speech technologies CLARA Training course on evaluation of Human Language Technologies Evaluations and Language resources Distribution Agency November 27, 2012 Evaluation of speaker identification Speech technologies Outline

More information

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department,

More information

Speech Processing Applications in Quaero

Speech Processing Applications in Quaero Speech Processing Applications in Quaero Sebastian Stüker www.kit.edu 04.08 Introduction! Quaero is an innovative, French program addressing multimedia content! Speech technologies are part of the Quaero

More information

C E D A T 8 5. Innovating services and technologies for speech content management

C E D A T 8 5. Innovating services and technologies for speech content management C E D A T 8 5 Innovating services and technologies for speech content management Company profile 25 years experience in the market of transcription/reporting services; Cedat 85 Group: Cedat 85 srl Subtitle

More information

Sample Cities for Multilingual Live Subtitling 2013

Sample Cities for Multilingual Live Subtitling 2013 Carlo Aliprandi, SAVAS Dissemination Manager Live Subtitling 2013 Barcelona, 03.12.2013 1 SAVAS - Rationale SAVAS is a FP7 project co-funded by the EU 2 years project: 2012-2014. 3 R&D companies and 5

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr

More information

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Matthias Paulik and Panchi Panchapagesan Cisco Speech and Language Technology (C-SALT), Cisco Systems, Inc.

More information

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, João Neto, and Isabel Trancoso L 2 F Spoken Language Systems Lab INESC-ID

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM Simon Wiesler 1, Kazuki Irie 2,, Zoltán Tüske 1, Ralf Schlüter 1, Hermann Ney 1,2 1 Human Language Technology and Pattern Recognition, Computer Science Department,

More information

Specialty Answering Service. All rights reserved.

Specialty Answering Service. All rights reserved. 0 Contents 1 Introduction... 2 1.1 Types of Dialog Systems... 2 2 Dialog Systems in Contact Centers... 4 2.1 Automated Call Centers... 4 3 History... 3 4 Designing Interactive Dialogs with Structured Data...

More information

LIUM s Statistical Machine Translation System for IWSLT 2010

LIUM s Statistical Machine Translation System for IWSLT 2010 LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,

More information

EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN

EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN EVALUATION OF AUTOMATIC TRANSCRIPTION SYSTEMS FOR THE JUDICIAL DOMAIN J. Lööf (1), D. Falavigna (2),R.Schlüter (1), D. Giuliani (2), R. Gretter (2),H.Ney (1) (1) Computer Science Department, RWTH Aachen

More information

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The

More information

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data Alberto Abad, Hugo Meinedo, and João Neto L 2 F - Spoken Language Systems Lab INESC-ID / IST, Lisboa, Portugal {Alberto.Abad,Hugo.Meinedo,Joao.Neto}@l2f.inesc-id.pt

More information

CallSurf - Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content

CallSurf - Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content CallSurf - Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content Martine Garnier-Rizet 1, Gilles Adda 2, Frederik Cailliau

More information

Portuguese Broadcast News and Statistical Machine Translation (SMT)

Portuguese Broadcast News and Statistical Machine Translation (SMT) Statistical Machine Translation of Broadcast News from Spanish to Portuguese Raquel Sánchez Martínez 1, João Paulo da Silva Neto 2, and Diamantino António Caseiro 1 L²F - Spoken Language Systems Laboratory,

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Estonian Large Vocabulary Speech Recognition System for Radiology

Estonian Large Vocabulary Speech Recognition System for Radiology Estonian Large Vocabulary Speech Recognition System for Radiology Tanel Alumäe, Einar Meister Institute of Cybernetics Tallinn University of Technology, Estonia October 8, 2010 Alumäe, Meister (TUT, Estonia)

More information

The LIMSI RT-04 BN Arabic System

The LIMSI RT-04 BN Arabic System The LIMSI RT-04 BN Arabic System Abdel. Messaoudi, Lori Lamel and Jean-Luc Gauvain Spoken Language Processing Group LIMSI-CNRS, BP 133 91403 Orsay cedex, FRANCE {abdel,gauvain,lamel}@limsi.fr ABSTRACT

More information

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings Haojin Yang, Christoph Oehlke, Christoph Meinel Hasso Plattner Institut (HPI), University of Potsdam P.O. Box

More information

Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)

Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet) Proceedings of the Twenty-Fourth Innovative Appications of Artificial Intelligence Conference Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet) Tatsuya Kawahara

More information

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

Development of a Speech-to-Text Transcription System for Finnish

Development of a Speech-to-Text Transcription System for Finnish Development of a Speech-to-Text Transcription System for Finnish Lori Lamel 1 and Bianca Vieru 2 1 Spoken Language Processing Group 2 Vecsys Research CNRS-LIMSI, BP 133 3, rue Jean Rostand 91403 Orsay

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

SPEECH DATA MINING, SPEECH ANALYTICS, VOICE BIOMETRY. www.phonexia.com, 1/41

SPEECH DATA MINING, SPEECH ANALYTICS, VOICE BIOMETRY. www.phonexia.com, 1/41 SPEECH DATA MINING, SPEECH ANALYTICS, VOICE BIOMETRY www.phonexia.com, 1/41 OVERVIEW How to move speech technology from research labs to the market? What are the current challenges is speech recognition

More information

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14 CHARTES D'ANGLAIS SOMMAIRE CHARTE NIVEAU A1 Pages 2-4 CHARTE NIVEAU A2 Pages 5-7 CHARTE NIVEAU B1 Pages 8-10 CHARTE NIVEAU B2 Pages 11-14 CHARTE NIVEAU C1 Pages 15-17 MAJ, le 11 juin 2014 A1 Skills-based

More information

Generating Training Data for Medical Dictations

Generating Training Data for Medical Dictations Generating Training Data for Medical Dictations Sergey Pakhomov University of Minnesota, MN pakhomov.sergey@mayo.edu Michael Schonwetter Linguistech Consortium, NJ MSchonwetter@qwest.net Joan Bachenko

More information

Using the Amazon Mechanical Turk for Transcription of Spoken Language

Using the Amazon Mechanical Turk for Transcription of Spoken Language Research Showcase @ CMU Computer Science Department School of Computer Science 2010 Using the Amazon Mechanical Turk for Transcription of Spoken Language Matthew R. Marge Satanjeev Banerjee Alexander I.

More information

Speech Processing 15-492/18-492. Speech Translation Case study: Transtac Details

Speech Processing 15-492/18-492. Speech Translation Case study: Transtac Details Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details Transtac: Two S2S System DARPA developed for Check points, medical and civil defense Requirements Two way Eyes-free (no screen)

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Michael J. Witbrock and Alexander G. Hauptmann Carnegie Mellon University ABSTRACT Library

More information

Building A Vocabulary Self-Learning Speech Recognition System

Building A Vocabulary Self-Learning Speech Recognition System INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes

More information

The LENA TM Language Environment Analysis System:

The LENA TM Language Environment Analysis System: FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September

More information

How To Translate A Language Into A Different Language

How To Translate A Language Into A Different Language Simultaneous Machine Interpretation Utopia? Alex Waibel and the InterACT Team Carnegie Mellon University Karlsruhe Institute of Technology Mobile Technologies, LLC alex@waibel.com Dilemma: The Language

More information

EFFICIENT HANDLING OF MULTILINGUAL LANGUAGE MODELS

EFFICIENT HANDLING OF MULTILINGUAL LANGUAGE MODELS EFFICIENT HANDLING OF MULTILINGUAL LANGUAGE MODELS Christian Fügen, Sebastian Stüker, Hagen Soltau, Florian Metze Interactive Systems Labs University of Karlsruhe Am Fasanengarten 5 76131 Karlsruhe, Germany

More information

Experiences from Verbmobil. Norbert Reithinger DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken bert@dfki.de

Experiences from Verbmobil. Norbert Reithinger DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken bert@dfki.de Experiences from Verbmobil Norbert Reithinger DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken bert@dfki.de Content Overview of Verbmobil project Scientific challenges and experiences Software technology

More information

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu

More information

D2.4: Two trained semantic decoders for the Appointment Scheduling task

D2.4: Two trained semantic decoders for the Appointment Scheduling task D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive

More information

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

More information

couleurs Pantone réf : 126 C réf : 126 C réf : 414 C logo seul Signature sur le devant de la carte de vœux

couleurs Pantone réf : 126 C réf : 126 C réf : 414 C logo seul Signature sur le devant de la carte de vœux CAMOMILE Fir0002/Flagstaffotos CHIST-ERA Project Seminar 2013 Brussels, March 26-27, 2013 presented by C. Barras couleurs Pantone réf : 126 C réf : 126 C réf : 414 C 80% 100% 100% logo seul Signature sur

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

Automatic slide assignation for language model adaptation

Automatic slide assignation for language model adaptation Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly

More information

Scaling Shrinkage-Based Language Models

Scaling Shrinkage-Based Language Models Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com

More information

Presentation Video Retrieval using Automatically Recovered Slide and Spoken Text

Presentation Video Retrieval using Automatically Recovered Slide and Spoken Text Presentation Video Retrieval using Automatically Recovered Slide and Spoken Text Matthew Cooper FX Palo Alto Laboratory Palo Alto, CA 94034 USA cooper@fxpal.com ABSTRACT Video is becoming a prevalent medium

More information

PROMT Technologies for Translation and Big Data

PROMT Technologies for Translation and Big Data PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED

More information

Audio Indexing on a Medical Video Database: the AVISON Project

Audio Indexing on a Medical Video Database: the AVISON Project Audio Indexing on a Medical Video Database: the AVISON Project Grégory Senay Stanislas Oger Raphaël Rubino Georges Linarès Thomas Parent IRCAD Strasbourg, France Abstract This paper presents an overview

More information

IBM Content Analytics with Enterprise Search, Version 3.0

IBM Content Analytics with Enterprise Search, Version 3.0 IBM Content Analytics with Enterprise Search, Version 3.0 Highlights Enables greater accuracy and control over information with sophisticated natural language processing capabilities to deliver the right

More information

May 26th 2009. Pieter van der Linden, Program Manager Thomson. May 26 th, 2009

May 26th 2009. Pieter van der Linden, Program Manager Thomson. May 26 th, 2009 Presentation at Chorus Final Conference May 26th 2009 Pieter van der Linden, Program Manager Thomson May 26 th, 2009 Brief status Following the European approval, the core developments started in May 2008.

More information

Modern foreign languages

Modern foreign languages Modern foreign languages Programme of study for key stage 3 and attainment targets (This is an extract from The National Curriculum 2007) Crown copyright 2007 Qualifications and Curriculum Authority 2007

More information

Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics

Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics Unlocking Value from Patanjali V, Lead Data Scientist, Anand B, Director Analytics Consulting, EXECUTIVE SUMMARY Today a lot of unstructured data is being generated in the form of text, images, videos

More information

Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services

Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services What you need to know The government of D.C. has identified, vetted, and engaged four vendors in a citywide contract

More information

2009 Springer. This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2

2009 Springer. This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2 Institutional Repository This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2 2009 Springer Some Experiments in Evaluating ASR Systems

More information

Extension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese

Extension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese Extension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese Thomas Pellegrini (1), Helena Moniz (1,2), Fernando Batista (1,3), Isabel Trancoso (1,4), Ramon Astudillo (1) (1)

More information

A SCALABLE VIDEO SEARCH ENGINE BASED ON AUDIO CONTENT INDEXING AND TOPIC SEGMENTATION

A SCALABLE VIDEO SEARCH ENGINE BASED ON AUDIO CONTENT INDEXING AND TOPIC SEGMENTATION A SCALABLE VIDEO SEARCH ENGINE BASED ON AUDIO CONTENT INDEXING AND TOPIC SEGMENTATION Julien LawTo 1, Jean-Luc Gauvain 2, Lori Lamel 2, Gregory Grefenstette 1, Guillaume Gravier 3, Julien Despres 4, Camille

More information

LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH

LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH K. Riedhammer, M. Gropp, T. Bocklet, F. Hönig, E. Nöth, S. Steidl Pattern Recognition Lab, University of Erlangen-Nuremberg, GERMANY noeth@cs.fau.de

More information

Recent advances in Digital Music Processing and Indexing

Recent advances in Digital Music Processing and Indexing Recent advances in Digital Music Processing and Indexing Acoustics 08 warm-up TELECOM ParisTech Gaël RICHARD Telecom ParisTech (ENST) www.enst.fr/~grichard/ Content Introduction and Applications Components

More information

Introductory Guide to the Common European Framework of Reference (CEFR) for English Language Teachers

Introductory Guide to the Common European Framework of Reference (CEFR) for English Language Teachers Introductory Guide to the Common European Framework of Reference (CEFR) for English Language Teachers What is the Common European Framework of Reference? The Common European Framework of Reference gives

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Reading Assistant: Technology for Guided Oral Reading

Reading Assistant: Technology for Guided Oral Reading A Scientific Learning Whitepaper 300 Frank H. Ogawa Plaza, Ste. 600 Oakland, CA 94612 888-358-0212 www.scilearn.com Reading Assistant: Technology for Guided Oral Reading Valerie Beattie, Ph.D. Director

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

Design and Data Collection for Spoken Polish Dialogs Database

Design and Data Collection for Spoken Polish Dialogs Database Design and Data Collection for Spoken Polish Dialogs Database Krzysztof Marasek, Ryszard Gubrynowicz Department of Multimedia Polish-Japanese Institute of Information Technology Koszykowa st., 86, 02-008

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis

More information

Voice Mail. Service & Operations

Voice Mail. Service & Operations ervice Service & Operations Your customers and clients expect their calls to be handled quickly or routed to the appropriate person or department. This is where ITS Telecom and Systems can offer valuable

More information

2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch 2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

More information

Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0

Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0 Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0 Tim Schlippe, Lukasz Gren, Ngoc Thang Vu, Tanja Schultz Cognitive Systems Lab, Karlsruhe Institute

More information

Interactive product brochure :: Nina TM Mobile: The Virtual Assistant for Mobile Customer Service Apps

Interactive product brochure :: Nina TM Mobile: The Virtual Assistant for Mobile Customer Service Apps TM Interactive product brochure :: Nina TM Mobile: The Virtual Assistant for Mobile Customer Service Apps This PDF contains embedded interactive features. Make sure to download and save the file to your

More information

VECSYS LIMSI ARCHITECTURE

VECSYS LIMSI ARCHITECTURE VECSYS LIMSI ARCHITECTURE Samir Bennacef Vecsys Centralized Architecture Acoustic models Language models Caseframe Grammar Task Model Database Speech Recognizer text Semantic Analyzer Semantic frame Dialog

More information

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan 7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation

More information

VoiceXML-Based Dialogue Systems

VoiceXML-Based Dialogue Systems VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2 Computer based

More information

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Emotion Detection from Speech

Emotion Detection from Speech Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction

More information

SDL BeGlobal: Machine Translation for Multilingual Search and Text Analytics Applications

SDL BeGlobal: Machine Translation for Multilingual Search and Text Analytics Applications INSIGHT SDL BeGlobal: Machine Translation for Multilingual Search and Text Analytics Applications José Curto David Schubmehl IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200

More information

RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM

RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM B. Angelini, G. Antoniol, F. Brugnara, M. Cettolo, M. Federico, R. Fiutem and G. Lazzari IRST-Istituto per la Ricerca Scientifica e Tecnologica

More information

Retrieving Video Segments Based on Combined Text, Speech and Image Processing

Retrieving Video Segments Based on Combined Text, Speech and Image Processing Retrieving Video Segments Based on Combined Text, Speech and Image Processing HARRIS PAPAGEORGIOU AND ATHANASSIOS PROTOPAPAS Institute for Language & Speech Processing Maroussi, Greece THOMAS NETOUSEK

More information

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications Forensic Science International 146S (2004) S95 S99 www.elsevier.com/locate/forsciint The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications A.

More information

Study Plan for Master of Arts in Applied Linguistics

Study Plan for Master of Arts in Applied Linguistics Study Plan for Master of Arts in Applied Linguistics Master of Arts in Applied Linguistics is awarded by the Faculty of Graduate Studies at Jordan University of Science and Technology (JUST) upon the fulfillment

More information

Interoperability, Standards and Open Advancement

Interoperability, Standards and Open Advancement Interoperability, Standards and Open Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations

More information

Multi language e Discovery Three Critical Steps for Litigating in a Global Economy

Multi language e Discovery Three Critical Steps for Litigating in a Global Economy Multi language e Discovery Three Critical Steps for Litigating in a Global Economy 2 3 5 6 7 Introduction e Discovery has become a pressure point in many boardrooms. Companies with international operations

More information

The Influence of Topic and Domain Specific Words on WER

The Influence of Topic and Domain Specific Words on WER The Influence of Topic and Domain Specific Words on WER And Can We Get the User in to Correct Them? Sebastian Stüker KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der

More information

Corpus Design for a Unit Selection Database

Corpus Design for a Unit Selection Database Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus

More information

Text-To-Speech Technologies for Mobile Telephony Services

Text-To-Speech Technologies for Mobile Telephony Services Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary

More information

Open issues regarding legal metadata: IP licensing and management of different cognitive levels

Open issues regarding legal metadata: IP licensing and management of different cognitive levels Open issues regarding legal metadata: IP licensing and management of different cognitive levels FLORENCE MAY 6th, 2011 Danièle Bourcier Meritxell Fernández-Barrera 1 Cersa CNRS-Université Paris 2, Paris

More information

Leveraging ASEAN Economic Community through Language Translation Services

Leveraging ASEAN Economic Community through Language Translation Services Leveraging ASEAN Economic Community through Language Translation Services Hammam Riza Center for Information and Communication Technology Agency for the Assessment and Application of Technology (BPPT)

More information

Machine Translation. Agenda

Machine Translation. Agenda Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm

More information

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ

More information

Strategies for Training Large Scale Neural Network Language Models

Strategies for Training Large Scale Neural Network Language Models Strategies for Training Large Scale Neural Network Language Models Tomáš Mikolov #1, Anoop Deoras 2, Daniel Povey 3, Lukáš Burget #4, Jan Honza Černocký #5 # Brno University of Technology, Speech@FIT,

More information

MLLP Transcription and Translation Platform

MLLP Transcription and Translation Platform MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino

More information

The Impact of Using Technology in Teaching English as a Second Language

The Impact of Using Technology in Teaching English as a Second Language English Language and Literature Studies; Vol. 3, No. 1; 2013 ISSN 1925-4768 E-ISSN 1925-4776 Published by Canadian Center of Science and Education The Impact of Using Technology in Teaching English as

More information

Speaking your language...

Speaking your language... 1 About us: Cuttingedge Translation Services Pvt. Ltd. (Cuttingedge) has its corporate headquarters in Noida, India and an office in Glasgow, UK. Over the time we have serviced clients from various backgrounds

More information

HMM-based Breath and Filled Pauses Elimination in ASR

HMM-based Breath and Filled Pauses Elimination in ASR HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science

More information

Technical Questions and Answers for Beneficiaries on the Erasmus+ Online Linguistic Support

Technical Questions and Answers for Beneficiaries on the Erasmus+ Online Linguistic Support Technical Questions and Answers for Beneficiaries on the Erasmus+ Online Linguistic Support December 2014 This document covers the main technical questions and answers for Beneficiaries (BENs) regarding

More information