Speech Transcription



Similar documents
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

CAMBRIDGE FIRST CERTIFICATE Listening and Speaking NEW EDITION. Sue O Connell with Louise Hashemi

BBC Learning English - Talk about English July 11, 2005

The Good Old Days. 2. Famous places: Next, students must drag the pictures of the famous places to the names of the cities where they are.

Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)

BBC Learning English Talk about English Academic Listening Part 1 - English for Academic Purposes: Introduction

The LENA TM Language Environment Analysis System:

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Evaluation of speech technologies

Interview with David Bouthiette [at AMHI 3 times] September 4, Interviewer: Karen Evans

Working people requiring a practical knowledge of English for communicative purposes

C E D A T 8 5. Innovating services and technologies for speech content management

Robust Methods for Automatic Transcription and Alignment of Speech Signals

TED-LIUM: an Automatic Speech Recognition dedicated corpus

Speech Signal Processing: An Overview

face2face Upper Intermediate: Common European Framework (CEF) B2 Skills Maps

COMPUTER TECHNOLOGY IN TEACHING READING

Pronunciation in English

Turkish Radiology Dictation System

Information Leakage in Encrypted Network Traffic

IC2 Class: Conference Calls / Video Conference Calls

The Impact of Using Technology in Teaching English as a Second Language

Thai Language Self Assessment

EMILY WANTS SIX STARS. EMMA DREW SEVEN FOOTBALLS. MATHEW BOUGHT EIGHT BOTTLES. ANDREW HAS NINE BANANAS.

By Joseph Sorrentino. Copyright MMVII by Joseph Sorrentino All Rights Reserved Brooklyn Publishers LLC in association with Heuer Publishing LLC

SPEECH DATA MINING, SPEECH ANALYTICS, VOICE BIOMETRY. 1/41

Longman English Interactive

Reading Assistant: Technology for Guided Oral Reading

Scandinavian Dialect Syntax Transnational collaboration, data collection, and resource development

Develop Software that Speaks and Listens

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

BBC Learning English Talk about English Business Language To Go Part 1 - Interviews

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages CHARTE NIVEAU B2 Pages 11-14

SPEECH Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology)

Cambridge English: Advanced Speaking Sample test with examiner s comments

Estonian Large Vocabulary Speech Recognition System for Radiology

TeachingEnglish Lesson plans

California Treasures High-Frequency Words Scope and Sequence K-3

Trigonometric functions and sound

Barrister or solicitor?

HD VoIP Sounds Better. Brief Introduction. March 2009

BBC LEARNING ENGLISH 6 Minute English Asking the right questions

BBC LEARNING ENGLISH 6 Minute Vocabulary Irregular verbs 1

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Using an external agency or individual to transcribe your qualitative data

Preparing for the IELTS test with Holmesglen Institute of TAFE

Hear Better With FM. Get more from everyday situations. Life is on.

Maryland 4-H Public Speaking Guide

Industry Guidelines on Captioning Television Programs 1 Introduction

Boost the performance of your hearing aids. Phonak wireless add-ons

Specialty Answering Service. All rights reserved.

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

How to teach listening 2012

Care Skillsbase: Skills Check 5 What Is Effective Communication?

COURSE SYLLABUS SPANISH IA

Problems and Prospects in Collection of Spoken Language Data

PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS

Making requests and asking for permission.

BBC LEARNING ENGLISH 6 Minute English Compulsory voting

COMMUNICATION UNIT: IV. Educational Technology-B.Ed Notes


Closed Captions. Questions. Jan Ozer #janozer 11/20/2014

BBC LEARNING ENGLISH 6 Minute Grammar Conditionals review

Student Samples: Grade 7

Learning English podcasts from the Hellenic American Union. Level: Lower Intermediate Lesson: 2 Title: The History of Beer

BBC LEARNING ENGLISH 6 Minute English The Proms

TeachingEnglish Lesson plans. Conversation Lesson News. Topic: News

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM

Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications

What you need to know about student s hearing technology for classroom listening

BRAIN ASYMMETRY. Outline

James Bond in a Honda?

The Conjectural November 2015 Transcript 1 of 5

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

How to Practice Pronunciation Without a Microphone

LEARNING FEATURE MAPPING USING DEEP NEURAL NETWORK BOTTLENECK FEATURES FOR DISTANT LARGE VOCABULARY SPEECH RECOGNITION

Speech recognition technology for mobile phones

BBC Learning English Talk about English Business Language To Go Part 2 - Induction

ACOUSTICAL CONSIDERATIONS FOR EFFECTIVE EMERGENCY ALARM SYSTEMS IN AN INDUSTRIAL SETTING

Critical analysis. Be more critical! More analysis needed! That s what my tutors say about my essays. I m not really sure what they mean.

Alignment of the National Standards for Learning Languages with the Common Core State Standards

Functional Auditory Performance Indicators (FAPI)

How to become a successful language learner

PTE Academic Tutorial

Transcription:

TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1

What Is Speech Recognition? Def: Automatic conversion of spoken words in text (eg. read speech) Other definitions: ability to understand spoken words, to respond to verbal commands, to interpret human speech,... play <Audiofile filename="20060614_1500_1730_or_sat"> <SpeechSegment lang=eng spkr=ms4 stime=2751.13 etime=2846.84> <Word stime=2751.64 dur=0.31 conf=0.980> Mister </Word> <Word stime=2751.95 dur=0.80 conf=0.974> President, </Word> <Word stime=2752.98 dur=0.18 conf=0.996> in </Word> <Word stime=2753.17 dur=0.22 conf=1.000> these </Word> <Word stime=2753.39 dur=0.61 conf=0.995> questions </Word> <Word stime=2754.01 dur=0.15 conf=1.000> we </Word> <Word stime=2754.16 dur=0.25 conf=0.999> see </Word> <Word stime=2754.42 dur=0.36 conf=1.000> the </Word> <Word stime=2754.78 dur=0.37 conf=1.000> European </Word> LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 2

Processing Speech Language Identification Speaker Identification Audio Segmentation Speech Transcription Specific Entities Texts + Tags XML Signal Topics,... Translation LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 3

Why Is Speech Recognition Difficult? Text: I do not know why speech recognition is so difficult Continuous: Idonotknowwhyspeechrecognitionissodifficult Spontaneous: Idunnowhyspeechrecnitionsodifficult Pronunciation: YdonatnowYspiCrEkxnISxnIzsodIfIk lt YdonowYspiCrEknISNsodIfxk l YdontnowYspiCrEkxnISNsodIfIk lt Important variability factors: Speaker Acoustic environment physical characteristics (gender, background noise (cocktail party,...) age,...), accent, emotional state, room acoustic, signal capture situation (lecture, conversation, (microphone, channel,...) meeting,...) LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 4

WSJ Read stop Audio Samples (1) PREVIOUSLY HE WAS PRESIDENT OF CIGNA S PROPERTY CASUAL GROUP FOR FIVE YEARS AND SERVED AS CHIEF FINANCIAL OFFICER WSJ Spontaneous ACCORDING TO KOSTAS STOUPIS AN ECONOMIST WITH THE GREEK GOVERNMENT IT IS IT IS NOT YET POSSIBLE FOR GREECE TO TO MAIN- TAIN AN EXPORT LEVEL THAT IS EQUIVALENT TO EVEN THE THE SMALL- EST LEVEL OF THE OTHER WESTERN NATIONS Broadcast News ON WORLD NEWS TONIGHT THIS WEDNESDAY TERRY NICHOLS AVOIDS THE DEATH SENTENCE FOR HIS ROLE IN THE OKLAHOM A CITY BOMB- ING. ONE OF THE WORLD S MOST POPULAR AIRPLANES THE GOVERN- MENT ORDERS AN IMMEDIATE SAFE TY INSPECTION. AND OUR REPORT ON THE CLONING CONTROVERSY BECAUSE ONE AMERICAN SCIENTIST SAYS HE S READY TO GO. LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 5

Audio Samples (2) Conversational Telephone Speech Hello. Yes, This is Bill. Hi Bill. This is Hillary. So this is my first call so I haven t a clue as to - - what we re supposed to do yeah. It s pretty funny that we re Bill and Hillary talking on the phone though isn t it? Ah I didn t even think about that. {laughter} uhm we re supposed to talk about did you get the topic. Yes. So we re supposed to talk about what changes if any you have made since... EPPS stop Mister President, in these questions we see the European social model in flashing lights. The European social model is a mishmash which pleases no one: a bit of the free market here and a bit of welfare state there, mixed with a little green posturing. LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 6

EPPS, spontaneous stop Audio Samples (3) Thank you very much, Mister Chairman. Hum When I came to this room I I have seen and heard homophobia but sort of vaguely ; through T. V. and the rest of it. But I mean listening to some of the Polish colleagues speak here today, especially Roszkowski, Pek, Giertych and Krupa,... EPPS, non native My main my main problem with the environmental proposals of the Commission is that they do not correspond to the objectives of the Sixth Environmental Action Plan. As far as the traffic is concerned, the sixth E. A. P. stressed the decoupling of transport growth and G. D. P. growth. LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 7

Inside ASR Systems Text Corpus Speech Corpus Training Normalization Manual Transcription Feature Extraction N-gram Estimation Training Lexicon HMM Training Decoding Language Model Recognizer Lexicon Acoustic Models P(W) P(H W) f(x H) Speech Sample Y Acoustic Front-end X Decoder W * Speech Transcription LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 8

Indicative ASR Performance Task Condition Word Error Dictation read speech, close-talking mic. 3-4% (humans 1%) read speech, noisy (SNR 15dB) 10% spontaneous dictation 14% read speech, non-native 20% Found audio TV & radio news broadcasts 10-15% (humans 4%) TV documentaries 20-30% Telephone conversations 20-30% (humans 4%) Lectures (close mic) 20% Lectures (distant mic) 50% EPPS 8% LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 9

Progress in Speech Recognition BN BC 670.000 words AR CH AR EN CH Word Error Rates (from T. Schultz) EN 2005 LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 10

ASR Activites In TCStar Three languages (EN, ES, MAN), three tasks (EPPS, Cortes, BN) Improving modeling and decoding techniques audio segmentation, disctiminative features, duration models, automatic learning, discriminative training, pronunciation models, NN language model, decoding strategies,... Model adaptation methods speaker adaptive trainining, acoustic and language model adaptation, cross-system adaptation, adaptation to noisy environments,... Integration of ASR and Translation transcription errors, confidence scores, word graph interface punctuation for MT transcription normalisation (98 vs. ninety eight, hesitations,...) LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 11

WER for EPPS English Three Years Of Improvments in TCStar 30 TCStar project 20 15 10 9 8 7 6 5 2004 2005 2006 2007 LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 12