Computer Assisted Language Learning (CALL) Systems




Tutorial on CALL at INTERSPEECH 2012 by T. Kawahara and N. Minematsu

Computer Assisted Language Learning (CALL) Systems
Tatsuya Kawahara (Kyoto University, Japan)
Nobuaki Minematsu (University of Tokyo, Japan)

OUTLINE
- Introduction (TK)
- Segmental Aspect & Speech Recognition Technology (TK)
- Pronunciation Structure Model (NM)
- Prosodic Aspect (NM)
- Speech Synthesis Technology for CALL (NM)
- CALL Systems (TK)
- Database for CALL (NM)

(Traditional) LL vs. CALL
- Traditional LL: magnetic audio tape; a single medium with sequential access
- Computer Assisted LL: multimedia with random access
  - Easier comparison of the learner's speech with model speech
  - Speech technology can be incorporated, partly replacing the rater's or teacher's job

Speech Technology for LL
- Automated assessment of proficiency: PhonePass/Versant, ETS TOEFL, PSC (Putonghua Shuiping Ceshi)
- Assisting LL with light supervision: CALL classroom
- Self-learning: need to keep learners motivated (edutainment); need to avoid reinforcing errors

Target Population of CALL
- Non-native speakers
  - Particular L1 (e.g., English LL for Japanese people): still diverse in proficiency level, but L1 knowledge is useful
  - Unlimited L1 (e.g., Japanese LL for people around the world)
- Children (native) [Russell 1996]
- Handicapped (hearing- or articulation-impaired) people [Bernstein 1977]
- Accented (dialect) speakers: Putonghua [Liu 2006]
- Operators at call centers

Target Skill of CALL
- Reading, Writing, Listening, Speaking (including sentence composition)
- Vocabulary, Grammar
- Pragmatic dialog (communication): travel, shopping, business negotiation
- Pronunciation of given sentences
  - Phone: segmental only
  - Word: + accent
  - Sentence: + intonation
  - Paragraph: + prominence

Importance of Pronunciation Training [Bernstein 2003]
comm. = pron. * lex. * (1 + syn. + rhet. + prag. + soc.)
- comm. = communication skills
- pron. = pronunciation
- lex. = lexical control and vocabulary
- syn. = syntax
- rhet. = rhetorical form
- prag. = pragmatics
- soc. = sociolinguistics
Pronunciation skill affects the entire communicative performance. Native-sounding pronunciation may not be needed, but acceptable (intelligible enough) pronunciation is desired for smooth communication.

Articulation and Speech
- Students must learn how to control the articulators (vocal tract)
- But it is not easy to observe the movement of these organs
- Observation is feasible for the acoustic aspect of speech

Visual Presentation of Articulation
- Talking head showing correct articulation [Massaro 2006]
- Acoustic-to-articulatory inversion to estimate articulatory movements [Badin 2010]

Segmental and Prosodic Aspects
- Segmental pronunciation (cf. speech recognition)
  - Phonemes (sub-words)
  - Features: spectrum-envelope based
- Prosody (cf. speech synthesis)
  - Tones, lexical accents, intonation and rhythm patterns
  - Features: fundamental frequency, power, and duration

Speech Technology used in CALL
- Speech analysis: spectrum, pitch, power
  - Feature normalization is required for objective comparison with a model speaker
- (Constrained) speech recognition (ASR)
  - Speech segmentation (alignment), error detection, scoring
  - Need to model non-native speech and handle erroneous input
  - Covers not only the segmental aspect but also prosodic aspects
- Speech synthesis (Minematsu)

Flowchart of Pronunciation Error Detection and Scoring
Speech Analysis -> Speech Recognition (with Acoustic Model and Pronunciation Model) -> Segmentation -> Error Detection -> Scoring

Formant and Articulatory Features
- Potentially useful for effective diagnosis and feedback: a direct relationship with articulation
- But not easy to estimate reliably and robustly; not used in ASR

Classification of Vowels
- Place of articulation: front - middle - back
- Openness of mouth: narrow - wide (e.g., "bat" vs. "but")

Relationship between Articulation and Formants
- Place of articulation corresponds to F2; openness of mouth corresponds to F1
- The articulation chart thus maps onto the formant chart
- Interpreting this still requires expert knowledge of phonetics

Classification of Consonants (Japanese)
- Place: bilabial, alveolar, palatal, glottal; voiced/unvoiced
- Manner: fricative, affricate, stop, semi-vowel, nasal
- (e.g., "sea" vs. "she")

MFCC: Mel Frequency Cepstrum Coefficient
- The most widely used spectral feature
  - Mel bandwidth: matches human perception
  - Cepstrum: spectrum envelope; orthogonal and less correlated, appropriate for statistical models
- Computation:
  1. DFT (FFT) -> power spectrum
  2. Mel conversion (mel-band filter bank)
  3. Logarithm + cosine transform (IDFT) -> cepstrum
  4. Extract the low-quefrency (12) coefficients
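The four computation steps above can be sketched in pure Python. This is a minimal illustration, not the tutorial's code: it uses a naive O(N^2) DFT instead of an FFT, and the function names and default parameters are assumptions.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=12):
    N = len(frame)
    half = N // 2 + 1
    # 1. DFT -> power spectrum (naive DFT for clarity)
    power = []
    for k in range(half):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(frame[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        power.append((re * re + im * im) / N)
    # 2. Mel conversion: triangular filters equally spaced on the mel scale
    mel_max = hz_to_mel(sample_rate / 2.0)
    mel_pts = [mel_to_hz(i * mel_max / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(round(f * N / sample_rate)) for f in mel_pts]
    energies = []
    for j in range(1, n_filters + 1):
        e = 0.0
        for k in range(bins[j - 1], bins[j + 1] + 1):
            if k >= half:
                break
            if k < bins[j]:
                w = (k - bins[j - 1]) / max(1, bins[j] - bins[j - 1])
            else:
                w = (bins[j + 1] - k) / max(1, bins[j + 1] - bins[j])
            e += w * power[k]
        # 3. Logarithm of the filter-bank energies
        energies.append(math.log(max(e, 1e-12)))
    # 3b/4. Cosine transform; keep the low-quefrency coefficients
    return [sum(energies[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters)) for c in range(1, n_ceps + 1)]
```

A real system would add pre-emphasis, windowing, and delta features on top of this.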

Feature Normalization in Speech Analysis
- For objective comparison with the model speaker, and for score calculation via speech recognition
  - Across speakers (native/non-native)
  - Across acoustic channels (database/users)
- Normalization methods for MFCC: Cepstrum Mean Normalization (CMN), Cepstrum Variance Normalization (CVN), histogram equalization

Speaker Normalization in Speech Analysis
- Vocal Tract Length Normalization (VTLN): warping the spectral dimension, based on acoustic model likelihood
- Pronunciation structure (by Minematsu): invariant feature (f-divergence)
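CMN and CVN from the list above amount to shifting each cepstral dimension to zero mean and scaling it to unit variance over an utterance. A minimal sketch (the function name and list-of-lists layout are illustrative):

```python
def cmvn(frames):
    """Cepstral mean and variance normalization over one utterance.
    `frames` is a list of MFCC vectors, one per time frame."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(var ** 0.5 or 1.0)  # guard against zero variance
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]
```

CMN alone would skip the division by `stds`; applying the same statistics per speaker or per channel is what makes the comparison with the model speaker objective.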

Speech Recognition for CALL
- Tasks: speech segmentation (alignment), error detection, scoring
- Challenges: modeling non-native speech; handling erroneous speech input
- Constraint: the target word or sentence is given

ASR vs. CALL
X: speech input; W: phone-label (word) sequence (target)
- ASR: for a given X, find the W that maximizes p(W|X); solved by maximizing p(W) * p(X|W), where each phone model p(x|w) is trained
- CALL: W (oracle) and X (not reliable) are both given
  - Segmentation: Viterbi forced alignment
  - Error detection: find W' such that p(X|W') > p(X|W)
  - Scoring: evaluate p(X|W)? And how should the model be trained?
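The Viterbi forced alignment mentioned above can be sketched with a simple dynamic program: given the target phone sequence W and per-frame log-likelihoods log p(x_t | w), assign each frame to a phone so that the total likelihood is maximized. All names and the dictionary-per-frame layout are illustrative assumptions.

```python
def forced_align(phones, loglik):
    """phones: target phone sequence W.
    loglik: one dict per frame mapping phone -> log p(x_t | w).
    Returns the phone index assigned to each frame (left-to-right)."""
    T, N = len(loglik), len(phones)
    NEG = float("-inf")
    dp = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    dp[0][0] = loglik[0][phones[0]]
    for t in range(1, T):
        for i in range(N):
            stay = dp[t - 1][i]                       # remain in phone i
            enter = dp[t - 1][i - 1] if i > 0 else NEG  # advance from i-1
            back[t][i] = i if stay >= enter else i - 1
            dp[t][i] = max(stay, enter) + loglik[t][phones[i]]
    # backtrack from the final phone at the final frame
    path, i = [N - 1], N - 1
    for t in range(T - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    return path[::-1]
```

A real aligner works with multi-state HMMs and transition probabilities per phone, but the left-to-right "stay or advance" recursion is the same.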

Segmentation
- A pre-process for scoring: Viterbi forced alignment with an HMM representing W
- In practice, there may be pronunciation errors in X; insertions and deletions seriously affect the alignment
- Error prediction/detection may therefore be necessary

Segmentation example: "slow" /s l ou/ may be uttered with an inserted vowel as /s u l ou/

Error Detection
- Find W' such that p(X|W') > p(X|W)
- Compute scores p(x|w') for alternative phones w' over each segmented region x
- To account for insertions and deletions, generate a network of possible errors
- Error prediction can exploit prior knowledge such as the L1; alternative phones w' can be taken from the L1

Error Prediction in Pronunciation Model
- No equivalent syllable in L1 (e.g., "sea" -> "she")
- No equivalent phoneme in L1 (e.g., l -> r, v -> b)
- Vowel insertions (e.g., /b r/ -> /b uh r/)
- Error rules expand the pronunciation dictionary entry into a network; e.g., for "breath": S -> b (-> uh) -> r/l -> eh -> th/s (-> uh) -> E

Error Detection based on a Classification Approach
- Do not necessarily compute p(x|w'); instead, test whether w' is more likely than w
- Learn an explicit classifier (verifier), which can incorporate many features
- Focus on error detection, assuming the segmentation is given
- Linear Discriminant Analysis (LDA), Support Vector Machines (SVM)

Other Issues in Error Detection
- Filter and prioritize the many (possible) individual phone errors
- An error miss is far preferable to a false alarm, so as not to discourage learners
- Feedback: how to correct the errors
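As a stand-in for the LDA mentioned above, a diagonal-covariance linear discriminant can be sketched in a few lines: estimate class means and a pooled per-dimension variance from "correct" and "wrong" examples, then score new segments by a linear projection. The feature layout and names are illustrative assumptions.

```python
def train_lda(correct, wrong):
    """Train a diagonal-covariance linear discriminant.
    Returns a scoring function: positive score => 'wrong' class."""
    dim = len(correct[0])
    mean = lambda data, d: sum(x[d] for x in data) / len(data)
    m0 = [mean(correct, d) for d in range(dim)]
    m1 = [mean(wrong, d) for d in range(dim)]
    pooled = []
    for d in range(dim):
        var = sum((x[d] - m0[d]) ** 2 for x in correct)
        var += sum((x[d] - m1[d]) ** 2 for x in wrong)
        pooled.append(var / (len(correct) + len(wrong)) or 1.0)
    # Project onto the variance-scaled mean difference; threshold midway.
    w = [(m1[d] - m0[d]) / pooled[d] for d in range(dim)]
    bias = -sum(w[d] * (m0[d] + m1[d]) / 2 for d in range(dim))
    return lambda x: sum(w[d] * x[d] for d in range(dim)) + bias
```

The appeal of this route is exactly what the slide states: the input vector can mix GOP values, durations, and formant cues, rather than relying on a single likelihood.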

Scoring: Standpoints
- Native-likeness: how close to "golden" native speakers? p(X|W, λ_G)
  - What is the golden model: British? American? ...
  - Impossible to be free of L1 effects and speaker characteristics
- Intelligibility: how distinguishable (less confusable) from other phones? p(W|X)
  - Some pronunciations may not be recognized as anything
  - Need to consider L1 phones as well (assuming a particular L1)

Scoring based on Native-Likeness
- How close to golden native speakers? Defined by p(X|W, λ_G), where λ_G is the golden model, normalized by p(X|W, λ_N), where λ_N is a non-native model
- In summary, likelihood-ratio scoring: p(X|W, λ_G) / p(X|W, λ_N), decomposed per phone w into p(x|w, λ_G) / p(x|w, λ_N) and per time frame into p(x_t|w, λ_G) / p(x_t|w, λ_N), averaged over phones and frames
- The geometric mean of the frame-wise ratios equals the arithmetic mean in the log domain

Scoring based on Intelligibility
- How distinguishable (less confusable) from other phones? Measured by the posterior p(W|X) = p(X|W) / Σ_{W'} p(X|W'), approximated frame-wise by p(x_t|w) / max_{w'} p(x_t|w')
- Often called GOP (Goodness Of Pronunciation); a posterior probability, similar to a confidence measure in ASR
- Becomes 1 if the best w' equals w; computed from the forced-alignment (segmentation) Viterbi scores
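The frame-wise GOP approximation above reduces, in the log domain, to the target phone's average log-likelihood minus that of the best competing phone. A minimal sketch (the dictionary layout and names are illustrative):

```python
def gop(frame_loglik, target):
    """frame_loglik[w]: list of per-frame log-likelihoods log p(x_t | w)
    over the target segment, for each candidate phone w.
    Returns the per-frame log posterior approximation (<= 0)."""
    T = len(frame_loglik[target])
    target_ll = sum(frame_loglik[target])
    best_ll = max(sum(lls) for lls in frame_loglik.values())
    return (target_ll - best_ll) / T
```

The score is 0 when the target phone is the best match and grows more negative as a competitor becomes more likely; thresholding it per phone gives an error detector.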

Extensions of GOP
- Better to limit w' to confusing phones rather than all possible phones [Wei 2009] (e.g., /v/ vs. /b/)
- Better to set different detection thresholds per phone (or phone cluster) [Ito 2005]
- Any error can be detected if a phone-loop model is used
- The acoustic & pronunciation models need adaptation to non-native speakers; better to consider L1 phones

From Scoring to Assessment
- Other factors: duration modeling & evaluation; other prosodic aspects (accent, intonation); speech rate
- Score mapping: linear regression to fit human raters' evaluations
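The score-mapping step above is ordinary least squares on (machine score, human rating) pairs. A one-feature sketch with illustrative names (real systems regress on many features at once):

```python
def fit_linear(machine, human):
    """Least-squares fit human ≈ a * machine + b; returns (a, b)."""
    n = len(machine)
    mx = sum(machine) / n
    my = sum(human) / n
    sxx = sum((x - mx) ** 2 for x in machine)
    sxy = sum((x - mx) * (y - my) for x, y in zip(machine, human))
    a = sxy / sxx
    return a, my - a * mx
```

The fitted line turns raw log-likelihood-based scores, whose scale is arbitrary, into values on the human rating scale.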

Acoustic Modeling: Native vs. Non-native
- Native speech: the gold standard, but mismatched
- Non-native speech: matched, but error-prone, and no large database is available
- Adaptation from native to non-native models
- The L1 phone model can be used for the same phone (in the IPA inventory)

Context-Independent Modeling
- Context-dependent (e.g., triphone) models are widely used in ASR
- Context-independent (monophone) models work well, even better, in CALL
  - Phonetic context is not reliable in non-native speech (e.g., vowel insertions)
  - Better segmentation accuracy even for native speech

Speaker Adaptation of the Acoustic Model to Non-native Speech
- The pronunciation of the adaptation data may not be correct
- Compare baseform labels (automatic but error-prone) with hand labels (correct but costly)
- Phone accuracy, measured against hand labels including errors [Tsubota 2004]:

  Adaptation labels   Phone accuracy (native model)
  No adaptation       75.4
  Hand label          81.0
  Baseform label      80.6

- Lexicon baseform labels are sufficient

Acoustic Model: Native vs. Non-native (L2) Model
- Non-native speech database (MEXT project): 13,129 utterances by 178 speakers; pronunciation errors are not annotated (too costly)
- Dictionary labels vs. automatic labels from ASR: both are error-prone [Tsubota 2004]

  Acoustic model               Baseline   Speaker-adapted
  Native English model         75.4       80.6
  Non-native model (baseform)  78.0       81.8
  Non-native model (ASR)       77.1       81.5

- The non-native model is more effective, even with dictionary labels; its superiority shrinks with speaker adaptation

Acoustic Model: Native/L2 Model + L1 Model
- Many phones are shared by the target L2 and the L1 (e.g., most consonants are shared by Japanese and English)
- Incorporate the L1 acoustic model for shared phones in parallel [Tsubota 2002]

  Acoustic model                    Baseline   Speaker-adapted
  English native model + L1 model   78.9       81.3
  Non-native model + L1 model       78.7       81.5

- Incorporating the L1 model in parallel is very effective

Pronunciation Model
- Standard baseform plus possible errors; the constraint of the L1 is effective
- Linguistic knowledge: /v/ -> /b/, /θ/ -> /s/; substitution with a similar L1 phone; insertion of vowels
- For GOP score computation, a simple phone-loop model (i.e., no pronunciation model) is used: any error can be detected, but with many false alarms

Pronunciation Model Training
- Hand-crafted phonological rules: expert knowledge is needed; too many rules cause false alarms and degrade recognition performance (a tradeoff between coverage and perplexity)
- Machine learning from annotated data: statistical learning of rewriting rules [Meng 2011]; decision trees to find the critical rule set [Wang 2009]

English CALL System: HUGO (Kyoto Univ.) [Tsubota, Imoto, Raux 2002]
- For Japanese college students, so that they can introduce Japanese culture
- Dedicated acoustic model & error-prediction scheme for Japanese students
- Deployed and used in classrooms

English CALL System: HUGO (Kyoto Univ.)
- Goal: pinpointing the pronunciation errors that degrade intelligibility, and providing effective feedback
- Practice consists of two phases:
  1. Dialogue-based skit (for natural conversation)
  2. Training on specific errors detected in the first phase (using a phrase or a word)
- Pronunciation error detection
  - Segmental pronunciation: hand-crafted phonological rules
  - Accent (primary & secondary stress): multiple prosodic features


List of Pronunciation Errors
- W/Y deletion (would)
- SH/CH substitution (choose)
- R/L substitution (road)
- ER/A substitution (paper)
- Non-reduction (student)
- V/B substitution (problem)
- Final vowel insertion (let)
- CCV-cluster insertion (active)
- VCC-cluster insertion (study)
- H/F substitution (fire)
Built from the ESL literature; error patterns with low detection rates were removed.

Intelligibility Assessment based on Error Statistics
- Estimate intelligibility and find the critical errors

Priority of Training on Specific Errors according to Intelligibility Level
- /r/-/l/ confusion: middle to advanced learners
- Vowel insertion: beginner to middle learners
- (Error types: WY, SH, ER, RL, VR, VB, FI, CCV, VCC, HF)

NativeAccent [Eskenazi 2007]
- Product of the Fluency Project at CMU, for English learning
- Error detection and feedback on articulation
- Supports up to 28 L1s (e.g., Japanese, Russian, French)
- 800 exercises

English CALL System at CUHK [Meng 2010]
- For Chinese learners of English; corpus: 100 Cantonese-L1 and 111 Mandarin-L1 speakers reading a paragraph and words
- Pronunciation error model: hand-crafted phonological rules + data-driven patterns
- GOP score, with pre-filtering based on duration models
- Synthesizing expressive speech to convey emphasis in feedback generation
- Synthesizing visual speech with articulator animation

Shadowing Exercise [Luo 2009]
- Listening to and repeating native utterances online; simultaneous training of listening and speaking skills
- High correlation between GOP and TOEIC scores (corr. = 0.90), higher than with simple reading

ETS SpeechRater for TOEFL [Zechner 2007]
- Assessment of unconstrained English speech: TOEFL iBT Practice Online (TPO), iBT field study
- Based on LVCSR methodology
  - Acoustic model: non-native speech (30 hours)
  - Language model: non-native speech + broadcast news
- Features (40 in total): ASR results (word identity, confidence), speech rate, pause length
- Scoring: linear regression model
- Correlation with human raters: 0.67 (inter-human correlation: 0.94)

Dialog-Based English CALL at POSTECH [Lee 2010]
- Situated dialog (e.g., shopping), based on SDS (Spoken Dialog System) methodology
- Task-dependent ASR + SLU; example-based dialog management
- Corrective feedback based on example selection
- Field trial at an elementary school

Japanese CALL System: CALLJ (Kyoto Univ.) [Wang 2009]
- Exercises in basic sentence production (text & speech), given an image of a scene
- Key features: dynamic generation of questions & an ASR grammar network with error prediction; interactive hints
- H. Wang, C. J. Waple, and T. Kawahara. Computer-assisted language learning system based on dynamic question generation and error prediction for automatic speech recognition. Speech Communication, Vol. 51, No. 10, pp. 995-1005, 2009.

How to Try CALLJ (Windows only)
0. (Unzip CALLJ1.5.zip)
1. Move to the directory CALLJ
2. Click StartCALLJ
3. The first time, create your account by clicking "New" in the login window
Some knowledge of Japanese is required.

References
1. M. Eskenazi. An overview of spoken language technology for education. Speech Communication, Vol. 51, pp. 832-844, 2009.
2. M. Russell, C. Brown, A. Skilling, R. Series, J. Wallace, B. Bonham, P. Barker. Applications of automatic speech recognition to speech and language development in young children. Proc. ICSLP, 1996.
3. J. Bernstein. Intelligibility and simulated deaf-like segmental and timing errors. Proc. ICASSP, pp. 244-247, 1977.
4. Q. Liu, S. Wei, Y. Hu, W. Guo, R. Wang. The application of phone weight in Putonghua pronunciation quality assessment. Proc. ISCSLP, 2006.
5. J. Bernstein. Objective measurement of intelligibility. Proc. ICPhS, pp. 1581-1584, 2003.
6. L. Neumeyer, H. Franco, V. Digalakis and M. Weintraub. Automatic scoring of pronunciation quality. Speech Communication, Vol. 30, pp. 83-93, 2000.
7. S. M. Witt. Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication, Vol. 30, pp. 95-108, 2000.
8. J. Bernstein, M. Cohen, H. Murveit, D. Rtischev, M. Weintraub. Automatic evaluation and training in English pronunciation. Proc. ICSLP, pp. 1185-1188, 1990.

References
1. Y. Tsubota, T. Kawahara, and M. Dantsuji. An English pronunciation learning system for Japanese students based on diagnosis of critical pronunciation errors. ReCALL Journal, Vol. 16, No. 1, pp. 173-188, 2004.
2. Y. Tsubota, T. Kawahara, and M. Dantsuji. Recognition and verification of English by Japanese students for a computer-assisted language learning system. Proc. ICSLP, pp. 1205-1208, 2002.
3. K. Imoto, Y. Tsubota, A. Raux, T. Kawahara, and M. Dantsuji. Modeling and automatic detection of English sentence stress for a computer-assisted English prosody learning system. Proc. ICSLP, pp. 749-752, 2002.
4. A. Raux and T. Kawahara. Automatic intelligibility assessment and diagnosis of critical pronunciation errors for computer-assisted pronunciation learning. Proc. ICSLP, pp. 737-740, 2002.
5. H. Wang, C. J. Waple, and T. Kawahara. Computer-assisted language learning system based on dynamic question generation and error prediction for automatic speech recognition. Speech Communication, Vol. 51, No. 10, pp. 995-1005, 2009.

References
1. M. Eskenazi, A. Kennedy, C. Ketchum, R. Olszewski, and G. Pelton. The NativeAccent(TM) pronunciation tutor: measuring success in the real world. Proc. SIG-SLaTE, 2007.
2. H. Meng, W. K. Lo, A. M. Harrison, P. Lee, K. H. Wong, W. K. Leung and F. Meng. Development of automatic speech recognition and synthesis technologies to support Chinese learners of English: the CUHK experience. Proc. APSIPA, 2010.
3. D. Luo, N. Minematsu, Y. Yamauchi, K. Hirose. Analysis and comparison of automatic language proficiency assessment between shadowed sentences and read sentences. Proc. SLaTE, 2009.
4. F. Ehsani, J. Bernstein, A. Najmi, O. Todic. Subarashii: Japanese interactive spoken language education. Proc. Eurospeech, pp. 681-684, 1997.
5. K. Zechner, D. Higgins, X. Xi. SpeechRater(TM): a construct-driven approach to scoring spontaneous non-native speech. Proc. SLaTE, 2007.
6. S. Lee, H. Noh, J. Lee, K. Lee and G. Lee. POSTECH approaches for dialog-based English conversation tutoring. Proc. APSIPA, 2010.