Evaluation of speech technologies




CLARA Training Course on Evaluation of Human Language Technologies, Evaluations and Language resources Distribution Agency (ELDA), November 27, 2012

Evaluation of speaker identification

Speech technologies: Outline
Speaker recognition
Automatic speech recognition
Spoken language understanding
Speech synthesis

Automatic speaker recognition technologies try to recognize a person from a speech sample. Speaker recognition covers two sub-areas: speaker identification and speaker verification.
Speaker identification: the goal is to classify an unlabeled voice sample as belonging to one of a set of known speakers. If all trials come from the set of known speakers, the task is closed-set speaker identification; if trials can come from speakers outside the list of known speakers, the task is open-set speaker identification.
Speaker verification: the task is to decide whether or not the unlabeled voice belongs to a claimed identity.
We also distinguish text-dependent from text-independent speaker recognition.

How do we evaluate speaker recognition?
Define the task (speaker identification vs. verification, matched vs. unmatched conditions, text-(in)dependent, ...)
Produce corpora for evaluation (train, development, test)
Evaluate the performance using different metrics

Metrics for the evaluation of speaker identification
For closed-set speaker identification, the metric is the Misclassification Rate (M), defined as:
M = N_err / N_tot
where N_err is the number of trials in error and N_tot the total number of trials.
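A minimal sketch of this computation in Python (the speaker labels below are invented for illustration):

```python
# Closed-set identification scored as the misclassification rate M = N_err / N_tot.
# The trial labels are invented for illustration.

def misclassification_rate(references, hypotheses):
    """Fraction of trials where the hypothesized speaker differs from the reference."""
    if len(references) != len(hypotheses):
        raise ValueError("each trial needs one reference and one hypothesis label")
    n_err = sum(1 for ref, hyp in zip(references, hypotheses) if ref != hyp)
    return n_err / len(references)

ref = ["spk1", "spk2", "spk1", "spk3", "spk2"]
hyp = ["spk1", "spk2", "spk3", "spk3", "spk1"]
print(misclassification_rate(ref, hyp))  # 2 errors over 5 trials -> 0.4
```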

Metrics for the evaluation of speaker identification
For open-set identification, we have two metrics: the Misclassification Rate as defined previously, and the False Rejection (or Miss) Rate for trials of closed-set speakers that are wrongly rejected:
E_miss = N_miss / N_target
where N_miss is the number of trials of closed-set speakers that are not detected and N_target the total number of target trials.

Metrics for the evaluation of speaker verification
Speaker verification is a detection task in which a claimed identity has to be confirmed or rejected based on a voice sample. Two kinds of error can be observed: false acceptance, when an impostor is accepted as the speaker he claims to be, and false rejection, when a genuine speaker is rejected.
False Rejection (Miss) Rate, as defined previously:
E_miss = N_miss / N_target
False Acceptance Rate:
E_fa = N_fa / N_impostor
where N_fa is the number of trials falsely detected as the target speaker and N_impostor the total number of impostor trials.

Metrics for the evaluation of speaker verification
Evaluating speaker verification thus involves two metrics, but we usually want a single number, so there are different ways to combine false rejection and false acceptance:
Equal Error Rate (EER): use the decision threshold for which E_miss = E_fa
Geometric Mean Error Rate: GME = sqrt(E_miss * E_fa)
Cost function: cost = c_miss * E_miss * P_target + c_fa * E_fa * (1 - P_target)
Detection Error Trade-off plot (DET curve): a plot of the miss probability as a function of the false acceptance probability.
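Under the assumption that higher scores indicate a more target-like trial, a minimal sketch of computing the miss and false acceptance rates and locating the EER (the score lists are invented for illustration):

```python
# Sweep decision thresholds over verification scores and find the operating point
# where the miss and false acceptance rates are closest (the EER region).

def error_rates(target_scores, impostor_scores, threshold):
    """E_miss and E_fa at a given threshold (accept if score >= threshold)."""
    e_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    e_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return e_miss, e_fa

def equal_error_rate(target_scores, impostor_scores):
    """Try every observed score as a threshold; return the (E_miss, E_fa) pair
    with the smallest gap between the two rates."""
    best = None
    for t in sorted(target_scores + impostor_scores):
        e_miss, e_fa = error_rates(target_scores, impostor_scores, t)
        if best is None or abs(e_miss - e_fa) < abs(best[0] - best[1]):
            best = (e_miss, e_fa)
    return best

targets = [0.9, 0.8, 0.75, 0.6, 0.4]
impostors = [0.7, 0.5, 0.45, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, impostors))  # E_miss = 0.2, E_fa ~ 0.167 here
```

With real systems the two curves cross more smoothly, and the same sweep produces the points of a DET curve.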

Example of a DET curve

Automatic speech recognition (ASR), or speech-to-text, is the process of automatically transcribing audio data. It covers various applications such as:
Voice commands (navigation systems, mobile devices, ...)
Telephony speech recognition (Interactive Voice Response servers)
Voicemail transcription
Transcription of broadcast news
Conversational speech recognition
Meeting transcription
Dictation
...

Metric for the evaluation of ASR
Evaluation of ASR systems is performed by comparing, or aligning, a reference transcription to a hypothesis transcription (the system output). The most commonly used metric for assessing the performance of ASR systems is the Word Error Rate (WER):
WER = (N_ins + N_sub + N_del) / N_tot
where
N_ins is the number of inserted words in the hypothesis
N_sub is the number of substituted words in the hypothesis
N_del is the number of deleted words in the hypothesis
N_tot is the total number of words in the reference
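A minimal sketch of a WER computation using dynamic-programming edit distance over word sequences (the example sentences are invented):

```python
# WER as the Levenshtein distance between word sequences, divided by the
# reference length. Insertions, deletions and substitutions all cost 1.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("les grands titres de l actualite",
                      "les grands titres de la actualite"))  # 1 sub / 6 words
```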

Concretely, how do we evaluate speech recognition systems?
Define the task: language, domain, processing and time constraints, ...
Produce corpora for evaluation:
training data of 100 hours of transcribed speech or more
development set of 3 to 6 hours (30k to 60k words)
test set of 3 to 6 hours
Evaluate the performance with WER

A concrete example: evaluation of broadcast news transcription systems
Record or acquire audio data (for example, broadcast news shows)
Manually transcribe the data orthographically with dedicated tools such as Xtrans or Transcriber
Convert and normalize the transcription into an evaluation format called an STM file
Evaluate the performance by aligning a hypothesis CTM file with the reference STM file

Reference STM file format
It is a plain text format with 7 columns:
audio filename, channel, speaker name, start time, end time, speaker label (condition and gender), transcription

Example of an STM file
20041006 0700 0800 CLASSIQUE 1 Emmanuel Cugny 8.552 11.702 (o,f0,male) mais tout de suite les grands titres de l actualité maude bayeu bonjour
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 11.702 18.116 (o,fx,female) bonjour le gouvernement et l opposition enterrent la polémique sur la mission de didier julia pour libérer les otages en irak
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 18.116 23.329 (o,f3,female) après une réunion matignon hier ils ont décidé de rejouer la carte de la cohésion sociale
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 23.329 27.144 (o,f3,female) du moins dans l attente de la libération de christian chesnot et georges malbrunot
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 27.144 31.080 (o,f3,female) le premier ministre n en a pas moins condamné l initiative parallèle du député
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 31.080 37.554 (o,f3,female) ump celui ci s est expliqué hier huis clos devant la commission des affaires étrangères de l assemblée nationale
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 37.554 38.709 (o,f3,female) le groupe ump
20041006 0700 0800 CLASSIQUE 1 Maude Bayeu 77.795 80.698 (o,f3,female) (%HESITATION) dick cheney a justifié l intervention dans le pays

Hypothesis CTM file format
It is a plain text format with 5 columns:
audio filename, channel, start time, duration, recognized word

Example of a CTM file
20041006 0700 0800 CLASSIQUE 1 11.12 0.55 bonjour
20041006 0700 0800 CLASSIQUE 1 11.88 0.52 bonjour
20041006 0700 0800 CLASSIQUE 1 12.44 0.21 le
20041006 0700 0800 CLASSIQUE 1 12.65 0.62 gouvernement
20041006 0700 0800 CLASSIQUE 1 13.27 0.04 et
20041006 0700 0800 CLASSIQUE 1 13.31 0.05 l
20041006 0700 0800 CLASSIQUE 1 13.36 0.63 opposition
20041006 0700 0800 CLASSIQUE 1 13.99 0.38 enterre
20041006 0700 0800 CLASSIQUE 1 14.37 0.11 la
20041006 0700 0800 CLASSIQUE 1 14.48 0.51 polémique
20041006 0700 0800 CLASSIQUE 1 14.99 0.18 sur
20041006 0700 0800 CLASSIQUE 1 15.17 0.10 la
20041006 0700 0800 CLASSIQUE 1 15.27 0.37 mission
20041006 0700 0800 CLASSIQUE 1 15.64 0.17 de
20041006 0700 0800 CLASSIQUE 1 15.81 0.36 Didier
20041006 0700 0800 CLASSIQUE 1 16.17 0.33 Julia
20041006 0700 0800 CLASSIQUE 1 16.50 0.14 pour

How do we score the system hypothesis in practice?
We use scoring software developed by NIST (the National Institute of Standards and Technology) called sclite.
We simply run the following command:
sclite -D -F -r reference.stm stm -h hypothesis.ctm ctm
sclite first aligns the sequence of words of the hypothesis with the sequence of words of the reference using the Levenshtein distance, also called the edit distance.

Some alignment examples
REF sept heures et quart maude bayeu nous rappelle les grands titres de l actualité
HYP ou sept heures et quart monde bayeux nous rappelle les grands titres de l actu lit et

REF: ** sept heures et quart MAUDE BAYEU nous rappelle les grands titres de l **** *** ACTUALITÉ
HYP: OU sept heures et quart MONDE BAYEUX nous rappelle les grands titres de l ACTU LIT ET
Eval: I S S I I S
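The REF/HYP/Eval display above can be approximated with a small alignment sketch: compute the edit-distance table, then backtrace to tag each position as correct (C), substitution (S), insertion (I) or deletion (D). The word sequences are taken from the example above, truncated for brevity:

```python
# Align reference and hypothesis word sequences and recover the edit operations,
# the same idea sclite applies before counting errors.

def align(ref, hyp):
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace from the bottom-right corner to recover the operations.
    ops, i, j = [], len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("C" if ref[i - 1] == hyp[j - 1] else "S", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], "***"))   # word deleted from the hypothesis
            i -= 1
        else:
            ops.append(("I", "***", hyp[j - 1]))   # word inserted by the system
            j -= 1
    return list(reversed(ops))

for op, r, h in align("maude bayeu nous rappelle".split(),
                      "monde bayeux nous rappelle les".split()):
    print(op, r, h)
```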

The goal of spoken language understanding systems is to interpret what has been said. It relies on a semantic representation of the speech file. The semantic knowledge is represented by the definition of an ontology and semantic concepts.

In an evaluation project for spoken language understanding systems called MEDIA, we defined semantic segments as 3-tuples containing:
the mode, affirmative or negative
the name of the concept representing the meaning of the sequence of words
the value of the concept
The evaluation can be done by computing the Concept Error Rate (CER), similar to the WER defined previously.
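As a hypothetical illustration (the concept names and values below are invented, not actual MEDIA labels), semantic segments can be represented as 3-tuples and scored with an edit distance over the reference and hypothesis tuple sequences:

```python
# Concept error rate as an edit distance over sequences of (mode, concept, value)
# tuples: a hypothesized segment counts as correct only if all three fields match.

def concept_error_rate(ref, hyp):
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

ref = [("+", "command-task", "reservation"),
       ("+", "location-city", "Paris"),
       ("+", "date", "2012-11-27")]
hyp = [("+", "command-task", "reservation"),
       ("+", "location-city", "Marseille"),   # wrong value -> substitution
       ("+", "date", "2012-11-27")]
print(concept_error_rate(ref, hyp))  # 1 substituted concept out of 3
```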

The evaluation of speech synthesis systems is mainly subjective: listeners are asked to rate the quality of the audio output on a Mean Opinion Score (MOS).
MOS uses a 5-point scale:
5 Excellent
4 Good
3 Fair
2 Poor
1 Bad
In addition to the subjective evaluation, there are also automatic evaluations measuring the performance of the grapheme-to-phoneme converter: we compute the Phone Error Rate (similar to the Word Error Rate).

Evaluation of speaker identification: login information
Connect to the server using the following command:
ssh -l LOGIN 192.168.1.220
with the following logins/passwords:
10 different accounts, LOGIN: clara1 / clara2 / ... / clara10
password: clara for all 10 accounts

Evaluation of speaker identification
Concrete case: evaluation of speaker identification
This was an evaluation campaign of speaker identification organized within the EU CHIL project.
Log in to the server and go to the ACOUSTIC PERSON IDENTIFICATION directory.
The exercise is to evaluate the Misclassification Rate of the different submissions from AIT, CMU, LIMSI, MIT, UIUC, UPC, ...
What are the performances?

Evaluation of speech recognition
Concrete case: evaluation of speech recognition
This was a French evaluation campaign on broadcast news called ESTER.
The exercise is to evaluate the Word Error Rate of the different submissions from LIA, LIMSI and LIUM.
What is the WER for each system?