RT-09 Speaker Diarization Results




2009 Rich Transcription Evaluation Conference
May 28-29, 2009, Melbourne, FL
Jérôme Ajot & Jonathan Fiscus
Multimodal Information Group, Information Access Division (IAD), Information Technology Laboratory (ITL)
Rich Transcription: http://itl.nist.gov/iad/mig/tests/rt/2009/

RT-09 Evaluation Participants

Evaluation tasks: SPKR (Audio, Audio/Video), STT, SASTT

Site ID        Site Name
AMI            Augmented Multi-party Interaction: Univ. Sheffield, IDIAP, Univ. Edinburgh, Univ. of Technology Brno, Univ. Twente
I2R/NTU        Infocomm Research Site and Nanyang Technological University
FIT            Florida Institute of Technology
ICSI           International Computer Science Institute
LIA/Eurecom    Laboratoire Informatique d'Avignon / Ecole d'ingénieurs et centre de recherche en Systèmes de Communications
SRI/ICSI       SRI International and International Computer Science Institute
UPM            Universidad Politécnica de Madrid
UPC            Universitat Politècnica de Catalunya

[Table: per-site participation marks (X) by task; column positions not recoverable]

Diarization: "Who Spoke When"

Task (SPKR): Detect segments of speech and cluster them by speaker
Primary input condition: Multiple Distant Mics (MDM)
Participating sites: AMI, IIR/NTU, ICSI, LIA/Eurecom, UPC, UPM
Reference file construction (not changed for RT-09):
    Reference segments derived by force-aligning the IHM audio to the reference transcripts using LIMSI tools
    Segments built for each word were smoothed with a 0.3 s window
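The slide does not say how the 0.3 s smoothing is defined. One plausible reading, shown here only as a sketch, is that word-level segments from the same speaker are merged whenever the silence between them is shorter than the window. The function name `smooth` and the `(speaker, start, end)` tuple layout are assumptions made for illustration, not the format used by the evaluation tools.

```python
def smooth(segments, window=0.3):
    """Merge same-speaker segments separated by a gap shorter than `window` seconds.

    `segments` is a list of (speaker, start, end) tuples in seconds.
    Assumption: "smoothing" means bridging short gaps; the slides do not define it.
    """
    merged = []
    # Process each speaker's segments in time order.
    for spk, start, end in sorted(segments, key=lambda s: (s[0], s[1])):
        if merged and merged[-1][0] == spk and start - merged[-1][2] < window:
            prev = merged[-1]
            merged[-1] = (spk, prev[1], max(prev[2], end))
        else:
            merged.append((spk, start, end))
    # Return the merged segments ordered by start time.
    return sorted(merged, key=lambda s: s[1])


words = [("A", 0.0, 0.4), ("A", 0.5, 0.9), ("B", 0.95, 1.3), ("A", 1.5, 2.0)]
turns = smooth(words)
```

In this example the first two words of speaker A are 0.1 s apart and become one segment, while the 0.6 s pause before A's last word keeps it separate.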

SPKR System Evaluation Method

Step 1: Speaker alignment
    A one-to-one mapping between reference speaker segment clusters and system-determined speaker clusters
    The mdeval tool was used with a +/- 250 ms no-score collar around reference segment boundaries
Step 2: Error metric computation
    Diarization Error Rate (DER): the ratio of incorrectly detected speaker time to total speaker time
    Error types:
        Speaker assignment errors (i.e., detected speech but not assigned to the right speaker)
        False alarms
        Missed detections
Three scorings performed:
    All speech (primary metric)
    Non-overlapping speech (for backward compatibility)
    Scoring as a Speech Activity Detection system
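The two scoring steps above can be sketched in a few lines. This is a simplified, frame-based illustration, not the mdeval implementation: it assumes 10 ms frames, `(speaker, start, end)` tuples, and a brute-force search for the one-to-one mapping, which is only practical for a handful of speakers. The names `frames`, `der`, and `as_sad` are hypothetical.

```python
from itertools import permutations


def frames(segments, step):
    """Return {frame index: set of speakers active in that frame}."""
    active = {}
    for spk, start, end in segments:
        for i in range(round(start / step), round(end / step)):
            active.setdefault(i, set()).add(spk)
    return active


def der(ref, hyp, collar=0.25, step=0.01):
    """Frame-based DER sketch with a no-score collar around reference boundaries."""
    ref_f, hyp_f = frames(ref, step), frames(hyp, step)
    # Frames within +/- collar of any reference boundary are not scored.
    skip = set()
    for _, start, end in ref:
        for boundary in (start, end):
            skip.update(range(round((boundary - collar) / step),
                              round((boundary + collar) / step)))
    scored = [i for i in set(ref_f) | set(hyp_f) if i not in skip]

    # Step 1: one-to-one speaker mapping that maximizes shared speech time.
    overlap = {}
    for i in scored:
        for r in ref_f.get(i, ()):
            for h in hyp_f.get(i, ()):
                overlap[(r, h)] = overlap.get((r, h), 0) + 1
    ref_spk = sorted({s[0] for s in ref})
    hyp_spk = sorted({s[0] for s in hyp})
    padded = hyp_spk + [None] * max(0, len(ref_spk) - len(hyp_spk))
    best = max(permutations(padded, len(ref_spk)),
               key=lambda p: sum(overlap.get(pair, 0) for pair in zip(ref_spk, p)))
    mapping = dict(zip(ref_spk, best))

    # Step 2: accumulate the three error types over the scored frames.
    total = miss = false_alarm = spk_err = 0
    for i in scored:
        r, h = ref_f.get(i, set()), hyp_f.get(i, set())
        correct = sum(1 for spk in r if mapping[spk] in h)
        total += len(r)
        miss += max(0, len(r) - len(h))
        false_alarm += max(0, len(h) - len(r))
        spk_err += min(len(r), len(h)) - correct
    rate = (miss + false_alarm + spk_err) / total if total else 0.0
    return {"der": rate, "miss": miss, "false_alarm": false_alarm,
            "speaker_error": spk_err, "total": total}


def as_sad(segments):
    """Collapse all speakers into one label to score as speech activity detection."""
    return [("speech", start, end) for _, start, end in segments]


ref = [("A", 0.0, 10.0), ("B", 10.0, 20.0)]
hyp = [("s1", 0.0, 12.0), ("s2", 12.0, 20.0)]
result = der(ref, hyp)
sad_result = der(as_sad(ref), as_sad(hyp))
```

Here the system changes speaker two seconds late, so the only error is speaker assignment; once the labels are collapsed with `as_sad`, the same output scores with no error because speech and non-speech were detected correctly.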

RT-09 SPKR Results: Primary Systems, All Speech

[Figure: DER (%), 0 to 40, by system (ami, icsi, iir-ntu, lia-eurecom, upc, upm) and input condition (adm, av+sdm, mdm, mm3a, sdm)]

IIR-NTU has < 10% DER
    But last test, it was ICSI
Improvement with MDM < MM3A < SDM
First use of video for diarization

RT-09 SPKR Results: Primary Systems, All Speech, Split by Error Type

[Figure: DER (%), 0 to 40, split into SPKR Error, Missed Det., and False Alm., by system and input condition (adm, av+sdm, mdm, mm3a, sdm)]

Speaker Error Dominates

MDM Detailed Analysis

Focused analysis on MDM test condition:
    Correct detection of active speakers
    All data vs. no overlapping speech vs. speech activity
    DER variability by meetings
    Audio + Visual diarization
    Historical DERs

RT-09 Primary SPKR MDM Systems: DER Split by Error Type

[Figure: DER (%), 0 to 35, per system, split into Spkr Error, Missed Det., and False Alarm]

Number of meetings with the correct # of speakers (out of 7), one value per system: 0, 0, 6, 2, 0, 4

Speaker errors dominate the scores, except for IIR-NTU
False alarms and missed detections are similar for all systems
Question: Is there a meeting effect?

RT-09 SPKR Results: Primary Systems, MDM Conference Data

[Figure: DER (%), 0 to 35, per system for three scorings: NonOverlap, All Data, SPKRasSAD]

High correlation between with/without overlap
SAD scores are commensurate within domain

RT-09 Primary SPKR MDM Systems: Meeting DERs within/across systems

[Figure: two panels of DER (%), with NIST-1 and NIST-2 labeled]

Meetings:
    EDI1     EDI_20071128-1000
    EDI2     EDI_20071128-1500
    IDI1     IDI_20090128-1600
    IDI2     IDI_20090129-1000
    NIST1    NIST_20080201-1405
    NIST2    NIST_20080227-1501
    NIST3    NIST_20080307-0955

Non-overlap scores have similar distributions
Small sample caveat: 6 systems / 7 meetings

Meeting DERs: RT-07 vs RT-09

[Figure: per-meeting DERs, RT-07 and RT-09]

Demonstrable meeting effect
Large within-meeting variation

MDM Error Rates by Meeting

[Figure: DER (%), 0 to 80, by meeting for each system (ami, icsi, iir-ntu, lia-eurecom, upc, upm)]

ICSI SDM + Video Diarization

[Figure: DER (%), 0 to 35, split into Speaker Err., Missed Det., and False Alm., comparing av+sdm and sdm on meetings EDI-1, EDI-2, IDI-1, IDI-2]

Historical Best System MDM SPKR Performance (Forced Alignment Mediated)

Conclusions

Bigger test sets are needed
    The large variability in meeting error rates
Like last year: the lowest error rate system correctly detected the right number of speakers
Has performance reached an asymptote?
    What is the best performance you can get without solving overlap?