RT-09 Speaker Diarization Results
Transcription
1 RT-09 Speaker Diarization Results
2009 Rich Transcription Evaluation Conference, May 28-29, 2009, Melbourne, FL
Jérôme Ajot & Jonathan Fiscus
Multimodal Information Group, Information Access Division (IAD), Information Technology Laboratory (ITL)
2 RT-09 Evaluation Participants
Columns: Site ID | Site Name | Evaluation Task (SPKR Audio, SPKR Audio/Video, STT, SASTT)
AMI, I2R/NTU | Augmented Multi-party Interaction: Univ. Sheffield, IDIAP, Univ. Edinburgh, Univ. of Technology Brno, Univ. Twente; Infocomm Research Site and Nanyang Technological University | X X X
FIT | Florida Institute of Technology | X
ICSI | International Computer Science Institute | X X X
LIA/Eurecom, SRI/ICSI | Laboratoire Informatique d'Avignon / Ecole d'ingénieurs et centre de recherche en Systèmes de Communications; SRI International and International Computer Science Institute |
UPM | Universidad Politécnica de Madrid | X
UPC | Universitat Politècnica de Catalunya | X X X X
3 Diarization: "Who Spoke When" Task (SPKR)
- Task: detect segments of speech and cluster them by speaker
- Primary input condition: Multiple Distant Mics (MDM)
- Participating sites: AMI, IIR/NTU, ICSI, LIA/Eurecom, UPC, UPM
- Reference file construction (not changed for RT-09):
  - Reference segments were derived by force-aligning the IHM audio to the reference transcripts using LIMSI tools
  - Segments built for each word were smoothed with a 0.3 s window
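The smoothing step on this slide can be sketched as follows. This is a minimal illustration, not the NIST or LIMSI tooling: it assumes that "smoothed with a 0.3 s window" means that word segments of one speaker separated by a pause of at most 0.3 s are bridged into a single segment. The function name `smooth_segments` and the parameter `max_gap` are hypothetical.

```python
def smooth_segments(word_segments, max_gap=0.3):
    """Merge one speaker's word-level (start, end) segments, in seconds.

    Assumption: two consecutive words are bridged when the pause between
    them is at most `max_gap` seconds (0.3 s on the slide).
    """
    merged = []
    for start, end in sorted(word_segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Short pause: extend the current speech segment.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Long pause (or first word): open a new speech segment.
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


# Three words; only the first two are close enough to be bridged.
words = [(0.0, 0.4), (0.5, 0.9), (1.5, 2.0)]
print(smooth_segments(words))  # [(0.0, 0.9), (1.5, 2.0)]
```

Whether the actual reference construction bridged gaps strictly shorter than 0.3 s, or handled overlapping words differently, is not stated on the slide.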
4 SPKR System Evaluation Method
- Step 1: Speaker alignment
  - A one-to-one mapping between reference speaker segment clusters and system-determined speaker clusters
  - The mdeval tool was used with a +/- 250 ms no-score collar around reference segment boundaries
- Step 2: Error metric computation
  - Diarization Error Rate (DER): the ratio of incorrectly detected speaker time to total speaker time
  - Error types:
    - Speaker assignment errors (i.e., detected speech but not assigned to the right speaker)
    - False alarms
    - Missed detections
- Three scorings performed:
  - All speech (primary metric)
  - Non-overlapping speech (for backward compatibility)
  - Scoring as a Speech Activity Detection system
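The two steps above can be sketched in a few lines. This is an illustrative frame-based approximation, not the mdeval implementation: time is discretized into integer frames (for example 10 ms each, so a 250 ms collar would be 25 frames), and the one-to-one speaker mapping is found by brute force over permutations, which is only practical for a handful of speakers. The names `frame_labels`, `diarization_error_rate`, `n_frames`, and `collar` are hypothetical.

```python
from itertools import permutations


def frame_labels(segments, n_frames):
    """Return, for every frame, the set of speakers active in that frame."""
    frames = [set() for _ in range(n_frames)]
    for start, end, spk in segments:
        for t in range(max(start, 0), min(end, n_frames)):
            frames[t].add(spk)
    return frames


def diarization_error_rate(ref_segments, sys_segments, n_frames, collar=0):
    """Segments are (start_frame, end_frame, speaker) with end exclusive."""
    ref = frame_labels(ref_segments, n_frames)
    hyp = frame_labels(sys_segments, n_frames)

    # No-score collar: skip frames within `collar` frames of a reference boundary.
    skipped = set()
    for start, end, _ in ref_segments:
        for boundary in (start, end):
            skipped.update(range(boundary - collar, boundary + collar))
    scored = [t for t in range(n_frames) if t not in skipped]

    total = sum(len(ref[t]) for t in scored)
    if total == 0:
        raise ValueError("no scored reference speech")
    missed = sum(max(0, len(ref[t]) - len(hyp[t])) for t in scored)
    false_alarm = sum(max(0, len(hyp[t]) - len(ref[t])) for t in scored)

    # Step 1: one-to-one mapping that maximizes correctly attributed time.
    ref_spk = sorted({s for t in scored for s in ref[t]})
    hyp_spk = sorted({s for t in scored for s in hyp[t]})
    size = max(len(ref_spk), len(hyp_spk))
    ref_pad = ref_spk + [None] * (size - len(ref_spk))
    hyp_pad = hyp_spk + [None] * (size - len(hyp_spk))
    best_correct = 0
    for perm in permutations(hyp_pad):
        mapping = {h: r for r, h in zip(ref_pad, perm)
                   if r is not None and h is not None}
        correct = sum(
            len({mapping[h] for h in hyp[t] if h in mapping} & ref[t])
            for t in scored
        )
        best_correct = max(best_correct, correct)

    # Step 2: speaker error is detected speech not credited to the right speaker.
    matched = sum(min(len(ref[t]), len(hyp[t])) for t in scored)
    speaker_error = matched - best_correct
    der = (missed + false_alarm + speaker_error) / total
    return {"der": der, "missed": missed, "false_alarm": false_alarm,
            "speaker_error": speaker_error, "total": total}


ref = [(0, 10, "A"), (10, 20, "B")]
hyp = [(0, 12, "x"), (12, 18, "y")]
print(diarization_error_rate(ref, hyp, n_frames=20)["der"])  # 0.2
```

In the toy example the system holds speaker x two frames too long (speaker error) and stops two frames early (missed detection), giving 4 error frames out of 20 reference frames. Scoring only non-overlapping speech would amount to also skipping frames where more than one reference speaker is active.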
5 RT-09 SPKR Results: Primary Systems, All Speech
[Figure: DER (%) per system (ami, icsi, iir-ntu, lia-eurecom, upc, upm) and input condition (adm, av+sdm, mdm, mm3a, sdm)]
- IIR-NTU has < 10% DER, but in the last test it was ICSI
- Improvement with MDM < MM3A < SDM
- First use of video for diarization
6 RT-09 SPKR Results: Primary Systems, All Speech, Split by Error Type
[Figure: DER (%) split into SPKR Error, Missed Det., and False Alm. per system and input condition (adm, av+sdm, mdm, mm3a, sdm)]
- Speaker error dominates
7 MDM Detailed Analysis
Focused analysis on the MDM test condition:
- Correct detection of active speakers
- All data vs. no overlapping speech vs. speech activity
- DER variability by meetings
- Audio + visual diarization
- Historical DERs
8 RT-09 Primary SPKR MDM Systems: DER Split by Error Type
[Figure: DER split into Spkr Error, Missed Det., and False Alarm per system, with the number of meetings with the correct # of speakers (out of 7)]
- Question: is there a meeting effect?
- Speaker errors dominate the scores, except for IIR-NTU
- False alarms and missed detections are similar for all systems
9 RT-09 SPKR Results: Primary Systems, MDM Conference Data
[Figure: DER (%) for NonOverlap, All Data, and SPKRasSAD scoring]
- High correlation between scores with and without overlap
- SAD scores are commensurate within domain
10 RT-09 Primary SPKR MDM Systems: Meeting DERs Within/Across Systems
[Figure: DER (%) per meeting (EDI1, EDI2, IDI1, IDI2, NIST1, NIST2, NIST3), with NIST-1 and NIST-2 labelled]
- Non-overlap scores have a similar distribution
- Small sample caveat: 6 systems / 7 meetings
11 Meeting DERs: RT-07 vs RT-09
[Figure: meeting DERs for RT-07 and RT-09]
- Demonstrable meeting effect
- Large within-meeting variation
12 MDM Error Rates by Meeting
[Figure: DER (%) by meeting for ami, icsi, iir-ntu, lia-eurecom, upc, upm]
13 ICSI SDM + Video Diarization
[Figure: DER (%) split into Speaker Err., Missed Det., and False Alm., comparing av+sdm and sdm on meetings EDI-1, EDI-2, IDI-1, IDI-2]
14 Historical Best System MDM SPKR Performance (Forced Alignment Mediated)
15 Conclusions
- Bigger test sets are needed, given the large variability in meeting error rates
- Like last year, the lowest-error-rate system correctly detected the right number of speakers
- Has performance reached an asymptote?
- What is the best performance you can get without solving overlap?
