Incremental ASR, NLU and Dialogue Management in the Potsdam INPRO P2 System

Similar documents
Specialty Answering Service. All rights reserved.

VoiceXML-Based Dialogue Systems

Maximizing Learning in Online Training Courses: Meta-Analytic Evidence. Traci Sitzmann

Language Technology II Language-Based Interaction Dialogue design, usability,evaluation. Word-error Rate. Basic Architecture of a Dialog System (3)

2014/02/13 Sphinx Lunch

INF5820, Obligatory Assignment 3: Development of a Spoken Dialogue System

VoiceXML Tutorial. Part 1: VoiceXML Basics and Simple Forms

D2.4: Two trained semantic decoders for the Appointment Scheduling task

Chapter 8 Software Testing

Web page creation using VoiceXML as Slot filling task. Ravi M H

Developing Usable VoiceXML Applications

2011 Springer-Verlag Berlin Heidelberg

Dialog planning in VoiceXML

How do non-expert users exploit simultaneous inputs in multimodal interaction?

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

Push-to-talk ain t always bad! Comparing Different Interactivity Settings in Task-oriented Dialogue

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

OVERVIEW. Microsoft Project terms and definitions

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Experiences from Verbmobil. Norbert Reithinger DFKI GmbH Stuhlsatzenhausweg 3 D Saarbrücken bert@dfki.de

Speech Processing Applications in Quaero

USER MODELLING IN ADAPTIVE DIALOGUE MANAGEMENT

THE THIRD DIALOG STATE TRACKING CHALLENGE

A Practical Guide to Technical Indicators; (Part 1) Moving Averages

Robustness of a Spoken Dialogue Interface for a Personal Assistant

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

PROMISE - A Procedure for Multimodal Interactive System Evaluation

Processing Dialogue-Based Data in the UIMA Framework. Milan Gnjatović, Manuela Kunze, Dietmar Rösner University of Magdeburg

Voice Driven Animation System

Software Development Processes. Software Life-Cycle Models. Process Models in Other Fields. CIS 422/522 Spring

Chapter 6. Decoding. Statistical Machine Translation

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition

An Introduction to VoiceXML

interactive product brochure :: Nina: The Virtual Assistant for Mobile Customer Service Apps

Interfaces de voz avanzadas con VoiceXML

Configuring Shared Line Appearances over Analog Trunks

Applications of speech-to-text in customer service. Dr. Joachim Stegmann Deutsche Telekom AG, Laboratories

MODELING OF USER STATE ESPECIALLY OF EMOTIONS. Elmar Nöth. University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, F.R.G.

Connect to telephone. Connect to wall jack

Software Development Processes. Software Life-Cycle Models

IVR Primer Introduction

Standard Languages for Developing Multimodal Applications

Envox CDP 7.0 Performance Comparison of VoiceXML and Envox Scripts

EFFECTS OF AUDITORY FEEDBACK ON MULTITAP TEXT INPUT USING STANDARD TELEPHONE KEYPAD

WebSphere Business Monitor

Working with Windows Handout

Using WINK to create custom animated tutorials

Modifying Curriculum and Instruction

lnteractivity in CALL Courseware Design Carla Meskill University of Massachusetts/Boston

Communications Software Engineering Design Model

Language considerations for developing VoiceXML in German

Christian Leibold CMU Communicator CMU Communicator. Overview. Vorlesung Spracherkennung und Dialogsysteme. LMU Institut für Informatik

1. Introduction to Spoken Dialogue Systems

customer care solutions

Running Record Recording Sheet

Transaction-Typed Points TTPoints

Evaluation of speech technologies

Dynamic User Level and Utility Measurement for Adaptive Dialog in a Help-Desk System

Ambient Temperature In Range Probe X

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

The following information can be output as speech: status of the teacher / student connection. time markers of the timers.

Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow s Computers illustrated with Today s Examples

AVoiceportal Enhanced by Semantic Processing and Affect Awareness

Analog Monitoring Tool AMT 0.3b User Manual

Normality Testing in Excel

Voice User Interfaces (CS4390/5390)

Transport Layer Protocols

Hosted Service Tips and Troubleshooting

Before you can use the Duke Ambient environment to start working on your projects or

Traitement de la Parole

The role of prosody in toddlers interpretation of verbs argument structure

By Ken Thompson, ServQ Alliance

Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper

VoiceXML. Erik Harborg SINTEF IKT. Presentasjon, 4. årskurs, NTNU, ICT

Operating Procedures for the Instron: Purpose and Scope:

TruePort Windows 2000/Server 2003/XP User Guide Chapter

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

VOICEXML TUTORIAL AN INTRODUCTION TO VOICEXML

Speech Applications - Accessing information by voice. Li Haizhou, PhD Vice President, InfoTalk haizhou.li@infotalkcorp.com

Real-Time Software. Basic Scheduling and Response-Time Analysis. René Rydhof Hansen. 21. september 2010

Serial ATA RAID PCI. User's Manual

CDCI Assistive Technology Tryout Center, Communication Connection! The conference call in number for today is Phone number

CS 378: Computer Game Technology

A System for Labeling Self-Repairs in Speech 1

A Development Tool for VoiceXML-Based Interactive Voice Response Systems

Campaign Design Strategies. Matt Sawkins, Product Manager

Working With Direct Deposit Accounts and Your Payment Elections

SOFT 423: Software Requirements

ITRANS PENDING CLAIMS MAILBOX

SIP Registration Stress Test

The irnetbox Manager User Guide

STATE OF THE IVR: INDUSTRY EXPERTS WEIGH IN Insights and best practices for getting the most out of your IVR interactions.

Enhanced Reporting Software Monitor USER MANUAL

Using WhatsUp Gold VoIP Monitor About, configuring, installing, and using the VoIP monitor features in WhatsUp Gold

Auto Attendant User Guide

Announcements. Midterms. Mt #1 Tuesday March 6 Mt #2 Tuesday April 15 Final project design due April 11. Chapters 1 & 2 Chapter 5 (to 5.

MONITORING AND DIAGNOSIS OF A MULTI-STAGE MANUFACTURING PROCESS USING BAYESIAN NETWORKS

Speech and Language Processing

Use fireworks and Bonfire night as a stimulus for programming

Fitting Big Data into Business Statistics

Transcription:

Incremental ASR, NLU and Dialogue Management in the Potsdam INPRO P2 System Timo Baumann, Michaela Atterer, Okko Buss IVI Workshop, Bielefeld, 2009-06-08

Context: Spoken Dialogue Systems ASR TTS visual output NLU response generator dialogue manager history domain

Context: Spoken Dialogue Systems ASR TTS visual output NLU response generator dialogue manager history domain

Context: Spoken Dialogue Systems ASR TTS visual output NLU response generator dialogue manager history domain

Context: Spoken Dialogue Systems ASR TTS visual output NLU response generator dialogue manager history domain no reaction before the user finishes talking

Context: Incremental Spoken Dialogue Systems ASR TTS visual output NLU response generator dialogue manager history domain partial results are being processed immediately reaction is quicker, back-channels are possible

Context: Incremental Spoken Dialogue Systems Part 2: ASR Part 1: Timo Michaela NLU response generator Part 3: dialogue manager TTS visual output Okko history domain partial results are being processed immediately reaction is quicker, back-channels are possible

Context: Incremental Spoken Dialogue Systems ASR Part 1: Timo TTS visual output response generator Assessing NLU and Improving dialogue manager Speech Recognition for Incremental SDS: history domain Measures & Methods Baumann et al., 2009

A Real-World Example of Incremental ASR Hypotheses Software from Malsburg et al., submitted ASR hypotheses change with time (open movie)

A Real-World Example of Incremental ASR Hypotheses Software from Malsburg et al., submitted ASR hypotheses change with time more edit than necessary overhead ~ 90 %! 90 % of a consumers work will be useless

A Real-World Example of Incremental ASR Hypotheses which edits should we trust? Software from Malsburg et al., submitted ASR hypotheses change with time more edit than necessary overhead ~ 90%!

A Real-World Example of Incremental ASR Hypotheses which edits should we Software from Malsburg et al., submitted trust? Patience, ASR hypotheses change with time more edit than necessary overhead ~ 90%! Young Jedi! waiting helps reduce overhead, sacrifice some timeliness

A Real-World Example of Incremental ASR Hypotheses which edits should we Software from Malsburg et al., submitted trust? Patience, ASR hypotheses change with time more edit than necessary overhead ~ 90%! Young Jedi! waiting helps reduce overhead, sacrifice some timeliness

A Reduced Example... w gold time: w hyp1 w hyp2 w hyp3 w hyp4 w hyp5 w hyp6 w hyp7 w hyp8 w hyp9 w hyp10 w hyp11 w hyp12 zwei drei 0 1 2 3 4 5 6 7 8 9 10 11 12 an ein zwei zwar zwei zwei zwei zwei drei... (an) ª(an), (ein) ª(ein), () (zwei) ª(zwei), (zwar) ª(zwar), (zwei) (drei) w hypt is the word sequence hypothesized at time t two dimensions: time we reason about: time we reason at: wgold is final hypothesis

Change Measure... w gold time: w hyp1 w hyp2 w hyp3 w hyp4 w hyp5 w hyp6 w hyp7 w hyp8 w hyp9 w hyp10 w hyp11 w hyp12 zwei drei 0 1 2 3 4 5 6 7 8 9 10 11 12 () an ein (zwei) zwei zwar zwei zwei zwei (drei) zwei drei... (an) ª(an), (ein) ª(ein), () (zwei) ª(zwei), (zwar) ª(zwar), (zwei) (drei) changes on the right add, delete or revise ideally: one add per word in fact: edit overhead EO = unnecessary edits edits

Change Measure... w gold time: w hyp1 w hyp2 w hyp3 w hyp4 w hyp5 w hyp6 w hyp7 w hyp8 w hyp9 w hyp10 w hyp11 w hyp12 zwei drei 0 1 2 3 4 5 6 7 8 9 10 11 12 () an ein (zwei) zwei zwar zwei zwei zwei (drei) zwei drei... (drei) ideally: 3 edits (an) ª(an), (ein) ª(ein), () (zwei) ª(zwei), (zwar) ª(zwar), (zwei) (drei) actually: 11 edits unwanted: 8 edits EO: 8/11 = 72 %

Edits are bad: edits lead to unnecessary processing of a consumer less edits mean less processing we would like to reduce the edit overhead by deferring or suppressing edits deferring edits leads to delays, deteriorating timing measures

Measuring Timing w gold w hyp1 w hyp2 w hyp3 w hyp4 w hyp5 w hyp6 w hyp7 w hyp8 w hyp9 w hyp10 w hyp11 w hyp12... time: zwei drei 0 1 2 3 4 5 6 7 8 9 10 11 12 an ein zwei WFC zwei =1 { zwar zwei zwei zwei zwei drei... WFF zwei =0 when do we find out about a word? word first correct: WFC when do we become certain about a word? word first final: WFF this is per word averages are important

Measuring Timing w gold w hyp1 w hyp2 w hyp3 w hyp4 w hyp5 w hyp6 w hyp7 w hyp8 w hyp9 w hyp10 w hyp11 w hyp12... zwei drei time: 0 1 2 3 4 5 6 7 8 9 10 11 12 an ein zwei WFC zwei =1 { zwar zwei zwei zwei zwei drei... WFF zwei =0 when do we find out about a word? word first correct: WFC when do we become certain about a word? first word first final: WFF final this is per word averages are important

Timing Measures depending on the use-case we may care for if we want to assume as soon as possible low if we want to know as soon as possible low first final deferring edits means two things: worse first timings (as the lag passes through) less increase in final timings, (if we eliminate wrong edits)

Certainty Considerations the correction time for a word is WFF WFC WFC 58.6 % of all words are immediately correct we can calculate the degree of certainty for given hypothesis ages e.g. if a correct hyp. lasts for 0.55 s, we can be certain (95 %) that it will not change anymore

Improving Incremental ASR our primary goal is to reduce edit overhead by deferring or suppressing edits deferring edits will always hurt first timings less impact on final timings the final (non-incremental) result does not change only trust older parts of hyps. (Right Context) only trust older edits (Message Smoothing)

Right Context to Improve Incremental Performance much jitter is at the right end of the hypotheses at time t only evaluate hypt up to t Δ we need to take this into account for correctness: fair r-correct: w hypt t = w goldt first increases with Δ, final increases Δ we can predict the future with negative Δ e.g. fair r-correctness down 50% at 100ms in the future

Message Smoothing to Improve Incremental Performance most bad edits only last for a short while ''zwei'' '' zwar'' '' zwei'' hold back edits until they reach a certain age only output if they don't die before maturing multiple short edits of a word may delay messages: first may grow without fixed bounds occasionally probable resolution/mitigation: future work future work allow for some kind of ''majority smoothing''

Right Context vs. Smoothing EO parity (50 %)

Right Context vs. Smoothing EO parity (50 %) Right Context: 530 ms bounded ( 530) timing increase Smoothing: 110ms window low (+140/67 ms) timing increase

Context: Incremental Spoken Dialogue Systems ASR TTS visual output Part 2: Michaela NLU dialogue manager history response generator Incremental Natural Language Understanding domain

Incremental NLU in INPRO P2 RUBISC: Robust Unification-Based Incremental Semantic Chunker (SRSL 2009) Incremental Probabilistic Reference Resolver (submitted)

RUBISC: Incremental Chunking with Semantic Chunks Chunks based on semantic content rather than syntax Inspired by the notion of sense units (Selkirk, 1984) φ-phrases: consist of head and all its specifiers/head and all the material on the non-recursive side of the head up to the next head outside of its maximal projection (Nespor and Vogel, 1986) Sense units: up to head; semantic chunks: up to semantically relevant material

Domain (revisited) Actions: grasp, turn, flip, move Objects: w, cross,... End positions head, leg,...

Incremental Chunking

Grammar @:action @:entity:name @:entity:xpos @:entity:ypos @:end action:flipping-> spieg(le el) {flip} action:grasping,end:empty -> nimm nehme {take} entity:name:x -> kreuz plus ((das ein) x) {cross plus ((the an) x)} end:horizontal,action:flipping -> horizontal {horizontally}

Slot Unification

Making Turn-Taking Decisions Schieb das Kreuz in den Kopf des Tieres, das wie ein Elefant... Push the cross into the head of the animal, which looks like... Atterer, Baumann, Schlangen (Coling 2008): Syntactic features : don't know if sentence has ended Atterer & Schlangen (SRSL 2009): Chunker state: 3 slots filled: action:movement object:cross end:head -> can react, barge in etc

Implementation OAA-agent Supports: add, revoke, commit Evaluation: 55% frames correct, 87% slots correct

Incremental Reference Resolver Complementary agent to chunker Bayesian believe update model which treats the intended referent as a latent variable generating a sequence of observations (w 1:n is the sequence of words w 1, w 2,..., w n ): P (r w 1,...,w n ) = α P (w r, w,...,w ) P (r w,...w ) n 1 n 1 1 n 1 Normalizing factor Likelihood of the new observation (approximated) Prior at step n: posterior of previous step

Reference Resolution -- Disfluencies Psycholinguistic evidence: more hesitations when describing something hard (Tanenhaus et al., 1995; Brennan & Schober, 2001; Bailey & Ferreira, 2007; Arnold et al., 2007) Include (filled) pauses into our language model

Probabilistic Reference Resolution OAA agent demo:

Probabilistic Reference Resolution Add 'nimm':

Probabilistic Reference Resolution Add 'das':

Probabilistic Reference Resolution Add '<>':

Probabilistic Reference Resolution Add 'm':

Probabilistic Reference Resolution Revoke 'm':

Reference Resolution - Evaluation N-best approach: select all pieces above a threshold hes no hes Last belief correct: 55% 54% When correct? 88% 85% correct during ences: 37% 31% (Schlangen, Atterer, Baumann (submitted))

Context: Incremental Spoken Dialogue Systems ASR TTS visual output NLU dialogue manager response generator Part 3: Okko Incremental Dialogue Management history domain

Dialogue Manager Overview Receives incremental add/revoke/commit messages from acoustic, ASR and NLU components and allows in-utterance processing. Handles timeouts for different kinds of dialogue acts to define behaviour (domain-independent interaction management/dialog skills). Delegates actions across modalities via Action Manager

Dialogue Manager Incremental vs transaction-based DM (1) Incremental Transaction-based Events are handled asynchronously. Events are handled sequentially. All components are always active, no blocking control. Single component has control. Dialogue state and timeouts must vary dynamically. Static property set.

Dialogue Manager Incremental vs transaction-based DM (2) Transaction-based dialog timeouts timers timeout ence maxspeech ence user Take the um... X-shaped one semantics {action:- piece:-} {action:'take' piece:'x'} system Which piece? <takes the X>

Dialog Manager Incremental Timers & States (1) Timers States maxspeech preutterance Cuts off talk that leads nowhere ence Has settings for types of ence correlating to dialog states nocontent partialcontent fullcontent

Dialog Manager Incremental Timers & States (2) system: Which piece? preutterance timer Hello? user: <ence> system: Which piece? preutterance timer nocontent timer Yes? user: <ence> Erm, yes let's see... system: Which piece? nocontent timer partialcontent timer fullcontent timer Ok! user: Let's see... take the erm... x-shaped piece

Dialogue Manager Incremental vs transaction-based DM (3) Incremental dialog timeouts timers maxspeech preutterance inutterance(1) inutterance(2) postutterance user Take the um... X-shaped one semantics {action:- piece:-} {action:'take' piece:-} {action:'take' piece:'x'} system Which piece? Ok. <takes the X> And now?

Dialogue Manager Incremental vs transaction-based DM (4) Incremental dialog management has the ability to: process user input before end of utterance. reset timers and thresholds during utterance. issue clarification requests mid-utterance. enable backchannel utterances/dialog acts.

Dialogue Manager Incremental DM State Chart preutterance State onentry: Prompt onentry: start encetimer[preutterance] ontimeout: Reprompt onexit: stop timer Silence ASR/VAD nocontent State onentry: start encetimer[inutterance] onentry: start uselestalktimer[nosem] ontimeout: Reprompt onexit: stop timers Perform action Silence Chunker (slot) Chunker(revoke) fullcontent State onentry: start encetimer[postutt] ontimeout: performaction Prosody (final)/ Chunker (frame) Chunker/ Prosody (revoke) partialcontent State onentry: start encetimer[inutterance] onentry: start uselesstalktimer[sem] ontimeout: Reprompt onexit: stop timers

Dialog Management Action Management Action Management delegates output to GUI and speech production modalities. Aware of status of output modalities (busy/idle), can choose free modality if a dialog act can be embodied in either.

Summing Up Incremental component features Optimization of Incr. Output ASR TTS visual output Robust Incr. Chunking, Hesitation sensitive Incr. Statistical RefRes NLU dialogue manager response generator Status aware, Dynamic modalities history domain Dynamic thresholds for dialogue states, Backchannel

Thank You! Acknowledgements: David Schlangen DFG for funding (Emmy Noether programme)

Dialogue Manager Incremental vs transaction-based DM (3) Transaction-based timeouts (from VXML) timers timeout incomplete maxspeech complete user semantics {action:- piece:-} {action:'take' piece:'x'} system (prompts) (responds)