VoiceXML-Based Dialogue Systems



Similar documents
An Introduction to VoiceXML

Standard Languages for Developing Multimodal Applications

Dialog planning in VoiceXML

VoiceXML Overview. James A. Larson Intel Corporation (c) 2007 Larson Technical Services 1

! <?xml version="1.0">! <vxml version="2.0">!! <form>!!! <block>!!! <prompt>hello World!</prompt>!!! </block>!! </form>! </vxml>

VoiceXML Programmer s Guide

VoiceXML and Next-Generation Voice Services

VoiceXML Tutorial. Part 1: VoiceXML Basics and Simple Forms

Mobile Application Languages XML, Java, J2ME and JavaCard Lesson 03 XML based Standards and Formats for Applications

How To Use Voicexml On A Computer Or Phone (Windows)

VOICEXML TUTORIAL AN INTRODUCTION TO VOICEXML

The Future of VoiceXML: VoiceXML 3 Overview. Dan Burnett, Ph.D. Dir. of Speech Technologies, Voxeo Developer Jam Session May 20, 2010

Voice User Interface Design

VoiceXML versus SALT: selecting a voice

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

VoiceXML Discussion.

XML based Interactive Voice Response System

Thin Client Development and Wireless Markup Languages cont. VoiceXML and Voice Portals

Grammar Reference GRAMMAR REFERENCE 1

Web page creation using VoiceXML as Slot filling task. Ravi M H

BeVocal VoiceXML Tutorial

Open Source VoiceXML Interpreter over Asterisk for Use in IVR Applications

VoiceXML. For: Professor Gerald Q. Maguire Jr. By: Andreas Ångström, and Johan Sverin, Date:

Traitement de la Parole

A Development Tool for VoiceXML-Based Interactive Voice Response Systems

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

Model based development of speech recognition grammar for VoiceXML. Jaspreet Singh

Voice Tools Project (VTP) Creation Review

Envox CDP 7.0 Performance Comparison of VoiceXML and Envox Scripts

Interfaces de voz avanzadas con VoiceXML

Avaya Aura Orchestration Designer

Dialogic PowerMedia XMS VoiceXML

Combining VoiceXML with CCXML

VoiceXML and VoIP. Architectural Elements of Next-Generation Telephone Services. RJ Auburn

Voice Driven Animation System

Migrating Legacy IVR Applications to VoiceXML with Voxeo The advantages of a 100% VoiceXML compliant platform

AN EXTENSIBLE TRANSCODER FOR HTML TO VOICEXML CONVERSION

A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents

How To Develop A Voice Portal For A Business

Moving Enterprise Applications into VoiceXML. May 2002

VoiceXML Data Logging Overview

Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper

VXI* IVR / IVVR. VON.x 2008 OpenSER Summit. Ivan Sixto CEO / Business Dev. Manager. San Jose CA-US, March 17th, 2008

Specialty Answering Service. All rights reserved.

Emerging technologies - AJAX, VXML SOA in the travel industry

Develop Software that Speaks and Listens

Dialogic PowerMedia XMS VoiceXML

VOICE INFORMATION RETRIEVAL FOR DOCUMENTS. Except where reference is made to the work of others, the work described in this thesis is.

IVR CRM Integration. Migrating the Call Center from Cost Center to Profit. Definitions. Rod Arends Cheryl Yaeger BenchMark Consulting International

9RLFH$FWLYDWHG,QIRUPDWLRQ(QWU\7HFKQLFDO$VSHFWV

Personal Voice Call Assistant: VoiceXML and SIP in a Distributed Environment

Christian Leibold CMU Communicator CMU Communicator. Overview. Vorlesung Spracherkennung und Dialogsysteme. LMU Institut für Informatik

Design Grammars for High-performance Speech Recognition

Speech Recognition of a Voice-Access Automotive Telematics. System using VoiceXML

Developing Usable VoiceXML Applications

Presentation / Interface 1.3

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition

Oct 15, Internet : the vast collection of interconnected networks that all use the TCP/IP protocols

WebSphere Voice Server for Multiplatforms. VoiceXML Programmer s Guide

Voice XML: Bringing Agility to Customer Self-Service with Speech About Eric Tamblyn Voice XML: Bringing Agility to Customer Self-Service with Speech

Using Service Oriented Architecture (SOA) for Speaker-Biometrics Applications

Building Applications with Vision Media Servers

Cisco IOS VoiceXML Browser

WebSphere Business Monitor

Version 2.6. Virtual Receptionist Stepping Through the Basics

Dialogos Voice Platform

Abstract. Avaya Solution & Interoperability Test Lab

Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling

HOPS Project presentation

Hosted Fax Mail. Hosted Fax Mail. User Guide

10CS73:Web Programming

Phone Routing Stepping Through the Basics

1. Introduction to Spoken Dialogue Systems

Creating a low cost VoiceXML Gateway to replace IVR systems for rapid deployment of voice applications.

Description: Objective: Upon completing this course, the learner will be able to meet these overall objectives:

Integrating VoiceXML with SIP services

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Voice Processing Standards. Mukesh Sundaram Vice President, Engineering Genesys (an Alcatel company)

Aspect Education Services

Introduction to XML Applications

CCXML & the Power of Standards-Based Call Control E X E C U T I V E B R I E F I N G M A R C H

Multimodal Web Content Conversion for Mobile Services in a U-City

Ecma/TC32-TG11/2012/005. What is CSTA? CSTA Overview. Updated by TG11 April 2012

ECMA TR/88. 1 st Edition / June Designing an Object Model for ECMA-269 (CSTA)

Transcription:

VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno

Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2

Computer based system Dialogue System (DS) Communicates with the user by means of natural language Spoken or written form Interactive system 3

4 Dialogue System Structure

DS Components Automatic Speech Recognition Transforms speech to text Two basic types Grammar-based ASR The set of accepted phrases defined by regular/context-free grammars (i.e. language model in the form of a grammar) Usually speaker independent Dictation machine Recognizes any utterance N-gram language model Often speaker dependent 5

DS Components Natural Language Understanding Analyzes textual utterance and returns its formal semantic representation Logical formula Name/value pairs When using a grammar-based ASR, semantics is usually encoded in the grammar (referred to as semantic-based grammar) 6

DS Components Dialogue Manager Coordinates activity of all components Maintains representation of the current state of the dialogue Communicates with external applications Decides about the next dialogue step 7

DS Components Natural Language Generation Produces a textual utterance (so called surface realization) from an internal (formal) representation of the answer The surface realization can include formatting information Speaking style, prosody, pauses Earcons Background sounds... 8

DS Components Text-to-Speech Renders an acoustic representation of the surface realization 9

DS Taxonomy According to the Dialogue Management Approach Finite-state dialogue systems Dialogue expressed as a network of states connected by edges Frame-based dialogue systems Agent-based dialogue systems Communication modelled as an interaction of two intelligent agents that are able to reason Agents have some mental attitudes, e.g. beliefs, desires, intentions and goals 10

Frame-Based Dialogue Systems Based on the slot-filling concept Slots represent containers for information that must be elicited from the user Semantics is expressed as name/value pairs Slots are stored in a structure called frame Manageable with the current level of technology 11

Frame-based Dialogue Systems (2) 12 Prompt: Where and when do you want to travel? Grammar: <departure and arrival city, date and time specification> Help: Please specify the departure and arrival city, date and time FROM Prompt: From which city are you leaving? Grammar: <city specification> Help: Tell me the name of the city you want to leave from TO Prompt: To which city do you want to travel? Grammar: <city specification> Help: Tell me the name of the city you want to travel to WHEN Prompt: When do you want to travel? Grammar: <date and time specification> Help: Please specify date and time of your journey Filled: SELECT * FROM connections WHERE departure like 'FROM' AND destination like 'TO' AND time like 'WHEN'

Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 13

VoiceXML Markup language with XML syntax designed for creating audio dialogues featuring Speech recognition and DTMF input Recording of spoken input Speech synthesis and digitized audio playback Mixed initiative conversation Declarative with procedural parts W3C Voice Browser Activity W3C Voice Browser Working Group 14 Applying Web technology to enable users to access services from their telephone via a combination of speech and DTMF

Application Architecture Application layer Presentation layer User DB App logic HTML VoiceXML Voice Browser HTML Browser Phone 15

VoiceXML (2) W3C Speech Interface Framework VoiceXML 2.0 (VXML) dialogue management Speech Recognition Grammar Specification (SRGS) grammar syntax Semantic Interpretation for SR (SISR) process of semantic interpretation of utterances Speech Synthesis Markup Language (SSML) description of the utterance surface realization Call Control XML (CCXML) telephony support http://www.w3.org/voice/ 16

17 Voice Browser

VoiceXML Platform http://www.optimsys.cz/ Free for private and educational use and for non-profit research Extremely modular and extensible Can work on a desktop computer with microphone and speakers Special features for research 18

VoiceXML Document Structure Form 1 Form Item 1 Form Item 2... Form Item N Document... Form N Form Item 1 Form Item 2... Form Item N 19

Form items <block> contains executable code <field> gathers input form the user Form Items <initial> defines first step of a mixed-initiative conversation <subdialog> calling subrutine (mechanism for reusing common dialogs) <record> records user's input <object> calls platform specific extensions <transfer> transfers to another phone number 20

Variables and Scripts VoiceXML exploits ECMAScript Each variable in VoiceXML is an ECMAScript variable (types: undefined, null, bool, integer, double, string, object) Each expression in VoiceXML is an ECMAScript expression Variable is declared using the <var> tag (e.g. <var name= age expr= 20 >) Variable can be later assigned value using the <assign> tag (e.g. <assign name= age expr= age+1 >) 21

Variables and Scripts (2) Each variable must be declared before it is used The <script> tag can contain or refer to some ECMAScript code Declaration of a variable in a script is equivalent to the declaration using the <var> tag 22

23 Form Interpretation Algorithm (FIA) Applies to one form, no implicit transition between forms is performed by the interpreter Each form item has an attribute name specifying name of a variable that is declared by the interpreter and associated with the form item (if the attribute is missing, a name is generated internally) At the beginning, the variable value is undefined (so called unvisited form item) FIA finds the first unvisited form item in document order and interprets it Interpretation of each form item leads under normal cirstumcances to filling in a value into its associated variable FIA ends when there are no more unvisited form items or if the interpretation is explicitly transited to another <form>

Block of executable code Form Item <block> 24 Before <block> is interpreted, its associated variable is set to true to prevent it from being visited again Interpreting <block> means to execute code contained in the <block>, in particular <goto> transition among form items/forms/documents <if><elseif><else> conditional execution <prompt> prompt said to the user <assign> <script>

Gathers input form the user Contains specification of Form Item <field> Prompts that should be spoken to the users tag <prompt> Grammars that define set of acceptable user s utterances and their semantics tag <grammar> Actions that should be performed when the form item variable is filled (tag <filled>) 25

<prompt> Content of the tag is the SSML language <audio> plays an audio file <emphasis> emphasises its content <break> places a break into the speech <voice> sets voice parameters <prosody> influences prosody of the speech 26

<grammar> Content of the tag is the SRGS language, content of the semantic tags is the SISR language SRGS has two equivalent forms XML form and ABNF form If user's utterance matches the grammar, the corresponding parse tree is used for semantic interpretation 27

<grammar> (2) #ABNF 1.0 UTF-8; language en; mode dtmf; root $main; public $main = {$=0;} ($digit {$=$*10+parseInt($digit)})<1-10>; $digit = 1 2 3 4 5 6 7 8 9 0; Utterance 123 28 $main $=0 $digit $=$*10+parseInt($digit) $digit $=$*10+parseInt($digit) $digit $=$*10+parseInt($digit) 1 2 3

<filled> Descendant of <form> or a form item Content of the <filled> tag is executable code 29 After interpreting a form item, the interpreter iterates through all <filled> in document order and executes that ones which fire. There is no priority of form descendants and form item descendatns! <filled> can fire if a specified combination of form items has been filled (and at least one in the last dialogue step) or any of specified form items has been filled

Events Events are named objects that are generated as reaction to the occurrence of a particular situation or condition User does not respond (noinput) User s response is not intelligible (nomatch) An error occurred (error.*) Explicitely thrown using the <throw> tag Events are caught by event handlers (<catch event= name >, <nomatch>, <noinput>, <error>, <help>) Event handlers contain executable code 30

Links The voice analogy to HTML hyperlinks <link> contains <grammar>s When a grammar contained in the link is matched, the action defined by the link is performed The action can be Transition to a place in a document Throwing an event with specified name 31

Scopes Document <vxml> Dialog <form> Form items <catch> <filled> 32 Scopes can contain in general Grammars Event handlers Variables and scripts Not all the items are allowed in each of the scopes Grammars are matched and event handlers and variables searched from the most inner scope

Counters Each form item and each event has associated a counter The counters are reset when the form is entered and increased each time the prompt in the form item is spoken / the event is thrown This allows for tapered prompting and different reactions on repeatedly occuring events 33

Mixed Initiative Dialogues (Form Item <initial>) 34 <initial> defines prompts and event handlers used for opening the dialogue When interpreted Computer speaks the prompts (an open ended question) User answers with an utterance. User has a higher freedom of utterance formulation and can fill multiple form items in a single step Form-level (and higher level) grammars are active to match the utterance Form-level <filled> sections triggered by a combination of filled form items are executed Values of unfilled form items are elicited in a directed dialogue in next iterations of the FIA

Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 35

Dialogue Strategies The task of the dialogue strategy is to decide what the next step of the dialogue will be The decision is made based on Semantics of the last user's utterance History of the conversation Knowledge of the domain Knowledge of the user (user model) 36

Dialogue Strategies (2) Local dialogue strategies Control subdialogues with the aim of eliciting values of one or several slots or a special command from the user Global dialogue strategies Process the newly filled slots and plan the continuation of the dialogue on the global level 37

Local Dialogue Strategies Request for help No input from the user Rejection of utterance by the speech recognizer (no match) Information about current context Repetition of last system prompt (reprompt) Pause/Resume Restart Transfer to operator 38

Global Dialogue Strategies Confirmation strategy No confirmation Implicit confirmation Explicit confirmation 39 Selection based on the degree of certainty that the recognized information is correct Problem: How to recognize that the user corrects a recognition error instead of giving new information

Global Dialogue Strategies (2) Integration of the newly acquired information Filled only empty slots Entered identical values for previously filled slots Entered new values for previously filled slots Problem: which value is valid The new value refine the previously entered value (e.g. less than 15000 + more than 10000) 40

Global Dialogue Strategies (3) Relaxing of overconstrained requests No solution satisfying the criteria specified by the user (no suitable object, no item in the database etc.) The value of a slot must be erased/changed Problem: how to select an appropriate candidate 41

Global Dialogue Strategies (4) Dialogue initiative control As long as the dialogue proceeds well, the dialogue system leaves initiative to the user The user can use a large scale of utterances and the conversation can cover a larger part of the domain (i.e. several slots in one dialogue step) When problems arise, the dialogue system must control the conversation and ask more focused questions Problem: How to recognize that problems arose 42

Global Dialogue Strategies (5) Conversational focus selection Problem: which slot should be selected as the topic of conversation (if the conversation is controlled by the system) Important question the order in which slots are discussed can significantly reduce the length of the dialogue 43

Confirmation strategy Global Dialogue Strategies Summary Integration of the newly acquired information Relaxing of overconstrained requests Dialogue initiative control Conversational focus selection 44

Domain Representation in Frame-based DS Domain (task) represented by Structure of the frame Conditions for filling slot values Relations among slot values Slot priority Other possibilities Control table Path constraints 45

FROM TO WHEN Flat Suitable for simple domains Frame as a Form The flat structure can be compensated by richer relations among slot values and conditions for filling slots 46

Hierarchical frame Reflects internal structure of the task Example: Airplane ticket reservation domain AND DATE OR AND FL. NUM FROM TO 47