The Future of VoiceXML: VoiceXML 3 Overview. Dan Burnett, Ph.D. Dir. of Speech Technologies, Voxeo Developer Jam Session May 20, 2010

Similar documents
VoiceXML-Based Dialogue Systems

An Introduction to VoiceXML

VoiceXML and VoIP. Architectural Elements of Next-Generation Telephone Services. RJ Auburn

VoiceXML Tutorial. Part 1: VoiceXML Basics and Simple Forms

Web page creation using VoiceXML as Slot filling task. Ravi M H

VoiceXML versus SALT: selecting a voice

XML based Interactive Voice Response System

Avaya Aura Orchestration Designer

Dialog planning in VoiceXML

VoiceXML Overview. James A. Larson Intel Corporation (c) 2007 Larson Technical Services 1

! <?xml version="1.0">! <vxml version="2.0">!! <form>!!! <block>!!! <prompt>hello World!</prompt>!!! </block>!! </form>! </vxml>

Aspect Education Services

VoiceXML Discussion.

Using Service Oriented Architecture (SOA) for Speaker-Biometrics Applications

Standard Languages for Developing Multimodal Applications

Using VoiceXML, XHTML, and SCXML to Build Multimodal Applications. James A. Larson

Voice Processing Standards. Mukesh Sundaram Vice President, Engineering Genesys (an Alcatel company)

VoiceXML and Next-Generation Voice Services

VOICEXML TUTORIAL AN INTRODUCTION TO VOICEXML

Mobile Application Languages XML, Java, J2ME and JavaCard Lesson 03 XML based Standards and Formats for Applications

VXI* IVR / IVVR. VON.x 2008 OpenSER Summit. Ivan Sixto CEO / Business Dev. Manager. San Jose CA-US, March 17th, 2008

BeVocal VoiceXML Tutorial

Thin Client Development and Wireless Markup Languages cont. VoiceXML and Voice Portals

How To Use Voicexml On A Computer Or Phone (Windows)

Grammar Reference GRAMMAR REFERENCE 1

Information. OpenScape Contact Center Voice Portal V7.0 R2 Enable Open Dialogue, Intuitive Interaction, and Seamless Handoff

A Development Tool for VoiceXML-Based Interactive Voice Response Systems

Traitement de la Parole

RAPID VOICEXML DEVELOPMENT USING IBM S GRAPHICAL CALL FLOW BUILDER

Interfaces de voz avanzadas con VoiceXML

Emerging technologies - AJAX, VXML SOA in the travel industry

A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents

VoiceXML Programmer s Guide

JEE Web Applications Jeff Zhuk

Hosted Fax Mail. Hosted Fax Mail. User Guide

Configuration and Management of Speaker Verification Systems

Combining VoiceXML with CCXML

Best Practices in Outbound Customer Interactions

Migrating Legacy IVR Applications to VoiceXML with Voxeo The advantages of a 100% VoiceXML compliant platform

Voice User Interface Design

Building A Self-Hosted WebRTC Project

Vocalité Version 2.4 Feature Overview

Open Source VoiceXML Interpreter over Asterisk for Use in IVR Applications

PROPHECY. Unlocked Communications Customer Obsession Teams Communications Passion

Voice Tools Project (VTP) Creation Review

Position Paper for The Fourth W3C Web and TV Workshop. Mingmin Wang Oriental Cable Network

Shared VRU. A Key Link in Your Customer Service Chain Kyle Shadday, Director, Voice Response Strategy

Developing Usable VoiceXML Applications

Personal Voice Call Assistant: VoiceXML and SIP in a Distributed Environment

AN EXTENSIBLE TRANSCODER FOR HTML TO VOICEXML CONVERSION

Building Applications with Vision Media Servers

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

VocalPassword : voice biometrics authentication.

PowerCAMPUS Portal and Active Directory

INCITS 456 Speaker Recognition Format for Raw Data Interchange (SIVR-1)

5.1 Features Denver CO 80202

JW Player Quick Start Guide

Cisco IOS VoiceXML Browser

Deploying Cisco Unified Contact Center Express Volume 1

VoiceXML. For: Professor Gerald Q. Maguire Jr. By: Andreas Ångström, and Johan Sverin, Date:

Cisco IOS Voice XML Browser

NeoIVR. Flexible & high performance IVR platform

ECMA TR/88. 1 st Edition / June Designing an Object Model for ECMA-269 (CSTA)

Application Notes for Speech Technology Center Voice Navigator 8 with Avaya Aura Experience Portal Issue 1.0

Dialogic PowerMedia XMS VoiceXML

Speaker Identification and Verification (SIV) Introduction and Best Practices Document

Envox CDP 7.0 Performance Comparison of VoiceXML and Envox Scripts

Cisco IOS Voice XML Browser

NSP Software Summit: Next Generation Voice Messaging - A Key to your Success. Alain Decartes Business Development Manager WW SGBU Sales

Presentation / Interface 1.3

Design and Functional Specification

Design Grammars for High-performance Speech Recognition

Description: Objective: Upon completing this course, the learner will be able to meet these overall objectives:

Effective Call Center Automation and Agent Support

NASA Workflow Tool. User Guide. September 29, 2010

Easy configuration of NETCONF devices

VOXEO CXP. Application Lifecycle Management for Next-Generation Customer Contact

VoiceXML Data Logging Overview

Voice XML: Bringing Agility to Customer Self-Service with Speech About Eric Tamblyn Voice XML: Bringing Agility to Customer Self-Service with Speech

World-wide online monitoring interface of the ATLAS experiment

Blackboard Web Community Manager WCAG 2.0 Support Statement February 2016

Cisco CallManager 4.1 SIP Trunk Configuration Guide

Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper

Developing Physical Solutions for InfoSphere Master Data Management Server Advanced Edition v11. MDM Workbench Development Tutorial

Nuance VocalPassword v9 :: Product Description

Transcription:

The Future of VoiceXML: VoiceXML 3 Overview Dan Burnett, Ph.D. Dir. of Speech Technologies, Voxeo Developer Jam Session May 20, 2010

Todayʼs topics V3 Motivations MVC, or DFP V3 is a presentation language V3 architecture New stuff Multimodal use of V3 2

V3 motivations Extensibility FIA flexibility New features Better integration with other W3C languages 3

MVC, or DFP Model-View-Controller paradigm Model is the data being stored/tracked View is the presentation Controller is the piece that controls app flow VBWG: Data-Flow-Presentation Data held in any XML format Flow handled by SCXML, CCXML, or other such language Presentation is HTML, Wiimote vibration, V3 4

V3 is a presentation language As a presentation language The Purpose of VoiceXML 3: To interact with a person by voice (and perhaps video) NOT the Purpose of VoiceXML 3: To be an efficient and effective data repository NOT the Purpose of VoiceXML 3: To efficiently describe business logic However, Voice interaction requires some flow control and some data So limited support for both is still included 5

V3 architecture Core functionality defined in modules Modules combined with convenience syntax into profiles 6

Core functionality defined in modules Module behavior defined precisely as state machines 7

Modules + Conv. Syntax = Profiles Modules grouped into profiles Legacy (V2.1), Basic, Maximal Convenience syntax simplifies authoring 8

Convenience syntax New elements and attributes, but no new functionality Behavior defined in terms of core functionality For example, <menu> defined in terms of <form> with grammars and prompts 9

Convenience syntax example Note: Work in progress!!!! <!-- convenience syntax version --> <vxml...> <menu> <prompt> Say one of: <enumerate/> </prompt> <choice next="#operator"> operator </choice> <choice next="http://sports.example.com/start.vxml"> sports </choice> <choice next="http://weather.example.com/intro.vxml"> weather </choice> </menu>... </vxml> 10

Convenience syntax example (cont.) Note: Work in progress!!!! <!-- previous code behaves as if it were this --> <vxml...> <form> <link next="#operator"> <grammar...> <rule...> operator </rule> </grammar> </link> <link next="http://sports.example.com/start.vxml"> <grammar...> <rule...> sports </rule> </grammar> </link> <link next="http://weather.example.com/intro.vxml"> <grammar...> <rule...> weather </rule> </grammar> </link> <field name= "main"> <prompt> Say one of: operator, sports, weather.</prompt> </field> </form>... </vxml> 11

Convenience syntax Definite candidates are menu/choice/enumerate/option error/help/noinput/nomatch shortcuts link Possible (but different) candidates might be if/else/elseif (using SCXML) transfer (using CCXML) 12

New stuff Author-specifiable transition controllers V2 eventing model now async & compatible with DOM Level 3 New media, SIV functions Session root documents Real-time controls 13

Transition controllers Inter-element transitions now under author control Controllers at form, document, application, and perhaps session levels e.g. form controller specifies which form item to execute next Controllers can be in SCXML or another flow control language Default controllers will give FIA behavior in Legacy Profile 14

Transition controllers example 1 <!-- document-level transition controller controls inter-form transitions --> <vxml...> <controller...> <scxml:scxml version="1.0"...> <!-- SCXML code determining which form to go to next --> </scxml> </controller> <form id="form_a" >... <goto next="form_b"/> <!-- goto is only a suggestion now --> </form> <form id="form_b" >... </form>... </vxml> Note: Work in progress!!!! 15

Transition controllers example 2 Note: Work in progress!!!! <!-- form-level transition controller controls inter-field transitions --> <vxml...> <form> <controller src= "myformbehavior.scxml"> <field name="field_a" >... </field> <field name="field_b" >... </field> <field name="field_c" >... </field> <field name="field_d" >... </field> </form>... </vxml> 16

V2 Events/DOM Level 3 events V2 event handlers: defined in XML code in the document tree; complex event handling (e.g., prompt counters) HTML event handlers: typically ECMAScript, can be attached to any element in the tree, DOM 3 event handling V3 events & handling: same as V2, but event handling now defined as specific use case of DOM 3 Events -- may allow multi-markup documents underlying event model now asynchronous 17

New functionality -- video Video -- <audio> replaced by <media>, which allows both audio and video <media type="audio/x-wav" src="http://www.example.com/resource.wav"/> <media type="video/3gpp" src="http://www.example.com/resource.3gp"/> <media> <!-- inline SSML with audio media fallback--> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"> Ich bin ein Berliner. </speak> <media type="audio/x-wav" src="ichbineinberliner.wav"> </media> 18

New functionality -- media control Media control -- media clipping, speed, and volume control now possible without resorting to SSML <media type="audio/x-wav" soundlevel="+6.0db" speed="50%" repeatcount= "2" src="http://www.example.com/resource.wav"/> <media type="video/3gpp" clipbegin= "2s" clipend="5s" repeatdur="25s" src="http://www.example.com/resource.3gp"/> 19

New functionality -- SIV SIV -- speaker authentication capabilities available as core functionality Enrollment -- creates voice model, associates it with id in speaker database Identification -- which voice model in speaker database is a match for the speech? Verification -- for the claimed id, does the speech match the voice model in the speaker database? 20

SIV process Verification process similar to speech recognition SRGS Voice model ASR Engine SIV Engine Result to VXML Result to VXML 21

SIV process (cont.) SIV process different from speech recognition? ASR -- single turn, possibly with repeats SIV -- multi-turn state info must be maintained Specification of SIV speaker database introduces different kinds of errors 22

SIV summary Lots of stuff to work out 23

New control -- session root Just like application root <vxml session="blahblah.vxml"...> Well, not exactly If not specified, no session root Session root change is ignored or causes error First, let s review application roots 24

Application root review A: <vxml> B: <vxml> C: <vxml root="b"> D: <vxml root="e"> F: <vxml root="e"> G: <vxml> AppRoot A AppRoot B AppRoot B AppRoot E AppRoot E AppRoot G 25

Session root A: <vxml> B: <vxml session="c"> D: <vxml> E: <vxml session="f" > G: <vxml session="h" requiresession="true"> No Session Root Session Root C Session Root C Session Root C error.badfetch 26

Real-time controls Special grammars that are always active (not just in the wait state) Allows arbitrary speech/dtmf Immediate: volume, speed, skip At next event processing: cancel, goto <form> <rtc grammar="digit3.grxml" action="volume" params="+5"/> <field name="a">... </field> <field name="b"> <cancelrtc grammar= "digit3.grxml "/>... </field> Note: Work in progress!!!! </form> Acts as pre-filter on input stream, replacing matches with silence 27

Multimodal use of V3 Loose integration, e.g. SCXML as controller (separate document/process) send/receive events optional MMI lifecycle events Tight integration, e.g. HTML as controller (same document/process) Direct DOM event handlers attached to both V3 and HTML elements for control 28

SCXML as controller <!-- BankApp.vxml --> <vxml version="3.0"> <form id="getaccountnum"> <field name="accountnum"> <grammar src= accountnum.grxml" type="application/grammar+xml"/> <prompt> Please tell me your account number. </prompt> <filled> <exit namelist="accountnum"/> </filled> </field> </form>... </vxml> 29

SCXML as controller <!-- bankapp.scxml --> <scxml...> <state id="getaccountnum"> <invoke targettype="vxml3" src="bankapp.vxml#getaccountnum" /> <transition event="vxml3.gotaccountnum" target="getbalance"/> </state> <state id="getbalance"> <datamodel> <data name="method" expr="'getbalance'"/> <data name="accountnum" expr="_data.accountnum"/> </datamodel> <send targettype="basichttp" target="bankdb.do" namelist="method accountnum" /> <transition event="basichttp.gotbalance" target="playbalance"/> </state> <state id="playbalance">... </state> <final id="exit"/> </scxml> 30

HTML as controller <!-- Note: code dreamed up in my head -- never reviewed by W3C --> <html>... vxmlaccountnumelement.addeventlistener('done',dosomething,false); function dosomething() { this.style.backgroundcolor = '#cc0000'; }... <vxml> <form> <field name="accountnum"> <grammar src= accountnum.grxml" type="application/grammar+xml"/> <prompt> Please tell me your account number. </prompt> <filled> <throw event="done"/> </filled> </field> </form> </vxml> 31

For More Info Follow the work http://www.w3.org/voice Contact me dburnett at voxeo dot com Dan Burnett, Ph.D. Dir. of Speech Technologies, Voxeo Developer Jam Session May 20, 2010 32

Customer Summit 2010 Register today to reserve your All access pass for voxeo s customer summit june 21-23 in the hard rock hotel at universal orlando www.voxeo.com/summit2010

Next Voxeo Developer Jam Session Unified Self-Service: Creating multi-channel communications apps using Voxeo tools Speaker: Adam Kalsey Manager, Voxeo Development Network Date: June 17, 2010 Time: 8 am PDT, 11 am EDT, 5 pm EST 33