VoiceXML Discussion

Specification: http://www.w3.org/tr/voicexml20/

Voice Extensible Markup Language (VoiceXML):
- is a markup-based, declarative programming language for creating speech-based telephony applications;
- supports dialogs that feature synthesised speech, digitised audio, recognition of spoken and DTMF key input, recording of audio, mixed-initiative conversations, and basic telephony call control;
- enables Web-based development and content delivery paradigms to be used with interactive voice response (IVR) applications;
- delivers an easy-to-use language that also permits the creation of complex dialogs;
- promotes application portability across platforms by providing a common language for platform, tool and application providers alike;
- shields application developers from low-level, platform-specific details;
- separates user interaction code (i.e. VoiceXML) from service logic (e.g. PHP, Java); and
- reuses existing web back-end infrastructure.

Figure: Standard VoiceXML architectural model
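To make the separation of interaction code and service logic concrete, here is a minimal sketch. The grammar file account.grxml, the field name accountno and the server script cgi/lookup_account.php are hypothetical. The VoiceXML document only collects the caller's input; the server-side script (PHP, Java or any other back-end language) would do the lookup and return the next VoiceXML document as its HTTP response, in the same way a web application returns HTML.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-ie">
  <form id="account">
    <field name="accountno">
      <prompt>Please say your account number.</prompt>
      <!-- Hypothetical grammar file for spoken account numbers -->
      <grammar src="account.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Hand the value to a hypothetical server-side script; its
             HTTP response must itself be a VoiceXML document -->
        <submit next="cgi/lookup_account.php" namelist="accountno" method="get"/>
      </filled>
    </field>
  </form>
</vxml>
```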
Document Structure

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-ie">
  <form>
    <block>Welcome to the world of VoiceXML!</block>
  </form>
</vxml>

Core to VoiceXML are the concepts of sessions, applications and dialogs.

A session begins when the user starts interacting with the interpreter context, continues as documents are loaded, executed and unloaded, and terminates when requested by the user (e.g. a hang-up), by the document (e.g. through an explicit <exit>), or by the interpreter context (e.g. because of an unhandled error).

An application is a set of documents that share the same application root document. The application root document remains loaded for as long as each document the caller moves to names its URI in the application attribute of the <vxml> element; it is unloaded when a transition occurs to a document that does not specify this application root document. While it is loaded, the application root document is used for storing data (application-scoped variables) during the lifetime of the application. The application root document can also specify grammars that will be
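active throughout the lifetime of the application.

There are two kinds of dialog: forms and menus. A form is the more powerful of the two constructs and is represented by the <form> element. A form defines the interaction that collects values for its set of form items. Menus, on the other hand, are a simpler but more restrictive construct. A menu is represented by the <menu> element and contains a number of alternative choices (represented by <choice> elements); matching a choice results in a transition to the corresponding new dialog.

Dialogs

1. Forms

Forms in VoiceXML are declarative in nature: one defines the set of input items that can be filled, a set of grammars to match against, the prompts to play at different points, and a set of event handlers and filled actions. The Form Interpretation Algorithm (FIA) is at the core of the VoiceXML execution model. Its main loop repeatedly selects the next form item whose variable is still undefined (select phase), plays that item's prompts and waits for input or an event (collect phase), and then handles the result by running <filled> actions or event handlers (process phase).

Form items and their purpose:
- <field>: an input item whose value is obtained via speech or DTMF grammars.
- <block>: a sequence of procedural statements used for prompting and computation, but not for gathering input.
- <initial>: in mixed-initiative forms, prompts for form-wide information before the user is prompted field by field.
- <subdialog>: roughly like a function call; invokes another dialog and returns its result.
- <object>: invokes a platform-specific object.
- <record>: records audio from the user.
- <transfer>: transfers the caller to another telephone number.

The following simple dialog might appear as part of a cinema booking service. It asks the user which movie they would like to see and how many tickets they would like to purchase.

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="tickets">
    <block>Welcome to EasyTicket!</block>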

    <field name="movie">
      <prompt>Which movie would you like tickets for?</prompt>
      <grammar src="movie.grxml"/>
    </field>

    <field name="number">
      <prompt>How many tickets would you like?</prompt>
      <grammar src="number.grxml"/>
    </field>

    <!-- Once both fields are filled, send the values to the server -->
    <filled namelist="movie number" mode="all">
      <submit next="book.vxml" namelist="movie number"/>
    </filled>

    <catch event="noinput">
      Sorry, I did not hear you. Please try again.
    </catch>
    <catch event="nomatch">
      Sorry, I did not understand. Please try again.
    </catch>
  </form>
</vxml>

Platform: Welcome to EasyTicket! Which movie would you like tickets for?
Caller: Vampire Movie II.
Platform: How many tickets would you like?
Caller: <silence>
Platform: Sorry, I did not hear you. Please try again.
Caller: Four.

Improvement

After a catch handler runs, the FIA does not replay the field's prompt unless the handler ends with <reprompt/>. Adding <reprompt/>, together with tapered prompts selected by the count attribute, gives the caller more help on the second attempt:

<catch event="noinput">
  Sorry, I did not hear you. Please try again.
  <reprompt/>
</catch>

<field name="movie">
  <prompt count="1">
    Which movie would you like tickets for?
  </prompt>
  <prompt count="2">
    Please choose from the following movies: Vampire Movie II, A Day in the Life, or My Favourite Story.
  </prompt>
  <grammar src="movie.grxml"/>
</field>

Platform: Welcome to EasyTicket! Which movie would you like tickets for?
Caller: <silence>
Platform: Sorry, I did not hear you. Please try again. Please choose from the following movies: Vampire Movie II, A Day in the Life, or My Favourite Story.
Caller: A Day in the Life.
Platform: How many tickets would you like?
Caller: Five.

2. Menus
A menu is composed of a number of choices, each represented by a <choice> element. The content of a <choice> element is a word or phrase that is used both to prompt the user for that choice and to automatically generate a grammar for the menu. A <menu> element usually contains a <prompt> element that tells the user what the choices are; the <enumerate> element, used within a menu's prompts and catch handlers, lists the choices automatically. With accept="approximate", the caller may say a subset of a choice's words (e.g. "bookings") rather than the exact phrase.

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu accept="approximate">
    <prompt>
      Welcome to Cinema One. Please say one of: <enumerate/>
    </prompt>
    <choice next="showtimes.vxml"> Show times </choice>
    <choice next="easyticket.vxml"> Ticket bookings </choice>
    <catch event="noinput">
      Sorry, I did not hear you. Please say one of: <enumerate/>
    </catch>
    <catch event="nomatch">
      Sorry, I did not understand. Please say one of: <enumerate/>
    </catch>
  </menu>
</vxml>

3. Mixed initiative dialogs

In a mixed-initiative form, a form-level grammar lets the caller fill several fields in a single utterance. The <initial> element asks the open question; if recognition fails twice, the dialog falls back to asking for one piece of information at a time.

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="tickets">
    <!-- Form-level grammar: can fill origin and dest from one utterance -->
    <grammar src="cities.grxml"/>

    <initial name="init">
      <prompt>
        Let's start the ticket booking process. Where would you like to fly from and to?
      </prompt>
      <catch event="nomatch" count="1">
        I did not catch that. Please say something like: I would like to fly from Dublin to Paris.
      </catch>
      <catch event="nomatch" count="2">
        Sorry, I still do not understand. Let's try by asking for one piece of information at a time.
        <!-- Setting init stops <initial> being revisited; <reprompt/> makes
             the FIA play the next field's prompt after this handler -->
        <assign name="init" expr="true"/>
        <reprompt/>
      </catch>
    </initial>

    <field name="origin">
      <prompt>Which city are you flying from?</prompt>
    </field>
    <field name="dest">
      <prompt>Which city are you flying to?</prompt>
    </field>

    <filled namelist="origin dest" mode="all">
      Checking availability...
      <submit next="check_avail.vxml" namelist="origin dest" fetchaudio="ticktock.wav"/>
    </filled>
  </form>
</vxml>

Conversation 1:
Platform: Let's start the ticket booking process. Where would you like to fly from and to?
Caller: I would like to fly from Berlin to Rome.
Platform: Checking availability...

Conversation 2:
Platform: Let's start the ticket booking process. Where would you like to fly from and to?
Caller: I would like to fly from Berlin.
Platform: Which city are you flying to?
Caller: Rome.
Platform: Checking availability...

Other Topics

Media playback - VoiceXML supports media playback through the <prompt> element, whose content is marked up with SSML (Speech Synthesis Markup Language): text is rendered by speech synthesis, and the <audio> element plays prerecorded audio.

Media recording - Recording is performed with the <record> input item, and the captured audio is stored in the corresponding form item variable (here, greeting). From there it can be played back with <audio expr="..."/> or uploaded to a server with <submit>:

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- Record up to 7 s; stop after 3 s of silence or on any DTMF key -->
    <record name="greeting" beep="true" dtmfterm="true" maxtime="7s" finalsilence="3000ms" type="audio/x-wav">
      <prompt>Please speak your greeting after the beep.</prompt>
    </record>

    <field name="confirm">
      <!-- Assumes yesno.grxml returns a boolean -->
      <grammar src="yesno.grxml"/>
      <prompt>
        Your greeting is <audio expr="greeting"/>. Would you like to keep it?
      </prompt>
      <filled>
        <if cond="confirm == true">
          <!-- Upload the recording to the server -->
          <submit next="store.php" namelist="greeting" enctype="multipart/form-data" method="post"/>
        <else/>
          <!-- Clear all form items so that the greeting is recorded again -->
          <clear/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

Flow control

Executable content

Speech and DTMF recognition - User input, in the form of speech or DTMF key presses, is accepted during the collect phase of the FIA when a <menu> is executed or when any of the following form items is executed: <field>, <initial>, <record>, <transfer>.
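As a minimal sketch of speech and DTMF recognition working side by side, the field below accepts either a spoken phrase or a key press, then uses executable content (<if>, <elseif>) and flow control (<goto>) to decide where to go. The inline grammars, the form id cinema and the field name option are assumptions for illustration; the target documents reuse the names from the Cinema One menu above.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="cinema">
    <field name="option">
      <prompt>
        For show times, say show times or press 1.
        For ticket bookings, say ticket bookings or press 2.
      </prompt>

      <!-- Hypothetical speech grammar: each phrase returns a keyword -->
      <grammar mode="voice" version="1.0" root="spoken" tag-format="semantics/1.0">
        <rule id="spoken">
          <one-of>
            <item>show times<tag>out="times";</tag></item>
            <item>ticket bookings<tag>out="tickets";</tag></item>
          </one-of>
        </rule>
      </grammar>

      <!-- Hypothetical DTMF grammar: keys 1 and 2 map to the same keywords -->
      <grammar mode="dtmf" version="1.0" root="keys" tag-format="semantics/1.0">
        <rule id="keys">
          <one-of>
            <item>1<tag>out="times";</tag></item>
            <item>2<tag>out="tickets";</tag></item>
          </one-of>
        </rule>
      </grammar>

      <filled>
        <!-- Executable content chooses the destination; <goto> performs the transition -->
        <if cond="option == 'times'">
          <goto next="showtimes.vxml"/>
        <elseif cond="option == 'tickets'"/>
          <goto next="easyticket.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Because both grammars return the same keywords, the <filled> logic does not need to know whether the caller spoke or pressed a key.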
Application Example

A corporate auto-attendant allows callers to speak the name of the person they wish to talk to and be transferred to that person automatically, without the hassle of looking up a phone directory or speaking to an operator. If the person cannot be reached, the application allows the caller to record a message.

Figure: Call flow
Application Files

Application root document: root.vxml

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- An application-scoped variable holding the contact details
       of the person the caller wishes to speak to. -->
  <var name="person"/>

  <!-- Default catch handlers -->
  <catch event="noinput">
    I'm sorry, I didn't hear you. <reprompt/>
  </catch>
  <catch event="nomatch">
    I'm sorry, I didn't get that. <reprompt/>
  </catch>
  <catch event="error">
    We are experiencing technical difficulties. Please call back later.
    <exit/>
  </catch>
</vxml>

Application entry point: index.vxml

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" application="root.vxml" xml:lang="en-gb">
  <form>
    <!-- Welcome message -->
    <block>
      <prompt bargein="false">
        <audio src="audio/jingle.wav"/>
        Welcome to the Smart Company auto attendant.
      </prompt>
    </block>

    <!-- Find person to contact -->
    <field name="person">
      <prompt bargein="true">
        Please say the name of the person you would like to speak to
        or say <emphasis>operator</emphasis> to speak to the operator.
      </prompt>
      <grammar src="grammar/directory.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Copy the recognised {name, number} result into application scope -->
        <assign name="application.person" expr="person"/>
      </filled>
    </field>

    <!-- Attempt transfer -->
    <transfer name="xfer" destexpr="'tel:' + application.person.number"
              transferaudio="audio/ringback.wav" connecttimeout="20s" type="consultation">
      <prompt>
        <lexicon uri="names.pls" type="application/pls+xml"/>
        Transferring you to <value expr="application.person.name"/>
      </prompt>
      <filled>
        <!-- If the person is busy or does not answer, record a message -->
        <if cond="xfer == 'busy'">
          <prompt>The line is busy.</prompt>
          <goto next="record.vxml"/>
        <elseif cond="xfer == 'noanswer'"/>
          <prompt>There was no answer.</prompt>
          <goto next="record.vxml"/>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>

Grammar encapsulating the directory database: directory.grxml

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" mode="voice" xml:lang="en-gb"
         tag-format="semantics/1.0" root="names">
  <lexicon uri="names.pls" type="application/pls+xml"/>
  <rule id="names">
    <one-of>
      <item>the Operator<tag>out.number='1900';</tag></item>
      <item>Stephen Breslin<tag>out.number='1907';</tag></item>
      <item>Andrew Fuller<tag>out.number='1916';</tag></item>
      <item>James Bailey<tag>out.number='1914';</tag></item>
      <item>Amanda McDonnell<tag>out.number='1926';</tag></item>
      <!-- Add new names and numbers here -->
    </one-of>
    <!-- Evaluated after <one-of>, once the name has been matched -->
    <tag>out.name=meta.current().text;</tag>
  </rule>
</grammar>

Pronunciation lexicon file: names.pls

<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-gb">
  <!-- Treat "mcdonnell" as if it were written "Mac Donnell" -->
  <lexeme>
    <grapheme>mcdonnell</grapheme>
    <alias>Mac Donnell</alias>
  </lexeme>
</lexicon>

VoiceXML file to record the caller's message: record.vxml

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" application="root.vxml">
  <form>
    <!-- Record message -->
    <record name="message" beep="true" maxtime="20s" type="audio/x-wav" finalsilence="4s">
      <prompt>
        <lexicon uri="names.pls" type="application/pls+xml"/>
        <value expr="application.person.name"/> is unavailable to take your call.
        Please leave a message after the beep.
      </prompt>
      <filled>
        Message recorded. Goodbye.
        <!-- <disconnect/> ends the call and throws connection.disconnect.hangup -->
        <disconnect/>
      </filled>
    </record>

    <!-- Send the recording to the server when the call ends -->
    <catch event="connection.disconnect.hangup">
      <if cond="message != undefined">
        <submit next="cgi/store_message.php" namelist="message application.person.name"
                enctype="multipart/form-data" method="post"/>
      </if>
    </catch>
  </form>
</vxml>

VoiceXML document returned after the recorded message is submitted to store_message.php (the PHP code that receives the recorded message is omitted for simplicity):

<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" application="root.vxml">
  <form>
    <block>
      <!-- This is the VoiceXML document that is returned
           after the voice message is stored server-side. This document simply exits. -->
      <exit/>
    </block>
  </form>
</vxml>
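The transfer in index.vxml handles only the busy and noanswer outcomes. A consultation transfer can also report network_busy or unknown, and a call that cannot be placed at all raises an error.connection.* event (which would otherwise reach the generic error handler in root.vxml and end the call). The fragment below is a sketch, not part of the original application: a possible replacement for the <transfer> element in index.vxml that sends the caller to record.vxml for every unsuccessful outcome. It assumes the same file names and application variables as above; the extra prompt wording is hypothetical.

```xml
<!-- Sketch: possible drop-in replacement for the <transfer> in index.vxml -->
<transfer name="xfer" destexpr="'tel:' + application.person.number"
          transferaudio="audio/ringback.wav" connecttimeout="20s" type="consultation">
  <prompt>
    <lexicon uri="names.pls" type="application/pls+xml"/>
    Transferring you to <value expr="application.person.name"/>
  </prompt>

  <!-- The call could not be placed (e.g. bad destination, no route);
       this catch is closer in scope than root.vxml's "error" handler -->
  <catch event="error.connection">
    <prompt>Sorry, the call could not be connected.</prompt>
    <goto next="record.vxml"/>
  </catch>

  <!-- Unsuccessful outcomes are reported through the xfer variable -->
  <filled>
    <if cond="xfer == 'busy'">
      <prompt>The line is busy.</prompt>
    <elseif cond="xfer == 'noanswer'"/>
      <prompt>There was no answer.</prompt>
    <elseif cond="xfer == 'network_busy'"/>
      <prompt>The network is busy.</prompt>
    <else/>
      <prompt>The call could not be completed.</prompt>
    </if>
    <!-- In every case, offer the caller the chance to leave a message -->
    <goto next="record.vxml"/>
  </filled>
</transfer>
```

Moving the single <goto> after the <if> keeps one exit path, so adding a new outcome only requires a new prompt branch.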