Using VoiceXML
- Language spec 2.0
- Includes support for VUI and for telephony applications (call forwarding, transfers, etc.)
  - Has tags specific to voice applications
- Simple (and classic) example:

    <?xml version="1.0"?>
    <vxml version="2.0">
      <form>
        <block>
          <prompt>hello World!</prompt>
        </block>
      </form>
    </vxml>

Basics of VoiceXML Architecture
[Figure: VoiceXML architecture diagram (labels: VoiceXML, CGIs). From the paper "Promise of Voice Enabled Web".]
2001-2008 Pérez-Quiñones
My VoiceXML home page
- Here is a basic home page for VoiceXML:

    <?xml version="1.0"?>
    <!DOCTYPE vxml PUBLIC "-//BeVocal Inc//VoiceXML 1.0//EN"
      "http://cafe.bevocal.com/libraries/dtd/vxml1-0-bevocal.dtd">
    <vxml version="1.0">
      <form id="form">
        <block>
          <prompt>
            Welcome to Manuel's Voice Page! While you are here,
            you cannot do anything but listen to me. Cool, no?
            Please hang up now, buh bye.
          </prompt>
        </block>
      </form>
    </vxml>

Flow of a simple VoiceXML application
[Figure: flow diagram with the states Greeting, Prompt, Recognition, Filled, and Exit, and the events Nomatch and Noinput.]
Parts of VoiceXML application: Form, Block, Prompt
- <form id grammar>... </form>
  - Like HTML forms, groups of items
  - Items
    - <field> - items to be filled by the user
    - <var> - local variables
    - <catch> - event handlers
- <block>... </block>
  - Holds executable instructions (setting variables, etc.)
- <prompt>... </prompt>
  - Plays text using text-to-speech
  - Bargein - option that allows the user to interrupt the prompt with information for the next field
  - Condition - works as a guard condition
  - Count - VoiceXML keeps a counter of how many times the prompt has been played; different prompts can be played based on that counter
  - Timeout - specifies what timeout value is used

Prompt
- <prompt> example
  - A prompt can have a count parameter that allows the VoiceXML program to play a different prompt

    <form id="phone">
      <block>
        <prompt count="1">what is your phone number?</prompt>
        <prompt count="3">i need your phone number.</prompt>
        <prompt count="5">i said to give me your...</prompt>
      </block>
    </form>
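The count parameter can be easier to follow as a small selection rule. The sketch below is a Python model of that rule, not VoiceXML itself: it assumes the interpreter picks the prompt with the highest count that does not exceed the current counter. The names select_prompt and PHONE_PROMPTS are hypothetical, and the sketch ignores the case of several prompts sharing one count.

```python
# Simplified model of count-based prompt selection.
# select_prompt and PHONE_PROMPTS are hypothetical names used for illustration.

def select_prompt(prompts, counter):
    """Return the text of the prompt with the highest count <= counter.

    prompts is a list of (count, text) pairs; returns None if no prompt
    is eligible for the given counter value.
    """
    eligible = [(count, text) for count, text in prompts if count <= counter]
    if not eligible:
        return None
    # Pick the pair with the largest count among the eligible ones.
    return max(eligible, key=lambda pair: pair[0])[1]


# The three prompts from the slide example.
PHONE_PROMPTS = [
    (1, "what is your phone number?"),
    (3, "i need your phone number."),
    (5, "i said to give me your..."),
]

if __name__ == "__main__":
    for counter in range(1, 7):
        print(counter, select_prompt(PHONE_PROMPTS, counter))
```

Under this model, the first prompt plays on attempts 1 and 2, the second on attempts 3 and 4, and the third from attempt 5 onward.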
Sequencing the dialogue
- A form can contain many fields
  - When all fields are "filled", the form terminates, unless the programmer transfers control
- A field specifies a question/answer pair
  - Has a grammar associated with it
  - The filled option indicates what to do when all variables have been filled
  - if/else controls logic
  - goto specifies where to go next; the argument here is the id of a form in the same VoiceXML document
- Forms are evaluated from top to bottom, until all fields are filled.

    <form id="aform">
      <block>welcome to this page.</block>
      <field name="instructions">
        <prompt>
          Do you need instructions on how to use this page?
        </prompt>
        <grammar>[yes no]</grammar>
        <filled>
          <if cond="instructions=='yes'">
            <goto next="#giveinstructions"/>
          <else/>
            <goto next="#usepage"/>
          </if>
        </filled>
      </field>
    </form>

Field tag
- A field specifies a question/answer pair
  - The field name is a variable
  - The grammar is expressed explicitly, or implied by the type of the field
  - type = [ boolean date digits currency number phone time ]
  - The filled option indicates what to do when the variable has been filled
- noinput, nomatch and help are all defined in the language
  - The count option can be used in all of them
  - <reprompt/> plays the prompt again

    <field name="instructions">  <!-- optionally add a type attribute -->
      <prompt>
        <!-- plays a prompt -->
      </prompt>
      <grammar>
        <!-- specify valid answers -->
      </grammar>
      <nomatch>
        <!-- user said something that did not match the grammar -->
        <!-- can use <reprompt/> and the count option -->
      </nomatch>
      <noinput>
        <!-- default timeout value exceeded with no input from the user -->
      </noinput>
      <help>
        <!-- specify valid answers -->
      </help>
      <filled>
        <!-- what to do when the field is filled -->
      </filled>
    </field>
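The way a field cycles through noinput, nomatch, and filled can be sketched as a small loop. The Python below is a much simplified model of one field, not the interpreter's actual algorithm. It assumes silence is represented as None and that the grammar is a plain set of words. The helper names run_field and next_form are hypothetical.

```python
# Simplified model of a single field: each attempt ends in
# noinput, nomatch, or filled. All names here are hypothetical.

def run_field(grammar, utterances):
    """Feed utterances to one field and return (value, events).

    grammar is a set of accepted words; utterances is a list in which
    None stands for silence (the timeout expired with no input).
    """
    events = []
    for heard in utterances:
        if heard is None:
            events.append("noinput")   # would trigger the <noinput> handler
            continue
        word = heard.strip().lower()
        if word in grammar:
            events.append("filled")    # would trigger the <filled> handler
            return word, events
        events.append("nomatch")       # would trigger the <nomatch> handler
    return None, events                # caller hung up or gave up


def next_form(instructions):
    """Mirror the if/else inside <filled> in the "aform" example."""
    return "#giveinstructions" if instructions == "yes" else "#usepage"


if __name__ == "__main__":
    value, events = run_field({"yes", "no"}, [None, "maybe", "Yes"])
    print(value, events, next_form(value))
```

Here the caller first stays silent, then says something outside the grammar, and finally answers "Yes", so the field is filled on the third attempt and control moves to #giveinstructions.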
Menus
- Menus are so typical in a voice user interface that VoiceXML has a special tag for them
  - Can contain a prompt that is read at the top of the menu
  - <enumerate/> will list the text from the choices
  - Grammar is implicitly built
    - accept="exact"
    - accept="approximate"
  - dtmf="true"
    - Allows selection using the keypad
  - noinput, nomatch, help can still be used within the menu
- The choice tag contains
  - Text that is read to the user
  - Grammar built from the text
  - Where to go if the choice is selected
  - Each choice can override accept and dtmf

    <menu id="choices" accept="approximate" dtmf="true">
      <prompt>
        <!-- list the choices from below -->
        Please select one of these options
        <enumerate/>
      </prompt>
      <!-- each choice specifies the user choice and where to go -->
      <choice next="#grades">
        Your grades
      </choice>
      <choice next="#calendar">
        Today's calendar
      </choice>
      <choice next="#homework">
        Upcoming homeworks
      </choice>
    </menu>

More Menus
- More control over how the menu is played out
  - <value> allows inserting variables into VoiceXML code
  - _prompt contains the prompt for each item in the menu (contents of the <choice> tag)
  - _dtmf contains the number to press to select that choice

    <menu id="choices" accept="approximate" dtmf="true">
      <prompt>
        <enumerate>
          For information about <value expr="_prompt"/>,
          press <value expr="_dtmf"/>
        </enumerate>
      </prompt>
      <choice next="#grades" accept="exact">Your grades</choice>
      <choice next="#calendar">today's calendar</choice>
      <choice next="#homework">upcoming homeworks</choice>
    </menu>
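To see what the <enumerate> template with _prompt and _dtmf would read out, here is a rough Python model. It assumes that, with dtmf="true", choices are numbered 1, 2, 3, ... in document order, and that the menu has at most nine choices. The function names assign_dtmf and enumerate_menu are hypothetical.

```python
# Rough model of <enumerate> expansion; all names are hypothetical.
# Assumes implicit keypad digits are assigned in document order and
# that there are at most nine choices.

def assign_dtmf(choices):
    """Pair each choice text with its implicit keypad digit ('1', '2', ...)."""
    return [(str(index), text) for index, text in enumerate(choices, start=1)]


def enumerate_menu(choices,
                   template="For information about {_prompt}, press {_dtmf}."):
    """Expand the template once per choice and join the results."""
    return " ".join(
        template.format(_prompt=text, _dtmf=key)
        for key, text in assign_dtmf(choices)
    )


# Choice texts from the menu example.
CHOICES = ["Your grades", "Today's calendar", "Upcoming homeworks"]

if __name__ == "__main__":
    print(enumerate_menu(CHOICES))
```

The template is expanded once per choice, which is why a single <enumerate> body is enough to describe the whole menu.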
More control of processing
- So far all control is via <goto> (ugly)
- <subdialog> allows a function call to another form
  - Can receive returned values from the form

    <subdialog name="data" src="#getphone">
      <filled>
        Your area code is <value expr="data.area"/>.
        <break time="300ms"/>
        Your phone is <value expr="data.phone"/>
        <goto next="#bye"/>
      </filled>
    </subdialog>

    <form id="getphone">
      <field name="area">...</field>
      <field name="phone">..</field>
      <filled>
        <return namelist="area phone"/>
      </filled>
    </form>

Using CGIs
- The <submit> tag allows us to call a CGI
  - Calls a program stored at a server using the Common Gateway Interface
- In this example, the URL generated is http://web/prog.cgi?area=703&phone=3345555

    <form id="getphone">
      <field name="area">...</field>
      <field name="phone">..</field>
      <filled>
        <submit next="http://web/prog.cgi" namelist="area phone" method="get"/>
      </filled>
    </form>
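How the namelist turns into the query string of the URL above can be sketched in a few lines of Python. This is only an illustration of the GET encoding. The function submit_url is a hypothetical helper, and the field values 703 and 3345555 are taken from the URL on the slide.

```python
# Sketch of how <submit method="get"> could build its URL from the namelist.
# submit_url is a hypothetical helper, not part of any VoiceXML platform.
from urllib.parse import urlencode


def submit_url(next_url, namelist, variables):
    """Append the variables named in namelist to next_url as a query string."""
    names = namelist.split()  # namelist is a space-separated list of names
    query = urlencode([(name, variables[name]) for name in names])
    return f"{next_url}?{query}"


if __name__ == "__main__":
    filled = {"area": "703", "phone": "3345555"}
    print(submit_url("http://web/prog.cgi", "area phone", filled))
```

Only the variables listed in namelist are sent, in the order they are listed, which matches the area and phone parameters in the generated URL.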
Grammar
- Four different grammars can be used with VoiceXML
  - XML form of Speech Recognition Grammar Format (GRXML)
  - Augmented BNF (ABNF)
  - Java Speech Grammar Format (JSGF)
  - Nuance GSL (proprietary)
- GRXML <-> ABNF
  - Both W3C standards
- Can be field-level or form-level
  - Can use external grammar files
- Built-in grammars
  - Digits, boolean, currency, date, number, phone, time

Grammar - Example

First rule, in each of the four formats:

  GRXML:
    <rule id="r1">
      <one-of>
        <item>red</item>
        <item>blue</item>
        <item>green</item>
      </one-of>
    </rule>

  ABNF:
    $r1 = red | blue | green;

  JSGF:
    <r1> = red | blue | green;

  GSL:
    R1: [red blue green]

Second rule, in each of the four formats:

  GRXML:
    <rule id="r1">
      <item repeat="0-1">I want</item>
      <one-of>
        <item>sausage</item>
        <item>pepperoni</item>
        <item>onions</item>
      </one-of>
      <item repeat="0-1">please</item>
    </rule>

  ABNF:
    $r1 = [I want] (sausage | pepperoni | onions) [please];

  JSGF:
    <r1> = [I want] (sausage | pepperoni | onions) [please];

  GSL:
    R1: (?(I want) [sausage pepperoni onions] ?please)
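To make concrete what the second rule accepts, here is a Python sketch that expresses the same pattern as a regular expression: an optional "I want", exactly one topping, and an optional "please". This is an illustration only. A real recognizer works on audio rather than text, and PIZZA_RULE and match_topping are hypothetical names.

```python
# Text-only illustration of the second grammar rule.
# PIZZA_RULE and match_topping are hypothetical names.
import re

# Optional "i want", one of three toppings, optional "please".
PIZZA_RULE = re.compile(r"(?:i want )?(sausage|pepperoni|onions)(?: please)?")


def match_topping(utterance):
    """Return the topping if the utterance fits the rule, else None."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = " ".join(utterance.lower().split())
    match = PIZZA_RULE.fullmatch(normalized)
    return match.group(1) if match else None


if __name__ == "__main__":
    for said in ["I want sausage please", "pepperoni", "I want", "two onions"]:
        print(repr(said), "->", match_topping(said))
```

An utterance that leaves out the topping, or adds words the rule does not list, returns None, which corresponds to a nomatch event in the dialogue.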
Lots of other options
- VoiceXML has many more options
  - <record>
    - Records the user's voice and produces a wav file with the recording
    - Using submit, the recording can then be saved on the web
  - <transfer>
    - Transfers the phone call to another number
  - Session variables that hold information about the call
    - Including the caller's phone number (caller id)
  - Audio
    - Support for playing audio files instead of using TTS for prompts and messages
  - etc...

Why Not?
- Why hasn't VoiceXML caught on?
  - Somewhat limited purpose
    - Not suitable for general browsing
  - The rise of the full-browser phone
  - Can be slow to navigate through
    - If options are unknown, must wait for prompts
  - Unsure users
    - Can I barge in?
    - What do I say? Grammars may not be complete
  - Annoying confirmations
    - "I heard you say ..."
  - Voice recognition
Why?
- What is VoiceXML suitable for?
  - Getting information
    - Airline schedules
    - Calendars
  - Relatively simple, specific tasks
    - Ordering items
    - Banking?
  - Hands-free tasks
- Our server: PlumVoice, http://voice.cs.vt.edu
- Tutorial: http://voice.cs.vt.edu/docs/plum_voicexml_2_0_tutorial.html
- Reference document: http://voice.cs.vt.edu/docs/plum_programmers_reference_manual.html