Speech Processing (Traitement de la Parole)
Lecture 11: Dialogue Systems, VoiceXML part 1
06/06/2005

Speech Processing (Traitement de la Parole), SE 2005. jean.hennebert@unifr.ch, University of Fribourg

Course schedule

No.  Date        Lecture      Exercise      Content
1    14/03/2005                             Practical information, introduction
2    21/03/2005                             Cancelled
3    04/04/2005                             Speech signal: production, perception, analysis, modelling
4    11/04/2005                             Speech synthesis
5    18/04/2005                             Automatic Speech Recognition (ASR): principles
6    25/04/2005                             ASR: hidden Markov models I
7    02/05/2005                             ASR: hidden Markov models II
8    09/05/2005                             Automatic speaker recognition
9
10   23/05/2005                             User-centric and practical aspects of an ASR service
11   30/05/2005               mini-project  Dialogue systems: design principles
12   06/06/2005  16h15-17h00  mini-project  Dialogue systems: VoiceXML
13   13/06/2005  16h15-17h00  mini-project  Dialogue systems: VoiceXML
14   20/06/2005                             Mini-project presentations, questions, revision
Plan

1. VoiceXML Definition
2. Comparison to HTML
3. VoiceXML features
4. Architecture
5. Key concepts
   1. Session
   2. Dialog states
   3. Menu / Forms
6. Grammars
7. Variables
8. Examples

VoiceXML 2.0: definition

VoiceXML 2.0 is a markup language designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations.

- Developed within the W3C Voice Browser Activity
- VoiceXML 2.0 Recommendation: 16 March 2004
- www.w3c.org/voice
- Specification in English: http://www.w3.org/tr/voicexml20/
- Specification in French: http://www.yoyodesign.org/doc/w3c/voicexml20/
Comparison to HTML

"VoiceXML = language for writing Web pages you interact with by listening to spoken prompts and jingles, and control by means of spoken input."

"HTML was designed for visual Web pages and lacks the control over the user-application interaction that is needed for a speech-based interface. With speech you can only hear one thing at a time (kind of like looking at a newspaper with a times 10 magnifying glass). VoiceXML has been carefully designed to give authors full control over the spoken dialog between the user and the application. The application and user take it in turns to speak: the application prompts the user, and the user in turn responds."

From Dave Raggett, W3C

VoiceXML features

VoiceXML documents describe:

- spoken prompts (synthetic speech)
- output of audio files and streams
- recognition of spoken words and phrases
- recognition of touch-tone (DTMF) key presses
- recording of spoken input
- control of dialog flow
- telephony control (call transfer and hangup)
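As a rough illustration of how several of these features appear in markup, the sketch below plays an audio file with a synthesized fallback, records a message, collects a yes/no answer by voice or DTMF, and optionally transfers the call before hanging up. The file name jingle.wav, the form and variable names, and the telephone number are placeholders chosen for this illustration; they do not come from the course material.

```xml
<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <form id="features_demo">
    <block>
      <!-- Audio file output; the enclosed text is spoken if the file cannot be played -->
      <prompt>
        <audio src="jingle.wav">Welcome.</audio>
      </prompt>
    </block>

    <!-- Recording of spoken input (at most 10 seconds, started by a beep) -->
    <record name="message" beep="true" maxtime="10s">
      <prompt>Please leave a message after the beep.</prompt>
    </record>

    <!-- Speech or DTMF recognition: the built-in boolean type accepts yes/no or keys 1/2 -->
    <field name="wants_operator" type="boolean">
      <prompt>Would you like to talk to an operator? Say yes or press 1, say no or press 2.</prompt>
    </field>

    <!-- Telephony control: this item is visited only if the caller answered yes -->
    <transfer name="call" dest="tel:+41000000000" bridge="true" cond="wants_operator"/>

    <block>
      <prompt>Goodbye.</prompt>
      <!-- Hang up -->
      <disconnect/>
    </block>
  </form>
</vxml>
```

The form items are visited in document order, which is what gives the author control of the dialog flow in this sketch.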
Example

    <?xml version="1.0" encoding="iso-8859-1"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
      <!-- A form with a single block that plays one prompt -->
      <form>
        <block>
          <!-- bargein="false": the caller cannot interrupt the prompt -->
          <prompt bargein="false">Welcome to VoiceXML</prompt>
        </block>
      </form>
    </vxml>

System Architecture (figure)
System architecture: parallelism with web servers (figure)

Key concepts (1/3)

Sessions
- A session begins when the user starts to interact with a VoiceXML interpreter.
- A session continues as VoiceXML documents are loaded and unloaded.
- The session ends when requested by the user, a VoiceXML document or the interpreter context.

Dialog states
- An application is a set of dialog states.
- The user is in exactly one dialog state at any time.
- Each dialog state specifies the next dialog to transition to (using a URL).
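A minimal sketch of such transitions, assuming two dialogs in one document: the first moves to the second through a fragment identifier, and the second either moves to a dialog in another document or ends the session. The form and field names are invented for this illustration, and the weather URL is reused from the menu example later in these slides.

```xml
<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <!-- First dialog state: greet, then move to another dialog in the same document -->
  <form id="greeting">
    <block>
      <prompt>Welcome.</prompt>
      <goto next="#ask_weather"/>
    </block>
  </form>

  <!-- Second dialog state: the answer decides the next transition -->
  <form id="ask_weather">
    <field name="wants_weather" type="boolean">
      <prompt>Would you like the weather forecast?</prompt>
      <filled>
        <if cond="wants_weather">
          <!-- Transition to a dialog in a different document -->
          <goto next="http://www.weather.example.com/intro.vxml"/>
        <else/>
          <!-- The document requests the end of the session -->
          <exit/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```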
Key concepts (2/3)

Menus
- A menu presents the user with a choice of options and then transitions to another dialog state based on the user's selection.

Forms
- A form defines an interaction that collects values for each of the fields in the form.
- Each field may specify a prompt, the expected input, and evaluation rules.
- The form can be submitted to a server in much the same way as for HTML.

Key concepts (3/3)

Application
- An application is a set of VoiceXML documents that share the same application root document.
- The root document is automatically loaded whenever one of the application documents is loaded, and remains loaded until there is a transition to a different application or the call is disconnected.
- The root document information is available to all documents in the same application.
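The application mechanism can be sketched with two documents. The file names root.vxml and leaf.vxml and the variable service_name are hypothetical, chosen only for this illustration. The root document declares a variable:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- root.vxml: application root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <!-- Declared in the root, so it lives in application scope -->
  <var name="service_name" expr="'the demo service'"/>
</vxml>
```

A leaf document names the root in its application attribute and can then read that variable:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- leaf.vxml: loading this document also loads root.vxml -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en"
      application="root.vxml">
  <form>
    <block>
      <!-- The application. prefix refers to variables of the root document -->
      <prompt>Welcome to <value expr="application.service_name"/>.</prompt>
    </block>
  </form>
</vxml>
```

Any other document that declares the same application attribute would share the same root and see the same variable.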
Grammars

- Each dialog state has one or more grammars associated with it.
- Grammars define the expected user input, either spoken input or pressed touch-tone (DTMF) keys.
- In the simplest case, only the dialog's own grammars are active in that dialog.
- In more complex cases, the active grammars can include:
  - grammars defined within the dialog itself
  - external grammars referenced by links
  - grammars defined at the document level and marked as being globally active
  - grammars defined in the application root document and active throughout the application

Variables

- VoiceXML allows you to define named variables for holding data.
- Variables can be defined at any level, and their scope follows an inheritance model.
- You can test the values of variables to determine which dialog state to transition to next.
- Variable expressions can also be used for conditional prompts, grammars, etc.
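The points about variables can be sketched as follows: a document-level variable is visible inside the form, its value selects between two conditional prompts, and the value of a field decides the next dialog state. All names here (attempts, age, ask_age, minor, adult) are invented for this illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <!-- Document-level variable: visible in every dialog of this document -->
  <var name="attempts" expr="0"/>

  <form id="ask_age">
    <field name="age" type="number">
      <!-- Conditional prompts: only the prompt whose cond is true is played -->
      <prompt cond="attempts == 0">How old are you?</prompt>
      <prompt cond="attempts &gt; 0">Please say your age as a number.</prompt>
      <nomatch>
        <!-- The field sees the document variable through scope inheritance -->
        <assign name="attempts" expr="attempts + 1"/>
        <reprompt/>
      </nomatch>
      <filled>
        <!-- The field value determines the next dialog state -->
        <if cond="age &lt; 18">
          <goto next="#minor"/>
        <else/>
          <goto next="#adult"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="minor">
    <block><prompt>You are under eighteen.</prompt></block>
  </form>

  <form id="adult">
    <block><prompt>You are eighteen or older.</prompt></block>
  </form>
</vxml>
```

The comparison operators are written as &lt; and &gt; because the expressions sit inside XML attribute values.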
Example: menu

    <?xml version="1.0" encoding="utf-8"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/2001/vxml
                              http://www.w3.org/TR/voicexml20/vxml.xsd">
      <menu>
        <!-- <enumerate/> reads out the text of each choice in turn -->
        <prompt>
          Welcome home. Say one of: <enumerate/>
        </prompt>
        <!-- Each choice names the dialog to transition to when it is selected -->
        <choice next="http://www.sports.example.com/vxml/start.vxml">
          Sports
        </choice>
        <choice next="http://www.weather.example.com/intro.vxml">
          Weather
        </choice>
        <choice next="http://www.stargazer.example.com/voice/astronews.vxml">
          Stargazer astrophysics news
        </choice>
        <!-- Played when the caller says nothing -->
        <noinput>Please say one of <enumerate/></noinput>
      </menu>
    </vxml>

Example: menu (figure)
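The menu above is voice-only. As a hedged variant, the sketch below also accepts touch-tone input: with dtmf="true" on the menu, choices that carry no explicit dtmf attribute are assigned the keys 1, 2, 3 in document order, and the enumerate template announces each key through the _prompt and _dtmf variables. The prompt wording and the nomatch handler are additions made for this illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <!-- dtmf="true": keys 1 to 3 are assigned to the choices in document order -->
  <menu dtmf="true">
    <prompt>
      Welcome home.
      <!-- The template is repeated once per choice -->
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice next="http://www.sports.example.com/vxml/start.vxml">
      Sports
    </choice>
    <choice next="http://www.weather.example.com/intro.vxml">
      Weather
    </choice>
    <choice next="http://www.stargazer.example.com/voice/astronews.vxml">
      Stargazer astrophysics news
    </choice>
    <noinput>I did not hear anything. <reprompt/></noinput>
    <nomatch>That is not one of the options. <reprompt/></nomatch>
  </menu>
</vxml>
```

Spoken selection still works in this variant, so the caller can either say "Weather" or press 2.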
Example: forms and field

    <?xml version="1.0"?>
    <!DOCTYPE vxml PUBLIC '-//Voxpilot/DTD VoiceXML 2.0//EN'
      'http://dtd.voxpilot.com/voice/2.0/voxpilot_voicexml2.0.dtd'>
    <vxml version="2.0" xml:lang="en-GB">
      <form>
        <!-- Built-in digits type restricted to exactly seven digits, spoken or keyed -->
        <field name="sevendigits" type="digits?length=7" modal="true">
          <prompt>Say or type a seven digit number.</prompt>
        </field>
        <!-- Built-in boolean type: yes or no -->
        <field name="answer" type="boolean" modal="true">
          <prompt>Answer yes or no.</prompt>
        </field>
        <!-- Executed once both fields have been filled -->
        <filled namelist="sevendigits answer">
          <prompt>
            Your number was
            <say-as interpret-as="vxml:digits">
              <value expr="sevendigits"/>
            </say-as>
            <break time="80ms"/>
            and you answered <value expr="answer"/>
            <break time="80ms"/>
            Bye
          </prompt>
        </filled>
      </form>
    </vxml>

Example: forms and field with grammars

    <?xml version="1.0"?>
    <!DOCTYPE vxml PUBLIC '-//Voxpilot/DTD VoiceXML 2.0//EN'
      'http://dtd.voxpilot.com/voice/2.0/voxpilot_voicexml2.0.dtd'>
    <vxml version="2.0" xml:lang="en-GB">
      <form id="creditcard">
        <field name="cardtype">
          <prompt>Which credit card type?</prompt>
          <!-- Inline GSL grammar: ?word marks an optional word,
               return(...) gives the value stored in the field -->
          <grammar type="application/gsl">
            <![CDATA[
              Cardtype [
                (?american express) {return("amex")}
                (master ?card)      {return("mc")}
                (visa)              {return("visa")}
              ]
            ]]>
          </grammar>
          <!-- First silence: apologise and play the field prompt again -->
          <noinput count="1">
            Sorry, I didn't hear you.
            <reprompt/>
          </noinput>
Example: forms and field with grammars (continued)

          <!-- First unrecognised input: short apology -->
          <nomatch count="1">
            <prompt>Sorry, I didn't understand. Please repeat the card type.</prompt>
          </nomatch>
          <!-- Second unrecognised input: list the valid options -->
          <nomatch count="2">
            <prompt>I still don't understand. Please select either American Express, or Master Card, or Visa.</prompt>
          </nomatch>
          <!-- Confirmation once the field holds a value -->
          <filled>
            <prompt>You selected <value expr="cardtype"/></prompt>
          </filled>
        </field>
      </form>
    </vxml>

Questions and answers