Web Page Creation Using VoiceXML as a Slot-Filling Task
Ravi M H
Agenda
> VoiceXML
> Slot-filling task
> Web page creation for user profiles
> Applications
> Results and statistics
> Language model
> Future work
> Open questions
Why is it interesting?
What the user says over the phone changes content on a remote server, and the user can see those changes in a web browser. That makes it an interesting interface.
Project
The app creates a simple web page through a slot-filling task implemented in VoiceXML.
What is it about?
VoiceXML is the standard XML format for specifying interactive voice dialogues between a human and a computer. It is run by a voice browser (analogous to an HTML browser). It is commonly seen in automated phone systems: commercial VoiceXML applications are widely used for enquiries, information tracking, etc.
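A minimal sketch of what a slot-filling dialogue looks like in VoiceXML 2.0, generated from Python so it can be inspected without a voice browser. It assumes a single `<form>` whose `<field>` elements are asked in document order, optional built-in field types such as `phone`, and a final `<submit>` that posts the collected slots to a server. `build_vxml_form` and the `https://example.com/profile` URL are hypothetical, not part of the project.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_form(form_id: str,
                    fields: List[Tuple[str, str, Optional[str]]],
                    submit_url: str) -> str:
    """Build a minimal VoiceXML 2.0 document containing one slot-filling form.

    fields: (slot_name, prompt_text, builtin_type) tuples; builtin_type is a
    VoiceXML built-in such as "phone" or "digits", or None.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": form_id})
    for name, prompt_text, builtin in fields:
        attrs = {"name": name}
        if builtin:
            attrs["type"] = builtin
        field = ET.SubElement(form, "field", attrs)
        prompt = ET.SubElement(field, "prompt")
        prompt.text = prompt_text
    # The block is visited after the fields are filled: post all slot values.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {
        "next": submit_url,
        "namelist": " ".join(name for name, _, _ in fields),
        "method": "post",
    })
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml_form(
        "profile",
        [("name", "Please say your name.", None),
         ("phone", "Please say your phone number.", "phone")],
        "https://example.com/profile",  # hypothetical endpoint
    ))
```

A field without a built-in type (such as `name` here) would also need a `<grammar>` child before a real voice browser could recognize answers to it; the sketch omits that for brevity.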
What is a slot-filling task?
A simple kind of dialogue system that fills a set of blanks (slots) in a fixed order; it is usually system-initiated. In most applications it is restricted to a specific domain and is seldom generic.
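The system-initiated behaviour described above reduces to a loop: ask for each slot in order, re-prompt a bounded number of times, and move on. A minimal Python sketch, assuming the speech recognizer is abstracted as an `ask(prompt)` callable that returns the recognized text or an empty string; `fill_slots`, `ask`, and `max_retries` are hypothetical names. (In VoiceXML itself this ordering comes from the Form Interpretation Algorithm, which repeatedly visits the first unfilled field.)

```python
from typing import Callable, Dict, List, Optional, Tuple


def fill_slots(slots: List[Tuple[str, str]],
               ask: Callable[[str], str],
               max_retries: int = 2) -> Dict[str, Optional[str]]:
    """System-initiated slot filling: ask for each slot in a fixed order."""
    filled: Dict[str, Optional[str]] = {}
    for name, prompt in slots:
        value: Optional[str] = None
        # One initial attempt plus up to max_retries re-prompts per slot.
        for _ in range(max_retries + 1):
            answer = ask(prompt)
            if answer and answer.strip():  # a non-empty result fills the slot
                value = answer.strip()
                break
        filled[name] = value  # None: the slot stayed empty after all attempts
    return filled


if __name__ == "__main__":
    scripted = iter(["Ravi", "", "red", "chess"])  # "" simulates a no-match
    print(fill_slots(
        [("name", "Name?"), ("color", "Favourite color?"), ("hobby", "Hobbies?")],
        lambda prompt: next(scripted),
    ))
```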
Lessons from the course
> Simple slot-filling tasks
> System-initiated and user-initiated queries and responses
> VoiceXML
> Task-oriented dialogues
User profile
Everything is in the air. :) People want a web presence. Most of them are not tech-savvy, but would still love a customized web page with their details. Hence: a simple web page for a non-tech-savvy person.
Example data
Name:
Address:
Email:
Phone number:
Favourite color:
Hobbies:
Favourite movie stars:
Education:
Skill set:
Family and relationships:
Who fills the slots, and how?
1. People call a published phone number.
2. The system takes them through an automated slot-filling task in which they specify the details to be recorded.
3. This information is used to create a web page, which can later be published; the page's address is emailed to the user.
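Step 3 can be sketched as a template that turns the filled slots into a static HTML page plus a URL-friendly path segment for the address that gets emailed. A minimal Python sketch, assuming slots arrive as a name-to-value dictionary (empty slots as `None`); `render_profile_page` and `page_slug` are hypothetical helpers, and the layout is only illustrative. Every value is HTML-escaped because it comes from speech recognition output.

```python
import html
import re
from typing import Dict, Optional


def page_slug(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "profile"


def render_profile_page(slots: Dict[str, Optional[str]]) -> str:
    """Render filled slots as a simple HTML profile page."""
    title = html.escape(slots.get("name") or "My profile")
    rows = []
    for key, value in slots.items():
        if not value:
            continue  # skip slots the caller left empty
        label = key.replace("_", " ").capitalize()
        rows.append(f"    <li><strong>{html.escape(label)}:</strong> "
                    f"{html.escape(value)}</li>")
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>{title}</title></head>",
        "<body>",
        f"  <h1>{title}</h1>",
        "  <ul>",
        *rows,
        "  </ul>",
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    profile = {"name": "Ravi M H", "favourite_color": "red", "hobbies": None}
    print(page_slug(profile["name"]))  # ravi-m-h
    print(render_profile_page(profile))
```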
Applications
Various uses:
1. Matrimony sites.
2. Non-tech-savvy people who want a web presence.
3. A simple profile that could be made mandatory for all employees, students, or faculty.
Results and statistics
Once the application is built, a closed group of users will be asked to test it. They will then complete an online survey indicating how accurately the system recognized their speech, the error rate, and other useful attributes.
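The survey captures users' impressions of recognition accuracy. A common objective complement, not mentioned on the slide, is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal Python sketch, assuming whitespace-tokenized transcripts; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("my favourite color is red",
                          "my favourite colour is red"))  # 0.2
```

Word accuracy is often reported as 1 minus WER; note that WER can exceed 1.0 when the recognizer inserts many extra words.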
Language model and grammar
Some slots need only a unigram model, while others, such as hobbies, may need a bigram (2-gram) model.
Phone numbers and email IDs are easy, since the user can be asked to spell them out (restricted to 26 letters and 10 digits).
Favourite color: unigram. Even if the user says "crimson red", "red" is taken as the value.
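The two easy cases above can be sketched as post-processing of the recognized words: unigram keyword spotting for the color slot (keep the last known color word, so "crimson red" becomes "red") and a closed vocabulary of single letters and digits for spelled-out input. A minimal Python sketch; the color list, the digit-word table (including "oh" for zero), and the names `normalize_color` and `decode_spelled` are assumptions for illustration. Symbols such as "@" and "." in email IDs are outside this letters-and-digits vocabulary and are not handled.

```python
import string
from typing import Optional

# Assumed closed vocabulary for the color slot.
KNOWN_COLORS = {"red", "blue", "green", "yellow", "black", "white",
                "orange", "purple", "pink", "brown"}

# Assumed spoken forms of the 10 digits.
DIGIT_WORDS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
               "four": "4", "five": "5", "six": "6", "seven": "7",
               "eight": "8", "nine": "9"}


def normalize_color(utterance: str) -> Optional[str]:
    """Unigram keyword spotting: return the last known color word, if any."""
    found = None
    for word in utterance.lower().split():
        if word in KNOWN_COLORS:
            found = word
    return found


def decode_spelled(utterance: str) -> str:
    """Map spelled-out tokens (single letters, digits, digit words) to text."""
    out = []
    for token in utterance.lower().split():
        if len(token) == 1 and (token in string.ascii_lowercase
                                or token in string.digits):
            out.append(token)
        elif token in DIGIT_WORDS:
            out.append(DIGIT_WORDS[token])
        else:
            raise ValueError(f"not a letter or digit: {token!r}")
    return "".join(out)


if __name__ == "__main__":
    print(normalize_color("crimson red"))        # red
    print(decode_spelled("R A V I one two"))     # ravi12
```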
Grammar
> SRGS: Speech Recognition Grammar Specification
> SISR: Semantic Interpretation for Speech Recognition
> GSL: Grammar Specification Language
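To show how SRGS and SISR fit together, here is a minimal Python sketch that writes an SRGS 1.0 XML grammar for the color slot: a `<one-of>` list of accepted phrases, each carrying an SISR `<tag>` that sets the slot value, so "crimson red" is accepted and interpreted as "red". The phrase lists and the `build_srgs_grammar` helper are assumptions for illustration; a VoiceXML field could reference such a file through a `<grammar>` element of type `application/srgs+xml`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs_grammar(rule_id: str, synonyms: Dict[str, List[str]]) -> str:
    """Build an SRGS XML grammar mapping each accepted phrase to a value.

    synonyms: semantic value -> list of phrases that should yield it.
    """
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": "en-US",
        "mode": "voice",
        "root": rule_id,
        "tag-format": "semantics/1.0",  # tags contain SISR script
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for value, phrases in synonyms.items():
        for phrase in phrases:
            item = ET.SubElement(one_of, "item")
            item.text = phrase
            # SISR: set the rule's semantic result to the canonical value.
            tag = ET.SubElement(item, "tag")
            tag.text = f'out="{value}";'
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs_grammar("color", {
        "red": ["red", "crimson red"],
        "blue": ["blue", "navy blue"],
    }))
```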
Siblings: setting bargein to false (so the caller cannot interrupt a prompt) can be used to make the user give sibling counts and names in the required order, e.g. asking for brothers first.
Area code, city, state: GSL grammars are available for these common lists (currently restricted to US locations).
Voxeo: a VoiceXML-based application platform; its Prophecy platform is being used.
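In VoiceXML 2.0, barge-in can be disabled per prompt with `bargein="false"` (or for a whole scope with `<property name="bargein" value="false"/>`). A minimal Python sketch of the sibling questions, assuming one "how many" field per sibling type, asked in a fixed order with the built-in `digits` type; the `sibling_fields` helper and the `num_` field naming are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def sibling_fields(relations: List[str]) -> List[ET.Element]:
    """One 'how many' field per sibling type, asked in the given order.

    Each prompt has bargein="false", so the caller hears the whole question
    before the recognizer starts listening for an answer.
    """
    fields = []
    for relation in relations:
        field = ET.Element("field", {"name": f"num_{relation}",
                                     "type": "digits"})
        prompt = ET.SubElement(field, "prompt", {"bargein": "false"})
        prompt.text = f"How many {relation} do you have?"
        fields.append(field)
    return fields


if __name__ == "__main__":
    for f in sibling_fields(["brothers", "sisters"]):  # brothers first
        print(ET.tostring(f, encoding="unicode"))
```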
Sphinx was tried, but Voxeo proved more effective: Prophecy is a fully loaded platform that supports CCXML and VoiceXML and includes free ASR and TTS engines.
Future work
A demo app has been built; calling the published number confirms connectivity. Web page creation and the back-end application are yet to be done. The local setup works, with accuracy of up to 90% on a sample app.
Open questions
> Modelling the application with different ASR engines and language models.
> Using different grammars to compare detection accuracy.
> Extending the approach to other, more complex applications, not just web page creation.
Thank you! :D
Any questions?