VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee.
Weihong Hu
Certificate of Approval:
W. Homer Carlisle, Associate Professor, Computer Science and Software Engineering
Juan E. Gilbert, Chair, Assistant Professor, Computer Science and Software Engineering
N. Hari Narayanan, Associate Professor, Computer Science and Software Engineering
Stephen L. McFarland, Acting Dean, Graduate School
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS Weihong Hu A Thesis Submitted to The Graduate Faculty of Auburn University In Partial Fulfillment of the Requirements for the Degree of Master of Science Auburn, Alabama August 4, 2003
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS Weihong Hu Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights. Signature of Author Date Copy sent to: Name Date
THESIS ABSTRACT
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Weihong Hu
Master of Science, August 4, 2003
68 Typed Pages
Directed by Dr. Juan E. Gilbert
New ways for people to interact with the World Wide Web are constantly emerging, and voice is increasingly preferred among them. Various voice (telephone-enabled) applications have been implemented and are used by governments, businesses, universities, libraries, visually impaired people, and others. However, very little attention has been given to document information retrieval by voice, because of existing technical difficulties and limitations in natural language processing, voice recognition, grammar generation, result representation, and related areas. This thesis explores the background of information retrieval using voice, especially Interactive Voice Response (IVR) systems, reviews several well-known existing projects, and introduces the concepts of the Voice Extensible Markup Language (VoiceXML) [15]. A voice information retrieval system for documents (VIRD) has been designed and implemented to search a document database by telephone using VoiceXML. The research proceeded in five phases: database creation and normalization, user inquiries, denormalized view and stored procedures, summarization functions, and user interface design. An experiment was conducted to measure the effectiveness and usability of VIRD, with the PARADISE framework [17] used to evaluate effectiveness. Both quantitative and qualitative data were collected, and two sets of metrics were applied and analyzed. Careful analysis of the experimental data showed that VIRD was effective and satisfied users as a means of document information retrieval via mobile access. However, the analysis also showed that recognition and the representation of large result sets need to be improved. Finally, the thesis presents its conclusions and suggests future work to improve VIRD.
ACKNOWLEDGMENTS
The author would like to express her deep gratitude to her advisor, Dr. Juan E. Gilbert, for his patient guidance, valuable advice, and continued encouragement throughout her studies. Sincere thanks are also due to her two graduate committee members, Dr. N. Hari Narayanan and Dr. W. Homer Carlisle, for their reviewing and advising efforts. In addition, the author would like to thank her husband, Yapin Zhong, for his help in conducting the experiment and for his constant support.
Voice Information Retrieval for Documents Weihong Hu M.S. Thesis Dept. of Computer Science & Software Engineering Auburn University
Outline
- Motivation
- Literature Review
- VIRD System Architecture & Voice User Interface (VUI)
- Experiment
- Future Work
- Demo
Motivation
- A very large part of the world's population has access to neither computers nor the Internet
- Very small visual interfaces are uncomfortable for users
- Blind or partially sighted users cannot access information visually
- VoiceXML technologies provide an alternative way to search for documents via mobile devices
- Very little existing work applies VUIs to document retrieval
Literature Review
- Information Retrieval via Voice
- VoiceXML Technology
- Common VoiceXML Applications
Information Retrieval via Voice
- Traditional Interactive Voice Response (IVR) systems
  - IVR systems are software applications that accept telephone input and touch-tone keypad selections and provide appropriate responses
- VoiceXML applications
  - Allow users to call into an application and interact with it using voice, telephone input, touch-tone keypad entries, or any combination of these
  - Use the HTTP protocol to communicate with a Web server
VoiceXML
- Voice Extensible Markup Language (VoiceXML) is a World Wide Web Consortium (W3C) standard language for developing speech applications
- Designed for creating audio dialogues that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations
- Allows users to interact with the Internet without needing visual access
- Allows users to control the entire user-application interaction through spoken dialogue
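To make the VoiceXML description concrete, here is a minimal sketch, using only Python's standard library, that emits a VoiceXML 2.0 page with one form and one field: a spoken prompt, a reference to an external grammar, and a `<filled>` block that submits the recognized word to a Web server over HTTP. The function name, the form id, the field name `keyword`, the grammar URI, and the `/search` URL are hypothetical placeholders, not taken from VIRD.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_query_form(prompt_text: str, grammar_uri: str,
                     submit_url: str = "/search") -> str:
    """Return a minimal VoiceXML 2.0 page that asks for one keyword.

    The form id, field name, and URLs are hypothetical placeholders.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "query"})
    field = ET.SubElement(form, "field", {"name": "keyword"})
    # ElementTree escapes characters such as & and < in the prompt text
    ET.SubElement(field, "prompt").text = prompt_text
    # The grammar defines which spoken words the recognizer will accept
    ET.SubElement(field, "grammar", {"src": grammar_uri})
    # <filled> runs after a match; <submit> sends the value to the server via HTTP
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "keyword"})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_query_form("Please say a keyword.", "keywords.grammar"))
```

Building the page with an XML library rather than by string concatenation means a prompt containing characters such as `&` still yields well-formed VoiceXML.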
Voice Portals & System Infrastructure
Voice Portals & VoiceXML System Infrastructure (cont'd)
- Voice portals provide the platforms on which voice applications run
- Some well-known voice portals: Tellme, VocalTec, and BeVocal
Common VoiceXML Applications
- Simple uses
  - Movie listings
  - Traffic information
  - Order tracking
  - Directory assistance
  - Personal information management
- Complex uses
  - Business communications, virtual offices, and voice email
  - Web-based IVR, speech-recognition-enabled call centers
  - E-commerce
  - Airline reservations
  - Stock trades and financial services management
VIRD System Architecture
[Figure: VIRD system architecture diagram; labeled components: Speech Interface, Voice Server, VoiceXML Interpreter, IR System, Controller, IR, Speech, DB]
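The request flow through an architecture like this can be sketched as follows: the VoiceXML interpreter on the voice server sends the recognized keyword to the controller as an HTTP query string, the controller runs the IR step against the database, and it answers with a VoiceXML page for the interpreter to speak. This is a minimal Python sketch under assumptions: the function name `handle_search`, the parameter name `keyword`, the in-memory `DOCUMENTS` list standing in for the database, and the reply wording are all hypothetical, and exact keyword matching replaces whatever retrieval VIRD actually performs.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical in-memory stand-in for the VIRD document database.
DOCUMENTS = [
    {"title": "Selectivity Estimation for Boolean Queries",
     "keywords": {"query", "boolean", "selectivity", "estimation", "substring"}},
]


def handle_search(query_string: str) -> str:
    """Controller step: HTTP query string in, VoiceXML page out."""
    params = parse_qs(query_string)
    keyword = params.get("keyword", [""])[0].strip().lower()
    # IR step (simplified to exact keyword membership)
    hits = [d["title"] for d in DOCUMENTS if keyword in d["keywords"]]
    if hits:
        noun = "document" if len(hits) == 1 else "documents"
        spoken = f"I found {len(hits)} {noun}. The first is {hits[0]}."
    else:
        spoken = f"No documents matched {keyword}."
    # escape() keeps characters like & or < from breaking the XML
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        f"<form><block><prompt>{escape(spoken)}</prompt></block></form>"
        "</vxml>"
    )


if __name__ == "__main__":
    print(handle_search("keyword=Boolean"))
```

Keeping the controller a pure function from request parameters to a VoiceXML string makes it easy to test without a voice server; any HTTP framework can wrap it.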
VIRD Document Database
- Twenty document abstracts from the http://www.citeseer.com database
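The abstract lists database creation and normalization, a denormalized view, and stored procedures among the research phases. The sketch below shows one way those pieces could fit together, using Python's built-in `sqlite3` module. The table, column, and function names are hypothetical; the single sample row reuses the title of the sample abstract in this section with a placeholder author. Because SQLite has no stored procedures, a Python function stands in for one.

```python
import sqlite3

# Hypothetical normalized schema: documents, authors, and keywords live in
# separate tables linked by id.
SCHEMA = """
CREATE TABLE document (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       year INTEGER, abstract TEXT);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doc_author (doc_id INTEGER REFERENCES document,
                         author_id INTEGER REFERENCES author,
                         PRIMARY KEY (doc_id, author_id));
CREATE TABLE doc_keyword (doc_id INTEGER REFERENCES document,
                          keyword TEXT NOT NULL,
                          PRIMARY KEY (doc_id, keyword));
-- Denormalized view: one row per (document, author, keyword) combination,
-- so a voice search can query a single flat relation.
CREATE VIEW doc_search AS
  SELECT d.doc_id, d.title, d.year, a.name AS author, k.keyword
  FROM document d
  JOIN doc_author da ON da.doc_id = d.doc_id
  JOIN author a ON a.author_id = da.author_id
  JOIN doc_keyword k ON k.doc_id = d.doc_id;
"""


def search_by_keyword(conn: sqlite3.Connection, keyword: str) -> list:
    """Stand-in for a stored procedure: titles matching one spoken keyword."""
    rows = conn.execute(
        "SELECT DISTINCT title FROM doc_search WHERE keyword = ? ORDER BY title",
        (keyword.lower(),),
    )
    return [title for (title,) in rows]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO document (doc_id, title) VALUES (1, ?)",
                 ("Selectivity Estimation for Boolean Queries",))
    conn.execute("INSERT INTO author VALUES (1, 'Placeholder Author')")
    conn.execute("INSERT INTO doc_author VALUES (1, 1)")
    conn.executemany("INSERT INTO doc_keyword VALUES (1, ?)",
                     [("boolean",), ("selectivity",), ("query",)])
    return search_by_keyword(conn, "Boolean")


if __name__ == "__main__":
    print(demo())
```

The `DISTINCT` matters because the view repeats a document once per author and keyword combination.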
Sample Document Abstract
Title: Selectivity Estimation for Boolean Queries
Abstract: In a variety of applications ranging from optimizing queries on alphanumeric attributes to providing approximate counts of documents containing several query terms, there is an increasing need to quickly and reliably estimate the number of strings (tuples, documents, etc.) matching a Boolean query. Boolean queries in this context consist of substring predicates composed using Boolean operators. While there has been some work in estimating the selectivity of substring queries, ...
Sample VoiceXML grammar
<grammar>
<![CDATA[
  [
    [(query)] {<keyword "query">}
    [(match)] {<keyword "match">}
    [(boolean)] {<keyword "boolean">}
    [(estimation)] {<keyword "estimation">}
    [(selectivity)] {<keyword "selectivity">}
    [(optimize)] {<keyword "optimize">}
    [(tuple)] {<keyword "tuple">}
    [(operator)] {<keyword "operator">}
    [(application)] {<keyword "application">}
    [(substring)] {<keyword "substring">}
    [(alphanumeric)] {<keyword "alphanumeric">}
    [(attribute)] {<keyword "attribute">}
    [(approximate)] {<keyword "approximate">}
  ]
]]>
</grammar>
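Writing each grammar entry by hand invites small errors such as an unbalanced quotation mark. A minimal Python sketch that generates entries in the same format from a keyword list follows. It assumes the keywords have already been chosen (this section does not say how VIRD selected them), and it accepts only plain lowercase words so the generated quoting cannot go wrong. The function name `to_grammar` is hypothetical.

```python
import re


def to_grammar(keywords):
    """Build a grammar in the slide's format: one alternative per keyword,
    each returning that keyword in the `keyword` slot."""
    entries = []
    for kw in keywords:
        # Plain lowercase words only, so the generated quotes stay balanced
        if not re.fullmatch(r"[a-z]+", kw):
            raise ValueError(f"not a plain lowercase word: {kw!r}")
        entries.append(f'    [({kw})] {{<keyword "{kw}">}}')
    return "\n".join(
        ["<grammar>", "<![CDATA[", "  ["] + entries + ["  ]", "]]>", "</grammar>"]
    )


if __name__ == "__main__":
    print(to_grammar(["query", "match", "boolean"]))
```

Generating the grammar also makes it straightforward to rebuild it whenever documents are added to the database.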
PRINCIPLES OF VIRD VUI DESIGN
- Continuous Representation: make the system's capabilities apparent to the user, as a reminder, at any point in the dialogue
- Immediate Impact: provide immediate, implicit confirmation
- Incrementality: maintain a sense of continuity and natural flow in the conversation between the system agent and the user
- Summarization and Aggregation: condense results for audio-only interfaces, because auditory memory limits how much a listener can retain
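Two of these principles, immediate implicit confirmation and summarization/aggregation, can be illustrated with a short Python sketch that builds the spoken reply to a search. The recognized keyword is echoed inside the answer instead of being confirmed with a separate yes/no question, and only the result count plus the first few titles are read aloud. The function name `search_reply` and the chunk size of three titles are hypothetical choices, not values from VIRD.

```python
def search_reply(keyword: str, titles: list, chunk: int = 3) -> str:
    """Spoken reply for a search (hypothetical helper).

    Implicit confirmation: the recognized keyword is echoed back in the
    answer rather than confirmed with a separate "Did you say...?" question.
    Aggregation: only the count and the first `chunk` titles are read, to
    respect the listener's limited auditory memory.
    """
    if not titles:
        return (f"Searching for {keyword}. No documents were found. "
                "Please try another word.")
    n = len(titles)
    noun = "document" if n == 1 else "documents"
    parts = [f"Searching for {keyword}. I found {n} {noun}."]
    parts += [f"Number {i}: {t}." for i, t in enumerate(titles[:chunk], start=1)]
    if n > chunk:
        parts.append(f"Say NEXT to hear the remaining {n - chunk}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(search_reply("xml", ["A", "B", "C", "D"]))
```

Echoing the keyword costs the caller no extra turn, yet a misrecognition is audible immediately and can be corrected with the next command.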
Diagram of VIRD VUI
[Figure: VIRD VUI dialogue diagram; labeled states: Welcome Message, Main Menu Dialogue, Query Dialogue, Results Dialogue, Save Dialogue, Confirm Email, Confirm MyLibrary]
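The dialogue states in the diagram can be modeled as a small transition table. The Python sketch below assumes the forward path Welcome Message → Main Menu → Query → Results → Save → Confirm Email or Confirm MyLibrary, plus a Results → Query edge for the TRY AGAIN command. The edges and state names are an interpretation of the diagram, not a reproduction of it, and `FLOW` and `is_valid_path` are hypothetical names.

```python
# Hypothetical transition table; edges are an interpretation of the VUI diagram.
FLOW = {
    "welcome": {"main_menu"},
    "main_menu": {"query"},
    "query": {"results"},
    "results": {"save", "query"},  # "query" models the TRY AGAIN command
    "save": {"confirm_email", "confirm_mylibrary"},
    "confirm_email": set(),
    "confirm_mylibrary": set(),
}


def is_valid_path(path: list) -> bool:
    """True if `path` starts at the welcome message and follows FLOW edges."""
    if not path or path[0] != "welcome":
        return False
    return all(nxt in FLOW.get(cur, set()) for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    print(is_valid_path(["welcome", "main_menu", "query", "results",
                         "save", "confirm_email"]))
```

A table like this can also be used to check logged test sessions, since every recorded sequence of dialogues should be a valid path.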
VIRD Voice User Interface
- Main Menu Dialogue
  - Offers four search types: keyword, title, author, or year
- Query Dialogue
  - Lets the user say the words to be used in the search
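One way to connect the four Main Menu search types to the database is a fixed mapping from the recognized menu option to a parameterized SQL query, so that neither the option nor the spoken search words are ever pasted into the SQL text. This Python sketch assumes a hypothetical table `documents` with columns `keyword`, `title`, `author`, and `year`, exact matching for keyword, author, and year, and substring matching for titles. `QUERIES` and `build_search` are hypothetical names.

```python
# Hypothetical mapping from spoken menu option to a parameterized query
# against a hypothetical `documents` table.
QUERIES = {
    "keyword": "SELECT title FROM documents WHERE keyword = ?",
    "title": "SELECT title FROM documents WHERE title LIKE ?",
    "author": "SELECT title FROM documents WHERE author = ?",
    "year": "SELECT title FROM documents WHERE year = ?",
}


def build_search(option: str, words: str):
    """Return (sql, params) for a Main Menu option and the Query Dialogue words."""
    option = option.strip().lower()
    if option not in QUERIES:
        raise ValueError(f"unsupported search type: {option!r}")
    words = words.strip()
    if option == "year":
        param = int(words)        # assumes the recognizer returns digits
    elif option == "title":
        param = f"%{words}%"      # substring match on titles
    elif option == "keyword":
        param = words.lower()
    else:
        param = words
    return QUERIES[option], (param,)


if __name__ == "__main__":
    print(build_search("Title", "tree algebra"))
```

The returned pair can be passed directly to a DB-API `execute(sql, params)` call.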
VIRD Voice User Interface (cont'd)
- Results Dialogue
  - Voice Navigator: presents the list of retrieved documents, which the user moves through with the voice commands NEXT, PREVIOUS, STOP, REPEAT, DETAIL, TRY AGAIN, and SAVE
- Save Dialogue
  - Allows the user to request a copy of the article via email or library
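The Voice Navigator's command set maps naturally onto a cursor over the result list. The Python class below is a minimal sketch under assumptions: each result is a dict with hypothetical `title` and `abstract` keys, DETAIL reads the abstract, and TRY AGAIN, SAVE, and STOP hand control to other dialogues, named here with the hypothetical labels `query`, `save`, and `stop`. Where STOP leads in VIRD is not specified.

```python
class VoiceNavigator:
    """Steps through retrieved documents using the Results Dialogue commands.

    handle() returns (next_dialogue, text_to_speak). next_dialogue stays
    "results" for in-list navigation; TRY AGAIN, SAVE, and STOP hand control
    to another dialogue (hypothetical names: "query", "save", "stop").
    """

    HELP = "Say next, previous, stop, repeat, detail, try again, or save."

    def __init__(self, documents):
        self.docs = list(documents)  # each: {"title": str, "abstract": str}
        self.pos = 0

    def handle(self, command: str):
        cmd = " ".join(command.upper().split())  # normalize case and spacing
        if cmd == "TRY AGAIN":
            return "query", "Please say your search words again."
        if cmd == "STOP":
            return "stop", "Goodbye."
        if not self.docs:
            return "results", "No documents were found. Say try again or stop."
        if cmd == "NEXT":
            if self.pos + 1 >= len(self.docs):
                return "results", "That was the last document."
            self.pos += 1
        elif cmd == "PREVIOUS":
            if self.pos == 0:
                return "results", "This is the first document."
            self.pos -= 1
        elif cmd == "DETAIL":
            return "results", self.docs[self.pos]["abstract"]
        elif cmd == "SAVE":
            return "save", "Saving " + self.docs[self.pos]["title"] + "."
        elif cmd != "REPEAT":
            return "results", self.HELP
        # NEXT, PREVIOUS (after moving) and REPEAT all read the current title
        return "results", self.docs[self.pos]["title"]
```

Returning the next dialogue alongside the prompt keeps the navigator independent of how the surrounding VoiceXML pages are generated.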
Experiment
- Participants
  - Twenty Computer Science graduate and senior undergraduate students (ten female, ten male) enrolled in a User Interface Design course at Auburn University
- Procedures: each participant
  1. came in, used the same telephone, and sat in the same chair in the same room, with the experimenter present as an observer
  2. read a one-page instruction sheet
  3. interacted with the VIRD system to complete a task based on the task scenario
  4. filled out a survey giving a subjective evaluation of the system's performance
- Task scenario: "You are working on a research paper for Dr. X's database course. Your research topic is XML. Dr. X wants you to find a document on the subject of tree algebra for XML using the system. When you find the document, use the save option to let the system email it to you."
Evaluation Methodology
- Measured user satisfaction with the voice user interface of the document retrieval system
- PARADISE framework [1]
Evaluation Methodology (cont'd)
- Maximize user satisfaction
  - Maximize task success
  - Minimize costs
    - Efficiency measures
    - Qualitative measures
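In PARADISE, this hierarchy is turned into a single performance score: Performance = α·N(κ) − Σᵢ wᵢ·N(cᵢ), where κ is the kappa statistic for task success, κ = (P(A) − P(E)) / (1 − P(E)), the cᵢ are cost measures (efficiency and qualitative), N is Z-score normalization, and the weights α and wᵢ are obtained by regressing user satisfaction on the normalized terms. The Python sketch below computes κ from a confusion matrix, with P(E) estimated from its column totals, and then the per-dialogue performance for weights that are taken as given. The function names are hypothetical, and any numbers used with it are invented for illustration, not data from the VIRD experiment.

```python
from statistics import mean, pstdev


def kappa(confusion: list) -> float:
    """Kappa for task success from a square confusion matrix.

    P(A): fraction of counts on the diagonal (dialogue value matches key).
    P(E): chance agreement, sum over columns of (column total / grand total)^2.
    """
    total = sum(sum(row) for row in confusion)
    size = len(confusion)
    p_a = sum(confusion[i][i] for i in range(size)) / total
    col_totals = [sum(row[j] for row in confusion) for j in range(size)]
    p_e = sum((t / total) ** 2 for t in col_totals)
    return (p_a - p_e) / (1 - p_e)


def z_normalize(values: list) -> list:
    """N(x) = (x - mean) / standard deviation (population form, assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def performance(kappas: list, costs: list, alpha: float, weights: list) -> list:
    """Performance per dialogue: alpha*N(kappa) - sum_i w_i*N(c_i).

    `costs` holds one list per cost measure (e.g. completion time), each with
    one value per dialogue; alpha and weights are assumed to come from a prior
    regression on user-satisfaction scores.
    """
    nk = z_normalize(kappas)
    nc = [z_normalize(c) for c in costs]
    return [alpha * nk[d] - sum(w * c[d] for w, c in zip(weights, nc))
            for d in range(len(kappas))]
```

Note that `z_normalize` divides by zero if every value of a measure is identical; a real evaluation would need to guard against that case.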
Evaluation Metrics
- The first set: task success, dialogue efficiency, dialogue qualitative
- The second set: completion (time), inaccuracy, misinterpretation
Evaluation Result
[Figure: Metrics comparison chart (percent): Task Success 86.5%, Dialogue Efficiency 89.5%, Dialogue Qualitative 81%, User Satisfaction 85%]
Evaluation Result (cont'd)
[Figure: Time of completion per subject; y-axis: seconds (0 to 600); x-axis: subjects 1 to 18]
Evaluation Result (cont'd)
[Figure: Misinterpretations per subject; y-axis: times (0 to 7); x-axis: subjects 1 to 17]
Experiment Summary
- Although the misinterpretation rate was high, user satisfaction remained high. This suggests that users accept errors as long as they can recover from them easily.
- A potential flaw in PARADISE
  [Figure: PARADISE diagram; labels: Maximize task success, Maximize user satisfaction, Efficiency measures, Minimize costs, Qualitative measures]
Future Work
- Investigate spoken query retrieval for large documents (Yapin's research)
- Investigate a new usability model for voice user interfaces (Priyanka's research)
Demo
Questions
References
1. C. A. Kamm and M. A. Walker. Design and evaluation of spoken dialog systems. In Proceedings of the ASRU Workshop, 1997.