Natural Language Interfaces to Databases: simple tips towards usability



Luísa Coheur, Ana Guimarães, Nuno Mamede
L2F/INESC-ID Lisboa
Rua Alves Redol, 9, 1000-029 Lisboa, Portugal
{lcoheur,arog,nuno.mamede}@l2f.inesc-id.pt
http://www.l2f.inesc-id.pt

Abstract. Natural Language Interfaces to Databases (NLIDBs) can be an easy way to obtain information: the user simply writes a question in his/her own language to get the desired answer. Nevertheless, this kind of application also presents some problems. Many of them arise because whoever develops the interface does so according to his/her own idea of usability, which is sometimes far from the real interaction the interface will have to support. Moreover, even when a question is syntactically supported, it can be misunderstood, and a wrong answer can be given to the user. In this paper we present some simple tips intended to minimize these situations.

1 Introduction

During the implementation of JaTeDigo [1, 2], a Natural Language Interface (in Portuguese) to a cinema database, we had to deal with many usability problems, and we realized that some simple solutions can be implemented to minimize these problems and their effects. Accordingly, in this paper we focus on some tips intended to make NLIDBs more user friendly and trustworthy, improving their usability.

The paper is organized as follows: in Section 2 some related work is presented; in Section 3 we present some tips towards usability; in Section 4 we evaluate one of those tips, namely the importance of presenting examples of questions that are understood by the system, as well as questions that the system is not able to answer; finally, in Section 5 we present some conclusions and future work.

2 Related Work

Communicating with the computer is a long-standing goal of Artificial Intelligence research. Although the first NLIDBs emerged in the 70s, NLIDBs had their golden era in the 80s and mid 90s.
Nowadays, NLIDBs are considered a particular case of question answering (QA) systems. In recent years there have been several attempts to merge QA systems with dialogue systems, improving system results by allowing interaction with the user. For instance, HITIQA (High-Quality Interactive Question Answering) [3] is an interactive question answering system that answers (complex) open-domain questions in natural language, such as "What has been Russia's reaction to U.S. bombing of Kosovo?", and narrows the search space through a clarification dialogue with the user. Another example is the RITEL (Recherche d'Informations par TELéphone) project [4]. Its goal is to integrate conversational and oral capabilities in information retrieval systems (accessed by phone) and, in particular, in QA systems. We also highlight the TV-Guide and BirdQuest projects [5-7]. In TV-Guide, a multimodal system is used to allow access to public domain information, namely television programming. Within this application, the user can formulate a vague question that is then refined in a dialogue. BirdQuest answers questions about Nordic birds; in this system, dialogue capabilities are combined with information extraction.

JaTeDigo follows some of the ideas behind these applications, as we also believe that interacting with the user, even in a very simple manner that does not require developing a true dialogue system, can improve the application's results.

3 Tips

As said before, many problems arise because whoever develops an NLIDB does so according to his/her own idea of usability. In the following we present some tips concerning this problem:

Before starting the NLIDB implementation, a corpus containing questions that users would like to ask should be built. This corpus can be used to identify the questions in which the developer should invest, that is, frequently asked questions with the same syntax and/or topic, but also to confront developers with inventive and unusual questions. In the case of JaTeDigo, before starting the development of the interface, a corpus of around 80 questions was built from 8 users. At that point we realized that there was a set of questions that we could not answer, given the information we had in the database.
For instance, the question "Qual o maior êxito de bilheteira dos últimos 5 anos?" (Which was the biggest box-office hit of the last 5 years?) could not be answered because the database had no information about box-office results. Another problem detected in this phase resulted from the fact that questions were written in Portuguese, while the information regarding characters was in English. Thus, for instance, the question "De quem é a voz do burro no Shrek?" (Whose is the donkey's voice in Shrek?) could not be answered, because we had no means to translate "burro" into "donkey".

Present examples of successful and unsuccessful questions to the user. The examples obtained in the previous step can be used to guide the user towards the types of question that he/she may or may not submit.

A first evaluation should be carried out as early as possible, without embarrassment, and by as many different users as possible.

When the interface is in use, if there is no way for the system to perform a safe disambiguation, it is better to rely on the user to do it. In JaTeDigo, as sometimes there is no way to disambiguate without risking a wrong choice, we opt to ask for the user's opinion. Figure 1 illustrates this disambiguation step for the question "Who directed King Kong?".

Unnecessary interactions should be avoided. For instance, consider the question "Who plays with Emma Watson in Harry Potter?". There are two actresses named Emma Watson; nevertheless, only one of them plays in Harry Potter. This ambiguity should therefore be resolved by the system, as there is no need to ask the user to disambiguate: the information is all there.

Identify situations where the user will use the wrong words to ask the question that he/she has in mind, and adapt the system to them. For instance, the following question was asked to JaTeDigo: "Quem contracena com Hugo Weaving em The Lord of the Rings?" (Who plays with Hugo Weaving in The Lord of the Rings?), and an early version of JaTeDigo answered "Hugo Weaving does not participate in the movie The Lord of the Rings". Why? Because none of the movies from the Tolkien trilogy is called exactly The Lord of the Rings (they have titles such as The Lord of the Rings: The Two Towers). Besides, there is an animated movie from 1978 with exactly that name (and Hugo Weaving does not participate in it). As a result, JaTeDigo understood that the user was asking about the 1978 movie.

As the previous step is not always possible, try to minimize the trouble caused by a wrong answer by providing information that can help the user to validate the answer or to realize that the question was misinterpreted. In JaTeDigo, the film's opening year is provided, as well as the main cast. If JaTeDigo's answer had been "Hugo Weaving does not participate in the movie The Lord of the Rings from 1978", the user would have understood that something was wrong.

Fig. 1. Disambiguation step.
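
The two disambiguation tips above (ask the user when no safe choice exists, but never ask when the database already settles the matter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not JaTeDigo's implementation: the toy data, the `Person` and `Movie` classes, and the functions `find_movies`, `find_people`, `resolve_cast_member`, `needs_user_choice` and `describe` are all hypothetical names invented for this example. Showing the year next to each title follows the tip on helping the user validate how the question was interpreted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Movie:
    movie_id: int
    title: str
    year: int


# Toy data invented for this sketch; it is not the JaTeDigo database.
PEOPLE = [
    Person(1, "Emma Watson"),
    Person(2, "Emma Watson"),  # a second person with the same name
    Person(3, "Daniel Radcliffe"),
]
MOVIES = [
    Movie(10, "Harry Potter and the Philosopher's Stone", 2001),
    Movie(20, "King Kong", 1933),
    Movie(21, "King Kong", 2005),
]
CAST = {10: {1, 3}, 20: set(), 21: set()}  # movie_id -> person_ids


def find_movies(title_fragment: str) -> list[Movie]:
    """Movies whose title contains the fragment, ignoring case."""
    fragment = title_fragment.lower()
    return [m for m in MOVIES if fragment in m.title.lower()]


def find_people(name: str) -> list[Person]:
    """People whose name matches exactly, ignoring case."""
    return [p for p in PEOPLE if p.name.lower() == name.lower()]


def resolve_cast_member(name: str, title_fragment: str) -> list[tuple[Person, Movie]]:
    """Keep only the (person, movie) readings that the cast data supports."""
    return [
        (p, m)
        for m in find_movies(title_fragment)
        for p in find_people(name)
        if p.person_id in CAST[m.movie_id]
    ]


def needs_user_choice(candidates: list) -> bool:
    """Ask the user only when more than one reading survives."""
    return len(candidates) > 1


def describe(movie: Movie) -> str:
    """Show the year with the title so the user can validate the reading."""
    return f"{movie.title} ({movie.year})"


# "Who plays with Emma Watson in Harry Potter?": the cast data leaves a
# single reading, so the system should not bother the user.
readings = resolve_cast_member("Emma Watson", "Harry Potter")
print(needs_user_choice(readings))

# "Who directed King Kong?": two movies match, so the user is asked to pick.
options = find_movies("King Kong")
if needs_user_choice(options):
    print("Which movie do you mean?", [describe(m) for m in options])
```

The design choice here is to filter candidate readings against the data first and to fall back on the user only for what remains ambiguous; a real interface would apply the same idea through database queries rather than in-memory lists.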

In the following we show some preliminary results of an evaluation of the tip on presenting example questions.

4 How important are example-questions?

The JaTeDigo interface is a web page (Figure 2). As with START [8], examples of successful and unsuccessful questions are presented in order to give the user a picture of the system's capabilities and limitations.

Fig. 2. JaTeDigo interface (used in Experiment A).

We asked 10 people to pose 10 questions each to JaTeDigo: 5 used an interface with examples (from now on, Experiment A) and 5 used an interface without examples (from now on, Experiment B). Results are shown in Table 1.

Experiment   Outcome        Detail           Questions   Total
A            Answered       Correctly               29      33
                            Incorrectly              4
             Not answered   Not supported            5      17
                            Incorrect NER            0
                            Other motives           12
B            Answered       Correctly               18      20
                            Incorrectly              2
             Not answered   Not supported           15      30
                            Incorrect NER            9
                            Other motives            6

Table 1. Results from Experiments A and B.

From Table 1 we can see that 33 questions were answered in Experiment A, against 20 in Experiment B. It should be noted that, in both experiments, the proportion of correctly answered questions among the answered ones is similar: 29 out of 33 (about 88%) in Experiment A and 18 out of 20 (90%) in Experiment B. It should also be noted that 15 of the unanswered questions in Experiment B were not answered because they were not supported; only 5 were not supported in Experiment A, and all of them resulted from the same orthographic mistake. In fact, the word "oscars" is written "óscares" in Portuguese, and the 5 not answered/not supported questions were due to a missing accent. Curiously, a question containing this word is shown among the example questions of Experiment A. Thus, although this is a preliminary evaluation, we can say that the user is influenced by the examples shown (mainly by their topics or syntax) but, apparently, he/she does not read the presented examples carefully enough to avoid misspellings. In any case, we can say that it is worth investing in examples in the interface.

5 Conclusions and Future Work

We have presented some tips intended to make NLIDBs more user friendly and trustworthy. First, we have highlighted the user's role: the NLIDB can profit from potential users' feedback during the development process, which makes it possible to understand the questions that will actually be asked to the system (and not only what the development team has in mind). The NLIDB can also profit from user feedback while the interface is running, for instance, for disambiguation purposes. Second, we have presented some tips to increase (or at least not to decrease) the user's confidence: the system should avoid unnecessary questions and should include in its answers information that helps the user to understand whether or not the question was correctly interpreted. Also, particular situations in which it is known that the user will formulate the question incorrectly should be identified. Moreover, we have presented an experiment intended to show the importance of guiding the user with successful and unsuccessful examples, and we have shown that this guidance led to a considerable increase in successfully answered questions, although it does not help to avoid misspellings.

A system such as JaTeDigo, like any NLIDB, needs constant improvement. As future work we intend to continue extending its understanding capabilities and to make it more robust: if only part of the request is understood, a dialogue with the user should be established in order to refine the question.
Moreover, we intend to incorporate some of these tips in a QA system.

References

1. Guimarães, R.: Játedigo - uma interface em língua natural para uma base de dados de cinema. Master's thesis, Instituto Superior Técnico (2007)

2. Coheur, L., Guimarães, R., Mamede, N.: Supporting named entity recognition and syntactic analysis with full text queries. In: Proceedings of the 13th International Conference on Applications of Natural Language to Information Systems (NLDB2008), London, Springer-Verlag (2008)

3. Small, S., Strzalkowski, T., Liu, T., Ryan, S., Salkin, R., Shimizu, N., Kantor, P., Kelly, D., Rittman, R., Wacholder, N., Yamrom, B.: Hitiqa: Scenario based question answering. In Harabagiu, S., Lacatusu, F., eds.: HLT-NAACL 2004: Workshop on Pragmatics of Question Answering, Boston, Massachusetts, USA, Association for Computational Linguistics (May 2 - May 7 2004) 52-59

4. Rosset, S., Galibert, O., Illouz, G., Max, A.: Interaction et recherche d'information : le projet Ritel. Traitement Automatique des Langues 46(46-3) (2006)

5. Jönsson, A., Merkel, M.: Some issues in dialogue-based question-answering. In Maybury, M.T., ed.: New Directions in Question Answering, AAAI Press (2003) 45-48

6. Jönsson, A., Merkel, M.: Extending qa systems to dialogue systems. In: Working Notes from NoDaLiDa '03, Iceland (2003)

7. Jönsson, A., Andén, F., Degerstedt, L., Flycht-Eriksson, A., Merkel, M., Norberg, S.: Experiences from combining dialogue system development with information extraction techniques. In: New Directions in Question Answering. (2004) 153-168

8. Katz, B., Lin, J.: Annotating the semantic web using natural language. In: NLPXML '02: Proceedings of the 2nd workshop on NLP and XML, Morristown, NJ, USA, Association for Computational Linguistics (2002) 1-8