Providing Inferential Capability to Natural Language Database Interface




International Journal of Electronics and Computer Science Engineering
Available Online at www.ijecse.org, ISSN 2277-1956

Harjit Singh
Assistant Professor, Department of Computer Science, Punjabi University Akali Phoola Singh Neighbourhood Campus, Dehla Seehan (Sangrur), Punjab, India
Email: hjit@live.com

Abstract- Not everybody can write SQL (Structured Query Language) queries, and many users are not aware of the structure of the database they want to query. Non-expert users therefore need a way to query relational databases in their natural language. The idea of using natural language instead of SQL has promoted the development of Natural Language Interface to Database (NLIDB) systems. Traditional information retrieval models were based on approximation and lexical mapping, which have their own deficiencies. Such a system is inadequate if the query uses hypernyms (broad category words). If the user uses a synonym of a word in the lexicon, the system is unable to access the database. A homonymous keyword in a query may give rise to ambiguity and produce an erroneous result, because the system cannot determine which meaning of the homonym is intended. The lexemes in a query may also be related to each other and produce a combined meaning that the classical system does not consider. To overcome these limitations, a knowledgebase can be provided to the NLIDB. The knowledgebase gives the system inferential capability through a collection of hypernyms, synonyms, homonyms, discourse and other information required to produce accurate results.

Keywords: NLIDB, NLI, hypernyms, synonyms, homonyms, discourse

I. INTRODUCTION

Asking questions of a database in natural language is a convenient and easy method of data access, especially for casual users who do not understand a complex database query language such as SQL.
Although researchers have made a number of efforts to provide intelligence to the Natural Language Interface to Database (NLIDB), the work is not complete. Some of these efforts are described below.

The LUNAR system was introduced in 1971. It uses an Augmented Transition Network (ATN) parser and Woods' procedural semantics. Its performance was quite impressive: it handled 78% of requests without any errors, and this ratio rose to 90% when dictionary errors were corrected. These figures may be misleading, however, because the system was not subjected to intensive use due to the limitations of its linguistic capabilities.

The LADDER system was designed as a natural language interface to a database of information about US Navy ships. It uses a semantic grammar technique that interleaves syntactic and semantic processing. The system was able to process a database equivalent to a relational database with 14 tables and 100 attributes.

The RENDEZVOUS system appeared in the late seventies. Its users could access databases via relatively unrestricted natural language. The system places special emphasis on query paraphrasing and on engaging users in clarification dialogs when there is difficulty in parsing user input.

The PLANES system was developed in the late seventies at the University of Illinois Coordinated Science Laboratory. PLANES includes an English language front end with the ability to understand and explicitly answer user requests. It carries out clarifying dialogues with the user and answers vague or poorly defined questions.

The PHILIQA system, known as the Philips Question Answering System, was developed in 1977. It uses a syntactic parser that runs as a separate pass from the semantic understanding passes. The system is mainly concerned with problems of semantics and has three separate layers of semantic understanding.

CHAT-80 is one of the most referenced NLP (Natural Language Processing) systems of the eighties. The database of CHAT-80 consists of facts (i.e. oceans, major seas, major rivers and major cities) about 150 of the countries of the world, together with a small English vocabulary that is sufficient for querying the database. The CHAT-80 system processes an English language question in three stages.

The TEAM system was developed in 1987. A large part of the research of that time was devoted to portability issues. TEAM was designed to be easily configurable by database administrators with no knowledge of NLIDBs.

The DATALOG system is an English database query system based on Cascaded ATN grammar. By providing separate representation schemes for linguistic knowledge, general world knowledge, and application domain knowledge, DATALOG achieves a high degree of portability and extendibility.

NALIX (Natural Language Interface for an XML Database) is an NLIDB system developed at the University of Michigan in 2006. The database used for this system is an extensible markup language (XML) database with Schema-Free XQuery as the database query language. NALIX differs from the general syntax based approaches in the way the system was built: it implements a reverse-engineering technique by building the system from the query language toward the sentences.

None of these efforts is worthless, although none is complete. These systems fail to understand and execute queries containing hypernyms, synonyms, homonyms, discourse, etc. The next section of this paper presents a method for overcoming these inadequacies.

II. PROVIDING INFERENTIAL CAPABILITY TO NLIDB

The conventional NLI system is not capable of understanding hypernyms, synonyms, homonyms and discourse. This section discusses how the results are affected when such words are used in a query and proposes a method to overcome the deficiencies of the system.

A. Hypernyms

A hypernym is a word that is more generic than a given word: a linguistic term for a word whose meaning includes the meanings of other words.
If a query contains a hypernym of a specific word, the classical NLI system will be inadequate because it has no knowledge of hypernyms. For example, suppose the NLI query is:

Number of students doing graduation

The conventional NLI does not know that the word graduation is a hypernym of BA, BSc, BCom, BCA, BBA, etc. (Figure 1), so it will fail to produce the required result. To overcome this limitation, a database of hypernyms needs to be embedded in the NLI system.

[Figure 1 diagram: Graduation, branching to BA, BSc, BCom, BCA, BBA]
Figure 1. Graduation is a hypernym for BA, BSc, BCom, BCA, BBA etc.

B. Synonyms

Synonyms are different words with almost identical or similar meanings. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. If a synonym of a word in the lexicon is used in a query, the classical information retrieval methods will be unable to answer the query because no database of synonyms is available. For example:

If an NLI query is:

Number of employees whose salary is more than 25000

The equivalent SQL query will be something like:

SELECT COUNT(*) FROM EMPLOYEES WHERE SALARY > 25000

The query will produce the result correctly. But if the user uses a synonym and enters the NLI query as:

Number of employees whose pay is more than 25000

The equivalent SQL query will be something like:

SELECT COUNT(*) FROM EMPLOYEES WHERE PAY > 25000

In this case the query will not produce the correct result, because the NLI does not know that Salary and Pay are two words with the same meaning. To enable the NLI to answer queries that use synonyms, a database of synonyms needs to be embedded in the NLI system. Then, if a lexicon entry or keyword does not match, the system will not refuse to answer the query. Instead, it will traverse all the synonyms of the unmatched keyword provided in the database of synonyms and replace the unmatched keyword with its synonym, so that the query can be answered successfully (Figure 2).

[Figure 2 diagram: Pay, with synonyms Salary, Income, Remuneration, leading to SELECT COUNT(*) FROM EMPLOYEES WHERE SALARY > 25000]
Figure 2. Replacing an unmatched keyword with its synonym

IJECSE, Volume 1, Number 3, Harjit Singh et al.

C. Homonyms

In linguistics, a homonym is, in the strict sense, one of a group of words that share the same spelling and the same pronunciation but have different meanings. The state of being a homonym is called homonymy. If a query contains a homonymous keyword, the meaning of the query becomes ambiguous. An erroneous result may be produced because the classical NLI cannot determine the intended meaning of the homonym. For example, suppose the NLI query is:

How many singers like the rock?

The word rock has two meanings:
1) The solid mineral material forming part of the surface of the earth; a large stone.
2) A type of music (a shortened form of rock and roll).

The classical NLI does not know the actual meaning of rock in the query. To overcome this limitation, a knowledgebase of homonyms needs to be embedded in the NLI system. Since the word rock can be used in two different contexts, the knowledgebase will provide the actual meaning of the word by relating it to the other words in the query (Figure 3 and Figure 4).
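The three lookups described in Sections A to C can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the table contents, the context words attached to each sense of rock, the column set, and all function names (`resolve_column`, `expand_hypernym`, `pick_sense`, `count_query`) are assumptions introduced here.

```python
import re

# Hypothetical knowledge base. Every entry is an illustrative assumption.
HYPERNYMS = {"graduation": ["BA", "BSc", "BCom", "BCA", "BBA"]}
SYNONYMS = {"pay": "salary", "income": "salary", "remuneration": "salary"}
# Each sense of a homonym lists context words that point to that sense.
HOMONYMS = {
    "rock": {
        "type of music": {"singer", "singers", "band", "song"},
        "large stone": {"mineral", "stone", "mountain", "geology"},
    }
}
SCHEMA_COLUMNS = {"salary", "name", "course"}  # assumed columns of EMPLOYEES


def tokenize(query):
    """Split a query into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def resolve_column(word):
    """Return the schema column for a word, following the synonym table if needed."""
    word = word.lower()
    if word in SCHEMA_COLUMNS:
        return word
    canonical = SYNONYMS.get(word)
    return canonical if canonical in SCHEMA_COLUMNS else None


def expand_hypernym(word):
    """Return the specific terms covered by a hypernym, or the word itself."""
    return HYPERNYMS.get(word.lower(), [word])


def pick_sense(word, query):
    """Pick the homonym sense whose context words overlap the query the most."""
    senses = HOMONYMS.get(word.lower())
    if senses is None:
        return None
    tokens = set(tokenize(query))
    best = max(senses, key=lambda sense: len(senses[sense] & tokens))
    return best if senses[best] & tokens else None


def count_query(table, word, threshold):
    """Build a COUNT query after replacing an unmatched keyword with its synonym."""
    column = resolve_column(word)
    if column is None:
        raise ValueError(f"no column matches {word!r}")
    return (
        f"SELECT COUNT(*) FROM {table.upper()} "
        f"WHERE {column.upper()} > {int(threshold)}"
    )
```

Under these assumptions, `count_query("employees", "pay", 25000)` maps pay to the SALARY column, `expand_hypernym("graduation")` returns the five degree names, and `pick_sense("rock", "How many singers like the rock?")` selects the music sense because singers is among its context words. A real system would need a far larger knowledge base and a safer way of building SQL than string formatting.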

Providing Inferential Capability to Natural Language Database Interface Rock Type of Music Related to Singer Figure 3. Rock is related to Singer Rock Is A Large Stone Composed Of Solid Mineral Material Figure 4. Rock is a large stone D. Discourse Discourse is the power of the mind to reason or infer by running, as it were, from one fact or reason to another, and deriving a conclusion; an exercise or act of this power; reasoning; range of reasoning faculty. If a query contains a keyword that refers to another keyword, there is the possibility of failure to produce the result because classical NLI does not have the outer world knowledge. For example: If NLI query is: Show the age of John and his father? Classical NLI cannot conclude the fact due to the non-availability of reasoning capability. If a knowledgebase of pronouns, anaphora and noun-phrases (NP) is embedded in NLI system, it will provide intelligence to the system to make a conclusion in such situations and result from this type of query can be achieved. The Discourse Representation Structure (DRS) in Figure-5 shows the discourse in the above query. ({x, y}, {x=john, His(x), Robert(y), Father(y, x)}) x, y

IJECSE,Volume1,Number 3 Harjit Singh et al. x= John His(x) Robert(y) Father(y, x) Figure 5. DRS of John and his Father So in this query, his refers to John and y is the Father of x and y is Robert. After this conclusion, the query will become: Show the age of John and Robert Which can be easily processed and produce the required result. III. CONCLUSION Conventional NLI model suffered from inadequacies which allow only static format queries to be executed by the system. It puts more burdens on the user to formulate queries that the system can answer successfully. To make NLI system user friendly, a knowledgebase is embedded in it to overcome its deficiency of realizing hypernyms, synonyms, homonyms and discourse. This inferential capability will make the NLI system intelligent and enable it to execute open domain based queries. It will be an intelligent NLIDB. REFERENCES [1] Majdi Owda, Zuhair Bandar, Keeley Crockett, Conversation-Based Natural Language Interface to Relational Databases, pages 363-367 (2007). [2] Mrs. Neelu Nihalani 1, Dr. Sanjay Silakari 2, Dr. Mahesh Motwani, Natural language Interface for Database: A Brief review, IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, pages 600-608, March 2011 [3] Siddiqui, Tanveer and Tiwary, U.S., Natural Language processing and Information Retrieval, Oxford University press (2008). [4] Albert Visser, Discourse Representation by Hypergraphs, Doctoraalscriptie voor de studie Cognitieve Kunstmatige Intelligentie Sander Bruggink, November 6, 2001 [5] Jha, Girish Nath, A Natural Language Interface for Databases, Dept. of Linguistics, University of Illinois, Urbana-Champaign [6] www.thefreedictionary.com/rock [7] en.wikipedia.org/wiki/synonym [8] http://grammar.about.com/od/fh/g/hypernym.htm [9] http://en.wikipedia.org/wiki/homonym [10] www.definitions.net/definition/discourse [11] Woods, W. An experimental parsing system for transition network grammars. In Natural language Processing, R. 
Rustin, Ed.,Algorithmic Press, New York. (1973) [12] Woods, W., Kaplan, R. and Webber, B. The Lunar Sciences Natural Language Information System. Bolt Beranek and Newman Inc., Cambridge, Massachusetts Final Report. B. B. N. Report No 2378. (1972) [13] Hendrix, G. The LIFER manual A guide to building practical natural language interfaces. SRI Artificial Intelligence Center, Menlo Park, Calif. Tech. Note 138. (1977) [14] Hendrix, G., Sacrdoti, E., Sagalowicz, D. and Slocum, J. Developing a natural language interface to complex data. ACM Transactions on Database Systems, Volume 3, No. 2, USA, Pages 105 147 (1978) [15] D.L. Waltz., An English Language Question Answering System for a Large Relational Database, Communications of the ACM, 21(7):, pp 526 539 (July 1978) [16] R.J.H., Scha., Philips Question Answering System PHILIQA1, In SIGART Newsletter, no.61. ACM, New York, (February 1977) [17] Amble, T. BusTUC A Natural Language Bus Route Oracle. 6th Applied Natural Language Processing Conference, Seattle, Washington, USA (2000) [18] Warren, D., Pereira, F. An efficient and easily adaptable system for interpreting natural language queries in Computational Linguistics. Volume 8 pages 3 4. (1982)

Providing Inferential Capability to Natural Language Database Interface [19] B.J. Grosz, TEAM: A Transportable Natural-Language Interface System, In Proceedings of the 1st Conference on Applied Natural Language Processing, Santa Monica, California, pp 39 45, (1983) [20] B.J. Grosz, D.E. Appelt, P.A. Martin, and F.C.N. Pereira, TEAM: An Experiment in the Design of TransportableNatural-Language Interfaces, Artificial Intelligence, 32:, pp 173 243, ( 1987) [21] C.D. Hafner, Interaction of Knowledge Sources in a Portable Natural Language Interface, In Proceedings of the 22nd Annual Meeting of ACL, Stanford, California, pp 57 60, (1984) [22] Yunyao Li, Huahai Yang, and H.V. Jagadish, Nalix:an Interactive Natural Language Interface for Querying XML, SIGMOD (2005). [23] Yunyao Li, Huahai Yang, and H.V. Jagadish, Constructing a Generic Natural Language Interface for an XML Database, EDBT (2006).