Classification of Natural Language Interfaces to Databases based on the Architectures




Volume 1, No. 11, ISSN 2278-1080
The International Journal of Computer Science & Applications (TIJCSA)
RESEARCH PAPER
Available Online at http://www.journalofcomputerscience.com/

S. AQUTER BABU, Asst. Professor, Dept. of Computer Science, Dravidian University, Kuppam, India (s_a_babu1@yahoo.co.in)
D. MABUNI, Asst. Professor, Dept. of Computer Science, Dravidian University, Kuppam, India (mabuni.d@gmail.com)
Prof. C. LOKANATHA REDDY, Professor, Dept. of Computer Science, Dravidian University, Kuppam, India

Abstract

A Natural Language Interface to Database (NLITDB) system is an interface to a database through which a user submits a request for information in a natural language such as English. An NLITDB system accepts questions in natural language and generates results. Ordinarily, users must learn a query language such as Structured Query Language (SQL) to formulate a query and retrieve information from a database, and learning such a language is difficult for many non-technical database users. One solution to this problem is to use an NLITDB system to retrieve information from the database. NLITDB systems have gained importance as non-technical users increasingly interact with databases. Many NLITDB systems have been developed since the 1960s, and each uses an architecture to process the natural language question submitted by the user. In this paper, we classify and review the existing NLITDB systems based on the architectures they adopt. The outcome of this classification is intended to help researchers working on NLITDB systems see which architectures have been used most often in their development.

Keywords: Databases, Natural Language Interface to Database (NLITDB), Architecture, Structured Query Language (SQL).

2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved

1. Introduction

One of the main functions of a Database Management System (DBMS) is to allow users to create and maintain a database. A database is an organized collection of logically related data. Nowadays, many non-technical people also interact with databases. DBMSs provide a query language such as Structured Query Language (SQL) with which users formulate queries and retrieve information from a database. Non-technical people find it difficult to formulate a query in such a language because they lack knowledge of the database structure, SQL syntax, and so on.

Natural Language Interface to Database (NLITDB) systems have been developed since the 1960s to solve this problem. They allow users to submit requests for information in a natural language such as English. An NLITDB system accepts questions in natural language and translates them into a query language such as SQL; the resulting queries are processed by the DBMS to retrieve the answers.

In this paper, we classify and review the existing NLITDB systems based on the architectures they adopt. The outcome of this classification is intended to help researchers working on NLITDB systems see which architectures have been used most often in their development.

The rest of the paper is organized as follows. Section 2 presents an overview of the different architectures adopted by many NLITDB systems. Section 3 discusses the classification of NLITDB systems based on the architectures they adopt. Section 4 presents the results, and Section 5 concludes the paper.

2. Types of Architectures

The following four types of architectures were used in the development of many NLITDB systems [1]. Each architecture in an NLITDB system reflects different choices of what information is to be applied and in what manner.

    Pattern-Matching systems
    Syntax-based systems
    Semantic Grammar systems
    Intermediate Representation Languages (IRL)

2.1) Pattern-Matching systems

Some of the early NLITDB systems relied on pattern-matching techniques to answer the user's questions. To illustrate a simplistic pattern-matching approach, consider a database table holding information about countries:

    Countries table

    Country   Capital   Language
    -------   -------   --------
    France    Paris     French
    Italy     Rome      Italian
    ...       ...       ...

A primitive pattern-matching system could use rules like:

    pattern: ... "capital" ... <country>
    action:  Report Capital of row where Country = <country>

The above rule says that if a user's request contains the word "capital" followed by a country name (i.e. a name appearing in the Country column), then the system should locate the row which contains the country name, and print the corresponding capital.
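The rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any system reviewed here: the in-memory `COUNTRIES` list stands in for the database table, and the function name `answer_capital` and the use of a regular expression are assumptions made for the example.

```python
import re

# Toy stand-in for the Countries table (only the two rows shown in the text).
COUNTRIES = [
    {"Country": "France", "Capital": "Paris", "Language": "French"},
    {"Country": "Italy", "Capital": "Rome", "Language": "Italian"},
]


def answer_capital(question):
    """Apply the rule: ... "capital" ... <country> -> report Capital of that row.

    Returns the capital as a string, or None when the rule does not fire.
    """
    for row in COUNTRIES:
        # <country> must be a value of the Country column that appears
        # somewhere after the word "capital" in the request.
        pattern = r"\bcapital\b.*\b" + re.escape(row["Country"]) + r"\b"
        if re.search(pattern, question, flags=re.IGNORECASE):
            return row["Capital"]
    return None


print(answer_capital("Tell me the capital of France"))  # prints "Paris"
```

Because the rule inspects only two surface features of the request, it needs no parser, but it also returns nothing for any request that does not contain the word "capital" followed by a known country name.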

If, for example, the user typed "What is the capital of Italy?", the system would use the above pattern rule and report "Rome". The same rule would allow the system to handle "Print the capital of Italy.", "Could you please tell me what is the capital of Italy?", etc. In all cases the same response would be generated.

The main advantage of the pattern-matching approach is its simplicity: no elaborate parsing and interpretation modules are needed, and the systems are easy to implement. The pattern-matching architecture was used in the NLITDB system SAVVY.

2.2) Syntax-based systems

Syntax-based systems are based on the idea of extending syntactic parsers with semantic labels. A sentence is parsed using grammar rules, resulting in a syntactic tree. Some of the nodes in the tree are then mapped to their semantic meanings, and these meanings are combined to produce the corresponding query in a database query language such as SQL.

The main advantage of syntax-based approaches is that they provide detailed information about the structure of a sentence. A parse tree contains a lot of information about the sentence structure: each word and its part of speech, how words group together to form a phrase, and how phrases group together to form more complex phrases, until a complete sentence is built. With this information, semantic meanings can be mapped to particular production rules (or nodes in a parse tree). The syntax-based systems architecture was used in NLITDB systems such as LUNAR and NALIX.

2.3) Semantic Grammar systems

A semantic grammar system is very similar to a syntax-based system in that the query result is obtained by mapping the parse tree of a sentence to a query in a database query language such as SQL. The basic idea of a semantic grammar system is to simplify the parse tree as much as possible by removing unnecessary nodes or combining some nodes together. In this way, the semantic grammar system can better reflect the semantic representation without complex parse tree structures. Consequently, a production rule in a semantic grammar system does not necessarily correspond to general syntactic concepts. Besides producing smaller structures, the semantic grammar approach also provides a way of assigning a name to a given node in the tree, which results in less ambiguity than the syntax-based approach. The semantic grammar systems architecture was used in NLITDB systems such as PLANES, LADDER, and REL.

2.4) Intermediate Representation Languages (IRL)

Because of the difficulty of translating a sentence directly into a general database query language using a syntax-based approach, intermediate representation systems were proposed. The idea is to map a sentence into a logical query language first, and then translate this logical query into a general database query language such as SQL. The process may involve more than one intermediate meaning representation language. The following figure shows a possible architecture of an intermediate representation language system.

[Figure: possible architecture of an intermediate representation language system]
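The two-stage translation described in this section can be illustrated with a small Python sketch. It is a simplified assumption, not the design of any system named in this paper: the `LogicalQuery` class is a hypothetical intermediate form, the first stage uses a narrow regular expression where a real system would use a full parser, and the table and column names are those of the Countries example in Section 2.1.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:
    """Hypothetical intermediate form: fetch `attribute` where `key` = `value`."""
    attribute: str
    key: str
    value: str


def to_logical(question):
    """Stage 1: map a narrow class of questions to the intermediate form."""
    m = re.search(r"\b(capital|language) of (\w+)", question, flags=re.IGNORECASE)
    if m is None:
        return None
    return LogicalQuery(attribute=m.group(1).capitalize(),
                        key="Country", value=m.group(2))


def to_sql(query, table="Countries"):
    """Stage 2: translate the intermediate form into a parameterized SQL query."""
    sql = f"SELECT {query.attribute} FROM {table} WHERE {query.key} = ?"
    return sql, (query.value,)


logical = to_logical("What is the capital of Italy?")
print(logical)
print(to_sql(logical))
```

The point of the sketch is the separation of concerns: the first stage knows nothing about SQL and the second knows nothing about English, so either stage could be replaced (for example, to target a different query language) without touching the other.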

The Intermediate Representation Languages (IRL) architecture was used in NLITDB systems such as CHAT-80, PHILIQA, and TEAM.

3. Classification of NLITDB Systems

Each NLITDB system uses an architecture to process the natural language question submitted by the user. We collected information about twenty-one existing NLITDB systems from published research papers available on the Internet. After studying and analyzing these systems, we classified them into categories based on the architectures they adopt.

The following NLITDB system adopted the Pattern-Matching systems architecture:
    SAVVY

The following NLITDB systems adopted the Syntax-based systems architecture:
    LUNAR
    NALIX

The following NLITDB systems adopted the Semantic-Grammar systems architecture:
    LADDER
    RENDEZVOUS
    PLANES
    REL
    EUFID
    ELF
    EASYASK
    ENGLISH QUERY

The following NLITDB systems adopted the Intermediate-Representation Languages (IRL) architecture:
    PHILIQA
    CHAT-80
    TEAM
    IRUS
    Ginsparg's
    JANUS
    LOQUI
    MASQUE/SQL
    EDITE
    CLE
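As a quick check on the counts, the classification above can be written down as a small Python mapping and tallied. The variable names are chosen for this illustration only; the system names and groupings are exactly those listed above.

```python
# Classification from Section 3, keyed by architecture.
CLASSIFICATION = {
    "Pattern-Matching systems": ["SAVVY"],
    "Syntax-based systems": ["LUNAR", "NALIX"],
    "Semantic-Grammar systems": [
        "LADDER", "RENDEZVOUS", "PLANES", "REL",
        "EUFID", "ELF", "EASYASK", "ENGLISH QUERY",
    ],
    "Intermediate-Representation Languages (IRL)": [
        "PHILIQA", "CHAT-80", "TEAM", "IRUS", "Ginsparg's",
        "JANUS", "LOQUI", "MASQUE/SQL", "EDITE", "CLE",
    ],
}

# Number of systems per architecture, and the overall total.
counts = {arch: len(systems) for arch, systems in CLASSIFICATION.items()}
total = sum(counts.values())

for arch, n in counts.items():
    print(f"{arch}: {n}")
print(f"Total: {total}")  # prints "Total: 21"
```

The tally gives 1, 2, 8, and 10 systems for the four architectures respectively, which sums to the twenty-one systems surveyed.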

The following graph shows the above classification.

[Figure: bar chart "Classification of NLITDB Systems"; x-axis: Architectures (Pattern-Matching systems, Syntax-based systems, Semantic-Grammar systems, Intermediate-Representation Languages); y-axis: Number of NLITDB Systems]

4. Results

Based on our study and analysis of NLITDB systems, we identified twenty-one existing NLITDB systems and their architectures. We also found that most of them adopted the Semantic-Grammar systems and Intermediate-Representation Languages architectures (8 and 10 of the 21 systems, respectively).

5. Conclusion

A Natural Language Interface to Database (NLITDB) system allows database users to formulate questions in a natural language such as English to retrieve information from a database. Users' questions are translated into a database query language such as SQL, which is processed by a DBMS to return the answer. Many NLITDB systems with different architectures have been developed since the 1960s. In this paper, we have classified twenty-one existing NLITDB systems according to the four main architectures they adopt. Based on our study and analysis, we conclude that most of the NLITDB systems adopted the Semantic-Grammar systems and Intermediate-Representation Languages architectures.

References

[1] I. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases - An Introduction, Journal of Natural Language Engineering 1 Part 1 (1995), 29-81.
[2] Eric Brill, Transformation Based Error Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging, ACL (1995).
[3] Eugene Charniak, A maximum-entropy-inspired parser, North American Association for Computational Linguistics (2000), 132-139.
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, Second Edition, MIT Press and McGraw-Hill, 2001.
[5] D.R. Dowty, R.E. Wall, and S. Peters, Introduction to Montague Semantics, D. Reidel Publishing Company, Dordrecht, Holland, 1981.
[6] G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, Developing a Natural Language Interface to Complex Data, ACM Transactions on Database Systems (1978), 105-147.
[7] Daniel Jurafsky and James H. Martin, Speech and Natural Language Processing, Prentice-Hall Inc., Upper Saddle River, New Jersey, 2000.
[8] Rohit J. Kate and Raymond J. Mooney, Using String-Kernels for Learning Semantic Parsers, COLING ACL (2006).
[9] Yunyao Li, Huahai Yang, and H.V. Jagadish, NaLIX: an Interactive Natural Language Interface for Querying XML, SIGMOD (2005).
[10] Yunyao Li, Huahai Yang, and H.V. Jagadish, Constructing a Generic Natural Language Interface for an XML Database, EDBT (2006).
[11] Raymond J. Mooney, Learning Language from Perceptual Context: A Challenge Problem for AI, American Association for Artificial Intelligence (2006).
[12] Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates, Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability, COLING (2004).
[13] W. Woods, An experimental parsing system for transition network grammars, in Natural Language Processing, R. Rustin, Ed., Algorithmic Press, New York, 1973.
[14] B.J. Grosz, TEAM: A Transportable Natural Language Interface System, in Proceedings of the 1st Conference on Applied Natural Language Processing, Santa Monica, California (1983), pp. 39-45.
[15] P. Resnik, Access to Multiple Underlying Systems in JANUS, BBN report 7142, Bolt Beranek and Newman Inc., Cambridge, Massachusetts (September 1989).