NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR



Similar documents
NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

Natural Language Web Interface for Database (NLWIDB)

Natural Language to Relational Query by Using Parsing Compiler

Classification of Natural Language Interfaces to Databases based on the Architectures

International Journal of Advance Foundation and Research in Science and Engineering (IJAFRSE) Volume 1, Issue 1, June 2014.

NATURAL LANGUAGE TO SQL CONVERSION SYSTEM

Pattern based approach for Natural Language Interface to Database

NATURAL LANGUAGE DATABASE INTERFACE

Providing Inferential Capability to Natural Language Database Interface

S. Aquter Babu 1 Dr. C. Lokanatha Reddy 2

Department of Computer Science and Engineering, Kurukshetra Institute of Technology &Management, Haryana, India

A Natural Language Query Processor for Database Interface

Natural Language Database Interface for the Community Based Monitoring System *

DEVELOPMENT OF NATURAL LANGUAGE INTERFACE TO RELATIONAL DATABASES

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Natural language Interface for Database: A Brief review

Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

A Survey of Natural Language Interface to Database Management System

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Conceptual Schema Approach to Natural Language Database Access

A Case Study of Question Answering in Automatic Tourism Service Packaging

Search Result Optimization using Annotators

Object-Relational Database Based Category Data Model for Natural Language Interface to Database

Automatic Text Analysis Using Drupal

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Overview of MT techniques. Malek Boualem (FT)

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages

Metafrastes: A News Ontology-Based Information Querying Using Natural Language Processing

D2.4: Two trained semantic decoders for the Appointment Scheduling task

A Study of the Various Architectures for Natural Language Interface to DBs

Building a Question Classifier for a TREC-Style Question Answering System

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

IAI : Knowledge Representation

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Optimization of Image Search from Photo Sharing Websites Using Personal Data

Special Topics in Computer Science

An Arabic Natural Language Interface System for a Database of the Holy Quran

Semantic annotation of requirements for automatic UML class diagram generation

2014/02/13 Sphinx Lunch

Extraction of Radiology Reports using Text mining

A Survey on Product Aspect Ranking

A Mixed Trigrams Approach for Context Sensitive Spell Checking

Outline of today s lecture

Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Download Check My Words from:

Effective Self-Training for Parsing

ISSN: Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education

Interactive Dynamic Information Extraction

The Prolog Interface to the Unstructured Information Management Architecture

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Collecting Polish German Parallel Corpora in the Internet

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Computer Standards & Interfaces

II. PREVIOUS RELATED WORK

RRSS - Rating Reviews Support System purpose built for movies recommendation

Healthcare Measurement Analysis Using Data mining Techniques

Application of Natural Language Interface to a Machine Translation Problem

Sentiment analysis for news articles

Shallow Parsing with Apache UIMA

DESIGN OF AN ONLINE EXPERT SYSTEM FOR CAREER GUIDANCE

INTEGRATED DEVELOPMENT ENVIRONMENTS FOR NATURAL LANGUAGE PROCESSING

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE)

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

DEPENDENCY PARSING JOAKIM NIVRE

Specialty Answering Service. All rights reserved.

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Linguistic Preference Modeling: Foundation Models and New Trends. Extended Abstract

Comprendium Translator System Overview

A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students

Learning is a very general term denoting the way in which agents:

31 Case Studies: Java Natural Language Tools Available on the Web

Search Engine Based Intelligent Help Desk System: iassist

Identifying Focus, Techniques and Domain of Scientific Papers

Component Approach to Software Development for Distributed Multi-Database System

Distributed Database for Environmental Data Integration

A Knowledge-based System for Translating FOL Formulas into NL Sentences

Transcription:

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati Vidyapeeth Deemed University, Pune, India ABSTRACT Databases have become ubiquitous. Almost all IT applications are storing into and retrieving information from databases. Retrieving information from the database requires knowledge of technical languages such as Structured Query Language (SQL). However majority of the users who interact with the databases do not have a technical background and are intimidated by the idea of using languages such as SQL. This has led to the development of a few Natural Language Database Interfaces (NLDBIs). A NLDBI allows the user to query the database in a natural language. This paper highlights on architecture of new NLDBI system using Probabilistic Context Free Grammar (PCFG), its implementation and discusses on results obtained. In most of the typical NLDBI systems the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. A natural language query is translated to an equivalent SQL query after processing through various stages. The work has been experimented on primitive database queries with certain constraints. KEYWORDS: Natural language database interface, representation converter, syntactic and semantic knowledge, PCFG. I. INTRODUCTION Natural language processing is becoming one of the most active areas in Human-computer Interaction. The goal of NLP is to enable communication between people and computers without resorting to memorization of complex commands and procedures. In other words, NLP is a technique which can make the computer understand the languages naturally used by humans. While natural language may be the easiest symbol system for people to learn and use, it has proved to be the hardest for a computer to master. Despite the challenges, natural language processing is widely regarded as a promising and critically important endeavor in the field of computer research. The general goal for most computational linguists is to instill the computer with the ability to understand and generate natural language so that eventually people can address their computers through text as though they were addressing another person. The applications that will be possible when NLP capabilities are fully realized are impressive computers would be able to process natural language, translating languages accurately and in real time, or extracting and summarizing information from a variety of data sources, depending on the users requests. This paper describes a natural language database interface that wires complex queries based on a probabilistic context free grammar (PCFG) to relational database. First we summarize some classic NLDBI systems. Consequently we discuss the overall system architecture of the natural language database interface, some implementation details and experimental results. II. RELATED WORK The very first attempts at Natural language database interfaces are just as old as any other NLP research. In fact database NLP may be one of the most important successes in NLP since it began. 568 Vol. 3, Issue 2, pp. 568-573

Asking questions to databases in natural language is a very convenient and easy method of data access, especially for users who do not understand complicated database query languages such as SQL. The success in this area is partly because of the real-world benefits that can come from database NLP systems, and partly because NLP works very well in a single-database domain. Databases usually provide small enough domains that ambiguity problems in natural language can be resolved successfully. Here are some examples of database NLP systems: 2.1 LUNAR LUNAR (Woods, 1973) involved a system that answered questions about rock samples brought back from the moon. Two databases were used, the chemical analyses and the literature references. The program used an Augmented Transition Network (ATN) parser and Woods' Procedural Semantics. The system was informally demonstrated at the Second Annual Lunar Science Conference in 1971. [1] 2.2 LlFER/LADDER It was one of the first good database NLP systems. It was designed as a natural language interface to a database of information about US Navy ships. This system, as described in a paper by Hendrix (1978), used a semantic grammar to parse questions and query a distributed database. The LIFERILADDER system could only support simple one-table queries or multiple table queries with easy join conditions. [4] 2.3 CHAT-80 The system CHAT-80 [5] is one of the most referenced NLP systems in the eighties. The system was implemented in Prolog. The CHAT-80 was an quite impressive, efficient and sophisticated system. The database of CHAT-80 consists of facts (i. e. oceans, major seas, major rivers and major cities) about 150 of the countries world and a small set of English language vocabulary that are enough for querying the database. III. SYSTEM DESCRIPTION A brief description of the system is given. Let us consider a database say ORACLE. Within this oracle database we have stored some tables which are properly normalized. Now if the user wishes to access the data from the table, he/she has to be technically strong in the SQL language to make a query for the ORACLE database. This system cuts this part and enables the end user to access the tables in his/her language. 3.1 SYSTEM SCOPE Input to this system is in natural language, here it is in English. A limited data dictionary is used in which all possible related to a particular system is stored. The Data Dictionary of the system must be regularly updated with words that are specific to the particular system. Ambiguity among the words will be taken care of while processing the natural language. 3.2 SYSTEM ARCHITECTURE The system admits the following elements. Designing the front end or the user interface where the user will enter the query in Natural Language. 1. Manage corpus: A data dictionary is used in which all possible word related to a particular system is loaded 2. PCFG rule: a Probabilistic Context Free Grammar(PCFG) rule is entered with probability. 3. Parsing: Derives the Semantics of the Natural Query given by the user and parses it in its technical form. 4. PCFG rules used: After the successful parsing of the statement given by the user, the system generates CYK chart against the user statement. 5. SQL query: This module generates the corresponding SQL statement and places it in the User Interface Screen as a result form. 569 Vol. 3, Issue 2, pp. 568-573

Figure. 1 Architecture of NLDBI System. The system architecture of natural language database interface developed is given in Figure. 1, which depicts the layout of the processes included in converting Natural Language query into a syntactical SQL query to be fired on the RDBMS. To process a query, the first step is speech tagging followed by word tagging. The second step is parsing the tagged sentence by a grammar. The grammar parser analyzes the query sentence according to the tag of each word and generates the grammar tree/s. Finally, the SQL translator processes the grammar tree to obtain the SQL query. SQL translator is used translating the leaves of the tree to the corresponding SQL. Actually the process is collecting information from the parsed tree. Two techniques may be used to collect the information: dependency structure and verb sub categorization [13, 14, 15]. These techniques are also used in disambiguation, since a PCFG is context-free and ancestor-free, any information in the context of a node in the parsed tree needs not be taken into account when constructing the parsed tree. A dependency structure is used to capture the inherent relations occurs in the corpus texts that may be critical in real-world applications, and it is usually described by typed dependencies. A typed dependency represents grammatical relations between individual words with dependency labels, such as subject or indirect object. Verb sub categorization is the other technique [16]. If we know the sub categorization frame of the verb, we can find the objects of this verb easily, and the target of the query can be found easily. This paper is based on a concept of processing user natural language into a technical form so as to access the data from higher end data storage. NLDBI is a system that allows users to access a database in natural language and has been a popular field of study. Suppose we consider a properly normalized database. Now if the user wishes to access the data from the table, he/she accesses the tables in his/her language. English Semantic statement Word tagging Representation Parsing PCFG SQL Query Figure. 2 Generation of SQL query from English Statement. 570 Vol. 3, Issue 2, pp. 568-573

IV. RESULTS The user interface of or NLDBI system is shown in Figure 3. At first the user clicks on Manage Corpus button so that all the data dictionary is loaded. When the user clicks on the pcfg Rules setting the user has to enter Probabilistic Context Free Grammar as shown in fig 4.After this the user has to enter his query in natural language. Parser option will generate the parse tree. For example, for the query Select the airlines whose seats number is more than 60 and whose mid-stops contain Wuhan, the system generated the parsed tree shown in Figure 6 using the Stanford Lexicalized Parser v1.6 that, in turn, uses the Penn Treebank. Figure. 3 User Interface After the successful parsing of the statement given by the user, the system generates CYK chart against the user statement. The SQL module generates the corresponding SQL statement and places it in the User Interface Screen as a result form as shown in the Fig 5.For example if the user enters the Natural Language find all student name.. the corresponding SQL query displayed will be select * from STUDENT. Figure.4 PCFG Rules 571 Vol. 3, Issue 2, pp. 568-573

Figure 5. Final Output after query conversion Figure 6. An example parsed tree V. CONCLUSIONS AND FUTURE WORKS The system accepts an English language requests that is interpreted and translated into SQL command using semantic grammar technique. In addition, the system requires a knowledge base that consists of a database and its schema. The result of the number of experiments in the form of trials in a user friendly environment had been very successful and satisfactory. To improve the system performance in natural language processing various issues like enriching the knowledge sources of the system in order to increase the system efficiency and researching methods to improve the coherence and the fluency of output texts must be considered. From the experiments we can seen it is possible to translate a natural language query to SQL, and a probabilistic approach may be promising. So far, our NLDBI system considers selection and a few simple aggregations. The next aim of our research is to optimize the PCFG, to accommodate more and more complex queries. ACKNOWLEDGEMENTS I express my deep sense of gratitude and my research guide Prof. P. R. Devale for his continuous inspiration and valuable guidance in throughout my dissertation work. REFERENCES [1] Woods, W., Kaplan, R. Lunar rocks in natural English: Explorations in natural language question answering. Linguistic Structures Processing. In Fundamental Studies in Computer Science, 5:521-569, 1977. [2] Androutsopoulos, I., Richie, G.D., Thanisch, P. Natural Language Interface to Database An Introduction. Journal of Natural Language Engineering, Cambridge University Press. 1(1), 29-81, 1995. 572 Vol. 3, Issue 2, pp. 568-573

[3] Linguistic Technology. English Wizard Dictionary Administrator's Guide. Linguistic Technology Corp.Littleton, MA, USA, 1997. [4] Hendrix, G. (1977). The LIFER manual A guide to building practical natural language interfaces. SRI Artificial Intelligence Center, Menlo Park, Calif. Tech. Note 138. [5] Warren, D., Pereira, F. (1982). An efficient and easily adaptable system for interpreting natural language queries in Computational Linguistics. Volume 8 pages 3 4. [6] I. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases An Introduction, Journal of Natural Language Engineering 1 Part 1 (1995), 29 81. [7] Huangi,Guiang Zangi, Phillip C-Y Sheu "A Natural language database Interface based on probabilistic context free grammar", IEEE International workshop on Semantic Computing and Systems 2008. [8] ELF Software CO. Natural-Language Database Interfaces from ELF Software Co, cited November 1999, available from Internet: [9] Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates, Modern Natural Language Interfaces to Databases:Composing Statistical Parsing with Semantic Tractability, COLING (2004). [10] Natural language Interface for Database: A Brief review, Mrs. Neelu Nihalani, Dr. Sanjay Silakari, Dr. Mahesh Motwani. IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011 ISSN (Online): 1694-0814 [11] Generic Interactive Natural Language Interface to Databases (GINLIDB) Faraj A. El-Mouadib, Zakaria S. Zubi, Ahmed A. Almagrous, and Irdess S. El-Feghi. International Journal of Computers Issue 3, Volume 3, 2009. [12] Miikkulainen R., Natural language processing with subsymbolic neural networks, Neural Network Perspectives on Cognition and Adaptive Robotics.(2011) [13] Dan Klein, Christopher D. Manning: Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. ACL 2004: 478-485. [14] M-C.de Marneffe, B. MacCartney, and C. D. Manning. Generating Typed Dependency Parses From Phrase Structure Parses. In Proceedings of the IEEE /ACL 2006 Workshop on Spoken Language Technology. The Stanford Natural Language Processing Group. 2006. [15] Dan Klein and Christopher D. Manning. 2003. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), Cambridge, MA: MIT Press, pp. 3-10. [16]Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006. [17] Phillip McCarthy, Applied Natural language Processing: Identification, Investigation and Resolution.(2011). Biography: Arati K. Deshpande, PG Scholar in information Technology at Bharati Vidyapeeth Deemed University, Pune. Her fields of interest are Natural Language Processing, Operating system and Computer Networking. Prakash Devale, presently working as a professor and Head department of Information Technology at Bharati Vidyapeeth Deemed University College of Engineering, Pune. He received his ME from Bharati Vidyapeeth University and pursuing Ph.D degree in natural language processing. 573 Vol. 3, Issue 2, pp. 568-573