Natural Language to Relational Query by Using Parsing Compiler



Similar documents
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

Semantic annotation of requirements for automatic UML class diagram generation

NATURAL LANGUAGE TO SQL CONVERSION SYSTEM

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

NATURAL LANGUAGE DATABASE INTERFACE

Natural Language Web Interface for Database (NLWIDB)

Natural Language Database Interface for the Community Based Monitoring System *

CENG 734 Advanced Topics in Bioinformatics

A Survey on Product Aspect Ranking

Classification of Natural Language Interfaces to Databases based on the Architectures

KEYWORD SEARCH IN RELATIONAL DATABASES

Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database

Interactive Dynamic Information Extraction

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

Pattern based approach for Natural Language Interface to Database

Search Result Optimization using Annotators

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Spam Detection Using Customized SimHash Function

RRSS - Rating Reviews Support System purpose built for movies recommendation

ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search

SemWeB Semantic Web Browser Improving Browsing Experience with Semantic and Personalized Information and Hyperlinks

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Firewall Builder Architecture Overview

Search Engine Based Intelligent Help Desk System: iassist

How To Create A Data Transformation And Data Visualization Tool In Java (Xslt) (Programming) (Data Visualization) (Business Process) (Code) (Powerpoint) (Scripting) (Xsv) (Mapper) (

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Software Engineering EMR Project Report

An Approach towards Automation of Requirements Analysis

INTRUSION PROTECTION AGAINST SQL INJECTION ATTACKS USING REVERSE PROXY

CHAPTER 5 INTELLIGENT TECHNIQUES TO PREVENT SQL INJECTION ATTACKS

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE)

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

Computer Standards & Interfaces

Automated Extraction of Security Policies from Natural-Language Software Documents

Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

SQLMutation: A tool to generate mutants of SQL database queries

The preliminary design of a wearable computer for supporting Construction Progress Monitoring

S. Aquter Babu 1 Dr. C. Lokanatha Reddy 2

AUTOMATIC DATABASE CONSTRUCTION FROM NATURAL LANGUAGE REQUIREMENTS SPECIFICATION TEXT

Finding Execution Faults in Dynamic Web Application

Fuzzy Multi-Join and Top-K Query Model for Search-As-You-Type in Multiple Tables

Specialty Answering Service. All rights reserved.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

Optimization of Internet Search based on Noun Phrases and Clustering Techniques

Software Architecture Document

Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms

A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS

Building a Question Classifier for a TREC-Style Question Answering System

MULTI-DIMENSIONAL PASSWORD GENERATION TECHNIQUE FOR ACCESSING CLOUD SERVICES

The Prolog Interface to the Unstructured Information Management Architecture

A Survey on Product Aspect Ranking Techniques

Automatic Text Analysis Using Drupal

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC Presentation

Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql

Moving Enterprise Applications into VoiceXML. May 2002

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

Chapter 1: Introduction

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Profile Based Personalized Web Search and Download Blocker

Fast and Easy Delivery of Data Mining Insights to Reporting Systems

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis

Shallow Parsing with Apache UIMA

Knocker main application User manual

Compiler I: Syntax Analysis Human Thought

ELEVATING FORENSIC INVESTIGATION SYSTEM FOR FILE CLUSTERING

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD

Language Interface for an XML. Constructing a Generic Natural. Database. Rohit Paravastu

A Tokenization and Encryption based Multi-Layer Architecture to Detect and Prevent SQL Injection Attack

Lecture 9. Semantic Analysis Scoping and Symbol Table

Optimization of Image Search from Photo Sharing Websites Using Personal Data

A Review of Data Mining Techniques

A Time Efficient Algorithm for Web Log Analysis

Using NLP and Ontologies for Notary Document Management Systems

A UPS Framework for Providing Privacy Protection in Personalized Web Search

A Case Study of Question Answering in Automatic Tourism Service Packaging

Integrating VoltDB with Hadoop

Role of Text Mining in Business Intelligence

Bitemporal Extensions to Non-temporal RDBMS in Distributed Environment

Integration of Sound Signature in 3D Password Authentication System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

International Journal of Advance Foundation and Research in Science and Engineering (IJAFRSE) Volume 1, Issue 1, June 2014.

Search and Information Retrieval

Transcription:

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015, pg.485 494 RESEARCH ARTICLE ISSN 2320 088X Natural Language to Relational Query by Using Parsing Compiler Radhika Mali, Pooja Toge, Prerana Gargote, Supriya Mali, Prof. Sanchika Bajpai Department of Computer Science. University of Pune MH (India) JSPM s BSIOTR. College of Engineering, Pune, Maharashtra, India 1 radhikamali1993@gmail.com, 2 poojatoge26@gmail.com, 3 gargote.prerana9293@gmail.com, 4 sups.mali09@gmail.com Abstract- Translating a natural language question into a database query suffers from the translation ambiguity problem, which has not received significant attention in this field. To deal with translation ambiguity, we suggest an ambiguity resolution method based on the proposed database semantics. In natural language database interfaces (NLDBI), manual construction of translation knowledge normally undermines domain portability because of its expensive human intervention. It introduces some classical NLDBI products and their applications and proposes the architecture of a new NLDBI system including its probabilistic context free grammar, the inside and outside probabilities which can be used to construct the parse tree, an algorithm to calculate the probabilities, and the usage of dependency structures and verb sub categorization in analyzing the parse tree. To perform extraction our system provide automated query generation component so that random users do not have to learn the query language. The two important aspects of an information system are efficiency and quality of extraction result, also our system reduces the processing time by 89% as compared to a traditional pipeline approach. Our experiments also describe that our approach archives purity extraction results. Index term- Information storage and retrieval, Natural language processing, Query generation, Query language, Text mining 2015, IJCSMC All Rights Reserved 485

I. INTRODUCTION The SQL translator acts as a frontend to any information system and provides a natural language interface (NLI) to the end user. The translator is designed to understand everyday English requests and invoke appropriate database reporting tool for a valid query. Each valid query is mapped onto an appropriate SQL command through a neural net based converter/translator[1] Information Extraction (IE) is a process for the extraction of a particular kind of relationships of interest from a document collection. The purpose of information extraction (IE) is to find wanted pieces of information in natural language texts and store them in a form that is suitable for automatic querying and processing. IE requires a predefined output representation (target structure) and only searches for facts that place this representation.[2] Simple target structures define just a number of slots. Each slot is filled with a string extracted from a text, e.g. a name or a date (slot filler). In order to perform relationship extraction a typical IE setting involves a pipeline of text processing modules. The field of information (IE) search to develop methods for fetching structured information from natural language text. Examples of structured information from natural language are the extraction of entities and relationships between entities. IE is usually deployed as a pipeline of special-purpose programs, which include: sentence splitting to identify sentences from a paragraph of text, Tokenization which identifies word tokens from sentences, Named entity recognition used to identifies mentions of entity types of interest. To utilize lexical, syntactic and semantic features, Syntactic parsing identifies grammatical structures of sentences; Pattern matching obtains relationships based on a set of extraction patterns. Extraction patterns are typically obtained through manually written patterns compiled by experts or automatically generated patterns based on training data. Different kinds of parsers, which include shallow and deep parsers, can be utilized in the pipeline.[1] II. RELATED WORK The main focus has been on improving the accuracy & runtime of information extraction[ie]. NLDBi s can be classified according to the approach employed in deriving an SQL query that retrieves the answer of a given natural language question to a database. Example, system uses this approach, reducing the problem finding a semantic interpretation of ambiguous phrases to a graph matching problem. III. SYSTEM ARCHITECTURE In most of the typical NLDBi systems the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. Before an natural language query is translated to an equivalent query in technical language like SQL it has to through a lot of steps. Tokenization Grammar Checking Query Generation Data Collection 2015, IJCSMC All Rights Reserved 486

Fig. 1 Architecture of NLDBi System In this system architecture of natural language database interface developed is given in Fig. 1, which depicts the layout of the processes included in converting NL query into a syntactical SQL query to be fired on the RDBMS. The extraction patterns over parse trees can be expressed in our proposed parse tree query language. In the extraction phase, the PTQL query evaluator takes a PTQL query and transforms it into keyword-based queries and SQL queries, which are evaluated by the underlying RDBMS and information retrieval (IR) engine. To speed up query evaluation, the index builder creates an inverted index for the indexing of sentences according to words and the corresponding entity types[1]. IV. IMPLEMENTATION The experimental work is to design an interface for generating queries from natural language statements/questions. It also consists of designing a parser for the natural language statements, which will parse the input statement, generate the query and fire it on the database. The experimental work will understand the exact meaning the end user wants to go for, generate a what- type sentence and then convert it into a query and handover it to the interface. The interface further processes the query and searches for the database. The database gives the result to the system which is displayed to the user[4]. 2015, IJCSMC All Rights Reserved 487

Fig.2. Generation of SQL Query from Natural Language Fig. 2 depicts the processing of English input statement to generate SQL query. The entire process involves tagging of input statement, apply grammar and semantic representation to generate parse tree, analyze the parse tree using grammar and translating the leaves of the tree to generate corresponding SQL query. The SQL translator generates query in SQL. Using grammar the parse tree is obtained from the input statement. The leaves of the parse tree are translated to corresponding SQL[3]. The following modules were developed. An Interface: It allows the user to enter the query in NL, interact with the system during ambiguities and display the query results. Parsing: Derives the Semantics of the statement given by the user and parses it into its internal representation, to convert NL input statement into what- type question for selection of data. Query Generation: It generates a query against the user statement in SQL and passes on to the database. 2015, IJCSMC All Rights Reserved 488

The algorithm designed is as given below: Algorithm 1 Generation of parse tree/s from NL statement using grammar 1. Read input statement S 2. For each word Wi from S do 3. If(Wi Grammar G) then 4. Add Wi to symbol table ST 5. End if 6. End for 7. For each Wi from ST do 8. Add Wi to parse terr/s T for What-type question/s 9. End for 10. Display What-Type question/s Q 11. Read input Q 12. For each Wi from Q do 13. If(Wi G) then 14. Add Wi to parse tree for SQL-query 15. End if 16. End for 17. Display SQL-query 2015, IJCSMC All Rights Reserved 489

V. EXPERIMENTAL RESULT The system implemented was tested for variety of NL statement under various categories and the result obtained ware satisfactory under the known constraints. Fig: set database connection The fig. shows the set connection to database. In the connection two categories are included: server configuration and server authentication.sever configuration ask the server name and DB name. And server authentication is user ID and the password to be used to create the connection. 2015, IJCSMC All Rights Reserved 490

Fig 2: set dictionary using synonyms Fig. Database create the dictionary, in dictionary different tables are created, using different synonyms. In table column name, synonyms and related ID are used. 2015, IJCSMC All Rights Reserved 491

Fig 3: NLDBI system The fig shows the typical categories of generating ambiguous parse tree. 1. First enter the natural statement in this system, and then these can create the English grammar. Suppose the natural statement are incorrect these system can show the message about wrong statement they cannot be create the English statement. 2. After the English statement creates SQL statement means these SQL query generated. 3. In English statement no of statement are shows these statement are select and u can delete the statement help of the delete button. 2015, IJCSMC All Rights Reserved 492

Fig. 4 Final output as a result. Above fig4 shows the final output of the natural statement. When SQL query is fired on database then we got final result as a information which we want. VI. CONCLUSION Information extraction system is to extract query language. Natural query language transfer to PTQL(Parse Tree Query Language) tree. This leads to the unnecessary reprocessing of the entire text when the extraction goal is modified or improved, which can be computationally exclusive and time consuming one. To reduce this unnecessary reprocessing time the intermediate process data is stored in database system. The database is in the form of parse tree. To extract information from parse tree the extraction goal written by the user in natural language text is converted into PTQL and then extraction is performed on text corpus. This increment extraction approach reduces time 89% as compared to performing extraction by first processing each sentence one at a time with linguistic parse and then other components. REFERENCES [1] http://piserjournal.org/2014/09/natural-statement-to-relational-query-by-using-parsingcompiler/ [2] D. Ferrucci and A. Lally, UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment, Natural Language Eng., vol. 10, nos. 3/4, pp. 327-348, 2004. [3] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan, GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications, Proc. 40th Ann. Meeting of the ACL, 2002. [4] F.Chen, A. Doan, J. Yang, and R. Ramakrishnan, Efficient Information Extraction over Evolving Text Data, Proc IEEE 24 th Int l Conf. Data Eng. (ICDE 08), pp. 943-952, 2008. [5] S.Sarawagi, Information Extraction, Foundations and Trends in Databases, vol.1, no. 3, pp. 261-377, 2008. 2015, IJCSMC All Rights Reserved 493

[6] M.J. Cafarella and O. Etzioni, A Search Engine for Natural Language Applications, Proc. 14th Int l Conf. World Wide Web(WWW 05), 2005. [7] XQuery 1.0: An XML Query Language, http://www.w3.org/ XML/Query, June 2001. [8] C.Lai, A Formal Framework for Linguistic Tree Query, Master s thesis, Dept. of Computer Science and Software Eng., Univ. of Melbourne, 2005. [9] E.Agichtein and L. Gravano, Querying Text Databases for Efficient Information Extraction, Proc. Int l Conf. Data Eng. (ICDE), pp. 113-124, 2003. [10] M.Huang, S. Ding, H. Wang, and X. Zhu, Mining Physical Protein-Protein Interactions by Exploiting Abundant Features, Proc. Second BioCreative Challenge, pp. 237-245, 2007. 2015, IJCSMC All Rights Reserved 494