Constructing a Generic Natural Language Interface for an XML Database. Rohit Paravastu




Constructing a Generic Natural Language Interface for an XML Database Rohit Paravastu

Motivation
- The ability to communicate with a database in natural language is regarded as the ultimate goal for database query interfaces.
Challenges
- Automatically understanding natural language
- Translating the parsed natural language query (NLQ) into a database query

NaLIX
- Deals with the challenge of translating an NLQ into XQuery
- Addresses:
  - Attribute name confusion
  - Query structure confusion
- Must differentiate between "Return the book with the lowest price" and "Return the lowest price of the book"

Background: Keyword Searches
- Keywords that are expressed together in a query must match objects that are close together in the database.
- Problem? Too blunt; the notion of "close together" is abstract.

Schema-Free XQuery
- A function called meaningful query focus (MQF) is used to retrieve the relation between two keywords in the search.
- Example: "Return the director of Gone with the Wind", where "Gone with the Wind" is related to movie.
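To make the idea of relating two keywords more concrete, here is a minimal Python sketch. It is a simplification and not the actual MQF definition: it treats a title and a director as related when their lowest common ancestor lies below the document root. The sample document, its element names, and the helper functions are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumptions for illustration.
DOC = """
<movies>
  <movie><title>Gone with the Wind</title><director>Victor Fleming</director></movie>
  <movie><title>Apollo 13</title><director>Ron Howard</director></movie>
</movies>
"""

def ancestors(node, parent_of):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path

def lowest_common_ancestor(a, b, parent_of):
    """Deepest element that contains both a and b."""
    seen = {id(n) for n in ancestors(a, parent_of)}
    for n in ancestors(b, parent_of):
        if id(n) in seen:
            return n
    return None

def related_directors(root, title_text):
    """Directors that share a non-root ancestor with a matching title."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    titles = [t for t in root.iter("title") if t.text == title_text]
    results = []
    for d in root.iter("director"):
        for t in titles:
            lca = lowest_common_ancestor(t, d, parent_of)
            # Simplifying assumption: a pair is meaningfully related only
            # when it meets below the root, i.e. inside the same movie.
            if lca is not None and lca is not root:
                results.append(d.text)
    return results

root = ET.fromstring(DOC)
print(related_directors(root, "Gone with the Wind"))  # ['Victor Fleming']
```

The point of the sketch is that the user never names the `movie` element; the structural relation between the two keywords is inferred from the data.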

Query Translation
- The relations between the words are to be translated into XQuery.
- The NLQ is converted to a parse tree.
- Three main steps:
  1. Classification of terms in the parse tree of the NLQ
  2. Validation of the parse tree
  3. Translation of the parse tree into XQuery

Token Classification
- Tokens: words/phrases that match an XQuery construct or an attribute value
- Markers: words that don't occur in the database and are not an XQuery construct
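The token/marker split can be sketched as a simple lookup. The vocabularies, the type labels, and the `classify` function below are hypothetical; a real system would derive element names and values from the database and work on parse-tree nodes rather than pre-chunked phrases.

```python
# Hypothetical vocabularies, assumed for illustration only.
XQUERY_CONSTRUCTS = {
    "return": "command token",
    "the lowest": "function token",
    "the number of": "function token",
    "is": "operator token",
}
ELEMENT_NAMES = {"movie", "director", "title", "book", "price"}
KNOWN_VALUES = {"ron howard", "gone with the wind"}

def classify(phrase):
    """Label a phrase as some kind of token, or as a marker otherwise."""
    p = phrase.lower()
    if p in XQUERY_CONSTRUCTS:
        return XQUERY_CONSTRUCTS[p]      # matches an XQuery construct
    if p in ELEMENT_NAMES:
        return "name token"              # matches an element/attribute name
    if p in KNOWN_VALUES:
        return "value token"             # matches a value in the database
    return "marker"                      # neither in the database nor a construct

phrases = ["Return", "director", "of", "Gone with the Wind"]
for p in phrases:
    print(f"{p!r}: {classify(p)}")
```

Here "of" falls through every lookup and is therefore treated as a marker.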

Tokens and Markers

Query Translation
- Given a valid parse tree, identify the relations between the name tokens (NTs) and translate them into XQuery syntax.
- Not so straightforward

Example

Definitions
- Equivalent NTs: NTs with the same noun phrase and the same modifiers. Example: movie (nodes 4 and 8).
- Sub-parse tree: a subtree that is rooted at an operator token and has at least two children.
- Core token: an NT in a sub-parse tree with no descendant NTs, or an NT equivalent to another core token. Example: movie (nodes 4 and 8) and book (node 11).
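The core-token definition can be checked mechanically on a small tree. The example figure is not reproduced in this transcript, so the tree below is a hypothetical one, with node numbers chosen to agree with the nodes quoted above; equivalence is approximated by comparing the noun only, ignoring modifiers.

```python
# Hypothetical parse tree: node id -> (text, kind).
# Kinds: "NT" name token, "OT" operator token, "CMT" command token, "marker".
NODES = {
    1: ("Return", "CMT"),
    2: ("director", "NT"), 3: ("of", "marker"), 4: ("movie", "NT"),
    5: ("is the same as", "OT"),
    6: ("title", "NT"), 7: ("of", "marker"), 8: ("movie", "NT"),
    9: ("title", "NT"), 10: ("of", "marker"), 11: ("book", "NT"),
}
CHILDREN = {1: [2, 5], 2: [3], 3: [4], 5: [6, 9], 6: [7], 7: [8], 9: [10], 10: [11]}

def descendants(n):
    """All nodes strictly below n."""
    out = []
    for c in CHILDREN.get(n, []):
        out.append(c)
        out.extend(descendants(c))
    return out

def is_nt(n):
    return NODES[n][1] == "NT"

def core_tokens():
    core = set()
    # Rule 1: NTs inside a sub-parse tree (operator root, at least two
    # children) that have no NT below them.
    for root, (_, kind) in NODES.items():
        if kind == "OT" and len(CHILDREN.get(root, [])) >= 2:
            for n in descendants(root):
                if is_nt(n) and not any(is_nt(d) for d in descendants(n)):
                    core.add(n)
    # Rule 2: NTs equivalent to a core token (same noun; modifiers ignored here).
    core_nouns = {NODES[n][0] for n in core}
    core |= {n for n in NODES if is_nt(n) and NODES[n][0] in core_nouns}
    return sorted(core)

print(core_tokens())  # [4, 8, 11]
```

Node 4 is not inside the operator-rooted subtree, but it becomes a core token through the equivalence rule because it is the same noun as node 8.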

Definitions
- Directly related NTs: NTs in a parent-child relation. Example: title and movie.
- Related by core tokens: NTs related to the same or an equivalent core token.
- Related NTs: NTs that satisfy either of the above, or that are related to the same NT. Example: the sets {2,4,6,8} and {9,11}.
- Each set of related NTs is grouped together in the same MQF.
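Because "related" is closed under sharing a common NT, grouping related NTs amounts to computing connected components. A rough union-find sketch follows; the pair lists are assumptions chosen to be consistent with the node sets quoted above, since the example figure itself is not available here.

```python
def group_related(nts, direct_pairs, equivalent_core_pairs):
    """Group NT ids into sets of related NTs using union-find."""
    parent = {n: n for n in nts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Both kinds of relation merge groups; transitivity comes for free.
    for a, b in direct_pairs + equivalent_core_pairs:
        union(a, b)

    groups = {}
    for n in nts:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Assumed inputs: director-movie, title-movie, title-book are directly
# related; the two movie nodes (4 and 8) are equivalent core tokens.
nts = [2, 4, 6, 8, 9, 11]
direct = [(2, 4), (6, 8), (9, 11)]
equivalent_core = [(4, 8)]
print(group_related(nts, direct, equivalent_core))  # [[2, 4, 6, 8], [9, 11]]
```

Each resulting group would then correspond to one MQF in the translated query.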

Variables
- Each set of equivalent name tokens is assigned a variable: <var> NT
- A variable can also be made up of a group of variables; these are called composed variables.
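A possible way to picture variable binding is shown below: every equivalence class of name tokens, keyed here on noun plus modifiers, receives one variable. The variable naming scheme and the input tuples are hypothetical, and composed variables are not modeled.

```python
def bind_variables(name_tokens):
    """Map each NT node id to a variable shared by its equivalence class.

    name_tokens: list of (node_id, noun, modifiers) tuples.
    """
    var_of_class = {}
    binding = {}
    for node_id, noun, modifiers in name_tokens:
        key = (noun, tuple(sorted(modifiers)))  # same noun, same modifiers
        if key not in var_of_class:
            var_of_class[key] = f"$v{len(var_of_class) + 1}"
        binding[node_id] = var_of_class[key]
    return binding

# Hypothetical NTs for a query such as
# "Return the director of the movie, where the title of the movie is ..."
tokens = [(2, "director", []), (4, "movie", []), (6, "title", []), (8, "movie", [])]
print(bind_variables(tokens))
# {2: '$v1', 4: '$v2', 6: '$v3', 8: '$v2'}
```

Both occurrences of "movie" share `$v2`, which is what lets the two clauses of the sentence refer to the same element in the final query.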

Template Matching
- Match a variable or a group of variables to a given template.
- The template gives the translation for that particular set of variables/phrases in the sentence.

Templates

Aggregator Nesting
- If the NT attached to an aggregate function is a core token, consider the entire sentence as part of the aggregation.
  - "Return the number of movies, where the director of the movie is Ron Howard"
  - "Return the lowest price for each book"
- If the NT attached to an aggregate function is not a core token, the scope of the aggregation is limited to all the directly related NTs of the attached NT.
  - "Return each book with lowest price"
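The two scoping rules reduce to a single branch. The sketch below is only an illustration of that decision; the function name, its inputs, and the token sets in the examples are assumptions, not the system's actual data structures.

```python
def aggregation_scope(attached_nt, core_tokens, directly_related, all_nts):
    """Return the set of NTs covered by an aggregate function.

    attached_nt: the NT the aggregate function is attached to
    core_tokens: set of NTs classified as core tokens
    directly_related: dict mapping an NT to its directly related NTs
    all_nts: every NT in the sentence
    """
    if attached_nt in core_tokens:
        # Core token: the entire sentence takes part in the aggregation.
        return set(all_nts)
    # Otherwise: only the attached NT and its directly related NTs.
    return {attached_nt} | set(directly_related.get(attached_nt, ()))

# "Return the number of movies, where the director of the movie is Ron Howard"
# (assumed: "movie" is the core token and carries the count).
print(sorted(aggregation_scope("movie", {"movie"}, {"movie": ["director"]},
                               ["movie", "director"])))
# ['director', 'movie']

# "Return each book with lowest price"
# (assumed: "book" is the core token, "price" carries the aggregate).
print(sorted(aggregation_scope("price", {"book"}, {"price": ["book"]},
                               ["book", "price", "publisher"])))
# ['book', 'price']
```

In the second call the hypothetical `publisher` NT stays outside the scope because it is not directly related to `price`.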

Translation Process
- Parse
- Variable binding
- Nesting scope
- Final XQuery

Example Output
- Query: "Return each director, where the number of movies directed by the director is the same as the number of movies directed by Ron Howard"

Interactive Query Formulation
- Users are asked to rephrase the question if there is no valid parse tree.
- Suggestions are given for rephrasing the query.
- Given the attribute value tokens, the phrases that express the relation between the attributes can be rephrased.
- Ambiguity in the attribute values is resolved using WordNet.

Experimental Evaluation
- Participants were asked to search for a given question using either keyword search or NaLIX.
- Comparison over:
  - Ease of use
  - Search quality
- Participants were asked to reformulate the query iteratively until an acceptable threshold of precision and recall was reached.

Experimental Evaluation
- Ease of use: time taken to come up with an acceptable NLQ
- Search quality: precision and recall of the resultant XQuery
- Books data from the DBLP database was used for the evaluation.
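For reference, precision and recall of a query result can be computed as below. This uses the standard set-based definitions; the function name and the sample result sets are made up for illustration and are not data from the evaluation.

```python
def precision_recall(returned, relevant):
    """Standard set-based precision and recall.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result sets: 3 of the 4 returned items are correct,
# and 3 of the 5 correct items were found.
returned = {"a", "b", "c", "d"}
relevant = {"a", "b", "c", "e", "f"}
print(precision_recall(returned, relevant))  # (0.75, 0.6)
```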

Results: Ease of Use
- Average time of 90 seconds to form a query
- Fewer than 2 iterations per query on average
- At least one participant got the correct NLQ in the first iteration for each question

Results: Search Quality
- Average precision of 83% and recall of 90.1%
- Quality is affected by:
  - Quality of the NLQ given by the user
  - Parser accuracy
- Average precision of 95.1% and recall of 97.6% for queries that are formulated and parsed correctly

Results
[Figures not reproduced: precision of search results; recall of search results]

Discussion
- Positive points
- Drawbacks
- Is it useful for your project?
- Are you convinced of its usability over different datasets?
- Any suggestions/ideas on how to make this better?