A Chart Parsing implementation in Answer Set Programming



Similar documents
Basic Parsing Algorithms Chart Parsing

Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen

Outline of today s lecture

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Learning Translation Rules from Bilingual English Filipino Corpus

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Natural Language Database Interface for the Community Based Monitoring System *

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

Natural Language to Relational Query by Using Parsing Compiler

Understanding English Grammar: A Linguistic Introduction

Paraphrasing controlled English texts

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

Syntax: Phrases. 1. The phrase

The compositional semantics of same

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Overview of MT techniques. Malek Boualem (FT)

Constraints in Phrase Structure Grammar

Building a Question Classifier for a TREC-Style Question Answering System

Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison

Artificial Intelligence

Special Topics in Computer Science

A chart generator for the Dutch Alpino grammar

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

Overview of the TACITUS Project

Interactive Dynamic Information Extraction

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

PP-Attachment. Chunk/Shallow Parsing. Chunk Parsing. PP-Attachment. Recall the PP-Attachment Problem (demonstrated with XLE):

A Survey on Product Aspect Ranking

Text-To-Speech Technologies for Mobile Telephony Services

A Knowledge-based System for Translating FOL Formulas into NL Sentences

Syntactic Theory on Swedish

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

A Test Case Generator for the Validation of High-Level Petri Nets

EXJ 201. Show a little understanding of the persuasive purpose of the task but neglect to take or to maintain a position on the issue in the prompt

Sentence Structure/Sentence Types HANDOUT

If-Then-Else Problem (a motivating example for LR grammars)

TREC 2003 Question Answering Track at CAS-ICT

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

Structurally ambiguous sentences

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25

DEPENDENCY PARSING JOAKIM NIVRE

How To Write A Summary Of A Review

Ontology and automatic code generation on modeling and simulation

What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?

Natural Language Processing for Verbatim Text Coding and Data Mining Report Generation

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability

Transaction-Typed Points TTPoints

Identifying Prepositional Phrases in Chinese Patent Texts with. Rule-based and CRF Methods

Statistical Machine Translation

UNIVERSITY COLLEGE LONDON EXAMINATION FOR INTERNAL STUDENTS

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

RRSS - Rating Reviews Support System purpose built for movies recommendation

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Semester Review. CSC 301, Fall 2015

Knowledge Representation and Question Answering

Introduction to formal semantics -

Design Patterns in Parsing

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

ORAKEL: A Portable Natural Language Interface to Knowledge Bases

International Journal of Advance Foundation and Research in Science and Engineering (IJAFRSE) Volume 1, Issue 1, June 2014.

Empirical Machine Translation and its Evaluation

A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts

COMPARING MATRIX-BASED AND GRAPH-BASED REPRESENTATIONS FOR PRODUCT DESIGN

Understanding Data Flow Diagrams Donald S. Le Vie, Jr.

Likewise, we have contradictions: formulas that can only be false, e.g. (p p).

Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder

Memory-based Pattern Completion in Database Semantics

Noam Chomsky: Aspects of the Theory of Syntax notes

ExSched: Solving Constraint Satisfaction Problems with the Spreadsheet Paradigm

Application of Natural Language Interface to a Machine Translation Problem

Business Process- and Graph Grammar-Based Approach to ERP System Modelling


Computational Methods for Database Repair by Signed Formulae

Slot Grammars. Michael C. McCord. Computer Science Department University of Kentucky Lexington, Kentucky 40506

Clustering Connectionist and Statistical Language Processing

Transcription:

A Chart Parsing implementation in Answer Set Programming Ismael Sandoval Cervantes Ingenieria en Sistemas Computacionales ITESM, Campus Guadalajara elcoraz@gmail.com Rogelio Dávila Pérez Departamento de Sistemas de Informacion CUCEA, Universidad de Guadalajara rdav90@gmail.com Abstract. The present document presents an attempt to develop an Answer- Set Prolog program to implement a well-known strategy for natural language parsing, known as chart parsing. It is important to notice that the program presented is an end-term project for a course in natural language processing, even though we believe there are interesting insights that can be extracted from this effort. 1 Introduction An important language for knowledge representation and programming, called Answer-Set Programming (ASP) [1] has been developed and offers interesting possibilities for natural language processing. With the aim of exploring the capabilities of ASP in this area, it was decided to build an experimental program for undertaking the parsing of a natural language sentence using a well-known technique, called chart parsing. The results are presented in this document. The program was built from some ideas taken form [4], and important insights extracted from the experience are included. The rest of the paper is organized as follows: section 2 introduces the preliminaries of Chart Parsing; in section 3 the implementation of the program is discussed and in section 4 some conclusions are presented. 2 Chart Parsing Natural languages are characterized by presenting a high degree of structural ambiguity, for example: (a) Mary and John hear the report on the travel The sentence (a) is ambigous since two different syntactic structures can be assigned to it, as it is shown in Figure-1. In the representation (b) Mary and John hear the report while they were traveling. In the second one (c) Mary and John hear a report about a travel; two different meanings for the same sentence!

(b) S (c) S VP VP V PP Det Noun V Det Noun Noun PP Mary and John hear the report on the travel Mary and John hear the report on the travel Figure 1: Parsing a sentence Syntactic structures are generated by the following grammar defined by the context free rules that express the way in which different components of language are combined to build sentences: (1) S VP (5) PropName (2) VP V (6) Conj (3) VP VP PP (7) Noun Noun PP (4) Det Noun (8) PP Prep and the lexicon rules which attach to each word its corresponding category: L1 Det the L2-3 Noun report travel L4 Verb hear L4-5 PropName Mary John L6 Conj and L7 Prep on The application of a parsing process to the sentence (a) using the grammar presented above conveys the syntactic structures shown in (b) and (c). The main desirable for a parsing algorithm is to avoid rebuilding substructures that have already been recognized during the parsing process. An important method for natural language parsing is called Chart Parsing [3,5]. It consists of a tabular based bottom-up parsing algorithm. The basic idea is to keep a table containing each substructure generated during the parsing process in different iterations until all possible structures for the sentence are obtained. The input for the method consists of a sentence seen as a string of words and a grammar, espressed as a set of production rules. An example of the (partial) structure generated after applying a chart parsing algorithm to our sample sentence (a) is shown below:

S VP PropName Conj PropName V Det Mary and John hear 1 2 3 4 5 The point is that once we have recognized a substructure to say, a noun phrase (), we never come back an remake that analysis. In the next section the actual implementation of the program is discussed. 3 Implementation The implementation of the algorithm in AS Prolog is given in four steps: Step 1. Create the initial table including an arc for each word in the sentence. Step 2. Add a self_arc for each grammar rule whose first left-most element of the right-hand side is already in the table. Step 3. Extend every self_arc into an arc of the same category where the full righthand of the rule has been found. Step 4. Iterate from Step 2 to 4 incrementing the time until the given limit. Now we will show the way in which the following simple sentence is processed: A dog bite the girl There is a C interface which given the grammar and the sentence generates the following program.: 3.1 Step 1 Each word is mapped to a predicate called word/3 whose arguments are the actual word, the initial and ending nodes. word(a,0,1). word(dog,1,2). word(bite,2,3). word(the,3,4). word(girl,4,5). init(0). end(5). time_goal(5). range(0..5). time(0..5).

ASP requires the initial and the last node to be expressed so we used the predicates init/1 and end/1. Another parameter that needs to be considered is the limit for the iteration process. This limit is introduced by the time_goal/1 predicate. A rule has to be introduced for initiating the table holds/2, whose arguments are the arc/3 and the time/1, starting to the value 0. holds(arc(x,a,b),0):- word(x,a,b), range(a), range(b). Then the lexicon is introduced in the following way: verb(bite). dete(a). dete(the). noun(girl). noun(dog). rule(v). rule(np). rule(s). rule(vp). rule(noun). rule(det). % determiner % verb % noun phrase % sentence % verb phrase % determiner 3.2 Step 2 Now the rules that have a chance to be applied are introduced. The decision on whether a rule will be applied or not depends on having detected that the left-most element of the right hand part of the rule has been recognized, as stated below: holds(self_arc(sentence,a),t+1):- holds(arc(np,a,b),t), holds(self_arc(np,a),t+1):- holds(arc(det,a,b),t), holds(self_arc(vp,a),t+1):- holds(arc(v,a,b),t), holds(self_arc(v,a),t+1):- holds(arc(x,a,b),t),word(x,a,b),v_t(x), holds(self_arc(det,a),t+1):- holds(arc(x,a,b),t),word(x,a,b),dete(x), holds(self_arc(noun,a),t+1):- holds(arc(x,a,b),t),word(x,a,b),noun(x),

3.3 Step 3 The context free grammar rules are introduced for each of the syntactic categories. An arc labeled with this category will be introduced in the table whenever the constituents of the category are reached. These new arcs correspond to the expanded arcs in the model.. holds(arc(s,a,b),t+1):- holds(self_arc(s,a),t), holds(arc(np,a,c),t), holds(arc(vp,c,b),t), range(a),range(b),range(c),time(t). holds(arc(np,a,b),t+1):- holds(self_arc(np,a),t), holds(arc(det,a,c),t), holds(arc(noun,c,b),t), range(a),range(b),range(c),time(t). 3.4 Step 4 The result of the process is given by collecting an arc with category S sentence) which goes from the initial to the final node. The presence of this arc means that a sentence has been recognized to belong to the language generated by the provided grammar. If more than one arc like this one are found, it means that different syntactic structures where found for the same sentence and it is structurally ambiguos. If it happens that none of the arcs satisfied the previous conditions then the sentence was not recognized. We define two rules, called the inertia rules [4], they basically state that actions normally do not affect fluents or things remain unchanged unless something change them. Theses rules preserve any arc which exists in time T at the new time T+1. holds(arc(x,a,b),t+1):- holds(arc(x,a,b),t), not -holds(arc(x,a,b),t+1), range(a),range(b), time(t),rule(x). holds(arc(x,a,b),t+1):- holds(arc(x,a,b),t), not -holds(arc(x,a,b),t+1), range(a),range(b), time(t),word(x,a,b). These rules prevent an arc for been inserted if it is already inside.

3.5 Sample Output (formatted) ////////// tiempo 0 holds(arc(a,0,1),0) % holds(arc(dog,1,2),0) % The inicial arcs are introduced holds(arc(bite,2,3),0) % holds(arc(the,3,4),0) % holds(arc(girl,4,5),0) % ////////// tiempo 1 holds(arc(a,0,1),1) holds(arc(dog,1,2),1) holds(self_arc(v,2),1) holds(self_arc(det,0),1) % these arcs are introduced by the % inertial rule % A self_arc is an arc that points % to the same node And it goes on until the final table generated in time 6, shown below: ////////// tiempo 6 holds(arc(a,0,1),6) % block found in time 0 holds(arc(dog,1,2),6) % holds(self_arc(v,2),6) % block found in time 1 holds(self_arc(det,0),6) % holds(arc(v,2,3),6) % block found in time 2 holds(arc(det,0,1),6) holds(self_arc(np,0),6) % block found in time 3 holds(self_arc(np,3),6) holds(self_arc(vp,2),6) holds(arc(np,0,2),6) holds(arc(np,3,5),6) % block found in time 4 holds(self_arc(s,0),6) holds(self_arc(s,3),6) % block found in time 5 holds(arc(vp,2,5),6) holds(arc(s,0,5),6) % The sentence arc is generated % from node 0 to node 5 As shown in the final table the solution for the sentence was finally found. The process should keep running until no more arcs are generated as another solution may came out.

4 Conclusions A program build under the ASP model to accomplish a chart parsing of sentences of natural language has been developed. Even when the application was developed as the final project for a course in natural language processing, Some positive conclusions from the experience: a) It became to be a challenging and interesting project that provided us of a deeper understanding of the ASP model. b) Once you have realized the way in which the knowledge has to be represented in ASP it is more intuitive to model the problem. c) An interesting feature of the software is that it keeps track of all events that had happen during the developing of the solution which is very helpful to debugging. Some reflections on the resulting program and the AS Programming model are: a) At each step of the iteration every arc from the previous one, T, is copied into the new one, T+1. This fact implies that much information is redundant. b) In this first attempt the grammar is encoded inside the program and not just a different input to it. The most diserable configuration would be for the grammar to be independent of the program. c) It was not easy to find out the way in which the algorithm had to be developed in ASP, there are interesting material [1,2] but they were not written in order to introduce the newcomers into this programming area. References [1] M. Gelfond and V. lifschitz. The stable model semantics for logic programming. In Robert Kowalski and Kenneth Bowen, editors, Proceedings of International Logic Programming Confernce and Symposium, pp 1070-1080, 1988. [2] M. Gelfond and V. lifschitz. Classial Negation in Logic Programs and Disjunctive Databases. New Generation Computing, vol. 9. pp 365-385, 1991. [3] Gerald Gazdar and Chris Mellish, Natural Language Processing in Prolog: An Introduction to Computational Linguistics, Addison-Wesley Publishers, 1989. [4] Mónica Nogueira, Building Knowledge Systems in A-Prolog. Ph. D. Thesis, Computer Science Department, University of Texas at El Paso, 2003. [5] Christer Samuelsson and Mats Wirén, Parsing Thechnics, in Handbook on Natural Language Processing, Marcel Dekker Publishers, NY, 2000.