Outline of today's lecture




Outline of today's lecture
Generative grammar
Simple context free grammars
Probabilistic CFGs
Formalism power requirements

Parsing: modelling the syntactic structure of phrases and sentences.
Why is it useful?
as a step in assigning semantics
checking grammaticality
applications: e.g. producing features for classification in sentiment analysis
lexical acquisition

Generative grammar
A generative grammar is a formally specified grammar that can generate all and only the acceptable sentences of a natural language.
Internal structure: "the big dog slept" can be bracketed ((the (big dog)) slept)
constituent: a phrase whose components form a coherent unit
The internal structures are typically given labels, e.g. "the big dog" is a noun phrase (NP) and "slept" is a verb phrase (VP).

Simple context free grammars
A context free grammar consists of:
1. a set of non-terminal symbols (e.g., S, VP);
2. a set of terminal symbols (i.e., the words);
3. a set of rules (productions), where the LHS (mother) is a single non-terminal and the RHS is a sequence of one or more non-terminal or terminal symbols (daughters), e.g.:
S -> NP VP
V -> fish
4. a start symbol, conventionally S, which is a non-terminal.
Empty productions are excluded: no rules like NP -> ε.

Simple context free grammars
A simple CFG for a fragment of English

rules:
S -> NP VP
VP -> VP PP
VP -> V
VP -> V NP
VP -> V VP
NP -> NP PP
PP -> P NP

lexicon:
V -> can
V -> fish
NP -> fish
NP -> rivers
NP -> pools
NP -> December
NP -> Scotland
NP -> it
NP -> they
P -> in
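For concreteness, the grammar fragment above can be written down as plain data. This is a sketch in Python (the course's sample code is in Java; the variable names here are my own choice, not from the course code):

```python
# The toy CFG above as plain Python data (illustrative sketch).
RULES = {
    "S":  [("NP", "VP")],
    "VP": [("VP", "PP"), ("V",), ("V", "NP"), ("V", "VP")],
    "NP": [("NP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "can": {"V"}, "fish": {"V", "NP"}, "rivers": {"NP"}, "pools": {"NP"},
    "December": {"NP"}, "Scotland": {"NP"}, "it": {"NP"}, "they": {"NP"},
    "in": {"P"},
}
START = "S"

# e.g. look up the categories a word can have:
print(sorted(LEXICON["fish"]))   # ['NP', 'V']
```

Note that the lexical ambiguity of "fish" (V or NP) is just a word with two category entries.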

Simple context free grammars
Analyses in the simple CFG

they fish:
(S (NP they) (VP (V fish)))

they can fish:
(S (NP they) (VP (V can) (VP (V fish))))
(S (NP they) (VP (V can) (NP fish)))

they fish in rivers:
(S (NP they) (VP (VP (V fish)) (PP (P in) (NP rivers))))

Simple context free grammars
Structural ambiguity without lexical ambiguity

they fish in rivers in December:
(S (NP they) (VP (VP (VP (V fish)) (PP (P in) (NP rivers))) (PP (P in) (NP December))))
(S (NP they) (VP (VP (V fish)) (PP (P in) (NP (NP rivers) (PP (P in) (NP December))))))

Simple context free grammars
Parse trees
[tree diagram for "they can fish in December"]
(S (NP they) (VP (V can) (VP (VP (V fish)) (PP (P in) (NP December)))))

Chart parsing
chart: store partial results of parsing in a vector
edge: representation of a rule application
Edge data structure: [id, left_vtx, right_vtx, mother_category, dtrs]

. they . can . fish .
0      1     2      3

Fragment of chart:
id  left  right  mother  daughters
1   0     1      NP      (they)
2   1     2      V       (can)
3   1     2      VP      (2)
4   0     2      S       (1 3)
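The edge data structure can be sketched directly as a Python dataclass (field names are my own; dtrs holds either the word itself, for a lexical edge, or the ids of the daughter edges):

```python
from dataclasses import dataclass

# [id, left_vtx, right_vtx, mother_category, dtrs] as a small record type
@dataclass(frozen=True)
class Edge:
    id: int
    left: int        # left vertex
    right: int       # right vertex
    mother: str      # mother category
    dtrs: tuple      # the word itself, or the ids of the daughter edges

# the chart fragment from the table above:
chart = [
    Edge(1, 0, 1, "NP", ("they",)),
    Edge(2, 1, 2, "V", ("can",)),
    Edge(3, 1, 2, "VP", (2,)),
    Edge(4, 0, 2, "S", (1, 3)),
]
print(chart[3])   # Edge(id=4, left=0, right=2, mother='S', dtrs=(1, 3))
```

Storing daughter ids rather than nested structures is what later makes packing cheap: alternatives can share subtrees by id.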

Bottom up parsing: edges
[chart diagram for "they can fish": lexical edges 1:NP, 2:V, 5:V, with 3:VP, 6:VP, 9:NP, 7:VP, 10:VP built above them and 4:S, 8:S, 11:S spanning from vertex 0]

A bottom-up passive chart parser

Parse:
    Initialize the chart
    For each word, let from be its left vertex, to its right vertex, and dtrs be (word)
        For each category lexically associated with word
            Add new edge from, to, category, dtrs
    Output results for all spanning edges

Inner function:

Add new edge from, to, category, dtrs:
    Put edge in chart: [id, from, to, category, dtrs]
    For each rule lhs -> cat_1 ... cat_{n-1}, category
        Find sets of contiguous edges
        [id_1, from_1, to_1, cat_1, dtrs_1] ... [id_{n-1}, from_{n-1}, from, cat_{n-1}, dtrs_{n-1}]
        (such that to_1 = from_2, etc.)
        For each set of edges,
            Add new edge from_1, to, lhs, (id_1 ... id)
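Putting the two procedures together, here is a runnable Python sketch of the bottom-up passive chart parser on the toy grammar (the course distributes Java code; this is my own illustrative reimplementation, and all names in it are my own choices):

```python
# Bottom-up passive chart parsing for the toy CFG; edges are tuples
# (id, left, right, mother, dtrs), as in the lecture's edge data structure.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("VP", ("VP", "PP")), ("VP", ("V",)), ("VP", ("V", "NP")),
    ("VP", ("V", "VP")), ("NP", ("NP", "PP")), ("PP", ("P", "NP")),
]
LEXICON = {"they": ["NP"], "can": ["V"], "fish": ["V", "NP"],
           "rivers": ["NP"], "in": ["P"], "December": ["NP"]}

def parse(words):
    chart = []

    def find_left_sisters(cats, end):
        # all contiguous sequences of existing edges matching `cats`
        # and ending at vertex `end`
        if not cats:
            return [[]]
        found = []
        for e in list(chart):
            if e[3] == cats[-1] and e[2] == end:
                for seq in find_left_sisters(cats[:-1], e[1]):
                    found.append(seq + [e])
        return found

    def add_edge(left, right, mother, dtrs):
        edge = (len(chart) + 1, left, right, mother, dtrs)
        chart.append(edge)
        # the new edge can only complete a rule as its rightmost daughter
        for lhs, rhs in GRAMMAR:
            if rhs[-1] == mother:
                for sisters in find_left_sisters(rhs[:-1], left):
                    start = sisters[0][1] if sisters else left
                    ids = tuple(e[0] for e in sisters) + (edge[0],)
                    add_edge(start, right, lhs, ids)   # recurse

    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            add_edge(i, i + 1, cat, (word,))
    return chart

chart = parse(["they", "can", "fish"])
spanning = [e for e in chart if e[1] == 0 and e[2] == 3 and e[3] == "S"]
print(len(chart), len(spanning))   # 11 edges, 2 spanning S analyses
```

With the lexical categories tried in the order shown, the edge ids come out exactly as in the worked example below (edges 8 and 11 are the spanning S edges).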

Parse construction: word = they, categories = {NP}
Add new edge 0, 1, NP, (they)
Matching grammar rules: {VP -> V NP, PP -> P NP}
No matching edges corresponding to V or P

Parse construction: word = can, categories = {V}
Add new edge 1, 2, V, (can)
Matching grammar rules: {VP -> V}
recurse on edges {(2)}

Parse construction:
Add new edge 1, 2, VP, (2)
Matching grammar rules: {S -> NP VP, VP -> V VP}
recurse on edges {(1,3)}

Parse construction:
Add new edge 0, 2, S, (1, 3)
No matching grammar rules for S
Matching grammar rules: {S -> NP VP, VP -> V VP}
No edges for V VP

Parse construction: word = fish, categories = {V, NP}
Add new edge 2, 3, V, (fish)   (NB: fish as V)
Matching grammar rules: {VP -> V}
recurse on edges {(5)}

Parse construction:
Add new edge 2, 3, VP, (5)
Matching grammar rules: {S -> NP VP, VP -> V VP}
No edges match NP
recurse on edges for V VP: {(2,6)}

Parse construction:
Add new edge 1, 3, VP, (2, 6)
Matching grammar rules: {S -> NP VP, VP -> V VP}
recurse on edges for NP VP: {(1,7)}

Parse construction:
Add new edge 0, 3, S, (1, 7)
No matching grammar rules for S
Matching grammar rules: {S -> NP VP, VP -> V VP}
No edges matching V

Parse construction:
Add new edge 2, 3, NP, (fish)   (NB: fish as NP)
Matching grammar rules: {VP -> V NP, PP -> P NP}
recurse on edges for V NP: {(2,9)}

Parse construction:
Add new edge 1, 3, VP, (2, 9)
Matching grammar rules: {S -> NP VP, VP -> V VP}
recurse on edges for NP VP: {(1, 10)}

Parse construction:
Add new edge 0, 3, S, (1, 10)
No matching grammar rules for S
Matching grammar rules: {S -> NP VP, VP -> V VP}
No edges corresponding to V VP
Matching grammar rules: {VP -> V NP, PP -> P NP}
No edges corresponding to P NP

Resulting chart:

. they . can . fish .
0      1     2      3

id  left  right  mother  daughters
1   0     1      NP      (they)
2   1     2      V       (can)
3   1     2      VP      (2)
4   0     2      S       (1 3)
5   2     3      V       (fish)
6   2     3      VP      (5)
7   1     3      VP      (2 6)
8   0     3      S       (1 7)
9   2     3      NP      (fish)
10  1     3      VP      (2 9)
11  0     3      S       (1 10)

Output results for spanning edges
Spanning edges are 8 and 11:
Output results for 8: (S (NP they) (VP (V can) (VP (V fish))))
Output results for 11: (S (NP they) (VP (V can) (NP fish)))
Note: sample chart parsing code in Java is downloadable from the course web page.
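Reading an analysis back off the finished chart is a simple recursion over daughter ids. A sketch in Python, with the chart rows copied from the table above (the dict encoding is my own):

```python
# Each chart row as id -> (left, right, mother, dtrs); dtrs holds the word
# itself for a lexical edge, or daughter edge ids otherwise.
CHART = {
    1: (0, 1, "NP", ("they",)), 2: (1, 2, "V", ("can",)),
    3: (1, 2, "VP", (2,)),      4: (0, 2, "S", (1, 3)),
    5: (2, 3, "V", ("fish",)),  6: (2, 3, "VP", (5,)),
    7: (1, 3, "VP", (2, 6)),    8: (0, 3, "S", (1, 7)),
    9: (2, 3, "NP", ("fish",)), 10: (1, 3, "VP", (2, 9)),
    11: (0, 3, "S", (1, 10)),
}

def tree(edge_id):
    # recursively expand daughter ids into a bracketed string
    _, _, cat, dtrs = CHART[edge_id]
    parts = [tree(d) if isinstance(d, int) else d for d in dtrs]
    return "(%s %s)" % (cat, " ".join(parts))

print(tree(8))   # (S (NP they) (VP (V can) (VP (V fish))))
```

Calling `tree(11)` yields the other spanning analysis, (S (NP they) (VP (V can) (NP fish))).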

Packing
To make parsing more efficient, don't add equivalent edges as whole new edges; instead, dtrs is a set of lists of edges (to allow for alternatives).
If we are about to add [id, l_vtx, right_vtx, ma_cat, dtrs] and there is an existing edge [id-old, l_vtx, right_vtx, ma_cat, dtrs-old], we simply modify the old edge to record the new dtrs:
[id-old, l_vtx, right_vtx, ma_cat, dtrs-old ∪ dtrs]
and do not recurse on it: we never need to continue computation with a packable edge.
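The packing check can be sketched as a small function (my own encoding: dtrs becomes a set of daughter-id tuples, and edges are dicts so they can be updated in place):

```python
# Fold an equivalent edge into an existing one instead of adding it.
def add_or_pack(chart, left, right, mother, dtrs):
    for edge in chart:
        if (edge["left"], edge["right"], edge["mother"]) == (left, right, mother):
            edge["dtrs"].add(dtrs)   # pack: record the alternative dtrs ...
            return None              # ... and do not recurse on this edge
    new = {"id": len(chart) + 1, "left": left, "right": right,
           "mother": mother, "dtrs": {dtrs}}
    chart.append(new)
    return new                       # caller recurses only on genuinely new edges

# edge 7 from the example, about to be offered edge 10's daughters (2, 9):
chart = [{"id": 7, "left": 1, "right": 3, "mother": "VP", "dtrs": {(2, 6)}}]
add_or_pack(chart, 1, 3, "VP", (2, 9))
print(sorted(chart[0]["dtrs"]))   # [(2, 6), (2, 9)] -- no new edge added
```

Returning None for a packed edge is what implements "do not recurse on it": the caller only continues building on edges that are genuinely new.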

Packing example
1  0  1  NP  {(they)}
2  1  2  V   {(can)}
3  1  2  VP  {(2)}
4  0  2  S   {(1 3)}
5  2  3  V   {(fish)}
6  2  3  VP  {(5)}
7  1  3  VP  {(2 6)}
8  0  3  S   {(1 7)}
9  2  3  NP  {(fish)}
Instead of adding edge 10 = 1 3 VP {(2 9)}, edge 7 becomes 1 3 VP {(2 6), (2 9)}, and we're done.

Packing example
[chart diagram: edge 7 is marked VP+ to show it holds packed alternatives]
Both spanning results can now be extracted from edge 8.

Probabilistic CFGs
Producing n-best parses:
- manual weight assignment
- probabilistic CFG trained on a treebank
  - automatic grammar induction
  - automatic weight assignment to an existing grammar
- beam search
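In a PCFG, each rule carries a probability (summing to 1 over the expansions of each non-terminal), and a tree's probability is the product over the rules used. A toy sketch in Python with invented probabilities, not trained weights:

```python
# Toy PCFG: (mother, daughters) -> probability; numbers are made up for
# illustration, but sum to 1 per mother category.
RULE_P = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V",)): 0.4, ("VP", ("V", "NP")): 0.3, ("VP", ("V", "VP")): 0.3,
    ("NP", ("they",)): 0.5, ("NP", ("fish",)): 0.5,
    ("V", ("can",)): 0.6, ("V", ("fish",)): 0.4,
}

def tree_prob(t):
    # a tree is (category, daughter, ...); daughters are trees or words
    cat, dtrs = t[0], t[1:]
    rhs = tuple(d if isinstance(d, str) else d[0] for d in dtrs)
    p = RULE_P[(cat, rhs)]
    for d in dtrs:
        if not isinstance(d, str):
            p *= tree_prob(d)        # multiply in each subtree's probability
    return p

t = ("S", ("NP", "they"), ("VP", ("V", "can"), ("VP", ("V", "fish"))))
print(tree_prob(t))   # 1.0 * 0.5 * 0.3 * 0.6 * 0.4 * 0.4 = 0.0144
```

Ranking the analyses of an ambiguous sentence by this product is what "n-best parsing" means; a beam search simply discards low-probability partial analyses early.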

Formalism power requirements
Why not FSAs? Centre-embedding, A -> α A β, generates languages of the form a^n b^n. For instance:
the students the police arrested complained
However, there are limits on human memory / processing ability:
? the students the police the journalists criticised arrested complained
More importantly: without internal structure, we can't build good semantic representations.
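To see the a^n b^n pattern concretely, here is a direct recursive recogniser mirroring the two rules A -> a A b | a b (a sketch; no finite-state machine can do this, since it needs an unbounded counter):

```python
# Recognise a^n b^n (n >= 1) by recursion on the centre-embedding rules.
def is_anbn(s):
    if s == "ab":                      # base case: A -> a b
        return True
    return (len(s) >= 4 and s[0] == "a" and s[-1] == "b"
            and is_anbn(s[1:-1]))      # recursive case: A -> a A b

print(is_anbn("aaabbb"), is_anbn("aabbb"))   # True False
```

The recursion depth grows with n, which is exactly the unbounded memory a finite-state device lacks.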

Formalism power requirements
Overgeneration in atomic category CFGs
agreement: subject-verb agreement, e.g. they fish, it fishes, *it fish, *they fishes (* means ungrammatical)
case: pronouns (and maybe who/whom), e.g. they like them, *they like they

S -> NP-sg-nom VP-sg
S -> NP-pl-nom VP-pl
VP-sg -> V-sg NP-sg-acc
VP-sg -> V-sg NP-pl-acc
VP-pl -> V-pl NP-sg-acc
VP-pl -> V-pl NP-pl-acc
NP-sg-nom -> he
NP-sg-acc -> him
NP-sg-nom -> fish
NP-pl-nom -> fish
NP-sg-acc -> fish
NP-pl-acc -> fish

BUT: very large grammar, misses generalizations, no way of saying when we don't care about agreement.
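The missing generalization can be previewed with a tiny sketch of my own (not the lecture's formalism): treat agreement as a feature value next to the category, with None meaning "don't care", which the atomic-category encoding above cannot say.

```python
# Agreement as a feature: None is underspecified and unifies with anything.
def unify(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else "FAIL"

LEX = {"he": ("NP", "sg"), "they": ("NP", "pl"),
       "fishes": ("VP", "sg"), "fish": ("VP", "pl"),
       "can": ("VP", None)}             # modal: unmarked for agreement

def s_ok(subj, pred):
    # S -> NP VP, with the agreement features required to unify
    (c1, a1), (c2, a2) = LEX[subj], LEX[pred]
    return c1 == "NP" and c2 == "VP" and unify(a1, a2) != "FAIL"

print(s_ok("they", "fish"), s_ok("he", "fish"))   # True False
```

One S rule plus one unification check replaces the doubled rule set, and "he can" / "they can" both pass because the modal's agreement is simply left unspecified.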

Formalism power requirements
Subcategorization
intransitive vs transitive etc: verbs (and other types of words) have different numbers and types of syntactic arguments:
*Kim adored
*Kim gave Sandy
*Kim adored to sleep
Kim liked to sleep
*Kim devoured
Kim ate
Subcategorization is correlated with semantics, but not determined by it.

Formalism power requirements
Overgeneration because of missing subcategorization
Overgeneration: they fish fish it
(S (NP they) (VP (V fish) (VP (V fish) (NP it))))
Informally: we need slots on the verbs for their syntactic arguments:
an intransitive takes no following arguments (complements)
a simple transitive takes one NP complement
like may be a simple transitive or take an infinitival complement, etc.
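The informal "slots" can be written down as data. A sketch of my own (the frames and names are illustrative, not from the lecture): each verb lists the complement sequences it subcategorizes for, and a VP is well formed only if its complements match one of them.

```python
# Each verb maps to the list of complement frames it allows.
SUBCAT = {
    "slept":    [[]],                  # intransitive: no complements
    "adored":   [["NP"]],              # simple transitive only: *Kim adored
    "devoured": [["NP"]],              # *Kim devoured
    "ate":      [[], ["NP"]],          # optionally transitive: Kim ate
    "liked":    [["NP"], ["VP-inf"]],  # NP or infinitival complement
}

def vp_ok(verb, complements):
    # a VP is licensed iff its complement categories match one frame
    return list(complements) in SUBCAT.get(verb, [])

print(vp_ok("ate", []), vp_ok("devoured", []))   # True False
```

With the intransitive frame removed from "fish", the spurious analysis of "they fish fish it" above would no longer be licensed.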

Formalism power requirements
Long-distance dependencies
1. which problem did you say you don't understand?
2. who do you think Kim asked Sandy to hit?
3. which kids did you say were making all that noise?
gaps (underscores below):
1. which problem did you say you don't understand _?
2. who do you think Kim asked Sandy to hit _?
3. which kids did you say _ were making all that noise?
In 3, the verb were shows plural agreement:
* what kid did you say _ were making all that noise?
The gap filler has to be plural.
Informally: we need a gap slot which is to be filled by something that itself has features.