Lecture 8: Lexicalized and Probabilistic Parsing (CS 6320)



Outline
- PP Attachment Problem
- Probabilistic CFGs
- Problems with PCFGs
- Probabilistic Lexicalized CFGs
- The Collins Parser
- Evaluating Parsers
- Example

PP-Attachment Problem
I buy books for children.
Either I buy (books for children), or I buy (books) (for children).

Semantic Selection
- I ate a pizza with anchovies. → I ate (a pizza with anchovies).
- I ate a pizza with friends. → I ate (a pizza) (with friends).

Parse tree for "pizza with anchovies" (PP attached to the object NP).

Parse tree for "pizza with friends" (PP attached to the VP).

More Than One PP
I saw the man in the house with the telescope.
- I saw (the man (in the house (with the telescope))).
- I saw (the man (in the house) (with the telescope)).
- I saw (the man) (in the house (with the telescope)).
- I saw (the man) (in the house) (with the telescope).

Accuracy of the most probable attachment, per preposition:

Prep.   % of total   % V (verb attachment)
of        13.47%       6.17%
to        13.27%      80.14%
in        12.42%      73.64%
for        6.87%      82.44%
on         6.21%      75.51%
with       6.17%      86.30%
from       5.37%      75.90%
at         4.09%      76.63%
as         3.95%      86.51%
by         3.53%      88.02%
into       3.34%      89.52%
that       1.74%      65.97%
about      1.45%      70.85%
over       1.30%      86.83%

Accuracy on the Penn Treebank II: 81.73%

Semantic Information
Identify the correct meaning of the words, and use this information to decide which parse is most probable. Example: eat pizza with friends.

Pragmatic and Discourse Information
To achieve 100% accuracy, we need this kind of information. Examples:
- "Buy a car with a steering wheel": you need knowledge about how cars are made.
- "I saw that car in the picture": you also need the surrounding discourse.
[McLauchlan 2001, Maximum Entropy Models and Prepositional Phrase Ambiguity]

Probabilistic Context-Free Grammars
Each rule carries a probability: $A \rightarrow \beta\ [p]$, where
$$p = P(A \rightarrow \beta) = P(A \rightarrow \beta \mid A) = P(\mathrm{RHS} \mid \mathrm{LHS})$$
and the expansions of each non-terminal sum to one:
$$\sum_{\beta} P(A \rightarrow \beta) = 1$$

Probabilistic Context-Free Grammars
A PCFG assigns a probability to each parse tree $T$ of a sentence $S$:
$$P(T, S) = \prod_{n \in T} p(r(n))$$
where $r(n)$ is the rule used to expand node $n$. Note that $P(T, S) = P(T)\,P(S \mid T)$, and since a tree contains all the words of the sentence, $P(S \mid T) = 1$, so $P(T, S) = P(T)$.
Disambiguation selects the most probable tree among those whose yield is $S$:
$$\hat{T}(S) = \operatorname*{argmax}_{T\,:\,\mathrm{yield}(T) = S} P(T \mid S) = \operatorname*{argmax}_{T\,:\,\mathrm{yield}(T) = S} \frac{P(T, S)}{P(S)} = \operatorname*{argmax}_{T\,:\,\mathrm{yield}(T) = S} P(T, S) = \operatorname*{argmax}_{T\,:\,\mathrm{yield}(T) = S} P(T)$$
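To make these formulas concrete, here is a minimal Python sketch (mine, not from the slides): the grammar is a dictionary from rules to probabilities, and a tree's probability is the product of the probabilities of the rules it uses. All rules and numbers are invented for illustration.

```python
# Toy PCFG: (LHS, RHS) -> probability. The numbers are made up,
# like the pedagogical probabilities in Fig. 14.1.
pcfg = {
    ("S", ("NP", "VP")): 0.8,
    ("S", ("VP",)): 0.2,
    ("NP", ("Det", "Noun")): 0.6,
    ("NP", ("Pronoun",)): 0.4,
    ("VP", ("Verb", "NP")): 1.0,
    ("Pronoun", ("i",)): 1.0,
    ("Det", ("a",)): 1.0,
    ("Noun", ("pizza",)): 1.0,
    ("Verb", ("ate",)): 1.0,
}

def tree_prob(tree, grammar):
    """P(T) = product over nodes n of p(r(n)).

    A tree is (label, [children]); a leaf is a plain word string."""
    if isinstance(tree, str):                 # a word contributes no rule
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[(label, rhs)]
    for child in children:
        p *= tree_prob(child, grammar)
    return p

# Disambiguation picks the argmax tree; the sentence probability used
# for language modeling is the sum over all trees with yield S.
tree = ("S", [("NP", [("Pronoun", ["i"])]),
              ("VP", [("Verb", ["ate"]),
                      ("NP", [("Det", ["a"]), ("Noun", ["pizza"])])])])
print(tree_prob(tree, pcfg))                  # 0.8 * 0.4 * 0.6 = 0.192
```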

Probabilistic Context-Free Grammars
Figure 14.1: A PCFG that is a probabilistic augmentation of the L1 miniature English CFG grammar and lexicon of Fig. 13.1. These probabilities were made up for pedagogical purposes and are not based on a corpus (since any real corpus would have many more rules, the true probabilities of each rule would be much smaller).

Probabilistic Context-Free Grammars
Figure 14.2: Two parse trees for an ambiguous sentence. The transitive parse on the left corresponds to the sensible meaning "Book a flight that serves dinner", while the ditransitive parse on the right corresponds to the nonsensical meaning "Book a flight on behalf of the dinner".

Probabilistic Context-Free Grammars
Two parse trees for an ambiguous sentence. Parse (a) corresponds to the meaning "Can you book flights on behalf of TWA"; parse (b) to "Can you book flights which are run by TWA".

Probabilistic Context-Free Grammars
$P(T_l)$ and $P(T_r)$ are each computed as the product of the probabilities of all the rules used in the corresponding tree. Note that the probability of a sentence is the sum of the probabilities of all of its parse trees:
$$P(S) = \sum_{T\,:\,\mathrm{yield}(T) = S} P(T, S)$$
This is useful for language modeling.

Probabilistic CKY Parsing
Figure 14.3: The probabilistic CKY algorithm for finding the maximum probability parse of a string of num_words words, given a PCFG grammar with num_rules rules in Chomsky normal form. back is an array of backpointers used to recover the best parse. The build_tree function is left as an exercise to the reader.
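Since the figure itself is not reproduced here, the algorithm can be sketched roughly as follows. This is my reconstruction under assumed data structures (a lexical dictionary plus a binary-rule list), not the textbook's exact pseudocode; log probabilities are used to avoid underflow, and build_tree, left as an exercise on the slide, is sketched as well.

```python
import math
from collections import defaultdict

def prob_cky(words, lexical, binary, start="S"):
    """Probabilistic CKY over a CNF PCFG, in log space.

    lexical: dict word -> list of (A, p) for rules A -> word
    binary:  list of (A, B, C, p) for rules A -> B C
    Returns the best log-probability for the whole string and the
    backpointer table; table[i][j][A] scores the best A over words[i:j]."""
    n = len(words)
    table = [[defaultdict(lambda: -math.inf) for _ in range(n + 1)]
             for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for j, w in enumerate(words, start=1):
        for A, p in lexical.get(w, []):               # preterminal cells
            table[j - 1][j][A] = math.log(p)
            back[j - 1][j][A] = w
        for i in range(j - 2, -1, -1):                # widen the span
            for k in range(i + 1, j):                 # try each split point
                for A, B, C, p in binary:
                    if table[i][k][B] > -math.inf and table[k][j][C] > -math.inf:
                        cand = math.log(p) + table[i][k][B] + table[k][j][C]
                        if cand > table[i][j][A]:     # keep the max, not the sum
                            table[i][j][A] = cand
                            back[i][j][A] = (k, B, C)
    return table[0][n][start], back

def build_tree(back, i, j, A):
    """Recover the best tree for non-terminal A over span (i, j)."""
    bp = back[i][j][A]
    if isinstance(bp, str):                           # lexical rule A -> word
        return (A, [bp])
    k, B, C = bp
    return (A, [build_tree(back, i, k, B), build_tree(back, k, j, C)])
```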

Probabilistic CKY Parsing
Figure 14.4: The beginning of the probabilistic CKY matrix. Filling out the rest of the chart is left as Exercise 14.4 for the reader.

Learning PCFG Probabilities
A treebank contains parse trees for a large corpus. Rule probabilities are estimated by relative frequency:
$$P(\alpha \rightarrow \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \rightarrow \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \rightarrow \gamma)} = \frac{\mathrm{Count}(\alpha \rightarrow \beta)}{\mathrm{Count}(\alpha)}$$
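A sketch of this relative-frequency estimation over a toy treebank, using the same tree format as the earlier sketch (my code, not from the slides):

```python
from collections import Counter

def estimate_pcfg(treebank):
    """Relative-frequency estimation:
       P(A -> beta | A) = Count(A -> beta) / Count(A).

    treebank is a list of trees in the (label, [children]) format used
    in the earlier sketch; leaves are word strings."""
    rule_counts, lhs_counts = Counter(), Counter()

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1                # Count(A -> beta)
        lhs_counts[label] += 1                        # Count(A)
        for child in children:
            visit(child)

    for tree in treebank:
        visit(tree)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}
```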

Problems with PCFGs and Enhancements
1. The assumption that production probabilities are independent does not hold. Often the choice of how a node expands depends on the location of that node in the parse tree. For example, syntactic subjects are often realized as pronouns, whereas direct objects are more often non-pronominal noun phrases:
NP → Pronoun
NP → Det Noun

Problems with PCFGs and Enhancements
Position-independent probabilities such as
NP → DT NN [.28]
NP → PRP [.25]
would therefore be erroneous.
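One standard remedy, in the spirit of parent annotation (Johnson, 1998), though the slide itself does not name a specific technique, is to split each non-terminal by its parent category so that NP^S (subjects) and NP^VP (objects) get separate expansion probabilities. A minimal sketch:

```python
def parent_annotate(tree, parent="TOP"):
    """Relabel each non-terminal with its parent category (NP^S vs NP^VP)
    so that subject and object NPs get separate expansion probabilities.
    Uses the same (label, [children]) tree format as the sketches above."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (label + "^" + parent,
            [parent_annotate(child, parent=label) for child in children])
```

Re-estimating rule probabilities over the annotated trees (e.g. with the estimate_pcfg sketch above) then yields position-sensitive expansion probabilities.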

Problems with PCFGs and Enhancements
2.1 PCFGs are insensitive to the words they expand; in reality, lexical information about words plays an important role in selecting the correct parse tree.
(a) I ate pizza with anchovies.
(b) I ate pizza with friends.
In (a) the PP attaches to the NP (NP → NP PP, NP attachment); in (b) it attaches to the VP (VP → V NP PP, VP attachment). The PP attachment depends on the semantics of the PP's head noun.

Problems with PCFGs and Enhancements
2.2 Lexical preference of verbs (subcategorization)
Moscow sent more than 100,000 soldiers into Afghanistan.
The PP into Afghanistan attaches to sent, not to soldiers. This is because the verb send subcategorizes for a destination, expressed by the preposition into.

Problems with PCFGs and Enhancements
2.3 Coordination ambiguities
(a) (dogs in houses) and (cats)
(b) dogs in (houses and cats)
(a) is preferred because dogs and cats are semantic siblings, i.e., animals.

Coordination Ambiguities
Figure 14.7: An instance of coordination ambiguity. Although the left structure is intuitively the correct one, a PCFG will assign them identical probabilities since both structures use exactly the same rules. After Collins (1999).

Probabilistic Lexicalized CFG
Figure 14.5: Two possible parse trees for a prepositional phrase attachment ambiguity. The left parse is the sensible one, in which into a bin describes the resulting location of the sacks. In the incorrect right parse, the sacks to be dumped are the ones which are already in a bin, whatever that might mean.

Probabilistic Lexicalized CFG
Figure 14.6: Another view of the preposition attachment problem. Should the PP on the right attach to the VP or NP nodes of the partial parse tree on the left?

Probabilistic Lexicalized CFG
Figure 14.10: A lexicalized tree, including head tags, for a WSJ sentence, adapted from Collins (1999). Below we show the PCFG rules that would be needed for this parse tree, internal rules on the left and lexical rules on the right.

Probabilistic Lexicalized CFG
Lexical heads play an important role, since the semantics of the head dominates the semantics of the phrase. Annotate each non-terminal phrasal node in a parse tree with its lexical head:
Workers dumped sacks into a bin.

Probabilistic Lexicalized CFG
A lexicalized grammar shows lexical preferences between heads and their constituents. Probabilities are added to show the likelihood of each rule/head combination:
VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(cats) PP(into) [8 × 10⁻¹¹]
VP(dumped) → VBD(dumped) NP(hats) PP(into) [4 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(sacks) PP(above) [1 × 10⁻¹²]
Since it is not possible to store all possibilities, one solution is to cluster some of the cases based on their semantic category, e.g., hats and sacks are inanimate objects. Note that dumped prefers the preposition into over above.
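A toy sketch of the clustering idea (all names and numbers here are made up for illustration): when a specific head combination was never observed, back off from the exact object head to its semantic cluster.

```python
# Illustrative only. The full lexicalized rules above condition on
# every head (dumped, sacks, into); since most specific combinations
# are never seen, back off from the exact object head to its cluster.
clusters = {"sacks": "inanimate", "hats": "inanimate", "cats": "animate"}

# Probabilities keyed by (verb head, object head, preposition head).
exact = {("dumped", "sacks", "into"): 3e-10}          # observed combination
pooled = {("dumped", "inanimate", "into"): 2e-10}     # cluster-level estimate

def vp_rule_prob(verb, obj, prep):
    """Try the exact heads first; otherwise back off the object head
    to its semantic cluster."""
    if (verb, obj, prep) in exact:
        return exact[(verb, obj, prep)]
    return pooled.get((verb, clusters.get(obj), prep), 0.0)

print(vp_rule_prob("dumped", "sacks", "into"))  # 3e-10, exact match
print(vp_rule_prob("dumped", "hats", "into"))   # 2e-10, via the cluster
```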

Probabilistic Lexicalized CFG
[Figures: a lexicalized tree from Collins (1999), and an incorrect parse of the same sentence, also from Collins (1999)]

Probabilistic Lexicalized CFG
$$p(\mathrm{VP} \rightarrow \mathrm{VBD\ NP\ PP} \mid \mathrm{VP}, dumped) = \frac{C(\mathrm{VP}(dumped) \rightarrow \mathrm{VBD\ NP\ PP})}{\sum_{\beta} C(\mathrm{VP}(dumped) \rightarrow \beta)} = \frac{6}{9} = 0.67$$
$$p(\mathrm{VP} \rightarrow \mathrm{VBD\ NP} \mid \mathrm{VP}, dumped) = \frac{C(\mathrm{VP}(dumped) \rightarrow \mathrm{VBD\ NP})}{\sum_{\beta} C(\mathrm{VP}(dumped) \rightarrow \beta)} = \frac{0}{9} = 0$$

Probabilistic Lexicalized CFG
Head probabilities. The mother's head is dumped and the head of the PP is into:
$$p(into \mid \mathrm{PP}, dumped) = \frac{C(X(dumped) \rightarrow \ldots\ \mathrm{PP}(into)\ \ldots)}{C(X(dumped) \rightarrow \ldots\ \mathrm{PP}\ \ldots)} = \frac{2}{9} = 0.22$$

Probabilistic Lexicalized CFG
The mother's head is sacks and the head of the PP is into:
$$p(into \mid \mathrm{PP}, sacks) = \frac{C(X(sacks) \rightarrow \ldots\ \mathrm{PP}(into)\ \ldots)}{C(X(sacks) \rightarrow \ldots\ \mathrm{PP}\ \ldots)} = 0$$
Thus the head probabilities predict that dumped is more likely to be modified by into than is sacks.

Probabilistic Lexicalized CFG
Modern parsers (Charniak, Collins, etc.) make simplifying assumptions relating the heads of phrases to the heads of their constituents. In a plain PCFG, the probability of a node $n$ being expanded by a rule $r$ is conditioned only on the syntactic category of $n$. Idea: add one more conditioning factor, the headword of the node, $h(n)$. Then $p(r(n) \mid n, h(n))$ is the conditional probability of expanding $n$ by rule $r$, given the syntactic category of $n$ and the lexical information $h(n)$. For example, $p(r \mid \mathrm{VP}, dumped)$, where $r$ is VP → VBD NP PP.

Probabilistic Lexicalized CFG
How do we compute the probability of a head? Two factors are important: the syntactic category of the node, and the neighboring heads. We use $p(h(n) = word_i \mid n, h(m(n)))$, where $h(m(n))$ is the head of the node's mother. For example, $p(h(n) = sacks \mid n = \mathrm{NP}, h(m(n)) = dumped)$ is the probability that an NP whose mother's head is dumped itself has the head sacks. This probability captures the dependency between dumped and sacks.

Probabilistic Lexicalized CFG
Update the formula for computing the probability of a parse:
$$P(T, S) = \prod_{n \in T} p(r(n) \mid n, h(n)) \cdot p(h(n) \mid n, h(m(n)))$$
An example: consider an incorrect parse tree for Workers dumped sacks into a bin and compare it with the previous, correct one.
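A sketch of evaluating this product over a lexicalized tree; the node format and table names are my assumptions, not the textbook's:

```python
def lex_tree_prob(tree, rule_probs, head_probs, mother_head="TOP"):
    """P(T, S) = prod over nodes n of
         p(r(n) | n, h(n)) * p(h(n) | n, h(m(n))).

    A node is (category, head_word, [children]); a leaf is a word.
    rule_probs[(cat, rhs, head)] and head_probs[(head, cat, mother_head)]
    are assumed to be pre-estimated conditional probability tables."""
    if isinstance(tree, str):
        return 1.0
    cat, head, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(cat, rhs, head)] * head_probs[(head, cat, mother_head)]
    for child in children:
        p *= lex_tree_prob(child, rule_probs, head_probs, mother_head=head)
    return p
```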

How to Calculate Probabilities
$P(\mathrm{VP}(dumped, \mathrm{VBD}) \rightarrow \mathrm{VBD}(dumped, \mathrm{VBD})\ \mathrm{NP}(sacks, \mathrm{NNS})\ \mathrm{PP}(into, \mathrm{P}))$ can be estimated as
$$\frac{\mathrm{Count}(\mathrm{VP}(dumped, \mathrm{VBD}) \rightarrow \mathrm{VBD}(dumped, \mathrm{VBD})\ \mathrm{NP}(sacks, \mathrm{NNS})\ \mathrm{PP}(into, \mathrm{P}))}{\mathrm{Count}(\mathrm{VP}(dumped, \mathrm{VBD}))}$$
However, this is difficult because such a specific rule applies only a small number of times. Instead, we make independence assumptions and decompose the probability into smaller, more frequently observed factors. Note: modern statistical parsers differ in which independence assumptions they make.

The Collins Parser
Collins views each rule as a head $H$ with left and right modifier sequences:
$$\mathrm{LHS} \rightarrow L_n\, L_{n-1} \ldots L_1\ H\ R_1 \ldots R_m$$
with each modifier sequence terminated by a STOP symbol:
$$P(\mathrm{VP}(dumped, \mathrm{VBD}) \rightarrow \mathrm{STOP}\ \mathrm{VBD}(dumped, \mathrm{VBD})\ \mathrm{NP}(sacks, \mathrm{NNS})\ \mathrm{PP}(into, \mathrm{P})\ \mathrm{STOP})$$

The Collins Parser
$$\begin{aligned} P(\mathrm{VP}(dumped, \mathrm{VBD}) \rightarrow{} & \mathrm{VBD}(dumped, \mathrm{VBD})\ \mathrm{NP}(sacks, \mathrm{NNS})\ \mathrm{PP}(into, \mathrm{P})) \\ ={} & P_H(\mathrm{VBD} \mid \mathrm{VP}, dumped) \\ & \times P_L(\mathrm{STOP} \mid \mathrm{VP}, \mathrm{VBD}, dumped) \\ & \times P_R(\mathrm{NP}(sacks, \mathrm{NNS}) \mid \mathrm{VP}, \mathrm{VBD}, dumped) \\ & \times P_R(\mathrm{PP}(into, \mathrm{P}) \mid \mathrm{VP}, \mathrm{VBD}, dumped) \\ & \times P_R(\mathrm{STOP} \mid \mathrm{VP}, \mathrm{VBD}, dumped) \end{aligned}$$
Each factor is estimated from counts, e.g.
$$P_R(\mathrm{NP}(sacks, \mathrm{NNS}) \mid \mathrm{VP}, \mathrm{VBD}, dumped) = \frac{\mathrm{Count}(\mathrm{VP}(dumped, \mathrm{VBD})\ \text{with}\ \mathrm{NNS}(sacks)\ \text{as a daughter somewhere on the right})}{\mathrm{Count}(\mathrm{VP}(dumped, \mathrm{VBD}))}$$
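The decomposition can be sketched as follows (my formulation of the slide's factors; P_H, P_L, and P_R are assumed to be pre-estimated conditional distributions passed in as functions):

```python
def collins_rule_prob(lhs, head_child, head_word, left, right, P_H, P_L, P_R):
    """Collins-style decomposition: generate the head child first, then
    the left and right modifiers independently, each modifier sequence
    ending in STOP."""
    p = P_H(head_child, lhs, head_word)
    for mod in list(left) + ["STOP"]:         # left modifiers, then STOP
        p *= P_L(mod, lhs, head_child, head_word)
    for mod in list(right) + ["STOP"]:        # right modifiers, then STOP
        p *= P_R(mod, lhs, head_child, head_word)
    return p

# The slide's example, factor by factor:
#   P(VP(dumped,VBD) -> VBD(dumped,VBD) NP(sacks,NNS) PP(into,P))
#     = P_H(VBD | VP, dumped)
#     * P_L(STOP | VP, VBD, dumped)
#     * P_R(NP(sacks,NNS) | VP, VBD, dumped)
#     * P_R(PP(into,P)    | VP, VBD, dumped)
#     * P_R(STOP | VP, VBD, dumped)
```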

Evaluating Parsers
labeled recall = (# of correct constituents in the hypothesis parse of $s$) / (# of total constituents in the reference parse of $s$)
labeled precision = (# of correct constituents in the hypothesis parse of $s$) / (# of total constituents in the hypothesis parse of $s$)
$$F_\beta = \frac{(\beta^2 + 1)\,P R}{\beta^2 P + R} \qquad F_1 = \frac{2 P R}{P + R}$$
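These metrics are straightforward to compute once constituents are encoded as labeled spans; a small sketch (the (label, start, end) encoding is my assumption, in the PARSEVAL style):

```python
def labeled_prf(reference, hypothesis, beta=1.0):
    """Labeled precision, recall, and F-measure over constituents,
    each encoded as a (label, start, end) triple."""
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)                  # constituents in both parses
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    denom = beta ** 2 * precision + recall
    f = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0
    return precision, recall, f

# Example: hypothesis gets 3 of 4 constituents right, reference has 5.
ref = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 3), ("PP", 3, 5)]
hyp = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5)]
print(labeled_prf(ref, hyp))                  # (0.75, 0.6, ~0.667)
```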

References
Chart parsing:
- Caraballo, S. and Charniak, E. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24 (1998), 275-298.
- Charniak, E., Goldwater, S. and Johnson, M. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora (1998), 127-133.
- Charniak, E. A maximum-entropy-inspired parser. In Proceedings of NAACL-2000.
Maximum entropy:
- Berger, A. L., Della Pietra, S. A. and Della Pietra, V. J. A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996), 39-71.
- Ratnaparkhi, A. Learning to parse natural language with maximum entropy models. Machine Learning 34(1-3) (1999), 151-176.
Charniak's parser on the web: ftp://ftp.cs.brown.edu/pub/nlparser/