Chunk/Shallow Parsing: PP-Attachment and Chunk Parsing




Chunk/Shallow Parsing
Miriam Butt, October 2002

PP-Attachment

Recall the PP-Attachment Problem (demonstrated with XLE):

The girl saw the monkey with the telescope. (2 readings)
The girl saw the monkey with the telescope in the garden. (4 readings)
The girl saw the monkey with the telescope in the garden under the tree. (9 readings)

The ambiguity increases exponentially with each PP.

PP-Attachment

It would be nice not to have to resolve the attachment relations, or at least to leave them to a later component so that some shallow parsing can already be done.

The girl saw the monkey with the telescope.
[NP The girl] saw [NP the monkey] [PP with the telescope].

Chunk Parsing

Steve Abney pioneered the notion of chunk parsing:

[I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time].

Chunks appear to correspond roughly to prosodic phrases (though nobody has really researched this from the phonology-syntax interface perspective).
- There is one strong stress per chunk.
- Pauses are most likely to fall between chunks.

Chunk Parsing

Chunks are difficult to define precisely.

A typical chunk consists of a content word surrounded by a constellation of function words, matching a fixed template. (Abney 1991)

[I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time].
[The girl] saw [the monkey] [with the telescope].

By contrast, the relationship between chunks is mediated more by lexical selection than by rigid templates. (Abney 1991)

Chunk Parsing: A Three-Step Process

Abney envisioned chunk parsing as a cascade of (finite-state) processes:
1. Word Identification (in practice mostly done via POS-Tagging)
2. Chunk Identification (via rules)
3. Attachment of Chunks (Parsing, also via rules)

Chunk Parsing: Word Identification

The effort to establish such a conclusion of course must have two foci, ...

[the/det] [effort/n] [to/inf-to/p] [establish/v] [such/predet/det/pron] [a/det] [conclusion/n] [of course/adv] [must/n/v] [have/v] [two/num] [foci/n] [,/Comma]

Problems: ambiguity (several candidate tags per word) and multiwords such as "of course" (both need lexicons and stochastic methods).

Chunk Parsing: Chunk Identification

The effort to establish such a conclusion of course must have two foci, ...

[the effort/dp] [to establish/cp-inf] [such a conclusion/dp] [of course must have/cp] [two foci/dp] [,/Comma]

Problems: Where does a chunk end? What should the chunks be called?
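The chunk identification step above can be sketched as a small set of rules over POS tags. This is a minimal illustration, not Abney's implementation: the rule set `RULES`, the function name `chunk`, and the already disambiguated tags in `SENTENCE` are all assumptions made for the example. Each rule is a regular expression over a space-terminated tag sequence, and at every position the longest matching rule wins.

```python
import re

# Hypothetical chunk rules (assumption, for illustration only).
# Each pattern is matched against a string of tags, every tag followed by a space.
RULES = [
    ("dp", re.compile(r"(?:predet )?(?:det |num )?n ")),
    ("cp-inf", re.compile(r"inf-to v ")),
    ("cp", re.compile(r"(?:adv )*(?:v )+")),
]

# The example sentence with one tag per word (disambiguation is assumed done).
SENTENCE = [
    ("the", "det"), ("effort", "n"), ("to", "inf-to"), ("establish", "v"),
    ("such", "predet"), ("a", "det"), ("conclusion", "n"),
    ("of course", "adv"), ("must", "v"), ("have", "v"),
    ("two", "num"), ("foci", "n"), (",", "Comma"),
]


def chunk(tagged):
    """Group (word, tag) pairs into (label, words) chunks, left to right."""
    chunks = []
    i = 0
    while i < len(tagged):
        # Tag string for the remaining input, e.g. "det n inf-to v ".
        rest = "".join(tag + " " for _, tag in tagged[i:])
        best_label, best_len = None, 0
        for label, pattern in RULES:
            match = pattern.match(rest)
            if match:
                # Number of tags consumed = number of spaces in the match.
                length = match.group(0).count(" ")
                if length > best_len:  # prefer longer chunks
                    best_label, best_len = label, length
        if best_label is None:
            # No rule applies: keep the token on its own, labelled by its tag.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
        else:
            chunks.append((best_label, [w for w, _ in tagged[i:i + best_len]]))
            i += best_len
    return chunks


for label, words in chunk(SENTENCE):
    print(f"[{' '.join(words)}/{label}]")
```

With these assumed rules the output reproduces the bracketing shown on the slide. The choice of labels and the point where a chunk ends are decided entirely by the rule patterns, which is exactly where the two problems named above arise.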

Chunk Parsing: Attachment

The effort to establish such a conclusion of course must have two foci, ...

IP:
  DP: [the effort]
  CP-inf: [[to establish] [such a conclusion]]
  VP: [[of course must have] [two foci]]
  [,/Comma]

Attachment Preferences:
- Prefer argument attachment.
- Prefer verb attachment.
- Prefer low attachment.

The girl saw the monkey with the telescope.
[S [NP The girl] [VP [V saw] [NP the monkey] [PP with the telescope]]]

Chunk Parsing: Finding an End

Unlike the ends of sentences, the ends of chunks are not overtly marked. What to do?

Possible solutions:
- Mark each word as being at the beginning (<B>) of a chunk or in the middle (<I>).
- Hallucinate end-of-chunk markers everywhere and then keep those that fit the given template/pattern (this turns out not to be costly).
- Prefer longer chunks to shorter chunks.

Chunk Parsing On-Line

1. Uni Zürich, Interactive Tools
2. POS-Tagger and Chunker from Infogistics (on the web: www.infogistics.com)

References: Some references to chunk parsing can be found on Steve Abney's home page: http://www.vinartus.net/spa/publications.html
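The begin/middle marking from "Finding an End" can be sketched as follows. This is a minimal sketch under stated assumptions: the marker format `B-<label>` / `I-<label>`, the function names `to_bi_tags` and `from_bi_tags`, and the decision to treat the verb as a one-word chunk labelled `V` are all choices made for the example and do not come from the slides. The point it illustrates is that no explicit end marker is needed, because a chunk ends wherever the next word carries a begin marker.

```python
def to_bi_tags(chunks):
    """Flatten (label, words) chunks into (word, marker) pairs.

    The first word of a chunk is marked B-<label>, every later word I-<label>.
    """
    tagged = []
    for label, words in chunks:
        for position, word in enumerate(words):
            prefix = "B" if position == 0 else "I"
            tagged.append((word, f"{prefix}-{label}"))
    return tagged


def from_bi_tags(tagged):
    """Rebuild chunks from markers: each B marker opens a new chunk."""
    chunks = []
    for word, marker in tagged:
        prefix, label = marker.split("-", 1)
        if prefix == "B" or not chunks:
            chunks.append((label, [word]))
        else:
            chunks[-1][1].append(word)
    return chunks


# Example chunks; giving "saw" its own "V" chunk is an assumption.
EXAMPLE = [
    ("NP", ["The", "girl"]),
    ("V", ["saw"]),
    ("NP", ["the", "monkey"]),
    ("PP", ["with", "the", "telescope"]),
]

for word, marker in to_bi_tags(EXAMPLE):
    print(word, marker)
```

Encoding chunks this way turns chunk identification into a per-word labelling task, so the same machinery used for POS tagging can in principle be reused for chunking.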

Why is Shallow Parsing Good?

Shallow Parsing or Partial Parsing can be used for:
- Bootstrapping a more complete parser
- Constructing a tree bank (annotated text) which other applications can use
- Extraction of specialized terminology or multi-words
- Information Retrieval (delimits the search space)

Noun Chunking Applications

Many applications seem to concentrate on Noun Chunking rather than trying to analyze a whole sentence. That is, only the noun chunks are identified and labeled.

Church (1988): takes the output of an HMM POS-tagger as input and brackets those tag sequences that are noun chunks.

[the/DT prosecutor/NN] said/VBD in/IN [closing/NN] that/CS

Noun Chunking Applications

Example: A German noun chunker (Schmid and Schulte im Walde 2000)
- Correct Identification: 92%
- With Syntactic Category and Case Information: 83%

What is NP Chunking Good For?

Names tend to be a problem for parsers: new ones are constantly being made up, and there is a lot of specialized terminology in medical texts or manuals.

NP Chunkers can help by identifying these special terms so that they can be listed in a specialized lexicon.

What is NP Chunking Good For?

Date/Time NPs and Name NPs have a very different syntax from normal NPs.

I can meet you on [Thursday the fifth of November at 12:30 pm].
[Professor Dr. John A. Smith] arrived at the university.

NP Chunkers can again help to identify these special NPs without needing a specialized, painfully handwritten grammar.

Interactive Annotation: Annotate

Developed by Thorsten Brants and Oliver Plaehn at the DFKI (German Research Center for Artificial Intelligence).
- Includes a stochastic POS-Tagger.
- Allows for further manual annotation/parsing (identification of NPs/PPs, etc.).
- Watches the user with a statistical component and learns the grammar the user is working with.
- Provides the user with guesses as to an analysis, which are correct about 70% of the time.

Types of Shallow Parsers

- Sophisticated versions of POS-Taggers
- Chunk Parsers (also rely on POS-Tags)
- Finite-State Parsers
- Stochastic Processes which learn a grammar
- Combinations of all of the above