Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde




Statistical Verb-Clustering Model
- soft clustering: verbs may belong to several clusters
- trained on verb-argument tuples
- clusters together verbs with similar subcategorization and selectional restriction properties
- assumption: the arguments are independent of the verb given the cluster
- probability of the tuple (read, SUBJ:OBJ, Peter, book):
  Σ_c p(c) p(read|c) p(SUBJ:OBJ|c) p(Peter|c, SUBJ:OBJ, 1) p(book|c, SUBJ:OBJ, 2)
- selectional restrictions are expressed with WordNet concepts r:
  p(book|slot) = Σ_r p(r|slot) p(book|r)
- training with the Expectation-Maximization algorithm and the Minimum Description Length principle
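The tuple probability above can be sketched as a small mixture computation. All parameter values below are invented toy numbers, not the trained model:

```python
# Toy sketch of the soft-clustering tuple probability (hypothetical numbers):
# p(tuple) = sum_c p(c) * p(verb|c) * p(frame|c) * prod_i p(filler_i | c, frame, i)

p_c = {"c1": 0.6, "c2": 0.4}                       # cluster priors
p_verb = {("read", "c1"): 0.3, ("read", "c2"): 0.05}
p_frame = {("SUBJ:OBJ", "c1"): 0.5, ("SUBJ:OBJ", "c2"): 0.2}
p_filler = {
    ("Peter", "c1", "SUBJ:OBJ", 1): 0.1, ("book", "c1", "SUBJ:OBJ", 2): 0.2,
    ("Peter", "c2", "SUBJ:OBJ", 1): 0.02, ("book", "c2", "SUBJ:OBJ", 2): 0.01,
}

def tuple_prob(verb, frame, fillers):
    """Sum over latent clusters; fillers are independent given the cluster."""
    total = 0.0
    for c, pc in p_c.items():
        p = pc * p_verb.get((verb, c), 0.0) * p_frame.get((frame, c), 0.0)
        for i, w in enumerate(fillers, start=1):
            p *= p_filler.get((w, c, frame, i), 0.0)
        total += p
    return total

print(tuple_prob("read", "SUBJ:OBJ", ["Peter", "book"]))
```

Because the clustering is soft, both clusters contribute to the sum; a verb's probability mass is not forced into a single class.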

D4 Project: Phase 2 Main Goals
Verb Clustering
- Extension of the verb-clustering model to deal with adjuncts
- Induction of noun taxonomies
Parsing
- Usage of very large amounts of context information for syntactic disambiguation
- Selection of the most relevant subset of context information with decision trees

Extension of the Verb-Clustering Model
- The current model only deals with arguments.
- Verb-argument tuples extracted from automatically parsed data also contain adjuncts (which are currently treated like arguments).
- A more adequate modelling of adjuncts is likely to improve the verb-clustering model.
- Automatic distinction between arguments and adjuncts
- New insights into the difference between arguments and adjuncts

Modelling of Adjuncts
Observations (Schütze, Merlo):
- Arguments are verb-specific; adjuncts combine with a wide range of verbs.
  read/sleep/... in the garden vs. believe in god
- Only the first one of a sequence of PPs is a possible argument; the others must be adjuncts.
  John waited for Sandy in the hotel at the bar until 9 o'clock.

Extension of the Model
Generation of the tuple (C5, read, SUBJ:NP, Peter, book, yesterday):
1. Choose a verb cluster: p(C5)
2. Choose a verb given cluster C5: p(read|C5)
3. Choose a subcat frame given cluster C5: p(SUBJ:NP|C5)
4. Choose a filler for each slot given the cluster, the frame, and the slot number: p(Peter|C5, SUBJ:NP, 1), p(book|C5, SUBJ:NP, 2)
5. Decide whether to add an adjunct (yes): p_adjunct
6. Choose an adjunct filler: p(yesterday)
7. Decide whether to add another adjunct (no): 1 - p_adjunct
Assumption: Adjuncts are independent of the verb (cluster). They freely combine with any verb.
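The adjunct steps can be sketched as follows. The parameter values are invented for illustration; only the structure (each adjunct costs a continuation probability times a filler probability, stopping costs the complement) follows the generative story above:

```python
# Sketch of the adjunct extension (hypothetical parameters): after the core
# tuple is generated, each adjunct costs p_adjunct * p(filler), and stopping
# costs (1 - p_adjunct). Adjuncts are independent of the verb cluster.

P_ADJUNCT = 0.3                          # assumed adjunct-continuation probability
p_adjunct_filler = {"yesterday": 0.01}   # assumed adjunct-filler distribution

def extended_prob(p_core, adjuncts):
    """p_core: probability of steps 1-4 (cluster, verb, frame, argument fillers)."""
    p = p_core
    for a in adjuncts:
        p *= P_ADJUNCT * p_adjunct_filler.get(a, 0.0)  # steps 5-6 per adjunct
    return p * (1.0 - P_ADJUNCT)                       # step 7: stop

print(extended_prob(0.001, ["yesterday"]))
```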

Training on Underspecified Tuples
From a parse of the sentence "John waited for Mary for half an hour." without an argument/adjunct distinction, the following underspecified tuple is extracted:
  (wait, {SUBJ:PP, SUBJ}, John, for Mary, for hour)
The verb-clustering model is trained on a corpus of underspecified (as well as fully specified) tuples.
Expectation: The trained model will assign a higher probability to the reading with the frame SUBJ:PP and thereby correctly resolve the argument/adjunct ambiguity.
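The resolution step amounts to an argmax over the candidate frames. A minimal sketch, with made-up scores standing in for the trained model's probabilities:

```python
# Sketch of resolving an underspecified frame: score each candidate frame
# under the trained model, then pick the most probable one. The score table
# here is hypothetical.

def disambiguate(candidate_frames, prob_table):
    """Return the candidate frame with the highest model probability."""
    return max(candidate_frames, key=lambda f: prob_table.get(f, 0.0))

# Hypothetical model scores for the two readings of
# "John waited for Mary for half an hour."
scores = {"SUBJ:PP": 3.2e-7, "SUBJ": 1.1e-8}
print(disambiguate({"SUBJ:PP", "SUBJ"}, scores))  # SUBJ:PP wins
```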

Induction of Noun Hierarchies

Selectional Preferences
- Semantic realization of a predicate's complement
- Reference to the syntactic function and the thematic role
- Example: drink tea, drink coffee, drink beer, etc.
  drink a beverage (drink a substance)
- Preference: degree of acceptability
- Requires an inventory (and organisation) of semantic categories: clusters / WordNet

Selectional Preference Sources
Available sources of selectional preferences:
- WordNet (used by most work)
- Clusters (few)
- Similarity-based approach (only one)
Question: Can we induce selectional preferences using a noun hierarchy that is NOT manually pre-defined?
Goals:
1. Automatic induction of a noun hierarchy
2. Application to selectional preferences (in the D4 verb-clustering model)

Approaches: Overview
Cluster-based selectional preferences:
- Pereira et al. (1993)
- Rooth et al. (1999), Schulte im Walde et al. (2008)
WordNet-based selectional preferences:
- Resnik (1997) - association strength
- Li/Abe (1998) - MDL cut
- Abney/Light (1999) - HMM
- Ciaramita/Johnson (2000) - Bayesian belief network
- Clark/Weir (2002) - MDL cut
- Light/Greiff (2002) - summary of approaches
- Brockmann/Lapata (2003) - comparison

WordNet-based Preferences (direct objects of "drink")
<entity>
  <object, inanimate object, physical object>
    <substance, matter>
      <fuel>
      <food, nutrient>
        <beverage, drink, potable>
          <tea>
          <coffee, java>
          <milk>
      <body substance>

Approach by Erk (2007)
Katrin Erk (2007): A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.
- Alternative approach to WordNet-based models
- Basis: corpus-based semantic similarity metrics generalise from seen headwords to other, similar words
- Independence of a manual lexical resource
- The corpus for computing the similarity metrics can be freely chosen, allowing domain variation
- Application: semantic roles

Approach by Erk (2007)
- Primary corpus: extract tuples <p, r, w> of a predicate p, an argument position r, and a seen headword w
- Generalisation corpus: compute a corpus-based semantic similarity metric
- The selectional preference S of a functional relation r for a possible headword w0 is modelled as a weighted sum of the similarities between w0 and the seen headwords w:
  S_r(w0) = Σ_{w ∈ Seen(r)} sim(w0, w) · α(w)
- Metrics: mutual information, cosine, Dice, etc.
- Weights α: uniform, inverse document frequency, etc.
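The weighted sum is straightforward to sketch. The similarity values and the uniform weighting below are invented for illustration; in the actual approach, sim would be a corpus-derived metric such as cosine or Jaccard:

```python
# Sketch of the similarity-based selectional preference:
# S_r(w0) = sum over seen headwords w of sim(w0, w) * alpha(w)

def selectional_preference(w0, seen_headwords, sim, alpha):
    """Weighted sum of similarities between candidate w0 and seen headwords."""
    return sum(sim(w0, w) * alpha(w) for w in seen_headwords)

# Hypothetical similarities between the candidate and seen role fillers.
toy_sim = {("van", "truck"): 0.8, ("van", "boat"): 0.3, ("van", "coach"): 0.6}
sim = lambda a, b: toy_sim.get((a, b), 0.0)
alpha = lambda w: 1.0          # uniform weights

seen = ["truck", "boat", "coach"]   # seen fillers of the Vehicle role
print(selectional_preference("van", seen, sim, alpha))
```

With inverse-document-frequency weights, alpha would down-weight headwords that occur with many predicates and are therefore less informative.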

Approach by Erk (2007)
Selectional preferences computed for frame-semantic roles, e.g., what is the selectional preference for the Vehicle role of Ride_vehicle?
- Primary corpus: FrameNet data
- Generalisation corpus: British National Corpus
Ride_vehicle: Vehicle
- Jaccard: truck, boat, coach, van, ship, lorry, creature, flight, guy, carriage, helicopter, lad
- Cosine: it, there, they, that, i, ship, second one, machine, e, other one, response, second
Ingest_substance: Substance
- Jaccard: loaf, ice cream, you, some, that, er, photo, kind, he, type, thing, milk
- Cosine: there, that, object, argument, them, version, machine, result, response, item, concept, s

Induction of Noun Hierarchy
Two parts:
1. Create classes of synonymous/similar nouns
   - rely on lexico-syntactic patterns (Hearst, 1992)
   - rely on word graphs (Dorow, 2007)
   - rely on similarity measures (similar to Erk, 2007)
   - use standard clustering techniques
2. Organise classes in an is-a hierarchy
   - rely on lexico-syntactic patterns (Hearst, 1992)
   - use word graphs (Dorow, 2007) for disambiguation
   - specify similarity measures for the hypernym relation
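To make the lexico-syntactic route concrete, here is a toy version of one Hearst (1992) pattern, "X such as Y, Z and W", which suggests that Y, Z, W are hyponyms of X. A real system would apply many patterns over a large (possibly parsed) corpus; this sketch uses a single regular expression on raw text:

```python
import re

# One Hearst pattern as a regex: a hypernym noun followed by "such as" and a
# comma/conjunction-separated list of hyponym candidates.
PATTERN = re.compile(
    r"(\w+)\s+such\s+as\s+((?:\w+\s*,\s*)*\w+(?:\s+(?:and|or)\s+\w+)?)"
)

def hearst_pairs(text):
    """Extract (hyponym, hypernym) candidate pairs from the pattern."""
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)
        hyponyms = re.split(r"\s*,\s*|\s+(?:and|or)\s+", m.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

print(hearst_pairs("He drinks beverages such as tea, coffee and milk."))
```

Single-word matching (`\w+`) is a simplification; multi-word noun phrases would need chunking or parsing.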

Syntactic Disambiguation with Very Large Contexts in a Statistical Parser

Statistical Parsing
- A statistical parser computes the most probable parse tree.
- The parse tree probability is the product of the rule probabilities.
- The probability of a grammar rule such as VP → VP , VP , CC VP is often decomposed into a product of symbol probabilities:
  p(VP | VP ...) * p(, | VP VP ...) * p(VP | VP VP , ...) * p(, | VP VP , VP ...) * p(CC | VP VP , VP , ...) * p(VP | VP VP , VP , CC ...)
- Crucial task: accurate estimation of these probabilities
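The decomposition is a chain rule over the right-hand-side symbols: each symbol is predicted given the parent and the preceding siblings. A minimal sketch with a made-up probability table:

```python
# Sketch of decomposing a rule probability into a chain of symbol
# probabilities. p_symbol is a hypothetical table mapping
# (symbol, parent, preceding-siblings tuple) -> probability.

def rule_prob(parent, rhs, p_symbol):
    """Chain-rule product over the right-hand-side symbols of one rule."""
    prob = 1.0
    for i, sym in enumerate(rhs):
        prob *= p_symbol.get((sym, parent, tuple(rhs[:i])), 0.0)
    return prob

# Invented probabilities for the coordination rule VP -> VP , VP , CC VP
rhs = ["VP", ",", "VP", ",", "CC", "VP"]
p_symbol = {(sym, "VP", tuple(rhs[:i])): 0.5 for i, sym in enumerate(rhs)}
print(rule_prob("VP", rhs, p_symbol))  # 0.5 ** 6 = 0.015625
```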

POS Tagging
Same task, different context: a part-of-speech tagger finds the most probable POS sequence for a given word sequence:
  Das zu versteuernde Einkommen sinkt ("the taxable income decreases")
  ART.Def.Nom.Sg.Neut PART.Zu ADJA.Pos.Nom.Sg.Neut N.Reg.Nom.Sg.Neut VFIN.Full.3.Sg.Pres.Ind
HMM POS taggers compute the probability of a POS tag sequence, which is decomposed as follows:
  p(ART.Def.Nom.Sg.Neut) * p(PART.Zu | ART.Def.Nom.Sg.Neut) * p(ADJA.Pos.Nom.Sg.Neut | ART.Def.Nom.Sg.Neut, PART.Zu) * ...

Sparse Data Problem
- Given a fine-grained POS tagset and a large context size, many grammatical POS sequences are rare or not observed at all.
- The POS trigram ART.Def.Nom.Sg.Neut, PART.Zu, ADJA.Pos.Nom.Sg.Neut, for example, does not occur in the Tiger treebank, so its trigram probability estimate is 0.
- Consequently, the best tag sequence for the example sentence "Das zu versteuernde Einkommen sinkt" cannot be found.

Decomposition
Solution proposed in Schmid/Laws (2008): replace each POS tag by an attribute vector
  ADJA.Pos.Nom.Sg.Neut → (ADJA, Pos, Nom, Sg, Neut)
and decompose the tag probability into a product of attribute probabilities:
  p(ADJA.Pos.Nom.Sg.Neut | ART.Def.Nom.Sg.Neut, PART.Zu)
  = p(ADJA | ART.Def.Nom.Sg.Neut, PART.Zu)
  * p(Pos | ART.Def.Nom.Sg.Neut, PART.Zu, ADJA)
  * p(Nom | ART.Def.Nom.Sg.Neut, PART.Zu, ADJA.Pos)
  * p(Sg | ART.Def.Nom.Sg.Neut, PART.Zu, ADJA.Pos.Nom)
  * p(Neut | ART.Def.Nom.Sg.Neut, PART.Zu, ADJA.Pos.Nom.Sg)
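This is again a chain rule, now over the attributes of the predicted tag: each attribute is conditioned on the context tags plus the attributes already predicted. A sketch with an invented probability table:

```python
# Sketch of the attribute-wise decomposition of a tag probability:
# p(tag | context) = prod_i p(attr_i | context, attr_1..attr_{i-1}).
# p_attr is a hypothetical table, not trained estimates.

def tag_prob(context, attributes, p_attr):
    """Chain-rule product over the attributes of the predicted tag."""
    prob = 1.0
    for i, a in enumerate(attributes):
        prob *= p_attr.get((a, context, tuple(attributes[:i])), 0.0)
    return prob

attrs = ["ADJA", "Pos", "Nom", "Sg", "Neut"]
ctx = ("ART.Def.Nom.Sg.Neut", "PART.Zu")
p_attr = {(a, ctx, tuple(attrs[:i])): 0.9 for i, a in enumerate(attrs)}
print(round(tag_prob(ctx, attrs, p_attr), 5))  # 0.9 ** 5 = 0.59049
```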

Reduction of the Context
Step 2: The conditioning context is reduced to the most informative attributes.
  p(ADJA.Pos.Nom.Sg.Neut | ART.Def.Nom.Sg.Neut, PART.Zu)
  = p(ADJA | ART, PART.Zu)
  * p(Pos | ART, PART, ADJA)
  * p(Nom | ART.Nom, PART, ADJA)
  * p(Sg | ART.Sg, PART, ADJA)
  * p(Neut | ART.Neut, PART, ADJA)
  = 1 * 1 * 1 * 1 * 1 = 1 (whereas the MLE of the unreduced trigram was 0!)
Question: How to select the most informative context attributes?
Answer: Decision trees
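A decision tree for this purpose tests context attributes at its internal nodes and stores probability estimates at its leaves; the attributes actually tested on a path are the "most informative" ones for that context. The tree below (structure, attribute names, and numbers all invented) only illustrates the shape of such an estimator:

```python
# Toy sketch of a probability estimation tree: internal nodes test a context
# attribute, leaves hold probability estimates. "pos-2"/"gender-2" are
# hypothetical attribute names for the tag two positions back.

def estimate(context):
    """context: dict mapping context-attribute names to values."""
    # Root: is the tag two positions back an article?
    if context.get("pos-2") == "ART":
        # Next split: does that article carry neuter gender?
        if context.get("gender-2") == "Neut":
            return 0.95      # leaf: informative context seen in training
        return 0.40
    return 0.05              # fallback leaf for all other contexts

print(estimate({"pos-2": "ART", "gender-2": "Neut"}))  # 0.95
```

Only the attributes on the traversed path condition the estimate, so the tree implicitly performs the context reduction of Step 2.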

Goals
Development of a statistical parser with:
- a fine-grained set of grammar symbols such as NP.base.subj.nom.sg.fem
- decomposition of the rule probabilities into a product of symbol probabilities
- decomposition of the symbol probabilities into a product of attribute probabilities
- estimation of the attribute probabilities with probability estimation trees
- a beam search strategy

Summary: Project Goals
Verb Clustering
- Extension of the verb-clustering model to deal with adjuncts
- Induction of noun taxonomies
Parsing
- A statistical parser using very large contexts for syntactic disambiguation
- Selection of the relevant subset of context information with decision trees