Word Sense Disambiguation



School of Computing
Word Sense Disambiguation: semantic tagging of text; for Confusion Set Disambiguation
Lecturer: Eric Atwell

Word Sense Disambiguation (WSD)
The WSD task: given
- a word in context, and
- a fixed inventory of potential word senses,
decide which sense of the word this is.

Examples:
- English-to-Spanish MT: decide from a set of Spanish translations
- Speech synthesis and recognition: decide between homographs with different pronunciations, like bass and bow
- Confusion set disambiguation: {principal, principle}, {to, too, two}, {weather, whether}
  (Michele Banko and Eric Brill, 2001: Scaling to very very large corpora for natural language disambiguation)

Lexicographers use SketchEngine
SketchEngine concordance. A word sketch is a one-page, automatic, corpus-derived summary of a word's grammatical and collocational behaviour.
- Sketch Engine shows a Word Sketch: a list of collocates, words co-occurring with the target word more frequently than predicted by probabilities.
- A lexicographer can colour-code groups of related collocates, indicating different senses or meanings of the target word.
- With a large corpus, the lexicographer should find all current senses, better than relying on intuition.
- For minority languages with few existing corpus resources, e.g. Amharic, Sketch Engine is combined with Web-BootCat, so researchers can collect their own new Web-as-Corpus for another language.

Two variants of the WSD task. How to automate WSD?
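The collocate idea behind a Word Sketch, words co-occurring with the target more often than their overall frequency predicts, can be sketched in plain Python. This is a minimal illustration scoring neighbours by pointwise mutual information, not Sketch Engine's actual statistic; the toy corpus is an assumption standing in for a large reference corpus.

```python
from collections import Counter
from math import log2

# Toy corpus standing in for a large reference corpus (illustrative only).
corpus = ("he plays bass guitar . she plays bass in a band . "
          "they caught a bass in the lake . grilled bass is tasty").split()

def collocates(tokens, target, window=2):
    """Score each word near `target` by log2(P(word near target) / P(word))."""
    total = Counter(tokens)                  # overall frequencies
    near = Counter()                         # frequencies within the window
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            near.update(t for t in tokens[lo:hi] if t != target)
    n_near, n_total = sum(near.values()), sum(total.values())
    return {w: log2((c / n_near) / (total[w] / n_total))
            for w, c in near.items()}

scores = collocates(corpus, "bass")
# Content words that cluster around "bass" (e.g. "plays") score above zero;
# words occurring near "bass" only at their base rate score about zero.
```

A lexicographer would then group the high-scoring collocates by hand ("plays", "guitar" vs. "caught", "lake") to separate the music and fish senses.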
Easy and hard versions:
- Lexical sample task: a small pre-selected set of target words, and an inventory of senses for each word.
- All-words task: every word in an entire text, with a lexicon of senses for each word. Sort of like part-of-speech tagging, except each lemma has its own tagset.

Approaches
- PoS-tagging disambiguates a few cases (bank a plane v. the river bank)
- Supervised machine learning of collocation patterns from a sense-tagged corpus
- Un- (or semi-) supervised: no tagged corpus; use linguistic knowledge from a dictionary:
  - Overlap in dictionary definitions: Lesk algorithm
  - Selectional restriction rules

Supervised Machine Learning Approaches

Supervised WSD 1: WSD Tags
Supervised machine learning approach: a training corpus of ? is used to train a classifier that can tag words in new text, just as we saw for part-of-speech tagging. Summary of what we need:
- the tag set ("sense inventory")
- the training corpus
- a set of features extracted from the training corpus
- a classifier

What's a tag?
Grammarians agree on Noun, Verb, Adjective, Adverb, Preposition, Conjunction, ... BUT there is no equivalent set of semantic tags which covers all words in English.
WordNet: a set of SENSES for each word. In NLTK; also online: http://www.cogsci.princeton.edu/cgi-bin/webwn

WordNet Bass
Inventory of sense tags for bass. The noun "bass" has 8 senses in WordNet:
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Supervised WSD 2: Get a corpus
- Lexical sample task:
  - line-hard-serve corpus: 4,000 examples of each
  - interest corpus: 2,369 sense-tagged examples
  - Banko & Brill: auto-generate confusion sets, e.g. {to, too, two}, in very very large corpora
- All words: a semantic concordance, i.e. a corpus in which each open-class word is labeled with a sense from a specific dictionary/thesaurus.
SemCor: 234,000 words from the Brown Corpus, manually tagged with WordNet senses.
SENSEVAL-3 competition corpora: 2,081 tagged word tokens.

Supervised WSD 3: Extract feature vectors
Weaver (1955):
"If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. [...] But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. [...] The practical question is: 'What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?'"
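Weaver's "slit in the opaque mask" is just a context window around the target word. A minimal sketch of extracting it:

```python
def context_window(tokens, i, n):
    """Return the n tokens on either side of position i (Weaver's 'slit')."""
    left = tokens[max(0, i - n):i]      # up to n tokens before the target
    right = tokens[i + 1:i + 1 + n]     # up to n tokens after the target
    return left, right

sent = "an electric guitar and bass player stand off to one side".split()
left, right = context_window(sent, sent.index("bass"), 2)
# left  -> ['guitar', 'and']
# right -> ['player', 'stand']
```

Weaver's "practical question" about the minimum N is exactly the window-size question the feature-vector slides return to below.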

Examples: dishes and bass in context

dishes:
- "In our house, everybody has a career and none of them includes washing dishes," he says.
- In her tiny kitchen at home, Ms. Chen works efficiently, stir-frying several simple dishes, including braised pig's ears and chicken livers with green peppers.
- Post quick and convenient dishes to fix when you're in a hurry.
- Japanese cuisine offers a great variety of dishes and regional specialties.

bass:
- We need more good teachers right now, there are only a half a dozen who can play the free bass with ease.
- Though still a far cry from the lake's record 52-pound bass of a decade ago, you could fillet these fish again, and that made people very, very happy, Mr. Paulson says.
- An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations again.
- Lowe caught his bass while fishing with pro Bill Lee of Killeen, Texas, who is currently in 144th place with two bass weighing 2-09.

Feature vectors
A simple representation for each observation (each instance of a target word): vectors of sets of feature/value pairs, i.e. files of comma-separated values, a line in a WEKA .arff file. These vectors should represent the window of words around the target. How big should that window be?

Two kinds of features in the vectors:
- Collocational features: about words at specific positions near the target word; often limited to just word identity and POS.
- Bag-of-words features: about words that occur anywhere in the window (regardless of position); typically limited to frequency counts.
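The two feature kinds can be sketched directly. This is a minimal illustration; the POS tags are hand-assigned here (an assumption), where a real system would take them from a tagger.

```python
from collections import Counter

def collocational_features(tagged, i, n=2):
    """[word_{i-n}, POS_{i-n}, ..., word_{i+n}, POS_{i+n}], skipping the target."""
    feats = []
    for j in range(i - n, i + n + 1):
        if j == i or not 0 <= j < len(tagged):
            continue
        word, pos = tagged[j]
        feats += [word, pos]
    return feats

def bag_of_words_features(tagged, i, vocab, n=2):
    """Counts of each vocabulary term within the +/-n window, position ignored."""
    window = [w for w, _ in tagged[max(0, i - n):i + n + 1]]
    counts = Counter(window)
    return [counts[v] for v in vocab]

# The lecture's WSJ example, with hand-assigned POS tags (an assumption here).
tagged = [("an", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB"), ("off", "RP")]
i = 4  # position of the target "bass"
colloc = collocational_features(tagged, i)
# -> ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
```

Note how the collocational vector keeps position and POS, while the bag-of-words vector keeps only counts over a pre-chosen vocabulary.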

Examples
Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps." Assume a window of +/- 2 from the target.

Collocational: position-specific information about the words in the window.
  guitar and bass player stand
  [guitar, NN, and, CC, player, NN, stand, VB]
  i.e. [word n-2, POS n-2, word n-1, POS n-1, word n+1, POS n+1, word n+2, POS n+2]: a vector consisting of [position n word, position n part-of-speech, ...]

Bag-of-words: information about the words that occur within the window. First derive a set of terms to place in the vector. Then note how often each of those terms occurs in a given window.

Co-Occurrence Example
Assume we've settled on a possible vocabulary of 12 words that includes guitar and player but not and and stand:
  guitar and bass player stand
  [0,0,0,1,0,0,0,0,0,1,0,0]
which are the counts of words predefined as, e.g., [fish, fishing, viol, guitar, double, cello, ...

Semi-supervised Bootstrapping
What if you don't have enough data to train a system? Bootstrap:
- Pick a word that you as an analyst think will co-occur with your target word in a particular sense.
- Search (grep) through a corpus (e.g. WWW: Web-as-Corpus?) for your target word and the hypothesized word.
- Assume that the target tag is the right one.
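The bootstrap step, grep for sentences containing both the target and a seed word, then label them with the seed's hypothesized sense, can be sketched as below. The seed words and sentences are illustrative assumptions.

```python
def bootstrap_label(sentences, target, seeds):
    """Label each sentence containing `target` by which seed word co-occurs.
    seeds: dict mapping seed word -> sense label (the analyst's hypothesis)."""
    labeled = []
    for sent in sentences:
        toks = sent.lower().split()
        if target not in toks:
            continue
        for seed, sense in seeds.items():
            if seed in toks:
                labeled.append((sent, sense))   # assume the seed's sense is right
                break
    return labeled

sents = ["He plays bass in a jazz band",
         "We caught a huge bass while fishing",
         "The bass line was too loud"]
seeds = {"play": "music", "plays": "music", "fish": "fish", "fishing": "fish"}
data = bootstrap_label(sents, "bass", seeds)
# -> [("He plays bass in a jazz band", "music"),
#     ("We caught a huge bass while fishing", "fish")]
```

Sentences with no seed word (the third one) are simply left unlabeled; the labeled pairs become training data for a supervised classifier.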

Bootstrapping
Sentences extracted using fish and play, for bass:
- Assume play occurs with the music sense and fish occurs with the fish sense.
- Use each text sample as training data for one or the other sense.

Classifiers
Once we cast the WSD problem as a classification problem, all sorts of techniques are possible:
- Naïve Bayes (the easiest thing to try first)
- Decision lists
- Decision trees
- Neural nets
- Support vector machines
- Nearest neighbor methods

The choice of technique depends, in part, on the set of features that have been used:
- Some techniques work better/worse with features with numerical values.
- Some techniques work better/worse with features that have large numbers of possible values. For example, the feature "the word to the left" has a fairly large number of possible values.

Naïve Bayes Test
On a corpus of examples of uses of the word line, naïve Bayes achieved about 73% correct.

Decision Lists: another popular method
A case statement. Good? The score is only meaningful in comparison to other scores:
- same task, different classifiers (to find the best classifier), OR
- different tasks, same classifier (to find the hardest task).
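A minimal naïve Bayes sense classifier over bag-of-words contexts, the "easiest thing to try first". This is a from-scratch sketch with made-up training examples, not the line-corpus experiment above; it uses add-one smoothing.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayesWSD:
    """Choose argmax_s P(s) * prod_w P(w | s), with add-one smoothing."""

    def fit(self, examples):        # examples: list of (context tokens, sense)
        self.sense_counts = Counter(s for _, s in examples)
        self.word_counts = defaultdict(Counter)
        for toks, s in examples:
            self.word_counts[s].update(toks)
        self.vocab = {w for _, c in self.word_counts.items() for w in c}
        return self

    def predict(self, toks):
        best, best_lp = None, float("-inf")
        n = sum(self.sense_counts.values())
        for s, sc in self.sense_counts.items():
            lp = log(sc / n)        # log prior P(s)
            total = sum(self.word_counts[s].values()) + len(self.vocab)
            for w in toks:          # log likelihoods P(w | s), smoothed
                lp += log((self.word_counts[s][w] + 1) / total)
            if lp > best_lp:
                best, best_lp = s, lp
        return best

train = [("play electric guitar music band".split(), "music"),
         ("caught fish lake fishing rod".split(), "fish"),
         ("band plays loud music stage".split(), "music"),
         ("fishing lake boat caught pound".split(), "fish")]
clf = NaiveBayesWSD().fit(train)
print(clf.predict("guitar band".split()))   # music
print(clf.predict("caught lake".split()))   # fish
```

Note that "the word to the left" as a single feature would give one multinomial with many values; the bag-of-words formulation above sidesteps that by treating each context word as an independent observation.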

Learning Decision Lists
- Restrict the lists to rules that test a single feature (1-decision-list rules).
- Evaluate each possible test and rank them based on how well they work.
- Glue the top-N tests together and call that your decision list.

WSD Evaluations and baselines
- Evaluate against a manually tagged corpus: exact match accuracy, the % of words tagged identically with manual sense tags. Usually evaluate using held-out data from the same labeled corpus.
- Compare accuracy to baseline(s) and ceiling:
  - Baseline: most frequent sense
  - Baseline: the Lesk algorithm
  - Ceiling: human inter-annotator agreement

Most Frequent Sense
WordNet senses are ordered in frequency order, so most frequent sense in WordNet = take the first sense.

Ceiling
As good as it gets: human inter-annotator agreement. Compare the annotations of two humans, on the same data, given the same tagging guidelines. Human agreement on all-words corpora with WordNet-style senses: 75%-80%.

Unsupervised Methods
WSD: dictionary/thesaurus methods
- Original Lesk: pine cone
- The Lesk Algorithm
- Selectional Restrictions
NB these are unsupervised in machine learning terms, in that, in the training data, each instance does NOT have a class. This does NOT mean there is no human intervention: in fact, human knowledge is used, encoded in dictionaries and hand-coded rules; BUT this is not supervised ML.

Original Lesk: pine cone
Dictionary definitions of pine (sense 1) and cone (sense 3) literally overlap: evergreen + tree. So pine cone must be pine (1) + cone (3).
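The 1-decision-list learner above can be sketched with the simplest single-feature test, "context contains word w implies sense s", ranked by training accuracy; real decision-list WSD (e.g. Yarowsky's) uses a log-likelihood ratio instead, and the toy data is an assumption.

```python
from collections import Counter, defaultdict

def learn_decision_list(examples, top_n=3):
    """Rank single-word tests "w in context -> sense" by training accuracy,
    keep the top_n as the list, and use the most frequent sense as default."""
    by_word = defaultdict(Counter)      # word -> Counter of senses it occurs with
    for toks, sense in examples:
        for w in set(toks):
            by_word[w][sense] += 1
    rules = []
    for w, senses in by_word.items():
        sense, hits = senses.most_common(1)[0]
        rules.append((hits / sum(senses.values()), w, sense))
    rules.sort(reverse=True)            # best-scoring tests first
    default = Counter(s for _, s in examples).most_common(1)[0][0]
    return [(w, s) for _, w, s in rules[:top_n]], default

def classify(toks, decision_list, default):
    for w, sense in decision_list:
        if w in toks:                   # first matching rule wins (a case statement)
            return sense
    return default

train = [("guitar band music".split(), "music"),
         ("lake fishing caught".split(), "fish"),
         ("music stage band".split(), "music")]
dl, default = learn_decision_list(train)
```

The default rule at the end of the list is exactly the most-frequent-sense baseline: when no test fires, fall back to it.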

Simplified Lesk
Count the words in the context (sentence) which are also in the gloss or examples for sense 1 and sense 2; choose the word-sense with the most overlap.

Disambiguation via Selectional Restrictions
"Verbs are known by the company they keep": different verbs select for different thematic roles:
- wash the dishes (takes washable-thing as patient)
- serve delicious dishes (takes food-type as patient)
Method: semantic attachments in grammar. Semantic attachment rules are applied as sentences are syntactically parsed, e.g.
  VP --> V NP
  V --> serve <theme> {theme: food-type}
A selectional restriction violation: no parse.

But this means we must:
- Write selectional restrictions for each sense of each predicate, or use FrameNet. Serve alone has 15 verb senses.
- Obtain hierarchical type information about each argument (using WordNet). How many hypernyms does dish have? How many words are hyponyms of dish?
But also:
- Sometimes selectional restrictions don't restrict enough ("Which dishes do you like?")
- Sometimes they restrict too much ("Eat dirt, worm!", "I'll eat my hat!")

Summary
- Humans can do WSD; it is difficult to automate.
- Lexicographers use concordances (e.g. from SketchEngine) to visualize word senses.
- PoS-tagging disambiguates a few cases (bank a plane v. the river bank).
- Supervised machine learning of collocation patterns from a sense-tagged corpus.
- Un- (or semi-) supervised: no tagged corpus; use linguistic knowledge from a dictionary:
  - Overlap in dictionary definitions: Lesk algorithm
  - Selectional restriction rules
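Simplified Lesk as described above can be sketched in a few lines: count the overlap between the context and each sense's gloss, and pick the sense with the most shared words. The glosses here are abridged and reworded from the WordNet "bass" entries shown earlier (an assumption, not the exact WordNet text).

```python
def simplified_lesk(context, senses):
    """senses: dict sense_name -> gloss (plus examples) as a string.
    Pick the sense whose gloss shares the most words with the context."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# Abridged glosses for two senses of "bass" (wording adapted, an assumption).
bass_senses = {
    "bass#7": "the member with the lowest range of a family of musical instruments",
    "bass#4": "flesh of lean-fleshed saltwater fish of the family Serranidae",
}
print(simplified_lesk("he plays bass in a musical group", bass_senses))
```

With no overlap at all, the sketch just returns the first sense it saw; a fuller implementation would fall back to the most frequent sense, and would usually drop stopwords before counting overlap.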