Lexical Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu

Q: What's meaning? Compositional semantics answer:

Q: What's meaning? We now answer from a lexical semantics perspective, i.e., let's think about the meaning of words

Today
Word senses & word relations
WordNet
Word Sense Disambiguation: supervised, semi-supervised

WORD SENSES & WORD RELATIONS

Where can we look up the meaning of words? Dictionary?

Word Senses
Word sense = distinct meaning of a word
Same word, different senses:
Homonyms (homonymy): unrelated senses; the identical orthographic form is coincidental
Polysemes (polysemy): related but distinct senses
Metonyms (metonymy): one sense stands in for a related one (e.g., the White House for the US administration); technically a subcase of polysemy
Different word, same sense:
Synonyms (synonymy)

Just to confuse you
Homophones: same pronunciation, different orthography, different meaning. Examples: would/wood, to/too/two
Homographs: same orthographic form, different pronunciation, distinct senses. Examples: bass (fish) vs. bass (instrument)

Relationships Between Senses
IS-A relationships:
From specific to general (up): hypernym (hypernymy)
From general to specific (down): hyponym (hyponymy)
Part-whole relationships:
wheel is a meronym of car (meronymy)
car is a holonym of wheel (holonymy)

WordNet: a lexical database for English
https://wordnet.princeton.edu/
Includes most English nouns, verbs, adjectives, and adverbs
Its electronic format makes it amenable to automatic manipulation: used in many NLP applications
"WordNets" generically refers to similar resources in other languages

WordNet: History
Research in artificial intelligence: how do humans store and access knowledge about concepts?
Hypothesis: concepts are interconnected via meaningful relations, which is useful for reasoning
The WordNet project started in 1986: can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning?
If so, the result would be a large semantic network

Synonymy in WordNet
WordNet is organized in terms of synsets:
Unordered sets of (roughly) synonymous words (or multi-word phrases)
Each synset expresses a distinct meaning/concept

WordNet: Example
Noun
{pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
{pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
{pipe, tube} (a hollow cylindrical shape)
{pipe} (a tubular wind instrument)
{organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
{shriek, shrill, pipe up, pipe} (utter a shrill cry)
{pipe} (transport by pipeline) "pipe oil, water, and gas into the desert"
{pipe} (play on a pipe) "pipe a tune"
{pipe} (trim with piping) "pipe the skirt"
What do you think of the sense granularity?
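To poke at these entries programmatically, here is a minimal sketch using NLTK's WordNet interface; it assumes nltk is installed and the wordnet corpus has been downloaded (nltk.download('wordnet')):

```python
# A minimal sketch: list the WordNet senses of "pipe" via NLTK.
# Assumes: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for synset in wn.synsets("pipe"):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    print(f"{synset.name():20s} [{lemmas}] - {synset.definition()}")
```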

The "Net" Part of WordNet
[Figure: a fragment of the WordNet noun network centered on {car; auto; automobile; machine; motorcar}. Hypernym chain (upward): {car; ...} -> {motor vehicle; automotive vehicle} -> {vehicle} -> {conveyance; transport}. Hyponyms (downward): {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}. Meronyms of car: {bumper}, {car door}, {car window}, {car mirror}, {armrest}; {car door} in turn has meronyms {hinge; flexible joint} and {doorlock}.]
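The hypernym and meronym edges in this figure can be traversed directly in NLTK; a small sketch, again assuming the wordnet corpus is available (the synset names like car.n.01 are WordNet's own identifiers, and the example outputs in the comments are illustrative):

```python
# Navigate IS-A and part-whole relations starting from the car synset.
from nltk.corpus import wordnet as wn

car = wn.synset("car.n.01")          # {car, auto, automobile, machine, motorcar}
print(car.hypernyms())               # upward IS-A, e.g. motor_vehicle.n.01
print(car.hyponyms()[:3])            # downward IS-A, e.g. cab, cruiser, ...
print(car.part_meronyms()[:3])       # parts of a car, e.g. bumper, car_door, ...
print(wn.synset("car_door.n.01").part_holonyms())  # a car door is part of a car
```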

WordNet: Size
Part of speech    Word forms    Synsets
Noun              117,798       82,115
Verb              11,529        13,767
Adjective         21,479        18,156
Adverb            4,481         3,621
Total             155,287       117,659
http://wordnet.princeton.edu/

Word Sense From WordNet: each synset shown in the pipe example above (five noun synsets and four verb synsets) constitutes one distinct sense of the word.

WORD SENSE DISAMBIGUATION

Word Sense Disambiguation
Task: automatically select the correct sense of a word
Two variants: lexical sample (disambiguate a fixed set of target words) and all-words (disambiguate every content word in running text)
Motivated by many applications: semantic similarity, information retrieval, machine translation
But gains in end-to-end performance have been elusive.

How big is the problem?
Most words in English have only one sense: 62% in Longman's Dictionary of Contemporary English, 79% in WordNet
But the others tend to have several senses: an average of 3.83 in LDOCE and 2.96 in WordNet
Ambiguous words are used more frequently: in the British National Corpus, 84% of word instances involve words with more than one sense
Some senses are more frequent than others

Ground Truth Which sense inventory do we use? Issues there? Application specificity?

Existing Corpora
Lexical sample: the line-hard-serve corpus (4k sense-tagged examples); the interest corpus (2,369 sense-tagged examples)
All-words: SemCor (234k words, a subset of the Brown Corpus); Senseval/SemEval (2,081 tagged content words out of 5k total words)

Evaluation
Intrinsic: measure accuracy of sense selection with respect to ground truth
Extrinsic: integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval, and compare WSD approaches by their effect on end-to-end performance

Baseline Performance
Baseline: most frequent sense, equivalent to taking the first sense listed in WordNet
Does surprisingly well: 62% accuracy in this case!
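Since WordNet lists a word's synsets roughly in order of tagged frequency, the most-frequent-sense baseline is essentially one line of NLTK; a quick sketch, under the same wordnet-corpus assumption as above:

```python
# Most-frequent-sense baseline: take WordNet's first-listed synset.
from nltk.corpus import wordnet as wn

def mfs_baseline(word, pos=None):
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(mfs_baseline("pipe", pos=wn.NOUN))   # the tobacco-pipe sense comes first
```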

Upper Bound Performance
Fine-grained WordNet senses: 75-80% human agreement
Coarser-grained inventories: 90% human agreement possible

WSD Approaches
Depending on use of manually created knowledge sources: knowledge-lean vs. knowledge-rich
Depending on use of labeled data: supervised, semi- or minimally supervised, unsupervised

Simplest WSD algorithm: Lesk's Algorithm
Intuition: measure word overlap between the context and the dictionary entries for each candidate sense
Unsupervised, but knowledge-rich
Example context: "The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities." (Compare this context against the WordNet glosses for each sense of bank.)

Lesk's Algorithm
Simplest implementation: count overlapping content words between glosses and context
Lots of variants:
Include the examples in dictionary definitions
Include glosses of hypernyms and hyponyms
Give more weight to larger overlaps (e.g., bigrams)
Give extra weight to infrequent words (e.g., idf weighting)
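As a concrete illustration, here is a hedged sketch of the simplest variant (counting content-word overlap between context and glosses, with usage examples folded in as one of the variants above). It assumes nltk with the wordnet and stopwords corpora downloaded, and uses naive whitespace tokenization; NLTK also ships its own implementation in nltk.wsd.lesk.

```python
# A minimal sketch of simplified Lesk over WordNet glosses.
# Assumes nltk.download('wordnet') and nltk.download('stopwords').
from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords

STOP = set(stopwords.words("english"))

def simplified_lesk(word, context_sentence):
    """Pick the synset whose gloss (plus examples) overlaps most
    with the content words of the context sentence."""
    context = {w.lower() for w in context_sentence.split()} - STOP
    best_sense, best_overlap = None, -1
    for synset in wn.synsets(word):
        # Signature = gloss words plus words from usage examples.
        signature = set(synset.definition().lower().split())
        for example in synset.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = synset, overlap
    return best_sense

print(simplified_lesk(
    "bank",
    "The bank can guarantee deposits will eventually cover future "
    "tuition costs because it invests in adjustable-rate mortgage securities"))
```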

Another approach: Supervised WSD How would you frame WSD as a supervised classification task?

Supervised Classification
[Figure: the standard supervised classification pipeline. Training: labeled documents (label 1 ... label 4) are mapped by a representation function into feature vectors, from which a supervised machine learning algorithm learns a classifier. Testing: an unlabeled document is mapped by the same representation function, and the classifier predicts its label.]

Features
Possible features:
POS and surface form of the word itself
Surrounding words and their POS tags
Positional information for surrounding words and POS tags
Same as above, but with n-grams
Grammatical information
Richness of the features?
Richer features = the ML algorithm does less of the work
More impoverished features = the ML algorithm does more of the work
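A small sketch of what such a representation might look like in code, assuming a simple dictionary-of-features format (the function name and feature keys are illustrative, not from the lecture; POS-tag features could be added the same way, e.g., via nltk.pos_tag):

```python
# A sketch of simple window features for WSD.
def wsd_features(tokens, i, window=2):
    """Feature dict for the target token at index i in a tokenized sentence."""
    feats = {"word": tokens[i].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            # Positional feature: which word occurs at which relative offset.
            feats[f"w[{offset:+d}]"] = tokens[j].lower()
    return feats

tokens = "He sat on the bank of the river".split()
print(wsd_features(tokens, tokens.index("bank")))
# {'word': 'bank', 'w[-2]': 'on', 'w[-1]': 'the', 'w[+1]': 'of', 'w[+2]': 'the'}
```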

Classifiers
Once we cast WSD as supervised classification, any multiclass classifier can be used: perceptron, support vector machines, etc.
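A minimal sketch of wiring such features into an off-the-shelf classifier, here scikit-learn's LinearSVC (an assumption: scikit-learn is installed; the toy sense labels and feature dicts are invented for illustration and follow the shape of the wsd_features sketch above):

```python
# Train a linear SVM on dictionary features for a toy two-sense problem.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy sense-tagged training data (hypothetical sense labels).
X_train = [
    {"word": "bank", "w[-1]": "the", "w[+1]": "of", "w[+2]": "the"},
    {"word": "bank", "w[-1]": "the", "w[+1]": "raised", "w[+2]": "its"},
]
y_train = ["bank/shore", "bank/finance"]

clf = make_pipeline(DictVectorizer(), LinearSVC())
clf.fit(X_train, y_train)
print(clf.predict([{"word": "bank", "w[-1]": "the", "w[-2]": "at"}]))
```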

WSD Accuracy
Supervised approaches generally yield ~70-80% accuracy
However, accuracy depends on the actual word, the sense inventory, the amount of training data, the number of features, etc.

WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION

Minimally Supervised WSD
Problem: annotations are expensive!
Solution 1: bootstrapping, or co-training (Yarowsky 1995); a generic sketch of the loop appears below
Start with a (small) seed set and learn a classifier
Use the classifier to label the rest of the corpus
Retain confident labels, treat them as annotated data, and learn a new classifier
Repeat
Heuristics (derived from observation):
One sense per discourse
One sense per collocation
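The loop itself is generic self-training; here is a hedged sketch using scikit-learn's LogisticRegression in place of Yarowsky's decision lists, on invented 2-D "contexts" (the function name, the 0.9 confidence threshold, and the data are all illustrative):

```python
# A hedged sketch of the bootstrapping (self-training) loop.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap(seed_X, seed_y, pool, threshold=0.9, rounds=5):
    """Fit on the seed, label the pool, keep confident labels, repeat."""
    X, y, pool = list(seed_X), list(seed_y), list(pool)
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(np.array(X), y)
        if not pool:
            break
        probs = clf.predict_proba(np.array(pool))
        confident = probs.max(axis=1) >= threshold
        for vec, p, keep in zip(pool, probs, confident):
            if keep:                       # retain confident labels only...
                X.append(vec)              # ...and treat them as annotated data
                y.append(clf.classes_[p.argmax()])
        pool = [v for v, keep in zip(pool, confident) if not keep]
    return clf

# Toy 2-D "contexts" for the two senses of plant (values invented).
seed_X = [[0.0, 0.0], [4.0, 4.0]]
seed_y = ["industrial", "living"]
pool = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]]
model = bootstrap(seed_X, seed_y, pool)
print(model.predict([[4.0, 3.9]]))
```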

One Sense per Discourse
A word tends to preserve its meaning across all its occurrences in a given discourse
Evaluation: on 8 words with two-way ambiguity (e.g., plant, crane), 98% of pairs of occurrences of the same word in the same discourse carry the same meaning
Take it with a grain of salt: accuracy depends on sense granularity
Measured on SemCor, one sense per discourse holds only about 70% of the time

One Sense per Collocation
A word tends to preserve its meaning when used in the same collocation
Strong for adjacent collocations; weaker as the distance between words increases
Evaluation: 97% precision on words with two-way ambiguity
Again, accuracy depends on granularity: 70% precision on SemCor senses

Yarowsky's Method: Example
Disambiguating plant: industrial sense vs. living-thing sense
Think of seed features for each sense: the industrial sense co-occurs with "manufacturing"; the living-thing sense co-occurs with "life"
Use one sense per collocation to build an initial decision-list classifier
Treat its confident results as annotated data, train a new decision-list classifier, iterate

Decision List
Ordered list of tests (equivalent to a case statement).
Example decision list, discriminating between bass (fish) and bass (music):
[Table lost in transcription; Yarowsky-style entries pair a collocation test with a sense, e.g., "fish within window -> bass (fish)", "guitar within window -> bass (music)", "bass player -> bass (music)".]

Building Decision Lists
Simple algorithm:
Compute how discriminative each feature f_i is: score(f_i) = log( P(S_1 | f_i) / P(S_2 | f_i) )
Create an ordered list of tests from these values
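In code, the scoring step might look like the following hedged sketch for a two-way ambiguity; the add-alpha smoothing constant and the toy data are invented, and real implementations (including Yarowsky's) smooth the log-likelihood ratios more carefully:

```python
# A hedged sketch of decision-list learning for a two-way ambiguity.
import math
from collections import defaultdict

def decision_list(examples, alpha=0.1):
    """examples: (feature_set, sense) pairs, with exactly two senses.
    Returns tests sorted by |log-likelihood ratio|, most reliable first."""
    counts = defaultdict(lambda: defaultdict(float))
    senses = set()
    for feats, sense in examples:
        senses.add(sense)
        for f in feats:
            counts[f][sense] += 1
    s1, s2 = sorted(senses)
    tests = []
    for f, c in counts.items():
        # Add-alpha smoothing keeps unseen (feature, sense) pairs off log(0).
        ratio = math.log((c[s1] + alpha) / (c[s2] + alpha))
        tests.append((abs(ratio), f, s1 if ratio > 0 else s2))
    return sorted(tests, reverse=True)

data = [({"fish", "caught"}, "bass/fish"),
        ({"guitar", "play"}, "bass/music"),
        ({"fish", "river"}, "bass/fish")]
for score, feature, sense in decision_list(data):
    print(f"{score:.2f}  {feature!r} -> {sense}")
```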

Yarowsky's Method: Stopping
Stop when:
The error on the training data is less than a threshold, or
No more training data is covered
Then use the final decision list for WSD

Yarowsky's Method: Discussion
Advantages: accuracy is about as good as a supervised algorithm; bootstrapping requires far less manual effort
Disadvantages: seeds may be tricky to construct; works only for coarse-grained sense distinctions; errors can snowball during co-training

WSD with Parallel Text
Problems: annotations are expensive! What's the proper sense inventory? How fine- or coarse-grained? Application-specific?
Observation: multiple senses of a word often translate to different words in other languages
So use the foreign language as the sense inventory!
Added bonus: annotations come for free, as a byproduct of the word-alignment process in machine translation

Today
Word senses & word relations
WordNet
Word Sense Disambiguation: the Lesk algorithm; supervised classification; semi-supervised approaches (co-training, which combines two views of the data; using parallel text)
Next: semantic similarity & more unsupervised learning