NATURAL LANGUAGE PROCESSING WORD SENSE DISAMBIGUATION

Similar documents
Natural Language Processing. Part 4: lexical semantics

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

Intro to Linguistics Semantics

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

The study of words. Word Meaning. Lexical semantics. Synonymy. LING 130 Fall 2005 James Pustejovsky. ! What does a word mean?

Clustering Connectionist and Statistical Language Processing

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

2. SEMANTIC RELATIONS

Identifying Focus, Techniques and Domain of Scientific Papers

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.

Tagging with Hidden Markov Models

A Sentiment Analysis Model Integrating Multiple Algorithms and Diverse. Features. Thesis

Machine Learning for natural language processing

Course: Model, Learning, and Inference: Lecture 5

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Building a Question Classifier for a TREC-Style Question Answering System

An Integrated Approach to Automatic Synonym Detection in Turkish Corpus

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

SPELLING WORD #1: SENTENCE:

AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS

Introduction to Pattern Recognition

Sentiment analysis using emoticons

Speech and Language Processing

Machine Learning using MapReduce

Social Media Mining. Data Mining Essentials

Projektgruppe. Categorization of text documents via classification

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Semantic Clustering in Dutch

Music Mood Classification

Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms

Context Aware Predictive Analytics: Motivation, Potential, Challenges

{ Mining, Sets, of, Patterns }

Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD

Introduction to Machine Learning Using Python. Vikram Kamath

The Role of Sentence Structure in Recognizing Textual Entailment

Final Project Report

Environmental Remote Sensing GEOG 2021

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Machine Learning Term 2012/ / 34

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation

Types of meaning. KNOWLEDGE: the different types of meaning that items of lexis can have and the terms used to describe these

Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction

Word sense disambiguation through associative dictionaries

Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance

Music Genre Classification

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Semantic Analysis of Song Lyrics

Learning is a very general term denoting the way in which agents:

Rethinking the relationship between transitive and intransitive verbs

Basic Parsing Algorithms Chart Parsing

Active Learning SVM for Blogs recommendation

Applications of Named Entity Recognition in Customer Relationship Management Systems

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari

Predicting Flight Delays

EQUATIONS and INEQUALITIES

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Clustering of Polysemic Words

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Data, Measurements, Features

Performance Indicators-Language Arts Reading and Writing 3 rd Grade

Clustering Technique in Data Mining for Text Documents

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

Combining Contextual Features for Word Sense Disambiguation

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Language and Computation

Significant Figures, Propagation of Error, Graphs and Graphing

PUSD High Frequency Word List

Outline of today s lecture

Temperature Scales. The metric system that we are now using includes a unit that is specific for the representation of measured temperatures.

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems

MS1b Statistical Data Mining

Does it fit? KOS evaluation using the ICE-Map Visualization.

Machine Learning for Data Science (CS4786) Lecture 1

Word Sense Disambiguation for Text Mining

Lecture 9: Introduction to Pattern Analysis

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Analytics on Big Data

Supervised Learning (Big Data Analytics)

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

Goal driven planning and adaptivity for AUVs CAR06

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets

Multiplication and Division with Rational Numbers

Clever Search: A WordNet Based Wrapper for Internet Search Engines

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education

Using Data Mining for Mobile Communication Clustering and Characterization

Computational Linguistics and Learning from Big Data. Gabriel Doyle UCSD Linguistics

Textual Data Analysis tools for Word Sense Disambiguation

Problem of the Month: Cutting a Cube

Transcription:

NATURAL LANGUAGE PROCESSING (COM4513/6513) WORD SENSE DISAMBIGUATION Andreas Vlachos a.vlachos@sheffield.ac.uk Department of Computer Science University of Sheffield 1

SO FAR part-of-speech tagging syntactic dependency parsing and algorithms for learning models from data 2

IN THIS LECTURE Word senses and their disambiguation definitions resources three ways to disambiguate: supervised knowledge-based unsupervised 3

WHAT IS A WORD SENSE? A discrete representation of an aspect of a word's meaning: sense1: a bank is a financial institution and a financial intermediary... sense2: a raised portion of seabed or sloping ground... Usually, by word in this contex we mean lemma, i.e. how the word is found in dictionaries: banks -> bank sung -> sing Words with different parts of speech tags (verb, noun, etc.) have different lemmas 4

WORDS AND THEIR SENSES I deposited the cheque at my bank. versus: Fishing from the river bank is prohibited. Homonymy: same spelling and pronunciation, different unrelated senses (their translations are usually different) Homographs: same spelling (bass guitar vs sea bass) Homophones: same pronunciation (right and write) 5

POLYSEMY I deposited the cheque at my bank. The bank is around the corner. I volunteer at the blood bank. Polysemy: like homonymy, but with related senses. Beethoven wrote the Moonlight Sonata. I like Beethoven. Metonymy: special type of polysemy when senses are aspects of the same concept (composer vs pieces by). 6

DETECTING POLYSEMY Which of those flights serve breakfast? Does Midwest Express serve Philadelphia? Does Midwest Express serve breakfast and Philadelphia? The last construnction is called zeugma. It is ill-formed, suggesting that serve has two distinct senses. Polysemy vs. homonymy: helpful distinction but not black & white. How (un-)related are blood banks and financial banks? 7

SYNONYMY Words with (nearly) identical meaning, e.g. couch and sofa. More formally, the can substitute each other in a sentence without changing its truth value. (propositional meaning) Remember: synonymy is among senses not words: How big/large is this plane? My big/large brother is lying. big has the sense of being older, large doesn't. No absolute synonymy; there are reasons why we say H2O instead of water: text genre, politeness, slang, etc. 8

OTHER RELATIONS BETWEEN SENSES antonyms: opposites with respect to one aspect of meaning: hot/cold, rise/fall, up/down, etc. hyponym/hypernym: one sense more/less specific than the other: car is a hyponym of vehicle fruit is a hypernym of mango meronymy: part-whole relation: wheel/car, handle/door, etc. 9

WORDNET A publicly available database of words (lemmas) annotated with senses and relations among them, e.g.: The noun bass has 8 senses in WordNet: bass 1 1. : the lowest part of the musical range 2. { sea bass 1, bass 4 }: the lean flesh of a saltwater fish of the family Serranidae, etc. gloss: a dictionary-style definition of a sense, e.g. the lowest part of the musical range synset: a set of near-synonymous senses, e.g. {, } bass 4 sea bass 1 10

WORDNET GRAPH 11

WORDNET Most commonly used resource for senses and their relations Originally for English, but now available for many languages: http://globalwordnet.org/ Coverage issues: 147,278 unique words, is that all the words? what about domain-specifc usage, e.g. gene expression? 12

SUPERVISED WORD SENSE DISAMBIGUATION Given a set of words in context annotated with senses, e.g.: learn a model (one per word/lemma) to predict the sense. 13

SUPERVISED WSD Use you favourite classifier: perceptron, naive Bayes, etc. Feature vector with: collocational: previousword, nextword, previous2words, next2words, PoS tags, etc. bag-of-words: as above, but without encoding their position relative to the word Intuition: like text classification, but focusing on a word 14

KNOWLEDGE-BASED WSD Supervised WSD assumes annotated examples for each word for all its senses, will not scale to 1000s of words. Each word sense has a definition: Given the word in a new sentence: "The bank can guarantee deposits which eventually cover..." find the sense with the max word overlap (simplified Lesk) 15

UNSUPERVISED WSD New word senses emerge, dictionaries get out of date. Solution: Unsupervised WSD, a.k.a. word sense induction take words in context and extract their feature vector as in supervised WSD use any clustering algorithm words are now clustered according to their context The clusters are related to the senses. But no labels in, no labels out: cluster 1 is not bass 1. 16

CLUSTERING WITH K-MEANS 1 Input: = {x... x initialize cluster means μ ℜ }, clusters,... 1 μ ℜ D K while not_converged do for x n do assign x n to cluster cn = arg max cosine( μ k, xn ) k=1...k for k set μ 1...K do k = mean({xn cn = k}) return c1... cn means initialization matters, but random works convergence assignments don't change 17

INTRINSIC WSD EVALUATION For supervised and knowledge-based use accuracy. For unsupervised we cannot use accuracy: clusters are not labels, how to compare assess how well cluster "correspond" to labels variety of clustering evaluation measures exist, no agreement on which one is the best Baseline: most frequent sense for each word. 18

EXTRINSIC WSD EVALUATION Use the word senses predicted to improve performance in a different task, e.g.: bag-of-word-senses in addition to bag-of-word features for text classification word-senses in addition to words for syntactic parsing etc. Applicable to supervised and unsupervised WSD, since sense-ids and cluster-ids are only used as vector indices. 19

BIBLIOGRAPHY Jurafsky & Martin Chapter 18 20

COMING UP NEXT Distributed word representations (a.k.a. word2vec) Is it always a good idea to use a discrete representation of meaning? 21