CMPT-825 Natural Language Processing

Similar documents
Natural Language Processing. Part 4: lexical semantics

Intro to Linguistics Semantics

2. SEMANTIC RELATIONS

Building a Question Classifier for a TREC-Style Question Answering System

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

SPELLING WORD #1: SENTENCE:

The Lexicon. The Lexicon. The Lexicon. The Significance of the Lexicon. Australian? The Significance of the Lexicon 澳 大 利 亚 奥 地 利

TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary. Anja Drame TermNet

Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.

Package wordnet. January 6, 2016

Cross-lingual Synonymy Overlap

An Efficient Database Design for IndoWordNet Development Using Hybrid Approach

The study of words. Word Meaning. Lexical semantics. Synonymy. LING 130 Fall 2005 James Pustejovsky. ! What does a word mean?

The Lois Project: Lexical Ontologies for Legal Information Sharing

Ling 201 Syntax 1. Jirka Hana April 10, 2006

What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Livingston Public Schools Scope and Sequence K 6 Grammar and Mechanics

Latin WordNet project

Clever Search: A WordNet Based Wrapper for Internet Search Engines

COMPARISON OF PUBLIC SCHOOL ELEMENTARY CURRICULUM AND MONTESSORI ELEMENTARY CURRICULUM

A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches

Visualizing WordNet Structure

Combining Contextual Features for Word Sense Disambiguation

Extraction of Hypernymy Information from Text

Using Text Mining and Natural Language Processing for Health Care Claims Processing

Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features

SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING

An Integrated Approach to Automatic Synonym Detection in Turkish Corpus

Outline of today s lecture

Semantic analysis of text and speech

Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance

SEPTEMBER Unit 1 Page Learning Goals 1 Short a 2 b 3-5 blends 6-7 c as in cat 8-11 t p

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Monday Simple Sentence

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

ENGLISH COMPREHENSION AND LANGUAGE GRADE

Introduction to WordNet: An On-line Lexical Database. (Revised August 1993)

PUSD High Frequency Word List

King Midas & the Golden Touch

A Mixed Trigrams Approach for Context Sensitive Spell Checking

The Open University s repository of research publications and other research outputs. PowerAqua: fishing the semantic web

Mr. Anker Tests Current Listing of Activities, December 2008

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

File Folder Games Order Form Title 1 Parent/Teacher Resource Lab

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Focus: Reading Unit of Study: Research & Media Literary; Informational Text; Biographies and Autobiographies

A Serious Game for Building a Portuguese Lexical-Semantic Network

English-Arabic Dictionary for Translators

English. Universidad Virtual. Curso de sensibilización a la PAEP (Prueba de Admisión a Estudios de Posgrado) Parts of Speech. Nouns.

Academic Standards for Reading, Writing, Speaking, and Listening June 1, 2009 FINAL Elementary Standards Grades 3-8

PoS-tagging Italian texts with CORISTagger

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora

Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries

A Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts

31 Case Studies: Java Natural Language Tools Available on the Web

ISSN: Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education

Student s Worksheet. Writing útvary, procvičování

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources

Checklist for Recognizing Complete Verbs

Reading IV Grade Level 4

Clustering of Polysemic Words

Information Retrieval, Information Extraction and Social Media Analytics

SIMOnt: A Security Information Management Ontology Framework

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Vocabulary Reflection Statement

Simple Features for Chinese Word Sense Disambiguation

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Parent Help Booklet. Level 3

Computer Standards & Interfaces

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education

This image cannot currently be displayed. Course Catalog. Language Arts Glynlyon, Inc.

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

An ontology-based approach for semantic ranking of the web search engines results

Estudios de Asia y Africa Idiomas Modernas I What you should have learnt from Face2Face

Programs are Knowledge Bases

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013

NLUI Server User s Guide

Morphemes, roots and affixes. 28 October 2011

Thesis Proposal Verb Semantics for Natural Language Understanding

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

This image cannot currently be displayed. Course Catalog. Language Arts Glynlyon, Inc.

The Role of Sentence Structure in Recognizing Textual Entailment

Language Lessons. Secondary Child

ENGLISH COMPREHENSION AND LANGUAGE GRADE

Words with Attitude. Jaap Kamps and Maarten Marx. Abstract. 1 Introduction

Interactive Dynamic Information Extraction

Comparing methods for automatic acquisition of Topic Signatures

Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes

GESE Initial steps. Guide for teachers, Grades 1 3. GESE Grade 1 Introduction

Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems

4.1 Multilingual versus bilingual systems

Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

Third Grade Language Arts Learning Targets - Common Core

Transcription:

CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop February 11, 2008 1 / 19

Lexical Semantics So far, we have listed words in our lexicon or vocabulary assuming a single meaning per word. However, the same word bank means two different things but we cannot distinguish between them using the traditional definition of word. To deal with this issue, we combine the spelling or pronunciation of a word and the meaning. In the lexicon we now store lexemes instead of words. A lexeme pairs a particular spelling or pronunciation with a particular meaning. 2 / 19

Lexical Semantics The meaning part of a lexeme is called a sense. For CL, our interest is in relations between lexemes or disambiguating different senses of a word. word: bank lexeme: bank 1 OR word: bank lexeme: bank 2 Note that meanings are often not definitions, but often are simple listings of compatible lexemes. cf. dictionary defns: red, n. the color of blood or ruby; blood, n. red liquid circulating in animals 3 / 19

Homonyms Homonyms: words that have the same form but different meanings 1. Instead, the chemical plant was found in violation of several environmental laws 2. Stanley formed an expedition to find a rare plant found along the Amazon river Same orthographic form: plant but two senses: plant 1 and plant 2 4 / 19

Homonyms Text vs. speech: fly-casting for bass vs. rhythmic bass chords These cases are homonyms in text, but not in speech. Referred to as homographs Speech vs. text: would vs. wood These cases are not homonyms in text, but easily confused in speech. Referred to as homophones Note that this problem in some cases can be solved using part of speech tagging Can you think of a case which cannot be solved using POS tagging? 5 / 19

Applications Spelling correction: homophones: weather vs. whether Speech recognition: homophones: to, two, too. Also homonyms (see n-gram e.g.) Text to speech: homographs: bass vs. bass Information retrieval: homonyms: latex 6 / 19

Polysemy Consider the homonym: bank commercial bank 1 vs. river bank 2 Now consider 1. A PCFG can be trained using derivation trees from a tree bank annotated by human experts Is this a new sense of bank? 7 / 19

Polysemy Senses can be derived from a particular lexeme. This process is known as polysemy In previous case we would say that the use of bank is a sense derived from commercial bank 1 In some cases, splitting into different lexemes has other supporting evidence: bank 1 has a Romance (Italian) origin vs. bank 2 has a Germanic (Scandinavian) origin 1. A PCFG can be trained using a bank of derivation trees called a tree-bank annotated by human experts How can we tell between homonyms and polysemous uses of a word? 8 / 19

Word sense and conjunction: zeugma Consider the case for a verb like serve 1. Does United serve breakfast? 2. Does United serve Philadelphia? 3. Does United serve breakfast and dinner? 4. #Does United serve breakfast and Philadelphia? 9 / 19

Word Sense Disambiguation Consider a noun like bank 1. How many senses does it have? 2. How are these senses related? 3. How can they be reliably distinguished? For NLP software, among these three questions, typically at runtime we need to automatically find the answer to the last question: given a word in context, map it to the correct lexeme: word-sense disambiguation 10 / 19

Synonyms Synonyms: Different lexemes with the same meaning 1. How big/large is that plane? 2. Would I be flying on a big/large or small plane? Synonyms clash with polysemous meanings 1. Seema is my big sister 2. #Seema is my large sister 11 / 19

WordNet WordNet is an electronic database of word relationships, handcrafted from scratch by researchers at Princeton University (George Miller, Christine Fellbaum, et al.) WordNet contains 3 databases: for verbs, nouns and one for adjectives and adverbs Category Unique Forms Number of Senses Noun 94474 116317 Verb 10319 22066 Adjective 20170 29881 Adverb 4546 5677 12 / 19

WordNet Ask the question: how many senses per noun or verb? The distribution of senses follows Zipf s (2nd) Law. WordNet provides multiple lexeme entries for each word and for each part of speech, e.g. plant as noun has 3 senses; plant as verb has 2 senses WordNet also provides domain-independent lexical relations such as IS-A, HasMember, MemberOf,... 13 / 19

WordNet: noun relations Relation Definition Example Hypernym this is a kind of breakfast meal Hyponym this has a specific instance meal lunch Has-Member this has a member faculty professor Member-Of this is member of a group copilot crew Has-Part this has a part table leg Part-Of this is part of course meal Antonym this is an opposite of leader follower 14 / 19

WordNet: verb relations Relation Definition Example Hypernym this event is a kind of fly travel Tropynym this event has a subtype walk stroll Entails this event entails snore sleep Antonym this event is opposite of increase decrease 15 / 19

WordNet: example from ver1.7.1 Sense1: Canada North American country,north American nation country, state, land administrative district,administrative division,territorial division district, territory region location entity, physical thing 16 / 19

WordNet: example from ver1.7.1 Sense 3: Vancouver city, metropolis, urban center municipality urban area geographical area region location entity, physical thing administrative district, territorial division district, territory region location entity, physical thing port geographic point point location entity, physical thing 17 / 19

WordNet A synset in WordNet is a list of synonyms (interchangeable words) { chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug } How can we use this information like synsets, hypernyms, etc. from WordNet to benefit NLP applications? Consider one example: PP attachment in parsing, words plus word classes extracted from the hypernym hierarchy increase accuracy from 84% to 88% (Stetina and Nagao, 1998) 18 / 19

WordNet Another example of WordNet used in NLP applications: selectional restrictions We have considered subcategorization: VP-with-NP-complement V(eat) NP eat six bowls of rice But not selectional restrictions of the verb itself: eat tomorrow Consider what do you want to eat tomorrow We can use the synset { food, nutrient } to describe the NP argument of eat then the 60K lexemes under these nodes in the WordNet hierarchy will be acceptable. (however, what about eat my shorts ) several other applications have been explored 19 / 19