Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

1 Sense-Tagging Verbs in English and Chinese
  Hoa Trang Dang
  Department of Computer and Information Sciences
  University of Pennsylvania
  October 30, 2003

2 Outline
  - English sense-tagging
      - Senseval-1 verbs
      - Senseval-2 verbs
      - WordNet verb sense groupings
  - Chinese sense-tagging
      - Penn Chinese Treebank
      - People's Daily News
  - Sense-tagging in PropBank II

3 Local Contextual Predicates for English WSD
  - Collocational (Ratnaparkhi POS tagger):
      - target verb w; POS of w
      - POS of words at positions -1, +1 with respect to w
      - words at positions -2, -1, +1, +2 with respect to w
  - Syntactic (Collins parser):
      - is the sentence containing w passive?
      - is there a sentential complement, subject, direct object, or indirect object?
      - the words (if any) in the positions of subject, direct object, indirect object, particle, prepositional complement (and its object)
  - Semantic (Nymble: Bikel et al.):
      - Named Entity tag (PERSON, ORGANIZATION, LOCATION) for proper nouns, and WordNet (WN) synsets and hypernyms for all nouns, in the above syntactic relations to w
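The collocational predicates on this slide can be sketched as a small feature extractor. This is only an illustration, assuming the sentence has already been tokenized and POS-tagged; the function name `collocational_features`, the feature-name strings, and the `<PAD>` symbol for out-of-sentence positions are hypothetical and do not come from the slides.

```python
def collocational_features(tokens, tags, i):
    """Collocational predicates for the target verb at index i.

    tokens, tags: parallel lists of words and POS tags (assumed to be
    supplied by an upstream tagger). Feature names are illustrative only.
    """
    def at(seq, j):
        # Positions outside the sentence get a padding symbol (an assumption).
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    feats = {"w": tokens[i], "pos": tags[i]}
    # POS of the words immediately before and after the target.
    for off in (-1, 1):
        feats[f"pos[{off:+d}]"] = at(tags, i + off)
    # Words within a window of two on either side of the target.
    for off in (-2, -1, 1, 2):
        feats[f"word[{off:+d}]"] = at(tokens, i + off)
    return feats


tokens = ["The", "company", "called", "a", "meeting"]
tags = ["DT", "NN", "VBD", "DT", "NN"]
feats = collocational_features(tokens, tags, 2)
```

The syntactic and semantic predicates would need a parser and a named-entity tagger, so they are left out of this sketch.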

4 Topical Contextual Keywords
  - Generate a list of keywords from the training set for each verb:
      - Sort all words k by the entropy of P(Sense | k), where k appears anywhere in the context, provided that k appears in more than a threshold number (= 2) of instances in the corpus
      - Select the words k with the lowest entropy (most informative)
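The keyword selection on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the function name `select_keywords`, the `top_n` cutoff (the slide does not say how many keywords are kept), and the toy sense labels are all hypothetical. The sense distribution of each context word is estimated from training counts, and words are ranked by the entropy of that distribution.

```python
import math
from collections import Counter, defaultdict


def select_keywords(instances, min_count=2, top_n=10):
    """Rank context words by the entropy of P(sense | word), lowest first.

    instances: list of (sense, context_words) pairs for one target verb.
    min_count: a word must occur in more than this many instances.
    top_n: hypothetical cutoff; the slides do not give the number kept.
    """
    sense_counts = defaultdict(Counter)  # word -> Counter over senses
    for sense, words in instances:
        for w in set(words):  # count each word once per instance
            sense_counts[w][sense] += 1

    scored = []
    for w, counts in sense_counts.items():
        total = sum(counts.values())
        if total <= min_count:
            continue  # too rare for a reliable estimate
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        scored.append((entropy, w))
    scored.sort()  # lowest entropy first; ties broken alphabetically
    return [w for _, w in scored[:top_n]]


# Toy training data with made-up sense labels.
instances = [
    ("call.phone", ["phone", "her", "yesterday"]),
    ("call.phone", ["phone", "him", "yesterday"]),
    ("call.phone", ["phone", "number", "yesterday"]),
    ("call.name", ["name", "baby", "yesterday"]),
    ("call.name", ["name", "dog", "yesterday"]),
    ("call.name", ["name", "they", "yesterday"]),
]
keywords = select_keywords(instances, min_count=2, top_n=2)
```

The frequency threshold matters because a word seen only once or twice trivially has zero entropy without being informative.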

5 Senseval-1 Lexical Sample Task
  - Lexicon: Hector lexical database; senses are organized in hierarchies
  - Corpus: British National Corpus
  - High average inter-annotator agreement (95.5%)
  - 13 verbs (12 senses/verb in corpus)
  - Avg. training set size: 215 instances/verb
  - Baseline (most frequent sense): 57%
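The most-frequent-sense baseline quoted on this slide can be computed as in the sketch below. It assumes the most frequent sense is estimated from the training instances and then applied to every test instance; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def most_frequent_sense_accuracy(train_senses, test_senses):
    """Accuracy of always predicting the most frequent training sense."""
    mfs, _ = Counter(train_senses).most_common(1)[0]
    correct = sum(1 for s in test_senses if s == mfs)
    return mfs, correct / len(test_senses)


# Toy labels for a single verb, for illustration only.
train = ["s1", "s1", "s1", "s2", "s3"]
test = ["s1", "s2", "s1", "s3"]
mfs, acc = most_frequent_sense_accuracy(train, test)
```

For a lexical sample task the baseline would be computed per verb and then averaged over the verbs or pooled over the instances; the slide does not say which.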

6 Senseval-1 Verb Results
  System                        Accuracy    p-value
  Avg. System
  ETS (Naive Bayes)
  MaxEnt (lex+trans+topic)
  MaxEnt (best variants)
  JHU-final (Decision List)

7 Senseval-2 English Verb Lexical Sample Task
  - Lexicon: WordNet 1.7; senses are also grouped
  - Corpus: Penn Treebank WSJ, supplemented with British National Corpus
  - Inter-annotator agreement: 71%
  - 29 verbs, mostly highly polysemous (16 senses/verb in corpus)
  - Avg. training set size: 110 instances/verb
  - Baseline (most frequent sense): 40%
  - Best system performance: 60%

8 System Accuracy and Feature Types (English)
  Feature (local)    Accuracy    Feature (local, topic)    Accuracy
  collocation        48.3        collocation
  syn                            syn
  syn+sem                        syn+sem                   60.2
  - Linguistically richer features improve system accuracy

9 Senseval-2 Verbs Results
  System         Accuracy    p-value
  Avg. System
  SMU
  JHU
  KUNLP
  MaxEnt         60.2
  (Human)

10 Senseval-2 verb groupings methodology
  - Groupings of senses done after sense-tagging for Senseval-2
  - Double-blind grouping of each verb by two people
  - Discussion of criteria used for groupings (syntactic and semantic)
  - Adjudication of groupings by a third person using agreed-upon criteria

11 Groupings improve performance
  - Well-defined groupings improve human inter-annotator agreement (71% to 82%)
  - Random grouping produced an insignificant improvement in inter-annotator agreement (71% to 73%)
  - Similar improvement in system score (60% to 70%)
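The effect of grouping on agreement can be illustrated with a short sketch. It assumes that two labels count as matching under grouped scoring when their senses fall in the same group; the function name `agreement`, the sense labels, and the grouping itself are hypothetical examples, not the Senseval-2 groupings.

```python
def agreement(labels_a, labels_b, group_of=None):
    """Fraction of instances on which two label sequences match.

    group_of: optional dict mapping a fine-grained sense to a group id;
    senses without an entry are treated as singleton groups.
    """
    def coarse(s):
        return group_of.get(s, s) if group_of else s

    matches = sum(coarse(a) == coarse(b) for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical sense labels and grouping, for illustration only.
groups = {"call.1": "g1", "call.3": "g1", "call.2": "g2"}
annotator_1 = ["call.1", "call.2", "call.3", "call.1"]
annotator_2 = ["call.3", "call.2", "call.3", "call.2"]

fine = agreement(annotator_1, annotator_2)             # fine-grained senses
grouped = agreement(annotator_1, annotator_2, groups)  # grouped senses
```

The same function gives a system's grouped accuracy if one sequence holds the gold labels and the other holds the system's output.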

12 Chinese WSD (CTB)
  - Lexicon: CETA (Chinese-English Translation Assistance) Dictionary
  - Corpus: Penn Chinese Treebank (100K words)
  - Manual segmentation, POS-tagging, parsing
  - 28 words (multiple verb senses, possibly other POS), the most polysemous in a 5K-word sample of the corpus
  - 3.5 senses/word in corpus
  - Baseline (most frequent sense): 77%

13 Contextual predicates (Chinese)
  - Local features:
      - Collocational features: same as for English, plus a "follows verb" feature
      - Syntactic features: hassubj, subj, hasobj, obj-p, obj, hasinobj, Comp-VP, VP-Comp, Comp-IP, hasprd
      - Semantic features (for verbs only): HowNet noun category for each subject and object
  - Topical features: same as for English

14 System Accuracy and Feature Types (CTB)
  Feature type                        Accuracy    Std. Dev.
  collocation
  collocation (+ pos)
  collocation + syntax
  collocation + syntax + semantics
  baseline

15 Chinese WSD (PDN)
  - Five words with low accuracy and counts in CTB were subsequently sense-tagged in People's Daily News (1M words)
  - PDN corpus has manual segmentation and POS-tagging; no parse
  - About 200 sentences/word in PDN
  - 8.2 senses/verb in corpus
  - Baseline (most frequent sense): 58%
  - Automatic segmentation, POS-tagging, parsing

16 System Accuracy and Feature Types (PDN, automatic)
  Feature type                        Accuracy    Std. Dev.
  collocation
  collocation (+ pos)
  collocation + syntax
  collocation + syntax + semantics
  baseline

17 System Accuracy and Feature Types (PDN, manual)
  Feature type           Accuracy    Std. Dev.
  collocation
  collocation (+ pos)
  collocation + topic

18 Differences between English and Chinese
  - Higher number of verbs in Chinese than English
  - Lower polysemy per verb for Chinese
  - Many multi-character Chinese verbs
  - Much ambiguity in Chinese is at the level of word segmentation
  - Lexical collocational information may be sufficient for Chinese

19 PropBank II sense-tagging
  - Feasibility study: tag a reasonable set of polysemous words in Eng/Chin CTB
      - determine realistic, concrete sense-tagging goals for next two years
  - Which sense distinctions will be most relevant to IE and MT?
      - how fine-grained do we really need to be?
  - What is the most efficient/accurate way to produce the data?
      - hierarchical tagging?
      - active learning?
      - does hand-correcting automatic tagging bias the results?


DanNet Teaching and Research Perspectives at CST DanNet Teaching and Research Perspectives at CST Patrizia Paggio Centre for Language Technology University of Copenhagen paggio@hum.ku.dk Dias 1 Outline Previous and current research: Concept-based search:

More information

The Role of Sentence Structure in Recognizing Textual Entailment

The Role of Sentence Structure in Recognizing Textual Entailment Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure

More information

Maximum Entropy Models for FrameNet Classification

Maximum Entropy Models for FrameNet Classification Maximum Entropy Models for FrameNet Classification Michael Fleischman, Namhee Kwon and Eduard Hovy USC Information Sciences Institute 4676 Admiralty Way Marina del Rey, CA 90292-6695 {fleisch, nkwon, hovy

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Computer Standards & Interfaces

Computer Standards & Interfaces Computer Standards & Interfaces 35 (2013) 470 481 Contents lists available at SciVerse ScienceDirect Computer Standards & Interfaces journal homepage: www.elsevier.com/locate/csi How to make a natural

More information

The STEVIN IRME Project

The STEVIN IRME Project The STEVIN IRME Project Jan Odijk STEVIN Midterm Workshop Rotterdam, June 27, 2008 IRME Identification and lexical Representation of Multiword Expressions (MWEs) Participants: Uil-OTS, Utrecht Nicole Grégoire,

More information

Schema documentation for types1.2.xsd

Schema documentation for types1.2.xsd Generated with oxygen XML Editor Take care of the environment, print only if necessary! 8 february 2011 Table of Contents : ""...........................................................................................................

More information

Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD

Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD Eneko Agirre and Oier Lopez de Lacalle and Aitor Soroa Informatika Fakultatea, University of the Basque Country 20018,

More information

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014 Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook

More information

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25 L130: Chapter 5d Dr. Shannon Bischoff Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25 Outline 1 Syntax 2 Clauses 3 Constituents Dr. Shannon Bischoff () L130: Chapter 5d 2 / 25 Outline Last time... Verbs...

More information

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like

More information

A Quirk Review of Translation Models

A Quirk Review of Translation Models A Quirk Review of Translation Models Jianfeng Gao, Microsoft Research Prepared in connection with the 2010 ACL/SIGIR summer school July 22, 2011 The goal of machine translation (MT) is to use a computer

More information

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working

More information

Building A Vocabulary Self-Learning Speech Recognition System

Building A Vocabulary Self-Learning Speech Recognition System INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes

More information

Genre distinctions and discourse modes: Text types differ in their situation type distributions

Genre distinctions and discourse modes: Text types differ in their situation type distributions Genre distinctions and discourse modes: Text types differ in their situation type distributions Alexis Palmer and Annemarie Friedrich Department of Computational Linguistics Saarland University, Saarbrücken,

More information

Context Grammar and POS Tagging

Context Grammar and POS Tagging Context Grammar and POS Tagging Shian-jung Dick Chen Don Loritz New Technology and Research New Technology and Research LexisNexis LexisNexis Ohio, 45342 Ohio, 45342 dick.chen@lexisnexis.com don.loritz@lexisnexis.com

More information

12 FIRST QUARTER. Class Assignments

12 FIRST QUARTER. Class Assignments 12 FIRST QUARTER Class Assignments August 6- Handout textbooks. August 7- Overview of the course. Go over senior dates. Go over class syllabus. August 10- Part 2 Chapter 1 Parts of Speech. Nouns. Daily

More information

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization

Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information

Word Sense Disambiguation as an Integer Linear Programming Problem

Word Sense Disambiguation as an Integer Linear Programming Problem Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University

More information