Word Meaning & Word Sense Disambiguation
|
|
- Avice Banks
- 7 years ago
- Views:
Transcription
1 Word Meaning & Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu
2 Today Representing word meaning Word sense disambiguation as supervised classification Word sense disambiguation without annotated examples
3 Drunk gets nine year in violin case. beatrice/humor/headlines.html
4 How do we know that a word (lemma) has distinct senses? Linguists often design tests for this purpose e.g., zeugma combines distinct senses in an uncomfortable way Which flight serves breakfast? Which flights serve Tuscon? *Which flights serve breakfast and Tuscon?
5 Where can we look up the meaning of words? Dictionary?
6 Word Senses Word sense = distinct meaning of a word Same word, different senses Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental Polysemes (polysemy): related, but distinct senses Metonyms (metonymy): stand in, technically, a subcase of polysemy Different word, same sense Synonyms (synonymy)
7 Homophones: same pronunciation, different orthography, different meaning Examples: would/wood, to/too/two Homographs: distinct senses, same orthographic form, different pronunciation Examples: bass (fish) vs. bass (instrument)
8 Relationship Between Senses IS-A relationships From specific to general (up): hypernym (hypernymy) From general to specific (down): hyponym (hyponymy) Part-Whole relationships wheel is a meronym of car (meronymy) car is a holonym of wheel (holonymy)
9 WordNet: a lexical database for English Includes most English nouns, verbs, adjectives, adverbs Electronic format makes it amenable to automatic manipulation: used in many NLP applications WordNets generically refers to similar resources in other languages
10 WordNet: History Research in artificial intelligence: How do humans store and access knowledge about concept? Hypothesis: concepts are interconnected via meaningful relations Useful for reasoning The WordNet project started in 1986 Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning? If so, the result would be a large semantic network
11 Synonymy in WordNet WordNet is organized in terms of synsets Unordered set of (roughly) synonymous words (or multi-word phrases) Each synset expresses a distinct meaning/concept
12 WordNet: Example Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) pipe oil, water, and gas into the desert {pipe} (play on a pipe) pipe a tune {pipe} (trim with piping) pipe the skirt What do you think of the sense granularity?
13 The Net Part of WordNet {conveyance; transport} hyperonym {vehicle} hyperonym {bumper} {motor vehicle; automotivevehicle} meronym hyperonym {car dor} {car window} {car; auto; automobile; machine; motorcar} {armrest} meronym hyperonym hyperonym {cruiser; squadcar; patrol car; policecar; prowl car} {cab; taxi; hack; taxicab; } {car miror} meronym meronym {hinge; flexiblejoint} {dorlock}
14 WordNet 3.0: Size Part of speech Word form Synsets Noun 117,798 82,115 Verb 11,529 13,767 Adjective 21,479 18,156 Adverb 4,481 3,621 Total 155, ,659
15 Word Sense From WordNet: Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) pipe oil, water, and gas into the desert {pipe} (play on a pipe) pipe a tune {pipe} (trim with piping) pipe the skirt
16 WORD SENSE DISAMBIGUATION
17 Word Sense Disambiguation Task: automatically select the correct sense of a word Input: a word in context Output: sense of the word Can be framed as lexical sample (focus on one word type at a time) or all-words (disambiguate all content words in a document) Motivated by many applications: Information retrieval Machine translation
18 How big is the problem? Most words in English have only one sense 62% in Longman s Dictionary of Contemporary English 79% in WordNet But the others tend to have several senses Average of 3.83 in LDOCE Average of 2.96 in WordNet Ambiguous words are more frequently used In the British National Corpus, 84% of instances have more than one sense Some senses are more frequent than others
19 Ground Truth Which sense inventory do we use?
20 Existing Corpora Lexical sample line-hard-serve corpus (4k sense-tagged examples) interest corpus (2,369 sense-tagged examples) All-words SemCor (234k words, subset of Brown Corpus) Senseval/SemEval (2081 tagged content words from 5k total words)
21 Evaluation Intrinsic Measure accuracy of sense selection wrt ground truth Extrinsic Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval Compare WSD
22 Baseline Performance Baseline: most frequent sense Equivalent to take first sense in WordNet Does surprisingly well! 62% accuracy in this case!
23 Upper Bound Performance Upper bound Fine-grained WordNet sense: 75-80% human agreement Coarser-grained inventories: 90% human agreement possible
24 WSD as Supervised Classification Training Testing training data? unlabeled document label 1 label 2 label 3 label 4 supervised machine learning algorithm Feature Functions Classifier label 1? label 2? label 3? label 4?
25 WSD Approaches Depending on use of manually created knowledge sources Knowledge-lean Knowledge-rich Depending on use of labeled data Supervised Semi- or minimally supervised Unsupervised
26 Simplest WSD algorithm: Lesk s Algorithm Intuition: note word overlap between context and dictionary entries Unsupervised, but knowledge rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities. WordNet
27 Lesk s Algorithm Simplest implementation: Count overlapping content words between glosses and context Lots of variants: Include the examples in dictionary definitions Include hypernyms and hyponyms Give more weight to larger overlaps (e.g., bigrams) Give extra weight to infrequent words
28 WSD Accuracy Generally Supervised approaches yield ~70-80% accuracy However depends on actual word, sense inventory, amount of training data, number of features, etc.
29 WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION
30 Minimally Supervised WSD Problem: annotations are expensive! Solution 1: Bootstrapping or co-training (Yarowsky 1995) Start with (small) seed, learn classifier Use classifier to label rest of corpus Retain confident labels, add to training set Learn new classifier Repeat Heuristics (derived from observation): One sense per discourse One sense per collocation
31 One Sense per Discourse A word tends to preserve its meaning across all its occurrences in a given discourse Gale et al words with two-way ambiguity, e.g. plant, crane, etc. 98% of the two-word occurrences in the same discourse carry the same meaning Krovetz 1998 Heuristic true mostly for coarse-grained senses and for homonymy rather than polysemy Performance of one sense per discourse measured on SemCor is approximately 70%
32 One Sense per Collocation A word tends to preserve its meaning when used in the same collocation Strong for adjacent collocations Weaker as the distance between words increases Evaluation: 97% precision on words with two-way ambiguity Again, accuracy depends on granularity: 70% precision on SemCor words
33
34
35
36
37 Yarowsky s Method: Stopping Stop when: Error on training data is less than a threshold No more training data is covered
38 Yarowsky s Method: Discussion Advantages Accuracy is about as good as a supervised algorithm Bootstrapping: far less manual effort Disadvantages Seeds may be tricky to construct Works only for coarse-grained sense distinctions Snowballing error with co-training
39 WSD with 2 nd language as supervision Problems: annotations are expensive! What s the proper sense inventory? How fine or coarse grained? Application specific? Observation: multiple senses translate to different words in other languages Use the foreign language as the sense inventory Added bonus: annotations for free! (Using machine translation data)
40 Today Representing word senses & word relations WordNet Word Sense Disambiguation Lesk Algorithm Supervised classification Minimizing supervision Co-training: combine 2 views of data Use parallel text Next: general techniques for learning without annotated training examples If needed review probability, expectations
Natural Language Processing. Part 4: lexical semantics
Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationComparing methods for automatic acquisition of Topic Signatures
Comparing methods for automatic acquisition of Topic Signatures Montse Cuadros, Lluis Padro TALP Research Center Universitat Politecnica de Catalunya C/Jordi Girona, Omega S107 08034 Barcelona {cuadros,
More informationANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
More informationCombining Contextual Features for Word Sense Disambiguation
Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, Philadelphia, July 2002, pp. 88-94. Association for Computational Linguistics. Combining
More informationComparing Ontology-based and Corpusbased Domain Annotations in WordNet.
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information
More informationAn Integrated Approach to Automatic Synonym Detection in Turkish Corpus
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus Dr. Tuğba YILDIZ Assist. Prof. Dr. Savaş YILDIRIM Assoc. Prof. Dr. Banu DİRİ İSTANBUL BİLGİ UNIVERSITY YILDIZ TECHNICAL UNIVERSITY
More informationIntro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
More informationSense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania htd@linc.cis.upenn.edu October 30, 2003 Outline English sense-tagging
More informationChapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
More informationClever Search: A WordNet Based Wrapper for Internet Search Engines
Clever Search: A WordNet Based Wrapper for Internet Search Engines Peter M. Kruse, André Naujoks, Dietmar Rösner, Manuela Kunze Otto-von-Guericke-Universität Magdeburg, Institut für Wissens- und Sprachverarbeitung,
More informationAutomatic assignment of Wikipedia encyclopedic entries to WordNet synsets
Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells Computer Science Dep., Universidad Autonoma de Madrid, 28049 Madrid, Spain
More informationThe study of words. Word Meaning. Lexical semantics. Synonymy. LING 130 Fall 2005 James Pustejovsky. ! What does a word mean?
Word Meaning LING 130 Fall 2005 James Pustejovsky The study of words! What does a word mean?! To what extent is it a linguistic matter?! To what extent is it a matter of world knowledge? Thanks to Richard
More informationKnowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD
Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD Eneko Agirre and Oier Lopez de Lacalle and Aitor Soroa Informatika Fakultatea, University of the Basque Country 20018,
More informationAN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationSemantic Clustering in Dutch
t.van.de.cruys@rug.nl Alfa-informatica, Rijksuniversiteit Groningen Computational Linguistics in the Netherlands December 16, 2005 Outline 1 2 Clustering Additional remarks 3 Examples 4 Research carried
More informationTERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary. Anja Drame TermNet
TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary Anja Drame TermNet Summary/ Conclusion Variety of language (GPL = general purpose SPL = special purpose) Lexicography GPL SPL (special-purpose
More informationWord Sense Disambiguation for Text Mining
Word Sense Disambiguation for Text Mining Daniel I. MORARIU, Radu G. CREŢULESCU, Macarie BREAZU Lucian Blaga University of Sibiu, Engineering Faculty, Computer and Electrical Engineering Department Abstract:
More informationCross-lingual Synonymy Overlap
Cross-lingual Synonymy Overlap Anca Dinu 1, Liviu P. Dinu 2, Ana Sabina Uban 2 1 Faculty of Foreign Languages and Literatures, University of Bucharest 2 Faculty of Mathematics and Computer Science, University
More information2. SEMANTIC RELATIONS
2. SEMANTIC RELATIONS 2.0 Review: meaning, sense, reference A word has meaning by having both sense and reference. Sense: Word meaning: = concept Sentence meaning: = proposition (1) a. The man kissed the
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationSimple Features for Chinese Word Sense Disambiguation
Simple Features for Chinese Word Sense Disambiguation Hoa Trang Dang, Ching-yi Chia, Martha Palmer, and Fu-Dong Chiou Department of Computer and Information Science University of Pennsylvania htd,chingyc,mpalmer,chioufd
More informationSEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING
SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING Doctor of Philosophy Dissertation Márton Miháltz Supervisor: Gábor Prószéky, D.Sc. Multidisciplinary Technical Sciences
More informationAn Efficient Database Design for IndoWordNet Development Using Hybrid Approach
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh P rabhu 2 Shilpa Desai 1 Hanumant Redkar 1 N eha P rabhugaonkar 1 Apur va N agvenkar 1 Ramdas Karmali 1 (1) GOA
More informationWord Sense Disambiguation as an Integer Linear Programming Problem
Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University
More informationBuilding the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano
More informationClustering of Polysemic Words
Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,
More informationPackage wordnet. January 6, 2016
Title WordNet Interface Version 0.1-11 Package wordnet January 6, 2016 An interface to WordNet using the Jawbone Java API to WordNet. WordNet () is a large lexical database
More informationExploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
More informationDISA at ImageCLEF 2014: The search-based solution for scalable image annotation
DISA at ImageCLEF 2014: The search-based solution for scalable image annotation Petra Budikova, Jan Botorek, Michal Batko, and Pavel Zezula Masaryk University, Brno, Czech Republic {budikova,botorek,batko,zezula}@fi.muni.cz
More informationAnalyzing survey text: a brief overview
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
More informationModule Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
More informationTowards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features
Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features Jinying Chen and Martha Palmer Department of Computer and Information Science, University of Pennsylvania,
More informationSemantic analysis of text and speech
Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic
More informationExploiting Strong Syntactic Heuristics and Co-Training to Learn Semantic Lexicons
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 125-132. Association for Computational Linguistics. Exploiting Strong Syntactic Heuristics
More informationDanNet Teaching and Research Perspectives at CST
DanNet Teaching and Research Perspectives at CST Patrizia Paggio Centre for Language Technology University of Copenhagen paggio@hum.ku.dk Dias 1 Outline Previous and current research: Concept-based search:
More informationExtraction of Hypernymy Information from Text
Extraction of Hypernymy Information from Text Erik Tjong Kim Sang, Katja Hofmann and Maarten de Rijke Abstract We present the results of three different studies in extracting hypernymy information from
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationOptimizing the relevancy of Predictions using Machine Learning and NLP of Search Query
International Journal of Scientific and Research Publications, Volume 4, Issue 8, August 2014 1 Optimizing the relevancy of Predictions using Machine Learning and NLP of Search Query Kilari Murali krishna
More informationTesting Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
More informationUsing Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance David Bixler, Dan Moldovan and Abraham Fowler Language Computer Corporation 1701 N. Collins Blvd #2000 Richardson,
More informationHow the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
More informationApplying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu
Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu 1 Statistical Parsing: the company s clinical trials of both its animal and human-based
More informationAn empirical study of semantic similarity in WordNet and Word2Vec. A Thesis
An empirical study of semantic similarity in WordNet and Word2Vec A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationTaxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013
Taxonomies for Auto-Tagging Unstructured Content Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 About Heather Hedden Independent taxonomy consultant, Hedden
More informationTextual Data Analysis tools for Word Sense Disambiguation
Textual Data Analysis tools for Word Sense Disambiguation Simona Balbi 1, Agnieszka Stawinoga 2 Università FedericoII di Napoli 1 simona.balbi@unina.it, 2 agnieszka.stawinoga@unina.it Abstract The ambiguity
More informationPresented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
More informationA clustering-based Approach for Unsupervised Word Sense Disambiguation
Procesamiento del Lenguaje Natural, Revista nº 49 septiembre de 2012, pp 49-56 recibido 04-05-12 revisado 04-06-12 aceptado 07-06-12 A clustering-based Approach for Unsupervised Word Sense Disambiguation
More informationFree Construction of a Free Swedish Dictionary of Synonyms
Free Construction of a Free Swedish Dictionary of Synonyms Viggo Kann and Magnus Rosell KTH Nada SE-1 44 Stockholm Sweden {viggo, rosell}@nada.kth.se Abstract Building a large dictionary of synonyms for
More information3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work
Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp
More informationCOLLOCATION TOOLS FOR L2 WRITERS 1
COLLOCATION TOOLS FOR L2 WRITERS 1 An Evaluation of Collocation Tools for Second Language Writers Ulugbek Nurmukhamedov Northern Arizona University COLLOCATION TOOLS FOR L2 WRITERS 2 Abstract Second language
More informationC o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
More informationSentiment Analysis of Movie Reviews and Twitter Statuses. Introduction
Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about
More informationPoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
More informationWhat Is This, Anyway: Automatic Hypernym Discovery
What Is This, Anyway: Automatic Hypernym Discovery Alan Ritter and Stephen Soderland and Oren Etzioni Turing Center Department of Computer Science and Engineering University of Washington Box 352350 Seattle,
More informationA Method for Automatic De-identification of Medical Records
A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract
More informationReport on the embedding and evaluation of the second MT pilot
Report on the embedding and evaluation of the second MT pilot quality translation by deep language engineering approaches DELIVERABLE D3.10 VERSION 1.6 2015-11-02 P2 QTLeap Machine translation is a computational
More informationA Sentiment Analysis Model Integrating Multiple Algorithms and Diverse. Features. Thesis
A Sentiment Analysis Model Integrating Multiple Algorithms and Diverse Features Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The
More informationContext Sensitive Paraphrasing with a Single Unsupervised Classifier
Appeared in ECML 07 Context Sensitive Paraphrasing with a Single Unsupervised Classifier Michael Connor and Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign connor2@uiuc.edu
More informationConstruction of Thai WordNet Lexical Database from Machine Readable Dictionaries
Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Patanakul Sathapornrungkij Department of Computer Science Faculty of Science, Mahidol University Rama6 Road, Ratchathewi
More informationSentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
More informationNgram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department
More informationWord sense disambiguation through associative dictionaries
INSTITUTO POLITÉCNICO NACIONAL CENTRO DE INVESTIGACIÓN EN COMPUTACIÓN LABORATORIO DE LENGUAJE NATURAL Y PROCESAMIENTO DE TEXTO Word sense disambiguation through associative dictionaries TESIS QUE PRESENTA
More informationA Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu
More informationRefining the most frequent sense baseline
Refining the most frequent sense baseline Judita Preiss Department of Linguistics The Ohio State University judita@ling.ohio-state.edu Josh King Computer Science and Engineering The Ohio State University
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationThe Open University s repository of research publications and other research outputs. PowerAqua: fishing the semantic web
Open Research Online The Open University s repository of research publications and other research outputs PowerAqua: fishing the semantic web Conference Item How to cite: Lopez, Vanessa; Motta, Enrico
More informationSpeech and Language Processing
Speech and Language Processing An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition Second Edition Daniel Jurafsky Stanford University James H. Martin University
More informationChapter 2 The Information Retrieval Process
Chapter 2 The Information Retrieval Process Abstract What does an information retrieval system look like from a bird s eye perspective? How can a set of documents be processed by a system to make sense
More informationHybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationVisualizing WordNet Structure
Visualizing WordNet Structure Jaap Kamps Abstract Representations in WordNet are not on the level of individual words or word forms, but on the level of word meanings (lexemes). A word meaning, in turn,
More informationMethods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations
Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations Bento C. Dias-da-Silva 1, Ariani Di Felippo 2, Ricardo Hasegawa 3 Centro de Estudos Lingüísticos
More informationLearning Domain Ontologies from Document Warehouses and Dedicated Web Sites
Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites Roberto Navigli Università di Roma La Sapienza Paola Velardi Università di Roma La Sapienza We present a method and a tool, OntoLearn,
More informationMachine Learning Approach To Augmenting News Headline Generation
Machine Learning Approach To Augmenting News Headline Generation Ruichao Wang Dept. of Computer Science University College Dublin Ireland rachel@ucd.ie John Dunnion Dept. of Computer Science University
More informationDomain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora
Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora Jing-Shin Chang Department of Computer Science& Information Engineering National
More informationWeb opinion mining: How to extract opinions from blogs?
Web opinion mining: How to extract opinions from blogs? Ali Harb ali.harb@ema.fr Mathieu Roche LIRMM CNRS 5506 UM II, 161 Rue Ada F-34392 Montpellier, France mathieu.roche@lirmm.fr Gerard Dray gerard.dray@ema.fr
More informationExploiting Redundancy in Natural Language to Penetrate Bayesian Spam Filters
Exploiting Redundancy in Natural Language to Penetrate Bayesian Spam Filters Christoph Karlberger, Günther Bayler, Christopher Kruegel, and Engin Kirda Secure Systems Lab Technical University Vienna {christoph,gmb,chris,ek}@seclab.tuwien.ac.at
More informationIntroduction to WordNet: An On-line Lexical Database. (Revised August 1993)
Introduction to WordNet: An On-line Lexical Database George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller (Revised August 1993) WordNet is an on-line lexical reference
More informationCINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.
CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:
More informationA Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts
A Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts Martin Scholz Friedrich-Alexander-University Erlangen-Nürnberg Digital Humanities Research Group Outline Motivation: information
More informationComparing Topiary-Style Approaches to Headline Generation
Comparing Topiary-Style Approaches to Headline Generation Ruichao Wang 1, Nicola Stokes 1, William P. Doran 1, Eamonn Newman 1, Joe Carthy 1, John Dunnion 1. 1 Intelligent Information Retrieval Group,
More informationSpecial Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationDomain Independent Knowledge Base Population From Structured and Unstructured Data Sources
Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationSemi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 160-167. Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation
More informationTopics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
More informationDomain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
More informationSelf-Monitoring in Social Networks
Self-Monitoring in Social Networks Amin Anjomshoaa 1, Khue Vo Sao 1, Amirreza Tahamtan 1, A Min Tjoa 1, Edgar Weippl 2 1 Institute of Software Technology and Interactive Systems, Vienna University of Technology
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More informationTransaction-Typed Points TTPoints
Transaction-Typed Points TTPoints version: 1.0 Technical Report RA-8/2011 Mirosław Ochodek Institute of Computing Science Poznan University of Technology Project operated within the Foundation for Polish
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES
ASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES Abhilash Inumella Adam Kilgarriff Vojtěch Kovář IIIT Hyderabad, India Lexical Computing Ltd., UK Masaryk Uni., Brno, Cz abhilashi@students.iiit.ac.in adam@lexmasterclass.com
More informationStock Market Prediction Using Data Mining
Stock Market Prediction Using Data Mining 1 Ruchi Desai, 2 Prof.Snehal Gandhi 1 M.E., 2 M.Tech. 1 Computer Department 1 Sarvajanik College of Engineering and Technology, Surat, Gujarat, India Abstract
More informationSemantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation
Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Denis Turdakov, Pavel Velikhov ISP RAS turdakov@ispras.ru, pvelikhov@yahoo.com
More informationParsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
More information