WordNet Structure and use in natural language processing


Abstract

There are many electronic dictionaries, thesauri and lexical databases available today. WordNet is one of the largest and most widely used of these. It has been used for many natural language processing tasks, including word sense disambiguation and question answering. This paper explores the structure of WordNet, how and for which applications it is used, and where its strengths and weaknesses lie.

[Reviewer comment: Good introduction. Nice that you mention the general aim of your work, so the reader is always aware of what you're explaining. Double use of "and"; maybe you can reformulate the last sentence.]

1. WordNet as a lexical database

1.1 Background

Before the 1990s, most dictionaries for English existed only in paper form. The dictionaries that were available in electronic form were limited to a few groups of researchers. This hindered much work in certain areas of computational linguistics, for example word sense disambiguation (WSD).

[Reviewer comment: Why were CL tasks more difficult without electronic databases? Explain the advantages of an electronic database.]

In 1993, WordNet was introduced. It is a lexical database, organized as a semantic network. Development began in 1985 at Princeton University by a group of psychologists and linguists, and the university still maintains the database. Even though it was not created with the intention of serving as a knowledge source for tasks in computational linguistics, it has been widely used as a lexical resource for such tasks, has been ported to several different languages, and has spawned many different subsets. One task for which it has been widely used is the previously mentioned WSD.

[Reviewer comment: Double use of "has been used"; reformulate (e.g. "applied as a lexical resource").]

[Reviewer comment: Why do you mention WSD so often? Are you especially focusing on that task? If so, state that you focus on WSD, or say why it is so important.]

1.2 Structure

WordNet consists of three separate databases: one for nouns, one for verbs, and one for adjectives and adverbs. It does not include closed-class words. The current version available for download is WordNet 3.0, which was released in December. It contains 117,097 nouns, 22,141 adjectives, 11,488 verbs and 4,601 adverbs.[2] There is a later release, 3.1, which is available for online use.

[Reviewer comment: Where is your first footnote? :)]

The basic unit of the structure is the synset. Synsets are sets of synonyms or, more correctly, near-synonyms, since there are few if any true synonyms. A synset contains a set of lemmas and is tagged with the sense it represents. These senses can be regarded as concepts: all of the lemmas (or words) in a synset can be said to express the same concept. Word forms which have different meanings appear in different synsets. For example, the noun bank has 10 different senses in WordNet, and thus it appears in 10 different synsets. It also appears as a verb in 8 different synsets. Each synset is also connected to other synsets by some kind of relation. Which relations apply depends on the part of speech of the word, although the hypernym/hyponym (is-a) relationship is the most common and appears for both nouns and verbs (hyponyms of verbs are known as troponyms; the reason for the different name is explained in the section on verbs).

[Reviewer comment: What do you mean with the sentence about which relations there are? (Some of these relations?)]

[Reviewer comment: Very good explanation of synsets. What do you mean by "will be expanded on"? That they will work on that, or that you will explain it in detail later?]

One thing that WordNet does not take into account is pronunciation, which can be observed by looking at the noun bass. The pronunciation differs depending on whether bass refers to the low tone or the instrument, or to the fish.

[Reviewer comment: Very good that you mention this aspect. Maybe you can also state that this is a disadvantage for some CL tasks.]

1.2.1 Nouns

Nouns have the richest set of relations of all parts of speech represented in WordNet, with 12 different relations. As previously stated, the hyponym/hypernym relation is the most frequently used one. For example, if we look at the noun bass again (which has 8 different senses), now in the sense of sea bass, it is a saltwater fish, which is a kind of seafood, which is a kind of solid food, and so on. These relations are also transitive, which means that sea bass is a type of food just as much as it is a type of saltwater fish.

Sense 4
sea bass, bass
    => saltwater fish
        => seafood
            => food, solid food
                => solid
                    => matter
                        => physical entity
                            => entity

Table 1: Hypernyms of bass in the sense of sea bass.

[Reviewer comment: Good that you show a table for illustration.]

WordNet also distinguishes between hyponyms that are types and hyponyms that are instances. A chair is a type of furniture. Hesse, however, is not a type of author, but an instance of author. An instance is thus a specific form of hyponym, and instances are usually proper nouns describing a unique entity, such as a person, a city or a company. Like the type relations, the instance relations go both ways.

[Reviewer comment: What do you mean by "they go both ways"?]
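The transitivity of the is-a relation described above can be sketched in a few lines of Python. This is a minimal illustration, not the WordNet database or any library's API: the dictionary `HYPERNYM` and the functions `hypernym_chain` and `is_a` are hypothetical names, the links are copied from Table 1, and each concept is assumed to have a single hypernym (real synsets can have more than one).

```python
# Toy is-a links copied from Table 1 (hypothetical structure, not WordNet itself).
# Assumption: every concept has at most one hypernym.
HYPERNYM = {
    "sea bass": "saltwater fish",
    "saltwater fish": "seafood",
    "seafood": "food",
    "food": "solid",
    "solid": "matter",
    "matter": "physical entity",
    "physical entity": "entity",
}


def hypernym_chain(concept):
    """Return all concepts reached by repeatedly following the is-a link."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def is_a(concept, ancestor):
    """Transitive is-a test: true if ancestor lies anywhere above concept."""
    return ancestor in hypernym_chain(concept)


print(hypernym_chain("sea bass"))
print(is_a("sea bass", "food"))   # sea bass is a food, via saltwater fish and seafood
print(is_a("food", "sea bass"))   # the relation does not hold in the other direction
```

Walking the chain upward is what makes "sea bass is a type of food" follow from the individual links, even though no direct link between the two is stored.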
Meronymy, the part-whole relationship, is divided into three different types: member meronymy, part meronymy and substance meronymy. Just as hypernymy has hyponymy as its counterpart, meronymy has a counterpart, holonymy. Where meronymy is has-part, holonymy is part-of. And just like hyponymy, meronymy is a transitive relationship: if a tree has branches, and a branch has leaves, the tree has leaves. Part meronymy, which is the relationship most commonly associated with the word, describes parts of an entity.

[Reviewer comment: Is the sentence marked in blue an explanation/example for meronyms?]

Substance meronymy describes substances contained in an entity. For example, the word water in the sense of the chemical substance H2O has the substances hydrogen and oxygen. The last type, member meronymy, describes the relationship of belonging to a larger group. Looking at the word tree again, we can see that it is a member of the entity forest, wood and woods. See Table 2 for examples of the different types of meronymy.

Part meronymy:
Sense 1
tree
    HAS PART: stump, tree stump
    HAS PART: crown, treetop
    HAS PART: limb, tree branch
    HAS PART: trunk, tree trunk, bole
    HAS PART: burl

Substance meronymy:
Sense 1
water, H2O
    HAS SUBSTANCE: hydrogen, H, atomic number 1
    HAS SUBSTANCE: oxygen, O, atomic number 8

Member meronymy:
Sense 1
forest, wood, woods
    HAS MEMBER: underbrush, undergrowth, underwood
    HAS MEMBER: tree

Table 2: Different types of meronymy used in WordNet.

[Reviewer comment: Very good table. Before, I didn't totally understand the different types, but this helps a lot!]

Antonyms are words that are semantically opposed. If you are someone's parent, you cannot at the same time be that person's child. However, antonyms do not have to exhaust the possibilities between them. Even though poor and rich are antonyms, saying that someone is not rich does not automatically mean that they are poor.

[Reviewer comment: I am not sure the example of parent and child is so good. Here it sounds as if YOU are a parent, so YOU cannot be a child of someone, but of course everyone is a child of someone. I know what you mean by it, but only because I know what an antonym is. Maybe something like love/hate would be better, or you should add that they are antonyms because they have opposite characteristics.]

1.2.2 Verbs

Verbs, just like nouns, have the hypernym relationship. Where the counterpart to hypernyms in the case of nouns is called hyponyms, among verbs it is called troponyms. Hypernyms go from an event to a superordinate event, and troponyms from an event to a subordinate event. A troponym can also be described as expressing the manner in which something is done, which explains the difference in names. Antonymy also exists for verbs and functions the same way: stop is an antonym of start.

[Reviewer comment: Do you mean difference of senses?]

The third relation, entails, goes from an event to an event it entails. Entailment is used in semantics to describe a relationship between two sentences, where the truth of one sentence depends on the truth of the other: if A entails B and sentence A is true, then sentence B also has to be true. For example, take The criminal was executed (A) and The criminal is dead (B). If A is true, then B also has to be true. This is the kind of relationship described by entails in WordNet. If you snore, you are also sleeping, which is represented as an entailment relation between the two words, and thus there is an entails mapping from snore to sleep.
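The typed relations described in this section (the three kinds of meronymy and verb entailment) can be pictured as labeled edges between words. The sketch below is only an illustration under stated assumptions: the triples are taken from Table 2 and the snore/sleep example, while the relation labels (`has_part`, `part_of`, and so on) and the helper names `build_index` and `related` are hypothetical and do not correspond to WordNet's file format or to any library.

```python
from collections import defaultdict

# (source, relation, target) triples from Table 2 and the snore/sleep example.
TRIPLES = [
    ("tree", "has_part", "stump"),
    ("tree", "has_part", "crown"),
    ("tree", "has_part", "limb"),
    ("tree", "has_part", "trunk"),
    ("tree", "has_part", "burl"),
    ("water", "has_substance", "hydrogen"),
    ("water", "has_substance", "oxygen"),
    ("forest", "has_member", "underbrush"),
    ("forest", "has_member", "tree"),
    ("snore", "entails", "sleep"),
]

# Hypothetical labels for the holonym direction of each meronym relation.
# Entailment gets no inverse in this sketch, so it is stored one way only.
INVERSE = {
    "has_part": "part_of",
    "has_substance": "substance_of",
    "has_member": "member_of",
}


def build_index(triples):
    """Index targets by (word, relation), adding inverse (holonym) edges."""
    index = defaultdict(set)
    for source, relation, target in triples:
        index[(source, relation)].add(target)
        if relation in INVERSE:
            index[(target, INVERSE[relation])].add(source)
    return index


INDEX = build_index(TRIPLES)


def related(word, relation):
    """Return the sorted targets of one relation for one word."""
    return sorted(INDEX.get((word, relation), set()))


print(related("tree", "has_part"))    # part meronyms of tree
print(related("tree", "member_of"))   # member holonym: tree is a member of forest
print(related("snore", "entails"))    # snore entails sleep
print(related("sleep", "entails"))    # empty: sleeping does not entail snoring
```

Storing the inverse edge at build time mirrors the has-part/part-of pairing in the text, while leaving entailment one-directional reflects that snoring implies sleeping but not the reverse.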

1.2.3 Adjectives and adverbs

Adjectives are mostly organized in terms of antonymy. As in the case of nouns and verbs, antonyms are words whose meanings are semantically opposed. Like all words in WordNet, adjectives are also part of a synset. The other adjectives in a synset have their own antonyms, and these become indirect antonyms of the remaining members of the synset. The pertainym relation points from an adjective to the noun it was derived from. This is one of the relations that cross parts of speech, though there are a few rare cases in which it points to another adjective.

[Reviewer comment: There's an extra paragraph for cross relations; maybe put it there?]

The number of adverbs is quite small. This is because most adverbs in English are derived from adjectives. Those that do exist are organized mostly in the same way as adjectives, with antonyms. They also have a relation similar to the pertainym relation of adjectives, which is likewise a pointer across parts of speech and points to the adjective that the adverb was derived from.

[Reviewer comment: There are also the relations "similar to" and "also see", which connect, as the names say, similar adjectives that are not synonymous enough to be in one synset. You could explain that as well.]

1.2.4 Relations across part of speech

Most of the relations in WordNet are relations among words of the same part of speech. There are, however, some pointers that cross between the parts of speech. One has already been mentioned: pertainyms, which point from an adjective to the noun that it was derived from. Other than that, there are pointers to semantically similar words which share the same stem, called derivationally related forms. For many of these noun-verb pairs, the thematic role is also described. The verb kill has a pointer to the noun killer, and killer would be the agent of kill.

2. Using WordNet for Natural Language Processing

There are several subfields in natural language processing which can benefit from having a large lexical database, especially one as big and extensive as WordNet. Many semantic applications can benefit from using WordNet, including WSD and sentiment analysis. Many papers exploring different approaches and algorithms have been published on WordNet and WSD, which is the main field in which WordNet is used. In fact, WordNet can be said to be the de facto standard knowledge source for WSD in English.[4] This success depends on several factors: it is not domain specific, it is very extensive, and it is publicly available.

[Reviewer comment: "which is the main field": what is "which"?]

Since WSD is the subfield which has used WordNet most extensively, it is the focus here.

[Reviewer comment: Good :)]

It is worth mentioning, though, that packages to access WordNet exist in several programming languages, including Perl and Python. For Python, the Natural Language Toolkit (NLTK), which offers many modules and tools to analyze and process natural language and is widely used, has tools for using WordNet, such as finding synsets and other relations between words.
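To make the idea of "finding synsets" concrete, here is a toy lemma-to-synset lookup. It is a sketch under loud assumptions: it does not call NLTK or read WordNet data, the `Synset` class, `TOY_SYNSETS` list and `synsets` function are hypothetical, and the glosses are illustrative wordings rather than WordNet's own. Only the grouping of sea bass and bass into one synset comes from Table 1, and only three of the eight noun senses of bass are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    pos: str        # part of speech: "n", "v", ...
    lemmas: tuple   # words that share this sense
    gloss: str      # illustrative wording, NOT copied from WordNet


# Three toy noun senses of "bass"; the real database lists more.
TOY_SYNSETS = [
    Synset("n", ("sea bass", "bass"), "a saltwater fish eaten as food"),
    Synset("n", ("bass",), "the lowest part of the musical range"),
    Synset("n", ("bass",), "a musical instrument with a low range"),
]


def synsets(lemma, pos=None):
    """Return every toy synset containing the lemma, optionally filtered by POS."""
    return [
        s for s in TOY_SYNSETS
        if lemma in s.lemmas and (pos is None or s.pos == pos)
    ]


for sense_number, synset in enumerate(synsets("bass"), start=1):
    print(sense_number, synset.lemmas, "-", synset.gloss)
```

The same word form appears in several synsets, one per sense, while a less ambiguous lemma such as sea bass maps to a single synset.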

2.1 WordNet for Word Sense Disambiguation

WSD is a field which has been around for as long as humans have tried to process natural language with computers. It has been described as an AI-complete problem and is considered to be an intermediate step in many NLP tasks. The two main approaches to solving this problem are knowledge-based methods and supervised methods.

Supervised methods suffer from sparseness of training data, in contrast to syntactic parsing, where there exist many resources of tagged data to work with. SemCor is a subset of the Brown Corpus, tagged with senses from WordNet. Of the 500 files that constitute the Brown Corpus, 186 have tags for all of the content words (nouns, verbs, adjectives and adverbs) and another 166 have tags for the verbs. Even if this may be sufficient for evaluation, it is not enough for building a robust system for WSD.[3]

[Reviewer comment: Why footnote 3 after 4?]

Knowledge-based methods use some kind of knowledge source, such as WordNet, to retrieve word senses. It is for these methods that WordNet has been used extensively.

[Reviewer comment: So are knowledge-based methods better?]

WordNet keeps occurring in papers on WSD to this date. Because knowledge-based methods using WordNet perform worse than supervised methods, approaches to extend the knowledge contained in WordNet have been proposed. They range from semantically tagging the glosses in WordNet in order to enrich the semantic relations, to extracting knowledge from Wikipedia. Combining WordNet with ConceptNet, a semantic network which contains semantic relations, has also been proposed as a way to improve performance.[5]

[Reviewer comment: First you say supervised is not robust enough, but then you say that knowledge-based methods perform worse. I am confused. Maybe you can make two smaller paragraphs: supervised approach and knowledge-based approach. There you can explain what supervised/knowledge-based is, how they use WordNet, what advantages and disadvantages they have, and how the use of WordNet has to be improved in each approach.]

3. Discussion

WordNet is an impressive database, with its large number of words and its encoding of the relations between them. Being freely available also makes it very practical to use for natural language processing, as it has been. However, there are quite a few things that may speak against it. The very fine-grained distinctions in the database can be problematic for several tasks. Difficulty, for example, has four different senses in WordNet, all of them very similar, and they can be hard to tell apart, not just for computers but also for humans. As such, not all sense distinctions may be relevant when disambiguating a word. Other problems may be that it was mainly annotated and tagged by humans, which may produce some inconsistencies, and that it was not produced specifically to solve NLP tasks.

WordNet is still widely used by people working in semantic natural language processing, as becomes clear when reading papers, specifically those on WSD. This can be seen in recent research, where WordNet has not been abandoned, but has instead been used in combination with other resources, or has been the target of attempts at improvement. Since WordNet 3.0, it also contains a corpus of semantically annotated, disambiguated glosses, which can itself prove to be useful.[8] WordNet will be used for WSD for some time to come, mostly because of the sparseness of data for supervised methods. Improving the lexical knowledge and the algorithms that use it may be the best way forward for the time being.

[Reviewer comment: Suggested rewordings: "and since WordNet also contains a corpus ..., it can itself prove to be ..."; "may be the best way to find a good solution for the NLP tasks/WSD". Will WordNet be used mainly for WSD?]
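The paper does not describe a specific knowledge-based algorithm, so the following is only a sketch of the general idea from section 2.1, using simplified gloss overlap (in the spirit of the Lesk algorithm) as a stand-in: each sense is scored by how many content words its gloss shares with the context. Everything here is an assumption made for illustration: the sense labels, the glosses in `TOY_SENSES`, the `STOPWORDS` set, and the function names. None of it is WordNet data.

```python
# Small hand-picked stopword list (assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "for", "and", "to", "is", "as", "it"}

# Illustrative glosses for three senses of "bass"; NOT copied from WordNet.
TOY_SENSES = {
    "bass_fish": "a saltwater fish caught and eaten as food",
    "bass_tone": "the lowest part of the musical range",
    "bass_instrument": "a musical instrument with a low range",
}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def disambiguate(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    Ties are broken by dictionary order, which is arbitrary in this sketch.
    """
    context_words = content_words(context)
    scores = {
        sense: len(context_words & content_words(gloss))
        for sense, gloss in senses.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate("he caught a fish in the sea and cooked it as food", TOY_SENSES)
print(best, scores)
```

With glosses this short, most senses score zero for most contexts, which illustrates why enriching the knowledge in WordNet (for example through tagged glosses) is proposed as a way to improve knowledge-based methods.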

[Reviewer comment: Sorry, now I get your footnote system :) Forget about the footnote comments.]

Bibliography

[1] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross & Katherine Miller, Introduction to WordNet: An On-line Lexical Database (1993)
[2] Daniel Jurafsky & James H. Martin, Speech and Language Processing (Pearson Education International, 2009)
[3] Eneko Agirre & Philip Edmonds, Word Sense Disambiguation: Algorithms and Applications (Springer, 2006)
[4] Roberto Navigli, Word Sense Disambiguation: A Survey (2009)
[5] Junpeng Chen & Juan Liu, Combining ConceptNet and WordNet for Word Sense Disambiguation (2011)
[6] Jorge Morato, Miguel Ángel Marzal, Juan Lloréns & José Moreiro, WordNet Applications (2003)
[7] Julian Szymański & Włodzisław Duch, Annotating Words Using WordNet Semantic Glosses (2012)

[Reviewer comment: I liked your discussion a lot, especially what the problems of WordNet are. It would be nice if you could extend or restructure the "WordNet for WSD" part, so one can understand what these two methods are, what the difference between them is, how WordNet is used in these approaches, and which method performs better. I also like your tables with examples for the WordNet relations. Have a merry Christmas :)]


More information

DISA at ImageCLEF 2014: The search-based solution for scalable image annotation

DISA at ImageCLEF 2014: The search-based solution for scalable image annotation DISA at ImageCLEF 2014: The search-based solution for scalable image annotation Petra Budikova, Jan Botorek, Michal Batko, and Pavel Zezula Masaryk University, Brno, Czech Republic {budikova,botorek,batko,zezula}@fi.muni.cz

More information

Rubrics & Checklists

Rubrics & Checklists Rubrics & Checklists fulfilling Common Core s for Third Grade Narrative Writing Self-evaluation that's easy to use and comprehend Scoring that's based on Common Core expectations Checklists that lead students

More information

Level 2 l Intermediate

Level 2 l Intermediate 1 Warmer What kinds of food do people often throw away? Do you waste food? Why? Why not? 2 Key words Complete the sentences using these key words from the text. The paragraph numbers are given to help

More information

Combining Contextual Features for Word Sense Disambiguation

Combining Contextual Features for Word Sense Disambiguation Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, Philadelphia, July 2002, pp. 88-94. Association for Computational Linguistics. Combining

More information

The Oxford Learner s Dictionary of Academic English

The Oxford Learner s Dictionary of Academic English ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

MODIFIERS. There are many different types of modifiers. Let's begin by taking a look at the most common ones.

MODIFIERS. There are many different types of modifiers. Let's begin by taking a look at the most common ones. MODIFIERS A modifier is a word, phrase, or clause that describes another word or word group. Many types of words and phrases can act as modifiers, such as adjectives, adverbs, and prepositional phrases.

More information

Clustering of Polysemic Words

Clustering of Polysemic Words Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,

More information

A Survey on Product Aspect Ranking Techniques

A Survey on Product Aspect Ranking Techniques A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian

More information

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania htd@linc.cis.upenn.edu October 30, 2003 Outline English sense-tagging

More information

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,

More information

Non-exam Assessment Tasks

Non-exam Assessment Tasks SPECIMEN MATERIAL ENTRY LEVEL CERTIFICATE STEP UP TO ENGLISH Silver Step 5972/1 Component 1 Literacy Topics Planning the Prom Non-exam Assessment Task and Teachers Notes Specimen 2015 Time allowed: 1 hour

More information

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Ling 201 Syntax 1. Jirka Hana April 10, 2006 Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all

More information

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process

More information

Stock Market Prediction Using Data Mining

Stock Market Prediction Using Data Mining Stock Market Prediction Using Data Mining 1 Ruchi Desai, 2 Prof.Snehal Gandhi 1 M.E., 2 M.Tech. 1 Computer Department 1 Sarvajanik College of Engineering and Technology, Surat, Gujarat, India Abstract

More information

An empirical study of semantic similarity in WordNet and Word2Vec. A Thesis

An empirical study of semantic similarity in WordNet and Word2Vec. A Thesis An empirical study of semantic similarity in WordNet and Word2Vec A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of

More information

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013

Taxonomies for Auto-Tagging Unstructured Content. Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 Taxonomies for Auto-Tagging Unstructured Content Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 About Heather Hedden Independent taxonomy consultant, Hedden

More information

5. Develop two test questions based on the first chapter:

5. Develop two test questions based on the first chapter: Reading Notes: Chapter One (pgs. 1 16) Introduction While reading, we will pause to make some observations. These observations are intended to improve your ability to see and interpret key ideas and events

More information

Language Meaning and Use

Language Meaning and Use Language Meaning and Use Raymond Hickey, English Linguistics Website: www.uni-due.de/ele Types of meaning There are four recognisable types of meaning: lexical meaning, grammatical meaning, sentence meaning

More information

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided Categories Categories According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided into 1 2 Categories those that belong to the Emperor embalmed

More information

SPELLING WORD #1: SENTENCE:

SPELLING WORD #1: SENTENCE: ACTIVITY 1: SENTENCES: Use each spelling word in a third grade sentence. (Underline the spelling word.) Ex. I know how to spell each word because I did my homework. SPELLING WORD #1: ACTIVITY 2: SYLLABLES:

More information

ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education

ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education International Journal of Latest Research in Science and Technology Vol.1,Issue 4 :Page No.364-368,November-December (2012) http://www.mnkjournals.com/ijlrst.htm ISSN (Online):2278-5299 EDUCATION BASED

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Learning Translation Rules from Bilingual English Filipino Corpus

Learning Translation Rules from Bilingual English Filipino Corpus Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

Nouns are naming words - they are used to name a person, place or thing.

Nouns are naming words - they are used to name a person, place or thing. Adjectives Adjectives are describing words - they tell you more about nouns. Nouns are naming words - they are used to name a person, place or thing. Adjectives tell you more about the noun. Using adjectives

More information

SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING

SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING Doctor of Philosophy Dissertation Márton Miháltz Supervisor: Gábor Prószéky, D.Sc. Multidisciplinary Technical Sciences

More information

Rubrics & Checklists

Rubrics & Checklists Rubrics & Checklists fulfilling Common Core s for Third Grade Opinion Writing Self-evaluation that's easy to use and comprehend Scoring that's based on Common Core expectations Checklists that lead students

More information

Teaching Vocabulary to Young Learners (Linse, 2005, pp. 120-134)

Teaching Vocabulary to Young Learners (Linse, 2005, pp. 120-134) Teaching Vocabulary to Young Learners (Linse, 2005, pp. 120-134) Very young children learn vocabulary items related to the different concepts they are learning. When children learn numbers or colors in

More information

Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base

Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base Aitor González Agirre, Egoitz Laparra, German Rigau Basque Country University Donostia, Basque Country {aitor.gonzalez,

More information

Word Taxonomy for On-line Visual Asset Management and Mining

Word Taxonomy for On-line Visual Asset Management and Mining Word Taxonomy for On-line Visual Asset Management and Mining Osmar R. Zaïane * Eli Hagen ** Jiawei Han ** * Department of Computing Science, University of Alberta, Canada, zaiane@cs.uaberta.ca ** School

More information

Semantic annotation of requirements for automatic UML class diagram generation

Semantic annotation of requirements for automatic UML class diagram generation www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute

More information

Word Sense Disambiguation as an Integer Linear Programming Problem

Word Sense Disambiguation as an Integer Linear Programming Problem Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University

More information

A Serious Game for Building a Portuguese Lexical-Semantic Network

A Serious Game for Building a Portuguese Lexical-Semantic Network A Serious Game for Building a Portuguese Lexical-Semantic Network Mathieu Mangeot LIG-GETALP & Université de Savoie (France) 41 rue des mathématiques, BP 53 F-38041 GRENOBLE CEDEX 9 FRANCE mathieu.mangeot@imag.fr

More information

TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary. Anja Drame TermNet

TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary. Anja Drame TermNet TERMINOGRAPHY and LEXICOGRAPHY What is the difference? Summary Anja Drame TermNet Summary/ Conclusion Variety of language (GPL = general purpose SPL = special purpose) Lexicography GPL SPL (special-purpose

More information

The Lexicon. The Lexicon. The Lexicon. The Significance of the Lexicon. Australian? The Significance of the Lexicon 澳 大 利 亚 奥 地 利

The Lexicon. The Lexicon. The Lexicon. The Significance of the Lexicon. Australian? The Significance of the Lexicon 澳 大 利 亚 奥 地 利 The significance of the lexicon Lexical knowledge Lexical skills 2 The significance of the lexicon Lexical knowledge The Significance of the Lexicon Lexical errors lead to misunderstanding. There s that

More information

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,

More information

Checklist for Recognizing Complete Verbs

Checklist for Recognizing Complete Verbs Checklist for Recognizing Complete Verbs Use the following six guidelines to help you determine if a word or group of words is a verb. 1. A complete verb tells time by changing form. This is the number

More information

You should read this chapter if you need to review or learn about

You should read this chapter if you need to review or learn about CHAPTER 4 Using Adjectives and Ads Correctly Do I Need to Read This Chapter? You should read this chapter if you need to review or learn about Distinguishing between adjectives and ads Comparing with adjectives

More information

Words with Attitude. Jaap Kamps and Maarten Marx. Abstract. 1 Introduction

Words with Attitude. Jaap Kamps and Maarten Marx. Abstract. 1 Introduction Words with Attitude Jaap Kamps and Maarten Marx Abstract The traditional notion of word meaning used in natural language processing is literal or lexical meaning as used in dictionaries and lexicons. This

More information

The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON

The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON This lesson: 1/ Grammar 2/ Speaking: about ideas and graphs 3/ Reading: for gist and detail 4/ Writing: make a poster

More information

Enriching a formal ontology with a thesaurus: an application in the cultural heritage domain

Enriching a formal ontology with a thesaurus: an application in the cultural heritage domain Enriching a formal ontology with a thesaurus: an application in the cultural heritage domain Roberto Navigli, Paola Velardi Dipartimento di Informatica, Università La Sapienza, Italy navigli,velardi@di.uniroma1.it

More information

SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE

SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE OBJECTIVES the game is to say something new with old words RALPH WALDO EMERSON, Journals (1849) In this chapter, you will learn: how we categorize words how words

More information

Self-Monitoring in Social Networks

Self-Monitoring in Social Networks Self-Monitoring in Social Networks Amin Anjomshoaa 1, Khue Vo Sao 1, Amirreza Tahamtan 1, A Min Tjoa 1, Edgar Weippl 2 1 Institute of Software Technology and Interactive Systems, Vienna University of Technology

More information

Chapter 10 Paraphrasing and Plagiarism

Chapter 10 Paraphrasing and Plagiarism Source: Wallwork, Adrian. English for Writing Research Papers. New York: Springer, 2011. http://bit.ly/11frtfk Chapter 10 Paraphrasing and Plagiarism Why is this chapter important? Conventions regarding

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Visualizing WordNet Structure

Visualizing WordNet Structure Visualizing WordNet Structure Jaap Kamps Abstract Representations in WordNet are not on the level of individual words or word forms, but on the level of word meanings (lexemes). A word meaning, in turn,

More information

VCOP. Vocabulary, Connectives, Openers and Punctuation - Helping your child with V.C.O.P at home

VCOP. Vocabulary, Connectives, Openers and Punctuation - Helping your child with V.C.O.P at home Vocabulary, Connectives, Openers and Punctuation - VCOP Helping your child with V.C.O.P at home Throughout the school, the children are involved in activities that help them to gain more knowledge about

More information

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged

More information

Cambridge English: Advanced Speaking Sample test with examiner s comments

Cambridge English: Advanced Speaking Sample test with examiner s comments Speaking Sample test with examiner s comments This document will help you familiarise yourself with the Speaking test for Cambridge English: Advanced, also known as Certificate in Advanced English (CAE).

More information

Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora

Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora Jing-Shin Chang Department of Computer Science& Information Engineering National

More information

Rubrics & Checklists

Rubrics & Checklists Rubrics & Checklists fulfilling Common Core s for Fourth Grade Narrative Writing Self-evaluation that's easy to use and comprehend Scoring that's based on Common Core expectations Checklists that lead

More information

Table of Contents. Chapter No. 1 Introduction 1. iii. xiv. xviii. xix. Page No.

Table of Contents. Chapter No. 1 Introduction 1. iii. xiv. xviii. xix. Page No. Table of Contents Title Declaration by the Candidate Certificate of Supervisor Acknowledgement Abstract List of Figures List of Tables List of Abbreviations Chapter Chapter No. 1 Introduction 1 ii iii

More information

Using DEB Services for Knowledge Representation within the KYOTO Project

Using DEB Services for Knowledge Representation within the KYOTO Project Using DEB Services for Knowledge Representation within the KYOTO Project Aleš Horák and Adam Rambousek Faculty of Informatics, Masaryk University Botanická 68a, 602 00 Brno, Czech Republic {hales,xrambous}@fi.muni.cz

More information