An Integrated Approach to Automatic Synonym Detection in Turkish Corpus
Dr. Tuğba YILDIZ, Assist. Prof. Dr. Savaş YILDIRIM, Assoc. Prof. Dr. Banu DİRİ
İstanbul Bilgi University, Yıldız Technical University
Department of Computer Engineering
September 17, 2014
Outline
1. Semantic Relation
2. Experimental Setup
3. Methodology
4. Results and Evaluation
5. Future Work
6. Conclusion
Semantic Relation
- Semantic relations are underlying relations between two concepts expressed by words or phrases (Moldovan 2004).
- Views on the number of semantic relations are contradictory:
  - Uncountable (Murphy 2003)
  - 13 classes (Rosario 2001)
  - 17 classes (Stephens 2001)
  - 5 classes at the top and 30 classes at the bottom (Nastase 2003)
  - 35 classes (Moldovan 2004)
  - 7 classes (SemEval Task 4)
  - etc.
Semantic Relation
Five main families of relations:
- Hyponym/Hypernym (class inclusion)
- Meronym/Holonym (part-whole)
- Synonym (similars)
- Antonym (contrast)
- Case
Semantic Relation
In this study:
- Synonym (similars): synonyms are words with identical or similar meanings. Example: car is synonymous with automobile.
- Hyponym/Hypernym (class inclusion): a relation of inclusion (IS-A / kind of). Example: a dog is an animal (dog: hyponym; animal: hypernym).
- Meronym/Holonym (part-whole): a relation between a whole and its significant parts. Example: the eye is part of the face (eye: part; face: whole).
Related Works
- Methods are mostly based on distributional similarity.
- The distributional hypothesis holds that semantically similar words share similar contexts (Harris 1954).
- However, distributional similarity alone is insufficient for synonymy:
  - it also covers near-synonyms (e.g. the top 5 for orange are yellow, lemon, peach, pink, lime) (Lin et al. 2003)
  - it does not distinguish between synonyms and other relations
- Different strategies:
  - integrating two independent approaches, such as distributional similarity and pattern-based approaches
  - utilizing external features such as dependency relations
Related Works
- In Turkish, recent studies on synonym relations are based on dictionary definitions from TDK (Turkish Language Association) and Wiktionary.
- These studies rely on defined rules and phrasal patterns that occur in the dictionary definitions.
- Objective of this study: to determine synonymy in a Turkish corpus.
- The main contributions of our work:
  - it is corpus-driven
  - it relies on both dependency and semantic relations
Language Resources
- BOUN Web Corpus with 500M tokens (Sak et al. 2008), in four sub-corpora:
  - three of them, named NewsCor, are from three major Turkish news portals
  - GenCor is a general sampling of web pages in the Turkish language
- A morphological parser based on two-level morphology
- An averaged perceptron-based morphological disambiguator
- Preprocessing output format: surface+root+pos-tag+[and all other markers]
  - Turkish: araba; English: car
  - Example: arabanın+araba+noun+a3sg+pnon+gen
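The preprocessing format above can be read back into its fields with a few lines of Python. This is only a minimal sketch: the `Analysis` type and the `parse_analysis` function are hypothetical names, and it assumes that `+` never occurs inside a surface form, which the slides do not state.

```python
from typing import NamedTuple, Tuple


class Analysis(NamedTuple):
    """One disambiguated token in surface+root+pos-tag+markers form."""
    surface: str
    root: str
    pos: str
    markers: Tuple[str, ...]


def parse_analysis(token: str) -> Analysis:
    """Split a 'surface+root+pos+marker...' string into its fields.

    Assumption: '+' is used only as the field separator.
    """
    parts = token.split("+")
    if len(parts) < 3:
        raise ValueError(f"expected at least surface+root+pos, got: {token!r}")
    surface, root, pos, *markers = parts
    return Analysis(surface, root, pos, tuple(markers))


example = parse_analysis("arabanın+araba+noun+a3sg+pnon+gen")
print(example.root, example.pos, example.markers)
```

Working on the root (araba) rather than the surface form (arabanın) lets inflected variants of the same word be counted together.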
Methodology
- No particular LSPs (lexico-syntactic patterns) are used to extract synonymy.
- Our main assumption: synonym pairs mostly show similar dependency and semantic characteristics in a corpus. They share the same meronym/holonym relations, the same list of governing verbs, the same adjective modification profile, and so on.
- Examples:
  - Automobile is a vehicle / Car is a vehicle
  - Wheel is part of automobile / Wheel is part of car
Methodology
- Model: determine whether a given word pair is a synonym pair or not.
- Data: 200 synonym pairs (syn) and 200 non-synonym pairs (nonsyn), manually and randomly selected, form the training set.
- Non-synonym pairs are deliberately selected from associated (relevant) pairs such as tree-leaf, student-school, computer-game, etc.
- Features from the corpus: 15 different features for each word pair:
  - Co-occurrence (1)
  - Semantic relations based on LSPs (4)
  - Dependency relations based on syntactic patterns (10)
- Target class: SYN and NONSYN
Features from Corpus
- Feature 1, co-occurrence: the co-occurrence of the word pair within a broad context.
- Features 2/3, meronym/holonym: a large matrix in which rows are whole candidates and columns are part candidates.

Table 1. Whole x Part matrix
Whole/Part  Part1  Part2  ...  Partm
Whole1      val.   val.   ...  val.
Whole2      val.   val.   ...  val.
...         val.   val.   ...  val.
Wholen      val.   val.   ...  val.

Each cell represents how likely it is that the corresponding whole and part are in a meronymy relation.
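Feature 1 could be counted roughly as follows. The slides only say "within a broad context", so the token window and its default size here are assumptions, and `cooccurrence_count` is a hypothetical helper, not the authors' implementation.

```python
from typing import Iterable, Sequence


def cooccurrence_count(
    sentences: Iterable[Sequence[str]], w1: str, w2: str, window: int = 10
) -> int:
    """Count how often w1 and w2 occur within `window` tokens of each other.

    Each sentence is a sequence of (ideally root-form) tokens.
    """
    count = 0
    for tokens in sentences:
        pos1 = [i for i, t in enumerate(tokens) if t == w1]
        pos2 = [i for i, t in enumerate(tokens) if t == w2]
        count += sum(1 for i in pos1 for j in pos2 if abs(i - j) <= window)
    return count


# Toy corpus for illustration only.
corpus = [
    ["araba", "hızlı", "otomobil"],
    ["otomobil", "x", "x", "x", "araba"],
]
print(cooccurrence_count(corpus, "araba", "otomobil", window=2))
```

A wider window treats more distant mentions as co-occurring, which is closer to the "broad context" wording but also noisier.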
Features from Corpus
Table 2. General Patterns (GP), Dictionary-based Patterns (TDK-P) and Bootstrapped Patterns (BP)

GP:
- NPx part of NPy
- NPx member of NPy
- NPy constituted of NPx
- NPy made of NPx
- NPy consist of NPx
- NPy has/have NPx
- NPy with NPx

TDK-P:
- Group-of (whole, group, all) (set, flock, union)
- Member-of (class, member, team) (from the family of Y)
- Amount-of (amount, measure, unit)
- Has/Have (Y has l(h))
- Consist-of
- Made-of

BP:
- NPy-gen NPx-pos
- NPy-nom NPx-pos
- NPy-gen (N-ADJ)+ NPx-pos
- NPy of one-of NPx
- NPx whose NPy
- NPxs with NPy
Features from Corpus
- To measure the similarity of the meronymy/holonymy profiles of two given words, the cosine function is applied to the two rows/columns indexed by those words.

Table 3. Example of Whole x Part matrix
Whole/Part: Wheel ... Car ... Wiper Automobile ...
Car: X X
School:
Person:
Automobile: X X
Parking: X X
...
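The row comparison described above can be sketched with a sparse cosine function. The matrix values below are made up for illustration, and storing each row as a dictionary is an assumption about the representation, not something the slides specify.

```python
import math
from typing import Dict


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rows (part -> score)."""
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Hypothetical Whole x Part rows: whole -> {part: score}.
whole_part = {
    "car": {"wheel": 1.0, "wiper": 1.0},
    "automobile": {"wheel": 1.0, "wiper": 1.0},
    "school": {"classroom": 1.0},
}

print(cosine(whole_part["car"], whole_part["automobile"]))  # close to 1
print(cosine(whole_part["car"], whole_part["school"]))      # 0.0
```

Comparing two columns instead of two rows gives the corresponding feature for the other direction of the relation (the wholes that two words are part of).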
Features from Corpus
- Features 4/5, hyponym/hypernym: another matrix is built for hypernymy/hyponymy by applying LSPs. The same procedure is carried out as for meronym/holonym.
- LSPs used:
  - NPs gibi CLASS (CLASS such as NPs)
  - NPs ve diğer CLASS (NPs and other CLASS)
  - CLASS lardan NPs (NPs from CLASS)
  - NPs ve benzeri CLASS (NPs and similar CLASS)
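As a rough illustration, three of the patterns above can be approximated with regular expressions over raw text. This is a simplification: the study matches noun phrases on morphologically analysed text, whereas this sketch captures single words only, and the "CLASS lardan NPs" pattern is left out because its suffix varies with vowel harmony. `PATTERNS` and `extract_pairs` are hypothetical names.

```python
import re
from typing import List, Tuple

# Each pattern captures (instance, class) in that order.
PATTERNS = [
    re.compile(r"(\w+) gibi (\w+)"),        # NPs gibi CLASS
    re.compile(r"(\w+) ve diğer (\w+)"),    # NPs ve diğer CLASS
    re.compile(r"(\w+) ve benzeri (\w+)"),  # NPs ve benzeri CLASS
]


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (hyponym candidate, hypernym candidate) pairs found in text."""
    pairs: List[Tuple[str, str]] = []
    for pattern in PATTERNS:
        pairs.extend(pattern.findall(text))
    return pairs


print(extract_pairs("elma gibi meyveler"))
print(extract_pairs("kedi ve diğer hayvanlar"))
```

Counting such matches over the corpus would fill the cells of the hypernymy matrix in the same way as the Whole x Part matrix.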
Features from Corpus
- Features 6-15, dependency relations: obtained by syntactic patterns.
- The more two words are modified by the same adjectives, the more likely they are synonyms.
- Examples: fast car (+) / fast automobile (+) / handsome car (X) / handsome automobile (X)
- 36 different patterns were extracted; 8 were eliminated because of poor results. The remaining patterns were then grouped according to their syntactic structures.
Features from Corpus
Table 4. Dependency features (feature: dependency relation; number of patterns; example)
- G1: direct object of verb; 13 patterns; I drive a car (araba sürüyorum)
- G2: subject of verb; 3 patterns; waiting car (bekleyen araba)
- G3: direct object/subject of verb
- G4: modified by adjective+(with/without); 2 patterns; car with gasoline (benzinli araba)
- G5: modified by inf; 1 pattern; swimming pool (yüzme havuzu)
- G6: modified by noun; 1 pattern; toy car (oyuncak araba)
- G7: modified by adjective; 1 pattern; red car (kırmızı araba)
- G8: modified by acronym locations; 1 pattern; the cars in ABD (ABD'deki arabalar)
- G9: modified by proper noun locations; 1 pattern; the cars in Istanbul (Istanbul'daki arabalar)
- G10: modified by locations; 2 patterns; the car at parking lot (otoparktaki araba)
Results and Evaluation
- All features explained above are computed for all pairs (instances).
- The computed scores of the pairs are kept as the training set.
- Machine learning algorithms are applied, taking all features into account.
- The most suitable algorithm was logistic regression.
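The classification step can be illustrated with a small logistic regression trained by batch gradient descent. This is a sketch under stated assumptions, not the authors' setup: the slides do not name a toolkit, the two-feature toy data is invented (the real model uses 15 features per pair), and the learning rate and epoch count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.5, epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (SYN) or 0 (NONSYN)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Invented scores: [holonym-profile cosine, adjective-profile cosine].
X = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.9], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = [1, 1, 1, 0, 0, 0]  # 1 = SYN, 0 = NONSYN
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])
```

In practice an off-the-shelf implementation with cross-validation would be the more usual choice; the hand-written version is shown only to keep the example self-contained.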
Results and Evaluation
- The first aim is to find out which feature is the most informative for detecting synonymy and contributes most to the overall success of the model.

Table 5. F-measure of semantic relations (SRs)
Features: co-occurrence, hyponym, hypernym, meronym, holonym
F-Measure

- The semantic features are notably better than the dependency features.
- Among the semantic relations, the most powerful attributes are the meronymy and holonymy features.
- A possible reason is the sufficient number of cases matched by the lexico-syntactic patterns.
Results and Evaluation
- Among the dependency relations, G1, G4 and G7 perform better.

Table 6. F-measure of dependency relations
Features: G1, G2, G3, G4, G5, G6, G7, G8, G9, G10
F-Measure

- The poorest groups, G8 and G9, have low production capacity.
- The successful features are linearly dependent on the target class.
- On the aggregated data, where all useful features are considered, logistic regression achieves an F-measure of 80.3%.
- This score is better than the individual performance of each feature.
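For reference, the F-measure reported above is the harmonic mean of precision and recall. The sketch below computes it for one positive class; whether the reported scores are per-class or averaged over SYN and NONSYN is not stated in the slides, so that choice is an assumption here.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the given positive class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
print(f_measure([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```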
Done
- In our paper: dictionary definitions, WordNet, and other useful resources could be used and evaluated in future work.
- In our review: WordNet seems a good resource that could improve the results even more.
Conclusion
- Semantic relations are one of the problems in NLP applications.
- Automatic acquisition of semantic relations has become more popular in recent years.
- Synonym pairs cannot be discovered simply by applying LSPs; here they are detected thanks to semantic and syntactic relations.
- The most powerful semantic relations: meronym and holonym.
- The most powerful syntactic relation: G4, modified by adjective+(with/without).
- Aggregation of useful features gives promising results.
Thank You For Listening...
More informationAutomatically populating an image ontology and semantic color filtering
Automatically populating an image ontology and semantic color filtering Christophe Millet, Gregory Grefenstette, Isabelle Bloch, Pierre-Alain Moëllic, Patrick Hède CEA/LIST/LIC2M 18 Route du Panorama,
More informationUsing a Treebank for Finding Opposites
Using a Treebank for Finding Opposites Anna Lobanova Gosse Bouma University of Groningen Artificial Intelligence Dept. E-mail: a.lobanova@ai.rug.nl Erik Tjong Kim Sang University of Groningen Information
More informationSentiment Analysis of Movie Reviews and Twitter Statuses. Introduction
Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about
More informationPersonalized Healthcare Cloud Services for Disease Risk Assessment and Wellness Management using Social Media. Abstract
Personalized Healthcare Cloud Services for Disease Risk Assessment and Wellness Management using Social Media Assad Abbas North Dakota State University, USA assad.abbas@ndsu.edu Mazhar Ali COMSATS Institute
More informationTHE knowledge needed by software developers
SUBMITTED TO IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 Extracting Development Tasks to Navigate Software Documentation Christoph Treude, Martin P. Robillard and Barthélémy Dagenais Abstract Knowledge
More informationSemantic Structure Matching for Assessing Web-Service Similarity
Semantic Structure Matching for Assessing Web- Service Similarity Yiqiao Wang and Eleni Stroulia Computer Science Department, University of Alberta, Edmonton, AB, T6G 2E8, Canada {yiqiao,stroulia}@cs.ualberta.ca
More informationArchitecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
More informationANNLOR: A Naïve Notation-system for Lexical Outputs Ranking
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France annlor@limsi.fr Cyril Grouin LIMSI-CNRS rue John von Neumann 91400
More informationCHAPTER-6 DATA WAREHOUSE
CHAPTER-6 DATA WAREHOUSE 1 CHAPTER-6 DATA WAREHOUSE 6.1 INTRODUCTION Data warehousing is gaining in popularity as organizations realize the benefits of being able to perform sophisticated analyses of their
More informationA Serious Game for Building a Portuguese Lexical-Semantic Network
A Serious Game for Building a Portuguese Lexical-Semantic Network Mathieu Mangeot LIG-GETALP & Université de Savoie (France) 41 rue des mathématiques, BP 53 F-38041 GRENOBLE CEDEX 9 FRANCE mathieu.mangeot@imag.fr
More informationSentiment-Oriented Contextual Advertising
Teng-Kai Fan Department of Computer Science National Central University No. 300, Jung-Da Rd., Chung-Li, Tao-Yuan, Taiwan 320, R.O.C. tengkaifan@gmail.com Chia-Hui Chang Department of Computer Science National
More informationDomain Ontology Based Class Diagram Generation From Functional Requirements
International Journal of Computer Information Systems and Industrial Management Applications. ISSN 2150-7988 Volume 6 (2014) pp. 227-236 MIR Labs, www.mirlabs.net/ijcisim/index.html Domain Ontology Based
More information