ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2
1 Faculty of Electrical-Electronic, Department of Computer Engineering, Yıldız Technical University, İstanbul, Turkey {gurkans,banu}@ce.yildiz.edu.tr
2 Faculty of Engineering and Natural Sciences, Department of Computer Engineering, İstanbul Bilgi University, İstanbul, Turkey [email protected]
ABSTRACT
Extraction of semantic relations from sources such as corpora, web pages and dictionary definitions is one of the most important issues in Natural Language Processing (NLP). Various methods have been used to extract semantic relations from these sources, and the pattern-based approach is one of the most popular among them. In this study, we propose a model to extract antonym pairs from a Turkish corpus automatically. Using a set of seeds, we automatically extract lexico-syntactic patterns (LSPs) for the antonym relation from the corpus. A reliability score is calculated for each pattern, and the most reliable patterns are used to generate new antonym pairs. The study is conducted only on adjective-adjective and noun-noun pairs. Noun and adjective target words are used to measure the success of the method, and candidate antonyms are generated using the reliable patterns. For each pair consisting of a candidate antonym and a target word, an antonym score is calculated. Pairs whose score exceeds a threshold are labeled as antonym pairs. The proposed method shows good performance with 77.2% average accuracy.
KEYWORDS
Natural Language Processing, Semantic relations, Antonym, Pattern-based approach
1. INTRODUCTION
Extraction of semantic relation pairs from a corpus is one of the most popular topics in NLP. Hyponymy, hypernymy, meronymy, holonymy, synonymy and antonymy are examples of semantic relations. Several resources are used to acquire semantic relations. WordNet [1] is one of the most important sources for semantic relations.
WordNet is a lexical database for English that consists of a large number of words and the links among them. Since each word is represented by a set of synonymous words called a synset, synonymy can be regarded as the main relation of WordNet. Apart from synonymy, words are connected to each other via semantic relation links such as hyponymy, hypernymy, meronymy, holonymy and antonymy. Words in WordNet are grouped into four categories: noun, adjective, verb and adverb.
One of the most important semantic relations in WordNet is antonymy. Antonymy represents a contrast in sense between two words. In fact, there is no exact consensus on the definition of antonymy. According to domain experts, some pairs such as good-bad and hot-cold represent a good antonymy relation, but pairs such as north-south and woman-man do not exactly represent antonymy. This makes it difficult to detect opposite pairs. In addition, studies have shown that synonyms and antonyms occur with similar context words, which makes it harder to distinguish antonyms from synonyms.
In this study, we propose a pattern-based model to extract antonym pairs from a Turkish corpus. Only lexico-syntactic patterns are used to find antonym pairs. Noun-noun and adjective-adjective initial antonym seeds are prepared, and antonym patterns are extracted using these seeds. Patterns with a sufficiently high reliability score are selected to generate new antonym pairs from the corpus.
The rest of this paper is organized as follows: Section 2 presents related work. Extraction of antonym patterns and extraction of new antonym pairs are explained in Section 3 and Section 4, respectively. Finally, we present experimental results in Section 5.
Jan Zizka et al. (Eds) : ICAITA, SAI, CDKP, Signal, NCO pp , CS & IT-CSCP 2015 DOI : /csit
Computer Science & Information Technology (CS & IT)
2. RELATED WORKS
Patterns have been widely used to extract semantic relations from corpora. The most popular pattern-based study was made by Hearst [2] in 1992. Hearst used patterns such as "such X as Y" to extract hyponyms from a corpus. In this pattern, X and Y represent the hypernym and the hyponym, respectively.
The experiments showed that hyponyms can be extracted from a corpus with high accuracy using a small number of patterns.
Various studies have been conducted on the extraction of antonym pairs. Lobanova (2010) [3] prepared a set of initial adjective-adjective antonym pairs and generated the antonym patterns that occur with these pairs in a large Dutch corpus. Lobanova then used the generated patterns to extract new antonym pairs from the corpus. This process was repeated iteratively: at each iteration, new antonym patterns were generated using the initial pairs, and the new antonym pairs were then used to extract further antonym patterns. At each iteration, only reliable antonym pairs and patterns were selected. This prevented a sharp drop in the accuracy of the generated antonym pairs.
Turney (2008) [4] used a corpus-based supervised classification method to separate antonyms from synonyms. Only patterns obtained from the corpus were used as features, with the co-occurrence frequency between a pair and a pattern as the feature value. Support Vector Machines (SVM) were used as the classification algorithm. To measure the success of the method, English as a second language (ESL) questions were used, and 75.0% classification accuracy was obtained for antonym pair classification.
Lin (2003) [5] manually prepared patterns such as "from X to Y" and "either X or Y" to discriminate synonyms from antonyms. It was observed that antonym pairs occur with these two patterns very frequently, whereas synonym pairs occur with them rarely.
Mohammad (2008) [6] developed an unsupervised method that uses a degree of antonymy to discriminate antonym pairs. According to the definition of the degree of antonymy, the higher the degree of a pair, the more strongly the pair represents antonymy. Mohammad used corpus statistical features together with the category words of an antonym dictionary. On the test pairs, 80.0% accuracy was obtained for antonym pairs.
For Turkish, there are some studies that extract semantic relation pairs from corpora and dictionary definitions [7], [8]. Hyponym-hypernym [9], [10], meronym-holonym [11] and synonym [12] pairs have been automatically extracted from Turkish corpora. For hyponym-hypernym, meronym-holonym and synonym pairs, accuracies of 83.0%, 75.0% and 80.3% were obtained, respectively. Although there are some studies on antonym pair extraction from Turkish dictionary definitions, there is no study that uses a Turkish corpus and corpus-derived antonym patterns. Our main motivation is that no such corpus-based study has been carried out for Turkish before.
3. EXTRACTION OF ANTONYM PATTERNS FROM TURKISH CORPUS
LSPs are widely used to extract antonym relation pairs. In this study, antonym patterns are used to extract antonym pairs. Therefore, we first have to generate antonym patterns from the corpus. To extract Turkish antonym patterns, the following processing steps are applied.
The BOUN web corpus [13] was used as the source. The corpus consists of nearly 10 million sentences and 500 million words (tokens). First, we remove all punctuation and special characters from the corpus. The corpus is then parsed morphologically by the Zemberek Turkish NLP tool [14], and each word in the corpus is separated into its root, the part-of-speech tag of the root, and its suffixes. For a given word, Zemberek generates multiple parsing results, but only the first parsing result is used. Because our corpus is very large, the search process can take a long time.
For fast search operations, the morphologically parsed corpus is indexed by the Apache Lucene search tool [15], and the index file is used for all corpus search operations.
To find antonym patterns, we generate noun and adjective target words. The antonym equivalents of the target words are taken from the Turkish Antonym Dictionary [16]. The 184 antonym pairs, called initial seeds, are searched in the corpus index file, and the sentences that contain initial seeds are found. In these sentences, the initial seeds are replaced with the * (wildcard) character. We select patterns that have at most two words between the two * characters, and the others are removed. Thus, we ignore unproductive special patterns.
A reliability score is calculated for each pattern using the formula given in equation (1).
$$R_n = \frac{P}{T} \qquad (1)$$
In the formula, R_n represents the reliability score of pattern n. P is the total co-occurrence frequency of pattern n with the initial seeds. T represents the total co-occurrence frequency of pattern n with other antonym pairs (other seeds) in the corpus. In other words, the total co-occurrence frequency of the pattern with the initial seeds is divided by its total co-occurrence frequency with other seeds. For example, if pattern X occurs with the initial seeds 100 times and with other seeds T times in the corpus, the reliability score of X equals 100/T.
However, the reliability score alone may be misleading. If pattern X occurs with the initial seeds 7 times and with other seeds 10 times, its reliability score equals 7/10 = 0.7. Although the reliability score of X is high, X occurs with the initial seeds only 7 times. Because the co-occurrence frequency of X with the initial seeds is so low, pattern X has no importance in terms of productivity and generality. For this reason, we calculate the reliability score only for patterns that occur with the initial seeds more than 50 times, and other patterns are ignored.
The number of different initial seeds that occur with a pattern is also an important parameter for pattern reliability: the more different initial seeds occur with a pattern, the more reliable the pattern is. Assume that pattern X occurs with the initial seeds 100 times, but with only 5 different initial seeds. Likewise, pattern Y occurs with the initial seeds 100 times, but with 20 different initial seeds. If the total co-occurrence frequency of X and of Y with other seeds equals 1000, the reliability scores of both patterns equal 100/1000 = 0.1. Although the reliability scores of X and Y are equal, Y occurs with more different initial seeds than X. Hence, pattern Y is more general and productive than X. To tackle this problem, the reliability score is calculated only for patterns that occur with more than 20 different initial seeds, and other patterns are not assessed.
After calculating the reliability score for each pattern that satisfies the two conditions given above, all patterns are sorted by reliability score. Patterns with a reliability score greater than 0.02 are selected to generate new antonym pairs from the corpus.
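The pattern extraction and filtering steps of this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the `context` parameter (how many words are kept on each side of the wildcards) and the shape of the count dictionaries are assumptions made for the example, while the thresholds (at most two words between wildcards, more than 50 seed co-occurrences, more than 20 distinct seeds, reliability above 0.02) come from the text.

```python
MAX_GAP = 2              # at most two words between the two wildcards
MIN_SEED_HITS = 50       # pattern must occur with initial seeds more than 50 times
MIN_DISTINCT_SEEDS = 20  # ... and with more than 20 different initial seeds
MIN_RELIABILITY = 0.02   # reliability threshold from the text


def extract_pattern(tokens, a, b, context=1):
    """Replace seed words a and b with '*' and return the pattern string.

    `context` (hypothetical) is the number of words kept on each side.
    Returns None if a seed is missing or the gap exceeds MAX_GAP.
    """
    if a not in tokens or b not in tokens:
        return None
    i, j = sorted((tokens.index(a), tokens.index(b)))
    if i == j or j - i - 1 > MAX_GAP:
        return None
    left = max(0, i - context)
    right = min(len(tokens), j + 1 + context)
    parts = tokens[left:i] + ["*"] + tokens[i + 1:j] + ["*"] + tokens[j + 1:right]
    return " ".join(parts)


def reliable_patterns(seed_hits, other_hits):
    """Score patterns with R = P / T and keep the reliable ones.

    seed_hits:  {pattern: {seed_pair: count}}  (counts with initial seeds)
    other_hits: {pattern: count}               (T, counts with other seeds)
    """
    scores = {}
    for pattern, per_seed in seed_hits.items():
        p = sum(per_seed.values())
        t = other_hits.get(pattern, 0)
        if p <= MIN_SEED_HITS or len(per_seed) <= MIN_DISTINCT_SEEDS or t == 0:
            continue  # too rare, too few distinct seeds, or undefined ratio
        r = p / t
        if r > MIN_RELIABILITY:
            scores[pattern] = r
    # Sort by reliability, highest first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    print(extract_pattern("ne iyi ne de kötü".split(), "iyi", "kötü"))
```

With the sentence "ne iyi ne de kötü" and the seed pair iyi-kötü, the sketch yields the pattern "ne * ne de *"; a sentence with three or more words between the seeds yields no pattern.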
Reliable antonym patterns are given in Table 1.
Table 1. Antonym patterns extracted from corpus using initial seeds
4. EXTRACTING NEW ANTONYM PAIRS USING PATTERNS
Using the antonym patterns in Table 1, antonym equivalents are extracted for a given target word. The processing steps of new antonym pair extraction are given below.
First, target words are determined, and the patterns generated by replacing one * character with the target word are searched in the corpus. The words corresponding to the remaining * character are extracted as candidates for the target word. Although reliable patterns are used, non-antonym pairs can also occur in these patterns. For this reason, the extracted words are treated only as candidate antonyms of the given target word. Within a pattern, an antonym pair can appear in two different orders, X-Y and Y-X. Thus, a given target word is searched in both positions, and the words in the other * position are recorded as candidates. For example, the target word iyi (good) is searched as follows:
Turkish patterns          English equivalents
iyi ve * arasındaki       between good and *
* ve iyi arasındaki       between * and good
ne iyi ne de *            neither good nor *
ne * ne de iyi            neither * nor good
After extracting the candidates of a target word, an antonym score is calculated for each pair consisting of the target and a candidate. Pairs whose antonym score exceeds a threshold are labeled as antonyms and the others are eliminated. To calculate the antonym scores of pairs, we used Lobanova's antonym score formula given in equation (2) [17].
$$P_x = 1 - \prod_{k=1}^{M}\left(1 - \frac{C_k}{T_k}\right) \qquad (2)$$
In the formula, P_x represents the antonym score for pair x. M is the number of reliable patterns, and C_k is the co-occurrence frequency of pair x with pattern k. T_k represents the co-occurrence frequency of pattern k with other seeds in the corpus.
5. EXPERIMENTAL RESULTS
To measure the success of the model, 196 noun and adjective target words are utilized. The target words were searched together with the reliable antonym patterns, and candidates were extracted. For each pair, the antonym score was calculated. Based on our observations, we set the minimum reliable antonym score to 0.3. When the minimum reliable antonym score is set below 0.3, the accuracy of the method falls sharply.
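The candidate search and scoring steps of Section 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the count dictionaries are hypothetical, and the score assumes that equation (2) has the product form P_x = 1 - prod_k (1 - C_k / T_k), which is a reading of the definitions of M, C_k and T_k rather than a confirmed formula. The 0.3 threshold is the one reported in the text.

```python
MIN_ANTONYM_SCORE = 0.3  # minimum reliable antonym score from the text


def queries_for(target, patterns):
    """Fill the target word into each wildcard slot in turn (X-Y and Y-X)."""
    queries = []
    for pattern in patterns:
        parts = pattern.split()
        slots = [i for i, tok in enumerate(parts) if tok == "*"]
        for slot in slots:
            filled = parts.copy()
            filled[slot] = target  # the other '*' stays open for the candidate
            queries.append(" ".join(filled))
    return queries


def antonym_score(pair_hits, pattern_totals):
    """Assumed form of equation (2): 1 - prod_k (1 - C_k / T_k).

    pair_hits:      {pattern: C_k}  co-occurrences of the pair with pattern k
    pattern_totals: {pattern: T_k}  co-occurrences of pattern k with other seeds
    """
    product = 1.0
    for pattern, t_k in pattern_totals.items():
        if t_k <= 0:
            continue  # skip patterns with no usable total
        c_k = pair_hits.get(pattern, 0)
        product *= 1.0 - min(c_k / t_k, 1.0)  # clamp so the score stays in [0, 1]
    return 1.0 - product


def select_antonyms(candidate_hits, pattern_totals):
    """Keep candidates whose antonym score is greater than the threshold."""
    kept = {}
    for candidate, hits in candidate_hits.items():
        score = antonym_score(hits, pattern_totals)
        if score > MIN_ANTONYM_SCORE:
            kept[candidate] = score
    return kept


if __name__ == "__main__":
    for q in queries_for("iyi", ["* ve * arasındaki", "ne * ne de *"]):
        print(q)
```

For the target word iyi and the two example patterns, `queries_for` produces the four search strings listed in the table above. Under the assumed product form, a pair that co-occurs with several reliable patterns accumulates a higher score than a pair seen with only one.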
For 45 out of 196 target words, our method proposed reliable antonym pairs with 77.2% average accuracy. The class of each pair was manually tagged by three native Turkish speakers. 21 target words, together with their candidates and English equivalents, are given in Table 2.
Table 2. Target words, candidates, antonym scores and pair classes
6. CONCLUSIONS
In this study, antonym pairs and patterns were automatically extracted from a Turkish corpus. Noun-noun and adjective-adjective seeds were created, and antonym patterns were generated using these seeds. After generating patterns from the initial seeds, a reliability score was calculated for each antonym pattern. The 11 patterns with a reliability score greater than 0.02 were selected to produce new antonym pairs. To measure the accuracy of the method, noun and adjective target words were used as test words. Using these targets with the antonym patterns, candidates were found for each target word. For each antonym pair, an antonym score was calculated. Pairs with an antonym score greater than 0.3 were labeled as antonyms and the others were eliminated. For 45 out of 196 target words, our method proposed reliable antonym pairs with 77.2% average accuracy.
This study has shown that Turkish antonym relation patterns can be extracted from a corpus easily using a set of manually created antonym seeds. Candidates for a given target word can also be extracted easily and with high accuracy using reliable antonym patterns. Because patterns are used to extract antonym pairs, the success of the method depends directly on how frequently the target co-occurs with the patterns in the corpus. This is a disadvantage of all pattern-based methods.
In further studies, we aim to use corpus statistical information together with patterns. Thus, antonym pairs that occur with patterns at low frequency can also be extracted from the corpus.
REFERENCES
[1] Christiane Fellbaum (1998, ed.) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
[2] Hearst, M.A.: Automatic Acquisition of Hyponyms from Large Text Corpora. In: 14th International Conference on Computational Linguistics, COLING 1992, Nantes, France, pp (1992)
[3] Lobanova, A., van der Kleij, T. and J. Spenader (2010). Defining antonymy: a corpus-based study of opposites by lexico-syntactic patterns. In: International Journal of Lexicography. Vol 23:
[4] Turney, P.D. (2008), A uniform approach to analogies, synonyms, antonyms, and associations, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp
[5] Dekang Lin, Shaojun Zhao, Lijuan Qin, and Ming Zhou (2003). Identifying synonyms among distributionally similar words. In Proceedings of IJCAI Acapulco, Mexico.
[6] Saif Mohammad, Bonnie Dorr, and Graeme Hirst (2008). Computing word-pair antonymy. In EMNLP, pages Association for Computational Linguistics.
[7] Yazıcı, E. ve Amasyalı, M.F., (2011). Automatic Extraction of Semantic Relationships Using Turkish Dictionary Definitions, EMO Bilimsel Dergi, İstanbul.
[8] Zeynep Orhan, İlknur Pehlivan, Volkan Uslan, Pınar Önder, Automated Extraction of Semantic Word Relations in Turkish Lexicon, Journal of Mathematical And Computational Applications, 16, 1, Jan. 2011, pp.13-22,
[9] Yildirim, S., Yildiz, T., (2012). Corpus-Driven Hyponym Acquisition for Turkish Language, CICLing 13th International Conference on Intelligent Text Processing and Computational Linguistics,
[10] Şahin, G., Diri, B., Yıldız T., Pattern and Semantic Similarity Based Automatic Extraction of Hyponym-Hypernym Relation from Turkish Corpus, 23rd Signal Processing and Communications Applications Conference, Malatya, Turkey, (16-19 May),
[11] Yıldız, T., Yıldırım, S., Diri, B., Extraction of Part-Whole Relations from Turkish Corpora, Computational Linguistics and Intelligent Text Processing, CICLing, Greece,
[12] Yıldız, T., Yıldırım, S., Diri, B., An Integrated Approach to Automatic Synonym Detection in Turkish Corpus, 9th International Conference on Natural Language Processing, PolTAL, Springer LNAI proceedings, Warsaw, Poland, (17-19 September),
[13] Sak, H., Güngör, T. and Saraçlar. M., (2008). Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus, The 6th International Conference on Natural Language Processing, GOTAL
[14]
[15]
[16]
[17] Lobanova, A., van der Kleij, T. and J. Spenader (2010). Defining antonymy: a corpus-based study of opposites by lexico-syntactic patterns. In: International Journal of Lexicography. Vol 23:
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,
Secure semantic based search over cloud
Volume: 2, Issue: 5, 162-167 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 Sarulatha.M PG Scholar, Dept of CSE Sri Krishna College of Technology Coimbatore,
Sentiment Classification. in a Nutshell. Cem Akkaya, Xiaonan Zhang
Sentiment Classification in a Nutshell Cem Akkaya, Xiaonan Zhang Outline Problem Definition Level of Classification Evaluation Mainstream Method Conclusion Problem Definition Sentiment is the overall emotion,
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
Third Grade Language Arts Learning Targets - Common Core
Third Grade Language Arts Learning Targets - Common Core Strand Standard Statement Learning Target Reading: 1 I can ask and answer questions, using the text for support, to show my understanding. RL 1-1
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education
International Journal of Latest Research in Science and Technology Vol.1,Issue 4 :Page No.364-368,November-December (2012) http://www.mnkjournals.com/ijlrst.htm ISSN (Online):2278-5299 EDUCATION BASED
Word Sense Disambiguation as an Integer Linear Programming Problem
Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track Yung-Chun Chang 1,2, Yu-Chen Su 3, Chun-Han Chu 1, Chien Chin Chen 2 and
Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources
Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle
Language Meaning and Use
Language Meaning and Use Raymond Hickey, English Linguistics Website: www.uni-due.de/ele Types of meaning There are four recognisable types of meaning: lexical meaning, grammatical meaning, sentence meaning
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
