Annotation and Evaluation of Swedish Multiword Named Entities
Dimitrios Kokkinakis
Department of Swedish, the Swedish Language Bank
University of Gothenburg, Sweden
2 Introduction
- There is a considerable body of work in NER: a plethora of identification and classification techniques, NE taxonomies and resources.
- Likewise, there is a wide variety of work on MWEs, a key problem for the development of large-scale, linguistically sound NLP technologies (Sag et al., 2002): typology, detection, function, applications.
- Considerably less focus has been placed on the intersection of the two: the nature, complexity and magnitude of multiword named entities (MWE-NEs) and their evaluation.
- Here, we evaluate two Swedish NER systems on gold-standard data in order to provide insights into the magnitude and usage of such expressions in modern Swedish corpora.
3 MWE-NEs and their relation to NLP
- MWE-NEs are composed of more than one token (possibly including combinations of characters and numerals). For some of them, the meaning cannot be traced back to their individual parts (Vincze et al., 2011), e.g. New York Yankees.
- It is therefore justifiable to treat such expressions as single syntactic and/or semantic entities in, e.g., treebanks (Bejček & Straňák, 2010).
- NLP applications need to treat MWE-NEs as single objects, for instance for:
  - improving parsing accuracy (Nivre and Nilsson, 2004)
  - improving question answering (McCord et al., 2012)
  - improving machine translation (Tan and Pal, 2014)
  - better translation quality (Hurskainen, 2008)
  - improving multilingual IR (Vechtomova, 2012)
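One simple way to make a downstream component see an MWE-NE as a single object is to collapse its tokens into one unit before processing. The following is a minimal Python sketch. It is not the procedure of any of the works cited above. It assumes BIO tags of the form B-type/I-type/O, and the underscore joiner is an arbitrary, hypothetical choice.

```python
def merge_mwe_nes(tokens, tags, joiner="_"):
    """Collapse each named entity into a single token.

    tokens: list of word forms; tags: parallel list of BIO tags
    ("B-inst", "I-inst", "O"); the tag format is an assumption.
    """
    merged = []
    current = []  # tokens of the entity currently being built
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current:
            current.append(token)  # continue the open entity
            continue
        if current:
            merged.append(joiner.join(current))  # close the previous entity
            current = []
        if tag.startswith(("B-", "I-")):
            current = [token]  # open a new entity (a stray I- also opens one)
        else:
            merged.append(token)  # token outside any entity
    if current:
        merged.append(joiner.join(current))  # entity at the end of the sentence
    return merged


tokens = ["De", "såg", "New", "York", "Yankees", "spela", "."]
tags = ["O", "O", "B-inst", "I-inst", "I-inst", "O", "O"]
print(merge_mwe_nes(tokens, tags))  # ['De', 'såg', 'New_York_Yankees', 'spela', '.']
```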
4 Swedish Evaluation Corpora
- SUC3.0 (the Stockholm-Umeå Corpus, v. 3.0) is a freely available Swedish gold-standard corpus that can be used for the evaluation of MWE-NE recognition. SUC3.0 distinguishes 9 NE types: person, work, event, product, inst[itution], place, myth, other and animal. These 9 entity types have been manually annotated according to the TEI P3 guidelines.
- SIC (the Stockholm Internet Corpus) contains Swedish blog posts, automatically annotated with part of speech and NEs (13,562 tokens).
- Swedish Wikipedia: 28 randomly selected articles (16,069 tokens).
5 Swedish Evaluation Corpora
- SUC3.0: 9,884 MWE-NEs (roughly 30% of all NEs in the corpus), found in ~7,530 corpus lines (~ tokens); no (MWE) time expressions are annotated*.
- SIC: only 34 MWE-NEs (plus 18 MWE time expressions).
- Swedish Wikipedia articles: 223 MWE-NEs and 222 MWE time expressions. Purpose: SUC3.0 does not contain annotated time expressions, an important category often discussed in the context of NER.
*Temporal expressions: absolute temporal expressions, relative temporal expressions and durations.
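Counts such as the number of MWE-NEs and their share of all NEs come from counting entity spans of two or more tokens. Below is a minimal Python sketch of that count. It assumes the annotation is available as BIO tag sequences (B-type/I-type/O). That tag format is an assumption for illustration, not the native SUC3.0 format.

```python
def entity_spans(tags):
    """Return (type, start, end) spans from a BIO tag list (end is exclusive).

    An I- tag whose type differs from the open entity starts a new entity.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            spans.append((etype, start, i))  # close the open entity
            start, etype = None, None
        if prefix in ("B", "I") and not continues:
            start, etype = i, label  # open a new entity
    return spans


def mwe_share(sentences):
    """Return (number of MWE-NEs, number of all NEs, MWE-NE share in %)."""
    total = mwe = 0
    for tags in sentences:
        for _, start, end in entity_spans(tags):
            total += 1
            if end - start >= 2:  # multiword entity
                mwe += 1
    return mwe, total, (100.0 * mwe / total if total else 0.0)


sentences = [["B-person", "I-person", "O", "B-place"],
             ["O", "B-inst", "I-inst", "I-inst"]]
print(mwe_share(sentences))
```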
6 Swedish Evaluation Corpora: MWE-NEs by length

Corpus | MWE-NE type | # 2-token entities | %* | # >2-token entities | %*
SUC3.0 | person | | (58.7%) | | (4.6%)
SUC3.0 | place | | (5.3%) | | (0.9%)
SUC3.0 | institution | | (11.3%) | | (4.1%)
SUC3.0 | other | | (3.3%) | | (1.5%)
SUC3.0 | work | | (4.2%) | | (6.1%)
SIC+SW | person | | (11.7%) | | (3%)
SIC+SW | place | | (9.4%) | 1 | 2.1% (0.2%)
SIC+SW | institution | 57 | 76% (11.5%) | 18 | 24% (3.6%)
SIC+SW | other | | (3.2%) | | (2%)
SIC+SW | work | | (3.2%) | | (3.8%)
SIC+SW | time | | (20.5%) | | (27.8%)

Available from: < and <
*Percentage within the NE type. The percentage in parentheses is relative to all MWE-NEs in the respective gold-standard corpus.
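The table splits each type into 2-token and longer entities and gives two percentages: one within the type and one relative to all MWE-NEs in the corpus. A minimal Python sketch of that computation follows. It assumes the entities are already available as (type, length) pairs. In the usage example, the placeholder type "rest" is hypothetical and stands for all other MWE-NEs. It brings the total to the 497 MWE-NEs and MWE time expressions reported for SIC and the Wikipedia sample (34 + 18 + 223 + 222).

```python
from collections import defaultdict


def length_table(mwe_nes):
    """mwe_nes: list of (ne_type, n_tokens) pairs, each with n_tokens >= 2.

    Returns {ne_type: (n_2tok, pct_of_type, pct_of_all,
                       n_longer, pct_of_type, pct_of_all)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [2-token count, >2-token count]
    for ne_type, n_tokens in mwe_nes:
        if n_tokens < 2:
            raise ValueError("not a multiword entity")
        counts[ne_type][0 if n_tokens == 2 else 1] += 1
    total = len(mwe_nes)
    table = {}
    for ne_type, (two, longer) in counts.items():
        per_type = two + longer
        table[ne_type] = (
            two, 100 * two / per_type, 100 * two / total,
            longer, 100 * longer / per_type, 100 * longer / total,
        )
    return table


# Institution row of SIC+SW: 57 two-token and 18 longer entities out of 497.
mwe_nes = [("institution", 2)] * 57 + [("institution", 3)] * 18 + [("rest", 2)] * 422
row = length_table(mwe_nes)["institution"]
print([round(x, 1) for x in row])  # [57, 76.0, 11.5, 18, 24.0, 3.6]
```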
7 SUC3.0 Pre-processing
- The NE annotation of SUC3.0 is not completely homogeneous with respect to NE content.
- The two Swedish NER taggers are trained on a simplified version of SUC3.0 that uses 4 entity types, namely person, organization, location and miscellaneous:
  - product, myth, event, animal and other are merged into the miscellaneous category;
  - institution is mapped to organization, and place to location.
- Moreover, since SUC3.0 does not provide annotation for date or time expressions, we manually annotated 28 randomly chosen Swedish Wikipedia articles for this part of the evaluation.
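The type simplification above is a plain lookup table. Below is a minimal Python sketch, assuming BIO tags of the form "B-inst". The slide does not list work, so this sketch passes unlisted types through unchanged. How the taggers actually handled work is not stated.

```python
# SUC3.0 NE type -> type used to train the two taggers (as listed on this slide).
SUC_TO_TAGGER = {
    "person": "person",
    "inst": "organization",
    "place": "location",
    "product": "miscellaneous",
    "myth": "miscellaneous",
    "event": "miscellaneous",
    "animal": "miscellaneous",
    "other": "miscellaneous",
}


def simplify_tag(tag):
    """Map a BIO tag such as "B-inst" to "B-organization".

    Types missing from the table (e.g. "work") are passed through unchanged.
    """
    if tag == "O":
        return tag
    prefix, _, ne_type = tag.partition("-")
    return f"{prefix}-{SUC_TO_TAGGER.get(ne_type, ne_type)}"


print([simplify_tag(t) for t in ["B-inst", "I-inst", "O", "B-myth", "B-work"]])
```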
8 SUC3.0 Pre-processing
- For the purpose of the experiment, SUC3.0 had to be filtered and harmonized before the entities could be evaluated:
  - a number of person annotations included the vocation or other features as part of the entity, e.g. President in President George Bush (SUC3.0 file aa08c-019);
  - animal (68) and myth (18) were merged into the category person;
  - because of discrepancies in the SUC3.0 annotation, the generic NE type other was made to include the entities of type product (208), event (93) and other (174).
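The evaluation-side merges can be checked with a few lines of arithmetic. This minimal Python sketch treats the figures in parentheses as MWE-NE counts, which is an assumption. It is supported by the fact that product + event + other = 475, the number the other-b row of the SUC3.0 results is based on.

```python
from collections import Counter

# Type merges applied before evaluation, as described on this slide.
EVAL_MERGE = {"animal": "person", "myth": "person",
              "product": "other", "event": "other", "other": "other"}


def harmonize(type_counts):
    """Merge SUC3.0 NE-type counts into the evaluation categories.

    Types without an entry (person, place, inst, work) keep their own name.
    """
    merged = Counter()
    for ne_type, n in type_counts.items():
        merged[EVAL_MERGE.get(ne_type, ne_type)] += n
    return merged


# Counts quoted on this slide; the original person count is not given here.
counts = {"animal": 68, "myth": 18, "product": 208, "event": 93, "other": 174}
merged = harmonize(counts)
print(merged["person"], merged["other"])  # 86 475
```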
9 Evaluation
- All annotated texts were converted to the CoNLL data format (columns separated by a single space), and the conlleval script < was then used to evaluate the automatic NER.
- Tagging scheme:
  - B stands for Begin(ning of an entity)
  - I stands for Inside
  - O stands for Outside: tokens that are not part of an entity are tagged O
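conlleval reads one token per line and compares the last two columns (gold tag, then predicted tag), with a blank line between sentences. Below is a minimal Python sketch that writes that format from entity spans. The span representation and the B-type/I-type tag names are assumptions for illustration. Tokens must not contain spaces, since a single space is the column separator.

```python
def spans_to_bio(n_tokens, spans):
    """spans: list of (ne_type, start, end) with end exclusive."""
    tags = ["O"] * n_tokens
    for ne_type, start, end in spans:
        tags[start] = f"B-{ne_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{ne_type}"
    return tags


def to_conll(sentences):
    """sentences: list of (tokens, gold_spans, predicted_spans).

    Emits one "token gold predicted" line per token, single-space separated,
    with a blank line after each sentence.
    """
    lines = []
    for tokens, gold, pred in sentences:
        gold_tags = spans_to_bio(len(tokens), gold)
        pred_tags = spans_to_bio(len(tokens), pred)
        for token, g, p in zip(tokens, gold_tags, pred_tags):
            lines.append(f"{token} {g} {p}")
        lines.append("")  # sentence boundary
    return "\n".join(lines)


sentence = (["mottagningen", "på", "sandvikens", "sjukhus"],
            [("inst", 2, 4)], [("place", 2, 4)])
print(to_conll([sentence]))
```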
10 Comparison and Evaluation 1 (SUC3.0)

Tag | P | P Stagger | R | R Stagger | FB1* | FB1 Stagger | Gold data (SUC3.0), based on
person-b | 95.80% | 98.85% | 90.28% | 96.40% | 92.96% | 97.61% | 6,264
person-i | 93.90% | 98.04% | 88.46% | 95.33% | 91.10% | 96.66% | 6,795
place-b | 94.74% | 97.36% | 78.26% | 89.48% | 85.71% | 93.28% | 619
place-i | 89.20% | 96.92% | 73.71% | 81.39% | 80.72% | 88.48% | 741
inst-b | 93.35% | 97.46% | 64.79% | 88.23% | 76.49% | 92.62% | 1,521
inst-i | 90.39% | 96.44% | 62.73% | 81.85% | 74.06% | 88.55% | 2,130
work-b | 70.73% | 81.31% | 25.47% | 60.86% | 37.45% | 69.61% | 1,022
work-i | 54.27% | 80.39% | 20.15% | 48.92% | 29.39% | 60.83% | 2,513
other-b | 89.68% | 93.64% | 62.73% | 80.63% | 73.82% | 86.65% | 475
other-i | 80.80% | 95.41% | 56.29% | 74.32% | 66.35% | 83.55% | 675

* FB1 = 2*P*R/(P+R)
** conlleval: < by Erik Tjong Kim Sang
*** Note: Stagger was trained on the SUC3.0 NEs!
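FB1 is the harmonic mean of precision and recall. The parentheses around P+R matter: read literally, 2*P*R/P+R would evaluate to 3R. Below is a minimal Python check that recomputes FB1 for a few rows of the SUC3.0 table. A small tolerance is needed because the reported P and R are themselves rounded to two decimals.

```python
import math


def fb1(p, r):
    """F-score with beta = 1: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# (row, P, R, reported FB1), copied from the SUC3.0 table.
rows = [
    ("person-b", 95.80, 90.28, 92.96),
    ("place-b", 94.74, 78.26, 85.71),
    ("work-b", 70.73, 25.47, 37.45),
    ("person-b (Stagger)", 98.85, 96.40, 97.61),
]
for name, p, r, reported in rows:
    # Rounded inputs can shift the last digit, hence the tolerance.
    assert math.isclose(fb1(p, r), reported, abs_tol=0.02), name
```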
11 Comparison and Evaluation 2 (SW+SIC: Swedish Wikipedia + SIC)

Tag | P | P Stagger | R | R Stagger | FB1 | FB1 Stagger | Gold data (SW+SIC), based on
person-b | 75.78% | 49.6% | 89.04% | 84.93% | 81.76% | 62.63% | 73
person-i | 75.45% | 74.76% | 88.30% | 81.91% | 81.37% | 78.17% | 94
place-b | 78.57% | 47.06% | 68.75% | 16.67% | 73.33% | 24.62% | 48
place-i | 76.74% | 100% | 67.35% | 8.16% | 71.74% | 15.09% | 49
inst-b | 71.15% | 50% | 49.33% | 21.33% | 58.27% | 29.91% | 75
inst-i | 67.12% | 58.06% | 46.67% | 17.14% | 55.06% | 26.47% | 105
work-b | 66.67% | 12.50% | 38.71% | 5% | 48.98% | 7.14% | 20
work-i | 77.42% | 11.11% | 40% | 2.33% | 52.75% | 3.85% | 43
other-b | 64.29% | 50% | 30% | 4.88% | 40.91% | 8.89% | 41
other-i | 76.47% | 40% | 27.66% | 3.12% | 40.62% | 5.8% | 64
time-b | 91.03% | | 84.58% | | 87.69% | | 240
time-i | 98.21% | | 81.32% | | 88.97% | | 471
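Both tables score the -b and -i tags separately (e.g. place-b, place-i), so each token is one decision. Below is a minimal Python sketch of token-level, per-tag precision, recall and FB1, using the tables' tag names. The slides do not state which conlleval configuration was used. Whether this sketch reproduces their exact numbers is therefore an assumption.

```python
from collections import Counter


def per_tag_scores(gold, pred):
    """Token-level P, R and FB1 (in %) for every tag except "O".

    gold, pred: parallel lists of tags such as "place-b", "place-i", "O".
    """
    tp, n_pred, n_gold = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        n_gold[g] += 1
        n_pred[p] += 1
        if g == p:
            tp[g] += 1  # correctly tagged token
    scores = {}
    for tag in (set(n_gold) | set(n_pred)) - {"O"}:
        prec = 100 * tp[tag] / n_pred[tag] if n_pred[tag] else 0.0
        rec = 100 * tp[tag] / n_gold[tag] if n_gold[tag] else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f)
    return scores


gold = ["O", "inst-b", "inst-i", "O", "place-b"]
pred = ["O", "place-b", "place-i", "O", "place-b"]
for tag, (p, r, f) in sorted(per_tag_scores(gold, pred).items()):
    print(f"{tag}: P={p:.2f} R={r:.2f} FB1={f:.2f}")
```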
12 Error Analysis, some observations
- The NE type work seems to be the most difficult MWE-NE type to identify. Usually there are no orthographic or other identifiable cues in the immediate context, and the use of common vocabulary makes things even more difficult, e.g.
  kk48-011: Vi hade tidigare spelat en komedi, <work>de båda direktörerna</work>. (We had previously played a comedy, The Two Directors.)
- The annotation is inconsistent, e.g. between work and inst; in both cases below the annotation should have been work:
  kk72-126: [...] efter artikeln i <inst>svenska Dagbladet</inst> [...] ([...] after an article in Svenska Dagbladet [...])
  while in kl the same entity is annotated as: [...] annonsen kommer i <work>svenska Dagbladet</work> [...] ([...] the advertisement is posted in Svenska Dagbladet [...])
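Inconsistencies like the Svenska Dagbladet case can be surfaced by grouping annotations by surface string and flagging strings that received more than one type. Below is a minimal Python sketch of such a check. The input layout (file id, string, type) is hypothetical. Case-insensitive matching is a design choice that also catches differently capitalized variants.

```python
from collections import defaultdict


def inconsistent_entities(annotations):
    """Find surface strings annotated with more than one NE type.

    annotations: iterable of (file_id, surface_string, ne_type).
    Returns {lowercased string: {ne_type: [file_ids]}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for file_id, surface, ne_type in annotations:
        seen[surface.lower()][ne_type].append(file_id)
    return {s: dict(types) for s, types in seen.items() if len(types) > 1}


annotations = [
    ("kk72-126", "svenska Dagbladet", "inst"),
    ("kl", "svenska Dagbladet", "work"),  # file id as given in the source
    ("jg05b-005", "sandvikens sjukhus", "inst"),
]
print(inconsistent_entities(annotations))
```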
13 Error Analysis, some observations
- The types inst and other exhibit very low recall, for various reasons:
  - systematic polysemy between organization and location, e.g. in file jg05b-005: [...] mottagningen på <inst>sandvikens sjukhus</inst> ([...] the reception at the Sandviken hospital), where the annotation produced by the NER system was place, which is probably correct;
  - other, less obvious reasons, e.g. in file he06d-002: I <other>konsum Huddinge centrum</other> är en torgyta intill [...] (In Konsum at Huddinge center there is a square area next to [...]), where the NER system's annotation was once again place.
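Errors such as inst or other being tagged as place are classification errors on correctly delimited entities. They can be counted separately from boundary errors. Below is a minimal Python sketch, assuming spans are (type, start, end) triples with document-level token offsets. The offsets in the example are illustrative only.

```python
from collections import Counter


def type_confusions(gold_spans, pred_spans):
    """Count (gold type, predicted type) pairs for entities whose boundaries match.

    Entities with differing boundaries are ignored, which isolates pure
    type (classification) errors from boundary errors.
    """
    pred_by_boundary = {(start, end): ne_type for ne_type, start, end in pred_spans}
    confusions = Counter()
    for ne_type, start, end in gold_spans:
        predicted = pred_by_boundary.get((start, end))
        if predicted is not None and predicted != ne_type:
            confusions[(ne_type, predicted)] += 1
    return confusions


gold = [("inst", 2, 4), ("other", 10, 13)]  # e.g. sandvikens sjukhus, konsum Huddinge centrum
pred = [("place", 2, 4), ("place", 10, 13)]
print(type_confusions(gold, pred))
```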
14 Conclusions
- We have presented an experiment to automatically annotate and evaluate Swedish MWE-NEs.
- The evaluation results show a large variation with respect to the NE type concerned, with the worst results found for the categories work and other.
- During the analysis of the SUC3.0 MWE-NEs we discovered inconsistencies and discrepancies that negatively affect the results. A new version of the annotation with these inconsistencies resolved could contribute to a much more reliable gold standard for Swedish NER (e.g. for training and/or testing).
