Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger
- Alisha Clarke
Transcription
1 Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger
Yahoo! Research Barcelona & Toyota Technological Institute at Chicago
3rd Mini Workshop on Web Mining
8 Semantic tagging
Pseudo-definition: automatically identify instances of pre-defined semantic categories or templates
  Clara Harris[person], one of the guests in the box, stood up and demanded water. (NER)
  NS-Meg cells[cell] expressed mRNA[rna] for the EPO receptor[protein]. (Bio-NER)
Challenges in a domain-independent context:
  suitable broad-coverage ontologies and data
  semi-supervised word sense disambiguation
GOAL: Clara Harris[person], one of the guests[person] in the box[artifact], stood up[motion] and demanded[communication] water[substance].
9 Outline
1 Introduction
11 Semantic tagging: NER and WSD
General: named entity recognition (NER)
  Simple ontologies: 3-4 categories (person, location, organization, time, etc.)
  High accuracy on several kinds of data: newswire, biomedical, etc.
  Useful (
  Limited semantic/syntactic coverage
Specific: word sense disambiguation (WSD)
  Wordnet: tens of thousands of specific word senses
  All open-class words covered, domain-independent
  Insufficient performance: at the level of the (first sense) baseline
14 Supersense tagging: an intermediate tagging task
1 Simplify the ontology (Wordnet): noun and verb synsets mapped to 41 general semantic classes (supersenses)
  partial disambiguation
  manageable tagset
2 Adopt state-of-the-art learning methods
  structured learning
  discriminative HMM
3 Results: a step forward in WSD accuracy, extensive NE information
17 Wordnet supersenses
Wordnet 2.0: 11,306 verbs (13,508 synsets), 114,648 nouns (79,689 synsets)
Synsets mapped to 26 noun and 15 verb supersenses
Applications:
  Lexical acquisition (Ciaramita & Johnson, 2003; Curran, 2005)
  Intermediate disambiguation step in supervised WSD (Ciaramita et al., 2003)
  Design of latent categories for parse re-ranking (Koo & Collins, 2005)
20 Advantages of the supersense tagset
Tagset size: small enough to adopt structured learning methods
Extensive semantic coverage:
  Clara Harris[person], one of the guests[person] in the box[artifact], stood up[motion] and demanded[communication] water[substance]
Partial disambiguation through sense merging, e.g., bark:
  1 plant cover - noun.plant
  2 sound made by a dog - noun.event
  3 sound resembling bark-2 - noun.event
  4 sailing ship - noun.artifact
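The sense-merging idea can be illustrated with a minimal Python sketch. It uses only the four senses of "bark" listed on the slide; the function name `merge_senses` and the data layout are assumptions made for illustration, not part of the original system.

```python
from collections import defaultdict

# The four noun senses of "bark" and their supersenses, as listed on the slide.
BARK_SENSES = [
    ("plant cover", "noun.plant"),
    ("sound made by a dog", "noun.event"),
    ("sound resembling bark-2", "noun.event"),
    ("sailing ship", "noun.artifact"),
]


def merge_senses(senses):
    """Group fine-grained senses by supersense label (hypothetical helper)."""
    merged = defaultdict(list)
    for gloss, supersense in senses:
        merged[supersense].append(gloss)
    return dict(merged)


merged = merge_senses(BARK_SENSES)
# Senses 2 and 3 share noun.event, so the tagger no longer has to tell them apart.
print(len(BARK_SENSES), "senses ->", len(merged), "supersenses")
```

Merging leaves fewer labels to choose from for the word, which is what the slide calls partial disambiguation.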
22 Supersenses: nouns - extends NER tagset
SUPERSENSE      NOUNS DENOTING
act             acts or actions
animal          animals
artifact        man-made objects
attribute       attributes of people and objects
body            body parts
cognition       cognitive processes and contents
communication   comm. processes and contents
event           natural events
feeling         feelings and emotions
food            foods and drinks
group           groupings of people or objects
location        spatial position
motive          goals
object          natural objects (not man-made)
quantity        quantities and units of measure
phenomenon      natural phenomena
plant           plants
possession      possession and transfer of possession
process         natural processes
person          people
relation        relations between people, things, ideas
shape           two and three dimensional shapes
state           stable states of affairs
substance       substances
time            time and temporal relations
Tops            abstract terms for unique beginners
23 Supersenses: verbs
SUPERSENSE      VERBS OF
body            grooming, dressing and bodily care
change          size, temperature change, intensifying
cognition       thinking, judging, analyzing, doubting
communication   telling, asking, ordering, singing
competition     fighting, athletic activities
consumption     eating and drinking
contact         touching, hitting, tying, digging
creation        sewing, baking, painting, performing
emotion         feeling
motion          walking, flying, swimming
perception      seeing, hearing, feeling
possession      buying, selling, owning
social          political/social activities, events
stative         being, having, spatial relations
weather         raining, snowing, thundering, etc.
24 Supersenses: B/I/O label encoding
Clara B-noun.person
Harris I-noun.person
, O
one O
of O
the O
guests B-noun.person
in O
the O
box B-noun.artifact
stood B-verb.motion
up I-verb.motion
and
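A minimal Python sketch of this encoding follows. It assumes annotations arrive as (start, end, label) token spans with an exclusive end index; that input format and the name `encode_bio` are illustrative assumptions, not taken from the slides.

```python
def encode_bio(tokens, spans):
    """Turn labelled token spans into one B/I/O tag per token.

    spans: list of (start, end_exclusive, supersense) tuples (assumed format).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the span
        for i in range(start + 1, end):     # remaining tokens of the span
            tags[i] = "I-" + label
    return tags


tokens = ["Clara", "Harris", ",", "one", "of", "the", "guests",
          "in", "the", "box", "stood", "up"]
spans = [(0, 2, "noun.person"), (6, 7, "noun.person"),
         (9, 10, "noun.artifact"), (10, 12, "verb.motion")]

for token, tag in zip(tokens, encode_bio(tokens, spans)):
    print(token, tag)
```

Multiword units such as "Clara Harris" and "stood up" receive a B- tag on the first token and I- tags on the rest, so a per-token sequence labeller can recover the span boundaries.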
25 Supersenses: annotated data
Data: Semcor (SEM/SEMv), Senseval-3 (SE3)
Polysemy info
Dataset counts: SE3 SEM SEMv
Tokens 5, , ,546
Avg-poly-N-WS
Avg-poly-N-SS
Avg-poly-V-WS
Avg-poly-V-SS
26 Sequential classification model
Goal: optimize the choice of labelling $y_i$ for word $w_i$, exploiting local label-to-label dependencies:
  Sense: $Y_1, Y_2, Y_3, \dots, Y_n$
  Word: $X_1, X_2, X_3, \dots, X_n$
Related work:
  Early work on semantic tagging with HMMs (Segond et al., 1997; de Loupy et al., 1998)
  Little work on current WSD using HMMs (Molina et al., 2002; 2004), not better than simpler methods
27 Perceptron-trained HMM tagger
Training examples: $(x_i, y_i)^N$ ($x$ and $y$ are vectors)
Representation: $\Phi$ maps $(x, y)$ pairs to a feature vector $\Phi(x, y) \in \mathbb{R}^d$
Discriminant function: $F(x) = \arg\max_{y \in Y} \langle \Phi(x, y), w \rangle$
Decoding computed with Viterbi
$w$ learned on the training data with an averaged perceptron (Collins, 2002)
One adjustable parameter: $T$ = number of epochs
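The training loop described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the feature map uses only two hypothetical indicator features (current word with label, previous label with label), the start symbol `<s>` is invented here, and averaging is done naively by summing the weights after every example.

```python
from collections import defaultdict


def local_features(words, i, prev_label, label):
    # Hypothetical minimal feature set: one emission and one transition indicator.
    return [("word", words[i], label), ("prev", prev_label, label)]


def global_features(words, tags):
    # Phi(x, y): sum of the local feature vectors along the sequence.
    feats = defaultdict(float)
    prev = "<s>"
    for i, tag in enumerate(tags):
        for f in local_features(words, i, prev, tag):
            feats[f] += 1.0
        prev = tag
    return feats


def viterbi(words, labels, w):
    # best[i][y] = (score of the best path ending in label y at position i, backpointer)
    def score(i, p, y):
        return sum(w.get(f, 0.0) for f in local_features(words, i, p, y))

    best = [{y: (score(0, "<s>", y), None) for y in labels}]
    for i in range(1, len(words)):
        best.append({y: max((best[i - 1][p][0] + score(i, p, y), p) for p in labels)
                     for y in labels})
    y = max(labels, key=lambda label: best[-1][label][0])
    path = [y]
    for i in range(len(words) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return path[::-1]


def train(data, labels, epochs):
    w = defaultdict(float)      # current weight vector
    total = defaultdict(float)  # running sum of weight vectors, for averaging
    steps = 0
    for _ in range(epochs):     # T on the slide
        for words, gold in data:
            pred = viterbi(words, labels, w)
            if pred != gold:    # perceptron update: w += Phi(x, gold) - Phi(x, pred)
                for f, v in global_features(words, gold).items():
                    w[f] += v
                for f, v in global_features(words, pred).items():
                    w[f] -= v
            for f, v in w.items():
                total[f] += v
            steps += 1
    return {f: v / steps for f, v in total.items()}


LABELS = ["O", "B-noun.person", "B-noun.substance", "B-verb.motion"]
DATA = [(["Clara", "stood"], ["B-noun.person", "B-verb.motion"]),
        (["water", "stood"], ["B-noun.substance", "B-verb.motion"])]
weights = train(DATA, LABELS, epochs=5)
print(viterbi(["Clara", "stood"], LABELS, weights))
```

Summing the full weight vector after every example is simple but slow for large feature sets; practical implementations usually keep per-feature timestamps instead, which yields the same average.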
28 Tagger features
1 Words: $x_i, x_{i-1}, x_{i+1}, x_{i-2}, x_{i+2}$
2 First sense: baseline prediction for $x_i$, $fs(x_i)$
3 Combined (1) and (2): $x_i + fs(x_i)$
4 Part-of-speech: $pos_i, pos_{i-1}, pos_{i+1}, pos_{i-2}, pos_{i+2}$; common (NN/NNS) / proper (NNP/NNPS) nouns, etc.
5 Word shape: regexp-like transformations, e.g., have -> x*, Clara -> Xx*, I.B.M. -> X.X.X.
6 Previous label: $y_{i-1}$
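The feature templates listed above could be extracted roughly as in the Python sketch below. The string format of the features, the `<pad>` symbol, the mapping of digits to `d`, and the `first_sense` callable are all assumptions made for illustration; only the three word-shape examples come from the slide.

```python
import re


def word_shape(word):
    """Regexp-like shape: map character classes, then collapse repeated symbols."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)      # digit class is an assumption
    return re.sub(r"(.)\1+", r"\1*", shape)   # "xxxx" -> "x*"


def token_features(words, pos_tags, i, first_sense, prev_label):
    """Features for position i; first_sense is a hypothetical callable (word, pos) -> label."""
    def at(seq, j):
        return seq[j] if 0 <= j < len(seq) else "<pad>"

    feats = []
    for offset in (-2, -1, 0, 1, 2):          # templates 1 and 4: +/-2 window
        feats.append(f"w[{offset}]={at(words, i + offset)}")
        feats.append(f"pos[{offset}]={at(pos_tags, i + offset)}")
    fs = first_sense(words[i], pos_tags[i])
    feats.append(f"fs={fs}")                              # template 2
    feats.append(f"w+fs={words[i]}+{fs}")                 # template 3
    feats.append(f"shape={word_shape(words[i])}")         # template 5
    feats.append(f"prev_label={prev_label}")              # template 6
    return feats


words = ["Clara", "Harris", "stood", "up"]
pos_tags = ["NNP", "NNP", "VBD", "RP"]
feats = token_features(words, pos_tags, 0, lambda w, p: "noun.person", "<s>")
print(feats)
```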
29 Supersense-annotated data
Data: Semcor (SEM/SEMv) and Senseval-3 (SE3), with Wordnet synset IDs substituted with supersense labels
Dataset counts: SE3 SEM SEMv
Sentences ,138 17,038
Tokens 5, , ,546
Supersenses 1, ,135 40,911
Verbs ,710 40,911
Nouns ,425 0
30 First sense baseline
1 Identify Wordnet entries (POS info):
  Clara Harris[N], one of the guests[N] in the box[N], stood up[V] and demanded[V] water[N]
2 Assign the most frequent sense according to Wordnet:
  Clara Harris[person], one of the guests[person] in the box[artifact], stood up[motion] and demanded[communication] water[substance]
Extremely competitive: at Senseval-3 (ACL 2004), 4/26 systems scored above the baseline (best +2.8%)
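The two baseline steps can be sketched in Python with a toy lexicon. The dictionary `TOY_FIRST_SENSE` below is a hypothetical stand-in for a lookup of the first-listed Wordnet sense, filled in only with the labels shown on the slide; the lemmatised entries are assumed to come from step 1.

```python
# Hypothetical stand-in for "most frequent sense according to Wordnet",
# keyed by (lemma, coarse POS); labels are the ones shown on the slide.
TOY_FIRST_SENSE = {
    ("guest", "N"): "noun.person",
    ("box", "N"): "noun.artifact",
    ("stand up", "V"): "verb.motion",
    ("demand", "V"): "verb.communication",
    ("water", "N"): "noun.substance",
}


def first_sense_baseline(entries, lexicon):
    """Step 2: label each identified entry with its first sense, or 'O' if unknown."""
    return [(lemma, lexicon.get((lemma, pos), "O")) for lemma, pos in entries]


# Step 1 output (assumed): Wordnet entries found in the sentence, with POS.
entries = [("guest", "N"), ("box", "N"), ("stand up", "V"),
           ("demand", "V"), ("water", "N"), ("veteran", "N")]

for lemma, label in first_sense_baseline(entries, TOY_FIRST_SENSE):
    print(lemma, label)
```

Because the baseline ignores context entirely, it depends on the POS tagging and entry identification in step 1 being correct.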
31 Evaluation on Semcor
Semcor: Method, Recall, Precision, F-score [σ]
Methods: Rand, Baseline, Supersense-Tagger
fold cross-validation: +10.7%, 31.2% error reduction
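The relationship between an absolute F-score gain and a relative error reduction can be made explicit with a short Python sketch. It assumes the usual definition, error = 100 - F-score and reduction = gain / baseline error; the slides do not state which definition was used, and both function names are illustrative.

```python
def relative_error_reduction(baseline_f, system_f):
    """Percentage of the baseline's error (100 - F) removed by the system."""
    baseline_error = 100.0 - baseline_f
    return 100.0 * (system_f - baseline_f) / baseline_error


def implied_baseline_error(gain_points, reduction_pct):
    """Baseline error implied by an absolute gain and a relative error reduction."""
    return gain_points / (reduction_pct / 100.0)


# Hypothetical scores, only to illustrate the definition.
print(relative_error_reduction(60.0, 70.0))

# Applied to the figures reported for Semcor (+10.7 points, 31.2% reduction).
print(round(implied_baseline_error(10.7, 31.2), 2))
```

Under this assumed definition, the reported gain and reduction together determine the baseline error, which gives a quick consistency check on reported results.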
32 Results discussion
% F-score, very promising (without multilabels)
Tagger improves both precision and recall remarkably
More robust than the baseline in identifying instances (depends less on POS info)
F-score on person/group/location/time: 82.5%, also considering common nouns
SEMv, used only as fragments, contributes 1% F-score
33 Evaluation on Senseval-3 all words
Senseval-3: Method, Recall, Precision, F-score [σ]
Methods: Rand, Baseline, SenseLearner, Supersense-Tagger
Train = SEM/SEMv, Test = SEM 5 trials: +6.45%, 17.96% error reduction
Senseval-3, best: +2.8%, 7.45% error reduction
SenseLearner (Mihalcea & Csomai, 2005): output mapped to supersenses
34 Results discussion
% F-score (without multilabels)
With out-of-vocabulary named entities (many false positives)
For the first time, a considerable improvement over the first sense baseline (and the best traditional WSD methods)
35 Conclusion
Domain-independent, broad-coverage semantic tagging approach based on Wordnet supersenses and a discriminative HMM tagger
Positive results on Semcor show the feasibility of the task
Considerable improvement over the best known WSD methods on novel data:
1 Simplified semantic representation: smaller number of (shared) senses
2 Structured learning approach: potential from more sophisticated approaches (e.g., kernels)
Contribution: beginning of a systematic investigation of extensive semantic annotation in IE/IR/NLP
36 Ongoing work
State of ontology and data:
  data: outdated, gaps: bat, veteran, home, age, like, cover, ...
  ontology: inconsistencies, biased distribution of senses
Adaptability: accuracy on news data seems good, although many words are unknown; degrades on noisier data (blogs)
Beyond WordNet: Wikipedia? Basic-level ontologies, to be expanded in specific domains?
Applications: preliminary IE experiments: supersense info more useful than CoNLL-NER info
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationSentiment analysis: towards a tool for analysing real-time students feedback
Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:
More informationUsing Semantic Data Mining for Classification Improvement and Knowledge Extraction
Using Semantic Data Mining for Classification Improvement and Knowledge Extraction Fernando Benites and Elena Sapozhnikova University of Konstanz, 78464 Konstanz, Germany. Abstract. The objective of this
More informationPOS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23
POS Def. Part of Speech POS POS L645 POS = Assigning word class information to words Dept. of Linguistics, Indiana University Fall 2009 ex: the man bought a book determiner noun verb determiner noun 1
More informationSemantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation
Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Denis Turdakov, Pavel Velikhov ISP RAS turdakov@ispras.ru, pvelikhov@yahoo.com
More informationMachine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
More informationRefining the most frequent sense baseline
Refining the most frequent sense baseline Judita Preiss Department of Linguistics The Ohio State University judita@ling.ohio-state.edu Josh King Computer Science and Engineering The Ohio State University
More informationA Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches
J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria 105 A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria
More informationInferring Probabilistic Models of cis-regulatory Modules. BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.
Inferring Probabilistic Models of cis-regulatory Modules MI/S 776 www.biostat.wisc.edu/bmi776/ Spring 2015 olin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following
More informationCENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
More informationWhy language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles
Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like
More informationSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data Apoorv Agarwal Boyi Xie Ilia Vovsha Owen Rambow Rebecca Passonneau Department of Computer Science Columbia University New York, NY 10027 USA {apoorv@cs, xie@cs, iv2121@,
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web
More informationAutomated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes
Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes Ingrid Andås Berg Healthcare Informatics Submission date: March 2014 Supervisor: Øystein Nytrø, IDI
More informationApplications of Named Entity Recognition in Customer Relationship Management Systems
Applications of Named Entity Recognition in Customer Relationship Management Systems Farbod Saraf Jadidian September 2014 Dissertation submitted in partial fulfilment for the degree of Master of Science
More informationOnline Latent Structure Training for Language Acquisition
IJCAI 11 Online Latent Structure Training for Language Acquisition Michael Connor University of Illinois connor2@illinois.edu Cynthia Fisher University of Illinois cfisher@cyrus.psych.uiuc.edu Dan Roth
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationPortage Guide Birth to Six Preschool Indicator 7 Child Outcomes Crosswalk. Outcome 2 Acquisition & Use of Knowledge & Skills
Portage Guide Birth to Six Preschool Indicator 7 Child Outcomes Crosswalk NPG Domains Outcome 1 Positive Social Emotional Skills Outcome 2 Acquisition & Use of Knowledge & Skills Outcome 3 Appropriate
More informationData Cleansing for Remote Battery System Monitoring
Data Cleansing for Remote Battery System Monitoring Gregory W. Ratcliff Randall Wald Taghi M. Khoshgoftaar Director, Life Cycle Management Senior Research Associate Director, Data Mining and Emerson Network
More informationComparing Ontology-based and Corpusbased Domain Annotations in WordNet.
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information
More informationComputer Standards & Interfaces
Computer Standards & Interfaces 35 (2013) 470 481 Contents lists available at SciVerse ScienceDirect Computer Standards & Interfaces journal homepage: www.elsevier.com/locate/csi How to make a natural
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
More informationWorkshop. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Engineering & Health Informa2on Science
Engineering & Health Informa2on Science Engineering NLP Solu/ons for Structured Informa/on from Clinical Text: Extrac'ng Sen'nel Events from Pallia've Care Consult Le8ers Canada-China Clean Energy Initiative
More informationAutomated Extraction of Security Policies from Natural-Language Software Documents
Automated Extraction of Security Policies from Natural-Language Software Documents Xusheng Xiao 1 Amit Paradkar 2 Suresh Thummalapenta 3 Tao Xie 1 1 Dept. of Computer Science, North Carolina State University,
More informationHow To Create A Text Classification System For Spam Filtering
Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar
More informationA Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts
A Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts Martin Scholz Friedrich-Alexander-University Erlangen-Nürnberg Digital Humanities Research Group Outline Motivation: information
More information