Extraction and Visualization of Protein-Protein Interactions from PubMed
1 Extraction and Visualization of Protein-Protein Interactions from PubMed
Ulf Leser
Knowledge Management in Bioinformatics, Humboldt-Universität Berlin
2 Finding Relevant Knowledge
Find information about
Much knowledge is in text (and only text)
Find articles with information about
- PubMed/Medline
- Which diseases is RAB5 associated with?
Find information about inside each article
- Reading many abstracts is tedious
- What about a "summarize results" button?
Ulf Leser: Visualizing PPI from text, SCAI Text Mining Symposium, 10/2006
3 Question
What is the risk of treating malaria patients who have a G6PD (glucose-6-phosphate dehydrogenase) deficiency with primaquine?
4 Use PubMed
5 Use AliBaba
6 Use AliBaba
7 Question
Which proteins are associated with RAB5?
10 Finding Relevant Knowledge
Find information about
Find articles with information about
- PubMed/Medline
- Which diseases is RAB5 associated with?
Find information about inside each article
- Reading many abstracts is tedious
- What about a "summarize results" button?
11 Overview
Why text mining for biomedical research
Extraction of protein-protein interactions from text
- Learning language patterns
- Pattern generalization
- Evaluation
AliBaba: Summarizing PubMed results
Vision
12 Possible Approaches to PPI
Co-occurrence
- Two proteins in one sentence -> PPI
- Tendency: Low precision, very good recall
Full sentence parsing
- Recognizes syntactic relationships between entities
- Extraction uses rules navigating the syntax tree
- Only ~30% of all sentences can be parsed unambiguously
  But recent developments (e.g. INFO-PUBMED, Rinaldi et al.)
- Tendency: Good precision, low recall
Pattern matching
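The co-occurrence baseline on this slide can be sketched in a few lines. This is a minimal illustration, not the system described in the talk: the function name `cooccurrence_ppi`, the whitespace tokenization, and the hand-written name list are all assumptions made for the example.

```python
from itertools import combinations


def cooccurrence_ppi(sentences, protein_names):
    """Return every unordered pair of known protein names that shares a sentence."""
    known = {name.lower() for name in protein_names}
    pairs = set()
    for sentence in sentences:
        # Naive tokenization: split on whitespace, strip surrounding punctuation.
        tokens = {tok.strip(".,;:()").lower() for tok in sentence.split()}
        found = sorted(tokens & known)
        pairs.update(combinations(found, 2))
    return pairs


example = [
    "FADD immediately activates procaspase-8.",
    "CUL-1 interacts with SKR-1, SKR-2, SKR-3, and SKR-10.",
]
names = ["FADD", "procaspase-8", "CUL-1", "SKR-1", "SKR-2", "SKR-3", "SKR-10"]
result = cooccurrence_ppi(example, names)
```

On the second sentence the sketch reports all ten pairs among the five names, including pairs such as SKR-1/SKR-2 that the sentence does not assert. That is the low-precision, high-recall tendency the slide describes.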
13 Relationship Mining
Language pattern
- Sentence
  GENE regulates expression of GENE
  GENE is strongly suppressed by GENE
- Adding part-of-speech
  GENE VRB NOM PRP GENE
  GENE is ADJ VRB PRP GENE
Different levels of generality
- GENE.* VRB.* GENE
  Simple rules, high recall, low precision
- GENE [is] ADJ? {regulat suppres} NOM? PRP GENE
  Complex rules, lower recall, higher precision
Balanced precision/recall requires many rules
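The two levels of generality can be approximated with regular expressions over a tagged sentence. The sketch below assumes each token is written as `stem/TAG`, that protein mentions are already collapsed to `GENE`, and that the braces in the complex rule denote alternatives between the two stems; none of these encodings come from the talk.

```python
import re

# Simple rule: two GENE mentions with any verb somewhere in between.
simple_rule = re.compile(r"GENE\b.*/VRB\b.*\bGENE")

# Complex rule: GENE [is] ADJ? {regulat|suppres} NOM? PRP GENE
complex_rule = re.compile(
    r"GENE( is/VRB)?( \S+/ADJ)? (regulat|suppres)\S*/VRB( \S+/NOM)? \S+/PRP GENE"
)

tagged = {
    "regulates": "GENE regulat/VRB expression/NOM of/PRP GENE",
    "suppressed": "GENE is/VRB strongly/ADJ suppres/VRB by/PRP GENE",
    "unrelated": "GENE was/VRB purified/VRB together/ADV with/PRP GENE",
}

simple_hits = {name: bool(simple_rule.search(s)) for name, s in tagged.items()}
complex_hits = {name: bool(complex_rule.search(s)) for name, s in tagged.items()}
```

The simple rule fires on all three sentences, including the made-up co-purification sentence, while the complex rule accepts only the two interaction sentences from the slide.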
14 State-of-the-Art
Most systems work on hand-crafted sets of patterns
- Hundreds of patterns
- Enormous effort
- Need to be created for any type of relationship
  Protein-protein, gene-disease, disease-drug, ...
Our idea
- Learn patterns automatically
15 Recall Bioinformatics
Protein families are often defined by patterns
How to find protein families?
- [Very simple method]
- Compute distances between protein sequences
  Alignment
- Find clusters of similar sequences
  E.g. using hierarchical clustering
- Build multiple sequence alignment for each cluster
  E.g. using ClustalW, DAlign, ...
- Compute profile for each MSA
From sequences (of AA) to sentences (of tokens)
16 AliBaba Workflow
[Workflow diagram with the stages: PubMed, IntAct, Protein pairs, Search sentences, Linguistic annotation, Initial patterns, Clustering, Alignment, Consensus pattern, Extracted PPI]
17 Initial Pattern
Extract all pairs of proteins from IntAct
- Only the names, not the evidence / links
- Gold standard: These interactions are assumed to be real
Find all sentences in PubMed
- Pair of proteins and interaction word
- FADD immediately activates procaspase-8
Extract core phrases
- Width: Parameter
- show that FADD immediately activates procaspase-8 during
Annotate with linguistic information
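Core-phrase extraction with a width parameter might look like the following sketch. It assumes that width counts tokens kept on each side of the protein pair; the function name `core_phrase` and the full example sentence are hypothetical, built around the fragment shown on the slide.

```python
def core_phrase(tokens, protein_positions, width):
    """Keep the span between the two proteins plus `width` tokens on each side."""
    start = max(0, min(protein_positions) - width)
    end = min(len(tokens), max(protein_positions) + width + 1)
    return tokens[start:end]


# Hypothetical full sentence around the slide's example phrase.
sentence = "We show that FADD immediately activates procaspase-8 during apoptosis".split()
proteins = [sentence.index("FADD"), sentence.index("procaspase-8")]

narrow = core_phrase(sentence, proteins, width=0)
wide = core_phrase(sentence, proteins, width=1)
```

A width of 0 yields exactly the four-token phrase between the two proteins; larger widths pull in more context on both sides.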
18 Linguistic Annotation
Multi-layered pattern
Original:    FADD  immediately  activates  procaspase-8
Class / POS: PTN   ADV          VRB        PTN
Stem:        PTN   immediat     activat    PTN
Token:       PTN   immediately  activates  PTN
Initial pattern set
- Highly specific
- Can be used immediately, but results in very low recall
Needs to be generalized
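One way to hold such a multi-layered pattern in code is a list of per-position records with one field per layer. The sketch below is illustrative only: `crude_stem` is a toy suffix stripper and `TOY_POS` a hand-written lookup, both standing in for the real stemmer and part-of-speech tagger, which the slides do not name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    token: str
    stem: str
    pos: str


# Hypothetical stand-in for a part-of-speech tagger.
TOY_POS = {"immediately": "ADV", "activates": "VRB"}


def crude_stem(word):
    """Toy stemmer: strip a few common suffixes (not a real stemming algorithm)."""
    for suffix in ("ely", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def annotate(tokens, proteins):
    """Build the token, stem and POS layers; proteins become PTN on every layer."""
    layers = []
    for tok in tokens:
        if tok in proteins:
            layers.append(Position("PTN", "PTN", "PTN"))
        else:
            layers.append(Position(tok, crude_stem(tok), TOY_POS.get(tok, "UNK")))
    return layers


pattern = annotate(
    ["FADD", "immediately", "activates", "procaspase-8"],
    proteins={"FADD", "procaspase-8"},
)
```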
19 Workflow
[Workflow diagram with the stages: PubMed, IntAct, Protein pairs, Search sentences, Linguistic annotation, Initial patterns, Clustering, Alignment, Consensus pattern, Extracted PPI]
20 Pattern Generalization
Initial patterns
- Too many (performance is an issue)
- Too specific
- Miss many small linguistic variations
Find clusters of similar patterns
- Requires a distance measure for language patterns
For each cluster, generate consensus pattern
- Compute commonality of each set
- Generate a new, generalized pattern
21 Distances of Initial Patterns
Sentence alignment
One layer: Standard dynamic programming
End-free alignment of patterns (core phrases) against sentences
Cost for insertion, deletion, match, replacement
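For a single layer, end-free alignment can be sketched with the standard dynamic-programming recurrence, where sentence tokens before and after the aligned region cost nothing. The unit gap cost and the 0/1 substitution cost are assumptions chosen for the example, not the values used in the talk.

```python
def end_free_alignment_cost(pattern, sentence, sub_cost, gap_cost=1.0):
    """Minimal cost of aligning the whole pattern somewhere inside the sentence."""
    m, n = len(pattern), len(sentence)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap_cost  # pattern tokens with nothing to align to
    # dp[0][j] stays 0: skipping a sentence prefix is free.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(pattern[i - 1], sentence[j - 1]),  # match / replacement
                dp[i - 1][j] + gap_cost,  # pattern token left unmatched (deletion)
                dp[i][j - 1] + gap_cost,  # extra sentence token inside the match (insertion)
            )
    return min(dp[m])  # skipping a sentence suffix is free


def token_cost(a, b):
    return 0.0 if a == b else 1.0


pattern = ["PTN", "immediately", "activates", "PTN"]
exact = end_free_alignment_cost(
    pattern, ["show", "that", "PTN", "immediately", "activates", "PTN", "during"], token_cost
)
replaced = end_free_alignment_cost(pattern, ["PTN", "directly", "activates", "PTN"], token_cost)
shorter = end_free_alignment_cost(pattern, ["PTN", "activates", "PTN"], token_cost)
```

The surrounding words "show that" and "during" do not add cost, while one replaced or one missing token each costs one unit.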
22 Substitution Matrices
One substitution matrix per layer
Layers can be weighted
Score is aggregated over all layers
$c(i, j) = \sum_{l \in \mathrm{layers}} w_l \cdot \mathrm{score}_l(i[l], j[l])$
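Reading the aggregation as a weighted sum over layers, the cost of pairing two positions can be sketched as below. The layer weights and the 0/1 mismatch function are hypothetical; real substitution matrices would grade partial similarity, for example between related part-of-speech tags.

```python
# Hypothetical layer weights; the talk only states that layers can be weighted.
LAYER_WEIGHTS = {"token": 0.2, "stem": 0.3, "pos": 0.5}


def mismatch(a, b):
    """Stand-in for a substitution matrix lookup: 0 for identical symbols, else 1."""
    return 0.0 if a == b else 1.0


LAYER_SCORES = {"token": mismatch, "stem": mismatch, "pos": mismatch}


def position_cost(i, j, weights=LAYER_WEIGHTS, scores=LAYER_SCORES):
    """c(i, j): sum over layers l of w_l * score_l(i[l], j[l])."""
    return sum(w * scores[layer](i[layer], j[layer]) for layer, w in weights.items())


a = {"token": "activates", "stem": "activat", "pos": "VRB"}
b = {"token": "activated", "stem": "activat", "pos": "VRB"}
c = {"token": "binds", "stem": "bind", "pos": "VRB"}

same_stem = position_cost(a, b)
same_pos_only = position_cost(a, c)
```

Two inflections of the same verb differ only on the token layer, so they come out closer than two different verbs that agree only on part of speech.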
23 Clustering and Generalization
Distance matrix for all pairs of initial patterns
Hierarchical clustering
Consensus pattern using multiple sentence alignment
- Generates a profile per layer
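The clustering and profile steps can be illustrated with a small pure-Python sketch. Average linkage with a distance threshold is an assumption here (the slides only say "hierarchical clustering"), and `layer_profile` expects rows that have already been aligned to equal length, which sidesteps the multiple sentence alignment itself.

```python
from collections import Counter


def average_linkage(dist, threshold):
    """Greedily merge the closest clusters while their average distance <= threshold."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def layer_profile(aligned_rows):
    """Relative frequency of each symbol per column of already-aligned rows."""
    return [
        {sym: n / len(col) for sym, n in Counter(col).items()}
        for col in zip(*aligned_rows)
    ]


# Made-up distances: patterns 0 and 1 are similar, pattern 2 is not.
dist = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
clusters = average_linkage(dist, threshold=0.5)

stem_profile = layer_profile([
    ["PTN", "immediat", "activat", "PTN"],
    ["PTN", "direct", "activat", "PTN"],
])
```

Raising the threshold merges more patterns into fewer, more general consensus patterns; lowering it keeps clusters small and specific.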
24 Workflow
[Workflow diagram with the stages: PubMed, IntAct, Protein pairs, Search sentences, NER and POS tagging, Initial patterns, Clustering, Alignment, Consensus pattern, Extracted PPI]
25 Search Phase
Given a text: All sentences
- are searched for at least two protein names
- matched against all consensus patterns
- Complication: Matching a sentence (i.e. a multi-layered pattern) against a pattern profile
$c(i, j) = \sum_{l \in \mathrm{layers}} w_l \cdot \mathrm{score}_l(i[l], j[l]) \cdot (1 - \mathrm{freq}(i[l]))$
Highest scoring pattern wins
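The control flow of the search phase (filter sentences with at least two proteins, score against every consensus pattern, keep the best) can be sketched independently of the exact scoring formula. In the sketch below `toy_score` is a deliberately simple placeholder, not the profile-based score from the slide, and the two pattern names are invented.

```python
def toy_score(sentence, pattern):
    """Placeholder score: fraction of positions whose tag is allowed by the pattern."""
    if len(sentence) != len(pattern):
        return 0.0
    return sum(tag in allowed for tag, allowed in zip(sentence, pattern)) / len(pattern)


def best_pattern(sentence, patterns, score=toy_score):
    """Return (name, score) of the highest-scoring pattern, or None if < 2 proteins."""
    if sum(tag == "PTN" for tag in sentence) < 2:
        return None
    return max(
        ((name, score(sentence, pat)) for name, pat in patterns.items()),
        key=lambda item: item[1],
    )


# Hypothetical consensus patterns on the POS layer: one set of allowed tags per position.
patterns = {
    "activation": [{"PTN"}, {"ADV"}, {"VRB"}, {"PTN"}],
    "binding": [{"PTN"}, {"VRB"}, {"PRP"}, {"PTN"}],
}

hit = best_pattern(["PTN", "ADV", "VRB", "PTN"], patterns)
skipped = best_pattern(["PTN", "ADV", "VRB", "NOM"], patterns)
```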
26 Evaluation
~ IntAct pairs
~ sentences containing an IntAct pair and an interaction word
~ unique initial patterns
- Difference between abstracts and full text
Evaluation using SPIES corpus
- Hao et al. 2004, ~900 sentences, ~1500 annotated PPI
- Not the best corpus one can think of
  Only sentences with 2 proteins, taken from very few papers
  But strongest competitor
27 Results
Using initial patterns directly
- As expected: Precision ~85%, recall ~15%
Generalization: ~9,500 consensus patterns
- Some very large, most very small
- Can be tuned towards precision or recall (cluster threshold)
Result: 79% precision at 52% recall
- F-measure: 63
- Most important type of error: Enumerations
  CUL-1 interacts with SKR-1, SKR-2, SKR-3, and SKR-10
- Tweaking towards higher recall yields 74 / 57 (precision / recall)
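The reported F-measure follows from the balanced harmonic mean of precision and recall. The short check below reproduces the 63 from the slide; applying the same formula to the recall-tuned setting assumes that "74 / 57" means 74% precision and 57% recall.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = round(100 * f_measure(0.79, 0.52))      # main result on the slide
recall_tuned = round(100 * f_measure(0.74, 0.57))  # assumed reading of "74 / 57"
```

Under that reading, the recall-tuned setting trades five points of precision for five points of recall and ends up about one F-measure point higher.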
28 Comparison
Hao et al. report F-measure of 68
- Semi-automatic system
- Patterns are learned from annotated corpus
- Self-made corpus
- [AliBaba on home-made corpus: F-measure 66]
AliBaba
- Needs no learning corpus at all
- Semi-supervised methods: examples are almost correct
- Highly adaptable to different tasks
  Examples readily available in many databases
29 Overview
Why text mining for biomedical research
Extraction of protein-protein interactions from text
AliBaba: Summarizing PubMed results
Vision
30 Workflow
[Architecture diagram with the labels: Client, 1. PubMed Query, Server, 2. Query PMIDs, Internet, Annotated Texts (XML), Local Document Index, Annotation Pipeline]
31 AliBaba
Analyses results of a PubMed query
- Full PubMed query syntax
- Scope of analysis is defined by user
Extracting and visualizing information
- Entities: dictionary matches [Kirsch et al. 05]
  Genes, proteins, diseases, cells, tissues, species, drugs
- Detects PPI using extraction pipeline
- Detects further relationships using co-occurrence
- Confidence scores
32 [Figure labels: Query; Extracted information; Visualization of extracted relationships; Links to databases; Links to textual evidence]
33 Walk-through
Which proteins are associated with the TNFalpha associated death domain (TRADD)?
34 Many!
35 Filter by Object Type and Confidence
36 Show only Connected Objects
37 Show Type of Interaction
38 Location of Interaction
39 View Annotated Abstracts
40 Overview
Why text mining for biomedical research
Extraction of protein-protein interactions from text
AliBaba: Summarizing PubMed results
Vision
41 Annotated Relationships
Relationships have many parameters
Example: Modeling in Systems Biology
"The apparent K(m) value was calculated for adenosine and found to be 3.63 x 10(-3) M, which indicates high affinity of adenosine deaminase for its substrate adenosine."
Constant: K(m)
Value: 3.63 x 10(-3)
Unit: M
Enzyme: Adenosine deaminase
Compound: adenosine
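An annotated relationship of this kind maps naturally onto a small record type. The sketch below only parses the numeric value and unit from the example sentence with a regular expression; the enzyme and compound fields are filled in by hand, and the class name `KineticRelation` and the pattern `VALUE_RE` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class KineticRelation:
    constant: str
    value: float
    unit: str
    enzyme: str
    compound: str


# Matches values written like "3.63 x 10(-3) M" in PubMed abstracts.
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?) x 10\((-?\d+)\) (\w+)")


def parse_value(text):
    """Return (value, unit) for the first 'a x 10(b) unit' expression, or None."""
    match = VALUE_RE.search(text)
    if match is None:
        return None
    mantissa, exponent, unit = match.groups()
    return float(mantissa) * 10 ** int(exponent), unit


sentence = (
    "The apparent K(m) value was calculated for adenosine and found to be "
    "3.63 x 10(-3) M, which indicates high affinity of adenosine deaminase "
    "for its substrate adenosine."
)
value, unit = parse_value(sentence)
relation = KineticRelation("K(m)", value, unit, "adenosine deaminase", "adenosine")
```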
42 KMedDB
43 More
Overlaying extracted networks with established pathways (KEGG)
Application to other types of relationships
- Protein-disease, disease-target-drug
- Annotated corpora for evaluation welcome
Improving text mining performance
Disambiguation
Advanced NER methods (links are lost)
Larger learning sample (Reactome, BIND, DIP, ...)
Scalability
44 Conclusion
Learning patterns is possible
- Quickly adaptable to different tasks
Corpus creation is a bottleneck
- Even if available, might not be suitable for task at hand
- Use semi-supervised methods
- The more data, the more promising (full text, web)
What is an interaction?
- Probably the hardest problem for higher perceived precision
- Solve more specific problems
- [AliBaba: task-specific lists of interaction words]
45 Acknowledgements
Humboldt-Universität, Informatics
- Jörg Hakenberg, Torsten Schiemann
- Conrad Plake, Markus Pankalla
- Lukas Faulstich, Long Nguyen
Max-Planck-Institute for Molecular Genetics
- Edda Klipp, Sebastian Schmeier, Axel Kowald
European Bioinformatics Institute
- Harald Kirsch, Dietrich Rebholz-Schuhmann
More informationTMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes
TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes Jitendra Jonnagaddala a,b,c Siaw-Teng Liaw *,a Pradeep Ray b Manish Kumar c School of Public
More informationNetwork Protocol Analysis using Bioinformatics Algorithms
Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol
More informationData Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov
Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray
More informationAsk your Database: Natural Language Processing using In-Memory Technology
Enterprise Platform and Integration Concepts Master Project Summer Term 2015 Ask your Database: Natural Language Processing using In-Memory Technology Dr. Mariana Neves April 10th, 2015 Question Answering
More informationorg.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.
org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank
More informationCuration of NLP Pipeline - A Review
ASSISTED CURATION: DOES TEXT MINING REALLY HELP? BEATRICE ALEX, CLAIRE GROVER, BARRY HADDOW, MIJAIL KABADJOV, EWAN KLEIN, MICHAEL MATTHEWS, STUART ROEBUCK, RICHARD TOBIN, AND XINGLONG WANG School of Informatics
More informationContent visualization of scientific corpora using an extensible relational database implementation
. Content visualization of scientific corpora using an extensible relational database implementation Eleftherios Stamatogiannakis, Ioannis Foufoulas, Theodoros Giannakopoulos, Harry Dimitropoulos, Natalia
More informationCreating Metabolic Network Models using Text Mining and Expert Knowledge
Creating Metabolic Network Models using Text Mining and Expert Knowledge J.A. Dickerson 1, D. Berleant 1, Z. Cox 1, W. Qi 1, and E. Wurtele 2 Iowa State University, Ames, IA, 50011 Abstract: This paper
More informationLarge Scale Text Analysis Using the Map/Reduce
Large Scale Text Analysis Using the Map/Reduce Hierarchy David Buttler This work is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationBio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
More informationFind the signal in the noise
Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical
More informationGuide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
More informationSearch and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
More informationAcceleration for Personalized Medicine Big Data Applications
Acceleration for Personalized Medicine Big Data Applications Zaid Al-Ars Computer Engineering (CE) Lab Delft Data Science Delft University of Technology 1" Introduction Definition & relevance Personalized
More information72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology Paulo.gottgtroy@aut.ac.nz Abstract This paper is
More informationLDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,
More informationSIMOnt: A Security Information Management Ontology Framework
SIMOnt: A Security Information Management Ontology Framework Muhammad Abulaish 1,#, Syed Irfan Nabi 1,3, Khaled Alghathbar 1 & Azeddine Chikh 2 1 Centre of Excellence in Information Assurance, King Saud
More informationBig Data Problem? or Big Problem with Data? William Hayes, PhD SVP PlaCorm Dev, Selventa
Big Data Problem? or Big Problem with Data? William Hayes, PhD SVP PlaCorm Dev, Selventa 2013, Selventa. All Rights Reserved. Confiden;al 1 Who am I? ex- Aerospace Engineer Defected to Bioinforma;cs (PhD
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationWeb-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationDeCyder Extended Data Analysis (EDA) Software
Part of GE Healthcare Data File 28-4015-41 AA DeCyder Extended Data Analysis (EDA) Software DeCyder EDA DeCyder Extended Data Analysis Software (DeCyder EDA) is high-performance informatics software for
More informationKybots, knowledge yielding robots German Rigau IXA group, UPV/EHU http://ixa.si.ehu.es
KYOTO () Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organization http://www.kyoto-project.eu/ Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU
More information