Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences




Prof. Dr. Martin Hofmann-Apitius
Head of the Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI)
Professor for Applied Life Science Informatics, Bonn-Aachen International Center for Information Technology (B-IT)

Rapid Paradigm Changes: Addressing Increasing Complexity
From genes to proteins to biological function
Multiscale integration of biological, medical and chemical information
[Timeline figure, 1900 to 2020. Labels: Systems Biology, Simulation of Life, Isolated molecules, Genomics, Molecular Biology, Organic Chemistry, Biochemistry]

Acquisition of BioMedical Knowledge Before the Internet
- Study textbooks
- Learn how to use a library
- Focus on a subject
- Read scientific publications
- Become an expert in the field

Acquisition of BioMedical Knowledge Since 1997
- Study textbooks
- Learn how to use MEDLINE
- Get an overview of a subject
- Read scientific publications
- Become an expert in the field

Growth Rates of Scientific Publications in BioMedicine
- Growth of PubMed: 1,500-3,500 new data sets per day
- Currently more than 16 million entries

Biomedical Databases as Sources of Knowledge?
- Biomedical databases store data, not knowledge
- The representation of information in a database depends on the database model
- The expressiveness of database models is not sufficient to represent complex biomedical information

An Over-Simplification...?
The more complex a subject is, the more likely you will find it adequately described only in unstructured text and not in databases.

Breaking the Silos: Linking Named Entities in Text to Database Entries

Mapping of Text Objects to Database Entries
- Proprietary knowledge
- Textual information
- Experimental data
- Pathway/Interaction Databases

Protein Name Recognition

Challenges:
- Multiple names for one gene
- Ambiguous names in databases
- Ambiguous acronyms
- Common word names
- Multi-word terms
- Spelling variants
- Permutations
- Nested protein names

Examples shown on the slide:
- F12A
- COL1A1
- Neuronectin, GMEM, tenascin, HXB, cytotactin, hexabrachion
- p21, EPO, large T antigen
- WAS, STEP, ice, StAR
- Interleukin 1 alpha; Tumor necrosis factor beta
- Collagen, type I, alpha 1; Collagen alpha 1(I) chain; Alpha 1 collagen; Alpha-1 type I collagen
- TNF receptor 1; collagen, type I, alpha receptor
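The synonym, spelling-variant and permutation problems listed on this slide can be illustrated with a small dictionary lookup. The following is only a minimal sketch, not the method used in the talk: the identifier keys (`TNC`, `COL1A1`), the filler-word list, and the function names `normalize`, `build_index` and `lookup` are assumptions made for illustration. Names are lowercased, split into letter and digit runs, stripped of filler words, and sorted, so that word order and punctuation no longer matter.

```python
import re
from typing import Dict, List, Optional

# Assumption: words that carry no identifying information in this toy dictionary.
FILLER_WORDS = {"type", "chain"}

# Illustrative synonym dictionary; the keys are assumed database identifiers,
# the names are taken from the slide.
SYNONYMS: Dict[str, List[str]] = {
    "TNC": ["tenascin", "cytotactin", "hexabrachion", "neuronectin", "GMEM", "HXB"],
    "COL1A1": [
        "Collagen, type I, alpha 1",
        "Collagen alpha 1(I) chain",
        "Alpha-1 type I collagen",
    ],
}


def normalize(name: str) -> str:
    """Lowercase, split into letter/digit runs, drop filler words, sort tokens."""
    tokens = re.findall(r"[a-z]+|\d+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in FILLER_WORDS))


def build_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized name (and the identifier itself) to its identifier."""
    index: Dict[str, str] = {}
    for entry_id, names in synonyms.items():
        index[normalize(entry_id)] = entry_id
        for name in names:
            index[normalize(name)] = entry_id
    return index


INDEX = build_index(SYNONYMS)


def lookup(name: str) -> Optional[str]:
    """Return the identifier for a name found in text, or None if unknown."""
    return INDEX.get(normalize(name))


print(lookup("alpha-1 TYPE I collagen"))  # permutation and case variant
print(lookup("Cytotactin"))               # synonym
print(lookup("unknown protein"))          # not in the dictionary
```

A lookup this simple cannot handle the "common word names" case: a case-insensitive match would treat every occurrence of the English words "was" or "ice" as a protein mention, which is why real systems need context in addition to a dictionary.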

Functional Network of Interacting Molecules Extracted from Text

Awareness of Synonyms by a Computer Programme

Available Chemical Information
- Textbooks
- Reports
- Patents
- Databases
- Scientific journals and publications
- Websites

Representing a Chemical Compound
How much information do you want to include?
- Atoms present
- Connections between atoms
- Bond types
- Isotopes
- Charges
- Stereochemical configuration
[Figure: structure drawing illustrating these levels of detail]
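The levels of detail listed on this slide map naturally onto a small data structure. The sketch below is an illustration only; the class and field names are assumptions, and the example molecule (the heavy atoms of a glycine zwitterion with a hypothetical carbon-14 label) was chosen for this sketch and is not taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Atom:
    element: str                   # atoms present
    isotope: Optional[int] = None  # mass number; None means natural abundance
    charge: int = 0                # formal charge


@dataclass
class Bond:
    begin: int                     # index of the first atom
    end: int                       # index of the second atom
    order: int = 1                 # bond type: 1 single, 2 double, 3 triple
    stereo: Optional[str] = None   # e.g. "wedge" or "hash"; None means unspecified


@dataclass
class Molecule:
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def add_atom(self, element: str, isotope: Optional[int] = None, charge: int = 0) -> int:
        """Append an atom and return its index."""
        self.atoms.append(Atom(element, isotope, charge))
        return len(self.atoms) - 1

    def add_bond(self, begin: int, end: int, order: int = 1, stereo: Optional[str] = None) -> None:
        """Connect two atoms by index."""
        self.bonds.append(Bond(begin, end, order, stereo))

    def net_charge(self) -> int:
        """Sum of all formal charges."""
        return sum(atom.charge for atom in self.atoms)

    def neighbors(self, index: int) -> List[int]:
        """Indices of atoms bonded to the given atom."""
        result = []
        for bond in self.bonds:
            if bond.begin == index:
                result.append(bond.end)
            elif bond.end == index:
                result.append(bond.begin)
        return result


# Hypothetical example: heavy atoms of a glycine zwitterion, alpha carbon labeled with 14C.
glycine = Molecule()
n = glycine.add_atom("N", charge=1)
ca = glycine.add_atom("C", isotope=14)
c = glycine.add_atom("C")
o1 = glycine.add_atom("O")
o2 = glycine.add_atom("O", charge=-1)
glycine.add_bond(n, ca)
glycine.add_bond(ca, c)
glycine.add_bond(c, o1, order=2)
glycine.add_bond(c, o2)

print(glycine.net_charge())
print(glycine.neighbors(c))
```

Each optional field corresponds to one item on the slide: leaving `isotope`, `charge` or `stereo` at its default gives a coarser representation, while filling it in gives a more precise one.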

Chemical Structure Recognition: an Overview
1. Document
2. Depiction
3. Reconstruction
4. SDF file
5. In silico chemistry

SDF file excerpt shown on the slide:

    created from /home/marc/workspace/csr/results/csr/examples/us2005182053/US2005182053_result.pnm
    MZCSRv0.5010050621162D 0.00000 0.00000 0
    26 28 0 1 0 0 0 0 0999 V2000
    204.0000 102.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    275.0000 61.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    201.0000 59.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    422.0000 178.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    311.0000 164.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    384.0000 165.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    447.0000 144.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    383.0000 123.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    131.0000 60.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    239.0000 123.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    349.0000 218.0000 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
    447.0000 207.0000 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
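To show how the atom lines of such a file feed into step 5, here is a minimal sketch that reads coordinates and atom symbols from V2000-style atom lines. It is not the chemoCR code. As a simplifying assumption it splits on whitespace, whereas the real molfile format uses fixed-width columns; the names `Atom`, `parse_atom_line` and `parse_atoms` are hypothetical. The sample lines are copied from the slide.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    x: float
    y: float
    z: float
    symbol: str


def parse_atom_line(line: str) -> Atom:
    """Read x, y, z and the atom symbol from one atom line.

    Assumption: fields are whitespace-separated, which holds for the
    transcribed excerpt but is not guaranteed by the fixed-width format.
    """
    fields = line.split()
    return Atom(float(fields[0]), float(fields[1]), float(fields[2]), fields[3])


def parse_atoms(block: str) -> List[Atom]:
    """Parse every non-empty line of an atom block."""
    return [parse_atom_line(line) for line in block.splitlines() if line.strip()]


SAMPLE = """\
204.0000 102.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
275.0000 61.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
349.0000 218.0000 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
"""

atoms = parse_atoms(SAMPLE)
# "R#" marks an R-group placeholder rather than a concrete element.
placeholders = [atom for atom in atoms if atom.symbol == "R#"]

print(len(atoms), "atoms,", len(placeholders), "R-group placeholder(s)")
```

The `R#` entries matter for patents in particular, because generic (Markush-style) structures leave substituent positions open instead of naming a specific atom.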

Look and Feel of chemoCR
[Figure: input image and reconstructed molecule]

An Automatically Generated Knowledge Layer

Summary
- Complex biomedical interrelationships are described in text, not in databases.
- However, databases harbor relevant information on biomedical objects.
- Automated recognition of biomedical entities in text and analysis of chemical depictions allow entities in text to be connected with entities in databases and with experimental platforms.
- In the future, text will become largely interoperable with databases.
- Moreover, we might be able to use text mining and image mining technologies to automatically generate knowledge layers that will boost the ability to find relevant knowledge.

Consequences for the Scientific Communication Process
- Automated recognition of biomedical entities in text should be enabled and supported by publishers.
- As an alternative to keyword-based document retrieval (such as Google), I propose to establish a system that enables scientists to navigate an abstract knowledge layer and to identify and purchase only relevant publications, based on the factual statements made in these documents.
- The knowledge layer must not be publisher-specific; consequently, it should be generated in a joint effort of public and private stakeholders (publishers; national and international organizations).
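One way to picture retrieval by factual statement rather than by keyword is an index from extracted statements to the documents that contain them. The sketch below is purely illustrative and is not a design described in the talk: the class `KnowledgeLayer`, the triple form (subject, relation, object), the placeholder entity names, and the document identifiers are all assumptions.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Optional, Set


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


class KnowledgeLayer:
    """Publisher-independent index from factual statements to document IDs."""

    def __init__(self) -> None:
        self._docs_by_fact: Dict[Fact, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, fact: Fact) -> None:
        """Record that a document makes the given statement."""
        self._docs_by_fact[fact].add(doc_id)

    def documents_for(
        self,
        subject: Optional[str] = None,
        relation: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Set[str]:
        """Return documents containing a fact that matches all given parts."""
        hits: Set[str] = set()
        for fact, docs in self._docs_by_fact.items():
            if subject is not None and fact.subject != subject:
                continue
            if relation is not None and fact.relation != relation:
                continue
            if obj is not None and fact.obj != obj:
                continue
            hits |= docs
        return hits


# Placeholder facts and document IDs, invented for this sketch.
layer = KnowledgeLayer()
layer.add("doc-1", Fact("GENE_A", "interacts_with", "PROTEIN_B"))
layer.add("doc-2", Fact("GENE_A", "inhibits", "PROTEIN_C"))
layer.add("doc-3", Fact("GENE_D", "interacts_with", "PROTEIN_B"))

print(sorted(layer.documents_for(subject="GENE_A")))
print(sorted(layer.documents_for(subject="GENE_A", relation="inhibits")))
```

Because the index stores only statements and document identifiers, a reader could narrow a search to the specific claims of interest before deciding which publications to obtain.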

Thank you for your attention