Information Extraction Technologies in Chemistry A Critical Review
|
|
|
- Marcus Griffith
- 10 years ago
- Views:
Transcription
1 Information Extraction Technologies in Chemistry A Critical Review Martin Hofmann-Apitius The International Conference in Trends for Scientific Information Professionals Sitges (Barcelona), Spain October 2007
2 Structure of this Presentation Text & Image Mining in Chemistry a Critical Review 1. Representation of chemical information in scientific literature 2. Technologies for chemical named entity recognition 3. Technologies for information extraction from chemical structure depictions 4. Benchmarking activities in the biomedical arena & Call for a joint initiative for benchmarking of information extraction technology in the field of chemistry Seite 2
3 Representation of Chemical Information in Scientific Literature Seite 3
4 Sources of Chemical Information in Scientific Literature Scientific literature comprising chemical information is not limited to journal publications. Unstructured knowledge sources containing chemical information are: - Journal articles - Patents - Books (incl. Chemical Handbooks) - Doctoral Theses and Project Reports - Package inserts / Chemical hazard documentation - Websites Seite 4
5 Identification and Representation in Chemistry Trivial names (incl. brands): Aspirin, Acetylsalicyclic acid,. Systematic nomenclatures: Mass formula: C9H8O4 SMILES: OC(=O)C1=C(C=CC=C1)OC(=O)C InChI: 1/C9H8O4/c1-6(10) (8)9(11)12/h2-5H,1H3,(H,11,12) IUPAC: pyrido[1",2":1',2']imidazo[4',5':5,6]pyrazino[2,3-b]phenazine References to registration numbers (e.g. CAS or Beilstein) Structural formula: universal language between chemists Chemical properties ~ atom composition + spatial arrangement Seite 5
6 Chemical Information Communication During the publication process chemical entities in text appear as trivial names, systematic and semi-systematic names moreover Molecule structures are published as images the machine readable format is lost Seite 6
7 Technologies for Chemical Named Entity Recognition (NER) Seite 7
8 Types of Chemical Named Entity Recognition (NER) Three classes of automated Chemical NER can be distinguished: - NER of chemical entities based on dictionaries - NER of chemical entities based on rules (regular expressions) - NER of chemical entities based on training sets (machine learning) It is noteworthy that there exist also combinations of these approaches, e.g. using dictionaries as a feature in machine learning approaches. Seite 8
9 Reports on Chemical Named Entity Recognition (NER) Chemical NER is a rather new, emerging field. Information extraction specialists such as TEMIS have teamed up with cheminformatics specialists (in the case of TEMIS it is MDL) and cheminformatics specialists such as InfoChem have started their own information extraction approaches and teamed up with research teams like our team at SCAI. Other combinations are e.g. InforSense (a workflow environment provider) and Linguamatics; or the text mining activities of Accelrys (SciTegic s Pipeline Pilot Technology). Benchmarking is difficult as chemical knowledge is traditionally much more proprietary than biological knowledge and therefore there is no joint critical assessment of information extraction technologies in chemistry (unlike the BioCreative assessment of text mining tools in molecular biology) Recent momentum comes from groups like the Cambridge-based research groups of Murray-Rust, Corbett, Teufel, and others, who push open source developments in chemical knowledge management and chemical NER. Seite 9
10 Chemical NER in the Open Source Arena: OSCAR OSCAR is an open source system for chemical NER developed by the groups of Corbett, Murray-Rust, Teufel and others in Cambridge. OSCAR uses chemical dictionaries as well as rule-sets (regular expressions) including e.g. rules like if it ends with -ase it is most likely an enzyme. OSCAR has been combined with a scientific document parser (SciXML format) and a recent paper by Batchelor and Corbett reports on first experiences with chemical NER, comparing NER of GO-terms with NER of chemical compounds in the same text (Proceedings of the Association for Computational Linguistics 2007; Demo & Poster Session). OSCAR comprises a machine learning approach combined with dictionary features and rule-sets, that are used to define features. Seite 10
11 What we Learned from OSCAR Publications and Preliminary Tests done at Fraunhofer SCAI... OSCAR is - like many open source tools - poorly documented, but source code is available OSCAR authors provide annotation guidelines and - most important - they published on inter-annotator agreement... Valuable information for the assessment of the quality of training corpora The developers of OSCAR still see some issues with the algorithmic approaches taken: - no stemming / lemmatization (detrimental to IUPAC expressions anyway) - context-dependent disambiguation in chemistry --> example plural forms - enumerations are a significant problem Seite 11
12 Entity-Types for chemical NER and Performance of chemical NER using LingPipe (Technology used in OSCAR) Corbett, Batchelor and Teufel Annotation of Named Chemical Entities BioNLP 2007: Biological, translational, and clinical language processing, pages 57-64, Prague, June 2007 Seite 12
13 Chemical NER in Patent Literature Chemical NER from patent literature is not trivial at all. Although e.g. Accelrys claims that their text analytics collection of the Pipeline Pilot is able to handle patents, we believe that current text mining technology still faces significant problems when applied to patents at production scale. Patents are complicated because: - they are not necessarily written to be understandable - many patents exist as PDF and have to be OCR-ed, which introduces errors - patents contain a lot of chemical information in images and not in characters - patents are full text documents with sometimes hundreds of page. Classical NLP tools are not able to work on such large documents - or, if the do, it takes ages. Seite 13
14 Chemical Named Entity Recognition (NER) using CRFs Conditional Random Fields (CRF) is a machine learning approach that is quite emerging in the text mining community. In the gene mentioning task of the BioCreative 2006 critical assessment, CRFs were applied by all topranking groups. CRFs are based on graphical models for sequence-type information and text is nothing else but a sequence of characters and empty positions. CRFs are trained and work at the token level. Seite 14
15 Appearance of IUPAC in Patents Seite 15
16 Recognition of IUPAC in Patents: First Steps Seite 16
17 Lesson learned: Artificial Corpus =/= Real Patent Situation Seite 17
18 BioMedical Objects can link from text to database entries Seite 18
19 Chemical Objects can link from text to virtual experiments... DOCKING: Simulated ligand binding Seite 19
20 Name 2 Structure Conversion of names to structures offers a couple of options that can help to verify molecule information. Names (trivial names as well as IUPAC, InChi or CAS designators) can be converted to structures using one of the about 4 tools available on the market. Once the name is converted to a structure, all the nice gimmics that work on structures can be done... Alignments, fragment searches, similarity searches, identity etc. Now: how good are the current N2S tools? Our colleagues at InfoChem did a preliminary study... Seite 20
21 Chemical Named Entity Recognition Document HO Paracetamol H N O Annotation List of chemical names N-acetyl-p-aminophenol Acetaminophen Calpol Paralief Panadol dextropropoxyphene Automatic recognition of chemical names Paracetamol Acetaminophen N-acetyl-p-aminophenol Calpol Panadol. InfoChem is using the annotation software from IBM (Chemical Annotator V.2.01) It recognizes and extracts/highlights chemical names from text It uses various dictionaries of names and fragments InfoChem GmbH th Fraunhofer Symposium on Text Mining in the Life Sciences 24 th -25 th September 2007
22 Name to Structure Conversion Tool (N2S) List of chemical names Paracetamol Acetaminophen N-acetyl-p-aminophenol Calpol Panadol. Automatic conversion of chemical names to structures Connection Table H N O HO Goal: conversion of chemical names to structures (connection tables, e.g. MOLfile) Tools in test-phase: CambridgeSoft (Name=Struct, processed by IBM), ACD/Labs (Name to Structure), OpenEye (Lexichem) InfoChem (ICN2S) in development InfoChem GmbH th Fraunhofer Symposium on Text Mining in the Life Sciences 24 th -25 th September 2007
23 Evaluation of N2S Tools First results CambridgeSoft Name=Struct Converts (1) : 6.5 M Philosophy: liberal, convert as much as possible ACDLabs Name to Structure Converts (1) : 3.5 M Philosophy: Rather strict, results more reliable Converts (1) : 2.5 M OpenEye Lexichem Philosophy: Rather strict Test File: 2,164 Test File: 2,164 N/a Conversion: 2,164 (100%) Correct: 1,486 (69%) Conversion: 1,142 (53%) Correct: 985 (46%) N/a N/a Wrong: 678 (31%) Wrong: 157 (14%) N/a (1) Based on 6.5 M names abstracted from US patents which could be converted with CambridgeSoft Name=Struct InfoChem GmbH th Fraunhofer Symposium on Text Mining in the Life Sciences 24 th -25 th September 2007
24 Evaluation of N2S Tools Conclusions Challenges Real world names instead of correct nomenclature Ambiguous names (e.g. xylene: ortho/meta/para) => Mixture or error? Recall/precision trade-off (correct conversion vs. high number of structures) Deficiency of dictionaries (errors, missing fragments, trivial names, etc.) Formally incorrect names (incorrect syntax, missing locants, typing errors etc.) Chemical name mixed with text (e.g. methoxy-substituted benzene) InfoChem GmbH th Fraunhofer Symposium on Text Mining in the Life Sciences 24 th -25 th September 2007
25 Technologies for Information Extraction from Chemical Structure Depictions Seite 25
26 Computer Systems are not Chemistry-aware The universal language of chemistry is the graphical depiction of the chemical molecule structure. But even though chemical structure depictions can be readily interpreted by chemists, computers regard chemical structure depictions as a bunch of pixels. Consequently, structure depictions are information-rich, but cannot be used by computational approaches. Computers are not chemistry-aware... See example Google Image Search Seite 26
27 Searching for Structural Information in Images THC Search Images Results 1-20 of about 113,000 for THC. (0.40 seconds) New! Want to improve Google Image Search? Try Google Image Labeler. Seite 27
28 Computer-based Interpretation of Chemical Structure Depictions 4 different systems published so far: - Kékulé (Joe R. McDaniel, Jason R. Balmuth Kekule: OCR-optical chemical (structure) recognition Journal of Chemical Information and Computer Sciences Date: 1992 Volume: 32 Issue: 4p ) - CLiDE (P. Ibison, F. Kam, R.W. Simpson, C. Tonnelier, T. Venczel and A.P.Johnson: Chemical Structure Recognition and Generic Text Interpretation in the CLiDE project Proceedings on Online Information 92, 1992, London, England) - OSRA (open source chemical OCR; see started 2005?) - chemocr (see started 2004) Seite 28
29 Reconstruction of Synthesis Pathways from Patents: an Example 35 could be converted to structure using a single name to structure tool Seite 29
30 Biological Background Information Seite 30
31 A Synthesis Reaction Schema Seite 31
32 ChemoCR GUI Input: Bitmaps Seite 32 Output: Molecules
33 From Picture to Reaction: Pre-Processing 0: Picture Seite 33 BMP after scaling & binarization
34 From Picture to Reaction: Identification of Connected Components 0: Picture 1: Connected Components 1 component - Bondset - Letter Seite 34
35 From Picture to Reaction: Tagging of Characters 0: Picture 1: Connected Components 2: Tag Text Text or Bond? Seite 35 Text or Bond?
36 From Picture to Reaction: OCR of Identified Characters 0: Picture 1: Connected Components 2: Tag Text 3: OCR Seite 36
37 From Picture to Reaction: Identification of Lines (Bonds) Only the bonds, please 0: Picture 1: Connected Components 2: Tag Text 3: OCR 4: Vectorizer Seite 37
38 From Picture to Reaction: Applying Chemical Knowledge Associate semantics: - Bond - Atom - Reaction symbol 0: Picture 1: Connected Components 2: Tag Text 3: OCR 4: Vectorizer 5: Expert System Seite 38
39 From Picture to Reaction: Interpretation as Chemical Graphs Connect everything & mind the gaps 0: Picture 1: Connected Components 2: Tag Text 3: OCR 4: Vectorizer 5: Expert System 6: Chemical Graph Seite 39
40 From Picture to Reaction: Representation Format Molecule Reaction 0: Picture 1: Connected Components 2: Tag Text 3: OCR 4: Vectorizer 5: Expert System 6: Chemical Graph 7: Molecule Seite 40 Annotation
41 From Picture to Reaction: Error / Problem Detection Sorry, but no Markush 0: Picture 1: Connected Components 2: Tag Text 3: OCR 4: Vectorizer 5: Expert System 6: Chemical Graph 7: Molecule 8: Validation Seite 41
42 Reaction Schema Reconstructed by ChemoCR: Embedding the Resulting SDF in Patent Document View Seite 42
43 Potential Knowledge Gain Through ChemoCR Analysis One of the key questions associated with multi-modal patent mining is: do we gain from being able to simultaneously analyze text and chemical structure depictions? What is the gain of knowledge if we combine text analysis and image analysis? Seite 43
44 Molecules can be Found in Text 560 different molecules (fragments) identified in text Mapping to PubChem via dictionary (name to InChI) Mostly known structures Seite 44
45 Structure Depictions Frequently Contain Novel Structures Seite 45
46 Benchmarking activities in the biomedical arena & Call for a joint initiative for benchmarking of information extraction technology in the field of chemistry Seite 46
47 Lessons to be learned from the Biomedical Community In the biomedical community benchmarking activities such as CASP (Critical Assessment of Techniques for Protein Structure Prediction) or BioCreative (Critical Assessment of Text Mining in Biology) have helped to assess the quality of current technology developments. However, the development of the testing scenarios and the organisation of such critical assessments is a time consuming task (ask Lynette Hirschman). Seite 47
48 Need for Benchmarking / Evaluations in Chemistry Taken from: Corbett, Batchelor and Teufel Annotation of Named Chemical Entities BioNLP 2007: Biological, translational, and clinical language processing, pages 57-64, Prague, June 2007 Seite 48
49 A Consideration... BioCreative will be organised by Lynette Hirschman and her colleagues (Rolf Appweiler; Alfonso Valencia and others) again in At the 5th Symposium on Text Mining in the Life Sciences in September 2007 in Bonn, Lynette Hirschman gave a very impressive talk on the results of the critical assessment of text mining in biomedicine BioCreative 2004 and In her talk, she called for suggestions from the community for new scenarios that could be worked on in the course of BioCreative Seite 49
50 ... and a Suggestion I suggest that we ask Lynette to open a parallel track in BioCreative, which focuses on information extraction in chemistry. Such ChemCreative should be aligned with BioCreative, but it needs significant input from the chemistry community. Fraunhofer SCAI offers full support for establishing ChemCreative and we offer to team up with public (academic) professionals in the chemical information management arena to develop the appropriate scenarios and evaluation criteria. Finally: did you ever come to the point where you felt it would be just too good to have access to publicly available, well annotated corpora of chemical literature? Seite 50
51 Acknowledgement In alphabetical order: Holger Dach Juliane Fluck Christoph Friedrich Tobias Gattermayer Tobias Goecke Carina Haupt Roman Klinger Corinna Kolarik Peter Kral Theo Mevissen Bernd Müller Chia-Hao Ou A. Weihermüller Marc Zimmermann Seite 51
Information Extraction from Patents: Combining Text- and Image-Mining. Martin Hofmann-Apitius
Information Extraction from Patents: Combining Text- and Image-Mining Martin Hofmann-Apitius Bonn-Aachen International Centre for Information Technology (B-IT) September 25, 2007 Status Report: Major Achievements
Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences
Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences Prof. Dr. Martin Hofmann-Apitius Head of the Department of Bioinformatics Fraunhofer Institute for Algorithms and
Extraction and Visualization of Protein-Protein Interactions from PubMed
Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much
Cheminformatics and Pharmacophore Modeling, Together at Last
Application Guide Cheminformatics and Pharmacophore Modeling, Together at Last SciTegic Pipeline Pilot Bridging Accord Database Explorer and Discovery Studio Carl Colburn Shikha Varma-O Brien Introduction
Is 20TB really Big Data?
From Big Data to Chemical Information (RSC CICAG) 22 April 2015 London 100 million compounds, 100K protein structures, 2 million reactions, 1 million journal articles, 20 million patents and 15 billion
CENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices
overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track Yung-Chun Chang 1,2, Yu-Chen Su 3, Chun-Han Chu 1, Chien Chin Chen 2 and
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences
Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences It s not information overload, it s filter failure. Clay Shirky Life Sciences organizations face the challenge
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and
Big Data and Text Mining
Big Data and Text Mining Dr. Ian Lewin Senior NLP Resource Specialist [email protected] www.linguamatics.com About Linguamatics Boston, USA Cambridge, UK Software Consulting Hosted content Agile,
SCIENCE. Introducing updated Cambridge International AS & A Level syllabuses for. Biology 9700 Chemistry 9701 Physics 9702
Introducing updated Cambridge International AS & A Level syllabuses for SCIENCE Biology 9700 Chemistry 9701 Physics 9702 The revised Cambridge International AS & A Level Biology, Chemistry and Physics
ProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
Technical Report. The KNIME Text Processing Feature:
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold [email protected] [email protected] Copyright 2012 by KNIME.com AG
Current Trends in Cheminformatics
Current Trends in Cheminformatics Dr. Wendy A. Warr http://www.warr.com Overview Literature analysis Hardware and infrastructure Web 2.0 The Semantic Web Text mining Combichem Evaluation of scientific
bioavailability active transport blood-brain barrier transport absorption volume of distribution drug binding to plasma proteins
ACD/I-Lab Structure-based predictions at the What does ACD/I-Lab do? ACD/I-Lab is an online structure-based prediction engine and database which calculates physicochemical properties, ADME toxicities and
A LEVEL. Type of resource H433 CHEMISTRY B. Theme: Carbon-13 MMR. October 2015
A LEVEL Type of resource H433 CHEMISTRY B (SALTERS) Theme: Carbon-13 MMR October 2015 We will inform centres about any changes to the specification. We will also publish changes on our website. The latest
BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION
BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION ROBERT LEAMAN Department of Computer Science and Engineering, Arizona State University GRACIELA GONZALEZ * Department of
Application Note # LCMS-62 Walk-Up Ion Trap Mass Spectrometer System in a Multi-User Environment Using Compass OpenAccess Software
Application Note # LCMS-62 Walk-Up Ion Trap Mass Spectrometer System in a Multi-User Environment Using Compass OpenAccess Software Abstract Presented here is a case study of a walk-up liquid chromatography
ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search
Project for Michael Pitts Course TCSS 702A University of Washington Tacoma Institute of Technology ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Under supervision of : Dr. Senjuti
SciFinder. Essential Content. Proven Results. Are You Ready for the Web? Österreichischer Bibliothekartag Innsbruck 2011
SciFinder Essential Content. Proven Results. Are You Ready for the Web? Österreichischer Bibliothekartag Innsbruck 2011 What is SciFinder? Simply put SciFinder is the first choice for chemists and related
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
SIPAC. Signals and Data Identification, Processing, Analysis, and Classification
SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
How to create a web-based molecular structure database with free software
How to create a web-based molecular structure database with free software Norbert Haider Department of Drug Synthesis Faculty of Life Sciences, University of Vienna [email protected] small molecules
file:///c /Documents%20and%20Settings/terry/Desktop/DOCK%20website/terry/Old%20Versions/dock4.0_faq.txt
-- X. Zou, 6/28/1999 -- Questions on installation of DOCK4.0.1: ======================================= Q. Can I run DOCK on platforms other than SGI (e.g., SparcStations, DEC Stations, Pentium, etc.)?
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
Not all NLP is Created Equal:
Not all NLP is Created Equal: CAC Technology Underpinnings that Drive Accuracy, Experience and Overall Revenue Performance Page 1 Performance Perspectives Health care financial leaders and health information
11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes
TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes Jitendra Jonnagaddala a,b,c Siaw-Teng Liaw *,a Pradeep Ray b Manish Kumar c School of Public
Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.
Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems. Roberto Todeschini Milano Chemometrics and QSAR Research Group - Dept. of
Dr Alexander Henzing
Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander
Guidelines for Establishment of Contract Areas Computer Science Department
Guidelines for Establishment of Contract Areas Computer Science Department Current 07/01/07 Statement: The Contract Area is designed to allow a student, in cooperation with a member of the Computer Science
PPInterFinder A Web Server for Mining Human Protein Protein Interaction
PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar
Chemistry Add-in for Word User s Guide
Chemistry Add-in for Word User s Guide Abstract This document describes how to use the Chemistry Add-in for Word, an add-in for Microsoft Word that provides a simple and flexible way to include chemical
Find the signal in the noise
Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical
M3039 MPEG 97/ January 1998
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
Text Mining - Scope and Applications
Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss
Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke
Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Kalabie: Extending the OpenLAB Architecture Agilent User Interface
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum
Big Data Analysis John Domingue (STI International and The Open University) Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 257943) 1 The Data landscape
Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
COURSE TITLE COURSE DESCRIPTION
COURSE TITLE COURSE DESCRIPTION CH-00X CHEMISTRY EXIT INTERVIEW All graduating students are required to meet with their department chairperson/program director to finalize requirements for degree completion.
Extracting value from scientific literature: the power of mining full-text articles for pathway analysis
FOR PHARMA & LIFE SCIENCES WHITE PAPER Harnessing the Power of Content Extracting value from scientific literature: the power of mining full-text articles for pathway analysis Executive Summary Biological
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Chemical reactions allow living things to grow, develop, reproduce, and adapt.
Section 2: Chemical reactions allow living things to grow, develop, reproduce, and adapt. K What I Know W What I Want to Find Out L What I Learned Essential Questions What are the parts of a chemical reaction?
Doctor of Philosophy in Computer Science
Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects
CHEMISTRY. Real. Amazing. Program Goals and Learning Outcomes. Preparation for Graduate School. Requirements for the Chemistry Major (71-72 credits)
CHEMISTRY UW-PARKSIDE 2015-17 CATALOG Greenquist 344 262-595-2326 College: Natural and Health Sciences Degree and Programs Offered: Bachelor of Science Major - Chemistry Minor - Chemistry Certificate -
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
CHAPTER 15: IS ARTIFICIAL INTELLIGENCE REAL?
CHAPTER 15: IS ARTIFICIAL INTELLIGENCE REAL? Multiple Choice: 1. During Word World II, used Colossus, an electronic digital computer to crack German military codes. A. Alan Kay B. Grace Murray Hopper C.
VCE CHEMISTRY 2008 2011: UNIT 3 SAMPLE COURSE OUTLINE
VCE CHEMISTRY 2008 2011: UNIT 3 SAMPLE COURSE OUTLINE This sample course outline represents one possible teaching and learning sequence for Unit 3. 1 2 calculations including amount of solids, liquids
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Bio-IT World 2013 Best Practices Awards
Published Resources for the Life Sciences 250 First Avenue, Suite 300, Needham, MA 02494 phone: 781-972-5400 fax: 781-972-5425 Bio-IT World 2013 Best Practices Awards Celebrating Excellence in Innovation
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Big Data Processing and Analytics for Mouse Embryo Images
Big Data Processing and Analytics for Mouse Embryo Images liangxiu han Zheng xie, Richard Baldock The AGILE Project team FUNDS Research Group - Future Networks and Distributed Systems School of Computing,
From Data to Foresight:
Laura Haas, IBM Fellow IBM Research - Almaden From Data to Foresight: Leveraging Data and Analytics for Materials Research 1 2011 IBM Corporation The road from data to foresight is long? Consumer Reports
Processing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
An Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2011 Editor(s): Contributor(s): Reviewer(s): Status-Version: Volha Petukhova, Arantza del Pozo Mirjam
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
A leader in the development and application of information technology to prevent and treat disease.
A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today
Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments
Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100
Forensic Science Standards and Benchmarks
Forensic Science Standards and Standard 1: Understands and applies principles of scientific inquiry Power : Identifies questions and concepts that guide science investigations Uses technology and mathematics
Committee on WIPO Standards (CWS)
E CWS/1/5 ORIGINAL: ENGLISH DATE: OCTOBER 13, 2010 Committee on WIPO Standards (CWS) First Session Geneva, October 25 to 29, 2010 PROPOSAL FOR THE PREPARATION OF A NEW WIPO STANDARD ON THE PRESENTATION
THE CAMBRIDGE CRYSTALLOGRAPHIC DATA CENTRE (CCDC)
ABOUT THE CAMBRIDGE CRYSTALLOGRAPHIC DATA CENTRE (CCDC) The CCDC is the trusted research institution responsible for the 50-year old Cambridge Structural Database (CSD) and its applications. Used by thousands
White Paper: Securely archiving emails
White Paper: Securely archiving emails in the PDF/A format 25 years ago, Germany s first email was sent. At the time, few could guess that electronic messaging would become one of the world s most essential
Pallas Ludens. We inject human intelligence precisely where automation fails. Jonas Andrulis, Daniel Kondermann
Pallas Ludens We inject human intelligence precisely where automation fails Jonas Andrulis, Daniel Kondermann Chapter One: How it all started The Challenge of Reference and Training Data Generation Scientific
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
PROMT Technologies for Translation and Big Data
PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED
Keystone Exams: Chemistry Assessment Anchors and Eligible Content. Pennsylvania Department of Education www.education.state.pa.
Assessment Anchors and Pennsylvania Department of Education www.education.state.pa.us 2010 PENNSYLVANIA DEPARTMENT OF EDUCATION General Introduction to the Keystone Exam Assessment Anchors Introduction
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
How To Understand The Chemistry Of A 2D Structure
Finding Better Leads using Molecular Fields Sally Rose, Tim Cheeseright, Cresset BioMolecular Discovery Ltd 2D drawings are a ubiquitous representation for molecular structures. Despite this, they provide
CLUSTER ANALYSIS WITH R
CLUSTER ANALYSIS WITH R [cluster analysis divides data into groups that are meaningful, useful, or both] LEARNING STAGE ADVANCED DURATION 3 DAY WHAT IS CLUSTER ANALYSIS? Cluster Analysis or Clustering
NMR and other Instrumental Techniques in Chemistry and the proposed National Curriculum.
NMR and other Instrumental Techniques in Chemistry and the proposed National Curriculum. Dr. John Jackowski Chair of Science, Head of Chemistry Scotch College Melbourne [email protected]
Introduction. What Can an Offline Desktop Processing Tool Provide for a Chemist?
Increasing Chemist Productivity in an Open-Access Environment Ryan Sasaki, Graham A. McGibbon, Steve Hayward Featuring ACD/Spectrus Processor and Aldrich Library for ACD/Labs Advanced Chemistry Development,
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Indiana's Academic Standards 2010 ICP Indiana's Academic Standards 2016 ICP. map) that describe the relationship acceleration, velocity and distance.
.1.1 Measure the motion of objects to understand.1.1 Develop graphical, the relationships among distance, velocity and mathematical, and pictorial acceleration. Develop deeper understanding through representations
How To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
Instructional Design Philosophy
Instructional Design Philosophy Executive Summary Instructional Design Principles: MindLeaders training is based on sound principles from research in instructional design, adult learning, and information
TAKEAWAYS CHALLENGES. The Evolution of Capture for Unstructured Forms and Documents STRATEGIC WHITE PAPER
STRATEGIC WHITE PAPER CHALLENGES Lost Productivity Business Inefficiency The Evolution of Capture for Unstructured Forms and Documents Inability to Capture and Classify Unstructured Forms and Documents
The Re-emergence of Data Capture Technology
The Re-emergence of Data Capture Technology Understanding Today s Digital Capture Solutions Digital capture is a key enabling technology in a business world striving to balance the shifting advantages
Big Data An Opportunity or a Distraction? Signal or Noise?
Big Data An Opportunity or a Distraction? Signal or Noise? Maya R. Said, Sc.D. SVP & Global Head, Oncology Policy & Market Access, Novartis 3rd International Systems Biomedicine Symposium Luxembourg, 28
OCR LEVEL 3 CAMBRIDGE TECHNICAL
Cambridge TECHNICALS OCR LEVEL 3 CAMBRIDGE TECHNICAL CERTIFICATE/DIPLOMA IN IT SYSTEM DESIGN R/505/4647 LEVEL 3 UNIT 33 GUIDED LEARNING HOURS: 60 UNIT CREDIT VALUE: 10 SYSTEM DESIGN R/505/4647 LEVEL 3
