Data Visualization in Cheminformatics. Simon Xi Computational Sciences CoE Pfizer Cambridge




My Background
Professional Experience: Senior Principal Scientist, Computational Sciences CoE, Pfizer Cambridge. Nine years of experience in pharmaceutical research, focused on developing cheminformatics and bioinformatics applications for research scientists.
Education: MSc in Molecular Cell Biology, UT Dallas; MSc in Software Engineering, SMU; finishing a Ph.D. in Bioinformatics at Boston University.

What we will cover today
  Introduction to drug discovery
  Cheminformatics basics
  Encoding of chemical structures
  Visualizing data and structures
  Design and optimization of compound libraries
  A case study

The Billion Dollar Molecules
Drug Name    2006 Worldwide Sales    Primary Use
Lipitor      $14,385M                cholesterol
Nexium       $5,182M                 heartburn
Advair       $6,129M                 asthma
Prevacid     $3,425M                 heartburn
Plavix       $6,057M                 anticoagulant
Singulair    $3,579M                 asthma
Seroquel     $3,560M                 depression
Effexor      $3,722M                 depression
Norvasc      $4,866M                 hypertension
Lipitor: about $14 billion in annual sales

Industry Productivity vs. Investment: The Challenge
[Figure: total R&D investment ($ billions) and number of NMEs by year, 1970-2000, illustrating NME/$ productivity]
Source: PhRMA annual survey, 2000; Nature Reviews Drug Discovery 3, 451-456 (2004)

Attrition on the R&D Process
[Figure: R&D funnel on a 0-15 year timeline, from idea to drug in 11-15 years. ~100 discovery approaches and millions of compounds screened yield 1-2 products. Stages shown: Discovery, Exploratory Development, Full Development; Phase I, Phase II, Phase III. Activities shown: Preclinical Pharmacology, Preclinical Safety, Clinical Pharmacology & Safety]

Nat Rev Drug Discov. 2007 6:636-49.

What is Chemoinformatics?
The use of computational and information techniques applied to a range of problems in chemistry. These in silico techniques are commonly used by pharmaceutical companies in drug discovery.
Chemistry is a visual science, so data visualization is a key component of cheminformatics.


Encoding Chemical Structures
SD format (Lipitor): a connection table listing atoms and bonds
SMILES format (Lipitor): CC(C)C1=C(C(=O)NC2=CC=CC=C2)C(C3=CC=CC=C3)=C(N1CCC(O)CC(O)CC(O)=O)C4=CC=C(F)C=C4
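To make the SMILES encoding concrete, here is a minimal sketch that reads the Lipitor SMILES string from this slide and counts its heavy atoms and ring closures. It is not a SMILES parser: it assumes a bracket-free string using only the organic-subset symbols, and the function names are illustrative. A real application would use a cheminformatics toolkit.

```python
import re
from collections import Counter

# Organic-subset element symbols; two-letter symbols are tried first.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

# The Lipitor SMILES from the slide, with the line-break spaces removed.
LIPITOR_SMILES = ("CC(C)C1=C(C(=O)NC2=CC=CC=C2)C(C3=CC=CC=C3)"
                  "=C(N1CCC(O)CC(O)CC(O)=O)C4=CC=C(F)C=C4")

def heavy_atom_counts(smiles):
    """Count heavy atoms per element in a bracket-free SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled by this sketch")
    # Aromatic atoms are lowercase in SMILES; normalise them to the element symbol.
    return Counter(sym.capitalize() for sym in ATOM.findall(smiles))

def ring_closure_count(smiles):
    """Each ring closure uses one digit twice, so closures = digits / 2."""
    digits = sum(ch.isdigit() for ch in smiles)
    return digits // 2

counts = heavy_atom_counts(LIPITOR_SMILES)
rings = ring_closure_count(LIPITOR_SMILES)
print(dict(counts), rings)
```

Hydrogens are implicit in SMILES, so they do not appear in the counts. Recovering them requires valence rules that this sketch leaves out.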

Representing Structure as Fingerprints
[Figure: substructure features of a molecule mapped to positions in a bit string, e.g. 0100100010010000100]

Compound Similarity Search
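A similarity search over fingerprints can be sketched as follows. The slides do not name a similarity measure. This example assumes the Tanimoto coefficient, a common choice for bit-string fingerprints. The compound names and bit strings are made up for illustration.

```python
def on_bits(bitstring):
    """Return the set of positions that are set to 1 in a fingerprint."""
    return {i for i, ch in enumerate(bitstring) if ch == "1"}

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    a, b = on_bits(fp_a), on_bits(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_search(query_fp, library, threshold):
    """Return (name, similarity) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical 10-bit fingerprints; real ones typically have many more bits.
library = {
    "cmpd_A": "0100100010",
    "cmpd_B": "0100100011",
    "cmpd_C": "1011000100",
}
hits = similarity_search("0100100010", library, threshold=0.7)
print(hits)
```

Here cmpd_A matches the query exactly, cmpd_B differs by one extra bit, and cmpd_C shares no bits with the query and falls below the threshold.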

Compound Properties/Descriptors
1D, 2D, 3D, and multi-dimensional properties
  1D: molecular weight, cLogP, number of atoms, charge, number of H-bond donors and acceptors
  2D: atom pairs, substructures, functional groups
  3D: shape, pharmacophores
  nD: fingerprints, etc.
Chemical series: compounds sharing the same core structure

Series Classification: Ward's Clustering
Iteratively merge pairs of nodes until all nodes are merged. At each step, the two nodes whose merger gives the smallest increase in within-cluster variance are merged into one new node. Once the tree hierarchy is generated, clusters are defined by cutting the tree at a chosen dissimilarity threshold.
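The merging procedure can be sketched in a few lines. This is a naive illustration, assuming compounds are described by small numeric descriptor vectors (the points below are made up) and using the standard Ward merge cost, |A||B|/(|A|+|B|) times the squared distance between centroids. Stopping once the cheapest merge exceeds the threshold has the same effect as cutting the finished tree at that height.

```python
from itertools import combinations

def centroid(points):
    """Mean position of a list of equal-length descriptor vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_cost(a, b):
    """Increase in within-cluster variance caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, threshold):
    """Merge the cheapest pair repeatedly; stop when the cost exceeds threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        if ward_cost(clusters[i], clusters[j]) > threshold:
            break  # equivalent to cutting the tree at this dissimilarity
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical 2D descriptors for four compounds forming two obvious series.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
clusters = ward_clusters(points, threshold=1.0)
print(clusters)
```

The pairwise search makes this cubic in the number of compounds, which is fine for a demonstration but not for a real compound collection.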

What makes a drug?
Primary pharmacology
  In vitro potency
  Cell-based potency
  Functional assays
  Selectivity against other targets
Toxicity properties
  Inhibition of CYP450 isozymes
  PXR transactivation
  Human hepatocyte toxicity
  Mutagenicity
  Mitochondrial toxicity
  Covalent protein binding
  Inhibition of hERG
ADME/physicochemical properties
  Solubility
  Chemical stability
  Hydrophobicity/hydrogen bonding potential
  Intestinal mucosal cell permeation
  Liver and kidney clearance
  Metabolism
  Transporters
  Charge
  Size
  Protein binding
  Blood-brain barrier permeation
  Target cell permeation

Drug-Likeness: Rule of Five
Proposed by C. Lipinski to describe drug-like molecules. Molecules with good oral absorption and/or distribution properties are likely to have the following characteristics:
  Molecular weight ≤ 500
  logP ≤ 5.0
  H-bond donors ≤ 5
  H-bond acceptors (number of N and O atoms) ≤ 10
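The rule translates directly into a filter. The sketch below assumes the four descriptors have already been computed elsewhere and treats a value exactly at a limit as passing, which is an assumption about boundary handling. The two example compounds are hypothetical.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of Rule of Five criteria that a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations

# Hypothetical descriptor values, for illustration only.
small_polar = rule_of_five_violations(mol_weight=350.4, logp=2.1,
                                      h_donors=2, h_acceptors=5)
large_greasy = rule_of_five_violations(mol_weight=620.0, logp=6.3,
                                       h_donors=2, h_acceptors=8)
print(small_polar)
print(large_greasy)
```

Returning the violated criteria rather than a single pass/fail flag keeps the result interpretable when compounds are triaged in bulk.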

Data Visualization
Grid View, Table View, Plot View, Heatmap View
Software Relevance, Software Usability, Software Management

Building Predictive Models Using Machine Learning Techniques
  Use computational models to understand the Structure-Activity Relationship (SAR)
  Use computational models to run virtual screens that guide compound selection for synthesis

Interpretability of Predictive Models
  The good part
  The not so good part
  Can we derive this for non-linear models?

Multiple Parameter Optimization in Combinatorial Library Design
Given a 100x100x100 virtual library space (1,000,000 compounds) and a set of predictive models for various properties (e.g. potency, ADME, selectivity), select the best 300 compounds for synthesis: those with the highest probability of being potent and drug-like, while sampling the chemical space diversely.
Example: a diaminopyrimidine library with three variable positions (R1, R2, R3). [Structure drawing omitted]
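One simple way to approach this selection problem is a greedy pick over a combined score. The sketch below is not the method used in the talk. It assumes each R-group has a precomputed desirability between 0 and 1 (the values here are made up), scores a compound as the product of its R-group desirabilities, and caps how often any R-group may be reused as a crude stand-in for diversity.

```python
from collections import Counter
from itertools import product

def mpo_score(compound, desirability):
    """Multiply the per-position R-group desirabilities into one score."""
    score = 1.0
    for position, group in enumerate(compound):
        score *= desirability[position][group]
    return score

def greedy_select(library, desirability, n_select, max_reuse):
    """Take compounds best-first, skipping any that overuse an R-group."""
    ranked = sorted(library, key=lambda c: mpo_score(c, desirability),
                    reverse=True)
    used = Counter()
    selected = []
    for compound in ranked:
        keys = list(enumerate(compound))  # (position, R-group) pairs
        if any(used[key] >= max_reuse for key in keys):
            continue
        selected.append(compound)
        used.update(keys)
        if len(selected) == n_select:
            break
    return selected

# Hypothetical 2x2x2 library; the slide's case would be 100x100x100.
desirability = [{"a": 0.9, "b": 0.6}, {"x": 0.8, "y": 0.5}, {"p": 1.0, "q": 0.7}]
library = list(product(["a", "b"], ["x", "y"], ["p", "q"]))
selected = greedy_select(library, desirability, n_select=2, max_reuse=1)
print(selected)
```

With a reuse cap of 1, the second pick is forced away from every R-group in the top-scoring compound, which shows the trade-off between score and diversity that the slide describes.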

The Problem of Multiple Parameter Optimization
  The chemical space is huge
  Predictive models are not very predictive
  There are many parameters to optimize, and they sometimes conflict with each other

MPO: a case study with kinase selectivity
~200 cmpds from a library were tested against 40 kinases. Can we design another 100 cmpds that are highly selective?
Series: trifluoro-diaminopyrimidine (~200 cmpds) with two variable positions, R1 and R2. [Structure drawing omitted]
Workflow: tested compounds -> model building (FW: solving R-group contributions using linear regression) -> enumeration (5-50x expansion) -> predictable virtual chemical space -> virtual library profile
Only a few R-group/kinase combinations have been tested previously.
Goal: identify compounds with the desired selectivity profile in the expanded virtual chemical space.
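The idea of solving R-group contributions by linear regression can be illustrated with an additive model. The sketch assumes "FW" on the slide refers to a Free-Wilson-style analysis, which the transcript does not spell out. The R-group labels and pIC50 values are invented, and one reference group per position is absorbed into the intercept so the system has a unique solution.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few tested combinations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_additive_model(compounds):
    """Least-squares fit of activity = intercept + R1 term + R2 term."""
    r1_levels = sorted({c[0] for c in compounds})
    r2_levels = sorted({c[1] for c in compounds})
    # The first level at each position is the reference (contribution 0).
    features = ([("R1", g) for g in r1_levels[1:]]
                + [("R2", g) for g in r2_levels[1:]])

    def encode(r1, r2):
        chosen = {("R1", r1), ("R2", r2)}
        return [1.0] + [1.0 if f in chosen else 0.0 for f in features]

    rows = [encode(r1, r2) for r1, r2, _ in compounds]
    y = [activity for _, _, activity in compounds]
    p = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return encode, solve_linear_system(xtx, xty)

def predict(encode, coef, r1, r2):
    """Predicted activity for an R1/R2 combination, tested or not."""
    return sum(w * x for w, x in zip(coef, encode(r1, r2)))

# Hypothetical tested compounds: (R1, R2, pIC50 against one kinase).
tested = [("A", "X", 5.0), ("B", "X", 6.0), ("A", "Y", 5.5),
          ("A", "Z", 4.5), ("B", "Y", 6.5)]
encode, coef = fit_additive_model(tested)
untested = predict(encode, coef, "B", "Z")  # combination never synthesized
print(round(untested, 3))
```

Fitting one such model per kinase would give a predicted selectivity profile for every enumerated R1/R2 combination, including the untested ones.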

Predictive models - Leave-One-Out Validations
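Leave-one-out validation holds out each compound in turn, refits the model on the rest, and predicts the held-out value. The sketch below uses a one-descriptor straight-line model as a stand-in for whatever model is being validated. The data are made up, and the cross-validated q² statistic is an assumed choice of summary.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out(xs, ys):
    """Predict each point from a model fitted to all the other points."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

def q_squared(ys, predictions):
    """Cross-validated q2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total

# Hypothetical descriptor values and measured activities.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
q2 = q_squared(ys, leave_one_out(xs, ys))
print(round(q2, 3))
```

Because each prediction comes from a model that never saw the held-out compound, q² is a more honest estimate of predictive power than the r² of a fit to all the data.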

Experimental Validation of Predictions
~40 cmpds in two series were selected for KSS testing.
[Figure: scatter plots of KSS pIC50 vs. FW pIC50, one panel per kinase, arranged from more promiscuous to more selective. Panel r² values: 0.45, 0.59, 0.92, 0.86, 0.74, 0.83, 0.63, 0.88, 0.85, 0.81, 0.81, 0.85]

Cheminformatics Challenges for Drug Discovery
  Information retrieval and knowledge management: rapidly and efficiently present all relevant data and knowledge to scientists at the right time and place
  Predictive models: drastically improve the accuracy and interpretability of in silico models for potency and ADME endpoints
  Computer-aided design: provide easy-to-use software applications that help scientists analyze and visualize their data and make efficient use of prior knowledge during compound design

References
1. Agrafiotis, D. K., Lobanov, V. S. and Salemme, F. R. (2002) Combinatorial informatics in the post-genomics era. Nat Rev Drug Discov. 1, 337-346
2. Lipinski, C. and Hopkins, A. (2004) Navigating chemical space for biology and medicine. Nature. 432, 855-861
3. Paolini, G. V., Shapland, R. H., van Hoorn, W. P., Mason, J. S. and Hopkins, A. L. (2006) Global mapping of pharmacological space. Nat Biotechnol. 24, 805-815