Institute for Computational Biomedicine. Big Data analytics for precision medicine and drug discovery. Olivier Elemento, PhD, Associate Professor. Laboratory of Cancer Systems Biology, Institute for Precision Medicine
Precision Medicine at Weill Cornell Advanced cancer patient Big Data
Weill Cornell CLIA whole-exome sequencing test queries 22,000 genes
Institute for Precision Medicine Workflow [Diagram: patient; orders a test; receives report; patient management; lab management; dashboard; tracker; scanner; cbio]
IPM Computational Pipeline [Diagram: sequencers; ICB infrastructure; LIMS & dashboard; analysis (point mutations & indel detection; copy number alteration detection; sample matching check (SPIA); clonality analysis (CLONET); germline variants detection); mutation database; report generation; visualization (IPM cbioportal); + Electronic Medical Record (EPIC)]
Copy Number Alterations: CHD1, MYC, PTEN, AR
Bladder cancer patient HER2
HER2 amplification in a bladder cancer patient After Herceptin treatment
350 patients, 500 tumor-normal pairs (Beltran et al., 2015, JAMA Oncology)
Genomic data warehouse: locally installed custom cbioportal
Neo-epitope discovery for PM-guided immunotherapy Bhavneet Bhinder
Increasing actionability of the cancer genome
Finding the targets of small molecules is hard: 20-30 million small molecules; targets known for <0.01%
BANDIT: Bayesian ANalysis to predict Drug Interaction Targets (Neel Madhukar). Data types: drug efficacies, transcriptional response, bioassays, side effects, chemical structure. Integration: $\prod \frac{P(\text{data} \mid \text{shared target})}{P(\text{data} \mid \text{no shared target})}$
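The integration step on this slide multiplies, across data types, the probability of the observed data given a shared target divided by its probability given no shared target. The following is a minimal sketch of that combination only, assuming the per-data-type ratios have already been estimated. The function name, dictionary keys, and numeric values are hypothetical and purely illustrative; this is not BANDIT's actual code or output.

```python
import math


def total_likelihood_ratio(ratios):
    """Combine per-data-type likelihood ratios by multiplication.

    `ratios` maps a data type name to
    P(data | shared target) / P(data | no shared target).
    Logs are summed rather than multiplying directly, which stays
    numerically stable when many terms are combined.
    """
    if not ratios:
        raise ValueError("at least one data type is required")
    if any(r <= 0 for r in ratios.values()):
        raise ValueError("likelihood ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios.values()))


# Hypothetical ratios for one drug pair (illustrative values only).
# Data types that are unavailable for the pair are simply left out.
pair_ratios = {
    "chemical_structure": 2.0,
    "transcriptional_response": 1.5,
    "side_effects": 4.0,
}
tlr = total_likelihood_ratio(pair_ratios)
print(round(tlr, 6))
```

Multiplying the ratios treats the data types as conditionally independent given whether the pair shares a target; a combined ratio above 1 favors a shared target, below 1 favors no shared target.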
High likelihood of sharing a target
All data types are useful
More data, better results
How well can we predict targets? [Plot: fraction correctly predicted vs. likelihood ratio cutoff; drug targets from DrugBank]
50,000+ orphan small molecules; 40%; likelihood ratio > 4; 3+ data types required
Predicted anti-tubulin molecules
[Figure panels: No Drug (Control); Taxol, polymerizing (control); Vinblastine, depolymerizing (control); NSC116555, polymerizing; NSC648543, depolymerizing; NSC667932, depolymerizing; NSC406042, depolymerizing; NSC335989, depolymerizing] With Evi Giannakakou
Predicting Clinical Trial Winners: success vs. failure (Katie Gayvert)
Factors Contributing to Clinical Trial Success
Toxicity Features
Chemical Toxicity: chemical descriptors (Lipinski's Rule of 5; subgroups, polarity, formal charge, # atoms); variance of drug's effect across tissue types (GI50 values in cell lines)
Target Toxicity: essentiality of target gene; frequency of target loss; network connectivity; expression level of target gene across multiple tissues (GTEx expression level in different tissue types; GTEx expression level in relevant GTEx tissues)
Features Representing Improvements Over Baseline: number of drugs currently approved for the indication; similarity to other drugs that are approved or failed for the indication
Lipinski Rule of 5
1. No more than 5 hydrogen bond donors
2. No more than 10 hydrogen bond acceptors
3. A molecular mass less than 500 daltons
4. An octanol-water partition coefficient log P not greater than 5
[Figure: drugs that passed or failed each test (hydrogen bond donor, hydrogen bond acceptor, molecular weight, XLogP); panels: FDA Approved Cancer Drugs, Failed Cancer Clinical Trials]
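The four rules above translate directly into threshold checks. The sketch below assumes the four descriptors have already been computed by some cheminformatics tool; the `Descriptors` class, the function names, and both example compounds are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # daltons
    log_p: float           # octanol-water partition coefficient


def lipinski_checks(d):
    """Return pass/fail for each rule, using the thresholds as stated on the slide."""
    return {
        "h_bond_donors": d.h_bond_donors <= 5,
        "h_bond_acceptors": d.h_bond_acceptors <= 10,
        "molecular_mass": d.molecular_mass < 500,
        "log_p": d.log_p <= 5,
    }


def violations(d):
    """List the names of the rules that a compound fails."""
    return [name for name, ok in lipinski_checks(d).items() if not ok]


# Hypothetical compounds (made-up descriptor values).
small = Descriptors(h_bond_donors=2, h_bond_acceptors=5, molecular_mass=350.0, log_p=2.1)
large = Descriptors(h_bond_donors=7, h_bond_acceptors=12, molecular_mass=812.0, log_p=6.3)

print(violations(small))
print(violations(large))
```

Returning per-rule results rather than a single yes/no keeps each test usable as a separate feature, in line with the per-test pass/fail breakdown on the slide.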
Target Expression Level Across Tissues [Figure: drugs by target expression across tissues, low to high expression; panels: Failed Clinical Trials, FDA Approved Drugs]
Prediction Method Performance: FDA Approved Cancer Drugs vs. Failed Clinical Trials (phase I)
Random Forests classifier, leave-one-out cross-validation

              Reference
              Approved   Failed
Approved      113        25
Failed        27         94

Accuracy = 0.80, AUROC = 0.87, Sensitivity = 0.79, Specificity = 0.81
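The accuracy, sensitivity, and specificity on this slide can be checked from the four counts. The sketch below assumes that columns are the reference labels and rows are the predictions, and that "Failed" is treated as the positive class; neither is stated on the slide, but that reading reproduces the reported values. The function and variable names are hypothetical. AUROC cannot be recomputed from a single confusion matrix, so it is not covered here.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Counts from the slide, read as (reference, prediction) pairs.
# Assumption: columns are reference labels, rows are predictions.
approved_as_approved = 113
failed_as_approved = 25
approved_as_failed = 27
failed_as_failed = 94

# Assumption: "Failed" is the positive class.
failed_positive = binary_metrics(
    tp=failed_as_failed,
    fn=failed_as_approved,
    fp=approved_as_failed,
    tn=approved_as_approved,
)

# Alternative reading, with "Approved" as the positive class.
approved_positive = binary_metrics(
    tp=approved_as_approved,
    fn=approved_as_failed,
    fp=failed_as_approved,
    tn=failed_as_failed,
)

for name, metrics in [("Failed positive", failed_positive),
                      ("Approved positive", approved_positive)]:
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

With "Failed" as the positive class this gives accuracy 0.80, sensitivity 0.79, and specificity 0.81; with "Approved" as the positive class, sensitivity and specificity swap.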
Feature Importance [Figure: variable importance in the model, two panels (MeanDecreaseAccuracy, MeanDecreaseGini); legend: important, not important]
MeanDecreaseAccuracy panel, features as listed: maxdegree, Refractivity, Liver, Pituitary, MolecularWeight, Pancreas, Blood, LogpSolubility, RotatableBondCount, PolarSurfaceArea, Small Intestine, maxbtwn, XLogP, Thyroid, HydrogenBondDonorCount, Muscle, Kidney, Stomach, Nerve, Skin, Colon, Adrenal Gland, Testis, Salivary Gland, Vagina, Ovary, Spleen, Prostate, Fallopian Tube, Blood Vessel
MeanDecreaseGini panel, features as listed: Refractivity, maxdegree, PolarSurfaceArea, LogpSolubility, XLogP, MolecularWeight, maxbtwn, RotatableBondCount, lossfreq, HydrogenBondDonorCount, HydrogenBondAcceptorCount, NumRings, Liver, Muscle, Kidney, Adrenal Gland, Testis, Pancreas, Blood, Small Intestine, Pituitary, Nerve, Heart, Thyroid, Brain, Stomach, Colon, Spleen, Ovary, Esophagus
Acknowledgements. Elemento Lab, Weill Cornell; Yanwen Jiang, PhD; David Redmond; Arielle Messer; Neel Madhukar; Matt Teater; Mark Carty; Wei Du; Katie Gayvert; Heng Pan; Linda Huang; Ken Eng; Wayne Tam, MD, PhD; Kui Nie, PhD; Peter Martin, MD; John Leonard, MD; Ari Melnick, MD; Leandro Cerchietti, MD; Mark Rubin, MD; Selina Chen-Kiang, PhD; Rubin lab; Rickman lab; Tarun Kapoor, PhD; Sarah Wacker, PhD; IPM team; Jenny Xiang & WCMC Genomics Core; Adrian Tan; Tuo Zhang; Epigenomics core.
Funding: NSF, NIH, Starr Cancer, LLS SCOR & TRP, Cancer Center Pilot Grant, Hirschl trust, Tri-Sci Stem cell, PhRMA, Janssen Pharma