De novo design in the cloud: from mining big data to clinical candidate
Jérémy Besnard
Data Science For Pharma Summit, 28th January 2016
Overview: the 3 bullet points
A cloud-based data platform that can efficiently capture and mine multiple data sources.
A platform to facilitate exchange with collaborators and to integrate live data into our infrastructure.
Large-scale machine learning to extract knowledge from this data and improve decision making during drug discovery campaigns.
Cloud: not just a trendy word
Exscientia
A young spin-out company from the University of Dundee.
Few employees, working in multiple locations.
Built the platform from revenues through partnerships and contracts with pharmaceutical companies.
By deploying our infrastructure on the cloud, we were able to work with our partners quickly and efficiently.
Big data
Big compared to what? The philosophy is more important than the size.
Collect and use all the data rather than small sample sets.
Accept the messiness of data: the benefits of using more data of variable quality outweigh the costs of using small, very exact data.
Accept that we may not need to understand the physical basis of a correlation for the predictions to be useful.
Ex scientia = "from knowledge" in Latin.
Sources of data
Comprehensive exploration: the platform, integrated with proprietary methods, delivers a global view of the polypharmacopaeia.
Data sources: HTS and patents, competitor intel, structural complexes, fragments, literature-derived SAR, clinical, proprietary.
Automated med-chem design.
Synthesis and assay through preferred outsourced providers or collaborators.
Live data
Large datasets, whether public, commercial or corporate, are an important source of information, but once a drug discovery project starts, new information is generated and needs to be integrated and exploited.
Learn from the past, but don't live in the past.
Challenges of data flow: plumbing
Shared platform and automation
Integration with ScienceCloud from Biovia to share chemistry and assay data with our collaborators.
The cycle: a collaborator uploads new data; the data are automatically downloaded and integrated; new information is generated; the results are uploaded to our collaborator for future decisions.
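The "automatically downloaded and integrated" step can be illustrated with a minimal sketch. It assumes a hypothetical record schema (`AssayRecord`) and a plain in-memory store, and does not use the ScienceCloud API, whose details are not covered here. What it shows is idempotent integration: re-running the sync never duplicates data, and only genuinely new records are passed on to the next design cycle.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssayRecord:
    """One measurement as it might arrive from a collaborator (hypothetical schema)."""
    compound_id: str
    assay: str
    value_nM: float
    measured_on: date


def integrate(store: dict[str, list[AssayRecord]],
              incoming: list[AssayRecord]) -> list[AssayRecord]:
    """Add downloaded records to the local store, grouped by compound.

    Records already present (for example from a repeated download) are skipped,
    so the sync can be re-run safely. Only genuinely new records are returned;
    these are what the next design cycle needs to pick up.
    """
    new: list[AssayRecord] = []
    for rec in incoming:
        existing = store.setdefault(rec.compound_id, [])
        if rec not in existing:  # frozen dataclass compares field by field
            existing.append(rec)
            new.append(rec)
    return new


store: dict[str, list[AssayRecord]] = {}
batch = [AssayRecord("CPD-001", "enzyme_A_IC50", 350.0, date(2016, 1, 10))]
print(len(integrate(store, batch)))  # 1
print(len(integrate(store, batch)))  # 0: downloading the same batch again changes nothing
```

Keeping every distinct record (rather than overwriting by compound) is a design choice made here so that replicate measurements are preserved; a real pipeline might aggregate them instead.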
Data science: Darwin meets big data
Drug discovery is ultimately a high-dimensional optimization problem.
Given the impossibly vast chemical space, brute-force searches are inherently inefficient.
Darwinian processes are unreasonably efficient at finding solutions to high-dimensional problems, whether evolutionary fitness in nature or drug discovery.
De novo design algorithm
Background knowledge seeds the initial population selection. Each generation then runs virtual enumeration, property prediction, multiobjective prioritization, and elite and random population selection, repeating until the stop condition is met and the final population is returned.
Besnard et al., Automated design of ligands to polypharmacological profiles, Nature 492, 215-220 (2012). http://doi.org/10.1038/nature11691
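The loop above can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not the published implementation: molecules are hypothetical tuples of fragment labels, "virtual enumeration" is a random swap, insertion or deletion of one fragment, property prediction is any set of user-supplied scoring functions, and multiobjective prioritization is replaced by the simple rule that the weakest objective decides. The stop condition is a generation limit plus a hypothetical `patience` count of generations without improvement.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Compound = Tuple[str, ...]  # hypothetical: a molecule as a sequence of fragment labels


def evolve(
    seeds: Sequence[Compound],
    fragments: Sequence[str],
    objectives: Sequence[Callable[[Compound], float]],
    n_elite: int = 5,
    n_random: int = 5,
    children_per_parent: int = 20,
    max_generations: int = 20,
    patience: int = 3,
    rng: Optional[random.Random] = None,
) -> List[Compound]:
    rng = rng or random.Random(0)

    def mutate(c: Compound) -> Compound:
        # Stand-in for virtual enumeration: swap, insert or drop one fragment.
        parts = list(c)
        ops = ("swap", "add", "drop") if len(parts) > 1 else ("swap", "add")
        op = rng.choice(ops)
        i = rng.randrange(len(parts))
        if op == "swap":
            parts[i] = rng.choice(fragments)
        elif op == "add":
            parts.insert(i, rng.choice(fragments))
        else:
            del parts[i]
        return tuple(parts)

    def score(c: Compound) -> float:
        # Stand-in for multiobjective prioritization: the weakest objective decides.
        return min(f(c) for f in objectives)

    def rank(pool):
        # Best first; ties broken by the compound itself so runs are reproducible.
        return sorted(pool, key=lambda c: (-score(c), c))

    population = rank(set(seeds))  # initial population from background knowledge
    best, stale = score(population[0]), 0
    for _ in range(max_generations):
        children = {mutate(p) for p in population for _ in range(children_per_parent)}
        pool = rank(set(population) | children)
        elite, rest = pool[:n_elite], pool[n_elite:]
        # Elite keeps the best designs; a random draw from the rest keeps diversity.
        population = elite + rng.sample(rest, min(n_random, len(rest)))
        if score(elite[0]) > best:
            best, stale = score(elite[0]), 0
        else:
            stale += 1
            if stale >= patience:  # stop condition: no recent improvement
                break
    return rank(population)  # final population, best first
```

In a real campaign the objective functions would be trained activity models for targets and anti-targets plus property predictors, and only the top of the final population would go forward to synthesis.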
Illustration of evolution
Fewer than 30 compounds were required to discover, synthesize and patent Compound 27s, a selective D4 compound with early lead properties.
[Figure: evolution proceeding in successive steps of 2 generations each]
More than 10,000 compounds were evolved and scored for D4 and off-targets per generation, but only the few most promising compounds were synthesized and screened.
Compound 27s: D4 Ki = 90 nM. Patent: PCT/GB2012/051194 / WO2012160392
Technology in practice
Automated lead generation with a rapid design cycle and efficient evolution to a drug candidate profile.
Design: simultaneous design objectives deliver balanced compounds.
Make: 10-30 compounds per cycle, with high information content.
Test: in disease-relevant assays.
Learn: assay data informs the next design cycle.
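One common way to make "simultaneous design objectives deliver balanced compounds" concrete is a desirability score: map each property onto [0, 1] and combine the values with a geometric mean, so that failing any single objective drags the whole score to zero. This is a generic, swapped-in technique shown for illustration; the property names and ranges in `spec` are hypothetical, not the project's actual objectives.

```python
import math
from typing import Dict, Tuple


def ramp(value: float, worst: float, best: float) -> float:
    """Map a property onto [0, 1]: 0 at `worst`, 1 at `best`, linear in between.

    Works for 'higher is better' (best > worst) and 'lower is better' (best < worst).
    """
    t = (value - worst) / (best - worst)
    return max(0.0, min(1.0, t))


def desirability(props: Dict[str, float], spec: Dict[str, Tuple[float, float]]) -> float:
    """Geometric mean of the per-objective desirabilities in `spec`."""
    d = [ramp(props[name], worst, best) for name, (worst, best) in spec.items()]
    if min(d) == 0.0:
        return 0.0  # failing any one objective sinks the whole compound
    return math.exp(sum(math.log(x) for x in d) / len(d))


# Hypothetical objectives: (worst, best) for each property.
spec = {"pKi_target": (6.0, 8.0), "logP": (5.0, 2.0)}
balanced = {"pKi_target": 7.0, "logP": 3.5}
lopsided = {"pKi_target": 8.5, "logP": 5.5}  # very potent, but too lipophilic
print(round(desirability(balanced, spec), 3))  # 0.5
print(desirability(lopsided, spec))            # 0.0
```

A weighted sum would let the lopsided compound's excess potency compensate for its poor lipophilicity; the geometric mean does not, which is why it favours balanced profiles.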
COLLABORATION PROJECTS
Metabolic disorder
Dual agonist for two unrelated targets.
Design against polypharmacology profiles.
Confirmatory 3D structures of both complexes.
Bispecific compounds
Goal: find a first-in-class small molecule that is bispecific for two enzymes from unrelated families.
Process: gather public and patent data to build models; de novo design with the evolutionary algorithm; docking of the top-ranked compounds to assess whether they could bind to both targets; in vitro assay followed by crystallography.
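For a bispecific goal, the "top-ranked compounds" must be good at both targets at once. A minimal sketch of that selection, assuming each compound already has a predicted potency (or docking-derived score) for each enzyme on a "higher is better" scale, ranks on the weaker of the two values. The `Prediction` type, the compound IDs and the threshold are hypothetical.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    compound_id: str
    p_enzyme_a: float  # predicted pIC50 (or docking-derived score) at enzyme A; higher is better
    p_enzyme_b: float  # the same for enzyme B


def shortlist_bispecific(preds: List[Prediction], min_p: float, top_n: int) -> List[str]:
    """Keep compounds predicted active at BOTH targets, ranked by the weaker prediction.

    Ranking on the minimum rather than the sum stops a compound that is very potent
    at one enzyme but inactive at the other from rising to the top of the list.
    """
    both = [p for p in preds if min(p.p_enzyme_a, p.p_enzyme_b) >= min_p]
    both.sort(key=lambda p: min(p.p_enzyme_a, p.p_enzyme_b), reverse=True)
    return [p.compound_id for p in both[:top_n]]


preds = [
    Prediction("X1", 9.5, 5.5),  # highest sum, but weak at enzyme B
    Prediction("X2", 7.0, 7.5),
    Prediction("X3", 6.8, 6.6),
]
print(shortlist_bispecific(preds, min_p=6.5, top_n=2))  # ['X2', 'X3']
```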
Structural validation
X-ray crystal structures of both enzyme complexes with the top-prioritized compound, together with the assay data, confirm the design hypothesis.
Enzyme A: IC50 = 350 nM. Enzyme B: IC50 = 10 nM.
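Potencies such as these are often compared on a logarithmic scale, pIC50 = -log10(IC50 in mol/L), which for values given in nM is 9 - log10(IC50). A small worked example with the two values from this slide:

```python
import math


def pic50(ic50_nM: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); with IC50 in nM this equals 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nM)


print(round(pic50(350), 2))  # 6.46 (enzyme A)
print(round(pic50(10), 2))   # 8.0  (enzyme B)
print(350 / 10)              # 35.0, i.e. about 1.5 log units more potent at enzyme B
```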
Psychiatric disease
Dual agonist for two distinct GPCRs, in collaboration with Sumitomo Dainippon Pharma.
Design against polypharmacology profiles, followed by in vitro assessment.
Rapid delivery of a candidate to an in vivo safety study.
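Lead identification
Design, synthesis, assay: 5-15 compounds per 2-week cycle.
Design of 5 chemotypes, each assessed on compounds synthesized, ease of synthesis, dual agonist activity, best affinity and GPCR selectivity*.
Two chemotypes progressed to lead optimization: one with 25 compounds synthesized (best affinities 80 nM and 70 nM) and one with 45 compounds synthesized (best affinities 70 nM and 100 nM). The remaining chemotypes had 5, 30 and 5 compounds synthesized.
Multiple compounds were <150 nM at both targets.
* <50% activity at 1 µM over 20 GPCR receptors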
Lead optimization
80 further compounds for each prioritized scaffold, with dual agonist objectives of <20 nM at target 1 and <20 nM at target 2.
One scaffold was designated as the backup scaffold; the other was prioritized, progressed to additional assays, and further compounds were made on it.
Candidate seeking
40 compounds for the prioritized scaffold, with objectives: dual agonist <20 nM at target 1 and <20 nM at target 2, solubility, hERG >10 µM, GPCR selectivity and DMPK.
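The candidate-seeking objectives that carry explicit numbers can be written as a simple gate that reports which criteria a compound misses. This sketch assumes potencies in nM and hERG IC50 in µM, and uses the GPCR-selectivity definition from the lead identification slide (<50% activity at 1 µM across the panel); solubility and DMPK are left out because no thresholds are given for them. The `Profile` type, field names and receptor labels are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Profile(NamedTuple):
    target1_nM: float                   # potency at target 1
    target2_nM: float                   # potency at target 2
    herg_uM: float                      # hERG IC50
    panel_pct_at_1uM: Dict[str, float]  # % activity at each off-target GPCR, tested at 1 uM


def unmet_criteria(p: Profile) -> List[str]:
    """Return the candidate-seeking criteria this compound fails (empty list = all met).

    Checks are written as `not x < limit` so that a missing (NaN) value counts as a failure.
    """
    unmet = []
    if not p.target1_nM < 20:
        unmet.append("target 1 < 20 nM")
    if not p.target2_nM < 20:
        unmet.append("target 2 < 20 nM")
    if not p.herg_uM > 10:
        unmet.append("hERG > 10 uM")
    if any(not pct < 50 for pct in p.panel_pct_at_1uM.values()):
        unmet.append("GPCR selectivity (<50% at 1 uM)")
    return unmet


clean_panel = {f"GPCR_{i}": 10.0 for i in range(20)}  # hypothetical 20-receptor panel
print(unmet_criteria(Profile(12.0, 18.0, 25.0, clean_panel)))  # []
print(unmet_criteria(Profile(12.0, 35.0, 8.0, {**clean_panel, "GPCR_7": 62.0})))
# ['target 2 < 20 nM', 'hERG > 10 uM', 'GPCR selectivity (<50% at 1 uM)']
```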
Towards candidate nomination
Successful bispecific project for a CNS disease: 2 chemotypes progressing to candidate selection (for Q2 2015), with <400 compounds synthesized and assayed in a 12-month project.
[Figure: compounds synthesized over time against compound quality over 8 metrics; circle size shows the number of assays performed for each compound, circle colour shows compound quality/proximity to objectives]
Faster, cleaner, lower cost
Improved productivity: the standard cost* from target to the end of lead optimization is $13.5M (target to hit $1M, hit to lead $2.5M, lead optimization $10M), versus $3.5M with our approach (lead to candidate).
Shorter timelines enhance efficiency: the standard time* is 4.5 years (target to hit 1 year, hit to lead 1.5 years, lead optimization 2 years), versus 1.25 years with our approach.
*Reference cost and time model of the R&D process from Eli Lilly: Paul, S.M. et al., Nature Rev. Drug Discov. (2010), 9(3), 203-214
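The headline comparison is simple arithmetic on the figures quoted on this slide. The sketch below only makes the totals and the implied differences explicit; the percentage is derived here rather than stated on the slide.

```python
# Per-phase reference figures as quoted on the slide (Paul et al., 2010).
standard_cost_musd = {"target_to_hit": 1.0, "hit_to_lead": 2.5, "lead_opt": 10.0}
standard_time_years = {"target_to_hit": 1.0, "hit_to_lead": 1.5, "lead_opt": 2.0}
# Figures quoted on the slide for our approach.
our_cost_musd, our_time_years = 3.5, 1.25

std_cost = sum(standard_cost_musd.values())
std_time = sum(standard_time_years.values())
print(f"cost: ${std_cost}M vs ${our_cost_musd}M, "
      f"difference ${std_cost - our_cost_musd}M ({1 - our_cost_musd / std_cost:.0%} lower)")
print(f"time: {std_time} y vs {our_time_years} y, {std_time - our_time_years} y shorter")
# cost: $13.5M vs $3.5M, difference $10.0M (74% lower)
# time: 4.5 y vs 1.25 y, 3.25 y shorter
```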
Disruptive approach
Using the cloud as our IT infrastructure, we were able to develop our technology to mine, process and share information smoothly and efficiently.
Our technology can use this large body of information to deliver intellectual property to our clients, in the form of compound designs, and to improve their hit discovery campaigns.
Designing for multiple targets is a realistic objective.
Extended capabilities
A disruptive system for automated medicinal chemistry.
Single-target projects: improved side-effect profiles; planned avoidance of anti-targets.
Bispecific small molecules: increased efficacy; new therapeutic space.
Efficiency gains: faster project delivery; more projects explored.
Phenotypic drug discovery: increased efficacy; new therapeutic space.
The team
CEO: Andrew Hopkins. Chair of Medicinal Informatics (Dundee & Oxford). Raised $50 million for research. Author of highly cited papers.
CTO: Jérémy Besnard. Co-founder and co-inventor.
CIO: Richard Bickerton. Co-founder. 8 years in biotech. Trained by Sir Tom Blundell.
Chemoinformatics: Willem van Hoorn.
Molecular Informatics: Adrian Schreyer.
COO: Mark Swindells. Previously at Yamanouchi (Tsukuba, Japan) and CSO at Inpharmatica Ltd (UK). Raised over 40 million in venture capital.
Chief Chemist: Andy Bell. Co-inventor of sildenafil (Viagra) and key contributor to the voriconazole (Vfend) project.
THANK YOU