Examples from Industrial Practice in Lead Development
Wolfgang Muster
F. Hoffmann-La Roche Ltd.
Areas
- Computer-Aided Molecular Modeling (CAMM)*
- Absorption, Distribution, Metabolism and Excretion (ADME)
- Physicochemical Properties
- Predictive Toxicology
  - Genotoxicity/Carcinogenicity
  - Phospholipidosis
  - Genotoxic impurities
* Alternative terms applied to this area: Computer-Aided Drug Design (CADD), Computational Drug Design (CDD), Computer-Aided Molecular Design (CAMD), Rational Drug Design, In silico Drug Design, Computer-Aided Rational Drug Design, Computer-Aided Drug Discovery and Development (CADDD), Cheminformatics and Molecular Modeling
Areas
[Diagram: drug candidate, characterized by efficacy, safety profile and ADME properties]
Failure reasons
[Pie chart]
- Safety: 42%
- Efficacy: 28%
- Business: 22%
- ADME: 8%
Countermeasures named on the chart: in silico ADME, better target validation, predictive toxicology
Computer-Aided Molecular Modeling (CAMM)
- Nature of known ligands
- Homology to related targets
- Size, polarity and shape of binding sites in known target 3D structures
- Knowledge of the key amino acids modulating selective binding or functional activity
Remark: 3D receptor modeling for the prediction of potential side effects is presently being devised (Vedani et al.)
Stahl et al. (2006) Drug Discovery Today 11(7-8): 326-33.
Kapetanovic (2008) Chem-Biol Interactions 171: 165-76.
Computer-Aided Molecular Modeling (CAMM)
Various computational methods, such as
- Virtual screening: many computational techniques are available to compile focused compound sets, most of them falling under the umbrella term virtual screening
- Fragment-based screening: an additional focused screening technique in which small libraries of several hundred to several thousand low-molecular-weight substances are screened by direct-binding methods in combination with X-ray crystallography
- Chemogenomics search strategies (for target classes without structure information, especially G-protein-coupled receptors): a multidimensional similarity paradigm in which ligand structure similarity, target sequence similarity and similarity of biological effects are combined. Biological similarity is determined in terms of affinity fingerprints of compounds against a set of targets
- Classic structure-based design (QSARs)
should be seen as multifaceted disciplines contributing to the early drug discovery process.
Stahl et al. (2006) Drug Discovery Today 11(7-8): 326-33.
Fostel, J. Predictive ADME-Tox 2005
Predictive ADME
Molecular properties: optimization of chemical series (quality of leads)
- All activities on promising compound classes should focus on multiple ADME/Tox-related parameters in parallel with activity and selectivity
- Results of commercially available tools for calculating physicochemical properties and ADME-related parameters have to be interpreted with great care
- The use of generic models can only be recommended if they have been validated for a particular project; results for new compounds outside the training sets can be misleading (ionization constants, lipophilicity and solubility)
- Shift in optimization strategy: the use of measured values calls for high-quality, fast and standardized assays (100-500 compounds per week)
- Generally, the aim of a local model is to rank compounds, not to predict the absolute magnitude of an in vivo or in vitro effect
- This allows project teams to abandon the classic paradigm of sequential filtering in increasingly complex and expensive models (continuous model building; in vivo spot checks)
Stahl et al. (2006) Drug Discovery Today 11(7-8): 326-33.
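The point that a local model is meant to rank compounds rather than predict absolute values can be checked with a rank correlation. The following is a minimal sketch, assuming no tied values; the function names and the numbers are invented for illustration and are not data from the talk.

```python
def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(predicted, measured):
    """Spearman rank correlation for two equally long lists without ties."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(predicted), ranks(measured)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical example: the predictions are far off in absolute terms,
# but they order the four compounds exactly as the measurements do.
predicted = [0.1, 0.4, 0.2, 0.9]
measured = [12.0, 55.0, 30.0, 210.0]
rho = spearman_rho(predicted, measured)
```

A value of rho close to 1 says the model orders the series correctly, which is what matters for prioritizing compounds, even when the predicted magnitudes are unreliable.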
In silico prediction systems Toxicology
Use of in silico tools within toxicology:
- In silico prediction of toxic effects at early development stages, before drug candidate selection
- Hypothesis generation for structural mechanisms of action
- In later stages: first assessment of impurities, degradation products, side products, metabolites, ..., e.g. structural evaluation of synthesis schemes
In silico prediction systems Summary table
System name | Short description | Predicted endpoints
Classical QSAR approaches | Correlate structural or property descriptors of compounds with biological activities | QSARs for various endpoints published
DEREK for Windows | Knowledge(rule)-based expert system | M/C/SS/I and more (>40)
MCASE (CASE, CASETOX) | Machine-learning approach to identify molecular fragments with a high probability of being associated with an observed biological activity | Available modules: M/C/T/I/H/MTD/BD/AT and more
OncoLogic | Knowledge-based expert system, mimicking the decision logic of human experts | C
MDL QSAR | QSAR modeling system to establish structure-property relationships, create new calculators and generate new compound libraries | M/C/hERG inhibition/AT/LD50
lazar | Derives predictions from toxicity data by searching the database for compounds that are similar with respect to a given toxic activity | M/C/H/ET
TOPKAT | Employs cross-validated QSTR models for assessing various measures of toxicity; each module consists of a specific database | Available modules: M/C/T/LD50/SS/I/ET and more
ToxScope | Correlates toxicity information with structural features of chemical libraries, and creates a data mining system | M/C/I/H/T and more
HazardExpert | Knowledge(rule)-based expert system | M/C/I/SS/IT/NT
COMPACT | Procedure for the rapid identification of potential carcinogenicity or toxicities mediated by CYP450s | C and P450-mediated toxicities
PASS | Based on the comparison of new structures with structures of well-known biological activity profiles by using MNA structure descriptors | Multiple endpoints
Cerius 2 | Molecular modeling software with an ADME/Tox tool package that provides computational models for the prediction of ADME properties | ADME/H
In silico prediction systems Summary table continued
System name | Short description | Predicted endpoints
Tox Boxes | Modules generated by a machine-learning approach implemented in a fragment-based Advanced Algorithm Builder (AAB) | M/AT/C/LD50 and more
MetaDrug | Assessment of toxicity by generating networks around proteins and genes (toxicogenomics platform) | >40 QSAR models for ADME/Tox properties
DICAS | Cascade model with the capability to mine for local correlations in datasets with a large number of attributes | C
CADD | Computer-aided drug design (CADD) by multi-dimensional QSARs applied to toxicity-relevant targets | Receptor- and CYP450-mediated toxicities, ED
CSGenoTox | QSTR-based package employing electrotopological state indexes, connectivity indexes and shape indices | M
Admensa Interactive | QSAR-based system primarily for ADME optimization | CT
PreADMET | Calculation of important descriptors and neural network for the construction of a prediction system | M/C
BfR Decision Support System | Rule-based system using physicochemical properties and substructures | I and corrosion
M=Mutagenicity, C=Carcinogenicity, SS=Skin Sensitisation, I=Irritancy, H=Hepatotoxicity, T=Teratogenicity, MTD=Maximum Tolerated Dose, LD50=Median Lethal Dose, BD=Biodegradation, AT=Acute Toxicity, ET=Environmental Toxicities, IT=Immunotoxicity, NT=Neurotoxicity, CT=Cardiotoxicity, ED=Endocrine Disruption, ADME=Absorption Distribution Metabolism Excretion, QSTR=Quantitative Structure Toxicity Relationship, MNA=Multilevel Neighborhoods of Atoms
Muster et al. (2008) Drug Discovery Today 13(7-8): 303-310.
In silico prediction systems Toxicology
DEREK for Windows (DfW): Deductive Estimation of Risk from Existing Knowledge
- DfW is a knowledge-based expert system for the qualitative prediction of toxicity
- DfW is not a database system but a rule-base system. Each rule describes the relationship between a structural feature (toxicophore) and its associated toxicity
- Genotoxicity endpoint represented by 139 rules* (51 chromosomal damage*)
- Carcinogenicity endpoint represented by 54 rules*
- Irritation (skin, eye and respiratory tract): 33 rules*
- Sensitisation (skin and respiratory tract): 76 rules*
- Thyroid toxicity, hERG channel inhibition, oestrogenicity, photo-induced effects, neurotoxicity, teratogenicity: less well covered
- Negative in DfW means: really negative or not covered!
* DfW V9.0.0
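The caveat that a negative result may simply mean "not covered" can be made concrete with a toy rule base. This is a minimal sketch, not DfW's actual rules or API: the three rules, the feature labels and the `assess` function are hypothetical, and a real system matches substructures on the chemical structure rather than on text labels. Only the split into well and less well covered endpoints follows the slide.

```python
# Hypothetical toxicophore rules: (structural feature, endpoint).
# Placeholders for illustration only, not rules taken from DfW.
RULES = [
    ("aromatic nitro group", "genotoxicity"),
    ("epoxide", "genotoxicity"),
    ("acrylate ester", "sensitisation"),
]

# Endpoints the slide describes as represented by many rules.
WELL_COVERED = {"genotoxicity", "carcinogenicity", "irritation", "sensitisation"}


def assess(features, endpoint):
    """Return a qualitative call for one endpoint.

    `features` is a set of feature labels assumed to have been
    identified in the molecule beforehand.
    """
    fired = [feat for feat, ep in RULES if ep == endpoint and feat in features]
    if fired:
        return "alert"
    if endpoint in WELL_COVERED:
        # Even here, absence of an alert is not proof of absence of toxicity.
        return "no alert (negative or chemistry not covered)"
    return "no alert (endpoint less well covered)"


result_hit = assess({"epoxide", "amide"}, "genotoxicity")
result_miss = assess({"amide"}, "genotoxicity")
result_uncovered = assess({"epoxide"}, "neurotoxicity")
```

The design choice worth noting is that the function never returns a plain "negative": a rule base can only report that no rule fired.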
In silico prediction systems Toxicology
MultiCASE (MCASE): Multiple Computer Automated Structure Evaluation
- MCASE tries to predict toxicity on the basis of discrete structural fragments found to be statistically relevant to a specific biological activity (biophores)
- The differences between active and inactive molecules are investigated with the help of a so-called learning dataset, to deduce the attributes or substructures (biophores) responsible for activity
- From the frequency with which a particular biophore is identified in all active and all inactive molecules, one can calculate the probability with which this fragment is associated with biological activity
- Ames modules available for each strain, +/- rat or hamster S9
- Four carcinogenicity modules incl. proprietary data (male/female rats and mice)
- Modules and the underlying database have been developed with the FDA
- High prediction accuracy of the MCASE modules (mainly based on the unique dataset)
- Teratogenicity/developmental toxicity/male fertility/behavioral toxicity in different species (49 modules)
- Hepatotoxicity in humans (14 modules)
- GSH adduct formation (in-house), rat and human microsomes
- Further available modules: antibacterial (pharm), ADME, cytotoxicity, ecotoxicity, skin/eye irritation, allergies, enzyme inhibition, biodegradation, bioaccumulation
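The frequency argument above can be sketched as a simple ratio. This is an illustrative simplification, not the MCASE algorithm, whose statistics the slide does not describe: molecules are represented as sets of fragment labels, and `biophore_probability` and the tiny learning set are hypothetical.

```python
def biophore_probability(fragment, actives, inactives):
    """Fraction of fragment-containing molecules that are active.

    Each molecule is a set of fragment labels. Returns None when the
    fragment does not occur in the learning set at all.
    """
    n_active = sum(fragment in mol for mol in actives)
    n_inactive = sum(fragment in mol for mol in inactives)
    total = n_active + n_inactive
    if total == 0:
        return None
    return n_active / total


# Hypothetical learning set with abstract fragment labels A, B, C.
actives = [{"A", "B"}, {"A"}, {"A", "C"}]
inactives = [{"A", "C"}, {"C"}, {"B", "C"}]

p_a = biophore_probability("A", actives, inactives)  # 3 of 4 occurrences are active
p_b = biophore_probability("B", actives, inactives)  # 1 of 2 occurrences is active
p_d = biophore_probability("D", actives, inactives)  # fragment never observed
```

With so few molecules such ratios carry little weight; a practical system would also need to test whether the difference between actives and inactives is statistically significant.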
In silico prediction systems Toxicology
- DEREK for Windows (DfW): DEREK is a knowledge-based expert system for the qualitative prediction of toxicity. DEREK is not a database system but a rule-base system. Each rule describes the relationship between a structural feature (toxicophore) and its associated toxicity.
- METEOR: Meteor is a computer program that helps scientists who need information about the metabolic fate of chemicals. The program uses expert knowledge rules in metabolism to predict the metabolic fate of chemicals, and the predictions are presented in metabolic trees. The only information the program needs to make its prediction is the molecular structure of the chemical.
- VITIC Toxicology Database: Vitic is a chemically intelligent toxicology database, which can recognise and search for similarities in chemical structures. Vitic is especially useful in (Quantitative) Structure-Activity Relationship (QSAR) modelling.
In silico prediction systems Toxicology
- MultiCASE (MCASE): MCASE tries to predict toxicity on the basis of discrete structural fragments found to be statistically relevant to a specific biological activity (biophores). The differences between active and inactive molecules are investigated with the help of a so-called learning dataset, to deduce the attributes or substructures (biophores) responsible for activity. From the frequency with which a particular biophore is identified in all active and all inactive molecules, one can calculate the probability with which this fragment is associated with biological activity.
- In silico phospholipidosis tool (CAFCA): in-house tool that predicts amphiphilic properties of charged small molecules, expressed in terms of the free energy of amphiphilicity (ΔΔG_AM). Amphiphilic compounds have the potential to accumulate in lipid bilayers, interfering with phospholipid metabolism and turnover and thereby causing adverse effects.
- In silico phototoxicity prediction: phototoxicity prediction based on chemical structure, or on chemical structure in combination with measured UV spectra.
- Further endpoints in development: promising results with local models that have the potential to be generally applicable (e.g. prediction of hERG channel inhibition, GSH adduct formation).
Expert vs data-driven (QSAR) systems - Toxicology
- Local SARs (project-specific SARs) are based on 5 to at most 30 data points and can be evaluated by eye
- (Q)SAR systems will gain importance once HCS for more toxicological endpoints is validated and implemented
- (Q)SAR systems are normally not used for genotoxicity and/or carcinogenicity at Roche
  - Commercial systems predict well and can be optimized
  - Acceptance by regulatory authorities
- Established for other endpoints (e.g. phototoxicity, phospholipidosis, hERG assay)
- (Q)SAR systems might also be helpful if additional in vitro HCS parameters or cross-reactivities have been measured
Use of in silico genotoxicity prediction
[Flowchart: genotoxicity testing cascade from LI/LO through CCS and RDC1 to phase II]
- On-the-fly prediction/classification: DEREK combined with MCASE
- In silico (LI/LO): DEREK / MCASE analysis; crosscheck with VITIC, METEOR, SciFinder, TOXNET; optimize
- In vitro: gene mutations (Ames micro; Ames GLP); chromosomal aberration ((HTS) MNT in vitro, tbd; HCA, MNT in vitro, one or both); ML/TK; optimize
- In vivo: MNT in vivo, required for phase II
- Structural assessments of synthesis scheme, impurities, metabolites
- Rodent cancer bioassay
The success of early genotoxicity screening
Number of positive compounds (incl. weak positive and inconclusive ones) (b)

Year       Ames micro    Full Ames (GLP)
1996       -             33 (48 %)
1997       -             25 (37 %)
1998       9 (15 %)      11 (24 %)
1999       5 (11 %)      9 (18 %)
2000       11 (11 %)     5 (20 %)
2001       6 (7 %)       3 (21 %)
2002       7 (9 %)       1 (6 %)
           Start of routine in silico screening
2003       3 (3 %)       0
2004       0             0
2005       3 (2 %)       0
2006       3 (2 %)       0
2007       2 (1 %)       0
2008 (a)   1 (2 %)       0

(a) until March 2008
(b) expected mutagens, intermediates/reactants and positive results due to impurities excluded
Phospholipidosis
- Drug-induced phospholipidosis is a reversible storage disorder characterized by accumulation of phospholipids within cells, i.e. in the lysosomes
- Caused by cationic amphiphilic drugs (CADs) and some cationic hydrophilic drugs (e.g. the aminoglycoside gentamicin)
- Drug-induced phospholipidosis is a generalized condition in humans and animals; it may occur in virtually any tissue and is characterized by accumulation of one or several classes of phospholipids within the cell
- Phospholipidosis may or may not be accompanied by organ toxicity, although an association between the two has not been proven (except for gentamicin)
[Structure diagram of a cationic amphiphilic drug, labelled: cationic hydrophilic group; hydrophobic residues]
In silico classification of phospholipidosis potential
CAFCA (CAlculated Free energy of Charged Amphiphiles) classifies compounds by free energy of amphiphilicity (ΔΔG_AM) and pKa:
- Negative: ΔΔG_AM >= -6 kJ/mol; pKa < 6.3
- Positive: ΔΔG_AM < -6 kJ/mol; pKa >= 6.3
[Two example structures, one per class]
Fischer, H. et al. (2000) Chimia 54, 640-645.
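The two thresholds can be written as a small decision rule. This is a minimal sketch under one loudly stated assumption: the slide lists the cut-offs but not how they combine, so the code treats a compound as positive only when both criteria are met and as negative otherwise. The function name and the input values are hypothetical, and calculating ΔΔG_AM itself is outside this sketch.

```python
DDG_AM_CUTOFF_KJ_MOL = -6.0  # free energy of amphiphilicity threshold
PKA_CUTOFF = 6.3             # basic pKa threshold


def classify_phospholipidosis(ddg_am_kj_mol, pka):
    """Classify phospholipidosis potential from precomputed descriptors.

    Assumption: "positive" requires BOTH ddG_AM < -6 kJ/mol and
    pKa >= 6.3; any other combination is reported as "negative".
    """
    if ddg_am_kj_mol < DDG_AM_CUTOFF_KJ_MOL and pka >= PKA_CUTOFF:
        return "positive"
    return "negative"


# Hypothetical descriptor values, chosen only to exercise the rule.
strongly_amphiphilic_base = classify_phospholipidosis(-8.0, 9.0)
amphiphilic_weak_base = classify_phospholipidosis(-8.0, 5.0)
borderline_energy = classify_phospholipidosis(-6.0, 9.0)
```

Because the rule is this cheap to evaluate, it can be run across whole chemical series, which fits its use for flagging series rather than single molecules.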
Techniques to detect phospholipidosis
In silico tool
- From in vivo findings to a predictive in vitro assay to a high-throughput in silico tool
- Calculation for large data sets possible
- Accessible on the Intranet: optimization of the pKa value as well as of amphiphilic properties
- Identification of clearly positive chemical series rather than single molecules
- Useful in Lead Identification and early Lead Optimization (depends on the indication, potency/dose and duration of treatment)
- Overall predictivity of the in silico tool for the in vitro assay is very high; the in vitro test is normally no longer conducted
[Figure: amiodarone as an example of a cationic amphiphilic drug (CAD)]
In silico classification of genotoxic impurities
- In principle, any impurity that is present below the threshold of qualification (0.15%) need not be toxicologically qualified or characterized (ICH)
- For a drug with a daily intake of 1 g, this implies that a chronic intake of less than 1.5 mg of an impurity in that drug is considered toxicologically insignificant. However, ICH guidelines do indicate that lower thresholds (for reporting, identification and qualification) can be appropriate if the impurity is unusually toxic, but they give no guidance on what this means or how to handle it
- Synthesis of APIs often involves reactive starting materials, intermediates or process steps; synthesis pathways frequently involve known or suspected genotoxic compounds
- Unknown/undetermined low levels of genotoxic impurities may be present (e.g. sulfonic acid esters)
- The issue is not directly addressed in ICH guidelines -> new draft of the EMEA guideline on the limits of genotoxic impurities, with a new concept
- Clinical developments were put on hold because the synthesis pathways contained intermediates with alerting structures; companies were requested either to show that the alerting intermediates are below 1 ppm in the drug or to provide data on genotoxicity
- Solution: use a generic TTC (Threshold of Toxicological Concern) based on historical experience with genotoxic carcinogens; staged TTC taking treatment duration into account
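The arithmetic behind these thresholds is short enough to sketch. The first function converts a percentage threshold into a daily impurity intake; the second converts a TTC into a concentration limit, using the relation limit (ppm) = TTC (µg/day) / dose (g/day). The function names are hypothetical, and the TTC of 1.5 µg/day used in the example is an assumption here (it is the lifetime value commonly cited from the EMEA guideline, not a number stated on the slide).

```python
def qualification_threshold_ug(daily_dose_g, threshold_percent=0.15):
    """Daily impurity intake in micrograms at a percentage threshold."""
    micrograms_per_gram = 1_000_000
    return daily_dose_g * micrograms_per_gram * threshold_percent / 100


def ttc_limit_ppm(ttc_ug_per_day, daily_dose_g):
    """Concentration limit in ppm (µg impurity per g drug substance)."""
    if daily_dose_g <= 0:
        raise ValueError("daily dose must be positive")
    return ttc_ug_per_day / daily_dose_g


# 0.15 % of a 1 g daily dose: 1500 µg, i.e. 1.5 mg of impurity per day.
intake_ug = qualification_threshold_ug(1.0)

# Assumed TTC of 1.5 µg/day at a 1 g daily dose.
limit_ppm = ttc_limit_ppm(1.5, 1.0)
```

The comparison shows why a TTC-based limit is about three orders of magnitude stricter than the ordinary qualification threshold, and why the permitted concentration falls as the daily dose rises.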
In silico classification of genotoxic impurities
- Step 1: Identify and classify structural alerts in parent compound and impurities
- Step 2: Establish a qualification strategy
- Step 3: Establish acceptable limits
  - A: Limitation based on structural information, chemistry and analytical capabilities
  - B: Testing of neat impurity; limitation based on outcome
  - C: Testing of spiked material; limitation based on outcome
Proposal of acceptable intake levels without appreciable risk, based on dose, duration of use, indication and patient/volunteer population (staged TTC)
In silico classification of genotoxic impurities
[Decision tree: impurity class, decision question, outcome]
- Class 1: Genotoxic carcinogens. Eliminate impurity? If no: risk assessment? (3)
- Class 2: Genotoxic, carcinogenicity unknown. Threshold mechanism? If yes: PDE (e.g. ICH Q3 appendix 2 reference); if no or unknown: staged TTC
- Class 3: Alert, unrelated to parent. Impurity genotoxic? (1) If yes or not tested: handle as Class 2; if no: control as an ordinary impurity
- Class 4: Alert, related to parent. API genotoxic? (2) If no: control as an ordinary impurity
- Class 5: No alerts. Control as an ordinary impurity
(1) Either tested neat or spiked into API and tested up to 250 μg/plate
(2) If API is positive, risk-benefit analysis required
(3) Quantitative risk assessment to determine ADI
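The five-class scheme can be read as a small decision function. This is a sketch of one reading of the flattened flowchart, not an authoritative rendering: the branch for Class 3 impurities that are genotoxic or untested is assumed to follow the Class 2 path, and the function name, parameters and return strings are all hypothetical.

```python
ORDINARY = "control as ordinary impurity"


def control_strategy(impurity_class, *, can_eliminate=False,
                     threshold_mechanism=None, impurity_genotoxic=None,
                     api_genotoxic=False):
    """Map an impurity class (1-5) to a control strategy.

    None means "unknown" or "not tested" for the optional findings.
    """
    if impurity_class == 1:  # genotoxic carcinogen
        if can_eliminate:
            return "eliminate impurity"
        return "quantitative risk assessment (ADI)"
    if impurity_class == 3 and impurity_genotoxic is False:
        return ORDINARY  # alert not confirmed by testing
    if impurity_class in (2, 3):
        # Class 3 with a positive or missing test is handled like Class 2
        # (assumption based on the reconstructed flowchart).
        return "PDE" if threshold_mechanism else "staged TTC"
    if impurity_class == 4:  # alert shared with the parent API
        if api_genotoxic:
            return "risk-benefit analysis"
        return ORDINARY
    if impurity_class == 5:  # no structural alerts
        return ORDINARY
    raise ValueError("impurity_class must be 1 to 5")


untested_alert = control_strategy(3)  # not tested -> staged TTC
tested_negative = control_strategy(3, impurity_genotoxic=False)
```

Keeping "unknown" distinct from "no" matters here, because an untested alert is treated conservatively while a tested negative is released to ordinary impurity control.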
In silico systems during drug development process
[Timeline figure]
Phases: Basic Research (target identification, assessment and validation); Lead Identification; Lead Optimization; Preclinical Development; Clinical Development (Phase 1, Phase 2, Phase 3); Filing/Approval & Launch
In silico systems placed along the timeline:
- CAMM
- ADME / MolecProp: clogP / PSA / cPAMPA / cpKa; metabolic clearance
- PredTox 1: DEREK / MCASE / VITIC / METEOR
- PredTox 2: PL / Phototox; hERG / GSH adducts
Future challenges for drug design and early screening
- Adequately predict complex toxicological endpoints (e.g. hepatotoxicity, cardiotoxicity, nephrotoxicity): need for standardized high-quality data (Innovative Medicines Initiative)
- Design in silico tools to cope with the enormous amount of data generated by new techniques: HTS/HCS, omics, systems biology, biomarkers, etc.
- Establish a closer link from preclinical to clinical development
Conclusions
- In silico systems are extensively used during the early phases of drug development until selection of the clinical candidate (e.g. 3D modeling, expert systems, QSAR tools)
- Applying in silico and in vitro screening significantly reduced failures in early project phases, increased efficiency and improved the quality of clinical candidates
- The number of ADME/Tox in silico and (HTS) in vitro screens is rapidly increasing
- DEREK/MCASE and other commercially available systems predict toxicity endpoints such as mutagenicity, carcinogenicity, skin sensitisation and irritancy well; in-house optimization is essential for high performance
- Further endpoints are less well covered, mainly due to the lack of comprehensive, high-quality and standardized databases
- QSAR tools can be established on the basis of internal standardized datasets, e.g. for phospholipidosis, phototoxicity, hERG channel inhibition, GSH adduct formation
- Challenge: how to adequately predict potential genotoxic impurities from the structures in a synthesis scheme; further regulations needed?
Are you sure, Stan, that a pointy head and a long beak is what makes them fly?
Acknowledgements
Alessandro Brigo, Stephan Kirchner, Lutz Müller, Holger Fischer, Manfred Kansy, Edith Brandt, Raymond Schmitt, Wolfgang Hering, Joelle Muller, Sabine Marget-Muller, Nicole Helt, Flavio Crameri, Laura Suter-Dick, Thomas Weiser, Thomas Singer