QSAR
The following lecture has drawn many examples from the online lectures by H. Kubinyi.
LMU Institut für Informatik, LFE Bioinformatik, Cheminformatics, Structure independent methods, J. Apostolakis
PART III: Target structure independent methods
- QSAR
  - Hansch
  - Free-Wilson analysis
  - Generalized descriptor regression
- ADME/Tox properties
  - Rule of five
  - Models of absorption, distribution, metabolism
  - Toxicity
- Methods (regression and classification)
- Drug likeness
  - Fragments of WDI
Some (counter) examples
Small change, large effect
Similar binding mode
Combination of the interactions
Nevertheless, similarity-based methods can sometimes help in improbable situations
Quantitative structure-activity relationships (QSAR)
- Correlate general properties of molecules with their biological activities.
- It has been well known since the beginning of the last century that hydrophobicity correlates clearly with narcotic effect.
- In the 1960s, Hansch pioneered the use of mainly linear equations of the type
  $\log IC_{50} = a \log P + b\,\sigma + \dots + k$
  where $\log P$ measures hydrophobicity (octanol/water partition coefficient) and $\sigma$ is an electronic substituent constant.
- Basic assumption: similar structures have similar effects.
- Linearity assumption: similar changes have similar (proportional) effects.
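The linearity assumption can be made concrete with a small sketch. The coefficients `a`, `b`, `k` and the descriptor values below are made up for illustration and do not come from any fitted model; the function name `hansch_log_ic50` is likewise hypothetical.

```python
def hansch_log_ic50(log_p, sigma, a, b, k):
    """Evaluate a two-descriptor Hansch-type model: log IC50 = a*logP + b*sigma + k."""
    return a * log_p + b * sigma + k

# Hypothetical coefficients, chosen only to illustrate the linearity assumption.
a, b, k = 0.8, -1.2, 3.0

# The same change in log P (+1) applied to two different parent compounds.
delta_1 = hansch_log_ic50(2.0, 0.1, a, b, k) - hansch_log_ic50(1.0, 0.1, a, b, k)
delta_2 = hansch_log_ic50(4.0, 0.5, a, b, k) - hansch_log_ic50(3.0, 0.5, a, b, k)

# Under a linear model both changes shift the prediction by the same amount (a).
print(delta_1, delta_2)
```

In a linear model the predicted effect of a substituent change does not depend on the rest of the molecule, which is exactly why the model is only expected to hold within a series of closely related compounds.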
Hansch QSAR (antiadrenergic activity of disubstituted N,N-dimethyl-α-bromophenethylamines)
Free-Wilson (antiadrenergic activity of disubstituted N,N-dimethyl-α-bromophenethylamines)
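A Free-Wilson analysis regresses activity on indicator variables that record which substituent sits at which position. The following is a minimal sketch of how such an indicator matrix might be assembled; the compound list, the position names and the helper `free_wilson_matrix` are hypothetical and are not the data shown on the slide.

```python
def free_wilson_matrix(compounds):
    """Build a Free-Wilson indicator matrix.

    compounds: list of dicts mapping a position name to a substituent name.
    Returns (columns, rows), where columns is a sorted list of
    (position, substituent) pairs and each row holds 1 where the compound
    carries that substituent at that position, else 0.
    """
    columns = sorted({(pos, sub) for c in compounds for pos, sub in c.items()})
    rows = [[1 if c.get(pos) == sub else 0 for pos, sub in columns]
            for c in compounds]
    return columns, rows

# Hypothetical series with substituents in the meta and para positions.
compounds = [
    {"meta": "H", "para": "F"},
    {"meta": "Cl", "para": "H"},
    {"meta": "Cl", "para": "F"},
]
columns, rows = free_wilson_matrix(compounds)
for col in columns:
    print(col)
for row in rows:
    print(row)
```

With one column per substituent at every position the columns are linearly dependent (each position sums to 1), so in practice the columns of a reference substituent such as H are commonly dropped before fitting.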
Comparison of models
How do we obtain the regression equation?
- $Aw = y$, where $A$ is the corresponding matrix of physical properties (Hansch) or the Free-Wilson matrix of residues, $y$ is the vector of response variables (concentration or a comparable biological effect), and $w$ is the weight vector.
- Typically the system is over- or underdetermined.
Remember regression?
- Given $Aw = y$, where $A$ is the $M \times N$ data matrix with complexes in the rows and energy terms in the columns, $w$ the weight vector and $y$ the activity vector.
- In general we have a few energy terms and many complexes.
- Least squares optimization: $A^T A w = A^T y$
- $A^T A$ is symmetric and can be written as $O^T \Lambda^2 O$ ($O$ = eigenvector matrix, $\Lambda^2$ = matrix of non-negative eigenvalues).
- If $(A^T A)^{-1}$ existed we could write: $w = (A^T A)^{-1} A^T y = O^T \Lambda^{-2} O A^T y$
- Eigenvectors with zero eigenvalue span the null space.
- Create the pseudoinverse by removing the zero eigenvectors and inverting.
- The resulting model contains only relevant weights.
Least squares
- We are looking for the solution for $w$ that optimizes the fit, i.e. minimizes the error: $\min_w \lVert Aw - y \rVert$
- This is identical to the $w$ minimizing $\lVert Aw - y \rVert^2$:
  $\min_w \lVert Aw - y \rVert^2 = \min_w (Aw - y)^T (Aw - y) = \min_w (w^T A^T - y^T)(Aw - y) = \min_w (w^T A^T A w - 2 y^T A w + y^T y)$
- Setting the derivative to zero:
  $\partial (w^T A^T A w - 2 y^T A w + y^T y) / \partial w = 2 (w^T A^T A - y^T A) = 0$
- What now? Solve with SD or invert $A^T A$.
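The condition $w^T A^T A = y^T A$ is the transpose of the normal equations $A^T A w = A^T y$. As a minimal sketch, assuming $A^T A$ is invertible, they can be solved directly by Gaussian elimination. The function name `solve_normal_equations` and the data are made up for illustration; the data are generated exactly from $y = 2 x_1 - x_2 + 0.5$ so the recovered weights can be checked by eye.

```python
def solve_normal_equations(A, y):
    """Least-squares weights w from (A^T A) w = A^T y via Gauss-Jordan elimination."""
    m, n = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y], one row per weight.
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
         + [sum(A[r][i] * y[r] for r in range(m))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular; a pseudoinverse is needed instead")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Made-up descriptor matrix; the last column of ones carries the constant term k.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 1.0]]
y = [2.5, -0.5, 1.5, 3.5]
w = solve_normal_equations(A, y)
print(w)  # weights close to 2, -1 and 0.5
```

The explicit singularity check marks the point where this direct approach stops working: in an underdetermined system $A^T A$ has zero eigenvalues and cannot be inverted.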
Regularizing least squares via dimension reduction
- In general, underdetermined systems make it possible to obtain almost perfect fits.
- However, the corresponding models have almost no predictive value.
- → Minimize the number of free (model) parameters!
  - Parameter (feature) selection
  - Use only the most significant eigenvectors of the $A^T A$ matrix: PCA
  - Use only the most significant eigenvectors of the $y^T A A^T y$ matrix: PLS
- Yet there exist (in general) no magical methods that can identify the relevant correlations!
  - Biological insight
Spurious correlations
No measure is perfect
ADME / Tox
Cheminformatics in drug development
[Figure: drug development workflow. Recoverable labels: target identification, isolation of target, screening, synthesis, activity (affinity), animal studies, clinical studies, iterative refinement; associated methods: bioinformatics, expression analysis, omics, diversity analysis, clustering, virtual screening, similarity search, docking, free energy, QSAR, design, CombiChem, HTS, reaction databases, ADME/Tox, ADMET and statistics.]
ADME properties
- Pharmacokinetics
  - Absorption
  - Distribution
  - Metabolism
  - Elimination
- Toxicity is often treated together with ADME
  - Similar mechanisms as in metabolism
  - Also limits the possible gain from a drug
  - Gain-over-risk ratio := therapeutic index
- ADME/Tox properties often depend on general physicochemical properties of the molecules and not on a direct receptor-ligand interaction → QSAR
ADME overview
One compartment model
[Figure: dose → site of absorption → ($K_a$) → central compartment (concentration $C$, volume of distribution $V_D$) → ($K_e$) → elimination]
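For first-order absorption and first-order elimination, the one compartment model has a standard closed-form solution for the plasma concentration, $C(t) = \frac{F\,D\,k_a}{V_D (k_a - k_e)} \left(e^{-k_e t} - e^{-k_a t}\right)$. The sketch below evaluates it, with `k_a` and `k_e` standing for the rate constants K_a and K_e on the slide. The bioavailable fraction `f` is an added assumption, and all numerical values are hypothetical.

```python
import math

def concentration(t, dose, v_d, k_a, k_e, f=1.0):
    """Plasma concentration at time t for first-order absorption and elimination.

    Assumes k_a != k_e. Units must be consistent, e.g. dose in mg,
    v_d in l, rate constants in 1/h, giving a concentration in mg/l.
    """
    scale = f * dose * k_a / (v_d * (k_a - k_e))
    return scale * (math.exp(-k_e * t) - math.exp(-k_a * t))

def time_of_peak(k_a, k_e):
    """Time at which the concentration is maximal (dC/dt = 0)."""
    return math.log(k_a / k_e) / (k_a - k_e)

# Hypothetical parameters: 300 mg dose, V_D = 30 l, k_a = 1.0/h, k_e = 0.1/h.
dose, v_d, k_a, k_e = 300.0, 30.0, 1.0, 0.1
t_peak = time_of_peak(k_a, k_e)
c_peak = concentration(t_peak, dose, v_d, k_a, k_e)
print(f"peak at {t_peak:.2f} h, concentration {c_peak:.2f} mg/l")
```

The curve rises while absorption dominates and falls once elimination takes over; a larger $V_D$ lowers the whole curve without moving the peak time.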
Absorption
- Depends on hydrophobicity
- Most drugs are passively absorbed
- The GI wall acts as a semipermeable membrane
- Acids/bases are almost exclusively absorbed in their neutral form → pH dependent
  - Stomach: low pH; acidic compounds dissolve less but are better absorbed
  - Intestine: higher pH; basic compounds dissolve less but are better absorbed
- The intestine has a significantly larger surface area
  - E.g. salicylate: stomach alone 30% absorbed in 1 hour; intestines 60% in 10 minutes
- Drug stability in GI fluids is an issue
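The pH dependence of the neutral form follows from the Henderson-Hasselbalch relation. The sketch below computes the neutral fraction of a monoprotic acid or base. The pKa of 3.0 (roughly that of salicylic acid) and the pH values of 2.0 for the stomach and 6.5 for the intestine are assumptions chosen for illustration.

```python
def neutral_fraction_acid(ph, pka):
    """Fraction of a monoprotic acid HA present in its neutral (protonated) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def neutral_fraction_base(ph, pka):
    """Fraction of a monoprotic base B present in its neutral (unprotonated) form.

    pka refers to the conjugate acid BH+.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

PKA_ACID = 3.0          # assumed, roughly salicylic acid
PH_STOMACH = 2.0        # assumed
PH_INTESTINE = 6.5      # assumed

stomach = neutral_fraction_acid(PH_STOMACH, PKA_ACID)
intestine = neutral_fraction_acid(PH_INTESTINE, PKA_ACID)
print(f"neutral fraction: stomach {stomach:.3f}, intestine {intestine:.5f}")
```

Under these assumptions the acid is mostly neutral in the stomach and almost fully ionized in the intestine, yet the salicylate figures on the slide show faster uptake in the intestine: the much larger surface area outweighs the unfavorable ionization.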
Distribution
- Reversible transfer of the drug from one site in the body to another
- Only the unbound compound can move freely
  [Figure: plasma (bound ⇌ unbound) ⇌ tissue (unbound ⇌ bound)]
- Acidic drugs often bind to albumin
- Basic drugs bind to glyco- and lipoproteins
- High binding to plasma proteins → low $V_D$
- High binding to tissue protein → high $V_D$
- Distribution depends on:
  - Blood flow to tissue
  - Hydrophobicity (high log P → high $V_D$)
  - Regional differences in pH
  - Binding
Metabolic processes
- Drugs are metabolized mainly in the liver, but also in the intestinal wall and other organs
- Phase I: reduction, oxidation, hydrolysis
  - Main enzymes: cytochrome P450
- Phase II: conjugation with small molecules
  - Glucuronidation, sulfation, acetylation, conjugation with glycine or glutamine, O-, N- and S-methylation
- (Phase III): drug transport, excretion
  - Polar compounds (kidneys)
  - Lipophilic compounds (bile, feces)
Excretion
- Irreversible loss from the body
- Polar compounds via the kidneys
- Nonpolar compounds through feces
- Clearance: $Cl$ = rate of elimination / concentration in plasma
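A small worked example of the clearance definition. The half-life relation $t_{1/2} = \ln 2 \cdot V_D / Cl$ used here is an added assumption that holds for first-order elimination from a single compartment, and all numbers are hypothetical.

```python
import math

def clearance(elimination_rate, plasma_concentration):
    """Cl = rate of elimination / plasma concentration, e.g. (mg/h) / (mg/l) = l/h."""
    return elimination_rate / plasma_concentration

def half_life(cl, v_d):
    """Half-life for first-order elimination from one compartment: ln(2) * V_D / Cl."""
    return math.log(2) * v_d / cl

# Hypothetical values: 30 mg/h eliminated at a plasma level of 10 mg/l, V_D = 30 l.
cl = clearance(elimination_rate=30.0, plasma_concentration=10.0)
t_half = half_life(cl, v_d=30.0)
print(f"Cl = {cl:.1f} l/h, half-life = {t_half:.2f} h")
```

Clearance can be read as the volume of plasma completely cleared of drug per unit time, which is why it combines with the volume of distribution to give a time scale.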
An example: Theophylline
- Dimethylated xanthine
- Effect:
  - Bronchodilation (asthma treatment)
  - Increases cAMP level
  - Cardiac stimulant
- Plasma level and response
  - >0 µg/ml: sub-therapeutic levels
  - >5 µg/ml: clinical improvement possible
  - >10 µg/ml: optimum range
  - >20 µg/ml: nausea, vomiting, diarrhea
  - >40 µg/ml: seizure, brain damage, cardiac arrhythmia/arrest
ADME
- Absorption: oral, 4-6 h (rapid)
- Distribution: rapid distribution into peripheral tissues other than fat
  - Volume of distribution 0.45 l/kg
  - Only 60% plasma protein bound
  - Crosses the placenta, enters breast milk
- Metabolism
  - Inactive metabolites formed by many pathways
  - Main pathways: formation of uric acid by hydroxylation and demethylation
- Excretion
  - <10% as unmodified theophylline
- Non-linear response of plasma level
  - Saturation of demethylating enzyme
  - Increase in excretion (diuretic effect)
Clearance
- Clearance depends on
  - Age
  - Diet (high carbohydrate, low protein, excessive caffeine) → decreased clearance
  - Disease (liver disease, heart failure, pneumonia, fever)
  - Drug interactions
  - Smoking
Predicting biological (e.g. ADME/Tox) properties
Lipinski's rule of 5
- Analysis of 2245 typical molecules from the WDI
- Molecules that fall into one of the following categories probably show poor bioavailability:
  - A molecular weight of more than 500 g/mol
  - A calculated lipophilicity (log P) of more than 5
  - More than 5 H-bond donors
  - More than 10 H-bond acceptors (sum of O and N atoms)
- Often a fifth rule is added: if the number of rotatable bonds is less than 10, then one of the four previous rules can be violated
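The four criteria translate directly into a simple filter. This is a minimal sketch that assumes the descriptors have already been computed elsewhere. The `Descriptors` container, the function name and both example compounds are hypothetical, and the optional rotatable-bond rule is left out.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float   # g/mol
    log_p: float        # calculated log P
    h_donors: int       # number of H-bond donors
    h_acceptors: int    # number of H-bond acceptors (sum of O and N atoms)

def rule_of_five_violations(d):
    """Return a list describing which of the four criteria a compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 g/mol")
    if d.log_p > 5:
        violations.append("log P > 5")
    if d.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations

# Two hypothetical compounds: a small polar one and a large lipophilic one.
small = Descriptors(mol_weight=180.2, log_p=0.0, h_donors=1, h_acceptors=6)
large = Descriptors(mol_weight=650.0, log_p=6.2, h_donors=2, h_acceptors=8)
print(rule_of_five_violations(small))
print(rule_of_five_violations(large))
```

Returning the list of violated criteria rather than a single yes/no flag leaves the decision threshold, for example whether one violation is tolerated, to the caller.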
ADME/Tox and QSAR take-home messages
- Pharmacokinetics and Tox limit the therapeutic gain of drugs
- Property prediction by simple linear models
- Condition: similar (small) changes have similar effects
  - Compounds belong to the same series
  - Similar mode of action
  - Similar binding mode
  - Binding affinity correlates with interaction energy
  - Biological activities correlate with binding affinity