Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996




LMU Institut für Informatik, LFE Bioinformatik, Cheminformatics, Structure based methods, J. Apostolakis

Genetic algorithms
- Inspired by evolution
- General principle: [figure]

Gold GA
- Gold uses a genetic algorithm for optimization
- Steady-state principle (single operations, no generations)
- No duplicates
- Roulette wheel selection of operators and parents
- Gray coding of binary features
- Approximate coding of conformation
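The Gray coding mentioned above can be illustrated with a minimal sketch. The conversion functions are the standard reflected Gray code; `decode_torsion`, its 8-bit width, and the angle range are assumptions chosen for illustration and are not taken from the slides.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def decode_torsion(gray_value: int, n_bits: int = 8) -> float:
    """Map a Gray-coded value to an angle in [-180, 180) degrees.

    Hypothetical helper: bit width and range are illustrative assumptions.
    """
    step = 360.0 / (1 << n_bits)
    return -180.0 + gray_to_binary(gray_value) * step
```

Neighbouring integers differ in exactly one bit of their Gray codes, so a single-bit mutation can move a torsion to an adjacent angle step, which plain binary coding does not guarantee.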

The Gold chromosomes
- Each chromosome consists of two binary strings plus two integer strings
- The binary strings code the torsions of the ligand and the protein
- In the protein, the single bonds to terminal H-bond donors are rotatable
- The integer strings code the translation and orientation of the ligand in terms of the H-bonds that are formed
- If the Nth integer in the FIRST integer string has the value P, then the Nth H-donor in the ligand forms an H-bond with the Pth acceptor of the protein
- If the Nth integer in the SECOND integer string has the value P, then the Nth H-acceptor in the ligand forms an H-bond with the Pth donor of the protein
- The actual position of the ligand is obtained with a least-squares fit
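One way to picture this chromosome layout is the sketch below. The class and field names are hypothetical, and the convention that the value 0 means "no H-bond" is an assumption made here for illustration; the least-squares placement step is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chromosome:
    ligand_torsions: List[int]   # binary string (bits) for ligand torsions
    protein_torsions: List[int]  # binary string (bits) for protein torsions
    donor_map: List[int]         # Nth ligand donor -> Pth protein acceptor
    acceptor_map: List[int]      # Nth ligand acceptor -> Pth protein donor


def hbond_pairs(ch: Chromosome) -> List[Tuple[str, int, str, int]]:
    """List the H-bonds encoded by the two integer strings (1-based indices).

    Assumption: a value of 0 means the ligand atom is left unpaired.
    """
    pairs = []
    for n, p in enumerate(ch.donor_map, start=1):
        if p:
            pairs.append(("ligand donor", n, "protein acceptor", p))
    for n, p in enumerate(ch.acceptor_map, start=1):
        if p:
            pairs.append(("ligand acceptor", n, "protein donor", p))
    return pairs
```

The resulting list of atom pairs is what a least-squares superposition would take as input to place the ligand.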

The H-Bonds

Gold
1. A set of reproduction operators (crossover, mutation, etc.) is chosen. Each operator is assigned a weight.
2. An initial population is randomly created and the fitness of its members is determined.
3. An operator is chosen using roulette wheel selection based on the operator weights (10 for crossover, 40 for mutation).
4. The parents are chosen with roulette wheel selection (rws) based on fitness.
5. Offspring are obtained and their fitness is evaluated.
6. If not already present in the population, the children replace the least fit members of the population.
7. After 100000 operations stop; otherwise go to step 3.
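The steps above can be sketched as a small steady-state GA on bit strings. This is a toy illustration, not Gold's implementation: the chromosome encoding, the one-point crossover, the bit-flip mutation, and all function names are assumptions; only the operator weights (10 and 40) and the overall control flow follow the slide.

```python
import random


def roulette(weights, rng):
    """Return an index with probability proportional to its weight."""
    pick = rng.uniform(0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if pick <= acc:
            return i
    return len(weights) - 1


def crossover(a, b, rng):
    """One-point crossover producing two children."""
    cut = rng.randrange(1, len(a))
    return [a[:cut] + b[cut:], b[:cut] + a[cut:]]


def mutate(a, rng):
    """Flip one randomly chosen bit, producing one child."""
    i = rng.randrange(len(a))
    return [a[:i] + (1 - a[i],) + a[i + 1:]]


def steady_state_ga(fitness, n_bits=16, pop_size=20, n_ops=2000, seed=0):
    rng = random.Random(seed)
    # Step 1: operators as (function, number of parents, weight).
    operators = [(crossover, 2, 10), (mutate, 1, 40)]
    # Step 2: random initial population without duplicates.
    unique = set()
    while len(unique) < pop_size:
        unique.add(tuple(rng.randint(0, 1) for _ in range(n_bits)))
    pop = list(unique)
    for _ in range(n_ops):  # Step 7: fixed number of operations.
        # Step 3: choose an operator by its weight.
        op, n_parents, _ = operators[roulette([w for _, _, w in operators], rng)]
        # Step 4: choose parents by fitness (small offset avoids all-zero weights).
        fits = [fitness(c) + 1e-9 for c in pop]
        parents = [pop[roulette(fits, rng)] for _ in range(n_parents)]
        # Steps 5 and 6: children replace the least fit member unless already present.
        for child in op(*parents, rng):
            if child not in pop:
                worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
                pop[worst] = child
    return pop
```

For example, `steady_state_ga(sum)` maximizes the number of 1 bits. Because only one operation is applied at a time, there is no notion of a generation, which is the steady-state principle named earlier.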

The energy function
- H-bonds
- VdW between protein and ligand (12-6 potential)
- Intra-ligand VdW
- The energy function of Gold is one of its strengths
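For reference, a generic 12-6 (Lennard-Jones) pair term can be written as below. This is the textbook functional form only; the parameter names `epsilon` and `sigma` and their default values are illustrative assumptions, and the actual parameterization used by Gold is not given in the slides.

```python
def lennard_jones_12_6(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Generic 12-6 pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The energy is zero at `r = sigma`, reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`, and rises steeply at shorter distances.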

- Efficiency depends strongly on the parameters (initial population, number of runs)
- The developers report very good results even with runs that take ~1 min per complex

Some results

A related approach
- Autodock initially used an SA/MC approach
- The main advantage of SA is the combination of global optimization (high temperature) with local optimization (lower temperature)
- For flexible molecules (>8 flexible dihedrals) it turns out that SA is far too slow
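The interplay of high and low temperature can be seen in a minimal simulated annealing sketch with Metropolis acceptance. It works on a one-dimensional toy energy; the geometric cooling schedule, step size, and all parameter names are assumptions for illustration and do not reproduce Autodock's protocol.

```python
import math
import random


def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        cooling=0.95, steps_per_t=50, step=0.5, seed=0):
    """Minimize energy(x) for a scalar x; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            cand = x + rng.uniform(-step, step)
            e_cand = energy(cand)
            # Metropolis criterion: always accept downhill moves, accept
            # uphill moves with probability exp(-dE / T).
            if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # geometric cooling (illustrative choice)
    return best_x, best_e
```

At high `t` most uphill moves are accepted, so the walker crosses barriers (global search); as `t` drops, the acceptance test approaches pure descent (local search).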

LGA
- LGA or GALS: a Lamarckian GA, or GA with local search, has been implemented
- The idea is to adapt each individual to its environment by performing a local search (LS, minimization)
- Optimization takes place directly on the chromosomes
- The effect of the minimization is passed on to the offspring
- Force-field type of energy function
- G. M. Morris et al. 1998: comparison of SA, GA, LGA
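The Lamarckian idea, that the result of the local search is written back into the chromosome and therefore inherited, can be sketched as follows. The coordinate-descent local search, the truncation selection, and all names here are simplifying assumptions and are not the operators used in Autodock.

```python
import random


def local_search(x, energy, step=0.1, iters=20):
    """Crude coordinate descent on a list of floats; never increases energy."""
    x = list(x)
    e = energy(x)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = x[:]
                trial[i] += d
                e_trial = energy(trial)
                if e_trial < e:
                    x, e, improved = trial, e_trial, True
        if not improved:
            step *= 0.5
    return x, e


def lamarckian_generation(pop, energy, rng, mutation_sigma=0.5):
    """One generation: refine every individual, then breed from the best half."""
    # Lamarckian step: the minimized coordinates replace the genotype itself.
    refined = [local_search(ind, energy)[0] for ind in pop]
    refined.sort(key=energy)
    parents = refined[:len(refined) // 2]
    children = []
    while len(parents) + len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]                  # one-point crossover
        j = rng.randrange(len(child))
        child[j] += rng.gauss(0, mutation_sigma)   # Gaussian mutation
        children.append(child)
    return parents + children
```

In a non-Lamarckian (Baldwinian) variant the local search would only improve the fitness value used for selection, while the unrefined genotype would be passed on.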

LGA

SA/GA/LGA comparison [figure; panels: SA, GA, LGA]


Conclusion GA
- GAs are very robust (default parameters used all along) and efficient (depending on the settings)
- They clearly outperform SA for docking problems (not in our hands)
- A significant part of the trick seems to be the combination with at least a crude type of local optimization
- Hydrogen bonds are crucial for docking
- How do GAs compare with systematic approaches?

GlamDock
Old GlamDock
- Gold-like interaction point matching search space
- Steady-state genetic algorithm search
- A ChemScore-like empirical function
New GlamDock
- Replaced the GA with a simpler MC/SA search + conformational stack
- Simpler configuration
- More efficient search
- Smooth, continuously differentiable ChemScore-based scoring, enabling a gradient-based minimization in torsion space
- More effective identification of local minima

GlamDock (MCM)

Comparison between 8 different docking tools
- Bissantz et al. J. Med. Chem. 2000, 43, 4759-4767
- Kellenberger et al. PROTEINS: Structure, Function, and Bioinformatics 57:225-242 (2004)

8 Docking tools against each other
- Dock (negative image of binding site)
- FlexX (incremental construction)
- Fred (naive)
- Glide (systematic, funnel)
- Gold (GA)
- Slide (flexible protein: side chains)
- Surflex (Det. GA)
- QXP (Monte Carlo)
- (Why not ICM?)

Sampling accuracy

Ranking accuracy

CPU time

GlamDock

Conclusion of comparison study
- Gold, Glide, Surflex, FlexX: best structure prediction (50-55%)
- Gold, Glide, Surflex, FlexX: best screening properties (50-55%)
Previous results
- Poor prediction of absolute free energies
- Reasonable results for virtual screening
- Docking and especially virtual screening depend mainly on the scoring function
- Consensus scoring improves results significantly

Conclusion of flexible ligand docking
- Flexible redocking is doable
- Best methods: GAs and incremental construction (and MCM)
- The main problem is the evaluation of the structures (score)
- Possibly scoring functions have been fitted too strongly to redocking of known ligands

Flexible receptor

Flexible receptor
- Side chain flexibility
- Backbone flexibility
- Hinge bending
- Domain flexibility
- Even small differences can be important!
- Induced fit
- Protein mutants
- Homology modelling

Substate view of protein dynamics

Induced fit
- Folding free energy lies between 10 and 15 kcal/mol for many proteins
- Less favorable substates may be stabilized by certain ligands
- Most of the time the differences are not very large, yet significant

Side chain flexibility of proteins upon ligand binding
- Najmanovich et al. Proteins 39:261-268 (2000)

Number of flexible side chains per binding site

Amino acid type dependence

AA dependence related to N_tor

Backbone / Side chain flexibility

Conclusions
- Relatively few side chains move on average (≤ 3 for 85% of cases)
- Polar side chains move most
- Side chain flexibility does not correlate with backbone flexibility

Flexible receptor docking

Methods
- Simulation (MC/MD, SA)
- Fuzzy
- Discrete
- Ensembles of structures
- Rotamer libraries

FlexE
- H. Claussen et al., J. Mol. Biol. 2001, 308, 377-395

Protein flexibility
- Main idea: describe the protein structure variations with a set of protein structures representing the flexibility, mutations, or alternative models of a protein.
- The variability considered by FlexE is defined by the differences within the given input structures.

United protein description
- Data structure that administers the protein structure variations.
- Contains an ensemble of up to 30 possible conformations of the protein.
- Most of them are low-energy conformations of the same protein.

United protein description - construction
- Superposition
- Clustering

United protein description - clustering
- The superimposed structures are combined by clustering each part separately
- Complete linkage hierarchical clustering
- The clustered instances can be recombined to form new valid protein structures.

Notation
- Component: all the atoms which belong to the same amino acid or mutation of the amino acid. Contains a backbone part and a side chain part
- Part: set of instances
- Instance: one of the alternative conformations.

Incompatibility
- Two instances of the united protein description are incompatible if they cannot be realized simultaneously.
- Logical: two instances are alternatives to each other
- Geometric: two logically compatible instances overlap
- Structural: two instances of the same chain are unconnected

Incompatibility graph

Incompatibility graph
- The incompatibility is internally represented as a graph by using the instances as nodes and connecting pairs of incompatible nodes by an edge.
- Valid protein structures correspond to independent sets in the graph.
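A minimal sketch of this representation: instances are nodes, incompatibilities are edges, and a selection of instances is a valid structure exactly when no two selected instances share an edge. The instance names and the edge set below are made up for illustration.

```python
from itertools import combinations

# Hypothetical instances: two alternative conformations each for the
# components A and B. Edges mark pairs that cannot coexist.
incompatible = {
    frozenset({"A1", "A2"}),  # logical: alternatives of the same part
    frozenset({"B1", "B2"}),  # logical: alternatives of the same part
    frozenset({"A2", "B1"}),  # geometric: the two instances overlap
}


def is_valid_structure(selection, incompatible_edges):
    """True if the selected instances form an independent set."""
    return all(
        frozenset(pair) not in incompatible_edges
        for pair in combinations(selection, 2)
    )
```

With these edges, `["A1", "B1"]` is a valid combination while `["A2", "B1"]` is not.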

Selection of instances
- The ligand is placed fragment by fragment into the active site by the incremental construction algorithm.
- After each construction step, all possible interactions are determined.
- The scoring function is applied to each instance.
- The independent set (IS) with the highest score is chosen.

Independent set
- The IS can be assembled from the IS of the connected components.
- Apply a modified version of the Bron-Kerbosch algorithm on the complementary graph.
- Compatibility graph: independent components → cliques
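The two observations above can be made concrete with a short sketch: an independent set of the incompatibility graph is a clique of its complement (the compatibility graph), and the incompatibility graph can be split into connected components that are treated separately. The example graph and all function names are illustrative assumptions.

```python
def complement(adj):
    """Complement of an undirected graph given as {node: set(neighbours)}."""
    nodes = set(adj)
    return {v: nodes - adj[v] - {v} for v in adj}


def connected_components(adj):
    """Connected components as sorted lists, found by depth-first search."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components


def is_clique(nodes, adj):
    """True if every pair of the given nodes is connected in adj."""
    nodes = list(nodes)
    return all(b in adj[a] for i, a in enumerate(nodes) for b in nodes[i + 1:])


# Hypothetical incompatibility graph; C1 is compatible with everything.
incompat = {
    "A1": {"A2"},
    "A2": {"A1", "B1"},
    "B1": {"A2", "B2"},
    "B2": {"B1"},
    "C1": set(),
}
compat = complement(incompat)
```

Here `{"A1", "B1", "C1"}` is a clique of `compat` and hence an independent set of `incompat`, and `C1` forms its own connected component.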

Enumerating all cliques (Bron-Kerbosch, 1973)
- Clique: maximal complete subgraph (maximal: cannot be extended)
- Two versions of the algorithm
- Both are backtracking algorithms
- The two algorithms are quite similar
- The first goes through the cliques in an ordered fashion
- The second optimizes the order of the search and visits larger cliques at the beginning
- Version I is mainly relevant for illustration purposes

Version I
Three sets are important for the algorithms:
- COMPSUB: the current set. It is extended or reduced by one point by travelling along the edges of the backtracking tree
- CANDIDATES: the set of all points that will in due time serve as an extension to COMPSUB
- NOT: the set of all points that have already served as an extension of the present configuration of COMPSUB

Version I
Recursive extension operator:
  Extend(COMPSUB, CANDIDATES, NOT, G)
    if CANDIDATES == ∅                      // cannot grow
      if NOT == ∅ print COMPSUB             // maximality
      return                                // backtrack
    end if
    for c ∈ CANDIDATES
      put c in COMPSUB
      update CANDIDATES and NOT             // remove all points not connected to the selected candidate (also from NOT)
      Extend(COMPSUB, CANDIDATES, NOT, G)
      remove c from COMPSUB and put it into NOT
    end for
    return
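A compact Python rendering of this recursion is sketched below, using sets instead of the array bookkeeping of the original publication. The graph format (`{node: set(neighbours)}`) and the function name are assumptions for illustration.

```python
def bron_kerbosch_v1(adj):
    """Enumerate all maximal cliques of an undirected graph (no pivoting)."""
    cliques = []

    def extend(compsub, candidates, not_set):
        if not candidates and not not_set:
            # Nothing can extend compsub and nothing already tried covers it.
            cliques.append(sorted(compsub))
            return
        for c in sorted(candidates):
            # Restrict both sets to the neighbours of the selected candidate.
            extend(compsub + [c], candidates & adj[c], not_set & adj[c])
            # c has now served as an extension: move it to NOT.
            candidates = candidates - {c}
            not_set = not_set | {c}

    extend([], set(adj), set())
    return cliques


# Triangle 1-2-3 with a pendant edge 3-4.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(bron_kerbosch_v1(example))  # [[1, 2, 3], [3, 4]]
```

A branch with empty CANDIDATES but non-empty NOT ends silently: the current set is complete but contained in a clique that was already reported.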

Some remarks
- The lists NOT and CANDIDATES can be concatenated into a single local array: NOT occupies positions 1..ne and CANDIDATES positions ne+1..ce
- For the indices ne, ce we have ne ≤ ce:
  - ne = ce: CANDIDATES = ∅
  - ne = 0: NOT = ∅
  - ce = 0: NOT = CANDIDATES = ∅ → clique found
- If ne+1 is the current candidate, then all we need to do at the end of Extend is ne = ne+1
- Both CANDIDATES and NOT must be empty when a clique is found
- If ∃ c ∈ NOT s.t. ∀ d ∈ CANDIDATES: (c,d) ∈ E, then c will never be removed from NOT → no cliques on this subtree

Version II
Is simply a clever way of choosing the next candidate:
- Pick the vertex c in NOT with the most edges to CANDIDATES
- Use as next candidate a vertex that is not connected to c
- With every iteration we are at least one step closer to cutting the subtree
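The candidate ordering can be sketched with the widely used pivoting formulation. Note one deliberate deviation, stated as an assumption: the slide picks the fixed vertex from NOT, whereas this sketch picks it from CANDIDATES ∪ NOT so that a pivot exists even when NOT is empty. The graph format and function name are illustrative.

```python
def bron_kerbosch_pivot(adj):
    """Enumerate all maximal cliques, branching only on non-neighbours of a pivot."""
    cliques = []

    def extend(compsub, candidates, not_set):
        if not candidates:
            if not not_set:
                cliques.append(sorted(compsub))
            return
        # Pivot: the vertex with the most edges into CANDIDATES.
        pivot = max(sorted(candidates | not_set),
                    key=lambda v: len(candidates & adj[v]))
        # Only vertices not connected to the pivot need to be tried.
        for c in sorted(candidates - adj[pivot]):
            extend(compsub + [c], candidates & adj[c], not_set & adj[c])
            candidates = candidates - {c}
            not_set = not_set | {c}

    extend([], set(adj), set())
    return cliques


# Triangle 1-2-3 with a pendant edge 3-4.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(bron_kerbosch_pivot(example)))  # [[1, 2, 3], [3, 4]]
```

Skipping the pivot's neighbours is safe because every maximal clique contains either the pivot or at least one vertex not connected to it.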

Evaluation
- FlexE was evaluated with ten protein structure ensembles containing 105 crystal structures from the PDB.
- The structures within an ensemble have a highly similar backbone trace
- Different conformations for several side chains.


Evaluation Cont.
- FlexE finds a ligand position with RMSD below 2 Å in 67% of the cases.
- Average CPU time for the incremental construction algorithm is 5.5 minutes.


Conclusion
- The ensemble approach is able to cope with several side chain conformations and even movements of loops.
- Very efficient.
- Motions of larger backbone segments or even domain movements are not covered by this approach.
- Main problems:
  - Protein structures (where do they come from?)
  - Internal protein energy