Identifying and modelling key water molecules

Aim

Water molecules can play key roles in mediating protein-ligand interactions. In this use case we illustrate how to make the most of the vast amounts of structural data available from the PDB and the CSD when trying to identify key hydration sites. We also show how to model water molecules in protein-ligand docking.

Introduction

It has long been recognised that water molecules can play a key role in protein-ligand recognition [1]. In the protein-ligand docking package GOLD [2], water molecules can be allowed to spin and to toggle on and off [3]. Toggling a water molecule on introduces an entropic penalty into the scoring function, which must be offset by the hydrogen bonds the water forms to the protein and the ligand. If the hydrogen bonds formed by a water molecule do not offset this entropic penalty, the water molecule is deselected (turned off) during the genetic algorithm run.

However, the treatment of water molecules in GOLD was designed on the assumption that the modeller already knows the positions of any key water molecules. This use case shows how potential hydration sites can be identified using the structural data available in the PDB [4] and the CSD [5]. It also illustrates a new feature of GOLD 5.0: the ability to allow key water molecules to translate during docking.

Method

In this use case we use a structure of neuraminidase in complex with zanamivir (PDB code 1a4g) [6]. Accessing the structure through Relibase+ [7] immediately reveals that this complex has four water molecules mediating interactions between the protein and the ligand (figure 1).
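The on/off decision described in the Introduction can be pictured as a simple balance between the entropic penalty and the hydrogen-bond terms. The sketch below is a hypothetical illustration for a single fixed ligand pose, not GOLD's scoring code: the function name, the fitness-style sign convention (higher is better) and all numbers are assumptions made for this example.

```python
def water_switched_on(hbond_terms: list[float], entropic_penalty: float) -> bool:
    """Decide whether a water should be 'on' for one fixed ligand pose.

    Hypothetical fitness convention: each hydrogen bond the water forms to
    the protein or ligand adds a positive term, while switching the water on
    costs a fixed entropic penalty. The water is kept only if its hydrogen
    bonds more than pay for that penalty.
    """
    net_gain = sum(hbond_terms) - entropic_penalty
    return net_gain > 0.0


# Illustrative (made-up) values, not taken from GOLD or ChemScore.
PENALTY = 2.0
bridging_water = [1.0, 0.8, 0.7]   # e.g. two H-bonds to protein, one to ligand
poorly_placed_water = [0.6]        # a single weak H-bond

print(water_switched_on(bridging_water, PENALTY))       # True
print(water_switched_on(poorly_placed_water, PENALTY))  # False
```

In the docking described here this choice is made during the genetic algorithm run, alongside the search over ligand poses, so the comparison above only captures the decision for one pose.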

Figure 1: Relibase+ has pre-calculated information on water molecules mediating interactions between protein and ligand molecules. In this neuraminidase structure there are four water molecules mediating interactions between the protein and the ligand.

Furthermore, by identifying similar binding sites and superimposing them, one can find out which water molecules are conserved. Such a search in Relibase+ reveals four structures with a sequence identity >95% to chain A of 1a4g (1a4q, 1nsb, 1nsc and 1nsd).
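The conservation check performed after superimposing binding sites can be approximated by a nearest-neighbour test: a reference water counts as conserved if every superimposed structure has a water within some distance cut-off of it. The sketch below is a hypothetical stand-in, not the Relibase+ algorithm; the 1.0 Å cut-off, the function names and the toy coordinates are assumptions, and the structures are assumed to be superimposed already.

```python
import math

Coord = tuple[float, float, float]


def is_conserved(ref_water: Coord, superimposed: list[list[Coord]],
                 cutoff: float = 1.0) -> bool:
    """True if every superimposed structure has a water within `cutoff` Å."""
    return all(
        any(math.dist(ref_water, w) <= cutoff for w in waters)
        for waters in superimposed
    )


def conserved_waters(ref_waters: dict[str, Coord],
                     superimposed: list[list[Coord]],
                     cutoff: float = 1.0) -> list[str]:
    """Names of reference waters that are found in all superimposed structures."""
    return [name for name, xyz in ref_waters.items()
            if is_conserved(xyz, superimposed, cutoff)]


# Toy coordinates in Å (made up for illustration, not from 1a4g or its hits).
reference = {"W1": (0.0, 0.0, 0.0), "W2": (5.0, 0.0, 0.0), "W3": (0.0, 5.0, 0.0)}
hits = [
    [(0.3, 0.0, 0.0), (5.2, 0.1, 0.0)],   # hit 1: matches W1 and W2
    [(0.0, 0.4, 0.0), (9.0, 9.0, 9.0)],   # hit 2: matches W1 only
]
print(conserved_waters(reference, hits))  # ['W1']
```

With a single apo structure as the only entry in `hits`, the same test gives a rough way to count how many SuperStar hotspots coincide with crystallographic waters.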

Figure 2: Binding site superimposition analysis in Relibase+. Note that the conserved waters between the reference structure and the superimposed hits are calculated on the fly (bottom right-hand column).

In this instance we find that the water molecules at the bottom of the cavity (HOH689 and HOH711) are conserved in all structures. However, four protein structures might not be enough to make an informed decision on which waters to include in a docking experiment. One option would be to make the sequence identity cut-off less stringent. However, the protein-ligand structures in Relibase+ (derived from the PDB) are not the only source of structural data. The CSD now contains over half a million small-molecule crystal structures, many of which are hydrates. Information on the propensities of water probes around functional groups is available from IsoStar [8], a knowledge base of intermolecular interactions derived from the CSD. Further, the program SuperStar [9] can combine IsoStar propensity maps to calculate interaction hotspots in protein binding sites.
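To make the idea of combining propensity maps concrete, the sketch below multiplies per-group water propensities on a shared grid and keeps the grid points above a threshold as hotspots. This is a simplified, hypothetical illustration: the multiplicative rule, the treatment of points missing from a map as neutral (propensity 1, i.e. no more likely than random), the threshold and all names are assumptions; SuperStar's actual procedure is described in reference 9.

```python
from math import prod

GridPoint = tuple[int, int, int]


def combine_propensities(maps: list[dict[GridPoint, float]]) -> dict[GridPoint, float]:
    """Combine per-group propensity maps point by point (assumed product rule).

    A propensity of 1.0 means 'as likely as random', so a point missing
    from a map is treated as neutral and does not change the product.
    """
    points = set().union(*maps)  # every grid point that appears in any map
    return {p: prod(m.get(p, 1.0) for m in maps) for p in points}


def hotspots(combined: dict[GridPoint, float], threshold: float) -> list[GridPoint]:
    """Grid points whose combined propensity reaches the threshold, sorted."""
    return sorted(p for p, v in combined.items() if v >= threshold)


# Toy maps for two hypothetical protein groups (made-up values).
carboxylate_map = {(0, 0, 0): 2.0, (1, 0, 0): 0.5}
amide_map = {(0, 0, 0): 3.0, (0, 1, 0): 4.0}

combined = combine_propensities([carboxylate_map, amide_map])
print(hotspots(combined, threshold=4.0))  # [(0, 0, 0), (0, 1, 0)]
```

A point favoured by several groups at once ends up with a high combined value, which is why such hotspots tend to sit where a water could satisfy more than one protein group.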

Figure 3: SuperStar water hotspots calculated for neuraminidase structure 1a4g (purple spheres). The native ligand is displayed in cyan. Water molecules from an apo structure of neuraminidase (1nsb) are shown as red spheres.

The water probe hotspots calculated by SuperStar for 1a4g show good agreement with the water molecules of the apo structure 1nsb. The SuperStar water hotspot at the bottom of the cavity (figure 3) corresponds to the conserved water HOH711. In all holo structures, the SuperStar water hotspot at the edge of the cavity is displaced by the carboxylate groups of the ligands.

Having identified potential key water molecules, we set up four docking experiments. The first experiment did not include any water molecules. The second included the native water molecule HOH711, which was allowed to spin and toggle on and off. The third included two water molecules positioned at the SuperStar-calculated water hotspots; these waters were again allowed to spin and toggle on and off. The fourth experiment also used the two SuperStar water hotspots, but here the water molecules were allowed to translate up to 1 Å from their original positions as well as spin and toggle on and off. All docking experiments used default settings and the ChemScore scoring function.
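The fourth experiment lets each water move at most 1 Å from its starting position. One simple way to enforce such a limit inside a stochastic search is to propose a random move and, if it leaves the allowed sphere, project it back onto the sphere surface. The sketch below shows that idea; it is a hypothetical illustration, not how GOLD implements water translation, and the function names, step size, starting coordinates and random-move scheme are assumptions.

```python
import math
import random
from typing import Optional

Coord = tuple[float, float, float]


def clamp_to_sphere(origin: Coord, proposed: Coord, max_shift: float = 1.0) -> Coord:
    """Pull `proposed` back onto the sphere of radius `max_shift` Å around
    `origin` if it lies outside; otherwise return it unchanged."""
    d = math.dist(origin, proposed)
    if d <= max_shift:
        return proposed
    scale = max_shift / d
    return tuple(o + (p - o) * scale for o, p in zip(origin, proposed))


def perturb_water(origin: Coord, current: Coord, step: float = 0.5,
                  max_shift: float = 1.0,
                  rng: Optional[random.Random] = None) -> Coord:
    """Random move of up to `step` Å along each axis, kept within
    `max_shift` Å of the water's starting position."""
    rng = rng or random.Random()
    proposed = tuple(c + rng.uniform(-step, step) for c in current)
    return clamp_to_sphere(origin, proposed, max_shift)


start = (10.0, 4.0, -2.0)   # hypothetical starting water position (Å)
rng = random.Random(42)
pos = start
for _ in range(1000):
    pos = perturb_water(start, pos, rng=rng)
print(math.dist(start, pos) <= 1.0 + 1e-9)  # True
```

Projecting onto the sphere makes the restraint hard, so the search never evaluates a water beyond the allowed distance; a soft penalty for overshooting would be an alternative design.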

Results

When the docking was run without any water molecules, the correct pose was not obtained (figure 4).

Figure 4: Docking the native ligand into the 1a4g structure does not yield the correct pose when water HOH711 is absent.

However, when the native water was included and allowed to spin and toggle on and off, the correct pose was obtained (figure 5).

Figure 5: Docking the native ligand whilst allowing the native water molecule HOH711 to spin and toggle on and off resulted in the correct pose.

When the SuperStar water hotspots were used, the correct pose was not obtained. This can be explained by the SuperStar hotspot corresponding to HOH711 being more buried than the native water molecule. This is not surprising, as the SuperStar hotspot was calculated without the ligand present in the binding site; the water hotspot was therefore optimised towards the protein carboxylate groups (figure 6).

Figure 6: Docking the native ligand whilst allowing the SuperStar-calculated water molecules to spin and toggle on and off. The correct ligand pose was not obtained.

However, when the SuperStar water molecules were also allowed to translate during the docking, the correct pose was obtained (figure 7). It is worth noting that the second SuperStar water was always (correctly) toggled off.

Figure 7: Docking the native ligand whilst allowing the SuperStar-calculated water molecules to spin, toggle on and off and translate resulted in the correct ligand pose. Note that the second water was correctly toggled off.

Conclusions

A vast amount of structural data is available, both for protein-ligand complexes and for small-molecule crystal structures. Using Relibase+ and SuperStar, one can make the most of this data when trying to identify key hydration sites. In protein-ligand docking, water molecules can make the difference between success and failure, and subtle variations in the orientation and position of the waters can have large effects. GOLD's flexible treatment of water molecules allows modellers to customise the behaviour of individual water molecules: in GOLD 5, waters can be allowed to spin, translate and toggle on and off during the genetic algorithm run.

References

1. J. E. Ladbury. Chem. Biol., 1996, 3, 973-980
2. G. Jones, P. Willett and R. C. Glen. J. Mol. Biol., 1995, 245, 43-53
3. M. L. Verdonk, G. Chessari, J. C. Cole, M. J. Hartshorn, C. W. Murray, J. W. M. Nissink, R. D. Taylor and R. Taylor. J. Med. Chem., 2005, 48, 6504-6515
4. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne. Nucleic Acids Res., 2000, 28, 235-242
5. F. H. Allen. Acta Cryst., 2002, B58, 380-388
6. N. R. Taylor et al. J. Med. Chem., 1998, 41, 798-807
7. M. Hendlich, A. Bergner, J. Günther and G. Klebe. J. Mol. Biol., 2003, 326, 607-620
8. I. J. Bruno, J. C. Cole, J. P. M. Lommerse, R. S. Rowland, R. Taylor and M. L. Verdonk. J. Comput.-Aided Mol. Des., 1997, 11, 525-537
9. M. L. Verdonk, J. C. Cole and R. Taylor. J. Mol. Biol., 1999, 289, 1093-1108

Products

CSD - the world's only comprehensive, fully curated database of crystal structures, containing over 500,000 entries
Relibase+ - an essential tool for searching, exploring and comparing all protein-ligand data from public and in-house data sources
IsoStar - a knowledge base of intermolecular interactions which provides easy appreciation of the geometry, strength and stability of interactions
SuperStar - a tool for investigating interaction sites in proteins, making it easy to generate pharmacophores using experimental data
GOLD - an accurate and reliable protein-ligand docking program
Hermes - CCDC's life science visualiser, used by GOLD, GoldMine, Relibase+ and SuperStar

For further information please contact Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK. Tel: +44 1223 336408, Fax: +44 1223 336033, Email: admin@ccdc.cam.ac.uk