Protein crystallography Structure determination & Analysis
Synchrotron radiation (insertion device): Synchrotron radiation is emitted when a beam of electrons moving close to the speed of light is bent by a powerful magnetic field. Properties: highly intense X-ray beam; tunable wavelength; X-ray bunches (picosecond intervals); polarized. BCM 6013 2
Data collection equipment: The instrument rotates the crystal about an axis (φ) perpendicular to the X-ray beam (and normal to the goniometer). The diffraction pattern from a crystal is a 3-D pattern, and the crystal must be rotated in order to observe all the diffraction spots. Diagram from Bernhard Rupp's Crystallography 101 website: http://www-structure.llnl.gov/xray/101index.html
Structure analysis pipeline (automated pipeline): Data collection → Data reduction of multiple images → Phase determination → Electron density map → Refinement & Modeling → 3-D structure model
Data reduction: indexing, integration, merging/scaling. Programs: Mosflm, XDS, HKL2000.
Scaling Interface
Scaling statistics
The phase problem: The structure factor is a complex number (http://www.ruppweb.org/xray/phasing/phasingt.html). $I_{hkl}$: what is measured. $\varphi_{hkl}$: information lost.
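To make the loss of phase information concrete, here is a minimal sketch. It assumes the usual description of a structure factor as a complex number with amplitude |F| and phase φ, and a measured intensity proportional to |F|². The function names are illustrative and not taken from any crystallographic package.

```python
import numpy as np

def structure_factor(amplitude, phase_deg):
    """Complex structure factor F = |F| * exp(i * phi)."""
    return amplitude * np.exp(1j * np.deg2rad(phase_deg))

def intensity(F):
    """Intensity proportional to |F|^2; the phase drops out here."""
    return np.abs(F) ** 2

# Two reflections with the same amplitude but very different phases
F_a = structure_factor(10.0, 30.0)
F_b = structure_factor(10.0, 200.0)

# The experiment cannot tell them apart: both intensities are equal
same_intensity = np.isclose(intensity(F_a), intensity(F_b))
```

Both structure factors give the same intensity although they differ as complex numbers, which is why the phases must be recovered by other means (experimental phasing or molecular replacement).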
Phasing: ab initio
Experimental phase determination has been fully automated. A variety of programs are available for this purpose: Solve/Resolve, Phaser, Sharp, Crank.
Phasing: Molecular Replacement (MR). In MR, A is the probe molecule and A' is the target molecule. The operation, consisting of a rotation and a translation, superimposes the probe onto the target molecule as shown. [R] is the rotation matrix, and T is the translation vector.
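The rotation-plus-translation operation can be sketched as x' = [R]x + T applied to every atom of the probe. The example below only applies a given [R] and T; it does not perform the rotation and translation searches that MR programs carry out. The helper names and the toy coordinates are assumptions made for illustration.

```python
import numpy as np

def rotation_about_z(angle_deg):
    """Rotation matrix [R] for a rotation about the z axis."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def place_probe(coords, R, T):
    """Apply x' = R x + T to each row of an (N, 3) coordinate array."""
    return coords @ R.T + T

# Toy probe "molecule" A: three atoms on the coordinate axes
probe = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

R = rotation_about_z(90.0)        # hypothetical rotation solution
T = np.array([5.0, 0.0, 0.0])     # hypothetical translation solution
placed = place_probe(probe, R, T)
```

Because [R] is a pure rotation, all interatomic distances in the probe are preserved; only its orientation and position change.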
MR Pipeline
Refinement: The function minimized combines model restraints with the intensity data. Protein structure coordinates are underdetermined by the intensity data alone, so additional restraints are required.
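The slide does not reproduce the target function itself. One commonly quoted general form, given here only as a sketch, is a weighted sum of a data term and a restraint term. The symbols $w_X$ (relative weight) and $k$ (scale factor) and the least-squares form of the data term are assumptions; many refinement programs use a maximum-likelihood data term instead.

```latex
E_{\text{total}} = w_{X}\,E_{\text{X-ray}} + E_{\text{restraints}},
\qquad
E_{\text{X-ray}} = \sum_{hkl} \bigl( |F_{\text{obs}}| - k\,|F_{\text{calc}}| \bigr)^{2}
```

The restraint term penalises deviations from ideal geometry (bond lengths, bond angles, planarity), which supplies the information that the diffraction data alone do not provide.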
Refinement & Model Building Pipeline: A highly automated pipeline. Model building and refinement improve the phases, which in turn improve model building, and so on.
Model Inspection with Coot: A model inspection tool, useful for manual intervention in model building.
MolProbity: Model validation
Molecular Visualization with PyMol: To understand how function is expressed by structure, you need to visualize it.
Protein Crystallography Part IV - Refinement and Validation Tim Grüne Dept. of Structural Chemistry University of Göttingen http://shelx.uni-ac.gwdg.de tg@shelx.uni-ac.gwdg.de
Overview: why validation is necessary; what data/information can be used for validation; how structures can be validated; useful programs / web sites. PROTEIN CRYSTALLOGRAPHY IV STRUCTURE VALIDATION
Why Validation? Scientific (experimental) results are always subject to prejudice and bias, whether deliberate or through accident and ignorance. Even though articles are reviewed by referees, the experiment itself is hardly ever repeated by an independent person before publication. Protein crystallography is no exception. Moreover, crystallographic results are most often presented as colourful pictures that can easily lead the reader to over-interpret their meaning. Since such models are used by non-crystallographers, it is important for them to be able to check their quality.
Words of Caution: The structure on the left-hand side was published in 1989 as the structure of photoactive yellow protein (PDB entry 1phy). It took six years until the corrected version (right) was published (PDB entry 2phy) (G. Kleywegt, Acta Cryst D56 (2000)).
Not Everything in Black and White Makes Sense: The structure of the TATA-box binding protein (TBP or TFIIDτ) was published in 1992 (Nikolov et al., Nature 360, pp. 40-46). The shape of the molecule suggested that the TATA box sits straight in the groove of the protein. The structure of the complex, published a year later by Kim et al. (Nature 365, pp. 520-527), revealed that the DNA was actually heavily bent.
What you see is what you get? Another issue with PDB files is that they contain more information than a graphical viewer might be able to display. Many crystallographers include atoms/residues in their structures without experimental support and set their occupancy to zero. This can be justified because they know the residues were present in the molecule (at least for recombinant proteins).
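A reader can check for such zero-occupancy atoms directly. The sketch below assumes the fixed-column layout of ATOM/HETATM records in the PDB format (occupancy in columns 55-60) and ignores everything else in the file. The helper `make_atom_line` exists only to build correctly aligned toy records for the demonstration and writes atom names left-aligned, which is a simplification.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Build a simplified, column-aligned ATOM record (toy data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")

def zero_occupancy_atoms(pdb_lines):
    """Return (chain, residue name, residue number, atom name) for atoms with occupancy 0."""
    hits = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 60:
            occupancy = float(line[54:60])      # columns 55-60
            if occupancy == 0.0:
                hits.append((line[21],              # chain ID, column 22
                             line[17:20].strip(),   # residue name, columns 18-20
                             int(line[22:26]),      # residue number, columns 23-26
                             line[12:16].strip()))  # atom name, columns 13-16
    return hits

# Toy records: the second atom was placed without density support
records = [
    make_atom_line(1, "N", "MET", "A", 1, 11.104, 13.207, 2.100, 1.00, 20.00),
    make_atom_line(2, "CA", "MET", "A", 1, 12.560, 13.329, 2.287, 0.00, 20.00),
]
flagged = zero_occupancy_atoms(records)
```

For a real entry, the lines would be read from the coordinate file instead of being generated, for example with `open(path).read().splitlines()`.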
How structures can be validated I: Validation means assessing the model against the data. However, since the model was created by refinement against the data, the model is biased. Therefore, an independent judge is needed. Any information can be used that 1. did not take part in creating the model or in minimising the model-data difference, and 2. has known ideal values. This means that the information must be the same or similar for all proteins.
How structures can be validated II. Comparison: model vs. data. Data collected from the crystal are of course the first source one would think of when it comes to validation. Unfortunately, in calculating the electron density, amplitudes from the data were mixed with phases from the model. This means that our model is already heavily biased towards the data. This is why 5-10% of all reflections are never used for refinement, so that the R free value can be calculated:
$$R_{\text{(free)}} = \frac{\sum_{hkl} \bigl|\, |F_{\text{obs}}| - |F_{\text{calc}}| \,\bigr|}{\sum_{hkl} |F_{\text{obs}}|}$$
For example, 95% of all reflections (h,k,l) are used to calculate the R value, which is used for model refinement and optimisation. The remaining 5% of reflections are NOT used for refinement/optimisation, and R free is calculated from them with the same formula. Therefore, these 5% of reflections are independent.
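The split into working and free reflections and the R formula can be sketched as follows. This is a minimal illustration: the amplitudes are toy numbers, no scale factor between observed and calculated amplitudes is applied, and the random selection of the free set is an assumption about how the flagging is done.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum | |F_obs| - |F_calc| | / sum |F_obs| over the given reflections."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

def free_flags(n_reflections, free_fraction=0.05, seed=0):
    """Boolean mask marking a random subset of reflections as the free set."""
    rng = np.random.default_rng(seed)
    n_free = max(1, int(round(free_fraction * n_reflections)))
    mask = np.zeros(n_reflections, dtype=bool)
    mask[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return mask

# Toy amplitudes for 1000 reflections
rng = np.random.default_rng(1)
f_obs = rng.uniform(10.0, 100.0, size=1000)
f_calc = f_obs * rng.normal(1.0, 0.2, size=1000)

free = free_flags(len(f_obs))
r_work = r_factor(f_obs[~free], f_calc[~free])   # reflections used in refinement
r_free = r_factor(f_obs[free], f_calc[free])     # reflections kept aside
```

In a real refinement only the working set drives the model, so R free is typically somewhat higher than R; a large gap between the two suggests over-fitting.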
Dihedral Angles: The Ramachandran Plot. Comparison: model vs. external information. Quantities that were not used in refinement, and are therefore mostly unbiased, are angles. The best known are the dihedral angles $\varphi$ and $\psi$, defined by the peptide main chain. $\varphi$ is the angle between the two planes defined by $C_{i-1}, N_i, C^{\alpha}_i$ and $N_i, C^{\alpha}_i, C_i$, whereas $\psi$ is the angle between the two planes defined by $N_i, C^{\alpha}_i, C_i$ and $C^{\alpha}_i, C_i, N_{i+1}$. For energetic reasons, these two angles are not independent. Their dependency is shown in the Ramachandran plot.
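The angle between two planes that share a bond can be computed from four atom positions. The sketch below uses a standard vector formulation of the dihedral angle; the function names and the test coordinates are illustrative, and the atoms are assumed to be given as Cartesian coordinates.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees between planes (p0, p1, p2) and (p1, p2, p3)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1          # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

def phi(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)

# Toy geometry: central bond along z, outer atoms at right angles
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 1.0])
d = np.array([0.0, 1.0, 1.0])
angle = dihedral(a, b, c, d)
```

Applying `phi` and `psi` to every residue of a chain yields the points of a Ramachandran plot.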
The Ramachandran Plot
Ramachandran Plot with Several Molecules: Even more information can be read from the Ramachandran plot if more than one copy of a molecule is present in the asymmetric unit: the two (or more) copies should be rather similar to each other. If one plots the Ramachandran angles for all molecules in the same diagram and connects corresponding residues, one should NOT obtain a picture like this.
The Real Space R factor: R and R free are global figures of merit: one number each that represents the quality of the whole structure. The Ramachandran plot is a first means of assessing local errors, because it provides information about single residues. Another one is the real space R factor or real space correlation coefficient. The electron density around a residue does not depend much on residues far away; this allows one to calculate, residue by residue, the difference between the density calculated from the model alone and the density calculated from the data (reflections) combined with the model (phases).
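The slide does not give a formula. One commonly used definition of the real space R factor, assumed here, is the sum of absolute density differences divided by the sum of absolute density sums over the grid points around a residue; the real space correlation coefficient is the ordinary linear correlation between the two sets of density values. The density values below are toy numbers, not output from a real map.

```python
import numpy as np

def real_space_r(rho_obs, rho_calc):
    """RSR = sum |rho_obs - rho_calc| / sum |rho_obs + rho_calc| over grid points."""
    rho_obs = np.asarray(rho_obs, dtype=float)
    rho_calc = np.asarray(rho_calc, dtype=float)
    return np.sum(np.abs(rho_obs - rho_calc)) / np.sum(np.abs(rho_obs + rho_calc))

def real_space_cc(rho_obs, rho_calc):
    """Linear correlation coefficient between the two density samples."""
    rho_obs = np.asarray(rho_obs, dtype=float).ravel()
    rho_calc = np.asarray(rho_calc, dtype=float).ravel()
    return np.corrcoef(rho_obs, rho_calc)[0, 1]

# Toy density values at the grid points around one residue
rho_obs = np.array([1.0, 2.0, 3.0, 4.0])    # from data amplitudes + model phases
rho_calc = np.array([1.0, 2.0, 3.0, 2.0])   # from the model alone

rsr = real_space_r(rho_obs, rho_calc)
cc = real_space_cc(rho_obs, rho_calc)
```

A residue that fits its density well gives a low real space R and a correlation coefficient close to 1; plotting either value against residue number highlights poorly fitted regions.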
The Real Space R factor: an Example
Useful Programs: These programs check many properties of both PDB files and X-ray diffraction data:
procheck: This program must be installed on the local computer (no web interface). It takes a PDB file as input and writes out ten PostScript files plus text files that describe deviations from ideal geometry values.
Electron Density Server (http://eds.bmc.uu.se/eds/), from the Uppsala Software Factory: A web interface where the user can enter the code of a PDB entry or upload their own data. Calculates e.g. the real space R factor, B factor plots, etc.
WhatCheck (http://www.cmbi.kun.nl/pdbreport): Extensive checks for PDB files with very strict error checking. Probably finds errors in every structure in the PDB.
Procheck: The summary file from procheck:

    +----------<<<  P  R  O  C  H  E  C  K     S  U  M  M  A  R  Y  >>>----------+
      ../images/1kx5.pdb   1.9   861 residues
    * Ramachandran plot:   90.9% core    6.6% allow    1.9% gener    0.6% disall
    * All Ramachandrans:   34 labelled residues (out of 804)
    + Chi1-chi2 plots:      3 labelled residues (out of 514)
      Main-chain params:    6 better     0 inside      0 worse
      Side-chain params:    5 better     0 inside      0 worse
    * Residue properties: Max.deviation:     5.6    Bad contacts:   13
    *                     Bond len/angle:    5.3    Morris et al class:  1 1 2
      G-factors           Dihedrals:  0.26  Covalent:  0.42  Overall:  0.33
      M/c bond lengths:  100.0% within limits   0.0% highlighted
    * M/c bond angles:    98.5% within limits   1.5% highlighted   1 off graph
    + Planar groups:      98.3% within limits   1.7% highlighted   1 off graph
    +----------------------------------------------------------------------------+
      + May be worth investigating further.  * Worth investigating further.
Detailed Ramachandran
Geometry Distortion
Summary: Most of the pretty pictures of proteins represent structures determined by X-ray diffraction. But do not be deceived by colours and artistic compositions. Everyone who makes use of PDB files / structural data should be aware of possible pitfalls. 1. Read the header information. 2. Consider the resolution and data quality. 3. Make use of programs that examine the structure and (if available/possible) the data. Interpreting data is important for science, but one must not exaggerate and should stay close to the facts.
Time-resolved macromolecular crystallography: Static structures of many biological macromolecules are available, but the detailed mechanism by which they function often remains elusive. Goal: to capture molecules in action. Record movies of electron density maps to obtain the structures of intermediates and the reaction mechanism.
How to Capture Structural Intermediates?
Extend the lifetime of intermediates (physical or chemical trapping): low temperature; trigger-freeze (trap by freezing); pH change or other solvent modification; chemical modification.
or
Real-time snapshots of evolving structural changes (no trapping): probe fast structural changes at ambient temperature. This requires rapid reaction initiation (short laser pulses) and rapid data collection (short X-ray pulses, Laue technique). For sub-second time resolution: Laue diffraction with a high-flux pink X-ray beam and a stationary crystal.
Early ligand pathways after photolysis of Myoglobin:CO L29F. The heme cavity is shown from t = 0 to 1 ns after photolysis. Individual trajectories can be simulated by molecular dynamics, and examples are shown in b1-b4. Hummer G, Schotte F, Anfinrud PA: Unveiling functional protein motions with picosecond x-ray crystallography and molecular dynamics simulations. Proc Natl Acad Sci USA 2004, 101:15330-15334.
Time-resolved wide-angle X-ray scattering of the haemoglobin:carbon monoxide complex. Structure quake: surface representation of the expected time-dependent structural changes in haemoglobin. Cammarata M, Levantino M, Schotte F, Anfinrud PA, Ewald F, Choi J, Cupane A, Wulff M & Ihee H. (2008). Nat. Methods, 5, 881-886.
Dynamical Structural Science: The next generation of X-ray facilities, based on hard X-ray free electron lasers (XFELs), can revolutionize time-resolved structural studies of proteins. A conservative extrapolation of recent developments foresees intermediate trapping and time-resolved wide-angle X-ray scattering studies becoming increasingly widely applied approaches within structural biology. Applications of time-resolved Laue diffraction and X-ray absorption spectroscopy will grow, albeit more slowly. The imminent application of ultra-fast, extremely intense XFEL-generated X-ray pulses to the reaction pathways of light-sensitive proteins will offer unique opportunities to probe the structural dynamics of proteins by following the evolution of isomerization, bond-breaking and charge-transfer reactions on the timescales at which these fundamental photochemical processes occur. The remarkable peak brilliance of the XFEL will enable interpretable X-ray diffraction patterns to be recorded from nanometre-size crystals of only a few enzyme molecules. Such an approach would potentially enable extremely rapid chemical triggering of reactions within nanocrystals by rapid mixing, enabling an enzyme-catalyzed reaction to be followed for some time. Westenhoff S, Nazarenko E, Malmerberg E, Davidsson J, Katona G, Neutze R. Time-resolved structural studies of protein reaction dynamics: a smorgasbord of X-ray approaches. Acta Crystallogr A. 2010 Mar;66(Pt 2):207-19.