Classical direct methods



Similar documents
Phasing & Substructure Solution

Structure Factors

Electron density is complex!

Crystal Structure Determination I

X-Ray Diffraction HOW IT WORKS WHAT IT CAN AND WHAT IT CANNOT TELL US. Hanno zur Loye

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks

KEANSBURG SCHOOL DISTRICT KEANSBURG HIGH SCHOOL Mathematics Department. HSPA 10 Curriculum. September 2007

Publication of small-unit-cell structures in Acta Crystallographica Michael Hoyland ECM28 University of Warwick, 2013

Phase determination methods in macromolecular X- ray Crystallography

An introduction in crystal structure solution and refinement

LMB Crystallography Course, Crystals, Symmetry and Space Groups Andrew Leslie

South Carolina College- and Career-Ready (SCCCR) Pre-Calculus

Eðlisfræði 2, vor 2007

Absolute Structure Absolute Configuration

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

Biggar High School Mathematics Department. National 5 Learning Intentions & Success Criteria: Assessing My Progress

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

PURSUITS IN MATHEMATICS often produce elementary functions as solutions that need to be

X-ray Powder Diffraction Pattern Indexing for Pharmaceutical Applications

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

AP Physics 1 and 2 Lab Investigations

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

Common Core Unit Summary Grades 6 to 8

Lecture 14: Section 3.3

" {h k l } lop x odl. ORxOS. (2) Determine the interplanar angle between these two planes:

Algebra 1 Course Information

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1

With the Tan function, you can calculate the angle of a triangle with one corner of 90 degrees, when the smallest sides of the triangle are given:

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

List the 3 main types of subatomic particles and indicate the mass and electrical charge of each.

Molecular Dynamics Simulations

Tennessee Mathematics Standards Implementation. Grade Six Mathematics. Standard 1 Mathematical Processes

CORDIC: How Hand Calculators Calculate

Forecasting Methods. What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes?

2010 Solutions. a + b. a + b 1. (a + b)2 + (b a) 2. (b2 + a 2 ) 2 (a 2 b 2 ) 2

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Introduction to Principal Components and FactorAnalysis

Dr.B.R.AMBEDKAR OPEN UNVERSITY FACULTY OF SCIENCE M.Sc. I year -CHEMISTRY ( ) Course I: Inorganic Chemistry

Graphs of Polar Equations

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

Infrared Spectroscopy: Theory

Comparing Alternate Designs For A Multi-Domain Cluster Sample

PRE-CALCULUS GRADE 12

9. Sampling Distributions

Marketing Mix Modelling and Big Data P. M Cain

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

Prentice Hall Mathematics: Algebra Correlated to: Utah Core Curriculum for Math, Intermediate Algebra (Secondary)

Section 5.0A Factoring Part 1

Higher Education Math Placement

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

Chapter 9 Summary and outlook

PYTHAGOREAN TRIPLES KEITH CONRAD

Time series Forecasting using Holt-Winters Exponential Smoothing

APPLIED MATHEMATICS ADVANCED LEVEL

Data Mining: Algorithms and Applications Matrix Math Review

SOME EXCEL FORMULAS AND FUNCTIONS

Year 9 set 1 Mathematics notes, to accompany the 9H book.

Calculating Nucleic Acid or Protein Concentration Using the GloMax Multi+ Microplate Instrument

A Learning Based Method for Super-Resolution of Low Resolution Images

17. SIMPLE LINEAR REGRESSION II

ASSESSSMENT TASK OVERVIEW & PURPOSE:

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Diffraction Course Series 2015

Friday, January 29, :15 a.m. to 12:15 p.m., only

FACTORING ANGLE EQUATIONS:

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Symmetric Stretch: allows molecule to move through space

Constrained curve and surface fitting

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER

Prentice Hall Algebra Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Solutions to Homework 10

Exact Values of the Sine and Cosine Functions in Increments of 3 degrees

Measurement with Ratios

The Bass Model: Marketing Engineering Technical Note 1

Correlation key concepts:

Curriculum Map by Block Geometry Mapping for Math Block Testing August 20 to August 24 Review concepts from previous grades.

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

DO PHYSICS ONLINE FROM QUANTA TO QUARKS QUANTUM (WAVE) MECHANICS

Relevant Reading for this Lecture... Pages

Geometry. Higher Mathematics Courses 69. Geometry

EXPERIMENTAL ERROR AND DATA ANALYSIS

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Eigenvalues and Eigenvectors

Fundamentals of grain boundaries and grain boundary migration

AMSCO S Ann Xavier Gantert

Appendix A: Science Practices for AP Physics 1 and 2

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Algebra 1 Course Title

REWARD System For Even Money Bet in Roulette By Izak Matatya

ISOMETRIES OF R n KEITH CONRAD

Core Maths C2. Revision Notes

X On record with the USOE.

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Molecular Geometry and Chemical Bonding Theory

Warm up. Connect these nine dots with only four straight lines without lifting your pencil from the paper.

Integrated Resource Plan

PROTON NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY (H-NMR)

We shall first regard the dense sphere packing model Draw a two dimensional pattern of dense packing spheres. Identify the twodimensional

Transcription:

Classical direct methods Göttingen, December 4 th 2008 George M. Sheldrick http://shelx.uni-ac.gwdg.de/shelx/

The crystallographic phase problem In order to calculate an electron density map, we require both the intensities I = F 2 and the phases φ of the reflections hkl. The information content of the phases is appreciably greater than that of the intensities. Unfortunately, it is almost impossible to measure the phases experimentally! This is known as the crystallographic phase problem and would appear to be difficult to solve! (np-hard?) Despite this, for the vast majority of small-molecule structures the phase problem is solved routinely in a few seconds by black box direct methods.

The Sayre equation In the same issue of Acta Cryst. (1952), Sayre, Cochran and Zachariasen independently derived phase relations and showed that they were consistent with the equation now known as the Sayre equation: F h = q h (F h F h-h ) where q is a constant dependent on sin(θ)/λ for the reflection h (hkl) and the summation is over all reflections h (h k l ). Sayre derived this equation by assuming equal point atoms. For such a structure the electron density (0 or Z) is proportional to its square (0 or Z 2 ) and the convolution theorem gives the above equation directly. The Sayre equation is (subject to the above assumptions) exact, but requires complete data including F 000.

Normalized structure factors Direct methods turn out to be more effective if we modify the observed structure factors to take out the effects of atomic thermal motion and the electron density distribution in an atom. The normalized structure factors E h correspond to structure factors calculated for a point atom structure. E h2 = (F h2 /ε) / <F 2 /ε> resolution shell where ε is a statistical factor, usually unity except for special reflections (e.g. 00l in a tetragonal space group). <F 2 /ε> may be used directly or may be fitted to an exponential function (Wilson plot).

The tangent formula (Karle & Hauptman, 1956) The tangent formula, usually in a heavily disguised form, is still a key formula in small-molecule direct methods: tan(φ h ) = h E h E h-h sin( φ h + φ h-h ) h E h E h-h cos( φ h + φ h-h ) The sign of the sine summation gives the sign of sin(φ h ) and the sign of the cosine summation gives the sign of cos(φ h ), so the resulting phase angle is in the range 0-360º.

The triple phase relation If we express the Sayre equation in terms of E : E h = q h (E h E h-h ) and compare the phases of the left and right hand sides of one term in the summation, we obtain: φ h = φ h + φ h-h (modulus 360º) By means of statistical assumtions, for example that the structure consists of equal resolved point atoms, we can obtain a probability distribution (Cochran, 1955) for the error in this TPR: P(Φ) = g exp( 2 E h E h E h-h / N ½ ) where Φ = φ h φ h φ h-h, g is a normalizing factor, and N is the number of (equal) atoms in the (primitive) unit-cell.

The Multan Era (1969-1986) The program MULTAN (Woolfson, Main & Germain) used the tangent formula to extend and refine phases starting from a small number of reflections; phases were permuted to give a large number of starting sets. This multisolution (really multiple attempt) direct methods program was user friendly and relatively general, and for the first time made it possible for nonexperts to solve structures with direct methods. It rapidly became the standard method of solving small-molecule structures. Yao Jia-Xing (1981) found that it was even better to start from a large starting set with random phases (RANTAN), and this approach was adopted by most subsequent programs.

Disadvantages of the tangent formula In space groups without translational symmetry (e.g. P1, C2/m, R3) the tangent formula fits exactly to the trivial but wrong (uranium atom) solution with all phases zero. Enantiomer information tends to be lost in polar space groups (e.g. P1, C2) resulting in a pseudo-centrosymmetric false solution. Heavy atoms tend to be emphasized at the expense of the light atoms (Sayre s equation can be regarded as a squaring equation ). Only the strongest ca. 15% of the data are used.

Negative quartets: using the weak data too Schenk (1973) discovered that the quartet phase sum: Φ = φ h + φ h + φ h + φ -h-h -h is, in contrast to the TPR sum, more often close to 180º than 0º when the four primary E-values E h, E h, E h and E h h h are relatively large and the three independent cross-terms E h+h, E h+h and E h +h are all small. Hauptman (1975) and Giacovazzo (1976) derived probability formulas for these negative quartets using different approaches; Giacovazzo s formula is simpler and more accurate and so came into general use. Although this phase information is weak (and depends on 1/N rather than 1/N ½ for TPRs) tests based on negative quartets, unlike triplets and the tangent formula, discriminate well against uranium-atom false solutions.

The limits of purely reciprocal space direct methods Conventional direct methods based on improved versions of the tangent formula and implemented in programs such as MULTAN, RANTAN, SAYTAN, DIRDIF, SIR, SHELXS etc. were extremely efficient at solving small molecule structures up to 100 unique atoms but only succeeded in solving a handful of structures larger than about 200 unique atoms. The efficiency of the tangent formula as a phase space search engine lies in its ability to correlate phases widely distributed in reciprocal space. Despite its computational efficiency, the tangent formula tends to lose enantiomorph discrimination and drifts towards uranium-atom false solutions, especially for larger structures. It was clearly necessary to constrain the phases more tightly to be chemically reasonable.

Finding the minimum

Dual space recycling Introduced with the SnB program by the Buffalo group in 1993 and later used as the basis for SHELXD. The real space part imposes a strong atomicity constraint on the phases that are refined in the reciprocal space part. This approach also works well for the location of heavy atoms from SAD, MAD etc. data, because these atoms are also well resolved from each other.

The correlation coefficient between E o and E c 100 [ Σ(wE o E c ) Σw Σ(wE o ) Σ(wE c ) ] CC = { [Σ(wE 2 o )Σw (ΣwE o ) 2 ] [Σ(wE 2 c )Σw (ΣwE c ) 2 ] } ½ Fujinaga & Read, J. Appl. Cryst. 20 (1987) 517-521. This is the test used in SHELXD to identify the best solution. For data to atomic resolution, a CC of 65% or more almost always indicates a correct solution. For heavy atoms from SAD or MAD data, above 30% is probably solved. CC(weak), calculated using the reflections NOT used for phasing (like the free R) is particularly useful for low resolution SAD or MAD phasing, and should be at least 15%.

Random OMIT maps Omit maps are frequently used by protein crystallographers to reduce model bias when interpreting unclear regions of a structure. A small part (<10%) of the model is deleted, then the rest of the structure refined (often with simulated annealing to reduce memory effects) and finally a new difference electron density map is calculated. A key feature of SHELXD is the use of random omit maps in the search stage. About 30% of the peaks are omitted at random and the phases calculated from the rest are refined. The resulting phases and observed E-values are used to calculate the next map, followed by a peak-search. This procedure is repeated 20 to 500 times. Although the random omit and probabilistic Patterson sampling appreciably improve the efficiency of direct methods, using both together is not much better than either alone. Usually we use the probabilistic Patterson sampling for the location of heavy atoms for macromolecular phasing and random omit maps for ab initio structure solution.

Fine tuning SHELXD for difficult structures 1. Use the inner loop (FIND) just to find heavy atoms (with the help of a super-sharp Patterson, PSMF 4) and the outer loop (PLOP) to expand to the full structure. 2. If the distribution of peak heights indicates a tendency to produce uranium atom solutions, increase the number of phases held fixed in tangent expansion by increasing the second TANG parameter (e.g. TANG 0.95 0.6). 3. Expand the data to P1, then find the true symmetry later (TRIC). 4. If the data are borderline for atomic resolution, rename the resulting name.res file to name.ins and use SHELXE to produce a map (e.g. shelxe name m20 s0.4 e1). This is also a good way of inverting the structure if required (direct methods cannot distinguish enantiomorphs), and also now enables polypeptides to be autotraced.

Unknown structures solved by SHELXD Compound Sp. Grp. N(mol) N(+solv) HA d(å) Hirustasin P4 3 2 1 2 402 467 10S 1.20 Cyclodextrin P2 1 448 467 0.88 Decaplanin P2 1 448 635 4Cl 1.00 Cyclodextrin P1 483 562 1.00 Bucandin C2 516 634 10S 1.05 Amylose-CA26 P1 624 771 1.10 Viscotoxin B2 P2 1 2 1 2 1 722 818 12S 1.05 Mersacidin P3 2 * 750 826 24S 1.04 Feglimycin P6 5 * 828 1026 1.10 Tsuchimycin P1 1069 1283 24Ca 1.00 rc-wt Cv HiPIP P2 1 2 1 2 1 1264 1599 8Fe 1.20 Cytochrome c3 P3 1 2024 2208 8Fe 1.20 *twinned The largest substructure solved so far was probably 197 correct Se out of a possible 205 by Qingping Xu of the JCSG (PDB 2PNK).

The 1.2 Å rule Experience with a large number of structures has led us to formulate the empirical rule that if fewer than half the number of theoretically measurable reflections in the range 1.1-1.2 Å are observed, it is very unlikely that the structure can be solved by direct methods [Sheldrick, 1990]. Remarkably, this rule has withstood the test of time. Morris & Bricogne, Acta Cryst. D59 (2003) 615-617 gave an explanation: the variation of the experimental E 2 with resolution shows that data in the range 1.2-1.0 Å have a higher information content. When heavier atoms are present, this rule can be relaxed somewhat, but then density modification may be needed to complete the structure.

Strategy for solving a twin with SHELXD Peter Müller and I have noticed that, even for 50/50 twins without heavy atoms, the SHELXD inner loop (FIND) often gives a double image that is strongly biased towards one component. This has been exploited by incorporating TWIN and (fixed) BASF into the outer loop (PLOP). This bistable recycling algorithm crystallizes as one of the two components in the same way that it resolves enantiomorphic double images. More PLOP cycles should be used than for non-twins, and the threshold for entering the outer loop should be lowered. If heavy atoms are present, the probabilistic Patterson sampling PATS should always be used to generate starting atoms, since it is still valid for twins!

Structure solution in P1 It has been observed [e.g. Sheldrick & Gould, Acta Cryst. B51 (1995) 423-431; Xu et al., Acta Cryst. D56 (2000) 238-240; Burla et al., J. Appl. Cryst. 33 (2000) 307-311] that it may be more efficient to solve structures in P1 and then search for the symmetry elements later. This works particularly well for solving P1 structures in P1 (most simply by using the TRIC instruction in SHELXD). This approach requires finding the positions of the symmetry elements (and perhaps with their help determining the space group) after solving the structure, for example after solving a P1 structure in P1 it is necessary to move the complete structure so that an inversion center lies on the origin.

Acknowledgements I am particularly grateful to Isabel Usón, my group in Göttingen and many SHELX users for discussions and test data. SHELXD references: Usón & Sheldrick (1999), Curr. Opin. Struct. Biol. 9, 643-648; Sheldrick, Hauptman, Weeks, Miller & Usón (2001), International Tables for Crystallography Vol. F, eds. Arnold & Rossmann, pp. 333-351; Sheldrick (2008), Acta Cryst. A64, 112-122.