J. Phys. Chem. B 1998, 102, 3586

All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins

A. D. MacKerell, Jr.,* D. Bashford, M. Bellott, R. L. Dunbrack, Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin, and M. Karplus*

Department of Chemistry & Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, Department of Pharmaceutical Sciences, University of Maryland, School of Pharmacy, Baltimore, Maryland 21201, and Laboratoire de Chimie Biophysique, ISIS, Institut Le Bel, Université Louis Pasteur, Strasbourg, France

Received: September 22, 1997; In Final Form: February 6, 1998

New protein parameters are reported for the all-atom empirical energy function in the CHARMM program. The parameter evaluation was based on a self-consistent approach designed to achieve a balance between the internal (bonding) and interaction (nonbonding) terms of the force field and among the solvent-solvent, solvent-solute, and solute-solute interactions. Optimization of the internal parameters used experimental gas-phase geometries, vibrational spectra, and torsional energy surfaces supplemented with ab initio results. The peptide backbone bonding parameters were optimized with respect to data for N-methylacetamide and the alanine dipeptide. The interaction parameters, particularly the atomic charges, were determined by fitting ab initio interaction energies and geometries of complexes between water and model compounds that represented the backbone and the various side chains. In addition, dipole moments, experimental heats and free energies of vaporization, solvation and sublimation, molecular volumes, and crystal pressures and structures were used in the optimization.
The resulting protein parameters were tested by applying them to noncyclic tripeptide crystals, cyclic peptide crystals, and the proteins crambin, bovine pancreatic trypsin inhibitor, and carbonmonoxy myoglobin in vacuo and in crystals. A detailed analysis of the relationship between the alanine dipeptide potential energy surface and calculated protein φ, ψ angles was made and used in optimizing the peptide group torsional parameters. The results demonstrate that use of ab initio structural and energetic data by themselves is not sufficient to obtain an adequate backbone representation for peptides and proteins in solution and in crystals. Extensive comparisons between molecular dynamics simulations and experimental data for polypeptides and proteins were performed for both structural and dynamic properties. Energy minimization and dynamics simulations for crystals demonstrate that the latter are needed to obtain meaningful comparisons with experimental crystal structures. The presented parameters, in combination with the previously published CHARMM all-atom parameters for nucleic acids and lipids, provide a consistent set for condensed-phase simulations of a wide variety of molecules of biological interest.

Abbreviations: BPTI, bovine pancreatic trypsin inhibitor; MbCO, carbonmonoxy myoglobin; NMA, N-methylacetamide; rms, root-mean-square.
* Corresponding authors.
Harvard University. University of Maryland.

I. Introduction

Empirical energy calculations are of great utility in the study of the structure, dynamics, and thermodynamics of proteins, as well as of other macromolecules of biological interest. 1-3 An essential element is the simplicity of the potential energy function, which makes possible simulations of mesoscopic systems involving tens of thousands of atoms for time scales extending into the nanosecond range or longer. Although many potential functions are now in use, improvements in their accuracy continue to be important. This is of particular concern at present because most of the problems now being investigated by simulation methods require more quantitative results than
did much of the earlier work, 2 where more qualitative features were of primary interest. Indeed, the need for more quantitative results from empirical energy calculations, ranging from structural and dynamic information to thermodynamic properties, motivated the development of the CHARMM22 force field for proteins presented in this paper; the 22 signifies that the present parametrization was first included in version 22 of CHARMM, which was released in [year missing].

While improving the accuracy, it is desirable to limit the complexity of the potential function so as not to introduce unnecessary increases in the required computer time. The approach we take here is to optimize the parameters for the widely used CHARMM potential energy function without changing the functional form. 4-8 We present all-atom parameters for proteins that have been shown to yield good results in a variety of simulations. 9 The methodology used in the parameter optimization, which differs in certain aspects from that employed by others, is consistent with recently published parameter sets for nucleic acids 13 and lipids. 14

Université Louis Pasteur. Present address in Supporting Information.
American Chemical Society. Published on Web 04/14/1998.

Because of the important role of the solvent and explicit solvent
representations now used in most simulations, considerable emphasis is placed on a balance among the protein-protein, protein-solvent, and solvent-solvent interactions. The approach for achieving such a balance is a refinement of that employed for the nonbonded interactions in the CHARMM 19 (Param 19) polar hydrogen potential energy function. 5,15 Although the new parameter set uses the same functional form as employed previously in the CHARMM program, the resulting potential energy function is a considerable improvement over earlier functions. 4,7,8,15,16 In contrast to the CHARMM 19 polar hydrogen parameter set, 15 which uses extended atoms for carbons (e.g., a CH3 group is treated as a single atom), the present potential function includes all atoms explicitly. Further, the parametrization is based on a much broader range of experimental and ab initio data. This yields parameters that are applicable to a wide variety of systems and reduces complications due to correlations among the parameters.

The present parameter set was optimized for the protein main chain and for the individual side chains by detailed analyses of one or more small model compounds for each case. The backbone parameters used N-methylacetamide (NMA) and the alanine dipeptide; histidine parameters are based on imidazole, 4-methylimidazole, and imidazolium; valine, leucine, and isoleucine are based on small aliphatic compounds including ethane, propane, butane, and isobutane; and so on. Such a strategy ensures that the parameters for each amino acid are fully optimized with respect to the available data. All of the parameters were optimized by the same self-consistent procedures described here for the protein backbone, except those for the aromatic side chains, which were taken with some slight modifications from the values published by Jorgensen. 17 It should be noted that the explicit representation of hydrogens in aromatic rings is necessary to produce the quadrupole moment required for reproduction of aromatic-aromatic interactions seen in small peptide crystals.
17b Details of the studies made for parametrization of the individual amino acid side chains will be published elsewhere. The present paper describes the philosophy used in the parametrization and gives the details of the parameter evaluation for the protein backbone. It also presents the results obtained by using the parameters for simulations of liquid NMA and of a number of peptides and proteins in solution and in a crystal environment. The resulting data elucidate a number of important physical effects, including the importance of nonbonding contributions in determining structures and vibrations and the need for a balance of intramolecular and intermolecular terms and among the solvent-solvent, solvent-solute, and solute-solute contributions to the intermolecular terms. The present work shows that potential energy functions of the type used in the present study must be optimized, in part, with respect to condensed-phase properties; i.e., use of ab initio results by themselves is not sufficient.

The function and the parameters obtained are implemented in CHARMM 22 and subsequent versions of that program. The protein parameters, together with those for nucleic acids, 13 lipids, 14 and carbohydrates (in progress) form a consistent optimized set for a wide range of biomolecules. The CHARMM program includes the potential energy function we describe here and the other aspects of the molecular model that are required for a full description (e.g., cutoff values). Only with this information is it possible to repeat a calculation. The program is available to not-for-profit institutions at a nominal charge. 18

Section II presents the potential function and the philosophy of the parameter development. Section III describes the methods used for minimization and dynamical simulations to determine and test the parameters. The results obtained for the various test cases are presented in Section IV. Section IV.a focuses on the parametrization of the peptide backbone, while Sections IV.b-IV.d present applications to tripeptide crystals, cyclic peptide crystals, and proteins, respectively.
Conclusions and future directions are given in Section V. The full all-hydrogen parameter set is given in an Appendix that is presented as Supporting Information.

II. Parametrization Methodology

Calculations were performed with the simulation program CHARMM, 4 in which an empirical energy function that contains terms for both internal and external interactions was used. The energy function has the form

$$
U(\vec{R}) = \sum_{\text{bonds}} K_b (b - b_0)^2 + \sum_{\text{UB}} K_{UB} (S - S_0)^2 + \sum_{\text{angle}} K_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} K_\chi \left(1 + \cos(n\chi - \delta)\right) + \sum_{\text{impropers}} K_{imp} (\varphi - \varphi_0)^2 + \sum_{\text{nonbond}} \left\{ \epsilon \left[ \left(\frac{R_{\min,ij}}{r_{ij}}\right)^{12} - \left(\frac{R_{\min,ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{\epsilon_1 r_{ij}} \right\} \qquad (1)
$$

where $K_b$, $K_{UB}$, $K_\theta$, $K_\chi$, and $K_{imp}$ are the bond, Urey-Bradley, angle, dihedral angle, and improper dihedral angle force constants, respectively; $b$, $S$, $\theta$, $\chi$, and $\varphi$ are the bond length, Urey-Bradley 1,3-distance, bond angle, dihedral angle, and improper torsion angle, respectively, with the subscript zero representing the equilibrium values for the individual terms. Coulomb and Lennard-Jones 6-12 terms contribute to the external or nonbonded interactions; $\epsilon$ is the Lennard-Jones well depth and $R_{\min}$ is the distance at the Lennard-Jones minimum, $q_i$ is the partial atomic charge, $\epsilon_1$ is the effective dielectric constant, and $r_{ij}$ is the distance between atoms $i$ and $j$. The Lennard-Jones parameters between pairs of different atoms are obtained from the Lorentz-Berthelot combination rules, in which $\epsilon_{ij}$ values are based on the geometric mean of $\epsilon_i$ and $\epsilon_j$ and $R_{\min,ij}$ values are based on the arithmetic mean between $R_{\min,i}$ and $R_{\min,j}$. Because of the role of electrostatic contributions in determining intramolecular, as well as intermolecular, energetics (as described below), the effective dielectric constant $\epsilon_1$ must be set equal to unity in this potential energy function since otherwise an unbalanced parametrization will be obtained, particularly for the peptide group. This contrasts with the polar hydrogen parameter set, PARAM 19, 5,15 in which it is appropriate to introduce a distance-dependent dielectric parameter.
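The combination rules and the Coulomb term of eq 1 can be illustrated with a short sketch. It is not taken from the CHARMM source code. The function names, the example parameter values, and the conversion constant 332.0716 (the factor commonly quoted for CHARMM to express charge products in e²/Å as kcal/mol) are assumptions made for illustration only.

```python
import math

# Assumed conversion factor: (charge in e)^2 / (distance in Angstrom) -> kcal/mol.
COULOMB_KCAL = 332.0716


def combine_lj(eps_i, rmin_i, eps_j, rmin_j):
    """Lorentz-Berthelot rules as described in the text:
    geometric mean of the well depths, arithmetic mean of the R_min values."""
    eps_ij = math.sqrt(eps_i * eps_j)
    rmin_ij = 0.5 * (rmin_i + rmin_j)
    return eps_ij, rmin_ij


def coulomb(q_i, q_j, r_ij, eps_1=1.0):
    """Coulomb term q_i q_j / (eps_1 r_ij); eps_1 defaults to unity,
    the value the text requires for this parameter set."""
    return COULOMB_KCAL * q_i * q_j / (eps_1 * r_ij)


if __name__ == "__main__":
    # Hypothetical atom parameters (well depth in kcal/mol, R_min in Angstrom).
    eps_ij, rmin_ij = combine_lj(0.1, 3.0, 0.4, 4.0)
    print("eps_ij =", eps_ij, "R_min,ij =", rmin_ij)
    # Hypothetical pair of partial charges 3 Angstrom apart.
    print("Coulomb energy =", coulomb(0.31, -0.47, 3.0), "kcal/mol")
```

Keeping `eps_1` as an explicit argument makes the contrast with a distance-dependent dielectric visible: replacing the constant by a function of `r_ij` would change only this one term.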
For the CHARMM 22 set, neutralized charged groups can be introduced to mimic some aspects of the shielding from a high dielectric constant solvent. 120 Given $\vec{R}$, the vector of the coordinates of the atoms, the various distances and angles required to evaluate $U(\vec{R})$ in eq 1 are readily determined. All possible bond angles and dihedral angles are included in $U(\vec{R})$, while a limited number of Urey-Bradley terms and improper dihedral angles are used to optimize the fit to vibrational spectra. As can be seen from eq 1, only the quadratic term is included in the Urey-Bradley function; this is in accord with an analysis 19 that shows the linear term can be omitted when Cartesian coordinates are used and the minimum energy structure is employed for determining the vibrational frequencies. Nonbonded interaction terms are
included for all atoms separated by three or more covalent bonds. No general scaling of the electrostatic or Lennard-Jones interactions for atoms separated by three bonds (the so-called 1-4 term) is used. In specific cases there is scaling of the 1-4 Lennard-Jones term; examples include the aliphatic carbons and the amide nitrogen and oxygen atoms. No explicit hydrogen-bond term is included because the Coulomb and Lennard-Jones terms can accurately represent the hydrogen-bonding interactions. 5,15

The water model used in all calculations is the TIP3P model 20 modified for the CHARMM force field. 5 The consistency of the protein and solvent interactions is based on the use of this water model; i.e., it forms part of the system description and other water models would be less appropriate.

II.a. Parametrization Strategy. Development of parameters for empirical potential energy functions, such as that in eq 1, requires a coherent strategy. The present work is an attempt to optimize the parameters by the use of a wide range of information in a consistent fashion. Self-consistency among the different terms in the potential energy function was achieved by iterative optimization of the parameters. Typically, initial values of the intermolecular parameters (Coulomb and Lennard-Jones) were chosen from previous CHARMM parameter sets 4,5,21 or based on the reproduction of ab initio interaction calculations on rigid monomers. Given these values, the intramolecular parameters (bond length, Urey-Bradley, bond angle, dihedral angle, and improper dihedral angle terms) were determined by using structural and vibrational data for the model compounds. The resulting structures were used for optimization of the intermolecular parameters relative to interaction energies and condensed-phase properties of model compounds. With the improved interaction parameters, the structures, vibrational spectra, and energy surfaces of the model compounds were reoptimized by adjusting the internal parameters.
This iterative process was repeated until convergence of the parameters was achieved.

Intramolecular Terms. Geometries are dominated by the equilibrium values for the bond length and bond angle terms and by the dihedral term phase and multiplicity. These parameters were optimized by fitting to gas-phase structures from microwave and electron diffraction data or crystal structures from X-ray data. In the case of X-ray structures, care was taken in the interpretation of the individual crystal structures to account for the influence of intermolecular interactions on the intramolecular geometries. Ideally (e.g., for imidazole), both gas-phase and crystal data are used. Such a combination allows for parameter optimization in the gas phase followed by testing of the parameters with the crystal structure where intermolecular interactions, as well as the intramolecular parameters, influence the geometry. Ab initio data, especially for ionic species such as acetate, guanidinium, imidazolium, and methylammonium, were introduced to supplement the experimental geometric data. In many cases, survey results of crystal structures in the Cambridge Crystal Data Bank (CCDB) 22 were used to determine the range of the allowed geometric values (e.g., for indole, pyrrolidine); i.e., if a large number of fragment structures are available, the average geometries tend to diminish the distortions associated with crystal interactions. Such averages are, in fact, preferable to gas-phase data in some cases because they contain contributions to the geometry associated with condensed-phase effects. An example of particular importance for proteins arises in the determination of the peptide backbone parameters (see Section IV.b).

Adjustment of the parameters was performed manually, although in certain cases (e.g., for proline) automated procedures were employed. We have found that automated procedures must be used with great care owing to the extensive nature of parameter space, correlation among the parameters, and their underdetermined nature. An automated least-squares procedure often leads to a combination of unphysical parameters that reproduce the input data.
More meaningful parameter values, which have a wider range of applicability, were obtained manually with reasonable parameter ranges for the optimization in the iterative refinement procedure described above.

Once satisfactory geometries were obtained, the force constants associated with the bond length, bond angle, dihedral angle, and improper torsion terms were adjusted by fitting vibrational data for the model systems. Gas-phase infrared and Raman data were the primary sources of such data. Solution and crystal data were used in certain cases, particularly for ionic species for which few gas-phase vibrational data are available. In the solution results, attention was paid to interactions that could influence the experimentally determined vibrational frequencies and efforts were made to account for condensed-phase contributions; an example is the NH stretch associated with the peptide backbone.

The results of ab initio calculations were introduced where necessary to supplement the experimental data. One area where ab initio calculations were essential is in the assignments of experimental vibrational frequencies to internal coordinates. Only limited isotopic substitution data are available for many cases, and there are often ambiguities in the interpretation of the data because many normal modes contain contributions from the same internal coordinates. Ab initio results were also used to obtain values for low-frequency torsional modes that are difficult to observe experimentally. Finally, for crystal or solution measurements, isolated molecule ab initio results were used as an aid in determining the contribution of intermolecular interactions to the observed vibrational spectra. In particular, the optimization of force constants associated with the ionic side chains was significantly aided by the ab initio data. Scaled HF/6-31G(d) ab initio values were used for the vibrational calculations. When feasible, the scaling factor was determined by comparison of known experimental frequencies with the ab initio results; the derived factor was then applied to the unobserved frequencies.
Where this was not possible, a scale factor of 0.9 was introduced because it has been found to give good results in other studies. 23 Analysis of the vibrational spectra and potential energy distributions was made with the MOLVIB program (J. Wiórkiewicz-Kuczera and K. Kuczera, unpublished results). Availability of assignments from the potential energy distributions along with frequencies allowed both to be taken into account during optimization of force constants.

Following adjustment of force constants to fit the vibrational data, the minimized geometries were rechecked and adjustments were made to both the equilibrium parameters and the force constants in an iterative fashion, as pointed out above. Final optimization of the vibrational spectra was done by the addition of Urey-Bradley and improper terms in cases where the agreement between the calculated results and the available data was unsatisfactory. Urey-Bradley terms were important for the in-plane deformations as well as separating symmetric and asymmetric bond stretching modes (e.g., in aliphatic molecules). Improper dihedral terms aided mainly in the accurate reproduction of out-of-plane modes such as the wagging modes of the imidazole hydrogens, and in the amides, such as N-methylacetamide and acetamide.
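The frequency scaling described above can be sketched as follows. The text does not state how the scale factor was derived from the comparison with experiment; the least-squares fit through the origin used here is an assumption, and the function names and all frequencies in the example are hypothetical.

```python
def fit_scale_factor(calc_freqs, expt_freqs):
    """Least-squares scale factor c that minimizes sum((c*calc - expt)^2).
    This fitting choice is an assumption, not the documented procedure."""
    num = sum(c * e for c, e in zip(calc_freqs, expt_freqs))
    den = sum(c * c for c in calc_freqs)
    return num / den


def scale_frequencies(calc_freqs, factor=0.9):
    """Apply a scale factor to calculated frequencies.
    The default of 0.9 is the fallback value quoted in the text."""
    return [factor * c for c in calc_freqs]


if __name__ == "__main__":
    # Hypothetical HF frequencies (cm^-1) with known experimental counterparts.
    calc_known = [3250.0, 1900.0, 1650.0]
    expt_known = [2950.0, 1720.0, 1500.0]
    factor = fit_scale_factor(calc_known, expt_known)
    # Hypothetical low-frequency modes that were not observed experimentally.
    unobserved = [210.0, 95.0]
    print("fitted factor:", factor)
    print("scaled unobserved modes:", scale_frequencies(unobserved, factor))
```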
In addition to the geometries and vibrational frequencies of specific structures, the relative energies of different backbone and side chain conformers are important. Examples include torsional surfaces for carbon-carbon bonds, methyl and ethyl rotations in model compounds such as 4-ethylimidazole and ethylbenzene, and the alanine dipeptide map. Although conformational analysis 24 can give an insightful description of the relative free energies of different side chain conformers, more quantitative results are needed for a potential energy function. Consequently, detailed calculations were made for the energies of side chain conformers. Also, in some cases large deviations from the minima can occur during molecular dynamics simulations so that it is necessary to have a more complete knowledge of the potential surface than that obtained from the relative energies of the minima and from their vibrational frequencies. An example is the out-of-plane distortion of aromatic hydrogens, including the Hε1 atom in tryptophan, where deviations of 15° introduce strain energies of less than 1 kcal/mol. Thus, the intramolecular and intermolecular parameter optimization included information from adiabatic energy surfaces where appropriate. Such data made it possible to adjust the parameters so as to describe energy barriers and the positions of saddle points, as well as the minimum-energy structures used in the vibrational analysis (e.g., rotation of the side chain hydroxyl group in tyrosine). Experimental gas-phase data were used in many cases, and ab initio calculations were made to obtain surfaces for which no satisfactory experimental data for barriers were available (e.g., proline). Changes in the structure (e.g., bond elongation and angle opening) as a function of dihedral angle can be important 25 and were obtained from the ab initio calculations. Such information was used for optimizing and testing the accuracy of the potential function in reproducing structural distortions, as well as energetic differences.
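To make the link between the dihedral term of eq 1 and a torsional barrier concrete, the following sketch evaluates a sum of terms of the form K(1 + cos(nχ - δ)) over a rigid scan of the dihedral angle. It covers only the explicit dihedral contribution; nonbonded contributions and structural relaxation are left out. The function names and the force constants in the example are hypothetical, not values from the parameter set.

```python
import math


def dihedral_energy(chi_deg, terms):
    """Sum of K * (1 + cos(n*chi - delta)) over (K, n, delta_deg) terms."""
    chi = math.radians(chi_deg)
    return sum(
        k * (1.0 + math.cos(n * chi - math.radians(delta_deg)))
        for k, n, delta_deg in terms
    )


def barrier_height(terms, step_deg=1):
    """Difference between the highest and lowest energy on a rigid scan."""
    energies = [dihedral_energy(chi, terms) for chi in range(0, 360, step_deg)]
    return max(energies) - min(energies)


if __name__ == "__main__":
    # Hypothetical single threefold term (K in kcal/mol, n, delta in degrees).
    threefold = [(1.0, 3, 0.0)]
    print("barrier, one term:", barrier_height(threefold))
    # Hypothetical two-term potential for the same dihedral.
    two_terms = [(1.0, 3, 0.0), (0.5, 1, 180.0)]
    print("barrier, two terms:", barrier_height(two_terms))
```

Passing the terms as a list allows several multiplicities to be combined for one dihedral without changing the function.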
Satisfactory agreement was obtained in most cases for both the vibrational frequency and the torsional barrier from the combined contributions of dihedral and nonbonded terms. In certain cases, compromises were made because a single dihedral term could not describe vibrational data and the energy barriers; e.g., the H-C-C-H torsional frequency in ethane was slightly elevated as compared to the experimental value to allow the rotational energy surface to be accurately modeled. In some cases that were regarded as particularly important (e.g., the dipeptide potential surface), more than one term was used for the dihedral angle potential in eq 1.

Intermolecular Terms. Intermolecular parameter optimization involves the van der Waals and electrostatic interactions. The objective was to obtain a set of parameters that result in balanced protein-protein, protein-water, and water-water interactions. Interaction energies and structural data for model dimer systems and macroscopic properties of pure liquids and solutions were used in the parameter determination.

The water model and water-water interactions were taken as the basis of the parametrization. As a first step, a number of published water models were tested, including the TIP3P, 20 TIP4P, 20 and extended SPC/E 26 models as well as alternate models (Gao, J.; MacKerell, A. D., Jr.; Karplus, M., unpublished results). On the basis of a comparison of the TIP3P and SPC/E models with ab initio calculations, a softer repulsive van der Waals term was examined. Although $r^{-9}$ and $r^{-10}$ repulsions yielded good results, there was no significant improvement, relative to $r^{-12}$, for either the energetics or the structure of liquid water. Consequently, it was decided to retain one of the previously published water models, and the TIP3P model was selected. This model 20 satisfactorily reproduces the first-shell hydration and the energetics of liquid water, although the tetrahedrality is too weak and the diffusion constant is somewhat too high.
The SPC/E model, which also reproduces the first hydration shell and is somewhat better for the tetrahedrality, was not used because it has an inconsistency when applied to heterogeneous solutions. In the SPC/E model, a correction is made to account for the overestimation of the interaction energy due to the omission of electronic polarization. Such a correction is reasonable with respect to pure solvent properties but can lead to problems for solution simulations; i.e., the solute does not know that the energy correction is present in the water-water interactions. For a proper balance of water-water and water-solute interactions, the solute charges would have to be increased, thereby leading to possible overestimation of the solute-solute interactions and incorrect results for the calculated properties of the solute itself. The TIP4P model, although it gives excellent results, was not used because it includes a virtual particle, which complicates the treatment because the forces have to be projected onto the real atoms. Moreover, because most of the simulation time of a solvated protein is spent on simulating the water molecules, the less costly TIP3P model was utilized. It should be noted that use of the CHARMM 22 parameter set with water models other than TIP3P may lead to inconsistencies because the water-protein and protein-protein intermolecular parametrization may not be well-balanced.

Given the water-water interaction for the TIP3P model, the solute-water interactions were optimized on the basis of ab initio results for interactions of complexes and experimental data for macroscopic systems, including thermodynamic parameters and molecular volumes. Ab initio calculations were performed to determine the minimum interaction energies and geometries between a water molecule and a model compound, primarily at sites involving hydrogen-bonding interactions with polar atoms. To determine the partial atomic charges, the interactions between water and all polar sites of the model compounds were examined. Typically, the isolated model compounds were optimized at the HF/6-31G(d) level.
The optimized structures were then used for a series of supermolecular calculations involving the model compound and a single water molecule at each of the various sites. The HF/6-31G(d) optimized structure was replaced with an experimental gas-phase structure if available; the gas-phase water structure corresponding to the TIP3P model was used in all cases. 27 The supermolecule structure was optimized at the HF/6-31G(d) level by varying the interaction distance and, in certain cases, a single angle, to find the local minimum for the water position with fixed monomer geometries. From the resulting structure, the interaction energy was calculated as the difference between the total supermolecule energy and the sum of the individual monomer energies; no corrections for basis-set superposition error were made.

This approach is essentially that introduced by Reiher and Karplus. 5,15 It was subsequently adopted by Jorgensen and co-workers 8,28 and most recently used in the development of the MMFF energy function by Halgren. 29 In the present force-field development and in the work of Halgren 29 but not that of Jorgensen and co-workers, 8 we follow Reiher and Karplus 5,15 in scaling the calculated ab initio values used for the parametrization of the interactions between neutral polar molecules and water. This adjustment takes account of limitations in the level of the ab initio theory being employed and the neglect of many-body polarization in liquid water. Limitations in the HF/6-31G(d) level of theory include omission of the dispersive (attraction) term in the Lennard-Jones interaction, the use of fixed geometries, the relatively small size of the basis set and
the omission of corrections for basis-set superposition error. These limitations lead to a cancellation of errors so that the calculated minimum interaction energy and distance are [value missing] kcal/mol and 2.98 Å, respectively, for the gas-phase water dimer, 5 in satisfactory agreement with the experimental values of 5.4 ± 0.7 kcal/mol and 2.98 Å. 27b Similar accuracy is obtained, probably fortuitously, with the HF/6-31G(d) level calculations for other polar systems and is the primary reason the relatively inexpensive level of theory was used as the basis for the parametrization. For the condensed phase, neglect of many-body polarization leads to the ab initio interaction energy being underestimated and the minimum distance overestimated as compared to the condensed phase, in agreement with previous work on a wide variety of molecules. 8,13,14 To overcome the underestimation of the condensed-phase interaction energy, a scaling factor was introduced. The scaling factor was obtained from the ratio of the empirical interaction energy of the TIP3P water dimer to the HF/6-31G(d) water dimer interaction energy. The resulting value, 1.16, was used to scale the ratio of the water to model compound ab initio interaction energies to be consistent with the TIP3P dimer model, so as to obtain a balance of the solute-water and water-water intermolecular interactions. 30 The use of a single scale factor makes the assumption that dispersion effects and polarizabilities are constant for the compounds being parametrized and that the aqueous solvent environment is being used for all calculations. Recent work, 31 using a combined quantum mechanical/molecular mechanical approach, has shown that the electronic polarization contribution to the electrostatic interaction energy varies linearly with the total interaction energies for solvated molecules; this supports the simple scaling model. When available, experimental data on interaction energies from mass spectrometry were used in addition to the ab initio results.
For charged species, no scaling was applied since the HF/6-31G(d) interaction energies themselves yield charge distributions that give satisfactory agreement with heats and free energies of solvation. 8,36 To compensate for the overestimation of the minimum interaction distances in the Hartree-Fock model due to the absence of the dispersion contribution and neglect of many-body effects, the minimum distances were assumed to be about 0.2 Å shorter than the HF/6-31G(d) values. Such an approach is consistent with the TIP3P water model 20 and pure liquid simulations for which the shorter distance is required to obtain the correct density. 37

Once the interaction energies and minimum energy geometries for the model supermolecules had been determined from ab initio calculations, the partial charges on the atoms of the model compounds were adjusted to reproduce those values. For consistency, the water (TIP3P) and model compound geometries were kept fixed and only the one or two geometrical parameters used in the ab initio calculation were varied in the structural optimization of the complex with the CHARMM force field. The initial model compound geometries were those used in the ab initio calculations; the CHARMM optimized geometries were used during subsequent iterations. Use of the CHARMM optimized geometry ensured that the final partial atomic charges were consistent with the intramolecular portion of the force field. Initial partial charges were obtained from Mulliken population analysis of the HF/6-31G(d) wave function.

In addition to energies and distances, the magnitudes and directions of the dipole moments of the model compounds were used in the fitting procedure. If available, experimental gas-phase dipole moment values were used; if not, ab initio values at the HF/6-31G(d) level were adopted. As with the TIP3P water model, where the empirical dipole (2.35 D) is larger than the experimental gas-phase value (1.86 D), the charges of the model compounds were adjusted such that the empirical dipole moments were somewhat larger than the experimental or ab initio values.
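The way the ab initio results are turned into fitting targets can be summarized in a small sketch, under stated assumptions. It takes the supermolecule and monomer energies as given numbers, applies the factor of 1.16 only to neutral species, and shortens the minimum distance by 0.2 Å. Applying the distance shift to charged species as well is one reading of the text rather than something it states explicitly. All function names and example energies are hypothetical.

```python
NEUTRAL_ENERGY_SCALE = 1.16   # scale factor quoted in the text for neutral polar species
DISTANCE_SHIFT = 0.2          # Angstrom; "about 0.2 A shorter" in the text


def interaction_energy(e_supermolecule, monomer_energies):
    """Supermolecule energy minus the sum of monomer energies (no BSSE correction)."""
    return e_supermolecule - sum(monomer_energies)


def fitting_targets(e_int_hf, r_min_hf, charged=False):
    """Return the (energy, distance) targets that the empirical model should reproduce."""
    e_target = e_int_hf if charged else NEUTRAL_ENERGY_SCALE * e_int_hf
    r_target = r_min_hf - DISTANCE_SHIFT
    return e_target, r_target


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol for a model compound with one water.
    e_int = interaction_energy(-125.0, [-70.0, -50.0])
    print("HF interaction energy:", e_int)
    print("targets, neutral:", fitting_targets(e_int, 2.05))
    print("targets, charged:", fitting_targets(e_int, 2.05, charged=True))
    # Ratio of the TIP3P dipole to the gas-phase value, both quoted in the text.
    print("dipole enhancement for water:", 2.35 / 1.86)
```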
Once satisfactory agreement with all of these data had been obtained, condensed-phase simulations were performed to refine the van der Waals parameters. Pure solvent simulations of aliphatic and polar neutral compounds were used to calculate heats of vaporization and molecular volumes that could be compared with experimental data. Generally, only small adjustments in the van der Waals parameters were required to obtain satisfactory results. In certain cases, crystal simulations were performed to determine heats of sublimation and unit cell parameters; these were also used in refining the van der Waals parameters. 13 Following any adjustment of the van der Waals parameters, the supermolecule energies and distances were recalculated and adjustments made in the charges where necessary. In the present force field the CHARMM TIP3P van der Waals parameters are used for both the solvent-solvent and solvent-solute interactions. This is in contrast to PARAM 19, where the TIP3P van der Waals parameters for solute-solvent interactions differ from the CHARMM TIP3P pure-solvent van der Waals parameters. 5

To simplify the procedure and to allow for the transfer of the parameters from the model compounds to larger units such as amino acids, several assumptions were made in the parameter optimization. Charges were selected to yield groups of unit charge (0, ±1, as appropriate). As well as aiding in the transfer of the charges to larger molecules, this simplifies the treatment of long-range electrostatic interactions via multipole expansions. 38 The groups optimally contained five atoms or fewer; in certain instances larger groups were required (e.g., imidazole) to obtain satisfactory fits. Adjustment of the charges upon linking the model compounds to form larger entities was performed by adding the charge of the deleted hydrogen atom to the heavy atom to which it was previously attached. This approach maintains the unit charge groups from the original model compounds.
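The charge adjustment on linking model compounds amounts to simple bookkeeping, sketched below. The dictionary representation, the function name, and the methyl-like charges in the example are hypothetical and serve only to show that the total group charge is unchanged when the deleted hydrogen's charge is folded into its heavy atom.

```python
def link_fragment(charges, deleted_hydrogen, heavy_atom):
    """Return a new charge table in which the deleted hydrogen's charge
    has been added to the heavy atom it was attached to."""
    merged = dict(charges)                      # leave the input untouched
    merged[heavy_atom] += merged.pop(deleted_hydrogen)
    return merged


if __name__ == "__main__":
    # Hypothetical neutral group: one carbon and three hydrogens.
    group = {"C": -0.27, "H1": 0.09, "H2": 0.09, "H3": 0.09}
    linked = link_fragment(group, "H3", "C")
    print("charges after linking:", linked)
    print("group charge before:", sum(group.values()))
    print("group charge after:", sum(linked.values()))
```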
Aliphatic intramolecular and intermolecular parameters were used without adjustment for all aliphatic moieties of amino acid side chains (e.g., for all Cβ carbons), as well as for the nucleic acid and lipid parameter sets. 13,14 Simulations of the aqueous solvation of small aliphatic molecules have shown that the charge distribution has a negligible effect on pure solvent heats of vaporization and crystal heats of sublimation (S. Fischer and M. Karplus, unpublished results). Results from condensed-phase simulations of peptides and proteins presented below indicate that the use of unit charge groups does not have an adverse impact on the accuracy of the final intramolecular parameters.

Given the above assumptions, a hierarchical approach can be used for the extension of the parameter set to other molecules. Each parameter is optimized in the "best" possible model compound; "best" is defined by the nature of the compound and the available data. Once a specific parameter has been optimized, it is not changed when it appears in the corresponding groups of other compounds. As chemically similar molecules are introduced, the available parameters are employed as much as possible. Often, the connectivities of the new molecules (e.g., new bond angles) are such that additional parameters can be added without destroying the consistency. This allows some degree of flexibility in the parametrization for new systems. If the fit obtained from parameters to the data pertaining to the new molecule is not of sufficient accuracy, new atom types can be introduced. These new atom types allow for the introduction of new internal parameters, so that the optimization can be
improved, while ensuring that the results for other molecules are not compromised. Efforts are made to keep the number of atom types to a minimum. However, as the ultimate goal of this parameter set is the quality of fit to a wide range of data, the total number of atom types has increased over earlier parametrizations, i.e., the current protein parameter set contains 55 atom types as compared to 29 in CHARMM 19 (see Appendix for the list of atom types). 5,15

III. Methods Used for Simulations of the Test Systems

Ab initio Hartree-Fock calculations were performed with various versions of Gaussian; 39 Gaussian 80, 88, 90, 92, and 94 were used. Optimizations of the molecular structures were performed by either the Berny or the Murtaugh-Sargent algorithm to the default tolerances. Interaction energies and geometries for optimized NMA and water in two different hydrogen-bonding positions (an NH as a donor and CO as an acceptor) and the NMA dimer were calculated on the basis of the fixed 6-31G(d) NMA geometry and the experimental geometry of water. 27 In the NMA-water interaction, the hydrogen-bond distance and a single angle were varied in the optimizations (see Figure 3 below); all other degrees of freedom were fixed. For the NMA dimer only the hydrogen-bond distance was optimized. The interaction energy was defined as the difference between the total energy of the supermolecular complex and the sum of the monomer energies; no basis set superposition error corrections were included.

Liquid NMA and NMA dissolved in water were simulated with the BOSS program 40 using Metropolis sampling in the NPT ensemble. The combination rules of the BOSS program were modified to the Lorentz-Berthelot rules used in the CHARMM force field (see above). The pure solvent system consisted of 128 NMA molecules in a cubic cell with an edge length of approximately 26 Å, subjected to an external pressure of 1 atm and a temperature of 100 °C.
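As a reminder of what the Lorentz-Berthelot rules mentioned above imply for a pair of unlike atoms, here is a minimal sketch. It assumes the usual statement of the rules (arithmetic mean for the size parameter, geometric mean for the well depth) and a Lennard-Jones form written in terms of the minimum-energy distance; the function names and the numerical parameters are hypothetical and chosen only for illustration.

```python
import math


def lorentz_berthelot(eps_i, rmin_i, eps_j, rmin_j):
    """Combine per-atom well depths (geometric mean) and
    minimum-energy distances (arithmetic mean) for an i-j pair."""
    eps_ij = math.sqrt(eps_i * eps_j)
    rmin_ij = 0.5 * (rmin_i + rmin_j)
    return eps_ij, rmin_ij


def lennard_jones(r, eps_ij, rmin_ij):
    """12-6 energy with well depth eps_ij located at r = rmin_ij."""
    x = (rmin_ij / r) ** 6
    return eps_ij * (x * x - 2.0 * x)


# Hypothetical atom parameters (kcal/mol, Angstrom), for illustration only.
eps_ij, rmin_ij = lorentz_berthelot(0.1, 3.0, 0.4, 4.0)
e_min = lennard_jones(rmin_ij, eps_ij, rmin_ij)
```

With these inputs the pair well depth is the geometric mean 0.2 and the minimum sits at the arithmetic mean 3.5, where the energy equals minus the well depth.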
Averages were obtained over 2 million configurations after an initial equilibration period of 1 million configurations. Both the heat of vaporization and the liquid density were calculated. For determining the heat of solution in water, the NMA molecule was placed in the center of a periodic box consisting of 267 water molecules and the water molecules that had interaction energies with NMA greater than kcal/mol were removed. Equilibration was performed over 10⁶ configurations followed by the evaluation of or configurations for averaging. The cutoffs used were 9.5 Å for solute-solvent interactions and 8.5 Å for solvent-solvent interactions with a 1.0 Å switch region for both van der Waals and electrostatic interactions; these cutoffs are default values in the BOSS program. The heat of solvation of NMA in water was calculated from the difference between the average energy of NMA in aqueous solution and the average energy of the same number of water molecules in the absence of NMA.

Several different types of systems were used for testing the parameters. Vacuum calculations were performed on crambin, BPTI, and carbonmonoxy myoglobin, and crystal calculations were performed on tripeptides, cyclic peptides, crambin, BPTI, and carbonmonoxy myoglobin. In all of the simulations, covalent bonds involving hydrogen atoms were constrained using the SHAKE algorithm. 41 For the condensed-phase calculations, truncation schemes for both the van der Waals and the electrostatic interactions were introduced. Hydrogens not present in the crystal structures were positioned on the basis of the default internal coordinates in the parameter set; water hydrogens and any other hydrogens that did not have unique positions (e.g., for many of the protein side chains) were placed using the CHARMM HBUILD facility. 42 Details of the setup, minimization, and simulation techniques are given below.

Crystal minimizations and simulations were performed with the CRYSTAL module in the CHARMM program.
43 Truncation of the nonbonded interactions was introduced by using an atom-based shifting function for the electrostatic interactions and an atom-based switching function for the van der Waals interactions with the IMAGE atom list cutoff set to be 1 Å larger than the nonbond list cutoff. 4 Comparisons of minimizations were made for the tripeptides and cyclic peptides with different cutoff distances; these ranged from 10 to 25 Å, as described below. In the peptide crystal minimizations, the lattice parameters and heavy atom positions were initially fixed for 50 adopted-basis Newton-Raphson (ABNR) 4 minimization steps to optimize the hydrogen atom positions that were either poorly determined in the X-ray structures or placed in standard positions. All atoms were then allowed to relax with the lattice parameters fixed for an additional 200 ABNR steps. This was followed by full minimization including the lattice parameters that was terminated when the rms gradient averaged over the minimization was 10⁻⁶ kcal/mol/Å or less or up to 1000 ABNR steps; the final rms gradients are reported for the various systems.

Constant volume, NVT, and constant pressure, NPT, simulations on the tripeptides and cyclic peptides were performed on the asymmetric unit using the temperature and pressure coupling scheme of Berendsen and co-workers 44 as implemented in CHARMM in conjunction with the leapfrog integrator. In the simulations, a time step of 1 fs was used with a temperature coupling constant of 0.1, a pressure coupling constant of 10, and an isothermal compressibility of atm⁻¹. These values were selected to yield stable temperature and pressure for the system during the simulations, while keeping the influence of the coupling to a minimum. Prior to the simulations, the crystals were submitted to a 50-step ABNR minimization of the hydrogen atoms followed by a 200-step ABNR minimization of all atoms with the lattice parameters fixed, as used in the minimization studies. Simulations were performed for 50 ps, and the coordinates were saved every 100 steps (0.1 ps) for analysis.
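The text above names the two truncation schemes but does not write them out. The sketch below assumes the functional forms commonly quoted for atom-based switching and shifting in CHARMM-style force fields; the function names, the inner and outer radii, and the cutoff are hypothetical, so this illustrates the idea rather than the exact settings used in the study.

```python
def vdw_switch(r, r_on, r_off):
    """Switching factor for van der Waals terms: 1 up to r_on,
    smoothly reduced to 0 at r_off (assumed standard form)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    num = (r_off**2 - r**2) ** 2 * (r_off**2 + 2.0 * r**2 - 3.0 * r_on**2)
    return num / (r_off**2 - r_on**2) ** 3


def elec_shift(r, r_cut):
    """Shifting factor for Coulomb terms: scales q_i*q_j/r so that the
    energy goes smoothly to zero at r_cut (assumed standard form)."""
    if r >= r_cut:
        return 0.0
    return (1.0 - (r / r_cut) ** 2) ** 2


# Hypothetical radii in Angstrom, for illustration only.
s_mid = vdw_switch(10.0, 8.0, 12.0)
f_mid = elec_shift(6.0, 12.0)
```

The switch leaves short-range interactions untouched and acts only between the two radii, whereas the shift modifies the electrostatic energy at all distances inside the cutoff.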
Both internal and external pressures were monitored during the NVT and NPT simulations. Internal pressures were calculated from the forces on the primary atoms, and the external pressures were obtained from the difference between the total forces due to both image and primary atoms and the primary atom forces. 1 The internal and external pressures are expected to be approximately equal; a negative pressure indicates that the volume of the system would contract in an NPT simulation. The rms fluctuations of the pressures were up to 1 order of magnitude larger than the average pressure, as expected for a system of this size.

Molecular dynamics simulations of the proteins were performed using the leapfrog algorithm as implemented in CHARMM for both the vacuum and crystal simulations. When specified, SHAKE was applied to all covalent bonds involving hydrogens. 41 Vacuum simulations were initiated with a 5-ps heating period in which the velocities were increased in increments of 6 K every 0.1 ps. This was followed by a 5-ps equilibration period in which a ±5 K window was applied to the temperature and checked every 0.1 ps; if the temperature was out of range, velocity scaling was performed. The production run, unless specified, was continued for 300 ps without velocity scaling.
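The heating and equilibration protocol described above can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: that the first heating step targets 6 K rather than 0 K, and that out-of-window velocities are rescaled by the square root of the temperature ratio (which follows from temperature being proportional to the mean squared velocity).

```python
def heating_schedule(total_ps=5.0, interval_ps=0.1, increment_k=6.0):
    """Target temperature after each heating interval."""
    n_steps = round(total_ps / interval_ps)
    return [increment_k * (k + 1) for k in range(n_steps)]


def rescale_factor(t_inst, t_target, window_k=5.0):
    """Velocity scale factor for the equilibration check:
    1.0 inside the +/- window, otherwise sqrt(T_target / T_inst)."""
    if abs(t_inst - t_target) <= window_k:
        return 1.0
    return (t_target / t_inst) ** 0.5


targets = heating_schedule()
inside = rescale_factor(303.0, 300.0)
outside = rescale_factor(310.0, 300.0)
```

Fifty increments of 6 K over 5 ps bring the target to 300 K, matching the final temperature implied by the protocol.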
Crystal simulations on crambin, BPTI, and carbonmonoxy myoglobin were performed using the following protocol for generating the simulation system in the NVT ensemble. The starting configuration for the crambin crystal studies corresponded to the 0.94 Å X-ray coordinates with an R-factor of 0.104, including two ethanols and 86 water molecules. 45 Five additional water molecules had been added to fill vacuum points of the crystal, using the CHARMM19 parameter set (John Kuriyan, personal communication); this yielded a simulation system consisting of 933 atoms.

BPTI calculations were initiated from the joint neutron and X-ray refined structure at resolutions of 1.8 and 1.0 Å and R-factors of and 0.200, respectively. 46 Vacuum points in the BPTI crystal were filled by water molecules. This was performed by generating the primary atoms, as defined by the asymmetric unit, and all crystal image atoms within 13 Å of the primary atoms. The BPTI asymmetric unit cell was overlaid by a TIP3P box of dimensions Å. All water molecules whose oxygen atoms were within 2.8 Å of any of the primary or image non-hydrogen atoms were deleted. The resulting system was used as the starting configuration for the simulation. This system included 892 BPTI atoms, 6 atoms of a dianionic phosphate, 63 crystal waters, and 29 added water molecules for a total of 1174 atoms. The starting configuration was subjected to 50 steepest-descent (SD) steps followed by 5 steps of Powell minimization with all heavy atoms fixed to their initial positions in the presence of the crystal images; SHAKE was applied to bonds involving hydrogens. This was followed by a 50 SD step minimization of all atoms followed by 5 Powell steps with SHAKE, again in the presence of the crystal images.

Carbonmonoxy myoglobin simulations started with the 1.5 Å crystal structure (R-factor = 0.171) obtained at 260 K. 47 The bound carbon monoxide and a sulfate ion present in the crystal were included in the simulation.
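The water-overlay step used above to fill vacuum points in the BPTI crystal amounts to a distance filter. A minimal sketch follows, with a hypothetical function name and made-up coordinates; it treats "within 2.8 Å" as a strict inequality, which is an assumption.

```python
import math


def prune_waters(water_oxygens, heavy_atoms, min_dist=2.8):
    """Return indices of overlaid waters to keep: those whose oxygen is at
    least `min_dist` Angstrom from every primary or image heavy atom."""
    kept = []
    for idx, oxygen in enumerate(water_oxygens):
        if all(math.dist(oxygen, atom) >= min_dist for atom in heavy_atoms):
            kept.append(idx)
    return kept


# Made-up coordinates (Angstrom), for illustration only.
water_oxygens = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
heavy_atoms = [(0.0, 0.0, 1.0), (10.0, 10.0, 10.0)]
kept = prune_waters(water_oxygens, heavy_atoms)
```

For crystals of realistic size a neighbor grid would replace the all-pairs loop, but the selection criterion is the same.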
As with BPTI, vacuum points in the crystal structure were filled with waters using a Å water box; the box size was chosen to cover one asymmetric unit following the methodology presented above for BPTI. This procedure added 208 water molecules in addition to the 137 water molecules identified in the X-ray study, yielding a total of 3574 atoms in the system. Following the addition of the waters the same minimization protocol as that applied to BPTI was used.

All crystal simulations were performed by gradually heating the system over a 5-ps period by increasing the temperature every 0.1 ps to final temperatures of 300, 285, and 260 K for crambin, BPTI, and carbonmonoxy myoglobin, respectively, in accord with the temperatures used for the structure determinations. This was followed by 5 ps of equilibration using a ±5 K window with testing every 0.1 ps. If the temperature was outside of the window, the velocities were scaled to bring the temperature back to 300 K. Production trajectories were performed for 100 ps without velocity scaling. The integration time step was 1 fs. Coordinates were saved every 0.1 ps for analysis.

Analysis of the protein simulations was similar to that used previously with emphasis on the aspects of the results that test both the intramolecular and intermolecular contributions to the potential function. The rms differences were calculated for the specified non-hydrogen atoms following a least-squares fit of the backbone (C, N, Cα, O) atoms except where the rms difference associated with Cα atoms is reported. In that case only the Cα atoms were used in the least-squares fit. The rms fluctuations were calculated following reorientation of all non-hydrogen atoms in each time frame to the starting coordinates to ensure that translational and rotational motions of the protein did not contribute to the calculated fluctuations. This is done because the simulations were not long enough to provide a full sampling of the overall motion that does occur in the crystal. Thus, the calculated values are lower limits for the overall atomic fluctuations.
48 Average values and fluctuations of internal coordinates were obtained by averaging over the individual time frames of the trajectories. Use of average differences, in addition to rms differences, exposes systematic trends introduced by the parameters. Time-averaged structures were obtained from the production portions of the simulations.

Nonbonded interaction distances were calculated on the basis of a previously described approach for hydration. 13 The interactions are analyzed in terms of heavy atom-to-heavy atom (donor-to-acceptor) distances within a cutoff distance to avoid unphysical contributions. In the present study, a cutoff distance of 3.5 Å was employed. This distance corresponds to the first minimum in the TIP3P water model O-to-O radial distribution function 20 and is assumed to represent the outer limit of the first hydration shell. The same cutoff was used in the nucleic acid parametrization paper 13 and in a study of nonbonded interactions in proteins based on a survey of the Brookhaven Protein Data Bank. 49 Applying this cutoff distance allows for the average interaction distance and hydration or occupancy number (see legend of Table 23 below) to be obtained from the experimental X-ray structures, even though only a limited number of such interactions may be present. 50 For consistency, the same approach was used for analysis of the X-ray data and dynamics simulations. Average distances and hydration or occupancy numbers were obtained over the individual time frames in the trajectory and normalized with respect to the number of time frames and the types of atoms included in the analysis.

IV. Results and Discussion

The parametrization of the protein potential energy function was based on sets of small model compounds that are appropriate to represent the main chain and the amino acid side chains. The main chain parametrization and testing are presented in the first part of this section (Section IV.a). Details of the side chain parametrization are given in separate papers that are in preparation. The entire set of protein parameters is listed in the Appendix.
The results obtained in testing the parameters on tripeptides, cyclic peptides, and proteins are presented in Section IV.b, IV.c, and IV.d, respectively. We have chosen the systems for their intrinsic interest, because extensive data were available for them, and/or because they have been used in testing other protein parameter sets (e.g., cyclic peptides, crambin). The analysis concentrates on comparisons with experimental results that test both the intramolecular and intermolecular portion of the potential function. Special attention is paid to intermolecular interactions involving water molecules. Crystal studies were performed on the noncyclical tripeptides Gly-Ala-Val·3H2O and Gly-Ala-Leu·3H2O, 51 which are in helical conformations, and Ala-Ala-Ala, which is a parallel pleated sheet model. 52 These peptides include water molecules and ionic functional groups. Cyclic peptide crystal minimizations and simulations were performed on Ala-Ala-Gly-Gly-Ala-Gly·H2O and Ala-Ala-Gly-Ala-Gly-Gly·2H2O, 53 Gly-Gly-D-Ala-D-Ala-Gly-Gly·4H2O, 54 (Gly-Pro-Gly)2, 55 Gly-Pro-Gly-D-Ala-Pro, 56 and (Cys-Gly-Pro-Phe)2·4H2O. 57 Protein test calculations were made for crambin, the bovine pancreatic trypsin inhibitor (BPTI), and carbonmonoxy myoglobin in vacuum and in a crystal environment.
Figure 1. Structures of (A) N-methylacetamide (NMA) and (B) the alanine dipeptide. Atom names represent the nomenclature used in the text.

IV.a. Protein Backbone. Accurate parametrization of the protein backbone is essential for the overall quality of the potential energy function of peptides and proteins. Two molecules were selected as the model compounds for the parametrization of the peptide backbone. The first is N-methylacetamide (NMA), which is the simplest peptide model; it contains a single peptide bond that is methylated on the carbonyl carbon and the amide nitrogen. This results in a system that is closer to the interior peptide bond of a protein than models such as formamide or acetamide that have been used in previous studies. Experimental data for NMA include structural and vibrational measurements as well as thermodynamic data for liquid NMA and for NMA in aqueous solution. Most of the optimization of the peptide group interaction parameters was based on NMA. The second model system is the alanine dipeptide, which contains two peptide linkages, again methylated. A detailed analysis of changes in structural and energetic properties associated with variations in the φ and ψ angles was made. Of particular interest are the C7eq, C7ax, and C5 conformations, which are the three minima for the dipeptide in vacuum that are typically used for the study of protein backbone energetics. Although little is known experimentally about the conformational properties of the alanine dipeptide, reasonably high-level ab initio calculations are now available for it and for some closely related model systems. These theoretical results were used in the parametrization, instead of energy estimates based on the φ, ψ distributions observed in proteins that have been used in some other studies. 58,59 Diagrams of NMA and the alanine dipeptide, including the atom-naming convention, are shown in Figure 1. Because of its special covalent interactions, specific parameters were developed for the peptide bond of proline residues.

IV.a.1.
Internal Parametrization. As mentioned in the Introduction, the internal parametrization of the peptide backbone is complicated by the important structural changes that occur in going from the gas-phase to the condensed-phase environment. The most important difference involves a significant shortening of the CN bond. This arises from the increased contribution of the resonance structure with a CN double bond when the C=O and N-H groups are involved in hydrogen-bonding interactions, as they generally are in proteins. The effect is most clearly demonstrated in the NMA crystal structure and in a comparison between theoretical calculations of NMA by itself and of NMA hydrogen bonded to water molecules. In fact, a theoretical prediction 60,61 that the standard NMA crystal structure 62 was incorrect has been confirmed recently by a new structure determination. 63 Such changes in electronic structure between the gas phase and the condensed phase are difficult to represent in empirical potentials without complicating the potential function by introducing polarization. Because the primary focus of the present parametrization is to develop a model for the peptide backbone for proteins and for condensed-phase simulations, in general, the optimization of the internal force field was done for NMA and the alanine dipeptide with their condensed-phase geometries. This assumes that in peptides and proteins the hydrogen-bonding propensities are generally satisfied either by internal hydrogen bonds or by hydrogen bonds to water. 49 Ab initio and experimental geometries for NMA and the alanine dipeptide were employed in the optimization, along with survey results on proteins. 64 Peptide backbone geometries in the Cambridge Crystal Data Base 22 were also used in the parameter development.

TABLE 1: Geometric Data on N-Methylacetamide a
Columns: CHARMM; experimental (gas c, crystal d, survey e); MP2/6-31G(d) b (gas; 3H2O; H2O, 2FM)
Trans
Bonds
C4-C5 (5) 1.515(3) 1.52(1)
C5-N7 (4) 1.325(3) 1.33(1)
N7-C9 (6) 1.454(3) 1.45(2)
C5=O6 (3) 1.246(2) 1.23(1)
N7-H8
Angles
C4-C5-N7 (15) 116.3(6) 116(2)
O6=C5-N7 (4) 121.7(6) 123(1)
C4-C5=O6 (6) 121(4)
C5-N7-C9 (8) 121.3(6) 122(1)
C5-N7-H8 (50)
Cis
Bonds
C4-C5
C5-N7
N7-C9
C5=O6
N7-H8
Angles
C4-C5-N7
O6=C5-N7
C4-C5=O6
C5-N7-C9
C5-N7-H8
a Distances in Å and angles in deg; values in parentheses represent the standard deviation error in the final digit(s). b From ref 60; 3H2O indicates two water molecules hydrogen bonding to the carbonyl oxygen and one water molecule hydrogen bonding to the amide proton; H2O, 2FM indicates one water molecule and one formamide hydrogen bonding to the carbonyl oxygen and one formamide hydrogen bonding to the amide proton; see original reference for the exact geometries. c Gas-phase electron diffraction data from ref 65. d Crystal values are from ref 63 for the 0.9 occupancy structure. e Survey of the Cambridge Crystal Data Bank 22 performed as part of the present study that involved 145 structures from which 133 peptide bonds were selected with R-factors less than

TABLE 2: Geometric Data on the Alanine Dipeptide a
Columns: C7eq (emp., ab initio); C7ax (emp., ab initio); C5 (emp., ab initio); survey
φ
ψ
ω
ω
Bonds
C4-C (7)
C5-N (7)
C5-O (5)
N7-C (8)
C9-C
C9-C (6)
C12-O (9)
C12-N (3)
N17-C (14)
N7-H
N17-H
Angles
C4-C5-N (2)
C5-N7-C (1)
N7-C9-C (2)
C9-C12-N (1)
C12-N17-C (1)
a Bond lengths in Å and angles in deg. Ab initio data from ref 70. Survey results from the Cambridge Crystal Data Base 22 performed as part of the present study; values in parentheses represent the standard deviation error in the final digit(s). The sample includes compounds containing dipeptides with terminal aliphatic carbons.

Table 1 presents the internal geometries of NMA from the empirical force field, experiment, 63,65 and ab initio calculations; 60,61,66 for the structural definitions, see Figure 1. The NMA ab initio calculations include fully optimized structures for the isolated molecule in the gas phase and for the molecule with hydrogen-bonded water and/or formamide molecules. These structures indicate the nature of the changes in geometry expected in going to the condensed phase. There is a decrease in the peptide bond length and an increase in the carbonyl C=O bond length in going from the gas phase to the condensed phase. Analysis of Table 1 shows that the ab initio calculations reproduce the experimentally observed trends. Comparison of the CHARMM structures shows satisfactory agreement for the C5-N7 and N7-C9 bonds and for the C4-C5-N7, O6=C5-N7, C4-C5=O6, C5-N7-C9, and C5-N7-H8 angles. Upon going from the trans to the cis conformer, CHARMM reproduces predicted changes in the ab initio MP2/6-31G(d) structures for the O6=C5-N7, C5-N7-C9, and C5-N7-H8 angles. The most obvious discrepancy occurs for the C4-C5 bond length; i.e., the empirical bond length is Å versus values between 1.51 and 1.52 Å for the ab initio and experimental data. The alanine dipeptide results (see below) show that this difference is resolved in the larger compound. A number of ab initio calculations indicate that the structure of NMA in vacuum deviates slightly from planarity. 67,68 There is a pyramidalization of the peptide nitrogen, leading to deviations in planarity of up to 10° for the O=C-N-H dihedral angle. Recent calculations show that the peptide bond is essentially planar when involved in hydrogen-bonding interactions. 60 Since the present parameter set is designed for condensed-phase simulations, the minimum energy geometry of NMA was parametrized to be planar. To treat peptide bond rotation, including pyramidalization of the peptide nitrogen, a modified force field is required.
69 Table 2 contains the geometric data for the alanine dipeptide in the C7eq, C7ax, and the C5 conformations as calculated with CHARMM and by ab initio methods at the HF/6-31G(p,d) level. 70 Results from a survey of the CCDB of dipeptide-containing molecules are also included. The atom names for the alanine dipeptide are shown in Figure 1b. Overall, the empirical bond lengths and angles are in satisfactory agreement with the ab initio and survey data. Of note are the C4-C5 and C9-C12 bond lengths; these correspond to the C4-C5 bond in NMA for which poor agreement was obtained. For the alanine dipeptide, the C4-C5 bond is still too short; however, the C9-C12 bond length is in good agreement with both the ab initio and survey data. The C4-C5 and C9-C12 bonds are treated with the same parameters; the difference between the two is due to the influence of nonbonded interactions on the optimized distances. This emphasizes the importance of the iterative approach used for intramolecular and intermolecular parameters in the present study. The remaining bonds and angles in Table 2 are in good agreement with the target data. This includes the C5-N7 and C12-N17 peptide bonds and the C5=O6 and C12=O13 carbonyl bonds. For the angles the agreement of the empirical and ab initio values is generally good for both the absolute values and the trends among the three minima (see Table 2). In some cases, including the C5-N7-C9 and N7-C9-C12 angles, the changes between the minima in the CHARMM structures are not as large as those predicted by the ab initio calculations. However, the changes in the CHARMM values are in the correct direction. The overall geometries of the three minima, as indicated by the φ and ψ dihedral angles, are in reasonable agreement with the ab initio values. The largest differences occur in the ψ values of the C7ax and C5 minima, where differences of and occur, respectively.
No effort was made to reconcile these differences because of the overall success of the parameters in reproducing experimental values of φ and ψ in a number of peptides and proteins (see Table 22 and Tables 4 and 7 of the Supporting Information). The differences between the CHARMM and ab initio results may be due to the limitations in the form of the empirical energy function in CHARMM. However, it is not clear that the ab initio values have converged to the correct results since the φ, ψ dihedrals are sensitive to the level of the ab initio calculation.

TABLE 3: Vibrational Data for N-Methylacetamide a
Columns: mode; experimental/ab initio b (frequency, assignment); CHARMM (frequency, assignment)
1 VLF 64 τCCH3(101) 2 VLF 89 τNCH3(1001) c ωN7H d 200 τC5-N7(107) τC5-N βCNC 271 βCNC(62) βCCN βCCN(25) τC5-N7 431 βCCN(50) ωN7H d βCCN 579 βC5=O(50) βC5=O νC5-C4(29) βC5=O 652 ωC5=O(67) νC5-C4 ωN7H(30) c βC5=O 776 νC5-N7(34) rCH3 νC5=O(20) νC5-N7 797 ωN7H(66) rCH3 rCH3(15) νC5-C rCH3 949 rCH3(36) νN7-C9 νN7-C9(34) νC5-C rCH3 996 rCH3(47) βC5=O νN7-C9(26) νN7-C rCH3(83) rCH rCH rCH3 (72) νN7-C rCH3(67) βC5=O ωC5=O(17) βN7H c rCH βN7H(44) νC5-C4(24)
δCH3s 1384 δCH3s(94)
δCH3s 1413 δCH3s(89)
δCH3s 1416 δCH3s(88)
δCH3s 1418 δCH3s(91)
δCH3s 1426 δCH3s(87) rCH3(15)
δCH3s 1481 δCH3s(50) βN7H(21)
βN7H 1587 δCH3s(39) βN7-C9 βN7H(20) νN7-C9(17)
νC5=O 1683 νC5=O(66)
νCH3s 2852 νCH3s(100)
νCH3s 2914 νCH3s(100)
νCH3s 2915 νCH3s(100)
νCH3s 2917 νCH3s(100)
νCH3s 2975 νCH3s(100)
νCH3s 2975 νCH3s(100)
νN7H 3326 νN7H(99)
a Frequencies in cm⁻¹. Potential energy distributions determined with the MOLVIB module in CHARMM. Only modes contributing greater than 12% are included. VLF indicates unobserved very low frequencies. ω indicates wagging modes, ν indicates stretching modes, τ indicates torsional rotations, r indicates rocking, δ indicates deformations, and β indicates bends. b Experimental data from refs 71a,b as reported and supplemented with ab initio data in ref 74. c Frequencies estimated from the cited ab initio calculations. d In ref 74 these modes are assigned as NH deformations. On the basis of more recent studies 75 we have assigned these as wagging modes.

Parametrization of the force constants for the peptide backbone was based on the vibrational spectra and the relative energies of different conformers of NMA and the alanine dipeptide. Vibrational data for NMA are obtained from gas-phase and Ar matrix IR 71a,b,72 and Raman 73 studies and ab initio results. 74,75 Solid and liquid-phase studies indicate that certain frequencies are shifted due to hydrogen bonding; most noticeable are the NH stretching 76,77 and in-plane and out-of-plane bending modes. 75b In the optimization, parameters associated with the methyl groups were transferred directly from the CHARMM aliphatic parameter set. Table 3 lists the CHARMM vibrational frequencies and the experimental gas-phase NMA frequencies reported by Sugawara et al. 74 Examination of Table 3 shows satisfactory agreement for modes 3-15, which are dominated by the internal parameters describing the peptide bond. Of note is the agreement for modes 7 and 9, which contain significant contributions from the N-H and C=O out-of-plane wagging modes.
This agreement was obtained by use of improper dihedral angle force constants for the peptide bond (see Appendix). Modes are associated with the methyl groups and are in good agreement with the experimental data, confirming the validity of the direct transfer of the aliphatic parameters to the present system. Experimental and ab initio studies 75,76,77 show modes associated with the N-H group to change upon going from the gas to condensed phase. Optimization of the force constants associated with the N-H stretching, bending, and wagging modes, therefore, emphasized the reproduction of condensed-phase vibrations rather than gas-phase data. This is based on the assumption that the N-H group always participates in hydrogen-bonding interactions and is consistent with the optimization of the bond and angle equilibrium parameters discussed above; i.e., an attempt is made to provide a parameter set that mirrors the condensed-phase environment. This approach leads to the N-H stretch mode being lower than the gas-phase experimental value, as shown in Table 3. Recent studies have shown aqueous hydrogen-bonded N-H bending modes to occur at 1313 and 1580 cm⁻¹, 75b values that are higher than those present in Table 3. The CHARMM values of 1267, 1481, and 1587 cm⁻¹ are in satisfactory agreement with the condensed-phase values. Similarly, gas-phase N-H wags occur at 171 and 391 cm⁻¹, while the aqueous-phase frequency is calculated to occur at 745 cm⁻¹. 75b The CHARMM values of 652 and 797 cm⁻¹ are in good agreement with the latter value. This approach is used to obtain better dynamic properties of the protein backbone for condensed-phase simulations within the limitation that harmonic bond stretching, angle bending, and improper terms are used.

The parametrization of NMA also accounts for the relative cis/trans energies and the barrier to rotation about the ω dihedral angle (see Figure 1A). Analysis of NMR line shapes has been used to determine an enthalpic barrier to rotation of 19.8 ± 1.8 kcal/mol and a free energy barrier of 21.3 ± 0.3 kcal/mol.
78 Ab initio calculations at the MP2/6-31G(d)//HF/6-31G(d) level predict that the cis conformer is 2.07 kcal/mol above the trans conformer. 66 These data were used for the optimization of the dihedral angle parameters associated with the peptide bond. Values of 21.0 kcal/mol for the energetic barrier to rotation and 1.74 kcal/mol for the cis-trans energy difference were obtained with the present parameter set. This required inclusion of a 1-fold and a 2-fold term for the C4-C5-N7-C9 dihedral angle (see Appendix).

TABLE 4: Ab Initio Results on the Alanine Dipeptide and the Alanine Dipeptide Analogue a
level                                  C7eq               C7ax                C5                    helical
Alanine Dipeptide
3-21G d                                0.0(-85.8, 69.0)   2.81(74.4, -58.2)   1.13(-193.0, 190.6)
4-21G b                                0.0(-84.6, 73.0)   2.6(74.6, -62.0)    1.4(-165.7, 167.3)
DZP c                                  0.0(-85.9, 79.1)   2.99(75.8, -58.9)   0.50(-156.0, 161.0)   3.22(65.9, 33.5)
6-31G f                                0.0(-86.4, 72.7)   2.55(74.1, -58.6)   0.48(-159.8, 160.5)
6-31G(d,p) h                           0.0(-85.8, 79.0)   2.82(76.0, -55.4)   0.40(-157.2, 159.8)   4.35(-60.7, -40.7)
MP2/TZVP//HF/6-31G(d,p) h
LMP2/cc-pVTZ(-f)//MP2/6-31G(d) g       0.0(-83.1, 77.8)   2.48(74.4, -64.2)   1.11(-158.4, 161.3)
MP2/cc-pVTZ(-f)//MP2/6-31G(d) g
MP2/TZP//MP2/6-31G(d) g
MP4/cc-pVTZ(-f)//MP2/6-31G(d) g
MP4-BSSE/cc-pVTZ(-f)//MP2/6-31G(d) g
Alanine Dipeptide Analogue
3-21G d                                0.0(-84.5, 67.3)   2.53(74.1, -57.3)   1.26(-191.6, 189.4)
4-21G e                                0.0(-84.7, 67.3)                       1.39(-166.6, 169.9)
4-31G e                                0.0(-85.5, 69.4)                       0.45(-161.5, 164.5)
6-31G e                                0.0(-85.2, 69.8)                       0.33(-160.9, 164.0)
6-31G(d,p) e                           0.0(-85.3, 76.0)                       0.30(-157.9, 162.6)
6-311G(d,p) e                          0.0(-85.5, 78.3)                       0.25(-156.8, 162.2)
MP2/6-311G(d,p) e                      0.0(-81.5, 82.5)                       1.66(-159.8, 162.1)
6-31+G(d) i                            0.0(-85.8, 78.1)   2.56(75.1, -54.2)   0.19(-155.6, 160.0)
HF/6-31+G(d,p)//HF/6-31+G(d) i
MP2/6-31+G(d,p)//HF/6-31+G(d) i
MP2/6-31+G(d,p) f                      0.0(-83.0, 79.2)   2.20(74.3, -60.2)   1.27(-156.2, 160.5)
a Energies in kcal/mol. Values in parentheses represent the φ and ψ angles. Alanine dipeptide analogue is the alanine dipeptide with the two terminal methyl groups omitted. b Reference 25. c Reference 81. d Reference 83. e Reference 82. f Guo, H.; Karplus, M. Unpublished results. g Reference 85. h Reference 70. i Reference 84.

The internal parameter optimization for the alanine dipeptide was based on the transfer of the parameters from NMA. As the majority of internal parameters were determined in this way, only a few terms remained to be adjusted. These include the dihedral terms associated with the φ and ψ dihedral angles and the angle term associated with the central angle N7-C9-C12; this angle is often referred to as τ and was one of the few angular degrees of freedom that were adjusted in early crystal structure determinations. Adjustments of internal parameters associated with the peptide backbone have previously been based on experimental geometric data for the variation of the angle τ with φ and ψ, and on relative energies of alanine dipeptide conformers from ab initio studies 10,79,80 or on the free energies obtained from the φ, ψ distributions observed in protein structures. 58,59,80 Use of energetics from survey data is appropriate when the goal of the force field is to obtain condensed-phase free energy information based on gas-phase calculations alone. However, for force fields to be used for simulations with explicit solvent models, parameters should be based primarily on potential energy rather than free energy data. 59 In the present work, optimization was initially based on ab initio results for the relative energies of certain conformations of the alanine dipeptide that correspond to different φ and ψ values and on the change in τ as a function of conformation. Application of the above parameters for minimization and molecular dynamics simulations on crambin, BPTI, and MbCO gave reasonable results (see Section IV.d).
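To see why a 1-fold plus a 2-fold term is a natural choice for the C4-C5-N7-C9 (ω) dihedral discussed earlier in this section, consider the following sketch. It assumes the usual cosine form K[1 + cos(nω - δ)] for each dihedral term; the force constants and phases are hypothetical and are not the Appendix values. Only the dihedral contribution is evaluated, whereas the 21.0 and 1.74 kcal/mol figures quoted in the text come from the full energy function.

```python
import math


def dihedral_energy(omega_deg, terms):
    """Sum of cosine dihedral terms K * (1 + cos(n*omega - delta)).
    `terms` is a list of (K, n, delta_deg) tuples."""
    omega = math.radians(omega_deg)
    return sum(
        k * (1.0 + math.cos(n * omega - math.radians(delta)))
        for k, n, delta in terms
    )


# Hypothetical parameters (kcal/mol, multiplicity, phase in deg).
terms = [(1.0, 1, 0.0), (2.5, 2, 180.0)]

e_trans = dihedral_energy(180.0, terms)  # trans peptide bond
e_cis = dihedral_energy(0.0, terms)      # cis peptide bond
e_perp = dihedral_energy(90.0, terms)    # perpendicular geometry, near the barrier
```

In this form the 2-fold term vanishes at both planar geometries and sets the barrier height, while the 1-fold term alone distinguishes cis from trans, so the two target quantities can be tuned almost independently.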
However, the MbCO calculated structures showed significant deviations from experiment concerning the backbone φ, ψ angles. Since these deviations, which concerned mainly the α-helical region, were systematic, an iterative procedure based on the average differences between calculated and experimental MbCO backbone geometries and the conformational energetics of an alanine dipeptide analogue in which the terminal methyl groups are omitted was undertaken to obtain the final parameter set. The resulting parameter set had satisfactory behavior in the α-helical regions of MbCO, BPTI, and crambin and in the β-sheet regions of crambin and BPTI; there are 7 and 15 amino acids in β-sheets in crambin and BPTI, respectively. From the crystal simulations, the average differences of the φ, ψ values in the β-sheet regions with the final parameter set are -7.1°, 6.0° and -4.5°, 0.8° for crambin and BPTI, respectively, while the rms deviations are 11.7°, 7.8° and 9.5°, 6.6° for crambin and BPTI, respectively. The average deviations suggest that there remains a small, possibly systematic, deviation in the β-sheet region, although the sample is rather small. However, it should be noted that the rms differences for the β-sheet dihedral angles are significantly smaller than those occurring for all residues in the two proteins (see Table 22). The adjustment based on MbCO provides a way of correcting the ab initio dipeptide energy map for energetic effects due to the protein environment. Since the geometric parameters were determined for NMA in a solution environment, the ab initio map is being fitted to alanine dipeptide conformers whose internal geometries (particularly the CN and CO bond lengths) differ significantly from the ab initio values (see Table 2).

Table 4 gives the relative energies as a function of φ and ψ for the alanine dipeptide and an alanine dipeptide analogue from a variety of published ab initio studies. 25,70,81-85 In the analogue the terminal CH3 groups are replaced by hydrogen. The study of Head-Gordon et al.
84 included energies for 15 stationary points for the alanine dipeptide analogue at the HF/6-31+G(d) level, and the study by Gould et al. 70 contained energies for 7 alanine dipeptide structures at the HF/6-31G(p,d) and MP2/TZVP//HF/6-31G(p,d) levels. As the Gould et al. study was published following completion of the present work, the energies of Head-Gordon et al. 84 were used in the optimization of the φ and ψ dihedral parameters. This was performed by adjusting the dihedral parameters, optimizing the full alanine dipeptide with φ and ψ fixed at the values reported by Head-Gordon et al., and determining the sum of the squares of the difference between the ab initio and empirical relative energies. In the calculations the empirical alanine dipeptide energies were compared directly with the ab initio results for the alanine dipeptide analogue; i.e., no CHARMM calculations were made for the alanine dipeptide analogue because it contains an aldehyde functional group (see above) not included in the protein parametrization. Tables 5 and 6 show the results of the optimization procedure used for the dipeptide parameters to give satisfactory results for the relative energies of the various conformers of the alanine dipeptide and simultaneously to remove the systematic deviation
in the φ, ψ values in MbCO. Only the dihedral angle parameters for φ and ψ were changed in the different sets; the values are given in the Appendix.

TABLE 5: Relative Energy of the Empirical αR Conformer and the Sum of the Squares Difference between ab Initio and CHARMM22 Energies for the Alanine Dipeptide or the Alanine Dipeptide Analogue for Six Parameter Sets a

parameters          αR energy b     sum of the squares:
                                    HF/6-31+G(d) (dipeptide analog) c     MP2/TZVP//HF/6-31G(p,d) d
set 1, no cutoff
set 2, no cutoff
set 3, no cutoff
set 4, no cutoff
set 5, no cutoff
set 6, no cutoff

a Energies in kcal/mol and dihedrals in deg. Sum of the squares of the relative energy differences between the ab initio energies and the empirical energies not including the C7eq conformer, where the energy is 0.0 for all levels of theory. The empirical energies were obtained following full-geometry optimization with the φ and ψ dihedrals constrained at the ab initio values as reported in the cited studies. b αR (right-handed helix) energy is the empirical energy relative to the C7eq conformer following minimization with φ and ψ constrained to -65° and -41°, respectively. c The 15 conformers used for the sum of the squares determination are those listed in Table 2 of ref 84. d The six conformers used for the sum of the squares determination are those listed in Table 2 of ref 70.

TABLE 6: Average Difference in the φ and ψ Values between the Myoglobin-CO Minimized and Crystal Structures a

                    minimization          MD simulation
parameters          φ         ψ           φ         ψ
set 1, no cutoff    -3.8 ±    ±           ±         ± 2.5
set 1,              ±         ±           ±         ± 2.5
set 2,              ±         ±           ±         ± 2.4
set 3,              ±         ±           ±         ± 2.0
set 4,              ±         ±           ±         ± 2.2
set 5,              ±         ±           ±         ± 6.0
set 6,              ±         ±           ±         ± 2.4

a Dihedrals in deg. Minimizations involved 100 steepest descent steps followed by 500 ABNR steps, and MD simulations involved 20-ps vacuum simulations with the analysis performed using the [...] ps time-averaged structure.
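The sum-of-squares measure defined in footnote a of Table 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as implemented in CHARMM: the function name is hypothetical, and the energies are placeholders chosen for the example (the Table 5 entries are not reproduced here). The C7eq conformer is skipped because its relative energy is 0.0 at every level of theory.

```python
def sum_of_squares(ab_initio, empirical, reference="C7eq"):
    """Sum of squared differences between empirical and ab initio
    relative energies (kcal/mol), omitting the reference conformer."""
    total = 0.0
    for name, e_ab_initio in ab_initio.items():
        if name == reference:
            continue  # relative energy is zero by construction
        diff = empirical[name] - e_ab_initio
        total += diff * diff
    return total


# Placeholder relative energies in kcal/mol (illustrative only).
ab_initio = {"C7eq": 0.0, "C7ax": 2.5, "C5": 0.5}
empirical = {"C7eq": 0.0, "C7ax": 2.0, "C5": 1.0}

score = sum_of_squares(ab_initio, empirical)
print(f"sum of squares = {score:.2f} (kcal/mol)^2")
```

A smaller score means the empirical conformer energies follow the ab initio ones more closely; it says nothing about how the same parameters behave in a protein, which is why the text pairs this measure with the MbCO deviations of Table 6.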
Table 5 lists the sum of the square differences of the energies relative to C7eq of the CHARMM calculations and the alanine dipeptide analogue for the different parameter sets. Also included in Table 5 are the sum of the square differences between empirical and ab initio relative energies for seven conformers of the alanine dipeptide, 70 which were published after the parameters had been determined (see above). The comparison provides a posteriori verification of the use of the alanine dipeptide analogue ab initio results as the basis for the parameter optimization. The analogue and full sum of the squares values have parallel behavior, indicating that the alanine dipeptide analogue results were appropriate as a basis for the optimization procedure. In fact, better agreement is obtained with the higher-level full dipeptide calculations than with those for the analogue.

The initial set of dihedral parameters associated with the peptide backbone is identified as set 1. These parameters were used only to perform a molecular dynamics simulation for carbonmonoxy myoglobin. 47 They give the best agreement with the ab initio values, to which they were fitted (see Table 5). However, as indicated in Table 6, the structures from the vacuum calculations in both the presence and absence of an atom truncation scheme yielded significant deviations for the average values of φ and ψ for carbonmonoxy myoglobin. In particular, the results show that the φ angles were consistently too small and the ψ angles were consistently too large in comparison with the experimental values. Use of average deviations, rather than rms values, shows the direction of the deviations. Such systematic differences indicate that the parameters are introducing a systematic bias into the backbone conformation. This bias has been observed in other force fields 10 and is likely to be present in general. Such a decrease in φ and increase in ψ means that residues in a helical conformation, which dominates the MbCO structure, are being shifted to a more extended conformation (i.e., in the direction of the C7eq dipeptide minimum).
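The distinction drawn above between average (signed) deviations and rms values can be made concrete with a short sketch. The Python fragment below is illustrative only: the helper names and the dihedral values are hypothetical, not MbCO data. Two sets of calculated angles are built so that they have the same rms deviation from the reference, but only one of them is shifted systematically in one direction.

```python
import math


def wrap(delta):
    """Map an angle difference in degrees onto [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def mean_and_rms(calc, expt):
    """Signed mean and rms of the wrapped differences calc - expt."""
    diffs = [wrap(c - e) for c, e in zip(calc, expt)]
    mean = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean, rms


# Hypothetical phi angles (deg) for four helical residues.
expt_phi = [-63.0, -62.0, -65.0, -60.0]
calc_biased = [-70.0, -68.0, -72.0, -66.0]   # all shifted the same way
calc_scatter = [-56.0, -68.0, -58.0, -66.0]  # errors of both signs

mean_biased, rms_biased = mean_and_rms(calc_biased, expt_phi)
mean_scatter, rms_scatter = mean_and_rms(calc_scatter, expt_phi)
print(f"biased:  mean {mean_biased:+.2f}, rms {rms_biased:.2f}")
print(f"scatter: mean {mean_scatter:+.2f}, rms {rms_scatter:.2f}")
```

The rms values are identical in the two cases, while the signed mean separates a systematic shift from random scatter; this is the reason average deviations are reported in Table 6.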
Accordingly, the dihedral parameters associated with φ and ψ were adjusted to lower the energy of the αR (-61°, -41°) conformer relative to C7eq (see Table 5). Care was taken during the adjustments of the dihedral angle parameters to ensure that the αR conformer did not become a local minimum, since it is not a minimum in the ab initio calculations. As can be seen by comparing Tables 4 and 5, the αR energy of set 1 is significantly too high, while the values for sets 4-6 are in a reasonable range. As the αR conformer energy is lowered by altering the parameters, poorer agreement between the relative energies for the empirical and ab initio values for other alanine dipeptide conformers is obtained (Table 5).

The vacuum minimizations and molecular dynamics simulations for MbCO were repeated using the five parameter sets (sets 2-6). The agreement between the calculated and experimental structures shows a significant improvement. Set 4 yielded the best agreement between the φ, ψ values from the molecular dynamics simulations and those of carbonmonoxy myoglobin. It was selected as the final parameter set and used in subsequent tests.

The influence of the six parameter sets used in determining φ and ψ on the adiabatic potential energy surface for the alanine dipeptide is shown in Figure 2; six alanine dipeptide maps corresponding to the six φ, ψ parameter values are presented. As the energy of the αR conformer decreases, the path between the C7eq and αR conformers becomes more well-defined and a relatively narrow channel forms between the two. Analysis of the sum of the squares of the energy differences with respect to ab initio data in Table 5 shows that these values increase, indicating a lower-quality surface with respect to the ab initio data. While the discrepancy between the relative energies of the empirical parameter sets and the ab initio data in Table 5 may be attributed in part to limitations in the level of theory in the ab initio calculations, the results reinforce the view that gas-phase data should be used with care in parametrization of force fields designed for use in the condensed phase.
This is consistent with changes observed in the internal geometries of NMA and the alanine dipeptide upon going from the gas to condensed phases (Tables 1 and 2). The ability of force-field calculations to treat both the gas and condensed phases accurately may require alteration of the potential function by the addition of electronic polarization, as already mentioned. During adjustment of the parameters, emphasis was placed on changing the relative energies of C7eq and α-helical structures; however, in no instance was the α-helical structure a true minimum. Analysis of the maps in Figure 2 shows the presence of a channel leading from the C7eq region to the α-helical region. Upon going from parameter set 1 to set 5, the channel becomes narrower and the C7eq to α-helical energy difference becomes smaller, in agreement with the ab initio values. Verification of the validity of the surface beyond that outlined above is difficult, especially considering that exact reproduction of gas-phase ab initio data may not yield the best
condensed-phase properties with the present form of the potential energy function. Comparison with the HF/3-21G alanine dipeptide analogue map of Head-Gordon et al. 83 is limited owing to the numerous minima on that map; certain ones appear to be associated with the use of the truncated molecule since they do not occur in the full alanine dipeptide. To further validate the φ, ψ maps, the empirical and ab initio energies for the seven conformations studied by Gould et al. 70 are presented in Table 7. For the C7ax and C5 conformers the empirical data fall in the range of the ab initio values. Concerning the shape of the region in the upper left quadrant of the surface, the empirical αR structure is slightly higher in energy while the β conformer is slightly lower than the ab initio values. The differences, however, are within 1 kcal/mol, suggesting this region of the map to be in reasonable agreement with the ab initio predicted results. Interestingly, decreasing the αR empirical energy and increasing the β-conformer energy would yield a surface more similar to that of parameter set 5 in Figure 2. The β2 conformer is 2 kcal/mol or more above the ab initio value, suggesting that the barrier in that region may be too large. This effect occurs to a greater extent for the empirical αL conformer, which is overestimated by approximately 6 kcal/mol. This further suggests that the energies of the barrier regions may be overestimated by the present force field. Additional ab initio calculations on different conformers of the full alanine dipeptide will allow further verification of the maps. Values of the angle τ are also included in Table 7 for the conformers that were studied.

Figure 2. Adiabatic alanine dipeptide potential energy surfaces for parameter sets 1-6 (see Table 6). The left-hand column of surfaces going top to bottom corresponds to sets 1, 2, and 3, and the right-hand column of surfaces going top to bottom corresponds to sets 4, 5, and 6. Contours represent 1 kcal/mol.
Overall, the empirical force field mimics the ab initio results reasonably well. To obtain this level of agreement for the change in τ with conformation, as well as for the adjustment of the relative energies of the dipeptide conformers, it was necessary to introduce altered van der Waals (1, 4) interactions for the peptide bond nitrogen and oxygen atoms (see Appendix); i.e., the Rmin1,4 values on the nitrogen and oxygen atoms were set to 1.55 and 1.40 Å, respectively, allowing a closer approach of those atoms. These terms were essential, in particular, for obtaining satisfactory values of τ for the C5 conformer. In that conformer, the nitrogen of the first peptide bond is within van der Waals contact of the oxygen of the second peptide bond. Use of the van der Waals parameters obtained from the optimization of the interaction parameters (see below) leads to an overestimation of the van der Waals repulsion and an opening of τ by 3.7° in the C5 conformer (not shown). Since the nitrogen and oxygen are in a 1, 4 configuration, the introduction of the van der Waals (1, 4) terms allowed this problem to be overcome. In many other force fields, such 1,4 van der Waals scaling is used to varying degrees; e.g., in the CHARMM polar hydrogen parameter set (1, 4) scaling is used only for the carbon atoms while the AMBER force fields use a (1, 4) scaling factor of 1/2 for all the van der Waals interactions. 7,79,86 In the present CHARMM all-atom parameter set, scaling is used only for the peptide oxygen and nitrogen interaction and for the aliphatic carbons; (1, 4) van der Waals scaling in the latter case is required for the proper treatment of cyclic structures, such as cyclohexane. The results obtained here for the alanine dipeptide τ values are comparable to those of Momany et al., 25 who used similar (1, 4) scaling of the peptide N and C atoms.

Analysis of the vibrational spectra of the three alanine dipeptide minima was performed.
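The effect of a reduced (1, 4) radius on a close N···O contact can be illustrated with the Lennard-Jones term alone. The sketch below assumes the usual CHARMM form, E = ε[(Rmin/r)^12 - 2(Rmin/r)^6], with Rmin taken as the sum of the two per-atom radii and ε as the geometric mean of the per-atom well depths, and it assumes that the 1.55 and 1.40 Å quoted above are per-atom radii. The "standard" radii, the well depths, and the contact distance are hypothetical placeholders, not values taken from this work.

```python
import math


def lj_energy(r, eps, rmin):
    """Lennard-Jones energy in the Rmin form (kcal/mol, distances in Å)."""
    x = rmin / r
    return eps * (x ** 12 - 2.0 * x ** 6)


# Per-atom 1,4 radii quoted in the text (Å), assumed to be Rmin/2 values.
RMIN14_N, RMIN14_O = 1.55, 1.40

# Hypothetical placeholders for the regular radii and well depths.
RMIN_N, RMIN_O = 1.85, 1.70
EPS_N, EPS_O = 0.20, 0.12

eps_no = math.sqrt(EPS_N * EPS_O)      # geometric-mean well depth
rmin_std = RMIN_N + RMIN_O             # regular N...O minimum distance
rmin_14 = RMIN14_N + RMIN14_O          # 1,4 N...O minimum distance

r_contact = 2.8                        # hypothetical N...O separation (Å)
e_std = lj_energy(r_contact, eps_no, rmin_std)
e_14 = lj_energy(r_contact, eps_no, rmin_14)
print(f"regular radii: {e_std:+.3f} kcal/mol")
print(f"1,4 radii:     {e_14:+.3f} kcal/mol")
```

With these placeholder numbers the regular radii give a repulsive contact at 2.8 Å, while the smaller 1,4 radii place the same distance near the bottom of the well, which is the qualitative mechanism by which the repulsion that opens τ in the C5 conformer is relieved.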
Comparisons with recent ab initio calculations 87 and experimental solution studies based on vibrational Raman optical activity (VRAO) 87 provide another test of the force-field parameters. Table 8 shows the vibrational spectra obtained with the CHARMM parameters for the C7eq, C7ax, and C5 conformers, including the potential energy distributions. For the C7eq and C5 structures the HF/6-31G(d) vibrational frequencies below 1800 cm⁻¹ are also presented. 87 Only the empirical assignments are included; the ab initio data were assigned on the basis of the published potential energy distributions. 87 Overall comparison of the empirical and ab initio data, excluding modes 22 and 21 for the C7eq and C5 structures, respectively (see below), shows average differences of 11 and 19 cm⁻¹ and rms differences of 33 and 56 cm⁻¹ for the C7eq and C5 structures, respectively. Both average differences are positive, indicating that the empirical values are consistently larger than the ab initio data. This may be partially due to the use of a scale factor of 0.88 for the ab initio results; 87 other studies suggest a value closer to [...]. Use of the latter value yields average differences of -9 and -1 cm⁻¹. In general, the empirical and ab initio data are in satisfactory agreement for both the frequencies and assignments. The largest differences occur for modes 22 and 21 for the C7eq and C5 structures, respectively, which are both associated with wagging of the N-H protons. The empirical force field predicts these wags to have values significantly higher than the ab initio calculations; however, other N-H wags as well as C=O wags in the region of [...] cm⁻¹ are in reasonable agreement, consistent with the N-H wag frequency calculated for NMA (see Table 3, mode 9). For the low frequencies, represented as modes 1-7, the agreement is generally good. Mode 2,
representing the methyl torsional rotations, has the largest disagreement, with the empirical values being approximately 40 cm⁻¹ higher than the ab initio values. However, the empirical frequency of mode 4 in the C5 conformer, which contains a significant contribution from the methyl torsion, is lower than the ab initio values, indicating the requirement for compromise in the optimization with the present energy function. The frequencies for the φ and ψ torsions are well reproduced by the empirical force field, including the decrease in the φ frequency upon going from the C7eq to the C5 conformer. Differences between the conformers occur also for the modes associated with τ(dn-ct-c), as expected due to the large changes in τ with conformation. These results suggest that certain modes could be used to characterize the different conformers in experimental studies.

Comparison with the experimental data is limited because only a few frequencies were assigned. 87 Furthermore, the assignments rely on the ab initio data included in that study and shown in Table 8. The C=O stretch is suggested to occur in the experimental regime at 1654 cm⁻¹. Empirical modes 47 and 48 for both the C7eq and C5 conformers range from 1677 to 1684 cm⁻¹, slightly higher, but still in good agreement with experiment.

TABLE 7: Energies, Conformations, and Dipole Moments of Fixed Conformations of the Alanine Dipeptide a,b

                         energy                                τ                     dipole
(φ,ψ)                    emp.   6-31G(p,d)   MP2/TZVP c        emp.   6-31G(p,d)     emp.   6-31G(p,d)
C7eq (-85.8, 79.0)
C7ax (76.0, -55.4)
C5 (-157.2, 159.8)
αR (-60.7, -40.7)
αL (67.0, 30.2)
β (-57.6, 134.4)
β2 (-130.9, 22.3)

a Energies in kcal/mol, angles in deg, and dipole moments in D. b The conformations selected are those in Table 2 of ref 70. Empirical energies were determined following full optimization with the φ and ψ dihedral angles constrained to the values in Table 2 of ref 70; the values are given in parentheses in column one. c The MP2/TZVP energy for the HF/6-31G(p,d) optimized geometry. 70
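The dependence of the average empirical minus ab initio difference on the scale factor applied to the ab initio frequencies follows from simple arithmetic, sketched below in Python. The frequencies are hypothetical placeholders rather than Table 8 entries, and the alternative scale factor is an arbitrary illustrative value, since the one cited in the text is not reproduced here; only the 0.88 factor comes from the text.

```python
import math


def rescale(freqs, old_scale, new_scale):
    """Convert frequencies scaled by old_scale to the new scale factor."""
    return [f * new_scale / old_scale for f in freqs]


def mean_and_rms_diff(empirical, reference):
    """Signed mean and rms of empirical - reference (cm^-1)."""
    diffs = [e - r for e, r in zip(empirical, reference)]
    n = len(diffs)
    return sum(diffs) / n, math.sqrt(sum(d * d for d in diffs) / n)


# Hypothetical placeholder frequencies in cm^-1.
empirical = [1000.0, 1500.0, 1700.0]
ab_initio_088 = [968.0, 1452.0, 1672.0]   # already scaled by 0.88

ALT_SCALE = 0.92  # arbitrary illustrative factor, not from the text
ab_initio_alt = rescale(ab_initio_088, 0.88, ALT_SCALE)

mean_088, rms_088 = mean_and_rms_diff(empirical, ab_initio_088)
mean_alt, rms_alt = mean_and_rms_diff(empirical, ab_initio_alt)
print(f"scale 0.88: mean {mean_088:+.1f}, rms {rms_088:.1f} cm^-1")
print(f"scale {ALT_SCALE}: mean {mean_alt:+.1f}, rms {rms_alt:.1f} cm^-1")
```

Because a larger scale factor raises every ab initio frequency in proportion, it lowers the signed mean difference, so a positive average obtained with 0.88 can change sign under a different choice of factor.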
The experimental studies indicate that the C=O deformations occur in the range [...] cm⁻¹. Empirical values occur in this region and up to approximately 640 cm⁻¹. The C-N stretch of the peptide bond with contributions from deformations of the N-H and Cα-H groups occurs in the region of 1298 cm⁻¹ from experiment. A corresponding vibration occurs at 1184 cm⁻¹ in the C7eq empirical spectrum. Experimental modes at 1370, 1445, and 1503 cm⁻¹ are associated with in-plane bending of the N-H moieties. Corresponding frequencies occur at 1265, 1574, and 1598 cm⁻¹ in the C7eq and 1218, 1273, 1572, and 1609 cm⁻¹ in the C5 empirical spectra; i.e., the empirical values bracket the experimental data. The C-CT stretch is suggested to occur at 963 cm⁻¹ in the experimental spectra. Empirical vibrations with significant C-CT contributions occur at 569, 662, 819, and 1265 cm⁻¹ for the C7eq and 640, 766, and 1273 cm⁻¹ for the C5 conformers, again bracketing the experimental value. Overall, the empirical force field produces vibrational spectra for the alanine dipeptide that are in satisfactory agreement with both ab initio and experimental data. Additional experimental assignments would allow more detailed comparisons.

A comparison of the C7eq, C7ax, and C5 minima from several empirical force fields is presented in Table 9. Comparison can be made also with a recent review of various aspects of the properties of the alanine dipeptide. 88 On the basis of the ab initio results presented in Table 4, the C7ax conformer is from 2.0 to 2.8 kcal/mol above the C7eq conformer, while the C5 conformer ranges from 0.4 to 1.5 kcal/mol above the C7eq conformer. The wide variation found in the ab initio results makes clear that the required level of ab initio theory has not yet been reached for these molecules. The present CHARMM parameters yield energies of 2.05 and 0.92 kcal/mol for C7ax and C5, respectively, relative to C7eq. While the C5 energy falls in the middle of the range of ab initio values, the C7ax lies at the lower end.
Results on the alanine dipeptide and the alanine dipeptide analogue indicate that the inclusion of electron correlation leads to a lowering of the C7ax energy and an elevation of the C5 relative to C7eq. 88 The CHARMM values, thus, are consistent with the ab initio data when electron correlation is taken into account. Of the empirical parameter sets listed in Table 9, the AMBER/OPLS and MSI CHARMm sets are in reasonable agreement with ab initio data; MM3 yields satisfactory agreement for the C5 conformer, and no value of the C7ax is available. In AMBER (all atom) the C7ax and C5 are similar, with the C7ax being underestimated and the C5 overestimated. 79 The opposite trend occurs with ECEPP/2, which was parametrized to reproduce protein distributions. Also included in Table 9 are the values of φ and ψ for the three minimum conformers. Comparison of the ab initio and empirical values shows differences of 10° or more. The magnitude of these differences may be of relatively minor importance considering the rather flat character of the energy surfaces in the vicinity of the minimum-energy conformations (see Figure 2). Also, the addition of electron correlation via MP2 theory leads to a significant shift in the C7ax minimum conformation, 85 again suggesting that convergence has not been achieved in the ab initio calculations.

In considering the present parametrization, it is useful to refer also to several studies that have been published recently on the alanine dipeptide and other models of the protein backbone. A study by Dudek and Ponder 89 explored the role of electrostatics on the energetics of the alanine dipeptide in a number of molecular mechanics models. Ab initio relative energies were determined for a series of structures of the alanine dipeptide in which only the φ and ψ values were changed; i.e., the various conformers were obtained with rigid rotations and did not allow for adiabatic relaxation of the other degrees of freedom.
The geometric changes that occur in the alanine dipeptide between the C7eq and C5 conformers, for example (Tables 2 and 7), indicate that such ab initio rigid-rotation results are only of limited value. It was shown more than 20 years ago in a study of acetylcholine with empirical energy functions 90 that it was essential to include conformational flexibility to obtain meaningful relative energies for different conformers. Also, comparison of other ab initio energy calculations based on rigid structures 58 with a variety of ab initio calculations that included full relaxation shows that the differences in the energies of the alanine dipeptide conformers are significantly overestimated when rigid geometries are used. 91
TABLE 8: Alanine Dipeptide CHARMM22 and ab Initio Vibrational Spectra a

Columns: mode; C7eq (emp., a.i., assign.); C7ax (emp., assign.); C5 (emp., a.i., assign.)

ψ(95) 58 ψ(83) ψ(64) φ(33) tch3(94) 66 tch3(90) tch3(69) φ(17) tch3(62) 82 tch3(71) φ(44) tc-n(29) tc-n(22) tch3(35) ψ(17) tc-n(42) 93 tc-n(62) tch3(64) tch3(32) tch3(18) tc-n(21) φ(17) dn-ct-c(17) φ(82) 132 φ(101) dn-ct-c(26) tc-n(25) tch3(24) tc-n(42) 174 tc-n(74) tc-n(40) dc-n-ct(24) dc-n-ct(16) dn-ct-c(16) tc-n(57) 198 dct-c-n(31) tc-n(83) dc-n-ct(23) dn-ct-c(18) dc-n-ct(40) 242 dc-n-ct(58) dc-n-ct(41) dct-c-n(33) dct-c-n(21) tch3(76) 270 tch3(81) tch3(36) dct-c-n(25) dc-n-ct(17) dn-ct-ct(25) 285 dc-ct-ct(31) tch3(52) dc-ct-ct(22) tch3(16) dn-ct-c(33) 320 dc-n-ct(45) dc-c R-C β(31) tc-n(19) dc-n-ct(21) sc-ct(18) dc-n-ct(32) 357 dn-ct-ct(53) dct-c-n(36) dcdo(31) dcdo(19) dc-n-ct(18) dc-ct-ct(19) dcdo(16) dct-c-n(25) 402 dct-c-n(31) dn-c R-C β(43) dct-c-n(50) 519 dcdo(53) dct-c-n(32) dn-ct-ct(21) dcdo(29) dcdo(54) 577 dct-c-n(19) dcdo(44) sc-ct(23) sc-ct(16) sc-ct(21) dcdo(33) 645 sc-ct(28) dcdo(23) sc-ct(18) dcdo(20) sc-ct(22) dc-n-ct(17) wcdo(70) 656 wcdo(73) wcdo(74) wn-h(24) wn-h(23) wn-h(18) wcdo(41) 742 sc-n(21) wn-h(54) wn-h(36) wcdo(29) scdo(18) 755 wn-h(41) sc-n(24) sc-n(15) wcdo(33) scdo(17) wn-h(15) sc-ct(17) sc-n(32) 805 sc-n(28) sc-n(24) scdo(21) scdo(18) scdo(17) sc-ct(18) wn-h(64) 833 wn-h(59) wn-h(63) dch3(20) dch3(15) wn-h(41) 893 wn-h(50) wn-h(28) wcdo(22) wcdo(26) wcdo(28) dch3(16) dch3(28) 909 dch3(34) dch3(33) sn-ct(26) sn-ct(18) sn-ct(21) sct-ct(20) sct-ct(15) dch3(48) 951 dch3(46) sn-ct(48) sn-ct(37) sn-ct(38) dch3(32) dch3(84) 976 dch3(78) dch3(78) dch3(41) 1005 dch3(32) dch3(41) dc R-H(21) dc R-H(21) sn-ct(17) dch3(60) 1044 dch3(73) dch3(71) sc R-C β(18) dch3(88) 1072 dch3(84) dch3(89) dch3(77) 1082 dch3(53) dch3(72) wcdo(19) dch3(77) 1086 dch3(76) dch3(83) sn-ct(34) 1090 dch3(58) dch3(29) dch3(24) sn-ct(27)
TABLE 8: (Continued)

Columns: mode; C7eq (emp., a.i., assign.); C7ax (emp., assign.); C5 (emp., a.i., assign.)

dc R-H(16) dc R-H(33) 1244 dn-h(26) dn-h(33) dn-h(23) dct-ha(22) sc-n(19) sc-ct(15) dn-h(44) 1270 dn-h(43) dn-h(35) sc-ct(21) sc-ct(23) sc-ct(22) dc R-H(37) 1325 dc R-H(41) dc R-H(48) dch3(15) sc R-C β(18) sct-ct(15) dch3(94) 1386 dch3(95) dch3(95) dch3(88) 1407 dch3(81) dch3(85) dch3(92) 1413 dch3(98) dch3(86) dch3(99) 1416 dch3(99) dch3(98) dch3(100) 1418 dch3(100) dch3(100) dch3(98) 1425 dch3(98) dch3(69) dc R-H(18) dch3(68) 1429 dch3(62) dch3(100) dc R-H(18) dc R-H(22) dch3(80) 1438 dch3(95) dch3(57) dc R-H(21) dch3(66) 1442 dch3(60) dch3(98) dc R-H(20) dch3(50) 1482 dch3(53) dch3(42) dn-h(23) dn-h(22) dn-h(26) 1552 dn-h(32) dch3(37) sn-ct(15) sc-n(22) dn-h(20) sn-ct(17) sn-ct(17) dn-h(22) 1590 dch3(34) dn-h(24) dch3(22) dn-h(25) dc R-H(15) sc-n(18) sn-ct(17) sn-ct(16) sc-n(15) scdo(64) 1685 scdo(66) scdo(63) scdo(65) 1692 scdo(59) scdo(66) sch3(100) 2852 sch3(100) 2852 sch3(100) sch3(92) 2902 sch3(86) 2902 sch3(92) sc R-H(91) 2903 sc R-H(86) 2905 sc R-H(92) sch3(100) 2914 sch3(100) 2914 sch3(100) sch3(100) 2915 sch3(100) 2914 sch3(100) sch3(100) 2917 sch3(100) 2917 sch3(100) sch3(100) 2958 sch3(100) 2958 sch3(100) sch3(100) 2960 sch3(100) 2961 sch3(100) sch3(100) 2975 sch3(100) 2975 sch3(100) sch3(100) 2975 sch3(100) 2976 sch3(100) sn-h(99) 3319 sn-h(99) 3325 sn-h(99) sn-h(100) 3325 sn-h(100) 3328 sn-h(99)

a Frequencies in cm⁻¹. Ab initio data from ref 87. Potential energy distributions determined with the MOLVIB module in CHARMM.

In a recent study by Beachy et al., 85 extensive ab initio calculations were performed on the alanine dipeptide and the alanine tetrapeptide.
These calculations employed geometries fully optimized at the HF/6-31G(p,d) level for the alanine dipeptide and tetrapeptide, respectively, and significantly extended the level of theory used to calculate the relative energies of the analyzed conformers, including electron correlation, based on single-point calculations of the HF/6-31G(p,d) optimized structures. The alanine dipeptide results were generally consistent with data from lower levels of theory (see Table 4), and the new alanine tetrapeptide results were used to test a number of available force fields, including the present CHARMM potential. A developmental version of the all-atom OPLS force field, 92 MMFF, 93 and MM3 10 were shown to best reproduce the ab initio energies of the different conformers obtained from unrestrained minimizations, while the present parameters performed somewhat worse (e.g., 3.78 vs 1.21 kcal/mol for the rms difference for CHARMM22 and MMFF, respectively, for optimized geometries with the φ and ψ dihedral angles restrained to the HF/6-31G(d,p) values). In terms of the actual energy values, the difference appears to be significant because the uncertainty in the ab initio values of the relative energies is probably not much worse than 1 kcal/mol. However, the comparison is of limited value because the objective of the present force field is to produce accurate results in solution simulations of peptides and proteins, rather than to reproduce ab initio calculations of isolated peptides. This is of particular relevance here because some of the conformers of the alanine tetrapeptide that were studied by Beachy et al. 85 correspond to ones only rarely observed for alanine-containing regions in proteins, and presumably in peptides, in solution. In particular, several of the local minima have φ, ψ values in the αL region (φ = 60°, ψ = 60°), which are rarely found for alanine. While the αL region is significantly populated in proteins, more than 50% of the residues observed in this region are glycines (R. L. Dunbrack, Jr., personal communication).
The conformations of the tetrapeptide, which consists of three alanines plus terminal blocking groups (in analogy to the dipeptide), can be described in terms of the conformations of alanine units; i.e., the total conformational energy of the tetrapeptide, relative to the minimum energy structure, can be approximated by the sum of the relative energies of the alanine units. The three tetrapeptide conformers for which there are large deviations in the empirical CHARMM energies from the
ab initio results have at least one of three φ, ψ values in the αL region; one has two φ, ψ in the αL region (relative energy difference 9.92 kcal), and the other two (relative energy differences 6.92 and 4.82 kcal) each have one set of φ, ψ values in the αL region. The αL energy of the empirical energy function is too high by [...] kcal in the alanine dipeptide (see Table 7). To examine the importance of the αL configurations in the tetrapeptide results, we calculated the rms difference between the CHARMM and LMP2/cc-pVTZ(-f) relative energies (with respect to structure 3 of Beachy et al.) for the restrained geometries without applying any scaling procedure (see Table 5 of Beachy et al.); only relative energies have meaning in the empirical energy function since the zero of energy is arbitrary. Our calculated rms difference is 4.39 kcal/mol, somewhat larger than that reported by Beachy et al. using their comparison method. The rms difference without the three conformers that contained one or more φ, ψ pairs in the αL region of the dipeptide was found to be 0.77 kcal/mol. This result is consistent with results from the Beachy et al. study, where omission of conformations in the αL region led to a decrease in the reported CHARMM rms difference from 3.78 to 0.95 kcal/mol.

TABLE 9: Relative Energies and Conformations of the Alanine Dipeptide C7eq, C7ax, and C5 Minima from Empirical and ab Initio Calculations a

source               C7eq        C7ax        C5
Energies
MP2/TZVP b
CHARMM
MM3 c
AMBER (all-atom) d
AMBER/OPLS e
ECEPP/2 e
MSI CHARMm f
Conformations (φ, ψ)
HF/6-31G**           -86, 78     75,         , 160
CHARMM               -81, 71     70,         , 171
MM3                  -83,                    , 154
AMBER (all-atom)
AMBER/OPLS           -84, 70     67,         , 162
ECEPP/2              -80, 76     76,         , 157
MSI CHARMm           -79, 72     70,         , 171

a Energies in kcal/mol and dihedrals in deg. Fully optimized empirical geometries. b Reference 70, MP2/TZVP energy at the HF/6-31G(p,d) optimized geometry. c Reference 10. d Reference 79. e Reference 8. f Reference 25.
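The comparison of relative energies with and without the αL-containing conformers can be sketched as follows. This Python fragment is a minimal illustration with hypothetical conformer labels and placeholder energies, not the tetrapeptide data of Beachy et al.; it also assumes that the reference structure itself is left out of the rms, which is one possible convention. Energies are first referred to a common reference structure, since only relative energies are meaningful in the empirical function.

```python
import math


def rms_relative(empirical, ab_initio, reference, skip=()):
    """RMS difference of energies taken relative to `reference`,
    omitting the reference itself and any conformers in `skip`."""
    squares = []
    for key in empirical:
        if key == reference or key in skip:
            continue
        rel_emp = empirical[key] - empirical[reference]
        rel_ai = ab_initio[key] - ab_initio[reference]
        squares.append((rel_emp - rel_ai) ** 2)
    return math.sqrt(sum(squares) / len(squares))


# Placeholder absolute energies (kcal/mol); the two sets have different,
# arbitrary zeros of energy. Conformer "4" stands in for one with
# phi, psi values in the alpha-L region.
empirical = {"3": 10.0, "1": 12.0, "2": 13.0, "4": 20.0}
ab_initio = {"3": 5.0, "1": 6.5, "2": 8.5, "4": 9.0}

rms_all = rms_relative(empirical, ab_initio, reference="3")
rms_no_alpha_l = rms_relative(empirical, ab_initio, reference="3", skip={"4"})
print(f"all conformers:  {rms_all:.2f} kcal/mol")
print(f"without alpha-L: {rms_no_alpha_l:.2f} kcal/mol")
```

A single conformer with a large error dominates the rms, so removing it changes the figure sharply, in the same way that the text reports a drop when the αL-containing conformers are omitted.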
TABLE 10: Relative Energies and Conformations for the Glycine Dipeptide from ab Initio and the Empirical Calculations a

conformer     φ     ψ     empirical b     HF/6-31G(p,d) c     MP2/TZVP d
C7
C5
αR
β2
Fully Optimized Empirical Results e
C7
C5
αR
β2

a Energies in kcal/mol and dihedral angles in deg. b Empirical energies determined with the φ, ψ values constrained to the HF/6-31G(p,d) (listed) values and the remainder of the molecule fully optimized. c Fully optimized values from ref 70. d MP2/TZVP (Dunning's triple-ζ basis set plus polarization functions) energy for the HF/6-31G(p,d) optimized geometry from ref 70. e Full optimizations performed following partial optimizations with the φ, ψ values constrained to the HF/6-31G(p,d) (listed) values. Both the αR and β2 conformers converted to the C7 during the full optimizations.

The high energy of the CHARMM αL conformer is consistent with a generally too-high relative energy of the upper right quadrant of the alanine dipeptide map (see Table 7 and Figure 2). Because of the rarity of the αL conformer in proteins (other than for glycines), the error in the energy was ignored in the parametrization. It should be noted that the αR conformer of the glycine dipeptide, which is equivalent to the αL alanine dipeptide conformer, is also overestimated (see Table 10). Simulations of cyclic peptides, however, show the φ, ψ angles of glycines in the αR conformation to be well-maintained by the present force field (see Table 7 of Supporting Information). These results and those presented above on the relationship between energetics of the alanine dipeptide and calculated φ, ψ values of MbCO point to the importance of not parametrizing protein and peptide force fields simply on the basis of limited gas-phase ab initio data, as already pointed out in the paper by Beachy et al. Another consideration in evaluating the comparison is that the geometry of the peptide bond in the CHARMM energy function was parametrized for solution and crystal structures. This leads to CO bond lengths in the alanine dipeptide in the range [...] Å (Table 2). These values
are significantly longer than the values found in the gas-phase ab initio calculations; e.g., values of [...] Å are obtained in ref 70.

IV.a.2. Interaction Parameters. Partial atomic charges and Lennard-Jones parameters for the protein backbone were optimized using NMA as the model compound. Data that were used included the interaction energies and geometries of the complexes of NMA with water and the NMA dimer from ab initio calculations, the dipole moment of NMA, the heat of vaporization and molecular volume of pure NMA, and the heat of solvation of NMA. Additional testing of the parameters was performed via crystal simulations of NMA and the alanine dipeptide. Initial interaction parameters were obtained from the CHARMM19 parameter set 5,15 for the polar atoms. The charges of the methyl groups treated were determined using the standard charge of 0.09 for the hydrogens 21 and methyl carbon charges selected to yield a neutral total charge. The aliphatic hydrogen charge was previously determined on the basis of the electrostatic contribution to the trans-gauche energy difference of n-butane. 21 Lennard-Jones parameters were obtained from the CHARMM22 all-hydrogen nucleic acid parameters for the amide group 13 and from the CHARMM22 all-hydrogen alkane parameters for the methyl groups. Optimization of the van der Waals parameters was limited, therefore, to adjustment of the peptide bond carbonyl carbon radius and well depth. The adjustment of the partial atomic charges and van der Waals parameters was performed in an iterative fashion, as outlined in Section II.a.

TABLE 11: Minimum Interaction Energies and Geometries of NMA with Water and the NMA Dimer a

                        HF/6-31G(d) c               empirical
interaction b           Emin    Rmin    angle       Emin       Rmin    angle
(1) C=O···HOH                                       (0.90)
(2) N-H···OHH                                       (0.47)
(3) parallel dimer                                  (-0.59)    1.84

a Energies in kcal/mol, distances in Å, and angles in deg. Values in parentheses with the empirical Emin data are the Lennard-Jones contributions to the interaction energies. b See Figure 3 for the interaction geometries. c HF/6-31G(d) energies have been scaled by 1.16 (see text).
Table 11 lists the interaction energies and geometries for the NMA-water and NMA dimer complexes at the minimum-energy geometry from the empirical and ab initio calculations
18 Empiricl Prmetriztion of Proteins J. Phys. Chem. B, Vol. 102, No. 18, Figure 3. Interction orienttions of N-methylcetmide with wter (A nd B) nd the N-methylcetmide prllel dimer (C). Figure 4. CHARMM prtil tomic chrges nd tom types for (A) N-methylcetmide nd (B) the lnine dipeptide. for the orienttions shown in Figure 3. As discussed in Section II., the b initio interction energies hve been scled by Comprison of the b initio nd empiricl energies shows excellent greement. The geometries re lso in good greement, with the empiricl distnces pproximtely 0.2 Å less thn the b initio vlues (see Section II.), while the ngles for the interctions with wter re in good greement. The lrgest disgreement occurs for the NMA dimer, for which the empiricl interction energy is slightly too fvorble. Included in Tble 11 re the contributions of the Lennrd-Jones term to the interction energies. As my be seen, the mgnitude of these contributions is significnt, emphsizing the need to blnce the Lennrd-Jones nd electrosttic prmeters. 30 It is of interest tht developmentl version of the CHARMM force field reproduces the coopertivity of binding of multiple wter molecules to NMA. 94 The dipole moment ws monitored while djusting the chrges to reproduce the interction energies. The finl chrge distribution is shown in Figure 4; it yields dipole moment of 4.12, somewht lrger thn the experimentl dipole of Such n overestimtion is due to the requirement tht protein polriztion effects be included implicitly in the force field to be consistent with the chrges of TIP3P wter (see Section II.). TABLE 12: Condensed-Phse Clculted nd Experimentl Dt for N-Methylcetmide Pure Solvent clculted experimentl H vp mol vol. H vp mol vol ( ( Aqueous Solvent b,c H solv mol vol. H solv mol vol (-19.4) 75(65) Energies in kcl/mol nd moleculr volumes in Å 3. 
b Both the heat of solvation and molecular volume are determined from the difference between two large fluctuating numbers; on the basis of the statistical error in the individual values the errors are estimated to be ±3 kcal/mol and ±20 Å³, respectively. c The calculated values are based on 6 M configurations; the values in parentheses are from 4.5 M configurations.

Additional analysis of the validity of the charges was performed by calculating the dipole moment of the alanine dipeptide as a function of conformation. Shown in Table 7 are the calculated dipoles from the empirical force field along with those from ab initio calculations at the HF/6-31G(p,d) level. 70 Overall, the empirical atomic charges reproduce the trends seen in the ab initio calculations. With the exception of the C7eq and C7ax conformers, the empirical values are larger than the HF/6-31G(p,d) values, as expected owing to the implicit inclusion of polarization in the force field. The largest dipole occurs in the αR conformer, consistent with the ab initio result. The magnitude of the dipole moment of this conformer is important for the proper treatment of the αR helix.

Table 12 presents results for liquid NMA from Monte Carlo calculations and experiment. 96 The calculated heat of vaporization is somewhat underestimated. However, other experimental data have indicated a value of 13.3 kcal/mol, 97 suggesting the present value is reasonable. Comparison of the molecular volumes shows that the calculated value (133.7 Å³) is within 2% of experiment. As an additional test of the NMA parameters in the condensed phase, the heat of solvation and molecular volume in infinitely dilute aqueous solution were calculated. The results are included in Table 12. Monte Carlo calculations of NMA in a box of 262 TIP3P water molecules yielded a heat of solution of [...] kcal/mol and a molecular volume of 75 Å³, as compared to experimental values of [...] kcal/mol and [...] Å³, respectively.
Comparison with previously reported values of [...] kcal/mol and -9 Å³ for the heat of solution and molecular volume of NMA based on the OPLS force field and Monte Carlo calculation shows that the present parameters represent a significant improvement; 98 in the Monte Carlo calculations the internal geometry of NMA was constrained to the optimized gas-phase geometry. The agreement of the solution results, combined with the NMA pure-liquid properties, confirms that the interaction parameters are a good representation of the nonbonded interactions of NMA in different environments; i.e., there is the appropriate balance between the solute-solute and solute-solvent interactions of the protein backbone in the present parameter set. Recently, Kaminski and Jorgensen 99 have stressed the importance of the correct representation of liquid properties by molecular mechanics force fields. They made comparisons of a new OPLS all-atom force field, 100 AMBER 94, 69 and MMFF. 93 The best agreement for liquid NMA was obtained for the OPLS force field; the calculated values ΔHvap = [...] kcal/mol and molecular volume of [...] Å³ are almost identical to those reported here.
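The liquid-state quantities compared above can be obtained from simulation averages along the following lines. This is a minimal Python sketch under stated assumptions: molecules are rigid (so the intramolecular energy cancels between gas and liquid), the vapor is ideal (giving the RT term), and independent statistical errors add in quadrature when two large averages are subtracted, as for the heat of solvation. All function names and the numbers in the example calls are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def heat_of_vaporization(e_liquid_total, n_molecules, temperature, e_gas=0.0):
    """Heat of vaporization per molecule: E(gas) - E(liquid)/N + RT.

    e_liquid_total is the average total intermolecular energy of the
    simulation box; e_gas is the average gas-phase energy of one molecule
    (zero for rigid molecules measured relative to the same geometry).
    """
    return e_gas - e_liquid_total / n_molecules + R_KCAL * temperature

def molecular_volume(box_volume, n_molecules):
    """Average volume per molecule in the pure liquid (Angstrom^3)."""
    return box_volume / n_molecules

def difference_with_error(a, err_a, b, err_b):
    """Difference a - b and its error, assuming independent errors."""
    return a - b, math.sqrt(err_a ** 2 + err_b ** 2)

# Hypothetical inputs, chosen only to exercise the functions.
dh_vap = heat_of_vaporization(-1280.0, 128, 373.0)
vol = molecular_volume(17000.0, 128)
dh_solv, dh_solv_err = difference_with_error(-2500.0, 2.0, -2480.0, 2.0)
```

The last function shows why a solvation enthalpy carries a much larger relative uncertainty than a pure-liquid property: a modest error on each of two large energies becomes a sizeable error on their small difference.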
Figure 5. Diagrams of the interactions between the primary and image atoms for the (A) N-methylacetamide and (B) alanine dipeptide crystals. Bold characters identify images based on the CHARMM image nomenclature.

IV.a.3. N-Methylacetamide and Alanine Dipeptide Crystal Calculations. Crystal calculations were performed on NMA and the alanine dipeptide as an additional test of the validity of the backbone parameters in the condensed phase. The two crystals are shown in Figure 5. NMA crystallizes in an orthorhombic Pnma space group at 238 K with four molecules per unit cell. 62 The asymmetric unit corresponds to half of the NMA molecule based on a mirror plane through the heavy atoms of the molecule. One symmetry operation generates the methyl hydrogen across the mirror plane. In the present calculations, we use this operation to generate the full NMA molecule and then use the CRYSTAL facility to create the remaining four molecules in the unit cell as well as other unit cells, such that the primary atoms represent a single NMA molecule. L-Alanine dipeptide crystallizes in an orthorhombic P[...] space group at room temperature. 101 Constant volume, constant temperature (NVT), and constant pressure, constant temperature (NPT) molecular dynamics simulations were performed for both crystals. Energy minimizations of the crystals were performed as a function of the nonbonded interaction truncation distances to determine an appropriate value. Table 1 of the Supporting Information presents the unit cell parameters and energies from the minimizations of both NMA and the alanine dipeptide. Other than the shortest distances (10-9-7), the results are reasonably behaved. The [...] truncation scheme was chosen since the crystal parameters appear to be well-converged. For both crystals the total volume decreased upon minimization, as expected because the experiments are performed at finite temperatures while the minimization corresponds to 0 K.
In NMA all three lattice parameters contract by between 2.1 and 4.1%, indicating that the crystal is well-represented by the empirical model. In the alanine dipeptide, the B- and C-lattice parameters contract while the A-term increases; the reason for this is not evident. This trend is also found in the NPT simulations (see below). Simulations on NMA and the alanine dipeptide were performed in both the NVT and NPT ensembles with the truncation scheme. Table 13 shows the pressures and unit cell parameters obtained from the simulations. In the NVT ensemble simulations NMA yielded a negative external pressure of approximately [...] atm while the alanine dipeptide yielded a positive pressure of approximately 3000 atm (see Section IV.b). Upon going to the NPT ensemble, the average pressures approach unity. The deviation from unity is due to the large pressure fluctuations; in fact, values of less than 1000 atm or so have only a small effect on the structure and energy. As expected, there is an overall contraction of the NMA crystal (corresponding to the negative pressure in the NVT simulation) and an expansion of the alanine dipeptide crystal. Comparison of the unit cell parameters from the minimizations and the NPT simulations shows an increase in the simulations, again as expected because of the kinetic energy corresponding to 300 K in the system. In NMA this yields unit cell parameters that are in good agreement with the experimental values; the B-axis is 0.15 Å shorter than in the crystal structure. In the alanine dipeptide crystal the A-axis is significantly larger than the crystal value. The expansion of the A-axis appears to be associated with interactions of the terminal methyl groups of the alanine dipeptide (see below), but the values for B- and C-axes are in good agreement with experiment. Additional analysis of the NMA and alanine dipeptide crystal calculations was based on examination of the structural details. The comparison with the NMA crystal is complicated by the fact that it is disordered with major and minor occupancy of 0.9 and 0.1, respectively.
63 We use the structural data as given, which presumably refers to the major conformer. Table 14 shows the rms deviation between the calculated and the crystal structure of the primary cell non-hydrogen atoms; corresponding results for the dihedral angles are given in Table 15 and for the nonbonded interaction distances in Table 2 of the Supporting Information. The rms deviations indicate that there are only minor changes in the internal structures. This is supported by comparison of the calculated and experimental dihedral angles. The excellent agreement for NMA is expected as the 2-fold dihedral term for rotation about the peptide bond combined with the high energy barrier (see Appendix) leads to only small fluctuations in the vicinity of the minimum. In the alanine dipeptide, rotation about φ and ψ is relatively unrestrained, but there is good agreement between the calculated and experimental values. The quality of the agreement is encouraging considering that the φ, ψ values in the crystal are not minima on the alanine dipeptide map but shifted approximately 2 kcal/mol above the C5 minima. Although there is significant deviation of the two peptide bonds from planarity in the experimental crystal structure, the calculated values are close to planarity. This may suggest that the empirical potential function is somewhat too steep near planarity, although the rms fluctuations of these dihedrals are approximately 9°. A recent survey of the CCDB indicates that deviations from planarity of the peptide bond do occur; the standard deviation from planarity is [...]. For the simulation results, the differences between the calculated and experimental structures are less than the rms fluctuations of the dihedral angles in all cases.
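The comparison of a calculated dihedral angle with its crystal value, judged against the rms fluctuation seen in a simulation, can be sketched as follows. This is a minimal Python illustration that assumes angles in degrees and handles the ±180° wrap-around that matters for nearly planar peptide bonds; the function names and the sample angles are hypothetical and are not data from this work.

```python
import math

def wrap_deg(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def dihedral_diff(calc, expt):
    """Signed difference calc - expt, taking periodicity into account."""
    return wrap_deg(calc - expt)

def rms_fluctuation(samples):
    """Rms fluctuation of a dihedral time series about its circular mean."""
    s = sum(math.sin(math.radians(a)) for a in samples)
    c = sum(math.cos(math.radians(a)) for a in samples)
    mean = math.degrees(math.atan2(s, c))
    devs = [wrap_deg(a - mean) for a in samples]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

def within_fluctuation(calc_mean, expt, samples):
    """True if |calc - expt| is smaller than the rms fluctuation."""
    return abs(dihedral_diff(calc_mean, expt)) < rms_fluctuation(samples)

# Hypothetical peptide-bond dihedral samples scattered around 180 degrees.
omega_samples = [170.0, -170.0, 175.0, -175.0]
ok = within_fluctuation(180.0, 176.0, omega_samples)
```

Using a circular mean avoids the artifact of averaging values such as 170° and -170° to 0°, which would otherwise inflate the apparent fluctuation of a trans peptide bond.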
TABLE 13: Results from the Crystal Simulations of N-Methylacetamide and the Alanine Dipeptide Using the Truncation Scheme a
Columns: system; pressure (P ext, P int); unit cell parameters (A, B, C, vol)
NMA: expt; NVT ± ± 9596; NPT -310 ± ± ± ± ±
Alanine Dipeptide: expt; NVT 2892 ± ± 7273; NPT 239 ± ± ± ± ±
a Pressures in atm, lengths in Å, and volumes in Å³; the unit cell parameters are fixed in the NVT ensemble at the experimental values.

TABLE 14: Rms Differences from the N-Methylacetamide and Alanine Dipeptide Crystal Calculations a
Columns: system; minimized; NVT; NPT
Rows: NMA; alanine dipeptide
a Rms differences in Å for all non-hydrogen atoms following least-squares fit of the non-hydrogen atoms to the crystal structures. For the simulations the time-averaged structures were used.

Since the internal geometries of NMA and the alanine dipeptide in the crystal calculations are very close to the experimental values, the changes in the unit cell parameters are associated with the nonbonded interaction distances. Table 2 of the Supporting Information lists various distances for both NMA and alanine dipeptide. For NMA the differences between the experimental and calculated distances are small and similar. This is consistent with the isotropic changes in the unit cell parameters (see Table 13 and Table 1 of the Supporting Information). In the minimization, the majority of distances become slightly shorter than experiment. In the simulations in the NVT ensemble, the majority of distances are slightly longer than experiment with all the differences well within the rms fluctuations of the simulation values. In the NPT ensemble, the majority of distances again contract and the deviations from the experimental values are all smaller than those from the minimization. In the alanine dipeptide, most distances increase in the minimization, as well as in the NVT and NPT calculations. There was a significant expansion of the A-axis (see Table 13 and Table 1 of the Supporting Information), while the agreement of axes B and C with experiment was satisfactory.
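Whether a crystal contracts isotropically or anisotropically can be read off from the percent change of each lattice parameter relative to experiment. The following minimal Python sketch assumes an orthorhombic cell, for which the volume is simply the product A·B·C; the function names and the lattice values in the example call are hypothetical and are not the values of Table 13.

```python
def orthorhombic_volume(a, b, c):
    """Unit cell volume for an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c

def percent_change(calc, expt):
    """Percent deviation of a calculated value from experiment."""
    return 100.0 * (calc - expt) / expt

def compare_cells(calc, expt):
    """Percent changes of A, B, C and the volume; inputs are (A, B, C) tuples."""
    changes = {
        axis: percent_change(x, y) for axis, x, y in zip("ABC", calc, expt)
    }
    changes["vol"] = percent_change(
        orthorhombic_volume(*calc), orthorhombic_volume(*expt)
    )
    return changes

# Hypothetical calculated and experimental cells, lengths in Angstrom.
result = compare_cells((9.8, 6.37, 7.2), (10.0, 6.5, 7.2))
```

Similar percent changes on all three axes indicate an isotropic response, whereas a change of opposite sign on one axis, as described for the A-axis of the alanine dipeptide, points to a specific set of intermolecular contacts along that direction.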
The diagram of the crystal structure in Figure 5B shows the interactions in the crystal. Hydrogen bonds involving the peptide bonds are aligned with the B- and C-axes. As may be seen in Table 2 of the Supporting Information, the interaction distances between the nitrogens and oxygens in the alanine dipeptide are generally too long; for example, the N17 to O5 distances increase by approximately 0.2 Å in both the NVT and NPT simulations. These differences, however, were not observed in the crambin, BPTI, and MbCO crystal simulations (see below), so no additional optimization of the parameters was performed. Analysis of the remaining interactions, many of which are associated with the C-axis, shows a trend for the nonbonded interaction distances to increase in the simulations. This is true, in particular, for the primary-to-primary interactions between the peptide bonds of molecules 1 and 2 (atoms C12 and N17) and the primary-to-image interactions involving O6 to C023 1 C11 and O6 to C001 1 C19. All of these interaction distances increase significantly in both the minimization and the NPT simulation. Such expansion may be due to limitations in the parameters for the interaction between polar atoms and aliphatic moieties. Limits in the potential energy function related to the interaction of the dipoles of the peptide bonds containing atoms C12 and N17 in molecules 1 and 2 of the primary atoms may lead to the associated increased distances. In addition, the spherical model for the atomic van der Waals surfaces may be insufficient to reproduce the interactions of the π orbitals of the peptide bonds. 103 Despite these limitations, the current parameters adequately reproduce the NMA crystal structure and lead to a reasonable reproduction of the L-alanine dipeptide crystal, although areas for improvement are evident.

IV.b. Tripeptide Crystal Simulations. Previous parametrization studies of proteins have focused on cyclic peptides as test systems (see Table 16).
6,8,10,104 Although we also consider cyclic peptides (see Section IV.c), their constrained structures and the lack of ionic groups limit their applicability as model systems for proteins. Consequently, we also used three noncyclic tripeptide crystals in testing the present parameter set. They are Gly-Ala-Leu·3H2O (GAL), Gly-Ala-Val·3H2O (GAV), 51 and Ala-Ala-Ala (AAA). 52 GAL and GAV represent conformers that are nearly α-helical and have been suggested to correspond to nucleation structures for helices, while AAA has an extended parallel β-pleated sheet conformation. Diagrams of the three tripeptides are shown in Figure 6. All of these structures are zwitterions, which allows for testing of the present parameters on nonbonded interactions involving ionic groups. Crystal minimizations as a function of different cutoffs were performed to test the influence of the truncation scheme on the resulting structures. The minimization results are presented in Table 3 of the Supporting Information. As in the NMA and alanine dipeptide crystal minimizations presented above (Section IV.a.3), there are significant fluctuations in the unit cell parameters and the energies as the cutoff distances change for the shorter cutoff distances. For the longer cutoff distances, the fluctuations decreased. The [...] cutoff regime was again selected for more detailed studies, although the unit cell parameters and energies have not fully converged. The GAL and AAA crystals contract in an isotropic fashion. There is some asymmetry in the contraction in the GAV crystal, with the A-axis contracting, the B-axis relatively unchanged, and the C-axis expanding. In all cases the minimizations lead to the expected decrease in the total volumes of the crystals, as discussed in Section IV.a.3. For the GAL crystal, which has an orthogonal space group, both NVT and NPT molecular dynamics simulations were performed; only NVT simulations were performed for GAV and AAA. Table 17 gives the global crystal properties and Table 18 presents the rms differences between the simulation results and the crystal structures.
In the NVT simulations, positive pressures were obtained for GAV and GAL, while a negative pressure was obtained for AAA. Correspondingly, the GAL NPT simulation yielded a small expansion of the unit cell. The