2/15/2015

Computational methods for increasing the stability of type IV pilin protein from Pseudomonas aeruginosa

John Loft

Bioengineering 488 Computational Protein Design (Winter 2015), University of Washington, Box 355013, Seattle, Washington 98195-5013, USA.
Abstract

Pseudomonas aeruginosa is a multidrug-resistant bacterium commonly known to form biofilms in the lungs, urinary tract, and kidneys. In some studies, it has been responsible for over one-fourth of hospital-acquired infections in intensive care units (Vincent 1995). It binds to host targets through a mechanism that depends on type IV pilin, an antigenic adhesion protein. Engineered mutants of the type IV pilin protein with varying sequences and chain lengths may act as potential vaccines against P. aeruginosa while retaining stable epitopes at increased temperatures. Such stability can increase the shelf life of the vaccines and ease storage requirements for distribution. Using the programs Modeller, FoldIt, and Chimera, a variety of mutants were generated through homology modeling of the crystal structure of the globular domain of type IV pilin, obtained from the Protein Data Bank. Five redesigns of the globular domain were initially created. The relaxation of these mutants in water at 310 K for 5 ns was then simulated using a molecular mechanics kernel called in lucem molecular mechanics (ilmm), developed by the Daggett Research Group (Beck, et al., 2000-2013). Effective mutations were identified from multiple ilmm outputs, including the residence time of amino acids in pertinent secondary structures, the α-carbon root-mean-square deviation (Cα RMSD) through time, and the contact time of interacting amino acids. This information was applied to a final redesign of 1DZO in an effort to optimize the stability further.

Introduction

The aggregation of bacteria into biofilms occurs through cell-cell interactions mediated by lipoprotein complexes. Research into disrupting these interactions has become a large part of modern biomedicine. Here we explore computational methods used to design a
prospective vaccine for P. aeruginosa by modifying a truncated PAK pilin protein (PDB ID: 1DZO) in both sequence and length. We considered only the globular domain of pilin, residues Gly25 through Arg142; amino acid numbering begins at 25 because the fimbrillar portion of the protein is not represented in the 1DZO structure. A patent for multiple antigenic sequences in pilin proteins (patent # US 5612036 A) has been filed that covers residues 129 through 142 of the 1DZO protein. The goal of our simulations was not to discover antigenic sequences, but to create vaccine variants with higher thermostability without disrupting antigenicity. Five proteins were engineered with Chimera and FoldIt: an automated FoldIt design (AFD), a manual FoldIt design (MFD), an intuitive Chimera design (ID), a fragment of the wild-type (FWT), and a mutated fragment of the wild-type (FMD). Four of these five mutants exhibited smaller Cα RMSDs than the wild-type (WT). Some exhibited only trivial gains in thermostability, and we speculate on why some mutations and mutation techniques were more advantageous than others.

Methods

Homology Modeling

Our first step was to obtain a known sequence for the pilin protein from GenBank, an NIH genetic sequence database. The sequence we used was entitled "type 4 fimbrial precursor PilA [Pseudomonas aeruginosa PAO1]" (GenBank ID: AAG07913.1). We obtained a FASTA file for this sequence and ran a BLAST search to locate experimentally determined structures in the PDB, selecting the model 1DZO, determined by X-ray diffraction at a resolution of 1.63 Å.
After finding the first overlapping sequence of 1DZO with the GenBank sequence, we truncated the GenBank sequence to eliminate the fimbrillar section of the protein. A homology model matching the GenBank sequence to the secondary structures of 1DZO was then generated using Perl scripts from the Modeller bioinformatics package, with the computational aid of the Stampede supercomputer at the University of Texas at Austin. These operations calculated an energy-minimized structure of the GenBank pilin protein. The model was then validated by aligning the two structures in Chimera and calculating the RMSD between them. Homology modeling can be a key tool for predicting structures from unconserved sequences, and this exercise proved practical for future research on mutated strains of P. aeruginosa; for the mutated models in the remainder of this study, however, we used only the 1DZO sequence.

Computational Design of the Pilin Adhesion Protein

In the next component of the study, FoldIt was used to generate an automated design (AFD). The automated design froze the epitope and allowed all other residues to mutate in a manner that reduced the total Rosetta energy score. Fifty-five mutations were made in the AFD, lowering the sequence identity to 61.67% of the original 1DZO. Fine-grained energy minimization was conducted afterwards. A manual design (MFD) was also created in FoldIt. Its methodology was similar to that of the AFD, except that in addition to the epitope, all cysteines, glycines, and prolines were not permitted to mutate. FoldIt's automatic mutation feature made the changes and, by sheer coincidence, again made fifty-five mutations. The AFD and MFD sequences were then compared in Chimera and found to share 60% sequence identity, confirming that they were not the same protein.
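Sequence identity figures such as the 60% shared by the AFD and MFD can be checked outside Chimera. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length; the function name percent_identity and the toy sequences are hypothetical and are not taken from the designs. Identity conventions differ in their choice of denominator; this sketch divides by the number of gap-free aligned positions, which may not match the convention Chimera uses.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned positions carrying the same residue.

    Assumes seq_a and seq_b are already aligned (equal length, '-' for gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# Toy sequences (not the pilin designs): 8 of 10 positions match.
print(percent_identity("GTEFARSEGA", "GTEFKRSEGS"))  # 80.0
```

Applied to the aligned AFD and MFD sequences, the same function would give a figure to compare against the value reported by Chimera.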
Additionally, a fragment of the WT protein (FWT) was created in FoldIt simply by deleting the first five residues of the 1DZO structure. This fragment was then duplicated, and a mutated fragment (FMD) based on it was designed, containing a total of ten mutations. Seven of these mutations were made in the alpha helix in an attempt to increase hydrogen bonding between coils, and three were made in loop structures, with the mutation selection again based on FoldIt's minimum Rosetta energy scoring function.

A fifth design, the intuitive design (ID), was created exclusively in Chimera. Only two mutations were made, Ala86Thr and Ile115Thr. Ala86Thr was made because alanine is hydrophobic and residue 86 is located on the outside of the structure. Substituting threonine, which has a polar uncharged side chain, reduces hydrophobic interactions while increasing the chance of hydrogen bonding between beta-sheets. This change may result in a lower free energy score by increasing the entropy of the protein, as threonine can take on more possible configurations than alanine. Ile115Thr was also made to increase beta-sheet hydrogen bonding and to increase the solvent-accessible surface area of the protein. These alterations were facilitated by the rotamer selection feature in Chimera, using the Dunbrack library; in future studies, the Dynameomics library should be used instead.

Molecular Dynamics

The five redesigns of the truncated pilin protein were then simulated in a molecular dynamics kernel. In lucem molecular mechanics (ilmm) prepared the PDB files, making a number of assumptions that are described in the referenced literature. One important faulty assumption made by the ilmm kernel was that no disulfide bond occurred between residues 104 and 117. This error was rectified by manually specifying a disulfide bond. The proteins were simulated for 5 ns at 310 K following multiple cycles of steepest descent minimization. With all
parameters specified, each simulation on the Stampede supercomputer took several hours. The results of these simulations were then integrated to produce a final redesign.

Results

Our homology model created with Modeller displayed an excellent RMSD of 0.368 Å, with a shared sequence identity of only 56.52%. Of the five initial designs and the final design created with Chimera and FoldIt, all except the MFD exhibited a smaller final Cα RMSD than the WT protein (Table 1). The AFD produced the best results, with a final Cα RMSD reduction of 0.81 Å. The ID and FMD produced reductions of 0.54 Å and 0.57 Å, respectively. The final design, however, reduced the Cα RMSD by only 0.18 Å.

In the DSSP output from the simulations, residue numbers were shifted by -5 relative to the actual residue numbers of the displayed secondary structures. It is unknown why this occurred. The WT DSSP output showed that a single amino acid, Pro21, has trouble adopting an alpha-helical structure. It also indicated that intermediate residues in the third beta-sheet could be mutated to adopt the beta-sheet conformation more strictly. Some of the DSSP output from the mutants indicated that a strengthened 3/10 helix might also be created in one of the mid-chain loop structures. Alarmingly, the DSSP output for the MFD and the final design showed that the 3/10 helix located in the epitope was largely absent.

Discussion

It was discouraging to see that the AFD displayed the largest reduction in Cα RMSD. Although the AFD had the most hydrophilic and charged substitutions, with many long lysines exposed to the solvent, the fact that no other mutant displayed competitive thermostability suggests that greater intuition should have been employed in the initial protein engineering. More
modifications to the hydrophobic core and the solvent-accessible beta-sheets during the intuitive design phase may have resulted in a greater change in the thermostability of the ID. Further discouraging results appear in the DSSP data, with the MFD and final design lacking a consistent 3/10 helix in the epitope region, suggesting a loss of antigenicity. In the MFD, this loss is suspected to be due simply to poor mutations. In the final design, however, an error was made in not explicitly specifying the disulfide bond that occurs in the epitope. This oversight is the likely cause of the flawed DSSP data, and the simulation should be rerun to examine whether the final design can maintain antigenicity. While the final design may not have the highest thermostability, it does exhibit smaller perturbations in Cα RMSD through time, indicating that its total energy as a function of conformation may have comparatively fewer local minima. The protein might therefore be more rigid in its range of motion, despite being only slightly more thermostable than the WT.

Conclusion

In these trials, the AFD outperformed all other designs in thermostability, and if it can be shown that this mutant can fold into the proper structure from an unfolded state and maintain antigenicity, the AFD would act as a superior vaccine. As a first exploration into protein engineering, this study encountered many pitfalls. There is little doubt that more advantageous mutants can be created. Nonetheless, the methods demonstrated here show where caution should be taken in protein engineering and molecular dynamics simulations, and provide indispensable tools for further research.
References

1.) Vincent, J.-L. (1995). The Prevalence of Nosocomial Infection in Intensive Care Units in Europe. JAMA, 274(8), 639. doi:10.1001/jama.1995.03530080055041
2.) Beck, D.A.C., McCully, M.E., Alonso, D.O.V., Daggett, V. (2000-2012). in lucem Molecular Mechanics (ilmm). University of Washington, Seattle.
3.) Crosslinked polypeptide vaccine with cysteine groups and carriers. (1997, March 18). Retrieved from http://www.google.com/patents/us5612036

Appendix

Tables

Protein Type    Final Cα RMSD (Å)
WT              3.03961
FINAL           2.85542
ID              2.49824
MFD             3.8353
AFD             2.23056
FWT             2.65751
FMD             2.47399

Table 1. The final Cα RMSDs of each simulation relative to its starting structure.

Mutation    Reasoning
Pro21Arg    Decrease kink in alpha helix
Ser35Glu    Increase beta bridge stability
Val54Arg    Create 3/10 helix
Ala55Ser    Create 3/10 helix
Ala56Lys    Create 3/10 helix
Tyr63Arg    Increase beta sheet stability
Ala65Phe    Increase beta sheet stability
Ile94Val    Increase beta sheet stability

Table 2. Justification for each mutation in the final design.
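The Cα RMSD reductions quoted in the Results follow from Table 1 by subtracting each design's final Cα RMSD from that of the WT. A minimal sketch of that arithmetic is given here; the dictionary final_rmsd simply transcribes Table 1, and the function name reductions is a hypothetical helper, not part of ilmm.

```python
# Final Cα RMSD values (Å), transcribed from Table 1.
final_rmsd = {
    "WT": 3.03961,
    "FINAL": 2.85542,
    "ID": 2.49824,
    "MFD": 3.8353,
    "AFD": 2.23056,
    "FWT": 2.65751,
    "FMD": 2.47399,
}


def reductions(table, reference="WT"):
    """Return reference RMSD minus each design's RMSD, rounded to 0.01 Å.

    Positive values mean the design ended closer to its starting
    structure than the reference did; negative values mean it drifted more.
    """
    ref = table[reference]
    return {
        name: round(ref - value, 2)
        for name, value in table.items()
        if name != reference
    }


# List designs from largest to smallest reduction.
ranked = sorted(reductions(final_rmsd).items(), key=lambda kv: kv[1], reverse=True)
for name, delta in ranked:
    print(f"{name:6s} {delta:+.2f} Å")
```

A negative entry, as for the MFD, marks a design whose final Cα RMSD exceeded that of the WT.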
Figure Legends

Figure 1. The wild-type structure of 1DZO.
Figure 2. The patented residues of pilin protein colored red in 1DZO.
Figure 3. The intuitively designed mutant with altered residues shown in orange and residues within 4.0 Å of altered residues colored in purple.
Figure 4. The automated FoldIt design colored with the same paradigm as Figure 3.
Figure 5. The fragment of the wild-type colored with the same paradigm as Figure 3.
Figure 6. The final design with mutated fragments colored in red.
Figures 7-9. The Cα RMSD of the ID & WT through time, the Cα RMSD of the MFD & WT through time, and the Cα RMSD of the AFD & WT through time.
Figures 10-12. The Cα RMSD of the FWT & WT through time, the Cα RMSD of the FMD & WT through time, and the Cα RMSD of the final design & WT through time.
Figure 13. WT secondary structures through time.
Figure 14. Secondary structures of the Chimera intuitive design through time.
Figure 15. Secondary structures of the manual FoldIt design through time.
Figure 16. Secondary structures of the automated FoldIt design through time.
Figure 17. Secondary structures of the WT fragment through time.
Figure 18. Secondary structures of the manually designed fragment through time.
Figure 19. Secondary structures of the final design through time.
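Figures 7-12 plot Cα RMSD against time. As a reminder of what each point on those curves represents, the following is a minimal sketch of the per-frame calculation, assuming each frame and the starting structure are given as equal-length lists of Cα coordinates in Å. The function name ca_rmsd and the two-atom example are hypothetical. No superposition is performed here; whether ilmm fits each frame to the starting structure before computing the RMSD is not stated in this report.

```python
import math


def ca_rmsd(coords, reference):
    """Root-mean-square deviation between two sets of Cα coordinates.

    coords and reference are equal-length sequences of (x, y, z) tuples in Å.
    The structures are compared as given, with no fitting step.
    """
    if len(coords) != len(reference) or len(coords) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(coords, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(coords))


# Hypothetical two-atom example: every atom displaced by 1 Å along y.
start = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0)]
print(ca_rmsd(frame, start))  # 1.0
```

Evaluating this function for every saved frame against the starting structure yields a time series of the kind shown in the figures.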
Figures
[Figures 7-9: plots of Cα RMSD (Å) against time (ps), 0 to 4950 ps. Panels: WT RMSD and Intuitive Design; WT RMSD and Manual FoldIt Design RMSD; WT RMSD and Automated FoldIt Design RMSD.]
[Figures 10-12: plots of Cα RMSD (Å) against time (ps), 0 to 4950 ps. Panels: WT RMSD and WT Fragment RMSD; WT RMSD and Manually designed fragment RMSD; WT RMSD and Final Design RMSD.]