Protein Folding

Protein folding is the physical process by which a polypeptide folds from a random coil into its characteristic and functional three-dimensional structure. Each protein exists as an unfolded polypeptide or random coil when translated from a sequence of mRNA to a linear chain of amino acids. This polypeptide lacks any developed three-dimensional structure. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein, known as the native state. The resulting three-dimensional structure is determined by the amino acid sequence (Anfinsen's dogma).
Anfinsen's dogma (also known as the thermodynamic hypothesis) is a postulate in molecular biology championed by the Nobel Prize winner Christian B. Anfinsen. The dogma states that, at least for small globular proteins, the native structure is determined only by the protein's amino acid sequence. This amounts to saying that, at the environmental conditions (temperature, solvent concentration and composition, etc.) at which folding occurs, the native structure is a unique, stable and kinetically accessible minimum of the free energy.
The three conditions:
Uniqueness: the sequence must not have any other configuration with a comparable free energy. Hence the free energy minimum must be unchallenged.
Stability: small changes in the surrounding environment cannot give rise to changes in the minimum configuration. This can be pictured as a free energy surface that looks more like a funnel (with the native state at the bottom) than like a soup plate; the free energy surface around the native state must be rather steep and high in order to provide stability.
Kinetic accessibility: the path on the free energy surface from the unfolded to the folded state must be reasonably smooth or, in other words, the folding of the chain must not involve highly complex changes in shape (like knots or other high-order conformations).
How the protein reaches this structure is the subject of the field of protein folding, which has a related puzzle called Levinthal's paradox. The Levinthal paradox states that the number of possible conformations available to a given protein is astronomically large, such that even a small protein of 100 residues would require more time than the universe has existed to explore all possible conformations and choose the appropriate one. On the same basis, it would arguably also make computational prediction of protein structures infeasible, if not impossible.
In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. For example, a polypeptide of 100 residues will have 99 peptide bonds, and therefore 198 (2 x 99, duh) different phi and psi bond angles. If each of these bond angles can be in one of three stable conformations, the protein may adopt up to 3^198 different conformations (including any possible folding redundancy). Therefore, if a protein were to attain its correctly folded configuration by sequentially sampling all the possible conformations on the nanosecond or picosecond time scale, it would require on the order of 10^82 seconds or more to arrive at its correct native conformation (3^198 is about 3 x 10^94 conformations; at one per picosecond that is about 3 x 10^82 seconds). This is vastly longer than the age of the universe (about 4.3 x 10^17 seconds). The "paradox" is that most small proteins fold spontaneously on a millisecond or even microsecond time scale. This paradox is central to computational approaches to protein structure prediction.
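The arithmetic behind Levinthal's estimate can be checked with a short sketch. It assumes, as in the text, three stable states per phi or psi angle, and it additionally assumes a sampling rate of one conformation per picosecond; the function name and the rounded age of the universe are illustrative choices, not part of the original notes.

```python
import math

# Rounded age of the universe in seconds (about 13.7 billion years).
AGE_OF_UNIVERSE_S = 4.3e17

def levinthal_estimate(n_residues=100, states_per_angle=3,
                       seconds_per_conformation=1e-12):
    """Return (n_angles, n_conformations, search_time_s) for an exhaustive search.

    Assumes one phi and one psi angle per peptide bond and a fixed
    sampling time per conformation (an assumption, here 1 ps).
    """
    n_peptide_bonds = n_residues - 1
    n_angles = 2 * n_peptide_bonds
    n_conformations = states_per_angle ** n_angles  # exact Python integer
    search_time_s = n_conformations * seconds_per_conformation
    return n_angles, n_conformations, search_time_s

n_angles, n_conformations, search_time_s = levinthal_estimate()
print(f"phi/psi angles:        {n_angles}")
print(f"log10(conformations):  {math.log10(n_conformations):.1f}")
print(f"log10(search time, s): {math.log10(search_time_s):.1f}")
print(f"longer than the universe has existed: {search_time_s > AGE_OF_UNIVERSE_S}")
```

Changing `seconds_per_conformation` to a nanosecond only makes the search three orders of magnitude slower, so the conclusion does not depend on the exact sampling rate.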
What this means: a protein cannot find its fold by random search, and in the cell many proteins need help to fold. The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded. Failure to fold into the native structure produces inactive proteins that are sometimes toxic. Several neurodegenerative and other diseases are believed to result from the accumulation of amyloid fibrils formed by misfolded proteins. Many allergies are caused by incorrect folding of some proteins, because the immune system does not produce antibodies for certain protein structures.
But wait. It turns out that a protein doesn't just go from its unfolded state to its native/folded state directly. It passes through an intermediate step, where some of the native state is formed, but not fully. This state is called the molten globule. A polypeptide acquires most or some of its correct 2° structure, with the α-helices and β-sheets somewhat or fully formed, but it has a looser 3° structure than the native state: this is the molten globule state. The compaction that is necessary to go from the molten globule state to the final native state occurs spontaneously.
Globular proteins

Every biochemist or molecular biologist who has worked with proteins knows by experience that they are unstable. Slight changes in pH or temperature can convert a solution of biologically active protein molecules in their native state to a biologically inactive denatured state. The energy difference between the folded, native state and the unfolded, denatured state in physiological conditions is small, about 5-15 kcal/mol, not much when you consider that the energy contribution of a single H-bond is ~2-5 kcal/mol.
There are 2 major contributors to the energy difference between the folded and the denatured state: enthalpy (H) and entropy (S). Enthalpy derives from the energy of the noncovalent interactions within the polypeptide chain: the hydrophobic interactions, H-bonds and ionic bonds. The covalent bonds within and between the amino acid residues in the polypeptide chain are the same in the native and denatured states, with the exception of disulfide bonds in those proteins where these form between cysteine residues. Think about that. There are the same number of covalent bonds in a floppy, linear protein as there are in a nicely folded, happy protein. OK, stop thinking about that.
The noncovalent interactions, on the other hand, differ significantly between the two states. In the native state these interactions are maximized to produce a compact globular molecule with a tightly packed hydrophobic core, whereas the denatured state is more open and the side chains are more loosely packed. These noncovalent interactions are therefore stronger and more frequent in the native state, and hence their energy contribution, enthalpy, is much larger. The enthalpy difference between native and denatured states can reach several hundred kcal/mol.
Entropy derives from the 2nd law of thermodynamics, which states that energy is required to create order. Anything to do with needing energy is quite annoying, which is why disorder is preferred. Proteins in the native state are highly ordered in one main conformation, whereas the denatured state is highly disordered, with the protein molecules in many different conformations. A typical sample of unfolded protein (a solution in 6 M guanidinium chloride or 8 M urea) contains 10^15-10^20 protein molecules, each of which will have a unique conformation. They love this. In the absence of compensating factors it would therefore be entropically much more favorable for the protein to be in the disordered denatured state. The energy difference due to entropy between the native ordered state and the denatured state can also reach several hundred kcal/mol, but in the opposite direction to the enthalpy difference.
So the stability of a protein is defined by the difference between two large numbers, the enthalpy difference and the entropy difference. That is why the total energy difference between the native and the denatured state is only 5-15 kcal/mol; this is called the free energy difference (ΔG). The fact that this difference is very small is a severe complicating factor both for predictions of possible native states and for interpretation of the factors responsible for the stability or instability of protein molecules, because our knowledge about the denatured state is very incomplete. You make a small error in the enthalpy or entropy estimate and you change the free energy a lot.
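To see how sensitive this small difference is, here is a minimal sketch using the standard relation ΔG = ΔH - TΔS, written for the unfolding direction so that a positive ΔG means the native state is favored. The numbers 210 and 200 kcal/mol are hypothetical values chosen only to match the orders of magnitude quoted above; they are not measurements.

```python
def free_energy_of_unfolding(delta_h, t_delta_s):
    """Return dG = dH - T*dS for unfolding, all terms in kcal/mol."""
    return delta_h - t_delta_s

# Hypothetical illustrative values (not measured data):
delta_h = 210.0    # enthalpy term, favors the native state
t_delta_s = 200.0  # entropy term (T*dS), favors the denatured state

dg = free_energy_of_unfolding(delta_h, t_delta_s)

# Now suppose the enthalpy estimate is off by just 2%.
dg_with_error = free_energy_of_unfolding(delta_h * 1.02, t_delta_s)
relative_change = (dg_with_error - dg) / dg

print(f"dG                 = {dg:.1f} kcal/mol")
print(f"dG with 2% dH error = {dg_with_error:.1f} kcal/mol")
print(f"relative change in dG: {relative_change:.0%}")
```

With these assumed inputs, a 2% error in one large term shifts the small difference by roughly 40%, which is the point the paragraph makes about estimation errors.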
The marginal stability of the native state over the denatured state is biologically very important. Living cells need globular proteins in correct quantities at appropriate times. It is therefore as important to be able to degrade these proteins easily as it is to be able to synthesize them. Globular proteins in living cells usually have a rather rapid turnover, and their native states have therefore evolved to be only marginally stable. Moreover, the catalytic activities of enzymes, and other important functions of proteins, generally require some structural flexibility, which would be inconsistent with a completely rigidly stabilized structure. Proteins tremble on the brink of instability because that makes them easy to move and change. The more tightly folded a protein is, the more difficult it is to attack and change.
Molten globules are intermediates in folding

The first observable event in the folding pathway of some proteins is a collapse of the flexible, disordered, unfolded polypeptide chain into a partly organized globular state, which we know is called the molten globule.
This event is fast, usually within the dead time of the experimental observation, which is a few milliseconds. We therefore know almost nothing about the process that leads to the molten globule, but we know some of the properties of this state. The molten globule has most of the 2° structure of the native state and in some cases even native-like positions of the α-helices and β-strands. It is less compact than the native structure, and the proper packing interactions in the interior of the protein have not been formed. Also, loops and other elements of surface structure remain largely unfolded, with different conformations. The molten globule should, therefore, not be viewed as a single structural entity but as an ensemble of related structures that are rapidly interconverting.
In a second step, which can last up to 1 second, persistent native-like elements of 3° structure begin to develop, possibly in the form of subdomains that are not yet properly docked. The ensemble of conformations is much reduced compared with that of the molten globule, but it is still far from a single form. The single native form is reached in the final stage of folding, which involves the formation of native interactions throughout the protein, including hydrophobic packing in the interior as well as the fixation of surface loops.
Hydrophobic packing is the DRIVING FORCE.
Burying hydrophobic side chains is a key event
[Figure labels: Hydrophilics, Hydrophobics]
All of this delicate energy balancing means that folding mechanisms are difficult to examine experimentally, since the possible intermediates have a very short lifetime. If kinetic factors are important for the folding process, it is possible that the observed folded conformation is not the one with the lowest free energy but rather the most stable of those conformations that are kinetically accessible. The protein might be kinetically trapped in a local low-energy minimum, which might have a different fold from the global minimum. In such a case structure prediction by energy calculations would give the wrong structure even if such calculations could be made with great accuracy. One important question therefore is how a living cell can prevent the folding pathway from being blocked at an intermediate stage. The most common obstacles to correct folding seem to be (1) aggregation of the intermediates through exposed hydrophobic groups, (2) formation of incorrect disulfide bonds, and (3) isomerization of proline residues. To circumvent these 3 obstacles, cells produce special proteins that assist the folding process.
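The idea of a kinetic trap can be illustrated with a toy one-dimensional energy landscape. This is only a sketch: the energies are made-up numbers, the index stands in for a hypothetical conformational coordinate, and a greedy downhill walk is a crude stand-in for folding kinetics, not a model of real protein dynamics.

```python
def descend(energies, start):
    """Greedy downhill walk: move to the lower neighbor until none is lower.

    Returns the index of the local minimum reached from `start`.
    """
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        best = min(neighbors, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i  # no downhill step left: local minimum
        i = best

# Hypothetical landscape (arbitrary energy units).
landscape = [9, 7, 4, 6, 8, 5, 2, 0, 3]

global_minimum = min(range(len(landscape)), key=lambda j: landscape[j])
trapped = descend(landscape, start=0)  # stops in the shallow well at index 2
native = descend(landscape, start=4)   # reaches the deepest well at index 7

print(f"global minimum at index {global_minimum}")
print(f"walk from 0 ends at index {trapped} (energy {landscape[trapped]})")
print(f"walk from 4 ends at index {native} (energy {landscape[native]})")
```

The walk that starts at index 0 never crosses the barrier at index 4, so it ends in a state that is stable but not the lowest in energy, which is the situation the text describes.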
Back to... burying hydrophobic side chains is a key event

The collapse of the unfolded state to generate the molten globule embodies the main mystery of protein folding. What is the driving force behind the choice of the native 3° fold from a randomly oriented polypeptide chain? There is very little change in free energy from forming the internal hydrogen bonds that are characteristic of α-helices and β-sheets, because in the unfolded state equally stable hydrogen bonds can be formed to water molecules. You have a linear chain with NH and CO groups H-bonded to water, or a 3D structure where these groups are H-bonded to each other in helices and sheets. 2° structure formation therefore cannot be the thermodynamic driving force of protein folding. On the other hand, there is a large free energy change from bringing hydrophobic side chains out of contact with water and into contact with each other in the interior of a globular entity. Thus the most likely scenario is that the polypeptide chain begins to form a compact shape, with hydrophobic side chains at least partially buried, very early in the folding process.
This scenario, collapse driven by hydrophobic interactions, has several important consequences. First, it vastly reduces the number of possible conformations that need to be searched, because only those that are sterically accessible within this compact shape can be sampled. Second, when some of the side chains are partly buried, their polar backbone NH and CO groups are also buried in a hydrophobic environment, where the only hydrogen bonds they can form are to each other, which they can only do if they are close together. The simplest way to form such bonds is by forming elements of 2° structure. The formation of 2° structure early in the folding process can therefore be regarded as a consequence of burying hydrophobic side chains and not as a driving force for the formation of the molten globule.
So, summary: 2° structure formation is NOT the driving force for folding, but forming the hydrophobic core IS the thermodynamic driving force. The core assembles first, followed by the secondary (2°) structure. THIS IS IMPORTANT.
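A common toy model for this idea, not taken from these notes, is the two-dimensional HP lattice model: each residue is either hydrophobic (H) or polar (P), the chain sits on a square grid, and the energy is minus the number of H-H contacts between residues that are lattice neighbors but not adjacent in the sequence. The sketch below scores two conformations of a hypothetical four-residue chain; the sequence and coordinates are illustrative assumptions.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP-lattice conformation: -(number of non-bonded H-H contacts).

    sequence: string of 'H' and 'P'; coords: list of (x, y) lattice points,
    one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("chain must be self-avoiding")
    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

sequence = "HPPH"  # hypothetical mini-chain
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
compact = [(0, 0), (1, 0), (1, 1), (0, 1)]  # ends brought together

print("extended:", hp_energy(sequence, extended))
print("compact: ", hp_energy(sequence, compact))
```

In this model the compact conformation is lower in energy purely because the two hydrophobic residues touch; no hydrogen-bond term is needed, which mirrors the argument that burial of hydrophobic side chains drives the collapse.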
Hydrophobic side chains are usually scattered along the entire sequence in a seemingly random manner. In the native state of the folded protein about ½ of these side chains are buried in the interior and the rest are scattered on the surface of the protein, surrounded by hydrophilic side chains. The buried side chains are not clustered in the sequence but scattered along the entire polypeptide chain. Why are some on the surface of the protein?
Targeting interactions
So, we really need a nice balance on the protein surface. We need hydrophilics on there to interact with water/solvent shells so that the protein is soluble. Remember, these interactions provide nice energy. But we also need hydrophobics to do some targeting work with other systems. It's all balance/compromise.
Both single and multiple folding pathways have been observed

In order to understand fully any folding pathway, all states of the pathway must be characterized both structurally and energetically. During the folding process, the protein proceeds from a high-energy unfolded state to a low-energy native state through metastable intermediate states with local low-energy minima, separated by unstable transition states of higher energy. The characterization of these states is not trivial, and many different experimental techniques are employed, including NMR, hydrogen exchange, spectroscopy and thermochemistry.
Barnase, a small bacterial ribonuclease, has been seen to have an intermediate molten globule state with most of its native 2° structure as well as native-like positions of the α-helix and β-sheet.
Barnase has a single major transition state.
Lysozyme: at 20 ms, 2 major intermediates were found: one where the α-helical domain is well formed but the β-sheet is not, and one which is quite messy. The point is, you can have more than one intermediate.
[Figure: lysozyme folding pathways from fully unfolded to fully folded]
Lysozyme has two intermediate states, unlike Barnase.
Enzymes assist formation of proper disulfide bonds during folding

Disulfide bonds = very important. The formation of correct disulfide bonds during the folding process poses special problems for cells. In the denatured state there are no disulfide bridges; the cysteine residues are reduced and must be oxidized to form disulfide bonds. In bacteria this oxidation occurs mainly in the periplasmic space and is catalyzed by a family of enzymes called disulfide bridge-forming enzymes, Dsb.
In bacteria proteins with disulfide bridges are essentially only found in the periplasmic space and in the outer membrane or secreted. In eukaryotic cells disulfide bond formation occurs in the endoplasmic reticulum before proteins are exported to the cell surface. Here an enzyme called protein disulfide isomerase, PDI, catalyzes internal disulfide exchange to remove folding intermediates with incorrectly formed disulfide bridges. Proteins with disulfide bonds are NOT found in the cytosol but are located in the plasma membrane or are secreted.
You can imagine this is a big deal
What if the protein makes incorrect S-S connections? Mis-folded.
The influence of disulfide bond formation on folding in vitro has been extensively studied by Thomas Creighton at EMBL and Peter Kim at MIT. Creighton's pioneering work introduced the trapping of disulfide-bonded intermediates as a method for studying the folding pathways of proteins. Recent experiments have shown that in vitro results are relevant for the folding process in vivo. Creighton and Kim have studied the small protein bovine pancreatic trypsin inhibitor, BPTI, which has 6 cysteine residues that form 3 disulfide bonds within its polypeptide chain of 58 residues.
The fully reduced protein is largely unfolded and does not fold until the cysteine residues are oxidized to disulfide bridges. In the native, fully folded state these bonds are between cysteine residues 30-51, 5-55, and 14-38. During the folding process, formation of the first disulfide bond is almost random, and all possible single-disulfide species are in rapidly interchanging equilibrium. However, the intermediate with an S-S bond at 30-51 is present in about 60% of the molecules. The productive folding pathway goes through this stable intermediate; all others must rearrange to this one. It has a partly folded conformation comprising the native-like α-helix sticking out into solution and a nice β-sheet, linked by the 30-51 S-S bond. Remember: Native = 30-51, 5-55, and 14-38.
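To get a feel for how many disulfide species are possible, the sketch below enumerates the pairings of BPTI's six cysteines (positions 5, 14, 30, 38, 51 and 55, taken from the native pairs listed above). The helper `perfect_pairings` is a hypothetical name introduced here for illustration; the counting is plain combinatorics and says nothing about which species are actually populated.

```python
from itertools import combinations

CYSTEINES = [5, 14, 30, 38, 51, 55]
NATIVE = {(5, 55), (14, 38), (30, 51)}

# Every possible single disulfide bond: choose 2 of the 6 cysteines.
single_disulfides = list(combinations(CYSTEINES, 2))

def perfect_pairings(residues):
    """Yield every way of pairing up all residues into disulfide bonds."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_pairings(remaining):
            yield [(first, partner)] + tail

three_disulfide_species = list(perfect_pairings(CYSTEINES))
native_matches = [p for p in three_disulfide_species if set(p) == NATIVE]

print(f"single-disulfide species: {len(single_disulfides)}")
print(f"three-disulfide species:  {len(three_disulfide_species)}")
print(f"of which native:          {len(native_matches)}")
```

Only one of the fully oxidized pairings is the native one, which is why a pathway that funnels most molecules through the 30-51 intermediate matters.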
The second S-S bonds formed from the 30-51 intermediate are between all 3 possible pairs of flexible cysteines, to form 5-14, 5-38, or 14-38. The first two are non-native and occur primarily because they are in flexible parts of the polypeptide chain. The third is native (the 14-38 S-S bond). So you would imagine that the native one directs what happens next. But noooooo: look at where Cys 5 and Cys 55 are. Miles apart. This thing is trapped. Native = 30-51, 5-55, and 14-38.
The whole thing undergoes some massive rearrangement. The 5-55 S-S bond gets formed, to bring 14 and 38 closer together.
Not a sequential process: the protein flips within itself by breaking bonds to get the final arrangement. At this stage there is lots of intramolecular rearranging.
[Figure labels: Predominant, Not Predominant]
The trans peptide, with C=O and NH groups pointing in opposite directions, is about 1000 times preferred over the cis peptide. However, when the second residue is a proline, the cis conformer is only about 4 times less stable. Proline peptide bonds are mainly in the trans conformation, but the cis form is found in tight turns and is sometimes essential for conformational flexibility. In the native protein the cis proline arrangements are stabilized by 3° structure interactions, but in the unfolded state there is an equilibrium between cis and trans isomers. When folding occurs, you can get the proline peptide bond in the wrong form. From a kinetic standpoint, cis-trans proline isomerization is a very slow process that can impede the progress of protein folding by trapping one or more proline residues crucial for folding in the non-native isomer, especially when the native protein requires the cis isomer. More prolines, more chance. As noted, cis-trans isomerization is a slow process and in vitro is often the rate-limiting folding step. NEED to get things in the right conformations!
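The preference ratios above translate directly into populations and free energy differences. The sketch below assumes the ratios are equilibrium trans:cis population ratios at 298 K, and, for the "more prolines, more chance" point, that each proline isomerizes independently and that the native isomer of every proline is trans; these are simplifying assumptions, not statements from the notes.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def cis_fraction(trans_over_cis):
    """Equilibrium fraction of cis isomer given the trans:cis ratio."""
    return 1.0 / (1.0 + trans_over_cis)

def delta_g_kcal(trans_over_cis, temperature_k=298.0):
    """Free energy gap (kcal/mol) corresponding to a population ratio."""
    return R_KCAL * temperature_k * math.log(trans_over_cis)

def prob_all_native(n_prolines, p_native_each):
    """Chance that every proline is in its native isomer (independence assumed)."""
    return p_native_each ** n_prolines

ordinary_cis = cis_fraction(1000)  # ordinary peptide bond
proline_cis = cis_fraction(4)      # peptide bond preceding proline

print(f"cis fraction, ordinary bond: {ordinary_cis:.4f}")
print(f"cis fraction, X-Pro bond:    {proline_cis:.2f}")
print(f"dG for 1000:1 = {delta_g_kcal(1000):.1f} kcal/mol")
print(f"dG for 4:1    = {delta_g_kcal(4):.1f} kcal/mol")
for n in (1, 2, 4, 8):
    p = prob_all_native(n, 1.0 - proline_cis)
    print(f"{n} prolines: P(all trans) = {p:.2f}")
```

Under these assumptions a chain with several prolines frequently starts folding with at least one of them in the wrong isomer, which is consistent with isomerization often being rate limiting in vitro.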
In vivo, rates of this process are enhanced by enzymes initially called peptidyl-prolyl isomerases. The first one found, cyclophilin, increases the rate of cis-trans isomerization of proline peptides by a factor of a million.
Proteins can fold or unfold inside chaperonins

Before protein molecules attain their native folded state they may expose hydrophobic patches to the solvent. Isolated purified proteins will aggregate during folding even at relatively low protein concentrations. Inside cells, where there are high concentrations of many different proteins, aggregation can occur during the folding process. This is prevented by molecular chaperones, ubiquitous and abundant families of proteins that assist the folding of both nascent polypeptides still attached to ribosomes and released, completed polypeptide chains. These molecular machines use chemical energy, in the form of adenosine triphosphate (ATP), to promote protein folding in all cells. The structure of the chaperonins resembles two donuts stacked on top of one another to create a barrel. Each ring is composed of either 7, 8 or 9 subunits, depending on the organism in which the chaperonin is found.
Chaperonins bind partly folded and incorrectly folded protein molecules but not proteins in the native state. They are promiscuous in that they bind to and assist the folding of a large number of different proteins, independent of the latter's amino acid sequences. Two types:
Group II
Group II chaperonins, found in the eukaryotic cytosol and in archaea, are more poorly characterized. Forget these for now.
Group I
Group I chaperonins are found in bacteria as well as in chloroplasts and mitochondria. The GroEL/GroES complex in E. coli is a Group I chaperonin and the best characterized large (~1 MDa) chaperonin complex. GroEL is a double-ring 14-mer with a greasy hydrophobic patch at its opening and can accommodate the native folding of substrates 15-60 kDa in size. GroES is a single-ring heptamer that binds to GroEL in the presence of ATP. It's like a lid that covers GroEL (the box/bottle). GroEL/GroES may not be able to undo protein aggregates, but kinetically it competes with the pathway of misfolding and aggregation, thereby preventing aggregate formation.
GroEL. "Top view" C-alpha trace colored by crystallographic B-factors. Red indicates high mobility; blue low mobility. The inside of the apical domains, where unfolded substrates are thought to bind, is the most mobile.
GroES. "Top view" C-alpha trace colored by crystallographic B-factors. Only one of seven mobile loops is visible at left.
Crystal structure of the E. coli GroEL and GroES complex. (a) The overall structure of the GroEL/GroES complex. The GroES molecule is represented by an orange surface. One GroEL monomer in the GroEL top (cis) ring is displayed as a ribbon, with the apical, intermediate, and equatorial domains colored in blue, green, and red, respectively. The rest of the top (cis) ring and the entire bottom (trans) ring are shown as grey and cyan surface representations, respectively. Ribbon representation of the GroEL heptameric bottom (trans) ring. Each GroEL monomer is shown in a different color.
HOW IT WORKS: Unfolded proteins bind to hydrophobic surfaces in the central cavities of either GroEL ring. ATP and GroES bind to GroEL forming a cap over the protein-containing cavity and simultaneously causing a conformational change in GroEL that sequesters the hydrophobic surfaces. This releases the protein into the cavity where it is allowed to fold into its native structure as dictated by the primary amino acid sequence. Discharge of the protein into the bulk solvent may occur only when ATP and GroES bind to the opposite ring of GroEL, triggering an unfavorable ring-ring interaction that leads to dissociation of the first GroES and release of the folded protein.
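The sequence of events above can be written down as a small state machine for a single substrate protein. This is a deliberately simplified sketch: the state and event names are hypothetical labels invented here, folding inside the cavity is folded into the final transition, and nothing about rates or ATP hydrolysis is modeled.

```python
from enum import Enum, auto

class SubstrateState(Enum):
    FREE_UNFOLDED = auto()    # in solution, hydrophobic patches exposed
    BOUND_TO_GROEL = auto()   # held on the hydrophobic surface of a GroEL ring
    ENCAPSULATED = auto()     # released into the capped cavity, free to fold
    RELEASED_FOLDED = auto()  # discharged into the bulk solvent

# Allowed transitions, keyed by (current state, event). Event names are illustrative.
TRANSITIONS = {
    (SubstrateState.FREE_UNFOLDED, "bind_hydrophobic_surface"):
        SubstrateState.BOUND_TO_GROEL,
    (SubstrateState.BOUND_TO_GROEL, "atp_and_groes_bind_same_ring"):
        SubstrateState.ENCAPSULATED,
    (SubstrateState.ENCAPSULATED, "atp_and_groes_bind_opposite_ring"):
        SubstrateState.RELEASED_FOLDED,
}

def run_cycle(events, state=SubstrateState.FREE_UNFOLDED):
    """Apply events in order and return the list of states visited."""
    history = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not possible from {state.name}")
        state = TRANSITIONS[key]
        history.append(state)
    return history

history = run_cycle([
    "bind_hydrophobic_surface",
    "atp_and_groes_bind_same_ring",
    "atp_and_groes_bind_opposite_ring",
])
print(" -> ".join(s.name for s in history))
```

Writing it this way makes the ordering constraint explicit: in this simplified picture the substrate cannot be discharged until ATP and GroES have bound to the opposite ring, and events out of order raise an error.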