Folding and misfolding of the papillomavirus E6 interacting peptide E6ap

  1. Bianxiao Cui*,
  2. Min-Yi Shen*, and
  3. Karl F. Freed
  1. James Franck Institute and Department of Chemistry, University of Chicago, Chicago, IL 60637
  1. Communicated by R. Stephen Berry, University of Chicago, Chicago, IL, February 27, 2003 (received for review October 7, 2002)

Abstract

All-atom Langevin dynamics simulations have been performed to study the folding pathways of the 18-residue binding domain fragment E6ap of the human papillomavirus E6 interacting peptide. Six independent folding trajectories, with a total duration of nearly 2 μs, all lead to the same native state in which the E6ap adopts a fluctuating α-helix structure in the central portion (Ser-4–Leu-13) but with very flexible N and C termini. Simulations starting from different core configurations exhibit the E6ap folding dynamics as either a two- or three-state folder with an intermediate misfolded state. The essential leucine hydrophobic core (Leu-9, Leu-12, and Leu-13) is well conserved in the native-state structure but absent in the intermediate structure, suggesting that the leucine core is not only essential for the binding activity of E6ap but also important for the stability of the native structure. The free energy landscape reveals a significant barrier between the basins separating the native and misfolded states. We also discuss the various underlying forces that drive the peptide into its native state.

Computer simulations provide a powerful tool for elucidating the equilibrium and folding dynamics of biological macro-molecules (1). Most all-atom studies of protein dynamics during the last decade are limited to describing the short-time conformationally rather restricted molecular motions near the native state (14) or selected unfolded states (59), where molecular dynamics simulations are quite adequate. On the other hand, a complete understanding of more complex protein folding requires elucidating the detailed mechanistic dynamics of the initial formation and subsequent evolution of the protein structures, possible misfolding and recovery, and the final transition to the native state. Simulations of the molecular mechanism for this dynamical evolution toward the native protein structure have not been possible in full atomistic detail because of the immense number of integration steps (≈109 for a 1-μs simulation), the complexity of the potential energy surfaces (≈106 interactions for each step), and the huge range of configurations involved. Many recent all-atom simulations are devoted to protein folding (1024), and some display general characteristics of folding (1824). However, the majority of these studies focus on constructing free energy surfaces (1019), an equilibrium property that can be obtained by using a number of shorter simulations that are insufficient for exhibiting the full dynamical mechanism of protein folding. Hence, there is an obvious need for greatly extending the accessible time scales for individual trajectories in molecular simulations to approach the long times characteristic of protein folding. However, as stated by Snow et al. (24), “performing a molecular simulation for 10 μs... for the protein BBA5 (23 residues) would require decades for a typical modern CPU.”

In this work, we apply faster validated implicit solvent simulations to describe the all-atom folding dynamics, including misfolding and recovery events, of a small helical protein on the microsecond time scale by using garden-variety personal computers. Because all-atom explicit solvent molecular dynamics (MD) simulations must involve computing the forces acting on each atom of both the solute and solvent, >95% of the computation is dedicated to the solvent molecules (25, 26), thereby severely limiting the time scales accessible with MD simulations. Hence, implicit solvent models (27, 28) are widely used to reduce the enormous computational times in simulations, because only the degrees of freedom of the solute are explicitly considered. Although a large number of implicit solvent models are used, very few of them have been directly compared with the corresponding explicit solvent simulations. According to our recent screening (29, 30) of numerous models, omission or poor representation of the following factors will result in false dynamics in aqueous solution: (i) a proper scaling of frictional and random forces; (ii) dielectric screening; and (iii) hydrophobic and other protein–solvent interactions. The Shen–Freed (SF) solvation model, on the other hand, has been validated by reproducing fairly accurately the results of explicit solvent simulations for Met-Enkephalin dynamics (29) and for the initial stages of folding of the villin headpiece HP-36 (30). This Langevin dynamics (LD)-based method describes the influence of the solvent in a mean-field representation by using a distance-dependent dielectric constant, atomic friction coefficients computed from an accessible surface area (ASA) model, and an ASA-based solvation potential to describe the solvation free energy. Although approximate and subject to further improvement, the SF model provides a fast and workable model that compares quite favorably with explicit solvent MD simulations but achieves this agreement with two to three orders of magnitude reduction in computation time [after several performance-enhancing algorithms (31)]. The SF model requires only the input of the solvent viscosity, dielectric constant, and surface solvation parameters. This approach is unique among solvation models, because the temperature dependence of all three quantities can be determined precisely, making the SF model suitable for temperature-dependent studies.

We have studied the dynamics and the possible folding intermediate of the E6ap peptide by using the SF implicit solvent method and single trajectories as long as 1 μs. E6ap is an 18-residue protein fragment and also is the binding domain of a much larger human cellular protein E6AP (E6-associated protein) (32), which is central to the oncogenic activity of the high-risk human papillomavirus (HPV) E6 protein. The viral E6 protein targets and interferes with the normal function of key cellular proteins that regulate cell proliferation and differentiation (33). For example, E6 binds to the cellular tumor suppressor p53, which is important in cell cycle control and apoptosis in response to DNA damage. Binding of high-risk HPV E6 with p53 appears to be mediated by another cellular protein, E6AP, which contains ubiquitin–protein ligase activity in its C-terminal Hect domain (34). The binding complex between the human E6AP and the virus E6 protein results in the specific ubiquitation and subsequent degradation of p53 (35, 36), which accounts, in part, for the oncogenic action of HPV E6. Located in the central portion (391–408 in the amino acid sequence) of E6AP, the small binding domain fragment E6ap is both necessary and sufficient for binding to the viral E6 protein (32, 36). Deletion of these 18 amino acids from E6AP results in loss of both E6 and p53 binding activities.

Be et al. (37) have synthesized E6ap and determined its structure in 40% tetrafluoroethanol (TFE) aqueous solution by 2D NMR spectroscopy. The solution structure is a slightly skewed α-helix with flexible termini. Three leucine residues (Leu-9, Leu-12, and Leu-13) reside on the same side of the helix surface and form a leucine hydrophobic cluster essential for E6-binding activity, as demonstrated by site-directed mutagenesis, because substitution of any of the three core leucines with alanine completely destroys the binding activity. The 3D structure of E6ap obtained from NMR experiment is displayed in Fig. 1 a. The overall NMR structure is a single α-helix strand comprising most of the residues, with a well defined central part from residues 4–13 (37). However, we emphasize that this NMR structure has been determined in a solution with 40% TFE, a cosolvent that stabilizes α-helices in aqueous solution. Although CD measurements indicate that the helical content of E6ap is only ≈20% in pure aqueous solution, implying a rather poor helix former, Be et al. (37) suggest from their mutational analysis that the helical conformation of E6ap is essential when binding to the viral E6 protein.

Fig. 1.

(a) Representative NMR structure of E6ap in 40% TFE aqueous solution as determined by Be et al. (37). (b–e) Typical conformations adopted by E6ap in the LD simulation trajectories beginning from the NMR structure.


The mechanisms whereby E6AP binds to the E6 protein, which is essential in the E6-mediated degradation of p53, are of obvious interest. Because the structure and dynamics of the binding domain fragment E6ap may provide insight into the interaction between E6 and its interacting proteins E6AP, our study of E6ap using LD simulations is designed for the understanding of the structure and dynamics of E6ap in atomic detail. The next section briefly describes the implicit solvent model and simulation methods. Simulations for the stability of the E6ap native state and for the folding dynamics are presented in Stimulations of E6ap Dynamics, whereas Free Energy Landscape and Folding Path describes the free energy surfaces for the E6ap folding process.

Implicit Solvent LD Simulation Methods

The LD simulations take the total system energy U total = U b + U bend + U tors + U imp - tors + U ch(ε(r)) + U vdW + U solv(σ) as the sum of the usual types of interaction potentials between the protein atoms, whereas the solvent contributions are modeled by using a spatially dependent dielectric constant to screen charge–charge interactions U ch(ε(r)) and a solvation potential U solv(σ). The bonding interactions U b, bond–bond bending interactions U bend, and improper torsional energies U imp - tors are modeled by harmonic potentials, the regular torsional potentials U tors by standard periodic functions, and the van der Waals interactions by Lennard–Jones 6–12 potentials. The Coulomb interactions U ch(ε) = Σi > j q i q j/ε(r ij)r ij are expressed in terms of atomic partial charges q i and a Ramstein–Lavery style (38) distance-dependent dielectric constant ε(r). The microscopic solvation potential is modeled by using the Ooi–Scheraga solvent accessible surface area potential (39), Formula, where σi is the accessible surface area of a hypersurface bisecting the first solvent shell surrounding protein atom i, and the empirical surface free energy parameters g i depend on the atom type. Because the g i are free energy parameters, the U total generates a temperature-dependent free energy that contains contributions from solvent reorientation within a mean-field approximation.

The LD simulations use the velocity Verlet algorithm (40) with a time step of Δt = 2.5 fs for integrating the equations of motion for the protein atom positions and velocities. The computations are performed by using a modified version of the tinker 3.9 molecular design package (41) and the standard amber94 force field (42). The frictional forces and corresponding random forces acting on the protein atoms are computed by using the Pastor–Karplus accessible surface area model (43). The solvent accessible surface areas σi for the friction coefficients are calculated from the exposed surface area of the solute atom by using a probe of zero radius. The smaller probe size for friction coefficients is used to cancel effectively the results of (more expensive to calculate) hydrodynamic interactions. The accessible surface areas, atomic friction coefficients, and solvation potentials are updated every 100 dynamical steps (0.25 ps), because tests show that this approximation incurs negligible error because significant conformational variations occur on a much longer time scale (29). No cutoff is applied on the forces between nonbonded atoms.

All calculations are performed on a single central processing unit (CPU) (Pentium 4, 1.6 GHz/256 Mb), with an output of ≈30 ns per CPU day for E6ap whose sequence is IPESSELTLQELLGEERR. Before starting each simulation, the initial conformation is energy minimized with a steepest descent algorithm to remove close contacts that would otherwise be present on cooling the initial configuration to 300 K (see Simulations of E6ap Dynamics). A constant temperature is maintained by using a Berendsen-type (44) thermal bath coupling throughout the simulation, and the data are saved every 4,000 steps (10 ps) for subsequent analysis.

Simulations of E6ap Dynamics

Before beginning the folding simulations of E6ap, we first test its stability by conducting LD simulations at 300 K that start with a representative NMR structure in the presence of 40% TFE. An ensemble of 24 NMR solution structures is obtained from the Protein Data Bank (PDB ID code 1EQX). Of this group of structures that all satisfy the NMR nuclear Overhauser effect constraints within a defined tolerance, a single representative native structure is selected by using the nmrclust module (45) from the olderado (On-Line Database Ensemble Representatives And DOmains). Among the 24 NMR structures, structure 16 is chosen to be the representative structure for comparisons. We have run two independent 50- and 100-ns trajectories beginning from the same NMR structure but with different random seeds for the initial velocity distribution and random forces.

The similarity of the protein structure in a simulation to its native conformation is characterized by the rms deviation (rmsd), which measures the minimized rms distance of each backbone Cα atom from its native structure. Because the structure of E6ap in water is currently unavailable from experiment, the NMR structure in the mixed solvent is chosen as the “native” conformation solely to have a fixed reference for computing the rmsd. The backbone rmsd between the instantaneous and NMR native protein conformations, for both the whole peptide and the central segments (residues 4–13), increases initially, reaches a constant after ≈2 ns, and remains close to that level throughout the simulation. The average of the saturated limit of the rmsd for the central segments (1.5 Å) is much smaller and exhibits less fluctuations than that for the whole peptide (≈3.8 Å), which suggests that the average structure of the central segments is more stable, whereas the two ends are very mobile. Nevertheless, these values for the rmsd indicate the expected departure of the conformation in aqueous solution from the “native” NMR structure in 40% TFE solution.

Representative conformations adopted by E6ap are shown in Fig. 1 b–e. Compared to the initial NMR structure in 40% TFE solution (in Fig. 1 a), the E6ap peptide retains a fluctuating α-helix structure in the central portion (Ser-4–Leu-13), where the essential leucine patch is well conserved throughout the two trajectories. However, both the N and C termini are dynamically very flexible. The equilibrium fluctuations of the helical content occur quite rapidly, normally within 1 ns, which suggests a highly labile helical dynamics in E6ap. As shown in Fig. 1 b–e, each terminus can extend the helix structure either by forming a backbone hydrogen bond or by flipping back to interact with the central portion through side-chain interactions and therefore failing to produce a discernible secondary structure. Also, it appears that the back-and-forth flippings of the N and C termini are quite independent.

To study the E6ap folding mechanisms and the possible folding pathways and intermediates, LD folding simulations are conducted with E6ap starting from completely denatured states that are obtained by first running LD simulations at a high temperature (750 K) for about 1 ns each. After the 1-ns denaturation, the structures are quenched and free energy minimized at 300 K to serve as the initial conformations for the LD folding simulations. (Note that this method of denaturation is more extreme than any accessible in experiments.) Six independent folding trajectories, with time durations of 104, 105, 143, 179, 281, and 1,004 ns, respectively, have been generated starting from different initial states. The longest 1-μs trajectory is used to construct the relative free energy plots in Free Energy Landscape and Folding Path.

The denatured conformations are fairly extended, and all secondary-structure and native contacts are removed after heating at 750 K for 1 ns. The essential leucine hydrophobic core (Leu-9, Leu-12, and Leu-13) may either be preserved or destroyed after heating. At the end of the simulations, all six folding trajectories exhibit a loosely fluctuating central helix and two flexible tails, similar to the trajectories that begin from the NMR structure in 40% TFE. However, the folding paths for the simulations that begin with configurations that lack a leucine core are qualitatively different from the other simulations, as discussed below.

To elucidate the dynamics of the fluctuating central helices in pure aqueous solution, an analysis of the average helical content is performed by using a time average computed from the 1-μs trajectory. A residue is defined as being in the α-helix state at time t if the (ϕ, Ψ) angles fall within an angular range ±30° about (-57°, -48°), and the residue forms an (i, i + 4)-type backbone hydrogen bond with other residues. The second criterion is used to exclude cases where the instantaneous (ϕ, Ψ) angles form a right-handed α-helical structure without having structural support from hydrogen bonds. To compute the average helical content from the LD trajectory, it is convenient to define a variable ξμ(t i) that is unity if residue μ is in a helical state at the ith time step t i and that vanishes otherwise. The average of ξμ(t i) over t i and μ gives Formula where n is the number of residues, T is the number of time frames in the trajectory, and μ is the average helical content of the μth residue. Fig. 2 presents μ as a function of residue μ by using the last 500 ns of the 1-μs trajectory (to assure that E6ap is well equilibrated). The flexible nature of the helical structure of E6ap in pure aqueous solution is quite evident from Fig. 2, which exhibits residues Leu-9 and Leu-12 as a center of helical stability, Glu-11 as a center of minimal helical stability, residues Pro-2 through Glu-6 as weakly helical, and the C-terminal residues Gly-14 through Arg-17 as virtually devoid of helical content. Using this stringent definition for the α-helix, the average helical content emerges as 22.3% and is lower than the helical contents deduced from other popular definitions [31.5% with the Kabsch–Sander definition (46), 39.5% with a criterion depending only on Ψ/ϕ (26)]. The amber94 force field is known to overemphasize helical propensities, and the simulation temperature of 300 K is higher than the experimental condition of 288 K, where the helical propensity would be lower. Therefore, combining the experimental CD helical measurement (20%) and the simulated value (22–40%), we conclude that the E6ap adopts a fluctuating helical structure in pure water. Moreover, computations of the correlation time τμ for survival of helical content in the μth residue support the fluctuational description of a loose central helix. Fig. 2 displays τμ in the nanosecond range but with no correlation between τμ and μ. For example, Thr-8 has the third largest μ but one of the smallest τμ, whereas Glu-3, Glu-6, and Leu-7 exhibit the largest τμ but only moderate μ, suggesting that the residues participate in a stable helix that forms rarely but persists relatively long. The high helical stability of Leu-9 and Leu-12 is vital for the p53-binding activity of E6ap, because this stability maintains the structural integrity of the leucine-triplet patch that is required for binding to E6. Despite the low average helicity of E6ap in pure aqueous solution, this LD simulation displays the structure of the binding site as well preserved with relatively high local helix propensity.

Fig. 2.

The average residual helical content μ (circles) in E6ap and helical survival correlation time τμ (triangles).


Further analysis of the dynamics proceeds by viewing the time evolution of the radius of gyration (R g) and the backbone rmsd for both the whole peptide and for the central segments (residues 4–13). In Fig. 3a (for one of the five folding trajectories), the rmsd curves initially drop sharply during the first few nanoseconds and then fluctuate around the same constant values (3.7 Å for the whole peptide and 1.5 Å for the central segments), as in the simulations beginning from the NMR structure in 40% TFE solution. The time course of the R g exhibits behavior similar to the rmsd and saturates around an average of 9 Å. However, the rmsd curves in Fig. 3b (misfolding trajectory) initially increase to much larger constant values (7 Å for the whole peptide and 4 Å for the central segment) and remain there for >80 ns. Then, at ≈85 ns, they rapidly drop (in 3 ns) to the equilibrium values as in the stability simulations. The curves for R g in Fig. 3b initially decrease to a smaller value (≈8 Å) and then suddenly increase back to the equilibrium value (≈9 Å) at around 85 ns. Note that except for the first few nanoseconds, the fluctuations in R g are anticorrelated with those in the rmsd, which means the native state is not the most compact state, in contrast with the situation generally found for large proteins.

Fig. 3.

(a and b) The time evolution of the radius of gyration and the backbone rmsd for whole chain and residues 4–13 for the folding and misfolding simulations, respectively. (c and d) The time evolution of the fraction of native contact P (dots) and the probability for the presence of native contacts Q (lines) for folding and misfolding simulations, respectively.


The folding process can be described more quantitatively by considering the time course of the fraction of the native contacts Q. Two residues are considered to be in contact when the distance between their α-carbons is within 6.5 Å, and they are separated by at least two residues in the sequence (to remove interferences from bonding constraints) (47). This conventional definition of the fraction of native contacts suffers from some shortcomings. By definition, Q can equal only discrete rational values n/[total number of native contacts], where n is an integer from zero to the total number of native contacts. The time course of Q along a single trajectory is very noisy when the total number of native contacts is small (only 38 for E6ap), and the information conveyed is partially obscured by the large fluctuations. Hence, we introduce a new parameter P as a measure of the average probability for the presence of native contacts. Instead of a sharp cutoff 6.5 Å for the contact distance as in the calculation of Q, the native contact probability for residues i and j is defined by a Gaussian distribution with mean Formula and with a spread proportional to the interparticle distance Formula in the native structure. Thus, the average total probability of finding native contacts at time t is defined as Formula where r ij(t) is the α-carbon distance (in angstroms) at time t between residue i and residue j (|i - j| ≥ 2), and N is the total number of residue pairs treated. α is an empirical parameter that controls the penalty for deviations from the mean separation distance in the native state, and a value of 14.0 for α is obtained by minimizing the difference between P and Q. As shown below, P is continuous, less scattered, and provides much clearer information than Q for the structural details arising from long-distance (>6.5 Å) contacts.

Fig. 3 c and d displays the fraction of native contacts Q and the average probability of native contacts P as a function of time for a typical folding trajectory and for the misfolding trajectory. The behavior of the parameter P agrees very well with Q but contains significantly reduced fluctuations, so attention is focused on P for the remainder of the discussion. During the folding trajectory, P quickly grows from 0.2 to 0.7 within the first 10 ns and then fluctuates between 0.6 and 0.8 ns for the rest of the trajectory, which accords well with the initial decrease and the subsequent dynamics of the rmsd in Fig. 3a. P for the stability trajectories (not shown) exhibits similar fluctuations between 0.6 and 0.8 ns. On the other hand, P for the misfolding trajectory experiences an initial rapid decrease to 0.3 ns, then slowly but steadily grows back to ≈0.7 ns, and at ≈90 ns jumps in a similar although less dramatic behavior than the rmsd but with several nanoseconds lag. Because the time evolution of P more or less follows the side chain dynamics and because the rmsd tracks only the backbone movement, the difference in their time courses suggests that the side chain movement is slow and continuous, whereas alterations in the backbone structure are more discrete and occur on a shorter time scale.

The behavior of both the rmsd and P implies that dramatic changes occur in the E6ap conformation at ≈85 ns in the misfolding trajectory. Typical spatial configurations for E6ap before 85 ns (the misfolded structure) and the NMR structure are depicted in Fig. 4, with the core residues Leu-9, Leu-12, and Leu-13 displayed in atomic detail. Although the NMR structure in 40% TFE solution has a single central helix with the leucine core on the same side of surface, the misfolded structure is composed of two smaller helices located at separate arms of a hairpin-like structure. Also evident in Fig. 4 is the absence of the leucine hydrophobic core in the misfolded helical hairpin, with the three core leucine residues situated at different sides of the backbone chain and especially with Leu-9 not involved in either of the two helices. The role of the leucine triplet is better illustrated in terms of solvent accessible areas. The misfolded state has a particularly large surface area of 322.5 Å2 for Leu9-12-13, nearly as large as the 323.8 Å2 for the unfolded state, and much greater than the 298.1 Å2 for the native structure. This surface burial of the leucine triplet and the favorable Lennard–Jones interactions on clustering not only stabilize the native state but also guide the misfolded structure toward it.

Fig. 4.

Misfolded (a) and NMR (b) structures, with the core residues Leu-9, Leu-12, and Leu-13 displayed in atomic detail.


Free Energy Landscape and Folding Path

The misfolding trajectories display the peptide as initially trapped in a misfolded state before folding back into the somewhat loose native state. To elucidate the nature of the intermediate misfolded state more fully, we conduct a long trajectory (≈1 μs), beginning with the same initial conformation as in the previous misfolding simulation but with a different random number seed for the initial velocity distribution and random forces. The time course of the 1-μs trajectory exhibits the same initial trapping (rmsd4–13 ≈ 3.5 Å) in the misfolded state, although the transition time varies slightly, with a later return to the equilibrium state (rmsd4–13 ≈ 1.5 Å). Notably, the two misfolding trajectories both display a conversion time from the misfolded to folded states at ≈80–100 ns. Subsequent to the initial trapping before 90 nsec, the rmsd for the long trajectory increases again toward the misfolded state value before finally settling down to that characteristic of the native state central portion value at t ≈ 350 ns. The protein conformation at t ≈ 300 ns appears to be similar to that at ≈70 ns, as displayed in Fig. 4 b, which indicates that the peptide revisits the same misfolded state reversibly.

A full understanding of the folding mechanism requires a global overview of the free energy landscape. However, it is difficult to construct the free energy surface for several reasons. First, the free energy depends on a large number of variables. To deduce meaningful information, the multidimensional surface is generally projected onto some suitable folding coordinates (9, 15). Second, to compute the free energy, a complete sampling of conformational space is necessary. Although umbrella-sampling (48, 49) and multicanonical or entropy-sampling techniques (5, 7) appear very promising, a large number of simulations are required just to map out the structure of the free energy surface near the native structure basin.

We have computed the relative free energy for the folding free energy landscape of the E6ap peptide. The relative free energy is taken to be proportional to the logarithm of the probability for finding the peptide with specific values of the chosen folding coordinates (2123) during the long 1-μs trajectory. The folding coordinates are the rmsd4–13 and R g, whose instantaneous values during the folding process are projected onto a 2D contour map. Even though all the folding simulations yield the same free energy basin as in the simulations that begin from the NMR structure, only the 1-μs folding trajectory is long enough that the system crosses the barrier between folded and misfolded free energy basins more than once. Hence, the following relative free energy plots are understood as being only qualitative.

Fig. 5 displays the contour plot of the free energy surface as a function of the rmsd and R g for the 1-μs trajectory. The large and flat native basin at rmsd4–13 ≈ 1.5 Å and R g ≈ 9 Å for the equilibrium simulation (not shown) indicates a flexible and dynamical native state. The folding trajectory produces a single energetic descent toward the native basin. Two free energy minima are clearly evident in the free energy surface for the 1-μs misfolding simulation (Fig. 5). In addition to the same minimum at rmsd ≈ 1.5 Å and R g ≈ 9 Å, another minimum exists at rmsd4–13 ≈ 3.5 Å and R g ≈ 7.5 Å, thereby demonstrating the existence of a metastable state with relatively lower R g and larger rmsd.

Fig. 5.

Relative free energy contour map as a function of the rmsd and R g for 1-μs folding–misfolding trajectory. Inset highlights the native basin. The free energy difference between contours is 0.4 kcal/mol, and the barrier height is ≈2 kcal/mol.


Further inspection of the energy components at the misfolded and folded free energy basins provides better insight into the underlying forces during the folding-misfolding transition. The total potential energy for the long 1-μs simulation displays an overall descent throughout the simulation, varying sharply at the critical time of the misfolding–folding transition. The Coulomb and solvation energies, however, exhibit quite different behaviors. The solvation free energy is positively correlated whereas the Coulomb energy negatively correlated with respect to the behavior of rmsd. The instantaneous peptide conformation in the 1-μs trajectory is classified as being in the misfolded or native free energy basins if its R g and rmsd values fall into the ranges described Table 1. Table 1 also presents average values of the total energy, Coulomb energy, and solvation energy for E6ap in the misfolded and native free energy basins. The average total energy of the folded state is ≈7.3 kcal/mol lower than that of the misfolded state. The average solvation free energy is 2.4 kcal/mol lower, whereas the average Coulomb energy is 6.0 kcal/mol higher for misfolded than for the native state. The lower solvation energy is as expected because of the more compact structure of the misfolded state (see Fig. 4 b). The excess electrostatic energy in the folded state is mainly due to the formation of more backbone hydrogen bonds in the core α-helix, and this energy contribution finally drives the peptide back to the folded structure.

View this table:
Table 1. Comparison of the energy components for the native and misfolded state free energy basins

Discussion

We have examined the dynamics of the folding of the E6ap peptide by using implicit solvent LD simulations. Our results indicate that: (i) E6ap adopts a flexible helical structure in the LD simulation, and the leucine hydrophobic core lies on the same face of the helix, both of which agree well with NMR experiments; (ii) the helical content derived from the long time simulation is 22–40%, which overestimates, as expected, the 20% from CD measurements and suggests the reliability of this description; (iii) the central leucine core is not only crucial for its biological function but also important for the stability of native structure; (iv) starting from different initial configurations, there are multiple folding pathways leading to the native state, and E6ap can behave either as a two- or three-state folder; (ν) the free energy surface contains a significant barrier between the native and the misfolded state basins; and (vi) the Coulomb energy gained in forming backbone hydrogen bonds is the main driving force for the peptide to fold back into its native structure. The richness observed in the diversity of folding pathways for this simple 18-residue peptide suggests that perhaps even greater richness is to be found from simulations for the folding of larger and more complex proteins through computations that are enormously facilitated by the use of accurate, validated, implicit solvent models and by continually increasing computational power.

Acknowledgments

We thank Prof. Tobin Sosnick for many helpful suggestions. B.C. is supported in part by National Science Foundation Grant NSF-9977841. M.-y.S. and K.F.F. acknowledge the support of the National Institutes of Health (Grant GM56678).

Footnotes

  • To whom correspondence should be addressed. E-mail: k-freed{at}uchicago.edu.

  • * B.C. and M.-y.S. contributed equally to this work.

  • Abbreviations: SF, Shen–Freed; LD, Langevin dynamics; TFE, tetrafluoroethanol; rmsd, rms deviation.

References

« Previous | Next Article »Table of Contents

This Article

  1. PNAS June 10, 2003 vol. 100 no. 12 7087-7092

Classifications