BIOPHYSICS
Atomically detailed folding simulation of the B domain of staphylococcal protein A from random structures



*Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301;
Facultad de Ciencias Físico Matemáticas y Naturales, Instituto de Matemática Aplicada San Luis, Consejo Nacional de Investigaciones Científicas y Técnicas de Argentina, Universidad Nacional de San Luis, Ejército de Los Andes 950-5700 San Luis, Argentina; and
Computational Biology Service Unit, Cornell Theory Center, Cornell University, Ithaca, NY 14853-3801
Contributed by Harold A. Scheraga, October 6, 2003
| Abstract |
|---|
The conformational space of the 10–55 fragment of the B-domain of staphylococcal protein A has been investigated by using the electrostatically driven Monte Carlo (EDMC) method. The ECEPP/3 (empirical conformational energy program for peptides) force-field plus two different continuum solvation models, namely SRFOPT (Solvent Radii Fixed with atomic solvation parameters OPTimized) and OONS (Ooi, Oobatake, Némethy, and Scheraga solvation model), were used to describe the conformational energy of the chain. After an exhaustive search, starting from two different random conformations, three of four runs led to native-like conformations. Boltzmann-averaged root-mean-square deviations (RMSD) for all of the backbone heavy atoms with respect to the native structure of 3.35 Å and 4.54 Å were obtained with SRFOPT and OONS, respectively. These results show that the protein-folding problem can be solved at the atomic detail level by an ab initio procedure, starting from random conformations, with no knowledge except the amino acid sequence. To our knowledge, the results reported here correspond to the largest protein ever folded from a random conformation by an initial-value formulation with a full atomic potential, without resort to knowledge-based information.
In this article, we report the results of the global optimization of the all-atom force field ECEPP/3 (empirical conformational energy program for peptides) (26–29) plus two implicit hydration models [SRFOPT (Solvent Radii Fixed with atomic solvation parameters OPTimized; ref. 30) and OONS (Ooi, Oobatake, Némethy, and Scheraga solvation model; ref. 31)], using the electrostatically driven Monte Carlo (EDMC) method (32, 33) to explore efficiently the conformational space of the 10–55 fragment of the B-domain of the staphylococcal protein A molecule. The structure of this fragment of the B-domain of the protein A molecule is known from x-ray (34) and NMR (35) investigations, and from minimalist and all-atom simulations (36–46). However, such initial-value-formulated simulations (except for ref. 46, which is a boundary-value formulation) were not started from a random conformation. Therefore, in this work, we attempted to provide an extensive exploration of the conformational space by starting from two different randomly chosen conformations, using a Beowulf class cluster.
| Methods |
|---|
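The conformational energy of the chain in a given conformation $r_p$ was described by the ECEPP/3 force field plus a solvation term:

$$E(r_p) = E_{\mathrm{int}}(r_p) + F_{\mathrm{sas}}(r_p) \qquad [1]$$

where $E_{\mathrm{int}}(r_p)$ is the internal conformational energy of the molecule in the absence of solvent, assumed to correspond to the ECEPP/3 energy of the neutral molecule, and $F_{\mathrm{sas}}(r_p)$ represents the solvation free energy as defined by Vila et al. (30). Two different solvation models were used during the simulations, namely, SRFOPT (30) and OONS (31).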
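As a minimal sketch of how the two terms combine, the snippet below assumes the generic form used by surface-area solvation models, in which the solvation free energy is a sum over atoms of an atomic solvation parameter times the solvent-accessible surface area of that atom. The function names, argument layout, and numbers are hypothetical and for illustration only; they are not the ECEPP/3, SRFOPT, or OONS implementations.

```python
from typing import Sequence


def solvation_free_energy(areas: Sequence[float], sigmas: Sequence[float]) -> float:
    """Generic surface-area solvation term: F_sas = sum_i sigma_i * A_i.

    areas  : solvent-accessible surface area of each atom (A^2)
    sigmas : atomic solvation parameter of each atom (kcal/mol/A^2)
    """
    if len(areas) != len(sigmas):
        raise ValueError("areas and sigmas must have the same length")
    return sum(sigma * area for sigma, area in zip(sigmas, areas))


def total_energy(e_int: float, areas: Sequence[float], sigmas: Sequence[float]) -> float:
    """Total energy: internal (solvent-free) energy plus the solvation term."""
    return e_int + solvation_free_energy(areas, sigmas)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the article.
    print(total_energy(-10.0, areas=[10.0, 20.0], sigmas=[0.01, -0.02]))
```

Under this assumed form, switching between solvation models amounts to swapping the set of atomic solvation parameters (and the atomic radii used to compute the areas) while the internal energy term is unchanged.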
The Starting Point: Generation of the Random Conformation. The N and C termini of the 10–55 fragment of the B-domain of the staphylococcal protein A molecule were blocked by amino-COCH3 and carboxyl-NH2 groups, respectively. The simulations were started from two different initial random conformations, namely rnd_1 and rnd_2. For each of these random conformations, all backbone and side-chain dihedral angles were chosen randomly between −180.0° and 180.0°, with the exception of the dihedral angles ω of the peptide group, which were always chosen in the trans (180.0°) conformation. All backbone and side-chain dihedral angles (including the ω's) were allowed to vary freely during the simulations, with the exception of proline residues. In the ECEPP/3 fixed-geometry approximation, both up (U) and down (D) puckering conformations of the pyrrolidine ring are considered; these pertain to φ = −53.0° and χ1 = −28.1°, and to φ = −68.8° and χ1 = 27.4°, respectively, for the position of the Cγ atom of the proline residue. At the beginning of a simulation, the puckering of the proline residues was chosen randomly. However, up and down puckering and cis–trans isomerization of the peptide group preceding proline residues were allowed to permute during the course of the simulations. During the EDMC search, if a proline residue was selected for a change, an attempt was made to change the puckering state with a probability of 1/2. In addition, the probability of changing the dihedral angle ω preceding proline from trans to cis was 1/3, and the values of ω were allowed to vary around the trans and cis conformations.
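The setup described above can be sketched as follows. This is an illustration under stated assumptions, not the EDMC code: the per-residue dictionary layout, the function names, and the fixed number of side-chain angles per residue are hypothetical, the puckering angles are written with the signs assumed for the up and down states (negative φ in both, χ1 negative for up and positive for down), and only the trans-to-cis probability is modeled because the text does not state the reverse one.

```python
import random

# Proline ring puckering states as (phi, chi1) in degrees (signs assumed).
PUCKER = {"U": (-53.0, -28.1), "D": (-68.8, 27.4)}
TRANS, CIS = 180.0, 0.0


def random_conformation(sequence, n_chi=2, seed=None):
    """Return one random starting conformation as a list of per-residue dicts.

    sequence : iterable of three-letter residue names (hypothetical input format)
    n_chi    : number of side-chain dihedrals drawn per residue (a simplification)
    """
    rng = random.Random(seed)
    conformation = []
    for name in sequence:
        residue = {
            "name": name,
            "phi": rng.uniform(-180.0, 180.0),
            "psi": rng.uniform(-180.0, 180.0),
            "omega": TRANS,  # peptide groups always start in the trans state
            "chi": [rng.uniform(-180.0, 180.0) for _ in range(n_chi)],
        }
        if name == "PRO":
            # Proline: pick a puckering state at random; it fixes phi and chi1.
            pucker = rng.choice("UD")
            phi, chi1 = PUCKER[pucker]
            residue.update(pucker=pucker, phi=phi, chi=[chi1])
        conformation.append(residue)
    return conformation


def propose_proline_change(pucker, omega_prev, rng):
    """Propose new (pucker, omega_prev) for a proline selected for a change."""
    if rng.random() < 0.5:  # attempt to switch puckering with probability 1/2
        pucker = "D" if pucker == "U" else "U"
    if omega_prev == TRANS and rng.random() < 1.0 / 3.0:
        omega_prev = CIS  # trans -> cis with probability 1/3
    return pucker, omega_prev
```

Calling `random_conformation` twice with different seeds would play the role of generating two independent starting points such as rnd_1 and rnd_2.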
The Conformational Search. For each of the rnd_1 and rnd_2 initial conformations, an ensemble of conformations was generated by using the EDMC method, as shown in Table 1 for runs 1–4. During each of these runs, the energies of the initially generated and subsequent conformations were minimized by using the secant unconstrained minimization solver (SUMSL) algorithm (47) in combination with ECEPP/3 plus a surface-accessible solvation model, either SRFOPT or OONS. Only a small set of low-energy conformations was retained. This set corresponded to the accepted conformations from the Monte Carlo path followed by the EDMC method in each of these runs. It should be noted that the acceptance rate (see Table 1) is low for all of the runs in our simulations. However, as was noted previously (48, 49), this acceptance rate is characteristic of the EDMC procedure.
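The general pattern of this search (perturb, minimize, then accept or reject, keeping only accepted minima) can be illustrated with a generic Monte Carlo-with-minimization loop. The sketch below is not the EDMC method: the electrostatically driven moves are not described in this section, so random displacements are used instead; a one-dimensional toy function stands in for the ECEPP/3 plus solvation energy; plain gradient descent stands in for SUMSL; and the Metropolis acceptance test and the value of `kT` are assumptions of the sketch.

```python
import math
import random


def toy_energy(x):
    """One-dimensional toy energy with several local minima (stand-in only)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.01, iters=2000):
    """Crude gradient descent; a stand-in for a real minimizer such as SUMSL."""
    for _ in range(iters):
        gradient = 0.2 * x + 3.0 * math.cos(3.0 * x)
        x -= step * gradient
    return x


def mc_minimization(x0, n_steps, kT=0.5, seed=0):
    """Return the accepted minima [(x, energy), ...] and the acceptance rate."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    energy = toy_energy(x)
    accepted = [(x, energy)]  # the minimized starting point opens the path
    for _ in range(n_steps):
        trial = local_minimize(x + rng.uniform(-2.0, 2.0))  # perturb, then minimize
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis test: always accept downhill, sometimes accept uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            x, energy = trial, trial_energy
            accepted.append((x, energy))
    return accepted, (len(accepted) - 1) / n_steps


if __name__ == "__main__":
    path, rate = mc_minimization(4.0, n_steps=200)
    print(f"accepted {len(path) - 1} of 200 trials (rate {rate:.2f})")
```

In this pattern, the list of accepted minima is the analogue of the small set of low-energy conformations retained from each run, and the returned ratio is the analogue of the acceptance rate reported in Table 1.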
Evaluation of the Results: Computation of the Root-Mean-Square Deviation (RMSD). We chose to compare the results of our simulations with the minimized-averaged structure of the 10–55 fragment of the B-domain of the staphylococcal protein A molecule free in solution, determined from NMR at 30°C by Gouda et al. (35), and not with the three-dimensional structure of the same fragment bound to the Fc fragment of human polyclonal IgG, determined by x-ray crystallographic analysis by Deisenhofer (34). The reason for this decision was that the B-domain of the staphylococcal protein A molecule forms contacts with the Fc fragment in the crystal, and this could be the main source of the structural differences observed between the NMR and x-ray structures: the fragment¶ Ser-42–Ala-55 constitutes helix III of the NMR structure whereas, in the crystallographic structure, there is no structural information available for the segment Ala-49–Lys-59, and the portion of the polypeptide chain from Ser-42 to Glu-48 is in an extended conformation.
Because all of the NMR constraints used to determine the most probable conformation in solution represent population-weighted-averaged measurements, we computed the Boltzmann-averaged values for the RMSD from the minimized-averaged structure obtained from the NMR experiment (35), using the separate ensembles of conformations generated from each of two different starting random conformations.
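A Boltzmann-averaged RMSD of this kind can be sketched as a weighted mean in which each accepted conformation contributes with weight exp(−E/RT). The function below is a minimal illustration, not the authors' code: the function name and argument layout are hypothetical, and the default temperature of 303.15 K (30 °C, the temperature of the NMR experiment) is an assumption, since the temperature used for the averaging is not stated in this section.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_average(values, energies, temperature=303.15):
    """Boltzmann-weighted mean of `values` (e.g., RMSDs in Å).

    energies    : energy of each conformation in kcal/mol
    temperature : in kelvin; the default is an assumption of this sketch
    """
    if len(values) != len(energies) or not values:
        raise ValueError("values and energies must be non-empty and of equal length")
    rt = R_KCAL * temperature
    e_min = min(energies)  # shift by the minimum so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / partition
```

Because the weights fall off exponentially with energy, conformations lying more than a few kcal/mol above the lowest-energy one contribute almost nothing, so the average is dominated by the lowest-energy members of the ensemble.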
| Results |
|---|
Figs. 1 and 2 show the energies vs. the RMSDs from the native structure for all the accepted conformations listed in Table 1 for the runs with the SRFOPT and OONS surface area models, respectively. Table 2 shows the Boltzmann-averaged RMSDs from the native fold obtained with these solvation models for all the accepted conformations obtained for each of the simulations starting from rnd_1 and rnd_2. The plots in Figs. 1 and 2 pertain to the Cα RMSDs from the native structure (rather than to those of the backbone heavy atoms or of all of the heavy atoms) because these quantities correlate very well with the Cα RMSDs, with correlation coefficients of R = 0.997 and 0.980, respectively. In other words, plots of the RMSDs of the backbone heavy atoms or of all of the heavy atoms, instead of the Cα RMSDs from the native structure, do not add any new information.
The NMR-determined structure of the 10–55 fragment of the B-domain of the staphylococcal protein A molecule is composed of a bundle of three α-helices as shown in Fig. 3b, i.e., helix I (Gln-10–His-19), helix II (Glu-25–Asp-37), and helix III (Ser-42–Ala-55). The lowest-energy conformation found with SRFOPT (−720.7 kcal/mol) exhibits a similar distribution of helices, i.e., helix I (Gln-11–His-19), helix II (Arg-28–Asp-37), and helix III (Pro-39–Asp-54). Helices II and III are antiparallel to each other in both the NMR-derived native fold and in the lowest-energy conformation found with either SRFOPT or OONS, as shown in Fig. 3 c and d.
Independent of the starting conformation, the SRFOPT solvation model always led to a native-like structure. Both of the lowest-energy conformations identified in these runs (−715.5 and −720.7 kcal/mol, respectively) have native-like folds. Analysis of the runs from Table 1 shows that the potential function with the OONS solvation parameters led to two broad basins with similar energies. One of the basins contains the native-like structure (−651.4 kcal/mol) and the second one contains conformations that represent mirror images of the native-like structure (RMSD vs. energy is shown in Fig. 2).
To investigate the depth of the basins containing the native- and mirror-like conformations, we carried out additional crosschecking runs. The lowest-energy conformation obtained with SRFOPT in run 2 (−720.7 kcal/mol) was used as the starting conformation with the OONS solvation model and, vice versa, the lowest-energy conformation, i.e., the mirror image, obtained with OONS in run 4 (−652.9 kcal/mol) was used as the starting conformation with the SRFOPT solvation model. The resulting total energy shown in Table 1 for the mirror image (in italics) indicates that, for both solvation models, the mirror image is higher in energy than the native-like conformation after the crosschecking. It should be noted that no constraints were used to restrict the search during these runs.
It was found that the potential energy with OONS seems to distinguish the native-like structure (−654.3 kcal/mol) from the mirror image (−652.9 kcal/mol); however, the energy difference between these two basins is quite small. The potential energy with SRFOPT, on the other hand, shows a larger difference between the two structures, namely, −720.7 kcal/mol for the native-like vs. −702.5 kcal/mol for the mirror-image structure, indicating that this solvation model can detect the correct fold. A summary of these crosschecking runs is shown in Table 1, runs 5 and 6.
The scatter plots shown in Figs. 1 and 2 indicate that the differences in energy among the few conformations close to the lowest one are small whereas they display a broad dispersion in RMSD. This result represents one of the main difficulties in identifying the native-like structure. It also means that the total energy, as a scoring function, must be extremely precise to distinguish the correctly folded structure from wrong ones. This requirement constitutes a challenge for improving existing force fields or for developing new potential functions. In addition, the small energy gap between basins containing quite different folds represents a great challenge for search methods. Regarding this observation, it can be seen from Table 2 that the EDMC method did find a conformation that is quite close to the native structure (with an RMSD for the Cα atoms of 2.85 Å). However, this conformation with the lowest RMSD (−698.3 kcal/mol) is higher in energy than the lowest-energy minimum (−720.7 kcal/mol) found for this solvation model by >20 kcal/mol. Fig. 4 shows the superposition of this structure with the native-NMR fold. It should be pointed out, for comparative purposes only, that the Cα RMSD of the fragment containing helix I, helix II, and part of helix III, i.e., residues|| 128–162 of the x-ray (34) and equivalent residues 10–44 of the NMR (35) structures, is 2.0 Å.
| Discussion and Conclusions |
|---|
We wrote in 1994 (54), "With the recent development of various efficient approaches to overcome the multiple-minima problem, the problem has been in some sense solved for small oligopeptides, as well as for regular-repeating structures and assemblies of fibrous proteins. It can reasonably be expected that the current extension of these to globular proteins will result in efficient searches of their conformational space in the near future. Advances in computer hardware and software, especially the wider use of parallelism, will speed up computations, making practical the application to larger molecules. The point may have been reached where wider applicability of these techniques is becoming limited by the accuracy of the potential energy functions used to describe the energetics of polypeptide and protein structure." Results of this work, in some sense, support this assertion. The 46-residue fragment of protein A is larger than the 36-residue α-helical protein from the villin headpiece, for which all-atom simulations starting from an extended structure were previously carried out (55, 56). It is worth noting that simulations on the villin headpiece were carried out with explicit solvent (55), which increases the computing time considerably compared with the time required for the implicit solvent models used in our simulations. However, despite the large progress in computer technology, we still cannot extend the all-atom approach to the prediction of the folding of globular proteins containing 100–200 residues. For this reason, the use of simpler models based on physical grounds, such as the hierarchical approach starting with the united-residue (UNRES) force field and finishing with an all-atom search (57), is necessary. Also, procedures that combine knowledge-based information with energy minimization (58–60) have achieved some measure of success in recent CASP (critical assessment of techniques for protein structure prediction) blind tests. Hopefully, further improvements will extend the applicability of all-atom ab initio procedures.
| Footnotes |
|---|
¶ Residue numbers correspond to those used for the NMR form in the Protein Data Bank: file 1BDD.
|| Residue numbers correspond to those used for the crystalline form in the Protein Data Bank: file 1FC2.
To whom correspondence should be addressed. E-mail: has5{at}cornell.edu.
| References |
|---|
, N. & Scheraga, H. A. (1969) J. Chem. Phys. 51, 4751–4767.