New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Protein elastic network models and the ranges of cooperativity
Abstract
Elastic network models (ENMs) are entropic models that have demonstrated in many previous studies their abilities to capture overall the important internal motions, with comparisons having been made against crystallographic Bfactors and NMR conformational variabilities. ENMs have become an increasingly important tool and have been widely used to comprehend protein dynamics, function, and even conformational changes. However, reliance upon an arbitrary cutoff distance to delimit the range of interactions has presented a drawback for these models, because the optimal cutoff values can differ somewhat from protein to protein and can lead to quirks such as some shuffling in the order of the normal modes when applied to structures that differ only slightly. Here, we have replaced the requirement for a cutoff distance and introduced the more physical concept of inverse power dependence for the interactions, with a set of elastic network models that are parameterfree, with the distance cutoff removed. For small fluctuations about the native forms, the power dependence is the inverse square, but for larger deformations, the power dependence may become inverse 6th or 7th power. These models maintain and enhance the simplicity and generality of the original ENMs, and at the same time yield better predictions of crystallographic Bfactors (both isotropic and anisotropic) and of the directions of conformational transitions. Thus, these parameterfree ENMs can be models of choice whenever elastic network models are used.
Protein dynamics can provide important insights into protein function. Most proteins carry out their functions through conformational changes of the structures. In Xray structures, the information about thermal motions is provided by the DebyeWaller temperature factors or Bfactors, which are proportional to the mean square fluctuations of atom positions in a crystal. Thus, accurate predictions of crystalline Bfactors offer a good starting point for understanding the functional dynamics of proteins.
A number of computational and statistical approaches has been proposed to predict protein Bfactors from protein sequence (1–7), atomic coordinates (8–13), and electron density maps (14). The atomic coordinatebased methods such as molecular dynamics (MD) (15–18) and normal mode analysis (NMA) (19–22) are computationally expensive, and thus, in the past decade, the elastic network model (ENM), with NMA, using a single parameter harmonic potential, usually coarsegrained, has been widely used for studying protein dynamics including Bfactor predictions. The ENM for isotropic fluctuations is called the Gaussian network model (GNM) (23, 24), where only the magnitudes of the fluctuations are considered. Its anisotropic counterpart, where the directions of the collective motions are also examined, is called the anisotropic network model (ANM) (25), and these can be compared with the experimental anisotropic temperature factors (26–29), but generally these comparisons are worse than when atomic representations are used (30).
Previous studies have shown that in many cases, GNMpredicted Bfactors are in quite good agreement with the experimental Bfactors determined by crystallographers (8–11). Kundu et al. (10) studied 113 Xray protein structures and found that GNM is able to predict the experimental Bfactors well, yielding an average correlation between prediction and experiment of ≈0.59. The results of Sen et al. showed that the correlations between experimental Bfactors and the GNMpredicted values are quite similar at either coarsegrained or atomic levels (12), which is consistent with its being overall an entropy model. The ANM can also be used for Bfactor pre dictions, although, in reproducing the isotropic Bfactors, it was noted that ANM generally performs slightly worse than GNM (10).
In the ENM, a parameter—the cutoff distance is used to define the connections and placement of springs between residue pairs. Only the residue pairs within a given cutoff distance are considered to be connected to one another. The cutoff in GNM is generally set at 7–8 Å. Kundu et al. (10) showed that 7.3 Å gave the best results for Bfactor prediction for most of the proteins in the dataset. In ANM, the optimal cutoff is usually larger, ≈13–15 Å (25). Besides the parameter for cutoff distance, the spring constant is another parameter. However, in current practice, most researchers use a uniform spring constant for all connected residue pairs, and this spring constant is used simply to scale overall the range of Bfactors so the spring constant is not actually a fundamental parameter in the model in the same sense that the cutoff distance is. In some studies (31, 32), stronger spring constants have been assigned for rigid elements in the structure. Erman empirically fitted the experimental Bfactors by iteratively changing the spring constants of the Kirchoff matrix for GNM (33). One concern about this approach is that the results of such a fitting (or overfitting) may not have a real physical basis, even though the correlations achieved between experiment and prediction can be excellent. The Hinsen (34–37) and the Phillips (29) groups have used a distancedependent force constant in their studies of protein dynamics. They proposed a force constant exhibiting an exponential decay over the distance between a pair of interacting atoms. The Hinsen model has some physical basis, but, it still does not fully avoid using an arbitrary distance parameter. In his model, the force constant k is defined as an exponential decay of distance r: k(r) = c exp(−r^{2}/r_{0}^{2}), where r_{0} is an arbitrary parameter (in Hinsen's studies, r_{0} was taken as 3.0 Å). There are several problems with using a cutoff distance. First, the values taken are arbitrary to a significant extent. Second, the optimal values vary somewhat for different proteins. Last, there is a discontinuity in the residueresidue interaction potential at the cutoff distance. There can be model artifacts such as the same mode for 2 only slightly different structures being significantly different with the corresponding mode indices shuffled.
Recently Lin et al. proposed a weighted contact number (WCN) model (38) for Bfactor predictions. Using the WCN model, they found a good correlation between the calculated WCN and the experimental Bfactors. The main difference between the WCN and previous contact numberbased models (39, 40) is that the WCN model does not use any cutoff distance, but instead considers the contacts between all pairs of residues. The effect of a contact between a pair of residues is weighted by the inverse of their square distance. One advantage of the WCN model is that it avoids choosing an arbitrary cutoff distance. Inspired by the simple idea of the WCN model, here we propose a parameterfree elastic network model (pfENM). The idea is to assume that there is an interaction between all residue pairs, and to make their interaction strengths inversely proportional to the square distance of their separation. In this way, we can avoid using an arbitrary cutoff distance in the models and therefore avoid the problems outlined above. And, we learn that this refined ENM actually shows significant improvements over the original ENMs. In our parameterfree models, we use an inverse 2nd power (square distance) to model the spring constants. Past work had used an inverse 6th power spring (29, 36) or a spring of exponential form (34–37). However, our results show that our pfENM with inverse 2nd power springs clearly outperforms them all in Bfactor predictions (see Table S1 and Table S2).
While a cutoff value was used in the original atomic elastic network model of Tirion (41), the use of a cutoff distance may also have been supported conceptually by the typical reliance upon a cutoff distance for defining coarsegrained contact energies derived from a set of known structures (42, 43). In the latter case the use of a cutoff distance is an attempt to avoid some of the details of interactions and to capture average effects. However, when computing molecular motions, the averaging represented in the use of a cutoff distance is not likely appropriate because it will inevitably eliminate some of the structural details, and ignore the characteristic physical nature of interactions between atoms that should diminish with increasing separation, since the springs should weaken to represent less cohesion between more distant parts. In fact, we find that the use of a cutoff distance, in addition to the use of the distance power of −2, always yields worse fits for the Bfactor data (See Table S3 and Table S4).
We test these parameterfree models (pfGNM and pfANM) using a large nonredundant dataset containing 1,220 Xray structures and a dataset that contains 341 highresolution structures having anisotropic Bfactor entries. Our results show that for Bfactor predictions, the pfENM models have an even better performance than the original ENM. Since these parameterfree models also maintain the simplicity of ENMs, we thus propose that in the future, these parameterfree ENMs ought to be the models of choice whenever elastic network models are required.
Results
Isotropic BFactor Predictions – GNM.
In this section, the GNM and pfGNM are compared for scalar isotropic Bfactor predictions. The correlation between the computed Bfactors and the experimental values are calculated for the 1,220 protein structures in our dataset. We have performed investigations to determine the best inverse power to use, and the detailed results are given in Tables S1–S4.
In the work of Kundu et al. (10), the mean of the correlations between experimental Bfactors and the GNM predicted values was 0.59. For our dataset, we obtain a mean correlation of 0.55 using the same GNM, and 0.60 when using pfGNM, both of which are similar to their results. Here, we point out that we must be cautious in interpreting these “mean values” since they are dataset dependent. With our dataset, the mean correlation values from pfGNM (0.60) and GNM (0.55) clearly indicate that pfGNM performs significantly better than GNM.
A more detailed comparison of the correlations is found in the histograms of the correlation distributions. See Fig. 1, which shows histograms of the distribution of correlations between experimental Bfactors and the computed values for GNM and pfGNM. The percentage of proteins having correlation values >0.5 is ≈75% using pfGNM, which is better than the 67% found by using the original GNM with a cutoff distance. For ≈73% of proteins, pfGNM yields better results than GNM.
To further compare the results from GNM and pfGNM, we calculate the percentage of Bfactor correlations improved by using pfGNM over GNM. Fig. 2 displays the improvement with pfGNM over the original GNM for Bfactor prediction. For >57% of proteins, the pfGNM correlation is at least 5% better than the GNM correlation.
Further Isotropic BFactor Predictions.
Similar comparisons are performed between pfANM and ANM. Fig. 3 shows the histograms of the distribution of correlations between experimental Bfactors and ones computed using ANM and pfANM. Similar to the comparison between pfGNM and GNM, the mean correlation values from pfANM (0.55) and ANM (0.46) indicate that pfANM peforms significantly better than ANM. The percentage of proteins having correlation values >0.5 is ≈65% using pfANM, which is better than that (50%) using the original ANM with a cutoff distance. For ≈83% of proteins, pfANM yields better results than ANM.
The improvement of pfANM over ANM is further shown by the increased percentage of Bfactor correlations in Fig. 4. For >72% of proteins, the pfANM correlation is at least 5% better than the ANM correlation.
Anisotropic BFactor Predictions.
To test the models more thoroughly, in this section we will look at the performance of these parameterfree ENMs (pfENM) for anisotropic Bfactor prediction. In comparison with isotropic Bfactors, anisotropic Bfactors provide not only the magnitudes of mean square fluctuations, but also information about the directions of the fluctuations. In PDB files, the anisotropic Bfactors are given as 6 numbers that represent the anisotropic tensor of the fluctuations. Hence the anisotropic Bfactors provide more data about the fluctuations. These are only available for some of the higher resolution crystal structures. Fortunately, because of improving experimental structure quality, the number of such proteins with anisotropic Bfactors has been increasing rapidly.
In recent work (26), we applied the ANM to study the anisotropic fluctuations of 341 proteins. Here, we carry out a similar study using pfANM. Again, we will see that pfANM performs significantly better than ANM.
The results of anisotropic Bfactor predictions from both pfANM and ANM are listed in Table 1.The experimental and calculated anisotropic Bfactor tensors, which can be represented by ellipsoids, are compared [see (26) for details]. Specifically, the first and second anisotropies (κ and χ) of the experimental anisotropic Bfactors are compared with those calculated from the models (pfANM or ANM). The first/second anisotropy is defined as the ratio of the length of the shortest (or second shortest) axis to the longest axis. The first column in Table 1 shows the mean difference between the experimental first anisotropy and the calculation. The means are taken over all proteins, and for each protein, over all its residues (each residue is represented by its alpha carbon). Similarly for the second column, the mean difference is for the second anisotropy. It is seen in Table 1 that pfANM clearly does better than ANM in predicting anisotropies even though the gain is small. The mean θ and φ values in the third and fourth column describe how well the directions of the ellipsoids match between experiment and calculation. Specifically, θ is the angle between the longest axes of the ellipsoids (which represent the anisotropic Bfactor tensors of experiment and calculation), and φ is the rotation angle required to align the second longest axes of the 2 ellipsoids once their longest axes are aligned (26). Therefore, a perfect prediction would render both θ and φ as 0. From the Table 1, we see that neither pfANM nor ANM does well in this regard, for possible reasons that were discussed in (26). However, the point of interest here is that pfANM does slightly better than ANM.
A more rigorous measure for comparing anisotropic tensors are the correlation coefficient of tensors proposed by Merritt (44). If U and V are 2 anisotropic Bfactor tensors, then the correlation coefficient between them is given by: The normalized correlation coefficient is given by: where U_{iso} and V_{iso} describe a pair of isotropic atoms, with U_{iso}^{11} = U_{iso}^{22} = U_{iso}^{33} = U_{eq} = trace(U)/3 and similarly for V_{iso}. This normalized correlation coefficient ncc in Eq. 2 will be >1 if the 2 atoms described by U and V are more similar to each other than to an isotropic case, and will not be >1 otherwise. Thus, ncc provides an excellent measure to compare the size, orientation, and direction of the 2 tensors. In practice, a simple count of how many atoms in a structure have their normalized correlation coefficients larger than 1 provides a good measure of the quality of anisotropic Bfactor prediction.
The percentages of residues with ncc larger than 1, as computed by pfANM and ANM, are listed in the last column of Table 1. Using this ncc measure, it can clearly be seen that pfANM yields significant improvements over ANM in anisotropic Bfactors predictions.
Discussion
Besides testing the performance of pfANM in isotropic and anisotropic Bfactor predictions, we have also looked at how well the modes of pfANM are related to experimentally observed conformational changes, especially the conformational differences between pairs of experimental structures of the same protein that has both an “open” and a “closed” form (31, 32, 45–49). We conduct this study using the same approach and dataset as in our previous work (32), except that here we now recalculate using pfANM. There are a total of 170 structure pairs in the dataset.
The conformational changes between the “open” and “closed” forms of the 170 proteins are interpreted with the modes calculated from both pfANM and ANM. We employ 2 measures to quantify how well the conformational changes can be explained by pfANM or ANM modes. One of them is the maximum overlap (MO, cosine of angle between 2 vectors), which is the maximum of the overlap values between the observed conformational change and the direction of a mode. If there is a single mode that contributes significantly to the observed conformational change, it has a high overlap value and it might be a mode that is functionally important. Fig. 5 shows the maximum overlap (MO) values for both pfANM and ANM. The 2 models are quite comparable even though the modes of ANM match slightly better with the observed conformational changes than do those of the pfANM. It is, however, likely that distortions of different magnitudes may exhibit different distance dependences. First, both ANM and pfANM assume harmonic interactions among the residues and consequently, both models are most valid for interpreting local fluctuations (Bfactors). When applying these models to interpret larger conformational changes, caution needs to be taken. The larger deformations observed in these conformational transitions often correspond to the proteins moving as if they were comprised of rigid domains or clusters, that is, the local cohesion within a domain/cluster appears to be much stronger relative to the interdomain cohesiveness. Compared with the ANM model with a cutoff distance, the pfANM with inverse square distance dependence has stronger longrange cohesion that does not permit discrete domains to move sufficiently. Indeed, when we increase the power of the inverse square function built into the pfANM to a higher power to strengthen the short range interactions over the longer range interactions, to permit domains to move, then the pfANM model outperforms the ANM in its interpretation of conformational changes. As shown in Table S5, the best power dependence of the pfANM for the large conformational transitions is in the range of r^{−6} to r^{−8}.
In this work, we developed a parameterfree ENM (pfENM) and compared its performance with the conventional ENM that uses a cutoff distance to define binary interactions—either on or off. We have found that our pfENMs are not only simpler (without any cutoff distances), but they also perform better than usual ENMs, for both isotropic and anisotropic Bfactor predictions. The removal of the cutoff distance may also remedy the problem of the scrambling of the modes that sometimes occurs between structures that differ only slightly (13), and other problems caused by the discontinuity of the interactions at the boundary of the cutoff distance. Thus, these parameterfree ENMs should be the models of choice wherever elastic network models are used. Recall that the ENMs are conformational entropy models, so these distancedependent springs are better able to capture the conformational entropic fluctuations than can the use of a cutoff distance. This means that the cohesiveness of protein structures is better represented by interactions decaying with the inverse square of the separation distances. In some ways this corresponds to others' findings where they used inverse 6th power springs (29, 36)
Improvements to the ENMs are critical for their use in developing mechanisms (50–53). Previously springs of different strengths for different classes of interactions were not been found to improve the motions computed with ENMs (54). The improvements found here do suggest that the introduction of different power dependences for different extents of deformation leads to gains. This introduces an approach, different in formulation from distancedependent interaction energies, for distancedependent entropyrelated cohesion.
Methods
Protein Dataset.
We use PDBPEPRDB (55) to select protein structures from the Protein Data Bank (PDB) (56). We choose protein structures determined by Xray crystallography at resolutions better than 2.0 Å and with Rfactors better than 0.2. We exclude protein fragments or membrane proteins. All protein sizes are at least 50 residues with sequence identities not >25% and structure similarities differing by >10 Å (parameter in PDBPEPRDB). We exclude structures that only have backbone atoms or alpha carbons. We also remove any structure that does not provide experimental Bfactors. Finally, we obtain and use 1,220 protein structures in our dataset.
For anisotropic Bfactors, we choose to include in our dataset all protein crystal structures in the Protein Data Bank (PDB) that have resolution equal or better than 1.2 Å and have anisotropic Bfactor entries. There were 341 such structures in our dataset. The dataset is the same one that we used in (26).
CoarseGrained Gaussian Network Model (GNM).
In the GNM (8), a protein structure is represented by its alpha carbons only. The relative fluctuations between a pair of contacting residues are assumed to follow a Gaussian distribution in its dependence on the square of the separation (hence the name of the model). Alpha carbon pairs are considered to be in contact when their separation distance is smaller than a preset cutoff distance (≈7–8 Å, here we use 7.3 Å). All springs are taken to be at equilibrium for the input structure, and the input structure is assumed to be the lowest in energy, for the computed fluctuations around it. One advantage of this approach is that the fluctuations of each atom around its equilibrium position, and their crosscorrelations can be expressed in simple analytical form. To determine the atomic fluctuations, we first form the Kirchhoff matrix based on the contact information: where r_{ij} is the distance between atoms i and j, and r_{c} is the cutoff distance. The mean square fluctuations of each atom are given by: where γ is the spring constant. Then the theoretical Bfactors can be conveniently expressed as:
Anisotropic Network Model (ANM).
In the ANM (25), the Hessian matrix contains the second derivatives of the energy function. For a structure with n residues, the Hessian matrix H contains n by n superelements each of size 3 by 3. The ijth superelement of H is given as: where X_{i}, Y_{i}, and Z_{i} are the Cartesian components of residue i, and V represents the potential energy of the system. The potential between residues i and j is harmonic, modeled by a Hookean spring, for residues i and j within the cutoff distance. In the ANM (25), a larger cutoff distance than for GNM, say 13 Å, is normally used to establish the spring connections between residues. The Hessian matrix H is the counterpart of the Kirchhoff matrix Γ in the GNM. Similarly, the inverse of H contains n by n superelements, with the superelements of H^{−1}, being 3 by 3 matrices, describing the correlations between the components of ΔR_{i}. and ΔR_{j}. Specifically,
ParameterFree ENM (pfENM).
In the pfENM models, there is no cutoff distance. All pairs of residues are considered to be interacting with one another, although their interaction strengths are weighted by the inverses of their square distances. (A higher power of distances could have been used, however, from our preliminary results, the differences would be small.). Thus, residue pairs that are far apart have weaker interactions than those that are close. In pfGNM, the elements of the Kirchhoff matrix Γ are calculated as: where r_{ij} is the distance between residues i and j. In the pfANM, each superelement of the Hessian matrix is weighted by the inverse of the square distance between that residue pair, that is,
Comparing Computed BFactors with Experimental Values.
The correlation between experimental and calculated Bfactors is given by: A perfect correlation between the 2 vectors gives a value of 1 whereas perfect anticorrelation gives −1. Here, the computed Bfactors can be from either the GNM, ANM, or pfENM models.
Acknowledgments
This work was supported by National Institutes of Health Grants R01GM073095, R01GM072014, and R01GM081680.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: jernigan{at}iastate.edu

Edited by Ken A. Dill, University of California, San Francisco, CA, and approved June 3, 2009

Author contributions: L.Y., G.S., and R.L.J. designed research; L.Y. and G.S. performed research; L.Y., G.S., and R.L.J. analyzed data; and L.Y., G.S., and R.L.J. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0902159106/DCSupplemental.
References
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Yang LW,
 et al.
 ↵
 ↵
 ↵
 Ming D,
 Kong Y,
 Lambert MA,
 Huang Z,
 Ma J
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Brooks B,
 Karplus M
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Eyal E,
 Chennubhotla C,
 Yang LW,
 Bahar I
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Hinsen K
 ↵
 ↵
 Halle B
 ↵
 ↵
 ↵
 ↵
 ↵
 Merritt EA
 ↵
 ↵
 ↵
 ↵
 Tama F,
 Sanejouand YH
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Bahar I,
 Cui Q
 Sen TZ,
 Jernigan RL
 ↵
 Noguchi T,
 Akiyama Y
 ↵
 Berman HM,
 et al.
Citation Manager Formats
Sign up for Article Alerts
Article Classifications
 Biological Sciences
 Biophysics and Computational Biology