New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Flexibility and packing in proteins

Edited by Alan Fersht, University of Cambridge, Cambridge, United Kingdom, and approved December 4, 2001 (received for review October 2, 2001)
Abstract
Structural flexibility is an essential attribute, without which few proteins could carry out their biological functions. Much information about protein flexibility has come from xray crystallography, in the form of atomic meansquare displacements (AMSDs) or B factors. Profiles showing the AMSD variation along the polypeptide chain are usually interpreted in dynamical terms but are ultimately governed by the local features of a highly complex energy landscape. Here, we bypass this complexity by showing that the AMSD profile is essentially determined by spatial variations in local packing density. On the basis of elementary statistical mechanics and generic features of atomic distributions in proteins, we predict a direct inverse proportionality between the AMSD and the contact density, i.e., the number of noncovalent neighbor atoms within a local region of ∼1.5 nm^{3} volume. Testing this local density model against a set of highquality crystal structures of 38 nonhomologous proteins, we find that it accurately and consistently reproduces the prominent peaks in the AMSD profile and even captures minor features, such as the periodic AMSD variation within α helices. The predicted rigidifying effect of crystal contacts also agrees with experimental data. With regard to accuracy and computational efficiency, the model is clearly superior to its predecessors. The quantitative link between flexibility and packing density found here implies that AMSDs provide little independent information beyond that contained in the mean atomic coordinates.
To date, xray crystallography has provided nearly 12,000 atomiclevel models of protein structure (see http://www.rcsb.org/pdb/). The primary data, structure factors of Bragg reflections, result from diffraction of xrays by the atoms in a singlecrystal comprising some 10^{15} protein molecules. At any instant, the members of this molecular ensemble are continuously, but nonuniformly, distributed in conformational space. The structure factors yield a set of mean atomic positions r ≡ 〈r_{k}〉 that define the “groundstate” protein structure, or, if resolution permits, a small number of substantially populated lowenergy conformational substates. For each atom thus located, one also obtains a Debye–Waller factor, i.e., the spatial Fourier transform of the probability distribution function (PDF) F_{k}(u_{k}) for displacements u_{k} ≡ r_{k} − r of atom k away from its mean position (1). For diffraction data of ultrahigh resolution, F_{k}(u_{k}) is usually modeled as a trivariate Gaussian function, parametrized in terms of the six independent elements of the atomic covariance matrix 〈u_{k} u〉 (2). More commonly, one adopts a univariant Gaussian function, fully characterized by the (isotropic) meansquare displacement 〈u_{k}⋅u_{k}〉 ≡ σ_{k}, or the B factor B_{k} ≡ 8π^{2}σ_{k}/3.
Like the mean atomic positions, the set of atomic mean square displacements (AMSDs) {σ_{k}}, k = 1, 2, … , N, is an intrinsic property of the protein (in its crystal environment), providing a spatially resolved measure of the smallamplitude pliability or flexibility of the groundstate protein conformation (3). Although Bragg diffraction data contain no information about the rate or mechanism of conformational motion, AMSDs are often discussed and interpreted in dynamical terms (3–5). Indeed, the terms flexibility, dynamics, and mobility are often used synonymously in this context. In principle, AMSDs can be calculated and the associated motions identified by molecular dynamics simulations based on semiempirical atomic forcefield models (6–9). In practice, the agreement between simulated and experimental AMSDs is modest (7–9), even when the rigidifying effect of crystal contacts is taken into account (7, 9). Calculated AMSDs tend to increase with the length of the analyzed trajectory as slower motions of larger amplitude are sampled and do not converge even in nanosecondlength simulations (8, 9). Among the slower motions are dihedral barrier crossings between distinct conformational substates, such as alternative sidechain conformations. The displacement distribution F_{k}(u_{k}) for atoms undergoing such motions is multimodal and hence not well approximated by a Gaussian function (10, 11). Therefore, ultrahighresolution diffraction data are usually modeled with several residues in alternative conformations, each with its own set of AMSDs. In either case, conformational substates complicate the comparison with simulated AMSDs.
Disregarding minor quantum effects, AMSDs are static equilibrium properties, completely determined by the interactions within the system. In other words, AMSDs cannot depend on any kinetic parameters, such as libration frequencies, substate interconversion rates, or solvent viscosity. Consequently, AMSDs can be predicted without invoking motion. Moreover, this should be far less challenging than predicting the mean atomic positions (the folding problem), because AMSDs are governed by local features of the energy landscape near the global minimum. It has long been recognized that AMSDs correlate with structural features such as solvent exposure, packing density, and secondary structure (5, 12, 13). However, such observations have been of a qualitative nature and have not been pursued in a systematic way.
The aim of the present work is to explore the hypothesis that AMSDs can be predicted solely on the basis of packing density. This hypothesis is motivated by the following considerations. On average, protein interiors are as densely packed as crystalline solids (14–17). Most atoms therefore cannot be displaced much without also displacing some of their nonbonded neighbors. Yet, the local density, averaged over volume elements of 0.1–1 nm^{3}, varies substantially within a protein (14, 17, 18). Equivalently stated, the distribution of voids (cavities and subatomic interstices) is highly inhomogeneous. Presumably, lowdensity regions can accommodate a variety of alternative packings or conformations, whereas highdensity regions might be realized only for a few closely similar conformations. AMSDs should then be anticorrelated with local packing density.
Although these arguments are intuitively appealing, the functional form an AMSD–density relationship is not obvious. We show here that a simple inverse proportionality, σ_{k} ∝ n, emerges from a series of crude but welldefined approximations. As the measure of local packing, we use the contact density n_{k}, i.e., the number of nonhydrogen atoms within a spherical region of ∼1.5 nm^{3} volume centered on atom k. We then test the predictive power of this simple relation on a set of 38 nonhomologous protein crystal structures of exceptionally high quality. We find that the simple inverse relationship faithfully reproduces the variation of backbone as well as sidechain AMSDs along the polypeptide chain. Because the AMSD profile can be predicted with good accuracy from the contact density, it does not furnish much independent information beyond that already contained in the mean coordinates. This finding has implications for how we think about AMSDs. For example, the use of AMSDs to infer likely pathways for ligand access to internal sites (3, 4, 19), sometimes called thermal motion paths (19), essentially amounts to an identification of contiguous regions of low packing density.
The present local density approach to AMSDs is superficially related to the use of effective harmonic potentials with distancedependent force constants to predict AMSDs and largescale conformational transitions (20–23). However, the underlying physical models are qualitatively different. Moreover, the present approach is both more accurate and more computationally efficient.
Methods
Statistical–Mechanical Basis.
The isotropic AMSD, σ_{k}, of atom k is defined by 1 where F̄_{k}(u_{k}) is the orientational average of the displacement PDF F_{k}(u_{k}). The potential of mean force (POMF) w_{k} associated with F_{k}(u_{k}) may be defined through (24) 2 where β = (k_{B}T)^{−1}, and C_{k} is a constant that normalizes F_{k}(u_{k}) to unity. Expanding w_{k} around the mean position of atom k, we obtain (in matrix notation) 3 where the Cartesian components of the vector a_{k} and the tensor B_{k} are first and second derivatives, respectively, of βw_{k} with respect to the Cartesian components of the displacement vector u_{k}, evaluated at the mean position r (or u_{k} = 0).
The isotropic PDF F̄_{k}(u_{k}) in Eq. 1 involves the orientational average of the Boltzmann factor in Eq. 2. We approximate this by the Boltzmann factor of the orientationally averaged POMF: 4 When we take the isotropic average of w_{k} in Eq. 3, all terms containing odd powers of u_{k} vanish, so that 5 with Λ_{k} = Tr B_{k}/3. For sufficiently small displacements, we can neglect terms of fourth and higher order in Eq. 5. The orientationally averaged POMF then becomes harmonic, as generally assumed in the interpretation of diffraction data (1, 2), and a combination of Eqs. 1, 4, and 5 yields σ_{k} = 3/(2Λ_{k}).
To relate Λ_{k} to the local density, we make a bold assumption: when atom k is displaced from its mean position, all other atoms remain at their mean positions. The Nparticle problem then becomes a oneparticle problem, and the POMF w_{k} reduces to the sum of pair interactions ν_{ki}(r_{k} − r) of atom k with every other atom i, each confined to its mean position r. The pair interaction ν_{ki} depends on the atomic configuration in a complicated way. In several recent treatments of protein conformational dynamics and flexibility, harmonic pair interactions have been postulated (20–23). Although ν_{ki} may be approximately harmonic near its minimum (for the isolated atom pair), it is certainly not harmonic at the separations of the vast majority of atom pairs in the mean configuration of the protein, where the second derivatives in Λ_{k} are to be evaluated. In fact, most atoms i in the protein hardly interact at all with the reference atom k and, therefore, do not contribute significantly to Λ_{k}. In evaluating Λ_{k}, we therefore need to consider only those n_{k} atoms i whose mean positions are within some cutoff distance R_{C} of atom k, i.e., for which r ≡ r − r ≤ R_{C}. We can then express Λ_{k} as a sum of contributions from these n_{k} atoms, or as Λ_{k} = n_{k} λ_{k}, where λ_{k} is the mean of the n_{k} atomic contributions. We shall now argue that the dependence of Λ_{k} on k derives mainly from local density (n_{k}) variations.
Consider the radial distribution function g_{k}(r), which, when multiplied by 4π r^{2}dr, gives the number n_{k}(r) of nonhydrogen atoms in a spherical shell of thickness dr at a distance r from reference atom k. We compute this quantity by summing over the isotropic displacement PDFs F̄_{i}(u_{i}) for all other atoms i and taking the isotropic average over the orientation of the vector r: 6 where R_{±} = (r ± r)^{2}/(2σ_{i}). This result is obtained by inserting the Gaussian PDF F̄_{i}(u_{i}) (see Eqs. 1, 4, and 5) and noting that u = r^{2} + (r)^{2} − 2 r r cos θ, where θ is the angle between r and r. Fig. 1 shows g_{k}(r) for each α carbon in parvalbumin; similar results are obtained for other proteins. The first peak in g_{k}(r), with maximum at 1.5 Å and extending to about 3.5 Å, corresponds to the covalent neighbors: 6–10 atoms linked to the reference α carbon by 1–3 bonds. Except for a few α carbons near the chain termini, g_{k}(r) exhibits a second peak, with maximum near 5 Å, produced by a much larger number of atoms. Although close in space, most of these atoms are many bonds away from the reference α carbon and therefore have predominantly noncovalent interactions with it. We refer to them as noncovalent neighbors.
By far the largest contribution to Λ_{k} comes from the covalent neighbors. Because displacements of these atoms are highly correlated with displacement of the reference α carbon, the rigidenvironment approximation (r_{i} = r) is strongly violated. But because the covalent neighbors are distributed in much the same way around all α carbons (see Fig. 1), they hardly contribute to the AMSD variation that we seek to model. We therefore ignore the covalent neighbors and attribute Λ_{k} entirely to the noncovalent neighbors. We must then also reinterpret the pair potential ν_{ki} as the interaction of atom i with the cluster comprising atom k and its 6–10 covalent neighbors.
The mean contribution λ_{k} from the noncovalent neighbors depends mainly on their mean positions. Because the position of the second peak in g_{k}(r) varies relatively little (see Fig. 1), whereas n_{k} varies by a factor 5 (see below), we attribute the variation of Λ_{k} with k entirely to n_{k}, writing Λ_{k} = n_{k} λ. We then arrive at the desired result 7 predicting that the AMSD is inversely proportional to the contact density n_{k}, i.e., the number of noncovalent neighbors. The set of approximations leading to Eq. 7 will be referred to as the local density model (LDM). The LDM can predict the AMSD variation within a protein but, because the parameter λ is undetermined, it cannot yield the mean AMSD. In comparing LDM predictions with experimental results, we therefore scale the calculated AMSDs such that 〈σ〉 = 〈σ〉. This scaling takes care of the temperature dependence of the AMSDs; if the (renormalized) pair interactions are temperature independent, it follows from the foregoing that λ ∝ T^{−1}.
Selection of Proteins.
To test the hypothesis that AMSDs scale with local density according to Eq. 7, we use a set of 38 crystal structures taken from the current (Feb. 25, 2001) pdb select list of structures with less than 25% sequence identity between any two proteins (25). From this list, we selected the structures of highest resolution (≤1.30 Å) and best quality (R factor < 0.16), retaining only singlechain proteins with more than 50 residues. Because experimental AMSDs depend to some extent on the refinement method, we included only structures refined with the program shelxl (26), the most widely used protocol for ultrahighresolution data. Although nearly all of the selected structures were refined with anisotropic Debye–Waller factors, we use only the isotropic AMSDs. (Relevant characteristics of the analyzed protein structures are collected in Table 2, which is published as supporting information on the PNAS web site, www.pnas.org).
Assessment of LDM Predictions.
Two different indicators, a merit function and a measure of association, are used here to quantitatively assess the agreement between calculated (σ) and experimental (σ) AMSDs. For most proteins, the σ_{k} distribution is highly skewed, with a sharp cutoff on the lowσ_{k} side, corresponding to atoms in densely packed core regions, and a pronounced tail on the highσ_{k} side, corresponding to atoms in flexible loops or chain termini. The conventional merit function, the meansquare deviation, and the usual measure of association, Pearson's linear correlation coefficient, are unsuitable here because they can be dominated by a few outliers (27). We therefore use more robust indicators. As merit function, we use the relative mean absolute deviation, i.e., 〈σ − σ〉 divided by 〈σ − 〈σ〉〉. Normalization by the experimental σ_{k} variation allows us to compare Δ values from protein structures determined at ambient and cryogenic temperatures. As a measure of association, we use the Spearman rankorder correlation coefficient, ρ, which is based on the rank order of σ_{k} rather than its actual value (27). In contrast to Pearson's coefficient, the nonparametric correlation coefficient ρ can be meaningfully compared among different protein structures.
The Contact Density.
Although the LDM can be used to predict AMSDs for any atom type, we shall mainly discuss α carbons here. The contact density n_{k} is then the number of nonhydrogen atoms within a distance R_{C} from the reference α carbon k. The cutoff radius R_{C} should be chosen to include most of the second peak in g_{k}(r). For the calculations reported here, we have fixed R_{C} to the radial distance, R, of the second minimum in the Cα–Cα radial density, 4π r^{2}g_{αα}(r). For our data set, R has a mean of 7.35 Å and a standard deviation of only 0.18 Å. Virtually identical results are obtained for any R_{C} value in the range 7–10 Å (see Fig. 6, which is published as supporting information on the PNAS web site).
The contact density can be obtained simply by counting the number of nonhydrogen atoms whose mean positions are within R_{C} of atom k. However, we can compensate to some extent for the shortcomings of the rigidenvironment approximation by taking into account thermal displacements of neighbor atoms in the calculation of n_{k}. We thus obtain n_{k} by integrating the radial density n_{k}(r) = 4π r^{2}g_{k}(r), with g_{k}(r) given by Eq. 6, from r = 0 to r = R_{C}. Because this requires knowledge of the AMSD variation that we want to predict, we perform a selfconsistent calculation starting from a flat AMSD profile. When integrating n_{k}(r), we should also use a lower cutoff of about 3.5 Å to exclude the covalent neighbors. However, because n_{k} is heavily dominated by noncovalent neighbors, a lower cutoff has little effect.
The Cα contact density distribution P(n), calculated in this way with R_{C} = 7.35 Å, is shown in Fig. 2 for the entire data set. For each of the 38 proteins, the contact density spans essentially the whole range, n ≈ 20–100, of the cumulative distribution. The smalln flank of P(n) is caused by α carbons in exposed termini and loops. On the other flank, P(n) drops sharply as the closepacking limit of n ≈ 100 is approached. At intermediate densities, P(n) exhibits a broad plateau for n ≈ 60–90. The mean contact density for all 6,231 α carbons is 67.5.
Some α carbons have low contact density simply because they are near the surface of the protein. To examine this geometric contribution to the contact density, we calculated P(n) for a set of 38 uniformly packed spherical “proteins” with the equivalentsphere radii of the real proteins. The resulting P(n) has two striking features (see Fig. 2). First, it is dominated by a large peak (truncated in Fig. 2) from atoms that are further than R_{C} from the surface. These “core” atoms all have the same contact density, which we have set to 100. Second, P(n) decreases with n over a wide range, because the number of atoms in a spherical shell decreases towards the center. Because P(n) for real proteins displays neither of these features, we conclude that it mainly reflects local variations in packing density (including the detailed shape of the surface).
The contact density used in the following to predict AMSDs includes probability density from all nonhydrogen atoms within a sphere of radius R_{C}, whether or not these atoms belong to the same protein molecule as the reference atom k. In other words, n_{k} may contain contributions from atoms in neighboring protein molecules in the crystal. The contact density may also contain contributions from cofactors, such as heme groups, iron–sulphur clusters, and specifically bound metal ions and, in a few cases, from internally bound substrates. On the other hand, n_{k} does not include any contributions from water molecules or cosolvents. (In a separate calculation on two BPTI structures, inclusion of the four internal water molecules was found to slightly improve the agreement between predicted and experimental AMSDs.) Most of the crystal structures in our data set contain multiple conformations of several residues, particularly for sidechains. For such residues, the dominant conformation and its associated AMSDs were used for the analysis.
Results
We consider first backbone flexibility, comparing predicted and experimental AMSD profiles for the α carbons along the polypeptide chain. Fig. 3 shows such profiles for Serratia marcescens endonuclease (241 residues). Predicted AMSDs were calculated selfconsistently by using Eqs. 6 and 7 with R_{C} = 7.32 Å. The LDM reproduces all prominent peaks in the AMSD profile and even captures minor features, such as the periodic AMSD variation often seen in α helices. The prediction quality indicators are Δ = 0.54 and ρ = 0.82. (For a compilation of quality indicators for all protein structures, see Table 3, which is published as supporting information on the PNAS web site.)
The LDM yields consistently accurate predictions of AMSD profiles; for our set of 38 protein structures, Δ = 0.72 ± 0.11 and ρ = 0.70 ± 0.09 (mean ± one standard deviation). For 82% of these structures, Δ < 0.78 and ρ > 0.60. The indicators Δ and ρ do not correlate with protein size or with secondary structure content (see Fig. 7, which is published as supporting information on the PNAS web site).
Among the analyzed structures, 31 were determined at cryogenic temperatures (85–120 K) and only 7 at ambient temperatures (287–300 K). Although proteins are less flexible at low temperature, any static lattice disorder should be unaffected by cryogenic quenching. The effect of lattice disorder might be modeled by fitting the two parameters in σ_{k} = σ_{0} + c/n_{k} to the experimental AMSDs. Because the relation between σ_{k} and 1/n_{k} remains linear, the correlation coefficient is not affected. Although the ambienttemperature structures have larger mean AMSD than the cryostructures (0.54 versus 0.43 Å^{2}), the agreement between predicted and experimental AMSD profiles is hardly better for the roomtemperature structures: Δ = 0.70 ± 0.11 versus 0.73 ± 0.11 and ρ = 0.70 ± 0.12 versus 0.70 ± 0.08. This nearinvariance suggests that lattice disorder does not contribute significantly to our data set, in accordance with the expectation that crystals diffracting to atomic resolution should exhibit little mosaicity (3).
As seen from Table 1, the selfconsistent calculation of the contact density, according to Eq. 6, leads to a small improvement compared to a fixedatom calculation. When the contact density includes only atoms in the same protein molecule as the reference α carbons, the predicted AMSD profile often exhibits peaks that are absent in the experimental profile. These spurious peaks tend to coincide with loop and turn regions in intimate contact with adjacent protein molecules. In the LDM, the effect of such crystal contacts can be taken into account by including all nonhydrogen atoms in neighboring proteins that are within R_{C} of any of the reference α carbons. This inclusion markedly improves the agreement with experiment (see Table 1, rows b and d), particularly for small proteins, which have a larger fraction of their residues involved in crystal contacts. Fig. 4 illustrates the effect of crystal contacts for Bacillus caldolyticus coldshock protein (66 residues, R_{C} = 7.71 Å). The agreement with experiment is clearly better when the contact density includes contributions from crystal neighbors (Δ = 0.77, ρ = 0.71) rather than just atoms in the reference molecule (Δ = 1.36, ρ = 0.58). The strongest crystal interaction, clearly manifested in the AMSD profile, involves a loop region (G21, E22, G23) in close contact with the N terminus (M1, Q2, R3) of a symmetryrelated molecule.
The LDM is not limited to α carbons but can be applied to any or all atom types. Fig. 5 shows the AMSD profile for all 460 nonhydrogen atoms in a Kunitztype domain from collagen (58 residues, R_{C} = 7.40 Å). The LDM correctly predicts that sidechain atoms are more flexible than adjacent backbone atoms and, in most cases, also reproduces the relative flexibility of different sidechains. The overall agreement with experiment is comparable to that found for α carbons only. Fig. 5 also shows, in several instances, that the LDM correctly identifies sidechains with reduced flexibility because of crystal contacts.
Discussion
Interactions, Dynamics, and Flexibility.
Considering its extreme simplicity, the LDM is remarkably successful. Its central idea, that AMSD profiles are governed by spatial variations in packing density, might seem to ignore all interactions apart from excluded volume. In particular, the LDM does not recognize covalent bonds or hydrogen bonds explicitly. However, all types of interactions, specific as well as nonspecific, are implicitly manifested in the LDM via their effect on the local density. Elements of regular secondary structure, such as α helices and β sheets, not only are extensively hydrogenbonded but also are densely packed. Disulfide bridges not only impose connectivity constraints on conformational motions, but, by forcing backbone segments together, also increase the local atomic density. Thus, for example, the LDM accurately predicts the AMSDs of all six disulfide cysteine residues in the Kunitztype domain C5 (see Fig. 5).
Protein conformational flexibility may be thought of and rationalized in different ways. The most widely adopted viewpoint is to interpret AMSDs in terms of conformational motion. Ultimately, however, both flexibility and dynamics are determined by interactions. It is therefore possible, in principle, to predict AMSDs from detailed interaction models. This approach is computationally demanding and has met with limited success so far, partly because interactions in proteins are extremely complex and not yet fully understood. By relating flexibility directly to local density, the LDM offers a conceptual shortcut that bypasses the intricacies of detailed interaction models.
As demonstrated here, variations in smallamplitude structural flexibility within native proteins are largely governed by spatial inhomogeneities in packing density. By unifying these aspects of protein structure, the LDM contributes to our understanding of the physical properties of proteins. On the other hand, the success of the LDM implies that (isotropic) crystallographic B factors supply very little independent information not already present in the mean atomic coordinates. It should be possible to improve the accuracy of LDM predictions by including ordered water molecules buried in internal cavities or trapped at crystal contacts, by using weight factors for different (united) atom types, or by optimizing the cutoff radius for each protein structure. Yet, truly quantitative accuracy cannot be expected from such a simple model. Further insight about the determinants of structural flexibility might come from a systematic study of those instances where the LDM predictions are least accurate. In some cases, such discrepancies might be traced to unresolved conformational substates; in other cases, they might reflect deficiencies in the model.
Other AMSD Correlations.
Protein flexibility correlates with a variety of physical properties, such as solvent exposure, distance from centerofmass, and secondary structure (3–5, 12, 28, 29). The most frequently invoked correlation is that with solventaccessible surface area (SASA) (5, 12, 29). The simplest linear relationship between AMSD (σ_{k}) and SASA (a_{k}) of atom k is σ_{k} = σ_{0} + ca_{k}. Like the uniformsphere model discussed above, this model predicts that all buried atoms have the same AMSD, σ_{0}. Although the SASA model identifies many of the prominent peaks in the AMSD profile, arising from exposed loop and turn residues, the predicted AMSD variation is far too small, and most of the fine structure is lost. To quantitatively assess the SASA model, we adjusted the two parameters σ_{0} and c by minimizing the mean absolute deviation between predicted and experimental α carbon AMSDs, using for a_{k} the SASA (1.4Å probe radius) for the backbone atoms of residue k. This yields rather poor agreement for our data set. For example, for protein G Iggbinding domain III (2igd, 61 residues), the SASA model yields Δ = 0.85 and ρ = 0.49, whereas the LDM yields Δ = 0.53 and ρ = 0.83; and for S. marcescens endonuclease (1ql0, 241 residues), the SASA model yields Δ = 0.82 and ρ = 0.54, whereas the LDM yields Δ = 0.54 and ρ = 0.82.
Another approach for predicting α carbon AMSDs in proteins (21) has been inspired by a simple model of rubber elasticity (30, 31). In this Gaussian phantom network model (GNM), an elastomer material is modeled as a network of noninteracting polymer segments, where any two connected junctions are subject to a restoring force proportional to their separation and with a force constant inversely proportional to the segment contour length (30). This force, which increases with separation, is entirely generated by the configurational entropy of the polymer segment; all mechanical interactions, even excluded volume, are neglected. To prevent the model network from collapsing, the mean positions of the junctions are taken to be fixed by external forces. The GNM leads to a harmonic POMF w_{k}(u_{k}) and, consequently, to a Gaussian displacement PDF F_{k}(u_{k}) (30, 31).
In the protein version of the GNM, here denoted as PGNM, the α carbons are regarded as junctions in a virtual network characterized by pairwise interactions of the form ν_{ki} = (γ/2) ɛ_{ki} r_{ki} − r^{2} = (γ/2) ɛ_{ki} u_{k} − u_{i}^{2}, where ɛ_{ki} = 1 if r ≤ R_{C} and ɛ_{ki} = 0 otherwise. In contrast to the entropic interaction in the original GNM, this pair potential is postulated without an underlying physical model. This fundamental difference between the two models reflects the fact that the junctions are physically linked in the real network (GNM) but not in the virtual network (PGNM). The real interaction between α carbons is negligibly weak and monotonically decaying (as r) at most separations of interest here.
As in the original GNM, the postulated harmonic form of the pair interaction leads to a Gaussian displacement PDF with the AMSD given by 8 Here, D is the diagonal eigenvalue matrix, and U is the orthogonal eigenvector matrix that diagonalizes the symmetric matrix Γ according to U^{T} Γ U = D. The socalled Kirchoff adjacency matrix Γ (32) has offdiagonal elements Γ_{ki} = −ɛ_{ki}, whereas its diagonal elements are the Cα–Cα contact densities, Γ_{kk} = ∑_{i≠k}ɛ_{ki} = n_{k}. When expressed in terms of the atomic displacements u_{k}, the partition function diverges, because the adjacency matrix is singular (rank N − 1) in the PGNM. Therefore, the inverse of Γ does not exist. In Eq. 8, the zero eigenvalue, corresponding to a uniform displacement of all atoms, is omitted from the sum. The adjacency matrix is dominated by its diagonal elements, giving the number of α carbons within R_{C} (averaging 8.3 ± 0.6 for our data set). If all offdiagonal elements (−1 or 0) in Γ are set to zero, one obtains σ_{k} = 3 k_{B}T/(γ n_{k}), which coincides with the LDM result in Eq. 7, with the correspondence λ ↔ γ/(2 k_{B}T).
We have tested the PGNM on our set of 38 highquality protein structures. As seen from Table 1 (rows d and e), the LDM predicts α carbon AMSDs considerably more accurately than the PGNM. It is also of interest to compare PGNM and LDM predictions when both models are based on α carbons only. As seen from Table 1 (rows e and f), the offdiagonal elements of Γ do not substantially improve the AMSDs. (This is also the case when both models are based on all nonhydrogen atoms.) Furthermore, when the contact density is calculated selfconsistently, the LDM performs slightly better (and more consistently) than the PGNM, even when only α carbons are included in the contact density (rows e and g).
A major weakness of the PGNM is its obscure physical basis. Although the mathematical formalism conforms closely to the original GNM, the underlying physics is quite different. For example, in the original GNM, γ is proportional to T, making the AMSDs independent of temperature, in contrast to what is observed for proteins (33). The PGNM approach has been justified a posteriori through its agreement with experimental AMSDs (34, 35). We believe that the reason for the relative success of the PGNM can be found in the physical justification given here for the LDM, to which it reduces after a numerical approximation (neglect of offdiagonal Γ elements). The LDM is not only more accurate than the PGNM; it is also more computationally efficient. Because it does not involve any matrix diagonalization, the LDM can readily be used to predict AMSDs for all nonhydrogen atoms. To the extent that NMR derived secondrank orientational order parameters for peptide N—H bonds correlate with (crystalcontactcorrected) xrayderived AMSDs (29, 35–37), they can also be predicted by the LDM.
Acknowledgments
This work was supported by the Swedish Research Council.
Footnotes

↵* Email: bertil.halle{at}bpc.lu.se.

This paper was submitted directly (Track II) to the PNAS office.
Abbreviations
 AMSD,
 atomic meansquare displacement;
 GNM,
 Gaussian network model;
 PGNM,
 protein version of GNM;
 LDM,
 local density model;
 PDF,
 probability distribution function;
 POMF,
 potential of mean force;
 SASA,
 solvent accessible surface area
 Received October 2, 2001.
 Copyright © 2002, The National Academy of Sciences
References
 ↵
 Willis B T M,
 Pryor A W
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 York D M,
 Wlodawer A,
 Pedersen L G,
 Darden T A
 ↵
 ↵
 ↵
 Rejto P A,
 Freer S T
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 Tama Y,
 Sanejouand YH
 ↵
 Hill T L
 ↵
 ↵
 ↵
 Press W H,
 Teukolsky S A,
 Vetterling W T,
 Flannery B P
 ↵
 Kuriyan J,
 Weis W I
 ↵
 ↵
 ↵
 Flory P J
 ↵
 ↵
 ↵
 ↵
 ↵