# Effects of geometry and chemistry on hydrophobic solvation

See allHide authors and affiliations

Edited by Pablo G. Debenedetti, Princeton University, Princeton, NJ, and approved August 21, 2014 (received for review April 4, 2014)

## Significance

The solvation free energy of a molecule includes the free energy required to remove solvent from what will become the molecular interior and the free energy gained from dispersive interactions between the solute and solvent. Traditionally, these free energies have been assumed to be proportional to the surface area of the molecule. However, we computed these free energies for a series of alkanes and four configurations of decaalanine and showed that although these free energies were linear in the surface area for each set of molecules, each atom’s contributions to these energies depended on correlations with its surrounding atoms. The atomic contributions to these energies were therefore not additive. This finding suggests that most current hydrophobic models are unsatisfactory.

## Abstract

Inserting an uncharged van der Waals (vdw) cavity into water disrupts the distribution of water and creates attractive dispersion interactions between the solvent and solute. This free-energy change is the hydrophobic solvation energy (Δ*G*_{vdw}). Frequently, it is assumed to be linear in the solvent-accessible surface area, with a positive surface tension (γ) that is independent of the properties of the molecule. However, we found that γ for a set of alkanes differed from that for four configurations of decaalanine, and γ = −5 was negative for the decaalanines. These findings conflict with the notion that Δ*G*_{vdw} favors smaller *A*. We broke Δ*G*_{vdw} into the free energy required to exclude water from the vdw cavity (Δ*G*_{rep}) and the free energy of forming the attractive interactions between the solute and solvent (Δ*G*_{att}) and found that γ < 0 for the decaalanines because −γ_{att} > γ_{rep} and γ_{att} < 0. Additionally, γ_{att} and γ_{rep} for the alkanes differed from those for the decaalanines, implying that none of Δ*G*_{att}, Δ*G*_{rep}, and Δ*G*_{vdw} can be computed with a constant surface tension. We also showed that Δ*G*_{att} could not be computed from either the initial or final water distributions, implying that this quantity is more difficult to compute than is sometimes assumed. Finally, we showed that each atom’s contribution to γ_{rep} depended on multibody interactions with its surrounding atoms, implying that these contributions are not additive. These findings call into question some hydrophobic models.

Many techniques in computational biophysics require the computation of free-energy differences. However, directly computing these free-energy differences with quantum or classical molecular dynamics (MD) methods has proven challenging because doing so requires long simulation times. To make such computations more tractable, techniques have been developed based on the observation that these free-energy changes could be computed by combining estimates of the changes in vacuum, where they are usually easier to compute, with estimates of the solvation energies (Δ*G*) of the initial and final states (1, 2).

One common technique for computing Δ*G* is to break it into electrostatic (Δ*G*_{el}) and hydrophobic (Δ*G*_{vdw}) terms (3, 4). In this breakdown, Δ*G*_{vdw} is the free energy required to place an uncharged van der Waals (vdw) cavity into solution, and Δ*G*_{el} is the free energy of charging the cavity once it has been placed into solution. Several techniques, including the Poisson–Boltzmann equation (5), the generalized Born model (6), structured continuum approaches (7⇓–9), and integral equation methods (10), have been developed to compute Δ*G*_{el}. In the present study, we continue a study of Δ*G*_{vdw} begun previously (11⇓–13).

Many models have been constructed to compute Δ*G*_{vdw}. The first of these models, and perhaps the most widespread, is_{vdw} is a positive constant independent of the properties of the molecule and *b* is the energy of solvating a point-like solute (2, 14⇓⇓–17). This model follows from assuming that Δ*G*_{vdw} for microscopic cavities should scale with *A* in the same manner as for macroscopic cavities (18). Alternatively, several studies have pointed out that experimental measurements often are not consistent with Eq. **1** (19⇓⇓⇓⇓⇓⇓⇓–27). These studies have argued that Δ*G*_{vdw} should be split into purely repulsive (Δ*G*_{rep}) and attractive (Δ*G*_{att}) parts:*G*_{rep} is a linear function of *A* or the molecular volume (*V*),_{rep} and *ρ*_{rep} are positive constants independent of the properties of the molecule. Studies (28⇓⇓–31) have argued that Δ*G*_{rep} should obey Eq. **3** for large solute molecules, Eq. **4** for small solutes, and Eq. **5** in a cross-over regime. Alternatively, one could construct other functional forms that would approach Eqs. **3** and **4** in the appropriate limits, such as*G*_{att} by integrating over approximate solvent densities:*U*_{att} was the attractive dispersive interaction energy between the solute and solvent, as described in *Materials and Methods*.

In three previous studies, we obtained computational results that appeared to contradict some common expectations of Δ*G*_{vdw} (11⇓–13). For example, the idea that Δ*G*_{vdw} drives the initial collapse during protein folding by opposing the formation of cavities in the solvent with an energy penalty that increases with *A* has frequently been discussed (2, 32, 33). However, in the first of our studies (11), where we computed Δ*G*_{vdw} for a series of glycine peptides ranging in length from 1 to 5 monomers, we found that although Δ*G*_{vdw} was linear in the number of monomers, as would be expected if it scaled with *A*, Δ*G*_{vdw} was negative and decreased with the number of monomers, seemingly implying that γ_{vdw} was negative. In the second of our studies (12), we computed Δ*G*_{vdw} for a set of decaalanine conformations and found that Δ*G*_{vdw} was negative for these peptides as well. Finally, in the third study (13), we computed Δ*G*_{vdw} for a series of alanine peptides ranging in length from 1 to 10 monomers, with the peptides either held in fixed, extended conformations or allowed to assume a normal distribution of conformations. We once again found that Δ*G*_{vdw} was negative, and it decreased with the number of monomers when the peptides were held in extended conformations but increased with the number of monomers when the peptides were allowed to adopt a normal distribution of conformations.

To better understand these findings, here we computed with free-energy perturbation (FEP) (34, 35) not just Δ*G*_{vdw} but also Δ*G*_{rep} and Δ*G*_{att} for four conformations of decaalanine from a previous study (12) and a series of alkanes that has been examined in the literature (22). As discussed below, none of Eqs. **1** and **3****–****7** are consistent with our data. Each atom’s contributions to Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} appear to depend on their chemical surroundings and the geometry of the molecular surface. We do not know of a theory that can explain the calculations presented here.

## Results

In Table 1 Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} are shown. As expected, Δ*G*_{rep} and Δ*G*_{att} were highly correlated with *A*, and Δ*G*_{vdw} was highly correlated with *A* for the decaalanine conformers. The significantly weaker correlation between Δ*G*_{vdw} and *A* for the alkanes may be attributable to the small range of Δ*G*_{vdw} for these molecules, about 2.0 kcal/mol. Estimates of γ_{rep}, γ_{att}, and γ_{vdw} from least-squares fits between these energies and *A* are shown in Table 1, along with the Pearson’s correlation coefficients (*R*^{2}) of these fits. Plots of Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} against *A* are shown in *SI Appendix*, Figs. S2 and S3.

As a test of the convergence of Δ*G*_{rep} and Δ*G*_{att} for the alkanes and Δ*G*_{rep} for the decaalanines, we compared the estimates obtained from forward FEP to those obtained from backward FEP. To test the convergence of Δ*G*_{att} for the decaalanines, we computed estimates from simulations that were half as long. To compute Δ*G*_{vdw} for the decaalanines, we combined Δ*G*_{rep} computed with forward FEP with Δ*G*_{att} from the longer simulations. To compute Δ*G*_{vdw} for the alkanes, we combined estimates of Δ*G*_{rep} and Δ*G*_{att} obtained from forward FEP. The energies given by the presumably less accurate calculations differed by at most 0.57 kcal/mol from those given by the more accurate simulations. Additionally, our estimates of ∂Δ*G*_{rep}/∂λ and ∂Δ*G*_{att}/∂λ varied smoothly with λ, indicating that our results were converged. Plots of ∂Δ*G*_{rep}/∂λ and ∂Δ*G*_{att}/∂λ versus λ for one of the decaalanine conformations are shown in *SI Appendix*, Fig. S1, and the other estimates of Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} are shown in *SI Appendix*, Table S1. The estimates of Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} for the alkanes were in good agreement with results reported elsewhere (22).

In agreement with our previous studies (12, 13), γ_{vdw} < 0 for the decaalanines. By breaking Δ*G*_{vdw} into Δ*G*_{rep} and Δ*G*_{att} we saw that γ_{vdw} was negative because, although increasing the surface area led to increasingly unfavorable Δ*G*_{rep} (γ_{rep} > 0), this energy penalty was more than offset by the increasingly favorable attractive interactions between the solute and solvent (γ_{att} < 0, and −γ_{att} > γ_{rep}.).

As shown in Table 1, γ_{att} for the alkanes differed from that for the decaalanines. This finding is in agreement with previous studies that Δ*G*_{att} is not a simple function of *A* (19⇓⇓⇓⇓⇓⇓⇓–27), and it implies that Δ*G*_{vdw} ≡ Δ*G*_{att} + Δ*G*_{rep} is also probably not a simple function of *A*. Attempting to estimate Δ*G*_{vdw} with a single γ_{vdw} independent of the other properties of the molecule is probably not possible.

For the alkanes both linear response theory (LRT) and the single-step perturbation (SSP) methods yielded acceptable estimates of Δ*G*_{att}, and for the decaalanines LRT yielded estimates that were within 10 kcal/mol of the results given by FEP (1). However, SSP yielded clearly worse estimates of Δ*G*_{att} for the decaalanines, and both methods overestimated γ_{att} for the decaalanines by more than 5 cal/mol per Å^{2}. These findings imply that SSP may not give good estimates of Δ*G*_{att} for some systems. Instead, the change in the water distribution that occurs when the dispersive interactions are turned on would need to be taken into account. Additionally, the decaalanines examined here were smaller than many systems of interest in biophysical problems (e.g., globular proteins). Further work will be needed to determine whether these conclusions hold for larger biomolecules. Plots of *A* are shown in *SI Appendix*, Fig. S2.

Table 1 also shows that γ_{rep} for the decaalanines differed significantly from γ_{rep} for the alkanes. This observation implies that even if a theory were created that could predict Δ*G*_{att}, estimating Δ*G*_{rep} by assuming that it is linear in *A* with a surface tension that is independent of the properties of the molecule may not be possible.

### Perturbative Derivative Calculations.

We used Eqs. **19** and **20** to investigate how small changes in atomic coordinates affect Δ*G*_{att} and Δ*G*_{rep}, and we plotted the resulting derivatives against ∂*A*/∂*x*_{i}. If Δ*G*_{rep} and Δ*G*_{att} were simply linear functions of *A*, the slopes of the resulting least-squares lines would provide perturbative estimates, _{rep} and γ_{att}. Table 1 lists *R*^{2} of the related least-squares lines. *SI Appendix*, Fig. S6 is a plot of ∂Δ*G*_{att}/∂*x*_{i} versus ∂*A*/∂*x*_{i} for the decaalanine conformations, and *SI Appendix*, Fig. S7 includes plots of ∂Δ*G*_{rep}/∂*x*_{i} and ∂Δ*G*_{att}/∂*x*_{i} versus ∂*A*/∂*x*_{i} for the alkanes.

The data in Table 1 show that *G*_{att}/∂*x*_{i}, with a few exceptions, was not strongly correlated with ∂*A*/∂*x*_{i}. These findings may indicate that Δ*G*_{att} is not truly a simple linear function of *A* but instead must be computed with a more robust theory, as discussed in several previous studies (19⇓⇓⇓⇓⇓⇓⇓–27).

To ensure that the estimates of ∂*A*/∂*x*_{i} used to generate Table 1 were accurate, finite difference estimates of these derivatives were computed by moving each atom by 0.0001 A in each direction. The finite difference estimates of these derivatives were nearly identical to those given by PROGEOM (*R*^{2} > 0.99999, and the slopes of the least-squares lines were between 0.99 and 1.01). To check that ∂Δ*G*_{rep}/∂*x*_{i} and ∂Δ*G*_{att}/∂*x*_{i} had converged, the estimates used to generate Table 1 were compared with estimates obtained by using only the first halves of the trajectories. The results from this second computation were nearly identical to those used to construct Table 1 (*R*^{2} > 0.99, with slopes between 0.96 and 1.03).

Additionally, _{rep} for the alkanes differed from that of the alkanes, this observation appears to imply that Δ*G*_{rep} is not a simple linear function of *A*.

As can be seen in Table 1, *G*_{rep}/∂*x*_{i} and ∂*A*/∂*x*_{i} (*R*^{2} ∼ 0.85) was weaker than that between Δ*G*_{rep} and *A* (Table 1 and Fig. 1*B*). This weaker correlation could indicate that the good correlation between *A* and Δ*G*_{rep} observed in Table 1 may be attributable to cancellations among small deviations from linearity. This observation may have implications for applications where the changes in Δ*G*_{rep} arising from small perturbations of a structure, such as the rotation of a side chain or the addition or deletion of a few atoms, are estimated. Predicting such free-energy changes by simply assuming that changes in Δ*G*_{rep}, Δ*G*_{att}, or Δ*G*_{vdw} should be proportional to changes in *A* may not be accurate.

Fig. 1*B* shows that for four of the atom types in the decaalanines (CT3, HA, HB, and O, corresponding to the β-C and terminal carbons, the hydrogens that bind to the β-C, the hydrogens that bind to the *α*-C, and the backbone oxygens, respectively), ∂Δ*G*_{rep}/∂*x*_{i} is roughly linear in ∂*A*/∂*x*_{i}, although the correlations between these two quantities are not perfect. However, for the atoms of type CT1 (*α*-C), no clear relationship between ∂Δ*G*_{rep}/∂*x*_{i} and ∂*A*/∂*x*_{i} was seen, and for atom types NH1 (backbone nitrogen) and C (carbonyl carbon), ∂Δ*G*_{rep}/∂*x*_{i} actually appeared to decrease with increasing ∂*A*/∂*x*_{i}. These findings are confirmed in Fig. 1*C*, where the probability densities (*f*) of the angles (θ) between the vectors ∇_{i}*A* and ∇_{i}Δ*G*_{rep}, where ∇_{i} = (∂/∂*x*_{i}, ∂/∂*y*_{i}, ∂/∂*z*_{i}) and (*x*_{i}, *y*_{i}, *z*_{i}) were the coordinates of the center of atom *i*, are plotted as functions of θ. Although *f*(θ) was peaked at 0° for atom types CT3, HA, HB, and O, *f*(θ) was peaked at 180° for atom types NH1 and C, and *f*(θ) had no clear peak for atom type CT1.

Atom types CT3, HA, HB, and O all make up significant portions of the decaalanine solvent-accessible molecular surface (Fig. 1*D*), and they typically have a single, continuous solvent-accessible surface patch. Moving these atoms further into the solvent (thereby increasing Δ*G*_{rep}) should also increase *A*. Conversely, although atoms of type CT1 have large solvent-accessible regions, these regions are broken up into several separate patches. Moving these atoms in one direction may increase the sizes of some of these patches while decreasing the sizes of others. This more complicated surface geometry probably explains why θ was not strongly peaked for the CT1 atoms. In contrast, atom types NH1 and C typically only have small patches of solvent-accessible surface. If such a patch lies at the bottom of a deep depression on the molecular surface, then moving the corresponding atom further into the solvent may increase its solvent-accessible surface area but may actually decrease *A* by burying solvent-accessible areas of neighboring atoms. An example of such an apparent anomaly is seen in Fig. 1*D*, where the solvent-accessible surface area of the carbonyl carbon of the second residue in one of the decaalanines (d) is shown, along with a vector pointing in the opposite direction of ∇_{i}*A*. Moving this atom in this direction initially decreases *A* (Fig. 1*A*), but after moving this atom in this direction by about 1 A, further motion of the atom in this direction would increase *A*. Apparently, whether Δ*G*_{rep} will increase and at what rate when an atom is moved depends not just on the change in *A* but on the arrangement of neighboring atoms and the local geometry of the molecular surface. In turn, because Δ*G*_{rep} appears to depend on multibody interactions, the atomic contributions to Δ*G*_{rep} do not appear to be additive.

### Considering the Molecular Volume as a Confounding Variable.

As shown above, γ_{rep} for the decaalanines differed from that for the alkanes, *G*_{rep}/∂*x*_{i} was not linear in ∂*A*/∂*x*_{i}. These observations imply that Δ*G*_{rep} is not a function only of *A*, but at first glance they might appear to be consistent with the idea that Δ*G*_{rep} is a function (*f*(*A*, *V*)) of *A* and *V*, approaching Eq. **3** for large molecules and Eq. **4** for small molecules, as has been widely discussed (28⇓⇓–31). However, for the systems presented here *V* is very nearly linear in *A*,*α* = 2.3 Å and *c* = −216 Å^{3} are fit constants, *R*^{2} = 0.998, and *δ* is a function of the coordinates of the atomic centers, defined by Eq. **8**. Thus, the problem of fitting these data against models that contain both *A* and *V*, such as Eqs. **5** and **6**, is not well-posed. However, we do feel that the results in this study, combined with Eq. **8**, can be used to place constraints on a putative *f*(*A*, *V*).

Eq. **8** implies*V*/∂*x*_{i} versus ∂*A*/∂*x*_{i} yielded estimates of *α* = 2.0 for the alkanes, 2.2 for the decaalanines, and 2.1 when the two data sets were considered together. The *R*^{2} of the least-squares lines were 0.97, 0.75, and 0.84.

If ∂*f*/∂*A*, ∂*f*/∂*V*, and ∂δ/∂*A* do not change significantly for a data set and Δ*G*_{rep} = *f*(*A*, *V*), then the slope of a plot of Δ*G*_{rep} versus *A* will be*δ* had no clear dependence on *A*.

Eq. **11**, combined with the observations that γ_{rep} for the alkanes differed from that for the decaalanines and that *δ* has no clear dependence on *A*, leads us to conclude that if Δ*G*_{rep} = *f*(*A*, *V*), then *f*(*A*, *V*) cannot be a linear function of some combination of *A* and *V*, as in Eq. **5**, because then ∂*f*/∂*A* and ∂*f*/∂*V* would be equal in the two data sets and thus their γ_{rep} would be equal.

Next, note that

From Eqs. **11** and **12**, we see that if Δ*G*_{rep} = *f*(*A*, *V*) is simply a function of *A* and *V*, ∂*f*/∂*A*, ∂δ/∂*A*, and ∂*f*/∂*V* do not change much for a set of molecules, and Eq. **8** holds for that set of molecules, the least-squares line of a plot of Δ*G*_{rep} versus *A* should have the same slope as one for a plot of ∂Δ*G*_{rep}/∂*x*_{i} versus ∂*A*/∂*x*_{i}. However, *G*_{rep}/∂*x*_{i} was not even linear in ∂*A*/∂*x*_{i}. We are therefore left with two possibilities: either Δ*G*_{rep} cannot be written as a function of only *A* and *V;* or ∂*f*/∂*A*, ∂δ/∂*A*, and ∂*f*/∂*V* are not roughly constant for each of the two molecular sets examined here but vary in some way that allows the slopes of the two plots to be different but still maintain apparent linear relationships between Δ*G*_{rep} and *A*, and ∂Δ*G*_{rep}/∂*x*_{i} and ∂*A*/∂*x*_{i}.

## Discussion

In this study, we computed the solvation free energy (Δ*G*_{vdw}) of an uncharged vdw cavity into water by computing Δ*G*_{rep} and Δ*G*_{att} for a series of alkanes and four configurations of decaalanine. As expected, Δ*G*_{att} and Δ*G*_{rep} were linear in *A* for both the decaalanines and the alkanes, and Δ*G*_{vdw} was linear in *A* for the decaalanines. Although the correlation between Δ*G*_{vdw} and *A* was weak for the alkanes, this poor correlation could be attributable to insufficiently accurate estimates of Δ*G*_{rep} and Δ*G*_{att} and the geometric contributions of the branched alkanes.

We found that γ_{vdw} < 0 for the decaalanines. This finding seems to indicate that whether Δ*G*_{vdw} favors extended or compact conformations is system dependent. Repeating these calculations with larger and bulkier peptide side chains could provide insight into why such residues are usually buried in protein interiors. The current finding is, however, in agreement with our findings in earlier studies (11, 13) that Δ*G*_{vdw} became increasingly negative as the number of monomers was increased for glycine and alanine peptides. By dividing Δ*G*_{vdw} into Δ*G*_{rep} and Δ*G*_{att} we found that γ_{vdw} < 0 for the decaalanines because, although increasing *A* increases the cost of excluding the water (γ_{rep} > 0), this cost is more than compensated for by the increased favorable dispersive interactions between the water and the peptide (γ_{att} < 0, and −γ_{att} > γ_{rep}).

Our analysis also allowed us to test the simple theories of hydrophobic solvation listed in the Introduction. Although Δ*G*_{vdw} generally increased with *A* for the alkanes, γ_{vdw} < 0 for the decaalanines. In combination with the observation that both sets of systems provided different estimates of γ_{rep} and γ_{att}, this finding may indicate that Eq. **1** is invalid because the proposed γ_{vdw} is not independent of the other properties of the molecule and the details of the correlations with solvent.

Eq. **3** also appears to be inconsistent with our results because the value of γ_{rep} obtained for the alkanes differed from that obtained for the decaalanines, also because *G*_{rep}/∂*x*_{i} was not always linear in ∂*A*/∂*x*_{i}. Indeed, for some atom types in the decaalanines ∂Δ*G*_{rep}/∂*x*_{i} was linear in ∂*A*/∂*x*_{i}, but the slope of the least-squares line was negative. This finding is difficult to explain without taking into account the effects of multibody correlations with neighboring solute atoms, as described in *Results*. Although Δ*G*_{rep} was linear in *A* for both the alkanes and decaalanines, the idea that Δ*G*_{rep} can be computed for molecular-scale cavities with a single, universal surface tension is probably invalid.

Although these results demonstrate that Δ*G*_{rep} is not simply a linear function of *A* (Eq. **3**) for these molecules, they may at first glance appear to be consistent with the idea that Δ*G*_{rep} is actually a function of both *A* and *V* (*f*(*A*, *V*)). However, as we discussed in *Results*, *V* was nearly linear in *A* for the systems in this study, so unambiguously fitting against models similar to those in Eqs. **5** and **6** was not a well-posed problem. Additionally, the linear dependence of *V* on *A* combined with the observations that γ_{rep} for the alkanes differed from that of the decaalanines, that *G*_{rep}/∂*x*_{i} was not linear in ∂*A*/∂*x*_{i} places strong constraints on the form of any such function *f*(*A*, *V*). In particular, models of the form in Eq. **5** are not consistent with these results.

Finally, our results indicate that even if Δ*G*_{rep} were given, Δ*G*_{vdw} might be difficult to compute because Eq. **7** cannot be used to compute Δ*G*_{att} for the decaalanines. Although SSP theory could predict Δ*G*_{att} for the alkanes, regardless of whether the initial or final water distributions were considered, it could not do so for the decaalanines. Instead, the change in water distributions between the initial and final states had to be considered.

In conclusion, we found that none of Δ*G*_{rep}, Δ*G*_{att}, and Δ*G*_{vdw} were simple functions of *A* and *V* over a range of alkanes and decaalanine conformers. Instead, these free energies appear to depend on multibody interactions with their neighboring atoms, the shape of the molecular interface, and on subtle changes in the water density surrounding the molecule when the attractive interactions are turned on. A successful hydrophobic theory will have to account for these interactions, and Eqs. **1** and **3****–****7** do not appear to be consistent with these findings.

## Materials and Methods

All MD simulations were run with a modified version of NAMD (nanoscale molecular dynamics) 2.9 (37). SHAKE was used to constrain the hydrogens. All simulations used the TIP3P (transferable intermolecular potential 3 point) water model modified for use with the CHARMM (chemistry at Harvard macromolecular mechanics) force field (38, 39), a constant temperature of 300 K, a constant pressure of 1 atm, periodic boundary conditions, particle mesh Ewald for the electrostatics, and a 2-fs time step (37). All *A*, *V*, and their derivatives with respect to the atomic coordinates were computed with the ALPHASURF program in the PROGEOM package (40), modified to weight each atom by 1. The solvent-accessible surface as defined by Lee and Richards was used (41), with the vdw radii taken from the CHARMM 22 force field for alanine and from the parameters used by Gallicchio et al. for the alkanes (22). A probe radius of 1.7682 Å was used rather than the normal choice of 1.4 Å because it was the radius of the oxygen atom in the water model. This choice also led to slightly better correlations between Δ*G*_{rep} and *A*.

### Structure Preparation.

The decaalanine structures were taken from a previous study (12). The structures we used were identified in that study as extended (d), denatured 1 (d1), denatured 2 (d2), and denatured 3 (d3). The parameters for the decaalanines were taken from the CHARMM 22 force field (38), and the parameters for the alkanes were taken from a previous study (22). The atoms of the decaalanines were fixed during all simulations. The alkanes were created in ideal configurations and allowed to move freely during the calculations of Δ*G*_{rep} and Δ*G*_{att}. For the simulations from which ∂Δ*G*_{rep}/∂*x*_{i} and ∂Δ*G*_{att}/∂*x*_{i} were computed, the configurations of the alkanes were fixed in the configurations obtained after minimization and equilibration in the initial and final frames of the computation of Δ*G*_{att}. Because we were not concerned with electrostatics in these calculations, all simulations were run with uncharged solutes.

### Free-Energy Definitions.

We defined Δ*G*_{rep} as the free-energy difference between a system where the solute did not interact with the water and one where the interaction potential was given by the repulsive part of the Weeks–Chandler–Andersen decomposition (19, 20) of *U*_{vdw}:

where *ε*_{ij} is the well depth and *U*_{vdw}. The energy difference between a system where the interaction potential between the solute and the water was *U*_{rep} and a system where it was *U*_{vdw} then gave Δ*G*_{att}. This process can also be considered to be the addition of an attractive potential (*U*_{att} ≡ *U*_{vdw} − *U*_{rep}) to a system where the interaction energy between the solute and the solvent was *U*_{rep}. We then defined Δ*G*_{vdw} ≡ Δ*G*_{rep} + Δ*G*_{att}. For this process of first creating the repulsive cavity and then turning on the attractive potential, Δ*G*_{att}, Δ*G*_{rep}, and Δ*G*_{vdw} are all well-defined.

### Free-Energy Calculations.

We computed Δ*G*_{rep} and Δ*G*_{att} by backward and forward FEP (34, 35). Although FEP estimates of solvation free energies converge more rapidly if they are performed by inserting the particle (42, 43), the differences between the results obtained with backward and forward FEP provide rough estimates of the errors in the calculations. In FEP a desired free-energy difference is computed by linking the initial and final states with a coupling parameter λ to create a λ-dependent potential (*U*(λ)), where *U*(0) is the interaction potential between the solute and the solvent in the initial state and *U*(1) is the interaction potential in the final state. For the computation of Δ*G*_{rep} *U*(0) = 0, *U*(1) = *U*_{rep}, and if

For the computation of Δ*G*_{att}, *U*(0) = *U*_{rep}, *U*(1) = *U*_{vdw}, and

For the computation of Δ*G*_{rep} for the alkanes, λ-space was divided into 100 equally spaced windows. First, each configuration of decaalanine was placed in a water box and underwent 5,000 steps of minimization with the solute atoms fixed. These structures were then equilibrated at each λ-value. During this equilibration, the temperature of the solute was raised from 25 to 300 K in increments of 25 K with simulations of 2 ps at each temperature. Production simulations of 10 ps were run at each λ-value. Snapshots were taken every 0.2 ps, and these frames were used in the FEP calculations. For the computation of Δ*G*_{att} for the alkanes, λ-space was divided into 20 equally spaced windows, and equilibrated structures at each λ value were generated as before. Simulations of 200 ps were then run at each λ-value. Snapshots were taken every 2 ps.

For the computation of Δ*G*_{att} for the decaalanines, λ-space was divided into 20 equally spaced windows, and equilibrated structures were obtained at each λ-value as described above. Simulations of 10 ps were then run at each λ-value. Snapshots were taken every 0.2 ps, and these frames were used in the FEP. These simulations were then repeated but run for 20 ps at each λ-value to verify that the original simulations had converged. For the computation of Δ*G*_{rep} for the decaalanines, λ-space was initially divided into 20 equally spaced windows, simulations of 1 ns were run at each λ-value, and snapshots were taken every 0.2 ps. To reach convergence, for any window in λ-space, if the estimate of the free-energy change across this window from forward FEP differed from that given by backward FEP by more than 1.0 kcal/mol divided by the total number of windows, the window was divided into two by inserting a new λ-value in between. This process was iterated until reasonable estimates were obtained. The final estimates of Δ*G*_{rep} for d used 89 windows, d1 used 93 windows, d2 used 97 windows, and d3 used 115 windows.

### LRT and SSP.

The LRT is an approximate method of computing free-energy changes between systems when the perturbing potential is small. In LRT, Δ*G*_{att} is estimated by

where the averages are taken over the ensembles defined by *U*_{rep} and *U*_{att}. If the relative probabilities of different configurations of the system do not change much between these two ensembles, then Δ*G*_{att} can be approximated by SSP either by

or

as has been attempted in several studies (23, 24, 27).

The simulations for the LRT and SSP calculations were obtained by running 1-ns simulations in the initial and final ensembles with the same conditions as used for the FEP calculations. Because the values of **17** and **18** should be equal if the assumptions of SSP are valid, either equation could have been used to test the validity of SSP. The values quoted here were obtained from Eq. **18**.

### Free-Energy Derivatives.

In addition to computing Δ*G*_{rep} and Δ*G*_{att}, we also computed their derivatives with respect to the atomic coordinates (*x*_{i}). If Δ*G*_{rep} and Δ*G*_{att} are linear functions of either *A* or *V*, then these derivatives will be proportional to either ∂*A*/∂*x*_{i} or ∂*V*/∂*x*_{i}. We computed ∂Δ*G*_{rep}/∂*x*_{i} by

where this average was taken in the ensemble defined by *U*_{rep}. We used LRT to estimate ∂Δ*G*_{att}/∂*x*_{i}:

These derivatives were computed from the same simulations used to compute the LRT and SSP estimates of Δ*G*_{att}.

## Acknowledgments

We thank The Robert A. Welch Foundation (H-0037), the National Science Foundation (NSF) (CHE-1152876), and the National Institutes of Health (GM-037657) for partial support of this work. A portion of this work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant OCI-1053575. In particular, the calculations were performed on the machines at the Texas Advanced Computing Center and Kraken at Oak Ridge National Laboratory.

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: mpettitt{at}utmb.edu.

Author contributions: R.C.H. and B.M.P. designed research; R.C.H. performed research; R.C.H. and B.M.P. analyzed data; and R.C.H. and B.M.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406080111/-/DCSupplemental.

## References

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Kokubo H,
- Harris RC,
- Asthagiri D,
- Pettitt BM

- ↵
- ↵
- ↵.
- Sharp KA,
- Nicholls A,
- Fine RF,
- Honig B

- ↵
- ↵.
- Young T

- ↵
- ↵.
- Chandler D,
- Weeks JD,
- Andersen HC

- ↵
- ↵.
- Gallicchio E,
- Kubo MM,
- Levy RM

- ↵
- ↵
- ↵
- ↵
- ↵.
- Wagoner JA,
- Baker NA

- ↵
- ↵.
- Huang DM,
- Chandler D

- ↵
- ↵.
- Rajamani S,
- Truskett TM,
- Garde S

- ↵.
- Meyer EE,
- Rosenberg KJ,
- Israelachvili J

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Edelsbrunner H,
- Koehl P

- ↵
- ↵
- ↵

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Chemistry

- Biological Sciences
- Biophysics and Computational Biology