New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Accurate, conformationdependent predictions of solvent effects on protein ionization constants

Communicated by Robert R. Baldwin, Stanford University Medical Center, Stanford, CA, January 9, 2007 (received for review July 12, 2006)
Abstract
Predicting how aqueous solvent modulates the conformational transitions and influences the pKa values that regulate the biological functions of biomolecules remains an unsolved challenge. To address this problem, we developed FDPB_MF, a rotamer repacking method that exhaustively samples side chain conformational space and rigorously calculates multibody protein–solvent interactions. FDPB_MF predicts the effects on pKa values of various solvent exposures, large ionic strength variations, strong energetic couplings, structural reorganizations and sequence mutations. The method achieves high accuracy, with root mean square deviations within 0.3 pH unit of the experimental values measured for turkey ovomucoid third domain, hen lysozyme, Bacillus circulans xylanase, and human and Escherichia coli thioredoxins. FDPB_MF provides a faithful, quantitative assessment of electrostatic interactions in biological macromolecules.
By strongly interacting with charges, the solvent significantly modulates the electrostatic properties of biomolecular systems. Although variations in pH and ionic strength are responsible for physiologically important conformational transitions (1), quantitatively assessing their role in modulating electrostatic energies is particularly challenging (2). Approaches in which solvent is modeled as a continuum polarizable medium while biomolecules are modeled in explicit molecular detail have become widely used in recent years (3, 4). However, no current method combines broad conformational sampling with a rigorous solvation model that can predict quantitatively and efficiently the effects of solvent on protein energetics and conformations.
In principle, molecular dynamics and freeenergy simulations monitoring protonation events can model accurately the energetic effects of solvation and conformational relaxation of biomolecules. However, despite recent progress, these techniques often fail to converge in a reasonable amount of computer time and therefore do not provide reliable predictions of thermodynamic properties (5, 6).
Rotamer repacking methods sample more efficiently and exhaustively the conformational space of biomolecules. However, these methods require the energy function be decomposed into self energies and interaction energies between pairs of residues (7–10). Consequently, repacking methods cannot model the effects of conformational relaxations involving more than two positions at a time. Many important properties of biomolecular surfaces (i.e., their interface with the solvent and the distribution of bulk ions around them) are not pairwise factorable and depend on the simultaneous knowledge of the conformations of all residues. Approximating these effects by relaxing the protein–solvent interface only at the vicinity of solventexposed charged residues does not improve the prediction of pKa values in proteins (11). Current rotamer repacking methods have also difficulties in assessing accurately and reliably the electrostatic properties of buried charged residues that often play important roles in catalysis, structural specificity, and folding (2, 9–11).
To address these problems, we developed FDPB_MF, a rotamer repacking method that combines a general and full treatment of side chain conformational flexibility with the rigorous computation of multibody protein–solvent interactions. Nonpairwise factorable solvation energy terms calculated by a finitedifference Poisson–Boltzmann (FDPB) method were incorporated into a meanfield (MF) side chain rotamer repacking algorithm. Side chain conformations that minimize the free energy of the system were identified with the explicit relaxation of the protein–solvent interface and of the ion distribution around the protein. Here, we apply FDPB_MF to the classic problem of calculating the ionization constants of protein side chains.
Results
For further details, see supporting information (SI) Text, SI Figs. 4–7, and SI Tables 5–8.
FDPB_MF was developed to provide a solution to the problem of combining efficiently the exhaustive sampling of protein side chain conformations with a rigorous treatment of solvation. The method is based on an ensemble description in which each conformation is assigned a weight corresponding to its probability of being occupied in the population. The effective quantities describing the probabilityweighted conformational ensembles were derived so that they would equate or well approximate the corresponding average calculated by enumerating all of the discrete states of the system. During the development of the method, three important challenges were encountered that led to algorithmic developments for accurate calculation of the solvation energy of a probabilityweighted conformational ensemble [Eq. 1 and SI Fig. 5], calculation of probabilitydependent physical quantities to solve the PoissonBoltzmann equation ( SI Text Eqs. S2 and S4 ) and mapping of probabilityweighted dielectric boundaries on a lattice ( SI Text Eqs. S6 and S7 and SI Fig. 6).
Choice of the Solute Dielectric Constant.
The choice of a dielectric constant for the solute (ε_{p}) depends on the level of explicit modeling of its polarizabilities. ε_{p} = 2 is commonly assumed to be an adequate value for the implicit average treatment of electronic polarizabilities. Because electronic fluctuations and backbone conformational relaxations are not treated explicitly in FDPB_MF, the solute was assigned a higher ε_{p} of 4, a value derived from crystalline acetamide, a molecular analogue of the protein backbone (12). However, no sets of atomic charges and radii with ε_{p} = 4 have been parameterized to reproduce solvation energies of peptides or proteins. The closest set of parameters is the PARSE set, parameterized to reproduce solvation energies of small molecules with solute polarizabilities corresponding to ε_{p} ranging from 2 to 3 (13). In principle, the calculations should be performed with the ε_{p} of the molecules from which the PARSE set of charges and radii were derived. Table 1 summarizes the prediction of the pKa values of OMTKY3 by FDPB_MF performed with ε_{p} = 2 or 4. The predictions with ε_{p} = 2 were still acceptable (rmsd = 0.5) but significantly less accurate than the ones performed with ε_{p} = 4, especially for the buried Asp27 (rms error of 0.9 versus 0.0). The main reasons for the reduced accuracy could be

FDPB_MF's treatment of polarizability due to conformational relaxation of the polypeptide chain is limited to the effects of sampling side chain rotameric degrees of freedom. The use of ε_{p} = 4 may compensate for the lack of treatment of backbone conformational relaxation and covalent bond distortions and also for the discrete representation of the conformational space.

Solvation due to electronic polarizability may be higher in protein cores than in highly solvated small molecules and would explain why the loss of accuracy mainly affected the buried Asp27.

With ε_{p} = 2, the difference in solvation energies and electrostatic interactions between solventexposed and buried positions are higher than with ε_{p} = 4. The ruggedness of the conformational free energy landscape may therefore be higher with ε_{p} = 2 than with ε_{p} = 4. Accordingly, the selfconsistent MF (SCMF) algorithm did not always well converge at ε_{p} = 2 and this may also explain why the predictions were less accurate. To ensure convergence, the lambda factor (SI Fig. 4 and ref. 15) which controls the rotamer probability updates in the SCMF was subsequently lowered to 0.15 in all of the calculations performed in this study. In these calculations, the solute and the solvent were also always assigned a dielectric constant of 4 and 80, respectively.
Test of the FDPB_MF Algorithm.
To validate the physical basis of the approach, we performed energy calculations and conformational searches on a model OMTKY3 with three flexible polar sites that we could enumerate discretely. Effective solvation energies were found to be within 0.3 kcal/mol of the exact probabilityweighted average energies of the discrete states of the system (data not shown). The ability of the FDPB_MF algorithm to find pHdependent global free energy minima also was assessed. The FDPB_MF titration curves were compared with the exact curve obtained by exhaustively sampling all of the microstates of the system (Fig. 1). The pKa of Asp27 predicted by FDPB_MF in the model OMTKY3 lies within 0.3 pH unit of the value calculated by using the exact scheme. However, the variations in the probability of the deprotonated species upon pH predicted by FDPB_MF were much sharper than the one predicted by enumerating all of the microstates of the system. Asp27 protonated and deprotonated species interact strongly with two different conformations of Tyr31 (see SI Fig. 7A ). This strong conformational coupling could not be resolved by a simple meanfield averaging of the conformational ensembles (zero and firstorder terms of Eq. 1 ) (16, 17). At pH values close to the pKa, where these mutually exclusive configurations have similar free energies, FDPB_MF only selects the configuration that has the lowest free energy. To accurately compute the probability of each chemical specie at pH values close to the pKa, protonated and deprotonated states for this residue pair have to be enumerated and treated separately (secondorder term in Eq. 1 ). The secondorder term in Eq. 1 also proved to be essential in the accurate prediction of the pKas of the two buried energetically coupled glutamate residues of Bacillus circulans xylanase.
Prediction of SolventDependent Protein Ionization Constants.
We then assessed the ability of FDPB_MF to accurately predict solventdependent energetics in proteins. For all FDPB_MF predictions, conformational flexibility was modeled for side chains atoms, whereas backbone atoms were fixed to the coordinates determined by xray crystallography. The resulting conformational ensemble was relaxed by the SCMF algorithm to find the distribution of rotamer probabilities that minimizes the free energy of the system. The pKa values were assigned to the pH at which the corresponding sites become 50% deprotonated.
Fig. 2 summarizes the pKa calculations performed by FDPB_MF on 31 acidic residues and compares these values to the predictions published by other methods. Among these, PROPKA performs best, with a rmsd, maximal error to the experimental values and correlation coefficient of 0.77 pH unit, 2.5 pH units, and 0.89, respectively. However, the slope of the best fit to these predictions was significantly lower than 1.0 (0.73), suggesting that the model underlying PROPKA has been trained to reproduce experimental data rather than accurately recapitulating first physical principles. FDPB_MF predicts the pKa values with a rmsd, maximal error to the experimental values and correlation coefficient of 0.34 pH unit, 0.70 pH unit, and 0.97, respectively. The slope of the best fit to these predictions was 1.01. We largely focused our study on functionally or structurally important residues with pKas that are often difficult to predict accurately (Table 2). PROPKA surpasses other published methods with a rmsd and maximal error to the experimental values of 0.89 unit and 2.5 pH units, respectively. FDPB_MF predicts the pKa values with a rmsd, maximal error to the experimental values of 0.38 and 0.70 pH unit, respectively.
The ionic strength is an important parameter governing the stability and function of proteins. Several acidic sites of OMTKY3 show measured pKa shifts of 0.25–0.77 pH unit upon variation of the ionic strength from 10 mM to 1 M (19). FDPB_MF accurately predicted the effects of such large variation in ionic strength (Table 3). The rmsd and the maximum error to the experimental values were 0.14 and 0.2 pH unit, respectively. The sensitivity of pKas to the change in bulk ionic strength often reflects the nature of the electrostatic interactions stabilizing a particular ionized site (30, 31). These results suggest that FDPB_MF predicts quantitatively the complex balance of electrostatic interactions that stabilize charged residues.
Accurately predicting the energetic effects of mutations is an important goal in protein modeling. The insensitivity of the pKas of three acidic residues (Asp7, Glu10, Glu19) to the removal of a neighboring positively charged residue (Lys 34) was not accurately predicted by simple FDPB calculations (20). FDPB_MF predicted smaller pKa perturbations (Table 4), with rmsd and maximal error from the experimental data of 0.28 and 0.4 pH unit, respectively.
Electrostatic interactions play a crucial role in modulating the catalytic reactivity of enzymes. We assessed the ability of FDPB_MF to accurately predict the pKa values of the two buried energetically coupled glutamate residues that lie at the heart of the active site of B. circulans xylanase (BCX; Table 2 and SI Table 8). Simple FDPB calculations with a single dielectric constant of 8 for the protein failed to predict accurately the pKa values of both glutamates in the WT protein. By introducing an explicit treatment of conformational flexibility, EGAD performed better than FDPB and similar to PROPKA for the WT protein. However, EGAD failed to predict the large pKa shift (−2.5 pH units) observed for Glu172 when Glu78 was mutated to Gln (21, 22). By finely sampling the conformational space in the vicinity of the glutamates and by uncoupling their titration in the WT protein, FDPB_MF accurately predicted the unusually high pKa of Glu172 (pKa of 6.7) and its large pKa shift upon mutation of Glu78 to Gln. The rmsd and maximal error to the experimental values were equal to 0.2 and 0.3 pH unit, respectively.
The buried, catalytically important Asp26 in Escherichia coli and human thioredoxins have among the highest pKa values ever measured for an aspartate (23). FDPB_MF predicted their pKa more accurately than other methods, with errors of 0.5 and 0.7 pH unit compared with the experimental values (Table 2 and SI Table 8).
Prediction of SolventDependent Protein Conformations.
Because protein energetics and conformations are tightly coupled, we expect quantitative energetic predictions to be corroborated with highresolution structural predictions.
Fig. 3 compares the observed and predicted structural reorganizations occurring in the active site of B. circulans xylanase upon titration of Glu 172. Although the crystals were soaked at pH 4.0, the observed conformation at “pH 4.0” likely corresponds to that induced by deprotonated Glu 78 and protonated Glu 172 (21) and can be compared with the structure predicted at pH 5.5. Except for a small difference at Tyr 69, the observed and predicted conformational changes induced by protonation of Glu 172 are in good agreement. The aromatic ring of Tyr 80 flips by 180° to preferentially stabilize the remaining deprotonated Glu at position 78. The Asn 35 amide group moves away from the protonated Glu 172 and the amide of Gln 127 comes slightly closer to Glu 78. The rmsd between the predicted and observed distance changes involving protons and heteroatoms in the active site was 0.2 Å (SI Table 5).
FDPB_MF also predicted substantial pH dependent conformational shifts ( SI Text and SI Fig. 7) that have yet to be tested by experiments. Most of these conformational changes were corroborated with significantly improved energetic predictions when compared with traditional FDPB calculations performed on fixed crystallographic structures (Tables 2–4 and SI Tables 6–8).
Discussion
Protein stability, solubility, recognition, and catalysis depend critically on electrostatic interactions, and many important conformational transitions in proteins are triggered physiologically by changes in bulk ion composition and in pH. FDPB_MF was developed to accurately predict this coupling between solventdependent protein energetics and conformations. Here we used FBPB_MF to predict the pKa values of 31 carboxylates in very different protein environments and solvent conditions. This approach yielded predictions of unprecedented accuracy as well as new capabilities to model the effects of mutations and changes in ionic strength. The accuracy of the pKa predictions approached the experimental error in the measured values, adding considerable utility to the pKa predictions based on structural data.
For charged residues near the protein surface, the solvent accessibility and solvation energies are mainly determined by the conformation of neighboring residues or substrates. By explicitly modeling the conformational relaxations of all residues and/or ligands that constitute the proteinsolvent interface, the FDPB_MF method captures the energetics of partially solventexposed acidic residues more accurately than available methods that rely solely on pairwisefactorable energy functions. Buried charged residues are less frequent and are often involved in catalysis or in structural specificity. Anisotropy in polarizability of protein interiors is higher than at the surfaces, in line with the inability of FDPB calculations to capture implicitly the structural relaxation of protein interiors with a single dielectric constant (21). The energetics of buried residues are therefore sensitive to the fine conformational reorganization of their packed microenvironments. FDPB_MF explicitly models these effects and predicts the energetics of buried charged residues more accurately than current molecular dynamics simulations and rotamer repacking methods. The accuracy of FBPB_MF arises from the unique combination of fine sampling of the side chain conformational space, rigorous calculation of multibody protein–solvent interactions and a general treatment of conformational relaxation. Unlike current methods based on first principles, FDPB_MF quantitatively predicts the energetic effects of various protein environments, sequence mutations and solvent compositions (Fig. 2, rmsd from experimental values = 0.34 pH unit). Each calculation reported here took 2–6 days on a single Pentium 4 2.4 GHz CPU for a single pH.
Understanding the relationships between protein energetics, conformations and dynamics also presents major challenges in protein modeling. Proteins exhibit side chain conformational flexibility that is dictated by their structural environment and by the physical properties of the solvent. FDPB_MF predicts pHinduced, side chain conformational changes involving either changes in the dominant rotamers or redistribution of probabilities within a conformational ensemble (Fig. 3 and SI Fig. 7). These shifts highlight the complex nature of the electrostatic interactions involved in stabilizing charged residues. The explicit modeling of these structural reorganizations by FDPB_MF is corroborated by improved energetic predictions when compared with fixedstructure calculations (Tables 2–4 and SI Tables 6–8). The pHinduced conformational changes predicted by FDPB_MF in B. circulans xylanase are in good agreement with those observed in the crystals (Fig. 3). As the challenges of protein structure prediction continue to move toward the generation of highresolution models (33), the accurate prediction of solventdependent conformational changes will grow in importance.
Materials and Methods
FDPB Calculations.
Perturbation theory is an effective approach for the computation of meanfield properties and correlations beyond the meanfield. Following the same formalism (i.e., in the form of a perturbation series), Eq. 1 describes a general solution to the electrostatic protein–solvent interaction energy (E^{FDPB} ) of a conformational ensemble defined by a rotamer probability distribution (P_{I,R} ). In this ensemble description, each rotamer is assigned a weight corresponding to its probability of being occupied in the population of rotamers. In the meanfield approximation, individual side chain rotamers interact with the probabilityweighted average of all of the rotamers at the neighboring sites. The corresponding meanfield electrostatic protein–solvent interaction energies are defined by the zero and firstorder terms of Eq. 1 . These were computed in two steps by the FDPB module (SI Fig. 5). First, all side chain rotamers R at all sites I weighted by their probabilities P_{I,R} were mapped onto a threedimensional grid. The Poisson–Boltzmann (PB) equation was solved by finitedifferences for this “distributed” conformational ensemble and a corresponding solvation energy, E^{DIST} , was computed (zeroorder term in Eq. 1 ). This “distributed” representation of the system is inaccurate partly because rotamers from the same site are present and interact with each other. An additional energy term for each rotamer, E ^{100} _{IR} (firstorder term in Eq. 1 ), is necessary and was computed by iteratively placing single discrete rotamers at each site and solving the PB equation for the solvation energy. Adding the firstorder to the zeroorder term of Eq. 1 corrects for the simultaneous presence of rotamers on the grid and gives an accurate solution to the meanfield protein–solvent electrostatic interaction energies. The meanfield treatment was sufficient for most calculations performed in this study. It failed, however, when side chain conformations from different sites were strongly coupled. Higherorder terms in Eq. 1 are then necessary to describe the mutually exclusive combinations of rotamers.
The meanfield treatment requires the definition of effective physical quantities that accurately describe the probabilityweighted conformational ensembles. These quantities were derived so that they would equate or well approximate the corresponding average calculated by enumerating all of the discrete states of the system. If a rotamer R alone occupies a particular space with a probability P _{R}, it is equivalent to that space being occupied a fraction P _{R} of the time by the solute and a fraction (1 − P _{R}) of the time by the solvent. Consequently, if E _{R}, E _{S}, and E _{W} define the Born solvation energy of a charge covered by rotamer R, pure solute, or water, respectively, then E _{R} should equate the probability weighted average of E _{S} and E _{W} as defined by Eq. 2 Eq. 2 was used to derive the effective physical quantities needed to solve the PB equation by finite differences for the probabilityweighted conformational ensembles ( SI Text ).
The FDPB algorithm is based on the QNIFFT program (Academic version of Delphi provided by K. Sharp, University of Pennsylvania, Philadelphia, PA). All calculations were carried out on a 129 cubic grid. Focusing yielded grid resolutions > 2.4 grid units/Å (34). The linearized form of the PB equation was solved by the inexact Newton method with a multilevel solver algorithm to aid convergence (35). The solute was mapped on the grid with PARSE charge and radii parameters (13). Atomic charges, dielectric constants, and Debye factors were assigned according to the probabilities of occupancy of the solute and water ( SI Text ). A 1.4Å water sphere probe radius and a 2.0Å Stern ionexclusion radius were used to generate the solventaccessible surface and ion exclusion layer, respectively. Solvation energies were calculated by subtracting the energy of the grid computed for the solute placed in solvent from the energy of the same grid computed without solvent in a medium of uniform dielectric constant.
Conformational Search.
A SCMF method was developed that relaxes a side chain rotamer probability distribution to minimize the free energy of a protein conformational ensemble (15). In this method, nonpairwise factorable protein–solvent electrostatic interactions were computed repeatedly as rotamer probabilities were updated (SI Fig. 4). At each iteration of the relaxation, a new rotamer probability distribution (PM_{1}) was sent to the FDPB module, and the algorithm computed protein–solvent electrostatic interaction energies for that particular rotamer probability distribution. These quantities were combined with precomputed one and twobody nonelectrostatic potential energy terms to calculate meanfield potential energies (E _{ir}).
pKa Predictions.
Proteins were described by a conformational ensemble consisting of a fixed backbone and a library of side chain rotamers at each flexible position. For all calculations, fixed backbone and side chain coordinates were taken from the xray structures 1PPF (25), 1XNB (26), 2TRX (27), 1ERT (28), and 2LZT (29) for OMTKY3, BCX, TRX, TRH, and HEWL, respectively. The penultimate rotamer library (36) was used to model side chain conformational flexibility for all amino acids. Side chain coordinates were built on the backbone structure and energy minimized by using the CHARMM19 geometric and van der Waals potential energy terms and a 20° square well dihedral restraint (37). Protonated rotamers for acidic residues were built by placing the proton trans to the preceding methylene group. The entropic bias for the protonated state was corrected by adding the quantity RT × log(N _{p}/N _{d}) to the meanfield energy of the protonated rotamers (N _{p} and N _{d} being the number of protonated and deprotonated rotamers at a given site, respectively).
The modeling of side chain flexibility was defined in each system to keep the calculations efficient while adequately sampling conformational space ( SI Text ). In the minimalist OMTKY3 system, three neighboring polar residues (Asp27, Lys29, and Tyr31) were treated simultaneously with several side chain rotamers weighted by probabilities lying between but not equal to 0 and 1. All other sites were placed in the discrete side chain crystallographic coordinates.
In addition to the electrostatic solvation term (U ^{FDPB}), the potential energy function was approximated and decomposed in pairwisefactorable potential energy terms where U ^{geom} and U ^{LJ} correspond, respectively, to the bonded and Lennard–Jones energy terms of the CHARMM19 force field (37). U ^{SAS} corresponds to the pairwisefactorable approximation of the nonelectrostatic solvation potential derived from the product over the solvent surface tension and the solventaccessible surface area of the solute (38). U ^{protonation} consists of a protonation potential and is derived from a thermodynamic cycle relating a titratable site in the protein to the equivalent site in an isolated model compound with an experimentally determined pKa (12).
Relaxations of the conformational ensembles were performed by the SCMF algorithm at several pHs varying by steps of 0.25 pH unit. The pKa of a particular site was assigned to the pH at which the probabilities of the deprotonated and protonated rotamers become equal.
Acknowledgments
We thank J. Havranek for sharing a program for side chain conformational search that was useful in the initial steps of the project. We are grateful to Lawrence McIntosh for drawing our attention to and providing unpublished coordinates on the xylanase system, to HoLeung Ng and Mark Sales for critical reading of the manuscript, and to the anonymous reviewers for their suggestions. This work was supported by National Institutes of Health Grant GM48958 (to T.A.).
Footnotes
 ^{†}To whom correspondence should be sent at the present address: Department of Biochemistry, University of Washington, Seattle, WA 98195. Email: barthp{at}u.washington.edu

Author contributions: P.B., T.A., and P.B.H. designed research; P.B. performed research; P.B., T.A., and P.B.H. analyzed data; and P.B. and T.A. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0700188104/DC1.
 Abbreviations:
 PB,
 Poisson–Boltzmann;
 FDPB,
 finitedifference PB;
 MF,
 meanfield;
 SCMF,
 selfconsistent MF;
 OMTKY3,
 turkey ovomucoid third domain;
 BCX,
 B. circulans xylanase;
 TRX,
 E. coli thioredoxin;
 HEWL,
 hen lysozyme.
 © 2007 by The National Academy of Sciences of the USA
References
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
 Havranek JJ ,
 Harbury PB
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵
 ↵

↵
 Campbell RL ,
 Rose DR ,
 Wakarchuk WW ,
 To RJ ,
 Sung W ,
 Yaguchi M ,
 Suominen P ,
 Reinikainen T
 ↵

↵
 Weichsel A ,
 Gasdaska JR ,
 Powis G ,
 Montfort WR
 ↵
 ↵
 ↵
 ↵

↵
 Bradley P ,
 Misura KM ,
 Baker D
 ↵
 ↵
 ↵
 ↵
 ↵