Temperature dependence of amino acid hydrophobicities

Significance
Systematic relationships have long been recognized between the hydrophobicities of amino acids and (i) their tendencies to be located at the exposed surfaces of globular and membrane proteins and (ii) the composition of their triplets in the genetic code. Here, we show that the same coding relationships are compatible with the high temperatures at which life is widely believed to have originated. An accompanying paper reports that these two properties appear to be encoded separately by bases in the acceptor stem and the anticodon of tRNA.

The hydrophobicities of the 20 common amino acids are reflected in their tendencies to appear in interior positions in globular proteins and in deeply buried positions of membrane proteins. To determine whether these relationships might also have been valid in the warm surroundings where life may have originated, we examined the effect of temperature on the hydrophobicities of the amino acids as measured by the equilibrium constants for transfer of their side-chains from neutral solution to cyclohexane (K w>c). The hydrophobicities of most amino acids were found to increase with increasing temperature. Because that effect is more pronounced for the more polar amino acids, the numerical range of K w>c values decreases with increasing temperature. There are also modest changes in the ordering of the more polar amino acids. However, those changes are such that they would have tended to minimize the otherwise disruptive effects of a changing thermal environment on the evolution of protein structure. Earlier, the genetic code was found to be organized in such a way that, with a single exception (threonine), the side-chain dichotomy polar/nonpolar matches the nucleic acid base dichotomy purine/pyrimidine at the second position of each coding triplet at 25°C. That dichotomy is preserved at 100°C. The accessible surface areas of amino acid side-chains in folded proteins are moderately correlated with hydrophobicity, but when free energies of vapor-to-cyclohexane transfer (corresponding to size) are taken into consideration, a closer relationship becomes apparent.

hydrophobicity | protein folding | anticodon | temperature | genetic code

The equilibrium conformations of proteins in neutral solution are strongly influenced by interactions between their constituent amino acids and solvent water.
Early work on the crystal structure of hemoglobin and related proteins showed that the side-chains of the more polar amino acid residues tend to be exposed to solvent, whereas less polar side-chains tend to be buried within the interior of globular proteins (1). Later, those tendencies were put to a quantitative test by measuring equilibria of transfer of amino acid side-chains from neutral aqueous solution into less polar environments, such as the vapor phase (2, 3) or a nonpolar solvent such as cyclohexane (4), which dissolves only ∼2 × 10^−3 M water at saturation (5) and appears to be devoid of specific interactions with solutes. The water-to-cyclohexane distribution coefficients (K w>c) of the 20 common side-chains [here termed "hydrophobicities" (6, 7) and expressed in concentration units of mol/L in each phase; SI Appendix] were found to span a range of 15 orders of magnitude at pH 7 and 25°C. Values of K w>c have been shown to be related to their outside-to-inside distributions in globular proteins (4, 8) and to their tendencies to appear within the buried sequences of transmembrane proteins (9-11).
Those solvent distribution experiments were conducted at what we would consider ordinary temperatures. However, there is widespread (12, 13), if not universal (14), agreement that life originated when the Earth was warmer, perhaps much warmer, than it is today, and some modern organisms thrive at temperatures approaching the boiling point of water. The literature discloses little concrete information about how the hydrophobicities of the individual amino acids respond to changing temperature. Would the rules of protein folding be expected to differ significantly for an organism such as Pyrococcus horikoshii, which grows optimally at 98°C (15), and might such differences have affected the early stages of evolution of life on Earth?
Earlier work also uncovered the existence of a pronounced bias in the relationship between amino acid hydrophobicity and the genetic code. Thus, a pyrimidine at the second codon position signals amino acids whose average hydrophobicity is much greater than that of amino acids coded by a purine at the same position (2, 3). The values reported here allow a more detailed analysis, described in a companion paper (16), which reveals that two separate codes for amino acid size and hydrophobicity appear to be embedded in different parts of their tRNA sequences, with size (represented by vapor-to-cyclohexane equilibria) encoded in the acceptor stem, and hydrophobicity (represented by water-to-cyclohexane equilibria) embedded in the anticodon. Would one expect these relationships to be maintained at elevated temperatures?
In the present work, we sought to address these questions by determining the effect of temperature on equilibria of transfer of molecules representing the side-chains of each amino acid from neutral aqueous solution to cyclohexane (K w>c ).

Materials and Methods
For molecules representing the side-chains of Gly, Ala, Val, Leu, and Ile (hydrogen, methane, propane, butane, and isobutane), values of K w>c were obtained by combining the best values for the molar solubilities of the gases in water and in cyclohexane, compiled from the literature in the Solubility Data Series (17). These values are considered accurate to within ±0.2 kcal/mol in ΔG at 25°C, ±0.4 kcal/mol in ΔH, and ±0.6 kcal/mol in TΔS at 25°C.
For molecules representing most of the other amino acid side-chains, we used ¹H-NMR as described earlier (4) to determine values of K w>c for the uncharged forms. In most cases, the solute was dissolved at a known concentration, established gravimetrically (0.01-0.1 M), in the more-favored solvent (water or cyclohexane). An aliquot (1.5 mL) was introduced into a scintillation vial and mixed vigorously with the less-favored solvent (15 mL) for 30 min using a magnetic stirring bar, immersed in a Haake K20 water bath in which the temperature was maintained within 0.1°C as measured with an ASTM International thermometer. The mixture was allowed to settle for 15 min, and a sample (0.1 mL) of the aqueous layer was removed using a pipette with a gel-loading tip and mixed with 99% D₂O (0.8 mL) to which a known concentration of pyrazine (0.01 M, δ 8.60 ppm) had been added as a proton integration standard for determining the concentration of solute. ¹H-NMR spectra were acquired using a Bruker 500-MHz spectrometer equipped with a cryoprobe, using a water suppression pulse sequence with two transients. When the favored solvent was cyclohexane, K w>c was calculated from the concentrations of solute in the aqueous phase before and after extraction. When the favored solvent was water, the concentration of solute in cyclohexane was determined by back-extraction from the cyclohexane phase (10 mL) with D₂O (1 mL).
Values of K w>c for the uncharged side-chains of Asp and Glu (acetic and propionic acids) were determined in the presence of 0.03 M HCl to suppress ionization. Values for the uncharged side-chains of His (1-methylimidazole) and Lys (1-butylamine) were determined in the presence of 0.03 M KOH to suppress ionization. Values for the uncharged side-chain of Arg (N-propylguanidine) were obtained by conducting measurements in the presence of 0.3, 1, and 3 M KOH to suppress ionization, and no significant variation was observed. To circumvent problems associated with the volatility of methanethiol (the side-chain of Cys) at elevated temperatures, distribution measurements were conducted in the presence of potassium phosphate buffer (0.1 M, pH 12.35), to maintain this solute in a constant state of ionization (99% thiolate; pKa 10.35, assumed to be equivalent to that of ethanethiol at 25°C) (18). Potassium phosphate buffers were chosen for these experiments because the heat of ionization of HPO₄²⁻ (4 kcal/mol) is similar to that of thiols (6 kcal/mol) (19), which minimizes variations in the relative abundance of the charged and uncharged species of methanethiol within the 40°C range of temperatures over which distribution experiments were conducted (Table 1). To obtain the distribution coefficient of uncharged methanethiol, these results were corrected to reflect the fraction of uncharged thiol that was present in the aqueous phase at various temperatures. In the case of methanol, distribution measurements were conducted using 1-¹⁴C-methanol by determining the concentrations of radioactivity in both the aqueous and the cyclohexane phase.
In each case, experiments were conducted over at least a fivefold range of solute concentrations. The results showed no significant variation in K w>c value, such as would have been expected if self-association of the solute had been present in either phase. Table 1 shows the temperature range over which the positions of transfer equilibria were measured. In all cases, van't Hoff plots were linear within experimental error over the ∼50°C temperature range examined, yielding values with an estimated error of ±0.3 kcal/mol in ΔG at 25°C, ±0.5 kcal/mol in ΔH, and ±0.8 kcal/mol in TΔS at 25°C. Differences in heat capacity appeared not to exert an important influence on the response of these transfer equilibria to temperature, in view of the absence of significant curvature in the van't Hoff plots over the 40-50°C temperature range examined.
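The van't Hoff treatment described above can be illustrated with a short sketch. It assumes a linear relationship ln K = −ΔH/(RT) + ΔS/R (that is, no heat capacity term, in line with the absence of curvature noted above) and an ordinary least-squares fit. The function names and the synthetic data are hypothetical and are not taken from Table 1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def vant_hoff_fit(temps_c, log10_k, t_ref_c=25.0):
    """Fit ln K against 1/T by least squares.

    Returns (delta_g, delta_h, t_delta_s) in kcal/mol, with delta_g and
    t_delta_s evaluated at the reference temperature t_ref_c.
    """
    inv_t = [1.0 / (t + 273.15) for t in temps_c]
    ln_k = [math.log(10.0) * value for value in log10_k]
    n = len(inv_t)
    mean_x = sum(inv_t) / n
    mean_y = sum(ln_k) / n
    sxx = sum((x - mean_x) ** 2 for x in inv_t)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv_t, ln_k))
    slope = sxy / sxx                      # equals -delta_h / R
    intercept = mean_y - slope * mean_x    # equals delta_s / R
    delta_h = -R_KCAL * slope
    t_delta_s = (t_ref_c + 273.15) * R_KCAL * intercept
    delta_g = delta_h - t_delta_s
    return delta_g, delta_h, t_delta_s


def synthetic_log10_k(temp_c, delta_h=8.0, delta_s=0.010):
    """Hypothetical solute: delta_h in kcal/mol, delta_s in kcal/(mol K)."""
    t = temp_c + 273.15
    return (-delta_h / (R_KCAL * t) + delta_s / R_KCAL) / math.log(10.0)


# Illustrative temperatures only (degrees C); not the measured ranges.
temps = [10.0, 25.0, 40.0, 55.0]
dg, dh, tds = vant_hoff_fit(temps, [synthetic_log10_k(t) for t in temps])
print(f"dG = {dg:.2f}, dH = {dh:.2f}, TdS = {tds:.2f} kcal/mol at 25 C")
```

Because the synthetic points lie exactly on a line, the fit returns the ΔH and ΔS used to generate them. With measured data, the scatter about the line would set the uncertainty in each term.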
The side-chains of Asp, Glu, His, Lys, and Arg enter cyclohexane in detectable amounts only in their uncharged forms (4), as expected from the extremely large heats of solution of ions by water (20). Thus, the overall equilibria of transfer [K w>c(total)] of those ionizable solutes (involving both the charged and uncharged forms of the side-chains of Asp, Glu, His, Lys, and Arg that are present at pH 7) include a term for the water-to-cyclohexane transfer of the uncharged species, multiplied by the fraction (α) of the solute molecules that are present in the uncharged form in aqueous solution at pH 7. For an amine side-chain (A), that fraction was calculated from the pKa value of the amine's conjugate acid (AH+) using Eq. 1.

$$\alpha = \frac{[\mathrm{A}]}{[\mathrm{A}] + [\mathrm{AH}^{+}]} = \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}} \qquad [1]$$

For a carboxylic acid side-chain (CH), α was calculated using Eq. 2.

$$\alpha = \frac{[\mathrm{CH}]}{[\mathrm{CH}] + [\mathrm{C}^{-}]} = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}} \qquad [2]$$

To estimate values of α at elevated temperatures, enthalpies of ionization of molecules representing each side-chain (19) (e.g., acetic acid representing Asp and n-butylamine representing Lys) were used to estimate pKa values of the side-chain models at elevated temperatures from the corresponding values (21) at 25°C (Table 2). The resulting values of the equilibrium constant K w>c(total), which describes the distribution of the solute in both its charged and uncharged forms from neutral aqueous solution to cyclohexane, are shown in Table 3.
Values of K v>c (SI Appendix, Table S1) needed to complete the vapor phase thermodynamic cycle (SI Appendix, Fig. S1) were calculated by multiplying the values of K v>w (3) by the present values of K w>c(total).

Results
Table 1 summarizes the present values of log10 K w>c(uncharged) for water-to-cyclohexane transfer of uncharged molecules representing side-chains of the 20 common amino acids at 25°C and 100°C. Table 2 shows the fractions (α) of molecules representing the ionizing side-chains of Asp, Glu, His, Lys, and Arg (acetic acid, propionic acid, 4-methylimidazole, butylamine, and N-propylguanidine) that remain uncharged at pH 7 and 25°C, based on pKa values from the literature (21). Values for the heats of ionization of molecules representing the ionizing side-chains of Asp, Glu, His, Lys, and Arg were used to calculate their α values at 100°C as described above at pH 6.14, the pH value of a neutral solution at 100°C (22). Table 3 shows values of log K w>c(total) at 25°C for all species (charged plus uncharged), obtained by multiplying K w>c(uncharged) by the value of α at 25°C. A similar procedure was used to obtain values of K w>c(total) at 100°C for all species (charged plus uncharged).
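The ionization correction can be sketched as follows. The sketch takes K w>c(total) to be the concentration in cyclohexane divided by the total (charged plus uncharged) concentration in water. Because only the uncharged form enters cyclohexane, this equals α times K w>c(uncharged), so that ionization lowers the overall value. The function names and the value of log10 K w>c(uncharged) are hypothetical. The pKa values are those quoted in the Discussion for the side-chains of Lys and His at 25°C.

```python
import math


def fraction_uncharged_base(pka, ph):
    """Fraction of an amine present as the free base A, from the pKa of AH+."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_uncharged_acid(pka, ph):
    """Fraction of a carboxylic acid present in its protonated, uncharged form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def log10_k_total(log10_k_uncharged, alpha):
    """log10 of K(total) = alpha * K(uncharged)."""
    return log10_k_uncharged + math.log10(alpha)


lys_alpha = fraction_uncharged_base(10.95, 7.0)  # about 1.1e-4
his_alpha = fraction_uncharged_base(6.95, 7.0)   # about 0.53

# Hypothetical log10 K w>c(uncharged); substitute a value from Table 1.
example_log10_k_uncharged = -2.0
print(log10_k_total(example_log10_k_uncharged, lys_alpha))
```

At 100°C the same functions would be called with the pKa estimated for that temperature and with pH 6.14, the neutral point used in the text.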

Discussion
Enthalpies of Transfer of Uncharged Side-Chains. Uncharged side-chains tend to become more hydrophobic with increasing temperature, as indicated by their positive ΔH values for water-to-cyclohexane transfer (Table 1). Because H-bonds become weaker with increasing temperature (23), an increase in enthalpy would be expected when water is released from close contact with the polar groups of a solute when it passes from water into cyclohexane; positive ΔH w>c values (5-14 kcal/mol) are observed for the uncharged, but polar, side-chains of Arg, Asp, Asn, Glu, Lys, Gln, His, Ser, and Thr.

Entropies of Transfer of Uncharged Side-Chains. An increase in entropy would be expected when structured water is released from close contact with nonpolar groups around which it was organized (Table 1). One would therefore anticipate an increase in entropy when a nonpolar molecule leaves water and enters a nonpolar solvent like cyclohexane (23). However, because water is also organized in the vicinity of polar substituents, a reduction in the local organization of solvent water would also be expected when a polar molecule leaves water and enters cyclohexane. It is therefore not surprising that substantial positive values of TΔS are observed for water-to-cyclohexane transfer of side-chains of Ile, Leu, and Val, which are nonpolar, and for the side-chains of Arg and Glu, which are strongly polar.
Overall Equilibria of Transfer of Amino Acid Side-Chains (Including Both Charged and Uncharged Species) from Water to Cyclohexane at pH 7. Because amino acid side-chains enter cyclohexane only in their uncharged forms, the effect of ionization of the side-chains of His, Lys, Arg, Asp, and Glu is to draw these molecules preferentially into the aqueous phase [as indicated by K w>c(total)], to an extent that varies with the extent to which the pKa value of the ionizing side-chain differs from 7 (Table 3). Values of the overall equilibrium constant [K w>c(total)] for transfer of each side-chain, taking into account the sum of the concentrations of both the charged and uncharged forms (Eqs. 1 and 2), span a range of 10^15.2-fold at 25°C, with Arg occupying an extreme position (Table 3 and Fig. 1).
At elevated temperatures, that range of values becomes smaller, for at least two reasons. First, the effect of temperature in enhancing K w>c(uncharged) is more pronounced for the more polar than for the less polar amino acids, so that the overall range of K w>c values for the uncharged side-chains shrinks with increasing temperature, from 10^8.6-fold at 25°C to 10^6.7-fold at 100°C (Table 1 and Fig. 1). Second, the fraction (α) of the side-chains of His, Lys, Arg, Asp, and Glu that remains uncharged increases with increasing temperature (Table 2), so that ionization draws these side-chains into the aqueous phase less strongly.

Relationship Between Side-Chain Transfer Equilibria and Protein Folding. Considerable effort has been devoted to the search for a general scale that might relate the physical properties of the side-chain of each of the 20 amino acids to its solvent-accessible surface area in folded proteins (ASA fold), as determined using a probe that is often a water-sized sphere (24-32). Values of ASA fold depend on the size and shape of the probe, the presence or absence of steric constraints on the approach of a probe that may be imposed by flanking peptide bonds, and the rotameric preferences of the larger side-chains, all of which can affect the normalization of estimated ASA values. Despite those potential pitfalls, we decided to explore the relationship, if any, between a recent set of weighted average values of ASA fold reported by Moelbert et al. (28) and the three branches of the vapor phase thermodynamic cycle (Fig. 2B). Proline was omitted from consideration because it occurs frequently in turns (26), and cysteine was omitted because it tends to be buried by cross-linkage with other cysteine residues or coordination by metal ions (33). After conversion of each ASA fold value to a virtual equilibrium constant (K surf) (SI Appendix, Fig. S2), correlations between log K surf and each of the three phase transfer equilibria were found to be R² = 0.62 for log K w>c, R² = 0.32 for log K v>w, and R² = 0.19 for log K v>c (Fig. 2A). Thus, the strongest correlation was with K w>c, confirming the results of an earlier study (4). Agreement was found to improve (Fig. 2C) when a second independent physical property was included in Eq. 3:

$$\log K_{\mathrm{surf}} = \beta_{0} + \beta_{w>c}\,(\Delta G_{w>c}) + \beta_{v>c}\,(\Delta G_{v>c}) + \beta_{w>c \ast v>c}\,(\Delta G_{w>c})(\Delta G_{v>c}) + e \qquad [3b]$$

in which the β values were estimated by least-squares regression analysis. Eqs. 3a-3c represent different pairs of transfer equilibria (w>c,v>w = red; w>c,v>c = green; v>c,v>w = blue; Fig. 2B) from the three-way vapor phase cycle (34), and e is a residual.
When each of these pairs was tested, as a two-variable model, for its ability to predict log K surf, correlations of those predictions with the observed values averaged R² = 0.90 (SI Appendix, Fig. S3). Using the Student t test to determine whether this improved correlation (Fig. 2C) exceeded that expected for an additional random predictor, we obtained P < 0.0001 for each of the two transfer free energies and P = 0.02 for their interaction. Using the results of a more recent model for predicting ASA fold, proposed by Tien et al. (32), similar values of the three coefficients were obtained (SI Appendix, Table S2). The one-dimensional correlations described in Fig. 2A indicate the more prominent role for K w>c values in both sets of correlations (SI Appendix, Fig. S3). The K v>c branch of the phase-transfer cycle (Fig. 2B) is related to amino acid size as represented by its surface area in a tripeptide (R² = 0.86), its volume (R² = 0.83), or its mass (R² = 0.89). It is uncorrelated with K w>c (R² = 0.01; SI Appendix) and constitutes a linearly independent source of information that was omitted from previous attempts to understand how the properties of amino acids are related to protein folding. Its apparent helpfulness in contributing to prediction of the locations of the common amino acids in folded proteins agrees with the well-established close packing of amino acid side-chains in proteins, in which large and small amino acids differ in their structural requirements (35). The finding that the core/surface distributions of the 20 amino acids can be better approximated using two complementary types of solvent transfer free energies seems likely to be useful in protein design and phylogenetics (36), by identifying suitable amino acid substitutions.
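A two-predictor model with an interaction term, of the form of Eq. 3b, can be fitted by ordinary least squares as sketched below. This is an illustration under stated assumptions and not the authors' analysis. The predictor and response values are synthetic placeholders generated from arbitrary coefficients, the function names are hypothetical, and the normal equations are solved directly to keep the example free of external libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_interaction_model(x1, x2, y):
    """Least-squares estimates [b0, b1, b2, b12] for y = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    rows = [[1.0, u, v, u * v] for u, v in zip(x1, x2)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(betas, x1, x2):
    b0, b1, b2, b12 = betas
    return [b0 + b1 * u + b2 * v + b12 * u * v for u, v in zip(x1, x2)]


def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholders standing in for dG w>c, dG v>c and log K surf.
TRUE_BETAS = [0.5, -0.3, 0.2, 0.05]
dg_wc = [-4.0, -2.5, -1.0, 0.5, 2.0, 3.5, 1.0, -3.0]
dg_vc = [-1.0, -3.0, -2.0, -0.5, -2.5, -1.5, -3.5, -0.2]
log_k_surf = predict(TRUE_BETAS, dg_wc, dg_vc)

betas = fit_interaction_model(dg_wc, dg_vc, log_k_surf)
r2 = r_squared(log_k_surf, predict(betas, dg_wc, dg_vc))
print(betas, r2)
```

With noise-free synthetic data the fit recovers the generating coefficients. Applied to real values, the significance of each β would still need a separate test, such as the Student t test mentioned above.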
Fig. 1. Changes in the ordering of side-chain hydrophobicities at 25°C and 100°C, colored according to the second base of the corresponding codon as indicated. Hydrophobic residues, near the top, tend to be associated with pyrimidines (red and purple), whereas hydrophilic residues, near the bottom, tend to be associated with purines (blue and black).

Effects of Temperature on Protein Folding and the Genetic Code. As noted earlier, most side-chains become more hydrophobic with rising temperature, and those increases are greater for the more polar amino acids. For the uncharged forms of the side-chains, the numerical range of ΔG w>c values (11.6 kcal/mol at 25°C) shrinks to 6.7 kcal/mol at 100°C. When both the charged and uncharged forms of all of the side-chains are taken into consideration, the details change. The large heat of deprotonation of the guanidinium side-chain of Arg (18 kcal/mol) (37) causes its pKa value to decrease by 2.7 units when the temperature rises from 25°C to 100°C, whereas the pKa values of the side-chains of Lys (10.95) and His (6.95) decrease by 2.0 and 1.3 units, respectively. In contrast, the heats of ionization of acetic and propionic acids are −0.1 kcal/mol (19), so that the pKa values of the side-chains of Asp and Glu do not change significantly when the temperature increases from 25°C to 100°C. As a result, the numerical range of ΔG w>c values for transfer from neutral aqueous solution to cyclohexane shrinks from 15.2 kcal/mol at 25°C to 10.74 kcal/mol at 100°C.
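The size of the pKa shift quoted for Arg can be checked with the integrated van't Hoff equation. This is a sketch that assumes the heat of ionization is constant between the two temperatures. The symbols T₁, T₂, and ΔH_ion are introduced here for illustration, with R = 1.987 cal/(mol K).

```latex
\mathrm{p}K_a(T_2) = \mathrm{p}K_a(T_1)
  + \frac{\Delta H_{\mathrm{ion}}}{2.303\,R}
    \left(\frac{1}{T_2} - \frac{1}{T_1}\right)

% Arg: \Delta H_{ion} = 18 kcal/mol, T_1 = 298.15 K, T_2 = 373.15 K
\Delta \mathrm{p}K_a
  = \frac{18000}{2.303 \times 1.987}
    \left(\frac{1}{373.15} - \frac{1}{298.15}\right)
  \approx 3934 \times \left(-6.74 \times 10^{-4}\right)
  \approx -2.65
```

The result agrees with the decrease of about 2.7 units stated for Arg. The same expression with a heat of ionization near zero leaves the pKa values of the carboxylic acids essentially unchanged.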
There is also a modest reshuffling of relative hydrophobicities, so that the order His > Gln > Lys > Glu > Asn > Asp at 25°C is reordered to the series Glu > Lys > His > Gln > Asp > Asn at 100°C. However, their K w>c(total) values at 100°C fall within a factor of 60, an extremely narrow range compared with a range of 1.5 × 10^15-fold for all of the amino acid side-chains (SI Appendix, Table S1, last column). The limited magnitude of these changes, for the more polar amino acids, would presumably have tended to minimize the potentially disruptive effects of changes in thermal environment on the structures of globular proteins and also on the relative tendencies of amino acids to be found in deeply buried sequences of transmembrane proteins. It seems reasonable to infer that the rules of protein folding, insofar as they involve solvation by water, are not very different in a thermophilic organism such as Pyrococcus horikoshii, which grows optimally at 98°C, from those in a mesophilic organism growing at ordinary temperatures.
Earlier work revealed a pronounced bias in the genetic code (2, 3). Our recent reanalysis indicates that two separate codes, for amino acid size and hydrophobicity, appear to be embedded in tRNA sequences, with size encoded in the acceptor stem and hydrophobicity embedded in the anticodon (16). One aim of the present work was to determine whether these relationships are likely to have been maintained at elevated temperatures. In Fig. 1, which shows the relative hydrophobicities of the 20 common amino acids at 25°C and 100°C, the second base of each coding triplet is colored to indicate whether it is a purine (A or G, warm colors) or pyrimidine (C or U, cool colors). Fig. 1 shows that there is a single departure (Thr) from perfect correspondence between the dichotomy polar/nonpolar and the dichotomy purine/pyrimidine. That relationship is maintained at both high and low temperatures, implying that the meaning of a genetic code established in a hot environment would not have changed very much as the surroundings cooled.
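The purine/pyrimidine assignment at the second codon position can be tabulated directly from the standard genetic code, as in the sketch below. The code uses the DNA alphabet (T in place of U) and one-letter amino acid codes, and the function name is hypothetical. It reports only the base class of each codon's second position. The polar/nonpolar classification would come from the hydrophobicity values in Table 3, which are not reproduced here.

```python
BASES = "TCAG"
# Standard genetic code with codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
PURINES = {"A", "G"}


def second_base_classes():
    """Map each amino acid to the set of second-position base classes of its codons."""
    classes = {}
    for index, amino_acid in enumerate(AMINO_ACIDS):
        if amino_acid == "*":  # skip stop codons
            continue
        second_base = BASES[(index // 4) % 4]
        kind = "purine" if second_base in PURINES else "pyrimidine"
        classes.setdefault(amino_acid, set()).add(kind)
    return classes


classes = second_base_classes()
for amino_acid in sorted(classes):
    print(amino_acid, sorted(classes[amino_acid]))
```

In this tabulation Thr (T) carries a pyrimidine at the second position although its side-chain is polar, which is the single departure noted in the text. Ser is the only amino acid whose codons include both classes.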