BIOLOGICAL SCIENCES / EVOLUTION
An empirical test of the concomitantly variable codon hypothesis


*Department of Ecology, Evolution, and Behavior, University of Minnesota, 100 Ecology Building, 1987 Upper Buford Circle, Saint Paul, MN 55108; and
BioTechnology Institute, University of Minnesota, 140 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108
Edited by Daniel L. Hartl, Harvard University, Cambridge, MA, and approved May 17, 2007 (received for review March 1, 2007)
Abstract
covarion | phylogeny | hybrid incompatibilities
Detecting evolutionary codependencies, or covarions (concomitantly variable codons) (16), is conceptually simple. Whenever a hybrid enzyme, formed by fusing homologous genes from different species, is compromised in function, there must exist at least one covarion. By contrast, a fully functional hybrid enzyme demonstrates that no covarions have arisen between the fused segments.
We applied this criterion to covarion formation in triosephosphate isomerase (TIM) (encoded by tpiA). TIM catalyzes the reversible interconversion of dihydroxyacetone phosphate (DHAP) and glyceraldehyde-3-phosphate (G3P) in glycolysis, an unregulated reaction that requires no cofactor (17). TIM is found throughout the tree of life, and its structure is highly conserved (Fig. 1A) (18–20) even though its sequences are highly divergent. TIMs, conserved in fold and function, thus provide ideal material with which to investigate the evolution of covarions that arise through sequence divergence rather than as a consequence of changes in structure and function (21).
Results
Both hybrid enzymes were active and soluble and complemented a tpiA::kan deficiency during growth on minimal glycerol medium (22, 23). The kinetic performance of the hybrid with the E. coli N terminus was similar to that of the wild-type E. coli enzyme, whereas that of the hybrid with the E. cloacae N terminus was similar to that of the wild-type E. cloacae enzyme (Table 1). These hybrids therefore show no evidence of a covarion.
γ-proteobacteria. E. coli and Pseudomonas aeruginosa TIMs differ at m = 120 of the 255 sites, including two single-residue gaps (one in each sequence), with a ragged C-terminal end to the alignment (Fig. 1B). Each of j = 11 segments was reciprocally substituted into the wild-type TIMs (Table 1). Each segment carried on average $\bar{x} = m/j = 120/11 = 10.91$ differences (range, 4 to 19). Altogether these hybrids screened $\sum_{i=1}^{j} x_i(m - x_i) = 12{,}922$ of the $m(m - 1) = 14{,}280$ possible pairwise combinations, representing a minimum of 6,461 of 7,140 synthetic combinations (the actual number will be slightly larger because some sites will have received more than one mutation) (24). Although not every possible pairwise combination was screened in this analysis, substituting small segments allows incompatibilities between sites adjacent in both primary sequence and tertiary structure to be screened.
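The pair counts above are plain arithmetic on the segment sizes. A minimal Python sketch follows, assuming segment i carries $x_i$ of the m differences (so that $\sum_i x_i = m$). The segment sizes used here are hypothetical values chosen only to match the reported mean and range, not the actual segments of Table 1, so the screened total differs slightly from the published 12,922.

```python
def screened_pairs(x, m):
    """Return (hybrid combinations screened, hybrid combinations possible).

    x: number of amino acid differences carried by each segment
    m: total number of differences between the two proteins
    """
    screened = sum(xi * (m - xi) for xi in x)  # sum over segments of x_i(m - x_i)
    possible = m * (m - 1)                      # both hybrid combinations for every pair of differing sites
    return screened, possible


m = 120
x = [4, 19, 8, 12, 10, 15, 9, 11, 13, 7, 12]  # hypothetical segment sizes, sum(x) == m
screened, possible = screened_pairs(x, m)
# With one mutation per differing site, half of each total is synthetic (the minimum).
print(screened, possible, screened // 2, possible // 2)  # 12926 14280 6463 7140
```

Because $\sum_i x_i = m$, the screened total can also be written $m^2 - \sum_i x_i^2$, which shows that, for a fixed number of segments, equal segment sizes maximize coverage.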
Concomitantly variable codons of lethal effect were identified by the failure of hybrid enzymes to complement the tpiA::kan strain during growth on minimal glycerol medium (22, 23). SDS/PAGE analysis revealed that these hybrid enzymes were insoluble, suggesting that incompatibilities leading to misfolding were a primary cause of lethal covarion formation.
Polar effects were evident: in many cases, reciprocal hybrids did not show the same growth phenotype. For example, hybrids LFM235 and LFM237 (in which segments a and b from P. aeruginosa TIM were substituted into E. coli TIM) complemented the tpiA::kan strain, whereas the reciprocal hybrids LFM234 and LFM236 failed to do so (Table 1). By contrast, the reciprocal hybrids LFM238 and LFM239 (in which segment d was exchanged) were both inactive.
Polar effects are an expected consequence of covarion formation (16). Imagine two interacting residues, a and b, in an ancestral protein, and suppose that a is replaced by A in one descendant lineage and b is replaced by B in the other. Then we have two extant functional proteins, Ab and aB, and two reciprocal hybrids: ab, which has the ancestral combination of amino acids and is functional, and AB, which has the synthetic combination that forms a covarion. The case in which both reciprocal hybrids are inactive is explained by the two segments carrying two covarions of opposite polarity (e.g., abCD and ABcd). This phenomenon is conceptually similar to Dobzhansky–Muller incompatibilities (25), an idea originally formulated to explain asymmetry in hybrid organism compatibilities (24, 26) and recently extended to interactions within genes themselves (9).
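The two-site argument can be checked mechanically. The Python sketch below is a toy model, not the authors' code: a protein is a dictionary from site to residue, a covarion is a pair of residues that cannot coexist, and a hybrid exchanges a set of sites between the two lineages. Site numbers and residue letters are illustrative.

```python
def functional(protein, covarions):
    """True unless the protein carries both residues of any covarion."""
    return not any(protein[s1] == r1 and protein[s2] == r2
                   for (s1, r1), (s2, r2) in covarions)


def reciprocal_hybrids(p1, p2, segment):
    """Exchange the sites in `segment` between p1 and p2; return both hybrids."""
    h12 = {s: p2[s] if s in segment else p1[s] for s in p1}  # p1 carrying p2's segment
    h21 = {s: p1[s] if s in segment else p2[s] for s in p2}  # p2 carrying p1's segment
    return h12, h21


# One covarion: ancestor ab, lineage 1 fixes A, lineage 2 fixes B.
lin1, lin2 = {1: "A", 2: "b"}, {1: "a", 2: "B"}
cov = [((1, "A"), (2, "B"))]  # the synthetic combination AB is deleterious
h12, h21 = reciprocal_hybrids(lin1, lin2, segment={1})
print(functional(h12, cov), functional(h21, cov))  # True False  (ab works, AB fails)

# Two covarions of opposite polarity: both reciprocal hybrids fail.
lin1 = {1: "A", 2: "b", 3: "c", 4: "D"}
lin2 = {1: "a", 2: "B", 3: "C", 4: "d"}
cov2 = [((1, "A"), (2, "B")), ((3, "C"), (4, "D"))]
h12, h21 = reciprocal_hybrids(lin1, lin2, segment={1, 3})
print(functional(h12, cov2), functional(h21, cov2))  # False False  (abCD and ABcd)
```

Both parental proteins remain functional in each case; only the exchange of segments brings the synthetic combinations together.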
Interacting segments can be identified. Hybrids pLFM230 (with segments a–c of P. aeruginosa substituted into E. coli TIM) and pLFM218 (with segments a–f of P. aeruginosa substituted into E. coli TIM) complement the tpiA::kan strain (Table 1). The intervening hybrids pLFM18 and pLFM212 (with segments a–d and a–e of P. aeruginosa substituted into E. coli TIM) do not. Because pLFM230 and pLFM218 complement, the lethal interaction in pLFM18 must involve segment d of P. aeruginosa and segment e or f of E. coli TIM. No lethal covarions exist between segments d and f of P. aeruginosa and segment e of E. coli TIM, because hybrid pLFM240 (with segment e of E. coli substituted into P. aeruginosa TIM) complements the tpiA::kan strain. Hence, interactions between segment d of P. aeruginosa and segment f of E. coli TIM produce a lethal covarion. Similar reasoning shows that covarions of lethal effect are rare (Fig. 2), produced by at least 5 and no more than 8 of the 110 possible pairwise segmental combinations.
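The elimination argument above can be written as set arithmetic. This Python sketch assumes, as the text does, that lethality is caused by pairwise segmental interactions, so every (P. aeruginosa segment, E. coli segment) pair present in a complementing hybrid is cleared. The hybrid names and compositions are those given in the text; the helper names are hypothetical.

```python
SEGMENTS = "abcdefghijk"  # the 11 segments a-k


def pairs(p_segments):
    """Synthetic pairs in a hybrid: (P. aeruginosa segment, E. coli segment)."""
    e_segments = [s for s in SEGMENTS if s not in p_segments]
    return {(p, e) for p in p_segments for e in e_segments}


# Each hybrid: (segments from P. aeruginosa, complements tpiA::kan?)
hybrids = {
    "pLFM230": (set("abc"), True),
    "pLFM218": (set("abcdef"), True),
    "pLFM18": (set("abcd"), False),
    "pLFM212": (set("abcde"), False),
    "pLFM240": (set(SEGMENTS) - {"e"}, True),  # E. coli segment e in P. aeruginosa TIM
}

# Under the pairwise assumption, no pair in a functional hybrid is lethal.
cleared = set().union(*(pairs(p) for p, ok in hybrids.values() if ok))

for name, (p_segs, ok) in hybrids.items():
    if not ok:
        print(name, sorted(pairs(p_segs) - cleared))
# pLFM18 [('d', 'f')]
# pLFM212 [('d', 'f'), ('e', 'f')]
```

pLFM18 leaves a single candidate, d of P. aeruginosa with f of E. coli, which also accounts for the failure of pLFM212; these five hybrids alone cannot rule the e–f pair in or out.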
Covarions of Sublethal Effect. Covarions of less-than-lethal effect are also evident. Although lethal covarions are produced by introducing E. coli segments a or b into P. aeruginosa TIM (LFM229 and LFM234), the individual amino acid replacements do not abolish growth. Similarly, introducing P. aeruginosa segment d into E. coli TIM (LFM239) eliminates function, whereas the individual replacements do not. In these cases, lethality depends on two or more residues per segment acting in concert. Replacements H5R and L14M (in E. coli segment a) each restore function to hybrid LFM229, demonstrating that, in this case, lethality is caused by the joint action of covarions of less-than-lethal effect. S68V restores function to hybrid LFM239, although other contributing sites were not identified. No individual mutation in P. aeruginosa segment d restores function in hybrid LFM234, demonstrating that loss of function can be caused by multiple covarions within the same segment.
Covarions of less-than-lethal effect were explored by kinetic analysis. As with E. coli and E. cloacae, the wild-type performances (kcat/km) are not identical; E. coli TIM performs 3-fold better than P. aeruginosa TIM. Hybrid performance below that of wild-type P. aeruginosa TIM indicates that a covarion is present. Hybrid performance above that of wild-type P. aeruginosa TIM is considered "normal", because our analysis then provides no evidence of a covarion. By this criterion, most hybrids have a covarion: 16 of the 22 short-segment hybrids are functionally compromised, although 10 of them still perform sufficiently well to complement the tpiA::kan strain.
Following Orr's model of Dobzhansky–Muller hybrid incompatibilities (25), let $p_\alpha$ be the probability that a pairwise synthetic combination compromises enzyme activity. Then the probability that substituting a segment from protein A into protein B, where the segment carries $x_i$ amino acid differences and A and B differ at m sites in total, produces a hybrid enzyme of normal performance is simply $(1 - p_\alpha)^{\alpha x_i(m - x_i)}$, where $x_i(m - x_i)$ is the number of pairwise combinations and $\alpha$ is the synthetic fraction (the fraction of pairwise combinations of amino acids that did not arise during divergence of the lineages; $\alpha$ varies between 0.5 and 1). Summing over both reciprocal hybrids of each of the j segments, the expected number of segmental hybrids of normal performance is $n = 2\sum_{i=1}^{j}(1 - p_\alpha)^{\alpha x_i(m - x_i)}$. Setting n = 6, j = 11, and $\alpha = 0.5$ (which assumes each difference between the two sequences is caused by a single mutation) and solving yields $p_{0.5} = 0.00238$. Sites that receive two or more mutations increase the number of synthetic combinations without altering sequence similarity. The maximum number of synthetic combinations for a segmental hybrid is double the minimum, or $x_i(m - x_i)$. Setting $\alpha = 1$ and solving yields $p_1 = 0.00119$, exactly half the previous estimate ($\alpha$ and $p_\alpha$ are inversely related, such that $\alpha \cdot p_\alpha$ remains constant). The expected number of covarions sampled is estimated as $c_S = \alpha \cdot p_\alpha \sum_{i=1}^{j} x_i(m - x_i) = 15.4$, whereas the total number of covarions is estimated as $c_T = \alpha \cdot p_\alpha \cdot m(m - 1) = 17$. Hence, only 17 of a minimum of 7,140 synthetic combinations form covarions. We conclude that covarions are rare.
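The estimate of $p_\alpha$ can be reproduced numerically once the segment sizes are known. The Python sketch below solves $n = 2\sum_i (1 - p_\alpha)^{\alpha x_i(m - x_i)}$ (both reciprocal hybrids of each segment counted) for $p_\alpha$ by bisection and then evaluates $c_S$ and $c_T$. The segment sizes are hypothetical (Table 1 lists the real ones), so the solved values only approximate the published 0.00238 and 0.00119; the covarion counts at the end use the published $\alpha \cdot p_\alpha = 0.00119$ and $\sum_i x_i(m - x_i) = 12{,}922$.

```python
def expected_normal(p, alpha, x, m):
    """n = 2 * sum_i (1 - p)**(alpha * x_i * (m - x_i)); the 2 counts both reciprocal hybrids."""
    return 2 * sum((1 - p) ** (alpha * xi * (m - xi)) for xi in x)


def solve_p(n_obs, alpha, x, m, tol=1e-12):
    """Bisection: expected_normal falls monotonically from 2*len(x) at p = 0 to 0 at p = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_normal(mid, alpha, x, m) > n_obs:
            lo = mid  # too many normal hybrids predicted, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


m = 120
x = [4, 19, 8, 12, 10, 15, 9, 11, 13, 7, 12]  # hypothetical segment sizes (sum = m)
p_half = solve_p(6, 0.5, x, m)
p_one = solve_p(6, 1.0, x, m)
print(p_half, p_one, 0.5 * p_half, 1.0 * p_one)  # alpha * p is nearly invariant

# Covarion counts from the published product alpha * p = 0.00119.
alpha_p = 0.00119
c_s = alpha_p * 12922        # covarions sampled by the 22 hybrids
c_T = alpha_p * m * (m - 1)  # covarions among all 120 differing sites
print(round(c_s, 1), round(c_T))  # 15.4 17
```

The invariance of $\alpha \cdot p_\alpha$ holds to within rounding because, for small p, $(1 - p)^{\alpha X} \approx e^{-\alpha p X}$, so the data constrain only the product.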
Analysis of variance provides an alternative means of investigating the underlying covarion structure, with the 11 segments a–k treated as main effects and covarions equated with significant interaction terms (e.g., a·b). Whereas the probabilistic approach treats reciprocal interactions (e.g., Ab and Ba) individually, an analysis of variance combines them into a single term. Hence, in the analysis of variance, there are $\binom{11}{2} = 55$ possible pairwise segmental interactions, $\binom{11}{3} = 165$ three-way segmental interactions, and so on. To simplify the analysis (there are more interactions than degrees of freedom), we focused on the 21 pairs of segments with replacements in close contact (<5 Å apart). These pairs are the most likely to form covarions because the main chains of even widely divergent TIM sequences are readily superimposed (Fig. 1A), implying a rigidity that should prevent distant replacements from compensating for local steric incompatibilities. Long-distance electrostatic interactions, although readily transmitted through hydrophobic domain cores, are unlikely to arise when charge changes are restricted to surface residues surrounded by solvent of high dielectric constant.
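One way to build interaction terms that merge reciprocal combinations into a single term is effect coding. In this Python sketch (an assumption; the paper does not state its coding), each segment is coded +1 when it comes from E. coli and −1 when it comes from P. aeruginosa, so a pairwise product is −1 for both reciprocal mismatches and +1 otherwise. It only constructs the predictors; fitting the analysis of variance to ln(kcat/km) is left to a statistics package.

```python
from itertools import combinations

SEGMENTS = "abcdefghijk"  # the 11 segments a-k


def effect_code(p_segments):
    """+1 if a segment comes from E. coli TIM, -1 if from P. aeruginosa TIM (assumed coding)."""
    return {s: -1 if s in p_segments else 1 for s in SEGMENTS}


def interaction_terms(coded, order=2):
    """All order-way products of coded segments, keyed like 'a·b' or 'a·b·c'."""
    terms = {}
    for combo in combinations(SEGMENTS, order):
        value = 1
        for s in combo:
            value *= coded[s]
        terms["·".join(combo)] = value
    return terms


wild_type = effect_code(set())                   # E. coli TIM
hybrid = effect_code({"a"})                      # P. aeruginosa segment a in E. coli TIM
reciprocal = effect_code(set(SEGMENTS) - {"a"})  # E. coli segment a in P. aeruginosa TIM
print(len(interaction_terms(wild_type, 2)), len(interaction_terms(wild_type, 3)))  # 55 165
print(interaction_terms(wild_type)["a·b"], interaction_terms(hybrid)["a·b"],
      interaction_terms(reciprocal)["a·b"])  # 1 -1 -1
```

A hybrid's row in the design matrix would then consist of its 11 coded main effects plus whichever interaction columns are retained, for example the 21 close-contact pairs.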
Analysis of variance of the ln(kcat/km) values identifies 8 (Table 2) of the 15.4 ± 3.9 pairwise segmental incompatibilities expected under the probabilistic approach. Lacking sufficient degrees of freedom, the analysis of variance included only pairwise interactions and excluded hybrid LFM233, which can be accommodated only by including higher-order interactions. Forward–backward stepwise regression with all 55 possible pairwise segmental interactions was performed to assess the bias caused by excluding long-range effects. The same covarions were identified, together with a highly changeable assortment of minor interactions and the a·f
b·f interaction, which is always recovered. We conclude that no more than one covarion was overlooked by restricting the analysis of variance to the 21 pairs of segments with replacements <5 Å apart. Adding the a·f
b·f interaction produces a marginal improvement in the fit, with 10 interactions now significant. The analysis of variance confirms that covarions are rare and helps identify interacting segments (Fig. 2B). The identity of the three-way interactions needed to accommodate hybrid LFM233 could not be reliably ascertained because forward–backward stepwise regression showed that many different three-way interactions produced similar fits.
Discussion
A potential bias in the results arises because sites within a segment are not broken apart, so covarions within a segment cannot be identified. Yet proximity in the linear sequence is no guarantee of proximity in the three-dimensional protein fold. For example, the side chains of adjacent residues in a β-sheet cannot interact directly because they point 180° away from each other. Moreover, individual point mutations did not recover more lethal covarions than expected. For these reasons we do not anticipate that our use of segmental replacements affects our general conclusions.
Assuming that reduced function is caused by pairwise interactions (25), we estimate that 1 in 850 pairwise combinations produces a covarion. This estimate suggests that two 250-residue proteins (the approximate size of TIM) that differ at 30 sites (88% identical) would have, on average, only 30 × 29/850 ≈ 1 covarion. Similar sequences are therefore unlikely to have developed many covarions. Two distantly related proteins that differ at 175 sites (30% identical) are expected to have 175 × 174/850 ≈ 36 covarions under the pairwise interaction model. In other words, 36 pairs of sites, involving as many as 41% of the sites that differ, form covarions. The pairwise interaction model therefore predicts that distantly related sequences are likely to have developed many covarions.
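The two worked predictions above follow from $c = m(m - 1)/850$. A short Python sketch, where the rate 1/850 is the estimate given in the text and the site fraction is an upper bound that assumes no site takes part in more than one covarion:

```python
RATE = 1 / 850  # estimated fraction of pairwise combinations that form a covarion


def expected_covarions(m, rate=RATE):
    """Pairwise-interaction model: expected covarions between proteins differing at m sites."""
    return m * (m - 1) * rate


def max_site_fraction(m, rate=RATE):
    """Upper bound on the fraction of differing sites in a covarion, assuming each covarion
    uses two sites and no site is shared between covarions."""
    return min(1.0, 2 * expected_covarions(m, rate) / m)


for m in (30, 175):
    print(m, round(expected_covarions(m), 1), round(max_site_fraction(m), 2))
# 30 1.0 0.07
# 175 35.8 0.41
```

Because the count grows as $m(m - 1)$, roughly a 6-fold increase in divergence yields a 35-fold increase in covarions; this is the quadratic scaling whose empirical support is questioned below.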
There are insufficient data to conclude that the frequency of covarions increases quadratically with genetic distance, as expected under the pairwise interaction model (25), and caution is therefore warranted. Sequence surveys show that ≈10% of amino acid replacements in non-human proteins would be pathogenic in humans (9). Similarly, 10% of pathogenic amino acid replacements in Drosophila melanogaster are found in related Dipteran species (10). Both genetic surveys therefore suggest that the frequency of covarions increases linearly with genetic distance. Resolution of these conflicting interpretations awaits detailed experimental investigation.
Shifts in covarions can produce long-branch attraction and have been invoked to explain why Microsporidia and Archaebacteria are united in EF-1α phylogenies (27). Conceivably, difficulties in resolving deep phylogenetic structure, for example the origins of major phyla, might in part be attributed to an accumulation of covarions that violates the cardinal assumption of all phylogenetic methods, namely that each site in a sequence evolves independently of all other sites. It is unknown to what extent deep phylogenetic analysis might be compromised by a snowball effect (25) produced by a quadratic increase in the frequency of covarions with genetic distance, by a linear increase (28, 29), or even by a steady-state level of churning covarions. Thus, resolving the covarion contradiction is fundamental not only to a mechanistic understanding of protein evolution but also to accurately reconstructing the history of life.
Methods
Because available tpiA-deficient E. coli strains have multiple genes deleted along with tpiA, a new strain was created using the λ Red recombinase system (30), with the tpiA gene in E. coli MG1655 replaced by a kanamycin resistance (kan) cassette. The λDE3 prophage, which carries the T7 RNA polymerase gene, was then incorporated into the resulting strain, LFM46 (Novagen, San Diego, CA), to allow expression of hybrid triosephosphate isomerase genes under the control of the T7lac promoter.
Cloning of Wild-Type Triosephosphate Isomerase Genes. The tpiA genes from E. coli (strain MG1655), E. cloacae (gift of B. Hall, University of Rochester, Rochester, NY), and P. aeruginosa (strain PAO1, ATCC) were amplified and subcloned into the pET16b vector (Novagen).
Construction of Hybrid Proteins. Hybrid proteins were constructed by restriction digestion and ligation and by PCR. An initial set of hybrids was constructed by cutting the vectors containing the cloned wild-type genes at the common AgeI site and then ligating the fragments in reciprocal combination, yielding two complementary hybrids for each pair of genes.
A series of hybrids with a single division point between E. coli and P. aeruginosa sections was created using overlap-extension PCR. Two megaprimers were created from the different wild-type sections of tpiA to be joined, each with a 15-bp overhang complementary to the opposing section. The two megaprimers, sharing a 30-bp overlap, were joined in a third round of PCR (Herculase polymerase; Stratagene, La Jolla, CA). The resulting hybrids were then used as templates, together with the wild-type sections, and the overlap-extension PCR procedure was repeated to isolate individual regions a–k. The overlap-extension PCR fragments were cloned into the pET16b vector at the NdeI and BamHI sites for expression and subsequent analysis.
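In silico, the overlap-extension join reduces to string splicing. This Python sketch is illustrative only: it tracks top strands, ignores annealing temperatures and primer design, and uses made-up 40-bp sequences; the 15-bp overhang and 30-bp overlap are the values given in the text, and the function names are hypothetical.

```python
def megaprimers(upstream, downstream, overhang=15):
    """Return the two megaprimer products (top strands only) for joining two sections.

    Megaprimer 1: upstream section plus the first `overhang` bases of the downstream section.
    Megaprimer 2: last `overhang` bases of the upstream section plus the downstream section.
    Together they share a 2 * overhang (here 30-bp) overlap spanning the junction.
    """
    if min(len(upstream), len(downstream)) < overhang:
        raise ValueError("sections must be at least `overhang` bases long")
    return upstream + downstream[:overhang], upstream[-overhang:] + downstream


def overlap_extend(mp1, mp2, overlap):
    """Splice two fragments whose shared `overlap` bases sit at the 3' end of mp1 and
    the 5' end of mp2, mimicking annealing of the overlap and extension by the polymerase."""
    if mp1[-overlap:] != mp2[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return mp1 + mp2[overlap:]


# Made-up 40-bp sections standing in for an E. coli 5' section and a P. aeruginosa 3' section.
eco_5prime = "ATGCGAAAACCGTTAGTGATGGGTAACTGGAAACTGAACG"
pae_3prime = "GCAGCCTGGAAGAGCTGGTCAAGCTGCTCGAAGGCGCCTA"
mp1, mp2 = megaprimers(eco_5prime, pae_3prime)
hybrid = overlap_extend(mp1, mp2, overlap=30)
print(hybrid == eco_5prime + pae_3prime)  # True
```

The junction is seamless: the fused product is exactly the upstream section followed by the downstream section, with no added bases.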
The relevant sections of all strains and plasmids constructed during the course of these experiments were verified by sequencing at the Advanced Genetic Analysis Center, University of Minnesota.
Analysis of Hybrid Proteins.
Each constructed wild-type and hybrid plasmid was tested for the ability to complement a tpiA deficiency. Each plasmid, a derivative of the pET16b vector, was transformed into the tpiA::kan strain LFM46(λDE3) and screened for the ability to grow on glycerol as the sole carbon source by streaking on M63 minimal medium plates containing 0.2% glycerol and 100 µg/ml ampicillin (22, 23). Solubility was determined by overexpressing the relevant hybrid tpiA in the tpiA-deficient background and comparing protein in the soluble and insoluble fractions by SDS/PAGE.
Enzyme Kinetics.
Kinetic reactions (0.7 ml) with glyceraldehyde-3-phosphate (G3P) as substrate consisted of 0.1 M triethanolamine, 5 mM EDTA, 200 µM NADH, 0.212 mM D-glyceraldehyde-3-phosphate, 5 units of α-glycerophosphate dehydrogenase (rabbit muscle, containing <0.02% TIM; Sigma-Aldrich, Dorset, U.K.), and a small amount of purified TIM enzyme. Substrate was prepared from DL-glyceraldehyde-3-phosphate diethyl acetal barium salt (Sigma, St. Louis, MO) by using Dowex-50 (hydrogen form), according to the manufacturer's instructions, to remove barium. The coupling enzyme, α-glycerophosphate dehydrogenase, was supplied as an ammonium sulfate precipitate and was dialyzed exhaustively at 4°C against 0.1 M triethanolamine, 5 mM EDTA, and 10% glycerol to remove ammonium sulfate. Overexpressed hybrid and wild-type TIM proteins have a C-terminal 6x His tag and were purified using BD-Talon (metal affinity) columns (Becton-Dickinson, San Jose, CA).
Blank reactions (no TIM) were run for 10 min to determine the background activity in each reaction. After the addition of TIM, reactions were monitored for a further 10 min on a Cary 300 spectrophotometer. For each enzyme, eight reactions with different substrate concentrations (≤0.212 mM) were run simultaneously. Substrate was prepared fresh daily, and its concentration was determined before each run. For each protein, reactions were run with a minimum of two different protein preparations on two different days.
TIM reaction velocity was calculated by subtracting the background velocity from the total velocity. Vmax and km were determined by a nonlinear fit (JMP; SAS Institute, Cary, NC) to the Michaelis–Menten equation, v = Vmax·S/(km + S). Protein concentrations were determined with the Bio-Rad (Hercules, CA) protein assay according to the manufacturer's instructions, using 20 µl of each sample solution and 1.0 ml of diluted dye reagent for each reaction. kcat was calculated as Vmax per active site. To compare enzymes, enzyme performance (kcat/km) was calculated. Individual kcat, km, and performance values from each trial were averaged for each enzyme. Reported standard errors of performance values represent the variation among independent trials.
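The analysis steps above (background subtraction, Michaelis–Menten fit, kcat per active site, kcat/km) can be sketched in Python without a statistics package. This is a swapped-in technique, not the JMP procedure the authors used: for a fixed km the least-squares Vmax has a closed form, so only km is searched (golden-section search on log km, assuming the sum of squares has a single minimum in the bracket). Inputs must be in consistent units, and the data and function names are hypothetical.

```python
import math


def fit_michaelis_menten(S, v, km_lo=1e-4, km_hi=10.0, iters=100):
    """Least-squares fit of v = Vmax*S/(km + S); returns (Vmax, km)."""
    def profile(km):
        f = [s / (km + s) for s in S]
        vmax = sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)  # best Vmax for this km
        sse = sum((vi - vmax * fi) ** 2 for vi, fi in zip(v, f))
        return sse, vmax

    a, b = math.log(km_lo), math.log(km_hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if profile(math.exp(c))[0] < profile(math.exp(d))[0]:
            b, d = d, c          # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + g * (b - a)
    km = math.exp((a + b) / 2)
    return profile(km)[1], km


def kinetic_summary(S, v_total, v_background, active_sites):
    """Background-subtract, fit, and return (kcat, km, kcat/km)."""
    v = [vt - vb for vt, vb in zip(v_total, v_background)]
    vmax, km = fit_michaelis_menten(S, v)
    kcat = vmax / active_sites  # Vmax per active site
    return kcat, km, kcat / km


S = [0.01, 0.02, 0.04, 0.06, 0.09, 0.12, 0.16, 0.212]  # hypothetical substrate series
v_bg = [0.1] * len(S)                                   # hypothetical blank rates
v_tot = [2.0 * s / (0.05 + s) + 0.1 for s in S]         # synthetic, noise-free
kcat, km, perf = kinetic_summary(S, v_tot, v_bg, active_sites=0.01)
print(round(kcat, 1), round(km, 4))  # 200.0 0.05
```

Profiling out Vmax reduces the fit to a one-dimensional search, which keeps the sketch short; real assay data with noise would also call for standard errors, which this sketch does not compute.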
Mutagenesis. Nonfunctional hybrids were mutagenized to identify individual sites involved in covarions. The five nonfunctional hybrids carrying individually isolated segments (LFM229, LFM234, LFM236, LFM238, and LFM239) were used.
Site-directed mutants of LFM238 and LFM239 were created by overlap-extension PCR. Each of the individual 12-aa sites in each nonfunctional hybrid was altered to encode the wild-type amino acid of the opposing organism. In addition, the corresponding individual mutations were made in the wild-type enzymes to look for loss-of-function mutants.
Site-directed mutants of the hybrids LFM229, LFM234, and LFM236 and the corresponding changes in the wild-type gene were constructed with an alternative PCR method by using a single megaprimer. Here, a single primer containing the desired mutation was paired with a primer in the T7 promoter of pET16b (Novagen) for amplification from either a hybrid (LFM229, LFM234, LFM236) or wild-type (LFM16) plasmid template, generating a double-stranded mutant megaprimer fragment of 100–300 bp. The megaprimer was gel purified (Qiagen Gel Extraction kit; Qiagen, Valencia, CA) and used for a second round of PCR with a second primer in the T7 terminator of pET16b (Novagen) to generate the entire mutant fragment of interest.
Amplified genes were purified from agarose gels and subcloned into vector pLFM14 or pLFM16. Purified mutant plasmids were confirmed by sequencing and transformed into the tpiA::kan strain LFM46(λDE3). Resultant colonies were streaked on M63 minimal medium with 0.2% glycerol and 100 µg/ml ampicillin to screen for restoration of TIM function.
Footnotes
Abbreviations: TIM, triosephosphate isomerase.
To whom correspondence should be addressed. E-mail: deanx024{at}umn.edu
Author contributions: L.M.F.M., M.L., and A.M.D. designed research; L.M.F.M. and M.L. performed research; M.L. contributed new reagents/analytic tools; L.M.F.M. and A.M.D. analyzed data; and L.M.F.M. and A.M.D. wrote the paper.
Present address: The Wistar Institute, 3601 Spruce Street, Philadelphia, PA 19104.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
© 2007 by The National Academy of Sciences of the USA