BIOLOGICAL SCIENCES / EVOLUTION
Codon-usage bias versus gene conversion in the evolution of yeast duplicate genes



*Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637; and
Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
Contributed by Wen-Hsiung Li, July 28, 2006
| Abstract |
|---|
Many Saccharomyces cerevisiae duplicate genes that were derived from an ancient whole-genome duplication (WGD) unexpectedly show a small synonymous divergence (KS), a higher sequence similarity to each other than to orthologues in Saccharomyces bayanus, or slow evolution compared with the orthologue in Kluyveromyces waltii, a non-WGD species. This decelerated evolution was attributed to gene conversion between duplicates. Using ≈300 WGD gene pairs in four species and their orthologues in non-WGD species, we show that codon-usage bias and protein-sequence conservation are two important causes for decelerated evolution of duplicate genes, whereas gene conversion is effective only in the presence of strong codon-usage bias or protein-sequence conservation. Furthermore, we find that change in mutation pattern or in tDNA copy number changed codon-usage bias and increased the KS distance between K. waltii and S. cerevisiae. Intriguingly, some proteins showed fast evolution before the radiation of WGD species but little or no sequence divergence between orthologues and paralogues thereafter, indicating that functional conservation after the radiation may also be responsible for decelerated evolution in duplicates.
selective constraints | whole-genome duplication | concerted evolution | decelerated evolution
| Results and Discussion |
|---|
[…] and a is expected to be longer than that between orthologues […] and […] (Fig. 1a), but the opposite is true in Fig. 1b because of a gene-conversion event. To see how often such a situation has occurred in yeast duplicate genes, we studied ≈300 WGD gene pairs in S. cerevisiae and their syntenic orthologues from three related species, S. bayanus, Saccharomyces mikatae and Saccharomyces paradoxus (8). Because the WGD occurred before the radiation of these species, in the absence of gene conversion, the synonymous distance (KS) is expected to be larger between S. cerevisiae paralogues than between orthologues in different species. We find that this expectation indeed holds in most cases, with 93.4% of duplicate pairs in S. cerevisiae having a paralogous KS greater than or equal to the KS between orthologues (Fig. 1e). This result indicates that only in a small proportion of these WGD duplicate genes has the tree topology been distorted by gene conversion, because only when a point is below the line in Fig. 1e would a distortion in topology have occurred. Interestingly, most S. cerevisiae paralogous pairs with a small KS also show a small KS between orthologues, and many have a high codon-adaptation index (CAI) value (a large circle in Fig. 1e), a measure of codon-usage bias (9). This analysis suggests that decelerated evolution of S. cerevisiae paralogues is, at least in part, due to biased codon usage, which serves as an evolutionary constraint (7, 10).
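The comparison summarized in Fig. 1e can be sketched in a few lines of Python. The function name and the toy KS values are hypothetical and serve only to illustrate the rule that a pair falls below the diagonal when its paralogous KS is smaller than its orthologous KS.

```python
def fraction_consistent(pairs):
    """Return the share of pairs whose paralogous KS >= orthologous KS.

    `pairs` is a list of (ks_paralogous, ks_orthologous) tuples.
    """
    if not pairs:
        raise ValueError("need at least one gene pair")
    consistent = sum(1 for ks_par, ks_orth in pairs if ks_par >= ks_orth)
    return consistent / len(pairs)


# Toy values (hypothetical, not taken from the paper).
toy_pairs = [(1.20, 0.45), (0.90, 0.50), (0.10, 0.30), (0.25, 0.25)]
print(fraction_consistent(toy_pairs))
```

Pairs counted as inconsistent here are only candidates for topology distortion; the text goes on to examine their phylogenies and codon-usage bias before drawing conclusions.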
To pursue the analysis further, we reconsidered the 66 duplicate gene pairs identified by Gao and Innan (4) to have a small KS between S. cerevisiae paralogues. We found that 57 of them were duplicated before the divergence between S. cerevisiae and S. bayanus, and only one of these 57 pairs (YGL147C/YNL067W) is not from WGD (3, 11). In the 57 phylogenies for these 57 pairs, only 8 pairs showed a completely distorted tree topology (suggesting conversion in all lineages) like Fig. 1f, 23 pairs showed a partially distorted topology, and approximately half of them (26 pairs) showed no topology distortion (Table 3, which is published as supporting information on the PNAS web site). We note that, with the exception of two (YDL131W/YDL182W and YDR312W/YHR066W), all 57 pairs have a strong codon-usage bias (CAI > 0.5). Therefore, in many of these gene pairs, the small KS values between S. cerevisiae paralogues (and between orthologues) might be largely due to strong codon-usage bias constraint.
The above phylogenetic analysis, however, is not powerful enough for detecting all gene-conversion events, because conversion events involving only a small DNA region are unlikely to change the tree topology. For this purpose, we have developed a statistical method to detect gene-conversion events and have applied it to ≈300 WGD duplicate gene pairs in S. cerevisiae, S. paradoxus, S. mikatae, and S. bayanus. Our main purpose is to see whether gene conversion occurred primarily in high-CAI genes. Indeed, Table 1 shows that approximately half of the genes with CAI ≥ 0.7 have undergone gene-conversion events, whereas only 2% of the genes with CAI < 0.5 have conversions ($P < 10^{-8}$ for all species). Apparently, codon-usage bias increases the rate of gene conversion by reducing the rate of sequence divergence. In the absence of strong codon-usage bias, synonymous divergence between duplicate genes increases with time, and the chance of gene conversion is concomitantly reduced.
40% for the four Saccharomyces species. However, although most-favored codons are the same among these species (Table 4), we found a switch of the preferred codon of glutamine (Gln) between CAA and CAG and a switch of the preferred codon of glutamic acid (Glu) between GAA and GAG between S. cerevisiae and A. gossypii. As shown in Table 2, these switches might be due to changes in tDNA gene copy number. For instance, the numbers of tDNA-Glu genes for anticodons TTC and CTC are 14 and 2 in S. cerevisiae but 3 and 8 in A. gossypii, and this may explain why the GAA codon is preferred in S. cerevisiae, whereas GAG is preferred in A. gossypii. Such a difference in codon preference can increase the synonymous distance between species. The tDNA gene phylogeny suggests that the change of gene copy number can arise from a point mutation in the anticodon or from duplication/deletion of tDNA genes in the genome (Fig. 3).
Gao and Innan (4) estimated the expected length of concerted evolution in S. cerevisiae as 25 million years, based on the theory the same group had proposed earlier (20) (f = 9 of 51; 51 gene pairs show concerted evolution at the divergence time between S. cerevisiae and S. bayanus, whereas 9 gene pairs are still under concerted evolution at the divergence time between S. cerevisiae and S. paradoxus). We selected 18 gene pairs with CAI ≥ 0.7 for which the paralogues and orthologues in S. cerevisiae, S. paradoxus, and S. bayanus are all available. We detected gene conversion in 11 S. cerevisiae gene pairs. When we used S. paradoxus to calculate the orthologous distance instead, 6 gene pairs still had detectable gene-conversion events. The expected length of concerted evolution for S. cerevisiae genes with CAI ≥ 0.7 thus estimated is 70 million years (f = 6 of 11, from the S. cerevisiae–S. bayanus divergence to the S. cerevisiae–S. paradoxus divergence). Note that this value may be underestimated because these genes are highly constrained and have evolved slowly. Informative sites indicating gene conversion may be too few to make the statistics significant. However, we obtained a similar estimate by assuming that the duration of concerted evolution started at the WGD event, and the WGD occurred 100 million years ago (f = 12 of 21, from WGD to the S. cerevisiae–S. bayanus divergence). Using the same method, we can estimate the expected lengths of concerted evolution for S. cerevisiae genes with CAI between 0.5 and 0.7 and CAI < 0.5 as 20 million years and 10 million years, respectively (f = 4 of 31 and 4 of 238, from WGD to the S. cerevisiae–S. bayanus divergence).
In summary, our analysis suggests that codon-usage bias and protein functional conservation might have been more important than gene conversion for the decelerated evolution of WGD duplicate genes in yeasts. Note that gene conversion occurs only occasionally, whereas codon-usage constraint and functional constraint of proteins are constant forces that slow down sequence evolution. Furthermore, the rate of gene conversion decreases as sequence divergence increases. For this reason, gene conversion may not be an effective means for long-term maintenance of sequence similarity between duplicate genes in the absence of codon-usage constraint or functional constraint. In contrast, both codon-usage constraint and protein functional constraint can slow down sequence evolution in the absence of gene conversion. Of course, the three factors can have synergistic effects in maintaining high sequence similarity between paralogues.
| Materials and Methods |
|---|
Identification of Gene-Conversion Events. Numerous methods for gene-conversion identification have been developed, but these methods are either not suitable or not powerful enough for this analysis. For example, Sawyer's method (24) uses measures of the distribution of identical synonymous sites between sequence pairs to identify candidate regions of conversion. This method assumes a neutral evolutionary process for synonymous sites and may, therefore, not be suitable for yeast genes in which codon-usage bias affects synonymous substitution. More importantly, this method does not use any outgroup for reference, so it is, in general, less powerful than phylogeny-based methods. Other methods, such as those of Jakobsen and coworkers (25, 26), rely on the examination of site-by-site phylogenies, and the phylogeny for each site in a multiple alignment of paralogues and orthologues is tested for its support of conversion. Although these methods are similar to ours, they suffer when there are multiple substitutions at individual sites (27). Multiple substitutions may, again, be a problem in our analysis, because we are examining the ancient duplicates retained from the WGD in yeast, in which multiple substitutions are common. Therefore, we have developed a related algorithm for conversion identification.
We used WGD orthologues in the four genomes, S. cerevisiae, S. bayanus, S. mikatae, and S. paradoxus. At nucleotide position $i$, let $D_i$ equal the number of nucleotide differences between the two nucleotides in paralogous gene 1 and gene 2 in species 1 (the species under study), and $B_{ji}$ equal the number of nucleotide differences in gene $j$ ($j = 1, 2$) between species 1 and its orthologue in species 2. Let $B_i = (B_{1i} + B_{2i})/2$. Sequences with gaps longer than 50% of the alignment were removed. For a gene under study, species with only one (or no) paralogue available were also removed. All remaining gaps were removed. For S. cerevisiae, S. paradoxus, or S. mikatae, $B_i$ was calculated between the species under study and S. bayanus. For S. bayanus, $B_i$ was calculated as the average of the differences between S. bayanus and the three other available species.
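The per-site quantities defined above can be illustrated with a minimal Python sketch. It assumes four gap-free, equal-length aligned sequences (the two paralogues in species 1 and their respective orthologues in species 2); the function name and the toy sequences are hypothetical.

```python
def site_differences(par1_sp1, par2_sp1, orth1_sp2, orth2_sp2):
    """Per-site D_i and B_i for four aligned, gap-free sequences."""
    seqs = (par1_sp1, par2_sp1, orth1_sp2, orth2_sp2)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    D, B = [], []
    for a1, a2, b1, b2 in zip(*seqs):
        # D_i: paralogue 1 versus paralogue 2 within species 1.
        D.append(1 if a1 != a2 else 0)
        # B_1i and B_2i: each gene versus its orthologue in species 2.
        b1i = 1 if a1 != b1 else 0
        b2i = 1 if a2 != b2 else 0
        B.append((b1i + b2i) / 2)
    return D, B


# Toy sequences (hypothetical).
D, B = site_differences("ACGTAC", "ACGTTC", "ACCTAC", "ATCTTC")
print(D)
print(B)
```

Because $B_i$ averages two 0/1 indicators, it takes the values 0, 0.5, or 1 at each site.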
Under the null hypothesis of no gene conversion, the distance (number of differences) between the two paralogues in a species should be larger than or equal to the distance between orthologues, i.e., $D_i - B_i \ge 0$, because the duplication event occurred before speciation. Dynamic programming is used to select the segment from site $m$ to $n$ that maximizes $\sum_{i=m}^{n}(B_i - D_i)$. This segment has $N$ sites, where $N = n - m + 1$. Let $D = \sum_{i=m}^{n} D_i$ and $B = \sum_{i=m}^{n} B_i$. If $n \ge 20$, the binomial probability to observe $D \le B$ for a segment of $N$ sites is calculated by using the orthologous distance $B$ as the expected distance, i.e., $D = B$. This is a stringent criterion, because the WGD event occurred earlier than speciation events. The estimated probability is

[Eq. 1]
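The segment-selection step can be sketched with a standard maximum-sum-subarray scan (Kadane's algorithm), which is one dynamic-programming way to maximize the summed difference between $B_i$ and $D_i$ over contiguous sites. Whether the authors used exactly this recurrence is an assumption; the function name and toy vectors are hypothetical.

```python
def best_segment(D, B):
    """Return (m, n, score) for the contiguous segment maximizing sum(B_i - D_i).

    Indices are 0-based and inclusive.
    """
    if len(D) != len(B) or not D:
        raise ValueError("D and B must be non-empty and of equal length")
    best_score = float("-inf")
    best_m = best_n = 0
    run_score, run_start = 0.0, 0
    for i, (d, b) in enumerate(zip(D, B)):
        gain = b - d
        # Either start a new segment at site i or extend the running one.
        if run_score <= 0:
            run_score, run_start = gain, i
        else:
            run_score += gain
        if run_score > best_score:
            best_score, best_m, best_n = run_score, run_start, i
    return best_m, best_n, best_score


# Toy per-site values (hypothetical).
D_toy = [1, 0, 0, 0, 1, 1]
B_toy = [0, 1, 1, 0.5, 0, 0]
print(best_segment(D_toy, B_toy))
```

The scan runs in time linear in the alignment length, and on the toy input the selected segment begins and ends at sites where the orthologous difference exceeds the paralogous one.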
However, this segment always has its first and last sites supporting $B_i > D_i$, which may cause an overestimate of the significance. Therefore, we remove the first or the last site of the segment, recalculate $B$ and $D$ as $\sum_{i=m+1}^{n} B_i$ and $\sum_{i=m+1}^{n} D_i$ or as $\sum_{i=m}^{n-1} B_i$ and $\sum_{i=m}^{n-1} D_i$, and obtain binomial probabilities $P_1$ and $P_2$, respectively. The higher value of $P_1$ and $P_2$ is used.
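A minimal sketch of the trimming step follows. It assumes the binomial probability is the lower tail, the chance of seeing at most the observed number of paralogous differences when each site differs with probability equal to the orthologous total divided by the number of sites. That form is an assumption and Eq. 1 may differ in detail. Function names are hypothetical, and the segment-length threshold mentioned in the text is not enforced here.

```python
from math import comb


def binom_lower_tail(d, n_sites, expected):
    """P(X <= d) for X ~ Binomial(n_sites, expected / n_sites)."""
    p = expected / n_sites
    return sum(comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k)
               for k in range(d + 1))


def trimmed_probability(D, B, m, n):
    """Drop the first or the last site of segment [m, n] (0-based, inclusive),
    compute a tail probability for each trimmed segment, and keep the larger."""
    if n - m < 1:
        raise ValueError("segment must contain at least two sites")
    # P1: first site removed; P2: last site removed.
    p1 = binom_lower_tail(sum(D[m + 1:n + 1]), n - m, sum(B[m + 1:n + 1]))
    p2 = binom_lower_tail(sum(D[m:n]), n - m, sum(B[m:n]))
    return max(p1, p2)


# Toy per-site values (hypothetical).
print(trimmed_probability([0, 0, 1], [1, 1, 0], 0, 2))
```

Taking the larger of the two probabilities is the conservative choice: a segment is retained only if it stays significant whichever end is trimmed.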
The segments thus identified with the paralogous distance significantly smaller than the orthologous distance might potentially be derived from gene conversion. However, many possible segments of $N$ sites can be selected from the entire gene sequence, so we need to take this factor into consideration. Therefore, for each segment with a binomial probability $P < 0.01$ computed from Eq. 1, we construct an empirical distribution of $B$ for a segment of length $N$ using 10,000 bootstrap samples from $\{B_1, B_2, \ldots, B_L\}$, where $L$ is the alignment length for the gene under consideration. Then, it is possible to determine the significance of $D$ by counting the proportion of samples for which $D < B$. Segments with a binomial probability $P < 0.01$ and with an empirical probability < 0.01 are considered candidate gene conversions.
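The bootstrap step can be sketched as follows. The sketch resamples per-site orthologous differences with replacement to build segments of the chosen length and, as an assumption about how the count is turned into a tail probability, reports the share of resampled segments whose orthologous total does not exceed the observed paralogous total. The function name, the seed argument, and the toy inputs are hypothetical.

```python
import random


def empirical_probability(B_sites, d_observed, seg_len, n_boot=10_000, seed=0):
    """Share of bootstrap segments whose summed B is <= the observed D."""
    if not B_sites or seg_len < 1:
        raise ValueError("need per-site values and a positive segment length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_boot):
        # Draw seg_len sites with replacement from the whole alignment.
        b_star = sum(rng.choices(B_sites, k=seg_len))
        if b_star <= d_observed:
            hits += 1
    return hits / n_boot


# Toy alignment in which every site differs between orthologues (hypothetical).
print(empirical_probability([1.0] * 50, d_observed=0, seg_len=20))
```

A segment would then be kept as a candidate only when both this empirical value and the binomial probability fall below 0.01, as stated above.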
Codon-Usage Frequencies and tDNA Genes.
Relative frequencies of codon usage in orthologues of WGD genes were calculated for the genomes of K. waltii, A. gossypii, S. cerevisiae, S. bayanus, S. mikatae, and S. paradoxus. Two sets of gene pairs were obtained. S. cerevisiae genes with CAI > 0.5 were classified into the highly expressed set and so were their orthologues in other species, whereas genes with CAI < 0.2 were classified into the less expressed set. The χ² test was used to examine whether a codon is favored in highly expressed genes compared with less expressed genes. We obtained tDNA genes of S. cerevisiae from the Munich Information Center for Protein Sequences (MIPS), and used the sequences and genomic BLAST in the National Center for Biotechnology Information (NCBI) to identify orthologues in the other five genomes.
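One way to set up such a test is a 2 × 2 contingency table: counts of a given codon versus its synonymous alternatives, in the highly expressed set versus the less expressed set. The sketch below computes the Pearson χ² statistic with 1 degree of freedom and no continuity correction. The table layout is an assumption about how the test was arranged, and the counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) for the table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f., the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: GAA and GAG in highly expressed genes (90, 10)
# and in less expressed genes (60, 40).
stat, p_value = chi2_2x2(90, 10, 60, 40)
print(stat, p_value)
```

A small p-value here would indicate that the codon's share among its synonyms differs between the two expression classes, which is the sense in which a codon is called favored.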
| Footnotes |
|---|
Abbreviations: CAI, codon-adaptation index; WGD, whole-genome duplication.
To whom correspondence should be addressed. E-mail: whli{at}uchicago.edu
Author contributions: Y.-S.L. and W.-H.L. designed research; Y.-S.L., J.K.B., J.-K.H., and W.-H.L. performed research; Y.-S.L. and J.K.B. analyzed data; and Y.-S.L., J.K.B., J.-K.H., and W.-H.L. wrote the paper.
The authors declare no conflict of interest.
© 2006 by The National Academy of Sciences of the USA