BIOLOGICAL SCIENCES / EVOLUTION
Buffering of crucial functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome duplication

*Plant Genome Mapping Laboratory and Departments of Plant Biology, Genetics, and Crop and Soil Science, University of Georgia, Athens, GA 30602
Edited by Tomoko Ohta, National Institute of Genetics, Mishima, Japan, and approved December 8, 2005 (received for review September 13, 2005)
| Abstract |
|---|
60 million years later, and its gradual deterioration may contribute cyclicality to genome duplication in some lineages.
amino acid substitution | Arabidopsis | Oryza | protein functional domain | single-nucleotide polymorphism
Under classical models, gene duplication is proposed to be a primary source of genetic material available for the evolution of genes with new functions (9–11); one member of a duplicated gene pair may mutate and acquire unique functionality (12, 13), with the fitness of the organism insulated by the homoeolog. Such models would predict that, in natural populations, higher levels of polymorphism would occur in duplicated genes than in singletons [a prediction supported by recent theory (14)] and that the ability of duplicates to provide functional compensation for one another would erode as their functions diverged (15).
However, recent findings raise perplexing questions about this classical "functional divergence" model. Analysis of 17 nonallelic duplicates in Xenopus laevis shows evidence of purifying selection on each duplicate gene (16). For three recently duplicated (0.25–1.2 million years ago) Arabidopsis genes, both progenitor and derived copies show significantly reduced species-wide polymorphism (17). Duplicated yeast genes provide a discernible degree of functional compensation for a remarkably long period (18) and appear to undergo gene conversion (19). Realization of the potential benefits that may result from functional divergence of newly duplicated genes would naturally require a new polyploid to persist long enough for adaptive evolution to occur. Most higher organisms are thought to continuously produce aberrant unreduced gametes at low rates, but the rarity of genome duplication shows that the overwhelming majority do not survive. Study of both natural polyploids and synthetic polyploids formed by colchicine-based manipulation of interspecific hybrids reveals immediate consequences of polyploidization that ostensibly seem maladaptive, including loss and restructuring of low-copy DNA sequences (20–25), activation of genes and retrotransposons (26, 27), and gene silencing (28–31).
The angiosperms are an outstanding higher-eukaryote model in which to elucidate consequences of genome duplication in view of the strong signal that remains from multiple genome duplications, naturally occurring replication afforded by independent duplications, and a host of genetic and molecular tools. The fates of duplicated genes may be closely associated with effective population size (Ne) for a taxon (32), meaning that recent insights from microbes (18, 33–35) may not extend well to crown eukaryotes. For example, recent empirical data (35) validate the prediction that subfunctionalization should be rare in organisms with large effective population size, such as yeast (12), and highlight the need for complementary investigations of larger-bodied (smaller Ne) eukaryotes, such as angiosperms.
We search for "footprints of selection" associated with genome duplication, investigating whether the evolution of genes in modern populations or recently diverged taxa is influenced by the presence of an ancient duplicated copy at an unlinked locus. Specifically, we distinguish strictly "singleton" genes from those that have retained duplicated copies as a result of ancient duplication and compare levels and patterns of polymorphism in the coding sequences of these two gene classes. Our results suggest that functional conservation, particularly of complex genes and functional domains, may occur in many paleologs that are still recognizable as such. Thus, functional divergence suggested by classical models and functional buffering suggested herein may be alternative outcomes in the spectrum of possible fates for paleologs. Several avenues may offer some reconciliation between these respective models.
| Results and Discussion |
|---|
) duplicated chromosomal segments that we could discern (1), we first determined whether individual genes did or did not retain a paleologous copy in subsequent duplications (Fig. 1b). Indeed, γ-duplicated genes are more often retained in duplicate for both the β (36.3% of the time versus 20.6% for singletons; binomial proportion, P < 1.3 × 10⁻¹⁵) and α (30.2% versus 22.8%; P < 2.2 × 10⁻⁴) events. β-Event-derived duplicated and singleton genes show a similar pattern; 41.8% of β-duplicated genes are retained in duplicate during the α event compared with only 27.0% of singleton genes (P < 7.9 × 10⁻⁵⁴). The tendency for singletons to be repeatedly restored to singleton status after new duplications, together with the unexpected excess of duplicated genes that not only persist for long time periods but continue to spawn additional copies, suggests that gene retention after duplication is selective. Others have recently supported these results (36) and suggested differential reduplication of particular gene functional classes (37, 38).
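The retention comparison above contrasts two proportions (for example, 36.3% versus 20.6%). A minimal sketch of one common way to test such a difference is shown here, assuming a pooled two-proportion z-test; the source says only "binomial proportion", so the exact test, the function name `two_proportion_z_test`, and the gene counts in the example are illustrative assumptions, not values from the paper.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided pooled z-test for the difference between two binomial proportions.

    k1/n1 and k2/n2 are the observed successes/totals in each group.
    Returns (z, p_value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-success or all-failure: no detectable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts chosen only to mirror the reported percentages
# (36.3% versus 20.6%); the paper's real gene counts are not given here.
z, p = two_proportion_z_test(363, 1000, 206, 1000)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With real data, `k1` and `n1` would be the number of previously duplicated genes retained in duplicate and the total number of such genes, and `k2` and `n2` the corresponding counts for singletons.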
α, β, and γ Arabidopsis singleton genes, respectively, but only 39%, 44%, and 47% of cases for corresponding duplicated genes. In rice, SNPs caused inferred amino acid changes in 61% of singletons versus 48% in duplicated genes. These differences are statistically significant for the Arabidopsis α and rice duplications (P < 0.001, χ²) but not for the older Arabidopsis duplications.
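The comparison above asks whether the share of SNPs that change an amino acid differs between singleton and duplicated genes. A minimal sketch of a 2 × 2 chi-square test for that question follows. It assumes a Pearson statistic without continuity correction; the function name `chi_square_2x2` and the SNP counts in the example are hypothetical, because the paper reports percentages rather than the underlying table.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Rows: gene class (e.g. singleton, duplicated).
    Columns: SNP class (e.g. nonsynonymous, synonymous).
    Returns (chi2, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 520 of 1,000 singleton SNPs and 390 of 1,000
# duplicated-gene SNPs change the amino acid.
chi2, p = chi_square_2x2(520, 480, 390, 610)
print(f"chi2 = {chi2:.2f}, significant at 0.001: {p < 0.001}")
```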
To preclude the possibility that erroneous gene predictions influenced this result, the analyses were repeated with only the subsets of singleton and duplicated genes that matched an EST (E < 10⁻⁵⁰), with nearly identical results. The conservative evolution of duplicated genes was evident not only qualitatively but also quantitatively. Gene pairs that were relatively similar tended to retain less severe SNPs (in terms of inferred amino acid changes) than more divergent gene pairs from the same event (i.e., of similar age) (Fig. 3). The conservative evolution of paleologs is all the more striking in view of the theoretical expectation that variation within species in duplicated genes should be greater than in single-copy genes (14). This finding further contradicts the expectations of traditional functional divergence models for polyploidy (9–11), in which radically altered alleles should persist more frequently in similar duplicated genes because of the presence of a redundant copy but rarely in members of more divergent pairs that may no longer have the same functions.
event, an average of 20.5% of the coding regions of singletons and 44% of duplicates are covered by characterized domains. Similarly, rice singletons average 7.7% coverage by domains versus 41.3% for duplicated genes. The greater retention in duplicate of longer and more complex proteins, together with their lower tolerance of nonsynonymous mutations, suggests that the potential advantage conferred by the possibility of future neofunctionalization may be outweighed by the immediate benefits of buffering crucial functionality, i.e., ensuring that the functions of essential genes and domains are met even after mutation of one copy by the presence of a second copy. The tendency for duplicated gene copies to tolerate less severe amino acid substitutions than singletons in natural populations is in contrast to a primary advantage of polyploidy being the freedom for duplicated genes to acquire new functions (9–11). However, this finding is consistent with recent results about the evolution of recently evolved duplicates (16, 17) and long-term functional compensation (18, 41, 42). Further testing of this hypothesis might derive from obtaining data for additional taxa that parallel the fitness data available for individual gene knockouts in yeast (18).
Such genetic buffering may be especially important in survival of lineages experiencing the early "genomic turmoil" that immediately follows polyploid formation (20–25). However, our data show that such buffering still markedly affects diversity among ecotypes of Arabidopsis thaliana and subspecies of Oryza sativa, species that each trace to genome duplications that occurred 60 million years or more before speciation (1, 3).
Do Homogenization Processes Act on Paleologs? What mechanisms might preserve the sequence of thousands of pairs of genes at paleologous sites across a genome for 60 million years or more? Occasional nonhomologous associations between chromosomes are observed during mitosis in many taxa, including rice (43), and might periodically permit homogenization processes to act between paleologs. Mechanisms such as gene repair (44) and gene conversion (45) have been suggested to occur between ancient duplicated genes (19) but can be difficult to quantify (46). Indeed, genome-wide characterization of duplicates via established methods provided ambiguous results.
To attempt to circumvent problems with accurately detecting homogenization across an entire genome, we explored pairwise alignments of duplicated genes for unexpectedly long stretches of sequence identity. Paleologs were compared with one another, whereas corresponding singletons from the same polyploidization event were compared with their best homolog from species that diverged from the lineage before or after duplication, respectively, as detailed in Materials and Methods. In the absence of homogenization, the lengths of identical regions for duplicate pairs should fall between those of such pairs of singleton-to-homolog comparisons flanking the duplication event (Fig. 5; see also Supporting Results and Table 2, which are published as supporting information on the PNAS web site). Within this framework, the largest stretch of identical codons for each gene, representing the best evidence for potential homogenization, was compared in both domain and nondomain regions of singleton and duplicated genes.
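The scan for "the largest stretch of identical codons" can be sketched as follows. This is a minimal illustration, assuming two codon-aligned coding sequences of equal length with `-` as the gap character; the function name `longest_identical_codon_run` is hypothetical, and the restriction to domain or nondomain regions described above is left out.

```python
def longest_identical_codon_run(aligned_a, aligned_b):
    """Length, in codons, of the longest run of identical gap-free codons.

    Both inputs must be codon-aligned coding sequences of equal length,
    with '-' marking alignment gaps. A substitution or a gap ends a run.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    best = current = 0
    usable = len(aligned_a) - len(aligned_a) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon_a = aligned_a[i:i + 3].upper()
        codon_b = aligned_b[i:i + 3].upper()
        if codon_a == codon_b and "-" not in codon_a:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

# Toy alignment: ATG GCT match, GCA/GCC differ, TTT matches again.
print(longest_identical_codon_run("ATGGCTGCATTT", "ATGGCTGCCTTT"))
```

Counting in codons rather than nucleotides means that a synonymous third-position difference also ends a run, which matches the requirement of identity "without a substitution or gap".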
20 million years more recently (3). Although the possibility of interaction between paleologous genes needs further study, the preservation of key domains by homogenization processes could offer a selective advantage to the retention of duplicated copies of genes that serve crucial functions. Loss of duplicated copies may permit the remaining singleton genes to evolve more rapidly, thus accelerating mutation rates in an adaptive manner by a mechanism that is explicable without projecting a specialized purpose into evolution (47). Homogenization may also cause molecular clocks based on substitution rates between duplicated genes to chronically underestimate the age of duplications, as suggested recently (23). This potential bias would affect many dating results, including some from our own laboratory (1, 3).
Reconciliation Between the Models. Several avenues may offer some reconciliation between the classical functional divergence model for genome duplication (9–11) and an emerging functional buffering model. Our hypothesis that a primary advantage of retaining genes in duplicate may be the immediate buffering of crucial functionality in no way precludes the occasional evolution of unique functionality as one outcome in a spectrum of possibilities that is generally weighted in favor of buffering. Indeed, genes for which divergence was advantageous may have diverged long ago and are no longer recognizable as paleologs. Finally, the present results focus on changes in protein sequence and do not address changes in noncoding regulatory elements that can cause expression divergence (48).
Polyploid buffering, along with recombination, may contribute to reduced accumulation of degenerative mutations via Muller's ratchet (49). Although proximal duplication has been suggested to primarily amplify genes with secondary functionality, whole-genome duplication amplifies, and may buffer, genes involved in essential processes, such as primary metabolism (50). Many predominantly clonally propagated angiosperms (such as banana and sugarcane) and apomicts (51) are recently formed polyploids; protection of critical functionality by duplication and preservation of essential genes would explain how, in the absence of recombination, they avoid degeneration via Muller's ratchet (49). Similarly, repair processes fostered by whole-genome duplication are also key to the extraordinary radiation tolerance of Deinococcus radiodurans (52).
Might Erosion of Buffering Impart Cyclicality to Genome Duplication?
Reciprocal buffering presumably erodes with time as ancient duplicated genes eventually diverge, potentially imparting cyclicality to genome duplication. For example, in Arabidopsis, we could only assert that SNP frequencies in duplicated genes were lower than those in singletons (39% versus 52%) for the most recently (α) duplicated segments; the frequencies of SNPs encoding nonsynonymous changes in older Arabidopsis β and γ duplicate genes are virtually indistinguishable from those for singletons (49% versus 51% and 47% versus 50%, respectively). In organisms such as angiosperms, in which there are few roadblocks to polyploidization (such as sex chromosomes), the selective advantage for newly formed polyploids to survive may increase as the buffering from previous genome duplication erodes.
The emerging functional buffering model for polyploid genome evolution accommodates the ability of duplicated genes to innovate despite conservative rates of protein evolution. Genomes appear to have retained duplicated genes with slower evolutionary rates (53), homogenized protein domains, and higher connectedness in cellular networks (33). Duplicate gene evolution is thus a balance between maintaining critical gene components and diversification of the use of these components for evolutionary innovation. The identification of these critical components and characterization of their buffering capacity, mechanisms, and longevity may be important steps toward understanding the conditions under which large-scale duplications lead to successful taxa.
| Materials and Methods |
|---|
(1) and Oryza duplicated segments (3). Singletons had no best BLAST hit at an E value of <10⁻¹⁰ anywhere in the genome. Proximal duplications were removed based on having additional nonhomoeologous duplicate pairs. Duplicated pairs corresponding to a recent segmental duplication between rice chromosomes 11 and 12 were removed. Arabidopsis coding, genomic, and protein sequences were downloaded from TAIR (The Arabidopsis Information Resource, www.arabidopsis.org) release 20030417. TIGR (The Institute for Genomic Research) Rice Pseudomolecules and Genome Annotation Version 1.0 (www.tigr.org/tdb/e2k1/osa1) was used for rice analyses.
SNPs.
Arabidopsis SNPs were selected as described from 37,344 polymorphisms derived from whole-genome shotgun data for the Landsberg ecotype compared with the finished sequence of ecotype Columbia (39). Rice SNPs were from a whole-genome comparison between the japonica and indica subspecies (40). To locate SNPs within coding regions of duplicated and singleton genes, 20-bp sequences flanking the SNP were located in the genomic sequences by using BLAST to identify potential genes and "fuzznuc" from the EMBOSS 2.4.1 package to find exact positions. Genomic locations were translated into positions in the coding sequence by comparing alignments of genomic and coding sequence created with the EMBOSS program EST2GENOME. Positions of SNPs within codons and Blosum80 substitution values were determined by using the predicted coding sequence and previously determined positions. The entire process was automated with PYTHON scripts.
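The step that places a SNP within a codon and infers the resulting amino acid change might look like the sketch below. It is not the authors' script: the function name `classify_snp`, the 0-based coordinate convention, and the use of the standard genetic code are assumptions made for illustration, and Blosum80 scoring is omitted because it needs the substitution matrix as input.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snp(cds, position, alt_base):
    """Classify a SNP at a 0-based position of an in-frame coding sequence.

    Returns (ref_aa, alt_aa, is_nonsynonymous), where '*' denotes a stop codon.
    """
    offset = position % 3               # position of the SNP within its codon
    start = position - offset           # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa != alt_aa

# Second base of GCT (Ala) changed to A gives GAT (Asp): nonsynonymous.
print(classify_snp("ATGGCTTAA", 4, "A"))
# Third base of GCT changed to C gives GCC (Ala): synonymous.
print(classify_snp("ATGGCTTAA", 5, "C"))
```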
Characterization of Gene Features.
Distances between duplicate gene pairs were calculated with the PROTDIST program from PHYLIP 3.6A3, using the default Jones–Taylor–Thornton substitution matrix and a γ-distribution of rates across positions (α = 0.9). Locations of protein domains within singleton and duplicated genes were determined with HMMPFAM from the HMMER package version 2.2g. Searches were conducted against the Pfam 11.0 database (7,255 models), retaining global matches with an E value of <0.001.
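Once domain locations are known, the share of a protein covered by characterized domains can be computed by merging overlapping hits. The sketch below is one way to do this, assuming 1-based inclusive residue coordinates; the function name `domain_coverage` and the example coordinates are hypothetical and are not parsed from real HMMPFAM output.

```python
def domain_coverage(protein_length, domains):
    """Fraction of residues covered by at least one domain.

    `domains` is an iterable of (start, end) pairs in 1-based inclusive
    residue coordinates. Overlapping or nested hits are counted once.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    covered = 0
    last_end = 0  # last residue already counted
    for start, end in sorted(domains):
        start = max(start, last_end + 1)  # skip residues an earlier hit covered
        end = min(end, protein_length)    # clip hits that run past the protein
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / protein_length

# Hypothetical 200-residue protein with two overlapping hits and one separate hit.
print(domain_coverage(200, [(10, 50), (40, 60), (80, 100)]))
```

Averaging this fraction over all genes in a class gives a per-class coverage figure of the kind reported for singletons and duplicates.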
Analysis of Homogenization.
For both Arabidopsis and Oryza, we used gene databases from taxa selected to flank the duplication event as determined by phylogenetic dating techniques (1, 3). For Arabidopsis singletons, we used clustered EST databases from PlantGDB (Plant Genome Database, www.plantgdb.org) for Brassica napus (which diverged from Arabidopsis after the duplication event) and Gossypium arboreum (which diverged from Arabidopsis before the duplication event). For Oryza singletons, gene databases were used from Sorghum bicolor (which diverged from Oryza after the duplication event) from PlantGDB and Musa (which diverged from Oryza before the duplication event) from PROMUSA (www.promusa.org). Homologs from these databases were selected based on having a TBLASTN E value of <10⁻¹⁰ and minimum alignment of 40 aa with the singleton protein sequences used as the query. For duplicate pairs and singletons with detected homologs, amino acid sequences were aligned using CLUSTALW version 1.83 and used to align the corresponding coding sequences with TRANALIGN from EMBOSS. Aligned sequences were then examined to identify stretches of identical codons without a substitution or gap. The longest region of identity within domains, if present, or in nondomain regions was chosen from each singleton and duplicate gene. The distributions were examined for statistical differences by ANOVA of Box-Cox-transformed data (λ = 1/3). Subsequently, Tukey–Kramer comparisons corrected for unequal sample sizes were used to define statistically different groups.
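The Box-Cox transformation applied before the ANOVA can be written in a few lines. This sketch assumes the standard one-parameter form, y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0, with the exponent 1/3 taken from the text; the function name `box_cox` and the example run lengths are illustrative.

```python
import math

def box_cox(values, lam=1 / 3):
    """One-parameter Box-Cox transform of strictly positive values.

    Uses (x**lam - 1) / lam for lam != 0 and log(x) for lam == 0.
    """
    values = list(values)
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical lengths (in codons) of longest identical stretches.
print(box_cox([1, 8, 27, 64]))
```

A cube-root-like transform of this kind compresses the long right tail typical of run-length data, which brings the groups closer to the normality that ANOVA assumes.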
| Footnotes |
|---|
Author contributions: B.A.C., J.E.B., and A.H.P. designed research; B.A.C., J.E.B., and F.A.F. performed research; B.A.C., J.E.B., and F.A.F. contributed new reagents/analytic tools; B.A.C., J.E.B., F.A.F., and A.H.P. analyzed data; and B.A.C., F.A.F., and A.H.P. wrote the paper.
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
© 2006 by The National Academy of Sciences of the USA