Proceedings of the National Academy of Sciences

Published online on February 8, 2006, 10.1073/pnas.0507782103
PNAS | February 21, 2006 | vol. 103 | no. 8 | 2730-2735



BIOLOGICAL SCIENCES / EVOLUTION
Buffering of crucial functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome duplication

Brad A. Chapman*,†, John E. Bowers*, Frank A. Feltus*, and Andrew H. Paterson*,†,‡,§

*Plant Genome Mapping Laboratory and Departments of †Plant Biology, ‡Genetics, and §Crop and Soil Science, University of Georgia, Athens, GA 30602

Edited by Tomoko Ohta, National Institute of Genetics, Mishima, Japan, and approved December 8, 2005 (received for review September 13, 2005)


    Abstract
Genome duplication followed by massive gene loss has permanently shaped the genomes of many higher eukaryotes, particularly angiosperms. It has long been believed that a primary advantage of genome duplication is the opportunity for the evolution of genes with new functions by modification of duplicated genes. If so, then patterns of genetic diversity among strains within taxa might reveal footprints of selection that are consistent with this advantage. Contrary to classical predictions that duplicated genes may be relatively free to acquire unique functionality, we find among both Arabidopsis ecotypes and Oryza subspecies that SNPs encode less radical amino acid changes in genes for which there exists a duplicated copy at a "paleologous" locus than in "singleton" genes. Preferential retention of duplicated genes encoding long complex proteins and their unexpectedly slow divergence (perhaps because of homogenization) suggest that a primary advantage of retaining duplicated paleologs may be the buffering of crucial functions. Functional buffering and functional divergence may represent extremes in the spectrum of duplicated gene fates. Functional buffering may be especially important during "genomic turmoil" immediately after genome duplication but continues to act ≈60 million years later, and its gradual deterioration may contribute cyclicality to genome duplication in some lineages.

amino acid substitution | Arabidopsis | Oryza | protein functional domain | single-nucleotide polymorphism


Recent sequencing efforts have revealed that genome duplication, a punctuational event in the evolution of a lineage, is more common than previously suspected. For example, detailed analyses of the sequences of the dicotyledonous angiosperm (dicot) Arabidopsis thaliana (1, 2) and monocotyledonous Oryza sativa (rice) (3, 4) show duplicated genome structure. Hypotheses about the timing and extent of individual events contributing to this structure range from our preference of a few discrete events (1, 3) to many segmental events (5), but all agree that much segmental duplication has occurred in both monocots and dicots since their divergence from a common ancestor, variously estimated at 125–140 (6) to 170–235 million years ago (7). Additional, more recent, duplications in specific angiosperm lineages (8) lead one to question whether this process may be not just episodic but truly cyclical in this group of taxa.

Under classical models, gene duplication is proposed to be a primary source of genetic material available for the evolution of genes with new functions (9–11); one member of a duplicated gene pair may mutate and acquire unique functionality (12, 13), with the fitness of the organism insulated by the homoeolog. Such models would predict that, in natural populations, higher levels of polymorphism would occur in duplicated genes than in singletons [a prediction supported by recent theory (14)] and that the ability of duplicates to provide functional compensation for one another would erode as their functions diverged (15).

However, recent findings raise perplexing questions about this classical "functional divergence" model. Analysis of 17 nonallelic duplicates in Xenopus laevis shows evidence of purifying selection on each duplicate gene (16). For three recently duplicated (≈0.25–1.2 million years ago) Arabidopsis genes, both progenitor and derived copies show significantly reduced species-wide polymorphism (17). Duplicated yeast genes provide a discernible degree of functional compensation for a remarkably long period (18) and appear to undergo gene conversion (19). Realization of the potential benefits that may result from functional divergence of newly duplicated genes would naturally require a new polyploid to persist long enough for adaptive evolution to occur. Most higher organisms are thought to continuously produce aberrant unreduced gametes at low rates, but the rarity of genome duplication shows that the overwhelming majority do not survive. Study of both natural polyploids and synthetic polyploids formed by colchicine-based manipulation of interspecific hybrids reveals immediate consequences of polyploidization that ostensibly seem maladaptive, including loss and restructuring of low-copy DNA sequences (20–25), activation of genes and retrotransposons (26, 27), and gene silencing (28–31).

The angiosperms are an outstanding higher-eukaryote model in which to elucidate consequences of genome duplication in view of the strong signal that remains from multiple genome duplications, naturally occurring replication afforded by independent duplications, and a host of genetic and molecular tools. The fates of duplicated genes may be closely associated with effective population size (Ne) for a taxon (32), meaning that recent insights from microbes (18, 33–35) may not extend well to crown eukaryotes. For example, recent empirical data (35) validate the prediction that subfunctionalization should be rare in organisms with large effective population size, such as yeast (12), and highlight the need for complementary investigations of larger-bodied (smaller Ne) eukaryotes, such as angiosperms.

We search for "footprints of selection" associated with genome duplication, investigating whether the evolution of genes in modern populations or recently diverged taxa is influenced by the presence of an ancient duplicated copy at an unlinked locus. Specifically, we distinguish strictly "singleton" genes from those that have retained duplicated copies as a result of ancient duplication and compare levels and patterns of polymorphism in the coding sequences of these two gene classes. Our results suggest that functional conservation, particularly of complex genes and functional domains, may occur in many paleologs that are still recognizable as such. Thus, functional divergence suggested by classical models and functional buffering suggested herein may be alternative outcomes in the spectrum of possible fates for paleologs. Several avenues may offer some reconciliation between these respective models.


    Results and Discussion
Gene Fates Are Correlated Across Multiple Duplication Events. From the oldest (γ) duplicated chromosomal segments that we could discern (1), we first determined whether individual genes did or did not retain a paleologous copy in subsequent duplications (Fig. 1b). Indeed, γ-duplicated genes are more often retained in duplicate for both the β (36.3% of the time versus 20.6% for singletons; binomial proportion, P < 1.3 × 10⁻¹⁵) and α (30.2% versus 22.8%; P < 2.2 × 10⁻⁴) events. β-Event-derived duplicated and singleton genes show a similar pattern; 41.8% of β-duplicated genes are retained in duplicate during the α event compared with only 27.0% of singleton genes (P < 7.9 × 10⁻⁵⁴). The tendency for singletons to be repeatedly restored to singleton status after new duplications, together with the unexpected excess of duplicated genes that not only persist for long time periods but continue to spawn additional copies, suggests that gene retention after duplication is selective. Others have recently supported these results (36) and suggested differential reduplication of particular gene functional classes (37, 38).
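The retention comparison above contrasts two proportions. As a rough illustration of how such a contrast can be tested, the following Python sketch applies a pooled two-proportion z-test. It is not necessarily the exact test used here: the text reports only percentages and P values, so the gene counts in the example are hypothetical, and the function name is ours.

```python
import math

def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Two-sided pooled z-test for a difference between two proportions.

    Returns (z, p_value). Uses the normal approximation, so it assumes
    reasonably large counts in both groups.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical counts chosen only to mimic the reported 36.3% versus 20.6%.
z, p = two_proportion_z_test(363, 1000, 206, 1000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With real data, the arguments would be the number of genes retained in duplicate and the total number of genes in each class.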


Fig. 1. Preferential retention of ancient duplicated genes. (a) Whole-genome duplications and SNP study systems. Colinear regions of genes resulting from ancient duplications (three in Arabidopsis and one in Oryza) are recognized based on the subset of cases in which both homoeologous copies of an ancestral gene are retained (X); however, in the majority of cases one homoeolog has been lost (O). For each gene, whether a duplicated copy is present or not, SNPs between recently diverged landraces (Arabidopsis) or subspecies (Oryza) are used to characterize alleles that have persisted in modern populations. (b) Arabidopsis genes present in duplicate are preferentially maintained through subsequent duplication events. Individual singleton and duplicated genes at corresponding locations in colinear regions resulting from the oldest (γ) duplication event were traced through subsequent β and α events (see text). Bold numbers indicate the total copy numbers of a single event-derived gene in the Arabidopsis genome sequence. Numbers in parentheses are the total gene counts within each class. The differences in final copy numbers between singleton and duplicated genes are statistically significant (P < 0.01, χ²).

SNP Alleles That Persist in Populations Are More Conservative in Genes with Extant Paleologs Than in Singleton Genes. To investigate whether the evolution of genes in modern populations or recently diverged taxa is influenced by the presence of ancient duplicated copies at paleologous loci, we surveyed naturally occurring polymorphic alleles in large intraspecific SNP collections for Arabidopsis [37,344 (39)] and rice [384,341 (40)] (Fig. 1a). A total of 2,022 and 7,971 SNPs could be plotted to codon locations within genes in ancient duplicated regions of Arabidopsis and rice, respectively. This finding permitted investigation of whether SNP alleles show differential patterns of abundance or severity in duplicated genes versus singletons, the latter defined stringently to exclude genes with recent nonhomoeologous duplicated copies. The overall SNP rate is lower in Arabidopsis (in which landraces within a species were compared) than in rice (in which subspecies were compared) (Table 1). Because our experimental unit is the gene rather than the organism, identification of common trends associated with independent duplications in these divergent lineages offers a broader inference space than study of SNP frequencies in additional members of each taxon, although the latter would be a valuable future investigation.


Table 1. SNPs located within coding regions of duplicate and singleton genes

 
By placing SNPs in their context within codons, striking differences between singleton and duplicated genes became evident. In genes for which an ancient duplicated copy was present at a paleologous location, a larger proportion of SNPs were in the third codon position than in singletons. This observation held true across all duplication events in both Arabidopsis and rice (Fig. 2a). Indeed, SNPs caused an inferred amino acid change in 52%, 51%, and 50% of cases for α, β, and γ Arabidopsis singleton genes, respectively, but only 39%, 44%, and 47% of cases for corresponding duplicated genes. In rice, SNPs caused inferred amino acid changes in 61% of singletons versus 48% in duplicated genes. These differences are statistically significant for the Arabidopsis α and rice duplications (P < 0.001, χ²) but not for the older Arabidopsis duplications.
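To make the codon-context classification concrete, the Python sketch below shows one way to determine the codon position of a SNP and whether it changes the encoded amino acid. It assumes the standard genetic code and an in-frame coding sequence; the function name and the example sequence are illustrative and are not taken from our scripts.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_snp(cds, cds_index, alt_base):
    """Describe a SNP at 0-based cds_index in an in-frame coding sequence."""
    cds = cds.upper()
    offset = cds_index % 3                 # 0, 1, or 2 within the codon
    codon_start = cds_index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_position": offset + 1,      # reported 1-based
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "changes_amino_acid": ref_aa != alt_aa,
    }

# Illustrative sequence ATG GCT GAA (Met-Ala-Glu).
print(classify_snp("ATGGCTGAA", 5, "C"))   # third position, GCT -> GCC
print(classify_snp("ATGGCTGAA", 6, "A"))   # first position, GAA -> AAA
```

Tallying `codon_position` and `changes_amino_acid` separately for duplicated and singleton genes gives the kind of proportions compared above.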


Fig. 2. SNPs that have persisted in natural populations cause less severe inferred amino acid changes in ancient duplicated genes than in singletons. (a) Codon position of SNP substitutions. For all duplication events, a higher proportion of changes are concentrated in the third wobble position for duplicated genes than for singletons. (b) Severity of SNP-derived amino acid changes. Blosum80 substitution values were used as a measure of the severity of a change, with higher values indicating less severe changes. For all cases, mean changes in duplicated genes are less radical than in singletons. For the Arabidopsis α and rice duplication events, these differences are statistically significant. Error bars indicate SEM.

Consistent with the finding that a higher fraction of SNPs in duplicated genes were in the third codon position, the severity of amino acid changes resulting from SNPs in duplicated genes was less than in singletons. We used the Blosum80 substitution matrix to compare amino acids inferred to be encoded by alternative SNP alleles. For all cases, duplicated genes have a higher mean substitution value (Fig. 2b), indicating that the changes caused are less radical. This result is statistically significant for the Arabidopsis α and rice duplications but not for the older Arabidopsis duplications. To preclude the possibility that erroneous gene predictions influence this result, the analyses were repeated with only the subsets of singleton and duplicated genes that matched an EST (E < 10⁻⁵⁰) with nearly identical results.

The conservative evolution of duplicated genes was evident not only qualitatively but also quantitatively. Gene pairs that were relatively similar tended to retain less severe SNPs (in terms of inferred amino acid changes) than more divergent gene pairs from the same event (i.e., of similar age) (Fig. 3). The conservative evolution of paleologs is all the more striking in view of the theoretical expectation that variation within species in duplicated genes should be greater than in single-copy genes (14). This finding further contradicts the expectations of traditional functional divergence models for polyploidy (9–11), in which radically altered alleles should persist more frequently in similar duplicated genes because of the presence of a redundant copy but rarely in members of more divergent pairs that may no longer have the same functions.
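The severity comparison reduces to averaging substitution-matrix values over the amino acid pairs encoded by alternative SNP alleles. The Python sketch below illustrates that bookkeeping only: the scores in `TOY_MATRIX` are placeholders and are not real Blosum80 values, the amino acid changes are hypothetical, and a real analysis would load the full Blosum80 matrix instead.

```python
import math
import statistics

def substitution_score(matrix, aa_a, aa_b):
    """Look up a score in a symmetric matrix stored as {(aa1, aa2): score}."""
    if (aa_a, aa_b) in matrix:
        return matrix[(aa_a, aa_b)]
    return matrix[(aa_b, aa_a)]

def mean_and_sem(scores):
    """Return (mean, standard error of the mean) for a list of scores."""
    mean = statistics.mean(scores)
    if len(scores) < 2:
        return mean, float("nan")
    return mean, statistics.stdev(scores) / math.sqrt(len(scores))

# Placeholder scores for illustration only; NOT real Blosum80 values.
TOY_MATRIX = {("I", "V"): 3, ("D", "E"): 2, ("A", "S"): 1, ("G", "W"): -4}

# Hypothetical amino acid changes caused by SNPs in each gene class.
changes = {
    "duplicated": [("I", "V"), ("E", "D"), ("A", "S")],
    "singleton": [("G", "W"), ("S", "A"), ("W", "G")],
}
for gene_class, pairs in changes.items():
    scores = [substitution_score(TOY_MATRIX, a, b) for a, b in pairs]
    mean, sem = mean_and_sem(scores)
    print(f"{gene_class}: mean = {mean:.2f}, SEM = {sem:.2f}")
```

A higher mean corresponds to less radical changes, which is the direction reported for duplicated genes in Fig. 2b.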


Fig. 3. SNPs that have persisted in natural populations cause less severe inferred amino acid changes in genes with a similar duplicated homoeolog than in those with a more divergent duplicated homoeolog. For each duplication event, the distance between duplicated proteins was calculated by PROTDIST and plotted against Blosum80 substitution values measuring the severity of SNP changes. Error bars indicate the SEM for each Blosum80 substitution class. x (Upper) and * (Lower) designate statistically different groups using Tukey–Kramer multiple comparisons corrected for unequal sample sizes (P < 0.05). (Data lacking the two indicated symbols are not statistically distinguishable from either of the other categories.) The statistically significant negative slopes for both the Arabidopsis and rice plots (both P < 0.0001) indicate that, as the distance between duplicated genes decreases, SNPs are less severe.

Genes Encoding Long and Complex Proteins Are Preferentially Preserved in Duplicate and Evolve Conservatively. Genes for which an ancient duplicated copy has been preserved are 25–112% longer, on average, than singletons (Table 1). To investigate whether greater structural complexity could confer a selective advantage to preservation of their sequence and function, we first determined the percentage of each gene (coding region) covered by discernible Pfam database-defined protein domains (widely characterized functional units) and then plotted that percentage against Blosum80-based severity of inferred amino acid substitutions resulting from SNPs that have persisted in natural populations. In both singleton and duplicated proteins, higher domain coverage is associated with lower severity of amino acid changes encoded by SNPs (Fig. 4).
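The domain-coverage percentage can be computed by merging overlapping domain hits before counting covered residues. The Python sketch below is one minimal way to do so; the 1-based inclusive coordinate convention and the function name are assumptions made for illustration.

```python
def domain_coverage_percent(protein_length, domain_hits):
    """Percentage of residues covered by at least one domain hit.

    domain_hits: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping or adjacent hits are merged so no residue is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(domain_hits):
        # Clip hits to the protein boundaries and skip empty intervals.
        start, end = max(start, 1), min(end, protein_length)
        if start > end:
            continue
        if current_end is None or start > current_end + 1:
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / protein_length

# Hypothetical 200-residue protein with two overlapping hits and one separate hit.
print(domain_coverage_percent(200, [(10, 59), (40, 89), (150, 169)]))
```

In this example the first two hits merge into residues 10–89 (80 residues) and the third adds 20, giving 100 of 200 residues covered.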


Fig. 4. Retained SNPs cause less radical amino acid changes in domain-rich proteins. The mean percentage of proteins contained within defined Pfam domains was plotted against the severity of SNP-caused changes, as measured by Blosum80 substitution values. Error bars indicate the SEM. * designates statistically different groups (P < 0.05, Tukey–Kramer). Although statistically significant groupings of data points could be resolved only for rice duplicated genes, significant positive slopes (P < 0.0001) for three of the four lines (Arabidopsis duplicate and rice singleton and duplicate), with the fourth marginally missing significance (Arabidopsis singleton, P < 0.074), strongly support the hypothesis that less radical changes are associated with proteins containing a higher percentage of domains. This finding appears true regardless of duplication status.

Not only are duplicated genes generally longer than singletons, but a much higher fraction of the coding regions comprise identifiable domains, thus making duplicated genes doubly sensitive to nonsynonymous mutations. For the Arabidopsis α event, an average of 20.5% of the coding regions of singletons and 44% of duplicates are covered by characterized domains. Similarly, rice singletons average 7.7% coverage by domains versus 41.3% for duplicated genes.

The greater retention in duplicate of longer and more complex proteins, together with their lower tolerance of nonsynonymous mutations, suggests that the potential advantage conferred by the possibility of future neofunctionalization may be outweighed by the immediate benefits of buffering crucial functionality, i.e., ensuring that the functions of essential genes and domains are met even after mutation of one copy by the presence of a second copy. The tendency for duplicated gene copies to tolerate less severe amino acid substitutions than singletons in natural populations is in contrast to a primary advantage of polyploidy being the freedom for duplicated genes to acquire new functions (9–11). However, this finding is consistent with recent results about the evolution of recently evolved duplicates (16, 17) and long-term functional compensation (18, 41, 42). Further testing of this hypothesis might derive from obtaining data for additional taxa that parallel the fitness data available for individual gene knockouts in yeast (18).

Such genetic buffering may be especially important in survival of lineages experiencing the early "genomic turmoil" that immediately follows polyploid formation (20–25). However, our data show that such buffering still markedly affects diversity among ecotypes of Arabidopsis thaliana and subspecies of Oryza sativa, species that each trace to genome duplications that occurred 60 million years or more before speciation (1, 3).

Do Homogenization Processes Act on Paleologs? What mechanisms might preserve the sequence of thousands of pairs of genes at paleologous sites across a genome for 60 million years or more? Occasional nonhomologous associations between chromosomes are observed during mitosis in many taxa, including rice (43), and might periodically permit homogenization processes to act between paleologs. Mechanisms such as gene repair (44) and gene conversion (45) have been suggested to occur between ancient duplicated genes (19) but can be difficult to quantify (46). Indeed, genome-wide characterization of duplicates via established methods provided ambiguous results.

To attempt to circumvent problems with accurately detecting homogenization across an entire genome, we explored pairwise alignments of duplicated genes for unexpectedly long stretches of sequence identity. Paleologs were compared with one another, whereas corresponding singletons from the same polyploidization event were compared with their best homolog from species that diverged from the lineage before or after duplication, respectively, as detailed in Materials and Methods. In the absence of homogenization, the lengths of identical regions for duplicate pairs should fall between those of such pairs of singleton-to-homolog comparisons flanking the duplication event (Fig. 5; see also Supporting Results and Table 2, which are published as supporting information on the PNAS web site). Within this framework, the largest stretch of identical codons for each gene, representing the best evidence for potential homogenization, was compared in both domain and nondomain regions of singleton and duplicated genes.


Fig. 5. Domain regions of duplicated homoeologs show longer-than-expected stretches of identical codons. The length distributions of the longest regions of identical codons found from alignments of duplicated pairs were compared with those of singletons aligned against pre- and postduplication homologs for both domain-containing and non-domain-containing regions; phylogenetic trees show the relative timing of Arabidopsis α and rice duplication events. Box plots of the first quartile, median, and third quartile show the distribution of these regions, with the number of regions contributing to each plot in parentheses. Letters indicate statistically different groups (P < 0.05, Tukey–Kramer multiple comparisons) for each class of gene type and domain or nondomain region. Based on calculated substitution rates (Supporting Results and Table 2), identity stretches between duplicated genes would be expected to fall between the two flanking comparisons of singletons to homologs. This pattern is observed for nondomain regions in Arabidopsis and rice. In contrast, identity stretches between duplicated genes in domain regions are statistically indistinguishable from comparisons of singletons to taxa that diverged much more recently than the duplication event.

Longer-than-expected stretches of identity are found in domain regions of duplicated genes but not singleton genes, suggesting that homogenization processes may act between paleologs. For nondomain regions, the distribution of the largest identical regions for duplicated genes falls between that of the pre- and postduplication homolog versus singleton comparisons, as expected based on KS and KA values (Fig. 5; see also Supporting Results and Table 2). In contrast, domain regions of duplicates show a length distribution of identity stretches that is statistically equivalent to those in postduplication comparisons (Fig. 5). For example, identity stretches between duplicated genes in domain regions of rice are slightly longer than those found between singletons and Sorghum, which is estimated to have diverged ≈20 million years more recently (3).

Although the possibility of interaction between paleologous genes needs further study, the preservation of key domains by homogenization processes could offer a selective advantage to the retention of duplicated copies of genes that serve crucial functions. Loss of duplicated copies may permit the remaining singleton genes to evolve more rapidly, thus accelerating mutation rates in an adaptive manner by a mechanism that is explicable without projecting a specialized purpose into evolution (47). Homogenization may also cause molecular clocks based on substitution rates between duplicated genes to chronically underestimate the age of duplications, as suggested recently (23). This potential bias would affect many dating results, including some from our own laboratory (1, 3).

Reconciliation Between the Models. Several avenues may offer some reconciliation between the classical functional divergence model for genome duplication (9–11) and an emerging functional buffering model. Our hypothesis that a primary advantage of retaining genes in duplicate may be the immediate buffering of crucial functionality in no way precludes the occasional evolution of unique functionality as one outcome in a spectrum of possibilities that is generally weighted in favor of buffering. Indeed, genes for which divergence was advantageous may have diverged long ago and are no longer recognizable as paleologs. Finally, the present results focus on changes in protein sequence and do not address changes in noncoding regulatory elements that can cause expression divergence (48).

Polyploid buffering, along with recombination, may contribute to reduced accumulation of degenerative mutations via Muller’s ratchet (49). Although proximal duplication has been suggested to primarily amplify genes with secondary functionality, whole-genome duplication amplifies, and may buffer, genes involved in essential processes, such as primary metabolism (50). Many predominantly clonally propagated angiosperms (such as banana and sugarcane) and apomicts (51) are recently formed polyploids; protection of critical functionality by duplication and preservation of essential genes would explain how, in the absence of recombination, they avoid degeneration via Muller’s ratchet (49). Similarly, repair processes fostered by whole-genome duplication are also key to the extraordinary radiation tolerance of Deinococcus radiodurans (52).

Might Erosion of Buffering Impart Cyclicality to Genome Duplication? Reciprocal buffering presumably erodes with time as ancient duplicated genes eventually diverge, potentially imparting cyclicality to genome duplication. For example, in Arabidopsis, we could only assert that the frequencies of SNPs encoding nonsynonymous changes in duplicated genes were lower than those in singletons (39% versus 52%) for the most recently (α) duplicated segments; the frequencies of SNPs encoding nonsynonymous changes in older Arabidopsis β and γ duplicate genes are virtually indistinguishable from those for singletons (49% versus 51% and 47% versus 50%, respectively). In organisms such as angiosperms, in which there are few roadblocks to polyploidization (such as sex chromosomes), the selective advantage for newly formed polyploids to survive may increase as the buffering from previous genome duplication erodes.

The emerging functional buffering model for polyploid genome evolution accommodates the ability of duplicated genes to innovate despite conservative rates of protein evolution. Genomes appear to have retained duplicated genes with slower evolutionary rates (53), homogenized protein domains, and higher connectedness in cellular networks (33). Duplicate gene evolution is thus a balance between maintaining critical gene components and diversification of the use of these components for evolutionary innovation. The identification of these critical components and characterization of their buffering capacity, mechanisms, and longevity may be important steps toward understanding the conditions under which large-scale duplications lead to successful taxa.


    Materials and Methods
Identifying Duplicated and Singleton Genes. Duplicated genes were defined as reciprocal best BLAST hits at homoeologous locations within colinear regions of genes previously defined for the Arabidopsis α (1) and Oryza duplicated segments (3). Singletons had no best BLAST hit at an E value of <10⁻¹⁰ anywhere in the genome. Proximal duplications were removed based on having additional nonhomoeologous duplicate pairs. Duplicated pairs corresponding to a recent segmental duplication between rice chromosomes 11 and 12 were removed. Arabidopsis coding, genomic, and protein sequences were downloaded from TAIR (The Arabidopsis Information Resource, www.arabidopsis.org) release 20030417. TIGR (The Institute for Genomic Research) Rice Pseudomolecules and Genome Annotation Version 1.0 (www.tigr.org/tdb/e2k1/osa1) was used for rice analyses.
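The reciprocal-best-hit criterion can be sketched in a few lines of Python. The example assumes that BLAST output has already been reduced to a mapping from each gene to its top-scoring non-self hit passing the E-value cutoff; the gene identifiers are hypothetical, and the additional requirement that pairs lie at homoeologous locations in colinear regions is not modeled.

```python
def reciprocal_best_hits(best_hit):
    """Return gene pairs that are each other's best hit.

    best_hit maps a gene ID to the ID of its top-scoring hit (self-hits
    excluded). Each pair is reported once, as a sorted tuple.
    """
    pairs = set()
    for gene, hit in best_hit.items():
        if hit != gene and best_hit.get(hit) == gene:
            pairs.add(tuple(sorted((gene, hit))))
    return sorted(pairs)

def singletons(all_genes, best_hit):
    """Genes with no recorded hit passing the E-value cutoff."""
    return sorted(g for g in all_genes if g not in best_hit)

# Hypothetical identifiers: geneA and geneB are reciprocal best hits,
# geneC hits geneA one way only, and geneD has no hit at all.
best = {"geneA": "geneB", "geneB": "geneA", "geneC": "geneA"}
print(reciprocal_best_hits(best))
print(singletons(["geneA", "geneB", "geneC", "geneD"], best))
```

A one-directional hit such as geneC is neither a reciprocal pair nor a singleton under these definitions, which mirrors the stringent singleton definition used here.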

SNPs. Arabidopsis SNPs were selected as described from 37,344 polymorphisms derived from whole-genome shotgun data for the Landsberg ecotype compared with the finished sequence of ecotype Columbia (39). Rice SNPs were from a whole-genome comparison between the japonica and indica subspecies (40). To locate SNPs within coding regions of duplicated and singleton genes, 20-bp sequences flanking the SNP were located in the genomic sequences by using BLAST to identify potential genes and "fuzznuc" from the EMBOSS 2.4.1 package to find exact positions. Genomic locations were translated into positions in the coding sequence by comparing alignments of genomic and coding sequence created with the EMBOSS program EST2GENOME. Positions of SNPs within codons and Blosum80 substitution values were determined by using the predicted coding sequence and previously determined positions. The entire process was automated with PYTHON scripts.
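The translation from a genomic location to a position in the coding sequence can be illustrated with a short Python sketch. It assumes a forward-strand gene whose coding exon intervals are already known (for example, parsed from a spliced alignment); the coordinates and function names are hypothetical and do not reproduce our scripts.

```python
def genomic_to_cds_index(position, exons):
    """Map a genomic coordinate to a 0-based coding-sequence index.

    exons: (start, end) coding intervals, 1-based inclusive, forward strand.
    Returns None if the position falls outside every interval.
    """
    offset = 0
    for start, end in sorted(exons):
        if start <= position <= end:
            return offset + (position - start)
        offset += end - start + 1   # coding bases contributed by this exon
    return None

def codon_location(cds_index):
    """Return (1-based codon number, 1-based position within the codon)."""
    return cds_index // 3 + 1, cds_index % 3 + 1

# Hypothetical gene with two coding exons of 6 and 9 bases.
exons = [(101, 106), (201, 209)]
index = genomic_to_cds_index(203, exons)
print(index, codon_location(index))
print(genomic_to_cds_index(150, exons))   # intronic position
```

A reverse-strand gene would need the coordinates and bases complemented first, which this sketch leaves out.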

Characterization of Gene Features. Distances between duplicate gene pairs were calculated with the PROTDIST program from PHYLIP 3.6A3, using the default Jones–Taylor–Thornton substitution matrix and a γ-distribution of rates across positions (α = 0.9). Locations of protein domains within singleton and duplicated genes were determined with HMMPFAM from the HMMER package version 2.2g. Searches were conducted against the Pfam 11.0 database (7,255 models), retaining global matches with an E value of <0.001.

Analysis of Homogenization. For both Arabidopsis and Oryza, we used gene databases from taxa selected to flank the duplication event as determined by phylogenetic dating techniques (1, 3). For Arabidopsis singletons, we used clustered EST databases from PlantGDB (Plant Genome Database, www.plantgdb.org) for Brassica napus (which diverged from Arabidopsis after the duplication event) and Gossypium arboreum (which diverged from Arabidopsis before the duplication event). For Oryza singletons, gene databases were used from Sorghum bicolor (which diverged from Oryza after the duplication event) from PlantGDB and Musa (which diverged from Oryza before the duplication event) from PROMUSA (www.promusa.org). Homologs from these databases were selected based on having a TBLASTN E value of <10⁻¹⁰ and minimum alignment of 40 aa with the singleton protein sequences used as the query. For duplicate pairs and singletons with detected homologs, amino acid sequences were aligned using CLUSTALW version 1.83 and used to align the corresponding coding sequences with TRANALIGN from EMBOSS. Aligned sequences were then examined to identify stretches of identical codons without a substitution or gap. The longest region of identity within domains, if present, or in nondomain regions was chosen from each singleton and duplicate gene. The distributions were examined for statistical differences by ANOVA of Box-Cox-transformed data (γ = 1/3). Subsequently, Tukey–Kramer comparisons corrected for unequal sample sizes were used to define statistically different groups.
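The search for stretches of identical codons can be sketched as a single pass over a codon-aligned pair of sequences. The Python example below is a minimal illustration that assumes the two coding sequences have already been aligned codon by codon with '-' marking gaps; the sequences shown are hypothetical, and restricting the scan to domain or nondomain regions is left out.

```python
def longest_identical_codon_run(aligned_a, aligned_b):
    """Length, in codons, of the longest run of identical ungapped codons.

    Both inputs are codon-aligned coding sequences of equal length, a
    multiple of three, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b) or len(aligned_a) % 3:
        raise ValueError("sequences must be codon-aligned and equal in length")
    best = current = 0
    for i in range(0, len(aligned_a), 3):
        codon_a = aligned_a[i:i + 3].upper()
        codon_b = aligned_b[i:i + 3].upper()
        if codon_a == codon_b and "-" not in codon_a:
            current += 1
            best = max(best, current)
        else:
            current = 0   # a substitution or gap ends the current run
    return best

# Hypothetical alignment: two identical codons, a substitution, a gap,
# then three identical codons.
seq_a = "ATGGCTGAA---TTTCCCGGG"
seq_b = "ATGGCTGAGAAATTTCCCGGG"
print(longest_identical_codon_run(seq_a, seq_b))
```

Applying the same function to duplicate pairs and to singleton-versus-homolog alignments yields the length distributions that are then compared statistically.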


    Acknowledgements
 
This work was supported by a predoctoral fellowship from the Howard Hughes Medical Institute (to B.A.C.) and by the Rockefeller Foundation (F.A.F. and A.H.P.) and the U.S. National Science Foundation (J.E.B. and A.H.P.).


    Footnotes
 
To whom correspondence should be addressed at: Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30602. E-mail: paterson{at}uga.edu

Author contributions: B.A.C., J.E.B., and A.H.P. designed research; B.A.C., J.E.B., and F.A.F. performed research; B.A.C., J.E.B., and F.A.F. contributed new reagents/analytic tools; B.A.C., J.E.B., F.A.F., and A.H.P. analyzed data; and B.A.C., F.A.F., and A.H.P. wrote the paper.

Conflict of interest statement: No conflicts declared.

This paper was submitted directly (Track II) to the PNAS office.

© 2006 by The National Academy of Sciences of the USA


    References
 

  1. Bowers, J. E., Chapman, B. A., Rong, J. K. & Paterson, A. H. (2003) Nature 422, 433–438.
  2. Simillion, C., Vandepoele, K., Van Montagu, M. C. E., Zabeau, M. & Van de Peer, Y. (2002) Proc. Natl. Acad. Sci. USA 99, 13627–13632.
  3. Paterson, A. H., Bowers, J. E. & Chapman, B. A. (2004) Proc. Natl. Acad. Sci. USA 101, 9903–9908.
  4. Paterson, A., Bowers, J., Peterson, D., Estill, J. & Chapman, B. (2003) Curr. Opin. Genet. Dev. 13, 644–650.
  5. Hughes, A. L. & Friedman, R. (2003) Genome Res. 13, 794–799.
  6. Davies, T. J., Barraclough, T. G., Chase, M. W., Soltis, P. S., Soltis, D. E. & Savolainen, V. (2004) Proc. Natl. Acad. Sci. USA 101, 1904–1909.
  7. Yang, Y. W., Lai, K. N., Tai, P. Y. & Li, W. H. (1999) J. Mol. Evol. 48, 597–604.
  8. Blanc, G. & Wolfe, K. H. (2004) Plant Cell 16, 1667–1678.
  9. Stephens, S. (1951) Adv. Genet. 4, 247–265.
  10. Ohno, S. (1970) Evolution by Gene Duplication (Springer, Berlin).
  11. Taylor, J. S. & Raes, J. (2004) Annu. Rev. Genet. 38, 615–643.
  12. Lynch, M., O’Hely, M., Walsh, B. & Force, A. (2001) Genetics 159, 1789–1804.
  13. Tocchini-Valentini, G. D., Fruscoloni, P. & Tocchini-Valentini, G. P. (2005) Proc. Natl. Acad. Sci. USA 102, 8933–8938.
  14. Innan, H. (2003) Genetics 163, 803–810.
  15. Wagner, A. (2000) Nat. Genet. 24, 355–361.
  16. Hughes, M. K. & Hughes, A. L. (1993) Mol. Biol. Evol. 10, 1360–1369.
  17. Moore, R. C. & Purugganan, M. D. (2003) Proc. Natl. Acad. Sci. USA 100, 15682–15687.
  18. Gu, Z. L., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W. & Li, W. H. (2003) Nature 421, 63–66.
  19. Gao, L. Z. & Innan, H. (2004) Science 306, 1367–1370.
  20. Kashkush, K., Feldman, M. & Levy, A. A. (2002) Genetics 160, 1651–1659.
  21. Ozkan, H., Levy, A. A. & Feldman, M. (2001) Plant Cell 13, 1735–1747.
  22. Shaked, H., Kashkush, K., Ozkan, H., Feldman, M. & Levy, A. A. (2001) Plant Cell 13, 1749–1759.
  23. Feldman, M., Liu, B., Segal, G., Abbo, S., Levy, A. A. & Vega, J. M. (1997) Genetics 147, 1381–1387.
  24. Song, K. M., Lu, P., Tang, K. L. & Osborn, T. C. (1995) Proc. Natl. Acad. Sci. USA 92, 7719–7723.
  25. Ozkan, H., Levy, A. A. & Feldman, M. (2002) Isr. J. Plant Sci. 50, S65–S76.
  26. Kashkush, K., Feldman, M. & Levy, A. A. (2003) Nat. Genet. 33, 102–106.
  27. O’Neill, R. J. W., O’Neill, M. J. & Graves, J. A. M. (2002) Nature 420, 106.
  28. Chen, Z. J. & Pikaard, C. S. (1997) Genes Dev. 11, 2124–2136.
  29. Chen, Z. J. & Pikaard, C. S. (1997) Proc. Natl. Acad. Sci. USA 94, 3442–3447.
  30. Comai, L., Tyagi, A. P., Winter, K., Holmes-Davis, R., Reynolds, S. H., Stevens, Y. & Byers, B. (2000) Plant Cell 12, 1551–1567.
  31. Lee, H. S. & Chen, Z. J. (2001) Proc. Natl. Acad. Sci. USA 98, 6753–6758.
  32. Lynch, M. & Conery, J. S. (2003) Science 302, 1401–1404.
  33. Wagner, A. (2001) Mol. Biol. Evol. 18, 1283–1292.
  34. Conant, G. C. & Wagner, A. (2003) Genome Res. 13, 2052–2058.
  35. Kellis, M., Birren, B. W. & Lander, E. S. (2004) Nature 428, 617–624.
  36. Seoighe, C. & Gehring, C. (2004) Trends Genet. 20, 461–464.
  37. Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M., Kuiper, M. & Van de Peer, Y. (2005) Proc. Natl. Acad. Sci. USA 102, 5454–5459.
  38. Blanc, G. & Wolfe, K. H. (2004) Plant Cell 16, 1679–1691.
  39. Jander, G., Norris, S., Rounsley, S., Bush, D., Levin, I. & Last, R. (2002) Plant Physiol. 129, 440–450.
  40. Feltus, F. A., Wan, J., Schulze, S. R., Estill, J. C., Jiang, N. & Paterson, A. H. (2004) Genome Res. 14, 1812–1819.
  41. Conant, G. C. & Wagner, A. (2004) Proc. R. Soc. London Ser. B 271, 89–96.
  42. Teichmann, S. A. & Babu, M. M. (2004) Nat. Genet. 36, 492–496.
  43. Lawrence, W. J. C. (1931) Cytologia 2, 352–384.
  44. Liu, L., Parekh-Olmedo, H. & Kmiec, E. B. (2003) Nat. Rev. Genet. 4, 679–689.
  45. Newman, T. & Trask, B. J. (2003) Genome Res. 13, 781–793.
  46. Zhang, L. Q., Vision, T. J. & Gaut, B. S. (2002) Mol. Biol. Evol. 19, 1464–1473.
  47. Dickinson, W. J. & Seger, J. (1999) Nature 399, 30.
  48. Makova, K. D. & Li, W. H. (2003) Genome Res. 13, 1638–1645.
  49. Muller, H. J. (1932) Am. Nat. 66, 118–138.
  50. Hooper, S. D. & Berg, O. G. (2003) Mol. Biol. Evol. 20, 945–954.
  51. Bayer, R. J. & Stebbins, G. L. (1987) Syst. Bot. 12, 305–319.
  52. Levin-Zaidman, S., Englander, J., Shimoni, E., Sharma, A. K., Minton, K. W. & Minsky, A. (2003) Science 299, 254–256.
  53. Davis, J. C. & Petrov, D. A. (2004) PLoS Biol. 2, e55.



This article has been cited by other articles in HighWire Press-hosted journals:

X. Wang, H. Tang, J. E. Bowers, F. A. Feltus, and A. H. Paterson
Extensive Concerted Evolution of Rice Paralogs and the Road to Regaining Independence
Genetics, November 1, 2007; 177(3): 1753-1763.

A. E. Melchinger, H.-P. Piepho, H. F. Utz, J. Muminovic, T. Wegenast, O. Torjek, T. Altmann, and B. Kusterer
Genetic Basis of Heterosis for Growth-Related Traits in Arabidopsis Investigated by Testcross Progenies of Near-Isogenic Lines Reveals a Significant Role of Epistasis
Genetics, November 1, 2007; 177(3): 1827-1837.

E. W. Ganko, B. C. Meyers, and T. J. Vision
Divergence in Expression between Duplicated Genes in Arabidopsis
Mol. Biol. Evol., October 1, 2007; 24(10): 2298-2309.

M. Freeling, L. Rapaka, E. Lyons, B. Pedersen, and B. C. Thomas
G-Boxes, Bigfoot Genes, and Environmental Response: Characterization of Intragenomic Conserved Noncoding Sequences in Arabidopsis
Plant Cell, May 1, 2007; 19(5): 1441-1457.

P. Sieber, F. Wellmer, J. Gheyselinck, J. L. Riechmann, and E. M. Meyerowitz
Redundancy and specialization among plant microRNAs: role of the MIR164 family in developmental robustness
Development, March 15, 2007; 134(6): 1051-1060.

J. A. Birchler and R. A. Veitia
The Gene Balance Hypothesis: From Classical Genetics to Modern Genomics
Plant Cell, February 1, 2007; 19(2): 395-402.

J. A. Udall and J. F. Wendel
Polyploidy and Crop Improvement
Crop Sci., November 1, 2006; 46(Supplement_1): S-3-S-14.

G. A. Tuskan, S. DiFazio, S. Jansson, J. Bohlmann, I. Grigoriev, U. Hellsten, N. Putnam, S. Ralph, S. Rombauts, A. Salamov, et al.
The genome of black cottonwood, Populus trichocarpa (Torr. & Gray).
Science, September 15, 2006; 313(5793): 1596-1604.

B. C. Thomas, B. Pedersen, and M. Freeling
Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes
Genome Res., July 1, 2006; 16(7): 934-946.

M. E. Schranz and T. Mitchell-Olds
Independent Ancient Polyploidy Events in the Sister Families Brassicaceae and Cleomaceae
Plant Cell, May 1, 2006; 18(5): 1152-1165.
