Eukaryotic Transposable Elements and Genome Evolution Special Feature
EUKARYOTIC TRANSPOSABLE ELEMENTS AND GENOME EVOLUTION SPECIAL FEATURE / BIOLOGICAL SCIENCES / RESEARCH ARTICLES / GENETICS
Recurrent duplication-driven transposition of DNA during hominoid evolution





*Department of Genome Sciences and the ¶Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195; Department of Genetics and Center for Human Genetics, Case Western Reserve School of Medicine and University Hospitals of Cleveland, Cleveland, OH 44106; Genome Technology Branch and NISC, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892; ||Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030; and **Sezione di Genetica, Dipartimento di Anatomia Patologica e di Genetica, University of Bari, 70126 Bari, Italy
Edited by Susan R. Wessler, University of Georgia, Athens, GA, and approved August 18, 2006 (received for review July 1, 2006)
Abstract
duplicons | LCR16 elements | lineage-specific duplications | segmental duplication
450 duplication hubs have been identified that have been the target of duplications from many different ancestral loci. This property has created regions of the genome that are complex mosaics of different genomic segments (7) where novel genes, fusion genes, and gene families have emerged (2, 8–13). Detailed studies of a few of the underlying regions (9, 11, 14) suggest that duplications have occurred in a stepwise fashion, involving subsequent larger segments of duplication as secondary events. The mechanism by which hundreds of kilobases of genomic sequence becomes duplicatively transposed to a new location on a chromosome is unknown.

Human chromosome 16 represents one of the most extreme examples of such recent segmental-duplication activity (15). More than 10% of the euchromatic portion of human chromosome 16p consists of segmental duplications known as LCR16 (low-copy repeat sequences on chromosome 16) (16, 17). During the initial sequence analysis of this chromosome, Loftus et al. (17) identified at least 20 distinct gene-rich LCR16 elements, ranging in size from a few kilobases to >50 kb in length, termed LCR16a–t. The majority of these were duplicated in an interspersed configuration throughout the chromosome. We subsequently identified a gene family, morpheus, within LCR16a that showed significant signatures of positive selection [Ka/Ks ratios up to 13.0 between humans and Old-World monkey (OWM) species]. The finished chromosome 16 sequence (18) provided the basis for a detailed analysis of these regions. We investigated the detailed organization of these regions among nonhuman primate species by sequencing large-insert clones from a diverse panel of primates to address questions regarding the mechanism of origin, the extent of structural variation among primates, and the relationship of these complex structures to the rapidly evolving LCR16a segment.
Results
19,794 bp in length) (Table 5, which is published as supporting information on the PNAS web site, and Fig. 1). Of the 11 other LCR16 elements considered in this analysis (Table 1), all map within 109 kb of an LCR16a duplication. After excluding ancestral segments, we find only one exception where a block exists (LCR16uw, Fig. 1 and Table 6, which is published as supporting information on the PNAS web site) without a full-length (20 kb) LCR16a element. In contrast, two distinct "solo" LCR16a elements have been identified that are not associated with other duplicated segments (Table 5), including a single rogue segment that has been mapped outside of chromosome 16 to human 18p11.
Duplication of Great Ape LCR16 Blocks into New Locations. Mapping and sequencing of LCR16 segmental duplications within primate genomes has been problematic because the duplications are typically embedded in large duplication blocks that may exceed 100 kb in size. For example, in the chimpanzee genome, these regions are misassembled, are highly fragmented, or correspond to gaps (19). Large-insert genomic clones, such as BACs, can help circumvent this problem because BAC-end sequence (BES) may extend beyond the duplication blocks to anchor in unique sequence (20). Such sequence anchors provide information regarding the corresponding map position. We therefore selected 782 BACs for insert end-sequencing, generating 526 pairs of end-sequence that were informative for mapping purposes (Table 8, which is published as supporting information on the PNAS web site). Based on comparative mapping of macaque and baboon for each single-copy locus of LCR16, we unambiguously determined the most likely ancestral location of each segmental duplication, which mapped to nine distinct locations that were consistent between both outgroup species (Fig. 1; and see Fig. 4, which is published as supporting information on the PNAS web site). With the exception of LCR16t, LCR16a is not associated with any of these regions in OWM species (Fig. 1).
Using a similar strategy, we attempted to assign locations for corresponding loci within ape genomes using BES data. In contrast to OWMs, we identified multiple loci for each probe, the vast majority of which associated with LCR16a based on the hybridization results. We categorized ape loci as mapping to (i) an orthologous locus (based on the identification of LCR16 duplications at that position in human), (ii) an ancestral position (based on map positions of single copy loci in baboon and macaque), or (iii) a nonorthologous location (based on the absence of a corresponding duplication at that position in human) (Tables 2 and 5 and Fig. 4). We could assign 35 loci to one of these categories, whereas 27 were ambiguous (end sequences placed in duplicated sequences in humans or other primates, preventing accurate assignment; see Methods). We observed a spatial clustering of new insertions. Both sequence and BES data, for example, indicate that the distal portion of chromosome 16 has been the target of such LCR16 duplications, particularly within the chimpanzee lineage. Similarly, many of these orangutan insertions mapped to a 5-Mb region on human 13q12.1–13q12.3 (Fig. 4).
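The three mapping categories and the ambiguous class described above can be read as a small decision procedure. The following is a minimal sketch of that logic, not the authors' pipeline: the `BesPlacement` record, its field names, and the precedence given to the ancestral check when a position satisfies more than one criterion are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LocusClass(Enum):
    ORTHOLOGOUS = "orthologous"        # LCR16 duplication present at this position in human
    ANCESTRAL = "ancestral"            # agrees with the single-copy position in baboon and macaque
    NONORTHOLOGOUS = "nonorthologous"  # no corresponding duplication at this position in human
    AMBIGUOUS = "ambiguous"            # end sequences fall within duplicated sequence


@dataclass
class BesPlacement:
    """Hypothetical summary of where a clone's end sequences anchor."""
    anchored_in_unique_sequence: bool    # an end sequence lands in unique sequence
    human_has_lcr16_duplication: bool    # human carries an LCR16 duplication at that position
    matches_owm_single_copy_locus: bool  # position matches the baboon/macaque single-copy locus


def classify_locus(placement: BesPlacement) -> LocusClass:
    """Assign one ape locus to a mapping category."""
    # Without an anchor in unique sequence the map position cannot be trusted.
    if not placement.anchored_in_unique_sequence:
        return LocusClass.AMBIGUOUS
    # Assumption: the ancestral check takes precedence if both criteria hold.
    if placement.matches_owm_single_copy_locus:
        return LocusClass.ANCESTRAL
    if placement.human_has_lcr16_duplication:
        return LocusClass.ORTHOLOGOUS
    return LocusClass.NONORTHOLOGOUS
```

Under this reading, a clone whose ends anchor in unique sequence at a position with no human duplication and no match to the OWM single-copy locus would be counted as a new, nonorthologous insertion.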
As a more direct test of association with LCR16a, we performed a series of independent hybridization experiments with each of the eight lineage-specific duplications in orangutan that were not identified as duplicated in chimpanzee or human. We estimated the copy number of each duplication and then cross-referenced positive clones by PCR to determine whether they were associated with LCR16a in the orangutan (Table 10, which is published as supporting information on the PNAS web site). Seventy-two percent (100 of 139) of clones detected by using a lineage-specific probe were also positive for LCR16a. When BES data were used to eliminate the ancestral locus, we found that 94% (17 of 18) of the duplicated loci were in association with LCR16a. We found only one exception where an orangutan-specific duplication had occurred without LCR16a. These data indicate that different intrachromosomal euchromatic duplications have emerged at different locations in a different lineage but focused, once again, around the LCR16a core. Interestingly, both copy number and sequence divergence decrease in a gradient-like fashion as distance from LCR16a increases (Fig. 7, which is published as supporting information on the PNAS web site). Thus, even though the chromosome, the location, and the content of the segmental duplication differ, we observe a virtually identical complex mosaic pattern of segmental duplications and polarity vis-à-vis LCR16a in different primate species.
Recurrent and Independent Duplications of LCR16a. Sequencing of the baboon and macaque genomes confirmed the ancestral location of each LCR16 segment (Fig. 1). Using noncoding primate genome sequences, we constructed a neighbor-joining phylogenetic tree for each of the 14 human LCR16 duplicons (Fig. 8, which is published as supporting information on the PNAS web site). The tree topology and corresponding branch lengths were remarkably consistent with the evolutionary order of events predicted from the initial hybridization results. The LCR16a phylogenetic analysis reveals two distinct clades, one monophyletic origin with respect to human/African ape sequences and a second monophyletic origin for the orangutan loci (Fig. 6). This finding is consistent with molecular clock data, which indicate that LCR16a expansions have occurred independently in each of the two lineages. It is interesting that, when the duplication architecture is superimposed over the LCR16a phylogeny, similar block architectures cluster. For example, in the case of human, three distinct groups can be recognized based largely on the presence of flanking LCR16 duplicons (LCR16b, d, or k/l). These associations supersede relationships predicted based on orthology, suggesting large-scale genetic exchanges since speciation of humans and great apes (21).
The finding of so many independent, recurrent duplications of the LCR16a segment prompted us to investigate whether there might be evidence for additional, more ancient copies of LCR16a that were not originally identified as a result of our threshold for detection (i.e., >90% sequence identity). Five additional loci were discovered, including three nearly full-length copies on chromosome 10q22.3 as well as two partial copies on chromosomes Xp11.22 and 11p15.4 (Fig. 9, which is published as supporting information on the PNAS web site). Three of these five homologous LCR16a structures were embedded within complex duplication blocks flanked by chromosome-specific segmental duplications. The extensive substitutions (≈0.2–0.3 substitutions per site) suggest that these duplications of LCR16a occurred much earlier during primate evolution (>40 million years) (22). Analysis of the recent rhesus macaque genome assembly confirmed the presence of Xp11.22, 11p15.4, and one of the 10q22.3 loci at syntenic positions to these human copies, confirming duplication of these before the divergence of the macaque/human lineages.
Junction Analysis. Two types of junctions could be identified based on our comparison of nonhuman primate and human sequences: (i) those that traversed lineage-specific duplications that had not been observed in humans (termed accretion boundaries) and (ii) those corresponding to the sites of new insertions (i.e., unique-duplication transitions where the LCR16 duplications were not present at that locus in human). The latter, termed insertion boundaries, provided the opportunity to study the architecture of the integration sites before duplicative transposition.
We generated precise sequence alignments and examined the repeat content for a total of 12 insertion and 23 accretion boundaries. As a control for the quality of sequence and assembly, subsets of these boundaries were tested and validated by junction-PCR amplification and sequencing of the PCR product (Fig. 3). Overall, ≈55% (19 of 35) of the junctions showed the presence of an Alu repeat mapping precisely at the accretion or insertion boundary. Of these, ≈95% (18 of 19) corresponded to younger subfamilies (AluS and AluY) (Table 11 and Fig. 10, which are published as supporting information on the PNAS web site). This threefold enrichment confirms previous findings that younger Alu repeat elements are significantly enriched at the breakpoints of segmental duplication (23, 24). Because of the lineage-specific nature of the duplications, donor and acceptor relationships in most cases could be readily defined. We noted 11 examples where the transition between donor and acceptor sequences occurred within homologous (although not identical) repeat elements.
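The junction proportions quoted above follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them; the helper name `percent` and the variable names are illustrative only, and the counts are those given in the text.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Counts reported in the text.
insertion_boundaries = 12
accretion_boundaries = 23
junctions_total = insertion_boundaries + accretion_boundaries  # 35 junctions

alu_at_boundary = 19  # junctions with an Alu repeat exactly at the boundary
young_alu = 18        # of those, Alu elements from the AluS or AluY subfamilies

print(percent(alu_at_boundary, junctions_total))  # 54.3
print(percent(young_alu, alu_at_boundary))        # 94.7
```

The exact values, 54.3% and 94.7%, round to the approximate figures of 55% and 95% given in the text.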
Discussion
Recurrent Duplications. We show that LCR16a has duplicated independently in each of the great-ape lineages to new euchromatic locations (Fig. 2). Most of the complex duplication blocks on human chromosome 16 are or have been associated with a full-length copy of LCR16a. Human and orangutan LCR16a map to different locations in the two genomes (Fig. 4). More ancient, full-length copies of the LCR16a element have been identified on different chromosomes, once again associated with complex regions of duplication. These data indicate that LCR16a duplications have occurred independently multiple times, and this 20-kb sequence has an inherent proclivity to duplicate to new locations.
Duplication Polarity. Other LCR16 elements have accumulated in a stepwise fashion focused around LCR16a to form complex duplication blocks (Fig. 6). Unlike LCR16a, solitary duplications (i.e., not associated with another LCR) are rarely identified for these (in the one clear case in human, analysis of the structure showed it to be a deletion of LCR16a) (Fig. 4v). Based on outgroup sequence data (macaque and baboon), most of these LCR16 elements originate from ancestral single-copy sequences (Fig. 1). We show that younger and less abundant duplications accumulate at the periphery of LCR16a (Fig. 7). In the case of orangutan, a completely analogous structure of flanking duplications (independent in origin) has emerged flanking LCR16a (Figs. 2 and 3). These data suggest polarity of duplication around LCR16a.
Ancestral Associations. Our hybridization and sequencing (Tables 1 and 2) data indicate that several of the ancestral loci of intrachromosomal segmental duplication on chromosomes 13 and 16 have been associated with LCR16a. In gorilla, for example, we find LCR16a in close proximity to LCRl (although at least in humans, such an association no longer exists). Two other examples (Figs. 2d and 4u) indicate that ancestral positions of LCR16 in chimpanzee and orangutan map in close proximity with LCR16a and are associated with lineage-specific duplications in these species. We propose these associations with LCR16a have served to prime lineage-specific duplications from these regions.
Coordinated Deletion. Our detailed analysis of six new insertions has shown that, in all six cases, the newer insertions involved the coordinated deletion of sequences. The preintegration sequences are highly enriched for common repeat sequences and may be prone to double-strand breakage events. The coordinated deletion of target site nucleotides has been observed for several atypical L1 integration events (26, 27) and may implicate single-strand annealing (SSA) and/or synthesis-dependent strand annealing (SDSA) (28, 29) as part of the pathway of segmental duplication.
Core Duplicon-Flanking Transposition Model. We have shown that LCR16 segmental duplications change in copy number, composition, and, more remarkably, location among humans and great apes. These regions of the genome may be loosely classified as a form of mobile DNA. Unlike typical common repeats (30), however, this process has moved and juxtaposed large gene structures, frequently in a lineage-specific manner, into new genomic contexts. The complex set of data presented here argues that LCR16a has played an active role in creating the duplication architecture on human chromosome 16 and orangutan chromosome 13. We propose that other LCR16 duplications have been duplicated passively, essentially as genetic hitchhikers as part of this process. The association of LCR16a elements with ancestral loci, especially younger duplication events, suggests that a property of the sequence itself has the potential to duplicate sequences to new locations. This has occurred independently and at different times during human–great ape evolution. These events are associated with both deletions and other rearrangement events that have subtly restructured human and great ape chromosomes during evolution. Core duplicons, similar to LCR16a, have recently been identified for other chromosomes with an overabundance of intrachromosomal duplications (31, 32). It is possible that this characteristic may represent a general property of the human/great ape genome.
There are at least two possible explanations for our observations. First, the LCR16a sequence may have evolved mechanistically as a preferred template for gene conversion events to new locations in the human genome. In this model, LCR16a would serve as a source for the directional repair of double-strand breaks in the genome, perhaps similar to yeast mating-type switching (33). The Alu-repeat richness of the LCR16a cassette would provide the homology to promote single-strand annealing and/or SDSA. These findings might explain the coordinated deletion of preintegration sites, the enrichment of Alu repeats at breakpoints, and the finding that sequences flanking LCR16a become duplicated. If the LCR16a sequence carries an inherent enhancer of gene conversion, it is unclear how the process could be so processive (hundreds of kilobases) or why it, as opposed to other Alu-rich repeat regions of the genome, is the preferred source.
An alternative explanation for the apparent strong association of new duplications with LCR16a may be as an indirect consequence of intense selection. We have shown previously that the gene family encoded by LCR16a shows among the strongest signatures of positive selection among human and African ape genes (8). It is possible that the complex pattern of duplication is simply a consequence of the pressure to produce more divergent copies of LCR16a at distinct locations. We do not favor this model completely, because positive selection of the morpheus gene family occurred only among humans and African apes (Ka/Ks = 10–13 for exon 2 when compared with the OWM outgroup). Our data indicate that complex duplicated blocks have emerged completely independently in the orangutan lineage, where there is no strong evidence of positive selection (Ka/Ks ≈ 1.0). Moreover, we have identified more ancient copies of LCR16a on chromosomes 10 and 11, and the X chromosome, suggesting that this piece of genetic material was inherently unstable and duplicating before positive selection. We therefore favor a duplication-driven model of DNA transposition. This dynamic model for genomic duplication helps to explain the nonrandom spatial–temporal distribution of segmental duplications in human and great apes.
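The argument above rests on the conventional reading of the Ka/Ks ratio: values well above 1 suggest positive selection, values near 1 are consistent with neutral evolution, and values well below 1 suggest purifying selection. The sketch below encodes that reading only as an illustration. The function name and the `tolerance` band around 1 are arbitrary assumptions, and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
def selection_regime(ka_ks: float, tolerance: float = 0.1) -> str:
    """Give the conventional interpretation of a Ka/Ks ratio.

    `tolerance` is an arbitrary illustrative band around 1.0,
    not a statistically derived threshold.
    """
    if ka_ks < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if ka_ks > 1 + tolerance:
        return "positive selection"
    if ka_ks < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


# Values quoted in the text for the morpheus gene family.
print(selection_regime(13.0))  # human and African ape comparison
print(selection_regime(1.0))   # orangutan lineage
```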
Methods
BAC Sequencing. BACs were subjected to shotgun sequencing at the National Institutes of Health Intramural Sequencing Center (34) and the Baylor College of Medicine Human Genome Sequencing Center (35) to at least 6-fold sequence redundancy. A subset of clones (n = 25) corresponding to potential new insertions was selected for ordered and oriented sequence assembly.
Sequence Annotation. Nonhuman primate BAC sequence was compared with human genome sequence by using Miropeats (36), two_way_mirror (J. Bailey, personal communication), and ALIGN (37), using parameters optimized for global alignment of primate sequences (22). The best map location was defined as one where human and nonhuman primate sequences align within nonduplicated flanking unique sequences. If the entire BAC was duplicated, the most significant correspondence by BLAST sequence homology was used, and the location was classified as "ambiguous." We examined the extent of recent duplication (>94%) for each clone using the whole-genome shotgun sequence-detection strategy for human (2), chimpanzee (38), and orangutan (E.E.E., unpublished data). FISH hybridization was used to assess duplication/unique status in gorilla (M.V., unpublished data). For simplicity, human chromosome designations are used for nonhuman map descriptions (39).
PCR Breakpoint Analysis. A subset of breakpoints associated with lineage-specific insertions was validated by designing PCR assays across the breakpoint junctions (Table 12) and amplifying genomic DNA from a panel of primate lymphoblast-derived DNA samples. The dense repeat content of many of the breakpoints precluded the design of assays across all insertion breakpoints.
Phylogenetic Analysis. We extracted overlapping sequences corresponding to each of the human segmental duplications from nonhuman primate sequences and generated multiple sequence alignments using ClustalW (40) and corresponding neighbor-joining phylogenetic trees (MEGA). We considered only noncoding sequences by processing the multiple sequence alignments for corresponding cDNA using MAM software. We used Kimura's two-parameter method (41) for all estimates of genetic distance.
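For reference, Kimura's two-parameter distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P is the proportion of sites that differ by a transition and Q the proportion that differ by a transversion. The following is a minimal sketch of that calculation, not the software used in the study. The function name and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two pre-aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguous bases are ignored.
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # transition proportion P
    q = transversions / compared  # transversion proportion Q
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the kind of input from which a neighbor-joining tree is built.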
Footnotes
Abbreviations: OWM, Old-World monkey.

To whom correspondence should be addressed. E-mail: eee{at}gs.washington.edu
Author contributions: M.E.J. and E.E.E. designed research; M.E.J., N.C.S.P., V.A.M., S.S., M.V., R.A.G., and E.D.G. performed research; M.E.J., Z.C., and E.E.E. analyzed data; and M.E.J. and E.E.E. wrote the paper.
National Institutes of Health Intramural Sequencing Center (NISC) Comparative Sequencing Program: Leadership provided by Robert W. Blakesley, Gerard G. Bouffard, Nancy F. Hansen, Maishali Maskeri, Pamela J. Thomas, and Jennifer C. McDowell.
The authors declare no conflict of interest.
This article is a PNAS direct submission.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AC092562, AC07264–AC07266, AC097268, AC097270–AC097271, AC097326–AC097328, AC097330–AC097334, AC117273, AC118583, AC119422, AC144359–AC144362, AC144462, AC144590, AC144875–AC144877, AC144879–AC144881, AC145000, AC145025, AC145040, AC145177, AC145239–AC145240, AC145242–AC145243, AC145295, AC145353–AC145354, AC145356, AC145400–AC145403, AC146492, AC146844, AC146898, AC146952, AC147576, AC148534, AC148537–AC148538, AC148619, AC148838–AC148839, AC148841, AC148882, AC149436, AC150449, AC153733, and AC154112).
© 2006 by The National Academy of Sciences of the USA