Repair of DNA double-strand breaks by templated nucleotide sequence insertions derived from distant regions of the genome

Edited by Michael A. Resnick, National Institute of Environmental Health Sciences, Research Triangle Park, NC, and accepted by the Editorial Board April 18, 2014 (received for review November 25, 2013)
May 12, 2014
111 (21) 7729-7734

Significance

We show that DNA double-strand breaks (DSBs) can be repaired by insertion of 50- to 1,000-bp sequences termed “templated-sequence insertions” (TSIs) derived from distant regions of the genome. Additional experiments indicate that the source of template for repair was primarily nuclear RNA. This mode of DNA-DSB repair by insertion is not restricted to experimentally produced breaks but also occurs at the site of spontaneous DNA DSBs in human cells. These TSIs are polymorphic in the human genome, suggesting that some TSIs occur in germ cells or embryos. Recognition of these TSIs is important in interpreting structural variations in short-read sequencing studies and provides additional polymorphic markers for population and evolution studies. This error-prone form of DNA repair may play a role in genetic diseases.

Abstract

We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed “templated sequence insertions” (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.
Maintenance of chromosomal integrity is critical to the survival of all organisms; thus the repair of DNA double-strand breaks (DSBs) has been the subject of intense study (1, 2). The principal forms of DNA-DSB repair in mammals are nonhomologous end joining (NHEJ), which can be divided into canonical (cNHEJ) and alternative NHEJ (aNHEJ) (3), and homologous recombination (HR). HR leads to an error-free repair of the DNA DSB but requires a sister chromatid to serve as the template for repair (4). NHEJ does not require a sister chromatid for repair but is error-prone and thus is inherently mutagenic (1).
Many cancers are associated with acquired gross chromosomal rearrangements (GCRs), including translocations, inversions, and deletions (5, 6). One proposed mechanism underlying these events is mistakes in NHEJ repair (5, 7). To study GCRs in mammalian cells, the I-SceI endonuclease, which has an 18-bp recognition sequence not present in mice or humans, has been used to produce a single, specific DNA DSB within chromosomal DNA (8–11). Although some investigators were unable to recover GCRs by producing a single I-SceI–induced DNA DSB in mammalian cells (10–12), several recent studies (8, 9, 13) have reported production of thousands of chromosomal translocations using I-SceI cleavage followed by anchored PCR and deep sequencing.
Several studies (11, 12, 14) have documented that transfected plasmid DNA can be captured and used as a patch at the site of I-SceI–induced DNA DSBs. These patches often show signs of aNHEJ, such as microdeletion, microhomology, and nontemplated nucleotide addition. Moreover, mitochondrial DNA fragments have been identified at the site of homothallic switching (HO) endonuclease-induced breaks (15), further confirming the observation that DNA DSBs can be repaired by insertion of DNA sequences.
RNA provides a template for DNA synthesis during reverse transcription of retroviruses and retrotransposons as well as during telomere elongation (16). In Saccharomyces cerevisiae, DNA DSBs can be repaired by the insertion of endogenous Ty1 retrotransposons (15, 17), and it has been suggested that endogenous retrotransposons, such as long interspersed element-1 (LINE-1), may have a role in DNA-DSB repair in mammalian cells. When supplied by a plasmid vector, the human LINE-1 ORF2 can mediate repair of HO endonuclease-induced DNA DSBs in S. cerevisiae via insertion of retrotransposon- or retrotransposon 3′-transduced cDNA sequences (18). In these experiments, RNA served indirectly as a template for DNA-DSB repair through a cDNA intermediate. More recently, synthetic RNA oligonucleotides have been shown to serve as a template for DNA synthesis during repair of HO endonuclease-induced DNA DSBs in S. cerevisiae, albeit at a very low efficiency compared with DNA oligonucleotides (19). Although there is no direct experimental evidence that mRNA or precursor mRNA (pre-mRNA) can serve as a repair template for DNA DSBs in mammalian cells (2), a role for LINE-1 retrotransposons in DNA-DSB repair has been predicted (20, 21). This prediction was based on the observation that new integration sites for an endonuclease-incompetent LINE-1 retrotransposon could be found in cultured rodent cells (21). In those experiments, the integration sites lacked the typical hallmarks of integration induced by the LINE-1 integrase, such as target-site duplications (TSDs) and polyA tracts, leading to the prediction that LINE-1 sequences had become integrated during repair of a spontaneous DNA DSB.

Results

I-SceI–Induced DNA DSBs Can Be Repaired by Insertion of Sequences Derived from Distant Regions of the Genome.

To study the repair of DNA DSBs in vivo, we previously generated a vector (EF1aTK) containing the EF1a promoter driving expression of the herpes simplex thymidine kinase (HsTK); inserted between the EF1a promoter and HsTK cDNA was the recognition site for the rare-cutting meganuclease I-SceI (Fig. S1A). This vector was electroporated into the human monocytic leukemia cell line U937, and a clone (designated “F5”) that had integrated a single copy of the EF1aTK vector was isolated (Fig. 1A) (11). We attempted to induce reciprocal chromosomal translocations by introducing an I-SceI expression vector and using ganciclovir (GCV) to select clones that had lost expression of HsTK. The majority of the GCV-resistant clones identified had short deletions encompassing the HsTK start codon; however, we identified rare clones that had undergone an insertion of 47–756 bp that was not derived from nearby genomic sequences but instead was derived from a distant genomic region (11).
Fig. 1.
Repair of DNA DSBs by insertion of sequences derived from distant regions of the genome. (A, Upper Left) Outline of reporter system. The EF1a promoter (open box), I-SceI recognition sequence, HsTK cDNA (cross-hatched box), and G418R cassette (vertically striped box) are indicated. (Upper Right) Genomic DNA was PCR amplified using primers flanking the I-SceI site. PCR products were subcloned and transformed into bacteria, colonies were isolated, and inserts were PCR amplified. Alleles with an intact I-SceI site were identified by digestion with I-SceI, which cleaves the wild-type fragment into two comigrating fragments of ∼300 bp (arrowhead). To reduce background from short indels, we excised the portion of the gel indicated, ligated the population of fragments into plasmids, transformed bacteria, and isolated colonies, which were PCR amplified with I-SceI–flanking primers. (Lower) PCR products larger than wild-type were sequenced. (B) The size of insertion events recovered from the F5 and A15 cell lines varied from 73–414 bp (median, 191 bp). (C) More than half of the inserted sequences were derived from transcribed genic regions. (D) Approximately one fifth of the insertions were derived from LINE or SINE sequences.
To gain a greater understanding of these insertions, we transfected the F5 clone with an I-SceI expression vector, harvested genomic DNA, and amplified the region flanking the I-SceI site. The PCR products then were subcloned into a plasmid vector, and individual colonies were isolated (Fig. 1A). We analyzed 120 samples; 10 (8.3%) were wild-type with an intact I-SceI site, 101 (84.2%) showed small (<50 bp) insertions or deletions (collectively designated “short indels”), 7 (5.8%) showed large (>50 bp) deletions, and 2 (1.7%) showed large (>50 bp) insertions.
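The proportions quoted above follow directly from the reported counts. The short sketch below recomputes them; the dictionary keys and the function name `percentages` are illustrative labels chosen here, not terms from the study.

```python
# Repair outcomes among the 120 sequenced subclones, as reported in the text.
counts = {
    "wild-type (intact I-SceI site)": 10,
    "short indel (<50 bp)": 101,
    "large deletion (>50 bp)": 7,
    "large insertion (>50 bp)": 2,
}


def percentages(outcome_counts):
    """Return each outcome as a percentage of all samples, to one decimal."""
    total = sum(outcome_counts.values())
    return {name: round(100 * n / total, 1) for name, n in outcome_counts.items()}


pct = percentages(counts)
for name, value in pct.items():
    print(f"{name}: {value}%")
```

The four categories sum to the 120 samples analyzed, so every sequenced subclone is assigned to exactly one outcome.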
Because insertion events were uncommon compared with short indels, we modified our experimental procedure by isolating a population of PCR products 50- to 1,500-bp larger than the wild-type EF1aTK allele to enrich for samples with insertions (SI Materials and Methods and Fig. 1A). With this modification, the frequency of PCR products containing insertions was enriched substantially (from 1.8 to 7–18%), although many small indels (78–86%) and large deletions (2–7%) still were recovered. We identified 32 insertions derived from distant genomic regions in the F5 cells. To determine whether this phenomenon was unique to the F5 clone, we repeated the experiment with a clone (designated “A15”) that had integrated a single copy of the EF1aTK vector into the ovarian cancer cell line OVCAR8 and again identified insertions derived from distant genomic regions. In all, 32 insertions from the F5 clone and 34 from the A15 clone were derived from distant regions of the genome (Dataset S1A). To distinguish these insertions from random, nontemplated nucleotide additions, we termed these insertions “templated-sequence insertions” (TSIs). One TSI from F5 cells and two TSIs from A15 cells were generated by the insertion of two unrelated fragments from distinct genomic origins. The lengths and distributions of the TSIs were similar in the two cell lines (P = 0.442, Student’s t test) (Fig. 1B, Fig. S1B, and Dataset S1A). The TSI origins were mapped to 18 of 24 human chromosomes without any obvious preference for particular chromosomes or chromosomal regions (Fig. S1B). More than half of the TSIs were derived from genic sequences (Fig. 1C), and approximately one fifth of the TSIs were templated from LINE-1 or short interspersed element (SINE) sequences (Fig. 1D). Junctions often showed features of aNHEJ, such as microhomology and nontemplated nucleotide addition (Fig. S1C). 
These findings demonstrate that the use of TSIs to patch I-SceI–induced DNA DSBs is a reproducible form of DNA-DSB repair.

TSIs Are Not Unique to I-SceI–Induced Cleavage.

Transcription activator-like effector nucleases (TALENs) are broadly applicable genome-editing tools (22) that produce site-specific DNA DSBs through the action of the FokI endonuclease (Fig. S1D). To determine whether repair of DNA DSBs by TSI is unique to I-SceI–induced DNA DSBs, we used TALENs to create site-specific DNA DSBs within the breast cancer 1 (BRCA1) and phosphatase and tensin homolog (PTEN) loci. Plasmids encoding TALENs for BRCA1 or PTEN were transfected into 293T or F5 cells, respectively. Genomic DNA was harvested and amplified with primers that flanked the TALEN cleavage site (Fig. S1D). Similar to the approach used in Fig. 1A, a region of the gel 50- to 1,500-bp larger than the wild-type, uncleaved fragment was excised and subcloned. Individual plasmids were sequenced, and, as shown in Fig. S1E, TALEN-induced DNA DSBs also were repaired by TSIs.
In addition to TSIs identified at experimentally induced DNA DSBs, there are rare, anecdotal reports consistent with insertions at the site of a physiologic DNA DSB, such as the IGH switch region in human B-lineage tumors (23–25); these findings are summarized in Dataset S2A. These observations support the hypothesis that the repair of DNA DSBs by TSIs is a generalizable phenomenon not restricted to repair of experimental DNA DSBs produced by endonucleases.

TSIs Used for DNA-DSB Repair Were Not Excised from the Genome.

To determine whether the TSI fragment had been excised from the genome and used to patch a DNA DSB, we evaluated four pure daughter clones (derived from the F5 or A15 cell lines) which contained experimentally induced TSIs. We generated primers that flanked the TSI donor sequence and amplified genomic DNA from the specific clone to determine if the donor sequence had been deleted. All four daughter clones showed a single, specific PCR product identical in size to the parental cell line, suggesting that the TSI donor region had not been excised from the genome (Fig. S2A). It remained possible that a TSI donor allele had deleted more than 2 kb (a portion of which was inserted at the DNA-DSB site) and that the PCR product was amplified from the remaining intact allele. However, we identified an SNP within the 2-kb amplicon for clone 5T121; the nucleotide sequence of this SNP demonstrated that both alleles were retained (Fig. S2B). In addition, in all four examples, quantitative PCR (qPCR) showed a copy number gain for the templated sequence, supporting the hypothesis that TSI donor regions were not excised from the genome (Fig. S2C).

TSIs Used as Patches for DNA-DSB Repair Can Be Derived from RNA.

We hypothesized that the TSIs used to repair the DNA DSBs were patches produced by reverse transcription of a nuclear RNA template. To test the hypothesis that RNA could be used as a template for DNA-DSB repair, we cotransfected human F5 cells with an I-SceI expression vector and murine RNA, which had been treated with DNase I to eliminate any contaminating genomic DNA (Fig. S3 A and B), and isolated insertion events as described in Fig. 1A. We identified 102 clones with insertions; 92 clones had an insertion derived from a single donor site, nine clones had insertions derived from two donor sites, and one clone had an insertion derived from three distinct donor sites, leading to a total of 113 unique TSI events at the DNA-DSB site (Dataset S1B). Nine insertions matched murine sequences, suggesting that mRNA or pre-mRNA can be the underlying template for TSIs (Fig. 2). Seven of the nine mouse TSIs were derived from genic sequences, but two were derived from intergenic regions and would not have been expected to be present in total RNA. However, RT-PCR experiments demonstrated that these sequences represented nonannotated transcribed regions (Fig. S3 C and D). Unexpectedly, none of the insertions derived from murine RNA contained ribosomal RNA sequences, perhaps suggesting some specificity in the selection of the RNA template. Most (104/113) of the TSIs from the murine RNA cotransfection experiments were derived from endogenous human sequences; the size of the TSIs of human origin was 44–424 bp (Fig. S3E), similar in size and distribution to the TSIs identified in our initial experiments. The inserted sequences originated from 23 of the 24 human chromosomes with no clear preference for any chromosome (Fig. S3F). These results stand in contrast to insertions generated by replication fork stalling and strand switching, which preferentially insert nearby sequences derived from the same chromosome (26, 27). 
Of note, four insertions displayed simple repeat sequences, including three samples with mammalian telomere repeat (TTAGGG)n sequences, suggesting that telomerase RNA also can be used as a template to patch a DNA DSB (Fig. S3G).
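The event totals in the murine RNA cotransfection experiment can be checked with simple bookkeeping. The sketch below uses only the counts stated above; the variable names are illustrative.

```python
# Number of clones, keyed by how many distinct donor sites each insertion had.
clones_by_donor_sites = {1: 92, 2: 9, 3: 1}

total_clones = sum(clones_by_donor_sites.values())
# Each donor site within a clone counts as one TSI event.
total_events = sum(sites * n for sites, n in clones_by_donor_sites.items())

murine_events = 9  # insertions matching murine sequences
human_events = total_events - murine_events

print(total_clones, total_events, human_events)
```

The result matches the 102 clones, 113 TSI events, and 104 human-derived TSIs reported in the text.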
Fig. 2.
Mouse RNA cotransfection. Nine mouse sequence-insertion events were identified. Sample F5R100L1-7 showed three tandem TSIs consisting of two human fragments and one murine fragment.

Reverse-Transcriptase Inhibitors Can Suppress TSIs.

LINEs are transposable elements that transpose not only LINE-1 RNA but also other elements such as SINEs (28). In addition, the reverse-transcriptase and endonuclease activity of the LINE-1 ORF2 has been proposed as a mechanism for the insertion of processed pseudogenes into the mammalian genome (29). To determine whether suppression of endogenous reverse-transcriptase enzymes would reduce the frequency of DNA-DSB repair by TSI, we transfected the F5 clone with an episomal I-SceI expression vector and selected cells that had been successfully transfected with hygromycin. The culture was divided into two aliquots. One was treated with AZT and 2′, 3′-dideoxyinosine (ddI), which have been shown to inhibit LINE-1 and other endogenous reverse-transcriptase enzymes (30, 31). The second aliquot was treated with vehicle alone. The concentrations of AZT and ddI used (3.7 and 4.2 μM, respectively) were below the AZT Ki for POLA, POLB, POLG, and POLE (140, 290, 8.7, and 400 μM, respectively) (32) and were below the IC50 for muscle cell proliferation (100 μM for AZT and 500 μM for ddI) (33). Genomic DNA was harvested and amplified using primers that flank the I-SceI cleavage site. The region of the gel containing fragments 50- to 1,500-bp larger than the wild-type (uncleaved) fragment was excised, and the PCR products were subcloned. In each of four independent experiments, reverse-transcriptase inhibitor (RTI) treatment significantly reduced the frequency of TSIs after in vivo I-SceI cleavage (Fig. S4 A and B), supporting the hypothesis that the TSIs used to repair DNA DSBs are derived primarily by reverse transcription of RNA.

TSIs Can Be Identified at the Sites of Spontaneous DNA DSBs.

To determine whether TSIs derived from distant genomic regions were limited to repair of experimentally induced DNA DSBs or instead represented a recurrent mechanism for repair of a spontaneous DNA DSB, we studied whole-genome sequence data from two human myeloma cell lines, KP6 (34) and MC1286PE1. We reasoned that repair of a spontaneous DNA DSB by a TSI would produce pairs of structural variation (SV) in the whole-genome sequence, and therefore we examined SVs in these two cell lines. From a total of 3,393 interchromosomal SVs in the KP6 cell line and 3,145 SVs in the MC1286PE1 cell line, we identified pairs of reciprocal fusion sequences (compared with the reference genome, GRCh37/hg19) that could represent interchromosomal TSIs by applying the following criteria: The two fusion junctions had to be located within 50 kb of one another; the strand polarity had to align so that an insertion was feasible; and the sequences had to map to a single unique region of the genome. A detailed analysis of one of these SVs is shown in Fig. 3A. As outlined, although this pair of SVs could have been caused by a balanced translocation, the translocation would produce one dicentric chromosome and one acentric chromosome. We hypothesized that this pair of SVs instead was caused by an insertion of chromosome-4 sequences into chromosome 7, and we generated primers to test this hypothesis. As shown in Fig. 3 B and C, we were able to amplify this hypothetical TSI and verify that it indeed was produced by the insertion of chromosome-4 sequences into chromosome 7. Using this strategy, we verified 17 of 18 TSIs predicted from KP6 sequence data and 13 of 15 TSIs predicted from the MC1286PE1 sequence data (Fig. S5 and Dataset S2B).
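The three selection criteria can be expressed as a pairwise filter over SV junctions. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the `Junction` record, its field names, and the way strand polarity is modeled (one junction with acceptor sequence on the left, one with it on the right, and a consistent donor strand) are simplifications introduced here. The 50-kb window is applied to both acceptor and donor coordinates, which is one reading of the criterion. The chromosome-7 coordinates in the example are hypothetical; the chromosome-4 coordinates are taken from the Fig. 3 legend.

```python
from dataclasses import dataclass
from itertools import combinations

MAX_DISTANCE = 50_000  # 50-kb window stated in the text


@dataclass(frozen=True)
class Junction:
    """One interchromosomal fusion junction (hypothetical, simplified record)."""
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_side: str   # "left" if acceptor sequence precedes the junction, else "right"
    donor_chrom: str
    donor_pos: int
    donor_strand: str    # "+" or "-"
    uniquely_mapped: bool


def is_candidate_tsi(a, b):
    """Return True if two junctions could be the two ends of one insertion."""
    # Both fused sequences must map to a single unique region of the genome.
    if not (a.uniquely_mapped and b.uniquely_mapped):
        return False
    # Same acceptor and donor chromosomes, and the event must be interchromosomal.
    if a.acceptor_chrom != b.acceptor_chrom or a.donor_chrom != b.donor_chrom:
        return False
    if a.acceptor_chrom == a.donor_chrom:
        return False
    # Junctions must lie within 50 kb of one another (assumed to apply to both loci).
    if abs(a.acceptor_pos - b.acceptor_pos) > MAX_DISTANCE:
        return False
    if abs(a.donor_pos - b.donor_pos) > MAX_DISTANCE:
        return False
    # Polarity: one left-hand and one right-hand junction, same donor strand.
    sides_ok = {a.acceptor_side, b.acceptor_side} == {"left", "right"}
    return sides_ok and a.donor_strand == b.donor_strand


def candidate_pairs(junctions):
    """Return all junction pairs that satisfy the insertion criteria."""
    return [(a, b) for a, b in combinations(junctions, 2) if is_candidate_tsi(a, b)]


# Example: chr7 positions are made up for illustration.
junctions = [
    Junction("chr7", 1_000_000, "left", "chr4", 95_518_558, "+", True),
    Junction("chr7", 1_000_010, "right", "chr4", 95_518_746, "+", True),
    Junction("chr7", 11_000_000, "right", "chr4", 95_518_746, "+", True),
]
pairs = candidate_pairs(junctions)
print(len(pairs))
```

Pairs that pass such a filter are only candidates; as described above, each predicted insertion was then tested by PCR across the putative insertion site.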
Fig. 3.
Identification of insertions from whole-genome sequence data. (A) SV data showed chromosome-4 sequences fused to chromosome 7 and reciprocal chromosome-7 sequences fused to chromosome 4. One SV has 23 “T”s inserted at the fusion point. Sequence fragments would be consistent with either a balanced translocation or an insertion of a chromosome-4 sequence into chromosome 7. (1) A putative balanced translocation consistent with the fusion data; note that this translocation would duplicate 188 bp of the chromosome-4 sequence (chr4:95518558–746) and would create one dicentric and one acentric chromosome. (2) Potential insertion of chromosome-4 sequences into chromosome 7; PCR primers used to amplify the insertion are indicated. (B) A larger PCR product is present in both the KP6 and MC1286PE1 cell lines; 293T cells have only the smaller PCR product representing the reference chromosome-7 sequence. (C) Nucleotide sequence of the insertion. Chromosome-7 sequences, target-site duplication (TSD), polyA tail (negative strand), polyadenylation signal, and chromosome-4 insertion are indicated.

Most TSIs Are Germ-Line Events and Represent Common Polymorphisms in the Human Genome.

Surprisingly, eight insertions were identical or nearly identical in the two cell lines (Dataset S2B), suggesting that these insertions represent polymorphisms in the human genome. To determine whether these TSIs were likely to be germ-line or somatic events, we compared the insertion junctions with a catalog of known SVs identified by the whole-genome sequence of 52 healthy individuals (SI Materials and Methods). SVs for 20 of the 23 unique TSIs were identified in this dataset, suggesting that most of the TSIs identified in the two myeloma cell lines represented germ-line polymorphisms rather than somatic mutations (Dataset S3A). To confirm the findings from the whole-genome SV dataset, we used PCR to amplify the TSIs from a set of widely studied human cell lines (293T, U937, HL60, K562, and OVCAR8). Nineteen of the 23 TSIs were PCR amplified from at least two independent cell lines, clearly demonstrating that these TSIs are polymorphic in human genomes (Dataset S3A); these polymorphic insertions are referred to as “templated sequence insertion polymorphisms” (TSIPs). Two TSIs were not found in the 52 individual databases or in cell lines but were found only in the KP6 cell line. These TSIs may represent somatic events, especially an 85-bp insertion of chromosome-1 sequence embedded at the junction of an acquired t(6;12) chromosomal translocation. Similar to our findings with the F5 and A15 daughter clones (Fig. S2), the TSI donor sequences from these two potential somatic TSIs were not deleted from the genome, and the copy number of the TSI donor sequence was increased (Fig. S6 A and B), as is consistent with the possibility of a somatic TSI. In addition, both TSI donor sequences were expressed in the KP6 cell line (Fig. S6C).

TSIs Can Be Placed into Two Classes.

Features of the 23 unique TSIs identified in the two myeloma cell lines are shown in Dataset S3B. The TSIs can be placed into two classes (class 1 and class 2) based on nucleotide sequences at or near the insertion. Class 1 TSIs displayed a TSD of at least 5 bp and the addition of nontemplated adenines (a polyA “tail”) at one insertion junction. These class 1 TSIs typically inserted at a preferred LINE-1 retrotransposon integration site (consensus sequence 5′-TTTT/A-3′) and contained a polyadenylation signal (5′-AATAAA-3′) located 10–20 nucleotides upstream of the polyA tract (Datasets S2B and S3B). Class 2 TSIs had none of these features but instead displayed aNHEJ features such as microdeletion, microhomology, and nontemplated nucleotide addition at the insertion sequence junction (Datasets S2B and S3B).
The TSI origin and acceptor loci are shown as a Circos plot in Fig. 4A. Twelve of the TSIs represent class 1 events, eight represent class 2 TSIs, and the remaining three are ambiguous, because they lack either a TSD or a polyA addition. Class 2 TSIs were shorter (85–305 bp; median, 164 bp) than class 1 TSIs (160–3,771 bp; median, 464 bp) (P = 0.024, Student’s t test) (Fig. 4B) and were similar in size to our experimentally produced TSIs (Fig. 4B). Genic regions were overrepresented compared with genomic composition and were preferentially used as templates (Fig. 4C). We determined the frequency of the TSIPs identified in the two myeloma cell lines by examining SV data from the whole-genome sequences of 52 normal individuals (SI Materials and Methods). The prevalence of these TSIPs in the general population varied widely, from 0 to 98%, suggesting that both cell lines carried a unique combination of ancient, ancestral TSIPs and more recent, rare TSIPs (Fig. 4D).
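The class assignment described above reduces to a rule on two junction features. The sketch below is a simplified illustration: the function name `classify_tsi` and its arguments are hypothetical, and it considers only TSD length and polyA presence, leaving out the integration-site consensus and polyadenylation signal that typically accompany class 1 events.

```python
MIN_TSD_BP = 5  # class 1 TSIs displayed a TSD of at least 5 bp


def classify_tsi(tsd_length_bp, has_polya_tail):
    """Assign a TSI to class 1, class 2, or ambiguous from two junction features."""
    has_tsd = tsd_length_bp >= MIN_TSD_BP
    if has_tsd and has_polya_tail:
        return "class 1"      # retrotransposon-like hallmarks
    if not has_tsd and not has_polya_tail:
        return "class 2"      # neither hallmark; aNHEJ-like junctions
    return "ambiguous"        # lacks either the TSD or the polyA addition


# Illustrative inputs only; these are not records from Dataset S3B.
print(classify_tsi(12, True))
print(classify_tsi(0, False))
print(classify_tsi(8, False))
```

Under this rule, an insertion carrying only one of the two hallmarks falls into the ambiguous group, which corresponds to the three unclassified TSIs mentioned above.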
Fig. 4.
TSIs found in two myeloma cell lines. (A) Circos plot indicating TSIs identified in the two myeloma cell lines [KP6 and MC1286PE1; n = 22, excluding the chromosome-1 fragment inserted into t(6;12)]. (B) The length of insertions varies from 85–3,771 bp (median, 305 bp). (C) Insertions are biased toward genic regions (χ2 goodness-of-fit: P = 0.0013). (D) Frequency of the TSI events identified in the two myeloma cell lines (17 for KP6 and 13 for MC1286PE1) in the genomes of 52 normal volunteers. Note that three insertions had a frequency of close to 100%, suggesting that the reference genome (GRCh37/hg19) may represent a rare variant that lacks the TSIP at these three insertion sites.

Discussion

We previously described moderate-sized (50–1,000 bp) insertions of sequences at an I-SceI–induced DNA DSB that were derived from distant regions of the genome (11). We designated these insertions “TSIs” to distinguish them from the well-recognized nontemplated nucleotide addition that often accompanies NHEJ. Although the most common form of repair for experimentally (I-SceI) induced DNA DSBs was via small indels, a substantial (∼2%) fraction of repair events were made by TSIs. This phenomenon was not limited to I-SceI–induced DNA DSBs, because repair of TALEN-induced DNA DSBs also could be accomplished via TSI patches. Finally, we identified TSIs in the genomes of myeloma cell lines. Further investigation revealed that the majority of these TSIs in the human genome were TSIPs.
Reverse-transcribed retrotransposon sequences have been shown to patch DNA DSBs in yeast (17, 18), and a role for retrotransposon RNA in DNA-DSB repair has been predicted in mammalian cells (20). Furthermore, short RNA oligonucleotides have been shown to serve as templates for DNA-DSB repair in yeast (19), and cellular mRNA was shown to mediate recombination in yeast (35). These findings led us to suspect that the experimentally produced TSIs were derived primarily from nuclear mRNA or pre-mRNA. The observations that (i) DNA DSBs in a human cell line can be repaired by murine sequences when murine RNA is cotransfected, (ii) the genomic regions corresponding to the insertions are intact, and (iii) repair by insertion is inhibited by RTIs all support the contention that these experimentally produced TSIs were derived from RNA, as opposed to DNA, templates. Interestingly, small RNA species, although derived from local sequences near the DNA-DSB site rather than from more distant genomic regions, have been shown to be important for DNA-DSB repair (36). An intriguing possibility is that the RNA source for the DNA-DSB patch is not necessarily derived from a distant genomic region but instead is derived from a gene that is cotranscribed with the EF1a-TK cassette, in a shared transcription factory (37). In any event, our detection of insertions at the site of experimentally induced DNA-DSB repair supports the prediction, based on the detection of endonuclease-deficient LINE-1 insertions (20), that reverse transcription of RNA transcripts can be used to repair spontaneous DNA DSBs. These findings are distinct from prior reports of retrotransposon-mediated DNA-DSB repair in S. cerevisiae (17), in that the majority of experimentally produced TSIs in our study consist of genic regions rather than retrotransposon sequences. 
Similar to retrotransposon insertions (38), this mechanism of DNA repair is mutagenic, because insertions that disrupt exons could compromise coding sequences, and the insertion of exonic regions could lead to the incorporation of alternate exons into mRNA.
One insertion from the A15 cell line and three from the F5 cells consisted of (TTAGGG)n repeats, identical to mammalian telomere repeat sequence (Fig. S3G and Dataset S1). Interstitial telomeric sequences (ITSs) have been reported in several species, including humans (39), and it has been speculated that these insertions result from DNA-DSB repair (40). However, ITS insertion at the site of a DNA DSB was not detected in a prior study that screened >300,000 repair events looking specifically for ITS insertions (41). To our knowledge, this is the first direct experimental evidence documenting that telomere sequences can be used to repair DNA DSBs in mammals.
A large number of polymorphisms in human genomes have been identified and classified as SNPs, small (<50 bp) insertions or deletions (referred to collectively as short indels), and large (>50 bp) deletions. Analysis of the genomes of 1,092 individuals from 14 populations (the 1,000 Genomes Project) revealed 38 million SNPs, 1.4 million short indel polymorphisms, and 14,000 larger-deletion polymorphisms (42). In addition, a large number of polymorphic LINE-1 or Alu elements have been cataloged (43), and the presence of retroposed, processed gene transcript polymorphisms similar to processed pseudogenes has been predicted by analysis of SVs in the 1,000 Genomes Project (44). However, the methods used to identify LINE-1 polymorphic insertions were designed to find LINE-1 insertion events (43), whereas the methods used to identify polymorphic pseudogene insertions searched specifically for insertions of known exonic sequences (44). Our method of identifying TSIPs is not biased toward a specific type of insertion sequence; in fact, most of the TSIP insertions we identified were derived from neither LINE-1 nor exonic sequences. Of note, as opposed to SNPs, which may be the result of independent, recurrent events (i.e., a C→T transition), it is highly unlikely that these TSIPs are recurrent events; instead, they likely represent unique founder events.
TSIPs could be placed into two classes based on nucleotide features of the insertion event. Class 1 TSIPs have all the hallmarks of a retrotransposon-induced event. The cleavage typically is at a preferred 5′-TTTT/A-3′ LINE-1 insertion site; TSDs (5–20 bp) are present, as is a polyA tract; and a polyadenylation signal is located within 20 bp of the polyA tract. The most obvious explanation for these observations is that nuclear RNA transcripts were acted upon by the LINE-1 ORF2 and inserted, through the action of LINE-1 endonuclease and reverse transcriptase, into a distant genomic locus. The RNA transcripts that produced class 1 TSIPs include transcripts known to be subject to LINE-1–mediated insertion events, including LINE-1/SINE insertions (n = 2) and processed pseudogene-type insertions (n = 5). Surprisingly, almost half (5/12) of the class 1 TSIPs were derived from intronic or intergenic sequences that had become polyadenylated, most likely by a cryptic polyadenylation signal (Datasets S2B and S3B). Thus, LINE-1–mediated mobilization of non–LINE-1 transcripts in a germ cell or embryo may occur more often than previously recognized.
In contrast, the class 2 TSIPs represent a previously unknown form of insertion polymorphism that does not show TSDs, polyA tracts, or a preferred integration site (Dataset S3B). Instead, class 2 TSIPs show signs typically associated with aNHEJ, such as short deletions at the insertion site, short-tract microhomology, and the addition of nontemplated nucleotides at the insertion site. We suspect that the class 2 TSIPs were produced by a DNA DSB induced by physiologic or environmental DNA damage, followed by microhomology-mediated annealing of mRNA, reverse transcription, and healing of the DNA DSB. This latter mechanism produces nucleotide features similar to the TSIs seen at I-SceI–induced DSBs (lack of polyA sequence and presence of microhomology).
One important limitation of current deep-sequence studies is the relatively short read lengths (50–500 bp, using the most popular current platforms), which then are assembled by comparison with a reference sequence. Reads that map to two different chromosomal regions typically are cataloged as SVs. The presence of two reciprocal SVs often is suspected to represent a balanced chromosomal translocation (45, 46) but in fact may be an insertion, as demonstrated in Fig. 3. Several recent papers (8, 9, 13) used I-SceI cleavage followed by anchored PCR to amplify sequences fused to the I-SceI cleavage site and deep sequencing to identify chromosomal translocations. However, those reports sequenced only small, short reads (the majority of reads containing <300 bp of fused sequence, well within the range of the TSI produced by I-SceI cleavage), making it difficult to rule out the possibility that the fusion sequences were generated by TSI repair rather than by chromosomal translocation. We believe that the interpretations of short-read deep-sequence data should consider the possibility of insertional events as alternatives to chromosomal translocations or other gross chromosomal rearrangements.
The findings reported here document that repair of DNA DSBs by insertion of sequences derived from distant genomic regions, which we term templated-sequence insertions (TSIs), is a common mode of repair for experimentally induced DNA DSBs. Moreover, we find evidence of numerous TSIs that are polymorphic in the human genome. Because these TSIs represent a mutagenic form of DNA-DSB repair, we suspect that they, like retrotransposon insertions, may play a role in both the etiology of genetic diseases and mammalian evolution.

Materials and Methods

Cell Lines and DNA Transfections.

The KP6 and MC1286PE1 cell lines have been described previously (34, 47). Data from the 52 normal volunteers were obtained by Complete Genomics and published on their publicly available web site (www.completegenomics.com/sequence-data/download-data/). Cell lines that contain a single copy of linearized pEF1aTK (11) (Fig. 1) integrated into the U937 human monocytic leukemia cell line (F5) (11) and the OVCAR-8 human ovary cancer cell line (A15) (12) have been described previously. An episomal I-SceI expression vector that contained a hygromycin-resistance cassette (pCEP4-I-SceI) (11) was transfected together with total murine RNA (20–120 μg) into F5 or A15 cells, and transfected cells were selected with hygromycin. Genomic DNA was isolated 14–21 d after transfection.

Detection of I-SceI Cleavage Repaired by Insertion.

Genomic DNA was PCR amplified using primers that flanked the I-SceI site and was size-fractionated on 1% agarose gels. A portion of the gel corresponding to fragments 50–1,500 bp larger than the wild-type PCR product, predicted to contain unique polyclonal fragments with insertions, was excised, cloned into plasmid vectors, and transformed into bacteria. Colonies with insertions at the I-SceI cleavage site were sequenced.
Detailed experimental procedures are shown in SI Materials and Methods.
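The size selection described above amounts to a simple window calculation. The sketch below is illustrative only: the 50- and 1,500-bp offsets come from the text, whereas the function names and the 400-bp wild-type product used in the usage note are hypothetical, because the actual amplicon size is not stated in this section.

```python
def excision_window(wild_type_bp: int,
                    min_extra_bp: int = 50,
                    max_extra_bp: int = 1500) -> tuple:
    """Return the (low, high) product sizes in bp covered by the excised gel region."""
    if wild_type_bp <= 0:
        raise ValueError("wild-type product size must be positive")
    return (wild_type_bp + min_extra_bp, wild_type_bp + max_extra_bp)


def in_window(product_bp: int, wild_type_bp: int) -> bool:
    """Return True if a PCR product falls inside the excised size range."""
    low, high = excision_window(wild_type_bp)
    return low <= product_bp <= high
```

For a hypothetical 400-bp wild-type product, `excision_window(400)` gives a window of 450–1,900 bp, so a 700-bp product would be recovered and a 420-bp product would not.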

Acknowledgments

We thank Drs. Paul Meltzer and Ilan Kirsch, and members of the P.D.A. laboratory for helpful discussions; Haruto Onozawa for technical assistance; and the National Institutes of Health (NIH) Mini-sequencing core for Sanger sequencing. This work was supported by the intramural program of the National Cancer Institute, NIH. M.O. was supported by the Japan Society for the Promotion of Science Postdoctoral Fellowships for Research Abroad program.

Supporting Information

Supporting Information (PDF)
Dataset_S02 (PDF)
pnas.1321889111.sd01.xlsx
pnas.1321889111.sd03.xlsx

References

1
MR Lieber, The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu Rev Biochem 79, 181–211 (2010).
2
A Malkova, JE Haber, Mutations arising during repair of chromosome breaks. Annu Rev Genet 46, 455–473 (2012).
3
L Deriano, DB Roth, Modernizing the nonhomologous end-joining repertoire: Alternative and classical NHEJ share the stage. Annu Rev Genet 47, 433–455 (2013).
4
M Sasaki, J Lange, S Keeney, Genome destabilization by homologous recombination in the germ line. Nat Rev Mol Cell Biol 11, 182–195 (2010).
5
A Nussenzweig, MC Nussenzweig, Origin of chromosomal translocations in lymphoid cancer. Cell 141, 27–38 (2010).
6
JM Chen, DN Cooper, C Férec, H Kehrer-Sawatzki, GP Patrinos, Genomic rearrangements in inherited disease and cancer. Semin Cancer Biol 20, 222–233 (2010).
7
PD Aplan, Causes of oncogenic chromosomal translocation. Trends Genet 22, 46–55 (2006).
8
R Chiarle, et al., Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells. Cell 147, 107–119 (2011).
9
IA Klein, et al., Translocation-capture sequencing reveals the extent and nature of chromosomal rearrangements in B lymphocytes. Cell 147, 95–106 (2011).
10
C Richardson, M Jasin, Frequent chromosomal translocations induced by DNA double-strand breaks. Nature 405, 697–700 (2000).
11
T Varga, PD Aplan, Chromosomal aberrations induced by double strand DNA breaks. DNA Repair (Amst) 4, 1038–1046 (2005).
12
Y Cheng, et al., Efficient repair of DNA double-strand breaks in malignant cells with structural instability. Mutat Res 683, 115–122 (2010).
13
Y Zhang, et al., Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).
14
Y Lin, AS Waldman, Capture of DNA sequences at double-strand breaks in mammalian chromosomes. Genetics 158, 1665–1674 (2001).
15
X Yu, A Gabriel, Patching broken chromosomes with extranuclear cellular DNA. Mol Cell 4, 873–881 (1999).
16
C Autexier, NF Lue, The structure and function of telomerase reverse transcriptase. Annu Rev Biochem 75, 493–517 (2006).
17
JK Moore, JE Haber, Capture of retrotransposon DNA at the sites of chromosomal double-strand breaks. Nature 383, 644–646 (1996).
18
SC Teng, B Kim, A Gabriel, Retrotransposon reverse-transcriptase-mediated repair of chromosomal breaks. Nature 383, 641–644 (1996).
19
F Storici, K Bebenek, TA Kunkel, DA Gordenin, MA Resnick, RNA-templated DNA repair. Nature 447, 338–341 (2007).
20
TA Morrish, et al., Endonuclease-independent LINE-1 retrotransposition at mammalian telomeres. Nature 446, 208–212 (2007).
21
TA Morrish, et al., DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet 31, 159–165 (2002).
22
D Reyon, et al., FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460–465 (2012).
23
G Pratt, et al., Insertional events as well as translocations may arise during aberrant immunoglobulin switch recombination in a patient with multiple myeloma. Br J Haematol 112, 388–391 (2001).
24
G Lenz, et al., Aberrant immunoglobulin class switch recombination and switch translocations in activated B cell-like diffuse large B cell lymphoma. J Exp Med 204, 633–643 (2007).
25
E Nardini, et al., Detection of aberrant isotype switch recombination in low-grade and high-grade gastric MALT lymphomas. Blood 95, 1032–1038 (2000).
26
F Zhang, et al., The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet 41, 849–853 (2009).
27
JM Kidd, et al., A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 143, 837–847 (2010).
28
R Cordaux, MA Batzer, The impact of retrotransposons on human genome evolution. Nat Rev Genet 10, 691–703 (2009).
29
C Esnault, J Maestre, T Heidmann, Human LINE retrotransposons generate processed pseudogenes. Nat Genet 24, 363–367 (2000).
30
RB Jones, et al., Nucleoside analogue reverse transcriptase inhibitors differentially inhibit human LINE-1 retrotransposition. PLoS ONE 3, e1547 (2008).
31
L Dai, Q Huang, JD Boeke, Effect of reverse transcriptase inhibitors on LINE-1 and Ty1 reverse transcriptase activities and on LINE-1 retrotransposition. BMC Biochem 12, 18 (2011).
32
JL Martin, CE Brown, N Matthews-Davis, JE Reardon, Effects of antiviral nucleoside analogs on human DNA polymerases and mitochondrial DNA synthesis. Antimicrob Agents Chemother 38, 2743–2749 (1994).
33
E Benbrik, et al., Cellular and mitochondrial toxicity of zidovudine (AZT), didanosine (ddI) and zalcitabine (ddC) on cultured human muscle cells. J Neurol Sci 149, 19–25 (1997).
34
JJ Westendorf, et al., Establishment and characterization of three myeloma cell lines that demonstrate variable cytokine responses and abilities to produce autocrine interleukin-6. Leukemia 10, 866–876 (1996).
35
LK Derr, JN Strathern, DJ Garfinkel, RNA-mediated recombination in S. cerevisiae. Cell 67, 355–364 (1991).
36
S Francia, et al., Site-specific DICER and DROSHA RNA products control the DNA-damage response. Nature 488, 231–235 (2012).
37
CS Osborne, et al., Active genes dynamically colocalize to shared sites of ongoing transcription. Nat Genet 36, 1065–1071 (2004).
38
E Lee, et al.; Cancer Genome Atlas Research Network, Landscape of somatic retrotransposition in human cancers. Science 337, 967–971 (2012).
39
A Ruiz-Herrera, et al., Distribution of intrachromosomal telomeric sequences (ITS) on Macaca fascicularis (Primates) chromosomes and their implication for chromosome evolution. Hum Genet 110, 578–586 (2002).
40
SG Nergadze, MA Santagostino, A Salzano, C Mondello, E Giulotto, Contribution of telomerase RNA retrotranscription to DNA double-strand break repair during mammalian genome evolution. Genome Biol 8, R260 (2007).
41
P Rebuzzini, et al., New mammalian cellular systems to study mutations introduced at the break site by non-homologous end-joining. DNA Repair (Amst) 4, 546–555 (2005).
42
GR Abecasis, et al.; 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
43
CR Huang, et al., Mobile interspersed repeats are major structural variants in the human genome. Cell 141, 1171–1182 (2010).
44
AD Ewing, et al.; Broad Institute Genome Sequencing and Analysis Program and Platform, Retrotransposition of gene transcripts leads to structural variation in mammalian genomes. Genome Biol 14, R22 (2013).
45
HJ Abel, et al., SLOPE: A quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data. Bioinformatics 26, 2684–2688 (2010).
46
JS Welch, et al., The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
47
M Affer, et al., Promiscuous MYC locus rearrangements hijack enhancers but mostly super-enhancers to dysregulate MYC expression in multiple myeloma. Leukemia (2014).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 111 | No. 21
May 27, 2014
PubMed: 24821809

Submission history

Published online: May 12, 2014
Published in issue: May 27, 2014

Keywords

  1. polymorphism
  2. TSIP
  3. LINE-1
  4. DNA patch

Notes

This article is a PNAS Direct Submission. M.A.R. is a guest editor invited by the Editorial Board.

Authors

Affiliations

Masahiro Onozawa
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
Zhenhua Zhang
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
Yoo Jung Kim
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
Liat Goldberg
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
Tamas Varga
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
P. Leif Bergsagel
Comprehensive Cancer Center, Mayo Clinic, Scottsdale, AZ 85259
W. Michael Kuehl
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
Peter D. Aplan1 [email protected]
Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892

Notes

1
To whom correspondence should be addressed. E-mail: [email protected].
Author contributions: M.O., P.L.B., W.M.K., and P.D.A. designed research; M.O., Z.Z., Y.J.K., L.G., T.V., P.L.B., and W.M.K. performed research; M.O., Z.Z., Y.J.K., L.G., T.V., W.M.K., and P.D.A. analyzed data; and M.O. and P.D.A. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
