GENETICS
A large-scale, gene-driven mutagenesis approach for the functional analysis of the mouse genome





*Institute of Developmental Genetics,
GSF-National Research Center for Environment and Health, D-85764 Neuherberg,
Germany;
Laboratory for Molecular Hematology,
University of Frankfurt Medical School, D-60590 Frankfurt am Main, Germany;
Department of Developmental Biology, Max
Planck Institute of Immunobiology, D-79108 Freiburg, Germany;
¶Department of Cell and Molecular Biology,
Institute of Biochemistry and Biotechnology, TU Braunschweig, D-38106
Braunschweig, Germany; ||Department for Molecular
Neurogenetics, Max Planck Institute of Psychiatry, D-80804 Munich, Germany;
and 
Department of Vertebrate Genomics,
Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
Communicated by Sherman M. Weissman, Yale University School of Medicine, New Haven, CT, May 30, 2003 (received for review November 5, 2002)
| Abstract |
|---|
Large-scale insertional mutations in mammalian cells are induced most effectively with gene traps, a class of DNA or retroviral vectors that insert a promoterless reporter gene into a large collection of chromosomal sites. By selecting for gene expression, recombinants are obtained in which the reporter gene is fused to the regulatory elements of an endogenous gene. Transcripts generated by these fusions faithfully reflect the activity of individual cellular genes and serve as molecular tags to identify and/or clone any genes linked to specific functions (3–5). Application of this technique in a genome-wide manner should allow the identification of most, if not all, active transcripts in the genome and thus is an important tool for genome annotation. More importantly, gene trapping in mouse embryonic stem (ES) cells enables the establishment of ES cell libraries with mutations in most genes, which then can be used to make mice. This makes it possible to assign a function to each gene in the context of an entire organism.
Several smaller-sized mutagenesis screens with gene-trap vectors have been reported (4, 6–9). However, the use of single gene-trap vectors in each screen, the unavailability of a complete mouse genome sequence, and a comparatively low number of analyzed insertions precluded a systematic assessment of the technology.
Based on the analysis of 5,142 sequence tags obtained from gene-trap insertions across the mouse genome, we show here that gene-trap vectors can disrupt all functional classes of genes, including disease genes, and are highly mutagenic in transgenic mice. We also show that individual gene-trap vectors complement each other in gene targeting, suggesting that the most effective way of saturating the mouse genome with mutations is by using a combination of different gene-trap vectors.
| Materials and Methods |
|---|
[...] pT1βgeo and pT1ATGβgeo plasmid vectors or infected with U3βgeo and ROSAβgeo retroviruses as described (3, 10). Gene-trap-expressing ES cell clones were selected in 200 µg/ml G418 (GIBCO/BRL), manually picked, expanded, and stored frozen in liquid nitrogen. For gene-trap sequence tag (GTST) recovery, all clones were arrayed into 48-well plates, lysed, and subjected to 5' rapid amplification of cDNA ends.
5' Rapid Amplification of cDNA Ends and Sequencing. cDNAs were prepared from the polyadenylated RNA by using a RoboAmp robotic device (MWG Biotec, Ebersberg, Germany) with a processing capacity of 96 samples per day. Samples of 2 × 10⁵ cells were lysed in 1 ml of lysis buffer containing 100 mM Tris·HCl, pH 8.0/500 mM LiCl/10 mM EDTA/1% lithium-dodecyl sulfate (LiDS)/5 mM DTT. Polyadenylated RNA was captured from the lysates by biotin-labeled oligo(dT) primers according to the manufacturer's instructions (Roche Diagnostics, Indianapolis) and placed on streptavidin-coated 96-well plates (AB Gene, Surrey, U.K.). After washing, solid-phase cDNA synthesis was performed in situ by using random hexamers and SuperScript II reverse transcriptase (Invitrogen). To remove excess primers, the cDNAs were filtered through multiscreen PCR plates (Millipore). The 5' ends of the purified cDNAs were tailed with dCTPs by using terminal deoxynucleotidyl transferase (Invitrogen), following the manufacturer's instructions.
For PCR amplification of GTSTs, the following vector-specific primers were used: (i) pT1βgeo and pT1ATGβgeo: 5'-CTA CTA CTA CTA GGC CAC GCG TCG ACT AGT ACG GGI IGG GII GGG IIG-3' and 5'-GCC AGG GTT TTC CCA GTC ACG A-3'; and 5'-CTA CTA CTA CTA GGC CAC GCG TCG ACT AGT AC-3' and 5'-TGT AAA ACG ACG GCC AGT GTG AAG GCT GTG CGA GGC CG-3' (nested); and (ii) U3βgeo and ROSAβgeo: 5'-GCC ATT CAG GCT GCG CAA-3'; and 5'-CAA GGC GAT TAA GTT GGG TAA TG-3' (nested). Amplification products were directly sequenced by using ABI377 or ABI3700 sequencing machines (Applied Biosystems).
GTST Analysis. After filtering sequences against repeats and removing all vector sequences from the GTSTs, a PHRED score was assigned to each individual nucleotide. GTSTs qualified as informative if they were at least 50 nt long and exhibited a minimum mean PHRED score of 20 (Fig. 4, which is published as supporting information on the PNAS web site, www.pnas.org). Homology searches were performed by using the publicly available sequence databases and the BLASTN algorithm. Databases included GenBank, UniGene, Online Mendelian Inheritance in Man (OMIM) (all at www.ncbi.nlm.nih.gov), ENSEMBL (www.ensembl.org), RIKEN (www.rarf.riken.go.jp), and GeneOntology (www.geneontology.org).
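The two acceptance criteria stated above (length of at least 50 nt and a mean PHRED score of at least 20) can be illustrated with a minimal Python sketch. The function name, the input layout (one integer score per nucleotide), and the example reads are assumptions made for illustration; they are not taken from the pipeline used in the study.

```python
from statistics import mean

MIN_LENGTH = 50      # nucleotides, cutoff stated in the text
MIN_MEAN_PHRED = 20  # minimum mean PHRED score, cutoff stated in the text


def is_informative(sequence, phred_scores):
    """Return True if a repeat- and vector-cleaned GTST passes both cutoffs.

    `sequence` is the cleaned read and `phred_scores` holds one score per
    nucleotide (hypothetical input layout).
    """
    if len(sequence) != len(phred_scores):
        raise ValueError("one PHRED score is expected per nucleotide")
    if len(sequence) < MIN_LENGTH:
        return False
    return mean(phred_scores) >= MIN_MEAN_PHRED


# Hypothetical reads, not data from the study.
reads = {
    "long_high_quality": ("ACGT" * 15, [30] * 60),
    "too_short": ("ACGT" * 10, [40] * 40),
    "low_quality": ("ACGT" * 15, [10] * 60),
}
for name, (seq, quals) in reads.items():
    print(name, is_informative(seq, quals))
```

Only the first of the three hypothetical reads passes; the second fails on length and the third on mean quality.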
ES Cell Injections, Breeding, and Genotyping. 129Sv/J (TBV-2, R1, and E14.1) ES cell-derived chimeras were generated by injecting C57BL/6 blastocysts. The resulting male chimeras were bred to C57BL/6 females, and agouti offspring were tested for transgene transmission by tail blotting. Animals heterozygous for gene-trap insertions were backcrossed to C57BL/6 mice, and phenotypes were assessed in homozygous F2 offspring.
| Results and Discussion |
|---|
[...] pT1βgeo, pT1ATGβgeo, ROSAβgeo, and U3βgeo to transduce a promoterless β-galactosidase-neomycin phosphotransferase (βgeo) reporter gene into mouse ES cells. In pT1βgeo, pT1ATGβgeo, and ROSAβgeo, βgeo is flanked by an upstream 3' splice consensus sequence (splice acceptor) and a downstream polyadenylation site to ensure its activation from integrations into introns ("intron trap") (11–13). U3βgeo lacks a splice acceptor sequence and therefore is activated mostly from integrations into exons ("exon trap") (10, 14). Because all these gene-trap vectors require a cellular promoter for activation, the maximum number of genomic targets equals the number of expressed genes. The vectors pT1βgeo and pT1ATGβgeo were transduced as DNA into ES cells by electroporation. The vectors U3βgeo and ROSAβgeo were transduced as retroviruses into ES cells by infection. From 11,266 ES cell clones containing gene-trap insertions in expressed genes, we isolated 8,423 sequences adjacent to the gene-trap integration sites (GTSTs). As summarized in Table 1, 5,142 of these sequences provided useful GTSTs. The other sequences were either of low quality or were too short (<50 nucleotides) to be informative (see Materials and Methods and Fig. 4).
GenBank (NCBI) homology analysis revealed that 3,750 (72.9%) of the GTSTs belonged to known genes, 623 (12.1%) were ESTs, and 769 (15%) had no match in the database (Table 1). In comparison to our previous analysis (7), the number of matches to known genes increased by 26%, clearly reflecting the sustained progress in sequencing of the human and mouse genomes. Moreover, when nonmatching "novel" (previously uncharacterized) sequences (769) were aligned to the ENSEMBL database, 41% (389) produced a match (Table 2). However, despite the availability of a nearly complete mouse genome sequence, 7.4% (380 of 5,142; Tables 1 and 2) failed to produce a match in any database. Although this could be the result of some strain-specific variations between mouse genomes, it may also reflect the fact that some sequences are not yet available from the genome sequence, which still contains gaps.
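As a quick consistency check, the GenBank breakdown and the fraction of GTSTs without a match in any database can be recomputed from the counts quoted in this paragraph. The snippet below is only a worked-arithmetic sketch; the variable names are illustrative.

```python
TOTAL_GTSTS = 5142  # informative GTSTs, from the text

# Counts from the GenBank homology analysis quoted above.
genbank_classes = {
    "known genes": 3750,
    "ESTs": 623,
    "no GenBank match": 769,
}

# The three classes should partition the full set of informative GTSTs.
assert sum(genbank_classes.values()) == TOTAL_GTSTS


def percent_of_total(count, total=TOTAL_GTSTS):
    """Share of all informative GTSTs, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: percent_of_total(n) for name, n in genbank_classes.items()}
no_match_anywhere = percent_of_total(380)  # GTSTs without a match in any database

print(shares)
print(no_match_anywhere)
```

The recomputed shares agree with the 72.9%, 12.1%, 15%, and 7.4% given in the text.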
Fifty-five percent of the genome-matching GTSTs were in annotated genes.
Interestingly, the frequency of U3βgeo insertions into predicted introns
was almost twice as high as that obtained with all the other vectors
(Table 2), confirming previous
studies showing that the U3-type exon-trap vectors can be activated also from
integrations into the introns of expressed genes
(9,
15). Unexpectedly, 50 of 110
GTSTs obtained with the other vectors were also part of predicted introns
(Table 2), although intronic
sequences should have been removed by splicing
(3,
4). Although in nine instances
the intron-matching GTSTs resulted from aberrant splicing, we assumed that the
other 41 GTSTs are actually part of exons annotated incorrectly by the current
gene-prediction programs. To substantiate this hypothesis, we selected 10
annotated genes for additional expression studies. By using RT-PCR and primers
complementary to the intron-annotated GTSTs and to the corresponding
downstream exons (Fig. 5A, which is published as supporting
information on the PNAS web site), we obtained amplification products in five
instances. Direct sequencing of these products revealed splicing of the GTSTs
to the downstream exons (Fig. 5B), indicating that a significant
proportion of intron-matching GTSTs indeed are part of mispredicted exons.
To localize the GTSTs cytogenetically, we screened the UniGene database
using the GenBank accession number as an identifier. Allowing for an
e value ≤10⁻²⁰, we identified 1,349
GTSTs in mapped UniGene clusters that were distributed among all chromosomes
except the Y chromosome (Table
3). There was a direct correlation between the number of GTSTs on
a given chromosome and the number of UniGene clusters on that chromosome,
indicating that gene-trap insertions are dispersed throughout the genome and
occur more frequently in chromosomes with a high density of genes
(Fig. 1).
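The relationship described here, more GTSTs on chromosomes carrying more UniGene clusters, is the kind of association a Pearson correlation coefficient summarizes. The sketch below shows one way such a coefficient could be computed; the per-chromosome counts are invented placeholders, not the values of Table 3, and the text does not state which statistic the authors actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical counts for five chromosomes (placeholders, NOT Table 3 data).
unigene_clusters_per_chromosome = [900, 1200, 700, 1500, 400]
gtsts_per_chromosome = [60, 85, 50, 110, 25]

r = pearson_r(unigene_clusters_per_chromosome, gtsts_per_chromosome)
print(f"r = {r:.3f}")
```

A value of r close to 1 would correspond to the direct correlation reported in the text.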
Several preferred integration sites or "hot spots" were
observed, some of which were hit >20 times. Examples include the UniGene
clusters 38,186 and 36,541, the growth-arrest gene Gas5, the
C-terminal-binding protein 2, and the Jumonji (mouse) homolog (Table 8, which
is published as supporting information on the PNAS web site). We identified a
total of 441 UniGene clusters containing two or more gene-trap insertions,
which corresponds to 25% of the recovered UniGene clusters and suggests that
75% of all genes are randomly accessible for gene-trap insertions. Forty-five
percent of the hot spots contained multiple (more than two) insertions of more
than one of the vectors and thus were vector-independent. Of the remaining
vector-specific hot spots, 12% were recognized only by pT1βgeo, 10% by
pT1ATGβgeo, 16% by U3βgeo, and 17% by ROSAβgeo vectors.
Moreover, the gene-trap hot spots were not sequence-specific and were not
related to gene size (Fig. 2),
suggesting that they are most likely defined by secondary chromatin structure.
Considering that over half of all the hot spots are vector-specific, we
believe that the most effective way to saturate the genome with gene-trap
insertions is with gene-trap vector combinations.
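The hot-spot bookkeeping described in this paragraph can be sketched as follows: group insertions by UniGene cluster, keep clusters hit at least twice, and label each as vector-independent (hit by more than one vector) or vector-specific. The function name, the cluster identifiers, and the input layout are hypothetical, and the threshold of two hits follows the "two or more" wording above.

```python
from collections import defaultdict


def classify_hot_spots(insertions, min_hits=2):
    """Label clusters with at least `min_hits` insertions.

    `insertions` is an iterable of (unigene_cluster_id, vector_name) pairs
    (hypothetical input layout). Returns {cluster_id: label}.
    """
    vectors_by_cluster = defaultdict(list)
    for cluster_id, vector in insertions:
        vectors_by_cluster[cluster_id].append(vector)

    hot_spots = {}
    for cluster_id, vectors in vectors_by_cluster.items():
        if len(vectors) < min_hits:
            continue  # hit only once: not a hot spot
        if len(set(vectors)) > 1:
            hot_spots[cluster_id] = "vector-independent"
        else:
            hot_spots[cluster_id] = f"specific to {vectors[0]}"
    return hot_spots


# Invented example insertions; cluster IDs are placeholders.
example = [
    ("cluster_A", "U3βgeo"), ("cluster_A", "U3βgeo"),
    ("cluster_B", "pT1βgeo"), ("cluster_B", "ROSAβgeo"),
    ("cluster_C", "pT1ATGβgeo"),
]
print(classify_hot_spots(example))
```

Dividing the number of vector-specific labels by the total number of hot spots would give the kind of percentages quoted above.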
To estimate how effectively the various vectors trap genes that had not been trapped before, we determined the number of insertions required by each vector to trap a novel UniGene cluster. Fig. 3 shows that the vectors with a splice acceptor site (pT1βgeo, pT1ATGβgeo, and ROSAβgeo) trapped a different gene with almost every insertion. However, results from pT1βgeo, for which more insertions are available, suggest that the trapping efficiency decreases with an increasing number of insertions, presumably because of a gradual reduction of the pool of trappable genes (Fig. 3 Inset). In contrast, U3βgeo, which does not contain a splice acceptor, consistently required two or more insertions to hit a novel UniGene cluster (Fig. 3). The inferior gene-trapping efficiency of U3βgeo reflects its comparatively small pool of genomic integration targets, consisting mainly of the exons of expressed genes. As a result, U3βgeo integrated more frequently into a given genomic hot spot than any of the other vectors. With an average insertion frequency of 4.1 insertions per hot spot, U3βgeo exceeded the average hot-spot insertion frequency of the other vectors by almost 2-fold (Table 8).
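The measure used in this paragraph, the number of insertions a vector needs per novel UniGene cluster, can be sketched as the ratio of total insertions to distinct clusters, together with a cumulative curve of newly trapped clusters. This is an illustrative reading of the measure, not the authors' script; function names and example data are hypothetical.

```python
def insertions_per_novel_cluster(cluster_ids):
    """Total insertions divided by the number of distinct clusters hit.

    `cluster_ids` lists the UniGene cluster hit by each insertion of one
    vector, in the order the insertions were analyzed.
    """
    if not cluster_ids:
        raise ValueError("at least one insertion is required")
    return len(cluster_ids) / len(set(cluster_ids))


def cumulative_novel_clusters(cluster_ids):
    """Number of distinct clusters trapped after each successive insertion."""
    seen = set()
    curve = []
    for cluster_id in cluster_ids:
        seen.add(cluster_id)
        curve.append(len(seen))
    return curve


# Invented examples: one vector hitting a new cluster almost every time,
# and one that repeatedly hits the same clusters.
broad_vector = ["c1", "c2", "c3", "c4", "c4", "c5"]
narrow_vector = ["c1", "c1", "c2", "c2", "c1", "c3"]

print(insertions_per_novel_cluster(broad_vector))   # 1.2
print(insertions_per_novel_cluster(narrow_vector))  # 2.0
print(cumulative_novel_clusters(narrow_vector))     # [1, 1, 2, 2, 2, 3]
```

A ratio near 1 corresponds to a vector that traps a different gene with almost every insertion, while a flattening cumulative curve reflects a shrinking pool of not-yet-trapped genes.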
Because gene inactivations induced by gene-trap vectors with a splice acceptor sequence partly depend on effective splicing, the frequency of aberrant splicing events was determined by analyzing the splice junctions induced by each individual vector. Because the frequency of aberrant splicing was essentially similar for all gene-trap vectors (pT1βgeo = 3.5%; pT1ATGβgeo = 5.5%; ROSAβgeo = 4.0%), we conclude that the splice acceptor sequences used in this analysis are equally efficient [i.e., engrailed splice acceptor sequence for pT1βgeo and pT1ATGβgeo (11, 12) and adenovirus major late transcript splice acceptor sequence for ROSAβgeo (13)]. Interestingly, >80% of the aberrantly spliced integrations into annotated genes were atypically in exons, suggesting that ectopic splice sites inside exons are recognized ineffectively by the splicing enzymes.
Because the relative mutagenicity of the gene-trap vectors likely depends
on their position within a gene, we looked at the insertion site of each
gene-trap vector with regard to its location within the full-length cDNA.
Table 4 shows that the vast
majority of retroviral gene-trap insertions involved the 5' half of
genes, confirming a reported preference of retroviral integrations
(9,
16). Interestingly, >50% of the U3βgeo insertions were in 5' untranslated regions
(Table 4), presumably due to a
relatively high stringency of selection that requires gene-trap vectors
without a splice acceptor to insert close to an active cellular promoter.
Although plasmid vectors also exhibited a slight preference for the 5'
ends of genes, insertions were distributed more evenly over the coding region
of a gene, indicating that even longer fusion proteins are stable
(Table 4). Finally, one U3βgeo integration was recovered from an intronless gene (glutathione
peroxidase 4/ENSMUSG00000038809). Although this was a unique event, it
demonstrates that U3βgeo vectors can also disrupt single-exon genes.
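One way to express the positional analysis in this paragraph is to map each insertion onto the full-length cDNA and assign it to the 5' untranslated region, the 5' half, or the 3' half. The binning rule below (5' UTR checked first, then a split at the cDNA midpoint) is an assumption chosen for illustration; the exact categories of Table 4 are not reproduced here, and all coordinates in the example are invented.

```python
def classify_insertion(position, cds_start, cdna_length):
    """Assign an insertion to a region of the full-length cDNA.

    position    -- 1-based cDNA coordinate of the gene-trap fusion point
    cds_start   -- 1-based coordinate of the first coding nucleotide
    cdna_length -- length of the full-length cDNA in nucleotides
    All three parameters and the binning rule are illustrative assumptions.
    """
    if not 1 <= position <= cdna_length:
        raise ValueError("position lies outside the cDNA")
    if not 1 <= cds_start <= cdna_length:
        raise ValueError("cds_start lies outside the cDNA")
    if position < cds_start:
        return "5' UTR"
    if position <= cdna_length / 2:
        return "5' half"
    return "3' half"


# Invented insertions on a hypothetical 2,000-nt cDNA whose CDS starts at nt 100.
for pos in (50, 500, 1500):
    print(pos, classify_insertion(pos, cds_start=100, cdna_length=2000))
```

Tallying these labels per vector would yield a distribution comparable in spirit to the one summarized in Table 4.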
To analyze the functional spectrum of the genes represented in the GTST library, we classified the trapped UniGene clusters based on their known or putative function by using the GeneOntology database. Table 5 shows that the vectors used in this study inserted into all functional classes of mammalian genes, although with different frequencies, which suggests that the effective trapping of some specific classes of genes may require more specialized gene-trap vectors (17, 18).
Because the development of mouse models for human disease is a major goal of the Human Genome Project, we also searched our library for integrations into genes involved in human disease. Using the Online Mendelian Inheritance in Man (OMIM) database, we found 204 GTSTs that corresponded to 90 previously characterized disease genes (Table 9, which is published as supporting information on the PNAS web site). ES cell clones with these insertions can be used to produce mouse mutant strains that may replicate the genetic defects and the symptomatology of specific human disorders, and that may be useful for testing therapeutic methods. For example, we recently characterized a mouse strain with a phenotype closely resembling congenital nephrotic syndrome (19, 20).
To analyze the frequency of obvious phenotypes developing after gene-trap insertions, we injected 29 randomly selected ES cell clones into blastocysts and produced mutant mice from them. As shown in Table 6, 59% of the mice developed an obvious phenotype when bred to homozygosity, a frequency comparable to that of conventional gene targeting and of reported gene-trap screens (6, 13). Interestingly, over half of the observed phenotypes were embryonic or perinatal-lethal (Table 7), suggesting that a significant proportion of the genes expressed in ES cells are required for embryonic development.
We conclude that gene-trap mutagenesis is an efficient approach for annotating and dissecting the function of mammalian genes. Its large-scale implementation has already enabled the worldwide establishment of several databases containing GTSTs from hundreds of mouse genes (4, 6–9). Collectively, these databases provide an unprecedented resource for the scientific community in the postgenomic era, because clones from the corresponding ES cell libraries can be used immediately to cost-effectively generate mouse models of human disease. Clearly, the goal of understanding the function of every gene in the genome could be attained more quickly with the establishment of ES cell libraries with mutations in every single gene. Because each gene-trap vector seems to have its own set of specific hot spots, we conclude that the most effective generation of an ES cell library saturated with mutations should involve a collection of different gene-trap vectors. The ongoing collaboration within the international mouse-mutagenesis consortium (22) is likely to achieve complete saturation of the mouse genome within the next few years.
| Acknowledgements |
|---|
[...] βgeo vectors, H. Earl Ruley for the U3βgeo, and Philippe Soriano for the ROSAβgeo vectors. We acknowledge Susanne Bourier, Franziska Köhler, Katharina Kuhlmeier, Sava Michailidou, Ines Peiser, Armin Reffelmann, Irina Rodionova, Cordula Schulz, Beata Thalke, Sandra Schwarzmeier, Beate Walther, and Carsta Werner for excellent technical assistance. This work was supported by grants from the Bundesministerium für Bildung und Forschung to the German Gene Trap Consortium.
| Footnotes |
|---|
Present address: Institute of Molecular and Structural Biology, Aarhus
University, C. F. Mollers Alle, 8000 Aarhus C, Denmark.
** W.W., H.v.M., and P.R. contributed equally to this work.

To whom correspondence should be addressed. E-mail:
ruiz{at}molgen.mpg.de.