GENETICS
Whole-genome shotgun assembly and comparison of human genome assemblies
a Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850; b The Center for the Advancement of Genomics (TCAG), 1901 Research Boulevard, Suite 600, Rockville, MD 20850; f Celera Genomics, 45 West Gude Drive, Rockville, MD 20850; g The Institute for Genomic Research (TIGR), 9712 Medical Center Drive, Rockville, MD 20850; k Department of Molecular Biology and Genetics, Cornell University, 227 Biotechnology Building, Ithaca, NY 14853; l Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB 155, Los Angeles, CA 90033; m Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106; o Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404; and p Computer Science Division, University of California, 775 Soda Hall, Berkeley, CA 94720
Contributed by J. Craig Venter, December 8, 2003
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
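The two coverage figures quoted above (5.3-fold for reads, 39-fold for inserts) follow the usual definitions of sequence coverage and clone coverage. The sketch below illustrates those definitions only; it is not the authors' calculation. The function names are hypothetical, and the genome size of 2.9 Gbp used in the usage note is an assumption that is not stated in this abstract.

```python
def sequence_coverage(num_reads, mean_read_length_bp, genome_size_bp):
    """Fold coverage by reads: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length_bp / genome_size_bp


def clone_coverage(pairs_by_insert_bp, genome_size_bp):
    """Fold coverage by inserts: total bases spanned by paired inserts
    divided by genome size.

    pairs_by_insert_bp maps insert length (bp) to the number of mate pairs
    obtained from inserts of that length.
    """
    spanned = sum(size * count for size, count in pairs_by_insert_bp.items())
    return spanned / genome_size_bp


def implied_mean_read_length(coverage, num_reads, genome_size_bp):
    """Mean read length (bp) implied by a stated fold coverage."""
    return coverage * genome_size_bp / num_reads


if __name__ == "__main__":
    ASSUMED_GENOME_BP = 2_900_000_000  # assumption, not from the abstract
    NUM_READS = 27_000_000             # from the abstract
    READ_COVERAGE = 5.3                # from the abstract

    length = implied_mean_read_length(READ_COVERAGE, NUM_READS, ASSUMED_GENOME_BP)
    print(f"implied mean trimmed read length: {length:.0f} bp")
```

Under the assumed genome size, 27 million reads at 5.3-fold coverage imply a mean quality-trimmed read length of a few hundred base pairs. Clone coverage is much higher than sequence coverage because each mate pair spans its whole insert (2, 10, or 50 kbp) while only the two ends are sequenced.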
Data deposition: The sequences of the assemblies herein referred to as WGSA, CSA, and WGA have been deposited in the GenBank database (whole-genome assembly project accession nos. AADD00000000, AADC00000000, and AADB00000000).
c Present address: School of Computing, Queen's University, Kingston, ON, Canada K7L 3N6.
d Present address: Department of Genetics, University of Pennsylvania, 1409 Blockley Hall, Philadelphia, PA 19104.
e Present address: The Center for the Advancement of Genomics (TCAG), 1901 Research Boulevard, Suite 600, Rockville, MD 20850.
h Present address: WSI-Algorithmen der Bioinformatik, Universität Tübingen, Sand 14, 72076 Tübingen, Germany.
i Present address: Department of Computational Biology (ABISS), University of Rouen, 76821 Mont-Saint-Aignan Cedex, France.
j Present address: Institute of Computer Science, Freie Universität Berlin, Takustrasse 9, D-14195 Berlin, Germany.
n Present address: Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106.
q To whom correspondence should be addressed. E-mail: jcventer{at}tcag.org.