Proceedings of the National Academy of Sciences

Published online on February 9, 2004, 10.1073/pnas.0307971100
PNAS | February 17, 2004 | vol. 101 | no. 7 | 1916-1921



GENETICS
Whole-genome shotgun assembly and comparison of human genome assemblies

Sorin Istrail a, Granger G. Sutton a, Liliana Florea a, Aaron L. Halpern b, Clark M. Mobarry a, Ross Lippert a, Brian Walenz a, Hagit Shatkay a c, Ian Dew a, Jason R. Miller a, Michael J. Flanigan a, Nathan J. Edwards a, Randall Bolanos a, Daniel Fasulo a, Bjarni V. Halldorsson a, Sridhar Hannenhalli a d, Russell Turner a, Shibu Yooseph a e, Fu Lu f, Deborah R. Nusskern f, Bixiong Chris Shue f, Xiangqun Holly Zheng f, Fei Zhong f, Arthur L. Delcher g, Daniel H. Huson f h, Saul A. Kravitz b, Laurent Mouchard f i, Knut Reinert f j, Karin A. Remington b, Andrew G. Clark k, Michael S. Waterman l, Evan E. Eichler m, Mark D. Adams f n, Michael W. Hunkapiller o, Eugene W. Myers p, and J. Craig Venter b q

a Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850; b The Center for the Advancement of Genomics (TCAG), 1901 Research Boulevard, Suite 600, Rockville, MD 20850; f Celera Genomics, 45 West Gude Drive, Rockville, MD 20850; g The Institute for Genomic Research (TIGR), 9712 Medical Center Drive, Rockville, MD 20850; k Department of Molecular Biology and Genetics, Cornell University, 227 Biotechnology Building, Ithaca, NY 14853; l Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB 155, Los Angeles, CA 90033; m Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106; o Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404; and p Computer Science Division, University of California, 775 Soda Hall, Berkeley, CA 94720

Contributed by J. Craig Venter, December 8, 2003

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
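
The coverage figures in the abstract can be sanity-checked with the usual relation coverage = (number of pieces × mean piece length) / genome size. The sketch below is illustrative only and is not part of the authors' analysis. The abstract does not state a genome size, so the value of 2.9 Gbp used here is an assumption, and the function names are hypothetical. It also assumes, as a simplification, that every read belongs to a pair, which gives an upper bound on the number of inserts and therefore a lower bound on the implied mean insert length.

```python
# Back-of-envelope check of the coverage figures quoted in the abstract.
# ASSUMPTION: the genome size is not given in the abstract; 2.9 Gbp is a
# round illustrative value, not a number taken from the paper.
GENOME_SIZE_BP = 2.9e9

# Figures quoted in the abstract.
READS = 27e6              # sequencing reads in the Celera shotgun data set
SEQUENCE_COVERAGE = 5.3   # fold coverage by quality-trimmed reads
CLONE_COVERAGE = 39.0     # fold coverage by the paired-end inserts


def coverage(count, mean_length_bp, genome_size_bp):
    """Fold coverage given a piece count and a mean piece length."""
    return count * mean_length_bp / genome_size_bp


def implied_mean_length(count, fold_coverage, genome_size_bp):
    """Mean piece length implied by a piece count and a fold coverage."""
    return fold_coverage * genome_size_bp / count


# Mean quality-trimmed read length implied by 27 million reads at 5.3x.
mean_read_bp = implied_mean_length(READS, SEQUENCE_COVERAGE, GENOME_SIZE_BP)

# ASSUMPTION: every read is paired, so there are at most READS / 2 inserts.
# Fewer inserts would imply a longer mean insert, so this is a lower bound.
max_pairs = READS / 2
min_mean_insert_bp = implied_mean_length(max_pairs, CLONE_COVERAGE, GENOME_SIZE_BP)

print(f"implied mean trimmed read length: {mean_read_bp:.0f} bp")
print(f"implied mean insert length (lower bound): {min_mean_insert_bp:.0f} bp")
```

Under these assumptions the implied mean trimmed read length is roughly 570 bp, and the implied mean insert length is at least about 8.4 kbp, which is consistent with a mix of 2-kbp, 10-kbp, and 50-kbp libraries. A different assumed genome size scales both numbers proportionally.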


Abbreviations: WGSA, whole-genome shotgun assembly; IHGSC, the International Human Genome Sequencing Consortium; BAC, bacterial artificial chromosome; NCBI-34, National Center for Biotechnology Information (NCBI) Build 34 of the human genome; STS, sequence tagged site; CSA, compartmental shotgun assembly; WGA, whole-genome assembly.

Data deposition: The sequences of the assemblies herein referred to as WGSA, CSA, and WGA have been deposited in the GenBank database (whole-genome assembly project accession nos. AADD00000000, AADC00000000, and AADB00000000).

c Present address: School of Computing, Queen's University, Kingston, ON, Canada K7L 3N6.

d Present address: Department of Genetics, University of Pennsylvania, 1409 Blockley Hall, Philadelphia, PA 19104.

e Present address: The Center for the Advancement of Genomics (TCAG), 1901 Research Boulevard, Suite 600, Rockville, MD 20850.

h Present address: WSI-Algorithmen der Bioinformatik, Universität Tübingen, Sand 14, 72076 Tübingen, Germany.

i Present address: Department of Computational Biology (ABISS), University of Rouen, 76821 Mont-Saint-Aignan Cedex, France.

j Present address: Institute of Computer Science, Freie Universität Berlin, Takustrasse 9, D-14195 Berlin, Germany.

n Present address: Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106.

q To whom correspondence should be addressed. E-mail: jcventer{at}tcag.org.



