Whole-genome disassembly
- Howard Hughes Medical Institute and University of Washington, Seattle, WA 98195
The race to sequence the human genome has garnered a level of popular attention unprecedented for a scientific endeavor. This fascination stems partly, of course, from the importance of the goal, but it also reflects the Olympian nature of the contest, which pitted against each other two capable teams with sharply contrasting cultures (public and private), personalities, and strategies. Titanic struggles being the stuff of mythology, it should perhaps not surprise us that a number of myths regarding this race have already emerged. In a recent issue of PNAS, Waterston et al. (1), leaders of the public effort, help to dispel one of these myths, involving the controversial “whole-genome shotgun” strategy used by Celera.
Issues surrounding sequencing strategies will no doubt seem arcane to most readers, but they are worth considering if only because they may significantly influence the pace and cost of DNA sequencing during the remainder of the Genome Era. A strategy is needed at all because a sequencing “read,” the tract of data obtainable in a single experimental run, is only a few hundred bases long and contains errors. Obtaining reliable sequence for a larger DNA segment therefore requires a method for generating and piecing together a number of reads that cover the segment. Since its introduction by Sanger and colleagues over 20 years ago, the favored method for this purpose has comprised the following steps: an initial “shotgun” phase, in which reads are derived from subclones located essentially at random within the targeted region; an assembly phase, in which read overlaps are determined (the main challenge here being to identify and discard false overlaps arising from repeated sequences) and used to approximately reconstruct the underlying sequence; and a finishing phase, in which additional reads are obtained in directed fashion to close gaps and shore …
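The shotgun, assembly, and finishing phases can be made concrete with a toy simulation. The Python sketch that follows is illustrative only and rests on simplifying assumptions. Reads are error-free exact substrings of one strand of a short, repeat-free sequence. Assembly is a simple greedy merge on exact suffix-prefix overlaps of at least `min_len` bases. The gap finder uses read start positions that are known only because the reads are simulated; in practice, gaps show up as breaks between contigs. The function names (`shotgun_reads`, `overlap`, `greedy_assemble`, `uncovered_gaps`) are hypothetical. The greedy rule is a stand-in and not the procedure of any real assembler. In particular, it makes no attempt to reject the repeat-induced false overlaps that the text identifies as the main challenge of assembly.

```python
import random


def shotgun_reads(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[tuple[int, str]]:
    """Shotgun phase (toy): draw reads from uniformly random start positions.

    Each read is returned with its start so that, in this simulation only,
    coverage gaps can be located afterwards.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long. k stays below the
    shorter read's length; full containment is handled in greedy_assemble.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Assembly phase (toy): repeatedly merge the pair with the longest overlap.

    Exact matches only; no attempt is made to reject repeat-induced false overlaps.
    """
    # Remove duplicates (keeping input order) and reads contained in longer ones.
    frags: list[str] = []
    for r in sorted(dict.fromkeys(reads), key=len, reverse=True):
        if not any(r in f for f in frags):
            frags.append(r)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # no overlaps left: each remaining fragment is a contig
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


def uncovered_gaps(starts: list[int], read_len: int, genome_len: int) -> list[tuple[int, int]]:
    """Finishing-phase targets (toy): half-open intervals [lo, hi) covered by no read."""
    gaps = []
    covered_to = 0
    for s in sorted(starts):
        if s > covered_to:
            gaps.append((covered_to, s))
        covered_to = max(covered_to, s + read_len)
    if covered_to < genome_len:
        gaps.append((covered_to, genome_len))
    return gaps
```

As a check, `greedy_assemble(["GTTAGCCA", "ATGCGTAC", "GCCATGGA", "GTACGTTA"], 3)` reassembles these four overlapping 8-base reads, supplied out of order, into the single contig `ATGCGTACGTTAGCCATGGA`. Dropping `GTTAGCCA` leaves two contigs, `ATGCGTACGTTA` and `GCCATGGA`. The remaining reads still cover every base, but no read overlaps both contigs, so the join must come from directed finishing rather than from the random reads.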





