An algorithm for assembly of ordered restriction maps from single DNA molecules
- *Department of Mathematics, University of Southern California, 3620 South Vermont Avenue, KAP 108, Los Angeles, CA 90089-2532;
- ‡Laboratory for Molecular and Computational Genomics, Departments of Genetics and Chemistry, University of Wisconsin, Biotechnology Center, 425 Henry Mall, Madison, WI 53706; and
- §Department of Computational Molecular Biology and Bioinformatics, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910
-
Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA, and approved August 24, 2006 (received for review May 17, 2006)
Abstract
The restriction mapping of a massive number of individual DNA molecules by optical mapping enables assembly of physical maps spanning mammalian and plant genomes; however, not through computational means permitting completely de novo assembly. Existing algorithms are not practical for genomes larger than lower eukaryotes due to their high time and space complexity. In many ways, sequence assembly parallels map assembly, so that the overlap–layout–consensus strategy, recently shown effective in assembling very large genomes in feasible time, sheds new light on solving map construction issues associated with single molecule substrates. Accordingly, we report an adaptation of this approach as the formal basis for de novo optical map assembly and demonstrate its computational feasibility for assembly of very large genomes. As such, we discuss assembly results for a series of genomes: human, plant, lower eukaryote and bacterial. Unlike sequence assembly, the optical map assembly problem is actually more complex because restriction maps from single molecules are constructed, manifesting errors stemming from: missing cuts, false cuts, and high variance of estimated fragment sizes; chimeric maps resulting from artifactually merged molecules; and true overlap scores that are “in the noise” or “slightly above the noise.” We address these problems, fundamental to many single molecule measurements, by an effective error correction method using global overlap information to eliminate spurious overlaps and chimeric maps that are otherwise difficult to identify.
Footnotes
- †To whom correspondence should be addressed. E-mail: valouev{at}usc.edu
-
Author contributions: A.V. and M.S.W. designed research; A.V. performed research; A.V. and D.C.S. contributed new reagents/analytic tools; A.V., D.C.S., and S.Z. analyzed data; and A.V., D.C.S., and M.S.W. wrote the paper.
-
The authors declare no conflict of interest.
-
This article is a PNAS direct submission.
-
↵ ¶ Ananthraman, T., Schwartz, D, Mishra, B. The Seventh International Conference on Intelligent Systems for Molecular Biology, 1999.
- © 2006 by The National Academy of Sciences of the USA










