Table 2.

Experimental data for human and mouse assemblies

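| Species | Library type | No. of libraries | DNA used, μg | Mean size, bp | Read length | Sequence coverage All, × | Sequence coverage PF, × | Sequence coverage Aligned, × | Sequence coverage Unique, × | Sequence coverage Valid, × | Physical coverage, × |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | Short jump | 2 | 20 | 2,536 | 101 | 45.9 | 40.7 | 33.7 | 31.7 | 19.7 | 249.4 |
| Human | Fosmid jump | 2 | 20 | 35,295 | 76* | | | | | | |
| Mouse | Short jump | 3 | 20 | 2,209 | 101 | 48.0 | 40.7 | 35.1 | 32.0 | 19.9 | 219.1 |
| Mouse | Long jump | 5 | 50 | 7,532 | 26 | 13. | | | | | |
| Mouse | Fosmid jump | 1 | 30 | 38,453 | 76 | 1. | | | | | |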
  • The data used as assembly input are shown. Tables S1 and S2 provide more detail. Library type: See Table 1. DNA used: Amount of DNA used as input to library construction. For each genome and each library type, a single aliquot was used. DNA source for human: Coriell Biorepository, NA12878. DNA source for mouse: Jackson Laboratory, C57BL/6J (stock 000664). Size: Mean of observed fragment size distribution. Read length: Number of bases sequenced. The exception is the long jump libraries prepared with the EcoP15I digestion, which yield 26 bases of genomic information; these inserts were sequenced to 36 bases and then trimmed to 26 bases. Sequence coverage: All reads were used in the assembly, but we describe their properties here via a series of nested categories. All: Total number of bases in reads, divided by genome size, assumed to be the reference size of 3.10 Gb for human and 2.73 Gb for mouse. PF: Coverage by purity-filtered (PF) reads. Aligned: Coverage by aligned PF reads. Unique: Coverage by aligned PF reads, exclusive of duplicates, which were identified by concurrence of start and stop points of pairs on the reference. Valid: Coverage by unique pairs for which the fragment length was within 5 SDs of the mean. Physical coverage: Total coverage by valid pairs and the bases between them.

  • *Reads from one library had length 76, and those from the other had length 101.
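
The nested coverage categories defined in the legend can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `(chrom, start, stop)` tuple layout for an aligned pair, and the half-open coordinate convention are assumptions made for illustration only. The genome sizes and the 5-SD threshold are the ones stated in the legend.

```python
# Illustrative sketch only; names and data layout are hypothetical.
HUMAN_GENOME_BP = 3.10e9  # reference size stated in the legend
MOUSE_GENOME_BP = 2.73e9  # reference size stated in the legend


def sequence_coverage(total_bases, genome_size):
    """Sequence coverage: total bases in reads divided by genome size."""
    return total_bases / genome_size


def unique_pairs(pairs):
    """Keep one pair per (chrom, start, stop); later copies count as duplicates."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


def valid_pairs(pairs, mean_size, sd_size, max_sds=5):
    """Keep pairs whose fragment length is within max_sds SDs of the mean."""
    kept = []
    for chrom, start, stop in pairs:
        fragment_length = stop - start  # assumes half-open [start, stop)
        if abs(fragment_length - mean_size) <= max_sds * sd_size:
            kept.append((chrom, start, stop))
    return kept


def physical_coverage(pairs, genome_size):
    """Physical coverage: bases spanned by each pair, reads plus the gap
    between them, summed and divided by genome size."""
    spanned = sum(stop - start for _chrom, start, stop in pairs)
    return spanned / genome_size
```

Applying `unique_pairs` and then `valid_pairs` mirrors the Unique and Valid categories in the legend, and `physical_coverage` on the result corresponds to the last column. Because each pair contributes its whole fragment span rather than only its read bases, physical coverage for jumping libraries is much larger than their sequence coverage.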