Precise physical models of protein–DNA interaction from high-throughput data

Contributed by Curtis G. Callan, Jr., November 8, 2006 (received for review September 30, 2006)
SI Figure 5. Non-Gaussian intensity ratios. (a) Histogram of Abf1p PBM LIRs from Mukherjee et al.'s (1) data. (b) Histogram of the corresponding intensity ratios obtained by exponentiating these LIRs (using base 2). (Inset) A zoomed-in view of the highlighted region of the tail. The green line in each plot indicates the cut Mukherjee et al. used to delineate putatively bound regions.
SI Figure 6. Ensemble means for matrix elements inferred from Abf1p PBM data using 20 (a) and 50 (b) sequences per z-bin. (c) χ^{2} consistency P values for matrix elements between these two ensembles. No elements have Bonferroni-corrected P < 0.05.
SI Figure 7. MCMC convergence. (a) The mean intra-run variance vs. inter-run variance of each matrix element given by MCMC runs on Mukherjee et al.'s (1) PBM data. These were computed by using the first 20, 100, 1,000, and 5,000 models recorded in each run. The green line has slope 1. (b) The per-datum log likelihood of models in each of the 10 PBM MCMC runs as a function of the order in which those models were sampled. (Inset) A blowup of results for the first 100 models sampled in each run. (c and d) Similar plots for MCMC runs using Lee et al.'s (2) ChIP-chip data.
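The comparison in panel a can be illustrated with a minimal Python sketch. The caption does not define the two variances, so the definitions here are assumptions: the intra-run variance is taken as the within-run variance averaged over runs, and the inter-run variance as the variance of the samples pooled across all runs. Under these assumed definitions, runs that sample the same distribution fall near the slope-1 line, and runs that have not yet converged fall above it. The function name and the input layout are hypothetical.

```python
from statistics import mean, pvariance


def intra_and_inter_run_variance(runs, n_models):
    """Compare within-run and pooled variance for one matrix element.

    runs     : list of MCMC runs; each run is a list of sampled values
               of a single matrix element, in the order sampled.
    n_models : use only the first n_models samples of each run.

    The two definitions below are assumptions made for illustration;
    they are not taken from the caption.
    """
    truncated = [run[:n_models] for run in runs]
    # Assumed "intra-run" variance: within-run variance, averaged over runs.
    intra = mean([pvariance(run) for run in truncated])
    # Assumed "inter-run" variance: variance of samples pooled over all runs.
    pooled = [value for run in truncated for value in run]
    inter = pvariance(pooled)
    return intra, inter


# Two runs exploring the same values: the point lies on the slope-1 line.
converged = intra_and_inter_run_variance([[0.0, 2.0], [0.0, 2.0]], 2)
# Two runs stuck in different regions: pooled variance exceeds within-run.
unconverged = intra_and_inter_run_variance([[0.0, 2.0], [10.0, 12.0]], 2)
```

Repeating the calculation with increasing `n_models` (for example 20, 100, 1,000, and 5,000) mirrors the way the panel tracks convergence as more sampled models are included.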
2. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. (2002) Science 298:799–804.
SI Figure 8. EMA likelihood analysis does not overfit Mukherjee et al.'s (1) Abf1p PBM data. (a and b) The mean matrix elements determined by MCMC sampling using two disjoint halves of the data. (c) Corresponding χ^{2} consistency P values. No elements have Bonferroni-corrected P < 0.05.
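A per-element consistency check of this kind can be sketched in Python as follows. The caption does not give the test statistic, so this sketch assumes a one-degree-of-freedom χ^{2} for each matrix element, formed from the difference of the two ensemble means divided by the combined uncertainty. The function names, the flat-list layout of the matrices, and the use of per-element standard deviations as uncertainties are all assumptions. The Bonferroni step itself is standard: each P value is multiplied by the number of elements tested and capped at 1.

```python
import math


def chi2_consistency_pvalues(mean_a, sd_a, mean_b, sd_b):
    """P value per matrix element for agreement between two ensembles.

    Inputs are flat lists over matrix elements. The statistic (an
    assumption, not taken from the caption) is
        chi2 = (mean_a - mean_b)^2 / (sd_a^2 + sd_b^2)
    with one degree of freedom per element.
    """
    pvalues = []
    for ma, sa, mb, sb in zip(mean_a, sd_a, mean_b, sd_b):
        chi2 = (ma - mb) ** 2 / (sa ** 2 + sb ** 2)
        # Survival function of a chi-square with 1 degree of freedom.
        pvalues.append(math.erfc(math.sqrt(chi2 / 2.0)))
    return pvalues


def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capped at 1."""
    n_tests = len(pvalues)
    return [min(1.0, p * n_tests) for p in pvalues]


# First element agrees exactly; second differs by many standard deviations.
raw = chi2_consistency_pvalues([0.5, 0.0], [1.0, 1.0], [0.5, 10.0], [1.0, 1.0])
corrected = bonferroni(raw)
inconsistent = [i for i, p in enumerate(corrected) if p < 0.05]
```

In this toy input only the second element would be flagged; the statement "no elements have Bonferroni-corrected P < 0.05" corresponds to `inconsistent` being empty.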
SI Figure 9. Ensemble means for matrix elements inferred from Abf1p PBM data using matrix widths ranging from 14 to 26 (indicated on the left side of each plot). Matrices in each MCMC ensemble were rescaled to provide maximum consistency between the ensembles. The resulting χ^{2} consistency P values for the central matrix elements common to all ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. Red lines delineate matrix boundaries. The rows of each matrix refer to bases A, C, G, and T (top to bottom).
SI Figure 10. TF model parameters inferred from Abf1p ChIP-chip data. (a and b) Mean values (a) and rmsds (b) of energy matrix elements across models in the MCMC ensemble Q_{ChIP}. (c) These are plotted against each other. (Insets) Raw MCMC histograms of the values obtained for the matrix elements circled in a and b and highlighted in c. These are the same matrix elements highlighted in Fig. 1.
SI Figure 11. Abf1p model predictions using Q_{ChIP}. (a) Histogram of Q_{ChIP} HFs for all 20-bp sites in the intergenic DNA of S. cerevisiae. The leftmost bar of the histogram contains the vast majority of intergenic sites and has been truncated. (b) Mean X-statistic of regions within each z-bin plotted against the mean (red square) and rmsd (error bar) fraction classified as bound by models in Q_{ChIP}. The green line indicates the cutoff used by Lee et al. (2) to delineate putatively bound regions. The X-statistics of all probed sequences are histogrammed in the background for reference. (c) This 2D histogram shows the mean putative energies assigned by Q_{ChIP} to ungapped orthologous intergenic site pairs in S. cerevisiae and S. bayanus. Dashed lines indicate the energy cutoff of 1.
SI Figure 12. EMA likelihood analysis does not overfit Lee et al.'s (2) Abf1p ChIP-chip data. (a and b) The mean matrix elements determined by MCMC sampling using two disjoint halves of the data. (c) Corresponding χ^{2} consistency P values. No elements have Bonferroni-corrected P < 0.05.
SI Figure 13. Ensemble means for matrix elements inferred from Abf1p ChIP-chip data using quantizations ranging from 20 to 200 sequences per z-bin (indicated on the left side of each plot). χ^{2} consistency P values for matrix elements between these ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. The rows of each matrix refer to bases A, C, G, and T (top to bottom).
SI Figure 14. Ensemble means for matrix elements inferred from Abf1p ChIP-chip data using matrix widths ranging from 14 to 26 (indicated on the left side of each plot). Matrices in each MCMC ensemble were rescaled to provide maximum consistency between the ensembles. The resulting χ^{2} consistency P values for the central matrix elements common to all ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. Red lines delineate matrix boundaries. The rows of each matrix refer to bases A, C, G, and T (top to bottom).
