Precise physical models of protein–DNA interaction from high-throughput data

Contributed by Curtis G. Callan, Jr., November 8, 2006 (received for review September 30, 2006)
Kinney et al. 10.1073/pnas.0609908104.
Supporting Information
Files in this Data Supplement:
SI Table 1
SI Figure 5
SI Text
SI Figure 6
SI Figure 7
SI Figure 8
SI Figure 9
SI Figure 10
SI Figure 11
SI Figure 12
SI Figure 13
SI Figure 14
SI Figure 5
Fig. 5. Non-Gaussian intensity ratios. (a) Histogram of Abf1p PBM LIRs from Mukherjee et al.'s (1) data. (b) Histogram of corresponding intensity ratios given by exponentiating these LIRs (using base 2). (Inset) A zoomed-in view of the highlighted region of the tail. The green line in each plot indicates the cut Mukherjee et al. used to delineate putatively bound regions.
1. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML (2004) Nat Genet 36:1331–1339.
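As a small illustration of the transformation in panel b, the following Python sketch converts base-2 LIRs back to plain intensity ratios. The function name and the example values are hypothetical and are not taken from the PBM data; the point is only that equal steps in LIR become multiplicative steps in the ratio, which stretches the upper tail.

```python
def lir_to_ratio(lirs):
    """Convert base-2 log intensity ratios (LIRs) to plain intensity ratios."""
    return [2.0 ** x for x in lirs]


# Hypothetical LIR values, chosen only for illustration.
example_lirs = [-1.0, 0.0, 1.0, 3.0]
example_ratios = lir_to_ratio(example_lirs)

for lir, ratio in zip(example_lirs, example_ratios):
    print(f"LIR = {lir:+.1f}  ->  ratio = {ratio:.2f}")
```

Here an LIR of 3 maps to a ratio of 8, whereas an LIR of -1 maps to 0.5, so values that are evenly spaced on the log scale are unevenly spaced on the ratio scale.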
SI Figure 6
Fig. 6. Ensemble means for matrix elements inferred from Abf1p PBM data using 20 (a) and 50 (b) sequences per z-bin. (c) χ^{2} consistency P values for matrix elements between these two ensembles. No elements have Bonferroni-corrected P < 0.05.
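The consistency test in panel c can be sketched as follows. This is one plausible reading, not the authors' stated procedure: it assumes each matrix element is summarized in each ensemble by a mean and an rmsd, that the two estimates are independent and approximately Gaussian, and that the Bonferroni correction multiplies each P value by the number of elements tested. All function names are hypothetical.

```python
import math


def chi2_consistency_p(mean_a, sd_a, mean_b, sd_b):
    """P value for agreement of two estimates of one matrix element.

    ASSUMPTION: chi^2 = (difference of means)^2 / (sum of variances),
    with one degree of freedom. For one degree of freedom the upper-tail
    probability of chi^2 equals erfc(sqrt(chi^2 / 2)).
    """
    chi2 = (mean_a - mean_b) ** 2 / (sd_a ** 2 + sd_b ** 2)
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(p_values):
    """Bonferroni correction: scale each P value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical (mean, rmsd) pairs for two matrix elements in two ensembles.
ensemble_a = [(0.80, 0.10), (1.50, 0.20)]
ensemble_b = [(0.85, 0.10), (1.40, 0.20)]

raw_p = [chi2_consistency_p(ma, sa, mb, sb)
         for (ma, sa), (mb, sb) in zip(ensemble_a, ensemble_b)]
corrected_p = bonferroni(raw_p)
inconsistent = [i for i, p in enumerate(corrected_p) if p < 0.05]
```

Under these assumptions, an empty `inconsistent` list corresponds to the caption's statement that no element has a Bonferroni-corrected P < 0.05.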
SI Figure 7
Fig. 7. MCMC convergence. (a) The mean intra-run variance vs. inter-run variance of each matrix element given by MCMC runs on Mukherjee et al.'s (1) PBM data. These were computed by using the first 20, 100, 1,000, and 5,000 models recorded in each run. The green line has slope 1. (b) The per-datum log likelihood of models in each of the 10 PBM MCMC runs as a function of the order in which those models were sampled. (Inset) A blowup of results for the first 100 models sampled in each run. (c and d) Similar plots for MCMC runs using Lee et al.'s (2) ChIP-chip data.
1. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML (2004) Nat Genet 36:1331–1339.
2. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. (2002) Science 298:799–804.
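The diagnostic in panels a and c can be sketched in Python as below. The caption does not give the exact definitions, so this sketch assumes that the intra-run variance is the within-run sample variance averaged over runs, and that the inter-run variance is the variance of the samples pooled across all runs. With those assumed definitions, runs that sample the same distribution give nearly equal values (a point near the slope-1 line), whereas runs stuck in different regions give a much larger pooled variance. The function name and example values are hypothetical.

```python
from statistics import mean, pvariance


def intra_and_inter_run_variance(runs, n_first):
    """Compare within-run and pooled variance for ONE matrix element.

    runs    : list of lists, one list of sampled values per MCMC run
    n_first : use only the first n_first recorded models of each run

    ASSUMPTION: "inter-run" variance is taken to be the variance of all
    truncated runs pooled together; the source may define it differently.
    """
    truncated = [run[:n_first] for run in runs]
    intra = mean([pvariance(run) for run in truncated])
    pooled = [x for run in truncated for x in run]
    inter = pvariance(pooled)
    return intra, inter


# Two hypothetical runs that agree, and two that have not yet mixed.
agreeing_runs = [[0.0, 2.0], [0.0, 2.0]]
unmixed_runs = [[0.0, 2.0], [10.0, 12.0]]

print(intra_and_inter_run_variance(agreeing_runs, 2))
print(intra_and_inter_run_variance(unmixed_runs, 2))
```

Repeating the calculation for increasing `n_first` mirrors the caption's use of the first 20, 100, 1,000, and 5,000 models from each run.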
SI Figure 8
Fig. 8. EMA likelihood analysis does not overfit Mukherjee et al.'s (1) Abf1 PBM data. (a and b) Shown are the mean matrix elements determined by MCMC sampling using two disjoint halves of the data. (c) Corresponding χ^{2} consistency P values. No elements have Bonferroni-corrected P < 0.05.
1. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML (2004) Nat Genet 36:1331–1339.
SI Figure 9
Fig. 9. Ensemble means for matrix elements inferred from Abf1p PBM data using matrix widths ranging from 14 to 26 (indicated on the left side of each plot). Matrices in each MCMC ensemble were rescaled to provide maximum consistency between the ensembles. The resulting χ^{2} consistency P values for the central matrix elements common to all ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. Red lines delineate matrix boundaries. The rows of each matrix refer to bases A, C, G, and T (top to bottom).
SI Figure 10
Fig. 10. TF model parameters inferred from Abf1p ChIP-chip data. (a and b) Mean values (a) and rmsds (b) of energy matrix elements across models in the MCMC ensemble Q_{ChIP}. (c) These are plotted against each other. (Insets) Shown are raw MCMC histograms of the values obtained for the matrix elements circled in a and b and highlighted in c. These are the same matrix elements highlighted in Fig. 1.
SI Figure 11
Fig. 11. Abf1p model predictions using Q_{ChIP}. (a) Histogram of Q_{ChIP} HFs for all 20-bp sites in the intergenic DNA of S. cerevisiae. The leftmost bar of the histogram contains the vast majority of intergenic sites and has been truncated. (b) Mean X-statistic of regions within each z-bin plotted against the mean (red square) and rmsd (error bar) fraction classified as bound by models in Q_{ChIP}. The green line indicates the cutoff used by Lee et al. (1) to delineate putatively bound regions. The X-statistics of all probed sequences are histogrammed in the background for reference. (c) This 2D histogram shows the mean putative energies assigned by Q_{ChIP} to ungapped orthologous intergenic site pairs in S. cerevisiae and S. bayanus. Dashed lines indicate the energy cutoff of 1.
1. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. (2002) Science 298:799–804.
SI Figure 12
Fig. 12. EMA likelihood analysis does not overfit Lee et al.'s (1) Abf1p ChIP-chip data. (a and b) The mean matrix elements determined by MCMC sampling using two disjoint halves of the data. (c) Corresponding χ^{2} consistency P values. No elements have Bonferroni-corrected P < 0.05.
1. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. (2002) Science 298:799–804.
SI Figure 13
Fig. 13. Ensemble means for matrix elements inferred from Abf1p ChIP-chip data using quantizations ranging from 20 to 200 sequences per z-bin (indicated on the left side of each plot). χ^{2} consistency P values for matrix elements between these ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. The rows of each matrix refer to bases A, C, G, and T (top to bottom).
SI Figure 14
Fig. 14. Ensemble means for matrix elements inferred from Abf1p ChIP-chip data using matrix widths ranging from 14 to 26 (indicated on the left side of each plot). Matrices in each MCMC ensemble were rescaled to provide maximum consistency between the ensembles. The resulting χ^{2} consistency P values for the central matrix elements common to all ensembles are shown at the top. No elements have Bonferroni-corrected P < 0.05. Red lines delineate matrix boundaries. The rows of each matrix refer to bases A, C, G, and T (top to bottom).