Discovering gapped binding sites of yeast transcription factors

Chien-Yu Chen*, Huai-Kuang Tsai, Chen-Ming Hsu, Mei-Ju May Chen§, Hao-Geng Hung, Grace Tzu-Wei Huang, and Wen-Hsiung Li**,††
*Department of Bio-Industrial Mechatronics Engineering,
§Graduate Institute of Biomedical Electronics and Bioinformatics, and
Department of Computer Science and Informatics Engineering, National Taiwan University, Taipei 106, Taiwan;
Institute of Information Science and
Research Center for Biodiversity and Genomics Research Center, Academia Sinica, Taipei 115, Taiwan;
Department of Computer Science and Engineering, Yuan Ze University, Tao-Yuan 320, Taiwan; and
**Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637
Contributed by Wen-Hsiung Li, December 24, 2007 (received for review November 14, 2007)

Abstract

A gapped transcription factor-binding site (TFBS) contains one or more highly degenerate positions. Discovering gapped motifs is difficult because allowing highly degenerate positions in a motif greatly enlarges the search space and complicates the discovery process. Here, we propose a method for discovering TFBSs, especially gapped motifs. We use ChIP-chip data to judge the binding strength of a TF to a putative target promoter and use orthologous sequences from related species to judge the degree of evolutionary conservation of a predicted TFBS. Candidate motifs are constructed by growing compact motif blocks and by concatenating two candidate blocks, allowing 0–15 degenerate positions in between. The resultant patterns are statistically evaluated for their ability to distinguish between target and nontarget genes. Then, a position-based ranking procedure is proposed to enhance the signals of true motifs by collecting position concurrences. Empirical tests on 32 known yeast TFBSs show that the method is highly accurate in identifying gapped motifs, outperforming current methods, and that it also works well on ungapped motifs. Predictions for an additional 54 TFs successfully discover 11 gapped and 38 ungapped motifs supported by the literature. Our method achieves high sensitivity and specificity for predicting experimentally verified TFBSs.
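The block-concatenation and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the statistical test, so a hypergeometric enrichment tail is assumed here as a stand-in. The function names (`gapped_patterns`, `hypergeom_tail`, `score_gapped`) are hypothetical, only the forward strand is scanned, and ChIP-chip binding strength and evolutionary conservation are ignored.

```python
import re
from math import comb


def gapped_patterns(left, right, max_gap=15):
    """Yield (gap, regex) for left block + `gap` fully degenerate positions + right block."""
    for gap in range(max_gap + 1):
        # [ACGT]{gap} stands for `gap` positions that may be any nucleotide.
        yield gap, re.compile(f"{left}[ACGT]{{{gap}}}{right}")


def count_hits(pattern, promoters):
    """Number of promoter sequences containing at least one match."""
    return sum(1 for seq in promoters if pattern.search(seq))


def hypergeom_tail(k, n_target, n_total, n_hits):
    """P(X >= k): chance that k or more of the n_hits matching promoters
    fall among the n_target targets when targets are drawn at random."""
    upper = min(n_target, n_hits)
    ways = sum(
        comb(n_hits, i) * comb(n_total - n_hits, n_target - i)
        for i in range(k, upper + 1)
    )
    return ways / comb(n_total, n_target)


def score_gapped(left, right, targets, nontargets, max_gap=15):
    """Score every gap length (assumed test: hypergeometric enrichment).

    Returns a list of (p_value, gap, target_hits, total_hits), best first.
    """
    n_total = len(targets) + len(nontargets)
    results = []
    for gap, pattern in gapped_patterns(left, right, max_gap):
        k = count_hits(pattern, targets)
        m = k + count_hits(pattern, nontargets)
        if m == 0:
            continue  # pattern never occurs, nothing to evaluate
        p = hypergeom_tail(k, len(targets), n_total, m)
        results.append((p, gap, k, m))
    return sorted(results)
```

Calling `score_gapped("CGG", "CCG", targets, nontargets)` on two lists of promoter strings returns the gap lengths ordered by how strongly the resulting pattern separates targets from nontargets. Scanning all 16 gap lengths for every pair of blocks shows why the search space grows quickly once degenerate positions are allowed.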

Footnotes

  • ††To whom correspondence should be addressed. E-mail: whli{at}uchicago.edu
  • Author contributions: C.-Y.C., H.-K.T., and W.-H.L. designed research; C.-Y.C., H.-K.T., C.-M.H., M.-J.M.C., H.-G.H., and G.T.-W.H. performed research; C.-Y.C., H.-K.T., C.-M.H., M.-J.M.C., H.-G.H., and G.T.-W.H. analyzed data; and C.-Y.C., H.-K.T., G.T.-W.H., and W.-H.L. wrote the paper.

  • The authors declare no conflict of interest.

  • This article contains supporting information online at www.pnas.org/cgi/content/full/0712188105/DC1.
