CisModule: De novo discovery of cis-regulatory modules by hierarchical mixture modeling
- *Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138; and †Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115
- Edited by Michael S. Waterman, University of Southern California, Los Angeles, CA (received for review April 23, 2004)
Abstract
The regulatory information for a eukaryotic gene is encoded in cis-regulatory modules, and the binding sites for a set of interacting transcription factors tend to colocalize within the same modules. Current de novo motif discovery methods do not take advantage of this knowledge. We propose a hierarchical mixture approach to model the cis-regulatory module structure. Based on this model, we develop a new de novo motif-module discovery algorithm, CisModule, for Bayesian inference of module locations and within-module motif sites. Dynamic programming-like recursions reduce the computational complexity from exponential to linear in sequence length. Using both simulated and real data sets, we demonstrate that CisModule not only predicts modules accurately but is also more sensitive in detecting motif patterns and binding sites than standard motif discovery methods.
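To illustrate how a two-level mixture can be summed in time linear in sequence length, the following is a minimal sketch and not the authors' implementation. It assumes a single motif, a fixed module length `l`, a uniform background `BG`, a per-position module start probability `r`, and a within-module motif start probability `q`; all of these names and simplifications are hypothetical and chosen only for illustration. A forward recursion sums over every placement of modules in the sequence, and a nested recursion sums over every placement of motif sites inside each candidate module.

```python
# Assumed uniform background nucleotide distribution (illustrative only).
BG = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pwm_prob(pwm, site):
    """Probability of `site` under a PWM given as a list of per-column dicts."""
    p = 1.0
    for col, base in zip(pwm, site):
        p *= col[base]
    return p


def module_likelihood(seg, pwm, q):
    """Inner mixture: sum over all placements of motif sites inside one module.

    g[m] is the summed weight of the first m bases of the module, where each
    step is either one background base (weight 1 - q) or one motif site of
    width w ending at position m (weight q).
    """
    w = len(pwm)
    g = [1.0] + [0.0] * len(seg)
    for m in range(1, len(seg) + 1):
        g[m] = (1 - q) * BG[seg[m - 1]] * g[m - 1]
        if m >= w:
            g[m] += q * pwm_prob(pwm, seg[m - w:m]) * g[m - w]
    return g[-1]


def sequence_likelihood(seq, pwm, r, q, l):
    """Outer mixture: sum over all placements of length-l modules in `seq`.

    f[n] is the summed weight of the first n bases, where each step is either
    one background base (weight 1 - r) or one module of length l ending at
    position n (weight r).
    """
    f = [1.0] + [0.0] * len(seq)
    for n in range(1, len(seq) + 1):
        f[n] = (1 - r) * BG[seq[n - 1]] * f[n - 1]
        if n >= l:
            f[n] += r * module_likelihood(seq[n - l:n], pwm, q) * f[n - l]
    return f[-1]


if __name__ == "__main__":
    # Toy width-2 PWM that strongly prefers the site "AC" (made-up numbers).
    toy_pwm = [
        {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    ]
    print(sequence_likelihood("GGACGTACGG", toy_pwm, r=0.1, q=0.2, l=4))
```

As written, the sketch recomputes the inner recursion for each candidate module, so its cost is proportional to the sequence length times `l`, which is linear in sequence length for fixed `l`; enumerating every module and site configuration explicitly would instead grow exponentially with sequence length.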
Footnotes
- ‡ To whom correspondence should be addressed. E-mail: wwong{at}stat.harvard.edu.
- This paper was submitted directly (Track II) to the PNAS office.
- Abbreviations: TF, transcription factor; TFBS, TF-binding site; CRM, cis-regulatory module; HMx, hierarchical mixture; PWM, position-specific weight matrix; BP, BioProspector; Bcd, Bicoid; Hb, Hunchback; Kr, Krüppel.
- Copyright © 2004, The National Academy of Sciences





