Signatures of combinatorial regulation in intrinsic biological noise

Communicated by Stuart A. Rice, University of Chicago, Chicago, IL, September 19, 2008 (received for review June 10, 2008)
Abstract
Gene expression is controlled by the action of transcription factors that bind to DNA and influence the rate at which a gene is transcribed. The quantitative mapping between the regulator concentrations and the output of the gene is known as the cis-regulatory input function (CRIF). Here, we show how the CRIF shapes the form of the joint probability distribution of molecular copy numbers of the regulators and the product of a gene. Namely, we derive a class of fluctuation-based relations that relate the moments of the distribution to the derivatives of the CRIF. These relations are useful because they enable statistics of naturally arising cell-to-cell variations in molecular copy numbers to substitute for traditional manipulations for probing regulatory mechanisms. We demonstrate that these relations can distinguish super- and subadditive gene regulatory scenarios (molecular analogs of AND and OR logic operations) in simulations that faithfully represent bacterial gene expression. Applications and extensions to other regulatory scenarios are discussed.
Transcription factors regulate the expression of genes by binding to specific sites on the DNA that are typically spatially close to, sometimes even in, sequences that code for proteins. A group of such binding sites is collectively known as a cis-regulatory region. When a gene processes the effects of multiple transcription factors, one can view it as performing a computation. The inputs are the occupancies of the binding sites in the cis-regulatory region and the output is the rate of transcription. The simplest method of describing the relationship between the transcription factors and the output is by using Boolean logic (1,2). For example, 2 activators can regulate a gene with AND logic, in which both are required, or with OR logic, in which either transcription factor is sufficient for transcription. Knowing which mode of combinatorial regulation a gene employs can be important for determining its function in regulatory networks. For example, the coherent feed-forward loop, one of the most common motifs in gene regulatory networks (3), filters noise in upstream signals differently depending on how one of the participating genes integrates its inputs (4,5).
In general, the output of a gene will not be binary, and will depend on the concentrations of the transcription factors in a complex manner. The notion of logic operations can be generalized by introducing a continuous function that encodes the dependence of the rate of transcription on the concentrations of the inputs. Such “cis-regulatory input functions (CRIFs)” (5) have been evaluated experimentally for the well-studied lac operon. The CRIF of the wild type is complex (6,7), but those of certain mutant operons represent molecular analogs of binary logic gates (8). In higher organisms, many genes are regulated by a large number of transcription factors, often with multiple binding sites for each factor (9–11). It is increasingly clear that the action of many such regulators depends on their molecular context (12–19). Because CRIFs are capable of encoding arbitrarily complex modes of regulation, they will likely be even more useful for describing the control of transcription in these systems than in the simpler prokaryotic ones.
To date, the notion of a CRIF has largely been restricted to studying how the average rate of transcription depends on the average concentration of the transcription factor inputs by using bulk measurements. However, experiments performed on single cells reveal that because transcription factors are often present in low copy numbers, stochastic fluctuations in the concentrations of these molecules can have important consequences for gene regulation (20–25). Although a dialog between theory and experiment has revealed how noise enters the basic processes of transcription and translation (26,27) and propagates through a cascade of genes (23,28,29), the relationship between combinatorial regulation and stochastic fluctuations has received much less attention. In particular, it is unclear how the mode of regulation encoded in the CRIF is related to the statistics that characterize the distribution of protein concentrations across a population of cells.
Our approach to this area is motivated by the fact that, at equilibrium, the fluctuations of a property of a material can be related to the response to changes in a conjugate control parameter. A well-known example is the relation between the fluctuations in the energy of a system and its heat capacity. More generally, the fluctuation–dissipation theorem relates fluctuations about equilibrium to the relaxation of the system after it has been perturbed from equilibrium (30). General fluctuation–dissipation relations for linear (or linearized) chemical systems that relate the average values of the chemical concentrations to the second moments can also be derived (31) and have been applied to biological systems (29,32).
Here, we use a stochastic model of a gene regulated by an arbitrary number of transcription factors to show that logic operations are most closely related to the derivatives of the CRIF. In turn, the derivatives can be related to higher moments of the distribution of input and output copy numbers in analogy to molecular systems at equilibrium. These facts suggest means for inferring regulatory synergies from measurements that report on single cells. Using simulations of simple constructs with reasonable parameters, we demonstrate that the signatures of combinatorial logic should be detectable with presently available experimental methods. This is useful because proximity in DNA binding is not sufficient to infer combinatorial interactions, and they cannot be readily probed by traditional methods (e.g., knockouts) or high-throughput expression assays (e.g., microarray data). Broader implications of the work are discussed.
Results
In this section, we develop a model of gene expression and use it to derive a general expression relating the distribution of protein copy numbers to the derivatives of the CRIF. We then illustrate the utility of this relationship by considering idealized logic gates. Finally, predictions are made for an actual system to enable validation of the theory and its assumptions.
Fluctuation Relations for a Gene.
We model a gene regulated by an arbitrary number of transcription factors by using chemical reactions for the production and degradation of the transcription factor inputs and the output (Eq. 1), where the {X_{m}} for m = 1 to n are the transcription factor inputs to the logic gate, X_{0} is the measured output of the gene of interest, and arrows from and to ∅ denote synthesis and degradation, respectively. The rate of production of X_{0} is determined by the concentrations of the transcription factors and is encoded in the (dimensionless) cis-regulatory input function, which we denote symbolically by f(N_{1},N_{2},…,N_{n}), where N_{m} is the number of copies of X_{m}.
Although f is arbitrary with respect to the mode of regulation, our analytical treatment relies on 2 simplifying assumptions: (i) we model transcription and translation as a single composite step and (ii) we employ a quasi-steady-state approximation to absorb the association and dissociation of the regulators into the CRIF. The latter assumption is justified because those reactions are significantly faster than protein production and degradation (33). Although these assumptions enable the derivation of analytical results, we show with simulations that the conclusions of the theory hold when these elementary steps are considered explicitly.
To relate the CRIF to the noise properties of the system, we describe the dynamics of the gene by a master equation (31) (Eq. 2), where {N_{m}} denotes the set of copy numbers of the inputs and the reporter, and P is the probability of observing a particular combination of protein copy numbers. Ê_{m} is a step operator defined such that Ê_{m}h(N_{m}) = h(N_{m} + 1) and Ê_{m}^{−1}h(N_{m}) = h(N_{m} − 1), where h represents an arbitrary function of N_{m}. Eq. 2 simply tracks P({N_{m}}) assuming the protein copy numbers evolve according to the reactions in Eq. 1. The first 2 terms on the right-hand side of Eq. 2 represent the creation of the output and the inputs, respectively, and the last term represents degradation of all the species.
The parameter Ω in Eq. 2 quantifies the size of the system; it simply reflects the choice of units of the kinetic parameters. We include it explicitly to facilitate expansion of the master equation (31). To this end, motivated by the fact that fluctuations are expected to scale as Ω
After this change of variables, we expand the master equation in powers of Ω^{–}
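For reference, the standard system-size ansatz of ref. 31 separates each copy number into a macroscopic part and a fluctuation of order Ω^{1/2}. Assuming that this is the change of variables intended here (the symbols φ_{m} and ξ_{m} match those used in the remainder of the text), it can be written as:

```latex
% Assumed form: the standard van Kampen system-size ansatz, stated here as an assumption.
N_m = \Omega\,\phi_m + \Omega^{1/2}\,\xi_m , \qquad m = 0, 1, \dots, n
```

Under this assumption, φ_{m} plays the role of the average concentration of X_{m} and ξ_{m} = Ω^{−1/2}(N_{m} − 〈N_{m}〉) is the rescaled fluctuation, which agrees with the later description of ξ_{m} as, up to a factor involving the volume, the deviation of the concentration from its population average.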
Describing Combinatorial Regulation.
Consider a CRIF that is a function of the concentrations of 2 transcription factors, f(φ_{1},φ_{2}). How does the output respond to simultaneous changes in the concentrations of the inputs? Note that here we are not envisioning stochastic fluctuations, but small changes to the average concentrations due to perturbations (either natural or artificial cues) to the system. Eq. 3 then relates the response to these perturbations to the intrinsic noise that can be observed even when the system is unperturbed. By Taylor expanding f about the average concentrations, we obtain Eq. 4, in which the first and second terms give the responses to changes in only X_{1} and only X_{2}, respectively. The third term represents the synergistic contributions, and the mixed partial derivative is thus the lowest-order term that reports on them. The mixed partial derivative can be accessed through the 3-point correlation function 〈ξ_{0}ξ_{1}ξ_{2}〉, as shown by the specific form that Eq. 3 takes in the case of 2 regulators (j = 2), given as Eq. 5, where C is a positive constant. Thus, Eqs. 4 and 5 taken together suggest that information about synergistic regulation is available from the 3-point correlation function. A pedagogical discussion of the intuitive meaning of this result can be found in SI Appendix and Figs. S1 and S2.
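The two relations described in this paragraph can be sketched as follows. The first line is the ordinary Taylor expansion of f for small shifts Δφ_{1} and Δφ_{2} in the average concentrations (the Δ notation is introduced here for illustration); the second is a form consistent with the statement that the 3-point correlation function and the mixed partial derivative differ only by a positive constant C. The exact typesetting and any additional prefactors in the original equations may differ.

```latex
% Sketch consistent with the verbal description of Eqs. 4 and 5; not a verbatim copy.
\Delta f \approx \frac{\partial f}{\partial \phi_1}\,\Delta\phi_1
  + \frac{\partial f}{\partial \phi_2}\,\Delta\phi_2
  + \frac{\partial^2 f}{\partial \phi_1\,\partial \phi_2}\,\Delta\phi_1\,\Delta\phi_2 + \cdots

\langle \xi_0\,\xi_1\,\xi_2 \rangle = C\,\frac{\partial^2 f}{\partial \phi_1\,\partial \phi_2},
  \qquad C > 0
```

The omitted terms include the pure second derivatives in φ_{1} and φ_{2}, each of which involves a single regulator and therefore carries no information about synergy between the two.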
Idealized Two-Regulator Systems.
Although Eqs. 3 and 5 are general and apply regardless of the mode of regulation, it is instructive to consider a simple model for the CRIF. Consider the 2-activator system depicted schematically in Fig. 1. This system has 4 possible states, one corresponding to each combination of bound transcription factors. If the binding and unbinding of the transcription factors to the DNA are taken to be fast, one obtains Eq. 6 (34), where K_{i} is the Michaelis constant for the binding of the ith transcription factor and r_{α} is the rate of transcription when the system is in state α (α = 0, 1, 2, 12). For an AND gate, both regulators must be bound to initiate transcription, so r_{0} = r_{1} = r_{2} = 0 and r_{12} = r, whereas for an OR gate, binding of either regulator enables maximum production, so r_{0} = 0 and r_{1} = r_{2} = r_{12} = r. The f functions for these idealized systems are plotted in Fig. 1. The differences between the AND and OR gates are most visible at the corners of the plots, where one activator is expressed strongly and the other is expressed weakly. The gene activities vary continuously with the levels of the activators because they reflect averages over the states of the cis-regulatory regions sampled as the transcription factors associate and dissociate from the DNA.
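A four-state occupancy average of this kind is commonly written as below. This is a sketch that assumes the two activators bind independently (no cooperativity term) and that each state α is weighted by its quasi-steady-state probability; the normalization used in the original equation may differ.

```latex
% Assumes independent, fast binding of the two activators; normalization is an assumption.
f(\phi_1,\phi_2) =
  \frac{ r_0 + r_1\,\dfrac{\phi_1}{K_1} + r_2\,\dfrac{\phi_2}{K_2}
         + r_{12}\,\dfrac{\phi_1\phi_2}{K_1 K_2} }
       { \left(1 + \dfrac{\phi_1}{K_1}\right)\left(1 + \dfrac{\phi_2}{K_2}\right) }
```

With the AND-gate rates the numerator reduces to the doubly bound term alone, and with the OR-gate rates it contains every term except the unbound one, which reproduces the qualitative difference described at the corners of the plots.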
Above, we showed that the mixed partial derivative reports on synergies (Eq. 4). For this model, this derivative is given by Eq. 7. Because K_{1} and K_{2} are positive, the sign of the right-hand side of this equation is determined by the quantity in square brackets. This is the rate when both regulators are bound (r_{12}) less the novel contributions from having one regulator bound (r_{1}−r_{0} and r_{2}−r_{0} for regulators 1 and 2, respectively) and the basal rate (r_{0}). Thus, this mixed derivative reports on the novel effects that emerge when 2 regulators are bound. Substitution of the values defining the AND and OR gates shows that the mixed partial derivative is always positive for the former and always negative for the latter. There is a simple physical interpretation of these results. An AND gate requires that both transcription factors be present for the gene to be expressed, so the effect of having both proteins is greater than the sum of the individual effects (no output). In the case of an OR gate, once one transcription factor is bound, no additional activity is obtained from the binding of the second, and the net effect is therefore less than the sum of the individual effects of the regulators.
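The sign argument can be checked numerically. The sketch below assumes the usual four-state occupancy form of f for independently binding activators; the function names, the unit values of K_{1}, K_{2}, and r, and the evaluation point are illustrative choices, not values from the paper. It compares a finite-difference estimate of the mixed partial derivative with the closed form, a positive prefactor times [r_{12} − r_{1} − r_{2} + r_{0}].

```python
def crif(phi1, phi2, r0, r1, r2, r12, K1=1.0, K2=1.0):
    """Four-state CRIF, assuming independent and fast binding (an assumption of this sketch)."""
    x, y = phi1 / K1, phi2 / K2
    return (r0 + r1 * x + r2 * y + r12 * x * y) / ((1.0 + x) * (1.0 + y))


def mixed_partial_numeric(f, phi1, phi2, h=1e-4, **rates):
    """Central finite-difference estimate of d^2 f / (d phi1 d phi2)."""
    return (f(phi1 + h, phi2 + h, **rates) - f(phi1 + h, phi2 - h, **rates)
            - f(phi1 - h, phi2 + h, **rates) + f(phi1 - h, phi2 - h, **rates)) / (4.0 * h * h)


def mixed_partial_closed(phi1, phi2, r0, r1, r2, r12, K1=1.0, K2=1.0):
    """Closed form for the same model: positive prefactor times [r12 - r1 - r2 + r0]."""
    prefactor = K1 * K2 / ((K1 + phi1) ** 2 * (K2 + phi2) ** 2)
    return prefactor * (r12 - r1 - r2 + r0)


# Idealized gates from the text, with the maximal rate r set to 1 for illustration.
AND_GATE = dict(r0=0.0, r1=0.0, r2=0.0, r12=1.0)
OR_GATE = dict(r0=0.0, r1=1.0, r2=1.0, r12=1.0)

if __name__ == "__main__":
    for name, gate in (("AND", AND_GATE), ("OR", OR_GATE)):
        closed = mixed_partial_closed(0.5, 2.0, **gate)
        numeric = mixed_partial_numeric(crif, 0.5, 2.0, **gate)
        print(f"{name}: closed form {closed:+.5f}, finite difference {numeric:+.5f}")
```

Because the prefactor is positive for any positive K_{1}, K_{2} and non-negative concentrations, the sign depends only on the bracket, which is +r for the AND gate and −r for the OR gate in this model.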
Eq. 5 shows that the sign of the mixed partial derivative is the same as the sign of the 3-point correlation function involving both regulators and the output. Because our analytical results were derived with simplifying assumptions as discussed above, we performed stochastic simulations (35) of genes regulated by AND and OR logic in which the processes of transcription, translation, DNA binding and unbinding, and protein and mRNA degradation were treated explicitly (Fig. 2). Consider the mean concentration of the output for genes regulated by AND and OR logic gates. Although the mean for the OR gate is higher because either activator can initiate transcription, a change in the kinetic parameters could yield an AND gate with a mean output concentration identical to that for the OR gate and vice versa. Thus, there is no unique signature of the regulatory mechanism present in this statistic. Similarly, for the variance of the output and the cross-correlation involving either individual input and the output, it is possible to tune the kinetic parameters for either the AND or the OR gate to achieve any specific positive value of these statistics. In contrast, the 3-point correlation function is positive for AND gates and negative for OR gates regardless of the parameter values. Thus, these simulations support the notion that the mode of combinatorial regulation is unambiguously reflected in this correlation function. It will be positive for synergistic (AND-like) regulation, negative for subadditive (OR-like) regulation, and zero for additive regulation. Although the above discussion has focused on a gene regulated by 2 activators, the 3-point correlation function also distinguishes between qualitatively different modes of combinatorial regulation in other situations, as summarized in Table 1.
Practical Issues for the Detection of Noise Signatures.
Given single-cell data for the joint distribution of copy numbers of X_{0} and the transcription factors X_{1} and X_{2}, we compute the 3-point correlation function 〈ξ_{0}ξ_{1}ξ_{2}〉 as follows. As discussed above, up to a factor involving the volume, ξ_{m} is the difference between the value of the concentration of X_{m} in a single cell and its average across the population of cells. We thus calculate the average level of each species in the instrument response units (〈N_{m}〉, for m = 0,1,2), subtract the average from the level of the species in each cell to obtain a deviation (δN_{m} = N_{m} − 〈N_{m}〉), multiply the 3 deviations for each cell together (δN_{1}δN_{2}δN_{0}), and average the resulting product over the population of cells (〈δN_{1}δN_{2}δN_{0}〉); 〈ξ_{1}ξ_{2}ξ_{0}〉 = Ω^{–}
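The procedure above amounts to a few array operations. The following is a minimal sketch using NumPy; the function name and the toy arrays are illustrative and are not data from the paper. It returns 〈δN_{0}δN_{1}δN_{2}〉 in instrument units, which differs from 〈ξ_{0}ξ_{1}ξ_{2}〉 only by a positive factor of the system size, so the sign is unaffected.

```python
import numpy as np


def three_point_correlation(n0, n1, n2):
    """Population average of the product of deviations, <dN0 dN1 dN2>.

    n0, n1, n2: per-cell levels (instrument units) of the output and the 2 regulators.
    """
    n0, n1, n2 = (np.asarray(a, dtype=float) for a in (n0, n1, n2))
    if not (n0.shape == n1.shape == n2.shape):
        raise ValueError("all three arrays must hold one value per cell")
    d0, d1, d2 = n0 - n0.mean(), n1 - n1.mean(), n2 - n2.mean()
    return float(np.mean(d0 * d1 * d2))


if __name__ == "__main__":
    # Toy population of 4 cells covering every on/off combination of the regulators.
    reg1 = [0, 0, 1, 1]
    reg2 = [0, 1, 0, 1]
    print(three_point_correlation([0, 0, 0, 1], reg1, reg2))  # AND-like output -> 0.0625
    print(three_point_correlation([0, 1, 1, 1], reg1, reg2))  # OR-like output  -> -0.0625
    print(three_point_correlation([0, 1, 1, 2], reg1, reg2))  # additive output -> 0.0
```

In the toy example, the sign pattern (positive, negative, zero) matches the qualitative classification of synergistic, subadditive, and additive regulation described earlier.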
The data needed for this analysis can be obtained from flow cytometry, in which several molecular species are simultaneously fluorescently tagged and detected to estimate their concentrations in many individual cells. Although more reliable data can be obtained by engineering fusions to fluorescently active proteins, antibodies coupled to dyes can often be used and obviate molecular biological manipulation. A limitation of flow cytometry is that the measurement errors can be quite large. Nonetheless, because determining the qualitative mode of combinatorial regulation only requires determining the sign of the 3-point correlation function, it is relatively robust to such errors. Fig. S3 in SI Appendix shows that numbers of cells that are routinely accessible (about 10^{6}) are sufficient to determine the sign of the correlation function with high certainty even when the measurement errors are of the same magnitude as the fluorescence signal.
A more challenging issue is that fluctuations that affect gene expression globally [e.g., changes in the number of ribosomes or RNA polymerase (RNAP) molecules, variations in cell sizes, cell cycle stages] can create correlations between molecular species which are independent of transcriptional regulation. In the literature, such variability is termed “extrinsic noise” and is distinguished from variability directly associated with the processes of gene expression under consideration (“intrinsic noise”) (20,36). To explore this issue, we performed simulations of the idealized logic gates in which we explicitly included fluctuations in RNAP. As expected, correlations were observed even between species that had no causal relationship (i.e., the 2 inputs). The fluctuations in RNAP led to a decrease in the 3-point correlation function because they increased the output less than what was expected from summing the effects of the concomitant increases in the individual inputs. Variations in cell size can be detected by using forward scatter in a flow cytometry experiment; other extrinsic fluctuations can be quantified by measuring an additional species without any (direct or indirect) regulatory relation to the species of interest. Associated spurious correlations can be suppressed by sorting the data into bins for the forward scatter or the concentration of the additional species, computing the statistics of interest separately for each bin, and then pooling the results (SI Appendix, Fig. S4). The main limitation of this procedure is that one must be able to detect sufficient variations in the species of interest beyond those arising from the extrinsic noise.
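The binning procedure can be sketched as follows. The text does not specify how bins are chosen or how the per-bin results are pooled, so this example makes two assumptions: bins are quantiles of the extrinsic marker (forward scatter or an unregulated reporter), and the per-bin correlations are averaged with weights equal to the number of cells in each bin. The function and parameter names are hypothetical.

```python
import numpy as np


def binned_three_point(n0, n1, n2, extrinsic, n_bins=10, min_cells=3):
    """Pool <dN0 dN1 dN2> computed separately within quantile bins of an extrinsic marker."""
    n0, n1, n2, extrinsic = (np.asarray(a, dtype=float) for a in (n0, n1, n2, extrinsic))
    edges = np.quantile(extrinsic, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so that bin labels run from 0 to n_bins - 1.
    labels = np.digitize(extrinsic, edges[1:-1])
    total, n_used = 0.0, 0
    for b in range(n_bins):
        mask = labels == b
        count = int(mask.sum())
        if count < min_cells:
            continue  # too few cells to estimate means and deviations in this bin
        d0 = n0[mask] - n0[mask].mean()
        d1 = n1[mask] - n1[mask].mean()
        d2 = n2[mask] - n2[mask].mean()
        total += count * float(np.mean(d0 * d1 * d2))
        n_used += count
    if n_used == 0:
        raise ValueError("no bin contains enough cells")
    return total / n_used
```

Subtracting the means within each bin removes shifts that move all three species together, which is the kind of spurious correlation that global fluctuations produce; finer bins remove more of it at the cost of fewer cells per bin.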
Phage-λ Construct.
To assess whether signatures of combinatorial regulation would be detectable experimentally, we consider the genetic circuit based on the phage-λ operon from ref. 37. In this system, the P_{RM} promoter and O_{R}2 binding site are in their natural locations and an additional binding site for the Escherichia coli lac activator CRP is located upstream; the cI dimer activates transcription by binding to O_{R}2 (Fig. 3A). The authors also created a second construct that only contained the P_{RM} promoter and O_{R}2 binding site. By measuring the expression from these 2 constructs in the presence and absence of cI protein, they were able to assay the activities resulting from each of the four possible combinations of transcription factors bound: none, just CRP, just cI, or both. These had activities of 21, 206, 158, and 1,420 units, respectively, showing that these protein factors function synergistically (1,420 units > 158 + 206 units). The authors also identified a mutant that served as a positive control (pc). In that case, the combination of cI and CRP was nearly additive (293 units).
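As a worked illustration, the measured wild-type activities can be inserted into the bracketed quantity from the idealized 2-regulator model, r_{12} − r_{1} − r_{2} + r_{0}. Treating the four measured activities as the state rates r_{α} is an assumption of this sketch (with r_{1} assigned to CRP alone and r_{2} to cI alone; the assignment does not affect the result).

```latex
% Wild-type activities from the text: none = 21, CRP only = 206, cI only = 158, both = 1420.
r_{12} - r_{1} - r_{2} + r_{0} = 1420 - 206 - 158 + 21 = 1077 > 0
```

The positive value corresponds to synergistic (AND-like) regulation in the idealized model, in line with the simpler comparison 1,420 > 158 + 206 = 364 given above.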
We performed realistic stochastic simulations (35) of this circuit by using parameters taken from the literature (37–40). The reactions and parameters that we used can be found in SI Appendix, Tables S2 and S3. We expect these simulations to faithfully reflect the biological system because phage-λ is a well-studied system for which many parameters are measured; comparable models are capable of accurately reproducing distributions of protein concentrations in prokaryotic systems (33,41). To understand how the presence of additional regulators affects the noise signatures, we also include the phage-λ protein Cro, which competes with cI for binding to the O_{R}2 binding site and represses transcription from P_{RM} when it is bound.
As expected from the theory (Eq. 5), the 3-point correlation function is always positive for the wild-type construct, which reflects the synergy between CRP and cI (Fig. 3B, red symbols). As more Cro is added, fluctuations in cI have smaller effects because the O_{R}2 binding site is occupied more often by Cro. When the concentration of Cro is high, Cro is nearly always bound, and fluctuations in cI have almost no effect on the output. Consistent with the theoretical predictions, simulations with parameters corresponding to the pc mutant yielded 3-point correlation functions close to zero, which indicates that the activators function nearly independently in that case (Fig. 3B, blue symbols). Thus, the simulations show that synergistic and additive modes of combinatorial regulation can be distinguished by their noise signatures in a realistic situation. The dependence on the concentration of Cro in Fig. 3B shows that the approach is robust to fluctuations in additional, unmeasured species so long as they do not dominate the behavior of the gene.
Discussion
Recently, calculation of the noise properties of biological systems has received much attention. Most studies have focused on the variance of the protein concentration that has been calculated for models of single unregulated genes (26,27,32), simple gene circuits (32), cascades of genes (23), and signaling cascades (28,42,43). A result for the variance of the output of a gene regulated combinatorially by multiple transcription factors has also been reported recently (44), and this statistic, together with the half-autocorrelation time, has been suggested to be useful for discriminating between competing models (45). In contrast, we have focused on the cross-correlations between the regulators and the gene product. It is these cross-correlations, rather than the variance, which contain the unambiguous signature of the mode of combinatorial regulation. In particular, we showed that the sign of the 3-point correlation function can distinguish AND- and OR-like logic operations in both idealized and realistic systems with 2 regulators. In the latter case, the theory led to an experimentally testable prediction: the 3-point correlation function will be positive for the natural construct and close to zero for the pc mutant. This prediction can be tested by using flow cytometry to measure the concentrations of the regulators and output (e.g., as in refs. 33 and 46). Although we have focused on the sign of the correlation function, which is independent of units, in some applications it may be useful to utilize a dimensionless quantity. By analogy with the Pearson pair correlation coefficient, the 3-point function can be normalized to give a value between −1 and 1 (see SI Appendix, Figs. S5 and S6).
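One possible normalization is sketched below; it is not necessarily the one used in SI Appendix. It divides the 3-point function by the product of the cube roots of the third absolute central moments of the three species, which bounds the result between −1 and 1 by the generalized Hölder inequality. The function name is hypothetical.

```python
import numpy as np


def normalized_three_point(n0, n1, n2):
    """Dimensionless 3-point coefficient in [-1, 1] (one possible normalization)."""
    devs = [np.asarray(a, dtype=float) - np.mean(a) for a in (n0, n1, n2)]
    # Cube root of the third absolute central moment of each species.
    scales = [np.mean(np.abs(d) ** 3) ** (1.0 / 3.0) for d in devs]
    scale = float(np.prod(scales))
    if scale == 0.0:
        raise ValueError("at least one species shows no variation")
    return float(np.mean(devs[0] * devs[1] * devs[2]) / scale)
```

The sign of this coefficient is the same as that of the unnormalized function, so the qualitative classification is unchanged; the normalization only makes magnitudes comparable across instruments and units.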
Despite the apparent importance of combinatorial regulation (12–19), knowledge of it is presently limited to a few well-studied examples such as the mediation of the response to endocrine signaling by nonsteroid nuclear receptors (12) and the regulation of Oct genes by POU and SOX in pluripotent mammalian cells (13). We believe that the approach presented here will be useful for probing transcription factor synergies and thus expanding our understanding of combinatorial regulation. Beyond the analysis illustrated here for the bacterial construct, one can imagine combining knowledge of the signatures of combinatorial regulation with statistical inference techniques such as Bayesian methods (47). In such an approach, one could use existing statistical methods to characterize the causal relationships and identify genes thought to have more than one input. One could then use the 3-point correlation function to determine the degree to which each such node behaves synergistically. Finally, the causal relationships could be reanalyzed in a Bayesian framework by using conditional probability distributions that account for the combinatorial regulation (i.e., distributions with the appropriate third moments).
The applicability of the analysis in experimental situations is premised on the idea that variations in the output are related to the variations in the regulator levels measured at the same time. If there were a long lag in the detectable response of a gene in comparison with the rates at which the levels of the relevant transcription factors fluctuated within individual cells, single-time multipoint correlation functions would not be informative. However, fluctuations in protein levels in eukaryotic cells can persist for many days and over multiple cell generations (48), so it is likely that single-time measurements can be used in these systems as well as in prokaryotic systems, where there are generally fewer layers of regulation between transcription and protein expression and longer lag times in responses are less common.
Although we have focused on the signatures of synergies between 2 regulators that are present in the 3-point correlation function, the general relationship (Eq. 3) shows that synergies between more regulators can be probed by measuring higher-order correlation functions. For example, by extending the 2-regulator model above to include an additional regulator, we found that the 4-point correlation function contains the signature of synergies which are not accounted for by pairwise and single transcription factor effects. Note, however, that the fact that the (j + 1)-point correlation function scales with the system size as
Acknowledgments
We thank Harinder Singh, Ben Gantner, Roger Sciammas, Lawrence Uricchio, and Mark Maienschein-Cline for helpful discussions and critical readings of the manuscript. This work was supported by the National Science Foundation and the Chicago Biomedical Consortium.
Footnotes
 ^{1}To whom correspondence should be addressed. Email: dinner{at}uchicago.edu

Author contributions: A. W. and A. R. D. designed research; A. W. performed research; and A. W. and A. R. D. wrote the paper.

This article contains supporting information online at www.pnas.org/cgi/content/full/0809314105/DCSupplemental
 © 2008 by The National Academy of Sciences of the USA
References
1. …
2. …
3. Milo R, et al.
4. …
5. …
6. Setty Y, Mayo AE, Surette MG, Alon U
7. Kuhlman T, Zhang Z, Saier MH, Hwa T
8. Mayo AE, Setty Y, Shavit S, Zaslaver A, Alon U
9. Yuh CH, Bolouri H, Davidson EH
10. Davidson EH, et al.
11. Hermsen R, Tans S, ten Wolde PR
12. …
13. …
14. Buchler NE, Gerland U, Hwa T
15. So CW, Cleary ML
16. …
17. …
18. …
19. …
20. Elowitz MB, Levine AJ, Siggia ED, Swain PS
21. Vilar JM, Kueh HY, Barkai N, Leibler S
22. Rosenfeld N, Young JW, Alon U, Swain PS, Elowitz MB
23. Pedraza J, van Oudenaarden A
24. …
25. …
26. …
27. Pedraza JM, Paulsson J
28. …
29. …
30. Chandler D
31. van Kampen NG
32. Thattai M, van Oudenaarden A
33. …
34. …
35. …
36. Swain PS, Elowitz MB, Siggia ED
37. Joung JK, Koepp DM, Hochschild D
38. …
39. …
40. …
41. Mettetal JT, Muzzey D, Pedraza JM, Ozbudak EM, van Oudenaarden A
42. Shibata T, Fujimoto K
43. …
44. Sanchez A, Kondev J
45. Cox CD, McCollum JM, Allen MS, Dar RD, Simpson ML
46. Anderson LM, Yang H
47. Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP
48. …