# Transcriptional control of noise in gene expression

Edited by Charles R. Cantor, Sequenom, Inc., San Diego, CA, and approved January 25, 2008 (received for review August 23, 2007)

## Abstract

*Cis*-regulatory control of transcription is the dominant form of regulation of gene expression. Recent experimental results suggest that, in addition to the mean expression level, cell-to-cell variability might also be transcriptionally regulated. Here, we develop a stochastic model of transcriptional regulation that allows us to calculate closed-form analytical expressions for the mean and variance of the protein and mRNA distributions for an arbitrarily complex *cis*-regulatory motif. Our model allows us to investigate how noise may be transcriptionally regulated independently from the mean expression. We show that our approach is in excellent agreement with stochastic simulations and experiment, and leads to an experimentally testable formula for the noise in gene expression as a function of inducer-molecule concentrations.

Transcription is initiated at promoter sequences, at which RNA polymerase (RNAP) proteins bind specifically to catalyze the initial steps that lead to the synthesis of mRNA. At any given time, the cell must regulate which genes are expressed, and to what degree. The amount of gene expression is regulated by transcription factor (TF) proteins that bind specifically to operator or enhancer sequences on the DNA. The TFs influence the transcription rate by interacting with the RNAP and the chromosomal DNA, either repressing or activating transcription initiation (1–4).

Because of the small copy number of promoters, TFs, and mRNA molecules involved in gene expression, the process described above is random (5–8). The stochastic nature of gene expression has been demonstrated experimentally both in eukaryotes, for example, in mammalian cells (9) and yeast (10–12), and in prokaryotes, for example, in *Escherichia coli* (13, 14). There has been much progress over the past few years in understanding the origins of this stochasticity. Several different models and experiments have pointed out the contribution of extrinsic sources of stochasticity (15, 16), such as the cell cycle, fluctuations in the number of RNAPs or ribosomes, or other global transcriptional regulators (for a recent review, see ref. 17). Other work has focused on intrinsic factors such as the relative influence of the rate of transcription and the translation efficiency on the generation of noise in protein and mRNA content (18–21).

Several lines of evidence (9–13, 17) point to slow transitions between active and inactive promoter states as an important source of noise in gene expression, which is responsible for generating heterogeneity in the response of genetically identical cells to the same stimulus. It has been demonstrated recently in yeast cells that high levels of cell-to-cell variability, originating from slow promoter state fluctuations, may confer on cell colonies an enhanced probability of survival when subjected to external stress, such as addition of high concentrations of antibiotic (10). These studies were performed on the synthetic GAL1* yeast promoter, a complex promoter that is regulated by multiple transcription factors and two different inducers (10, 11). Stochastic simulations have identified a candidate regulatory scheme that gives excellent agreement with the data. The effects of different mutations in the promoter have also been explored, both experimentally and through stochastic simulations (10, 11). These studies, along with other recent work in yeast (12), represent a first attempt to understand how the architecture of the regulatory region controls not only the level of gene expression, but the noise as well.

Traditionally, gene regulation in organisms like yeast or *E. coli* has been studied through bulk *in vivo* transcription experiments based on reporter gene assays, in which the mean expression level is measured as a function of one or more transcription factor or inducer concentrations. Data obtained in this way can be rationalized and connected to molecular mechanisms of transcriptional regulation by using thermodynamic models. These models provide a theoretical framework, based on statistical mechanics, that yields analytical formulas for the mean expression as a function of transcription factor or inducer molecule concentrations for promoters of arbitrary complexity (1, 2, 22, 23).

This combination of theory and experiment has been very useful in developing quantitative models of combinatorial transcriptional control of bacterial promoters, and in suggesting new experiments (4, 24).

With the advent of single-cell gene expression experiments, the quantitative outcome of an experiment is no longer simply the mean expression coming from a population of cells, but rather a histogram of gene expression levels that describes cell-to-cell variability (11). Thermodynamic models can only account for the mean, but not for other moments of the mRNA and protein distributions. This state of affairs provides the impetus for developing theoretical models that answer the experimental challenge of relating the measured cell-to-cell variability to proposed molecular mechanisms of gene regulation in a quantitative and predictive way.

In this article, we develop a theoretical model of transcriptional regulation that can be used to study promoters of arbitrary combinatorial complexity, mirroring the range of applicability of thermodynamic models but providing analytical formulas for all of the moments of the probability distribution of mRNA and protein numbers in the cell. We demonstrate the validity of our approach by computing the noise strength as a function of the concentrations of the two inducer molecules that control the GAL1* promoter (11). By analyzing the three-dimensional plot of noise as a function of the two concentrations, we show that the kinetic regulatory mechanism proposed for the GAL1* promoter predicts the existence of a previously undetected mode of independent regulation of noise and mean expression, in which one of the inducers regulates the noise strength without altering the mean expression (which is solely controlled by the other inducer). This effectively creates a two-knob system, in which one knob (i.e., the concentration of one inducer) controls the mean expression, and the second knob (the other inducer) can be tuned to control the level of cell-to-cell variability, without altering the response of the promoter to the first inducer.

We discuss applications of our approach to the rational design of synthetic promoters with a prescribed amount of noise, and the possibility for differential control of transcriptional noise and mean expression levels both in natural and synthetic systems.

## Analytical Model of Stochastic Gene Expression

Our theory of stochastic transcriptional regulation is based on a master equation approach. We view transcriptional regulation as a stochastic process in which the promoter makes transitions between different states. These states are defined by the occupancy of operator or enhancer sites by transcription factors. Furthermore, different promoter states are characterized by different rates of mRNA production. Kepler, Elston, and others (25–28) used this master equation formalism to calculate formulas for the noise strength in the mRNA population introduced by a two-state (ON/OFF) promoter. In the following, we show how to generalize this theory to a promoter with an arbitrary number of states, and also to account for the noise in protein content. The calculations described below are summarized as an algorithm in Fig. 1.

We derive separate equations for the variance and the mean of both mRNA and protein steady-state distributions. For each promoter state $s$, the promoter initiates transcription at a different rate $k_s^r$. The promoter makes stochastic transitions between promoter states following a Markov process, as different TFs bind and fall off the promoter, causing fluctuations in the transcription rate. To capture the fluctuations in mRNA number generated by fluctuations in the promoter state, we make the assumption that each initiation event corresponds to the synthesis of one mRNA, which is then degraded linearly with rate $\gamma$.

The state of our system is described by two stochastic variables: the number of mRNA molecules $m$ per cell, and a label $s$ characterizing the state of the promoter (16, 25). Thus, the probability distribution function characterizing our stochastic chemical reactor is the bivariate probability $p(s, m)$. Because the total number of promoter states is finite, we can write the bivariate probability $p(s, m)$ as a vector $\mathbf{p}(m) = (p(1, m), p(2, m), \ldots, p(s, m), \ldots)$. This will prove useful in what follows.

The time evolution for this bivariate probability can be calculated from the master equation (12, 25, 26, 29). For a simple birth-and-death mechanism, in which genes are assumed to be expressed at a constant rate from each promoter state, and mRNA is assumed to be degraded linearly, the master equation for the mRNA probability distribution can be written in matrix form as [see supporting information (SI) *Appendix*]:

$$\frac{d}{dt}\mathbf{p}(m) = \left[\hat{\mathbf{K}} - \hat{\mathbf{R}} - m\,\hat{\boldsymbol{\Gamma}}\right]\mathbf{p}(m) + \hat{\mathbf{R}}\,\mathbf{p}(m-1) + (m+1)\,\hat{\boldsymbol{\Gamma}}\,\mathbf{p}(m+1), \tag{1}$$

where we define $\hat{\mathbf{K}}$ as the matrix whose off-diagonal element $(\hat{K})_{s,q}$ is the kinetic rate of making a transition from state $q$ to state $s$, and whose diagonal elements $(\hat{K})_{s,s}$ are equal to minus the net rate of abandoning state $s$, so that each column of $\hat{\mathbf{K}}$ sums to zero. $\hat{\mathbf{R}}$ is a diagonal matrix whose elements are the rates of transcription initiation from each promoter state, $k_s^r$; $\hat{\boldsymbol{\Gamma}}$ is also a diagonal matrix whose elements are all equal and given by the mRNA degradation rate $\gamma$.

It can be shown that the $j$th moment of the mRNA probability distribution $P(m)$ can be obtained by multiplying both sides of Eq. **1** by $m^j$, summing over all $m$, and finally multiplying both sides of the equation from the left by the vector $\mathbf{u} = (1, 1, \ldots, 1)$. As described in the *SI Appendix*, this leads to the following equations for the mean and variance of the mRNA probability distribution in steady state:

$$\langle m \rangle = \frac{\mathbf{k}^r \cdot \mathbf{m}^{(0)}}{\gamma}, \tag{2}$$

$$\sigma_m^2 = \langle m \rangle + \frac{\mathbf{k}^r \cdot \mathbf{m}^{(1)}}{\gamma} - \langle m \rangle^2, \tag{3}$$

where we have defined the vectors $\mathbf{k}^r = (k_1^r, k_2^r, \ldots, k_s^r, \ldots)$ and $\mathbf{m}^{(j)} = \sum_m m^j\,\mathbf{p}(m)$. The vector $\mathbf{m}^{(0)}$ can be determined as the solution of the equation $\hat{\mathbf{K}} \cdot \mathbf{m}^{(0)} = 0$, subject to the normalization condition $\mathbf{u} \cdot \mathbf{m}^{(0)} = 1$. The vector $\mathbf{m}^{(1)}$ can be determined as the solution of:

$$\left(\hat{\mathbf{K}} - \hat{\boldsymbol{\Gamma}}\right)\mathbf{m}^{(1)} + \hat{\mathbf{R}}\,\mathbf{m}^{(0)} = 0. \tag{4}$$

The solution to the preceding system of linear equations can be determined by standard techniques of linear algebra.
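The linear-algebra steps described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the steady-state relations $\hat{\mathbf{K}}\mathbf{m}^{(0)} = 0$, $(\hat{\mathbf{K}} - \gamma)\mathbf{m}^{(1)} = -\hat{\mathbf{R}}\mathbf{m}^{(0)}$, $\langle m\rangle = \mathbf{k}^r\cdot\mathbf{m}^{(0)}/\gamma$ and $\sigma_m^2 = \langle m\rangle + \mathbf{k}^r\cdot\mathbf{m}^{(1)}/\gamma - \langle m\rangle^2$, which follow from taking moments of the birth-and-death master equation. The function names and the two-state parameter values are hypothetical.

```python
import numpy as np

def promoter_steady_state(K):
    """Solve K @ p = 0 with sum(p) = 1 (assumes a connected promoter-state graph)."""
    A = np.array(K, dtype=float)      # copy, so the caller's matrix is untouched
    rhs = np.zeros(A.shape[0])
    A[-1, :] = 1.0                    # swap one redundant balance equation for normalization
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

def mrna_moments(K, k_r, gamma):
    """Return (mean, variance, Fano factor) of the steady-state mRNA distribution."""
    K = np.asarray(K, dtype=float)
    k_r = np.asarray(k_r, dtype=float)
    m0 = promoter_steady_state(K)                       # zeroth-moment vector
    m1 = np.linalg.solve(K - gamma * np.eye(len(k_r)),  # first-moment vector
                         -k_r * m0)
    mean = k_r @ m0 / gamma
    var = mean + k_r @ m1 / gamma - mean**2
    return mean, var, var / mean

# Hypothetical two-state (OFF/ON) promoter: state 0 is OFF, state 1 is ON.
k_on, k_off, r, gamma = 0.5, 2.0, 10.0, 1.0
K = [[-k_on,  k_off],
     [ k_on, -k_off]]                 # columns sum to zero
mean, var, fano = mrna_moments(K, [0.0, r], gamma)
print(mean, fano)
```

For this two-state case the result can be compared with the familiar closed form for an ON/OFF promoter, Fano $= 1 + r\,k_\text{off}/[(k_\text{on}+k_\text{off})(k_\text{on}+k_\text{off}+\gamma)]$, which serves as a sanity check on the matrix formulation.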

This model can be extended to determine the moments of the protein distribution by assuming that each mRNA generates a burst of proteins, whose number is geometrically distributed (14, 19, 30, 31). The probability distribution of a burst with $x$ proteins is given by $h(x) = b^x/(1 + b)^{1+x}$, with $b$ representing the average burst size (30). The initiation rate at each promoter state is still described by a constant rate $k_s^r$ for each state $s$. The relevant stochastic variables describing our system are the number of proteins in the cell $n$ and the promoter state $s$. Proteins are assumed to be degraded linearly, with rate $\gamma_n$.
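As a quick check on the burst-size distribution, the sketch below evaluates $h(x) = b^x/(1+b)^{1+x}$ numerically and confirms that it is normalized with mean $b$. The function name, the value $b = 3$, and the truncation at 500 proteins per burst are assumptions chosen for illustration.

```python
import numpy as np

def burst_pmf(x, b):
    """Geometric burst-size distribution h(x) = b**x / (1 + b)**(1 + x).

    Written as a power of b / (1 + b) to avoid overflow at large x.
    """
    x = np.asarray(x, dtype=float)
    return (b / (1.0 + b)) ** x / (1.0 + b)

b = 3.0                                  # hypothetical mean burst size
x = np.arange(0, 500)                    # truncation; the neglected tail is negligible here
h = burst_pmf(x, b)

total = h.sum()                          # normalization
mean_burst = (x * h).sum()               # first moment
var_burst = (x**2 * h).sum() - mean_burst**2
print(total, mean_burst, var_burst)
```

The variance of this distribution is $b(1+b)$, which is larger than the mean; this extra spread is the origin of the burst-size offset in the protein noise strength.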

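The master equation for the protein probability distribution can also be written in matrix form as:

$$\frac{d}{dt}\mathbf{p}(n) = \left[\hat{\mathbf{K}} - \hat{\mathbf{R}} - n\,\hat{\boldsymbol{\Gamma}}_n\right]\mathbf{p}(n) + (n+1)\,\hat{\boldsymbol{\Gamma}}_n\,\mathbf{p}(n+1) + \sum_{x=0}^{n} h(x)\,\hat{\mathbf{R}}\,\mathbf{p}(n-x), \tag{5}$$

where $\hat{\boldsymbol{\Gamma}}_n$ is the diagonal matrix whose elements are all equal to the protein degradation rate $\gamma_n$. The first term in this equation is the rate at which the cell abandons the state characterized by $\mathbf{p}(n)$, and the second term is the rate at which the cell enters the state $\mathbf{p}(n)$ through degradation of one protein. The last term is the rate at which the cell enters the state $\mathbf{p}(n)$ through production of a burst of proteins distributed by $h(x)$ (hence, the sum). A more detailed explanation is given in the *SI Appendix*. Algebraic manipulation of this equation, following the same steps we described earlier for the mRNA case, yields closed-form expressions for the moments of the protein distribution. With the mean and variance in hand we characterize the noise strength by their ratio. For the mRNA distribution the noise strength is:

$$\frac{\sigma_m^2}{\langle m \rangle} = 1 + \frac{\mathbf{k}^r \cdot \mathbf{m}^{(1)}}{\gamma\,\langle m \rangle} - \langle m \rangle, \tag{6}$$

whereas for proteins,

$$\frac{\sigma_n^2}{\langle n \rangle} = 1 + b + \frac{b\,\mathbf{k}^r \cdot \mathbf{n}^{(1)}}{\gamma_n\,\langle n \rangle} - \langle n \rangle, \qquad \langle n \rangle = \frac{b\,\mathbf{k}^r \cdot \mathbf{n}^{(0)}}{\gamma_n}. \tag{7}$$

Here, $\mathbf{n}^{(0)}$ is the solution of the equation $\hat{\mathbf{K}} \cdot \mathbf{n}^{(0)} = 0$ with the normalization condition $\mathbf{u} \cdot \mathbf{n}^{(0)} = 1$, and $\mathbf{n}^{(1)}$ is the solution to the equation:

$$\left(\hat{\mathbf{K}} - \hat{\boldsymbol{\Gamma}}_n\right)\mathbf{n}^{(1)} + b\,\hat{\mathbf{R}}\,\mathbf{n}^{(0)} = 0. \tag{8}$$

A detailed derivation of the equations above is given in the *SI Appendix*. Two things should be noted. First, both $\mathbf{m}^{(0)}$ and $\mathbf{n}^{(0)}$ are the solution to the same equation, and are therefore identical to each other. They are vectors whose $s$th element is the steady-state probability that the promoter will be in state $s$. Second, for protein production, both the mean and the noise strength are the same as if proteins were synthesized at a constant rate $b \times k_s^r$, rather than in bursts, with the exception that the noise strength is offset by the mean burst size $b$ (19, 29).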
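The protein-level calculation can be sketched in the same way as the mRNA case. This is a minimal illustration under stated assumptions, not the authors' code: it takes the steady-state relations $\hat{\mathbf{K}}\mathbf{n}^{(0)} = 0$, $(\hat{\mathbf{K}} - \gamma_n)\mathbf{n}^{(1)} = -b\,\hat{\mathbf{R}}\mathbf{n}^{(0)}$, $\langle n\rangle = b\,\mathbf{k}^r\cdot\mathbf{n}^{(0)}/\gamma_n$ and $\sigma_n^2 = (1+b)\langle n\rangle + b\,\mathbf{k}^r\cdot\mathbf{n}^{(1)}/\gamma_n - \langle n\rangle^2$, obtained by taking moments of the bursty master equation with geometric bursts of mean size $b$. Function names and parameter values are hypothetical.

```python
import numpy as np

def promoter_steady_state(K):
    """Solve K @ p = 0 with sum(p) = 1 (assumes a connected promoter-state graph)."""
    A = np.array(K, dtype=float)
    rhs = np.zeros(A.shape[0])
    A[-1, :] = 1.0                    # normalization replaces one redundant equation
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

def protein_moments(K, k_r, b, gamma_n):
    """Return (mean, variance, Fano factor) of the steady-state protein distribution."""
    K = np.asarray(K, dtype=float)
    k_r = np.asarray(k_r, dtype=float)
    n0 = promoter_steady_state(K)
    n1 = np.linalg.solve(K - gamma_n * np.eye(len(k_r)), -b * k_r * n0)
    mean = b * (k_r @ n0) / gamma_n
    var = (1.0 + b) * mean + b * (k_r @ n1) / gamma_n - mean**2
    return mean, var, var / mean

# Constitutive (single-state) promoter: noise strength should reduce to 1 + b.
mean_c, var_c, fano_c = protein_moments([[0.0]], [5.0], b=4.0, gamma_n=0.1)

# Hypothetical two-state promoter with the same burst size.
k_on, k_off, r, b, gamma_n = 0.5, 2.0, 10.0, 4.0, 0.1
K = [[-k_on,  k_off],
     [ k_on, -k_off]]
mean_2, var_2, fano_2 = protein_moments(K, [0.0, r], b, gamma_n)
print(fano_c, fano_2)
```

The single-state case isolates the burst contribution (Fano factor $1 + b$), while the two-state case adds the promoter-switching term on top of it, which is the offset-by-$b$ structure noted in the text.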

## Control of Noise from the GAL1* Promoter in Yeast

With the equations derived above in hand, we can make predictions for how the mean and the noise in gene expression will depend on the concentration of transcription factors. The steady-state mean expression as a function of transcription factor concentration is typically called the regulatory (input) function (4, 32), and it is often calculated by using thermodynamic models of gene regulation (1, 2, 22–24, 33, 34). It is straightforward to verify that Eq. **2** gives the same regulatory input function as the one computed from thermodynamic models. In addition, Eqs. **6** and **7** provide formulas for the noise strength, so they can be used to understand transcriptional regulation of noise and cell-to-cell variability of gene expression (14).
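The equivalence with thermodynamic models can be illustrated with the simplest case. The worked example below is a sketch for a hypothetical two-state promoter, in which an activator at concentration $c$ binds with rate $k_\text{on} c$, unbinds with rate $k_\text{off}$, and transcription proceeds at rate $r$ only from the bound state; the symbols $c$, $k_\text{on}$, $k_\text{off}$, $r$, and $K_d$ are assumptions introduced here and do not appear in the original analysis.

$$
\hat{\mathbf{K}} = \begin{pmatrix} -k_\text{on} c & k_\text{off} \\ k_\text{on} c & -k_\text{off} \end{pmatrix},
\qquad
\hat{\mathbf{K}} \cdot \mathbf{m}^{(0)} = 0,\;\; \mathbf{u} \cdot \mathbf{m}^{(0)} = 1
\;\;\Longrightarrow\;\;
\mathbf{m}^{(0)} = \frac{1}{k_\text{on} c + k_\text{off}} \begin{pmatrix} k_\text{off} \\ k_\text{on} c \end{pmatrix}.
$$

With $\mathbf{k}^r = (0, r)$ and $K_d = k_\text{off}/k_\text{on}$, the mean mRNA number is

$$
\langle m \rangle = \frac{\mathbf{k}^r \cdot \mathbf{m}^{(0)}}{\gamma} = \frac{r}{\gamma}\,\frac{c}{c + K_d},
$$

which is the equilibrium occupancy of the binding site multiplied by the maximal expression level $r/\gamma$, the form a thermodynamic model would give for this promoter. Only the ratio $K_d$ of the two rates enters the mean, whereas the first-moment vector $\mathbf{m}^{(1)}$, and hence the noise, depends on the rates themselves.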

To test the validity of our theory and illustrate how it can be used to make predictions for the transcriptional regulation of expression noise from complex promoters, we apply our formalism to the GAL1* promoter in yeast. Expression from this promoter at the single-cell level has been previously studied both experimentally and computationally (11). Like the wild-type GAL1 promoter, the GAL1* promoter is activated by transcription factor Gal4, in response to increasing concentrations of galactose. The GAL1* promoter is different from the wild type in that it carries two tetO operator sites upstream from the promoter, which results in repression of transcription initiation when the Tet repressor (TetR) is bound to them. Therefore, transcription can be up-regulated in this promoter by addition of galactose (Gal), by addition of anhydrotetracycline (aTc), or both; aTc weakens the affinity of TetR for its operator sequence, rendering it incapable of transcriptional repression.

This dual activation mode results in a multistate regulatory motif for the GAL1* promoter (11). In their experiments, Blake and coworkers expressed TetR constitutively, and then determined the mean expression level and the cell-to-cell variability for a variety of aTc and Gal concentrations. For the rest of the article, we will refer to the whole set of possible values that the aTc and Gal concentrations can take as the induction space ([aTc], [Gal]). Blake *et al.* proposed a regulatory scheme (Fig. 2), which they implemented by using stochastic simulations, and found it to be in very good agreement with both the experimental mean expression level and the cell-to-cell variability for those points of the induction space ([aTc], [Gal]) that were sampled by the authors (Fig. 3, dotted lines).

Here, we use our analytical theory to derive mathematical formulas for the mean expression level and noise strength generated by the regulatory mechanism shown in Fig. 2, and we evaluate them by using the same parameter values reported in ref. 11. We compare the noise strength and mean predicted from our model with the experimental results and stochastic simulations obtained by Blake *et al.* (11). The results are shown in Fig. 3 *a–c*, and we observe an excellent agreement between the analytical results obtained from our theory and the experimental data and stochastic simulations reported by Blake *et al.* (11).

Moreover, by using the analytical expressions for both the noise and the mean, it is straightforward to obtain a three-dimensional plot of the noise strength and mean expression level for all points in the (aTc,Gal) induction space. This allows us to investigate the cell-to-cell variability, as predicted by the model, for all possible concentrations of both inducers. The result is presented in Fig. 4. We note that the regions of largest noise are two stripes approximately corresponding to the lines ([*aTc*] = 30 ng/ml, [*Gal*]) and ([*aTc*], [*Gal*] = 0.3%) (Fig. 4*a*).

Notably, in Fig. 4 we see that even though the mean expression level does not change significantly when we increase the Gal concentration beyond 2%, the noise strength decreases.

This suggests that it may be possible to regulate the noise independently from the mean expression. To test this idea, we plot in Fig. 5 the predicted mean expression and noise strength for [Gal] = 2%, 5%, and 10% as a function of [aTc]. In Fig. 5*a* we show that the mean response of the GAL1* promoter to variations in [aTc] is practically unaltered by increasing the galactose concentration beyond 2%. However, as we show in Fig. 5*b*, the noise strength can be substantially reduced by increasing the value of [Gal] beyond 2%, up until reaching a point at [*Gal*] > 15% at which the transcriptional noise becomes negligible and all of the cell-to-cell variability is dominated by the burst-like nature of protein synthesis.

Therefore, this analysis indicates that the GAL1* promoter allows for independent control of the *cis*-regulatory function (mean gene expression level vs. aTc inducer concentration) and the cell-to-cell variability, and that each of them can be adjusted by a different inducer. The concentration of aTc serves as a knob that controls the level of gene expression, whereas the concentration of Gal (when the galactose concentration is [*Gal*] > 2%) functions as a separate knob that controls the level of noise without altering the mean. This is a prediction of the theory that suggests a new round of experiments should be performed on the GAL1* promoter at these inducer concentrations.

## Discussion and Conclusions

It has been recently observed that stochastic fluctuations in promoter state represent an important source of stochasticity in gene expression, which can generate phenotypic diversity in genetically identical cells and may be critical to the survival of cell colonies (10). This raises the question of whether this stochasticity can be controlled at the *cis*-regulatory level, just as the mean expression is regulated and, if so, to what extent the control of the mean and the control of the noise are independent of one another. Previous stochastic models of transcription (19, 25–27) either assumed that genes were expressed at a constant rate, which was modulated through the action of transcription factors, or assumed a two-state promoter, which switched stochastically between one active and one inactive state. The first approach allows for investigation of the noise strength generated by complex promoters, but ignores the now experimentally demonstrated fact that transcription often occurs in bursts, rather than at a constant rate (9–13). The second type of model incorporates transcriptional bursting, but is limited to promoters that can only exist in (or be approximately reduced to) two states, and cannot capture the complex kinetic schemes often associated with promoters that are regulated by more than one species of transcription factor. Such is the case of the GAL1* promoter in yeast, of the canonical lac promoter from *E. coli*, and of the PR, PRM, and PL promoters from bacteriophage lambda (3, 8). Both types of analytical models mentioned above are inadequate for analyzing single-cell expression data from real, complex promoters such as the data obtained recently from a number of synthetic yeast promoters (10, 11, 35). Instead, stochastic simulations have been used to quantitatively evaluate the proposed kinetic schemes for these complex promoters by comparing numerical data with experimental data.

We have developed a formalism that leads to analytical expressions for all of the moments of the probability distribution of the amount of gene expressed (in terms of mRNA or protein number) for an arbitrarily complex mechanism of transcriptional regulation. As shown in Eq. **2**, the first moment is the same as that obtained from thermodynamic models, thus providing a more general setting within which the validity of these models for population-wide measurements of gene expression can be assessed. For higher moments, as demonstrated in Fig. 3, our analytical formulas can replace stochastic simulations when analyzing gene expression data, yielding the same results but without the need for computational resources. More important is that analytical formulas enable *in vivo* parameter estimations through data fitting and an analysis of parameter sensitivity. We will develop these applications in a subsequent article.

The formalism presented here leads to quantitative insights into how cell-to-cell variability is transcriptionally regulated. In particular, by applying it to the proposed regulatory mechanism of the GAL1* promoter, we found that it is possible, for now only in theory, to independently regulate the mean expression and the noise by adjusting the concentrations of two different inducer molecules. The complexity of this promoter is crucial for the existence of this previously unobserved mode of gene regulation. Namely, the two inducers act on separate steps of the regulatory mechanism. The galactose concentration enters linearly in both forward ($k_{1f}$) and backward ($k_{1b}$) transitions between states PC1 and PC2, and RC1 to RC2, but does not influence any other transitions between promoter states (see Fig. 2). At high enough galactose concentrations the ratio between $k_{1b}$ and $k_{1f}$ saturates and becomes constant with [Gal], even though the transition rates keep increasing linearly with [Gal]. Because the only dependence of the first moment on the galactose concentration is through the ratio between $k_{1b}$ and $k_{1f}$ (see Eq. **2**), increasing the galactose concentration above saturation of this ratio will not change the mean. Because the remaining kinetic steps are entirely unaffected by the galactose concentration, and only depend on the concentration of the second inducer (aTc), the mean will still depend, now solely, on the concentration of aTc. However, as shown by Eq. **3**, the second moment does depend on the absolute value of the transition rates between PC1 and PC2, and RC1 to RC2, through $\mathbf{m}^{(1)}$, and not only on their ratio. This is what allows independent regulation of the mean and cell-to-cell variability of gene expression from this promoter.
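The ratio-versus-absolute-rate argument can be illustrated on a toy system. The sketch below is not the GAL1* scheme of Fig. 2: it assumes a hypothetical two-state promoter whose forward and backward rates are multiplied by a common factor, so that their ratio stays fixed while their absolute values grow, and it uses the steady-state moment relations for a birth-and-death mRNA process. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def mean_and_fano(K, k_r, gamma):
    """Steady-state mRNA mean and Fano factor for a multistate promoter."""
    K = np.asarray(K, dtype=float)
    k_r = np.asarray(k_r, dtype=float)
    A = K.copy()
    rhs = np.zeros(len(k_r))
    A[-1, :] = 1.0                     # normalization replaces one redundant equation
    rhs[-1] = 1.0
    m0 = np.linalg.solve(A, rhs)       # promoter-state occupancies
    m1 = np.linalg.solve(K - gamma * np.eye(len(k_r)), -k_r * m0)
    mean = k_r @ m0 / gamma
    fano = 1.0 + (k_r @ m1) / (gamma * mean) - mean
    return mean, fano

k_f, k_b, r, gamma = 1.0, 1.0, 20.0, 1.0    # hypothetical rates
base = np.array([[-k_f,  k_b],
                 [ k_f, -k_b]])

results = []
for scale in (1.0, 10.0, 100.0):             # same ratio k_b / k_f, faster switching
    results.append(mean_and_fano(scale * base, [0.0, r], gamma))

for scale, (mean, fano) in zip((1.0, 10.0, 100.0), results):
    print(f"scale={scale:6.1f}  mean={mean:.3f}  Fano={fano:.3f}")
```

In this toy case the mean stays fixed while the Fano factor falls toward the Poisson value of 1 as switching speeds up, which mirrors the behavior described for high galactose concentrations: the mean depends only on the ratio of the rates, the noise on their absolute values.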

This possibility of independently tuning the mean and the noise in gene expression by using inducer molecule concentrations as knobs is an experimentally testable prediction of our theory, and its consequences, we believe, reach beyond the GAL1* promoter. It raises the possibility that other natural and synthetic promoters may use mechanisms similar to the one studied in this article to transcriptionally regulate the cell-to-cell variability. This is a question that will need to be settled experimentally, but we expect that the tools developed herein will help guide the research in this direction.

## Acknowledgments

We thank Pankaj Mehta for his reading and comments on the manuscript, and Jeff Gelles, Doug Martin, Larry Friedman, and all of the other members of the Gelles lab for support and discussion at the early stages of this project. J.K. was supported by National Science Foundation Grant DMR-0706458.

## Footnotes

- §To whom correspondence should be addressed. E-mail: kondev{at}brandeis.edu

Author contributions: Á.S. designed research; Á.S. and J.K. performed research; Á.S. analyzed data; and Á.S. and J.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0707904105/DC1.

- Received August 23, 2007.

- © 2008 by The National Academy of Sciences of the USA

## References

- ↵
- ↵
- ↵.
- Ptashne M

- ↵.
- Setty Y,
- Mayo AE,
- Surette MG,
- Alon U

- ↵.
- McAdams HH,
- Arkin A

- ↵
- ↵
- ↵.
- Arkin A,
- Ross J,
- McAdams HH

- ↵.
- Raj A,
- Peskin CS,
- Tranchina D,
- Vargas DY,
- Tyagi S

- ↵.
- Blake WJ,
- et al.

- ↵
- ↵.
- Raser JM,
- O'Shea EK

- ↵
- ↵
- ↵.
- Rosenfeld N,
- Young JW,
- Alon U,
- Swain PS,
- Elowitz MB

- ↵.
- Swain PS,
- Elowitz MB,
- Siggia ED

- ↵
- ↵.
- Fraser HB,
- Hirsh AE,
- Giaever G,
- Kumm J,
- Eisen MB

- ↵.
- Thattai M,
- Van Oudenaarden A

- ↵
- ↵
- ↵
- ↵.
- Buchler NE,
- Gerland U,
- Hwa T

- ↵.
- Kuhlman T,
- Zhang Z,
- Saier MH Jr,
- Hwa T

- ↵.
- Mayo AE,
- Setty Y,
- Shavit S,
- Zaslaver A,
- Alon U

- ↵
- ↵
- ↵
- ↵.
- Hornos JEM,
- et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵.
- Murphy KF,
- Balazsi G,
- Collins JJ
