Intrinsic and extrinsic contributions to stochasticity in gene expression

Edited by Robert H. Austin, Princeton University, Princeton, NJ, and approved June 17, 2002 (received for review January 23, 2002)
Abstract
Gene expression is a stochastic, or “noisy,” process. This noise comes about in two ways. The inherent stochasticity of biochemical processes such as transcription and translation generates “intrinsic” noise. In addition, fluctuations in the amounts or states of other cellular components lead indirectly to variation in the expression of a particular gene and thus represent “extrinsic” noise. Here, we show how the total variation in the level of expression of a given gene can be decomposed into its intrinsic and extrinsic components. We demonstrate theoretically that simultaneous measurement of two identical genes per cell enables discrimination of these two types of noise. Analytic expressions for intrinsic noise are given for a model that involves all the major steps in transcription and translation. These expressions give the sensitivity to various parameters, quantify the deviation from Poisson statistics, and provide a way of fitting experiment. Transcription dominates the intrinsic noise when the average number of proteins made per mRNA transcript is greater than ≃2. Below this number, translational effects also become important. Gene replication and cell division, included in the model, cause protein numbers to tend to a limit cycle. We calculate a general form for the extrinsic noise and illustrate it with the particular case of a single fluctuating extrinsic variable—a repressor protein, which acts on the gene of interest. All results are confirmed by stochastic simulation using plausible parameters for Escherichia coli.
Molecules are discrete entities. When present in large numbers, addition or removal of any single molecule typically has little effect on the properties of a system. However, stochastic fluctuations can become significant in smaller systems. In living cells, many components are present at very low copy numbers [e.g., of order one for DNA loci and of order tens for transcription factors (1)]. Therefore, stochastic effects are thought to be particularly important for gene expression and have been invoked to explain cell–cell variations in clonal populations (2–4). Moreover, cellular components interact with one another in complex regulatory networks. Thus, fluctuations in even a single component may potentially affect the performance of the entire system.
Consider a particular gene of interest. The amount of protein it produces will vary from cell to cell in a population and over time in a single cell. These fluctuations originate in two ways: First, even if all cells were in precisely the same state, the reaction events leading to transcription and translation of the gene would still occur at different times, and in different orders, in different cells. Such stochastic effects are set locally by the gene sequence and the properties of the protein it encodes and will be referred to as “intrinsic” noise.
In addition, one must consider that other molecular species in the cell, e.g., RNA polymerase (RNAP), are themselves gene products and therefore will also vary over time and from cell to cell. This variation causes additional, and corresponding, fluctuations in the expression of the gene of interest that will be referred to as “extrinsic” noise. Thus, extrinsic sources of noise arise independently of the gene but act on it. Examples of extrinsic variables are numerous. They include the number of RNAPs or ribosomes, the stage in the cell cycle, the quantity of the protein and mRNA degradation machinery, and the cell environment. In general, the total variation in gene expression will have both intrinsic and extrinsic sources. A particular cellular component will suffer intrinsic fluctuations in its own concentration and, at the same time, will be a source of extrinsic noise for other components with which it interacts.
Although the stochastic nature of gene expression has long been postulated (2), previous theoretical research (5–11) has concentrated on intrinsic noise. Excepting studies of plasmid copy number control (12), extrinsic effects have only been added in a post hoc manner (13). It is not known which molecular properties influence noise or even how a clear measurement of intrinsic noise could be obtained in vivo.
This paper seeks to address several problems. First, we distinguish between intrinsic and extrinsic sources of noise and integrate both within a single framework. Second, we model intrinsic noise at a level that allows direct connection with biochemical parameters, including those related to cell growth. Third, we suggest an experimental method that can be used to discriminate and quantify the two components of noise in living cells. Our approach, by integration of intrinsic and extrinsic effects, is general enough to allow comparison with experimental data (14).
Definitions
Fluctuations in the rates of transcription and translation of a particular gene result in corresponding fluctuations in the amount of its protein product. A natural and biologically relevant measure of the magnitude of gene expression noise is thus the size of protein fluctuations compared to their mean concentration. If P(t) is the protein concentration at time t, then the protein noise, η(t), is given by

$$\eta(t) = \frac{\sqrt{\langle P(t)^{2}\rangle - \langle P(t)\rangle^{2}}}{\langle P(t)\rangle} \tag{1}$$

where the angled brackets denote an average over the probability distribution of P at time t. We will similarly use standard deviation divided by mean, or coefficient of variation, as a measure of noise in other distributions.
To examine the noise for a particular gene across a cell population, let the intrinsic and extrinsic variables (including time, as cells are typically desynchronized) for that gene be given by vectors I and E, each of whose components represents a different source of noise. The expression level of the gene in one cell, as measured experimentally, is denoted P_{k} (with k a cell label). From a snapshot of N genetically identical cells, the P_{k}s can be averaged to find the moments of the protein distribution. This averaging process (where m = 1 and m = 2 for the mean and variance, respectively) is equivalent to

$$\frac{1}{N}\sum_{k=1}^{N} P_{k}^{m} \simeq \int dE\, dI\; P^{m}(E, I)\, p(E, I) \tag{2}$$

Here p(E, I) is the probability density function for the intrinsic and extrinsic variables, and P(E, I) is the measured expression level for particular values of E and I. Using the product rule of probabilities, this becomes

$$\frac{1}{N}\sum_{k=1}^{N} P_{k}^{m} \simeq \int dE\; p(E) \int dI\; P^{m}(E, I)\, p(I \mid E) \tag{3}$$

The second integral is an average over the intrinsic variables with the extrinsic variables held fixed and shall be denoted by angled brackets:

$$\langle P^{m}(E)\rangle = \int dI\; P^{m}(E, I)\, p(I \mid E) \tag{4}$$

Averages over the extrinsic variables will be indicated with an overbar, so that Eq. 3 becomes

$$\frac{1}{N}\sum_{k=1}^{N} P_{k}^{m} \simeq \overline{\langle P^{m}\rangle} \tag{5}$$

that is, an average over both intrinsic and extrinsic noise sources.
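Hence, the measured noise, η_{tot}, defined empirically by

$$\eta_{tot}^{2} = \frac{\frac{1}{N}\sum_{k} P_{k}^{2} - \left(\frac{1}{N}\sum_{k} P_{k}\right)^{2}}{\left(\frac{1}{N}\sum_{k} P_{k}\right)^{2}} \tag{6}$$

is equivalent to

$$\eta_{tot}^{2} = \frac{\overline{\langle P^{2}\rangle} - \overline{\langle P\rangle}^{2}}{\overline{\langle P\rangle}^{2}} \tag{7}$$

This can be written as

$$\eta_{tot}^{2} = \frac{\overline{\langle P^{2}\rangle - \langle P\rangle^{2}}}{\overline{\langle P\rangle}^{2}} + \frac{\overline{\langle P\rangle^{2}} - \overline{\langle P\rangle}^{2}}{\overline{\langle P\rangle}^{2}} \equiv \eta_{int}^{2} + \eta_{ext}^{2} \tag{8}$$

In other words, the square of the experimentally measurable noise is a direct sum of the squares of the intrinsic, η_{int}, and extrinsic, η_{ext}, contributions. The square of the intrinsic noise, η_{int}, is proportional to the variance of the intrinsic distribution, calculated for a particular value of the extrinsic variables and then averaged over all possible values of these variables. The extrinsic noise, η_{ext}, vanishes as the extrinsic distributions become more and more sharply peaked.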
Finally, we need to address experimentally how both intrinsic and extrinsic contributions can be discriminated from the total noise, given by Eq. 6. From Eq. 8, this requires a measurement of the quantity $\overline{\langle P\rangle^{2}}$. Consider what would happen if two identical copies of the gene were present in the same (kth) cell, and their protein products, labeled P_{k}^{(1)} and P_{k}^{(2)}, were measured simultaneously. These will have different values of the intrinsic variables, but, because both are present in a single cell, they will be exposed to the same intracellular environment and so have the same value of the extrinsic variables. Therefore, by averaging their product over cells, we obtain

$$\frac{1}{N}\sum_{k=1}^{N} P_{k}^{(1)} P_{k}^{(2)} \simeq \overline{\langle P\rangle^{2}} \tag{9}$$

precisely the average needed.
Experimentally, two distinguishable variants of green fluorescent protein, corresponding to P^{(1)} and P^{(2)}, would allow estimation of Eq. 9 (14). By considering quantities such as ∑_{k}(P_{k}^{(1)} − P_{k}^{(2)})^{2}, the intrinsic noise could be measured and η_{ext} extracted from the total noise by using Eq. 8.
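The two-reporter idea can be sketched numerically. The following is a minimal illustration, not the authors' analysis code: the function name `noise_decomposition`, the synthetic data model, and the choice to normalize by the product of the two reporter means are all assumptions made here. The intrinsic term uses the mean squared difference between the two reporters, the extrinsic term uses their covariance (the cell average of the product, as in Eq. 9, minus the product of the means), and the total is the pooled variance, so the three estimates satisfy the sum rule of Eq. 8 exactly.

```python
import math
import random


def noise_decomposition(p1, p2):
    """Estimate (eta_int^2, eta_ext^2, eta_tot^2) from paired reporter levels.

    p1[k] and p2[k] are the two reporter levels measured in cell k.
    Normalizing by mean(p1) * mean(p2) is an assumption of this sketch.
    """
    n = len(p1)
    mean1 = sum(p1) / n
    mean2 = sum(p2) / n
    norm = mean1 * mean2
    mean_product = sum(a * b for a, b in zip(p1, p2)) / n
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(p1, p2)) / n
    mean_square = sum(a * a + b * b for a, b in zip(p1, p2)) / (2 * n)
    eta_int_sq = mean_sq_diff / (2 * norm)       # within-cell differences
    eta_ext_sq = (mean_product - norm) / norm    # between-cell covariance
    eta_tot_sq = (mean_square - norm) / norm     # pooled variance
    return eta_int_sq, eta_ext_sq, eta_tot_sq


def synthetic_cells(n_cells, rng, mean_level=100.0, ext_cv=0.2):
    """Hypothetical data: one shared extrinsic factor per cell, plus
    independent Poisson-like (Gaussian-approximated) intrinsic scatter."""
    p1, p2 = [], []
    for _ in range(n_cells):
        scale = max(rng.gauss(1.0, ext_cv), 0.01)  # shared by both reporters
        mu = mean_level * scale
        p1.append(rng.gauss(mu, math.sqrt(mu)))
        p2.append(rng.gauss(mu, math.sqrt(mu)))
    return p1, p2


if __name__ == "__main__":
    rng = random.Random(0)
    reporter1, reporter2 = synthetic_cells(20000, rng)
    print(noise_decomposition(reporter1, reporter2))
```

With these synthetic parameters the squared intrinsic noise should come out near 1/100 and the squared extrinsic noise near 0.2², which gives a quick sanity check on the estimators.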
Expressions for Intrinsic Noise
To understand the sources of intrinsic fluctuations, consider the simplified model of gene expression shown in Fig. 1. All extrinsic variables (excepting time) are set to constant values; thus, the binding of RNAP, ribosomes, and degradosomes, for example, become first-order processes as their respective concentrations are held fixed. We model the bacterial cell cycle by allowing the gene copy number, n, to double at some (fixed) time t_{d} into each cycle and to halve at cell division (time T). Non-DNA molecular species are randomly distributed at cell division between the two daughter cells.
Fig. 1 Inset shows a simplified version of the model that can be solved analytically. Two effective rate constants, marked with primes, have been introduced and can be related to those of the full system. The mRNA half-life is given by the set of differential equations describing 〈mR^{U}〉 and 〈mC^{2}〉: d′_{0} ≃ log 2 × (ℓ_{1} − ℓ_{0})/2 with ℓ_{1} = k_{1} + mb_{1} + mf_{0} + mf_{1} and ℓ_{0}^{2} = ℓ_{1}^{2} − 4 mf_{0}(k_{1} + mb_{1}). The number of proteins made from a particular mRNA is distributed geometrically (similar to ref. 6), and so the mean number of proteins produced per transcript, b, can be shown to be

$$b = \frac{\mathit{mf}_{1}\, k_{1}}{\mathit{mf}_{0}\,(k_{1} + \mathit{mb}_{1})} \tag{10}$$

and the overall translation rate is ν′_{1} = bd′_{0}, by definition.
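The geometric burst-size statement can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the model of Fig. 1 itself: it assumes that a free mRNA either degrades (rate mf_0) or binds a ribosome (rate mf_1), that a bound ribosome either commits to translation (rate k_1) or unbinds (rate mb_1), and that the mRNA returns to the free state after commitment. Under those assumptions the mean number of proteins per transcript is mf_1 k_1 / [mf_0 (k_1 + mb_1)]. The function names and parameter values are hypothetical.

```python
import random


def mean_burst_size(mf0, mf1, mb1, k1):
    """Closed-form mean proteins per transcript under the assumptions above."""
    return (mf1 * k1) / (mf0 * (k1 + mb1))


def sample_burst(mf0, mf1, mb1, k1, rng):
    """Follow one mRNA until it degrades; return the number of proteins made."""
    proteins = 0
    while True:
        # Free mRNA: degradation competes with ribosome binding.
        if rng.random() < mf0 / (mf0 + mf1):
            return proteins
        # Ribosome bound: commitment to translation competes with unbinding.
        if rng.random() < k1 / (k1 + mb1):
            proteins += 1
        # In either case the mRNA is assumed to return to the free state.


def simulated_mean_burst(mf0, mf1, mb1, k1, n_transcripts, rng):
    """Average burst size over many independent transcripts."""
    total = sum(sample_burst(mf0, mf1, mb1, k1, rng) for _ in range(n_transcripts))
    return total / n_transcripts


if __name__ == "__main__":
    rng = random.Random(1)
    rates = dict(mf0=0.1, mf1=1.0, mb1=1.0, k1=1.0)  # illustrative values only
    print(mean_burst_size(**rates))
    print(simulated_mean_burst(n_transcripts=20000, rng=rng, **rates))
```

Because only the order of events matters for the protein count, the sketch samples which reaction fires next and never needs the waiting times.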
Simulation Results
Stochastic simulation (16, 17) was used to model the scheme of Fig. 1 by using parameter values published as supporting information on the PNAS web site, www.pnas.org. The probability of a given reaction occurring is equal to the product of the rate constant for that reaction and the number of potential reactants present. Time steps between reactions obey a Markov process and take account, for binary reactions, of the growing cell volume (17). The latter increases linearly (18) from an initial value until cell division (at time T), when it halves. Gene doubling and binomial partitioning of non-DNA molecules are included, and one daughter cell is followed at each division.
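The core of such a simulation can be sketched with a direct-method stochastic loop for a single birth-and-death species. This is a deliberately reduced illustration, not the scheme of Fig. 1: it omits cell growth, gene replication, and partitioning at division, and the function name and rate values are assumptions made here. Each reaction's propensity is its rate constant times the number of available reactants, the waiting time to the next event is exponential in the total propensity, and the reaction that fires is chosen in proportion to its propensity.

```python
import random


def gillespie_birth_death(k_birth, k_death, t_end, rng, n0=0):
    """Simulate constant-rate production and first-order decay.

    Returns the time-averaged molecule number over [0, t_end].
    k_birth must be positive so the total propensity never vanishes.
    """
    t = 0.0
    n = n0
    area = 0.0  # integral of n(t) dt, used for the time average
    while True:
        a_birth = k_birth          # zeroth-order production
        a_death = k_death * n      # rate constant times number of reactants
        a_total = a_birth + a_death
        dt = rng.expovariate(a_total)  # exponential waiting time
        if t + dt >= t_end:
            area += n * (t_end - t)
            break
        area += n * dt
        t += dt
        # Choose which reaction fires, weighted by propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
    return area / t_end


if __name__ == "__main__":
    rng = random.Random(2)
    print(gillespie_birth_death(k_birth=10.0, k_death=1.0, t_end=2000.0, rng=rng, n0=10))
```

For this toy process the long-time average should approach k_birth / k_death, which makes it a convenient check before adding further reactions or cell-cycle events.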
After a sufficient number of divisions, the protein number and concentration tend not to a steady state but rather to the limit cycle (whose period is set by cell division), shown in Fig. 2. The slight kink in the protein number curve is due to the increased rate of protein production as the number of genes doubles (chosen arbitrarily to be at time t_{d} = 0.4T into the cell cycle). The protein concentration is approximately the same before and after cell division once the limit cycle state has been reached. It falls initially (before gene replication) as protein is produced at a rate slower than that of cell growth.
The time scales associated with transcription and mRNA degradation are much shorter than the protein decay rate or the cell cycle time (see supporting information). Therefore, mRNA levels alternate between two approximately steady states, with a short transient in between. Despite the noise, this effect can be discerned in Fig. 2 (compare numbers for t < t_{d} with those for t > t_{d}).
Analytical Results
Assuming that the molecules involved in transcription are in one of two steady states (depending on the gene copy number n), and that all other time dependence is absorbed into the protein distribution, then the simpler model, shown in Fig. 1 Inset, can be substituted for the full scheme and solved analytically (see supporting information).
The mean mRNA number satisfies

$$\langle mR\rangle = \frac{f_{0}\, k_{0}}{d'_{0}\, \ell}\, n \tag{11}$$

before replication, t < t_{d}, and twice this result for t > t_{d}. Here, ℓ = f_{0} + b_{0} + k_{0}.
The mean protein number, which changes with time, obeys

$$\langle P(t)\rangle = \frac{\nu'_{1}}{d_{1}}\, \langle mR\rangle\, \phi_{0}(t) \tag{12}$$

with φ_{0} a continuous function of t (Eq. 13). Note that the factor of two arising from gene replication is absorbed into the function φ_{0}, and so 〈mR〉 in Eq. 12 is given by Eq. 11 regardless of t being greater or less than t_{d}.
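Eqs. 12 and 13 can be understood as being the solution of

$$\frac{d\langle P\rangle}{dt} = \begin{cases} \nu'_{1}\langle mR\rangle - d_{1}\langle P\rangle, & t < t_{d} \\ 2\nu'_{1}\langle mR\rangle - d_{1}\langle P\rangle, & t > t_{d} \end{cases} \tag{14}$$

with continuity at t = t_{d} and the limit cycle boundary condition

$$\langle P(0)\rangle = \tfrac{1}{2}\langle P(T)\rangle \tag{15}$$

which arises because of the partition at each cell division (see Fig. 2 Upper), that is, a simple birth-and-death process with the birth rate doubling after gene replication.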
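As a worked check of how the limit cycle condition fixes the time dependence, the following sketch solves a piecewise-linear birth-and-death equation of the kind just described. It assumes the mean obeys d〈P〉/dt = g(t) ν′_{1}〈mR〉 − d_{1}〈P〉 with g = 1 before replication and g = 2 after, writes 〈P(t)〉 = (ν′_{1}〈mR〉/d_{1}) φ(t), and uses 〈P(0)〉 = 〈P(T)〉/2 for the partition at division. The symbols φ, g, and c are introduced here for illustration, and the result may be arranged differently from the authors' Eq. 13.

$$\phi(t) = \begin{cases} 1 - c\, e^{-d_{1} t}, & 0 \le t < t_{d} \\ 2 - \left(c + e^{d_{1} t_{d}}\right) e^{-d_{1} t}, & t_{d} \le t \le T \end{cases}$$

The second branch already enforces continuity at t = t_{d}. Imposing 2φ(0) = φ(T) gives

$$2(1 - c) = 2 - \left(c + e^{d_{1} t_{d}}\right) e^{-d_{1} T} \quad\Longrightarrow\quad c = \frac{e^{-d_{1}(T - t_{d})}}{2 - e^{-d_{1} T}}$$

In the limit d_{1}T → 0, where protein is lost only by dilution, expanding to first order in d_{1} gives

$$\langle P(t)\rangle \to \nu'_{1}\langle mR\rangle \times \begin{cases} 2T - t_{d} + t, & t < t_{d} \\ 2\,(T - t_{d} + t), & t > t_{d} \end{cases}$$

so the protein number made in one cycle, ν′_{1}〈mR〉(2T − t_{d}), equals the number present at the start of the cycle, as required for doubling before division.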
The noise in mRNA number is 16 Eq. 16 is less than the Poisson value, 1/〈mR〉, because conservation of DNA species limits the maximum amount of C that can be present. This in turn gives an upper bound to the rate at which T is created, narrowing the distribution of T. A narrower T distribution leads to a narrower mRNA distribution and so to the negative term in Eq. 16. This correction, however, is rarely large.
The intrinsic noise in protein number (denoted η̂_{int}, rather than η_{int}, as the extrinsic variables have not yet been averaged away) satisfies, in the limit of small d_{1}/d′_{0} (see supporting information for the full expression), 17 with 18 for each traversal of the limit cycle. Note that 1/〈P〉 is also of order d_{1}/d′_{0} from Eq. 12 as ν′_{1} = bd′_{0}. Eq. 17 contains a Poisson term, the mean 〈P(t)〉, and a non-Poisson term, which is a measure of the stochastic effects present in transcription. The limit d_{1}/d′_{0} ≪ 1 taken is expected to be valid for many genes in E. coli, because protein lifetimes are typically hours, whereas those of mRNA are only minutes (see supporting information).
Fig. 3 shows the good agreement between theory and simulation. Because of the difference in gene copy number, the number of mRNAs for t > t_{d} is approximately twice that for t < t_{d} (see Eq. 11). If cell cycle effects are temporarily set aside, the protein noise, dominated by mRNA number, will therefore be higher for a gene copy number of n than for a gene copy number of 2n. However, because of the cell cycle, immediately after cell division the protein noise is low (being still determined by the previous 2n-gene state) and will grow for 0 < t < t_{d} as it tends toward the higher value set by the cell being in an n-gene state. Immediately after gene replication, the noise is high (from the cell having just left the n-gene regime) and so for t_{d} < t < T will fall as it tends toward the lower value prescribed by the cell's 2n state. Consequently, the intrinsic protein noise goes through a maximum at t = t_{d}.
The intrinsic noise, via Eq. 12 and using ν′_{1} = bd′_{0}, can also be written as 19 with Φ_{0}(t) = 1/φ_{0}(t) given in Eq. 13. The parameter dependence and form of Eq. 19 can in fact be shown to hold for the full model of Fig. 1 when d_{1}/d′_{0} ≪ 1. By inspecting Eq. 19, only the first term depends on the parameters controlling translation rates. Therefore, if the second term dominates the first, transcription rather than translation will determine protein intrinsic noise. Expanding in d_{1}T, this condition is fulfilled if 20 for d_{1}T ≪ 1. As 1 − f_{0}k_{0}/ℓ^{2} ≥ 3/4, Eq. 20 will certainly be satisfied, and so transcription will dominate over translation if the number of proteins per transcript (or burst size in the terminology of refs. 6 and 10) is much greater than two, i.e., b ≫ 2. In fact, the number of proteins per transcript may be of order tens (19) (although individual translation rates vary widely). In such cases, we conclude that transcription is the chief source of intrinsic noise. From Eq. 20, noise at translation becomes important only if mf_{1}, the rate of ribosome binding, and k_{1}, the rate of commitment of a bound ribosome to carrying out translation, are low and if mf_{0}, the rate of mRNA degradation, is high.
Previous work claimed that translation controls protein noise (10). This conclusion was reached by using an alternative noise definition (variance, rather than standard deviation, over the mean), which divides out all dependence on transcriptional parameters. In contrast, Eq. 19 treats transcription and translation on an equal footing and keeps all parameter dependence explicit. (For example, if the gene copy number, n, is increased 100-fold, the noise is reduced as expected intuitively.) The conclusion that translation is a minor contributor to noise (for b > 2) is thus transparent. This conclusion is also confirmed by recent independent simulations of LacZ expression (7).
Eq. 19 makes explicit the dependence of the intrinsic noise on the parameters shown in Fig. 1. A high ratio of protein to mRNA lifetime (low d_{1}/d′_{0}) reduces noise. Given Eq. 11, the importance of the rates controlling the initiation of transcription can be seen; both terms in Eq. 19 decrease for high f_{0}, the “on” rate of the RNAP, a large isomerization rate, k_{0}, and to a lesser extent a low RNAP “off” rate, b_{0}. The square of the noise is independent of ν_{0} and ν_{1} and varies inversely with the gene copy number, n. Thus, for example, fast-growing bacteria, which undergo multiple rounds of initiation of DNA replication before division, are expected to be intrinsically less noisy because of their higher n values (depending on gene distance from the origin of replication). The additional parameter dependence when translation also becomes important, i.e., when b ≪ 2 and η^{2} ∼ 1/b, is given by Eq. 10.
To illustrate the role of the cell cycle and the significance of the gene replication time, t_{d}, let us, for the moment, ignore all other effects and assume that time is the only extrinsic variable. Then, the extrinsic average (the overbar) in Eq. 8 is just a cell cycle average and 21 for example. This experimentally accessible value is an approximation to the “true” average value of the intrinsic noise, ∫ dt η̂ (t), which involves only one integral and not two. Table 1 demonstrates the validity of Eq. 21 as an estimate of the true noise and the excellent agreement between theory and simulation.
In the limit where d_{1}T → 0 (appropriate for GFP in E. coli), protein does not decay but is diluted only because of partition at each cell division. Eq. 21 is then simple to evaluate and satisfies (with τ_{d} = t_{d}/T) 22 Both terms in Eq. 22 can be seen to increase monotonically with τ_{d} = t_{d}/T; a low t_{d} implies that the cell spends most time in the high gene copy number state, 2n, and so protein and mRNA levels increase, reducing noise. As t_{d}/T increases, Eq. 22 varies by a factor of around 0.35 to 0.7. Thus, a steady-state approximation, in which time dependence is ignored (Φ_{0} = Φ_{1} = 1), could overestimate intrinsic noise by as much as 65%.
Expressions for Extrinsic Noise
Time is, of course, not the only extrinsic variable; the sources of extrinsic noise are multiple, ranging from fluctuations in the bacterial growth rate to changes in the degree of DNA supercoiling, and are often poorly understood. The simplest way to model these effects is to let each extrinsic variable, E_{k}, have mean μ_{k} and standard deviation, σ_{k}, and an independent, normal distribution. Using asymptotic expansion methods (20), performing an extrinsic average of a function f(E) then gives

$$\overline{f(E)} \simeq f(\mu) + \frac{1}{2}\sum_{k} \eta_{k}^{2}\, \partial_{k}^{2} f(\mu) \tag{23}$$

in the limit of small extrinsic noise, η_{k} = σ_{k}/μ_{k}. Here we write ∂_{k} for a partial derivative with respect to the variable x_{k} = E_{k}/μ_{k}.
The extrinsic average arises in Eq. 8 as, experimentally, averages are taken over a population of cells, each cell having a different set of values of the extrinsic variables. The theoretical intrinsic noise, Eq. 19, is calculated with all extrinsic variables fixed and therefore needs a correction term before it can be properly compared with experiment. Rather than also average over time explicitly, we will, for the sake of clarity, treat it just as a parameter in this section. The intrinsic noise, η_{int}, then satisfies, via Eq. 23, 24 with η̂_{int} given by Eq. 17 and all the extrinsic variables set to their mean values, μ_{k}. Experimentally, it is possible to continuously vary the rate of transcription from certain inducible promoters (14, 21). Eq. 24 implies that there is a correction to the more naive expectation, from Eq. 19, that the intrinsic noise should vary as the inverse square of induction level (assuming that the latter is proportional to the number of mRNAs).
The extrinsic noise, η_{ext}, obeys

$$\eta_{ext}^{2} \simeq \sum_{k} \chi_{k}\, \eta_{k}^{2} \tag{25}$$

with all extrinsic variables set to their mean and χ_{k} = (∂_{k} log〈P〉)^{2} defined, in analogy with statistical physics, as a noise “susceptibility.” The individual χ_{k} measure the contribution of a particular extrinsic process to the total noise strength.
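The susceptibility picture can be checked numerically on a toy example. The sketch below assumes that, to leading order in small extrinsic noise, the squared extrinsic noise is the sum over extrinsic variables of χ_{k} η_{k}^{2}, and compares that prediction with direct sampling of independent, normally distributed extrinsic variables. The expression-level function `mean_expression` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def mean_expression(e1, e2):
    """Toy mean protein level as a function of two extrinsic variables."""
    return 100.0 * e1 / (1.0 + e2)


def susceptibility(f, mu, k, h=1e-4):
    """chi_k = (d log f / d x_k)^2 at x = 1, where x_k = E_k / mu_k.

    The derivative is taken by central finite differences.
    """
    up = list(mu)
    down = list(mu)
    up[k] = mu[k] * (1.0 + h)
    down[k] = mu[k] * (1.0 - h)
    dlog = (math.log(f(*up)) - math.log(f(*down))) / (2.0 * h)
    return dlog ** 2


def predicted_ext_noise_sq(f, mu, eta):
    """Leading-order prediction: sum of chi_k * eta_k^2."""
    return sum(susceptibility(f, mu, k) * eta[k] ** 2 for k in range(len(mu)))


def sampled_ext_noise_sq(f, mu, eta, n_cells, rng):
    """Variance over squared mean of f across sampled 'cells'."""
    values = [
        f(*[rng.gauss(m, e * m) for m, e in zip(mu, eta)])
        for _ in range(n_cells)
    ]
    mean = sum(values) / n_cells
    var = sum((v - mean) ** 2 for v in values) / n_cells
    return var / mean ** 2


if __name__ == "__main__":
    rng = random.Random(3)
    mu = [50.0, 2.0]     # means of the extrinsic variables (illustrative)
    eta = [0.05, 0.10]   # their coefficients of variation (illustrative)
    print(predicted_ext_noise_sq(mean_expression, mu, eta))
    print(sampled_ext_noise_sq(mean_expression, mu, eta, 100000, rng))
```

The two numbers should agree to within a few percent for noise levels this small; the gap widens as the η_{k} grow, which is where higher-order terms in the expansion start to matter.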
An Example: Repression
To illustrate the effects of a fluctuating extrinsic variable, we consider a gene of interest that is repressed by another extrinsic protein. The repressor has a noise given by Eq. 17 with the appropriate parameter values for its own expression. To find the intrinsic noise, from Eq. 9, we assume that two identical copies of our gene are present, both acted on by the same repressor.
Repression is modeled by the repressor, R, binding to the promoter and preventing access to it by RNAP (ref. 22). This repressor–DNA complex forms and decomposes with rates f_{1} and b_{1}, respectively, implying that the mean mRNA number of the (repressed) gene obeys

$$\langle mR\rangle = \frac{f_{0}\, k_{0}\, n}{d'_{0}\,(\ell + \beta K)} \tag{26}$$

where β = b_{0} + k_{0} and K = f_{1}/b_{1}.
In Eq. 26, to make the extrinsic repressor concentration explicit, the “on” rate (which is really a secondorder process) should be written as f_{1} = f̂_{1}μ_{rep}, where f̂_{1} is the binding rate of a single repressor to DNA, and μ_{rep}, the mean repressor number. Applying the definition of χ_{k} to Eq. 12, with 〈mR〉 given by Eq. 26, the repressor noise susceptibility can be found, and Eq. 8 becomes 27 where η̂ is given by 28 with Eq. 26 again.
The total, intrinsic, and extrinsic noises found by simulation are compared with the corresponding theoretical values (with time again included properly as an extrinsic variable) in Table 2. The good agreement, as well as validating Eqs. 27 and 28, also demonstrates the suitability of the Gaussian approximation implicit in Eq. 23 as the repressor is expressed in the simulation using the full scheme of Fig. 1 and not added in an ad hoc manner.
The repressor noise may dominate the extrinsic noise to such an extent that Eq. 27 is still valid when all other extrinsic variables fluctuate. For example, in ref. 14, the transcription rate of two distinguishable alleles of gfp, both having the same regulatory sequences, is controlled by a repressor. When the repressor concentration is systematically varied from low to high values (by adding different amounts of inducer), the extrinsic noise goes through a maximum (14). This phenomenon can be understood by realizing that the extrinsic noise is dominated by the repressor. Although χ_{rep} increases with increasing repressor concentration, η_{rep}, the noise in repressor number, decreases. As a result of these two opposing behaviors, η_{ext} (and therefore η_{tot}) exhibits a maximum as a function of repressor concentration. This behavior illustrates clearly the importance of noise susceptibilities in setting cell–cell variation.
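The origin of the maximum can be sketched with a few lines of code. The example assumes that the mean expression level is proportional to 1/(ℓ + βK) with K = f̂_{1}μ_{rep}/b_{1}, so that the repressor susceptibility is χ_{rep} = [βK/(ℓ + βK)]^{2}, and it further assumes a Poisson-like repressor noise η_{rep}^{2} = F/μ_{rep}. That last scaling, the factor F (`fano`), the function names, and the parameter values are simplifications introduced here; the paper obtains the repressor noise from its own expression model.

```python
def repressor_susceptibility(mu_rep, f0, b0, k0, f1_hat, b1):
    """chi_rep = (beta*K / (ell + beta*K))^2 with K = f1_hat * mu_rep / b1."""
    ell = f0 + b0 + k0
    beta = b0 + k0
    big_k = f1_hat * mu_rep / b1
    return (beta * big_k / (ell + beta * big_k)) ** 2


def extrinsic_noise_sq(mu_rep, f0, b0, k0, f1_hat, b1, fano=1.0):
    """chi_rep * eta_rep^2, with eta_rep^2 = fano / mu_rep (an assumption)."""
    eta_rep_sq = fano / mu_rep
    return repressor_susceptibility(mu_rep, f0, b0, k0, f1_hat, b1) * eta_rep_sq


def analytic_peak(f0, b0, k0, f1_hat, b1):
    """Repressor number maximizing the product under the assumed scaling."""
    ell = f0 + b0 + k0
    beta = b0 + k0
    return ell * b1 / (beta * f1_hat)


if __name__ == "__main__":
    rates = dict(f0=1.0, b0=1.0, k0=1.0, f1_hat=0.01, b1=0.1)  # illustrative
    grid = range(1, 1001)
    best = max(grid, key=lambda mu: extrinsic_noise_sq(mu, **rates))
    print(best, analytic_peak(**rates))
```

Under these assumptions the peak sits where βK = ℓ, that is, where repressor binding has reduced expression to half its unrepressed level: below that point the susceptibility is too small, and above it the repressor is too abundant to fluctuate much in relative terms.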
Conclusion
We have presented a theoretical framework that enables interpretation of experimental measurements of stochasticity in gene expression. Cell–cell variation in expression of a single gene (η_{tot}) is not a direct measure of intrinsic noise. Rather, it contains both intrinsic and extrinsic contributions. In particular, extrinsic noise, a consequence of the different local environments of the gene in the different cells, must be considered.
Only the intrinsic variables (given in Fig. 1) vary from gene to gene, as well as moment to moment, within a particular cell. By changing the parameters that influence these variables, the cell can locally affect the noise in expression of a given gene. On the other hand, alterations in the extrinsic variables can potentially affect all genes within the cell (although the magnitude of these effects for one gene may be very different from those for another). Eqs. 17 and 19 are analytical expressions for the major component of the intrinsic noise. There is a Poisson term, expected for a birth-and-death process, determined by the protein mean, and an additional contribution coming from the noise generated during transcription (essentially a time average of the noise in mRNA level). Two noise regimes exist: if the translation efficiency, or burst size (6), b, is high (more than two proteins per transcript), as is believed to be typical in E. coli, then transcription dominates intrinsic noise. Otherwise, translational effects must also be considered.
All the major steps in transcription and translation are accounted for, and the complete parameter dependence of the noise is given by Eq. 19 with Eq. 10. Intrinsic noise (except possibly for very short-lived proteins) is unaffected by ν_{0} and ν_{1}, the rates of transcription by RNAP and translation by a ribosome, respectively. As transcription usually dominates, f_{0} and b_{0}, the “on” and “off” rates of RNAP, as well as the isomerization rate, k_{0}, strongly influence noise. Longer-lived proteins (compared to mRNA lifetimes) and genes with high copy number are less stochastic. The chromosomal position of the gene also controls intrinsic noise: genes replicated early are less noisy.
The cell cycle drives protein numbers and intrinsic noise to a limit cycle. Protein numbers can be significantly different from the steady-state approximations used in the literature. The intrinsic noise itself does not change appreciably during the course of the cell cycle, but the cell cycle is crucial in determining its absolute magnitude.
The extrinsic noise is expected to be a linear sum of the noise in each of the extrinsic variables (see Eq. 25), where the coefficients play the role of noise “susceptibilities.” These susceptibilities determine the relative importance of each term in the total extrinsic noise and allow exploration of how the environment in which a gene is expressed influences its expression level. By simulating a repressed gene, where the repressor number is the only fluctuating extrinsic variable, we have verified our analytical expressions are quantitatively correct. Experimentally, the intrinsic and extrinsic noise can often be of similar magnitude (14). For a given gene, however, the quantity of interest is usually the intrinsic noise, which we have shown here can be measured by monitoring expression from two identical copies of the same gene integrated into each cell (see also ref. 14).
Our theoretical framework should provide support for experimental research (14) aimed at discovering whether noise is detrimental to the cell, whether it can be “regulated away” with higher-level circuitry (23), and to what extent it might confer evolutionary advantages on a clonal population.
Acknowledgments
We are grateful for conversations with S. Bekiranov, A. J. Levine, J. Paulsson, N. Rajewsky, B. Shraiman, N. Socci, and M. Zapotocky. P.S.S., M.B.E., and E.D.S. acknowledge support from the National Institutes of Health (GM59018), the Seaver Institute and Burroughs-Wellcome Fund, and the National Science Foundation (DMR-0129848), respectively.
Footnotes
Abbreviation: RNAP, RNA polymerase.
 Received January 23, 2002.
 Copyright © 2002, The National Academy of Sciences
References
 ↵
 ↵
 ↵
 Arkin A,
 Ross J,
 McAdams H H
 ↵
 ↵
 McAdams H H,
 Arkin A
 ↵
 Kierzek A M,
 Zaim J,
 Zielenkiewicz P
 ↵
 Thattai M,
 Van Oudenaarden A
 ↵
 ↵
 ↵
 Hasty J,
 Pradines J,
 Dolnik M,
 Collins J J
 ↵
 Elowitz M B,
 Levine A J,
 Siggia E D,
 Swain P S
 ↵
 Neidhardt F C
 Record T M,
 Reznikoff W S,
 Craig M L,
 McQuade K L,
 Schlax P J
 ↵
 ↵
 Gibson M A,
 Bruck J
 ↵
 Kubitschek H E
 ↵
 Neidhardt F C
 Bremer H,
 Dennis P P
 ↵
 Bender C M,
 Orszag S A
 ↵
 ↵
 Ptashne M
 ↵