Previous Article |
Table of Contents
| Next Article
MICROBIOLOGY
The global transcriptional regulatory network for metabolism in Escherichia coli exhibits few dominant functional states
Bioengineering Department, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0412
Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA and approved November 1, 2005 (received for review June 24, 2005)
| Abstract |
|---|
|
|
|---|
systems biology | transcriptional regulation
Genome-scale reconstructions like iMC1010v1 can be used to compute phenotypic states by using constraint-based approaches (7). Dynamic batch-culture growth simulations (8, 9) have been performed by using constraint-based analysis (1014). Constraint-based analysis has been used to assess optimal growth states (15), consequences of gene knockouts (1618), endpoints of adaptive evolution (19), synthetic lethals (20), epistatic interactions between metabolic genes (21), and enzyme dispensability (22). The aim of this work was to comprehensively simulate the full set of possible molecular interactions in iMC1010v1 to gain a global view of the range of functional network states and to compare the results with experimental outcomes.
| Materials and Methods |
|---|
|
|
|---|
Growth Simulations. Of the 108,723 minimal-media compositions, 15,580 were predicted to allow cellular growth by iMC1010v1. Any condition that resulted in a biomass doubling time of, at most, 12 h was considered to be one that allowed cellular growth. Simulations corresponded to log-phase batch growth for 50 min, with time steps in 5-min intervals.
Computation-Based Gene-Expression Profiles. The expression level of each gene for a growth simulation was computed to be the fraction of all time steps for which the gene was active, or "on" and then rounded to 0 or 1.
Computation-Based Activity Profiles. Because of combinatorial logic at promoters, the "off/on" relationship between transcription factors (TFs) and the genes they regulate is often observed to change during a growth simulation. Therefore, we developed an encoding scheme that records the computed gene-expression states of iMC1010v1 and the logic of all regulatory interactions observed in a simulation in a single binary vector. Fig. 1 demonstrates the procedure for forming such computation-based activity profiles. This encoding scheme is regulatory-connection centric as opposed to promoter centric and, as such, loses some of the information about combinatorial regulation at individual promoters. This scheme does, however, have the strength of being comparable with (experimental) expression-based activity profiles (see below).
Clustering Profiles. The similarity between growth simulations was quantified by computing the Hamming (27) distance between either the computation-based gene-expression profiles or the computation-based activity profiles. We developed an agglomerative strategy for clustering profiles. This strategy used a distance cutoff to decide whether to place a profile within an existing cluster or to form a new cluster. The location of the first troughs in Fig. 2 A and B were used as the distance cutoff, giving a cutoff of 45 bits for expression-profile clustering and 200 bits for activity-profile clustering. The procedure placed each profile together with its "nearest" profile(s) until all vectors were part of a cluster. An iterative adjustment phase was then used to place each vector into another cluster when it was actually closer to all of the members of that cluster than to the members of its current cluster. This adjustment phase was reperformed until convergence. In practice, three such iterations were needed. Very similar clustering results were obtained with the program CLUSTER (23) from the Eisen lab, using (average linkage) hierarchical clustering and the Euclidean distance metric.
|
Visualizing Profiles in 3-Dimensional Space. The clustering described above resulted in 36 and 13 clusters for the computation-based gene expression profiles and computation-based activity profiles, respectively. Classical multidimensional scaling (24), also known as principal-components analysis, was then used to project the very high-dimensional activity profiles into 3-dimensional space. (The relative contribution of the first three eigenvalues was 76% and 90% for the expression and activity profiles, respectively, indicating that the location of activity profiles relative to each other in three dimensions reflected very well their location relative to each other in their original high-dimensional space.) As a final step, a convex hull (25) was placed around all of the members of each cluster so that clusters could be visualized.
Expression Data. Microarray data from nine different experimental conditions were obtained from various studies performed in this laboratory and in others (6, 26, 37). All data sets had the following features in common: (i) They were generated by using WT E. coli K-12 MG1655 in batch culture grown to midlog phase; (ii) mRNA was reverse-transcribed and labeled as recommended by Affymetrix (Santa Clara, CA); (iii) cDNA samples were hybridized to E. coli antisense arrays (part no. 900381, Affymetrix); and (iv) data from .CEL files were processed by using the program MAS ver. 5.0 or 5.1 (Affymetrix) and normalized such that the average signal from all ORFs was 1,000 (an arbitrary value used by other laboratories). Log2 values of signal were used in this analysis. The growth conditions for the expression profiles are detailed in Table 1, which is published as supporting information on the PNAS web site.
Experiment-Based Activity Profiles. To use experimental expression data to support results stemming from computation-based activity profiles, we combined signal-intensity values from gene-expression profiles with the known structure of the transcriptional regulatory network to produce experiment-based activity profiles (see Fig. 1). Key to the computation-based activity profiles was the encoding of the expression relationship between a TF and target gene. For real expression profiles, this relationship was encoded as the logarithm of the ratio of the signal intensity values between a TF and a (regulated) target gene. Experiment-based activity profiles were constructed from experimental expression data for nine different environments.
Comparing Computation-Based Profiles with Experiment-Based Profiles. In this work, we investigated the correspondence between computational and experimental profiles. We were interested in measuring the degree to which the spatial relationship among the experimental profiles matched the spatial relationship among computational profiles for both gene-expression and activity profiles. Because computational and experimental profiles reside in different metric spaces, they are not directly comparable. To measure the degree of correspondence, we computed the distance between all pairs of experimental profiles and all pairs of computational profiles for the nine growth environments. This computation gave rise to 9 x 8/2 = 36 pairs of profiles between which distances could be calculated. Each set (computational and experimental) of distance values imparts a ranking among the 36 pairs of growth environments. We then used rank-correlation analysis (27) to measure the degree of correspondence between the computational and experimental profiles. We computed both Kendall's
and Spearman's
rank-correlation coefficients and the P value associated with the computed values.
| Results |
|---|
|
|
|---|
|
The second approach incorporates both the gene-expression profiles (as above) and the logic states of the TF-target-gene regulatory connections into an activity profile (see Fig. 1). Each activity profile is a bit vector composed of units describing the logical interactions observed for every transcription-factortarget-gene pair and also accounts for the expression state of each gene. A histogram of the Hamming distances between all pairs of such computation-based activity profiles is trimodal (Fig. 2 A) and has a more distinct structure than the expression state alone (Fig. 2 A). Through clustering and dimensionality reduction (as above), we visually elucidate clusters of activity profiles of iMC1010v1 in a 3-dimensional space (Fig. 4).
We investigated how our clustering results changed when we used cutoff values progressively lesser and greater than the automatically chosen ones. We performed this assessment quantitatively by first defining a coclustering relationship for all pairs of profiles based on the clusters for the automatically chosen cutoff. We then established a "reference relationship" between every pair of profiles: either the two profiles of a pair were in the same cluster, or they were not. Next, we reperformed the clustering procedure for different cutoff values. For each different value, we observed the resulting coclustering relationship for all pairs of profiles and then calculated the fraction of the reference relationships that were maintained. We observed that >90% of the reference relationships were maintained for cutoff values ±30% of the automatically chosen value (data not shown), showing the stability of the structure of the clusters in Figs. 3 and 4 to the exact value of the clustering cutoff value.
Functional States of iMC1010v1 Are Globally Organized According to the Terminal Electron Acceptor and Carbon Source. We analyzed the growth environments corresponding to the computed gene expression and activity profiles in the clusters shown in Figs. 3 and 4. The clusters of expression profiles in Fig. 3 are generally distinguished by the fact that the growth environments to which they correspond use the same terminal electron acceptor. This distinction is not crisp, however, because many clusters characterized by different terminal electron acceptors overlap and/or are closer to each other than they are to other clusters characterized by the same electron acceptor. Distinguishing clusters based on other growth-environment components did not prove fruitful.
In contrast, the activity profile clusters in Fig. 4 could be clearly distinguished by three discriminating environmental characteristics. The vast majority (97%) of the activity profiles could be discriminated based solely on the available terminal electron acceptor, regardless of the available carbon, nitrogen, phosphate, or sulfur sources. All activity profiles in clusters 14 had dimethyl sulfoxide, fumarate, trimethylamine N-oxide, or none (i.e., anaerobic fermentation) as the electron acceptor; all in clusters 7 and 8 had NO2 or NO3 as the electron acceptor; and all in clusters 1012 had O2 as the electron acceptor. Together, these nine clusters contained 97% of the activity profiles.
The clusters using the same electron-acceptor class are indicated by the ellipses in Fig. 4. The remaining 3% of the activity profiles were contained in the four clusters (5, 6, 9, and 13) that are separated within the ellipses. These four clusters represent discrimination based on the presence of glucose or gluconate in the growth environment. So, for example, whereas clusters 79 contained all activity profiles that used NO2 or NO3 as the electron acceptor, those that used glucose or gluconate were in cluster 9, and those using all other carbon sources were in clusters 7 and 8. This separation of clusters 7 and 8 from cluster 9 implies that iMC1010v1 varies significantly in its use when glucose or gluconate, as opposed to all other carbon sources, are present in the growth environment. This result is, in large part, due to Crp, which is inactive in the presence of glucose and gluconate because of low levels of cAMP and whose regulon is nearly twice the size (197 vs. 111) of the next-largest regulon (28).
The remaining organization in the set of activity profiles was attributable to the available nitrogen source. The distinct but overlapping clusters in Fig. 4 (indicated by different colors and cluster numbers) were mostly the result of some nitrogen sources eliciting different regulatory and metabolic responses. The nitrogen sources that distinguished overlapping clusters were adenine, adenosine, deoxyadenosine, guanosine, deoxyguanosine, cytidine, inosine, and N-acetyl-D-mannosamine. The fact that these clusters overlap indicates that the regulatory and metabolic networks are not as differently used for these nitrogen sources as compared with glucose/gluconate and the rest of the carbon sources.
The Activity of a Few TFs Distinguishes the Dominant Operating Modes of iMC1010v1. The clusters in Fig. 4 can be discriminated among based on three environmental components. Because the computation-based activity profiles include the state of the TFs, we can also distinguish the clusters based on TF activities. To determine the roles of the TFs in the clusters, we first detailed the activities of all TFs in the following sets of clusters: 14; 5, 6; 7, 8; 9; 1012; and 13. Then, for every pair of sets of clusters, we report in Table 2 (which is published as supporting information on the PNAS web site) those TFs that were active in 100% of all simulations in one set and inactive in 100% of all simulations in another set. The implication of these results is that some or all of these identified TFs are global determinants of the expression program in response to the environments in which they are active.
|
and Spearman's
rank-correlation coefficients based on the ranking of interprofile distances for the 9 x 8/2 = 36 pairs of experiment- and computation-based profiles. For the gene-expression profiles, we computed
= 0.615 and
= 0.828, with associated P values of 6.3 x 10-8 and 4.9 x 10-7, respectively. For the activity profiles, we computed
= 0.610 and
= 0.790, with associated P values of 8.5 x 10-8 and 1.5 x 10-6, respectively. These values are acceptably high, especially considering the measurement noise (mean r2 value between microarray replicates was 0.942). We concluded that the available experimental gene-expression data were highly consistent with the spatial organization of the clusters in Figs. 3 and 4. Because of the limited amount of available experimental data, this conclusion will require revalidation as additional expression-profiling data for appropriate minimal-media conditions are published. | Discussion |
|---|
|
|
|---|
Fig. 4 demonstrates how all functional states of iMC1010v1 can be condensed to allow one to quickly and easily understand the functional capabilities and properties of the E. coli integrated transcriptional regulatory and metabolic network for a large number of growth environments. This representation and summary of functional states is not possible with a logical connectivity diagram of the transcriptional regulatory-network hierarchy. The result that a large genetic system of >1,000 genes operating in 15,580 different growth environments achieves few distinct overall responses is consistent with previous experimental results and discussion about "simplicity from complexity" (31) and "complexity for simplicity" (32). Singular value decomposition analysis of gene-expression data showed the transcriptional response of the yeast genome to be orchestrated through a few fundamental patterns of gene-expression change (35). With changing cellular conditions, a majority of genes were seen to transition between active and inactive states, at most, once or twice. The study of signaling networks (6) led to the proposition (also conceptually applicable to regulatory networks) that the objective of system complexity is to guarantee reliable system output. That is, system complexity is built in to robustly provide for simple behavior. The result of few global functional modes is also consistent with earlier theoretical work on logical control networks and their applicability to genomic regulatory networks (33). In this work, random Boolean networks parameterized to reflect fundamental features of polymer chemistry were found to have few state cycle attractors, which are interpreted to be cell phenotypes. The results presented here, based on the largest reconstructed genomic regulatory system to date, are consistent with this hypothesis.
|
Of the other cluster-defining TFs identified in Table 2, Fnr and NarL are also known to be widely influential. Fnr, which belongs to the Crp family of transcriptional regulators and is active in the absence of O2, activates genes for fermentation and anaerobic growth and represses genes for aerobic growth. Fnr activates transcription initiation in a similar way to Crp, and each can bind to the DNA site for the other protein (34, 35). Because of these abilities, the authors of the Crp study proposed that some of their identified Crp-regulated operons may actually be Fnr-regulated in vivo. NarL acts as an activator and repressor in response to a nitrate/nitrite environment to enable anaerobic respiration using either nitrite or nitrate as the terminal electron acceptor.
The two remaining transcription factors identified in Table 2, CaiF and FeaR, are known to regulate only relatively few genes in comparison with the other TFs. It appears that CaiF and FeaR are narrowly acting TFs that are directly regulated only by global TFs. Nonetheless, the result that they were singled out with known global regulators may make them interesting targets for study.
Figs. 3 and 4 show that the more detailed, or informative, state description leads not to a more complicated view of the cell's "state space," but to a markedly simpler one. We interpret this result to be a reflection of the internal organization of the cell, which is that of a system robustly structured against all but a few characteristic behaviors. If this is the case, then we expect that this reflection will become increasingly pronounced as large amounts of additional types of state information are integrated.
| Conclusion |
|---|
|
|
|---|
This study, on the whole, represents an integration of diverse sources of data. Such structured integration of diverse data sets represents one of the main current challenges in systems biology. By demonstrating that the integration of the metabolic and transcriptional regulatory networks in a mechanistic format can be used to generate an improved understanding of the functional states of E. coli metabolism, we have demonstrated the utility of the systems-biology approach and a framework for its implementation.
| Acknowledgements |
|---|
| Footnotes |
|---|
Conflict of interest statement: University of California at San Diego has a related patent application (U.S. Patent Application 20040072723) that has been licensed.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviation: TF, transcription factor.
* To whom correspondence should be addressed. E-mail: palsson{at}ucsd.edu.
© 2005 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
Related articles in PNAS:
This article has been cited by other articles in HighWire Press-hosted journals:
![]() |
L. G. Koch and S. L. Britton Aerobic metabolism underlies complexity and capacity J. Physiol., January 1, 2008; 586(1): 83 - 95. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. G. Koch and S. L. Britton Evolution, atmospheric oxygen, and complex disease Physiol Genomics, August 20, 2007; 30(3): 205 - 208. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. D. Vo and B. O. Palsson Building the power house: recent advances in mitochondrial studies through proteomics and systems biology Am J Physiol Cell Physiol, January 1, 2007; 292(1): C164 - C177. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Sanjuan and S. F. Elena Epistasis correlates to genomic complexity PNAS, September 26, 2006; 103(39): 14402 - 14405. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. A. Rahman and D. Schomburg Observing local and global properties of metabolic pathways: 'load points' and 'choke points' in the metabolic networks Bioinformatics, July 15, 2006; 22(14): 1767 - 1774. [Abstract] [Full Text] [PDF] |
||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||