# Reconstructing the pathways of a cellular system from genome-scale signals by using matrix and tensor computations

## Abstract

We describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways that compose a cellular system from genome-scale nondirectional networks of correlations among the genes of the system. The EVD formulates a genes × genes network as a linear superposition of genes × genes decorrelated and decoupled rank-1 subnetworks, which can be associated with functionally independent pathways. The integrative pseudoinverse projection of a network computed from a “data” signal onto a designated “basis” signal approximates the network as a linear superposition of only the subnetworks that are common to both signals and simulates observation of only the pathways that are manifest in both experiments. We define a comparative HOEVD that formulates a series of networks as linear superpositions of decorrelated rank-1 subnetworks and the rank-2 couplings among these subnetworks, which can be associated with independent pathways and the transitions among them common to all networks in the series or exclusive to a subset of the networks. Boolean functions of the discretized subnetworks and couplings highlight differential, i.e., pathway-dependent, relations among genes. We illustrate the EVD, pseudoinverse projection, and HOEVD of genome-scale networks with analyses of yeast DNA microarray data.


DNA microarrays make it possible to record the complete genomic signals, such as mRNA expression (e.g., refs. 1 and 2) and DNA-bound proteins' occupancy levels (e.g., ref. 3), that are generated and sensed by cellular systems. The underlying genome-scale networks of relations among all genes of the cellular systems can be computed from these signals (e.g., refs. 4–6). These relations among the activities of genes, not only the activities of the genes alone, are known to be pathway-dependent, i.e., conditioned by the biological and experimental settings in which they are observed (e.g., ref. 7). For example, the mRNA expression patterns of the yeast *Saccharomyces cerevisiae* genes *KAR4* and *CIK1* are correlated during mating yet anticorrelated during cell-cycle progression (8). A single genome-scale nondirectional network of correlations cannot describe the pathway-dependent differences in relations, such as those between the expression patterns of *KAR4* and *CIK1*.

Recently, we showed that the matrix singular-value decomposition (SVD), generalized SVD, and pseudoinverse projection separate genome-scale signals, i.e., gene and array patterns of, e.g., mRNA expression and proteins' DNA binding, into mathematically defined patterns that correlate with the independent biological and experimental processes and cellular states that compose the signals (9–12). For example, the comparative generalized SVD of yeast and human mRNA expression during their cell cycles formulates the yeast expression as a linear superposition of cell-cycle oscillations, which are common to the yeast and human, and response to synchronization by the mating pheromone, which is exclusive to the yeast, and describes a differential relation in the expression of genes such as *KAR4* and *CIK1* that is in agreement with their pathway-dependent activities (11).

Now, we describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways, or genome-scale pathway-dependent relations among the genes of a cellular system, from nondirectional networks of correlations, which are computed from measured genomic signals and tabulated in symmetric matrices. The EVD formulates a genes × genes network, which is computed from a “data” signal, as a linear superposition of genes × genes decorrelated and decoupled rank-1 subnetworks. We show that significant EVD subnetworks might represent functionally independent pathways that are manifest in the data signal. The integrative pseudoinverse projection of a network, computed from a data signal, onto a designated “basis” signal approximates the network as a linear superposition of only the subnetworks that are common to both signals, i.e., pseudoinverse projection filters off the network the subnetworks that are exclusive to the data signal. We show that the pseudoinverse-projected network simulates observation of only the pathways that are manifest under both sets of the biological and experimental conditions where the data and basis signals are measured. We define a comparative HOEVD that formulates a series of networks computed from a series of signals as linear superpositions of decorrelated rank-1 subnetworks and the rank-2 couplings among these subnetworks. We show that significant HOEVD subnetworks and couplings might represent independent pathways or transitions among them common to all or exclusive to a subset of the signals. Boolean functions of the discretized subnetworks and couplings highlight known as well as previously unknown differential, i.e., pathway-dependent, relations between genes.

We illustrate the EVD, pseudoinverse projection, and HOEVD of genome-scale networks with analyses of mRNA expression data from the yeast *Saccharomyces cerevisiae* during its cell cycle (1) and DNA-binding data of yeast transcription factors that are involved in cell-cycle, development, and biosynthesis programs (3).

## Mathematical Methods: EVD, Pseudoinverse Projection, and HOEVD of Networks

**Eigenvalue Decomposition.** Let the symmetric matrix \(\hat{a}_1\) of size *N*-genes × *N*-genes tabulate the genome-scale nondirectional network of correlations among the genes of a cellular system.¶ The network \(\hat{a}_1\) is computed from a genome-scale signal, designated the data signal, of, e.g., mRNA expression levels measured in a set of \(M_1\) samples of the cellular system using \(M_1\) DNA microarrays and tabulated in the *N*-genes × \(M_1\)-arrays matrix \(\hat{e}_1\), such that \(\hat{a}_1 = \hat{e}_1 \hat{e}_1^{T}\). We compute the EVD of the network \(\hat{a}_1\),

\[ \hat{a}_1 = \hat{u}_1 \hat{\varepsilon}_1^{2} \hat{u}_1^{T}, \tag{1} \]

from the SVD of the data signal \(\hat{e}_1 = \hat{u}_1 \hat{\varepsilon}_1 \hat{v}_1^{T}\). The \(M_1\)-“eigenarrays” × \(M_1\)-“eigengenes” diagonal matrix \(\hat{\varepsilon}_1\) defines the \(M_1\) nonnegative “eigenexpression” levels, such that the expression of the *m*th eigengene in the *m*th eigenarray is the *m*th eigenexpression level of \(\hat{e}_1\), \(\varepsilon_{1,m} \equiv \langle m|\hat{\varepsilon}_1|m\rangle \geq 0\). The orthogonal transformation matrices \(\hat{u}_1\) and \(\hat{v}_1^{T}\) define the *N*-genes × \(M_1\)-eigenarrays and the \(M_1\)-eigengenes × \(M_1\)-arrays subspaces, respectively. The *m*th column of \(\hat{u}_1\), \(|\alpha_{1,m}\rangle \equiv \hat{u}_1|m\rangle\), lists the genome-scale expression of the *m*th eigenarray of \(\hat{e}_1\). The *n*th row of \(\hat{v}_1^{T}\), \(\langle\gamma_{1,n}| \equiv \langle n|\hat{v}_1^{T}\), lists the expression of the *n*th eigengene.
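The relation between the EVD of the network and the SVD of the signal can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function name `network_evd` and the random toy signal (50 genes, 6 arrays) are assumptions made for illustration only.

```python
import numpy as np

def network_evd(e):
    """EVD of the network a = e e^T, obtained from the SVD of the signal e.

    Returns the eigenarrays u (genes x arrays) and the eigenexpression
    levels eps (nonnegative, sorted in decreasing order).
    """
    u, eps, _vt = np.linalg.svd(e, full_matrices=False)
    return u, eps

# Toy data signal (an assumption): N = 50 genes measured in M1 = 6 arrays.
rng = np.random.default_rng(0)
e1 = rng.standard_normal((50, 6))

a1 = e1 @ e1.T                        # genes x genes network of correlations
u1, eps1 = network_evd(e1)
a1_from_evd = (u1 * eps1**2) @ u1.T   # u1 diag(eps1^2) u1^T, as in Eq. 1
```

Working from the SVD of the *N* × \(M_1\) signal avoids diagonalizing the much larger *N* × *N* network directly, which matters when *N* is in the thousands.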

EVD formulates the network \(\hat{a}_1\) as a linear superposition of a series of \(M_1\) rank-1 symmetric “subnetworks” of size *N*-genes × *N*-genes each, where the *m*th subnetwork is the outer product of the *m*th eigenarray with its transpose \(|\alpha_{1,m}\rangle\langle\alpha_{1,m}|\) (Fig. 5 in *Supporting Appendix*, which is published as supporting information on the PNAS web site),

\[ \hat{a}_1 = \sum_{m=1}^{M_1} \varepsilon_{1,m}^{2} |\alpha_{1,m}\rangle\langle\alpha_{1,m}|. \tag{2} \]

The significance of the *m*th subnetwork is indicated by the *m*th “fraction of eigenexpression” \(p_{1,m} = \varepsilon_{1,m}^{2} / (\sum_{m=1}^{M_1} \varepsilon_{1,m}^{2})\), i.e., the expression correlation captured by the *m*th subnetwork relative to that captured by all subnetworks. Each subnetwork is decorrelated of all other subnetworks, i.e., \(|\alpha_{1,m}\rangle\langle\alpha_{1,m}|\alpha_{1,n}\rangle\langle\alpha_{1,n}| = 0\) for all *m* ≠ *n*, since \(\hat{u}_1\) is orthogonal. Each subnetwork is also decoupled of all other subnetworks, such that there are no contributions to the network \(\hat{a}_1\) from the \(M_1(M_1 - 1)/2\) rank-2 symmetric “couplings” among the subnetworks, i.e., \(|\alpha_{1,m}\rangle\langle\alpha_{1,n}| + |\alpha_{1,n}\rangle\langle\alpha_{1,m}|\) for all *m* ≠ *n*, since \(\hat{\varepsilon}_1^{2}\) is diagonal. For a real data signal \(\hat{e}_1\), the eigenarrays are unique up to phase factors of ±1, and therefore the subnetworks are also unique, i.e., data-driven, except in degenerate subspaces defined by subsets of equal eigenexpression levels.

**Pseudoinverse Projection.** Let the matrix \(\hat{b}\) of size *N*-genes × *L*-arrays tabulate the genome-scale signal, designated the “basis” signal, of, e.g., proteins' DNA-binding occupancy levels measured in a set of *L* samples of the cellular system using *L* arrays. We compute the pseudoinverse projection (12, 13) of the network \(\hat{a}_1\) onto the basis signal \(\hat{b}\),

\[ \hat{a}_1 \rightarrow \hat{a}_2 \equiv (\hat{b}\hat{b}^{\dagger}) \hat{a}_1 (\hat{b}\hat{b}^{\dagger}), \tag{3} \]

from the pseudoinverse projection of the data signal \(\hat{e}_1\) onto the basis \(\hat{b}\), \(\hat{e}_1 \rightarrow \hat{e}_2 = \hat{b}\hat{b}^{\dagger}\hat{e}_1\), using the SVD of the basis \(\hat{b} = \hat{U}\hat{\omega}\hat{V}^{T}\) to compute its pseudoinverse \(\hat{b}^{\dagger} = \hat{V}\hat{\omega}^{-1}\hat{U}^{T}\). The *l*th column of \(\hat{U}\), \(|\beta_{l}\rangle \equiv \hat{U}|l\rangle\), lists the genome-scale binding of the *l*th eigenarray of \(\hat{b}\). The pseudoinverse-projected network \(\hat{a}_2\) is unique, i.e., data-driven. For a real basis signal \(\hat{b}\), \(\hat{b}\hat{b}^{\dagger}\) is an orthogonal projection matrix, and the projected network \(\hat{a}_2\) is symmetric.
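As an illustration of Eq. 3, the sketch below projects a toy network onto a toy basis and confirms that a component of the data signal orthogonal to the basis is filtered out. It is a hedged example under stated assumptions: the function name `project_network`, the random data, and the use of `numpy.linalg.pinv` for the pseudoinverse are illustrative choices, not the authors' implementation.

```python
import numpy as np

def project_network(a, b):
    """Pseudoinverse-project the network a onto the basis signal b (Eq. 3)."""
    p = b @ np.linalg.pinv(b)   # b b^+, the orthogonal projector onto the columns of b
    return p @ a @ p, p

rng = np.random.default_rng(1)
n_genes = 40
b = rng.standard_normal((n_genes, 3))   # toy basis signal: L = 3 arrays

# Toy data signal: two arrays inside the span of b, plus one array that is
# orthogonal to b and therefore exclusive to the data signal.
q, _ = np.linalg.qr(np.column_stack([b, rng.standard_normal(n_genes)]))
exclusive = q[:, 3]
e1 = np.column_stack([b @ rng.standard_normal((3, 2)), 5.0 * exclusive])

a1 = e1 @ e1.T
a2, p = project_network(a1, b)
e2 = p @ e1                              # projected signal
```

In this toy case the exclusive direction carries correlation in `a1` but none in `a2`, and `a2` coincides with the network computed from the projected signal `e2`.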

We compute the EVD of the projected network \(\hat{a}_2\),

\[ \hat{a}_2 = \hat{u}_2 \hat{\varepsilon}_2^{2} \hat{u}_2^{T} = \sum_{m=1}^{M_2} \varepsilon_{2,m}^{2} |\alpha_{2,m}\rangle\langle\alpha_{2,m}|, \tag{4} \]

where \(M_2 = \min\{L, M_1\}\), from the SVD of the projected signal \(\hat{e}_2 = \hat{u}_2 \hat{\varepsilon}_2 \hat{v}_2^{T}\), where the *m*th column of \(\hat{u}_2\), \(|\alpha_{2,m}\rangle \equiv \hat{u}_2|m\rangle\), lists the genome-scale expression of the *m*th eigenarray of \(\hat{e}_2\). In reconstructing \(\hat{a}_2\), the pseudoinverse projection filters out of \(\hat{a}_1\) each of its subnetworks \(|\alpha_{1,m}\rangle\langle\alpha_{1,m}|\) that is decorrelated of the series of *L* rank-1 symmetric subnetworks \(|\beta_{l}\rangle\langle\beta_{l}|\) that compose the network \(\hat{b}\hat{b}^{T}\) computed from the basis signal \(\hat{b}\), such that \(|\beta_{l}\rangle\langle\beta_{l}|\alpha_{1,m}\rangle\langle\alpha_{1,m}| = 0\) for all *l* = 1, 2,..., *L* (Fig. 6 in *Supporting Appendix*).

**Higher-Order EVD (HOEVD).** Let the third-order tensor \(\{\hat{a}_k\}\) of size *K*-networks × *N*-genes × *N*-genes tabulate a series of *K* genome-scale networks computed from a series of *K* genome-scale signals \(\{\hat{e}_k\}\), of size *N*-genes × \(M_k\)-arrays each, such that \(\hat{a}_k = \hat{e}_k \hat{e}_k^{T}\) for all *k* = 1, 2,..., *K*. We define and compute a HOEVD of the tensor of networks \(\{\hat{a}_k\}\),

\[ \hat{a} \equiv \sum_{k=1}^{K} \hat{a}_k = \hat{u} \left( \sum_{k=1}^{K} \hat{\varepsilon}_k^{2} \right) \hat{u}^{T} = \hat{u} \hat{\varepsilon}^{2} \hat{u}^{T}, \tag{5} \]

from the SVD of the appended signals \(\hat{e} \equiv (\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_K) = \hat{u}\hat{\varepsilon}\hat{v}^{T}\), where the *m*th column of \(\hat{u}\), \(|\alpha_{m}\rangle \equiv \hat{u}|m\rangle\), lists the genome-scale expression of the *m*th eigenarray of \(\hat{e}\). Whereas the matrix EVD is equivalent to the matrix SVD for a symmetric nonnegative matrix, this tensor HOEVD is different from the tensor higher-order SVD (14–16) for the series of symmetric nonnegative matrices \(\{\hat{a}_k\}\), where the higher-order SVD is computed from the SVD of the appended networks \((\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_K)\) rather than the appended signals. This HOEVD formulates the overall network computed from the appended signals \(\hat{a} = \hat{e}\hat{e}^{T}\) as a linear superposition of a series of \(M \equiv \sum_{k=1}^{K} M_k\) rank-1 symmetric “subnetworks” that are decorrelated of each other, \(\hat{a} = \sum_{m=1}^{M} \varepsilon_{m}^{2} |\alpha_{m}\rangle\langle\alpha_{m}|\). Each subnetwork is also decoupled of all other subnetworks in the overall network \(\hat{a}\), since \(\hat{\varepsilon}\) is diagonal.
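Eq. 5 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `hoevd`, the toy signal sizes, and the choice to compute each \(\hat{\varepsilon}_k^{2}\) as \(\hat{u}^{T}\hat{a}_k\hat{u}\) are assumptions that follow from the definitions above rather than a description of the authors' code.

```python
import numpy as np

def hoevd(signals):
    """HOEVD of the networks a_k = e_k e_k^T for a list of signals e_k.

    Returns u (shared eigenarrays), eps (overall eigenexpression levels) and
    cores, where cores[k] = u^T a_k u plays the role of eps_k^2 in Eq. 5.
    """
    e = np.hstack(signals)                        # appended signals, genes x sum(M_k)
    u, eps, _vt = np.linalg.svd(e, full_matrices=False)
    cores = np.array([u.T @ (ek @ ek.T) @ u for ek in signals])
    return u, eps, cores

# Toy series of K = 3 signals over 30 genes (an assumption for illustration).
rng = np.random.default_rng(2)
signals = [rng.standard_normal((30, m)) for m in (4, 3, 5)]
u, eps, cores = hoevd(signals)

overall = sum(ek @ ek.T for ek in signals)        # a = sum_k a_k
core_sum = cores.sum(axis=0)                      # sum_k eps_k^2
```

In this toy example `core_sum` is diagonal and equal to `eps**2`, while each individual `cores[k]` is symmetric with nonzero off-diagonal entries.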

This HOEVD formulates each individual network in the tensor \(\{\hat{a}_k\}\) as a linear superposition of this series of *M* rank-1 symmetric decorrelated subnetworks and the series of \(M(M-1)/2\) rank-2 symmetric couplings among these subnetworks (Fig. 7 in *Supporting Appendix*), such that

\[ \hat{a}_k \equiv \sum_{m=1}^{M} \varepsilon_{k,m}^{2} |\alpha_{m}\rangle\langle\alpha_{m}| + \sum_{m=1}^{M} \sum_{l=m+1}^{M} \varepsilon_{k,lm}^{2} \left( |\alpha_{l}\rangle\langle\alpha_{m}| + |\alpha_{m}\rangle\langle\alpha_{l}| \right), \tag{6} \]

for all *k* = 1, 2,..., *K*. The subnetworks are not decoupled in any one of the networks \(\{\hat{a}_k\}\), since, in general, \(\{\hat{\varepsilon}_k^{2}\}\) are symmetric but not diagonal, such that \(\varepsilon_{k,lm}^{2} \equiv \langle l|\hat{\varepsilon}_k^{2}|m\rangle = \langle m|\hat{\varepsilon}_k^{2}|l\rangle \neq 0\). The significance of the *m*th subnetwork in the *k*th network is indicated by the *m*th fraction of eigenexpression of the *k*th network \(p_{k,m} = \varepsilon_{k,m}^{2} / (\sum_{k=1}^{K} \sum_{m=1}^{M} \varepsilon_{k,m}^{2}) \geq 0\), i.e., the expression correlation captured by the *m*th subnetwork in the *k*th network relative to that captured by all subnetworks (and all couplings among them, where \(\sum_{k=1}^{K} \varepsilon_{k,lm}^{2} = 0\) for all *l* ≠ *m*) in all networks. Similarly, the amplitude of the fraction \(p_{k,lm} = \varepsilon_{k,lm}^{2} / (\sum_{k=1}^{K} \sum_{m=1}^{M} \varepsilon_{k,m}^{2})\) indicates the significance of the coupling between the *l*th and *m*th subnetworks in the *k*th network. The sign of this fraction indicates the direction of the coupling, such that \(p_{k,lm} > 0\) corresponds to a transition from the *l*th to the *m*th subnetwork and \(p_{k,lm} < 0\) corresponds to the transition from the *m*th to the *l*th. For real signals \(\{\hat{e}_k\}\), the subnetworks are unique, and the couplings among them are unique up to phase factors of ±1, except in degenerate subspaces of \(\hat{\varepsilon}\).

**Interpretation of the Subnetworks and Their Couplings.** We parallel- and antiparallel-associate each subnetwork or coupling with most likely expression correlations, or none thereof, according to the annotations of the two groups of *x* pairs of genes each, with largest and smallest levels of correlations in this subnetwork or coupling among all \(X = N(N-1)/2\) pairs of genes, respectively. The *P* value of a given association by annotation is calculated by using combinatorics and assuming hypergeometric probability distribution of the *Y* pairs of annotations among the *X* pairs of genes, and of the subset of *y* ⊆ *Y* pairs of annotations among the subset of *x* ⊆ *X* pairs of genes, \(P(x; y, Y, X) = \binom{X}{x}^{-1} \sum_{z=y}^{x} \binom{Y}{z} \binom{X-Y}{x-z}\), where \(\binom{X}{x} = X!\, x!^{-1} (X-x)!^{-1}\) is the binomial coefficient (17). The most likely association of a subnetwork with a pathway or of a coupling between two subnetworks with a transition between two pathways is that which corresponds to the smallest *P* value. Independently, we also parallel- and antiparallel-associate each eigenarray with most likely cellular states, or none thereof, assuming hypergeometric distribution of the annotations among the *N*-genes and the subsets of *n* ⊆ *N* genes with largest and smallest levels of expression in this eigenarray. The corresponding eigengene might be inferred to represent the corresponding biological process from its pattern of expression.
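The hypergeometric *P* value above can be evaluated exactly with integer arithmetic. The sketch below is one possible way to do so using Python's standard library; the function name `hypergeom_p_value` is an assumption, and the small worked example uses made-up counts rather than values from the paper.

```python
from math import comb

def hypergeom_p_value(x, y, Y, X):
    """P(x; y, Y, X): probability of finding at least y annotated pairs among
    x pairs drawn without replacement from X pairs, Y of which are annotated."""
    upper = min(x, Y)  # terms with z > Y vanish because comb(Y, z) == 0
    total = sum(comb(Y, z) * comb(X - Y, x - z) for z in range(y, upper + 1))
    return total / comb(X, x)

# Made-up toy counts: 4 pairs in total, 2 annotated, 2 selected, at least 1 hit.
p_toy = hypergeom_p_value(x=2, y=1, Y=2, X=4)   # (2*2 + 1*1) / 6 = 5/6
```

Because the sum and the binomial coefficients are kept as exact integers until the final division, the same function can be called with counts of the size used in this article (e.g., *X* = 2,926 and *x* = 150) without overflow.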

For visualization, we set the *x* correlations among the *X* pairs of genes largest in amplitude in each subnetwork and coupling equal to ±1, i.e., correlated or anticorrelated, respectively, according to their signs. The remaining correlations are set equal to 0, i.e., decorrelated. We compare the discretized subnetworks and couplings using Boolean functions (6).

## Biological Results: Yeast Pathways from mRNA Expression and Proteins' DNA-Binding Signals

**Significant EVD Subnetworks Are Associated with Functionally Independent Pathways.** We compute the network \(\hat{a}_1\) from the data signal \(\hat{e}_1\), which tabulates relative mRNA expression levels of *N* = 4,153 yeast genes with valid data in at least 15 of the \(M_1\) = 18 samples of a cell cycle time course of a culture synchronized by the mating pheromone α factor (1). The relative expression level of the *n*th gene in the *m*th sample is presumed valid when the ratio of the measured expression to the background signal is >1.5 for both the synchronized culture and asynchronous reference. Before computing \(\hat{a}_1\), we use SVD to estimate the missing data in \(\hat{e}_1\) (10, 18) and to approximately center the expression pattern of each gene in \(\hat{e}_1\) at its time-invariant level (*Supporting Appendix*).
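The gene-selection rule described above (a measurement is valid when both signal-to-background ratios exceed 1.5, and a gene is kept when at least 15 of its 18 samples are valid) can be sketched as follows. The function and argument names (`select_genes`, `sync_ratio`, `ref_ratio`) and the toy ratios are hypothetical; the sketch does not cover the SVD-based missing-data estimation or centering.

```python
import numpy as np

def select_genes(sync_ratio, ref_ratio, threshold=1.5, min_valid=15):
    """Flag genes with enough valid samples.

    sync_ratio, ref_ratio: genes x samples arrays of signal-to-background
    ratios for the synchronized culture and the asynchronous reference.
    """
    valid = (sync_ratio > threshold) & (ref_ratio > threshold)
    keep = valid.sum(axis=1) >= min_valid
    return keep, valid

# Toy ratios (made up): 3 genes x 18 samples.
sync_ratio = np.full((3, 18), 2.0)
ref_ratio = np.full((3, 18), 2.0)
sync_ratio[1, :4] = 1.0   # gene 1: only 14 valid samples, dropped
ref_ratio[2, :3] = 1.2    # gene 2: exactly 15 valid samples, kept

keep, valid = select_genes(sync_ratio, ref_ratio)
```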

EVD of the network \(\hat{a}_1\) uncovers four significant subnetworks, which capture >60%, 10%, 5%, and 5%, respectively, of the expression correlation of \(\hat{a}_1\). These subnetworks are associated with the independent pathways manifest in the data signal \(\hat{e}_1\), following the *P* values for the distribution of the *Y* = 1,035 pairs of the 46 genes that were microarray-classified as pheromone-regulated (2) among all *X* = 2,926 pairs of the 77 genes that were traditionally classified as cell-cycle-regulated (1), and among each of the subsets of *x* = 150 pairs of genes with largest and smallest levels, respectively, of expression correlation (Table 2 in *Supporting Appendix*). The associations of the EVD subnetworks of \(\hat{a}_1\) are consistent with those of the corresponding SVD eigenarrays of \(\hat{e}_1\) following the *P* values for the distribution of the 284 pheromone-regulated genes and that of the 574 genes, which were traditionally or microarray-classified as cell-cycle-regulated, among all 4,153 genes and among each of the subsets of 150 genes with largest and smallest levels, respectively, of expression (Table 1 in *Supporting Appendix*). The associations of the EVD subnetworks of \(\hat{a}_1\) are also consistent with the patterns of expressions of the corresponding SVD eigengenes of \(\hat{e}_1\) (Fig. 8 in *Supporting Appendix*). We visualize the discretized four subnetworks and their Boolean functions in the subset of 70 genes that constitute the *x* = 150 correlations in each subnetwork that are largest in amplitude among the *X* = 2,926 pairs of traditionally classified cell-cycle-regulated genes.

The first and most significant subnetwork is associated with the α factor signal-transduction pathway, where the relations among the genes depend only on their pheromone-response classifications. Genes that are up-regulated in response to pheromone, and separately also genes that are down-regulated, are correlated, even when these genes are classified into antipodal cell-cycle stages. Genes that are up-regulated in response to pheromone are anticorrelated with genes that are down-regulated, even when these genes are classified into the same cell-cycle stages. For example, *KAR4*, which is up-regulated in response to pheromone, is correlated with *CIK1*, which is also up-regulated, and anticorrelated with *CLN2*, which is down-regulated (Fig. 1*a*), even though the expression of both *KAR4* and *CLN2* peaks at the cell-cycle stage G₁ while the expression of *CIK1* peaks at the antipodal stage S/G₂. In the second subnetwork, which is associated with the exit from the α factor-induced cell-cycle arrest in M/G₁ and the entry into cell-cycle progression at G₁, genes that are up-regulated in response to pheromone are correlated, independent of their cell-cycle classification. The relations among genes that are down-regulated, however, depend on their cell-cycle, rather than their pheromone-response, classification. For example, *CLN2* and *CLB2*, which encode cyclins of the antipodal stages G₁ and G₂/M, respectively, are anticorrelated, even though both are down-regulated in response to pheromone; and *SWI4*, which encodes a G₁ transcription factor, is correlated with *CLN2* and anticorrelated with *CLB2* (Fig. 1*b*). In the third and fourth subnetworks, which are associated with the two pathways of antipodal cell-cycle-expression oscillations that are orthogonal, i.e., π/2 out of phase relative to one another, the relations among genes depend only on their cell-cycle classifications. For example, in the third subnetwork, which is associated with the cell-cycle-expression oscillations at S vs. those at M, *KAR4* is anticorrelated with *CIK1*, where *KAR4* is correlated, and *CIK1* is anticorrelated with *ASH1* (Fig. 1*c*). In the fourth subnetwork, which is associated with expression at G₁ vs. that at G₂, *KAR4* is correlated with *CLN2* (Fig. 1*d*).
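The discretization used for visualization, and one simple Boolean comparison of two discretized subnetworks, might be sketched as below. This is an assumption-laden illustration: the function names `discretize` and `boolean_and` are hypothetical, the AND shown here (keep a relation only when both networks agree on its nonzero sign) is just one possible Boolean function, and the 3-gene matrices are made-up toy values.

```python
import numpy as np

def discretize(subnetwork, x):
    """Set the x gene-pair correlations largest in amplitude to +1 or -1
    according to their signs, and all remaining correlations to 0."""
    n = subnetwork.shape[0]
    rows, cols = np.triu_indices(n, k=1)          # the X = N(N-1)/2 gene pairs
    values = subnetwork[rows, cols]
    top = np.argsort(-np.abs(values), kind="stable")[:x]
    d = np.zeros((n, n), dtype=int)
    d[rows[top], cols[top]] = np.sign(values[top]).astype(int)
    return d + d.T                                # keep the network symmetric

def boolean_and(d1, d2):
    """Keep only the relations that have the same nonzero sign in both networks."""
    return np.where(d1 == d2, d1, 0)

# Toy symmetric subnetworks over 3 genes (made-up values).
s1 = np.array([[0.0, 3.0, -1.0], [3.0, 0.0, -2.0], [-1.0, -2.0, 0.0]])
s2 = np.array([[0.0, 2.0, 5.0], [2.0, 0.0, -3.0], [5.0, -3.0, 0.0]])
d1 = discretize(s1, x=2)
d2 = discretize(s2, x=2)
common = boolean_and(d1, d2)   # only the anticorrelation of genes 1 and 2 survives
```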

Boolean functions of the discretized subnetworks highlight known pathway-dependent relations among genes, common to a subset of the subnetworks or antipodal across the subnetworks (Fig. 9 in *Supporting Appendix*).

**Integrative Pseudoinverse-Projected Networks Simulate Observation of only the Pathways Manifest in both the Data and Basis Signals.** We compute the network \(\hat{a}_2\) by pseudoinverse-projecting the network \(\hat{a}_1\) onto the basis signal, which tabulates the relative DNA-bound protein occupancy levels of the 2,120 genes with at least one valid data point in any one of *L* = 12 samples that correspond to 12 yeast-cell-cycle transcription factors (3). The relative binding occupancy level of the *n*th gene in the *l*th sample is presumed valid when the associated *P* value is <0.1. Similarly, \(\hat{a}_3\) is computed by projecting \(\hat{a}_1\) onto the basis signal, which tabulates the occupancy levels of 2,476 genes in 12 samples of transcription factors involved in developmental programs, such as mating; and \(\hat{a}_4\) is computed by projecting \(\hat{a}_1\) onto the basis signal, which tabulates the occupancy levels of 2,943 genes in eight samples of factors involved in biosynthesis, such as DNA replication. Before computing \(\hat{a}_2\), \(\hat{a}_3\), and \(\hat{a}_4\) for the 1,588, 1,827, and 2,254 genes at the intersections of \(\hat{a}_1\) and the proteins' DNA-binding basis signals, we divide each gene measurement in each basis signal by the arithmetic mean of the measurements for that gene in that signal, thus converting the signals to DNA-binding levels of each transcription factor relative to those of all other factors. We also approximately center the binding pattern of each gene at its transcription factor-invariant level using SVD (*Supporting Appendix*).
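The two preprocessing steps named above, restricting to the genes shared by the data and basis signals and dividing each gene's binding measurements by their arithmetic mean, can be sketched as follows. The function names are hypothetical, the occupancy values are made up, and the sketch assumes every gene has a nonzero mean; the SVD-based centering is not shown.

```python
import numpy as np

def relative_binding(basis):
    """Divide each gene's measurements by that gene's arithmetic mean across samples."""
    return basis / basis.mean(axis=1, keepdims=True)

def intersect_genes(names_a, names_b):
    """Return the shared gene names (sorted) and their row indices in each list."""
    shared, idx_a, idx_b = np.intersect1d(names_a, names_b, return_indices=True)
    return shared, idx_a, idx_b

# Toy gene lists and made-up occupancy levels (3 genes x 2 factors).
names_data = np.array(["KAR4", "CIK1", "CLN2", "CLB2"])
names_basis = np.array(["CLN2", "SWI4", "KAR4"])
basis = np.array([[1.0, 3.0], [2.0, 2.0], [4.0, 12.0]])

shared, idx_data, idx_basis = intersect_genes(names_data, names_basis)
rel = relative_binding(basis[idx_basis])   # rows ordered as in `shared`
```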

EVD of the cell-cycle-projected network \(\hat{a}_2\) uncovers only two significant subnetworks, which capture ≈55% and 30% of the expression correlation of \(\hat{a}_2\), respectively, and are associated with the two pathways of antipodal cell-cycle-expression oscillations at G₁ vs. those at G₂ and at S vs. M, respectively [Table 4 (row a) in *Supporting Appendix*]. Boolean AND intersection of the discretized first subnetwork of \(\hat{a}_2\), in the subset of 200 correlations largest in amplitude among all traditionally classified cell-cycle genes of \(\hat{a}_2\), with the discretized fourth subnetwork of \(\hat{a}_1\) highlights correlations among traditionally classified M/G₁, G₁, and S genes, and anticorrelations among these genes and G₂/M genes, independent of their responses to pheromone (Fig. 2*a*). Boolean AND of the second subnetwork of \(\hat{a}_2\) with the third subnetwork of \(\hat{a}_1\) highlights correlations among M/G₁ genes and their anticorrelations with S and S/G₂ genes (Fig. 2*b*). The α factor signal-transduction pathway that is manifest in the data but not in the basis signal is not associated with either one of the subnetworks of \(\hat{a}_2\).

Similarly, EVD of the development-projected network \(\hat{a}_3\) uncovers only one significant subnetwork, which captures >90% of the expression correlation of \(\hat{a}_3\) and is associated with the α factor signal-transduction pathway [Table 4 (row b) in *Supporting Appendix*]. Boolean AND of the subnetwork of \(\hat{a}_3\) with the first subnetwork of \(\hat{a}_1\) highlights correlations among genes that are up-regulated in response to pheromone and their anticorrelations with down-regulated genes, independent of their cell-cycle classifications (Fig. 2*c*). The cell-cycle-expression oscillation pathways that are manifest in the data but not in the basis signal are not associated with either one of the subnetworks of \(\hat{a}_3\). EVD of the biosynthesis-projected network \(\hat{a}_4\) uncovers three significant subnetworks, which capture together >90% of the expression correlation of \(\hat{a}_4\), all of which are associated with the activity of histones that peaks during DNA replication at the cell-cycle stage S [Table 4 (row c) and Fig. 13 in *Supporting Appendix*].

The associations of the EVD subnetworks of the projected networks \(\hat{a}_2\), \(\hat{a}_3\), and \(\hat{a}_4\) are consistent with the associations of the corresponding SVD eigenarrays (Table 3 in *Supporting Appendix*) and eigengenes (Figs. 10–12 in *Supporting Appendix*) of the projected signals \(\hat{e}_2\), \(\hat{e}_3\), and \(\hat{e}_4\), respectively.

**Comparative HOEVD Subnetworks and Their Couplings Are Associated with Pathways and the Transitions Among Them Common to the Series or Exclusive to a Subset of Networks.** HOEVD of the series of networks \(\{\hat{a}_1, \hat{a}_2, \hat{a}_3\}\) uncovers three significant subnetworks, which capture ≈40%, 15%, and 9% of the expression correlation of the overall network \(\hat{a} \equiv \hat{a}_1 + \hat{a}_2 + \hat{a}_3\), respectively, and the three couplings among these subnetworks, which capture expression correlations only in the individual networks. The associations of the HOEVD subnetworks and couplings of \(\{\hat{a}_1, \hat{a}_2, \hat{a}_3\}\) (Table 6 in *Supporting Appendix*) are consistent with the associations of the corresponding SVD eigenarrays (Table 5 in *Supporting Appendix*) and eigengenes (Fig. 14 in *Supporting Appendix*) of the appended signals \(\hat{e} \equiv (\hat{e}_1, \hat{e}_2, \hat{e}_3)\), computed for the 868 genes at the intersection of \(\hat{e}_1\), \(\hat{e}_2\), and \(\hat{e}_3\).
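The fractions of eigenexpression that quantify how much each HOEVD subnetwork and coupling contributes to each network can be computed directly from the per-network matrices \(\hat{u}^{T}\hat{a}_k\hat{u}\). The sketch below is illustrative only: the function name `hoevd_fractions` and the random toy signals are assumptions, and the returned array simply stacks \(p_{k,m}\) on the diagonals and \(p_{k,lm}\) off the diagonals.

```python
import numpy as np

def hoevd_fractions(signals):
    """Fractions p[k, l, m] of eigenexpression for a series of signals e_k.

    The diagonal p[k, m, m] is the significance of the m-th subnetwork in the
    k-th network; the off-diagonal p[k, l, m] is the signed significance of
    the coupling between the l-th and m-th subnetworks in that network.
    """
    u, _eps, _vt = np.linalg.svd(np.hstack(signals), full_matrices=False)
    cores = np.array([u.T @ (ek @ ek.T) @ u for ek in signals])
    total = np.trace(cores, axis1=1, axis2=2).sum()   # sum over k and m of eps_{k,m}^2
    return cores / total

# Toy series of K = 3 signals over 30 genes (an assumption for illustration).
rng = np.random.default_rng(4)
signals = [rng.standard_normal((30, m)) for m in (4, 3, 5)]
p = hoevd_fractions(signals)

subnetwork_fractions = np.diagonal(p, axis1=1, axis2=2)   # shape (K, M)
coupling_sum = p.sum(axis=0)                              # off-diagonals cancel over k
```

In this toy example the subnetwork fractions are nonnegative and sum to 1 over all networks, whereas each coupling fraction sums to 0 over the networks, so couplings can only be seen in the individual networks.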

The subnetworks are associated with the independent pathways that are manifest in the overall network as well as the individual networks. The first subnetwork, which is associated with the α factor signal-transduction pathway (Fig. 3*a*), contributes to the expression correlations of the network \(\hat{a}_1\) as well as to the development-projected network \(\hat{a}_3\), but its contribution to the cell-cycle-projected network \(\hat{a}_2\) is negligible (Fig. 4*a*). The second and third subnetworks, which are associated with the two pathways of antipodal cell-cycle-expression oscillations at G₁ vs. that at G₂ and at S vs. that at M, respectively (Fig. 3*b* and *c*), contribute to \(\hat{a}_1\) and \(\hat{a}_2\) but not to \(\hat{a}_3\). The couplings are associated with the transitions among these independent pathways that are manifest in the individual networks only. The coupling between the first and second subnetworks is associated with the transition between the two pathways of response to pheromone and cell-cycle expression at G₁ vs. that at G₂, i.e., the exit from pheromone-induced arrest and entry into cell-cycle progression (Fig. 3*d*). The coupling between the first and third subnetworks is associated with cell-cycle expression at G₁/S vs. that at M (Fig. 3*e*). The coupling between the second and third subnetworks is associated with cell-cycle-expression oscillations at the two antipodal cell-cycle checkpoints of G₁/S vs. G₂/M (Fig. 3*f*). All these couplings contribute to the expression correlation of \(\hat{a}_2\). Their contributions to the expression correlations of \(\hat{a}_1\) and \(\hat{a}_3\) are negligible (Fig. 4*b*).

Boolean functions of the discretized subnetworks and couplings highlight known as well as previously unknown pathway-dependent relations among genes that are in agreement with current understanding of the cellular system of yeast (Fig. 15 in *Supporting Appendix*) (19).

## Discussion

We have shown that the matrix EVD and pseudoinverse projection and a tensor HOEVD can separate genome-scale nondirectional networks of, e.g., mRNA expression and proteins' DNA-binding relations among genes into mathematically defined subnetworks and their couplings that can be associated with functionally independent pathways and the transitions among them. In analyses of genome-scale yeast networks, these subnetworks and couplings uncover coordinated differential relations among cell-cycle- and pheromone-regulated genes that are in agreement with reported pathway-dependent activities of these genes. Possible additional applications of EVD, pseudoinverse projection, and HOEVD include reconstruction of pathways and transitions among these pathways from nondirectional networks of correlations among sets of orthologous genes, which are computed from genome-scale signals of different types and from different organisms to elucidate organism, as well as pathway, dependence of relations among genes (e.g., refs. 6, 11, 20, and 21).

## Supplementary Material

Supporting Appendix

## Notes

Author contributions: O.A. and G.H.G. designed research; O.A. performed research; O.A. analyzed data; and O.A. and G.H.G. wrote the paper.

Conflict of interest statement: No conflicts declared.

Abbreviations: EVD, eigenvalue decomposition; HOEVD, higher-order EVD; SVD, singular-value decomposition.

¶ In this article, \(\hat{m}\) denotes a matrix, \(|v\rangle\) denotes a column vector, and \(\langle u|\) denotes a row vector, such that \(\hat{m}|v\rangle\), \(\langle u|\hat{m}\), and \(\langle u|v\rangle\) all denote inner products, and \(|v\rangle\langle u|\) denotes an outer product.

## Acknowledgments

We thank T. G. Kolda and T. O. Yeates for thoughtful reviews of this manuscript; J. F. X. Diffley, V. R. Iyer, E. M. Marcotte, and B. K. Tye for helpful comments; and the American Institute of Mathematics in Palo Alto for hosting the 2004 Workshop on Tensor Decompositions where some of this work was done. This work was supported by National Science Foundation Grant CCR-0430617 (to G.H.G.) and National Human Genome Research Institute Individual Mentored Research Scientist Development Award in Genomic Research and Analysis 5 K01 HG00038 (to O.A.).


## References

1. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. & Futcher, B. (1998) *Mol. Biol. Cell* **9**, 3273–3297.
2. Roberts, C. J., Nelson, B., Marton, M. J., Stoughton, R., Meyer, M. R., Bennett, H. A., He, Y. D., Dai, H., Walker, W. L., Hughes, T. R., *et al.* (2000) *Science* **287**, 873–880.
3. Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thompson, C. M., Simon, I., *et al.* (2002) *Science* **298**, 799–804.
4. Ihmels, J., Levy, R. & Barkai, N. (2004) *Nat. Biotechnol.* **22**, 86–92.
5. Balazsi, G., Barabasi, A. L. & Oltvai, Z. N. (2005) *Proc. Natl. Acad. Sci. USA* **102**, 7841–7846.
6. Bowers, P. M., Cokus, S. J., Eisenberg, D. & Yeates, T. O. (2004) *Science* **306**, 2246–2249.
7. Braun, E. & Brenner, N. (2004) *Phys. Biol.* **1**, 67–76.
8. Kurihara, L. J., Stewart, B. G., Gammie, A. E. & Rose, M. D. (1996) *Mol. Cell. Biol.* **16**, 3990–4002.
9. Alter, O., Brown, P. O. & Botstein, D. (2000) *Proc. Natl. Acad. Sci. USA* **97**, 10101–10106.
10. Alter, O., Brown, P. O. & Botstein, D. (2001) in *Microarrays: Optical Technologies and Informatics*, eds. Bittner, M. L., Chen, Y., Dorsel, A. N. & Dougherty, E. R. (Int. Soc. Optical Eng., Bellingham, WA), Vol. **4266**, pp. 171–186.
11. Alter, O., Brown, P. O. & Botstein, D. (2003) *Proc. Natl. Acad. Sci. USA* **100**, 3351–3356.
12. Alter, O. & Golub, G. H. (2004) *Proc. Natl. Acad. Sci. USA* **101**, 16577–16582.
13. Golub, G. H. & Van Loan, C. F. (1996) *Matrix Computations* (Johns Hopkins Univ. Press, Baltimore), 3rd Ed.
14. De Lathauwer, L., De Moor, B. & Vandewalle, J. (2000) *SIAM J. Matrix Anal. Appl.* **21**, 1253–1278.
15. Kolda, T. G. (2001) *SIAM J. Matrix Anal. Appl.* **23**, 243–255.
16. Zhang, T. & Golub, G. H. (2001) *SIAM J. Matrix Anal. Appl.* **23**, 534–550.
17. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. (1999) *Nat. Genet.* **22**, 281–285.
18. Kim, H., Golub, G. H. & Park, H. (2005) *Bioinformatics* **21**, 187–198.
19. Caro, L. H., Smits, G. J., van Egmond, P., Chapman, J. W. & Klis, F. M. (1998) *FEMS Microbiol. Lett.* **161**, 345–349.
20. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. (2003) *Science* **302**, 249–255.
21. Bergmann, S., Ihmels, J. & Barkai, N. (2004) *PLoS Biol.* **2**, E9.

## Information & Authors

### Information

#### Copyright

Copyright © 2005, The National Academy of Sciences.

#### Submission history

**Published online**: November 28, 2005

**Published in issue**: December 6, 2005
