A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies

Contributed by Gene H. Golub, September 26, 2007 (received for review June 8, 2007)
Abstract
We describe the use of a higher-order singular value decomposition (HOSVD) in transforming a data tensor of genes × “x-settings,” that is, different settings of the experimental variable x × “y-settings,” which tabulates DNA microarray data from different studies, to a “core tensor” of “eigenarrays” × “x-eigengenes” × “y-eigengenes.” Reformulating this multilinear HOSVD such that it decomposes the data tensor into a linear superposition of all outer products of an eigenarray, an x- and a y-eigengene, that is, rank-1 “subtensors,” we define the significance of each subtensor in terms of the fraction of the overall information in the data tensor that it captures. We illustrate this HOSVD with an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are under exposure to either hydrogen peroxide or menadione. We find that significant subtensors represent independent biological programs or experimental phenomena. The picture that emerges suggests that the conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that these genes are involved in, may play significant, yet previously unrecognized, roles in the differential effects of hydrogen peroxide and menadione on cell cycle progression. A genome-scale correlation between DNA replication initiation and RNA transcription, which is equivalent to a recently discovered correlation and might be due to a previously unknown mechanism of regulation, is independently uncovered.
DNA microarrays make it possible to record the genome-scale signals, for example, mRNA expression levels (1–4) and proteins' DNA-binding occupancy levels (5–7), that guide the progression of cellular processes. Future discovery and control in biology and medicine will come from the mathematical modeling of these data, where the mathematical variables and operations represent biological reality: The variables, patterns uncovered in the data, might correlate with activities of cellular elements, such as regulators or transcription factors, that drive the measured signals. The operations, such as data classification and reconstruction in subspaces of selected patterns, might simulate experimental observation of the correlations and possibly also causal coordination of these activities (8). Comparative analyses of these data among two or more organisms might give insights into the universality and specialization of evolutionary, biochemical, and genetic pathways (9). Integrative analyses of different types of signals from the same organism might reveal cellular mechanisms of regulation (10).
The structure of DNA microarray data integrated from different studies is of an order higher than that of a matrix. Each of the multiple biological and experimental settings under which the data are measured represents a degree of freedom in a tensor (11). When the data tensor is unfolded into a matrix, these degrees of freedom are lost, and much of the information in the data tensor might also be lost.
We describe the use of a higher-order singular value decomposition (HOSVD) (12–14) in transforming a data tensor of genes × “x-settings,” that is, different settings of the experimental variable x × “y-settings,” which tabulates DNA microarray data from different studies, to a “core tensor” of “eigenarrays” × “x-eigengenes” × “y-eigengenes.” The eigenarrays and x- and y-eigengenes are unique orthonormal superpositions of the arrays and the genes across the x- and y-settings, respectively. Reformulating this multilinear HOSVD, also known as the N-mode singular value decomposition (SVD) (15–17), such that it decomposes the data tensor into a linear superposition of all outer products of an eigenarray, an x- and a y-eigengene, that is, rank-1 “subtensors” (12), whose superposition coefficients are the “higher-order singular values” tabulated in the core tensor, we define the significance of each subtensor in terms of the fraction of the overall information in the data tensor that it captures.
We illustrate this HOSVD with an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are exposed to either hydrogen peroxide (HP) or menadione (MD) (1, 2). We find that significant subtensors represent independent biological programs or experimental phenomena common to all three studies or exclusive to either one or two of the studies (18), including the subtle differential effects of HP and MD on cell cycle progression. We also find that this subtensor interpretation is robust to variations in the data selection cutoffs.
The picture that emerges from this data-driven analysis suggests that the conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that these genes are involved in, may play significant, yet previously unrecognized, roles in the differential effects of HP and MD on cell cycle progression (1, 19–27). A genome-scale correlation between DNA replication initiation and RNA transcription is also independently uncovered. This correlation is equivalent to a recently discovered correlation (10), is consistent with the current understanding of replication initiation (28–31) and with recent experimental results (32–36), and might be due to a previously unknown mechanism of regulation.
Mathematical Methods: HOSVD
A single DNA microarray probes the genome-scale signal of K genes of a cellular system in a single sample. A series of L arrays probes L different samples under L different settings of the experimental variable x, that is, x-settings. A series of M arrays probes the genome-scale signal under M different y-settings for each given x-setting. Let the third-order tensor $\mathcal{T}$, of size K-genes × L-x-settings × M-y-settings, tabulate the genome-scale signal for all genes and under all x- and y-settings, assuming that LM < K. Each element of $\mathcal{T}$, that is, $\mathcal{T}_{klm}$, is the signal measured for the kth gene under the lth x- and mth y-settings. Each column vector of $\mathcal{T}$, that is, $\mathcal{T}_{:lm}$, lists the genome-scale signal measured under the lth x- and mth y-settings. The x- and y-row vectors, $\mathcal{T}_{k:m}$ and $\mathcal{T}_{kl:}$, list the signal measured for the kth gene under the mth y-setting across all x-settings, and under the lth x-setting across all y-settings, respectively.
The N = 3-mode SVD, a HOSVD (12–14) of the third-order data tensor, is then a transformation of the data tensor from the space of K-genes × L-x-settings × M-y-settings to the reduced space of LM-eigenarrays × L-x-eigengenes × M-y-eigengenes, with LM < K [supporting information (SI) Fig. 4],

$$\mathcal{T} = \mathcal{R} \times_a U \times_b V_x \times_c V_y, \qquad (1)$$

where $\times_a U$, $\times_b V_x$, and $\times_c V_y$ denote multiplications of the tensor $\mathcal{R}$ and the matrices $U$, $V_x$, and $V_y$, which contract the first, second, and third indices of $\mathcal{R}$ with the second indices of $U$, $V_x$, and $V_y$ or, equivalently, the first indices of $U^T$, $V_x^T$, and $V_y^T$, respectively. In this space the data tensor is represented by the third-order core tensor $\mathcal{R}$, which, in general, is full. The transformation matrix $U$ defines the K-genes × LM-eigenarrays basis set. The vector in the ath column of $U$, $U_{:a}$, lists the genome-scale signal of the ath eigenarray. The transformation matrices $V_x^T$ and $V_y^T$ define the L-x-eigengenes × L-x-settings and M-y-eigengenes × M-y-settings basis sets, respectively. The vectors in the bth and cth rows of $V_x^T$ and $V_y^T$, $V_{x,b:}^T$ and $V_{y,c:}^T$, list the signal of the bth x-eigengene across all x-settings and that of the cth y-eigengene across all y-settings, respectively. The eigenarrays and the x- and y-eigengenes are orthonormal superpositions of the arrays and the genes across the x- and y-settings, respectively.
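To make the tensor-matrix products of Eq. 1 concrete, the following minimal NumPy sketch implements a mode-n product with `np.tensordot`. It uses random data to check that contracting the three indices of the core tensor with the second indices of U, V_x, and V_y matches a single explicit `einsum`, and that multiplying by the transposes recovers the core tensor. The helper name `mode_product` and the array sizes are illustrative assumptions, not part of the original method.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Contract index `mode` of `tensor` with the second index of `matrix`.

    The contracted index is replaced by the first index of `matrix`, and the
    remaining indices keep their positions.
    """
    # tensordot sums over tensor axis `mode` and matrix axis 1; the new axis
    # (the rows of `matrix`) comes out last, so move it back to `mode`.
    out = np.tensordot(tensor, matrix, axes=([mode], [1]))
    return np.moveaxis(out, -1, mode)


rng = np.random.default_rng(0)
K, L, M = 20, 3, 2                                      # illustrative sizes, LM < K
R = rng.standard_normal((L * M, L, M))                  # core tensor
U = np.linalg.qr(rng.standard_normal((K, L * M)))[0]    # K x LM, orthonormal columns
Vx = np.linalg.qr(rng.standard_normal((L, L)))[0]       # L x L, orthogonal
Vy = np.linalg.qr(rng.standard_normal((M, M)))[0]       # M x M, orthogonal

# Eq. 1: T = R x_a U x_b Vx x_c Vy
T = mode_product(mode_product(mode_product(R, U, 0), Vx, 1), Vy, 2)
T_einsum = np.einsum("abc,ka,lb,mc->klm", R, U, Vx, Vy)

# Multiplying by the transposes recovers the core tensor: R = T x U^T x Vx^T x Vy^T.
R_back = mode_product(mode_product(mode_product(T, U.T, 0), Vx.T, 1), Vy.T, 2)

print(T.shape)                    # (20, 3, 2)
print(np.allclose(T, T_einsum))   # True
print(np.allclose(R_back, R))     # True
```

The inversion works here only because U has orthonormal columns and V_x, V_y are orthogonal, so each transpose undoes its own mode product.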
The multilinear HOSVD of Eq. 1 can be reformulated such that it decomposes the data tensor into a linear superposition of ≤ $(LM)^2$ rank-1 subtensors, whose superposition coefficients are the higher-order singular values tabulated in the core tensor (12), that is,

$$\mathcal{T} = \sum_{a=1}^{LM}\sum_{b=1}^{L}\sum_{c=1}^{M} \mathcal{R}_{abc}\,\mathcal{S}(a,b,c), \qquad \mathcal{S}(a,b,c) = U_{:a} \otimes V_{x,b:}^T \otimes V_{y,c:}^T, \qquad (2)$$

where the subtensor $\mathcal{S}(a,b,c)$ is the outer product, denoted by ⊗, of the ath eigenarray $U_{:a}$ and the bth x- and cth y-eigengenes, $V_{x,b:}^T$ and $V_{y,c:}^T$ (SI Fig. 5). Following Eq. 2, we define the significance of a subtensor $\mathcal{S}(a,b,c)$ relative to all other subtensors in terms of the “fraction”

$$P_{abc} = \mathcal{R}_{abc}^2 \Big/ \sum_{a=1}^{LM}\sum_{b=1}^{L}\sum_{c=1}^{M} \mathcal{R}_{abc}^2,$$

which measures the fraction of the overall information in the data tensor that this subtensor captures. The “Shannon entropy”

$$d = -\frac{1}{\log\,(LM)^2}\sum_{a=1}^{LM}\sum_{b=1}^{L}\sum_{c=1}^{M} P_{abc}\log P_{abc}, \qquad 0 \le d \le 1,$$

measures the complexity of the data tensor from the distribution of the overall information among the different subtensors. This HOSVD holds for a tensor of any order N. For a second-order tensor, that is, a matrix, this HOSVD reduces to the matrix SVD (15).
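The subtensor fractions and the entropy can be computed directly from the core tensor. The sketch below assumes a NumPy array `R` of shape (LM, L, M) that holds the higher-order singular values, and it rebuilds the data tensor as the superposition of rank-1 subtensors in Eq. 2. The function names are hypothetical. As a further assumption, the entropy is normalized by the logarithm of the number of subtensors, (LM)^2, so that it lies between 0 and 1.

```python
import numpy as np


def subtensor_fractions(R):
    """Fraction P_abc of the overall information captured by each subtensor."""
    squared = R ** 2
    return squared / squared.sum()


def normalized_entropy(R):
    """Shannon entropy of the fractions, normalized by log(number of subtensors)."""
    P = subtensor_fractions(R).ravel()
    P = P[P > 0]                                  # treat 0 * log 0 as 0
    return float(-(P * np.log(P)).sum() / np.log(R.size))


def superpose_subtensors(R, U, Vx, Vy):
    """Eq. 2: sum over (a, b, c) of R_abc times the outer product of U[:, a], Vx[:, b], Vy[:, c]."""
    T = np.zeros((U.shape[0], Vx.shape[0], Vy.shape[0]))
    for a, b, c in np.ndindex(*R.shape):
        # The b-th row of Vx^T is the b-th column of Vx (likewise for Vy).
        T += R[a, b, c] * np.einsum("k,l,m->klm", U[:, a], Vx[:, b], Vy[:, c])
    return T


rng = np.random.default_rng(1)
K, L, M = 10, 2, 2
R = rng.standard_normal((L * M, L, M))
U = np.linalg.qr(rng.standard_normal((K, L * M)))[0]
Vx = np.linalg.qr(rng.standard_normal((L, L)))[0]
Vy = np.linalg.qr(rng.standard_normal((M, M)))[0]

T = superpose_subtensors(R, U, Vx, Vy)
P = subtensor_fractions(R)
print(np.allclose(T, np.einsum("abc,ka,lb,mc->klm", R, U, Vx, Vy)))  # True
print(np.isclose(P.sum(), 1.0))                                      # True
print(np.isclose((R ** 2).sum(), (T ** 2).sum()))                     # True
print(normalized_entropy(R))
```

Because the bases are orthonormal, the sum of the squared higher-order singular values equals the squared Frobenius norm of the data tensor. This is why each P_abc can be read as a fraction of the overall information. A core tensor with a single nonzero entry gives d = 0, and one with all entries equal in magnitude gives d = 1.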
HOSVD Computation.
We compute the transformation matrix $U$ from the SVD of the matrix $T_k \equiv (\mathcal{T}_{:11}, \ldots, \mathcal{T}_{:1M}, \ldots, \mathcal{T}_{:LM}) = UDV^T$, which is obtained by appending all column vectors $\{\mathcal{T}_{:lm}\}$ along the K-genes axis. Note that $U$ is independent of the order of the appended arrays. The singular values, which are tabulated in the diagonal matrix $D$, are arranged in decreasing order, such that the eigenarrays, the column vectors of $U$, are ordered in decreasing order of their relative significance in terms of the fraction of the overall information in the data tensor that each eigenarray captures (SI Fig. 6). Similarly, we compute the transformation matrices $V_x$ and $V_y$ from the SVD of the matrices $T_l = U_x D_x V_x^T$ and $T_m = U_y D_y V_y^T$, which are obtained by appending all x-row vectors $\{\mathcal{T}_{k:m}\}$ along the L-x-settings axis and all y-row vectors $\{\mathcal{T}_{kl:}\}$ along the M-y-settings axis, respectively (SI Figs. 7 and 8). For a real data tensor, the eigenarrays and the x- and y-eigengenes are unique up to phase factors of ±1, such that each eigenarray and each x- and y-eigengene captures both parallel and antiparallel data patterns, except in degenerate subspaces, defined by equal corresponding singular values in the diagonal matrices $D$, $D_x$, or $D_y$, respectively. For example, the y-eigengenes $V_{y,c:}^T$ and $V_{y,m:}^T$, which satisfy $D_{y,cc} \approx D_{y,mm}$, span an approximately degenerate subspace. We reformulate the HOSVD of Eqs. 1 and 2 with a unique orthogonal rotation of these two y-eigengenes, which is selected by subjecting the rotated y-eigengenes to a constraint that may be advantageous in the interpretation and visualization of the data (SI Fig. 9). We then compute the core tensor by multiplying the data tensor and the transformation matrices $U$, $V_x$, and $V_y$, that is, $\mathcal{R} = \mathcal{T} \times_k U^T \times_l V_x^T \times_m V_y^T$ (SI Fig. 10).
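The computation described above can be sketched in NumPy as follows. The sketch assumes a dense data tensor `T` of shape (K, L, M) with no missing values and LM ≤ K. The three unfoldings are formed with `reshape`, whose row-major order appends the column vectors in the order (1,1), …, (1,M), …, (L,M) used for T_k. The signs of the singular vectors are whatever `np.linalg.svd` returns, which reflects the ±1 phase freedom. No degenerate-subspace rotation is applied, and the function name `hosvd3` is hypothetical.

```python
import numpy as np


def hosvd3(T):
    """Third-order HOSVD of a K x L x M tensor from the SVDs of its three unfoldings.

    Returns U (K x LM), Vx (L x L), Vy (M x M), the core tensor R (LM x L x M),
    and the singular values (D, Dx, Dy) of the three unfoldings.
    """
    K, L, M = T.shape
    # T_k: K x LM; columns are the arrays T[:, l, m] in the order (1,1),...,(1,M),...,(L,M).
    T_k = T.reshape(K, L * M)
    # T_l: KM x L; rows are the x-row vectors T[k, :, m].
    T_l = T.transpose(0, 2, 1).reshape(K * M, L)
    # T_m: KL x M; rows are the y-row vectors T[k, l, :].
    T_m = T.reshape(K * L, M)

    U, D, _ = np.linalg.svd(T_k, full_matrices=False)      # D is in decreasing order
    _, Dx, VxT = np.linalg.svd(T_l, full_matrices=False)
    _, Dy, VyT = np.linalg.svd(T_m, full_matrices=False)
    Vx, Vy = VxT.T, VyT.T

    # Core tensor: R = T x_k U^T x_l Vx^T x_m Vy^T.
    R = np.einsum("klm,ka,lb,mc->abc", T, U, Vx, Vy)
    return U, Vx, Vy, R, (D, Dx, Dy)


rng = np.random.default_rng(2)
T = rng.standard_normal((30, 4, 3))            # K = 30 genes, L = 4, M = 3
U, Vx, Vy, R, (D, Dx, Dy) = hosvd3(T)
T_rebuilt = np.einsum("abc,ka,lb,mc->klm", R, U, Vx, Vy)
print(R.shape)                                   # (12, 4, 3)
print(np.allclose(T, T_rebuilt))                 # True

# U depends only on the column space of T_k, not on the order of the appended arrays:
perm = rng.permutation(12)
U_perm = np.linalg.svd(T.reshape(30, 12)[:, perm], full_matrices=False)[0]
print(np.allclose(U @ U.T, U_perm @ U_perm.T))   # True
```

The last check compares the projectors U Uᵀ rather than U itself, because individual eigenarrays are only determined up to sign.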
Approximately Degenerate Subtensor Space Rotation.
We define a subset of subtensors as approximately degenerate if their corresponding higher-order singular values are approximately equal in magnitude and if N − 1 = 2 of their N = 3 indices are equal, such that they are listed in a single vector in the core tensor $\mathcal{R}$. For example, the subtensors $\mathcal{S}(a,b,c)$ and $\mathcal{S}(k,b,c)$, which satisfy $|\mathcal{R}_{abc}| \approx |\mathcal{R}_{kbc}|$, span an “approximately degenerate subtensor space.” We reformulate the HOSVD of Eq. 2 with a single rank-1 subtensor $\mathcal{S}(a+k,b,c)$, unique to the data tensor, which is composed of these two subtensors, with the corresponding higher-order singular value $\mathcal{R}_{a+k,b,c}$, that is,

$$\mathcal{R}_{abc}\,\mathcal{S}(a,b,c) + \mathcal{R}_{kbc}\,\mathcal{S}(k,b,c) = \mathcal{R}_{a+k,b,c}\,\mathcal{S}(a+k,b,c).$$

The subtensor $\mathcal{S}(a+k,b,c) \equiv U_{:,a+k} \otimes V_{x,b:}^T \otimes V_{y,c:}^T$ is computed from the outer product of $U_{:,a+k} \equiv \mathcal{R}_{a+k,b,c}^{-1}(\mathcal{R}_{abc}U_{:a} + \mathcal{R}_{kbc}U_{:k})$, a normalized superposition of the eigenarrays $U_{:a}$ and $U_{:k}$, and the shared x- and y-eigengenes, $V_{x,b:}^T$ and $V_{y,c:}^T$ (Fig. 1). Because $U_{:a}$ and $U_{:k}$ are orthonormal, this normalization requires $|\mathcal{R}_{a+k,b,c}| = (\mathcal{R}_{abc}^2 + \mathcal{R}_{kbc}^2)^{1/2}$. This subtensor is unique to the data tensor, because it is defined by a unique rotation in the space spanned by $\mathcal{S}(a,b,c)$ and $\mathcal{S}(k,b,c)$.
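The merging of two approximately degenerate subtensors can be checked numerically. In this minimal sketch, the function names and the example vectors are hypothetical. The merged singular value is taken to be nonnegative, which is a sign convention assumed here. The check confirms that the single rank-1 term reproduces the sum of the two original terms and that the merged eigenarray has unit norm.

```python
import numpy as np


def merge_degenerate_pair(r_a, r_k, u_a, u_k):
    """Merge R_abc S(a,b,c) + R_kbc S(k,b,c), which share x- and y-eigengenes,
    into a single term R_{a+k,b,c} S(a+k,b,c).

    u_a and u_k are orthonormal eigenarrays. Returns (r_merged, u_merged), with
    ||u_merged|| = 1 and r_merged * u_merged == r_a * u_a + r_k * u_k.
    """
    r_merged = np.hypot(r_a, r_k)          # sign convention assumed: r_merged >= 0
    u_merged = (r_a * u_a + r_k * u_k) / r_merged
    return r_merged, u_merged


def rank1(u, vx, vy):
    """Outer product u (x) vx (x) vy."""
    return np.einsum("k,l,m->klm", u, vx, vy)


rng = np.random.default_rng(3)
Q = np.linalg.qr(rng.standard_normal((8, 2)))[0]     # two orthonormal eigenarrays
u_a, u_k = Q[:, 0], Q[:, 1]
vx_b = np.array([0.6, 0.8])                           # hypothetical shared x-eigengene
vy_c = np.array([1.0, 0.0, 0.0])                      # hypothetical shared y-eigengene
r_a, r_k = 0.9, -0.85                                 # |r_a| close to |r_k|

r_merged, u_merged = merge_degenerate_pair(r_a, r_k, u_a, u_k)
lhs = r_a * rank1(u_a, vx_b, vy_c) + r_k * rank1(u_k, vx_b, vy_c)
rhs = r_merged * rank1(u_merged, vx_b, vy_c)
print(np.allclose(lhs, rhs))                          # True
print(np.isclose(np.linalg.norm(u_merged), 1.0))      # True
```

The identity holds because the outer product is linear in its first factor. The shared eigengenes therefore factor out, and only the eigenarrays need to be combined.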
Subtensor Interpretation.
We associate a subtensor with an independent biological program or experimental phenomenon when a consistent biological or experimental theme is reflected in the interpretations of the patterns of the eigenarray, or superposition of eigenarrays, and the x- and y-eigengenes, whose outer product defines the subtensor mathematically. This interpretation takes into account the sign of the superposition coefficient of the subtensor, that is, the sign of the corresponding higher-order singular value. We parallel- and antiparallel-associate an eigenarray with the most likely parallel and antiparallel cellular states according to the annotations of two groups of k genes, one with the largest and one with the smallest levels of biological signal in this eigenarray among all K genes, respectively. The P value of a given association is calculated assuming a hypergeometric probability distribution of the J annotations among the K genes, and of the subset of j ⊆ J annotations among the subset of k genes (18):

$$P(j; k, K, J) = \binom{K}{k}^{-1} \sum_{i=j}^{k} \binom{J}{i}\binom{K-J}{k-i}.$$

We associate the x- and y-eigengenes with a biological or experimental process when their patterns of variation across the x- and y-settings, respectively, are interpretable (Fig. 2). For visualization, we set the average of each array across the genes and of each gene across the x- and y-settings to zero, such that the signal of each array and gene is centered at its gene- or x- and y-setting-invariant level, respectively.
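The hypergeometric P value above can be evaluated exactly with integer binomial coefficients. The following is a minimal sketch using Python's `math.comb`; the function name is ours, and the toy arguments are illustrative.

```python
from math import comb


def hypergeometric_tail(j, k, K, J):
    """P(j; k, K, J): probability that at least j of the J annotated genes fall
    among k genes drawn without replacement from all K genes."""
    # math.comb returns 0 when the lower argument exceeds the upper one, so
    # impossible terms (i > J or k - i > K - J) contribute nothing.
    total = sum(comb(J, i) * comb(K - J, k - i) for i in range(j, k + 1))
    # Exact integer arithmetic; the final true division returns a float.
    return total / comb(K, k)


# Toy check: K = 4 genes, J = 2 annotated, k = 2 drawn, both annotated.
print(hypergeometric_tail(2, 2, 4, 2))   # 0.16666666666666666
```

Consider the retrotransposon enrichment reported in the results (16 of the 20 nucleocapsid genes among 200 of K = 4,329 genes). For it, `hypergeometric_tail(16, 200, 4329, 20)` evaluates to a value on the order of 10^−18. Exact integers avoid the underflow that a naive floating-point product of binomial coefficients would risk for such small tail probabilities.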
Biological Results: Integrative Analysis of mRNA Expression from Yeast Cell Cycle Time Courses Under Different Oxidative Stress Conditions
The data tensor we analyze (SI Dataset 1) tabulates relative mRNA expression levels of K = 4,329 yeast Saccharomyces cerevisiae genes across L = 13 time points sampled from each of M = 3 cell cycle time courses of cultures synchronized by the pheromone α-factor, under different oxidative stress conditions. These are exposure to (i) ≈0.2 mM HP and (ii) ≈2 mM MD, starting at 25 min after 90 min of incubation in ≈7 nM α-factor, monitored by Shapira et al. (1), and (iii) a control time course, synchronized by 120 min of incubation in ≈7 nM α-factor, monitored by Spellman et al. (2). The time points sample approximately two cell cycle periods in the control culture. The first period of 63 min is sampled at 7-min intervals. The second period is sampled at 77, 98, and 119 ± 2 min. Each relative expression level is presumed valid when the signal-to-background ratio is >1.1 for both the synchronized culture and the asynchronous reference. Each of the 4,329 genes has valid data in at least eight time points in each course and in at least 32 of the LM = 39 arrays.
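The sampling schedule and the gene-selection cutoffs above can be expressed as a short sketch. It assumes that the first period is sampled from 0 min, which together with 77, 98, and 119 min yields the stated L = 13 time points. It also assumes that validity is supplied as a hypothetical boolean array `valid` of shape (K, L, M), which already encodes the signal-to-background test. The function name is illustrative.

```python
import numpy as np

# First period: 0 to 63 min every 7 min (10 points); second period: 77, 98, 119 min.
times = list(range(0, 64, 7)) + [77, 98, 119]
L, M = len(times), 3                       # 13 time points x 3 time courses = 39 arrays


def select_genes(valid, min_per_course=8, min_arrays=32):
    """Return a boolean mask over genes that pass both data-selection cutoffs.

    valid: boolean array of shape (K genes, L time points, M time courses),
    True where the signal-to-background test passed (hypothetical input).
    """
    per_course = valid.sum(axis=1)                 # (K, M): valid time points per course
    total = valid.sum(axis=(1, 2))                 # (K,): valid arrays out of L * M
    return (per_course >= min_per_course).all(axis=1) & (total >= min_arrays)


# Four synthetic genes illustrating the cutoffs.
valid = np.ones((4, L, M), dtype=bool)
valid[1, 7:, 0] = False        # only 7 valid points in the first course -> rejected
valid[2, 9:, :] = False        # 9 per course but only 27 of 39 arrays -> rejected
valid[3, 11:, :2] = False      # 11 + 11 + 10 = 32 arrays, >= 8 per course -> kept
valid[3, 10:, 2] = False
print(select_genes(valid))     # [ True False False  True]
```

The two cutoffs are not redundant: the third synthetic gene meets the per-course minimum but fails the overall array count.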
We use SVD to estimate the missing data in each time course separately (9). We then normalize each array by its norm $\|\mathcal{T}_{:lm}\|$ and compute the transformation matrices $U$, $V_x$, and $V_y$ (SI Figs. 6–8). Next, we rotate the approximately degenerate second and third y-eigengenes, $V_{y,2:}^T$ and $V_{y,3:}^T$, such that the rotated $V_{y,3:}^T$ describes over- and underexpression in response to HP and MD, respectively, and steady-state expression in the control time course (SI Mathematica Notebook). We then compute the HOSVD of the data tensor (SI Fig. 9) and rotate the approximately degenerate subtensor spaces (4, 2+3, 1), (5+2, 1, 3), (8+2, 4, 3), and (3+7, 2, 3) (Fig. 1).
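The rotation of the second and third y-eigengenes can be sketched as a planar rotation inside their two-dimensional subspace, which keeps both vectors orthonormal. The exact constraint is specified in the SI Mathematica Notebook and is not reproduced here. As an assumption, the sketch reads "steady-state expression in the control time course" as a zero entry of the rotated third y-eigengene at the control setting, with the y-settings ordered (HP, MD, control). The function name and the example eigengenes are hypothetical.

```python
import numpy as np


def rotate_pair_zero_at(v2, v3, index):
    """Rotate orthonormal vectors v2, v3 within their plane so that the rotated
    v3 is zero at position `index`; orthonormality is preserved."""
    theta = np.arctan2(v3[index], v2[index])
    c, s = np.cos(theta), np.sin(theta)
    v2_rot = c * v2 + s * v3
    v3_rot = -s * v2 + c * v3        # -s * v2[index] + c * v3[index] = 0 for this theta
    return v2_rot, v3_rot


HP, MD, CONTROL = 0, 1, 2           # assumed ordering of the y-settings
# Hypothetical second and third y-eigengenes: an arbitrary rotation of two
# vectors orthogonal to the condition-invariant direction (1, 1, 1).
a = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
b = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
phi = 0.3
v2 = np.cos(phi) * a + np.sin(phi) * b
v3 = -np.sin(phi) * a + np.cos(phi) * b

v2_rot, v3_rot = rotate_pair_zero_at(v2, v3, CONTROL)
if v3_rot[HP] < 0:                  # fix the +/-1 phase so HP is overexpressed
    v3_rot = -v3_rot
print(np.round(v3_rot, 4))
```

Because the core tensor is computed after this rotation, the rotated vectors simply replace the corresponding rows of V_yᵀ before the core tensor is formed. Flipping the sign of one eigengene uses only the ±1 phase freedom and does not affect orthonormality.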
Of the 4,329 genes, the mRNA expression of 579 was traditionally or microarray-classified as cell cycle-regulated (2). The expression of 312 and 680 genes was microarray-classified as regulated by pheromone (3) or environmental stress (4), respectively (SI Dataset 2). We annotate each gene as a DNA-binding target of any one of 19 transcription factors and four replication initiation proteins if the microarray-assigned P value for the binding of that protein to at least one of the probes that map to that gene is <0.02 (5–7) (SI Datasets 3–6). The DNA-binding occupancy levels of the oxidative stress response activators and the pheromone response factors were measured after a 30-min exposure to ≈4 mM HP or 3 nM α-factor, respectively. Those of the cell cycle factors, Stb5, and the replication initiation proteins were measured under steady growth conditions (Fig. 2).
We find that significant subtensors represent independent biological programs or experimental phenomena common to all three studies or exclusive to either one or two of the studies, including the subtle differential effects of HP and MD on cell cycle progression. We also find that this subtensor interpretation is robust to variations in the data selection cutoffs.
Significant Subtensors Represent Independent Biological Programs or Experimental Phenomena.
Steady state.
The first and most significant subtensor $\mathcal{S}(1,1,1)$ captures $P_{111} \approx$ 70% of the overall expression information in the data tensor, with the corresponding higher-order singular value $\mathcal{R}_{111} > 0$ (Fig. 1a). The first eigenarray $U_{:1}$ defines the expression variation across the genes in this subtensor. Following the P values for the distribution of the genes among each of the subsets of k = 200 genes with the largest and smallest levels of expression in this eigenarray (SI Dataset 7), it is antiparallel-associated with mRNA expression in response to environmental stress and the pheromone. It is parallel-associated with overexpression during the cell cycle stage M/G1 (Fig. 2). Consistently, this eigenarray is also antiparallel-associated with the expression of genes bound by oxidative stress response activators and the pheromone response factors Dig1 and Tec1, and is parallel-associated with the expression of genes bound by the M/G1 factor Ace2. The first x-eigengene $V_{x,1:}^T$, which defines the expression variation across time in this subtensor, describes time-invariant underexpression (Fig. 1b). The first y-eigengene $V_{y,1:}^T$, which defines the expression variation across the oxidative stress conditions, describes condition-invariant overexpression (Fig. 1c). Taken together, the first subtensor is inferred to represent the steady state of mRNA expression in response to HP, MD, or α-factor, averaged over time and conditions.
Oxidative stress responses.
The second, third, and seventh subtensors, $\mathcal{S}(2,1,2)$, $\mathcal{S}(2,2,1)$, and $\mathcal{S}(2,2,2)$, capture ≈6%, 3.3%, and 1% of the overall information, respectively, with $\mathcal{R}_{212} < 0$ and $\mathcal{R}_{221}, \mathcal{R}_{222} > 0$. The second eigenarray is parallel-associated with expression in response to environmental stress and is antiparallel-associated with pheromone response and G1. The second x-eigengene describes a transition from under- to overexpression at ≈35 min. The second y-eigengene describes overexpression in the HP- and MD-treated cultures and underexpression in the control culture. These subtensors are inferred to represent expression in response to oxidative stress. The second subtensor represents the time-averaged response to the oxidative stress induced by HP and MD vs. the time-averaged response induced by α-factor. The third subtensor represents condition-averaged expression variation across time in response to HP or MD exposure starting at 25 min, or in response to α-factor, which in the control culture dissipates at ≈20 min. The seventh subtensor represents an oxidative stress response that varies across both time and conditions.
Pheromone responses.
The fourth, fifth, and sixth subtensors, $\mathcal{S}(4,2+3,1)$, $\mathcal{S}(3,2,2)$, and $\mathcal{S}(3,1,2)$, capture ≈1.6%, 1.4%, and 1% of the overall information, with $\mathcal{R}_{4,2+3,1} > 0$ and $\mathcal{R}_{322}, \mathcal{R}_{312} < 0$. The superposition of the second and third x-eigengenes describes a time-decaying transition from over- to underexpression at ≈20 min. Both the third and the fourth eigenarrays are antiparallel-associated with expression in response to environmental stress and parallel-associated with expression in response to the pheromone. These subtensors are inferred to represent pheromone and pheromone-induced oxidative stress responses. The fourth subtensor represents a condition-averaged, time-decaying response. The fifth subtensor represents an α-factor response that varies across time and conditions. The sixth subtensor represents a time-averaged response to α-factor in the HP- and MD-treated cultures vs. that in the control culture.
HP- vs. MD-Induced Expression.
The eighth, ninth, and tenth subtensors, $\mathcal{S}(5+2,1,3)$, $\mathcal{S}(8+2,4,3)$, and $\mathcal{S}(3+7,2,3)$, capture ≈0.9%, 0.75%, and 0.6% of the overall information, with the corresponding higher-order singular values > 0. Of the corresponding superpositions of eigenarrays, $U_{:,5+2}$ is antiparallel-associated, and $U_{:,8+2}$ and $U_{:,3+7}$ are parallel-associated, with expression in response to environmental stress and with expression of oxidative stress activator-bound genes. Also, $U_{:,5+2}$ is parallel-associated, and $U_{:,8+2}$ and $U_{:,3+7}$ are antiparallel-associated, with expression activated by the G2/M factor Ndd1. These subtensors are inferred to represent responses to the HP- vs. MD-induced oxidative stress. The eighth subtensor represents time-averaged underexpression. The ninth and tenth subtensors represent overexpression, starting at ≈25 and 35 min and peaking at ≈40 and 55 min, when the control culture is at S/G2 and G2/M, respectively (Fig. 3a). Taken together, oxidative stress-induced and G1 genes are over-, and G2/M genes are underexpressed in the HP- vs. the MD-treated time course. These results agree with the current understanding of the differences between the responses to HP and MD. The HP-treated culture arrests in G2/M after extended G1 and S stages, in a manner that depends on inactivation of the Mcm1-Fkh2-Ndd1 transcription regulatory complex (1) and on the DNA damage-induced RAD9 checkpoint. The MD-treated culture, in contrast, continues through G2/M and M/G1 and arrests in G1 because of underexpression of the G1 cyclin-encoding CLN1 and CLN2 (19).
The eighth, ninth, and tenth subtensors classify the yeast genes according to the time dependence of their differential expression. They also identify the subsets of genes with the largest and smallest expression in each subtensor as significant in the HP- vs. MD-induced responses, in terms of the fraction of the information in either subtensor that they capture. The genome-scale picture that emerges from this data-driven analysis suggests that the evolutionarily highly conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that they are involved in, may play significant, yet previously unrecognized, roles in the difference between the effects of HP and MD on cell cycle progression in yeast.
Retrotransposition.
Overexpression in the eighth subtensor and underexpression in the ninth and tenth subtensors define genes whose time-averaged expression is greater in the MD- than in the HP-treated culture and is modulated by a peak in the MD- and a trough in the HP-treated culture at ≈50 min, when the control culture is at G2/M. The most significant of these genes, in terms of the fraction of the information in the eighth, ninth, and tenth subtensors that it captures, is the yeast Ku protein-encoding YKU70 (Fig. 3b). Yku70 is a telomere maintenance protein, which is necessary for escape from the RAD9 checkpoint arrest in G2/M. In this process, Yku70 and the meiotic recombination protein Mre11 play antagonistic roles, even though deletion of YKU70 is similar to that of MRE11 in its effect on nonhomologous end joining of DNA double-strand breaks (20). Yku70 was shown to potentiate retrotransposition (21), whereas disruption of MRE11 was shown to increase retrotransposition levels (22). We find MRE11 to be the 40th most significant gene with underexpression in the eighth and tenth subtensors and overexpression in the ninth subtensor. Consistently, the subset of the 200 most significant genes that are anticorrelated with MRE11 in these subtensors includes 16 of the 20 retrotransposon nucleocapsid genes in this data tensor, such as YIL080W, an enrichment that corresponds to a P value of $\approx 10^{-18}$.
Apoptosis.
Among genes anticorrelated with YKU70 in the eighth, ninth, and tenth subtensors, the second most significant gene is FLR1, which encodes a multidrug transporter. This differential expression of FLR1 is consistent with the observation that its transcription is regulated by the oxidative stress factor YAP1 and is induced by HP but not by MD (23). The 19th most significant gene is AIF1, which encodes the yeast apoptosis-inducing factor. Together with SKN7, SNQ2, and YAP1, AIF1 constitutes the gene ontology “response to singlet oxygen” core (24), and its overexpression stimulates HP-induced apoptotic cell death (25). This differential expression of AIF1 is consistent with the inactivation of the frog Xenopus laevis Ku70 during apoptosis (26).
Oxidative pentose phosphate pathway.
Among genes correlated with AIF1 and anticorrelated with YKU70, the 18th most significant is ZWF1, which encodes the yeast glucose-6-phosphate dehydrogenase. Glucose-6-phosphate dehydrogenase catalyzes the first step of the pentose phosphate pathway, that is, the oxidative utilization of glucose, and is involved in the response to HP. ZWF1 is among the 200 genes with the smallest expression in the ninth subtensor. Also among them are GND1 and SOL3, the two other genes in the gene ontology “oxidative branch of the pentose-phosphate shunt” core in this data tensor. STB5 is among them too; it is an S/G2 gene that encodes a transcription factor required for the regulation of the pentose phosphate pathway (27). Consistently, the ninth subtensor is parallel-associated with expression of Stb5-bound genes (Fig. 2).
Oxidative Stress Response Is Correlated with Overexpression of Binding Targets of Replication Initiation Proteins.
Recently, we discovered a genome-scale correlation between the DNA binding of the replication initiation proteins Mcm3, Mcm4, and Mcm7 and underexpression of adjacent genes during G1 (16). Replication initiation requires G1 binding of these proteins, which are involved in transcriptional silencing (28), at replication origins (29). Therefore, we suggested that this correlation might be explained by a previously unknown mechanism of regulation.
Now we independently uncover an equivalent genome-scale correlation. In all ten most significant subtensors and the corresponding seven eigenarrays and superpositions of eigenarrays, overexpression of binding targets of Mcm3, Mcm4, and Mcm7 correlates with expression in response to environmental stress. It also correlates with overexpression of oxidative stress activator-bound genes. DNA damage, such as that caused by oxidative stress, is known to inhibit binding of origins by targeted degradation of the essential prereplicative complex protein Cdc6 (30, 31). Taken together, we find that overexpression of binding targets of replication initiation proteins correlates with reduced, or even inhibited, binding of the origins. This correlation agrees with the recent observation that reduced efficiency of activation of origins correlates with local transcription (32, 33).
As with the correlation between the DNA binding of Mcm3, Mcm4, and Mcm7 and underexpression of adjacent genes during G1, this equivalent correlation may be due to either one of at least two mechanisms of regulation. Here the correlation is between overexpression of binding targets of Mcm3, Mcm4, and Mcm7 and expression in response to stress. First, stress-induced transcription of genes that are located near origins (34, 35) may reduce the binding efficiency of the adjacent origins. Second, reduced or even inhibited binding of origins by replication initiation proteins, caused by degradation of Cdc6, may release genes that are located near origins for transcription. For example, the promoter region of the stress-induced FLR1, which includes Cin5 and Yap7 binding sites, overlaps with the yeast autonomously replicating sequence ARS209, and the stress-induced ZWF1 is transcribed in the direction of ARS1412 (36).
Conclusions
We have shown that this multilinear HOSVD, reformulated to decompose a data tensor into a linear superposition of rank-1 subtensors, provides an integrative framework for analysis of DNA microarray data from different studies, where significant subtensors represent independent biological programs or experimental phenomena. By using this HOSVD in an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are exposed to either HP or MD, we were able to find that the conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that these genes are involved in, may play significant, yet previously unrecognized, roles in the differential effects of HP and MD on cell cycle progression. A genome-scale correlation between DNA replication initiation and RNA transcription, which is equivalent to a recently discovered correlation and might be due to a previously unknown mechanism of regulation, has been independently uncovered.
Acknowledgments
We thank B. W. Bader, I. W. Dawes, and L. De Lathauwer for thoughtful and thorough reviews of this manuscript, J. F. X. Diffley and M. Shapira for helpful comments, and the American Institute of Mathematics in Palo Alto for hosting the 2004 Workshop on Tensor Decompositions where some of this work was done. This work was supported by National Human Genome Research Institute Grant HG004302 (to O.A.) and National Science Foundation Grant CCR-0430617 (to G.H.G.).
Footnotes
¶To whom correspondence should be addressed. Email: orlyal{at}mail.utexas.edu

Author contributions: L.O., G.H.G., and O.A. designed research; L.O. and O.A. performed research; L.O. and O.A. analyzed data; L.O., G.H.G., and O.A. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0709146104/DC1.
Abbreviations: HOSVD, higher-order singular value decomposition; HP, hydrogen peroxide; MD, menadione; SVD, singular value decomposition.

Freely available online through the PNAS open access option.
 © 2007 by The National Academy of Sciences of the USA
References

1. Shapira M, Segal E, Botstein D
2. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B
3. Roberts CJ, Nelson B, Marton MJ, Stoughton R, Meyer MR, Bennett HA, He YD, Dai H, Walker WL, Hughes TR, et al.
4. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO
6. Wyrick JJ, Aparicio JG, Chen T, Barnett JD, Jennings EG, Young RA, Bell SP, Aparicio OM
8. Alter O
9. Alter O, Brown PO, Botstein D
10. Alter O, Golub GH
11. Alter O, Golub GH
15. Golub GH, Van Loan CF
16. Alter O, Brown PO, Botstein D
17. Alter O, Golub GH
19. Flattery-O'Brien JA, Dawes IW
21. Downs JA, Jackson SP
22. Scholes DT, Banerjee M, Bowen B, Curcio MJ
23. Nguyên DT, Alarco AM, Raymond M
24. Gene Ontology Consortium
25. Wissing S, Ludovico P, Herker E, Büttner S, Engelhardt SM, Decker T, Link A, Proksch A, Rodrigues F, Corte-Real M, et al.
26. Le Romancer M, Cosulich SC, Jackson SP, Clarke PR
27. Larochelle M, Drouin S, Robert F, Turcotte B
31. Blanchard F, Rusiniak ME, Sharma K, Sun X, Todorov I, Castellano MM, Gutierrez C, Baumann H, Burhans WC
33. Snyder M, Sapolsky RJ, Davis RW
34. Ramachandran L, Burhans DT, Laun P, Wang J, Liang P, Weinberger M, Wissing S, Jarolim S, Suter B, Madeo F, et al.