Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms

Contributed by David Botstein
Abstract
We describe a comparative mathematical framework for two genome-scale expression data sets. This framework formulates expression as superposition of the effects of regulatory programs, biological processes, and experimental artifacts common to both data sets, as well as those that are exclusive to one data set or the other, by using generalized singular value decomposition. This framework enables comparative reconstruction and classification of the genes and arrays of both data sets. We illustrate this framework with a comparison of yeast and human cell-cycle expression data sets.
Recent advances in high-throughput genomic technologies enable acquisition of different types of molecular biological data, e.g., DNA-sequence and mRNA-expression data, on a genomic scale. Comparative analysis of these data among two or more model organisms promises to enhance fundamental understanding of the universality as well as the specialization of molecular biological mechanisms. It also may prove useful in medical diagnosis, treatment, and drug design. Comparisons of the DNA sequence of entire genomes already give insights into evolutionary, biochemical, and genetic pathways.
Comparative analysis of mRNA-expression data requires mathematical tools that are able to distinguish the similar from the dissimilar among two or more large-scale data sets. These tools should provide mathematical frameworks for the description of the data, where the variables and operations may represent some biological reality. Recently we showed that singular value decomposition (SVD) provides such a framework for genome-wide expression data (refs. 1–3; see also refs. 4–7).
Now we show that generalized SVD (GSVD) (8) provides a comparative mathematical framework for two genome-scale expression data sets. GSVD is a linear transformation of the two data sets from the two genes × arrays spaces to two reduced and diagonalized “genelets” × “arraylets” spaces. The genelets are shared by both data sets. Each genelet is expressed only in the two corresponding arraylets, with a corresponding “angular distance” indicating the relative significance of this genelet, i.e., its significance in one data set relative to that in the other.
We show that a genelet of equal significance in both data sets may represent a process common to both data sets. The two corresponding arraylets may represent the cellular states in each data set that correspond to this common process. A genelet of no significance in one data set relative to the other may represent a process exclusive to the latter data set. The corresponding arraylet of this data set may represent the cellular state that corresponds to this exclusive process.
We also show that mathematical reconstruction of gene expression in a subset of genelets may simulate experimental observation of only the process that these genelets are inferred to represent. Similarly, reconstruction of array expression in the subset of corresponding arraylets may simulate observation of only the corresponding cellular state. Reconstruction of each data set in two or more subspaces may simulate observation of genome-scale differential expression in the processes that these subspaces are inferred to span. We demonstrate comparative classification of both sets of genes and arrays based on similarity in their reconstructed rather than overall expression.
We illustrate this framework with a comparison of yeast (11) and human (12) cell-cycle expression data sets.
Mathematical Methods: GSVD
A single microarray probes the relative expression levels of N_{1} genes in a single sample. A series of M_{1} arrays probes the genome-scale expression levels in M_{1} different samples, i.e., under M_{1} different experimental conditions. Let the matrix ê_{1}, of size N_{1}-genes × M_{1}-arrays, tabulate the full expression data. The vector in the nth row of the matrix ê_{1}, 〈g_{1,n}| ≡ 〈n|ê_{1}, lists the expression of the nth gene across the different samples that correspond to the different arrays.§ The vector in the mth column of the matrix ê_{1}, |a_{1,m}〉 ≡ ê_{1}|m〉, lists the genome-scale expression measured by the mth array. Let the matrix ê_{2}, of size N_{2}-genes × M_{2}-arrays, tabulate the relative expression levels of N_{2} genes under M_{2} = M_{1} ≡ M < max{N_{1}, N_{2}} experimental conditions that correspond one to one to the M_{1} conditions underlying ê_{1}. This one-to-one correspondence between the two sets of conditions is at the foundation of the GSVD comparative analysis of the two data sets and should be mapped out carefully.
GSVD then is a simultaneous linear transformation of the two expression data sets ê_{1} and ê_{2} from the two N_{1}-genes × M-arrays and N_{2}-genes × M-arrays spaces to the two reduced M-genelets × M-arraylets spaces (see Fig. 5, which is published as supporting information on the PNAS web site, www.pnas.org, and also at http://genome-www.stanford.edu/GSVD/),
ê_{1} = û_{1}ɛ̂_{1}x̂^{−1},
ê_{2} = û_{2}ɛ̂_{2}x̂^{−1}. [1]
In these spaces the data are represented by the diagonal nonnegative matrices ɛ̂_{1} and ɛ̂_{2}, which satisfy 〈k|ɛ̂_{1}|m〉 ≡ ɛ_{1,m}δ_{km} ≥ 0 and 〈k|ɛ̂_{2}|m〉 ≡ ɛ_{2,m}δ_{km} ≥ 0 for all 1 ≤ k, m ≤ M. The mth genelet is expressed only in the two mth arraylets, each of which corresponds to one of the two data sets. Therefore, each genelet is decoupled from all other genelets in both data sets simultaneously.
The antisymmetric angular distance between the data sets,
θ_{m} = arctan(ɛ_{1,m}/ɛ_{2,m}) − π/4, [2]
indicates the relative significance of the mth genelet, i.e., its significance in the first data set relative to that in the second in terms of the ratio of the expression information captured by this genelet in the first data set to that in the second. An angular distance of 0 indicates a genelet of equal significance in both data sets, with ɛ_{1,m} = ɛ_{2,m}; ±π/4 indicates no significance in the second data set relative to the first, with ɛ_{1,m} ≫ ɛ_{2,m}, or in the first relative to the second, with ɛ_{1,m} ≪ ɛ_{2,m}, respectively. The angular distances are arranged in decreasing order of significance in the first data set relative to the second such that π/4 ≥ θ_{1} ≥ ⋯ ≥ θ_{M} ≥ −π/4. The “generalized fractions of eigenexpression” of each data set separately indicate the significance of each genelet and its corresponding arraylet in this data set in terms of the fraction of the overall expression information that they capture in this data set alone (see Appendix, Eqs. 4 and 5, and Fig. 6, which are published as supporting information on the PNAS web site).
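As a minimal numerical sketch (not part of the original analysis), the angular distance described above can be evaluated from a pair of generalized singular values. The form θ = arctan(ɛ_{1}/ɛ_{2}) − π/4 is taken here because it reproduces the limiting values 0 and ±π/4 quoted in the text; the function name `angular_distance` and the use of `atan2` to cover the case ɛ_{2} = 0 are assumptions made for illustration.

```python
import math


def angular_distance(eps1, eps2):
    """Angular distance of one genelet from its two nonnegative
    generalized singular values (hypothetical helper name)."""
    # atan2 equals arctan(eps1 / eps2) for eps2 > 0 and also handles
    # eps2 == 0, i.e., a genelet with no significance in data set 2.
    return math.atan2(eps1, eps2) - math.pi / 4


# Three limiting cases: equal significance, exclusive to the first
# data set, and exclusive to the second data set.
for eps1, eps2 in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]:
    print(eps1, eps2, angular_distance(eps1, eps2))
```

Swapping the two arguments flips the sign of the result, which is the antisymmetry the text refers to.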
The transformation matrix x̂^{−1} defines the M-genelets × M-arrays basis set that is shared by both data sets. The transformation matrices û_{1} and û_{2} define the N_{1}-genes × M-arraylets and N_{2}-genes × M-arraylets basis sets that correspond to the first and second data sets, respectively. The vector in the mth row of x̂^{−1}, 〈γ_{m}| ≡ 〈m|x̂^{−1}, lists the expression of the mth genelet across the different arrays in both data sets simultaneously. The vectors in the mth columns of û_{1} and û_{2}, |α_{1,m}〉 ≡ û_{1}|m〉 and |α_{2,m}〉 ≡ û_{2}|m〉, list the genome-scale expression in the mth arraylets of the first and second data sets, respectively. The genelets are normalized, such that 〈γ_{m}|γ_{m}〉 = 1 for all 1 ≤ m ≤ M, but not necessarily orthogonal superpositions of the genes of the first and, at the same time, the second data set. The arraylets of either data set are orthonormal superpositions of the arrays of this data set such that, in general, x̂^{−1} is nonorthogonal, whereas û_{1} and û_{2} are both orthogonal,
û_{1}^{T}û_{1} = û_{2}^{T}û_{2} = Î, [3]
where Î is the identity matrix. Therefore, each arraylet of either data set is decoupled and decorrelated from all other arraylets of this data set. The genelets and arraylets are unique, and therefore also data-driven, up to a phase factor of ±1, because each genelet and arraylet capture both parallel and antiparallel gene- or array-expression patterns, respectively, except in degenerate subspaces, defined by subsets of equal angular distances.
GSVD Calculation.
From Eqs. 1 and 3, the M-arrays × M-arrays symmetric correlation matrices â_{1} = ê_{1}^{T}ê_{1} = (x̂^{−1})^{T}ɛ̂_{1}^{2}x̂^{−1} and â_{2} = ê_{2}^{T}ê_{2} are represented in the M-genelets × M-genelets space by the simultaneously diagonal matrices ɛ̂_{1}^{2} and ɛ̂_{2}^{2}, respectively. In theory, it is possible to calculate the GSVD of the two data sets ê_{1} and ê_{2} by (i) diagonalizing â_{2}^{−1}â_{1} = x̂(ɛ̂_{2}^{−1}ɛ̂_{1})^{2}x̂^{−1} to obtain x̂; (ii) projecting x̂ onto ê_{1} and ê_{2} to obtain ɛ̂_{1}^{2} = (û_{1}ɛ̂_{1})^{T}(û_{1}ɛ̂_{1}) = (ê_{1}x̂)^{T}(ê_{1}x̂) and ɛ̂_{2}^{2}; and (iii) projecting x̂, ɛ̂_{1}, and ɛ̂_{2} onto ê_{1} and ê_{2} to obtain û_{1} = ê_{1}x̂ɛ̂_{1}^{−1} and û_{2}. In practice, we avoid computing the quotient of the correlation matrices, â_{2}^{−1}â_{1}, and use the numerically robust GSVD algorithm (8, 9) to obtain x̂.
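The three "in theory" steps above can be sketched in Python with NumPy. This is an illustration only: it follows the eigenvector route that the text says is avoided in practice, it is not the numerically robust GSVD algorithm, and it assumes both data matrices have full column rank. The function name `gsvd_via_eig`, the Cholesky-based solution of the symmetric generalized eigenproblem, and the random test matrices are all assumptions introduced here.

```python
import numpy as np


def gsvd_via_eig(e1, e2):
    """Textbook-route GSVD sketch: e_i = u_i @ diag(eps_i) @ x_inv."""
    a1, a2 = e1.T @ e1, e2.T @ e2          # arrays x arrays matrices
    # Step (i): eigenvectors of inv(a2) @ a1, obtained from the equivalent
    # symmetric problem inv(L) @ a1 @ inv(L).T with a2 = L @ L.T.
    chol_inv = np.linalg.inv(np.linalg.cholesky(a2))
    w, q = np.linalg.eigh(chol_inv @ a1 @ chol_inv.T)
    q = q[:, np.argsort(w)[::-1]]          # decreasing eps1 / eps2
    v = chol_inv.T @ q
    # Rescale so that every genelet (row of x_inv) has unit norm.
    x_inv = np.linalg.inv(v)
    norms = np.linalg.norm(x_inv, axis=1)
    x_inv = x_inv / norms[:, None]
    x = v * norms
    # Step (ii): generalized singular values from the projections e_i @ x.
    eps1 = np.linalg.norm(e1 @ x, axis=0)
    eps2 = np.linalg.norm(e2 @ x, axis=0)
    # Step (iii): arraylets u_i = e_i @ x @ inv(diag(eps_i)).
    u1, u2 = (e1 @ x) / eps1, (e2 @ x) / eps2
    theta = np.arctan2(eps1, eps2) - np.pi / 4
    return u1, eps1, u2, eps2, x_inv, theta


# Synthetic stand-ins for two data sets sharing M = 6 matched conditions.
rng = np.random.default_rng(0)
e1 = rng.standard_normal((50, 6))
e2 = rng.standard_normal((80, 6))
u1, eps1, u2, eps2, x_inv, theta = gsvd_via_eig(e1, e2)
print(theta)
```

For real expression data, a library routine built on the robust algorithm (for example a LAPACK GSVD driver) would be the safer choice; the sketch is meant only to make the algebra of steps (i) to (iii) concrete.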
Comparative Pattern Inference.
The decorrelation of the arraylets suggests that some of the significant arraylets of each data set, i.e., those with the largest generalized fractions of eigenexpression (see Appendix, Eqs. 4 and 5, and Fig. 6), may represent independent cellular states, where the corresponding genelets represent the corresponding regulatory programs, biological processes, or experimental artifacts that contribute to the overall expression signal in each data set. The one-to-one correspondence between the two sets of experimental conditions that underlie the two data sets suggests that among these genelets, a genelet of equal significance in both data sets with angular distance of ≈0 may represent a process common to both data sets; a genelet of no significance in one data set relative to the other with angular distance of ≈±π/4 may represent a process exclusive to the latter data set. We infer that a genelet represents a process exclusive to one or common to both data sets when its expression pattern across the corresponding one or both sets of arrays is biologically or experimentally interpretable. We associate this genelet with a biological process when this inference is supported by one or two coherent biological themes, reflected in the functions of the genes of the corresponding one or both data sets, whose coefficients of this genelet in the GSVD expansion, as listed in the corresponding one or both arraylets, are largest in magnitude compared to those coefficients of all other genes. With this we assume that the corresponding one or both arraylets represent the cellular states of this exclusive or common process, respectively. We estimate the probabilistic significance of these associations by annotations using combinatorics (ref. 10; see Appendix, Fig. 7, and Table 1, which are published as supporting information on the PNAS web site).
Comparative Data Reconstruction.
The decoupling of the genelets and both sets of arraylets allows reconstructing either data set in a given subspace of K-genelets and corresponding arraylets without eliminating genes or arrays, ê_{i} → ∑_{k} ɛ_{i,k}|α_{i,k}〉〈γ_{k}|, where i = 1, 2 and the sum runs over the K genelets of the subspace. For visualization and classification, we set the arithmetic mean of each genelet across the arrays and that of each arraylet across the genes to 0, such that the expression of each gene and array in the reconstructed data set is centered at its array- or gene-invariant level, respectively.
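The reconstruction ê_{i} → ∑_{k} ɛ_{i,k}|α_{i,k}〉〈γ_{k}| can be sketched as a sum of rank-one terms over a chosen subset of genelets. In the Python sketch below, the function name `reconstruct`, the `center` flag, and the synthetic factors (an orthonormal arraylet matrix `u`, positive values `eps`, and a genelet matrix `x_inv` with unit-norm rows) are assumptions for illustration; they stand in for the output of a real GSVD.

```python
import numpy as np


def reconstruct(u, eps, x_inv, genelets, center=False):
    """Rebuild one data set from a subset of genelets and arraylets.

    u       genes x M arraylets (columns), eps  M singular values,
    x_inv   M genelets (rows) x arrays,     genelets  indices to keep.
    """
    k = np.asarray(genelets)
    arraylets = u[:, k]
    genelet_rows = x_inv[k, :]
    if center:
        # Zero the mean of each arraylet across the genes and of each
        # genelet across the arrays, as described for visualization.
        arraylets = arraylets - arraylets.mean(axis=0, keepdims=True)
        genelet_rows = genelet_rows - genelet_rows.mean(axis=1, keepdims=True)
    # Sum over k of eps_k * |alpha_k><gamma_k| written as a matrix product.
    return (arraylets * eps[k]) @ genelet_rows


# Synthetic factors: 30 genes, 5 arrays (made-up sizes).
rng = np.random.default_rng(1)
u, _ = np.linalg.qr(rng.standard_normal((30, 5)))
eps = rng.uniform(0.5, 2.0, 5)
x_inv = rng.standard_normal((5, 5))
x_inv /= np.linalg.norm(x_inv, axis=1, keepdims=True)
e = (u * eps) @ x_inv

part = reconstruct(u, eps, x_inv, [2, 3])
rest = reconstruct(u, eps, x_inv, [0, 1, 4])
centered = reconstruct(u, eps, x_inv, [2, 3], center=True)
print(part.shape, np.allclose(part + rest, e))
```

No genes or arrays are dropped: `part` has the same shape as the full matrix, and the reconstruction in a subspace plus the reconstruction in its complement recovers the original data.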
Comparative Data Classification.
Inferring that subsets of genelets and arraylets represent independent processes or states, exclusive to one or common to both data sets, allows classifying the genes and arrays of one or simultaneously both data sets by similarity in their expression of these genelets or arraylets, respectively, rather than their overall expression. We least-squares-approximate a subspace spanned by K > 2 genelets with that spanned by the two orthonormal vectors |x〉 and |y〉, which maximize ∑_{k} 〈γ_{k}|(|x〉〈x| + |y〉〈y|)|γ_{k}〉. We plot the projection of each gene of either data set 〈g_{i,n}|, where i = 1, 2, from the K-genelets subspace onto |y〉, ∑_{k} ɛ_{i,k}〈n|α_{i,k}〉〈γ_{k}|y〉/N_{i,n}, along the y axis vs. that onto |x〉 along the x axis, normalized by its ideal amplitude, where the contribution of each genelet to the overall projected expression of the gene adds up rather than cancels out, N_{i,n}^{2} = ∑_{k}∑_{l} ɛ_{i,k}ɛ_{i,l}|〈n|α_{i,k}〉〈α_{i,l}|n〉〈γ_{k}|(|x〉〈x| + |y〉〈y|)|γ_{l}〉|. In this plot, the distance of each gene from the origin, r_{i,n}, is the amplitude of its normalized projection. An amplitude of 1 indicates that the genelets add up; 0 indicates that they cancel out. The phase difference of each gene from the x axis, φ_{i,n}, is its phase in the progression of expression across the genes from |x〉 to |y〉 and back to |x〉, going through the projections of all K-genelets in this subspace, (|x〉〈x| + |y〉〈y|)|γ_{k}〉. We sort the genes according to φ_{i,n}. Similarly, we plot the projection of each array, |a_{i,m}〉, from the K-arraylets subspace onto ∑_{k} |α_{i,k}〉〈γ_{k}|y〉, ∑_{k} ɛ_{i,k}〈y|γ_{k}〉〈γ_{k}|m〉/N_{i,m}, along the y axis vs. that onto ∑_{k} |α_{i,k}〉〈γ_{k}|x〉 along the x axis, normalized by its ideal amplitude, N_{i,m}^{2} = ∑_{k}∑_{l} ɛ_{i,k}ɛ_{i,l}|〈m|γ_{k}〉〈γ_{l}|m〉〈γ_{k}|(|x〉〈x| + |y〉〈y|)|γ_{l}〉|. We sort the arrays according to their phase differences from the x axis, φ_{i,m}.
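The gene classification described above can be sketched as follows. Two points in this Python sketch are assumptions rather than statements from the text: the vectors |x〉 and |y〉 are taken to be the two leading right singular vectors of the K × M matrix of selected genelets (a standard solution of the stated maximization), and the ideal amplitude is computed with absolute values inside the double sum so that the contributions add up. The function name `classify_genes` and the synthetic factors are hypothetical.

```python
import numpy as np


def classify_genes(u, eps, x_inv, genelets):
    """Normalized amplitude r and phase phi of every gene in a
    K-genelet subspace, plus the gene order sorted by phase."""
    k = np.asarray(genelets)
    g = x_inv[k, :]                                   # K genelets x M arrays
    # Two orthonormal vectors capturing the most of the K genelets.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    x, y = vt[0], vt[1]
    gx, gy = g @ x, g @ y                             # <gamma_k|x>, <gamma_k|y>
    coeff = u[:, k] * eps[k]                          # eps_k <n|alpha_k>
    px, py = coeff @ gx, coeff @ gy                   # projections onto x, y
    overlap = np.outer(gx, gx) + np.outer(gy, gy)     # <gamma_k|P|gamma_l>
    # Ideal amplitude: every term of the double sum taken in magnitude.
    ideal = np.sqrt(np.einsum("nk,nl,kl->n",
                              np.abs(coeff), np.abs(coeff), np.abs(overlap)))
    r = np.hypot(px, py) / ideal
    phi = np.arctan2(py, px)
    return r, phi, np.argsort(phi)


# Synthetic factors: 40 genes, 6 arrays, a subspace of K = 3 genelets.
rng = np.random.default_rng(2)
u, _ = np.linalg.qr(rng.standard_normal((40, 6)))
eps = rng.uniform(0.5, 2.0, 6)
x_inv = rng.standard_normal((6, 6))
x_inv /= np.linalg.norm(x_inv, axis=1, keepdims=True)
r, phi, order = classify_genes(u, eps, x_inv, [1, 2, 3])
print(r.min(), r.max())
```

Because singular vectors are defined only up to sign, the phases may come out reflected relative to another implementation; the ordering of genes along the progression is unaffected apart from its direction. The array classification follows the same pattern with the roles of genelets and arraylets exchanged.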
Biological Results: Comparison of Yeast and Human Cell-Cycle Expression Data Sets
Spellman et al. (11) monitored mRNA levels for 6,113 putative ORFs of the yeast Saccharomyces cerevisiae over two cell-cycle periods in a yeast culture synchronized initially in the cell-cycle stage M/G_{1} by the pheromone α factor, relative to reference mRNA from an asynchronous culture, at 7-min intervals for 119 min. The data set for the yeast experiments we analyze (see Data Sets 1–4, which are published as supporting information on the PNAS web site, and the Mathematica notebook at http://genome-www.stanford.edu/GSVD/) tabulates the ratios of gene-expression levels for the N_{1} = 4,523 genes with no missing data in at least 15 of the M_{1} = 18 arrays. Of these genes, 604 were classified as cell cycle-regulated by Spellman et al., and 77 were classified by traditional methods. Whitfield et al. (12) monitored mRNA levels for 43,198 human gene clones over two and a half cell-cycle periods in a HeLa cell-line culture synchronized initially in S by a double-thymidine block, relative to reference mRNA from an asynchronous HeLa culture, at 2-h intervals for 34 h. The data set for the human experiments we analyze (see Data Sets 5–8, which are published as supporting information on the PNAS web site) tabulates the ratios of gene-expression levels for the N_{2} = 12,056 clones with no missing data in at least 15 of the M_{2} = 18 arrays. Of these clones, 750 were classified as cell cycle-regulated by Whitfield et al., and 73 were classified by traditional methods. We estimate the missing data in each data set using SVD (ref. 2; see Appendix and Figs. 8–11, which are published as supporting information on the PNAS web site) and calculate the GSVD of both data sets.
Common Yeast and Human Cell-Cycle Subspace.
The time, i.e., array, variations of the third, fourth, and fifth genelets, 〈γ_{3}|, 〈γ_{4}|, and 〈γ_{5}|, which are almost equally significant in both data sets (slightly more in the yeast data), with 0 < θ_{3}, θ_{4}, θ_{5} < π/16 (Fig. 1), fit normalized cosine functions of two periods and initial phases of π/3, 0, and −π/3, respectively, superimposed on time-invariant expression (Fig. 2). The genelets 〈γ_{14}|, 〈γ_{15}|, and 〈γ_{16}|, which are also almost equally significant in both data sets (slightly more in the human data), with −π/6 < θ_{14}, θ_{15}, θ_{16} < 0, fit normalized cosines of two and a half periods and initial phases of −π/3, π/3, and 0, respectively. Coherent themes of yeast and human cell-cycle programs emerge from the annotations of the 100 yeast and 100 human genes (13, 14), with largest parallel and separately also antiparallel contributions from each one of these six genelets as listed in the corresponding yeast and human arraylets (see Data Sets 9 and 10, which are published as supporting information on the PNAS web site). We associate all these six genelets with the cell-cycle gene-expression oscillations common to both the yeast and human genomes and manifested in both data sets. We assume that the corresponding six yeast and six human arraylets represent the yeast and human cell-cycle cellular states, respectively. The probabilistic significance of these associations by annotations, estimated using combinatorics, is high: Most of the P values, calculated assuming hypergeometric probability distribution of the annotations among the genes, are orders of magnitude <0.01 (ref. 10; see Appendix, Fig. 7, and Table 1).
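A P value of the kind described above can be sketched with the upper tail of the hypergeometric distribution: the probability of finding at least a given number of annotated genes among those drawn from the full gene list by chance. The exact formula used in the Appendix is not reproduced in this section, so the tail form below is an assumption, as are the function name `hypergeom_tail` and the count of 40 annotated genes among the top 100, which is a made-up input. The population sizes 4,523 and 604 are the yeast figures quoted in the text.

```python
from math import comb


def hypergeom_tail(population, annotated, drawn, hits):
    """P(at least `hits` annotated genes among `drawn` genes sampled
    without replacement from `population` genes, `annotated` of which
    carry the annotation)."""
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, j) * comb(population - annotated, drawn - j)
        for j in range(hits, upper + 1)
    )
    # Exact integer arithmetic; the division happens only once.
    return favorable / comb(population, drawn)


# Yeast-sized example: 4,523 genes, 604 of them classified as cell
# cycle-regulated, top 100 genes of one arraylet. The 40 hits are a
# hypothetical count, not a result from the paper.
p_value = hypergeom_tail(4523, 604, 100, 40)
print(p_value)
```

With roughly 13 annotated genes expected among 100 by chance, counts far above that give P values many orders of magnitude below 0.01, which is the regime the text describes.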
Following the traditional classifications, the 0-phase genelet 〈γ_{4}| is associated in parallel with the yeast cell-cycle stage M/G_{1}, in which the yeast culture is initially synchronized, and both 0-phase genelets 〈γ_{4}| and −〈γ_{16}| are associated in parallel with the human cell-cycle stage S, in which the human culture is initially synchronized.
Projecting the expression of the 18 yeast arrays from this six-dimensional yeast arraylets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six arraylets add up (rather than cancel out) in the overall expression of 16 arrays, the normalized amplitudes of which satisfy 0.5 ≤ r_{1,m} < 1 (Fig. 3). Sorting the arrays according to their phases, {φ_{1,m}}, gives an array order similar to that of the cell-cycle time points measured by the arrays that describes the yeast cell-cycle progression from the M/G_{1} stage through G_{1}, S, S/G_{2}, and G_{2}/M back to M/G_{1} twice. Because the projection of the 0-phase arraylets |α_{1,4}〉 and −|α_{1,16}〉, which correspond to the 0-phase genelets, 〈γ_{4}| and −〈γ_{16}|, is correlated with the arrays |a_{1,1}〉, |a_{1,2}〉, and |a_{1,10}〉 and also |a_{1,9}〉 and |a_{1,18}〉, we associate both yeast 0-phase arraylets with the cell-cycle cellular state of transition from G_{2}/M to M/G_{1}, in which the yeast culture is synchronized initially. Projecting the expression of the 18 human arrays from the six-dimensional human arraylets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six arraylets add up in the expression of 16 arrays. Sorting the arrays describes the human cell-cycle progression from S through G_{2}, G_{2}/M, M/G_{1}, and G_{1}/S back to S two and a half times. Because the projection of the 0-phase arraylets, |α_{2,4}〉 and −|α_{2,16}〉, is correlated with the arrays |a_{2,2}〉 and |a_{2,9}〉, we associate both human 0-phase arraylets with the cell-cycle stage S, in which the human culture is synchronized.
Projecting the expression of the yeast and human genes from the six-dimensional genelets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six genelets add up in the overall expression of 547 of the 604 yeast genes that were classified as cell cycle-regulated by Spellman et al. (11), 709 of the 750 human genes classified by Whitfield et al. (12), and 71 of the 77 yeast and 71 of the 73 human genes classified by traditional methods (including, e.g., 14 of 16 human histones that were not classified by Whitfield et al. as cell cycle-regulated based on their overall expression). Simultaneous classification of the yeast and human genes into the five cell-cycle stages describes the yeast and human cell cycles' progression along the yeast and human genes, respectively, and is in good agreement with the classifications by Spellman et al. and Whitfield et al. and also the traditional ones. Because the projection of the 0-phase genelets, 〈γ_{4}| and −〈γ_{16}|, is correlated with yeast genes that peak late in G_{2}/M and early in M/G_{1} and human genes that peak in S, we associate 〈γ_{4}| and −〈γ_{16}| with cell-cycle expression oscillations of yeast at the transition from G_{2}/M to M/G_{1} and human at S. This simultaneous classification therefore outlines a correspondence between the groups of yeast genes and those of human genes, e.g., yeast genes that peak at M/G_{1} correspond to human genes that peak at S, the cell-cycle stages in which the yeast and human cultures are synchronized initially, respectively.
With all 4,523 yeast and 12,056 human genes sorted, the gene variations of the six yeast and six human arraylets approximately fit one-period cosines of π/3, 0, and −π/3 initial phases (Fig. 4) such that the initial phase of each arraylet is similar to that of its corresponding genelet. Both sorted and reconstructed yeast and human expressions approximately fit traveling waves of one-period cosinusoidal variation across the genes and of two or two and a half periods across the arrays, respectively.
Exclusive Yeast Pheromone-Response Subspace.
The genelets 〈γ_{1}| and 〈γ_{2}|, insignificant in the human data set relative to that of the yeast, with θ_{1}, θ_{2} > π/7 (Fig. 1), describe initial transient increase and decrease in expression, respectively (Fig. 2). A theme of yeast response to pheromone synchronization emerges from the annotations of those yeast genes with contributions from 〈γ_{1}| and 〈γ_{2}| that are largest in magnitude. The genelet 〈γ_{6}|, equally significant in both data sets with θ_{6} ∼ 0, describes an initial transient increase in expression superimposed on cosinusoidal variation. A theme of transition from pheromone response to cell-cycle progression emerges from the annotations of those yeast genes with contributions from 〈γ_{6}|, as listed in the corresponding yeast arraylet |α_{1,6}〉, that are largest in magnitude (see Data Set 9). We associate these three genelets and corresponding three yeast arraylets with the pheromone response, which is exclusive to the yeast genome. Classification of the yeast genes and arrays into pheromone-response stages in the subspaces spanned by these genelets and arraylets, respectively, is in good agreement with the traditional understanding of this program (ref. 13; Figs. 12–14, which are published as supporting information on the PNAS web site).
Exclusive Human Stress-Response Subspace.
The genelets 〈γ_{17}| and 〈γ_{18}| are insignificant in the yeast data set relative to that of the human, with θ_{17}, θ_{18} < −π/6. A theme of human synchronization stress response emerges from the annotations of those human genes with contributions from 〈γ_{17}| and 〈γ_{18}| that are largest in magnitude. Also, a theme of transition from stress response to cell-cycle progression emerges from the annotations of those human genes with contributions from 〈γ_{6}|, as listed in the corresponding human arraylet |α_{2,6}〉, that are largest in magnitude (see Data Set 10). We associate these three genelets and corresponding three human arraylets with this human-exclusive stress response. Classification of the human genes and arrays into stress-response stages in the subspaces spanned by these genelets and arraylets, respectively, is in agreement with current understanding of this program (ref. 12; Figs. 15–17, which are published as supporting information on the PNAS web site).
Differential Expression of Yeast Genes in the Exclusive Pheromone-Response and the Common Cell-Cycle Subspaces.
According to their expression in the yeast-exclusive pheromone-response subspace, mRNA expression of both yeast genes KAR4 and CIK1 peaks early in the time course (together with that of other genes known to be involved in the α-factor response) (Fig. 3). In the common cell-cycle subspace, KAR4 peaks at the G_{1} cell-cycle stage, whereas CIK1 peaks almost half a cell-cycle period later (and also earlier) at S/G_{2} (Fig. 12). This differential expression of CIK1 and KAR4 in the pheromone-response program vs. that of the cell cycle is in agreement with the experimental observation of Kurihara et al. (15), who showed that induction of CIK1 depends on that of KAR4 during mating and is independent of KAR4 during mitosis.
Differential Expression of Human Genes in the Exclusive Stress-Response and the Common Cell-Cycle Subspaces.
In the human-exclusive stress-response subspace, most human histones reach their expression minima early (Fig. 3). In the common cell-cycle subspace, most histones peak early, together with other genes known to peak in the cell-cycle stage S (Fig. 14). This differential expression of most histones may explain why these histones do not appear to be cell cycle-regulated based on their overall expression.
Conclusions
We have shown that GSVD provides a comparative mathematical framework for two genome-scale expression data sets, in which the variables and operations may represent some biological reality. Using GSVD in a comparison of yeast and human cell-cycle expression data sets, we were able to find (i) biological similarity in these two disparate organisms in terms of their mRNA expression during their cell-cycle programs; (ii) experimental dissimilarity in terms of yeast and human mRNA expression during their different synchronization-response programs; and (iii) differential gene expression in the yeast and human cell-cycle programs vs. their synchronization-response programs, respectively.
Possible additional applications of GSVD include comparison of two genomic data sets, each corresponding to (i) the same experiment repeated, e.g., using different experimental protocols, to separate the biological signal that is similar in both data sets from the dissimilar experimental artifacts; (ii) one of two different types of genomic information (e.g., DNA copy number, mRNA expression, or protein abundance) collected from the same set of samples (e.g., tumor samples) to elucidate the molecular composition of the overall biological signal in these samples; (iii) one of two chromosomes of the same organism to illustrate the relation, if any, between these chromosomes in terms of their, e.g., mRNA expression in a given set of samples; and (iv) one of two interacting organisms, e.g., during infection, to illuminate the exchange of biological information in these interactions.
Acknowledgments
We thank G. H. Golub for insightful discussions of matrix computation, M. L. Whitfield for discussions of the human cellcycle data and careful reading, and G. M. Church, S. R. Eddy, and E. Rivas for thoughtful reviews of this manuscript. This work was supported by National Cancer Institute Grants CA77097 (to D.B.) and CA85129 (to P.O.B.) and National Institute of General Medical Sciences Grant GM46406 (to D.B.). O.A. is a Sloan Foundation/Department of Energy Postdoctoral Fellow in Computational Molecular Biology (DEFG0399ER62836) and a National Human Genome Research Institute Individual Mentored Research Scientist Development Awardee in Genomic Research and Analysis (5 K01 HG0003801). P.O.B. is a Howard Hughes Medical Institute Investigator.
Footnotes
Abbreviations

SVD, singular value decomposition

GSVD, generalized SVD
 Accepted January 14, 2003.
 Copyright © 2003, The National Academy of Sciences
References
 ↵
 Alter O.
 ↵
 Alter O.
 ↵
 Wen X.

 Hilsenbeck S. G.

 Raychaudhuri S.

 Holter N. S.
 ↵
 Golub G. H.
 ↵
 ↵
 ↵
 Spellman P. T.
 ↵
 Whitfield M. L.
 ↵
 Dwight S. S.
 ↵
 Sherlock G.
 ↵
 Kurihara L. J.