PNAS | August 29, 2000 | vol. 97 | no. 18 | 10101-10106
Genetics
Singular value decomposition for genome-wide expression data
processing and modeling
Orly Alter*, Patrick O. Brown, and David Botstein*
Departments of *Genetics and Biochemistry, Stanford University, Stanford, CA 94305
Contributed by David Botstein, June 15, 2000
 |
Abstract |
We describe the use of singular value decomposition in transforming
genome-wide expression data from genes × arrays space to reduced
diagonalized "eigengenes" × "eigenarrays" space, where the
eigengenes (or eigenarrays) are unique orthonormal superpositions of
the genes (or arrays). Normalizing the data by filtering out the
eigengenes (and eigenarrays) that are inferred to represent noise or
experimental artifacts enables meaningful comparison of the expression
of different genes across different arrays in different experiments.
Sorting the data according to the eigengenes and eigenarrays gives a
global picture of the dynamics of gene expression, in which individual
genes and arrays appear to be classified into groups of similar
regulation and function, or similar cellular state and biological
phenotype, respectively. After normalization and sorting, the
significant eigengenes and eigenarrays can be associated with observed
genome-wide effects of regulators, or with measured samples, in which
these regulators are overactive or underactive, respectively.
 |
Introduction |
DNA microarray technology (1,
2) and genome sequencing have advanced to the point that it is now
possible to monitor gene expression levels on a genomic scale (3).
These new data promise to enhance fundamental understanding of life on
the molecular level, from regulation of gene expression and gene
function to cellular mechanisms, and may prove useful in medical
diagnosis, treatment, and drug design. Analysis of these new data
requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible. Analysis so far has been limited to identification of
genes and arrays with similar expression patterns by using clustering
methods (4-9).
We describe the use of singular value decomposition (SVD) (10) in
analyzing genome-wide expression data. SVD is also known as
Karhunen-Loève expansion in pattern recognition (11) and as
principal-component analysis in statistics (12). SVD is a linear
transformation of the expression data from the genes × arrays
space to the reduced "eigengenes" × "eigenarrays"
space. In this space the data are diagonalized, such that each
eigengene is expressed only in the corresponding eigenarray, with
the corresponding "eigenexpression" level indicating their
relative significance. The eigengenes and eigenarrays are unique, and
therefore also data-driven, orthonormal superpositions of the genes and
arrays, respectively.
We show that several significant eigengenes and the corresponding
eigenarrays capture most of the expression information. Normalizing the
data by filtering out the eigengenes (and the corresponding
eigenarrays) that are inferred to represent noise or experimental
artifacts enables meaningful comparison of the expression of different
genes across different arrays in different experiments. Such
normalization may improve any further analysis of the expression data.
Sorting the data according to the correlations of the genes (and
arrays) with eigengenes (and eigenarrays) gives a global picture of the
dynamics of gene expression, in which individual genes and arrays
appear to be classified into groups of similar regulation and function,
or similar cellular state and biological phenotype, respectively. These
groups of genes (or arrays) are not defined by overall similarity in
expression, but only by similarity in the expression of any chosen
subset of eigengenes (or eigenarrays). Upon comparing two or more
similar experiments, with a regulator being overactive or underactive in one but normally expressed in the others, the expression pattern of
one of the significant eigengenes may be correlated with the expression
patterns of this regulator and its targets. This eigengene, therefore,
can be associated with the observed genome-wide effect of the
regulator. The expression pattern of the corresponding eigenarray
is correlated with the expression patterns observed in samples in which
the regulator is overactive or underactive. This eigenarray, therefore,
can be associated with these samples.
We conclude that SVD provides a useful mathematical framework for
processing and modeling genome-wide expression data, in which both the
mathematical variables and operations may be assigned biological meaning.
 |
Mathematical Framework: Singular Value Decomposition |
The relative expression levels of N genes of a model
organism, which may constitute almost the entire genome of this
organism, in a single sample, are probed simultaneously by a single
microarray. A series of M arrays, which are almost identical
physically, probe the genome-wide expression levels in M
different samples, i.e., under M different experimental
conditions. Let the matrix $\hat{e}$, of size
N-genes × M-arrays, tabulate the full
expression data. Each element of $\hat{e}$ satisfies
$\langle n|\hat{e}|m\rangle \equiv e_{nm}$ for all $1 \le n \le N$ and $1 \le m \le M$,
where $e_{nm}$ is the relative expression level of
the nth gene in the mth sample as measured by the
mth array.§ The
vector in the nth row of the matrix $\hat{e}$,
$\langle g_n| \equiv \langle n|\hat{e}$, lists the relative
expression of the nth gene across the different samples, which correspond to the different arrays. The vector in the
mth column of the matrix $\hat{e}$, $|a_m\rangle \equiv \hat{e}|m\rangle$, lists the genome-wide relative expression
measured by the mth array.
SVD (10) is then a linear transformation of the expression data from the
N-genes × M-arrays space to the reduced
L-"eigenarrays" × L-"eigengenes"
space, where L = min{M, N} (see Fig. 7 in supplemental material at www.pnas.org),
$$\hat{e} = \hat{u}\,\hat{\varepsilon}\,\hat{v}^{T}. \tag{1}$$
In this space the data are represented by the diagonal nonnegative
matrix $\hat{\varepsilon}$, of size L-eigengenes × L-eigenarrays, which satisfies
$\langle k|\hat{\varepsilon}|l\rangle \equiv \varepsilon_l \delta_{kl} \ge 0$ for all
$1 \le k, l \le L$, such that the lth
eigengene is expressed only in the corresponding lth
eigenarray, with the corresponding "eigenexpression" level
$\varepsilon_l$. Therefore, the expression of each
eigengene (or eigenarray) is decoupled from that of all other
eigengenes (or eigenarrays). The "fraction of eigenexpression,"
$$p_l = \varepsilon_l^{2} \Big/ \sum_{k=1}^{L} \varepsilon_k^{2}, \tag{2}$$
indicates the relative significance of the lth
eigengene and eigenarray in terms of the fraction of the overall
expression that they capture. Assume also that the eigenexpression
levels are arranged in decreasing order of significance, such that
$\varepsilon_1 \ge \varepsilon_2 \ge \dots \ge \varepsilon_L \ge 0$. "Shannon entropy" of a dataset,
$$0 \le d = \frac{-1}{\log(L)} \sum_{k=1}^{L} p_k \log(p_k) \le 1, \tag{3}$$
measures the complexity of the data from the distribution of the
overall expression between the different eigengenes (and eigenarrays), where d = 0 corresponds to an
ordered and redundant dataset in which all expression is captured by a
single eigengene (and eigenarray), and d = 1
corresponds to a disordered and random dataset where all eigengenes
(and eigenarrays) are equally expressed.
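The quantities in Eqs. 1-3 can be sketched numerically. The following is a minimal illustration, not the authors' Mathematica notebook: it assumes NumPy, uses a synthetic random matrix as a stand-in for expression data, and the function name `eigenexpression_summary` is hypothetical.

```python
import numpy as np

def eigenexpression_summary(e):
    """Return (u, eps, vt, p, d) for an N-genes x M-arrays matrix e (needs L >= 2)."""
    # Thin SVD (Eq. 1): u is N x L, eps holds L levels in decreasing order, vt is L x M.
    u, eps, vt = np.linalg.svd(e, full_matrices=False)
    # Fraction of eigenexpression (Eq. 2).
    p = eps**2 / np.sum(eps**2)
    # Shannon entropy (Eq. 3), normalized by log(L); terms with p = 0 contribute 0.
    nz = p > 0
    d = -np.sum(p[nz] * np.log(p[nz])) / np.log(len(p))
    return u, eps, vt, p, d

# Synthetic stand-in data (an assumption for illustration, not measured ratios).
rng = np.random.default_rng(0)
e = rng.normal(size=(200, 14))
u, eps, vt, p, d = eigenexpression_summary(e)
print("fractions of eigenexpression:", np.round(p, 3))
print("entropy d:", round(float(d), 3))
```

A matrix dominated by a single pattern gives d close to 0, while unstructured noise such as the stand-in above gives d close to 1.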
The transformation matrices $\hat{u}$ and
$\hat{v}^{T}$ define the N-genes × L-eigenarrays and the L-eigengenes × M-arrays basis sets, respectively. The vector in the
lth row of the matrix $\hat{v}^{T}$,
$\langle\gamma_l| \equiv \langle l|\hat{v}^{T}$, lists the
expression of the lth eigengene across the different arrays.
The vector in the lth column of the matrix $\hat{u}$,
$|\alpha_l\rangle \equiv \hat{u}|l\rangle$, lists the genome-wide
expression in the lth eigenarray. The eigengenes and
eigenarrays are orthonormal superpositions of the genes and arrays,
such that the transformation matrices $\hat{u}$ and $\hat{v}$
are both orthogonal,
$$\hat{u}^{T}\hat{u} = \hat{v}^{T}\hat{v} = \hat{I}, \tag{4}$$
where $\hat{I}$ is the identity matrix. Therefore,
the expression of each eigengene (or eigenarray) is not only decoupled
but also decorrelated from that of all other eigengenes (or
eigenarrays). The eigengenes and eigenarrays are unique, except in
degenerate subspaces, defined by subsets of equal eigenexpression
levels, and except for a phase factor of ±1, such that each eigengene (or eigenarray) captures both parallel and antiparallel gene (or array)
expression patterns. Therefore, SVD is data-driven, except in
degenerate subspaces.
SVD Calculation.
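According to Eqs. 1 and 4, the
M-arrays × M-arrays symmetric correlation
matrix $\hat{a} = \hat{e}^{T}\hat{e} = \hat{v}\,\hat{\varepsilon}^{2}\hat{v}^{T}$ is represented
in the L-eigengenes × L-eigengenes space by
the diagonal matrix $\hat{\varepsilon}^{2}$. The
N-genes × N-genes correlation matrix
$\hat{g} = \hat{e}\hat{e}^{T} = \hat{u}\,\hat{\varepsilon}^{2}\hat{u}^{T}$ is represented
in the L-eigenarrays × L-eigenarrays space
also by $\hat{\varepsilon}^{2}$, where for L = min{M, N} = M,
$\hat{g}$ has a null subspace of at
least $N - M$ null eigenvalues. We, therefore, calculate
the SVD of a dataset $\hat{e}$, with $M \ll N$, by
diagonalizing $\hat{a}$, and then projecting the resulting
$\hat{v}$ and $\hat{\varepsilon}$ onto $\hat{e}$ to
obtain $\hat{u} = \hat{e}\,\hat{v}\,\hat{\varepsilon}^{-1}$.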
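The calculation route described above (diagonalize the small M × M matrix, then project) can be sketched as follows. This is an illustrative sketch assuming NumPy; the function name `svd_via_array_correlation` is hypothetical, and the code assumes all eigenexpression levels are strictly positive so that the division is defined.

```python
import numpy as np

def svd_via_array_correlation(e):
    """SVD of an N x M matrix e (M << N) from the M x M matrix a = e^T e."""
    a = e.T @ e                               # M x M symmetric correlation matrix
    evals, v = np.linalg.eigh(a)              # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]           # reorder to decreasing significance
    evals, v = evals[order], v[:, order]
    eps = np.sqrt(np.clip(evals, 0.0, None))  # eigenexpression levels
    u = (e @ v) / eps                         # u = e v eps^{-1}; assumes every eps > 0
    return u, eps, v.T

# Synthetic stand-in data (an assumption for illustration).
rng = np.random.default_rng(2)
e = rng.normal(size=(300, 6))
u, eps, vt = svd_via_array_correlation(e)
print(np.allclose((u * eps) @ vt, e))
```

Diagonalizing the M × M matrix avoids forming the N × N gene correlation matrix, which matters when N is in the thousands and M is in the tens. Squaring the data does reduce numerical accuracy for very small eigenexpression levels compared with a direct SVD routine.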
Pattern Inference.
The decorrelation of the eigengenes (and eigenarrays) suggests the
possibility that some of the eigengenes (and the corresponding eigenarrays) represent independent regulatory programs or processes (and corresponding cellular states). We infer that an eigengene $|\gamma_l\rangle$
represents a regulatory program or
process from its expression pattern across all arrays, when this
pattern is biologically interpretable. This inference may be supported
by a corresponding coherent biological theme reflected in the functions
of the genes, whose expression patterns correlate or anticorrelate with
the pattern of this eigengene. With this we assume that the
corresponding eigenarray $|\alpha_l\rangle$ (which lists
the amplitude of this eigengene pattern in the expression of each gene
$|g_n\rangle$ relative to all other genes,
$\langle n|\alpha_l\rangle = \langle g_n|\gamma_l\rangle/\varepsilon_l$) represents the cellular state which corresponds to this process. We
infer that the eigenarray $|\alpha_l\rangle$ represents
a cellular state from the arrays whose expression patterns correlate or
anticorrelate with the pattern of this eigenarray. Upon sorting of the
genes, this inference may be supported by the expression pattern of
this eigenarray across all genes, when this pattern is biologically interpretable.
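As a small illustration of the relation between an eigengene and its eigenarray, the sketch below projects every gene onto one eigengene and lists the genes with the largest amplitudes. It assumes NumPy and synthetic data; the cutoff of five genes is an arbitrary choice for display, not something the text prescribes.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration).
rng = np.random.default_rng(1)
e = rng.normal(size=(50, 6))
u, eps, vt = np.linalg.svd(e, full_matrices=False)

l = 0                            # 0-based index of the eigengene to inspect
gamma_l = vt[l]                  # expression of the l-th eigengene across arrays
alpha_l = e @ gamma_l / eps[l]   # amplitude <g_n|gamma_l>/eps_l for every gene n

# Genes whose patterns correlate or anticorrelate most strongly with gamma_l.
top = np.argsort(-np.abs(alpha_l))[:5]
for n in top:
    print(f"gene {n}: amplitude {alpha_l[n]:+.3f}")
```

The vector `alpha_l` computed this way coincides with the lth column of the matrix returned by the SVD routine, which is the eigenarray.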
Data Normalization.
The decoupling of the eigengenes and eigenarrays allows filtering
the data without eliminating genes or arrays from the dataset. We
filter any of the eigengenes $|\gamma_l\rangle$ (and the
corresponding eigenarray $|\alpha_l\rangle$),
$\hat{e} \to \hat{e} - \varepsilon_l|\alpha_l\rangle\langle\gamma_l|$, by substituting zero for the
eigenexpression level, $\varepsilon_l = 0$, in the
diagonal matrix $\hat{\varepsilon}$ and reconstructing the data
according to Eq. 1. We normalize the data by filtering out those eigengenes (and eigenarrays) that are inferred to represent noise
or experimental artifacts.
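The filtering step can be sketched as follows, assuming NumPy; the function name `filter_eigengenes` and the synthetic data are hypothetical. Which eigengenes count as noise or artifacts remains a judgment made by inspecting their patterns; the code only performs the subtraction.

```python
import numpy as np

def filter_eigengenes(e, drop):
    """Remove the eigengenes (and eigenarrays) whose 0-based indices are in `drop`."""
    u, eps, vt = np.linalg.svd(e, full_matrices=False)
    eps_f = eps.copy()
    eps_f[list(drop)] = 0.0      # substitute zero for the eigenexpression levels
    return (u * eps_f) @ vt      # reconstruct the data according to Eq. 1

# Synthetic stand-in data (an assumption for illustration).
rng = np.random.default_rng(3)
e = 1.0 + 0.1 * rng.normal(size=(100, 8))
e_filtered = filter_eigengenes(e, [0])
print(e_filtered.shape)          # no genes or arrays are eliminated
```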
Degenerate Subspace Rotation.
The uniqueness of the eigengenes and eigenarrays does not hold in a
degenerate subspace, defined by equal eigenexpression levels. We
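approximate significant similar eigenexpression levels
$\varepsilon_l \approx \varepsilon_{l+1} \approx \dots \approx \varepsilon_m$ with
$\varepsilon_l = \dots = \varepsilon_m = \sqrt{\sum_{k=l}^{m} \varepsilon_k^{2}/(m - l + 1)}$.
Therefore, Eqs. 1-4 remain valid upon rotation of the
corresponding eigengenes, $(|\gamma_l\rangle, \dots, |\gamma_m\rangle) \to \hat{R}(|\gamma_l\rangle, \dots, |\gamma_m\rangle)$, and eigenarrays,
$(|\alpha_l\rangle, \dots, |\alpha_m\rangle) \to \hat{R}(|\alpha_l\rangle, \dots, |\alpha_m\rangle)$, for all orthogonal
$\hat{R}$, $\hat{R}^{T}\hat{R} = \hat{I}$. We choose a unique rotation $\hat{R}$
by subjecting the rotated eigengenes to $m - l$ constraints, such that these constrained eigengenes may be
advantageous in interpreting and presenting the expression data.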
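A two-dimensional case of such a rotation can be sketched as below. The sketch assumes NumPy and assumes the root-mean-square of the two levels as their common value (a choice that preserves the total eigenexpression); the function name `rotate_pair` and the rotation angle are hypothetical, and the constraints that would single out one particular angle are not modeled.

```python
import numpy as np

def rotate_pair(u, eps, vt, k, l, angle):
    """Rotate eigengenes k, l (and their eigenarrays) by `angle` after equalizing eps."""
    u, eps, vt = u.copy(), eps.copy(), vt.copy()
    # Assumed common value: root-mean-square of the two levels.
    eps[[k, l]] = np.sqrt((eps[k]**2 + eps[l]**2) / 2.0)
    c, s = np.cos(angle), np.sin(angle)
    r = np.array([[c, -s], [s, c]])       # 2 x 2 orthogonal rotation matrix
    vt[[k, l]] = r @ vt[[k, l]]           # rotated eigengenes (rows of vt)
    u[:, [k, l]] = u[:, [k, l]] @ r.T     # rotated eigenarrays (columns of u)
    return u, eps, vt

# Synthetic stand-in data (an assumption for illustration).
rng = np.random.default_rng(6)
e = rng.normal(size=(40, 5))
u, eps, vt = np.linalg.svd(e, full_matrices=False)
u0, eps0, vt0 = rotate_pair(u, eps, vt, 0, 1, 0.0)
u1, eps1, vt1 = rotate_pair(u, eps, vt, 0, 1, 0.7)
# With equal levels, the reconstructed data do not depend on the rotation angle.
print(np.allclose((u0 * eps0) @ vt0, (u1 * eps1) @ vt1))
```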
Data Sorting.
Inferring that eigengenes (and eigenarrays) represent independent
processes (and cellular states) allows sorting the data by similarity
in the expression of any chosen subset of these eigengenes (and
eigenarrays), rather than by overall similarity in expression. Given
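two eigengenes $|\gamma_k\rangle$ and $|\gamma_l\rangle$
(or eigenarrays $|\alpha_k\rangle$ and $|\alpha_l\rangle$), we plot the correlation of
$|\gamma_k\rangle$ with each gene $|g_n\rangle$,
$\langle\gamma_k|g_n\rangle/\langle g_n|g_n\rangle$
(or $|\alpha_k\rangle$ with each array $|a_m\rangle$) along the y-axis, vs. that
of $|\gamma_l\rangle$ (or $|\alpha_l\rangle$) along the x-axis. In this
plot, the distance of each gene (or array) from the origin
is its amplitude of expression in the subspace spanned
by $|\gamma_k\rangle$ and $|\gamma_l\rangle$
(or $|\alpha_k\rangle$ and $|\alpha_l\rangle$), relative to its overall
expression, $r_n \equiv \langle g_n|g_n\rangle^{-1}\sqrt{|\langle\gamma_k|g_n\rangle|^{2} + |\langle\gamma_l|g_n\rangle|^{2}}$
(or $r_m \equiv \langle a_m|a_m\rangle^{-1}\sqrt{|\langle\alpha_k|a_m\rangle|^{2} + |\langle\alpha_l|a_m\rangle|^{2}}$).
The angular distance of each gene (or array) from the x-axis
is its phase in the transition from the expression pattern
$|\gamma_l\rangle$ to $|\gamma_k\rangle$ and back to $|\gamma_l\rangle$
(or $|\alpha_l\rangle$ to $|\alpha_k\rangle$ and back to $|\alpha_l\rangle$),
$\tan\phi_n \equiv \langle\gamma_k|g_n\rangle/\langle\gamma_l|g_n\rangle$
(or $\tan\phi_m \equiv \langle\alpha_k|a_m\rangle/\langle\alpha_l|a_m\rangle$).
We sort the genes (or arrays) according to $\phi_n$ (or $\phi_m$).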
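The sorting of genes by phase can be sketched as below, assuming NumPy. Two choices here are assumptions made for illustration: each projection is divided by the Euclidean norm of the gene's expression vector, so that the radius lies between 0 and 1 and its square is the fraction of the gene's expression inside the subspace, and the phase is computed with `arctan2`, which returns angles in (−π, π]. The function name `sort_by_phase` is hypothetical.

```python
import numpy as np

def sort_by_phase(e, k, l):
    """Radius, phase and sort order of genes in the subspace of eigengenes k and l."""
    u, eps, vt = np.linalg.svd(e, full_matrices=False)
    norms = np.linalg.norm(e, axis=1)   # assumes no gene row is identically zero
    y = (e @ vt[k]) / norms             # correlation with eigengene k (y-axis)
    x = (e @ vt[l]) / norms             # correlation with eigengene l (x-axis)
    r = np.hypot(x, y)                  # distance from the origin
    phi = np.arctan2(y, x)              # angular distance from the x-axis
    order = np.argsort(phi)             # gene indices sorted by phase
    return r, phi, order

# Synthetic rank-2 data: every gene lies entirely in the two-eigengene subspace.
rng = np.random.default_rng(4)
basis = np.linalg.qr(rng.normal(size=(8, 2)))[0].T
e2 = rng.normal(size=(30, 2)) @ basis
r, phi, order = sort_by_phase(e2, 0, 1)
print(np.allclose(r, 1.0))
```

Sorting arrays instead of genes follows the same pattern with the columns of the data matrix and the eigenarrays in place of the rows and the eigengenes.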
 |
Biological Data Analysis: Elutriation-Synchronized Cell Cycle |
Spellman et al. (3) monitored genome-wide mRNA levels,
for 6,108 ORFs of the budding yeast Saccharomyces cerevisiae
simultaneously, over approximately one cell cycle period, T ≈ 390 min, in a yeast culture synchronized by elutriation,
relative to a reference mRNA from an asynchronous yeast culture, at
30-min intervals. The elutriation dataset we analyze (see supplemental
data and Mathematica notebook at www.pnas.org and at
http://genome-www.stanford.edu/SVD/) tabulates the measured
ratios of gene expression levels for the N = 5,981 genes, 784 of which were classified by Spellman et al. as
cell cycle regulated, with no missing data in the M = 14 arrays.
Pattern Inference.
Consider the 14 eigengenes of the elutriation dataset. The first and
most significant eigengene $|\gamma_1\rangle$, which
describes time invariant relative expression during the cell cycle
(Fig. 8a at www.pnas.org), captures more than 90% of the
overall relative expression in this experiment (Fig. 8b).
The entropy of the dataset, therefore, is low, $d = 0.14 \ll 1$. This suggests that the underlying processes are manifested by
weak perturbations of a steady state of expression. This also suggests
that time-invariant additive constants due to uncontrolled experimental
variables may be superimposed on the data. We infer that
$|\gamma_1\rangle$ represents experimental additive constants superimposed on a steady gene expression state, and assume
that $|\alpha_1\rangle$ represents the corresponding
steady cellular state. The second, third, and fourth eigengenes, which
show oscillations during the cell cycle (Fig. 8c), capture
about 3%, 1%, and 0.5% of the overall relative expression,
respectively. The time variation of $|\gamma_3\rangle$
fits a normalized sine function of period T,
$\sqrt{2/T}\,\sin(2\pi t/T)$. We infer that
$|\gamma_3\rangle$ represents expression oscillation, which is consistent with gene expression oscillations during a cell
cycle. The time variations of the second and fourth eigengenes fit a
cosine function of period T with $\sqrt{1/2}$ the
amplitude of a normalized cosine with this period,
$\sqrt{1/T}\,\cos(2\pi t/T)$. However, while
$|\gamma_2\rangle$ shows decreasing expression on transition from t = 0 to 30 min,
$|\gamma_4\rangle$ shows increasing expression. We infer
that $|\gamma_2\rangle$ and $|\gamma_4\rangle$ represent initial transient increase and decrease in expression in response to the elutriation,
respectively, superimposed on expression oscillation during the cell cycle.
Data Normalization.
We filter out the first eigengene and eigenarray of the elutriation
dataset, $\hat{e} \to \hat{e}_C = \hat{e} - \varepsilon_1|\alpha_1\rangle\langle\gamma_1|$,
removing the steady state of expression. Each of the elements of the
dataset $\hat{e}_C$, $\langle n|\hat{e}_C|m\rangle \equiv e_{C,nm}$, is the difference of the measured
expression of the nth gene in the mth array from
the steady-state levels of expression for these gene and array as
calculated by SVD. Therefore, $e_{C,nm}^{2}$ is the
variance in the measured expression of the nth gene in the
mth array. Let $\hat{e}_{LV}$ tabulate the
natural logarithm of the variances in elutriation expression, such that
each element of $\hat{e}_{LV}$ satisfies
$\langle n|\hat{e}_{LV}|m\rangle \equiv \log(e_{C,nm}^{2})$ for all $1 \le n \le N$ and $1 \le m \le M$, and consider the
eigengenes of $\hat{e}_{LV}$ (Fig. 9a in
supplemental material at www.pnas.org). The first eigengene
$|\gamma_1\rangle_{LV}$, which captures more
than 80% of the overall information in this dataset (Fig.
9b), describes a weak initial transient increase
superimposed on a time-invariant scale of expression variance. The
initial transient increase in the scale of expression variance may be a
response to the elutriation. The time-invariant scale of expression
variance suggests that a steady scale of experimental as well as
biological uncertainty is associated with the expression data. This
also suggests that time-invariant multiplicative constants due to
uncontrolled experimental variables may be superimposed on the data. We
filter out $|\gamma_1\rangle_{LV}$, removing
the steady scale of expression variance,
$\hat{e}_{LV} \to \hat{e}_{CLV} = \hat{e}_{LV} - \varepsilon_{1,LV}\,|\alpha_1\rangle_{LV}\;{}_{LV}\langle\gamma_1|$.
The normalized elutriation dataset
$\hat{e}_N$, where each of its elements satisfies
$\langle n|\hat{e}_N|m\rangle \equiv \mathrm{sign}(e_{C,nm})\sqrt{\exp(e_{CLV,nm})}$,
tabulates for each gene and array expression patterns that are approximately centered at the steady-state expression level (i.e., of
approximately zero arithmetic means), with variances which are
approximately normalized by the steady scale of expression variance
(i.e., of approximately unit geometric means). The first and second
eigengenes, $|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$, of
$\hat{e}_N$ (Fig. 1a), which are of similar
significance, capture together more than 40% of the overall normalized
expression (Fig. 1b). The time variations of
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$ fit normalized sine and
cosine functions of period T and initial phase
$\theta \approx 2\pi/13$,
$\sqrt{2/T}\,\sin(2\pi t/T - \theta)$ and
$\sqrt{2/T}\,\cos(2\pi t/T - \theta)$, respectively (Fig. 1c). We infer that
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$ represent cell cycle expression oscillations, and assume that the corresponding eigenarrays $|\alpha_1\rangle_N$ and
$|\alpha_2\rangle_N$ represent the
corresponding cell cycle cellular states. Upon sorting of the genes
(and arrays) according to
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$ (and
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$), the initial phase
$\theta \approx 2\pi/13$ can be interpreted as a delay of 30 min between
the start of the experiment and that of the cell cycle stage
G1. The decay to zero in the time variation of
$|\gamma_2\rangle_N$ at t = 360 and 390 min can be interpreted as dephasing in time of the
initially synchronized yeast culture.
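The normalization sequence described in this subsection (center, take the logarithm of the variances, filter the steady scale, and restore the sign) can be sketched as follows. The sketch assumes NumPy and synthetic data; the function names and the small constant `tiny`, which guards against taking the logarithm of zero, are assumptions added for illustration and are not part of the described method.

```python
import numpy as np

def filter_first(e):
    """Subtract the first eigengene and eigenarray from e."""
    u, eps, vt = np.linalg.svd(e, full_matrices=False)
    return e - eps[0] * np.outer(u[:, 0], vt[0])

def normalize(e, tiny=1e-300):
    """Center e, then rescale by the steady scale of expression variance."""
    e_c = filter_first(e)                  # differences from steady-state expression
    e_lv = np.log(e_c**2 + tiny)           # natural log of variances; tiny avoids log(0)
    e_clv = filter_first(e_lv)             # remove the steady scale of variance
    return np.sign(e_c) * np.sqrt(np.exp(e_clv))

# Synthetic stand-in for expression ratios near a steady state (an assumption).
rng = np.random.default_rng(5)
e = 1.0 + 0.1 * rng.normal(size=(60, 10))
e_n = normalize(e)
print(e_n.shape)
```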

Fig. 1.
Normalized elutriation eigengenes. (a) Raster display of
$\hat{v}_N^{T}$, the expression of 14 eigengenes
in 14 arrays. (b) Bar chart of the fractions of
eigenexpression, showing that
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$ capture about 20% of
the overall normalized expression each, and a high entropy d = 0.88. (c) Line-joined graphs of the expression levels
of $|\gamma_1\rangle_N$ (red) and $|\gamma_2\rangle_N$ (blue) in the 14 arrays
fit dashed graphs of normalized sine (red) and cosine (blue) of period
T = 390 min and phase $\theta = 2\pi/13$,
respectively.
Data Sorting.
Consider the normalized expression of the 14 elutriation arrays
$\{|a_m\rangle\}$ in the subspace spanned by
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$, which is assumed to
approximately represent all cell cycle cellular states (Fig.
2a). All arrays have at least
25% of their normalized expression in this subspace, with their
distances from the origin satisfying $0.5 \le r_m < 1$, except for the eleventh array
$|a_{11}\rangle$. This suggests that
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$ are sufficient to
approximate the elutriation array expression. The sorting of the arrays
according to their phases $\{\phi_m\}$, which
describes the transition from the expression pattern
$|\alpha_2\rangle_N$ to $|\alpha_1\rangle_N$ and back to
$|\alpha_2\rangle_N$, gives an array order which is similar to that of the cell cycle time points measured by the
arrays, an order that describes the progress of the cell cycle
expression from the M/G1 stage through G1, S,
S/G2, and G2/M and back to
M/G1.

Fig. 2.
Normalized elutriation expression in the subspace associated with the
cell cycle. (a) Array correlation with
$|\alpha_1\rangle_N$ along the
y-axis vs. that with
$|\alpha_2\rangle_N$ along the
x-axis, color-coded according to the classification of the
arrays into the five cell cycle stages, M/G1 (yellow),
G1 (green), S (blue), S/G2 (red), and
G2/M (orange). The dashed unit and half-unit circles
outline 100% and 25% of overall normalized array expression in the
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$ subspace. (b)
Correlation of each gene with
$|\gamma_1\rangle_N$ vs. that with
$|\gamma_2\rangle_N$, for 784 cell cycle
regulated genes, color-coded according to the classification by
Spellman et al. (3).
Because $|\alpha_1\rangle_N$ is
correlated with the arrays $|a_4\rangle$,
$|a_5\rangle$, $|a_6\rangle$, and
$|a_7\rangle$ and is anticorrelated with
$|a_{13}\rangle$ and $|a_{14}\rangle$,
we associate $|\alpha_1\rangle_N$ with the
cell cycle cellular state of transition from G1 to S, and
$-|\alpha_1\rangle_N$ with the transition
from G2/M to M/G1. Similarly, $|\alpha_2\rangle_N$ is correlated with
$|a_2\rangle$ and $|a_3\rangle$,
and therefore we associate $|\alpha_2\rangle_N$ with the transition from
M/G1 to G1. Also,
$|\alpha_2\rangle_N$ is anticorrelated with
$|a_8\rangle$ and $|a_{10}\rangle$,
and therefore we associate
$-|\alpha_2\rangle_N$ with the transition
from S to S/G2. With these associations the phase of $|a_1\rangle$,
$\phi_1 = -\theta \approx -2\pi/13$, corresponds to the 30-min delay between the start of
the experiment and that of the cell cycle stage G1, which
is also present in the inferred cell cycle expression oscillations $|\gamma_1\rangle_N$ and
$|\gamma_2\rangle_N$.
Consider also the expression of the 5,981 genes
$\{|g_n\rangle\}$ in the subspace spanned by
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$, which is inferred to
approximately represent all cell cycle expression oscillations (Fig. 10 in supplemental material at www.pnas.org). One may expect that genes
that have almost all of their normalized expression in this subspace,
with $r_n \approx 1$, are cell cycle regulated,
and that genes that have almost no expression in this subspace,
with $r_n \approx 0$, are not regulated by the
cell cycle at all. Indeed, of the 784 genes that were classified by
Spellman et al. (3) as cell cycle regulated, 641 have more than 25% of their normalized expression in this subspace (Fig. 2b). We sort all 5,981 genes according to their phases
$\{\phi_n\}$, to describe the transition from the
expression pattern $|\gamma_2\rangle_N$ to
that of $|\gamma_1\rangle_N$ and back to
$|\gamma_2\rangle_N$, starting at
$\phi_1 \approx -2\pi/13$. One may expect this to order the
genes according to the stages in the cell cycle in which their
expression patterns peak. However, for the 784 cell cycle regulated
genes this sorting gives a classification of the genes into the five
cell cycle stages, which is somewhat different than the classification
by Spellman et al. This may be due to the poor quality of
the elutriation expression data, as synchronization by elutriation was
not very effective in this experiment. For the α factor-synchronized
cell cycle expression there is much better agreement between the two
classifications (Fig. 5b).
With all 5,981 genes sorted, the gene variations of
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$ fit normalized sine and
cosine functions of period $Z \equiv N - 1 = 5{,}980$ and initial phase
$\theta \approx 2\pi/13$,
$\sqrt{2/Z}\,\sin(2\pi z/Z - \theta)$ and
$\sqrt{2/Z}\,\cos(2\pi z/Z - \theta)$,
respectively, where $z \equiv n - 1$ (Fig.
3 b and c). The
sorted and normalized elutriation expression fit approximately a
traveling wave of expression, varying sinusoidally across both genes
and arrays, such that the expression of the nth gene in the
mth array satisfies
$\langle n|\hat{e}_N|m\rangle \propto 2\cos[2\pi(t/T - z/Z)]/\sqrt{TZ}$
(Fig. 3a).
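Why a traveling wave corresponds to exactly two eigengenes of equal significance can be checked on synthetic data. The sketch below assumes NumPy and builds an idealized wave in which each period equals the number of samples along that axis (a simplification; the analysis above uses Z = N − 1 and measured time points). The sizes are arbitrary.

```python
import numpy as np

N, M = 120, 12                        # hypothetical numbers of genes and arrays
z = np.arange(N)[:, None]             # gene index after sorting
t = np.arange(M)[None, :]             # array (time point) index
wave = np.cos(2 * np.pi * (t / M - z / N))

# cos(a - b) = cos(a)cos(b) + sin(a)sin(b), so the matrix is a sum of two
# outer products: cosine-by-cosine plus sine-by-sine.
s = np.linalg.svd(wave, compute_uv=False)
p = s**2 / np.sum(s**2)
print(np.round(p[:3], 3))
```

The two nonzero eigenexpression levels are equal, so this idealized dataset is degenerate in the sense discussed under Degenerate Subspace Rotation: any rotation of the sine and cosine pair describes it equally well.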

Fig. 3.
Genes sorted by relative correlation with
$|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$ of normalized
elutriation. (a) Normalized elutriation expression of the
sorted 5,981 genes in the 14 arrays, showing traveling wave of
expression. (b) Eigenarrays expression; the expression of
$|\alpha_1\rangle_N$ and $|\alpha_2\rangle_N$, the eigenarrays
corresponding to $|\gamma_1\rangle_N$ and $|\gamma_2\rangle_N$, displays the sorting.
(c) Expression levels of
$|\alpha_1\rangle_N$ (red) and $|\alpha_2\rangle_N$ (green) fit normalized
sine and cosine functions of period $Z \equiv N - 1 = 5{,}980$ and phase $\theta \approx 2\pi/13$ (blue), respectively.
 |
Biological Data Analysis: α Factor-Synchronized Cell Cycle and
CLB2 and CLN3 Overactivations |
Spellman et al. (3) also monitored genome-wide mRNA
levels, for 6,108 yeast ORFs simultaneously, over approximately two cell cycle periods, in a yeast culture synchronized by
α factor, relative to a reference mRNA from an asynchronous yeast culture, at
7-min intervals for 119 min. They also measured, in two independent experiments, mRNA levels of yeast strain cultures with overactivated CLB2, which encodes a G2/M cyclin, both at
t = 40 min relative to their levels at the start of
overactivation at t = 0. Two additional independent
experiments measured mRNA levels of strain cultures with overactivated
CLN3, which encodes a G1/S cyclin, at
t = 30 and 40 min relative to their levels at the start
of overactivation at t = 0. The dataset for the
α factor, CLB2, and CLN3 experiments we analyze
(see supplemental data and Mathematica notebook at www.pnas.org)
tabulates the ratios of gene expression levels for the N = 4,579 genes, 638 of which were classified by Spellman et
al. as cell cycle regulated, with no missing data in the
M = 22 arrays.
After data normalization and degenerate subspace rotation (see
Appendix in supplemental material at www.pnas.org), the time variations of $|\gamma_1\rangle_{RN}$ and
$|\gamma_2\rangle_{RN}$ fit normalized sine and
cosine functions of two 66-min periods during the cell cycle, from
t = 7 to 119 min, and initial phase
$\pi/4$, respectively (Fig. 4c). While
$|\gamma_2\rangle_{RN}$ describes steady-state
expression in the CLB2- and CLN3-overactive
arrays, $|\gamma_1\rangle_{RN}$ describes
underexpression in the CLB2-overactive arrays and
overexpression in the CLN3-overactive arrays.

Fig. 4.
Rotated normalized α factor, CLB2, and CLN3
eigengenes. (a) Raster display of
$\hat{v}_{RN}^{T}$, where
$|\gamma_1\rangle_{RN} = \hat{R}_2\hat{R}_1|\gamma_1\rangle_N$,
$|\gamma_2\rangle_{RN} = \hat{R}_1|\gamma_2\rangle_N$, and
$|\gamma_3\rangle_{RN} = \hat{R}_2|\gamma_3\rangle_N$.
(b) $|\gamma_1\rangle_{RN}$,
$|\gamma_2\rangle_{RN}$ and
$|\gamma_3\rangle_{RN}$ capture 20% of the
overall normalized expression each. (c) Expression levels of
$|\gamma_1\rangle_{RN}$ (red) and
$|\gamma_2\rangle_{RN}$ (blue) fit dashed
graphs of normalized sine (red) and cosine (blue) of period
T/2 = 66 min and phase $\pi/4$, respectively, and
$|\gamma_3\rangle_{RN}$ (green) fits dashed
graph of normalized sine of period T = 112 min and
phase $-\pi/8$, from t = 7 to t = 119
min during the cell cycle.
Upon sorting the 4,579 genes in the subspace spanned by
$|\gamma_1\rangle_{RN}$ and $|\gamma_2\rangle_{RN}$ (Fig.
5b),
$|\gamma_1\rangle_{RN}$ is correlated with
genes that peak late in the cell cycle stage G1 and early
in S, among them CLN3, and we associate
$|\gamma_1\rangle_{RN}$ with the cell cycle
expression oscillations that start at the transition from
G1 to S and are dependent on CLN3, which encodes
a G1/S cyclin. Also,
$|\gamma_1\rangle_{RN}$ is anticorrelated with
genes that peak late in G2/M and early in
M/G1, among them CLB2, and therefore we
associate
$-|\gamma_1\rangle_{RN}$ with the
oscillations that start at the transition from G2/M to
M/G1 and are dependent on CLB2, which encodes
a G2/M cyclin. Similarly,
$|\gamma_2\rangle_{RN}$ is correlated with
genes that peak late in M/G1 and early in G1,
anticorrelated with genes that peak late in S and early in
S/G2, and uncorrelated with CLB2 and
CLN3. We, therefore, associate
$|\gamma_2\rangle_{RN}$ with the oscillations that start at the transition from M/G1 to G1
(and appear to be CLB2- and CLN3-independent),
and
$-|\gamma_2\rangle_{RN}$ with the
oscillations that start at the transition from S to S/G2
(and appear to be CLB2- and CLN3-independent).

Fig. 5.
Rotated normalized α factor, CLB2, and CLN3
expression in the subspace associated with the cell cycle.
(a) Array correlation with
$|\alpha_1\rangle_{RN}$ along the
y-axis vs. that with
$|\alpha_2\rangle_{RN}$ along the
x-axis, color-coded according to the classification of the
arrays into the five cell cycle stages, M/G1 (yellow),
G1 (green), S (blue), S/G2 (red), and
G2/M (orange). The dashed unit and half-unit circles
outline 100% and 25% of overall normalized array expression in the
$|\alpha_1\rangle_{RN}$ and $|\alpha_2\rangle_{RN}$ subspace.
(b) Correlation of each gene with
$|\gamma_1\rangle_{RN}$ vs. that with
$|\gamma_2\rangle_{RN}$, for 638 cell cycle
regulated genes, color-coded according to the classification by
Spellman et al. (3).
Upon sorting the 22 arrays in the subspace spanned by
$|\alpha_1\rangle_{RN}$ and $|\alpha_2\rangle_{RN}$ (Fig. 5a),
$|\alpha_1\rangle_{RN}$ is correlated with the
arrays $|a_{13}\rangle$ and
$|a_{14}\rangle$, as well as with
$|a_{21}\rangle$ and $|a_{22}\rangle$,
which measure the CLN3-overactive samples. We therefore
associate $|\alpha_1\rangle_{RN}$ with the cell
cycle cellular state of transition from G1 to S, which is simulated by CLN3 overactivation. Also,
$|\alpha_1\rangle_{RN}$ is anticorrelated with
the arrays $|a_9\rangle$ and
$|a_{10}\rangle$, as well as with
$|a_{19}\rangle$ and $|a_{20}\rangle$, which measure the CLB2-overactive samples. We associate
$-|\alpha_1\rangle_{RN}$ with the cellular
transition from G2/M to M/G1, which is
simulated by CLB2 overactivation. Similarly,
$|\alpha_2\rangle_{RN}$ appears to be
correlated with $|a_2\rangle$, $|a_3\rangle$,
$|a_{11}\rangle$, and $|a_{12}\rangle$,
anticorrelated with $|a_6\rangle$, $|a_7\rangle$, $|a_{16}\rangle$, and $|a_{17}\rangle$, and
uncorrelated with $|a_{19}\rangle$, $|a_{20}\rangle$,
$|a_{21}\rangle$, or $|a_{22}\rangle$. We
therefore associate $|\alpha_2\rangle_{RN}$
with the cellular transition from M/G1 to G1
(which appears to be CLB2- and CLN3-independent), and
$-|\alpha_2\rangle_{RN}$ with the cellular
transition from S to S/G2 (which also appears to be
CLB2- and CLN3-independent).
With all 4,579 genes sorted the gene variations of
$|\alpha_1\rangle_{RN}$ and $|\alpha_2\rangle_{RN}$ fit normalized sine and
cosine functions of period $Z \equiv N - 1 = 4{,}578$ and initial phase
$\pi/8$, respectively (Fig.
6 b and c). The
normalized and sorted cell cycle expression approximately fits a
traveling wave, varying sinusoidally across both genes and arrays. The
normalized and sorted expression in the CLB2- and
CLN3-overactive arrays approximately fits standing waves, constant across the arrays and varying sinusoidally across genes only,
which appear similar to
$-|\alpha_1\rangle_{RN}$ and
$|\alpha_1\rangle_{RN}$, respectively (Fig.
6a).

Fig. 6.
Genes sorted by relative correlation with
$|\gamma_1\rangle_{RN}$ and $|\gamma_2\rangle_{RN}$ of rotated normalized
α factor, CLB2, and CLN3. (a)
Normalized expression of the sorted 4,579 genes in the 22 arrays,
showing traveling wave of expression from t = 0 to 119 min during the cell cycle and standing waves of expression in the
CLB2- and CLN3-overactive arrays. (b)
Eigenarrays expression; the expression of
$|\alpha_1\rangle_{RN}$ and $|\alpha_2\rangle_{RN}$, the eigenarrays
corresponding to $|\gamma_1\rangle_{RN}$ and $|\gamma_2\rangle_{RN}$, displays the sorting.
(c) Expression levels of
$|\alpha_1\rangle_{RN}$ (red) and $|\alpha_2\rangle_{RN}$ (green) fit normalized
sine and cosine functions of period $Z \equiv N - 1 = 4{,}578$ and phase $\pi/8$ (blue), respectively.
 |
Conclusions |