Research Article

Robust singular value decomposition analysis of microarray data

Li Liu, Douglas M. Hawkins, Sujoy Ghosh, and S. Stanley Young
PNAS November 11, 2003 100 (23) 13167-13172; https://doi.org/10.1073/pnas.1733249100
Edited by Peter J. Bickel, University of California, Berkeley, CA (received for review May 22, 2003)


Abstract

In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex “cleaning” or imputation of the data table before analysis. We illustrate the method with a commercial data set.

Biologists are using DNA microarrays to monitor the level of gene expression of biological samples. Thousands of genes are typically monitored on a few to tens of samples. In the near future, it is expected that there will be data sets of hundreds of samples. Patterns of gene expression can be used to determine coregulated genes, suggest biomarkers of specific disease, and propose targets for drug intervention.

Microarray data present a number of challenges to statistical modeling. The size of the typical array (up to thousands of columns and perhaps hundreds of rows) defies easy graphical analyses. There may be severe distributional difficulties such as non-normal distributions, outliers (unusual data values), and numerous missing values. Common objectives are finding “patterns” in the data, in particular:

  • clustering the biological samples (rows) into groups with similar expression profiles;

  • clustering the genes (columns) into groups where the level of gene expression is similar in the samples.

One attractive way of clustering is a by-product of “ordination.” Ordination involves finding suitable permutations of the rows (and perhaps of the columns) that lead to a steady progression going down the rows (and perhaps across the columns). A clustering is given by placing vertical (and perhaps horizontal) dividing lines in the array to break it up into rectangular blocks within which the values are homogeneous.

Conversely, not all clustering methods are hierarchical, but if we cluster the rows with any hierarchical clustering method, the dendrogram produces an ordering of the rows. Because the layout of the dendrogram is not unique, neither is the ordering it produces. Thus good ordination methods lead to good clustering, whereas hierarchical clustering gives (nonunique) row ordinations.

Methods

The classical method of ordination is through the singular value decomposition. Write the expression data as an n by p array X with rows representing the n biological samples and columns representing the p genes. Approximate X with a bilinear form $x_{ij} = r_i c_j + e_{ij}$, where ri is a parameter corresponding to the ith biological sample, cj corresponds to the jth gene, and eij is a “residual.” This representation solves the ordination problem in that the rows can be ordered by their ri values and the columns by their cj values. Ordering the rows by r and the columns by c permutes the original data array to one in which we have high and low values in the corners and medium values in the middle, leading to an informative display.

Subsequently, grouping together those rows whose ri are similar will give clusters of biological samples. Grouping the columns with similar cj will give clusters of genes. If the residuals are small so that the ricj captures all of the important structure of the data matrix, then the ordination and subsequent clustering using the r or c values is essentially unique.

Standard practice is to remove “uninteresting structure” such as a grand mean, or even the row or column means from X before attempting the approximation. This is more of an implementation detail than a central aspect of the method.

The conventional method of getting this bilinear approximation is through the singular value decomposition (SVD) of X [refer to Healy (1) for details]. It is well known that the leading term of the SVD provides the bilinear form with the best least-squares approximation to X. The SVD is often found by performing a principal component analysis on either X′X or XX′.
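As a minimal sketch of this conventional approach (an illustration, not the authors' code), the following NumPy example removes row and column means, takes the leading SVD term as the bilinear fit, and orders rows and columns by their factors. The function name `rank1_svd` and the simulated matrix are assumptions made for the example.

```python
import numpy as np

def rank1_svd(X):
    """Leading SVD term of X after removing row and column means."""
    # Double centering: subtract row means and column means, add back the grand mean.
    Xc = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    r = U[:, 0] * s[0]   # row factors r_i (one per sample)
    c = Vt[0, :]         # column factors c_j (one per gene), unit length
    return Xc, r, c

# Simulated 8 x 5 array: one bilinear term plus small noise (illustrative only).
rng = np.random.default_rng(0)
X = np.outer(rng.normal(size=8), rng.normal(size=5)) + 0.01 * rng.normal(size=(8, 5))

Xc, r, c = rank1_svd(X)
row_order = np.argsort(r)                        # ordination of the samples
col_order = np.argsort(c)                        # ordination of the genes
X_ordered = Xc[np.ix_(row_order, col_order)]     # permuted array for display
```

Because the SVD is a least-squares fit, a single extreme cell in `X` can change `r` and `c` noticeably, which is the weakness discussed next.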

The conventional SVD, however, has some serious deficiencies. First, being a least-squares method, it is highly susceptible to outlier values in the array X. Such outliers are an accepted fact of life when dealing with microarray data, where a sprinkling of entries are found to be very large or small. Second, finding the SVD through a principal component analysis of X′X requires that all elements of X be observed. This goes counter to a second reality of microarray data, which is that missing values are a routine feature of the experimental data. Standard approaches to missing data are to “impute” values for the missing cells or eliminate whole rows or columns of the data matrix that are felt to be too incomplete. These methods are obviously at best unattractive necessities.

Alternating Least Squares. There is a standard remedy for the “missing information” deficiency, the Gabriel-Zamir alternating least-squares algorithm (2). This begins with a tentative estimate of the column factors cj, which are used to provide a matching scaling for the rows. Regarding $x_{ij} = r_i c_j + e_{ij}$ as a regression of the ith row of X on the column factors identifies ri as the coefficient of a no-intercept regression. Fitting this regression row by row by using all nonempty cells then leads to an estimate of the row factors ri. Then switching roles, we take the row factors ri as given and use regression of all nonempty cells in exactly the same way to calculate fresh estimates of the column factors cj. This approach uses all of the observed data and does not require imputation of missing data. If the data set is complete, this alternating least-squares algorithm gives the first term of the conventional SVD.
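The alternating regressions can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the published Gabriel-Zamir code: missing cells are coded as NaN, the starting column factors are taken from the row with the largest sum of squares, and the name `als_rank1` is hypothetical. It also assumes that every row and column keeps at least one observed cell with a nonzero regressor.

```python
import numpy as np

def als_rank1(X, n_iter=100, tol=1e-10):
    """Fit x_ij ~ r_i * c_j by alternating least squares; NaN marks a missing cell."""
    obs = (~np.isnan(X)).astype(float)      # 1 where observed, 0 where missing
    Xz = np.where(np.isnan(X), 0.0, X)      # missing cells contribute nothing to sums
    # Tentative column factors: the zero-filled row with the largest sum of squares.
    c = Xz[np.argmax((Xz ** 2).sum(axis=1))].copy()
    r = np.zeros(X.shape[0])
    for _ in range(n_iter):
        # No-intercept regression of each row on c, using observed cells only.
        r_new = (Xz @ c) / (obs @ c ** 2)
        # Switch roles: regress each column on the row factors.
        c_new = (Xz.T @ r_new) / (obs.T @ r_new ** 2)
        scale = np.linalg.norm(c_new)       # fix the scale: c has unit length
        c_new, r_new = c_new / scale, r_new * scale
        change = np.linalg.norm(r_new - r) + np.linalg.norm(c_new - c)
        r, c = r_new, c_new
        if change < tol:
            break
    return r, c

# Illustrative data: an exact bilinear array with two cells removed.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=8), rng.normal(size=5))
X = truth.copy()
X[0, 1] = np.nan
X[3, 4] = np.nan
r, c = als_rank1(X)
fitted = np.outer(r, c)   # also supplies values for the two missing cells
```

On a complete matrix the same iteration is a power method for the leading singular pair, which is consistent with the statement that it reproduces the first SVD term.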

Alternating Robust Fitting (ARF). The ALS method is effective in solving the missing information problem. But it does nothing about the sensitivity to outliers. Solving the outlier issue, however, can be done by a simple change in the regression method that lies at the heart of the Gabriel-Zamir algorithm. Instead of using ordinary least squares to carry out the alternating regressions, we can use any outlier-resistant regression method such as L1 (D.M.H., L.L., and S.S.Y., www.niss.org, technical report 122), weighted L1 [refer to Croux et al. (3) for details], least trimmed squares [refer to Ukkelberg and Borgen (4) for details], or an M-estimation method. In this article, we use the least trimmed squares method. The resulting algorithm then is to take the model $x_{ij} = r_i c_j + e_{ij}$. Use any convenient initial values for the column factors cj (or optionally for the row factors) and then apply a robust no-intercept linear regression algorithm to alternately use the cj to refine the estimates of the ri and the ri to refine the estimates of the cj.

Using any robust regression criterion, each of these alternating regressions will lead to a reduction in the regression criterion, so the algorithm will converge.
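The following sketch illustrates the alternating scheme with a robust regression in place of least squares. It is an assumption-laden illustration, not the authors' implementation: the article uses least trimmed squares, whereas this example swaps in plain L1 regression, because for a one-parameter no-intercept regression the L1 solution is simply a weighted median of the ratios x_j / c_j with weights |c_j|. The names `weighted_median`, `l1_slope`, and `arf_rank1_l1`, the all-ones starting values, and the toy data are all hypothetical choices.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def l1_slope(x, c):
    """argmin over r of sum_j |x_j - r * c_j|, using observed cells with c_j != 0."""
    ok = ~np.isnan(x) & (c != 0)
    return weighted_median(x[ok] / c[ok], np.abs(c[ok]))

def arf_rank1_l1(X, n_iter=50):
    """Alternating robust (L1) fit of x_ij ~ r_i * c_j; NaN marks a missing cell."""
    n, p = X.shape
    c = np.ones(p)                          # convenient starting column factors
    for _ in range(n_iter):
        r = np.array([l1_slope(X[i, :], c) for i in range(n)])   # rows on c
        c = np.array([l1_slope(X[:, j], r) for j in range(p)])   # columns on r
        scale = np.linalg.norm(c)           # fix the scale: c has unit length
        c, r = c / scale, r * scale
    return r, c

# Toy data: an exact bilinear table with one gross outlier in cell (0, 0).
a = np.arange(1.0, 6.0)
X = np.outer(a, a)
X[0, 0] = 100.0                             # true value is 1
r, c = arf_rank1_l1(X)
resid = X - np.outer(r, c)                  # the outlier stands out in the residuals
```

Plain L1 regression can still be pulled by a cell whose regressor value is very large (a leverage point), which is one reason to prefer weighted L1 or least trimmed squares in practice.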

Properties of the ARF Fit. Broad properties of the ARF bilinear fit follow at once. The method handles missing information routinely, without requiring a separate “fill-in” step, and it is impervious to a minority of outlier cells. Outliers will, of course, create a problem for the ARF, as with almost any conceivable method, if they constitute the majority of the elements of any row or column.

Clustering of the Rows and Columns. As already noted, sorting the rows by their ri values creates a natural ordination. This can be turned into a clustering of k groups of biological samples by finding “breakpoints” b(0) = 0 < b(1) < b(2) < ... < b(k - 1) < b(k) = n and allocating to cluster h those samples that, in the reordering, have index b(h - 1) < i ≤ b(h). The breakpoints need to be chosen so that the biological samples within each cluster have ri values as similar as possible. This can be made operational by the criterion that the pooled sum of squared deviations of the ri broken down into the k clusters should be a minimum. Exact algorithms for finding breakpoints to attain this minimum are given by Venter and Steel (5) and Hawkins (6). Similarly, applying the optimal segmentation algorithm to the column factors cj clusters the genes into any specified number of clusters such that the genes within clusters have cj values as similar as possible.
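One way to attain this minimum exactly is a standard dynamic program over the sorted factors. The sketch below is offered as an illustration of the criterion and is not claimed to match the algorithms of refs. 5 and 6 in detail; the function name `segment` and the example values are assumptions.

```python
import numpy as np

def segment(values, k):
    """Split sorted values into k contiguous groups with minimum pooled within-group SS.

    Returns the breakpoints [b(0)=0, b(1), ..., b(k)=n] and the minimum criterion.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of squares

    def ss(a, b):
        """Sum of squared deviations from the mean for x[a:b], with b > a."""
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    cost = np.full((k + 1, n + 1), np.inf)   # cost[h, b]: best SS for x[:b] in h groups
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for h in range(1, k + 1):
        for b in range(h, n + 1):
            for a in range(h - 1, b):
                cand = cost[h - 1, a] + ss(a, b)
                if cand < cost[h, b]:
                    cost[h, b], back[h, b] = cand, a
    # Trace the optimal breakpoints back from b(k) = n to b(0) = 0.
    bps = [n]
    for h in range(k, 0, -1):
        bps.append(int(back[h, bps[-1]]))
    return bps[::-1], float(cost[k, n])

# Illustrative row factors with three obvious groups.
factors = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0, 10.1]
breakpoints, pooled_ss = segment(factors, 3)
```

Cluster h then consists of the sorted items with zero-based positions `breakpoints[h-1]` up to but not including `breakpoints[h]`. The triple loop costs on the order of k·n² operations, which is manageable for a few hundred samples or genes.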

Relationship to Other Clustering Approaches

A common approach to clustering genes has been through (dis)similarity indices between the rows of X, for example, the Euclidean distance between rows as a dissimilarity measure or their correlation as a similarity measure. These measures can then be used in any convenient dissimilarity-based cluster method such as average linkage. If we look at this approach through the bilinear approximation $x_{ij} = r_i c_j + e_{ij}$, we see that the squared Euclidean distance between any two rows i and k can be written

$$\sum_j (x_{ij} - x_{kj})^2 = (r_i - r_k)^2 \sum_j c_j^2 + \sum_j (e_{ij} - e_{kj})^2 + 2 (r_i - r_k) \sum_j c_j (e_{ij} - e_{kj}).$$

Now suppose that the bilinear term ricj has captured all of the “structure” in the sample-gene association, and that all that is left is statistically independent measurement noise (not necessarily small), as is the case when the SVD is used to get the bilinear approximation. Consider the three terms on the right-hand side of this expression.

  • The last term is zero or near zero because of statistical independence.

  • The center term is made up simply of measurement noise. It cannot contribute usefully to the clustering, but in fact will have the effect of degrading the clustering.

  • The first term is the inter-row distance that our proposal uses for its clustering. It may be thought of as a way of filtering measurement noise out of the computation of inter-row distance.

Theory therefore implies that a well-fitting bilinear approximation to the matrix X will give a better picture of the biological sample differences through its row factors ri than can be found directly by using the Euclidean distances between rows. A similar conclusion applies to using the correlation between pairs of row profiles.
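The three-term split of the squared inter-row distance can be checked numerically. In this small sketch the two rows are simulated from the bilinear model with independent noise; all numerical values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 200                                   # number of genes (columns)
c = rng.normal(size=p)                    # column factors c_j
r_i, r_k = 1.5, -0.5                      # row factors of the two samples
e_i, e_k = rng.normal(size=p), rng.normal(size=p)   # independent measurement noise

x_i = r_i * c + e_i                       # row i of X under the bilinear model
x_k = r_k * c + e_k                       # row k of X under the bilinear model

d2 = np.sum((x_i - x_k) ** 2)             # raw squared Euclidean distance
signal = (r_i - r_k) ** 2 * np.sum(c ** 2)            # first term: factor-based distance
noise = np.sum((e_i - e_k) ** 2)                      # center term: pure noise
cross = 2 * (r_i - r_k) * np.sum(c * (e_i - e_k))     # last term: zero in expectation
```

The raw distance `d2` equals `signal + noise + cross` exactly. Clustering on the row factors keeps only the information in `signal`, whereas clustering on raw distances also carries the `noise` term, which grows with the number of genes.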

There is yet another consideration favoring use of the ARF for clustering. If the matrix contains outliers, these outliers contaminate the Euclidean distances (and the correlation coefficients) between rows. However, it is a consequence of the robust fits used in the ARF that the outliers do not contaminate the ri or cj substantially.

GeneLogic Data

We illustrate the bilinear fit obtained with the ARF, and the subsequent segmentation of the genes and biological specimens, on the following real data. The data are a subset of gene expression profiles of human tissues from the GeneExpress database, commercially available from GeneLogic (Gaithersburg, MD). Gene expression data are generated on oligonucleotide microarrays from Affymetrix (Santa Clara, CA). Two sample data sets are created for analysis, one containing both normal and malignant tissues (504 samples, covering eight tissue types: adipose tissue, breast, colon, kidney, liver, lung, ovary, and prostate, hereafter called set A) and the other containing only normal tissues (822 samples covering all of the above tissues plus white blood cells, hereafter called set B). A set of 224 genes is selected for both data sets based on their roles in metabolic and signaling pathways. The goal of the study is to determine whether SVD analysis can correctly cluster the genes and samples based on their biological function (drug metabolizing genes predominantly expressed in liver samples, for example).

Fig. 1 shows the image of the unordered log transformed data matrix of set A after removing the row and column means, where the rows correspond to the cell lines and the columns correspond to the genes. In this image, we cannot see any clear patterns. The log-transformed and row and column means subtracted data are available at www.samsi.info/200304/dmml/web-internal/bio/data/data_rsvd.xls.

Fig. 1.

(A) The unordered gene expression data matrix of set A. The rows correspond to the cell lines, and the columns correspond to the genes. In this image, we cannot see any clear patterns. (B) Outliers identified in the image of the residuals. The outliers are yellow (higher than expected) or blue (lower than expected). To be able to view the figure clearly, we selected only 60 columns (genes) to illustrate here.

Our goal is to cluster similar genes together and similar cell lines together simultaneously. Graphically, we hope to form blocks of reds and greens.

We used the ARF to fit a bilinear approximation to the log-transformed expression matrix after removing the row and column means. Using the resulting bilinear approximation to order the rows and the columns of X leads to the display of Fig. 2A. Note that visually the ordination has been highly successful in rearranging the matrix so as to give blocks of high and low values in the corners and in-between values in the remainder of the array.

Fig. 2.

(A) The first rSVD component from set A. Looking at the names of the ordered samples (rows) shows clear separation of liver and kidney samples from the other tissues. (B) The second rSVD component from set A. This shows a separation of prostate and colon from other tissues. Here, liver(n) represents normal liver samples, liver(m) represents malignant liver samples, colon(n) represents normal colon samples, and colon(m) represents malignant colon samples.

Next, we applied the segmentation algorithm to the row factors ri to segment the biological samples, and to the column factors cj to segment the genes. Tables 1 and 2 show the results of fitting various numbers of clusters. In an exploratory statistical analysis such as this, we do not need a rigorous answer to the question of the number of genuine clusters, but guidance comes from the variance explained by breaking the factors into two groups, three groups, four groups, etc. In Tables 1 and 2, the major explained variability is attained once five clusters are formed, and so we will use this as our working solution. Numbering the rows (samples) in their ri order, the optimal division into five segments breaks the samples into 1-18, 19-27, 28-63, 64-204, and 205-504. Similarly, based on Table 2, the optimal five-segment division of the genes is (using their cj ordering) genes 1-6, 7-33, 34-73, 74-204, and 205-224.

Table 1. The segmentation of tissues in component 1 for set A
Table 2. The segmentation of genes in component 1 for set A

Looking at the names of the ordered samples (rows) shows clear separation of liver and kidney samples from the other tissues; samples 1-18 are mostly normal liver samples, samples 19-27 are mostly malignant liver samples, and samples 28-63 are mostly kidney samples. Turning to the ordering of genes (columns), genes 1-6 (the first six genes in Fig. 2A) in gene lists for set A are enriched for genes that are involved in steroid hormone metabolism (UGT1A, UGT2B, and HSD). This is biologically consistent since the liver and kidney are the two organs predominantly involved in metabolism. Two of the genes in this group (UGT1A and UGT2B) have duplicate probes on the microarrays, and the SVD algorithm correctly clusters them together. This finding is to be expected since the duplicate probes, coding for the same gene, should display very similar expression profiles.

This data set happened to contain no missing values, so it was possible to analyze it with the conventional squared-norm singular value decomposition and to carry out a clustering paralleling that obtained by using the robust SVD (rSVD). The results, however, were far inferior. The conventional SVD performed poorly in identifying the genes relating to androgen/estrogen metabolism.

We tested the validity of our rSVD results by comparing the predicted liver and kidney enrichment of genes with independently generated data on the same genes from an unrelated, public-domain gene expression database (Gene Express Atlas, maintained by the Genome Institute of Novartis Research Foundation, ref. 7). This database contains gene expression profiles from 91 human and mouse samples and is generated on an earlier version of Affymetrix microarrays (compared with that used in the GeneLogic database). Table 3 summarizes the results obtained for the queried genes. In all cases, the gene expression is significantly higher in liver and kidney samples compared with the median gene expression across all tissues. Thus the patterns of gene-sample clusters identified by rSVD are substantiated in an independent data set (see Figs. 4-6, which are published as supporting information on the PNAS web site).

Table 3. Expression of selected genes from set A in the Gene Express Atlas database

Additionally, evidence from the literature helps explain some of the reasons the specific genes are found to be enriched in the liver samples. For example, mutations in the UGT1A gene are associated with a variety of liver-specific diseases such as Crigler-Najjar syndrome, Gilbert syndrome, familial transient neonatal hyperbilirubinemia, and cholelithiasis (8-10). Another gene found to be up-regulated in this cluster is hydroxysteroid dehydrogenase (HSD11). The gene product of HSD11 catalyzes the conversion of cortisol to cortisone and is an important regulator of glucocorticoid metabolism. The gene for hydroxysteroid (11 beta) B1 isoform was first isolated from rat liver (11). Intense immunoreactivity to HSD11B1 has been observed around the hepatic central vein, confirming that the protein expression is also high in liver (12). Studies with knockout mice demonstrate that HSD11B1 deficiency produces an improved metabolic profile characterized by increased lipid catabolism, increased hepatic insulin sensitivity, and reduced intracellular glucocorticoid concentrations (13). Another up-regulated gene, aldolase B, is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Aldolases A, B, and C are distinct proteins exhibiting developmentally regulated expression of the different isozymes. Aldolase B expression is observed only in adult human liver, kidney, and intestine. Significantly, our segmentation algorithm detected the right isoform of aldolase (aldolase B) for enriched liver expression. Deficiency in ALDOB function is related to hereditary fructose intolerance. The preferential expression of ALDOB in liver has been used in 31P magnetic resonance imaging studies to follow the metabolism of fructose in the liver of patients with this disorder (14).

Similarly, we then examine results from set B. Set B shows a clear separation of white blood cells from other tissue types as shown in the ordered samples based on the first rSVD component (see Fig. 7, which is published as supporting information on the PNAS web site). The corresponding gene segmentation shows a subset of genes (records 211-224; see Tables 5 and 6, which are published as supporting information on the PNAS web site) that show preferential expression in white blood cells. This list is enriched for genes that are involved in apoptosis (programmed cell death).

We then investigate the tissue expression of the apoptosis-specific genes in the Gene Express Atlas database. As shown in Table 4, the majority of the genes identified by the rSVD algorithm indeed show very high levels of expression in white blood cells, again providing excellent validation of our results.

Table 4. Expression of selected genes from set B in the Gene Express Atlas database

The enrichment of proapoptotic genes in white blood cells is consistent with the biology of white blood cells. Apoptosis plays an essential role in immune system homeostasis. The vertebrate immune system uses apoptosis to control cell number, delete lymphocytes with inoperative or autoreactive receptors from its repertoire, and reverse clonal expansion at the end of an immune response. The programmed removal of lymphocytes in response to cellular stress or injury or genetic errors serves to preserve genomic integrity and constitutes an important mechanism of tumor surveillance. Given the crucial role of apoptosis in such a diverse array of physiologic functions, aberrations of this process underlie a host of immune disorders. Genetic aberrations that render cells incapable of executing their suicide program promote tumorigenesis and underlie the observed resistance of lymphoid cancers to genotoxic anticancer agents (15). In addition to immune homeostasis, T and B cell lymphocytes and neutrophils undergo programmed cell death under a wide variety of physiological and drug-induced conditions such as immunosuppression by polycyclic aromatic hydrocarbons, perturbations of redox states, tissue repair, treatment of Crohn's disease, immune evasion in renal cell carcinoma, and tumor necrosis factor α-induced effects in aging, to name a few (16-21).

Finding Additional Structure

The bilinear fit produced by the ARF does not necessarily exhaust all structure in the matrix X. As with the conventional, nonrobust least-squares SVD, we can remove the first bilinear fit from X to get the initial residual matrix (xij - ricj) and apply the ARF to this matrix to get a second pair of matching row and column factors, which may be segmented, just as were the leading pair.
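The peeling-off of successive components can be sketched as a short loop around any rank-one fitter. In this illustration the fitter is a least-squares stand-in (`ls_rank1`, the leading SVD term), used only to keep the example self-contained; a robust fitter with the same return convention would be substituted for the ARF. Both function names and the random test matrix are assumptions.

```python
import numpy as np

def ls_rank1(X):
    """Least-squares stand-in for a rank-one fitter; returns (row factors, column factors)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0], Vt[0]

def extract_components(X, fit_rank1, m):
    """Fit m bilinear terms in sequence, each on the residuals left by the previous ones."""
    resid = X.copy()
    R, C = [], []
    for _ in range(m):
        r, c = fit_rank1(resid)            # fit one bilinear term r_i * c_j
        R.append(r)
        C.append(c)
        resid = resid - np.outer(r, c)     # residual matrix x_ij - r_i c_j
    # Column m of R holds r_im, column m of C holds c_jm.
    return np.column_stack(R), np.column_stack(C), resid

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 6))
R, C, resid = extract_components(X, ls_rank1, 2)
```

Each pair of columns of `R` and `C` gives its own ordering of samples and genes, so every component can be segmented separately.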

Doing so does indeed uncover additional biologically meaningful structure, as shown in Fig. 2B for set A. The segmentation algorithm suggests six segments for the cell lines: 1-76, 77-195, 196-333, 334-391, 392-462, and 463-504. The cell lines of the first segment are mostly prostate samples, and the last two segments are mostly colon samples. Corresponding to this cell line segmentation, we divide the genes into five segments, genes 1-11, 12-65, 66-176, 177-219, and 220-224 (see Tables 7 and 8, which are published as supporting information on the PNAS web site).

As we can see, the two components for set A give two substantially different orderings of genes and samples and represent different aspects of the gene expression data. This richness of interpretation cannot be achieved by one single ordering of genes and samples.

We could continue this process. Subtracting this second bilinear term and repeating the ARF on the residuals gives a third component and could be repeated for a fourth and so on. The number of components that we should study depends on the number of significant eigenvalues. For set A, the scree plot of the eigenvalues suggests that two components (see Fig. 3A) capture all of the structure of the array, so we stop at two components.

Fig. 3.

(A) The plot of the eigenvalues for set A. This plot suggests that we keep the first two components. (B) The quantile-quantile plot of the residuals for set A. As shown, the residuals follow a heavy tailed distribution.

Finding Outliers, Filling In Estimates of Missing Values, and Smoothing

A strength of our method is that it does not require complete information and is not affected by a minority of outliers. Outliers can be identified automatically by looking at the final residuals after removal of the structural components. A simple outlier model might be that most of the residuals follow a normal distribution, but that some small number are “wild.” A probability plot shows that, rather than this simple two-category model, the residuals follow a heavy-tailed distribution (Fig. 3B). If we wish, we can flag those readings that seem particularly anomalous. For example, in normal data 1.5 times the median absolute deviation (MAD) of the residuals gives a robust estimate of the standard deviation. Thus residuals more than six times MAD (a cutoff equivalent to four standard deviations) should be extremely rare. In the actual data, however, some 2% of residuals are beyond six times MAD. This is a red-flag warning against the use of nonrobust methods (22).
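The six-times-MAD rule is easy to apply to a residual matrix. The sketch below is illustrative only: the function name `flag_outliers`, the choice to measure deviations from the median residual, and the small example matrix are assumptions. For reference, the exact normal-theory constant is about 1.4826 (rounded to 1.5 in the text), and a normal variate falls more than four standard deviations from its mean with probability of roughly 6 in 100,000.

```python
import numpy as np

def flag_outliers(resid, k=6.0):
    """Flag residuals lying more than k times the MAD away from the median residual.

    NaN entries (missing cells) are ignored when computing the MAD and are never flagged.
    """
    observed = resid[~np.isnan(resid)]
    med = np.median(observed)
    mad = np.median(np.abs(observed - med))   # median absolute deviation
    flags = np.abs(resid - med) > k * mad     # comparisons with NaN evaluate to False
    return flags, mad

# Illustrative residuals after removing the structural components.
resid = np.array([[0.10, -0.20, 0.05],
                  [0.00,  8.00, -0.10],
                  [0.15, -0.05, np.nan]])
flags, mad = flag_outliers(resid)
share_flagged = flags.sum() / np.sum(~np.isnan(resid))   # fraction of observed cells flagged
```

Comparing `share_flagged` with the tiny fraction expected under normality gives the same kind of check as the 2% figure quoted above.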

Enriching notation slightly, write $r_{im}$ and $c_{jm}$ for the row and column factors given in the mth bilinear pair fitted. Then for any missing cell ij, we could predict the missing value by $\sum_m r_{im} c_{jm}$. Another possible use of the ARF fits is to replace the entire matrix by the rank-m approximation given by using this missing value fill-in for all cells for purposes of other statistical analyses or displays. The potential attraction of this approach is that it would largely remove the impact of outlier cells as well as avoiding gaps in the matrix.
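Both uses of the fitted factors amount to one matrix product. In this sketch the row factors are stored as the columns of an array `R` and the column factors as the columns of `C`; these names, the function `fill_in`, and the tiny example are assumptions for illustration.

```python
import numpy as np

def fill_in(X, R, C):
    """Return (X with missing cells predicted, full rank-m approximation).

    R is n x m with entries r_im, C is p x m with entries c_jm; NaN marks a missing cell.
    """
    fit = R @ C.T                              # entry (i, j) is sum over m of r_im * c_jm
    filled = np.where(np.isnan(X), fit, X)     # keep observed cells, predict missing ones
    return filled, fit

# Tiny example with a single bilinear pair (m = 1).
R = np.array([[1.0], [2.0]])
C = np.array([[3.0], [4.0]])
X = np.array([[3.0, np.nan],
              [6.5, 8.0]])
filled, smooth = fill_in(X, R, C)
```

Here `filled` keeps the observed 6.5, whereas `smooth` replaces it with the fitted value 6.0; the second form is the one that damps outlier cells.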

Discussion

We have proposed an analysis based on a variant of the SVD that is largely impervious to outliers and missing information. This can be used for ordination and display of the microarray and also for segmentation.

The microarray example illustrates the usefulness of this method, where the cell lines of the same origin are grouped together, and some genes found are confirmed by previous literature. The outlier detection points out some possible outliers. They may be experimental mistakes or specific gene actions that deserve further study.

This microarray example was chosen to verify that the techniques worked on a reasonably well-understood data set. In addition to this example, these methods have been used successfully on other public-domain and proprietary data sets.

Acknowledgments

We thank Alan Karr and Jerry Sacks for helpful discussions.

Footnotes

    • † To whom correspondence should be sent at present address: Aventis Pharmaceuticals, Bridgewater, NJ 08807. E-mail: li.liu{at}aventis.com.

    • This paper was submitted directly (Track II) to the PNAS office.

    • Abbreviations: SVD, singular value decomposition; rSVD, robust SVD; ARF, alternating robust fitting.

    • Received May 22, 2003.
    • Accepted July 2, 2003.
    • Copyright © 2003, The National Academy of Sciences

    References

    1. ↵
      Healy, M. J. R. (1986) Matrices for Statisticians (Clarendon, Oxford), pp. 64-66.
    2. ↵
      Gabriel, K. R. & Zamir, S. (1979) Technometrics 21, 489-498.
      OpenUrlCrossRef
    3. ↵
      Croux, C., Filzmoser, P., Pison, G. & Rousseeum, P. J. (2002) Stat. Comput. 13, 23-36.
      OpenUrl
    4. ↵
      Ukkelberg, A. & Borgen, O. (1993) Anal. Chim. Acta 277, 489-494.
      OpenUrlCrossRef
    5. ↵
      Venter, J. H. & Steel, S. J. (1996) Comput. Stat. Data Anal. 22, 481-504.
      OpenUrlCrossRef
    6. ↵
      Hawkins, D. M. (2000) Comp. Stat. Data Anal. 37, 323-341.
      OpenUrl
    7. ↵
      Su, A. I., Cooke, M. P., Ching, K. A., Hakak, Y., Walker, J. R., Wiltshire, T., Orth, A. P., Vega, R. G., Sapinoso, L. M., Moqrich, A., et al. (2002) Proc. Natl. Acad. Sci. USA 99, 4465-4470.pmid:11904358
      OpenUrlAbstract/FREE Full Text
    8. ↵
      Sappal, B. S., Ghosh, S. S., Shneider, B., Kadakol, A., Chowdhury, J. R. & Chowdhury, N. R. (2002) Mol. Genet. Metab. 75, 134-142.pmid:11855932
      OpenUrlCrossRefPubMed
    9. Huang, C. S., Chang, P. F., Huang, M. J., Chen, E. S. & Chen, W. C. (2002) Gastroenterology 123, 127-133. PMID: 12105841.
    10. Passon, R. G., Howard, T. A., Zimmerman, S. A., Schultz, W. H. & Ware, R. E. (2001) J. Pediatr. Hematol. Oncol. 23, 448-451. PMID: 11878580.
    11. Agarwal, A. K., Rogerson, F. M., Mune, T. & White, P. C. (1989) J. Biol. Chem. 264, 18939-18943. PMID: 2808402.
    12. Ricketts, M. L., Verhaeg, J. M., Bujalska, I., Howie, A. J., Rainey, W. E. & Stewart, P. M. (1998) J. Clin. Endocrinol. Metab. 83, 1325-1335. PMID: 9543163.
    13. Morton, N. M., Holmes, M. C., Fievet, C., Staels, B., Tailleux, A., Mullins, J. J. & Seckl, J. R. (2001) J. Biol. Chem. 276, 41293-41300. PMID: 11546766.
    14. Oberhaensli, R. D., Rajagopalan, B., Taylor, D. J., Radda, G. K., Collins, J. E., Leonard, J. V., Schwarz, H. & Herschkowitz, N. (1987) Lancet II, 931-934.
    15. Ravi, R. & Bedi, A. (2002) Curr. Opin. Oncol. 14, 490-503. PMID: 12192267.
    16. Novosad, J., Fiala, Z., Borska, L. & Krejsek, J. (2002) Acta Med. 45, 123-128.
    17. Chlichlia, K., Los, M., Schulze-Osthoff, K., Gazzolo, L., Schirrmacher, V. & Khazaie, K. (2002) Antioxidants Redox Signaling 4, 471-477. PMID: 12215214.
    18. Sylvia, C. J. (2003) J. Wound Care 12, 13-16. PMID: 12572231.
    19. Van Den Brande, J. M., Peppelenbosch, M. P. & Van Deventer, S. J. (2002) Ann. N.Y. Acad. Sci. 973, 166-180. PMID: 12485856.
    20. Ng, C. S., Novick, A. C., Tannenbaum, C. S., Bukowski, R. M. & Finke, J. H. (2002) Urology 59, 9-14.
    21. Gupta, S. (2002) Exp. Gerontol. 37, 293-299. PMID: 11772515.
    22. Huber, P. J. (1981) Robust Statistics (Wiley, New York).
    Robust singular value decomposition analysis of microarray data
    Li Liu, Douglas M. Hawkins, Sujoy Ghosh, S. Stanley Young
    Proceedings of the National Academy of Sciences Nov 2003, 100 (23) 13167-13172; DOI: 10.1073/pnas.1733249100


    Copyright © 2019 National Academy of Sciences. Online ISSN 1091-6490