A simple and exact Laplacian clustering of complex networking phenomena: Application to gene expression profiles

Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved January 25, 2008 (received for review September 11, 2007)
Abstract
Unraveling the unified networking characteristics of complex networking phenomena is of great interest yet a formidable task. There is currently no simple strategy with a rigorous framework. Using an analogy to the exact algebraic properties of the transition matrix of a master equation in statistical physics, we propose a method based on a Laplacian matrix for the discovery and prediction of new classes in unsupervised complex networking phenomena, where the class of each sample is completely unknown. Using this Laplacian approach, we can simultaneously discover different classes and determine the identity of each class. Through an illustrative test of the Laplacian approach applied to real datasets of gene expression profiles, leukemia data [Golub TR, et al. (1999) Science 286:531–537] and lymphoma data [Alizadeh AA, et al. (2000) Nature 403:503–511], we demonstrate that this approach is accurate and robust, with a mathematical and physical realization. It offers a general framework for characterizing any kind of complex networking phenomenon in broad areas, irrespective of whether it is supervised or unsupervised.
Uncovering the common essence of complex networking phenomena occurring across the broad range of nature, for example social networks (1, 2), biological networks, metabolic networks (3–7), the Internet, disordered networks (8, 9), the stock market (10), etc., is an important endeavor toward a unified description of various networking phenomena. Although developing such a unified mathematical or numerical framework is nontrivial and highly challenging, graph theory and its application to networks have facilitated the investigation of the underlying characteristics of various complex networking phenomena (11). Among many interesting applications, of particular importance to our lives is the use of complex network analyses in the treatment of cancer (3, 4). The number of studies involving gene expression profiles based on DNA microarray technology in biomedical laboratories is increasing (12–15). The identification of new tumor classes using gene expression profiles is very important for the successful and efficient treatment of cancer, especially when the number of different classes of tumors is unknown. To achieve this goal, clustering is the usual statistical tool in the analysis of a complex network (16–18). Cluster analysis partitions a set of objects into groups, or clusters, such that the objects within a cluster are as similar as possible, sharing common characteristics. Among many clustering methods, K-means clustering (12) and self-organizing maps (19) are widely used. However, clustering conditions such as the number of different classes K and the number of grids are unknown a priori and must be predetermined or chosen artificially. Therefore, an arbitrary specification of clustering conditions at the beginning of the analysis of a complex network may easily mislead researchers about the essential clustering nature of the underlying complexity and, moreover, hamper the correct interpretation of the associated complex phenomenon.
This is a serious obstacle for the exact analysis and success of new class discovery and prediction.
To build a logical and general strategy for the clustering of complex networking phenomena on a rigorous mathematical and physical footing, we propose a method that relies not on ad hoc assumptions or iterative calculations (13, 14) drawn from experience, but on the simple and exact algebraic properties of a Laplacian matrix (11, 20) for correlated complex networking phenomena represented by complex networks. Our method eliminates fundamental difficulties found in existing clustering methods by both estimating the number of different classes in a data-driven manner and classifying each sample simultaneously. The eigenvector for the nonzero smallest eigenvalue of a Laplacian matrix is called the Fiedler vector (21, 22). This vector has been used for several graph manipulations such as partitioning (23, 24), linear labeling (25), and envelope minimization (11, 20, 25, 26). In this study, after identifying an exact analogy between a Laplacian matrix and the transition matrix of a master equation in nonequilibrium statistical physics, we propose a Laplacian clustering method based on the unique character of the Fiedler vector for the discovery and prediction of new classes in complex networking phenomena. The idea behind the proposed method is simple, exact, and robust, so it can be applied broadly, in the same spirit, to any kind of complex networking phenomenon, irrespective of whether it is supervised or unsupervised. We illustrate the practical application of this Laplacian clustering method to gene expression profiles of tumors.
Results and Discussion
The Laplacian Matrix and Motivation for Clustering.
A graph G = G(V, E) consists of a set of vertices V and a set of edges E. Two vertices v_{i} and v_{j} of a graph G are said to be adjacent if there exists an edge connecting v_{i} and v_{j} with nonzero weight e_{ij}. The adjacency matrix A = (a_{ij}) of a graph G with n vertices is defined as an n × n symmetric matrix with components a_{ij} = e_{ij}, or a_{ij} = 0 if there is no connecting edge, where the diagonal elements a_{jj} are equal to zero for all j = 1, 2, …, n. The Laplacian matrix of a graph G is defined as L = D − A, where D, called the degree matrix, is a diagonal matrix with the jth diagonal element d_{jj} = Σ_{i=1} ^{n} a_{ij} (11, 18, 20).
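The construction above takes only a few lines of numerical code. Below is a minimal sketch on a hypothetical 4-vertex weighted graph (the edge weights are made up for illustration):

```python
import numpy as np

# Hypothetical symmetric adjacency matrix A = (a_ij) of a 4-vertex
# weighted graph: a_ij = e_ij for connected pairs, 0 otherwise,
# with zero diagonal.
A = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.7, 0.2],
              [0.8, 0.7, 0.0, 0.0],
              [0.1, 0.2, 0.0, 0.0]])

D = np.diag(A.sum(axis=0))  # degree matrix: d_jj = sum_i a_ij
L = D - A                   # Laplacian matrix L = D - A

# By construction, every row (and column) of L sums to zero.
print(np.allclose(L.sum(axis=1), 0.0))  # True
```

The zero row sums are exactly the property exploited in the master-equation analogy discussed later.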
We assume that there are n samples x _{1}, …, x _{n}, where x _{i} is a vector of dimension g. For example, in the gene expression matrix, x _{i} denotes the expression of g genes for the ith sample. To construct the Laplacian matrix for clustering, we first define the adjacency matrix from some similarity measure between x _{1}, …, x _{n}. For example, for two vectors x and y, the Euclidean distance [{Σ_{i=1} ^{g}(x_{i} − y_{i} )^{2}}^{1/2}] and the Manhattan distance (Σ_{i=1} ^{g} |x_{i} − y_{i}|) are often used, among others (16–18). In general, the component a_{ij} of the adjacency matrix should reveal the closeness or degree of connectivity between x _{i} and x _{j}. The goal is to partition the n samples into an arbitrary number of groups such that samples belonging to the same group have higher correlation or stronger connectivity, sharing common characteristics, than samples in other groups.
Let z = (z _{1}, …, z_{n} ) ^{T} be an unknown argument, where T denotes the transpose of a vector or a matrix and z_{i} contains information on the group to which the ith sample belongs, i.e., if z_{i} = z_{j}, then x _{i} and x _{j} belong to the same group. Therefore, the classification of the n samples corresponds to obtaining the solution z. This goal can be achieved by minimizing the weighted sum of squares Q = (1/2) Σ_{i=1} ^{n} Σ_{j=1} ^{n} a_{ij}(z_{i} − z_{j})^{2}. After we set S = Σ_{i=1} ^{n} z_{i} and R = Σ_{i=1} ^{n} z _{i} ^{2}, we impose two constraints: R = 1, to avoid the trivial solution z_{i} = 0 for all i, and S = 0, to keep the minimum of Q invariant. We take advantage of the relation Q = z ^{T} Lz. We use the Lagrangian multiplier method to minimize Q subject to R = 1 with a Lagrangian multiplier λ and then obtain the eigenvalue equation Lz = λz. This equation yields a nontrivial solution z if and only if λ is an eigenvalue of L, and z is the corresponding eigenvector. Multiplying both sides of Lz = λz by z ^{T} gives z ^{T} Lz = λ. Therefore, the nonzero smallest eigenvalue and its associated eigenvector (the Fiedler vector) yield the optimal solution (11, 20–24).
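The optimization above is easy to verify numerically. The sketch below uses a hypothetical 4-vertex graph made of two tightly linked pairs joined by weak edges; it checks that the Fiedler vector satisfies the constraints R = 1 and S = 0, that Q = z^T Lz equals the nonzero smallest eigenvalue, and that the signs of its entries recover the two groups:

```python
import numpy as np

# Hypothetical graph: strong edges inside the pairs {0,1} and {2,3},
# weak edges across them.
A = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
L = np.diag(A.sum(axis=0)) - A

vals, vecs = np.linalg.eigh(L)  # eigenvalues ascending; L is symmetric
z = vecs[:, 1]                  # Fiedler vector: nonzero smallest eigenvalue

# Weighted sum of squares Q = (1/2) sum_ij a_ij (z_i - z_j)^2
Q = 0.5 * sum(A[i, j] * (z[i] - z[j]) ** 2
              for i in range(4) for j in range(4))

print(np.isclose(z @ z, 1.0))    # R = 1: eigh returns unit eigenvectors
print(np.isclose(z.sum(), 0.0))  # S = 0: orthogonal to the constant vector
print(np.isclose(Q, vals[1]))    # Q = z^T L z = lambda
# The sign pattern of z separates {0,1} from {2,3}.
```

For a connected graph, the zero eigenvalue belongs to the constant eigenvector, so S = 0 follows automatically from the orthogonality of eigenvectors of a symmetric matrix.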
Analogy of the Laplacian Matrix with a Transition Matrix in a Master Equation.
Because the Laplacian matrix is constructed such that the sum of elements in each row vector is always zero, it provides an exact analogy with the transition matrix of a master equation in nonequilibrium statistical physics (27, 28). Let us consider a particle moving on a network with n vertices. The motion of a particle on this network can be described as hopping between adjacent vertices. Assume that the hopping probability of this particle from site j to site i is given by m_{ij}, where M = (m_{ij} ) is a transition matrix, and that p(t) = (p _{1}, p _{2}, …, p_{n} ) is the probability vector of finding the particle on the vertices (1, 2, …, n) at time t. Then the time evolution of the probability vector p(t) satisfies a master equation d p(t)/dt = −Mp(t) with the detailed balance condition, where m_{jj} = −Σ_{i≠j} ^{n} m_{ij} such that Σ_{i=1} ^{n} m_{ij} = 0 for each column j. This master equation describes how a system that starts from a nonequilibrium state evolves into an equilibrium state in the asymptotic time limit. The time evolution (relaxation rate) toward the equilibrium stationary state is governed by the eigenvalues of the transition matrix M, and the relaxation modes are determined by the eigenvectors of M. The exact algebraic properties of the solution of a master equation (27, 28) are, in particular, that (i) there is always one zero eigenvalue, whose eigenvector describes the equilibrium (stationary) probability of finding the particle on the vertices of the network, and (ii) the sum of the eigenvector elements for each of the nonzero eigenvalues is always zero; these eigenvectors govern the relaxation modes of the probability toward the equilibrium one. Most importantly, the nonzero smallest eigenvalue and its eigenvector dictate the dominant mode of the time evolution (relaxation) of p(t) to equilibrium, with the longest relaxation time.
Bearing in mind the exact properties of eigenvalues and eigenvectors of a master equation, it is important to recognize that the Laplacian matrix L in our setting for a cluster analysis and a transition matrix M in a master equation are constructed in the same way. Therefore, the Fiedler vector of the Laplacian matrix L shares the same exact property of the eigenvector for the nonzero smallest eigenvalue of a transition matrix M (21, 22, 27, 28).
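These algebraic properties can be checked directly on a small example. The sketch below builds a transition matrix M for a particle hopping on a hypothetical 3-vertex chain with unit rates, constructed exactly as a Laplacian is, and verifies properties (i) and (ii):

```python
import numpy as np

# Hypothetical symmetric hopping rates on a 3-vertex chain: 0 - 1 - 2.
rates = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
# Transition matrix of dp/dt = -M p, built like a Laplacian:
# m_jj = -sum_{i != j} m_ij, so each column of M sums to zero.
M = np.diag(rates.sum(axis=0)) - rates

vals, vecs = np.linalg.eigh(M)
print(np.isclose(vals[0], 0.0))           # (i) one zero eigenvalue
print(np.isclose(vecs[:, 1].sum(), 0.0))  # (ii) zero-sum relaxation modes
print(np.isclose(vecs[:, 2].sum(), 0.0))
# The smallest nonzero eigenvalue vals[1] sets the slowest relaxation,
# with the longest relaxation time 1/vals[1].
```

The eigenvector of the zero eigenvalue is the uniform stationary state, and the Fiedler-type eigenvector of vals[1] is precisely the slowest relaxation mode invoked in the text.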
A Strategy for Cluster Discovery and Prediction.
Here, we briefly illustrate the flow of the clustering strategy using a Laplacian matrix constructed from gene expression profiles. We first construct an n × n adjacency matrix A = (a_{ij} ) from the g × n gene expression matrix X = (x_{ji} ), where j = 1, …, g; i = 1, …, n, and x_{ji} is the gene expression level of the jth gene for the ith sample. After we eliminate noise in the raw data a_{ij} by a thresholding procedure, we define a Laplacian matrix L. Then, the n elements (samples) of the Fiedler vector of L are readily grouped into K groups, where K is the number of distinct clusters. A few essential genes playing a significant role in clustering are selected based on the F-test, following Dudoit et al. (29) and Lee and Lee (30). Based on these essential genes and the n samples, we reconstruct a new Laplacian matrix and estimate the number K, where m out of n samples are classified into their corresponding clusters by the unique character of the Fiedler vector of the new Laplacian matrix. We then predict the classes of the n − m unclassified samples in such a way that each unclassified sample joins the class with which it has the highest correlation. (For the details of the clustering strategy, see Materials and Methods.)
Illustrative Examples of Cluster Discovery and Prediction.
A. Leukemia data.
Golub et al. (3) suggested gene expression monitoring for the classification of two types of acute leukemia, namely acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). This dataset includes 6,817 genes for the 38-sample training set (patients) (27 ALL, 11 AML) and the 34-sample test set. The raw data are available at www.genome.wi.mit.edu/MPR. Golub et al. (3) proposed "a weighted gene voting scheme," which turned out to be a variant of quadratic discriminant analysis, and applied it to the training data based on 50 informative genes. The number of correct decisions was 36 of the 38 training samples and 29 of the 34 test samples. ALL can be further divided into finer subclasses, such as B cell and T cell ALL; this problem can then also be regarded as a three-class (ALL-B/ALL-T/AML) problem. Moreover, Lee and Lee (30) argued, by inspecting the 20 top-ranked genes, that gene expression patterns in ALL-B are much closer to those in AML than to those in ALL-T. Therefore, the leukemia data can be classified into three different types.
The raw microarray data contain many abnormal samples or outliers, so it is conventional to perform data preprocessing before any further downstream analysis. Here, we perform the following preprocessing, as done in Dudoit et al. (29): (i) thresholding, with a floor of 100 and a ceiling of 16,000; (ii) filtering, with exclusion of genes with max/min ≤ 5 or max − min ≤ 500, where max and min refer, respectively, to the maximum and minimum intensities for a particular gene across the samples; and (iii) a base-10 logarithmic transformation. This preprocessing retained 3,571 of the 6,817 genes. Herein, we deal with a 3,571 × 72 gene expression matrix.
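The three preprocessing steps can be sketched as follows. The toy intensity matrix and the function name are hypothetical, but the floor/ceiling and filter cutoffs are those of Dudoit et al. (29) quoted above:

```python
import numpy as np

def preprocess(X, floor=100.0, ceiling=16000.0):
    """(i) threshold intensities to [floor, ceiling]; (ii) drop genes with
    max/min <= 5 or max - min <= 500 across samples; (iii) log10-transform."""
    X = np.clip(X, floor, ceiling)              # (i) thresholding
    mx, mn = X.max(axis=1), X.min(axis=1)
    keep = (mx / mn > 5.0) & (mx - mn > 500.0)  # (ii) filtering
    return np.log10(X[keep]), keep              # (iii) log transform

# Toy 3-genes x 4-samples matrix: only the middle gene varies enough
# across samples to survive the filter.
X = np.array([[120.0, 130.0, 110.0, 125.0],
              [150.0, 9000.0, 200.0, 4000.0],
              [20000.0, 18000.0, 17000.0, 16500.0]])
Xp, keep = preprocess(X)
print(keep.tolist())  # [False, True, False]
```

Note that the third gene is removed because clipping at the ceiling makes it constant, so max/min = 1.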
Clustering in an unsupervised setup.
Here, we assume that the leukemia data are unsupervised—i.e., the number of different tumor classes and the class of each sample are completely unknown. We apply our proposed method to all 72 samples and demonstrate how well our Laplacian clustering method discovers and predicts the tumor class of each sample.
First, we construct a 3,571 × 72 gene expression matrix X with normalized elements and obtain a 72 × 72 adjacency matrix A = X ^{T} X − I. The criteria of WCC and ANC, defined in Materials and Methods, give the optimal value of δ such that δ = 0.800 for K = 2 and δ = 0.818 for K = 3. We note that δ = 0.800 for the K = 2 case corresponds to the Type B classification and δ = 0.818 for the K = 3 case corresponds to the Type C classification, respectively. The elements of each class are the same as those of Golub et al. (3) except for a couple of misclassifications. Here, the number of selected essential genes is 50. We also tried 100 and 200 essential genes, which gave similar results (δ = 0.790–0.802 for K = 2 and δ = 0.812–0.820 for K = 3, respectively). For the K = 2 case (Type B), only two samples (1 and 2) of the 72 samples are misclassified, and for the K = 3 case (Type C), only two samples (1 and 34) are misclassified. Clustering results based on our proposed method are as good as those of other clustering methods, such as K-means or SOM, for which the number of different classes had to be predetermined for a reasonable cluster analysis. One thing to note is that the first sample is misclassified in both cases. This sample is also misclassified in the approach of Golub et al. (3). This patient seems to be either an outlier or incorrectly reported, so further examination or follow-up on that patient is required. Fig. 1 shows heat maps of the adjacency matrix for K = 2 (Type B) and K = 3 (Type C), respectively, based on 100 essential genes. We clearly observe two groups in Fig. 1 a and three groups in Fig. 1 b, whose detailed classification in terms of a connecting network is also shown in Fig. 2.
Classification in a supervised setup.
Here, we assume that the leukemia data are supervised—i.e., the class of each sample in the training data is known. Using the information in the training dataset, we predict the class of each sample in the test dataset. There are no widely accepted guidelines for choosing the relative sizes of the training and test sets. We choose a 2:1 cross-validation (CV) scheme (one-third of the dataset is assigned to the test set), as done by Dudoit et al. (29). Specifically, we choose the 24 samples numbered 7–9, 27–34, and 60–72 of the 72 samples. After applying the proposed method to the 48 training samples, we obtain classification results for the 24 test samples in three different types. For better validation, we perform LOOCV (leave-one-out cross-validation), where only one sample is used as the test dataset and the others are used as the training dataset. Both results are summarized in Table 1. The classification results are very good, and the number of selected genes does not seriously affect them.
B. Lymphoma data.
The lymphoma dataset (4) consists of gene expression profiles in the three most prevalent adult lymphoid malignancies: diffuse large B cell lymphoma (DLBCL), follicular lymphoma (FL), and B cell chronic lymphocytic leukemia (BCLL). The original gene expression matrix consists of p = 4,682 genes and n = 81 samples. Here, we take n = 62 samples: 42 samples of DLBCL, 9 samples of FL, and 11 samples of BCLL. For convenience, we label each sample in each class as DLBCL (1–42), FL (43–51), and BCLL (52–62).
Clustering in an unsupervised setup.
Assume that the lymphoma data are unsupervised—i.e., we do not know a priori the number of different classes or the class of each sample. Using the same steps applied to the leukemia data, we obtained two types of classification: (i) two classes, DLBCL and FL/BCLL (samples 1–42 and 43–62), with one misclassified sample (sample 42); and (ii) three classes, DLBCL, FL, and BCLL (samples 1–42, 43–51, and 52–62), with three misclassified samples (samples 1, 41, and 42). Hence, the first type of classification does not distinguish samples in the FL and BCLL classes. To see whether samples in FL and BCLL are truly indistinguishable, we applied our Laplacian matrix method to the 20 samples in FL and BCLL (43–62) and obtained two classes (43–51 and 52–62) without any misclassified samples. Therefore, we may conclude that the lymphoma data consist primarily of two classes (DLBCL and FL/BCLL) and that FL and BCLL are secondary classes—i.e., samples in FL and BCLL are closer to each other than to samples in DLBCL. Heat maps of the adjacency matrix and connection networks are shown in supporting information (SI) Figs. 3–6.
Classification in a supervised setup.
When we apply our proposed method to the lymphoma data in a supervised setup, it gives two classes, class 1 (1–42) and class 2 (43–62), with no misclassification, even though the given number of classes was three (1–42, 43–51, and 52–62). As in the unsupervised setup, we applied the same method to the samples in class 2 (43–62) and obtained two classes, 43–51 and 52–62, with one misclassified sample. Thus, in the supervised setup, we obtained the same results and the same interpretations as in the unsupervised setup.
C. Comparison with other methods.
We have noted other clustering methods, such as the coupled two-way clustering (CTWC) analysis (13, 14) and the stochastic dynamic model (10), which share a clustering objective similar to ours. CTWC is an iterative clustering process that looks for pairs of relatively small subsets of samples and genes, because the "signal" may be masked by the "noise" generated by the uncorrelated data. CTWC can be performed with any clustering method, but the superparamagnetic clustering (SPC) algorithm (31, 32) is especially suitable for the analysis of gene microarray data. The input for SPC is a distance matrix that corresponds to the adjacency matrix A in the Laplacian clustering method. In SPC, the resolution of clustering is governed by a tunable parameter T, called temperature, which plays a role similar to that of the thresholding parameter δ in our method: T is chosen as the value above which the clusters remain stable, whereas our thresholding parameter δ is determined by the WCC and ANC criteria in a data-driven manner. The CTWC algorithm provides a broad list of stable gene and sample clusters, from which a meaningful process or interpretation of the identified clusters must be chosen, whereas the Laplacian algorithm gives an estimate of the number of different clusters and the corresponding members of each cluster. Despite some differences between the two methods, they gave similar results in the analysis of the leukemia data. For example, both methods successfully detected the ALL-B and ALL-T clusters, which can hardly be found by other conventional clustering methods.
Another relevant method is the stochastic model of coupled random walks for stock–stock correlations (10). This model consists of a system of n walks at g different times, which correspond to the n samples and g genes, respectively, in microarray data. This method is similar to the Laplacian clustering method in using the information contained in the eigenvectors corresponding to the principal eigenvalues of the correlation matrix. However, it differs from the Laplacian clustering method in using all of the information contained in the g times, whereas the Laplacian clustering method uses only the information contained in the essential genes. In addition, like the CTWC method, the stochastic model does not estimate the number of clusters.
Summary and Conclusion.
We proposed a Laplacian clustering method for the discovery and prediction of new classes in unsupervised complex networking phenomena, where the class of each sample is completely unknown. Our method is based on the analogy between the Laplacian matrix of correlated complex networking phenomena and the transition matrix of a master equation in nonequilibrium statistical physics, and on applying their exact algebraic properties. Our approach differs fundamentally from previous methods in that it can classify both unsupervised and supervised data without prior knowledge of the clustering conditions, and it discovers different classes and determines the identity of each class simultaneously. When we applied the proposed method to the leukemia data (3), it produced successful results that could hardly be achieved by existing clustering methods in the unsupervised setup. For the lymphoma data (4), we also obtained accurate results. Furthermore, we noticed the hierarchical property of our method—i.e., the lymphoma data consist primarily of two classes (DLBCL and FL/BCLL), and FL and BCLL are secondary classes.
In conclusion, the Laplacian clustering method is an excellent strategy for class discovery and prediction of unknown classes in unsupervised complex networking phenomena. The method is robust in the sense that the prediction results are not sensitive to the choice of the number of essential genes in the feature selection, and it is simple, with a firm mathematical basis and a physical realization. It offers a general framework for analyzing any kind of complex networking phenomenon across the broad range of life science, sociology, the Internet, disordered complex networks, etc., irrespective of whether the data are supervised or unsupervised.
Materials and Methods
Cluster Analysis by Using the Laplacian Matrix Method.
Recall that x_{ji} , j = 1, …, g; i = 1, …, n denotes the gene expression level of the jth gene for the ith sample, and let X = (x_{ji} ) be the g × n gene expression matrix. We describe the details of a clustering method for the unsupervised data in the following steps.
Step 1. Construction of a Laplacian matrix.
We normalize the gene expression level x_{ji} as x_{ji} ← (x_{ji} − x̄_{i})/s_{i}, where x̄_{i} = Σ_{j=1} ^{g} x_{ji}/g and s _{i} ^{2} = Σ_{j=1} ^{g}(x_{ji} − x̄_{i})^{2}. Let A = X ^{T} X − I be the adjacency matrix, which is the same as the correlation matrix except that all of the diagonal terms a_{ii} = 0, i = 1, …, n. To remove noise in the raw data, we modify the adjacency matrix as R = (r_{ij} ), r_{ij} = a_{ij} I(|a_{ij}| ≥ δ), for some δ > 0, called the thresholding parameter, where I(·) is the indicator function. Finally, we define a Laplacian matrix L = D − R, where D = (d_{ij} ) is the degree matrix with d_{jj} = Σ_{i=1} ^{n} r_{ij} and d_{ij} = 0 (i ≠ j).
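Step 1 can be sketched as follows; the toy expression matrix and function name are hypothetical, and the indicator I(|a_ij| ≥ δ) becomes a masked assignment:

```python
import numpy as np

def laplacian_from_profiles(X, delta):
    """Step 1 sketch: normalize each sample (column) of the g x n matrix X,
    form the adjacency A = Xn^T Xn - I, threshold |a_ij| at delta,
    and return L = D - R."""
    centered = X - X.mean(axis=0)
    Xn = centered / np.sqrt((centered ** 2).sum(axis=0))  # divide by s_i
    n = X.shape[1]
    A = Xn.T @ Xn - np.eye(n)                 # correlations, zero diagonal
    R = np.where(np.abs(A) >= delta, A, 0.0)  # r_ij = a_ij I(|a_ij| >= delta)
    return np.diag(R.sum(axis=0)) - R         # L = D - R

# Toy data: 5 genes x 4 samples; samples {0,1} track each other and
# samples {2,3} are their mirror image.
X = np.array([[1.0, 1.1, -1.0, -0.9],
              [2.0, 2.1, -2.0, -2.1],
              [0.0, 0.1, 0.0, -0.1],
              [-1.0, -1.0, 1.0, 1.1],
              [3.0, 2.9, -3.0, -3.0]])
L = laplacian_from_profiles(X, delta=0.8)
print(np.allclose(L.sum(axis=0), 0.0))  # column sums vanish by construction
```

The zero column sums are what make the analogy with the master-equation transition matrix exact.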
Step 2. Estimation of the number of classes.
The Fiedler vector of L is, in general, piecewise constant: its elements take a common value for samples belonging to the same connected group. Then the n samples are classified into K groups for a given δ; that is, B _{1} = (b _{11}, …, b _{1n_1}), …, B_{K} = (b _{K1}, …, b_{Kn_K} ), where n_{k} ≥ 2, k = 1, …, K, and B_{k} is the index set of the kth class. Note that m = Σ_{l=1} ^{K} n_{l} is the number of classified samples and n − m is the number of unclassified samples. We apply two criteria to determine a proper thresholding parameter δ. First, it is appropriate to select the δ that maximizes the within-class correlation (WCC), defined as the average of all possible correlations between two samples in the same class. Second, we select the δ that maximizes the average number of samples per class (ANC), ANC = m/K. Therefore, considering both WCC and ANC gives a moderate-sized K with relatively large membership in each class.
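The two criteria can be sketched as a small scoring function; the adjacency matrix and the candidate groupings below are hypothetical:

```python
import numpy as np

def wcc_anc(A, groups):
    """WCC: average of a_ij over all pairs within the same class.
    ANC: average number of classified samples per class, m / K."""
    within = [A[i, j]
              for B in groups
              for p, i in enumerate(B)
              for j in B[p + 1:]]
    m = sum(len(B) for B in groups)
    return float(np.mean(within)), m / len(groups)

# Toy adjacency: samples {0,1} and {2,3} are internally well correlated.
A = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.9],
              [0.1, 0.1, 0.9, 0.0]])
print(wcc_anc(A, [[0, 1], [2, 3]]))  # (0.9, 2.0): high WCC, K = 2
print(wcc_anc(A, [[0, 1, 2, 3]]))    # lower WCC but larger ANC: the
                                     # trade-off that fixes delta
```

In practice one scans a grid of δ values, groups samples by the Fiedler vector at each δ, and keeps the δ that balances the two scores.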
Step 3. Selection of the essential genes.
Even though the gene expression profiles consist of thousands of genes, most of them are uninformative for classification. Therefore, it is highly recommended to select a few essential genes before clustering. Assume that m samples are classified as B _{1}, B _{2}, …, B_{K} based on the δ chosen in Step 2. To test whether the mean intensities for each class at the jth gene are the same, we apply the F-test, defined as the ratio of the between-class sum of squares (BSS) and the within-class sum of squares (WSS), as used by Dudoit et al. (29) and Lee and Lee (30). Here, we selected 50, 100, and 200 essential genes; the clustering results were not seriously affected by the number of essential genes.
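The BSS/WSS ratio for a single gene can be sketched as below (the gene vectors and class labels are hypothetical); genes are then ranked by this ratio and the top 50–200 are kept:

```python
import numpy as np

def f_ratio(x, labels):
    """BSS/WSS statistic for one gene across K classes, in the spirit of
    Dudoit et al. (29): large values mean the class means differ strongly
    relative to the within-class spread."""
    grand = x.mean()
    bss = wss = 0.0
    for k in np.unique(labels):
        xk = x[labels == k]
        bss += len(xk) * (xk.mean() - grand) ** 2  # between-class part
        wss += ((xk - xk.mean()) ** 2).sum()       # within-class part
    return bss / wss

labels = np.array([0, 0, 0, 1, 1, 1])
informative = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.9])  # class means differ
flat = np.array([1.0, 5.0, 3.0, 1.1, 5.1, 2.9])         # no class signal
print(f_ratio(informative, labels) > f_ratio(flat, labels))  # True
```

Ranking all g genes by `f_ratio` and keeping the largest values is the selection step used before reconstructing the Laplacian.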
Step 4. Prediction for the unclassified samples.
We repeat Step 1 and Step 2 based on the essential genes selected in Step 3; then m of the n samples are classified into K classes. We then predict the classes of the n − m unclassified samples in the following way. Let r̄ _{j} ^{(k)} = Σ_{i∈B_k} r _{ij}/n_{k} be the average correlation between an unclassified sample j and the classified samples in class k; then the class of sample j is declared to be the class satisfying max _{k} r̄ _{j} ^{(k)}.
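Step 4 amounts to a nearest-class rule on average correlation. A minimal sketch, with a hypothetical thresholded adjacency R and class labels:

```python
import numpy as np

def predict_class(R, classes, j):
    """Assign unclassified sample j to the class k maximizing the average
    correlation r_bar_j^(k) over that class's already-classified members."""
    avg = {k: np.mean([R[i, j] for i in members])
           for k, members in classes.items()}
    return max(avg, key=avg.get)

# Toy thresholded adjacency for 5 samples; sample 4 is unclassified and
# correlates strongly with the members of class1.
R = np.array([[0.0, 0.9, 0.1, 0.1, 0.8],
              [0.9, 0.0, 0.1, 0.1, 0.7],
              [0.1, 0.1, 0.0, 0.9, 0.2],
              [0.1, 0.1, 0.9, 0.0, 0.1],
              [0.8, 0.7, 0.2, 0.1, 0.0]])
classes = {"class1": [0, 1], "class2": [2, 3]}
print(predict_class(R, classes, 4))  # class1
```

Here r̄_4^(class1) = 0.75 and r̄_4^(class2) = 0.15, so sample 4 is declared a member of class 1.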
Application to Classification of Supervised Data.
Because the supervised data already carry information on the class of each sample, the clustering strategy for the supervised data is the same as in the unsupervised case except for the determination of δ. To determine δ, we minimize the misclassification rate, defined as the proportion of incorrectly predicted samples in the training dataset.
Acknowledgments
We acknowledge the valuable help of E. Moon and W. Yu in the first stage of this work. We thank K. Han (Molecular Cancer Center/KRIBB), J. Bhak (Korea Bioinformatics Center/KRIBB), and S. Kim (Chemistry/Pusan Nat'l Univ.) for their careful reading of and critical feedback on the manuscript. This work was supported by the Korea Science and Engineering Foundation under National Research Laboratory Program M10433306J33310 (M.C. and I.C.) and Grant R142003002010000 (to C.K.).
Footnotes
 ^{§}To whom correspondence should be addressed. Email: chang@random.phys.pusan.ac.kr

Author contributions: C.K. and M.C. contributed equally to this work; C.K., M.C., and I.C. designed research; C.K., M.C., M.K., and I.C. performed research; C.K. and M.C. contributed new reagents/analytic tools; C.K., M.C., M.K., and I.C. analyzed data; and C.K. and I.C. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0708598105/DC1.
 © 2008 by The National Academy of Sciences of the USA
References

1. Amaral LAN, Scala A, Barthelemy M, Stanley HE
3. Golub TR, et al.
9. Chen Y, Lopez E, Havlin S, Stanley HE
10. Ma WJ, Hu CK, Amritkar RE
11. Mohar B, in Alavi Y, Chartrand G, Oellermann OR, Schwenk AJ, eds
12. Eisen MB, Spellman PT, Brown PO, Botstein D
14. Getz G, Levine E, Domany E
15. Speed T
16. Everitt B
17. Gordon A
18. Kaufman L, Rousseeuw P
19. Kohonen T
21. Fiedler M
22. Fiedler M
26. Barnard ST, Pothen A, Simon HD
30. Lee Y, Lee CK