# Closed-form density-based framework for automatic detection of cellular morphology changes

Edited by John W Sedat, University of California, San Francisco School of Medicine, San Francisco, CA, and approved March 28, 2012 (received for review October 30, 2011)

## Abstract

A primary method for studying cellular function is to examine cell morphology after a given manipulation. Fluorescent markers attached to proteins/intracellular structures of interest in conjunction with 3D fluorescent microscopy are frequently exploited for functional analysis. Despite the central role of morphology comparisons in cell biological approaches, few statistical tools are available that allow biological scientists without a high level of statistical training to quantify the similarity or difference of fluorescent images containing multifactorial information. We transform intracellular structures into kernels and develop a multivariate two-sample test that is nonparametric and asymptotically normal to directly and quantitatively compare cellular morphologies. The asymptotic normality bypasses the computationally intensive calculations used by the usual resampling techniques to compute the *P*-value. Because all parameters required for the statistical test are estimated directly from the data, it does not require any subjective decisions. Thus, we provide a black-box method for unbiased, automated comparison of cell morphology. We validate the performance of our test statistic for finite synthetic samples and experimental data. Employing our test for the comparison of the morphology of intracellular multivesicular bodies, we detect changes in their distribution after disruption of the cellular microtubule cytoskeleton with high statistical significance in fixed samples and live cell analysis. These results demonstrate that density-based comparison of multivariate image information is a powerful tool for automated detection of cell morphology changes. Moreover, the underlying mathematics of our test statistic is a general technique, which can be applied in situations where two data samples are compared.

Fluorescent markers attached to proteins of interest in conjunction with modern fluorescent microscopy technologies are a useful proxy for studying subcellular compartments and their behavior after a given manipulation. Treatment with chemical compounds or specific gene silencing by RNA interference is commonly used at scales ranging from individual experiments to high-throughput studies. Visual inspection by expert biologists has been performed for several decades, ranging from early studies by microscopists like Ramón y Cajal to contemporary large-scale, high-throughput screens (1–4). Although human observation may be very accurate, its three major drawbacks are that (*i*) it lacks quantitative measures, (*ii*) it may be biased, and (*iii*) it is time consuming.

The structural features of cells and the topological relationships between the numerous intracellular compartments give rise to multivariate data whose unbiased, automatic comparison is a major challenge. Importantly, alterations in cellular morphology also occur in many diseases, including cancer, requiring quantitative tools for their detection. Although the majority of functional cell biology is based on image comparison, few tools are available that allow an unbiased, automatic comparison of the multivariate data encoded in fluorescent images. The cytometric tools developed so far are based on the extraction of a variety of numerical features from images in combination with classification strategies (5–9). Features represent any measured property derived from the image, such as total/mean/standard deviation of fluorescence intensity, texture, Zernike shape descriptions, etc. (5, 10). Although feature-based approaches have proved to be very powerful in detection of morphological changes (11–14), they may suffer from a lack of biologically meaningful, human-interpretable measurements due to the acquisition of abstract numerical features and high-dimensional feature vector analysis. Furthermore, they require careful choice and calibration for each comparison (15). Current approaches also suffer from reduction of information as multidimensional information is transformed to one-dimensional metrics such as distances (16). There are few statistical approaches that directly assess intracellular organization, which makes automated image analysis of intracellular topology challenging. Thus, spatial comparisons could complement feature-based techniques for analyzing cell morphology alterations.

Recently, we showed that global spatial organization of defined subcellular structures (e.g., organelles, membrane domains) can be quantified by probabilistic density maps (17). We grew cells on adhesive micropatterns that force cells to adopt a defined shape, mimicking tissues’ microenvironment (18). Image stacks of fluorescently marked proteins from several tens of cells were transformed into a cloud of coordinate points by segmentation analysis and were aligned using characteristic landmarks of micropatterns. To rigorously measure the topology of the fluorescently labeled subcellular structures, we centered Gaussian functions (kernels) with mean zero and an optimized variance at each of the data points and summed them, revealing the underlying density throughout the cell. This analysis demonstrated that density estimation is a reliable statistical technique for the analysis of the morphology of subcellular structures whose point coordinates can be resolved, providing the basis for a comprehensive framework for statistical analysis. By transforming intracellular structures into three-dimensional kernels, alterations in cellular global organization, and thus cell morphology, can be translated into differences in density maps that are tractable by mathematical tools.

The problem of comparing two data samples has attracted much research to investigate its theoretical and practical aspects. Historically, the first methods involved small computational burdens. The well-known *t*-test developed in the Guinness brewery fits normal distributions with different means but with equal variances to each data sample, thus reducing the original problem to a simpler comparison for a difference in the means. However, this test is limited by the prespecification of the parametric form. Amongst the most widely known nonparametric tests for one-dimensional continuous data are the Mann-Whitney, Kolmogorov-Smirnov, and Wald-Wolfowitz tests (19). The need for analogous tests for multivariate data has been addressed (20–22). However, these multivariate approaches have not met with the same wide acceptance as their univariate ancestors, because the former have not consistently yielded intuitive inferences when applied to experimental data. Given that the *t*-test is a density-based comparison, replacing parametric density estimates with their nonparametric counterparts should lead to a more flexible testing procedure. Kernel smoothing is a widely used computational technique for density estimation due to its intuitive construction and interpretation (23). Thus, it is an ideal basis for nonparametric density-based testing. Kernel-based tests have been developed with other discrepancy measures (24–27), but all rely on computationally intensive resampling methods to compute the critical quantiles of the null distribution. Although resampling methods provide a general framework for consistent tests, a second major trade-off is that they require sufficient familiarity, as resampling requires calibration for each data analysis situation at hand. These constraints prevent the wide adoption of bootstrap kernel density-based testing outside the computational statistical community. In particular, these tests are not easily available to biologists.

Here, we develop a test statistic that is asymptotically normal under the null hypothesis, allowing density-based, “black-box” comparisons of multivariate data. We use simulated and experimental data analysis to verify its performance for finite samples. Given that 3D organizations of cells can be expressed by probabilistic density maps, this test allows us to assess the statistical significance of the similarity or difference between two cellular topologies. Analyzing the data from fluorescent images of intracellular organelles, this test allows us to compare cellular morphology under different conditions in an automated and unbiased manner.

## Results

### Construction of the Test Statistic.

We have used the usual squared discrepancy measure in order to construct a nonparametric and multivariate test statistic that is asymptotically normal under the null hypothesis. (Algorithmic details are deferred to the *Methods* section).

Let *X*_{1},*X*_{2},…,*X*_{n1} and *Y*_{1},*Y*_{2},…,*Y*_{n2} be *d*-variate random samples from their respective common densities *f*_{1} and *f*_{2}. Concretely, *X*_{1},*X*_{2},…,*X*_{n1} are the spatial coordinates of subcellular structures extracted from a first group of images, and likewise for *Y*_{1},*Y*_{2},…,*Y*_{n2} from a second group of images. So *f*_{1} represents the steady-state spatial probability density function of the subcellular structures in the first images, and likewise for *f*_{2}. This is the same statistical framework used in Schauer et al. (17) to construct density maps from a single set of images. The kernel density estimates of *f*_{1} and *f*_{2} are

$$\hat{f}_1(\mathbf{x}; \mathbf{H}_1) = n_1^{-1} \sum_{i=1}^{n_1} K_{\mathbf{H}_1}(\mathbf{x} - X_i), \qquad \hat{f}_2(\mathbf{x}; \mathbf{H}_2) = n_2^{-1} \sum_{j=1}^{n_2} K_{\mathbf{H}_2}(\mathbf{x} - Y_j), \tag{1}$$

where *K* is the kernel function with $K_{\mathbf{H}}(\mathbf{x}) = |\mathbf{H}|^{-1/2} K(\mathbf{H}^{-1/2}\mathbf{x})$, and **H**_{l} is a bandwidth matrix, for *l* = 1,2.
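The kernel density estimate described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel and the usual scaling convention $K_{\mathbf{H}}(\mathbf{x}) = |\mathbf{H}|^{-1/2} K(\mathbf{H}^{-1/2}\mathbf{x})$; the function name `gaussian_kde_at` is hypothetical, and the bandwidth matrix is passed in by hand here rather than selected from the data as in the procedure described in this paper.

```python
import numpy as np

def gaussian_kde_at(x, data, H):
    """Evaluate a Gaussian kernel density estimate at a single point x.

    data : (n, d) array of point coordinates.
    H    : (d, d) symmetric positive definite bandwidth matrix
           (user-supplied in this sketch, not data-driven).
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    H_inv = np.linalg.inv(H)
    diffs = x - data                                        # (n, d)
    # Quadratic forms (x - X_i)^T H^{-1} (x - X_i), one per data point
    quad = np.einsum("ij,jk,ik->i", diffs, H_inv, diffs)
    # Normalizing constant of the scaled Gaussian kernel K_H
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(H) ** (-0.5)
    # Average of the kernels centered at each data point
    return float(norm * np.exp(-0.5 * quad).mean())

# Usage: a single point at the origin with identity bandwidth in 2D
density_at_origin = gaussian_kde_at([0.0, 0.0], np.zeros((1, 2)), np.eye(2))
```

With a single data point the estimate is simply the kernel itself, so the value at the origin equals the peak of a standard bivariate normal density.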

To test the null hypothesis *H*_{0}: *f*_{1} = *f*_{2}, we follow Anderson et al. (28), who proposed the following discrepancy measure:

$$T = \int [f_1(\mathbf{x}) - f_2(\mathbf{x})]^2 \, d\mathbf{x}.$$

As is the case in the rest of this manuscript, whenever the limits of integration are omitted, integration is taken over the appropriate Euclidean space. We use the squared error measure, since it has the most extensive body of work in automatic optimal selection of the smoothing parameters in comparison to other discrepancy measures such as the absolute error, Kullback-Leibler error, and Jensen-Shannon error. We rewrite the discrepancy as $T = \psi_1 + \psi_2 - (\psi_{1,2} + \psi_{2,1})$, where $\psi_l = \int f_l(\mathbf{x})^2 \, d\mathbf{x}$ and $\psi_{l_1,l_2} = \int f_{l_1}(\mathbf{x}) f_{l_2}(\mathbf{x}) \, d\mathbf{x}$. The test statistic is

$$\hat{T} = \hat{\psi}_1 + \hat{\psi}_2 - (\hat{\psi}_{1,2} + \hat{\psi}_{2,1}),$$

where

$$\hat{\psi}_1 = n_1^{-2} \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_1} K_{\mathbf{H}_1}(X_{i_1} - X_{i_2}), \qquad \hat{\psi}_2 = n_2^{-2} \sum_{j_1=1}^{n_2} \sum_{j_2=1}^{n_2} K_{\mathbf{H}_2}(Y_{j_1} - Y_{j_2}),$$

$$\hat{\psi}_{1,2} = n_1^{-1} n_2^{-1} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} K_{\mathbf{H}_1}(X_i - Y_j), \qquad \hat{\psi}_{2,1} = n_1^{-1} n_2^{-1} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} K_{\mathbf{H}_2}(X_i - Y_j),$$

and $K_{\mathbf{H}_l}$ is the scaled kernel used in the density estimates of Eq. 1.

We can interpret this test statistic as comparing the intrasample pairwise differences *X*_{i1} - *X*_{i2} and *Y*_{j1} - *Y*_{j2} to the intersample pairwise differences *X*_{i} - *Y*_{j}. So if the latter are larger than the former, then this indicates that the samples are different. The following theorem is our main result, which establishes the asymptotic normality of the test statistic $\hat{T}$ under the null hypothesis.

Under the conditions in the Methods section, and assuming that the null hypothesis holds, *H*_{0}: *f*_{1} = *f*_{2} = *f*. As *n*_{1},*n*_{2} → ∞, then , where and .
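To make the construction concrete, the sketch below computes a discrepancy statistic of the kind described above: kernel sums over intrasample pairwise differences minus kernel sums over intersample pairwise differences. It is an illustration under stated assumptions, not the implementation in the ks library: the kernel is assumed Gaussian, the bandwidth matrices `H1` and `H2` are supplied by hand instead of being estimated from the data, and the names `kernel_sum` and `discrepancy_statistic` are hypothetical.

```python
import numpy as np

def kernel_sum(A, B, H):
    """Mean of the scaled Gaussian kernel K_H over all pairwise differences A_i - B_j."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = A.shape[1]
    H_inv = np.linalg.inv(H)
    # All pairwise differences; memory grows as n_A * n_B * d, fine for a demo only
    diffs = A[:, None, :] - B[None, :, :]
    quad = np.einsum("abi,ij,abj->ab", diffs, H_inv, diffs)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(H) ** (-0.5)
    return float(norm * np.exp(-0.5 * quad).mean())

def discrepancy_statistic(X, Y, H1, H2):
    """Intrasample kernel sums minus intersample kernel sums."""
    psi_1 = kernel_sum(X, X, H1)    # within the first sample
    psi_2 = kernel_sum(Y, Y, H2)    # within the second sample
    psi_12 = kernel_sum(X, Y, H1)   # between samples, first bandwidth
    psi_21 = kernel_sum(X, Y, H2)   # between samples, second bandwidth
    return psi_1 + psi_2 - (psi_12 + psi_21)

# Usage: two samples from the same density versus a clearly shifted sample
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2))
H = 0.25 * np.eye(2)                # hand-picked bandwidth (assumption)
t_same = discrepancy_statistic(X, Y, H, H)
t_shifted = discrepancy_statistic(X, Y + 3.0, H, H)
```

When the second sample is shifted far from the first, the intersample kernel sums are close to zero and the statistic is dominated by the two intrasample terms, which is why `t_shifted` exceeds `t_same`.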

### Null Distribution Parameter Estimation.

To use the asymptotic null distribution, we need to estimate the mean parameter *μ*_{T} and the variance parameter $\sigma_T^2$. For *μ*_{T}, Chacón and Duong (29) showed an algorithm to obtain consistent estimators of the bandwidth matrices **H**_{1} and **H**_{2} as minimizers of the asymptotic Mean Squared Error of $\hat{\psi}_1$ and $\hat{\psi}_2$, respectively. For $\sigma_T^2$, it is straightforward to show that an estimator can be formed from $\hat{\sigma}_1^2$, an estimator of the variance of $f_1(X)$, and $\hat{\sigma}_2^2$, an estimator of the variance of $f_2(Y)$. Previous research has indicated that asymptotic normal approximations of a null distribution tend to reject the null hypothesis more often than is indicated by the nominal level of significance (25). One of the primary causes is the overestimation of the variance. In the context of kernel estimators, this usually arises from using a bandwidth which is optimal for density estimation, but which leads to an inflated variance estimate. Our proposed solution is to estimate the variance more directly using a larger bandwidth, since larger bandwidths reduce the variance by mitigating the effect that individual data points have on the value of the kernel estimator. Examining the first-order Taylor series expansion about the expected value,

$$f(X) \sim f(\mathrm{E}X) + (X - \mathrm{E}X)^{T} \mathsf{D}f(\mathrm{E}X),$$

where $\mathsf{D}f$ is the column vector of first-order partial derivatives, we obtain

$$\operatorname{Var} f(X) \sim [\mathsf{D}f(\mathrm{E}X)]^{T} (\operatorname{Var} X) [\mathsf{D}f(\mathrm{E}X)].$$

So plug-in estimators of $\sigma_1^2$ and $\sigma_2^2$ are $\hat{\sigma}_1^2 = [\mathsf{D}\hat{f}_1(\bar{X})]^{T} S_1 [\mathsf{D}\hat{f}_1(\bar{X})]$ and $\hat{\sigma}_2^2 = [\mathsf{D}\hat{f}_2(\bar{Y})]^{T} S_2 [\mathsf{D}\hat{f}_2(\bar{Y})]$, where $\bar{X}$, $\bar{Y}$ are the sample means, $S_l$ are the sample variances, and the bandwidths used in $\mathsf{D}\hat{f}_l$ are the normal scale selectors for a kernel estimator of the first density derivative (30).
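The Taylor expansion argument above leads to a variance approximation of the form gradient, times sample covariance, times gradient, evaluated at the sample mean. The following sketch illustrates that plug-in idea only. It assumes a Gaussian kernel and a hand-supplied bandwidth matrix `H` (the paper uses normal scale selectors, which are not implemented here), and the function names are hypothetical.

```python
import numpy as np

def kde_gradient(x, data, H):
    """Gradient at x of a Gaussian kernel density estimate with bandwidth matrix H."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    H_inv = np.linalg.inv(H)
    diffs = x - data                                        # (n, d)
    quad = np.einsum("ij,jk,ik->i", diffs, H_inv, diffs)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(H) ** (-0.5)
    weights = norm * np.exp(-0.5 * quad)                    # K_H(x - X_i)
    # Gradient of each kernel term: -K_H(x - X_i) * H^{-1} (x - X_i)
    return -(weights[:, None] * (diffs @ H_inv)).mean(axis=0)

def delta_method_variance(data, H):
    """First-order (delta method) approximation of Var f(X) by plug-in."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)                                # sample mean
    S = np.atleast_2d(np.cov(data, rowvar=False))           # sample covariance
    grad = kde_gradient(mean, data, H)                      # estimated Df at the mean
    return float(grad @ S @ grad)

# Usage: a skewed sample, where the density gradient at the mean is nonzero
rng = np.random.default_rng(1)
skewed = rng.exponential(size=(300, 2))
H = 0.3 * np.eye(2)                                         # hand-picked bandwidth (assumption)
var_hat = delta_method_variance(skewed, H)
```

One consequence of the first-order approximation is visible directly in the code: for a sample that is symmetric about its mean, the estimated gradient vanishes there and the approximation returns a value near zero.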

Given these parameter estimates, a *z*-score is obtained from $\hat{T}$ in the standard way, by subtracting the estimated mean and dividing by the estimated standard deviation. The *P*-value is then computed from this *z*-score using standard software or tables. The completely automatic testing procedure (including the parameter estimation, and the computation of the test statistic and its *P*-value) is programmed in the ks library (31) in the open-source R programming language.
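The last step, turning a *z*-score into a *P*-value, can be sketched with the standard library alone. The sketch assumes an upper-tail test (large values of the discrepancy statistic count as evidence against the null hypothesis), which is an assumption of this example; `sd_hat` stands for whatever estimated standard deviation of the statistic is used, and both function names are hypothetical.

```python
import math

def z_score(t_hat, mu_hat, sd_hat):
    """Standardize a test statistic with its estimated null mean and standard deviation."""
    return (t_hat - mu_hat) / sd_hat

def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z, computed as 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Usage with made-up numbers (purely illustrative, not values from this study)
z = z_score(t_hat=0.012, mu_hat=0.010, sd_hat=0.001)
p = upper_tail_p_value(z)
```

Using `erfc` rather than `1 - cdf` keeps precision for very small tail probabilities, which matters when *P*-values are many orders of magnitude below 0.05.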

### Simulated Data Analysis.

To verify the performance of our kernel density-based test for finite samples, we performed simulation studies using pairs of mixture normal densities, mostly taken from Chacón (32). The contour plots of these test densities as well as representative scatter plots for the two considered sample sizes (*n* = 100 and *n* = 1000) are displayed in Fig. 1. The first pair, *N*((-1/2,0),**I**_{2}) and *N*((1/2,0),**I**_{2}), represents two single normal densities with identity variance, whose means are separated by a distance of 1. This example was treated as the base case. The second pair consists of two bimodal densities, 1/2*N*((1,-1),Σ) + 1/2*N*((-1,1),Σ) and 1/2*N*((1,-1),Σ) + 1/2*N*((1,-1),**I**_{2}), where Σ = [4/9 4/15; 4/15 4/9]. The lower right component of the pair was exactly the same, but their upper right component was different, making it a potentially challenging case for distinguishing between two finite samples. As a third example (pair 3), we chose *N*((0,0),**I**_{2}) and 1/2*N*((0,0),**I**_{2}) + 1/10*N*((0,0),1/16 **I**_{2}) + 1/10*N*((-1,-1),1/16 **I**_{2}) + 1/10*N*((-1,1),1/16 **I**_{2}) + 1/10*N*((1,-1),1/16 **I**_{2}) + 1/10*N*((1,1),1/16 **I**_{2}), both of which have (approximately) zero mean and identity variance. Because this pair reveals different internal structure, it would most likely benefit from a density-based, rather than a moment-based, test.
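As an illustration of how such synthetic samples can be generated, the sketch below draws points from normal mixtures with the weights, means, and covariances quoted above for pair 1 and pair 3. The helper `sample_mixture` and the dictionary names are hypothetical, and the random seed is arbitrary.

```python
import numpy as np

def sample_mixture(rng, n, weights, means, covs):
    """Draw n points from a normal mixture with the given weights, means and covariances."""
    labels = rng.choice(len(weights), size=n, p=weights)    # component of each point
    out = np.empty((n, len(means[0])))
    for k, (m, c) in enumerate(zip(means, covs)):
        idx = labels == k
        if idx.any():
            out[idx] = rng.multivariate_normal(m, c, size=int(idx.sum()))
    return out

I2 = np.eye(2)

# Pair 1: two single normal densities whose means are a distance of 1 apart
pair1_a = dict(weights=[1.0], means=[(-0.5, 0.0)], covs=[I2])
pair1_b = dict(weights=[1.0], means=[(0.5, 0.0)], covs=[I2])

# Pair 3: a standard normal versus a mixture with similar first two moments
pair3_a = dict(weights=[1.0], means=[(0.0, 0.0)], covs=[I2])
pair3_b = dict(
    weights=[0.5] + [0.1] * 5,
    means=[(0, 0), (0, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)],
    covs=[I2] + [I2 / 16] * 5,
)

rng = np.random.default_rng(0)
x1 = sample_mixture(rng, 1000, **pair1_a)
y3 = sample_mixture(rng, 1000, **pair3_b)
```

Samples from `pair3_a` and `pair3_b` have nearly the same mean and covariance, so any difference a test detects between them must come from the internal structure of the mixture.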

First, we verified the asymptotic normality of $\hat{T}$ by comparing the density estimates of the *z*-scores with the standard normal (Fig. S1). The larger sample gave better estimates of the zero mean. On the other hand, performance in variance estimation was more uneven, since the *n* = 1000 samples did not lead to better variance estimates for pair 2. Related results have been observed previously (25), indicating that the variance estimation is the most difficult part in calibrating an asymptotically normal null distribution.

We performed simulations of the test statistic for two common nominal levels of significance, *α* = 0.05 and 0.01 (Table 1), where *α* is the error rate of rejecting the null hypothesis H_{0} when the null hypothesis is true (false positive). To estimate how close our statistical test comes to achieving this error rate, we computed the proportion of experiments that reject H_{0} when the two samples are simulated from the same distribution. Given a level of significance, the other error that can be made is to accept the null hypothesis H_{0} when it is false (false negative). We estimated this by computing the proportion of experiments in which H_{0} is accepted when the two samples are simulated from different distributions. The empirical power is one minus this proportion. For the smaller sample size, we found that the empirical significance levels were close to the nominal values, but the power was low for pairs 2 and 3. This indicated that *n* = 100 data points were not sufficient to distinguish reliably between these more difficult comparisons. For the larger sample, the lack of power was resolved for all three pairs. The empirical levels of significance were more conservative for the larger sample size. This simulation evidence demonstrated that our proposed test does not identify more false positives than expected from the nominal level of significance and identifies almost all true negatives.
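The empirical significance level and empirical power described above are both rejection proportions over repeated simulated experiments. The sketch below shows that bookkeeping. To stay self-contained it uses a simple two-sided *z*-test on the first coordinate as a stand-in for the kernel test, so the numbers it produces say nothing about the kernel test itself; all function names are hypothetical.

```python
import math
import numpy as np

def rejection_rate(sample_x, sample_y, p_value_fn, alpha, n_rep, rng):
    """Proportion of simulated experiments in which the null hypothesis is rejected."""
    rejections = 0
    for _ in range(n_rep):
        if p_value_fn(sample_x(rng), sample_y(rng)) < alpha:
            rejections += 1
    return rejections / n_rep

def mean_shift_p_value(x, y):
    """Stand-in two-sided z-test on the first coordinate (NOT the kernel test)."""
    se = math.sqrt(x[:, 0].var(ddof=1) / len(x) + y[:, 0].var(ddof=1) / len(y))
    z = (x[:, 0].mean() - y[:, 0].mean()) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Samplers mimicking pair 1: identical densities versus means a distance of 1 apart
def same(rng):
    return rng.normal(size=(100, 2))

def shifted(rng):
    return rng.normal(size=(100, 2)) + np.array([1.0, 0.0])

rng = np.random.default_rng(0)
# Both samples from the same density: estimates the empirical significance level
level = rejection_rate(same, same, mean_shift_p_value, 0.05, 200, rng)
# Samples from different densities: estimates the empirical power
power = rejection_rate(same, shifted, mean_shift_p_value, 0.05, 200, rng)
```

Replacing `mean_shift_p_value` with any other function that maps two samples to a *P*-value leaves the rest of the loop unchanged.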

To evaluate the performance of our test, we compared it to a parametric alternative. The *t*-test is a well-known hypothesis test for univariate data, which has been generalized for multivariate data in Nel and Van der Merwe (33). The average *P*-values from 100 simulations of sample size 1000 were computed for each of the three pairs of target densities (Table S1). As expected, the modified Nel and Van der Merwe (MNV) test was more sensitive for the first pair, which only differed in mean (average *P*-value = 0). However, the kernel test was still highly significant (average *P*-value = 1.142 × 10^{-29}). For the next two pairs with similar mean values but clear differences in the internal organizations between the two densities, the MNV test gave nonsignificant average *P*-values of 0.5195 and 0.2158, whereas the kernel-based test gave highly significant average *P*-values of 1.353 × 10^{-8} and 3.386 × 10^{-23}. This demonstrated that our density-based test outperformed the parametric MNV test in the detection of differences in internal organization.

### Detection of Morphological Changes in Micropatterned Cells after Drug Treatment.

To evaluate how our test performs on experimental data, we compared the morphologies of intracellular structures under different experimental conditions (Fig. 2). As indicated in the flowchart (Fig. 2*A*), we analyzed the morphology changes of multivesicular bodies (MVB) induced by a treatment with the drug nocodazole (NZ) that depolymerizes microtubules, a major component of the cellular cytoskeleton. MVB are endosomes that are transported along microtubules (35) and are involved in several important cellular functions, including processing of nutrients, ligands and receptors during endocytosis, exosome secretion, and autophagy (34). Intracellular MVB were visualized by indirect immunofluorescence against CD63, a transmembrane protein enriched in MVB (Fig. 2*B*). Cells were cultured on micropatterns of extracellular matrix proteins that standardize cell shape and allow alignment of CD63-marked structures. Combining the signals of CD63-marked components from several tens of cells, we showed that the 3D organization of MVB is reproducible in these normalized conditions (17). Disruption of microtubules with NZ disconnects MVB from microtubules, leading to subtle changes in cell morphology (Fig. 2*C*). We transformed the fluorescent signal of normalized cells into coordinates by segmentation analysis as previously reported (17). All detected signals from a control group of cells and a group of cells exposed to treatment conditions were combined into the two test samples, with densities *f*_{1} and *f*_{2}, respectively. In previous analysis, we estimated that pooling signals from about 20–30 normalized cells (containing several hundreds of structures each) was required to produce reliable density maps (17). So we took these cell numbers as a starting point for our comparisons. Representative 2D and 3D scatter plots of 40 cells are shown in Fig. 2*D*–*G*; the 2D scatter plots of individual cells are represented in Fig. S2. The coordinates from 40 cells from each condition were compared by $\hat{T}$.

First, we compared a nontreated control group 1 of 40 cells with 11786 detected structures with a second control group 2 of 40 cells containing 12585 structures. The two control samples gave slightly different estimates of the CD63 steady-state distribution (Fig. S3). The normalized test statistics for the 2D and 3D comparisons corresponded to a *P*_{2D}-value of 0.2581 and a *P*_{3D}-value of 0.1138. So the minor differences between the two control samples are not significant at the usual levels.

We then compared the control group 1 with the NZ-treated group of 40 cells with 13615 detected structures. For the control versus treatment condition, the normalized test statistics for the 2D and 3D comparisons gave rise to a *P*_{2D}-value of 1.589 × 10^{-5} and a *P*_{3D}-value of 1.280 × 10^{-11}, indicating that there is strong evidence that the drug treatment significantly affects the distribution of MVB. These results agree with previous studies demonstrating microtubule-dependent movement of MVB (35). To further evaluate our approach, we performed additional analysis of diverse subcellular structures (*SI Text*, Fig. S4, and Table S2).

Second, we compared the performance of our test with that of a resampling strategy that was previously established for the comparison of fluorescent images (17). We calculated average *P*-values from either our test statistic or the permutation analysis as a function of the number of cells analyzed, taking 100 random samples of 1, 2, 10, 20, and 40 cells (Fig. 2*H*, Table S3). First, we randomly picked two subsamples from the control condition (Ctrl) to estimate the false positive rate of our test. Then we compared one control subsample with one subsample taken from the treated condition (NZ). Dashed lines represent the permutation test; solid lines represent the kernel density test (Fig. 2*H*). Under the null hypothesis, which is expected to hold for Ctrl, *P*-values follow a uniform distribution on [0, 1] and thus have mean 0.5. This is true for the permutation test, since it can mimic the sampling distribution of the test statistic. The kernel density test gives more small *P*-values (false positives) than predicted, due to the asymptotic approximation (see also Fig. S5). However, an average *P*-value > 0.05 was obtained in each case, not rejecting the null hypothesis at the usual significance levels. Applied to the comparison between Ctrl and NZ treatment, the kernel density test gives lower *P*-values (more true positives) than the permutation test for < 10 cells. Together, these two tests gave the same conclusions when testing a treatment for more than 10 cells (as typically analyzed), demonstrating that the normal approximation for the sampling distribution was as accurate as bootstrap resampling in this case. Thus, our test statistic is comparable to bootstrap resampling.
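For readers unfamiliar with the resampling alternative, a generic permutation test can be sketched as follows. This is a textbook version, not the specific permutation analysis of ref. 17: group labels are shuffled repeatedly and the observed statistic is ranked among the shuffled ones. The statistic used here, the distance between sample means, is a deliberately simple stand-in, and all names are hypothetical.

```python
import numpy as np

def permutation_p_value(x, y, statistic, n_perm, rng):
    """Permutation P-value: how often a relabeled statistic reaches the observed one."""
    observed = statistic(x, y)
    pooled = np.vstack([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))                 # shuffle group labels
        if statistic(pooled[perm[:n_x]], pooled[perm[n_x:]]) >= observed:
            count += 1
    # Add-one correction keeps the P-value strictly positive
    return (count + 1) / (n_perm + 1)

def mean_distance(x, y):
    """Stand-in statistic: Euclidean distance between the two sample means."""
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))

# Usage: a clearly shifted second sample
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(size=(50, 2)) + 2.0
p_shifted = permutation_p_value(x, y, mean_distance, 199, rng)
```

The cost is one evaluation of the statistic per permutation, and the smallest attainable *P*-value is 1/(`n_perm` + 1), which is the computational burden that a closed-form null distribution avoids.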

Next, we evaluated how sensitive our method is to errors in cell alignment on patterned substrates. We systematically estimated how strongly our test statistic degrades as a function of rotational and translational misalignment (see *SI Text* and Table S4). Overall, as expected, the *P*-values uniformly decrease as the magnitude of misalignment increases. As expected from the simulated data analysis, our test was highly sensitive when the entire cell sample to be compared was misaligned. Both rotations of as little as 10° and translations of 20–30 pixels were sufficient to give significant *P*-values (*P* < 0.05) (pattern size was 550 pixels). Our test was, however, less sensitive to random misalignment in individual cells within one sample. Significant *P*-values for rotations appeared at 30–40° and at 30–50 pixels for translations. This analysis highlights the importance of cell alignment to reduce false positive results. Together, our analysis on CD63 demonstrated that the density-based comparison is well suited to detect changes in steady-state morphology of cells cultured under controlled conditions of adhesion.
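The two kinds of misalignment discussed above, a common error applied to a whole sample and independent random errors per cell, can be simulated by rotating and translating the 2D coordinates of each cell. The sketch below assumes coordinates in pixels and rotation about a chosen center; the function `misalign` and the example values are hypothetical and are not the procedure of the *SI Text*.

```python
import numpy as np

def misalign(points, angle_deg=0.0, shift=(0.0, 0.0), center=(0.0, 0.0)):
    """Rotate 2D points counterclockwise about `center`, then translate by `shift`."""
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return (points - center) @ R.T + center + np.asarray(shift, dtype=float)

rng = np.random.default_rng(0)
cells = [rng.normal(size=(5, 2)) for _ in range(3)]         # toy coordinates per cell

# Whole-sample misalignment: every cell receives the same rotation
whole_sample = [misalign(c, angle_deg=10.0) for c in cells]

# Random misalignment: each cell receives its own rotation angle
per_cell = [misalign(c, angle_deg=rng.uniform(-30.0, 30.0)) for c in cells]
```

Pooling the transformed coordinates of all cells and comparing them with the untransformed pool would then show how the *P*-value responds to each kind of error.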

### Detection of Morphological Changes in Live-Cell Assays in Unconstrained Cells.

To demonstrate that our density-based framework is also valid for the detection of morphological changes in unconstrained cells that are classically studied, we applied our test to live cell analysis. Because cells maintained a consistent orientation during a given time period, fast changes in intracellular organization as observed after drug treatments could be analyzed by our density-based method. We comprehensively benchmarked the statistical method on the dynamics of MVB in unconstrained cells before and after treatment with NZ (Fig. 3). We acquired 3D stacks over 24 min with one acquisition every 60 s, extracted 3D positional information of labeled compartments by segmentation, and compared morphology changes.

We split the images of Movie S1 into four groups (1–4) containing six images each (Fig. 3*A*). Groups 1 and 2 represented the nontreated control groups, with 1080 and 1002 detected CD63-positive structures that were acquired before addition of the drug. Groups 3 and 4 were the treatment test groups, containing 1019 and 801 structures that were recorded after the addition of the drug. The corresponding 2D and 3D scatter plots of the four groups are shown in Fig. 3*B* and *C*. We applied our test statistic to each possible pair of groups. The corresponding *P*-values for the 2D and 3D comparisons are listed in Table S5. The results indicated that whereas no significant change in CD63 morphology was detected before drug treatment (*P*_{2D}-value = 0.4136; *P*_{3D}-value = 0.3565), the treatment with NZ significantly affected CD63 morphology (*P*_{2D}-value = 3.998 × 10^{-6}; *P*_{3D}-value = 4.844 × 10^{-6}). The effect of the drug was more significant for later time points, in agreement with visual inspection of the images, demonstrating that the statistical significance can quantify compound influence. Thus, our approach allowed unbiased automated detection of morphological changes in live-cell assays in unconstrained cells.
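The grouping scheme described above, consecutive time points split into four groups of six followed by all pairwise comparisons, amounts to simple bookkeeping that can be sketched as follows. The frame indices are placeholders for the 24 acquired stacks, and `split_into_groups` is a hypothetical helper.

```python
from itertools import combinations

def split_into_groups(frames, group_size):
    """Split a sequence of time points into consecutive groups of equal size."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]

frames = list(range(1, 25))              # placeholder indices for 24 time points
groups = split_into_groups(frames, 6)    # groups 1-4, six images each

# All pairs of groups to compare: (1, 2), (1, 3), ..., (3, 4)
pairs = list(combinations(range(1, len(groups) + 1), 2))
```

Four groups yield six pairwise comparisons: one between the two control groups, one between the two treatment groups, and four between a control and a treatment group.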

## Discussion

We have developed a test statistic which inherits the advantages of kernel density estimates to facilitate generally applicable two-sample comparisons of multivariate data. By drawing on recent distributional results for kernel estimators, we were able to express its null distribution in a closed asymptotic form, thus circumventing the requirement for resampling to determine the critical quantiles of the null distribution.

This test allowed us to compare complex data from fluorescent microscopy without reducing the provided information into simple summary statistics. This allowed quantitative comparison of cellular morphology by directly measuring the three-dimensional organization of intracellular structures visualized by fluorescent microscopy. Note that our proposed image comparison focuses on the spatial localization of structures whose point coordinates can be resolved. It is also applicable to the comparison of continuous structures, such as microtubules, but performance is not optimal. As our test statistic requires independent data points (like most kernel-based estimators), the representation of continuous structures by several connected coordinates leads to smaller *P*-values. This is a disadvantage compared to commonly used feature-based techniques that collate any type of measured quantity from the microscopy images and therefore can also be applied even to diffuse fluorescent patterns. However, feature-based comparisons require the critical stage of feature selection to be calibrated carefully for each comparison (15), especially in order to compute *P*-values. So the challenge of automating optimal feature selection into a black-box method remains an open problem. Our simpler, more direct approach of comparing spatial distributions does not face equivalent problems, allowing full automation. As image acquisition facilities are developed at an accelerated speed, there is a rising need for image analysis tools that estimate the required test parameters directly from the data and do not require computationally intensive analytical techniques. As feature-based and density-based approaches use different numerical information, they are complementary to each other, and the combination of both should lead to an improvement of image analysis.

Such computational imaging methods are indispensable tools for the high-content and high-throughput image acquisition capability of advanced microscopes that daily acquire thousands of high-resolution images in time-lapse experiments. We have shown that our density-based mathematical framework is powerful for phenotype profiling and can be easily adjusted to high-throughput analysis. Attempts are underway to incorporate this computational imaging comparison into a high-throughput workflow to screen for cell morphological changes due to chemical compound treatment and siRNA-based gene silencing.

A second disadvantage of our approach is that it requires the cells to have a constant shape in order to construct spatial density maps and the test statistic. Fortunately, the micropatterning technique allows us to grow cells reproducibly into standardized shapes in culture. One important advantage of growing cells in controlled conditions of adhesion is that cells are much closer to their physiological state in tissues, where cells are restrained, than in classical (unconstrained) culture conditions on Petri dishes. Another advantage is that standardizing cells by micropatterning technology represents an important step towards quantitative approaches in cell biology. We have also shown that our testing procedure can be applied to live cell comparisons. By orienting an unconstrained cell through time, we validate that unconstrained cells are in principle analyzable. However, alignment of unconstrained cells with the help of computational approaches (36, 37) will be a requisite in order to apply density-based comparison as a general approach for unconstrained cells. An important future application is to compare cell spatial morphology in tissues, in which many cell types show reproducible shapes and inherent polarization. In particular, this application would be important to detect alterations in the cellular architecture during pathological processes such as cancer. Since imaging and alignment of tissue cells is more challenging, it is not yet suitable to apply our testing procedure.

A promising extension of our test is to identify the regions of the sample space that are the largest contributors to the overall statistical difference. There have been some attempts to tackle this problem (38) using a heuristic density-difference approach based on data mining, but this approach is unable to make rigorous statistical inferences. Developing a rigorous analogue would be an important advance for multivariate sample comparisons. Another future challenge is analyzing structures with a diffuse fluorescence. An alternative is to consider diffuse fluorescence patterns as functional data, i.e., not as finite-dimensional vectors (as in multivariate analysis) but as infinite-dimensional functions. The state of the art in formal hypothesis testing for functional data analysis is less advanced than in multivariate data analysis, leaving a testing procedure for functional data analogous to our proposed statistic an open problem.

## Methods

To establish the asymptotic sampling distribution of $\hat{T}$, we follow the approach of Chacón and Duong (29). Suppose that the following conditions hold for *l* = 1, 2.

(F) The target densities *f*_{l} have two derivatives, which are bounded, continuous, and square integrable.

(H) The bandwidths **H**_{l} = **H**_{l}(*n*_{l}) are a sequence of symmetric, positive definite matrices such that all elements of **H**_{l} → 0 and as *n*_{l} → ∞.

(K) The kernel *K* is a symmetric probability density function such that *m*_{0}(*K*^{2}) = ∫*K*(**x**)^{2} *d* **x** is finite, and that ∫**xx**^{T}*K*(**x**) *d* **x** = *m*_{2}(*K*)**I**_{d} for some real number *m*_{2}(*K*), where **I**_{d} is the *d* × *d* identity matrix.

(N) The sample sizes *n*_{1},*n*_{2} are such that *n*_{1}/*n*_{2} and *n*_{2}/*n*_{1} are bounded away from zero and infinity as *n*_{1},*n*_{2} → ∞. The proof is deferred to the *SI Text*.
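As a concrete check of condition (K), consider the standard *d*-variate normal kernel. This choice of kernel is an assumption made here for illustration; it is not stated in this section. Under that assumption both quantities in (K) have closed forms:

```latex
K(\boldsymbol{x}) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\boldsymbol{x}^{T}\boldsymbol{x}\right)

m_0(K^2) = \int K(\boldsymbol{x})^2 \, d\boldsymbol{x}
         = (2\pi)^{-d} \int \exp\!\left(-\boldsymbol{x}^{T}\boldsymbol{x}\right) d\boldsymbol{x}
         = (2\pi)^{-d}\,\pi^{d/2}
         = (4\pi)^{-d/2} < \infty

\int \boldsymbol{x}\boldsymbol{x}^{T} K(\boldsymbol{x}) \, d\boldsymbol{x} = \mathbf{I}_d
\quad\Longrightarrow\quad m_2(K) = 1
```

The second line uses the Gaussian integral ∫exp(−*x*^{2}) *dx* = π^{1/2} in each of the *d* coordinates, and the third is the statement that the standard normal density has identity covariance. For *d* = 3, the dimension of the coordinates analyzed here, *m*_{0}(*K*^{2}) = (4π)^{−3/2}.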

### Cells and Sample Preparation.

Cell culture and sample preparation for fixed cells were as described in ref. 17. The primary antibodies used were α-CD63 (Invitrogen), Sec13 (17), and α-tubulin (BD Biosciences); the secondary antibodies were coupled to Alexa-Fluor 488, Cy-3, or Cy-5. EGFP-CD63-expressing stable cells [generated by transfection of the plasmid pEGFP-CD63 (34) into RPE-1 cells] were seeded on Iwaki glass base dishes (Asahi Glass) for live cell observation. To depolymerize microtubules, NZ was added to a final concentration of 20 μM. Cells were imaged before and after addition of NZ.

### Immunofluorescence Image Acquisition and Processing.

Image acquisition of fixed cells was as described in ref. 17. Live cell imaging was performed on a Yokogawa spinning disc mounted on an Eclipse TE2000 inverted microscope using a 60× Plan Apo VC 1.4 oil objective, a 491-nm laser, and a CCD camera (Roper CoolSnap HQ2). Z-series of images were acquired at 0.2-μm intervals every 60 s.

Images were segmented with MetaMorph (Universal Imaging Corporation) as described in ref. 17. Briefly, the centroids of fluorescent objects were detected as fluctuations that are 15-fold larger than noise. The watershed function was routinely applied to precisely detect individual structures in dense regions. The coordinates of the segmented structures were aligned using the center of the micropatterns as in ref. 17 and used in a completely automatic testing procedure programmed in the ks library (31) in the open-source R programming language.
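The alignment and comparison steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, which used the ks library in R. All function names are hypothetical, and the single isotropic bandwidth `h` is a simplifying assumption standing in for the data-driven bandwidth matrices **H**_{l}. The sketch computes only a kernel estimate of the integrated squared difference between two densities, the kind of density-based discrepancy on which a two-sample test can be built; it does not compute a *P*-value.

```python
import numpy as np


def align_to_center(coords, center):
    """Shift structure coordinates so the micropattern center becomes the origin."""
    return np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)


def kernel_mean(x, y, h):
    """Average of an isotropic Gaussian kernel over all pairs (x_i, y_j).

    This is a plug-in estimate of the integral of f_x * f_y.
    The isotropic bandwidth h is an assumption of this sketch.
    """
    d = x.shape[1]
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * h * h) ** (-d / 2.0)
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * h * h))))


def density_discrepancy(x1, x2, h):
    """Kernel estimate of the integrated squared difference of two densities."""
    return (kernel_mean(x1, x1, h) + kernel_mean(x2, x2, h)
            - 2.0 * kernel_mean(x1, x2, h))


# Synthetic 3D coordinates (not experimental data), offset by a pattern center.
rng = np.random.default_rng(0)
center = [10.0, 10.0, 0.0]
control = align_to_center(rng.normal(size=(150, 3)) + center, center)
treated = align_to_center(rng.normal(scale=2.0, size=(150, 3)) + center, center)
print(density_discrepancy(control, treated, h=0.5))
```

Pooling the aligned coordinates from many cells of the same condition before calling `density_discrepancy` mirrors the use of a common micropattern center as the shared origin. Larger values indicate more dissimilar spatial distributions, but judging significance requires the null distribution of the statistic, which this sketch does not provide.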

## Acknowledgments

We thank Christophe Zimmer and Jost Enninga for critical reading of the manuscript, and Sara Chambard for participating in the analysis for the parametric test. We acknowledge the Nikon Imaging Centre and Imaging Facility at Institut Curie (IC)—Centre National de la Recherche Scientifique (CNRS) for support with microscopes and deconvolution service. K.S. received funding from the Fondation pour la Recherche Médicale en France and Association pour la Recherche sur le Cancer. T.D. was supported by Mayent-Rothschild. This project was further supported by grants from Agence Nationale de la Recherche, the CNRS, and IC.

## Footnotes

^{1}B.G. and K.S. contributed equally to this work.

^{2}To whom correspondence should be addressed. E-mail: Kristine.Schauer{at}curie.fr.

Author contributions: T.D., B.G., and K.S. designed research; T.D. and K.S. performed research; T.D. contributed new reagents/analytic tools; T.D. and K.S. analyzed data; T.D. and K.S. wrote the paper.

Conflict of interest statement: A patent has been filed on the reported approach.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1117796109/-/DCSupplemental.

## References

- ↵
- Farhan H,
- et al.

- ↵
- ↵
- Kim JK,
- et al.

*C. elegans*. Science 308:1164–1167.
- ↵
- Ramon y Cajal S

- ↵
- Boland MV,
- Murphy RF

- ↵
- ↵
- Jones TR,
- et al.

- ↵
- ↵
- ↵
- ↵
- Chen SC,
- Zhao T,
- Gordon GJ,
- Murphy RF

- ↵
- ↵
- Perlman ZE,
- et al.

- ↵
- ↵
- Logan DJ,
- Carpenter AE

- ↵
- ↵
- ↵
- Thery M,
- et al.

- ↵
- Gibbons JD,
- Chakraborti S

- ↵
- ↵
- ↵
- ↵
- Simonoff JS

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Duong T

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
