## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Correlation analysis framework for localization-based superresolution microscopy

Edited by Joseph D. Puglisi, Stanford University School of Medicine, Stanford, CA, and approved February 14, 2018 (received for review June 24, 2017)

## Significance

Correlation analysis is one of the most widely used image-processing methods. In the quantitative analysis of localization-based superresolution images, there still lacks a generalized coordinate-based correlation analysis framework to take full advantage of the superresolution information. We propose a coordinate-based correlation analysis framework for localization-based superresolution microscopy. We mathematically prove that point-point distance distribution is equivalent to pixel-based correlation function. This framework can be easily extended to model the effect of localization uncertainty, to the time domain and other distance definitions. We demonstrated the versatility and advantages of our framework in three applications of superresolution microscopy: model-free image alignment and averaging for structural analysis, spatiotemporal correlation analysis for mapping molecule diffusion, and quantifying spatial relationships between complex structures.

## Abstract

Superresolution images reconstructed from single-molecule localizations can reveal cellular structures close to the macromolecular scale and are now being used routinely in many biomedical research applications. However, because of their coordinate-based representation, a widely applicable and unified analysis platform that can extract a quantitative description and biophysical parameters from these images is yet to be established. Here, we propose a conceptual framework for correlation analysis of coordinate-based superresolution images using distance histograms. We demonstrate the application of this concept in multiple scenarios, including image alignment, tracking of diffusing molecules, as well as for quantification of colocalization, showing its superior performance over existing approaches.

In recent years, localization-based superresolution microscopy has been demonstrated to be a powerful technique to image beyond the diffraction limit and has produced numerous beautiful images of subcellular structures. The scientific community has now started to embrace it as a routine tool to answer actual biomedical questions: for example, the in situ dissection of molecular organizations for large-protein complexes, including neuronal synapses (1), focal adhesion complex (2), clathrin-coated pits (3), centrosome (4, 5), nuclear pore complex (6, 7), and the escort complex at viral budding sites (8); high-density molecule diffusion and transport study (9); and the colocalization and interaction of two different structures or biomolecules (10).

To make accurate conclusions about the biological system of interest, it is often required to quantitatively characterize the acquired images. While numerous analysis strategies have been established for conventional fluorescence microscopy, these strategies do not apply directly to localization-based superresolution images. The reason is that a conventional fluorescence image consists of pixels or voxels, whereas a localization-based superresolution image consists of a collection of 2D or 3D coordinates, each associated with a localization uncertainty. Under this coordinate-based representation, many trivial operations on conventional images, such as thresholding and subtraction, become challenging. A simple solution for this challenge is to adapt localization-based superresolution images to established, pixel-based analysis routines by binning the coordinates on a pixel grid; however, this binning inevitably leads to a loss of precise localization information. On the other hand, certain operations complicated for pixelated images, such as subpixel image translation, rotation, and image deformation, become straightforward for coordinates. Therefore, it would be greatly beneficial to establish a generalized coordinate-based analysis framework.

We focus on image correlation, which is one of the most widely used image-processing methods. For localization-based superresolution microscopy, correlation analysis or related methods (such as Ripley’s functions) have been used in measuring resolution (11), testing colocalization and clustering (10, 12, 13), image-based drift correction (14, 15), and aligning superresolution images of individual structures (7, 16), although most of these cases still used spatial binning of the super-resolved coordinates.

Here, we present a coordinate-based correlation analysis framework for localization-based superresolution microscopy. We mathematically showed that point-point distance distribution is equivalent to pixel-based correlation function. This point-point correlation function can easily model the effect of localization uncertainty. Moreover, this concept can be extended to the time domain and other distance definitions. We then demonstrated our framework in three applications of superresolution microscopy for which existing methods are either nonexistent or underperforming: model-free image alignment and averaging for structural analysis, spatiotemporal correlation analysis for the mapping of molecule diffusion, and the quantification of spatial relationships between complex structures utilizing the generalized point-set distance definition.

## Results

### Correlation Function Between Two Coordinate-Based Images.

In the pixel-based image representation, the translational cross-correlation function of two images A and B, *C*_{AB} (*ξ, η*), is usually defined as the mean of pixel value products when shifting the two images by (*ξ, η*), normalized by the product of mean pixel values:*N*_{A} and *N*_{B} localization points, and thus they can be described as the sum of Dirac delta functions with each function representing one single-molecule localization point (17, 18). By replacing **1** with this coordinate-based definition, we can derive that the cross-correlation function between two coordinate sets is as follows (*SI Appendix, Note S1*):

To describe the effect that each localization point is associated with uncertainties in its coordinates, we replace the Dirac delta functions with normal distributions. The 2D point-point correlation can then be written as follows:*SI Appendix, Note S2*). The weight factor, *SI Appendix, Note S3*).

When ignoring the directional information, a radially averaged 2D correlation function (*SI Appendix, Notes S4* and *S5*) forms a “pair-distance distribution”*r*, *r* + ∆*r*) range. Equivalent versions of

### Using Correlation for the Alignment of Superresolution Images.

A straightforward application of image correlation is in aligning and averaging multiple superresolution images of a subcellular structure to gain signal-to-noise ratio, which, in turn, leads to enhanced effective image resolution (11). This very successful biological application for superresolution microscopy has allowed in situ dissection of molecular organizations for large-protein complexes. In most cases, however, image aligning and averaging relied on manual image stacking by hand, imposing a predefined structural model (19), or pixel-binning the coordinate sets (20). Algorithms for model-free averaging of coordinate-based images only have been discussed recently (19).

We incorporate our coordinate-based definition of correlation into an extensively used single-particle cryoEM reconstruction strategy (21). In this method, the sum of all coordinates acted as the initial reference and translational and rotational transformations were applied to individual particles to maximize their correlation with the reference; then, the reference was updated with the sum of transformed coordinates. This procedure was iterated multiple times to result in a satisfactory alignment (Fig. 2*A*) (*Materials and Methods*). Compared with pixel-based implementations, the major advantage of using the coordinate-based image representations is that image transformation is straightforward and can be performed exactly without interpolation or loss of information. Moreover, to deal with the excessive noise in localization-based superresolution images, we added a density normalization to correlation calculation, which avoids misalignment caused by clusters with a high density of localization points (*SI Appendix, Note S3*). We validated this method by aligning simulated superresolution images of a nearly symmetric structure (*SI Appendix*, Fig. S1). With density normalization, the asymmetry of the structure was correctly resolved in the averaged image, even when the individual input images are highly noisy. This simulation also demonstrates the advantages of model-free alignment, which is not prone to artifacts from incorrect initial models (e.g., by mistakenly assuming a symmetric structure in this case).

As a demonstration, we applied this algorithm to DNA-PAINT images of a DNA origami structure (22) (Fig. 2*B*). We purposely used a very short imaging period (500 frames) so that each image is highly noisy because of the very limited number of localization points (Fig. 2*C* and *SI Appendix*, Fig. S2). The mean localization precision of this dataset is estimated to be 3.7 nm. By rotationally and translationally aligning and summing 133 images (no initial model was assumed), the underlying structure was precisely recovered (Fig. 2*D*). We further calculated the pair-distance distribution between the combined image and the ground truth from the origami design to quantify the alignment precision (Fig. 2*E*). The resulting correlation function has a peak width of 4.9 nm, which is comparable to the mean localization precision, indicating that our coordinate-based, model-free image alignment method has very high alignment precision. Our algorithm clearly outperforms pixel-based alignment using the single-particle EM image-processing software RELION (23) (*SI Appendix*, Fig. S3). We have also demonstrated its robustness against added background noise points (*SI Appendix*, Fig. S4).

### Frame-Pair Correlation for the Mapping of Molecule Diffusion.

Next, we show that by incorporating temporal information, pair-distance distribution analysis can be used to analyze the movement dynamics of biomolecules in living cells. By combining with photo-switching and single-molecule tracking, localization-based superresolution microscopy has allowed a high density of target molecules to be labeled and followed over time (albeit with short trajectories) (24), thus offering the opportunity to map the spatial heterogeneity of molecule diffusion and transport (9).

Tracking a moving molecule considers the displacement of fluorescent molecules from one camera frame to the next one to produce diffusion trajectories. Coincidently, a collection of all pair-wise displacements is exactly the cross-correlation function of molecule localizations in these two camera frames. Therefore, we define the frame-pair distance distribution FPD(ξ, η) as the average displacement map of molecules localized in two consecutive frames (shown in the radially averaged form):*F* is the total number of frames, *k* is the frame number indicating a single data acquisition time point, and *H*_{k,} _{k} _{+} _{1}(*r*, ∆*r*) is a histogram of distances between localizations in frame *k* with frame *k* + 1. This FPD is analogous to image correlation spectroscopy (25), particle image correlation spectroscopy (17), and the localization-specific spatiotemporal image correlation spectroscopy (26).

FPD describes the ensemble molecule diffusion activity within the area of analysis. For 2D Brownian diffusion, the resulting FPD distribution is a Gaussian peak centered at zero, with an SD representing the mean displacement (MD) per frame. MD is determined by the diffusion coefficient, *D*, the delay between two exposures, ∆*t*, and the localization precision, *p*:

To demonstrate FPD-based diffusion analysis, we took STORM images of *Drosophila* S2 cell membranes stained with a photo-switchable membrane dye, DiD-C_{18} (9) (Fig. 3). Because DiD is a small molecule that has a high diffusion speed in the membrane, we used a stroboscopic illumination scheme to reduce the motion blur from fluorophore diffusion within the exposure time (28). While the camera exposure time was 8.3 ms (∼121-Hz frames per second), we turned on the excitation laser only for 1/10th of the frame duration (0.83 ms). Moreover, by varying the time point of strobing within a frame, we were able to access subframe temporal resolution. Specifically, we turned on the laser at 8/10th of the frame duration for even frames and at the beginning of a frame for odd frames (Fig. 3*A*). As a result, the effective time lag from even to odd frames was only 0.2 frames (1.7 ms) and 1.8 frames (15 ms) from odd to even frames. We then computed the frame-pair distance distribution for these two different sets of frame pairs as both 2D histograms and radially averaged 1D curves (Fig. 3 *D*–*F*). Both curves can be fitted with a Gaussian function, hinting normal Brownian diffusion for DiD in the plasma membrane. In this case, the mean squared displacement (MSD) of DiD between frames can be calculated from the square of the SD of the fitted peak (which we define as MD). Benefiting from the high localization density, we were able to produce a spatial map of MD by binning the STORM image and fitting the frame-pair distance distribution of each spatial bin. The MD maps appear rather homogenous across the cell plasma membrane (Fig. 3 *G* and *H*). We note that non-Brownian diffusion results in a non-Gaussian FPD histogram, but the MSD value can still be easily calculated from the baseline-subtracted histogram based on its definition.

With additional sets of measurements at lag times of 5.8 and 10.8 ms, we were able to construct an MSD curve in the time range of 1.7–14.9 ms (Fig. 3*I*), enabling different diffusion modes to be distinguished. Linear fitting indicates that DiD undergoes largely simple Brownian diffusion at this time scale. The nonzero intercept at zero time lag should be attributed to single-molecule localization uncertainty, while the slope of the fitting corresponded to a diffusion coefficient of 0.2 μm^{2}/s, similar to the previous measurement of DiI/DiD diffusion in cell plasma membrane (29). Our result of simple Brownian diffusion is consistent with previous stimulated emission depletion (STED)-FCS studies of simple lipids (30). In fact, our variable-time strobe method probes into a comparable temporal (1.7 ms) and subdiffraction spatial scale as STED-FCS (31), and it has the similar capability to resolve heterogeneous diffusion behavior by fitting the frame-pair correlation curve with a multispecies diffusion model. Therefore, it offers a complementary method to investigate lipid diffusion and membrane heterogeneity at high spatiotemporal resolution, with STED-FCS potentially offering a higher temporal resolution and variable-time strobe proving to be an easier way for spatial mapping.

### Point-to-Set Correlation Function for Structural and Colocalization Analysis.

Finally, exploiting the equivalence between the distance histogram and the correlation function of coordinate-based superresolution images, an alternative definition of distances can be incorporated in our framework for quantifying the spatial relationship between more complicated structures. Specifically, by considering the distance between individual point A and all points in B as a set, *B*, we used the histogram of these point-set distances to define a point-set correlation function. Mathematically, the distance between a point and a set of points is defined as the distance between this point and its nearest neighbor in the set. We calculate the histogram of this distance for all A points, *A*):*SI Appendix, Note S6*).

To illustrate the utility of the point-set correlation concept, we analyzed the spatial relationship between the cis-Golgi protein GRASP65 and either the cis-Golgi protein GM130 (Fig. 4*B*) or the trans-Golgi protein TGN46 (Fig. 4*C*) (32). We used the DBSCAN method (33, 34) to automatically identify Golgi ribbons in either the GM130 or TGN46 channel and isolate the structures from background noise, calculated the point-set and point-point correlation functions between GRASP65 and Golgi ribbons, and then subtracted the cross-talk contribution from the correlation functions (Fig. 4*D*). For GRASP65 and GM130, which were shown biochemically to interact with each other (35), the point-set correlation displayed a sharp peak at zero distance, indicating strong colocalization. On the other hand, while the point-point correlation peak was broader (reflecting the width of the Golgi ribbons), it does provide short-distance information by showing a dip at zero distance. This dip may be due to spatial exclusion of antibodies, differential accessibility of the two epitopes, or imprecise cross-talk subtraction. For GRASP65 and TGN46, which should exist in parallel, nonoverlapping structures, the point-set correlation displayed a peak at ∼200–300 nm, reflecting the distance between cis- and trans-Golgi ribbons. In contrast, the point-point correlation was completely smeared by the length of the Golgi ribbons, making it much less informative than the point-set correlation in characterizing the spatial relationship between these two parts of Golgi.

## Conclusion

In summary, we have demonstrated the utility of our coordinate-based correlation analysis framework in quantitative interpretation of localization-based superresolution microscopy data in a number of cases: image alignment, tracking of molecular diffusion, and quantification of colocalization. We have also shown the generality and flexibility of this framework in expanding into the time domain (frame-pair correlation) and adoption of alternative definitions of distance (point-set correlation). Although we described our framework and algorithms in 2D representation, they can be easily extended to 3D superresolution microscopy. The resulting correlation function will be a set of points at the tip of 3D distance vectors between two localizations. Even though in most cases the localization precision in the axial direction is different from that in the lateral direction, this anisotropy can be handled by transforming the uncertainty cloud together with image transformation. We expect that the analysis methods described here, as well as model-free superresolution image alignment and fast diffusion analysis by strobe illumination with variable timing, will be broadly useful in practical applications of superresolution microscopy in various biological systems. Finally, we envision that, in many cases, the most efficient algorithm will be a hybrid of coordinate- and pixel-based approaches (*SI Appendix, Note S7*).

## Materials and Methods

### Roto-Translational Image Alignment.

The experimental conditions for this dataset were described previously (22). A set of 133 images of the origami structure was picked by hand (*SI Appendix*, Fig. S1). The alignment procedure was performed as follows. First, coordinates of individual images were shifted so that their center of mass overlaps. Then, several iterations of the following procedure were applied:

*i*) For each individual image, find the translation and rotation that maximizes the cross-correlation of this image with the sum of all other images. For that matter, a brute force-maximizing algorithm was used that samples the cross-correlation function at discrete steps of the rotation angle and the two translational dimensions. For a given rotation, the translational cross-correlation was evaluated as the sum of Gaussians, centered at the tip of pairwise distance vectors with a width of the combined localization uncertainty (theoretically estimated as in ref. 36). Note using a typical optimization problem solver (even the Jacobian can be supplied as the gradient of the sum of Gaussians).

*ii*) Apply the correlation-maximizing rotation and translation to each image, so that a new and improved image is formed when taking the sum of all images. For the alignment shown in this article, 10 iterations of this procedure were applied with a 5-nm translational grid (±25 nm around the center of mass in each dimension) and a 5° rotational step size (covering the complete 360°). The resulting aligned images were further optimized by three iterations with a 2-nm translational grid (±10 nm around the center of mass) and a 2° rotational step size (again covering 360°). To avoid the issue that image alignment for localization microscopy is strongly dominated by local clusters comprising a relatively high number of localization points, during correlation calculation we weighted the displacement vector against the local density of the origin point (*SI Appendix, Note S3*).

### Single-Molecule Localization Microscopy of DiD in S2 Cells.

WT S2 cells were cultured in Sf-900 II serum-free medium (Gibco) and plated onto 35-mm glass-bottom dishes (no. 1.5 cover glass; MatTek) for fluorescence imaging. To facilitate a spread-out cell morphology, the cover glass was coated with 0.1 mg/mL Con A for 0.5 h before cell plating. The cells were then washed (3×, PBS), fixed [4% paraformaldehyde (PFA), 10 min], incubated in DiD (1 µM, 30 s), and washed again (3×, PBS). For images, cells were mounted in PBS with the addition of 100 mM mercaptoethylamine (pH 8.5) and 5% glucose and 1% oxygen scavenging solution [0.5 mg/mL glucose oxidase (Sigma-Aldrich) and 40 mg/mL catalase (Roche Applied Science)]. Single-molecule localization data were acquired on a microscope, as described previously (37), with a 121-Hz camera frame rate, a view field of 128 × 128 pixels, and an EM gain of 100. For DiD excitation, the 642-nm laser power was ∼74 mW measured at the back port of the microscope. The laser was shuttered in synchronization with the camera frames as shown in Fig. 3*A*.

### STORM and Image Analysis of Golgi Proteins.

RPE cells were seeded overnight in a Lab-Tek II eight-well–chambered no. 1.5 cover glass (Nunc), washed twice with warm PBS, and fixed with warm 4% PFA for 15 min at 37 °C. Then, cells were washed again (3×, PBS) and blocked with 3% BSA and 0.5% Triton-X 100 in PBS for 15 min. Primary antibody (Rabbit Anti-GRASP65, ab30315; Abcam; Sheep Anti-TGN46, AHP500GT; AbD Serotec; Goat Anti-GM130, sc-16268; Santa Cruz; 1–5 µg/mL) staining was performed overnight in 3% BSA at 4 °C, followed by washing (3×, PBS) and secondary antibody staining (1–5 µg/mL) in 3% BSA for 1 h. Cells were washed gain (3×, PBS) and postfixed with 3% PFA + 0.1% gluteraldehyde in PBS before final washing (3×, PBS). Secondary antibodies were labeled with a mixture of two fluorophores (either Alexa Fluor 405 and Alexa Fluor 647, or Cy3 and Alexa Fluor 647) for activator-reporter type two-color STORM imaging (14). STORM acquisition was performed on a microscope, as described previously (37). For display, nonspecific blinking was removed from the STORM images (1). The point-set distance distribution was calculated as follows. First, the localization points belonging to Golgi ribbon structures were identified and isolated from background noise points. For this purpose, DBSCAN, a well-known density-based clustering method (33) that has been previously applied to segmentation of localization-based superresolution images, was used. DBSCAN requires two parameters: neighborhood radius and the number of minimum points within this radius to qualify as in a cluster. It starts with an arbitrary starting point that has not been visited. If the neighborhood of this point contains a sufficient number of points, a cluster is then started and then propagated until reaching the boundary points whose neighborhood falls below the point number threshold. Considering the size and shape of Golgi ribbons, we set the neighborhood radius to be 300 nm. The threshold for minimum points was set to be 30 and 110 for GM130 and TGN46 reference images, respectively, so that the segmentation results best matched visual examination of the images. Although, in practice, the two parameters for cluster identification do need to be optimized for the shape of the structure, labeling density, and acquisition conditions, we have found that small variations in these parameters do not substantially change the segmentation results. The clustered molecules are defined as the “set” for the following computation, whereas points not included in any clusters were labeled as background noise localizations and were excluded in further analysis. For the point-set distance distribution, the GRASP65 density was calculated as a function of distance r to the reference set (GM130 or TGN46). To facilitate the computation of the area of each distance shell around the reference set, the calculation was performed based on pixelated STORM images (20-nm pixel size) where the pixel value is equal to the number of localizations in that pixel. The GRASP65 pixel image was subtracted by a pixel image from nonspecific blinking events (which was divided by the ratio of specific to nonspecific frames beforehand) (1, 14), allowing negative values to preserve statistical effects. The image of the reference set was convolved with a centered logical disk of radius r (each pixel is either unity if its distance to the image center is <r and zero otherwise). The resulting convolution is a logical map of pixels that are closer than r to the reference set. Based on logical maps for a range of distances r, the cumulative point-set distance histogram was calculated by summing up all pixels in the GRASP65 image where the logical map is equal to or larger than unity. Besides the localization histogram, the corresponding cumulative area was calculated by summing up all nonzero pixels in the logical map. The point-set distance distribution was then calculated by taking the discrete difference of the cumulative localization histogram and dividing the result by the discrete difference of the cumulative area.

### Code Availability.

Executable programs for particle alignment and diffusion analysis, the Python code for colocalization analysis, and all of the experimental dataset used in this article are available at huanglab.ucsf.edu/Data/Correlation.zip.

## Acknowledgments

We thank Francis Brodsky and Yanzhuang Wang for their help with Golgi staining and imaging, Ralf Jungmann and Peng Yin for generously providing the DNA-PAINT data, and Eugene Palovcak and Yifan Cheng for assistance in image alignment using the RELION software. This project is supported by the NIH Director’s New Innovator Award (DP2OD008479). J.S. acknowledges support from a Boehringer Ingelheim Fonds PhD fellowship, and S.Z. thanks the School of Life Sciences, Peking University, for internship support. B.H. is a Chan Zuckerberg Biohub investigator.

## Footnotes

↵

^{1}J.S. and Y.W. contributed equally to this work.- ↵
^{2}To whom correspondence should be addressed. Email: bo.huang{at}ucsf.edu.

Author contributions: J.S. and B.H. designed research; J.S., Y.W., S.Z., M.B., T.N., and B.C. performed research; J.S., Y.W., and B.H. contributed new reagents/analytic tools; J.S., Y.W., and B.H. analyzed data; and J.S., Y.W., and B.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1711314115/-/DCSupplemental.

Published under the PNAS license.

## References

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Löschberger A, et al.

- ↵
- Szymborska A, et al.

- ↵
- ↵
- Shim S-H, et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- Bates M,
- Huang B,
- Dempsey GT,
- Zhuang X

- ↵
- ↵
- ↵
- ↵
- ↵
- Broeken J, et al.

- ↵
- ↵
- Park W,
- Midgett CR,
- Madden DR,
- Chirikjian GS

- ↵
- ↵
- ↵
- ↵
- ↵
- Ashdown G,
- Pandžić E,
- Cope A,
- Wiseman P,
- Owen D

*J Vis Exp*, e53749. - ↵
- Shuang B,
- Chen J,
- Kisley L,
- Landes CF

- ↵
- Elf J,
- Li G-W,
- Xie XS

- ↵
- ↵
- ↵
- ↵
- ↵
- Simoudis E,
- Han J,
- Fayyad U

- Ester M,
- Kriegel H-P,
- Sander J,
- Xu X

- ↵
- ↵
- ↵
- ↵

## Citation Manager Formats

## Sign up for Article Alerts

## Article Classifications

- Physical Sciences
- Applied Physical Sciences

- Biological Sciences
- Biophysics and Computational Biology