Nanometer-accuracy distance measurements between fluorophores at the single-molecule level

Significance
Measurements of macromolecular shapes provide insight into the mechanism of molecular machines. Distance measurements at the scale of biological macromolecules are often pursued by single-molecule fluorescence techniques. However, while single-molecule Förster resonance energy transfer can estimate distances of less than 8 nm, distances on the scale of 8 to 25 nm are difficult to determine. Here, we report two-color fluorescent distance measurement techniques capable of determining distances with ∼1-nm accuracy over a wide range of length scales. These methods can be implemented in high throughput on commonly available microscopes. As an example of their utility, we used our methods to uncover an unexpected conformational change in the antiparallel coiled-coil stalk of the dynein motor domain in different nucleotide states.


Fig. S2 | Comparison of target registration error (TRE) and fiducial registration error (FRE)
shows that FRE is unreliable and that TRE should always be reported as the registration error. The target registration error (TRE) reports the residual distance (ideally 0) for fiducials other than the points used to create the registration map (2), and is therefore a more critical metric than the fiducial registration error (FRE), which is computed on the same fiducials used to build the map. Overall, the optimal parameter settings for piecewise affine maps are a minimum of 10 and a maximum of 100 fiducial points at a maximum distance of 2 μm (Table S4).
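To make the FRE/TRE distinction concrete, the following minimal sketch (illustrative names and numbers, not the registration code used here) fits an affine map on one set of simulated fiducials and evaluates the residual error both on those fiducials (FRE) and on held-out fiducials (TRE):

import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map (2x2 matrix plus offset) from src to dst."""
    X = np.hstack([src, np.ones((len(src), 1))])        # N x 3 design matrix
    coeffs, *_ = np.linalg.lstsq(X, dst, rcond=None)    # 3 x 2 coefficient matrix
    return coeffs

def apply_affine(coeffs, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ coeffs

def rms_error(a, b):
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

# simulated fiducial positions (nm) in two channels with a shift and 5 nm localization noise
rng = np.random.default_rng(0)
ch1 = rng.uniform(0, 50_000, size=(200, 2))
ch2 = ch1 + np.array([35.0, -20.0]) + rng.normal(0, 5.0, size=ch1.shape)

fit_idx, test_idx = np.arange(0, 100), np.arange(100, 200)
coeffs = fit_affine(ch2[fit_idx], ch1[fit_idx])

# FRE: residual on the fiducials used to build the map (optimistic)
fre = rms_error(apply_affine(coeffs, ch2[fit_idx]), ch1[fit_idx])
# TRE: residual on independent, held-out fiducials (the number to report)
tre = rms_error(apply_affine(coeffs, ch2[test_idx]), ch1[test_idx])
print(f"FRE = {fre:.1f} nm, TRE = {tre:.1f} nm")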
Details about fitting parameters are in Table S4. Assuming the true position of Cy5 is known, each measurement that finds Cy3 inside the circle (of radius d centered on the Cy5 position) yields a distance smaller than d, and measurements finding Cy3 outside the circle yield distances larger than d.
Integrating the intensities of the blue (Cy3) molecule inside and outside the circle shows that the total intensity outside the circle is higher than inside. Consequently, the probability of measuring a distance larger than d is higher than that of measuring a distance smaller than d. Halos represent the position/distance uncertainty. (b) The probability distribution (Eq. 2) plotted for various combinations of the calculated distance μ and the distance uncertainty σ_d shows that small variations in σ_d lead to large changes in the estimate of μ.

Quantification of 20-nm DNA-origami nanorulers with 5-10 Cy3 dyes and 5-10 Alexa 647 dyes whose centers of mass are 20 nm apart (a-f), and of TetraSpeck™ beads (g-l). For the intensity as well as the background we expect a linear increase with increasing radiant exposure.
We expect a deviation from this linear behavior only if, for instance, photobleaching occurs faster than the acquisition time or a pixel becomes saturated. Blue dots show values for channel 1 (Cy3 or Cy3-like dye(s)) and red dots for channel 2 (Cy5 or Cy5-like dye(s)). (a, g) Intensity in number of photons as a function of radiant exposure (Table S2, Table S6). Details about fitting parameters are in Table S4.

Here, not 100 particles but 1,000 (b, d, f) and 10,000 (c, e, g) particles were used. We evaluated the performance of Vector-P2D (red) and Vector (grey) by calculating the distance discrepancy. Values around -1.0 correspond to cases in which distances of ~0 nm were measured very reproducibly; this is an example of a precise yet highly inaccurate measurement. Large error bars typically indicate bimodal cases for which we measured both distances that are similar to the expected distance and distances that are much smaller than the expected distance. Hence, the increasing size of the error bars with increasing σ_d/d ratio shows that the fitting outcome becomes more and more bimodal until it collapses to one side (measuring distances of around 0 nm). (h, i) Since we always used a true distance of 10 nm in all our simulations, we also tested whether Vector-P2D can resolve distances of 2 nm, 10 nm, 20 nm, 50 nm, 100 nm, 200 nm, and 500 nm. We therefore used Monte Carlo simulated data with 1,000 particles, a ratio of distance uncertainty over distance of 2.2, either 5 (h) or 20 frames (i), and the different distances listed above. For each condition we created 100 datasets.
We then used Vector-P2D to determine the distance for all datasets and calculated the average distance for each condition. We used these data to calculate the correlation between true and measured distance by determining the slope and found 1.00 for both the 5- and 20-frame data, which indicates perfect agreement between true and measured distance. Error bars show the standard deviation.

The table in Fig. S16 summarizes the conditions (σ_d/d ratio, number of particles, and number of frames) up to which Sigma-P2D or Vector-P2D still deliver reliable results (defined as an average distance discrepancy of less than 20% from the true distance with a standard deviation of less than 30% of the true distance).
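As a rough illustration of the distance-range simulations in (h, i) above (a sketch with assumed parameter choices and function names, not the simulation code used for Fig. S12h, i), per-frame difference vectors can be generated for each particle, vector averaged, and the resulting distances compared to the true distance:

import numpy as np

rng = np.random.default_rng(1)

def simulate_vector_distances(true_d, sigma_d, n_particles, n_frames):
    """Vector-averaged distance per particle for a sample homogeneous in distance.

    sigma_d is the single-frame distance uncertainty; each coordinate of the
    per-frame difference vector is drawn with standard deviation sigma_d/sqrt(2)
    (an assumption) so that the combined 2D uncertainty is roughly sigma_d.
    """
    s_xy = sigma_d / np.sqrt(2)
    dx = true_d + rng.normal(0, s_xy, size=(n_particles, n_frames))
    dy = rng.normal(0, s_xy, size=(n_particles, n_frames))
    # vector averaging over frames reduces the noise by sqrt(n_frames);
    # Vector-P2D would then fit the P2D distribution (Eq. 2) to these distances
    return np.hypot(dx.mean(axis=1), dy.mean(axis=1))

for true_d in (2, 10, 20, 50, 100, 200, 500):                      # nm
    d = simulate_vector_distances(true_d, sigma_d=2.2 * true_d,
                                  n_particles=1000, n_frames=20)
    print(f"true = {true_d} nm, mean vector-averaged distance = {d.mean():.1f} nm")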

Filter Data
Nothing selected

Positions
Always all, except for registration maps, where the data are split to create affine and piecewise affine maps so that ~1,000 beads can be used for the affine map; the remainder is used for the piecewise affine map.

Skip Channels
Not selected

Color centers of TetraSpeck™ beads do not overlap
For TetraSpeck™ beads the registration precision (σ_x, σ_y) is worse than one would expect based on their localization errors σ_l (σ_loc_1, σ_loc_2 and their variances σ_σ(loc_1), σ_σ(loc_2)) (Fig. S5). To assess whether this discrepancy is due to our image registration procedure or an intrinsic property of the beads, we imaged individual beads translated in a 30x30 grid pattern and noticed that most beads have a consistent x-y distance offset, independent of the position in the image. To remove this bead-intrinsic contribution of the test sample, we turned to a single biotinylated Cy3/Cy5 dsDNA construct, which has been shown to have no distance variation across molecules and to be of zero distance (5). For this sample we found very good agreement between μ_x and σ_l as well as μ_y and σ_l (Fig. S7).
Thus, these experiments suggest that the sample's localization error σ l can almost solely account for the registration precision σ reg as long as the sample is uniform in distance and the registration accuracy is high (< 1 nm).

Supplementary Information Note 2
The two-dimensional probability distribution (9) is given by

p(r | μ, σ_d) = (r / σ_d²) · exp(−(μ² + r²) / (2σ_d²)) · I₀(rμ / σ_d²),   (Eq. 2)

in which r is the measured Euclidean distance, μ the calculated distance, σ_d the distance uncertainty, and I₀ the modified Bessel function of integer order zero. For σ_d ≫ μ and r of similar order as, or less than, σ_d, the argument rμ/σ_d² is small, so that I₀(rμ/σ_d²) ≈ 1 and exp(−μ²/(2σ_d²)) ≈ 1, and we can find the following approximation:

p(r | μ, σ_d) ≈ (r / σ_d²) · exp(−r² / (2σ_d²)).   (Eq. 4)

Thus, the approximation for σ_d ≫ μ of the probability distribution (Eq. 2 / 4) is independent of μ.
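For reference, this distribution and its large-σ_d limit can be transcribed directly; the following minimal Python sketch (function names are illustrative, and numerical overflow is avoided with the exponentially scaled Bessel function) evaluates Eq. 2 and Eq. 4:

import numpy as np
from scipy.special import i0e   # exponentially scaled Bessel function I0

def p2d(r, mu, sigma_d):
    """Eq. 2: p(r | mu, sigma_d) = (r / sigma_d^2) exp(-(mu^2 + r^2) / (2 sigma_d^2)) I0(r mu / sigma_d^2)."""
    s2 = sigma_d ** 2
    # i0e(x) = exp(-x) * I0(x); folding exp(x) back into the exponent avoids overflow
    return (r / s2) * np.exp(-(mu - r) ** 2 / (2 * s2)) * i0e(r * mu / s2)

def p2d_large_sigma(r, sigma_d):
    """Eq. 4: the sigma_d >> mu limit, a Rayleigh-type distribution independent of mu."""
    s2 = sigma_d ** 2
    return (r / s2) * np.exp(-r ** 2 / (2 * s2))

r = np.linspace(0, 50, 200)
print(np.allclose(p2d(r, mu=0.1, sigma_d=10.0), p2d_large_sigma(r, sigma_d=10.0), rtol=1e-2))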

Predicted and measured localization errors correlate well
Sigma-P2D can only be used with experimental data if the fluorophores' localization errors can be determined with high accuracy, because imprecise predictions lead to incorrect distance estimates (Fig. S8b). To test the available theoretical predictions of localization errors, we investigated the standard deviation of localization errors (σ_σ(loc_1), σ_σ(loc_2)) for fluorescent probes with a cluster of up to ten Cy3 and Alexa 647 dyes (20-nm nanorulers), and for TetraSpeck™ beads. We found that localization errors follow a probability distribution (Fig. S9) that depends, amongst other things, on the illumination pattern and the number of fluorophores per particle. Thus, in order to plot a Sigma-P2D fit for many molecules, the distance uncertainty has to be adjusted to

σ_d-adj = √(σ_reg² + σ_loc²).   (Eq. 11)

Note that the distance μ is still determined as described above by performing MLE for individual (single) particles using equations 2 and 3; we only need this adjusted distance uncertainty σ_d-adj when plotting a fit for a histogram with distances of many particles.
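A minimal sketch of such a per-particle MLE with σ_d held fixed (a Sigma-P2D-style fit; the use of scipy.optimize and the example numbers are assumptions, not the implementation used in μManager):

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import i0e

def neg_log_likelihood(mu, r, sigma_d):
    """Negative log-likelihood of Eq. 2 (exponentially scaled Bessel for numerical stability)."""
    s2 = sigma_d ** 2
    logp = np.log(r / s2) - (mu - r) ** 2 / (2 * s2) + np.log(i0e(r * mu / s2))
    return -np.sum(logp)

def fit_mu(r, sigma_d):
    """MLE of the distance mu for one particle, with sigma_d held fixed."""
    res = minimize_scalar(neg_log_likelihood, args=(r, sigma_d),
                          bounds=(0.0, r.max() + 5 * sigma_d), method="bounded")
    return res.x

# adjusted distance uncertainty (Eq. 11), needed only for plotting an ensemble-histogram fit;
# sigma_reg and sigma_loc are assumed example values (nm) for the combined localization error
sigma_reg, sigma_loc = 1.0, 3.0
sigma_d_adj = np.sqrt(sigma_reg ** 2 + sigma_loc ** 2)

# example: 20 per-frame distances of a single particle, true distance 10 nm, sigma_d = 3 nm
rng = np.random.default_rng(2)
r = np.hypot(10 + rng.normal(0, 3 / np.sqrt(2), 20), rng.normal(0, 3 / np.sqrt(2), 20))
print(round(fit_mu(r, sigma_d=3.0), 1), round(sigma_d_adj, 2))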
Next, we measured localization errors at various radiant exposures and compared these to the theoretically predicted values. We measured the variations in probe position in a time-lapse sequence to determine the experimental localization error. We determined the localization error of each probe by averaging the variance of its pairwise distance with each of the other probes in the image over more than 120 frames (see Materials and Methods). The error predicted by the MLEwG fit for a single probe was calculated from the average pairwise localization error of that probe with all other probes from the same dataset. Interestingly, the measured and predicted localization errors correlate well (R² ≥ 0.8) at low radiant exposures (Fig. S11) but poorly for one of the two samples at high radiant exposures (Fig. S11f). We discovered that the poor correlation was due to intrasample distance changes during the measurements, likely arising from photobleaching (Fig. S11). To test this further we performed Sigma-P2D on single molecules from the same dataset (Fig. S11g-l) and noticed that the discrepancy between measured and predicted localization errors for samples with few fluorophores at high radiant exposures is mainly due to intrasample distance changes (bleaching of dyes and changes in the center-of-mass distance between the two colors) during measurements (Fig. S11k, l).
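A sketch of this pairwise-variance estimate (the array layout and the assumption of similar errors for all probes are illustrative simplifications, not the exact implementation):

import numpy as np

def pairwise_localization_error(positions):
    """Estimate each probe's per-coordinate localization error from pairwise distance variances.

    positions: array of shape (n_frames, n_probes, 2) with x/y positions in nm.
    For well-separated probes, the variance of their distance over time is roughly the
    sum of their squared per-coordinate localization errors, so averaging over partners
    and halving gives a per-probe estimate (assuming similar errors for all probes).
    """
    d = np.linalg.norm(positions[:, :, None, :] - positions[:, None, :, :], axis=-1)
    var = d.var(axis=0)                     # (n_probes, n_probes) pairwise distance variance
    np.fill_diagonal(var, np.nan)
    mean_var = np.nanmean(var, axis=1)      # average over all partner probes
    return np.sqrt(mean_var / 2)

rng = np.random.default_rng(3)
true_pos = rng.uniform(0, 10_000, size=(1, 30, 2))             # 30 probes in a field of view
frames = true_pos + rng.normal(0, 4.0, size=(150, 30, 2))      # 4 nm error, 150 frames
print(pairwise_localization_error(frames).round(1))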
Hence, outliers for which the predicted and measured localization errors do not match are likely caused by sample imperfections. Taken together, the localization errors predicted by the MLEwG fit agree well with the measured localization errors and can therefore be used as input for Sigma-P2D fits of experimental data.

Supplementary Information Protocol
In the following sections we provide a detailed protocol for data acquisition, fitting of emitters, image registration, and data analysis (distance determination) in μManager (1).

Supplementary Information -Reviewers Comments
Reviewer #1's comments and author response: Comments to Author: In this manuscript, Niekamp et al. introduce analysis methods that advance the state of the art for measuring molecular distances by colocalizing two different fluorophores. The performance of the methods is analyzed using simulations, validated using single-molecule experiments, and applied to a novel measurement addressing an outstanding question in the dynein field. Building on prior work from the Spudich and Flyvbjerg groups, they show that the useful range of SHReC can be extended to shorter distances using two approaches: fits to distance distributions can be improved by incorporating independent information about localization error ("Sigma-P2D"), and vector-averaging over multiple frames can be productively combined with fits to rigorous distance distributions ("Vector-P2D"). Combined with optimized registration methods presented here, these approaches convincingly enable robust determination of mean fluorophore-to-fluorophore distances in the 8-20 nm range using a standard TIRF microscope. While the technical insights here might each be considered incremental in isolation, the overall outcome is an impressive and important advance that will have a strong impact on the single-molecule measurement field, enabled by the carefully detailed methodology documented by the authors and the software they will make available to the community. The kinesin measurement in particular provides a striking example of an accurate SHReC measurement at short distance scales. The dynein capstone is a good example of a problem that is not easily addressed using other available measurements, and the authors appropriately perform extensive complementary experiments to test their interpretation of the SHReC result. This is a strong paper and the evidence largely supports the conclusions drawn (I would have entered "mostly" for question 6 if the option were available). I think it could be further improved by addressing the following questions.
1. Although the authors introduce Vector-P2D motivated by applications to heterogeneous samples, they also recommend using Vector-P2D for homogeneous samples in cases where enough frames are available for each molecule. For high enough values of sigma/d Vector-P2D will of course suffer from the same category of problem as P2D, yielding erroneous distance measurements (Fig. 4c), which the authors could discuss. In the special case of homogeneous samples, is there an opportunity for further improvement by combining the ideas in Sigma-P2D and Vector-P2D, making "Sigma-Vector-P2D"? That is, with independent determination of localization error and propagation through the vector averaging over frames, can the width of the distribution be removed as a fitting parameter in order to push the usability of Vector-P2D to higher values of sigma/d? Having introduced these two approaches it would seem natural to combine them; it would be great to see a discussion of the feasibility and/or utility of that.

We fully agree that Vector-P2D will encounter similar limitations as P2D with higher ratios of sigma/d. However, for the case of homogeneous samples, the vector averaging approach reduces the width of the distribution significantly (see Fig. 4b). Thus, Vector-P2D can tolerate higher ratios of sigma/d than P2D can (see Fig. 4 and Fig. S12). We now commented on this more extensively in the discussion section (see page 15) and when introducing Vector-P2D (see page 10). Also see point 4.
We implemented and tested the suggested "Vector-Sigma-P2D" method. We created 100 Monte Carlo simulated datasets for samples that are homogeneous in distance with either 5 or 20 frames and 1,000 particles. Using these, we performed the regular Sigma-P2D fitting and "Vector-Sigma-P2D". For the "Vector-Sigma-P2D" we first calculated the vector average distance for each particle (as in Vector-P2D) and propagated (standard error of the mean) the corresponding localization errors so that these could be used in a subsequent Sigma-P2D fit as a fixed parameter. As expected, the distance distributions of "Vector-Sigma-P2D" are narrower than the distance distributions of Sigma-P2D. Overall, the "Vector-Sigma-P2D" fit results in smaller error bars than the Sigma-P2D fit. However, the average distance discrepancy for "Vector-Sigma-P2D" was shifted to higher values for larger ratios of distance uncertainty over distance (sigma/d) and thus less accurate than Sigma-P2D (see figure  below). Hence, we decided not to include "Vector-Sigma-P2D" as a method in our manuscript.
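For clarity, a sketch of the averaging and propagation step described above (illustrative names and example numbers; not the exact code used for this test):

import numpy as np

def vector_average_with_sem(dx, dy, sigma_d_frames):
    """Per-particle vector-averaged distance and propagated distance uncertainty.

    dx, dy:          (n_particles, n_frames) per-frame x/y position differences
    sigma_d_frames:  (n_particles, n_frames) per-frame distance uncertainties
    The propagated uncertainty is a standard-error-of-the-mean style estimate that
    is then held fixed in the subsequent Sigma-P2D fit.
    """
    n_frames = dx.shape[1]
    d_avg = np.hypot(dx.mean(axis=1), dy.mean(axis=1))
    sigma_avg = np.sqrt(np.mean(sigma_d_frames ** 2, axis=1) / n_frames)
    return d_avg, sigma_avg

# illustrative example: 1,000 particles, 20 frames, true distance 10 nm, sigma_d = 3 nm
rng = np.random.default_rng(4)
s_xy = 3.0 / np.sqrt(2)
dx = 10 + rng.normal(0, s_xy, (1000, 20))
dy = rng.normal(0, s_xy, (1000, 20))
d_avg, sigma_avg = vector_average_with_sem(dx, dy, np.full((1000, 20), 3.0))
print(round(float(d_avg.mean()), 2), round(float(sigma_avg[0]), 2))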
We note that performing the "Vector-Sigma-P2D" the other way round, that is, first performing Sigma-P2D and then vector averaging ("Sigma-Vector-P2D"), does not work. Say one performs Sigma-P2D on individual particles that were imaged for 20 frames to determine the distance of each particle separately. Taking all of these distances of individual particles is then equivalent to single-frame observations and thus cannot be fit with the vector averaging method, since it requires two frames or more. Nevertheless, one could perform another P2D fit on this data, in which the distance and the distance uncertainty are both used as fitting parameters. This could be called something like "Sigma-P2D-P2D". We also tested this method and it performed worse than Sigma-P2D alone, and thus we did not include it in the manuscript.
2. Throughout the paper the authors include plots of average discrepancy vs sigma/d, averaged over repeated simulations, to evaluate various analysis methods. I don't think it is explicitly stated what the error bars in these plots represent; are these the standard deviations over repeated simulations? This should be labeled clearly as it is an important metric for readers considering the applicability of the methods to different situations. The authors could also consider providing insets that focus on the most experimentally relevant regions of these plots, which cover a wide range of sigma/d and associated discrepancies.
This is a very good point, as we forgot to list how the errors were calculated. Indeed, the error bars in Fig. 3 and 4 as well as Fig. S12, S13, S17 and S18 are the standard deviation over the repeated simulations. We added a sentence to each figure caption. We evaluated figures with insets; however, these became too busy and difficult to read.
3. In the plots described above, the authors consistently average results over just 10 simulations per data point. For many cases, over the wide parameter ranges presented, this is not enough sampling to determine the average discrepancies very well, so there is a lot of scatter in large regions of the plots. If the goal is really to precisely define the limits of these different approaches, why not use a much larger number of simulations and precisely determine the average (and standard deviation of) distance discrepancies for each data point?
We now generated more Monte Carlo simulated data (100 instead of 10 datasets). We then performed comparisons between different methods and updated the corresponding figures. We did this for the comparison between Sigma-P2D and P2D (Fig. 3c), Vector-P2D and Vector (Fig. 4c, Fig. S12 b-g, S13 (performance as function of sample heterogeneity)), the single-molecule analysis between Sigma-P2D and Vector (Fig. S17), and the comparison between MLE and NLLSQ fitting (Fig. S18).
4. The reader is left to gauge the domains of applicability by gazing at the plots described above, which, as noted, often show scatter due to undersampling. It would be valuable for the authors to summarize guidance about when different analyses are applicable, depending on sigma/d ratios and numbers of particles.
We tried to provide some guidance with Fig. S16 and in the Discussion. We now extended this part in the Discussion (see pages 14-17) and defined some more stringent parameters (cut-offs) up to which each method is still good to use. Therefore we defined measurements as reliable when they resulted in an average distance discrepancy of less than 20% from the true distance with a standard deviation of less than 30% of the true distance. We also added a table in Fig. S16 that shows under which conditions (number of particles, number of frames and sigma/d ratio) Vector-P2D and Sigma-P2D can be used.
5. The authors discuss anticipated applications to dynamic measurements, but much of the paper focuses on methods for extracting average values from ensembles rather than measurements on individual molecules. Can the authors comment at more length in the main text on which aspects of their advances will be helpful for dynamic measurements?
We now added a few more sentences in the Discussion section (page 16/17) where we discuss how our new methods could be advantageous for dynamic measurements of single molecules.
6. The issue above is briefly addressed in Fig. S17, where the authors compare Vector-P2D and Sigma-P2D for a single molecule. It should be clearly explained what "Vector-P2D" means for a single molecule here; is that just the single vector average over the frames recorded (in which case there is no "P2D" fit involved)? The specific claim of how many frames are required for Vector-P2D to do better than Sigma-P2D is not so convincingly presented here, again due to the very small number of simulations; certainly this plot could make good use of averaging over a larger number of simulations.
This is an important point and we failed to describe clearly what Vector-P2D means in the case of a single particle. Since only single particles are analyzed, Vector and Vector-P2D are equivalent, because there is only one data point that can be fitted with the P2D function after vector averaging. Thus, we only used the Vector method for this case. We reran all Monte Carlo simulations and now used 100 instead of 10 datasets (see point 3). Moreover, we compared Sigma-P2D and Vector for different conditions of distance uncertainty over distance (see Fig. S17). With the new data, we actually see that Sigma-P2D performs better than or at least equally well as Vector (smaller distance discrepancy) for all conditions when distance distributions of single particles and not ensembles are analyzed. Thus, our previous assessment that Vector can be better than Sigma-P2D for single particles was wrong, and using a larger dataset helped to clarify this point. We used this data also to provide a better guideline for users (see Fig. S16 and response to point 4). We also updated this in the discussion section (see page 15) and in the figure caption (Fig. S17).
7. The positions of fluorophores used for the authors' analysis are here determined using Gaussian fits to the image data. The authors should discuss the conditions (focus, dye mobility, etc.) under which this can be done without introducing important discrepancies in distances measured (and relate these conditions to the experiments performed). There is of course considerable literature (including work referenced here and work from e.g. the Moerner group) on how fixed dipoles or fluorophores with partially restricted mobility can lead to asymmetric PSFs and thus systematic localization errors when not appropriately fit.
We discussed the importance of focus, drift, and sample localization for the image registration process with TetraSpeck™ beads and now also added a similar discussion for the distance measurements (see page 15/16): As for the image registration, a high-quality autofocus system is essential for two-color distance measurements, since the image registration, and therewith the distance measurement, changes with focus. Thus, imaging the fiducial markers for image registration and the sample of interest on the same slide (Fig. S1) is necessary. Restricted dye mobility causes changes in the point spread function, leading to systematic localization errors and incorrect distance measurements. We observed a "normal" point spread function shape in all our samples, and also used intensity comparisons between linearly and circularly polarized light to ascertain full dye mobility.
8. Minor: I would advise caution in defining the domain of competing methods that the authors' analysis should be (favorably) compared to, given the large space of single-molecule methods that can be used for distance measurements. E.g., in Figure 1a the title "Current single-molecule localization techniques" could be replaced with "Current single-fluorophore colocalization techniques".
We fully agree and should have been more careful with terminology. We now changed Fig. 1a as suggested and also updated the terms throughout the manuscript.

Reviewer #1's comments after revision:
Stuurman and coworkers have improved their manuscript through substantial revisions that were very responsive to questions raised by both referees. I am satisfied that they have addressed the issues raised in my original review.

Reviewer #2's comments and author response:
Comments to Author: The authors devised a method to determine the registration error between two colors, then used these measurements to greatly improve on two different single-molecule analysis methods, creating a pipeline enabling distance measurements between fluorophores in the currently unresolved 8-25 nm range and beyond. Overall, this is an important advance that will facilitate a number of new studies that were previously not possible. This paper will be of general interest, as commendably, an emphasis has been placed on enabling broad implementation of this technique by others.
No additional experiments are necessary for publication; however, guidance on the limits of this technique (suggested by the authors' simulations) is essential for the rigorous adoption by others. Below are some clarifications that might assist in the reader's understanding of this paper. Additionally, a number of changes to figure legends would help comprehension of the supplemental material.
General comments: Many figures would be improved by captions motivating the experiments and a brief analysis of the results, especially in the supplement. Currently, experiments in the supplement are introduced in a combination of the main text, the methods section, and supplemental notes.
Consequently, it is difficult to follow what motivates many of these figures and what conclusions to take away from the data. Ideally, the figure legends would summarize the key conclusions from each portion of the figure.
As discussed in more detail in the specific comments below, it should be explained why the distance discrepancy always trends to -1 with no error term with increasing noise in the authors' simulations (see comments regarding figure 4 and S12).

We added a more detailed summary/discussion of the key conclusions to the figures and tried to make it easier to follow. For how we changed things in many figures, see the comments below.
Specific comments: Page 6: "With the corrected second fiducial marker dataset, we then calculated the target registration error (TRE)" The reference to figure S2 in the main text should refer to figure 2 instead; figure S2 deals with the comparison between TRE and FRE.
This is a good point. We made the change accordingly.

Regarding the ~20% offset of the measured distance from the predicted distance, we now added a clarification on page 11.
Page 13: "...we show that our techniques enable distance measurements from ~2 nm to hundreds of nanometers..." Please explain where these bounds come from and how those who utilize this technique can determine the bounds for their own input images.
As we have shown in Fig. 4c, the successful execution of Vector-P2D is insensitive to the actual distance but depends on the ratio of distance uncertainty over distance. In our simulations, we used a true distance of 10 nm. However, we tested whether Vector-P2D can also resolve distances of 2 nm, 10 nm, 20 nm, 50 nm, 100 nm, 200 nm, and 500 nm by Monte Carlo simulation but did not show the data. We now included these data in Fig. S12h, i. We also added a sentence about this in the manuscript (see page 11). Moreover, we may not have explained well enough why we used units of distance discrepancy [d] instead of an actual distance value in nanometers (as we could see from your comment for Fig. 4). We now added a clarification on page 9 and more detailed clarifications in the corresponding figure captions (Fig. 3, 4, S12, S13, S17, and S18).

Figure 2:
Please provide the histogram of target registration error before the piecewise affine correction (i.e. from the data in panels c and d). This will allow the reader to evaluate the contribution of the piecewise affine registration step to the overall registration performance.
We now added the histogram of the target registration error for the affine registration in Fig. 2.

Figure 4:
Panel c: In the 20 frame data, why does error bar size increase with increasing sigma (as expected) and then suddenly decrease at σd/d = 6 (see also comments on figure S12)? Can the authors comment if this reflects on some limit of the technique?
This is a good point and we should have clarified this better. The reason that the error bar disappears is that we evaluated the performance of Vector-P2D and Vector by calculating the average distance discrepancy. Therefore we subtracted the expected distance from the measured distance and normalized by the expected distance. Thus, values around -1.0 represent cases for which we measured 0 nm and for which we find very small error bars showing that this is very reproducible. Large error bars typically indicate bimodal cases for which we measured both distances that are similar to the expected distance and distances that are much smaller than the expected distance (around 0 nm). Hence, the increasing size of error bars with increasing sigma shows that the fitting outcome is becoming more and more bimodal until it collapses to one side (measuring distances of around 0 nm). We now added a better definition for the distance discrepancy (pages 9 and 11). Also, see comment to Page 13 and Fig. 4.

We now provide a label for the purple line and also defined the abbreviations in the figure caption. As for labeling which dynein is apo and which is ADP-Vi, we already added labels on top of each of the two dynein cartoons in Fig. 5a.

Figure 5 and associated main text: It took me a while to understand the same disorder/order connection the authors did from the difference between the distances measured and the lack of visible stalks in the cryo data. What I had to realize was that, while one can understand that disorder can make the stalk disappear in EM, in the SA-immobilized samples both ends are constrained; hence disorder reduces length, order increases it. A simple sentence or two making this clarification explicitly would help the reader make the connection as well.
We now added additional clarification in the main text and fully agree that this is needed (see page 13).

S2:
This figure is not substantially motivated in the text. Specifically, why is a different dataset used in this figure than in figure 2, and in particular one that yields a very different outcome than the successful registration in figure 2? Is the goal to show a condition in which target registration has failed, and in which this failure is detected by TRE and not FRE?
We agree that Fig. S2 is not extensively discussed in the text and that we did not motivate this figure well. Therefore we now added a few sentences in the figure caption as well as in the Materials and Methods section. We indeed used a different dataset (compared to Fig. 2 with a successful image registration) for which the image registration failed (likely due to an instability in the focus between or during the acquisition of the first and second fiducial marker dataset) to show that the TRE can detect the failure and that FRE would not. We now added this description to the figure caption as well.

S3:
A bit more explanation is needed: the exact meaning of the maximum and minimum fiducial number and of the maximum distance in the context of the registration algorithm is unclear. Please define these. Panels m-r also appear highly redundant with panels a-f, and may not be necessary.
We now added a more detailed description (in the figure caption) of the meaning of the maximum and minimum fiducial number and the maximum distance in the context of the image registration process. We used the 10 by 10 μm binning for the evaluation of a very large parameter set for piecewise image registration, as we described and have shown in Fig. S4. However, we agree that panels m-r convey very similar information to a-f. Thus, we now removed these panels. Moreover, we now evaluated the performance of various parameters for image registration (Fig. S4) without binning. The results regarding which parameter settings are best for image registration are very similar.

S6:
The caption for panel d does not make sufficiently clear how panel d differs from panel a. Incorporating the information in Supplementary Note 1 that the grids are overlapping would aid understanding. Additionally, if the data for multiple overlapping beads are shown in panels e and f, something should be done to distinguish the data for each bead (e.g., different point shapes).
We now added a clarification in the figure caption. We also added more information in Supplementary Note 1, and changed the bead representation in e and f so that the positions of one bead are shown as circles and the positions of the other bead are shown as squares.

S12:
As in figure 4c, there is little to no error at high σd/d values; why? In addition, in many panels in this figure, as in the main text, the analysis always appears to end up at a distance discrepancy of -1.0 with no error. What causes this to happen? This is an unexpected behavior. Such systematic error, seemingly on the same order of magnitude as the distance being measured, may disqualify this measurement technique in certain σd/d (~>2-3) regimes.
The reason that the error bar disappears is that we evaluated the performance of Vector-P2D and Vector by calculating the average distance discrepancy. To do so, the expected distance is subtracted from the measured distance and normalized by the expected distance. Thus, values around -1.0 represent cases for which we measured 0 nm and for which we find very small error bars showing that this is very reproducible. Large error bars typically indicate bimodal cases for which we measured both distances that are similar to the expected distance and distances that are much smaller than the expected distance (around 0 nm). Hence, the increasing size of error bars with increasing sigma shows that the fitting outcome is becoming more and more bimodal until it collapses to one side (measuring distances of around 0 nm). This means that the method is highly error prone for conditions that lead to -1.0 values with small error bars.
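In symbols, the distance discrepancy used here is (measured distance − expected distance) / expected distance. As a worked example, a fit that collapses to a measured distance of 0 nm at an expected distance of 10 nm gives (0 − 10)/10 = −1.0; if every one of the repeated datasets collapses in the same way, the spread across datasets, and therefore the error bar, is essentially zero.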

S14
: Panels e and f may be unnecessary-they do not add additional information beyond what is shown in the beginning of the figure. If the authors wish to keep the data in panel F, something should be done to distinguish the data for the different rulers (ex. different point shapes) as in figure S6.
We agree and removed panels e and f.

S17:
The figure states that Vector-P2D works better for frame numbers of 7 and up, but elsewhere in the paper a cutoff of 5 frames is quoted (e.g. figure S16). The authors should be consistent about the recommended cutoff; it seems like a cutoff of 7 is the best-motivated choice given the data presented here.
This is a valid point. As described in the comments to reviewer #1, we reran all Monte Carlo simulations and now used 100 instead of 10 datasets. Moreover, we compared Sigma-P2D and Vector (which is the same as Vector-P2D for single particles) for different conditions of distance uncertainty over distance (see Fig. S17). With this new data we actually see that Sigma-P2D always performs better than or at least equally well as Vector for single particles. We added a comment about this in the manuscript (see page 15) and in the figure caption. Moreover, we included this in our guideline for when to use which method (Fig. S16). See also the response to point 6 of reviewer #1.

S18:
The data in panel g contradicts the claim on page 24 ("...NLLSQ fitting outperforms MLE in all conditions where random background noise was added (Fig. S18)"). These data also display the same systematic errors seen in figure 4c and S12 (data trending to -1 discrepancy with no error). Why, if NLLSQ outperforms MLE, was MLE fitting used for Sigma-P2D?
We overstated the performance of NLLSQ. What we are trying to say is that fitting with NLLSQ is better than, or at least as good as, the MLE fitting (for Vector-P2D), as long as the background noise is below 5%. At higher levels of background noise both methods failed to recover the true distance. We corrected this in the manuscript (see page 26).
As for the systematic error, see comments for Figure 4c.