Stochastic bacterial population dynamics restrict the establishment of antibiotic resistance from single cells

Significance The emergence of antibiotic resistance poses a critical threat to the efficacy of antibiotic treatments. A resistant bacterial population must originally arise from a single cell that mutates or acquires a resistance gene. This single cell may, by chance, fail to successfully reproduce before it dies, leading to loss of the nascent resistant lineage. Here, we show that antibiotic concentrations that selectively favor resistance are nonetheless sufficient to reduce the chance of outgrowth from a single cell to a very low probability. Our findings suggest that lower antibiotic concentrations than those required to clear a large resistant population may be sufficient to prevent, with high probability, outgrowth of initially rare resistant mutants.

S1; but well below the previously estimated MIC of 2048µg/ml for PAMBL2-carriers 2 ). Control cultures were plated neat. After overnight incubation at 37 °C, colonies appeared from all three transformed strains, but not from the controls. We picked 3 colonies per transformed strain to streak out on one streptomycin-containing plate each, incubated at 37 °C overnight, then picked a single colony from each streaked plate to inoculate an overnight culture in LB containing 200µg/ml streptomycin to select for plasmid maintenance. Freezer stocks were prepared from the overnight cultures in approx. 17% glycerol and stored at -80 °C.
Successful transformation of PAMBL2 into each of the three PAO1 backgrounds was confirmed by PCR. Specifically, we used primers flanking the tniA gene on PAMBL2 (Table I.

We conducted standard MIC assays using the broth microdilution method. Specifically, overnight cultures were diluted $10^3$-fold, then used to inoculate antibiotic-containing media on 96-well plates at a further 1 in 10 dilution. This procedure consistently yielded a final inoculation density close to $5 \times 10^5$ CFU/ml, as per standard guidelines 5 . Actual culture densities in each experiment were estimated by plating further-diluted cultures on LB-agar, incubating overnight at 37 °C, then counting colony-forming units. Growth was tested on 96-well plates in media containing antibiotics at two-fold concentration steps, as well as in antibiotic-free media (positive controls); negative control wells were mock-inoculated with PBS. All concentrations reported here correspond to mass of the powder compound (streptomycin sulfate or meropenem trihydrate) per unit volume. Growth was assessed by optical density at 20h, 2d, and 3d post-inoculation, with OD$_{595}$ > 0.1 scored as growth. MIC in a given replicate (i.e. a row containing one culture well at each antibiotic concentration) was defined as the lowest tested antibiotic concentration preventing growth. Here, we take the median MIC across replicates assessed at 3d as our standard value, to be consistent with the assessment of growth at 3d in our seeding experiments. For comparison, we also report results assessed at 20h, as per standard guidelines 5 .
We first tested the sensitive strain with each fluorescent background (no marker, YFP, or DsRed) to check whether fluorescent labelling affects MIC. We used two separate overnight cultures of each strain, each tested in triplicate for growth in antibiotics (with one replicate per strain on each 96-well test plate, to control for plate effects). No contamination in the negative control wells was detected over the course of this experiment. We found no evidence that fluorescent label affects MIC (Table I.2).

in streptomycin at 3d), we broke the tie by looking at individual growth replicates; this yielded an overall median of 2048µg/ml, which was consistent with the median MIC typically obtained at standard inoculation density in further experiments (section 2.2; main manuscript, Fig. 3a; and SI Appendix, Fig. S3). A single edge well (negative control) on one meropenem test plate showed apparent contamination, i.e. OD > 0.1, first appearing at the 3d measurement; we considered this contamination rate of ∼0.3% across test plates to be negligible.

We extended the standard MIC assay protocol to evaluate the MIC in streptomycin of the YFP-labelled, Rms149-carrier strain at variable inoculation density. For this purpose, we used an overnight culture diluted $10^3$-, $10^4$-, $10^5$-, and $10^6$-fold to inoculate test cultures at a further 1 in 10 dilution. Thus, the highest inoculation density corresponds to the standard assay. Actual inoculum sizes were again estimated by plating. Specifically, starting from the $10^6$-fold-diluted overnight culture used for the smallest inoculum size, we took a further 5-fold dilution step, replicated three times. Each of these three $5 \times 10^6$-fold diluted cultures was plated on 5-6 LB-agar plates (20µl/plate). The number of colony-forming units was then averaged across the three replicate dilutions to obtain the reported estimate.
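The plating arithmetic above can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the function names, the input layout, and the colony counts are hypothetical, and it assumes that counts are first averaged over the plates of each replicate dilution and then across the three dilutions.

```python
from statistics import mean


def overnight_density(counts_by_dilution, dilution_factor=5e6, plated_ml=0.020):
    """Estimate overnight-culture density (CFU/ml) from colony counts.

    counts_by_dilution: one list of per-plate colony counts for each
    replicate dilution (hypothetical input layout).
    """
    per_dilution = [
        mean(counts) / plated_ml * dilution_factor for counts in counts_by_dilution
    ]
    # Average across the replicate dilutions, as described in the text.
    return mean(per_dilution)


def inoculation_density(overnight_cfu_per_ml, dilution_fold, final_step=10):
    """Density (CFU/ml) in a test culture after dilution and 1-in-10 inoculation."""
    return overnight_cfu_per_ml / (dilution_fold * final_step)


# Made-up colony counts: three replicate dilutions, 5-6 plates each.
counts = [[18, 22, 20, 19, 21], [17, 23, 20, 20, 20], [21, 19, 20, 22, 18, 20]]
density = overnight_density(counts)
standard = inoculation_density(density, 1e3)  # the standard (10^3-fold) assay
```

With these illustrative counts, the standard assay density comes out at the order of $5 \times 10^5$ CFU/ml quoted above.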
In the main, "standard volume" experiment (main manuscript, Fig. 3a; and SI Appendix, cultures on 24-well test plates. The 1160µl volume was chosen to match surface area to volume ratio on the standard test plates. The 116µl volume was then chosen to obtain 10-fold lower volume and hence 10-fold higher density at matched absolute inoculum sizes. In both experiments, we tested streptomycin-free positive controls along with streptomycin-containing media in 2-fold concentration steps from 1/16 × MIC$_R$ up to 2 × MIC$_R$ in the standard volume experiment, or up to 1 × MIC$_R$ in the varying volumes experiment (in which the capacity was limited by the 24-well plates). We scored growth in six replicates per test condition.
In the standard volume experiment, the four inoculation densities were distributed across four test plates to control for any plate effects. No contamination (growth in outer, negative control wells) was detected over the course of the experiment. In the varying volume experiment, on 96-well test plates, each plate included two replicates at each inoculation density along with negative controls in outer wells; again, no contamination was detected over the course of the experiment.
On 24-well test plates, each test plate included one replicate per inoculation density, while an additional plate processed in parallel served as a negative control and showed no contamination.
Occasionally, a replicate showed no growth at a given streptomycin concentration, but did grow at the next higher concentration step, before growth was abolished. (This occurred in the standard-volume experiment for one replicate at estimated inoculum size $1.28 \times 10^3$ CFU evaluated at 20h, and one replicate at $1.28 \times 10^4$ CFU evaluated at 3d.) This result can arise through stochastic effects leading to growth in some cultures but not others at any given streptomycin concentration. Note that cultures at successive streptomycin concentrations grew independently of one another, and were grouped simply by plate row as one replicate for MIC evaluation. In these ambiguous cases, we scored MIC for the replicate as the higher concentration, at and beyond which no growth occurred.
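The MIC scoring rule, including the treatment of "skipped" wells, can be sketched as below. This is an illustrative reading of the rule rather than the authors' code: the function name, the dictionary input, and all OD values are hypothetical, and returning `None` when the highest tested concentration still shows growth is our own convention.

```python
from statistics import median

OD_THRESHOLD = 0.1  # growth threshold on optical density, as stated in the text


def score_mic(od_by_conc, threshold=OD_THRESHOLD):
    """Return the MIC for one replicate row.

    od_by_conc maps antibiotic concentration -> OD reading (hypothetical
    layout). The MIC is the lowest tested concentration at and above which
    no well shows growth; None means growth at the highest concentration.
    """
    mic = None
    for conc in sorted(od_by_conc, reverse=True):
        if od_by_conc[conc] > threshold:
            break  # growth here: the MIC is the last growth-free concentration seen
        mic = conc
    return mic


# Illustrative replicate with a skipped well: no growth at 256, growth at 512.
row = {128: 0.85, 256: 0.04, 512: 0.31, 1024: 0.05, 2048: 0.04}
mic_row = score_mic(row)

# Median MIC across replicate rows (made-up values).
median_mic = median([1024, 2048, 2048])
```

Scanning from the highest concentration downwards means that a growth-free well below a well with growth never lowers the scored MIC, which matches the rule for ambiguous cases.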
3 Probability distribution of the number of colony-forming units

In this experiment, we evaluated the probability of outgrowth of a detectable population of the YFP:Rms149 strain in monoculture, at a given streptomycin concentration, as a function of inoculum size (main manuscript, Fig. 3b; and SI Appendix, Fig. S4-S5). In each experiment, we also tested growth in streptomycin-free media in parallel, in order to estimate effective inoculum size in liquid culture conditions (see .

Experimental protocol: The experiment proceeded similarly to the seeding experiments described above with the resistant strain in monoculture (Section 4). We made a single dilution series of an overnight culture of the YFP:Rms149 strain to inoculate test plates, but selected a different subset of these diluted cultures to use at different streptomycin concentrations (Table I.4). Thus, any inaccuracy in individual steps of the dilution series is a possible source of discrepancy between streptomycin-free and streptomycin-containing cultures. We did not account for this possible error in our model (which assumes perfect dilution steps), but we minimized the possibility of compounding errors in the experimental protocol by taking dilutions in parallel rather than in series wherever possible (e.g. $10^6$- through $2 \times 10^7$-fold dilutions would each be prepared in a single independent step from a common $2 \times 10^5$-fold dilution). To control for possible plate effects, different inoculum sizes at a given streptomycin concentration were distributed across test plates. All test plates included edge wells as negative controls to check for contamination. (In the second supplementary experiment, there were two new appearances of growth, but also several contaminated outer negative control wells on the second day; thus, we did not count these new appearances.) Growth of streptomycin-treated cultures was scored daily up to five days post-inoculation.
This extended protocol was chosen because in the previous seeding experiments (Section 4) we observed some new appearances of growth on the third day at the highest streptomycin concentration (1/8 × MIC$_R$). In the present experiment, we observed a few cases of new growth beyond Day 3 at 1/8 × MIC$_R$, but only one replicate in one experiment at 1/16 × MIC$_R$. For comparison, we report results based on growth up to Day 3 and up to Day 5 in Table III.5 and SI Appendix, Table S2; the plots in the main manuscript, Fig. 3b, and SI Appendix,

Occasionally a well dried up due to evaporation by Day 5; in these cases, we counted culture growth if it appeared earlier. In supplementary experiment 2, a cluster of four wells on one plate hovered just below the threshold OD of 0.1, with one of these wells just crossing the threshold on two of the five measurements. However, there did not appear to be bacterial growth in these wells and thus they were not counted.
Contamination was generally rare, and when it did appear, it was usually late in the experiment (likely due to contamination during OD measurements with lids removed) and judged unlikely to have affected our results. There was one possible exception in supplementary experiment 2, in which there were four new appearances of growth in test cultures on Day 5, but also two cases of contamination appearing in adjacent wells on one plate. In this case, we repeated the model fitting on growth data assessed at Day 4 instead of Day 5, and still did not reject the null model (D = 11.5, p = 0.24, cf.

Seeding test plates were first inoculated with either 10µl/well of PBS (absence of sensitive strain) or 10µl/well of sensitive strain culture at the appropriate dilution factor, yielding initial densities of approximately $5 \times 10^5$ CFU/ml (low density) or $5 \times 10^7$ CFU/ml (high density).
Then, the resistant strain culture at the appropriate dilution factor was inoculated at 10µl/well.
Separate plates to test growth of the sensitive strain alone (24 replicates per condition in each experiment) were inoculated with 10µl/well PBS plus 10µl/well sensitive strain culture. On all plates, edge wells were mock-inoculated with 20µl/well PBS to serve as negative controls. In experiment 1, two additional media-only plates were inoculated with 20µl/well PBS to serve as controls on background fluorescence during later plate readings. (Since fluorescence level depends on well volume and differential evaporation occurs in edge vs. interior wells, edge wells were not sufficient for this purpose.) In experiment 2, which tested fewer streptomycin concentrations, two spare interior columns on each plate served this purpose.
Test plates were incubated and read 1, 2, and 3 days post-inoculation, as before. In addition to OD, we now measured fluorescence near the peak of YFP (excitation 500±27nm; emission 540±25nm). For consistency across daily readings, the fluorescence 'gain' setting was fixed to 100, chosen based on pilot readings using the plate reader's 'autogain' function.
Data processing: As before, we set a threshold OD of 0.1 to score as culture growth. In addition, we set a threshold fluorescence of $5 \times 10^5$ units to score growth by the YFP-labelled resistant strain. This threshold was chosen such that all cultures showing growth by Day 3 that were inoculated with the resistant strain alone fell above the threshold (except for two suspected cases of cross-contamination in experiment 1; see below), while all those inoculated with the sensitive strain alone fell below the threshold. In nearly all cases, this separation was clear-cut (SI Appendix, Fig. S11). In wells scored as "growth by resistant strain", we cannot rule out that the sensitive strain is also still present; however, based on the high fluorescence level, we can be confident that a sizeable population of the resistant strain is present, and can therefore be considered as established. The number of replicate cultures scored as growth by the resistant strain was then used to estimate its establishment probability in each condition (Sections 11 and 16).
In experiment 1, in streptomycin-free media, two cultures seeded with the resistant strain alone (out of a total of 120 cultures across the two resistant dilution factors) showed growth (OD > 0.1), but low fluorescence (< $5 \times 10^5$). Similarly, in experiment 2, one culture in 1/8 × MIC$_R$ streptomycin showed this pattern. These cases are suspected to represent cross-contamination by the sensitive strain or by other (possibly also streptomycin-resistant) bacteria in the lab.
Meanwhile, growth in negative control wells also occurred at a low rate (1-2% of wells in streptomycin-free media and 1/128 × MIC$_R$ streptomycin in experiment 1; 1.25% of wells only in streptomycin-free media in experiment 2), with low fluorescence readings consistent with cross-contamination by the sensitive strain or by unrelated bacteria. These contamination rates were considered negligible for the analysis.
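The two-threshold scoring described in this section can be sketched as follows. The function names, category labels, and readings are hypothetical. The sketch also assumes that a well must exceed the OD threshold before its fluorescence is considered, which the text does not state explicitly.

```python
def classify_well(od, fluorescence, od_threshold=0.1, fl_threshold=5e5):
    """Classify one well from its OD and YFP fluorescence readings."""
    if od <= od_threshold:
        return "no growth"
    if fluorescence > fl_threshold:
        return "resistant growth"
    # Growth without a strong YFP signal: sensitive strain or a contaminant.
    return "growth, low fluorescence"


def fraction_established(wells):
    """Fraction of replicate wells scored as growth by the resistant strain."""
    scored = [classify_well(od, fl) for od, fl in wells]
    return scored.count("resistant growth") / len(scored)


# Made-up (OD, fluorescence) readings for four replicate wells.
wells = [(0.62, 2.1e6), (0.04, 1.0e4), (0.55, 8.0e4), (0.71, 1.9e6)]
frac = fraction_established(wells)
```

The resulting fraction of wells with resistant growth is the quantity that enters the establishment probability estimate.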

Fraction of dead cells (live-dead staining and flow cytometry)
The goal of this experiment was to assess the proportion of dead cells induced by sub-MIC$_R$ streptomycin treatment of the Rms149 resistant strain. Live-dead staining and flow cytometry were carried out with one set of replicates at a time in order to avoid having samples exposed to the stain for too long. Each of the six replicate sets included a media-only control, a heat-killed control, and a culture treated at each tested streptomycin concentration; the streptomycin-free culture was also repeated as the last sample in order to check for an effect of time exposed to the stain before sampling. For each replicate set, the 10-fold diluted samples were diluted a further 10-fold into pre-warmed, sterile filtered 1mM EDTA in PBS and incubated for 10min at 37 °C. Then 2µl each of thiazole orange [TO] and propidium iodide [PI] (BD Cell Viability Kit, product no. 349483) were added per 200µl sample and incubated 5min further at room temperature. We then analyzed 50µl per sample using flow cytometry (BD Accuri C6 Flow Cytometer with software version 1.0.264.21, Accuri Cytometers, Inc.) with fast fluidics (66µl/min), discarding events with forward scatter FSC-H < 10,000 or side scatter SSC-H < 8000.
Data processing: To analyze the flow cytometry data, we proceeded as follows.
1. Cell densities in diluted treated cultures were sometimes low, especially at higher streptomycin concentrations. In order to better discriminate cells from background, we first defined a gate (labelled "cells") in the FSC-A/SSC-A (forward/side scatter) plot that incorporated the majority of events in the sampled cultures, but excluded the majority of events in the media-only controls (SI Appendix, Fig. S6a). Further analysis was limited to events within this gate.
2. We then defined gates around events in the FL1/FL3 plot that clustered according to their fluorescence (SI Appendix, Fig. S6b). TO (live stain) is primarily detected in the FL1 channel (488nm laser with 533/30 filter), while PI (dead stain) is primarily detected in FL3 (488nm laser with 670LP filter). Thus a cluster appearing with higher FL1 and lower FL3 was labelled "intact" and a cluster appearing with lower FL1 and higher FL3 was labelled "dead". Nearly all events in the heat-killed controls fell within the "dead" gate.
Together, these two gates incorporated the majority of events in the sampled cultures, but only a minority of events in the media-only controls, providing further discrimination from background events.
3. Finally, to correct for any remaining background, within each replicate set we subtracted the number of events in the media-only control from the number of events in each sampled culture, within each of the two gates ("dead" and "intact").
4. The proportion of dead cells in sampled cultures was defined as the number of events falling within the "dead" gate divided by the total number falling in either the "dead" or the "intact" gate (after background correction as described above).
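Steps 3 and 4 above amount to the following calculation, shown here as a minimal sketch. The function name, the dictionary layout, and the event counts are hypothetical; raising an error when no events remain above background is our own choice, not something the text specifies.

```python
def dead_fraction(sample, blank):
    """Background-corrected proportion of dead cells in one sampled culture.

    sample, blank: event counts in the 'dead' and 'intact' gates for the
    culture and for the media-only control of the same replicate set.
    """
    dead = sample["dead"] - blank["dead"]
    intact = sample["intact"] - blank["intact"]
    total = dead + intact
    if total <= 0:
        raise ValueError("no events above background")
    return dead / total


# Made-up event counts for one culture and its media-only control.
culture = {"dead": 150, "intact": 860}
media_only = {"dead": 50, "intact": 60}
fraction = dead_fraction(culture, media_only)  # 100 / (100 + 800)
```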
One replicate set appears as an outlier with an elevated fraction of dead cells (see SI Appendix, Table S3). Since this elevation occurs at all streptomycin concentrations, it can likely be attributed to the staining or flow cytometry steps, in which all samples were processed together, rather than the culture growth step, in which cultures at each concentration grew independently.
If we exclude this replicate from the analysis, the mean fractions of dead cells correspondingly drop slightly, but the significance of streptomycin effects does not change (Table S3).
TO has toxic effects on cells, and thus the order of sampling the diluted cultures by flow cytometry within each replicate set (hence time exposed to the stain before sampling) could potentially have been confounded with the effect of streptomycin treatment. However, by comparing the streptomycin-free culture sampled earlier vs. later within each replicate set, we found that the proportion of dead cells on average actually decreased (mean of six replicates: 0.0403 in first sample vs. 0.0265 in second; SI Appendix, and Set B (hourly from 1h to 8h, and at approx. 24h) to account for slower growth at higher streptomycin concentrations. In total, we thus had 8 treatment plates in Set A and 10 in Set B. Actual sampling times (time elapsed between inoculation and plating) were recorded in the course of the experiment.
Upon sampling, the treated cultures were plated undiluted in 4µl spots on each of five square (12cm × 12cm) LB-agar plates, for a total sampling volume of 20µl out of each 200µl culture.
After sampling, treatment plates were returned to the incubator for later OD reading (after approx. 1, 2, and 3 days) to assess eventual growth in all the sampled plates, as in our previous experiments. Contamination was rare (one negative control edge well on each of two plates showed contamination first appearing on Day 2, corresponding to an overall contamination rate of 0.3% across all plates).
LB-agar plates were immediately moved to 37 °C for the rest of the day, then removed to the bench (room temperature) overnight to prevent overgrowth of colonies, then returned to 37 °C the following day for several hours until colonies were visible, but still separated, for optimal counting by eye. Total colony counts from the five plates were used to estimate viable population size in the treated cultures at time of sampling (scaling up by a factor 10 from sampled to total volume). Later plated time points were excluded if colonies became too dense to count at a given streptomycin concentration.

Competition assay (flow cytometry)
The goal of this experiment was to determine the direction of selection for resistance vs. sensitivity across a range of streptomycin concentrations, by competing resistant (YFP:Rms149) and sensitive (DsRed) strains at reasonably high starting densities, such that demographic stochasticity is negligible.
Experimental protocol: We used the YFP-labelled Rms149 resistant strain and the DsRed-labelled sensitive strain. Although DsRed does not provide a strong enough fluorescent signal to aid in discrimination, it controls for the fitness effect of carrying a fluorescent marker.
Overnight cultures of each strain were mixed at an initial 20-fold dilution each, then this mixture was diluted 100-fold further. This yielded a 1:1 volumetric mixture of strains each at 2000-fold dilution, which was inoculated at 20µl per 200µl total culture volume. The total bacterial density at the start of treatment was thus expected to be around 5 × 10 5 CFU/ml, as we also used for MIC tests at standard inoculation density. For the pure cultures, each strain was diluted 2000-fold alone and then inoculated similarly. That is, we chose to match the density of a given strain between pure and mixed cultures, rather than the total bacterial density.
At each streptomycin concentration, we inoculated a total of 6 replicate mixed cultures and 2 replicates of each pure culture, split evenly across two treatment plates. Test concentrations ranged from 1/2048 to 1/8 × MIC$_R$ (1/32 to 8 × MIC$_S$) streptomycin in 2-fold steps, as well as streptomycin-free. Outer wells were also filled with streptomycin-free media and mock-inoculated with 20µl of PBS to serve as media-only controls. Treatment plates were incubated (37 °C, 225rpm), sampled (20µl per well) at 6.5h, then immediately returned to the incubator and sampled again at approx. 24h. The latter time point provided better resolution at higher streptomycin concentrations and is thus used for data analysis. The 24h treatment culture samples (along with media-only controls) were diluted a total of 500-fold in sterile filtered PBS for flow cytometry (BD Accuri C6 Flow Cytometer). From each diluted sample, 66µl were sampled using fast fluidics, i.e. 66µl/min, discarding events with forward scatter FSC-H < 10,000 or side scatter SSC-H < 8000.
Data processing: To analyze the flow cytometry data, we proceeded as follows.

1. We defined non-overlapping gates 'S' and 'R' in the FL1 (fluorescence detection) vs. FSC-A (forward scatter) plots (see SI Appendix, Fig. S8). FL1 is configured with a blue (488nm) laser and 533/30 interference filter, which primarily detects the YFP signal. The S and R gates roughly correspond to DsRed-labelled sensitive cells (lower fluorescence) and YFP-labelled resistant cells (higher fluorescence), respectively. However, the pure cultures revealed overlap into the opposite gates, particularly of resistant cells with low fluorescence into the 'S' gate, which is accounted for in the following steps. The gates were drawn separately, but similarly, for each of the two treatment plates. In total, these gates comprised an average of 98% of all detected events on one treatment plate (range across wells: 95-99%) and an average of 96% (range: 89-98%) on the other treatment plate. Events falling outside both gates were excluded from analysis.
2. We corrected for background events in each well by subtracting the number of events in the corresponding media-only control from the number of events in the sample of interest, in each gate. If negative, we set this value to zero.
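3. From the (background-adjusted) number of events in each gate in pure cultures, where we know only a single strain is present, we calculated the parameters $p_{i,j}$: the proportion of cells of strain $i$ that fall into gate $j$. For example, $p_{S,R}$ is the proportion of sensitive cells that fall into the 'R' gate, calculated as:

$$p_{S,R} = \frac{G^{\mathrm{pure}\,S}_{R}}{G^{\mathrm{pure}\,S}_{S} + G^{\mathrm{pure}\,S}_{R}}$$

where $G^{\mathrm{pure}\,i}_{j}$ is the number of events falling into gate $j$ in the pure culture of strain $i$. (Thus, $p_{S,S} + p_{S,R} = 1$ from the pure sensitive culture and $p_{R,S} + p_{R,R} = 1$ from the pure resistant culture.) These parameters are calculated separately at each streptomycin concentration, but crucially, we assume below that they are fixed for a given strain whether it is in a mixed culture or a pure culture.

4. In mixed cultures, we want to know the "true" number of cells of each strain ($N^{\mathrm{mix}}_{S,\mathrm{tot}}$ and $N^{\mathrm{mix}}_{R,\mathrm{tot}}$), adding up cells that fall into either gate: that is, for each strain $i$,

$$N^{\mathrm{mix}}_{i,\mathrm{tot}} = N^{\mathrm{mix}}_{i,S} + N^{\mathrm{mix}}_{i,R}$$

On the other hand, what we observe in a mixed culture is the total number of cells of either strain that fall into each gate $j$, $G^{\mathrm{mix}}_j$. We can express the relationship between these quantities as:

$$G^{\mathrm{mix}}_j = p_{S,j}\, N^{\mathrm{mix}}_{S,\mathrm{tot}} + p_{R,j}\, N^{\mathrm{mix}}_{R,\mathrm{tot}}, \qquad j \in \{S, R\} \tag{S1}$$

where the parameters $p_{i,j}$ were calculated above from the pure cultures. With Eqn. S1 we have two linear equations in two unknowns, which can be readily solved to obtain:

$$N^{\mathrm{mix}}_{S,\mathrm{tot}} = \frac{p_{R,R}\, G^{\mathrm{mix}}_S - p_{R,S}\, G^{\mathrm{mix}}_R}{p_{S,S}\, p_{R,R} - p_{S,R}\, p_{R,S}}, \qquad N^{\mathrm{mix}}_{R,\mathrm{tot}} = \frac{p_{S,S}\, G^{\mathrm{mix}}_R - p_{S,R}\, G^{\mathrm{mix}}_S}{p_{S,S}\, p_{R,R} - p_{S,R}\, p_{R,S}}$$

Occasionally, this solution yields a negative number of sensitive cells, which reflects the imperfect assumption that the proportion of cells falling in each gate is the same across cultures, whereas it will in reality show some variation. In these cases, we manually set the number of sensitive cells to zero; this adjustment had a very small effect relative to the total number of cells.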
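The gate-correction procedure (steps 2 to 4) can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' script: the function names and event counts are hypothetical, `p_ij` denotes the proportion of strain `i` cells falling into gate `j`, and only the sensitive count is clipped at zero, following the text.

```python
def subtract_background(events, blank_events):
    """Step 2: subtract media-only events in a gate, setting negatives to zero."""
    return max(events - blank_events, 0)


def gate_proportions(pure_in_S, pure_in_R):
    """Step 3: proportions of a pure culture's events in the S and R gates."""
    total = pure_in_S + pure_in_R
    return pure_in_S / total, pure_in_R / total


def unmix(g_S, g_R, p_SS, p_SR, p_RS, p_RR):
    """Step 4: solve the two gate equations for the per-strain cell numbers.

    g_S = p_SS * n_S + p_RS * n_R
    g_R = p_SR * n_S + p_RR * n_R
    """
    det = p_SS * p_RR - p_SR * p_RS
    n_S = (p_RR * g_S - p_RS * g_R) / det
    n_R = (p_SS * g_R - p_SR * g_S) / det
    return max(n_S, 0.0), n_R  # negative sensitive counts are set to zero


# Made-up pure-culture counts (already background-adjusted).
p_SS, p_SR = gate_proportions(990, 10)    # pure sensitive culture
p_RS, p_RR = gate_proportions(400, 1600)  # pure resistant culture

# A mixed culture with 1390 events in gate S and 1610 events in gate R.
n_S, n_R = unmix(1390, 1610, p_SS, p_SR, p_RS, p_RR)
```

In this made-up example the mixed culture was constructed from 1000 sensitive and 2000 resistant cells, and the correction recovers those numbers even though a fifth of the resistant cells fall into the 'S' gate.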

Mathematical modelling and model fitting
Here we describe the models that we fit to population growth data in the seeding experiments and the tests of the inoculum size effect. We note that our approach is not entirely novel, but is presented here in full for clarity. Connections to previous work are discussed in Section 10.2, and the correspondence between different methods of fitting the data is explained in Section 12.1.

Theoretical model of population growth
We treat the number of cultures showing growth, across independent biological replicates in a given test condition, as binomially distributed with number of trials equal to the number of replicate cultures and "success" probability $p_w$, the probability of detectable bacterial population growth. The parameter $p_w$ will depend on the inoculum size, whose expected value $\bar{N}$ is con-

To further relate population growth to individual cell fates, we make the fundamental assumption that population growth will be observed if and only if at least one individual in the inoculum establishes a surviving lineage. (In the following exposition, we equate "individual" with "cell", but generally an individual unit might be a clump of cells; see Section 10.1 below.) Furthermore assuming that the number of individuals that establish is Poisson-distributed with mean $\alpha$, we can express our model in its most general form as:

$$p_w(\bar{N}, x) = 1 - e^{-\alpha(\bar{N}, x)} \tag{S3}$$

(corresponding to Eqn. 2 in the main manuscript). In support of this assumption, we find experimentally that the number of colony-forming units on antibiotic-free LB-agar is indeed adequately described by a Poisson distribution (SI Appendix, Fig. S1). Note that Equation

It is useful to define the relative establishment probability, $\tilde{p}_c$, in a focal environment $x$, as the mean number of established cells in that environment, normalized by the result in some baseline environment (for our purposes, antibiotic-free media), denoted $x = 0$. From results at mean inoculum size $\bar{N}_i$, we calculate relative establishment probability as:

$$\tilde{p}_c(\bar{N}_i, x) = \frac{\alpha(\bar{N}_i, x)}{\alpha(\bar{N}_i, 0)} \tag{S4}$$

Note that $\tilde{p}_c$ is not a true probability; in particular, it is possible for $\tilde{p}_c$ to exceed one, if $\alpha$ in environment $x$ exceeds that in environment 0. (The interpretation of $\tilde{p}_c$ is further discussed below.)
Using definition S4, we can rewrite the full model in terms of the transformed parameters $\alpha(\bar{N}_i, 0)$ and $\tilde{p}_c(\bar{N}_i, x_j)$. Using these transformed parameters, we can also make the reasonable simplifying assumption that the relative establishment probability $\tilde{p}_c(x_j)$ is constant in a given test environment $x_j$, while $\alpha(\bar{N}_i, 0)$ varies arbitrarily with $\bar{N}_i$ (Model B, "fixed environmental effect"). That is, we jointly estimate $\{\alpha(\bar{N}_i, 0)\}_{i=1}^{m}$ and $\{\tilde{p}_c(x_j)\}_{j=1}^{s}$ from the results pooled across all test conditions. This model still requires us to have tested growth in every environment at the same inoculum sizes.
The number of parameters to be estimated is reduced from $m \cdot (s + 1)$ in Model A to $m + s$ in the nested Model B.
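As a worked illustration of the general model and of the relative establishment probability, the sketch below inverts $p_w = 1 - e^{-\alpha}$ using the observed fraction of replicate cultures showing growth. This is a simple plug-in estimate for a single test condition, not the pooled likelihood-based fitting used for the nested models; the function names and replicate counts are hypothetical.

```python
import math


def alpha_hat(n_growth, n_reps):
    """Plug-in estimate of the mean number of established individuals.

    Inverts p_w = 1 - exp(-alpha), with p_w estimated by n_growth / n_reps.
    """
    if n_growth >= n_reps:
        raise ValueError("all replicates grew: alpha is not estimable")
    return -math.log(1 - n_growth / n_reps)


def relative_establishment(k_x, n_x, k_0, n_0):
    """Ratio of alpha in environment x to alpha in the baseline environment 0,
    both measured at the same mean inoculum size."""
    return alpha_hat(k_x, n_x) / alpha_hat(k_0, n_0)


# Made-up counts: 15 of 60 cultures grew with antibiotic, 45 of 60 without.
p_tilde = relative_establishment(15, 60, 45, 60)
```

Note that the estimate is undefined when every replicate grows, which is one reason inoculum sizes need to be small enough that some cultures fail to grow.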
Finally, we introduce our null model of the inoculum size effect, i.e. the relationship between $p_w$ and $\bar{N}$. Here we invoke the key assumption that each individual acts independently, i.e. the outcome of establishing a surviving lineage is not affected by other individuals in the inoculum. (This independence assumption is very common when modelling dynamics at low population density, for instance using branching processes; here we rigorously test the validity of this assumption.) Suppose that the number of individuals in the inoculum is Poisson-distributed with mean $\bar{N}$ and that the fate of each individual (i.e. whether it establishes a surviving lineage) is an independent Bernoulli trial with success probability $p_c$, called the per-cell establishment probability, which depends only on the environment, $x$. Then we arrive at the number of established lineages being Poisson-distributed,* consistent with our earlier assumption, and can write the mean very simply as:

$$\alpha(\bar{N}, x) = p_c(x)\, \bar{N} \tag{S5}$$

Substituting Eqn. S5 into Eqn. S3 leads to

$$p_w(\bar{N}, x) = 1 - e^{-p_c(x)\, \bar{N}}$$

(Eqn. 1 in the main manuscript), which we call the null model of the inoculum size effect, relating $p_w$ to $\bar{N}$. Under this model, inoculum size cancels out in the definition of relative establishment probability (Eqn. S4) and we have simply:

$$\tilde{p}_c(x) = \frac{p_c(x)}{p_c(0)}$$

Note that according to this model, we cannot obtain estimates of absolute establishment probability ($p_c$) since this parameter plays a symmetrical role to inoculum size ($\bar{N}$) and thus their effects cannot be separated. That is, if we observe a higher proportion of established populations in an experiment, we cannot tell whether this was due to higher inoculum size or higher establishment probability. This limitation is not unique to our experimental protocol: quantifying cell density by counting colony-forming units on solid media relies on the implicit assumption that the establishment probability of a "viable cell" is one. Likewise, we will re-

* Conditional on the number of individuals in the inoculum, $N$, the number of established lineages, $Y$, is the sum of $N$ independent Bernoulli($p_c$) trials, and thus has a Binomial($N$, $p_c$) distribution with PGF $g_{Y|N}(z) = (1 - p_c + p_c z)^N$. We can then derive the distribution of $Y$ via its PGF, $g_Y(z)$, as follows:

$$g_Y(z) = \mathbb{E}\left[g_{Y|N}(z)\right] = \sum_{n=0}^{\infty} \frac{e^{-\bar{N}} \bar{N}^n}{n!} (1 - p_c + p_c z)^n = e^{-\bar{N}}\, e^{\bar{N}(1 - p_c + p_c z)} = e^{\bar{N} p_c (z - 1)}$$

This is the PGF of a Poisson random variable with mean $\bar{N} p_c$.
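The Poisson result for the number of established lineages can also be checked numerically. The sketch below sums the binomial distribution of established lineages over a Poisson-distributed inoculum and compares the result with a Poisson distribution of mean $\bar{N} p_c$. The function names, the truncation point `n_max`, and the parameter values are illustrative choices only.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events, computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def thinned_pmf(k, mean_n, p_c, n_max=100):
    """P(Y = k): k established lineages from a Poisson(mean_n) inoculum in
    which each individual establishes independently with probability p_c.
    The infinite sum over inoculum sizes is truncated at n_max."""
    return sum(
        poisson_pmf(n, mean_n) * math.comb(n, k) * p_c**k * (1 - p_c) ** (n - k)
        for n in range(k, n_max + 1)
    )


mean_n, p_c = 5.0, 0.3  # illustrative values only
max_diff = max(
    abs(thinned_pmf(k, mean_n, p_c) - poisson_pmf(k, mean_n * p_c))
    for k in range(6)
)
p_w = 1 - thinned_pmf(0, mean_n, p_c)  # probability of at least one lineage
```

The probability of at least one established lineage agrees with $1 - e^{-p_c \bar{N}}$, up to the truncation of the sum.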
For the purposes of parameter estimation under the null model, we pool results across inoculum sizes by supposing that we do not make any experimental error in culture dilution steps, and so the mean inoculum size is inversely proportional to the dilution factor applied to the inoculating culture. That is, the $i$th inoculum size is:

$$\bar{N}_i = \bar{N}^* \, \frac{d^*}{d_i}$$

and thus

$$p_w(\bar{N}_i, x) = 1 - \exp\left(-p_c(x)\, \bar{N}^* \, \frac{d^*}{d_i}\right)$$

where $\bar{N}^*$ is the mean inoculum size at a chosen normalizing dilution factor $d^*$, and $d_i$ is the dilution factor applied to obtain the $i$th inoculum size.
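Pooling across dilution factors under the null model leaves a single identifiable quantity per environment, the product of per-cell establishment probability and mean inoculum size at the normalizing dilution factor. The sketch below estimates that product (called `theta` here, a name of our own) by maximizing the binomial log-likelihood with a simple ternary search. It is a minimal illustration under stated assumptions, not the authors' fitting code; the data layout and search bounds are hypothetical.

```python
import math


def log_lik(theta, data):
    """Binomial log-likelihood (up to a constant) under the null model.

    data: list of (r, k, n), where r is the inoculum size relative to the
    normalizing dilution, k the number of cultures showing growth, and n
    the number of replicate cultures.
    """
    ll = 0.0
    for r, k, n in data:
        p_w = -math.expm1(-theta * r)  # 1 - exp(-theta * r)
        if k > 0:
            ll += k * math.log(p_w)
        ll += (n - k) * (-theta * r)  # log(1 - p_w) = -theta * r
    return ll


def fit_theta(data, lo=1e-9, hi=1e3, iters=200):
    """Maximize the log-likelihood by ternary search; this relies on the
    log-likelihood being concave in theta. If every culture grew, the
    estimate runs to the upper bound."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_lik(m1, data) < log_lik(m2, data):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Made-up data at relative inoculum sizes 1, 0.5 and 0.25.
data = [(1.0, 38, 60), (0.5, 24, 60), (0.25, 13, 60)]
theta_hat = fit_theta(data)
```

Because only the product is identifiable, `theta_hat` cannot be split into an inoculum size and an establishment probability without a separate baseline measurement, which is the role of the antibiotic-free cultures.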

Heterogeneous establishment probability
In the above model, we assumed that establishment probability $p_c$ is the same for every individ-

Thus, $Y$ is overall a Bernoulli trial with success probability equal to the mean establishment probability, and in turn, the sum of $N$ such independent trials yields a binomial distribution.
Therefore, the null model outlined above still applies if we simply interpret $p_c$ as the mean establishment probability among individual units behaving independently of one another. In particular, under this model, experimental parameters (e.g. the growth phase from which the inoculating culture was taken) will only affect the outcome if they change the mean establishment probability, whereas the extent of individual variation about a fixed mean does not matter.

Comparison to previous work
The same or similar theoretical models as presented here have also been used previously. Much earlier, Druett 8 derived the equation (converted to our notation) .1). In this case, $p_c$ (or $\theta^*_R$ in their notation) represents "rate of rescue per inoculated individual" (ref. 9 , p. 4), which captures different processes depending on the scenario. For instance, in the case of rescue relying on de novo mutations, $\theta^*_R$ accounts for the mutation rate as well as the establishment probability of mutants. In the case of rescue relying on preexisting (but rare) mutations, $\theta^*_R$ is simply the per-individual establishment probability, as in our interpretation.
Importantly, Martin et al. 9 also fit this equation to experimental data using similar methods to ours (cf. Sections 11-12 below). They likewise assessed the goodness of fit of this model (S9) according to deviance from the full model (our Model C vs. A), but did not include our additional Model B. Our approach moreover differs in that we estimate effective inoculum size from growth in baseline (antibiotic-free) conditions in parallel with test conditions, rather than treating inoculum size as a known (separately measured) value. We thus derive confidence intervals on the estimated establishment probability relative to the baseline conditions, taking into account the uncertainty in both measures.
More generally, a number of studies over the past decade have quantified the probability of evolutionary rescue experimentally, beginning with the pioneering work of Bell & Gonzalez 10 and recently reviewed by Bell 11 . Notably, Ramsayer et al. studied rescue in a very similar experimental model system to ours, namely Pseudomonas fluorescens exposed to streptomycin, and observed stochastic extinctions even in large initial populations where resistance was likely to be present at the outset. 12 However, this study, like most other experimental studies of evolutionary rescue to date, did not fit a theoretical model to their data in the manner of Martin et al. 9 or us.
We are aware of two other empirical studies that, like us, have effectively estimated the establishment probability of single bacterial cells faced with antibiotics, but using different methods. 13;14 These two studies quantified bacterial growth on solid media, and could thus directly visualize the number of established cells (reflecting α, in our notation) as colony-forming units. The ratio between CFU counts at any given antibiotic concentration and CFU counts on antibiotic-free plates, sometimes called "plating efficiency", 14 is equivalent to our "relative establishment probability" (p c ). In contrast, with liquid cultures, we visualize growth of populations from a random number of cells (reflecting p w ), with growth scored as a binary outcome.
We then infer the mean number of established cells that yielded this growth (α), and hence relative establishment probability, using our stochastic model (Eqn. S3 and S4). One strength of our method is that experiments in liquid culture can readily be conducted on a large scale and are amenable to automation. We are aware of one other study 6 using seeding experiments to assess growth from very few cells in liquid culture; however, this study only assessed the proportion of replicate cultures showing growth (p w ), and did not fit these data to a model to estimate the per-cell establishment probability.

Likelihood-based parameter estimation and model comparison
Basic binomial likelihood: The number of cultures showing growth at a given test condition, $n_{\text{grow}}$, is modelled as binomially distributed with number of trials equal to the total number of replicate cultures, $n_{\text{tot}}$, and success probability $p_w$, the parameter we want to estimate. That is,

$$\Pr\big(n_{\text{grow}} \mid (n_{\text{tot}}, p_w)\big) = \binom{n_{\text{tot}}}{n_{\text{grow}}}\, p_w^{\,n_{\text{grow}}} (1 - p_w)^{n_{\text{tot}} - n_{\text{grow}}}$$

and so the log likelihood function of $p_w$ given the data $(n_{\text{grow}}, n_{\text{tot}})$ can be written (up to a constant that can be dropped) as:

$$\log L(p_w \mid n_{\text{grow}}, n_{\text{tot}}) = \begin{cases} n_{\text{grow}} \log(p_w) + (n_{\text{tot}} - n_{\text{grow}}) \log(1 - p_w), & 0 < n_{\text{grow}} < n_{\text{tot}} \\ n_{\text{tot}} \log(1 - p_w), & n_{\text{grow}} = 0 \\ n_{\text{tot}} \log(p_w), & n_{\text{grow}} = n_{\text{tot}} \end{cases} \tag{S10}$$

We have the simple analytical result for the maximum likelihood estimate (MLE):

$$\hat{p}_w = \frac{n_{\text{grow}}}{n_{\text{tot}}}$$

To obtain likelihood-based confidence intervals, we use the test statistic:

$$D(p_w) = 2\left[\log L(\hat{p}_w \mid n_{\text{grow}}, n_{\text{tot}}) - \log L(p_w \mid n_{\text{grow}}, n_{\text{tot}})\right]$$

i.e. twice the difference in log likelihood between the MLE and any test value of $p_w$, and solve for the boundaries $p_w^*$ such that $D(p_w^*) = D^*$, the critical value for a chosen significance level in the chi-squared test with one degree of freedom (ref. 15 , §2.6-2.9 and §9.5). The MLE and confidence interval boundaries for $p_w$ can simply be transformed to those for $\alpha$ using:

$$\alpha = -\log(1 - p_w)$$

Pooling dilution factors: Recall that under the null model of inoculum size effect, and assuming perfect dilution steps, we have

$$\alpha(N_i, x) = \frac{\alpha(N^*, x)}{d_i} \tag{S11}$$

where $d_i$ is the $i$th dilution factor, normalized by a chosen dilution factor $d^*$, taken as known values. This leaves a single parameter $\alpha^*(x) := \alpha(N^*, x)$, the mean number of established cells scaled to the chosen dilution factor, to be estimated in each environment by pooling data across all dilution factors. We thus define the pooled likelihood function in environment $x$:

$$\log L_{\text{pool}}\big(\alpha^*(x)\big) = \sum_{i=1}^{m} \log L\big(p_w = 1 - \exp(-\alpha^*(x)/d_i) \mid n_{\text{grow},i}, n_{\text{tot},i}\big)$$

where $\log L$ is the binomial log likelihood defined in Equation S10, and $m$ is the number of dilution factors tested in this environment.
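The basic binomial estimate and its likelihood-based confidence interval can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the bisection solver and the example counts (12 of 60 cultures growing) are hypothetical, and the 95% critical value of the chi-squared distribution with one degree of freedom is hard-coded.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 95% critical value, chi-squared with 1 d.f.

def log_lik(p_w, n_grow, n_tot):
    """Binomial log likelihood of p_w, dropping the constant binomial coefficient."""
    ll = 0.0
    if n_grow > 0:
        ll += n_grow * math.log(p_w)
    if n_grow < n_tot:
        ll += (n_tot - n_grow) * math.log(1.0 - p_w)
    return ll

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mle_and_ci(n_grow, n_tot, crit=CHI2_1DF_95):
    """MLE of p_w and the interval on which the deviance D(p_w) stays below crit."""
    p_hat = n_grow / n_tot
    ll_max = log_lik(p_hat, n_grow, n_tot)
    excess = lambda p: 2.0 * (ll_max - log_lik(p, n_grow, n_tot)) - crit
    eps = 1e-12
    lower = 0.0 if n_grow == 0 else bisect(excess, eps, p_hat)
    upper = 1.0 if n_grow == n_tot else bisect(excess, p_hat, 1.0 - eps)
    return p_hat, lower, upper

def to_alpha(p_w):
    """Mean number of established cells implied by p_w = 1 - exp(-alpha)."""
    return math.inf if p_w >= 1.0 else -math.log1p(-p_w)

# Hypothetical data: 12 of 60 replicate cultures showed growth.
p_hat, p_low, p_high = mle_and_ci(12, 60)
print("p_w:  ", p_hat, (p_low, p_high))
print("alpha:", to_alpha(p_hat), (to_alpha(p_low), to_alpha(p_high)))
```

Because the transformation from p w to α is monotonic, the interval boundaries can be transformed directly, as stated in the text.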
We write the joint log likelihood across all tested conditions (inoculum sizes and environments), given the data $(n_{\text{grow}}, n_{\text{tot}})$ in each condition, as:

$$\log L_{\text{joint}} = \sum_{j} \sum_{i=1}^{m_j} \log L\big(p_w(N_{ij}, x_j) \mid n_{\text{grow},ij}, n_{\text{tot},ij}\big)$$

where $m_j$ is the number of inoculum sizes tested in environment $x_j$. Proceeding further depends on the particular model (cf.

• Model C (null model of inoculum size effect): We estimate a single parameter $\alpha^*_0 := \alpha(N^*, 0)$ in the baseline environment at some normalizing dilution factor $d^*$, assuming Eqn. S11 holds; as well as $\tilde{p}_c(x_j)$ in each non-baseline environment. We can again estimate $\alpha^*_0$ for the baseline environment in isolation, and $\tilde{p}_c(x_j)$ for each environment $x_j$ using only the data from the baseline and focal ($j$th) environments. Therefore, we do not require the same dilution factors (i.e. inoculum sizes) to be used in each environment. However, since the data in the baseline and focal environments are used simultaneously, the dilution factors should be normalized by the same factor $d^*$ in both cases.
To define confidence intervals (CIs) on a given parameter when the model's likelihood is a function of more than one parameter, in particular for any estimate of $\tilde{p}_c$, we use the concept of profile likelihood (ref. 15 , §3.4). The profile likelihood function of a focal parameter is defined as the likelihood when holding this parameter fixed to a given value. The focal parameter's CI is in turn defined by the limits of its fixed values that allow its profile likelihood, optimized over all other parameters, to attain an optimum within a critical difference below the maximum likelihood, as optimized over all parameters including the focal. The critical difference is defined by a chi-squared test with one degree of freedom, since one parameter is fixed in the profile likelihood.
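The profile likelihood construction can be illustrated for the two-parameter case of one baseline and one focal environment under the null model of the inoculum size effect. The sketch below is an assumption-laden illustration rather than the authors' implementation: the golden-section and bisection solvers, the search ranges, all function names and the example counts are hypothetical. Data are given as (normalized dilution factor, number of cultures growing, total cultures).

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 95% critical value, chi-squared with 1 d.f.

def pooled_log_lik(alpha_star, data):
    """Binomial log likelihood pooled over dilution factors, with p_w = 1 - exp(-alpha_star/d)."""
    total = 0.0
    for d, n_grow, n_tot in data:
        alpha = alpha_star / d
        if n_grow > 0:
            total += n_grow * math.log(-math.expm1(-alpha))  # log(p_w)
        total -= (n_tot - n_grow) * alpha                    # log(1 - p_w) = -alpha
    return total

def argmax_golden(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def fit_alpha(data):
    """MLE of alpha_star in a single environment (search range is an assumption)."""
    return math.exp(argmax_golden(
        lambda log_a: pooled_log_lik(math.exp(log_a), data), -10.0, 10.0))

def profile_log_lik(p_rel, baseline, focal):
    """Joint log likelihood with p_rel held fixed, maximized over baseline alpha_star."""
    def joint(log_a):
        a0 = math.exp(log_a)
        return pooled_log_lik(a0, baseline) + pooled_log_lik(p_rel * a0, focal)
    return joint(argmax_golden(joint, -10.0, 10.0))

def relative_establishment(baseline, focal, crit=CHI2_1DF_95):
    """MLE and profile likelihood CI of the relative establishment probability."""
    alpha0_hat, alpha_x_hat = fit_alpha(baseline), fit_alpha(focal)
    p_hat = alpha_x_hat / alpha0_hat  # MLE of the ratio, by invariance
    ll_max = pooled_log_lik(alpha0_hat, baseline) + pooled_log_lik(alpha_x_hat, focal)

    def excess(log_p):
        return 2.0 * (ll_max - profile_log_lik(math.exp(log_p), baseline, focal)) - crit

    def boundary(inside, outside):
        # Bisection on log(p_rel): excess < 0 at `inside`, > 0 at `outside`.
        for _ in range(50):
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0.0:
                inside = mid
            else:
                outside = mid
        return math.exp(0.5 * (inside + outside))

    log_p = math.log(p_hat)
    return p_hat, boundary(log_p, log_p - 8.0), boundary(log_p, log_p + 8.0)

# Hypothetical counts at normalized dilution factors 1 and 4.
BASELINE = [(1.0, 50, 60), (4.0, 22, 60)]
FOCAL = [(1.0, 20, 60), (4.0, 6, 60)]
p_rel_hat, p_rel_low, p_rel_high = relative_establishment(BASELINE, FOCAL)
print(p_rel_hat, (p_rel_low, p_rel_high))
```

The sketch assumes that each environment contains both growing and non-growing cultures, so that the maximum likelihood estimates are finite and the search brackets contain the interval boundaries.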
When plotting probability of population growth ($p_w$) at streptomycin concentration $x$ versus effective mean inoculum size ($N_{\text{eff}}$) calibrated from $\alpha^*_0$ in streptomycin-free media (main manuscript, Fig. 3b; and SI Appendix, Fig. S5), we plot confidence intervals around the best-fitting (MLE) curve that indicate the uncertainty relative to the x-axis. This accounts for the intuitive idea that a lower value of $\tilde{p}_c(x)$ can only sufficiently explain the data (observed population growth) in association with a higher value of $\alpha^*_0$, and vice versa. More precisely, at the lower limit of the profile likelihood CI on $\tilde{p}_c(x)$, say $\tilde{p}_{c,L}(x)$, we plot the curve $p_w = 1 - \exp\big(-\tilde{p}_{c,L}(x)\, N_{\text{eff}}\, \hat{\alpha}^*_{0,L} / \hat{\alpha}^*_0\big)$, where $\hat{\alpha}^*_0$ is the MLE in streptomycin-free media and $\hat{\alpha}^*_{0,L}$ is the optimized value of $\alpha^*_0$ when $\tilde{p}_c(x)$ is fixed to $\tilde{p}_{c,L}(x)$.

• natural logarithm of the dilution factor applied to the inoculating culture (treated as either continuous or categorical)
• antibiotic concentration on the treatment plates (categorical)
• experiment date, when pooling data from more than one experiment (categorical)

We

Correspondence between theoretical models used for likelihood inference and GLM
We note that due to the simple form of our theoretical models (in general, Eqn. S14), and hence the natural choice of link function (Eqn. S15), we could have relied entirely on built-in GLM methods to fit our models. More specifically, we have the following correspondences:
• The saturated GLM, with dilution factor as a categorical variable, corresponds to our full model (A or A′).
• The GLM including only main effects of antibiotic and dilution factor (again categorical), but not their interaction, corresponds to our model B′.
• The GLM treating the logarithm of the dilution factor as a continuous variable, and again including only main effects, is similar to, but not exactly the same as, our model C or C′.
The slight difference arises because the theoretical model assumes perfect dilution steps

where n is the total number of observations, n i is the number of observations in category i, and p i is the probability of an observation falling in category i in the fitted distribution (thus, np i is the expected number of observations in category i). We carried out two separate experiments, each with 144 plated spots. In both cases, we could use categories of 0, 1, 2, 3, 4, and 5 or more colonies per spot.
According to the goodness-of-fit test, the deviation from a Poisson distribution is not significant ($c_1 = 7.76$; $\chi^2_4$: $p = 0.10$) and thus we do not reject the null hypothesis that the data are drawn from a Poisson distribution.

14 Seeding experiments to estimate establishment probability of the resistant strain in isolation, across antibiotic concentrations
Here we report detailed results of model fitting to data from our seeding experiments with the resistant strain in isolation, screening across antibiotic concentrations (Section 4). For each experiment, we first report the results of fitting our theoretical models of population growth (Sections 10-11) to the data. The model selected by the likelihood ratio test is used to obtain the maximum likelihood estimates and confidence intervals of relative establishment probability ($\tilde{p}_c$), as plotted in the main manuscript, Fig. 2 and 5, and reported in the SI Appendix, Tables S2 and S4, for the Rms149 strain in streptomycin and the PAMBL2 strain in meropenem, respectively. Next, we fit a generalized linear model (GLM) to the growth data (Section 12), which is used to determine significant effects of antibiotic concentration as annotated on Fig. 2 and 5.

Resistant (Rms149) strain alone, in streptomycin
We first consider the data from two repeat experiments seeding the Rms149-carrying (streptomycin-resistant) strain, in isolation, into media containing various concentrations of streptomycin (0, 1/64, 1/32, 1/16, 1/8 × MIC R, where MIC R = 2048 µg/ml). These results are presented in the main manuscript, Fig. 2, and SI Appendix, Table S2.

Taking log(dilfac) as continuous, the reduced model fit (Table III.1) indicated that the effects of Strep at 1/16×MIC R and 1/8×MIC R are significant relative to the streptomycin-free conditions. As expected, the effect of log(dilfac) is also highly significant; furthermore, the fitted coefficient of −1.30 is reasonably close to the theoretical prediction of −1 (see Section 12), but presumably skewed by the suspected error in one dilution factor mentioned above. Treating log(dilfac) instead as categorical does not change the conclusions regarding significant effects (lowest vs. highest dilution factor: p < 2e-16; middle vs. highest dilution factor: p = 3e-10; Strep at 1/16×MIC R: p = 0.010; Strep at 1/8×MIC R: p < 2e-16).
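The theoretical prediction of −1 for the log(dilfac) coefficient can be seen by linearizing the null model. The short derivation below is a sketch under two assumptions: that the link function referred to as Eqn. S15 (not reproduced in this section) is the complementary log-log link, and that $d$ denotes the dilution factor normalized by $d^*$.

```latex
\begin{align}
p_w(d, x) &= 1 - \exp\!\left(-\tilde{p}_c(x)\,\alpha^*_0 / d\right) \\
\log\!\big(-\log\left(1 - p_w(d, x)\right)\big) &= \log \alpha^*_0 + \log \tilde{p}_c(x) - 1 \cdot \log d
\end{align}
```

On this scale the antibiotic concentration enters as an additive main effect, $\log \tilde{p}_c(x)$, and the dilution factor enters through $\log d$ with a coefficient fixed at −1, whereas a GLM with log(dilfac) as a continuous covariate estimates that coefficient freely.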

Experiment 2:
Taking log(dilfac) as continuous, fitting a saturated model again indicated that the log(dilfac) and Strep main effects were significant, but their interaction was not; the reduced model excluding the interaction term was correspondingly preferred by AIC (saturated: 86.5 vs. reduced: 81.6). The reduced model fit (Table III.2) again indicated that the effects of Strep at 1/16×MIC R and at 1/8×MIC R are significant. The effect of log(dilfac) is also highly significant, and the fitted coefficient of −1.01 is in excellent agreement with the theoretical prediction, consistent with the acceptance of the theoretical Model C in the previous analysis.

Pooling both experiments:
Using experiment date as an additional explanatory variable, a hierarchical search according to minimal AIC (applying the built-in R function 'step' to the saturated model) identified a reduced model in which all main effects and the experiment date × log(dilfac) interaction effect are retained, regardless of whether log(dilfac) is treated as continuous or categorical (see Table III.3 for the model fit in the continuous case).
The experiment date × log(dilfac) interaction presumably arises because of the suspected inaccuracy in the lowest dilution factor only in Experiment 1, as described above. Indeed, the interaction is identified as significant (p = 0.003) with log(dilfac) taken as continuous, while with log(dilfac) taken as categorical with the highest dilution factor as the baseline, the interaction between experiment date and the lowest dilution factor is significant (p = 0.008), but the interaction with the middle dilution factor is not significant (p = 0.8). Taken together, these results again point to error in a single dilution step in Experiment 1 as the source of deviating effects; we do not interpret this result as having biological significance.
More importantly, pooling data from two experiments strengthens the conclusions regarding the effect of streptomycin: specifically, Strep at 1/16×MIC R or 1/8×MIC R has a highly significant effect relative to the streptomycin-free control (p=2e-8 and p <2e-16, respectively, regardless of whether log(dilfac) is treated as continuous or categorical).

Resistant (PAMBL2) strain alone, in meropenem
We now turn to the seeding experiment data for the PAMBL2-carrying (meropenem-resistant) strain, seeded in isolation into media containing meropenem at various concentrations (0, 1/32, where MIC R = 512µg/ml). These results are presented in the main manuscript, Fig. 5, and SI Appendix, Table S4.

Theoretical model fitting
Similarly to the previous experiments, we fit the theoretical models A′, B′ and C′ to the growth data, and compared their fits using the likelihood ratio test (LRT). Note that we used five antibiotic concentrations but only two inoculating dilution factors in this experiment, implying that there are 10 parameters to fit in Model A′, six in Model B′, and five in Model C′. The LRT

Generalized linear model fitting
In the GLM, the explanatory variables are meropenem concentration, Mero for short; and the logarithm of the inoculating dilution factor, log(dilfac) for short. The significance of meropenem effects as determined from the GLM fit is annotated on Fig. 5 Table III.4).   Table S2.
In Table III.5 we summarize the results of the likelihood ratio test at each streptomycin concentration in each experiment, based on growth evaluated at either 3d or 5d post-inoculation in the presence of streptomycin. In all but one case, the deviance (D) of the null model from the full model (calculated separately at each streptomycin concentration) is non-significant (p > 0.05), and thus we do not reject the null model. In supplementary experiment 2, if growth is evaluated at 3d, the deviance becomes marginally significant (p = 0.048), but does not remain significant after correcting for multiple testing.

c In streptomycin-free conditions, we use the maximum likelihood estimate of α * 0 , scaled up by the corresponding dilution factor applied for the inoculation, to estimate an effective viable cell density in the overnight culture used for inoculation. This is equivalent to the "most probable number" method for determining bacterial density using multiple dilution factors 17 . We use this estimate, along with the known dilution factors applied to the inoculating culture, to scale the "effective mean inoculum size" on the x-axis of our plots.
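The "most probable number" calculation described above can be sketched as follows. This is a hedged illustration, not the authors' code: the inoculation volume of 0.02 ml, the replicate counts and the function names are hypothetical, and the estimate is obtained by solving the score equation of the pooled binomial likelihood by bisection.

```python
import math

def score(density, tubes):
    """Derivative of the pooled binomial log likelihood with respect to density.

    tubes: list of (volume, n_grow, n_tot), where volume is the amount of
    undiluted culture (in ml) delivered to each replicate at that dilution,
    so that the mean number of established cells is density * volume.
    """
    s = 0.0
    for volume, n_grow, n_tot in tubes:
        x = density * volume
        s += volume * (n_grow * math.exp(-x) / (-math.expm1(-x)) - (n_tot - n_grow))
    return s

def most_probable_density(tubes, iterations=200):
    """Solve score(density) = 0 by bisection on log(density).

    Assumes at least one culture grew and at least one did not, so that the
    score is positive at very low density and negative at very high density.
    """
    volumes = [v for v, _, _ in tubes]
    lo = math.log(1e-6 / max(volumes))
    hi = math.log(1e3 / min(volumes))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(math.exp(mid), tubes) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Hypothetical example: 0.02 ml inoculated per culture from 5e7-fold and
# 2e8-fold dilutions of the overnight culture, 60 replicates each.
INOC_VOLUME_ML = 0.02
TUBES = [(INOC_VOLUME_ML / 5e7, 50, 60), (INOC_VOLUME_ML / 2e8, 22, 60)]

density_hat = most_probable_density(TUBES)                   # effective viable cells per ml
effective_inoculum = [density_hat * v for v, _, _ in TUBES]  # mean established cells per culture
print(density_hat, effective_inoculum)
```

Multiplying the estimated density by the volume of undiluted culture delivered at any dilution factor gives the effective mean inoculum size at that dilution.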
16 Seeding experiments to estimate establishment probability of the resistant strain in the presence of the sensitive strain
Here we present detailed analysis of experiments seeding the resistant (Rms149-carrying) strain in the presence of the sensitive strain, across streptomycin concentrations (Section 6; results presented in the main manuscript, Fig. 6, and SI Appendix, Table S6).

Growth of sensitive strain alone
As controls, we tested growth of the sensitive strain alone, at each streptomycin concentration. Table III.6 reports these growth data. The frequent growth of cultures in 1×MIC S streptomycin when inoculated at high density (approx. 5 × 10^7 CFU/ml) is consistent with previous observations that MIC often increases with inoculation density 18;19 ; recall that the standard MIC S was evaluated at inoculation density similar to the "low" density here (approx. 5 × 10^5 CFU/ml). Occasionally, at streptomycin concentrations well above the MIC of the sensitive strain (4×MIC S and higher), replicate cultures inoculated with the sensitive strain alone (see Table III.6) showed growth but low fluorescence. We interpret these cases as probable outgrowth of spontaneous resistant mutants on the DsRed sensitive strain background (or possibly cross-contamination with unrelated, streptomycin-resistant bacteria in the lab). Notably, all of these cases occurred when the sensitive strain was inoculated at high density, providing 100-fold higher initial population size from which mutants could have arisen than the low-density inoculum.

Section 6), as data for model fitting. The "baseline condition", against which relative establishment probabilities were normalized, was chosen as antibiotic-free media in the absence of the sensitive strain. As before, model selection was based on the likelihood ratio test.

Seeding data: Assessing significance of experimental conditions
The seeding data showed too many conditions in which zero replicates established for us to effectively fit a GLM to the full dataset. Instead, we focused on comparing selected pairs of conditions that were of particular interest, and in which significance of the effect of the sensitive population on resistant establishment was not immediately clear. For this purpose, we treated each seeding replicate as a Bernoulli trial (with "success" being establishment of resistance in that culture), and compared pairs of experimental conditions using a two-sided Wilcoxon rank-sum test, with a Bonferroni correction for multiple testing. We pooled data across the two inoculated dilution factors of the resistant strain; since the Wilcoxon test is nonparametric, it does not matter that the distribution of the pooled data will be a weighted sum of two binomial distributions, with different success probabilities at different inoculum sizes.
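A pairwise comparison of this kind can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' procedure: it uses the normal approximation to the rank-sum statistic with mid-ranks, a tie-corrected variance and a continuity correction, and the replicate counts and number of comparisons are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Uses mid-ranks for ties, a tie-corrected variance and a continuity
    correction. Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        ranks[pooled[i]] = 0.5 * (i + 1 + j)   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 1e-12:                         # all observations tied
        return u, 1.0
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u, math.erfc(z / math.sqrt(2.0))    # two-sided normal tail probability

def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical Bernoulli outcomes (1 = resistance established in that culture).
without_sensitive = [1] * 20 + [0] * 40   # 20 of 60 replicates established
with_sensitive = [1] * 5 + [0] * 55       # 5 of 60 replicates established

u_stat, p_raw = rank_sum_test(without_sensitive, with_sensitive)
p_adjusted = bonferroni(p_raw, 4)         # assuming four pairwise comparisons
print(u_stat, p_raw, p_adjusted)
```

With binary outcomes nearly all observations are tied, which is why the tie correction to the variance matters here; statistical packages may differ slightly in how they apply the continuity correction.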
In experiment 1, we also pooled data across absence and presence of the sensitive strain at low density, which showed very similar results, to increase the power to detect an effect of the sensitive strain at high density.

Test cultures at two-fold concentration steps of streptomycin were inoculated with the PA01:Rms149 strain at four different inoculum sizes. MIC was evaluated as the minimal tested concentration that prevented detectable growth up to 20h (blue) or 3d (red) post-inoculation. The data at 3d are the same as in the main Fig. 3A. The y-axis is scaled by the MIC of this strain at standard inoculation density (MICR; see Table S1). The points (plotted with slight offsets in the y-direction for visual clarity) represent six biologically independent replicates at each inoculum size, with the line segments indicating their median at each time point.

Table S1). Growth was tested in two-fold concentration steps of streptomycin, up to a maximum of 1 × MICR; if growth occurred at this concentration, the MIC is plotted here as 2 × MICR but could be higher. The absolute inoculum size in CFU, in log scale on the x-axis, was estimated by plating. The plotted points represent six biologically independent replicates at each condition and the line segments indicate their median. Points are slightly offset in both x- and y-directions for visual clarity. Inoculation density in CFU/ml is indicated by colour as per the legend. At three of the nine tested inoculum sizes (6.9 × 10^2, 6.9 × 10^3, and 6.9 × 10^4 CFU), two different densities were tested in each case; note that the medians (in cyan and yellow) coincide for the two densities tested at 6.9 × 10^2 CFU. At matched absolute inoculum sizes, in no case is there a significant effect of density on MIC: at 6.9 × 10^2 CFU, comparing 6 × 10^2 vs. 6 × 10^3 CFU/ml: p=0.17; at 6.9 × 10^3 CFU, comparing 6 × 10^3 vs. 6 × 10^4 CFU/ml: p=0.14; at 6.9 × 10^4 CFU, comparing 6 × 10^4 vs. 6 × 10^5 CFU/ml: p=0.21 (Wilcoxon rank-sum test with continuity correction; approximate p-values computed due to ties).

primarily detects the propidium iodide (PI) stain. In a second analysis step, the "dead" gate was drawn around the cluster that appeared with lower TO and strong PI staining (representing cells with compromised membranes), and the "intact" gate was drawn around the cluster that appeared with higher TO and weak PI staining (representing cells with intact membranes). This gating provided further discrimination from background events, due to the low proportion of events falling in either the dead or the intact gate in media-only controls (across six replicates, 3-17% of events within the "cells" gate, or 11-38 events per sample), compared to the high proportion in heat-killed samples (99%), cultures treated with up to 1/16 × MICR streptomycin (86-94%), and cultures treated with 1/8 × MICR streptomycin (67-78%). We determined the fraction of dead cells, with correction for remaining background events, as the number of events in the "dead" gate divided by the total number in both "dead" and "intact" gates (minus the numbers in each gate in the media-only control). The fraction of dead cells in the heat-killed samples was thus close to 100%, while the fraction in treated cultures varied with streptomycin concentration (Table S3 and Fig.

4A). The examples shown here are samples from 500-fold diluted cultures after 24h in streptomycin-free media; from left to right: a pure sensitive strain culture, a pure resistant strain culture, and a mixed culture (inoculated with both strains in a 1:1 volumetric mixture). The FL1 detector is configured with a 488nm laser with a 533/30 interference filter, which will detect YFP fluorescence; thus, the resistant strain appears with elevated fluorescence in this channel.
Note however the substantial overlap of the pure resistant culture into the "sensitive" gate; therefore, the counts falling in each gate in the mixed cultures were adjusted accordingly to infer the proportion of each strain (see Suppl. Text, section 9).

Following inoculation at a 1:1 volumetric mixture of both strains, cultures were incubated for 24h and then diluted 500-fold and sampled by flow cytometry. Detected events were classified as sensitive cells (blue) or resistant cells (red) according to gating by fluorescence, corrected for overlap between strains and for background events in media-only controls (Fig. S8). The number of cells in a 66µl sample of diluted culture is plotted here. Streptomycin concentration on the x-axis is scaled by the standard MIC values for the resistant and sensitive strains (Table S1). Points indicate six biologically independent replicates at each concentration, with sensitive and resistant cells in the same replicate culture plotted at the same horizontal position; line segments indicate the mean for each strain.

(Table S1). Points represent six biologically independent replicates at each concentration, with line segments indicating their mean (see also Table S4). Asterisks indicate that the mean final proportion of the resistant strain significantly differs from 0.5 using a two-sided t-test at each of the lowest seven tested streptomycin concentrations, with a Bonferroni correction for multiple testing (n.s.: p > 0.05/7; *: p = 4e-3; **: p ≤ 5e-6). At the highest three tested streptomycin concentrations, lack of variation among replicates precludes a t-test.

Text, section 6). Panels A and B illustrate the results of two separate experiments that tested different sets of culture conditions, as in the main Fig. 6. Each point on a scatter plot represents one replicate culture. Points are colour-coded according to absence of the sensitive strain (black); presence of the sensitive strain at low density (cyan; first experiment only); or presence of the sensitive strain at high density (orange). Cultures inoculated with the resistant strain at higher mean inoculum size (5e7-fold diluted culture) are represented by points shaded darker, while those inoculated at lower mean inoculum size (2e8-fold diluted culture) are shaded lighter. Each plot corresponds to a different streptomycin concentration, as annotated above the plots. The thick lines indicate the threshold value of OD used to define growth (0.1) and the threshold value of fluorescence used to assign growth to the resistant strain (5e5). The thin grey lower line indicates the mean background fluorescence in media-only controls.

. S8 and Suppl. Text, section 9, for details). The reported confidence intervals (CI) on the mean, test statistics and p-values are from two-sided, one-sample t-tests (d.f.=5) comparing the final proportion of the resistant strain to a mean of 0.5 (the initial proportion). Significant results after a Bonferroni correction for multiple testing (7 tests, giving a significance threshold of 0.05/7 ≈ 0.007) are in bold font. N/A: a t-test cannot be performed due to lack of variation among replicates.