Cell size control driven by the circadian clock and environment in cyanobacteria

Significance When and at what size to divide are critical decisions, requiring cells to integrate internal and external cues. While it is known that the 24-h circadian clock and the environment modulate division timings across organisms, how these signals combine to set the size at which cells divide is not understood. Iterating between modeling and experiments, we show that, in both constant and light−dark conditions, the cyanobacterial clock produces distinctly sized and timed subpopulations. These arise from continuous coupling of the clock to the cell cycle, which, in light−dark cycles, steers cell divisions away from dawn and dusk. Stochastic modeling allows us to predict how these effects emerge from the complex interactions between the environment, clock, and cell size control.


Segmentation and cell length
All images were segmented and tracked using a modified version of Schnitzcells (1). The segmentation algorithm is performed on combined fluorescent and phase contrast images. The fluorescent image is acquired in the range of red wavelengths, and so it collects the cells' autofluorescence. Since S. elongatus cells are rod shaped, cell size is proportional to cell length. Cell length is defined as the length of the semi-major axis of the segmented cell shapes.
Only cells that divided within 120 h of the start of the experiment were used in the subsequent analysis.

Exponential elongation rates
Time-series of cell length were extracted and smoothed with either a locally weighted regression (lowess) method (for constant light data) or a moving average window 3 data points wide (LD cycles). We defined the exponential elongation rate as where is the smoothed cell length at time .

Defining time of day in constant light and in LD cycles
In all experiments, cells were entrained by 12:12 LD cycles (step cycles) before image acquisition. Clock phase is reset after prolonged periods of darkness (2), and the clock maintains robust and synchronous oscillations at the single-cell level for long periods, even under constant light (3). For simplicity, we assume that a circadian cycle lasts 24 h, and we define 'time of day' ( in the model), as a proxy for clock phase, in relation to environmental transitions: dawn (when lights are switched on) and dusk (when lights are switched off). Times of day are therefore set at the beginning of the experiment, and reset every 24 hours. Under constant light, subjective dawn occurs 24 h after the end of the last 12 h long dark period, and every 24 h thereafter. Subjective dusk occurs 12 h after subjective dawn. Times of day relative to a 24-hour day are thus obtained by shifting total experimental time by 12 modulo 24.
In light-dark cycles, times of day at dawn are the same for 12:12 LD and 16:8 LD conditions. Under 16:8 LD cycles, the light period is extended by 4 h at the end of the day relative to 12:12 LD, and dusk occurs at a time of day 4 h.

Subpopulation clustering
For each tracked single cell that completed a cell cycle during image acquisition, we extracted distributions of time of birth, time of division, length at birth, length at division, added length and cell cycle duration. Joint distributions of time of birth and cell cycle duration under constant light revealed two distinct peaks (Fig. 2C). Both peaks are approximately symmetric and bell shaped along each dimension, so we assumed they can be fitted with Gaussian distributions, and that all points can therefore be thought of as belonging to a probabilistic mixture of two Gaussians (clusters).
The mixture components yield the likelihood that a measurement belongs to either cluster. We then applied the likelihood-ratio criterion to assign cells to either of the two clusters or subpopulations (see Supp. Fig. 1 for results from experiment and simulation). For light-dark data, the data falls naturally into two well separated subpopulations, which are defined according to whether the cell cycle history of a cell included a period of darkness or not, so in principle no mixture model is needed.

Analysis of division times using lineage-weighting
To analyse the frequency of cell divisions throughout the day, we used a lineage-weighted estimation in Fig. 6B,C. To this end, we extracted all lineages from the data and weighted each data point by a factor 1/2 depending on the number of cell divisions in that lineage. The method takes into account that fast growing cells are overrepresented compared to slow growing ones (4-6). We found the difference to the unweighted data is significant only in the LD conditions, because the two subpopulations have vastly different interdivision times.

Nonparametric estimator of the size control hazard
The relation between the division rate Γ (with 1) and the cell cycle duration distribution is Changing variable from cell cycle duration to division length , we find the distribution of division length is where we used Γ from Eq. 1 of the main text with 1. A simple relation between the size control hazard , and the above is therefore (S1) or, equivalently, where is the the probability of reaching length before dividing. Eq. S1 is a non-parametric estimator of the size control hazard in clock-deletion cells, which is shown in Fig. 3A, Supp. Figs. 4A and 11D-F. See Refs. (7,8) for further information.

Size control hazards parameterized using linear models
In this subsection, we explain how to parameterize the dynamic model of the size control of the clock-deletion strain. We assume a linear model where Δ follows a gamma distribution with shape and scale parameters and where Ψ is the gamma function. The gamma distribution is a continuous distribution characterised by only two parameters, essentially describing mean and variance. For small variances, the Gamma distribution converges to the Gaussian distribution, but for larger variances it enjoys the particular advantage of being strictly positive (unlike a Gaussian distribution).
The size control hazard Δ then follows from the division length distribution as The above is obtained as the ratio between this distribution and its survival function, giving the relation (S2) Using the gamma distribution, this equation evaluates to where Γ denotes the upper incomplete gamma function.

Unbiased estimation of growth parameters
In principle, the cell size control parameters , and can be found by estimating the slope a through a linear regression and evaluating the residuals of the regression. The residuals are are then fit to a gamma distribution to extract and . However, the parameters obtained using this simple approach are typically biased when we do not account for unobserved growth before and after division. The power of the Bayesian approach is that we can estimate the true parameters given these uncertainties.
To this end, we assume cells are imaged every Δt and denote the length measurements in the birth and division frames as , and taken at times , and , for 1,2, . . , respectively. The true birth and division lengths are and , where , and , are unknown time-intervals in 0, Δ . The growth rate in the different conditions are described in SI Sec. 9.
Assuming that the unobserved time-intervals are uniformly distributed in 0, Δ ,the likelihood of the growth parameters reads We sampled the posterior distributions of the growth parameters using an adaptive MCMC sampler (9) and evaluating the above integrals numerically. In Supp. Fig. 3, we demonstrate the uncorrected regression method leads to biased estimates of the growth parameters, while the Bayesian method accounting for unobserved growth removes this bias.

Inference of the circadian coupling function from single-cell data
To determine the coupling function, we need to estimate its likelihood. The probability for a single cell born at time to divide after a cell cycle duration is a product of the probability of cell division Γ , , , and the probability that the given cell has not divided before that time. The result is This expression depends not only on the whole single-cell trajectory of length but also on its derivative with respective to cell age ′.
Because derivatives are difficult to estimate from single-cell data, we focus on the likelihood for a cell to divide at a specific length, which can be obtained by a change of variable. Using Eq. 1 of the main text in Eq. S4, we obtain the probability of a cell to divide at a length , (S5) a quantity that is independent of the length derivatives but depends on cell age ′ . Since cell length strictly increases in almost all of the observed single-cell traces, we estimated this quantity by interpolating cell age against cell length measurements for each cell using a cubic Bspline and evaluated the resulting integral in the above equation numerically.
The likelihood of the coupling function given N single-cell observations is then given by Accounting for the unobserved growth (see SI Sec. 6 for details), where and are independent random variables that are uniformly distributed in the interval 0, Δ . The unobserved growth was estimated by extrapolating the length-traces. We parameterized the coupling function Exp , a positive function with arguments taken modulo 24 hours, by a cubic B-spline with 12 knots (0,2,...,22 h) and periodic boundary conditions. The spline was evaluated at the knots and we sampled the posterior distributions using an adaptive Gibbs-sampler implemented in the Julia library Mamba (9). The result of this inference is shown in Fig. 3E.
Direct estimation of the coupling function is complicated because of the difficulty to enumerate all possible cell length histories. Alternatively, we can obtain an estimate of the division rate by simply ignoring the history-dependence in Eq. S5. By doing so, we may estimate the division length distribution (S8) and then compute the approximate size-control hazard , as shown in Supp. Fig. 5.
Note, however, that due to the average over histories in Eq. S8, which enter through the coupling function (cf. Eq. S5), Eq. S9 is a biased estimator of the size-control in the presence of circadian modulation. In contrast, in the absence of circadian modulation, the division length depends only the birth length and Eq. S9 equals the size control hazard as explained in SI Sec. 6.1. The non-parametric estimator of the division rate is thus unbiased for clock deletion cells but not for WT cells.

Estimation of the circadian coupling function using division times (ignoring cell size control)
An alternative model for estimating the coupling function is to ignore the dependence on cell length in the division rate. The division rate Γ , then depends only on the present time of day and on cell age , Eq. S4 then reduces to In this simpler model, which we call the division-time model, the division rate can be obtained directly from inverting the above expression. The non-parametric estimator is given by (S10) Note that the function in the denominator is just the probability that a cell born at time t has not division before reaching age τ. In Supp. Fig. 4A,B, we show that the division rates clockdeletion and WT cells obtained using the above equation. Specifically, by plotting the ratio of the division rates (Supp. Fig. 4C), we find that Γ factorises as where the functions and can be estimated efficiently using the likelihood (S11) which depends on the measured birth times and cell cycle durations , , ,…, . Here, the birth time refers to the acquisition time of the first frame, and the cell cycle duration is the difference between the acquisition times of the first and last frames the cell was observed in. Since cells were imaged every Δ (45 min in constant light) and cell divisions were not directly observed, we accounted for the uncertainty in division times (S12) Where u and u are independent random variables that are uniformly distributed in the interval 0, Δ . Samples from the posterior distribution were obtained using an adaptive MCMC sampler as shown in Supp. Figs. 4D, 6 and 12, where we parametrise using a cubic B-spline with periodic boundary conditions as in SI Sec. 7.
is the hazard function of a Gamma distribution, whose parameters are estimated alongside .

Parameters used in stochastic simulations
For model simulations, we use Δ given by the hazard of the gamma distribution (Eq. S3). Because Δ is an increasing function, it is bounded by Δ 1 .  Fig. 7). In clock-deletion cells, it is constant throughout the day with a mean of 0. Parameters , and represent medians of the posterior distributions sampled using the method described in SI Sec. 6.3. Brackets denote the corresponding 95% credibility interval.        These couplings also generate two or three subpopulations but display decreasing added lengths in the slow subpopulations and a lower slope (violet lines) in the fast ones, which is not supported by our experiments (C). In the lower panels, magenta and black dots are single cell data points, and violet and grey lines are the regression lines for the fast and slow subpopulations, respectively. Grey shades represent subjective night. All panels show experiments or simulations for constant light conditions.     S12. Comparison between coupling functions obtained using data from this study and data from Yang et al. (10). We use the Bayesian approach (SI Sec. 8, Eq. S7) to extract the coupling function from division time data reported by Yang et al. We find the coupling function extracted from their data (teal) shares similar features with the coupling function found in this study (yellow), namely a peak of cell division towards the end of the day and lower division rates at other times. The coupling function from Yang et al. dataset (10) has been scaled to match the coupling function used in this study, which reports coupling levels relative to clock-deletion cells. Grey shades represent subjective night under constant light conditions. Data replicated for two cycles to highlight periodicity.