Temperature and population density influence SARS-CoV-2 transmission in the absence of nonpharmaceutical interventions

Significance There is still much to be understood about the factors influencing the ecology and epidemiology of COVID-19. In particular, whether environmental variation is likely to drive seasonal changes in SARS-CoV-2 transmission dynamics is largely unknown. We investigate the effects of the environment on SARS-CoV-2 transmission rates across the United States and then incorporate the most important environmental parameters into an epidemiological model. We show that temperature and population density can be important factors in transmission but only in the absence of mobility-restricting policy measures, although particularly strong policy measures may be required to mitigate the highest population densities. Our findings improve our understanding of the drivers of COVID-19 transmission and highlight areas in which policy decisions can be proactive.


Data and Code Availability
The entirety of the code to reproduce our analyses, download source data and update models with new data are available from our GitHub repository (https://www.github.com/pearselab/tyrell). We also provide releases of the R workspaces from the Bayesian model runs used in the manuscript: the main model, the cross-validation forecasting model and cross-validation null model.

Collinearity of Climate Variables
As stated in the main text, correlations between the climate variables (temperature, absolute humidity, UV radiation levels) mean that we could not disentangle each variable's contributions to R0 in multiple regression analysis. This can be seen in the high variance inflation factors for all climate variables when these predictors are regressed together (table S1).
To further illustrate this, we perform a principal components analysis (PCA) on the three climate variables in our dataset, plus population density. Eigenvalues of the first two components of the combined datasets together accounted for 83% of the total variance. Temperature and absolute humidity fall mainly along the axis of PC1, which accounts for 55% of the variation. This close relationship is expected given that absolute humidity is a function of temperature (see main text eq. 1). UV radiation is correlated with temperature and humidity, but less so, and contributes more to PC2 which also comprises population density and explains 28% of the variance ( fig. S1).
For further analysis we therefore chose the best fitting climate variable, assessed by Pearson's r (table S2). Based on this, we used temperature as our climate variable in further analyses.

Effects of environment before and during lockdown
When regressed against temperature and log 10 -transformed population density only, we find that R0 significantly increases with population density and decreases with temperature (both p < 0.001, table S3, see main text Fig. 1A and fig S2). However, when regressing lockdown Rt (mean Rt across the two weeks following a state-wide stay-at-home mandate) against temperature and log10-transformed population density, we find no significant effect of temperature, and a much lower coefficient for population density than in the R0 regression model (table S4).

Lockdown as a model term.
We combine the pre-lockdown R0 and during-lockdown Rt data for the USA and perform multiple linear regression with lockdown as an additional parameter, to further investigate the impact of mobility restrictions (i.e. lockdown) compared to the environment. We restrict the environmental parameters to temperature and population density due to the collinearity between temperature, UV radiation and humidity (see section 1). We first incorporate lockdown as a binary (i.e. in lockdown, or not in lockdown) additive effect and test the effects on R: R ∼ T emperature + log 10 (P opulation density) + Lockdown As expected, we find that lockdown has a significant effect on R, with an order of magnitude greater strength than either environmental parameter (table S5).
Subsequently we ask whether lockdown mediates the environmental effects by incorporating this as an interaction term, i.e.: R ∼ T emperature + log 10 (P opulation density) + Lockdown+ T emperature : Lockdown + log 10 (P opulation density) : Lockdown Here we find that there is a significant interaction between lockdown and temperature, but not population density (table  S6), i.e. the effects of temperature on R are dependant upon lockdown status. This interaction model is preferred over the purely additive model (ANOVA, F2,74 = 9.996, p < 0.001). This is the result presented in the main text.

Additional explanatory variables
Urban population. Whilst we have shown an effect of state-level population density on R, this may not be fully representative of the transmission dynamics if most transmission occurs within metropolitan areas. We therefore explored the use of census data specifying the proportion of people living in urban environments as an alternative predictor to population density. We also considered the total population of each state and the total urban population of each state (i.e. % urban population × total population). However, we find that population density is by far the strongest correlate of R0 (table S7), and therefore use this as the population demographic predictor in our models. Airport arrivals. Our modelling approach accounts for the comparative movement of people within states using the mobility data from Google, both for our independent validation R estimates, as well as within the Bayesian model that we have adapted here. However, we have no variables specifically taking into account visitors from other states or countries, which may increase a state's transmission rate. To investigate this potential effect, we asked whether R0 increases in states with a great amount of air traffic. We collated data from the Federal Aviation Administration's performance metrics giving daily flight arrivals at 77 of the USA's busiest airports (the ASPM77). We compared the cumulative arrivals in the two weeks proceeding our R0 estimate by state, but find no correlation with R0. Furthermore, the states with the greatest numbers of airport arrivals have comparatively low R0 estimates (Florida, California, Texas, Fig. S3). Note however that we cannot differentiate between interand intra-state flight arrivals here.

Potential confounding factors
Epidemic starting dates. For each state, R0 was estimated 30 days prior to the first 10 cumulative deaths recorded (see methods). This means that R0 was estimated on different dates for each state, depending upon when the epidemic took hold. However, if people in states where the epidemic took hold later had already begun to modify their behaviours based on the situation in earlier succumbing states, this may result in a reduction in R0 in those later states. Moreover, if there were spatial component to this, i.e. epidemic beginning earlier in Northern (colder) states and later in Southern (warmer) states, it could result in a spurious correlation between temperature and R0. We investigated this potential confounding factor, but find no relationship between R0 and the date it was estimated on (Fig. S4). Furthermore, we see no relationship between these dates and the temperatures across states, i.e. epidemics did not generally take hold in colder states and transfer to warmer states later (but note the strong partitioning in R0 between colder and warmer states, Fig, S4). Therefore, the lower transmission rates seen in warmer states are not due to epidemics starting later in these states and giving the population the opportunity to modify their social behaviour in preparation.
Effect of environment on recreational mobility. Mobility trends for parks were omitted from estimates of Rt both from the datasets that we used our external validation exercise, as well as our own Bayesian modelling (as significant contact events are assumed to be negligible). However, here we further investigate the impacts of environmental temperature on outdoor recreational mobility levels as a potential confounding factor in these types of analysis (i.e. "do people go to the park more when it's warm?"). We first detrend the data using the diff() function in R, as the effects of policy are expected to have a large impact on the realised mobility levels. This allows us to compare whether an increase in environmental temperature at the daily level causes a similar increase in recreational mobility. When we compare daily changes in mobility against daily changes in temperature across all US states, between February and June 2020, we find no significant correlation (t5276 = −0.64628, r = −0.0089, p = 0.518).

Bayesian model
Model structure. The model used to generate R0 and Rt estimates used in our independent validation of environmental effects is presented in Unwin et. al. (1). This model is based on the framework provided by Flaxman et. al. (2), in which Rt informs a latent function for infections, which are then calibrated against observed deaths, but has been adapted to paramaterise Rt as a function of google mobility data.
Specifically, the number of infected individuals, i, is modelled using a discrete renewal process. The number of infections it,m on a given day t, and state m, is given by the following function: where the population of state m is denoted by Nm and the adjustment factor St,m accounts for the number of susceptible individuals in the population. The generation distribution g is discretised by gs = Observed deaths are mechanistically linked to transmission, using previously estimated COVID-19 infection fatality ratios (IFRs), where each state has a specific mean IFR (ifrm). The distribution of times from infection to death, π was assumed (based on previous epidemiological studies) to be a convolution of an infection-to-onset distribution (π ) and an onset-to-death distribution: π ∼ Γ(5.1, 0.86) + Γ(17.8, 0.45). [3] The expected number of a deaths dt,m on a given day t, for a given state m, is given by: where iτ,m is the number of new infections on day τ in state m and π is discretised via πs = s+0.5 s−0.5 π(τ )dτ for s = 2, 3, ... and π1 = 1.5 0 π(τ )dτ where π(τ ) is the density of π. Rt,m is parameterised as a function of the relative change in human mobility from a baseline: where f (x) = 2exp(x)/(1 + exp(x)) is twice the inverse logit function. X t,m,k are covariates that have the same effect for all states, Y t,m,l is a covariate that has region-specific effects, Zt,m is a covariate with state-specific effects and m,wm(t) is a weekly AR(2) processed which captures the variation between states not explained by the mobility covariates. The mobility covariates used were: is the average change in mobility for retail and recreation, groceries and pharmacies, and workplaces, and M residential t,m is the change in mobility for places of residence. The prior distribution for R0,m was chosen to be: where κ is the same among all states. Seeding of new infections was assumed to begin 30 days before first 10 cumulative deaths recorded by a state. From this date, the model was seeded with 6 sequential days of an equal number of infections The seed infections are inferred in the Bayesian posterior distribution.
The prior distribution for the shared coefficients are given as: and the prior distributions for pooled coefficients as: Parameters were jointly estimated for all states in a single hierarchical model, with fitting performed in the Stan programming language.
To integrate temperature and population density into this model, we made a number of alterations to the model structure, see main text methods.
Model coefficients. Figure S5 shows the posterior interval estimates from MCMC draws for the main parameters in our Bayesian model. We also provide the full model coefficients and results of posterior predictive checks to ensure that all chains had mixed and converged in table S8.

Contribution of Bayesian model terms to
Rt. We provide a visualisation showing the influence of our various model terms on Rt for each state through time in fig S6. When reductions in mobility are strong, Rt is largely driven by the mobility terms in the model, however when mobility reductions are weak, temperature and population density are much greater drivers of Rt. See for example New York: when mobility was close to baseline in February and March, fluctuating temperatures cause large spikes in Rt, which is also generally driven upwards by the relatively high population density of the state. However, after stay-at-home measures were implemented, Rt drops and closely tracks changes in mobility, with temperature fluctuations having much lesser an effect. This shows that although we don't explicitly model an interaction between environment and mobility, our model represents this effect.

Cross-validation.
To validate the point-estimates of our model, we use our fitted coefficients with known temperature and mobility data to forecast deaths into the 14 days immediately following the date range that the model was fitted to. Our model provides a good prediction over this 14-day period, with predicted deaths highly correlated with the real observed deaths (Pearson's r = 0.99, Fig. S7). We visualise these predictions for each state in Fig. S8, where the time series of death data is shown with the model predictions overlaid. Whilst the model provides a good prediction of deaths over this period, we can observe that the uncertainty in the predictions rapidly increases (wide credible intervals) over the prediction range. This is in line with previous validation of the base model used here, which found good forecasting over short-timescales, but that performance degrades rapidly at longer time-scales (2).
To further validate the inclusion of daily temperature variation our model, we compared the model fit to that of a null model in which we replaced all daily temperature data within a state with that state's median temperature, and fit the model as described in the methods. This allows us to ask whether the daily changes in temperature through time are an important factor, or whether only the absolute differences in temperature between states are important (i.e. the temperature-R0 correlation in our independent regression analyses). Within this model run, the posterior distribution for C suggested limited-to-no residual effect of temperature (median −0.038; 95% CIs −0.26 -+0.23). This supports our conclusion that changes in temperature and transmission are correlated through time: it is not the average temperature of a state through the four-month period of this study that is associated with daily transmission, but rather the temperature on each individual day. Were our findings driven solely by average temperature differences across US states, and not daily temperature, we would have found similar coefficient estimates between our observed data and this null analysis.  (table 1, main text), using the median population density across all states for the temperature figure, and the median temperature across states for the population density figure, for preand during-lockdown respectively. transformed as these data are heavily right-skewed. This is based on data from the 77 busiest airports in the USA; some states have no airports within this dataset and so are omitted here. These are the overall transmission rate common to all states, the three measures of mobility, and the temperature and population density parameters that we added to the original model (each with their symbols as given in equation 2 [methods] in parentheses). Point represents median, thick line 50% interval, thin line 95% interval.   Here we show the cumulative predicted deaths for a each state over the 14-day prediction period, plotted against the real observed deaths over the same period (log10 transformed). Black line is 1:1. Observed deaths are highly correlated with those predicted by our model (Pearson's r = 0.99).
Two states with zero observed deaths over this period (AK and HI) are omitted from this plot, although their lack of deaths were also well predicted by the model (median cumulative death predictions for AK = 0.28, HI = 0.34).

Fig. S8. Predicted deaths by state.
Here we show the observed and predicted deaths for each state across the entire time-series of our model. Grey bars show the daily recorded deaths for the date range that our model is fitted to. Red bars show the observed daily deaths in the range which our model is predicted for without death data (with the dashed line designating the split between these data ranges). Blue line is the median death prediction, with dark blue (50%) and light blue (95%) credible interval bands for the predictions.