Bayesian inference of reassortment networks reveals fitness benefits of reassortment in human influenza viruses

Significance Genetic recombination processes, such as reassortment, make it complex or impossible to use standard phylogenetic and phylodynamic methods. This is due to the fact that the shared evolutionary history of individuals has to be represented by a phylogenetic network instead of a tree. We therefore require novel approaches that allow us to coherently model these processes and that allow us to perform inference in the presence of such processes. Here, we introduce an approach to infer reassortment networks of segmented viruses using a Markov chain Monte Carlo approach. Our approach allows us to study different aspects of the reassortment process and allows us to show fitness benefits of reassortment events in seasonal human influenza viruses.

true reassortment rate estimated reassortment rate Figure 1: Estimates of effective population sizes and reassortment rates from simulated genetic sequences. A Estimated effective population sizes and 95% confidence intervals (y-axis) vs. simulated effective population sizes on the x-axis. B Estimated reassortment rates and 95% confidence (y-axis) vs. simulated reassortment rates on the x-axis. C Posterior support for true reassortment events (y-axis) given the reassortment distance (x-axis). Inference of reassortment networks from sequences simulated with a evolutionary rate of 5 × 10 −3 mutations per site and year (top row) and 5 × 10 −4 mutations per site and year (bottom row). From left to right, the reassortment events are for networks with 2,3 and 4 segments. This is particularly true when we only look at reassortment events be-138 tween pairs of segments and drops when we look at 3 or 4 segments. This 139 decrease is driven by our definition that two reassortment events are only 140 the same if all segments reassort in the same relative direction at the same 141 time with exactly the same clade below the segment trees; a requirement 142 that becomes harder to satisfy as the number of segments increases. As 143 expected for methods that correctly take into account uncertainty, the 144 posterior support decreases when lower evolutionary rates are used to 145 simulate the sequences of the segments. 146 Joint inference from full genomes increases preci-147 sion in dating nodes 148 We compared the internal node ages inferred using the coalescent with 149 reassortment to ages inferred under the assumption that all segments 150 evolved independently under the standard coalescent model. To do this, 151 we first compiled datasets of several human influenza A subtypes, as well 152 as influenza B (details in the Materials and Methods). From each of these 153 we generated three datasets consisting of a random sample of sequences. 154 We then analysed each of these sub-sampled datasets once using the 155 coalescent with reassortment and once using a normal coalescent prior 156 with shared effective population size across all segments, but assuming 157 that each segment evolved independently. We first computed the 95% 158 highest posterior density (HPD) interval of node heights for each clade that was supported by both approaches with a posterior probability of 160 more than 0.5. We then normalized the difference between the lower and 161 upper bound of the 95% HPD interval, by the median node height estimate 162 to get the relative with of the HPD interval for each clade. As shown in 163 figure 2A, using the coalescent with reassortment reduces the uncertainty 164 of node height estimates of segment tree nodes by 34% for p2009 like 165 H1N1)up to 50% for influenza B.

166
Next we computed the distribution of clade supports for clades repre-167 sented in the MCC trees inferred using the two approaches. As shown in 168 figure 2B, segment tree clades are much better resolved when using the 169 coalescent with reassortment for all datasets. 170 We then compared the effective population sizes and evolutionary in-171 ferred using the two approaches. The coalescent with reassortment infers 172 higher effective population sizes for all datasets (see figure 2C). This also 173 influences the inferred clock rates, since lower effective population sizes 174 put stronger weight on shorted branches and therefore larger clock rates 175 (see figure 2C). We explain this discrepancy as follows. Coalescent events 176 closer to the tips are more likely between lineages that are for example 177 geographically more closely related and can be assumed to occur rapidly which then leads to differences in the estimated clock rates. 194 We also compared the performance of the two approaches by infer-     Figure 2: Comparison of estimates between the coalescent with reassortment and assuming that each segment codes for an independent realization of the same coalescent process. A Comparison of the relative width of the 95% HPD interval of segment tree node heights. The vertical axis shows the distribution of ratios of the relative width of the 95% HPD intervals of the coalescent with reassortment over the coalescent assuming independent segment evolution. The values show the median reduction in node height uncertainty when using the coalescent with reassortment over the coalescent with independent segments. B Comparison between the distribution of posterior clade support of segment trees found the maximum clade credibility segment trees. C Comparison between the inferred effective population sizes. When assuming each segment is an independent realization of the same coalescent process, the effective population sizes are inferred to be much smaller and much more certain. D Comparison between the inferred clock rates. The coalescent with reassortment infers lower clock rates.
man influenza viruses 214 We compared the reassortment rates of different influenza types. To do 215 so, we used the same datasets as described above, as well as an influenza 216 A/H2N2 dataset sampled between 1957 and 1970. We then jointly inferred 217 the reassortment network, the embedding of segment trees, evolutionary 218 rates, effective population sizes and reassortment rates of these viruses. We 219 find that the estimated reassortment rates vary greatly between different 220 influenza viruses. Next, we test if there is a fitness effect associated with reassortment events.

250
To do so, we classify every network edge from the posterior distribution of 251 inferred networks as either "fit" or "unfit". We define a fit edge to be any    The coalescent with reassortment is a continuous time Markov process that proceeds backward in time. It involves three possible events: sampling, coalescent and reassortment events. As is usually case for coalescent approaches, we condition on sampling events. These happen at predefined times and simply the number of active network lineages by 1. Coalescent events occur between two network lineages l and l at a rate that is inversely proportional to the effective population size Ne and reduce the number of active network lineages by 1. The smaller the effective population size, the more likely two lineages are to share a common ancestor, i.e. the more likely they are to coalesce. Upon a coalescent event, the segments that the parent lineage p of lineages l and l carries is the union of the segments that is carried by i and j, i.e.: This coalescent events in the network only corresponds to a coalescent 336 event in a segment tree when the corresponding segment is present in 337 both C(l) and C(l ).

338
Reassortment events happen at a rate ρ per lineage per unit time.

339
A reassortment event on lineage l will increase the number of network 340 lineages by 1. The segments carried by lineage l are randomly assigned to 341 the two parent lineages p1 and p2. This means that the probability of the 342 ancestry of a given segment to follow p1, for example, is 0.5.

343
As we are not interested in the history of segments that are not ancestral to our sample, we explicitly integrate over this ancestry in our model. As with standard coalescent with recombination models, this is done by omitting non-ancestral events from the process and modifying the reassortment rate to exactly account for this omission. In our model, the events which are omitted are "reassortment" events on l in which the ancestry of every ancestral lineage in C(l) is assigned to the same parent. edge uniformly at random, the probability of either p1 or p2 being chosen as ancestral to all segments is The effective rate of "observable" reassortments on lineage l is then simply

397
The total rate is the sum of the coalescence rate λ

401
The rate of observable reassortment events is Note that this is generally less than the total rate of reassortment events 404 in this interval, which would be simply ρ|Li|, as this rate excludes reas-405 sortment events that produce lineages carrying no ancestral segments.

406
The parameter priors