Evolutionary Stalling in the Optimization of the Translation Machinery

Biological organisms are modular. Theory predicts that natural selection would steadily improve modules towards their performance optima up to the margin of effective neutrality. This classical theory may break down for populations evolving in the clonal interference regime because natural selection may focus on some modules while adaptation of others stalls. Such evolutionary stalling has not been observed and it is unclear whether it limits the power of natural selection to optimize module performance. To empirically characterize evolutionary stalling, we evolved populations of Escherichia coli with genetically perturbed translation machineries (TMs). We show that populations with different suboptimal TMs embark on statistically distinct trajectories of TM optimization. Yet, before TMs approach the margin of effective neutrality, the focus of natural selection shifts to other cellular modules, and TM optimization stalls. Our results suggest that module optimization within an organism may take much longer than suggested by classical theory.


Introduction
Biological systems are organized hierarchically, from molecules to cells, organisms and populations [1-5]. At the lowest level, molecules within cells form functional modules, such as the translation machinery, or various other metabolic pathways [4, 6-9]. Organismal fitness depends on the performance of these modules. However, the ability of natural selection to optimize cellular modules is constrained by the abundance and the effects of available beneficial mutations. In the simplest case, the speed of evolutionary optimization of a module depends only on the supply and the fitness effects of beneficial mutations in that module. Theoretical models predict that the fitness effects of beneficial mutations will decline as the module's performance approaches an optimum. Therefore, the module's performance is expected to improve steadily, albeit with a gradually declining rate [10-21]. When the module's performance approaches the optimum and the fitness effects of beneficial mutations drop below ~1/N, the inverse of the population size, the optimization of the module by natural selection stops [12, 22-24].

In reality, evolution of any one module within an organism depends on the supply and effects of beneficial mutations in all modules. One reason for this interdependence is that modules are encoded in genomes, and genomes are physically linked [25]. Therefore, new beneficial mutations affecting different modules must compete against each other in the population whenever they simultaneously arise on different genetic backgrounds [25-29]. This effect, known as "clonal interference", is particularly strong when recombination is rare and the supply of adaptive mutations is large [25,28], e.g., if the organism reproduces asexually, the population is large and the environment is new.
In the clonal interference regime, small-effect mutations are usually outcompeted. Instead, adaptation is driven by mutations that provide fitness benefits above a certain "clonal interference" threshold, which depends on the current supply and the fitness effects of all adaptive mutations in the genome [25,29,30].

Beneficial mutations in different modules likely arise at different rates and have different effects on the fitness of the organism. Therefore, natural selection will be "focused" on optimizing modules where mutations have effects above the clonal interference threshold, while other modules would adapt slowly or not at all. Modules that are more important for fitness in the current environment and those that are farther from their performance optima are expected to contribute more large-effect mutations. Such modules are more likely to be in the focus of natural selection. However, as natural selection improves the performance of any such module, the supply and effects of adaptive mutations in that module will decline. Eventually, further improvements will only be possible by mutations with effects below the clonal interference threshold. At this point, the evolutionary optimization of the focal module will slow down or cease entirely. We call this phenomenon "evolutionary stalling".

Evolutionary stalling imposes a limit on the power of natural selection to improve the performance of a module within an organism, in addition to the well-known threshold of effective neutrality. While the effective neutrality threshold cannot be overcome, evolutionary optimization of a stalled module can resume once large-effect adaptive mutations in competing modules are exhausted. Nevertheless, stalling poses a potentially serious obstacle for the evolutionary optimization of a module because it can occur much farther from the optimum than the hard limit of effective neutrality.
To replace all EF-Tu molecules in the cells, the tufB gene was deleted and the foreign orthologs were integrated into the tufA locus [55]. We also included the control strain in which the tufB gene was deleted and the original E. coli tufA was left intact. We refer to the engineered "founder" E. coli strains as E, S, Y, V, A and P by the first letter of the origin of their tuf genes (

fitness and growth rate were highly correlated (Figure S1). We conclude that the competitive fitness of our founders in our environment reflects their TM performance. The fitness of the S and Y founders was similar to that of the control E strain (≤ 3% fitness change), indicating that their TMs were at most mildly suboptimal. In contrast, the fitness of the V, A and P founders was dramatically lower (≥ 19% fitness decline; Table 1), indicating that their TMs were severely suboptimal.

To determine whether natural selection focuses on restoring defective TMs, we instantiated 10 replicate populations from each of our six founders (60 populations total) and evolved them in LB for 1,000 generations (Methods) with the bottleneck population size N = 5×10⁵ cells. We then measured the competitive fitness of the evolved populations relative to their respective founders. Fitness in all but one population increased significantly (t-test P < 0.05 after Benjamini-Hochberg correction; Figure S2), and the average fitness increase of a population correlated negatively with the initial fitness of its founder (Figure 1). These results show that even substantial fitness defects caused by reductions in TM performance can be largely compensated in a short bout of adaptive evolution.
The pattern of "declining adaptability" in Figure 1

We selected replicate populations 1 through 6 descended from each founder (a total of 36 populations), sampled each of them at 100-generation intervals (a total of 11 time points per population) and sequenced the total genomic DNA extracted from these samples. We developed a bioinformatics pipeline to identify de novo mutations in this data set (Methods). Then, we called a mutation adaptive if it satisfied two criteria: (i) its frequency changed by more than 20% in a population; and (ii) it occurred in a "multi-hit" gene, i.e., a gene in which at least two independent mutations passed the first criterion. Reliably tracking the frequencies of some types of mutations (e.g., large copy-number variants) is impossible with our sequencing approach. Therefore, we augmented our pipeline with the manual identification of copy-number variants, which could only be reliably detected after they reached high frequency in a population (Methods and Figure S3). This procedure yielded 167 new putatively adaptive mutations in 28 multi-hit genes, with the expected false discovery rate of 13.6%, along with an additional 11 manually identified chromosomal amplifications, all of which span the tufA locus (Methods, Table S1 and Figure S4). We classified each putatively adaptive mutation as "TM-specific" if the gene where it occurred is annotated as translation-related (Methods). We classified mutations in all other genes as "generic". We found that 38 out of 178 (21%) putatively adaptive mutations in 6 out of 28 multi-hit genes were TM-specific (Table S1). This is significantly more mutations than expected by chance (P < 10⁻⁴, randomization test) since the 215 genes annotated as translation-related comprise only 4.0% of the E. coli genome.
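As a rough cross-check of this enrichment, one can ask how often 38 or more of 178 mutations would fall in translation-related genes if each mutation landed in such a gene independently with probability 0.04. The Python sketch below computes that binomial tail. It is a deliberate simplification and not the randomization test described in Methods, which redistributes mutations across genes while controlling for gene length; the function name `binomial_tail` is our own.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_adaptive = 178   # putatively adaptive mutations (from the text)
n_tm = 38          # TM-specific mutations among them (from the text)
# ASSUMPTION: every mutation hits a translation-related gene with the same
# probability, equal to the 4.0% genome fraction quoted in the text.
p_tm = 0.04

expected = n_adaptive * p_tm
p_value = binomial_tail(n_tm, n_adaptive, p_tm)

print(f"expected TM-specific mutations by chance: {expected:.1f}")
print(f"observed: {n_tm}")
print(f"binomial tail P(X >= {n_tm}) = {p_value:.2e}")
```

Under this simplified null, only about 7 TM-specific mutations would be expected, so the observed 38 lies far in the tail, in line with the reported P < 10⁻⁴.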
All of the TM-specific mutations occurred in genes whose only known function is translation-related, such as rpsF and rpsG, suggesting these mutations arose in response to the initial defects in the TM. The set of TM-specific mutations is robust with respect to our filtering criteria (Figure S5).

TM-specific mutations occurred in 17 out of 36 sequenced populations. Generic mutations were also observed in all of these populations (Figure S4). Thus, whenever TM-specific mutations occurred, generic mutations also occurred, such that the fate of TM-specific mutations must have depended on the outcome of clonal interference between mutations within and between modules (Figure 2). As a result of this competition, only 14 out of 27 (52%) TM-specific mutations that arose (excluding 11 tufA amplifications) went to fixation, while the remaining 13 (48%) succumbed to clonal interference (Figures 2, S4). In at least two of these 13 cases a TM-specific

Evolution of the TM stalls far from the optimum

Competition between adaptive mutations in different modules is necessary but not sufficient for evolutionary stalling to occur in any one module. Therefore, we sought direct evidence of evolutionary stalling in the TM. To this end, we examined the distribution of TM-specific mutations among founders and across evolutionary time. All of the detected TM-specific mutations occurred in the V, A and P populations whose TMs were initially severely suboptimal; no TM-specific mutations were detected in the E, S, and Y populations whose TMs were mildly suboptimal (Figure 3A). Out of the 14 TM-specific mutations that eventually fixed in the V, A and P populations, 12 (86%) did so in the first selective sweep (this excludes 11 tufA amplifications). In contrast, out of the 16 generic mutations that fixed in these populations, only 7 (44%) did so in the first selective sweep. As a result, an average TM-specific beneficial mutation reached fixation after only 300 ± 52 generations, compared to 600 ± 72 generations for an average generic mutation (Figures 3B, S4). Only one (7%) TM-specific beneficial mutation reached fixation after generation 600, in comparison to 9 (56%) generic beneficial mutations. Thus, by the end of our evolution experiment, adaptive TM-specific mutations are depleted even in populations descended from the V, A and P founders.

These data demonstrate that evolutionary stalling in the optimization of the TM occurs in our populations. They also allow us to place bounds on the TM defects which can and cannot be improved by natural selection prior to the onset of evolutionary stalling. First, consider the founder Y in which the initial defect in the TM incurs a ~3% fitness cost (Table 1). While Y populations gained on average 2.4% in fitness during evolution (Figure 1), none of these gains are attributed to TM-specific mutations. This indicates that TM adaptation is stalled if the initial TM defect incurs ≤ 3% fitness cost. Next, consider founder V in which the initial defect in the TM incurs a ~19% fitness cost (Table 1). We observed 8 TM-specific mutations across all V populations, including three tufA amplifications. At least one of these mutations reached fixation (Figure S4), suggesting that natural selection can repair defects in the TM that incur ≥ 19% fitness cost without the onset of evolutionary stalling. We conclude that the focus of natural selection shifts from optimizing the TM to other cellular modules when the TM incurs a fitness cost somewhere between 3% and 19%.

Another way of arriving at a lower bound for the onset of stalling is to consider the V, A and P populations. On average, these populations fixed 0.8 TM-specific mutations during evolution, and remained ~5.3% less fit than the control E strain, assuming fitness is transitive (Figure 1). Even if we conservatively attribute all these fitness gains to improvements in the TM, by the end of the experiment, TMs in these populations must still be on average ~5.3% below the optimum. Yet, by the end of the experiment, fixation of TM mutations had essentially stopped, while fixation of generic mutations continued unabated (Figure 3B). This suggests that TMs that incur fitness defects larger than 3% may still be subject to evolutionary stalling.
To further corroborate and possibly refine these bounds, we selected two TM-specific mutations that arose in our populations, genetically reconstructed them in their respective founder strains and directly measured their fitness benefits. The TM-specific mutation A74G in the rpsF gene, which arose in population A5, provides an 8.2 ± 1.0% fitness benefit in the A founder. The TM-specific mutation G331A in the rpsG gene, which arose in populations P2, P3 and P5, provides a 6.5 ± 1.2% fitness benefit in the P founder. Such large-effect mutations can never arise in TMs that incur a fitness cost of less than 6.5%, which is further indirect evidence that TM adaptation stalls when it incurs a fitness cost larger than our conservative 3% bound.

If the TM was the only suboptimal module in the cell, theory suggests that its adaptation would continue until the fitness defect it incurs is

interactions might be similarly important in the short bout of evolution observed in our experiment. Specifically, we asked whether different initial TM variants acquired adaptive mutations in the same or in different translation-associated genes.

We found that 4 out of 7 classes of TM-specific mutations arose in a single founder (Figure 4A). For example, we detected six independent mutations in the rpsG gene, which encodes the ribosomal protein S7, and all of these mutations occurred in the P founder (P < 10⁻⁴, randomization test with Benjamini-Hochberg correction, Methods). Similarly, all four mutations in the rpsF gene, which encodes the ribosomal protein S6, occurred in the A founder (P < 10⁻⁴, randomization test with Benjamini-Hochberg correction). To directly measure how the effects of these mutations vary across genetic backgrounds, we attempted to genetically reconstruct mutation A74G in the rpsF gene and mutation G331A in the rpsG gene in all six of our founder strains.
We successfully reconstructed both of these mutations in the founder strains in which they arose and confirmed that they were strongly beneficial, as described above (8.2 ± 1.0% and 6.5 ± 1.2% benefit, respectively

Therefore, in addition to intra-module epistasis demonstrated above we might expect inter-module epistasis, such that initially different TM variants could precipitate distinct adaptive responses in the rest of the genome. To test this hypothesis, we examined the distribution of generic mutations among founder genotypes.

We found that generic mutations in 7 out of 22 genes occurred in fewer founders than expected by chance (Figure 4B, Methods). For example, we detected five independent mutations in the ybeD gene, which encodes a protein with an unknown function, and all these mutations occurred in the V founder (P < 10⁻⁴, randomization test with Benjamini-Hochberg correction). Similarly, all three mutations in the alaA gene, which encodes a glutamate-pyruvate aminotransferase, occurred in the A founder (P < 10⁻⁴, randomization test with Benjamini-Hochberg correction). To corroborate these statistical observations, we reconstructed the T93G mutation in the ybeD gene in all six founder strains and directly measured its fitness effects. As expected, this mutation confers a 5.9% fitness benefit in the V founder. In contrast, it is strongly deleterious in the P founder and indistinguishable from neutral in the remaining founders (Figure 5). These results show that at least some genetic perturbations in the TM can have genome-wide repercussions. They can precipitate bouts of genome-wide adaptive evolution that are contingent on the initial perturbations in the TM.

The fitness of an organism depends on the performance of many molecular modules inside cells.
While natural selection favors genotypes with better-performing modules, it is difficult for evolution to optimize multiple modules simultaneously, particularly when recombination rates are low and many adaptive mutations in different modules are available. In this regime, natural selection is expected to focus on optimizing those modules where many mutations provide large fitness benefits, while the adaptive evolution in other modules stalls. Here we have documented and characterized the evolutionary stalling of the translation machinery (TM) in E. coli.

We found that evolutionary optimization of the TM was slowed down by competition with adaptive mutations in the rest of the genome (Figure 2). The populations whose TMs were initially mildly sub-optimal (incurring ≲ 3% fitness cost) adapted by acquiring mutations that did not directly affect the TM. In contrast, populations whose TMs were initially severely sub-optimal (incurring ≳ 19% fitness cost) rapidly discovered and fixed TM-specific beneficial mutations. We conclude that the adaptive evolution of the TM stalls when the TM defect incurs a fitness cost between 3% and 19%. This is a conservative lower bound on the onset of stalling that we derived under the assumption that the TM in the control E strain is close to optimal. However, the E strain itself suffers a 4.1 ± 0.1% fitness defect relative to wild-type E. coli that contains the tufB gene (Methods). Thus, the adaptive evolution of the TM may actually stall when the TM defect incurs a fitness cost between 7.1% and 23.1%.

Evolutionary stalling in the TM occurs for one of two reasons. First, the rate of TM-specific beneficial mutations may be too low for these mutations to survive genetic drift when rare. Alternatively, these mutations occur frequently enough to survive drift but succumb to clonal interference.
Both theoretical and empirical (albeit limited) evidence suggest that small-effect beneficial mutations are more common than large-effect mutations [16,19,79,80]. The fact that we observed TM-specific mutations with effects ≥ 5% indicates that the rate of such mutations is high. We expect the rate of TM-specific mutations with effects < 5% to be even higher. If we relax the stringency criteria for detecting beneficial mutations, we find one TM-specific mutation in the gene rbbA in the population E5 (Figure S5). This suggests that small-effect TM-specific beneficial mutations exist and supports the conjecture that adaptation of the TM stalls because of clonal interference.

Our results show that evolutionary stalling limits the ability of natural selection to improve a module, but this limit is not absolute. As a population accumulates beneficial mutations in other modules, their supply will be depleted and their fitness effects will likely decrease due to diminishing returns epistasis [37,66,67,81-83]. These changes will in turn increase the chances for small-effect mutations in the focal module to survive clonal interference, thereby overcoming evolutionary stalling. While we did not observe resumption of adaptive evolution in the TM in this experiment, we find some evidence for such a transition in one other module. We detected 11 mutations in multi-hit genes that affect cytokinesis (Methods, Figures S6, S7)

thought to be nearly optimal [54], but when and how TMs evolved to this optimal state is unknown. Our work helps us constrain the plausible evolutionary scenarios. One possibility is that the TM approached the optimum prior to the last universal common ancestor (LUCA), and subsequent evolution in TM components along most lineages was driven by conditionally neutral and mildly deleterious substitutions.
Another possibility is that the TM in LUCA was not optimal, and TMs in different lineages were optimized after LUCA. Our results suggest that evolving an optimal TM after it was encapsulated in a cell with a physically contiguous genome may have been difficult, especially if other components of the cell also required continuous adaptation to a changing environment. In other words, the possibility that the TM has been functionally optimized prior to LUCA appears more likely.

Materials and methods

Materials, data and code availability

All strains and plasmids constructed and used in this work are available upon request. Raw sequencing data were analyzed with the Python-based workflow implemented in Ref.
[40] and run on the UCSD TSCC computing cluster via a custom Python wrapper script. All analysis and plots reported in this manuscript have been performed using the R computing environment. The script, modified reference genomes and the raw data (except for raw sequencing data) used for analysis can be found at https://github.com/sandeepvenkataram/EvoStalling. Raw sequencing data for this project have been deposited into the NCBI SRA under project PRJNA560969.

Media and culturing conditions

Liquid medium is the Luria-Bertani medium (LB) (per liter, 10 g NaCl, 5 g yeast extract, and 10 g tryptone) and solid medium is LBA (LB with 1.5% agar), unless noted otherwise.

ybeD, rpsF and rpsG mutations were constructed using the same method, except the chromosomal kanR marker was not removed (Figure S9). For a full list of primer sequences used for ybeD, rpsF and rpsG engineering, see Table S2.

Plasmids pZS1-TnSL and pZS2-TnSL were used in competition assays to provide Ampicillin and Kanamycin resistance, respectively. pZS1-TnSL, derived from pUA66 [89], was kindly provided by Georg Rieckh. pZS2-TnSL was constructed from pZS1-TnSL by replacing the ampR cassette with kanR.

Evolution experiment

Experimental evolution was performed by serial dilution at 37°C in LB broth. To start the evolution experiment, an initial 5 mL overnight culture was inoculated from a single colony from the frozen stock of each founder strain. 10 replicate populations were started from single colonies derived from these overnight cultures. The replicates were serially transferred every 24h (±1h) as follows: 100 µL of saturated culture were transferred into 10 mL saline solution (145 mM NaCl); 50 µL of these dilutions were then transferred to 5 mL fresh LB (tubes were vigorously vortexed prior to pipetting). This resulted in a bottleneck population size of about 5×10⁵ cells.
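The transfer volumes above fix the overall dilution, and with it the bottleneck size and the number of generations per daily transfer. The Python sketch below works through that arithmetic. The saturation density of about 10⁹ cells per mL is our assumption (it is not stated in the text), and the helper name `transfer_fraction` is hypothetical.

```python
from math import log2

# Volumes from the transfer protocol in the text (all in mL).
culture_to_saline = 0.100   # 100 µL of saturated culture
saline_volume = 10.0        # into 10 mL saline
dilution_to_lb = 0.050      # 50 µL of that dilution
lb_volume = 5.0             # into 5 mL fresh LB

# ASSUMPTION (not stated in the text): a saturated LB culture holds
# roughly 1e9 cells per mL.
saturation_density = 1e9

def transfer_fraction(v_in: float, v_dest: float) -> float:
    """Fraction of the destination tube made up of the transferred liquid."""
    return v_in / (v_in + v_dest)

step1 = transfer_fraction(culture_to_saline, saline_volume)
step2 = transfer_fraction(dilution_to_lb, lb_volume)
total_dilution = step1 * step2          # overall dilution of cell density

final_volume = dilution_to_lb + lb_volume
bottleneck = saturation_density * total_dilution * final_volume

# Regrowing to the same density requires a 1/total_dilution fold expansion.
generations_per_transfer = log2(1 / total_dilution)
transfers_for_1000_generations = 1000 / generations_per_transfer

print(f"overall dilution: 1:{1 / total_dilution:,.0f}")
print(f"bottleneck size: {bottleneck:.2e} cells")
print(f"generations per transfer: {generations_per_transfer:.1f}")
print(f"transfers for 1,000 generations: {transfers_for_1000_generations:.0f}")
```

Under the assumed density, the two-step transfer dilutes the culture roughly 10⁴-fold, which gives a bottleneck close to the 5×10⁵ cells quoted above and about 13 generations per transfer.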
Freezer stocks (200 µL of 20% glycerol + 1 mL saturated culture) were prepared approximately every 100 generations and stored at -80°C.

Competitive fitness assays

To carry out pairwise competition assays, Ampicillin-resistant and Kanamycin-resistant versions of the query and reference strains/populations were generated by transforming these strains/populations with plasmids pZS1-TnSL and pZS2-TnSL, using standard methods [90]. Two replicate competition assays were performed for each query-reference pair with reciprocal markers (four assays total per pair), except for allele-replacement mutants (see below). To validate that the resistance-marker plasmids do not differentially impact fitness in any of the six founder genetic backgrounds, we carried out three-way competition assays between the KanR-marked, AmpR-marked and the unmarked versions of the founders (Figure S8). Since the allele-replacement mutants carry a chromosomal kanR marker (see above), they were only competed against AmpR reference strains.

To start a competition assay, query and reference cultures were scraped from frozen stocks and inoculated into 5 mL LB-Amp or LB-Kan media as appropriate. After about 24 hours, the query and the reference cultures were mixed together in a 1:9 ratio and diluted 1:10,000 into 5 mL fresh LB media. After that, the mixed culture was propagated as in the evolution experiment.

Competitions between two reciprocally marked versions of the same strain represent a special case. If the two marker-carrying plasmids impose exactly the same fitness cost, our competition assay between two reciprocally marked versions of the same strain is fully symmetric, which implies that in expectation it must yield a fitness value of exactly zero. Any estimate of fitness from a finite number of measurements, even in such an idealized fully symmetric case, will not be exactly zero.
However, such deviations from zero would reflect only measurement noise rather than any biologically meaningful fitness difference. In reality, the two marker-carrying plasmids may impose slightly different fitness costs, but because the difference in the cost is undetectable (see above), we still interpret deviations from zero in our fitness estimates as noise.

leverages the fact that each population was sampled multiple times across the evolution experiment to increase our ability to distinguish real low-frequency variants from sequencing errors and other sources of noise.

The reference genome was modified with the appropriate tufA sequence for each genetic background used in the evolution experiment along with the removal of the tufB sequence, and annotation coordinates were lifted over to be consistent with the original MG1655 reference sequence using custom scripts. The modified reference genomes and annotation files are included in the GitHub repository. The variants reported in Table S1

segregating variants present in a population at generation 100 must either be fixed or lost in generations 900 and 1000. Thus, we removed variants that failed to do so.

Variants that were present at an average frequency ≥ 95% at generation 100 across at least 18 populations were denoted as ancestral mutations that differentiate the founder from the reference genome (n = 10). Variants that were not ancestral but present at ≥ 95% on average across all populations derived from one founder were denoted as founder mutations (n = 11). These mutations were likely introduced as a byproduct of the strain engineering process. Multiallelic variants (two or more derived alleles present in a single population at the same site) were also removed as likely mapping artifacts.
Finally, variants that were present at generation 100 in 11 or more populations (of 36 total sequenced populations) are either mapping artifacts or pre-existing variants and were not considered further (n = 169, including the 10 ancestral mutations identified earlier).

Identification of adaptive mutations

The putatively adaptive mutations were identified as follows. We first identified mutations that reached at least 10% frequency, were present in at least two consecutive time points and whose frequency changed by at least 20% throughout the evolution. We then merged together such mutations within 10 bp of each other as likely being derived from a single event. This resulted in a set of candidate adaptive mutations. To identify likely adaptive mutations in this candidate set, we considered only mutations in "multihit" genes, i.e., genes with 2 or more candidate adaptive mutations.

Identification of modules in the genome

The 215 genes annotated as being associated with translation were identified using the Gene Ontology database at http://geneontology.org/ by searching for all E. coli K12 genes that were identified in a search for "translation OR ribosom". Similarly, the 45 genes associated with cytokinesis were identified using a search for "cytokinesis".
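The criteria for putatively adaptive mutations described under "Identification of adaptive mutations" can be sketched as a small filter over allele-frequency trajectories. The Python sketch below is illustrative only and is not the pipeline used in this work: the `Trajectory` class, the function names, the toy frequencies and the coordinates are all hypothetical. It also makes two interpretive assumptions: "frequency changed by at least 20%" is read as the difference between the maximum and minimum observed frequency, and nearby mutations are merged only within the same population.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Trajectory:
    population: str
    gene: str
    position: int        # genomic coordinate in bp (toy values below)
    freqs: List[float]   # allele frequency at each sampled time point

def is_candidate(t: Trajectory, min_peak: float = 0.10,
                 min_change: float = 0.20) -> bool:
    """Apply the three frequency criteria to one trajectory."""
    present = [f > 0 for f in t.freqs]
    two_consecutive = any(a and b for a, b in zip(present, present[1:]))
    change = max(t.freqs) - min(t.freqs)   # ASSUMED meaning of "changed by"
    return max(t.freqs) >= min_peak and two_consecutive and change >= min_change

def merge_nearby(cands: List[Trajectory], window: int = 10) -> List[Trajectory]:
    """Keep one trajectory per cluster of same-population mutations that lie
    within `window` bp of the previously kept one."""
    merged: List[Trajectory] = []
    for t in sorted(cands, key=lambda t: (t.population, t.position)):
        last = merged[-1] if merged else None
        if (last is not None and last.population == t.population
                and t.position - last.position <= window):
            continue
        merged.append(t)
    return merged

def multihit(cands: List[Trajectory], min_hits: int = 2) -> Dict[str, List[Trajectory]]:
    """Group candidates by gene and keep genes with at least `min_hits`."""
    by_gene: Dict[str, List[Trajectory]] = defaultdict(list)
    for t in cands:
        by_gene[t.gene].append(t)
    return {g: ts for g, ts in by_gene.items() if len(ts) >= min_hits}

# Toy data (made up for illustration; not taken from the study).
toy = [
    Trajectory("P2", "rpsG", 100, [0.0, 0.05, 0.4, 1.0, 1.0]),
    Trajectory("P2", "rpsG", 105, [0.0, 0.05, 0.4, 1.0, 1.0]),   # merged with the above
    Trajectory("P3", "rpsG", 100, [0.0, 0.0, 0.3, 0.9, 1.0]),
    Trajectory("A5", "rpsF", 900, [0.0, 0.0, 0.5, 1.0, 1.0]),    # single hit
    Trajectory("V1", "ybeD", 500, [0.0, 0.02, 0.05, 0.03, 0.0]), # never reaches 10%
    Trajectory("A1", "alaA", 700, [0.0, 0.3, 0.0, 0.0, 0.0]),    # seen only once
]

candidates = merge_nearby([t for t in toy if is_candidate(t)])
hits = multihit(candidates)
for gene, ts in hits.items():
    print(gene, [t.population for t in ts])
```

On the toy data only rpsG qualifies as a multihit gene, because it carries independent candidate mutations in two populations; the single rpsF candidate is filtered out by the multihit requirement.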

Statistical analyses
The expected number of mutations in multihit genes was calculated via multinomial sampling. Mutations were randomly redistributed across all genes in the E. coli genome controlling for variation in gene length. The average of 10,000 such randomizations was used to calculate an empirical FDR. A similar procedure was used to estimate the probability of observing as many or more TM-specific mutations by chance as we actually observed in this study.

To test whether mutations in the 7 TM-specific multi-hit loci were distributed uniformly across the six founders, we first estimated the entropy of the distribution of mutations across founders for each gene. Mutations in that gene were then randomly redistributed across six founders 10,000 times, weighted by the total number of TM-specific mutations observed in each founder. An empirical P-value was calculated as the fraction of trials with a smaller than observed entropy value. These P-values were then corrected for multiple testing across the 7 TM-specific loci using the Benjamini-Hochberg procedure. We used the same procedure to test for significant deviations in the distributions of generic mutations across founders.

Alex Pleşa, Divjot Kaur, Emily Peñaherrera, Kevin Longoria and Lesly Villarejo for laboratory assistance. We thank Huanyu Kuo for the analysis of growth-curve data. We thank Eva Garmendia for providing the recombineering plasmids and Georg Rieckh for providing the resistance marker plasmids. We thank Benjamin Good for help with his genome sequencing data analysis pipeline. We thank Kristen Jepsen and the UCSD Institute for Genomic Medicine for sequencing services and the San Diego Supercomputing Center for providing the computational environment.
BK acknowledges the support by the John Templeton Foundation (#58562 and #61239); the NASA Exobiology and Evolutionary Biology Program (#H006201406) and the NASA Astrobiology Institute (#NNA17BB05A). SK acknowledges the support by BWF Career Award at the Scientific Interface (#1010719.01), the Alfred P. Sloan Foundation (#FG-2017-9227) and the Hellman Foundation.