Evolutionary signals of symbiotic persistence in the legume–rhizobia mutualism

Understanding the origins and evolutionary trajectories of symbiotic partnerships remains a major challenge. Why are some symbioses lost over evolutionary time whereas others become crucial for survival? Here, we use a quantitative trait reconstruction method to characterize different evolutionary stages in the ancient symbiosis between legumes (Fabaceae) and nitrogen-fixing bacteria, asking how labile is symbiosis across different host clades. We find that more than half of the 1,195 extant nodulating legumes analyzed have a high likelihood (>95%) of being in a state of high symbiotic persistence, meaning that they show a continued capacity to form the symbiosis over evolutionary time, even though the partnership has remained facultative and is not obligate. To explore patterns associated with the likelihood of loss and retention of the N2-fixing symbiosis, we tested for correlations between symbiotic persistence and legume distribution, climate, soil and trait data. We found a strong latitudinal effect and demonstrated that low mean annual temperatures are associated with high symbiotic persistence in legumes. Although no significant correlations between soil variables and symbiotic persistence were found, nitrogen and phosphorus leaf contents were positively correlated with legumes in a state of high symbiotic persistence. This pattern suggests that highly demanding nutrient lifestyles are associated with more stable partnerships, potentially because they “lock” the hosts into symbiotic dependency. Quantitative reconstruction methods are emerging as a powerful comparative tool to study broad patterns of symbiont loss and retention across diverse partnerships.

Understanding the origins and evolutionary trajectories of symbiotic partnerships remains a major challenge. Why are some symbioses lost over evolutionary time whereas others become crucial for survival? Here, we use a quantitative trait reconstruction method to characterize different evolutionary stages in the ancient symbiosis between legumes (Fabaceae) and nitrogen-fixing bacteria, asking how labile is symbiosis across different host clades. We find that more than half of the 1,195 extant nodulating legumes analyzed have a high likelihood (>95%) of being in a state of high symbiotic persistence, meaning that they show a continued capacity to form the symbiosis over evolutionary time, even though the partnership has remained facultative and is not obligate. To explore patterns associated with the likelihood of loss and retention of the N 2 -fixing symbiosis, we tested for correlations between symbiotic persistence and legume distribution, climate, soil and trait data. We found a strong latitudinal effect and demonstrated that low mean annual temperatures are associated with high symbiotic persistence in legumes. Although no significant correlations between soil variables and symbiotic persistence were found, nitrogen and phosphorus leaf contents were positively correlated with legumes in a state of high symbiotic persistence. This pattern suggests that highly demanding nutrient lifestyles are associated with more stable partnerships, potentially because they "lock" the hosts into symbiotic dependency. Quantitative reconstruction methods are emerging as a powerful comparative tool to study broad patterns of symbiont loss and retention across diverse partnerships. symbiosis | cooperation | deep history | reconstruction | persistence S ymbiotic partnerships have transformed the Earth's nutrient cycles (1), driven the diversification of organisms (2,3), and facilitated rapid adaptation of species to divergent new niches (4). Almost all organisms rely on symbiotic associations for some form of metabolism, protection, or energy (5,6). These relationships can be stabilized if the interests of the partners are tightly aligned or partners have evolved mechanisms to control and coordinate interactions (7)(8)(9). However, symbioses can also be unstable over evolutionary time (10)(11)(12), and conflicts among partners can lead to breakdown of potentially beneficial relationships, and even drive the emergence of parasitism (13,14). Although our theoretical understanding of the factors that promote cooperative relationships has increased (e.g., refs. [15][16][17][18], we have lacked comparative approaches that can be used across a diversity of symbiotic systems to ask why some partnerships persist over millions of years and others are more labile. What distinguishes host species that acquire highly stable symbioses from those species that do not? Do different hosts follow similar trajectories of symbiotic evolution? Can different stages of symbiotic evolution be correlated to specific ecological conditions? One tool to study broad patterns of symbiont evolution is quantitative trait reconstruction (19)(20)(21), a comparative method that allows us to (i) identify when symbioses become more or less persistent and (ii) correlate these levels of persistence to ecological factors and partner traits.

Persistent Symbioses
We define symbiotic persistence as the continued capacity of an organism to form a particular symbiosis over evolutionary time (e.g., millions to tens of millions of years). High symbiotic persistence means that the capacity to form the symbiosis is more likely to be retained in a species or lineage whereas highly labile symbioses are characterized by low persistence and frequent loss of symbiotic capacity. As defined here, symbiotic persistence is an inherently phylogenetic concept (i.e., increased evolutionary stability of symbioses), meaning that species with an increased symbiotic persistence may be phenotypically indistinguishable from more labile symbioses. Symbiotic persistence is not the same as a symbiosis being obligate or genetically integrated. Although we expect intracellularly integrated partnerships to typically have high symbiotic persistence, the reverse is not necessarily true. For instance, some symbioses successfully persist for millions of years, without ever integrating intracellularly into the host (9,(22)(23)(24)(25). When a symbiosis has become fully persistent, the loss rate falls to zero, or nearly zero, because even some usually presumed permanent symbioses (e.g., mitochondria or chloroplasts in multicellular organisms) can be lost in extremely rare cases (26,27).
The level of symbiotic persistence can be quantified using modern comparative methods, allowing us to estimate the likelihood that a symbiosis will be retained, rather than lost (20,28). Loss rates across phylogenetic trees can be compared to identify species or clades associated with more persistent symbiotic partnerships. We can then attempt to correlate highly persistent symbioses with global distribution maps, climate and soil databases, or organismal trait databases to elucidate the environmental conditions and species characteristics associated with evolutionary persistence.
Testing for Persistence: The Case of the N 2 -Fixing Symbiosis Our aim is to quantify symbiotic persistence in the mutualism between nodulating plants and their nitrogen-fixing bacteria. We focus specifically on plants in the Fabaceae (legumes) associating with a class of nitrogen-fixing bacteria known broadly as rhizobia (29). In exchange for atmospheric nitrogen fixed by bacterial partners, plants provide rhizobia with photosynthate for energy and highly specialized root structures, called nodules, to house bacteria (30). Although rhizobia can survive and reproduce in the soil, colonization of a legume and establishment of a successful symbiosis can lead to a 10 8 -fold increase in a single rhizobial cell's abundance (31), thereby creating a strong selection pressure for symbiotic engagement. Host plants can also survive in the absence of bacterial partners but do, in many ecological situations (32), benefit greatly from the nitrogen the bacteria provide (33,34).
The legume-rhizobial symbiosis is a good case study to quantify symbiotic persistence for a number of reasons. First, the phylogenetic relationship among potential hosts is well-characterized (35), and a newly compiled database of which hosts engage in the rhizobial symbiosis is available (36; data available from Dryad, dx.doi.org/doi:10.5061/dryad.05k14). Second, the symbiosis is found across a range of plant hosts, spanning highly diverse habitats and ecosystems (37,38), allowing for potential variation in symbiotic persistence and in the selection pressures driving it. Third, different morphological types of the symbiosis have evolved across host plants (30,38,39), suggesting that there is potential for variation in the evolutionary trajectory of the symbiosis. Lastly, because the symbiosis is facultative for both partners, it represents an interesting case in which high symbiotic persistence is not necessarily linked with strict mutual dependence or host-symbiont integration, as is the case with eukaryotic organelles or some insect endosymbionts (40,41).
We previously used a phylogenetic reconstruction approach called "Hidden Rate Models" (HRMs) (20,28) to study the evolutionary history of the N 2 -fixing symbiosis across the angiosperms (36). These models allow for heterogeneity in the loss rate of a trait, like a symbiotic partnership, and thus can identify the phylogenetic signal where host clades shift in and out of different evolutionary states (20,28), such as when labile symbiotic associations become more stable. In our previous work, we applied such a model to 3,467 angiosperm species, 1,709 of which were in the four symbiotic nitrogen-fixation orders (Fabales, Fagales, Rosales, and Cucurbitales), known as the nitrogen-fixing clade (37,42). We found that, across the nitrogen-fixing clade, symbiotic host plants can be in two different evolutionary states: a "regular fixing state" where the ability to form the symbiosis remains highly labile (1.17 losses per 100 million years per lineage) or a second class termed "stable fixing state" in which host plants are characterized by extremely low loss rates (0.02 losses per 100 million years per lineage): i.e., almost 60 times lower than that of hosts in the regular fixing state (36). The analysis suggested that, for plants in a state of stable fixing, it is almost impossible to lose the capacity to form the symbiosis (36).
What plant hosts have entered into this stable symbiotic state and why? Can specific ecological factors and host traits be correlated with the stable fixing state? To answer these questions, we further studied the 1,366 legume species in our analyzed dataset, of which 1,195 are capable of forming the symbiosis (data available from Dryad, dx.doi.org/doi:10.5061/dryad.05k14). Legumes are the best characterized order in the nitrogen-fixing clade and have a wide geographic range (37,38), ensuring that we could explore a range of potential environmental conditions and host traits, and there is considerable variation in their ability to successfully form a symbiosis with rhizobia (30). We first used an HRM approach to test whether the best fit model of symbiosis evolution in angiosperms was similar to that in legume species. To establish if this model still had the best fit, we repeated our HRM analysis using the legume dataset and the same phylogeny, this time pruned to the legumes (35; data available from Dryad, dx.doi.org/10.5061/dryad.63q27.2). We found that, in agreement with the angiosperm analysis (36), our best model included both a regular fixing state and a stable fixing state with increased symbiotic persistence [89.6% Akaike information criterion (AIC) weight]. Additionally, we found a highly significant correlation between the stable fixing state likelihoods of extant nodulating legume species from our previous analysis (36) and our analysis of legumes only (r = 0.92, P << 0.01). These observations confirm that our model of symbiosis evolution across the angiosperms also provides an accurate representation of the evolution of stable fixation within the legumes.
We then mapped our calculated stable fixing likelihoods (data available from Dryad, dx.doi.org/doi:10.5061/dryad.05k14) ( Fig.  1) onto the legume phylogeny ( Fig. 2) to analyze the evolutionary trajectory of the stable fixing state and to explore its correlation with ecological factors and host traits. The first observation is that there is considerable variation in symbiotic persistence. High stable fixing likelihoods (>95%) were found in 803 of the symbiotic legume species studied whereas 95 symbiotic species had very low likelihoods (<20%) of stable fixing (Fig. 1). The second observation is that the evolution to the stable fixing state is likely to have multiple origins across the legumes (Fig. 2). This finding suggests that there was no single event or innovation that allowed hosts to transition into a stable state, but rather that there were several parallel routes toward persistent capacity for symbiosis (36). Third, we observed that, within the Papilionoidea, there are repeated likely origins of stable fixing (Fig. 2). These origins range from the oldest over 50 million years ago, to a more recent origin a little over a million years ago, in the genus Amorpha (false indigo). This observation illustrates how, even within single clades, there is large variation in the timing of symbiotic stabilization. Fourth, we see that not all legumes have reached or are close to reaching the stable state ( Fig. 1), which emphasizes that, in many legume clades like in the Mimosoideae (Fig. 2), the symbiosis is easier to lose.

Correlations Between Symbiosis Persistence and Ecological Factors or Host Traits
Quantitative reconstruction methods, such as the hidden rate models (20,28), allow researchers to quantify the likelihood across the host phylogeny of evolutionary states that cannot necessarily be identified through phenotypic differences, such as the stable fixing state, and then ask whether there are ecological or trait commonalities among hosts in the stable fixing state. Specifically our aim was to ask how occurrence, climate, environment, and host trait data can be related to particular evolutionary states. We characterized symbiotic persistence in nodulating legumes by testing potential explanatory factors at four levels: (i) global geographical distribution, (ii) climate variables, (iii) local soil variables, and (iv) plant traits. For all variables considered, we explored the correlation of the potential explanatory variable (distribution, climate, etc.) with the underlying continuous likelihood of being a stable fixer as the response variable (36; data available from Dryad, dx.doi.org/doi:10.5061/dryad.05k14).
Latitudinal Pattern in Symbiotic Persistence Is Driven by Temperature.
We first asked whether there were differences in global distribution between stable fixing legume species (>50% stable fixing likelihood) and legumes where the symbiosis is more easily lost (<50% stable fixing likelihood) (Fig. 1). We used the Global Biodiversity Information Facility (GBIF) to obtain (after quality control steps) a total of 3.2 million legume occurrences for 1,182 nodulating legume species in our analysis (43) (Methods). We  plotted the occurrence of both regular (Fig. 3A) and stable fixers (Fig. 3B) onto the map of the world. We found that, in tropical regions, both regular and stable fixers are common. In contrast, in temperate, and particularly in polar regions, the legumes are almost exclusively characterized by highly persistent symbioses. When we considered the absolute median latitude of fixing legume species as a potential explanatory variable for stable fixing likelihood, we found a significant positive correlation using phylogenetic regression to correct for phylogenetic nonindependence (coefficient, 0.04% increase in stable fixing likelihood per degree absolute latitude, P = 0.02) (44,45). We then asked whether the correlation between latitude and symbiotic persistence could be explained by mean annual temperature (TMP Mean ). For each legume occurrence, we extracted the annual mean temperature (Fig. 3) from a global interpolated climate grid (46), obtaining a total of over 1.4 million temperature records, and calculated the median species-level TMP Mean over all of the locations of a species for the 1,095 species for which sufficient data were available (Methods). We then used AIC criteria to test models that included temperature, latitude, and their interaction. We found that the only significant term was a negative correlation between the median annual mean temperature and stable fixing (P = 0.04), and that the previously significant latitude term was no longer significant in a full model containing both terms (phylogenetic regression, P = 0.41). The negative correlation between temperature and stable fixing was confirmed when we simplified the model to include only the significant model term temperature (coefficient, −0.1% change in stable fixing likelihood per degree TMP Mean , P = 0.01). This finding suggests that the observed latitudinal pattern of stable symbiosis distribution is driven by temperature.

Soil Properties Are Not Correlated with Symbiotic Persistence in
Legumes. We then considered local environmental variables, specifically focusing on soil characteristics. We were interested in whether stable fixing hosts were positively correlated with low resource soils, particularly N availability. We extracted soil nitrogen, soil pH, and clay content based on our legume occurrence data (Fig. 2) from the Harmonized World Soil Database (47) and the International Soil Reference and Information Centre (ISRIC)-World Inventory of Soil Emission Potentials (WISE) database (48,49), obtained a total of over 1.2 million records per soil variable and calculated species-level median values for 1,095 species (Methods). Using AIC criteria, we tested various models, which included the soil factors and their interactions. We found that none of the three soil variables improved the fit of our phylogenetic regression model compared with a model that fits only the intercept. We therefore rejected soil N, clay content, and pH as factors correlated with symbiotic persistence in the N 2 -fixing symbiosis in legumes.

High Host Phosphorus and Nitrogen Content Correlates with Symbiotic
Persistence. Lastly, we tested for correlations between stable fixing state likelihoods and host plant traits, focusing on leaf nitrogen and phosphorus content, asking whether the stable fixing state was associated with greater accumulation of nitrogen and phosphorus (32,50). We obtained plant leaf nitrogen content (N mass %; 1,114 records) and phosphorus content (P mass %; 612 records) and calculated averages for 131 fixing legumes from our phylogenetic analysis for which data were available (Methods). We tested the association between stable fixing and host plant nutrient content in a phylogenetic regression model and found positive and significant relationships between N mass % (coefficient 0.03, P = 0.03) (Fig. 4) and P mass % (coefficient 0.05, P = 0.03) (Fig. 5) and the likelihood that a species is currently a stable fixer, as well as a negative interaction between N mass % and P mass % (coefficient, −0.01, P = 0.04). These positive correlations suggest that a higher symbiosis persistence is associated with a higher average leaf nitrogen and phosphorus content in legume hosts whereas the negative interaction term suggests that the positive relationship of either nutrient with symbiosis persistence diminishes with increases in the other nutrient.

Discussion
Symbiotic associations can be highly labile, with partnerships gained and lost over relatively short evolutionary time periods. Other symbioses show remarkable persistence over hundreds of millions of years. Although a suite of genomic tools has emerged to study symbiotic persistence as the physical process of host-  Fig. 2. Pie charts indicate the likelihood of a host node being in either a nonsymbiotic state (orange), a "regular" N 2 -fixing state (green), or in a stable fixing state (purple), where the symbiosis has a high persistence. Represented are the legumes from their ancestral node about 62 million years ago. After the evolution of symbiotic nitrogen fixation (transition from orange to green), repeated evolution of stable fixing (transition from green to purple) is found across the legumes. Particularly within the Papilionoideae, legume species are very likely to currently be stable fixers, meaning that they show a continued capacity to form the symbiosis over evolutionary time. (Scale bar: 10 million years.) symbiont integration (51-53), we lacked an approach that allowed researchers to ask broader scale questions about patterns and processes across diverse symbiotic systems. Quantitative phylogenetic reconstruction methods can be used to first identify clades with an increased symbiotic persistence, and then to analyze factors that can be correlated with persistence likelihoods. Studying symbiotic persistence in the legume-rhizobia mutualism is interesting because (i) there are legume clades in which the symbiosis is relatively labile over evolutionary times and others where it is quite stable (Figs. 1 and 2) and (ii) the symbiosis is not linked to mutual dependence or to intracellular integration (51,52,54), despite host benefits of N 2 fixation in many ecological conditions (33,34). These findings stress that not all highly persistent symbioses are on a route to intracellular lifestyles (55,56). What factors determine whether a symbiosis is maintained or easily lost on an evolutionary time scale?
We performed a phylogenetic analysis of symbiotic persistence in legume hosts to explore possible geographic, climate, and soil variables and host traits that might correlate with symbiotic state likelihoods in legumes. Although research has demonstrated global variability in symbiosis investment (for example, physical intensity of host root colonization by symbiotic mycorrhizal fungi, ref. 57), there has been no attempt to map global patterns of evolutionary states of a symbiosis. First, we analyzed the geographical distribution of stable fixers across the globe and found a clear pattern showing that, at higher latitudes, legumes are generally stable fixers (Fig. 3A). In contrast, closer to the equator, the symbiosis can be highly persistent in some clades, as well as labile in others (Fig. 3B). These observations have an interesting parallel with a recent model that showed a latitudinal shift in N 2 -fixing tree strategies, with a constant rate of nitrogen fixation being favored at high latitudes and a plastic, flexible rate closer to the tropics (58). We found evidence that the latitudinal pattern of stable fixing is driven by temperature differences among species occurrences, with low annual temperatures associated with high symbiotic persistence. Although we cannot exclude the possibility that this pattern is driven by an unmeasured variable associated with global temperatures, this result is further confirmed by the relatively high occurrences of stable fixers in other low-temperature habitats at lower latitudes, such as the Himalayas and the Andes (Fig. 3B). An interesting hypothesis to consider is that the negative relationship between symbiosis persistence and temperature is linked with higher rates of molecular evolution in the tropics versus higher latitudes (59,60). Experimental work has revealed that mutations in single genes can be sufficient to disrupt a legume's capacity for symbiosis (61,62). Although speculative, higher rates of molecular evolution driven by temperature could disrupt key gene networks involved in nodulation, thereby driving loss of the capacity for symbiosis. A similar positive effect of temperature on symbiosis loss has been reported experimentally in rhizobial bacteria (63). Whether this relationship between low temperature and high persistence holds for other symbiotic systems requires further exploration, but illustrates how patterns emerging from quantitative reconstructions can be used to answer broad questions about symbiosis evolution and stability. Another potential explanation for the observed latitudinal pattern relates to relative phosphorus and nitrogen limitations to plant hosts. Colder biomes are thought to be relatively nitrogen-limited, especially in formerly glaciated regions where time for nitrogen to accumulate from biotic N 2 fixation has been relatively short (64)(65)(66)(67)(68). In contrast, the (sub)tropics tend to have very old, weathered soils depleted in phosphorus (68)(69)(70), where special adaptations to acquire phosphorus are presumably more important than N 2 fixation. Such differences in soil type could favor the evolution of stable N 2 -fixing symbioses in polar regions.
Linking persistence of the N 2 -fixing symbiosis to N/P ratios remains an open question. However, we were able to extract three soil properties for our global legume occurrences to test the idea that stable fixing is associated with lower quality soils. We expected to find increased symbiotic persistence when environmental nitrogen availability, a function of soil pH, clay content, and nitrogen content (71), was lower because hosts theoretically derive more benefit from symbiosis in low-quality  environments (72)(73)(74). Surprisingly, we found no significant relationship between soil variables and symbiotic persistence. This result could have a few potential explanations. The first is that most legumes are capable of down-regulating, and temporally controlling, nodulation in high soil nitrogen conditions (75,76). Such control means that legumes can limit the potential negative effects of engaging in symbiosis in high nitrogen conditions and thus might not experience strong selection to lose the capacity for nodulation. However, this scenario assumes that there are few costs associated with hosts maintaining the symbiosis, which is unlikely (77). A second explanation is that we have analyzed only median total soil nitrogen whereas symbiosis stability in legumes might correlate with plant-available nitrogen and particularly with its lowest values, which fluctuate through the seasons (78)(79)(80). Third, a potential limitation of this analysis is that the scale of resolution of the soil databases we analyzed (1/2 a degree) might not be sufficient to detect variation in soil properties at a fine spatial scale. Although this smaller-scale variation is unlikely to affect global-scale patterns in soil properties, it might obscure relationships between soil variables at a finer scale and symbiotic persistence. A fourth explanation is that the soil environment data are a reflection, rather than a driver, of symbiotic persistence. The lack of correlation with low-quality soil and high symbiotic persistence could be an indication of the historic colonization of the environment by N 2 -fixing legumes, resulting in an influx of nitrogen to the soil (81,82). This relationship illustrates a limitation of correlating particular habitat variables with evolutionary states: it can be difficult to determine cause and effect. This temporal problem can potentially be addressed by modeling evolution of the nitrogen fixation and soil variables simultaneously in more advanced phylogenetic frameworks (e.g., refs. 19, 21, 83, and 84), although evolutionary mapping of extrinsic traits like soil properties has attracted criticism (85).
Lastly, we found that leaf nitrogen and phosphorus contents were both positively correlated with highly persistent symbioses (Figs. 4 and 5). High N-demanding lifestyle and the capacity to engage in N 2 fixation are thought to be linked (32,50); our results are consistent with this idea and suggest that it also extends to symbiosis stability. Symbiotic N 2 fixation allows for a more consistent source of nitrogen, and this link potentially drove the evolution of a high nitrogen lifestyle in some legumes, in turn favoring the symbiosis to persist. The positive correlation between leaf phosphorus content and persistent symbiosis is likewise expected because N 2 fixation is linked with high phosphorus demand because N 2 -fixing efficiency is decreased by low phosphorus availability (32,86). Another idea is that symbiotic N 2 fixation may help alleviate phosphorus limitation because it allows for the secretion of more nitrogen-rich phosphatases, favoring plant phosphorus harvesting (87). Consequentially, high host nutrient demands may increase the benefits of being in a symbiotic relationship (88,89), selecting for persistence of symbiotic capacity. It is still an open question as to whether a high nutrient lifestyle predated symbiosis persistence or the reverse. More detailed research is needed to determine the evolutionary order and causality of the correlation between persistence of nodulation capacity and high nutrient demands. Because a high-nutrient lifestyle is linked with fast growth and more investment in reproduction, that suite of traits is highly beneficial within a particular postdisturbance regeneration niche (sensu ref. 90). Legume clades experiencing such a selective regime may have effectively "locked" themselves into a symbiotic lifestyle.
A major question emerging from this research is why facultative symbioses, such as the N 2 -fixing symbiosis, are lost at all? In obligate symbioses, the absence of the symbiotic partner, or a single mutation disrupting the interaction, could immediately lead to extinction of the obligate partner. But why would a facultative species lose the capacity for symbiosis? In general, we would expect that three conditions need to be met: (i) an extended period in which the host plant does not need the symbiosis (for instance because the same benefits are provided by the environment); (ii) one or more mutations occur that cause the loss, which may be more or less likely in different clades; and (iii) the species needs to survive potential competition with other legumes that do retain the capacity for symbiosis, and then even perform better if conditions return in which being symbiotic is favorable. Thus, the capacity for symbiosis is lost if there is a long-term selective environment for loss (conditions i and iii) and genetic variation driving symbiosis loss (condition ii). If there is a cost, small or large, associated with maintaining the underlying genetic mechanism of N 2 -fixation capacity, this cost can contribute to a selective environment for loss of symbiotic capacity. Although there is some theoretical support for this idea (77,91), it would be interesting to explicitly test for costs of maintaining facultative symbioses across diverse systems. Similarly, other factors we have considered, like high host nutrient demands, also affect the selective environment whereas factors like high molecular rates of evolution (59,60) and genome duplication (92) impact upon the likelihood that mutations disrupt symbiotic capacity. By duplicating genes or whole genomes, redundant copies of key genes regulating the symbiosis can be created (92,93). Because mutations are less likely to disrupt key symbiosis genes, duplication could result in a lower rate of symbiosis loss and higher retention of symbiotic capacity (36,94). The potential link between symbiosis persistence and gene duplication is an interesting research avenue.
Although our aim is to illustrate how quantitative phylogenetic methods can be used to explore patterns of symbiosis evolution, we also recognize limitations to this approach. For example, our dataset of leaf nutrient contents was relatively small compared with our legume occurrence, temperature, and soil datasets (Methods). Additionally, these quantitative analyses are only correlative, and thus can only suggest patterns. Experiments are needed to compare the physiology of legumes and other hosts showing different likelihoods of symbiotic persistence (Fig. 1), allowing us to test for actual mechanisms that influence symbiosis stability (e.g., ref. 95). For instance, whereas there have been short-term evolutionary studies analyzing the mechanisms of symbiosis loss in rhizobia (96,97), similar work studying the molecular mechanisms underlying nodulation loss in legumes is lacking. Lastly, our work is not a comprehensive analysis of all of the factors that could be involved in symbiotic persistence in N 2 fixation. Rather, it illustrates the types of questions and variables that can be addressed. Other factors, such as historic atmospheric CO 2 concentrations (98), have been thought to affect the evolution and stability of other plant root mutualisms (99,100) and could likewise be considered in this framework.
The real power of a quantitative comparative approach is the ability to study evolutionary dynamics across a diversity of symbiosis types, from terrestrial (54,101) to marine (102) habitats in various stages of persistence. For example, some squid species have a very intimate association with bioluminescent bacteria (25). Similar to the legume-rhizobial symbiosis, the bacterial symbionts retain a free-living stage and do not become physically integrated with the host. However, unlike legume hosts that are often found without rhizobial partners (30), adult squid hosts show a more consistent symbiotic engagement strategy with their bioluminescent symbionts (103). Why do these patterns differ and how is this trend reflected in the phylogenetic reconstruction of the partnerships? For instance, do we see fewer instances of loss for symbiotic capacity in squid lineages than legumes and why? Similarly, many insects have bacterial endosymbionts, some of which have been estimated to be associated with the same host clades for millions of years, and have become physically, genetically, and metabolically integrated with hosts (52,54). One idea is that such integration is favored when symbionts access and deliver new forms of energy for their hosts, rather than simply increasing the efficiency of resource acquisition (56), as happens in the rhizobia-legume symbiosis. This idea could be tested in a comparative framework, asking what host or symbiont traits are associated with symbiont integration versus facultative associations? The first step to integration may be a switch from a more labile symbiotic state to a more persistent state, which can be phenotypically indistinguishable, but may be detected using phylogenetic comparative methods. Understanding these signals, and the general features that promote symbiosis evolution, will require quantitative reconstructions across a diversity of partnerships. Such comparative studies will shed light on how symbioses have solved the problem of aligning and controlling the interests of two species throughout the history of life (55).

Methods
We previously reconstructed the evolutionary history of N 2 fixation using an HRM approach and a database of the nodulation status of 3,467 angiosperm species (36). Our subsequent analyses focused on the 1,366 legumes (Fabaceae) from our database to determine their stable fixing likelihoods and to test possible correlations between stable fixing likelihood and environmental and trait factors. We first confirmed our previous conclusion that nodulating legumes could be subdivided into stable fixers, with an increased symbiotic persistence, and in regular fixers, and that this finding also holds when analyzing only the legumes. We then tested for correlations between stable fixing likelihood in legumes capable of forming nodules (1,195 species; this subset is smaller because not all 1,366 legumes species can nodulate) (36) and (i) geographic, (ii ) climate, and (iii) environmental variables and (iv) host traits. We obtained (i) location records for these species from the Global Biodiversity Information Facility (GBIF), (ii ) temperature records for these locations from the Climate Research Unit (CRU) (46), (iii) soil clay content and pH values for all legume occurrences from the Harmonized World Soil Database (47) and soil nitrogen from the ISRIC-WISE database (48,49), and (iv) plant leaf nitrogen and phosphorus concentrations from the TRY Plant Trait Database (104) and from the Yasuni Forest Dynamics database (105).
We used the R-package rgbif (43) to obtain location records for the nodulating legume species recorded in GBIF and our database, resulting in coverage of 1,182 species. We limited the maximum total number of locations per species to 100,000 (43). This procedure resulted in an analysis of almost 3.7 million locations. Subsequently, we scrubbed our location data to exclude all data points with spatial error codes (cdiv, cdout, cdrepf, cdreps, cucdmis, gdativ, preneglat, preneglon, preswcd, zerocd) and obtained a dataset of over 3.2 million legume locations across the globe (median 289 locations per species). We characterized species as stable fixers if they had a corrected stable fixing likelihood of more than 50% (Figs. 1 and 2), and otherwise as regular fixers for visualization purposes. We plotted these locations to the world map to identify general geographic patterns (Fig. 3) and found a strong latitudinal pattern. We calculated the median latitude per species across all their locations in our dataset and tested whether there was a latitudinal pattern in stable fixing likelihood by generating a phylogenetic linear regression model of stable fixing likelihood on absolute median latitude in the R-package phylolm, using Pagel's lambda (44,45). To correct for phylogenetic nonindependence, we pruned the full angiosperm phylogeny (35; data available from Dryad, dx.doi.org/10.5061/dryad.63q27.2) to include only those legumes for which we had both median latitude and stable fixing likelihoods.
We then obtained mean long-term annual temperature (TMP Mean ) for all locations in our cleaned location dataset, using a bilinear interpolation (106) to sample for the location from the climate grid (46). We calculated the median species-level TMP Mean across all locations for each species, including only species for which we had temperature records that covered at least 10 different locations, resulting in a coverage of 1,095 legumes. We then explored the extent to which the observed latitudinal pattern was driven by temperature. As previously, we generated phylogenetic linear regression models of stable fixing likelihood as a response variable and its association with absolute median latitude and median TMP Mean in phylolm using Pagel's lambda (44,45). We used AIC criteria to establish that the best model contained only temperature as an explanatory variable.
To analyze the correlation between stable fixing likelihoods and soil properties, we used a similar approach and used a bilinear interpolation to sample for the legume occurrence locations, which were then used to obtain soil clay content, soil pH, and total soil N levels from the Harmonized World Soil Database and the ISRIC-WISE database (both 1/2 a degree resolution) (47)(48)(49). These values were used to calculate species-level median values. We then generated phylogenetic linear regression models to explain the response variable stable fixing likelihoods in terms of these three soil variables. Using AIC criteria, we found that inclusion of none of these variables or their combinations provided a better fit than a model that included only an intercept.
We then studied the relationship between symbiotic persistence and host plant traits in nodulating legumes, and we obtained a database of plant leaf nitrogen and phosphorus contents from the TRY Plant Trait Database (104, and from the Yasuni Forest Dynamics database (105). We calculated the mean nutrient content per species for both nitrogen (N mass %) and phosphorus (P mass %) and compiled a dataset of 131 symbiotic legumes for which we have both nitrogen and phosphorus data. Here, we found using AIC that the best phylogenetic linear model explaining stable fixing likelihood among legumes was one that included plant leaf phosphorus and nitrogen content as explanatory variables, as well as their interaction. We normalized N mass and P mass by scaling them by their respective SDs so that the regression coefficient could be compared between variables.
A potential limitation to the phylogenetic linear model analyses presented here is that we are taking the output of one phylogenetic analysis (i.e., stable fixing likelihoods) (36) and using it as an input for subsequent analysis. As defined here, symbiotic persistence can be discovered only in a phylogenetic reconstruction, not directly measured or observed in an organism, necessitating this approach. However, we can verify our results by testing for the phylogenetic correlation between the variables of interest for our best fit models and the underlying binary data on the presence or absence of nitrogen fixation capacity in legumes (data available from Dryad, dx.doi.org/doi:10.5061/ dryad.05k14). We therefore generated phylogenetic generalized linear models in phylolm (44,136), with host legume nodulation capacity as a dependent binary variable, and either mean annual temperature (TMP Mean ) or leaf nitrogen (N mass %) and phosphorus (P mass %) content and their interactions as explanatory variables. As previously, we scaled N mass and P mass by their respective SDs so that the regression coefficient could be compared between variables. We expect that, if there are relationships between these variables and stable fixing, they should also be recovered when analyzing the correlation between them and fixation per se. As expected, we recovered the same relationship of leaf nitrogen (coefficient, 1.22, P < 0.01) and phosphorus content (coefficient, 1.11, P < 0.01) and their interaction (coefficient, −0.37, P < 0.01) with symbiotic N 2 fixation. Additionally, as expected, we recover that legumes at lower TMP Mean are more likely to be able to form the symbiosis (coefficient, −0.10, P = 0.24). However, this association is not significant, indicating that this result is less robust than the link between symbiotic nitrogen fixation and host nutrient requirements.