Research Article

Evolutionary learning of adaptation to varying environments through a transgenerational feedback

BingKan Xue and Stanislas Leibler
  a. The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540;
  b. Laboratory of Living Matter and Center for Studies in Physics and Biology, The Rockefeller University, New York, NY 10065


PNAS October 4, 2016 113 (40) 11266-11271; first published September 19, 2016; https://doi.org/10.1073/pnas.1608756113
For correspondence: bkxue@ias.edu, livingmatter@rockefeller.edu

Contributed by Stanislas Leibler, August 22, 2016 (sent for review June 1, 2016; reviewed by Terence Hwa and Oliver J. Rando)


Significance

Phenotypic diversification, a form of evolutionary bet hedging, is observed in many biological systems, ranging from isogenic bacteria to cell populations in a tumor. Such phenotypic diversity is adaptive only if it matches the statistics of local environmental variation. Often, the timescale of environmental variation can be much longer than the lifespan of individual organisms. Then how could organisms collect long-term environmental information to modulate their phenotypic diversity? We propose here a general mechanism of “evolutionary learning,” which could overcome the mismatch between the two timescales. The learning mechanism can in principle be realized through known molecular processes of epigenetic inheritance, suggesting experimental directions to probe the evolutionary significance of such processes.

Abstract

Organisms can adapt to a randomly varying environment by creating phenotypic diversity in their population, a phenomenon often referred to as “bet hedging.” The favorable level of phenotypic diversity depends on the statistics of environmental variations over timescales of many generations. Could organisms gather such long-term environmental information to adjust their phenotypic diversity? We show that this process can be achieved through a simple and general learning mechanism based on a transgenerational feedback: The phenotype of the parent is progressively reinforced in the distribution of phenotypes among the offspring. The molecular basis of this learning mechanism could be searched for in model organisms showing epigenetic inheritance.

  • evolution
  • bet hedging
  • epigenetic inheritance
  • population growth
  • environmental fluctuations

All biological organisms have to adapt to variations in their environment. This is particularly challenging if these variations are not regular or present strong random components. In this case, adaptation has to rely on the information gathered over time, allowing organisms to anticipate such environmental changes.

One class of adaptation mechanisms depends directly on detection of ongoing changes in the environment, such as a sudden shift in temperature, chemical composition, or ecological structure. Sensing and signal transduction machinery allows the organism to change the course of development or induce a direct phenotypic transformation. Such “phenotypic plasticity” can be implemented at the level of an individual organism, at the cost of maintaining an efficient detection and signal transduction system. Examples of phenotypic plasticity are ubiquitous, e.g., the switching of metabolic pathways in microorganisms upon changes of food source (1) or the control of flowering time in plants according to temperature and photoperiod modulations (2).

Another class of adaptation mechanisms relies on “phenotypic diversification,” which acts primarily on the population level. Instead of transitioning to a new phenotype upon a detected change in the environment, adaptation is achieved by constantly sustaining a distribution of various phenotypes in the population. The latter is implemented by individual organisms having different propensities for expressing different phenotypes, which can be mathematically described by an associated probability distribution. At a given time, some of the phenotypes are better adapted to the prevailing local environment than the others. The simultaneous presence of multiple phenotypes can increase the population’s long-term growth rate in a varying environment. Such adaptive phenotypic diversification is often referred to as “bet hedging” (3, 4). Many observed cases of phenotypic diversity have indeed been suggested as examples of bet hedging (5, 6), including bacterial populations that generate a low frequency of slowly growing persisters that are tolerant to antibiotics (7), annual plants that produce a fraction of seeds that remain dormant underground for multiple years while unaffected by unpredictable weather (8–10), and insects that undergo variable lengths of diapause that halts reproductive development to cope with uncertain ecological conditions (11).

Phenotypic diversification is proposed to be useful when environmental cues are unreliable or phenotypic plastic responses are costly or ineffective (12). (Theoretical studies of population dynamics in fluctuating environments, including bet-hedging strategies, often assume growing populations with density-independent fitness values. For recent work on the effect of environmental fluctuations on growth-limited populations, applicable to phenotypic plasticity, see, e.g., ref. 13.) It has been argued that the optimal distribution of phenotypes in the population is typically determined by the frequency of encountering different environmental conditions (14). Therefore, the bet-hedging organism requires information not only about the present environment, but also about the statistics of the past environment. Such long-term environmental information can be acquired only if the information is gathered, stored, and then transmitted over consecutive generations.

If the knowledge about the past is used to adjust the phenotypic diversity of the population in a favorable direction, then the organism may be said to have “learned” to adapt to the varying environment. Fundamentally, how can such “evolutionary learning” be achieved? Ideally, the mechanism of learning should function for a wide range of environmental variations. In particular, when the environment is stable except for rare shifts, the phenotype distribution should narrow down, through this learning mechanism, to the most favorable phenotype for each environmental condition. On the other hand, when the environment changes frequently and irregularly, the phenotype distribution should broaden out to allow bet hedging.

We use a theoretical model to show that the evolutionary learning of adaptation to varying environments can be achieved through a simple and very general mechanism. This mechanism acts on the level of individual organisms without relying on environmental signals. It is based on a positive feedback that enhances the probability that the offspring expresses the same phenotype as its parent. Our results do not depend on particular molecular details of this “transgenerational feedback.” Many known examples of molecular processes with wide-ranging timescales (15, 16) can potentially support the learning mechanism postulated by our model. We describe some of them below, but first we outline the main idea of the learning mechanism itself.

Model

For simplicity, consider a population of isogenic asexual organisms that can express alternate phenotypes. Each individual inherits information determining the probability of expressing different phenotypes and randomly expresses a phenotype according to that probability distribution. For concreteness, we imagine that the phenotypic choice is based on a bistable epigenetic switch, which takes variable molecular inputs and generates one of two possible outputs, determining a phenotype ϕA or ϕB, respectively; once a phenotype is selected after birth, it remains unchanged due to some irreversible developmental events. The probability of selecting each phenotype, denoted by πA and πB with πA + πB = 1, depends on the threshold of the switch and the variability of the molecular input. We assume that each generation of individuals experiences the same environmental condition, that different generations do not overlap in time, and that the environment can change between generations. We also suppose that the phenotypes ϕA and ϕB are each favorable in one of two possible environments, εA and εB, respectively, as illustrated in Fig. 1A. The expected number of offspring produced by an individual with phenotype ϕi in an environment εj (j = A, B) is denoted by wi(j). For now we assume that environmental selection is extremely strong, so that an individual cannot survive if its phenotype does not match the environment; i.e., wi(j) = 0 for i ≠ j. Generalization to the case of nonextreme selection is discussed later and shown in detail in SI Appendix.

Fig. 1.

Schematic view of the evolutionary learning mechanism. (A) Each individual (solid circle) randomly expresses a phenotype ϕA (blue) or ϕB (red) according to a probability distribution πi (i=A,B). In an environment εj (j=A,B; background colors), a phenotype ϕi has a fitness value wi(j). Under extreme selection, only individuals whose phenotype matches the environment survive and produce offspring; individuals with mismatched phenotypes die (crossed out). (B) The learning rule: Each individual in a new generation t+1 inherits a probability πi(t+1) that equals the probability πi(t) carried by its parent plus a change Δπi that depends on the parent phenotype. This change makes the offspring more likely (Δπi>0) to have the same phenotype as the parent and less likely (Δπi<0) otherwise. (C) Evolutionary learning in a varying environment: Bar plots show the phenotype probability distributions πi(t) in each generation. After a few generations, πi(t) approaches the environment frequencies as a result of the learning mechanism. (D) Lineages of individuals: Solid arrows form a continuous lineage; dashed arrows are where a lineage terminates. The history of phenotypes along a continuing lineage reflects the past environment.

The evolutionary learning mechanism, which we propose, is based on the following positive feedback: If an individual exhibits phenotype ϕk (k = A or B), then the frequency of this phenotype is further increased in its offspring. Mathematically, the phenotype probability distribution of the offspring is different from that of the parent by an amount Δπi, which is positive for i = k and negative otherwise (Fig. 1B). Different variants of this general learning mechanism are possible; here we consider the following simple rule:

$$\Delta\pi_i = \eta\,(\delta_{ik} - \pi_i) = \begin{cases}\eta\,(1-\pi_k), & i = k;\\ -\eta\,\pi_i, & i \neq k.\end{cases} \tag{1}$$

(δik is the Kronecker delta, which equals 1 if i = k and 0 otherwise.) The parameter η is the learning rate, satisfying 0 < η ≪ 1. (Note that the total probability is conserved, because ∑i Δπi = 0.) This rule is reminiscent of “Hebbian learning” (17), which strengthens newly experienced states in a neural network by reinforcing synaptic connections between the coactive neurons.
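As a concrete illustration (ours, not the authors' code; the function name is hypothetical), the update rule of Eq. 1 reads:

```python
def update_distribution(pi, k, eta):
    """Transgenerational learning rule (Eq. 1): the parent's expressed
    phenotype k is reinforced in the offspring's distribution, and all
    other phenotypes are proportionally suppressed; total probability
    is conserved because the increments sum to zero."""
    return [p + eta * ((1.0 if i == k else 0.0) - p) for i, p in enumerate(pi)]

# Two phenotypes; the parent expressed phenotype A (index 0):
pi_offspring = update_distribution([0.5, 0.5], k=0, eta=0.1)
# -> [0.55, 0.45] (up to floating-point rounding)
```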

Through this learning mechanism, the phenotype probability distribution πi is updated every generation and changes over time. Individuals of different phenotypes induce different changes of the distribution πi in their offspring. Although natural selection acts not on the distribution πi but on the phenotypes randomly generated from πi, the overall distribution of phenotypes in the population will nevertheless adapt to the environmental changes.

SI Appendix

Learning Under Nonextreme Selection.

Here we derive the dynamic changes of the phenotype probability distribution πi under nonextreme selection, i.e., when the fitness matrix wi(j) has nonvanishing off-diagonal elements, wi(j)>0 for i≠j. Nevertheless, assume that the phenotype that matches the environment is the most favorable; i.e., the diagonal element wj(j) is greater than wi(j) for i≠j. In this case, the adaptation of the average phenotype distribution π¯i of the whole population can be analyzed as follows.

Recall that there can be multiple alternate phenotypes labeled by ϕi (i = A, B, ⋯), and each phenotype is expressed with probability πi, satisfying ∑i πi = 1. Let πi and πi′ be the probabilities of expressing a phenotype ϕi by a parent and by its offspring, respectively. If the parent actually expresses a phenotype ϕk, then according to the learning rule (Eq. 1 in the main text), its offspring will have a phenotype distribution

$$\pi_i' = (1-\eta)\,\pi_i + \eta\,\delta_{ik}. \tag{S1}$$

Therefore, the average phenotype distribution in the offspring generation is given by

$$\bar\pi_i' = (1-\eta)\,\mathrm{E}'[\pi_i] + \eta\,\mathrm{E}'[\delta_{ik}], \tag{S2}$$

where E′ denotes averaging over the offspring generation, and k labels the phenotype of each offspring's parent in the previous generation. Approximating the πi's of the parents by their average π̄i, and accounting for the fitness wk(j) of the parents in an environment εj, the average phenotype distribution of the offspring becomes

$$\bar\pi_i' \approx (1-\eta)\,\bar\pi_i + \eta \sum_k \delta_{ik}\,\frac{w_k(j)\,\bar\pi_k}{\sum_\ell w_\ell(j)\,\bar\pi_\ell} = \bar\pi_i + \eta\left(\frac{w_i(j)\,\bar\pi_i}{\sum_\ell w_\ell(j)\,\bar\pi_\ell} - \bar\pi_i\right). \tag{S3}$$

Therefore, the difference in the average phenotype distribution between the generations is

$$\Delta\bar\pi_i \approx \eta\,\bigl(q_i(j) - \bar\pi_i\bigr), \quad \text{where} \quad q_i(j) \equiv \frac{w_i(j)\,\bar\pi_i}{\sum_\ell w_\ell(j)\,\bar\pi_\ell}. \tag{S4}$$

In the limit of extreme selection, where wi(j) → w(j)δij, we have qi(j) → δij, recovering Eq. 5 in the main text (Materials and Methods), where πi is homogeneous among individuals. With nonextreme selection, where wi(j) > 0 for i ≠ j, the average distribution π̄i approaches qi(j); but because qi(j) depends on both π̄ and the environment εj, which change with time, π̄i behaves as if following a moving target.

When the environment remains εj for a long time, eventually π¯ converges to a stable fixed point where qi(j)(π¯)=π¯i. This yields π¯i=δij as in the extreme selection case, which means the probability distribution narrows down to the most favorable phenotype in the current environment.
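This convergence can be checked by iterating Eq. S4 directly. The sketch below is an illustration (ours), borrowing the fitness matrix wi(j) = [[2.0, 0.2], [0.2, 2.0]] used in Fig. 2 of the main text:

```python
eta = 0.1
w = [[2.0, 0.2], [0.2, 2.0]]   # fitness w_i(j); rows i = phenotype, cols j = environment
j = 0                           # hold the environment fixed at eps_A
pi = [0.5, 0.5]                 # average distribution pi-bar, initially unbiased
for _ in range(500):
    norm = sum(w[l][j] * pi[l] for l in range(2))
    q = [w[i][j] * pi[i] / norm for i in range(2)]         # q_i(j) of Eq. S4
    pi = [p + eta * (q[i] - p) for i, p in enumerate(pi)]  # Delta pi-bar of Eq. S4
# pi-bar narrows down to the phenotype matching the environment: pi -> [1, 0]
```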

When the environment changes frequently, π̄ converges to a steady distribution π̄* given by the fixed point of the time-averaged equation

$$\langle \Delta\bar\pi_i \rangle \approx \eta\,\bigl(\langle q_i(j_t) \rangle - \bar\pi_i\bigr), \tag{S5}$$

where jt labels the environment εjt at time t. Once again, these time-averaged dynamics follow a gradient ascent with respect to the asymptotic growth rate (Eq. 11 in the main text) (Materials and Methods), because the gradient ascent is given by

$$\Delta\pi_i \propto \sum_j g_{ij}\,\frac{\partial \Lambda}{\partial \pi_j} = \sum_j p_j \left(\frac{w_i(j)\,\pi_i}{\sum_\ell w_\ell(j)\,\pi_\ell} - \pi_i\right) = \Bigl(\sum_j p_j\, q_i(j)\Bigr) - \pi_i. \tag{S6}$$

The time average of qi(jt) in Eq. S5 is equal to the probability average of qi(j) in Eq. S6, because pj represents the frequency of each environment εj. It follows that π̄i converges to the steady distribution π̄* that coincides with the optimal bet-hedging solution.

Note that, when the gap between fitness values of different phenotypes vanishes, the adaptation time becomes increasingly long. This result can be seen by linearizing Eq. S4. Assume that the environment is εj but π̄ is initially biased to a phenotype ϕk; i.e., π̄k ≈ 1 for k ≠ j. Then qi(j) ≈ (wi(j)/wk(j)) π̄i, and hence π̄j will be updated by

$$\Delta\bar\pi_j \approx \eta\left(\frac{w_j(j)}{w_k(j)} - 1\right)\bar\pi_j. \tag{S7}$$

This is positive because wj(j) > wk(j), so π̄j increases in the environment εj, as expected. However, the characteristic time for π̄j to become significant is

$$\tau_{\mathrm{ad}}' \approx \frac{1}{\eta}\,\frac{w_k(j)}{w_j(j) - w_k(j)} \approx \frac{w_k(j)}{w_j(j) - w_k(j)}\,\tau_{\mathrm{ad}}. \tag{S8}$$

As the gap vanishes, wj(j) − wk(j) → 0+, it takes an infinitely long time, τ′ad → ∞, for the population to shift to the marginally more favorable phenotype ϕj. This means π̄j would remain negligibly small during a finite time τenv. Indeed, when τ′ad ≫ τenv, the bet-hedging solution kicks in, which gives π̄j* → 0, because the phenotype ϕj becomes asymptotically dominated by ϕk in terms of fitness value.

Optimal Learning Rate.

Here we look for the optimal learning rate that maximizes the population growth for given patterns of environmental variations. Consider first the case of extreme selection. The asymptotic growth rate Λ of the population can be expressed as a function of the learning rate η as follows. Let {kt} be the time sequence of labels that specify the environmental condition over a long period T. Under extreme selection, the ancestral phenotypes must match the past environment; hence, by Eq. 4 in the main text, the phenotype distribution πi(t) is given by

$$\pi_i(t) = \sum_{n=1}^{\infty} \eta\,(1-\eta)^{n-1}\,\delta_{i,\,k_{t-n}}. \tag{S9}$$

Therefore, according to Eq. 17 in the main text (Materials and Methods), the asymptotic growth rate Λ is given by

$$\Lambda = \frac{1}{T}\sum_t \log\bigl(w_{k_t}(k_t)\,\pi_{k_t}(t)\bigr) = \frac{1}{T}\sum_t \log w_{k_t}(k_t) + \frac{1}{T}\sum_t \log \sum_{n=1}^{\infty} \eta\,(1-\eta)^{n-1}\,\delta_{k_t,\,k_{t-n}}. \tag{S10}$$

The first term is independent of η and represents the fastest possible asymptotic growth rate that would be achieved by perfect sensing with no cost, Λs = ∑i pi log wi(i). The second term, for a stationary distribution of the environment, converges in the limit of large T to a value that depends on the learning rate η.

Let us evaluate Λ for some examples of environmental variation patterns. Consider two possible environments εA and εB, and assume that the durations of each environment, tA and tB, are randomly drawn from certain probability distributions, with their means equal to τA and τB, respectively. Denote the sequence of environmental durations by {⋯, tn−1, tn, tn+1, ⋯}, where the odd-indexed durations are drawn from the distribution of tA and the even-indexed ones from that of tB. Then Eq. S10 can be written as

$$\Lambda = \Lambda_s + \frac{1}{T}\sum_n \sum_{t=0}^{t_n - 1} \log\left[1 - (1-\eta)^t \sum_{k=0}^{\infty} (-1)^k (1-\eta)^{\sum_{l=1}^{k} t_{n-l}}\right], \tag{S11}$$

where T = ∑n tn.

First, suppose the environment changes periodically, so that tA is always equal to τA and tB to τB. In the symmetric case where τA = τB ≡ τenv, for example, Λ can be analytically expressed as

$$\Lambda = \Lambda_s + \frac{1}{\tau_{\mathrm{env}}}\sum_{t=0}^{\tau_{\mathrm{env}}-1} \log\left(1 - \frac{(1-\eta)^t}{1 + (1-\eta)^{\tau_{\mathrm{env}}}}\right). \tag{S12}$$

Fig. S1 A–C shows Λ as a function of η for different lengths of the period τenv. Let η∗ be the value of η that maximizes Λ. It can be seen that, for small τenv (τenv < 9), the maximum is reached at η∗ → 0 (Fig. S1A), corresponding to an adaptation timescale τad∗ ≃ 1/η∗ → ∞. This means that, in the steady state, the population will settle on the optimal bet-hedging strategy (because τad∗ ≫ τenv; see main text) and keep a minimum learning rate (η∗ → 0), such that the phenotype distribution remains nearly constant. For large τenv (τenv ≥ 9), however, the maximum Λ is reached at η∗ > 0 (Fig. S1 B and C), such that τad∗ remains smaller than τenv (Fig. S2A). Hence the population will effectively show transgenerational plasticity, just like in the example presented in Fig. 2 C and F of the main text.
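Eq. S12 is simple to evaluate numerically. The sketch below (ours, not the authors' code) reproduces the qualitative trade-off described here: a long period favors a finite learning rate, whereas a short period favors η → 0:

```python
import math

def lam_minus_lams(eta, tau):
    """Lambda - Lambda_s for a symmetric periodic environment (Eq. S12)."""
    return sum(math.log(1 - (1 - eta)**t / (1 + (1 - eta)**tau))
               for t in range(tau)) / tau

# A long period (tau_env = 40) rewards learning at a finite rate...
assert lam_minus_lams(0.15, 40) > lam_minus_lams(1e-4, 40)
# ...while a short period (tau_env = 5) rewards a vanishing learning rate
assert lam_minus_lams(1e-4, 5) > lam_minus_lams(0.15, 5)
```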

Fig. S1.

Asymptotic growth rate Λ as a function of the learning rate η for given patterns of environmental variations, assuming two possible environments, εA and εB, and extreme selection. The constant term, Λs, is the asymptotic growth rate that would be achieved by perfect sensing with no cost. (A–C) Periodic environmental changes with fixed durations τA and τB for each environment, respectively, where τA=τB=5 (A), 10 (B), and 40 (C). (D–F) Environmental durations are geometrically distributed with means equal to τA and τB, respectively, where τA=10/3,τB=10/7 (D); τA=τB=3 (E); and τA=τB=10 (F).

Fig. S2.

Adaptation timescale τad* corresponding to the optimal learning rate η∗, for different patterns of environmental variations as shown in Fig. S1. (A) Periodic environmental changes with a fixed duration τenv. (B) Geometrically distributed environmental durations with the same mean τenv. Error bars are due to uncertainty in numerical calculations. Dashed lines represent critical values of τenv below which η∗ → 0, and hence τad* → ∞.

Further, suppose tA and tB are geometrically distributed, with means equal to τA and τB. Then the time series of the environment is given by a Markov chain with the transition matrix

$$M = \begin{pmatrix} 1-\alpha & \beta \\ \alpha & 1-\beta \end{pmatrix},$$

where α = 1/τA and β = 1/τB. Numerically calculated values of Λ are shown in Fig. S1 D–F. The results are qualitatively similar to those of the periodic case. In the symmetric case where τA = τB ≡ τenv, for example, it is found that η∗ → 0 for τenv < 4 (Fig. S1E) and η∗ > 0 for τenv ≥ 4 (Fig. S1F). The adaptation timescale τad∗ corresponding to the optimal learning rate η∗ is shown in Fig. S2B.
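An environment time series with geometrically distributed durations can be sampled directly from the Markov chain M; the sketch below is an illustration (ours; the function name and seed are arbitrary):

```python
import random

def environment_sequence(tau_A, tau_B, T, seed=0):
    """Sample T generations from the two-state Markov chain with
    switching probabilities alpha = 1/tau_A (out of eps_A) and
    beta = 1/tau_B (out of eps_B); durations are then geometric."""
    rng = random.Random(seed)
    alpha, beta = 1.0 / tau_A, 1.0 / tau_B
    env, seq = 'A', []
    for _ in range(T):
        seq.append(env)
        if env == 'A':
            env = 'B' if rng.random() < alpha else 'A'
        else:
            env = 'A' if rng.random() < beta else 'B'
    return seq

seq = environment_sequence(tau_A=10, tau_B=5, T=200000)
# The fraction of time spent in eps_A approaches tau_A / (tau_A + tau_B) = 2/3
```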

A special case is when the environment is independent and identically distributed (i.i.d.) with probabilities (pA,pB). This can be described by the above transition matrix M with α=pB and β=pA (hence τA=1/pB and τB=1/pA). Fig. S1D shows an example where pA=0.7 and pB=0.3, the same as for the example presented in Fig. 2 A and D of the main text. It can be seen that η∗→0, which means a nearly constant bet-hedging strategy is optimal for this case.

For the more general case of nonextreme selection, similar results are found by numerical simulations. Consider the same examples of environmental variation patterns as above. The growth of populations having different learning rates is compared in Fig. S3 A–C. We can see that the optimal learning rate is very close to the one found in the extreme selection case. For example, in Fig. S3A, where the environment is i.i.d., we see that the smaller the learning rate η is, the faster the population grows, in agreement with Fig. S1D. Similarly, in Fig. S3 B and C, it can be inferred that the optimal learning rate lies between η = 0.15 and 0.2, consistent with Fig. S1 F and C, which have the same environmental variation patterns. Note that, in Fig. S3B, we may observe intermittent periods of time during which the population growth shows either transgenerational plasticity-like (shaded regions, compare Fig. S3C) or bet-hedging–like (open regions, compare Fig. S3A) behavior, as discussed in the main text.

Fig. S3.

Simulation of population growth with different learning rates, under nonextreme selection with given patterns of environmental variations. The fitness matrix is chosen to be the same as in Fig. 2 of the main text; i.e., wi (j)=[[2.0,0.2],[0.2,2.0]]. Colored lines represent learning rates ranging from η=0.05 to 0.30. (A) Environment is i.i.d. (τenv≃1) with probabilities pA=0.7 and pB=0.3, as in Fig. 2C. The population adopts bet-hedging behavior. (B) Environmental durations are geometrically distributed with the mean τenv=10. Shaded regions mark intermittent periods of time during which the population shows transgenerational plasticity, whereas in open regions the population behaves more similarly to bet hedging. (C) Environment switches periodically between two conditions every τenv=40 generations, as in Fig. 2D. The population shows transgenerational plasticity. (D) The pattern of environmental variations alternates every 200 generations between being i.i.d. (τenv≃1, as in A) and periodic (τenv=40, as in C). The population exhibits either bet hedging or transgenerational plasticity during the corresponding periods of time.

Finally, we use numerical simulations to study the effect of varying environmental statistics. As an example, Fig. S3D shows the case where the pattern of environmental variations alternates between being i.i.d. (τenv≃1) and being periodic with a long duration (τenv=40). It can be seen that the optimal learning rate η∗ is positive and finite, and the population exhibits either bet-hedging or transgenerational plasticity during the corresponding periods of time. This is qualitatively similar to the case of a broad distribution of τenv, such as in Fig. S3B. Nevertheless, the transition between the transgenerational plasticity and the bet-hedging behaviors is much more pronounced here.

Results

To better understand how the transgenerational positive feedback functions, consider first the case where environmental selection is extremely strong. Because only individuals with the phenotype that matches the environment may survive and produce offspring, those offspring will have the same Δπi and hence the same πi. Let πi(t) be the phenotype distribution of individuals in the tth generation (Fig. 1C). If the environment is εA and remains so for t generations, then by applying the learning rule (Eq. 1) recursively, one finds (Materials and Methods) that the phenotype distribution becomes

$$\pi_A(t) = 1 - (1-\eta)^t\,\bigl(1 - \pi_A(0)\bigr), \qquad \pi_B(t) = (1-\eta)^t\,\pi_B(0). \tag{2}$$

Therefore, as a result of the transgenerational positive feedback, the probability πA(t) of expressing the favorable phenotype ϕA increases over the generations. The adaptation timescale is given by τad = −1/log(1−η) ≃ 1/η (measured in the number of generations), controlled by the learning rate η.
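A quick numerical check of Eq. 2 (a sketch, ours), assuming extreme selection in a constant environment εA, where every surviving parent has phenotype ϕA:

```python
eta, piA0 = 0.1, 0.3
piA = piA0
for t in range(1, 26):
    piA = (1 - eta) * piA + eta                  # learning rule with parent phenotype A
    closed = 1 - (1 - eta)**t * (1 - piA0)       # closed form of Eq. 2
    assert abs(piA - closed) < 1e-12
# piA approaches 1 over roughly tau_ad = 1/eta = 10 generations
```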

Now, suppose that the environment switches repeatedly during that timescale τad; i.e., the typical duration of an environment, τenv, is much shorter than τad. Then, through the learning mechanism, the phenotype probability distribution πi converges to a steady distribution πi∗ and fluctuates around it. Indeed, averaging Δπi from Eq. 1 over a time period ≃ τad, and assuming Δπi is small so that πi ≈ constant, one finds (Materials and Methods)

$$\langle \Delta\pi_i \rangle \approx \eta\,(p_i - \pi_i), \tag{3}$$

where pi is the empirical frequency of the environment εi during that period. The steady distribution is given by ⟨Δπi⟩ = 0, which yields πi∗ = pi. This distribution, being proportional to the environment frequency, exactly recovers the optimal bet-hedging strategy (14).
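The fixed point πi∗ = pi can be verified by simulating a surviving lineage in an i.i.d. environment under extreme selection. The sketch below is an illustration with assumed parameters (η = 0.05, pA = 0.7, chosen for the example):

```python
import random

rng = random.Random(42)
eta, pA, T = 0.05, 0.7, 100000
piA, running = 0.5, 0.0
for _ in range(T):
    env_is_A = rng.random() < pA        # i.i.d. environment
    # extreme selection: the surviving parent's phenotype equals the environment
    piA = (1 - eta) * piA + (eta if env_is_A else 0.0)
    running += piA
mean_piA = running / T
# the time-averaged distribution settles at the environment frequency: pi*_A = p_A
```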

We have thus shown that, in a rapidly fluctuating environment, the proposed learning mechanism enables the population to reach the optimal phenotype distribution for bet hedging. Eq. 3 also implies that the phenotype distribution πi converges to the optimal distribution exponentially, following the direction of fastest adaptation (Materials and Methods). In particular, for the case of extreme selection, the convergence time is given by the adaptation timescale, τad≃1/η (this timescale is lengthened for the case of nonextreme selection) (SI Appendix).

The evolutionary learning mechanism effectively allows the organism to gather information about the past environment. This process is possible because the learning mechanism gives every individual an effective memory of its ancestral phenotypes. Starting from an individual in the tth generation and following its lineage backward, let the phenotype of its n-generation ancestor be ϕ_{i_{t−n}}, where n is the generation distance. By using Eq. 1 recursively, we find (Materials and Methods)

$$\pi_i(t) = \sum_{n=1}^{\infty} \eta\,(1-\eta)^{n-1}\,\delta_{i,\,i_{t-n}}. \tag{4}$$

This result means that the individual may express the same phenotype as its n-generation ancestor with probability η(1−η)^{n−1}, which decays exponentially with the generation distance. Hence, effectively, the individual is able to sample the phenotypes of its ancestors from a recent time period ≃ 1/η ≃ τad. In the case of extreme selection, the phenotypes of one's ancestors carry complete information about the past environment, because an ancestor could survive only if its phenotype matched the environment at its time (Fig. 1D). Thus, by sampling the ancestral phenotypes, the organism is able to collect the statistics of the past environment. In particular, if the environmental variation happens much faster than the adaptation timescale τad, the organism will be able to adapt by learning a bet-hedging strategy.
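The geometric memory weights of Eq. 4 can be sanity-checked directly (a minimal sketch, ours):

```python
eta = 0.1
# weight with which an individual effectively samples the phenotype
# of its n-generation ancestor (Eq. 4), truncated at n = 1000
weights = [eta * (1 - eta)**(n - 1) for n in range(1, 1001)]
total = sum(weights)                # geometric series; sums to 1
mean_depth = sum(n * wn for n, wn in zip(range(1, 1001), weights))
# the mean ancestral depth sampled is ~1/eta = tau_ad generations
```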

Note that in the opposite regime, in which an environment lasts for a time τenv much longer than τad, the organism is able to adapt by learning the most favorable phenotype in that environment. In the previous example where the environment remains εA for t generations, one finds πA(t)→1 when t≫τad in Eq. 2, and hence the phenotype distribution becomes biased heavily toward the phenotype ϕA. Thus, interestingly, without having any sensors, the population effectively shows a form of plasticity that lets individuals preferentially express the favorable phenotype in the current environment. As a result, the population quickly approaches the maximum growth rate in that environment (Materials and Methods).

Our model can be generalized to include nonextreme environmental selection, in which case individuals with a mismatched phenotype can also produce offspring. According to the learning mechanism, those offspring will have a phenotype distribution πi that is different from that of the offspring of the individuals with the favorable phenotype. As a result, πi no longer remains homogeneous in the whole population. Nevertheless, if the fitness of the mismatched phenotype is much lower than that of the favorable phenotype, then the variation in πi will be small among individuals, and the average phenotype distribution π¯i over the whole population should behave similarly to the extreme selection case (SI Appendix). Note that, although we have considered here the case of only two different phenotypes, a generalization of the model leads to similar results for any number of alternate phenotypes (Materials and Methods and SI Appendix).

These analytical results are confirmed by numerical simulations. Examples of such simulations for nonextreme selection are shown in Fig. 2. As predicted, the evolutionary learning mechanism allows the population to dynamically adjust the phenotype distribution, depending on the pattern of environmental variations. When the environment changes frequently, i.e., τenv ≪ τad, the phenotype distribution quickly converges to the optimal bet-hedging solution (Fig. 2 A and D). As a result, the population reaches a stable phenotypic diversity that offers the highest possible growth rate without any direct sensing of the environment. On the other hand, when the environment remains stable for long periods, i.e., τenv ≫ τad, the population progressively expresses the favorable phenotype in every environment and approaches the maximum growth rate after a finite adaptation period ≃ τad (Fig. 2 C and F). Hence, effectively, the population shows a “transgenerational phenotypic plasticity.”

Fig. 2.

Simulation of a learning population in a fluctuating environment. The environment alternates between two conditions εA and εB. Each individual carries a probability distribution (πA, πB) and randomly selects a phenotype ϕA or ϕB. Each phenotype ϕi has a fitness wi (j) in the environment εj, where wi (j)=[[2.0,0.2],[0.2,2.0]]. The learning rate is η=0.1, which corresponds to τad≈10. (A–C) Blue shades represent the histograms of the probability πA(t) in each generation t estimated from N=105 individuals, with normally distributed initial values πA(0)∼N(0.5,0.05); yellow lines are the average probability π¯A(t) over the population. (D–F) Blue lines are the population growth curves estimated from the πA(t)s in the same examples as in A–C; black lines represent the optimal bet-hedging solutions calculated from the environment frequencies; yellow lines are the maximum growth curve obtained by having the phenotype matching the environment at all times and with no cost for sensing. (A and D) The environment is independent and identically distributed (τenv≃1) with pA=0.7, pB=0.3. (B and E) The durations of the environment εA and εB are geometrically distributed with means τA=10 and τB=5, respectively. (C and F) The environment switches every τenv=40 generations.

In addition, if the timescale of environmental changes is comparable to the adaptation timescale τad, the population may exhibit intermittent periods of either bet-hedging or transgenerationally plastic behavior (Fig. 2 B and E). This behavior happens when the actual durations of each environment vary broadly with time. As a result, there are times when an environment happens to last for a period τ>τad (e.g., t≈50−100 in Fig. 2 B and E), as well as times when environmental durations are predominantly short, τ<τad (e.g., t≈0−50 in Fig. 2 B and E). In the former case the population shows transgenerational plasticity, whereas in the latter case it adopts a bet-hedging strategy.

The adaptation timescale τad, which determines the behavior of the population growth for a given pattern of environmental variations, is controlled by the learning rate η. Different learning rates may be achieved, depending on the molecular basis of the learning mechanism. Therefore, in principle, the learning rate could be selected over evolutionary timescales. A quantitative comparison of the performance of different learning rates is presented in SI Appendix, which yields the optimal learning rate η* that maximizes population growth for given patterns of environmental variations. For very short environmental durations (e.g., τenv ≪ 10), η* approaches 0, and thus the phenotype distribution becomes nearly constant after reaching the optimal bet-hedging solution. In contrast, if long periods of constant environment exist (e.g., τenv ≳ 10), η* becomes positive and finite, and hence the population can take advantage of the transgenerational plasticity during such periods. The behaviors of the phenotype distribution and the population growth for the optimal learning rate under those environmental variation patterns are similar to the examples shown in Fig. 2 (SI Appendix).

Note that the process of evolutionary learning cannot be thought of as adaptation through competition between groups of individuals with different, stably inherited phenotype probability distributions πi. In general, each lineage of individuals could have a different πi, and modifications of πi arise constantly, at every generation and in every individual. Hence the population acts as a whole, with a dynamically maintained distribution of πi values. It is the population, not particular individuals, that adapts to the varying environment through the evolutionary learning of the optimal phenotypic diversity.

Discussion

The learning mechanism that we propose here can potentially be realized by means of epigenetic inheritance. The essential feature of the learning mechanism is the progressive reinforcement of the parent phenotype in the distribution of offspring phenotypes. Such parent-dependent changes of phenotype distribution have been characterized, for example, in the Agouti viable yellow (Avy) mouse (18), a well-studied model organism for the so-called transgenerational epigenetic inheritance. The expression of the Avy allele results in a yellow coat color, as well as some other physiological conditions. However, isogenic mice carrying this allele (Avy/a) show variable coat colors ranging from full yellow to wild-type agouti. Importantly, the distribution of phenotypes among the offspring is influenced by the phenotype of the dam (but not the sire)—yellow dams produce offspring that are more likely to be yellow, whereas agouti dams produce a higher percentage of agouti offspring; moreover, the percentage of agouti offspring increases even further in the next generation if both the dam and the grand-dam are agouti (18). Similar transgenerational effects are observed in axin-fused (AxinFu) mice showing a distribution of kinky tail phenotypes, which in this case depends on the phenotype of both parents (19). The molecular basis of these transgenerational effects has been attributed to the epigenetic inheritance of the DNA methylation pattern at the intracisternal A particle (IAP) site upstream of the Avy (AxinFu) locus (18, 19), although the details of how the methylation pattern escapes epigenetic reprogramming are not fully understood (20).

Many other examples of such transgenerational effects are found in organisms across a wide range of taxa (21). There are well-established molecular processes that enable transgenerational epigenetic inheritance (15, 22), such as DNA methylation, histone modification, small RNA interference, and protein structural templating. Any of those epigenetic processes can in principle serve as the basis of the proposed learning mechanism. In addition, if the environmental statistics remains stable, the adaptation achieved through the learning mechanism may be further stabilized by other molecular processes acting on even longer timescales, including genetic adaptations such as contingency loci, copy number changes, and mutations (15, 16).

In conclusion, we have introduced a general evolutionary learning mechanism for adaptation to varying environments. It is based on a transgenerational feedback that can use well-established molecular processes of epigenetic inheritance. It is our hope that this theoretical work will stimulate experimental searches for such a learning mechanism in organisms exhibiting adaptive phenotypic diversification.

Materials and Methods

Here we derive the dynamic changes of the phenotype probability distribution πi under the learning rule (Eq. 1). In general, suppose there are multiple possible phenotypes (possibly more than two), labeled by ϕi (i = A, B, ⋯), and each phenotype is expressed with probability πi, satisfying ∑i πi = 1. We treat the simple case with extreme selection, i.e., when the fitness matrix wi(j) has only diagonal elements. The generalization to the case of nonextreme selection, i.e., when wi(j) ≠ 0 for i ≠ j, is presented in SI Appendix.

Long-Lasting Environment and Plastic Phenotype Distribution.

Under extreme selection, only individuals with a phenotype that matches the environment could survive and produce offspring. When the environment is εj and remains so, the phenotype probability distribution πi is updated every generation by

πi(t+1) = (1−η) πi(t) + η δij.    [5]

The explicit expression of πi(t) for all i can be found by using Eq. 5 recursively:

πi(t) = (1−η) πi(t−1) + η δij
      = (1−η)^2 πi(t−2) + (η + (1−η)η) δij
      ⋮
      = (1−η)^t πi(0) + ∑_{n=0}^{t−1} (1−η)^n η δij
      = (1−η)^t πi(0) + (1 − (1−η)^t) δij.    [6]

Therefore, the probability for the favorable phenotype ϕj is given by the component

πj(t) = 1 − (1−η)^t (1 − πj(0)),    [7]

which increases with time and approaches 1 when t ≫ τad ≈ 1/η.
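The update of Eq. 5 and its closed form, Eq. 7, can be checked with a few lines of code; the learning rate and initial probability below are illustrative choices (the η matches Fig. 2, but any values would do):

```python
# Sketch of Eq. 5 under extreme selection: repeated exposure to a single
# environment drives the favorable-phenotype probability toward 1.
eta = 0.1   # learning rate, so tau_ad ~ 1/eta = 10 generations (illustrative)
pi0 = 0.5   # initial probability of the favorable phenotype (illustrative)

# Iterate the update rule pi(t+1) = (1 - eta)*pi(t) + eta  (Eq. 5 with delta = 1)
pi = pi0
trajectory = [pi]
for t in range(1, 51):
    pi = (1 - eta) * pi + eta
    trajectory.append(pi)

# Compare against the closed form of Eq. 7: pi(t) = 1 - (1 - eta)**t * (1 - pi0)
for t, p in enumerate(trajectory):
    assert abs(p - (1 - (1 - eta) ** t * (1 - pi0))) < 1e-9
```

After roughly τad ≈ 10 generations the trajectory is already close to 1, consistent with the adaptation timescale quoted in the text.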

As the environment remains to be εj, the number of individuals in the population after t generations is given by

N(t) = (∏_{n=0}^{t−1} πj(n) wj(j)) N(0),    [8]

where N(0) is the initial population size. The growth curve of the logarithmic population size is given by

log[N(t)/N(0)] = ∑_{n=0}^{t−1} log(πj(n) wj(j)),    [9]

with an instantaneous growth rate Λ(j)(t) = log(πj(t) wj(j)). By the time t ≳ τad, the population will approach the maximum growth rate in this environment, Λ(j)(t) → Λmax(j) = log wj(j).

Frequently Changing Environment and Optimal Bet-Hedging Solution.

When the environment changes frequently and irregularly, the phenotype probability distribution πi converges to a steady distribution and fluctuates around it. Let ϕkt be the viable phenotype in the t-th generation. By averaging Δπi from Eq. 1 over the adaptation timescale τad and assuming Δπi is small so that πi ≈ constant, we find

〈Δπi〉 ≈ η (〈δi,kt〉 − πi) = η ((1/τad) ∑_{t=1}^{τad} δi,kt − πi) = η (τi/τad − πi) = η (pi − πi).    [10]

For the second equality we used the condition of extreme selection, which means kt must match the environment at all times; therefore the sum of δi,kt simply counts the number of generations during the period τad in which the environment is εi, denoted by τi. The ratio of τi and τad then gives the empirical frequency pi of the environment εi. Finally, setting 〈Δπi〉 = 0 yields the steady distribution, πi* = pi.
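A minimal numerical sketch of this averaging argument, assuming extreme selection in a two-state environment drawn independently each generation (the τenv ≃ 1 case); all parameter values are illustrative:

```python
# Illustrative check of Eq. 10: applying the learning update in a rapidly
# fluctuating two-state environment drives pi_A toward the empirical
# environment frequency p_A, i.e., the bet-hedging solution pi* = p.
import random

random.seed(0)
eta = 0.01          # slow learning, tau_ad ~ 100 >> tau_env = 1 (illustrative)
p_A = 0.7           # frequency of environment A (illustrative)
pi_A = 0.5          # initial probability of phenotype A

for t in range(20000):
    # Under extreme selection only the matching phenotype survives, so the
    # surviving parent's phenotype equals the current environment.
    env_is_A = random.random() < p_A
    pi_A = (1 - eta) * pi_A + eta * (1.0 if env_is_A else 0.0)

# pi_A now fluctuates around the steady distribution pi* = p_A
assert abs(pi_A - p_A) < 0.15
```

The residual fluctuation shrinks with η, consistent with the consistency check of Eq. 13 below.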

To see that this steady distribution is in fact the optimal bet-hedging solution, note that the asymptotic growth rate of a population using a constant phenotype distribution π = (πA, πB, ⋯) is given by (14)

Λ(π) = ∑_j pj log ∑_i πi wi(j) = ∑_j pj log(πj w(j)),    [11]

where the second equality holds under extreme selection, wi(j) = w(j) δij. This Λ is a strictly concave function of π, which guarantees a unique maximum. The maximum can be found by using gradient ascent,

Δπi ∝ ∑_j gij (∂Λ/∂πj) = ∑_j (πi δij − πi πj)(pj/πj) = pi − πi,    [12]

where gij = πi δij − πi πj is the Fisher information metric on the space of probability distributions, {π | ∑_i πi = 1}. Therefore, setting the gradient to 0, we find that the optimal bet-hedging solution is indeed given by πi* = pi. Comparing Eqs. 3 and 12 shows that the learning mechanism, averaged over time, follows the direction of steepest ascent.
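The claim that Λ(π) is maximized at πi = pi can also be verified by brute force; the sketch below assumes extreme selection with w(j) = 1 (a constant fitness only shifts Λ by an additive constant and does not move the maximum) and uses illustrative environment frequencies:

```python
# Numerical check that the log growth rate of Eq. 11 under extreme
# selection is maximized at pi = p (the proportional-betting solution).
from math import log

p = (0.7, 0.3)                      # environment frequencies (illustrative)

def growth_rate(pi_A):
    # Lambda(pi) = sum_j p_j log(pi_j w^(j)), taking w^(j) = 1 for simplicity
    return p[0] * log(pi_A) + p[1] * log(1 - pi_A)

# Scan a fine grid of pi_A values and locate the maximizer
best = max((growth_rate(x / 1000), x / 1000) for x in range(1, 1000))
assert abs(best[1] - p[0]) < 1e-3   # maximum attained at pi_A = p_A = 0.7
```

Strict concavity of Λ guarantees this grid maximum is the unique global one.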

For consistency, we have to check that the fluctuation around the steady distribution is small. During the period of an environment that lasts for a typical time τenv ≪ τad, the relative change in the probability πi can be estimated as

|Δπi/πi| ≃ 1 − (1−η)^{τenv} ≈ η τenv ≈ τenv/τad ≪ 1.    [13]

Because the environmental changes are random, the Δπi from different time periods tend to average out. This supports our approximation that πi ≈ constant over a period τad.

Learning as Transgenerational Phenotypic Memory.

Here we derive how an individual’s phenotype probability πi(t) depends on the phenotypes of its ancestors. Start from an ancient ancestor with a phenotype probability πi(0) and follow the lineage onward with one ancestor in every generation t, whose actual phenotype is labeled by ϕit. We find, recursively,

πi(1) = (1−η) πi(0) + η δi,i0,    [14]
πi(2) = (1−η) πi(1) + η δi,i1 = (1−η)^2 πi(0) + (1−η)η δi,i0 + η δi,i1,    [15]
⋮
πi(t) = (1−η)^t πi(0) + (1−η)^{t−1} η δi,i0 + ⋯ + (1−η)η δi,i(t−2) + η δi,i(t−1).    [16]

For large t, the initial probability πi(0) becomes irrelevant, and hence we can write the above expression for πi(t) as in Eq. 4.
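Eq. 16 says that πi(t) is an exponentially weighted memory of ancestral phenotypes, with the weight of an ancestor n generations back decaying as (1−η)^n. A short consistency check, using an illustrative lineage of ancestor phenotype indicators (1 meaning δi,it = 1):

```python
# Verify that the recursion (Eqs. 14-15) and the expanded sum (Eq. 16)
# agree: the phenotype probability is an exponentially discounted record
# of the lineage's past phenotypes.
eta = 0.1                                    # learning rate (illustrative)
ancestors = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]   # delta_{i,i_t} along the lineage

# Recursive update along the lineage
pi = 0.5                                     # pi_i(0), illustrative
for a in ancestors:
    pi = (1 - eta) * pi + eta * a

# Direct evaluation of the expanded sum in Eq. 16
t = len(ancestors)
direct = (1 - eta) ** t * 0.5
for n, a in enumerate(ancestors):
    direct += (1 - eta) ** (t - 1 - n) * eta * a

assert abs(pi - direct) < 1e-12
```

Recent ancestors dominate: the most recent phenotype carries weight η, and contributions from generations further back fade geometrically, which is the "transgenerational phenotypic memory" of the subsection title.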

Simulation of the Learning Mechanism.

The adaptation of the phenotype distribution through the learning mechanism is simulated as follows: A population of size N is created. Each individual carries a probability distribution π=(πA,πB,⋯). At each time step, an environment εjt is chosen. Each individual randomly generates a phenotype according to its own π. The fitness of each individual is then determined by its phenotype and the current environment. At the end of each generation, a new sample of N individuals is drawn independently from all current individuals with a probability proportional to their fitness values. This step represents a process of replicating each current individual by a number proportional to its fitness value and then normalizing the sample size back to N. Each new individual inherits an updated probability distribution π+Δπ from its parent, according to the learning rule (Eq. 1). The process is repeated for every new generation.

Although the population size is constantly normalized, the instantaneous growth rate of the population can be estimated by

Λ(t) = log ∑_i wi(jt) π̄i(t),    [17]

where the average π̄i is taken over all individuals in the current generation. Then, the logarithmic growth of the population size is estimated by

log(Nt/N0) = ∑_{s=1}^{t} Λ(s).    [18]

Those estimates are valid in the limit of a large sample size N.
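A minimal, self-contained sketch of this simulation procedure, using the two-environment setup of Fig. 2 but with a smaller population; the parameter values and the fixed 40-generation switching pattern are illustrative:

```python
# Sketch of the simulation: a fixed-size population of individuals carrying
# (pi_A, pi_B), with fitness-proportional resampling each generation and the
# learning rule applied along each parent-offspring link.
import random
from math import log

random.seed(1)
N, eta = 1000, 0.1
w = [[2.0, 0.2], [0.2, 2.0]]   # w_i^(j): fitness of phenotype i in environment j
pis = [0.5] * N                # each individual's probability pi_A
log_growth = 0.0

for t in range(200):
    env = (t // 40) % 2        # environment switches every 40 generations
    # Each individual randomly generates a phenotype (0 = phi_A, 1 = phi_B)
    phenos = [0 if random.random() < p else 1 for p in pis]
    fits = [w[ph][env] for ph in phenos]

    # Eq. 17: instantaneous growth rate from the population-average pi
    mean_piA = sum(pis) / N
    log_growth += log(mean_piA * w[0][env] + (1 - mean_piA) * w[1][env])

    # Fitness-proportional resampling back to size N
    parents = random.choices(range(N), weights=fits, k=N)

    # Learning rule (Eq. 1): each offspring reinforces its parent's phenotype
    pis = [(1 - eta) * pis[k] + (eta if phenos[k] == 0 else 0.0)
           for k in parents]

assert all(0.0 <= p <= 1.0 for p in pis)
```

With the long environmental epochs chosen here (τenv = 40 ≫ τad ≈ 10), the population-average πA tracks the current environment, reproducing the transgenerationally plastic regime of Fig. 2 C and F.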

Acknowledgments

We thank David Jordan, Edo Kussell, Harmit Malik, Eric Miska, Luca Peliti, Oliver Rando, Olivier Rivoire, and Alexander Tarakhovsky for numerous discussions and encouragements. This research has been partly supported by grants from the Simons Foundation (to S.L.) through Rockefeller University (Grant 345430) and the Institute for Advanced Study (Grant 345801). B.X. is funded by the Eric and Wendy Schmidt Membership in Biology at the Institute for Advanced Study.

Footnotes

  • 1. To whom correspondence may be addressed. Email: bkxue@ias.edu or livingmatter@rockefeller.edu.
  • Author contributions: B.X. and S.L. designed research; B.X. performed research; and B.X. and S.L. wrote the paper.

  • Reviewers: T.H., University of California, San Diego; and O.J.R., University of Massachusetts Medical Center, Worcester.

  • The authors declare no conflict of interest.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1608756113/-/DCSupplemental.

References

  1. Lambert G, Kussell E (2014) Memory and fitness optimization of bacteria under fluctuating environments. PLoS Genet 10(9):e1004556.
  2. Amasino R (2010) Seasonal and developmental timing of flowering. Plant J 61(6):1001–1013.
  3. Slatkin M (1974) Hedging one’s evolutionary bets. Nature 250(5469):704–705.
  4. Philippi T, Seger J (1989) Hedging one’s evolutionary bets, revisited. Trends Ecol Evol 4(2):41–44.
  5. Simons AM (2011) Modes of response to environmental change and the elusive empirical evidence for bet hedging. Proc R Soc B 278(1712):1601–1609.
  6. Grimbergen AJ, Siebring J, Solopova A, Kuipers OP (2015) Microbial bet-hedging: The power of being different. Curr Opin Microbiol 25:67–72.
  7. Balaban NQ, Merrin J, Chait R, Kowalik L, Leibler S (2004) Bacterial persistence as a phenotypic switch. Science 305(5690):1622–1625.
  8. Cohen D (1966) Optimizing reproduction in a randomly varying environment. J Theor Biol 12(1):119–129.
  9. Simons AM (2009) Fluctuating natural selection accounts for the evolution of diversification bet hedging. Proc R Soc B 276(1664):1987–1992.
  10. Gremer JR, Venable DL (2014) Bet hedging in desert winter annual plants: Optimal germination strategies in a variable environment. Ecol Lett 17(3):380–387.
  11. Rajon E, Desouhant E, Chevalier M, Débias F, Menu F (2014) The evolution of bet hedging in response to local ecological conditions. Am Nat 184(1):E1–E15.
  12. Kussell E, Leibler S (2005) Phenotypic diversity, population growth, and information in fluctuating environments. Science 309(5743):2075–2078.
  13. Melbinger A, Vergassola M (2015) The impact of environmental fluctuations on evolutionary fitness functions. Sci Rep 5:15211.
  14. Rivoire O, Leibler S (2011) The value of information for populations in varying environments. J Stat Phys 142(6):1124–1166.
  15. Rando OJ, Verstrepen KJ (2007) Timescales of genetic and epigenetic inheritance. Cell 128(4):655–668.
  16. Yona AH, Frumkin I, Pilpel Y (2015) A relay race on the evolutionary adaptation spectrum. Cell 163(3):549–559.
  17. Hertz J, Palmer RG, Krogh AS (1991) Introduction to the Theory of Neural Computation (Westview, a Member of the Perseus Books Group, Cambridge, MA), 1st Ed.
  18. Morgan HD, Sutherland HG, Martin DI, Whitelaw E (1999) Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 23(3):314–318.
  19. Rakyan VK, et al. (2003) Transgenerational inheritance of epigenetic states at the murine Axin(Fu) allele occurs after maternal and paternal transmission. Proc Natl Acad Sci USA 100(5):2538–2543.
  20. Blewitt ME, Vickaryous NK, Paldi A, Koseki H, Whitelaw E (2006) Dynamic reprogramming of DNA methylation at an epigenetically sensitive allele in mice. PLoS Genet 2(4):e49.
  21. Jablonka E, Raz G (2009) Transgenerational epigenetic inheritance: Prevalence, mechanisms, and implications for the study of heredity and evolution. Q Rev Biol 84(2):131–176.
  22. Heard E, Martienssen RA (2014) Transgenerational epigenetic inheritance: Myths and mechanisms. Cell 157(1):95–109.