# The value of monitoring to control evolving populations

Edited by David R. Nelson, Harvard University, Cambridge, MA, and approved December 2, 2014 (received for review June 26, 2014)

## Significance

Evolution of drug resistance, as observed in bacteria, viruses, parasites, and cancer, is a key challenge for global health. We approach the problem using the mathematical concepts of stochastic optimal control to study what is needed to control evolving populations. We focus on the detrimental effect of imperfect information and the loss of control it entails, thus quantifying the intuition that to control, one must monitor. We apply these concepts to cancer therapy to derive protocols where decisions are based on monitoring the response of the tumor, which can outperform established therapy paradigms.

## Abstract

Populations can evolve to adapt to external changes. The capacity to evolve and adapt makes successful treatment of infectious diseases and cancer difficult. Indeed, therapy resistance has become a key challenge for global health. Therefore, ideas of how to control evolving populations to overcome this threat are valuable. Here we use the mathematical concepts of stochastic optimal control to study what is needed to control evolving populations. Following established routes to calculate control strategies, we first study how a polymorphism can be maintained in a finite population by adaptively tuning selection. We then introduce a minimal model of drug resistance in a stochastically evolving cancer cell population and compute adaptive therapies. When decisions are based in this manner on monitoring the response of the tumor, the resulting protocols can outperform established therapy paradigms. For both case studies, we demonstrate the importance of high-resolution monitoring of the target population to achieve a given control objective, thus quantifying the intuition that to control, one must monitor.

The progression of cancer is an evolutionary process of cells driven by genetic alterations and selective forces (1). The frequent failure of cancer therapies, despite a host of new targeted cancer drugs, is largely caused by the emergence of drug resistance (2). Cancer therapy faces a real dilemma: the more effective a new treatment is at killing cancerous cells, the more selective pressure it provides for those cells resistant to the drug to take over the cancer population in a process called competitive release (3, 4).

A genetic innovation conferring resistance can either be already present as standing variation or in close evolutionary reach, via de novo mutations. The probability of these events is often proportional to the genetic diversity of the tumor. Therefore, resistance is a problem especially for genetically heterogeneous cancers (5). This diversity can be the result of a variable microenvironment, with different pockets of acidity, blood supply, and geometrical constraints of surrounding tissue (2). Also, late-stage cancers not only carry the cumulative archaeological record of their evolutionary history (6) but can also become genetically unstable and fall victim to chromothripsis (7), kataegis (8), and other disruptive mutational processes (9, 10). Thus, the probability of treatment success is higher in genetically homogeneous and/or early-stage cancers (11). Taken together, these considerations place emphasis on early detection of tumors.

In cases where early detection is not achieved, the pertinent question is how to avoid treatment failure in the presence of genetic heterogeneity, which seems to be the norm for most solid cancers. One obvious attempt is to make treatments more complex and thus put the resistance mechanisms out of reach of the tumor. In combination therapy, the tumor is simultaneously treated with two or more drugs that would require different, possibly mutually exclusive, escape mechanisms for cells to become resistant. This approach has proven to be successful in the treatment of HIV and is discussed as a possible model also for cancer (12). In the context of cancer, this form of personalized therapy is not yet widely realized, mainly because of the much richer repertoire of genetic variation and adaptability of cancer cells and a comparable shortage of drugs targeting distinct biological pathways. For a recent study of the conditions under which combination therapy is expected to be successful in cancer, see ref. 13.

For application of single drugs, there are a number of studies that concentrate on how the therapeutic protocol itself can be optimized. It was realized that all-out maximum tolerated dose chemotherapy is not the only, or necessarily the best, treatment strategy (14). Alternative dosing schedules have been proposed such as drug holidays, metronomic therapy (15), and adaptive therapy (16). The realization of Gatenby et al. in ref. 16 is that cancer, as a dynamic evolutionary process, can be better controlled by dynamically changing the therapy, depending on the response of the tumor. Their protocol of reducing the dose while the tumor shrinks and increasing it under tumor growth showed a drastic improvement of life expectancy in mouse models of ovarian cancer (16). Furthermore, Gatenby et al. made the important conceptual step of reformulating cancer therapy to be not necessarily about tumor eradication. Instead, dynamic maintenance of a stable tumor size can also be a preferable outcome.

Motivated by this experiment, we conjecture that there are substantial therapy gains in optimal applications of existing drugs, as of yet underexploited. As a first step toward using this potential we would like to formalize the intuition of Gatenby et al. To this end, we aim to establish a theoretical framework for the adaptive control of evolving populations. In particular, we connect the idea of adaptive therapy to the paradigm of stochastic optimal control, also known as Markov decision problems. For other applications of stochastic control in the context of evolution by natural selection see refs. 17, 18. A stochastic treatment is necessary due to the nature of evolutionary dynamics where fluctuations (so-called genetic drift) matter even in large populations. For instance, the dynamics of a new beneficial mutation is initially dominated by genetic drift before it becomes established (19). Stochastic control is a well-established field of research which provides not only a natural language for framing the task of cancer therapy, but also a set of general purpose techniques to compute an optimal control or therapy regimen for a given dynamical system and a given control objective. Although we demonstrate the main steps in this program, we focus on the detrimental effect of imperfect information and the loss of control it entails, thus quantifying the intuition that to control, one must monitor. The informational value of continued monitoring is a natural concept for controlled stochastic systems, whereas in deterministic models successful control usually does not rely on sustained observations.

We first introduce the concepts of stochastic optimal control using a minimal evolutionary example: how to keep a finite population polymorphic under Wright–Fisher evolution by influencing the selective difference between two alleles. If perfect information about the population is available, the polymorphism can be maintained for a very long time. We will show how imperfect information due to finite monitoring can lead to a quick loss of control and how some of it can be partially reclaimed by informed preemptive control strategies. We then move to our main problem and introduce a minimal stochastic model of drug resistance in cancer that incorporates features such as variable population size, drug-sensitive and -resistant cells, a carrying capacity, mutation, selection, and genetic drift. After computing the optimal control strategies for a few important settings under perfect information, we demonstrate the effect of imperfect monitoring. If only the total tumor size can be monitored, we show how a control strategy emerges that can adaptively infer, and thus exploit, the inner tumor composition of susceptible and resistant cells.

## Controlling Evolving Populations

One can think about cancer therapy as the attempt to control an evolving population by means of drug treatment. Typically, the drug changes some of the parameters of the evolutionary process, such as the death rate of drug-sensitive cells. With application of the drug, one can thus actively influence the dynamics of the stochastic process and change its direction. All this happens with a concrete aim, such as to minimize the total tumor burden in the long term. To introduce some of the concepts of stochastic optimal control, we use an example with a nontrivial control task.

Imagine a biallelic and initially polymorphic population of constant size *N* under the Wright–Fisher model of evolution (20), i.e., binomial resampling of the population in each generation. The *A* allele confers a selective advantage of size *s* over the *B* allele and will, without intervention, eventually take over the entire population (*f* denotes reproductive rates; see Fig. 1*A*). Assuming mutation to be negligible, the task at hand is to avoid, or at least delay, such a loss in diversity. Now assume that we can change the selection coefficient externally by a quantity *u*. How should *u* be chosen, given the current *A*-allele frequency *x*, to keep the population polymorphic for *T* generations (Eq. **1**)? Equivalently, a cost is incurred if, before time *T*, the polymorphism is lost. The optimal control can thus also be seen to minimize this cost of extinction. The standard technique to solve problems of this kind is to use a dynamic programming ansatz. Assuming that a partial problem starting from some intermediate point (*x*, *t*) has already been solved, one defines the *cost-to-go* *J*: the cost-to-go at time *t* follows from the cost-to-go one generation later by a single optimization over the current control, with the transition matrix *W* depending on the control decision (Eq. **2**). Eq. **2** can also include a term for the cost of being in state *x* and the control cost to apply *u*. Here, both are assumed zero and cost is paid only at the boundaries. For Wright–Fisher evolution, the transition matrix *W* can be expressed as the probability under binomial sampling to draw a given number of copies of the *A* allele. The crucial computational advantage of this relation is that the hard optimization in the space of all control protocols is replaced by a sequence of local optimizations. An important feature of Eq. **2** is that the decision for a control now relies on future controls being carried out optimally. In practice, the local optimizations are carried out backward in time, one time step *t* at a time. In many applications (21), it is also useful to consider a receding time horizon, such that the control depends on the state but not explicitly on time. The solution of Eq. **1** is not guaranteed to be unique. However, even degenerate optimal controls would achieve the same optimal value of the control objective (22).
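The backward recursion described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a cost-to-go recursion of the form "minimum over controls of the expected cost-to-go one generation later", a terminal cost of 1 at the two absorbing boundaries, a haploid selection step `p = x(1+s)/(1+sx)`, and three admissible control values. The function names and all parameter values are hypothetical.

```python
import math
import numpy as np

def wf_transition(N, s_eff):
    """W[i, j] = P(j copies of A next generation | i copies now).

    Assumed model: selection shifts the sampling probability to
    p = x(1+s)/(1+s*x), followed by binomial resampling of N individuals.
    """
    W = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        x = i / N
        p = x * (1 + s_eff) / (1 + s_eff * x)
        for j in range(N + 1):
            W[i, j] = math.comb(N, j) * p**j * (1 - p) ** (N - j)
    return W

def solve_control(N=50, s=0.05, u_max=0.2, T=200):
    """Backward dynamic programming for the cost of losing the polymorphism."""
    controls = np.array([-u_max, 0.0, u_max])      # admissible controls (assumption)
    Ws = [wf_transition(N, s - u) for u in controls]
    J = np.zeros(N + 1)
    J[0] = J[N] = 1.0                              # cost is paid only at the boundaries
    policy = np.zeros((T, N + 1))
    for t in range(T - 1, -1, -1):                 # step backward in time
        Q = np.stack([W @ J for W in Ws])          # expected cost-to-go per control
        policy[t] = controls[Q.argmin(axis=0)]     # local optimization at each state
        J = Q.min(axis=0)
        J[0] = J[N] = 1.0                          # boundaries stay absorbing
    return J, policy

if __name__ == "__main__":
    J, policy = solve_control()
    print(policy[0])   # control as a function of the number of A copies
```

With these assumptions, `J` is the probability of losing the polymorphism within the horizon, so minimizing it is the same as maximizing the chance of survival. The entries of `policy[0]` at the two boundary states are arbitrary, because every control gives the same cost there.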

Throughout this study, we apply the diffusion approximation, in which the continuous limit of Eq. **2**, the so-called Hamilton–Jacobi–Bellman equation (21, 23), is valid (Eq. **3**). […] Eq. **2** (see also Fig. S1). For a treatment of finite mutation rate, see *SI Text* and Fig. S2.
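For orientation, one standard way to write the recursion and its continuous limit is sketched here. The notation is an assumption rather than a quotation of Eqs. **2** and **3**: $u_{\max}$ denotes the largest admissible control strength, time is measured in generations, and the drift and diffusion terms are those of the textbook haploid Wright–Fisher diffusion with effective selection $s-u$.

```latex
% Discrete cost-to-go recursion (assumed form, zero running cost)
J(x,t) \;=\; \min_{u}\; \sum_{x'} W(x' \mid x, u)\, J(x', t+1)

% Diffusion limit: Hamilton--Jacobi--Bellman equation (assumed form)
-\,\partial_t J(x,t) \;=\; \min_{|u| \le u_{\max}}
  \left[ (s-u)\, x(1-x)\, \partial_x J \;+\; \frac{x(1-x)}{2N}\, \partial_x^2 J \right]

% u enters linearly, so the minimum is attained at an extreme value
u^{*}(x,t) \;=\; u_{\max}\, \operatorname{sign}\!\big(\partial_x J(x,t)\big)
```

In this form the control multiplies $\partial_x J$ only, which is why the minimization selects one of the two extreme control strengths wherever $\partial_x J \neq 0$.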

### Optimal Control of a Wright–Fisher Population with Perfect Monitoring.

The optimal control function […] *T*. In the infinite-horizon limit, […] *x*. In Eq. **3**, the control variable *u* appears only linearly. This means that only the two extreme control strengths are ever used to steer the system. This particular type of control is called bang-bang. It follows that the control profile is a piecewise constant function of *x* that switches between the two extremes at a threshold frequency (see *SI Text* and Fig. S1). At the correct threshold and with strong selective forces ([…]

### Loss of Control due to Imperfect Monitoring.

The main assumption made so far was that perfect information is available about the state of the system in the form of continuous (in time), synchronous (without delay), and exact (without error) measurements of *x*. These requirements are impossible to achieve in practice, where monitoring is always imperfect. As we will see, when the assumption of perfect information is relaxed, not only is control over the system lost, but the control profile […]

Consider the situation where measurements of the frequency *x* are given only at discrete times […] *x* under this regime (see the decrease in survival time in Fig. 1*B*). For example, for […] *x* crosses […]
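The effect of sparse measurements can be explored with a small forward simulation. The sketch below rests on stated assumptions and does not reproduce Fig. 1: the control is a threshold rule at frequency 0.5, it is updated only when a measurement arrives (every `tau` generations) and held fixed in between, and the selection step `p = x(1+s)/(1+sx)` as well as all parameter values are hypothetical choices.

```python
import numpy as np

def mean_survival(tau, N=50, s=0.05, u_max=0.3, threshold=0.5,
                  runs=200, max_gen=1000, seed=0):
    """Mean number of generations the polymorphism survives (capped at max_gen)."""
    rng = np.random.default_rng(seed)
    n = np.full(runs, N // 2)               # copies of allele A in each replicate
    u = np.zeros(runs)                      # control currently applied
    alive = np.ones(runs, dtype=bool)       # replicates that are still polymorphic
    survival = np.full(runs, max_gen)
    for t in range(max_gen):
        if t % tau == 0:                    # a measurement arrives: update the control
            u = np.where(n / N > threshold, u_max, -u_max)
        x = n / N
        s_eff = s - u                       # externally shifted selection coefficient
        p = x * (1 + s_eff) / (1 + s_eff * x)
        n = np.where(alive, rng.binomial(N, p), n)   # Wright-Fisher resampling
        lost = alive & ((n == 0) | (n == N))
        survival[lost] = t + 1
        alive &= ~lost
        if not alive.any():
            break
    return survival.mean()

if __name__ == "__main__":
    for tau in (1, 5, 20):
        print(tau, mean_survival(tau))
```

Setting `tau=1` corresponds to a measurement in every generation. Larger values of `tau` leave the population under a stale control decision for longer, which is the mechanism by which control is lost in this toy setting.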

### Playing-To-Win vs. Playing-Not-To-Lose.

Without a continuous flow of observations as input, a preemptive control protocol […] *B* and *C*); second, the safe parking position moves away from the boundary toward […]

A similar loss of control can be expected for other types of monitoring imperfections and is a general feature of stochastic optimal control. It is important to note that the perfect-information control problem, and its solution […] *N* generations). In most cases, as we will see in the adaptive cancer therapy model below, finding […]

## Application to Adaptive Cancer Therapy

Here we first introduce a minimal stochastic model of drug resistance in cancer. For different qualitative regimes, we then find the optimal adaptive therapy with perfect information. Finally, we extend these ideas to the case where only the total cell population size can be observed but no readout of the fractions of susceptible and resistant cells is available. In the context of models of the cell cycle, deterministic control theory has previously been applied to find optimal cancer treatment protocols (e.g., refs. 26–28). However, for the key concepts of this study, adaptive therapy and finite monitoring, stochastic control theory is needed and in fact leads to control protocols that exploit fluctuations.

### A Minimal Model of Drug Resistance in Cancer.

The desired features of a minimal model of drug resistance in cancer include: (*i*) a variable tumor cell population size *N*, (*ii*) at least two cell types, drug-sensitive and drug-resistant, (*iii*) a carrying capacity *K* that describes a (temporary) state of tumor homeostasis, and (*iv*) the possibility for mutation and selection between cell types. Control over the tumor can be applied via a drug that changes the evolutionary dynamics by increasing, for example, the death rate of sensitive cells. We will assume here, as others have done in the context of cancer (11), a well-mixed cell population where the birth (or rather duplication) rate of cells is regulated by the carrying capacity. The dynamics of the model we have chosen here is encapsulated in the following birth and death rates for sensitive and resistant cells: […] *i*. For […] *BRAF* mutation treated with vemurafenib (29). Altogether, sensitive and resistant cells initially grow exponentially until the total population size […]

For the stochastic version of this process we can assume independent and individual birth and death events with the above probabilities per unit time. In analogy to the Wright–Fisher binomial update rule, here we can use a Poisson-like update. […] (see *SI Text*). This scaling exercise is mainly important because it allows one to relate systems with small *K* (hundreds to thousands, as required for numerical analysis) to systems with large *K* ([…]
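A Poisson-like update of this kind can be sketched as follows. The specific rates are illustrative assumptions, since the rate equations are not reproduced here: births are taken to be logistic in the total size, the drug dose `u` adds to the death rate of sensitive cells only, and mutation converts a fraction `mu` of births to the other type. The function name `step` and every default parameter value are hypothetical.

```python
import numpy as np

def step(n_s, n_r, u, rng, K=1000.0, r=1.0, d=0.1, kill=0.5, mu=1e-3, dt=0.1):
    """Advance sensitive (n_s) and resistant (n_r) cell counts by one time step dt."""
    total = n_s + n_r
    growth = r * max(0.0, 1.0 - total / K)            # birth rate regulated by K
    birth_s = growth * ((1 - mu) * n_s + mu * n_r)    # births ending up sensitive
    birth_r = growth * ((1 - mu) * n_r + mu * n_s)    # births ending up resistant
    death_s = (d + kill * u) * n_s                    # drug acts on sensitive cells only
    death_r = d * n_r
    # Independent Poisson numbers of birth and death events in the interval dt
    n_s = max(0, n_s + rng.poisson(birth_s * dt) - rng.poisson(death_s * dt))
    n_r = max(0, n_r + rng.poisson(birth_r * dt) - rng.poisson(death_r * dt))
    return n_s, n_r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s, n_r = 10, 0
    for t in range(4000):
        dose = 0.0 if t < 2000 else 1.0               # untreated growth, then full dose
        n_s, n_r = step(n_s, n_r, dose, rng)
    print(n_s, n_r)
```

Under these assumed rates the untreated population settles near `K * (1 - d / r)`, and switching the drug on removes the competition that sensitive cells exert on resistant ones, a toy version of the competitive release discussed in the introduction.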

### Optimal Cancer Therapy with Perfect Monitoring.

With the minimal model of drug resistance in cancer introduced above, we can start the program of stochastic optimal control to compute adaptive therapy protocols. The first task is to define the goal of such a program: what is the quantity one aims to optimize? One very important objective is to maximize the (expected) time until the cancer proceeds to the next, possibly lethal stage. This could mean the emergence of a new cell type with a much higher carrying capacity, e.g., with metastatic potential. We will refer to this critical event simply as a “driver” event or “metastasis.” The rate of metastasis emergence is a combination of tumor size and the rate […]

Earlier, the optimal control for the Wright–Fisher evolution example turned out to be a piecewise constant function of allele frequency. Here, we need to find a control profile […] Eq. **1**, the control objective can be expressed as […] *T* generations (for […] *T* and also that control itself is cost-free. In cancer therapy, especially chemotherapy, this is certainly not the case: the side effects of treatment incur a considerable cost in terms of life quality and medical care. The difficulty, however, lies in quantifying these control costs in a manner that would make them comparable to the potential costs considered here. A simple extension of the above control objective to include such a control cost is demonstrated in *SI Text* and Fig. S4. The recurrence equation for the cost-to-go […] *W* is the product of the two Skellam distributions resulting from Eq. **7** (including boundary conditions). With this equation, we can solve the dynamic programming task numerically for moderate values of *K* (see Fig. S5). For the numerical analysis, we have to introduce an upper bound […]
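As a reminder of why Skellam distributions appear here: the net change of a cell count over one time step is the difference of two independent Poisson variables, one for births and one for deaths. The symbols $\mu_b$ and $\mu_d$ below are assumed notation for the mean numbers of birth and death events in a step; the formula itself is the standard Skellam probability mass function.

```latex
% Net change k = (births) - (deaths), with births ~ Poisson(mu_b), deaths ~ Poisson(mu_d)
P(\Delta n = k) \;=\; e^{-(\mu_b + \mu_d)}
  \left( \frac{\mu_b}{\mu_d} \right)^{k/2}
  I_{|k|}\!\left( 2\sqrt{\mu_b\, \mu_d} \right),
\qquad k \in \mathbb{Z}

% Mean and variance of the net change
\mathbb{E}[\Delta n] = \mu_b - \mu_d,
\qquad
\operatorname{Var}[\Delta n] = \mu_b + \mu_d
```

Here $I_{|k|}$ is the modified Bessel function of the first kind. If the two cell types are updated independently within a step, the joint transition probability factorizes into one such term per cell type.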

In the case of […] (Fig. 2*A*). However, this parameter regime of very high selection against resistance and/or very low rate of driver mutation, and therefore this therapy option, is not realistic for cancer. For higher values of […] (Fig. 2*B*). This procedure can lead to cycles of tumor size reduction followed by regrowth, with the overall effect of extending the time until metastasis.

If […] (Fig. 2*C*). For a lower mutation rate, the optimal protocol first tries to amplify one cell type before switching to an environment that is deadly for most cells (Fig. 2*D*).

The effectiveness of different therapy protocols is compared in Fig. 3 with 1,000 stochastic forward simulations (with […] *D*. Whereas no therapy […]

All these control strategies require perfect information, not only in the sense of the earlier Wright–Fisher example (continuous, synchronous, and exact), but also in terms of the inner tumor composition.

### Loss of Therapy Efficacy due to Low-Resolution Monitoring.

There are very few cases where the genetic basis for a drug-resistance mechanism is known and can be specifically monitored (29, 32). In most cases the regrowth of the tumor under the drug is observed without understanding the exact biological processes responsible for the resistance. Here we aim to find rational control strategies when only the total tumor cell population size can be monitored. The adaptive therapy protocol that was applied by Gatenby et al. in ref. 16 (coupling the drug concentration to the tumor size) is one example of such a strategy.

Consider the situation where only the total population size […] can be observed, whereas the dynamics of Eqs. **5**–**7** and all parameter values are known. Under these circumstances, the perfect-information optimal control profiles from the last section cannot be used directly. However, there is still valuable information available. The response […] (see *SI Text*) and then repeat the cost-to-go calculation of Eqs. **9** and **10**. This propagator takes into account not only the current size […] *W* used in Eq. **10** by integrating over the internal degrees of freedom […] *C*, the new control profile is shown in Fig. S6. The drug regimen ([…]
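The idea that the response of the total size to the drug carries information about the hidden composition can be illustrated with a simple Bayesian update. This is a deliberately simplified stand-in, not the propagator construction referred to above: it assumes hypothetical logistic birth and drug-dependent death rates, treats the resistant fraction `rho` as fixed over one short step, and approximates the distribution of the observed size change by a Gaussian with the Skellam mean and variance.

```python
import numpy as np

def update_belief(belief, rho_grid, N, dN_obs, u,
                  K=1000.0, r=1.0, d=0.1, kill=0.5, dt=0.1):
    """Posterior over the resistant fraction after observing a size change dN_obs.

    belief   : prior probabilities on rho_grid (sums to 1)
    rho_grid : hypothesized resistant fractions in [0, 1]
    N, u     : total population size and drug dose during the step
    """
    growth = r * max(0.0, 1.0 - N / K)             # per-cell birth rate (assumed form)
    death = d + kill * u * (1.0 - rho_grid)        # mean per-cell death rate per hypothesis
    mean = (growth - death) * N * dt               # expected size change
    var = (growth + death) * N * dt                # Skellam variance: births plus deaths
    log_like = -0.5 * (dN_obs - mean) ** 2 / var - 0.5 * np.log(var)
    post = belief * np.exp(log_like - log_like.max())
    return post / post.sum()

if __name__ == "__main__":
    rho_grid = np.linspace(0.0, 1.0, 101)
    belief = np.full(rho_grid.size, 1.0 / rho_grid.size)   # flat prior
    for _ in range(20):                                    # repeated observations under drug
        belief = update_belief(belief, rho_grid, N=500, dN_obs=15.0, u=1.0)
    print(rho_grid[belief.argmax()])
```

A tumor that keeps growing under the drug shifts the belief toward a large resistant fraction, while a shrinking tumor shifts it toward sensitivity. A control strategy that conditions on such a belief, rather than on the unobserved composition, is one way to act on size-only monitoring.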

## Discussion

We used stochastic control theory to quantify optimal control strategies for models of evolving populations. We hope our results will lead to interesting new designs of microbial and cancer cell evolution experiments where feedback plays a central role in achieving a given control task. We further demonstrated how control can be maintained with finite resources, when the monitoring necessary for adaptive control is imperfect. These strategies all depend on our ability to anticipate evolution, i.e., on a knowledge of the relevant equations of motion and their parameter values. For cancer, such detailed knowledge of evolutionary dynamics is certainly not yet available. Sequencing technologies are facing up to the challenge of tumor control with finite information, already accelerating progress in the monitoring of serial biopsies of tumors, circulating tumor cells, or cell-free tumor DNA in the bloodstream (33, 34). Once such time-resolved data become prevalent, we can start to learn and improve dynamical tumor models and compute their optimal control strategies. For instance, genetic heterogeneity within the tumor is now becoming quantifiable from sequencing data via computational inference (35–37). Heterogeneity and subclonal dynamics have been found to have an impact on treatment strategy selection (38). Furthermore, all other available sources of clinical data, such as medical imaging, can provide additional information on cellular phenotypes and should be integrated into personalized and data-driven tumor control (see, e.g., ref. 39 for imaging data-based computational modeling of pancreatic cancer growth dynamics to guide treatment choice and ref. 40 for integrative analysis of imaging and genetic data).

Beyond cancer, the need to control evolving populations is a key global health challenge as resistant strains of bacteria, viruses, and parasites are spreading (41, 42). Any long-term success in controlling evolution depends, at the very least, on mastering the following components. First, it depends on a quantitative understanding of the underlying evolutionary dynamics. Progress in this understanding is best demonstrated by predicting evolution; this has so far proven difficult, even in the short term. Nevertheless, new population genetic approaches applied to data are promising: see influenza strain prediction in ref. 43. Second, the success of control will depend on the availability of a sufficient arsenal of non–cross-resistant therapeutic agents. These therapeutics should be combined with the ability to decide on an appropriate drug regimen given the genetic and phenotypic structure of the population. Large-scale drug vs. cell line screens are systematically pushing this component forward (see, e.g., ref. 44). Finally, long-term success depends on the ability to monitor the evolution of a target population and act rationally based on this information, the topic of this paper.

## Acknowledgments

We thank C. Illingworth for discussions and J. Berg, C. Callan, M. Gerlinger, C. Greenman, M. Hochberg, P. Van Loo, and two anonymous reviewers for comments. We thank participants of the program on Evolution of Drug Resistance held at the Kavli Institute for Theoretical Physics for discussions. We acknowledge the Wellcome Trust for support under Grant References 098051 and 097678. A.F. is in part supported by the German Research Foundation under Grant FI 1882/1-1. This research was supported in part by the National Science Foundation under Grant NSF PHY11-25915.

## Footnotes

¹To whom correspondence should be addressed. Email: vm5@sanger.ac.uk.

Author contributions: A.F., I.V.-G., and V.M. designed research; A.F., I.V.-G., and V.M. performed research; A.F. contributed new reagents/analytic tools; A.F. analyzed data; and A.F., I.V.-G., and V.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1409403112/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

1. …
2. …
3. Wargo AR, Huijben S, de Roode JC, Shepherd J, Read AF
4. …
5. …
6. …
7. …
8. …
9. …
10. …
11. …
12. …
13. Bozic I, et al.
14. Read AF, Day T, Huijben S
15. …
16. Gatenby RA, Silva AS, Gillies RJ, Frieden BR
17. …
18. Rivoire O, Leibler S
19. Rouzine IM, Rodrigo A, Coffin JM
20. Ewens WJ
21. …
22. Bertsekas DP
23. Bellman RE, Kalaba RE
24. Gardiner CW
25. Cassandra AR, Kaelbling LP, Littman ML. *Proceedings of the Twelfth National Conference on Artificial Intelligence* 94:1023–1028
26. …
27. De Pillis LG, Radunskaya A
28. …
29. …
30. Van Kampen NG
31. Bäuerle N, Rieder U
32. …
33. Schuh A, et al.
34. …
35. …
36. …
37. …
38. Beckman RA, Schemmann GS, Yeang CH
39. …
40. Yuan Y, et al.
41. …
42. …
43. …
44. …