# Consecutive seeding and transfer of genetic diversity in metastasis

^{a}Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138;^{b}Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138;^{c}Canary Center for Cancer Early Detection, Department of Radiology, Stanford University School of Medicine, Palo Alto, CA 94304;^{d}Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114;^{e}Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114;^{f}Department of Mathematics, Harvard University, Cambridge, MA 02138

See allHide authors and affiliations

Edited by Andrea Sottoriva, The Institute of Cancer Research, London, United Kingdom, and accepted by Editorial Board Member Anton Berns June 3, 2019 (received for review November 13, 2018)

## Significance

The success of cancer treatment largely depends on the genetic mutations present within metastases, which cause 90% of cancer-related deaths. Genetically diverse metastases are more likely to harbor resistance mutations, contributing to treatment failure. It is often assumed that each metastasis is seeded exactly once, such that its diversity cannot be inherited and instead must emerge entirely during growth, yet many metastases have a diversity pattern inconsistent with this assumption. We introduce a mathematical model of consecutive seeding by multiple cells that can explain these patterns. We then apply this model to tumor sequencing data to infer that 10 to 150 cells seeded each metastasis. We derive predictions for the fraction of transferred diversity and the proportion of polyclonal lesions.

## Abstract

During metastasis, only a fraction of genetic diversity in a primary tumor is passed on to metastases. We calculate this fraction of transferred diversity as a function of the seeding rate between tumors. At one extreme, if a metastasis is seeded by a single cell, then it inherits only the somatic mutations present in the founding cell, so that none of the diversity in the primary tumor is transmitted to the metastasis. In contrast, if a metastasis is seeded by multiple cells, then some genetic diversity in the primary tumor can be transmitted. We study a multitype branching process of metastasis growth that originates from a single cell but over time receives additional cells. We derive a surprisingly simple formula that relates the expected diversity of a metastasis to the diversity in the pool of seeding cells. We calculate the probability that a metastasis is polyclonal. We apply our framework to published datasets for which polyclonality has been previously reported, analyzing 68 ovarian cancer samples, 31 breast cancer samples, and 8 colorectal cancer samples from 15 patients. For these clonally diverse metastases, under typical metastasis growth conditions, we find that 10 to 150 cells seeded each metastasis and left surviving lineages between initial formation and clinical detection.

Intratumoral heterogeneity is an inevitable consequence of cancer evolution (1, 2). At the time of cancer diagnosis, many clones (subpopulations of genetically similar cells that share a common ancestry) coexist in the primary tumor (3, 4). When some of these clones give rise to metastases, the clonal heterogeneity present in the primary tumor is distributed to distant sites (5⇓⇓–8). Across cancer types, the mutations with the greatest predicted functional consequences are predominantly shared across all metastases, suggesting that these mutations first arose in the primary tumor and were then distributed to each metastasis (9). Since primary tumors are often surgically removed, the heterogeneity within metastases determines the probability for treatment efficacy (2, 3, 10).

While it frequently has been assumed that individual metastases are seeded exactly once by a single cell or a small cluster of similar cells (11, 12), recent studies have identified metastases with multiple subpopulations derived from different clones in the primary tumor (13⇓⇓⇓⇓⇓–19). This transfer of clonal diversity suggests at least one of two possible mechanisms: that metastases can be seeded multiple times by different migrating cells (consecutive seeding) or that metastases can be seeded by a cluster of multiple clonally diverse cells (polyclonal cluster seeding). Although some empirical and theoretical work has suggested that circulating tumor cell clusters can be genetically diverse (20⇓–22), the diversity established by polyclonal cluster seeding cannot necessarily be maintained during metastasis growth without consecutive seeding, as only a small number of cell lineages typically survive the stochastic growth process (17). However, established tumors may be consecutively seeded by an influx of cells from other tumors (23, 24), which presents a plausible mechanism for the dynamic transfer of clonal diversity between tumors.

The probability to successfully colonize a distant site depends on many factors (e.g., cancer type, metastatic potential, distance to site, and anatomy), described by the classical “seed and soil” hypothesis put forth more than a century ago (25⇓–27). A consequence of this hypothesis is that if a primary tumor disseminates highly potent seeds to a perfectly compatible and nearby soil, this site will receive a constant stream of incoming and proliferating cancer cells. In contrast, a distant and unfavorable colonized site might receive one or very few cancer cells that can then expand. The seeding of metastases is therefore bounded by two extreme hypothetical scenarios: (*i*) A site is colonized by a single founding cell that expands by cell division to a detectable metastasis, such that the primary tumor and metastasis share only the mutations present in that founding cell, and (*ii*) a site is colonized by continuous influx of cancer cells and expands solely by this continuous influx, such that the primary tumor and metastasis on average contain the same genetic diversity (Fig. 1).

Many metastases might be established by a process which lies between these two extreme points, in which a tumor expands due to a balance of consecutive seeding events and subsequent cell divisions. However, previous mathematical models of metastasis have focused almost exclusively on extreme *i*, single-cell seeding (28⇓⇓⇓–32). Although not yet studied in the context of cancer genetics, some mathematical models of consecutive immigration have been applied to other biological systems, in particular island populations (33⇓⇓–36). Yet these models from population genetics typically assume populations of fixed size (37), whereas the rapid growth of tumors can lead to dramatically different predictions (38, 39).

Here we develop a mathematical framework that generalizes past models to allow for multiple consecutive seeding events during tumor expansion, enabling us to assess and estimate the balance between seeding and cell division during the growth of metastases. This framework establishes a precise, quantitative mapping between the rates of seeding influx to each metastasis and the clonal diversity of metastases. This mapping can be used to predict tumor clonal diversity when information about the rate of seeding is known, and inversely it can be used to estimate the seeding rate from clone frequency data measured across multiple tumors within a patient.

## Model Formulation

We developed a mathematical framework using a multitype continuous-time branching process (4, 9, 40⇓–42) to assess the dependence of metastatic heterogeneity on the seeding rate, birth rate, and death rate of cancer cells in a growing lesion (*SI Appendix*, Fig. S1). We consider a primary tumor that seeds M growing metastases and assess the composition of each metastasis once it has reached a detectable size Y. Each cell in the metastases derives from one of N clones, where every clone has a constant size in the primary tumor. Cells from each clone

After arriving at the new site, the cells from each clone i replicate according to an exponential birth–death process with division rate *SI Appendix*, Fig. S2) are reported in *SI Appendix*.

We evaluate the heterogeneity of a metastasis at a detection time T, defined as the first time that the total size of the metastasis

## Results

Our mathematical framework gives rise to several predictions about the shared genetic diversity, the proportion of polyclonal metastases, and the distribution of detection times for metastases growing with consecutive seeding. First, a key prediction of consecutive seeding is that the number of clones shared between the primary tumor and metastasis can increase over time as the metastasis grows and is consecutively seeded by cells from the primary tumor; this is a distinguishing feature from polyclonal cluster seeding, where the number of clones shared between the primary tumor and the metastasis decreases over time as lineages are lost to extinction (*SI Appendix*, Fig. S9). To investigate the clonal dynamics of metastasis growth under our model of consecutive seeding, we calculate the average size

Because stochasticity in metastasis growth can lead to deviation from this mean behavior, we also computed the full probability distribution for the clone size *SI Appendix*). We find that the stochastic size *SI Appendix*.

To assess the clonal composition of a detected metastasis, we define *SI Appendix*, emerges from the Pólya urn scheme of sampling with double replacement. In this statistical scheme, the clonal membership of each cell in a metastasis is evaluated in sequence: For the first cell, sampled at random, its probability to be of a particular clone is simply given by the prior distribution of clone sizes in the primary tumor; but once the clonal membership of the first cell is identified, the probability that the second cell is of the same clone is increased relative to the prior distribution, and so on for each cell identified in this manner.

This scheme can be applied to evaluate the number of clones n present with nonzero size in a metastasis of size Y. We find that the mean number of clones n present in the metastasis is*A* and *SI Appendix*). This polyclonality probability is greatest when multiple clones have a high seeding influx. If only one clone has a high influx, or if all clones have a low influx, then polyclonality will be rarely detected because one clone dominates the metastasis (Fig. 3 *B* and *C* and *SI Appendix*, Fig. S3 *A* and *B*). In the particular case that each clone has an equal and small seeding rate

In practice, clones and their population sizes are not measured directly and are instead approximated using mutation frequencies in bulk sequencing samples (4, 53). We therefore adapt our results, denoting the frequency of each clone i in the metastasis as *SI Appendix*, we show that the vector of clone frequencies

This precise mapping between the clonal composition of the primary tumor and its metastases, mediated by the seeding rates, can be simplified when considering only the clonal diversity of the tumors, rather than the full set of clone frequencies. Clonal diversity, measured on a scale 0 (least diverse) to 1 (most diverse), is a simple but informative summary metric for clonal composition; a natural measure of the clonal diversity of a tumor is the Simpson index, defined here as the probability that two cells selected at random from the metastasis are heteroclonal (descendants from different clones) (55). In a large tumor, this is calculated according to the expression *D* of a tumor, the fraction *D*).

Moreover, in our analytic framework, the ratio of the mean clonal diversity *E* and *SI Appendix*, Fig. S3*E*). This ratio can be interpreted as the mean fraction of clonal diversity that is disseminated from the primary tumor to the metastasis. This analysis can also be extended to quantify intermetastatic heterogeneity (2, 9): If a primary tumor seeds M metastases with equal rates, the difference in clone composition among the metastases is captured by the fixation index *SI Appendix*, Fig. S3*F*). This quantity, a standard measure of clonal differentiation in population genetics (54, 56, 57), can be readily estimated from genetic data collected from spatially segregated metastases (58, 59). From the above expression, we find that as additional metastases are seeded, the clonal diversity of the aggregate metastatic population will converge to that of the primary tumor,

Because the above results make predictions about clonal diversity given the seeding influxes of each clone, we can invert our model to infer the seeding influxes from measurements of clonal frequencies across multiple tumors in a patient. In this inference approach, we observe the clonal frequencies *SI Appendix*). If the clonal composition of the primary tumor *SI Appendix*, Fig. S4). To first order, the MLE seeding influx *SI Appendix*, Fig. S5*A*). Specifically, in *SI Appendix* we show that

To demonstrate how this model-based inference approach can be used, we identified three published studies that reported sequencing results from multiple tumors collected simultaneously from a patient and revealed a pattern of at least two shared clones between tumors (13⇓–15). Because these patterns can be explained only by several cells seeding a tumor, rather than just one, these datasets were appropriate for our inference approach; any dataset consistent with a single-cell seeding model would result in a maximum-likelihood estimate of zero consecutive seeding in our framework. In cases where multiple samples from a patient were collected from the originating organ and the true primary tumor site was unclear in the literature (16), inference was conducted across all tumor samples jointly regardless of anatomical location.

First, using a clone frequency dataset from whole-genome sequencing of 68 tumor samples across 7 patients with high-grade serous ovarian cancer with intraperitoneal metastasis (13), we apply our MLE approach to estimate the seeding influx of each clone (Fig. 4 and *SI Appendix*, Fig. S6). Peritoneal metastasis represents an ideal test case for our inference approach because cancer cells that enter the peritoneal cavity are thought to mix easily within this space, facilitating consecutive seeding. We find that our total seeding influx estimates span the range

We analyzed a second dataset of 31 tumor samples from 4 patients with metastatic breast cancer (14), and surprisingly we find qualitatively similar results (Fig. 4 and *SI Appendix*, Fig. S7), although in these cases metastasis must have occurred through lymphatic or hematogenous routes. The total seeding influx estimates vary in the range

We also studied 4 pairs of primary tumors and metastases from patients with colorectal cancer (15) and again estimated similar seeding influx values. Patients A01, A02, and A04 had MLE seeding influxes of 1.4, 7.1, and 5.8 cells, respectively. However, patient A03 had a primary tumor and metastasis with very similar clonal compositions, leading to an unusually high MLE seeding estimate of 114.5 cells. We note that patient A03 had the smallest number of clones (

For every sample in our analysis, our 95% confidence interval for *Materials and Methods*). We find that even when we introduce substantial measurement error of more than 50%, our maximum-likelihood estimates are robust, changing by less than 6.3% (*SI Appendix*, Fig. S5*B*), indicating that our framework is robust even to large amounts of uncorrelated noise in the data.

Using the estimated k values from our inference results, we can infer the total number of cells X that migrated to each tumor before detection and gave rise to a surviving lineage according to the expression for its expected value,*SI Appendix*, Fig. S3 *C* and *D*). In the patients with ovarian cancer, for a typical survival probability of *et al*. (16) find that a minimum of 6 to 10 consecutive seeding events (or “comigrations”) per patient are necessary to explain the observed clone patterns across samples. Because our MLE value is chosen to correspond to the most likely number of cells, rather than the smallest possible (most parsimonious) number, our estimates consistently exceed this minimum, as expected.

In the patients with breast cancer, we estimate *SI Appendix*, Table S1 and visualized in Fig. 4 and *SI Appendix*, Figs. S6 and S7. These estimates are more accurate when measurements of ρ and Y are known, as *SI Appendix*, Fig. S3). In particular, for larger metastases with

## Discussion

The presented mathematical framework quantitatively captures the stochasticity of metastatic seeding, cell division, and cell death, as well as clonal competition during the colonization of distant sites. We have derived from this stochastic framework a set of baseline predictions for clonal diversity that can be readily compared with observations as a means of evaluating to what extent these simple principles can explain the observed range of clonal complexity. We demonstrate that continuous seeding, as a mechanism for the transfer of clonal diversity between tumors (13, 50), can act as a filter of intratumoral heterogeneity and thereby influence the probability of resistance and treatment success (3, 46, 60). Given measurements of only 3 independent parameters, the model predicts the number of clones that are transferred to each metastasis before its detection (Eq. **4**), the fraction of polyclonal metastases (Eq. **5**), the distribution of clone frequencies in each metastasis (Eq. **6**), the expected fraction of clonal diversity transferred to a metastasis (Eq. **7**), and several other quantities of interest.

These model predictions can be inverted to provide a means of estimating the seeding influxes and mean clone frequencies in the primary tumor. Our analysis of 68 tumor samples from patients with ovarian cancer, 31 tumor samples from patients with breast cancer, and 4 pairs of primary tumor and metastasis samples from patients with colorectal cancer yielded seeding influx estimates consistently in the range 0.6 to 11.6 cells per generation time. These datasets were chosen because they include explicitly reported clone frequencies. Our high seeding influx estimates reflect the high degree of shared clonal diversity observed in the patients included in these datasets, as it is likely that these patients have higher seeding influxes than most patients with cancer. We note that, in contrast to the suggestion of McPherson *et al*. (13), our model demonstrates that invoking a nonuniform fitness landscape is not required to explain the high proportion of polyclonal metastases observed in some patients with cancer. Rather, the stochastic features of metastasis growth, coupled with a seeding influx that falls in the range 0.6 to 11.6 cells per generation time, are sufficient to explain these observations.

The simple, analytical form of our results reveals how various quantities precisely depend on the model parameters and provides a means of calculating these quantities without the need for computationally expensive numerical simulation. As such, these results may be readily integrated in computational methods that seek to infer the clonal composition of tumors and their metastatic seeding patterns (4, 16, 50). We note several simplifying assumptions made to ensure tractability of the model. First, we assume that metastasis occurs after the primary tumor has reached a steady size and stable clonal composition. Consequently, the model may underestimate the variance in some predictions by neglecting possible fluctuations in the primary tumor size and clonal frequencies. In cases of early metastasis, these fluctuations have been modeled according to an upstream branching process in the primary tumor (9, 40). Very high seeding influxes k or survival probabilities ρ would increase the probability that surviving lineages are seeded early during primary tumor growth. Second, we model only the clonal diversity established in the primary tumor and not new clones that may arise in a growing metastasis. These new clones may be rare due to low mutation rates and relatively unlikely to outcompete established clones (9, 61, 62). Third, it is possible that the dissemination rate *SI Appendix*, Fig. S5*B*).

Our results describe properties of unidirectional consecutive seeding from a primary tumor to metastases and do not explicitly account for seeding between metastases (*SI Appendix*, Fig. S8). Nonetheless, our model can provide a useful approximation even in more complicated seeding scenarios. If a metastasis Z is seeded by another metastasis Y (with equal parameters governing the growth of both) rather than by the primary tumor X, the first seeding event on average occurs when metastasis Y is already a fraction *SI Appendix*). Since at this size the clonal fractions in the tumor are stable, our inference framework for the seeding influx is not significantly affected. This result applies equally well to reseeding or self-seeding, in which cells that have left the primary tumor later return (23, 24), because metastasis Z can represent the population of the primary tumor X that has ancestry in metastasis Y. Then only k surviving cells return to the primary tumor during metastasis growth, again resulting in a negligible effect on neutral clone frequencies in either tumor (*SI Appendix*). Even when the reseeding outflux is as high as twice the seeding influx k, neglecting reseeding altogether has minimal effect on our seeding estimates (*SI Appendix*, Fig. S5*C*).

Intratumoral heterogeneity, a facilitator of treatment resistance and tumor relapse, is directly mediated by the seeding dynamics of cancer cells. Cancers characterized by a high rate of cell dissemination and mixing are especially likely to give rise to highly heterogeneous metastases as the cancer progresses. Our model of the transfer of clonal diversity between tumors, along with the corresponding analytical results and inference approach developed in this work, provides the tools to predict the genetic diversity and differentiation index of metastases, as well as to estimate the seeding influxes that gave rise to that diversity. Metastasis is a stochastic process that can generate considerable intratumoral heterogeneity, and understanding its role in determining this heterogeneity will be an important step toward providing more effective treatment.

## Materials and Methods

### Model.

We model the growth and evolution of a metastatic lesion as a continuous-time multitype branching process (40, 43, 44). Each lesion originates from a single cell but is consecutively seeded by additional cells over time. For more details, see *Model Formulation*.

### Analysis.

Using the mathematical properties of a Poisson process to describe consecutive seeding events, we derive several statistical quantities of interest in a stochastic setting. Full details and derivations of our results are provided in *SI Appendix*.

### Simulations.

We simulate the multitype branching process using the Gillespie algorithm (63) until a total tumor size of Y cells is achieved. For statistics, we conduct 100,000 independent realizations of our simulation for each set of model parameters (Table 1).

### Robustness.

For each of **6**. After multiplying each frequency by an independent multiplicative error factor and renormalizing, we computed the MLE influx *SI Appendix* for further details.

## Acknowledgments

We thank Jeffrey Gerold and Allison Paul for helpful discussions. This research is supported by the National Science Foundation under Grant DGE-1144152 (to A.H.), by the National Institutes of Health under Grant K99CA229991 (to J.G.R.), by NIH/National Cancer Institute Grant R37CA225655 (to K.N.), and by the Bill and Melinda Gates Foundation. Any opinions, findings, and conclusions expressed herein do not necessarily reflect the views of the supporting institutions.

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. Email: aheyde{at}g.harvard.edu or martin_nowak{at}harvard.edu.

Author contributions: A.H., J.G.R., K.N., and M.A.N. designed research; A.H. performed research; A.H. analyzed data; and A.H., J.G.R., K.N., and M.A.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. A.S. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1819408116/-/DCSupplemental.

- Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

## References

- ↵
- ↵
- B. Vogelstein et al.

- ↵
- ↵
- ↵
- ↵
- S. Turajlic,
- C. Swanton

- ↵
- J. Z. Sanborn et al.

- ↵
- ↵
- J. G. Reiter et al.

- ↵
- R. Rosenthal,
- N. McGranahan,
- J. Herrero,
- C. Swanton

- ↵
- J. E. Talmadge,
- S. R. Wolman,
- I. J. Fidler

- ↵
- N. Aceto,
- M. Toner,
- S. Maheswaran,
- D. A. Haber

- ↵
- ↵
- P. Savas et al.

- ↵
- Q. Wei et al.

- ↵
- M. El-Kebir,
- G. Satas,
- B. J. Raphael

- ↵
- R. Maddipati,
- B. Z. Stanger

- ↵
- G. Macintyre et al.

- ↵
- ↵
- ↵
- K. J. Cheung et al.

- ↵
- Z. Ahmed,
- S. Gravel

- ↵
- ↵
- ↵
- ↵
- ↵
- A. C. Obenauf,
- J. Massagué

- ↵
- ↵
- K. N. Yamamoto,
- A. Nakamura,
- H. Haeno

- ↵
- ↵
- ↵
- P. K. Newton et al.

- ↵
- W. F. Bodmer,
- L. L. Cavalli-Sforza

- ↵
- ↵
- ↵
- ↵
- W. J. Ewens

- ↵
- ↵
- ↵
- R. Durrett

- ↵
- ↵
- D. Wodarz,
- N. L. Komarova

- ↵
- K. B. Athreya,
- P. E. Ney

- ↵
- M. Kimmel,
- D. E. Axelrod

- ↵
- ↵
- ↵
- I. Bozic,
- M. A. Nowak

- ↵
- ↵
- ↵
- E. Renshaw

- ↵
- S. Tavaré

- ↵
- ↵
- ↵
- R. Durrett,
- J. Foo,
- K. Leder,
- J. Mayberry,
- F. Michor

- ↵
- G. Bhatia,
- N. Patterson,
- S. Sankararaman,
- A. L. Price

*F*_{ST}: The impact of rare variants. Genome Res. 23, 1514–1521 (2013). - ↵
- ↵
- W. Zhai et al.

- ↵
- R. Sun et al.

- ↵
- ↵
- N. Sharygina,
- H. Veith

- J. G. Reiter,
- I. Bozic,
- K. Chatterjee,
- M. A. Nowak

- ↵
- ↵

## Citation Manager Formats

## Article Classifications

- Biological Sciences
- Medical Sciences

- Physical Sciences
- Applied Mathematics