Consecutive seeding and transfer of genetic diversity in metastasis

Significance The success of cancer treatment largely depends on the genetic mutations present within metastases, which cause 90% of cancer-related deaths. Genetically diverse metastases are more likely to harbor resistance mutations, contributing to treatment failure. It is often assumed that each metastasis is seeded exactly once, such that its diversity cannot be inherited and instead must emerge entirely during growth, yet many metastases have a diversity pattern inconsistent with this assumption. We introduce a mathematical model of consecutive seeding by multiple cells that can explain these patterns. We then apply this model to tumor sequencing data to infer that 10 to 150 cells seeded each metastasis. We derive predictions for the fraction of transferred diversity and the proportion of polyclonal lesions.

Given the frequency of a particular clone of interest in a metastasis and its frequency in the primary tumor that seeded it, the maximum likelihood estimate (MLE) for the seeding influx k from the primary tumor to the metastasis is shown using two different axis formats: (A) linear scale, which emphasizes intermediate clone frequencies and (B) logit scale, which emphasizes extreme clone frequencies near 0 or 1. Our inference approach computes the MLE seeding influxk for a test dataset of 1,000 independently simulated tumor samples with N = 100 clones, each with a true seeding influx drawn in the range k = 10 −3 to k = 10 3 . (A) Both the average per-clone MLE seeding influxk/N and its true value k/N scale inversely with the KL divergence D KL (γ γ) between the clonal composition of the two tumors. (B) The MLE seeding influxk ξ was estimated after exposing the test dataset to multiplicative error with standard deviation ξ, and the MLE changek ξ /k and MLE errork ξ /k were nearly 1 for all error levels ξ < 1. (C) In simulations in which cells could leave the metastasis and reseed the primary tumor with influx k 1 2 , where the magnitude of reseeding is measured relative to the forward seeding influx k 2 1 by the reseeding ratio k 1 2 /k 2 1 , we find that the forward MLE seeding influxk 2 1 is relatively unchanged compared to its valuek estimated without reseeding, and the MLE error ratiok 2 1 /k is generally negligible.  [2]. As estimated by MLE over the distribution given by Eq. (6) in the main text, the circulating clone frequenciesγ i of each clone i are depicted as wide colored bars in the top bar chart, and the seeding influxes k j for each tumor j (the mean number of arriving cells per generation time) as wide gray bars in the right bar chart, with black standard error bars. In addition, the narrow blue bars depict the mean clone frequencies across all metastases (top bar chart) and the fraction of diversity transferred from the circulating cells to each metastasis (right bar chart). The tree in each panel depicts the inferred phylogenetic relationship of the detected clones in the patient.
Supplementary Figure 8: Dissemination of clonal diversity via intermediate populations. We simulate our model framework for a primary tumor (large center circle) composed of four neutral clones of equal size at maturity. This primary tumor seeds five secondary populations (inner ring) with a mean influx of k = 0.33, sampling from the clonal frequencies of the primary tumor. Once mature, each secondary population seeds five tertiary tumors (outer ring) with the same mean influx k = 0.33. For these values, more than 99% of secondary tumors are polyclonal, but only 40% of tertiary tumors are polyclonal. C. B.

D.
Supplementary Figure 9: Properties of cluster seeding. If a cluster of 1-14 cells were to seed a tumor in our model, typically only 0-1 cells would produce surviving lineages, while the others would produce non-surviving lineages that go extinct quickly. (A) The probability that at least two cell lineages survive in the long run increases with the initial cluster size. For a typical cluster size of 6 cells, this probability is just 3.3%. (B) The expected time until all non-surviving cell lineages go extinct is just 12-15 days, using a growth rate r = 0.0125 days −1 . (C) A non-surviving lineage might expand in the short run before it goes extinct, but on average only to a maximum of a few dozen cells at most, plotted here as an increasing function of the initially seeded cluster size. (D) Before its extinction, even the non-surviving lineage that divides the most has on average no more than 200 divisions. In contrast, the surviving lineages must divide far more than Y = 10 8 times. For all panels, we use ρ = 5.0% and average over 10 6 independent numerical simulations. Estimates for all seeding influxes k j , defined as the mean number of cells that migrate to metastasis j per cell generation, and all seeding frequenciesγ i , defined as the mean number of migrating cells descended from clone i, for each patient included in our analysis. Estimates were obtained by maximum likelihood estimation using the consecutive seeding model.

Supplementary Analysis
We consider a growing metastasis composed of y(t) total cells at time t [3,4,5,6,7]. Each cell in the metastasis derives from one of N clones originating in the primary tumor, and we let y i (t) denote the number of cells derived from clone i = 1, . . . , N in the metastasis at time t (Fig. S1). Cells from each clone i arrive at the metastasis with a constant seeding rate λ i , which can be considered the product of three factors: the frequency of the clone in the primary tumor, the total size of the primary tumor, and the average per-cell likelihood of metastasis from the primary tumor. This latter factor may be affected, for example, by the clonal genotype or the spatial arrangement of clones in the primary tumor. Consequently, we avoid strict assumptions about the magnitude of λ i , allowing for wide-ranging seeding rates. The total seeding rate across all clones is denoted by λ = N i=1 λ i . Once at the metastasis site, each cell of clone i replicates according to an exponential birthdeath process with division rate b i and death rate d i [8,9]. Each clone i has a positive net growth rate r i = b i −d i , and the average number of clone i migrants per generation is given by the seeding influx k i = λ i /b i . The probability that a cell gives rise to a lineage that survives Overall, each clone varies in size with three degrees of freedom, either expressed as the model rates (b i , d i , λ i ), or more conveniently, the model parameters (r i , ρ i , k i ). If time is measured in units of average generation time such that b i = 1, then note that ρ i = r i and k i = λ i . Several of our derivations consider the special case of neutral passenger diversity, such that b i = b and d i = d for all clones i, but the rates λ i can freely vary between clones [10,11,12,13].
We initialize the metastasis to have zero size at time t = 0, such that y i (0) = 0 for all clones i. The detection time T can then be defined as the first time that the total metastasis size y(t) reaches some fixed detection size Y , in terms of the average number of all migrants per generation, . Proof: If there are y i (t) cells derived from clone i at time t, then the number of cells y i (t+τ ) derived from clone i at time t + τ is given by where the numbers of births B i and deaths D i between time t and time t + τ , in the limit as τ → 0 + , follow a binomial distribution in which each of the y i (t) cells has a probability b i τ to divide and d i τ to die, each being sufficiently small to permit a Poisson approximation: and the number of migrants L i between time t and time t + τ follows a binomial distribution in which each of the x i cells of clone i in the primary tumor has a probability i τ to migrate: Taking the expectation of both sides of equation 1 gives where the seeding rate is defined as λ i = x i i . Rearranging, we obtain since r i = b i − d i , and taking the limit as τ → 0 + gives the rate of clone size expansion, where the overset dot denotes the time derivative. The solution of this differential equation for the clone sizeȳ i (t) is given bȳ where the initial condition of no cells at the metastatic site,ȳ i (0) = 0, has been applied.
Proposition 2. The size y i (t) of the ith clone follows the negative binomial distribution Proof: The master equation that governs the probability P i that the ith clone size is equal to y i at time t can be written aṡ (10) where we enforce the normalization condition ∞ y i =0 P i (y i |t) = 1 and the boundary condition P i (−1|t) = 0, because clone sizes y i must be non-negative integers. This master equation can be rewritten in terms of the backward step operator S, defined such that S k [f (y i )] = f (y i −k): It is useful to again rewrite equation 11 in terms of the probability generating function of the clone size, defined mathematically as where a prime denotes differentiation with respect to s. This first-order linear partial differential equation can be solved by the method of characteristics, yielding the solution where ρ i = 1 − d i /b i , and k i and q i are defined as in equation 9. We recognize that G i (s|t) is the generating function of a negative binomial random variable with parameters k i and q i , with the associated probability mass function for the clone size y i (t) given by over the support of non-negative integers y i , which hence solves the master equation 11. The first factor in this expression is a binomial coefficient and can be calculated even if the mean number of migrants per generation k i is not an integer by extending the factorial in the usual manner by using the gamma function, so that Corollary 2.1. The coefficient of variation in clone size converges to 1/k i over time.
Proof: The variance in clone size y i at time t can be derived from its probability generating function G i (s|t) by computing the quantity Computing this value using the probability generating function found in equation 12 gives Using the mean from Proposition 1, the coefficient of variation in clone size is then Over time, for e r i t 1, this coefficient of variation rapidly converges to 1/k i .
Corollary 2.2. The size of a metastasis y i at time t can be equivalently generated via a repeated Bernoulli process. Beginning with an empty site for the metastasis, 1. Create a pool of k i circulating cells andȳ i potential daughter cells.
2. Choose a cell uniformly at random from this pool and add a copy of it to the metastasis. If, with probability q i = k i k i +ȳ i , a circulating cell is chosen, then remove a cell at random from the metastasis.
3. Repeat step 2 until k i circulating cells have been chosen from the pool. The number of cells in the metastasis will then be distributed as desired, y i ∼ NBin(k i , q i ).
Proof: Arbitrarily, we will refer to the event that a circulating cell is chosen as a "failure" and the event that a daughter cell is chosen as a "success." The number of cells y i in the metastasis following the process described here is then the number of successes before k i failures occur, each with failure probability q i . This is one standard definition of the negative binomial distribution.
To show that this is equivalent, we can compute the probability that there are y i successes and k i − 1 failures before the k i th failure. There are y i +k i −1 y i ways to choose the position of the k i − 1 failures in the string of the first y i + k i − 1 outcomes, and the probability of each of these strings is the product of the probability of each outcome: The y i + k i trial will then fail with probability q i . Therefore the probability that the first y i + k i − 1 outcomes will have k i − 1 failures and the k i th outcome will be a failure is This precisely matches equation 14, and it is the probability mass function of a variable with a negative binomial distribution, y i ∼ NBin(k i , q i ).
Corollary 2.3. The number of surviving lineages L i (t) at time t, each originated by a unique seeding event, follows a Poisson distribution with mean −k i ln q i (t).
Proof: Because the seeding rate is assumed to be constant in time, the number of seeding events will be Poisson-distributed with mean λ i t. The probability that each lineage will survive until a time τ after it arrives is given by The number of surviving lineages at time t is then Poisson-distributed with mean which is approximately λ i t at small times t 1/r i and approximately λ i ρ i t − k i ln ρ i at large times t 1/r i .
Corollary 2.4. The size of each lineage at time t follows a log-series distribution x ij (t) ∼ LogS(1 − q i (t)), with probability mass function . Because these variables are related as Using the generating function of y i (t) from equation 13, this is equivalent to This directly implies that which is precisely the probability generating function of the LogS(1 − q i (t)) distribution.
Lemma 2.6. For sufficiently long times rt 1, the deviation from expected sizeŷ i (t) converges to a gamma distribution, Proof: To describe the shape of the clone size distribution in the limit of long times, we consider the distribution ofŷ i = y i /ȳ i in the limit t → ∞. The probability density function f i (ŷ i ) ofŷ i can be calculated as follows: This is the probability density function of a Gamma(k i , k i ) random variable.
Proposition 3. If the detection size Y is sufficiently large, then the detection time T follows a log-gamma distribution with meanT and modeT equal tō Proof: If the detection size Y , and hence also the detection time T , is sufficiently large that rT 1, then Lemma 2.6 gives thatŷ(T ) = y(T )/ȳ(T ) approximately follows a Gamma(k, k) distribution. We know that y(T ) = Y and we knowȳ(T ) from Proposition 1, sô Since rT 1, then this is approximately Y r λ e −rT ∼ Gamma(k, k). We define ω = e −rT , and applying the scaling property gives that ω = e −rT ∼ Gamma(k, ρY ). Using the change of variables T = −r −1 ln ω gives the probability density function of the detection time T , corresponding to a log-gamma distribution. To compute the mean detection time, we note thatT = −r −1 E[ln ω], and since ω follows the gamma distribution as noted above, we can apply the known expression for the mean of the logarithm of a gamma random variable: where ψ(x) = d dx ln Γ(x) is the digamma function. In addition, the mode detection timeT can be directly computed by solving d dt f T (t) = 0 to givê Corollary 3.1. The difference between the mean and mode detection times,T −T , scales as (rk) −1 for small k but scales as (2rk) −1 for large k.
Proof: The difference between the mean and mode detection times is Expanding this expression about k = 0 gives where c γ ≈ 0.577 is the Euler-Mascheroni constant and π ≈ 3.14 is the Archimedes constant. For very small k 1, the (rk) −1 term dominates, and so approximatelyT −T = (rk) −1 . Hence, in the limit k → 0, we have thatT −T → ∞. If we instead expand the expression for the difference about k = ∞, we obtain and for very large k 1, the (2rk) −1 term dominates, so approximatelyT −T = (2rk) −1 . Hence, in the limit k → ∞, we have thatT −T → 0. Proof: The average tumor size at the mode detection time can be computed as where the approximation assumes that rT 0. In contrast, for the mean detection time, where we have used the fact that ln(k) > ψ(k) from Corollary 3.1.
Corollary 3.3. The expected number of cells that migrated to the metastasis prior to detection and gave rise to a surviving lineage is given bȳ Proof: The expected number of cells that migrate to the metastasis over a time period τ is λτ , and the expected fraction of those cells that give rise to a surviving lineage is ρ. Hence the expected number of cells that migrate during the time period T before detection and give rise to a surviving lineage is which is precisely the result to be obtained.
Proposition 4. In a tumor of size Y , the vector of neutral clone sizes (Y 1 , . . . , Y N ), which obey Y = N i=1 Y i , follows a Dirichlet-multinomial distribution, with mass function Proof: Let ΣY i denote the sum of the first i clone sizes, i j=1 Y j , and let Σk i denote the sum of the first i seeding ratios, i j=1 k j . The distribution of ΣY i is proportional to since the first n clones can be treated as a single superclone with total size ΣY i and seeding ratio Σk i . If the n th clone size is known, then this clone can be excluded from the superclone, and so the superclone has only total size ΣY i−1 and seeding ratio Σk i−1 . Mathematically, Conversely, the marginal distribution of the i th clone size Y i given Y = ΣY i can be found using Bayes' law, where the proportionality factors can be eliminated to give which is the mass function of a Beta-binomial distribution. Now we can compute the full joint mass function of Y 2 to Y N (which also specifies that since many of the binomial coefficients balance with the neighboring factors in the product. Then, noting that ΣY 1 = Y 1 , Σk 1 = k 1 , ΣY N = Y , and Σk N = k, we obtain the result where we need not condition on ΣY N because it must be equal to the known tumor size Y .
Corollary 4.1. The mean number of clones n in the metastasis is given bȳ Proof: Consider only the joint distribution of the i th clone size Y i and the total number of other cells, which has a superclone size Y − Y i with seeding ratio k − k i . Using the result from Proposition 4, we know that the joint probability mass function for these two sizes is To find the probability that the i th clone is absent, we evaluate this at Y i = 0 to obtain The mean number of clones n in the tumor is then given bȳ which gives precisely the result to be proved.

Corollary 4.2. The probability that a metastasis is polyclonal is
Proof: To find the probability that the i th clone size Y i is equal to the total tumor size Y , we simply evaluate equation 52 at Y i = Y to obtain The probability that the tumor is monoclonal for any of the clones is then given by the sum , and so the probability of polyclonality is the complement of this sum, which gives precisely the result to be proved.
and I n is the set of all unordered combinations of length n from the set I = {1, 2, . . . , N }. Proof: The probability that the subset of clones S present in the metastasis is included in any subset of clones I n ⊆ I = {1, 2, . . . , N } is and summing over all sets I n ⊆ I with n elements, that is over the elements I n of I n , gives B n /B N . We then apply the inclusion-exclusion principle to obtain the distribution of n, and setting the index of summation to m = n − j + 1 gives the desired result.

Corollary 4.4.
In the case where all N clones are seeded at an equal and low rate k i = k/N 1 and the tumor size is large, Y 1, the polyclonal probability is approximately where κ = k 1 − 1 N is a clone-adjusted seeding influx.
Proof: Given k i = k/N for all clones i, the expression for the polyclonality probability from Corollary 4.2 can be written as For a large tumor size Y , we apply the binomial approximation Y +z Given k/N 1, we can approximate Γ(k/N ) ≈ N Γ(k)/Γ(k − k N + 1). Then we have and because Γ(κ + 1) = κ!, this is precisely the result to be proved.
Proposition 5. The vector of clone frequencies (γ 1 , . . . , γ N ) follows a Dirichlet distribution, with joint probability density function Proof: Given the total tumor size Y , we can rewrite the result of claim 4 in terms of the clone frequencies γ i = Y i /Y by using the change of variables Y = γY , Using Stirling's approximation ln Γ(x) ≈ x(ln x − 1) for large x, we can approximate the logarithm of the binomial coefficient for large Y with the expression With this approximation, the logarithm of the joint density function is approximately provided that Y 1. Exponentiating this expression directly gives the result to be proved.
Corollary 5.1. The mean, variance, and covariances in clone frequencies γ i are given bȳ Proof: Consider only the joint distribution of the ith clone frequency γ i and the total frequency of other cells 1 − γ i . Using the result from Proposition 5, we know that the joint probability density function for these two frequencies is that of the Beta distribution, which is the marginal distribution of γ i . The ith clone frequency has the first moment which is sensible, because the only asymmetry between neutral clones is the asymmetry in seeding rates; hence the average frequency of a clone in a mature metastasis is the fraction of migrants to that metastasis that are in that clone. The second moment is The variance in clone frequency can then be calculated to give the desired result. Finally, the crossed second moment for two of the clone frequencies is for i = j. The covariance between two clone frequencies is thus Corollary 5.2. The mean number of clones n in the metastasis that are detected (that is, present at a frequency greater than τ where 0 < τ < 1) is given bȳ where I x denotes the regularized incomplete beta function.
Proof: The probability that each clone has a frequency less than the detection threshold τ is given by integrating equation 71, The mean number of clones which are not detected is then which immediately implies the result to be proven. Proposition 6. The ratio of the mean clonal diversity of a metastasisD 2 to the mean clonal diversity of the population of migratory cells in circulation D 1 is Proof: Of the k average migrants per generation, on average k i are of clone type i, so the probability that any given chosen migratory cell in circulation is of clone type i is k i /k. The probability that two chosen migratory cells, sampled with replacement, are of the same clonal type is therefore N i=1 (k i /k) 2 . The mean clonal diversity of this population of migratory cells, defined as the probability that the two sampled cells are different, is then the complement, The mean clonal diversity of the metastasis is instead where we have used the result of equation 73. Then we see that

this becomes
which immediately implies the result to be proven.
whereD * 2 denotes the mean clonal diversity in the population of cells aggregated across all metastases seeded by the same primary tumor.
Proof: If each metastasis receives k mean migrants per generation, the aggregate population of M metastases receives M k mean migrants per generation. Otherwise the growth of the aggregate population behaves identically to an individual metastasis, and so by replacing k with M k in equation 82 we obtain an expression for the clonal diversity across all metastases, Using our expressions forD 2 andD * 2 , we calculate the clonal differentiation index F ST to be We note that this result does not require all of the metastases to be of equal size, only that their sizes all be much larger than k. We further note that when the number of metastases is large, M 1, this implies that F ST ≈ 1 1+k .
Proposition 7. Given the clone frequenciesγ i of each clone i = 1, . . . , N in the primary tumor and the analogous clone frequencies γ ij in each metastasis j = 1, . . . , M , the maximum likelihood estimatesk j for the total seeding influx k j to each metastasis j satisfy Proof: Because the clone frequencies in each metastases are independent when conditioned on the clone seeding rates, the likelihood of the clone frequency data is a simple product over the Dirichlet distribution describing the clone frequencies in each metastasis: The associated log-likelihood is Maximizing the log-likelihood with respect to k j gives Here β ij can be interpreted as the sample bias for the log-scaled clone frequency data, since the expected value of ln γ ij is given by ψ(γ i k j ) − ψ(k j ).
Corollary 7.1. The uncertainty in the maximum likelihood estimate lnk j for the log-scaled seeding influx ln k j given the clonal frequenciesγ i and γ ij is given by such that a 95% confidence interval for k j can be constructed from the boundsk j e ±1.96σ j .
Proof: The first derivative of the log-likelihood function 88 with respect to ln k j is and hence the MLE for ln k j is lnk j . The second derivative with respect to ln k j is then When evaluated at the MLE k j =k j , the final term vanishes, leaving Inverting both sides gives the desired expression for the uncertainty. For a likelihood function that is approximately log-normal in k j , we can treat ln k j as having a normal distribution with meank j and variance σ 2 j . Using the z-score of 1.96 for a 95% confidence interval, an upper and lower bound for ln k j is then lnk j ± 1.96σ j , or roughly two standard deviations above and below the mean. Exponentiating gives the desired upper and lower bound for k j .
Corollary 7.2. Given the clone frequenciesγ i and γ i of each clone i = 1, . . . , N in the primary tumor and a metastasis that it seeded, respectively, and dropping the j subscript, the maximum likelihood estimatek for the total seeding influx k follows the scaling laŵ where D KL denotes the KL-divergence between the clone frequenciesγ and γ, and α is a constant such that α = 1 if D KL (γ γ) N − 1 and α = 2 if D KL (γ γ) N −1 2 min iγi . Proof: By proposition 7, we know that the maximum likelihood estimatek solves Suppose thatk 1, which implies thatγ ik 1 for all i becauseγ i < 1. Since the digamma function has the series expansion ψ(x) = − 1 x − c γ + O(x), then fork 1 we find that Rearranging this equation to solve fork gives the approximate scaling laŵ where D KL is the KL-divergence defined above and S 1 = N i=1γ i ln(1/γ i ) is the Shannon diversity index of the primary tumor. This approximation is valid only fork 1, which requires that D KL (γ γ) + S 1 N − 1. But S 1 ≤ ln N , with equality for uniform seeding ratesγ i = 1 N . Because N > ln N , we can obtain the validity condition D KL (γ γ) N − 1, which implies D KL (γ γ) S 1 ; neglecting S 1 then gives the desired scaling law with α = 1.
At the other extreme, suppose thatk 1, and moreoverγ ik 1 for all i, so that we havê k 1/ min iγi . Using the series expansion ψ( Combining like terms and solving fork gives the approximate scaling laŵ This is the desired scaling law with α = 2, and to ensure the validity conditionk 1/ min iγi is satisfied, we require that D KL (γ γ) Corollary 7.3. The uncertainty in the maximum likelihood estimate lnk scales as where α is the constant described in Corollary 7.2.
Proof: In the low seeding regimek 1 considered in Corollary 7.2, we apply the series expansion ψ (x) = x −2 + O(1) to approximate the uncertainty from Corollary 7.1 as which is the desired scaling law with α = 1. In the high seeding regimek 1/ min iγi , we apply the series expansion ψ (x) = x −1 + 1 2 x −2 + O(x −3 ) to instead the approximation which is the desired scaling law with α = 2. Hence, in both regimes, the behavior of α scales analogously to its behavior shown in Corollary 7.3. With these scaling laws, a 95% confidence interval can be constructed using the upper and lower boundŝ For low seeding with α = 1, this can be simplified numerically ask · 7.1 ± 1 √ N −1 , while for high seeding with α = 2, we instead obtain the numerical simplificationk · 16 where Λ is the Lagrange multiplier. Maximizing this objective J with respect to k j gives precisely the same condition as in Proposition 7. Maximizing with respect toγ i gives Multiplying byγ i and summing over all clones i = 1 to N gives Applying condition (106), we can simplify this to obtain , and so condition (107) becomes Therefore the maximum likelihood estimates for k j andγ i are jointly in the nullspace of β ij , such that the sample bias for the log-scaled clone frequency data vanishes when averaged over the metastases proportionally to their influxes k j , and also vanishes when averaged over the clones proportionally to their seeding frequenciesγ i .
Proposition 8. Consider a tertiary metastasis that is not directly seeded by the primary tumor, but is instead seeded indirectly via a secondary site such as a lymph node or larger metastasis. The ratio of the mean clonal diversity of a mature tertiary tumor D 3 to that of the primary tumor D 1 can be computed as a product over each of the two seeding steps: where k 2 1 denotes the seeding influx from the primary tumor to the secondary site, and k 3 2 denotes the seeding influx from the secondary site to the tertiary metastasis, assuming that the secondary site reaches a steady size prior to seeding the tertiary site (Fig. S2F, Fig. S7).
Proof: The result is a straightforward application of Proposition 6 independently for each of the two seeding steps, and in principle this approach can be extended for any number of consecutive seeding steps as follows: We also note that the covariance in a clonal frequency γ i between two tertiary tumors must be equal to the variance in the clonal frequency at the secondary site. By defining an effective flux k 3 1 between the primary and the tertiary sites such that D 3 D 1 = , we find that this effective flux is given by the harmonic mean of the stepwise fluxes k 2 1 , k 3 2 and their product, It follows that, when only one seeding step is limiting and the other is very rapid, then k 3 1 is approximately equal to the slower of the two stepwise influxes k 2 1 and k 3 2 , such that we have k 3 1 ≈ min{k 2 1 , k 3 2 }, while if both steps are very slow, then k 3 1 is approximately given by the product of the two stepwise fluxes, k 3 1 ≈ k 2 1 × k 3 2 , as is typical of multistep reaction dynamics.
Corollary 8.1. In the case where every pair of tumors has the same seeding influx, k j+1 j = k, the expected time until the secondary site seeds a surviving lineage in a tertiary tumor (or equivalently reseeds a surviving lineage in the primary tumor) is given bȳ for realistically large tumor sizes Y k 2 ρ , where c γ ≈ 0.577 is the Euler-Mascheroni constant. Proof: The expected number of surviving lineages in the tertiary tumorL 3 is the product of the mean size of the secondary tumorȳ(t) that seeds it (derived in Proposition 1), the per-cell seeding rate = λ/Y , and the lineage survival probability ρ, integrated over time: In terms of the factor = k 2 ρY , the time T * at which the first cell from the secondary site seeds a surviving lineage in another site follows a time-inhomogeneous exponential distribution, where is exceedingly small since Y Since e ≈ 1 for small 1, then rT * ≈ Γ ( ). Either expression has the series expansion for small , and dividing through by the growth rate r gives precisely the result to be proved.
Corollary 8.2. At the expected timeT * at which the secondary tumor first seeds a surviving lineage in another tumor, the expected size of the secondary tumor is a fraction 56%/k of its mature size Y .
Proof: The expected size of the secondary tumor at any time t is given by Proposition 1, andT * is given by Corollary 8.1. Substituting the latter into the former gives where the approximation holds for realistically large tumors Y k 2 ρ e 1+cγ . Because we know that e −cγ ≈ 0.5615, we conclude that the ratio of the expected secondary tumor size at the time it first seeds a surviving lineage to its mature size Y is given by 56.15%/k. For a seeding rate in the range 1 < k < 10, this fraction would be in the range 5.615% to 56.15%. With exponential tumor growth, both of these values are near the end of a tumor's growth phase. Corollary 8.3. At the most likely timeT at which the secondary tumor is detected, the expected number of surviving lineages that it will have seeded at some other site (such as a different metastasis or the primary tumor) is nearly equal to the seeding influx k.
Proof: The mean number of seeded surviving lineages is given by equation 115 to bē with = k 2 ρY as before. Substituting in the median detection timeT from Proposition 3, where we can justify the approximation because ln k 1 and | ln | 1 for small 1. Therefore the seeding influx k is not only the expected number of cells transferred from the primary tumor to the secondary tumor per generation, but also the expected number of cells transferred from the secondary tumor to another site during the entire tumor growth phase.