The fractional coalescent

A new approach to the coalescent is introduced. The fractional coalescent (f-coalescent) extends the Kingman's n-coalescent allowing more variability in the patterns of the waiting times. These changes in the waiting times could be caused by population size changes or selection processes. The f-coalescent is derived by applying fractional calculus, in particular the fractional Poisson process which itself is based on the expansion to the fractional waiting times of the conventional Poisson process. The f-coalescent introduces an additional parameter α affecting the steepness of the waiting time function. With an α = 1, Kingman's n-coalescent and the f-coalescent are equivalent. The f-coalescent has been implemented in the population genetic model inference software MIGRATE. With an α = 1, as expected, both, the n-coalescent and f-coalescent implementation deliver the same results. When either data is simulated under models that allow for α < 1 or for real datasets, then Bayes factor comparisons show an improved model fit of the f-coalescent over the n-coalescent.


Introduction
introduced the n-coalescent. The n-coalescent describes the probability density function of a genealogy of samples embedded in a population with fixed size. Extensions to this probabilistic description of the genealogical process include changing population size (Kuhner et al. 1998) , immigration (Notohara 1990;Wilkinson-Herbots 1998), population divergence (Takahata 1988), selection (Neuhauser and Krone 1997), and recombination (Griffiths and Marjoram 1996). These theoretical advances resulted in several widely use computer packages that estimate various population parameters (for example, (Beerli 2006;Hey and Nielsen 2004;Kuhner 2006)). While the waiting times for events in the n-coalescent are exponentially distributed, The extension of this distribution can be introduced by using Fractional calculus. Fractional calculus have attracted considerable interest because of their ability to model complex phenomena such as continuum and statistical mechanics (Carpinteri and Mainardi 2014), visco-elastic materials (Bagley and Torvik 1985), and solid mechanics (Rossikhin and Shitikova 1997). Our work explores the use of the fractional Poisson process in the context of the coalescent. The fractional Poisson process was derived from the standard Poisson process for two different cases: the time-fractional Poisson process and the space-fractional Poisson process. These two different cases can be combined to introduce the time-space-fractional Poisson process (Orsingher and Polito 2012). We focus on the time-fractional Poisson process. Time-fractional generalizations of the homogeneous Poisson process are based on the substitution of the integer-order derivative operator appearing in Cauchy problems with a fractional-order derivative operator such as the Riemann-Liouville fractional derivative (Laskin 2003) or the Caputo fractional derivative (Beghin et al. 2009) (see the Appendix The Mittag-Leffler Functions). Properties of this time-fractional Poisson process have been studied by (Meerschaert et al. 2011) and (Politi et al. 2011). Laskin (2003) introduced a generalization of the time-fractional Poisson process expressing waiting times using Mittag-Leffler functions, which we use for our discussion.
We use the fractional Poisson process (Laskin 2003) to introduce a new model of coalescent, the fractional coalescent or fcoalescent. This f -coalescent is then implemented in a Bayesian estimator of effective population size and immigration rate; we discuss the implementation and runtime characteristics. We explore the quality of the inference for simulated datasets and also apply the new method to existing real data sets of humpback whales, the malaria parasite Plasmodium falclparum, and the H1N1 influenza virus stain.

Fractional Coalescent
Looking backward to the time, in a Wright-Fisher population of size N (haploid), the probability that two randomly selected individuals have the same parent in the previous generation is 1 N , and the probability that they have different parents is 1 − 1 N . The probability that they do share a common parent after t generations is (1 − 1 N ) t . By using a suitable timescale τ such that one unit of scaled time corresponds to N generations, the probability that the two lineages remain distinct for more than one unit of scaled time is as N goes to infinity. Thus, in the limit, the coalescence time for a pair of lineages is exponentially distributed with mean 1 (Nordborg 2001).
If we assume not only the probability of having same parents for two individuals is not 1 N , but there is also m different scenarios changing the coalescence probability then the neutral Kingman's n-coalescent becomes incorrect. For each scenario, we can introduce a coalescence probability as κ i 1 N for all i = 1, ..., m. If we assume that the scenario i is not changing among generation then the probability that the two lineages do not coalescent is To make a generalization of Eq. (2), having different scenarios in each generation, we introduce mapping function ω i such that the probability that the two lineages do not coalescent is As a conclusion, the total probability that the two lineages remain distinct for more than one unit of scaled time is as N goes to infinity, where E α (−τ α ) is the Mittag-Leffler function (see Appendix The Mittag-Leffler Functions) and α is dependent on ω i and κ i . Thus, in the limit, the coalescence time for a pair of lineages is distributed as the fractional generalization of the exponential distribution with rate 1 (MacNamara et al. 2017). This result allowed us to descibe a new framework of the coalescent which we call fractional coalescent or f -coalescent for short. The f -coalescent is based on the fractional Poisson Process which is a generalization of the classical Poisson process. The n-coalescent generalizes the two-lineages framework to k lineages; we similarly expand the f -coalescent from two to k lineages. The details of the proof for Eq. (4) are presented in the Appendix (The Mittag-Leffler Functions). The waiting times in Kingman's n-coalescent have an exponential distribution and we express the probability density function of a time interval u k with k lineages as where with the mutation-scaled population size Θ = 4N e µ where N e is effective population size and µ is mutation rate per generation. For the f -coalescent we express the waiting time with the fractional generalization of the exponential probability distribution. The exponential function is replaced by the Mittag-Leffler function E α,α which has an additional parameter α (The Mittag-Leffler Functions) as We consider values for α in the interval (0, 1]. The Mittag-Leffler function reduces to the exponential function with α = 1. Details of the derivation of E α,α are described in the Appendix. The probability density of all observed u k with k = 2, ..., K given the mutation-scaled population size Θ is the joint probability distribution function as where K is the number of samples. To extract a particular genealogy G out of the many possible topologies defined by the interval times u 2 , u 3 , ..., u K we need to take into account the number of possible configurations at any time u k ; using the Kingman's n-coalescent for any u k and k lineages there are ( k 2 ) potential configurations and we pick one particular one. If we use the same assumption for our f -coalescent that only two lineages per generation can coalesce then we get

Extending to two populations
With two populations, we have two population size Θ 1 and Θ 2 , two different number of lineages k 1 and k 2 , and two coalescent merging rates Also, suppose the mutation-scaled immigration rate from population one to population two is λ 12 = k 2 M 1→2 , and the immigration rate from population two to population one is λ 21 = k 1 M 2→1 . By these assumptions, the total rate of waiting time is λ = λ 1 + λ 2 + λ 12 + λ 21 . The probability distribution function for every time interval x on the genealogy G using f -coalescent is If a coalescent happens in population 1 If a coalescent happens in population 2 If a migration happens from population 2 to 1 If a migration happens from population 1 to 2 (9) where after simplification we have where u is the time of interval x and E α,α is the Mittag-Leffler function.

Implementation
We implemented our method in the existing framework of our software MIGRATE (Beerli 2006). In this framework we approximate the Bayesian posterior density p(ρ|X, α) = p(ρ)p(X|ρ, α)/p(X|α) where X is the data, ρ is the parameter set for a particular population model, and α is a fixed parameter for the Mittag-Leffler function. The software uses MCMC to approximate the posterior density, calculating f (G|ρ), p(ρ), and p(X|ρ, α). To draw the time, we choose the random number which shows the probability of the event so we have where r is a random numbers between (0, 1), so for finding t 0 we solve Eq. (11) by using the bisection method. Also, for drawing the time we suppose λ = (k(k − 1)/(4µN e ) where k is the number of lineages at the beginning of the interval, N e is the effective population size and µ is the rate of mutation. It should be noted that for calculating the integral in Eq. (11), we use the properties of Mittag-Leffler function (see Appendix The Fractional Poisson Process).
A particular problem for a useful implementation was the calculation time of the generalized Mittag-Leffler function.
We programmed the function in the C-language using the algorithm described by (Gorenflo et al. 2002) that also was used to create a Matlab-function by I. Podlubny (https://www.mathworks.com/matlabcentral/fileexchange/8738mittag-leffler-function). We compared our C Mittag-Leffler function with the one implemented in Mathematica 11.0.1.0 (Wolfram Research 2017) for correctness. During these tests we realized that our implementation will be very slow in a MCMC run, because the function will be called millions of times. We implemented a lookup

Methods
Time to the most recent common ancestor for the f -coalescent: In a coalescent framework the root of a genealogy is also called the most recent common ancestor of the sampled individuals. The time of the most recent common ancestor (TMRCA) is directly related to the effective population size of the population the sample was taken from. We assume that each time interval j is independent from the time interval j − 1 or j + 1. With a single parameter, we only need to consider the time intervals and do not need to know details of the topology of the genealogy. We compared empirical distributions of time intervals for a sample of 5 individuals for the f -coalescent, n-coalescent, and Bolthausen-Sznitman-coalescent (BS-coalescent; Bolthausen and Sznitman 1998). BS-coalescent allows merging of more than two lineages and is used to describe the waiting times in a genealogy that is affected by selection (Neher and Hallatschek 2012). All calculations were executed in Mathematica 11.0.1 (Wolfram Research 2017). For the n-coalescent, BS-coalescent, and the fcoalescent we summed independently drawn time intervals for 5, 4, 3, and 2 lineages to get the TMRCA. We drew 10,000 replicated TMRCAs for each of the three n-coalescent settings (strong population shrinkage, fixed population size, strong population growth), two BS-coalescent settings (T c = 0.005 and 0.01), and the two f -coalescent settings (α = 0.9 and 0.8). For the fixed population size n-coalescent time intervals we solved Eq. (3) in Beerli and Felsenstein (1999); for time intervals of the growing and shrinking population we used Eq. (4.6) in Hein et al. (2005); for the f -coalescent we used Eq. (11).
Simulated data: We evaluated the algorithms using simulations. We updated our simulator package (available at peterbeerli.com/software and bitbucket.com) to allow generating genealogies from the f -coalescent. We generated 100-locus datasets with a population size N e = 2500 and mutation rate per site per generation µ = 0.000001 for α in [0.4, 0.5, ..1.0]. Each locus had 10,000 base pairs. Datasets generated with α < 0.4 most commonly did not have any variable sites. The datasets were then analyzed with MIGRATE with a fixed α of 0.4 or 0.5, ..., or 1.0. Each dataset was thus analyzed 7 times with different α values allowing us to compare the different runs considering each α value as a different model. We use the framework of marginal likelihoods (cf. Kass and Raftery 1995) to compare the different models. The method implemented in MIGRATE uses thermodynamic integration to approximate the marginal likelihoods (Beerli and Palczewski 2010;Palczewski and Beerli 2014). We then used these marginal likelihood as model weights (Burnham and Anderson 2002) to decide wether models could be considered to be appropriate for the data or not.
Real data: We used three biological datasets to explore whether the f -coalescent could be a better fit than Kingman's n-coalescent. The first dataset is a dataset of short control region sequences of two humpback whale populations (Roman and Palumbi 2003) (H data), the second is a small dataset of the malaria parasite Plasmodium falciparum (Joy et al. 2003) (P data), and the third dataset is a small dataset of the H1N1 influenza strain the scared the world in 2014 with an outbreak in Mexico (I data) (downloaded on August 7 2017 from www.fludb.org). We used a simple mutation model, F84 + G (Felsenstein and Churchill 1996;Yang 1993), optimized the transition/transversion ratio (ttr) and the site rate parameters in paup* (Swofford 2003). For the whale and Plasmodium data the ttr were 20.07 and 1.56, respectively. The site variation parameter estimates were 0.00456 and 0.00982. The influenza data were heterogenous: segment 1-4 had a site rate variation parameter of ∼ 0.02, and the segments 5-8 had a value of > 200.0. We analyzed the total influenza data as 8 independent loci using the F84 model. Independently, we also analyzed the segments 1-4 using F84 + G with a site rate variation of 0.02 and the segements 5-8 using the F84 model.

Results
Time to most recent ancestor for f -coalescent: Figure 1 shows examples of empirical distributions of the TMRCA for a sample of five individuals for the n-coalescent and the f -coalescent with two different values for α. Each curve is based on 10,000 simulated TMRCA values. With α < 1 the distribution becomes more peaked with more shorter time intervals and rare, but large time intervals, that lead to heavier tails than with the ncoalescent; maximal values for times in our simulations were: for α = 1.0 it was 0.0477, for α = 0.9 it was 181.68, and for α = 0.8 it was 264.959. The expectation of the TMRCA for the n-coalescent for 5 samples simulated with a Θ = 0.01 is 0.008. The expected TMRCA for the f -coalescent is infinite because of the heavy tails (cf. Repin and Saichev 2000). Using averages of the 10,000 samples matches the expectation for the n-coalescent (Θ α=1.0 = 0.00807681), but did not match well for our simulation of the f -coalescent (Θ α=0.8 = 0.0294863,Θ α=0.9 = 0.0570966), because with α < 1.0 we will need many more replicates to sample the heavy tail of the distribution accurately. Comparison with the BScoalescent are more difficult because the parametrization, using Neher and Hallatschek (2012) of the coalescent rates of the BScoalescent and the n-coalescent are not equivalent, the parameter T c in BS-coalescent defines time time to the selectively positive innovation whereas in the f -coalescent and th n-coalescent the rate is defined by the scaled population size. There is certainly a direct relationship between T c and Θ but under selection that relationship changes. We recognize that distributions of the fcoalescent TMRCA and the BS-coalescent TMRCA look rather similar, but the mapping of the parameter T c and Θ will need further investigation.
Simulation: Figure 2 shows that it will be difficult to recover the parameter α that was used to simulate the data; data simulated with a particular α has a considerable range when estimated. It is particularly difficult to accurately estimate α when the data was simulated with low α. A problem with these simulations is that the number of variable sites is dependent on the mutation-scaled population size Θ and the simulated branch length. If Θ = 0.01 among all simulations, the number of variables sites varies considerably over the range of α: with an α = 0.4 about 126 10,000 sites are variable compared to α = 1.0 with 253 10,000 sites. With α < 0.4 it is common to see no variable sites in the dataset. Low variability data sets are difficult to use in any coalescent framework.
Real data results: The analysis of the humpback whale datasets (H data) showed that the gene flow between the South-Atlantic and the North-Atlantic is uneven and that the North-Atlantic receives about 5 individuals per generation but also has a much larger population size than the South-Atlantic. our marginal likelihood model selection suggests that the most appropriate model is either the n-coalescent or then a f -coalescent with a very high α, models with an α < 0.85 have a model probability of 0.0000. The best model with α = 0.95 had modes of Θ N A = 0.04283, Θ SA = 0.01243, M SA→N A = 133.7, andM N A→SA = 30.3 which translates to the number if immigrants Nm N A→SA = 0.28 and Nm SA→N A = 5.72 where NA is North-Atlantic and SA is South-Atlantic. The analysis of the Plasmodium dataset revealed that the Mittag-Leffler tuning parameter α may have a wide range (Fig. 3). Model selection using marginal likelihood suggests that models within the range of 0.55 < α ≤ 1.0 are preferred; models with an α < 0.55 had a model probability of 0.0000. The maximum marginal likelihood was at α = 0.9. The estimated mutation-scaled population size Θ varied considerably in this range. At an α = 0.55, Θ was 0.04817 and at α = 1.0 Θ was 0.00197. The best model had a mode and 95%-credibility for Θ of 0.00470 and from 0.00133 to 0.00860, respectively. Both, the P and the H dataset suggest that the ncoalescent model was a good fit. In stark contrast, the 8-segment dataset of the H1N1 strain of influenza from Mexico in 2014 had a well-defined maximum marginal likelihood at α = 0.8. The influenza dataset was the only dataset among the tree tested dataset that excluded the n-coalescent as a reasonable model with confidence. Model probability for α = 0.9 was 0.0017 and for n-coalescent (α = 1.0) was 0.0000. The independent analysis of segments 1-4 and segments 5-8 did not change the result of the full dataset that the best α is not near 1.0 (The electronic data supplement contains the results of these analyses).

Discussion
A feature of the f -coalescent is the ability to accommodate very variable time intervals. Mixtures of very short branch lengths with very large branch lengths are possible, whereas the ncoalescent forces a more even distribution of these time intervals. Extensions of the n-coalescent to allow for population growth do not match the variability of time intervals of the f -coalescent. With exponential growth, time interval near the sampling date are enlarged and near the root of the genealogy the time intervals are shortened; the n-coalescent with exponentially shrinking populations also have heavy tails, but seem to have more longer branches than the f -coalescent. In the f -coalescent time inter- val near the tips are shortened and time intervals near the root are lengthened. The three real data examples suggest that the f -coalescent is a often a better fit to the data (in all of the test cases). Although, the marginal likelihood comparison of different α for the Humpback whale and the Plasmodium data did not exclude the n-coalescent. Only the influenza dataset rejected the n-coalescent and may indicate that the f -coalescent may improve our understanding of fast-evolving organisms that are under strong selection. We believe that the f -coalescent could serve as a model how to approach loci under selection, where the positively selected locus will sweep quickly through the population and result in many short coalescent intervals. The f -coalescent is a promising extension of the coalescent because it allows considering a wider range of waiting times between events, but certainly will need more study.
Derivatives). Properties of the time-fractional Poisson process have been studied in (Meerschaert et al. 2011) and(Politi et al. 2011). Space-fractional generalizations of the Poisson process are based on the substitution of the backward shift operator space in Eq. (A2) with the fractional backward shift operator (Orsingher and Polito 2012). We use the time-fractional generalization of the homogeneous Poisson process which has been introduced in (Laskin 2003). In this generalization, the fractional Poisson process has been introduced as the counting process with probability P α (n, u) of having n items (n = 0, 1, 2, ...) during time interval u by the following special form of the fractional Kolmogorov-Feller equation where δ n,0 is the Kronecker symbol with normalization condition ∞ ∑ n=0 P α (n, u) = 1, By solving Eq. (A4) we have Laskin (2003) (A5) In this generalization, the probability distribution function of waiting time for the fractional Poisson process can be expressed as where P α (u) is the probability that a given inter arrival time is greater or equal to u where E α is the Mittag-Leffler function see (The Mittag-Leffler Functions). By using Eqs. (A6) and (A7) the waiting time for fractional Poisson process has the following probability distribution function where E α,α is the generalization of the Mittag-Leffler function (see The Mittag-Leffler Functions). The ψ α (u) defined by Eq. (A8) is fractional generalization of the well known exponential probability distribution. It should be noted that the generalization of the Mittag-Leffler function has two different parameters α and β (see The Mittag-Leffler Functions), and as it has been mentioned in Eq. (A8) the fractional Poisson process, has been derived by using the Mittag-Leffler function using ( α = β). Figure 4 shows the time of 100 independent events which happen in three cases of the Poisson process with various α (1.0, 0.9, and 0.8). For calculating these times we have supposed that the probability each event occurs is a random number while λ has been chosen to be fixed and same in all three different cases. With an α < 1 more times became shorter but a few times become longer; and the variance of the times becomes larger.

Figure 4
Times between events drawn using the classical Poisson process and the fractional Poisson process with α = 0.9, and 0.8; from top to bottom..

Definitions of Fractional Derivatives
There are various definitions for fractional derivatives. The widely used definitions include the Riemann-Liouville fractional derivatives (Podlubny 1998). Definition : Riemann-Liouville's fractional derivative of order α is defined as where α > 0 is the order of the derivative and n is the smallest integer greater than α. For the Riemann-Liouville's derivative we have where c is a constant. Replacing c with t ν we get The
This function provides a simple generalization of the exponential function because of the substitution in the exponential series of n! = Γ(n + 1) with (αn)! = Γ(αn + 1). A straightforward generalization of the Mittag-Leffler function is obtained by replacing the additive constant 1 in the argument of the Gamma function in Eq. (A12) by an arbitrary complex parameter β. For this function we use the following notation where ω(κ) is the probability density ω(κ) = λ sin(απ) π κ α−1 κ 2α + 2λκ α cos(απ) + λ 2 .