The age incidence of chronic myeloid leukemia can be explained by a one-mutation model

Communicated by Bertram Kostant, Massachusetts Institute of Technology, Cambridge, MA, August 17, 2006 (received for review June 10, 2006)
Abstract
Chronic myeloid leukemia (CML) is associated with the Philadelphia chromosome, which arises by a reciprocal translocation between chromosomes 9 and 22 and harbors the BCR-ABL fusion oncogene. It is unknown whether any other mutations are needed for the chronic phase of the disease. The CML incidence increases as a function of age with an exponent of ≈3. A slope of 3 could indicate that there are two mutations, in addition to the Philadelphia translocation, that have not yet been discovered. In this work, we explore an alternative hypothesis: We study a model of cancer initiation requiring only a single mutation. A mutated cell has a net reproductive advantage over normal cells and, therefore, might give rise to clonal expansion. The cancer is detected with a probability that is proportional to the size of the mutated cell clone. This model has three waiting times: (i) the time until a mutated cell is produced, (ii) the time of clonal expansion, and (iii) the time until the clone is detected. Surprisingly, this simple process can give rise to cancer incidence curves with exponents up to 3. Therefore, the CML incidence data are consistent with the hypothesis that the Philadelphia translocation alone is sufficient to cause chronic phase CML.
Chronic myeloid leukemia (CML) is a malignant clonal disorder of the hematopoietic system that leads to increased numbers of myelocytes, erythrocytes, and thrombocytes in peripheral blood (1, 2). The molecular hallmark of CML is the Philadelphia chromosome, first described as shortened chromosome 22 in 1960 (3) and then as a reciprocal t(9;22) translocation in 1973 (4). This abnormal chromosome is found in cells from the myeloid, erythroid, megakaryocytic, and B lymphoid lineages, indicating the presence of a “cancer stem cell” that is capable of producing several types of differentiated cancer cells (2). The Philadelphia chromosome is present in 95% of patients. The remaining 5% have complex or variant translocations that have the same result: fusion of the breakpoint cluster region (BCR) gene on chromosome 22 to the Abelson leukemia virus (ABL) gene on chromosome 9. The chimeric oncogene BCR-ABL encodes a constitutively active cytoplasmic tyrosine kinase. This protein activates growth and differentiation pathways in hematopoietic cells; its effectors include RAS, RAF, MYC, JUN, STAT, and phosphatidylinositol 3-kinase (1). BCR-ABL initiates a process to transform hematopoietic cells such that their growth and survival become independent of cytokines (5).
The question of whether BCR-ABL is necessary and sufficient to cause CML has been addressed in different ways. Several laboratories have reproduced a CML-like disease in mice expressing the BCR-ABL oncogene alone (6–8) or in combination with v-abl (9). Exposure to ionizing irradiation was shown to increase the risk of acquiring CML, but disease usually develops only after a prolonged latent period (10); this finding could indicate that several mutations are necessary for CML, but it also could mean that the Philadelphia-positive cell clone has a slow rate of expansion. Finally, ≈30% of healthy individuals express the BCR-ABL oncogene at very low levels (11, 12). This observation, however, does not prove that additional genetic changes are necessary for CML because it can alternatively be explained by the expression of BCR-ABL in non-self-renewing differentiated cells rather than hematopoietic stem cells. In that case, the continuous production of healthy hematopoietic cells will eventually replace the Philadelphia-positive cell clone and the disease will be “washed out” of the system (13). Hence, the experimental evidence of how many mutations are necessary to cause CML is inconclusive.
The multistep theory of carcinogenesis was inspired by the observation that the cancer incidence increases as a higher order function of age. In the 1950s, Nordling, Armitage, and Doll (14, 15) pointed out that the cancer incidence data could be explained by an increasing somatic mutation rate with age, or by the requirement of several events to cause cancer (16). In 1957, a simple deterministic model of two mutations was shown to produce incidence curves with slope six if the intermediate mutants lead to clonal expansion (17). Later, Moolgavkar et al. (16, 18, 19) introduced modeling of incidence curves based on multistate branching processes including cell proliferation and death.
Based on these insights, many mathematical models have been developed that investigate the number of mutations needed to cause particular kinds of cancer (19–22). A model fitted to the age-specific incidence of colorectal cancer proposes that two rare events and one high-frequency event suffice to initiate clonal expansion of the mutated stem cell, and only one more mutation is needed to give rise to an adenomatous polyp (19). The age-specific incidence of CML was used to calibrate a multistage model, which predicts that three stem cell mutations are necessary for the chronic phase of the disease (20). These models neglect the population size and structure of the cells that are prone to accumulating mutations. However, the population genetics of susceptible cells must be considered if meaningful conclusions are to be drawn. For example, depending on the effective population size as compared with the mutation rate, it can take two, one, or even zero rate-limiting hits to accumulate two genetic alterations (23–26). The number of rate-limiting hits also can be larger than the actual number of mutations because of time delays due to clonal expansion (17). Therefore, the number of mutations necessary to cause a particular type of cancer cannot simply be read off the age-specific incidence data, and mathematical analyses must take into account population size and structure.
In the following, we offer a population genetics analysis of the dynamics of cancer initiation and its epidemiological consequences. Our model is motivated by CML but does not exclusively apply to this disease; rather, it is a general approach to studying many different types of cancer. We use CML as a specific example to show that a single mutation model can generate cancer incidence curves with an exponent of up to 3.
Results and Discussion
Assume that a population of N (hematopoietic stem) cells proliferates according to the Moran process (27, 28): Initially, all cells are wild type. Cells divide every τ days. At each time step, a cell is chosen for reproduction proportional to fitness. The newly produced daughter cell replaces another randomly chosen cell (Fig. 1). Hence, the population size remains strictly constant. A wild-type cell gives rise to a mutated cell with probability u per cell division. Back mutation is neglected. A mutated cell has relative fitness r: if r = 1, then the mutation is neutral and the cell has the same growth rate as a wild-type cell; if r > 1, the mutant is advantageous, and if r < 1, the mutant is disadvantageous as compared with wild-type cells. We assume that the probability to diagnose the disease is linearly proportional to the number of mutated cells in the population. If there are x_{1} mutant cells, the rate of detection is qx_{1}. We perform exact numerical simulations of this process and compare the results with analytic approximations.
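The process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for the paper: the function name `simulate_detection_time` is hypothetical, one elementary birth-death step is assumed to last τ/N days (so that N divisions occur per generation time τ), the replaced cell is drawn uniformly from the whole population, and detection within a step is drawn from the rate qx_{1}.

```python
import math
import random


def simulate_detection_time(N, u, r, q, tau, t_max, rng):
    """Simulate one individual; return the detection time in days,
    or None if the disease is not detected before t_max."""
    x = 0            # current number of mutant cells (all wild type at start)
    dt = tau / N     # assumed duration of one elementary birth-death step
    t = 0.0
    while t < t_max:
        # Choose the reproducing cell proportional to fitness
        # (mutants have fitness r, wild-type cells have fitness 1).
        p_mutant_parent = r * x / (r * x + (N - x))
        parent_is_mutant = rng.random() < p_mutant_parent
        # A wild-type parent produces a mutant daughter with probability u;
        # back mutation is neglected.
        daughter_is_mutant = parent_is_mutant or (rng.random() < u)
        # The daughter replaces a uniformly chosen cell (assumption).
        dead_is_mutant = rng.random() < x / N
        x += int(daughter_is_mutant) - int(dead_is_mutant)
        t += dt
        # Detection occurs at rate q * x per day.
        if x > 0 and rng.random() < 1.0 - math.exp(-q * x * dt):
            return t
    return None
```

Repeating the call for many simulated individuals and recording the fraction detected before each age gives an empirical cumulative incidence curve. A pure-Python loop of this kind is slow for realistic values of N and u; it is meant to show the update rule only.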
In Methods, we show that the probability of detecting the disease before time t is given by
$$P(t) = \int_0^t b\,e^{-bm}\left\{1 - \left[1 + \frac{e^{c(t-m)} - 1}{N(1 - 1/r)}\right]^{-qN/c}\right\} dm \qquad [1]$$
Here, the integration variable m is the time at which the first successful mutant arises, b = Nu(1 − 1/r)/τ, and c = (r − 1)/τ. The stochastic process is characterized by three waiting times: (i) the waiting time until production of the first successful mutated cell is given by 1/b; (ii) the time for clonal expansion of its lineage is given by ln[N(1 − 1/r)]/c, and (iii) the time until detection of the disease is given by 1/qN. This stochastic process can give rise to incidence curves with exponents up to 3. Define t_{0} as the age at which we are interested in the slope of the incidence; for example, t_{0} = 50 years. The factor b is always much less than the inverse of t_{0} because otherwise, almost everybody would be diagnosed with cancer. Patients are diagnosed with a probability proportional to their number of leukemic stem cells, and if this probability is small, then the average time to detection might be >100 years. If qN ≫ 1/t_{0} and c ≫ 1/t_{0}, then the incidence increases with an exponent of 1. In this case, the waiting time is dominated by the time it takes to produce the first successful mutant, and the time needed for clonal expansion and detection is negligible and, hence, does not contribute to the slope. If qN ≪ 1/t_{0} and c ≫ 1/t_{0}, then the slope is 2 + α, where α is between 0 and 1. Finally, if qN ≪ 1/t_{0} and c ≪ 1/t_{0}, then the slope can take a value of 3. The contribution of clonal expansion to the slope is only important if the time for clonal expansion is sufficiently long, i.e., if the mutated cell clone has a small fitness advantage. Therefore, a simple one-mutation model can give rise to incidence curves with an exponent of up to 3. Fig. 2 compares the exact stochastic simulation with Eq. 1 for exponents 1–3.
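The dependence of the exponent on the three waiting times can be explored numerically. The sketch below assumes that the cumulative probability has the form that follows from the derivation in Methods, P(t) = ∫ b e^{−bm} {1 − [1 + (e^{c(t−m)} − 1)/(N(1 − 1/r))]^{−qN/c}} dm over 0 ≤ m ≤ t, and evaluates it with a simple trapezoidal rule. The function names, the step count, and the parameter values in the usage note are illustrative assumptions, not values from the paper.

```python
import math


def cumulative_detection_probability(t, N, u, r, q, tau, steps=2000):
    """Approximate P(t) by the trapezoidal rule (requires r > 1 and t > 0)."""
    b = N * u * (1.0 - 1.0 / r) / tau   # rate of successful mutants
    c = (r - 1.0) / tau                 # growth rate of the mutant clone
    K = N * (1.0 - 1.0 / r)             # inverse of the initial frequency x(0)

    def integrand(m):
        cs = c * (t - m)
        # log(1 + (e^{cs} - 1) / K), rearranged for large cs to avoid overflow
        if cs < 30.0:
            log_term = math.log1p(math.expm1(cs) / K)
        else:
            log_term = cs - math.log(K) + math.log1p((K - 1.0) * math.exp(-cs))
        undetected = math.exp(-(q * N / c) * log_term)
        return b * math.exp(-b * m) * (1.0 - undetected)

    h = t / steps
    total = 0.5 * (integrand(0.0) + integrand(t))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h


def loglog_slope(t, rel_step=0.01, **params):
    """Local exponent d ln P / d ln t, by a central finite difference."""
    lo = cumulative_detection_probability(t * (1.0 - rel_step), **params)
    hi = cumulative_detection_probability(t * (1.0 + rel_step), **params)
    return (math.log(hi) - math.log(lo)) / (
        math.log1p(rel_step) - math.log1p(-rel_step)
    )
```

For example, `loglog_slope(50 * 365.0, N=1000, u=1e-8, r=2.0, q=1.0, tau=1.0)` probes a regime with fast expansion and fast detection (qN ≫ 1/t_{0} and c ≫ 1/t_{0}), where the exponent should be close to 1. Lowering q and bringing r closer to 1 moves the local slope toward larger values.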
The age-specific incidence data for CML were obtained from the Surveillance, Epidemiology, and End Results (SEER) registry, which covers ≈10% of the U.S. population (www.seer.cancer.gov). In total, 5,256 CML cases were observed between 1973 and 2002, and these cases were recorded in age classes of five years (Table 1, columns 1 and 2). To calculate the probability to get CML per year from these data, some adjustments are made. First, the cases recorded in SEER have to be normalized to account for the number of cases per year. In 2000, there were ≈4,400 CML cases (29); hence, each SEER entry is multiplied by 4,400/5,256 (Table 1, column 3). This operation gives the number of CML cases per year per age class. To obtain the probability to get CML per age class per year, the number of cases has to be normalized by the number of susceptible people. Hence, we divide the cases by the U.S. Census data from 2000 (Table 1, column 4) and get probabilities p_{i} to be diagnosed with CML per year of age. Finally, these probabilities p_{i} are used to calculate the probabilities P_{k} to be diagnosed with CML anytime before age k (measured in years); the values of P_{k} are given in Table 1, column 5. The resulting incidence curve is a nearly straight line on a doubly logarithmic plot with slope 2.86.
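The normalization steps above can be sketched as follows. Table 1 is not reproduced here, so the code takes the case counts and census populations as inputs and no data values are supplied. The rule used for the cumulative probability, P_{k} = 1 − ∏(1 − p_{i}) taken over all years of age before k, is an assumption made for this illustration, as are the function names and the least-squares fit of the slope.

```python
import math


def yearly_probabilities(seer_cases, population, us_cases_per_year=4400):
    """Scale SEER counts so that they sum to the yearly U.S. case number,
    then divide by the census population of each age class."""
    scale = us_cases_per_year / sum(seer_cases)
    return [scale * cases / pop for cases, pop in zip(seer_cases, population)]


def cumulative_probabilities(p, class_width=5):
    """Probability of diagnosis before the upper age limit of each class,
    assuming P_k = 1 - prod(1 - p_i) over the years of age before k."""
    result = []
    undiagnosed = 1.0
    for p_i in p:
        # Each age class spans class_width years with yearly probability p_i.
        undiagnosed *= (1.0 - p_i) ** class_width
        result.append(1.0 - undiagnosed)
    return result


def fit_loglog_slope(ages, values):
    """Least-squares slope of log(values) against log(ages)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx
```

With the columns of Table 1 as inputs, `fit_loglog_slope` applied to the upper age limits and the cumulative probabilities gives the exponent of the incidence curve under these assumptions.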
We compare these data with the direct computer simulation and with Eq. 1 and find that our simple one-mutation model can be fit to the CML incidence curve (Fig. 3). Hence, the Philadelphia chromosome alone might be sufficient to cause chronic phase CML.
In this work, we have shown that a simple one-mutation model can give rise to cancer incidence curves with an exponent up to 3. We assume that the susceptible cells proliferate according to a stochastic process. The probability of diagnosis of the disease increases linearly with the number of mutated cells. We compare exact simulations of this process with an analytical formula for the cumulative probability of being diagnosed until a certain age. The stochastic process is characterized by three waiting times: the time until the first successful mutant is produced, the time of clonal expansion, and the time until detection of the disease. If the time until production of the first successful mutant is the only dominant waiting time, then the resulting incidence data increase with exponent 1. Depending on the magnitude of the other two waiting times, epidemiological data with exponents up to 3 can be obtained. These results, together with previous work on the dynamics of tumor suppressor gene inactivation (23–26), suggest that the incidence curves of cancer cannot be used to estimate the number of mutations necessary to cause the disease. Population size, mutation rates, cell division times, and fitness values all interact to determine the number of rate-limiting hits a mutation causes, and this number can be larger or smaller than the actual number of mutations. Thus, a meaningful mathematical analysis of cancer incidence must take the population genetics of the susceptible cells into account.
As a specific example, we have studied the age-specific incidence data of CML, which were obtained from the SEER registry. We calculate the cumulative probability of developing CML from the SEER cases and find that the incidence has a slope of 2.86 on a doubly logarithmic plot. We fit this slope with the one-mutation model and identify appropriate parameter values. Therefore, the hypothesis that the Philadelphia chromosome alone is sufficient to cause chronic phase CML is consistent with the observed incidence curve (16, 30). However, further experimental efforts must be made to firmly establish whether BCR-ABL is sufficient to cause chronic phase CML.
Methods
We consider a population of N cells following the Moran process with a mean cell generation time of τ. Initially all cells are wild type. A mutant cell is produced with probability u per cell division and has a relative fitness r. We assume that r > 1. Diagnosis of the disease occurs as a stochastic event described by a continuous-time Markov transition model. The probability of detection increases proportionally to the number of mutant cells: If there are x_{1} mutant cells, the rate of detection is qx_{1}. Let P(t) be the probability that the cancer is detected before time t; it is the cumulative incidence of cancer. The incidence curves of many cancers (apart from childhood cancers) are approximately straight lines on a doubly logarithmic plot.
The stochastic process can be calculated as follows: at a random time, a mutation occurs that eventually will become fixed in the population. The appearance of the first successful mutant is a random event occurring at rate b = Nu(1 − 1/r)/τ. This rate is the product of the expected number of mutants produced per unit of time and the probability of nonextinction of the cell lineage. An advantageous mutation may go extinct because of the stochasticity caused by the small population size, but once the number of mutant cells becomes sufficiently large, the lineage grows deterministically, and extinction can be neglected. We approximate the clonal expansion of mutant cells by the logistic equation
$$\frac{dx}{da} = c\,x(1 - x) \qquad [2]$$
Here, a is the time since the mutation occurred, x(a) is the frequency of mutants in the population at that time where 0 < x(a) < 1, and c = (r − 1)/τ. For a clone that starts from a single cell, x(0) = 1/N, this equation implies that the frequency of mutants increases as x(a) = 1/(1 + (N − 1)exp[−ca]).
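The closed form x(a) = 1/(1 + (N − 1)exp[−ca]) is the solution of the logistic equation dx/da = cx(1 − x) with x(0) = 1/N. The short check below integrates the equation with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form. The function names, the step count, and the parameter values in the usage note are illustrative choices.

```python
import math


def logistic_closed_form(a, N, c):
    """Mutant frequency at time a for a clone started from one cell."""
    return 1.0 / (1.0 + (N - 1.0) * math.exp(-c * a))


def logistic_numeric(a_end, N, c, steps=10000):
    """Integrate dx/da = c x (1 - x) from x(0) = 1/N with RK4."""
    def growth(x):
        return c * x * (1.0 - x)

    x = 1.0 / N
    h = a_end / steps
    for _ in range(steps):
        k1 = growth(x)
        k2 = growth(x + 0.5 * h * k1)
        k3 = growth(x + 0.5 * h * k2)
        k4 = growth(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x
```

For instance, with N = 1000 and c = 0.01 per day, both functions can be evaluated at a = 1000 days and compared. The clone reaches a frequency of one half at a = ln(N − 1)/c, which shows how the expansion time scales with the logarithm of the population size.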
The probability of cancer detection per unit time is given by qNx. Hence, the cumulative risk of cancer detection at or before time t is given by
$$1 - \exp\left[-\int_0^t qN\,x(a)\,da\right]$$
if the mutant arises at time t = 0 and its lineage grows until time t following the deterministic growth rule given by Eq. 2. The time of appearance of the first successful mutant, m, follows a negative exponential distribution with rate b = Nu(1 − 1/r)/τ. This stochasticity is much greater than the stochasticity caused by the clonal expansion; the latter is approximated by Eq. 2. The derivation of the following formula neglects the stochasticity related to clonal expansion.
The risk of detection at or before time t depends on the time m at which the first successful mutant arises. There is a small probability that the first successful mutant arises exceptionally early (small m); for this unlucky patient, the risk is much higher than for the average person, and it is those unlucky patients that will be diagnosed first. Therefore, we have to average the risk with respect to m. The probability of detection for a particular patient who has a successful mutation event at m is given by
$$1 - \exp\left[-\int_0^{t-m} qN\,x(a)\,da\right]$$
and the averaged risk of diagnosis is given by
$$P(t) = \int_0^t b\,e^{-bm}\left\{1 - \exp\left[-\int_0^{t-m} qN\,x(a)\,da\right]\right\} dm$$
Here, the frequency of the mutant cells conditional on fixation is given by the deterministic model, Eq. 2. However, some trajectories go extinct, and, hence, the deterministic model underestimates the trajectory of clonal growth conditional on fixation. For a more accurate approximation, we use the initial condition x(0) = 1/[N(1 − 1/r)] in Eq. 2. Hence, we have
$$P(t) = \int_0^t b\,e^{-bm}\left\{1 - \exp\left[-qN\int_0^{t-m} \frac{da}{1 + [N(1 - 1/r) - 1]\,e^{-ca}}\right]\right\} dm$$
If the integral is calculated with respect to a, we have
$$P(t) = \int_0^t b\,e^{-bm}\left\{1 - \left[1 + \frac{e^{c(t-m)} - 1}{N(1 - 1/r)}\right]^{-qN/c}\right\} dm$$
which also can be written as Eq. 1 in Results.
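The inner integral over a can be carried out in closed form. The following worked step is a sketch under the assumptions stated in the text (logistic growth with rate c, initial condition x(0) = 1/[N(1 − 1/r)], detection rate qNx); the shorthand K = N(1 − 1/r) and the upper limit s = t − m are introduced here only for readability.

```latex
\begin{align*}
x(a) &= \frac{1}{1 + (K - 1)\,e^{-ca}} = \frac{e^{ca}}{e^{ca} + K - 1},
  \qquad K = N\left(1 - \tfrac{1}{r}\right), \\
\int_0^{s} x(a)\,da
  &= \frac{1}{c}\Big[\ln\!\left(e^{ca} + K - 1\right)\Big]_0^{s}
   = \frac{1}{c}\,\ln\!\left[1 + \frac{e^{cs} - 1}{K}\right], \\
\exp\!\left[-qN\int_0^{s} x(a)\,da\right]
  &= \left[1 + \frac{e^{cs} - 1}{K}\right]^{-qN/c}.
\end{align*}
```

Setting s = t − m and averaging over the exponentially distributed mutation time m, with density b e^{−bm}, then gives the cumulative probability of detection as a single integral over m.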
Acknowledgments
The program for Evolutionary Dynamics at Harvard University is supported by Jeffrey Epstein.
Footnotes
 ^{†}To whom correspondence should be addressed. Email: franziska_michor{at}harvard.edu

Author contributions: F.M. designed research; F.M., Y.I., and M.A.N. performed research; F.M. analyzed data; and F.M., Y.I., and M.A.N. wrote the paper.

The authors declare no conflict of interest.
Abbreviations: ALV, Abelson leukemia virus; BCR, breakpoint cluster region; CML, chronic myeloid leukemia; SEER, Surveillance, Epidemiology, and End Results.
 © 2006 by The National Academy of Sciences of the USA
References
5. Gishizky ML, Witte ON
7. Daley GQ, VanEtten RA, Baltimore D
9. Kelliher MA, McLaughlin J, Witte ON, Rosenberg N
10. Lichtman M
11. Biernaux C, Loos M, Sels A, Huez G, Stryckmans P
12. Bose S, Deininger M, Gora-Tybor J, Goldman JM, Melo JV
13. Nowak MA, Michor F, Iwasa Y
19. Luebeck EG, Moolgavkar SH
22. Frank SA
23. Nowak MA, Michor F, Komarova NL, Iwasa Y
24. Iwasa Y, Michor F, Nowak MA
27. Moran P
29. American Cancer Society