A method for fine mapping quantitative trait loci in outbred animal stocks

Edited by David E. Housman, Massachusetts Institute of Technology, Cambridge, MA, and approved August 31, 2000 (received for review June 30, 2000)
Abstract
High-resolution mapping of quantitative trait loci (QTL) in animals has proved to be difficult because the large effect sizes detected in crosses between inbred strains are often caused by numerous linked QTLs, each of small effect. In a study of fearfulness in mice, we have shown it is possible to fine map small-effect QTLs in a genetically heterogeneous stock (HS). This strategy is a powerful general method of fine mapping QTLs, provided QTLs detected in crosses between inbred strains that formed the HS can be reliably detected in the HS. We show here that single-marker association analysis identifies only two of five QTLs expected to be segregating in the HS and apparently limits the strategy's usefulness for fine mapping. We solve this problem with a multipoint analysis that assigns the probability that an allele descends from each progenitor in the HS. The analysis does not use pedigrees but instead requires information about the HS founder haplotypes. With this method we mapped all three previously undetected loci [chromosome (Chr.) 1 logP 4.9, Chr. 10 logP 6.0, Chr. 15 logP 4.0]. We show that the reason for the failure of single-marker association to detect QTLs is its inability to distinguish opposing phenotypic effects when they occur on the same marker allele. We have developed a robust method of fine mapping QTLs in genetically heterogeneous animals and suggest it is now cost effective to undertake genome-wide high-resolution analysis of complex traits in parallel on the same set of mice.
Most phenotypes of medical importance can be measured quantitatively, and in many cases the genetic contribution is substantial, accounting for 40% or more of the phenotypic variance. Considerable efforts have been made to isolate the genes responsible for quantitative genetic variation in human populations, but with little success, mostly because genetic loci contributing to quantitative traits (quantitative trait loci, QTL) have only a small effect on the phenotype (1). Association studies have been proposed as the most appropriate method for finding the genes that influence complex traits (2). However, family-based studies may not provide the resolution needed for positional cloning, unless they are very large, whereas environmental or genetic differences between cases and controls may confound population-based association studies (3).
These difficulties have led to the study of animal models of human traits. Studies using experimental crosses between inbred animal strains have been successful in mapping QTLs with effects on a number of different phenotypes, including behavior, but attempts to fine map QTLs in animals often have foundered on the discovery that a single QTL of large effect was in fact caused by multiple loci of small effect positioned within the same chromosomal region (4). A further potential difficulty with detecting QTLs between inbred crosses is the significant reduction in genetic heterogeneity compared with the total genetic variation present in animal populations: a QTL segregating in the wild need not be present in the experimental cross.
In an attempt to circumvent the difficulties encountered with inbred crosses, we have been using a genetically heterogeneous stock (HS) of mice for which the ancestry is known. The heterogeneous stock was established from an eight-way cross of C57BL, BALB/c, RIII, AKR, DBA/2, I, A, and C3H/2 inbred strains (5). Since its foundation 30 years ago, the stock has been maintained by breeding from 40 pairs and, at the time of this experiment, was in its 60th generation. Thus each chromosome from an HS animal is a fine-grained genetic mosaic of the founder strains, with an average distance between recombinants of 1/60 Morgan, or about 1.7 cM.
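The 1.7 cM figure can be checked with a short calculation. This is a sketch that assumes breakpoints accumulate at one expected crossover per Morgan per generation, so that after G generations a chromosome carries on average G breakpoints per Morgan; the symbol \bar{d} for the mean spacing is introduced here for illustration only.

```latex
\bar{d} \;=\; \frac{1}{G}\ \text{Morgan}
       \;=\; \frac{100}{G}\ \text{cM}
       \;=\; \frac{100}{60}\ \text{cM}
       \;\approx\; 1.7\ \text{cM}
       \qquad (G = 60).
```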
Theoretically, the HS offers at least a 30-fold increase in resolution for QTL mapping compared with an F_{2} intercross (6, 7). The high level of recombination means that fine mapping is possible by using a relatively small number of animals; for QTLs of small to moderate effect, mapping to under 0.5 cM is possible with fewer than 2,000 animals. The large number of founders increases the genetic heterogeneity, and in theory one can map all QTLs that account for progenitor strain genetic differences. Potentially, the use of the HS offers a substantial improvement over current methods for QTL mapping.
However, for HS mapping to achieve widespread use, we need to establish its limitations and provide a robust statistical method of analysis. In this paper we describe a multipoint method capable of detecting small-effect QTLs in the HS; we evaluate both its power of QTL detection and the expected degree of QTL resolution. The utility of the method is demonstrated by fine mapping five QTLs for fearfulness in HS mice, only two of which were detectable by single-marker (SM) association.
Materials and Methods
Open-field behavioral testing, genotyping, mapping, and generation of markers were performed as described in ref. 8. The following microsatellites were generated: chromosome 1 markers, 103.37 ATAGAACCTGGTGCCTGTGG, TCCCCAGGAGAAGACACAAG and 103.64B AAGGGTTCTGAGGTGCAGAA, TAGTGGTGCACATCTGCA; and chromosome 12 markers, 419.2 TCCAGATCTCCCCACAGTTC, CCACACTCCAGGAAAGGATC, 419.19 GGCAGTGGTAATCAGGATGTG, TCCCTTCTCCTGGTTGTTGT, and 419.21 TCACTGGGCTCTAACCTTGG, GTAAAATGGTGGCAGTGGTG.
Statistical Theory
Failure of SM Association Analysis.
It has been noted in association studies in human populations that SM association analysis may fail to detect QTLs expected to be segregating (1). We encountered the same problem in a study (8) of open-field behaviors of HS mice, a validated animal model of susceptibility to anxiety (9). We typed a total of 67 markers approximately 1 cM apart on 750 HS mice, over five regions where previous F_{2} intercrosses had detected QTLs (refs. 10 and 11; Table 1). We expected to confirm QTLs in all five regions because the strains that were used in the F_{2} detection experiments were among the founders of the HS.
We used SM analysis of variance to map the QTLs. At each marker the animals were grouped according to their genotype, and one-way ANOVA was used to test for significant differences between the group means. Marker-QTL association was indicated by a significant F statistic in the ANOVA. We confirmed and fine mapped QTLs in only two of the five regions (Table 1). On chromosome 1 a QTL accounting for 6% of the phenotypic variance was mapped into an interval of 0.8 cM, so in some circumstances SM association works well. We therefore sought an explanation for the three failures.
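The single-marker test described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `one_way_anova_f` and the toy genotypes and phenotypes are invented for the example, and only the F statistic is computed (converting it to a P value needs the F distribution, which is omitted here).

```python
from collections import defaultdict


def one_way_anova_f(genotypes, phenotypes):
    """Return (F, df_between, df_within) for phenotypes grouped by marker genotype."""
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)

    n = len(phenotypes)
    k = len(groups)
    grand_mean = sum(phenotypes) / n

    # Between-group sum of squares: spread of the genotype means around the grand mean.
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Within-group sum of squares: spread of animals around their own genotype mean.
    ss_within = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (hypothetical): six animals, three genotype classes at one marker.
genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
phenotypes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f_stat, df_b, df_w = one_way_anova_f(genotypes, phenotypes)
```

Because animals are grouped only by the alleles they carry at the marker, two founder strains that share an allele fall into the same group regardless of their QTL effects.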
One possible reason is that genetic drift in the HS has resulted in allele fixation, but computer simulations of the HS breeding protocol indicate that only 5% of the genome should be fixed, consistent with the observed level of marker homozygosity. In fact, the explanation is that alleles of the same size are descended from different strains. SM association analysis does not use information about the founder haplotypes or from neighboring markers and cannot distinguish between strains having different QTL effects but identical alleles at a nearby marker. At most markers there are only two or three alleles, so one cannot determine from which of the eight strains a single allele has descended.
For example, consider two markers (D1Mit100, D1Mit496), less than 500 kb apart, near the QTL at position 64 cM on chromosome 1 (Fig. 1A). SM ANOVA yielded a logP of 5.35 at D1Mit100, but gave a logP of 0.04 at D1Mit496 (all significance levels are given here as logP values, i.e., −log_{10} P, so, e.g., a P value of 10^{−4} corresponds to logP 4). The proximity of the markers rules out recombination as a reason for the difference in significance. Rather, the important difference between D1Mit496 and D1Mit100 is that strain RIII can be distinguished from A/J, C3H, and I at D1Mit100.
A Multipoint Model Using Progenitors.
To incorporate information from flanking markers and the progenitor haplotypes, we developed a multipoint method that determines the probability of each founder strain being the ancestor of a given allele in the HS. QTLs then are detected by testing for differences between the genetic effects of the progenitor haplotypes rather than by association at each locus. Note that it would not help to reconstruct the haplotypes of the HS at the generation we tested, as this would not determine whether (in the example) an allele at D1Mit496 was derived from RIII or from one of the other strains. The critical issue is to calculate the probability that an allele is descended from one of the eight progenitors, which is different from standard interval mapping (12) or interval mapping with marker cofactors (13–15).
Because the number of possible ancestral haplotype reconstructions increases exponentially with the number of markers, it is impossible to calculate the probability of each haplotype separately. However a dynamic programming (DP) algorithm greatly reduces the complexity. DP was first used by Lander and Green (16) in a different context to reconstruct haplotypes from pedigrees. Our method does not use pedigree information. The analysis is in two stages: ancestral haplotype probability reconstruction using dynamic programming followed by hypothesis testing using linear regression.
We assume that at a QTL locus, L, a chromosome originating from the progenitor strain s contributes an unknown additive amount T_{s} to the phenotype, so that the expected genetic effect for a diploid individual with ancestral alleles labeled s, t at the trait locus is T_{s} + T_{t}; a test for a QTL is equivalent to testing for differences between the T_{s} values. The DP method computes the probability F_{Li}(s, t) that individual i has the ancestral alleles s, t at L. Then the expected phenotype is

\[ \sum_{s,t} F_{Li}(s,t)\,(T_{s} + T_{t}) = \sum_{s} T_{s}\, X_{Li}(s), \qquad X_{Li}(s) = \sum_{t} \left[ F_{Li}(s,t) + F_{Li}(t,s) \right] \qquad [1] \]

say, and the T_{s} values are estimated by a linear regression of the observed phenotypes across all individuals using the design matrix X_{L}, followed by an ANOVA to test whether the progenitor estimates differ significantly. The method's effectiveness depends on the ability to distinguish ancestral haplotypes across the interval; clearly the power will be lower where all markers have the same noninformative allele distribution, but markers share information where there is a mixture.
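The step from the pair probabilities F_{Li}(s, t) to one row of the regression design matrix can be sketched as below. This is an illustration under the additive model stated above; the function names `design_row` and `expected_phenotype` and the toy numbers are assumptions made for the example, and no intercept term is included.

```python
def design_row(F):
    """Collapse pair probabilities into per-strain expected allele counts.

    F[s][t] is the probability that the individual carries ancestral alleles
    (s, t) at the locus. The returned x[s] = sum_t (F[s][t] + F[t][s]) is the
    expected number of chromosomes descended from strain s (the row sums to 2).
    """
    S = len(F)
    return [sum(F[s][t] + F[t][s] for t in range(S)) for s in range(S)]


def expected_phenotype(F, T):
    """Expected additive genetic effect, sum_s x[s] * T[s]."""
    return sum(x * t for x, t in zip(design_row(F), T))


# Toy example (hypothetical): three strains, two equally likely ancestral pairs.
F = [
    [0.0, 0.5, 0.0],  # pair (0, 1) with probability 0.5
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5],  # pair (2, 2) with probability 0.5
]
T = [1.0, -1.0, 2.0]  # made-up strain effects
x = design_row(F)
mu = expected_phenotype(F, T)
```

Stacking one such row per individual gives a matrix that can be passed to any least-squares routine to estimate the strain effects.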
This problem can be thought of as a hidden Markov model, where the hidden states are the progenitor haplotypes and the observed data are the genotypes. Define P_{mi}(s, t) to be the probability that, for a certain individual i, the progenitor haplotypes are s, t at marker m, given (i) the genotypes for the ordered markers numbered 1–m; (ii) the founder strain haplotypes, expressed as the probability π_{m}(s|a) that the ancestral state at marker m on a particular chromosome is s given that the allele observed at that locus is a; and (iii) the genetic distances d_{m} between markers m and m + 1. Ignoring interference and nonrandom mating effects (i.e., pedigree information), the number of recombinants between markers is distributed as a Poisson random variable with mean Gd_{m}, where G is the number of generations since the HS was founded. Consequently, the prior probability that on a certain chromosome locus m + 1 is in state s given that locus m is in state σ is

\[ r_{m}(s \mid \sigma) = \delta_{s\sigma}\, e^{-G d_{m}} + \frac{1 - e^{-G d_{m}}}{S} \qquad [2] \]

where S is the number of strains and δ_{sσ} equals 1 if s = σ and 0 otherwise. The prior probability of each of the S progenitor strains is 1/S at any locus, and missing data are treated as an allele with equal probability in the founder strains. Conditional on the genotype a, b for individual i at marker m + 1 and the ancestral haplotypes at m being σ, τ, the transition probability that the haplotypes at m + 1 are s, t is given by Eq. 3 (the subscript m has been dropped from r, π for clarity). As the phase of the genotypes is unknown, we must consider both possibilities. Therefore, the total probability that the haplotypes at m + 1 are s, t can be expressed as a DP recurrence relation (Eq. 4), summed over all possible haplotypes σ, τ at m. P_{mi}(s, t) is computed iteratively across the chromosome, starting at the first marker. Similarly, we can find Q_{m+1,i}(s, t), the probability that locus m + 1 is in state s, t given all information from markers m + 1 through M, by running the algorithm backward from the terminal marker.
Analysis of N individuals, M markers, and S strains requires space proportional to NMS^{2} and time proportional to NMS^{4}.
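A forward pass of this kind can be sketched as below for a single individual. This is not the happy program and should not be read as the paper's exact Eqs. 3 and 4: the way the genotype is weighted (both phases, using π(s|a)π(t|b) + π(s|b)π(t|a)), the per-marker normalisation, the data layout, and the use of Morgans for distances are all assumptions made for the illustration. The only formula taken from the text is the Poisson-based prior: stay in the same strain with probability exp(−Gd), otherwise draw a strain uniformly from the S founders.

```python
import math


def transition(s, sigma, G, d, S):
    """Prior P(state s at marker m+1 | state sigma at marker m); d in Morgans (assumed)."""
    stay = math.exp(-G * d)
    return (stay if s == sigma else 0.0) + (1.0 - stay) / S


def emission(pi_m, a, b, s, t):
    """Weight of ancestral pair (s, t) for unphased genotype (a, b): both phases summed."""
    return pi_m[a][s] * pi_m[b][t] + pi_m[b][s] * pi_m[a][t]


def normalise(P):
    # Assumes at least one ancestral pair is compatible with the genotype.
    total = sum(sum(row) for row in P)
    return [[v / total for v in row] for row in P]


def forward(genotypes, pi, dists, G, S):
    """Return P[m][s][t] for one individual, using markers 0..m.

    genotypes[m] = (a, b); pi[m][a][s] = probability of strain s given allele a
    at marker m; dists[m] = distance between markers m and m+1.
    """
    a, b = genotypes[0]
    P = [normalise([[emission(pi[0], a, b, s, t) for t in range(S)] for s in range(S)])]
    for m in range(1, len(genotypes)):
        a, b = genotypes[m]
        prev, d = P[-1], dists[m - 1]
        r = [[transition(s, sg, G, d, S) for sg in range(S)] for s in range(S)]
        # The S^2 sum inside an S^2 table gives the S^4 cost per marker noted in the text.
        cur = [[emission(pi[m], a, b, s, t)
                * sum(prev[sg][tu] * r[s][sg] * r[t][tu]
                      for sg in range(S) for tu in range(S))
                for t in range(S)] for s in range(S)]
        P.append(normalise(cur))
    return P


# Toy example (hypothetical): two strains, two markers 1 cM apart, G = 60.
pi = [
    {0: [1.0, 0.0], 1: [0.0, 1.0]},  # marker 0: the allele identifies the strain
    {0: [0.5, 0.5]},                 # marker 1: both strains share allele 0
]
P = forward([(0, 0), (0, 0)], pi, dists=[0.01], G=60, S=2)
```

In the toy case the second marker is uninformative, so the probability that both chromosomes are still from strain 0 there comes entirely from the first marker and the recombination prior. The backward quantities Q can be obtained by running the same routine on the reversed marker order.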
QTL Detection.
Suppose the QTL at locus L is between markers m, m + 1 at an unknown distance cd_{m} from m. The probability F_{Li}(s, t) that the haplotypes are s, t at L in individual i will depend on the flanking marker distributions and the pattern of recombination in the interval. Fixing on one chromosome, the locus must either be linked to both markers (B), or just the left marker (L), or just the right (R), or be unlinked (U), with respective probabilities

\[ p_{B}(c) = e^{-G d_{m}}, \quad p_{L}(c) = e^{-G c d_{m}}\left(1 - e^{-G(1-c) d_{m}}\right), \quad p_{R}(c) = \left(1 - e^{-G c d_{m}}\right) e^{-G(1-c) d_{m}}, \quad p_{U}(c) = \left(1 - e^{-G c d_{m}}\right)\left(1 - e^{-G(1-c) d_{m}}\right) \qquad [5] \]

A diploid individual's chromosomes need not be linked the same way. By integrating over c we obtain the interval-wide prior probability that the joint linkage state for the QTL is XY, as p_{XY} = ∫_{0}^{1} p_{X}(c)p_{Y}(c) dc. Then, dropping the subscripts for clarity, the probability that the founder alleles are s, t at the QTL L is found by summing over all possible linkage states XY (Eq. 6), where the probability that an unlinked locus is in any given state is 1/S, and P(⋅, t) = Σ_{s} P_{mi}(s, t), Q(⋅, t) = Σ_{s} Q_{m+1,i}(s, t), etc. For example, the term P(⋅, t)Q(s, t)p_{RB} is the probability that chromosome 1 of the QTL is in state s and linked just to the right-hand marker, and chromosome 2 is in state t and linked to both left and right markers.
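The interval-wide priors p_{XY} can be approximated numerically. The sketch below assumes the four single-chromosome linkage probabilities follow directly from the Poisson recombination model (no recombinant in a stretch of length x with probability exp(−Gx)); the state labels B, L, R, U, the function names, and the midpoint-rule integration are choices made for this illustration rather than details taken from the happy program.

```python
import math


def linkage_probs(c, G, d):
    """Single-chromosome linkage probabilities for a QTL at fraction c of the interval."""
    left = math.exp(-G * c * d)            # no recombinant between left marker and QTL
    right = math.exp(-G * (1.0 - c) * d)   # no recombinant between QTL and right marker
    return {
        "B": left * right,                 # linked to both markers
        "L": left * (1.0 - right),         # linked to the left marker only
        "R": (1.0 - left) * right,         # linked to the right marker only
        "U": (1.0 - left) * (1.0 - right), # unlinked
    }


def joint_linkage_priors(G, d, steps=1000):
    """Approximate p_XY = integral over c in [0, 1] of p_X(c) * p_Y(c), midpoint rule."""
    states = "BLRU"
    p = {x + y: 0.0 for x in states for y in states}
    for k in range(steps):
        pc = linkage_probs((k + 0.5) / steps, G, d)
        for x in states:
            for y in states:
                p[x + y] += pc[x] * pc[y] / steps
    return p


# Toy example (hypothetical): a 1 cM interval after 60 generations.
p = joint_linkage_priors(G=60, d=0.01)
```

The sixteen values sum to 1, and p["BB"] does not depend on c, so it equals exp(−2Gd) up to rounding.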
We found that the greatest sensitivity to detect a QTL occurs when the number of generations G is set substantially higher than the true number. Likely reasons for this phenomenon are that the distances between nearby markers may be inaccurate, and that erroneous genotypes create false recombinant events. On a 450-MHz Pentium III running Red Hat Linux 2.2, 750 mice, 45 markers, and eight strains can be analyzed (i.e., DP plus linear regression) in 73 s of central processor unit time using a C program, happy.
We test for a QTL in the intervals between adjacent markers rather than at each marker locus; DP logP values refer to marker intervals and SM logP values to markers at the interval endpoints. In Fig. 1, DP logP values are plotted as step functions that are constant over each interval. It is possible to generate pointwise logP values, but they do not differ significantly from the interval-wise values.
Results
Significance Levels and Resolution.
We examined a 10-cM region around each of the five QTLs identified in the F_{2} intercrosses (Table 1), placing markers on the radiation hybrid map and, where possible, the European Collaborative Interspecific Backcross genetic map to provide the accurate marker positions necessary for the method. The results are shown in Fig. 1.
To check the accuracy of the tabulated ANOVA significance levels, we permuted the phenotypes between animals and repeated the ANOVA 1,000 times, thereby taking into account the large number of markers, the fact that the tests are no longer independent, and that the phenotypes may not be normally distributed. At each marker interval the logP values were ranked, and the 5%, 1%, and 0.1% significance levels were defined as the corresponding percentiles. They are slightly less than their theoretical values, so the use of logP derived from a tabulated F distribution is reliable and conservative. Fig. 1 shows the 0.1% significance levels. Additionally, the P value of the most significant permuted result in each region was close to the reciprocal of the number of intervals, so the tests may be treated as independent. Therefore, to establish significance levels appropriate for any mapping experiment, we need only divide the desired significance level by the number of intervals. We analyzed a total of 63 intervals, so the 1.0% and 0.1% logP thresholds are 3.8 and 4.8, respectively. All of the QTLs we have detected exceed the 1% level, and only one (near D15Mit134, logP 3.95) fails to exceed the 0.1% level.
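The quoted thresholds for 63 intervals can be reproduced with a short calculation, assuming logP means −log_{10} of the P value and that the correction divides the chosen significance level by the number of intervals. The function name `logp_threshold` is introduced here for illustration.

```python
import math


def logp_threshold(alpha, n_intervals):
    """logP a single interval must exceed for an experiment-wide level alpha."""
    return -math.log10(alpha / n_intervals)


# 63 intervals, experiment-wide levels of 1% and 0.1%.
threshold_1pct = round(logp_threshold(0.01, 63), 1)
threshold_01pct = round(logp_threshold(0.001, 63), 1)
```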
We used a bootstrap procedure to determine mapping resolution. A data set was created by sampling the animals with replacement, and the regression analysis was repeated in the neighborhood of each QTL. The number of times that each marker interval contained the most significant logP was recorded in 500 iterations. Table 2 gives the intervals with the highest percentage of QTL locations. On chromosome 10, 99% of bootstraps placed the QTL into a 0.5-cM interval; however, on chromosome 12 we found that almost 20% of bootstraps indicated a second location for the QTL. The bootstrap-defined ranges for chromosomes 1 and 15 agree with an independent high-resolution haplotype and recombinant inbred segregation test carried out on the BALB/cJ and C57BL/6 crosses (17) (see Table 2). Consequently, we conclude that the QTLs have been replicated in the HS.
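The bootstrap bookkeeping described above can be sketched as follows. The callable `score_intervals` is a hypothetical stand-in for the regression analysis: it is assumed to take a list of resampled animal indices and return one logP per marker interval. The function name, the seed, and the toy scorer are all illustrative.

```python
import random
from collections import Counter


def bootstrap_peak_counts(n_animals, score_intervals, n_iter=500, seed=0):
    """Count how often each interval holds the top logP across bootstrap resamples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        # Resample animals with replacement, keeping the data set size fixed.
        sample = [rng.randrange(n_animals) for _ in range(n_animals)]
        scores = score_intervals(sample)
        best = max(range(len(scores)), key=scores.__getitem__)
        counts[best] += 1
    return counts


# Toy scorer (hypothetical): ignores the sample and always favours interval 1.
def toy_scorer(sample):
    return [1.0, 3.0, 2.0]


counts = bootstrap_peak_counts(n_animals=750, score_intervals=toy_scorer)
```

Dividing each count by the number of iterations gives the percentage of bootstraps placing the QTL in that interval, which is the quantity tabulated in the text.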
Table 3 documents the DP effect sizes and the associated T statistics for the three QTLs where SM analysis failed, together with the ancestral alleles of the two flanking marker loci. This confirms that the DP method succeeds over SM when QTLs with opposite effects are associated with the same allele.
Simulations.
We also compared the DP and SM methods by simulating the HS breeding protocol for 60 generations. The large number of variables involved means that it is not feasible to simulate all possible combinations of effect size and progenitor allele distributions, so instead we used the DP-estimated progenitor effects and the observed progenitor alleles for the loci on chromosomes 1, 10, 12, and 15, which captures the QTL phase association derived from the DP analysis. QTL effect sizes were scaled so that the QTL accounted for 5% of the total phenotypic variance in the final generation. In about 5% of simulations the QTL alleles went to fixation and these results were discarded. The percentages of successful detections for the two methods are given in Table 4. DP was always more efficient than SM. The efficiencies vary between loci, which we attribute to the different phase relationships between alleles and QTLs. Additional simulations using other phase associations confirm a marked variation in detection rates. We also performed 500 computer simulations where there was no QTL present. After taking into account the number of simulations made, no false positives were found.
Discussion
We have shown that using DP generally improves QTL detection and fine mapping. The failure of SM methods appears to be because of different QTL alleles occurring on similar haplotypes. In human populations a similar phenomenon has been observed, where numerous mutations for thalassaemia have been found on apparently identical haplotypes (18). In particular, the oscillatory behavior of SM analysis in Fig. 1A is similar to that often observed in studies of linkage disequilibrium. Presumably the relationship between QTLs and ancestral human chromosomes is equally complex, reducing the power of genome-wide association studies unless ancestral chromosomes can be reconstructed (19).
DP analysis of HS animals provides a fast, robust, and cost-effective strategy for high-resolution analysis of complex quantitative traits. There are many possible applications of the method.
Genome-wide Scans.
Multipoint DP analysis of HS mice seamlessly combines data from any marker type [especially single-nucleotide polymorphisms (SNPs) and microsatellites], thus making it ideal for high-resolution genome-wide scans, as has become possible with the publication of a first-generation set of high-density SNPs for the mouse (20). We used simulation of the HS breeding protocol to estimate the power of our method for a whole-genome scan with SNPs. We simulated 50 diallelic markers spaced at 1-cM intervals, with a QTL that explained 2.5%, 5%, and 10% of the phenotypic variance placed midway between the two central markers, in population sizes of 500 and 1,000 animals. To establish significance levels appropriate for a genome scan, we divide the desired significance level by the number of intervals (3,000) analyzed. In Table 5 we show the probabilities of successful QTL detections obtained for three genome-wide significance levels (5%, 1%, and 0.1%, corresponding to logP values of 4.38, 5.08, 6.08).
Mapping Traits in Parallel.
Not only can the method simultaneously map multiple QTLs of small effect, it also can fine map many traits in parallel, removing the need for separate F_{2} QTL detection experiments, and avoiding problems where a QTL is present in the HS but not in the F_{2}. The only requirements in trait selection are that the measurements do not interfere with each other and each phenotype has a heritable variance in the founders of the HS. Based on our work required to fine map a single trait, whole-genome mapping would be cost effective when 20 or more traits are mapped in parallel on the same set of mice. A potential disadvantage of this approach would be that the number of tests performed per trait would be about 20 times larger (3,000 markers in a genome scan at 1-cM resolution compared with an F_{2} detection experiment on 100 markers followed by fine mapping about a further 100), so significance thresholds would be correspondingly higher.
Mapping Modifier Genes.
Numerous mouse models of human disease, either spontaneous or genetically engineered mutants, have been established. These animals are either maintained on an inbred background, or by continually backcrossing onto an F_{1} between two inbred lines (21). It has been shown that the phenotype of the mutant will vary depending on the genetic background, indicating that modifier genes can have a significant effect. Consequently, molecular characterization of modifiers is likely to provide novel insights into pathogenesis.
Mapping modifier loci by crosses between inbred mutant strains is qualitatively similar to and has the same limits of resolution as standard QTL detection experiments. However, it is possible to extend our method to fine map modifier genes. Consider an F_{1} cross between an HS animal and a mutant. For simplicity we will assume the mutant is on a constant inbred background, so that all modifier loci may be treated as coming from the HS chromosome. Therefore, the expected effect on the observed phenotype of a modifier locus descended from progenitor strain s will be T_{s}, that is, the analysis resembles that of a haploid genome.
Proceeding analogously to the analysis of a pure HS population, at the marker m, let β_{m}(a) be the probability that a chromosome around marker m is from the inbred background, given that the allele observed is a. Let π_{m}(s|a) be the probability that a chromosome around m is HS and derived from ancestral strain s, given a. Then the transition probability that the HS ancestral strain at m + 1 is s, given the observed genotype a, b for individual i at m + 1 and that the HS is in state σ at m, is given by Eq. 7. Consequently, the probability P_{m+1,i}(s) that the HS chromosome is from founder strain s at marker m + 1, conditional on all of the genotypes for markers 1, 2, …, m + 1, satisfies the recurrence in Eq. 8. The remaining analysis follows the previous case with the obvious simplifications for a haploid genome and is omitted.
In the case of an HS intercrossed with a backcross, it can be shown that the backcross-derived chromosome will resemble an HS chromosome derived from two founder strains after three generations of breeding. Consequently, we can analyze the data as a cross between two HS of different founders and ages.
QTL Detection and Fine Mapping Using F_{2} × HS Hybrids.
Consider a hybrid whose parents are an HS and an F_{1} intercross of two inbreds. The HS-derived chromosome will have a high level of recombination, whereas the other will be an F_{2} intercross. Such animals can be used for both a genome scan with 100 markers at 20- to 30-cM spacing, using information from just the F_{2} chromosomes, followed by fine mapping at 1-cM resolution of those regions likely to contain a QTL, using the HS chromosomes. The method of analysis is similar to that of a cross between two distinct HS. During the QTL detection phase, the high level of recombination in the HS chromosome means that HS QTLs will probably be unlinked to the markers, so their state cannot be determined and their effects contribute to the residual variance. However, during the fine mapping, the state of the F_{2} chromosome can in general be determined and is usually constant across each region.
The relative merits of this approach over performing two separate experiments depend on the costs of the animals, phenotyping, and the phenotypic variance between the HS founders. If necessary, the increased residual variance during the QTL detection phase could be offset by increasing the marker density.
Acknowledgments
We thank L. Cardon for helpful discussions. This work was supported by the Wellcome Trust (R.M., C.J.T., and J.F.). The analysis software and data sets are available from http://www.well.ox.ac.uk/happy.
Footnotes

‡ To whom reprint requests should be addressed. E-mail: jf@well.ox.ac.uk.

This paper was submitted directly (Track II) to the PNAS office.

See commentary on page 12389.

Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.230304397.

Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.230304397
Abbreviations
 QTL, quantitative trait loci; HS, heterogeneous stock; SM, single marker; DP, dynamic programming
 Received June 30, 2000.
 Copyright © 2000, The National Academy of Sciences
References
 ↵
 ↵
 Risch N J,
 Merikangas K
 ↵
 ↵
 Legare M E,
 Bartlett F S,
 Frankel W N
 ↵
 Lindzey G,
 Thiessen D
 McClearn G E,
 Wilson J R,
 Meredith W
 ↵
 Darvasi A,
 Soller M
 ↵
 ↵
 ↵
 Gray J A
 ↵
 Flint J,
 Corley R,
 DeFries J C,
 Fulker D W,
 Gray J A,
 Miller S,
 Collins A C
 ↵
 ↵
 ↵

 Jansen R C
 ↵
 ↵
 Lander E S,
 Green P H
 ↵
 ↵
 ↵
 ↵
 ↵