## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Inferring the shape of global epistasis

Edited by Richard E. Lenski, Michigan State University, East Lansing, MI, and approved June 20, 2018 (received for review March 7, 2018)

## Significance

How does an organism’s genetic sequence govern its measurable characteristics? New technologies provide libraries of randomized sequences to study this relationship in unprecedented detail for proteins and other molecules. Deriving insight from these data is difficult, though, because the space of possible sequences is enormous, so even the largest experiments sample a tiny minority of sequences. Moreover, the effects of mutations may combine in unexpected ways. We present a statistical framework to analyze such mutagenesis data. The key assumption is that mutations contribute in a simple way to some unobserved trait, which is related to the observed trait by a nonlinear mapping. Analyzing three proteins, we show that this model is easily interpretable and yet fits the data remarkably well.

## Abstract

Genotype–phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.

The mapping from genetic sequence to biological phenotype in large part determines the course of evolution. Nonadditive interactions between sites, called epistasis, can either accelerate or severely constrain the pace of adaptation. Due to the high dimensionality of sequence space, studies of evolution on epistatic genotype–phenotype maps, or fitness landscapes, have historically been limited to mathematical models, such as NK landscapes (1, 2), or computational models of simple biophysical phenotypes, such as RNA folding (3, 4).

Recently, high-throughput sequencing has made it possible to measure genotype–phenotype maps for proteins (5) and across genomes (6). While many thousands of genotype–phenotype pairs can now be assayed in a single experiment, this is still only a minuscule fraction of all possible sequences, and so the sampled genotypes typically take the form of a scattered cloud centered on the wild type. Statistical models are therefore required to derive biological insight from these sparsely sampled genotype–phenotype maps. One approach has been to fit models that include terms for each pairwise interaction between genetic sites (7, 8). Such models can sometimes predict protein contacts or mutational effects from protein sequence alignments (9), but they do a poor job at capturing a complete picture of the genotype–phenotype map—so that their predictions far from the observed data are often wildly inaccurate (10, 11).

Models of a genotype–phenotype map that contain terms for every possible pairwise interaction seem reasonable if we believe that epistasis arises primarily through the idiosyncratic effects of particular pairs of mutants—for instance, pairs of residues that contact each other in a folded protein. While there is abundant biochemical evidence for epistasis of this type, geneticists have long suspected that a substantial fraction of observed epistasis is due not to specific pairwise interactions, but rather to inherent nonlinearities in molecular phenotypes, cellular fitness, organismal physiology, and reproductive success. Apparent pairwise interactions may be caused by mutations that contribute additively to some underlying trait, combined with a nonlinear relationship between the underlying trait and the measured phenotype, or fitness.

A classic argument for this form of epistasis is the physiological theory of dominance (12), which held that dominance arises from the inherently nonlinear relationship between gene activity or dosage and measured phenotype [e.g., saturation phenomena in metabolic networks (13)]. Another classical example arose in the attempt to reconcile apparent contradictions between patterns of genetic polymorphism and the tolerable degree of genetic load, by positing models of nonlinear selection—for instance, truncation selection—operating on an underlying additive trait such as the number of deleterious mutations or heterozygous genes (14⇓⇓⇓–18). Following these classic examples, research into the shape of nonlinear selection on an observed quantitative trait using either quadratic (19) or more general (20) forms became a major interest of evolutionary quantitative genetics (21). Contemporary studies on the fitness effects of mutations often incorporate epistasis based on thermodynamic arguments, where the probability of transcription factor binding (22) or protein folding (23⇓–25) is expressed as a nonlinear function of a binding or folding energy that is additive across sites. Today, the idea that epistatic interactions are the result of a nonlinear mapping from an underlying additive trait has been called “unidimensional” (26, 27), “nonspecific” (28), or “global” (29) epistasis, and several heuristic techniques have been proposed to infer epistasis of this form (30⇓⇓–33).

Here we present a maximum-likelihood framework for inferring models of global epistasis, and we apply this framework to several large genotype–phenotype maps of proteins derived from deep mutational scanning (DMS) experiments. Under the assumption that the observed phenotype is a monotonic, nonlinear function of some unobserved additive trait, we find the best-fit model of global epistasis, test hypotheses about the form of this nonlinear relationship, infer the additive coefficients for the unobserved trait together with their confidence intervals, and estimate the extent of additional epistasis due to specific interactions between sites. Our approach shares much of the simplicity of additive models, but it can capture far more of the phenotypic variation using only a handful of additional parameters.

## A Model of Global Epistasis

Statistical models can infer genotype–phenotype maps from data by assuming some form of underlying structure. The starting point of our analysis is a nonepistatic model. Given a sequence of amino acids

One obvious way to incorporate epistasis into this additive model is by including explicit terms for interactions between each pair of sites and possibly higher-order interactions as well (7, 8). By contrast, we develop a model and inference procedure for a different form of epistasis, motivated by examining the deviations between empirical datasets and their best-fit nonepistatic models. Such deviations often show a smooth nonlinearity, as shown in *SI Appendix*, Fig. S1*A* for protein G. This observed nonlinearity suggests a global coupling between all sites that determines the observed phenotype. This idea can be formalized as a semiparametric model

We also estimate the amount of epistasis that is not captured by the global nonlinearity in our model. When independent estimates of the measurement error for each sequence are available, as is often the case, we can infer how much of the remaining variation in the measured phenotype is due to other forms of epistasis. We refer to this last component of variation in the measured phenotype as house-of-cards (HOC) epistasis, because in our statistical framework we model the phenotypic contribution of this epistasis as an independent random draw, for each genotype, from a Gaussian distribution—akin to a fully uncorrelated house-of-cards model (36). In particular, we estimate the HOC epistasis component, *Materials and Methods*).

We infer the model described above by maximum likelihood. Doing so allows us to compare models of different complexity by means of a likelihood-ratio test via parametric bootstrap (detailed in *Materials and Methods*). This approach allows us to ask whether there is statistical support for certain features, but not others, in the shape of global epistasis and to assess the uncertainty in our estimates of both global epistasis and the coefficients describing the impact of mutations on the underlying additive trait.

## Global Epistasis in Protein GB1

As a first application of our framework, we characterized global epistasis in the IgG-binding domain of protein G (GB1), a model system of protein folding and stability (Fig. 1), using data from a study by Olson et al. (37). In particular, Olson et al. (37) targeted 55 sites for mutation and measured the binding to an immunoglobulin fragment (IgGFC) for all single–amino-acid substitutions (1,045) and 95% of all double substitutions (509,693) (37). We find that for GB1 our model of global epistasis outperforms a purely nonepistatic model (*SI Appendix*, Table S1; see *Materials and Methods* for statistical methodology), and it substantially reduces the extent of unmodeled epistasis (

Looking first at the shape of global epistasis inferred by our model (Fig. 1*A*), we see that it includes both diminishing and increasing returns, depending on the value of the underlying additive trait. In particular, our inferred nonlinear function has a negative second derivative in the region around the wild type (negative in 95% of bootstraps for *SI Appendix*, Table S1), suggesting that further increases in binding affinity outside the range of observed data are possible. For deleterious mutations, we observe saturation with decreasing additive trait values, as indicated by a positive second derivative (i.e., increasing returns; positive in 95% of bootstraps for *SI Appendix*, Table S1), but our estimate of this slope is extremely small, with an expected change of only 0.0037 (CI 0.0034–0.0046) per mutation, in units of relative log binding. This slope is very small compared with the overall range of binding scores observed (from around −5 to +2) and it is consistent with control experiments by ref. 37 that established a lower bound on the binding score in their assay. More generally, our estimates of the shape of global epistasis are extremely precise, with an average 95% CI of width 0.13 (*Materials and Methods*) over the full range of the observed binding affinities (light gray in Fig. 1*A*, too small to be visible).

For the effects on the unobserved additive trait (Fig. 1*D*), we find 234 beneficial and 774 deleterious mutations with 95% CIs that exclude zero, out of a total of 1,045 mutations (*SI Appendix*, Fig. S1*D*). These additive effects are similar to what would be inferred under the purely additive model, but they are more widely distributed for deleterious mutations (*SI Appendix*, Fig. S1*B*). Interestingly, the distribution of inferred additive traits is unimodal whereas the observed distribution of binding affinities is bimodal, with one peak around the wild-type affinity and another at low binding (Fig. 1 *B* and *C*). This apparent contradiction is resolved by the form of global epistasis, which maps all sequences with low values of the additive trait to similar values for the observed phenotype, creating a second mode.

Taken together, our model for the underlying additive trait and for the nonlinear function ϕ provides a simple explanation for the observed binding data. The model accounts for essentially all of the variation in binding measurements beyond what would be expected under measurement noise alone. In particular, the root-mean-square error in our predictions is *A*, where we predict that the true phenotypes for 95% of genotypes fall between the two dashed lines.

## Complex Epistasis in GB1

While our model captures the form of global epistasis and quantifies the extent of remaining HOC epistasis, deviations from our model can provide evidence for one remaining type of epistasis, which we call complex epistasis. This form of epistasis arises from interactions between specific combinations of sites. Unlike global epistasis, which depends in a nonlinear way on a single additive trait, or HOC epistasis, which consists of independent Gaussian effects associated with each genotype, complex epistasis can be attributed to nonlinear interactions between specific sites on the measured phenotype.

To investigate complex epistasis we compared the predictions of our model, fitted to data from Olson et al. (37), to an independent dataset from a follow-up experiment by ref. 38. Wu et al. (38) measured the binding of GB1 variants mutated to all possible combinations of amino acids at four sites specifically chosen because they were suspected to exhibit idiosyncratic epistatic interactions.

On the whole, our results confirm that the pattern of global epistasis inferred using shallow mutagenesis across the whole domain persists at a deeper level of mutagenesis for these four sites. In particular, the predictions of the global epistasis model, fitted to Olsen et al. (37) data, are close to the mean binding curve measured by Wu et al. (38), across the entire range of the inferred additive trait (*SI Appendix*, Fig. S2*A*). However, the width of the deviations from our binding predictions is much broader than the variation predicted by the error model fitted to the Olsen et al. (37) experiment (Fig. 1*A*). Some of this difference can be attributed to the much larger measurement noise in the ref. 38 dataset [*SI Appendix*, Fig. S2*B*). We interpret this skew as evidence of positive complex epistasis among the specific sites mutagenized by Wu et al. (38).

To infer which combinations of sites and mutations in GB1 exhibit complex epistasis we marginalize the distribution of deviations from our model predictions for each pair of substitutions. Pairs of mutations with large mean-square deviations (*SI Appendix*, Fig. S2*C*) indicate systematic deviation from the global epistasis model. For example, the pair of mutations 41L/54G shows the largest deviation in our analysis, and indeed it is known to be associated with structural changes in GB1 (37). By contrast, the pair 41Q/54P exhibits a similar magnitude of pairwise epistasis calculated directly from binding data and yet, in our analysis, the deviations from the model are small, indicating this pairwise epistasis is primarily caused by global epistasis. Accounting for global epistasis can therefore be instrumental in identifying the specific sites that have idiosyncratic, complex interactions for a measured phenotype.

## Global Epistasis in GFP

The GB1 dataset of Olsen et al. (37) contains only single and double mutants, which covers a limited cloud of sequence space near the wild-type protein sequence. To characterize genotype–phenotype maps more broadly we applied our model of global epistasis to a DMS study that measured the fluorescence of 51,715 variants of a green fluorescent protein (avGFP), including up to 11 mutations per sequence and 3.7 mutations on average (31).

Applied to the GFP data, our model of global epistasis infers a sharp threshold that delineates fluorescing from nonfluorescing proteins, with remarkably few outliers (true positive rate 0.9966, true negative rate 0.9787, where we define positive sequences to have log relative fluorescence of −1.25 or greater; Fig. 2). Below the threshold, where the fluorescence is at background levels, the trait–fluorescence relation is almost flat (slope 0.00032 log fluorescence per mutation, 95% CI 0.00022–0.00041); while above the threshold the global epistasis function is increasing (likelihood-ratio test, *SI Appendix*, Table S1; slope 0.020 log fluorescence per mutation, 95% CI 0.019–0.021). In other words, mutations observed to have no effect below the threshold may have a substantial effect in a different genetic background.

Comparing the epistatic and nonepistatic models (*SI Appendix*, Fig. S3*A*) shows that including global epistasis improves the fit substantially (*SI Appendix*, Fig. S1*D*), for GFP we find that the bootstrapped 95% CIs for *SI Appendix*, Fig. S3*C*) show much greater uncertainty than observed for GB1. In particular, we find that only 62% of the additive coefficients have CIs excluding zero (141 positive and 1,131 negative coefficients out of 1,810 total), and the average width of these CIs was equal to 0.91 times the magnitude of an average mutation. These results show the importance of quantifying uncertainty, and they suggest that although the GFP experiment had sufficient enough power to infer the form of global epistasis, it was underpowered with respect to inferring the effects of individual mutations on the unobserved, additive trait.

Overall, our results indicate that the sequence–function relationship for GFP can be adequately understood in terms of a sharp threshold imposed on an underlying additive trait. Below this threshold, fluorescence is zero, and above this threshold fluorescence is essentially additive in the effects of mutations. More specifically, our model suggests that 95% of genotypes will fall between the two dashed lines in Fig. 2, and the error in our predictions is little more than would be expected under measurement noise alone (mean-square error in our predictions is *Materials and Methods*). Thus, like GB1, we find that global epistasis plays a dominant role in the GFP genotype–phenotype map, with little need to invoke other forms of epistasis to explain the observed data. These results are essentially consistent with the results of the neural network approach used by ref. 31 in their original analysis (based on a neural network architecture with a sharp bottleneck to represent the latent trait), but our approach provides a simpler, easier to understand picture of the sequence–function relationship together with better quantification of uncertainty.

## Genotype-by-Environment Interactions in β-Lactamase

Deep mutational scanning studies often measure phenotypes across a library of variants in several different conditions—such as antibiotic activity at several concentrations of antibiotic. While testing in multiple conditions can in principle provide more insight into the function of a protein and should provide more robust results than measurements in a single condition, it also introduces a problem of interpretability. In particular, it becomes difficult to summarize the effects of any given mutation across the experimental conditions. Here we address this problem by extending our model to analyze genotype-by-environment interactions. We do this by assuming that the genotype and environmental condition each contribute additively to the latent underlying trait ϕ, which then determines the observed phenotype via a nonlinear function. While not universally applicable, such models make biological sense in cases where both the genotype and the environment can act on some latent underlying trait. For instance, it is reasonable to assume that the action of an antibiotic depends on its “local” concentration in the cellular or extracellular environment, which can be modulated either through changes to the activity of an enzyme that degrades the antibiotic or by changes to the overall concentration of antibiotic in the medium.

Here we apply this approach to a study of β-lactamase, an enzyme that degrades β-lactam antibiotics, which has been a model system for DMS (30, 39, 40) and protein evolution (41⇓⇓–44). Stiffler et al. (45) measured the effects of 4,997 single mutants of TEM-1 β-lactamase in five different concentrations of the antibiotic ampicillin. They quantified β-lactamase activity by measuring the growth rate of each mutant in each condition. Our model takes these five growth rate measurements and synthesizes them into a single score capturing the activity of that particular genotype, together with a nonlinear function capturing the mapping from the latent trait (e.g., effective local concentration of antibiotic) to the observed growth rate.

The results of fitting this global-epistatic model are shown in Fig. 3, where the *x* axis displays the activity of each mutant (i.e., the impact of that mutant on the underlying trait), and the environmental conditions determine a series of curves, one for each condition. Due to our assumption that the environment and genotype interact additively to determine the underlying trait, these curves are simply translations of one another, and each curve produces a prediction of the growth rate of each mutant for one environment. Overall, we find the model produces a very good fit to the data (cross-validated *SI Appendix*, Table S1). The performance is also comparable to that of the biophysically based mechanistic model originally fitted by ref. 45, which uses approximately 5,000 more parameters.

The global-epistasis model provides a score for each mutation’s additive effect on the underlying trait. We again normalize so that the wild type has a score equal to zero and the mean absolute effect of a mutation on the additive trait equals 1. We infer that the single mutation effects (*SI Appendix*, Fig. S5) are largely deleterious: 3,295 out of 3,312 single mutations have CI below zero, and none has CI above zero (*SI Appendix*, Fig. S6*B*), consistent with the observation that no mutant displays a growth rate consistently higher than wild type.

The inferred activity scores are mapped to growth rates via a nonlinear function that shifts, depending on the antibiotic concentration. This function shows first increasing and then decreasing returns (for instance, in the highest antibiotic concentration, the second derivative has positive 95% CI for *SI Appendix*, Table S1). These results indicate a threshold-like phenomenon where, for any given genotype, increasing the antibiotic concentration up to a certain point produces no growth defect, whereas further increases in antibiotic produce rapidly declining growth rates. Interestingly, while the slope of the nonlinear function becomes shallower for low-activity genotypes (slope 0.090 fitness change per mutation on left-hand side, CI 0.073–0.095), it remains significantly greater than 0 (*SI Appendix*, Table S1). This result may have clinical relevance: Although growth rates experience a sudden decline with increasing antibiotic concentration, increasing the concentration beyond the point of first inhibition is still capable of producing a further decrease in bacterial growth rate. Finally we tested the hypothesis that the environmental coefficients were a linear function of the log antibiotic concentration vs. a more general model that fitted an arbitrary coefficient for each condition. We found strong evidence against the linear model (*SI Appendix*, Table S1), consistent with the visual pattern of increasingly severe antibiotic effects at higher concentrations (Fig. 3).

The simplicity of our model together with its graphical representation provides insights into the design and behavior of this experiment. For instance, in Fig. 3 we have shown the two biological replicates in each condition in slightly different colors. The plot shows that the replicates are not completely overlapping, which indicates biological variation between the replicates, and indeed we can reject our basic model in favor of a model with a different environmental coefficient for each antibiotic concentration in each replicate (*SI Appendix*, Table S1). Another fine-scale pattern can be observed in the CIs for the additive effects (*SI Appendix*, Fig. S6*B*), which widen for genotypes that have WT-like growth rates at one antibiotic concentration and then very low growth rates in the next concentration. This suggests a finer grid of antibiotic concentrations would be optimal for future experiments.

## Additive Traits and Biophysical Quantities

Since we infer an underlying additive trait when fitting a global epistasis model, it is natural to ask whether the underlying trait corresponds to some known biophysical quantity. While this question will surely depend upon the details of the assay used in a given DMS experiment—e.g., whether the assay is measuring binding of a protein to a substrate, enzymatic activity, etc.—thermodynamic stability for a protein’s native conformation is almost always an important phenotype, and there is a consistent thread in the literature suggesting that the phenotypic effects of most mutations are mediated via their effects on thermodynamic stability and the nonlinear relationship between free energy of folding and probability of being folded (23⇓–25, 46⇓–48). Moreover, low-throughput measurements of mutational effects on thermodynamic stability (

To test this hypothesis, we calculated the correlation between our additive effects and previously published estimates of the stability effects of these mutations. For protein G, we do not find a correlation between the measured stability effects and our inferred effects on the underlying additive phenotype (*SI Appendix*, Fig. S1*C*; *SI Appendix*, Fig. S6*A*; *SI Appendix*, Fig. S3*B*) with computational predictions by ref. 31 of energetic effects for single mutations on stability, as would be predicted if the fitness effects for most mutations were mediated through their effects on folding stability. Thus, while the good fit of our model is consistent with the hypothesis that measured phenotypic effects are mostly determined by a nonlinear function of the free energy of folding (23⇓–25, 46), the inconsistent relationship between our inferred additive effects and the estimated stability effects of mutations provides at best equivocal support for the widely held hypothesis that free energy of folding is the underlying trait.

## Discussion

We have proposed a maximum-likelihood framework for inferring genetic interactions from high-throughput mutagenesis experiments. Our key assumption is that genetic interactions arise from a global nonlinear mapping between an unobserved, additive trait and the measured phenotype. Modeling this nonlinear mapping by a flexible class of monotonically increasing functions, we simultaneously infer the form of the nonlinear relationship and the contributions of each possible mutation to the underlying additive trait. Analyzing three well-studied proteins, we have shown that this form of global epistasis can provide a remarkably good fit while providing an easy-to-understand summary of the underlying sequence–function relationship. Our approach also provides a principled statistical framework for testing hypotheses and quantifying uncertainty.

Understanding genotype–phenotype relationships provides a substantial challenge due to the enormous size and large dimensionality of the space of possible genotypes. An ideal model of this relationship would exhibit comprehensible behavior, make accurate predictions, and give mechanistic insight into the underlying biology. In practice, however, models must make trade-offs between these competing demands. Our strategy is based on the gambit that, despite abundant evidence for complex genetic interactions between specific sites, a simple-to-understand model of global epistasis can account for most phenotypic variation. The resulting models can be summarized visually by a pair of graphics: a heat map of coefficients showing the effects of each possible single–amino-acid substitution on an underlying additive trait and a curve showing the shape of the nonlinear relationship between the additive trait and the observed phenotype.

Besides being easy to display, the behavior of the model over sequence space is easy to understand. Because we assume that the nonlinear relationship is monotonically increasing with increasing values of the additive underlying trait, the resulting model is single peaked, and it shows no sign epistasis (54). In particular, the sign of the effect of any particular mutation is constant across backgrounds, and the magnitude of the effect depends only on the current trait value rather than on the details of the genetic sequence (29). At a more technical level, models of this form can be analyzed using the now-classical information-theoretic treatment introduced by Berg and von Hippel (55). In particular, given a global epistasis model one can readily approximate important descriptive statistics such as the fraction of random sequences that achieve at least a given level of functionality [a quantity known in the literature as the “functional information” (56, 57)], as well as more detailed predictions such as the site-specific amino acid use among highly functional sequences (55). This simplicity and analytical tractability contrast with the highly complex, rugged landscapes typically inferred by fitting a model with all possible pairwise interactions between sites, where assessing even qualitative features of the landscape from the fitted coefficients can be extremely challenging. For instance, finding the global maximum in a pairwise interaction fit is a nondeterministic polynomial time (NP)-complete problem (58), whereas the optimal genotype can be read off directly from the heat map of additive coefficients in our model of global epistasis.

Our results show that sometimes a simple model of global epistasis can provide accurate predictions that account for virtually all of the observed experimental variation. Because of the high dimensionality of protein sequence space, even a completely additive model will typically have several thousand parameters. By augmenting an additive model with just a handful of additional parameters, which control the nonlinear function, we can produce a dramatically better fit and capture the majority of observed variation with several hundred times fewer parameters than in pairwise models (e.g., there are

The global epistasis model also makes reasonable predictions for genotypes far from those sampled in the mutagenesis assay, unlike the qualitatively incorrect out-of-sample predictions exhibited by pairwise models (10, 11). Extrapolation can be easily understood under the global epistasis framework, because it is based on the biologically plausible assumption that the physiological factors responsible for saturation or potentiation operate consistently across genetic backgrounds. Furthermore, our global epistasis model is additive outside the observed range of phenotypes and so it provides highly controlled, conservative behavior when extrapolating to extreme phenotypes.

While the global epistasis model is readily comprehensible, and it is sometimes sufficient to capture the major features of the genotype–phenotype relationship, it is unclear whether the model provides an unambiguous mechanistic interpretation in terms of molecular biology and biophysics. If most epistasis in proteins is indeed due to essentially additive effects on the free energy of folding that are then converted by a nonlinear transformation between folding energy and the probability of folding (23⇓–25, 46⇓–48), then our model should fit well and the theory of stability-mediated epistasis would provide a mechanistic basis for the observed additive trait. On the other hand, the observation that our model fits well is not alone sufficient to infer the existence of a mechanistically meaningful underlying additive trait. For example, if the phenotype is a saturating function of several underlying traits (e.g., refs. 37, 52, and 59), the additive trait inferred under our model may correspond to an amalgamation of multiple mechanistic traits.

The interpretation of the global epistasis model as the simplest possible epistatic extension of an additive model, together with its capacity to capture a large proportion of the observed epistasis, suggests that it should be used as a standard first analysis of mutagenesis assay data, instead of an additive fit. More expressive models, capable of capturing complex epistasis, can then be used to describe the idiosyncratic interactions between the small set of sites whose interactions deviate substantially from the large-scale patterns identified by the global-epistasis fit. Moreover, differences in experimental protocol can be viewed as changes in the nonlinear mapping from the additive trait to the observed phenotype, and so we expect the inferred additive coefficients to be more consistent and reproducible across laboratories and experiments than the raw measurements. Indeed for TEM-1, we find that our inferred coefficients are more highly correlated with previous measurements of TEM-1 activity (39) than the measured growth rates in any single antibiotic concentration (

A subtle, but important, technical note concerns our assumption that the nonlinear global epistasis function

In contrast, recent heuristic approaches to model global epistasis essentially suffer from imposing too strong or too weak constraints on the nonlinear function *SI Appendix*, Fig. S1*B*).

The intuition that complicated behavior in a high-dimensional space can often be summarized by learning a nonlinear function of a latent additive trait has a long history in biology—as detailed in the Introduction. And techniques for inferring such relationships have been rediscovered multiple times across a breadth of scientific disciplines [e.g., single-index models in econometrics (63), projection pursuit regression in statistics (64), and linear–nonlinear models in neuroscience (65)]. Here we have shown that these ideas can be productively applied to modeling genotype–phenotype relationships in high-throughput mutagenesis data. While the form of epistasis is very simple and easy to understand, our model nonetheless captures the major features of genetic and genotype-by-environment interaction for GB1, GFP, and TEM-1 β-lactamase. We suggest that it provides a natural first model to apply to any high-throughput mutagenesis dataset.

## Materials and Methods

### Preprocessing of Genotype–Phenotype Data.

As input our procedure requires pairs of genotype–phenotype measurements and optional estimates of measurement error and environmental condition.

For the GB1 datasets (37, 38), we defined a functional score and variance from the sequence counts before and after selection based on a Poisson approximation

For the GFP dataset (31), we used the fluorescence measurement (minus the WT fluorescence) and variance as provided.

For the β-lactamase dataset (45), trials 1 and 2 were combined, with given functional measurements relative to WT (and no estimates of measurement error). The WT sequences were not given, and therefore we added them to our data with value equal to zero for each trial and antibiotic concentration. Mutations were excluded that showed small effects relative to the distribution of neutral variants (absolute difference from WT less than 0.25 under all concentrations of antibiotic) since the effect of mutations that show no fitness defect across all conditions is not well defined under our best-fit model (below).

### Inference of Nonepistatic Model.

The additive trait **2**, is the prediction of a nonepistatic model given a sequence a and parameters

In the GFP (31) dataset, estimated errors are based on the fluorescence variance of sequences with the same barcode. Many of the sequences appeared only once, with no associated variance estimates. In this case, we extended our likelihood such that if there was measurement error associated with a sequence, we set the total variance in the likelihood to be

In the β-lactamase dataset, no measurement noise estimates were used, and the log-likelihood reduces to a mean-square error.

### Inference of Global Epistasis.

Our model of global epistasis is an extension of the nonepistatic model, replacing ϕ with

While it is possible for the optimization to get stuck in local optima, resulting in a poor fit, the following choice of initial parameters produced very good solutions in all datasets we tested. First, a nonepistatic model was fitted, resulting in

One subtlety in the inference procedure is that the latent additive trait is defined only up to an affine transformation. This occurs because, for any invertible affine transformation *A* of the underlying phenotype, a corresponding change in the global epistasis can give identical predictions:

### Separating Genetic and Environmental Effects.

We separate genetic and environmental effects with a model where the measured phenotype is a nonlinear function of an intermediate trait plus the environmental effect,

As noted in the main text, our best-fit nonlinear function g for the β-lactamase data had slope equal to zero,

When the data consist of single mutants in multiple environments, the cross-validation of the model requires some modification to avoid assigning all data on a given mutation to the test set, since we have no way of making predictions for mutations that have never been observed. To avoid this problem, we generate the training and testing subsets based on the environmental conditions. For each mutation that is observed under several conditions/replicates, we assign a random permutation of integers between one and the number of replicates and conditions (i.e., the nine categories in Fig. 3). These integers define the subsets used for training and testing. For each fold of cross-validation, all observations assigned to a particular integer are used as the test set and the remaining observations are used as training.

### Bootstrapped Confidence Intervals.

We estimated confidence intervals by a parametric bootstrap based on the maximum-likelihood predictions. The bootstrapped data were generated as

### Model Comparison.

We compared nested maximum-likelihood models by a bootstrapped likelihood-ratio test. First the difference in log-likelihood was computed between the simpler (null) model and the more complex (hypothesis) model. The likelihood optimization of the more complex model had initial parameters determined by the simpler (null) model. *P* values were calculated by comparing to a bootstrapped log-likelihood difference distribution, with

We validated this procedure by bootstrapping the *P* values for the first test in *SI Appendix*, Table S1. That is, for each bootstrap in the test, we calculate a *P* value by means of a second (or recursive) likelihood-ratio test based on the bootstrapped data and models (with 100 bootstraps). The resulting *P*-value distribution (*SI Appendix*, Fig. S7) should be uniform in the ideal case; however, the resulting distribution has excess probability in the center, indicating that the test is somewhat conservative.

## Acknowledgments

We thank Jesse Bloom for comments on the bioRxiv preprint and the anonymous referees for helpful comments. J.O. was supported by NIH Grant T32AI055400, and J.B.P. acknowledges support from the David and Lucile Packard Foundation and the US Army Research Office (W911NF-12-R-0012-04).

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. Email: jakubo{at}sas.upenn.edu.

Author contributions: J.O., D.M.M., and J.B.P. designed research; J.O. performed research; and J.O., D.M.M., and J.B.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The code used for inference and analysis, as well as input–output data, is available on GitHub at https://github.com/jotwin/GlobalEpistasis.jl.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1804015115/-/DCSupplemental.

Published under the PNAS license.

## References

- ↵
- ↵
- Kauffman SA

- ↵
- Huynen MA,
- Stadler PF,
- Fontana W

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Levy RM,
- Haldane A,
- Flynn WF

- ↵
- Otwinowski J,
- Plotkin JB

- ↵
- ↵
- ↵
- Kacser H,
- Burns JA

- ↵
- Sved JA,
- Reed TE,
- Bodmer WF

- ↵
- King JL

- ↵
- Milkman RD

- ↵
- Kimura M,
- Crow JF

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Bloom JD, et al.

- ↵
- ↵
- Wylie CS,
- Shakhnovich EI

- ↵
- Kondrashov FA,
- Kondrashov AS

- ↵
- ↵
- ↵
- Kryazhimskiy S,
- Rice DP,
- Jerison ER,
- Desai MM

- ↵
- Jacquier H, et al.

- ↵
- ↵
- Pokusaeva V, et al.

- ↵
- Sailer ZR,
- Harms MJ

- ↵
- Szendro IG,
- Schenk MF,
- Franke J,
- Krug J,
- de Visser JAGM

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Klesmith JR,
- Bacik JP,
- Wrenbeck EE,
- Michalczyk R,
- Whitehead TA

- ↵
- Weinreich DM,
- Delaney NF,
- Depristo MA,
- Hartl DL

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Sandberg WS,
- Terwilliger TC

- ↵
- ↵
- Wu NC,
- Olson CA,
- Sun R

- ↵
- Otwinowski J

- ↵
- ↵
- ↵
- ↵
- Hazen RM,
- Griffin PL,
- Carothers JM,
- Szostak JW

- ↵
- ↵
- Manhart M,
- Morozov AV

- ↵
- Hwang S,
- Park SC,
- Krug J

- ↵
- Ramsay JO

- ↵
- Adams RM,
- Kinney JB,
- Walczak AM,
- Mora T

- ↵
- Li Q,
- Racine JS

- ↵
- ↵
- ↵
- Plackett RL

## Citation Manager Formats

## Sign up for Article Alerts

## Article Classifications

- Biological Sciences
- Evolution