## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Bayesian design of synthetic biological systems

Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved July 27, 2011 (received for review December 1, 2010)

## Abstract

Here we introduce a new design framework for synthetic biology that exploits the advantages of Bayesian model selection. We will argue that the difference between inference and design is that in the former we try to reconstruct the system that has given rise to the data that we observe, whereas in the latter, we seek to construct the system that produces the data that we would like to observe, i.e., the desired behavior. Our approach allows us to exploit methods from Bayesian statistics, including efficient exploration of models spaces and high-dimensional parameter spaces, and the ability to rank models with respect to their ability to generate certain types of data. Bayesian model selection furthermore automatically strikes a balance between complexity and (predictive or explanatory) performance of mathematical models. To deal with the complexities of molecular systems we employ an approximate Bayesian computation scheme which only requires us to simulate from different competing models to arrive at rational criteria for choosing between them. We illustrate the advantages resulting from combining the design and modeling (or *in silico* prototyping) stages currently seen as separate in synthetic biology by reference to deterministic and stochastic model systems exhibiting adaptive and switch-like behavior, as well as bacterial two-component signaling systems.

As we are beginning to understand the mechanisms governing biological systems we are starting to identify potential ways of guiding or controlling the behavior of cellular and molecular systems. Rationally reengineering organisms for biomedical or biotechnological purposes has become the central aim of the fledgling discipline of synthetic biology. By redirecting regulatory and physical interactions or by altering molecular binding affinities we may, for example, control metabolic processes (1, 2) or alter intra- and intercellular communication and decision making processes (3, 4). The range of potential applications of such engineered systems is vast: designing microbes for biofuel production (5, 6) and bioremediation (7); developing control strategies which drive stem cells through the various decisions to become terminally differentiated (or back) (8, 9), with the aim of developing novel therapeutics (10, 11); construction of new drug-delivery systems with homing microbes delivering molecular medicines directly to the site where they are needed (12); use of bacteria or bacterial populations (employing swarming and quorum sensing) as biosensors (13); and gaining better understanding of all manner of biological systems by systematically probing their underlying molecular machinery.

A range of tools and building blocks for such engineered biological systems are now available which allow us to, at least in principle, build such systems from simple and reusable biological components (14). In electronic systems, such modularity has been crucial and has allowed the cost-effective production of reliable components that can be combined to produce desired outputs. Biology, however, poses different and novel challenges that are intimately linked to the biophysical and biochemical properties of biomolecules and the media in which they are suspended. Especially in crowded environments such as found inside living cells the lack of insulation between different components, i.e., the very real possibility of undesired cross talk, can create problems; with increasing miniaturization similar, albeit quantum effects, are now also surfacing in electronic circuits (15).

As synthetic biology gears up to bring engineering methods and tools to bear on biological problems the way in which we manipulate biological systems and processes is likely to change. Historically, each new branch of engineering has gone through a phase of what can be described as tinkering before rationally planned and executed designs became common place. Arguably, this practice is the current state of synthetic biology and it has indeed been suggested that the complexity of synthetic biological systems over the past decade has reached a plateau (16). From the earliest days, explicit quantitative modeling of systems has been integral to the vision and practice of synthetic biology and it will become increasingly important in the future. The ability to model how a natural or synthetic system will perform under controlled conditions must in fact be seen as one of the hallmarks of success of the integrative approaches taken by systems and synthetic biology.

Here we present a statistical approach to the design of synthetic biological systems that utilizes methods from Bayesian statistics to train models according to specified input-output characteristics. It incorporates modeling and automated design and is general in the sense that it can be applied to any system that can be described by a mathematical model which can be simulated. Because of the statistical nature of this approach, previously challenging problems such as handling stochastic models, accounting for kinetic parameter uncertainty, and incorporating environmental stochasticity can all be handled in a straightforward and consistent manner.

## Bayesian Approach to System Design

The question of how to design a system to perform a specified task can be viewed as an analogue to reverse engineering. In design we want to elucidate the most appropriate system to achieve our design objectives; in reverse engineering we aim to infer the most probable system structure and dynamics that can give rise to some observational data. In this respect, the design question can be viewed as statistical inference on data we wish to observe.

In the Bayesian approach to statistical inference the posterior distribution is the quantity of interest and is given by the normalized product of the likelihood and the prior. In most practical applications the posterior distribution cannot be derived analytically, but if the likelihood (and prior) can be expressed mathematically we can use Monte Carlo methods to sample from the posterior. In many cases where the model structure is complex the likelihood cannot be written in closed form and traditional Monte Carlo techniques cannot be applied. These include inference for the types of stochastic processes encountered in systems and synthetic biology. In these cases a family of techniques known collectively as approximate Bayesian computation (ABC) can be applied: ABC uses model simulations to approximate the posterior distribution directly. Here we use a sequential Monte Carlo ABC algorithm known as ABC SMC to move from the prior to the approximate posterior via a series of intermediate distributions (17). This framework can also be used to perform Bayesian model selection (18) and has been implemented in the software package ABC-SysBio (19).

Fig. 1 depicts the approach presented here. The design objectives are first specified through input-output characteristics. Here these have been depicted as a single time series, though the method can be applied in a much broader sense with multiple inputs and outputs. A set of competing designs is then specified through deterministic or stochastic models, each containing a set of kinetic parameters and associated prior distributions. The distance function measures the discrepancy between the model output and the objective. In principal it is possible to specify a distribution over the objective and each model could also contain experimental error. The ABC SMC algorithm then automatically evolves the set of models toward the desired design objectives. The results are a set of posterior probabilities representing the probability for each design to achieve the specified design objectives in addition to the posterior probability distribution of the associated kinetic parameters. This approach is similar in spirit to some existing methods for the automated design of genetic networks such as those adopting evolutionary algorithms (20, 21), Monte Carlo methods (22, 23), or optimization (24⇓–26) but the advantages of our method over traditional ones are that we can utilize powerful concepts from Bayesian statistics in the design of complex biological systems, including

the rational comparison of models under parameter uncertainty using Bayesian model selection, which automatically accounts for model complexity (number of parameters) and robustness to parameter uncertainty;

a posterior distribution over possible design parameter values that can be analyzed for parameter sensitivity and robustness and provide credible limits on design parameters;

the treatment of stochastic systems at the design stage including the design of systems with required probability distributions on system components; and

methods for the efficient exploration of high-dimensional parameter space.

In the following we demonstrate the power of this approach by examining, from this unique perspective, systems that have been of interest in the recent literature. First we consider systems that are capable of biochemical adaptation (27), we then look at the ability of two bacterial two-component system (TCS) topologies to achieve particular input-output behaviors, and finally we finish with an analysis of designs for a stochastic toggle switch with no cooperative binding at the promoter.

## Biochemical Adaptation

Biochemical adaptation refers to the ability of a system to respond to an input signal and return to the prestimulus steady state (Fig. 1*A*). Ma et al. (27) identified two three-node network topologies that are necessary for biochemical adaptation: a negative feedback loop with a buffering node and an incoherent feedforward loop with a proportioner node (IFFLP). Within these categories they identified eleven simple networks that were capable of adaptation (Fig. 2*A*). We applied the Bayesian design approach to these eleven networks using Michaelis–Menten kinetic models with and without cooperativity [*SI Appendix* shows ordinary differential equations (ODEs) describing these models]. The desired output characteristics were defined through the adaptation efficiency, *E*, and sensitivity, *S*, given by where *I*_{1},*I*_{2} are the input values (here fixed at 0.5 and 0.6, respectively), *O*_{1},*O*_{2} are the output steady-state levels before and after the input change and *O*_{peak} is the maximal transient output level. We defined the two-component distance to be such that as decreases the behavior approaches the desired behavior. The final population was defined to obey the tolerances *ϵ* = {0.1,1.0}, which defines close to perfect adaptation (when *O*_{1} - *O*_{2} ≤ *O*_{1}/50) and a fractional response equal to the fractional change in input.

The results of the model selection are shown in Fig. 2 *B* and *C*. When cooperativity is not included the most robust designs for producing the desired input-output characteristics are the incoherent feedforward loops, but when cooperativity is added the posterior shifts significantly toward the negative feedback topology. If a system with these requirements were to be implemented then not only would designs 11 and 4 be clear candidates for further study, but many of the designs can be effectively ruled out and the ranking of the models provides a clear strategy for an experimental program. These results also illustrate how small changes in context or incomplete understanding of a system can produce a large change in the most robust design. The Bayesian framework allows us to incorporate such uncertainty—or safeguard against our ignorance—naturally into the design process.

The posterior distribution provides information on which parameters are correlated and which are the most sensitive to the desired behavior. The posterior for model 11 under no cooperativity is shown in Fig. 2*D*, where the ODE model is given by where *X* = {*A*,*B*,*C*} corresponds to the concentrations of the active forms of proteins (1 - *X* corresponds to the concentrations of the inactive form). *I* represents the input signal and *k* and *K* represent the reaction rate parameters (of which there are 12 in total). *F*_{A} and *F*_{B} represent background deactivating enzymes with concentration fixed at 0.5.

The posterior shows in particular that the parameters for the background deactivating enzyme (*k*_{FB} and *K*_{FB}) on node B are correlated, and that *K*_{FB} should be large; this criterion is exactly the requirement for the linear regime necessary for the IFFLP system to achieve adaptation (27). A principal component (PC) analysis of the posterior (*SI Appendix*, Fig. S1) shows other correlated parameters on the first few principal components. The last principal component describes the direction of least variance and therefore the most sensitive parameters. From this information we can deduce for example that the nature of the reaction between nodes A and C is relatively unimportant. A similar analysis for model 4 in the case of cooperativity (*SI Appendix*, Figs. S2 and S3) shows for example that the behavior is insensitive to the values of the Hill coefficients and the details of the reaction between nodes A and C.

## Bacterial Two-Component Systems

TCSs allow bacteria to sense external environmental stimuli and relay information into the cell, e.g., to the gene expression apparatus. They consist of a histidine kinase (HK) that autophosphorylates upon interaction with a specific stimulus. The phosphate group is then passed on to a cognate response regulator protein (RR) which, once phosphorylated, can regulate transcription (28).

Naturally occurring TCSs differ in the number of phosphate binding domains. In the most common form there are two phosphate binding domains (Fig. 3*A*), but an alternative form exists that consists of a phosphorelay mechanism with four phosphate binding domains (Fig. 3*B*). These are referred to as the orthodox and unorthodox TCSs, respectively (29). The reason for the existence of two forms of TCS remains largely unknown but it has been demonstrated that the phosphorelay is robust to noise and can provide an ultrasensitive response to stimuli (29, 30), whereas the orthodox system can provide behavior that is independent of the concentrations of its components (31). Here we have applied the Bayesian approach to directly compare the ability of orthodox and unorthodox designs to achieve various input-output behaviors, using ODE models similar to ones described previously (29) (see *SI Appendix*).

Fig. 3 *C*–*F* show four types of behavior that may be desired in synthetic TCS systems (e.g., for bioremediation or biopharmaceutical applications), and the corresponding posterior probabilities of the orthodox and unorthodox models to achieve them. In Fig. 3*C* the specified behavior is that of a fast response to a square pulse input signal. That is the output should show a maximum within 0.1 s after the pulse starts and a minimum 0.1 s after the pulse ends. As can be seen from the posterior probabilities, both models achieve this behavior easily, as one would expect from a signaling system, with the orthodox system slightly outperforming the unorthodox system. In Fig. 3*D* the ability of the two systems to achieve a steady output state for *t* > 2 s under a constant input signal is examined, and again both systems perform comparably with the orthodox system appearing slightly more favorable.

In Fig. 3*E* the input signal is a high-frequency sinusoid with a mean of 0.6, and the desired output is a constant signal with the same mean and root mean square < 0.3; this behavior would mimic a system that is robust to high-frequency noise. The output trajectories at some intermediate and final populations are shown in *SI Appendix*, Fig. S4. In this example the unorthodox system clearly outperforms the orthodox system, which indicates the increased robust to noisy signals that comes with the relay architecture (29). But the direct comparison of the two models’ ability to cope with noise, which is becoming possible in this approach, also reveals some unexpected characteristics: Inspection of the posterior distribution (*SI Appendix*, Fig. S5) shows that all the dephosphorylation reaction rates, *k*_{6},*k*_{8},*k*_{9}, are minimized whereas the rate of the signal induced autophosphorylation (*k*_{1}) is large. Thus the noise reduction mechanism in the unorthodox system works by saturating the system.

In Fig. 3*F* the input is again a step function but the output is more specific; it must reach to > 0.8 and drop to < 0.2 within 0.5 s of the pulse start and end, respectively, thus approximately reproducing the input. *SI Appendix*, Fig. S6 shows the evolution of the system in this case. Here the orthodox model clearly outperforms the unorthodox model. Inspection of the posterior distribution (*SI Appendix*, Fig. S7) shows that both the rate of the signal induced autophosphorylation and the rate of phosphorylation of the response regulator by the histidine kinase, *k*_{1},*k*_{2}, are large whereas the rate of dephosphorylation of the response regulator, *k*_{5}, is small. These properties ensure that the shape of the signal is transferred faithfully through the system.

## Stochastic Genetic Toggle Switch Without Cooperativity

The genetic toggle switch is a synthetic realization of a bistable switch that forms the basis of cellular memory (32). It is formed by two genes *A* and *B*, whose respective proteins repress the production of the other protein; protein *A* represses the production of protein *B* and vice versa (Fig. 4). The presence of an interaction with inducer molecules allows the system to switch between steady states with the probability of spontaneous switching low enough such that, in the absence of an interaction, the system will effectively remain in its appropriate steady state indefinitely.

Here we consider four versions of the stochastic genetic toggle switch that are all bistable without the requirement for cooperative binding of the proteins to the gene promoter (33) (note that the deterministic models are not necessarily bistable). The designs are shown in Fig. 4*A* and consist of the basic toggle switch, an exclusive version containing only one promoter, a version that includes bound repressor degradation (BRD), and a version containing a protein-protein interaction between *A* and *B* with the resulting complex nonfunctional. The additional reactions are always to reduce the probability of the deadlock state where both *A* and *B* are bound to the promotors of *B* and *A*, respectively (33). We modeled the switches using a continuous time Markov jump process which obeys the chemical master equation. Only protein level reactions were modeled which makes the models simpler (and faster to simulate) whereas retaining the important behavior. The stochastic models for all four switches are given in *SI Appendix*. For such complicated stochastic dynamical systems the advantages of a Bayesian perspective over conventional model design strategies (based, e.g., on optimization) come to the fore: Without an appreciation of the whole distribution choice of the best model would be subject to considerable uncertainty.

Fig. 4*B* shows the desired toggle switch behavior. The inducer *S* is added between *t* = 10 and *t* = 30, after which the level of protein *B* should reach a steady state with a mean number of 40. Between *t* = 60 and *t* = 80 the inducer *R* is added after which the level of protein *B* should drop to zero. The inducer numbers are both assumed to be 40 molecules which is fixed in the specified range. The desired output behavior was specified via the two-component distance metric, defined to be , where *x*_{t} is the number of protein *B* at time *t*, *y* is the target (here fixed at 40), *α* = {*t*: 30 < *t* ≤ 60}, *β* = {*t*: 0 < *t* ≤ 10 and 80 < *t* ≤ 100}, *n*_{α} = #*α* and *n*_{β} = #*β*. The final population was defined to be *ϵ* = {7.0,0.05}. Here *d*_{1} represents the distance in the on region and *d*_{2} represents the distance in the off region.

*SI Appendix*, Fig. S8 shows the evolution of the stochastic simulations toward the desired behavior. Fig. 4*C* shows the posterior probabilities for each system to achieve the specified behavior, and in particular demonstrates that the exclusive toggle switch outperforms all the others, which chimes with intuition, because the exclusive switch removes the possibility of the deadlock state without the addition of any extra reactions. The fact that the BRD switch performs worse than the original toggle shows that the addition of the two extra degradation reactions does not offer a great enough performance increase for the extra parameters, a manifestation of the parsimony principle or Occam’s razor which is inherent to the Bayesian model selection framework used here.

The reactions that comprise the exclusive switch are given by where *gA*,*gB* represent the gene promotors for protein *A*,*B*, respectively, and are fixed at one copy, *A* - *gB*, *B* - *gA* represent the bound transcription factors and *S*,*R* represent the inducer molecules. The *P* species, fixed to be one copy, ensures only *A* or *B* can be bound at any one time. Examination of the posterior distribution for this model, Fig. 4*D*, clearly shows a large correlation between *k*_{3} and *k*_{4}, which are the production and decay rates of protein *B*, respectively. This correlation is clearly seen in the principal components (*SI Appendix*, Fig. S9) through the combination *k*_{3} + *k*_{4} dominating the first PC and the combination *k*_{4} - *k*_{3} dominating the last PC. Thus the system is sensitive to only the difference in these rates which is typical in birth-death processes.

## Discussion

In this paper we have presented a unique method for the design of synthetic biological systems employing ideas from Bayesian statistics. We have demonstrated its utility and generality on three different systems spanning biochemical, signaling, and genetic networks; we also examined oscillatory behavior which is presented in *SI Appendix*. This method has advantages over traditional design approaches in that the modeling and model evaluation/characterization is incorporated directly into the design stage. The statistical nature of the method has many attractive features including the handling of stochastic systems, the ability to perform model selection and the handling of parameter uncertainty in a well-defined manner. Model selection using summary statistics can be problematic in other settings (34) but is not a concern here because we either use the full dataset with no summary or we define our posterior distributions through the desired system outputs. We used the ABC-SysBio and cuda-sim softwares (19, 35), which takes as input a set of SBML files and as such can be used by bioengineers and experimentalists to rationally compare their competing designs for a system. By using this method we hope that the implementation time of synthetic systems can be reduced by defining a program of experimental work based on the posterior probabilities of each design.

Monte Carlo sampling of parameter spaces has been used to assess the robustness of engineered and biological systems in the past. But like in the statistical case, simple Monte Carlo sampling tends to waste too much effort and time on those regions which are of no real interest for reverse engineering or design purposes. Our statistically based sequential approach homes in onto those regions where the probability of observing the desired behavior is appreciable, which allows us a more nuanced comparative assessment of different design proposals, especially when dynamics are expected (or indeed desired) to exhibit elements of stochasticity. And the Bayesian model selection approach automatically strikes a balance between the systems’ abilities to generate the desired behavior effectively but also robustly.

Further developments will include the incorporation of methods for model abstraction to reduce computation time (36) and to handle a database of standard parts as in other existing design software systems (37, 38). Moreover, it is also possible to include the generation of novel structures (by, e.g., using stochastic context-free grammars (39) to propose alterations to a reaction/interaction network) as part of the design process. Just like in the case where ideas from control engineering and statistics can gainfully be combined to reverse-engineer the structure of naturally evolved biological systems, we feel that in the design of synthetic systems such a union will also be fruitful.

## Acknowledgments

C.B., D.S. and M.P.H.S gratefully acknowledge support from the Biotechnology and Biological Research Council. X.S. is supported through a Kwok Foundation scholarship. M.P.H.S. is a Royal Society Wolfson Research Merit Award holder.

## Footnotes

- ↵
^{1}To whom correspondence may be addressed. E-mail: christopher.barnes{at}imperial.ac.uk or m.stumpf{at}imperial.ac.uk.

Author contributions: C.P.B. and M.P.H.S. designed research; C.P.B., D.S., and X.S. performed research; and C.P.B. and M.P.H.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1017972108/-/DCSupplemental.

## References

- ↵
- ↵
- ↵
- ↵
- Kobayashi H,
- et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Toni T,
- Welch D,
- Strelkowa N,
- Ipsen A,
- Stumpf MPH

- ↵
- Toni T,
- Stumpf MPH

- ↵
- Liepe J,
- et al.

- ↵
- ↵
- François P,
- Hakim V

*in silico*. Proc Natl Acad Sci USA 101:580–585. - ↵
- Battogtokh D,
- Asch DK,
- Case ME,
- Arnold J,
- Schuttler HB

*Neurospora crassa*. Proc Natl Acad Sci USA 99:16904–16909. - ↵
- ↵
- Rodrigo G,
- Carrera J,
- Jaramillo A

- ↵
- ↵
- Batt G,
- Yordanov B,
- Weiss R,
- Belta C

- ↵
- ↵
- ↵
- ↵
- Csikász-Nagy A,
- Cardelli L,
- Soyer OS

- ↵
- Shinar G,
- Milo R,
- Martínez MR,
- Alon U

- ↵
- ↵
- ↵
- Robert CP,
- Cornuet JM,
- Marin JM,
- Pillai N

- ↵
- Zhou Y,
- et al.

- ↵
- Myers CJ,
- et al.

- ↵
- Hill AD,
- Tomshine JR,
- Weeding EMB,
- Sotiropoulos V,
- Kaznessis YN

- ↵
- Marchisio MA,
- Stelling J

- ↵
- Baldi P,
- Brunak Sa

## Citation Manager Formats

## Sign up for Article Alerts

## Article Classifications

- Biological Sciences
- Biophysics and Computational Biology

- Physical Sciences
- Statistics

## Jump to section

## You May Also be Interested in

_{2}into liquid methanol using artificial marine floating islands.