Previous Article |
Table of Contents
| Next Article
Computer Sciences / Biochemistry
Stochastic computing with biomolecular automata

, 


, ¶
, ||
Departments of *Biological Chemistry and
Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel;
School of Computer Science and Engineering and ¶Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
Edited by Richard M. Karp, International Computer Science Institute, Berkeley, CA, and approved May 5, 2004 (received for review February 2, 2004)
| Abstract |
|---|
|
|
|---|
Biomolecular computers are molecular-scale, programmable, autonomous computing machines in which the input, output, software, and hardware are made of biological molecules (12). Biomolecular computers hold the promise of direct computational analysis of biological information in its native biomolecular form, eschewing its conversion into an electronic representation. Recently this capability was shown to afford direct recognition and analysis of molecular disease indicators, providing in vitro disease diagnosis, which in turn was coupled to the programmed release of the biologically active molecule modeled after an antisense DNA drug (13). Because of the stochastic nature of biomolecular systems (14), a stochastic biomolecular computer would be more suitable for this biomedical task than a deterministic one.
Although the vision of a universal biomolecular computer was proposed three decades ago (15, 16), experimental research in this area began only a decade ago (17). Initially, research focused on large-scale DNA computers in which a human operator, or a large-scale robot, uses parallel DNA manipulation operations to achieve high performance in solving computationally intensive problems (1825). A different research track demonstrated molecular systems that go through a predetermined sequence of state-to-state transitions in a deterministic (26) or stochastic (27) fashion and programmable, autonomous computing machines with molecular input, output, software, and hardware that realize two-state, two-symbol finite automata (1012). A mathematical description of such an automaton is shown in Fig. 1A, and an example computation is shown in Fig. 1B. In the molecular realization of these automata the input is encoded as a single DNA molecule, transition rules are encoded by another set of DNA molecules, and the hardware consists of DNA-manipulating enzymes. A computation commences when all molecular components are present in solution and processes the input molecule in steps performed by the hardware molecules, as directed by the software molecules. The output molecule, formed by the programmed digestion of the input molecule, encodes the result of the computation.
|
Here, we demonstrate a design principle for stochastic computers, afforded by the unique properties of biomolecular computers, to realize the intended probability of each transition by the relative molar concentration of the software molecule encoding that transition. We describe and experimentally analyze a stochastic molecular automaton that operates according to this principle. The experiments show robustness of programmed transition probabilities to input molecule concentrations and to absolute software molecule concentrations and a good fit between predicted and actual result probabilities of multistep computations.
| Materials and Methods |
|---|
|
|
|---|
Calibration Reactions. Calibration of the different concentration ratios of the software transition pairs was performed by using 0.1 µM concentrations of four-symbol inputs. In a typical reaction, a program required for a particular calibration (Fig. 2A) was composed as follows: for each deterministic step, a 0.5 µM concentration of the corresponding transition was added. Thus, the deterministic part of the computation was performed by a total of 1.5 µM transition mixture. For the last-choice step, a total of the 0.5 µM concentration of the tested transition molecules mixture was taken at a required ratio. FokI enzyme was added at the 2 µM final concentration to maintain the stoichiometric ratio with the software molecules. Before input addition reaction mixtures were preincubated with FokI at 15°C for 20 min. The reactions were quenched after 2 h and run on denaturing acrylamide gel (40 cm x 0.4 mm, 15%, 7 M urea) with low-fluorescence plates. The fluorescence was read by using the TYPHOON SCANNER CONTROL software of the Typhoon 9400 machine and quantified by using the IMAGEQUANT V. 5.2 software (Amersham Pharmacia Biosciences). The quantitation was performed according to the Cy5 label (PMT 500550 V) on the 5' terminus of the antisense strand, because it gave more reproducible and consistent results than the FAM label on the 3' terminus of the sense strand. Excitation was done with the red laser (633 nm), and emission was measured through the 670 BP30 filter.
|
Calculation of Transition Probabilities. To calculate the transition probabilities by using measured output distribution, a computer implementation of the computation graph was developed in MATLAB 6.5 (MathWorks, Natick, MA). The simulation generated an equation set for each given program, with transition probabilities as the unknown variables. Each equation summed the probabilities of all possible computation paths carried out on one input. The result was expected to be similar to the measured final-state distribution for that computation. A solution to such an equation set is an optimal set of transition probabilities, which minimizes the discrepancy between the calculated and the measured final-state distribution. A least-squares optimization was used for calculating the optimal solution by iteratively refining the transition probabilities by using the method of preconditioned conjugate gradients. However, because of the nature of the problem, there could be local solutions, which might not be consistent with other programs. The following process was used for finding a consistent solution: Programs 1, 2, and 3 were used jointly as training programs according to which the simulation could learn a consistent set of transition probabilities. The optimization process was repeated 450 times for programs 1, 2, and 3. For each set, a single optimization was performed given the calibration initial values. Additional 449 optimizations were performed for each program with random initial values. All possible combinations of the calculated optimal transition sets were compared with each other. The most consistent triplet-of-transition probability set was selected, so that the transition probabilities calculated for the same concentration ratios were similar for different programs.
Determining the Deviation of Predicted Results. Determination of the standard deviation of the predicted output ratio was performed by simulating all possible independent pipetting errors of 5% with the same probabilities when preparing the mixtures of the transition molecules. We simulated the experimental setup where each pair of transitions was mixed independently, and then the program mixture was composed from equal solution volumes containing each pair. We assumed discrete deviations of -5%, 0%, and 5% from the nominal volume of each software molecule solution. The deviation in the transition probability for each volume deviation was calculated from the measured calibration curve. Each program generated a set of 6,561 (38) different probability combinations, resulting in 6,561 possible output ratios, for which a standard deviation was calculated. The average of the set was very close to the predicted value with no deviations. The simulation was performed with the MATLAB 6.5 software (MathWorks).
| Results |
|---|
|
|
|---|
At the molecular level, two transition molecule species compete for any intermediate configuration with probabilities of success correlated to their relative concentrations (Fig. 1F). When the computation is performed on a large ensemble of input molecules, the probability to obtain a particular final state can be measured directly from the relative concentration of the output molecule encoding this state among all output molecules. We control the transition probabilities by varying the relative concentration of the transition molecules and attempt to predict the distribution between output molecules for a given input based on these concentrations. Our key problem in doing so is to determine the function linking relative concentrations of competing transition molecules to the probability of a transition being chosen. Minute variations in the chemical composition of each transition molecule result in different affinities to the input molecule and, therefore, this function must be determined experimentally. Furthermore, we cannot assume that this function is independent of the absolute molar concentrations of the input and/or of the software molecules. The bulk of our experimental and computational work was devoted to determining this function for each pair of competing transition molecules and to establishing its independence of these other parameters.
To determine the function mapping relative concentrations of transition molecules to transition probabilities we performed calibration with the four-symbol inputs aaab and bbba. The computation proceeds deterministically until the last symbol and then performs a stochastic choice between two competing transitions. We used this design to represent adequately multisymbol computations and avoid effects unique to initial symbols. We prepared software mixtures that represent all four possible intermediate state-symbol combinations (Fig. 2 A). The computation was carried to completion and the exact duration was determined in the preliminary experiments with inputs of different lengths. The resulting measured calibration curves for all four competing pairs of transition molecules are shown in Fig. 2B, demonstrating a different mapping for each pair of transitions. Transitions processing the symbol a provide a linear mapping, with a probability distribution being close to the concentration ratio. On the other hand, transitions processing b reveal a convex mapping, which is apparently caused by the software molecules that result in state S1 (T4 and T8) having a higher reaction rate than the competing software molecules that result in state S0 (T3 and T7). In steep regions of the curve the precision with which probabilities can be programmed is reduced because of higher sensitivity to pipetting errors. Error probabilities of the software molecules are reflected in the non-zero probability to accomplish a transition when its competing transition is absent (e.g., T3T4). However, we suspect that the computation reaction also has byproducts that overlap with error-representing bands but do not accumulate in the final error of multistep computations. Studies to distinguish between the two types of erroneous products are underway. Accumulating errors in transitions result from the incorrect cleavage by FokI that also cleaves one nucleotide further than expected both in the sense and antisense strands of the input molecule. Because of this imprecision in enzyme action and the relative location of the S1 and S0 sticky ends, transitions that transform to state S1 may result in transitions to state S0, whereas transitions intended to transform to S0 result in dead-end products. This finding explains the higher apparent error rates observed with the software molecules that transform to S1.
To verify that the system is insensitive to fluctuations in input concentration, which naturally occur in the course of a multistep computation, we measured the distribution of output states of a computation with one stochastic choice by using the input bbba and the computation tree used for calibration of the competing pair of transition pair T1T2. The summary of the results in Fig. 2C indicates that the computation is insensitive to the different input concentrations used and the transition probability is determined solely by the ratio between the transition molecules. Another set of experiments was performed to ensure that the transition probability as determined by the ratio between transition molecules is not affected by their absolute concentrations. We used the same experimental system as above, but in this case the concentration of the input was kept constant while the total concentration of the competing transition molecules varied. The summary of the results is shown on the graph in Fig. 2D. The results indicate that the transition probability is indeed relatively insensitive to the absolute software concentrations and is defined mostly by the relative concentration ratio.
We tested the stochastic finite automaton by running four programs with the same structure as the one described in Fig. 1C, but with different transition probabilities (Table 1), on nine inputs that varied in symbol composition and length (I1I9, Table 2, which is published as supporting information on the PNAS web site). Each program specifies the relative concentrations of all pairs of competing transition, which in turn determine their expected probabilities as explained below. We carried out extensive preliminary experiments (not shown) to develop the reaction protocol. These experiments indicated that at high concentrations the transition T6 probably digests other transition molecules with the aid of the enzyme FokI, which is present in the solution, modifying the software composition during the computation in an unpredictable way. Hence, we avoided high concentrations of T6 in our programs. We then performed the calibration and computation experiments reported here in a single run under uniform conditions. Fig. 2E shows representative results of the computations as analyzed by PAGE. Each lane is an application of programs 3 and 4 to one of the inputs. We predicted the distribution of the outputs by using the calibration graphs (Fig. 2B). Good correlation was observed between predicted and measured results by using measured transition probabilities, although some systematic errors were apparent. To account for these discrepancies, we calculated the expected deviation in output probabilities caused by independent pipetting errors of 5% when mixing transition molecules (Fig. 3). The calculation indicated that a number of measured results, notably with inputs ending with b, fell outside of the expected error range and were consistently lower than the prediction. Therefore, this bias could not be attributed solely to pipetting errors but rather to some error in the method of direct probability measurement. To compensate for discrepancies between transition probabilities obtained in direct measurements and those manifested in multistep computations, we designed an in silico method for probability determination based on a simulation of the reaction network. The simulation utilizes least-squares optimization to find an optimal set of transition probabilities for each program. The optimization begins with measured transition probabilities and iteratively refines them until a minimum discrepancy between calculated and measured output probabilities is reached. Programs 1, 2, and 3 were used jointly as a training set according to which the simulation could learn a consistent set of optimal transition probabilities. The transition probabilities calculated for the same concentration ratios were equal for different programs, as they should be under our molecular computational model. The computed transition probabilities were then tested independently on program 4, since it is composed of pairs of transitions already used in at least one of the programs 13, and it provided a good fit to the measured final-state probabilities of that program.
|
|
In Table 1, column [T]rel shows the relative percentage of the transition molecules that transform to S1 within each competing pair; column Pm shows the measured transition probability corresponding to this molecule, and column Pc shows the calculated transition probability. Fig. 2F shows the correlation between measured output probabilities and output probabilities calculated from measured and calculated transition probabilities. The correlation that utilizes the calculated transition probabilities is very good. A detailed summary of the results is given in Tables 2 and 3, which are published as supporting information on the PNAS web site.
| Discussion |
|---|
|
|
|---|
| Acknowledgements |
|---|
| Footnotes |
|---|
Freely available online through the PNAS open access option.
Abbreviation: FAM, carboxyfluorescein.
R.A. and Y.B. contributed equally to this work. ![]()
|| To whom correspondence should be addressed. E-mail: ehud.shapiro{at}weizmann.ac.il.
© 2004 by The National Academy of Sciences of the USA
| References |
|---|
|
|
|---|
This article has been cited by other articles in HighWire Press-hosted journals:
![]() |
S. Beyer and F. C. Simmel A modular DNA signal translator for the controlled release of a protein by an aptamer Nucleic Acids Res., March 17, 2006; 34(5): 1581 - 1587. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||