New Research In
Physical Sciences
Social Sciences
Featured Portals
Articles by Topic
Biological Sciences
Featured Portals
Articles by Topic
 Agricultural Sciences
 Anthropology
 Applied Biological Sciences
 Biochemistry
 Biophysics and Computational Biology
 Cell Biology
 Developmental Biology
 Ecology
 Environmental Sciences
 Evolution
 Genetics
 Immunology and Inflammation
 Medical Sciences
 Microbiology
 Neuroscience
 Pharmacology
 Physiology
 Plant Biology
 Population Biology
 Psychological and Cognitive Sciences
 Sustainability Science
 Systems Biology
Protein–DNA computation by stochastic assembly cascade

Communicated by Robert H. Austin, Princeton University, Princeton, NJ (received for review October 11, 2001)
Abstract
The assembly of RecA on singlestranded DNA is measured and interpreted as a stochastic finitestate machine that is able to discriminate fine differences between sequences, a basic computational operation. RecA filaments efficiently scan DNA sequence through a cascade of random nucleation and disassembly events that is mechanistically similar to the dynamic instability of microtubules. This iterative cascade is a multistage kinetic proofreading process that amplifies minute differences, even a single base change. Our measurements suggest that this stochastic Turinglike machine can compute certain integral transforms.
Every computation requires a reliable recognition of its input data. Any scheme for computation based on protein–DNA binding must attain this recognition within the physical properties of this interaction, its specificity, affinity, and cooperativity. These properties define biochemical networks such as those used by the cell to process information received from stimuli and to compute its response. The resulting computations are inherently stochastic due to the “noisy” nature of biochemical pathways that resemble more a probabilistic pinball machine than a deterministic desktop PC.
So far, artificial, in vitro biomolecular computing strategies relied mainly on Watson–Crick complementarity of DNA (or RNA). These schemes, which were used to solve a certain class of hardtocompute problems, are almost deterministic due to the relatively high hybridization energy. The typical algorithm, combinatorial search, encodes potential solutions as DNA sequence library and then selects correct solution(s) via parallel filtration, eliminating the wrong solutions by manipulations based on complementarity (1, 2). Another approach constructs finitestate computing machines, the internal states of which are encoded in DNA sequence (3).
Here we report an in vitro stochastic biomolecular computation based on lowspecificity protein–DNA binding: An assembly cascade of RecA proteins on singlestranded DNA can discriminate between similar sequences, thus fulfilling a basic computational task that may be one stage in a more complex computation. The assembly process overcomes the errorprone nature of the single protein binding by constructing a multistage cascade, similar to kinetic proofreading (4), in which many proteins bind and unbind collectively. We find that the dynamics of the cascade is mechanistically similar to the dynamic instability of microtubules, which is used as an efficient space search algorithm within the living cell (5). It also resembles a stochastic counter (6), an imperfect digital apparatus that registers the number of certain events (think of a voting machine). The collective, nonlinear mode of operation of the cascade enables sensitive discrimination of minute length and sequence differences including a single base change.
The hardware of our molecular machine comprises a testtube filled with a solution of singlestranded DNA molecules, RecA proteins, and ATP molecules that fuel the assembly cascade. When the concentration of RecA monomers exceeds some onset value, they start to form helical filaments, one RecA monomer per each base triplet, that envelope and stretch the DNA (7). A filament first forms when a nucleus, a RecA monomer, binds to a random site along the DNA and then extends rapidly by polymerization to the 3′ end of the empty strand. When bound to DNA, RecAs hydrolyze ATP and change their conformation into a less stable state. The RecA that is closest to the 5′ end, with only one neighboring monomer, tends to disassemble back into the solution when hydrolyzing ATP (8). The resulting assembly–disassembly cascade is asymmetric; while nucleation events extend the filament by long chunks, disassembly removes monomers one by one. A graphical manifestation of this stochastic asymmetry is the irregular sawtooth form of the filament length (or machine state) dependence on time (Fig. 1A).
Rather than further describing the extensively studied biochemistry of RecA assembly (7) we focus on the computational features of this protein–DNA molecular machine, its “software.” We use here the notion of “machine” in the sense of certain physical realization of an abstract computation, sequence discrimination in our case. Nucleation and disassembly are the two basic operations of this machine. They change the machine's internal state, which is determined by the current length of RecA filament. To describe the machine dynamics, we use the traditional statetransition diagram, where circles represent states, and arrows represent transitions between states (Fig. 1B). In state Q_{n}, n binding sites out of total N sites along the DNA are vacant, and the RecA filament length is therefore N − n (Q_{0} is a fully covered DNA, and Q_{N} is an empty strand). Clearly, this is a finitestate machine with the number of states equal to the number of binding sites, N. The symbols on each arrow represent the probability per unit of time that such transition occurs given that the machine is in the state at the tail of the arrow. Disassembly can take the machine from state Q_{n} to the next state of the cascade Q_{n+1} at rate κ_{−} whereas at nucleation events the machine jumps from Q_{n} to any of the lower states Q_{m}, m < n, at rate κ_{+}. We also need an output device that will report the machine's current state. In the experiment, the molecular machine “reports” its state through a change in the rotational motion of the DNA molecule, which is directly related to the number of bound RecA monomers and measured by fluorescence anisotropy (9).
The stochastic statetransition diagram can be expressed as a set of N differential equations for the probabilities p_{n} that the machine is at state Q_{n}. Summing the incoming (first two terms) and outgoing (last two terms) transitions at each state of the diagram we obtain 1 The statetransition diagram couples each polymerization state Q_{n} with all the lower states Q_{m}, m < n (Fig. 1), and the equivalent master equation (Eq. 1) is therefore integrodifferential, with boundary and normalization conditions dp_{N}/dt = κ_{−}p_{N−1} − Np_{N}, ∑ p_{n} = 1. To reduce the connectivity of the statetransition diagram, we express it in terms of the cumulative P_{n} = ∑ p_{m}, the probability to find the machine at a stage higher or equal to Q_{n}. It is also the probability that the filament is shorter than N − n and that site n is empty. Two processes can alter P_{n}: (i) disassembly fronts vacate site n at a rate proportional to the front position distribution, −p_{n} = P_{n} − P_{n+1}, and (ii) nucleation at any of the n vacant sites fills site n. The dynamics is simpler, since any possible nucleation from a higher state, Q_{m}, m > n leaves P_{n} unchanged. The resulting master equation is local, dP_{n}/dt = −κ_{−}(P_{n+1} − P_{n}) − κ_{+}nP_{n}, with the normalization P_{0} = 1. Technically, one can obtain this result directly by summing Eq. 1 from n to N with the boundary conditions.
Thinking of n as a spatial coordinate, we approximate the discreet master equation for P_{n}(t) by a “drift” equation for the continuous cumulative probability P(n, t). 2 The disassembly term is approximated by a gradient, neglecting higher order derivatives in the Kramers–Moyal expansion (10). In particular, we omit the familiar secondorder diffusive term that plays a minor role as long as disassembly is much faster than nucleation rate, in the regime, (κ_{+}/κ_{−})N ≪ 1, which includes the large fluctuation regime, (κ_{+}/κ_{−})N^{2} ≈ 1, where the RecAassembly cascade is the most sensitive.†
Master equations such as 1 and 2 are generic in stochastic transition processes, especially in chemical kinetics (10). What makes the computingmachines terminology natural in our case is the understanding that the RecAbinding cascade processes information encoded in the DNA sequence. This may be clarified if one considers a concrete machine model of the cascade. This time we think of a Turinglike device, a deterministic machine that is coupled to an infinite tape through a reading head (Fig. 1C). The internal states of the machine are the same N binding states Q_{n}. The noisy Brownian dynamics of the cascade is embedded in the tape, which is produced by the following procedure: Time is divided into an infinite series of short equal segments that correspond to the squares on the tape. To each square we randomly assign a symbol with a probability that matches the transition rates. We denote disassembly from state Q_{n} by d_{n}, nucleation to state Q_{n} by e_{n}, and in the rest of the squares we write x to denote that nothing happens during the corresponding time duration. The machine reads the squares sequentially from, say, left to right and responds according to the symbol written in the current square. Suppose that the machine is at state Q_{n}, then it responds according to a simple set of rules. (i) If it reads d_{n}, it moves to state Q_{n+1}. (ii) If it reads e_{m} and m < n the machine moves to state Q_{m}. (iii) In all other cases, if it reads x or d_{m} with m ≠ n, or e_{m} with m ≠ n, then it stays at state Q_{n}. After its state is determined, the reading head moves one square to the right.
Stochastic automata are natural to information processes ever since they emerged in Shannon's classical study of communication channels (14, 15). The notion of stochastic computers was introduced to the molecular realm in Bennett's discussion of DNA translation and replication, where the computational task is sequence copying (16). We show below that rather than Xeroxing the sequence like RNA and DNApolymerase, the RecA cascade carries out another type of computation, the discrimination of closeby sequences. Sequence information is encoded in the random tape through the dependence of the probabilities for disassembly (d_{n}) and nucleation (e_{n}) events at a certain site n on the specific base triplet. This information can be equivalently encoded as sequencespecific transition rates, κ_{−}(n) and κ_{+}(n), in the statetransition diagram and the corresponding master equation. Although RecA is a nonspecific binding protein with similar affinities for many possible triplets, our measurements show (9) that the collective assembly cascade constructed from these lowspecificity components is a highly specific detector that can amplify and discriminate even minute sequence differences (17).
The “sequencedetector” machines we construct are assembly cascades on singlestranded DNAs, 39 or 78 bases long (13 and 26stage machines). Any measurement that tries to “look inside” such a stochastic machine, that is to infer its internal dynamics from observable output, has to rely on statistical analysis (18). One must collect a sufficient set of observations to overcome the noisiness of the output. We resolve this difficulty by simultaneously measuring many identical machines, ≈10^{5}–10^{6} fluctuating DNA–RecA complexes that produce a very smooth ensembleaverage signal. An alternative approach could be time averaging over a singlemolecule signal (19). Our DNAs carry a fluorescent dye attached to their 3′ end. The fluorescence anisotropy of the dye reports RecA binding as it slows down the rotational motion of the DNA (9). The response of the cascade is examined as we tune the interplay between nucleation and disassembly by changing the available amount of RecA in the solution. The nucleation rate at any vacant site increases with RecA concentration, R, like κ_{+}(n) = κ_{T}(n)⋅R (where κ_{T}(n) is the tripletspecific rate constant), while the disassembly rate remains constant as the amount of ATP it consumes is kept at saturation level. When we add RecA to the test tube more monomers bind DNA through nucleationpolymerization process, and the chance to find occupied sites increases. The binding curve is sigmoidal, typical of collective chemical kinetics (Fig. 2 Inset).
Our measurement suggests an exponential sensitivity of the assembly cascade to sequence length and RecAbinding rate constants, κ_{−} and κ_{+} (Fig. 2). To understand how such amplified sensitivity is accomplished, we reexamine the master equation (Eq. 2). Motivated by our measurements that indicate fast relaxation of the assembly cascade, we study its steady state. For a uniform sequence, κ_{−}(n) = κ_{−}, κ_{+}(n) = κ_{+}, the steadystate probability distribution is Gaussian, 3 It follows that even a slight difference in transition rates of two uniform sequences is exponentially amplified as the cascade advances to its higher states (large n). The maximal enhancement increases exponentially like the square of the states number, actually the number of DNA base triplets. Similarly, the cascade can discriminate between lengths, N_{1} and N_{2}, of two sequences made of the same triplets, since the probability ratio at the highest states is P(N_{1})/P(N_{2}) = exp[−(κ_{+}/2κ_{−})(N − N)]. The exponential amplification is the result of the iterative, multistage structure of the cascade. It is the same design principle that underlies industrial distillation (20) and the kinetic proofreading pathway of protein synthesis (4). The exponential amplification of the cascade is evident from the behavior of the normalized fluorescence anisotropy of uniform triplet repeats at the saturated regime (Fig. 2). In this regime of high nucleation rate, it becomes harder for the machine to climb up to higher states through successive disassembly steps (Fig. 1B). However, this helps the cascade to discriminate lengths, because shorter sequences need less disassembly steps to reach higher states, and indeed the curve for the longer (TAC)_{26} tripletrepeat sequence is steeper than that of the halfsize (TAC)_{13}.
We test the sequencediscrimination capability of the cascade by comparing the binding curves of two uniform singlestranded DNA molecules made of very similar triplet repeats, TAC and TCA (Fig. 3A). The difference between the two binding signals behaves similarly to the relative entropy of the two probability distributions (sometimes called “information for discrimination”; ref. 21) and therefore gives a good idea about their distinguishability. For both lengths n = 13,26 we find that the difference peaks at a certain rate ratio κ_{+}/κ_{−} that corresponds to the maximal slope of the binding curves (Fig. 2 Inset), where cooperativity is highest (Eq. 2). The peak indicates optimal tuning of the back and forth scanning motion that is used by the stochastic machine to “read” the sequence (a process that was mapped to sequential reading of a random tape). Thinking of the serriform time series (Fig. 1A) as a “sentence” composed of an Nstate alphabet printed by a stochastic typewriter (something like … Q_{10}Q_{11}Q_{8}Q_{9}Q_{3}Q_{3}Q_{3}Q_{4}… ), then the appearance of the “letter” Q_{N} corresponds to a completed scan. Interpreting Q_{N} as a “space bar,” the maximal rate of completed scans corresponds to the most informative reading with the maximal rate of “words.” This occurs at the “working point” of the cascade, t_{+}/t_{−} ∼ 1, when the time interval between nucleation events, t_{+} ∼ 1/(Nκ_{+}), is matched with the time required to climb back to state Q_{N}, t_{−} ∼ N/2κ_{−}. With the rates measured independently by kinetic assays we find that optimal separation occurs indeed in the optimal regime, t_{+}/t_{−} ∼ 1–3.
The protein assembly cascade dynamics can detect also localized differences in nonuniform sequences. A stringent test for our machine is the discrimination of a single base change. We therefore introduced a change C → G at the seventh triplet of the two uniform (TAC)_{N} sequences and measured the discrimination (Fig. 3B). Similar to the uniform sequences, the difference in binding between a sequence and its variant peaks at an optimal κ_{+}/κ_{−} that is lower for the longer sequence, consistent with the working point.
The machine's ability to discriminate localized changes suggests a basis for certain mathematical computations, integral transforms. Consider an ensemble of uniform sequences made of N RecAfavored triplets (relatively high κ_{+}; ref. 9). Within each sequence, we encode a “defect” in the form of a single unfavorable triplet placed at one of the N possible sites. Let w(n) designate the fraction of sequences with defect at site n. A test tube with a mixture of all these sequences encodes the vector [w(1), w(2), … w(N)]. A monodisperse solution with defect only at site n is one of the Nunit base vectors that span our sequence space. As shown below, the signal from such a base vector is exponential in the position of the defect, S(n, k) ∼ exp(−k n), where k = Δ(κ_{+}/κ_{−}) is the difference in the ratio of reaction rates at the defect. Since the fluorescence anisotropy is an ensemble average, the signal of mixture is a Laplacelike transform, 4 To account for the nonuniformity of a DNA sequence with sitedependent nucleation and disassembly rates, κ_{+}(n) and κ_{−}(n), we modify the continuous master equation to with the inhomogeneous steadystate solution A mutation at site n_{0} implies a localized change of reaction rates by Δκ_{+} and Δκ_{−.} When the mutation is in a formerly uniform sequence, variation of the steadystate profile exhibits a change that depends on the position of the site as ΔP(n)/P(n) ≃ − κn_{0}, where the “wave number,” k, is the difference in the reaction rates ratio, The resulting relative change in P(n) depends exponentially on the position of the mutation n_{0}. Integrating over P(n) we find that the anisotropy signal scales like S(n_{0}, k) ∼ exp(−kn_{0}).
By choice of other types of sequence base vectors, the stochastic cascade machinery, through the ensemble measurement, can encode and decode mixtures in terms of other transforms. It is tempting to speculate that with additional operations to manipulate sequences at hand, such as recombination, one could construct a molecular architecture for more complex computations. The question of whether RecA assembly is used for natural computation requires in vivo testing (22).
Acknowledgments
We thank K. Adzuma and B. Shraiman for discussions and suggestions and D. Thaler for a fruitful and inspiring collaboration.
Footnotes

↵* To whom reprint requests should be addressed. Email: libchbr{at}mail.rockefeller.edu.

↵† We note that the assembly dynamics differs essentially from the Langevin dynamics of a particle diffusing in a onedimensional random force field (11) or the related asymmetric exclusion process (12): While a diffusing particle travels continuously, the filament end can abruptly jump to a new site by nucleation. The master equation, therefore, does not lead to the familiar Fokker–Planck equation, and an equivalent Langevin formulation would require infinite stochastic forces to enable the nucleation jumps (10). The effect of diffusion remains minor for short enough inhomogeneous sequences such as the sequence with the point mutation measured in the experiment. In contrast, assembly on longer sequences, (κ_{+}/κ_{−})N ≈ 1, is predicted to exhibit a striking randomness effect of both sequence and diffusion, which may lead to anomalous motion and phase transitions (13).
 Received October 11, 2001.
 Accepted June 20, 2002.
 Copyright © 2002, The National Academy of Sciences
References
 ↵
 Adleman L M
 ↵
 ↵
 ↵
 Hopfield J J
 ↵
 ↵
 ↵
 Kowalczykowski S C,
 Dixon D A,
 Eggelston A K,
 Lauder S D,
 Rehrauer W M
 ↵
 ↵
 BarZiv R,
 Libchaber A
 ↵
 van Kampen N G
 ↵
 ↵
 ↵
 ↵
 Shannon C E
 ↵
 Shannon C E
 ↵
 ↵
 Shannon C E,
 McCarthy J
 von Neumann J
 ↵
Grenander, U. (1966) Res. Pap. Statist. Testchrift J. Neyman 107–123.
 ↵
 Hegner M,
 Smith S B,
 Bustamante C
 ↵
 Lord Rayleigh
 ↵
 Cover T M,
 Thomas J A
 ↵
Citation Manager Formats
More Articles of This Classification
Physical Sciences
Physics
Biological Sciences
Related Content
 No related articles found.
Cited by...
 Molecular mechanisms and genomic maps of DNA excision repair in Escherichia coli and humans
 Speed, dissipation, and error in kinetic proofreading
 Synthesis of crystals with a programmable kinetic barrier to nucleation
 Fueling protein DNA interactions inside porous nanocontainers
 Stochastic computing with biomolecular automata