Utilizing the information content in two-state trajectories

Contributed by Robert J. Silbey, June 5, 2006
This article has a Correction.
Abstract
The signal from many single-molecule experiments monitoring molecular processes, such as enzyme turnover by means of fluorescence and the opening and closing of ion channels through the flux of ions, consists of a time series of stochastic “on” and “off” (or open and closed) periods, termed a two-state trajectory. This signal reflects the dynamics in the underlying multisubstate on–off kinetic scheme (KS) of the process. The determination of the underlying KS is difficult and sometimes even impossible because of the loss of information in the mapping of the multidimensional KS onto two dimensions. Here we introduce a previously undescribed procedure that efficiently and optimally relates the signal to all equivalent underlying KS. This procedure partitions the space of KS into canonical (unique) forms that can handle any KS and obtains the topology and other details of the canonical form from the data without the need for fitting. Also established are relationships that connect the data and the topology of the canonical form to the on–off connectivity of a KS. The suggested canonical forms constitute a powerful tool in discriminating between KS. Based on our approach, the upper bound on the information content in two-state trajectories is determined.
The data from a wide range of single-molecule experiments (1–23), i.e., the passage of ions and biopolymers through individual channels (3–5), activity and conformational changes of biopolymers (6–15), diffusion of molecules (16–19), and blinking of nanocrystals (20–23), is inevitably turned into a trajectory of “on” and “off” periods (waiting times) (Fig. 1 A). A frequently used assumption in the context of the listed processes describes their mechanisms by a multisubstate on–off Markovian kinetic scheme (KS) (refs. 24–36; Fig. 1 B). (This is a fairly unrestrictive assumption in the context of the listed processes because, in many cases, adding substates to the KS is equivalent to describing the process by coupled stochastic (sub)processes; see also refs. 37–51.) The KS can describe a discrete conformational energy landscape of a biomolecule or chemical kinetics with (or without) conformational or environmental changes, or it can stand for quantum states, etc. The underlying stochastic dynamics of the process in the multisubstate on–off KS is thus encoded in the two-state trajectory (the stochastic signal changes value only when transitions between substates of different states in the KS take place). The aim of single-molecule experiments is to learn about the underlying KS to an extent that is unattainable from bulk measurements due to averaging. However, determining the KS from the two-state trajectory is difficult because the number of the substates in each of the states, L_{x} (x = on, off), is usually large, and the connectivity among the substates is generally complex. A widely used approach for deducing the KS relies on the construction of waiting time–probability density functions (WTPDFs): the WTPDF of state x (= on, off), φ _{x} (t), and the joint probability density functions (PDFs) of two successive waiting times, x event followed by y event, φ_{x,y}(t _{1}, t _{2}), x, y = on, off. 
[Higher-order successive WTPDFs do not contain additional information on top of φ_{x,y}(t _{1}, t _{2}) (33)]. φ _{x} (t) and φ_{x,y}(t _{1}, t _{2}) are fitted to sums of exponentials by common methods, e.g., ref. 52. Then, a search for a KS that leads to the fitted WTPDFs is performed. Alternatively, a maximum-likelihood method can be applied (24, 25), which demands first assuming a KS topology. Although these techniques are frequently used, looking for a possible KS that reproduces the data is an exhaustive task. Moreover, there are KS with the same WTPDFs (26–32). A more sophisticated approach divides the KS space into canonical (unique) forms. (Underlying KS with the same canonical form are equivalent to each other; see, however, the discussion in Examples and the Utility of RD Forms.) Two divisions into canonical forms were previously suggested, called, following Bruno et al. (30), manifest interconductance rank (MIR) and Bauer–Kienker uncoupled (BKU) (31, 32) forms. MIR and BKU forms are useful in handling reversible connection, nonsymmetric (i.e., the spectra of the φ _{x} (t) x = on, off are nondegenerate), underlying KS, but are not very efficient in discriminating between KS. In practice, MIR and BKU forms are found from the data by using fitting procedures. Here, we introduce previously undescribed canonical forms, called reduced dimensions (RD) forms, which can handle any KS, i.e., also KS with irreversible connections and/or symmetry. Relationships between fitting-free properties of the data, the on–off connectivity of the KS, and the RD form’s topology are established. These relationships are used in mapping a KS into a RD form. A simple procedure for finding the RD form from the data is given, where the topology and other details of the RD form are determined without the need for fitting, which significantly shortens the search time in the KS space. The suggested canonical forms constitute a powerful tool in discriminating among KS. Based on our approach, the upper bound on the information content in two-state trajectories is set.
Methods
Our approach is based on expressing the WTPDFs in an explicit on–off connectivity representation (for any KS). As usual, the on–off process is separated into two irreversible processes that occur sequentially (24–36). For example, φ_{x,y} (t _{1}, t _{2}) (x ≠ y) is given by
(In Section A in Supporting Text, which is published as supporting information on the PNAS web site, expressions for φ _{x} (t) and φ_{x,y}(t _{1}, t _{2}) are given.) Eq. 1 emphasizes the role of the KS topology in expressing φ_{x,y}(t _{1}, t _{2}). N_{x} and M_{x} are the numbers of initial and final substates in state x in the KS, respectively. Namely, each event in state x starts at one of the N_{x} initial substates, labeled, n_{x} = 1, …, N_{x} , and terminates through one of the M_{x} final substates, labeled m_{x} = 1 …, M_{x} , for a reversible on–off connection KS (Fig. 1 B), or m_{x} = N_{x} + 1 − H_{x} , …, N_{x} + M_{x} − H_{x} , for an irreversible on–off connection KS (Fig. 2 A), where H_{x} (0 ≤ H _{x} ≤ N _{x}) is the number of substates in state x that are both initial and final ones. (In each of the states, the labeling of the substates starts from 1.) An event in state x starts in substate n_{x} with probability W_{nx} . The first passage time PDF for exiting to substate n_{y} , conditional on starting in substate n_{x} (x ≠ y), is f _{ny} _{nx}(t) and F _{nx}(t) = Σ_{ny} f _{ny} _{nx}(t). Writing f _{ny} _{nx}(t) as f _{ny} _{nx}(t) = Σ_{mx}ω_{ny} _{mx} f̃ _{mx} _{nx}(t), emphasizes the role of the KS on–off connectivity, where ω_{ny} _{mx} is the transition probability from substate m_{x} to substate n_{y} , and f̃ _{mx} _{nx}(t)ω_{ny} _{mx} is the first passage time PDF, conditional on starting in substate n_{x} , for exiting to substate n_{y} through substate m_{x} . (A sum z_{x} ∈ {Z_{x} } is a sum over a particular group of Z_{x} substates.) Note that all of the factors in Eq. 1 can be expressed by using the master equation (see Section B in Supporting Text).
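For reference, the definitions above suggest the following explicit form for Eq. 1. This is a reconstruction from the stated definitions of W, f, f̃, ω, and F, not a verbatim copy of the displayed equation, so the grouping of the terms is an assumption. Nested subscripts are written out here, e.g., W_{n_x} for the W_{nx} of the text.

```latex
\begin{equation}
\begin{aligned}
\phi_{x,y}(t_1,t_2)
 &= \sum_{n_y=1}^{N_y}
    \Bigl[\sum_{n_x=1}^{N_x} W_{n_x}\, f_{n_y n_x}(t_1)\Bigr] F_{n_y}(t_2) \\
 &= \sum_{m_x \in \{M_x\}}
    \Bigl[\sum_{n_x=1}^{N_x} W_{n_x}\, \tilde{f}_{m_x n_x}(t_1)\Bigr]
    \Bigl[\sum_{n_y=1}^{N_y} \omega_{n_y m_x}\, F_{n_y}(t_2)\Bigr],
 \qquad x \neq y .
\end{aligned}
\end{equation}
```

The second line follows from the first by substituting f_{n_y n_x}(t) = Σ_{m_x} ω_{n_y m_x} f̃_{m_x n_x}(t) and exchanging the order of summation. In this form the external sum has N_{y} terms after the first equality and M_{x} terms after the second, each term being a product of a function of t _{1} and a function of t _{2}.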
Results and Discussion
Rank of φ_{x,y}(t_{1}, t_{2}) and its Topological Interpretation.
For discrete time, φ _{x} _{,y}(t _{1}, t _{2}) is a matrix, whose rank R _{x,y} (= 1, 2, …), which is the number of nonzero eigenvalues (or singular values for a nonsquare matrix) of its decomposition, can be obtained without the need of finding the actual functional form of φ_{x,y}(t _{1}, t _{2}). By using Eq. 1 , which gives φ_{x,y}(t _{1}, t _{2}) as sums of terms each of which is a product of a function of t _{1} and a function of t _{2}, we can relate R _{x,y} (x ≠ y) to the topology of the underlying KS. When none of the terms in an external sum in Eq. 1 , after the first or second equality, are proportional, R _{x,y} = min(M_{x} , N_{y} ) (Fig. 1 A and Figs. 5–7, which are published as supporting information on the PNAS web site). Otherwise, R _{x, y} < min(M_{x} , N_{y} ) (Fig. 2 E; see also Fig. 8, which is published as supporting information on the PNAS web site, and Section C in Supporting Text), and Eq. 1 is rewritten such that it has the minimal number of additives in the external summations
This equation means R _{x,y} = Ñ_{y} + M̃ _{x} . Ñ _{y} and M̃ _{x} can be related to the on–off connectivity of the KS. Consider a case where M_{x} < N_{y} , and there is a group of final substates in state x, {O_{x} } with connections only to a group of initial substates in state y, {O_{y} }, and O_{x} > O_{y} (see Fig. 8). Then, M̃ _{x} = M_{x} − O_{x} and Ñ _{y} = O_{y} . (Further discussion and a generalization of this relationship are given in Section C in Supporting Text.)
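The rank argument of this subsection can be illustrated numerically. The following is a minimal sketch, assuming a discretized joint PDF built by hand from products of exponentials; the function name `numerical_rank`, the time grid, the rates, the weights, and the relative tolerance are all illustrative choices and not part of the original procedure.

```python
import numpy as np

def numerical_rank(phi, rel_tol=1e-8):
    """Count singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(phi, compute_uv=False)
    if singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

# Discrete time grid (arbitrary units); purely illustrative.
t = np.linspace(0.05, 10.0, 200)

def exp_pdf(rate):
    """Exponential density rate * exp(-rate * t) sampled on the grid."""
    return rate * np.exp(-rate * t)

# Two terms, each a product of a function of t1 and a function of t2,
# with no proportional factors: the matrix has rank 2.
phi_rank2 = (0.6 * np.outer(exp_pdf(1.0), exp_pdf(0.5))
             + 0.4 * np.outer(exp_pdf(3.0), exp_pdf(2.0)))

# Same construction, but the t2 factors are proportional, so the two
# terms merge into a single product and the rank drops to 1.
phi_rank1 = (0.6 * np.outer(exp_pdf(1.0), exp_pdf(0.5))
             + 0.4 * np.outer(exp_pdf(3.0), exp_pdf(0.5)))

print(numerical_rank(phi_rank2), numerical_rank(phi_rank1))  # prints: 2 1
```

The rank is read off from the singular values alone, without fitting either matrix to a functional form. The second matrix shows how proportional terms in an external sum reduce the rank below the number of terms.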
RD Form.
The R _{x,y} values are obtained from the φ_{x,y}(t _{1}, t _{2}) x, y = on, off without the need of finding their actual functional forms, thus constituting a fitting-free relationship between the data and the on–off connectivity and details of the underlying KS. Using this relationship, the KS space is divided into canonical forms, RD forms, using the R _{x,y} values. Excluding KS with symmetry, R _{x,y} (x ≠ y) is the number of substates in state y in the RD form (see also the discussion in Additional Relationships Between the Data, RD Form, and KS). RD forms can represent underlying KS with symmetry and irreversible connections because they are built from all four R _{x,y} values. The RD form has the minimal number of substates needed to reproduce the data. This number is smaller than or equal to the number of independent on–off connections in the MIR form (see Section D in Supporting Text). (The equality holds for nonsymmetric, reversible connection KS.) Connections in the RD form are only between substates of different states, as in the BKU form. Unlike the MIR and BKU forms, for each connection in the RD form there is a WTPDF that is not necessarily exponential.
Mapping a KS into a RD Form.
Mapping a KS into a RD form is based on clustering of (some of) the initial substates in the KS, depending on the on–off connectivity of the KS. Such clusters are one of the two kinds of substates in the RD form, where the second kind originates from single initial substates in the KS. For a nonsymmetric KS, initial substates in state y in the KS that contribute to R _{x,y} (x ≠ y) are mapped to themselves, and those that do not contribute to R _{x,y} are clustered, where initial y-state substates in a cluster are all connected to the same final x-state substate that contributes to R _{x,y}. (When the substate m_{x} has a single exit connection to substate n_{y} , which is its only entering connection, substate n_{y} is defined as the one contributing to the rank.) For example, the KS in Fig. 1 B is mapped into a RD form (Fig. 3 D) when clustering substates 1_{off}–2_{off} and substates 3_{off}–5_{off} into the RD form’s substates 1_{off} and 2_{off}, respectively, because none of the initial off-state substates contribute to R _{on,off}. The initial on substates are mapped to themselves because they both contribute to R _{off,on}.
The clustering procedure fully determines the WTPDFs for the connections in the RD form (technical details for obtaining these WTPDFs given a KS are discussed in Section C in Supporting Text). Note that the clustering procedure, along with the fact that substates in the KS that are not initial or final ones do not affect the RD form’s topology, reduces the KS dimensionality to that of the RD form.
Finding the RD Form from the Data.
The following steps can be used for finding the RD form from the data (when fitting is needed, we rely on known procedures, e.g., refs. 24, 25, and 52). (i) Find the number of substates in the RD form using decomposition of the φ_{x,y}(t _{1}, t _{2}) x, y = on, off. (ii) Obtain the spectrum of the φ _{x} (t) x = on, off using fitting procedures. The spectrum of the WTPDFs for the x to y (x ≠ y) connections in the RD form is the same spectrum as that of φ _{x} (t), because substates of the same state in the RD form are not connected. Differences lie in the preexponential coefficients. (Steps i and ii can be interchanged.) (iii) Apply fitting procedures for finding the preexponential coefficients of the WTPDFs for the connections in the RD form. (Other technical details for constructing the RD form from the data are discussed in Section E in Supporting Text and Figs. 9–13, which are published as supporting information on the PNAS web site.)
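Step i can be sketched as follows, assuming the trajectory has already been reduced to two arrays of alternating on and off durations and that it starts with an on event. The function name `joint_histogram`, the binning, and the synthetic exponential data are illustrative assumptions. With a finite, noisy trajectory, choosing the threshold that separates significant singular values from noise is a separate decision that this sketch does not address.

```python
import numpy as np

def joint_histogram(first, second, edges):
    """Normalized 2D histogram of successive waiting-time pairs.

    first[i] is the duration of an event and second[i] the duration of
    the event that immediately follows it. Pairs outside the bin range
    are ignored by np.histogram2d.
    """
    counts, _, _ = np.histogram2d(first, second, bins=[edges, edges])
    total = counts.sum()
    return counts / total if total > 0 else counts

# Synthetic stand-in for measured durations (illustrative only):
# independent exponential on and off periods.
rng = np.random.default_rng(0)
n_events = 20000
on_times = rng.exponential(1.0, n_events)
off_times = rng.exponential(2.0, n_events)

edges = np.linspace(0.0, 8.0, 17)  # 16 bins per axis

# on event followed by off event, and off event followed by on event
phi_on_off = joint_histogram(on_times, off_times, edges)
phi_off_on = joint_histogram(off_times[:-1], on_times[1:], edges)

# Singular values (sorted in descending order) of the on-off histogram;
# the number of values that stand out from the noise estimates R_{on,off}.
singular_values = np.linalg.svd(phi_on_off, compute_uv=False)
print(singular_values[:4])
```

The same construction applied to on–on and off–off pairs gives the remaining two histograms, so all four R _{x,y} values can be estimated before any fitting is attempted.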
Examples and the Utility of RD Forms.
The simplest topology for a RD form has one substate in each of the states, namely, R _{x,y} = 1 (x, y = on, off), and the only possible choice for the WTPDFs for the connections is φ_{on}(t) and φ_{off}(t) (Fig. 2 D). This RD form means that all of the information in the data is contained in φ_{on}(t) and φ_{off}(t). Consequently, KS with R _{x,y} = 1 (x, y = on, off) and the same φ_{on}(t) and φ_{off}(t) are indistinguishable (assuming no additional information on the mechanism is known). Examples of such KS are shown in Fig. 2 A–C. This case was discussed in refs. 26–28. The generalization of the equivalence of KS for any case is straightforward using RD forms. KS with the same R _{x,y} values and the same WTPDFs for the connections in the RD form cannot be distinguished. Indistinguishable KS with R _{x,y} = 2 (x, y = on, off) and triexponential φ_{on}(t) and φ_{off}(t) and the corresponding RD form are shown in Fig. 2 E–G.
Clearly, two KS with different R _{x,y} values can be resolved by the analysis of a two-state trajectory. A further advantage of RD forms is that they provide a powerful tool for resolving KS with the same R _{x,y} values and the same number of exponentials in φ_{on}(t) and φ_{off}(t), even without performing actual calculations, based only on the distinct complexity of the WTPDFs for the connections in the corresponding RD forms (compare the KS in Fig. 3 A and B) or on the different connectivity of the RD forms (compare the KS in Fig. 3 A and B with that in Fig. 3 C).
It is worth stressing that the above general statement implies that it is impossible to find positive (>0) transition rates for the KS in Fig. 3 A–C that make the φ_{x,y}(t _{1}, t _{2}) x, y = on, off from these KS the same, so these KS can be distinguished by analyzing a two-state trajectory (excluding symmetric cases for which R _{x,y} = 1, ∀x, y).
Note that a RD form can preserve microscopic reversibility on the on–off level even when having irreversible connections. These can be balanced by the existence of direction-dependent WTPDFs for the connections. (Microscopic reversibility in a RD form means that the φ_{x,y}(t _{1}, t _{2}) x, y = on, off obtained when reading the two-state trajectory in the forward direction are the same as the corresponding φ_{x,y}(t _{1}, t _{2}) x, y = on, off obtained when reading the trajectory backwards, as suggested in ref. 36 for aggregated Markov chains. Using matrix notation, microscopic reversibility means φ_{x,y}(t _{1}, t _{2}) = [φ_{y,x}(t _{1}, t _{2})] ^{T} , where T stands for the transpose of a matrix.)
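The matrix condition for microscopic reversibility can be checked directly on discretized joint PDFs. The following is a minimal sketch; the function name `is_time_reversible`, the tolerances, and the toy matrices are assumptions made for illustration, and real histograms would need tolerances matched to their statistical noise.

```python
import numpy as np

def is_time_reversible(phi_xy, phi_yx, rtol=1e-6, atol=1e-12):
    """Return True when phi_xy equals the transpose of phi_yx."""
    if phi_xy.shape != phi_yx.T.shape:
        return False
    return bool(np.allclose(phi_xy, phi_yx.T, rtol=rtol, atol=atol))

# Toy discretized densities on a common grid (illustrative only).
t = np.linspace(0.1, 5.0, 50)
g = np.exp(-t)               # function of the x waiting time
h = 2.0 * np.exp(-2.0 * t)   # function of the y waiting time

phi_on_off = np.outer(g, h)                # rows: t1 (on), columns: t2 (off)
phi_off_on = np.outer(h, g)                # rows: t1 (off), columns: t2 (on)
phi_off_on_skewed = np.outer(h, g ** 2)    # breaks the transpose relation

print(is_time_reversible(phi_on_off, phi_off_on))         # prints: True
print(is_time_reversible(phi_on_off, phi_off_on_skewed))  # prints: False
```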
The division of KS into equivalence groups (RD forms) is useful also when, on top of the information extracted from the “original” two-state trajectory, additional information about the observed process is available. [Additional information can be inferred, under some physical assumptions, by analyzing different kinds of measurements, e.g., the crystal structure of the biopolymer, or by analyzing two-state trajectories while varying some parameters, e.g., the substrate concentration (13–15)]. Suppose that the connectivity of the underlying KS is unchanged by the manipulation. Then, the additional information can be used to resolve KS that correspond to the RD form found from the statistical analysis of the original two-state trajectory, whereas any KS with a different RD form is irrelevant. Alternatively, when manipulating the system leads to a change in the connectivity of the underlying KS, or even to the addition or removal of substates, the RD forms obtained from the different data sets are distinct. Either of these possibilities is identifiable using RD forms and the corresponding KS; in the first case an adequate parameter tuning relates the RD forms obtained from the various sources, whereas in the second case the RD forms cannot be related by a parameter tuning.
Additional Relationships Between the Data, RD Forms, and KS.
Additional relationships between the data, RD forms, and underlying KS are discussed below when considering two cases. (i) All of the R _{x,y} values are the same. For such cases, R _{x,y} is the number of substates in each of the states in the RD form. Also, the number of exponentials in φ _{x} (t) is the number of substates in state x in the (simplest) underlying KS. (ii) Some of the R _{x,y} values are different. For such cases, the KS must have irreversible on–off connections and/or symmetry. (iia) When R _{on,off} ≠ R _{off,on}, there are irreversible on–off connections in the underlying KS. (iib) When R _{x,y} ≥ R _{z,z} (x ≠ y) for both values of z = on, off, R _{x,y} is the number of substates in state y in the RD form. (iic) When R _{z,z} > R _{x,y} for the other three combinations of x and y, R _{z,z} is the number of substates of both states in the RD form, and there is symmetry in state z′ (≠ z) in the underlying KS. Take, for example, the KS in Fig. 3 C, with the on to off transition rates having the same value. Then, R _{on,off} = R _{off,on} = R _{on,on} = 1, and R _{off,off} = 2, but the topology of the RD form is the same as that in Fig. 3 E. (iid) When R _{x,z} > R _{z,z} (x ≠ z), there are irreversible on–off connections and a special connectivity in state x in the KS. In particular, R _{z,z} is the minimal number of substates in state x of the KS among which the random walk must visit in each event in that state. Fig. 4 shows an example for such a case, with H_{x} = 0 and no direct connections between substates in {N_{x} } and substates in {M_{x} }.
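The first branches of this case analysis can be written as a small decision routine. The following is a sketch only: the function name `interpret_ranks` and the wording of the returned notes are hypothetical, and only cases i, ii, and iia are encoded. Cases iib–iid depend on comparisons that are easier to apply by hand.

```python
def interpret_ranks(r_on_on, r_on_off, r_off_on, r_off_off):
    """Map the four R_{x,y} values to the conclusions of cases i, ii, iia."""
    notes = []
    distinct = {r_on_on, r_on_off, r_off_on, r_off_off}
    if len(distinct) == 1:
        # Case i: the common rank is the number of substates per state.
        notes.append(
            f"(i) all ranks equal: {r_on_off} substate(s) in each state "
            "of the RD form"
        )
    else:
        # Case ii: at least one rank differs from the others.
        notes.append(
            "(ii) ranks differ: irreversible on-off connections "
            "and/or symmetry in the KS"
        )
        if r_on_off != r_off_on:
            # Case iia: the cross ranks disagree.
            notes.append("(iia) irreversible on-off connections in the KS")
    return notes

# The example from the text: three ranks equal to 1 and R_{off,off} = 2.
for line in interpret_ranks(r_on_on=1, r_on_off=1, r_off_on=1, r_off_off=2):
    print(line)
```

For the example in the text the routine reports case ii only, because R _{on,off} = R _{off,on}.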
Concluding Remarks
The main effort in this work is to use the information content in an ideal (noiseless, infinitely long) two-state trajectory for an efficient elucidation of a unique mechanism that can generate it. Accordingly, the KS space is partitioned into canonical forms that are (usually) not Markovian, where a canonical form is determined by the ranks of the φ_{x,y}(t _{1}, t _{2}) x, y = on, off and the (usually nonexponential) WTPDFs for the connections among substates of different states in the canonical form. The relationships between the (fitting-free) R _{x,y} values, the on–off connectivity of the KS, and the RD form’s topology are the basis for our results, where the mathematical support is provided by Eqs. 1 and 2 .
As a final remark, note that, in principle, one can collect successive x–y events in a selective way, such that the decomposition of the obtained two-dimensional histogram has one nonzero eigenvalue (see Section E in Supporting Text). The number of these rank-one x–y histograms is equal to the corresponding R _{x,y}, and the histograms are the terms in a particular external sum in Eq. 1 or 2 . Although as R _{x,y} increases it becomes harder to obtain these rank-one x–y histograms, they supply more details on the WTPDFs for the connections in the RD form than their sum and therefore can be viewed as the upper bound on the information content in a two-state trajectory.
Acknowledgments
This work was supported by the National Science Foundation.
Footnotes
 *To whom correspondence should be addressed. Email: silbey{at}mit.edu

Author contributions: O.F. and R.J.S. designed research, performed research, and wrote the paper.

Conflict of interest statement: No conflicts declared.
Abbreviations: KS, kinetic scheme(s); PDF, probability density function; WTPDF, waiting time–PDF; MIR, manifest interconductance rank; BKU, Bauer–Kienker uncoupled; RD, reduced dimensions.
 © 2006 by The National Academy of Sciences of the USA
References

1. Moerner W. E., Orrit M.
2. Weiss S.
4. Kasianowicz J. J., Brandin E., Branton D., Deamer D. W.
5. Kullman L., Gurnev P. A., Winterhalter M., Bezrukov S. M.
7. Yang H., Luo G., Karnchanaphanurach P., Louie T., Rech I., Cova S., Xun L., Xie X. S.
8. Min W., Luo G., Cherayil B. J., Kou S. C., Xie X. S.
9. Rhoades E., Gussakovsky E., Haran G.
10. Zhuang X., Kim H., Pereira M. J. B., Babcock H. P., Walter N. G., Chu S.
11. Lu H., Xun L., Xie X. S.
13. Velonia K., Flomenbom O., Loos D., Masuo S., Cotlet M., Engelborghs Y., Hofkens J., Rowan A. E., Klafter J., Nolte R. J. M., de Schryver F. C.
14. Flomenbom O., Velonia K., Loos D., Masuo S., Cotlet M., Engelborghs Y., Hofkens J., Rowan A. E., Nolte R. J. M., Van der Auweraer M., et al.
16. Nie S., Chiu D. T., Zare R. N.
17. Shusterman R., Alon S., Gavrinyov T., Krichevsky O.
18. Zumofen G., Hohlbein J., Hübner C. G.
19. Cohen A. E., Moerner W. E.
21. Chung I., Bawendi M. G.
23. Tang J., Marcus R. A.
27. Flomenbom O., Klafter J.
30. Bruno W. J., Yang J., Pearson J.
32. Kienker P.
37. Vlad M. O., Moran F., Schneider F. W., Ross J.
40. Qian H., Elson E. L.
43. Granek R., Klafter J.
45. Barsegov V., Thirumalai D.
46. Šanda F., Mukamel S.
47. Allegrini P., Aquino G., Grigolini P., Palatella L., Rosa A.
49. Goychuk I., Hänggi P.
50. Flomenbom O., Klafter J.
51. Gopich I. V., Szabo A.