Nonlinear system theory: Another look at dependence

Communicated by Murray Rosenblatt, University of California at San Diego, La Jolla, CA, August 4, 2005 (received for review April 29, 2005)
Abstract
Based on nonlinear system theory, we introduce previously undescribed dependence measures for stationary causal processes. Our physical and predictive dependence measures quantify the degree of dependence of outputs on inputs in physical systems. The proposed dependence measures provide a natural framework for a limit theory for stationary processes. In particular, under conditions with quite simple forms, we present limit theorems for partial sums, empirical processes, and kernel density estimates. The conditions are mild and easily verifiable because they are directly related to the data-generating mechanisms.
Let ε_i, i ∈ ℤ, be independent and identically distributed (iid) random variables and g be a measurable function such that X_i = g(..., ε_{i−1}, ε_i) [1] is a properly defined random variable. Then (X_i) is a stationary process, and it is causal or nonanticipative in the sense that X_i does not depend on the future innovations ε_j, j > i. The causality assumption is quite reasonable in the study of time series. Wiener (1) considered the fundamental coding and decoding problem of representing stationary and ergodic processes in the form of Eq. 1. In particular, Wiener studied the construction of ε_i based on X_k, k ≤ i. The class of processes that Eq. 1 represents is huge; it includes linear processes, Volterra processes, and many time series models. In certain situations, Eq. 1 is also called the nonlinear Wold representation. See refs. 2–4 for other deep contributions on representing stationary and ergodic processes by Eq. 1. To conduct statistical inference for such processes, it is necessary to consider the asymptotic properties of the partial sum S_n = X_1 + ... + X_n and the empirical distribution function F_n(x) = n^{−1} Σ_{i=1}^{n} 1_{X_i ≤ x}.
In probability theory, many limit theorems have been established for independent random variables, and those limit theorems play an important role in the related statistical inference. In the study of stochastic processes, however, independence usually does not hold, and dependence is an intrinsic feature. In an influential paper, Rosenblatt (5) introduced the strong mixing condition. For a stationary process (X_i), let the sigma algebra F_m^n = σ(X_i, m ≤ i ≤ n), −∞ ≤ m ≤ n ≤ ∞, and define the strong mixing coefficients α_n = sup {|P(A ∩ B) − P(A)P(B)|: A ∈ F_{−∞}^0, B ∈ F_n^∞}. [2] If α_n → 0, then we say that (X_i) is strong mixing. Variants of the strong mixing condition include the ρ-, Ψ-, and β-mixing conditions, among others (6). A central limit theorem (CLT) based on the strong mixing condition is proved in ref. 5. Since then, as basic assumptions on dependence structures, the strong mixing condition and its variants have been widely used and various limit theorems have been obtained; see the extensive treatment in ref. 6.
Since the quantity in Eq. 2 measures the dependence between the events A and B and is zero if A and B are independent, it is sensible to call α_n and its variants "probabilistic dependence measures." For stationary causal processes, the calculation of probabilistic dependence measures is generally not easy because it involves the complicated manipulation of taking the supremum over two sigma algebras (7–9). Additionally, many well-known processes are not strong mixing. A prominent example is the Bernoulli shift process. Consider the simple AR(1) process X_n = (X_{n−1} + ε_n)/2, where the ε_i are iid Bernoulli random variables with success probability 1/2 (see refs. 10 and 11). Then X_n is a causal process with the representation X_n = Σ_{j=0}^{∞} 2^{−(j+1)} ε_{n−j}, and the innovations ε_n, ε_{n−1}, ... correspond to the dyadic expansion of X_n. The process X_n is not strong mixing since α_n ≡ 1/4 for all n (12). Some alternative ways have been proposed to overcome the disadvantages of strong mixing conditions (8, 9).
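The Bernoulli-shift example above is easy to check numerically. The sketch below (a minimal Python simulation; the seed and path length are arbitrary choices, not from the text) iterates X_n = (X_{n−1} + ε_n)/2 with Bernoulli(1/2) innovations and verifies that the iterates stay in [0, 1], as the dyadic-expansion representation implies.

```python
import random

def ar1_path(n_steps, x0=0.0, seed=0):
    """Iterate X_n = (X_{n-1} + eps_n)/2 with Bernoulli(1/2) innovations."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        eps = rng.randint(0, 1)       # Bernoulli(1/2) innovation
        x = (x + eps) / 2.0
        path.append(x)
    return path

# The stationary representation X_n = sum_{j>=0} eps_{n-j} / 2^{j+1} is the
# dyadic expansion of a Uniform[0, 1] random variable, so every iterate
# must lie in [0, 1].
path = ar1_path(1000)
assert all(0.0 <= x <= 1.0 for x in path)
```

Because the limiting X_n is the dyadic expansion of a Uniform[0, 1] variable, the empirical distribution of a long path should look approximately uniform.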
Dependence Measures
In this work, we provide another look at the fundamental issue of dependence. Our primary goal is to introduce "physical or functional" and "predictive" dependence measures, a previously undescribed type of dependence measure that is quite different from strong mixing conditions. In particular, following refs. 1 and 13, we interpret Eq. 1 as an input/output system and then introduce dependence coefficients by measuring the degree of dependence of outputs on inputs. Specifically, we view Eq. 1 as a physical system X_i = g(..., ε_{i−1}, ε_i) [3] in which ε_i, ε_{i−1}, ... are inputs, g is a filter or transform, and X_i is the output. Then the process (X_i) is the output of the physical system 3 with random inputs. Assessing the dependence simply by taking the partial derivatives ∂g/∂ε_j is clearly not a good way, since these derivatives may not exist if g is not well-behaved. Nonetheless, because the inputs are random and iid, the dependence of the output on the inputs can be measured by applying the idea of coupling. Let (ε_i′) be an iid copy of (ε_i), and let the shift process ξ_i = (..., ε_{i−1}, ε_i), i ∈ ℤ. For a set I ⊆ ℤ, let ε_{j,I} = ε_j′ if j ∈ I and ε_{j,I} = ε_j if j ∉ I; let ξ_{i,I} = (..., ε_{i−1,I}, ε_{i,I}). Then ξ_{i,I} is a coupled version of ξ_i with ε_j replaced by ε_j′ if j ∈ I. For p > 0 write X ∈ L^p if ‖X‖_p := [E(|X|^p)]^{1/p} < ∞, and write ‖X‖ = ‖X‖_2.
Definition 1 (Functional or physical dependence measure): For p > 0 and I ⊆ ℤ, let δ_p(I, n) = ‖g(ξ_n) − g(ξ_{n,I})‖_p and δ_p(n) = δ_p({0}, n). Write δ(n) = δ_2(n).
Definition 2 (Predictive dependence measure): Let p ≥ 1 and let g_n be a Borel function such that g_n(ξ_0) = E(X_n | ξ_0), n ≥ 0. Let ω_p(I, n) = ‖g_n(ξ_0) − g_n(ξ_{0,I})‖_p and ω_p(n) = ω_p({0}, n). Write ω(n) = ω_2(n).
Definition 3 (p-stability): Let p ≥ 1. The process (X_n) is said to be p-stable if Ω_p := Σ_{n=0}^{∞} ω_p(n) < ∞, and p-strong stable if Δ_p := Σ_{n=0}^{∞} δ_p(n) < ∞. If Ω = Ω_2 < ∞, we say that (X_n) is stable.
By the causal representation in Eq. 1, if min{i: i ∈ I} > n, then δ_p(I, n) = 0. Apparently, δ_p(I, n) quantifies the dependence of X_n = g(ξ_n) on {ε_i, i ∈ I} by measuring the distance between g(ξ_n) and its coupled version g(ξ_{n,I}). In Definition 2, g_n(ξ_0) = E(X_n | ξ_0) is the n-step-ahead predicted mean, and ω_p(n) measures the contribution of ε_0 in predicting future expected values. In the classical prediction theory (14), conditional expectations of the form E(X_n | X_k, k ≤ 0) are studied. The one used in Definition 2 has a different form. It turns out that, in studying asymptotic properties and moment inequalities of S_n, it is convenient to use E(X_n | ξ_0) and the predictive dependence measure (cf. Theorems 2 and 3), whereas the other version is generally difficult to work with. In the special case in which the X_n are martingale differences with respect to the filtration σ(ξ_n), g_n = 0 almost surely for n ≥ 1 and consequently ω(n) = 0, n ≥ 1.
Roughly speaking, since ω_p(n) measures the contribution of ε_0 in predicting the mean of X_n, the p-stability in Definition 3 indicates that the cumulative contribution of ε_0 in predicting future expected values is finite. Interestingly, the stability condition Ω_2 < ∞ implies invariance principles with the natural √n norming (Theorem 3). By (i) of Theorem 1, p-strong stability implies p-stability since δ_p(n) ≥ ω_p(n).
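The coupling construction behind Definition 1 can be approximated directly by Monte Carlo. The sketch below (Python; the Gaussian innovations, the sample size, and the particular finite-window filter g are illustrative assumptions, not from the text) draws (ε_0, ..., ε_n), replaces ε_0 by an independent copy, and averages |g(ξ_n) − g(ξ_{n,{0}})|^p to estimate δ_p(n).

```python
import math
import random

def delta_p(g, n, p=2, n_sims=20000, seed=1):
    """Monte Carlo estimate of delta_p(n) = || g(xi_n) - g(xi_{n,{0}}) ||_p
    for a filter g depending on the last n+1 innovations only.  The coupled
    version replaces eps_0 by an independent copy eps_0'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]  # eps_0, ..., eps_n
        eps_coupled = eps.copy()
        eps_coupled[0] = rng.gauss(0.0, 1.0)               # couple eps_0
        total += abs(g(eps) - g(eps_coupled)) ** p
    return (total / n_sims) ** (1.0 / p)

# Illustrative filter: truncated linear process X_n = sum_j 2^{-j} eps_{n-j},
# for which delta_2(n) = 2^{-n} * ||eps_0 - eps_0'||_2 = 2^{-n} * sqrt(2).
def g_linear(eps):
    m = len(eps)                                           # eps[k] holds eps_k
    return sum(2.0 ** (-j) * eps[m - 1 - j] for j in range(m))

est = delta_p(g_linear, n=3)
exact = 2.0 ** (-3) * math.sqrt(2.0)
assert abs(est - exact) < 0.05
```

The same estimator applies unchanged to any nonlinear g, which is the practical appeal of the coupling definition: no sigma-algebra suprema are involved.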
Our dependence measures provide a convenient and simple route to a large-sample theory for stationary causal processes (see Theorems 2–5 below). In many applications, functional and predictive dependence measures are easy to use because they are directly related to data-generating mechanisms and because the construction of the coupled process g(ξ_{n,I}) is simple and explicit. Additionally, limit theorems with these dependence measures have easily verifiable conditions and are often optimal or nearly optimal. On the other hand, our dependence measures rely on the representation in Eq. 1, whereas the strong mixing coefficients can be defined in more general situations (6).
Theorem 1. (i) Let p ≥ 1 and n ≥ 0. Then δ_p(n) ≥ ω_p(n). (ii) Let p ≥ 1 and define the projection operator P_k· = E(· | ξ_k) − E(· | ξ_{k−1}), k ∈ ℤ. Then for n ≥ 0, ‖P_0 X_n‖_p ≤ ω_p(n) ≤ 2 ‖P_0 X_n‖_p. [4] (iii) Let p > 1, C_p = 18 p^{3/2} (p − 1)^{−1/2} if 1 < p < 2 and C_p a constant depending only on p if p ≥ 2; let p′ = min(p, 2). Then δ_p(I, n) ≤ C_p [Σ_{i ∈ I, i ≤ n} δ_p^{p′}(n − i)]^{1/p′}. [5]
Proof: (i) Since g_n(ξ_0) − g_n(ξ_{0,{0}}) = E[g(ξ_n) − g(ξ_{n,{0}}) | (ξ_0, ξ_{0,{0}})], Jensen's inequality implies δ_p(n) ≥ ω_p(n). (ii) Since ε_0′ is independent of ξ_0 and the (ε_i) are independent, we have E(X_n | ξ_{−1}) = E[g_n(ξ_{0,{0}}) | ξ_0], and hence P_0 X_n = E[g_n(ξ_0) − g_n(ξ_{0,{0}}) | ξ_0], so that ‖P_0 X_n‖_p ≤ ω_p(n) by Jensen's inequality. Inequality 4 then follows from ω_p(n) ≤ ‖g_n(ξ_0) − E(X_n | ξ_{−1})‖_p + ‖E(X_n | ξ_{−1}) − g_n(ξ_{0,{0}})‖_p = 2 ‖P_0 X_n‖_p.
(iii) For presentational clarity, let I = {..., −1, 0}. For i ≤ 0, let D_i be the terms of the martingale decomposition g(ξ_n) − g(ξ_{n,I}) = Σ_{i ≤ 0} D_i, where D_0, D_{−1}, ... are martingale differences with respect to the sigma algebras σ(ε_i, ..., ε_n), i = 0, −1, .... By Jensen's inequality, ‖D_i‖_p ≤ δ_p(n − i). To show Eq. 5, we deal with the two cases 1 < p < 2 and p ≥ 2 separately. If 1 < p < 2, then by Burkholder's inequality (15), ‖Σ_{i ≤ 0} D_i‖_p^p ≤ C_p^p Σ_{i ≤ 0} ‖D_i‖_p^p. If p ≥ 2, then by proposition 4 in ref. 16, ‖Σ_{i ≤ 0} D_i‖_p^2 ≤ C_p^2 Σ_{i ≤ 0} ‖D_i‖_p^2. So Eq. 5 follows.
Inequality 5 suggests an interesting reduction property: the degree of dependence of X_n on {ε_i, i ∈ I} can be bounded in an elementwise manner, so it suffices to consider the dependence of X_n on the individual ε_i. Indeed, our limit theorems and moment inequalities in Theorems 2–5 involve conditions only on δ_p(n) and ω_p(n).
Linear Processes. Let ε_i be iid random variables with ε_i ∈ L^p, p ≥ 1; let (a_i) be real coefficients such that X_t = Σ_{i=0}^{∞} a_i ε_{t−i} [6] is a proper random variable. The existence of X_t can be checked by Kolmogorov's three-series theorem. The linear process (X_t) can be viewed as the output from a linear filter, and the input (..., ε_{t−1}, ε_t) is a series of shocks that drive the system (ref. 17, pp. 8–9). Clearly, δ_p(n) = ω_p(n) = |a_n| c_p, where c_p = ‖ε_0 − ε_0′‖_p. Let p = 2. If Σ_{i=0}^{∞} |a_i| < ∞, [7] then the filter is said to be stable (17), and the preceding identity implies short-range dependence since the covariances are absolutely summable. Definition 3 extends the notion of stability to nonlinear processes.
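The short-range dependence claim can be checked numerically. The sketch below (Python; the geometric coefficients a_j = 2^{−j} and the truncation at 50 lags are illustrative assumptions, not from the text) computes the autocovariances γ_k = Σ_i a_i a_{i+k} of the linear process with unit-variance innovations and verifies that Σ_k |γ_k| is bounded by (Σ_i |a_i|)², the absolute summability implied by Eq. 7.

```python
# Autocovariances of X_t = sum_j a_j eps_{t-j} with unit innovation variance:
# gamma_k = sum_i a_i a_{i+k}.  A stable filter (sum_i |a_i| < infinity)
# forces absolute summability of the covariances.
def gamma(a, k):
    """k-th autocovariance of the (truncated) linear process."""
    return sum(a[i] * a[i + k] for i in range(len(a) - k))

a = [0.5 ** j for j in range(50)]                # a_j = 2^{-j}: a stable filter
one_sided = sum(abs(gamma(a, k)) for k in range(len(a)))
two_sided = 2.0 * one_sided - abs(gamma(a, 0))   # gamma_{-k} = gamma_k
bound = sum(abs(x) for x in a) ** 2              # sum_k |gamma_k| <= (sum_i |a_i|)^2
assert two_sided <= bound + 1e-9
```

For this choice of coefficients the bound is essentially attained (all a_j are positive), which makes it a sharp sanity check.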
Volterra Series. Analysis of nonlinear systems is a notoriously difficult problem, and the available tools are very limited (18). Oftentimes it is unsatisfactory to linearize or approximate nonlinear systems by linear ones. The Volterra representation provides a reasonably simple and general alternative. The idea is to represent Eq. 3 as a power series of inputs. In particular, suppose that g in Eq. 3 is sufficiently well-behaved that it has the stationary and causal representation X_n = Σ_{k=1}^{∞} Σ_{u_1, ..., u_k ≥ 0} g_k(u_1, ..., u_k) ε_{n−u_1} ··· ε_{n−u_k}, [8] where the functions g_k are called the Volterra kernels. The right-hand side of Eq. 8 is generically called the Volterra expansion, and it plays an important role in nonlinear system theory (13, 18–22). There is a continuous-time version of Eq. 8 with summations replaced by integrals. Because the series involved has infinitely many terms, there is a convergence issue that must be resolved to guarantee that the representation is meaningful; this issue is often difficult to deal with, and the imposed conditions can be quite restrictive (18). Fortunately, in our setting, the difficulty can be circumvented because we are dealing with iid random inputs. Indeed, assume that the ε_t are iid with mean 0 and variance 1, and that g_k(u_1, ..., u_k) is symmetric in u_1, ..., u_k, equals zero if u_i = u_j for some 1 ≤ i < j ≤ k, and satisfies Σ_{k=1}^{∞} Σ_{u_1, ..., u_k ≥ 0} g_k^2(u_1, ..., u_k) < ∞. Then X_n exists and is in L^2. Simple calculations give explicit expressions for E(X_n^2), δ(n), and ω(n) in terms of the kernels g_k, and the Volterra process is stable if the resulting Ω_2 is finite.
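For a concrete toy case, the sketch below (Python; the second-order kernel g_2(i, j) = 2^{−i−j} for i < j, the truncation at 12 lags, and the Monte Carlo sample size are all illustrative assumptions, not from the text) builds a second-order Volterra process with standard Gaussian inputs and checks that E(X_n²) matches the sum of squared kernel coefficients, as the orthogonality of the distinct product terms predicts.

```python
import random

def volterra2(eps, c=0.5):
    """Second-order Volterra output sum_{i<j} c^i c^j eps_{n-i} eps_{n-j};
    eps[k] plays the role of eps_{n-k}.  The kernel vanishes on the diagonal."""
    m = len(eps)
    return sum(c ** i * c ** j * eps[i] * eps[j]
               for i in range(m) for j in range(i + 1, m))

rng = random.Random(4)
m, n_sims = 12, 20000
second_moment = 0.0
for _ in range(n_sims):
    eps = [rng.gauss(0.0, 1.0) for _ in range(m)]
    second_moment += volterra2(eps) ** 2 / n_sims

# Orthogonality of the distinct products eps_{n-i} eps_{n-j} gives
# E(X_n^2) = sum_{i<j} (c^i c^j)^2.
exact = sum((0.5 ** i * 0.5 ** j) ** 2
            for i in range(m) for j in range(i + 1, m))
assert abs(second_moment - exact) < 0.05
```

The iid inputs are what make this computation elementary; the same orthogonality argument is what circumvents the convergence difficulties of the general Volterra theory.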
Nonlinear Transforms of Linear Processes. Let (X_t) be the linear process defined in Eq. 6, and consider the transformed process Y_t = K(X_t), where K is a possibly nonlinear filter. Let ω(n, Y) be the predictive dependence measure of (Y_t). Assume that the ε_i have mean 0 and finite variance. Under mild conditions on K, we have ‖P_0 Y_n‖ = O(|a_n|) (cf. theorem 2 in ref. 23). By Theorem 1, ω(n, Y) ≤ 2 ‖P_0 Y_n‖. In this case, if (X_t) is stable, namely Eq. 7 holds, then (Y_t) is also stable.
Quite interesting phenomena arise if (X_n) is unstable: under appropriate conditions on K, (Y_n) may nonetheless be stable. With a nonlinear transform, the dependence structure of (Y_t) can be quite different from that of (X_n) (24–27). The asymptotic problem for S_n(K) = Σ_{i=1}^{n} Y_i has a long history (see refs. 23 and 27 and references therein). Consider the remainder L^{(τ)}(ξ_n) of the τth-order Volterra expansion of Y_n, with coefficients κ_r, r = 0, ..., τ. Under mild regularity conditions on K and ε_n, by theorem 5 in ref. 23, ‖P_0 L^{(τ)}(ξ_n)‖ can be bounded, and by Theorem 1 the predictive dependence measure ω^{(τ)}(n) of the remainder L^{(τ)}(ξ_n) satisfies a corresponding bound. It is thus possible for the remainder process to be stable even when (X_n) is not. Consider the special case a_n = n^{−β} ℓ(n), where 1/2 < β < 1 and ℓ is a slowly varying function, namely, ℓ(cn)/ℓ(n) → 1 as n → ∞ for any c > 0. By Karamata's theorem (28), for j ≥ 2 the tail sums Σ_{i ≥ n} |a_i|^j can be evaluated, and if τ > (2β − 1)^{−1} − 1, they are summable. Therefore, if the function K satisfies κ_r = 0 for r = 0, ..., τ and (τ + 1)(2β − 1) > 1, then Y_t = K(X_t) is stable even though X_t is not. Appell polynomials (29) satisfy such conditions. For example, let K(w) = w^2; then K_∞(w) = w^2 and κ_1 = 0, κ_2 = 2. If β ∈ (3/4, 1), then the process Y_t = K(X_t) is stable. If 1/2 < β < 3/4, then S_n(K)/‖S_n(K)‖ converges to the Rosenblatt distribution.
Uniform Volterra expansions for F_n(x) over x ∈ ℝ are established in refs. 30 and 31. Wu (32) considered nonlinear transforms of linear processes with infinite-variance innovations.
Nonlinear Time Series. Let ε_t be iid random variables and consider the recursion X_n = R(X_{n−1}, ε_n), [11] where R is a measurable function. The framework 11 is quite general, and it includes many popular nonlinear time series models, such as threshold autoregressive models (33), exponential autoregressive models (34), bilinear autoregressive models, and autoregressive models with conditional heteroscedasticity (35), among others. If there exist α > 0 and x_0 such that E(log L_{ε_0}) < 0 and E(L_{ε_0}^α) + E(|x_0 − R(x_0, ε_0)|^α) < ∞, [12] where L_ε = sup_{x ≠ x′} |R(x, ε) − R(x′, ε)| / |x − x′|, then Eq. 11 admits a unique stationary distribution (36), and iterations of Eq. 11 give rise to Eq. 1. By theorem 2 in ref. 37, Eq. 12 implies that there exist p > 0 and r ∈ (0, 1) such that δ_p^p(I, n) = O(r^n), [13] where I = {..., −1, 0}. Recall δ_p(I, n) = ‖g(ξ_n) − g(ξ_{n,I})‖_p. By stationarity, δ_p(n) ≤ δ_p(I, n) + δ_p(I, n + 1). So Eq. 13 implies δ_p(n) = O(ρ^n) for some ρ ∈ (0, 1). On the other hand, by Theorem 1(iii), if δ_p(n) = O(ρ^n) holds for some p > 1 and some ρ ∈ (0, 1), then Eq. 13 also holds. So the two properties are equivalent if p > 1. In refs. 37 and 38, the property 13 is called geometric-moment contraction, and it is very useful in studying asymptotic properties of nonlinear time series.
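Geometric-moment contraction has a simple numerical face: iterate the recursion from two different initial values with the same innovations and watch the gap shrink geometrically. The sketch below (Python; the map R(x, ε) = |x|/2 + ε with Gaussian ε, the starting points, and the iteration count are illustrative assumptions, not from the text) does exactly that; since |R(x, ε) − R(y, ε)| ≤ |x − y|/2, the gap after n steps is at most 2^{−n} times the initial gap.

```python
import random

def gap_after(n_iter, x0=8.0, y0=-3.0, seed=2):
    """Run X_n = |X_{n-1}|/2 + eps_n from two starting points with the SAME
    innovations and return the final gap |x - y|."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(n_iter):
        eps = rng.gauss(0.0, 1.0)
        x = 0.5 * abs(x) + eps
        y = 0.5 * abs(y) + eps
    return abs(x - y)

# The map contracts by at least 1/2 per step, so the initial gap of 11
# shrinks below 11 * 2^{-30} after 30 iterations, whatever the innovations.
assert gap_after(30) <= 11.0 * 0.5 ** 30 + 1e-12
```

The same experiment run with coupled rather than shared innovations is precisely what δ_p(I, n) in Eq. 13 measures.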
Inequalities and Limit Theorems
For (X_i) defined in Eq. 1, let S_u = S_n + (u − n) X_{n+1}, n ≤ u ≤ n + 1, n = 0, 1, ..., be the partial-sum process, and let R_n(s) = √n [F_n(s) − F(s)], where F is the distribution function of X_0. Primary goals in the limit theory of stationary processes include obtaining asymptotic properties of {S_u, 0 ≤ u ≤ n} and {R_n(s), s ∈ ℝ}. Such results are needed in the related statistical inference. The physical and predictive dependence measures provide a natural vehicle for an asymptotic theory for S_n and R_n.
Partial Sums. Let Z_n = max_{0 ≤ u ≤ n} |S_u| / √n and B_p = p/(p − 1), p > 1. Recall the projection operator P_k· = E(· | ξ_k) − E(· | ξ_{k−1}) and let Θ_p = Σ_{n=0}^{∞} ‖P_0 X_n‖_p.
By Theorem 1, Θ _{p} ≤ Ω _{p} ≤ 2Θ _{p} . Moment inequalities and limit theorems of S_{n} are given in Theorems 2 and 3, respectively. Denote by IB the standard Brownian motion. An interesting feature in the large deviation result in Theorem 2(ii) is that Ω _{p} and X_{k} do not need to be bounded.
Theorem 2. Let p ≥ 2. (i) We have ‖Z_n‖_p ≤ B_p Θ_p ≤ B_p Ω_p. (ii) Let 0 < α ≤ 2 and assume Ω_p ≤ γ p^{1/α} for all p ≥ 2. [14] Then for 0 ≤ t < t_0, sup_n E[exp(t Z_n^α)] < ∞, where t_0 = (eαγ^α)^{−1} 2^{α/2}. Consequently, for u > 0, P(Z_n ≥ u) = O[exp(−t u^α)].
Proof: (i) It follows from W.B.W. (unpublished results) and theorem 2.5 in ref. 39. For completeness we present the proof here. Let and . Then . By Doob's maximal inequality and theorem 2.5 in ref. 39 (or proposition 4 in ref. 16),
Since , (i) follows. (ii) Let Z = Z_{n} and p _{0} = [2/α] + 1. By Stirling's formula and Eq. 14
By (i), since , (ii) follows from
Example 1: For the linear process 6, assume that A := Σ_{i=0}^{∞} |a_i| < ∞ and ε_0 ∈ L^1. We now apply (ii) of Theorem 2 to the sum Σ_{i=1}^{n} g̃(ξ_i), where g̃(ξ_i) = 1_{X_i ≤ u} − F(u). To this end, we need to calculate the predictive dependence measure ω_p(n, g̃) (say) of the process g̃(ξ_n). Without loss of generality let a_0 = 1. Let F_ε and f_ε be the distribution and density functions of ε_0, and assume c := sup_u f_ε(u) < ∞. Then Eq. 14 holds with α = 1. To see this, let Y_{n,1} = X_n − ε_n and Z_{n,1} = Y_{n,1} − a_n ε_0, and let n ≥ 1. By the triangle inequality,
Hence, . Since , we have . Clearly, 0 ≤ Q_{n} ≤ 1. So , where C = 2cA. For η > 0 let the set . By Eq. 15
Condition 15 holds if .
Theorem 3. (i) Assume that Ω_2 < ∞. Then {S_{nu}/√n, 0 ≤ u ≤ 1} ⇒ {σ IB(u), 0 ≤ u ≤ 1}, where σ = ‖Σ_{k=0}^{∞} P_0 X_k‖. (ii) Let 2 < p ≤ 4 and assume that Θ_p < ∞. Then on a possibly richer probability space, there exists a Brownian motion IB such that max_{i ≤ n} |S_i − σ IB(i)| = o[n^{1/p} l(n)] almost surely, where l(n) = (log n)^{1/2+1/p} (log log n)^{2/p}.
The proof of the strong invariance principle (ii) is given by W.B.W. (unpublished results). Theorem 3(i) follows from corollary 3 in ref. 40, and the expression for σ is a consequence of the martingale approximation: let D_i = Σ_{k=i}^{∞} P_i X_k and M_n = D_1 + ... + D_n; then ‖S_n − M_n‖ = o(√n) and ‖S_n‖/√n = σ + o(1) (see theorem 6 in ref. 41). Theorem 3(i) can also be proved by using the argument in ref. 42, where the invariance principle has a slightly different form; we omit the details. See refs. 43 and 44 for some related work.
Empirical Distribution Functions. Let F_i(u | ξ_0) = P(X_i ≤ u | ξ_0), i ≥ 1, be the conditional distribution function of X_i given ξ_0. By Definition 2, the predictive dependence measure for g̃(ξ_i) = 1_{X_i ≤ u} − F(u), at a fixed u, is ω_p(i, u) = ‖F_i(u | ξ_0) − F_i(u | ξ_{0,{0}})‖_p. To study the asymptotic properties of R_n, it is certainly necessary to consider the whole range u ∈ (−∞, ∞). To this end, we introduce the integrated predictive dependence measure ω̄_p(i) = [∫ ω_p^p(i, u) du]^{1/p} [18] and the uniform predictive dependence measure ω_p^*(i) = sup_u ω_p(i, u), [19] i ≥ 1. Theorem 4 below concerns the weak convergence of R_n; it follows from corollary 1 by W.B.W. (unpublished results).
Theorem 4. Assume that E(|X_0|^τ) < ∞ and sup_u f(u) ≤ c_0 for some positive constants τ and c_0 < ∞. Further assume that the integrated predictive dependence measures in Eq. 18 are summable over i ≥ 1. [20] Then R_n ⇒ W, where W is a centered Gaussian process.
Kernel Density Estimation. An important problem in nonparametric inference of stochastic processes is to estimate the marginal density function f (say) given the data X_1, ..., X_n. A popular method is kernel density estimation (45, 46). Let K be a bounded kernel function with ∫ K(u) du = 1, and let b_n > 0 be a sequence of bandwidths satisfying b_n → 0 and n b_n → ∞. [21] Let K_b(x) = K(x/b). Then f can be estimated by f_n(x) = (n b_n)^{−1} Σ_{i=1}^{n} K_{b_n}(x − X_i). [22] If the X_i are iid, Parzen (46) proved a central limit theorem for f_n(x) under the natural condition 21. There has been a substantial literature on generalizing Parzen's result to time series (47, 48). Wu and Mielniczuk (49) solved the open problem that, for short-range dependent linear processes, Parzen's central limit theorem holds under Eq. 21; see references therein for historical developments. Here, we generalize the result in ref. 49 to nonlinear processes. To this end, we adopt the uniform predictive dependence measure 19. The asymptotic normality of f_n requires a summability condition on these measures.
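As a quick illustration of Eqs. 21 and 22, the sketch below (Python; the Gaussian kernel, the AR(1) data-generating process X_n = X_{n−1}/2 + ε_n, the bandwidth b_n = n^{−1/5}, and the sample size are all illustrative assumptions, not from the text) computes f_n at x = 0 for short-range dependent data, whose stationary marginal is N(0, 4/3), and checks that the estimate lands near the true density value.

```python
import math
import random

def f_n(data, x, b):
    """Kernel density estimate (Eq. 22) with a standard Gaussian kernel:
    f_n(x) = (n b)^{-1} sum_i K((x - X_i)/b)."""
    n = len(data)
    return sum(math.exp(-0.5 * ((x - xi) / b) ** 2) / math.sqrt(2.0 * math.pi)
               for xi in data) / (n * b)

# Short-range dependent data: X_n = X_{n-1}/2 + eps_n has stationary marginal
# N(0, 1/(1 - 1/4)) = N(0, 4/3).
rng = random.Random(3)
x, data = 0.0, []
for _ in range(5000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    data.append(x)

b = len(data) ** (-0.2)                  # b_n -> 0 while n * b_n -> infinity
true_f0 = 1.0 / math.sqrt(2.0 * math.pi * 4.0 / 3.0)   # N(0, 4/3) density at 0
assert abs(f_n(data, 0.0, b) - true_f0) < 0.05
```

That the same bandwidth condition 21 suffices here despite the serial dependence is exactly the phenomenon formalized by the Wu–Mielniczuk result and by Theorem 5 below.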
Theorem 5. Assume that the uniform predictive dependence measures in Eq. 19 are summable with sum at most c_0 for some constant c_0 < ∞, and that f = F′ is continuous. Let κ = ∫ K^2(u) du. Then under Eq. 21 we have √(n b_n) [f_n(x) − E f_n(x)] ⇒ N[0, f(x) κ] for every x ∈ ℝ.
Proof: Let m be a nonnegative integer. By the identity and the Lebesgue dominated convergence theorem, we have and h_{m} _{+1} is also bounded by c _{0}. By Theorem 1(ii), . Let . By Theorem 2(i) and Eq. 23 Let and . Observe that Then . Following the argument of lemma 2 in ref. 49, M_{n} / ⇒ N[0, f(x)κ], which finishes the proof since and b_{n} → 0.
Acknowledgments
I thank J. Mielniczuk, M. Pourahmadi, and X. Shao for useful comments. I am very grateful for the extremely helpful suggestions of two reviewers. This work was supported by National Science Foundation Grant DMS-0448704.
Footnotes

† E-mail: wbwu{at}galton.uchicago.edu.

Author contributions: W. B. W. designed research, performed research, and wrote the paper.

Abbreviation: iid, independent and identically distributed.
 Copyright © 2005, The National Academy of Sciences
References

Wiener, N. (1958) Nonlinear Problems in Random Theory (MIT Press, Cambridge, MA).
Rosenblatt, M. (1959) J. Math. Mech. 8, 665–681.
Rosenblatt, M. (1971) Markov Processes. Structure and Asymptotic Behavior (Springer, New York).
Kallianpur, G. (1981) in Norbert Wiener, Collected Works with Commentaries, eds. Wiener, N. & Masani, P. (MIT Press, Cambridge, MA), pp. 402–424.
Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 43–47.
Bradley, R. C. (2005) Introduction to Strong Mixing Conditions (Indiana Univ. Press, Bloomington, IN).
Blum, J. R. & Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 412–413.
Rosenblatt, M. (1964) J. Res. Natl. Bureau Standards Sect. D 68D, 933–936.
Rosenblatt, M. (1980) J. Appl. Prob. 17, 265–270.
Andrews, D. W. K. (1984) J. Appl. Prob. 21, 930–934.
Priestley, M. B. (1988) Nonlinear and Nonstationary Time Series Analysis (Academic, London).
Pourahmadi, M. (2001) Foundations of Time Series Analysis and Prediction Theory (Wiley, New York).
Chow, Y. S. & Teicher, H. (1988) Probability Theory (Springer, New York).
Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. (1994) Time Series Analysis: Forecasting and Control (Prentice-Hall, Englewood Cliffs, NJ).
Rugh, W. J. (1981) Nonlinear System Theory: The Volterra/Wiener Approach (Johns Hopkins Univ. Press, Baltimore).
Schetzen, M. (1980) The Volterra and Wiener Theories of Nonlinear Systems (Wiley, New York).
Casti, J. L. (1985) Nonlinear System Theory (Academic, Orlando, FL).
Bendat, J. S. (1990) Nonlinear System Analysis and Identification from Random Data (Wiley, New York).
Mathews, V. J. & Sicuranza, G. L. (2000) Polynomial Signal Processing (Wiley, New York).
Wu, W. B. (2006) Econometric Theory 22, in press.
Sun, T. C. (1963) J. Math. Mech. 12, 945–978.
Ho, H. C. & Hsing, T. (1997) Ann. Prob. 25, 1636–1669.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications (Wiley, New York), Vol. II.
Avram, F. & Taqqu, M. (1987) Ann. Prob. 15, 767–775.
Wu, W. B. (2003) Bernoulli 9, 809–831.
Wu, W. B. (2003) Statistica Sinica 13, 1259–1267.
Tong, H. (1990) Nonlinear Time Series: A Dynamical System Approach (Oxford Univ. Press, Oxford).
Haggan, V. & Ozaki, T. (1981) Biometrika 68, 189–196.
Diaconis, P. & Freedman, D. (1999) SIAM Rev. 41, 41–76.
Rio, E. (2000) Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants (Springer, Berlin).
Parzen, E. (1962) Ann. Math. Stat. 33, 1065–1076.
Robinson, P. M. (1983) J. Time Ser. Anal. 4, 185–207.
Bosq, D. (1996) Nonparametric Statistics for Stochastic Processes: Estimation and Prediction (Springer, New York).