On the trend, detrending, and variability of nonlinear and nonstationary time series
 *Center for OceanLandAtmosphere Studies, 4041 Powder Mill Road, Suite 302, Calverton, MD 20705;
 ^{†}Research Center for Adaptive Data Analysis, National Central University, Chungli 32054, Taiwan, Republic of China;
 ^{‡}Ocean Sciences Branch, Code 614.2, National Aeronautics and Space Administration Goddard Space Flight Center, Wallops Flight Facility, Wallops Island, VA 23337; and
 ^{§}Division of Interdisciplinary Medicine and Biotechnology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215
See allHide authors and affiliations

Edited by Inez Y. Fung, University of California, Berkeley, CA, and approved July 27, 2007 (received for review February 9, 2007)
Abstract
Determining trend and implementing detrending operations are important steps in data analysis. Yet there is no precise definition of “trend” nor any logical algorithm for extracting it. As a result, various ad hoc extrinsic methods have been used to determine trend and to facilitate a detrending operation. In this article, a simple and logical definition of trend is given for any nonlinear and nonstationary time series as an intrinsically determined monotonic function within a certain temporal span (most often that of the data span), or a function in which there can be at most one extremum within that temporal span. Being intrinsic, the method to derive the trend has to be adaptive. This definition of trend also presumes the existence of a natural time scale. All these requirements suggest the Empirical Mode Decomposition (EMD) method as the logical choice of algorithm for extracting various trends from a data set. Once the trend is determined, the corresponding detrending operation can be implemented. With this definition of trend, the variability of the data on various time scales also can be derived naturally. Climate data are used to illustrate the determination of the intrinsic trend and natural variability.
The terms “trend” and “detrending” frequently are encountered in data analysis. In many applications, such as climatic data analyses, the trend is one of the most critical quantities sought. In other applications, such as in computing the correlation function and in spectral analysis, it is necessary to remove the trend from the data, a procedure known as detrending, lest the result might be overwhelmed by the nonzero mean and the trend terms; therefore, detrending often is a necessary step before meaningful spectral results can be obtained. As a result, identifying the trend and detrending the data are both of great interest and importance in data analysis.
Because the concept of a trend in a data set seems clearly selfevident, most data analysts take it for granted and only few bother to examine the essence of it or to define it rigorously for the purpose of data analysis. For example, in statistics and in numerous scientific analyses, the trend often is taken as the tendency over the whole data domain that presumably will continue into the future when new observations become available. In other cases, the trend can be the residue of data after removing the components of the data with frequency higher than a threshold frequency (1). In a casual Internet search, for example, there are presently more than 12 million items related to trend and detrending. However, a rigorous and satisfactory definition of either the trend of nonlinear nonstationary data or the corresponding detrending operation still is lacking, which leads to the awkward reality that the determination of trend and detrending often are ad hoc operations. Because many of the difficulties concerning trend stem from the lack of a proper definition for the trend in nonlinear nonstationary data, a definitive and quantitative study on trend and detrending is needed.
In this article, a definition for trend is introduced, and a corresponding algorithm for finding intrinsically the trend and implementing the detrending also is presented. Because the detrended data define a more meaningful variability associated with a particular time scale of the data, the variability of the data also will be examined. It should be noted here that the definition of trend and the algorithm for detrending in this study are quite general and can be applied to any data from nonstationary and nonlinear processes. The goal, however, is not for prediction but for analysis. The assumption is that the predictive models have to be processbased, not datadriven. The analysis aspect emphasizes the discovery and understanding of the underlying processes to provide a basis for building predictive models. Therefore, the emphasis of this article differs from those contained in the works by two pioneers of financial data analysis, R. F. Engle and C. W. J. Granger, whose work covers similar problems in determining trend and variability or volatility as used in financial communities (2). However, their emphasis is on models for market prediction, a daunting challenge for a patently nonstationary process. As a justification, they regard the financial market as a special AutoRegressive Integrated Moving Average (ARIMA) process, controlled by a series of shocks and relaxations. They clearly pointed out the limitation of their works, that not all nonstationary data satisfy their special assumptions. Indeed, the vast majority of realworld data are of a nonstationary and nonlinear nature and do not fit the ARIMA prediction models at all.
This article is arranged in the following way: it begins by discussing the drawbacks of subjectively determined but widely used definitions of trend; clarifying some concepts on the essence of the trend of nonlinear nonstationary time series; and then providing a definition of intrinsically determined trend and a method for detrending. This definition of the trend will be applied to the annual global surface air temperature anomaly (GSTA) (with respect to the 30year mean global surface temperature from 1961–1990) time series. Some discussion and conclusions also will be provided. A brief description of the method also will be presented.
A Definition of Trend
Extrinsic and Predetermined Trends.
The most commonly seen trend is the simple trend, which is a straight line fitted to the data, and the most common detrending process usually consists of removing a straight line best fit, yielding a zeromean residue. Such a trend may suit well in a purely linear and stationary world. However, the approach may be illogical and physically meaningless for realworld applications such as in climatic data analyses. In these studies, the trend often is the most important quantity sought (3), and the linearly fitted trend makes little sense, for the underlying mechanism is likely to be nonlinear and nonstationary.
Another commonly used trend is the one taken as the result of a moving mean of the data. A moving mean requires a predetermined time scale so as to carry out the mean operation. The predetermined time scale has little rational basis, for in nonstationary processes the local time scale is unknown a priori. More complicated trend extraction methods, such as regression analysis or Fourierbased filtering, also are often based on stationarity and linearity assumptions; therefore, one will face a similar difficulty in justifying their usage. Even in the case in which the trend calculated from a nonlinear regression happens to fit the data well fortuitously, there still is no justification in selecting a timeindependent regression formula and applying it globally for nonstationary processes. In general, various curve fits with a priori determined functional forms are subjective, and there is no foundation to support any contention that the underlying mechanisms should follow the selected simplistic, or even sophistic, functional forms, except for the cases in which physical processes are completely known.
Intrinsic and Adaptive Trends.
The definitions of trend and the algorithms for detrending discussed above generally involve prescribed parameters or functions that are extrinsic and subjective. To overcome the aforementioned drawbacks, one must address the problem of how to determine the trend for data sets from nonstationary and nonlinear processes without relying on extrinsic functional or simplifying assumptions.
Before proceeding further, several subtle, but important, points must be considered. First, the trend of the data should be an intrinsic property of the data; it is an integral part of the data and also is driven by the same mechanisms or part of the same mechanisms that generate the data. Being intrinsic requires that the method used in defining the trend be adaptive, so that the trend extracted is derived from and based on the data. Unfortunately, most of the presently available methods define trend by using extrinsic approaches (e.g., preselected functional forms).
Second, the trend should exist within a given data span (the whole length, or a part, of the data) and be a property associated with the corresponding local time scales. A trend of a certain time scale shorter than the current data span defined this way is not likely to be affected by any continuing addition of new data. However, the continuing addition of new data does have effects: it will lead to an extension of the current overall trend and even a new overall trend of a time scale longer than the current data span. With this idea in mind, it is easy to recognize the difficulty of distinguishing the trend from the cycle as stated by Stock and Watson (4): “one economist's ‘trend' can be another's ‘cycle,”' when no local time scale is introduced. To separate the two clearly, the trend must be limited to a curve containing at most one extremum within the given data span.
Under the above considerations, the trend is thus defined: The trend is an intrinsically fitted monotonic function or a function in which there can be at most one extremum within a given data span.
Here, “a given data span” could be the whole length, or a part, of the data. Having defined the trend, detrending and the variability can be readily defined as follows:
Detrending is the operation of removing the trend. The variability is the residue of the data after the removal of the trend within a given data span.
Empirical Mode Decomposition (EMD) for Determining Intrinsic Trend.
If the functional form of the trend is not preselected, the processes of determining the trend have to be adaptive to accommodate data from nonstationary and nonlinear processes. As discussed above, regression, moving mean, and filtering all are problematic in dealing with nonlinear nonstationary data. With these considerations, only the recently developed EMD method (1, 5–8) fits the requirements. A brief description of the method also will be presented in the Methods.
Having presented the definitions of trend and detrending, the adaptive approach in trend determination will be demonstrated by using the annual GSTA data. It should be pointed out that the reason for using the annual GSTA is that it is one of the most widely studied climatic time series (3), and it can serve well to illustrate the general methodology.
Determination of the Trends of Annual GSTA.
Global warming has become an extremely contentious issue that has gone far beyond just climate science for its economical and political implications (3, 9), especially within the last two decades. The painstakingly assembled historical instrumental record shows clear evidence of a warming trend over the last century. There are, however, some unsettled arguments, which include the warming rates, the causes of warming, and the precise trend. This article is limited to determining trends with their associated time scales and the corresponding warming rates by using the EMD method. These results will be useful for understanding the variability of the climate on various time scales. The data used here are the annual global surface temperature anomalies analyzed by Jones et al. (10) and posted at the web site of the Climate Research Unit, University of East Anglia, Norwich, U.K. (www.cru.uea.ac.uk/cru/data/temperature/), which is maintained jointly by the Climate Research Unit and the U.K. Meteorological Office Hadley Centre. The annual GSTA is the yearly averaged deviation from the 1961–1990 mean. The data are plotted in Fig. 1.
The data are decomposed into intrinsic mode functions (IMFs) by using the EMD method (1, 5–8). As previously demonstrated (7), the IMFs and the residual obtained by using EMD are not always unique and could change as the stoppage criterion for the sifting process changes, which makes it difficult to confirm whether the extracted trend is “real, actual.” To gain some confidence in the result, the method suggested by Huang et al. (7) has been used. In this method, 10 different S number stoppage criteria for the sifting process are used, and the corresponding IMFs are compared. The details of the S number stoppage criterion, which is the consecutive numbers of sifting for which the numbers of extrema and zerocrossing are constants and have the same value, is given in Huang et al. (7). In this study, 10 different S numbers, from 4 to 13, were selected.
All of the decompositions with their different stoppage criteria gave six IMF components, indicating that the siftings were producing quite stable results. The mean IMFs and their corresponding standard deviations from the 10 different sets are shown in Fig. 2. The values of standard deviations are about one order of magnitude smaller than that of their mean values, indicating the robustness of the result. A statistical significance test was performed on the IMF components, based on the a posteriori test method proposed by Wu and Huang (8, 11). From this test, it was found that the first four IMFs are not distinguishable from the corresponding IMFs of pure white noise. However, the fifth IMF, which represents the multidecadal variability of the data, and the reminder, which is the overall trend, are statistically significant, indicating these two components contain physically meaningful signals. The details of the test can be found in the Methods and Fig. 7.
Various trends, including the linear trend, the overall adaptive trend (the residual component C6), and the multidecadal trend (the sum of C5 and C6), are plotted in Fig. 3. Here, the overall adaptive trend is the trend derived by using the EMD over the whole data span, and the multidecadal trend is the remainder after IMFs of periods shorter than multidecades are removed from the GSTA, which can be regarded as the union of the trends derived from consecutive multidecadal sections of GSTA. The overall adaptive trend is determined intrinsically and is neither linear nor quadratic. The narrowness of the variance limit for the trend (see Fig. 2 Bottom) indicates that the trend is highly robust and reliable. In Fig. 3, a comparison among the different fittings also is illustrated. The linear trend is no doubt the poorest one, the intrinsically determined overall adaptive trend is a major improvement over the linear fitting, and the multidecadal trend catches essentially the meaningful variability and change associated with the annual GSTA, showing even greater improvement.
The variability of the annual GSTA with respect to various trends is given in Fig. 4. From visual inspection, the variability with respect to the linear trend contains a dominant centennial time scale (the time scale of the data length) as well as a multidecadal time scale. However, the variability with respect to the overall adaptive trend, showing mostly multidecadal fluctuating patterns, indicates cyclical variability on a shorter time scale than that of the overall adaptive trend. The variability with respect to the multidecadal trend is not distinguishable from that of white noise.
The change rates of various trends, defined as the temporal derivatives of various trends, are plotted in Fig. 5. The linear trend gives a warming value of 0.5 K per century. However, if greenhouse gases are indeed the causes of warming (3), such a constant warming rate certainly does not reflect the acceleration of warming caused by the accumulation of greenhouse gases. The change rate of the overall adaptive trends seems to reflect the acceleration of warming much better: it was close to no warming in the mid19th century and is ≈0.8 K per century currently. This tendency was qualitatively mentioned earlier (3), but its quantitative characteristics would be totally missed if a linear trend were adopted. For the multidecadal trend, the rate of change is much higher compared with the overall adaptive trend. From Fig. 5, it can be seen that there were three periods when the rates of change were higher (1860s, 1930s, and 1980s), which are interspersed with brief periods of temperature decreases.
As was discussed earlier, a local trend is a timescaleassociated quantity. The time scale of the multidecadal trend based on the generalized zerocrossing method (12) (see the Methods for more detail), which determines the local time scale based on the information of neighboring extrema and zerocrossing, is plotted in Fig. 6. The time scale certainly is not a constant, but it varies from 50 to 80 years and has a mean value slightly higher than 65 years.
Judging from the statistical significance test, we decided not to pursue the trend to any finer scale, because the IMF with a time scale <65 years may not significantly differ from white noise. The reliable trends are those of longer time scales. Significantly, other than the familiar overall global warming trend, the 65year cycle really stands out. The origin of this 65year time scale is not completely clear because there is no known external force that varies with such a time scale.
Finally, it is noted that the global temperature anomalies with respect to the sum of the overall EMD trend and the multidecadal variability appear to be quite stationary in the whole data span, indicating that the higher frequency part of the record in recent years is not more variable than that in the 1800s. The extreme temperature records in the 1990s stand out mainly because the general global warming trend over the whole data length coincides with the warming phase of the 65year cycle.
Conclusions
In this article, we proposed a definition for trend of nonstationary nonlinear data, which in turn made it possible to perform a detrending operation on the data and to determine the variability about the trend line. The key to making this definition of trend feasible is the realization that the trend is one of the many local properties of the data; therefore, it has to be associated with a time scale. Without reference to a time scale, the trend will be confusingly mingled with local cycles.
Other general methods such as leastsquares or maximumlikelihood fits might fit the data well, but they are extrinsic. Although some of the extrinsic functions used in fitting data, such as exponential, power law, and hyperbolic baselines, are nonlinear models, there is no guarantee that the externally determined nonlinearity characteristics correspond to those embedded in the mechanisms generating the data. As most of the underlying mechanisms of either natural or humaninduced variability are only incompletely known, it is almost impossible to decide which of the myriad functions to choose so as to render the best extrinsically determined trend. Therefore, to have a meaningful trend, the method has to be adaptive (so as to let nature speak for itself). The EMD method fits these requirements well. This work has demonstrated an application of the present approach to annual GSTA. It has been shown that this approach not only defines the trend but also reveals some intriguing intrinsic properties of the data. Experience with various realworld data indicates that the variance of the detrended data with respect to any known extrinsically determined trend is larger than that corresponding to the intrinsically fitted variance. However, a rigorous proof of this statement still is under investigation.
Methods
Contrary to almost all of the previous decomposing methods, EMD is empirical, intuitive, direct, and adaptive, without requiring any predetermined basis functions (5–7). The decomposition is designed to seek the different intrinsic modes of oscillations in any data based on the principle of local scale separation. An intrinsic mode of oscillation is called an IMF when it satisfies: (i) in the whole data set, the number of extrema and the number of zerocrossings must either equal or differ at most by one and (ii) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. In this way, an IMF is a pure oscillatory mode that bears amplitude and frequency modulations.
The IMFs are extracted level by level: first the highestfrequency local oscillations riding on the corresponding lowerfrequency part of the data are extracted; then the next level highestfrequency local oscillations of the residual of the data are extracted; and so on until no complete oscillation can be identified in the residual. In practice, the EMD is implemented through a sifting process that uses only local extrema. Suppose r _{j−1} is the remainder of data x(t) after j − 1 IMFs are extracted, then the procedure for extracting the jth IMF is as follows:

Identify all of the local extrema (the combination of both maxima and minima) and connect all these local maxima (minima) with a cubic spline as the upper (lower) envelope;

Obtain the first component h by taking the difference between the data and the local mean of the two envelopes (the average of the upper and lower envelopes at any temporal location); and

Treat h as the data and repeat steps 1 and 2 as many times as is required until the envelopes are symmetric with respect to zero mean under certain criteria. The final h is designated as c_{j} , the jth IMF.
A complete sifting process stops when the residue, r_{n} , becomes a monotonic function from which no more IMF can be extracted. The total number of IMFs of a data set is close to log_{2} N, with N being the number of total data points. In short, the EMD is an adaptive method that will decompose data, x(t), in terms of IMFs, c_{j} , and a residual component, r_{n} , i.e., In Eq. 1 , the residual component, r_{n} , could be a constant, a monotonic function, or a function that contains only a single extrema, from which no more oscillatory IMFs can be extracted.
Recent studies by Flandrin et al. (13) and Wu and Huang (8, 11) established that the EMD is equivalent to a dyadic filter bank ^{‖} and to an adaptive wavelet (1). Being adaptive, the EMD has avoided the shortcomings of using any a priori defined wavelet basis and also has avoided the spurious harmonics that would certainly have resulted. The components of the EMD usually are physically meaningful because the characteristic scales are based on and derived from the data (1, 5–8, 11, 14).
From the above description of the EMD, it is clear that the definition of the residual in the EMD is almost identical to the definition of the trend when the data span in the trend definition covers the whole data length. It is noted that there is some similarity between the present approach to trend and variability and the Detrended Fluctuation Analysis (DFA) proposed by Peng et al. (15, 16). Detailed comparisons will need to be discussed in later publications.
Statistical Significance Test of IMFs.
Wu and Huang (8, 11), based on numerical experiments on white noise with the EMD method, found that the EMD is effectively a dyadic filter bank and that the Fourier spectra of the IMF components all are identical and cover the same area on a semilogarithmic period scale. They inferred from the Central Limit Theorem that the IMF components all are distributed normally. They further deduced that the product of the energy density of IMF and its corresponding mean period is a constant and that the energy density function is χ^{2}distributed. Based on these results, they derived the energy density spread function of the IMF components and further established a method to assign statistical significance (at any given statistical confidence level) of information content for IMF components derived from noisy data.
As for the annual GSTA data, which contains nonstationary components such as the trend, the noise contained in the data were estimated based on the first IMF, which is almost always a result of noise for any well sampled data containing noise. For convenience in displaying the results, a minor modification of the significance criterion also was made: the upper (lower) bound is based on three times the variance of the estimated white noise and is selected instead of computing the bounds with a 95% confidence interval. The results are shown in Fig. 7. The consistency of tests performed on other cases indicates that this modification is a reasonable one. If this criterion is adopted, the upper bound indicates that the fifth IMF and the residue trend both lie far above the noise characteristics represented by the solid line; therefore, they are statistically significant.
Time Scales Associated with Intrinsic Trends.
By definition, one of the critical elements of the trend is the time scale associated with it. As a trend is considered a local nonoscillatory function defined for a local time scale not longer than the local full oscillatory cycle, the upper bound of the time scale can be determined by the zerocrossings of the longest full cycle of the oscillations contained in the corresponding variability. With this in mind, the generalized zerocrossing method for defining wavelength proposed by Huang (12) can be used. In this generalized approach, the time scale is determined by the time spans between various combinations of the critical points defined as the union of all of the zerocrossings and extrema of an IMF. Therefore, the most local, and also the finest resolution, of the time scales is the distance between an extremum and the neighboring zerocrossing, representing a quarter wave cycle. The next choice is the time either between two consecutive extrema (a minimum to the next maximum, for example) or between two consecutive zerocrossings (from an up zerocrossing to a down zerocrossing, for example), which represents a half wave cycle. The longest, and most physical, local time scale is the full wavelength: from one maximum (minimum) to the next maximum (minimum) or from one up (down) zerocrossing to the next up (down) zerocrossing. As a result, for any given time location, there are seven possible local time scales representing different local properties. A weighted mean and a standard deviation can be computed from these time scales. The weighted mean is the time scale used in Fig. 6. The details of the method are given in ref. 12.
Acknowledgments
We thank Professor J. M. Wallace of the University of Washington (Seattle, WA) for encouragement and numerous suggestions and corrections for improving the clarity of the article and sharpening the focus. Z.W. was supported by National Science Foundation Grants ATM0342104 and ATM0653123. N.E.H. was supported in part by a Chair at National Central University endowed by Taiwan Semiconductor Manufacturing Company, Ltd., and National Research Council, Taiwan, Grant NSC 952119M008031MY3. S.R.L. was supported in part by the National Aeronautics and Space Administration's Oceanic Physics Program of the Earth–Sun System Division. C.K.P. was supported by National Institutes of Health National Center for Research Resources Grant P41RR13622.
Footnotes
 ^{¶}To whom correspondence should be addressed. Email: zhwu{at}cola.iges.org

Author contributions: N.E.H. designed research; Z.W., N.E.H., S.R.L., and C.K.P. performed research; N.E.H. contributed new reagents/analytic tools; Z.W., N.E.H., and S.R.L. analyzed data; and Z.W., N.E.H., S.R.L., and C.K.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

↵ ‖ A dyadic filter bank is a collection of bandpass filters that have a constant bandpass shape (e.g., a Gaussian distribution) but with neighboring filters covering half of or double the frequency range of any single filter in the bank. The frequency ranges of the filters can be overlapped. For example, a simple dyadic filter bank can include filters covering frequency windows such as 50 to 120 Hz, 100 to 240 Hz, 200 to 480 Hz, etc.
 Abbreviations:
 EMD,
 Empirical Mode Decomposition;
 GSTA,
 global surface air temperature anomaly;
 IMF,
 intrinsic mode function.
 © 2007 by The National Academy of Sciences of the USA
References

↵
 Flandrin P ,
 Gonçalvès P ,
 Rilling G
 Huang NE ,
 Shen SSP

↵
 Engle RF ,
 Granger CWJ

↵
 Intergovernmental Panel on Climate Change
 ↵

↵
 Huang NE ,
 Shen Z ,
 Long SR ,
 Wu MC ,
 Shih EH ,
 Zheng Q ,
 Tung CC ,
 Liu HH
 ↵

↵
 Huang NE ,
 Wu MLC ,
 Long SR ,
 Shen SSP ,
 Qu W ,
 Gloersen P ,
 Fan KL

↵
 Wu Z ,
 Huang NE
 ↵
 ↵

↵
 Wu Z ,
 Huang NE
 Huang NE ,
 Shen SSP

↵
 Huang NE
 ↵

↵
 Wu Z ,
 Huang NE
 ↵
 ↵