Journal of Hydrology 227 (2000) 1–20 www.elsevier.com/locate/jhydrol
Review
Chaos theory in hydrology: important issues and interpretations B. Sivakumar* Department of Hydrology and Water Resources, The University of Arizona, Tucson, AZ 85721, USA Received 1 June 1999; received in revised form 24 September 1999; accepted 22 October 1999
Abstract
The application of the concept of chaos theory in hydrology has been gaining considerable interest in recent times. However, studies reporting the existence of chaos in hydrological processes are often criticized due to the fundamental assumptions with which the chaos identification methods have been developed, i.e. infinite and noise-free time series, and the inherent limitations of the hydrological time series, i.e. finite and noisy. This paper is designed: (1) to address some of the important issues in the application of chaos theory in hydrology; and (2) to provide possible interpretations to the results reported by past studies reporting chaos in hydrological processes. A brief review of some of the past studies investigating chaos in hydrological processes is presented. An insight into the studies reveals that most of the problems, such as data size, noise, delay time, in the application of chaos theory have been addressed by past studies, and caution taken in the application of the methods and interpretation of the results. The study also reveals that the problem of data size is not as severe as it was assumed to be, whereas the presence of noise seems to have much more influence on the nonlinear prediction method than the correlation dimension method. The study indicates that the presence of noise in the data could be an important reason for the low-prediction accuracy estimates achieved in some of the past studies. These observations, with the fact that most of the past studies used the correlation dimension either as a proof or as a preliminary evidence of chaos, suggest that the hypothesis of deterministic chaos, as the basis in those studies, for hydrological processes is valid and has great practical potential. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Chaos theory; Hydrological data; Identification methods; Correlation dimension; Nonlinear prediction; Data size; Noise
* Present address: Department of Land, Air and Water Resources, Veihmeyer Hall, University of California, Davis, CA 95616, USA. Fax: +1-530-752-5262. E-mail address: [email protected] (B. Sivakumar).
0022-1694/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0022-1694(99)00186-9
1. Introduction
One aspect which hydrologists have been extensively working on is the structure of hydrological processes, such as rainfall and runoff. Even though a number of mathematical models have been proposed during the past few decades for modeling hydrological processes, there is no unified mathematical approach. In part, this difficulty stems from the fact that hydrological processes exhibit considerable spatial and temporal variability. However, another part of this difficulty is due to the limitation in the availability of ‘appropriate’ mathematical tools to exploit the structure underlying the hydrological processes. The latter aspect has gained considerable interest in recent times. The tremendous spatial and temporal variability of hydrological processes has been believed, until recently, to be due to the influence of a large number of variables. Consequently, the majority of the previous investigations on modeling hydrological processes have essentially employed the concept of a stochastic process. However, recent studies have indicated that even simple deterministic systems, influenced by a few nonlinear interdependent
variables, might give rise to very complicated structures (i.e. deterministic chaos). Therefore, it is now believed that the dynamic structures of the seemingly complex hydrological processes, such as rainfall and runoff, might be better understood using nonlinear deterministic chaotic models than the stochastic ones. The investigation of the existence of chaos in hydrological processes has been of much interest lately (e.g. Hense, 1987; Rodriguez-Iturbe et al., 1989; Sharifi et al., 1990; Tsonis et al., 1993; Jayawardena and Lai, 1994; Koutsoyiannis and Pachakis, 1996; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1998, 1999a). The outcomes of the investigations are very encouraging as they provided evidence regarding the existence of low-dimensional chaos, implying the possibility of accurate short-term predictions. However, such studies and the reported results have very often been subject to intense debate (e.g. Ghilardi and Rosso, 1990; Koutsoyiannis and Pachakis, 1996) because of the inherent limitations in employing the chaos identification methods for hydrological processes. The low prediction accuracy estimates achieved for the rainfall and streamflow, which have also been identified to exhibit low-dimensional chaos (Jayawardena and Lai, 1994; Sivakumar et al., 1998, 1999a), only raise further questions. In view of the above findings, there is a need to bridge the gap between the theoretical notions of deterministic chaos on one hand, and practical hydrology on the other. Therefore, the present paper has two main objectives. The first is to address some of the important issues, in the application of chaos identification methods in hydrology. The second, but equally important, objective of this paper is to seek implications of the studies investigating and reporting the existence of chaos in hydrological processes. The organization of this paper is as follows. 
In Section 2, a brief review of the previous studies investigating the existence of chaos in hydrological processes is furnished. Section 3 addresses some of the important issues in the application of the chaos identification methods to hydrological data. An attempt is then made in Section 4 to discuss the important results reported by past studies and to provide possible interpretations. Such interpretations lead to the general discussion, in Section 5, concerning the question of whether a hypothesis of deterministic chaos is valid for hydrological processes.
2. Review of studies investigating chaos in hydrology
The last decade has witnessed a number of studies employing the concept of chaos theory in hydrology (e.g. Hense, 1987; Rodriguez-Iturbe et al., 1989; Sharifi et al., 1990; Jayawardena and Lai, 1994; Georgakakos et al., 1995; Sangoyomi et al., 1996; Puente and Obregon, 1996; Porporato and Ridolfi, 1996, 1997; Liu et al., 1998; Wang and Gan, 1998; Sivakumar et al., 1998, 1999a,c). Even though the primary objective of those studies was to investigate the existence of chaos in hydrological processes, other aspects such as prediction (e.g. Jayawardena and Lai, 1994; Porporato and Ridolfi, 1996, 1997; Liu et al., 1998; Sivakumar et al., 1999a), noise level determination (e.g. Sivakumar et al., 1999b,c), and noise reduction (e.g. Porporato and Ridolfi, 1997; Sivakumar et al., 1999c) were also given due consideration. In this section, only a brief account of some of the studies employing the concept of chaos theory in hydrology is presented, so that the important issues in implementation can be addressed and the validity of such studies and the reported results can subsequently be discussed.
The possible existence of chaos in hydrological processes was first investigated by Hense (1987), who applied the correlation dimension method to a series of 1008 values of monthly rainfall recorded on Nauru Island. The existence of chaos in the rainfall time series was indicated based on the low correlation dimension value (between 2.5 and 4.5) obtained. Rodriguez-Iturbe et al. (1989) investigated the existence of chaos in rainfall using the correlation dimension method and the Lyapunov exponent method. They analyzed two rainfall records: (1) weekly rainfall data over a period of 148 years observed in Genoa; and (2) a record of 1990 rainfall values, measured with a sampling frequency of 8 Hz and then aggregated at equally spaced intervals of 15 s, from a single storm event in Boston. 
Observation of a finite low-correlation dimension of about 3.78 provided preliminary evidence on the existence of chaos in the storm data. The presence of chaos in the storm data was supported further by the observation of a positive Lyapunov exponent (0.0002 bits/s). However, the application of the correlation dimension method to the weekly rainfall data did not indicate the existence of chaos.
Further evidence on the presence of chaos in storm rainfall was presented by Sharifi et al. (1990), who employed the correlation dimension method to examine fine-increment data from three storms. The total number of data points for each of the three storms were 4000, 3991, and 3316 and the correlation dimensions obtained were 3.35, 3.75, and 3.60, respectively. Tsonis et al. (1993) investigated data representing the time between successive raingage signals each corresponding to a collection of 0.01 mm of rain using the correlation dimension method. The presence of a low correlation dimension of about 2.4 indicated the possible existence of chaos. Islam et al. (1993) obtained a low-dimension on analyzing simulated rainfall intensity data using the correlation dimension method. From a data set of 7200 points, generated at 10-s time steps from a three-dimensional cloud model, they obtained a value as low as about 1.5 for the dimension. Jayawardena and Lai (1994) investigated the daily rainfall and streamflow data from three and two stations, respectively, in Hong Kong for the purpose of identifying the existence of chaos. The correlation dimension method, the Lyapunov exponent method, the Kolmogorov entropy method, and the nonlinear prediction method were applied to data sets containing 4015 points (for rainfall), and 7300 and 6205 points (for streamflow), respectively. Their study provided convincing evidence of the existence of chaos in the daily rainfall and streamflow data in Hong Kong. Although the rainfall and streamflow prediction accuracy estimates were found to be low, they showed the superiority of the nonlinear prediction method over the traditional linear autoregressive moving average (ARMA) method. Using the nonlinear prediction method, Waelbroeck et al. (1994) observed that the prediction skill for daily rainfall dropped off quickly within a time scale of 2 days. However, the prediction skill of 10-day rainfall accumulations was found to be much better. 
Using the correlation dimension method, Georgakakos et al. (1995) analyzed data from 11 storm events in Iowa City, and reported the possible existence of chaos (except for data from one of these storms). The correlation exponents were found to range from 2.8 to 7.9 in the high-intensity scaling region, while in the low-intensity scaling region they ranged from 0.5 to 1.6. The possibility of the existence of chaos, in the
volume of the Great Salt Lake was studied by Sangoyomi et al. (1996). The analysis of a 144-year, biweekly time series of the Great Salt Lake volume yielded a correlation dimension of about 3.4. Puente and Obregon (1996) reported the existence of chaos in storm events observed in Boston by analyzing a time series of 1990 points using the correlation dimension method, the Kolmogorov entropy method, the false neighbors algorithm, and the Lyapunov exponent method. They presented a deterministic fractal-multifractal (FM) approach for modeling the storm event. A detailed comparison of the real and FM fitted time series revealed the possibility of the use of a deterministic FM approach for a faithful representation of the Boston storm event, and led them to hint that a stochastic framework for rainfall modeling might not be necessary. However, at the same time, Koutsoyiannis and Pachakis (1996) defended the use of stochastic models in modeling hydrological processes. They concluded, while analyzing incremental rainfall depths measured every 15 min, that a synthetic continuous rainfall series generated by a well-structured stochastic rainfall model might be practically indistinguishable from a historic rainfall series even if one used the tools of the chaotic dynamics theory to characterize and compare the two rainfall series. Porporato and Ridolfi (1996) provided clues to the existence of deterministic chaos in the daily flow data of Dora Baltea, a tributary of the river Po, in Italy. The application of the correlation dimension method and the nonlinear prediction method to a time series consisting of 14,246 points indicated the existence of a strong deterministic component. The study also paved the way for a more detailed analysis of the flow phenomenon (Porporato and Ridolfi, 1997), such as noise reduction, interpolation, and nonlinear prediction, which provided important confirmations of the nonlinear deterministic behavior of the flow phenomenon. Liu et al. 
(1998) analyzed, using the nonlinear prediction method, the daily streamflow data observed in 28 selected stations from the continental United States and reported that the daily streamflow signals spanned a wide dynamical range between deterministic chaos and periodic signal contaminated with additive noise. Further studies regarding the existence of deterministic chaos in streamflow data were carried out by Wang and Gan
(1998), who estimated the correlation dimensions of the unregulated streamflow data of six rivers in the Canadian prairies to be about 3.0. However, based on their observation of the consistent underestimation of the correlation dimension for the randomly resampled data by an amount of 4–6, they interpreted that the actual dimensions of the streamflow data should be between 7 and 9. Sivakumar et al. (1998, 1999a) investigated the daily rainfall data of different record lengths observed from each of six stations in Singapore using the correlation dimension method and the nonlinear prediction method, and provided convincing evidence regarding the existence of chaos. They also employed the surrogate data method, which indicated the absence of linearity in the rainfall time series. Subsequently, Sivakumar et al. (1999b,c) studied the problem of the influence of the presence of noise (measurement error) on the correlation dimension and prediction accuracy estimates, by proposing a systematic approach for noise reduction, coupling a noise level determination method and a noise reduction method. The outcomes provided additional support regarding the existence of a deterministic component in the rainfall phenomenon and possible reasons for the low prediction accuracy estimates achieved in the earlier study (Sivakumar et al., 1999a).
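Several of the studies reviewed above (e.g. Jayawardena and Lai, 1994; Porporato and Ridolfi, 1996; Sivakumar et al., 1999a) rely on the nonlinear prediction method. As a rough illustration only, the sketch below implements one simple variant, a nearest-neighbor (local averaging) one-step forecast in a delay-embedded space, and applies it to a synthetic Hénon series rather than to hydrological data. The Hénon map, the embedding dimension m = 2, the k = 3 neighbors, the 800/199 split and all function names are assumptions made for this example; none of them is taken from the reviewed studies.

```python
import math
import random


def henon_series(n, a=1.4, b=0.3, discard=100):
    """Scalar Henon series x[t+1] = 1 - a*x[t]**2 + b*x[t-1] (synthetic test data)."""
    x_prev, x = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        if i >= discard:  # drop the initial transient
            out.append(x)
    return out


def predict_one_step(series, n_train, m=2, k=3):
    """One-step forecasts for the part of the series after index n_train.

    Each forecast is the mean of the successors of the k nearest delay
    vectors found in the training part (local averaging).
    """
    library = [([series[t - j] for j in range(m)], series[t + 1])
               for t in range(m - 1, n_train - 1)]
    predicted, observed = [], []
    for t in range(n_train, len(series) - 1):
        query = [series[t - j] for j in range(m)]
        nearest = sorted(library, key=lambda item: math.dist(item[0], query))[:k]
        predicted.append(sum(nxt for _, nxt in nearest) / k)
        observed.append(series[t + 1])
    return predicted, observed


def correlation(u, v):
    """Pearson correlation coefficient between two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))


chaotic = henon_series(1000)
shuffled = chaotic[:]
random.Random(0).shuffle(shuffled)  # same values, temporal structure destroyed

for name, series in (("Henon", chaotic), ("shuffled", shuffled)):
    r = correlation(*predict_one_step(series, n_train=800))
    print(f"{name}: correlation between forecasts and observations = {r:.3f}")
```

The correlation coefficient between forecasts and observations is used here as the accuracy measure; for the deterministic series it should be close to one, while for the shuffled series it should be close to zero. Real rainfall or streamflow records would also carry measurement noise, which this noise-free sketch does not represent.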
3. Issues in the investigation of chaos in hydrology
As the application of the concept of chaos theory to hydrological processes has been gaining momentum lately, so have the questions on the validity of such studies and the reported results. This section addresses the possible bases for such questions. It is not the intent of this section to discuss, in detail, all the issues pertaining to such questions; rather, the intent is to address only those issues that have been recognized or suspected to significantly influence the outcomes. The questions regarding the applicability of chaos theory in hydrology (or any natural phenomenon) may broadly be divided into two categories. The first is concerned with the lack of investigative methods that provide sufficient conditions to identify the existence of chaotic dynamics in hydrological phenomena. The second is concerned with the validity of chaos identification methods for hydrological data
due to practical limitations such as small sample size, insufficient sampling frequency, and presence of noise. What is more important is that all the above issues play major roles when one deals with hydrological data and, therefore, make the problem of chaos identification much more difficult. A brief discussion of the above problems is provided below.
3.1. Chaos identification methods
The science of chaos is a burgeoning field and the available methods to investigate the existence of chaos in a time series are still in their infancy. Though a wide variety of methods are available, such as the correlation dimension method (e.g. Grassberger and Procaccia, 1983a), the Lyapunov exponent method (e.g. Wolf et al., 1985), the Kolmogorov entropy method (e.g. Grassberger and Procaccia, 1983b), the nonlinear prediction method (e.g. Farmer and Sidorowich, 1987; Casdagli, 1989, 1991; Sugihara and May, 1990), and the surrogate data method (e.g. Theiler et al., 1992a,b; Schreiber and Schmitz, 1996), there is no single method that can provide an infallible distinction between a chaotic and a stochastic system. For instance: (1) a finite correlation dimension, usually understood as the principal, if not unique, sign of deterministic chaos, may be observed also for a stochastic process (e.g. Osborne and Provenzale, 1989); (2) an autoregressive (AR) stochastic process can also produce accurate short-term prediction, which is a typical characteristic of a chaotic process; (3) a positive Lyapunov exponent may be observed also for random and ARMA processes (e.g. Jayawardena and Lai, 1994); (4) random noises with power law spectra may provide convergence of the Kolmogorov entropy (e.g. Provenzale et al., 1991); and (5) phase-randomized surrogates can produce spurious identifications of non-random structure (e.g. Rapp et al., 1994). Consequently, a conclusive resolution of whether or not a given finite data set is chaotic is difficult to provide. 
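As a concrete illustration of the correlation dimension method listed above, the following minimal sketch estimates the correlation integral C(r), i.e. the fraction of pairs of delay vectors closer than r, and takes the least-squares slope of log C(r) against log r as the correlation exponent. The Hénon test series, the uniform white noise used for contrast, the maximum norm, the embedding dimension m = 2, the chosen radii and every function name are assumptions of this example. White noise is only the simplest contrast; the sketch does not address the caveat noted in point (1), namely that some stochastic processes can also yield finite dimension estimates.

```python
import bisect
import math
import random


def henon_series(n, a=1.4, b=0.3, discard=100):
    """Scalar Henon series x[t+1] = 1 - a*x[t]**2 + b*x[t-1] (synthetic test data)."""
    x_prev, x = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        if i >= discard:
            out.append(x)
    return out


def rescale(series):
    """Map the series linearly onto [0, 1] so both data sets share the same radii."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]


def pair_distances(series, m, tau=1):
    """Sorted maximum-norm distances between all pairs of m-dimensional delay vectors."""
    n = len(series) - (m - 1) * tau
    vectors = [[series[t + j * tau] for j in range(m)] for t in range(n)]
    dists = []
    for i in range(n - 1):
        vi = vectors[i]
        for j in range(i + 1, n):
            dists.append(max(abs(p - q) for p, q in zip(vi, vectors[j])))
    dists.sort()
    return dists


def correlation_integral(dists, r):
    """C(r): fraction of vector pairs separated by less than r."""
    return bisect.bisect_left(dists, r) / len(dists)


def correlation_exponent(dists, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(dists, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


RADII = [0.01, 0.02, 0.04, 0.08]  # assumed scaling region for this toy example
rng = random.Random(1)
data_sets = {
    "Henon": rescale(henon_series(800)),
    "uniform noise": [rng.random() for _ in range(800)],
}
for name, series in data_sets.items():
    nu = correlation_exponent(pair_distances(series, m=2), RADII)
    print(f"{name}: correlation exponent at m = 2 is about {nu:.2f}")
```

In practice the exponent is examined for a sequence of increasing embedding dimensions and the scaling region is chosen by inspecting the plot; both steps are fixed by hand here to keep the example short.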
On one hand, these problems have motivated improvements on existing methods for the diagnosis of chaos and the proposal of new ones. Popular among these are nonlinear prediction (e.g. Farmer and Sidorowich, 1987; Casdagli, 1989; Sugihara and May, 1990) including deterministic versus stochastic (DVS) diagrams (e.g. Casdagli, 1991), surrogate
data (e.g. Theiler et al., 1992a,b; Schreiber and Schmitz, 1996), and linear and nonlinear redundancies (e.g. Palus, 1995; Prichard and Theiler, 1995). On the other hand, they have highlighted the caution needed in studying natural phenomena. Only the application of diverse techniques, each one in some way complementary to the others, and their critical analysis can enable us to confirm or exclude the existence of chaotic dynamics in a phenomenon (e.g. Porporato and Ridolfi, 1997). Having said the above, it is crucial to note, at this point, that only a few studies employed more than one method to investigate the existence of chaos in hydrological processes and to verify or confirm the results (e.g. Rodriguez-Iturbe et al., 1989; Jayawardena and Lai, 1994; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1999a). Also, all the other studies, except those of Waelbroeck et al. (1994) and Liu et al. (1998), based their conclusions on the correlation dimension method, where the presence of a finite low-correlation dimension was taken as an indicator of chaos (e.g. Hense, 1987; Rodriguez-Iturbe et al., 1989; Sharifi et al., 1990; Islam et al., 1993; Tsonis et al., 1993; Georgakakos et al., 1995; Koutsoyiannis and Pachakis, 1996; Sangoyomi et al., 1996; Sivakumar et al., 1998; Wang and Gan, 1998). The observation, as previously mentioned, that even stochastic processes may yield finite low-correlation dimensions therefore brings an inevitable question on the validity of the reported results. The failure to continue investigations either to provide further support, or confirmation on the existence of chaos, or to try to make short-term predictions at least based on those preliminary results only raises additional concerns.
3.2. Limitations of data
A fundamental limitation of the applicability of chaos theory in hydrology arises from the basic assumptions with which the chaos identification methods are developed, i.e. 
the time series is infinite and noise-free. This is because hydrological data are always finite and inherently contaminated by noise, such as errors arising from measurement. A finite and small data set may probably result in an underestimation of the actual dimension of the process (e.g. Havstad and Ehlers, 1989). The presence of noise may affect the scaling behavior in the correlation dimension
estimate and the prediction accuracy in the nonlinear prediction method (e.g. Schreiber and Kantz, 1996). There are also other issues such as the sampling frequency, delay time and critical embedding dimension. Since the correlation dimension method and the nonlinear prediction method have been widely employed in studies investigating the existence of chaos in hydrological processes, much of the discussion below on the data limitations is restricted to these two methods. However, most of these limitations apply to other methods as well.
3.2.1. Data size and sampling frequency
The problem of data size is believed to be much more serious in the correlation dimension method than in the nonlinear prediction method. The correlation exponent and hence the correlation dimension are computed from the slope of the scaling region in the $\log C(r)$ versus $\log r$ plot. It is always desirable to have a larger scaling region to determine the slope, since the determination of the slope for a smaller scaling region may be difficult and possibly give rise to errors. The infinite length of the data set results in a larger scaling region due to the inclusion of a large number of points (or vectors) on the reconstructed phase–space. However, if the data set were finite and small, there would be only a few points on the reconstructed phase–space, which makes slope determination difficult. Therefore, it may be necessary to have a large data size for dimension estimation. The belief that a large data size would be necessary created a lot of debate on the minimum data size required for the computation of the correlation dimension. Numerous attempts have been and are being made to provide some guidelines on this issue (e.g. Smith, 1988; Havstad and Ehlers, 1989; Nerenberg and Essex, 1990; Ramsey and Yuan, 1990). A brief overview of some of the important suggestions and recommendations is given below. The painful exercise of determining the minimum number of data points ($N_{\min}$) was first tackled by Smith (1988), who concluded that this number was equal to $42^{m}$, where m is the smallest integer above the dimension of the attractor (an attractor is a geometric form that characterizes long-term behavior of a system in the phase–space). Nerenberg and Essex (1990) demonstrated that Smith’s procedure to obtain the $42^{m}$ estimate was flawed and that the data requirements
might not be so extreme. They suggested that the minimum number of points required for the dimension estimate is $N_{\min} \sim 10^{2+0.4m}$. Havstad and Ehlers (1989) used a variant of the nearest neighbor dimension algorithm to compute the dimension of the time series generated from the Mackey–Glass equation (Mackey and Glass, 1977), whose actual dimension is 7.5. Using a data set as small as 200 points, the study resulted in an underestimation of the dimension by about 11%. Ramsey and Yuan (1990) concluded that for small sample sizes, dimension could be estimated with upward bias for chaotic systems and with downward bias for random noise as the embedding dimension is increased. They proved that, due to these bias effects, a correlation dimension estimate of 0.214 could imply an actual correlation dimension value of as high as 1.68. Though none of the studies addressing the issue of data size has been able to provide a clear-cut guideline on the minimum data size for the correlation dimension estimation, what is clear is that a large (if not infinite) data set may be necessary to obtain realistic results. However, a large data size alone does not solve the overall problem, as other factors, such as the sampling frequency, may also be important. This is because the theorems which justify the use of delay (or other) embedding vectors recovered from a scalar measurement as a replacement for the ‘original’ dynamical variables are themselves strictly valid only for data of infinite (or at least very high) resolution. It is important to recognize that large data sets with infinite resolution are generally not available in the field of hydrology (the problem of deriving high-resolution data from low-resolution data has been of much interest in recent times). For instance, regarding data size, let us consider the recommendation by Smith (1988) on the minimum data size, i.e. $N_{\min} = 42^{m}$. This means that, when $m = 4$, if $N_{\min}$ is not at least equal to 3,111,696 no accurate estimate of the dimension can be obtained. In other words, if one is dealing with daily data, then one has to have data collected over a period of about 8500 years. Such a restrictive figure questions all the studies claiming low-dimensional chaos in hydrology (e.g. Hense, 1987; Rodriguez-Iturbe et al., 1989; Sharifi et al., 1990; Islam et al., 1993; Tsonis et al., 1993; Jayawardena and Lai, 1994; Georgakakos et al.,
1995; Koutsoyiannis and Pachakis, 1996; Sangoyomi et al., 1996; Puente and Obregon, 1996; Porporato and Ridolfi, 1996, 1997; Liu et al., 1998; Wang and Gan, 1998; Sivakumar et al., 1998, 1999a,c). In fact, Smith’s results effectively eliminate the possibility of estimating the dimension of any hydrological phenomenon, since no time series contains such a large number of values. A similar kind of problem is faced regarding the sampling frequency, as the rainfall and runoff data generally available are sampled no more frequently than daily, though recent advances in technology, such as high-resolution measurement gages and remote sensing, might solve this problem to a certain extent.
3.2.2. Noise
Noise affects the performance of many techniques of identification, modeling, prediction, and control of deterministic systems. Some of the most characteristic examples of the effects of noise are: (1) self-similarity of the attractor is broken; (2) phase–space reconstruction appears as high-dimensional on small length scales; (3) nearby trajectories diverge diffusively rather than exponentially; and (4) prediction error is found to be bounded from below no matter which prediction method is used and to how many digits the data are recorded (e.g. Kantz and Schreiber, 1997). The severity of the influence of noise on chaos identification and prediction methods depends largely on the level and the nature of noise. In general, when the noise level approaches a few percent, estimates can become quite unreliable (e.g. Schreiber and Kantz, 1996; Kantz and Schreiber, 1997). The presence of noise influences the estimation of the correlation dimension primarily through the identification of the scaling region. Noise may corrupt the scaling behavior at all length scales, but its effects are significant especially at smaller length scales. 
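The effect of noise on the scaling behavior at small length scales can be illustrated with a small numerical experiment. The sketch below compares the local slope of log C(r) between two small radii for a clean synthetic Hénon series and for the same series with additive Gaussian noise. The Hénon map, the noise standard deviation of 0.05 (roughly 7% of the standard deviation of the signal), the two radii, the embedding dimension m = 2 and the function names are all illustrative assumptions, not values taken from any of the studies discussed.

```python
import bisect
import math
import random


def henon_series(n, a=1.4, b=0.3, discard=100):
    """Scalar Henon series x[t+1] = 1 - a*x[t]**2 + b*x[t-1] (synthetic test data)."""
    x_prev, x = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        if i >= discard:
            out.append(x)
    return out


def local_slope(series, r_small, r_large, m=2):
    """Slope of log C(r) between two radii (delay 1, maximum norm)."""
    n = len(series) - (m - 1)
    vectors = [series[t:t + m] for t in range(n)]
    dists = sorted(
        max(abs(p - q) for p, q in zip(vectors[i], vectors[j]))
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    c_small = bisect.bisect_left(dists, r_small) / len(dists)
    c_large = bisect.bisect_left(dists, r_large) / len(dists)
    return math.log(c_large / c_small) / math.log(r_large / r_small)


NOISE_SD = 0.05               # assumed noise level, for illustration only
R_SMALL, R_LARGE = 0.02, 0.08  # radii comparable to the noise level

clean = henon_series(800)
rng = random.Random(2)
noisy = [x + rng.gauss(0.0, NOISE_SD) for x in clean]

print(f"clean series: local slope = {local_slope(clean, R_SMALL, R_LARGE):.2f}")
print(f"noisy series: local slope = {local_slope(noisy, R_SMALL, R_LARGE):.2f}")
```

With these settings the slope for the noisy series should move from the value of the clean attractor toward the embedding dimension, because at radii comparable to the noise level the points fill the reconstructed space instead of staying on the attractor.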
If the data are noisy, then below a length scale of a few multiples of the noise level, the data points are not confined to the fractal structure but smeared out over the whole available phase–space. Thus, the local scaling exponents may increase. It has been observed that even small levels of noise significantly complicate estimates of dimension, a quantity that in principle should be straightforward to measure (e.g. Schreiber and Kantz, 1996). Noise is one of the most prominent limiting factors
for the predictability of deterministic systems. Noise limits the accuracy of predictions in three possible ways: (1) the prediction error cannot be smaller than the noise level, since the noise part of the future measurement cannot be predicted; (2) the values on which the predictions are based are themselves noisy, inducing an error proportional to and of the order of the noise level; and (3) in the generic case, where the dynamical evolution has to be estimated from the data, this estimate will be affected by noise (Schreiber and Kantz, 1996). In the presence of the above three effects, the prediction error will increase faster than linearly with the noise level. The sensitivity of the correlation dimension (or any other invariant) and the prediction accuracy to the presence of noise is the price one has to pay for using these to identify chaos. The definitions of these involve the limit of small length scales because it is only then that the quantity becomes independent of the details of the measurement technique, the data processing and the phase–space reconstruction method. The permissible noise level for a practical application of these methods depends, in a complicated way, on the details of the underlying system and the measurement. The foregoing discussion clearly indicates that noise present in hydrological data cannot be ignored if the analysis is to remain realistic. The important first step is to be aware of the problem and to recognize its effects on the data analysis techniques by estimating the level and the nature of noise. If it is found that the level of noise is only moderate, and there are hints that there is a strong deterministic component in the signal, then one can attempt the second step of separating the deterministic signal from the noise. However, none of the studies investigating chaos in hydrology, except a very recent study by Sivakumar et al. 
(1999c), attempted to determine the level of noise in the data and, therefore, it is very difficult to comprehend the possible effects of noise on the reported results. Although a wide variety of nonlinear noise reduction methods have been made available in the literature over the past decade (e.g. Schreiber and Grassberger, 1991; Schreiber, 1993; Grassberger et al., 1993), their applicability to hydrological data has been tested only recently (Porporato and Ridolfi, 1997; Sivakumar et al., 1999c). The failure of the majority of the studies to address the
problem of noise and its possible effects on chaos identification in hydrological data forms another side of the criticism of the validity of such studies. 3.2.3. Delay time An appropriate delay time, t , for the reconstruction of the phase–space necessary because an optimum selection of t gives best separation of neighboring trajectories within the minimum embedding phase– space (e.g. Frison, 1994). If t is too small, then there is little new information contained in each subsequent datum and this may result in an underestimation of the correlation dimension (e.g. Havstad and Ehlers, 1989). On the contrary, if t is too large, and the dynamics are chaotic, all relevant information for phase–space reconstruction is lost since neighboring trajectories diverge, and averaging in time and/or space is no longer useful (e.g. Sangoyomi et al., 1996). This may result in an overestimation of the correlation dimension (e.g. Havstad and Ehlers, 1989). Many researchers have addressed the problem of the selection of an appropriate delay time and proposed various methods. Well known among these are the autocorrelation function method (e.g. Holzfuss and Mayer-Kress, 1986; Schuster, 1988; Tsonis and Elsner, 1988), the mutual information method (e.g. Frazer and Swinney, 1986) and the correlation integral method (e.g. Liebert and Schuster, 1989). The autocorrelation function method is the most commonly used one due to its computational ease. Holzfuss and Mayer-Kress (1986) suggested using a value of delay time at which the autocorrelation function first crosses the zero line. Other approaches consider the lag time at which the autocorrelation function attains a certain value, say 0.1 (Tsonis and Elsner, 1988), 0.5 (Schuster, 1988). According to Frazer and Swinney (1986), however, the autocorrelation function method measures the linear dependence between successive points and, thus, may not be appropriate for nonlinear dynamics. 
Frazer and Swinney (1986) suggested the use of the local minimum of the mutual information, which measures the general dependence between successive points. They reasoned that if $\tau$ is chosen to coincide with the first minimum of the mutual information, then the recovered state vector would consist of components that possess minimal mutual information between them. The mutual
information method is a more comprehensive method of determining proper delay time values (e.g. Tsonis, 1992). However, the method has the disadvantage of requiring a large number of data, unless the dimension is small, and is computationally cumbersome. A somewhat similar approach, which does not demand as much data as the mutual information method, was proposed by Liebert and Schuster (1989). According to this approach, the first minimum of the logarithm of the generalized correlation integral provides a proper choice of the delay time. For some attractors, it really does not matter whether the autocorrelation function or the mutual information or the correlation integral is used. For example, when applied to the Rossler system (Rossler, 1976), all approaches provided a value of $\tau$ approximately equal to one-fourth of the mean orbital period (Tsonis, 1992). However, for some other attractors, the estimation of $\tau$ might depend strongly on the approach employed. Evidently, none of the aforementioned rules has emerged as the definitive rule for choosing $\tau$, but the mutual information approach appears to have the edge. In the absence of clear-cut guidelines, a practical approach is to experiment with different $\tau$ to ascertain its effect on the correlation dimension (e.g. Rodriguez-Iturbe et al., 1989; Tsonis, 1992; Tsonis et al., 1993). The problem of the selection of an appropriate $\tau$ has been addressed by some of the studies investigating the existence of chaos in hydrological data (e.g. Tsonis et al., 1993; Jayawardena and Lai, 1994; Sangoyomi et al., 1996; Sivakumar et al., 1999a). However, on one hand, most of these studies, except that of Sangoyomi et al. (1996), have employed only the autocorrelation function method to determine the appropriate $\tau$ and, therefore, there is no way to comprehend whether the delay times used are in fact appropriate.
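As a rough illustration of the mutual information criterion discussed above, the following sketch estimates the mutual information with a simple two-dimensional histogram and returns the first lag at which it has a local minimum. The histogram estimator, the number of bins, the function names and the noisy sine test signal are all assumptions made for this example; the cited authors use more refined estimators.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of the mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def first_minimum_delay(x, max_lag, bins=16):
    """First lag whose mutual information is below both neighbouring lags.

    Returns (lag, curve), where curve[k - 1] is the estimate at lag k.
    lag is None when no local minimum is found within max_lag.
    """
    x = np.asarray(x, dtype=float)
    mi = np.array([mutual_information(x[:len(x) - k], x[k:], bins)
                   for k in range(1, max_lag + 1)])
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] < mi[i + 1]:
            return i + 1, mi
    return None, mi

# Hypothetical test signal: a slightly noisy sine wave.
rng = np.random.default_rng(0)
t = np.arange(3000)
signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=t.size)
tau_mi, mi_curve = first_minimum_delay(signal, max_lag=40)
```

Because the histogram estimate is itself noisy for short records, the first local minimum may be spurious; this is one practical face of the large data requirement noted in the text.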
On the other hand, even when employing the autocorrelation function method, not many studies could ascertain the effect of $\tau$ on the correlation dimension estimate, as they failed to carry out the analysis with different $\tau$ values.

3.2.4. Other problems

The problems of data size, data sampling frequency, delay time, critical embedding dimension, and presence of noise are encountered in almost every field of natural and physical phenomena, including
hydrology, and this is why they have received considerable attention. However, there could also be other problems, as serious as the above, that might not have received the necessary attention because of their association with a particular field. One such problem that is commonly encountered in the field of hydrology is the presence of a large number of zeros in the measurements. One possible influence of this problem is that in the presence of a large number of zeros (or any other single value), the reconstructed hyper-surface in phase–space will tend to a point and may result in an underestimation of the correlation dimension (e.g. Tsonis et al., 1993). In fact, some of the criticism of studies reporting low-dimensional chaos in hydrological data, particularly low-resolution data such as daily data, revolves around the problem of the presence of a large number of zeros. The various issues discussed above on the inability of the investigative methods to provide sufficient conditions to identify chaos, and the inherent limitations of the hydrological data, clearly indicate the potential difficulties and uncertainties in chaos identification in hydrological processes. The failure of most of the past studies to address, in detail, all the pertinent and important issues only raises further concern about the basis for the application of chaos theory in hydrology and the validity of the reported results.
4. Reported results and interpretations

It is clear, from the foregoing discussion, that there cannot be any second opinion on the inherent problems in the application of chaos theory in hydrology and the associated uncertainties in the outcomes. However, whether such problems are serious enough to warrant criticism of the use of chaos theory in hydrology, and of the evidence provided by past studies on the existence of chaos in hydrological processes, is a question that needs to be addressed. In this regard, the emphasis should not be on dwelling on the limitations of the methods and the data, but rather on trying to provide possible interpretations of the results obtained, keeping in mind the limitations. Therefore, this section is dedicated to providing possible
interpretations of the results reported by the past studies, most of which have also to some extent addressed, and even taken care of, the limitations. 4.1. Use of diverse techniques As discussed previously, each of the available chaos identification methods has its own limitations and, therefore, it is absolutely not possible to provide irrefutable proof regarding the existence of chaotic dynamics in a phenomenon. Having this in mind, when looking for the possible existence of a deterministic component in a phenomenon, the goal must be to try to acquire clues that allow us not to exclude its existence rather than to ensure its existence. One possible way to achieve this is to employ diverse techniques, in order to verify whether the results from each one of them are complementary to the others. Unfortunately, this has not often been the case as far as studies investigating the existence of chaos in hydrological processes are concerned. Most of the studies reported existence of chaos based on the finite low-dimensions achieved using the correlation dimension method and, as a result, paved the way for criticisms of such studies since finite correlation dimensions may be observed also for linear stochastic processes (e.g. Osborne and Provenzale, 1989). Though there cannot be any argument on the possibility of linear stochastic processes providing finite correlation dimensions, a pertinent question is whether this alone is sufficient enough to form a strong basis for interpreting the low correlation dimensions (resulting from stochastic processes) reported by past studies investigating chaos in hydrological processes. The importance of this question lies in the fact that not many (artificial) stochastic systems have been identified to yield finite and low correlation dimensions, whereas low dimensions have been observed for every (artificial) chaotic system, e.g. 
Lorenz system (Lorenz, 1963), Henon map (Henon, 1976), Rossler system (Rossler, 1976), Mackey– Glass delay differential equation (Mackey and Glass, 1977) and the Ikeda map (Ikeda, 1979). An implication of this is that, though it may not be possible to conclude based on the correlation dimension results reported by past studies, whether or not chaos exists in hydrological processes, such an existence cannot be excluded altogether. This point has been reflected in
almost all the studies investigating chaos in hydrological processes as they either recommended or employed other methods to verify and confirm the results. Tsonis et al. (1993), for example, established that due to the weaknesses of the existing algorithms, such as the Grassberger–Procaccia algorithm, the results from the correlation dimension method could be considered to present just evidence rather than proof of existence of chaos. They recommended that evidence for chaos should be fortified by additional evidence using other methods, such as Lyapunov exponent and nonlinear prediction. Though only a few studies (e.g. Rodriguez-Iturbe et al., 1989; Jayawardena and Lai, 1994; Puente and Obregon, 1996; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1999a) have employed one or more methods in addition to the correlation dimension method, it is important to note that the outcomes clearly provided additional evidence to those achieved using the correlation dimension method, regarding the existence of chaos in hydrological processes. Among the methods used, the nonlinear prediction method was found to be very promising (e.g. Jayawardena and Lai, 1994; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1999a), although the method yielded only low-prediction accuracy in some of the studies (e.g. Jayawardena and Lai, 1994; Sivakumar et al., 1999a). (This could possibly be due to the presence of noise in the data, details of which will be discussed below.) The advantages of this method are: (1) the existence of chaos can be identified by comparing the prediction accuracy against the number of neighbors (e.g. Casdagli, 1991), the embedding dimension (e.g. Casdagli, 1989), and the lead time (e.g. Sugihara and May, 1990); and (2) it does not require a large data size and can provide reasonably good results even when the data size is small. 
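The idea of checking prediction accuracy against the number of neighbors, the embedding dimension and the lead time can be sketched as follows. This is a simplified local-averaging forecast on artificial Henon data, not the implementation used in any of the cited studies; the function names, the train/test split and the parameter values are assumptions for illustration.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, discard=500):
    """x-component of the Henon map after discarding a transient."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def embed(x, m, tau=1):
    """Delay vectors (x[i], x[i + tau], ..., x[i + (m - 1) * tau])."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau: j * tau + n] for j in range(m)])

def knn_forecast_skill(x, m, k, lead=1, tau=1, train_frac=0.5):
    """Correlation coefficient between observed values and forecasts made
    by averaging the futures of the k nearest training vectors."""
    vecs = embed(x, m, tau)
    last = (m - 1) * tau                 # offset of the newest coordinate
    usable = len(vecs) - lead            # vectors with a known future value
    vecs = vecs[:usable]
    targets = x[last + lead: last + lead + usable]
    split = int(train_frac * usable)
    preds = np.empty(usable - split)
    for j, v in enumerate(vecs[split:]):
        d = np.linalg.norm(vecs[:split] - v, axis=1)
        nearest = np.argpartition(d, k)[:k]
        preds[j] = targets[nearest].mean()
    return float(np.corrcoef(preds, targets[split:])[0, 1])

x = henon_series(1500)
skill_lead1 = knn_forecast_skill(x, m=2, k=5, lead=1)
skill_lead10 = knn_forecast_skill(x, m=2, k=5, lead=10)
```

For a chaotic series the skill is expected to be high at short lead times and to fall off as the lead time grows; repeating the call over a range of `k` and `m` gives the other two diagnostics mentioned in the text.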
The studies of Jayawardena and Lai (1994) and Porporato and Ridolfi (1996, 1997) checked the prediction accuracy against the lead time and embedding dimension, whereas all the above three were used by Sivakumar et al. (1999a) in their investigation of chaos in the daily rainfall data observed in Singapore. All the above studies provided convincing evidence regarding the existence of chaos in rainfall and streamflow. On the contrary, since finite correlation dimensions may be observed even for linear stochastic processes,
it is necessary to confirm the absence of linearity in the data to verify the results achieved using the above methods. One possible approach to achieve this is to reject a null hypothesis that the data could be the outcome of a linear stochastic process. Such an approach, popularly known as the surrogate data method (e.g. Theiler, 1992a,b), makes use of the substitute data generated in accordance to the probabilistic structure underlying the original data. This means that the surrogate data possess some of the properties specified in a null hypothesis. The rejection of the null hypothesis can be made based on some discriminating statistics, such as the correlation dimension. If the discriminating statistics obtained for the surrogate data are significantly different from those of the original time series, then the null hypothesis can be rejected and the original time series may be considered to have come from a nonlinear process. However, if the discriminating statistics obtained for the original data and the surrogate data are not significantly different, then the null hypothesis cannot be rejected and the original time series is considered to have come from a linear stochastic process. To the author’s knowledge, only two studies employed the concept of surrogate data in order to confirm the absence of linear stochasticism in hydrological data (Koutsoyiannis and Pachakis, 1996; Sivakumar et al., 1999a). Koutsoyiannis and Pachakis (1996) generated a synthetic time series using a stochastic model capable of preserving important properties of the rainfall process, such as intermittency, seasonality and scaling behavior. Based on the correlation dimensions obtained for both the original and the synthetic data, they concluded that a synthetic rainfall series might be practically indistinguishable from a historic time series even if one used tools of the chaotic dynamics theory to characterize the rainfall time series. 
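The surrogate data test described above can be sketched in a few lines. The sketch below uses phase-randomized surrogates, a standard way to generate data consistent with a linear stochastic null hypothesis, and a simple nearest-neighbor prediction error as the discriminating statistic. Both choices are substitutions made for brevity: the studies discussed here used the correlation dimension as the statistic and, for rainfall, surrogates that also preserve the number of zeros. All function names and parameter values are hypothetical.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, discard=500):
    """x-component of the Henon map after discarding a transient."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def phase_randomized_surrogate(x, rng):
    """Series with the amplitude spectrum of x but random Fourier phases."""
    n = len(x)
    spectrum = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = 0.0                      # zero-frequency term stays real
    if n % 2 == 0:
        phases[-1] = 0.0                 # Nyquist term stays real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n) + x.mean()

def nn_prediction_error(x, m=2):
    """RMS error of one-step forecasts that copy the successor of the
    nearest neighbour found in the first half of the record."""
    n = len(x) - m
    vecs = np.column_stack([x[j: j + n] for j in range(m)])
    future = x[m: m + n]
    half = n // 2
    errors = np.empty(n - half)
    for j in range(half, n):
        d = np.linalg.norm(vecs[:half] - vecs[j], axis=1)
        errors[j - half] = future[np.argmin(d)] - future[j]
    return float(np.sqrt(np.mean(errors ** 2)))

def surrogate_significance(x, n_surrogates=19, seed=0):
    """Distance of the data statistic from the surrogate mean, in units of
    the surrogate standard deviation."""
    rng = np.random.default_rng(seed)
    q_data = nn_prediction_error(x)
    q_surr = np.array([nn_prediction_error(phase_randomized_surrogate(x, rng))
                       for _ in range(n_surrogates)])
    return abs(q_data - q_surr.mean()) / q_surr.std(ddof=1)

x = henon_series(1000)
significance = surrogate_significance(x)
```

A large significance value means the statistic of the original series lies far outside the surrogate distribution, so the linear stochastic null hypothesis can be rejected; a value of order one means it cannot.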
Their study, though rejecting the possibility of chaos in hydrological processes, drew its conclusion only based on the results obtained from the correlation dimension method and, therefore, is similar to most of the other studies, as it failed to verify the results using other methods. Recently, Sivakumar et al. (1999a), in their investigation of the daily rainfall data in Singapore, generated surrogate data sets preserving the major probabilistic characteristics of the original data. The (number of) zero and non-zero
values in the rainfall data were modeled using a Bernoulli random variable. The high significance values of the statistic (correlation dimension) indicated that the null hypothesis (i.e. the data arose from a linear stochastic process) could be rejected and hence the original (rainfall) data were possibly derived from a nonlinear process (for further details, see Sivakumar et al., 1999a). The results provided additional evidence, to those obtained using the correlation dimension method and the nonlinear prediction method, regarding the existence of chaos in the rainfall data. Also, the observation of no saturation of the correlation exponent for the surrogate data sets, having almost the same number of zeros and non-zeros as the original data, reveals that the low correlation dimensions obtained for the original rainfall data are not due to the presence of a large number of zeros, an issue raised above in Section 3.2.4.

4.2. Is a large data size necessary?

One reason for the general belief that a large data size is required for the correlation dimension estimate is the assumption that the data size is a function of the embedding dimension used to obtain the vectors by phase–space reconstruction (Smith, 1988; Nerenberg and Essex, 1990). However, this is not entirely true, since the data size required may depend largely on the dynamics of the phenomenon. In practice, for a particular data size, the number of reconstructed vectors may not differ much whether an embedding dimension of, for example, 4 or 10 is used. For example, according to Nerenberg and Essex (1990), for a four and ten-dimensional embedding phase–space the number of points required is, respectively, about 4000 ($10^{2+0.4 \times 4}$) and 1,000,000 ($10^{2+0.4 \times 10}$). However, a fairly accurate estimation of the correlation dimension can be obtained with as low as 5000 points even for an embedding dimension of 10, as a large scaling region is evident in the correlation dimension plots shown in Fig. 1 for an artificial (noise-free) chaotic (Henon) data. It seems, in this case, that accurate estimation of the correlation dimension may be obtained for even higher embedding dimensions. These observations suggest that the data size may not be a function of the embedding dimension. The argument that the data size required for the correlation dimension estimate may not be a function
Fig. 1. Local slopes versus log r for Henon data.
of the embedding dimension can also be explained as follows. Assuming that we have a time series (Henon data) of dimension $d = 1.22$, in all embedding dimensions $m < 1.22$ the object is space filling. Thus, for $m < 2$, $d = m$, while for $m \geq 2$, $d = 1.22$. Thus, the first deviation of the correlation exponent from the diagonal (i.e. $d = 1.22$ starting at $m = 2$ and remaining constant for higher values of $m$), when plotted against the embedding dimension, should provide an estimate of the correlation dimension (Fig. 2(a)). This, however, may not be what is usually observed when data from measurements or from known dynamical systems are analyzed. For such systems, such as rainfall, the correlation exponent may deviate from the diagonal for values of $m < 2$ (e.g. $m = 1$) and gradually increase with an increase in embedding dimension up to a certain value (e.g. $m = 10$), which is higher than the minimum dimension required ($m = 2$) to embed the attractor (Fig. 2(b)). Surely, in such cases, the first deviation from the diagonal does not correspond to the dimension of the underlying attractor, for which one needs to go to higher embedding dimensions. This is the case in most of the studies employing the correlation dimension method for hydrological data (e.g. Tsonis et al., 1993; Islam et al., 1993; Jayawardena and Lai, 1994; Sangoyomi et al., 1996; Koutsoyiannis and Pachakis, 1996; Porporato and Ridolfi, 1996; Wang and Gan, 1998; Sivakumar et al., 1998, 1999a). Also, this is the reason for the
proposal of minimum and sufficient dimensions of the embedding phase–space (or number of variables) to model the dynamics of the system (e.g. Fraedrich, 1986), rather than $2d + 1$ dimensions (e.g. Takens, 1981) or $d + 1$ dimensions (e.g. Abarbanel et al., 1990). The above observations imply that the minimum data size required for the correlation dimension estimation may largely depend on the type and dimension of the attractor, rather than the embedding dimension. The calculations and derivations, presented thus far, relating the data size and the embedding dimension may be valid only for $m < d$. The need for data size may increase when $m > d$, but at a much slower rate. Therefore, in cases where saturation is observed for $m > d$, one may need $N_{\min} \sim f(d)$ points rather than $N_{\min} \sim f(m)$ points (provided $m$ is not much larger than $d$). However, this conclusion may not be valid for every dynamical system or data set. According to Lorenz (1991), different variables could yield different estimates of correlation dimension and suitably selected variables could sometimes yield a fairly good estimate even if the number of points were not large. Such an interpretation was also supported by Zeng and Pielke (1993), who reported that the apparent reason for finding low-dimensional atmospheric attractors was that this might reflect the weak nonlinear interaction between the analyzed variable and the other variables in the
Fig. 2. Relationship between correlation exponent and embedding dimension for: (a) Henon data; (b) Singapore rainfall data.
atmosphere. Islam et al. (1993) offered an explanation that if the single variable time series chosen for analysis depended on physical constraints and thresholds, then its correlation dimension would be significantly less than that of the underlying dynamical system. They argued that this could be the reason, why variables like pressure and vertical wind velocity yielded high correlation dimension values, while derived variables, like sunshine duration and rainfall, resulted in low dimension estimates. While attempting to compare the behavior of rainfall with that of the vertical wind velocity, they reported correlation dimension values of about 1.5 for the rainfall data,
and an infinite dimension for the vertical wind velocity data. Their studies suggest that a low number of variables, resulting from a low correlation dimension, may capture the important dynamical aspects of the analyzed time series rather than the entire underlying dynamical system. The results from the studies investigating chaos in hydrological processes seem to indicate that such processes may be strongly coupled with only a few dominant variables of the underlying systems. Therefore, it is important to try to identify those strongly coupled variables than to worry about data requirements. In the absence of clear-cut guidelines, one reasonable

Table 1
Results of correlation dimension analysis: daily rainfall data in Singapore with different data sizes

Record length   Number of     Time delay   Correlation    Minimum embedding   Sufficient embedding
(year)          data points   (day)        dimension      dimension           dimension
30              10 958        10           1.01 ± 0.02    2                   12
20              7305          10           1.03 ± 0.03    2                   16
10              3653          12           1.03 ± 0.03    2                   15
5               1826          8            1.03 ± 0.03    2                   16
4               1461          8            1.01 ± 0.03    2                   16
3               1096          7            1.01 ± 0.04    2                   15
2               731           8            0.91 ± 0.08    1                   14
1               365           5            0.87 ± 0.06    1                   16

way to determine the minimum data size is to compute the correlation dimensions for different sample sizes until significant changes are observed below a certain sample size (e.g. Rodriguez-Iturbe et al., 1989; Lorenz, 1991; Tsonis, 1992; Tsonis et al., 1993). Such an approach has been employed by Sivakumar et al. (1998, 1999a) in their analysis of the daily rainfall data observed in Singapore, where the correlation dimensions were estimated for data of 30, 20, 10, 5, 4, 3, 2, and 1 years from each of six stations in Singapore. The dimension results achieved for data from one of the stations (Station 05) are presented in Table 1. The results indicate that, in general, significant variations in the dimension of rainfall data seem to occur when the rainfall record length is less than 4 years (equivalent to 1461 points), suggesting that the minimum number of data points essential to reasonably represent the dynamics of the daily rainfall process in Singapore might be taken to be about 1500. Although this does not provide a general guideline on the minimum data size, as it depends on the properties of the attractor, the results indicate that such an analysis would be very useful for determining an approximate data size for the computation of the correlation dimension. The minimum data size estimated (about 1500 or equivalent to 4 years) by Sivakumar et al. (1998, 1999a) for the computation of the correlation dimension of the daily rainfall data in Singapore seems to be reasonable since there is no significant variation in rainfall observed in Singapore and, therefore, a record length of about 4 years is sufficient to reasonably represent the dynamics of the daily rainfall process. Having this in mind, it can be suggested that the data size cannot be the reason for the low
correlation dimension achieved for the Singapore rainfall data, as record lengths much longer than 4 years (i.e. 30 years) yielded almost the same dimensions. Given that almost all the studies investigating the existence of chaos in hydrological processes used at least a few thousand points (e.g. Rodriguez-Iturbe et al., 1989; Sharifi et al., 1990; Jayawardena and Lai, 1994; Sangoyomi et al., 1996; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1998, 1999a), the low correlation dimensions achieved might not be due to the data size used in the analysis but could actually be representations of the true dimensions of the processes investigated. Therefore, criticisms that the results of low correlation dimensions achieved for hydrological processes are due to the small data size used (e.g. Ghilardi and Rosso, 1990) may not always be correct.

4.3. Noise has more influence on prediction than dimension estimation

As mentioned previously, noise affects the performance of many techniques of identification and prediction of chaotic deterministic systems. The influence of noise on the outcomes of studies investigating chaos in hydrological processes can be readily explained from the correlation dimension and prediction accuracy results reported by the studies. The observations of: (1) small scaling region in the correlation dimension plots (e.g. Sangoyomi et al., 1996; Porporato and Ridolfi, 1996, 1997); and (2) low prediction accuracy even for short lead times (e.g. Jayawardena and Lai, 1994; Sivakumar et al., 1999a) indicate only less than perfect characteristics

Table 2
Comparison of results for noise-free and noise added Henon data: correlation dimension and prediction accuracy

                                 Prediction accuracy
Noise level   Correlation   Correlation   Number of   Optimal embedding   Maximum
(%)           dimension     coefficient   neighbors   dimension           lead time
0             1.22          0.97          20          4                   7
4             1.26          0.86          40          4                   4
8             1.31          0.72          70          4                   3
16            1.34          0.55          160         4                   1

of chaos. Though such problems have already been identified (e.g. Jayawardena and Lai, 1994), most of the studies failed to either investigate the extent of its influence or to reduce it. One possible reason for this is that dealing with the problem of noise present in hydrological data is not a straightforward task due to the lack of prior information on: (1) the level and the nature of noise; and (2) the noise-free signal and the dynamics of the system. As a result: (1) it is difficult to determine the extent of the influence of noise on data analysis; and (2) the appropriate noise reduction method and the extent of improvement that can be achieved after noise reduction are difficult to determine. To the author’s knowledge, the tremendous task of reducing the noise present in a hydrological time series was first attempted by Porporato and Ridolfi (1997), who employed a simple noise reduction method developed by Schreiber and Grassberger (1991) to the (chaotic) flow series of the river Dora Baltea (Porporato and Ridolfi, 1996, 1997). In their study, a local averaging procedure was applied iteratively until the mean absolute corrections between successive iterations became insignificant. The procedure was stopped after 200 iterations since it was noted that above 200 iterations an unjustifiable calculation time was necessary to produce significant corrections. The improvements achieved in the estimates of correlation dimension and prediction accuracy for the noise-reduced river flow series are indeed encouraging. However, Sivakumar et al. (1999b,c) identified some of the potential problems in the noise reduction method of Schreiber and Grassberger (1991) applied by Porporato and Ridolfi (1997), or any other method for that matter, to hydrological time series. They
stressed the importance of the determination of the level of noise present in the hydrological time series, because of the problems faced in the selection of the optimal values of the parameters involved in the method, such as the size of the neighborhood, and the number of iterations of the procedure required to achieve optimal noise reduction. Subsequently, they also demonstrated that, in the absence of prior knowledge on the level of noise in the time series, the application of the noise reduction method could have serious consequences, as the deterministic component that drives the dynamics of the system might also be removed. To overcome such problems, they proposed a systematic noise reduction approach, by coupling a noise level determination method (Schouten et al., 1994) and a noise reduction method (Schreiber, 1993). The approach was demonstrated first on an artificial chaotic (Henon) time series contaminated with different levels (4, 8, and 16%) of additive, uniformly distributed noise, and then tested on a hydrological time series, the daily rainfall data observed in Singapore. The prediction accuracy was considered as the main diagnostic tool to verify the success of the noise reduction, since the prediction accuracy can be determined without any knowledge of the noise-free signal or the underlying dynamics of the system and is also sensitive to under- or over-removal of noise. The correlation dimension was used as a supplementary tool. An important feature of this approach is that the noise reduction results themselves may provide some guidelines on the most probable level of noise present in the data. In the following paragraphs, some of the important results achieved by Sivakumar et al. (1999c) are highlighted.
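The kind of experiment described above (an artificial Henon series with additive, uniformly distributed noise, examined through the correlation dimension) can be sketched as follows. This is a bare-bones Grassberger–Procaccia style estimate and is not expected to reproduce the tabulated values. The definition of the noise level as the ratio of noise to signal standard deviation, the fixed range of radii and all function names are assumptions made for this example.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, discard=500):
    """x-component of the Henon map after discarding a transient."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def add_uniform_noise(x, level_percent, rng):
    """Add zero-mean uniform noise with std equal to level_percent % of the
    signal std (an assumed definition of the noise level)."""
    sigma = level_percent / 100.0 * x.std()
    half_width = np.sqrt(3.0) * sigma    # uniform on [-w, w] has std w / sqrt(3)
    return x + rng.uniform(-half_width, half_width, len(x))

def correlation_integral(x, m, radii, tau=1):
    """Fraction of pairs of delay vectors closer than each radius (max norm)."""
    n = len(x) - (m - 1) * tau
    vecs = np.column_stack([x[j * tau: j * tau + n] for j in range(m)])
    dist = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2)
    pair_d = dist[np.triu_indices(n, k=1)]
    return np.array([(pair_d < r).mean() for r in radii])

def correlation_dimension(x, m, r_min=0.01, r_max=0.2, n_radii=12):
    """Slope of log C(r) against log r over a fixed range of radii."""
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_radii)
    c = correlation_integral(x, m, radii)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return float(slope)

rng = np.random.default_rng(0)
clean = henon_series(1500)
noisy = add_uniform_noise(clean, 16.0, rng)
d_clean = correlation_dimension(clean, m=2)
d_noisy = correlation_dimension(noisy, m=2)
```

In practice the scaling region should be chosen by inspecting the local slopes rather than fixed in advance, and the estimate should be repeated for increasing embedding dimensions to check for saturation.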

Table 3
Noise reduction results for Henon data: correlation dimension and prediction accuracy

              Correlation dimension        Correlation coefficient
Noise level   Original   Noise-reduced     Original   Noise-reduced
(%)
0             1.22       –                 0.97       –
4             1.26       1.23              0.86       0.93
8             1.31       1.23              0.72       0.88
16            1.34       1.26              0.55       0.79

A summary of the correlation dimension and prediction accuracy estimates obtained for the noise-free and different levels of noisy Henon data is shown in Table 2, whereas Table 3 presents a comparison of the correlation dimension and prediction accuracy results obtained for noisy and noise-reduced data. The results shown for the noise-reduced data, in Table 3, are those achieved at the optimal level of noise reduction obtained through the systematic noise reduction approach (see Sivakumar et al. (1999c) for details). The results indicate that, in the presence of noise: (1) an overestimation of the correlation dimension occurs, and the dimension increases as the noise level increases; (2) the prediction accuracy decreases with an increase in the noise level; (3) when the noise level increases, a relatively large number of neighbors (indicating stochastic modeling) is required to obtain the best predictions; (4) the prediction accuracy decreases when the embedding dimension is increased beyond the optimal embedding dimension; and (5) the lead time for which good predictions are possible decreases with an increase in the noise level. An important observation from the results is that while the presence of even small levels of noise significantly influences the prediction accuracy estimates, the correlation dimension estimates do not seem to be significantly influenced even when the noise levels are high. This suggests that: (1) the correlation dimension method may be used as a preliminary approach for the investigation of the existence of chaos in hydrological data, before attempting detailed analysis such as application of noise reduction procedures; and (2) the nonlinear prediction method may not provide accurate results when applied to hydrological data, unless the noise is reduced. The results achieved for the noise-reduced data indicate that: (1) the correlation dimension estimates achieved for the noise-reduced data are very close to those of the noise-free data; and (2) in general, significant improvement in the prediction accuracy estimates is achieved after noise reduction. These observations suggest that the application of a noise reduction procedure is always desirable before employing any of the chaos identification methods, but may be necessary if the nonlinear prediction method is employed. Table 4 presents a summary of the correlation dimension and prediction accuracy results achieved for the original and noise-reduced Singapore rainfall data. The ranges of the most probable levels of noise

Table 4
Noise reduction results for Singapore rainfall data: correlation dimension, prediction accuracy, and most probable noise level

              Correlation dimension          Correlation coefficient      Most probable
Station no.   Original      Noise-reduced    Original   Noise-reduced     noise level (%)
05            1.02 ± 0.02   0.97 ± 0.01      0.291      0.484             6.9–11.5
07            1.03 ± 0.03   0.96 ± 0.01      0.248      0.431             8.3–13.8
22            1.06 ± 0.03   1.02 ± 0.03      0.262      0.410             8.0–13.3
23            1.03 ± 0.02   1.01 ± 0.01      0.421      0.587             7.0–10.5
31            1.02 ± 0.02   0.95 ± 0.02      0.326      0.478             9.2–15.3
43            1.03 ± 0.02   0.97 ± 0.01      0.288      0.471             9.6–14.4

in the rainfall data observed in the six stations are also presented in Table 4. The results achieved for the original and noise-reduced rainfall data indicate that: (1) the correlation dimension estimates for the rainfall data are not significantly affected by noise reduction; and (2) significant improvements are achieved in the prediction accuracy estimates after noise reduction. These results could possibly have the following implications: (1) the influence of noise on the correlation dimension estimate is not significant; and (2) the presence of noise has significant effects on the prediction results. It may be intrinsically impossible to provide strong proof for these results, since the noise-free rainfall data is not available, but the emphasis is that such implications cannot be excluded altogether. All these observations suggest that the application of a noise reduction procedure is always desirable before employing any of the chaos identification methods, but may be necessary if the nonlinear prediction method is employed. The most probable noise levels estimated for the rainfall data observed in the six stations in Singapore are in the range between 6.9 and 15.3%. The magnitudes of the estimated noise levels seem to be quite reasonable considering the fact that the rainfall measurement is influenced by a large number of factors, such as wind, wetting, evaporation, gage exposure, instrumentation, and human error in reading rainfall data. The data used in the study is also influenced by a certain imprecision due to the round-off errors resulting from the conversion of the hourly data to daily data. The noise level range achieved in the present study, using the systematic noise reduction approach proposed, is in good agreement with the one observed by other means (e.g. Sevruk, 1996). 
The presence of noise in the rainfall data in the order of the above magnitudes could significantly influence the outcomes of the chaos identification methods, in particular nonlinear prediction, as is evident from the results obtained for the Henon data, particularly with noise levels of 8 and 16%. This could be the main reason for the low prediction accuracy achieved for the (noisy) original rainfall data (Table 4, see also Sivakumar et al. (1999a)). The significant improvement in the prediction results achieved for the noise-reduced rainfall data (Table 4) provides additional support to the above. However, the failure to achieve accurate predictions could be attributed to the following: (1) the noise reduction method (of Schreiber (1993)) might not be so effective when the noise levels are high; (2) the noise levels estimated might only be the most probable noise levels and not the exact ones; and (3) the dynamical noise might also have some influence on prediction, but unfortunately could not be studied using the noise reduction method of Schreiber (1993). Also, it is very important to note that even a simple additive, independent, and uniformly distributed noise could have a significant influence on the prediction accuracy, as the results achieved for the Henon data indicate. Since the noise present in the rainfall data might not be of the simple additive, independent, and uniformly distributed type, but could be more complicated or even a combination of several types, this only indicates the uncertainty of the problem one is dealing with and the extent of its influence. However, the not so significant variations in the correlation dimension estimates achieved for the original and noise-reduced rainfall data seem to imply that the influence of noise on the correlation dimension is reasonably small, if not negligible. The consistency of these observations with those for the artificial chaotic (Henon) data provides additional support to such implications. The improvement in the correlation dimension (larger scaling region) and prediction accuracy (higher correlation coefficient) estimates achieved for the noise-reduced Singapore rainfall data confirms and reinforces, with greater clarity and consistency, the evidence found in the earlier studies (Sivakumar et al., 1998, 1999a) regarding the existence of chaos. The above observations suggest that the low prediction accuracy reported in past studies investigating chaos in hydrological processes (e.g.
Jayawardena and Lai, 1994; Sivakumar et al., 1999a) could be due to the presence of noise and, therefore, the reported results regarding the existence of chaos might be acceptable.
4.4. Delay time
With respect to the delay time, τ, a possible criticism of the low correlation dimensions reported by past studies investigating chaos in hydrological processes could be that the delay time used was not appropriate, in other words that it may have been too small, because a small τ may result in an underestimation of the dimension. However, this is not necessarily the case, as most of the studies used an appropriate τ computed using any of the widely accepted methods. Among these, most employed the autocorrelation function method, and τ was taken as the lag time at which the autocorrelation function first crossed the zero line (e.g. Jayawardena and Lai, 1994; Sangoyomi et al., 1996; Porporato and Ridolfi, 1996, 1997; Wang and Gan, 1998; Sivakumar et al., 1998, 1999a). In fact, a few of these studies employed more than one method to verify the results. For example, Sangoyomi et al. (1996) used the autocorrelation function method and the mutual information method to determine the appropriate τ for the volume time series of the Great Salt Lake, and found no significant difference between the τ values obtained. The autocorrelation function method yielded a τ value of 13, whereas a value of τ between 9 and 13 was obtained using the mutual information method.
Rodriguez-Iturbe et al. (1989) recommended, in the absence of clear-cut guidelines, estimating correlation dimensions using different delay time values to ascertain its effect. This approach was employed by Sivakumar et al. (1999a) to investigate the effect of delay time on the correlation dimension estimates for the daily rainfall data observed in Singapore. For rainfall data from one of the stations (Station 05) in Singapore, the delay time computed using the autocorrelation function method was 10 days, and the correlation dimension obtained was about 1.01. Subsequently, using other τ values of 1, 2, 8, 12, 20, and 50 days, they observed an underestimation or overestimation of the dimension when τ was considerably smaller or larger than 10 days. Fig. 3 shows the relationship between the correlation exponent values and embedding dimension values for 30 years of data from one of the stations (Station 05) in Singapore with different values of τ. Based on these results, they recommended the selection of the lag time at which the autocorrelation function first crosses the zero line, if the autocorrelation function method is used.
Fig. 3. Correlation exponent versus embedding dimension for various delay time values: Singapore rainfall data.
These observations suggest that the low correlation dimensions reported by past studies investigating chaos in hydrological processes, which employed the autocorrelation function method to compute τ for the phase–space reconstruction, may not be a result of the selection of an inappropriate τ, but could be the true dimensions of the processes investigated.
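The procedure described in this subsection can be sketched as follows: pick the delay time as the first lag at which the sample autocorrelation function reaches zero or below, reconstruct the phase space with that delay, and estimate the correlation exponent as the slope of log C(r) against log r. This is a minimal illustration in the spirit of the Grassberger and Procaccia (1983a) algorithm, not the code used in any of the cited studies; the function names, the maximum norm, the fixed scaling range, and the Henon test series are all assumptions.

```python
import numpy as np

def first_zero_crossing_lag(x, max_lag=None):
    """Smallest lag at which the sample autocorrelation is <= 0 (None if not found)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    max_lag = x.size // 2 if max_lag is None else max_lag
    denom = np.dot(x, x)
    for k in range(1, max_lag + 1):
        if np.dot(x[:-k], x[k:]) / denom <= 0.0:
            return k
    return None

def delay_embed(x, m, tau):
    """Rows are the delay vectors (x[i], x[i + tau], ..., x[i + (m - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for this m and tau")
    return np.column_stack([x[j * tau : j * tau + n_vec] for j in range(m)])

def correlation_exponent(x, m, tau, r_lo, r_hi, n_r=12):
    """Slope of log C(r) versus log r over the assumed scaling range [r_lo, r_hi]."""
    v = delay_embed(x, m, tau)
    # Maximum-norm distance between every pair of delay vectors.
    d = np.max(np.abs(v[:, None, :] - v[None, :, :]), axis=2)
    d = d[np.triu_indices(len(v), k=1)]
    radii = np.logspace(np.log10(r_lo), np.log10(r_hi), n_r)
    c = np.array([np.mean(d < r) for r in radii])  # correlation sum C(r)
    ok = c > 0
    slope, _ = np.polyfit(np.log(radii[ok]), np.log(c[ok]), 1)
    return slope

def henon_series(n, a=1.4, b=0.3, discard=500):
    """x-component of the Henon map, used here only as a test signal."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

# Delay time of a periodic test signal from its autocorrelation function.
periodic = np.sin(2 * np.pi * np.arange(1000) / 50)
print("first zero-crossing lag:", first_zero_crossing_lag(periodic))

# Correlation exponent of the Henon series for several delay times.
series = henon_series(1500)
for tau in (1, 2, 8):
    nu = correlation_exponent(series, m=2, tau=tau, r_lo=0.05, r_hi=0.3)
    print(f"tau = {tau}: correlation exponent = {nu:.2f}")
```

In practice the scaling range would be chosen by inspecting the log C(r) versus log r plot rather than fixed in advance, and the exponent would be computed for increasing embedding dimensions until it saturates, as in Fig. 3.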
5. Summary and conclusions
Though the science of chaos has been receiving considerable attention in hydrology, there have also been widespread criticisms of the application of chaos theory in hydrology and suspicions about studies reporting the existence of chaos in hydrological processes. Important reasons for this are: (1) the assumptions with which the chaos identification methods have been developed, i.e. infinite and noise-free time series; and (2) the inability of the investigative methods to provide irrefutable proof regarding the existence of chaos. The fact that hydrological time series are always finite and are inherently contaminated by noise, such as errors arising from measurements, necessitates addressing the above issues in the application of chaos theory in hydrology and, therefore, formed the basis for this paper. The paper followed a systematic approach to address these issues by: (1) reviewing some of the important studies investigating the existence of chaos in hydrological processes; (2) presenting the critical issues that have been raised in the application of chaos theory in hydrology; and (3) discussing some of the notable results reported by past studies and providing possible interpretations of those. The important conclusions are as follows.
Due to the limitations of each of the chaos identification methods, it may not be possible to provide a definitive resolution of whether or not hydrological processes exhibit chaotic behavior based on the results achieved from the application of a single method. It is necessary to employ diverse techniques to verify whether the results from each of them complement those of the others and, hence, to confirm the results. Although only the correlation dimension method was used in most of the studies investigating the existence of chaos in hydrological processes, some of the studies (e.g. Rodriguez-Iturbe et al., 1989; Jayawardena and Lai, 1994; Puente and Obregon, 1996; Porporato and Ridolfi, 1996, 1997; Sivakumar et al., 1999a) employed more than one method and reported evidence of chaos, indicating that the reported results could be very meaningful.
The studies indicate that, among the methods available, the nonlinear prediction method seems to provide better results, since the existence of chaos can be identified using three different approaches. In addition, the method has been found to be effective even when the data size is small.
The investigations carried out on the issue of the minimum data size required for the correlation dimension estimation indicate that this issue may not be as severe as it is believed to be. The minimum data size may largely depend on the type and dimension of the attractor and, therefore, it may be possible to obtain reasonably accurate results even with a small data size. This suggests that the use of a small data size for the correlation dimension estimation cannot be considered the sole cause of the low correlation dimensions achieved for hydrological processes, as commented, for example, by Ghilardi and Rosso (1990). Since most of the past studies used at least a few thousand points for dimension estimation, the results achieved may be considered reasonable and, therefore, the dimensions could well be the actual dimensions of the underlying systems.
Regarding the delay time, the selection of delay time using the autocorrelation function method, where the delay time is taken as the lag time at which the autocorrelation function first crosses the zero line, provides reasonable results on dimension estimation. This indicates that the dimension estimates reported by past studies may not be significantly affected by the problem of delay time, since most of the studies employed the autocorrelation function method to determine the delay time.
The studies on the influence of noise revealed that the correlation dimension was not significantly influenced by the presence of noise, whereas noise had a significant effect on the prediction accuracy. These observations, together with the fact that all the past studies investigating chaos in hydrological data used the correlation dimension either as a proof or as a preliminary evidence of chaos, suggested that the outcomes of such studies might still be valid, though the influence of noise was not considered. The study also indicated that the low prediction accuracy achieved in the past studies (e.g. Jayawardena and Lai, 1994; Sivakumar et al., 1999a) for chaotic hydrological data could well be due to the presence of noise, and could be significantly improved if the noise were reduced.
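The sensitivity of prediction accuracy to noise can be illustrated with a small experiment on artificial data. The sketch below uses a zeroth-order local approximation (averaging the successors of the k nearest neighbours in the reconstructed phase space) and reports the correlation coefficient between predicted and observed values for a clean and a noise-contaminated Henon series. It is a simplified stand-in for the nonlinear prediction methods discussed in this paper, and the function names, the noise level definition (noise standard deviation relative to signal standard deviation), and all parameter values are assumptions.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, discard=500):
    """x-component of the Henon map (transient discarded)."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def local_prediction_skill(x, m=2, tau=1, k=5, train_frac=0.5):
    """Correlation coefficient between one-step-ahead predictions and observations.

    Each test delay vector is predicted by averaging the successors of its
    k nearest neighbours among the training delay vectors.
    """
    x = np.asarray(x, dtype=float)
    span = (m - 1) * tau
    n_vec = x.size - span - 1  # only vectors that have a successor
    vecs = np.column_stack([x[j * tau : j * tau + n_vec] for j in range(m)])
    targets = x[span + 1 : span + 1 + n_vec]  # value following each vector
    n_train = int(train_frac * n_vec)
    train_v, train_y = vecs[:n_train], targets[:n_train]
    test_v, test_y = vecs[n_train:], targets[n_train:]
    preds = np.empty(len(test_v))
    for i, v in enumerate(test_v):
        dist = np.linalg.norm(train_v - v, axis=1)
        nearest = np.argpartition(dist, k)[:k]
        preds[i] = train_y[nearest].mean()
    return np.corrcoef(preds, test_y)[0, 1]

clean = henon_series(2000)
rng = np.random.default_rng(1)
level = 0.16  # assumed definition: std(noise) / std(signal)
half_width = np.sqrt(3.0) * level * np.std(clean)
noisy = clean + rng.uniform(-half_width, half_width, size=clean.size)

r_clean = local_prediction_skill(clean)
r_noisy = local_prediction_skill(noisy)
print(f"correlation coefficient, clean series: {r_clean:.3f}")
print(f"correlation coefficient, noisy series: {r_noisy:.3f}")
```

An experiment of this kind only shows the direction of the effect on a known system; for observed rainfall the clean series is unavailable, so the comparison has to be made between original and noise-reduced data, as in Table 4.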
On one hand, the basis for the criticisms, among the majority of the hydrological community, of studies investigating and reporting existence of chaos in hydrological processes is our strong belief that they are influenced by a large number of variables and, therefore, are stochastic. On the other hand, the outcomes of the present study provide strong support to the claims that the (seemingly) highly irregular hydrological processes could be the result of simple deterministic systems with a few degrees of freedom.
Therefore, the hypothesis of chaos in hydrology is reasonable and can provide an alternative approach for characterizing and modeling the dynamics of hydrological processes. There is no doubt that the significant inroads that have been made in the past decade into the application of chaos theory in hydrology would reach out to a wider audience.
References
Abarbanel, H.D.I., Brown, R., Kadtke, J.B., 1990. Prediction in chaotic nonlinear systems: methods for time series with broadband Fourier spectra. Phys. Rev. A 41 (4), 1782–1807.
Casdagli, M., 1989. Nonlinear prediction of chaotic time series. Physica D 35, 335–356.
Casdagli, M., 1991. Chaos and deterministic versus stochastic nonlinear modeling. J. R. Stat. Soc. B 54 (2), 303–328.
Farmer, D.J., Sidorowich, J.J., 1987. Predicting chaotic time series. Phys. Rev. Lett. 59, 845–848.
Fraedrich, K., 1986. Estimating the dimensions of weather and climate attractors. J. Atmos. Sci. 43 (5), 419–432.
Frazer, A.M., Swinney, H.L., 1986. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33 (2), 1134–1140.
Frison, T., 1994. Nonlinear data analysis techniques. In: Deboeck, G.J. (Ed.). Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, Wiley, New York, pp. 280–296.
Georgakakos, K.P., Sharifi, M.B., Sturdevant, P.L., 1995. Analysis of high-resolution rainfall data. In: Kundzewicz, Z.W. (Ed.). New Uncertainty Concepts in Hydrology and Water Resources, Cambridge University Press, New York, pp. 114–120.
Ghilardi, P., Rosso, R., 1990. Comment on chaos in rainfall. Water Resour. Res. 26 (8), 1837–1839.
Grassberger, P., Procaccia, I., 1983a. Measuring the strangeness of strange attractors. Physica D 9, 189–208.
Grassberger, P., Procaccia, I., 1983b. Estimation of the Kolmogorov entropy from a chaotic signal. Phys. Rev. A 28, 2591–2593.
Grassberger, P., Hegger, R., Kantz, H., Schaffrath, C., 1993. On noise reduction methods for chaotic data. Chaos 3 (2), 127–141.
Havstad, J.W., Ehlers, C.L., 1989. Attractor dimension of nonstationary dynamical systems from small data sets. Phys. Rev. A 39 (2), 845–853.
Henon, M., 1976. A two-dimensional mapping with a strange attractor. Commun. Math. Phys. 50, 69–77.
Hense, A., 1987. On the possible existence of a strange attractor for the southern oscillation. Beitr. Phys. Atmos. 60 (1), 34–47.
Holzfuss, J., Mayer-Kress, G., 1986. An approach to error-estimation in the application of dimension algorithms. In: Mayer-Kress, G. (Ed.). Dimensions and Entropies in Chaotic Systems, Springer, New York, pp. 114–122.
Ikeda, K., 1979. Multiple valued stationary state and its instability of the transmitted light by a ring cavity system. Opt. Commun. 30, 257–261.
Islam, S., Bras, R.L., Rodriguez-Iturbe, I., 1993. A possible explanation for low correlation dimension estimates for the atmosphere. J. Appl. Meteor. 32, 203–208.
Jayawardena, A.W., Lai, F., 1994. Analysis and prediction of chaos in rainfall and stream flow time series. J. Hydrol. 153, 23–52.
Kantz, H., Schreiber, T., 1997. Nonlinear Time Series Analysis, Cambridge University Press, Cambridge.
Koutsoyiannis, D., Pachakis, D., 1996. Deterministic chaos versus stochasticity in analysis and modeling of point rainfall series. J. Geophys. Res. 101 (D21), 26 441–26 451.
Liebert, W., Schuster, H.G., 1989. Proper choice of the time delay for the analysis of chaotic time series. Phys. Lett. A 141, 386–390.
Liu, Q., Islam, S., Rodriguez-Iturbe, I., Le, Y., 1998. Phase-space analysis of daily streamflow: characterization and prediction. Adv. Water Resour. 21, 463–475.
Lorenz, E.N., 1963. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141.
Lorenz, E.N., 1991. Dimension of weather and climate attractors. Nature 353, 241–244.
Mackey, M.C., Glass, L., 1977. Oscillations and chaos in physiological control systems. Science 197, 287–289.
Nerenberg, M.A.H., Essex, C., 1990. Correlation dimension and systematic geometric effects. Phys. Rev. A 42 (12), 7065–7074.
Osborne, A.R., Provenzale, A., 1989. Finite correlation dimension for stochastic systems with power-law spectra. Physica D 35, 357–381.
Palus, M., 1995. Testing for nonlinearity using redundancies: quantitative and qualitative aspects. Physica D 80, 186–205.
Porporato, A., Ridolfi, L., 1996. Clues to the existence of deterministic chaos in river flow. Int. J. Mod. Phys. B 10, 1821–1862.
Porporato, A., Ridolfi, L., 1997. Nonlinear analysis of river flow time sequences. Water Resour. Res. 33 (6), 1353–1367.
Prichard, D., Theiler, J., 1995. Generalized redundancies for time series analysis. Physica D 84, 476–493.
Provenzale, A., Osborne, A.R., Soj, R., 1991. Convergence of the K2 entropy for random noises with power law spectra. Physica D 47, 361–372.
Puente, C.E., Obregon, N., 1996. A deterministic geometric representation of temporal rainfall: results for a storm in Boston. Water Resour. Res. 32 (9), 2825–2839.
Ramsey, J.B., Yuan, H.J., 1990. The statistical properties of dimension calculations using small data sets. Nonlinearity 3, 155–176.
Rapp, R.E., Albano, A.M., Zimmerman, I.D., Jimenez-Montano, M.A., 1994. Phase-randomised surrogates can produce spurious identifications of non-random structure. Phys. Lett. A 192, 27–33.
Rodriguez-Iturbe, I., De Power, F.B., Sharifi, M.B., Georgakakos, K.P., 1989. Chaos in rainfall. Water Resour. Res. 25 (7), 1667–1675.
Rossler, O.E., 1976. An equation for continuous chaos. Phys. Lett. A 57, 397–398.
Sangoyomi, T.B., Lall, U., Abarbanel, H.D.I., 1996. Nonlinear dynamics of the Great Salt Lake: dimension estimation. Water Resour. Res. 32 (1), 149–159.
Schouten, J.C., Takens, F., van den Bleek, C.M., 1994. Estimation of the dimension of a noisy attractor. Phys. Rev. E 50 (3), 1851–1861.
Schreiber, T., 1993. Extremely simple nonlinear noise reduction method. Phys. Rev. E 47 (4), 2401–2404.
Schreiber, T., Grassberger, P., 1991. A simple noise reduction method for real data. Phys. Lett. A 160, 411–418.
Schreiber, T., Kantz, H., 1996. Observing and predicting chaotic signals: is 2% noise too much? In: Kravtsov, Yu.A., Kadtke, J.B. (Eds.). Predictability of Complex Dynamical Systems, Springer Series in Synergetics, Springer, Berlin, pp. 43–65.
Schreiber, T., Schmitz, A., 1996. Improved surrogate data for nonlinearity tests. Phys. Rev. Lett. 77 (4), 635–638.
Schuster, H.G., 1988. Deterministic Chaos, VCH, Weinheim.
Sevruk, B., 1996. Adjustment of tipping-bucket precipitation gage measurement. Atmos. Res. 42, 237–246.
Sharifi, M.B., Georgakakos, K.P., Rodriguez-Iturbe, I., 1990. Evidence of deterministic chaos in the pulse of storm rainfall. J. Atmos. Sci. 47, 888–893.
Sivakumar, B., Liong, S.-Y., Liaw, C.-Y., 1998. Evidence of chaotic behavior in Singapore rainfall. J. Am. Water Resour. Assoc. 34 (2), 301–310.
Sivakumar, B., Liong, S.-Y., Liaw, C.-Y., Phoon, K.-K., 1999a. Singapore rainfall behavior: chaotic? J. Hydrol. Engng, ASCE 4 (1), 38–48.
Sivakumar, B., Phoon, K.-K., Liong, S.-Y., Liaw, C.-Y., 1999b. Comment “on nonlinear analysis of riverflow time series” by Amilcare Porporato and Luca Ridolfi. Water Resour. Res. 35 (3), 895–897.
Sivakumar, B., Phoon, K.-K., Liong, S.-Y., Liaw, C.-Y., 1999c. A systematic approach to noise reduction in observed chaotic time series. J. Hydrol. 219 (3,4), 103–135.
Smith, L.A., 1988. Intrinsic limits on dimension calculations. Phys. Lett. A 133 (6), 283–288.
Sugihara, G., May, R.M., 1990. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 344, 734–741.
Takens, F., 1981. Detecting strange attractors in turbulence. In: Rand, D.A., Young, L.S. (Eds.). Dynamical Systems and Turbulence, Lecture Notes in Mathematics, 898. Springer, Berlin, pp. 366–381.
Theiler, J., Galdrikian, B., Longtin, A., Eubank, S., Farmer, J.D., 1992a. Using surrogate data to detect nonlinearity in time series. In: Casdagli, M., Eubank, S. (Eds.). Nonlinear Modeling and Forecasting, pp. 163–185.
Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., Farmer, J.D., 1992b. Testing for nonlinearity in time series: the method of surrogate data. Physica D 58, 77–94.
Tsonis, A.A., 1992. Chaos: from Theory to Applications, Plenum Press, New York.
Tsonis, A.A., Elsner, J.B., 1988. The weather attractor over very short timescales. Nature 333, 545–547.
Tsonis, A.A., Elsner, J.B., Georgakakos, K.P., 1993. Estimating the dimension of weather and climate attractors: important issues about the procedure and interpretation. J. Atmos. Sci. 50, 2549–2555.
Waelbroeck, H., Lopez-Pena, R., Morales, T., Zertuche, F., 1994. Prediction of tropical rainfall by local phase space reconstruction. J. Atmos. Sci. 51 (22), 3360–3364.
Wang, Q., Gan, T.Y., 1998. Biases of correlation dimension estimates of streamflow data in the Canadian prairies. Water Resour. Res. 34 (9), 2329–2339.
Wolf, A., Swift, J.B., Swinney, H.L., Vastano, A., 1985. Determining Lyapunov exponents from a time series. Physica D 16, 285–317.
Zeng, X., Pielke, R.A., 1993. What does a low-dimensional weather attractor mean? Phys. Lett. A 175, 299–304.