Advances in Water Resources 25 (2002) 179–190 www.elsevier.com/locate/advwatres
Characterization and prediction of runoff dynamics: a nonlinear dynamical view M.N. Islam, B. Sivakumar
*
Department of Land, Air and Water Resources, University of California, Davis, CA 95616, USA Received 8 November 2000; accepted 20 September 2001
Abstract An attempt is made in this study to characterize and predict runoff dynamics, using ideas gained from nonlinear dynamical theory. Daily runoff data observed over a period of 19 years (January 1, 1975–December 31, 1993) at the Lindenborg catchment in Denmark is studied using a variety of techniques. First, the autocorrelation function and the Fourier power spectrum are used as indicators to obtain some preliminary information regarding the runoff behavior. A comprehensive characterization is done next through the correlation integral analysis, the false nearest neighbor algorithm, and the nonlinear prediction method, all of which use the concept of phase-space reconstruction, i.e., reconstruction of the single-dimensional (or variable) runoff series in a multi-dimensional phase-space to represent its dynamics. The average mutual information method is used to estimate the delay time for the phase-space reconstruction. The exponential decay in the autocorrelation function plot and the sharp spectral lines in the Fourier power spectrum seem to provide some preliminary indication regarding the possible presence of chaos in the runoff dynamics. The (low) correlation dimension (of about 3.76) obtained from the correlation integral analysis, the (low) global dimension (of 4 or 5) obtained from the false nearest neighbor algorithm, and the (low) optimal embedding dimension (of 3) from the nonlinear prediction method are in close agreement with each other, providing convincing evidence regarding the presence of low-dimensional chaotic behavior in the runoff dynamics. The near-accurate predictions achieved for the runoff series (correlation coefficient of about 0.99 and coefficient of efficiency of about 0.98) indicate the appropriateness of the chaotic dynamical approach for characterizing and predicting the runoff dynamics at the Lindenborg catchment. Ó 2002 Elsevier Science Ltd. All rights reserved.
1. Introduction Adequate knowledge of runoff dynamics is required for, among others: (1) optimal design of water storage and drainage networks; and (2) management of extreme events, such as floods and droughts. The physical mechanisms governing the runoff dynamics are many, and act on a range of temporal and spatial scales. Runoff depends not only on the space-time distribution of the rainfall occurrence, but also on the kind and the particular state of the basin, which, in turn, depend on climatic conditions, vegetation state, and others. Also, as almost all mechanisms involved in the runoff process present some degree of nonlinearity, modeling its dynamics is nontrivial, to say the least. Even though applications of (linear) stochastic approaches are very common in the study of complex *
Corresponding author. Tel.: +1-530-752-8577; fax: +1-530-7525262. E-mail address:
[email protected] (B. Sivakumar).
natural and physical systems, such as runoff, the dramatic recent advances in nonlinear science and a rapidly growing set of tools for nonlinear time series analysis have brought about a major methodological revolution. This nonlinear science revolution has provided important findings, such as: 1. systems can be completely deterministic or completely unpredictable/uncontrollable; 2. seemingly random systems can have hidden structures; 3. systems may not get simpler as they are broken down; 4. infinitesimal causes can lead to massive effects; 5. complex systems may have simple solutions; and 6. simple equations can produce very complex results. Among a large number of discoveries/theories that make up such a revolution, the ‘‘science of chaos’’ (that apparently complex irregular behavior could be the outcome of a simple deterministic system with a few dominant nonlinear interdependent variables) has gained significant popularity in almost all fields of natural and physical sciences, including hydrological sciences.
0309-1708/02/$ - see front matter Ó 2002 Elsevier Science Ltd. All rights reserved. PII: S 0 3 0 9 - 1 7 0 8 ( 0 1 ) 0 0 0 5 3 - 7
180
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
The past decade has witnessed a large number of studies employing the ideas gained from the science of chaos to characterize, model, and predict the dynamics of various hydrological phenomena [1–14]. The outcomes of such studies are very encouraging, as they not only revealed that the dynamics of the apparently irregular hydrological phenomena could be understood from a chaotic deterministic point of view but also reported very good predictions using such an approach, in particular for runoff process [8,9,13]. It is also noteworthy that studies that have employed both the chaotic and stochastic approaches have revealed that the chaotic approach was better than the stochastic approach for the streamflow series analyzed [13]. In spite of the general criticisms regarding the application of the chaotic deterministic approach to hydrological time series, as the chaos identification methods have been developed typically for infinite and noise-free time series, the reasonableness and usefulness of such an alternative approach for understanding hydrological phenomena cannot be eliminated easily [15]. An important criticism, in addition to the issues of data size and noise, on the studies that reported possible presence of low-dimensional chaos in hydrological phenomena is that such studies have made premature conclusions, without detailed investigations. Specifically, this criticism is directed towards studies that employed (only) the correlation dimension method and interpreted the low correlation dimension obtained as evidence of low-dimensional chaos. The importance of such a criticism lies in the fact that a finite and low correlation dimension may be observed even for a stochastic process [16]. Though such a counter-example may readily be provided to refute the reports regarding the existence of low-dimensional chaos in hydrological phenomena, to the authors’ knowledge, this also has been made based only on the finite and low correlation dimension [17]. On the other hand, not only the validity of the studies employing the methods has been questioned but the criticisms of such studies as well [18]. One way to avoid such premature (and possibly misleading) conclusions and criticisms is to employ a variety of techniques, each one in some way complementary to the others, and to critically analyze the results that can enable us to confirm whether or not to exclude the possible existence of chaotic dynamics in a phenomenon [8,15]. In view of the above, the present study attempts to employ a variety of techniques for characterizing the dynamics of daily runoff observed at the Lindenborg catchment in Denmark (or more specifically, for identifying the possible presence of chaotic dynamics). The techniques employed range from standard statistical techniques that can provide general indications regarding the dynamics of the phenomenon to specific ones that can provide comprehensive characterization of the dynamics. The standard statistical techniques used are
the autocorrelation function and the Fourier power spectrum, whereas the correlation integral analysis, the false nearest neighbor algorithm, and the nonlinear prediction method are employed for comprehensive characterization. The organization of this paper is as follows. Section 2 presents a brief review of the methods employed in the present study, both preliminary and comprehensive, for characterizing the runoff dynamics. Details of the Lindenborg catchment and the data considered for investigation are provided in Section 3. The section also presents the analysis of the data and discusses the results. Important conclusions drawn from the present study are presented in Section 4.
2. Methods employed If the mathematical formulation of the system is available, then recognizing chaotic behavior is relatively straightforward. Since the system evolution is deterministic, broadband noise spectra, for example, would be sufficient to identify the presence of chaos. Furthermore, since the number of variables is known, the reconstruction of the phase-space and the attractor and the estimation of the various invariants are straightforward. However, when one deals with controlled experiments, where one cannot record all the variables, and/or with observables from an uncontrolled system (like the atmospheric or hydrologic system), whose mathematical formulation and total number of variables may not be known exactly, the problem of identifying chaos becomes complicated. In such cases, Fourier analysis alone, for instance, cannot be used to identify the presence of chaos, since the observable might be a random variable. The difficulty in using Fourier analysis, or any other single (statistical) method, to identify chaos in a phenomenon necessitated formulation of new methods and modification of existing ones. This resulted in the emergence of a wide variety of methods, popular among them are: 1. correlation dimension method; 2. false nearest neighbor algorithm; 3. nonlinear prediction method; 4. Lyapunov exponent method; 5. Kolmogorov entropy method; 6. surrogate data method; and 7. method of redundancy. It is important to note that, in spite of the advantages they possess, none of these methods can provide an infallible distinction between a chaotic and a stochastic system. The advantages and limitations of each of these methods have already been made available in the literature [5,15,16,19–23] and, therefore, are not reported herein. However, wherever appropriate, some of the
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
limitations/issues are discussed below with respect to the runoff series analyzed. As none of these methods can provide an infallible distinction between a chaotic and a stochastic system, it is necessary to employ diverse techniques to provide at least convincing, if not conclusive, evidence regarding the presence/absence of chaotic dynamics in a phenomenon. In view of this, in the present study, a total of five methods are employed to investigate the dynamics of the daily runoff observed at the Lindenborg catchment in Denmark. The investigation is carried out in two steps. In the first step, useful preliminary information regarding the general runoff dynamics is obtained using two standard statistical techniques: (1) the autocorrelation function; and (2) the Fourier power spectrum. This is followed by the application of three specific chaos identification methods that could provide comprehensive characterization of the runoff dynamics: (1) the correlation integral analysis; (2) the false nearest neighbor algorithm; and (3) the nonlinear prediction method. These five methods are explained next. 2.1. Preliminary characterization 2.1.1. Autocorrelation function The autocorrelation function is a normalized measure of the linear correlation among successive values in a time series. The use of the autocorrelation function in characterizing the behavior of a time series lies in its ability to determine the degree of dependence present in the values. In general, for a purely random process, the autocorrelation function fluctuates about zero, indicating that the process at any certain instance has no ‘‘memory’’ of the past at all. For a periodic process, the autocorrelation function is also periodic, indicating the strong relation between values that repeat over and over again. The autocorrelation function of signal from a chaotic process decays exponentially with increasing lag, because the states of a chaotic process are neither completely dependent (i.e., deterministic) nor completely independent (i.e., random) of each other. 2.1.2. Fourier power spectrum The Fourier power spectrum is particularly useful for characterizing the regularities/irregularities in observed signals (or time series). In general, for a purely random process, the power spectrum oscillates randomly about a constant value, indicating that no frequency explains any more of the variance of the sequence than any other frequency. For a periodic or quasi-periodic sequence, only peaks at certain frequencies exist; measurement noise adds a continuous floor to the spectrum. Thus, in the spectrum, signal and noise are readily distinguished. Chaotic signals may also have sharp spectral lines but even in the absence of noise there will be continuous part (broadband) of the spectrum. This is an immediate
181
consequence of the exponentially decaying autocorrelation function. 2.2. Comprehensive characterization 2.2.1. Phase-space reconstruction An important first step in any (specific) chaos identification technique is the reconstruction of the phasespace of the time series under investigation and, hence, its attractor (a geometric object which characterizes the long-term behavior of a system in the phase-space). Such a reconstruction approach uses the concept of embedding a single-variable series in a multi-dimensional phase-space to represent the underlying dynamics. According to this approach, for a scalar time series, such as runoff series, Xi , where i ¼ 1; 2; . . . ; N , the multidimensional phase-space can be reconstructed, using the method of delays, according to [24]: Y j ¼ ðXj ; Xjþs ; Xjþ2s ; . . . ; Xjþðm1Þs Þ;
ð1Þ
where j ¼ 1; 2; . . . ; N ðm 1Þs=Dt; m is the dimension of the vector Y j , called as the embedding dimension; and s is a delay time taken to be some suitable multiple of the sampling time Dt [24,25]. An appropriate delay time, s, is essential for the phase-space reconstruction, since only an optimum s gives the best separation of neighboring trajectories within the minimum embedding phase-space. On one hand, if s is too small, the resulting (phase-space) coordinates will not be independent enough to produce, in a practical sense, new information about the evolution of the system. On the other hand, if s is too large, and the dynamics are chaotic, then all relevant information for phase-space reconstruction is lost since neighboring trajectories diverge, and averaging in time and/or space is no longer useful [7]. Studies have revealed that the use of an inappropriate s in the phase-space reconstruction could have serious influence on the outcomes of chaos identification methods. For instance, use of a too small s may result in a significant underestimation of the correlation dimension, whereas a significant overestimation may occur if a too large s is used [26]. In view of the above, the problem of selection of s has received a lot of attention not only from nonlinear dynamicists but also from researchers in various natural and physical sciences. Research in this direction thus far has resulted in the formulation of a large number of methods and recommendations. Popular among these are the autocorrelation function method [27], the mutual information method [28] and the correlation integral method [29]. A brief account of these methods is presented next. The autocorrelation function has been the most widely used tool to determine s, particularly in studies dealing with hydrological time series. Possible reasons
182
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
for the popularity and use of the autocorrelation function are that: (1) its computation is relatively simple; and (2) it is one of the most fundamental and standard statistical tools in any time series analysis. Within the use of autocorrelation function, there have been several recommendations regarding the selection of s. For instance, Holzfuss and Mayer-Kress [27] suggest using a value of delay time at which the autocorrelation function first crosses the zero line. Other approaches suggest the use of lag time at which the autocorrelation function attains a certain value, for instance, 0.5 [30] and 0.1 [31]. According to Frazer and Swinney [28], the autocorrelation function measures only the linear dependence between successive points in a time series and, thus, may not be appropriate for nonlinear dynamics. They suggested the use of mutual information method, as this method measures the nonlinear dependence in addition to the linear dependence. They reasoned that if s is chosen to coincide with the first minimum of the mutual information, then the recovered state vector would consist of components that possess minimal mutual information between them, i.e., the successive values in the time series are statistically independent but (also) without any redundancy. For a discrete time series, with Xi and Xis as successive values, for instance, the mutual information function, IðsÞ, is computed according to X P ðXi ; Xis Þ P ðXi ; Xis Þ log2 IðsÞ ¼ ; ð2Þ P ðXi ÞP ðXis Þ i;is
reconstruction of the runoff series observed at the Lindenborg catchment.
where P ðXi Þ and P ðXis Þ are the individual probabilities of Xi and Xis respectively, and P ðXi ; Xis Þ is the joint probability density. A method that is based on the generalized correlation integral (but is otherwise similar to the mutual information method), known as the correlation integral method, to determine the delay time was proposed by Liebert and Schuster [29]. According to this method, the first minimum of the logarithm of the generalized correlation integral is considered to provide a proper choice of s. For some attractors, it really does not matter whether the autocorrelation function or the mutual information or the correlation integral is used. However, for some other attractors, the estimation of s might depend strongly on the approach employed. Evidently, none of the aforementioned methods or rules is definitive regarding the proper choice of s. However, according to Tsonis [32], the mutual information method is more comprehensive than the others and, therefore, may have an edge. The method, due to its ability to measure the nonlinear dependence in a time series, could be much more appropriate for hydrological time series, as such series are often nonlinear in nature. In view of this, the mutual information method is employed, in the present study, for choosing an appropriate s for the phase-space
m ¼ lim
2.2.2. Correlation integral analysis The correlation integral analysis (also known as the correlation dimension method) is one of the widely used techniques to investigate the presence/absence of chaos in a time series. The analysis uses the correlation integral or function, CðrÞ, to distinguish between chaotic and stochastic systems. Among a large number of existing algorithms for the computation of the correlation integral, the Grassberger–Procaccia algorithm [33] is the most commonly used one. According to this algorithm, the correlation integral is computed according to CðrÞ ¼ lim
N !1
2 N ðN 1Þ
X
H r kY i Y j k ;
ð3Þ
i;j ð1 6 i
where H is the Heaviside step function, with H ðuÞ ¼ 1 for u > 0, and H ðuÞ ¼ 0 for u 6 0, where u ¼ r jY i Y j j, r is the radius of sphere centered on Y i or Y j , and N is the number of data points. If the time series is characterized by an attractor, then the correlation integral CðrÞ is related to the radius r given by CðrÞ r!0 arm ; N !1
ð4Þ
where a is a constant; and m is the correlation exponent or the slope of the log CðrÞ versus log r plot given by r!0 N !1
log CðrÞ : log r
ð5Þ
The slope is generally estimated by a least-squares fit of a straight line over a certain range of r, called the scaling region. If the correlation exponent attains saturation with an increase in the embedding dimension, then the system is generally considered to exhibit chaotic dynamics. The saturation value of the correlation exponent is defined as the correlation dimension (d) of the attractor. The nearest integer above the saturation value provides the minimum or optimum embedding dimension ðmopt Þ for reconstructing the phase-space or the number of variables necessary to model the dynamics of the system. On the other hand, if the correlation exponent increases without bound with increase in the embedding dimension, the system under investigation is generally considered stochastic. There are certain important limitations in the use of the correlation integral analysis (the Grassberger–Procaccia algorithm in particular) in the determination of attractor dimension and, hence, in the search for chaos. For instance, the selection of inappropriate values for the parameters involved in the method may result in an underestimation (or overestimation) of the attractor dimension [26]. Consequently, finite and low correlation
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
dimensions could be observed even for a stochastic process [16]. In addition to the problems that could arise due to the inappropriate selection of data size, data sampling frequency, noise and delay time, the outcomes of the method may be significantly influenced also because of the neighbor searching procedure (for counting the points within a sphere of radius, r). For instance, in an embedding dimension that is too small to embed the dynamics (or unfold the attractor), not all points that lie close to one another will be (true) neighbors because of the dynamics. Some will actually be far away from each other and simply appear as neighbors because the geometric structure of the attractor has been projected down onto a smaller space. To avoid the uncertainties that could result due to this problem, an alternative method, known as the false nearest neighbor algorithm, was proposed by Kennel et al. [34] to determine the optimum embedding dimension. To the authors’ knowledge, only a very few studies have employed this algorithm in studying hydrological time series [6,7,35,36]. The algorithm, employed in the present study, (1) to determine the optimum embedding dimension ðmopt Þ for reconstructing the phase-space to represent the dynamics of runoff; and (2) to verify the results obtained from the correlation integral analysis, is explained next. 2.2.3. False nearest neighbor algorithm The false neighbor algorithm examines, in dimension m, the nearest neighbor Y NN of every vector Y j , as it j behaves in dimension m þ 1. If the vector Y NN is a true j neighbor of Y j , then it comes to the neighborhood of Y j through dynamical origins. On the other hand, if the vector Y NN moves far away from vector Y j as the j dimension is increased, then it is declared a false nearest neighbor as it arrived in the neighborhood of Y j in dimension m by projection from a distant part of the attractor. When the percentage of these false nearest neighbors drops to zero, the geometric structure of the attractor has been unfolded and the orbits of the system are now distinct and do not cross (or overlap). A key step in the false nearest neighbor algorithm is to determine how to decide upon increasing the embedding dimension that a nearest neighbor is false. Two criteria are generally used [7]. These are: 1. If Rmþ1 ðjÞ P 2RA , the jth vector has a false nearest neighbor (where Rmþ1 ðjÞ is the distance to the nearest neighbor of the jth vector (i.e., Y NN j ) in an embedding of dimension ðm þ 1Þ, and RA is the standard deviation of the time series Xi ; i ¼ 1; 2; . . . ; N ). 2. If ½Rmþ1 ðjÞ Rm ðjÞ > eRm ðjÞ, the jth vector has a false nearest neighbor (where e is a threshold factor (generally between 10 and 50), and the distance Rmþ1 ðjÞ is computed to the same neighbor that was identified
183
with embedding m, but with the ðm þ 1Þth coordinate (i.e., Xjms appended to the jth vector and to its nearest neighbor with embedding m). The first criterion is needed because with a finite data set, such as the runoff series used in the present study, under repeated embedding, one may stretch out the points such that they are far apart and yet cannot move any farther apart upon increasing the dimension. The second criterion checks whether the nearest neighbors have moved far apart on increasing the dimension. The appropriate threshold e is generally selected through experimentation. In the present study, the false nearest neighbor algorithm is implemented using the nonlinear time series analysis software, cspW [37]. 2.2.4. Nonlinear prediction method The application of the nonlinear prediction method in the present investigation of runoff dynamics is threefold: (1) to investigate whether the runoff dynamics could be predicted reliably; (2) to detect the possible presence of chaos using the prediction results (i.e., an inverse approach to identify chaos); and (3) to verify the results obtained from the correlation integral analysis and the false nearest neighbor algorithm. The (correct) reconstruction of the attractor in a phasespace of dimension m (Eq. (1)) makes it possible to interpret the dynamics in the form of an m-dimensional map fT , that is, Y jþT ¼ fT ðY j Þ;
ð6Þ
where Y j and Y jþT are vectors of dimension m, describing the state of the system at times j (current state) and j þ T (future state), respectively. [In real situations, however, the optimum embedding dimension for reconstruction is not known a priori and, therefore, different embedding dimensions have to be used and the optimum dimension should be chosen based on the prediction results; see below for more details]. The problem then is to find an appropriate expression for fT (i.e., FT ). There are several possible approaches for determining FT , which may broadly be divided into two categories: (1) global; and (2) local. By means of the first, an attempt is made to approximate the map (Eq. (6)), working globally on all the attractors and seeking a map FT valid at every point of it. Local approximation [38], on the other hand, entails the subdivision of the fT domain into many subsets (neighborhoods), each of which identifies some approximations FT , valid only in that same subset. In this way, the dynamics of the system is described step by step locally in the phase-space. This choice leads to a considerable reduction in the complexity of the representation FT , without lowering the quality of the forecast, to the point that, for the very short term, it provides generally better results than those obtainable with global methods.
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
In the present study, the local approximation approach is employed for the prediction of the runoff time series. The identification of the sets in which to subdivide the domain can be done in several ways; the usual one entails fixing a metric k k, then, given the starting point Y j from which the forecast is initiated, identifying neighbors Y pj ; p ¼ 1; 2; . . . ; k, with jp < j, nearest to Y j , which constitute the set corresponding to the point Y j . With this, the local functions can then be built, which take each point in the neighborhood to the next neighborhood: Y pj to Y pjþ1 . A local polynomial model of this evolution can be expressed by Y pjþ1 ¼ A þ BY pj þ CðY pj Þ2 ;
ð7Þ
where A, B, and C are constants, which are learned from the training sets [35]. The local map FT , which does this, is determined by a least-squares fit minimizing k X
2
p kYjþ1 FT Yjp k :
ð8Þ
p¼1
The local maps are learned in the form of local polynomials [35], and the predictions are made forward from a new point Z 0 using these local maps. For the new point Z 0 , nearest neighbor in the learning or training set is found, which is denoted as Y q . Then the evolution of Z 0 is found, which is denoted as Z 1 and is given by Z 1 ¼ Fq ðZ 0 Þ
ð9Þ
and then the nearest neighbor to Z 1 is found, and the procedure is repeated to predict other values. The nonlinear prediction method is employed in the present study by implementing the algorithm provided by the cspW software [37]. The prediction accuracy can be evaluated using, for instance, the correlation coefficient (CC), the root mean square error (RMSE), and the coefficient of efficiency ðR2 Þ. The time series plots and the scatter diagrams may also be used to choose the best prediction results, among a large combination of results achieved with the different embedding dimensions, as is done in the present study. The procedure to identify the presence of chaos using the results of nonlinear prediction analysis is termed as ‘‘the inverse approach’’. One possible way to do this is to check the prediction accuracy against the embedding dimension. If the dynamics is chaotic, the prediction accuracy is expected to increase (to reach its best) with the increase in the embedding dimension up to a certain point, called the optimal embedding dimension mopt , and then remain close to its best for embedding dimensions higher than mopt . On the other hand, for a stochastic time series, there would be no increase in the prediction accuracy with an increase in the embedding dimension and the accuracy would remain the same for any value of the embedding dimension [19].
3. Analysis, results and discussion 3.1. Study area and data used In the present study, the dynamical behavior of runoff at the Lindenborg catchment in Denmark is investigated. The Lindenborg catchment is situated in the lborg. northern part of Jutland between Hobro and A The catchment is a medium-sized one with a total area of 130:2 km2 . The slope of the main stream is approximately 0.14% and, therefore, may be categorized as mild. The catchment consists predominantly of sandy soils with a high and persistent groundwater contribution. For the present investigation, daily runoff data observed over a period of 19 years (January 1, 1975–December 31, 1993) are considered. Fig. 1 shows the variation of this runoff series, and Table 1 presents some of the important statistics of the series. As can be seen in Fig. 1, the runoff series exhibits significant variations, though an annual (or seasonal) cyclicity seems to be evident. Clearly, a visual inspection of the (irregular) runoff series does not provide any clues regarding its dynamical behavior, whether chaotic or stochastic. Presented below are the details of the analysis of the runoff series using additional tools. 3.2. Preliminary characterization of runoff dynamics Fig. 2 presents the variation of the autocorrelation coefficient for the daily runoff series observed at the 10 8 Runoff (Cumec)
184
6 4 2 0 0
1000
2000
3000
4000
5000
6000
7000
Time (Day)
Fig. 1. Time series plot for daily runoff from Lindenborg catchment. Table 1 Statistics of runoff data from Lindenborg catchment Statistic
Value
Number of data Mean Standard deviation Maximum value Minimum value Coefficient of variation Skewness
6209 2.325 0.773 9.543 1.028 0.332 2.333
Cumec Cumec Cumec Cumec
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
Autocorrelation Coefficient
1.0 0.8 0.6 0.4 0.2 0.0 -0.2 0
200
400
600
800
1000
Lag Time (Day)
Fig. 2. Autocorrelation function for daily runoff from Lindenborg catchment.
Lindenborg catchment. As can be seen, the autocorrelation function exhibits some kind of exponential decay up to a lag time of about 200 days. Such an exponential decay may be an indication of the presence of chaotic dynamics in the runoff process. A closer look at the autocorrelation function reveals that the autocorrelation function exhibits the same kind of seasonal variation as that of the original runoff time series (with the rises and falls), particularly after the initial exponential decay. Fig. 3 shows the Fourier power spectrum for the daily runoff series. As can be seen, the power spectrum not only exhibits sharp spectral lines (or peaks) at certain frequencies but also is somewhat continuous (broadband). This interesting mixture in the power spectrum could be due to the presence of chaotic dynamics in the runoff phenomenon. This could also be an immediate consequence of the exponentially decaying autocorrelation function. 3.3. Comprehensive characterization of runoff dynamics In this study, a comprehensive characterization of the runoff dynamics is done employing the correlation integral analysis, the false nearest neighbor algorithm, and the nonlinear prediction method. As mentioned previously, the phase-space reconstruction plays an essen-
185
tial role in representing the governing (runoff) dynamics. Also, the appropriateness and accuracy of such a reconstruction, however, depends a lot on the delay time, s, used. For this purpose, in the present study, the mutual information function method is used, and the lag time that provides the (first and local) minimum mutual information is chosen as s. Fig. 4 presents the variation of the mutual information function against the lag time. The mutual information function exhibits an initial rapid decay (up to a lag time of about 7 days) before attaining near-saturation. It is clear that the selection of the minimum mutual information is difficult, as, except the initial decay, the function seems to continue to decrease very slowly (up to a large lag time). The problem becomes more complicated also with the difficulty in the selection of the saturation point, as can be seen in the figure. The implementation of the mutual information method using the cspW recommends a s value of 7. The appropriateness of this value and its superiority over other values are also verified by looking at the phase-space plots reconstructed using different s values. Fig. 5, for instance, shows the phase-space plots reconstructed in two-dimensions (m ¼ 2), i.e., the projection of the attractor on the plane fXi ; Xiþs g, with: (a) s ¼ 1, i.e., use of successive values in the series, Fig. 5(a); (b) s ¼ 7, i.e., use of value from the mutual information method, Fig. 5(b); and (c) s ¼ 200, i.e., use of value from the autocorrelation function method, Fig. 5(c), respectively. Though the selection of s even among these three values is difficult, on one hand, the plots show that the use of s ¼ 1 and s ¼ 200 result in highly dependent (a well-defined attractor) and highly independent (no attractor at all) reconstructions, respectively, and, therefore, may not be appropriate. On the other hand, the reconstruction with s ¼ 7 (with a somewhat defined attractor) seems to provide some kind of compromise when compared to the above two. In spite of the observation of a well-defined attractor with s ¼ 1, it should be noted that the obtained phase-space may be redundant. For these
Average Mutual Information
12 3.0
Log (Power)
2.0
1.0
0.0
10 8 6 4 2 0
-1.0 0
400
800 1200 Frequency (Days)
1600
2000
Fig. 3. Fourier power spectrum for daily runoff from Lindenborg catchment.
0
50
100
150
200
250
300
Lag Time (Day)
Fig. 4. Average mutual information for daily runoff from Lindenborg catchment.
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
12
12
10
10
8
8 Xi+7
Xi+1
186
6
6
4
4
2
2
0
0 0
2
4
6 Xi
8
10
12
0
2
4
8
10
12
6 Xi
8
10
12 10
Xi+200
8 6 4 2 0 0
2
4
6 Xi
Fig. 5. Phase-space diagram for daily runoff from Lindenborg catchment: (a) s ¼ 1; (b) s ¼ 7 (from average mutual information); and (c) s ¼ 200 (from autocorrelation function).
reasons, in this study, s ¼ 7 is used for the phase-space reconstruction. 3.3.1. Correlation integral analysis The correlation integrals and the exponents for the runoff series are computed using the Grassberger–Procaccia algorithm for phase-spaces reconstructed with embedding dimensions from 1 to 20. Figs. 6(a) and (b) present the results of the correlation integral analysis. The relationship between CðrÞ and r, shown in Fig. 6(a), indicates clear scaling regions (between log r ¼ 0:6 and 0.6) for all the embedding dimensions used, allowing fairly accurate estimates of the correlation exponents, m, which are presented in Fig. 6(b) against the corresponding m values. Fig. 6(b) shows an increase in the correlation exponent with the embedding dimension up to a certain point, and saturation beyond this point. Such saturation may be an indication of the deterministic dynamics in the runoff phenomenon. The saturation
value of the correlation exponent (or correlation dimension) is about 3.76, suggesting that the number of variables dominantly influencing the runoff dynamics (or the minimum number of phase-spaces required) is 4. The low correlation dimension obtained seems to suggest the possible presence of low-dimensional chaotic behavior in the runoff dynamics. 3.3.2. False nearest neighbor analysis Fig. 7 displays the percentage of false nearest neighbors obtained for the runoff series, for phase-spaces reconstructed with embedding dimensions from 1 to 15. As can be seen, the percentage of false neighbors drops to its minimum at m ¼ 4 or 5. This indicates that a fouror five-dimensional phase-space is necessary to represent the dynamics (or unfold the attractor) of the runoff series. Such a result seems to be in close agreement with the one obtained from the correlation integral analysis, providing further support to the observation made
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190 0.0
Correlation Exponent
4
-1.0 -2.0
Log C(r)
187
-3.0
3
2
1
-4.0
(a)
-5.0 -0.6
0 -0.4
-0.2
0.0
0.2 Log r
0.4
0.6
0.8
1.0
(b)
0
2
4
6
8
10
12
14
16
18
20
Embedding Dimension
Fig. 6. Correlation dimension results for daily runoff from Lindenborg catchment: (a) correlation function versus radius; and (b) correlation dimension estimation.
zero (or minimum) beyond mopt . However, in the present case, the percentage of false neighbors does not remain at its minimum after mopt (m ¼ 4 or 5), but is found to increase. One possible reason for these results is the presence of noise (e.g., measurement error) in the runoff data. To the authors’ knowledge, the effects of noise on the outcomes of the false nearest neighbor analysis have not been studied in detail thus far and, therefore, interpretation of the present results in the context of noise is difficult to provide. However, the prediction results achieved for the runoff series, discussed below, seem to support, at least partially, the above interpretation.
% False Nearest Neighbors
100 80 60 40 20 0 0
3
6
9
12
Embedding Dimension
Fig. 7. Global dimension estimation using false nearest neighbor method for daily runoff from Lindenborg catchment.
3.3.3. Nonlinear prediction analysis The nonlinear prediction method with a local approximation technique (where the maps are learned in the form of local polynomials), explained in Section 2.2.4, is now employed to predict the runoff dynamics and, hence, to identify the possible presence of chaotic dynamics. Runoff data from the first 17 years is used for reconstructing the phase-space (i.e., training or learning set) and for predicting the runoff for the subsequent one year. The phase-space is reconstructed with embedding dimensions from 1 to 10, and predictions are made one step ahead. Table 2 and Fig. 8 present a summary of the prediction results achieved for the runoff series, in terms of
earlier regarding the presence of low-dimensional chaotic dynamics in runoff. Two other important observations can be made from Fig. 7. First, in principle, the percentage of false nearest neighbors should attain zero when a sufficient number of phase-space ðmopt Þ to unfold the attractor is reached. However, as can be seen in Fig. 7, for the runoff series, the percentage of false neighbors does not reach zero at all, rather it reaches only the minimum of about 15% at m ¼ 4 or 5. Second, after reaching zero (or minimum) at mopt , the percentage of false neighbors should remain Table 2 Nonlinear prediction results for daily runoff from Lindenborg catchment Embedding dimension (m)
Correlation coefficient (CC)
Root mean square error (RMSE)
Coefficient of efficiency (R2 )
1 2 3 4 5 6 7 8 9 10
0.9759 0.9874 0.9907 0.9852 0.9715 0.9509 0.9296 0.9077 0.8789 0.8464
0.0891 0.0630 0.0542 0.0675 0.0944 0.1232 0.1470 0.1692 0.1917 0.2146
0.9434 0.9718 0.9791 0.9676 0.9366 0.8919 0.8462 0.7962 0.7384 0.6723
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190 1.00
0.24
0.96
0.20
0.92
0.16
RMSE
Correlation Coefficient
188
0.88 0.84
0.12 0.08
0.80
0.04
0
2
4
6
8
10
0
2
4
Embedding Dimension
8
1
4.0
1.0
Observed Predicted
3.5
0.9
Runoff (cumec)
Coefficient of Efficiency
6
Embedding Dimension
0.8
0.7
3.0 2.5 2.0 1.5
0.6 0
2
4
6
8
10
0
50
100
150
200
250
300
350
400
Time (Day)
Embedding Dimension
Predicted Runoff (Cumec)
4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.5
2.0
2.5
3.0
3.5
4.0
4.5
Observed Runoff (Cumec) Fig. 8. Nonlinear prediction results for daily runoff from Lindenborg catchment: (a) relationship between correlation coefficient and embedding dimension; (b) relationship between root mean square error and embedding dimension; (c) relationship between coefficient of efficiency and embedding dimension; (d) comparison between time series plots of predicted and observed values; and (d) scatter plot of predicted versus observed values.
CC, RMSE, and coefficient of efficiency ðR2 Þ. As can be seen, the best prediction results are achieved (consistently) for embedding dimension 3, i.e., mopt ¼ 3 (CC ¼ 0:9907, RMSE ¼ 0:0542, R2 ¼ 0:9791). In addition to these three criteria, the time series plots and the scatter diagrams, such as the ones shown in Figs. 8(d) and (e), respectively, are also used to select the best predictions and mopt . Figs. 8(a)–(c), comparing the three prediction measures against m, show that the prediction accuracy increases with the increase in the embedding dimension
up to a certain point (m ¼ 3) and then decreases when the dimension is increased further. This result, i.e., the presence of mopt , seems to indicate that the runoff dynamics exhibit chaotic behavior. This supports the earlier contention regarding the possible presence of low-dimensional chaotic dynamics in the runoff phenomenon. A comparison of the actual runoff values and the predicted ones, using the time series plot (Fig. 8(d)) and scatter plot (Fig. 8(e)), indicates an excellent agreement between the two in the overall appearances and trends (rises and falls). What is more encouraging is
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
that even minor fluctuations present in the actual series are very well captured by the nonlinear prediction technique (Fig. 8(d)). Such results certainly indicate the appropriateness of the phase-space-based nonlinear prediction technique, employed herein, to understand, model and predict the runoff dynamics at the Lindenborg catchment. The influence of the presence of noise, mentioned above, on the outcomes of the nonlinear prediction analysis can be explained using the prediction results presented in Table 2 and Figs. 8(a)–(c). In principle, the prediction accuracy, after reaching its best at mopt , should remain the same beyond mopt . However, the present results indicate that the prediction accuracy decreases when the embedding dimension is increased beyond mopt . A possible reason for this may be the contamination of nearby points in the high-dimensional embedding with points whose earlier coordinates (at low embedding dimensions) are close but whose recent coordinates (at high embedding dimensions) are distant [10,20]. Having said that, it seems that the noise level in the runoff series from the Lindenborg catchment, studied in the present study, is very less, as near-accurate predictions (with CC ¼ 0:99 and R2 ¼ 0:98) are achieved.
4. Conclusions The behavior of runoff dynamics at the Lindenborg catchment in Denmark was studied from a nonlinear dynamical perspective. Specifically, the study was driven towards identifying the possible presence of low-dimensional chaotic dynamics in the runoff phenomenon. Daily runoff series observed over a period of 19 years (January 1975–December 1993) was analyzed. A variety of techniques, ranging from standard statistical tools, i.e., autocorrelation function and Fourier power spectrum, to specific chaotic techniques that are based on the concept of phase-space reconstruction, were employed. In addition to the commonly used correlation integral analysis to identify chaos, the present study employed the false nearest neighbor algorithm and the nonlinear prediction method. Particular emphasis was given to the appropriate selection of delay time for reconstructing the phase-space. Preliminary information regarding the possible presence of chaos was gathered using the two statistical techniques. The results from the three specific chaos identification methods not only provided independent verification of such evidence but also were found to be consistent with each other, both qualitatively and quantitatively. All these methods suggested that the daily runoff dynamics at the Lindenborg catchment were dominantly influenced by about 3–4 degrees of freedom or variables. The appropriateness of the concept of
189
phase-space reconstruction to understand, model, and predict the runoff dynamics was revealed as near-accurate predictions (correlation coefficient ¼ 0.99, coefficient of efficiency ¼ 0.98) were achieved. A brief discussion about the possible effects of noise, in the runoff data, on the outcomes of the false nearest neighbor and the nonlinear prediction analyses was also presented, with particular reference to the optimal embedding dimension for phase-space reconstruction (or dominant number of variables) to represent the runoff dynamics.
References [1] Rodriguez-Iturbe I, De Power FB, Sharifi MB, Georgakakos KP. Chaos in rainfall. Water Resour Res 1989;25(7):1667–75. [2] Islam I, Bras RL, Rodriguez-Iturbe I. An explanation for low correlation dimension estimates for the atmosphere. J Appl Meteorol 1993;32(2):203–8. [3] Tsonis AA, Elsner JB, Georgakakos KP. Estimating the dimension of weather and climate attractors: important issues about the procedure and interpretation. J Atmos Sci 1993;50: 2549–55. [4] Berndtsson R, Jinno K, Kawamura A, Olsson J, Xu S. Dynamical systems theory applied to long-term temperature and precipitation time series. Trends Hydrol 1980;1:291–7. [5] Jayawardena AW, Lai F. Analysis and prediction of chaos in rainfall and stream flow time series. J Hydrol 1994;153:23–52. [6] Puente CE, Obregon N. A deterministic geometric representation of temporal rainfall: results for a storm in Boston. Water Resour Res 1996;32(9):2825–39. [7] Sangoyomi TB, Lall L, Abarbanel HDI. Nonlinear dynamics of the Great Salt Lake: dimension estimation. Water Resour Res 1996;32(1):149–59. [8] Porporato A, Ridolfi L. Nonlinear analysis of river flow time sequences. Water Resour Res 1997;33(6):1353–67. [9] Liu Q, Islam S, Rodriguez-Iturbe I, Le Y. Phase-space analysis of daily streamflow: characterization and prediction. Adv Water Resour 1998;21:463–75. [10] Kawamura A, McKerchar AI, Spigel RH, Jinno K. Chaotic characteristics of the southern oscillation index time series. J Hydrol 1998;204:168–81. [11] Sivakumar B, Liong SY, Liaw CY, Phoon KK. Singapore rainfall behavior: chaotic?. J Hydrol Eng, ASCE 1999;4(1):38–48. [12] Stehlik J. Deterministic chaos in runoff series. J Hydraul Hydromech 1999;47(4):271–87. [13] Jayawardena AW, Gurung AB. Noise reduction and prediction of hydrometeorological time series: dynamical systems approach vs. stochastic approach. J Hydrol 2000;228:242–64. [14] Sivakumar B, Sorooshian S, Gupta HV, Gao X. A chaotic approach to rainfall disaggregation. Water Resour Res 2001;37(1):61–72. [15] Sivakumar B. Chaos theory in hydrology: important issues and interpretations. J Hydrol 2000;227(1–4):1–20. [16] Osborne AR, Provenzale A. Finite correlation dimension for stochastic systems with power-law spectra. Physica D 1989;35:357–81. [17] Pasternack GB. Does the river run wild? Assessing chaos in hydrological systems. Adv Water Resour 1999;23:253–60. [18] Liaw CY, Islam MN, Phoon KK, Liong SY. Comment on ‘‘does the river run wild? Assessing chaos in hydrological systems’’ by Pasternack, G.B. Adv Water Resour 2001;24(5):575–8. [19] Casdagli M. Nonlinear prediction of chaotic time series. Physica D 1989;35:335–56.
190
M.N. Islam, B. Sivakumar / Advances in Water Resources 25 (2002) 179–190
[20] Sugihara G, May RM. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 1990;344:734–41. [21] Theiler J. Statistical precision of dimension estimators. Phys Rev A 1990;41(6):3038–51. [22] Provenzale A, Osborne AR, Soj R. Convergence of the K2 entropy for random noises with power law spectra. Physica D 1991;47:361–72. [23] Rapp RE, Albano AM, Zimmermann ID, Jimenez-Montano MA. Phase-randomized surrogates can produce spurious identifications of non-random structure. Phys Lett A 1994;192:27–33. [24] Takens F. Detecting strange attractors in turbulence. In: Rand DA, Young LS, editors. Dynamical systems and turbulence. Lecture Notes in Mathematics, vol. 898. Berlin: Springer; 1980. p. 366–81. [25] Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980;45(9):712–6. [26] Havstad JW, Ehlers CL. Attractor dimension of nonstationary dynamical systems from small data sets. Phys Rev A 1989;39(2):845–53. [27] Holzfuss J, Mayer-Kress G. An approach to error-estimation in the application of dimension algorithms. In: Mayer-Kress G, editor. Dimensions and entropies in chaotic systems. New York: Springer; 1986. p. 114–22.
[28] Frazer AM, Swinney HL. Independent coordinates for strange attractors from mutual information. Phys Rev A 1986;33(2):1134– 40. [29] Liebert W, Schuster HG. Proper choice of the time delay for the analysis of chaotic time series. Phys Lett A 1989;141:386–90. [30] Schuster HG. Deterministic chaos. Weinheim: VCH; 1988. [31] Tsonis AA, Elsner JB. The weather attractor over very short timescales. Nature 1988;333:545–7. [32] Tsonis AA. Chaos: from theory to applications. New York: Plenum Press; 1992. [33] Grassberger P, Procaccia I. Measuring the strangeness of strange attractors. Physica D 1983;9:189–208. [34] Kennel MB, Brown R, Abarbanel HDI. Determining embedding dimension for phase space reconstruction using a geometric method. Phys Rev A 1992;45:3403–11. [35] Abarbanel HDI. Analysis of observed chaotic data. New York: Springer; 1996. [36] Abarbanel HDI, Lall U. Nonlinear dynamics of the Great Salt Lake: system identification and prediction. Climate Dynamics 1996;12:287–97. [37] Randle Inc., cspW, Tools for dynamics. Applied nonlinear sciences. Great Falls, VA: LLC, Randle Inc.; 1996. [38] Farmer DJ, Sidorowich JJ. Predicting chaotic time series. Phys Rev Lett 1987;59:85–848.