Journal of Hydrology 230 (2000) 244–257 www.elsevier.com/locate/jhydrol
Daily reservoir inflow forecasting using artificial neural networks with stopped training approach

P. Coulibaly a,*, F. Anctil b, B. Bobée c

a Department of Civil Engineering, Université Laval, Sainte-Foy, QC, Canada G1K 7P4
b Department of Civil Engineering, Centre de Recherche Géomatique, Université Laval, Sainte-Foy, QC, Canada G1K 7P4
c NSERC/Hydro-Quebec Chair in Statistical Hydrology (INRS-Eau), Sainte-Foy, QC, Canada G1V 4C7

* Corresponding author. Tel.: +1-418-656-3653; fax: +1-418-656-2928. E-mail address: [email protected] (P. Coulibaly).
Received 12 July 1999; received in revised form 25 January 2000; accepted 9 February 2000
Abstract

In this paper, an early stopped training approach (STA) is introduced to train multi-layer feed-forward neural networks (FNN) for real-time reservoir inflow forecasting. The proposed method takes advantage of both the Levenberg–Marquardt backpropagation (LMBP) algorithm and the cross-validation technique to avoid underfitting or overfitting in FNN training and to enhance generalization performance. The methodology is assessed using multivariate hydrological time series from the Chute-du-Diable hydrosystem in northern Quebec (Canada). The performance of the model is compared to benchmarks from a statistical model and an operational conceptual model. Since the ultimate goal is real-time forecast accuracy, the results show that, overall, the proposed method is effective for improving prediction accuracy. Moreover, it offers an alternative when dynamic adaptive forecasting is desired. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Real-time forecasting; Reservoir inflow; Artificial neural networks; Stopped training approach
1. Introduction

Accurate real-time forecasts of natural inflows to hydropower reservoirs are of particular interest for operation and scheduling. A variety of methods have been proposed for this purpose, including conceptual (physical) and empirical (statistical) models (WMO, 1994), but none of them can be considered a single superior model (Shamseldin, 1997). In a large-scale hydrosystem context, owing to the complexity of hydrological processes, there are many situations where accurate site-specific predictions remain a difficult task using linear recurrence relations or physically based watershed models. The former do not attempt to take into account the nonlinear dynamics of hydrological processes, and the latter generally ignore the stochastic behavior underlying any hydrosystem. Nonlinear statistical methods have been suggested and discussed (Jacoby, 1966; Amorocho and Brandstetter, 1971; Tong, 1990). Owing to the difficulties of formulating reasonable nonlinear watershed models, recent attempts have resorted to the artificial neural network (ANN) approach for complex hydrologic modeling (Saad et al., 1996; Clair and Ehrman, 1998; Jain et al., 1999; Coulibaly et al., 2000). ANNs are data dependent. They do not impose a functional relationship between the independent and dependent variables. Instead, the functional relationship is determined by the data in the training (or calibration) process.
Fig. 1. Simplified multi-layer feed-forward neural network with one hidden layer.
The advantage of such an approach is that a network with sufficient hidden units is able to approximate any continuous function to any degree of accuracy, provided that efficient training is performed (Cybenko, 1989; Hornik et al., 1989). In the hydrological forecasting context, recent experiments have reported that ANNs may offer a promising alternative for rainfall–runoff modeling (Zhu and Fujita, 1994; Smith and Eli, 1995; Hsu et al., 1995; Shamseldin, 1997; Sajikumar and Thandaveswara, 1999; Tokar and Johnson, 1999), streamflow prediction (Kang et al., 1993; Karunanithi et al., 1994; Thirumalaiah and Deo, 1998; Clair and Ehrman, 1998; Zealand et al., 1999; Campolo et al., 1999), and reservoir inflow forecasting (Saad et al., 1996; Coulibaly et al., 1998; Jain et al., 1999). Recently, Coulibaly et al. (1999) reviewed ANN-based modeling in hydrology over recent years and reported that about 90% of the experiments make extensive use of multi-layer feed-forward neural networks (FNN) trained by the standard backpropagation (BP) algorithm (Rumelhart et al., 1986). Although BP training has proved to be efficient in some applications, its convergence tends to be very slow and it often yields sub-optimal solutions (Baldi and Hornik, 1989; Mühlenbein, 1990; Sima, 1996). This may not be suitable for dynamic adaptive forecasting purposes. An FNN using the BP algorithm can simply fail to find a solution to even rather simple pattern classification problems (Mühlenbein, 1990). Recent attempts to use BP training for real-time flood prediction have led to relatively poor performance, particularly for peak flows (Khondker et al., 1998). A major objective of training an ANN for prediction is to generalize, i.e. to have the outputs of the network approximate target values given inputs that were not in the training set. Efficient backpropagation training requires at least some heuristic modifications (Jacobs, 1988; Tollenaere, 1990; Weigend et al., 1991; Yu and Chen, 1997) or some of the numerical optimization techniques fully described by Hagan et al. (1996, p. 12.8). The best-known BP variant, cascade-correlation (Fahlman and Lebiere, 1990), has shown promising river flow prediction skill (Karunanithi et al., 1994; Thirumalaiah and Deo, 1998); however, Smieja (1993) argued that it may generalize better when learning logical rather than natural problems.
This paper investigates the use of an early stopped training approach (STA) to improve multi-layer FNN training for real-time reservoir inflow forecasting. The idea of "early stopping" or "stopped training" before network convergence was first introduced by Nelson and Illingworth (1991, p. 165) to avoid the problem of overfitting in large FNN. Unfortunately, the stopping criterion used was not a good estimate of the generalization error (Sarle, 1995). Recently, Prechelt (1998) proposed an empirical early stopping criterion to improve the network generalization ability. In practice, this method requires a rather long training time. Here we use a generalization loss criterion to perform early stopped training with Levenberg–Marquardt backpropagation (LMBP). This study aims to find an acceptable trade-off between network training time and valid generalization ability in order to enhance prediction accuracy. In Section 2, the proposed method is presented. The selected forecast models are described in Section 3. Results from the forecast experiment are reported in Section 4, and finally some conclusions are drawn in Section 5.
2. Methodology

2.1. The FNN architecture

In general, the architecture of a multi-layer FNN can have many layers, where a layer represents a set of parallel processing units (or nodes). The three-layer FNN (Fig. 1) used in this study contains only one intermediate (hidden) layer. Multi-layer FNN can have more than one hidden layer; however, theoretical work has shown that a single hidden layer is sufficient for ANNs to approximate any complex nonlinear function (Cybenko, 1989; Hornik et al., 1989). Indeed, many experimental results seem to confirm that one hidden layer may be enough for most forecasting problems (Zhang et al., 1998; Coulibaly et al., 1999). Therefore, in our experiment, a one-hidden-layer FNN is used. It is the hidden layer nodes that allow the network to detect and capture the relevant pattern(s) in the data, and to perform complex nonlinear mapping between the input and the output variables. The sole role of the input layer of nodes is to relay the external inputs to the neurons of the hidden layer.
Hence the number of input nodes corresponds to the number of input variables (Fig. 1). The outputs of the hidden layer are passed to the last (or output) layer, which provides the final output of the network. The network's ability to learn from examples and to generalize depends on the number of hidden nodes. Too small a network (i.e. with very few hidden nodes) will have difficulty learning the data, while too complex a network tends to overfit the training samples and thus has poor generalization capability. Finding a parsimonious model for accurate prediction is particularly critical since there are no formal methods for determining the appropriate number of hidden nodes prior to training. Therefore, here we resort to the trial-and-error method commonly used for network design. The training algorithm used is presented hereafter.

2.2. Levenberg–Marquardt backpropagation training

In the prediction context, multi-layer FNN training consists of providing input–output examples to the network and minimizing the objective function (i.e. error function) using either a first-order or a second-order optimization method. This so-called supervised training can be formulated as minimizing, as a function of the weights, the sum of the nonlinear least squares between the observed and the predicted outputs, defined by
$$E = \frac{1}{2}\sum_{p=1}^{n}\sum_{k=1}^{m}\left(y_{pk} - \hat{y}_{pk}\right)^{2} \qquad (1)$$
where n is the number of patterns (observations), m the total number of output units, y represents the observed response ("target output") and $\hat{y}$ the model response ("predicted output"). In the case of one output unit (m = 1), Eq. (1) reduces to

$$E = \frac{1}{2}\sum_{p=1}^{n}\left(y_{p} - \hat{y}_{p}\right)^{2} \qquad (2)$$
which is the usual function minimized in least squares regression. In BP training, minimization of E is attempted using the steepest descent method and computing the gradient of the error function by applying the chain rule on the hidden layers of the FNN (Rumelhart et al., 1986, p. 318). Consider a typical multi-layer FNN (Fig. 1) whose hidden layer contains M neurons.
The network is based on the following equations:

$$\text{net}_{pj} = \sum_{i=1}^{N} W_{ji}\, x_{pi} + W_{j0} \qquad (3)$$

$$g(\text{net}_{pj}) = \frac{1}{1 + e^{-\text{net}_{pj}}} \qquad (4)$$
where $\text{net}_{pj}$ is the weighted input into the jth hidden unit, N the total number of input nodes, $W_{ji}$ the weight from input unit i to hidden unit j, $x_{pi}$ the value of the ith input for pattern p, $W_{j0}$ the threshold (or bias) for neuron j, and $g(\text{net}_{pj})$ the jth neuron's activation function, assuming that g( ) is the logistic function. Note that the input units do not perform any operation on the information but simply pass it on to the hidden nodes. The output unit receives a net input of
$$\text{net}_{pk} = \sum_{j=1}^{M} W_{kj}\, g(\text{net}_{pj}) + W_{k0} \qquad (5)$$

$$\hat{y}_{pk} = g(\text{net}_{pk}) \qquad (6)$$

where M is the number of hidden units, $W_{kj}$ represents the weight connecting hidden node j to output k, $W_{k0}$ is the threshold value for neuron k, and $\hat{y}_{pk}$ the kth predicted output. Recall that the ultimate goal of the network training is to find the set of weights $W_{ji}$, connecting the input units i to the hidden units j, and $W_{kj}$, connecting the hidden units j to output k, that minimize the objective function (Eq. (1)). Since Eq. (1) is not an explicit function of the weights in the hidden layer, the first partial derivatives of E are evaluated with respect to the weights using the chain rule, and the weights are moved in the steepest-descent direction. This can be represented mathematically as

$$\Delta W_{kj} = -\eta\, \frac{\partial E}{\partial W_{kj}} \qquad (7)$$

where η is the learning rate, which simply scales the step size. The usual approach in BP training consists in choosing η according to the relation 0 < η < 1. From Eq. (7) it is clear that BP can suffer from the inherent slowness and the local search nature of a first-order optimization method. However, BP remains the most widely used supervised training method for FNN because of the available remedies to its drawbacks. Overall, second-order nonlinear optimization techniques are usually faster and more reliable than any BP variant (Masters, 1995; Bertsekas and Tsitsiklis, 1996). Therefore, here we focus on LMBP for multi-layer FNN training. The LMBP uses the approximate Hessian matrix (second derivatives of E) in the weight update procedure as follows:

$$\Delta W_{kj} = -\left(H + \mu I\right)^{-1} J^{T} r \qquad (8)$$

where r is the residual error vector, μ a small variable scalar which controls the learning process, $J = \nabla E$ the Jacobian matrix, and $H = J^{T}J$ the approximate Hessian matrix, usually written as $\nabla^{2}E \approx 2J^{T}J$. In practice, LMBP is faster and finds better optima for a variety of problems than the other usual methods (Hagan and Menhaj, 1994). To improve the network training speed and efficiency, the LMBP is used with the early stopped training approach (STA).
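To make Eq. (8) concrete, the following Python sketch applies one Levenberg–Marquardt update to a generic weight vector. It is a minimal illustration, not the authors' implementation: the residual function is a stand-in, the Jacobian is approximated here by finite differences for brevity (in practice it is obtained by backpropagation), and μ is held fixed rather than adapted during training.

```python
import numpy as np

def lm_step(w, residual_fn, mu=0.01):
    """One Levenberg-Marquardt update (Eq. (8)):
    dw = -(J^T J + mu*I)^(-1) J^T r, with H = J^T J."""
    r = residual_fn(w)                     # residual error vector
    J = np.empty((r.size, w.size))
    eps = 1e-6
    for i in range(w.size):                # finite-difference Jacobian of r
        w_pert = w.copy()
        w_pert[i] += eps
        J[:, i] = (residual_fn(w_pert) - r) / eps
    H = J.T @ J                            # approximate Hessian matrix
    dw = -np.linalg.solve(H + mu * np.eye(w.size), J.T @ r)
    return w + dw
```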
2.3. Early stopped training approach (STA)

The main issue in training a multi-layer FNN for prediction is generalization performance. Multi-layer FNN, like other flexible nonlinear estimation methods such as kernel regression and smoothing splines, can suffer from either underfitting or overfitting. While a too complex FNN (i.e. with too many hidden nodes) is likely to fit the noise, leading to overfitting, an insufficiently complex network (i.e. with too few hidden nodes) can fail to detect the regularities in the data set, leading to underfitting. Underfitting produces excessive bias in the model outputs, whereas overfitting produces excessive variance. Here, in order to avoid underfitting and overfitting, STA is introduced with the LMBP training. STA allows the use of a complex network without overfitting, since training is stopped as soon as some nonzero criterion is met. Therefore, the efficiency of the method depends highly on the stopping criterion. A number of plausible stopping criteria have been reported to be superior to regularization methods (Finnoff et al., 1993). Here the stopping criterion used involves a trade-off between training time and generalization error, and proceeds as follows. The available data are split into three parts: (1) a training set, used to determine the network weights;
(2) a validation set, used to estimate the network performance and decide when to stop training; and (3) a prediction (or test) set, used to verify the effectiveness of the stopping criterion and to estimate the expected performance in the future. Recall that E is the objective function (Eq. (2)); then $E_{\mathrm{tr}}(t)$ represents the mean square error per example over the training set after epoch t, and $E_{\mathrm{val}}(t)$ the corresponding error on the validation set. As long as $E_{\mathrm{val}}(t)$ decreases, training continues. When the validation error starts to increase, training is stopped. The final result of the training is then the set of weights that exhibits the lowest validation error, which can be written as

$$E_{\mathrm{Low}}(t) = \min_{t' \le t} E_{\mathrm{val}}(t') \qquad (9)$$

where $E_{\mathrm{Low}}(t)$ is the lowest validation error obtained in epochs up to t and $E_{\mathrm{val}}(t')$ is the error on the validation set. The stopping criterion can be defined as

$$GR(t) = 100\left(\frac{E_{\mathrm{val}}(t)}{E_{\mathrm{Low}}(t)} - 1\right) \qquad (10)$$
where GR(t) is the percentage of generalization loss at epoch t. This corresponds to the relative increase of the validation error over the current lowest. In fact, Eq. (9) determines the optimum (or lowest error) on the validation set, while Eq. (10) estimates the relative increase of the validation error over the observed minimum (or optimum). Generalization loss is obviously a serious reason to stop training, and thereby avoid overtraining. As the generalization error is estimated by cross-validation with a hold-out set, this allows solutions to be compared and training to be stopped when the validation error is effectively at a minimum. Using STA with a second-order optimization method (LMBP) should reduce the training time, so the model could be retrained on-line to adapt to changing future events. Here, we do not require the training process to converge; rather, the training process is used to perform a direct search for a model with superior generalization performance. The performance of the method is assessed on a practical real-time forecasting problem in Section 4, and the results are compared to those of the following models.
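The stopped training procedure of Eqs. (9) and (10) can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: train_step and val_error stand for one LMBP epoch and the validation-set error, and the generalization-loss threshold gl_threshold is a hypothetical tuning value, as the text does not specify the value used.

```python
import numpy as np

def train_with_early_stopping(w0, train_step, val_error,
                              gl_threshold=5.0, max_epochs=500):
    """Early stopped training (STA): stop once the generalization loss
    GR(t) = 100*(E_val(t)/E_Low(t) - 1) (Eq. (10)) exceeds a threshold,
    and return the weights with the lowest validation error (Eq. (9))."""
    w, w_best = w0, w0
    e_low = np.inf                        # E_Low(t), Eq. (9)
    for t in range(max_epochs):
        w = train_step(w)                 # one LMBP epoch on the training set
        e_val = val_error(w)              # E_val(t) on the validation set
        if e_val < e_low:
            e_low, w_best = e_val, w.copy()
        if 100.0 * (e_val / e_low - 1.0) > gl_threshold:
            break                         # generalization loss too large: stop
    return w_best
```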
3. Description of selected forecast models

3.1. The conceptual model (PREVIS)

A conceptual model (PREVIS) maintained by the Energy Division of the Aluminum Company of Canada (Alcan) for real-time reservoir operation is selected. The PREVIS model is an extended version of the global conceptual model proposed by Kite (1978) for Canadian watersheds. The model structure incorporates interconnected conceptual storage systems that are found to contribute significantly to the generation of streamflow. The inputs to the PREVIS model are total precipitation (rainfall, estimated snowmelt) and temperature. In this study, the PREVIS model is calibrated using the same data as the statistical and FNN models presented hereafter. In order to improve prediction accuracy, a correction procedure is applied to the PREVIS model output as

$$\hat{y}_{\mathrm{corr}}(t) = \hat{y}(t) + \left[y(t-1) - \hat{y}(t-1)\right] \qquad (11)$$

where $\hat{y}_{\mathrm{corr}}(t)$ and $\hat{y}(t)$ are the corrected and predicted forecasts at time t, respectively, and $y(t-1)$ and $\hat{y}(t-1)$ are the observed and predicted values at time t − 1. The PREVIS model is used as a benchmark because it has been periodically updated for operational use in the study area since 1979.

3.2. The autoregressive moving average with exogenous inputs (ARMAX) model

The ARMAX models developed by Box and Jenkins (1976) have been widely used for hydrologic modeling. In this linear regression method, the outcome is related to a linear combination of the independent variables, assuming a functional form. In this case, the forecasted reservoir inflow $\hat{y}(t)$ is related to past outcomes $\hat{y}(t-i)$, inputs $x(t-j)$, and model errors $e(t-k)$, assuming a linear autoregressive moving average with exogenous inputs (ARMAX) model:

$$\hat{y}(t) = -\sum_{i=1}^{n_a} a_i\, \hat{y}(t-i) + \sum_{j=1}^{n_b} b_j\, x(t-j) + \sum_{k=1}^{n_c} c_k\, e(t-k) + e(t) \qquad (12)$$

where $n_a$, $n_b$ and $n_c$ are the numbers of past outputs, inputs and error terms, respectively; $a_i$, $b_j$ and $c_k$ are the model parameters to be estimated; and e(t) is the model error.
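For illustration, Eq. (12) amounts to dot products over lagged values. The sketch below is hypothetical (the coefficient vectors and lag buffers are illustrative names, and a single exogenous series stands in for the five series used later in Eq. (13)):

```python
import numpy as np

def armax_one_step(a, b, c, y_hat_lags, x_lags, e_lags):
    """One-step ARMAX forecast following Eq. (12); all lag arrays are
    ordered most recent first: y_hat_lags[0] = y_hat(t-1), etc."""
    return (-np.dot(a, y_hat_lags[:len(a)])   # autoregressive part
            + np.dot(b, x_lags[:len(b)])      # exogenous-input part
            + np.dot(c, e_lags[:len(c)]))     # moving-average (error) part
```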
Fig. 2. Selection of hidden layer nodes: (a) M = 7 for LMBP and (b) M = 24 for STA-LMBP.
In this study, the optimal ARMAX model identified by Ribeiro et al. (1998), using the same hydrological time series as considered here, is selected. This model is formulated as follows:

$$\hat{y}(t) = a_1\, \hat{y}(t-1) + b_{11} x_1(t) + b_{21} x_2(t) + b_{31} x_3(t) + \sum_{j=0}^{4} b_{4,j+1}\, x_4(t-j) + \sum_{j=0}^{4} b_{5,j+1}\, x_5(t-j) + e(t) \qquad (13)$$

where x1, x2 and x3 represent the maximum, minimum and mean temperature, and x4 and x5 are precipitation (rainfall) and snowmelt, respectively. Eq. (13) shows that the ARMAX model uses 14 exogenous inputs to predict the outcome $\hat{y}(t)$. The inputs x(t) are estimated over the lead time using a cross-correlation analysis. In order to enable daily adjustment of the estimated parameters, a Kalman filter is coupled with the ARMAX model, and the final model is referred to as ARMAX-KF (autoregressive moving average with exogenous inputs model with Kalman filter). The ARMAX-KF model uses a total of 17 parameters, whereas the PREVIS model requires 18 parameters.

3.3. The multi-layer feed-forward neural network (FNN) models

A difficult task with ANNs involves choosing parameters such as the number of hidden nodes, the learning rate η, and the initial weights. As discussed previously, there is no theory yet to tell how many hidden units are needed to approximate any given function.
The network geometry is problem dependent. Here, we use the three-layer FNN with one hidden layer (Fig. 1) and the common trial-and-error method to select the number of hidden nodes. The model structure can be represented by the notation FNN(N,M,m), where N is the number of input nodes, M the number of hidden nodes and m the number of nodes in the output layer. In this study, the 14 input variables (i.e. exogenous inputs) used in the ARMAX model (Eq. (13)) constitute the network inputs. To identify an appropriate FNN model, M is varied over the range 2–14 and, for each model, the fit to the calibration and validation data sets is evaluated using the root mean square error (RMSE). The validation mean RMSE over all the forecast lead times (1–7 days) is presented in Fig. 2. The optimum number of hidden nodes for the FNN with LMBP is M = 7 (Fig. 2(a)); therefore, the FNN(14,7,1) is chosen. Here it is found that, for a short-term (daily) forecast, there is no significant advantage in selecting a specific network structure for each forecast lead time. However, as indicated by Zealand et al. (1999), for a multi-week-ahead forecast it may be appropriate to use a specific network structure for each forecast lead time. With the early stopped training approach (STA), it is essential to use a large number of hidden units to avoid bad local optima (Sarle, 1995). In this case, M is varied over the range 15–30. Fig. 2(b) shows that FNN(14,24,1) appears to be the most efficient when the STA is used with the LMBP. Therefore, here the notations LMBP and STA-LMBP refer to FNN(14,7,1) and FNN(14,24,1), respectively. Finally, the selected models are tested on a prediction set.
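The trial-and-error selection just described can be expressed as a simple search over candidate hidden-layer sizes. In this sketch, train_fnn and validation_rmse are assumed stand-ins (not from the paper) for training an FNN(14, M, 1) and scoring it on the validation set:

```python
import numpy as np

def select_hidden_size(candidates, train_fnn, validation_rmse):
    """Trial-and-error network design: train one network per candidate
    hidden-layer size M and keep the size with the lowest validation
    RMSE (averaged over the 1-7 day lead times, as in Fig. 2)."""
    best_m, best_score = None, np.inf
    for m in candidates:                  # e.g. range(2, 15) for plain LMBP
        net = train_fnn(m)                # train FNN(14, m, 1)
        score = validation_rmse(net)      # mean RMSE on the validation set
        if score < best_score:
            best_m, best_score = m, score
    return best_m
```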
Note that the models are trained using the default training parameters (η = 0.01, Eq. (7); μ = 0.01, Eq. (8)) of the LMBP algorithm. Each model is initialized using very small random initial values. The logistic function (Eq. (4)) and a linear function are used as the hidden node and output node activation functions, respectively. The LMBP and the STA-LMBP, with a total of 22 and 39 nodes, respectively, have 113 and 385 weights and biases (or free parameters). To perform the daily inflow forecast, we use an iterative method. There are two ways of making a multi-step forecast: the iterative method, which uses one output node, and the direct method, which uses k output nodes for a k-step-ahead forecast. The latter uses only past information to directly forecast multi-step-ahead events, while the former relies on its own outputs and all useful past information to perform a single-step-ahead forecast iteratively until the k-step-ahead forecast is attained, as follows:
t; x
t ⫺ 1; …; x
t ⫺ n ^ ⫹ 1 f1 y
t; y
t ÿ ^ ⫹ 1; y
t; ^ x
t; x
t ⫺ 1; …; x
t ⫺ n ^ ⫹ 2 f2 y
t y
t .. .
ÿ ^ ⫹ k ⫺ 1; ^ ⫹ k fk y
t y
t ^ ⫹ k ⫺ 2; …; y
t; ^ x
t; x
t ⫺ 1; …; x
t ⫺ n y
t
14
where x(t) are the input variables at time t, $\hat{y}(t)$ is the forecast at time t, n denotes the number of input variables, and f1, …, fk are functions determined by the network. As can be seen from Eq. (14), the network forecasting method estimates a specific function for each forecast lead time, whereas the Box–Jenkins method (Eq. (13)) uses only a single function to predict one point and then iterates this function on its own outputs to predict future points. This can explain why the former may be better than the latter as the forecast lead time grows.
4. Experiment and results

4.1. Basic dataset

Data for the experiment were taken from the Chute-du-Diable watershed in northern Quebec. The watershed is 9700 km² in size and contains a large hydropower reservoir. The original data consist of 32 years (1964–1995) of daily natural inflows, precipitation (rain and snow) and estimated snowmelt, and daily maximum, minimum and mean temperature. We used 29 years (1964–1992) of daily records for model calibration and the last 3 years (1993–1995) for prediction. When the FNN is used with STA, the calibration set is split into two subsets: the first (1964–1980) is used for network training (or calibration), and the second (1981–1992) for validation. For real-time reservoir operation purposes, a forecast horizon of 1–7 days ahead is chosen. In this experiment, accurate forecasting of the spring water inflows is of particular interest; therefore, the models' prediction performance is evaluated over the spring period.

4.2. Model performance criteria

To evaluate the models' performance, four criteria are used. The correlation statistic (CORR) is selected to measure the linear correlation between the actual and the predicted water inflows. The optimal CORR value is unity, and a value smaller than 0.7 is considered problematic. The RMSE (root mean square error) is selected as the common performance measure, as it shows the global goodness of fit; an RMSE equal to zero corresponds to a perfect prediction. To estimate the efficiency of the fit, the Nash–Sutcliffe coefficient (NSC) (Nash and Sutcliffe, 1970), also known as the R² criterion, is used. The optimum NSC value is unity, and an NSC smaller than 0.7 corresponds to a very poor fit. As accurate forecasting of peak flows is of particular interest for hydropower reservoir operation, the peak flow criterion (PFC) (Ribeiro et al., 1998) is also considered. It can be specified as

$$\mathrm{PFC} = \frac{\left(\sum_{p=1}^{n_p} \left(y_p - \hat{y}_p\right)^{2} y_p^{2}\right)^{1/4}}{\left(\sum_{p=1}^{n_p} y_p^{2}\right)^{1/4}} \qquad (15)$$

where $n_p$ is the number of peak flows greater than one third of the mean observed peak flow. PFC provides an accurate measure of the model performance during flood periods; a PFC equal to zero represents a perfect fit.
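The four criteria are straightforward to compute. The sketch below is a minimal illustration; in particular, the peak_mask argument is an assumption about how the peaks "greater than one third of the mean peak flow" are selected, since the paper does not detail its peak-extraction step.

```python
import numpy as np

def corr(y, y_hat):
    return np.corrcoef(y, y_hat)[0, 1]        # linear correlation (CORR)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def nsc(y, y_hat):
    """Nash-Sutcliffe coefficient: 1 - SSE / variance about the observed mean."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def pfc(y, y_hat, peak_mask):
    """Peak flow criterion (Eq. (15)) over the selected peak flows."""
    yp, yhp = y[peak_mask], y_hat[peak_mask]
    num = np.sum((yp - yhp) ** 2 * yp ** 2) ** 0.25
    den = np.sum(yp ** 2) ** 0.25
    return num / den
```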
Table 1
Model RMSE (m³/s) and NSC (in parentheses) statistics averaged over the test period (1993–1995)

Model     | Forecast lead (day)
          | 1              | 2              | 3              | 4              | 5              | 6              | 7
PREVIS    | 97.02 (0.880)  | 106.45 (0.857) | 112.63 (0.841) | 118.31 (0.825) | 124.56 (0.804) | 130.95 (0.781) | 136.39 (0.760)
ARMAX-KF  | 64.14 (0.950)  | 99.92 (0.885)  | 129.58 (0.809) | 152.05 (0.740) | 170.13 (0.677) | 185.14 (0.618) | 198.45 (0.562)
LMBP      | 76.54 (0.992)  | 83.05 (0.908)  | 97.05 (0.870)  | 102.42 (0.861) | 110.07 (0.846) | 109.87 (0.849) | 109.75 (0.848)
STA-LMBP  | 74.05 (0.925)  | 77.59 (0.913)  | 89.96 (0.878)  | 91.80 (0.874)  | 92.23 (0.870)  | 96.65 (0.868)  | 96.67 (0.860)
4.3. Real-time forecast statistics

The RMSE and NSC statistics of the identified models for the three-year test period are summarized in Table 1. Examination of Table 1 indicates that, in general, ARMAX-KF provides the most accurate prediction for the 1-day-ahead forecast, whereas STA-LMBP is the best model for the 2–7-day-ahead forecasts. PREVIS and LMBP outperform the ARMAX-KF for medium-term (4–7 days) forecasts. Comparison of the results of STA-LMBP with those of the ARMAX-KF shows that the former performs significantly better than the latter for forecast lead times of 4–7 days in all 3 years of test data. In general, as can be seen from Table 1, the forecast error (RMSE) increases with the forecast lead time for all the models. However, the RMSE increases more rapidly for the ARMAX-KF and PREVIS models than for the ANN models. In general, an NSC value greater than 0.9 indicates a very satisfactory model performance, an NSC value in the range 0.8–0.9 indicates a fairly good model, and values less than 0.8 indicate an unsatisfactory model. It is interesting to note that, for the forecast lead time of 1 week, only the LMBP and STA-LMBP models have an NSC value in the range 0.8–0.9 on average (Table 1). This suggests that the proposed ANN-based models are acceptable models. Conversely, the PREVIS and ARMAX-KF models have NSC values less than 0.8 on average, suggesting that they are unsatisfactory models for a 1-week-ahead forecast. To examine the models' performance during the spring flooding period, the peak flow criterion (PFC) is used. The PFC is a better indicator than the RMSE of model performance during peak flow periods. The PFC statistics (Table 2) reveal that the four models are fairly equivalent for the 1-day-ahead forecast, except the PREVIS model, which is inferior. For forecast lead times of 2–7 days, the STA-LMBP and LMBP model performance is clearly superior for peak flow forecasting. STA-LMBP improves the peak flow forecast accuracy on average by 45 and 31% compared with ARMAX-KF and PREVIS, respectively. Whatever the forecast lead time, the PFC statistics (Table 2) are nearly identical for the LMBP and the STA-LMBP in all 3 years of test data. For further analysis of the models' prediction performance, Fig. 3 presents the CORR and NSC statistics for the test years 1993 and 1995. Clearly, the STA-LMBP model performs better than the others, except for the first-day forecast.
Table 2
Model PFC statistics averaged over the test period (1993–1995)

Model     | Forecast lead (day)
          | 1     | 2     | 3     | 4     | 5     | 6     | 7
PREVIS    | 0.185 | 0.195 | 0.200 | 0.204 | 0.208 | 0.212 | 0.216
ARMAX-KF  | 0.140 | 0.187 | 0.217 | 0.236 | 0.249 | 0.260 | 0.270
LMBP      | 0.145 | 0.146 | 0.149 | 0.147 | 0.149 | 0.148 | 0.147
STA-LMBP  | 0.145 | 0.142 | 0.145 | 0.144 | 0.147 | 0.146 | 0.145
Fig. 3. Forecast CORR and NSC statistics.
The CORR statistics range from 0.90 to 0.98 for both the LMBP and the STA-LMBP, and show a slow decrease with increasing forecast lead time. Conversely, for the ARMAX-KF model, the CORR decreases rapidly as the forecast horizon increases. As mentioned earlier, this lack of consistency with increasing forecast lead time may be due to the Box–Jenkins forecasting procedure (see Eqs. (13) and (14)). Fig. 3 also reveals that the PREVIS model behaves like the neural network models, but provides less accurate forecasts than the latter. To examine these results in more detail, Fig. 3 also provides the NSC statistics. For the two test years shown (1993 and 1995), the NSC performance is broadly similar for all the models except the ARMAX-KF. Overall, the STA-LMBP appears to be the best model for the medium-term (4–7 days ahead) forecast. Conversely, the ARMAX-KF is superior only for the 1-day-ahead forecast in the test year 1993. For the test year 1995, all the models are nearly equivalent for the 1-day-ahead forecast. Finally, the comparative performance in terms of RMSE provided in Fig. 4 suggests that using the STA-LMBP significantly improves the overall prediction accuracy. There is no systematic deterioration in the forecast skill as the forecast lead time grows, which may indicate the dynamic forecast skill and robustness of the STA-LMBP. Fig. 4 also indicates that the LMBP does not deteriorate in this case for peak flow prediction. However, it is important to note that the use of STA significantly reduces the network training time, by a factor of 4, and still provides better forecast accuracy than the LMBP used alone. Even using only the LMBP, the peak flow forecast accuracy is improved on average by a factor of 2 compared with the ARMAX-KF model. The PFC graphs (Fig. 4) show clearly that both PREVIS and ARMAX-KF have poor PFC performance, particularly for forecast lead times of 2–7 days. In terms of accuracy and consistency, as can be seen from Figs. 3 and 4, the overall performance of the STA-LMBP is better than that of the other models over the two test years shown.
Fig. 4. Forecast RMSE and PFC statistics.
To substantiate the performance statistics discussed above, forecasting results obtained with the STA-LMBP for the 1994 test period are presented in Figs. 5 and 6 for the 1-, 3-, 5- and 7-day-ahead forecasts. The computed hydrographs for the LMBP model are not shown here, as they are very similar to those of the STA-LMBP. However, as discussed earlier, the STA-LMBP generalizes more reliably than the LMBP, because the STA feature prevents the network from underfitting or overfitting in any case, whereas the generalization capability of the LMBP is highly case dependent and can suffer from underfitting or overfitting at any time. In general, the computed hydrographs (Figs. 5 and 6) show that the STA-LMBP model forecasts the magnitude and the timing of the peak flows better than it does the baseflow. The forecast skill deteriorates very slowly going from the 1-day-ahead to the 1-week-ahead forecast. The results for the 1- and 3-day forecasts (Fig. 5(a) and (b)) are quite good for the three peak flows in the test set, and the results for the 5- and 7-day forecasts (Fig. 6(c) and (d)) are quite acceptable for the peak flows. Conversely, for flows below 250 m³/s, the forecast accuracy is poorer whatever the lead time. However, the model is generally effective at forecasting the spring volume of flow and the magnitude of the peak flows for the different lead times (Table 3). For a lead time of 1 day, the forecasted spring volumes range from an underprediction of 17% to an overprediction of 25% on average, while the forecasted peak flows range from an underprediction of 4.5% to an overprediction of 2.5% on average (Table 3). For a lead time of 1 week, the forecasted spring volumes range from an underprediction of 27% to an overprediction of 20% on average, while the forecasted peak flows range from an underprediction of 4.3% to an overprediction of 3.8% on average. These results clearly confirm that the model is more effective at forecasting peak flows than baseflow.
Fig. 5. Forecasted and observed inflow hydrographs for spring 1994 using the STA-LMBP model: (a) 1-day ahead forecast; (b) 3-day ahead forecast.
5. Conclusions

In this paper, the FNN is trained using the early stopped training approach (STA) with a second-order optimization method (LMBP) for real-time hydrologic forecasting. From the results obtained, the proposed method appears to be an effective tool for daily real-time reservoir inflow forecasting. An important advantage of the STA over existing ANN pruning methods is that it is faster than training to complete convergence followed by pruning. Moreover, the use of the STA secures the network against overfitting or underfitting in any case. In the experiment considered, the use of the STA reduced the network training time by a factor of 4, and the test results obtained using the STA-LMBP indicate that the method can provide better and more reliable generalization performance than the second-order optimization method (LMBP) alone.
Table 3
Average percentage of underprediction and overprediction for spring 1994 using the STA-LMBP

Lead time (days) | Forecasted volumes                       | Forecasted peak flows
                 | Underprediction (%) | Overprediction (%) | Underprediction (%) | Overprediction (%)
1                | 17                  | 25                 | 4.5                 | 2.5
3                | 20                  | 29                 | 3.6                 | 2.9
5                | 32                  | 16                 | 3.9                 | 3.3
7                | 27                  | 20                 | 4.3                 | 3.8
Fig. 6. Forecasted and observed inflow hydrographs for spring 1994 using the STA-LMBP model: (c) 5-day ahead forecast; (d) 1-week ahead forecast.
The comparison of the results of the STA-LMBP with those of the ARMAX-KF and PREVIS models shows that, in general, the proposed method has substantially better prediction accuracy than these models. Moreover, the very low deterioration of the prediction performance with increasing forecast lead time suggests that the method may be a robust and adequate alternative for dynamic on-line forecasting. The present study provides a less computationally demanding and less costly way to apply the FNN model in a hydrologic modeling context, since there is no need for network convergence; therefore, the network can easily be retrained as new input information becomes available. However, in future applications in the hydrologic forecasting context, consideration should be given to testing STA with recurrent neural networks.
Acknowledgements

Financial support was provided through a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to the second author. This support is gratefully acknowledged. We also gratefully acknowledge Professor Jean Rousselle (Civil Engineering Department, Ecole Polytechnique of Montreal) and the Aluminum Company of Canada (Alcan) for providing the experimental data. The authors would like to thank two anonymous reviewers for their valuable comments and suggestions.

References

Amorocho, J., Brandstetter, A., 1971. Determination of nonlinear functional response functions in rainfall–runoff processes. Water Resour. Res. 7 (5), 1087–1101.
Baldi, P., Hornik, K., 1989. Neural networks and principal component analysis: learning from examples without local minima. Neural Networks 2 (1), 53–58.
Bertsekas, D.P., Tsitsiklis, J.N., 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, USA.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA, USA.
Campolo, M., Andreussi, P., Soldati, A., 1999. River flood forecasting with a neural network model. Water Resour. Res. 35 (4), 1191–1197.
Clair, T.A., Ehrman, J.M., 1998. Using neural networks to assess the influence of changing seasonal climates in modifying discharge, dissolved organic carbon, and nitrogen export in eastern Canadian rivers. Water Resour. Res. 34 (3), 447–455.
Coulibaly, P., Anctil, F., Bobée, B., 1998. Real-time neural network-based forecasting system for hydropower reservoirs. In: Miresco, E.T. (Ed.), Proceedings of the First International Conference on New Information Technologies for Decision Making in Civil Engineering, October 10–13, University of Quebec, Montreal, Canada, pp. 1001–1011.
Coulibaly, P., Anctil, F., Bobée, B., 1999. Prévision hydrologique par réseaux de neurones artificiels: état de l'art. Can. J. Civil Engng 26 (3), 293–304.
Coulibaly, P., Anctil, F., Rasmussen, P., Bobée, B., 2000. A recurrent neural networks approach using indices of low-frequency climatic variability to forecast regional annual runoff, submitted for publication.
Cybenko, G., 1989. Approximation by superposition of a sigmoidal function. Math. Control Signals Syst. 2, 303–314.
Fahlman, S.E., Lebiere, C., 1990. The cascade-correlation learning architecture. In: Touretzky, D. (Ed.), Advances in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, pp. 524–532.
Finnoff, W., Hergert, F., Zimmermann, H.G., 1993. Improving model selection by nonconvergent methods. Neural Networks 6, 771–783.
Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 5 (6), 989–993.
Hagan, M.T., Demuth, H.B., Beale, M., 1996. Neural Network Design. PWS Publishing Company, Boston, USA.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366.
Hsu, K.L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of the rainfall–runoff process. Water Resour. Res. 31 (10), 2517–2530.
Jacobs, R.A., 1988. Increased rates of convergence through learning rate adaptation. Neural Networks 1 (4), 295–308.
Jacoby, S.L.S., 1966. A mathematical model for nonlinear hydrologic systems. J. Geophys. Res. 71 (20), 4811–4824.
Jain, S.K., Das, D., Srivastava, D.K., 1999. Application of ANN for reservoir inflow prediction and operation. J. Water Resour. Planning Mgmt ASCE 125 (5), 263–271.
Kang, K.W., Park, C.Y., Kim, J.H., 1993. Neural network and its application to rainfall–runoff forecasting. Korean J. Hydrosci. 4, 1–9.
Karunanithi, N., Grenney, W.J., Whitley, D., Bovee, K., 1994. Neural networks for river flow prediction. J. Comp. Civil Engng ASCE 8 (2), 201–220.
Khondker, M.H., Wilson, G., Klinting, A., 1998. Application of neural networks in real-time flash flood forecasting. In: Babovic, V., Larsen, L.C. (Eds.), Proceedings of the Third International Conference on Hydroinformatics, August 24–25, Copenhagen, Denmark. A.A. Balkema, Rotterdam, pp. 777–782.
Kite, G.W., 1978. Development of a hydrologic model for a Canadian watershed. Can. J. Civil Engng 5 (1), 126–134.
Masters, T., 1995. Advanced Algorithms for Neural Networks: a C++ Sourcebook. Wiley, New York, USA.
Mühlenbein, H., 1990. Limitations of multi-layer perceptron networks—steps towards genetic neural networks. Parallel Computing 14, 249–260.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models. Part I—a discussion of principles. J. Hydrol. 10, 282–290.
Nelson, M.C., Illingworth, W.T., 1991. A Practical Guide to Neural Nets. Addison-Wesley, Reading, MA, USA.
Prechelt, L., 1998. Automatic early stopping using cross validation: quantifying the criteria. Neural Networks 11, 761–767.
Ribeiro, J., Lauzon, N., Rousselle, J., Trung, H.T., Salas, J.D., 1998. Comparaison de deux modèles pour la prévision journalière en temps réel des apports naturels. Can. J. Civil Engng 25, 291–304.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, Cambridge, MA, pp. 318–362.
Saad, M., Bigras, P., Turgeon, A., Duquette, R., 1996. Fuzzy learning decomposition for the scheduling of hydroelectric power systems. Water Resour. Res. 32 (1), 179–186.
Sajikumar, N., Thandaveswara, B.S., 1999. A non-linear rainfall–runoff model using an artificial neural network. J. Hydrol. 216, 32–55.
Sarle, W.S., 1995. Stopped training and other remedies for overfitting. Proceedings of the 27th Symposium on the Interface of Computing Science and Statistics, pp. 352–360.
Shamseldin, A.Y., 1997. Application of a neural network technique to rainfall–runoff modelling. J. Hydrol. 199, 272–294.
Sima, S., 1996. Back-propagation is not efficient. Neural Networks 9 (6), 1017–1023.
Smieja, F.J., 1993. Neural network constructive algorithms: trading generalization for learning efficiency? Circuits, Systems and Signal Processing 12, 331–374.
Smith, J., Eli, R.N., 1995. Neural-network models of rainfall–runoff process. J. Water Resour. Plan. Mgmt ASCE 121 (6), 499–508.
Thirumalaiah, K., Deo, M.C., 1998. Real-time flood forecasting using neural networks. Computer-Aided Civil Infrastruct. Engng 13 (2), 101–111.
Tokar, A.S., Johnson, P.A., 1999. Rainfall–runoff modeling using artificial neural networks. J. Hydrol. Engng ASCE 4 (3), 232–239.
Tollenaere, T., 1990. SuperSAB: fast adaptive backpropagation with good scaling properties. Neural Networks 3, 561–573.
Tong, H., 1990. Non-Linear Time Series: a Dynamical System Approach. Oxford University Press, Oxford, UK.
Weigend, A.S., Rumelhart, D.E., Huberman, B.A., 1991. Generalization by weight-elimination with application to forecasting. In: Lippmann, R.P., Moody, J.E., Touretzky, D.S. (Eds.), Advances in Neural Information Processing Systems 3. Morgan Kaufmann, San Mateo, pp. 875–882.
World Meteorological Organisation (WMO), 1994. Guide for hydrological practices. Geneva, Switzerland.
Yu, X.H., Chen, G.A., 1997. Efficient backpropagation learning using optimal learning rate and momentum. Neural Networks 10, 517–527.
Zealand, C.M., Burn, D.H., Simonovic, S.P., 1999. Short term streamflow forecasting using artificial neural networks. J. Hydrol. 214, 32–48.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. Int. J. Forecasting 14, 35–62.
Zhu, M.L., Fujita, M., 1994. Comparisons between fuzzy reasoning and neural network methods to forecast runoff discharge. J. Hydrosci. Hydraulic Engng 12 (2), 131–141.