Statistical procedures for the evaluation of evapotranspiration computing models


Agricultural Water Management 27 (1995) 365-371

Short Communication

C.P. Jacovides *, H. Kontoyiannis

Laboratory of Meteorology, Department of Applied Physics, University of Athens, Athens, Greece

Accepted 8 September 1994

Abstract

The widely used mean bias error (MBE) and root mean square error (RMSE), in combination with the t-statistic, are proposed as statistical indicators for the evaluation and comparison of evapotranspiration computing models. Using data published in Jacovides et al. (1988), it is demonstrated that the use of MBE and RMSE separately can lead to a wrong decision in selecting the best model from a suite of candidate models. The t-statistic should be used in conjunction with the MBE and RMSE errors to better evaluate a model’s performance. Finally, the t-statistic can be viewed as a supplement to the MBE and RMSE errors, helping modellers to determine whether or not a model’s estimates are statistically significant at a particular confidence level.

Keywords: Evapotranspiration; Statistical indicators; t-statistic

1. Introduction

In agricultural modeling research, scientists usually evaluate a model’s performance only by means of regression lines or correlation coefficients (R²), and occasionally by means of the standard deviation. The two most widely used statistical indicators in the literature dealing with environmental estimation models are the root mean square error (RMSE) and the mean bias error (MBE). These are defined as:

$$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N} d_i^{2}\right]^{1/2} \qquad (1)$$

$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N} d_i \qquad (2)$$

* Corresponding author.

0378-3774/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0378-3774(95)01152-S


where N is the number of data pairs and d_i is the difference between the i-th predicted and the i-th measured values (Kennedy and Neville, 1986). The RMSE provides information on the short-term performance of a model by allowing a term-by-term comparison of the actual difference between the predicted value and the measured value. The smaller the RMSE value, the better the model’s performance. However, this test does not differentiate between under- and over-estimation.

The MBE provides information on the long-term performance of a model. A positive value gives the average amount of over-estimation in the estimated values, and a negative value the average amount of under-estimation. The smaller the absolute value, the better the model’s performance.

It is obvious that the RMSE and MBE statistical indicators, if not used in combination with one another, may not be adequate indicators of a model’s performance. Moreover, because these indicators are dimensional, they do not allow model testing under a wide range of meteo-climatic conditions. To circumvent this problem, some authors non-dimensionalize the two indicators. The two most popular methods have been those of Page et al. (1979) and Davies et al. (1984). In the following, these procedures are presented briefly. The non-dimensional RMSE and MBE according to Page et al. (1979) are given by Eqs. (3) and (4), and those according to Davies et al. (1984) by Eqs. (5) and (6), where F_i is the i-th measured value.

Nevertheless, although these indicators generally provide a reasonable procedure for model comparison, they do not indicate objectively whether a model’s estimates are statistically significant. Thus, this analysis involves an additional statistical indicator: the t-statistic. The sampling distribution of the t-statistic is similar to the standard normal. For large N the t-distribution approaches the standardized normal distribution, while for small N the t-distribution is flatter and has higher tails than the normal one. Furthermore, this statistical indicator allows models to be compared and at the same time can indicate whether the model’s estimates are statistically significant at a particular confidence level. The t-statistic is defined through the MBE and RMSE errors as:

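$$t = \left[\frac{(N-1)\,\mathrm{MBE}^{2}}{\mathrm{RMSE}^{2}-\mathrm{MBE}^{2}}\right]^{1/2} \qquad (7)$$

(Kennedy and Neville, 1986; Walpole and Myers, 1989).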
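The three indicators can be computed directly from paired predicted and measured values. The following is a minimal sketch, assuming the standard definitions of MBE and RMSE and the t-statistic expressed through them; the function names and the four sample values are illustrative only and do not come from the paper.

```python
import math


def mbe(predicted, measured):
    """Mean bias error: average of the differences d_i = predicted_i - measured_i."""
    d = [p - m for p, m in zip(predicted, measured)]
    return sum(d) / len(d)


def rmse(predicted, measured):
    """Root mean square error of the differences d_i."""
    d = [p - m for p, m in zip(predicted, measured)]
    return math.sqrt(sum(x * x for x in d) / len(d))


def t_statistic(predicted, measured):
    """t = sqrt((N - 1) * MBE^2 / (RMSE^2 - MBE^2)).

    Undefined (division by zero) when every difference d_i is identical,
    because then RMSE^2 equals MBE^2.
    """
    n = len(predicted)
    b = mbe(predicted, measured)
    r = rmse(predicted, measured)
    return math.sqrt((n - 1) * b ** 2 / (r ** 2 - b ** 2))


# Hypothetical data: differences are [1, -1, 1, 1]
pred = [2.0, 4.0, 6.0, 9.0]
meas = [1.0, 5.0, 5.0, 8.0]
print(mbe(pred, meas), rmse(pred, meas), t_statistic(pred, meas))  # 0.5 1.0 1.0
```

Note that RMSE² − MBE² is the variance of the differences, so t measures the size of the bias relative to the scatter around it.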


The t-statistic is chosen as an indicator for hypothesis testing in preference to the correlation coefficient R² because it is more informative, as it combines both the MBE and RMSE values. Furthermore, the smaller the value of t, the better the model’s performance. In order for the model’s estimates to be statistically significant, a critical t value has to be determined. The critical t value, t_{α/2}, is obtained from standard statistical tables and depends on the level of significance (α) and the degrees of freedom (N − 1). In order for the model’s estimates to be judged statistically significant at the 1 − α confidence level, the calculated t value must be less than the critical t value. The level of significance can vary between 0 and 1, but is usually 0.005 or 0.01 (Rees, 1989). The significance level α is the probability of rejecting the null hypothesis when it is true, while the confidence level 1 − α is the probability that the null hypothesis is not rejected when it is true. In the present study the level of significance is chosen to be α = 0.005, so that the corresponding critical t value, as obtained from the statistical tables, is t = 2.66 for N − 1 degrees of freedom.

(It must be noted here that when the manuscript was in its final form, a paper by Stone (1993) concerning the evaluation of solar radiation estimation models through the t-statistic came to the authors’ attention. That author demonstrated that, at least in insolation modeling, the use of the RMSE and MBE in conjunction with the t-statistic produces a better evaluation of a model’s performance.)
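The decision rule described above can be sketched in a few lines. This is an illustration rather than the authors’ procedure: the helper names are hypothetical, the critical value 2.66 is the one quoted in the text, and N = 70 is assumed from the number of observations reported in the application section. The MBE and RMSE used are those reported for Penman’s model.

```python
import math


def t_from_summary(mbe, rmse, n):
    """t-statistic computed from MBE, RMSE and the number of data pairs n."""
    return math.sqrt((n - 1) * mbe ** 2 / (rmse ** 2 - mbe ** 2))


def passes_t_test(t_calc, t_crit):
    """Rule stated in the text: the calculated t must be below the critical t."""
    return t_calc < t_crit


T_CRIT = 2.66  # critical value quoted in the text for alpha = 0.005

# Penman's model: MBE = 1.41 W m^-2, RMSE = 5.79 W m^-2; n = 70 is an assumption.
t_penman = t_from_summary(1.41, 5.79, 70)
print(round(t_penman, 2), passes_t_test(t_penman, T_CRIT))
```

The value obtained this way is close to, but not identical with, the t = 2.07 reported for Penman’s model, presumably because the published MBE and RMSE are rounded.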

2. Application of the t-statistic

The procedures can best be illustrated by an example. Therefore, some calculations are repeated from Jacovides et al. (1988) concerning the precision of four models estimating evapotranspiration over a water surface, namely the Penman, Penman-Monteith, Bowen ratio and bulk aerodynamic methods. Jacovides et al. (1988) reported data on latent heat flux (LE) taken over Lake Washington, Seattle. The LE values were obtained through eddy-correlation instrumentation. There are 70 independent observed data points, therefore resulting in 70 degrees of freedom.

Figs. 1-4 present the pairs of (observed, predicted) LE values for the models of Penman, Penman-Monteith, Bowen ratio and bulk aerodynamic, respectively. In Jacovides et al. (1988) the accuracy of the models was evaluated via scatter diagrams, as shown in Figs. 1-4. Furthermore, Table 1 shows the regression constants a and b, the correlation coefficients R², the MBE and RMSE errors, and the t-statistics for the four models concerned. Jacovides et al. (1988) concluded the following: (1) the bulk aerodynamic method performed best in estimating evapotranspiration; (2) the Penman’s model shows a trend of over-estimating potential evapotranspiration; (3) the Penman-Monteith’s model and the Bowen’s ratio method give rather satisfactory results. Table 1 leads to similar conclusions with respect to the regression constants a and b and R². Reviewing Figs. 1-4 with respect to the scatter of points, it is clear that the aerodynamic method performed best. Penman-Monteith’s, Penman’s and Bowen’s models follow in that order.

Fig. 1. Eddy correlation measurements of latent heat flux (LE) are compared with the predicted LE values from the Penman’s model. [Scatter plot, axes in W m⁻²; R² = 0.961, MBE = 1.41 W m⁻², RMSE = 5.79 W m⁻², t-statistic = 2.07.]

Fig. 2. Eddy correlation measurements of latent heat flux (LE) are compared with the predicted LE values from the Penman-Monteith’s model. [Scatter plot, axes in W m⁻²; Y = 0.975X + 2.123, R² = 0.986.]

Fig. 3. Eddy correlation measurements of latent heat flux (LE) are compared with the predicted LE values from the Bowen’s model. [Scatter plot, axes in W m⁻²; Y = 0.876X + 7.427, R² = 0.926, MBE = 0.65 W m⁻², RMSE = 6.32 W m⁻², t-statistic = 0.92.]

Fig. 4. Eddy correlation measurements of latent heat flux (LE) are compared with the predicted LE values from the aerodynamic method. [Scatter plot, axes in W m⁻²; Y = 0.978X + 1.519, R² = 0.991, MBE = −0.32 W m⁻², RMSE = 2.69 W m⁻², t-statistic = 0.97.]

Table 1
Performance of models using regression constants a and b, R², MBE and RMSE errors, and t-statistic

Models               a       b         R²      MBE       MBE       MBE      RMSE      RMSE    RMSE    t      t_crit
                             (W m⁻²)           (W m⁻²)   (%)       (%)      (W m⁻²)   (%)     (%)
Aerodynamic method   0.978   1.519     0.991   −0.32     −0.0084   −0.004   2.69      0.057   0.039   0.97   2.66
Penman               0.963   3.339     0.961   1.41      0.0318    0.021    5.79      0.117   0.086   2.07   2.66
Penman-Monteith      0.975   2.123     0.986   0.79      0.0183    0.014    3.39      0.076   0.051   1.97   2.66
Bowen ratio          0.876   7.427     0.926   0.65      0.0155    0.012    6.32      0.124   0.109   0.92   2.66

From Table 1, it is obvious that the bulk aerodynamic model performed best with respect to MBE, as its MBE value is the lowest in absolute terms. The Bowen’s model follows with respect to MBE values, while the Penman-Monteith’s and Penman’s models follow in that order. Interestingly, Table 1 also indicates that the bulk aerodynamic method performed best with respect to RMSE, as its RMSE value is the lowest. Accordingly, Penman-Monteith’s, Penman’s and Bowen’s models follow in that order. According to the t-statistic, however, the Bowen’s model performed slightly better than the bulk aerodynamic model. For practical purposes, therefore, the distinction between the bulk aerodynamic and Bowen’s model is negligible, and the use of the RMSE and MBE separately could be considered adequate in determining the best model.

Furthermore, in the case of the Penman-Monteith’s and Penman’s models, the former performed better than the latter with respect to MBE and RMSE values. However, the Penman-Monteith’s model performed only slightly better than the Penman’s model according to the t-statistic. Nevertheless, both models produced estimates that are statistically significant. This example clearly indicates that relying on the RMSE and MBE used separately can lead to a wrong decision in selecting the best model from a suite of candidate models. Although the bulk aerodynamic and the Bowen’s models produced the best performance, the Penman-Monteith’s and the Penman’s models nevertheless work satisfactorily, as the values they predict are statistically significant.

In general, the four models considered in Jacovides et al. (1988) produced estimates that are statistically significant at the chosen confidence level, i.e. 1 − α = 99.5%, as the calculated t values are less than the critical t value (2.66). It is clear that no model has the lowest value of every indicator: a model that ranks best on one indicator may rank lower on another. In other words, the RMSE and MBE alone are not adequate indicators of model performance in evaporation studies. The t-statistic should be used in conjunction with these two indicators to better evaluate a model’s performance. The performance of the Bowen’s model is an effective example of the above: although this model produces the worst scatter diagram, the highest RMSE value and the lowest correlation coefficient, it has the smallest t-statistic of the four models.
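The disagreement between the indicators can be made explicit by ranking the four models on each one in turn. The sketch below uses the MBE, RMSE and t values as reported in Table 1; the dictionary layout and the helper name `ranking` are illustrative choices, not part of the original analysis.

```python
# (MBE, RMSE, t) for each model, as reported in Table 1 (MBE and RMSE in W m^-2)
TABLE_1 = {
    "Aerodynamic": (-0.32, 2.69, 0.97),
    "Penman": (1.41, 5.79, 2.07),
    "Penman-Monteith": (0.79, 3.39, 1.97),
    "Bowen ratio": (0.65, 6.32, 0.92),
}


def ranking(key):
    """Return model names ordered from best (smallest key) to worst."""
    return sorted(TABLE_1, key=key)


by_mbe = ranking(lambda m: abs(TABLE_1[m][0]))  # smaller |MBE| is better
by_rmse = ranking(lambda m: TABLE_1[m][1])      # smaller RMSE is better
by_t = ranking(lambda m: TABLE_1[m][2])         # smaller t is better

for label, order in (("|MBE|", by_mbe), ("RMSE", by_rmse), ("t", by_t)):
    print(f"{label:>5}: " + " < ".join(order))
```

The three orderings differ: the Bowen ratio method is last on RMSE but first on t, which is the behaviour the text describes.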

Acknowledgements

The micrometeorological data used in this analysis were provided by the Department of Atmospheric Sciences, University of Washington, Seattle. Professor Dr. Kristina Katsaros is gratefully acknowledged.

References

Davies, J.A., Abdel-Wahab, M. and MacKay, D.C., 1984. Estimating solar irradiation on horizontal surfaces. Int. J. Solar Energy, 2: 405-409.

Jacovides, C.P., Papaioannou, G. and Kerkides, P., 1988. Micro and large-scale parameters evaluation of evaporation from a lake. Agric. Water Manage., 13: 263-272.

Kennedy, J.B. and Neville, A.M., 1986. Basic Statistical Methods for Engineers and Scientists. 3rd Edn., Harper and Row, New York.

Page, J.K., Rodgers, G.G., Souster, C.G. and Le Sage, S.A., 1979. Predetermination of irradiation on inclined surfaces for different European centres. Final Report, Vol. 2, EEC Solar Energy Programme, Project F, Sheffield.

Rees, D.G., 1989. Essential Statistics. 2nd Edn., Chapman and Hall, London.

Stone, R.J., 1993. Improved statistical procedure for the evaluation of solar radiation estimation models. Solar Energy, 51: 289-291.

Walpole, R.E. and Myers, R.H., 1989. Probability and Statistics for Engineers and Scientists. 4th Edn., Macmillan, New York.