A statistical model for minimum temperature forecasting at Dhahran, Saudi Arabia

Energy Vol. 16, No. 4, pp. 771-778, 1991. Printed in Great Britain. All rights reserved. 0360-5442/91 $3.00 + 0.00. Copyright © 1991 Pergamon Press plc

D. Y. ABDEL-NABI† and M. A. ELHADIDY

Energy Research Laboratory, Research Institute, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

(Received 12 April 1989; received for publication 2 July 1990)

Abstract: One-day forecasting techniques, based on classical statistical methods and general linear modelling, are described for the particular case of predicting the minimum temperature. The resulting models are designed to run in real time on a computer-driven automatic weather station. Two years of data have been used to develop the minimum temperature model. Variable selection is carried out by using Mallows' C_p criterion, as well as stepping regression methods. The problem of collinearity between the regressors is investigated. Model adequacy is examined by using the residual analysis technique. The final model involves parameters of mean air temperature, mean wind speed, mean total (global) solar radiation, mean relative humidity, and maximum relative humidity. Model validation was carried out by using an independent data set of 243 observations. During this period, the forecasts achieved a 67% hit rate and exhibited absolute and algebraic mean errors of 1.5 and 0.7°C, respectively. Further, no large forecast errors occurred. The model is a useful adjunct tool for subjective forecasting. This statistical forecasting technique is now being applied to nowcasting and 1/2 h forecasting of off-shore conditions.

INTRODUCTION

Purely statistical and mixed statistical/dynamical techniques are extensively employed in short-range weather prediction. Two well-known examples are the GEM (Generalized Equivalent Exponential Markov) and MOS (Model Output Statistics) approaches.¹ Multivariate statistical models have also been employed for medium to long time scales.² The possibility of combining subjective and objective forecasts to improve final forecast effectiveness has also been recognized.³,⁴ More recently, there has been a movement toward artificial-intelligence-like approaches, for example, the knowledge-based nowcasting approach of Ref. 5.

A major aim of the present project is to develop statistically based objective forecasting models for nowcasting and for 1- and 24-h forecasting of maximum and minimum temperatures. These models are designed for real-time application onboard an automatic weather station located at Dhahran. The station is computer-driven (Hewlett-Packard 9845B), data quality is strictly monitored, and measurements are made for a wide range of radiation and meteorological parameters. The station was established in July 1979, and its present configuration is detailed in Ref. 6.

BACKGROUND AND ENVIRONMENT

Dhahran (26.32°N, 50.13°E) is located a few kilometers inland from the Gulf and its climate, though arid (approximately 80 mm total annual precipitation), is significantly influenced by the Arabian Gulf waters. The Arabian Gulf extends from the Shatt Al Arab Waterway of southern Iraq to the Straits of Hormuz and exhibits a NW/SE-aligned major axis almost 1000 km in length with a mean depth of only about 35 m.⁷ However, the overall shallowness of the Gulf, combined with occurrence of the deepest waters near the Iranian coast against the Zagros mountain chain, greatly reduces thermal control of the regional weather by the Gulf.

†Present address: MIC-Dammam, P.O. Box 4927, Dammam 31412, Saudi Arabia.

The climate of the Dhahran area is discussed in Ref. 8. Throughout the period 1950-1976, the absolute maximum temperature recorded at Dhahran was 51°C and the absolute minimum was -1°C. Average daily maxima range between 21 and 42°C and exceed 40°C for 4 months of the year. Average daily minima vary between 12°C in December/January and 29°C in July/August and fall below 20°C only during the months November-March. For the present study, the daily minimum temperature at Dhahran is the dependent (predicted) variable; 2 years (1986 and 1987) of data from the Research Institute station have been used in formulating the model.

MODEL SELECTION

One of the forecast requirements of the Gulf region calls for a forecast to be issued at 1200 GMT (1500 local) valid for the following day. Thus, the dependent variable in the model is defined as the minimum temperature 1 day ahead (TN_p), and the independent variables are the maxima, means, and minima of meteorological and radiation parameters for the period up to 1200 GMT of the current day. Hence, the pre-selection predictive model corresponds to

$$\mathrm{TN}_p = F(\mathrm{TA}, \mathrm{WS}, \mathrm{WD}, \mathrm{WV}, \mathrm{BP}, \mathrm{RH}, \mathrm{TAX}, \mathrm{TAN}, \mathrm{WSX}, \mathrm{WSN}, \mathrm{RHX}, \mathrm{RHN}, \mathrm{THR}), \tag{1}$$

where TN_p = predicted minimum temperature (°C), TA = mean air temperature (°C), WS = mean wind speed (m/s), WD = mean wind direction (degrees from true north), WV = mean wind velocity vector (m/s), BP = mean barometric pressure (mmHg), RH = mean relative humidity (%), TAX = maximum air temperature (°C), TAN = minimum air temperature (°C), WSX = maximum wind speed (m/s), WSN = minimum wind speed (m/s), RHX = maximum relative humidity (%), RHN = minimum relative humidity (%), and THR = mean global radiation (W-day/m²).

For the present application, the wind direction is unsuitable for use in a linear regression model because of the discontinuity from 360 to 0°. To overcome this problem, the horizontal (U) and vertical (V) components of the mean wind velocity vector were used instead of the wind direction and wind-velocity vector.

Using a statistical package, a number of approaches were employed to make the final variable selection. Complete details of these approaches are provided in the Appendix. A preliminary normal probability analysis of residuals (i.e., the differences between observed and predicted minimum temperatures) revealed four minimum temperature values that could not be supported by extant meteorological events; these were identified as spurious and disregarded.
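The replacement of wind direction by vector components can be sketched as follows. This is a minimal illustration, not the authors' code: the sign convention (direction measured clockwise from true north and denoting where the wind blows from) and the function names `wind_components` and `mean_wind_components` are assumptions, since the paper does not state them.

```python
import math

def wind_components(speed, direction_deg):
    """Return (u, v) for a wind of the given speed blowing FROM direction_deg.

    Assumed convention (not stated in the paper): direction is measured
    clockwise from true north and denotes where the wind comes from.
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)  # eastward component
    v = -speed * math.cos(theta)  # northward component
    return u, v

def mean_wind_components(speeds, directions_deg):
    """Average the components of several observations (vector mean)."""
    pairs = [wind_components(s, d) for s, d in zip(speeds, directions_deg)]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n

# Two northerly winds on either side of the 360/0 discontinuity.
u, v = mean_wind_components([5.0, 5.0], [350.0, 10.0])

# Averaging the raw directions instead gives 180 degrees, i.e. a southerly wind.
naive_mean_direction = (350.0 + 10.0) / 2
```

The components vary smoothly through north, so a regression on U and V does not see the artificial jump that a regression on the raw direction would.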

MODEL ADEQUACY

As shown in the Appendix, the best correlation for predicting the minimum temperature at Dhahran, Saudi Arabia, is

$$\mathrm{TN}_p = -4.20 + 0.8963\,\mathrm{TA} - 0.0742\,\mathrm{WS} + 0.0069\,\mathrm{THR} + 0.0468\,\mathrm{RH} - 0.0249\,\mathrm{RHX}. \tag{2}$$

Several methods can be used⁹ to evaluate the model adequacy. In the present study, the residual analysis method is used. The residuals were calculated using the functional form of the multiple regression model, Eq. (2). A normal probability analysis of the residuals was used to check normality (Fig. 1). The points lie approximately along a straight line; small departures from normality are statistically acceptable.

A plot of the residuals vs the corresponding predicted values is useful in searching for several common types of model inadequacies. Such plots typically exhibit patterns such as horizontal bands, outward-opening funnels, double bows, and curved bands.⁹ The desirable pattern for a good model is one in which the residuals are contained within a horizontal band. A plot of residuals vs fitted values is shown in Fig. 2. There is no indication of any model defects. Measured and predicted minimum temperatures vs time are shown in Fig. 3 for the year 1986. The same agreement was found for the year 1987.
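As a rough illustration of how Eq. (2) and its residuals might be evaluated in software, consider the sketch below. The coefficients are those printed in Eq. (2); the function names and the input values are hypothetical and are not taken from the paper's data set.

```python
# Coefficients of Eq. (2) as printed in the paper.
COEFFS = {
    "intercept": -4.20,
    "TA": 0.8963,    # mean air temperature (deg C)
    "WS": -0.0742,   # mean wind speed (m/s)
    "THR": 0.0069,   # mean global radiation
    "RH": 0.0468,    # mean relative humidity (%)
    "RHX": -0.0249,  # maximum relative humidity (%)
}

def predict_tn(ta, ws, thr, rh, rhx):
    """Next-day minimum temperature (deg C) from Eq. (2)."""
    return (COEFFS["intercept"]
            + COEFFS["TA"] * ta
            + COEFFS["WS"] * ws
            + COEFFS["THR"] * thr
            + COEFFS["RH"] * rh
            + COEFFS["RHX"] * rhx)

def residuals(observed, predicted):
    """Residuals as defined in the text: observed minus predicted."""
    return [o - p for o, p in zip(observed, predicted)]

# Hypothetical inputs for one day (illustrative only, not station data).
tn_example = predict_tn(ta=30.0, ws=5.0, thr=350.0, rh=50.0, rhx=70.0)
res_example = residuals([26.0], [tn_example])
```

Plotting such residuals against the predicted values, or against expected normal quantiles, gives the diagnostics described above.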

Fig. 1. Normal probability plot for residuals (residual vs expected normal value). (+) Normal cumulative distribution; (*) residual cumulative distribution.

MODEL VALIDATION

Model validation is distinct from the assessment of model adequacy: validation deals with determining how the model functions in practice. In general, three different techniques can be used to validate a regression model.⁹ These techniques are: (i) collection of fresh data to investigate the predictive performance, (ii) analysis of the model coefficients and predicted values, including comparisons with prior experience, and (iii) use of a data-splitting technique. Of these three, the first is the most effective, but all may lead to useful insights.

Analysis of the model coefficients is best approached for meteorological applications by assessing the accuracy of meteorological predictions. For our model, the negative intercept is in line with TAN being below the mean air temperature TA, which is the dominant parameter in the model. The TA coefficient is positive and indicates a positive correlation between mean and minimum temperatures. The correlation between minimum temperature (TAN) and wind speed (WS) is negative. The positive correlation with global radiation (THR) and the net positive correlation with relative humidity (RH/RHX) are related to enhanced radiation heating during the day and reduced radiation cooling at night.

Fig. 2. Plot of residuals vs the predicted values TN_p. Horizontal axis: predicted minimum temperature (°C).

Fig. 3. Predicted (---) and measured (-) minimum temperatures vs the cumulative Julian day (JD) over the year 1986. Day 1 is 1 January 1986.

Five statistical parameters used in the model validation are defined in terms of the measured minimum temperature (TAN) and the predicted value (TN_p) as follows:

H_1 = percentage of hits, where a hit occurs when TAN and TN_p fall in the same 3°C-wide category;
H_2 = percentage of hits, where a hit occurs when |TAN - TN_p| < 1.5°C;
AE = absolute error, defined as the mean of |TAN - TN_p|;
GE = algebraic error, defined as the mean of (TAN - TN_p); and
LE = percentage of large errors, i.e., the percentage of occurrences of |TAN - TN_p| ≥ 6°C.

The H_1 temperature categories are -1 to 2, 3 to 6, 7 to 10, ..., 47 to 50°C, and temperatures are rounded to the nearest degree for H_1 determination only. These parameters were determined for the statistical model defined in Table A3 and for predictions made by trend persistence (TN_tp). The latter is defined such that the prediction for tomorrow is the minimum temperature of today plus the trend observed between yesterday and today, i.e.,

$$\mathrm{TN}_{tp} = \mathrm{TAN}_t + (\mathrm{TAN}_t - \mathrm{TAN}_{t-1}) = 2\,\mathrm{TAN}_t - \mathrm{TAN}_{t-1},$$

where t denotes the present day. While the TN_p forecast is issued at 1200 GMT, the TN_tp forecast can only be given at 2100 GMT (midnight local), thus giving a significantly reduced forecast interval for TN_tp.
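The validation statistics and the trend-persistence benchmark defined above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the dictionary keys, the function names, and the half-up rounding rule used for the category hit rate are choices made here, and the sample values are invented.

```python
import math

def category(temp_c):
    """Index of the temperature category (-1..2, 3..6, ..., 47..50 deg C).

    The temperature is first rounded to the nearest degree; half-up
    rounding is an assumption, the paper only says 'nearest degree'.
    """
    rounded = math.floor(temp_c + 0.5)
    return (rounded + 1) // 4

def validation_metrics(observed, predicted):
    """Hit rates, absolute/algebraic errors and large-error rate."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    cat_hits = sum(category(o) == category(p)
                   for o, p in zip(observed, predicted))
    return {
        "hit_category_pct": 100.0 * cat_hits / n,
        "hit_1p5_pct": 100.0 * sum(abs(e) < 1.5 for e in errors) / n,
        "AE": sum(abs(e) for e in errors) / n,
        "GE": sum(errors) / n,
        "LE_pct": 100.0 * sum(abs(e) >= 6.0 for e in errors) / n,
    }

def trend_persistence(tan_today, tan_yesterday):
    """Tomorrow's minimum = today's minimum plus the latest day-to-day trend."""
    return 2.0 * tan_today - tan_yesterday

# Invented sample: four observed minima and the matching forecasts (deg C).
metrics = validation_metrics([20.0, 25.0, 30.0, 10.0],
                             [19.0, 27.0, 23.0, 10.4])
```

The same `validation_metrics` function can be applied to the trend-persistence forecasts, which makes the comparison between the regression model and the benchmark direct.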

RESULTS

Figure 4 shows TAN and TN_p for each day of the validation interval; the results for the present prediction model, together with the results obtained using the GEM and MOS techniques, are shown in Table 1. The model gave very acceptable results, with a mean absolute error AE of 1.5°C and a hit rate of 67%. Relative to the GEM and MOS results, this AE is about half as large and the hit rate is about 40% better. Further, no large errors (LE) were recorded, while rates of 1.2 and 2.4% are listed for GEM and MOS, respectively.

The degradation of model performance between application inside and outside the domain of its definition is also small. Quantitatively, the comparison is: 69.7% hit rate within the domain, 67.1% outside; AE of 1.2°C inside, 1.5°C outside; no large errors in either case (Table 1).

The superiority of such models over persistence is well known,¹⁰ and only a limited comment need be made here: the AE for persistence is 40% greater than that for the model; the model hit rate is about 30% higher for both hit-rate measures; and LE is 4.1% for persistence as against zero for the model (Table 1).

Fig. 4. Measured minimum temperature (TAN) vs predicted (TN_p) over the independent test period. The units are °C.

Table 1. Summary of forecasts over the 243-day validation period. Also included are comparable numbers for the 2 yr of data used in formulating the model (in parentheses), and for trend-persistence forecasts over the 243-day validation interval. Additionally listed are results from Ref. 1 using the GEM and MOS techniques.

              AE (°C)      GE (°C)      LE (%)       H_1 (%)       H_2 (%)
Model         1.5 (1.2)    0.7 (0.0)    0.0 (0.0)    59.7 (69.0)   67.1 (69.7)
Persistence   2.1          0.7          4.1          46.1          51.9
GEM           2.8          -0.9         1.2          ----          47.4
MOS           3.0          -0.3         2.4          ----          47.7

CONCLUDING REMARKS

In this paper, the techniques being used to develop statistical forecast models have been laid out and the case of minimum temperature prediction at Dhahran, Saudi Arabia, has been treated in detail. The adequacy of the model was verified within the data domain from which the model was developed. The five-variable model for minimum temperature prediction is seen to satisfy all criteria of statistical adequacy. It is demonstrated, through application to independent data, that the five-variable model developed is an effective forecast tool, particularly when one is reminded that it is intended only as a real-time aid to forecasters currently employing strictly subjective techniques. It is particularly encouraging that there is very little degradation in performance between the domain of definition of the model and the independent operational interval.

Acknowledgement: This work is part of the KFUPM/RI project No. 12011 supported by the Research Institute of King Fahd University of Petroleum and Minerals.

REFERENCES

1. T. J. Petrone and R. G. Miller, World Meteorological Organization, Research Publication Series No. 9, Geneva, Switzerland (1985).
2. R. H. Maryon and A. M. Storey, J. Climatol. 5, 561 (1985).
3. R. T. Clemen and A. H. Murphy, Weath. Forecasting 1, 56 (1986).
4. R. T. Clemen and A. H. Murphy, Weath. Forecasting 1, 213 (1986).
5. R. C. McArthur, J. R. Davis, and D. Reynolds, J. Atmos. Oceanic Tech. 4, 29 (1987).
6. P. D. Kruss, V. Bahel, M. A. Elhadidy, and D. Y. Abdel-Nabi, ASHRAE Trans. 9, 3 (1989).
7. A. H. Meshal and H. M. Hassan, Arab Gulf Scient. Res. 4, 649 (1986).
8. R. O. Williams, Environmental Unit, Arabian American Oil Company, Dhahran, Saudi Arabia (1979).
9. D. C. Montgomery and E. A. Peck, Wiley, New York, NY (1982).
10. R. G. Miller, Proc. 7th Conf. on Probability and Statistics, Sciences of the American Meteorological Society, CA (1981).
11. D. A. Belsley, E. Kuh, and R. E. Welsch, Wiley, New York, NY (1985).

APPENDIX

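Model-Selection Procedure

Model-type

Replacing the wind direction and wind-velocity vector by the components of the wind-velocity vector U and V in Eq. (1) gives

$$\mathrm{TN}_p = F(\mathrm{TA}, \mathrm{WS}, U, V, \mathrm{BP}, \mathrm{RH}, \mathrm{TAX}, \mathrm{TAN}, \mathrm{WSX}, \mathrm{WSN}, \mathrm{RHX}, \mathrm{RHN}, \mathrm{THR}). \tag{A1}$$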

The necessary first step was to determine the relative contributions of linear, quadratic, and crossproduct expressions derived from the variables list of Eq. (A1). This analysis gave the values listed in Table A1 for the coefficient of multiple determination (R²), F-ratio, and model sum of squares (SS_model). The results presented in Table A1 show minimal model enhancement from the quadratic and crossproduct terms, and hence a multiple linear regression model is deemed the most appropriate for the present application.

Table A1. Contribution of linear, quadratic, and crossproduct terms for the regression model.

Model                      R²       F-ratio   SS_model
Linear regression          0.957    1480      41403
Quadratic regression       0.004    5.6       155
Cross-product regression   0.009    2.0       351

There are four variable selection procedures appropriate to the present application. A detailed description of these procedures can be found in Ref. 9. The results of applying the four procedures are summarized as follows.

Forward selection. Application of this procedure indicates a six-variable model with an R² of 0.957, Mallows' C_p of 4.4, and an F-ratio of 2666; the effective regressors are TA, THR, RH, TAN, WS, and RHX, in the order of insertion in the model.

Backward elimination. Applying this process suggests a seven-variable model with an R² of 0.957, Mallows' C_p of 4.7, and an F-ratio of 2285; the effective regressors are TA, RH, TAN, WSX, WSN, RHX, and THR.

Stepwise regression. In this case, the model obtained is identical to that obtained via the forward selection procedure.
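For readers unfamiliar with Mallows' C_p, which is quoted for the candidate models above, the standard textbook definition is C_p = SSE_p / s² - n + 2p, where SSE_p is the error sum of squares of the candidate model, s² is the error mean square of the full model, n is the number of observations, and p is the number of fitted parameters including the intercept. The sketch below simply codes that formula; the function name and the numbers are hypothetical and are not values from the paper.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows' C_p for a candidate subset model.

    sse_subset : error sum of squares of the candidate model
    mse_full   : error mean square of the model with all regressors
    n_obs      : number of observations
    n_params   : parameters in the candidate model, intercept included
    """
    return sse_subset / mse_full - n_obs + 2 * n_params

# Hypothetical candidate: 3 regressors plus intercept, 100 observations.
cp_candidate = mallows_cp(sse_subset=250.0, mse_full=2.5, n_obs=100, n_params=4)

# For the full model itself, SSE = mse_full * (n - p), so C_p equals p.
cp_full = mallows_cp(sse_subset=2.5 * (100 - 8), mse_full=2.5, n_obs=100, n_params=8)
```

A candidate whose C_p is small and close to its own parameter count is usually regarded as having little bias relative to the full model.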

Table A2. Correlation matrix.

         WS      THR     TAN     RH      RHX
TN_p    -0.06    0.75    0.97   -0.54   -0.41
TA      -0.05    0.75    0.99   -0.57   -0.42
WS       1.00    0.10   -0.01   -0.25   -0.39
THR              1.00    0.72   -0.67   -0.51
TAN                      1.00   -0.52   -0.41
RH                               1.00    0.87
RHX                                      1.00

STD      2.5     87.3    7.7     16.2    18.2
Mean     5.8     340.1   23.2    56.4    74.6

Mallows' C_p statistic. This criterion indicated the variables TA, THR, RH, TAN, WS, and RHX (C_p = 4.4) to be effective regressors, which are identical to the variables obtained from the forward selection and stepwise regression procedures.

The above four procedures indicated five variables for definite consideration (TA, THR, RH, TAN, and RHX) and three additional candidates of lesser contribution, namely WS, WSN, and WSX. The C_p criterion, forward selection, and stepwise regression are felt to be the stronger here, and hence WS is held for further consideration.

Collinearity diagnostics

If any selected regressor can be closely approximated by a linear relation with one or more of the other regressors in the model, then the affected estimates are unstable and have high standard errors. This collinearity (or multicollinearity) problem is not statistical in nature; it is a problem inherent in the data itself.¹¹ It is essential to investigate this problem with regard to the preliminarily selected variables listed above.

One method for the identification of simple collinearity is inspection of the off-diagonal elements of the correlation matrix shown in Table A2; collinearity exists if the absolute value of an element is near unity. Table A2 reveals a high correlation between mean temperature TA and minimum temperature TAN (0.99).

Another diagnostic criterion is based on variance-inflation factor (VIF) analysis, where VIF_i = 1/(1 - R_i²) and R_i² is the coefficient of multiple determination obtained when the ith explanatory variable is regressed on the remaining explanatory variables. A high VIF value points to collinearity. VIFs below 10 are statistically acceptable.⁹ The VIF values for the regressors TA, THR, RH, TAN, WS, and RHX are 84.5, 3.0, 7.6, 72.2, 1.4, and 5.6, respectively. The VIFs of TA and TAN are exceptionally high.

The correlation matrix and VIF results reveal that the simple collinearity between TA and TAN is sufficient to affect the accuracy with which the regression coefficients can be calculated. To overcome the collinearity problem, the statistical model is respecified by eliminating the regressor TAN.

Table A3. Results of the general linear models procedure for the five-variable model.

Source            DF     SS       MS      F-Value   R²      C.V.   RMSE
Model             5      41352    8270    3146      0.956   7.1    1.62
Error             723    1900     2.6
Corrected total   728    43254

Parameter   Estimate   T for H0: Parameter = 0   PR > |T|   STD Error of Estimate
Intercept   -4.20      -6.6                      0.0001     0.63
TA           0.8963    76.8                      0.0001     0.01
WS          -0.0742    -2.7                      0.0080     0.02
THR          0.0069     5.9                      0.0001     0.00
RH           0.0468     5.2                      0.0001     0.01
RHX         -0.0249    -3.4                      0.0007     0.01

Adj R² = 0.956
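The variance-inflation factors used in the collinearity diagnostics can be obtained directly from the correlation matrix of the regressors, since VIF_i equals the ith diagonal element of the inverse of that matrix. The sketch below is a minimal pure-Python illustration of this identity; the function names are hypothetical, and the two-regressor example uses only the TA/TAN correlation of 0.99 from Table A2, so its result differs from the six-regressor values of 84.5 and 72.2 reported in the text.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_from_correlation(corr):
    """VIF of each regressor: diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]

# Two regressors correlated at 0.99 (the TA/TAN value in Table A2).
vifs_pair = vif_from_correlation([[1.00, 0.99],
                                  [0.99, 1.00]])
```

For two regressors the result reduces to 1/(1 - r²), which is already well above the threshold of 10 quoted in the text.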

Repeating the above steps (i.e., the forward, backward, stepwise, and Mallows' C_p procedures) for the reduced parameter list resulted in the elimination of a further variable. The model thus obtained includes the effective regressors TA, WS, THR, RH, and RHX. The corresponding VIFs are, respectively, 2.5, 1.3, 2.9, 5.8, and 4.9; these values are all statistically acceptable.

General linear modeling

The recommended regressors obtained by using the indicated procedures were introduced into a general linear modelling procedure to determine the unknown coefficients of the model, as well as other statistical parameters; the results are summarized in Table A3. In this particular case, the significance levels PR > |T| for all of the regressors included are much less than 0.05 (the acceptable limit), indicating that all submitted regressors contribute significantly to the model.
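The summary statistics in Table A3 follow from the sums of squares and degrees of freedom by standard analysis-of-variance relations, which the sketch below recomputes. The function name is hypothetical; the inputs are the Model and Error rows of Table A3, and small differences from the printed values (for example in the F-value) are to be expected because the printed sums of squares are rounded.

```python
import math

def anova_summary(ss_model, ss_error, df_model, df_error):
    """Standard regression ANOVA quantities from sums of squares."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    r2 = ss_model / ss_total
    return {
        "MS_model": ms_model,
        "MS_error": ms_error,
        "F": ms_model / ms_error,
        "R2": r2,
        "adj_R2": 1.0 - (1.0 - r2) * df_total / df_error,
        "RMSE": math.sqrt(ms_error),
    }

# Model and Error rows of Table A3.
summary = anova_summary(ss_model=41352.0, ss_error=1900.0,
                        df_model=5, df_error=723)
```

Rounded to the precision of the table, this gives R² and adjusted R² of 0.956, an RMSE of 1.62, and an F-value within about one unit of the printed 3146.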