Copyright © IFAC Modelling and Control of National Economics, Beijing, PRC, 1992
THE COMBINATION OF FORECASTS: TWO EXTENSIONS

G. Libert*, Bao Liu** and Liang Wang**

*Department of Computer Science, Faculté Polytechnique de Mons, Rue de Houdain 9, 7000 Mons, Belgium
**Institute of Systems Engineering, Tianjin University, Tianjin 300072, PRC
Abstract. The combination of forecasts improves forecasting accuracy and should be used more frequently in practice. In this paper, two new combining methods, called recursive equal weighting (REW) and linear programming (LP), respectively, are introduced, and their forecasting performance is examined using several practical cases. It is shown that REW is comparable in forecasting accuracy to the optimal combining methods, and that LP is superior to all existing combining methods when outliers are present in the observations.

Keywords. Forecasting theory; combining forecasts; recursive equal weighting; linear programming; outliers.
INTRODUCTION
Bates and Granger (1969), in their pioneering work, showed that if a number of unbiased forecasts of a time series were available, it was rarely optimal to seek out the 'best' of the competing forecasts and use it alone. Rather, the forecasts could always be combined in such a way that the composite forecast had a variance less than or equal to that of any of the individual forecasts. In a later contribution (Newbold and Granger, 1974), the authors arrived at the following conclusion: 'It does appear ... that Box-Jenkins forecasts can frequently be improved upon by combination with either Holt-Winters or stepwise autoregressive forecasts, and we feel that our results indicate that in any particular forecasting situation combining is well worth trying, as it requires very little effort. Further improvement is frequently obtained by considering a combination of all three types of forecast'. Although their studies were repeatedly criticized by several distinguished statisticians at the time, today the principle of combining forecasts is accepted without controversy (Bunn, 1988). To date, nearly all empirical studies have borne out that combining produces more accurate forecasts than the individual models being combined. By combining, one retains more insight than is obtainable from the use of any individual model. Sometimes, two less expensive models can produce a better combined forecast than a third, more expensive model (Bopp, 1985). When models are mis-specified, a composite forecast can even provide a temporary stopgap until better-specified models are available (Longbottom and Holly, 1985).

Since combining forecasts has so many merits, any forecasting effort ought to consider such an approach seriously. Several methods for combining two or more forecasts into a composite forecast have been developed, among which equal weighting (EW) is the simplest. EW has performed well in a number of empirical studies (e.g., Makridakis and co-workers, 1982; Makridakis and Winkler, 1983), but it is theoretically unappealing. In order to overcome this imbalance, Wang and Liu (1988) proposed a recursive equal weighting (REW) method for combining forecasts. When the competing forecasts being combined exhibit different error variances, the combining weights given by REW are no longer equal, and the sample information about the individual forecasts is incorporated into the combined forecast in an implicit manner. In this paper, we further examine the forecasting performance of REW using several practical cases, showing that REW is indeed comparable to the theoretically optimal combining methods.

Recently, combining forecasts using regression-based methods has received more and more attention (e.g., see Granger and Ramanathan, 1984; Bopp, 1985; Clemen, 1986; Diebold and Pauly, 1987). This is partly because these methods may be more tractable than the variance-covariance method proposed by Bates and Granger (1969). In the regression-based methods, the combining weights are calculated by the ordinary least squares (OLS) estimator. If the noise on the historical data is Gaussian with known covariance, OLS gives the maximum-likelihood, minimum-variance estimate. In practice, however, the noise statistics are unlikely to be well known, and when one or more data points are greatly in error, the OLS estimator may give very poor results. In this paper we apply a linear programming (LP) method to the determination of the combining weights. Many practical examples show that this method greatly outperforms the regression-based methods and other weighted ones when outliers are present in the observations.
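As a numerical illustration of the Bates-Granger result quoted above (all numbers synthetic, not taken from the paper): given the error series of two unbiased forecasts with variances σ1², σ2² and covariance σ12, the variance-minimizing weight w = (σ2² − σ12)/(σ1² + σ2² − 2σ12) yields a composite whose error variance never exceeds that of the better individual forecast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic errors of two unbiased forecasts (hypothetical data, not from the paper).
e1 = rng.normal(0.0, 1.0, 200)
e2 = 0.5 * e1 + rng.normal(0.0, 1.2, 200)

# Sample moments with population normalization, so the variance identity is exact.
cov = np.cov(e1, e2, bias=True)
v1, v2, c12 = cov[0, 0], cov[1, 1], cov[0, 1]

# Bates-Granger variance-minimizing weight on the first forecast.
w = (v2 - c12) / (v1 + v2 - 2.0 * c12)

# Error of the composite forecast w*f1 + (1 - w)*f2.
e_comb = w * e1 + (1.0 - w) * e2

# The composite error variance is no larger than either individual variance.
print(f"var1={v1:.3f}  var2={v2:.3f}  var_comb={np.var(e_comb):.3f}")
```

Because w minimizes the quadratic w²σ1² + (1−w)²σ2² + 2w(1−w)σ12, and the weights w = 0 and w = 1 recover the individual forecasts, the minimum can be no worse than either one.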
COMBINING FORECASTS USING REW

Despite its theoretical sub-optimality, EW appears to perform well empirically. A possible reason for this is that EW requires no model fitting, which makes it more robust than other combining methods when only a limited history of data is available or when the pattern of forecast errors shows some nonstationarity.

Maintaining the advantage of no model fitting, but allowing for successive equal weighting of p separate forecasts, the REW method incorporates the sample information into the combined forecast in an implicit manner. As a result, the weights given by REW are no longer equal when the original forecasts exhibit different error variances.

The main steps of this method are outlined as follows:

(i) Calculate the mean squared error MSE_i (i = 1, ..., p) of each individual method using its error history (t = 1, ..., n-m), and sort the values in ascending order, MSE_1 < MSE_2 < ... < MSE_p.

(ii) Combine the p forecasts f_t^(1), ..., f_t^(p) (t = 1, ..., n-m) using equal weighting, obtaining the combined forecast f_t^(c) (t = 1, ..., n-m). If its mean squared error MSE_c is less than MSE_1, the combination is deemed 'optimal': combining is certain to improve forecasting accuracy; go to (iii). If MSE_c is greater than MSE_1 but less than MSE_p, the combination is deemed 'sub-optimal': combining will probably improve forecasting; go to (iii). If MSE_c is greater than MSE_p, the combination is deemed 'poor': combining cannot improve forecasting accuracy; go to (v).

(iii) If the difference between MSE_c and MSE_1 is less than a required threshold value, there is no need to continue combining. Take f_t^(c) (t = 1, ..., n-m) as the final combined result, compute the eventual combining weights, and go to (v). Otherwise, go to (iv).

(iv) Take f_t^(c) (t = 1, ..., n-m) as a primary forecast, eliminate the forecast that has the maximum mean squared error, and return to (ii).

(v) The recursive process is terminated.

It can easily be shown that this recursive process converges. If, during the course of the recursive session, either condition (a) MSE_c < MSE_1 or condition (b) MSE_1 < MSE_c < MSE_p is satisfied from beginning to end, we obtain a mean squared error sequence that decreases monotonically. Since this sequence is nonnegative, it must converge to a limit. If neither condition (a) nor (b) is satisfied consistently, and the two instead appear alternately, the mean squared error sequence lies between two nonnegative sequences that decrease in a partially monotonous manner (a sequence {x_1, ..., x_n} is defined as partially monotonically decreasing if its elements satisfy x_i >= x_{i+1} for i = 1, ..., n-1). By the squeeze theorem for limits, this sequence also converges.

COMBINING FORECASTS USING LP

When outliers are present in the observations, the OLS estimator may give very poor results: the erroneous observations are weighted according to the square of their residuals, and therefore have an exaggerated effect on the estimated weights. Instead of minimizing the sum of the squares of the forecasting errors, we estimate the combining weights by minimizing the sum of the absolute differences between the actual and predicted values:

    min J = C1' |e - Fβ|                                     (1)

where C1 is a vector of 1s with dimension (n-m), e is the vector of the forecast variable with dimension (n-m), β = (β_1, ..., β_p)' is the weight vector with dimension p, and

        [ f_1^(1)      ...  f_1^(p)     ]
    F = [   ...        ...    ...       ]
        [ f_(n-m)^(1)  ...  f_(n-m)^(p) ]

However, it is difficult to solve (1) directly, so let (elementwise)

    U =  (e - Fβ),  if e >= Fβ;   U = 0, otherwise
    V = -(e - Fβ),  if e <  Fβ;   V = 0, otherwise

Then (1) becomes

    min_(β,U,V) J = C2 (β', U', V')'                         (2)

subject to

    Fβ + U - V = e                                           (3)
    β, U, V >= 0

where C2 = [0 ... 0  1 ... 1  1 ... 1].
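As a sketch, the problem in (2)-(3) can be handed directly to an off-the-shelf solver. Everything below (the observation vector e, the forecast matrix F, and the use of scipy.optimize.linprog) is an illustrative assumption, not the original package; the weights are also constrained to be nonnegative and to sum to one, as required in the text.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical history: n - m = 8 observations from p = 2 individual forecasts.
e = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 45.0])  # last point is an outlier
F = np.array([[ 9.5, 10.4],
              [12.3, 11.6],
              [10.8, 11.3],
              [13.4, 12.5],
              [11.6, 12.3],
              [14.2, 13.7],
              [13.1, 12.8],
              [14.8, 15.2]])
n, p = F.shape

# Variables x = (beta, U, V); the cost counts only U and V, as in (2).
c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])

# Equality constraints: F beta + U - V = e from (3), plus sum-to-one on beta.
A_eq = np.block([[F, np.eye(n), -np.eye(n)],
                 [np.ones((1, p)), np.zeros((1, 2 * n))]])
b_eq = np.concatenate([e, [1.0]])

# All variables are nonnegative, which is linprog's default bound.
res = linprog(c, A_eq=A_eq, b_eq=b_eq)
beta = res.x[:p]
print("weights:", beta, " sum of |residuals|:", res.fun)
```

Because the objective is the sum of absolute residuals, the single outlying observation shifts the fitted weights far less than a least-squares fit of the same data would.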
Formulae (2) and (3) constitute a standard linear programming problem with (n-m) constraints and (p+2n-2m) variables. The dimensionality of the problem may be reduced by reformulating the equality constraint as an inequality. Since U must be nonnegative, equation (3) may be rewritten as

    Fβ - V <= e                                              (4)

with U = e - Fβ + V. Substituting for U in (2), merging like terms, and noting that the constant vector e may be omitted, we obtain

    min_(β,V) J = C3 (β', V')'

subject to (4), where

    C3 = [ -Σ_{t=1}^{n-m} f_t^(1), ..., -Σ_{t=1}^{n-m} f_t^(p), 2, ..., 2 ].

Furthermore, in order to guarantee the efficiency of the combined forecast, we constrain the weights to sum to one, that is, 1'β = 1, where 1 is a vector of 1s with dimension p. We thus obtain a linear programming problem with (n-m+1) constraints and (p+n-m) variables, which may be solved using the simplex algorithm or any of its variants (see Dantzig, 1963).

Whereas the OLS estimator gives undue weight to erroneous observations by squaring their residuals, the LP method incorporates only the modulus of each residual. The combining weights given by the LP method are therefore far less affected by outliers in a data set, since such observations are not as heavily weighted as in OLS estimation.

CASE STUDIES

In this section, the forecasting performance of the two proposed methods is examined using several practical cases. For comparison, some traditional combining methods are considered: 'equal weighting' (EW); 'optimal', i.e., error variance minimizing (VCA); 'optimal with independence assumption', i.e., error variance minimizing assuming zero correlation between individual forecast errors (VCB); and the three regression-based methods (named RA, RB and RC, respectively) presented by Granger and Ramanathan (1984). A package containing the above-mentioned methods has contributed towards this exercise (Wang and Libert, 1991). Three different accuracy measures favored by both academicians and practitioners (Carbone and Armstrong, 1982) are used as the evaluation criteria: mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).

Each case was analyzed in two ways. First, the entire series was used to form the optimal combining weights, with accuracy evaluated over the whole sample. Second, the first n terms of the series were used to estimate the weights, and the relative forecasting ability was then compared out of sample over the remaining m terms. As Granger and Ramanathan (1984) have pointed out, the latter is a much better evaluation procedure, although the short post-sample period means that the results can be misleading owing to sampling error.

Cherryoak Company Sales

Parker and Segura (1983) made forecasts of annual sales of the Cherryoak Company for the period 1952 to 1970 using a regression model (REGRE) and a linear extrapolation model (EXTRA). We analyzed the 19 data points using the various combining methods. A post-sample of the last 5 observations was used to illustrate the relative forecasting ability of these methods. In order to simulate the effect of outliers, the fifth data point, 156.33, was replaced by 6.33, and the tenth data point, 126.16, by 1126.16. Table 1 shows the results. It can be seen that REW greatly outperforms EW in all situations in terms of the three accuracy measures, and its forecasting performance is also comparable to the theoretically optimal combining methods, e.g., VCA, RA, RB, and RC. In the situation where outliers are present, VCA and the three regression-based methods give very poor results, while LP still gives quite an accurate solution.

Chemical Firm Sales

Reeves and Lawrence (1982) used a data base consisting of thirty consecutive monthly observations to illustrate their extended decision support system for combining multiple forecasts, given multiple objectives, using a multiple objective linear programming procedure. These data are typical of actual chemical product sales. In their example, forecasts were prepared from these data using three different models: exponential smoothing (EXP), harmonic smoothing (HARM) and multiple regression (MREG). We combined the three individual forecasts using the various combining methods, which led to the results shown in Table 2, in which the effect of bad data was illustrated by substituting the fifth observation, 1103, with 103 and the tenth observation, 1043, with 2043 in the original sample. Once again the LP method is unaffected by these bad data points, while VCA and the three regression-based methods are greatly in error. Moreover, REW is beaten by EW because of the presence of the outliers. In the situations where outliers are not present, however, REW is shown to be superior to EW and comparable to the theoretically optimal methods.

Hog Prices

Bessler and Brandt (1981) considered the combination of forecasts of quarterly hog prices from an econometric model, an ARIMA model and expert opinions, for the period 1976 Q1 to 1979 Q2. We analyzed the 14 data points according to the above procedure and obtained the results shown in Table 3. When there are no outliers in the observations, the forecasting performance of REW differs little from that of EW and the other combining methods. However, after two artificial bad data points are introduced (the fifth observation, 38.96, is replaced by 138.96 and the tenth observation, 47.85, by 7.85), REW is shown to be inferior to EW (despite its superiority to the theoretically optimal combining methods), which suggests that the robustness of REW is indeed lower than that of EW. This case also proves again that the presence of outliers has no significant effect on the LP estimation.

CONCLUSIONS

The combination of forecasts improves forecasting accuracy and should, therefore, be used more frequently in practice. Any new combining method which contributes toward this exercise will be a welcome addition to the forecaster's tool kit. The proposed REW method not only reaches a forecasting accuracy comparable to that of the theoretically optimal combining methods, but also provides a valuable insight into the methodology of combination: the combination of combined forecasts can further improve forecasting performance. This discovery may not be confined to the repeated use of equal weighting; rather, it may extend to other differential weighting methods. Take the Chemical Firm Sales case as an example: after combining the primary forecasts using RB and LP, we again combine the two combined forecasts using EW. Through this second combination the MSE of LP is reduced from 388.34 to 359.96, while the MSE of RB shows only a small increase (from 350.37 to 359.56). This is especially meaningful when we do not know in advance which combining method is the best one.

The occurrence of outliers has been shown to lead to inaccuracy in the OLS solution, while the proposed LP method can effectively eliminate their effect. Several case studies have consistently confirmed this.

Another interesting finding from the above cases is that VCB has performed well in all situations. This finding is consistent with previous studies (Newbold and Granger, 1974; Winkler and Makridakis, 1983).

Our conclusions may be quite limited because of the small samples considered. Simulation studies could overcome this problem. However, as Clemen (1986) has pointed out, for the purpose of making recommendations to practitioners, it seems more appropriate to focus on combining forecasts in as many real-world situations as possible.

REFERENCES

Bates, J.M. and C.W.J. Granger (1969). The combination of forecasts, Oper. Res. Quart., 20, 451-468.
Bessler, D.A. and J.A. Brandt (1981). Forecasting livestock prices with individual and composite methods, Appl. Econ., 13, 513-522.
Bopp, A.E. (1985). On combining forecasts: some extensions and results, Manage. Sci., 31, 1492-1498.
Bunn, D.W. (1988). Combining forecasts, Eur. J. Oper. Res., 33, 223-229.
Carbone, R. and J.S. Armstrong (1982). Evaluation of extrapolative forecasting methods: results of a survey of academicians and practitioners, J. Forecasting, 1, 215-217.
Clemen, R.T. (1986). Linear constraints and the efficiency of combined forecasts, J. Forecasting, 5, 31-38.
Dantzig, G.B. (1963). Linear Programming and Extensions, Princeton University Press, Princeton, NJ.
Diebold, F.X. and P. Pauly (1987). Structural change and the combination of forecasts, J. Forecasting, 6, 21-40.
Granger, C.W.J. and R. Ramanathan (1984). Improved methods of combining forecasts, J. Forecasting, 3, 197-204.
Longbottom, J.A. and S. Holly (1985). The role of time series analysis in the evaluation of econometric models, J. Forecasting, 4, 75-87.
Makridakis, S., A. Anderson, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, and R. Winkler (1982). The accuracy of extrapolative (time series) methods: results of a forecasting competition, J. Forecasting, 1, 111-153.
Makridakis, S. and R. Winkler (1983). Averages of forecasts: some empirical results, Manage. Sci., 29, 987-996.
Newbold, P. and C.W.J. Granger (1974). Experience with forecasting univariate time series and the combination of forecasts, J. Roy. Stat. Soc., Series A, 137, 131-164.
Parker, G.G.C. and E.L. Segura (1983). How to get a better forecast, in D.N. Dickson (ed.), Using Logical Techniques for Making Better Decisions, John Wiley and Sons, New York, pp. 435-452.
Reeves, G.R. and K.D. Lawrence (1982). Combining multiple forecasts given multiple objectives, J. Forecasting, 1, 271-279.
Wang, L. and G. Libert (1991). A package for combining forecasts, Technical Report, Faculté Polytechnique de Mons.
Wang, Y. and B. Liu (1988). A recursive equal weighting method for combining forecasts, Chinese J. Forecasting, 5, 18-22.
Winkler, R.L. and S. Makridakis (1983). The combination of forecasts, J. Roy. Stat. Soc., Series A, 146, 150-157.
APPENDIX: Results of Case Studies

TABLE 1. Forecasting Errors for Cherryoak Company Sales. Accuracy measures (MSE, MAE, MAPE) for the individual methods (REGRE, EXTRA) and the combining methods (EW, REW, VCA, VCB, RA, RB, RC, LP), over the whole sample (size 19) and over the partial sample (size 14, post-sample size 5), with and without the outliers.

TABLE 2. Forecasting Errors for Chemical Firm Sales. Accuracy measures (MSE, MAE, MAPE) for the individual methods (EXP, HARM, MREG) and the combining methods (EW, REW, VCA, VCB, RA, RB, RC, LP), over the whole sample (size 30) and over the partial sample (size 20, post-sample size 10), with and without the outliers.

TABLE 3. Forecasting Errors for Hog Prices. Accuracy measures (MSE, MAE, MAPE) for the individual methods (ECONO, ARIMA, EXPERTS) and the combining methods (EW, REW, VCA, VCB, RA, RB, RC, LP), over the whole sample (size 14) and over the partial sample (size 10, post-sample size 4), with and without the outliers.