International Journal of Forecasting 16 (2000) 101–109
www.elsevier.com/locate/ijforecast
Does updating judgmental forecasts improve forecast accuracy?

Marcus O'Connor (a), William Remus (b,*), Kenneth Griggs (c)

(a) University of New South Wales, Sydney, NSW, Australia
(b) University of Hawaii, 2404 Maile Way, Honolulu, HI 96822, USA
(c) California Polytechnic, San Luis Obispo, CA, USA
Abstract

This study investigates whether updating judgmental forecasts of time series leads to more accurate forecasts. The literature is clear that accurate contextual information will improve forecast accuracy. However, forecasts are sometimes updated when purely temporal information, such as the most recent time series value, becomes available. The key assumption in the latter case is that forecast accuracy improves as one gets closer in time to the event to be forecast; that is, accuracy improves as new time series values become available. There is evidence both to support and to question this assumption. To examine the impact of temporal information on forecast accuracy, an experiment was conducted. The experiment found improved forecast accuracy from updating time series forecasts when new temporal information arrived if the time series was trended. However, there appeared to be no value in updating time series forecasts when the time series were relatively stable. © 2000 International Institute of Forecasters. Published by Elsevier Science B.V. All rights reserved.

Keywords: Judgment; Time series; Rolling forecasts
1. Introduction

At companies around the world, monthly meetings are convened to forecast the sales of key products for many months into the future. At these monthly forecasting meetings, previously made forecasts are updated and new forecasts are made. The assumption underlying this updating is that revising a forecast made some time ago to reflect new information will improve the accuracy of the final forecasts. For example, a forecast for December's sales made in February (a 10 month ahead forecast) should be less accurate than a forecast for the same month made in November (a 1 month ahead forecast). This paper examines this underlying assumption in a laboratory study.

*Corresponding author. Tel.: +1-808-956-7608; fax: +1-808-956-9889. E-mail address: [email protected] (W. Remus)
2. Background

The updating of forecasts described above is typical of product forecasting environments. Lawrence, O'Connor and Edmundson (1999) examined the process of forecasting in consumer product organizations in Australia.
Across the organizations the forecast horizons varied, but in many cases a system of 'rolling forecasts' existed. For some companies, only two forecasts for each month were required. For others, 12 months of rolling forecasts were in evidence. Often the differences between the companies resulted from factors such as raw material purchasing cycles and the need to plan for major promotions in the medium term.

In product forecasting, the presumption of improved accuracy with reduced lead-time is often justified by the assumption of the increasing relevance of information as the event comes closer to realization. Is it true that the latest forecast in the rolling forecast sequence is the most accurate? Some evidence for this can be gained from an examination of the way in which forecast accuracy deteriorates as the forecast horizon increases.
Fig. 1 presents the Mean Absolute Percentage Error (MAPE) for selected statistical methods (Makridakis et al., 1982) and judgmental methods (Lawrence, Edmundson & O'Connor, 1985) used for forecasting time series. As shown in Fig. 1, forecast accuracy deteriorates as the forecast horizon increases. This is true for all of the forecasting methods shown. Similar graphs characterize the forecasts in Fildes, Hibon, Makridakis and Meade (1998).

Fig. 1. Mean Absolute Percentage Error (MAPE) for selected statistical methods and judgmental methods used for forecasting time series.
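As an aside for readers who wish to reproduce such accuracy curves, the following sketch shows how MAPE is computed at each forecast horizon. This is our illustration in Python with made-up numbers, not the M-competition or Lawrence et al. data.

```python
import numpy as np

def mape(actuals: np.ndarray, forecasts: np.ndarray) -> float:
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((actuals - forecasts) / actuals))

# Hypothetical example: errors tend to grow as the horizon lengthens.
actuals    = np.array([100.0, 103.0, 101.0, 105.0])
one_step   = np.array([101.0, 102.0, 102.0, 104.0])   # 1 period ahead forecasts
three_step = np.array([ 96.0, 109.0,  95.0, 111.0])   # 3 period ahead forecasts

print(f"MAPE at horizon 1: {mape(actuals, one_step):.1f}%")    # ~1.0%
print(f"MAPE at horizon 3: {mape(actuals, three_step):.1f}%")  # ~5.4%
```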
These results, however, apply to a task that is static; that is, the forecasts were made at one point in time. There was no re-forecasting as new data arrived, and no feedback was given to either the models or the people that would have allowed an assessment of prior accuracy and an adjustment for the new data. Since many forecasting processes are dynamic, data such as those in Fig. 1 do not demonstrate that the latest forecast in the rolling forecast sequence is the most accurate. Nevertheless, they do provide an indication of the way in which forecast accuracy improves as the forecast horizon shortens.

In a dynamic forecasting task like the example at the beginning of this paper, forecasts are continually revised in the light of new information or an improved understanding of the generating process. In such product forecasting meetings, the new information can be divided into two categories. The first category is temporal information: here the discussion centers on timing, that is, the latest actual values for the product at issue. The second category is contextual information: here the discussion centers on non-time series information about the product, for example, information about competitors' actions. Of course, these two categories of information are not orthogonal; the contextual information may address the reasons for the temporal movements.

There is some evidence that temporal information can improve forecast accuracy. Brown, Hagerman, Griffin and Zmijewski (1987) found that one of the key reasons for the superior performance of security analysts over statistical forecasting models was the timing of the judgmental forecasts: the judgmental forecasts were invariably made close to the event. Studies by Hopwood and McKeown (1990) and Waymire (1986) had similar results. Thus, these studies point to the importance of forecast timing. Note, however, that in these studies the more up-to-date values of the time series (temporal information) were often confounded with improved contextual information, as noted by Brown et al. (1987), Fried and Givoly (1982), and Brown (1996).

There is also some evidence that temporal information may not improve forecast accuracy, since there may be problems associated with judgmental forecasts made close to the event. O'Connor, Remus and Griggs (1993) found that
forecasts were often subject to unneeded last minute alterations and adjustments. Specifically, people tended to over-react to the revelation of the last actual value. Those findings were replicated and extended by Remus, O'Connor and Griggs (1995), who found that subjects reacted much more strongly to the most recent data point provided than did forecasting methods such as single exponential smoothing (a numerical illustration appears at the end of this section). The foregoing results are consistent with several experiments that found that decision makers often believe random variation indicates a persisting change (e.g., Kahneman & Tversky, 1973). Hence, it is not clear whether forecast timing by itself improves forecasts.

Contextual information is the second category of information discussed at product forecasting meetings; these discussions center on the non-time series information about the product. Contextual information can provide explanations for the past behavior and future direction of the series. For example, promotion strategies provide an explanation both of past time series movements and of the implications for future movements of the time series. Contextual information and other non-time series information have been extensively studied (e.g., Waymire, 1986; Remus et al., 1995), and correct contextual information has been demonstrated to improve forecast accuracy (Lawrence et al., 1999).

In one of the few studies to address this issue, Lawrence and O'Connor (1999) examined the improvements in accuracy of forecast revisions in a product sales environment. Contrary to expectation, they found that the process of revising forecasts made little difference to accuracy. Any improvements in accuracy as the forecast horizon shortened were quite small and much less than the possible improvements in accuracy suggested by the statistical methods. As the forecast horizon shortened from five periods ahead to one period ahead, there was about 36% greater improvement in accuracy
with the statistical methods than with the company judgmental forecasts, in which contextual information assumes a key role. Lawrence and O'Connor (1999) suggest that the prime reason for the lack of improvement is an over-reaction to the information content of the new contextual information that becomes available. An alternative explanation is that people are not examining the new contextual information and are simply over-reacting to the revelation of the last actual value (O'Connor et al., 1993). Thus, one needs to determine whether such over-reaction is due to the new contextual information or merely to the last actual value.

Whilst not addressing this issue directly, this study focuses on the way in which people react on the basis of time series information alone. Thus, this study examines whether using rolling forecasts improves forecast accuracy in a time series setting. The objective was to observe the revision of forecasts as new values of the time series were revealed. Properly conducted research on rolling forecast accuracy needs to disaggregate these two kinds of information: a reaction to past movements in the time series and the incorporation of new information into the latest forecast. Our focus in this experiment was on the impact of temporal information only, since the impact of contextual information has been extensively studied.

Contextual and temporal information is often used to do more than make a one period ahead prediction. Often such information is used to detect, confirm, or project emerging trends. Thus, we were also interested in whether temporal information is effective by itself in detecting, confirming, and projecting trend changes. Accordingly, we examined the impact of temporal information not only on forecasts of noisy, flat time series but also on forecasts of emerging upwardly trended and downwardly trended series. The revelation of the latest actual value is hypothesized to induce a re-assessment of one's beliefs about the overall trend of the time series. In view of the strong evidence for recency effects, such new information may lead to excessive reactions or re-assessments of trend.
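To make the single exponential smoothing comparison above concrete, here is a minimal sketch (our illustration, not code from any of the cited studies; the smoothing constant of 0.3 is an arbitrary assumption) of how mildly such a model reacts to the newest observation.

```python
def ses_update(prev_forecast: float, latest_actual: float, alpha: float = 0.3) -> float:
    """One step of single exponential smoothing: absorb a fraction alpha
    of the latest surprise into the next forecast."""
    return prev_forecast + alpha * (latest_actual - prev_forecast)

# A one-off spike of +20 moves the next forecast by only alpha * 20 = 6 units;
# the subjects in the studies above adjusted far more strongly than this.
print(ses_update(prev_forecast=100.0, latest_actual=120.0))  # 106.0
```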
3. Research design

In describing the research task, we will first describe the time series to be forecast. We will then describe the experimental design used to test how people's forecast accuracy is affected when they update their forecasts. Lastly, we will provide details on the subjects and on how the forecasting data were presented to them.
3.1. The time series

Time series containing discontinuities were generated for this experiment. Each time series was divided into four contiguous segments.

Segment 0 (periods 1 to 20). This segment consisted of 20 periods of historical data generated with a base of 100 and error added. These data were displayed to the subjects so they could assess the initial characteristics of the time series.

Segment 1 (periods 21 to 28). This segment was a continuation of the series as displayed for the first 20 points (segment 0). However, the subjects now made forecasts in each of the periods from 21 to 28. In this way, they became accustomed to forecasting the series.

Segment 2 (periods 29 to 38). At period 29, the underlying series either grew at 2 units per period or declined at 2 units per period. In addition, we included a flat control series.

Segment 3 (periods 39 to 44). The flat, upwardly trended, or downwardly trended time series continued into this segment. However, after period 38, the subjects were asked to make two independent sets of three rolling forecasts (in addition to the forecasts required in segments 1 and 2). After period 38 they forecast periods 39, 40, and 41. In each case, the cursor moved to that period, a message was printed, and the forecast was gathered. Then the actual value of the series for period 39 was shown and the forecasts for periods 40 and 41 were
erased (leaving only the forecast and actual value for period 39). Then the forecasts for periods 40, 41, and 42 were gathered, the actual value for period 40 was shown, and the forecasts for 41 and 42 were erased (leaving only the forecast and actual value for period 40). This procedure was repeated for periods 41 to 44. In the last rolling forecast, following period 44, the subjects forecast periods 45, 46, and 47. Then the actual value of the series for period 45 was shown, and the forecasts for 46 and 47 were erased (leaving only the forecast and actual value for period 45). The subjects then began to forecast another time series.

The randomness added to each series was uniformly distributed, since earlier work (O'Connor & Lawrence, 1989) showed that a uniform distribution best models the forecast errors. The randomness added was 5% MAPE, a level of randomness representative of error rates in sales forecasting (Dalrymple, 1987; Taranto, 1989).

There were nine time series in total, consisting of three types (Flat, Down, and Up). Series 1 to 3 were flat series (mean 100) with no discontinuities. Series 4 to 6 changed to declining series at period 29 (the start of segment 2), as the flat series (mean 100) changed to an underlying series declining at 2 units per period. Series 7 to 9 changed to upward series at period 29, as the flat series changed to an underlying series growing at 2 units per period. The nine series were presented to the subjects in random order, as described below.
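The following is a minimal sketch of how series with this structure could be generated. It is our reconstruction from the description above, not the authors' code: the paper specifies the base of 100, the 2 units per period trend beginning at period 29, and uniform noise at 5% MAPE, but the noise half-width of 10 units used here is our own calibration (the mean absolute value of a uniform variate on [-10, 10] is 5, i.e., about 5% of the base level).

```python
import numpy as np

def make_series(direction: str, n_periods: int = 45, seed: int = 0) -> np.ndarray:
    """Generate one experimental series; direction is 'flat', 'up', or 'down'."""
    rng = np.random.default_rng(seed)
    slope = {"flat": 0.0, "up": 2.0, "down": -2.0}[direction]
    t = np.arange(1, n_periods + 1)
    level = np.full(n_periods, 100.0)           # flat base of 100 (segments 0-1)
    level += slope * np.clip(t - 28, 0, None)   # trend begins at period 29
    # Uniform noise with mean absolute deviation 5 (about 5% MAPE at level 100);
    # the +/-10 half-width is an assumption, not a reported parameter.
    return level + rng.uniform(-10.0, 10.0, n_periods)

# e.g., one flat, one declining, and one growing series (series types 1, 4, 7)
flat, down, up = (make_series(d, seed=i) for i, d in enumerate(("flat", "down", "up")))
```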
3.2. Details of the data gathering procedures

As in O'Connor et al. (1993), the display of data and the gathering of the subjects' forecasts were done on Macintosh computers with HyperCard software and a mouse interface. After viewing a graphical display of the first 20 points of the series, subjects were required to forecast the next point in the series. When making their forecasts, all subjects used a mouse to 'point' to their forecasted value on the graphical display
and 'clicked' to record that value. Following the forecast for the current point, the actual value of the time series was graphically displayed (the forecasted value was also shown). The cursor then moved to the right, ready for the next forecast. The gathering of rolling forecast data began after period 38, as described in the previous section. The series were presented in a random order unique to each subject.

The 54 subjects for the experiment were undergraduate students at the University of Hawaii. They were recruited from an Operations Research course that covered time series forecasting in the 2 weeks prior to the experiment. The subjects received course credit for participating, and four $20 US prizes were given for the best forecasting accuracy.

The subjects were trained in the use of the software immediately before beginning the experiment. In this training session, the subjects were told that some of the nine series that they would forecast might grow or decline. They were also told to provide point forecasts every period until after period 38, when the rolling forecasts would begin. No context was given for the nine time series to be forecast (e.g., forecasting sales, stock prices, or demand), to avoid having the subjects bring their preconceptions into the forecasting process. The subjects then forecast a practice series to finish the training session. After the practice series, for each of the nine series described earlier, the subjects viewed the segment 0 data and then forecast the series, including completing the rolling forecasts in segment 3. The average time taken to complete the task was less than 1 h. A schematic of the rolling forecast elicitation appears below.
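The following schematic is our reading of the segment 3 procedure, not the authors' HyperCard code; get_forecast and show_actual are hypothetical stand-ins for the display interface.

```python
def rolling_forecast_phase(actuals, get_forecast, show_actual):
    """Elicit three rolling forecasts after each period t from 38 to 44.

    actuals: dict mapping period number to the series value for that period.
    get_forecast / show_actual: hypothetical stand-ins for the interface.
    """
    records = []
    for t in range(38, 45):
        # Gather one, two, and three period ahead forecasts.
        forecasts = {t + h: get_forecast(period=t + h) for h in (1, 2, 3)}
        # Reveal the actual value for t + 1; the t + 2 and t + 3 forecasts were
        # erased from the display but retained here for the accuracy analysis.
        show_actual(period=t + 1, value=actuals[t + 1])
        records.append(forecasts)
    return records
```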
4. Analysis and results

In this analysis we focus on the accuracy of the rolling forecasts with flat, growing, and declining time series. We chose absolute
forecast error as our measure of forecast accuracy as it is the most widely known and used of the forecast accuracy measures (Makridakis, 1993). For this analysis we calculated the absolute forecast error for one period, two period, and three period ahead forecasts. These absolute errors were then compared using repeated measures Analysis of Variance (ANOVA) with multiple comparisons (Pedhazur & Schmelkin, 1991, pp. 482–490).¹

The first step in the analysis was to ensure that the randomization of the time series had been effective in avoiding any effects due to the order of presentation. This was tested by examining the one period ahead forecast error across treatment number while controlling for time series direction (Up, Down, or Flat). The ANOVA test on one period ahead forecast error (F = 1.51, df = 8, 2860, P < 0.147) found no evidence for an order effect.

Table 1 presents the relative forecasting accuracy of one, two, and three period ahead rolling forecasts controlling for time series direction (Up, Down, or Flat). Table 1 shows the absolute errors of all forecasts to be significantly different across the periods in the forecast horizon (the number of periods ahead that the forecast was made) (F = 11.28, df = 2, 3938, P < 0.001). The further one forecast into the future, the larger the forecast error. Table 1 also shows the absolute forecast error to be significantly different across time series direction (F = 18.71, df = 2, 1869, P < 0.001).
¹ In the ANOVA results presented, the sample size in each cell often differs from that in another cell. This results from the following. To point to their forecasted value, the subjects would pick up a cursor and move it while holding down the mouse button, releasing the button at the desired forecast value on the graph. Sometimes they would pick up the cursor and not hold down the mouse button. These bad data points could be identified because the recorded forecast value was close to the cursor pick-up point; such points were discarded from the analysis.
To explore these differences we used the method of multiple comparisons; the results are shown at the bottom of Table 1. Whether the series was trended upward or downward made no difference (t = 1.39, P < 0.163); however, whether the series was trended or flat did (t = -5.94, P < 0.001). Overall, the absolute forecasting error was greater for the trended series.

Direction and forecast horizon also interacted significantly (F = 4.93, df = 4, 3738, P < 0.001). To examine this effect, Fig. 2 displays the absolute forecast error by direction and forecast horizon. As shown in Fig. 2, the absolute forecast error for the trended series grew as the forecast horizon increased. The trended series one period ahead forecasts were significantly more accurate than the trended series two period ahead forecasts (paired t-test = -4.16, df = 1571, P < 0.001) and the trended series three period ahead forecasts (paired t-test = -5.81, df = 1260, P < 0.001). On the other hand, as also shown in Fig. 2, the absolute forecast error for the flat series did not grow as the forecast horizon increased. The flat series one period ahead forecasts were not significantly more accurate than the flat series two period ahead forecasts (paired t-test = 0.49, df = 892, P < 0.624) or the flat series three period ahead forecasts (paired t-test = 0.69, df = 633, P < 0.488).

Fig. 2. Absolute forecast error across forecast horizon.
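For readers who wish to replicate the comparison logic, the sketch below runs one such paired t-test in Python on synthetic data. The arrays are hypothetical stand-ins for the subjects' absolute errors, not the experimental data; note that 1572 paired observations yield the df = 1571 reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical paired absolute errors for 1572 forecasts at two horizons;
# the real analysis compared the subjects' one and two period ahead errors.
abs_err_h1 = rng.gamma(shape=2.0, scale=3.4, size=1572)
abs_err_h2 = np.abs(abs_err_h1 + rng.normal(0.5, 2.0, size=1572))

# With n = 1572 pairs, the paired t-test has df = 1571, matching the text.
t_stat, p_value = stats.ttest_rel(abs_err_h1, abs_err_h2)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```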
5. Discussion and conclusions

As we noted early in the paper, the new information discussed in typical product forecasting meetings can be divided into two categories. Discussion about the first category of information (temporal information) centers on the latest actual values. Discussion about the second category of information (contextual information) centers on the non-time series information that is discussed in relation to the forecast. This is the information that provides an
Table 1
The absolute forecasting error of one, two, and three period ahead forecasts across the direction of time series

One period ahead absolute forecast error
Series direction     Mean    S.D.    N
Flat                 6.478   4.680    630
Up                   7.005   5.193    627
Down                 6.844   5.439    615
For entire sample    6.775   5.113   1872

Two period ahead absolute forecast error
Series direction     Mean    S.D.    N
Flat                 6.348   4.319    630
Up                   7.847   6.022    627
Down                 7.348   5.687    615
For entire sample    7.178   5.423   1872

Three period ahead absolute forecast error
Series direction     Mean    S.D.    N
Flat                 6.284   4.638    630
Up                   8.187   5.804    627
Down                 7.906   6.191    615
For entire sample    7.454   5.638   1872

ANOVA
Source of variation         SS          DF     MS       F       Significance of F
Tests on main effect
  Direction                 1762.39        2   881.19   18.71   <0.001
  Within cells             88 043.56    1869    47.11
Tests on repeated measures
  Horizon                    442.05        2   221.03   11.28   <0.001
  Direction by horizon       386.21        4    96.55    4.93   <0.001
  Within cells             73 215.18    3738    19.59

Multiple comparisons     Coefficient    S.E.      t-Value    Significance of t
Up vs. down              0.543122945    0.38952    1.39433   <0.163
Trended vs. flat         -3.9934111     0.67143   -5.94759   <0.001
explanation for the time series' past behavior and can be used to predict the future direction of the time series. This study focused only on temporal information.

We found that forecast horizon and trend interacted; that is, using the temporal information significantly improved time series forecasting accuracy when forecasting upwardly and downwardly trended series. Updating flat time series did not significantly improve the forecasts, nor did it reduce forecasting accuracy. Thus, using temporal information to revise forecasts seems to be a worthwhile way to improve forecasting accuracy, particularly when dealing with trended data. O'Connor, Remus and Griggs (1997) provide an extended discussion of the forecast horizon and trend interaction.

We might note that contextual information was deliberately withheld to permit analysis of the effect of temporal information alone. Clearly, when new contextual information and other non-time series information arrive, the forecasts
need to be updated [see Remus et al. (1995) for a general discussion of that literature and Waymire (1986) for a discussion of the literature as it relates to the decisions of security analysts].

The generalizability of the current study is limited by the laboratory setting used. We attempted to reduce the threat to external validity by designing the experimental setting to capture relevant aspects of the forecasting task without introducing artifacts or having technology get in the way of the forecasting process. There are, however, limitations on generalization, since the current experiment examined only one level of variance (5%) and only flat, growing (at 2 units per period), and declining (at 2 units per period) time series. Thus, care must be taken when generalizing our results to real forecasting situations.
References

Brown, L. (1996). Analyst forecasting errors and their implications. Financial Analysts Journal 52, 40–47.
Brown, L., Hagerman, R., Griffin, P., & Zmijewski, M. (1987). Security analyst superiority relative to univariate time-series models in forecasting quarterly earnings. Journal of Accounting and Economics 9, 61–87.
Dalrymple, D. J. (1987). Sales forecasting practices: results of a United States survey. International Journal of Forecasting 3, 379–391.
Fildes, R., Hibon, M., Makridakis, S., & Meade, N. (1998). Generalising about univariate forecasting methods: further empirical evidence. International Journal of Forecasting 14, 339–358.
Fried, D., & Givoly, D. (1982). Financial analysts' forecasts of earnings: a better surrogate for market expectations. Journal of Accounting and Economics 4, 85–107.
Hopwood, W., & McKeown, J. (1990). Evidence on surrogates for earnings expectations within a capital market context. Journal of Accounting, Auditing and Finance, Summer, 339–368.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review 80, 237–251.
Lawrence, M. J., Edmundson, R. H., & O'Connor, M. J. (1985). An examination of the accuracy of judgmental extrapolation of time series. International Journal of Forecasting 1, 25–35.
Lawrence, M. J., O'Connor, M. J., & Edmundson, R. (1999). A field study of sales forecasting accuracy and processes. European Journal of Operational Research, in press.
Lawrence, M. J., & O'Connor, M. (1999). An investigation of the performance of fixed event judgementally estimated sales forecasts. Working paper, UNSW.
Makridakis, S. (1993). Accuracy measures: theoretical and practical concerns? International Journal of Forecasting 9, 437–450.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., & Winkler, R. (1982). The accuracy of extrapolation (time series) methods: results of a forecasting competition. Journal of Forecasting 1, 111–153.
O'Connor, M., & Lawrence, M. (1989). An examination of the accuracy of judgemental confidence intervals in time series forecasting. Journal of Forecasting 8, 141–156.
O'Connor, M. J., Remus, W. E., & Griggs, K. (1993). Judgmental forecasting in times of change. International Journal of Forecasting 9, 163–172.
O'Connor, M. J., Remus, W. E., & Griggs, K. (1997). Going up–going down: how good are people at forecasting trends and changes in trends? Journal of Forecasting 16, 165–176.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, Design, and Analysis: An Integrated Approach, student edition. Erlbaum, Hillsdale, NJ.
Remus, W. E., O'Connor, M. J., & Griggs, K. (1995). Does reliable information improve the accuracy of judgmental forecasts? International Journal of Forecasting 11, 285–293.
Taranto, G. M. (1989). Sales forecasting practices: results from an Australian survey. Unpublished thesis, University of New South Wales.
Waymire, G. (1986). Additional evidence on the accuracy of analysts' forecasts before and after voluntary management earnings forecasts. Accounting Review 61, 129–142.

Biographies:

William REMUS is a Professor in the College of Business Administration at the University of Hawaii, USA. His research interests focus on the process of human judgment as used in forecasting and decision making.

Marcus O'CONNOR is a Professor in the School of Information Systems at the University of New South Wales, Sydney, Australia. His research interests center on the process of forecasting, especially the way judgement is incorporated into the final forecasts.

Ken GRIGGS is an Associate Professor at California Polytechnic at San Luis Obispo, USA. His research interests are object-oriented languages and systems analysis.

Their work has appeared in International Journal of Forecasting, Journal of Forecasting, Organizational Behavior and Human Decision Processes, and Management Science.