International Journal of Forecasting 12 (1996) 139-153
Judgmental forecasting with time series and causal information

Joa Sang Lim^a, Marcus O'Connor^b,*

^a Samsung Corporation, Seoul, South Korea
^b School of Information Systems, University of NSW, Sydney, N.S.W. 2052, Australia
Abstract
Although contextual or causal information has been emphasised in forecasting, few empirical studies have been conducted on this issue in controlled conditions. This study investigates the way people adjust statistical forecasts in the light of contextual/causal information. Results indicate that people appeared to reasonably incorporate extra-model causal information to make up for what the statistical time-series model lacks. As expected, the effectiveness of causal adjustment was contingent upon the reliability of the causal information. While adjustment of forecasts using causal information of low reliability did not lead to significant improvement, adjustment using highly reliable causal information produced forecasts more accurate than the best statistical models. However, people relied too heavily on their initial forecasts compared with the optimal model. Moreover, people did not seem to learn over time to modify this conservative behaviour. People also seemed to prefer statistical forecasts over causal information.

Keywords: Judgemental adjustment; Causal information
1. Introduction

The role of extra-model contextual knowledge has been demonstrated in a number of field studies for judgemental forecasting (Edmundson et al., 1988; Sanders and Ritzman, 1992) as well as for judgemental adjustment (Mathews and Diamantopoulos, 1986, 1989). Contextual knowledge¹ includes situational causal information, such as product and industry knowledge in the case of product forecasting. For example, weather information (e.g. temperature) is certainly an important causal factor in the sales of soft drinks or ice creams.

This study is designed to address two research questions concerning the integration of contextual/causal information and time-series information. First, can people reasonably incorporate causal information into the final forecast and overcome the deficiencies of the statistical time-series model? Second, how do people use causal information and statistical forecasts in deriving their final forecasts?

* Corresponding author. Tel.: +61 2 385-4640; fax: +61 2 662-4061.
¹ Gordon and Wright (1993, p. 149) defined contextual information as any information in addition to time series and labels. This information may include knowledge that practitioners gain through experience as part of their jobs (Sanders and Ritzman, 1992, p. 40).

The next section reviews prior literature on causal judgement with time series, and a set of
0169-2070/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0169-2070(95)00635-4
hypotheses is developed subsequently in relation to the research questions. The remainder of the paper describes the design and results of a laboratory-based experiment that examines the ability of people to integrate causal information into the time-series forecasting process.
2. Prior literature on causal judgement
A number of studies in forecasting have demonstrated the importance of incorporating extramodel causal information into the final forecast. The role of judgement in this practice has been emphasised, particularly in the discontinuity situation where it can serve as a predictor of abrupt changes (Kleinmuntz, 1990). For example, Edmundson et al. (1988) and Sanders and Ritzman (1992) demonstrated how non-time series causal information improved the accuracy of the final forecast over the statistical and judgemental initial estimates. Managers who possessed considerable product knowledge were also found to successfully adjust exponential smoothing forecasts (Mathews and Diamantopoulos, 1986, 1989). Whilst contextual information has been shown to be vitally important in forecasting, little empirical analysis has been conducted on how people use the causal or contextual information. Although causal judgement has long been an issue in the psychological literature (for a review, see Klayman, 1988), we have little knowledge as to how this causal information is incorporated into a time-series forecasting task. Two empirical studies seem to be most relevant to the present study. Harvey et al. (1994) explored the accuracy of judgemental extrapolation and causal forecasting. They asked subjects to forecast the number of train passengers based on two cyclical series, the passenger movements themselves and the number of criminals travelling on the rail network. Forecasting based purely on criminal data was significantly less accurate than that based on the passenger time series itself. In their study, the causal information was expressed as time-series information, which was the same as
the passenger series except for amplitude and phase differences. Arguably, in most business settings the causal relationships are less direct and often far more complicated. Andreassen (1991) also empirically compared the accuracy of judgemental extrapolation and judgemental causal forecasting, although not in an adjustment task. The judgemental extrapolation group was provided with temporally ordered previous values and asked to make one-period-ahead forecasts without any causal information. The causal forecasting group, on the other hand, was provided with current situational diagnostic variables, from which they forecasted a current value. In this causal condition, the values of the target time series were presented randomly in order to minimise temporal information. The results indicated that judgemental extrapolation significantly outperformed judgemental causal forecasting (p < 0.0001). The reason was attributed to the fact that judgemental extrapolation was made only from the previous value, which would require less mental load and thus be subject to fewer cognitive biases. On the other hand, this result is not surprising since the naive value is an extremely good forecast (Makridakis et al., 1982).

There are a number of cognitive biases in relation to causal forecasting. Tversky and Kahneman (1980) defined four types of evidence (D) in conditional judgement P(X|D): causal, diagnostic, indicational and incidental. Normative Bayesian theory suggests that the impacts of these types of evidence on judgement should be identical when they are of equal informativeness. However, different types of evidence had different impacts on probabilistic judgement, despite their equal informativeness. For example, people assign greater impact to causal data (causes to consequences) than to diagnostic data (consequences to causes) of equal informativeness (p. 50). Moreover, the neglect of base rates has often been observed in the presence of causal information.
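The base-rate point can be made concrete with Bayes' rule. The sketch below uses illustrative numbers, not figures from any study cited here: two applications of the same evidence (same likelihood ratio) yield very different posteriors once the base rate changes, which is exactly the information people tend to neglect.

```python
def posterior(prior, p_d_given_x, p_d_given_not_x):
    """Bayes' rule: P(X|D) from the base rate P(X) and the two likelihoods."""
    joint_x = prior * p_d_given_x
    joint_not_x = (1 - prior) * p_d_given_not_x
    return joint_x / (joint_x + joint_not_x)

# Same evidence (likelihood ratio 3:1), two different base rates.
high_base = posterior(0.5, 0.6, 0.2)  # around 0.75
low_base = posterior(0.1, 0.6, 0.2)   # around 0.25: the base rate matters
```

Ignoring the base rate amounts to treating both cases as equally diagnostic, which normative theory does not permit.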
Schustack and Sternberg (1981) also found that causal inference was suboptimal and subject to many cognitive biases. These include the confirmatory trap, the underestimation of negative evidence, the neglect of base-rate information and insensitivity to the notion of sample size. In many cases, these results may be due to the feedback mechanism, which is often flawed or unavailable in the real world (Brehmer, 1980; Garb, 1989). For example, some positive hearsay (e.g. bankruptcy of competitors) could lead a forecaster to make favourable adjustments from the initial forecast. As this whole process is very often performed intuitively in the forecaster's mind alone, he or she is often deprived of an opportunity to receive systematic feedback from forecast monitoring systems (Goldberg, 1968; Hogarth, 1981). Moreover, this judgement-outcome relationship is often mitigated by spurious correlations, which strengthen the false causal chain (e.g. illusory correlation; Chapman and Chapman, 1969; Einhorn and Hogarth, 1982).

Thus, whilst a number of studies have demonstrated the crucial role of causal or contextual information in real-life forecasting situations (Edmundson et al., 1988; Sanders and Ritzman, 1992), other studies performed in the laboratory have questioned the ability of people to perform causal forecasting. It should be noted, however, that the studies from practice have usually examined circumstances where people use their contextual information in conjunction with statistical or judgemental time-series forecasts. In other words, they adjusted forecasts, which had been based on pattern matching and recognition procedures, for the causal information provided to them. This was certainly not the case with the laboratory-based studies reviewed earlier: they forecast with either time-series or causal information alone. As the studies from practice emphasise, we believe that it is more realistic (Lawrence et al., 1994) to consider the case of judgemental adjustment of statistical forecasts for causal information.
Unfortunately, because they were not undertaken in controlled conditions, the studies from practice cannot provide insight into what information is used, and how it is used, in the adjustment process. By utilising laboratory conditions, this study controls the nature and amount of causal information and provides some understanding of the way people make adjustments to initial forecasts.
3. Development of research hypotheses

3.1. Causal adjustment
Causal forecasting was found in previous studies to be less accurate than judgemental extrapolation (Andreassen, 1991; Harvey et al., 1994). However, this finding may be due to the heavier mental load required and the greater cognitive biases involved in causal forecasting (Tversky and Kahneman, 1980). Andreassen (1991, p. 20) urged further research: "one interesting question for the future will be whether having access to information that would allow combined extrapolation and causal judgemental forecasting is more or less accurate than judgemental extrapolation alone". This is examined in the present study. As the extra information should lead to an improvement over the initial extrapolation, the first hypothesis is formulated as follows:
H1a: Improvement in forecast accuracy for people with causal information is greater than for those who do not have it.

Whilst there are a few empirical studies which have examined causal forecasting of time series (Andreassen, 1991; Harvey et al., 1994), the impact of causal information on judgemental adjustment from the initial forecast has not been investigated. In practice, the magnitude of judgemental adjustment from the statistical forecast is often determined by causal information, and thus the contributions of the two methods from different sources are hard to differentiate. In this study, the difference between causal adjustment and judgemental adjustment is clearly made: the judgemental adjustment group is provided only with the statistical forecast, whereas the causal adjustment group is allowed a single piece of causal information. Since judgemental and statistical extrapolation have
approximately equal forecast accuracy (Lawrence et al., 1985), we expect that people with causal information would produce better forecasts than those with statistical forecasts. Our second hypothesis is then:

H1b: Improvement in accuracy for people with causal information is greater than for those who have been provided with a statistical forecast.

There is a considerable body of evidence in support of judgemental adjustment when people have extra-model information (Mathews and Diamantopoulos, 1986). It is also important to determine whether people can utilise extra-model causal information to make up for the weaknesses of the statistical forecast. Little research has been conducted into the contribution of causal information toward judgemental adjustment of the statistical forecast (cf. Wolfe and Flores, 1990). To address this issue, two experimental conditions are created: the judgemental adjustment group is allowed to look at only the statistical forecast, whereas those in the mixed adjustment condition are provided with causal information as well as the statistical forecast. Thus, the third hypothesis is put forward as:

H1c: Improvement in accuracy for people with the statistical forecast and the causal information is greater than for those who have been provided with the statistical forecast alone.
3.2. The predictive validity of causal information

A succinct conclusion from SCPL (Single Cue Probability Learning) studies in psychology is that decision performance (r_xy) improved as cue validity increased (Brehmer and Lindberg, 1970; Brehmer, 1973, 1978). Moreover, despite some exceptions (Brehmer, 1976), decision consistency also improved as cue validity increased (Brehmer, 1978, 1980; Brehmer and Kuylenstierna, 1978). This is not surprising in that the predictability² of the task imposes a maximum achievable performance according to the Lens model (Brehmer and Kuylenstierna, 1978, p. 460), and thus those in a highly predictable task perform better than those in a less predictable task. This general superiority of decision-making with high-validity information is persistently shown under a variety of functional forms: linear vs. non-linear (Brehmer, 1976, 1980) and negative vs. positive relationships between cue and criterion (Brehmer, 1976, 1980; Brehmer and Kuylenstierna, 1978). These studies were typically conducted with naive subjects (high-school or undergraduate students) making predictions of criterion values from environmental cues in a general task context (Brehmer and Brehmer, 1988). This finding seems very robust and as such may be generalisable to more meaningful contexts. In managerial pricing decisions, Ashton (1981) asked people to infer the relationship between a criterion (price) and cues (demand, cost and competition) from a set of past pricing decisions and to apply this decision-model knowledge to another set of data in the same context. The subjects' decision performance and consistency at this task significantly improved as task predictability (R_e) increased (p < 0.01), consistent with the findings in SCPL tasks (Brehmer, 1978). In accounting, Casey and Selling (1986) also found that those in the high-validity task condition made bankruptcy predictions that were significantly more accurate than those in the low-validity condition. This superiority persisted even in the presence of prior probabilities which were arbitrarily given by the experimenters. This evidence suggests that, with the accuracy of the initial forecast being equal, improvement from it would be greater for the high-validity group. Thus, we formulate the following hypothesis:

H2: Improvement from high-validity causal information is greater than that from low-validity causal information.
² Cue validity is a conceptual equivalent of task predictability when there is only a single piece of information to predict a criterion value, as in SCPL.
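The ceiling that cue validity places on achievable performance can be illustrated with a small simulation (the numbers here are illustrative, not from the paper): even a perfectly consistent judge who responds linearly to the cue correlates with the criterion no better than the cue itself does.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Criterion and a single cue with validity about 0.6 (illustrative).
criterion = rng.normal(size=n)
cue = 0.6 * criterion + np.sqrt(1 - 0.6**2) * rng.normal(size=n)

# A perfectly consistent judge who responds linearly to the cue.
judgement = 2.0 * cue + 1.0

r_cue = np.corrcoef(cue, criterion)[0, 1]
r_judge = np.corrcoef(judgement, criterion)[0, 1]

# Correlation is invariant to linear rescaling, so the judge's
# achievement equals, and cannot exceed, the cue's validity.
```

Any inconsistency in applying the cue only lowers the judge's correlation further below this ceiling.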
The term high (low) validity causal information is often abbreviated to high (low) causal information in the remainder of the paper.
3.3. The learning of causal information

The ability of people to learn probabilistic relations between cue and criterion has been of interest to some researchers (Brehmer, 1988). Most empirical studies generally found some learning, although it was conditional on (1) the type of feedback (Hammond et al., 1973), and (2) the functional cue-criterion relationship (Brehmer, 1973, 1988; Kleinmuntz, 1990). In a SCPL task, Brehmer and Lindberg (1970) and Naylor and Clark (1968) found that decision performance increased over time for a high-validity cue whereas it did not for a low-validity cue. Later, Brehmer (1973, p. 392) claimed that this differential learning across the validity conditions was merely due to confounded effects. Indeed, Brehmer and his colleagues (Brehmer, 1973; Brehmer and Kuylenstierna, 1978) generally reported some learning in linear SCPL tasks (cf. Brehmer, 1980: no significant learning in a non-linear task) for both high and low cue validity, and such learning did not appear to interact with the magnitude of cue validity. In time-series task settings, Lim and O'Connor (1994) found that people persistently relied on their own initial forecast and did not increase their reliance on the statistical forecast over time. Despite the limited empirical evidence from time-series settings and the mixed results from the SCPL studies, we expect that people should learn to use the causal information over time. Thus, the third hypothesis is formulated as:

H3: People improve in forecast accuracy more rapidly over time with causal information than without it.
4. Research methodology
4.1. Subjects The subjects were 40 Masters and Ph.D. students who were taking post-graduate subjects
offered in the School of Information Systems at the University of New South Wales, Australia. The great majority of the students were employed in business full time, completing their studies on a part-time basis. They were required to participate as part of the course as announced by the lecturer in charge. Task-contingent monetary incentives were provided and there was about a 60% chance of winning A$10-25 for each student. Subjects were informed of this incentive prior to the start of the experiment. Casual observation indicated that the students were particularly motivated by this incentive scheme. They were rather impressed with the high chance of winning!
4.2. Research design and experimental procedure

This study was conducted as a 2 (statistical forecast) × 3 (causal information) × 3 (blocks) factorial design with a repeated measure on the last factor. The first between-subject variable consisted of two levels, depending upon the presence or absence of a statistical forecast. The second between-subject factor was causal information, which was composed of three levels: (1) no causal information, (2) low-validity causal information and (3) high-validity causal information. The terms low causal and high causal refer to a low and a high correlation, respectively, between the causal information and sales of a soft drink. The last factor was three time blocks of 10 iterations each.

The research design can be categorised into four methods of judgemental forecasting: judgemental extrapolation and three methods of adjustment, depending upon the nature of the reference information (causal information, the statistical forecast, or both). The experimental requirements of the four conditions were as follows (see also Table 1).
4.2.1. Judgemental extrapolation Neither causal information nor a statistical
Table 1
Information provided to four forecasting methods

Method                      Time-series information   Statistical forecast   Causal information
Judgemental extrapolation   yes                       no                     no
Judgemental adjustment      yes                       yes                    no
Causal adjustment           yes                       no                     yes
Mixed adjustment            yes                       yes                    yes
forecast was provided. Subjects simply projected one-period-ahead judgemental forecasts from the time series alone, which was graphically displayed on the computer screen. They were nevertheless also asked to revise their initial forecast, to keep the experimental condition comparable with the other forecasting methods.
4.2.2. Causal adjustment
After subjects had made an initial judgemental forecast, causal information was presented in the form of bar charts, together with the percentage changes in the causal variable from previous periods.
4.2.3. Judgemental adjustment Subjects were required to make an initial forecast. They were then provided with a statistical forecast from which they made a revised judgemental forecast.
4.2.4. Mixed adjustment
In this condition, subjects were provided with both the causal information and the statistical forecast after their initial forecast. Note that the two pieces of information were presented simultaneously, to avoid the recency effects of displaying one before the other (Hogarth, 1987).
4.3. Task instrument
The task was represented as forecasting sales of a soft drink at Bondi beach, a famous surf beach
in Sydney, Australia. Not only does the beach attract a large number of the local population (it is an attractive beach with a good surf), but it is also a popular place for tourist visitors. Sales of soft drinks are dependent largely on two factors: the temperature of the day and the number of tourists visiting Sydney at the time. Of these two cues, temperature is, by far, the more important. In this study, temperature is viewed as the high-validity causal cue and the number of tourist visitors as the low-validity causal cue. The time series of sales of soft drinks was generated from 9 a.m. temperature data obtained from the Bureau of Meteorology, Australia. Thus, the major causal variable, temperature, was used to generate the series. The formula used to generate the actual time series was as follows:

SALES_t = 100.1 × TEMPERATURE_t   (t = 0, 1, 2, ..., i).

This equation merely transformed temperature data into a realistic forecast dimension. An illustration of the task instrument with causal information is contained in the appendix.
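The generating equation above can be sketched as follows. The temperature values here are synthetic stand-ins for the Bureau of Meteorology data, so only the functional form matches the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 9 a.m. temperatures (deg C), standing in for the
# Bureau of Meteorology data used in the paper.
temperature = rng.normal(loc=22.0, scale=4.0, size=30)

# The paper's generating equation: SALES_t = 100.1 * TEMPERATURE_t
sales = 100.1 * temperature
```

Because sales are an exact linear transform of temperature, the temperature cue is (before noise is added) a perfect predictor of the criterion.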
4.4. Independent variables
As mentioned, there were three independent variables manipulated in this study. (a) The first was the presence or absence of a statistical forecast. The statistical forecast group was provided with the damped exponential smoothing forecast (Gardner and McKenzie, 1985).
(b) The second variable was the predictive validity of the causal information. As mentioned, sales of soft drinks were generated from temperature data alone. However, two causal information conditions were generated via:

CAUSAL INFORMATION_t = TEMPERATURE_t + e_t.

Random errors were added to control the strength of the correlation, i.e. the predictive validity. Correlations were planned to be about 0.8 and 0.3 for the high- and the low-validity causal information, respectively. The actual correlations (r_xy) of the high- and the low-validity causal information with the actual time series were 0.759 and 0.285, respectively. The correlation between the damped statistical forecast and the actual series was 0.466, about mid-way between the high and the low causal validity.

Unlike Andreassen (1991), where people were provided only with current causal information, the present study provided causal information for the past three consecutive days in bar charts so that subjects could comprehend the pattern of changes. Labelling information (Miller, 1971; Sniezek, 1986) was clearly provided on a separate page of the handout and reinforced on each trial by displaying it in a pop-up window. The high-validity causal cue was labelled as temperature, and the low-validity causal cue as the number of tourist visitors to the beach. To maintain comparable experimental conditions between the two methods of adjustment, a description (labelling) of the statistical forecast was also clearly provided.

(c) The last independent variable was time blocks. Each subject forecasted one period ahead for 30 iterations. To determine whether there was any change in improvement over time, these iterations were segmented into three blocks.
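The noise-calibration step can be sketched as below. For independent noise, corr(T, T + e) = sd(T) / sqrt(sd(T)² + sd(e)²), so choosing sd(e) = sd(T) · sqrt(1/ρ² − 1) targets a planned correlation ρ. The function name and the synthetic temperatures are our assumptions for illustration, not the authors' code.

```python
import numpy as np

def make_causal_cue(temperature, target_rho, rng):
    """Add independent Gaussian noise so the cue correlates with
    temperature at roughly target_rho in expectation, using
    corr(T, T + e) = sd(T) / sqrt(sd(T)^2 + sd(e)^2)."""
    sd_t = np.std(temperature)
    sd_e = sd_t * np.sqrt(1.0 / target_rho**2 - 1.0)
    return temperature + rng.normal(0.0, sd_e, size=temperature.shape)

rng = np.random.default_rng(42)
temperature = rng.normal(22.0, 4.0, size=1000)

high_cue = make_causal_cue(temperature, 0.8, rng)  # the "temperature" cue
low_cue = make_causal_cue(temperature, 0.3, rng)   # the "tourist numbers" cue

r_high = np.corrcoef(temperature, high_cue)[0, 1]  # roughly 0.8
r_low = np.corrcoef(temperature, low_cue)[0, 1]    # roughly 0.3
```

Sampling noise explains why the paper's realised correlations (0.759 and 0.285) fall slightly short of the planned 0.8 and 0.3.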
4.5. Analysis methodology Since the primary focus of this study was the
degree of improvement from initial judgemental estimates, forecast performance was measured in terms of improvement (IMP) as follows:

IMP = APE_base − APE_revised,

where APE_base (APE_revised) represents the absolute percentage error of the initial (revised) judgemental forecast. Thus IMP represents the improvement in accuracy of the revised forecast over the initial forecast.

A 2 (statistical forecast) × 3 (causal information) × 3 (blocks) ANOVA with a repeated measure on the last factor was performed to test the major hypotheses of this study. Polynomial trend analyses were used for the time blocks. To examine anchoring-and-adjustment heuristics in causal adjustment, weights were computed by regressing the initial judgemental forecasts and the reference information (a statistical forecast and causal information) against the revised forecasts using step-wise regression, after appropriate normalisation.
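The IMP measure follows directly from its definition; a minimal sketch (the function names are ours, not the authors'):

```python
import numpy as np

def ape(forecast, actual):
    """Absolute percentage error, in percent."""
    return 100.0 * np.abs((actual - forecast) / actual)

def imp(initial, revised, actual):
    """IMP = APE_base - APE_revised: positive values mean the
    revised forecast improved on the initial one."""
    return ape(initial, actual) - ape(revised, actual)

# e.g. actual sales 2000, initial forecast 1800, revised 1900:
# APE falls from 10% to 5%, so IMP is 5 percentage points.
improvement = imp(1800.0, 1900.0, 2000.0)
```

A negative IMP means the revision made the forecast worse than the initial estimate.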
5. Results
5.1. Comparison of four judgemental forecasting methods

5.1.1. Overall results
The results of the ANOVA indicate a significant difference in IMP due to the provision of both the statistical forecast (F = 6.133, p < 0.005) and causal information (F = 16.093, p < 0.0005). Table 2 shows that every adjustment method led to some improvement over the initial forecast,
Table 2
IMP means across four forecasting methods

Method                      IMP
Judgemental extrapolation   −0.09
Judgemental adjustment       0.46
Causal adjustment            1.12
Mixed adjustment             1.25
with judgemental extrapolation being the least accurate.
5.1.2. H1a: Judgemental extrapolation vs. causal adjustment
The first hypothesis, H1a, compares the forecast accuracy of judgemental extrapolation and causal adjustment, as prompted by Andreassen (1991). Remember, causal information was provided only to the causal adjustment group, in addition to the time-series information common to both groups (see Table 1), and thus this hypothesis bears out the true impact of causal information. Table 2 reveals that IMP was significantly greater for those (in the causal adjustment conditions) who received both time-series and causal information than for those (in the judgemental extrapolation conditions) who received time-series information only (F = 11.86, p < 0.0005). This beneficial contribution of causal information occurred whether or not the statistical forecast was provided (i.e. judgemental adjustment vs. mixed adjustment) (p = 0.509). The evidence presented thus far leads us to conclude that there was a difference between judgemental extrapolation and causal adjustment and, thus, H1a is accepted.
5.1.3. H1b: Judgemental adjustment vs. causal adjustment
With the second hypothesis, H1b, we wish to compare forecast performance between causal and judgemental adjustment. Table 2 shows that the IMP of causal adjustment (1.12) was not significantly greater than that of judgemental adjustment (0.46) (p = 0.086). This evidence leads us to reject H1b.
5.1.4. H1c: Causal adjustment vs. mixed adjustment
The third hypothesis compared IMP for the causal and mixed adjustment groups. As Table 2 shows, there was very little difference in IMP between these two groups (p = 0.65).³ One of the reasons for this was that the initial forecasts of the causal adjustment group were comparatively inaccurate, and so the improvement in the revised forecasts was bigger than expected. This evidence leads us to reject H1c.

To summarise, judgemental extrapolation was the least accurate of the four methods, and people could take advantage of any additional information provided. This suggests that unaided judgemental eyeballing could benefit from adjustment with any type of information, with mixed adjustment providing the most improvement.

³ It should be noted, however, that only mixed adjustment (MAPE = 7.25) achieved marginal superiority in accuracy over the damped statistical forecast (MAPE = 7.80).
5.2. The reliability of causal information

Hypothesis H2 tests whether forecast improvement is affected by the reliability of the causal information. Table 3 reveals that the effectiveness of causal adjustment was contingent upon the reliability of the causal information: that is, the accuracy of causal adjustment was lower when the causal information was less reliable. We examine here the impact of causal information with regard to (1) adjustment from the initial forecast in the light of causal information alone (causal adjustment) and (2) adjustment from the initial forecast in the light of both causal information and the statistical forecast (mixed adjustment).
5.2.1. Impact on causal adjustment
Table 3 indicates that the high-causal group improved from the initial forecast significantly more than did the low-causal group (F = 30.645, p < 0.0005).
Table 3
IMP means across causal validity conditions

Method              Low-causal   High-causal
Causal adjustment   0.20         2.03
Mixed adjustment    0.31         2.20
5.2.2. Impact on mixed adjustment
Results similar to those for causal adjustment occurred for mixed adjustment. IMP from high-causal information was higher than from low-causal information (F = 28.48, p < 0.0005). The lack of a causal information × statistical forecast interaction (p = 0.732) suggests that high-causal information was also decision-effective in the presence of the statistical forecast: people took advantage of the causal information that was not contained in the statistical forecast (see Table 3).
5.3. Learning

We wish to examine whether there was any difference in IMP across time, depending on the varying validity of the causal information. Results indicate that there was no interaction between blocks and the validity of the causal information (p = 0.866). Fig. 1 shows that people did not appear to learn the task well over the time blocks. In fact, there was a significant decreasing linear trend in the accuracy of the revised forecast (F = 9.590, p < 0.005) and the initial forecast (F = 22.17, p < 0.0005). The increasing linear trend in IMP was merely due to the variation in decreasing accuracy rates between
the initial and the revised forecast. Slow learning in this study could be due to the difficulty of the task, in particular when people were provided with no information at all or low-causal information. Although they did not do well at the task even with high-causal information, the accuracy of the high-causal group did not deteriorate over time as much. Indeed, the no-causal group did perform significantly worse over time than the high-causal group (F = 5.729, p < 0.05). This evidence leads us to support the conclusion made earlier that people did not learn well to incorporate their judgement and the additional information into the final forecast. So, H3 receives some mixed support from the results.
6. Discussion
The ability of people to incorporate causal information into their forecasts is dependent upon a cognitive ability to (1) select diagnostic causal variables which are not contained in the statistical model and (2) meaningfully incorporate them into the task. This issue is of great importance since substantial benefits in judgemental adjustment come from contextual
[Figure: bar chart comparing the accuracy of the initial and revised forecasts across blocks 1-10, 11-20 and 21-30.]
Fig. 1. Changes in accuracy over blocks for initial and revised forecasts.
knowledge (Edmundson et al., 1988) and causal information is one type of contextual knowledge (Sanders and Ritzman, 1992). The present study examined the ability of people to incorporate causal information into the process of judgemental adjustment of the statistical forecast.

First, it is encouraging that people appeared to incorporate causal information beneficially in judgemental adjustment of time series. People were generally able to differentiate the reliability of causal information; in other words, they could discern that they should place more faith in a high-causal cue than in a low-causal cue. This finding runs contrary to Harvey et al. (1994) and Andreassen (1991), who cautioned against causal forecasting. The contradictory result may be due to the graphical interface used in our study and the nature of the task. The studies of Harvey et al. and Andreassen employed a non-graphical interface where causal information was presented in tabular form. Remember, the present study provided causal information for three consecutive days in bar charts. From this graphical presentation of causal information, the subjects should have been better able to associate the behaviour of the causal information with the graphical time series (e.g. Lawrence et al., 1986). Indeed, a greater mental load would be required to recall past causal actions on forecasts without graphical presentation.

Second, people were also able to successfully adjust from causal information as well as from the statistical forecast. However, the effectiveness of causal adjustment was contingent upon the reliability of the causal information. That is, low-causal information was less useful than high-causal information (Brehmer, 1978; Ashton, 1981; Casey and Selling, 1986), and thus causal adjustment tended to be more accurate than the other judgemental forecasting methods only when the adjustment was made from high-validity causal information.
Third, people were able to use causal information to improve accuracy over the statistical forecasts (Sanders and Ritzman, 1992). Indeed, with high-causal information, people outperformed the statistical forecast. This finding provides some support for the forecasting literature, which cautioned against routine judgemental adjustment conducted by those who do not have domain expertise (Goodwin and Wright, 1993). It may also extend the earlier studies of Andreassen (1991) and Harvey et al. (1994), who merely showed the inferiority of causal forecasting to judgemental extrapolation due to the cognitive biases and mental load involved in causal judgement. It should be noted, however, that the subjects in this study dealt with a single causal cue, which required less mental load. Accordingly, cognitive biases in causal judgement were less likely than those that may occur in a multiple-cue task, as in Andreassen (1991).

To explore the cognitive operation of incorporating causal information into the process of judgemental adjustment, the initial forecast, the statistical forecast and the causal information (where appropriate) were regressed against the revised forecasts (to obtain the subjective weights), and against the actual values (to obtain the optimal weights). The subjective weights provide information about the emphasis people put on each of the independent variables in deriving their final forecasts.4 Table 4 provides these subjective and optimal regression weights.

Table 4 shows that people were sensitive to the varying validity of the causal information. Smaller dependence was placed on the low-causal information (β = 0.16-0.17) than on the high-causal information (β = 0.24-0.37). However, a comparison with the optimal model (β*) reveals that overall there appeared to be conservative behaviour. Indeed, conservatism was more dysfunctional for the high-causal information (subjective β = 0.24 vs. optimal β* = 0.73) than for the low-causal information (subjective β = 0.17 vs. optimal β* = 0.27).

One of the prime interests in this study was to examine any difference in the reliance of people

4 Two other regression models were also tested. The more predictive of these regressed the initial forecast and the mean of the series against the final subjective forecasts. In all cases the R² was less than for the models presented in this paper.
Table 4
Weights for initial forecasts and reference information

                                    Initial     Statistical   Causal        Adj. R²
                                    forecast    forecast      information
Without statistical forecast
  Low causal               β        0.87        n/a           0.16          0.80
                           β*       0.49        n/a           0.23          0.30
  High causal              β        0.65        n/a           0.37          0.69
                           β*       0.35        n/a           0.65          0.69
With statistical forecast
  No causal                β        0.77        0.22          n/a           0.81
                           β*       0.22        0.43          n/a           0.32
  Low causal               β        0.61        0.33          0.17          0.74
                           β*       0.36        0.28          0.27          0.40
  High causal              β        0.66        0.26          0.24          0.83
                           β*       0.13        0.41          0.73          0.82

β = subjective weights (standardised). β* = optimal weights (standardised). p < 0.001 for all observations.
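The procedure behind Table 4 — regressing the revised forecasts (and, for the optimal weights, the actual values) on the available cues after standardisation — can be sketched as follows. The data below are synthetic stand-ins for illustration only, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical stand-ins for the three information sources:
initial = rng.normal(100, 10, n)                 # subject's initial forecast
statistical = initial + rng.normal(0, 5, n)      # statistical forecast
causal = rng.normal(0, 1, n)                     # causal cue (e.g. temperature)

# Hypothetical revised forecasts combining the three sources plus noise
revised = 0.6 * initial + 0.25 * statistical + 3 * causal + rng.normal(0, 2, n)

def standardised_weights(y, *cues):
    """OLS betas after z-scoring target and predictors (as in Table 4)."""
    Z = np.column_stack([(c - c.mean()) / c.std() for c in cues])
    z_y = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(Z, z_y, rcond=None)
    return beta

# Subjective weights: regress the revised forecast on the cues.
# Regressing the actuals on the same cues would give the optimal weights.
print(standardised_weights(revised, initial, statistical, causal))
```

Because all variables are centred before the regression, no intercept term is needed and the coefficients are directly comparable standardised weights.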
between the statistical forecast and causal information. Remember, the predictability of the statistical forecast was about midway between that of the low- and the high-causal information. Thus, a normative forecaster should place his/her weights in order of the predictive diagnosticity of the information. Surprisingly, however, Table 4 shows that people placed relatively greater weight on the statistical forecast than on either kind of causal information. Indeed, β for the high-causal information (0.24) was not higher than that for the statistical forecast (0.26). Considering that the correlations for the high-causal information, the low-causal information and the statistical forecast were 0.760, 0.285 and 0.466, respectively, this suggests that people preferred the statistical forecast to causal information. This may be because of the effort required to transform the value of the causal information onto the forecast scale (e.g. to translate temperature information into sales figures).

As Table 5 shows, people did not substantially change their weightings or reliance over time. We would expect that reliance on the high-causal information should steadily increase over time; overall, this did not occur. People did, however, seem to increase their reliance on the high-causal information somewhat as they performed the task. For example, when provided with both the statistical forecast and the high-causal information, people appeared to rely more on the statistical forecast (0.28) than on the high-causal information (0.16) in the first block. Yet they were able to adjust their behaviour and ended up placing more weight on the high-causal information (0.33) than on the statistical forecast (0.22) in the last block. The adjustment was, however, only slight and slow. Thus, we cannot conclude that they
Table 5
Weights for reference information over blocks

Condition                                           Cue                    Block 1  Block 2  Block 3
Statistical forecast and low-causal information     Statistical forecast   0.32     0.25     0.37
                                                    Low causal             0.17     0.17     0.18
Statistical forecast and high-causal information    Statistical forecast   0.28     0.33     0.22
                                                    High causal            0.16     0.26     0.33
No statistical forecast                             Low causal             0.23     0.13     0.14
                                                    High causal            0.09     0.19     0.27

p < 0.0005 for all observations.
learned the task effectively; any adjustment was insufficient.

The above discussion seems to provide encouraging support for the role of people in judgemental adjustment. However, their learning of the cues-to-causality link (Einhorn and Hogarth, 1982) was far from optimal. People did not appear to learn to use the validity of the causal information, and they did not take full advantage of it. The contribution of the extra-model information was far less than suggested by a normative model. People appeared to favour their own forecast much more than any of the additional diagnostic information (e.g. Tversky and Kahneman, 1974). This conservatism was more prominent for the high-causal information, and people were not able to adjust their behaviour accordingly over time. Interestingly, inappropriate preference was given to the statistical forecast rather than the high-causal information, despite the greater diagnosticity of the causal information. Perhaps greater cognitive effort was required to transform the causal information onto the forecast scale. Slovic (1972, p. 42) stated that information that has to be "transformed in any but simple ways will tend to be discounted or ignored". This may explain the higher reliance placed on statistical forecasts. Thus, the anchoring-and-adjustment heuristic appeared to be more dysfunctional in processing causal information than in processing the statistical forecast, which strongly suggests that people had more difficulty in learning from causal information than in using the statistical forecast. These results tentatively indicate that the role of judgement should be emphasised in the task of selecting good predictors rather than in incorporating additional information into the final forecast (Einhorn, 1972). Any improvement from adjustment appeared to be due merely to the predictive power of the causal cues provided rather than to people's ability to aggregate them.
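The accuracy cost of such conservatism can be illustrated with a small simulation: combining the three sources under least-squares ("optimal") weights versus weights that anchor heavily on the initial forecast. The data and the conservative weights below are hypothetical, chosen only to mirror the qualitative pattern reported above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Latent actuals and three noisy predictors (hypothetical, not the study's data)
actual = rng.normal(100, 10, n)
initial = actual + rng.normal(0, 12, n)      # judgemental initial forecast
statistical = actual + rng.normal(0, 9, n)   # statistical forecast
causal = actual + rng.normal(0, 5, n)        # highly diagnostic causal cue

X = np.column_stack([np.ones(n), initial, statistical, causal])

# "Optimal" weights: least squares against the actuals
w_opt, *_ = np.linalg.lstsq(X, actual, rcond=None)

# "Conservative" weights: anchor heavily on the initial forecast
w_con = np.array([0.0, 0.6, 0.25, 0.15])

mse = lambda w: np.mean((X @ w - actual) ** 2)
print(f"optimal MSE: {mse(w_opt):.2f}, conservative MSE: {mse(w_con):.2f}")
```

Because the conservative weighting loads on the least diagnostic cue, its error is dominated by the noise in the initial forecast, whereas the regression weighting shifts weight towards the causal cue.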
Indeed, any routine adjustment should be treated with caution, especially when the model is quite reliable. A reasonable approach is to ask managers to prepare an independent judgemental forecast with all the extra-model causal information they have. The rest of the adjustment process could then be mechanised (Lawrence et al., 1986).

The ability of people to learn the statistical nature of cue validity has long been debated in the literature. Some (Peterson and Beach, 1967; Nisbett et al., 1983) argue that people are able to use statistical rules in decision-making. On the other hand, there is also a solid body of evidence (Tversky and Kahneman, 1974) claiming the opposite. In this study, people increased their reliance on high-causal information only slightly over time. This learning was slow and any adjustment was insufficient. People seemed to adopt simple short cuts (e.g. averaging, Lopes, 1985) with little consideration for cue validity. In highlighting the problem of learning the validity of causal information, Brehmer and Kuylenstierna (1978) reported the limited ability of people to do so, even though they were provided with task-relevant information and the statistical notions required to process it.

The results of this study have also clearly shown that combining statistical forecasts and causal information is best done by statistical methods. Judgement showed itself to be quite poor at this task: regression analysis clearly shows that more appropriate weightings are attached to the input variables by this means. This result appears contrary to the studies from practice, which clearly show that judgement has an important role in the integration of additional causal information (Edmundson et al., 1988; Sanders and Ritzman, 1992). How can we reconcile the findings of this study with those of practice? There may be two reasons for the difference. First, it must be recognised that, particularly in the area of sales forecasting, statistical methods that permit the integration of multiple sources of information are rarely used (Dalrymple, 1987), and so we do not know how regression-based approaches would have performed.
But, second, it should also be recognised that the results in this study are predicated on a single causal variable that is available and applicable for all periods. It may be that this is not a common characteristic of the forecasting process (Lawrence et al., 1994). It may be more
common to have information about each variable that is only sporadic or relevant to a particular period. In addition, the information may come from a great many sources. For example, the product forecasting meeting for one month may be dominated by consideration of the entry of a competitor into the marketplace. The next month it may focus on impending stock shortages due to production difficulties; and the following month on the impact of the advertising campaign running that month. The variety and irregularity of the information considered in this process may preclude the use of statistical approaches. Given the results of this study, further research could be directed at an examination of the way in which people use such sporadic information in conjunction with their own and statistical forecasts.
7. Conclusions

This study has examined the ability of people to utilise causal information in conjunction with time series to produce a forecast. Results indicate that people are able to utilise the causal information and that they are partially sensitive to variations in its predictive validity. However, performance was far from optimal and people did not seem to adjust their behaviour over time. Moreover, statistical forecasts based solely on the time series were almost as accurate as judgemental forecasts using highly predictable causal information. In reconciling the results of this study with those of practice, it may be that causal/contextual information is sporadic and variable in nature, which would often make the use of statistical approaches untenable. Further research into the judgemental use of causal information that is highly varied and sporadic would help our understanding of the use of non-time series and causal information.
Appendix

[Figure: the experimental screen display, showing a bar chart of visitors (in thousands) for today, yesterday and two days ago against the series average, together with the causal cue message "Today's visitors will be down by -25.0%" and an OK button to confirm.]
References

Andreassen, P.B., 1991, Causal prediction versus extrapolation: Effects of information source on judgemental forecasting accuracy, Working paper, MIT.
Ashton, R.H., 1981, A descriptive study of information evaluation, Journal of Accounting Research, 19 (1), 42-61.
Brehmer, B., 1973, Single-cue probability learning as a function of the sign and magnitude of the correlation between cue and criterion, Organizational Behavior and Human Performance, 9, 377-395.
Brehmer, B., 1976, Subjects' ability to find the parameters of functional rules in probabilistic inference tasks, Organizational Behavior and Human Performance, 17, 388-397.
Brehmer, B., 1978, Response consistency in probabilistic inference tasks, Organizational Behavior and Human Performance, 22, 103-115.
Brehmer, B., 1980, In one word: Not from experience, Acta Psychologica, 45, 223-241.
Brehmer, B., 1988, The development of social judgement theory, in: B. Brehmer and C.R.B. Joyce, eds., Human Judgement: The SJT View (Elsevier Science Publishers B.V., North-Holland, Amsterdam), 13-40.
Brehmer, A. and B. Brehmer, 1988, What have we learned about human judgement from thirty years of policy capturing, in: B. Brehmer and C.R.B. Joyce, eds., Human Judgement: The SJT View (Elsevier Science Publishers B.V., North-Holland, Amsterdam), 75-114.
Brehmer, B. and J. Kuylenstierna, 1978, Task information and performance in probabilistic inference tasks, Organizational Behavior and Human Performance, 22, 445-464.
Brehmer, B. and L. Lindberg, 1970, The relation between cue dependency and cue validity in single-cue probability learning with scaled cue and criterion variables, Organizational Behavior and Human Performance, 5, 542-554.
Casey, C. and T.I. Selling, 1986, The effect of task predictability and prior probability disclosure on judgement quality and confidence, The Accounting Review, LXI (2), 302-317.
Chapman, L.J. and J.P. Chapman, 1969, Illusory correlation as an obstacle to the use of valid psychodiagnostic signs, Journal of Abnormal Psychology, 74 (3), 271-280.
Dalrymple, D.J., 1987, Sales forecasting practices: Results from a United States survey, International Journal of Forecasting, 3, 379-391.
Edmundson, R.H., M. Lawrence and M.J. O'Connor, 1988, The use of non-time series information in sales forecasting: A case study, Journal of Forecasting, 7, 201-211.
Einhorn, H.J., 1972, Expert measurement and mechanical combination, Organizational Behavior and Human Performance, 7 (1), 86-106.
Einhorn, H.J. and R.M. Hogarth, 1982, Prediction, diagnosis, and causal thinking in forecasting, Journal of Forecasting, 1, 23-36.
Garb, H.N., 1989, Clinical judgement, clinical training, and professional experience, Psychological Bulletin, 105 (3), 387-396.
Gardner, E.S., Jr. and E. McKenzie, 1985, Forecasting trends in time series, Management Science, 31 (10), 1237-1246.
Goldberg, L.R., 1968, Simple models or simple processes? Some research on clinical judgement, American Psychologist, 23 (7), 483-496.
Goodwin, P. and G. Wright, 1993, Improving judgemental time series forecasting: A review of the guidance provided by research, International Journal of Forecasting, 9, 147-161.
Hammond, K.R., D.A. Summers and D.H. Deane, 1973, Negative effects of outcome-feedback in multiple-cue probability learning, Organizational Behavior and Human Performance, 9, 30-34.
Harvey, N., F. Bolger and A. McClelland, 1994, On the nature of expectations, British Journal of Psychology, in press.
Hogarth, R.M., 1981, Beyond discrete biases: Functional and dysfunctional aspects of judgemental heuristics, Psychological Bulletin, 90 (2), 197-217.
Hogarth, R.M., 1987, Judgement and Choice, 2nd edn. (Wiley, New York).
Klayman, J., 1988, On the how and why (not) of learning from outcomes, in: B. Brehmer and C.R.B. Joyce, eds., Human Judgement: The SJT View (Elsevier Science Publishers B.V., North-Holland, Amsterdam), 115-163.
Kleinmuntz, B., 1990, Why we still use our heads instead of formulas: Toward an integrative approach, Psychological Bulletin, 107 (3), 296-310.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1985, An examination of the accuracy of judgemental extrapolation of time series, International Journal of Forecasting, 1, 14-25.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1986, The accuracy of combining judgemental and statistical forecasts, Management Science, 32 (12), 1521-1532.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1994, A field study of forecast accuracy, Working Paper, School of Information Systems, The University of New South Wales, Australia.
Lim, J.S. and M.J. O'Connor, 1994, Systems to support judgemental adjustment of initial forecasts, Twenty-Seventh Hawaii International Conference on System Sciences, Jan., 263-271.
Lopes, L.L., 1985, Averaging rules and adjustment processes in Bayesian inference, Bulletin of the Psychonomic Society, 23 (6), 509-512.
Makridakis, S., A. Anderson, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen and R. Winkler, 1982, The accuracy of extrapolation (time series) methods: Results of a forecasting competition, Journal of Forecasting, 1, 111-153.
Mathews, B.P. and A. Diamantopoulos, 1986, Managerial intervention in forecasting: An empirical investigation of forecast manipulation, International Journal of Research in Marketing, 3, 3-10.
Mathews, B.P. and A. Diamantopoulos, 1989, Judgemental revision of sales forecasts: A longitudinal extension, Journal of Forecasting, 8, 129-140.
Miller, P.M., 1971, Do labels mislead?: A multiple cue study, within the framework of Brunswik's probabilistic functionalism, Organizational Behavior and Human Performance, 6, 480-500.
Naylor, J.C. and R.D. Clark, 1968, Intuitive inference strategies in interval learning tasks as a function of validity magnitude and sign, Organizational Behavior and Human Performance, 3, 378-399.
Nisbett, R.E., D.H. Krantz, C. Jepson and Z. Kunda, 1983, The use of statistical heuristics in everyday inductive reasoning, Psychological Review, 90 (4), 339-363.
Peterson, C.R. and L.R. Beach, 1967, Man as an intuitive statistician, Psychological Bulletin, 68 (1), 29-46.
Sanders, N.R. and L.E. Ritzman, 1992, The need for contextual and technical knowledge in judgemental forecasting, Journal of Behavioural Decision Making, 5, 39-52.
Schustack, M.W. and R.J. Sternberg, 1981, Evaluation of evidence in causal inference, Journal of Experimental Psychology: General, 110 (1), 101-120.
Slovic, P., 1972, From Shakespeare to Simon: Speculations about man's ability to process multidimensional information, Interfaces, 2, 42.
Sniezek, J.A., 1986, The role of variable labels in cue probability learning tasks, Organizational Behavior and Human Decision Processes, 38, 141-161.
Tversky, A. and D. Kahneman, 1974, Judgement under uncertainty: Heuristics and biases, Science, 185, 1124-1131.
Tversky, A. and D. Kahneman, 1980, Causal schemas in judgements under uncertainty, in: M. Fishbein, ed., Progress in Social Psychology, vol. 1 (Lawrence Erlbaum Associates, Hillsdale, NJ), 49-72.
Wolfe, C. and B. Flores, 1990, Judgemental adjustment of earnings forecasts, Journal of Forecasting, 9, 389-405.
Biographies: Joa Sang LIM is employed by Samsung Data Systems in Seoul, Korea. Prior to that appointment he was a doctoral student at the University of New South Wales. His research has been concerned with utilising decision support systems to aid the task of forecasting and the cognitive processes involved in the use of judgement in that task.

Marcus O'CONNOR is an Associate Professor in the School of Information Systems at the University of New South Wales, Sydney, Australia. His research interests centre around the use of judgement in forecasting and decision-making and the way in which various types of information are incorporated into the judgemental process.