Immediate and short-term judgemental forecasting: Personologism, situationism or interactionism?


Person. individ. Diff. Vol. 9, No. 1, pp. 109-120, 1988. Printed in Great Britain. All rights reserved. 0191-8869/88 $3.00+0.00. Copyright © 1988 Pergamon Journals Ltd

GEORGE WRIGHT and PETER AYTON¹

Bristol Business School, Bristol Polytechnic, Coldharbour Lane, Bristol BS16 1QY, and ¹Psychology Department, City of London Polytechnic, Old Castle Street, London E1 7NT, England

(Received 9 February 1987)

Summary-This study examines the relative influence of the person and the forecasting situation on the quality of judgemental probability forecasting in the immediate- and short-term. College students made forecasts for low- and high-desirable events over forecast periods differing in imminence and time duration. The major finding is one of strong evidence for personologism in forecasting response and performance. Situational effects on forecasting and interactions between the forecasting situation and the forecaster were found to be negligible, at least with the sample of people and situations studied here. It may well be possible to select people with an all-round ability in judgemental forecasting.

INTRODUCTION

This paper is concerned with evaluating the relative influence of the person and the situation on the quality of judgemental forecasting. By the quality of judgemental forecasting we mean the ability to assess realistic probabilities for the occurrence of future events. One measure of realism of forecasts is ‘calibration’. For a person to be perfectly calibrated, assessed probability should equal percentage correct over a number of assessments of equal probability. In other words, for all events assessed as having a 0.XX probability of occurrence, XX% should actually occur. More details about the mathematical measurement of calibration and related scores are given in the method section of this paper.

Previously, most of the research on the calibration of probability assessments has utilized general knowledge questions of the form: “Which is Longer? (a) Suez Canal (b) Panama Canal.” The respondent is required to indicate the answer he or she feels to be the correct one and then indicate how sure he or she is of this by assigning a probability in the range 0.5 to 1.0. Since the experimenter already knows the answers to the questions, calibration measures can be easily computed (see Lichtenstein, Fischhoff and Phillips, 1982). A general finding has been that people are often ‘overconfident’ in their assessments, in that for all propositions assessed as having a 0.XX probability of being true less than XX% actually are true. However, Wright (1982) has shown that calibration and overconfidence differ between sets of general knowledge and future event questions. Probability assessments for future event questions tend to be less certain than probability assessments for past-event and general knowledge questions. Wright and Wisudha (1982) have argued that overconfidence shown with general knowledge questions is due to the positive social utility of expressing certainty in a situation where there is known to be an answer.
Conversely, with future event questions the initial response anchor may be a 0.5 probability which indicates an immediate recognition of uncertainty. Insufficient adjustment from this anchor would result in the underconfidence for future event questions shown in Wright’s (1982) study.

Why is it important for probability assessors to make realistic forecasts? Imagine a simple two-alternative decision problem where a businessman has to decide whether to re-tool to manufacture product A or whether to re-tool to manufacture product B. Each of the two products requires a different type of production and plant and because re-tooling is extremely expensive the businessman can only produce one type of product. Suppose that the utilities attached to the outcomes are such that the production of A is more attractive if the probability of making a ‘reasonable’ return is >0.7, otherwise it is better to manufacture product B. If the businessman assesses the probability of a reasonable return on product A as 0.8, but is poorly calibrated, so the appropriate probability is 0.6, then the businessman would not maximise expected utility by re-tooling for product A. Furthermore, when the decision problem is complicated by an extensive set of act and event sequences and large potential payoffs, compounded miscalibration can mean a large expected loss.

Although not concerned with calibration, other research on judgemental forecasting has recognised the influence of the time horizon and the desirability of the occurrence of a particular event upon probability assessment. This research is reviewed briefly below.

Studies of judgemental forecasting

The duration of the time period and its imminence are two obvious factors which one might expect to have some influence on forecasting. Milburn (1978) investigated the effect of imminence and found that subjects perceived desirable events as becoming increasingly more likely to occur in each of four successive decades in the future. In contrast, undesirable events were perceived as becoming less likely in each of the four successive decades. However, in a complementary experiment where each subject gave forecasts for only one of the decades, there was a significant downward trend in perceived likelihood for both types of event across time into the future. These results are difficult to interpret. Milburn suggested that the availability heuristic (Tversky and Kahneman, 1974) would predict that subjects will feel that the world may change more and become progressively less like the present in successive decades. Thus it should be harder to imagine what the world will be like and events should be seen as less probable. Desirable events increase in probability over time because, Milburn argued, desirability has a greater effect in more ambiguous circumstances. Due to the time periods involved, there was no attempt to assess the calibration of the forecasts. Perhaps the most significant result from this research is that when subjects do not have to produce a number of forecasts for successive time periods the pattern of results changes. In summary, the influence of imminence does not seem to be straightforward. It interacts with the desirability of the events to be forecast and changes as a function of the method of elicitation of the forecasts. The way in which forecast probabilities vary as a function of the time duration seems to have received little attention. Howell and Burnett (1978) have suggested that short spans will be influenced to a greater degree by local context effects and less by knowledge of the past frequency of the events. 
Thus subjects may believe in ‘luck’ playing a bigger part in the short-term and are prone to unstable ephemeral influences. They cited the gambler’s fallacy (e.g. the belief that a coin which comes up heads many times in a row will more likely come up tails next) as an illustration of this view.

A number of studies have identified the desirability of an event as a significant factor affecting forecast judgements. Zakay (1983) found that subjects perceive desirable life events as being more likely to occur to themselves than to another person similar to themselves. Undesirable life events showed the opposite effect. This contrasting pattern of results can be termed an optimistic bias. Zakay suggested that operation of the availability heuristic would tend to retrieve many instances of extremely undesirable events occurring to others from news reports, but not to oneself. The availability heuristic does not explain the finding for desirable events, however. Weinstein (1980) found similar results to those reported by Zakay, as well as finding that perceived controllability of events correlated positively with the amount of optimistic bias. Controllability was measured by a 5-point subjective rating scale.

Some individual differences in forecasting ability have been reported, although relating those to specific personality variables has proved difficult. O’Carroll (1977), in a study investigating economic forecasting, found substantial differences between the individuals’ performances, as evaluated by a scoring rule, that were consistent over the five forecasting tasks to be analysed. These tasks involved predicting each week for eighteen weeks, the Financial Times share index, the dollar-sterling rate, etc. Borges, Roth, Nichols and Nichols (1980) attempted to relate subjects’ accuracy of prediction of their own academic performance to measures of self esteem and locus of control.
Although they found some significant correlations the picture is not clear since age and gender of the subjects moderated the correlations.

In summary, the time duration of a forecast and the imminence of a forecast time period would seem to be variables that will influence forecasting ability. Desirability in interaction with time period and degree of personal involvement seems a well established factor. Individual differences, although undoubtedly present, remain to be fully explored and explained. For this reason, an investigation of forecasting ability to determine the relative influence of the person and the task would seem to be of major importance, since most decisions are made under uncertainty about the future (for a review see Wright, 1984).

Wheelwright and Makridakis (1980) in their review of statistical forecasting methods and their applications have differentiated time horizon as the single most useful criterion for matching a specific forecasting situation with the most appropriate forecasting technique. These authors differentiate immediate-term, short-term, medium-term and long-term horizons. By immediate-term they mean those forecasts that are prepared for one month or less in advance. Short-term refers to a 1-3 month horizon whilst medium-term refers to 3 months to 2 years and long-term to a horizon greater than 2 years. Only the distinction between immediate-term and short-term need concern us here. For immediate-term forecasting, the techniques most appropriate are smoothing techniques. Smoothing methods are often not appropriate for short-term situations and decomposition, control and other more complex techniques are often used.

The role of judgement in immediate- and short-term forecasting has not been systematically evaluated. Makridakis and Wheelwright (1979) concluded that: “application of quantitative approaches will continue to increase and supplement or replace many of the applications now handled through purely judgemental approaches” (Makridakis and Wheelwright, 1979, p. 348). However, they also note that: “Of course it must be remembered that just as it is impossible to say which methodology is always best, it is impossible to conclude that quantitative methods are always better than subjective or judgementally based methods. Human forecasters can process much more information than most of the formalized quantitative methods, and such forecasters are more likely to have knowledge of specific near-term events that need to be reflected in current forecasts” (p. 348). Gilchrist (1976) has argued that judgemental forecasts are used when there is insufficient time to obtain and use a statistical forecast or when situations are changing so rapidly that a statistically-based forecast would be no use, even as a guide.

From this short overview of the role of judgement in forecasting it is clear that from applied and theoretical points of view the study of judgemental forecasting is an important topic. The contribution of individual and task influences to immediate-, short-, medium- and long-term judgemental forecasting is worthy of systematic investigation. Before we turn to our own study of judgemental forecasting in the immediate and short-term we will selectively review previous research on the contribution of individual and task influences to decision-making in general. This review sets the scene for our own study.

Studies of decision-making

Various decision situations have been studied, various correlates of decision-making have been obtained, and various decision styles have been proposed. The major difference between studies has been the relative emphasis on the decision-maker or the decision situation as the main source of behavioral variation. Wright (1984) gives an overview of research in behavioral decision theory. Cognitive style research, which has emphasised the decision-maker, and contingent decision research, which has emphasised the decision situation, can be viewed as alternative frameworks for research on decisional variance. The results of research conducted within each framework tend to be congruent with that framework because the methodologies of the frameworks tend to initiate fairly distinct lines of research and theory. Wright and Fowler (1986) have argued that these two distinct lines have followed closely the correlational or psychometric research format and the experimental research format, respectively.

Cognitive style research

In the psychometric tradition, Driver and Mock (1975) have identified four basic decision styles. The Decisive style is “one in which a person habitually uses a minimal amount of information to generate one firm option. It is a style characterized by a concern for speed, efficiency and consistency”. The Flexible style “also uses a minimal data, but sees it having a different meaning at different times . . . It is a style associated with speed, adaptability and intuition”. In contrast, the Hierarchic Style “. . . uses masses of carefully analysed data to arrive at one best conclusion. It is associated with great thoroughness, precision and perfectionism”. Similarly, the Integrative Style “. . . uses masses of data, but will generate a multitude of possible solutions . . . It is a highly experimental, information-loving style, often very creative”. Driver and colleagues have developed two main psychometric measures of Decision Style which have apparently predicted such behavior as decision speed, use of data, information search and information purchase on experimental tasks. However, most if not all of these empirical studies, including the psychometric measures themselves, are contained in unpublished reports and so the quality of this research is difficult to evaluate. For instance, Driver (1974) has apparently shown that some persons use one style predominantly whereas others employ one style as often as another.

McGhee, Shields and Birnberg (1978) have published a study examining the relationship between Driver’s decision styles and information processing in decision-making. They used a decision situation in which their subjects were to make recommendations about whether to include certain companies in the investment portfolio of a large insurance company. Information about the companies included information about eight cue variables including sales, net income and primary earnings per share. By constructing a multiple linear regression model of each subject, McGhee et al. were able to operationalize “use of information” by counting the number of significant betas in a subject’s model, since a significant beta means that when a cue is systematically varied the subject’s judgement is affected. No significant difference was noted for those subjects classified as maximal or minimal data users on the basis of Driver’s (1971) Integrative Style Test.

The Myers-Briggs type indicator (Myers, 1962) has also been used to discriminate decision styles. This indicator follows the psychology of types developed by Jung. According to Casey (1980), individuals categorized as sensors “prefer to analyze isolated, concrete details in making a decision”, whereas intuitors “focus on relationships, or gestalt”. Davis (1982) utilized the Myers-Briggs instrument to differentiate individuals’ performance on a computer simulation of a production function. Individual decision-makers acted as operations managers. One of the tasks faced by his subjects was to develop a production plan for a 5 week production period with the objective of minimizing the firm’s total costs. Davis found that sensing subjects obtained significantly lower costs than intuitive subjects. He argued that this was because his decision task was analytical and moderately-well-structured whereas other tasks involving tactical and strategic decisions would be less well-structured and would tend to favour good performance by intuitive types.

Wright and Phillips (1984) argued that it was possible to define alternate cognitive styles of probabilistic and non-probabilistic thinking and predict performance across a variety of decision tasks. Specifically they argued that a probabilistic thinker with no cognitive limitations would take a probabilistic rather than non-probabilistic view when confronted with uncertainty, would value information that could reduce uncertainty, would revise probabilities in light of new information, would be less prone to violate a normative axiom of decision theory, would take account of future uncertainties when making plans, and would show no bias for certain over uncertain events just because the former are certain. On the other hand, the non-probabilistic thinker would translate uncertainty into yes-no or don’t know terms, would put little value on fallible information, would show little revision of probabilities when fallible information is presented, would be prone to violate a normative axiom of decision theory, would make plans on the basis of best guesses, and would be biased towards opinions with certain consequences. However, Wright and Phillips found little evidence of cross-situational consistency of their hypothesised cognitive styles and they were led to conclude that “Probabilistic and non-probabilistic thinking, at least with the sample and measures used here, do not appear to be unidimensional cognitive styles.”

Contingent decision research

Researchers working in the experimental tradition have placed emphasis on the contribution of task characteristics to decisional variance. Kahneman and Tversky (1982) in their “Prospect Theory” of decision-making argued that the subjective valuation of the possible outcomes of a decision is crucially dependent on the decision maker’s reference point or frame of reference. For example, “Framing effects in consumer behavior may be particularly pronounced in situations that have a single dimension of cost (usually money) and several dimensions of benefit. An elaborate tape deck is a distinct asset in the purchase of a new car. Its cost, however, is naturally treated as a small increment over the price of the car. The purchase is made easier by judging the value of the tape deck independently and its cost as an increment. Many buyers of homes have similar experiences. Furniture is often bought with little distress at the same time as a house. Purchases that are postponed, perhaps because the desired items were not available, often appear extravagant when contemplated separately: their cost looms larger on its own. The attractiveness of a course of action may thus change if its cost or benefit is placed in larger account.” (Kahneman and Tversky, 1982, p. 140).

In another study which emphasised the contingent nature of decision-making, Lichtenstein and Slovic (1971) had subjects evaluate gambles by a choice procedure and also by a bidding procedure. The choice procedure required subjects to indicate which of a pair of gambles they preferred whilst the bidding procedure asked subjects to name an amount of money at which they would be indifferent between playing a specified single gamble or having that amount of money. When they compared the results of the choice and bidding procedures they found, surprisingly, that for the same subject the results of the two procedures were not correlated. Specifically, they often found that subjects would indicate a greater preference for one gamble when a choice procedure was used, and bid more for another gamble when a bidding procedure was used. When asked to choose between gambles, people tended to prefer those containing a higher probability of winning, whereas higher bids were made for gambles containing the larger amounts to win.

Payne (1982) summarized most of this situation-orientated research: “The present review strongly suggests the conclusion that decision-making is a highly contingent form of information processing. The finding that decision behavior is sensitive to seemingly minor changes in task and context is one of the major results of years of decision research. It will be valuable for researchers to continue to identify task and context effects. Nevertheless, the primary focus of decision research should now be the search for some general principles from which contingent processing would follow.”

Personality psychology and the cognitive style/contingent decision-making issue

Wright (1985) has argued that the time is now opportune for a reconciliation of decision style and contingent decision-making research. He drew the analogy to research in personality where similar issues have been of major concern. In personality psychology, three main theoretical positions describe the individual and his interaction with the environment. Personologism advocates that stable intraorganismic constants such as traits or cognitive styles are the main determinants of behavioral variation (e.g. Alker, 1971). Situationism emphasises environmental (situational) factors as the main sources of behavioral variation (e.g. Mischel, 1968). Interactionism, a synthesis of personologism and situationism, implies that the interaction between these two factors is the main source of behavioral variance (e.g. Endler and Magnusson, 1976). Bowers (1973) has noted that almost all recent studies investigating the source of behavioral variation have concluded that interactionism is more important than either personologism or situationism. In short, interactionism would seem to be the major contemporary conceptualization of personality.

Two major methodologies have been used for investigation of the relative contribution of the person, the situation or an interaction of the two. The first strategy has been simply to correlate measures of a personality trait or behavior. As Endler (1975) pointed out, this strategy has usually yielded correlations of 0.30 and such results have usually been taken to support the situationist position. However, it must be noted that whilst such low correlations do not support a personologist or trait position they do not differentiate between situationism and interactionism. This is because correlations may be attenuated by interactions and by less than perfect reliabilities of the tests correlated, in addition to situational specificity.

The second research strategy has been to use an analysis of variance (ANOVA) approach which allows the relative variance contributed by situation, persons and an interaction of these to be evaluated. Endler and Hunt (1968, 1969) provide one of the first demonstrations of this technique in an investigation of the person-situation issue. In essence, the development of the ANOVA approach has made comparisons possible between the personologist, situationist and interactionist positions. Endler (1966) presents an account of a variance components technique that surmounts the methodological problem of directly comparing mean squares from different sources of variance where the mean squares are not independent of one another.

However, more recently, some personality psychologists have pointed to some potential problems with what would appear, at first sight, to be an ideal methodology for disentangling situationist, interactionist and personologist accounts of decisional variance. Golding (1975) in a paper entitled “Flies in the ointment: methodological problems in the analysis of the percentage of variance due to persons and situations”, showed that Endler’s methodology is inappropriate. Golding constructed a hypothetical data set demonstrating perfect consistency across both persons and situations and showed that Endler’s variance components technique attributed almost twice as much influence to persons as to situations. In the paper, Golding advocated use of Cronbach, Gleser, Nanda and Rajaratnam’s (1972) generalizability theory as a more appropriate methodology.
However, use of generalizability theory to investigate the existence of interactions between persons and situations is not so straightforward. As Golding himself notes, “These coefficients . . . are not presented . . . because they are rather difficult to interpret.”

The present study

We focus on judgemental forecasting and investigate the relative influence of persons and forecasting situations on variation in our response and performance measures in a study of immediate- and short-term forecasting. As we argued earlier, knowledge about the source(s) of differences in forecasting ability is of major value because of the implications for the selection of effective decision-makers. The methodology we use to investigate the relative influence of the person and the situation differs from that of Endler (1966) or Golding (1975). Instead we compute product-moment correlations between people’s scores on a forecasting measure across situations and we compute point-biserial correlations between situations (coded nominally) and people’s scores on a forecasting measure. The relative consistency of persons and situations, respectively, can then be compared.

In addition, we compute Tukey’s (1949) test for non-additivity on our data when it is presented in a one-way analysis of variance format. Tukey’s test is sensitive to a type of interaction indexed by the correlation between a subject’s average performance and the rate at which his performance changes relative to changes in the group performance. Consider an experiment in which each of a group of subjects is tested in a number of different situations. We would expect an individual’s score to be correlated with the situation mean score; for example, if the group means over the different situations show an increase we would expect any individual subject to show an increase. A person-situation interaction can be said to exist if the rate of increase over situations varies for individual subjects. If this variance in the rate of change of subjects is systematic within the terms of the experiment then it cannot be error of measurement and it cannot be interaction with other extra-experimental factors.
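As a rough illustration of the analyses just described, the following Python sketch computes a product-moment correlation between two situations across subjects, a point-biserial correlation between a pair of situations (dummy-coded 0 and 1) and the pooled scores, and the F-ratio of Tukey's one-degree-of-freedom test for non-additivity on a subjects × situations table. The function names, the use of NumPy and the small score matrix are illustrative assumptions and are not taken from the study; the Tukey statistic follows the standard textbook formula rather than any procedure specific to this paper.

```python
import numpy as np

def person_consistency(scores, sit_a, sit_b):
    """Product-moment correlation, across subjects, between two situations."""
    y = np.asarray(scores, dtype=float)
    return np.corrcoef(y[:, sit_a], y[:, sit_b])[0, 1]

def situation_effect(scores, sit_a, sit_b):
    """Point-biserial correlation between situation (coded 0/1) and score."""
    y = np.asarray(scores, dtype=float)
    pooled = np.concatenate([y[:, sit_a], y[:, sit_b]])
    codes = np.concatenate([np.zeros(len(y)), np.ones(len(y))])
    return np.corrcoef(codes, pooled)[0, 1]

def tukey_nonadditivity(scores):
    """F-ratio and degrees of freedom for Tukey's test for non-additivity.

    `scores` is a subjects x situations table with one score per cell.
    """
    y = np.asarray(scores, dtype=float)
    n_subj, n_sit = y.shape
    grand = y.mean()
    a = y.mean(axis=1) - grand            # subject effects
    b = y.mean(axis=0) - grand            # situation effects
    ss_total = np.sum((y - grand) ** 2)
    ss_subj = n_sit * np.sum(a ** 2)
    ss_sit = n_subj * np.sum(b ** 2)
    ss_resid = ss_total - ss_subj - ss_sit
    # Single-degree-of-freedom component of the residual that tracks
    # the product of subject and situation effects.
    ss_nonadd = (a @ y @ b) ** 2 / (np.sum(a ** 2) * np.sum(b ** 2))
    df_rem = (n_subj - 1) * (n_sit - 1) - 1
    f_ratio = ss_nonadd / ((ss_resid - ss_nonadd) / df_rem)
    return f_ratio, (1, df_rem)

# Hypothetical scores: 3 subjects (rows) in 3 situations (columns).
example_scores = [[0.0, 1.0, 2.0],
                  [1.0, 2.0, 3.0],
                  [2.0, 3.0, 7.0]]

print(person_consistency(example_scores, 0, 1))
print(situation_effect(example_scores, 0, 1))
print(tukey_nonadditivity(example_scores))
```

For this hypothetical table the subjects are ranked identically in the first two situations, so the product-moment correlation is 1, and the Tukey F-ratio works out to 1875/159 (about 11.8) on 1 and 3 degrees of freedom. Obtaining a significance level would require an F-distribution table or a statistics library, which the sketch deliberately leaves out.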
Suppose that an individual’s degree of increase over situations was correlated with his average performance. For example, subjects with high average performance might show less change over situations because they are already near to their ceiling. Other subjects might show a greater change as they have more room for improvement. The correlation between an individual’s change in behavior across situations and the difference between that change and the mean change of the group is what is detected by the Tukey test. The test identifies a component in the residual term which, if present to a significant degree, can be utilized in future studies to make precise predictions about the behavior of individuals across situations. Ayton and Wright (1985, 1986) have outlined some theoretical advantages of using this test when analysing data for the relative influence of individual differences, situational factors and interactions between these variables.

METHOD

A total of 36 British students attending the City of London Polytechnic voluntarily completed the questionnaire in November 1983. The questionnaire presented 210 statements such as “Snow (a) will, (b) will not fall in London in December”. Seventy of the statements dealt with December, and an amended version of the same 70 statements dealt with January. A similar set of amended statements concerned the two month period, December and January. These manipulations allowed us to study the effects of varying the imminence of a one month duration forecast period and the effects of changes in the duration, one month or two months, of a forecast period. Both of these manipulations change judgemental forecasting from immediate-term to short-term. Since there is an applied rather than a theoretical basis for the study of these forecasting terms we have no a priori rationale that other possible time manipulations, for example, 1 week versus 12 weeks, would be more suitable.

Each set of seventy statements appeared in a separate part of the whole questionnaire and the ordering of the part-questionnaires was randomised across subjects. Within a particular part-questionnaire the ordering of individual questions was also randomised. Subjects were instructed to mark the right answer and to indicate how sure they were of the answer by writing a percentage between 50 and 100, where a percentage of 50 quantifies a “don’t know” response and a percentage of 100 quantifies a certainty response. Beach and Wise (1969) found close similarity between direct probability estimates and confidence ratings on a scale similar to that used in the present study. Intuitively it seemed that the sample of polytechnic students reported here would find it easier to express their uncertainty as confidence on a 50 to 100 scale rather than as a probability on a 0.5 to 1 scale, therefore degree of belief was measured as a percentage.
After each subject had completed all the probability assessment questionnaires he or she rated his or her subjective desirability for each of the events when these were expressed in a general way, such as “Snow falls in London”. Desirability was assessed on a seven-point scale with 1 meaning that you really would not like the event to happen and 7 meaning that you really would like the event to happen. The ordering of the events in the desirability questionnaire was randomised across subjects.

For each subject, several probability performance measures were computed for the three sets of 70 questions after the time allowed for the event’s occurrence or non-occurrence had elapsed.

$$\text{Calibration} = \frac{1}{N} \sum_{t=1}^{T} n_t (r_t - c_t)^2 \qquad (1)$$

where $N$ is the total number of responses, $n_t$ is the number of times response $r_t$ was used, $c_t$ is the proportion correct for all items assigned probability $r_t$ and $T$ is the total number of different response categories used (e.g. $T = 2$ for subjects who use only responses of 0.5 and 1.0). This measure was first proposed by Murphy (1973) and is reported in Lichtenstein and Fischhoff (1977). A perfectly calibrated person would score 0 on this measure.

$$\text{Over/underconfidence} = \frac{1}{N} \sum_{t=1}^{T} n_t (r_t - c_t) \qquad (2)$$

This measure is equivalent to the difference between the mean of the probability responses and the overall proportion correct. Overconfidence is shown by a positive difference, underconfidence by a negative difference.

$$\text{Resolution} = \frac{1}{N} \sum_{t=1}^{T} n_t (c_t - c)^2 \qquad (3)$$

where $c$ is the overall proportion correct.
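As a minimal sketch of how the three performance measures defined above could be computed for one subject, the Python function below groups the responses by the probability assigned and accumulates the three weighted sums. The function name and the example data are hypothetical, and the sketch assumes that the normalising constant is the total number of responses and that the reference value in the resolution score is the overall proportion correct.

```python
from collections import defaultdict

def forecast_scores(responses, outcomes):
    """Return (calibration, over/underconfidence, resolution).

    responses: probability (0.5 to 1.0) assigned to each chosen answer
    outcomes:  1 if the chosen answer turned out correct, else 0
    """
    n_total = len(responses)
    by_response = defaultdict(list)
    for r, correct in zip(responses, outcomes):
        by_response[r].append(correct)

    overall_correct = sum(outcomes) / n_total   # overall proportion correct
    calibration = overconfidence = resolution = 0.0
    for r_t, hits in by_response.items():
        n_t = len(hits)                  # times this response was used
        c_t = sum(hits) / n_t            # proportion correct for it
        calibration += n_t * (r_t - c_t) ** 2
        overconfidence += n_t * (r_t - c_t)
        resolution += n_t * (c_t - overall_correct) ** 2
    return (calibration / n_total,
            overconfidence / n_total,
            resolution / n_total)

# Hypothetical subject: four "don't know" answers (half correct) and
# four "certain" answers (three correct).
example_responses = [0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]
example_outcomes = [1, 0, 1, 0, 1, 1, 1, 0]

print(forecast_scores(example_responses, example_outcomes))
```

In this made-up case the mean response is 0.75 and the proportion correct is 0.625, so the over/underconfidence score is 0.125 (overconfident), the calibration score is 0.03125 and the resolution score is 0.015625.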

The resolution measure, first proposed by Murphy (1973), is independent of calibration. Resolution, as Yates (1982) notes, refers to: “the ability of the forecaster to discriminate individual occasions on which the event of interest will and will not take place”. Since the number of categories, T, will affect the size of the calibration and resolution scores all responses were grouped into six categories: 50 to 59, 60 to 69, 70 to 79, 80 to 89, 90 to 99 and 100. The mean response in each category was used as $r_t$ and the proportion correct across the whole category was used for $c_t$. This method is in accordance with Lichtenstein and Fischhoff (1980). We also computed a subject’s mean probability response to each set of 70 statements. The importance of noting this response measure in addition to calculating the performance measures, defined above, will become clear later.

Table 1. Between-subject correlations across situations

Dependent measure    December and January    January and Dec/Jan    December and Dec/Jan    Mean correlation
Calibration          0.649                   0.765                  0.425                   0.613 (P < 0.002)
Overconfidence       0.526                   0.663                  0.576                   0.588 (P < 0.002)
Resolution           0.293                   0.396                  0.121                   0.270 (P > 0.05)
Mean probability     0.858                   0.761                  0.802                   0.807 (P ...)

Table 2. Point-biserial correlations between pairs of months and forecasting measures

Dependent variable   December and January    January and Dec/Jan    December and Dec/Jan    Mean correlation
Calibration           0.079                   0.031                  0.025                   0.045 (P > 0.05)
Overconfidence       -0.065                  -0.067                  0.042                  -0.030 (P > 0.05)
Resolution            0.069                  -0.073                  0.143                   0.046 (P > 0.05)
Mean probability     -0.090                   0.020                 -0.081                  -0.050 (P > 0.05)

RESULTS

Forecasting

periods

Table 1 presents the between-subject correlations across the pairs of forecasting periods. The mean correlation for each dependent measure is also reported. With the exception of the correlations based on the resolution measure, the mean correlations are all positive and significantly different from zero. The result suggests that forecasting performance, as measured by calibration and overconfidence scores, is, to a fair degree, consistent across the three forecasting situations. The response measure, mean probability, also shows cross-situational consistency. Table 2 presents the point-biserial correlations computed between pairs of forecasting periods* and the forecasting measures. None of the correlations in the matrix is significantly different from zero. These results suggest that the forecasting periods used in the present study have only a negligible effect on forecasting performance and mean probability response. In other words, the evidence for situationism is minimal. Prior to computing Tukey’s test for non-additivity it was necessary to compute a one-way analysis of variance for each of the performance and response measures. The independent variable had three levels corresponding to the forecast periods. Table 3 sets out the obtained F-ratios. The time period of the forecast had no significant effect on calibration, overconfidence and resolution, the three performance measures. The subsequent Tukey tests revealed no evidence of interactions. This result, taken with the correlational analyses reported earlier, suggests that forecasting performance is a personal characteristic relatively uninfluenced by situations or an interaction between person and situation, at least in the situations studied here. Similarly, the one way analysis of variance of the forecasting response measure, mean probability, did not achieve significance. But Tukey’s test for non-additivity did reach significance at the 0.05

level for this response measure. This result implies that individual subjects are adapting their responses differentially to changes in the forecasting periods. It may be that subjects are interactively changing their probability responses in order to keep their forecasting performance steady.

*Here, the first forecasting period was coded as 0 and the second as 1. In fact, it would make no difference what numerical label a forecasting period was given as long as it was consistently assigned and different from the numerical label given to the second period.

Table 3. One-way ANOVA and Tukey test on the forecasting performance and response measures

                    Analysis
Measure             One-way ANOVA                       Tukey test for non-additivity
Calibration         F = 0.54, df = 2, 70 (P > 0.05)     F = 1.54, df = 1, 69 (P > 0.05)
Overconfidence      F = 0.48, df = 2, 70 (P > 0.05)     F = 0.09, df = 1, 69 (P > 0.05)
Resolution          F = 1.07, df = 2, 70 (P > 0.05)     F = 2.77, df = 1, 69 (P > 0.05)
Mean probability    F = 3.155, df = 2, 70 (P > 0.05)    F = 5.31, df = 1, 69 (P < 0.05)

Desirability of events

Table 4 presents the between-subject correlations using the 35 lowest-desirable and the 35 highest-desirable events, within the three forecasting periods. The mean correlation for each dependent measure is also reported. Similarly to our findings of consistency of forecasting performance and response across time periods, calibration, overconfidence and mean probability response show significant, positive, between-subject mean correlations. However, the cross-situational consistency of these two performance measures, as measured by changes in $r^2$, has been reduced. Personologism in forecasting performance now accounts for some 15% of the variance in each of the dependent measures. Clearly, forecasting performance may be more consistent across variations in time periods than variations in the desirability of the events to be predicted. However, we must caution that the number of data points on which the performance measures were calculated was reduced in the latter analysis. Table 5 presents point-biserial correlations computed between high and low desirability items and the forecasting measures†. None of the correlations in the matrix is significantly different from

zero. This result complements that of the previous analysis of situationism across the forecasting periods. Evidence for situationism in forecasting performance and response across changes in the desirability of the events to be forecast is minimal.

Table 6 presents a measure of interaction analogous to the Tukey test presented earlier in Table 3. For two levels, A and B, of a factor, the interaction can be represented as a correlation between $A_i + B_i$ and $A_i - B_i$, where $A_i$ and $B_i$ are the scores of individual $i$ at the two levels. Thus any relationship between the subjects’ mean performances and changes in performance across situations will be revealed; a significant correlation implies a person-situation interaction. In our case, the correlation is between a subject’s summed score on the dependent variable for the low- and high-desirable events and his or her change in score across the partitioned questionnaire. The Pearson correlation coefficients indicate the degree of correlation between each subject’s mean performance and the rate of change in that performance across situations. Although (as with the Tukey test) there are possible subject interactions to which this measure is insensitive, it detects all interactions with a significant linear component. The greater the correlation, the greater the degree of interaction. As can be seen, only one of the correlations is significantly different from zero. There seems little evidence for subject-situation interactions in forecasting on any of the four measures we have considered.

†Low-desirable events were coded 1 and high-desirable events were coded 0.

Table 4. Between-subject correlations across high and low desirable items

                     Independent measure: high and low desirability in
Dependent measure    December    January    Dec/Jan    Mean correlation
Calibration          0.261       0.363      0.542      0.388 (P < 0.02)
Overconfidence       0.353       0.447      0.304      0.368 (P < 0.05)
Resolution           -0.213      0.039      0.205      0.010 (P > 0.05)
Mean probability     0.866       0.863      0.786      0.838 (P < 0.002)

Table 5. Point-biserial correlations between high and low desirability and the forecasting measures in the different forecasting periods

                             First dependent variable: high and low desirability in
Second dependent variable    December    January    Dec/Jan    Mean correlation
Calibration                  -0.146      -0.132     0          -0.092 (P < 0.05)
Overconfidence               -0.026      -0.185     -0.011     -0.094 (P < 0.05)
Resolution                   -0.058      0.048      -0.037     -0.015 (P < 0.05)
Mean probability             0.155       0.186      0.159      0.167 (P < 0.05)

Table 6. Pearson correlations for subject-situation interactions

                     Independent measure: high and low desirable events in
Dependent measure    December    January    Dec/Jan    Mean correlation
Calibration          0.206       0.439      -0.050     0.198 (P > 0.05)
Overconfidence       0.027       -0.154     -0.320     -0.149 (P > 0.05)
Resolution           -0.223      -0.110     -0.047     -0.126 (P > 0.05)
Mean probability     -0.036      -0.045     0.116      0.035 (P > 0.05)

DISCUSSION AND CONCLUSION

The major finding from this study is of strong evidence for personologism in judgemental forecasting between immediate- and short-term forecasting periods. These periods differed both in imminence and in time duration. This result, based on response and performance measures of judgemental forecasting, suggests that it may well be possible to select people for their forecasting ability. Given that subjective forecasts are a prime input to the management technology of decision analysis, this result is, we believe, of considerable importance. Often in decision making and planning, simple time series analyses based on regression techniques are known to be inappropriate, either due to the lack of historical data or the presence of a discontinuous change in factors that are presumed to have a causal influence on the event whose prediction is of interest. Such forecasts must rely on human judgement. For a review see Wright and Ayton (1987).

Conversely, the effects of situationism, measured in terms of the variations in imminence and time duration studied here, have a minimal influence on judgemental forecasting. Similarly, the evidence for interaction between the person and the forecasting situation is minimal, with the exception of the intriguing possibility that some subjects may be interactively changing their probability responses in order to keep their forecasting performance steady.

Evidence for personologism in judgemental forecasting also emerges when the unit of analysis is low versus high desirable events, although the strength of cross-situational consistency is somewhat reduced. However, we obtained no evidence of situational or interactional influences on the judgemental forecasting of high and low desirable events. In general, with the forecasting tasks utilized here, there is little evidence for task contingency on this aspect of decision-making. Forecasting behavior seems to be more of a personal characteristic, the aetiology of which has yet to be explored.
Whether there are separable cognitive styles that can describe individual differences in judgemental forecasting is another issue that as yet is unanswered. It may be that people differ along a single unidimensional scale of forecasting ability.


On the basis of the present results there would seem to be some potential benefit in proceeding to evaluate the possibility of combining judgemental and statistical forecasts, as Makridakis, Wheelwright and McGhee (1983) propose, as one direction for future research. However, the relative influence and consistency of individual differences in judgemental forecasters remains to be tested over a wide range of forecasting situations. Exploration of the factors influencing judgemental forecasts is plainly of interest to professional forecasters and psychologists.

REFERENCES

Alker H. A. (1971) Relevance of person perception to clinical psychology. J. Consult. clin. Psychol. 37, 167-176.
Ayton P. and Wright G. (1985) The evidence for interactionism in psychology: A reply to Furnham and Jaspars. Person. individ. Diff. 6, 509-512.
Ayton P. and Wright G. (1986) Persons, situations, interactions and error: consistency, variability and confusion. Person. individ. Diff. 7, 233-235.
Beach L. R. and Wise J. A. (1969) Subjective probability estimates and confidence ratings. J. exp. Psychol. 79, 4384.
Borges M. A., Roth A., Nichols T. and Nichols B. S. (1980) Effects of gender, age, locus of control, and self-esteem on estimates of college grades. Psychol. Rep. 47, 831-837.
Bowers K. (1973) Situationism in psychology: An analysis and a critique. Psychol. Rev. 80, 307-334.
Casey C. J. (1980) The usefulness of accounting ratios for subjects’ predictions of corporate failure: Replication and extensions. J. Account. Res. 18, 603-613.
Cronbach L. J., Gleser C. G., Nanda H. and Rajaratnam N. (1972) The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. John Wiley, New York.
Davis D. L. (1982) Are some cognitive types better decision makers than others? An empirical investigation. Human Systems Management 3, 165-172.
Driver M. J. (1971) Integrative Style Test. University of Southern California.
Driver M. J. (1974) Decision style and its measurement. Unpublished manuscript. Graduate School of Business Administration, University of Southern California.
Driver M. J. and Mock T. J. (1975) Human information processing, decision style theory and accounting systems. Account. Rev. July, 490-508.
Endler N. S. (1966) Estimating variance components from mean squares for random and mixed effects analysis of variance models. Percept. Mot. Skills 22, 559-570.
Endler N. S. (1975) The case for person-situation interactions. Can. Psychol. Rev. 16, 319-329.
Endler N. S. and Hunt J. McV. (1968) S-R Inventories of Hostility and comparisons of the proportions of variance from persons, responses, and situations for hostility and anxiousness. J. Person. soc. Psychol. 9, 309-315.
Endler N. S. and Hunt J. McV. (1969) Generalizability of contributions from sources of variance in the S-R Investigations of Anxiousness. J. Person. 37, 1-24.
Endler N. S. and Magnusson D. (1976) Towards an interactional psychology of personality. Psychol. Bull. 83, 956-974.
Gilchrist W. G. (1976) Statistical Forecasting. John Wiley, Chichester.
Golding S. L. (1975) Flies in the ointment: Methodological problems in the analysis of the percentage of variance due to persons and situations. Psychol. Bull. 82, 278-288.
Howell W. C. and Burnett S. A. (1978) Uncertainty measurement: A cognitive taxonomy. Org. Behav. Hum. Perform. 22, 45-68.
Kahneman D. and Tversky A. (1982) The psychology of preferences. Sci. Am. 39, 136-142.
Lichtenstein S. and Fischhoff B. (1977) Do those who know more also know more about how much they know? Org. Behav. Hum. Perform. 20, 159-183.
Lichtenstein S. and Fischhoff B. (1980) Training for calibration. Org. Behav. Hum. Perform. 26, 149-171.
Lichtenstein S., Fischhoff B. and Phillips L. D. (1982) Calibration of probabilities: the state of the art to 1980. In Judgement under Uncertainty: Heuristics and Biases (Edited by Kahneman D., Slovic P. and Tversky A.). Cambridge Univ. Press.
Lichtenstein S. and Slovic P. (1971) Reversals of preference between bids and choices in gambling decisions. J. exp. Psychol. 89, 46-55.
Makridakis S. and Wheelwright S. C. (Eds) (1979) Forecasting. North-Holland, Amsterdam.
Makridakis S., Wheelwright S. C. and McGhee V. E. (1983) Forecasting: Methods and Applications. John Wiley, Chichester.
McGee W., Shields M. D. and Birnberg J. G. (1978) The effect of personality on a subject’s information processing. Account. Rev. July, 681497.
Milburn M. A. (1978) Sources in bias in the prediction of future events. Org. Behav. Hum. Perf. 21, 17-26.
Mischel W. (1968) Personality and Assessment. John Wiley, New York.
Murphy A. H. (1973) A new vector partition of the probability score. J. Appl. Meteor. 12, 595-600.
Myers I. B. (1962) Manual: The Myers-Briggs Type Indicator. Educational Testing Service, Princeton, N.J.
O’Carroll J. M. (1977) Subjective probabilities and short-term economic forecasts: An empirical investigation. Appl. Statist. 26, 269-278.
Payne J. W. (1982) Contingent decision behavior. Psychol. Bull. 92, 382-2.
Tukey J. W. (1949) One degree of freedom for non-additivity. Biometrics 5, 232-242.
Tversky A. and Kahneman D. (1974) Judgement under uncertainty: Heuristics and biases. Science 185, 1124-1131.
Weinstein N. D. (1980) Unrealistic optimism about future life events. J. Person. soc. Psychol. 39, 806-820.
Wheelwright S. C. and Makridakis S. Forecasting Methods for Management. John Wiley, New York.
Wright G. (1982) Changes in the realism and distribution of probability assessments as a function of optimism type. Acta Psychol. 52, 165-174.
Wright G. (1984) Behavioral Decision Theory. Penguin, Harmondsworth and Sage, Beverly Hills.
Wright G. (1985) Decisional variance. In Behavioral Decision-making (Edited by Wright G.). Plenum Press, New York.
Wright G. and Ayton P. (1987) Judgmental Forecasting. John Wiley, Chichester.
Wright G. and Fowler C. (1986) Investigative Design and Statistics. Penguin, Harmondsworth.
Wright G. and Phillips L. D. (1984) Decision-making: cognitive style or task-related behavior? In Personality Psychology in Europe (Edited by Bonarius H., van Heck G. and Smid N.). Swets and Zeitlinger, Lisse.
Wright G. and Wisudha A. (1982) Distribution of probability assessments for almanac and future event questions. Scand. J. Psychol. 23, 219-224.
Yates J. F. (1982) External correspondence: Decompositions of the mean probability score. Organ. Behav. Hum. Perf. 30, 132-156.
Zakay D. (1983) The relationship between the probability assessor and the outcomes of an event as a determiner of subjective probability. Acta Psychol. 53, 271-280.