Acta Psychologica 87 (1994) 137-154

The psychology of linear judgement models

Berndt Brehmer
Dept. of Psychology, Uppsala University, P.O. Box 1854, S-751 48 Uppsala, Sweden
Abstract

The ordinary policy capturing paradigm that focuses on cue-judgement relations is too limited to serve as a basis for a theoretical understanding of human judgement. To get on, we need a Brunswikian approach with a representation of both the task and the judge. Three stable results from studies with linear models are discussed from that perspective. Following Einhorn et al. (1979), the result that linear models usually fit judgement data well is explained by reference to the fact that linear models capture an essential feature of human judgement, viz., vicarious functioning. For the result that judges are inconsistent and that inconsistency varies with the predictability of the judgement task, the theory of quasi-rationality proposed by Hammond and Brehmer (1973) is invoked. Finally, it is argued that the wide interindividual differences in policies usually found show that the level of analysis is inappropriate. A given level of achievement can be reached by many different combinations of weights, and we should not be surprised to find wide interindividual differences at the policy level. We must search for stability at the level of achievement and those aspects that affect achievement, rather than at the level of cue utilisation coefficients.
1. Introduction
Linear models have a long history in the study of human judgement, going back to the study of "what is in the corn judge's mind" by Wallace in 1923. Their modern origins are found in the classic papers by Hammond (1955) and Hoffman (1960). Since the publication of these papers, judgement has been studied by means of linear models in a variety of contexts (see Brehmer and Brehmer, 1988, for a review). Despite the wealth of studies showing that such models fit judgement data quite well, there has, however, been little progress in our theoretical understanding of the psychological processes that produce these data. Indeed, many psychologists do not seem to have seen the need for any deeper psychological thinking about the results from such studies, since they have been convinced that a linear model will fit the data from judgement studies whatever the nature of the
underlying psychological process. This is empirically false; there are examples of studies in which a linear model does not fit judgement data at all (e.g. Alm and Brehmer, 1991, to take an example from my own research). It is also based on a misunderstanding of an important paper by Dawes and Corrigan (1974). This paper clarified the conditions under which a linear model will fit and argued that configural components and differences in cue weights may not crossvalidate because they may be sample specific. A unit-weighted linear model is therefore often a more stable candidate than a model with differential weights and non-linear components, provided that the data follow a pattern of conditional monotonicity. Thus, the fact that linear models fit judgement data is something that requires an explanation, as are the other stable findings from studies using such models. Before listing these findings, and discussing some possible explanations for them, we need to define the nature of the tasks in studies of human judgement, however.

1.1. What kinds of tasks are used in the study of human judgement by means of linear models?

Definitions of judgement in, e.g., Webster's Dictionary do not mention any specific task; judgement can be exercised in many different situations and for many different kinds of tasks. It is just an ability to come to a conclusion. Studies using linear models are, however, more limited and concern judgement under conditions where a person is faced with a number of cues and has to integrate these into some unitary response. This may involve an evaluation of some multidimensional object (as in studies of riskless choice using multi-attribute objects), or a prediction of some distal state of affairs from a set of proximal cues (as in studies of clinical judgement). Although linear models often fit both of these kinds of judgements, we cannot very well believe that the underlying psychological process could be the same for an evaluation and a prediction. The tendency to lump such studies together is probably one of the reasons why there has been so little theoretical progress in the area.

In the present paper we will be concerned only with the latter form of judgement, that involving predictions. This is also the general context in which linear models were first used in psychology. Both Hammond and Hoffman were concerned with clinical judgement, and the use of linear models belongs in the last stage of the controversy over the relative efficacy of clinical and statistical prediction started by Meehl in 1954, viz., the stage when the controversy had developed into an interest in the "cognitive activity of the clinician", to quote another well-known paper by Meehl (1960).

A second impediment to theoretical progress stems from the tendency to equate studies of experienced judges, e.g., clinical psychologists, with studies of undergraduate students doing a more or less contrived judgement task for the first (and only) time. In a prediction task, which is basically concerned with induction, the nature of the subjects' experience with the task is, of course, not only relevant, but crucial. In the present context, therefore, we will be concerned only with studies where the judges have relevant experience, either from doing a task as part of their
job, as, for example, clinical psychologists do, or from learning experiments in the laboratory. There is evidence that the results from multiple-cue judgement learning experiments closely parallel those from studies of clinical judgement (Brehmer, 1976), so such results are relevant in the present context also.
2. Research with linear models: The principal results

The results from research with linear models have been reviewed many times, most recently by Brehmer and Brehmer (1988). There are basically three principal results:
(1) Linear models fit the judgements quite well. When configural components have been found, they have usually only accounted for a few percent of the variance, and the extent to which they would have survived crossvalidation is uncertain.
(2) Judgements are inconsistent, and the level of consistency varies with the level of predictability of the judgement task.
(3) There are wide interindividual differences in how judges weigh the cues, even when they have considerable experience with the tasks studied.
We will discuss each of these findings in turn, and briefly touch upon a fourth finding: that judges often cannot give accurate reports about how they make their judgements.
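The crossvalidation issue in result (1) connects directly to the Dawes and Corrigan (1974) argument sketched in the Introduction. The following minimal Python sketch illustrates that argument; the task weights, noise level and sample sizes are arbitrary assumptions chosen for the example, not data from any of the studies cited:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k = 40, 1000, 4
true_w = np.array([0.5, 0.3, 0.15, 0.05])  # assumed task weights

def make_cases(n):
    # A linear task satisfying conditional monotonicity: the criterion rises with every cue
    cues = rng.normal(size=(n, k))
    criterion = cues @ true_w + rng.normal(scale=0.5, size=n)
    return cues, criterion

X_tr, y_tr = make_cases(n_train)
X_te, y_te = make_cases(n_test)

# Differential weights estimated on a small sample pick up sample-specific noise
beta = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
# Unit weights: every cue weighted equally, with the estimated sign
unit = np.sign(beta)

print("crossvalidated r, fitted weights:", round(np.corrcoef(X_te @ beta, y_te)[0, 1], 3))
print("crossvalidated r, unit weights:  ", round(np.corrcoef(X_te @ unit, y_te)[0, 1], 3))
```

With a small training sample, the unit-weighted composite typically crossvalidates about as well as the fitted weights, which is the sense in which differential weights and configural components "may not crossvalidate".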
3. Why do linear models fit so well?

Both Hammond and Hoffman offered linear models as a remedy for a difficult methodological problem in the study of cognitive processes: the fact that subjects often cannot report accurately about these processes. This makes it necessary to find means to make useful inferences about the cognitive processes of interest, e.g., those of a clinician making diagnoses, from what can be observed, viz., from the judgements themselves. As is demonstrated in many sciences, a mathematical model can often be used for this purpose. The problem is to find a useful model.

While Hammond and Hoffman both proposed that a linear model such as a regression equation would serve, they differed in their justification of the choice of this model as their candidate. To Hoffman (1960), the choice was a pragmatic one, based on the flexibility of linear models. He gave no theoretical basis for his choice, hence his term "paramorphic model" for the linear models fitted to the subject's judgement. This term emphasises that the linear model is but one of many possible descriptions of the psychological processes in a judgement task such as the MMPI. In contrast, Hammond (1955) had a theoretical reason for his choice of model. His use of the linear model in the study of clinical judgement was part of an attempt to apply the general framework of probabilistic functionalism (Brunswik,
1952, 1956) to the study of clinical judgement. His reason for choosing this particular model was that it captures the capacity for vicarious functioning, a fundamental property of human cognition stressed by Brunswik (e.g., 1952).

At the heart of the application of probabilistic functionalism to clinical judgement lies the rejection of the traditional focus on the clinician in favour of a focus on the clinician-patient system, where the symptoms (be they test scores or direct observations of patient behaviours) form a boundary common to the two subsystems, the clinician and the patient, as illustrated in Brunswik's well-known lens model (Brunswik, 1952; see Hammond, 1955, for a lens model for clinical judgement). As noted above, a fundamental problem in studying this system lies in the lack of intersubjective communicability: the clinician cannot give accurate verbal reports about his judgements. According to Hammond, this is not a mere technical problem to be solved by better methods of obtaining verbal reports; it lies at the heart of the clinical situation itself. Specifically, it is a consequence of the vicarious functioning that is a characteristic of clinical judgement. Thus, the patient will exhibit his or her problem by means of many intersubstitutable symptoms (vicarious mediation), and the clinician must use these symptoms vicariously as they appear: ¹
"The patient is trying, say, to achieve a certain goal. The clinician is attempting to discover the patient's motive. The patient substitutes one form of behavior for another as he attempts to achieve his goal (equifinality). The clinician perceives these behaviors, as they substitute for one another, as cues which also substitute for one another (equipotentiality). Because of vicarious functioning, then, the clinician is hard-pressed to point at, to communicate, the basis for a decision (except in the special case where univocal cues are available)." (Hammond, 1955, p. 258)
Thus, because these judgements will be based on a variety of intersubstitutable symptoms, the clinician is seldom able to point to any particular set of symptoms as the basis for his or her judgements. Specifically, the clinician will be hard pressed to give general rules that describe his or her judgements, since these judgements will be made from different cues from patient to patient, or even on different occasions for the same patient. Therefore, we must assume that clinical judgement will exhibit a form of probabilistic functioning, and we cannot expect to find stable relations between a set of symptoms and the judgements. This implies that we must study clinical judgement by means of statistical methods. Just as Brunswik (1952) had proposed, statistics is the basis for a unified methodology in psychology.
¹ Brunswik (1952) had pointed to the intersubstitutable nature of symptoms as a main feature of psychoanalytic theory, thus establishing contact between his theory and a major theory in the clinical field, so the application of probabilistic functionalism to clinical judgement was hardly foreign to his thinking.
3.1. Linear models and vicarious functioning

Einhorn et al. (1979) have clarified the relation between linear models and vicarious functioning. They maintain that a linear model captures vicarious functioning in at least three important respects:
(1) The additive combination function implies a fully compensatory system.
(2) The degree to which the cues trade off depends on the task environment, since the beta weights that decide the tradeoffs are determined by considering all the cues and their particular levels.
(3) Cue redundancy is incorporated in the model, since the beta weights are determined by the correlational structure of the cues in the task, including their intercorrelations.

As for the first of these points, the additive combination means that a given outcome can be produced by many different combinations of cue values. These combinations are thus alternative ("intersubstitutable") indications of the same distal state. That is, the task exhibits the basic feature of vicarious mediation. The extent to which the clinician gives the same judgement for each of these intersubstitutable combinations then indicates vicarious functioning on his or her part. For example, in the MMPI, for which a linear combination of the scales provides a good way of using the test for diagnosing neurosis vs. psychosis (Goldberg, 1965), there will be many combinations of scale values that suggest, say, neurosis. That is, the distal state of neurosis in a patient will manifest itself in many different combinations of cue values. Thus, the task is characterised by vicarious mediation, and if the clinician learns such a task, his or her cognitive processes will exhibit the basic feature of vicarious functioning. Therefore, a linear model will fit the judgements.

The second point of Einhorn et al. is just a different way of stating the same thing as the first. Trading off one cue against another is the mechanism in a linear model that produces the same judgement for different cue combinations. The beta weights specify how these tradeoffs can be made, and they show how the tradeoffs that the judge makes depend on the task. That is, they specify the conditions for intersubstitutability, i.e. for deciding when two sets of cue values point to the same distal state.

The third point, concerning cue redundancy, points to the basis for using some cues as substitutes for other cues. This is another aspect of vicarious functioning. It points to the possibility of using a subset of cues to represent the total set, either for reasons of cognitive economy, or because not all cues happen to be available for a given case.

Within the lens model framework, then, the use of linear models is not an arbitrary choice. It is motivated by a fundamental aspect of the clinician's task: intersubstitutability of cues, and the attendant need to capture the resulting cognitive process of vicarious functioning. A linear model serves this purpose, and the extent to which the model fits the data demonstrates vicarious functioning. That is, from a Brunswikian point of view, the reason linear models fit human
judgement so well is that they capture a basic feature of the judgemental process, viz., vicarious functioning, and they offer an explanation for how vicarious functioning occurs. Rather than learning directly which cue combinations are equivalent, subjects learn the tradeoffs that are required. Thus, there is a psychological reason why linear models are appropriate, and why they fit human judgement. It is because judgement tasks demand vicarious functioning and because the linear model captures this form of cognitive functioning. This also tells us when to expect a good fit and when not to expect a good fit. We will have a good fit when the task has intersubstitutable cues, and when the judge has had a chance to learn that this is the case. The answer to the question of whether a linear model will fit is not to be found in the character of the judgement process as such; it is to be found in the nature of the tasks that require human judgement (Brehmer, 1969).
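The point about intersubstitutability can be made concrete with a trivial sketch (the weights and cue values below are invented for illustration): under an additive model, quite different symptom patterns map onto the same judgement, which is exactly the trade-off structure that vicarious functioning requires.

```python
import numpy as np

beta = np.array([0.6, 0.3, 0.1])        # assumed trade-off weights for three cues

pattern_a = np.array([1.0, 0.0, 0.0])   # the distal state shown through cue 1 alone
pattern_b = np.array([0.0, 1.5, 1.5])   # the same state shown through cues 2 and 3

# Both patterns yield the judgement 0.6: the cues substitute for one another
print(pattern_a @ beta, pattern_b @ beta)
```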
4. The phenomenological reality of multiple regression equations
The Brunswikian perspective stressing vicarious functioning provides one inroad to the problem of the psychological reality of linear models. However, the Brunswikian perspective has not been the most common one in studies of clinical judgement. Instead, most judgement researchers have followed Hoffman (1960) and seen regression equations as a form of intervening variable, relating input to output, but without any psychological reality in themselves. The resulting models have been considered paramorphs, i.e. one of many ways of characterising a judgement process, but not exhaustive characterisations. However, somewhere on the way, the regression equation became reified. Subjects were said to have a policy that they then applied imperfectly. The distinction between knowledge and control proposed by Hammond and Summers (1972) is a particularly clear example, as are my own studies on cognitive skills in judgement (e.g. Brehmer et al., 1980).

This way of thinking about regression equations is very inviting since it "makes sense". It makes sense because a regression model provides intuitively understandable answers to four basic questions that we might be interested to ask about judgement:
(1) Which cues are used by the judge? This is shown by the variables that receive significant weights in the analysis.
(2) What is the relative importance of the different cues? This is shown by the relative weights for the different cues in the regression equation. There may be some disagreement about which of the available indices is the best index of weight, but under most circumstances, the various indices are monotonically related.
(3) Is a cue used linearly or nonlinearly? The function form can be found by adding higher order polynomials in the analysis (see e.g., Wiggins and Hoffman, 1968).
(4) How is the information from different cues combined into an overall judgement? Finding the combination rule may require some work, but it is possible to find nonlinear combination rules also.
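As a sketch of how such an analysis answers the first three questions, consider the following simulated judge, analysed in the spirit of Wiggins and Hoffman (1968) by adding squared cue terms. The judge's "true" process, the cue set, and the noise level are all assumptions made purely for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
cues = rng.normal(size=(n, k))

# Hypothetical judge: uses cue 1 linearly, cue 2 nonlinearly, ignores cue 3
judgements = 0.7 * cues[:, 0] - 0.4 * cues[:, 1] ** 2 + rng.normal(scale=0.3, size=n)

# Linear terms plus quadratic terms recover cue use, relative weight and function form
X = np.column_stack([np.ones(n), cues, cues ** 2])
beta = np.linalg.lstsq(X, judgements, rcond=None)[0]

labels = ["intercept", "cue1", "cue2", "cue3", "cue1^2", "cue2^2", "cue3^2"]
for name, b in zip(labels, beta):
    print(f"{name:>9}: {b:+.2f}")
```

The recovered weights point to cue 1 as linear, cue 2 as quadratic, and cue 3 as unused, answering questions (1)-(3) directly from the cue-judgement relations.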
These are reasonable questions to ask anyone about his or her judgements. A linear model gives answers to these questions. Thus, the regression equation serves as a language for communicating the basis for our judgements, a point stressed by Hammond and Brehmer (1973). They advocated the use of computer graphics as a means of helping people in conflict communicate about their cognitive differences in terms of regression equations fitted to their judgements. This technology has subsequently been applied with considerable success in negotiation and conflict resolution (e.g., Rohrbaugh, 1988), thus supporting the hypothesis that the results of analyses by means of linear models provide an acceptable and understandable way of communicating about judgement processes.

Thus, the indices provided by the regression equation not only make sense to judgement researchers, they also make sense to those whose judgements have been analysed in this way, even though they are unable to describe their judgement policies. They are, however, able to recognise their own policies, and pick out their own set of weights from among a large set of policies (Reilly and Doherty, 1989, 1992). From their results, Reilly and Doherty (1992) concluded that

"The regression models are representing the individuals' judgment strategies sufficiently veridically for the individuals to see themselves in these highly abstract representations. A regression policy is much more than a prediction device, much more than an equation relating the output to the input." (op. cit., pp. 307-308)
The problem, then, is what "more" the regression equation is. We will return to this question below. Here, we only note that it does not necessarily mean that a linear model also produced these judgements. If a linear model produces the judgements, it is almost certainly not a regression equation in any proper sense (Armelius and Armelius, 1974). One of many indications that judges do not perform a simple regression analysis is the pervasive result that they are inconsistent in their judgements (Brehmer and Brehmer, 1988). We now turn to a discussion of this result and a possible explanation.
5. Inconsistency

The pervasive finding that subjects are inconsistent (see Brehmer and Brehmer, 1988) shows that regression equations alone cannot describe the subjects' cognitive processes. Interestingly, inconsistency is one aspect about which subjects definitely lack insight. When asked to describe how they arrive at their judgements, judges may point to the cues that they use, the relative weights of these cues and how they combine them. But no study so far has reported that the subjects have said that they were inconsistent. That is, though the subjects may be wrong in their verbal reports, certain concepts, such as cue weights, are part of their understanding of judgement, but other concepts, such as inconsistency, are not.

Many judgement researchers assume that the inconsistency comes from the problems that subjects have in applying their judgemental policy. Evidence that
the complexity of the judgement task affects the level of consistency supports this hypothesis. Thus, consistency is lower when there are many cues (e.g., Einhorn, 1971) and non-linearity (Brehmer et al., 1980) in the task. Moreover, subjects seem to know at least that it is more difficult to apply a non-linear rule than a linear rule (Knez, 1992). Consequently, it does not seem unreasonable to think that the regression equation may express a policy that subjects are trying to implement, a policy which they fail to implement with complete consistency because they lack the necessary cognitive skills (Brehmer et al., 1980).

However, this is not a complete explanation, for complexity is only one of the factors that affect consistency. The other factor is task predictability, i.e., the extent to which the cues available to the judge permit valid judgements. This factor is usually measured in terms of the multiple correlation between the cues and the criterion, R_e. Specifically, results both from laboratory studies of learning and conflict and from studies of clinical judgement show that consistency, measured as the multiple correlation between cues and judgements, R_s, is a monotone function of R_e (Brehmer, 1976; Camerer, 1981). This finding cannot be understood in terms of the simple distinction between a policy and its application, nor can it be explained in terms of the related distinction between knowledge and control proposed by Hammond and Summers (1972). Their theory assumes that policy application is disturbed by probabilistic feedback (probabilistic feedback leads to lack of cognitive control over the execution of the policy), but it contains no mechanism that would explain why lack of control would vary with predictability, nor can it explain why consistency varies with predictability even when the subjects receive no feedback in the policy capturing stage, as in the studies reviewed by Brehmer (1976) and Camerer (1981). To explain such results, we need a different conception of the judgement process than that provided by the distinction between a policy and its application. The concept of quasi-rationality (Hammond and Brehmer, 1973) offers one possibility.

5.1. Judgement as a quasi-rational process
Hammond and Brehmer (1973) proposed that inconsistency in judgement reflects quasi-rationality, a special cognitive process that is characteristic of judgement under uncertainty. Specifically, Hammond and Brehmer proposed that people do not develop dependable rules for uncertain tasks, rules that they themselves trust. The reason is that they do not approach such tasks as statistical problems (Brehmer, 1980). However, they must make judgements despite not having trustworthy rules, and they do so by means of a compromise strategy that involves using the rules they have found and the specific memories of previous outcomes. For example, a clinician making a judgement from the MMPI may first use his or her rules to come up with a judgement. The clinician may then recall an earlier patient with a similar profile that did not fit this judgement. The rule-based judgement is then adjusted to agree with what he or she can remember about the earlier patient.
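One possible formalisation of this compromise process can be sketched in a few lines of Python. The single-cue task, the similar-case retrieval rule, and the equal rule-memory weighting are all assumptions made purely for illustration, not claims about the mechanism Hammond and Brehmer had in mind:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

def simulated_consistency(R_e, w_rule=0.5):
    """Judgement = compromise between a linear rule and the remembered
    outcome of a similar earlier case; returns R_s for a single-cue task,
    where the multiple correlation reduces to a simple correlation."""
    cue = rng.normal(size=n)
    # Remembered outcome of a similar earlier case: the task's systematic
    # part plus that case's own task error (error variance = 1 - R_e**2)
    memory = R_e * cue + np.sqrt(1 - R_e ** 2) * rng.normal(size=n)
    judgement = w_rule * cue + (1 - w_rule) * memory
    return np.corrcoef(cue, judgement)[0, 1]

for R_e in (0.2, 0.5, 0.8):
    print(f"R_e = {R_e:.1f} -> R_s = {simulated_consistency(R_e):.2f}")
```

Under these assumptions, R_s rises monotonically with R_e, which is the pattern reported by Brehmer (1976) and Camerer (1981).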
Hammond and Brehmer did not work out the specifics of this process, and there are obviously at least two alternative possibilities: that specific memories sometimes substitute for the judgements that would be produced by the rules, or that each judgement is a compromise between that which would be produced by the rules and the specific memory that the judge happened to retrieve. If the memories are retrieved by random sampling from earlier cases, either of these processes would produce results of the kind usually found in studies of clinical judgement. That is, the judgements would be partly regular and partly random, and the level of inconsistency would be a monotone function of task predictability.

The quasi-rationality hypothesis proposed by Hammond and Brehmer (1973) could, in principle, explain the relation between task predictability and consistency. Whether the explanation is correct must, of course, remain an open question, for no specific tests of this hypothesis have yet been made. To make it testable, it is necessary to work out the proposed mechanisms in more detail. Simulation studies testing the effects of different assumptions about these mechanisms could be a first step here.

The results from studies on cue probability learning are consistent with the quasi-rationality hypothesis. In such tasks, the unaccounted for variance in the subjects' response system, s²_s, is proportional to that in the task system, s²_e, and the ratio of s²_s to s²_e is below unity (Brehmer, 1973). This is exactly the result that we would expect if the subjects remembered the outcomes but made their judgements as a compromise between a specific memory and a rule. It is also consistent with results by Björkman (1966), who found that subjects learn both the functional rule and the distribution of outcomes in a cue probability learning task, and reproduce both aspects of the task in their judgements.

The fact that the ratio of s²_s to s²_e is below unity in cue probability learning tasks mirrors the finding in studies of judgement that R_s > R_e (Brehmer, 1976; Camerer, 1981). Thus, the subjects do not match their judgements exactly to the distribution of outcomes. One possible interpretation of this finding is that the subjects give more weight to the rule aspect than to the specific memories in their compromise judgements. An alternative interpretation could be that the subjects do not have specific memories for all cue combinations, and that they therefore have to rely on the rule component in some judgements. Finally, it is possible that the memories are biased and that cases similar to those predicted from the rule are easier to recall. At present, we do not know enough about the role of memory in judgement to decide among these possibilities.

The quasi-rationality hypothesis also explains the effect of complexity. It is due to the rule component and the problems in applying the rule.

Hammond and Brehmer's quasi-rationality hypothesis takes us back to Hammond's original point that clinical judgement is a form of probabilistic functioning. Hammond's formulation was based on the hypothesis of vicarious functioning, and the fact that the subjects might have different samples of cues on different occasions. The quasi-rationality hypothesis puts the basis of the probabilistic component in the subjects' memory. It does, of course, not exclude probabilistic functioning due to vicarious functioning.

Obviously, a regression equation cannot capture the full quasi-rational process.
It can only capture the systematic part, i.e. the rules. The fact that subjects can recognise these systematic aspects of their policies suggests that the regression results do indeed capture the rule bound aspect of quasi-rational thinking. This may be the "much more" that Reilly and Doherty talked about.

Hammond and Brehmer (1973) hypothesised that quasi-rationality would prevent the subjects from communicating the basis for their judgements adequately. They may be able to do so for a single case, but they would be unable to express any very general rules covering all their judgements for a given task. The reason is similar to that given by Hammond in the original 1955 paper: there is not one basis for the judgements but many. The judges therefore cannot describe their strategy as a simple set of rules. Empirical support for the hypothesis that inconsistency prevents people from communicating is provided by a study by Brehmer (1974). This study showed that in a cognitive conflict situation, the number of questions about policies ("what are you doing?" and "why are you doing it?") increased with the level of inconsistency.

The quasi-rationality hypothesis requires us to make a sharp distinction between those tasks with which subjects have experience and those with which this is not so. It applies only to the former case. For the latter, the subjects cannot have any specific memories of outcomes. Therefore, the judgements can only reflect the subjects' rules and the problems that they may have in applying them. Consequently, we would expect high consistency for such tasks. Unfortunately, there are no empirical results directly relevant to this problem in studies on clinical judgement.

A second prediction, for which there is empirical evidence, is that insight would be better for tasks with which the subjects have little experience. Slovic's (1969) study of stock brokers provides support for this hypothesis. Specifically, Slovic found that while novices made their judgements based on textbook rules that they could both remember and report, more experienced brokers based their judgements on personal experience and were thus unable to provide adequate reports. Interestingly, Hammond et al. (1964) also found that novice clinicians making judgements of intelligence based on Rorschach scores used textbook rules while more experienced clinicians relied on experience. However, Hammond et al. did not assess insight directly, so we do not know whether that part of Slovic's results was replicated also.

The quasi-rationality hypothesis explains how inconsistency is produced, but it has nothing to say about where the rules come from. Presumably, judges develop them from experience in the widest sense, ranging from what they have read or been told to concrete experience of specific cases. Einhorn et al. (1979) proposed that the rules people use in many judgement tasks result from two heuristics. The first is that the sign of the relationship between judgements and a given cue should be independent of other cues (a heuristic consistent with the results of Armelius and Armelius, 1974, who showed that tasks with suppressor variables are especially difficult to learn), and the second is to treat the tradeoff between a given pair of cues as independent of the levels of other cues. These two heuristics of conditional monotonicity and tradeoff independence imply a linear model, and explain why
such a model is so easy to learn. The judges could then learn the specific function forms for the individual cues by a hypothesis sampling process of the kind proposed by Brehmer (1980).
6. Linear models and the experience of judgement as a process
Even though the results of judgement analyses by means of linear models provide indices of judgement that are readily understood and communicated, they do not capture the experience of making a judgement. Even though a person arrives at such a judgement quite quickly, there is nevertheless often a sense of deliberation. In short, making a judgement is experienced as a process over time. This suggests that some form of verbal protocols would provide a useful method for studying judgement. On the other hand, the results from studies of judges' insight into their judgement processes show that people generally cannot give accurate reports about the weights that they give to the cues (Brehmer and Brehmer, 1988, review these results). Such results suggest that people lack the needed insight into the process and that verbal protocols may not give much information about judgemental processes.

This expectation turns out to be wrong, however. Studies by Kleinmuntz (1963; see also Einhorn et al., 1979) show that verbal protocols can be used to develop the usual form of computer simulation of psychological processes for one of the mainstays in judgement analysis: judgements from the MMPI. The rules extracted from such verbal protocols, and subsequently successfully used in the simulations of the judgement processes, were not expressed in the form of beta weights in a regression equation. It is an important question, therefore, what the relations between such simulations and the models obtained from judgement analysis might be.

This problem has been discussed in some detail by Einhorn et al. (1979). They argue that although linear modelling and process tracing methods model the same process, they do so at different levels of detail. They refer to Hayek's (1962) distinction between rules at different levels of generality, and point to the possibility that regression equations model the judgement process on a higher level of generality than do the process tracing models. The models at the highest level of generality are not necessarily conscious. Rules at this level describe what is taken for granted and thus the constraints that rules at the more specific level are subject to. What appears in consciousness is the application of specific rules to handle specific cases, rules that are constrained by the rules on the higher level. The rules extracted from the verbal protocols should therefore be consistent with the linear model, but they need not necessarily be expressed in terms of that model. Einhorn et al. provide examples from a case for which they developed both a model from a concurrent verbal protocol and a regression model based on the cue-judgement relations.

The distinction between rules at different levels offers a useful way of reconciling the abstract linear model, which fits the judgements although it cannot be
reported, with the experience of more specific rules in judgement. The results of Reilly and Doherty (1989, 1992) suggest that subjects know these constraints and can recognise them, although they cannot report them. These results show that we need a more complex model than that proposed by Hammond and Brehmer (1973). Specifically, that model may need a more elaborate rule concept to account for the possibility of rules at different levels.
7. Interindividual differences in judgement policies
In this section, we turn to the third result obtained without exception in studies of judgement: that there are wide interindividual differences in cue weights, even among subjects with considerable experience with the same task (Brehmer and Brehmer, 1988). In the light of the results obtained with traditional learning tasks in psychology, this result seems difficult to understand. Experience should lead subjects to become similar, and the only explanation why subjects would not become similar with experience would be that they somehow failed to learn from experience. Self-evident as this may seem, it is nevertheless wrong in the case of multiple-cue probabilistic judgement tasks. To understand why, we need to consider the nature of the demands that such tasks make in more detail.

A central methodological assumption in probabilistic functionalism is the Behaviour-Research Isomorphy principle, the BRI-principle (Brunswik, 1952; see Brehmer, 1984, for a discussion). It says that research should focus where the organism focuses. That is, we can only hope to understand what a person is doing if we know what he or she is trying to achieve. To understand judgement, we must therefore understand what the judge is trying to achieve, for this will shape his or her judgement policy. The clinician's focus is, obviously, not the cue weights. It is achievement or accuracy, i.e. the clinician's focus is the extent to which his or her judgements agree with the distal state that he or she is trying to achieve. The specific aspects of the clinician's judgement policy can only be understood in terms of the demands that achievement makes on the policy.

The lens model equation (Hursch et al., 1964) helps us understand exactly what regularities to expect. Eq. (1) gives the lens model equation in the form proposed by Tucker (1964), for the case when both the task and the judge can be described in terms of linear models with linear relations between the cues and the criterion values and judgements, respectively:

r_a = G · R_e · R_s,   (1)

where r_a is the correlation between the subject's judgements and the distal values that he or she tries to achieve, G is the correlation between the linearly predictable variance in the task system and that in the judge's response system, R_e is the multiple correlation between the cues and the distal values, and R_s is the
multiple correlation between the cues and the subject's responses. r_a is usually referred to as the achievement correlation. G shows the extent to which the relative weights given to the cues by the subject match the relative weights in the task system. R_e reflects the extent to which the distal values can be predicted from the cues, and R_s the subject's consistency.

According to the BRI principle, judges are trying to maximise r_a. The lens model equation shows what they must do: they must find the correct relative weights for the cues so that G = 1.00 and apply these consistently, i.e. so that R_s = 1.00. If they succeed in this, they achieve the maximum value of r_a = R_e. This requires a statistical approach to the task, but as we have already noted, subjects do not take such an approach to the kind of tasks found in clinical judgement. Therefore, we cannot assume that they have a very good conception of what the maximum achievement is. This, in turn, makes it hard for them to evaluate their performance, and they will not know when they have reached the maximum level. They will note, however, that changes in the weights that they give to the cues will not change their achievement very much. As shown in a recent paper by Castellan (1992), G is not a particularly sensitive measure. It will be very close to unity even if the subjects' weights deviate considerably from the optimal weights in the task. This may be seen as a deficiency in G as a dependent variable in psychological studies, but there is a more important message here: relative weights simply do not matter very much. It means that there is not very much pressure towards any particular set of weights in a multiple-cue judgement task. Therefore, we have little reason to expect that the subjects should exhibit the same set of weights. Consequently, we should not be surprised to find wide interindividual differences with respect to cue weights; there is simply no pressure towards uniformity. Such differences can only be explained in terms of the personal history of the judges. They are not related to any of the independent variables in judgement studies, and the conditions under which judgement policies are acquired will not affect these weights. To affect such weights, it is necessary to make the judges focus on the weights, e.g. by means of cognitive feedback. However, so long as the judges do not focus on the weights, but only on their achievement or agreement, they are not likely to notice what their policy differences are.

This tells us that because of the nature of probabilistic judgement tasks, we cannot expect to find stable results at the level of cue weights. Stability can only be found at the higher levels, such as r_a, R_s and G. That is, it can only be found for those parameters that directly express the subjects' focus and that have some direct relation to his or her ability to achieve the distal variable of interest. If it is the task of science to find stable empirical relations as a point of departure for the development of theories, this is the level where a psychology of judgement should operate. It should not operate at the more microscopic level of cue weights and phenomena at the level revealed by "think aloud" protocols as discussed above. In short, we need to model achievement, rather than behaviour. In this sense, paramorphic modelling at the level of cue-judgement correlations may be said to have led us astray, for it has led us to focus on an unimportant feature of the judgement process, viz., the cue weights.
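Castellan's point, and the attendant claim that there is little pressure towards any particular set of weights, can be illustrated with a simulation sketch. The task weights, the noise level and the judges' weight vectors are all invented for the example, and the judges are made perfectly consistent so that r_a = G · R_e:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 4
cues = rng.normal(size=(n, k))

task_w = np.array([0.4, 0.3, 0.2, 0.1])            # assumed optimal relative weights
criterion = cues @ task_w + rng.normal(scale=0.4, size=n)
R_e = np.corrcoef(cues @ task_w, criterion)[0, 1]  # task predictability

# Three perfectly consistent judges (R_s = 1) with quite different policies
for judge_w in ([0.4, 0.3, 0.2, 0.1],      # matches the task weights
                [0.25, 0.25, 0.25, 0.25],  # unit weights
                [0.6, 0.1, 0.2, 0.1]):     # idiosyncratic weights
    judgement = cues @ np.array(judge_w)
    G = np.corrcoef(judgement, cues @ task_w)[0, 1]  # weight matching
    r_a = np.corrcoef(judgement, criterion)[0, 1]    # achievement
    print(f"weights {judge_w}: G = {G:.3f}, r_a = {r_a:.3f} (max = {R_e:.3f})")
```

Despite visibly different weight profiles, G stays close to unity and achievement close to its ceiling, so nothing in the judges' feedback pushes them towards a common set of weights.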
8. Some limitations in research on human judgement using linear models
In using the term paramorphic modelling, Hoffman (1960) called attention to the fact that the models of judgement obtained by means of regression models do not describe all aspects of human judgement. The most obvious limitation of judgement analysis by means of linear models is, of course, that it is a form of input-output analysis. The process between the presentation of the cues and the judgement is not considered in such an analysis. Whether this is a serious limitation depends on one's goals. To some psychologists (see e.g., the volume edited by Montgomery and Svenson, 1989), this limitation is probably fatal, for it means that the results obtained by means of linear models cannot be used for theorising at their preferred level of analysis. Brunswikians, adhering to the BRI-principle, would, of course, reply that the data from verbal protocols show too much variation to be used for theorising (cf. the results from Einhorn et al., 1979, reviewed above). Moreover, the level of analysis in linear modelling is sufficient for practical applications in the form of methods that enhance learning and conflict resolution (Wigton, 1988; Rohrbaugh, 1988). This supports the view that this level of analysis is meaningful.

However, despite this, it is nevertheless clear that certain aspects of interest are left out of these analyses. Kahneman and Tversky (1979) note that judgement consists of three interrelated subprocesses: information search, information combination and feedback/learning. Obviously, linear modelling focuses only on the second of these aspects or subprocesses. Information search is ruled out because the methodology of linear modelling requires that a subject be provided with all cues for all cases. Consequently, there can be no real search for information, except for the trivial reading of all or some of the information that is provided. Therefore, the results of studies using linear modelling and the traditional format presenting all possible cues in abstract form may overestimate the number of cues that are used under more normal conditions.

Some evidence for this hypothesis comes from two studies that have compared judgement policies obtained from photographs with policies obtained when the cues were presented in the ordinary verbal format. Both of these studies (Carlstrom, 1989; Phelps and Shanteau, 1978) suggest that subjects may be using fewer cues when the cases are presented as photographs and when they have to search for the cues (in the Phelps and Shanteau study, the interpretation of the results in these terms is problematic, however, because there were intercorrelations among the cues in the photograph condition but not in the verbal condition; this is not true in Carlstrom's study). This may seem to question the conclusion that judgement policies obtained by means of "paper patients" are the same as those obtained for real patients (Brehmer and Brehmer, 1988). It may well be that the high correlations between the judgements obtained for "paper patients" and those obtained for real patients do not necessarily mean that the policies are similar. Instead, they may be due to high intercorrelations among the cues for the real cases. These lead to high correlations between judgements for "paper patients" and for real patients even if only a subset of the cues used for the "paper patients" is used for the real cases.
This limitation is not serious for those studies of judgement that use tasks for which it is standard practice for a judge to have all the information. This is the case for many of the situations where linear modelling was first used, i.e. the interpretation of tests such as the MMPI or the Rorschach by clinical psychologists. But even if such situations may be common, there are certainly many clinical situations that require information search also. Linear modelling is thus not likely to provide a complete theory of clinical judgement, nor is it likely to provide a complete picture of the clinician's skills. It is limited to the part of the process concerned with the integration of the cues.
9. Conclusions

As the reader will note, most of the references in this paper come from the 1960's and 70's. Although it is easy enough to find applications involving the use of linear models also in the 1980's and 90's, there seems to have been little theoretical and methodological work after, say, 1979, when the paper by Einhorn et al. appeared. One reason for this is probably that the principal problem in studies in the paramorphic modelling tradition, that of configurality, has been solved. It is now recognised that studies using linear modelling techniques could not take us further than to the conclusion that configurality is not beyond human judges, albeit, perhaps, not very typical of human judgement. Now, the second problem, that of insight, seems to have received some kind of solution too in the demonstrations by Reilly and Doherty that subjects can recognise their policies, although they cannot describe them.

These were the only problems that could be investigated in the traditional single-systems paradigm employed in the paramorphic modelling tradition. This approach is limited to an analysis of the relations between cues and judgements. Ipso facto, it is limited to testing hypotheses about a judgement process that is independent of the judgement task, for the task is not represented in the paradigm. Consequently, task effects could only appear as unaccounted for variance between studies, i.e. the results with the paradigm would seem to lack reliability.

The results discussed in this paper show that stable results can be obtained only if the task is included in the paradigm, and if we view these results in terms of the judge's focus on accuracy. We then find the answer to the question why linear models fit judgements so well (it is because they capture the all important characteristic of vicarious functioning), and why there are wide interindividual differences (because there is no pressure towards uniformity). We also find that inconsistency is related to task characteristics, and does not simply express lack of fit. This provides a stepping stone towards theorising about the nature of the judgement process. Finally, consideration of the task and the process of adaptation to the task will presumably answer questions about the particular form of the judgement process, e.g., whether or not there will be configural components.

This suggests that the single systems paradigm used in the paramorphic
modelling approach has now outlived its usefulness as a tool for understanding human judgement. If we are to get ahead, we need to adopt the Brunswikian approach of studying the interaction between the judge and the judgement task. This is the level where we can find stable results and ask meaningful questions about judgement.

It is, perhaps, appropriate to end the paper with a summary of what a theory of judgement may look like in the light of the results reviewed above. We can do so in terms of four principles.

- Human judges are concerned with achievement; they focus on the accuracy of their judgements.

- To reach a high level of achievement, they must learn the constraints imposed by judgement tasks. An important constraint is vicarious mediation, with the attendant need for vicarious functioning. This implies a linear model. The constraints may well be implemented in the form of the two heuristics proposed by Einhorn et al. (1979), and they are not reported by the judges because they constitute what is taken for granted in the situation. The judges can, however, recognise that their judgement follows these constraints.

- Within the general constraints imposed by the linear model, there is not very much pressure towards uniformity of policies, because a wide range of policies will lead to similar levels of achievement. The actual set of weights for a given judge therefore cannot be explained in terms of the task, but must be explained in terms of factors outside the paradigm, such as the personal history of the judge.

- Human judges do not approach judgement tasks as statistical problems, and therefore they do not develop consistent rules. This is also true of clinical psychologists, who certainly have the training needed for understanding and developing such rules. Whether they fail to apply their statistical knowledge because they do not realise that it applies, or because they lack the memory capacity for doing so, is not clear. Whatever the reason, the result is inconsistency, which may be explained in terms of quasi-rationality. That is, the actual judgements result from a compromise between rules and specific memories of earlier outcomes. This explains the relation between consistency and task predictability. Such a process may not lead to optimal judgements in a regression sense, but the judgements that result are not likely to be widely off the mark. In so far as the judgements can be corrected by feedback in a dynamic process, which is likely to be more typical of situations in which human decision making occurs than the one-shot situations envisioned in traditional decision theory (Hogarth, 1981), such a process may well suffice for most decision problems, for it will ensure that the decision maker is headed in the right direction.

Acknowledgement

This study was supported by a grant from the Swedish Council for Research in the Humanities and Social Sciences.
References

Alm, I. and B. Brehmer, 1991. Judgments about traffic safety measures. Swedish Road Research Institute, Linköping. Manuscript.
Armelius, K. and B. Armelius, 1974. The use of redundancy in multiple-cue judgments: Data from a suppressor variable task. American Journal of Psychology 87, 385-392.
Björkman, M., 1966. Stimulus-event learning and event learning as concurrent processes. Organizational Behavior and Human Performance 2, 219-236.
Brehmer, A. and B. Brehmer, 1988. 'What has been learned about human judgment from thirty years of policy capturing?' In: B. Brehmer and C.R.B. Joyce (Eds.), Human judgment: The SJT view. Amsterdam: North-Holland.
Brehmer, B., 1969. Cognitive dependence on additive and configural cue-criterion relations. American Journal of Psychology 82, 490-503.
Brehmer, B., 1973. Single-cue probability learning as a function of the sign and magnitude of the cue-criterion correlation. Organizational Behavior and Human Performance 9, 377-395.
Brehmer, B., 1974. Policy conflict, policy consistency and interpersonal understanding. Scandinavian Journal of Psychology 15, 273-276.
Brehmer, B., 1976. Note on the relation between clinical judgment and the formal characteristics of clinical tasks. Psychological Bulletin 83, 778-782.
Brehmer, B., 1980. In one word: Not from experience. Acta Psychologica 45, 223-241.
Brehmer, B., 1984. 'Brunswikian psychology for the 1990's'. In: K. Lagerspetz and P. Niemi (Eds.), Psychology for the 1990's. Amsterdam: North-Holland.
Brehmer, B., R. Hagafors and R. Johansson, 1980. Cognitive skills in judgement: Subjects' ability to use information about weights, function forms, and organizing principles. Organizational Behavior and Human Performance 10, 290-313.
Brunswik, E., 1952. Conceptual framework of psychology. Chicago, IL: University of Chicago Press.
Brunswik, E., 1956. Perception and the representative design of psychological experiments. Berkeley, CA: University of California Press.
Camerer, C.F., 1981. General conditions for the success of bootstrapping models. Organizational Behavior and Human Performance 27, 411-422.
Carlstrom, A., 1989. Mode of presentation of cues in policy capturing: A comparison between verbal and pictorial presentation of targets in judgment of probability to fire. FOA Report C 50074-5.2, Swedish Defense Research Institute, Stockholm.
Castellan, N.J., 1992. Relations between linear models: Implications for the lens model. Organizational Behavior and Human Decision Processes 51, 364-381.
Dawes, R.M. and B. Corrigan, 1974. Linear models in decision making. Psychological Bulletin 81, 95-106.
Einhorn, H.J., 1971. Use of nonlinear, noncompensatory models as a function of task and amount of information. Organizational Behavior and Human Performance 6, 1-27.
Einhorn, H.J., D. Kleinmuntz and B. Kleinmuntz, 1979. Linear regression and process tracing models of judgment. Psychological Review 86, 465-485.
Goldberg, L.R., 1965. Diagnosticians vs. diagnostic signs: The diagnosis of psychosis vs. neurosis from the MMPI. Psychological Monographs 79 (9, Whole No. 602).
Goldberg, L.R., 1970. Man versus model of man: A rationale, plus some evidence, for a method of improving on clinical inferences. Psychological Bulletin 73, 422-432.
Hammond, K.R., 1955. Probabilistic functioning and the clinical method. Psychological Review 62, 255-262.
Hammond, K.R. and B. Brehmer, 1973. 'Quasi-rationality and distrust: Implications for international conflict'. In: L. Rappoport and D. Summers (Eds.), Human judgment and social interaction. New York: Holt, Rinehart and Winston.
Hammond, K.R., C.J. Hursch and F.J. Todd, 1964. Analyzing the components of clinical inference. Psychological Review 71, 438-456.
Hammond, K.R. and D.A. Summers, 1972. Cognitive control. Psychological Review 79, 58-67.
Hayek, F.A., 1962. Rules, perception and intelligibility. Proceedings of the British Academy 48, 321-344.
Hoffman, P.J., 1960. The paramorphic representation of clinical judgment. Psychological Bulletin 57, 116-131.
Hogarth, R.M., 1981. Beyond discrete biases: Functional and dysfunctional aspects of judgmental heuristics. Psychological Bulletin 90, 197-217.
Hursch, C.J., K.R. Hammond and J.L. Hursch, 1964. Some methodological considerations in multiple-cue probability studies. Psychological Review 71, 42-60.
Kahneman, D. and A. Tversky, 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263-291.
Kleinmuntz, B., 1963. MMPI decision rules for the identification of college maladjustment: A digital computer approach. Psychological Monographs 77 (14, Whole No. 577).
Knez, I., 1992. Estimation of the hypothesis hierarchy in probabilistic inference tasks. Scandinavian Journal of Psychology 33, 47-55.
Meehl, P.E., 1954. Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.
Meehl, P.E., 1960. The cognitive activity of the clinician. American Psychologist 15, 19-27.
Montgomery, H. and O. Svenson (Eds.), 1989. Process and structure in human decision making. Chichester: Wiley.
Phelps, R.H. and J. Shanteau, 1978. Livestock judges: How much information can an expert use? Organizational Behavior and Human Performance 21, 209-219.
Reilly, B.A. and M.E. Doherty, 1989. A note on the assessment of self insight in judgment research. Organizational Behavior and Human Decision Processes 44, 123-131.
Reilly, B.A. and M.E. Doherty, 1992. The assessment of self-insight in judgment policies. Organizational Behavior and Human Decision Processes 53, 285-309.
Rohrbaugh, J., 1988. 'Cognitive conflict tasks and small group processes'. In: B. Brehmer and C.R.B. Joyce (Eds.), Human judgment: The SJT view. Amsterdam: North-Holland.
Slovic, P., 1969. Analyzing the expert judge: A descriptive study of a stockbroker's decision processes. Journal of Applied Psychology 53, 255-263.
Tucker, L.R., 1964. A suggested alternative formulation in the developments by Hursch, Hammond and Hursch and by Hammond, Hursch and Todd. Psychological Review 71, 528-530.
Wallace, H.A., 1923. What is in the corn judge's mind? Journal of the American Society of Agronomy 15, 300-304.
Wiggins, N. and P.J. Hoffman, 1968. Three models of clinical judgement. Journal of Abnormal Psychology 73, 70-77.
Wigton, R., 1988. 'Applications of judgment analysis and cognitive feedback in medicine'. In: B. Brehmer and C.R.B. Joyce (Eds.), Human judgement: The SJT view. Amsterdam: North-Holland.