ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 53, 35-54 (1992)
Task Information, Cognitive Information, or Functional Validity Information: Which Components of Cognitive Feedback Affect Performance?

WILLIAM K. BALZER
Bowling Green State University

LORNE M. SULSKY
Louisiana State University

AND

LESLIE B. HAMMER AND KENNETH E. SUMNER
Bowling Green State University

A recent review by Balzer, Doherty, and O'Connor (1989) decomposed cognitive feedback (CFB) into three conceptually distinct components: information about the task system (task information, TI), information about the subject's cognitive system (cognitive information, CI), and information about the relationship of the task system to the cognitive system (functional validity information, FVI). Their review suggested that the TI component of CFB appeared to be the component responsible for improvements in performance on multiple cue probability learning (MCPL) tasks. A laboratory experiment was designed to test whether different combinations of CFB components lead to different levels of performance. Undergraduates (N = 133) were randomly assigned to one of five CFB conditions (TI only, CI only, TI + CI, TI + CI + FVI, or no feedback (NF)). Subjects completed a MCPL task, predicting number of wins for major league baseball teams. Subjects returned a week later, received the CFB appropriate for their experimental condition, and repeated the judgment task. Traditional measures of performance on MCPL tasks (e.g., ra) were collected, along with measures of the accuracy of predictions and self-report measures of the understandability and helpfulness of feedback. Results indicated that subjects who received TI (i.e., TI, TI + CI, or TI + CI + FVI feedback conditions) showed significantly better performance than subjects who received no feedback as indicated by validity and accuracy measures of performance. In no instance did the CI condition show better performance than the NF condition, nor did the TI + CI or TI + CI + FVI conditions show better performance than the TI condition. No differences were found in subjects' self-reported reactions to CFB. Overall, TI was found to be the CFB component that improved the validity and accuracy of judgment performance. © 1992 Academic Press, Inc.

This study was funded by an award from the State of Ohio's Academic Challenge Program granted to the Industrial-Organizational Psychology program at Bowling Green State University. The authors acknowledge the contributions of David Pollock, Heidi Josephson, and Paul Damiano to the project. We also thank Michael Doherty and David Kravitz for helpful comments on an earlier draft of this paper. Leslie Hammer is now at the Department of Psychology, Portland State University.

Address correspondence and reprint requests to William Balzer, Department of Psychology, Bowling Green State University, Bowling Green, OH 43403.

0749-5978/92 $5.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.
Cognitive Feedback (CFB) is the process of providing individuals with information that allows them to compare relations between cues and their judgments with relations in the task (Brehmer, 1988; Todd & Hammond, 1965). Recent reviews (Balzer, Doherty, & O'Connor, 1989; Doherty & Balzer, 1988) indicate that CFB improves performance on multiple cue probability learning (MCPL) tasks in both laboratory and field settings (e.g., Balke, Hammond, & Meyer, 1973; Hammond & Adelman, 1976; Hammond & Boyle, 1971; Steinmann, 1974). The reviews by Doherty and Balzer (1988) and Balzer et al. (1989) critically examined what is meant by CFB. Elaborating on a previous category system developed by Hammond, McClelland, and Mumpower (1980, p. 228), Doherty and Balzer identified three components of CFB: task information (TI), cognitive information (CI), and functional validity information (FVI). These components are described below and are shown graphically in Fig. 1.

Task Information

TI provides information about the task system (i.e., environment). In general, it refers to relations across judgment profiles between the cues (Xi) and criterion (Ye). (Note that TI can represent information about another person, as in research on interpersonal learning and cognitive conflict; Rohrbaugh, 1988.) Three kinds of relational indices can be distinguished for TI. Task predictability (Re), the multiple correlation between the cues and the criterion, indexes the degree to which it is possible to estimate the criterion value, given knowledge of the cues. The relation between each cue and the criterion is the second type of relational index and may include information about weights (e.g., rie) and function forms relating the criterion to the cues. The third and final relational index is used to represent cue intercorrelations, rij. In addition to the relational indices shown in Fig. 1, other potentially important forms of TI include indices of the level (Ȳe) and variability (SDYe) of criterion values across profiles.

Cognitive Information

CI provides information about the subject's cognitive strategy, reflecting relations between the cues and the subject's judgments (Ys). Referring
FIG. 1. Components of cognitive feedback (CFB): task information (TI), cognitive information (CI), and functional validity information (FVI).
to Fig. 1, CI can include information about judgment consistency (Rs) and the weights and function forms relating each cue to the judgment. CI may also include indices of the level (Ȳs) and variability (SDYs) of a subject's judgments across profiles.

Functional Validity Information

FVI includes those indices that depict the relations between the task system and the subject's cognitive strategy, including the correlation between the actual values of the criterion and the judgments (ra), the correlation between the predictions of the linear model of the environment and the linear model of the subject (G), and the correlation between the residuals from the predictions of these models (C).

Usefulness of the CFB Categorization Framework

This conceptual framework, which distinguishes among the components of CFB, may be useful for a number of reasons. First, it provides a framework for studying which type(s) of information individuals find more helpful for improving their performance on judgment tasks. For example, does the performance of individuals improve with information about how they are doing the judgment task (i.e., CI), how they should be
doing the judgment task (i.e., TI), or about the “success” of their judgment strategy (i.e., FVI)? Or is some combination of these components necessary to improve performance? Hammond and Boyle (1971, p. 107) state that cognitive feedback “. . . should include information about (1) task (or environmental-system) properties in relation to (2) the properties of the learner’s cognitive system.” This implies that for CFB to be effective, all three components (TI, CI, and FVI) are necessary; empirical evidence to support this assertion, however, is lacking. The identification of which type(s) of feedback improves judgment performance may provide some insight into how individuals monitor and revise their judgments. From a practical standpoint, determining whether CI and FVI have an incremental impact on performance, over and above TI, is an important question (Balzer et al., 1989). This is because the return of CI and FVI to individual judges requires (a) the development of a judgment task with a sufficient number of profiles on which to base a stable policy equation, (b) the investment of each judge’s time to evaluate the complete set of judgment profiles, and (c) the experimenter (or consultant) to compute CI and FVI indices and return them to the judges. Although microcomputer software is available to generate judgment profiles and return CFB measures to individual judges (e.g., Hoffman, 1987; Rohrbaugh, 1986), there is still the considerable investment of time by the judges (who might be hard pressed or unwilling to find the time, such as physicians or business executives) to evaluate the set of profiles. If TI is the only CFB component necessary to improve performance, considerable time and expense could be saved. Additionally, because the return of TI to judges is simple and inexpensive, this “simplified” procedure could potentially expand the application of CFB to more real-world judgments. Balzer et al. 
(1989) found that although CFB led to improved performance on a number of different criteria, not all CFB components appeared to be equally useful. Specifically,

In situations in which the subject is trying to predict an environment . . . , the evidence suggests that it is the TI component of CFB that has made it work; those studies that have made the appropriate comparisons have shown CI to have had little if any effect on performance. Less is known about the possibility of a separate contribution of FVI. . . . These conclusions have great implications for CFB applications. . . . (p. 428)
The conclusion that it is TI that improves performance rather than CI and FVI, and that returning CI and FVI along with TI does not lead to a significant improvement over TI alone, must be considered tentative for several reasons. First, the conclusions are based on only a handful of studies that provided comparisons of the different components of CFB. In
their review, Balzer et al. (1989) found that only seven studies provided even some of the needed comparisons (Galbraith, 1984; Hammond & Boyle, 1971; Newton, 1965; Nystedt & Magnusson, 1973; Schmitt, Coyle, & King, 1976; Steinmann, 1974, 1976), and not all of the studies were consistent with the general conclusions. For example, Hammond and Boyle (1971) found that CI presented along with TI led to higher levels of ra than did TI only. Second, the studies do not systematically compare the major combinations of CFB components to determine whether any particular combination might be more effective for improving performance. For example, only one study looked at the impact of CI alone on performance (Schmitt et al., 1976), and only one study has compared the return of all three components of CFB (TI + CI + FVI) to TI only (Galbraith, 1984). None of the seven studies included the necessary control group (i.e., no CFB) which would provide a performance baseline against which to assess the impact of different components of CFB. Research is therefore needed to provide a more complete set of comparisons among CFB components. A third reason why the conclusions of the Balzer et al. (1989) review should be considered tentative is that sample sizes, CFB characteristics, and different dependent measures of performance make it difficult to draw strong conclusions from the seven studies. For example, five of the seven studies included 25 or fewer subjects; of these five, only one (Hammond & Boyle, 1971) found a statistically significant difference between TI and other combinations of CFB components. The lack of significant differences (e.g., TI is not significantly different from TI + CI), however, could be due to the lack of statistical power in the studies. Indeed, Schmitt et al. (1976) used a much larger sample size (N = 160) and found that TI + CI led to greater improvements in ra and rmm (i.e., G) than TI alone. Differences in studies also make comparisons difficult.
In addition, a variety of measures have been used to measure whether CFB affects "performance." By far, the most common measures of performance that have been used are those that assess changes in specific aspects of MCPL task behavior. These measures, termed behavioral criteria by Doherty and Balzer (1988), include Rs, ra, G, and C. But even these measures are not used in all studies. Most of the seven studies that have compared the influence of TI relative to CI and FVI have generally included measures (or some variant) of Rs, ra, and G. But other behavioral criteria measures are available, as well as testimonials and self-report measures reporting CFB recipients' assessments of the CFB (i.e., Reaction Criteria) and measures that assess whether CFB led to improvements beyond the CFB task itself (e.g., improved diagnosis of future patients; Results Criteria; Balzer et al., 1989; Doherty & Balzer, 1988).
Study and Hypotheses

The present study was designed to overcome a number of the limitations in previous research noted above and provide a stronger test of the hypothesis that TI is the only component of CFB necessary to improve performance. First, five CFB conditions were used: (a) TI only, (b) CI only, (c) TI + CI, (d) TI + CI + FVI, and (e) no feedback (NF). Second, a large sample was used to provide greater statistical power. Third, dependent measures were used to assess the impact of CFB on both behavioral criteria and reaction criteria (i.e., self-report scales).

METHOD
A judgment task was developed that required subjects to predict the number of wins for each of 50 baseball teams based on five team statistics (e.g., team earned run average). During Session 1, subjects completed the judgment task. During Session 2 (1 week later), subjects were provided with CFB (except for the no feedback group) and completed a reordered version of the judgment task from Session 1. Subjects (except for the no feedback group) also completed a self-report questionnaire measuring their reactions to the CFB. At no time was outcome feedback, i.e., knowledge of the criterion value on a given judgment profile, provided.

Subjects

A total of 133 undergraduates from a large southern university voluntarily participated in the study in exchange for course credit. This sample size was chosen to provide an adequate level of statistical power for the CFB main effect (power = .79) given an alpha of .05 and assuming a medium effect size (Cohen, 1977). Due to attrition, the number of subjects in the CI alone condition (N = 25) was less than that of the other experimental conditions (N's of 27 per condition). Subjects were run in small groups of up to six individuals per session.

Baseball Judgment Task

A task was developed that included: (a) a predictable criterion with which we could compare subjects' judgments; (b) five cues that differed in validity and sign (positive or negative); (c) low cue intercorrelations; (d) a reasonable case-to-cue ratio (10:1), in an effort to provide stable and accurate CFB; and (e) a task content that would differ in familiarity for different subjects. We chose to develop a baseball task where subjects were given information about the performance of a team and asked to predict the number of wins the team had that season; the subject's predictions could then be compared to the actual number of wins that season. Team statistics were gathered from a random sample of 50 American
TABLE 1
BASEBALL JUDGMENT TASK CHARACTERISTICS

Variable                      M       SD     ERA     BA      DP      SB      E
Earned run average (ERA)     3.96     .38
Batting average (BA)         .262     .011    .06
Double plays (DP)           159.5    16.3     .01    -.10
Stolen bases (SB)            99.7    39.0    -.23    -.04    -.07
Errors (E)                  139.0    18.8     .05    -.18     .05     .18
Team wins (WINS)             78.8    10.8    -.57     .60     .00     .13    -.34
League baseball teams for the years 1977 to 1984, omitting the 1981 season, which was shortened by a players' strike (Baseball Encyclopedia, 1985). Five team baseball statistics were used as cues in the judgment task:

Earned run average (ERA): average number of runs allowed by the team (excluding those due to errors) per game.
Batting average (BA): ratio of the number of team hits to team at bats, expressed as a proportion.
Double plays (DP): total number of double plays successfully completed by the team that season.
Stolen bases (SB): total number of bases stolen by the team that season.
Errors (E): total number of fielding errors committed by the team that season.

Cue means, standard deviations, validities, signs, and intercorrelations are presented in Table 1. Task predictability was high (Re = .894), and the median cue intercorrelation was .03.1

1 A copy of the task and further information on task characteristics are available upon request from the first author.

Measures of Reaction to Feedback

A 13-item questionnaire was designed to gather information on subjects' reactions to the feedback they received. A principal components analysis (varimax rotation) of these items in an unpublished data set resulted in a three-factor solution (eigenvalues > 1.0). This factor structure was also found in the present sample. Based on these analyses, three unit-weighted scales were created using 12 of the 13 items:

Understandability of feedback. This two-item scale was used to measure subjects' perceptions of the understandability and interpretability of the feedback (e.g., I found the feedback very difficult to follow). Coefficient alpha internal consistency reliability in the present sample was .81.

Current helpfulness of feedback. This eight-item scale was used to
measure subjects' perceptions of whether they found the feedback helpful for improving their performance on the baseball judgment task (e.g., I believe the feedback improved my predictions during the second session). Coefficient alpha in the present sample was .88.

Future helpfulness of feedback. This two-item scale was used to measure subjects' perceptions of whether they thought the feedback would be helpful for improving their performance on other everyday judgment tasks (e.g., I believe that the feedback will be useful for improving other decisions that I will make in the future). Coefficient alpha in the present sample was .77.

Procedure

The experiment was conducted in two sessions held 1 week apart. During Session 1, subjects were first read instructions describing the baseball judgment task. Each of the baseball cues was defined, and the means and ranges of the cue levels were also provided to serve as a benchmark for determining what were high and low levels for the five cues.2 After any questions regarding the task were answered, subjects made predictions of the number of team wins for each of the 50 team profiles. Session 1 lasted approximately 40 min. One week later in Session 2, subjects again predicted the number of wins for the 50 reordered team profiles. Prior to making predictions, subjects were provided with one of five types of CFB based on assignment to experimental condition:

Task information only. TI included (a) graphic information on how predictable the actual number of team wins was given each of the five baseball statistics (Re), (b) the correct relative weights (Ullman & Doherty, 1984) of the five baseball statistics (rie/Σrie), (c) the function form relating each statistic and the actual number of wins, (d) the average number of actual games won, and (e) the ranges of wins between which 68% (approximately 1 SD) and 95% (approximately 2 SD) of the teams actually won.

Cognitive information only.
CI included (a) graphic information on how consistent the subject was in predicting the number of games a team would win given each of the five baseball statistics (Rs), (b) the relative weights for each of the five baseball statistics used by the subject when

2 Information about the means and ranges of cue levels was provided to subjects based on pilot work that indicated that subjects needed some benchmarks for determining whether a value of a cue was low, average, or high. This information can be viewed as a type of TI provided to all experimental conditions. Note, however, that this "feedforward" did not contain information regarding the mean of the criterion measure (i.e., number of team wins) or relations between cues and criterion.
predicting the number of team wins (ris/Σris), (c) the function form relating each statistic and the predicted number of wins, (d) the average number of predicted wins, and (e) the ranges of wins between which 68 and 95% of the teams were predicted to win.

Task + cognitive information. TI + CI included both the TI and CI just described.

Task, cognitive, and functional validity information. In addition to including TI + CI information, TI + CI + FVI included (a) graphic information on how well the subject's predictions of number of games won corresponded to the actual number of games won (ra) and (b) the average absolute deviation of the predicted number of wins from the actual number of wins for each team (averaged across the 50 team profiles).

No feedback. No feedback was provided.

Feedback was provided to subjects in booklet form. In addition, the experimenter followed a prepared script which further described the components of CFB. The script also contained information on how the feedback should be interpreted. The time to read the script and provide feedback ranged from 7 to 11 min (depending on the amount of information returned), and subjects were provided additional time (up to 10 min; no subject requested additional time) to review their booklets prior to making their second set of baseball judgments. As a motivational incentive, all subjects were informed at the onset of Session 2 that a $25.00 cash prize would be awarded to the subject who showed the greatest improvement in performance from Session 1 to Session 2, and that their knowledge of baseball would not affect their performance.3 Following the judgment task, subjects in the four feedback conditions completed the scales reporting their reactions to the feedback. Session 2 lasted approximately 50 min.

Dependent Variables
In addition to the self-report scales, several performance measures were calculated based on subjects' ratings of the 50 team profiles. These measures were calculated separately for judgments made during Sessions 1 and 2.

Consistency (Rs): the Pearson correlation between the subject's judgments of wins (Ys) and his or her predicted judgments (Y's), based on the linear regression of Ys onto the cue values.

3 Performance was operationalized as Rs. Within each of the five experimental conditions, the subject with the greatest improvement in Rs from Session 1 to Session 2 was selected, and a winner was randomly selected from these five finalists.
Achievement (ra): the correlation between the subject's judgments of wins (Ys) and the actual number of wins (Ye).

Knowledge-Linear (G): the correlation between the linear additive aspects of the environment (Y'e) and the linear aspects of the subject's policy (Y's).

Knowledge-Nonlinear (C): the correlation between the nonlinear nonadditive aspects of the environment (Ye - Y'e) and of the subject's policy (Ys - Y's).

Accuracy-Error in individual predictions (DEVIAT): the average absolute deviation of the predicted number of wins from the actual number of wins for team i, averaged across the 50 teams.

Accuracy-Error in mean level of predictions (MDIFF): the squared difference between the mean of the subject's judgments and the actual mean number of wins.

Accuracy-Error in variability of predictions (SDIFF): the squared difference between the standard deviation of the subject's judgments and the actual standard deviation of the number of wins.

Data Analysis
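To make these definitions concrete, the behavioral measures above can be computed with ordinary least squares. The sketch below is illustrative only (it is not the authors' code; the array names are our own), together with the Fisher's Z helpers used when averaging correlations:

```python
import numpy as np

def fisher_z(r):
    """Fisher's Z transform, applied to correlations before averaging/ANOVA."""
    return np.arctanh(r)

def fisher_z_inverse(z):
    """Retransform mean Z values back to correlation units for reporting."""
    return np.tanh(z)

def lens_model_indices(cues, criterion, judgments):
    """Compute the behavioral criteria for one subject.

    cues:      (n_profiles, n_cues) array of cue values (here, 50 x 5)
    criterion: (n_profiles,) actual criterion values, Ye (actual wins)
    judgments: (n_profiles,) the subject's judgments, Ys (predicted wins)
    """
    n = len(criterion)
    X = np.column_stack([np.ones(n), cues])

    # Linear models of the environment and of the subject (OLS).
    b_env, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    b_subj, *_ = np.linalg.lstsq(X, judgments, rcond=None)
    yhat_e = X @ b_env   # Y'e: linear-model predictions of the environment
    yhat_s = X @ b_subj  # Y's: linear-model predictions of the subject

    def r(a, b):
        return np.corrcoef(a, b)[0, 1]

    return {
        "ra": r(judgments, criterion),              # achievement
        "Rs": r(yhat_s, judgments),                 # consistency
        "G": r(yhat_e, yhat_s),                     # knowledge-linear
        "C": r(criterion - yhat_e,                  # knowledge-nonlinear
               judgments - yhat_s),
        "DEVIAT": np.mean(np.abs(judgments - criterion)),
        "MDIFF": (judgments.mean() - criterion.mean()) ** 2,
        "SDIFF": (judgments.std(ddof=1) - criterion.std(ddof=1)) ** 2,
    }
```

The same OLS fits yield the relative weights returned as feedback (each cue's weight divided by the sum across cues).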
All correlations were transformed to Fisher's Z prior to further analysis, and reported means are retransformed back to correlation coefficients. Also, square roots of MDIFF and SDIFF (to return the measures to their original unit of measurement) are reported. Overall MANOVAs for both Session 1 and Session 2 were conducted, and significant results were followed by separate univariate ANOVAs. Tukey's procedure was used to conduct post hoc comparisons among conditions following significant effects.4

RESULTS

Means, standard deviations, and intercorrelations for all dependent measures for both sessions are presented in Table 2.

Behavioral Criteria
First, analyses were conducted to determine whether there were any significant differences on the dependent measures among experimental groups in Session 1 (i.e., prior to the feedback manipulation).

4 A repeated-measures design was not used in this study because within-group variability for the dependent measures at Session 1 was extremely large. Thus, although average levels of performance on many dependent measures improved dramatically from Session 1 to Session 2, large estimated standard errors at Session 1 made it difficult to reach statistical significance.
TABLE 2
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS FOR DEPENDENT MEASURES FOR SESSIONS 1 AND 2

[Table 2 entries are not legible in this copy.]

Note. ra, achievement; Rs, consistency; G, knowledge-linear; C, knowledge-nonlinear; DEVIAT, accuracy-error in individual predictions; MDIFF, accuracy-error in mean level of predictions; and SDIFF, accuracy-error in variability of predictions. Session 1 correlations are presented below the diagonal; Session 2 correlations are presented above the diagonal. N = 133 for measures 1-8; r's > .17 are significant, p < .05; N = 106 for measures 8-10 (subjects in the no feedback condition did not complete self-report measures); r's > .19 are significant, p < .05.
A MANOVA indicated that there were no group differences on the seven behavioral measures, Wilk's Λ = .79, F(28, 441.30) = 1.09, p = .34. A MANOVA indicated that experimental conditions at Session 2 (i.e., following the feedback manipulation) differed on the seven dependent measures, Wilk's Λ = .60, F(28, 441.30) = 2.43, p < .001. Separate ANOVAs were conducted for each of the behavioral criteria dependent measures. Significant effects for the feedback manipulation were found for ra (F(4, 128) = 2.53, p = .043), G (F(4, 128) = 3.57, p = .009), DEVIAT (F(4, 128) = 10.04, p < .001), MDIFF (F(4, 128) = 6.71, p < .001), and SDIFF (F(4, 128) = 2.76, p = .031). There were no significant differences among feedback conditions on Rs (F(4, 128) = 1.69, p = .156) and C (F(4, 128) = 0.48, p = .75). The means and standard deviations for the dependent measures are shown in Table 3. Post hoc comparisons indicated that ra was significantly higher (p < .05) in the TI condition (M = .56) than in the NF condition (M = .34). G was significantly lower in the NF condition (M = .62) than

TABLE 3
MEANS AND STANDARD DEVIATIONS FOR DEPENDENT MEASURES: BY CFB CONDITIONS
Measure                            NFa             TI             CI             TI + CI        TI + CI + FVI
ra                              .34 (.39)       .56 (.31)      .43 (.37)      .54 (.34)      .48 (.34)
Rs                              .74 (.32)       .79 (.37)      .83 (.29)      .77 (.31)      .79 (.36)
G                               .62 (.70)       .87 (.74)      .72            .87 (.72)      .83 (.76)
C                              -.04 (.23)      -.05 (.17)     -.10 (.18)     -.04 (.18)     -.06 (.13)
DEVIAT                        25.14 (16.56)    9.43 (3.12)   19.06 (9.89)   13.09 (8.18)   13.69 (7.26)
MDIFF                        769.03 (1146.08) 34.07 (52.92) 377.77 (466.31) 172.09 (332.37) 128.40 (257.87)
SDIFF                         96.15 (132.23)   8.01 (14.57)  52.97 (114.64) 43.59 (92.20)  59.83 (100.50)
Understandability of feedback      -            4.17 (.75)     4.24 (.50)     4.42 (.55)     4.07 (.69)
Current helpfulness of feedback    -            3.96 (.60)     3.68 (.71)     4.12 (.49)     4.02
Future helpfulness of feedback     -            3.16 (.84)     3.12 (.89)     3.48 (.97)     3.31 (.95)

Note. Fisher's Z transforms were applied to ra, Rs, G, and C prior to analysis; the mean values for these variables presented here have been retransformed back to r units for ease of interpretation. Standard deviations are in parentheses; two standard deviations were not legible in the source copy.
a NF, no feedback; TI, task information; CI, cognitive information; TI + CI, task and cognitive information; TI + CI + FVI, task, cognitive, and functional validity information.
either the TI (M = .87) or TI + CI (M = .87) conditions. DEVIAT was significantly higher (i.e., more error) in the NF condition (M = 25.14) than in the TI (M = 9.43), TI + CI (M = 13.09), and TI + CI + FVI (M = 13.69) conditions. In addition, DEVIAT was significantly higher in the CI condition (M = 19.06) than in the TI, TI + CI, and TI + CI + FVI conditions. Post hoc comparisons also indicated that MDIFF was significantly higher (i.e., more error) in the NF condition (M = 769.03) than in the TI (M = 34.07), TI + CI (M = 172.09), and TI + CI + FVI (M = 128.40) conditions. Finally, SDIFF was significantly higher (i.e., more error) in the NF condition (M = 96.15) than in the TI condition (M = 8.01). Overall, post hoc comparisons indicated that groups that received TI (i.e., TI, TI + CI, or TI + CI + FVI) showed significantly better performance on many of the behavioral criteria measures than did subjects who received no feedback; in no instance did the CI condition show better performance than the NF condition. In addition, in no instance did the TI + CI or TI + CI + FVI conditions show better performance than the TI condition.

Self-Report Measures
To test whether the four experimental groups that received feedback differed in their reactions to feedback, a MANOVA was conducted including the three self-report scales (Understandability of Feedback, Current Helpfulness of Feedback, and Future Helpfulness of Feedback) as dependent measures. No significant differences were found among the feedback conditions, Wilk's Λ = .88, F(9, 243.52) = 1.38, p = .196.

DISCUSSION
Significant feedback effects were found for ra, G, DEVIAT, MDIFF, and SDIFF. Post hoc comparisons among the feedback conditions on these dependent measures support Balzer et al.'s (1989) and Doherty and Balzer's (1988) conclusion that it is the TI component of CFB that leads to improvement in both the validity (i.e., correlational measures of performance) and the accuracy (i.e., discrepancy score measures of performance) of judgment. Specifically, only the CFB groups that received TI feedback (i.e., the TI, TI + CI, or TI + CI + FVI feedback conditions) showed significantly better performance on the behavioral criteria measures than did subjects who received no feedback. In no instance did the performance of the subjects who received only CI feedback differ significantly from subjects who received no feedback. In addition, the results indicated that providing CI feedback or CI and FVI feedback along with TI feedback did not result in greater improvements in performance than TI feedback alone. In no case was the performance better for subjects who received TI feedback plus some other component of feedback than that of subjects who received only TI feedback. Thus, in this study, it appears that TI feedback is the essential CFB component for improving subjects' performance on the judgment task.

The failure to find significant differences among the feedback conditions on C and Rs was not completely unexpected. We had not anticipated any effects of CFB on C in this study, given that the judgment task included only linear cue-criterion relations. Thus, given the lack of configurality in the judgment task, there was no reason to expect that feedback would have affected judges' levels of configurality (which were all, in fact, low). C was reported here primarily for the sake of completeness, because it is typically used as a measure of performance in CFB studies. Perhaps future studies that employ configural judgment tasks would predict that subjects who receive TI feedback (which in that case would include information about the configurality in the task environment) would show higher levels of C than subjects who did not. Regarding Rs, it was not apparent to us why CFB would lead to higher consistency in judgments or why subjects who received some components of CFB would be more consistent than those who received other components. Individuals who receive no CFB could be expected to be quite consistent in their judgments, and we know of no rationale that would lead us to predict that subjects who receive CFB would be any more consistent in their judgments. In fact, it would seem more likely that consistency in judgments would decrease following CFB as subjects "develop" a revised judgment policy based on the CFB. Overall, the finding that TI is the component of CFB that affects performance suggests that future interventions using CFB to improve the validity and accuracy of judgments may be simplified, for both the judges and the researcher/consultant, without deleterious effects on performance.
Based on this laboratory task, our findings suggest that requiring individuals to complete a multiple-profile judgment task to obtain CI (e.g., R_s) and FVI (e.g., r_a) may not be necessary; thus, future interventions using CFB may be able to reduce the amount of time judges must commit to the CFB intervention strategy. This might be particularly important in situations where judges' time is limited or expensive. For example, CFB could be introduced more easily into training programs that have limited flexibility for including additional training (e.g., medical school curricula, management development workshops). In addition, the indirect costs of judges' salaries while completing the judgment task are eliminated. This would reduce the costs of CFB interventions, perhaps making them more appealing to organizations with limited resources. Three caveats, however, must be kept in mind regarding the claim that TI feedback is the feedback necessary to improve performance. First, there are expenses associated with calculating and returning TI feedback information. To the extent that archival data are unavailable to compute
R_e, r_ie, and other TI indices, a database would need to be established and data collected, analyzed, and so on. Therefore, although the costs of CFB interventions may be reduced by omitting the judgment task necessary to calculate CI and FVI, TI may still be expensive, and difficult, to obtain. Second, as pointed out by Balzer et al. (1989), CI is the only component of CFB available in situations where there is no environment (e.g., what factors are important to individuals when they buy a car). Third, and perhaps most important, earlier studies have demonstrated that CI and FVI can improve judgment performance. The results of the present study are consistent with studies that found that CI and FVI do not improve performance over that of TI alone (Galbraith, 1984; Newton, 1965; Nystedt & Magnusson, 1973; Steinmann, 1974, 1976). The present results, however, do not explain why others (Hammond & Boyle, 1971; Schmitt et al., 1976, Experiment 1) have found that the addition of CI results in higher performance than TI alone. A comparison of these studies with conflicting results does not reveal any systematic differences in task characteristics (e.g., levels of R_e, cue intercorrelations, linear vs nonlinear task), TI indices used (e.g., environmental weights, R_e, function forms), CFB statistical measures presented (e.g., r_ie vs ρ_ie to represent cue-criterion relations), or subject populations (i.e., all subjects were college students).

Previous studies that have failed to find significant improvements in performance when CI and FVI are added to TI might be suspect on grounds of inadequate statistical power. For example, the study by Nystedt and Magnusson (1973) contained only four subjects per experimental condition. The relatively large sample size and the pattern of means in the present study, however, suggest that low statistical power is not a viable explanation for the lack of significant findings.
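The power contrast can be made concrete with a Monte Carlo sketch. Cohen's (1977) "medium" effect size (f = .25) and the cell sizes below are assumptions chosen for illustration, not estimates from either study:

```python
import numpy as np

def anova_power_sim(k, n, f_effect, alpha=0.05, reps=4000, seed=1):
    """Monte Carlo power of a one-way ANOVA: k groups, n subjects per
    group, Cohen's f (sd of group means / within-group sd)."""
    rng = np.random.default_rng(seed)
    # Space the group means so their (population) sd equals f
    means = np.linspace(-1.0, 1.0, k)
    means *= f_effect / means.std()

    def f_stats(shift):
        y = rng.normal(size=(reps, k, n)) + shift[None, :, None]
        gm = y.mean(axis=2)                       # group means
        msb = n * gm.var(axis=1, ddof=1)          # between-groups mean square
        msw = y.var(axis=2, ddof=1).mean(axis=1)  # pooled within mean square
        return msb / msw

    crit = np.quantile(f_stats(np.zeros(k)), 1 - alpha)  # empirical F criterion
    return float((f_stats(means) > crit).mean())

# Five feedback conditions: cells of 4 (Nystedt & Magnusson-sized)
# versus cells of 27 (roughly the present study's N = 133 / 5)
low = anova_power_sim(k=5, n=4, f_effect=0.25)
high = anova_power_sim(k=5, n=27, f_effect=0.25)
```

With four subjects per cell, power to detect a medium-sized effect is far below conventional standards, whereas cells of the present study's size put it in a range where real differences would usually have surfaced.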
In fact, a recent study we conducted comparing TI, CI, TI + CI, and TI + CI + FVI feedback conditions, using the same task with over 300 undergraduate subjects, also found nonsignificant differences in performance (Balzer, Hammer, Sumner, Birchenough, Parham, & Raymark, 1991). Thus, we are at a loss to reconcile the differences in results across studies. Additional research is needed to systematically investigate whether these and other factors (e.g., frequency of CFB, task predictability, judgment experience or expertise) moderate the effect of various types of CFB on performance (Balzer et al., 1989). Therefore, based on the mixed research findings and the paucity of studies that systematically investigate the influence of CFB components, it would be premature to discontinue investigations into the effects of CI and FVI on performance.

Subjects' Reactions to CFB

Subjects' reactions to CFB did not differ among the CFB conditions. This suggests that subjects do not perceive any advantage (or disadvantage) from receiving complete (or limited) CFB, and that subjects' reactions to any type of CFB will be similar. We urge caution in accepting this conclusion for a number of reasons. First, the CFB manipulation in this study was between subjects, and subjects were unaware what other types of feedback might have been available. Perhaps if CFB were manipulated as a within-subjects variable, subjects who received, for example, CI feedback alone after having earlier received TI, CI, and FVI feedback would react quite differently to the understandability and usefulness of the feedback. In addition, only three aspects of perceptions regarding feedback were included in this study. Although we chose these items because we thought they were important, perhaps other reactions not studied (e.g., confidence in predictions, insight into the task and/or personal judgment policy) would have revealed significant differences among the feedback conditions. More work is needed in identifying and measuring individuals' reactions to CFB.

Practical Considerations
One must be cautious in generalizing CFB results from the laboratory to real-world situations. However, some general issues regarding the utility of this laboratory research for applied situations should be mentioned. One important issue is whether real-world judgment situations have available environments from which to compute the TI component of CFB. There are at least three applied decision-making situations that can provide environments for computing TI. First, there are environments that contain a natural criterion (e.g., laboratory tests that provide definitive findings of patient illness, future job performance measures for interviewed job applicants) that allow one to calculate TI empirically (Dougherty, Ebert, & Callender, 1986; Wigton, Patil, & Hoellerich, 1986). However, it is often difficult to obtain the criterion of interest (e.g., criterion performance is available only for interviewees who are subsequently hired into the organization), or the criterion itself is questioned (Wigton, Connor, & Centor, 1986). Second, in situations where a natural criterion is unavailable, a de facto environment could be created. For example, key decision makers in an organization may establish a policy (e.g., what information will be used and how it should be weighted and integrated) for making student admission decisions, setting employee staffing levels (Rohrbaugh, 1984), choosing which new projects to fund, appraising employee performance (Bernardin & Buckley, 1981), and so on. Thus, TI may be readily available through either written policy statements or direct questioning of key decision makers. Third, interpersonal learning studies (e.g., Summers, Taliaferro, & Fletcher, 1970) provide an opportunity to gather and return TI to individuals; here, the judgments of each member of a dyad serve as the environment for the other. In summary, although there are many
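The de facto environment idea above can be sketched briefly: a written policy fixes the environmental weights directly, so TI needs no empirical criterion data. The cues and weights below are invented for illustration, not drawn from any of the applications cited:

```python
# A stated admissions policy serves as the task environment: the weights
# are declared by decision makers, not estimated from outcome data.
# (Cue names and weights are hypothetical.)
policy_weights = {"test_score": 0.5, "gpa": 0.3, "interview": 0.2}

def policy_criterion(applicant):
    """De facto criterion: the weighted composite the policy defines."""
    return sum(w * applicant[cue] for cue, w in policy_weights.items())

# TI returned to a judge is then simply the policy itself (weights,
# signs, function forms) plus how the judge's own captured weights
# depart from it.
judge_weights = {"test_score": 0.2, "gpa": 0.2, "interview": 0.6}
ti_feedback = {cue: policy_weights[cue] - judge_weights[cue]
               for cue in policy_weights}
```

Here the judge overweights the interview relative to policy; the discrepancies in `ti_feedback` are exactly the kind of task information that, on the present findings, carries the improvement.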
situations in which TI cannot be computed and only CI is available, there are many situations where TI may be used.

A second issue is whether trainers and trainees (i.e., CFB providers and recipients, respectively) will have confidence in a single model for TI. To the extent that multiple models of TI are equally effective, or that the model may change due to a dynamic environment, trainers and trainees may not choose to apply, or have great confidence in, TI. In a laboratory study, subjects may accept and apply TI; in practice, it may be resisted by individuals (Chaput de Saintonge & Hattersley, 1985). Perhaps individuals who are reluctant to follow the specified TI model can be given the opportunity during CFB training to try alternative models. FVI can then be calculated for the various judgment models, providing the judge with a direct comparison of the models. Results indicating that the specified TI model outperforms the alternatives may be enough to convince skeptical judges.

A final practical issue is the cost effectiveness of CFB training. If a straightforward application of TI produces the "best" possible judgments, judgments could be automated and the money invested in CFB training saved or allocated elsewhere. Balzer et al. (1989) recommend against replacing judges with judgment models:

. . . both Dawes (1979) and Sawyer (1966) pointed out that people remain an integral part of the judgment process, if only because of their ability to select and code information necessary to the judgment at hand (Einhorn, 1972). Furthermore, the exclusion of people from the decision process might lead the decision makers to distrust the judgments made, be dissatisfied with the judgment process, and try to undermine the organization using the judgment models. Thus, if only for pragmatic reasons, it appears that people will retain an important role in the judgment process, and efforts to improve their abilities will remain an important concern. (p. 425)
Limitations and Future Research

Several limitations of the present research should be noted. First, our findings must be replicated using other tasks. Our task was specifically chosen because it was highly predictable, had low cue intercorrelations, and had cues with differential signs and weights. Future research should examine the possibility that task characteristics themselves moderate the impact of CFB on performance. For example, while CI was not necessary to improve performance in the present task, perhaps it would be beneficial in more "complicated" tasks with highly intercorrelated cues, nonlinear relationships between cues and criterion, highly configural environments, and so on.

Second, this research should also be replicated using real-world judges and tasks. Although our task content was chosen both to be interesting to
students and to allow for individual differences in task content knowledge, our subjects may not have had the same levels of involvement or motivation as experienced, professional judges. This concern is an important one, because motivation and involvement may be key factors affecting how much effort judges (a) invest in understanding their CFB and (b) apply to the subsequent judgment task. Furthermore, it is not obvious whether real-world judges would be more or less motivated and involved than undergraduate subjects; arguments can be made for both cases. For example, our subjects may have been highly motivated and involved due to the novelty of the task and experimenter demands, or may have been unmotivated and uninvolved because there were no consequences for poor performance (except, perhaps, a reduced chance of winning the $25.00 prize for showing the largest improvement in performance). On the other hand, real-world judges may be very motivated and involved because of the personal or organizational payoffs of improved judgments or, as Chaput de Saintonge and Hattersley (1985) suggest, doubtful about whether CFB is useful at all, thereby affecting their willingness to participate in the project. Further work is needed to improve our understanding across populations that may differ on important dimensions such as motivation and involvement.

A third limitation is the lack of a manipulation check of whether the different CFB conditions were equally well understood by the subjects. One could speculate that CI and FVI feedback may have been more difficult to understand or integrate than TI feedback, causing the pattern of results found. That is, perhaps it is more natural to think about how tasks work than about how we think tasks work. Although our pilot work and interactions with the subjects suggest that this is not likely, future studies should verify that the CFB was understood by subjects.
Finally, our results are limited to situations where subjects receive "one-shot" CFB. Perhaps with multiple presentations of CFB, the CI and FVI components may play a significant role in improving performance. For example, with a single presentation of TI, CI, and FVI feedback, subjects may consciously limit their focus to TI feedback in order to grasp the important and unimportant task characteristics. On subsequent presentations of CFB, however, subjects may then begin to focus on CI and FVI feedback as a way to sharpen their performance on the judgment task. Future research using multiple presentations of CFB could examine this possibility, perhaps by simply asking subjects how they are attending to and integrating the CFB on each presentation.

REFERENCES

Balke, W. M., Hammond, K. R., & Meyer, G. D. (1973). An alternate approach to labor-management relations. Administrative Science Quarterly, 18, 311-327.
Balzer, W. K., Doherty, M. E., & O'Connor, R., Jr. (1989). The effects of cognitive feedback on performance. Psychological Bulletin, 106, 410-433.
Balzer, W. K., Hammer, L. B., Sumner, K. E., Birchenough, T., Parham, S., & Raymark, P. (1991). Effects of cognitive feedback components, display format, and elaboration on performance. Unpublished manuscript.
Baseball Encyclopedia (1985). (6th ed.). New York: Macmillan.
Bernardin, H. J., & Buckley, M. R. (1981). A consideration of strategies in rater training. Academy of Management Review, 6, 205-212.
Brehmer, B. (1988). The development of social judgment theory. In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT approach. Amsterdam: North-Holland.
Chaput de Saintonge, D. M., & Hattersley, L. A. (1985). Antibiotics for otitis media: Can we help doctors agree? Family Practice, 2, 205-212.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.
Doherty, M. E., & Balzer, W. K. (1988). Cognitive feedback. In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT approach. Amsterdam: North-Holland.
Dougherty, T. W., Ebert, R. J., & Callender, J. C. (1986). Policy capturing in the employment interview. Journal of Applied Psychology, 71, 9-15.
Galbraith, J. T. (1984). Training assessment center assessors: Applying principles of human judgment. Unpublished doctoral dissertation, Bowling Green State University, Bowling Green, OH.
Hammond, K. R., & Adelman, L. (1976). Science, values and human judgment. Science, 194, 389-396.
Hammond, K. R., & Boyle, J. R. (1971). Quasi-rationality, quarrels, and new conceptions of feedback. Bulletin of the British Psychological Society, 24, 103-113.
Hammond, K. R., McClelland, G. H., & Mumpower, J. (1980). Human judgment and decision making. New York: Praeger.
Hammond, K. R., Mumpower, J. L., & Smith, T. H. (1977). Linking environmental models with models of human judgment: A symmetrical decision aid. IEEE Transactions on Systems, Man, and Cybernetics, 7, 358-367.
Hoffman, P. J. (1987). EXPERT87: Artificial intelligence and decision-making support for the desk-top microcomputer [Computer program]. Los Altos, CA: MAGIC7 Software (101 First Street, Suite 237, Los Altos, CA 94022).
Newton, J. R. (1965). Judgment and feedback in a quasi-clinical situation. Journal of Personality and Social Psychology, 1, 336-342.
Nystedt, L., & Magnusson, D. (1973). Cue relevance and feedback in a clinical prediction task. Organizational Behavior & Human Performance, 9, 100-109.
Rohrbaugh, J. R. (1984). Making decisions about staffing standards: An analytical approach to human resource planning in health administration. In L. G. Nigro (Ed.), Decision making in the public sector (pp. 93-115). New York: Marcel Dekker.
Rohrbaugh, J. R. (1986). Policy PC: Software for judgment analysis [Computer program]. Albany, NY: Executive Decision Services (P.O. Box 9102, Albany, NY 12209).
Rohrbaugh, J. R. (1988). Cognitive conflict tasks and small group processes. In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT approach. Amsterdam: North-Holland.
Schmitt, N., Coyle, B. W., & King, L. (1976). Feedback and task predictability as determinants of performance in multiple-cue probability learning tasks. Organizational Behavior & Human Performance, 16, 388-402.
Steinmann, D. O. (1974). Transfer of lens model training. Organizational Behavior & Human Performance, 12, 1-16.
Steinmann, D. O. (1976). The effects of cognitive feedback and task complexity in multiple-cue probability learning. Organizational Behavior & Human Performance, 15, 168-179.
Summers, D. A., Taliaferro, J. D., & Fletcher, D. J. (1970). Judgment policy and interpersonal learning. Behavioral Science, 15, 514-521.
Todd, F. J., & Hammond, K. R. (1965). Differential effects in two multiple-cue probability learning tasks. Behavioral Science, 10, 429-435.
Ullman, D. G., & Doherty, M. E. (1984). Two determinants of the diagnosis of hyperactivity: The child and the clinician. In M. Wolraich & D. K. Routh (Eds.), Advances in behavioral pediatrics. Greenwich, CT: JAI Press.
Wigton, R. S., Connor, J. L., & Centor, R. M. (1986). Transportability of a decision rule for the diagnosis of streptococcal pharyngitis. Archives of Internal Medicine, 146, 81-83.
Wigton, R. S., Patil, K. D., & Hoellerich, V. L. (1986). The effect of feedback in learning clinical diagnosis. Journal of Medical Education, 61, 816-822.

RECEIVED: April 3, 1990