ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 46, 217-239 (1990)
The Effect of Performance Appraisal Salience on Recall and Ratings

KEVIN J. WILLIAMS
The University at Albany, State University of New York

THOMAS P. CAFFERTY
University of South Carolina

AND

ANGELO S. DENISI
Institute of Management and Labor Relations, Rutgers University
Although models of performance appraisal implicitly assume that the rating task is highly salient to raters, it is unlikely that raters have evaluation as their main objective when performance information is first encountered. Two studies investigated the effect of appraisal salience on information processing and ratings. High appraisal salience was related to on-line information processing, while low appraisal salience was related to memory-based information processing. Results from Study 1 indicated that when appraisal salience was low, performance information was less accessible to raters and ratings showed marginally less discriminability than when salience was high, unless raters were required to recall information prior to rating. Results from Study 2 related low appraisal salience to greater distortions in ratings but found that certain organization strategies used during encoding were able to improve accuracy. Implications for processing models of performance appraisal are discussed. © 1990 Academic Press, Inc.
Support for this research was provided by the National Science Foundation under Grant BNS-8023252. Portions of this research were reported at the annual meeting of the Southern Academy of Management, 1985, in New Orleans. Study 1 was part of the first author's doctoral dissertation, conducted under the supervision of the second author. Comments from other committee members Keith E. Davis and David Clement are greatly appreciated. We thank Robert Peters and Peter Wickert for their help in collecting the data for Study 2. We are indebted to two anonymous reviewers for their thoughtful comments and reactions to earlier versions of this paper. Correspondence concerning this article should be addressed to Kevin Williams, Department of Psychology, The University at Albany, State University of New York, 1200 Washington Avenue, Albany, NY 12222.

0749-5978/90 $3.00
Copyright © 1990 by Academic Press, Inc.
All rights of reproduction in any form reserved.
Cognitive approaches to performance appraisal (DeNisi, Cafferty, & Meglino, 1984; Ilgen & Feldman, 1983; Landy & Farr, 1980) propose that ratings are affected by the way in which raters process performance information. The manner in which information is processed, however, will largely depend on what the observer is doing when behavior is observed. The processing objective, or cognitive set, that an observer has when behavior is observed affects the abstraction, organization, and accessibility of person information (Hamilton, Katz, & Leirer, 1980; Hastie & Carlson, 1980; Hastie, Park, & Weber, 1984). Processing objectives "cognitively tune" the observer to relevant behaviors, which are subsequently easier to recall than behaviors that are unrelated to the salient processing objective (Hastie et al., 1984). These findings are relevant to performance appraisal research since worker evaluation may or may not be salient to raters when they first observe worker behavior. Models of the performance appraisal process implicitly assume that appraisal is highly salient to raters during observation of performance. Accordingly, rater training programs suggest that accuracy can be improved by training raters to first observe job-relevant behaviors and then evaluate and weight each behavior to form a single composite rating (e.g., Smith, 1986). These approaches, however, may be limited since raters do not typically have evaluation as their main objective when performance information is first encountered (Bernardin & Villanova, 1986). Rather, performance may be encountered in different job contexts while raters perform various job duties, and it may be processed as relevant to performance standards only when evaluations are requested at a later date. When raters must divide their attention between appraisal and nonappraisal tasks, they are limited in the extent to which they can attend to and retain appraisal-relevant behaviors (Balzer, 1986).
Few studies (e.g., Balzer, 1986; Barnes-Farrell & Couture, 1983) have examined the effect of appraisal salience on performance evaluations. The present research adds to this literature.

Effects of Appraisal Salience

Research in cognitive psychology strongly supports the principle, formally proposed by Lingle and Ostrom (1980), that the encoding of stimuli is dependent on the thematic framework salient to observers during observation. Raters for whom performance appraisal is salient may focus their attention on the workers' levels of proficiency and related behaviors. This information should be deeply encoded and easily accessible during the rating task (Tulving, 1974). Raters for whom performance appraisal is not salient, but who are engaged in a different task, may be less likely to encode relevant performance information and less likely to recall such information in a fashion suitable to appraisal when evaluations are
subsequently required. The greater the attentional demands of the nonappraisal task, the less likely it is that performance appraisal information will be encoded. Appraisal salience may also affect the organization of information in memory. In person perception research, subjects instructed to form impressions of others organize information in memory by persons to a greater extent than subjects given memory set instructions (Hamilton et al., 1980; Srull, 1983). This organization may facilitate the integration of attribute information to form a global impression of an individual. Since performance appraisal and impression formation are both judgment goals (Cohen, 1981), they should result in similar organization patterns in memory. Recent evidence suggests that raters use person categories to encode and store information about others for upcoming performance appraisals (DeNisi & Williams, 1988). When performance appraisal is not salient, observers may not automatically organize information by ratees (see Ostrom, Pryor, & Simpson, 1981, for evidence that person clustering is not naturally prevalent) but may be expected to use organizing schemes consistent with their observational goal. Judgments of performance may require more cognitive effort in such situations.

The judgment process may also be affected by appraisal salience. High and low appraisal salience, as presented here, are analogous to on-line and memory-based judgments (Hastie & Park, 1986). Hastie and Park (1986) posit that when individuals are asked to make a judgment, a limited-capacity judgment operator acts on the relevant information to produce the judgment or decision. This operator is affected by whether the judgment is memory-based, where individuals must rely on information stored in long-term memory, or on-line, where individuals are able to form judgments concurrently with the receipt of information.
Ratings made when appraisal salience is low during observation correspond to memory-based judgments; raters rely mainly on information they are able to retrieve from long-term memory. Ratings made when appraisal salience is high correspond to on-line judgments; raters are able to form impressions and judgments of ratees as information is encountered, and may revise or adjust their impressions in the face of new evidence (cf. Anderson & Hubert, 1963). Hastie and Park (1986) document information processing differences between on-line and memory-based judgments. For memory-based judgments, a high correlation exists between recall and judgments; one's decisions are highly dependent on the information that is retrieved from long-term memory. The quality of the judgment will relate to the availability of information (Tversky & Kahneman, 1973). Thus, when appraisal salience is low, the quality of ratings may depend primarily on the quality of recall. Factors influencing recall accuracy (e.g., negativity bias,
primacy and recency effects, accessibility and representativeness heuristics) will have corresponding effects on ratings. It should be noted that the extent of one's reliance on memory may be expected to vary with memory capacity demands. Heavy processing demands, where attention is directed solely to a nonappraisal task, may result in a strict reliance on recall for ratings. Less heavy demands may still allow other cognitive operations, such as global abstraction (Posner & Snyder, 1975) or impression formation processes (Schul, 1983), to occur. In such cases, ratings may be based on the integration of weak initial impressions and recall data. For on-line judgments, judges base their decisions not on information retrieved from long-term memory but on the current impressions they have formed of targets. The quality of the judgment will relate to the nature of the impressions, and a low correlation between ratings and recall is likely. Thus, when appraisal salience is high, raters will likely use global impressions of others to guide their ratings. Increased halo bias is likely to follow (cf. Feldman, 1981). The present analysis of on-line processing is similar to the automatic processing mode suggested by Ilgen and Feldman's (1983) model of performance appraisal. Ilgen and Feldman (1983) suggest that raters process information according to categorical conceptions of workers, combining different performances into meaningful wholes. This categorization process is consistent with organizing information in memory by persons. Impression formation tendencies should be facilitated as well. To summarize, information processing should differ between situations in which appraisal is salient and on-line judgments are made and situations in which appraisal is not salient and memory-based judgments are made.
High appraisal salience should be characterized by attention to relevant behavior, organization in memory according to person categories, simultaneous impression formation, and increased recall of information (but less reliance on long-term memory). Low appraisal salience should be characterized by decreased attention to performance information, organization of information in memory according to nonperson categories, lower recall, and greater reliance on information retrieved from long-term memory. While rating accuracy would appear to be greater when appraisal salience is high, bias is likely to occur under both high and low salience. The sources of bias, however, may be traced to differences in information processing. Bias under high salience conditions may be related to impression formation effects, while bias under low salience conditions may be due to accessibility effects.

Two studies were conducted which examined the effect of appraisal salience on storage, retrieval, and judgment processes involved in performance appraisal. Study 1 related appraisal salience to memory organization and retrieval processes and examined the correlation between recall and ratings. Study 2 attempted to clarify some of the findings of Study 1 with respect to rating accuracy.

STUDY 1
Several exploratory hypotheses were tested in Study 1. First, it was hypothesized that appraisal salience would affect memory processes:
H1(A): High appraisal salience will result in the organization of information in memory by worker categories to a greater extent than low appraisal salience, which will result in the organization of information by nonappraisal task categories.
H1(B): High appraisal salience will result in greater recall of performance information than low salience.
Second, it was hypothesized that appraisal salience would affect ratings:
H2: High appraisal salience will result in more accurate performance ratings than low salience.
Specifically, raters in the high salience condition should be better at discriminating between levels of worker proficiency and task performance. However, since high appraisal salience may also increase reliance on general impressions, a third hypothesis was offered:
H3: High appraisal salience should result in lower correlations between recall and ratings than low salience conditions.
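In the Results, H3 is tested with zero-order correlations between the proportion of favorable performance items a rater recalled about a worker and the rating that rater gave the worker. The Python sketch below shows that computation under stated assumptions: each recalled item has already been coded as favorable (True) or unfavorable (False), the correlation is a plain Pearson r over rater-worker pairs, and the function names and toy inputs are hypothetical illustrations, not the study's data or scoring procedure.

```python
from math import sqrt
from typing import Sequence


def proportion_favorable(recalled: Sequence[bool]) -> float:
    """Share of a rater's recalled items about one worker coded as favorable."""
    if not recalled:
        raise ValueError("no items recalled for this worker")
    return sum(recalled) / len(recalled)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Zero-order (Pearson) correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy rater-worker pairs (illustrative only, not data from the study):
# recalled items coded favorable/unfavorable, plus the 7-point overall rating.
recall_protocols = [
    [True, True, False],
    [False, False, True],
    [True, True, True, False],
    [False, True],
]
ratings = [5, 3, 6, 4]

proportions = [proportion_favorable(items) for items in recall_protocols]
print(round(pearson_r(proportions, ratings), 2))
```

Under memory-based processing this r would be expected to be high; under on-line processing, where ratings track impressions rather than recalled items, it would be expected to be low.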
A test of the effects of appraisal salience requires raters to observe performance from competing perspectives. If raters in low appraisal salience conditions are not given a competing task, processing demands may not prevent them from attending to and forming impressions of worker behavior (cf. Barnes-Farrell & Couture, 1983). Using Cohen's (1981) taxonomy of observational goals, subjects in the present studies were told either to rate the performance of workers (judgment goal) or to reach decisions about the tasks performed by the workers (information-seeking goal). The order in which performance was recalled and rated was counterbalanced in Study 1 since responses to one task may influence responses to the other. Introducing a structured recall task prior to ratings may, for example, increase the accessibility of relevant information. Alba and Hasher (1983) have discussed the distinction between the availability and accessibility of information and have argued that far more information is encoded than a strict selective encoding perspective would suggest (Tulving & Pearlstone, 1966). Requiring raters to recall information prior to making their judgments may provide an appropriate retrieval context or cue for them to access seemingly unavailable information (Fass & Schumacher, 1981).
Method

Subjects. Eighty undergraduates enrolled in psychology courses at the University of South Carolina participated in the experiment in exchange for partial fulfillment of course requirements. Consent forms were given to subjects when they arrived at the laboratory, and the American Psychological Association guidelines for research with human subjects were followed.

Stimulus material. Four stimulus tapes were used in the study. Each tape was composed of 16 videotaped segments showing the performance of four workers on four woodworking tasks: sawing, sanding, hammering, and staining. The overall proficiency displayed by the actors was counterbalanced across the four tapes. The actors were white, male, paid volunteers recruited from an advanced woodworking class at a local technical college. The tapes were constructed from a pool of 128 performance instances in which each actor performed each task correctly four times and incorrectly four times. The videotaped segments used in the study were selected from this pool based on the results of pretesting with a comparable sample of undergraduates. For each worker, two episodes of performance were selected for each task: one episode indicated good performance and one indicated poor performance. These episodes were selected from the pretest such that correct performances were rated significantly higher than incorrect performances and the variance in the ratings for good and poor performance was minimal. The resulting set of 32 performance episodes was used to construct the four tapes. On each tape, one worker performed three out of four tasks correctly (75% proficient), two workers performed two out of four tasks correctly (50% proficient), and one worker performed one out of four tasks correctly (25% proficient). This configuration of ratee performance was designed to ensure both inter- and intraratee variability, and allowed the discriminability of performance between and within ratees to be assessed.
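The tape design described here (four workers, four tasks, and a 75/50/50/25% proficiency configuration counterbalanced across four tapes), together with the nonblocked episode order described in the next paragraph, can be sketched in Python. The cyclic rotation of proficiency levels and the block-wise ordering rule are assumptions chosen for illustration; the source does not say how the tapes were actually counterbalanced or sequenced, and the worker indices are hypothetical.

```python
from typing import List, Tuple

TASKS = ["sawing", "sanding", "hammering", "staining"]  # the four tasks in the study
LEVELS = [75, 50, 50, 25]  # percent of the four tasks performed correctly, per worker slot


def counterbalanced_proficiency() -> List[List[int]]:
    """Assumed cyclic (Latin-square) rotation: on tape t, worker w gets
    LEVELS[(w + t) % 4], so every tape keeps the 75/50/50/25 configuration and
    every worker appears once at 75%, twice at 50%, and once at 25%."""
    n = len(LEVELS)
    return [[LEVELS[(w + t) % n] for w in range(n)] for t in range(n)]


def nonblocked_order() -> List[Tuple[int, str]]:
    """One possible 16-episode order of (worker index, task) pairs in which
    each episode differs from the one before it in both worker and task."""
    n = len(TASKS)
    order = []
    for block in range(n):
        for worker in range(n):
            # Shifting the task by the block number pairs each worker with
            # every task exactly once across the four blocks.
            order.append((worker, TASKS[(worker + block) % n]))
    return order


def is_nonblocked(order: List[Tuple[int, str]]) -> bool:
    """True if no two consecutive episodes share a worker or a task."""
    return all(a[0] != b[0] and a[1] != b[1] for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    for tape, levels in enumerate(counterbalanced_proficiency(), start=1):
        print(f"tape {tape}: worker proficiency {levels}")
    episodes = nonblocked_order()
    print(episodes)
    print("nonblocked:", is_nonblocked(episodes))
```

Within a block the worker and the task both advance by one; at a block boundary the worker wraps from the last to the first while the task index jumps by two, so neither repeats.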
Pretesting showed that subjects could discriminate between these proficiency levels. Ratee characteristics were controlled by counterbalancing the proficiency level of the workers across the four tapes. The 16 performance episodes on each tape were ordered so that each episode involved a different target and task than the episode preceding it. This "nonblocked" format prevented indices of subjective organization in memory from being influenced by the sequence in which performance was observed (Cafferty, DeNisi, & Williams, 1986). The tapes used were ½-in. Beta color videocassettes and were presented to subjects on a 21-in. color television monitor.

Procedure. Subjects arrived at the laboratory at assigned times in groups of 2-5. Upon arrival, subjects were briefed on the particular activity of woodworking. Each subject was given a manual describing the proper technique for performing the four tasks. After the subjects read the manual, the experimenter reviewed the specific techniques with the subjects and answered any questions. Initial processing objectives were randomly assigned to the groups. Subjects in the high appraisal salience condition were told to attend to the performance of the workers with the expectation of rating each worker after viewing the tapes. Subjects in the low appraisal salience condition were told to attend to the tapes and try to determine the difficulty of each task. The subjects were then shown the tapes. After each incident, subjects completed an observational checklist in which they indicated whether the behavior was correct or not. This was done to ensure that all subjects initially attended to the same behaviors and encoded the task performances correctly. Subsequent analysis of this checklist revealed that all subjects did indeed initially encode the performances correctly. After the last performance episode, subjects were given the Group Embedded Figures Test (Oltman, Raskin, Witkin, & Karp, 1971). This interpolated task lasted approximately 30 min and was used to reduce short-term memory effects.¹ Subjects were then instructed to mentally review the tape with the intent to rate the workers. Thus, for half the sample, this involved reprocessing the information from a new perspective. Subjects were given 3 min for this reprocessing task. The recall and rating tasks were then given in a counterbalanced order. In the recall task, subjects were given a blank booklet and instructed to write down as many performance incidents as they could remember, one incident per page. In the rating task, subjects were instructed to rate each worker on each task and to provide an overall rating for each worker. Subjects were then debriefed, given credit for the experiment, and dismissed.

Dependent measures.
Organization of information in memory was assessed by measuring the degree of category clustering in the free recall data. Clustering indices measure the extent to which subjects recall items belonging to the same category in successive order, and relate the amount of clustering displayed to the amount of clustering expected by chance alone. Clustering scores were calculated for both person and task categories. The clustering index used was the adjusted ratio of clustering (ARC) index (Roenker, Thompson, & Brown, 1971):
$$\mathrm{ARC} = \frac{R - E(R)}{\max R - E(R)},$$
1 The Group Embedded Figures Test provided an individual difference measure of field dependency for each rater. Analyses showed no relationship between scores on this measure and any dependent variable.
where $R$ is the total number of observed repetitions, $\max R$ is the maximum number of repetitions (always the number of items recalled minus the number of category targets), and $E(R)$ is the number of repetitions expected by chance: $E(R) = \left(\sum_i m_i^2\right)/N - 1$, where $m_i$ is the number of items recalled from category $i$ and $N$ is the total number of items recalled. An ARC score of 1.0 reflects perfect clustering, and an ARC score of 0.0 reflects clustering at the level of chance.²

Free recall responses were recorded as recalled items if they made reference to performance information contained in the specific episodes. Recalled items which paired a worker with a task performance were recorded as recalled performances. This latter measure was of more value as a dependent variable because it provided more detailed information, but recalled items were also analyzed since raters may use incomplete data when richer data are not recalled. Accuracy of recalled performances was assessed by examining each item to see if it correctly paired a worker with his proper level of performance on the stated task. Recalled performances were coded as incorrect if a worker was paired with an incorrect performance level for a certain task or if the type of incorrect performance identified was not the one depicted.

Subjects rated the performance of each worker on the four tasks and the overall performance of each worker on 7-point Likert-type scales anchored at very poor (1), average (4), and very good (7).

Results

Clustering indices. The mean ARC scores for each type of clustering were analyzed in a 2 (appraisal salience) × 2 (order of recall and rating) × 2 (clustering type: person or task) mixed-factor ANOVA. Clustering type was a within-subjects factor. Only the Appraisal Salience × Clustering Type interaction was significant, F(1,63) = 10.44, p < .01, ω² = .05.
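As a concrete illustration of the ARC index defined under Dependent measures, the Python sketch below computes it for one recall protocol, given as the sequence of category labels (persons or tasks) of the recalled items in output order. It assumes that R counts adjacent pairs from the same category, that max R is the number of items recalled minus the number of categories represented in the protocol, and that E(R) = (Σ m_i²)/N − 1; the function name is hypothetical. Applied to the person-clustered example list in footnote 2, it returns 1.0 when the items are coded by worker and a below-chance (negative) value when the same list is coded by task.

```python
from collections import Counter
from typing import Hashable, Sequence


def adjusted_ratio_of_clustering(recall: Sequence[Hashable]) -> float:
    """ARC = (R - E(R)) / (max R - E(R)) for one free-recall protocol.

    `recall` holds the category label of each recalled item in output order.
    """
    n = len(recall)
    if n == 0:
        raise ValueError("empty recall protocol")
    counts = Counter(recall)
    # R: observed repetitions, i.e. adjacent items from the same category.
    observed = sum(1 for a, b in zip(recall, recall[1:]) if a == b)
    # E(R) = (sum of m_i squared) / N - 1, with m_i items recalled from category i.
    expected = sum(m * m for m in counts.values()) / n - 1
    # max R: items recalled minus the number of categories in the protocol.
    maximum = n - len(counts)
    if maximum == expected:
        raise ValueError("ARC is undefined when max R equals E(R)")
    return (observed - expected) / (maximum - expected)


# Footnote 2's person-clustered list, coded once by worker and once by task.
by_person = ["John"] * 3 + ["Bill"] * 2 + ["Mike"] * 3
by_task = ["saw", "hammer", "stain", "sand", "saw", "sand", "saw", "stain"]

print(adjusted_ratio_of_clustering(by_person))          # 1.0
print(round(adjusted_ratio_of_clustering(by_task), 2))  # -0.45
```

The guard for max R = E(R) matters in practice: when every recalled item comes from a different category, both quantities are zero and the index is undefined rather than zero.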
Simple effects tests showed that person clustering was significantly higher for high appraisal salience than for low salience (Ms = .26 and -.01, respectively) and that task clustering was significantly higher for the low appraisal salience condition than for the high salience condition (Ms = .32 and -.05), ps < .05. The clustering effects supported Hypothesis 1A: person clustering was found for person judgments and task clustering for task judgments.

2 Category clustering measures the extent to which recalled items are temporally grouped into categories. Thus, in our study high person clustering would occur in the following list: "John sawed the wood correctly," "John bent the nail when he was hammering," "John went against the grain when staining," "Bill sanded correctly," "Bill splintered the wood when sawing," "Mike sanded incorrectly by going against the grain," "Mike sawed correctly," "Mike dripped the stain when staining." An example of high task clustering would be the following list: "John did not saw straight," "Mike sawed correctly," "Bill splintered the wood when sawing," "Mike stained the wood correctly," "Bill dripped the stain," "John stained correctly," etc.

Recall. The number of items recalled, complete performances recalled, and number of performances correctly recalled were analyzed in 2 (appraisal salience) × 2 (order) ANOVA designs. The means for these analyses are presented in Table 1. The main effects for salience posited by Hypothesis 1B were not significant for complete performances or correct recall, Fs(1,74) < 1. The salience effect for number of items recalled was significant, F(1,74) = 4.88, p < .05. Significant Salience × Order interactions were found for each dependent variable, Fs(1,74) > 4.00, ps < .05, ω²s > .03. The simple effects tests for all three interactions revealed the same effect: when appraisal salience was low, subjects who performed the free recall task first scored higher on the recall measure in question than subjects who rated the workers first (ps < .05). In addition, their scores did not significantly differ from those of subjects in the high salience condition. In the high salience condition, subjects who performed the rating task first scored higher on all three recall measures than subjects who performed the recall task first, but these differences were not significant.

TABLE 1
MEAN RECALL INDICES AS A FUNCTION OF APPRAISAL SALIENCE AND ORDER OF RATING AND RECALL IN STUDY 1

                                   High salience                Low salience
                                   Recall first   Rate first    Recall first   Rate first
Items recalled                     6.95           8.52          8.26           4.35
Performances recalled              4.25           5.68          5.42           3.20
Performances correctly recalled    2.80           3.58          3.75           1.85

Overall ratings. The overall ratings assigned to the workers were analyzed in a 2 (appraisal salience) × 2 (order) × 3 (ratee proficiency level) mixed-factor ANOVA. Ratee proficiency was a within-subjects factor. Our main interest was whether raters could discriminate between the proficiency levels of the workers. The means for this analysis are presented in Table 2. A significant effect was found for ratee proficiency, F(2,152) = 7.54, p < .01, ω² = .07. Newman-Keuls tests revealed that subjects rated the best (75% proficient) worker significantly higher than the average (50% proficient) and worst (25% proficient) workers, Ms = 4.75 vs. 4.13 and 4.12, p < .05. The ratings for the average and poor workers were not significantly different. The Salience × Ratee interaction, which represented a test of Hypothesis 2, was not significant, F(2,152) < 1. A marginally significant Salience × Ratee × Order interaction was found, F(2,152) = 2.25, p < .10. Discriminability was greater for the low salience group when subjects performed the recall task first than when they performed the rating task first. Subjects in this condition were able to rate the good worker higher than the other workers.

TABLE 2
MEAN PERFORMANCE RATINGS OF GOOD, AVERAGE, AND POOR WORKERS BY APPRAISAL SALIENCE AND ORDER OF RATINGS AND RECALL IN STUDY 1

                 High salience              Low salience
Order            Good   Average   Poor      Good   Average   Poor
Recall first     4.60   4.15      4.05      4.90   4.13      4.05
Rate first       5.15   3.95      4.15      4.35   4.30      4.25

Task performance ratings. The ratings assigned to the task performances of ratees were analyzed in a 2 (salience) × 2 (order) × 2 (task performance: tasks performed correctly versus tasks performed incorrectly) × 3 (ratee proficiency) mixed-factor ANOVA with task performance and ratee proficiency as within-subjects factors. Tasks performed correctly were rated higher (M = 4.70) than tasks performed incorrectly (M = 3.94), F(1,76) = 33.76, p < .01. Only a marginally significant Salience × Task Performance interaction was found, F(1,76) = 3.42, p < .07. Subjects in the high salience condition tended to rate correct performance higher (Ms = 4.82 versus 4.57) and incorrect performance lower (Ms = 3.82 versus 4.05) than subjects in the low salience condition. No other effects were evident. Thus, overall support for Hypothesis 2 was weak.

Correlations between recall and ratings. Zero-order correlation coefficients were computed between the proportion of favorable performance items recalled about a worker and the evaluation given that worker. As hypothesized (H3), recall and ratings were not related when appraisal salience was high (rs = .20 and .21, n.s., for the rate-first and recall-first conditions, respectively) but were significantly correlated when appraisal salience was low (rs = .46 and .43, ps < .05, respectively).

Discussion
High appraisal salience results in the organization of information in memory by person categories, greater recall of performance information, and low (nonsignificant) correlations between recall and ratings. The low correlations between recall and ratings suggest that raters, despite their high level of recall, base their ratings on general impressions of workers rather than on actual behavior. Low appraisal salience results in the absence of person categorization, lower recall of performance information, and positive correlations between recall and ratings. These results support the notion that on-line processing occurs under high salience and memory-based processing occurs under low salience. However, we cannot be sure of the extent to which information processing is strictly on-line or memory-based. The processing demands on the low salience group may not have prevented spontaneous impression formation, especially since subjects attended to the quality of worker performances (as indicated by the results of the observational checklist).³ The recall-rating correlations indicate only that raters in the low salience condition rely significantly on recall when making their ratings. It is possible that recalled items were integrated with weak person impressions formed during observation. Thus, it may be best to view the salience manipulation as affecting the strength of spontaneous person impressions. High appraisal salience may lead to strong person impressions which are not appreciably influenced by recall; low appraisal salience may lead to weak impressions which are influenced or altered by recall. The Salience × Order interactions reveal that low appraisal salience leads to information loss only when raters are not asked to recall performance before making their ratings. Having subjects in low salience conditions report their recall prior to rating increased the number of performances recalled, correct recall, and discriminability in overall ratings.
In fact, these raters do not significantly differ in terms of recall or ratings from raters in the high salience condition. One explanation for this finding is that the recall task prompts raters in the low salience condition to undertake a more extensive memory search. Associated with this search may be an increase in the cognitive effort expended to differentiate memory traces associated with original events from memory traces which have been internally generated (Johnson & Raye, 1981).

3 We are grateful to an anonymous reviewer for suggesting this explanation.

Johnson and Raye (1981) propose that people remember information from two sources: (a) that derived from perceptual processes (external sources), and (b) that generated by internal sources such as imagination, reasoning, or impression formation. Errors in memory may be the result of "a failure to discriminate the origins of a memory trace" (p. 69). In the present study, raters in the low salience/recall-first group had the highest amount of correct recall. Perhaps the recall task not only encouraged these raters to search extensively for performance information in memory but also to carefully discriminate the origins of their memory traces. The results in the low salience/rate-first condition suggest that raters do not automatically engage in extensive memory search or recall monitoring processes. Cues provided by the retrieval context may be necessary for these processes to occur (Alba & Hasher, 1983). The recall task may not have shown similar effects for the high salience condition because of the strong person impressions that raters were likely to have formed. These impressions may have created internally generated data that were difficult to distinguish from observed events. In addition, a ceiling effect may have occurred since recall was expected to be high in this condition. An alternative explanation for the present findings could be offered by an automatic versus controlled processing framework (Ilgen & Feldman, 1983; Lord, 1985a). According to this view, raters in prototypical rating situations process information automatically according to well-established schemata. Bias may result when observed events do not match one's implicit theories (Lord, 1985a). When raters are encouraged to process information in a controlled fashion, they may be more likely to produce ratings which match observed behaviors. It could be argued that only the low salience/recall-first condition reflects the type of controlled processing referred to by these models.
We hesitate, however, to attribute the positive outcomes in this condition strictly to controlled processing. Individuals do not always make effective use of information when consciously trying to do so (Nisbett & Ross, 1980), and controlled processing has not always been linked to more accurate performance ratings (McKelvey & Lord, 1986). The memory search and recall monitoring explanations seem more parsimonious given the present data. Finally, the results of Study 1 indicate that the dispersion in ratings between workers is restricted in all conditions. At best, raters only differentiate the best worker from the others in terms of overall ratings; the average and poor workers are not discriminated. Perhaps the best worker is more salient to raters in memory than the other workers. Alternatively, performance may not be perceived as additive; the difference between 25% correct and 50% correct performance may not be the same as the difference between 50% correct and 75% correct. Whatever the underlying mechanism, lack of differentiation occurs in both salience conditions.
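The restricted dispersion can be checked directly against the cell means in Table 2. The Python sketch below averages the four salience × order cells for each worker, which reproduces the marginal means reported for the Newman-Keuls comparison (4.75, 4.13, and 4.12), and then forms two simple descriptive contrasts: good minus the mean of average and poor, and average minus poor. These contrasts are illustrative summaries only, not the significance tests reported in the paper; the function and variable names are hypothetical.

```python
from statistics import mean

# Cell means from Table 2 (Study 1). Each row is one salience x order cell;
# columns are the good (75%), average (50%), and poor (25%) workers.
TABLE_2 = [
    (4.60, 4.15, 4.05),
    (5.15, 3.95, 4.15),
    (4.90, 4.13, 4.05),
    (4.35, 4.30, 4.25),
]


def marginal_means(rows):
    """Mean rating for each worker column, averaged over all cells."""
    return tuple(mean(column) for column in zip(*rows))


good, average, poor = marginal_means(TABLE_2)
good_vs_others = good - (average + poor) / 2  # does the best worker stand out?
average_vs_poor = average - poor              # are the two lower workers separated?

print(f"good {good:.3f}, average {average:.4f}, poor {poor:.3f}")
print(f"good - others {good_vs_others:.2f}, average - poor {average_vs_poor:.4f}")
```

The first contrast is roughly 0.6 of a scale point while the second is under 0.01, which mirrors the pattern described above: only the best worker is separated from the others.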
While interventions during reprocessing benefit the low salience group, they may occur too late to have much impact on ratings. The reprocessing stage is an output stage of the information processing model (Lord, 1985a). Intervening during an input stage (i.e., encoding) may be more effective in increasing accuracy for both salience conditions. As noted earlier, the way in which observers organize information in memory influences recall. Raters in Study 1 did not organize information to a great extent. Priming them to use organizing strategies may increase categorization in memory. In addition, the use of person categories appears to be related to an overreliance on general impressions of workers. Training raters to use other categorization schemes may increase rating accuracy. Study 2 examines these propositions.
STUDY 2
The main hypothesis tested in Study 2 was that organizing performance information in certain patterns during encoding will increase rating accuracy. Performance information may be considered nonindividuated data (Pryor & Ostrom, 1981); that is, multiple ratees may be compared along common task dimensions. This suggests two memory organization schemes: organization by persons or by tasks. Either strategy should increase recall and rating accuracy relative to situations where no organization is used (Cafferty et al., 1986). Person blocking, however, may lead to impression formation (Hamilton et al., 1980), which, in turn, may result in rating distortions such as halo bias. Spontaneous impression formation may be less likely to occur for task blocking. Study 2 provided task blocking, person blocking, or no blocking instructions to subjects before they observed worker performance. The effect of blocking strategy was examined under high and low appraisal salience. It was hypothesized that task performance and overall ratings would be affected by appraisal salience and organization strategies as follows:
H1: High appraisal salience will lead to more accurate ratings than low appraisal salience.
H2: Person and task blocking will lead to more accurate ratings than no blocking.
H3: The influence of blocking instructions will be greater when appraisal salience is low.
No a priori prediction was made for the effect of task versus person blocking. An additional hypothesis was made for task dimension ratings. Since person blocking may facilitate impression formation, it was predicted that:
H4: The ratings of specific task performances would be influenced more by the ratees’ overall proficiency level when raters are instructed to organize information by persons than by tasks.
Hypothesis 4 implies a three-way interaction among blocking instructions, worker proficiency, and task performance.
Method
Subjects and design. Sixty undergraduates participated in the experiment in exchange for extra course credit. They were randomly assigned to treatment groups in a 2 × 3 (Appraisal Salience × Organization Instructions) factorial design.
Procedure. The procedure was the same as in Study 1, with the following changes. After being introduced to the experiment, subjects were given the memory organization instructions. They were told that they would be viewing the performances of four workers arranged in an unstructured, random pattern. Subjects in the person blocking condition were told to mentally reorganize the information so that all the performances for each worker were grouped together, and examples of this reorganization were provided. Subjects in the task blocking condition were told to reorganize the information so that the performances of the workers on each task were grouped together. Again, examples of this reorganization scheme were provided. Subjects in the no blocking condition were not told to reorganize the information in any manner. Next, subjects were given the appraisal salience manipulation and shown the workers’ performances. One of the four tapes used in Study 1 was selected for Study 2.⁴ After viewing the videotape, subjects were given a 10-min filler task to remove short-term memory effects. The same recall and rating measures as in Study 1 were then collected. All subjects were given the recall task prior to the rating task.
Results
Clustering indices. The mean ARC scores for person and task categories in each condition are presented in Table 3. Planned t tests examined whether subjects could subjectively organize information according to their instructions. Person blocking instructions resulted in higher ARC scores for person categories (M = .51) than task or no blocking instructions (M = .20), t(58) = 1.84, p < .07.
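The clustering indices above are ARC scores. Assuming they follow the adjusted ratio of clustering of Roenker, Thompson, and Brown (1971), which appears in the reference list, the index for a single recall protocol can be sketched in Python as below. The sketch assumes each recalled performance instance has already been coded with the category being tested: the worker for person clustering, or the task for task clustering. The function name and the toy protocol are hypothetical, and the source does not describe how the authors actually computed their scores.

```python
from collections import Counter
from typing import Hashable, Sequence


def adjusted_ratio_of_clustering(labels: Sequence[Hashable]) -> float:
    """Adjusted ratio of clustering (ARC) for one free-recall protocol (hypothetical helper).

    `labels` gives the category of each recalled item in output order
    (e.g., the worker in each recalled performance, for person clustering).
    ARC = (R - E[R]) / (maxR - E[R]), where R is the number of adjacent
    same-category pairs, E[R] = sum(n_i ** 2) / N - 1, and maxR = N - k.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("ARC needs at least two recalled items")
    counts = Counter(labels)
    k = len(counts)  # number of categories represented in recall
    observed = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    expected = sum(c * c for c in counts.values()) / n - 1
    maximum = n - k
    if maximum == expected:
        # A single category, or no category recalled twice: ARC is undefined.
        raise ValueError("ARC is undefined for this protocol")
    return (observed - expected) / (maximum - expected)


# The same toy protocol scored two ways; each item is a (worker, task) pair.
protocol = [("W1", "T1"), ("W1", "T2"), ("W2", "T1"), ("W2", "T2")]
person_arc = adjusted_ratio_of_clustering([worker for worker, _ in protocol])
task_arc = adjusted_ratio_of_clustering([task for _, task in protocol])
print(person_arc, task_arc)  # prints 1.0 -1.0
```

A score of 1.0 marks perfect clustering and 0.0 chance-level clustering, matching the note to Table 3. Negative scores, like the task scoring of this toy protocol and a few of the Table 3 cells, mean fewer same-category transitions than chance would produce.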
While task blocking instructions led to higher ARC scores for task clustering than person or no blocking instructions, the planned comparison was not significant, t(58) = 1.46, p > .10. Although ARC scores for person clustering were higher when salience was high (M = .42) than when it was low (M = .18), this difference was not reliable, t(58) = 1.49, p > .10.
TABLE 3
Mean Clustering Indices for Person and Task Categories by Appraisal Salience and Blocking Condition in Study 2
                                     Clustering type
Salience    Blocking condition      Person     Task
High        Person                    .41       .08
            Task                      .42       .22
            None                      .36      -.08
Low         Person                    .54       .07
            Task                     -.02       .28
            None                      .03       .10
Note. 1.0 indicates perfect clustering; 0.0 indicates clustering at chance levels.
⁴ The analyses from Study 1 were repeated with stimulus tape as an independent variable. No significant difference was found between tapes for any dependent variable. In particular, dispersions in ratings were found to be similar in all cases. The tape chosen for use in Study 2 displayed the best match between desired levels of performance and performance ratings.
Correct recall. Performance instances correctly recalled were analyzed in a 2 (Salience) × 3 (Blocking Instructions) ANOVA. A significant effect was found for salience, F(1, 54) = 4.82, p < .05, ω² = .06. High appraisal salience resulted in more instances correctly recalled (M = 10.47) than low salience (M = 8.40), further supporting Hypothesis 1B from Study 1. The blocking instructions main effect and the Salience × Blocking Instructions interaction were not significant.
Overall ratings. Mean overall ratings were analyzed in a 2 (Salience) × 3 (Blocking Instructions) × 4 (Ratee) mixed-factor ANOVA, with ratee as a within-subjects factor. A salience main effect was found, F(1, 48) = 4.75, p < .05, ω² = .01. High appraisal salience resulted in more elevated ratings (M = 4.22) than low salience (M = 3.93). The only other significant finding was a ratee effect, F(3, 144) = 33.27, p < .001, ω² = .30. Tukey comparisons revealed higher ratings for Ratee 4 (the 75% proficient worker, M = 5.28) than for all others, ps < .05. Ratee 3 (one of the 50% proficient workers, M = 4.22) was also rated higher than Ratee 2 (the other 50% proficient worker, M = 3.30) and Ratee 1 (the 25% proficient worker, M = 3.48), ps < .05. Salience and blocking conditions did not differentially affect ratee discriminability. Thus, in terms of overall ratings, no support was found for Hypotheses 1, 2, or 3.
Ratings of task performance. Task performance ratings were tested by analyzing the average ratings assigned to each ratee on the tasks he performed well and the tasks he performed poorly. Mean ratings are presented in Table 4. They were analyzed in a 2 (Salience) × 3 (Blocking) × 4 (Ratee) × 2 (Task Performance: tasks performed well vs. tasks performed poorly) mixed-factor ANOVA, with ratee and task performance as within-subjects factors. Main effects were found for appraisal salience, F(1, 48) = 6.80, ratee, F(3, 144) = 13.28, and task performance, F(1, 48) = 124.00, ps < .05. Post hoc comparisons revealed three patterns. High salience led to higher ratings than low salience; mean task performance ratings were higher for the best worker than for all others; and good task performance was rated significantly higher than poor task performance.
More directly related to the hypotheses, a significant Salience × Task Performance × Ratee interaction was found, F(3, 144) = 4.50, p < .01. To break down this three-way interaction, simple effects tests of the Task Performance × Ratee interaction were conducted for the two salience conditions (i.e., simple, simple effects tests; Keppel, 1982). In these tests, a significant effect for task performance shows differentiation of good from poor performance. An effect for ratees indicates different elevation in the rating of good and poor performance across ratees, and an interaction indicates differential rating of good and poor performances across ratees. When appraisal salience was high, good performance was rated higher than poor performance, F(1, 29) = 69.38, p < .001. Good and poor performances by the best worker were also rated higher than similar performances by other workers, F(3, 87) = 7.56, p < .01. When appraisal salience was low, the same task performance, F(1, 29) = 58.65, and ratee, F(3, 87) = 5.94, ps < .01, effects were found. Additionally, the Task Performance × Ratee interaction was significant, F(3, 87) = 4.14, p < .01. It was due mainly to good performance by Ratees 1 and 2 (those with lower overall proficiency ratings) being rated significantly lower than good performance by Ratees 3 and 4 (those with higher overall proficiency ratings), Tukey ps < .05. Thus, when performance appraisal was not salient, the ratings of task performances were more likely to be affected by the ratees’ overall proficiency level. This provides some support for Hypothesis 1.
TABLE 4
Mean Performance Ratings for Tasks Performed Correctly and Incorrectly by Appraisal Salience, Blocking Instructions, and Ratee in Study 2
                          Ratee (% correct)ᵃ
                          1 (25%)       2 (50%)       3 (50%)       4 (75%)
Salience / Blocking       Corr   Inc    Corr   Inc    Corr   Inc    Corr   Inc
High salience
  Person                  5.40   3.07   4.55   3.45   5.60   3.00   5.87   3.40
  Task                    5.60   3.30   5.00   2.50   5.15   3.15   5.33   4.00
  None                    5.10   3.30   3.95   2.85   5.30   2.75   5.50   4.80
Low salience
  Person                  4.10   3.27   3.75   3.10   5.25   2.65   5.73   2.70
  Task                    3.90   3.30   4.55   2.65   5.05   3.05   5.10   2.80
  None                    5.10   3.80   3.85   2.00   4.85   3.75   5.60   3.70
Note. Corr = tasks performed correctly; Inc = tasks performed incorrectly.
ᵃ % correct = percentage of tasks performed correctly by each ratee.
A significant Blocking Instructions × Task Performance × Ratee interaction, F(6, 144) = 2.41, p < .05, suggested that blocking instructions also affected task ratings. Figure 1 displays this interaction. Again, Task Performance × Ratee simple effects tests were conducted at each level of the between-groups factor (i.e., blocking). Task blocking instructions led only to the desirable task performance effect, F(1, 19) = 41.00, p < .01 (good performance was rated higher than poor performance across all ratees). However, a significant Ratee × Task Performance interaction was found for person blocking, F(3, 57) = 7.24, p < .01. Good performance by highly proficient workers was rated higher than good performance by less proficient workers. In the no blocking condition, a significant effect for ratee, F(3, 57) = 10.54, p < .01, was found in addition to the main effect for task performance. Thus, relative to task blocking instructions, person and no blocking instructions resulted in distortions in task ratings across ratees. This provides support for Hypothesis 4.
FIG. 1. Mean task performance ratings by blocking instructions and ratees in Study 2. Ratees 1, 2, 3, and 4 correspond to 25, 50, 50, and 75% correct performances, respectively. [Figure not reproduced; x-axis: Ratee (1-4).]
Discussion
The results of Study 2 further support the hypothesis that appraisal salience tunes raters toward relevant performance information. First, raters use person categories in memory more when appraisal salience is high than when it is low. Second, raters recall more correct performance information when appraisal salience is high. Third, in regard to task ratings, the Appraisal Salience × Ratee × Task Performance interaction suggests that ratee proficiency levels affect ratings of similar performances more when appraisal salience is initially low than when it is high. This interaction provided partial support for Hypothesis 1. Specifically, when appraisal salience is low, good performance by workers seen as below average in overall proficiency is rated lower than similar performance by better workers. The remaining hypotheses in Study 2 examined the effects of subjective organization strategies on the rating process. Recall was greater when subjects were instructed to organize information in memory (by either ratees or tasks) than when no organizing instructions were given. Even so, blocking instructions did not interact with appraisal salience and did not affect recall accuracy. Thus, Hypotheses 2 and 3 were not supported. Blocking instructions did, however, affect task ratings. Consistent with Hypothesis 4, task performance ratings varied with the ratee’s perceived proficiency level in two cases: when raters were encouraged to organize information by ratees, and when they were not encouraged to organize information in any specified manner. Ratings of good and poor performance were consistent across ratee proficiency levels only under task blocking instructions. The absence of person impression effects for task blocking cannot be attributed solely to memory organization processes, since subjects instructed to organize information by tasks were not able to do so to a reliable extent.
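The recall advantage for high salience noted above was reported in the Results as F(1, 54) = 4.82 with ω² = .06. The source does not say which omega-squared formula was used. As a sketch under that caveat, the common shortcut for partial omega squared computed from an F ratio, ω²ₚ = df_effect(F − 1) / [df_effect(F − 1) + N], reproduces the reported value when N = 60, the number of subjects in Study 2. The function name is hypothetical.

```python
def partial_omega_squared(f_ratio: float, df_effect: int, n_total: int) -> float:
    """Partial omega squared for a between-subjects effect, estimated from its F ratio.

    Uses df_effect * (F - 1) / (df_effect * (F - 1) + N). Negative estimates,
    which occur whenever F < 1, are truncated to zero, as is conventional.
    """
    numerator = df_effect * (f_ratio - 1.0)
    return max(0.0, numerator / (numerator + n_total))


# Salience effect on correct recall in Study 2: F(1, 54) = 4.82 with N = 60 subjects.
print(round(partial_omega_squared(4.82, df_effect=1, n_total=60), 2))  # prints 0.06
```

The same shortcut does not reproduce the ω² values of .01 and .30 reported for the mixed-factor analysis of overall ratings. Those values presumably rest on a different partitioning of within-subjects variance, so the formula is offered only as a check on the between-subjects recall result.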
It is also difficult to attribute the observed effects to differences in automatic versus controlled processing since both task and person blocking instructions should interrupt the automatic processing of information. The interaction seems more consistent with the weak spontaneous impressions explanation offered for the results of Study 1. That is, task blocking instructions may lead to weaker person impressions than person or no blocking instructions because aggregation and abstraction processes are discouraged (Schul, 1983). Thus, ratings would be more likely to reflect actual behavior rather than internally generated impressions of performance. Person blocking instructions, on the other hand, may actually facilitate tendencies toward integration and abstraction.
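The Blocking Instructions × Task Performance × Ratee pattern discussed above can be seen descriptively in the Table 4 cell means. The Python sketch below takes each ratee's rating on correctly performed tasks, averages it over the two salience conditions, and reports the range across ratees within each blocking condition. A small range means good performance was rated similarly regardless of the ratee's overall proficiency. This is only an illustration on the published means, not the authors' simple-effects analysis, and the dictionary layout and helper names are hypothetical.

```python
from statistics import mean

# Table 4 cell means: TABLE4[salience][blocking] lists (correct, incorrect)
# ratings for Ratees 1-4 (25%, 50%, 50%, and 75% of tasks performed correctly).
TABLE4 = {
    "high": {
        "person": [(5.40, 3.07), (4.55, 3.45), (5.60, 3.00), (5.87, 3.40)],
        "task": [(5.60, 3.30), (5.00, 2.50), (5.15, 3.15), (5.33, 4.00)],
        "none": [(5.10, 3.30), (3.95, 2.85), (5.30, 2.75), (5.50, 4.80)],
    },
    "low": {
        "person": [(4.10, 3.27), (3.75, 3.10), (5.25, 2.65), (5.73, 2.70)],
        "task": [(3.90, 3.30), (4.55, 2.65), (5.05, 3.05), (5.10, 2.80)],
        "none": [(5.10, 3.80), (3.85, 2.00), (4.85, 3.75), (5.60, 3.70)],
    },
}


def correct_task_profile(blocking: str) -> list[float]:
    """Mean rating of correctly performed tasks for Ratees 1-4, averaged over salience."""
    return [
        mean(TABLE4[salience][blocking][ratee][0] for salience in TABLE4)
        for ratee in range(4)
    ]


def spread(values: list[float]) -> float:
    """Range of a set of means; a larger range means the ratee's identity mattered more."""
    return max(values) - min(values)


for blocking in ("person", "task", "none"):
    profile = correct_task_profile(blocking)
    print(f"{blocking:>6}: {[round(v, 2) for v in profile]}  range = {spread(profile):.2f}")
```

Under task blocking the range is roughly half a scale point (about 0.47). Under person and no blocking it is about 1.65 points. This is consistent with the text's observation that good performance was rated similarly across ratees only under task blocking.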
One implication of the Blocking Instructions × Task Performance × Ratee interaction is that task dimension categories may be used to structure the appraisal decision process. Rater training programs could be designed to encourage raters to organize information in memory by task dimensions. Subjects were not able to accomplish this in the present study, but prolonged training may prove more successful. Alternatively, rater diaries, which are used to reduce the memory demands placed on the rater, may be structured by task dimensions so that raters can organize information in an easily accessible and efficient manner. These effects may be extended to the format of rating scales. First, halo bias may be lower for task-blocked rating formats than for person-blocked formats because impressions of raters do not remain temporally active (cf. Wherry, 1952). Second, rating scale formats may also act as memory organizing devices (Ilgen & Feldman, 1983; Pulakos, 1986). Pulakos (1986) found that rating accuracy was greater when rater training and rating scale format were congruent than when they were incongruent. Thus, using task dimensions to organize encoding and rating may result in greater accuracy. Before implementing these suggestions, however, research is needed to establish how critical task categorization in memory actually is. The data in Study 2 suggest that the more accurate ratings may be due to the absence of person categorization rather than to task categorization.
GENERAL DISCUSSION
The present studies provide evidence of the effect of appraisal salience on the rating process. High salience can be characterized by on-line processing, whereas low salience can be characterized by memory-based processing. Expecting to rate others directs observers toward performance information and facilitates the use of worker categories in memory (Feldman, 1981). Recall of relevant information is enhanced as a result of these processes. Distortions may occur, however, in both high and low salience conditions. The extent of information loss may be greater when appraisal salience is low, but high appraisal salience and person categorization increase person impression effects. Retrieval and encoding conditions have been identified that may increase accuracy. Structured recall tasks and the use of nonperson categories in memory may reduce the tendency to rely on loosely formed impressions or on easily accessible but (possibly) inaccurate information. These findings have implications for rater training programs, which often assume that the rating task is salient to raters and that relevant information is easily accessible. More emphasis should be given to extensive memory search and reintegration procedures. Rater training should also incorporate methods that increase the retention of performance information across situations and time.
This research has limitations that restrict the generalizability of the findings. The use of a laboratory setting, videotaped performances, and nonexpert raters enabled us to draw inferences about underlying cognitive processes. Obviously, however, many of the affective and cognitive demand components of the modal criterion setting for performance appraisal in organizations (Bernardin & Villanova, 1986) were ignored. Future research should compare the present results with the effects of salience and encoding/retrieval strategies on ratings in more realistic settings. The present investigation also raises a number of issues for future research. The results are seen as consistent with a dual storage model of person memory (Anderson & Hubert, 1963; Carlston, 1980; Schul, 1983), in which the details of observed behavior and impressions of ratees are stored separately in memory. We have also speculated about the roles that person impressions, memory traces for behavioral detail, and memory organization play in the appraisal process. Our inferences are based on a free recall methodology that reveals cognitive processes when the information presented to subjects is held constant and orienting tasks are varied (Srull, 1985). Unfortunately, this methodology does not allow us to address more subtle questions, such as: What are the relative strengths of person impressions under high and low salience conditions? How susceptible are impressions to reintegration in these conditions? Answering these questions, and testing hypotheses that relate memory organization and rating accuracy to appraisal salience, may require changes in methodology. Because cognitive processes operate in real time, reaction time procedures are particularly useful for addressing the process questions posed above (Srull, 1984).
Response latency data would allow investigators to infer when subjects are relying on impressions rather than on observed behavior for their decisions (as impressions increase in strength, recognition of observed behavior takes longer). Schul (1983) used response latencies to examine how trait information is integrated in memory and how global impressions are formed. Similar procedures could test the representation of information in memory under high and low salience conditions. Signal detection analysis could also be used to address accuracy issues (Lord, 1985b). Accuracy measures based on signal detection theory would allow the decision or guessing components of subjects’ performance to be separated from the accuracy component under low and high salience conditions. Both response latencies and signal detection analyses can also be used to assess the strength of externally generated and internally generated memory traces. Attention to these issues would greatly add to the cognitive literature on performance appraisal. The present research reminds us to consider the influence that contextual events, and processing objectives in particular, have on encoding, integration, and retrieval processes.
REFERENCES
Alba, J. W., & Hasher, L. (1983). Is memory schematic? Psychological Bulletin, 93, 203-231.
Anderson, N. H., & Hubert, S. (1963). Effects of concomitant verbal recall on order effects in personality impression formation. Journal of Verbal Learning and Verbal Behavior, 2, 379-391.
Balzer, W. K. (1986). Biases in the recording of performance-related information: The effects of initial impression and centrality on the appraisal task. Organizational Behavior and Human Decision Processes, 37, 329-347.
Barnes-Farrell, J. L., & Couture, K. A. (1983). Effects of appraisal salience on immediate and memory-based judgments. In C. Banks & L. Roberson (Chairs), Cognitive processes in performance appraisal: New findings. Symposium presented at the 91st meeting of the American Psychological Association, Anaheim, CA.
Bernardin, H. J., & Villanova, P. (1986). Performance appraisal. In E. A. Locke (Ed.), Generalizing from the laboratory to field settings (pp. 200-211). Lexington, MA: Lexington.
Cafferty, T. P., DeNisi, A. S., & Williams, K. J. (1986). Search and retrieval patterns for performance information: Effects on evaluations of multiple targets. Journal of Personality and Social Psychology, 50, 67-3.
Carlston, D. E. (1980). The recall and use of traits and events in social inference processes. Journal of Experimental Social Psychology, 16, 303-328.
Cohen, C. E. (1981). Goals and schemata in person perception: Making sense from the stream of behavior. In N. Cantor & J. F. Kihlstrom (Eds.), Personality, cognition, and social interaction (pp. 45-67). Hillsdale, NJ: Erlbaum.
DeNisi, A. S., Cafferty, T. P., & Meglino, B. M. (1984). A cognitive model of the performance appraisal process: A model and research propositions. Organizational Behavior and Human Performance, 33, 360-396.
DeNisi, A. S., & Williams, K. J. (1988). Cognitive approaches to performance appraisal. In G. R. Ferris & K. R. Rowland (Eds.), Research in personnel and human resource management (Vol. 6, pp. 109-155). Greenwich, CT: JAI Press.
Fass, W., & Schumacher, G. M. (1981). Schema theory and prose retention: Boundary conditions for encoding and retrieval effects. Discourse Processes, 4, 17-26.
Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127-148.
Hamilton, D. L., Katz, L. B., & Leirer, V. O. (1980). Cognitive representation of personality impressions: Organizational processes in first impression formation. Journal of Personality and Social Psychology, 39, 1050-1063.
Hastie, R., & Carlson, D. (1980). Theoretical issues in person memory. In R. Hastie et al. (Eds.), Person memory: The cognitive bases of social perception (pp. 1-53). Hillsdale, NJ: Erlbaum.
Hastie, R., & Park, B. (1986). The relationship between memory and judgment depends on whether the judgment task is memory-based or on-line. Psychological Review, 93, 256-268.
Hastie, R., Park, B., & Weber, R. (1984). Social memory. In R. S. Wyer, Jr., & T. K. Srull (Eds.), Handbook of social cognition (Vol. 2, pp. 151-202). Hillsdale, NJ: Erlbaum.
Ilgen, D. R., & Feldman, J. M. (1983). Performance appraisal: A process focus. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 5, pp. 141-197). Greenwich, CT: JAI Press.
Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88, 67-85.
Keppel, G. (1982). Design and analysis: A researcher’s handbook (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Landy, F. S., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
Lingle, J. H., & Ostrom, T. M. (1980). Principles of memory and cognition in attitude formation. In R. Petty, T. Ostrom, & T. Brock (Eds.), Cognitive responses in persuasion (pp. 399-423).
Lord, R. G. (1985a). An information processing approach to social perceptions, leadership, and behavioral measurement in organizations. In B. W. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 7, pp. 87-128). Greenwich, CT: JAI Press.
Lord, R. G. (1985b). Accuracy in behavioral measurement: An alternative definition based on raters’ cognitive schema and signal detection theory. Journal of Applied Psychology, 70, 66-71.
McKelvey, J. D., & Lord, R. G. (1986). The effects of automatic and controlled processing on rating accuracy. Paper presented at the annual meeting of the Society for Industrial Organizational Psychology, Chicago.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Oltman, P. K., Raskin, E., Witkin, H. A., & Karp, S. A. (1971). Group embedded figures test (manual). Palo Alto, CA: Consulting Psychologists Press.
Ostrom, T. M., Pryor, J. B., & Simpson, D. D. (1981). The organization of social information. In E. T. Higgins, C. P. Herman, & M. P. Zanna (Eds.), Social cognition: The Ontario symposium (Vol. 1, pp. 3-38). Hillsdale, NJ: Erlbaum.
Posner, M. I., & Snyder, C. R. (1975). Attention and cognitive control. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium. Hillsdale, NJ: Erlbaum.
Pryor, J. B., & Ostrom, T. M. (1981). The cognitive organization of social information: A converging-operations approach. Journal of Personality and Social Psychology, 44, 628-641.
Pulakos, E. D. (1986). The development of training programs to increase accuracy with different rating tasks. Organizational Behavior and Human Decision Processes, 38, 76-91.
Roenker, D. L., Thompson, C. P., & Brown, S. C. (1971). Comparison of measures for the estimation of clustering in free recall. Psychological Bulletin, 76, 45-48.
Schul, Y. (1983). Integration and abstraction in impression formation. Journal of Personality and Social Psychology, 44, 45-54.
Smith, D. E. (1986). Training programs for performance appraisal: A review. Academy of Management Review, 11, 22-40.
Srull, T. S. (1983). Organizational and retrieval processes in person memory: An examination of processing objectives, presentation format, and the possible role of self-generated cues. Journal of Personality and Social Psychology, 44, 1157-1170.
Srull, T. S. (1984). Methodological techniques for the study of person memory and social cognition. In R. S. Wyer & T. S. Srull (Eds.), Handbook of social cognition (Vol. 2, pp. 1-72). Hillsdale, NJ: Erlbaum.
Tulving, E. (1974). Recall and recognition of semantically encoded words. Journal of Experimental Psychology, 102, 778-787.
Tulving, E., & Pearlstone, Z. (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5, 381-391.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.
Wherry, R. J. (1952). The control of bias in ratings: A theory of rating. Columbus: The Ohio State Research Foundation.
RECEIVED: February 24, 1988