SOURCES OF BIAS IN PERFORMANCE EVALUATION: TWO EXPERIMENTS
JACK M. FELDMAN and ROBERT J. HILTERMAN
University of Florida

Attribution and stereotyping theories predicted that poor-performing black workers would receive lower evaluations than corresponding Whites, while good performance would have the reverse effect. “Dress and lifestyle” was also predicted to influence performance evaluation. Male business students, evaluating bogus employees differing in race, dress, and behaviorally described performance, provided weak support for the second hypothesis. A second experiment investigated the possibility that perceived social class, rather than race, was the relevant variable in the operation of a stereotype confirmation-contrast process. Individual differences in stereotyping were also measured. No support for the specific hypothesized process was found, but results supported the potential relevance of attribution theory to a configural performance-evaluation model. Results were also discussed in terms of worker-perceived discrimination.
Bias in the evaluation of employee performance, especially bias based on racial, ethnic, or sexual grounds, is an important issue for workers and organizations alike. For workers, a biased performance appraisal system can wipe out the benefits expected from effort on the job, directly affecting material and psychological well-being. For the organization, biased performance appraisal can result in the retention and reward of less productive workers, higher turnover costs, and a possible increase in interracial or interethnic conflict due to perceptions of mistreatment (see Davidson & Feldman, 1971).

The evidence for the existence of such bias in natural settings is unfortunately indirect and anecdotal (Davidson & Feldman, 1971). Bass and Turner (1973) found, for example, that ratings of black bank tellers’ performance were more strongly correlated with objective indices of performance than were white tellers’ ratings. This suggests that bank managers may have been “bending over backwards” to avoid appearing biased against Blacks. At the same time, some Whites who were actually performing worse than some Blacks could have been rated higher, possibly producing dissatisfaction or
International Journal of Intercultural Relations
conflict. Rotter and Rotter (1966) found that Blacks were rated higher than Whites at low performance levels, suggesting a guilt-induced tendency toward leniency. It seems useful, since no clear pattern is apparent, to analyze the problem theoretically. This analysis, coupled with experimentation, might provide clues to guide further field investigation and the development of less biased evaluation systems. The following discussion attempts to provide such an analysis.

The employee evaluation forms in general use (Tiffin & McCormick, 1965) do not directly report behavior. Rather, they call for inferences from behavior to underlying traits or dispositions of the worker. They may also require judgements of the individual’s performance against an “average,” which is often unspecified. Attribution and stereotyping theories predict two sorts of bias when forms like these are used: a stereotype contrast/confirmation effect, producing over- and underevaluation of performance respectively, and an effect of job-extraneous factors, which may be either direct or interact with ethnic group, objective performance level, or both.

A great deal of research is consistent with the idea that negatively stereotyped workers who perform well will be overevaluated relative to workers for whom no negative stereotype exists. Feldman (1972) and Feldman and Hilterman (1975) found that black professional stimulus persons were rated more likely to possess characteristics such as intelligence, resourcefulness, etc., than were comparable Whites; this was explained by the contrast of the Blacks’ attained position with that stereotypically expected of black people, and the greater effort and ability thought necessary for a Black to attain professional status. Other studies, not dealing with stereotypes, have found similar results.
They generally support the notion that, at equal performance levels, the individual with least initial advantage (and thus higher motivation, by Kelley’s 1971 theory) will be evaluated more highly. For example, Lanzetta and Hannah (1969) found more punishment given a “slow learner” when a lack of motivation could be inferred. Weiner and Kukla (1970), Leventhal and Michaels (1971), and Leventhal and Whiteside (1973) all found, in a variety of settings and for a variety of tasks, that stimulus persons of perceived low task aptitude were evaluated more highly at a constant performance level than were stimulus persons of perceived high aptitude. These results
are consistent with Kelley’s (1971) and Jones and Davis’ (1965) attribution theories. To summarize, if we can assume that Blacks are generally perceived to have lower ability, motivation, and/or opportunity to successfully perform a job, a Black who exhibits high objective job performance may be evaluated more highly than a White whose performance is equally good.

The logic for predicting underevaluation of Blacks’ performance follows similar lines. Basically, it is argued that a stereotype provides some information about a stimulus person in the absence of observed behavior. If behavioral evidence consistent with the stereotype is presented, a “confirmation” effect takes place, and the appropriate attributions are more strongly made than would otherwise be the case. Campbell (1967) proposes a similar “stereotype confirmation” effect, postulating that less real difference between two groups is required to attribute a trait once that trait has become part of a stereotype. In support of this proposition, Dion, Berscheid, and Walster (1972) have established the existence of a physical attractiveness stereotype in which less attractive people are judged to have fewer socially desirable personality traits. Dion (1972) has further shown that relatively unattractive children who commit severe transgressions (throwing a snowball with a rock in it, for example) have more negative personality characteristics attributed to them than do attractive children who commit the same offense. The unattractive children are also judged more likely to repeat the offense, which is itself judged to be more serious. Logically, then, if a negatively evaluated behavior is enacted by a member of a negatively stereotyped group, it ought to be seen as confirmation of the pre-existing stereotype “hypothesis.” This should lead to the more confident attribution of appropriate traits (relative to the same behavior seen in a member of a nonstereotyped or positively stereotyped group).
Poor job performance would thus lead to underevaluation of a black worker relative to a poor-performing White, other things being equal.

An additional aspect of the hypothetical performance evaluation process is not directly related to job behavior, but to the extraneous features of an individual’s dress and behavior. Such characteristics may obviously have direct affective implications for the evaluator, and may influence attributions as well. Triandis, Loh, and Levin
(1966) found that style of dress, political beliefs, and quality of spoken English influenced the evaluation of black and white stimulus persons. Davidson (1972) found that Whites react negatively to speech, dress, and nonverbal behavior indicative of overt militancy among Blacks. L. M. Davison (1973) found, in a field setting, that the success of black job trainees in a factory setting depended on their being perceived by supervisors as serious, highly motivated, and nonmilitant. This perception in turn depended to a large extent on modes of dress and speech unrelated to job performance itself. Leventhal (1971) has suggested that Blacks who act in a “socially desirable” manner would be relatively overevaluated, while those acting in an “undesirable” way would be underevaluated. The common presence of “halo effects” (general evaluative biases) on typical performance evaluation scales would also suggest a direct influence of extraneous factors on performance evaluations.

To summarize, while the exact nature and direction of influence cannot be predicted, it follows from previous research that workers’ modes of dress and speech should influence the evaluation of their performance. This influence might be direct, with certain modes resulting in a higher or lower evaluation; it may also be indirect, with dress interacting with race, objective performance level, or both, to produce an over- or underevaluation. Generally, more conservative dress is expected to generate higher evaluations, especially in the case of black workers (Leventhal, 1971).

Parenthetically, it should be noted that the processes discussed above may partially explain the perceptions of mistreatment frequently reported in integrated work settings (Davidson & Feldman, 1971).
Whites in such situations often claim that “Blacks get all the breaks,” while Blacks may claim that “You have to be twice as good as a white to get anywhere around here,” or “You have to act like an Uncle Tom to get a decent job.” These statements are quite consistent with the operation of the above processes. Though no information exists as to the actual frequency of such perceptions, the statements are made, and should be taken as an indication that something is seen to be amiss.

Briefly, it is predicted that:

1a. Black stimulus employees will be evaluated more highly than Whites at high levels of objective performance.
1b. Black stimulus employees will be evaluated lower than Whites at low levels of objective performance.
2. Descriptions of dress style and verbal behavior will influence performance evaluations either directly, in interaction with race and performance, or both. The most likely direction for such influence is that more conservative descriptions will yield higher evaluations, especially in the case of black stimuli.
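Predictions 1a and 1b together describe a crossover race by performance interaction: the Black-minus-White evaluation gap should be positive at high performance and negative at low performance. The following is a minimal sketch of that pattern in Python. The cell means and all function names are hypothetical and chosen only for illustration; they are not data from either experiment.

```python
# Hypothetical cell means on a seven-point overall-evaluation scale.
# These numbers are invented for illustration; they are NOT results from the study.
ILLUSTRATIVE_MEANS = {
    ("black", "high"): 6.3,
    ("white", "high"): 6.0,
    ("black", "low"): 1.5,
    ("white", "low"): 1.8,
}


def race_gaps(means):
    """Return the Black-minus-White evaluation gap at each performance level."""
    return {
        level: means[("black", level)] - means[("white", level)]
        for level in ("high", "low")
    }


def matches_hypothesis_1(means):
    """True when the gaps show the predicted crossover (1a and 1b together)."""
    gaps = race_gaps(means)
    return gaps["high"] > 0 and gaps["low"] < 0


def interaction_contrast(means):
    """Difference between the two gaps; zero means no race x performance interaction."""
    gaps = race_gaps(means)
    return gaps["high"] - gaps["low"]


if __name__ == "__main__":
    print(race_gaps(ILLUSTRATIVE_MEANS))
    print(matches_hypothesis_1(ILLUSTRATIVE_MEANS))
```

A nonzero contrast alone is not enough to support the hypothesis; the two gaps must also have opposite signs in the predicted direction, which is what `matches_hypothesis_1` checks.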
METHOD
Subjects were 182 male undergraduate business students enrolled in introductory accounting and management courses. Less than 1% of the subjects were Black; the large majority of the sample was college age (only a few were over 25). Since both courses are required of all business students, it was felt that this sample would represent students planning business careers. Although students are likely to differ from present-day businessmen, many will eventually attain executive or supervisory positions. Thus, it is likely that the results will become more externally valid over time.

Design, Instruments, and Procedure
The experiment was a 3-by-3-by-3 factorial design (high, moderate, or poor objective performance level; black, no identification, or white race; conservative, stylish, or “hip”-radical dress and demeanor). At least five subjects fell in each cell. The between-subjects design was adopted to prevent the operation of social desirability biases possibly induced by the use of identical black and white stimulus persons (Feldman, 1972b), to enhance the realism of the task, and to avoid the problem of range effects common to repeated-measures designs (Poulton, 1973).

Stimulus materials. Subjects were given three forms containing the independent variable manipulations, the dependent measures, and an awareness questionnaire, respectively. The independent variables were manipulated by a photocopy of a bogus “Employee Behavior Report Form,” which contained spaces for the employee’s name (obviously removed), employee number, age (20 in all conditions), sex (all male), and physical handicaps (none). In addition, an area on the form was set off by asterisks and marked “For Affirmative Action
Office Use Only.” In this box, information was presented as to the stimulus person’s race (Black, White, or obliterated), whether or not the employee was in the “Hardcore Unemployed Hiring Program” (always yes), and whether or not this was a trial hiring (always yes). This manipulation was designed to draw attention to the employee’s race without making the manipulation extremely obvious. The form was checked at the top as a “merit rating” of “six months progress.” Employees were described as supermarket cashiers.

Dress style and performance manipulations were contained in a two-page, eight-category “Employee Behavior Report,” supposedly completed by the store manager. The first category manipulated dress and behavior, and contained descriptions of clothing and speech patterns. It was labelled “personal appearance and demeanor.” The descriptions themselves were adapted from a multidimensional scaling by Davidson (1972) of Blacks’ and Whites’ perceptions of black people’s dress and behavior. Since the sample was almost exclusively White, and since generalizability to white populations was desired, only the Whites’ solutions were used to generate descriptions. The descriptions were:
Conservative: Wears hair short, clean shaven. Wears white shirt, other ordinary clothing. Uses grammatically correct English, no slang expressions or gestures.

Stylish: Wears hair moderately long; has moderately long mustache. Wears stylish but not extreme clothing: flared slacks, boots, colored or patterned shirts, etc. Uses some slang expressions, uses “peace” sign to friends.

Hip: Wears hair quite long; has long “Fu Manchu” mustache. Wears extremely stylish clothing, somewhat “hippie” looking. Uses a lot of slang expressions, uses raised-fist sign to friends.
Performance levels were manipulated by using Fogli, Hulin, and Blood’s (1971) seven scaled categories of cashier behavior. The poor performance condition used two statements in each category from the lowest end of the scale, with an overall mean of 1.77 on a seven-point scale. The moderate performance condition used two statements from the middle of the distribution in each category, with an overall mean of 3.62, while the high-performance condition used two statements from the high end, with a mean of 6.05.
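The three manipulations above cross to give the 27 cells of the 3-by-3-by-3 design, each of which needs at least five subjects. The sketch below, in Python, enumerates the cells and builds a randomly ordered stack of questionnaires that meets this constraint. The function names, the fixed seed, and the rule of spreading surplus questionnaires as evenly as possible are assumptions made for illustration; the article states only the per-cell minimum.

```python
import itertools
import random
from collections import Counter

PERFORMANCE = ("poor", "moderate", "high")
RACE = ("black", "no identification", "white")
DRESS = ("conservative", "stylish", "hip")


def build_stack(n_subjects, min_per_cell=5, seed=0):
    """Return a shuffled list of (performance, race, dress) questionnaire types.

    Every cell appears at least `min_per_cell` times. Spreading the surplus
    evenly across cells is an assumption of this sketch, not a documented step.
    """
    cells = list(itertools.product(PERFORMANCE, RACE, DRESS))
    required = min_per_cell * len(cells)
    if n_subjects < required:
        raise ValueError(f"need at least {required} subjects")
    rng = random.Random(seed)
    full_rounds, remainder = divmod(n_subjects - required, len(cells))
    stack = cells * (min_per_cell + full_rounds)
    stack += rng.sample(cells, remainder)  # leftover copies go to distinct cells
    rng.shuffle(stack)
    return stack


def cell_counts(stack):
    """Count how many questionnaires of each type the stack contains."""
    return Counter(stack)


if __name__ == "__main__":
    counts = cell_counts(build_stack(182))
    print(len(counts), min(counts.values()), max(counts.values()))
```

With 182 questionnaires and 27 cells, this allocation gives every cell six or seven copies, which satisfies the stated minimum of five.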
This manipulation was chosen because: (1) the general public is familiar with the job of supermarket checker; (2) the scaled behavior represented a means of manipulating performance in an exact yet experimentally unobtrusive way; and (3) using behavior descriptions allowed a plausible rationale for the collection of evaluative ratings. It should be noted that the “Behavior Report Form” was completely nonevaluative, using no phrases like “good attendance” or “pleasant appearance.”

Dependent variable measures. Subjects rated the bogus employees on an “Employee Behavior Evaluation Form.” Ratings were made for each of the eight categories, including personal appearance and demeanor, on a seven-point scale ranging from exceptional to extremely poor. An “overall evaluation” was also made on this scale. Each employee was also rated on a growth future scale, which had five steps. These were: terminated immediately (= 1); given three months probation; retained at present level; retained with increase in pay and considered for promotion; promoted at first opportunity (= 5). The format of the scale itself was loosely adapted from Tiffin and McCormick (1965, p. 228).

Awareness questionnaire. This questionnaire contained several open-ended questions designed to tap subjects’ perceptions of the purposes and hypotheses of the study, and intentions to confirm or invalidate them. A subject was scored “aware” if “bias in employee evaluation,” “race differences in performance evaluation,” “prejudice,” or similar responses were present.

Procedure. Subjects were contacted in their classrooms. The experimenters were introduced as “members of the Management Department who are conducting a study of employee evaluation.” The experimenters then distributed the materials and reviewed the written introduction to the study.
This presented the study as an attempt to develop a “fair, reliable, and objective” evaluation procedure for the Pickwick Corporation, owners of a discount supermarket chain in the western U.S. The corporation was said to be looking for knowledgeable but disinterested people to rate employees, so that ratings done by actual managers could be evaluated for possible bias. The entire study was said to be part of a larger Pickwick effort to “eliminate all forms of discrimination in hiring, placement and promotion.” The use of male checkers was explained as part of this program.
Subjects were instructed to read the “Behavior Report Form” carefully several times, and then complete their employee ratings. They then filled out the awareness questionnaire, presented as a device to study people’s evaluative strategies. Questionnaires were distributed randomly to each class, subject to the constraint that at least five subjects appear in each cell of the design. To accomplish this, appropriate numbers of each of the 27 different questionnaires were stacked randomly and distributed to the class. After the data had been collected, subjects were asked to write their names and mailing addresses on a separate sheet of paper if they wished to receive a report of the results. Letters explaining the results of the study and the necessity for deception were sent after analysis and interpretation of the data had been completed.

RESULTS

Internal Validity
Because knowledge of the true purpose of the study might have produced social desirability bias, all awareness questionnaires were examined prior to analysis. Only three subjects showed any awareness of the hypotheses, as judged by their responses to the question “Write down, as clearly as you can, what you thought was the purpose of each part of the study.” They were retained in the analysis. In fact, many students spontaneously offered suggestions for “motivating” stimulus persons in the low and moderate performance conditions, commented on the progressiveness of the Pickwick Corporation, or recounted their own work experiences. Students in the good performance condition commented that they were “glad I didn’t work with this guy - he sure would have made me look bad.” Others in the poor performance condition approved of the company’s forbearance in retaining such an obviously “unmotivated” employee, in order to “give him a chance.” In short, it appears that the manipulation was successful and involving.

Because the ratings of the seven performance categories, overall evaluation, and “growth future” were highly intercorrelated (mean r = .90, range = .84 to .95), only the 3-by-3-by-3 analysis of variance on overall evaluation is reported in Table 1. Neither hypothesis was strongly supported, since the predicted race by performance and race by dress by performance interactions
TABLE 1
Analysis of Variance for Experiment 1: Overall Evaluation

Source             df    Mean Square    F            Omega-Squared
Race (R)           2     .94            1.52
Dress style (D)    2     1.83           2.98**
Performance (P)    2     291.53         475.14***    .83
R x D              4
R x P              4     .98            1.59
D x P              4     1.36           2.21*        .01
R x D x P          8     .90            1.47

Note: All figures rounded to two decimals. The omega-squared estimate of percentage of variance (Hays, 1963) has been rounded to the nearest 1%. The between-subjects error term has 155 df. Due to unequal cell sizes, the unweighted means method (Hays, 1963) has been used.
* .05 < p < .10
** .05 < p < .06
*** p < .0001
were not significant. By far the strongest effect is performance, controlling 83% of the variance. Analyses of the effect by the Scheffé test (Hays, 1963) show that all differences between performance levels are significant and in the expected direction (low, X = 1.67; moderate, X = 3.76; high, X = 6.07). The marginally significant dress style main effect is due to the slightly higher rating given conservative stimuli (conservative, X = 4.03; stylish, X = 3.74; hip, X = 3.72). The ratings of conservative stimuli differ from those of both the stylish and hip stimuli at p < .05 by Scheffé test. The dress by performance interaction, presented in Table 2, is interesting despite its marginal significance level, since it suggests the use of a configural performance evaluation process. Conservative stimuli are rated highest (p < .05) at poor and moderate performance
TABLE 2
Dress x Performance Interaction of Experiment 1: Overall Evaluation
levels. This result is consistent with Weiner and Kukla (1970), if conservative dress is assumed to denote stronger motivation than either stylish or hip dress.

DISCUSSION
The major hypothesis, that low-performing Blacks would be evaluated lower than comparable Whites, while high-performing Blacks would be rated higher than their white counterparts, was not supported. As usual when this is the case, a number of potential explanations exist. Bias in the subject population is one possibility. College students may not possess the negative stereotype of Blacks which is a prerequisite of the original hypothesis. However, Karlins, Coffman, and Walters (1969), Feldman (1972a), and Feldman and Hilterman (1975) did find that college students were willing to attribute stereotypes to Blacks, which would argue against such an interpretation.

A second plausible explanation is that the negative results are a product of describing all stimulus persons as hardcore unemployed on the Behavior Report Form. If the so-called “racial” stereotype is in fact primarily a stereotype of the lower class, as Feldman (1972a; Feldman & Hilterman, 1975) has speculated, the hypothesized confirmation and disconfirmation effects would be equal for the black, white, and no-identification conditions, and the expected differences would not occur.
A third possibility is that, since the program was presented as an attempt to eliminate discrimination, responses to Blacks were less subject to bias than otherwise might have been the case (Wollowick, Greenwood, and MacNamara, 1969). The awareness questionnaires do not confirm this explanation, but the possibility remains.

A fourth hypothesis is that the extreme differences in performance washed out any effects of race or dress. However, tests of the variance within the good and bad performance conditions against that of the moderate performance condition show no significant differences, which makes this explanation relatively untenable.

Of course, the most straightforward interpretation is that the theoretical hypothesis is incorrect. Gollob, Rossman, and Abelson (1973) have, however, supported a similar (though more elegantly stated) hypothesis, strengthening a methodological explanation of the negative findings. In order to assess the validity of two of these alternative explanations, Experiment 2 was conducted.

EXPERIMENT 2
The second experiment was intended to ask two questions: whether perceived social class, and not race, is the essential variable in the operation of the hypothetical stereotype confirmation-disconfirmation effects, and whether or not the predicted effect depends on the possession by the subject of a negative racial and/or economic class stereotype.

METHOD
Subjects were 78 male undergraduate business students enrolled in an introductory management course. The age and race distribution in this sample was virtually identical to that in Experiment 1. The same comments made about the former sample naturally apply.

Design, Instruments, and Procedure
The experiment was a 2-by-2-by-2 between-subjects factorial design (good vs. poor performance; hardcore unemployed vs. normal
hire; black vs. white). Nine or more subjects were included in each of the eight cells of the design.

Stimulus materials and dependent variables. As before, subjects were given three forms. The “Employee Behavior Report Form” was identical to that used in Experiment 1, except that the personal appearance and demeanor section was eliminated. Stimulus person’s race was manipulated as before. Additionally, employment category was manipulated by changing the information in the “Affirmative Action” box. For the hardcore unemployed condition, the box was identical to that of Experiment 1. For the normal hire condition, the “Hardcore Unemployed Hiring Program” and “Trial Hiring” boxes were checked “no.” Good and poor performance manipulations were identical to those in Experiment 1. Dependent variable measures were also identical to those in Experiment 1, except that the personal appearance and demeanor category was eliminated.

Stereotype measure. To assess the degree of racial and economic class stereotyping, a disguised stereotype measure was developed. Ten questions were intended to reflect racial stereotypes while 10 others were concerned with economic-class stereotypes. Table 3 presents these items. The 20 questions were randomly placed among 40 filler items, such as “Most unions are now more concerned with working conditions than with wages.” The 60 items were presented in a true-false format as a “Business Information Test.” Pretest subjects (44 paid male volunteers recruited from the entire campus) responded to this “test,” along with the Crowne-Marlowe Social Desirability Scale (1964) and Semantic Differential evaluative scale ratings of hardcore unemployed men, working-class men, black men, and white men. The 20 stereotype items were subjected to principal components factor analysis. Two factors, accounting for 27% of the common variance, were extracted and rotated to a varimax criterion.
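The article does not say how the varimax rotation was computed. As a rough illustration of what rotating two factors to a varimax criterion involves, the sketch below applies Kaiser's closed-form rotation angle for a single pair of factors, using the raw (unnormalized) criterion. The loadings in `UNROTATED` are hypothetical, the function names are invented for this example, and the principal components extraction step is not shown.

```python
import math


def varimax_two_factors(loadings):
    """Rotate (factor 1, factor 2) loading pairs to the raw varimax criterion.

    Uses the closed-form angle for one pair of factors, so no iteration is
    needed when only two factors are retained.
    """
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    angle = 0.25 * math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Orthogonal rotation: each item's communality (x^2 + y^2) is unchanged.
    return [(x * cos_a + y * sin_a, -x * sin_a + y * cos_a) for x, y in loadings]


def varimax_criterion(loadings):
    """Sum over both factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for k in range(2):
        squares = [row[k] ** 2 for row in loadings]
        total += sum(s * s for s in squares) / p - (sum(squares) / p) ** 2
    return total


# Hypothetical unrotated loadings for four items (not the study's data).
UNROTATED = [(0.5, 0.5), (0.6, 0.4), (0.5, -0.5), (0.4, -0.6)]

if __name__ == "__main__":
    rotated = varimax_two_factors(UNROTATED)
    print(varimax_criterion(UNROTATED), varimax_criterion(rotated))
```

Because the rotation is orthogonal, it changes which factor each item loads on without changing how much of each item's variance the two factors jointly explain.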
Factor 1 combined items pertinent to both race and economic class, with positively worded items loading negatively. It was labelled “lack of motivation and ability.” Items loading highly on Factor 2 were positively worded statements about both Blacks and hardcore unemployed, and so the factor was labelled “favorable racial/economic class stereotype.” Items loading more than .40 were summed with unit weights to generate individual scores. Negatively loading items
TABLE 3
Rotated Factor Matrix for the Disguised Stereotype Measure

Item
1. Economically disadvantaged workers often value an opportunity, and work hard to get ahead.
2. Poor people often do not dress properly for retail store work.
3. Hardcore unemployed training programs often fail due to a lack of motivation on the part of the participants.
4. White workers often "get away" with things for which black workers would be disciplined.
5. Black clerks are more courteous to customers than white clerks.
6. Often, black workers are not as attentive to rules as white workers.
7. Black people tend to dress more attractively than whites at the same economic level.
8. Poor people generally lack the skills necessary to be a salesclerk.
9. Middle-class young people generally do worse as retail clerks than poor people, because they do not consider the job important.
10. Black workers tend to be absent more than whites.
11. People from "hardcore unemployed" backgrounds generally do poorly in public contact jobs.
12. Whites are better at handling money than blacks.
13. Black employees are more highly motivated than whites to do a good job.
14. Economically disadvantaged employees usually learn new jobs as quickly as anyone else.
15. Black people are generally less satisfactory workers than whites.
16. People with backgrounds of poverty are very hard workers.
17. White retail clerks tend to be evaluated more highly by customers than blacks.
were reverse-scored. Table 3 presents the rotated factor matrix and indicates which items were included in the final stereotype measures.

The Crowne-Marlowe Social Desirability Scale is an individual difference measure of the tendency to give “good” responses; that is, to answer questions in a manner designed to gain the approval of others. In the current context, a high correlation of either Factor 1 or 2 with the scale would indicate that answers were based on the subject’s impressions of what answers “should” be given, and that the filler items did not successfully disguise the intent of the “Business Information Test.” Since neither factor was significantly correlated with the social desirability scale (r = -.28 and .15, respectively), it is unlikely that responses were strongly influenced by notions of propriety.

Correlations of Factors 1 and 2 with the Semantic Differential evaluations of hardcore unemployed, black men, working-class men, and white men should likewise be low, since beliefs are different from evaluations. However, they should be in appropriate directions, as particular beliefs are often associated with attitudes. Factor 1 correlated -.31 (p < .05), -.21 (NS), -.11 (NS), and .10 (NS) with the respective evaluations, which is consistent with the logic above. Factor 2 correlated .20, -.08, .03, and -.02 (all NS) with the same evaluations. These correlations indicate that those who believe
somewhat negative things about Blacks and the hardcore unemployed (Factor 1) have a slight tendency to evaluate them negatively, as would be expected. Favorable beliefs (as reflected in Factor 2) are associated with slightly favorable ratings of the hardcore unemployed. Though this relationship does not reach significance, it is in the appropriate direction. Finally, Factors 1 and 2 themselves are correlated -.02, indicating they represent separate belief domains. This independence is not assured by the orthogonal rotation, since the factor loadings of all items were not used to generate individual scores. In short, while not ideal, Factor 1 seems to possess adequate convergent and discriminant validity for the present purpose. It is not greatly influenced by social desirability, the items load acceptably on the factors, and it relates to other constructs in a reasonable way, though not as strongly as could be desired.

Procedure. Subjects were contacted during normal class hours. The same rationale was presented as had been used in Experiment 1. Instructions were identical, with the exception of explaining the Business Information Test as an attempt to measure knowledge of current business issues and practices, in order to assess the effect of such knowledge on performance ratings. Questionnaires were distributed as before, subject to the provision that at least eight subjects fell in each of the eight design cells. No awareness questionnaires were used, since so few people were judged aware in the previous study. Informal observations and comments about employee performance and company policy, similar to those of the subjects in Experiment 1, occurred. No spontaneous comments denoting awareness of the true purpose of the study were noted by the experimenters. After data analysis, the class was debriefed as a whole, and the results and theoretical orientation of the study were explained.

RESULTS
The results of the race by employment category by performance analysis of variance on overall evaluation are presented in Table 4. As before, the performance main effect is by far the strongest. The employment category by performance interaction is also significant, but inspection of the cell means shows that they do not support the
TABLE 4
Analysis of Variance for Experiment 2: Overall Evaluation

Source                     df    Mean Square    F            Omega-Squared
Race (R)                   1     .77            1.58
Employment category (E)    1
Performance (P)            1     444.09         904.63***    .92
R x E                      1     2.30           4.68*
R x P                      1
E x P                      1     2.45           4.99*
R x E x P                  1     .01

* p < .05
*** p < .0001
hypothesis that poor performing hardcore employees will be rated lower than others, and good hardcore performers, higher. In fact, the opposite seems to be the case. As Table 5 shows, good performing nonhardcore employees are rated significantly higher than the good performing hardcore (p < .05 by Sheffe test), while the poor performers show a nonsignificant reversal of this relationship. The unexpected race by employmen? category interaction a slight degree of pro-black bias in the hardcore ratings. Blacks are rated significantly (p < .05) higher than hardcore The nonhardcore category shows a nonsignificant reversal. presents the means.
indicates Hardcore Whites. Table 6
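As a consistency check on the F ratios in Table 4, recall that every effect in a fixed-effects analysis of variance is tested against the same error mean square, so that F = MS(effect) / MS(error). The Python sketch below is illustrative only. It assumes that the legible Table 4 entries pair up as E x P (mean square 2.45, F 4.99), R x E (mean square 2.30), and Performance (mean square 444.09); the function names are hypothetical and are not part of the original analysis. It backs out the implied error mean square from the E x P row and recomputes the other two F ratios.

```python
def implied_error_ms(ms_effect, f_ratio):
    """Error mean square implied by one effect's mean square and F ratio."""
    return ms_effect / f_ratio


def f_from_ms(ms_effect, ms_error):
    """F ratio for an effect tested against a given error mean square."""
    return ms_effect / ms_error


# Assumed pairing of Table 4 entries (see the lead-in above).
ms_error = implied_error_ms(2.45, 4.99)   # from the E x P row
f_rxe = f_from_ms(2.30, ms_error)         # R x E interaction
f_perf = f_from_ms(444.09, ms_error)      # Performance main effect

print(f"implied error MS: {ms_error:.3f}")
print(f"R x E F ratio:    {f_rxe:.2f}")
print(f"Performance F:    {f_perf:.1f}")
```

Under these assumptions the implied error mean square is about 0.49, the recomputed R x E ratio rounds to 4.68, and the performance ratio comes out a little above 904. Because the printed mean squares are rounded to two decimals, agreement is expected only to within rounding.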
TABLE 5
Employment Category x Performance Interaction of Experiment 2: Overall Evaluation

                   Employment Category
Performance     Nonhardcore     Hardcore
Good               6.93a          6.34a
Poor               1.19           1.50

Note: All figures rounded to two decimals. An a superscript indicates a significant (p < .05) Employment Category difference within Performance levels.

TABLE 6
Race x Employment Category Interaction of Experiment 2: Overall Evaluation

                         Race
                   Black       White

Note: All figures rounded to two decimals. An a superscript indicates a significant (p < .05 by Scheffé test) Employment Category difference within Race levels.

If, in fact, the hypothesized stereotype confirmation-disconfirmation effects depend on the degree to which a negative stereotype is held, a negative correlation between scores on Factor 1, “lack of motivation and ability,” and overall evaluation would be expected for ratings of black and/or hardcore unemployed, good-performing stimulus persons. A positive correlation would be expected for ratings of the same stimulus persons under poor performance conditions. To test this hypothesis, the procedures for chi-square analysis of correlation coefficients in factorial designs (recommended by Jones, 1968) were extended to the three-variable, unequal-N case and applied to the within-cell correlations of Factor 1 and overall evaluation. Although the cell N’s are too small to yield useful point estimates of correlations, main and interaction effects in the predicted direction would indicate differences in the slopes of regression lines, a result which can be interpreted with more confidence. However, as Table 7 shows, no significant main or interaction effects were observed.

TABLE 7
Chi-square Analysis of Factor 1/Overall Evaluation Within-cell Correlations of Experiment 2

Source                       Chi-square
Race (R)                        .01
Employment Category (E)        1.95
Performance (P)                 .13

GENERAL DISCUSSION

The hypothesis of stereotype contrast-confirmation effects, though supported elsewhere (Gollob et al., 1973), was not found in either the racial or social class context. The results of Experiments 1 and 2 strongly suggest that bias in performance evaluation is not terribly
extreme. This conclusion must be tempered, however, by the recognition that these studies were performed with college students in a strongly liberal atmosphere, under conditions such that no real benefit or harm could come to them, or the persons being evaluated, as a result of their ratings. Under actual rating conditions, the results might well be different (Wollowick et al., 1969).

The results are not devoid of theoretical and practical implications. The effects of dress and the dress by performance interaction found in Experiment 1, though weak, indicate the presence of extraneous influences on evaluation in the face of extreme performance differences. Conservative dress may denote a “hardworking” employee to evaluators, and result in differential evaluations at all but the highest performance levels. This may cause the perception of racial bias, since young Blacks (who dislike conservative dress and behavior; Davidson, 1972) may see the organization as rewarding Whites and “Uncle Toms.”

This reasoning may appear to be unjustified, in the face of half-point differences in mean ratings. It is important to remember that, under some “merit” systems, a half-point difference on similar rating scales can make several hundred dollars’ difference in annual salary, or can even affect continued employment. It would be foolish to assume that employees are unaware of such contingencies.

The results of Experiment 2 may likewise shed some light on the perceptions of discrimination by both Blacks and Whites. If a company’s hardcore hiring/training program is mostly composed of black employees, the significantly higher rating of good-performing nonhardcore employees (Table 5) may be interpreted as anti-black bias. Similarly, the significantly higher evaluations given to poor-performing black hardcore workers (Table 6) may indicate pro-black bias to Whites. Kelley’s (1971) discussion of causal attribution would support these arguments.

The differences discussed above are small, but they are also significant using a test noted for its conservatism (Hays, 1963). It is this significance, rather than the size of these differences, which makes them worth discussing, as they are the differences most likely to be replicated in future research. Further, the differences noted here are most likely to be found in actual performance evaluation settings, if the present studies have any external validity whatsoever.

These data suggest that a configural model of the performance-evaluation process might be worth further investigation. Terborg and Ilgen (1975) have, for example, found that some aspects of antifemale discrimination can be explained by stereotyping and attribution processes. It would be surprising if some of the variance in performance evaluation of other currently disadvantaged groups were not influenced by similar processes. The occurrence of even weak interaction effects in the presence of very large performance differences indicates that extraneous sources of information do not combine in a simple additive fashion with performance to influence performance ratings. This and other data also suggest that attributional processes and norms of equity may be elicited by the rating task. In retrospect, the collection of data on causal attributions for performance would have been valuable in interpreting these results. Future studies should certainly include such questions.
Additionally, such studies should manipulate actual, observable behavior (perhaps by the use of videotapes) as well as behavioral reports; systematically manipulate various dimensions of behavior to assess their separate influences on performance evaluation; use field studies in order to increase the generality of results over measures, settings, and populations; and deal with the relationships between performance evaluations and actual consequences to the worker, such as raises, promotions, and other job events.

The usefulness of the present study is limited somewhat by the absence of data on causal attributions. The alternative explanations proposed in the discussion of Experiment 1 are admittedly speculative, and for each there exist reasons for doubting its validity. Only more precise research in laboratory and field settings will reveal which, if any, of the proposed alternatives is valid.

On the other hand, the instruments used are very similar to those in daily use in many kinds of organizations. The behavioral descriptions
used to manipulate performance levels are known to be valid (Fogli et al., 1971). The experimental procedures were very similar to the widely used “In-Basket” technique (see Terborg & Ilgen, 1975, and Campbell, Dunnette, Lawler, & Weick, 1970), which has been found very useful in research of this sort. Available evidence indicates that the respondents were actively involved and interested in the task, and neither suspected nor resented the mild deception. The stereotype measure was free from obvious contamination by social desirability or attitude, even though it was probably not an ideal measure of all aspects of race/social class stereotyping.

Perhaps the major contribution of this study is heuristic. These two studies raise the possibility that performance evaluation can be included in the same theoretical system which explains other aspects of person perception. The possibility also exists that, by better understanding the evaluation process, it may be improved with benefit to both individuals and organizations.
NOTES

1. Initial development of the independent variable manipulations and dependent variable measures was done by the first author and David E. Weldon while both were at the University of Illinois, Urbana, under the support of Social and Rehabilitation Service Grant # 15-P-55175/5 (H. C. Triandis, principal investigator). The study reported here was supported by a grant from the Division of Sponsored Research, University of Florida, to the first author.

2. Requests for reprints should be addressed to Jack M. Feldman, Department of Management, University of Florida, Gainesville, Florida 32611. Robert J. Hilterman is now with the Palm Beach County School Board, West Palm Beach, Florida. The authors express appreciation to Harry C. Triandis and Joe Reitz for their comments on an earlier version of this paper.

3. Due to the relatively small number of female subjects, sex comparisons are not included in the present data.

4. For convenience, this dimension will be referred to as “dress style” throughout the rest of this paper.

5. These categories are conscientiousness, knowledge and judgment, skill in human relations, skill in register operation, skill in bagging, organizational ability, and skill in monetary transactions.

6. The factors were: race (Black, no-identification, White), dress (conservative, stylish, hip), and performance (bad, moderate, good). The high intercorrelations of performance category, overall evaluation, and “growth future” are most likely due to the fact that subjects saw only good, moderate, or poor ratings on all categories, and naturally responded in appropriate directions on the overall evaluation and growth future scales. Response intercorrelation simply reflects the stimulus intercorrelation (of 1.0) built into the experimental materials. The original article (Fogli et al., 1971) supports the seven categories as valid aspects of actual cashier performance.

7. Female students were given an alternative task.

8. The correlation matrix (with 1’s in the diagonal) used as input to the factor analysis was composed of phi coefficients, regarded as equivalent to product-moment correlations on dichotomous data when dealing with ordered categories (i.e., agreement-disagreement). See Hays, 1963 (pp. 604-605), for further discussion. Two factors were selected on the basis of successive differences in eigenvalues for the unrotated solution; the plot of these differences showed a “break” at two or three factors. A three-factor rotation was also performed, but the third factor was not meaningful in terms of the study’s objectives, as it was composed of items referring to public-contact jobs. More details are available from the first author.

9. This is not possible with the current data since performance in all categories has been deliberately set at high, moderate, or low levels. Proper assessment requires a factorial design or measurement in a natural setting.
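Note 8 treats phi coefficients as product-moment correlations computed on dichotomous (agreement-disagreement) items. The equivalence can be illustrated with a minimal Python sketch. The response vectors below are made up for illustration and are not data from the study, and the function names are hypothetical. The sketch computes phi from the 2 x 2 table of joint responses and the Pearson correlation from the same 0/1 scores.

```python
import math


def phi_coefficient(x, y):
    """Phi from the 2 x 2 table of two dichotomous (0/1) items."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both agree
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both disagree
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(x, y):
    """Product-moment correlation computed directly from the scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical agree (1) / disagree (0) responses of eight subjects to two items.
item_1 = [1, 1, 1, 0, 0, 0, 1, 0]
item_2 = [1, 1, 0, 0, 0, 1, 1, 0]

phi = phi_coefficient(item_1, item_2)
r = pearson_r(item_1, item_2)
print(phi, r)
```

For these made-up responses both functions return 0.5, which is why a matrix of phi coefficients can be submitted to a factor analysis in the same way as a matrix of product-moment correlations.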
REFERENCES

ALFERT, E. Are social stereotypes vanishing: A study of non-college population. Journal of Social Issues, 1972, 28, 89-100.

BASS, A. R., & TURNER, J. N. Ethnic group differences in relationships among criteria of job performance. Journal of Applied Psychology, 1973, 57, 101-109.

CAMPBELL, D. T. Stereotypes and the perception of group differences. American Psychologist, 1967, 22, 817-829.

CAMPBELL, J. P., DUNNETTE, M. D., LAWLER, E. E. III, & WEICK, K. E. Managerial behavior, performance, and effectiveness. New York: McGraw-Hill, 1970.

CROWNE, D., & MARLOWE, D. The approval motive. New York: Wiley, 1964.

DAVIDSON, A. R. Multidimensional scaling of black stimulus characteristics by black and white subjects. Mimeo, University of Illinois, Department of Psychology, Champaign, Illinois (1972).

DAVIDSON, A. R., & FELDMAN, J. M. An attribution theory analysis of interracial conflict in job settings. Report No. 11, Social and Rehabilitation Service No. 15-P-55175/5. Champaign, Illinois: Department of Psychology, University of Illinois, 1971.

DAVIDSON, L. M. The process of employing the disadvantaged. Unpublished doctoral dissertation, M.I.T., 1973.

DION, K., BERSCHEID, E., & WALSTER, E. What is beautiful is good. Journal of Personality and Social Psychology, 1972, 23, 285-290.

FELDMAN, J. M. Stimulus characteristics and subject prejudice as determinants of stereotype attribution. Journal of Personality and Social Psychology, 1972, 21, 333-340. (a)

FELDMAN, J. M. Race and level of abstraction of disagreements as determinants of evaluation and behavioral intentions. Report No. 12, Social and Rehabilitation Service No. 15-P-55175/5. Champaign, Illinois: Department of Psychology, University of Illinois, 1972. (b)

FELDMAN, J. M., & HILTERMAN, R. J. Stereotype attribution revisited: The role of stimulus characteristics, racial attitude, and cognitive differentiation. Journal of Personality and Social Psychology, 1975, 31, 1177-1188.

FOGLI, L., HULIN, C. L., & BLOOD, M. R. Development of first-level behavioral job criteria. Journal of Applied Psychology, 1971, 55, 3-8.

GOLLOB, H. F., ROSSMAN, B. B., & ABELSON, R. P. Social inference as a function of the number of instances and consistency of information presented. Journal of Personality and Social Psychology, 1973, 27, 19-33.

HAYS, W. L. Statistics for psychologists. New York: Holt, Rinehart, and Winston, 1963.

KARLINS, M., COFFMAN, T. L., & WALTERS, G. On the fading of social stereotypes: Studies in three generations of college students. Journal of Personality and Social Psychology, 1969, 13, 1-16.

KELLEY, H. H. Attribution in social interaction. In Jones, E. E., Kanouse, D. E., Kelley, H. H., Nisbett, R. E., Valins, S., and Weiner, B. (Eds.), Attribution: Perceiving the causes of behavior. Morristown, N.J.: General Learning Corporation, 1971.

LANZETTA, J. T., & HANNAH, T. E. Reinforcing behavior of “naive” trainers. Journal of Personality and Social Psychology, 1969, 11, 245-252.

LEVENTHAL, G. S. Equity and reward allocation in social relationships. Research proposal, National Science Foundation, 1971 (cited in Bass & Turner, 1973).

LEVENTHAL, G. S., & MICHAELS, J. W. Locus of cause and equity motivation as determinants of reward allocation. Journal of Personality and Social Psychology, 1971, 17, 229-235.

POULTON, E. C. Unwanted range effects from using within-subject experimental designs. Psychological Bulletin, 1973, 80, 113-121.

ROTTER, G. S., & ROTTER, N. Race, work performance, and merit rating: An experimental evaluation. Paper presented at the Convention of the Eastern Psychological Association, Philadelphia, April 1969 (cited in Bass & Turner, 1973).

TERBORG, J. R., & ILGEN, D. R. A theoretical approach to sex discrimination in traditionally masculine occupations. Organizational Behavior and Human Performance, 1975, 13, 352-376.

TIFFIN, J., & McCORMICK, E. J. Industrial Psychology (5th ed). Englewood Cliffs, N.J.: Prentice-Hall, 1965.

TRIANDIS, H. C., LOH, W. D., & LEVIN, L. A. Race, status, quality of spoken English, and opinions about civil rights as determinants of interpersonal attitudes. Journal of Personality and Social Psychology, 1966, 3, 468-472.

TRIANDIS, H. C., WELDON, D. E., & FELDMAN, J. M. Black and white hardcore and middle-class subjective cultures: A cross-validation. Report No. 14, Social and Rehabilitation Service No. 15-P-55175/5. Champaign, Illinois: Department of Psychology, University of Illinois, 1972.

WEINER, B., & KUKLA, A. An attributional analysis of achievement motivation. Journal of Personality and Social Psychology, 1970, 15, 1-20.

WOLLOWICK, H. B., GREENWOOD, J. M., & MacNAMARA, W. J. Psychological testing with a minority group population. APA Proceedings, 77th Annual Convention, 1969.
ABSTRACT TRANSLATIONS

ORIGENES DEL PREJUICIO EN LA EVALUACION DE LA EJECUCION: DOS EXPERIMENTOS

Las teorías de la atribución y los estereotipos predicen que los trabajadores negros con calificaciones bajas en pruebas de ejecución recibirán evaluaciones más bajas que el grupo equivalente de blancos, mientras que se encontrará el efecto inverso en el grupo superior. Se predijo también que el “estilo de vida y vestido” influirían la evaluación. En el primer experimento se pidió a estudiantes de comercio, del sexo masculino, evaluar a empleados ficticios que diferían en raza, vestimenta y ejecución. Los resultados apoyan poco la segunda hipótesis. Con un segundo experimento se investigó la posibilidad de que sea la clase social percibida y no la raza la variable relevante en el proceso de la estereotipización. Se midieron también los estereotipos a través de las diferencias individuales. A través de este estudio no se encontró evidencia para apoyar ninguna de las hipótesis, sin embargo los resultados mostraron su relevancia potencial a la teoría de las atribuciones, en especial el modelo configural de evaluación-ejecución. Se discuten también los resultados en términos de la discriminación percibida por el trabajador.

SOURCE DE BIAIS DANS L’EVALUATION DES PERFORMANCES: DEUX EXPERIENCES

Les théories de l’attribution et de la stéréotypie prédisaient que les travailleurs noirs ayant une faible performance recevraient des évaluations plus basses que les blancs correspondants, tandis qu’une haute performance aurait des effets inverses. Il était prédit également que “la manière de s’habiller et le style de vie” auraient une influence sur l’évaluation des performances. Des étudiants de commerce du sexe masculin, évaluant des employés fictifs qui différaient tant par la race, l’habillement, et le comportement décrit en relation à la performance, ne confirmaient pas significativement la seconde hypothèse. Une deuxième expérience avait pour objet d’explorer la possibilité que la classe sociale perçue plutôt que la race soit la variable pertinente dans les processus de confirmation-contraste de la stéréotypie. Les différences individuelles dans la stéréotypie ont été aussi mesurées. L’hypothèse particulière concernant les processus n’était pas non plus confirmée, en revanche les résultats confirmaient la pertinence éventuelle de la théorie de l’attribution d’un modèle configural de la performance-évaluation. Les résultats ont été également discutés en termes d’une discrimination perçue du travailleur.