Age, racial, and gender bias as a function of criterion specificity: A test of expert testimony


H. John Bernardin, Florida Atlantic University

H. W. Hennessey, Jr., University of Hawaii-Hilo

Joseph Peyrefitte, Florida Atlantic University

We examined a common expert witness theme in EEO cases: that rating bias in the form of ethnic, age, or gender differences in personnel decisions based on performance appraisals is moderated by criterion specificity or rating scale format. Few studies have investigated this issue, and their results do not support the position that more objective or specific assessment criteria will result in smaller differences between groups based on age, gender, or ethnic classification.

AGE, RACIAL AND GENDER BIAS AS A FUNCTION OF THE APPRAISAL SYSTEM

Direct all correspondence to: H. John Bernardin, Department of Management and International Business, Florida Atlantic University, Boca Raton, FL 33431. Human Resource Management Review, Volume 5, Number 1, 1995, pages 63-77. Copyright © 1995 by JAI Press Inc. ISSN: 1053-4822. All rights of reproduction in any form reserved.

Performance appraisal is the most heavily litigated personnel activity in equal employment opportunity law, involving decisions such as terminations, lay-offs, and promotions. For example, one of the most common EEO suits combining statute, personnel activity, and personnel decision is an Age Discrimination in Employment Act (ADEA) claim alleging that age was the "determinative factor" in a decision to terminate, while the organization maintains that the decision was based on performance (Barrett & Kernan 1987; Kane & Kane 1993). Title VII claims of race, gender, and national origin discrimination which concern personnel decisions based on performance appraisals are also very

common and on the increase since passage of the Civil Rights Act of 1991. Wrongful discharge lawsuits and claims related to the Americans with Disabilities Act, also on the increase, often involve testimony regarding the validity, accuracy, and/or bias in an appraisal system as well (Austin, Villanova, & Hindman in press). Our ongoing study of the litigation in these areas indicates the frequent use of expert witnesses who address the critical question of whether a particular performance appraisal system or system characteristic is more (or less) susceptible to bias so as to create a disadvantage for a particular person or class of persons who possess one or more protected class characteristics. For example, one designated expert testified on behalf of a plaintiff in an ADEA action that a particular performance appraisal instrument used by a company to make performance-based downsizing decisions was the type of appraisal form susceptible to various forms of rating bias, including age discrimination. Our study of court testimony in EEO complaints has revealed that this general theme, that the instrument may have contributed to the negative outcome for a plaintiff, is common. The implication for a jury, of course, is that if the company had only done the appraisal in another way, the preferred way as presented by the plaintiff's expert, the particular alleged age, race, or gender "bias" would not have resulted and thus, perhaps, the decision(s) deleterious to the plaintiff would not have been made (e.g., Wade v. Mississippi Cooperative Extension Service 1974). There is also an indication that such testimony has an impact on the outcome of EEO cases, and particularly on out-of-court settlements. Numerous articles reviewing court cases related to performance appraisal conclude that the characteristics of a performance appraisal system do have an impact on the outcome of a case (e.g., Ashe & McRae 1985; Austin et al. in press; Bernardin & Cascio 1988; Feild & Holley 1982; Kane & Kane 1993; Martin, Bartol, & Levine 1986; McEvoy & Beck-Dudley 1991; Ritchie & Lieb 1994). The type of rating format (e.g., trait versus behavioral), the number of raters involved, whether there was an appeal process available to the plaintiff, and whether a job analysis was done to develop the appraisal form are among the characteristics which have been cited as related to case outcomes (Bernardin & Beatty 1984; Burchett & DeMeuse 1985; Feild & Holley 1982; Thompson & Christianson 1984; Thompson & Thompson 1982). Perhaps the most common expert testimony concerns the appraisal format itself. Numerous articles either cite court cases in which this issue appears to have had an impact on the outcome of a case (e.g., Austin et al. in press) or make the argument that certain formats (e.g., behavioral or results-based) are susceptible to less bias (e.g., Ledvinka & Scarpello 1991). The general position taken in this writing is that the greater the criterion specificity (i.e., behaviors, results, or outcomes rather than traits), the greater the probability of a fair and accurate appraisal and thus the less (or no) age, race, or gender bias. Expert testimony is particularly significant since the Supreme Court ruling in Watson v. Ft. Worth Bank and Trust (1988). This unanimous decision allows the use of the "disparate impact" theory when personnel decisions are made by subjective employment practices such as performance appraisals (Bersoff


1988). The decision allows plaintiffs to present statistical evidence of discrimination using the "disparate impact" theory rather than requiring them to present evidence of deliberate or intentional bias against them based on their particular protected class characteristic (Sharf 1988a, 1988b). Clara Watson, for example, presented data showing white supervisors hired 3.5% of black applicants and 14.8% of white applicants. These same supervisors also rated blacks 10 points lower in their annual appraisals. Testimony about the appraisal system which led to the negative outcome for Watson indicated that the appraisal system was biased, thus implying that the deleterious decisions for Watson and other similarly situated blacks were related to the subjective appraisal system. The ability of plaintiffs to use the "disparate impact" theory instead of the "disparate treatment" theory places a relatively greater legal burden on the defendant (Ledvinka & Scarpello 1991). If the plaintiff establishes prima facie evidence of discrimination based on the statistical evidence, the defendant must show that the basis of the decision was "job-related," a far more onerous burden for the defendant compared to the burden in a "disparate treatment" case. Although the Court said in Watson that "employers are not required, even when defending standardized or objective tests, to introduce formal 'validation studies' showing that particular criteria predict on-the-job performance," there is no question that the burden in explaining the "job relatedness" of the personnel activity is greater than in disparate treatment cases, where the burden is on the plaintiff to prove whether the protected class characteristic was "determinative" in the decision (Lee 1990). The problem with expert testimony regarding appraisal characteristics is that we are aware of no body of research which clearly supports this expert view.
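As a side note on how statistical evidence of this kind is commonly screened (our illustration, not part of the Watson record): the EEOC's "four-fifths" guideline compares selection rates across groups, and a ratio below 0.80 is conventionally treated as prima facie evidence of disparate impact. A minimal sketch using the hiring figures reported above:

```python
# Hedged illustration: the EEOC "four-fifths" guideline compares the selection
# rate of the disadvantaged group to that of the advantaged group; a ratio
# below 0.80 is commonly treated as evidence of disparate impact.

def selection_rate_ratio(lower_rate: float, higher_rate: float) -> float:
    """Ratio of the lower selection rate to the higher one."""
    return lower_rate / higher_rate

# Watson's hiring figures as reported in the text: 3.5% of black applicants
# versus 14.8% of white applicants were hired.
ratio = selection_rate_ratio(0.035, 0.148)
print(round(ratio, 3))   # 0.236
print(ratio < 0.80)      # True: far below the four-fifths threshold
```

A ratio of roughly 0.24 is well under the conventional 0.80 cutoff, which is why such figures support a prima facie disparate impact showing.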
Implied in this testimony is the argument that the instrument or rating format is somehow related to (or even caused) the negative outcomes for protected class members. One would think that research relating, for example, ethnicity, age, or gender to appraisal results as a function of the appraisal format should thus be cited to support this expert position. In fact, there has been little research which has directly assessed the extent to which the characteristics of a rating instrument or the specificity of the criterion moderated the relationship between ratee race, gender, or age and appraisal results or personnel decisions (Schmitt & Noe 1986). In such cases, a jury is typically presented with the statistical evidence followed by the testimony of an expert who either implies or explicitly states that the statistical evidence resulted as a direct function of the particular appraisal system which was used to make decisions. For example, Ledvinka and Scarpello (1991) state that "subjectivity also encourages disparate impact" (p. 51), clearly implying that relatively more "objective" systems of appraisal will result in less statistical disparity. Expert testimony may also refer to the plaintiff's position that a "less discriminatory alternative practice does exist" (Ledvinka & Scarpello 1991, p. 54). If such an argument is persuasive, the plaintiff can prevail in a Title VII case even when the defendant presents unrefuted evidence of job relatedness (Arvey & Faley 1988). For "disparate treatment" cases, the burden is on the plaintiff to prove


"intentional" discrimination (Lee 1990). For these cases, then, expert testimony which establishes that particular appraisal characteristics are more likely to facilitate "intentional" discrimination would obviously be germane. The purpose of this article is to review the literature related to the issue of bias in performance appraisal to determine the extent to which expert testimony as described above can be justified by the research. We will investigate the impact of bias on performance ratings as a function of the type of appraisal system and system characteristics. Borrowing from the expert theme presented above, the hypothesis to be tested is that greater criterion specificity, as defined by an appraisal format or instrument, will result in less rating bias such that there would be relatively less (or no) deleterious impact (as defined in legal proceedings) against protected class members in personnel decisions. We will also examine the impact of other appraisal characteristics and determine whether the question of intentional discrimination has been studied in the context of particular appraisal characteristics.

RESEARCH RELATED TO RATING BIAS

Landy and Farr (1980) and Feldman (1981) called for research which would move away from the development and evaluation of rating scales toward research which studies the cognitive processes of raters. They argued for a shift in research focus since, at that point, only a modest amount of variance in ratings could be explained strictly by format. Their reviews found no relationship between rating format or criterion specificity and rating bias defined as race, gender, or age differences. Considerable research has since focused on the accuracy of rater judgments and the factors which may affect the accuracy of the judgment process (e.g., DeNisi, Cafferty, & Meglino 1984; DeNisi & Summers 1986; Ilgen, Barnes-Farrell, & McKellin 1993). Many researchers have concentrated on one of those factors, bias, and its effect on performance ratings. Investigations of the effects of rater and ratee age, gender, and race on ratings have been numerous (e.g., Cascio & Valenzi 1978; Cleveland & Landy 1981; Cleveland & Landy 1983; Dedrick & Dobbins 1991; Dipboye 1985; Ferris, Yates, Gilmore, & Rowland 1985; Nieva & Gutek 1980; Schmitt & Lappin 1980; Waldman & Avolio 1991). Several researchers, however, have discussed the inconsistencies in the literature regarding the effects of age, gender, and race on performance ratings (e.g., Kraiger & Ford 1985; McEvoy & Cascio 1989; Oppler, Campbell, Pulakos, & Borman 1992; Shore & Bleicken 1991; Pulakos, White, Oppler, & Borman 1989; Sackett & DuBois 1991; Waldman & Avolio 1986). As these researchers note, although dozens of studies have assessed variances in subgroup evaluations, some have found significant subgroup effects while others have found none. In addition, the contribution to explained variance in most studies which have found significant subgroup effects has been very small. Effect sizes also generally covary with the research design.
Studies which have found large and significant variances in ratings as a


function of some ratee characteristic such as gender or age have been primarily laboratory studies where the ratee performance specimens under study were very limited (e.g., Hamner, Kim, Baird, & Bigoness 1974; Rosen & Jerdee 1973; Rosen & Jerdee 1976a; Rosen & Jerdee 1976b). While there have been significant subgroup effects in field studies, the variance explanation in those studies has been relatively small or nonexistent (Bass & Turner 1973; Griffeth & Bedeian 1989; Mobley 1982; Schmidt & Johnson 1973; Schwab & Heneman 1978; Pulakos et al. 1989; Wendelken & Inn 1981; Wexley & Pulakos 1982; Zalesny & Kirsch 1989). Thus, findings in field studies either contradict or do not support the laboratory studies (Barrett & Morris 1993).

META-ANALYSIS ON PERFORMANCE DIFFERENCES

Several meta-analytic studies have been conducted to help explain the discrepancies in subgroup effects (Ford, Kraiger, & Schechtman 1986; Kraiger & Ford 1985; McEvoy & Cascio 1989; Sackett & DuBois 1991; Waldman & Avolio 1986). The general conclusion we can derive from these studies is that race and age effects explain small proportions of the variance in performance ratings. We could locate no meta-analytic studies which have assessed gender effects in performance evaluations. Kraiger and Ford (1985) concluded that a ratee race effect, although explaining only three to five percent of the variance, was consistent across the seventy-four studies they examined for both black and white raters. Both black and white raters gave significantly higher ratings to members of their own race. In contrast, Ford et al. (1986) found that whites were rated higher but that the performance of whites was also higher on "objective" performance indices (e.g., training and job knowledge tests, absenteeism, units produced). The Ford et al. (1986) study cannot be directly contrasted with Kraiger and Ford (1985), however, since the Ford et al. (1986) study was limited to primarily white raters. This comparison is possible, however, with the Sackett and DuBois (1991) study, which showed that whites received higher ratings from both black and white supervisors, even when controlling for performance differences. The age meta-analyses agreed with one another with respect to the relationship between age and performance. In their study of forty samples, Waldman and Avolio (1986) concluded that the belief that job performance declines with age could not be supported. McEvoy and Cascio (1989) concurred: their meta-analysis of 96 studies revealed that age and job performance were generally unrelated. An important outcome of the meta-analytic studies is the discovery of moderators of the subgroup effects on evaluations.
By removing the effects of moderators, researchers may demonstrate the reduction or elimination of subgroup effects from evaluations. Waldman and Avolio (1991) illustrated this phenomenon: once cognitive ability, education, and experience were controlled, ratee race effects were generally eliminated from their study. Ford et al. (1986), Kraiger and Ford (1985), Waldman and Avolio (1986), and


McEvoy and Cascio (1989) all addressed moderators as part of their meta-analyses. The moderators which were examined included study setting, subgroup composition of the workgroup, training, rating purpose, job type, and the type of rating criteria. Five moderators of ratee race effects were investigated in the Kraiger and Ford (1985) study: (1) study setting, (2) rater training, (3) type of rating, (4) rating purpose, and (5) composition of the workgroup. Only study setting and the composition of the workgroup moderated rater-ratee effects. Effect sizes were larger in field settings than in laboratory settings, while race effects declined as the percentage of blacks in the workgroup increased. Mean effect sizes were not significantly different for rater training, rating purpose, or rating format. Tests for these moderators, however, were limited to white raters and the 55 field studies in the meta-analysis. The small number of studies precluded the examination of moderator variables for black raters. The authors tested for moderators only in the field studies since they argued that training, purpose, and format represent organizational conditions rather than psychological processes that could occur in any setting. Thus, to investigate the effects of the moderators (except workgroup composition), the authors classified studies with dichotomous variables (e.g., training offered/not offered, behavior based/trait) for further analysis. Of the 55 field studies in their sample, only one study (viz., Brugnoli, Campion, & Basen 1979) specifically investigated the effects of rating format. Kraiger and Ford (1985) concluded that behaviorally based ratings were as prone to race effects as trait ratings. They noted that this finding is contrary to prevalent theory (Wherry & Bartlett 1982), although consistent with Landy and Farr (1980) and Feldman (1981) in that rating formats account for only limited variance in ratings.
With a subset of the studies used in the Kraiger and Ford (1985) meta-analysis, Ford et al. (1986) searched for differences in race effects according to rating criteria (objective performance indices vs. subjective ratings). Only those studies which reported at least one objective indicator and one subjective rating of performance were selected. The 44 different types of objective criteria found in the studies examined were classified into three categories: performance indicators (e.g., units produced, customer complaints), absenteeism, and cognitive criteria (training and job knowledge tests). The subjective ratings were either overall ratings of effectiveness or defined dimension ratings. The researchers found a small but significant effect size for the objective indices in addition to the subjective criteria. Whites were rated higher on subjective ratings, but they also scored higher on the objective indicators. Ford et al. also noted that "race effects tended to covary between objective and subjective measures; for example, studies with large (or small) race effect sizes for ratings tended to have a large (or small) effect size for the same sample of employees on the objective measure" (1986, p. 334). Thus, across all studies and criterion categories, the effect sizes for subjective and objective criteria were similar. Waldman and Avolio (1986) showed that age effects across studies varied according to the type of performance measure used and the job type. However,


their analysis was limited to only 40 samples from 13 different studies. Objective indicators (productivity indices such as units of output over a period of time) demonstrated performance increases as employees grew older; supervisory ratings, however, showed small declines as employees aged. Supervisory ratings were either an overall rating or a score based on a composite index of dimensions. In addition, performance ratings showed more positive relations with age for professionals as compared to non-professionals. They called for research which uses both objective and subjective indices simultaneously in order to reconcile these discrepancies, and perhaps determine age bias, since none of the studies in their meta-analysis had done so. In the second meta-analysis to address age effects on evaluations, McEvoy and Cascio (1989) came to conclusions opposite to those of Waldman and Avolio (1986). They found that the type of performance measure (supervisory ratings vs. productivity indices) and the job type (professional vs. nonprofessional) did not significantly moderate the relationship between age and performance. These conflicting findings may be due to the number of studies which were included in the meta-analyses. While Waldman and Avolio (1986) found only 13 published studies which correlated age and job performance, McEvoy and Cascio (1989) found 65 studies with 96 independent samples. The difference was due to the method of study selection. McEvoy and Cascio (1989) also investigated studies which did not deal directly with age and job performance, but had measured performance in some capacity and had collected age information as a control variable. Productivity indices in the McEvoy and Cascio (1989) study were output-type measures; the supervisory ratings were not specifically described. The authors mentioned only that they were subjective.
The authors also stated that studies were selected only if they reported a measure of job performance, and not a proxy of performance. Thus, studies which only reported performance indicators such as tenure, absenteeism, or salary level were not included. In summary, the meta-analytic studies lean toward the conclusion that subgroup differences do not vary with rating system, instrument, or format. Kraiger and Ford (1985) found no differences in race effect size between behavioral and trait systems, and the results of the Ford et al. (1986) study extended these findings. Ford et al. (1986) found no difference in race effects between objective indicators and subjective ratings. Although Waldman and Avolio (1986) did find differences in age effects due to the rating type (objective vs. subjective), the larger McEvoy and Cascio (1989) study discredits this finding since the majority of the studies used in the Waldman and Avolio (1986) study were also a part of the McEvoy and Cascio (1989) meta-analysis.
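The meta-analytic literature summarized above repeatedly characterizes subgroup effects as explaining roughly three to five percent of rating variance. As a rough interpretive aid (our illustration, not a computation from any of the cited studies), the standard psychometric conversions among proportion of variance explained, the point-biserial correlation r, and Cohen's d show what effects of that size mean in standardized-difference terms:

```python
import math

def r_from_variance(var_explained: float) -> float:
    """Point-biserial correlation implied by a proportion of variance explained."""
    return math.sqrt(var_explained)

def d_from_r(r: float) -> float:
    """Cohen's d from r, assuming equal group sizes: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

# The 3-5% range of variance explained reported by Kraiger and Ford (1985):
for var in (0.03, 0.05):
    r = r_from_variance(var)
    print(f"variance {var:.0%}: r = {r:.2f}, d = {d_from_r(r):.2f}")
```

By this conversion, a 3-5% variance effect corresponds to a standardized group difference of roughly a third to half a standard deviation under the equal-group-size assumption, which gives some concrete sense of why these reviewers call the effects "small."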

CRITERION SPECIFICITY AND RATING BIAS

We located only four published studies which investigated the extent to which performance scores or decisions based on protected class characteristics differ as a function of criterion specificity.


Bass and Turner (1973) presented data most germane to the theme of expert testimony under study here. Six supervisory ratings and four objective criteria were compared for black and white bank tellers. While the mean differences between black and white employees were generally small, the most significant differences, in which whites out-performed blacks, were on the objective criteria (e.g., percent of time worked, number of shortages, number of overages). Differences between black and white employees on the supervisory ratings were smaller, and almost all were non-significant when the effects of job tenure were partialled out. Brugnoli, Campion, and Basen (1979) examined the role of behavioral specificity of the evaluation instrument and task relevance in explaining racial bias in work samples for personnel selection. The applicants were evaluated using a specific behavioral form, a global rating scale, or both. Race-linked bias, defined as the reliance on stereotypes in behavioral assessment (Hamner et al. 1974), was found only under global evaluations of irrelevant task behavior. When behavioral formats were used, no bias was found, regardless of the task condition. Thus, when relevant task behavior was observed, scale specificity did not affect bias. Thompson and Thompson (1985) found significant race effects when an overall ranking format was used and no such effects when a task-based rating instrument was used. No gender effects were found under either condition. The authors noted, however, that format differences could be a function of the reliability differences in the methods (.66 for the ranking format and .55 for the task-based scores) and, since "true" performance could not be measured, the race differences obtained using the ranking method do not necessarily indicate bias. In a study of self-rated performance and pay satisfaction among sales representatives, Motowidlo (1982) showed that age was not significantly correlated with self-ratings of performance.
However, age was significantly and positively correlated with an objective measure (total dollar value of sales), and the extent to which the sales measure was confounded by contaminants which were to some extent correlated with age or job tenure could not be determined. The subjective measures used in the study were two sets of supervisory ratings. The first were graphic scales of personal qualities (e.g., sales energy, sales acuity, organization). The second were also graphic scales but consisted of specific aspects of the job (e.g., identifying prospects, closing sales, keeping records). The two subjective ratings were highly correlated. Ilgen et al. (1993) concluded that research on scale formats provides clear support that all types of rating formats under study are subject to cognitive processing distortions; raters do not simply report what they have observed. Several studies were cited which suggested that both behaviorally anchored rating scales and summated or behavioral observation scales, for example, affected rater observation and storage/retrieval processes and are therefore subject to cognitive distortion (Bernardin & Kane 1980; DeNisi, Cafferty, & Meglino 1984; DeNisi & Summers 1986; McDonald 1991; Murphy & Constans 1987; Murphy, Martin, & Garcia 1982; Nathan & Alexander 1985; Nathan &


Cascio 1986). Ilgen et al. (1993) thus argued that ratings could be biased regardless of the type of scale used and that no one type of rating format was relatively more (or less) susceptible to rating bias. Ilgen et al. cited no studies, nor were we able to locate any, which directly investigated raters who "intentionally" bias ratings because of a ratee's race, gender, age, disability, national origin, or any other characteristic, and the extent to which such raters were more (or less) successful as a function of the particular appraisal format or other appraisal characteristics. Most of the research cited in the Ilgen et al. review which led to this conclusion involved students either rating professors or rating videotapes of performance specimens. None of the format comparisons reviewed involved standards-based, MBO, or Work Planning and Review (WPR) systems of performance appraisal, the most common appraisal systems in use in the U.S. today (Bretz, Milkovich, & Read 1992). Many appraisal experts argue that an appraisal system which entails the writing of results-based or behavioral standards or objectives for each job, and a more specific, results-oriented definition of performance criteria, is less likely to result in rating bias than appraisal systems which entail less specific performance criteria as the basis for evaluation (Heneman 1986; Huber 1989). Heneman, Wexley, and Moore (1987), for example, surmised that formats which were more specific and contained behavioral statements as opposed to traits were more accurate and thus less biased. This conclusion was based on the findings of Borman (1979), Fay and Latham (1982), and Osburn, Timmreck, and Bigby (1981). However, none of this research investigated racial, gender, or age differences as a function of format or criterion specificity.
Other researchers have taken similar positions regarding rating accuracy but do not cite research to justify the inferential leap taken by some expert witnesses that greater criterion specificity necessarily translates into relatively smaller race, gender, or age effects (e.g., Bernardin 1992; Kane & Kane 1993). In contrast, Bernardin, Kane, Villanova, and Peyrefitte (in press) recently illustrated significant rater main effects across very different rating formats, supporting the notion that a biased rater can be (and is) a biased rater regardless of rating format. Their research showed that a lenient (or harsh) rater generally maintained that position even when the ratees and the rating formats changed.

CONCLUSION

The research indicates that all types of appraisal systems which have been studied to date are subject to rating distortion, deliberate or otherwise. The conclusions derived from the meta-analytic studies, the equivocal findings in the four studies which investigated criterion specificity, and the lack of research comparing different formats for their susceptibility to bias as defined in the case law support the argument that positions regarding criterion specificity espoused by some expert witnesses in litigation are not supported by the


research. We therefore argue that it is incorrect to assume that the specificity of the criteria in a performance appraisal system will necessarily reduce or eliminate rating bias. It is still fair to say that some, albeit limited, research and argument supports the notion that criterion specificity can result in greater accuracy and less rating bias. The more objective the criterion (in the form of, for example, countable results not contaminated by some variable correlated with a protected class characteristic), the less opportunity there is for deliberate rating bias. It is not unreasonable to assume that a racist, sexist, or ageist rater is more likely to manifest these tendencies in personnel decisions when the performance criteria which are the basis of these decisions are relatively more ambiguous. The Waldman and Avolio (1986) meta-analysis on age effects is particularly noteworthy in this regard. However, we can only form this hypothesis, since the research does not allow an unequivocal conclusion. We thus can only conclude that the "jury is still out" with regard to the effects of appraisal system characteristics and, in particular, rating criterion specificity. The finding that older workers are generally rated as high as or higher than younger workers is relevant to ADEA cases in which performance appraisals are conducted after a decision has been made to reduce overhead and/or eliminate personnel. Many companies use outside consultants to develop and conduct "special" performance appraisals as a part of a downsizing/restructuring/rightsizing effort (Kane & Kane 1993).
If the result of this "special" performance appraisal is a negative correlation between age and performance ratings and/or, more importantly, termination rate, this result is contrary to the literature. It would be particularly unusual if the correlation between age and ratings went from positive for a pre-downsizing appraisal period to negative after the decision was made to downsize. Much like our criticism of the expert views supporting the typical plaintiff positions regarding allegations of bias, the research does not support an expert/company defense of this change in correlational sign, namely that the "special" appraisal system was somehow different, more valid, or more job-related than the appraisal system used before the downsizing and that data comparisons are thus inappropriate. Rather, the research seems to show that when substantial changes are made in an appraisal system but different conditions for rating are not imposed (e.g., a context of overhead reduction where employee salaries are hard to ignore), differences between older and younger workers' ratings do not typically change significantly such that younger workers suddenly emerge with significantly higher ratings. The research in general agrees with Landy and Farr (1980) and, more recently, Ilgen et al. (1993) that criterion specificity need not make a difference in the amount of rating bias as such bias is typically defined in litigation involving performance appraisal. We found no evidence that the statistical disparities which are used to define "disparate impact" would differ as a function of a particular rating format. We also found no studies which specifically addressed the issue of intentional or deliberate rating bias as a function of the particular rating format.


Based on the 1993 Supreme Court ruling in Daubert v. Merrell Dow Pharmaceutical, judges have begun to exclude expert testimony not generally accepted by the scientific community. There is a plethora of "hired guns" in EEO litigation who will depose on demand. This paper invites professional criticism in the hope of reaching a scientific consensus to determine, in the words of the Court, whether the evidence's "underlying reasoning or methodology is scientifically valid." We must conclude that the burden is certainly on those experts who maintain that there is some causal connection between a particular deleterious outcome for some protected class member(s) and a particular type of performance appraisal format or system. The research does not seem to support this general theme.

REFERENCES

Arvey, R. D. and R. H. Faley. 1988. Fairness in Selecting Employees (2nd ed.). Reading, MA: Addison-Wesley.
Ashe, R. L. and G. S. McRae. 1985. "Performance Evaluations Go to Court in the 1980's." Mercer Law Review 36: 887-905.

Austin, J. T., P. Villanova, and H. D. Hindman. (in press). "Legal Requirements and Technical Guidelines Involved in Implementing Performance Appraisal Systems." In Human Resource Management: Perspectives and Issues (3rd ed.), edited by G. R. Ferris, K. M. Rowland, and M. R. Buckley. Boston: Allyn and Bacon.
Barrett, G. V. and M. C. Kernan. 1987. "Performance Appraisal and Terminations: A Review of Court Decisions since Brito v. Zia with Implications for Personnel Practices." Personnel Psychology 40: 489-503.
Barrett, G. V. and S. B. Morris. 1993. "The American Psychological Association's Amicus Curiae Brief in Price Waterhouse v. Hopkins." Law and Human Behavior 17: 201-215.

Bass, A. R. and J. N. Turner. 1973. "Ethnic Group Differences in Relationships among Criteria of Job Performance." Journal of Applied Psychology 57: 101-109.
Bernardin, H. J. 1992. "The 'Analytic' Framework for Customer-Based Performance Content Development and Appraisal." Human Resource Management Review 2: 81-102.

Bernardin, H. J. and R. W. Beatty. 1984. Performance Appraisal: Assessing Human Behavior at Work. Boston, MA: Kent-Wadsworth.
Bernardin, H. J. and W. F. Cascio. 1988. "Performance Appraisal and the Law." Pp. 235-247 in Readings in Personnel and Human Resources, edited by R. Schuler, S. Youngblood and V. L. Huber. St. Paul: West Publishing Co.
Bernardin, H. J., J. Kane, P. Villanova, and J. Peyrefitte. (in press). "The Stability of Rater Leniency: Three Studies." Academy of Management Journal.
Bernardin, H. J. and J. S. Kane. 1980. "A Second Look at Behavioral Observation Scales." Personnel Psychology 33: 809-814.
Bersoff, D. N. 1988. "Should Subjective Employment Devices be Scrutinized?: It's Elementary, My Dear Ms. Watson." American Psychologist 43: 1016-1018.
Borman, W. C. 1979. "Format and Training Effects on Rating Accuracy and Rater Errors." Journal of Applied Psychology 64: 410-421.
Bretz, R. D. Jr., G. T. Milkovich, and W. Read. 1992. "The Current State of Performance Appraisal Research and Practice: Concerns, Directions, and Implications." Journal of Management 18: 321-352.

Brugnoli, G. A., J. E. Campion, and J. A. Basen. 1979. "Racial Bias in the Use of Work Samples for Personnel Selection." Journal of Applied Psychology 64: 119-123.
Burchett, S. R. and K. P. DeMeuse. 1985. "Performance Appraisal and the Law." Personnel 62: 29-37.

Cascio, W. F. and E. R. Valenzi. 1978. "Relations among Criteria of Police Performance." Journal of Applied Psychology 63: 22-28.

Cleveland, J. N. and F. J. Landy. 1981. "The Influence of Rater Age and Ratee Age on Two Performance Judgments." Personnel Psychology 34: 19-29.
———. 1983. "The Effects of Person and Job Stereotypes on Two Personnel Decisions." Journal of Applied Psychology 68: 609-619.
Daubert v. Merrell Dow Pharmaceutical, 113 S. Ct. 2786. 1993.
Dedrick, E. J. and G. H. Dobbins. 1991. "The Influence of Subordinate Age on Managerial Actions: An Attributional Analysis." Journal of Organizational Behavior 12: 367-377.

DeNisi, A. S., T. P. Cafferty, and B. M. Meglino. 1984. "A Cognitive View of the Performance Appraisal Process: A Model and Research Propositions." Organizational Behavior and Human Performance 33: 360-396.

DeNisi, A. S. and T. P. Summers. 1986. Rating Forms and the Organization of Information: A Cognitive Role for Appraisal Instruments. Paper presented at the 46th Annual Meeting of the Academy of Management, New Orleans, LA.
Dipboye, R. L. 1985. "Some Neglected Variables in Research on Discrimination in Appraisals." Academy of Management Review 10: 116-127.
Fay, C. H. and G. P. Latham. 1982. "Effects of Training and Rating Scales on Rating Errors." Personnel Psychology 35: 105-116.
Feild, H. S. and W. H. Holley. 1982. "The Relationship of Performance Appraisal Systems Characteristics to Verdicts in Selected Discrimination Cases." Academy of Management Journal 25: 392-406.

Feldman, J. M. 1981. "Beyond Attribution Theory: Cognitive Processes in Performance Appraisal." Journal of Applied Psychology 66: 127-148.
Ferris, G. R., V. L. Yates, D. C. Gilmore, and K. Rowland. 1985. "The Influence of Subordinate Age on Performance Ratings and Causal Attributions." Personnel Psychology 38: 545-557.

Ford, J. K., K. Kraiger, and S. L. Schechtman. 1986. "Study of Race Effects in Objective Indices and Subjective Evaluations of Performance: A Meta-Analysis of Performance Criteria." Psychological Bulletin 99: 330-337.
Griffeth, R. W. and A. G. Bedeian. 1989. "Employee Performance Evaluations: Effects of Ratee Age, Rater Age, and Ratee Gender." Journal of Organizational Behavior 10: 83-90.

Hamner, W. C., J. S. Kim, L. Baird, and W. J. Bigoness. 1974. "Race and Sex as Determinants of Ratings by Potential Employers in a Simulated Work-Sampling Task." Journal of Applied Psychology 59: 705-711.

Heneman, R. L. 1986. "The Relationship between Supervisory Ratings and Results-Oriented Measures of Performance: A Meta-Analysis." Personnel Psychology 39: 811-826.
Heneman, R. L., K. N. Wexley, and M. L. Moore. 1987. "Performance-Rating Accuracy: A Critical Review." Journal of Business Research 15: 431-448.
Huber, V. 1989. "Comparison of Specific and General Performance Standards on Performance Appraisal Decisions." Decision Sciences 20: 545-557.


Ilgen, D. R., J. L. Barnes-Farrell, and D. B. McKellin. 1993. "Performance Appraisal Process Research in the 1980s: What has it Contributed to Appraisals in Use?" Organizational Behavior and Human Decision Processes 54: 321-368.
Kane, J. S. and K. Kane. 1993. "Performance Appraisal." Pp. 377-404 in Human Resource Management: An Experiential Approach, edited by H. J. Bernardin and J. Russell. New York: McGraw-Hill.
Kraiger, K. and J. K. Ford. 1985. "A Meta-Analysis of Ratee Race Effects in Performance Ratings." Journal of Applied Psychology 70: 56-65.
Landy, F. J. and J. L. Farr. 1980. "Performance Rating." Psychological Bulletin 87: 72-107.
Ledvinka, J. and V. G. Scarpello. 1991. Federal Regulation of Personnel and Human Resource Management. Boston: Kent-Wadsworth.
Lee, B. A. 1990. "Subjective Employment Practices and Disparate Impact: Unresolved Issues." Employee Relations Law Journal 15: 403-417.
McDonald, T. 1991. "The Effect of Dimension Content on Observation and Ratings of Job Performance." Organizational Behavior and Human Decision Processes 48: 252-271.
McEvoy, G. and C. L. Beck-Dudley. 1991. Legally Defensible Performance Appraisals: A Review of Federal Appeals Court Cases. Paper presented at the Annual Meeting of the Society for Industrial and Organizational Psychology, (April), St. Louis, MO.
McEvoy, G. M. and W. F. Cascio. 1989. "Cumulative Evidence of the Relationship between Employee Age and Job Performance." Journal of Applied Psychology 74: 11-17.
Martin, D. C., K. M. Bartol, and M. J. Levine. 1986. "The Legal Ramifications of Performance Appraisal." Employee Relations Law Journal 12: 370-396.
Mobley, W. H. 1982. "Supervisor and Employee Race and Sex Effects on Performance Appraisals: A Field Study of Adverse Impact and Generalizability." Academy of Management Journal 25: 598-606.
Motowidlo, S. J. 1982. "Relationship between Self-Rated Performance and Pay Satisfaction among Sales Representatives." Journal of Applied Psychology 67: 209-213.
Murphy, K. R. and J. I. Constans. 1987. "Behavioral Anchors as a Source of Bias in Rating." Journal of Applied Psychology 72: 573-577.
Murphy, K. R., C. Martin, and M. Garcia. 1982. "Do Behavioral Observation Scales Measure Observation?" Journal of Applied Psychology 67: 562-567.
Nathan, B. R. and R. A. Alexander. 1985. "The Role of Inferential Accuracy in Performance Rating." Academy of Management Review 10: 109-115.
———. 1988. "A Comparison of Criteria for Test Validation: A Meta-Analytic Investigation." Personnel Psychology 41: 517-535.
Nathan, B. R. and W. F. Cascio. 1986. "Introduction: Technical & Legal Standards." Pp. 1-50 in Performance Assessment: Methods & Applications, edited by R. A. Berk. Baltimore: Johns Hopkins Press.
Nieva, V. F. and B. A. Gutek. 1980. "Sex Effects on Evaluation." Academy of Management Review 5: 267-276.
Oppler, S. H., J. P. Campbell, E. D. Pulakos, and W. C. Borman. 1992. "Three Approaches to the Investigation of Subgroup Bias in Performance Measurement: Review, Results, and Conclusions." Journal of Applied Psychology 77: 201-217.
Osburn, H. G., C. Timmreck, and D. Bigby. 1981. "Effect of Dimensional Relevance on Accuracy of Simulated Hiring Decisions by Employment Interviewers." Journal of Applied Psychology 66: 159-165.
Pulakos, E. D., L. A. White, S. H. Oppler, and W. C. Borman. 1989. "Examination of

Race and Sex Effects on Performance Ratings." Journal of Applied Psychology 74: 770-780.
Ritchie, J. E. and P. S. Lieb. 1994. Feild and Holley Revisited: A Look at Performance Appraisal Characteristics Since 1980. Paper presented at the Annual Meeting of the National Academy of Management, (April), Dallas, TX.
Rosen, B. and T. H. Jerdee. 1973. "The Influence of Sex-Role Stereotypes on Evaluations of Male and Female Supervisory Behavior." Journal of Applied Psychology 57: 44-48.
———. 1976a. "The Nature of Job-Related Stereotypes." Journal of Applied Psychology 61: 180-183.
———. 1976b. "The Influence of Age Stereotypes on Managerial Decisions." Journal of Applied Psychology 61: 428-432.
Sackett, P. R., C. L. Z. DuBois, and A. W. Noe. 1991. "Tokenism in Performance Evaluation: The Effects of Work Group Representation on Male-Female and White-Black Differences in Performance Ratings." Journal of Applied Psychology 76: 263-267.
Sackett, P. R. and C. L. Z. DuBois. 1991. "Rater-Ratee Race Effects on Performance Evaluation: Challenging Meta-Analytic Conclusions." Journal of Applied Psychology 76: 873-877.
Schmidt, F. L. and R. H. Johnson. 1973. "Effect of Race on Peer Ratings in an Industrial Situation." Journal of Applied Psychology 57: 237-241.
Schmitt, N. and R. A. Noe. 1986. "Personnel Selection and Equal Employment Opportunity." Pp. 71-115 in International Review of Industrial and Organizational Psychology (Vol. 1), edited by C. L. Cooper and I. Robertson. Chichester: Wiley.
Schmitt, N. and M. Lappin. 1980. "Race and Sex as Determinants of the Mean and Variance of Performance Ratings." Journal of Applied Psychology 65: 428-435.
Schwab, D. P. and H. G. Heneman, III. 1978. "Age Stereotyping in Performance Appraisal." Journal of Applied Psychology 63: 573-578.
Sharf, J. C. 1988a. "APA & Civil Rights Bar Opposed by Justice, EEOC, ASPA, IPMA, & EEOC before Supreme Court in Clara Watson v. Fort Worth Bank & Trust." The Industrial-Organizational Psychologist 25: 27-34.
———. 1988b. "Litigating Personnel Measurement Policy." Journal of Vocational Behavior 33: 235-271.
Shore, L. M. and L. M. Bleicken. 1991. "Effects of Supervisor Age and Subordinate Age on Rating Congruence." Human Relations 44: 1093-1105.
Thompson, D. E. and T. A. Thompson. 1985. "Task-Based Performance Appraisal for Blue Collar Jobs: Evaluation of Race and Sex Effects." Journal of Applied Psychology 70: 747-753.
Thompson, D. E. and P. S. Christiansen. 1984. "Court Acceptance of Uniform Guidelines Provisions: The Bottom Line and the Search for Alternatives." Employee Relations Law Journal 8: 587-602.
Thompson, D. E. and T. A. Thompson. 1982. "Court Standards for Job Analysis in Test Validation." Personnel Psychology 35: 865-874.
Wade v. Mississippi Cooperative Extension Service, 372 F. Supp. 126. 1974.
Waldman, D. A. and B. J. Avolio. 1986. "A Meta-Analysis of Age Differences in Job Performance." Journal of Applied Psychology 71: 33-38.
———. 1991. "Race Effects in Performance Evaluation: Controlling for Ability, Education and Experience." Journal of Applied Psychology 76: 897-901.
Watson v. Fort Worth Bank & Trust, 108 S. Ct. 2777. 1988.
Wendelken, D. J. and A. Inn. 1981. "Nonperformance Influences on Performance Evaluations: A Laboratory Phenomenon?" Journal of Applied Psychology 66: 149-158.
Wexley, K. N. and E. D. Pulakos. 1982. "Sex Effects on Performance Ratings in Manager-Subordinate Dyads: A Field Study." Journal of Applied Psychology 67: 433-439.
Wherry, R. J. and C. J. Bartlett. 1982. "The Control of Bias in Ratings: A Theory of Rating." Personnel Psychology 35: 521-551.
Zalesny, M. D. and M. P. Kirsch. 1989. "The Effect of Similarity on Performance Ratings and Interrater Agreement." Human Relations 42: 81-96.