Test bias and ability level testing

Journal of School Psychology 1979 • Vol. 17, No. 3

0022-4405/79/1500-0255$00.95 © 1979 The Journal of School Psychology, Inc.

TEST BIAS AND ABILITY LEVEL TESTING

BERNARD SILVERMAN
Roosevelt University

Summary: The average grade equivalent reading comprehension scores of students in 50 black schools were compared to those of students in 50 white schools under two forms of test administration. Traditional grade level testing resulted in smaller group differences between blacks' and whites' scores than the apparently more valid ability level testing technique. It was concluded that the use of grade level testing with the Iowa Tests of Basic Skills is biased in favor of low scoring subgroups.

The term test bias has come to have almost as many meanings as there are experts engaged in studying the problem (Cleary, 1968; Thorndike, 1971; Cole, 1973; Peterson & Novick, 1976). In charging that a test is biased against members of a particular subgroup such as blacks, some mean to say that scores earned by blacks imply more of the ability being measured than is usually the case. If an intelligence test is biased against blacks, black individuals who score 80 might manifest cognitive competency comparable to whites who score 100. For others, claiming that a test is biased against blacks means that, as a group, blacks' performance compared to whites' is appreciably worse on the test than on the criterion (i.e., that which the test is supposed to measure).

Those who prefer the first meaning of bias have concluded that scholastic ability tests are either unbiased or biased in favor of blacks (Cleary, 1968; Davis & Temp, 1971; Silverman, Barton, & Lyon, 1976). Thus, if a white and a black student both scored 500 on the SAT, the white usually did better in college. Those who prefer the second meaning of bias have looked at the same data and concluded that the use of scholastic ability tests is biased against blacks. They found a number of instances in which blacks' scores overlapped more with whites' scores on the criterion (college GPA) than on the test itself (Schmidt & Hunter, 1974; Mercer, 1974).
Ability level testing, a practice common to many big city school systems (Ayer & McNamara, 1973; Wick & Ward, 1977), refers to giving students a level of a reading test (e.g., Iowa Tests of Basic Skills) commensurate with their apparent reading ability. Most often this results in poor readers' taking a lower level of the test than would be merited given their grade in school. Such a procedure reduces the likelihood of poor readers' responding by guessing and thus increases the test's reliability and presumably its validity as well. Thus, scores derived from ability level testing may well provide a better approximation of students' actual reading ability than scores obtained in the traditional manner.

The aim of the present study was to assess the effect of out-of-level or ability level testing on the overlap between blacks' and whites' scores on the reading comprehension subtest of the Iowa Tests of Basic Skills. Should ability level testing lead to greater overlap in scores, it might be argued that reading comprehension scores resulting from the traditional in-grade administration of the test are biased against blacks. On the other hand, should the presumably more meaningful scores resulting from ability level testing indicate similar or less overlap between blacks and whites, the argument that the in-grade administration of the test is biased against blacks would not be supported. In fact, a wider gap between blacks' and whites' reading scores would suggest that traditional administration techniques are biased in favor of blacks.

METHOD

In March of 1973 and in March of 1974 the sixth- and eighth-grade forms of the Iowa Tests of Basic Skills reading comprehension subtest were administered to sixth and eighth graders respectively in the more than 400 grammar schools in Chicago. One finding of these grade level testings was that 38% of the students were scoring at or below the chance level. These students could not read the tests and were answering in a random fashion. Their scores, which placed them among the bottom 1 or 2% of students in their grade, earned the sixth and eighth graders grade equivalent scores of 3.2 and 4.3 respectively.

To obtain a more meaningful index of reading comprehension, it was necessary to provide the students with a test they could read. To that end ability level testing was instituted in March, 1975. Teachers were instructed to give students a form of the test that they believed the student could comprehend, even if it meant giving an eighth grader the form of the test traditionally administered to those in the fourth or fifth grade.

In order to ascertain if ability level testing differentially affected the measurement of blacks' and whites' reading comprehension, average test scores of sixth and eighth graders in 50 all black schools were compared to the average test scores of sixth and eighth graders in 50 all white schools across a 3-year period from 1973 to 1975. Although a minimal voluntary pupil transfer plan for increasing racial balance was in effect at this time, the racial composition of the schools included in this study remained constant.

RESULTS AND DISCUSSION

The results of a 2(race) x 2(grade) x 3(year) analysis of variance with repeated measures on the last two factors shown in Table 1 indicate that the main effects of race, grade, and year (testing technique) were significant. Whites (X̄ = 7.98) scored higher than blacks (X̄ = 5.76), eighth graders (X̄ = 7.69) higher than sixth graders (X̄ = 6.04), and those tested in 1973 (X̄ = 6.92) and 1974 (X̄ = 7.15) higher than those tested in 1975 (X̄ = 6.52). All interaction effects also achieved significance. However, we shall interpret only the race x year and the race x year x grade interactions as they bear directly upon the question at hand.

Table 1
The relationship between race, grade, year (testing technique), and grade equivalent reading scores

Source          SS       df      MS         F
Race (A)      736.11      1    736.11    662.13**
Error         108.94     98      1.11
Grade (B)     405.07      1    405.07   3092.04**
A x B           8.73      1      8.73     66.68**
Error          12.83     98       .13
Year (C)       39.97      2     19.98    167.56**
A x C          15.60      2      7.80     65.40**
Error          23.37    196       .12
B x C           4.38      2      2.19     20.18**
A x B x C        .72      2       .36      3.36*
Error          21.26    196       .10

*p < .05. **p < .01.

Figure 1 depicts the means and standard deviations of the 50 reading scores falling in each of the 12 cells of the design and in so doing provides the data needed to interpret the significant three-way interaction. It further provides sufficient information to allow the interested reader to reconstruct a similar diagram depicting the two-way interaction between race and year (testing technique).

The two-way interaction was significant because only for blacks was there a relationship between the way in which the test was administered and reading scores. If we collapse across grades we find that when tested at grade level in 1973 and 1974, blacks' reading scores hovered around 6.00. They dropped significantly to 5.19 when ability level testing was initiated in 1975. Whites' reading scores remained about 8.00 regardless of the testing procedure employed. The three-way interaction apparently attained significance because blacks' reading score decline was especially pronounced among eighth graders.

To precisely assess the overlap between black and white schools' mean reading scores at each grade level within each year, the difference between their means was divided by the white schools' standard error of the mean. This revealed that for both sixth and eighth graders the black schools' mean fell a greater number of standard deviation units below that of the white schools in 1975, when ability level testing was begun, than when reading scores were obtained by means of grade level testing in the previous years.
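As a rough arithmetic check on Table 1, the sketch below recomputes each mean square as SS/df and each F as the effect mean square divided by an error mean square. It assumes that every effect is tested against the error term printed directly beneath its block in the table; the article does not spell this out, and the names TABLE_1, f_ratio, and recomputed_f_ratios are illustrative only.

```python
# Values copied from Table 1: (effect, SS, df, error SS, error df, printed F).
# ASSUMPTION: each effect is paired with the error term listed beneath its
# block in the table; the article does not state the pairing explicitly.
TABLE_1 = [
    ("Race (A)", 736.11, 1, 108.94, 98, 662.13),
    ("Grade (B)", 405.07, 1, 12.83, 98, 3092.04),
    ("A x B", 8.73, 1, 12.83, 98, 66.68),
    ("Year (C)", 39.97, 2, 23.37, 196, 167.56),
    ("A x C", 15.60, 2, 23.37, 196, 65.40),
    ("B x C", 4.38, 2, 21.26, 196, 20.18),
    ("A x B x C", 0.72, 2, 21.26, 196, 3.36),
]


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


def recomputed_f_ratios(rows=TABLE_1):
    """Map each effect name to the F ratio recomputed from its SS and df."""
    return {
        name: f_ratio(ss, df, err_ss, err_df)
        for name, ss, df, err_ss, err_df, _printed in rows
    }


if __name__ == "__main__":
    recomputed = recomputed_f_ratios()
    for name, *_rest, printed in TABLE_1:
        print(f"{name:10s} printed F = {printed:8.2f}  recomputed F = {recomputed[name]:8.2f}")
```

Under that pairing the recomputed ratios stay within about 2% of the printed F values; the small discrepancies are plausibly due to the two-decimal rounding of the printed sums of squares.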
The marked reduction in the percentage of students answering at or below the chance level (80%) in 1975 strongly suggests that ability level testing leads to more meaningful scores. Figure 1 shows that compared to scores generated by this technique, traditional grade level testing yielded scores that overestimated the reading comprehension scores of blacks but not whites. Thus, the traditional administration of the reading comprehension subtest can be said to be biased in favor of blacks as blacks' and whites' distributions of test scores are in fact closer together than are the two subgroups on the ability (inferred from the 1975 testing) that the test purports to measure.

No doubt this occurs because as a group blacks score lower than whites and therefore derive greater benefit from the fact that the minimum grade equivalent score assignable often implies more ability or skill than the student possesses. This is accentuated among those in the higher grades. For example, an illiterate sixth grader taking the sixth-grade form of the test and falling in the first percentile can only earn a grade equivalent score of 3.2, while an equally illiterate eighth grader taking the eighth-grade version can do no worse than 4.3. Giving both students the fifth-grade form of the test will result in their both earning grade equivalents of 2.7 and a more precipitous decline for the eighth grader. This, of course, is exactly what the data revealed.

Finally, it should be pointed out, lest anyone believe that our findings resulted from prejudiced teachers' underestimating blacks' reading comprehension and giving them versions of the test that precluded grade level performance, that eighth graders given the fifth-grade form of the test could still earn a grade equivalent score of 9.1. Were black eighth graders given the fourth-grade form they might still earn a score (7.8) that exceeded their average performance when tested at grade level.
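The floor effect described in this discussion can be made concrete with the three minimum grade equivalent scores quoted in the text. The sketch below is only an illustration: FLOOR_BY_FORM and floor_drop are hypothetical names, and the only data used are the floors of 2.7, 3.2, and 4.3 reported for the fifth-, sixth-, and eighth-grade forms.

```python
# Lowest grade equivalent score obtainable on each form, as quoted in the text
# (keys are the grade for which the form is normally intended).
FLOOR_BY_FORM = {5: 2.7, 6: 3.2, 8: 4.3}


def floor_drop(grade_level_form, ability_level_form, floors=FLOOR_BY_FORM):
    """Drop in the minimum obtainable grade equivalent when a non-reader is
    moved from the grade level form to a lower, ability level form."""
    return round(floors[grade_level_form] - floors[ability_level_form], 1)


if __name__ == "__main__":
    for grade in (6, 8):
        drop = floor_drop(grade, 5)
        print(f"Grade {grade} non-reader moved to the fifth-grade form: floor falls by {drop}")
```

For a student answering at chance, the switch to the fifth-grade form lowers the sixth grader's minimum by 0.5 grade equivalents and the eighth grader's by 1.6, which matches the steeper decline the article reports for eighth graders.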

Interestingly, tests are only thought of as biased when they assign comparatively low scores to easily identifiable subgroups. Perhaps it is the ethic of equality that predisposes many to imagine that apparent group differences are contributed to by tilt in the test. Hopefully this paper will serve to remind readers that in some instances actual between group differences may be greater than traditionally derived test scores suggest.

REFERENCES

Ayer, J., & McNamara, T. Survey testing on an out-of-level basis. Journal of Educational Measurement, 1973, 10, 79-83.
Cleary, T. Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 1968, 6, 115-124.
Cole, N. Bias in selection. Journal of Educational Measurement, 1973, 10, 237-255.
Davis, J., & Temp, G. Is the SAT biased against black students? College Board Review, 1971, 81, 4-9.
Mercer, J. Latent functions of intelligence testing in the public schools. In L. Miller (Ed.), The testing of black students. Englewood Cliffs: Prentice-Hall, 1974.
Peterson, N., & Novick, M. An evaluation of some models for culture-fair selection. Journal of Educational Measurement, 1976, 13, 3-29.
Schmidt, F., & Hunter, J. Racial and ethnic bias in psychological tests. American Psychologist, 1974, 29, 1-8.
Silverman, B., Barton, F., & Lyon, M. Minority group status and bias in college admissions criteria. Educational and Psychological Measurement, 1976, 36, 401-407.
Thorndike, R. Concepts of culture-fairness. Journal of Educational Measurement, 1971, 8, 63-70.
Wick, J., & Ward, F. Testing students at functioning reading level: A 2 year report from Chicago. Chicago: Department of Research and Evaluation, Chicago Board of Education, 1977.

Bernard Silverman
Assistant Professor
Department of Psychology
Roosevelt University
430 South Michigan Avenue
Chicago, Illinois 60605

Received: July 22, 1977
Revision Received: December 12, 1977