System, Vol. 22, No. 3, pp. 349-352, 1994. Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0346-251X/94 $7.00 + 0.00
ARE MULTIPLE-CHOICE TESTS UNFAIR TO GIRLS?
JAN HELLEKANT
Department of Education, Göteborg University
Gender differences in English proficiency as a function of testing method were examined by comparing the results of a 10% representative sample (n varied between 3,258 and 4,131) of boys and girls in the Swedish upper-secondary school on a subtest of the national test in English for the years 1986-93. The tasks were similar, but ca half of them were of the multiple-choice and half of the free-response format. A hypothesis that boys would be more successful than girls on the multiple-choice tasks was confirmed. Each year the boys performed better than the girls on the multiple-choice part and the girls better on the free-response part.
INTRODUCTION

One of the most important measures taken in order to achieve objectivity and reliability in language tests was the introduction of multiple-choice items. In the old pre-scientific days, the score obtained on a language test depended to some (or a large) extent on the personality and temporary mood of the marker. With the m/c technique it became possible for an untrained person or even a machine to mark a paper exactly according to the test writer's intentions. If items were carefully selected after pre-testing and statistical analysis, and if the number of items was large enough, the result was, in addition, a highly reliable test.

DEBATE

True, there were language teachers who were sceptical, at least in Sweden, where the novelty appeared in national language tests for the first time in 1967. Statisticians soon convinced those who thought it was possible to pass a test by guessing the right answers that there was no cause for concern. There were, however, sceptics who questioned the validity of the m/c format, declaring that the ability to choose the correct alternative in a m/c item comprised other important variables besides language proficiency. Some teachers said, for instance, that risk-takers and persons who did well on crossword puzzles and old-style intelligence tests were likely to be unduly favoured by the m/c technique. As time went on, the initial criticism subsided, however, for test construction improved, objective free-response parts were added and teachers became generally satisfied with the quality of the national language tests. But a few years ago, attention was again focused on the validity of the m/c format. This time it was not teachers but some test constructors who expressed concern, for they had begun to suspect that the m/c format interacted with gender to the effect that girls were put at a disadvantage.
EMPIRICAL EVIDENCE
The same suspicions had, in fact, been voiced much earlier by some British educationalists. Wood (1978) analysed the London Board pass rates in O-Level English language. The number of pupils varied from 20,462 in 1971 to 25,096 in 1977. In 1971 the pass rate for girls was 22.2% higher than that for boys. This difference narrowed down to 10.0% in 1977 with the biggest reduction occurring in 1972. The only thing that could explain the drastic change in 1972, he writes, is that this was the year in which multiple-choice questions were introduced into the examination. Wood concludes that the m/c format favours boys and disadvantages girls. Bolger and Kellaghan (1990) quote Murphy (1980), who “provided time-series evidence that, following the introduction of a multiple-choice paper into a 1977 examination in geography, on which the performance of male and female candidates had always been similar, the percentage of male candidates who obtained A, B, or C grades became approximately 10% higher than the equivalent figure for female candidates” (p. 166). Bolger and Kellaghan studied the performance of 15-year-old boys (n = 739) and girls (n = 758) in Irish schools on m/c tests and free-response tests of mathematics, Irish, and English achievement. Their main conclusion was that “males performed significantly better than females on multiple choice tests compared to their performance on free-response examinations. An expectation that the gender difference would be larger for the languages and smaller for mathematics because of the superior verbal skills attributed to females was not fulfilled” (p. 165).
SWEDISH NATIONAL TEST DATA
A close examination of some results on the national English test for the Swedish upper-secondary school seems to confirm the findings mentioned above. This test is taken by well over 30,000 pupils each year. It consists of a listening and a reading comprehension subtest and a third subtest where grammar, phraseology and vocabulary are tested in context. The listening and reading subtests are of the m/c format, whereas about half of the third subtest is of the free-response format. This free-response part is a rational-deletion cloze text of ca 30 items where pupils are supposed to fill in one missing word in each gap, e.g.

    Most people have some goal or ambition in life that they ultimately hope to attain. At the time I'm writing about, my only goal in life, in addition to not starving to . . . , was to own a shiny new automobile, with a steering . . . untouched by human hands, etc.

The second part of this subtest is a m/c test comprising ca 25 items, each of which consists of a short text where the missing word is to be chosen from five printed alternatives. In other words, this part of the subtest consists of a number of short rational-deletion cloze texts, but of the m/c format, e.g.

    In a society as competitive as ours no one can take success for . . .
    A granted   B natural   C given   D assured   E sure
If we adhere to the hypothesis that boys do better on m/c tests than girls, we would expect boys to be more successful on items of the last-mentioned model. Here are the results from the last seven years of a representative 10% sample of the pupils who sat the test. Reliable data from 1990 are missing: only a limited number of teachers had their pupils take the 1990 test because of industrial action.
Table 1. Average correct response rates (%) for males and females on two different formats in the 1986-1993 Swedish national tests in English

Format       1986            1987            1988            1989            1991            1992            1993
             m    f    diff  m    f    diff  m    f    diff  m    f    diff  m    f    diff  m    f    diff  m    f    diff
M/c          70   64   6     63   58   5     70   65   5     59   51   8     63   59   4     61   58   3     61   52   9
Free-resp    66   68   -2    64   65   -1    57   59   -2    56   58   -2    60   62   -2    65   67   -2    60   63   -3
Difference             8               6               7               10              6               5               12

Data are based on a 10% representative sample where n varies between 3,258 and 4,131.
As can be seen from the table, our hypothesis is confirmed. The girls invariably score higher than the boys on the free-response part, although only one word per item is to be produced. (Their superiority cannot be ascribed to better handwriting.) The boys invariably score higher on the m/c part. The differences between boys and girls are bigger on the m/c part, varying between 3% in 1992 and 9% in 1993. On the free-response part, the differences between boys and girls are small, but consistently negative, varying between -1% and -3%. If the free-response part is a correct reflexion of the pupils' "true" proficiency, we must admit that the m/c format gives a pretty distorted picture of that proficiency. Indeed, the deviations from this supposed "truth" (the m/c difference minus the free-response difference) varied between 5% in 1992 and 12% in 1993, the average deviation being ca 8%.
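As a brief illustration (not part of the original study), the arithmetic behind the "Difference" row and the reported 5-12% range can be checked from the Table 1 values alone. The following minimal Python sketch simply recomputes the gender differences on each part and the resulting deviations; the figures are those given in the table, and no other data are assumed.

    # Illustrative sketch only: recomputes the gender differences and deviations
    # reported in Table 1 (average correct response rates in %).
    # year: (m/c male, m/c female, free-response male, free-response female)
    scores = {
        1986: (70, 64, 66, 68),
        1987: (63, 58, 64, 65),
        1988: (70, 65, 57, 59),
        1989: (59, 51, 56, 58),
        1991: (63, 59, 60, 62),
        1992: (61, 58, 65, 67),
        1993: (61, 52, 60, 63),
    }

    deviations = []
    for year, (mc_m, mc_f, fr_m, fr_f) in sorted(scores.items()):
        mc_diff = mc_m - mc_f          # male minus female on the m/c part
        fr_diff = fr_m - fr_f          # male minus female on the free-response part
        deviation = mc_diff - fr_diff  # the "Difference" row in Table 1
        deviations.append(deviation)
        print(f"{year}: m/c {mc_diff:+d}, free-resp {fr_diff:+d}, deviation {deviation}")

    # Range 5-12, average ca 8, as stated in the text.
    print(f"min {min(deviations)}, max {max(deviations)}, "
          f"mean {sum(deviations) / len(deviations):.1f}")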
CONCLUSION

We find, then, that when Swedish boys and girls did an English test where the tasks were similar but where one part was of the m/c and one of the free-response format, the girls invariably did a little better than the boys on the free-response part, while the boys did considerably better than the girls on the m/c part. It is difficult to find any reason for this result other than the testing method. To be sure, the content of a text has a bearing on test results. We know, for instance, that girls usually score higher on texts dealing with "female" domains and boys higher on "male" domains. But the whole subtest is constructed by the same test writer, and there is no reason to believe that, in seven successive tests, the contents of the m/c part were always masculine and that those of the free-response part were always feminine. In fact, the free-response part from which the extract above is quoted deals mainly with a man fond of cars, a typically male domain. Yet the girls scored slightly higher on that rather masculine text (the 1988 test). The suspicion that there is a fairly strong interaction between the m/c method and certain personality traits may be less absurd than some testers think. A possible explanation for our results is, in fact, that pupils with certain personality traits are favoured by the m/c format, and that these traits are more common in boys than in girls. What exactly these traits are we can only guess at.
Multiple-choice tests have many advantages. They are objective and can be made very reliable. It seems, however, that they may not be sufficiently valid if the purpose of the test is to measure nothing but language proficiency. It would therefore seem wise to replace m/c items with free-response tasks, where this is possible without jeopardizing reliability and objectivity. In cases where, for various reasons, m/c tasks in large numbers are considered absolutely necessary, we should try to find ways to compensate for the undesirable effects the m/c format may have. This applies particularly to examinations where the future of the examinees is at stake. It would be a pity, indeed, if candidates should fail the exam not because their language proficiency is inadequate but because they are handicapped by the testing format.
REFERENCES

BOLGER, N. and KELLAGHAN, T. (1990) Method of measurement and gender differences in scholastic achievement. Journal of Educational Measurement, 27, 165-174.

MURPHY, R. J. L. (1980) Sex differences in GCE examination entry statistics and success rates. Educational Studies, 6, 169-178.

WOOD, R. (1978) Sex differences in answers to English language comprehension items. Educational Studies, 4, 157-165.