Use of National Board test questions to evaluate student performance in obstetrics and gynecology

William N. P. Herbert, M.D., William C. McGaghie, Ph.D., and George B. Forsythe, M.A.

Chapel Hill, North Carolina
The evaluation of student performance in clinical obstetrics and gynecology is frequently based on results of the National Board Examination, Part II. An Obstetrics and Gynecology subtest of this examination was studied to determine its value as a measure of this clinical experience. The clinical usefulness of each question and the distribution of questions among subjects within the specialty (Normal Obstetrics, Reproductive Endocrinology and Infertility, etc.) were determined independently by faculty in obstetrics and gynecology. In addition, the influence of the type of question (e.g., multiple-choice, matching), the category of material, and the clinical usefulness of each question on student test performance was studied. Eighty-six percent of the questions were judged to be indispensable, highly useful, or moderately useful. Abnormal Obstetrics, Gynecology, and Reproductive Endocrinology and Infertility accounted for 70% of the questions, with the remaining questions distributed among Normal Obstetrics, Population and Family Planning, and Gynecologic Oncology. Student test performance was not significantly influenced by the type of question format or the category of material but was related to the level of clinical usefulness. Overall, these results, which are based on ratings from five faculty members and a single class of medical students at one medical school, indicate that the Obstetrics and Gynecology subtest of the National Board Examination, Part II, is a reasonable measure of clinical experience in this field. (AM. J. OBSTET. GYNECOL. 147:73, 1983.)
Examinations developed by the National Board of Medical Examiners are widely used to evaluate medical students. The 1982-1983 Curriculum Directory of the Association of American Medical Colleges reports that 57 of 126 (45%) United States medical schools require students to pass the Part I examination for promotion to the clinical years. In addition, 44 (35%) medical schools have established a passing score on the Part II examination as a graduation requirement. The National Board of Medical Examiners has repeatedly stated that its nationally standardized examinations may not match the curricular characteristics of all medical schools and asserts that individual medical schools are responsible for the intramural use of its examinations.¹, ² At the University of North Carolina (UNC), the posttest developed by the Association of Professors of Obstetrics and Gynecology is given to third-year medical students upon completion of the clerkship in obstetrics and gynecology, but the results are not considered in evaluating student performance.
From the Department of Obstetrics and Gynecology and the Office of Research and Development for Education in the Health Professions, University of North Carolina School of Medicine.
Received for publication February 9, 1983. Revised April 4, 1983. Accepted April 12, 1983.
Reprint requests: William N. P. Herbert, M.D., Department of Obstetrics and Gynecology, University of North Carolina School of Medicine, 214 MacNider Building 202H, Chapel Hill, North Carolina 27514.
To be promoted to the fourth year, students must pass the National Board Examination, Part II, which is administered at the end of the third year of medical school.

Medical faculties that use National Board examinations for student evaluation make several assumptions about these tests. First, they assume that the test questions are consistent with the educational goals their students are expected to achieve and that the tests contain questions representative of the material the students have experienced. Second, they assume that the test questions are unbiased and that the scores that result are appropriate for reaching decisions about students. This implies that National Board examinations are not susceptible to "testmanship" and that individual score differences reflect genuine variation in levels of knowledge of the material.

Research designed to test these assumptions has produced conflicting results. Two studies have demonstrated a close match of National Board examinations with medical curricula.³, ⁴ Several other studies in the basic medical sciences⁵⁻⁷ and in clinical cancer education⁸, ⁹ have failed to confirm fully the assumed match between National Board test content and medical teaching goals. In regard to student performance as it relates to question format, studies of National Board test questions have suggested that differences among the three basic question formats¹ (multiple-choice, matching, and multiple true-false) can influence student performance.¹⁰, ¹¹
Table I. Faculty ratings of the clinical usefulness of the test questions

Scale value   Definition          Absolute frequency   Percentage
5             Indispensable               22               15.9
4             Highly useful               63               45.7
3             Moderately useful           34               24.6
2             Slightly useful             17               12.3
1             Not useful                   2                1.4
Total                                    138               99.9
The present study of an Obstetrics and Gynecology subtest of the National Board Examination, Part II, was undertaken to address three questions. First, do the test questions measure clinically useful material? Second, what is the distribution of the questions among various topics within the specialty? Third, is medical student performance influenced by the type of question, the category of material, or the clinical usefulness of particular questions?
Material and methods

Five full-time, Board-certified faculty members of the Department of Obstetrics and Gynecology at UNC participated in the study. The divisions of Gynecologic Oncology, Reproductive Endocrinology and Infertility, Maternal-Fetal Medicine, and General Obstetrics and Gynecology were represented. Each faculty member independently evaluated all 149 questions contained in the April, 1981, version of the Obstetrics and Gynecology subtest of the National Board Examination, Part II. This subtest, along with five other National Board subtests, was administered in the summer of 1982 to 118 UNC third-year medical students to assess their readiness for promotion to the fourth year of medical school. Eleven test questions were deleted by the National Board during the scoring procedure, presumably after review of the content and item statistics.¹ The remaining 138 questions are the object of analysis in this study.

The participating faculty members evaluated each question in three ways. First, each question was coded according to six categories of material: Normal Obstetrics, Abnormal Obstetrics, Gynecology, Population and Family Planning, Gynecologic Oncology, and Reproductive Endocrinology and Infertility. Second, each question was coded according to its format (multiple-choice, matching, or multiple true-false). Third, each question was rated in terms of its "usefulness for clinical medicine" on a five-point scale (5 = indispensable, 4 = highly useful, 3 = moderately useful, 2 = slightly useful, 1 = not useful).
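The three codes attached to each question can be pictured as a small record per item. The sketch below is our own illustration of such a coding scheme, not material from the study; the records shown are hypothetical, not taken from the coding sheets.

    # A minimal sketch, not from the paper: one way to represent each
    # question's three codes and tally the category distribution.
    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class QuestionCoding:
        number: int       # question number on the subtest
        category: str     # one of the six content categories
        fmt: str          # "multiple-choice", "matching", or "multiple true-false"
        usefulness: int   # 1 (not useful) through 5 (indispensable)

    codings = [  # hypothetical records
        QuestionCoding(1, "Abnormal Obstetrics", "multiple-choice", 4),
        QuestionCoding(2, "Gynecology", "matching", 3),
        QuestionCoding(3, "Normal Obstetrics", "multiple-choice", 5),
    ]

    print(Counter(c.category for c in codings))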
The difficulty index for each test question was used as the dependent variable for this study. This index, which represents the proportion of examinees within a group that answers each question correctly, was adjusted according to the method described by Jensen¹² and cast on a scale ranging from 200 to 800 with a mean of 500 and an SD of 100. Higher values on this scale represent increasingly difficult questions. An adjusted difficulty index was derived for each question on the basis of responses from the 118 UNC medical students and from a national sample of medical students for which the sample size was unknown. Means and SDs for the adjusted difficulty indexes of the questions in each category of content by question format were also calculated.

The reliability of the faculty ratings was assessed according to procedures discussed by Winer.¹³ Reliability values can range from 0.00 to 1.00; the higher the number, the greater the consistency among the judges. The adjusted local and national data were analyzed in separate analyses of variance to determine whether question format, content category, or an interaction of the two factors was responsible for variation in question difficulty. Finally, correlation coefficients were computed between the adjusted local and national difficulty indexes for the questions and the sum of the clinical usefulness ratings. The correlation between local and national question difficulty was also calculated.
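For concreteness, the sketch below illustrates the two computations just described on hypothetical data. It is our own illustration, not the study's procedure: the rescaling shown is a plain linear standardization to a mean of 500 and an SD of 100 (Jensen's adjustment itself is not reproduced), and the reliability estimate is the ANOVA-based reliability of the mean of k judges' ratings of the kind discussed by Winer.

    # A minimal sketch, not the authors' code. Data are hypothetical.
    import numpy as np

    def scaled_difficulty(p_correct):
        """Rescale proportion-correct values to mean 500, SD 100, with
        higher values marking harder questions. A plain linear
        standardization, standing in for Jensen's adjustment."""
        p = np.asarray(p_correct, dtype=float)
        z = (p - p.mean()) / p.std()
        return 500.0 - 100.0 * z   # fewer correct answers -> higher index

    def reliability_of_mean_rating(ratings):
        """ANOVA-based reliability of the mean of k judges' ratings
        (rows = questions, columns = judges)."""
        x = np.asarray(ratings, dtype=float)
        n, k = x.shape
        grand = x.mean()
        ms_items = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
        resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
        ms_resid = (resid ** 2).sum() / ((n - 1) * (k - 1))
        return (ms_items - ms_resid) / ms_items

    # Six hypothetical questions: proportion answering correctly, and
    # usefulness ratings (1-5) from four judges.
    print(scaled_difficulty([0.85, 0.70, 0.55, 0.90, 0.60, 0.75]).round())
    print(round(reliability_of_mean_rating([[5, 4, 5, 4],
                                            [3, 3, 4, 3],
                                            [2, 1, 2, 2],
                                            [4, 4, 5, 4],
                                            [3, 2, 3, 3],
                                            [5, 5, 4, 5]]), 2))

On such a scale, a question answered correctly by fewer examinees lands above 500, which matches the convention that higher values represent more difficult questions.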
Results

The clinical usefulness ratings of the Obstetrics and Gynecology test questions are shown in Table I. The three highest rating categories (indispensable, highly useful, and moderately useful) accounted for 119 of the 138 questions (86%). Only two of the questions (1.4%) were judged to have no clinical usefulness.

The distribution of test questions according to content category and question format is given in Table II. There was no significant disagreement among the faculty judges over assignment of questions to the content categories. Three content categories (Abnormal Obstetrics, Gynecology, and Reproductive Endocrinology and Infertility), having 31, 36, and 30 questions, respectively, accounted for 70% of the total examination. Normal Obstetrics accounted for 20 questions (14%), Gynecologic Oncology for 18 (13%), and Population and Family Planning for three (2%). Of the 138 questions, the simple multiple-choice format was used for 111 (80%); the remaining 27 questions (20%) were of the matching or multiple true-false format.

Table II also presents means and SDs for the adjusted local and national difficulty indexes for each combination of test content and question format. Multiple-choice questions on Population and Family Planning were apparently the least challenging for medical students both nationally and at UNC.
Table II. Content coverage, distribution of questions, and difficulty indexes for Obstetrics and Gynecology subtest of the 1982 National Board Examination, Part II*

                                             Multiple-choice questions             Matching and multiple true-false questions
Category                                     n    UNC M (SD)    National M (SD)    n    UNC M (SD)    National M (SD)    Total questions
Normal Obstetrics                            13   442 (74)      457 (72)           7    429 (72)      449 (57)            20 (14%)
Abnormal Obstetrics                          24   454 (73)      462 (61)           7    454 (59)      476 (56)            31 (22%)
Gynecology                                   30   428 (81)      430 (72)           6    462 (49)      472 (38)            36 (26%)
Population and Family Planning                3   420 (109)     415 (104)          0      0 (0)         0 (0)              3 (2%)
Gynecologic Oncology                         15   463 (62)      475 (50)           3    469 (81)      491 (70)            18 (13%)
Reproductive Endocrinology and Infertility   26   465 (65)      480 (64)           4    508 (82)      514 (55)            30 (22%)
Total questions                             111 (80%)                             27 (20%)                               138

*Because of adjustment, the lower the mean score, the more often the questions were answered correctly. Decimals are omitted.
Matching and multiple true-false questions on Reproductive Endocrinology and Infertility were the most difficult for both student groups. Cautious interpretation is needed, however, because of the small number of questions in each category.

The reliability of the faculty ratings of clinical usefulness was 0.75 when all five faculty judges were considered. However, in contrast with the other four judges, one faculty member rated the questions much higher in clinical usefulness and did not discriminate among individual questions. For this reason, the ratings of the four consistent faculty judges were used to assess clinical usefulness. Elimination of the ratings from the inconsistent faculty judge raised the reliability to 0.81.

The univariate analyses of variance were based on the UNC and national data contained in Table II. Neither analysis revealed statistically significant differences (at the 0.05 level) in test question difficulty attributable to question format, content category, or an interaction of the two factors. However, the influence of question content approached statistical significance (p = 0.08) for the national data.

The aim of the last analysis was to determine whether a correlation existed between the four judges' ratings of question usefulness (summed across judges) and the difficulty of each question. For the UNC data, r = -0.25 (p < 0.01); for the national data, r = -0.23 (p < 0.01). Because higher difficulty indexes denote harder questions, these negative correlations indicate that more useful questions were more often answered correctly. The correlation between the question difficulty indexes derived from UNC students and the national sample was quite high (r = 0.91, p < 0.0001).
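The direction of these correlations follows from the scale convention; a toy example (hypothetical numbers, not the study data) makes the sign concrete.

    # Illustrative only; the six points are invented. Because a higher
    # difficulty index means a harder question, a negative correlation
    # says that questions rated more useful were answered correctly
    # more often.
    import numpy as np

    usefulness_sum = np.array([18, 15, 9, 17, 11, 19])      # summed over four judges
    difficulty = np.array([430, 470, 540, 450, 510, 420])   # adjusted indexes

    r = np.corrcoef(usefulness_sum, difficulty)[0, 1]
    print(f"r = {r:.2f}")   # negative, matching the sign of the reported values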
Comment

These data clearly indicate that the UNC faculty members believe most of the Obstetrics and Gynecology test
questions address clinically useful concepts and principles. This suggests that the Obstetrics and Gynecology subtest is more consistent with clinical educational goals than past studies have shown for National Board examinations used for student evaluation in the UNC basic science curriculum.⁵, ⁶

The distribution of questions by category (Table II) may not conform with the undergraduate teaching goals of all departments. For example, a department's teaching faculty could give increased attention to Normal Obstetrics and Population and Family Planning, with concomitant reductions in Abnormal Obstetrics, Gynecology, and Reproductive Endocrinology and Infertility. This acknowledges that the clinical curriculum is not nationally standardized. Other departments may have different, yet equally valid, objectives for medical student education and evaluation. It is important to emphasize that the content categories used in this study were derived at UNC; they were not based on the National Board test outline. Also, the distribution of questions on future examinations is unlikely to be identical to the distribution reported in this article.

Neither the type of question nor the category of material seems to influence question difficulty significantly. This contrasts with past research¹⁰, ¹¹ and suggests that this test contained questions of equivalent difficulty across the six content areas and for the different question formats.

Most faculty members would expect students to answer questions correctly more often if the material were thought to be clinically useful, and the present analysis supports this notion: faculty views about the clinical utility of test material have a statistically significant correlation with student test performance.
It is noteworthy that faculty members may vary in their judgment of the clinical usefulness of individual test questions, as illustrated by the fact that one of the five faculty members in this study differed significantly from the other four in this assessment.

Numerous problems are associated with drawing firm conclusions from a single, retrospective investigation. Because the design of National Board examinations is not subject to local control, test research with prospective designs is not possible. Matters such as the way in which the questions were coded, the potential for disagreement among faculty members, and the unequal distribution of questions among the six content areas also require cautious interpretation of results.¹⁴

In summary, the results of this investigation indicate that the Obstetrics and Gynecology subtest succeeds as one measure of clinical knowledge, as judged by the faculty of one medical school. Most of the test material was rated clinically useful. Questions were not evenly distributed among the six areas within the specialty. Student performance was not significantly influenced by the format or content of the questions but was related to faculty evaluations of question usefulness.
We acknowledge the assistance of Drs. Edward H. Bishop, Lamar E. V. Ekbladh, Luther M. Talbert, and Leslie A. Walton of the Department of Obstetrics and Gynecology, University of North Carolina School of Medicine, and of Dr. Paul Kelley of the National Board of Medical Examiners in this investigation.
REFERENCES
1. Hubbard, J. P.: Measuring medical education: The tests and the experience of the National Board of Medical Examiners, ed. 2, Philadelphia, 1978, Lea & Febiger.
2. National Board of Medical Examiners: 1981 Annual Report, Philadelphia, 1982.
3. Kennedy, W. B., Kelly, P. R., and Hubbard, J. P.: The relevance of National Board Part I Examinations to Medical School Curricula, Philadelphia, 1970, National Board of Medical Examiners.
4. Garrard, J., McCollister, R. J., and Harris, I.: Review by medical teachers of a certification examination: Rationale, method, and application, Med. Educ. 12:421, 1978.
5. McGaghie, W. C., Burford, H. J., and Harward, D. H.: Content representativeness and student performance on National Board Part I special subject examinations, Ann. Conf. Res. Med. Educ. 19:15, 1980.
6. McGaghie, W. C., Burford, H. J., and Harward, D. H.: External and internal tests of preclinical learning: Content representativeness, item educational emphasis, and student achievement, Ann. Conf. Res. Med. Educ. 20:115, 1981.
7. Wile, M. Z.: External examinations for internal evaluation: The National Board Part I test as a case, J. Med. Educ. 53:92, 1978.
8. Ruckdeschel, J. C., Lea, J. W., Brown, S., and Horton, J.: Content bias in the neoplastic-related items of the National Board of Medical Examiners Part II examination, Med. Pediatr. Oncol. 10:269, 1982.
9. Goetzel, R. Z., Croen, L. G., Lan, S., and Bases, R. E.: A content validity analysis of neoplastic items of the National Board of Medical Examiners Part II examination, Med. Pediatr. Oncol. 10:413, 1982.
10. Skakun, E. N., Nanson, E. M., Kling, S., and Taylor, W. C.: A preliminary investigation of three types of multiple choice questions, Med. Educ. 13:91, 1979.
11. Sarnacki, R. E.: The effects of test-wiseness in medical education, Eval. Health Prof. 4:207, 1981.
12. Jensen, A. R.: Bias in Mental Testing, New York, 1980, Free Press.
13. Winer, B. J.: Statistical Principles in Experimental Design, ed. 2, New York, 1971, McGraw-Hill Book Company.
14. Appelbaum, M. I., and Cramer, E. M.: Some problems in the nonorthogonal analysis of variance, Psychol. Bull. 81:335, 1974.