Correlation between Radiology Resident Rotation Performance and Examination Scores

Saroja Adusumilli, MD, Richard H. Cohan, MD, Melvyn Korobkin, MD, James T. Fitzgerald, PhD, Mary S. Oh, BS

Rationale and Objectives. The authors' purpose was to determine whether there is a relationship between subjective assessment of radiology resident performance on individual rotations and objective assessment of radiology resident performance on the American College of Radiology (ACR) in-training and American Board of Radiology (ABR) written examinations.

Materials and Methods. Records of 81 radiology residents completing their residency between 1991 and 2000 were reviewed. Mean scores from all rotation evaluation forms obtained during the study period were calculated for each residency year. The means of the overall raw scores and percentiles obtained on the annual ACR in-training examinations during the first 3 years of residency and of the written portion of the ABR examination taken during the 4th year of residency were also determined. Rotation evaluation scores were then compared to examination scores obtained during the same year of residency, and correlation coefficients were obtained.

Results. In the 2nd, 3rd, and 4th years of radiology residency, there is positive correlation between rotation evaluation scores and overall scores from the corresponding ACR in-training examination and written portion of the ABR examination taken during the same year. In contrast, in the 1st year of residency, resident rotation evaluation scores do not correlate with ACR in-training examination scores.

Conclusion. Residents who are perceived as doing well on their rotations after the 1st year of residency are more likely to do well on standardized written examinations.

Key Words. Resident evaluation; resident performance; American College of Radiology in-training examination; American Board of Radiology.

Residency programs are required by the Accreditation Council for Graduate Medical Education to invest substantial time and effort (1) in periodically evaluating residents during the course of their residency. In addition, it is generally believed that regular resident evaluations are valuable tools in detecting major deficiencies that may necessitate close observation or remedial action (2). For these reasons, radiology residents are usually evaluated at the conclusion

Acad Radiol 2000; 7:920-926

1From the Departments of Radiology (S.A., R.H.C., M.K.) and Medical Education (J.T.F., M.S.O.), University of Michigan Health System, Ann Arbor. Received February 9, 2000; revision requested May 10; revision received and accepted June 19. Address correspondence to R.H.C., Department of Radiology/B1D502G, University of Michigan Hospital, 1500 E Medical Center Dr, Ann Arbor, MI 48109-0030. © AUR, 2000


of each of the subspecialty rotations. By necessity, a substantial component of such an evaluation process requires the recording of subjective impressions by faculty. An alternative, more objective evaluation process is also used at the vast majority of radiology residency programs: the administration of the American College of Radiology (ACR) in-training examination. The written portion of the American Board of Radiology (ABR) examination also provides objective data on radiology resident performance. A strong positive correlation between the results of the ACR in-training examination and subsequent performance on the ABR written examination has been previously demonstrated, first in a single institutional report in 1996 (3) and subsequently in a follow-up multi-institutional study (4). In those studies, residents with low scores on the ACR examination were found to perform poorly on the written portion of the ABR examination.

It has been our impression that the performance characteristics assessed by a faculty member on subspecialty rotations are different from those assessed with a written examination, and that, in many instances, there is little correlation between resident rotation performance and test scores. While some attempts (5-8) have been made to identify a relationship between subjective and objective assessments of radiology resident performance, to our knowledge resident performance on individual rotations as assessed with traditional evaluation forms has never been directly compared with performance on standardized examinations. For this reason, we decided to compare our resident rotation evaluations with written examination scores to determine whether there is any correlation between these two evaluation methods.

MATERIALS AND METHODS

Rotation Evaluation Forms

We reviewed rotation evaluation forms for 81 residents completing their residency at our institution between 1991 and 2000. Evaluation forms were available for 65 residents who completed their 1st year of training in our program, 68 residents who completed their 2nd year, 61 residents who completed their 3rd year, and 54 residents who completed their 4th year. Most (n = 52) residents completed all 4 years of training in our program during the study period and were included in the analysis of each year of residency. The remaining residents began or completed their residency before or after the study period or transferred into or out of our program during residency training. Also, our records were incomplete for several residents who completed their residency in 1991 and 1992.

In our program, resident rotations are organized in the following manner: In the 1st year, emphasis is placed on basic rotations such as chest, bone, gastrointestinal and genitourinary radiology, nuclear medicine, pediatrics, and emergency radiology in order to prepare for call duties. An introduction to neuroradiology and cross-sectional body imaging is also incorporated into the 1st-year curriculum. While many rotations are repeated during the 2nd and 3rd years of residency, certain rotations such as body magnetic resonance imaging, mammography, and interventional radiology are introduced for the first time. During the 4th year of training, our residents usually spend time in 12 or 13 subspecialty areas to review their entire residency experience prior to taking the ABR oral examination.

At the conclusion of each 3-4-week subspecialty rotation, the division director or faculty member who has the most contact with a resident completes an evaluation form for that resident. Although residents are subjectively evaluated in a number of different categories, we chose to record data from two categories that are applicable to every rotation: "overall performance" and "general knowledge." Descriptions of these categories have previously been documented in the radiology literature (9). Overall performance reflects a bottom-line impression of how the resident performed during the rotation. General knowledge consists of an assessment of radiologic and general medical knowledge as it relates to the specific rotation. We believed that this latter category might represent the closest approximation of written test performance.

Two different evaluation forms were used during the study period. Between 1991 and 1995, resident performance was scored in each category on a five-point scale: 1, poor; 2, below average; 3, average; 4, above average; and 5, outstanding. A new four-point scoring system was recommended at the 1995 meeting of the Association of University Radiologists by Littlefield (Littlefield J, "The Evaluation of Residents," oral communication, April 1995) and implemented at our institution in July 1995. In this system, residents are scored in each category as follows: 1, unsatisfactory; 2, circumscribed deficits; 3, effective; or 4, outstanding. A mean score was calculated for each resident in the general knowledge and overall performance categories over all forms collected during a given year (thus reflecting the average level of performance for that year).

In accordance with our plan to compare resident performance evaluations with their ACR or ABR examination results, we undertook to combine data from the two differently scored evaluation forms. We treated each year separately in order to weight individual residents' evaluations equally. Within each year, a z score was created for each set of forms: the five-point form and the four-point form. The z scores were translated to a standardized mean of 3 and a standard deviation of 1. The two scales were combined to form what is designated as the "corrected z scale." With this scale, all of the resident scores of a given year could be compared with the scores on the written examination taken by the residents during that same year.

During testing for trends over time, the relationship between the five-point and the four-point scales became a concern. To transform the five-point scale to be comparable to the four-point scale, a z score was calculated for all ratings over all 4 years of residency for evaluations with just the five-point form. This z score's standard deviation and mean were then set to the standard deviation and mean of


the four-point scale over all 4 years. This standardization allows for a comparison of the ratings over all 4 years of residency. Without this normalization, only 14 residents had evaluations that could be compared over all 4 years. No residents were evaluated exclusively with the new forms during all 4 years of their residency.
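The corrected z-scale computation described above can be sketched in Python. This is a hypothetical illustration: the per-resident mean ratings below are invented, not the study's data.

```python
# Sketch of the corrected z-scale standardization: within a given residency
# year, each form's mean ratings are converted to z scores and rescaled to
# mean 3, standard deviation 1, so the five-point and four-point scales can
# be pooled on a common scale. Illustrative numbers only.

def corrected_z(ratings, target_mean=3.0, target_sd=1.0):
    """Standardize a list of ratings to the target mean and standard deviation."""
    n = len(ratings)
    mean = sum(ratings) / n
    sd = (sum((r - mean) ** 2 for r in ratings) / (n - 1)) ** 0.5
    return [target_mean + target_sd * (r - mean) / sd for r in ratings]

# Hypothetical per-resident mean ratings for one residency year:
five_point_means = [3.4, 3.6, 3.7, 3.5, 3.9]   # old form (five-point scale)
four_point_means = [3.1, 3.2, 3.3, 3.0]        # new form (four-point scale)

# After standardization the two sets share a common scale and can be
# combined into a single "corrected z scale" for that year.
combined = corrected_z(five_point_means) + corrected_z(four_point_means)
```

Because each form-year group is standardized separately, a resident's position relative to peers rated on the same form is preserved while the arbitrary difference between the two scales is removed.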

ACR In-training Examination Scores

The results (total number of correct answers and/or percentile scores) of the yearly in-training examination given by the ACR were available for 65 1st-year, 55 2nd-year, and 47 3rd-year residents. Only overall scores for the entire examination were recorded; specific scores in subspecialty areas were not considered in our analysis. Fewer examination scores than rotation evaluation scores were available because, each year, some of our residents did not take the ACR examination due to illness or vacations. ACR examination percentiles were provided for several residents (six in the 1st year, seven in the 2nd year, and seven in the 3rd year) for whom the total number of correct answers was not explicitly reported; for these residents, results listed only a percentile and a percentage correct score. ACR examination scores for 11 4th-year residents who took this examination in 1991 and 1992 were excluded from analysis because of the small sample size. The ACR examination has not been given to our 4th-year residents since 1992.

ABR Written Examination Scores

Results (overall raw numerical scores and percentile scores) for the entire clinical written examination administered by the ABR, which were available for each of the 54 4th-year residents, were also recorded, and a mean score for the entire group was calculated. Scores for the physics portion of the written examination were not included in our analysis.

Statistical Analysis

Repeated-measures analysis of variance was performed to assess the trend of mean rotation evaluation scores over time and to test the trend in the ACR in-training examination results over time. Pearson correlation coefficients were then calculated for comparisons between rotation evaluation scores in the categories of overall performance and general knowledge and the ACR in-training examination raw scores and percentiles for 54 1st-year, 52 2nd-year, and 46 3rd-year residents in whom both same-year evaluation and examination data were available. Similarly, for the 54 4th-year residents, correlations between 4th-year evaluations and the ABR written examination scores and percentiles were calculated. For these analyses, we compared rotation evaluation and examination scores from the same year only, since we were searching for any relationship between these two measures assessing performance during the same time period. The Pearson correlation coefficients were calculated for comparisons of written examination scores separately with scores on the old and new evaluation forms, and also with scores on the combined corrected z scale. For all comparisons, a P value of less than .05 was considered to indicate a statistically significant difference.
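As a concrete illustration of the correlation analysis (with invented paired values, not the study's data), the Pearson product-moment coefficient for same-year evaluation and examination scores can be computed as follows:

```python
# Pearson correlation between same-year rotation evaluation scores and
# examination scores. The paired values below are invented for illustration.

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

evals = [3.1, 3.4, 2.9, 3.6, 3.2]    # corrected z-scale evaluation means
scores = [350, 390, 330, 410, 365]   # the same residents' ACR raw scores

r = pearson_r(evals, scores)         # positive r indicates agreement
```

A coefficient near zero, as found for the 1st-year comparisons below, would indicate that the two measures carry essentially independent information about a resident.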

RESULTS

Rotation Evaluation Scores

The means of the mean rotation evaluation scores, and their standard deviations, in the categories of general knowledge and overall performance for residents evaluated with the old and with the new evaluation forms for each of the 4 years of residency are listed in Table 1. Table 1 also lists the means and standard deviations for all of the residents during each year with use of the combined z scale. Table 2 shows the yearly results for the 14 residents who were evaluated with the old five-point form for each of the 4 years and those for the 38 residents who were evaluated over all 4 years with both the old five-point forms and the new four-point forms. For each grouping, there was a slight but statistically significant increase in each measure of resident performance over the entire 4-year period (repeated-measures ANOVA).
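A minimal sketch of the one-way repeated-measures ANOVA underlying these trend tests is shown below, assuming one mean rating per resident per year; the ratings are invented for illustration.

```python
# One-way repeated-measures ANOVA for a trend across residency years.
# Each row is one resident; each column is that resident's mean rating
# for years 1-4. The numbers are invented, not the study's data.

def rm_anova(data):
    """Return (F, df_conditions, df_error) for the within-subject effect."""
    n = len(data)        # subjects (residents)
    k = len(data[0])     # conditions (residency years)
    grand = sum(sum(row) for row in data) / (n * k)
    year_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition total variability into condition, subject, and error terms.
    ss_cond = n * sum((m - grand) ** 2 for m in year_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error

ratings = [
    [3.0, 3.2, 3.4, 3.5],
    [2.8, 3.0, 3.1, 3.3],
    [3.2, 3.3, 3.5, 3.6],
]
f, df1, df2 = rm_anova(ratings)   # a large F reflects a consistent rise across years
```

Removing between-subject variability before forming the error term is what makes the repeated-measures design sensitive to small but consistent year-to-year gains of the kind reported here.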

ACR In-training Examination Scores

The mean number of correct answers selected on the ACR in-training examination improved significantly (repeated-measures ANOVA, n = 40, within-subject effect, F = 46.44, df = 2, 78) during the first 3 years of residency, from 332 ± 32 (range, 265-402) in the 1st year to 363 ± 35 (range, 285-450) in the 2nd year and to 387 ± 35 (range, 318-484) in the 3rd year. The mean percentile scores on the ACR examination did not change significantly: our 1st-year residents performed in the 60th percentile (±27) (range, 6%-99%), our 2nd-year residents in the 57th percentile (±26) (range, 4%-96%), and our 3rd-year residents in the 56th percentile (±25) (range, 5%-99%). These findings indicate that our residents' scores improved comparably to those of all residents who took the examination.

Table 1
Resident Evaluation Ratings: Old and New Forms and Combined New and Standardized Old Forms

                       Old Form (five-point)    New Form (four-point)    Combined (standardized* old + new)
Year                   n     Mean    SD         n     Mean    SD         n     Mean    SD

General Knowledge
1                      38    3.43    0.19       27    3.09    0.16       65    3.09    0.15
2                      41    3.60    0.29       27    3.20    0.12       68    3.20    0.18
3                      34    3.61    0.33       27    3.17    0.30       61    3.20    0.26
4                      28    3.71    0.44       26    3.31    0.24       54    3.29    0.27

Overall Performance
1                      38    3.50    0.27       27    3.14    0.17       65    3.16    0.16
2                      41    3.64    0.36       27    3.25    0.16       68    3.25    0.18
3                      34    3.68    0.32       27    3.24    0.20       61    3.26    0.19
4                      28    3.69    0.49       26    3.33    0.24       54    3.31    0.26

*Old evaluation form ratings for general knowledge over the 4 years (n = 141) were standardized by creating z scores and then normalizing to the mean and standard deviation of the ratings of the new form over the 4 years (n = 107; mean = 3.19; standard deviation = 0.23). Old evaluation form ratings for overall performance over the 4 years (n = 141) were standardized by creating z scores and then normalizing to the mean and standard deviation of the ratings of the new form over the 4 years (n = 107; mean = 3.24; standard deviation = 0.20).

Table 2
Trends in Residents' Evaluations over 4 Years, for Those Evaluated with the Same Form and for Those Evaluated with the Standardized Old Form and New Form Ratings Combined

                       Original Ratings with the        Combined Ratings for
                       Same (old) Form All 4 Years      All 4 Years (n = 38)†
                       (n = 14)*
Year                   Mean      SD                     Mean      SD

General Knowledge
1                      3.37      0.22                   3.08      0.14
2                      3.52      0.21                   3.20      0.17
3                      3.59      0.31                   3.26      0.19
4                      3.69      0.40                   3.31      0.24

Overall Performance
1                      3.37      0.25                   3.17      0.15
2                      3.49      0.23                   3.26      0.16
3                      3.64      0.30                   3.30      0.17
4                      3.71      0.48                   3.34      0.24

*For general knowledge ratings with the old (five-point) scale: repeated-measures analysis of variance (ANOVA), within-subject effect, F = 5.46, P < .01, df = 3, 39. For overall performance ratings with the old (five-point) scale: repeated-measures ANOVA, within-subject effect, F = 4.24, P < .01, df = 3, 39.
†For general knowledge ratings standardized to the new (four-point) scale and then combined with the new scale: repeated-measures ANOVA, within-subject effect, F = 16.15, P < .01, df = 3, 111. For overall performance ratings standardized to the new (four-point) scale and then combined with the new scale: repeated-measures ANOVA, within-subject effect, F = 9.12, P < .01, df = 3, 111.

ABR Written Examination Scores

The mean of the scores on the ABR examinations for all 54 4th-year residents was 553 ± 50 (range, 450-650).

Comparison of Evaluation Form Scores with Written Examination Scores

Table 3 lists the correlation coefficients obtained from comparisons between yearly rotation evaluations in the general knowledge and overall performance categories and examination scores from the same year. Figures 1-4 depict graphs that compare the correlations between rotation evaluations in the overall performance category (with 95% confidence intervals) for each resident year. Many significant positive correlations were identified between evaluation scores and written examination scores in the 2nd, 3rd, and 4th years of residency. The most significant correlations were noted in the 3rd year, between the corrected z scale rotation evaluation scores in the general knowledge category and both the raw score and percentile from the ACR in-training examination from the same year. Similarly, a significant positive correlation was found between the corrected z scale rotation evaluation scores in overall performance for the 3rd year and the raw scores from the corresponding ACR examination. There were also significant positive correlations between 4th-year rotation corrected z scale evaluation scores in both assessed categories and the ABR written examination scores and percentiles


Table 3
Correlation Coefficients for Comparisons of Rotation Evaluation Scores and ACR and ABR Examination Scores

                           Old Form*                      New Form†                      z Scores‡
                           General        Overall         General        Overall         General        Overall
Examination                Knowledge      Performance     Knowledge      Performance     Knowledge      Performance

1st year
ACR total no. correct      -.03 (n=36)    -.09 (n=36)     -.05 (n=18)    .15 (n=18)      -.02 (n=54)    -.02 (n=54)
ACR percentile             -.04 (n=36)    -.01 (n=36)     .08 (n=24)     .06 (n=24)      .01 (n=60)     .01 (n=60)
2nd year
ACR total no. correct      .28 (n=36)     .35§ (n=36)     .48 (n=16)     .45 (n=16)      .32§ (n=52)    .35§ (n=52)
ACR percentile             .33§ (n=36)    .33§ (n=36)     .12 (n=23)     .09 (n=23)      .25 (n=59)     .23 (n=59)
3rd year
ACR total no. correct      .42§ (n=28)    .43§ (n=28)     .38 (n=18)     .28 (n=18)      .41‖ (n=46)    .39‖ (n=46)
ACR percentile             .28 (n=28)     .28 (n=28)      .46§ (n=25)    .44§ (n=25)     .36‖ (n=53)    .35‖ (n=53)
4th year
ABR score                  .33 (n=28)     .36 (n=28)      .28 (n=26)     .28 (n=26)      .30§ (n=54)    .32§ (n=54)
ABR percentile             .31 (n=28)     .35 (n=28)      .27 (n=26)     .29 (n=26)      .29§ (n=54)    .32§ (n=54)

Note.—General knowledge and overall performance refer to categories on the rotation evaluation forms.
*Old form uses a five-point scale.
†New form uses a four-point scale.
‡z scores created for each form separately for each year, with standard deviation = 1 and mean set to 3.
§P < .05.
‖P < .01.

and between 2nd-year rotation z-scale evaluation scores in both assessed categories and the raw scores from the corresponding ACR examination. In contrast, there was no correlation between evaluation scores in either category and scores from the ACR in-training examination during the 1st year of residency.


DISCUSSION

The ABR encourages residency programs to administer in-training examinations in order "to assess the progress of residents in training and to identify individual and/or programmatic strengths or weaknesses" (10). It is for similar reasons that the ACGME encourages resident testing as a means of evaluating resident performance and program quality. Such goals of standardized testing have been fairly consistent among various specialties in medicine. For example, general surgery residency programs use in-training examinations so residents may demonstrate competence in basic surgical knowledge (11). Residency programs do not rely solely on examination scores to determine competence, because the programs are required to regularly evaluate resident clinical performance. This is largely because many other attributes and skills are better assessed with more subjective evaluation methods. The ultimate goal of evaluation forms is not unlike


"d 320 #_ 300 5 z 280 260

I 1

'

I 2

'

I 3

'

I 4

'

I 5

'

I 6

Overall z Score Comparison of overall z score and number of questions answered correctly on the ACR examination for year 1. Graph shows best-fit line with 95% confidence curve. Figure 1.

that of written examinations, namely, to identify and correct deficiencies. Several groups have looked at the relationship between faculty evaluation of residents and performance on standardized examinations. Most applicable are two studies (6,7) that evaluated the correlation between subjective ranking of residents by faculty and ranking of resident raw scores on the ACR in-training examination. These studies found that pooled faculty ranking of

Figure 2. Comparison of overall z score and number of questions answered correctly on the ACR examination for year 2. Graph shows best-fit line with 95% confidence curve.

Figure 4. Comparison of overall z score and ABR test score for year 4. Graph shows best-fit line with 95% confidence curve.

residents from "best" to "weakest" was predictive of performance on the ACR in-training examination. Of course, such ranking of residents is not frequently performed at most institutions. The goal of our review was to assess the correlation between two routinely obtained measures of resident performance (rotation evaluation scores and written examination scores), since we had previously suspected that little correlation would exist. To our surprise, we found that there was a strong positive correlation between rotation evaluation scores in both general knowledge and overall performance and the corresponding examination results from the same year. Our findings indicate that poor rotation evaluation scores obtained during the 2nd, 3rd,

Figure 3. Comparison of overall z score and number of questions answered correctly on the ACR examination for year 3. Graph shows best-fit line with 95% confidence curve.

and 4th years of residency are predictive of poor performance on standardized written examinations taken during that same year. Our study has even more intriguing implications when its results are considered in light of the findings of Baumgartner and Peterman (3,4), who found that poor performance on the ACR in-training examination can be predictive of subsequent poor performance on the ABR written examination taken later in the residency. Together, these results suggest that a resident who performs poorly on 2nd- or 3rd-year rotations is at risk for performing poorly on the ACR examination taken that same year but, even more important, also is at risk for performing poorly on the ABR written examination during the 4th year. The lack of correlation between rotation evaluations and the ACR in-training examination scores in the 1st year of residency may be due to the difficulty that faculty have in accurately assessing residents as they begin their radiology training. First-year residents are exposed to new radiologic principles, imaging modalities, and faculty each month. Our results suggest that it may not be necessary to take the step described in a 1993 report (8), in which residents were identified as needing remedial training based on poor evaluations and low examination scores as early as the 1st year of residency. It may be more helpful to consider remediation for residents who perform poorly after the 1st year, at a time when there is a strong correlation between subjective and objective measures of performance. In our study, an assumption has been made that the standard subjective and objective measurements used to gauge resident performance at our (and most other) institutions


are accurate. It has been suggested by some (12,13) that in the evaluation of resident performance a greater emphasis should be placed on noncognitive skills, since these skills are highly valued by radiology faculty. Such traits may not be measured on either standard monthly rotation-evaluation forms or on written examinations. In fact, innovative evaluation methods have been suggested in the resident education literature to alleviate the chronic dissatisfaction felt by residents and faculty about the state of resident evaluation (14). However, the purpose of this study was not to address the controversy of what constitutes a proper assessment of resident performance but rather to determine whether our current subjective evaluation process correlates with the objective evaluation techniques in use. Certain limitations of this study should be addressed. Our sample size was small, since our data review included only residents from a single institution. The value of follow-up studies with larger sample sizes has already been shown in the studies comparing scores from the ACR in-training examination and ABR written examination (3,4). A multi-institutional study including much larger numbers of residents would be of greater importance and would be helpful in confirming (or refuting) the validity of our results. A second limitation relates to the use of two different evaluation forms during the study period, which decreased the statistical power of the original data and complicated the statistical analysis, necessitating the creation of an entirely separate scale so that all evaluation scores could be combined in our analysis. Last, we did not specifically correlate rotation performance in individual subspecialties with the corresponding examination scores for those subspecialties. It is possible that, had this been done, we might have been able to identify stronger or weaker correlations between rotation evaluation scores and examination scores for specific subspecialty areas.
We grouped all subspecialty rotation evaluation forms together for each year, because we believed that such a grouping would provide us with a mean resident score that would more accurately reflect overall performance for that year.


In summary, our results indicate that, despite the subjectivity of our rotation evaluation process, rotation evaluation scores correlate well with objective assessment data obtained during the same year of residency for 2nd-, 3rd-, and 4th-year residents. Residents who are perceived as not doing well on their rotations during the 2nd, 3rd, and 4th years are more likely to do poorly on the ACR in-training examination or ABR examination taken during the same year. In contrast, there is no correlation between 1st-year resident rotation performance and 1st-year ACR examination scores.

ACKNOWLEDGMENT

The authors thank Peter J. Hedlesky for his assistance in the preparation of the manuscript.

REFERENCES

1. Gale ME. Resident evaluations: a computerized approach. AJR Am J Roentgenol 1997; 169:1225-1228.
2. Schreiber MH. Evaluation and documentation of performance. Invest Radiol 1979; 14:453-455.
3. Baumgartner BR, Peterman SB. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores. Acad Radiol 1996; 3:873-878.
4. Baumgartner BR, Peterman SB. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores II. Multi-institutional study. Acad Radiol 1998; 5:374-379.
5. Curtis DJ, Amis ES Jr, Cruess DF, Riordan DD. Evaluation of radiology resident cognitive performance. Invest Radiol 1985; 20:640-642.
6. Curtis DJ, Amis ES Jr, Cruess DF, Riordan DD. Ranking: a reproducible semiobjective means of evaluating overall resident performance. Invest Radiol 1985; 20:757-758.
7. Curtis DJ, Cruess DF, Riordan DD, Allman RM. Ranking: a year three follow-up in a different institution. Invest Radiol 1988; 23:541-544.
8. Edeiken BS. Remedial program for diagnostic radiology residents. Invest Radiol 1993; 28:269-274.
9. Cuttino JT, Scatliff JH. Resident performance evaluation. Invest Radiol 1987; 22:986-989.
10. American Board of Radiology. Booklet of information for diagnostic radiology 2000-2001. Tucson, Ariz: American Board of Radiology, 2000.
11. Garvin PJ, Kaminski DL. Significance of the in-training examination in a surgical residency program. Surgery 1984; 96:109-112.
12. Tarico VS, Smith WL, Altmaier EM, Franken EA Jr, VanVelzen D. Critical incident interviewing in evaluation of resident performance. Radiology 1984; 152:327-329.
13. Altmaier EM, Smith WL, Wood PS, et al. Cross-institutional stability of behavioral criteria desirable for success in radiology residency. Invest Radiol 1989; 24:249-281.
14. Gordon MJ. Cutting the Gordian knot: a two-part approach to the evaluation and professional development of residents. Acad Med 1997; 72:876-880.