How well does applicant rank order predict subsequent performance during radiology residency?


Saroja Adusumilli, MD, Richard H. Cohan, MD, Kelley W. Marshall, MD, James T. Fitzgerald, PhD, Mary S. Oh, BS, Barry H. Gross, MD, James H. Ellis, MD

Rationale and Objectives. Residency selection committees expend substantial time and resources assessing the quality of residency applicants to derive an appropriate rank order for the National Resident Matching Program. The authors determined whether there is a relationship between the rank number or rank percentile of applicants selected for a residency training program and subsequent radiology residency performance.

Materials and Methods. Records of radiology residents completing their residency between 1991 and 1998 were reviewed. Available rank numbers and rank percentiles for each resident were compared with subsequent performance. Performance was assessed subjectively by 4th-year radiology rotation evaluation forms and by the retrospective recall of four senior faculty members. It was assessed objectively by numerical and percentile scores on the written portion of the American Board of Radiology (ABR) examinations. Correlation coefficients were obtained for each comparison.

Results. Rank number and rank percentile were not significantly correlated with 4th-year resident rotation evaluations or with ABR written examination scores or percentiles. A small correlation existed between rank percentile and the retrospective evaluation of resident performance by the four senior faculty members.

Conclusion. Applicant rank number and rank percentile do not correlate with subsequent radiology residency performance as assessed on rotation evaluation forms or the ABR written examinations.

Key Words. Resident ranking, resident evaluation, resident performance.

Acad Radiol 2000; 7:635-640 (Vol 7, No 8, August 2000)

From the Departments of Radiology (S.A., R.H.C., K.W.M., B.H.G., J.H.E.) and Medical Education (J.T.F., M.S.O.), University of Michigan Health System, Ann Arbor. Received August 6, 1999; revision requested November 6; revision received November 24; accepted December 3. Address correspondence to R.H.C., Department of Radiology, University of Michigan Hospital, 1500 E Medical Center Dr, Ann Arbor, MI 48109-0036.

©AUR, 2000

The computerized ranking system of the National Resident Matching Program was designed to facilitate matches between residency programs and medical student applicants. Most residency programs have embraced this nationwide ranking system for all medical students because it has replaced uncertainty in the application process with order (1). Residency selection committees often expend a great deal of time and extensive resources assessing the quality of medical students who apply to residency training programs, so that an appropriate rank order can be derived for the matching program. Despite these efforts, there is little agreement about whether any selection process is effective in predicting which applicants will perform best during their subsequent residency.

A 1990 survey revealed that radiology residency program directors place the greatest emphasis on academic performance when evaluating written application material (2). However, one study examined academic achievement in medical school, as measured by National Board of Medical Examiners test scores, class rank, grades, and the dean's letter. It found that this achievement does predict performance during the 1st year of residency, but its correlation with subsequent performance is poor (3). The same study also found no correlation between selection in medical school to the Alpha Omega Alpha national medical honor society and performance during residency (3). Studies of the value of the interview, where noncognitive traits can best be evaluated, in determining resident potential have shown conflicting results. One study found no relationship between the personal applicant interview and performance in an internal medicine residency (4). Conversely, other studies found that assessment of noncognitive attributes during a directed interview was more important than evaluation of academic qualifications in determining an applicant's success in a residency training program (5-9).

For the past 7 years, our institution has used a complex system that assesses both the cognitive and the noncognitive abilities of applicants to determine applicant rank order. We performed this study to learn whether this system had been worthwhile, that is, whether it had achieved its goal of predicting which medical students would become the most successful residents. Specifically, we evaluated whether, at our institution, there is any correlation between applicant ranking and the most commonly used measures of subsequent radiology resident performance.

MATERIALS AND METHODS

The records of 54 fourth-year radiology residents enrolled in the residency training program in diagnostic radiology at our institution between 1991 and 1998 were retrospectively reviewed. Resident performance was assessed with both subjective and objective criteria. Subjective assessment involved compiling data from all rotation evaluation forms obtained during the 4th year of residency and obtaining a retrospective ranking by four senior faculty members. Objective measurement of performance was made by recording scores obtained on the American Board of Radiology (ABR) written examinations.

Subjective Assessment of Resident Performance

Rotation evaluation forms. During the 4th year of radiology residency at our institution, residents usually rotate through 12 or 13 subspecialty areas to review their entire residency experience before the ABR oral examination. We reviewed only evaluation forms from these 4th-year rotations, rather than forms obtained during earlier years of residency, because we believed that 4th-year evaluations best reflect overall performance during residency training. At the conclusion of each subspecialty rotation, the division director or the staff member who had had the most contact with the resident completed an evaluation form. Residents were evaluated in a number of categories. We recorded data from only the two categories that were applicable to every rotation during the 7-year review period: overall performance and general knowledge. Overall performance reflects a bottom-line impression of how the resident performed during the rotation. General knowledge is an assessment of radiologic and general medical knowledge as related to that specific rotation.


Two different evaluation forms were used. Between 1991 and 1995, 4th-year residents were scored in each category on a five-point scale: 1 = poor, 2 = below average, 3 = average, 4 = above average, and 5 = outstanding. A new four-point scoring system recommended at the 1995 meeting of the Association of University Radiologists (J. Littlefield, PhD, oral communication, April 1995) was implemented in July 1995. In this system, residents are scored in each category as 1 = unsatisfactory, 2 = circumscribed deficits, 3 = effective, and 4 = outstanding. For each 4th-year resident, a mean score was calculated in the general knowledge and overall performance categories for the entire year, thus reflecting performance on every rotation during that year. To combine data from the two differently scored evaluation forms, we created an additional scale, named the corrected z-scale. On this scale, the mean for all 4th-year residents was set at 3 and the standard deviation (SD) at 1. In this way, mean scores for residents obtained at any time between 1991 and 1998 could be compared directly.

Retrospective evaluation by senior faculty. We asked four senior faculty members who knew all of the residents to provide a retrospective score of overall performance for each 4th-year resident. These faculty members came from the divisions of chest, abdominal computed tomographic, pediatric, and vascular and interventional radiology. They graded overall resident performance on a five-point scale (0 = poor, 1 = below average, 2 = average, 3 = above average, and 4 = outstanding), and a mean score was calculated for each resident. None of the senior faculty had access to resident rank order data.
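One possible reading of the corrected z-scale is a within-form standardization. Each resident's yearly mean is expressed as a z score relative to the other residents scored on the same form, then mapped to a scale with mean 3 and SD 1. The sketch below assumes that standardization is done within each form group and that the population SD is used; the text specifies neither. The function name `corrected_z_scale` and the input scores are hypothetical.

```python
from statistics import mean, pstdev


def corrected_z_scale(scores, target_mean=3.0, target_sd=1.0):
    """Rescale one form group's resident mean scores to mean 3 and SD 1.

    Assumptions (not specified in the paper): standardization is done
    separately within each evaluation-form group, and the population SD
    is used.
    """
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have no spread; cannot standardize")
    # z score relative to this group, then shifted and scaled.
    return [target_mean + target_sd * (s - m) / sd for s in scores]


# Hypothetical per-resident yearly means (not data from the study).
old_form = [3.2, 3.7, 4.2]  # 1991-1995 five-point form
new_form = [3.1, 3.3, 3.5]  # 1995-1998 four-point form

combined = corrected_z_scale(old_form) + corrected_z_scale(new_form)
print([round(x, 2) for x in combined])
```

Each form group is centered and scaled separately. A resident's position on the corrected scale therefore reflects standing relative to peers scored on the same form. This implicitly assumes that the 1991-1995 and 1995-1998 cohorts were of similar overall ability.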

Objective Assessment of Resident Performance

The results of the ABR written examination (overall raw numerical scores and percentile scores) were also recorded for each of the 4th-year residents.

Assessment of Applicant Rank

Rank order data were retrospectively available for 26 residents. For each of these residents, we recorded the resident's numerical position on our match list in the year in which the resident was an applicant (rank number). We also recorded the total number of applicants ranked in that year. From this information, we calculated each resident's rank percentile. Rank percentile is defined as 100% minus the applicant's rank number divided by the total number of ranked applicants in that year (expressed as a percentage). It identifies the percentage of ranked applicants who were ranked below the applicant during that year. For example, a resident ranked second of 50 would be assigned a rank percentile of 96% (100% - 2/50 = 100% - 4%). This indicates that the applicant received a ranking superior to that of 96% of all applicants in that match group. We assessed rank percentile, as well as rank number, to standardize our data, because this allowed us to correct for differences in the total number of applicants ranked from year to year. For example, we were not sure whether being ranked 25th of 75 ranked applicants (rank percentile, 67%) indicated better applicant quality than being ranked 25th of 26 ranked applicants (rank percentile, 4%).

Rank order is determined at our institution according to a complex method. Two or three faculty members on the Residency Selection Committee review all written documents for each applicant. These include a completed application form, curriculum vitae, personal statement, medical school transcript, dean's letter, and at least two letters of recommendation. All applications are assessed in five categories that attempt to evaluate cognitive and noncognitive skills. The cognitive categories are educational background (quality of college and medical school attended), scholarship (grades, class rank, membership in Alpha Omega Alpha, and National Board of Medical Examiners or United States Medical Licensing Examination scores), and intellectual curiosity (research and volunteer activity). The noncognitive categories are interpersonal skills (such as the ability to communicate with others) and responsibility and maturity. Applicants are graded in each of these five areas on a scale of 0-5, with 5 being the highest score. The individual scores are multiplied by factors of one (for intellectual curiosity), two (for educational background, interpersonal skills, and responsibility and maturity), or three (for scholarship). This weights more heavily the areas that we consider most important. The weighted scores are then summed.
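The pre-interview application score described above can be written as a weighted sum. Each category grade (0-5) is multiplied by its weight and the five products are added. This is an illustrative sketch: the category identifiers and the function `weighted_application_score` are hypothetical, and the interview-based subjective modifications are not modeled.

```python
# Category weights as described in the text; the dictionary keys are
# hypothetical identifiers chosen for this sketch.
WEIGHTS = {
    "educational_background": 2,
    "scholarship": 3,
    "intellectual_curiosity": 1,
    "interpersonal_skills": 2,
    "responsibility_and_maturity": 2,
}


def weighted_application_score(grades):
    """Return the weighted sum of five category grades (each 0-5).

    This is the pre-interview score; interview-based subjective
    modifications are not modeled here.
    """
    if set(grades) != set(WEIGHTS):
        raise KeyError("grades must cover exactly the five categories")
    total = 0
    for category, grade in grades.items():
        if not 0 <= grade <= 5:
            raise ValueError(f"{category}: grade {grade} is outside 0-5")
        total += WEIGHTS[category] * grade
    return total


# The maximum possible score is 5 x (1 + 2 + 2 + 2 + 3) = 50.
best = {category: 5 for category in WEIGHTS}
print(weighted_application_score(best))  # 50

# A hypothetical applicant, not taken from the study.
applicant = {
    "educational_background": 4,
    "scholarship": 5,
    "intellectual_curiosity": 3,
    "interpersonal_skills": 4,
    "responsibility_and_maturity": 4,
}
print(weighted_application_score(applicant))  # 42
```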
With this system, the highest possible applicant score is 50 (5 points in each category multiplied by a total weight of 10). Each evaluating faculty member then makes subjective modifications to the final score after a 20- to 30-minute interview with the applicant. After the interview process concludes, all applicant scores given by each Residency Selection Committee faculty member are averaged to calculate a mean score and SD for that faculty member. Each candidate's score is then expressed as the number of SDs from that faculty member's mean. This equalizes variations in scoring between faculty members, because some tend to be consistently higher or lower scorers. The adjusted scores are then summed to give a final score for each applicant. The applicants are listed from highest to lowest score, thus establishing a tentative rank order. A group meeting is subsequently held, at which Residency Selection Committee members may make further subjective modifications to the rank order list if all members of the committee agree on these changes. Reasons for such adjustments include concerns raised at the interview that are shared by all members of the Residency Selection Committee. They also include subsequent direct communication between faculty at the applicant's medical school and a Residency Selection Committee member.
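The per-faculty adjustment described above amounts to converting each faculty member's scores to z scores before summing across faculty. A minimal sketch follows. It assumes that every faculty member scored every applicant and that the sample SD was used; the text specifies neither. `standardized_rank_order` and the example scores are hypothetical, and the committee's final consensus adjustments are not modeled.

```python
from statistics import mean, stdev


def standardized_rank_order(scores_by_faculty):
    """Order applicants by their summed per-faculty z scores.

    scores_by_faculty maps each faculty member to {applicant: score}.
    Assumptions (not stated in the paper): every faculty member scored
    every applicant, and the sample SD is used.
    """
    totals = {}
    for faculty, scores in scores_by_faculty.items():
        values = list(scores.values())
        m, sd = mean(values), stdev(values)
        if sd == 0:
            raise ValueError(f"{faculty} gave every applicant the same score")
        for applicant, score in scores.items():
            # Express the score in SDs from this faculty member's own mean.
            totals[applicant] = totals.get(applicant, 0.0) + (score - m) / sd
    # Highest summed score first: position 1 is the tentative rank number 1.
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical scores: faculty A scores leniently with a narrow spread;
# faculty B spreads scores widely.
scores = {
    "A": {"P": 45, "Q": 40, "R": 35},
    "B": {"P": 10, "Q": 40, "R": 25},
}
print(standardized_rank_order(scores))  # ['Q', 'P', 'R']
```

In this toy example, standardization changes the order of P and R. Raw sums would favor R (60 vs 55) because faculty B's scores span a wider range. Z scores, by contrast, give each faculty member's judgments equal weight.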

Statistical Comparisons

Initial statistical analysis was performed to assess the consistency of our measures of resident performance. For all 54 residents, correlation coefficients were calculated among the following measures: 4th-year rotation evaluation scores in overall performance and general knowledge, faculty recall scores, and ABR written examination scores and percentiles. For all comparisons, P < .05 was considered to indicate statistical significance. We also compared applicant rank number with rank percentile to determine whether these two measures were correlated.

Subsequently, statistical comparisons were made between applicant rank data (rank number and rank percentile) and resident performance data. The performance data were rotation evaluation scores in overall performance and general knowledge, faculty recall scores, and ABR written examination scores and percentiles. These comparisons were restricted to the subset of 26 residents for whom both sets of data were available; rank order data were not retrospectively available for the other 28 residents. Correlation coefficients were calculated for each pair of variables compared, again with P < .05 considered to indicate statistical significance.

An additional comparison was made for each of five matches. In each match, the performance of the matched applicant who received the lowest rank number was compared with that of the matched applicant who received the highest rank number. Matches were included in this analysis only if more than two residents were subsequently assigned to our program. We performed this comparison because we thought that differences in rank number might be important only for applicants considered either most desirable or least desirable.

RESULTS

The means and SDs of the mean rotation evaluation scores for the 28 residents evaluated with the old evaluation forms were 3.7 ± 0.4 (range, 3.0-4.6) for general knowledge and 3.7 ± 0.5 (range, 3.0-4.8) for overall performance, on a scale of 1-5. The corresponding values for the 26 residents evaluated with the new evaluation forms were 3.3 ± 0.2 (range, 2.8-4.0) for general knowledge and 3.3 ± 0.2 (range, 2.8-3.8) for overall performance, on a scale of 1-4. The mean and SD of the mean retrospective faculty recall scores for all 54 residents were 2.7 ± 0.5 (range, 1.0-4.0) on a scale of 1-4. The mean and SD of the ABR written examination scores for all 54 residents were 553 ± 50 (range, 450-650). The mean rank number and SD for the 26 residents for whom rank data were available were 23 ± 17 (range, 1-63). The mean number and SD of applicants ranked each year were 60 ± 18 (range, 23-81).

We identified significant positive correlations between the subjective and objective measures of resident performance (Table 1). Specifically, 4th-year rotation evaluation scores in both assessed categories correlated significantly and positively with ABR written examination scores and percentiles. Retrospective faculty recall scores also correlated significantly and positively with ABR written examination scores and percentiles. The strongest positive correlations were found between the two subjective measures of resident performance (rotation evaluation scores and retrospective faculty recall scores) (Table 1).

Table 1
Comparison of Subjective and Objective Assessments of Resident Performance (n = 54)

                                       Objective Measures            Subjective Measure
Subjective Measure                     ABR Score     ABR Percentile  Faculty Recall Score
4th-year rotation evaluation scores
  Overall performance                  0.32 (.02)    0.32 (.02)      0.63 (<.01)
  General knowledge                    0.30 (.03)    0.29 (.04)      0.56 (<.01)
Faculty recall scores                  0.33 (.01)    0.35 (<.01)     ...

Note. Data are correlation coefficients (r); numbers in parentheses are P values. ABR = American Board of Radiology written examination.

Rank number and rank percentile were strongly and inversely correlated (r = -0.88, P < .01), confirming that these two measures are closely associated. The inverse correlation is expected because applicants with lower rank numbers have higher rank percentiles. For example, an applicant with a rank number of 1 (a very low rank number) in a pool of 50 ranked applicants would have a rank percentile of 98% (a very high rank percentile).
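The rank percentile definition and the expected inverse relationship with rank number can be checked numerically. The sketch assumes Pearson's product-moment coefficient, since the text does not name the correlation coefficient used, and it computes no P values. `rank_percentile`, `pearson_r`, and the pooled example data are hypothetical.

```python
from statistics import mean


def rank_percentile(rank_number, total_ranked):
    """Percentage of that year's ranked applicants ranked below this one."""
    if not 1 <= rank_number <= total_ranked:
        raise ValueError("rank_number must be between 1 and total_ranked")
    return 100.0 - 100.0 * rank_number / total_ranked


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Worked examples from the text.
print(rank_percentile(2, 50))          # 96.0
print(round(rank_percentile(25, 75)))  # 67
print(round(rank_percentile(25, 26)))  # 4

# Within one year's pool, percentile is a linear function of rank number,
# so the correlation is exactly -1 (up to floating-point error).
ranks = [1, 2, 10, 25, 40]
print(round(pearson_r(ranks, [rank_percentile(r, 50) for r in ranks]), 6))

# Hypothetical residents from pools of different sizes: the correlation
# is still negative but weaker than -1.
pairs = [(2, 81), (10, 39), (20, 23), (30, 72), (5, 33)]
pooled = pearson_r([r for r, _ in pairs], [rank_percentile(r, n) for r, n in pairs])
print(round(pooled, 2))
```

Within any single year the two measures are perfectly and inversely correlated. A reported r of -0.88, rather than -1, therefore reflects the pooling of match years with different numbers of ranked applicants.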


Of the multiple comparisons performed between rank number or rank percentile and the subjective or objective resident performance measures, only one demonstrated a significant correlation: applicant rank percentile versus retrospective senior faculty recall scores. For all other comparisons, correlation coefficients were low and P values were well above .05 (Table 2). Specifically, no correlation was found between rank number or rank percentile and rotation evaluation scores in the two measured categories (overall performance and general knowledge). There also was no correlation between rank number or rank percentile and ABR written examination results, whether measured as raw scores or percentiles. Last, there was no correlation between applicant rank number and faculty recall scores.

Specific comparisons were then made between the performance measures of the residents receiving the lowest and those receiving the highest rank numbers for each of the five matches included in our analysis. The five residents who received top ranking (the lowest rank numbers) were assigned rank numbers of one (of 39), two (of 33), two (of 81), six (of 72), and 10 (of 67). The five residents who received the lowest ranking (highest rank numbers) were assigned rank numbers of 22 (of 67), 27 (of 39), 29 (of 33), 36 (of 81), and 63 (of 72). For the low rank number group, the means of the mean rotation evaluation scores were 3.5 for general knowledge and 3.5 for overall performance. For the high rank number group, they were 3.3 and 3.5, respectively. The mean faculty recall score was 2.9 for the low rank number group and 3.0 for the high rank number group. The mean ABR written examination score was 585 for the low rank number residents and 556 for the high rank number residents. Not surprisingly, these differences were not statistically significant for such a small group of residents.

Table 2
Comparison of Rank Number and Rank Percentile with Residency Performance Measures (n = 26)

Performance Measure                          Rank No.      Rank Percentile
Subjective
  4th-year rotation evaluation scores
    Overall performance                      0.04 (NS)     -0.15 (NS)
    General knowledge                        -0.16 (NS)    0.11 (NS)
  Faculty recall scores                      -0.22 (NS)    0.29 (.04)
Objective: ABR written examination results
    Score                                    -0.20 (NS)    0.21 (NS)
    Percentile                               -0.20 (NS)    0.25 (NS)

Note. Data are correlation coefficients (r); numbers in parentheses are P values. NS = not significant.

Interestingly, although performance measures were similar for both groups, in three of the five matches the lowest-ranked resident (receiving the highest rank number) for that match subsequently received higher rotation evaluation scores, faculty recall scores, and ABR written examination scores than did the highest-ranked resident (receiving the lowest rank number) during that same year.
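The subgroup selection rule from the Statistical Comparisons section can be sketched as follows. For each match with more than two matched residents, it takes the resident with the lowest rank number and the one with the highest. The function `extreme_pairs` and the match lists are hypothetical and do not reproduce the study's data.

```python
def extreme_pairs(matches):
    """Return {match: (lowest rank number, highest rank number)}.

    matches maps a match year to the rank numbers of the applicants who
    matched with the program. As in the text, a match is used only if
    more than two residents were assigned to the program that year.
    """
    pairs = {}
    for match, rank_numbers in matches.items():
        if len(rank_numbers) <= 2:
            continue  # skip matches with two or fewer residents
        pairs[match] = (min(rank_numbers), max(rank_numbers))
    return pairs


# Hypothetical match lists (not the study's data).
matches = {
    "match A": [1, 8, 15, 27],
    "match B": [2, 29],  # excluded: only two residents matched
    "match C": [6, 19, 40, 63],
}
pairs = extreme_pairs(matches)
top_group = [best for best, _ in pairs.values()]
bottom_group = [worst for _, worst in pairs.values()]
print(pairs)
print(top_group, bottom_group)
```

Each qualifying match contributes one resident to each group, so five matches yield groups of only five residents. This is consistent with the authors' caution that the observed differences were not statistically significant.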

DISCUSSION

Residency training programs enrolled in the National Resident Matching Program have been required to devise a rank order of their applicants. Adopting a ranking system can create problems, not the least of which is devising a method by which applicants can be appropriately ranked. It does, however, help guide residency programs as they face the arduous task of selecting a group of future residents from an often exceptional applicant pool. In our study, no significant correlation was found between the rank number or rank percentile assigned to each resident at the time of candidacy and that resident's subsequent performance, as assessed by 4th-year rotation evaluations or ABR written examination scores. Our institution devotes extensive time and careful, detailed procedures to evaluating radiology residency applicants and determining rank order. Despite this, rank number has little value in predicting how well an applicant will ultimately be judged to have performed on subsequent clinical radiology rotations. In addition, highly ranked applicants who match with our radiology residency are no more likely to do well on the ABR written examination than are matched applicants who were assigned higher rank numbers (lower rankings).

Our results also indicate that rank percentile is no more valuable than rank number. At our institution, an applicant's standing relative to the size of that year's applicant pool was, in general, no more informative than the applicant's rank number. Of all the comparisons between rank number or rank percentile and the measures of subsequent resident performance, the only statistically significant correlation was between rank percentile and retrospective faculty recall. One possible explanation is that both the applicant evaluation process and the retrospective faculty recall scores call on faculty to respond to noncognitive abilities. We suspect that the noncognitive features that originally led a faculty interviewer to rank an applicant highly may similarly have influenced faculty perception after training was completed. Such features may have had less influence on rotation evaluation scores and ABR written examination scores.

It is conceivable that differences in subsequent performance emerge only when individuals with the lowest rank numbers are compared directly with individuals with the highest rank numbers. If so, our ability to detect differences between some residents might have been obscured by the inclusion of many intermediately ranked residents, for whom rank number has no value in predicting performance. We therefore also examined a small subgroup of residents, directly comparing those who received the lowest rank numbers with those who received the highest rank numbers in each of five matches. Even in this comparison, no consistent difference in performance could be identified. In fact, resident performance was more often superior in those who received lower rankings (higher rank numbers) during the same year. Among the 10 residents included in this analysis, mean rotation evaluation scores were highest for a resident ranked 27th (of 39) and lowest for a resident ranked 2nd (of 81). Thus, rank number does not appear to correlate with subsequent performance even in this small subset of residents.

At our institution, the predictive value of an applicant's rank number or rank percentile for future performance as a resident could possibly be improved by modifying the way in which the rank number is derived. Some have deemed noncognitive attributes such as conscientiousness and interpersonal skills to be more critical to job performance than cognitive selection criteria (5). If so, we should perhaps give even more attention to these noncognitive traits during the selection process. Noncognitive attributes are best assessed during the applicant interview, but this can be done fully and accurately only if the interview is structured. For example, the structured behavioral selection "accomplishment interview" introduced by Wood et al (9) has been shown to predict subsequent performance as a resident. The results of such an interview could be incorporated into our ranking system, along with changes in the weighting of the categories used to calculate the rank number. Together, these changes could improve the correlation between ranking and performance.

In this study, we assumed that the standard measures used to gauge resident performance at our institution (and most others) are accurate. This assumption is supported to some extent by the statistically significant positive correlations between the subjective and objective performance measures in our series. Even so, the criteria measured on both rotation evaluation forms and written examinations may not accurately gauge resident performance.
Some have suggested (5) that evaluation of resident performance, like assessment of applicants, should concentrate more heavily on behavioral and other noncognitive issues, using tools such as the critical incident interview. Prior studies have shown that the resident behaviors most valued by radiology staff involve social skills, conscientiousness, and interpretive ability (5,8). Standard monthly rotation evaluation forms and written examinations may not measure these traits as accurately as they measure cognitive abilities.

Several limitations of this study should be addressed. Our data review included only residents at one institution, and rank data were not available for all of our senior residents, so our sample size was small. A multi-institutional study including much larger numbers of residents would be valuable in confirming (or refuting) our results. Such an undertaking also might determine whether there is a similar lack of correlation between ranking and subsequently measured resident performance at other institutions, which use different techniques to derive rank order lists. Another limitation is that we reviewed only 4th-year rotation evaluation forms. Comparing rank number or rank percentile with rotation scores from the 1st, 2nd, or 3rd year of residency training might have yielded different results. In addition, our study contains substantial selection bias: by the nature of our review, we could include only applicants who were preselected for interview at a competitive institution. Also, combining rank numbers and rank percentiles across several years poses an inherent problem, because the pool of ranked applicants may vary substantially in quality from one year to the next. Despite these limitations, we believe that the results of our study raise important questions about the ranking process.

In conclusion, resident applicant rank number and rank percentile, as currently derived at our institution, do not correlate with subsequent radiology residency performance in the senior year, as determined by rotation evaluations. There is also no correlation between rank and ABR written examination scores or percentiles. These results suggest either that the mechanism used to obtain a rank number should be modified or that less time and effort should be spent arriving at rank numbers at our institution.

ACKNOWLEDGMENT

The authors thank Michael DiPietro, MD, Barry H. Gross, MD, Melvyn Korobkin, MD, and M. Victoria Marx, MD, who contributed the retrospective rankings of resident performance. They also thank Peter Hedlesky for his assistance in the preparation of the manuscript.

REFERENCES

1. Simon M. Radiology resident selection: a radical proposal. Invest Radiol 1992; 27:400-402.
2. Grantham JR. Radiology resident selection: results of a survey. Invest Radiol 1993; 28:99-101.
3. Yindra KJ, Rosenfield PS, Donnelly MB. Medical school achievements as predictors of residency performance. J Med Educ 1988; 63:356-363.
4. Komives E, Weiss ST, Rossa RM. The applicant interview as a predictor of resident performance. J Med Educ 1984; 59:425-426.
5. Tarico VS, Smith WL, Altmaier EM, Franken EA Jr, Van Velzen D. Critical incident interviewing in evaluation of resident performance. Radiology 1984; 152:327-329.
6. Tarico VS, Altmaier EM, Smith WL, Franken EA Jr, Berbaum KS. Development and validation of an accomplishment interview for radiology residents. J Med Educ 1986; 61:845-847.
7. Wagoner NE, Suriano JR, Stoner JA. Factors used by program directors to select residents. J Med Educ 1986; 61:10-21.
8. Altmaier EM, Smith WL, Wood PS, et al. Cross-institutional stability of behavioral criteria desirable for success in radiology residency. Invest Radiol 1989; 24:249-251.
9. Wood PS, Smith WL, Altmaier EM, Tarico VS, Franken EA Jr. A prospective study of cognitive and noncognitive selection criteria as predictors of resident performance. Invest Radiol 1990; 25:855-859.