A new assessment tool: The patient assessment and management examination


Helen M. MacRae, MD, Robert Cohen, PhD, Glenn Regehr, PhD, Richard Reznick, MD, and Marcus Burnstein, MD, Toronto, Ontario

Background. The major goal of certification is to assure the public that the candidate is competent in all facets required of the position. The patient assessment and management examination (PAME) was developed to enable a more comprehensive assessment of competence in the practice of surgery.

Methods. A six-station, 3-hour, standardized-patient-based evaluation was developed. Each station was scored using a set of five-point global rating scales. PAME results were compared to the last two in-training evaluation reports (ITER), the clinical knowledge component of the ITER (ITER-CK), an in-house oral examination (OE), and the Canadian Association of General Surgeons' multiple-choice examination (CAGS).

Results. Eighteen senior general surgery residents were evaluated. Overall reliability was 0.70 (Cronbach's alpha). Fifth-year residents scored significantly better than fourth-year residents (t = 3.062; p = 0.0074), with 1 year of training accounting for 37% of the variance in scores. Correlations between the PAME and each of the other measures were ITER, 0.24; ITER-CK, 0.38; OE, -0.13; and CAGS, 0.061, with the PAME demonstrating better reliability and stronger evidence of validity than any other measure.

Conclusions. The PAME had better psychometric properties than other measures and assessed areas often not evaluated. This type of evaluation may be useful for feedback, remediation, or certification decisions. (Surgery 1997;122:335-44.)

From the Department of Surgery, University of Toronto, Toronto, Ontario, Canada.

Supported by the physicians of Ontario through a grant from the Physicians' Services Incorporated.

Presented at the Fifty-eighth Annual Meeting of the Society of University Surgeons, Tampa, Fla., Feb. 13-15, 1997.

Reprint requests: Helen M. MacRae, MD, Room 1525, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5.

Copyright © 1997 by Mosby-Year Book, Inc.


CLINICAL COMPETENCE ENCOMPASSES a wide variety of clinical skills, including the ability to obtain information from the patient history and physical examination, the application of medical knowledge to clinical problems, the effective use of communication skills with patients and peers, and the ability to integrate all of these facets to solve problems around a clinical encounter.1 Ability to ensure that a candidate is capable of handling all aspects of job-related problems is an important facet of certification.2 The evaluation methods used in residency training and for certification decisions should ensure that the successful candidate has demonstrated the requisite level of performance in all objectives deemed essential for clinical competence, including cognitive, psychomotor, and affective objectives.




As an individual progresses through specialty training it is inappropriate to devote testing time solely to the assessment of basics such as history and physical examination and simple management decisions. Rather, it is important to learn whether the trainee can integrate information from the history and physical examination with his or her knowledge of medicine to order investigations appropriately, interpret the results of the investigations, and communicate findings to the patient. Finally, it must be determined whether the resident can develop a management plan, outline potential risks and benefits to the patient, and involve the patient in the decision-making process. The purpose of this study was to investigate the psychometric properties of a new assessment tool, the patient assessment and management examination (PAME), which was designed to enhance the summative assessment of the competence of senior residents and provide a more comprehensive evaluation of competence.



Fig. 1. First patient encounter examiner checklist. Each item is scored Yes/No.
History: duration of symptoms; blood mixed with stool; dripping into bowl; blood on paper; no pain; no protrusion; no change in continence; no urgency; no mucous; no tenesmus; no change in bowel habit; no weight loss; family history negative for colon CA.
Physical examination: examines abdomen; not overly rough; appropriately thorough; asks for rectal examination, or tells patient it will be done at time of endoscopy.
Ordering: explains and/or obtains consent for colonoscopy.

METHODS
A six-station standardized patient-based examination was developed, drawing on the principles of traditional standardized patient (SP)-based evaluations. Station subject matter was chosen by a panel of general surgeons from the objectives of the Royal College of Physicians and Surgeons of Canada for General Surgery. The six stations were a patient with rectal bleeding; dysphagia and reflux in a middle-aged man; a young woman with abdominal pain and a liver mass; a patient with multiple trauma; a patient with a mammographic abnormality; and a patient with Crohn's disease. Each station was developed by a general surgeon and included a training script for the standardized patient, referral letters, imaging studies, endoscopy images, and laboratory results, as well as a short structured oral examination.

All stations were reviewed by at least three general surgeons for content, relevance, and level of difficulty. Stations were 30 minutes in length and included four components: an initial patient assessment (8 minutes), ordering and interpretation of investigations (4 minutes), a second interaction with the patient to discuss diagnosis and management (10 minutes), and a structured oral examination (6 minutes). Two minutes were used for changeover. In the initial encounter candidates were supplied with referral letters and investigations that would likely have been completed before general surgical referral. They then took a history and performed a physical examination, ordered any investigations they believed necessary, and obtained informed consent from the patient for invasive investigations. Results of laboratory investigations were given, and the candidates were asked to interpret radiographs or endoscopy photographs for investigations they had ordered.
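The station format can be summarized as a small data structure. The following Python sketch is purely illustrative; the type and field names are ours, not part of the examination materials, and only the timings are taken from the description above.

    from dataclasses import dataclass

    @dataclass
    class StationComponent:
        name: str
        minutes: int

    # The four scored components plus changeover, with the timings given above.
    PAME_STATION = [
        StationComponent("initial patient assessment", 8),
        StationComponent("ordering and interpretation of investigations", 4),
        StationComponent("follow-up discussion of diagnosis and management", 10),
        StationComponent("structured oral examination", 6),
        StationComponent("changeover", 2),
    ]

    assert sum(c.minutes for c in PAME_STATION) == 30  # 30-minute stations
    # Six such stations in sequence give the 3-hour examination.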


Fig. 2. Follow-up patient encounter examiner checklist. Each item is scored Yes/No: diagnosis of carcinoma of rectum; outlines procedure; outlines risks/potential complications (anastomotic leak, autonomic dysfunction, possibility of temporary colostomy, possibility of permanent colostomy); outlines potential need for adjuvant therapy; discusses prognosis; accurate information re: time off work, postoperative bowel function, familial risk.

The SP then returned for a follow-up visit, at which time the candidate discussed the diagnosis and developed a management plan with the patient. Finally, the candidate was asked four to six predetermined questions by the examiner. During the initial and follow-up encounters, examiners rated candidates' performances with objective checklists. The first checklist was similar to that used in the standard Objective Structured Clinical Examination (OSCE), with points awarded for specific questions asked and actions on physical examination. Because of the nature of the second encounter, the items on the second checklist were more general and more open to interpretation by the examiner. Sample checklists for each encounter are included in Figs. 1 and 2. In addition to the checklists, qualified general surgeons were also asked to rate the candidates with global rating forms. Ordering and interpretation of investigations and the structured oral examination were rated with global rating scales alone. The global scales were scored from one to five, such that a three indicated that the candidate demonstrated sufficient skill for independent practice in general surgery in the area assessed. Samples of the global rating scales for each component of the examination are shown in Figs. 3 to 6. Finally, SPs rated candidates' communication skills on a modified version of the American Board of Internal Medicine patient satisfaction scale at the completion of each encounter (Fig. 7).

The eighteen candidates were senior residents in general surgery at a single institution. Half the candidates were within three months of completing their residencies; the other half had at least one year of training to complete. Three tracks of six stations each were run to facilitate completion of the examination in a timely fashion. Eighteen general surgeons served as examiners. Reliabilities were assessed using Cronbach's alpha; correlations were computed using Pearson product moment correlations.
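For readers who want the computations, Cronbach's alpha and the Pearson correlation reduce to a few lines of code. The sketch below uses simulated scores, not the study data; the matrix shape (18 candidates by 6 stations) simply mirrors the design described above.

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a (candidates x stations) score matrix."""
        k = scores.shape[1]                          # number of stations
        item_vars = scores.var(axis=0, ddof=1)       # per-station variance
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

    rng = np.random.default_rng(0)
    ratings = rng.uniform(2.0, 5.0, size=(18, 6))    # simulated global ratings (1-5 scale)
    print(f"alpha = {cronbach_alpha(ratings):.2f}")

    # Pearson product moment correlation between PAME totals and another measure
    pame_totals = ratings.mean(axis=1)
    other_measure = rng.uniform(2.0, 5.0, size=18)   # e.g., a simulated ITER score
    print(f"r = {np.corrcoef(pame_totals, other_measure)[0, 1]:.2f}")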

RESULTS
The overall reliability of the global scores on the examination was 0.70. When broken down by examination component, the reliabilities were first patient encounter global score, 0.51; first checklist score, 0; investigation and interpretation, 0.52; second patient encounter global score, 0.74; checklist for the second encounter, 0.69; structured oral examination, 0.57. Because the overall global rating was more reliable than checklist scores, it was used in subsequent analyses. Reliability of the patient satisfaction scores was 0.42. The correlation between patient satisfaction scores and overall global assessments was 0.45 (p = 0.063). Construct- and criterion-related validity were assessed. Residents in their last 3 months of training scored significantly better than those with another year of training to complete.


Fig. 3. First patient encounter examiner global rating form. Each dimension is rated on a five-point scale; anchors at 1, 3, and 5 are shown (2 = borderline; 4 is unlabeled).
History: 1 = inaccurate or missing important information; 3 = missing some minor points but relatively complete and accurate; 5 = precise and thorough.
Physical: 1 = incomplete or not adequate; 3 = adequate for surgical practice; 5 = excellent technique, appropriately thorough.
Knowledge: 1 = lacks important information; 3 = appropriate knowledge level, minor deficiencies that would not affect outcome; 5 = thorough and in-depth knowledge demonstrated.
Communication skills: 1 = unprofessional manner, poor or vague explanations, poor technique or skills; 3 = explanations adequate, professional manner, appropriate communication skills; 5 = exceptional rapport, puts patient at ease, professional at all times.
Overall on this station: 1 = lacks knowledge and skills and/or attitude necessary for independent practice in the area; 3 = adequate for independent practice in the area; 5 = exceeds expectations.

The mean score of the fourth-year residents was 3.11 ± 0.31; that of the final-year residents was 3.54 ± 0.27 (t = 3.06; p = 0.0074), demonstrating evidence of construct validity. Year of training accounted for 37% of the variance in scores.
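These statistics can be approximately reproduced from the group summaries. The sketch below assumes nine residents per group (our reading of "half" of the eighteen candidates); small discrepancies from the published values reflect rounding of the reported means and standard deviations.

    from scipy import stats

    # Reported summaries: fourth-year 3.11 +/- 0.31; final-year 3.54 +/- 0.27.
    # n = 9 per group is an assumption (half of the 18 candidates).
    t, p = stats.ttest_ind_from_stats(mean1=3.54, std1=0.27, nobs1=9,
                                      mean2=3.11, std2=0.31, nobs2=9)
    print(f"t = {t:.2f}, p = {p:.4f}")  # close to the reported t = 3.06, p = 0.0074

    # Proportion of score variance attributable to year of training
    # (eta-squared for a two-group comparison).
    df = 9 + 9 - 2
    eta_squared = t**2 / (t**2 + df)
    print(f"variance explained = {eta_squared:.0%}")  # close to the reported 37%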

To assess criterion-related validity the PAME was correlated with other measures of clinical skills commonly used in residency training. These included an in-house oral examination (OE), the last two in-training evaluation reports (ITER), the clinical knowledge component of the ITER (ITER-CK), and, finally, results of the Canadian Association of General Surgeons multiple-choice examination (CAGS). The reliabilities of these scores were ITER, 0; ITER-CK, 0; and unavailable for the oral examination or for the 18 residents on the CAGS examination. The Pearson product moment correlations between the PAME and each of these measures were ITER, 0.24; ITER-CK, 0.38; OE, 0.14; and CAGS, 0.061. None of the correlation coefficients were significant. Finally, examiners were asked to evaluate the PAME as an assessment tool for senior residents. Overall, the examination was given a rating of 4.5 ± 0.5 on a five-point Likert scale (1 = poor, 5 = excellent), with examiners commenting that they believed the stations were realistic and portrayed patient problems well.


Fig. 4. Investigations examiner global rating form. Each dimension is rated on a five-point scale; anchors at 1, 3, and 5 are shown (2 = borderline; 4 is unlabeled).
Use of resources: 1 = incomplete or inefficient use of resources; 3 = adequate for independent practice; 5 = complete and cost-efficient use of resources.
Interpretation: 1 = difficulty interpreting results or incorrect interpretation; 3 = interpretation of results adequate for independent practice; 5 = interprets results with a high degree of precision.


DISCUSSION
Assessment of clinical competence requires that a variety of testing techniques be used because no one method assesses all important components well.3 The OSCE4 and the use of standardized patients in assessment5 have emerged as the state-of-the-art methods for assessing basic clinical skills. Various forms of standardized patient-based examinations for medical students have been extensively researched, with demonstration of acceptable to high reliability, face, content, and construct validity, as well as some evidence of predictive validity. OSCEs have also been used to assess the clinical competence of residents, focusing on history and physical examination skills, often discovering unidentified deficits in these skills.6 With the OSCE, problem solving and clinical judgment may be assessed in the poststation encounter, usually a written exercise given after the patient encounter. However, the examinee is often unable to act on results of investigations and is seldom required to formulate a complex management plan. The typical OSCE includes a 5- to 10-minute postencounter probe7 that asks the candidate to list his or her differential, write admission orders, or interpret an x-ray film. Only rarely are candidates required to complete an entire patient assessment including diagnosis and management. The management component of a typical OSCE may be suitable for the student level but does not allow for the types of decisions faced by a surgeon, such as whether to use a nonoperative or an operative approach, or which operation is most suitable for a given patient. The candidate's ability to justify one course of action over another is also not well assessed.


Management options are chosen after the clinical encounter without the ability to include the patient in the decision-making process. OSCEs at the resident level have had acceptable reliability.8,9 However, these reliabilities have been based on a fairly heterogeneous population of both junior and senior residents. One would expect the reliability for a more homogeneous group to be lower. Furthermore, although OSCEs have been shown to discriminate between junior and senior residents, the differences in scores have been minimal, implying that the traditional OSCE, with its focus on discrete basic skills, does not adequately assess the maturation in clinical judgment and problem solving that develops over years of clinical training. A further criticism of the OSCE is that stations that rely on detailed checklists may reward thoroughness rather than competence. Research on problem solving by the expert clinician suggests that experts are not necessarily thorough in the history and the physical examination, but are accurate in arriving at a diagnosis with minimal information.10 Thus the use of detailed, structured checklists may discriminate against the expert. Furthermore, the structured nature of the checklists may not allow for recognition that there could be more than one approach to the clinical problem.11 Critics of the checklist approach to assessment also argue that not all features of professional practice can be quantified in detailed checklists.12 For more complex cases, subjective global ratings by experts may yield more reliable and valid information than objective lists, which lack flexibility in rewarding different approaches and tend to reward thoroughness. In the PAME traditional checklists, as used in the OSCE, functioned poorly. The reliability of the first checklist was 0, in contrast to the first patient encounter global rating of 0.51.


Fig. 5. Follow-up patient examiner global rating form. Each dimension is rated on a five-point scale; anchors at 1, 3, and 5 are shown (2 = borderline; 4 is unlabeled).
Management plan: 1 = unsafe, inaccurate or illogical; 3 = based on appropriate information/tests, shows sound judgment; 5 = excellent, does not miss any pertinent information in making decision, superior judgment.
Knowledge: 1 = lacks important information; 3 = appropriate knowledge level, minor deficiencies that would not affect outcome; 5 = thorough, in-depth knowledge demonstrated.
Communication skills: 1 = unprofessional manner, poor or vague explanations, poor technique or skills; 3 = explanations adequate, professional manner, appropriate communication skills; 5 = exceptional rapport, puts patient at ease, professional at all times.
Overall on this station: 1 = lacks knowledge and skills and/or attitude necessary for independent practice in the area; 3 = adequate for independent practice in the area; 5 = exceeds expectations.

This is in keeping with a study of physician competence by Norman et al.,11 who found that an examination with longer standardized patient encounters rated on a seven-point global scale had better psychometric properties than traditional OSCE stations with structured checklists. Although the reliability of the second checklist was much better than that of the first, the second encounter checklists were shorter, less structured, and more open to global interpretation than traditional OSCE checklists, and may have functioned like global ratings. These findings suggest that a model based on expert judgments, consistent with a connoisseurship model of evaluation,13 may be more appropriate than objective checklists in the evaluation of complex constructs such as clinical competence.14 The overall global assessment on the PAME had fairly good reliability for a six-station examination of a fairly homogeneous group of residents. Other than multiple-choice examinations, evaluation tools commonly used have not been shown to be as reliable as the PAME at the senior resident level.

Construct- and criterion-related validity of the PAME were also examined. The PAME demonstrated evidence of construct validity by being able to differentiate between fourth- and fifth-year residents. The proportion of variance in examination scores that could be attributed to year of training was remarkably high considering the homogeneity of the candidate population and findings of case specificity in the OSCE literature.15 Establishing criterion-related validity of the PAME is difficult because there is no gold standard against which the PAME can be compared. The low correlations between the PAME and the measures of competence used in our program could be due in large part to the poor reliabilities of the ITER, the ITER-CK, and the OE. This is probably not true for the CAGS examination, however, because multiple-choice examinations tend to be quite reliable.
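The point that unreliable criterion measures depress observed correlations can be made concrete with the classical attenuation formula, r_observed = r_true × sqrt(r_xx × r_yy). The sketch below is our illustration, not a calculation the authors report; the reliability values are taken from the results above.

    import math

    def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
        """Classical attenuation: the observed correlation is the true
        correlation scaled by the square root of the two reliabilities."""
        return true_r * math.sqrt(rel_x * rel_y)

    # Even if the PAME (reliability 0.70) and a criterion measure agreed
    # perfectly in truth, a criterion reliability near zero (as estimated
    # for the ITER and ITER-CK) caps the observable correlation near zero.
    for criterion_rel in (0.05, 0.20, 0.50):
        print(f"criterion reliability {criterion_rel:.2f}: "
              f"max observable r = {attenuated_r(1.0, 0.70, criterion_rel):.2f}")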


Fig. 6. Structured oral examiner global rating form. Each dimension is rated on a five-point scale; anchors at 1, 3, and 5 are shown (2 = borderline; 4 is unlabeled).
Knowledge: 1 = missing important information; 3 = sufficient for independent practice in general surgery; 5 = at level of subspecialist in areas of assessment.
Judgment: 1 = lapses in judgment which may lead to adverse patient outcomes; 3 = safe and sufficient for independent practice in the area; 5 = superior judgment demonstrated.
Overall: 1 = did not demonstrate skills or knowledge at level expected for independent practice; 3 = sufficient for independent practice in the area; 5 = superior performance overall.

Global ward ratings, such as the ITER and the ITER-CK, have very high face validity in that they are designed to assess the day-to-day clinical functioning of the resident. They assess performance rather than competence, and would theoretically be a better measurement of how a candidate will function in practice. However, in keeping with the findings of this study, others have shown that the reliability of these ratings is low.16 Individual supervisor ratings are often based on unsystematic and limited observations of performance with widely varying types of patients. The poor reliabilities of the ITER and the ITER-CK ratings indicate the large error component of these measures. Furthermore, the PAME relies on direct observations of history and physical examination and communication skills, in contrast to the ITER, in which these skills are often only inferred. Thus, although these ratings may be useful for formative feedback, until better ward ratings are developed, history and physical examination skills, as well as interpersonal skills, cannot be presumed adequate based on the ITER and should be directly assessed. Multiple-choice examinations are reliable, but they tend to measure knowledge and, to a limited extent, problem solving, with little or no ability to assess other components of clinical competence. Their use may be justified in sampling a broad range of knowledge as long as one recognizes the limitations of relying on written examinations and is aware that higher levels in the cognitive domain such as application and evaluation may not be assessed.12


The low correlations between the PAME and the CAGS suggest that the two measures may be assessing different components of competence and may be complementary. Although some would argue that oral examinations assess problem solving and clinical judgment, McGuire17 demonstrated that less than one third of questions asked required interpretive or problem-solving skill. Psychometric properties of oral examinations have been less than optimal, with poor reliability and questions about the validity of these scores.12,17,18 Unfortunately, the reliability of the OE scores used in this study could not be assessed because only a single score from two examiners was given. However, the OEs used showed no evidence of construct validity. Another concern with reliance on the oral examination is that neither history and physical examination skills nor the candidate's patient communication skills are assessed. These concerns are addressed in the PAME, which also assesses problem solving and clinical judgment, but in the context of a patient encounter. Another concern with traditional assessment techniques, including the OSCE, is that they rely on historic views of the physician-patient relationship, in which the physician directs care and makes decisions about treatment in isolation from the patient.19


Fig. 7. Patient satisfaction rating scale. Standardized patients answered "How was the doctor you just saw in:" for each of the following items on a five-point scale (poor, fair, good, very good, excellent):
1. Telling you everything; being truthful, upfront and frank; not keeping things from you that you should know.
2. Greeting you warmly; calling you by the name you prefer; being friendly, never crabby or rude.
3. Treating you like you're on the same level; never "talking down" to you or treating you like a child.
4. Letting you tell your story; listening carefully; asking thoughtful questions; not interrupting you while you are talking.
5. Showing interest in you as a person; not acting bored or ignoring what you have to say.
6. Warning you during the physical exam about what he/she is going to do and why; telling you what he/she finds.
7. Discussing options with you; asking your opinion; offering choices and letting you help decide what to do; asking what you think before telling you what to do.
8. Encouraging you to ask questions; answering them clearly; never avoiding your questions or lecturing you.
9. Explaining what you need to know about your problems, how and why they occurred, and what to expect next.
10. Using words you can understand when explaining your problems and treatment; explaining any technical medical terms in plain language.

skills affect the patient’s expectations and perceptions of his or her illness, with the motivation and compliance of the patient in following the recommended treatment plan also directly affected by the doctor-patient relationship.20Furthermore, deficiencies in communication skillsare a major cause of patient dissatisfaction and litigation.21 Demonstration of knowledge in diagnosisand management issuesis essentialbut should not be viewed in isolation from the communication skillsphysiciansneed to pro vide quality health care. To ensure valid interpretations of examination results, the evaluation process should comprehensively assess all components of competence, including how the physician explores, elicits, and evaluatespatient preferences and wishes. munication

Thus, for senior residents, neither traditional techniques nor OSCEs comprehensively assess the higher-level skills required in developing management plans and solving clinical problems. The examination tools currently in use do not show the candidate's ability to interact effectively with patients and, in so doing, reinforce traditional views that management decisions can be made in isolation from patients. Specialty certification decisions are made without ensuring that all components of clinical competence have been demonstrated as adequate. The PAME may help to address this deficiency, although further studies of the psychometric properties and feasibility of this examination on a larger scale are required.


REFERENCES
1. Norman GR. Defining competence: a methodological review. In: Neufeld VR, Norman GR, editors. Assessing clinical competence. New York: Springer; 1985.
2. D'Costa AG. The validity of credentialing examinations. Evaluation and the Health Professions 1986;9:137-69.
3. Task Force of the Evaluation Committee, Royal College of Physicians and Surgeons of Canada. Report on the evaluation system for specialist certification. 1993.
4. Harden R, Stevenson M, Downie W, Wilson G. Assessment of clinical competence using an objective structured examination. British Medical Journal 1975;1:447-51.
5. Barrows HS. Simulated patients. Springfield (IL): Thomas; 1971.
6. Li JT. Assessment of basic physical examination skills of internal medicine residents. Academic Medicine 1994;69:296-9.
7. Reznick R, Smee S, Rothman AI, et al. An objective structured clinical examination for the licentiate: report of the pilot project of the Medical Council of Canada. Acad Med 1992;67:487-94.
8. Reznick R, Cohen R. An objective structured clinical examination for senior residents: testing at the high end. In: Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto: University of Toronto Press; 1994.
9. Stillman P, Swanson D, Regan MB, Philbin MM, Nelson V, et al. Assessment of clinical skills of residents utilizing standardized patients: a follow-up study and recommendations for application. Ann Intern Med 1991;114:393-401.
10. Elstein AS, Shulman LS, Sprafka SA. Medical problem solving: an analysis of clinical reasoning. Cambridge: Harvard University Press; 1978.
11. Norman GR, Davis DA, Lamb S, Hanna E, Caulford P, Kaigas T. Competency assessment of primary care physicians as part of a peer review program. JAMA 1993;270:1046-51.
12. Norman GR. Theoretical and psychometric considerations. In: Task Force of the Evaluation Committee, Royal College of Physicians and Surgeons of Canada. Report on the evaluation system for specialist certification. 1993.
13. Worthen BR, Sanders JR. Educational evaluation: alternative approaches and practical guidelines. White Plains: Longman; 1987. p. 98-113.
14. McGaghie WC. Professional competence evaluation. Educational Researcher 1991;20:3-9.
15. Colliver JA, Williams RG. Technical issues: test application. Acad Med 1993;68:454-60.
16. Streiner DL. Global rating scales. In: Neufeld VR, Norman GR, editors. Assessing clinical competence. New York: Springer; 1985.
17. McGuire CH. The oral examination as a measure of professional competence. Journal of Medical Education 1966;41:267-74.
18. Muzin LJ, Hart L. Oral examinations. In: Neufeld VR, Norman GR, editors. Assessing clinical competence. New York: Springer; 1985.
19. Deber RB. Physicians in health care management: 7. The patient-physician partnership: changing roles and the desire for information. Can Med Assoc J 1994;151:171-6.
20. Simpson M, Buckman R, Stewart M, Maguire P, Lipkin M, Novack D, et al. Doctor-patient communication: the Toronto consensus statement. Br Med J 1991;303:1385-7.
21. Shapiro RS, Simpson DE, Lawrence SL, Talsky AM, Sobocinski KA, Schiedermayer DL. A survey of sued and nonsued physicians and suing patients. Arch Intern Med 1989;149:2190-6.

DISCUSSION

Dr. Alden H. Harken (Denver, Colo.). What are we trying to do with the certifying process? In reading over your paper, it seemed to me that there really are five different areas that we are trying to assess. One is surgical knowledge. We do that with the in-training examination.

MacRae

edge.

We do that

with

the

in-training

et al.

examination.

343

The

The difficulty with the in-training examination and the multiple-choice examination is that they require a noncontroversial answer, which is why we ask so many questions about multiple endocrine neoplasia syndromes and so few about breast cancer. The second is diagnostic facility; the third is the ability to digest data; the fourth is fluency in communication, or how to tell a patient that she has breast cancer. The fifth area is the safety of clinical performance. I would suggest that four of the five of these are very subjective. For that reason I would not anticipate any construct validity when testing those areas. I would like to think that the residents and those of us who have more basic surgical knowledge can also communicate that knowledge better to a patient, but that doesn't necessarily follow. Whenever you put together a well-organized teaching tool presented by enthusiastic faculty, I think that has to be good. Therefore I think this PAME, just the process, has to be good. Much like the structured clinical instruction module, it has to be good. In a discipline in which a lot of what we do is science, we should stop being apologetic for evaluating ourselves in a subjective fashion. What do you envision us trying to evaluate in an American Board of Surgery examination? There is a factual component, but there is a lot more. The subjective components of diagnostic facility and the ability to digest data can be ferreted out in an oral examination. The good examiners can figure out whether you can digest and deal with that data. Fluency in communication is subjective, as is the ultimate question about whether this person is a safe surgeon. What are we trying to do with the examination process? Does the examination process itself promote the residents and those of us, their teachers, to learn more and to be able to communicate better? Ultimately, surgery is a subjective discipline, and we have to evaluate it subjectively.

Dr. Ajit K. Sachdeva (Philadelphia, Pa.). You had a reliability of 0.7, which obviously is too low to make pass-fail decisions. Could you speculate on the factors that led to the lower reliability than has been reported from your own institution in certain other series? We reported our experience with first-year residents in which we got a reliability of 0.9 with 3.3 hours of testing. Could you speculate on the reasons for the lower reliability in this study, and how you could improve that?

Dr. MacRae. Dr. Sachdeva, the reliability of 0.7 is not as high as we normally would have observed and, as you pointed out, not high enough for pass-fail decisions. If you look at most of the OSCEs in the literature, they are usually very short stations. So you are usually looking at 5- to 10-minute stations. In 3.0 hours of testing time you most likely had a much broader sample. The problem with those very short stations is that although you do enhance the reliability, you don't get at the spectrum of communication skills, and your validity may suffer. One of our stations was quite problematic. Without that particular station, our reliability would have been better. We assume that we are going to need eight stations to achieve the pass-fail reliability of 0.8 that we need. If we are trying to get this comprehensive assessment of competence, we might have to accept that the reliability will be a little bit lower than with the very short, quick stations.


To answer Dr. Harken's more global comments on what we are trying to do with certification, I think the main thing that we are trying to do with certification is to assure the public that those people that we say are certified to practice general surgery are actually competent in all components of the job. I agree that certification is a subjective decision.

Perhaps in the past, by trying to rely on objective tests such as multiple-choice examinations or objective structured clinical examinations, we have been minimizing the gestalt of what goes into being a competent physician. The nice thing about the PAME is that we go back to that subjective assessment. We are looking to our colleagues to evaluate candidates subjectively, but not necessarily with a lot of error in measurement.
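Dr. MacRae's projection about lengthening the examination is an instance of the Spearman-Brown prophecy formula. The sketch below is our illustration, not a calculation from the paper; note that projecting from the six-station alpha of 0.70 suggests roughly ten stations for 0.80, so the estimate of eight presumably assumes the higher base reliability obtained after excluding the problematic station mentioned above.

    def spearman_brown(reliability: float, length_factor: float) -> float:
        """Projected reliability when a test is lengthened by length_factor."""
        return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

    # Projecting from the observed six-station alpha of 0.70:
    for n_stations in (6, 8, 10, 12):
        projected = spearman_brown(0.70, n_stations / 6)
        print(f"{n_stations} stations -> projected alpha {projected:.2f}")
    # Prints 0.70, 0.76, 0.80, 0.82.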