Journal of Clinical Neuroscience 14 (2007) 349–354 www.elsevier.com/locate/jocn

Neurosurgical Education

How assessment drives learning in neurosurgical higher training

Michael Kerin Morgan a,b,*, Rufus M. Clarke b, Michael Weidmann a, John Laidlaw a, Andrew Law a

a Neurosurgery Education Development Committee, Royal Australasian College of Surgeons, Melbourne, Victoria, Australia
b The Office of Teaching and Learning in Medicine, The University of Sydney, Sydney, New South Wales, Australia

Received 5 November 2005; accepted 5 December 2005

* Corresponding author. Present address: The University of Sydney, Department of Surgery, Level 8, 193 Macquarie Street, Sydney, NSW 2000, Australia. E-mail address: [email protected] (M.K. Morgan).

doi:10.1016/j.jocn.2005.12.011

Abstract

Certifying the competence of neurosurgeons is a process of critical importance to the people of Australia and New Zealand. This process of certification occurs largely through the summative assessment of trainees involved in higher neurosurgical training. Assessment methods in higher training in neurosurgery vary widely between nations. However, there are no data about the 'utility' (validity, reliability, educational impact) of any national (or bi-national) neurosurgical training system. The utility of this process in Australia and New Zealand is difficult to study directly because of the small number of trainees and examiners involved in the certifying assessments. This study is aimed at providing indirect evidence of utility by studying a greater number of trainees and examiners during a formative assessment conducted at a training seminar in Neurosurgery in April 2005.

Aim: To evaluate an essay examination for neurosurgical trainees for its validity, reliability and educational impact.

Methods: A short answer essay examination was undertaken by 59 trainees and corrected by up to nine examiners per part of each question. The marking data were analysed. An evaluation questionnaire was answered by 48 trainees. Eight trainees who successfully passed the Fellowship examination and who had also taken the short essay examination underwent a semi-structured interview.

Results: The essay examination was found to be neither reliable (generalisability coefficient of 0.56 if the essay paper had comprised 6 questions) nor valid. Furthermore, evidence suggests that such an examination may encourage a pursuit of declarative knowledge at the expense of competence in performing neurosurgery.

Conclusion: This analysis is not directly applicable to the Fellowship examination itself. However, this study does suggest that the effect of assessment instruments upon neurosurgical trainees' learning strategies should be carefully considered.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Neurosurgery; Training; Assessment; Essay; Evaluation

1. Introduction

Summative and formative assessment methods in higher training in neurosurgery vary widely between nations. Despite the importance of credentialing competent neurosurgeons, no national or multinational neurosurgical training system has been studied as to the utility of its assessment (reliability, validity and impact).1 What is known is that the methods and content of assessment in higher education are critically important in determining the quality of the outcomes to be achieved.2–4 The use of assessment as a tool to ensure quality of training, and outcome-based education, are two of the principles underscoring contemporary medical education.5 Constructive alignment of the assessment task with the outcome goals is critical in influencing trainees' learning.6

Higher training in neurosurgery is a bi-nationally coordinated training scheme of the Royal Australasian College of Surgeons (RACS), supervised by its Board of Neurosurgery (BON). Summative assessments (an examination for the purpose of grading) include a written examination, an oral examination, supervisor assessment and log-book assessment.

Formative assessments (an examination for the purpose of providing feedback) include supervisor feedback and a written examination at the twice-yearly training seminars conducted by the BON. However, unless educational activity is constructively aligned with the goal of competence in neurosurgery, the quality of trainees' learning may be harmed.7 It has been argued that all aspects of neurosurgical training need to be reviewed at this time to optimise the contribution of professional insight into trainees' achievement of competent performances in neurosurgery.8 Assessment is a critical element in this educational process and a key tool employed by the BON in providing feedback to trainees. The twice-yearly bi-national training seminars constitute an important teaching event in neurosurgical training. These are focused meetings conducted over 3 days, consisting of teaching-learning activities – tutorials and lectures – based on the theme of the seminar. During each seminar, a formative assessment, consisting of essays or multiple choice questions (MCQs), is conducted. The format attempts to mimic aspects of the written assessment component of the summative Fellowship examination. However, the value of this process (as currently instituted) has not been subject to formal evaluation or analysis.

The aim of this study was to investigate the reliability and validity of a specific formative assessment task undertaken by trainees. In this context, reliability includes the extent to which different examiners generate similar judgments when correcting these two essay questions for the 59 trainees.9 The reliability of the whole examination and inter-examiner reliability were analysed. The face validity of the assessment task was examined by an evaluation questionnaire for all trainees attending the seminar, as well as an interview of the trainees who successfully completed the Fellowship examination, which was held within 4 weeks of the seminar.

2. Methods and resources

2.1. Essay examination

2.1.1. Trainees

Fifty-nine trainees participating in the seminar had been provided with prior reading material, with the knowledge that the exam was to be confined to this reading material (in the field of cerebrovascular neurosurgery, which was the theme of the training seminar). The trainees at the seminar comprised all advanced trainees in neurosurgery in Australia and New Zealand. They ranged in experience from years 1–5 on the training program.

2.1.2. Examination

Trainees wrote an essay examination over 60 minutes, consisting of two questions each with three parts. The questions were three (of four) parts taken from the April 2002 and the May 2000 Fellowship examinations for neurosurgery (Appendix 1), which were the most recent questions on this topic (as of May 2005) in past Fellowship examinations. Similarities between the Fellowship short-essay examination and the trial essay examination were that these were previously-used questions from the Fellowship examination in neurosurgery, and that the time allocation for each of the three stem parts was the same as in the Fellowship examination (i.e. averaging 10 minutes). Differences between the seminar essay examination and the Fellowship essay examination included the restriction to two rather than five essay questions (held once over 1 hour, rather than over the Fellowship examination's 2 days with a total of 3.5 hours allocated to essays), and three parts rather than the four or five stems used in the Fellowship examination; finally, essay questions used in the Fellowship examination are not generally re-used.

The fifty-nine participants sat this formative assessment paper comprising two essay questions (1 and 2), each of three equal parts (A, B and C). The scripts were marked independently by a variable number of examiners, as follows: question 1 A, B and C (9, 1 and 3 examiners respectively) and question 2 A, B and C (1, 8 and 1 examiners respectively). Examiners used a close-marking technique inherited from the Royal College of Surgeons (England). Each answer was given a score of 8, 8.5, 9 or 9.5, corresponding to grades of Fail, Borderline fail, Pass and Outstanding.

2.1.3. Examiners

The examiners chosen to provide feedback were experienced neurosurgeons with an interest in neurosurgical training (either as supervisors, members of the BON, or participants in the seminar). The number of examiners was determined by the availability of recruited volunteers (rather than statistical methodological determinants). Examiners were asked to mark the exam as if it were the Fellowship Examination for the RACS. As for the Fellowship Examination, a model answer was provided for each of the two questions. These model answers were prepared in point form by two different "experts" thought to have a good knowledge of the subject. The examiners were instructed to refer to the model answers in determining their allocation of marks. The examiners were blinded both to the identity of the examinee and to other examiners' marks. The scripts were marked in random order.

2.2. Course evaluation

At the completion of the course, an evaluation was performed by the RACS that incorporated trainees' evaluations of the trial examination. Of the 20 evaluation questions, two specifically addressed the examination. These were: "The examination experience will help me pass the final Fellowship examination" and "The examination experience will help me become a competent neurosurgeon." The evaluation scale was a continuous line marked between the poles of "I strongly disagree" (rated as 0) and "I strongly agree" (rated as 10). In the analysis, an independent assessor marked the responses to the nearest 0.5.

A semi-structured interview of trainees successfully passing the Fellowship examination within 4 weeks of the seminar examination was performed using open-ended questions:

1. "What did you find most useful about the trial examination held in the seminar?"
2. "What was most surprising about the trial examination held in the seminar?"
3. "What did you find least useful about the trial examination held in the seminar?"
4. "What would you suggest be changed in the future for the trial examination?"
5. "Do you think the trial examination makes you a better neurosurgeon and if so why?"
6. "Do you think the trial examination should be in this form again?"

2.2.1. Statistical analysis

Because the aim was to determine the reliability of the formative assessment sat by participants in the April 2005 training seminar conducted by the Board of Neurosurgery, in the presence of multiple sources of error (differences between questions, differences between candidates, and other differences), the instrument chosen was derived from generalisability theory, generating a 'generalisability coefficient' as a measure of the reliability of the examination.

Analysis of the evaluation responses to the two questions about the trial examination was made using Student's t-test on the scores for "the examination experience will help me become a competent neurosurgeon" and "the examination will help me pass the final Fellowship examination." When this was analysed by year of training, a Bonferroni correction was applied for five comparisons.
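To make the comparison concrete, the sketch below (not the authors' analysis code) applies a Student's t-test to hypothetical paired ratings of the two statements on the 0–10 scale and then applies a Bonferroni correction for the five year-of-training comparisons; the data values, the paired form of the test and the use of scipy are assumptions for illustration only.

```python
# Illustrative sketch only: hypothetical ratings, not the study data.
# Assumes each trainee rated both statements on the 0-10 visual analogue scale.
from scipy import stats

# Hypothetical paired ratings (0 = strongly disagree, 10 = strongly agree)
pass_exam = [8.5, 7.0, 9.0, 8.0, 7.5, 9.5, 8.0, 6.5]   # "...help me pass the final Fellowship examination"
competent = [6.0, 5.5, 7.0, 6.5, 5.0, 8.0, 6.0, 5.5]   # "...help me become a competent neurosurgeon"

# Student's t-test comparing the two statements (paired form assumed here)
t_stat, p_value = stats.ttest_rel(pass_exam, competent)

# Bonferroni correction for the five year-of-training comparisons
n_comparisons = 5
p_corrected = min(p_value * n_comparisons, 1.0)

print(f"t = {t_stat:.2f}, raw p = {p_value:.4f}, Bonferroni-corrected p = {p_corrected:.4f}")
```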

3. Results

The distribution of the scores awarded by the examiners in this formative assessment is shown in Table 1.

3.1. Reliability analyses

Analysis of variance reveals the sources of variance in the data: due to candidates, due to questions, and residual variance.

3.1.1. Reliability of the whole examination (2 questions)

A score for each question was obtained by summing the separate scores obtained on the three parts of each question, using the mean where replicate (multiple) examiners' scores were available for any part of a question. The generalisability coefficient was 0.28. Since the reliability of an assessment increases with the number of questions in the examination, the Spearman-Brown 'prophecy' formula can be used to calculate what the reliability would have been for a full examination (6 questions). The reliability of a 6-question paper composed of questions of similar psychometric quality would have been 0.54.

Analysis of variance reveals the extent to which the variability in the data can be ascribed to three elements: (i) differences between candidates (a desirable characteristic of an examination designed to distinguish between candidates); (ii) differences between questions (not a desirable characteristic if candidates are expected to display overall competence across the field of study, and if examiners are consistent in setting the level of difficulty of the questions and in their application of the marking standards); and (iii) residual variance, which is the combination (in unknown proportions) of: (a) the statistical interaction between candidates and questions (some candidates score well on some questions and poorly on others, while other candidates demonstrate a different pattern of performance on the same questions); and (b) 'residual error', which is the variance due to other factors not incorporated in the candidates/questions model, for example candidate fatigue or examiner error.

This analysis reveals that the sources of variance were the questions (proportion of variance = 0.04), candidates (proportion of variance = 0.16) and residual (proportion of variance = 0.80).
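As a check on the figure quoted above, the Spearman-Brown extrapolation can be written out explicitly; the lengthening factor k = 3 is inferred from moving from the 2-question trial paper to a 6-question full paper:

```latex
% Spearman-Brown 'prophecy' formula, lengthening factor k = 6/2 = 3
\rho_{6} \;=\; \frac{k\,\rho_{2}}{1 + (k - 1)\,\rho_{2}}
        \;=\; \frac{3 \times 0.28}{1 + 2 \times 0.28}
        \;\approx\; 0.54
```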

3.1.2. Inter-examiner reliability

The use of multiple markers for two of the parts of the questions allowed estimates of inter-examiner reliability to be made. The analyses yielded the results in Table 2.

Table 1
Distribution of scores for the examination

Score     Grade equivalent   Number awarded   Percentage
8         Fail               327              24%
8.5       Almost pass        496              37%
9         Pass               496              37%
9.5       Outstanding        35               3%
Missing                      3                0%
Total                        1357             100%

Three cells for Question 1.A had data missing and these were filled with the candidate's mean score from the other examiners.
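The table total can be reconciled with the marking scheme described in Section 2.1.2; the following check is inferred from the stated examiner counts rather than being stated explicitly in the paper:

```latex
% markings per candidate: Question 1 (A, B, C) and Question 2 (A, B, C)
(9 + 1 + 3) + (1 + 8 + 1) = 23 \quad\text{markings per candidate}, \qquad
23 \times 59 = 1357 \quad\text{scores in total}
```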

Table 2
Inter-examiner reliability

                                            Question 1A   Question 2B
Number of examiners                         9             8
Unadjusted generalisability coefficient     0.31          0.83
Examiner source of variance                 2%            13%
Candidates source of variance               5%            32%
Residual source of variance                 94%           54%
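For readers unfamiliar with how such figures are derived, the following is a minimal sketch of estimating variance components from a fully crossed candidates x examiners design and forming a single-rater generalisability coefficient. The marks are hypothetical, and the single-rater relative coefficient shown is one common convention; the paper does not state its exact formulation.

```python
# Illustrative sketch only: variance-component estimation for a fully crossed
# candidates x examiners design, using hypothetical marks (not the study data).
import numpy as np

# rows = candidates, columns = examiners; marks on the 8/8.5/9/9.5 scale
scores = np.array([
    [8.5, 9.0, 8.5, 9.0],
    [9.0, 9.0, 9.5, 9.0],
    [8.0, 8.5, 8.0, 8.5],
    [9.0, 8.5, 9.0, 9.5],
    [8.5, 8.5, 8.0, 8.5],
])
n_c, n_e = scores.shape

grand = scores.mean()
cand_means = scores.mean(axis=1)
exam_means = scores.mean(axis=0)

# Mean squares from the two-way ANOVA without replication
ss_c = n_e * ((cand_means - grand) ** 2).sum()
ss_e = n_c * ((exam_means - grand) ** 2).sum()
ss_res = ((scores - grand) ** 2).sum() - ss_c - ss_e

ms_c = ss_c / (n_c - 1)
ms_e = ss_e / (n_e - 1)
ms_res = ss_res / ((n_c - 1) * (n_e - 1))

# Variance components (negative estimates truncated at zero)
var_res = ms_res
var_c = max((ms_c - ms_res) / n_e, 0.0)
var_e = max((ms_e - ms_res) / n_c, 0.0)

# Single-rater relative generalisability coefficient (one common definition)
g_coefficient = var_c / (var_c + var_res)

total = var_c + var_e + var_res
print(f"candidates: {var_c / total:.0%}, examiners: {var_e / total:.0%}, "
      f"residual: {var_res / total:.0%}, G = {g_coefficient:.2f}")
```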

Table 3
Evaluation questions included in course evaluation questionnaire

Evaluation question                                                          Score mean   SD     Raw scores not agreeing with statement (i.e. <5.5)
"Training seminar will help me to pass the Fellowship examination"          8.43         1.24   2%
"Training seminar will help me become a competent neurosurgeon"             8.23         1.26   4%
"The examination will help me pass the final Fellowship examination"        8.1          1.7    8%
"The examination experience will help me become a competent neurosurgeon"   6.32         2.2    27%

Table 4
Differences in the two questions relevant to the examination with respect to year of training

                                                        Year 1      Year 2      Year 3      Year 4      Year 5      All
Number of trainees responding                           9           9           11          11          8           48
"The examination will help me pass the final
  Fellowship examination"                               8.7 ± 1.7   7.7 ± 1.8   7.6 ± 1.9   8.0 ± 1.5   8.7 ± 1.7   8.1 ± 1.7
"The examination experience will help me become
  a competent neurosurgeon"                             6.6 ± 1.3   5.8 ± 3.0   6.5 ± 2.4   5.7 ± 2.1   7.1 ± 2.2   6.3 ± 2.2
p-value for Student's t-test with Bonferroni
  correction for 5 years analysed                       0.035       0.015       0.45        0.002       1.0         <0.001

4. Course evaluation

4.1. Evaluation questionnaire

Forty-eight of 59 trainees responded. The mean and standard deviation of the responses are shown in Table 3. Two questions relevant to the seminar and two relevant to the examination have been included. Analysis of the responses to the two questions regarding the trial examination demonstrated a significant difference between the mean scores, with "the examination will help me pass the final Fellowship examination" attracting significantly higher agreement than "the examination experience will help me become a competent neurosurgeon." The proportion of respondents who did not agree with the statement that "the examination experience will help me become a competent neurosurgeon" was 27%, while no more than 8% disagreed with any of the remaining three statements. The differences in these two questions with respect to year of training are shown in Table 4.

4.2. Interview of successful candidates of the Fellowship examination

This was completed with eight of the nine candidates who both attended the training seminar and passed the Fellowship examination. There was uniform agreement that the trial examination was a valid representation of the final Fellowship examination with regard to content, time pressure and overall experience. The main benefits of the trial examination were the practice at answering questions within a restricted time, and the feedback provided to these Fellowship candidates. The final Fellowship examination was not uniformly regarded as an experience that would make them "a better neurosurgeon," but there was a minority opinion that it may do so by encouraging a breadth of reading that would be beneficial in becoming a neurosurgeon.

5. Discussion

This is a limited study of assessment in higher neurosurgical training, with significant limitations upon the conclusions that can be drawn with regard to the Fellowship examination. However, it does provide some evidence that should be considered. The examination in this seminar differed significantly from the Fellowship examination with respect to candidate preparation time, the length of time per essay, the examiners' lack of Fellowship examining experience (the examiners marking these essays had not been Fellowship examiners, although they had taken instruction from a previous Fellowship examiner), the inability of examiners to confer with colleagues to help with judgment, and the time allocated by examiners to examination judgment. However, it does raise questions as to whether assessment instruments should be evaluated more rigorously, as they may not be driving learning in an appropriate direction. This appropriate direction is to achieve a level of understanding that encourages a competent performance in neurosurgery.

From the trainees' perspective, the formative assessment was regarded as a valid experience in preparation for the final Fellowship examination (as suggested by the course evaluation questionnaire). The examination scored less well with respect to trainees believing that the formative assessment helped to create competent neurosurgeons. Furthermore, it is clear that this examination does not reach the level of reliability customarily required for a high-stakes summative assessment. However, there are no published data on the reliability of the Fellowship examination, although essay questions are traditionally known to be of low reliability.

The educational impact of an essay examination upon trainees' approaches to learning was of some concern, in that it may have contributed to encouraging a breadth of understanding and declarative knowledge at the expense of performing competently. This competent performance must be in the domains of managers of patient care, managers of themselves and managers of their environment.8 This has been previously discussed by the Neurosurgery Education Development Committee.8

5.1. Reliability

The reliability of a 6-question paper composed of questions of similar psychometric quality would have been 0.54. An acceptable level of reliability for high-stakes examinations such as this is in the range 0.70–0.80, and a value of 0.90 is the goal.5 However, it is encouraging that analysis of variance revealed that more of the variance could be ascribed to the candidates than to the questions (although most of the variance was not accounted for by these two sources).

The generalisability coefficient for Question 2 (in comparison to Question 1) was quite respectable, and the question discriminated well between candidates. The model answers for these two questions were developed by different examiners. With respect to inter-examiner reliability, Question 2.B again performed better than Question 1.A, with a respectable generalisability coefficient and the ability to distinguish between candidates. However, there was also more variability attributable to the examiners. This analysis shows that there were minor differences in performance between the two questions. These differences are probably insufficient to account for the lack of reliability of the examination as a whole. The superior performance of Question 2 (with a generalisability coefficient of 0.58 and greater candidate discrimination) may be due to the model answer for this question. The model answer for this type of examination may be crucial for acceptable inter-rater reliability, and the contribution made by the model answer to reliability deserves analysis.

5.2. Validity

The analysis of the evaluation reveals that trainees found the trial examination valid with respect to the goal of passing the Fellowship examination, but less strongly associated with the goal of becoming a competent neurosurgeon. In fact, 27% of trainees did not believe that it was of assistance towards this goal. This level of disagreement was not seen for any other question, where the maximum disagreement was 8%.

The discord between the goals of becoming a competent neurosurgeon and passing the Fellowship examination sends a message that the final Fellowship examination is not seen by trainees as being in constructive alignment with trainees' presumed ambitions to be competent neurosurgeons, although they recognise the necessity of passing the Fellowship essay examination. The impact upon learning (suggested by the semi-structured interviews) is that trainees seek to develop a breadth of declarative knowledge rather than to perform with a depth of understanding.

6. Conclusion

Training in neurosurgery must include the goals of learning to be a competent performer of neurosurgery with life-long learning skills. Any assessment that drives learning must also be valid with respect to these goals. In addition, for summative assessment, in which the stakes for both the trainees and the communities of Australia and New Zealand are high, the assessment must be reliable. The results from this limited study suggest that assessment drives learning for neurosurgical trainees in higher surgical training in Australia and New Zealand. This is evidenced by:

- Higher neurosurgical trainees recognising that preparation for formative and summative assessment may not be in constructive alignment with the outcome of competence in neurosurgery.
- Higher neurosurgical trainees learning strategically in order to achieve success at the final Fellowship examination, irrespective of whether or not such an examination relates to performing neurosurgery with competence.
- Higher surgical trainees preferring formative assessment that reproduces aspects of what is to come in the final Fellowship examination, including aspects of anxiety and time limitations.

Neither reliability nor validity was achieved with the formative assessment employed in this study. Furthermore, the assessment would not have achieved acceptable reliability even if it were doubled in length and time. The role of essays as an assessment instrument for the high-stakes surgical Fellowship examination is important to examine for performance assessment.10 Written examinations are considered to be central to the "hidden curriculum".1 Therefore, their impact must be seen as crucial irrespective of the number of other assessment tools employed by the RACS (including oral examination, supervisor assessment and portfolio examination). A written examination lacks constructive alignment with the outcome goal of performing surgery competently, although it may be possible to test surgical decision-making by this instrument. Following Miller's schema, it is important for neurosurgeons (both in training and post-training) to "show how" or "do" for the assessment, a level that should be required of a credentialing examination in surgery.10 Such an assessment instrument would drive learning in a desirable direction.

Appendix 1

Examination questions. Each question is of equal marks.

Question 1. A 52-year-old woman, whose brother recently died from a cerebral aneurysm rupture, presents with headache and a CT scan suggests a 1-cm anterior communicating artery aneurysm.

(a) Describe the factors that influence formation, growth and rupture of intracranial aneurysms.
(b) What further investigations would you recommend to this patient and what advice would you give her for management and prognosis?
(c) What are the arguments for and against endovascular versus direct microsurgical clipping of intracranial aneurysms?

Question 2. A 44-year-old right-handed man previously in good health developed sudden severe headache and shortly thereafter, a dense left hemiparesis. He remained fully alert. A CT scan showed a right fronto-parietal intracerebral haemorrhage with moderate midline shift. A carotid angiogram showed a right parietal arteriovenous malformation close to the Rolandic fissure. The nidus was 3.5 cm in diameter and extended toward the lateral ventricle. The arterial supply was predominantly from the right middle cerebral artery with a lesser supply from the anterior cerebral artery.

(a) Outline your immediate management.
(b) Discuss the treatment options including the advantages and disadvantages of each option.
(c) Discuss the pathological anatomy of arteriovenous malformations and indicate those factors which influence surgical management.

References

1. Van der Vleuten CPM. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ 1996;1:41–67.
2. Crooks T. The impact of classroom evaluation practices on students. Assess Educ 1998;5:131–7.
3. Fowell SL, Maudsley G, Maguire P, et al. Report of findings: student assessment in undergraduate medical education in the United Kingdom 1998. Med Educ 2000;34(Suppl 1):1–78.
4. Harden RM, Crosby JR. AMEE education guide no. 20: The good teacher is more than a lecturer – the twelve roles of the teacher. Med Teach 2000;22:334–47.
5. Shumway JM, Harden RM. AMEE guide no. 25: The assessment of learning outcomes for the competent and reflective physician. Med Teach 2003;25:569–84.
6. Biggs J. Teaching for Quality Learning at University: What the Student Does. 2nd ed. Philadelphia: The Society for Research into Higher Education and Open University Press; 2003.
7. Ramsden P. Learning to Teach in Higher Education. 2nd ed. New York: Routledge Falmer; 2003.
8. Morgan MK, Clarke RM, Lyon PMA, et al. The neurosurgical training curriculum in Australia and New Zealand is changing. Why? J Clin Neurosci 2005;12:115–8.
9. Friedman Ben-David M, Davis MH, Harden RM, et al. AMEE education guide no. 24: Portfolios as a method of student assessment. Med Teach 2001;23:535–52.
10. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65:S63–7.