Evaluation of objective structured clinical examination for advanced orthodontic education 12 years after introduction


ORIGINAL ARTICLE

Evaluation of objective structured clinical examination for advanced orthodontic education 12 years after introduction
Henry W. Fields,a Do-Gyoon Kim,a Minjeong Jeon,b Allen R. Firestone,a Zongyang Sun,a Shiva Shanker,a Ana M. Mercado,a Toru Deguchi,a and Katherine W. L. Vigc
Columbus, Ohio, and Boston, Mass

Introduction: Advanced education programs in orthodontics must ensure student competency in clinical skills. An objective structured clinical examination has been used in 1 program for over a decade. The results were analyzed cross-sectionally and longitudinally to provide insights regarding the achievement of competency, student growth, question difficulty, question discrimination, and question predictive ability. Methods: In this study, we analyzed 218 sets of station scores (82 first-year, 68 second-year, and 68 third-year) from 85 orthodontic students. The grades originated from 13 stations and were collected anonymously for 12 consecutive years during the first 2 decades of the 2000s. The stations tested knowledge and skills regarding dental relationships, analyzing a cephalometric tracing, performing a diagnostic skill, identifying cephalometric points, bracket placement, placing first-order and second-order bends, forming a loop, placing accentuated third-order bends, identifying problems and planning mixed dentition treatment, identifying problems and planning adolescent dentition treatment, identifying problems and planning nongrowing skeletal treatment, superimposing cephalometric tracings, and interpreting cephalometric superimpositions. Results were evaluated using multivariate analysis of variance, chi-square tests, and latent growth analysis. Results: The multivariate analysis of variance showed that all stations except 3 (analyzing a cephalometric tracing, forming a loop, and identifying cephalometric points) had significantly lower mean scores for the first-year student class than the second- and third-year classes (P < 0.028); scores between the second- and third-year student classes were not significantly different (P > 0.108).
The chi-square analysis of the distribution of the number of noncompetent item responses decreased from the first to the second years (P < 0.0003), from the second to the third years (P < 0.0042), and from the first to the third years (P < 0.00003). The latent growth analysis showed a wide range of difficulty and discrimination between questions. It also showed continuous growth for some areas and the ability of 6 questions to predict competency at greater than the 80% level. Conclusions: Objective structured clinical examinations can provide a method of evaluating student performance and curriculum impact over time, but cross-sectional and longitudinal analyses of the results may not be complementary. Significant learning appears to occur during all years of a 3-year program. Valuable questions were both easy and difficult, discriminating and not discriminating, and came from all domains: diagnostic, technical, and evaluation/synthesis. (Am J Orthod Dentofacial Orthop 2017;151:840-50)

a Division of Orthodontics, College of Dentistry, Ohio State University, Columbus, Ohio.
b College of Arts and Sciences, Department of Psychology, Ohio State University, Columbus, Ohio.
c Advanced Graduate Education Program in Orthodontics, Department of Developmental Biology, Harvard School of Dental Medicine, Boston, Mass.
All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest, and none were reported.
Address correspondence to: Henry W. Fields, Division of Orthodontics, College of Dentistry, Ohio State University, 4088F Postle Hall, 305 W. 12th Ave, Columbus, OH 43210; e-mail, fi[email protected].
Submitted, July 2016; revised and accepted, October 2016.
0889-5406/$36.00
© 2017 by the American Association of Orthodontists. All rights reserved. http://dx.doi.org/10.1016/j.ajodo.2016.10.031


“An advanced specialty education program in orthodontics and dentofacial orthopedics requires extensive and comprehensive clinical experience, which must be representative of the character of orthodontic problems encountered in private practice.”1 These skills must be ensured at the competency level. All programs must periodically assess student progress with feedback and at the same time provide increased independence and responsibility to the students. Successfully meeting these requirements must be demonstrated during the accreditation process. This process is in place to encourage programs to recognize positive migrations in practice methods, evaluate their curriculums, and ensure that the students are receiving a contemporary education. Numerous methods are available to evaluate students and curriculums. Some accomplish 1 form of evaluation better than others. Over a decade ago, the Ohio State University orthodontic graduate program elected to provide an annual objective structured clinical examination (OSCE), which is a timed, multistation, objective, and anonymous performance-based evaluation of clinically related skills for its students to monitor their growth and the adequacy of the curriculum.2 This method attempts to control variables known to influence evaluation, such as expectations and previous student contact, through an anonymous and standardized examination format.3 There are challenges to constructing this type of examination, such as space, logistics, and the time required to administer it. Although not new to health science evaluation or dentistry, it had not been used routinely in the United States for graduate-level orthodontics.4 The graduate orthodontic OSCE previously reported2 was developed based on entry-level skills determined by a survey of board-certified and nonboard-certified private orthodontic practitioners in Ohio. The top-rated skills that could sensibly be evaluated with an OSCE were chosen, and 13 stations were developed. The test focused on areas of diagnosis, clinical evaluation and synthesis, and orthodontic technique. This program, the rationale for its implementation, and the results of 3 years' experience were previously described in detail.2 The purpose of the current project was to report on the results of 12 years of experience with the previously reported OSCE in a graduate orthodontic environment.

MATERIAL AND METHODS

OSCEs have been performed annually in the advanced orthodontic education (graduate) program at Ohio State University. The study was approved by the Institutional Review Board, prospectively for the first 3 years and retrospectively for the last 9 years. This study used grades of each station, which were collected for 12 consecutive years during the first 2 decades of the 2000s for the examination given each September. Stations were set up in multiple rooms of the program's facilities so that there were 2 identical versions of each station to make the length of the examination more manageable. All required materials were provided at each station: eg, pencils, pliers, and measuring devices. All students started the examination at the same time and rotated through all 13 stations at the same pace. An audio signal provided the start and end times for each station (10 minutes per question), when the students moved to the next designated station. The starting station for each student was randomly determined. All students carried a packet of precoded answer sheets so that their identity and class were anonymous. For stations testing hands-on skills such as wire bending, bracket placement, and so on, the examination products were submitted with the answer sheets for subsequent grading. Each station was scored on a 10-point basis, with points distributed proportionately when questions had multiple parts. The points were converted to percentages for final reporting. For 10 of the 13 stations, the examiners did not change. For 1 station, the examiner was trained by the previous examiner; for the remaining 2 stations, the station did not change from year to year, so the previous correct answers were reviewed as a method of calibration. There were 13 stations for each OSCE: (1) identify dental relationships, (2) bracket placement, (3) identify problems and plan treatment—mixed dentition, (4) analyze a cephalometric tracing, (5) identify problems and plan treatment—adolescent dentition, (6) place first- and second-order finishing bends, (7) form a loop, (8) identify problems and plan treatment—nongrowing skeletal problems, (9) perform a diagnostic skill, (10) place accentuated third-order bends, (11) identify cephalometric points, (12) superimpose cephalometric tracings, and (13) interpret cephalometric superimpositions. For more details on each station, see Table I.

Statistical analysis

This 33-month orthodontic program maintained 1 class per year. Through the 12 years included in this project, the number of students in each class varied between 4 and 6. In addition, a 1-year internship program was offered for 1 or 2 postdental graduates, who were included in the first-year student group. All students participated in the OSCE every year, with the exception of some absences for illness. We analyzed 218 sets of station scores (82 first-year, 68 second-year, and 68 third-year) from 85 orthodontic trainees. We first looked at the data cross-sectionally by analyzing students' overall performance on each station across classes (first year, second year, and third year) using multivariate analysis of variance. The distribution of competent station responses vs noncompetent responses by resident year also was examined by chi-square analysis with the Bonferroni adjustment. We also looked at the data longitudinally. Data from the first 5 years of the OSCE were anonymous, so we could not follow any 1 student. Also, students in the last 2 years of testing had not completed all 3 years of

American Journal of Orthodontics and Dentofacial Orthopedics

May 2017  Vol 151  Issue 5


Table I. OSCE station descriptions

Station 1: Identify dental relationships and status. The students were asked to analyze 3 sets of digital casts and evaluate the occlusion for crowding, spacing, missing/impacted teeth, and interarch relationships in 3 planes of space. These cases were judged by faculty to remain equivalent.

Station 2: Place bondable brackets on plaster models using a direct bonding technique. The students were expected to place brackets on the incisors, canines, and first premolars in both maxillary and mandibular arches of plaster models using wax as a temporary luting agent, adjust their positions and angulations, and then remove any excessive wax around the brackets. Immediately after completing the task, the models with the brackets attached were collected with care (to avoid bracket drifting) and graded by 2 calibrated faculty examiners. For each tooth, the mesiodistal position, vertical position, and mesiodistal angulation of the brackets were judged. Three levels of grades (2, ideal position/angulation; 1, reasonable position/angulation; 0, poor position/angulation or no bracket placed) were assigned to each tooth. This was the same exercise each year.

Station 3: Analyze records for a growing mixed-dentition patient. This station simulated evaluation of a mixed-dentition patient who had problems related to eruption, crowding, or skeletal malocclusion—sometimes in combination. The challenge was to analyze the records including a brief patient description, chief complaint, relevant past medical and dental histories, extraoral and intraoral photographs, lateral cephalometric and panoramic radiographs, cephalometric tracings, and measured values and images of either digital or plaster casts. The student ranked the top 3 problems on the prioritized problem list using the method of Ackerman et al5 with a description for each problem and the most ideal treatment plan. Multiple faculty (ie, an expert panel) agreed on the hierarchies of problems and the potential treatment plans. These cases were judged by faculty to remain equivalent.

Station 4: Analyze a single cephalometric headfilm. This exercise showed a cephalometric tracing with the patient's values, normal values, and standard deviations. All indications of degree of deviation from normal values (asterisks) were removed. Students were asked to provide a diagnosis (summary) of the skeletal, dental, and soft-tissue status of the patient and the rationale/cephalometric values for their diagnosis for each portion. This case was judged by faculty to remain equivalent.

Station 5: Analyze records for a comprehensive adolescent treatment patient. This station simulated evaluation of an adolescent patient in the early permanent dentition with problems relating to a developing Class II or Class III malocclusion, including vertical and transverse problems with a skeletal component. We provided the records including a brief patient description, chief complaint, relevant medical and dental histories, extraoral and intraoral photographs, lateral cephalometric and panoramic radiographs, cephalometric tracing, and measured values and images of either digital or plaster casts. The students ranked the top 3 problems on the prioritized problem list using the method of Ackerman et al5 with a description for each problem and the most ideal treatment plan. The students were tested in this station on knowledge specifically as it relates to problems and management of a malocclusion with a skeletal component. They were asked to discriminate between the choices of growth modification, camouflage, and surgery. This exercise tested their most routine and fundamental decisions in many orthodontic scenarios. Multiple faculty (ie, an expert panel) agreed on the hierarchies of problems and the potential treatment plans. These cases were judged by faculty to remain equivalent.

Station 6: Place first- and second-order finishing bends. These bends were placed in a 0.018 × 0.025-in stainless steel preformed archwire. The bends requested were a combination of distal root tip, mesial root tip, step-up, or step-down. A typodont with brackets and bands was provided as a reference for the location of the required bends on selected maxillary central incisors, lateral incisors, and canines, and the arch form and size. Markers and a variety of pliers were provided. After completing the exercise, students taped the finished archwire to the answer sheet. The final score took into account the correct execution of bends in the vertical and horizontal dimensions as well as the location, arch symmetry, form, size, and overall flatness. This exercise was some portion of the described exercise each year.

Station 7: Form an archwire including a specific loop at a specific location. In 2003, the request was to create an arch form from a straight 0.018 × 0.025-in stainless steel archwire. In the years after that, the request was to place a single loop (boot loop, teardrop loop, box loop, or T-loop) between a mandibular canine and premolar on a 0.018 × 0.025-in stainless steel preformed archwire. A typodont with brackets and bands was provided as a reference for the location of the required bends and the arch form and size. Markers and a variety of pliers were provided. After completing the exercise, students taped the finished archwire to the answer sheet. The final score took into account the correct execution of bends in the vertical and horizontal dimensions as well as the location, arch symmetry, form, size, and overall flatness. This exercise was judged by faculty to remain equivalent.

Station 8: Analyze records for an adult patient with a severe skeletal malocclusion. This station simulated evaluation of a nongrowing patient with a severe skeletal malocclusion. We provided the records including a brief patient description, chief complaint, relevant medical and dental histories, extraoral and intraoral photographs, lateral cephalometric and panoramic radiographs, cephalometric tracing, and measured values and images of either digital or plaster casts. The students were asked to rank the top 3 problems on the prioritized problem list using the method of Ackerman et al5 and a brief description for each. The students were then asked to develop the most ideal treatment plan. The students were tested in this station on their knowledge related to problems and management of a nongrowing patient with a significant skeletal component addressed with a combined surgical-orthodontic approach. They were tested on their knowledge of identifying various skeletal and occlusal traits that are associated with a severe skeletal problem. They should be able to identify and manage dental compensations that coexist in a skeletal malocclusion and make decisions with respect to treatment planning, including the need for extractions. They also should be able to define the needed surgical procedure, its timing, and the sequencing of care. Multiple faculty (ie, an expert panel) agreed on the hierarchies of problems and the potential treatment plans. These cases were judged by faculty to remain equivalent.


Table I. Continued

Station 9: Perform a diagnostic method. This station simulated completion and/or interpretation of a supplemental diagnostic method such as space analysis,6 tooth-size analysis,7 or cervical vertebral maturation staging (CVMS).8 Students were provided either 1:1 digital casts and a Boley gauge along with the Bolton relationships algorithm or at least 1 cephalometric headfilm to perform the necessary analysis and interpretation, respectively. The space discrepancy, tooth-size discrepancy, or CVMS stage was determined, and the clinical implication of that finding was discussed in the context of the clinical scenario. This exercise rotated among those listed each year, and the exercises were judged to be equivalent.

Station 10: Place accentuated third-order bends (eg, anterior torque, progressive posterior torque). These bends were placed on a 0.018 × 0.025-in stainless steel preformed archwire. The bends requested were lingual root torque on the maxillary central incisors and lateral incisors and buccal root torque on a maxillary premolar. A typodont with brackets and bands was provided as a reference for the location of the required bends and the arch form and size. Markers and a variety of pliers were provided. After completing the exercise, students taped the finished archwire to the answer sheet. The final score took into account the correct execution of bends in direction and the magnitude of torque as well as the location, arch symmetry, form, size, and overall flatness. This was the same exercise each year.

Station 11: Identify cephalometric points. Up to 13 cephalometric points were listed for identification by the student on a printed copy of a cephalometric headfilm. Structural fiducials were provided on the image and transferred to an acetate tracing overlay by the students. Cephalometric point locations were marked on the acetate sheet. For grading, the students' fiducials were aligned with those on the answer key acetate. The key had the points marked as determined by consensus of 2 examiners, with an oval surrounding the point that indicated the calculated envelope of error.9 Student identification marks in the envelope were scored as correct. This was the same exercise each year.

Station 12: Complete cephalometric superimpositions. In this section, students were asked to superimpose cephalometric radiographs from before and after orthodontic treatment. The case used was a skeletal and dental Class II malocclusion treated with maxillary and mandibular premolar extractions. Two sets of transparent sheets with before (black line) and after (red line) tracings (full face including soft tissues, maxilla, and mandible) were presented. In addition, cephalometric values and the amounts of crowding before and after orthodontic treatment were indicated. The students were instructed to superimpose on the cranial base and regionally (maxilla and mandible) by orienting and taping the provided tracings together on a background piece of paper. The method of superimposition was according to methods described by the division in teaching materials. For assessment, 10 points (cranial base, 4 points; maxilla, 3 points; mandible, 3 points) were awarded. In evaluating the cranial base superimposition, 1 point was assigned for each of the correct reference point or structure for superimposition, correct resulting maxillary changes (anteroposterior and vertical), correct resulting mandibular changes, and correct resulting soft-tissue changes. For the maxillary superimposition, 1 point was assigned for each of the correct reference point or structure for superimposition, correct resulting maxillary incisor movement, and correct resulting maxillary molar movement. For the mandibular superimposition, the same evaluation was made. The final score was the percentage correct. This was the same exercise each year.

Station 13: Analyze cephalometric superimpositions. After making the superimpositions in station 12, the students interpreted the treatment results for the cranial base and the maxillary and mandibular superimpositions. They also commented on the suggested treatment plan and mechanics (eg, extraction or nonextraction, anchorage, and appliance designs) used in the case. For assessment, 10 points were awarded (2 points each for cranial base, maxillary, and mandibular interpretation, treatment plan, and mechanics). For the cranial base, if the student adequately identified the changes of at least 2 of 3 regions (maxilla, mandible, or soft tissue), 2 points were given. In the maxillary (and in the mandibular) interpretation, 1 point each was awarded for incisor and molar movement. For the treatment plan, if students were able to indicate that the case was treated with both maxillary and mandibular premolar extractions and identify the degree of anchorage control, 2 points were awarded. For mechanics, if the students were able to describe that anchorage reinforcement and some intrusion mechanics were used, 2 points were awarded. This was the same exercise each year.
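The station-12 rubric above (10 points: cranial base 4, maxilla 3, mandible 3, reported as a percentage) amounts to a simple tally. A minimal sketch follows; this is not the authors' grading code, and the criterion labels are hypothetical names for the rubric items described in the table.

```python
# Illustrative scoring sketch for the station-12 superimposition rubric.
# Criterion labels are hypothetical; the point structure follows Table I.

CRANIAL_BASE = ["reference_structure", "maxillary_changes",
                "mandibular_changes", "soft_tissue_changes"]      # 4 points
MAXILLA = ["reference_structure", "incisor_movement", "molar_movement"]   # 3
MANDIBLE = ["reference_structure", "incisor_movement", "molar_movement"]  # 3

def score_station_12(cranial, maxilla, mandible):
    """Each argument is the set of rubric items judged correct.
    Returns the final score as a percentage (1 point per item, 10 total)."""
    points = (len(set(cranial) & set(CRANIAL_BASE))
              + len(set(maxilla) & set(MAXILLA))
              + len(set(mandible) & set(MANDIBLE)))
    return 100.0 * points / 10  # final score reported as percentage correct

# Example: all 4 cranial-base items correct, 2/3 maxilla, 1/3 mandible -> 70.0
print(score_station_12(CRANIAL_BASE,
                       ["reference_structure", "incisor_movement"],
                       ["molar_movement"]))
```

The same tally-then-normalize pattern applies to any of the 10-point stations, since points were distributed proportionately across question parts.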

evaluation. Therefore, we could not include these 2 groups in a longitudinal analysis of growth in the 3-year program. Ultimately, we used the OSCE data that were collected from 28 students who entered the program and completed it during a consecutive 5-year period. These students completed the program in 3 years, and these data were collected from 5 cohorts of students over 8 years. In this manner, each student could be tracked from year to year. This formed the basis for the latent growth analysis. Latent growth curve modeling was used to assess students' growth during their 3-year residency program.10 A latent growth curve model is a statistical method that allows for direct examination of measurement qualities as well as within-person changes over time.11 This type of model consists of 2 parts: a

measurement model and a structural model. The measurement model specifies an observed score (or dependent variable) as a function of the latent variable that represents the psychological construct or ability that is intended to be measured. In addition, the measurement model assumes that the 2 measurement properties (ie, easiness and discrimination) as well as the measurement error variance remain constant across different times to ensure that the measurement is invariant over time. The structural model specifies the within-person growth of the latent variable and the individual differences in the within-person growth. This part of the model specifies a linear growth curve model for the latent trait as a function of time points and subject groups. For this project, we considered the residency program's 3 years as the time points and the students'


Table II. Comparison of OSCE scores (mean ± SD) between student classes for each station over 12 years

Station 1 (Diag/dental relationships): First 73.52 ± 16.72; Second 82.96 ± 12.24; Third 85.79 ± 10.06; 1 < 2 = 3
Station 4 (Diag/analyze a cephalometric tracing): First 80.70 ± 13.73; Second 83.83 ± 13.23; Third 83.37 ± 13.54; 1 = 2 = 3
Station 9 (Diag/diagnostic skill): First 63.93 ± 39.09; Second 75.96 ± 30.37; Third 81.31 ± 26.35; 1 < 2 = 3
Station 11 (Diag/identification of cephalometric points): First 69.37 ± 14.64; Second 73.51 ± 17.11; Third 75.76 ± 14.36; 1 = 2, 2 = 3, 1 < 3
Station 2 (Tech/bracket placement): First 81.06 ± 14.33; Second 89.44 ± 9.10; Third 91.17 ± 7.69; 1 < 2 = 3
Station 6 (Tech/place first- and second-order finishing bends): First 58.72 ± 25.77; Second 81.69 ± 17.77; Third 89.11 ± 10.47; 1 < 2 < 3
Station 7 (Tech/form a loop): First 72.66 ± 18.06; Second 77.79 ± 16.69; Third 79.65 ± 15.07; 1 = 2, 2 = 3, 1 < 3
Station 10 (Tech/place accentuated third-order bends): First 59.88 ± 20.64; Second 77.64 ± 16.66; Third 81.31 ± 15.02; 1 < 2 = 3
Station 3 (Eval and synth/identify problems and plan treatment—mixed dentition): First 66.53 ± 16.40; Second 74.73 ± 16.15; Third 72.84 ± 17.83; 1 < 2 = 3
Station 5 (Eval and synth/identify problems and plan treatment—adolescent dentition): First 64.33 ± 19.59; Second 73.97 ± 13.43; Third 75.51 ± 16.51; 1 < 2 = 3
Station 8 (Eval and synth/identify problems and plan treatment—nongrowing skeletal problems): First 69.15 ± 17.28; Second 76.18 ± 10.55; Third 80.07 ± 12.80; 1 < 2 = 3
Station 12 (Eval and synth/superimpose cephalometric tracings): First 79. ± 25.37; Second 88.88 ± 13.89; Third 91.03 ± 12.93; 1 < 2 = 3
Station 13 (Eval and synth/interpret cephalometric superimpositions): First 61.39 ± 23.91; Second 73.98 ± 18.80; Third 79.45 ± 14.89; 1 < 2 = 3

Diag, Diagnostic skill; Tech, technique skill; Eval and synth, evaluation and synthesis skill; =, not significant (P > 0.063); <, significantly different (P < 0.028).

entrance classes as the subject group variables. The constructed model was estimated using the software package gllamm in Stata (StataCorp, College Station, Tex). All other analyses were done in R (R Foundation, Vienna, Austria).

RESULTS

The overall change of OSCE scores over a 12-year period is summarized in Table II. All stations except stations 4 (cephalometric analysis), 7 (form a loop), and 11 (cephalometric points) had significantly lower mean scores for the first-year student class than the second- and third-year classes (P < 0.028), whereas those between the second- and third-year student classes were not significantly different (P > 0.108). For station 4 (cephalometric analysis), the mean scores were not significantly different between the student classes (P > 0.160) (Table II). For stations 7 (form a loop) and 11 (cephalometric points), the first-year student class had significantly lower mean scores than did the third-year class (P < 0.012), although scores were not significantly different between the first- and second-year student classes (P > 0.063). Table II demonstrates that the mean score levels that reached competency (greater than or equal to 80%) for year 1 were for 2 stations; for year 2, 5 stations; and for year 3, 8 stations. Additionally, the number of noncompetent individual station scores and the distribution of noncompetent vs competent scores decreased from the first to the second years (χ² = 46.15; P < 0.0003), from the second to the third years (χ² = 10.26; P < 0.0042), and


from the first to the third years (χ² = 101.40; P < 0.00003). Results for the latent growth analysis are presented in Figures 1 through 4 and Table III. First, the latent growth model suggested that the difficulty of the 13 OSCE stations ranged from 6.15 to 8.05 (Fig 1). Since each OSCE station was evaluated on a 0% to 100% basis, stations for bracket placement and cephalometric superimposition were relatively easier, whereas stations for adolescent treatment and a diagnostic skill were more difficult than the other stations. The difference in the difficulty estimates between the station for cephalometric superimposition (easiest) and the station for a diagnostic skill (most difficult) was 1.90, which was significantly different from 0 (χ² = 18.06; P < 0.001). Second, the levels of discrimination for a question (ie, those who did best overall did well on a question, and those who did worse overall did worse on the question) of the 13 stations ranged from 0.02 to 3.31 (Fig 2). In general, estimated values greater than 2 indicated high discrimination, values between 1 and 2 indicated moderate discrimination, and values smaller than 1 indicated low discrimination. Stations for bracket placement, first- and second-order bends, form a loop, a diagnostic skill, third-order bends, cephalometric points, cephalometric superimposition, and interpret a superimposition were relatively more discriminating, whereas stations for mixed dentition treatment, analyze a cephalometric tracing, adolescent treatment, and nongrowing treatment were relatively less discriminating.
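The discrimination thresholds just described (above 2 high, between 1 and 2 moderate, below 1 low) can be expressed as a small classifier. The sketch below uses invented placeholder estimates, not the published station values, and assumes boundary values fall into the adjacent higher band, which the text leaves unspecified.

```python
# Sketch: banding item-discrimination estimates per the thresholds in the text.
# Exact handling of the boundaries (a == 1, a == 2) is an assumption here.

def discrimination_band(a):
    """Classify a discrimination estimate: >2 high, 1-2 moderate, <1 low."""
    if a > 2:
        return "high"
    if a >= 1:
        return "moderate"
    return "low"

# Placeholder estimates for a few stations (illustrative only).
estimates = {
    "bracket placement": 3.1,
    "form a loop": 2.4,
    "cephalometric points": 1.6,
    "adolescent treatment": 0.9,
    "mixed dentition treatment": 0.4,
}

for station, a in estimates.items():
    print(f"{station}: a = {a:.2f} -> {discrimination_band(a)}")
```

A discriminating station separates strong from weak examinees; a low-discrimination station scores everyone similarly regardless of overall ability.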


Fig 1. Easiness of the 13 OSCE stations and their 95% confidence intervals. A higher value indicates an easier station. The dotted horizontal line indicates the mean easiness value of 6.90.

Next, the results for the structural part of the latent growth model suggested that the students' overall ability improved over time (Fig 3). The amount of increase in ability was 0.99 between year 1 and year 2 (P < 0.001); it was 1.55 between year 1 and year 3 (P < 0.001). The students' performance improved more between years 1 and 2 than between years 2 and 3 (χ² = 18.02; P < 0.001). Also, nearly every class exceeded the previous one in the first and the final OSCE examinations (Fig 3). When we examined the students' individual performance on each station, somewhat different growth patterns were observed from the longitudinal analysis (Fig 4). For instance, for bracket placement, there was a significant performance improvement between years 1 and 2, with a mean difference of 1.33 (P < 0.01), but there was no significant increase between years 2 and 3 (P = 0.34). For a diagnostic skill, the students' performance did not significantly improve between years 1 and 2 (P = 0.16); between years 2 and 3, however, there was a significant improvement, with a mean difference of 1.62 (P = 0.05). Furthermore, the mean differences in the performance levels between the resident classes were somewhat different across stations. For instance, for the station to form a loop, the differences between class 1 and classes 3, 4, and 5 were significant at the P = 0.025, P = 0.012, and P < 0.01 levels, respectively. Finally, some questions were better at predicting overall competency (Table III). Stations for a diagnostic skill, adolescent treatment, cephalometric superimposition, bracket placement, form a loop, and interpret a superimposition (in order) were quite predictive, but stations for third-order bends, first- and second-order bends, nongrowing treatment, cephalometric points, and analyze a cephalometric tracing were less predictive (in order). The first 6 variables together explained 82.8% of the variability in predicting overall competency.

DISCUSSION

First, these data are specific for this examination and constellation of questions. Different choices could have been made in the type of questions and their content, but we wanted a group of skill areas that were diverse, clinically applicable, and lent themselves to an OSCE format. The skills were categorized as diagnostic, technical, and evaluation/synthesis and covered a full range of Bloom's taxonomy.12 We believe that they are a worthy group of skills, because they were validated by active regional practitioners.2 These data illustrate that it is possible to perform cross-sectional analysis that allows for comparisons between students of different classes, whereas longitudinal analysis allows us to examine the growth of individual students to evaluate their performance and the curriculum. But the data treated as either cross-sectional or longitudinal may tell contrasting stories. When we examined the data cross-sectionally (Table II) for all classes, we were able to achieve more competency (80% level) with diagnostic skills and technical skills than with evaluation and synthesis skills, where we only succeeded in 2 of 5 skills. Lower performance in evaluation and synthesis is understandable, since educational taxonomy indicates that these are higher-level skills.12 Analyzing a cephalometric headfilm achieved competency in the first year and maintained it at the

American Journal of Orthodontics and Dentofacial Orthopedics

May 2017  Vol 151  Issue 5

Fields et al

846

Fig 2. Discrimination of the 13 OSCE stations with 95% confidence intervals. The higher the value, the more discriminating the station. The dotted horizontal line indicates the mean value of 1.77.

Fig 3. The overall growth of students in their OSCE performance over the 3-year period for 5 consecutive cohort groups (class 1 to class 5, respectively).

competency level throughout. That skill and bracket placement appeared to be the most readily attainable skills. For only 1 skill—placing first- and second-order finishing bends—did the cross-sectional analysis indicate significant changes between the first and second years, and between the second and third years. One can understand that a more complex technical skill may be more difficult to master. But for 9 of the 12 skills, there was no significant change after the second year of the program; this seemingly questions the worth of the third year of the program, at least for these skills.
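The cross-sectional comparison described above (competency rates by year of training) can be checked with a chi-square test of independence, one of the statistical methods the study reports using. The sketch below uses entirely synthetic counts, not the study's data; the class sizes and competency threshold are illustrative assumptions.

```python
# Chi-square test of independence on hypothetical competency counts.
# Rows: reached / did not reach the 80% competency level on one station.
# Columns: first-, second-, and third-year residents.
# All counts are synthetic illustrations, not the study's data.
from scipy.stats import chi2_contingency

observed = [
    [40, 58, 60],  # residents at or above the competency level
    [42, 10, 8],   # residents below it
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
```

A small p-value here would indicate that the proportion of competent residents differs across the 3 years, which is the pattern the cross-sectional MANOVA and chi-square results describe.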


Although all but 1 resident had at least 1 noncompetent station in every examination, when viewed at the competency-by-station level there was significant improvement through the 3 years. Certainly, this demonstrated competency improvement throughout the program. In the cross-sectional individual-skill analysis, growth from year 2 to year 3 occurred for only 1 skill: placing finishing bends. But even when years 1 and 2, and years 2 and 3, were equivalent, there was enough growth from year 1 to year 3 to be significant for cephalometric point identification and loop forming.
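As a rough illustration of growth over the 3 examinations, one can fit a straight-line trajectory to each resident's yearly totals. This per-student least-squares fit is only a simplified stand-in for the latent growth model the study actually used (which estimates mean and variance of the growth factors jointly), and the scores below are synthetic.

```python
# Per-resident linear growth in overall OSCE score across the
# 3 yearly examinations -- a simplified stand-in for the latent
# growth model; all scores are synthetic.
import numpy as np

years = np.array([1.0, 2.0, 3.0])
scores = np.array([          # rows = residents, columns = years 1-3
    [70.0, 78.0, 81.0],
    [65.0, 74.0, 76.0],
    [72.0, 80.0, 83.0],
    [60.0, 71.0, 74.0],
])

# polyfit on the transposed matrix fits one line per resident;
# row 0 of the coefficient array holds the slopes (gain per year).
slopes = np.polyfit(years, scores.T, deg=1)[0]
print("per-resident yearly gain:", slopes.round(2))
print("mean yearly gain:", float(slopes.mean()))
```

In the study's data, such trajectories are steeper between years 1 and 2 than between years 2 and 3, which the linear fit would flatten; a quadratic or piecewise growth term would capture that deceleration.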


Fig 4. The growth of students in their OSCE performance per station over the 3-year period for 5 cohort groups (class 1 to class 5, respectively).


Table III. Based on second- and third-year scores for the 5-year longitudinal data, the predictive importance of the stations for overall competency, listed in decreasing order

Station  Skill                                                                            Weight
9        Diag/diagnostic skill                                                            2.80
5        Eval and synth/identify problems and plan treatment—adolescent dentition        1.38
12       Eval and synth/superimpose cephalometric tracings                                1.25
2        Tech/bracket placement                                                           1.23
7        Tech/form a loop                                                                 1.06
13       Eval and synth/interpret cephalometric superimpositions                          1.05
1        Diag/dental relationships                                                        1.00
3        Eval and synth/identify problems and plan treatment—mixed dentition             0.99
10       Tech/place accentuated third-order bends                                         0.71
6        Tech/place first- and second-order finishing bends                               0.70
8        Eval and synth/identify problems and plan treatment—nongrowing skeletal problems 0.33
11       Diag/identification of cephalometric points                                      0.18
4        Diag/analyze a cephalometric tracing                                             0.20

Diag, Diagnostic skill; Tech, technique skill; Eval and synth, evaluation and synthesis skill.
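One plausible way to obtain relative station weights of the form shown in Table III is to regress an overall competency measure on the individual station scores and scale the coefficients to a reference station. The study does not specify its exact weighting procedure in this section, so the sketch below, on synthetic data, is only an assumed analogue.

```python
# Regress a synthetic overall-competency score on station scores and
# express the coefficients relative to a reference station (weight 1.00),
# mirroring the form of Table III. The study's actual weighting
# procedure is not specified here; this is an assumed analogue.
import numpy as np

rng = np.random.default_rng(0)
n = 120                                     # hypothetical residents
stations = rng.normal(size=(n, 3))          # 3 station scores each
true_coefs = np.array([2.0, 1.0, 0.4])      # assumed underlying influences
overall = stations @ true_coefs + rng.normal(scale=0.5, size=n)

coefs, *_ = np.linalg.lstsq(stations, overall, rcond=None)
weights = coefs / coefs[1]                  # station 2 is the reference
print("relative weights:", weights.round(2))
```

With this scaling, a weight near 2.8 (as for the diagnostic-skill station) means that station carries almost three times the predictive influence of the reference station on overall competency.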

The longitudinal analysis also demonstrated that significant overall growth occurred during the program, between years 1 and 2 and between years 1 and 3. Growth also occurred between years 2 and 3, just not to a statistically significant level for the tested skills (Fig 3). In the longitudinal analysis of individual skills, the most significant change from year 2 to year 3 was for the diagnostic skill. So, considering all perspectives of improvement or growth, some diagnostic and technical advances were contributed by the third year. Also, in nearly all instances, subsequent classes exceeded previous classes. This may have been the case because the growth for each class was similar during the program, but the introductory educational experiences before the first examination, which was given in the fall of each year, may have been enhanced.

The longitudinal analysis of individual questions (Fig 4) shows different patterns of growth for different questions. Also, there were differences between some classes and not between others for the same questions. Interestingly, although different classes performed better on some questions than others (eg, a class was consistently the top performer on 1 question and the bottom performer on another), the classes never varied in their order within a question.

Several factors can affect the appearance of growth. When a longitudinal sample is tested for a second time and a change is observed, it could reflect learning or an environmental influence such as a cohort or selection effect.13 We attribute the change to learning, since the overall examination did not change; moreover, there should be no cohort or selection effect in this study, because the groups were stable over time and represented 100% of the sample. Although the CODA1 guidelines require a minimum program length of 24 months, these longitudinal data clearly indicate a benefit in terms of knowledge gained between the first and second years, and between the second and third years. So, beyond the longer opportunity to perform research and the increased patient contact available in a longer program, valuable learning was occurring.

When the 5 skills found most difficult in the cross-sectional analysis (mixed dentition treatment, adolescent treatment, form a loop, cephalometric points, and interpret a superimposition) were examined longitudinally using the latent growth analysis, different impressions emerged. The latent analysis did not judge all 5 of these skills (those least able to achieve competency) to be the most difficult: 3 of the 5 (adolescent treatment, cephalometric points, and interpret a superimposition) were again judged among the most difficult, but 2 (mixed dentition treatment and form a loop) were of mean difficulty. The station for a diagnostic skill was cross-sectionally one of the most difficult initially, although competence was achieved by the third year; it was also judged one of the most difficult longitudinally. Some changes in difficulty could have occurred over time because of the test-retest effect, since some stations (bracket placement and cephalometric superimposition) did not change; the data show, however, that these questions maintained their difficulty even when repeated. By contrast, although the overall examination did not change, some individual station components, such as treatment planning (mixed dentition, adolescent, and nongrowing skeletal) and a diagnostic skill, changed for each test, so test-retest effects could not have influenced their scores, and their difficulty was likewise maintained.13

The longitudinal analysis also examined the ability of a skill to discriminate among the students. Stations for bracket placement, form a loop, diagnostic skill, third-order bends, and cephalometric superimposition provided the most discrimination. Three of these 5 skills were technical (bracket placement, form a loop, and third-order bends), 1 was diagnostic (diagnostic skill), and 1 was evaluation/synthesis (cephalometric superimposition). Two of the 5 skills were among the easiest to attain (bracket placement and cephalometric superimposition), and 1 was among the hardest (diagnostic skill). This readily demonstrates that unless a challenge is so easy that everyone attains it, or so hard that no one can accomplish it, the skill can contribute to sound evaluation. The skills all had positive discrimination indexes, which usually indicates good internal consistency or relevance.14 When tests examine varying skills or areas, the ability to discriminate uniformly is often lower.14 One question had low difficulty and low discrimination (analyze a cephalometric tracing). This question could be considered inappropriate for the test,15 except that it tests an essential skill, so its place in the OSCE was appropriate. Beyond this is the finding that all the domains, if addressed appropriately, can be discriminating. This is an important finding: it indicates that the examination does not need to be strictly tailored by domain of questions to be an effective discriminator, and this is desirable.

The 6 questions that were most predictive of overall competency were heavily weighted toward the evaluation and synthesis skills (adolescent treatment, cephalometric superimposition, and interpret a superimposition) but also included 2 technical questions (bracket placement and form a loop) and 1 diagnostic question (diagnostic skill). The easiest questions (bracket placement and cephalometric superimposition) and the hardest question (diagnostic skill) were included, and 3 of the 6 (bracket placement, diagnostic skill, and cephalometric superimposition) were among the most discriminating. Generally, questions of very high or very low difficulty are rejected from examination materials, but here they contributed to predictions of competency.16 Furthermore, 3 of the 6 skills (adolescent treatment, form a loop, and interpret a superimposition) were not uniformly learned to a competency level according to the cross-sectional analysis.
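A simple classical-test-theory analogue of the station discrimination discussed above is the corrected item-total correlation: each station's score correlated with the sum of the remaining stations. This is not the latent-trait estimate the study used, and the data below are synthetic, but it shows how more and less discriminating stations separate.

```python
# Corrected item-total correlation as a rough discrimination index:
# correlate each station's score with the total of the other stations.
# Synthetic data; the "loadings" are assumptions that control how
# discriminating each simulated station is.
import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(size=200)              # latent resident ability
loadings = np.array([1.2, 0.9, 0.5, 0.1])   # assumed station loadings
scores = ability[:, None] * loadings + rng.normal(scale=0.8, size=(200, 4))

discs = []
for j in range(scores.shape[1]):
    rest = scores.sum(axis=1) - scores[:, j]        # total minus the item
    discs.append(np.corrcoef(scores[:, j], rest)[0, 1])
    print(f"station {j + 1}: discrimination r = {discs[-1]:.2f}")
```

Stations built with a near-zero loading come out with discrimination near zero, analogous to the low-discrimination analyze-a-cephalometric-tracing station, while high-loading stations behave like bracket placement or form a loop.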
Again, on the face of it, one would believe that complex questions would be desirable and at least predictive; this was not demonstrated. The construction of the examination can follow the clinical curriculum, draw selections from all portions of it, and at the same time be predictive of competency. The findings for both discrimination and prediction should encourage educators to consider incorporating these types of examinations into their curricula, since the options appear open with respect to subject matter and type of question. Also, the inability of educators to predict students' results on this type of examination indicates that traditional biases need to be countered by an anonymous, clinically oriented examination.

Although the students obtain feedback from the OSCE at 3 points in their education, they receive peer tutoring and learning throughout the program from the discussions during group education settings such as case presentations, literature reviews, and research presentations. This perpetuates learning whether students achieve competency in a skill early or late in the program, and it assumes that all students can and will master the skills even though they entered the program with variable levels of knowledge and skill. Of the questions we used, 6 predicted over 80% of the variability, and they did so with questions from different learning domains and with different discrimination and difficulty indexes. Those differences gave the students a more clinically appropriate and diverse evaluation and learning tool. Only through analysis of the data using multiple approaches can the value of the questions comprising such an examination be determined.

CONCLUSIONS

1. OSCE examinations can provide a method for evaluating student performance and curriculum impact over time.
2. Significant learning appears to occur during all years of a 3-year program.
3. Cross-sectional and longitudinal analyses of the results may not be complementary.
4. It is possible to have discriminating and predictive questions from different domains of skills (diagnostic, technical, and evaluative/synthesis).
5. Both difficult and easy questions can contribute to discriminating and predictive questions.

REFERENCES

1. Commission on Dental Accreditation. Accreditation standards for advanced specialty education programs in orthodontics and dentofacial orthopedics. Available at: www.ada.org/coda. Accessed April 1, 2016.
2. Fields HW, Rowland ML, Vig KW, Huja SS. Objective structured clinical examination use in advanced orthodontic dental education. Am J Orthod Dentofacial Orthop 2007;131:656-63.
3. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. Br Med J 1975;1:447-51.
4. Mossey PA, Newton JP, Stirrups DR. Scope of the OSCE in the assessment of clinical skills in dentistry. Br Dent J 2001;190:323-6.
5. Ackerman JL, Proffit WR, Sarver DM, Ackerman MB, Kean MR. Pitch, roll, and yaw: describing the spatial orientation of dentofacial traits. Am J Orthod Dentofacial Orthop 2007;131:305-10.
6. Tanaka MM, Johnston LE. The prediction of the size of unerupted canines and premolars in a contemporary orthodontic population. J Am Dent Assoc 1974;88:798-801.
7. Bolton WA. Disharmony in tooth size and its relation to the analysis and treatment of malocclusion. Am J Orthod 1958;28:113-30.
8. Baccetti T, Franchi L, McNamara JA Jr. An improved version of the cervical vertebral maturation (CVM) method for the assessment of mandibular growth. Angle Orthod 2002;72:316-23.
9. Baumrind S, Frantz RC. The reliability of head film measurements. 1. Landmark identification. Am J Orthod 1971;60:111-27.
10. Meredith W, Tisak J. Latent curve analysis. Psychometrika 1990;55:107-22.
11. Duncan TE, Duncan SC, Strycker LA. An introduction to latent variable growth curve modeling: concepts, issues, and application. New York: Routledge Academic; 2013.
12. Bloom B, Engelhart MD, Furst EJ, Hill WH, Krathwohl DR. Taxonomy of educational objectives: the classification of educational goals. Handbook I: cognitive domain. New York: David McKay; 1956.
13. Hilton TL, Patrick C. Cross-sectional versus longitudinal data: an empirical comparison of mean differences in academic growth. J Educ Meas 1970;7:15-24.
14. Embretson S, Reise SP. Item response theory for psychologists. Mahwah, NJ: Psychology Press; 2000.
15. Wiberg M. Classical test theory vs item response theory. EM No. 50, 2004.
16. de Boeck P, Wilson M. Explanatory item response models: a generalized linear and nonlinear approach. New York: Springer; 2004.
