Child Abuse & Neglect 101 (2020) 104379
Child Abuse & Neglect journal homepage: www.elsevier.com/locate/chiabuneg
Deliberate practice as an educational method for learning to interpret the prepubescent female genital examination
Davis A.L.ᵃ, Pecaric M.ᵇ, Pusic M.V.ᶜ, Smith T.ᵈ, Shouldice M.ᵈ, Brown J.ᵉ, Wynter S.A.ᶠ, Legano L.ᵍ, Kondrich J.ʰ, Boutis K.ᵃ,*
ᵃ Division of Pediatric Emergency Medicine, Department of Pediatrics, Hospital for Sick Children and University of Toronto, Toronto, Ontario, Canada
ᵇ Contrail Consulting Services Inc, Toronto, ON, Canada
ᶜ Department of Emergency Medicine and Division of Learning Analytics at the NYU School of Medicine, NY, United States
ᵈ The Suspected Child Abuse and Neglect Program, Division of Pediatric Medicine, The Hospital for Sick Children, University of Toronto, Canada
ᵉ Department of Pediatrics, Columbia University, Irving Medical Center-Vagelos College of Physicians and Surgeons, New York Presbyterian Morgan Stanley Children’s Hospital, United States
ᶠ Pediatric Emergency Medicine, Department of Pediatrics, Children’s Hospital at Montefiore, Albert Einstein College of Medicine, NY, United States
ᵍ Department of Pediatrics, Child Protection Team, New York University School of Medicine, New York, NY, United States
ʰ Departments of Emergency Medicine and Pediatrics, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, United States
ARTICLE INFO
Keywords: Child abuse; Education, medical; Deliberate practice; Physical examination
ABSTRACT
Background: Correct interpretation of the prepubescent female genital examination is a critical skill; however, physician skill in this area is limited.
Objective: To complement bedside learning of this examination, we developed a learning platform for the visual diagnosis of the prepubescent female genital examination and examined the amount and rate of skill acquisition.
Participants and Setting: Medical students, residents, fellows, and attendings participated via an online learning platform.
Methods: This was a multicenter prospective cross-sectional study. Study participants deliberately practiced 158 prepubescent female genital examination cases hosted on a computer-based learning and assessment platform. Participants classified each case as normal or abnormal; if abnormal, they identified the location of the abnormality and the specific diagnosis. Participants received feedback after every case.
Results: We enrolled 107 participants (26 students, 31 residents, 24 fellows, and 26 attendings). Accuracy increased by 10.3 % (95 % CI 7.8, 12.8), with a Cohen’s d effect size of 1.17 (95 % CI 1.14, 1.19). Specificity changed by +16.8 % (14.1, 19.5) and sensitivity by +2.4 % (-0.9, 5.6). Participants took a mean (SD) of 46.3 (32.2) minutes to complete the cases. There was no difference between learner types in initial (p = 0.2) or final (p = 0.4) accuracy scores.
Conclusions: This study’s learning intervention led to effective and feasible skill improvement. However, while participants improved significantly with normal cases, which is relevant to reducing unnecessary referrals to child protection teams, learning gains were not as evident in abnormal cases. All levels of learners performed similarly, emphasizing the need for this education even among experienced clinicians.
Abbreviation: PGY, post-graduate year.
* Corresponding author.
E-mail address: shwynter@montefiore.org (S.A. Wynter).
https://doi.org/10.1016/j.chiabu.2020.104379
Received 30 October 2019; Received in revised form 10 January 2020; Accepted 13 January 2020
0145-2134/ © 2020 Published by Elsevier Ltd.
1. Introduction
The prepubescent female genital examination is an important clinical skill for health care providers who practice in community, urgent, and emergent care settings (Jenny, Crawford-Jakubiak, Committee on Child Abuse and Neglect, & American Academy of Pediatrics, 2013). Competency in this evaluation requires knowledge of normal anatomy and of common dermatologic, infectious, and urologic conditions, as well as the ability to distinguish between accidental and nonaccidental trauma (Adams, Farst, & Kellogg, 2018, 2016). Learning this critical skill in residency currently relies heavily on incidental exposure to cases, which for many is a variable and limited experience (Frasier, Thraen, Kaplan, & Goede, 2012; Giardino, Brayden, & Sugarman, 1998; Starling, Heisler, Paulson, & Youmans, 2009), and neither a formal teaching curriculum for this skill nor specialty electives in child abuse or pediatric gynecology are a routine part of many residency programs (Dubow, Giardino, Christian, & Johnson, 2005; Giardino et al., 1998; Heisler, Starling, Edwards, & Paulson, 2006; Narayan, Socolar, & St Claire, 2006; Starling et al., 2009; Ward et al., 2004). It is therefore not surprising that physician skill in this area is lacking, particularly in knowledge of normal genital anatomy and in the identification and interpretation of genital findings in young female sexual abuse cases (Adams et al., 2012; Atabaki & Paradise, 1999; Dubow et al., 2005; Heisler et al., 2006; Ladson, Johnson, & Doty, 1987; Lentsch & Johnson, 2000; Makoroff, Brauley, Brandner, Myers, & Shapiro, 2002; Paradise et al., 1997), which can have significant consequences for the patient. Failure to identify findings suggestive of trauma puts children at further risk of abuse, while misinterpretation of normal findings as abusive can lead to significant legal, social, and emotional consequences for patients and families, and in some cases to incorrect child protection or legal interventions (Frasier et al., 2012; Makoroff et al., 2002).
Gaining expertise in evaluating a female patient with genital complaints is complex, since it includes taking a comprehensive history, interpreting physical examination findings, and integrating all of this information into clinical decision making. To facilitate learning of complex tasks, instructional design models recommend intentionally alternating part-task with whole-task training (van Merrienboer, Kirschner, & Kester, 2010). In this light, e-learning provides an opportunity to expose learners to an image interpretation experience (i.e. part-task) that could complement the routine face-to-face teaching and bedside learning that address history taking, physical examination, and integration (i.e. whole-task). Web-based image banks that provide cognitive simulation and deliberate practice on hundreds of cases have successfully increased skill in interpreting electrocardiograms, musculoskeletal radiographs, and point-of-care ultrasound images (Hatala, Gutman, Lineberry, Triola, & Pusic, 2019; Kwan et al., 2019; Lee et al., 2019). In these systems, cases are presented via a web-based interface that mirrors bedside presentation and decision making (cognitive simulation): clinicians view an image, decide whether it is normal or abnormal, and, if abnormal, locate the specific area of abnormality on the image. Participants then receive specific feedback after each case (deliberate practice), and this feedback provides an ongoing measure of performance as part of the instructional strategy (Black & Wiliam, 1998; Larsen, Butler, & Roediger, 2008).
Using these methods, participants learn to recognize similarities and differences between diagnoses, identify the specific images that are relative weaknesses for them, and build up a global representation of possible diagnoses (Boutis, Pecaric, Seeto, & Pusic, 2010; Ericsson, 2004; Pecaric, Boutis, Beckstead, & Pusic, 2017; Pusic, Pecaric, & Boutis, 2011a, 2011b; Pusic et al., 2012). This type of learning and assessment platform could therefore be applied to the image interpretation component of prepubescent female genital complaints, and the learning analytics from this experience could improve our understanding of how this skill develops. We developed a web-based image interpretation learning and assessment system that included de-identified prepubertal female genital images. We had physician learners deliberately practice interpreting these images and determined performance metrics relevant to clinical practice. We hypothesized that the learning intervention would improve study participants’ ability to differentiate normal from abnormal prepubertal female genital images.
2. Methods
2.1. Education intervention
We used previously established methods to develop the education intervention (Boutis et al., 2016, 2010; Lee et al., 2019; Pusic et al., 2012). The education tool was developed in collaboration with three tertiary care children’s hospitals and an industry partner.
2.1.1. Tutorial
We developed a 15-slide, interactive, online introductory tutorial that participants reviewed prior to starting the cases (Adams et al., 2018, 2016). This included a glossary of terms and anatomical landmarks, normal and normal variants of the hymen with respective image examples, medical conditions (dermatologic, infectious, urethral prolapse, labial fusion), and traumatic presentations.
2.1.2. Case Selection, Case Solutions, and Presentation
Cases consisted of the patient’s age and previously collected, anonymized colposcopic images from the three medical centers. We collected a total of 234 de-identified case images taken during routine clinical evaluation for child sexual abuse. Abnormal images were those with findings that resulted in a change in management, such as further consultation, need for follow-up, or medical/surgical intervention. Rare or unusual congenital variations of the hymen were included in the case experience and were not considered abnormal. Three child-protection clinicians (2 pediatricians and 1 nurse practitioner) with over 20 years of expertise, credentialing in child-protection team medicine, and exposure to over 1000 prepubescent female genital examinations interpreted all the images to determine case diagnoses. The participating child abuse clinicians each had several years of experience as child sexual abuse specialists and regularly reviewed research in this area, which has been shown to add validity to diagnostic acumen (Adams et al., 2012). Diagnostic agreement between the specialists was high, with an intraclass correlation coefficient of 0.97 (95 % CI 0.96, 0.98). Discrepant interpretations were resolved by group consensus. Seventy-six cases were discarded because of poor image quality (e.g. blurry photo), content out of scope (e.g. genital mutilation), failure to reach consensus on a discrepant interpretation, or poor item metrics (i.e. point biserial correlation < 0.20 or > 0.80). This resulted in a final sample of 158 cases, 64 (40.5 %) of which had abnormal findings (Table 1), with proportions of specific pathology representative of clinical practice. That is, we included higher proportions of more common pathology (e.g. labial fusion) and lower proportions of rare entities (e.g. hymenal tear). There were up to three pathological findings per abnormal case.
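The item screen described above (discarding cases whose point biserial correlation fell below 0.20 or above 0.80) can be sketched in a few lines. This is a minimal illustration, not the authors’ procedure: the paper does not say which response data or which total score the correlation was computed against, so the sketch assumes a participant-by-case matrix of 0/1 correctness scores and correlates each case with the uncorrected total score. The names `point_biserial` and `flag_items` are hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(item, totals):
    """Point-biserial correlation between 0/1 item scores and total scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    totals of participants who got the item right and wrong, s is the
    population SD of the totals, and p = 1 - q is the proportion correct.
    """
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    s = pstdev(totals)
    if not right or not wrong or s == 0:
        return float("nan")  # undefined if everyone (or no one) is correct
    p = len(right) / len(item)
    return (mean(right) - mean(wrong)) / s * math.sqrt(p * (1 - p))


def flag_items(matrix, case_ids, low=0.20, high=0.80):
    """Split cases into (kept, discarded) using the point-biserial window.

    matrix[k][i] is 1 if participant k answered case i correctly, else 0.
    A NaN correlation fails both comparisons, so that case is discarded.
    """
    totals = [sum(row) for row in matrix]  # uncorrected total score (assumption)
    kept, discarded = [], []
    for i, case_id in enumerate(case_ids):
        r = point_biserial([row[i] for row in matrix], totals)
        (kept if low <= r <= high else discarded).append(case_id)
    return kept, discarded
```

In this sketch, a case answered correctly by every participant has no variance to correlate and is therefore flagged for removal along with the out-of-range cases.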
A website was developed using Flash, HTML, PHP, and a MySQL database (Pusic et al., 2011a, 2011b). Once participants were provided unique access, they were taken to the online system and presented with the 158 cases. In keeping with evidence from the education literature on avoiding history bias in image interpretation (Leblanc, Brooks, & Norman, 2002; Paradise, Winter, Finkel, Berenson, & Beiser, 1999), we presented the images with only the age of the patient. After reviewing the image, participants declared the case definitely normal, probably normal, probably abnormal, or definitely abnormal (Fig. 1). The definitely/probably qualifiers served as a measure of participant confidence. The binary classification of “normal or abnormal” represents the typical first step in bedside decision making, in which front-line physicians consider whether additional testing or specialty consultation is warranted, and thus was selected as the first-pass task for this education intervention. If their answer was in the “abnormal” category, participants were then required to designate one area of abnormality directly on the image and to select the diagnosi(e)s from a menu of common medical and traumatic genitourinary findings (Adams et al., 2012) using an interactive system. When ready, participants submitted their response and immediately received visual and written feedback on its correctness, showing the region of abnormality on the image, the diagnosis of the case, and labels of normal anatomy. Prior to launch, the system was pilot tested by two physicians with expertise in child protection who provided technical and content feedback, and revisions were made accordingly.
2.2. Study design and setting
This was a multicenter prospective cohort study, which included sites in the United States and Canada, from May 1 to June 30, 2017.
2.3. Study population and participant recruitment
Participants included third- and fourth-year medical students, residents in pediatrics and emergency medicine, pediatric emergency medicine fellows, and attending physicians in pediatric emergency medicine. An email was sent to program/divisional leadership asking them to forward it to their respective teams. Interested participants contacted the study coordinator. Participants were excluded if they had prior fellowship training in pediatric gynecology or child abuse, practiced as a child-protection team physician, or did not complete the study case set. This study was approved by the Institutional Review Boards of the three participating sites.

Table 1
Case Diagnoses.
Diagnosis                          No. (%), N = 158
Normal                             94 (59.5)
Abnormal – Medical                 36 (22.8)
  Anatomic                         16/36 (44.4)
    Labial fusion                  7
    Urethral prolapse              9
  Infectious                       12/36 (33.3)
    Herpes                         2
    Condylomata                    7
    Molluscum                      1
    Abnormal discharge             2
  Dermatologic                     8/36 (22.2)
    Lichen sclerosus               5
    Eczema                         1
    Vitiligo                       2
Abnormal – Traumatic               28 (17.7)
  Non-hymenal injury               24/28 (85.7)
  Hymenal injury                   4/28 (14.3)

Fig. 1. Case example. The participant considers the image and declares it “abnormal” or “normal”, as shown in A. In B, the participant has selected abnormal. In C, the participant selects a diagnosis from a pick list, in this case “condyloma acuminate.” In D, the participant submits the response and receives visual and text feedback.
2.4. Data collection
Secure entry was ensured via unique participant login credentials, and the system was accessible 24 h per day, 7 days per week. We collected information on level of practice (medical student, resident, fellow, attending), sex (male vs. female), geographic location (Canada vs. United States), most recent completed or current post-graduate training (pediatrics, emergency medicine, pediatric emergency medicine, other), practice setting (university-affiliated general/community hospital, university-affiliated children’s hospital, non-university-affiliated general/community hospital, non-university-affiliated children’s hospital, other), number of prepubescent female genital examinations completed (none, < 20, 21–50, 51–100, > 100), prior elective that included specific training on the prepubescent female genital examination (yes vs. no), current level of post-graduate (PGY) training if a trainee (medical student, PGY1, PGY2, PGY3, PGY4, PGY5, PGY6+), and years as an attending since graduation (less than five years, 5–10 years, 11–20 years, > 20 years). Participants first completed the introductory tutorial and then proceeded to the cases. All participants reviewed the same cases and classified each as normal or abnormal, with respective locations and diagnoses as detailed above. Cases were presented in a random order unique to each participant. With each case interaction, the system automatically recorded the accuracy of the participant’s response and the time elapsed from starting the case until submission of a response. A five-case demonstration that includes the introductory tutorial can be found at https://imagesim.research.sickkids.ca/demo/gap/enter.php (username/password demo/demo). Participants were given a time limit of three weeks to complete all cases, to minimize decay of learning that could confound results (Boutis et al., 2019). Participants who had not completed the cases within 1.5 weeks were sent an e-mail reminder. Those who completed all the cases received a $20 gift card.
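The per-participant random case order and the reminder and time-limit schedule described above can be sketched as follows. This is a hypothetical illustration rather than the platform’s code: seeding Python’s `random.Random` with the participant identifier, the `case_order` and `status` helper names, and the “expired” state after three weeks are our assumptions; the paper states only that the order was random and unique to each participant, that reminders were sent at 1.5 weeks, and that the time limit was three weeks.

```python
import random
from datetime import datetime, timedelta
from typing import List

REMINDER_AFTER = timedelta(weeks=1.5)  # reminder e-mail threshold stated in the text
TIME_LIMIT = timedelta(weeks=3)        # completion window stated in the text


def case_order(participant_id: str, case_ids: List[str]) -> List[str]:
    """Return a random case order that is unique to, and reproducible for, one participant.

    Seeding a private Random instance with the participant ID (an assumption)
    gives the same participant the same order across sessions without
    touching the global random state.
    """
    rng = random.Random(participant_id)
    order = list(case_ids)  # copy so the master case list is not mutated
    rng.shuffle(order)
    return order


def status(started: datetime, now: datetime, completed: bool) -> str:
    """Hypothetical follow-up rule for one participant."""
    if completed:
        return "complete"
    elapsed = now - started
    if elapsed > TIME_LIMIT:
        return "expired"
    if elapsed >= REMINDER_AFTER:
        return "send_reminder"
    return "in_progress"
```

For example, a participant who started on 1 May 2017 and had not finished by 12 May (11 days later) would be due a reminder under this rule.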
2.5. Outcomes
The primary outcome was the change in performance metrics from the initial to the final 25 cases (Boutis et al., 2019) for the three scoring scenarios described below (Fig. 2). We also derived learning curves for the performance metrics (Boutis et al., 2010; Pusic et al., 2011a, 2011b), and three authors (AD, MPu, KB) reviewed the learning curves to qualitatively estimate the number of cases to the inflection point (i.e. where learning slows and gains diminish) (Boutis et al., 2010; Pusic et al., 2011a, 2011b).
2.6. Analyses
Sample size: Based on prior work using image sets ranging from 50 to 300 images, 158 images would allow multiple cases in each diagnostic category (normal and abnormal) and the determination of learning curves, which show where learning starts and its rate with each case exposure (Boutis et al., 2010; Pusic et al., 2011a, 2011b). With a sample size of 80 participants, using a one-way ANOVA across four experience levels with 20 participants per group, we had 80 % power to detect a 30 % difference in scores (assuming a 45 % standard deviation of raw scores) (Boutis et al., 2010; Pusic et al., 2011a, 2011b).
Scoring: Scoring considered three scenarios (Fig. 2): first (Fig. 2A), classifying the case as normal or abnormal (with the location of any abnormality); second (Fig. 2B), classifying normal/abnormal (with location) and assigning abnormal cases to the broad categories of medical versus traumatic; and finally (Fig. 2C), classifying normal/abnormal (with location) and assigning all specific diagnoses from the pick list provided.
Participant Comparisons: We used a one-way ANOVA to compare the mean change in accuracy between the four learner types.
Overall Metrics: Using the initial and final 25 cases, we calculated the change in participant accuracy, sensitivity, and specificity, and Cohen’s d effect size (Cohen, 1988) for accuracy, each with 95 % confidence intervals (CI).
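Section 2.6 reports a Cohen’s d for accuracy without stating which formulation was used. As one plausible reading, the sketch below takes each participant’s accuracy on the first and last 25 cases and returns the mean change, a normal-approximation 95 % CI for it, and d computed as the mean change divided by the root mean of the two score variances (the d_av convention for paired data). The choice of d variant, the z-based interval (a t-based interval would be slightly wider for small groups), and the helper name `change_summary` are our assumptions, not the study’s SPSS procedure.

```python
import math
from statistics import NormalDist, mean, stdev


def change_summary(initial, final, level=0.95):
    """Mean change, a z-based CI for it, and Cohen's d (d_av variant).

    initial[k] and final[k] are participant k's accuracy (%) on the first
    and last 25 cases. Returns (mean_change, (ci_low, ci_high), d).
    """
    diffs = [f - i for i, f in zip(initial, final)]
    n = len(diffs)
    mean_change = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)            # standard error of the mean change
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95 % interval
    ci = (mean_change - z * se, mean_change + z * se)
    # d_av: mean change divided by the root mean of the two sample variances
    sd_av = math.sqrt((stdev(initial) ** 2 + stdev(final) ** 2) / 2)
    d = mean_change / sd_av
    return mean_change, ci, d
```

Other conventions divide by the SD of the paired differences instead, which can give a noticeably different d for the same data, so the variant should be stated alongside any reported effect size.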
For these metrics, sensitivity refers to the participant correctly identifying an abnormal examination [abnormal cases correct/(abnormal cases correct + abnormal cases labelled as normal)], and specificity refers to the participant correctly identifying a normal examination [normal cases correct/(normal cases correct + normal cases labelled as abnormal)]. We also determined the median time spent per case across the 158 cases and compared the time commitment between learner types using the Kruskal-Wallis test.
Learning Curves: We generated cumulative learning curves by participant type, as well as curves of accuracy, sensitivity, and specificity for all participants (Boutis et al., 2010). The learning curve for a given performance metric (e.g. accuracy) was generated using raw moving averages across the participant’s last 25 cases, and for clarity a phaseless Butterworth low-pass filter was applied (Winter, 2009). The inflection point was determined from a qualitative review of the learning curves (Pusic et al., 2011a, 2011b). Significance was set at p < 0.01 to account for multiple testing. All analyses were carried out using SPSS® (Version 23, IBM 2015).
3. Results
3.1. Study participants
We enrolled 126 participants, and 107 (84.9 %) completed all 158 cases; 70.1 % were female. Of the 107, there were 26 (24.3 %) medical students, 31 (29.0 %) pediatric/emergency medicine residents, 24 (22.4 %) pediatric emergency medicine fellows, and 26 (24.3 %) pediatric emergency medicine attendings. Participants were recruited from 34 institutions located in 16/50 (32.0 %) American states and 3/10 (30.0 %) Canadian provinces. Of note, 15.0 % of participants reported never having performed a prepubertal female genital examination before participation, and 25.2 % reported completing an elective in child abuse or pediatric gynecology (Table 2). Overall, 42.1 % had performed this examination at least once but fewer than twenty times.
The 20 participants who reported performing more than 50 examinations were all attending-level physicians, and 15 (75 %) of them had been in practice at least five years.
3.2. Performance results
3.2.1. Participant comparisons
There were no differences between learner types in the change in accuracy for any of the three diagnostic scoring schemes (Table 3; ANOVA comparisons: normal vs. abnormal, p = 0.9; normal vs. abnormal +/- medical vs. traumatic abnormal findings, p = 0.9; normal vs. abnormal +/- specific diagnosis, p = 0.5). Participant data were therefore pooled for the learning curve and performance metric analyses.
3.2.2. Overall metrics
Accuracy increased significantly across all three diagnostic scoring schemes, with large Cohen’s d effect sizes (approximately 0.8 or greater; Cohen, 1988) (Table 4). The largest learning gains were seen in the normal cases (n = 94), with a specificity increase of 16.8 %. There were no significant learning gains in the abnormal cases (n = 64), regardless of whether the task was to categorize normal vs. abnormal, normal vs. abnormal and medical vs. traumatic, or normal vs. abnormal with specific diagnosis (Table 4). However, initial sensitivity differed significantly depending on the diagnostic scoring scheme (p < 0.0001). Post-hoc testing showed that identifying normal vs. abnormal plus the specific diagnosi(e)s yielded the lowest initial sensitivity, relative to identifying normal vs. abnormal +/- medical vs. traumatic (p < 0.0001) or normal vs. abnormal alone (p < 0.0001).
Fig. 2. Diagnostic Scoring Schemes. In A), participants received points if they correctly labelled the case as normal or abnormal, with the correct localization of the abnormality (where applicable). In B), participants received points if they correctly labelled the case as normal or abnormal, with the correct localization of the abnormality (where applicable), and selected the correct broad diagnostic category of medical vs. traumatic. In C), participants received points if they correctly labelled the case as normal vs. abnormal, with the correct localization of the abnormality, and selected the correct specific diagnosi(e)s from a pick list (where applicable).
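The three scoring schemes in Fig. 2, together with the sensitivity and specificity definitions in Section 2.6, can be expressed compactly in code. The sketch below is our illustration, not the platform’s implementation: the `Case` and `Response` record types and their fields are hypothetical, location correctness is reduced to a single `hit_region` flag, and scheme C is assumed to require an exact match of the diagnosis set. An abnormal case that is not scored correct under the active scheme counts as a miss, which appears consistent with the lower sensitivity reported for the stricter schemes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Case:
    abnormal: bool
    category: Optional[str] = None            # "medical" or "traumatic" (abnormal only)
    diagnoses: FrozenSet[str] = frozenset()   # expert consensus diagnosis set


@dataclass(frozen=True)
class Response:
    says_abnormal: bool                       # probably or definitely abnormal
    hit_region: bool = False                  # marked point falls in the expert region
    category: Optional[str] = None
    diagnoses: FrozenSet[str] = frozenset()


def is_correct(case, resp, scheme):
    """Score one response under scheme "A", "B" or "C" (Fig. 2)."""
    if scheme not in ("A", "B", "C"):
        raise ValueError(f"unknown scheme: {scheme}")
    if not case.abnormal:
        return not resp.says_abnormal         # normal cases scored identically in A, B and C
    if not (resp.says_abnormal and resp.hit_region):
        return False
    if scheme == "A":
        return True
    if scheme == "B":
        return resp.category == case.category
    return resp.diagnoses == case.diagnoses   # scheme C: exact diagnosis set (assumption)


def metrics(cases, responses, scheme):
    """Return (accuracy, sensitivity, specificity) as proportions."""
    scored = [(c.abnormal, is_correct(c, r, scheme)) for c, r in zip(cases, responses)]
    abnormal = [ok for is_abn, ok in scored if is_abn]
    normal = [ok for is_abn, ok in scored if not is_abn]
    accuracy = sum(ok for _, ok in scored) / len(scored)
    sensitivity = sum(abnormal) / len(abnormal) if abnormal else float("nan")
    specificity = sum(normal) / len(normal) if normal else float("nan")
    return accuracy, sensitivity, specificity
```

Because normal cases take the same branch in every scheme, specificity comes out identical across schemes, which matches the footnote to Table 4.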
Table 2
Participant Demographics.
Characteristic                                          No. (%)
Faculty, post-graduate trainees, and medical students (N = 107)
Canadian participants                                   46 (43.0)
Female participants                                     75 (70.1)
Participant type
  Attending physician                                   26 (24.3)
  Fellow/subspecialty resident                          24 (22.4)
  Resident                                              31 (29.0)
  Medical student                                       26 (24.3)
Type of medical centre
  University-affiliated children’s hospital             62 (57.9)
  University-affiliated general hospital                37 (34.6)
  Non-university-affiliated children’s hospital         3 (2.8)
  Non-university-affiliated general hospital            5 (4.7)
Clinic setting of exposure to the prepubescent female genital examination
  None                                                  50 (46.7)
  Child abuse                                           27 (25.2)
  Gynecology                                            3 (2.8)
  Pediatrics                                            23 (21.5)
  Emergency medicine                                    4 (3.7)
Reported number of completed prepubescent female genital examinations
  None                                                  16 (15.0)
  1–20                                                  45 (42.1)
  21–50                                                 24 (22.4)
  51–100                                                8 (7.5)
  > 100                                                 12 (13.1)
Faculty and post-graduate trainees (N = 81)
Most recent/active training
  Pediatrics                                            43 (53.1)
  Pediatric emergency medicine                          32 (39.5)
  Emergency medicine                                    6 (7.4)
Post-graduate trainees (N = 55)
Post-graduate year of residents/fellows
  PGY1                                                  13 (23.6)
  PGY2                                                  8 (14.5)
  PGY3                                                  9 (16.4)
  PGY4                                                  9 (16.4)
  PGY5                                                  8 (14.5)
  PGY6+                                                 8 (14.5)
The median (IQR) time on each case was 9.5 (5.6, 17.2) seconds, with a mean (SD) total time to course completion of 46.3 (32.2) minutes. By learner group, the median (IQR) time per case was: medical students 8.3 (4.9, 14.5) seconds, residents 9.1 (5.4, 16.2) seconds, fellows 10.4 (5.8, 19.0) seconds, and attendings 10.9 (6.6, 19.8) seconds (p < 0.0001). Post-hoc analyses demonstrated that medical students (p < 0.0001) and residents (p < 0.0001) took significantly less time to complete a case than attendings and fellows.
3.2.3. Learning curves
A qualitative review of the learning curves revealed that for overall accuracy in distinguishing normal from abnormal cases, and for normal cases (specificity), learning slowed (the inflection point) at about case 100 but continued through case 158 (Fig. 3). There were no clear learning gains for abnormal cases (sensitivity), regardless of how the diagnostic task was categorized (Fig. 3A–C).
3.3. Missing data
Of the 126 enrolled participants, 19 (15.1 %) started the cases but did not complete them. There was no difference between participants who completed the intervention and those who did not with respect to learner type (p = 0.08), country (p = 0.5), sex (p = 0.4), years in practice (p = 0.9), post-graduate training (p = 0.4), having completed fewer than 50 examinations before participation (p = 0.1), or reported clinical teaching in the pediatric genital examination (p = 0.5).

Table 3
Participant Accuracy by Learner Type for the Three Scoring Schemes.
Accuracy (%) by scoring scheme*: Initial / Final / Difference (95 % CI)
Medical students (N = 26)
  Normal versus Abnormal: 74.2 / 84.4 / 10.2 (6.7, 13.8)
  Normal versus Abnormal and Medical versus Traumatic Diagnosis: 71.9 / 81.9 / 10.0 (4.7, 15.3)
  Normal versus Abnormal and Specific Diagnosi(e)s: 61.5 / 72.9 / 11.4 (6.2, 16.6)
Residents (N = 31)
  Normal versus Abnormal: 71.9 / 82.5 / 10.6 (7.3, 13.9)
  Normal versus Abnormal and Medical versus Traumatic Diagnosis: 69.8 / 79.5 / 9.7 (5.2, 14.2)
  Normal versus Abnormal and Specific Diagnosi(e)s: 58.1 / 69.7 / 11.6 (6.9, 16.3)
Fellows (N = 24)
  Normal versus Abnormal: 74.1 / 84.3 / 10.2 (6.4, 14.0)
  Normal versus Abnormal and Medical versus Traumatic Diagnosis: 72.8 / 82.2 / 9.3 (5.3, 13.3)
  Normal versus Abnormal and Specific Diagnosi(e)s: 62.0 / 73.3 / 11.3 (6.9, 15.7)
Faculty (N = 26)
  Normal versus Abnormal: 74.0 / 83.9 / 9.9 (5.4, 14.5)
  Normal versus Abnormal and Medical versus Traumatic Diagnosis: 71.1 / 80.8 / 9.7 (4.8, 14.6)
  Normal versus Abnormal and Specific Diagnosi(e)s: 64.2 / 71.1 / 6.9 (1.9, 11.9)
Initial – performance on the initial 25 cases. Final – performance on the final 25 cases. * Scoring details are described in Fig. 2.

Table 4
Performance metrics of all participants (N = 107) for a given diagnostic task.
Metric by scoring scheme*: Initial / Final / Difference (95 % CI)
Normal versus Abnormal
  Percent accuracy: 73.4 / 83.7 / 10.2 (8.2, 12.3); Cohen’s d 1.40 (1.38, 1.43)
  Percent sensitivity: 71.3 / 72.4 / 1.1 (-1.7, 3.8)
  Percent specificity**: 74.2 / 91.0 / 16.8 (14.1, 19.5)
Normal versus Abnormal and Medical versus Traumatic Diagnosis
  Percent accuracy: 71.3 / 81.0 / 9.7 (7.3, 12.0); Cohen’s d 0.79 (0.76, 0.81)
  Percent sensitivity: 65.5 / 65.3 / −0.2 (-0.04, 0.04)
  Percent specificity**: 74.2 / 91.0 / 16.8 (14.1, 19.5)
Normal versus Abnormal and Specific Diagnosi(e)s
  Percent accuracy: 61.3 / 71.6 / 10.3 (7.8, 12.8); Cohen’s d 1.17 (1.14, 1.19)
  Percent sensitivity: 39.6 / 42.0 / 2.4 (-0.9, 5.6)
  Percent specificity**: 74.2 / 91.0 / 16.8 (14.1, 19.5)
Initial – performance on the initial 25 cases. Final – performance on the final 25 cases. * Scoring details are described in Fig. 2. ** Specificity does not change between scoring schemes because normal cases were scored the same way in every scheme.

4. Discussion
We demonstrated that deliberate practice with prepubescent female genital examination images was effective for improving participant skill in diagnosing normal versus abnormal cases. Specifically, participants improved with normal cases, but for abnormal cases they did not increase their skill in accurately assigning the diagnostic categories of medical versus traumatic or the specific diagnoses. There were no significant differences in performance between levels of learners. Completing all cases took, on average, less than one hour. The most significant benefit of this learning intervention for participants derived from their increased ability to identify normal prepubescent genital images. This is relevant because previous work demonstrates that health care providers have limited ability to diagnose normal cases. In one study, only 19 % of emergency medicine and family medicine residents across 67 US training programs correctly identified three anatomic structures (hymen, urethra, and labia minora) on photos of prepubescent female genitalia, and over 40 % incorrectly interpreted normal examination photos as abnormal (Starling et al., 2009). Another study found poor agreement between pediatric emergency medicine attendings and child abuse experts on genital examination findings in the prepubertal age group: 70 % of findings on normal prepubescent female genital examinations were falsely interpreted as abnormal by pediatric emergency attendings (Makoroff et al., 2002). Palusci and McHugh found that nearly 40 % of primary care physicians reported that they could not accurately identify prepubertal genital anatomy (Palusci & McHugh, 1995). Importantly, this skill deficiency has practical consequences, since false-positive referrals to child protection teams for normal anatomy mistaken for abusive findings may have a significant emotional impact on families and children (Frasier et al., 2012; Makoroff et al., 2002). This study’s learning intervention could therefore help fill at least part of the knowledge gap in identifying normal prepubescent female genital anatomy, with the hope of facilitating more appropriate referrals to specialists such as child protection teams or gynecology. Although participants were better able to categorize abnormal cases as medical or traumatic pathology than to identify a specific case diagnosis, they demonstrated no improvement in either of these skills through the education intervention.

Fig. 3. Learning curves (25-case moving average) of accuracy (green), sensitivity (red), and specificity (blue), reported as a proportion by number of cases completed, for the three scoring schemes (Fig. 2): A) Normal versus Abnormal and location of abnormality; B) Normal versus Abnormal and Medical versus Traumatic for abnormal cases; C) Normal versus Abnormal and Specific Diagnosi(e)s for abnormal cases. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
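The 25-case moving-average learning curves in Fig. 3 can be approximated with the sketch below. It is an illustration under stated assumptions, not the authors’ analysis code: it computes a trailing 25-case moving average of each participant’s 0/1 correctness sequence, in the order that participant saw the cases (which was randomized per participant), and then averages across participants at each position. The phaseless Butterworth low-pass smoothing described in Section 2.6 is deliberately omitted; in Python it would typically be applied with `scipy.signal.butter` and `scipy.signal.filtfilt`, using a filter order and cutoff that the paper does not report. The function names are hypothetical.

```python
from statistics import mean


def moving_average(correct, window=25):
    """Trailing moving average of a 0/1 correctness sequence.

    Element j of the result is the proportion correct over cases
    j+1 .. j+window (1-based), so the first value corresponds to case `window`.
    """
    if len(correct) < window:
        return []
    running = sum(correct[:window])
    curve = [running / window]
    for k in range(window, len(correct)):
        running += correct[k] - correct[k - window]  # slide the window by one case
        curve.append(running / window)
    return curve


def group_curve(sequences, window=25):
    """Average the per-participant moving averages position by position."""
    curves = [moving_average(seq, window) for seq in sequences]
    return [mean(values) for values in zip(*curves)]
```

Sensitivity and specificity curves would restrict each window to the abnormal or normal cases it contains; the paper does not describe how windows with few abnormal cases were handled, so that step is left out here.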
Prior work examining skill gains in image interpretation demonstrated that at least half the cases should have abnormal findings to achieve significant sensitivity gains (Pusic et al., 2012); the lower proportion of abnormal cases in this learning intervention may therefore have limited learning gains for this set of cases. That said, most participants who reported having performed over 50 of these clinical examinations had been in practice at least 5 years, and since the majority of these examinations are normal, many of our participants’ prior examinations were likely of normal cases. Gaining exposure to 60 cases with abnormal findings would therefore likely take many years through clinical practice alone. Although we cannot be certain of this from this study’s data, exposure to the cases in this learning intervention may enhance a clinician’s mental repository of abnormal findings in this anatomic area, which in turn may improve diagnostic skill at the bedside (Kellman, Massey, & Son, 2010; Norman, 2018). Future studies should expand the set of abnormal cases in the hope of demonstrating improved skill in interpreting these images. A salient negative finding of our study was that there were no significant differences in performance between levels of learners, despite the longer time fellows and attendings spent reviewing cases relative to medical students and residents. Prior work in image interpretation has demonstrated that more experienced participants spend longer reviewing images than novices, but this is often associated with higher interpretation accuracy (Boutis et al., 2013). In this study, however, longer image review by more experienced participants did not result in higher accuracy, suggesting a fundamental lack of content expertise in this area.
A lack of expertise also aligns with another result of our study: fewer than half of the fellow and attending physician participants reported prior elective training in child protection or pediatric gynecology. Together, these findings mirror literature showing skill deficiencies in this area among both residents and attending physicians (Dubow et al., 2005; Lentsch & Johnson, 2000; Makoroff et al., 2002; Menoch, Zimmerman, Garcia-Filion, & Bulloch, 2011) and stress the need for additional strategies to augment current teaching. The digital learning environment presented in this study offers one way to add learning in this area and could provide highly specific and effective deliberate practice toward the goal of improved clinician diagnostic skill at the bedside (Pecaric et al., 2017). Our study has limitations that warrant consideration. This study used two-dimensional images for an examination that is three-dimensional at the bedside. Thus, our use of still images rather than video (Killough et al., 2016) or a higher proportion of bedside examinations may have limited learning gains in exchange for efficiency (Adams et al., 2018). Some specific diagnoses may be difficult to distinguish from each other on still images (e.g., condylomata versus molluscum). While we accepted the consensus diagnosis of our child protection team experts, distinguishing these entities on still images alone may be challenging for many learners; therefore, in future iterations of this tool, diagnostic categories will be collapsed to the distinguishing features that are critical for most front-line physicians. Because this is a part-task education intervention removed from clinical practice, it is uncertain how knowledge gained with this tool will translate to patient-level skill and outcomes.
Because we did not have contact information for practicing pediatricians, we could not extend study participation to pediatricians in general practice, and our findings may therefore not be generalizable to this group. We used a convenience sample of participants, so our results may be biased toward more motivated physician learners (Kalet et al., 2013). Finally, this study included medical students and pediatric and emergency medicine physicians, and thus its findings may not be generalizable to other categories of learners. In conclusion, deliberate practice of prepubescent female genital examination images was effective for improving participant skill in diagnosing normal versus abnormal cases in our study cohort of medical students, residents, fellows, and attendings. Participants improved most with normal cases, which has practice implications for potentially reducing unnecessary referrals to child protection teams for normal anatomy misidentified as abusive. Participants were better at categorizing abnormal cases as medical or traumatic pathology than at identifying a specific diagnosis; however, they demonstrated little improvement in either of these skills through the education intervention. Interestingly, there were no significant differences in performance between levels of learners, which reinforces the need for education in this area even among experienced clinicians. Future directions include refining the education intervention to include more abnormal case examples, allowing better development of mental representations of these cases, and measuring retention of learning.
Declaration of Competing Interest
The authors have no conflicts of interest to disclose.
Acknowledgements
We thank Dr. Hussein Salehmohamed for his assistance in preparing and labeling case images, and Ms. Beryl Chung for her help in building the pre-case tutorial. We appreciate the time that Dr. Vince Palusci took to pilot test the education intervention and provide additional feedback. Funding for this study was provided by the Canadian Association for Medical Education Wooster Family Education Grant.
Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.chiabu.2020.104379.
References
Adams, J. A., Farst, K. J., & Kellogg, N. D. (2018). Interpretation of medical findings in suspected child sexual abuse: An update for 2018. Journal of Pediatric and Adolescent Gynecology, 31(3), 225–231.
Adams, J. A., Kellogg, N. D., Farst, K. J., Harper, N. S., Palusci, V. J., Frasier, L. D., ... Starling, S. P. (2016). Updated guidelines for the medical assessment and care of children who may have been sexually abused. Journal of Pediatric and Adolescent Gynecology, 29(2), 81–87.
Adams, J. A., Starling, S. P., Frasier, L. D., Palusci, V. J., Shapiro, R. A., Finkel, M. A., & Botash, A. S. (2012). Diagnostic accuracy in child sexual abuse medical evaluation: Role of experience, training, and expert case review. Child Abuse & Neglect, 36(5), 383–392.
Atabaki, S., & Paradise, J. E. (1999). The medical evaluation of the sexually abused child: Lessons from a decade of research. Pediatrics, 104, 178–186.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–71.
Boutis, K., Cano, S., Pecaric, M., Welch-Horan, B., Lampl, B. S., Ruzal-Shapiro, C., & Pusic, M. V. (2016). Interpretation difficulty of normal versus abnormal radiographs using a pediatric example. Canadian Medical Education Journal, 7(1), e68–e77.
Boutis, K., Pecaric, M., Carrière, B., Stimec, J., Willan, A., Chan, J., & Pusic, M. (2019). The effect of testing and feedback on the forgetting curves for radiograph interpretation skills. Medical Teacher, 41(7), 756–764.
Boutis, K., Pecaric, M., Ridley, J., Andrews, J., Gladding, S., & Pusic, M. (2013). Hinting strategies for improving the efficiency of medical student learning of deliberately practiced web-based radiographs. Medical Education, 47(9), 877–887.
Boutis, K., Pecaric, M., Seeto, B., & Pusic, M. (2010). Using signal detection theory to model changes in serial learning of radiological image interpretation. Advances in Health Sciences Education: Theory and Practice, 15(5), 647–658.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Dubow, S. R., Giardino, A. P., Christian, C. W., & Johnson, C. F. (2005). Do pediatric chief residents recognize details of prepubertal female genital anatomy: A national survey. Child Abuse & Neglect, 29(2), 195–205.
Ericsson, K. A. (2004). Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Academic Medicine: Journal of the Association of American Medical Colleges, 79(10 Suppl), S70–S81.
Frasier, L. D., Thraen, I., Kaplan, R., & Goede, P. (2012). Development of standardized clinical training cases for diagnosis of sexual abuse using a secure telehealth application. Child Abuse & Neglect, 36(2), 149–155.
Giardino, A. P., Brayden, R. M., & Sugarman, J. M. (1998). Residency training in child sexual abuse evaluation. Child Abuse & Neglect, 22(4), 331–336.
Hatala, R., Gutman, J., Lineberry, M., Triola, M., & Pusic, M. (2019). How well is each learner learning? Validity investigation of a learning curve-based assessment approach for ECG interpretation. Advances in Health Sciences Education: Theory and Practice, 24(1), 45–63.
Heisler, K. W., Starling, S. P., Edwards, H., & Paulson, J. F. (2006). Child abuse training, comfort, and knowledge among emergency medicine, family medicine, and pediatric residents. Medical Education Online, 11(1), 4600.
Jenny, C., Crawford-Jakubiak, J. E., & Committee on Child Abuse and Neglect, American Academy of Pediatrics. (2013). The evaluation of children in the primary care setting when sexual abuse is suspected. Pediatrics, 132(2), e558–e567.
Kalet, A., Ellaway, R. H., Song, H. S., Nick, M., Sarpel, U., Hopkins, M. A., ... Pusic, M. V. (2013). Factors influencing medical student attrition and their implications in a large multi-center randomized education trial. Advances in Health Sciences Education: Theory and Practice, 18(3), 439–450.
Kellman, P. J., Massey, C. M., & Son, J. Y. (2010). Perceptual learning modules in mathematics: Enhancing students’ pattern recognition, structure extraction, and fluency. Topics in Cognitive Science, 2(2), 285–305.
Killough, E., Spector, L., Moffatt, M., Wiebe, J., Nielsen-Parker, M., & Anderst, J. (2016). Diagnostic agreement when comparing still and video imaging for the medical evaluation of child sexual abuse. Child Abuse & Neglect, 52, 102–109. https://doi.org/10.1016/j.chiabu.2015.12.007.
Kwan, C., Pusic, M., Pecaric, M., Weerdenburg, K., Tessaro, M., & Boutis, K. (2019). The variable journey in learning to interpret pediatric point-of-care ultrasound images: A multicenter prospective cohort study. AEM Education and Training. https://onlinelibrary.wiley.com/action/doSearch?AllField=The+Variable+Journey+in+Learning+to+Interpret+Pediatric+Point-of-Care+Ultrasound+Images%3A+A+Multicenter+Prospective+Cohort+Study&SeriesKey=24725390.
Ladson, S., Johnson, C. F., & Doty, R. E. (1987). Do physicians recognize sexual abuse? American Journal of Diseases of Children, 141(4), 411–415.
Larsen, D. P., Butler, A. C., & Roediger, H. L., III. (2008). Test-enhanced learning in medical education. Medical Education, 42(10), 959–966.
Leblanc, V. R., Brooks, L. R., & Norman, G. R. (2002). Believing is seeing: The influence of a diagnostic hypothesis on the interpretation of clinical features. Academic Medicine, 77(10), S67–S69.
Lee, M. S., Pusic, M., Carriere, B., Dixon, A., Stimec, J., & Boutis, K. (2019). Building emergency medicine trainee competency in pediatric musculoskeletal radiograph interpretation: A multicenter prospective cohort study. AEM Education and Training, 00, 1–11.
Lentsch, K. A., & Johnson, C. F. (2000). Do physicians have adequate knowledge of child sexual abuse? The results of two surveys of practicing physicians, 1986 and 1996. Child Maltreatment, 5(1), 72–78.
Makoroff, K. L., Brauley, J. L., Brandner, A. M., Myers, P. A., & Shapiro, R. A. (2002). Genital examinations for alleged sexual abuse of prepubertal girls: Findings by pediatric emergency medicine physicians compared with child abuse trained physicians. Child Abuse & Neglect, 26(12), 1235–1242.
Menoch, M., Zimmerman, S., Garcia-Filion, P., & Bulloch, B. (2011). Child abuse education: An objective evaluation of resident and attending physician knowledge. Pediatric Emergency Care, 27(10), 937–940.
Narayan, A. P., Socolar, R. R., & St Claire, K. (2006). Pediatric residency training in child abuse and neglect in the United States. Pediatrics, 117(6), 2215–2221.
Norman, G. (2018). Is the mouth the mirror of the mind? Advances in Health Sciences Education: Theory and Practice, 23(4), 665–669.
Palusci, V. J., & McHugh, M. T. (1995). Interdisciplinary training in the evaluation of child sexual abuse. Child Abuse & Neglect, 19(9), 1031–1038.
Paradise, J. E., Winter, M. R., Finkel, M. A., Berenson, A. B., & Beiser, A. S. (1999). Influence of the history on physicians’ interpretations of girls’ genital findings. Pediatrics, 103, 980–986.
Paradise, J. E., Finkel, M. A., Beiser, A. S., Berenson, A. B., Greenberg, D. B., & Winter, M. R. (1997). Assessments of girls’ genital findings and the likelihood of sexual abuse: Agreement among physicians self-rated as skilled. Archives of Pediatrics & Adolescent Medicine, 151(9), 883–891.
Pecaric, M. R., Boutis, K., Beckstead, J., & Pusic, M. V. (2017). A big data and learning analytics approach to process-level feedback in cognitive simulations. Academic Medicine: Journal of the Association of American Medical Colleges, 92(2), 175–184.
Pusic, M., Pecaric, M., & Boutis, K. (2011a). How much practice is enough? Using learning curves to assess the deliberate practice of radiograph interpretation. Academic Medicine: Journal of the Association of American Medical Colleges, 86(6), 731–736.
Pusic, M., Pecaric, M., & Boutis, K. (2011b). How much practice is enough? Using learning curves to assess the deliberate practice of radiograph interpretation. Academic Medicine: Journal of the Association of American Medical Colleges, 86(6), 731–736.
Pusic, M. V., Andrews, J. S., Kessler, D. O., Teng, D. C., Ruzal-Shapiro, C., Pecaric, M., & Boutis, K. (2012). Determining the optimal case mix of abnormals to normals for learning radiograph interpretation: A randomized control trial. Medical Education, 46(3), 289–298.
Starling, S. P., Heisler, K. W., Paulson, J. F., & Youmans, E. (2009). Child abuse training and knowledge: A national survey of emergency medicine, family medicine, and pediatric residents and program directors. Pediatrics, 123(4), e595–e602.
van Merrienboer, J. J. G., Kirschner, P. A., & Kester, L. (2010). Taking the load off a learner’s mind: Instructional design for complex learning. Educational Psychologist, 38(1), 5–13.
Ward, M. G., Bennett, S., Plint, A. C., King, W. J., Jabbour, M., & Gaboury, I. (2004). Child protection: A neglected area of pediatric residency training. Child Abuse & Neglect, 28(10), 1113–1122.
Winter, D. A. (2009). Biomechanics and motor control of human movement. John Wiley & Sons, Inc.