Multiple Sclerosis and Related Disorders 39 (2020) 101884
Clinical instrument to retrospectively capture levels of EDSS

John Robert Ciotti^a, Noah Sanders^b, Amber Salter^c, Joseph R. Berger^d, Anne Haney Cross^a, Salim Chahin^a,⁎

^a Department of Neurology, Washington University in St. Louis; St. Louis, MO, USA
^b University of Minnesota Medical School; Minneapolis, MN, USA
^c Division of Biostatistics, Washington University in St. Louis; St. Louis, MO, USA
^d Department of Neurology, University of Pennsylvania; Philadelphia, PA, USA
ARTICLE INFO

Keywords: Multiple sclerosis; Outcome measure; Disability; Algorithm; Chart review; Examination

ABSTRACT

Background: The Expanded Disability Status Scale (EDSS), a common outcome measure in multiple sclerosis (MS), is obtained prospectively through a direct standardized evaluation. The objective of this study is to develop and validate an algorithm to derive EDSS scores from previous neurological clinical documentation.

Methods: The algorithm utilizes data from the history, review of systems, and physical exam. EDSS scores formally obtained from research patients were compared to captured EDSS (c-EDSS) scores. To test inter-rater reliability, a second investigator captured scores from a subset of patients. Agreement between formal and c-EDSS scores was assessed using a weighted kappa. Clinical concordance was defined as a difference of one step in EDSS (0.5) and functional system (1.0) scores.

Results: Clinical documentation from 92 patients (EDSS range 0.0–8.5) was assessed. Substantial agreement between the c-EDSS and formal EDSS (kappa 0.80; 95% CI 0.74–0.86) was observed. The mean difference between scores was 0.16. The clinical concordance was 78%. Near-perfect agreement was found between the two raters (kappa 0.89; 95% CI 0.84–0.95). The mean inter-rater difference in c-EDSS was 0.23.

Conclusions: This algorithm reliably captures EDSS scores retrospectively, with substantial correlation with formal EDSS and high inter-rater agreement. It may have practical implications in clinic, MS research, and clinical trials.
⁎ Corresponding author. E-mail address: [email protected] (S. Chahin).
https://doi.org/10.1016/j.msard.2019.101884
Received 29 October 2019; Received in revised form 27 November 2019; Accepted 2 December 2019
2211-0348/ © 2019 Elsevier B.V. All rights reserved.

1. Background

Quantifying the degree of disability caused by multiple sclerosis (MS) is the cornerstone of MS research and clinical trials (Cohen et al., 2012). The Expanded Disability Status Scale (EDSS) is the most widely used disability outcome measure for MS and is accepted by regulatory authorities for use in research studies and clinical trials. It is obtained through direct standardized evaluation of the person with MS (pwMS) (Kurtzke, 1983). Seven Functional Systems (FS) (vision, brainstem, pyramidal, cerebellar, sensory, bowel/bladder, and cerebral) are evaluated and scored. The overall EDSS score, ranging from 0–10, is based on FS scores and ambulatory status (Kurtzke, 1983).

Although current guidelines recommend documentation of a quantitative measure of disability in the clinical setting (Rae-Grant et al., 2015), the EDSS is not easily assessed and tracked in clinical practice due to its required specialized training, complexity, and time-intensive nature (Rae-Grant et al., 2015; Baldassari et al., 2018). Attempts to streamline the EDSS have shown relatively good agreement with formal EDSS scores (Baldassari et al., 2018). Other attempts that used patient-derived data (either by written questionnaire or telephone interview) showed varying degrees of agreement with the formal EDSS, especially at lower scores (Bowen et al., 2001; Collins et al., 2016; Huda et al., 2016; Ingram et al., 2010; Lechner-Scott et al., 2003). Retrospective capture of EDSS based on recollection of symptoms by pwMS has been useful primarily for obtaining major benchmarks (Ingram et al., 2010).

Despite the specialized training that is often required for the EDSS, its inter-rater inconsistency has been high (Ontaneda et al., 2015; Kappos et al., 2018; Noseworthy et al., 1990; Landis and Koch, 1977; Plemel et al., 2017; Amato et al., 1988; Francis et al., 1991; Goodkin et al., 1992), especially at lower EDSS levels (Landis and Koch, 1977); thus, the more rigorous Neurostatus EDSS was developed to improve inter-rater reliability (Cohen et al., 2012; Kappos et al., 2015). An algorithmic electronic scoring approach to the Neurostatus has further improved its inter-rater reliability and consistency (D'Souza et al., 2017). To our knowledge, no validated tool is commonly used to
retrospectively capture EDSS scores using data from prior clinical documentation (Cohen et al., 2012; Ontaneda et al., 2015). This has implications for retrospective chart review studies and clinical trials, such as documentation of EDSS progression (or confirmed stability) as a requirement for study entry (Ontaneda et al., 2015; Kappos et al., 2018). We aimed to develop and validate a standardized algorithm to derive an EDSS score from prior neurology clinical documentation, addressing this gap in retrospective disability evaluation.
2. Methods

2.1. Development of algorithm

To develop the Clinical Instrument to Retrospectively Capture Levels of the EDSS (CIRCLE), all available objective and subjective data from a clinical note (including neurologic examination, history, and review of systems) were matched to elements belonging to each FS (Kurtzke, 1983). Several principles were defined to guide the development of the algorithm:

• Objective findings from the neurologic examination are prioritized, but subjective data are used when objective information is missing, or for certain FS (e.g., bowel/bladder problems, fatigue).
• Because worse signs and symptoms produce higher FS scores, the algorithm handles duplicate data by prioritizing more severe and disabling abnormalities within each FS, as well as for the overall EDSS (i.e., ambulation is assessed first because restricted ambulation produces a high EDSS score).
• To address inconsistencies or missing data, a default severity of "moderate" is applied if the severity of a symptom/sign is not clear in the records.
• If no abnormality within a FS is mentioned in the chart, it is scored as normal.
• Optional adjustments are included for comparison to the Neurostatus scoring system (i.e., adjustments to sensory and bowel/bladder FS, inclusion of disc pallor and fatigue) (Ontaneda et al., 2015).

2.2. Use of algorithm

Full CIRCLE instructions are available in Appendix A. Briefly, users review the office note and mark all abnormalities on a scoring sheet (Appendix B). In addition to the neurologic examination, users review the subjective and review of systems portions of the note. For example, data on visual acuity, visual field testing, and any mention of visual symptoms are pertinent components for the visual FS. Using data from the scoring sheet, abnormalities within each FS are considered in a stepwise fashion based on their degree of severity (i.e., paraplegia is considered before abnormal reflexes; see Fig. 1 for an example of how to score the pyramidal FS). Even if there is a discrepancy in severity within an FS, the algorithm prioritizes more severe abnormalities (with the exception of ambulation). When no abnormalities are mentioned, the FS score is zero. The instrument also generates an ambulation score, allowing direct comparison with Neurostatus ambulation scores. Similar to the derivation of a formal EDSS, FS scores are used to calculate an overall captured EDSS (c-EDSS) score using the EDSS CIRCLE and ambulation table (Fig. 2, Appendix C).

2.3. Initial testing

An initial version of the algorithm was tested in 20 pwMS (from investigator-initiated studies) at the University of Pennsylvania, and the c-EDSS scores were directly compared to formal EDSS scores. Minor adjustments were made to correct for systematic score deviations and to supplement for missing data; specifically, the use of subjective complaints when objective findings were lacking was added. The final version of the algorithm is the version tested and validated herein.

2.4. Patient selection and chart review

Patients included in this study were a convenience sample drawn from databases of research patients in three investigator-initiated studies at Washington University in St. Louis (WUSM) (two longitudinal imaging studies and one study of dalfampridine for vision). All 138 patients were screened for study eligibility. Many of these patients are also managed clinically at the same institution. All pwMS in these studies had formal research evaluations, including original EDSS scoring, as part of their research protocols, and all were sequentially assessed for inclusion in this analysis. For each patient, the clinical documentation most proximate to the date of the formal EDSS was reviewed. PwMS without any clinical documentation, or with only distant documentation (i.e., more than one month before or six months after the formal EDSS), were excluded. We attempted to confirm clinical stability (no worsening disability or relapses) in the interval period. For patients in whom clinical stability could not be reliably confirmed, a sensitivity analysis was performed as detailed below. The primary rater (JC) reviewed the clinical neurology note and recorded the data on the CIRCLE scoring sheet. To test inter-rater reliability, a second rater (SC) captured scores from the first 50 consecutive eligible pwMS. CIRCLE raters were blinded to the formal EDSS scores and to the other rater's c-EDSS scores.

2.5. Statistical analysis

Agreement between the formal EDSS scores and the c-EDSS scores was assessed using a weighted kappa and 95% confidence interval (CI). A similar comparison was made for each FS score. Prior studies have suggested that a one-step change in the EDSS or FS is not likely clinically significant (Noseworthy et al., 1990). A measure of clinical concordance was thus defined as a difference of one step in either the EDSS (0.5 points) or the individual FS (1.0 points). Kappas and 95% CIs were also evaluated in a sensitivity analysis to assess the effect of the time interval between the formal EDSS and the clinical visit used for c-EDSS determination. A kappa of 0 to 0.20 is considered slight agreement; 0.20–0.40, fair agreement; 0.40–0.60, moderate agreement; 0.60–0.80, substantial agreement; and 0.80–1.00, almost perfect agreement (Landis and Koch, 1977). Inter-rater agreement was assessed using a weighted kappa and 95% CI for the EDSS score and each FS score; the above measure of clinical concordance is reported as well. To maximize the sample size, clinical notes from one month prior to six months after the formal EDSS date were included. However, because clinical stability cannot be reliably confirmed for visits before a formal EDSS or for relatively distant visits after the formal EDSS, a sensitivity analysis was also performed, with a c-EDSS from a second clinical note substituted into the analysis for pwMS whose clinical notes were relatively distant from the formal EDSS date, or for those without confirmed clinical stability (relative to the formal EDSS date) (Fig. 3).

3. Results

3.1. Patients

Of 138 consecutive WUSM study patients with MS considered for this analysis, 46 pwMS were excluded after applying the inclusion criteria, either due to no usable clinical documentation (3 pwMS) or clinical notes falling outside of the time criteria (43 pwMS); 92 pwMS were included in the analysis. Thirteen different neurologists authored the clinical documentation that was the basis for the c-EDSS scores.
Fig. 1. Example of pyramidal system scoring. The pyramidal FS is scored by first checking whether any muscle weakness is present. More profound weakness or weakness of more limbs leads to a higher FS score. If no muscle weakness is present, other symptoms including motor fatigability, gait difficulty, and any disability are assessed; if any are present, the FS is scored as 2. If none of these symptoms are endorsed, signs only on exam may lead to an FS of 1; otherwise, the FS is scored as 0. Within an FS, once an FS score is established (as denoted by the asterisk), there is no need to proceed further with the algorithm.
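The stepwise logic of Fig. 1 can be written as a short decision function. This is an illustrative sketch, not the study's instrument: the function name is hypothetical, and the mapping from weakness severity to FS scores above 2 is a placeholder, since the exact cut-offs live in the CIRCLE scoring sheet (Appendix B).

```python
# Sketch of the Fig. 1 decision flow for the pyramidal FS (illustrative only).

def pyramidal_fs(weakness_grade=0, fatigability=False,
                 gait_difficulty=False, disability=False,
                 signs_on_exam=False):
    """Score the pyramidal FS following Fig. 1's stepwise, severity-first logic."""
    if weakness_grade > 0:
        # more profound weakness / weakness of more limbs -> higher FS score
        # (placeholder mapping; the real cut-offs are in Appendix B)
        return min(2 + weakness_grade, 6)
    if fatigability or gait_difficulty or disability:
        return 2  # symptoms endorsed without documented weakness
    if signs_on_exam:
        return 1  # signs only (e.g., brisk reflexes) on exam
    return 0      # no abnormality mentioned in the chart -> scored normal
```

Once a branch fires, no further checks are made, mirroring the asterisked stopping rule in the figure.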
3.2. Algorithm performance/agreement

Substantial agreement between the c-EDSS obtained using the algorithm and the formal EDSS (kappa 0.80, 95% CI 0.75–0.86) was achieved (see Table 2, Fig. 4). The mean and median differences between c-EDSS and formal EDSS scores were 0.16 and 0.5, respectively; the mean absolute difference between c-EDSS and formal EDSS scores (i.e., the difference between the two scores irrespective of which score is higher or lower) was 0.43. In 44 of the 92 pwMS (47.8%), the c-EDSS matched the formal EDSS exactly. When the c-EDSS did not match the formal EDSS, the most frequent difference was 0.5 (30.4%). Agreement for individual FS varied from fair (kappa 0.37 for Cerebral) to substantial (kappa 0.62 for Pyramidal). When evaluated for clinically relevant concordance (overall EDSS within 0.5 and individual FS within 1.0), the c-EDSS was clinically concordant with the formal EDSS at a rate of 78%. With the exception of Vision (79%), all individual Functional Systems were clinically concordant at least 80% of the time.

Of the 92 patients included in the analysis, 41 had their original clinical note retained in the sensitivity analysis: 40 with a note within one week before or after the formal EDSS, and one with a note within one month after the formal EDSS that explicitly documented stability. For the remaining 51 pwMS, a note more distant from the formal EDSS was evaluated using the CIRCLE algorithm and substituted into the sensitivity analysis. Results of the sensitivity analysis were comparable to the primary analysis (kappa 0.79, 95% CI 0.73–0.85), with a mean difference between c-EDSS and formal EDSS of 0.45 (median difference 0.5). Agreement for individual FS scores was similar to the primary analysis (Table 2).
Fig. 2. EDSS CIRCLE. The EDSS CIRCLE is used to calculate a c-EDSS score by determining the highest FS score first (starting in the center-most ring). The frequency of the highest FS score and lesser FS scores are used to determine which final c-EDSS score applies (outer-most ring in the corresponding sector). For patients with restricted ambulation, the ambulation table determines the cEDSS score (Appendix C); individual FS scores may still be calculated for these patients using the scoring sheet (Appendix B).
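The note-selection rules from Section 2.4 (eligibility window) and Fig. 3 (substitution for the sensitivity analysis) can be sketched as two small predicates. The function names and the 30/180-day readings of "one month"/"six months" are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of note eligibility (Section 2.4) and the Fig. 3
# substitution rule; intervals are days from the formal EDSS (negative = before).
# "One month" and "six months" are assumed to mean 30 and 180 days.

def note_eligible(interval_days):
    """Usable notes fall within one month before to six months after the EDSS."""
    return -30 <= interval_days <= 180

def keep_note_in_sensitivity(interval_days, documents_stability=False):
    """Retain the proximal note, or flag it for substitution by a second note."""
    if abs(interval_days) <= 7:
        return True   # within one week of the formal EDSS
    if 7 < interval_days <= 30 and documents_stability:
        return True   # within a month after, with explicitly documented stability
    return False      # otherwise a second note is substituted (Fig. 3)
```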
The clinical note was 44 ± 53 (mean ± standard deviation) days from the date of the formal EDSS. All MS subtypes were included, covering a broad range of disability levels (EDSS 0.0–8.5) (Table 1).

3.3. Inter-rater agreement
For the 50 pwMS reviewed by a second investigator, almost perfect agreement between the two raters (kappa 0.89, 95% CI 0.84–0.95) was achieved (see Table 3, Fig. 5). The mean difference in c-EDSS between
the two raters was 0.23. In 34 of the 50 pwMS (68%), the two raters arrived at the same c-EDSS (median difference between raters was 0.0). When the two raters did not exactly agree, the most frequent difference was 0.5, the smallest possible incremental difference in EDSS scores. Agreement on individual Functional Systems ranged from kappa 0.56 (Brainstem) to kappa 0.93 (Vision). Although Ambulation scores were not formally obtained in research patients, the algorithm also allows for Ambulation to be scored and compared between raters; the two raters agreed near-perfectly in this domain (kappa 0.94). Using the measure of clinically relevant concordance, EDSS scores from the two raters were 92% concordant. All individual FS scores from the two raters were clinically concordant with each other greater than 80% of the time.

Fig. 3. Notes included in sensitivity analysis. For patients whose most proximal clinical note was not authored within a week (before or after) of the formal EDSS, the CIRCLE algorithm was performed on a second clinical note, and the results were substituted into a secondary analysis to assess the instrument's sensitivity to the time interval. If the proximal note was recorded one month to one week prior to the formal EDSS, the algorithm was repeated on a second clinical note from after the formal EDSS (regardless of interval). Similarly, for notes obtained more than one week after the formal EDSS, a second clinical note from a date prior to the formal EDSS (regardless of interval) was substituted into the sensitivity analysis (exceptions could be made for a proximal note within one month after the formal EDSS that explicitly documented stability).

Table 1
Patient characteristics.

Number of patients: 92
Age, years: mean (SD) 49.7 (12.2); range 21.6–74.0
Interval, days: mean (SD) 44.3 (53.1); range −30 to 172
Gender: male 21 (22.8%); female 71 (77.2%)
EDSS: 0–2.5, 27 (29.3%); 3.0–5.5, 35 (38.0%); 6.0–8.5, 30 (32.6%)
MS subtype: RRMS 57 (62.0%); PPMS 15 (16.3%); SPMS 20 (21.7%)

4. Discussion

This study introduces a new method to accurately obtain EDSS scores from prior clinical documentation. Scores obtained retrospectively with the CIRCLE algorithm had substantial agreement with formal EDSS scores. Furthermore, the inter-rater reliability of the CIRCLE was very strong, further confirming the reliability and reproducibility of the algorithm.

This algorithm may be a useful resource for clinical and research purposes. In clinical practice, it can serve as a template to standardize current and prior neurological evaluations (including subjective and objective findings), allowing for comparisons between providers across time points. A more standardized patient assessment may allow for earlier and more accurate detection of subtle signs of progression, which can have an impact on disease management (Rae-Grant et al., 2015). Many opportunities exist for use of the algorithm in research and clinical trials. Chart review studies (that previously lacked a reliable disability outcome measure) can utilize the algorithm to obtain an accurate assessment of disability from prior notes. Similarly, the standardized prospective or retrospective documentation of disability scores can help establish a disease subtype more accurately and confirm the presence (or absence) of disability progression, which would enhance investigators' ability to confirm that patients meet eligibility criteria for certain interventional clinical trials (Kappos et al., 2018). With the increasing focus on progressive MS subtypes in clinical trials, and the introduction of more trials focused on repair (Plemel et al., 2017), this algorithm will prove a timely tool to improve recruitment into such trials.

The algorithm performed consistently well in a sample of varied MS patients with different disease subtypes and disability levels, and on
Table 2
Validation of c-EDSS (comparison to formal EDSS).

|                    | Primary analysis kappa (95% CI) | SE   | Exact agreement, freq (%) | Clinical concordance*, freq (%) | Sensitivity analysis kappa (95% CI) |
|--------------------|---------------------------------|------|---------------------------|---------------------------------|-------------------------------------|
| EDSS               | 0.80 (0.74–0.86)                | 0.03 | 44/92 (47.8%)             | 72/92 (78.3%)                   | 0.79 (0.73–0.85)                    |
| Functional Systems |                                 |      |                           |                                 |                                     |
| Vision             | 0.47 (0.33–0.61)                | 0.07 | 51/92 (55.4%)             | 73/92 (79.3%)                   | 0.62 (0.50–0.74)                    |
| Brainstem          | 0.46 (0.31–0.62)                | 0.08 | 60/92 (65.2%)             | 78/92 (84.8%)                   | 0.48 (0.32–0.64)                    |
| Pyramidal          | 0.62 (0.51–0.72)                | 0.05 | 51/92 (55.4%)             | 84/92 (91.3%)                   | 0.67 (0.57–0.76)                    |
| Cerebellar         | 0.50 (0.37–0.64)                | 0.07 | 52/92 (56.5%)             | 78/92 (84.8%)                   | 0.58 (0.46–0.70)                    |
| Sensory            | 0.47 (0.34–0.60)                | 0.07 | 45/92 (48.9%)             | 80/92 (87.0%)                   | 0.40 (0.27–0.53)                    |
| Bowel/Bladder**    | 0.44 (0.31–0.58)                | 0.07 | 40/91 (44.0%)             | 76/91 (83.5%)                   | 0.44 (0.30–0.57)                    |
| Cerebral           | 0.37 (0.20–0.54)                | 0.09 | 57/92 (62.0%)             | 74/92 (80.4%)                   | 0.36 (0.20–0.53)                    |

⁎ Clinical concordance defined as within 0.5 (inclusive) of overall EDSS and within 1.0 (inclusive) of individual Functional Systems.
⁎⁎ One Bowel/Bladder FS score missing from source data for formal EDSS.
Fig. 4. Agreement of c-EDSS and formal EDSS. There is substantial agreement between the c-EDSS obtained using the algorithm and the formal EDSS (kappa 0.80, 95% CI 0.75–0.86). In 47.8% of patients, there is exact agreement between the c-EDSS and the formal EDSS scores.
notes recorded by more than 10 different neurologists with varying styles of documentation and exam templates. The mean difference of only 0.16 between the c-EDSS and formal EDSS scores suggests that the instrument does not systematically over- or under-score disability levels. Furthermore, the instrument requires only minimal training, and the note review and EDSS scoring take only a few minutes for a trained rater to perform in full. Simplifying the approach to the EDSS, removing redundancy, and prioritizing systems and symptoms that lead to higher c-EDSS scores first (i.e., ambulation) contribute to time savings.

Moreover, the algorithmic backbone of the instrument allows for automation of the FS and EDSS score calculations. Other groups have suggested the potential for automation and integration of novel, streamlined EDSS tools into electronic medical record (EMR) systems (Baldassari et al., 2018). Our colleagues at Washington University, using EDSS data from the CombiRx trial, notably streamlined the EDSS to only the elements that may be seen in a clinical evaluation and still demonstrated moderate agreement with the formally obtained EDSS (kappa 0.57) in a retrospective validation study of nearly
Table 3
Inter-rater comparison of c-EDSS.

|                    | Kappa (95% CI)   | SE   | Exact agreement, freq (%) | Clinical concordance*, freq (%) |
|--------------------|------------------|------|---------------------------|---------------------------------|
| EDSS               | 0.89 (0.84–0.95) | 0.03 | 34/50 (68%)               | 46/50 (92%)                     |
| Functional Systems |                  |      |                           |                                 |
| Ambulation**       | 0.94 (0.88–1.00) | 0.03 | 45/49 (92%)               | 46/49 (94%)                     |
| Vision             | 0.93 (0.86–1.00) | 0.04 | 46/50 (92%)               | 50/50 (100%)                    |
| Brainstem          | 0.56 (0.34–0.78) | 0.11 | 36/50 (72%)               | 42/50 (84%)                     |
| Pyramidal          | 0.83 (0.75–0.91) | 0.04 | 38/50 (76%)               | 50/50 (100%)                    |
| Cerebellar         | 0.59 (0.40–0.78) | 0.10 | 41/50 (82%)               | 41/50 (82%)                     |
| Sensory            | 0.73 (0.62–0.85) | 0.06 | 35/50 (70%)               | 46/50 (92%)                     |
| Bowel/Bladder      | 0.63 (0.46–0.81) | 0.09 | 33/50 (66%)               | 49/50 (98%)                     |
| Cerebral           | 0.63 (0.41–0.84) | 0.11 | 41/50 (82%)               | 43/50 (86%)                     |

⁎ Clinical concordance defined as within 0.5 (inclusive) of overall EDSS and within 1.0 (inclusive) of individual Functional Systems.
⁎⁎ Included in Neurostatus scoring; one Ambulation score missing from source data for formal EDSS.
Fig. 5. Inter-rater agreement. There is almost perfect agreement between the c-EDSS obtained by two independent raters (kappa 0.89, 95% CI 0.84–0.95). In 68% of patients, the two raters arrive at exactly the same c-EDSS.
1000 patients (Baldassari et al., 2018). Their work and ours suggest the feasibility of embedding a streamlined, standardized EDSS into EMR systems to simultaneously produce an exam in the office note and automatically calculate an EDSS score for the visit (Rae-Grant et al., 2015). Other attempts to streamline or simplify the EDSS using patient-derived data have been shown to be generally precise, but may be inaccurate; one study demonstrated that patients tended to over-score themselves by 0.5–0.7 on the EDSS on average (56–65% scored within 0.5 of the formal EDSS, and 77–82% scored within 1 point of the formal EDSS) (Bowen et al., 2001). An analysis of this and other scoring systems confirmed that, at EDSS < 6, patient-derived rating scales generally tend to overestimate the EDSS by about 0.5 (Collins et al., 2016). Another group using patient-derived data via written questionnaire or telephone interview was able to achieve substantial correlation with the most current formally obtained EDSS (correlation coefficient: 0.79), but attempts to retrospectively capture an EDSS based on the patient's recollection of symptoms were limited to major benchmarks in the stepwise scoring system (i.e., when EDSS = 6 [use of an assistive device for ambulation]) rather than the full continuous EDSS scale (Ingram et al., 2010). Other attempts to capture an EDSS over the phone have either been limited to patients with a baseline EDSS ≥ 6.0 (Huda et al., 2016), where steps between EDSS values are explicitly determined by impairment in ambulation, or demonstrated poorer correlation at lower overall EDSS values or for select Functional Systems (Lechner-Scott et al., 2003).
An important limitation is that this study does not endeavor to address the limitations of the EDSS itself. The inter-rater reliability of the EDSS has known shortcomings that have been examined in many older studies (D'Souza et al., 2017; Noseworthy et al., 1990; Amato et al., 1988; Francis et al., 1991; Goodkin et al., 1992; Noseworthy, 1994; Sharrack and Hughes, 1996; Verdier-Taillefer et al., 1991) (i.e., prior to the development of the Neurostatus EDSS (Kappos et al., 2015)). In one trial, only 69% of a subset of patients evaluated by multiple physicians on the same day had perfect agreement on the overall EDSS score, with equivalent or lower degrees of agreement for each FS category (Noseworthy et al., 1990); the same trial found that only 62% of the agreement between two raters could be explained other than by chance. Reproducible scoring across raters is even more challenging at lower EDSS scores; in another study of patients with EDSS 1.0–3.5, scores varied between experienced raters by as much as 1.5 EDSS points or 3.0 individual FS points (Goodkin et al., 1992). The agreement between CIRCLE and formal EDSS scores in our study was at least equal to (if not better than) that reported in prior inter-rater studies. In addition to strong inter-rater reliability, our instrument demonstrates functionally relevant agreement with formal EDSS and FS scores. Defined as no more than a one-step change in EDSS or FS scores (Noseworthy et al., 1990), clinical concordance of CIRCLE was at least 80% for the EDSS and nearly all FS scores.

We attempted to use the clinical note most proximate to the formal EDSS and to confirm clinical stability in the note. It is possible that
relapses not captured, subtle progression, or the potential salutary effect of dalfampridine (18% of patients in this study were enrolled in a dalfampridine trial) in the interval period could be confounding some of the results (i.e., the agreement might have been even higher if all notes were from the same day as the formal EDSS). However, our sensitivity analysis (using a temporally distant note for some patients) produced results similar to the primary analysis, suggesting that, in our sample, the interval from the formal evaluation to the clinical note plays a negligible role.

This study validated the use of the instrument at one academic medical center. Furthermore, most of the notes used in this study were authored by MS specialists (whose history and exam may be more focused on MS-specific abnormalities). However, we believe that the instrument would perform well on any neurology note, because the depth of descriptive information in the notes used in this study varied widely and the algorithm is designed to use all available data within an office note. Future directions for this tool may include formal assessment at different clinical institutions and in select patient populations (i.e., low vs. high disability, by MS subtype, by duration of disease, etc.). We did not perform subgroup analyses due to the relatively small overall number of patients in this study and the inherent issues in performing multiple subgroup analyses. Our group plans to perform a contemporaneous comparison of the CIRCLE with the traditional EDSS to further confirm the validity of the tool in clinical practice.
5. Conclusion

The CIRCLE algorithm is a quick and simple tool that achieved substantial agreement with formal EDSS scores. The tool produces scores within a smaller margin than the known inter-rater variability of the original EDSS. The algorithm performed well using clinical documentation from 13 different neurologists and in pwMS of different subtypes and a full range of disability levels. Using the CIRCLE tool to capture an EDSS score from clinical documents may have implications for the clinical care of pwMS and for retrospective research studies and clinical trials, and the tool can potentially be automated.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

John Robert Ciotti: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing. Noah Sanders: Conceptualization, Investigation, Writing - review & editing. Amber Salter: Formal analysis, Writing - review & editing. Joseph R. Berger: Writing - review & editing, Supervision. Anne Haney Cross: Writing - review & editing, Supervision. Salim Chahin: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing, Supervision.

Declaration of Competing Interest

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: John Ciotti has nothing to disclose. Noah Sanders has nothing to disclose. Amber Salter reports consulting fees for statistical reviews for Circulation: Cardiovascular Imaging. Joseph Berger reports grants and personal fees from Biogen; grants from TEVA; and personal fees from Genentech/Roche, Genzyme, Millennium/Takeda, Novartis, Inhibikase, ExcisionBio, Roche, Amgen, Astra-Zeneca, Alkermes, and Bayer. Anne Cross reports consulting honoraria from Biogen, Celgene, EMD Serono, Genentech/Roche, Novartis, and TG Therapeutics, and receives research support from Genentech and EMD Serono. Salim Chahin reports consulting and/or speaking honoraria from Biogen, Genentech, Sanofi Genzyme, Novartis, and Teva Neuroscience.

Acknowledgements

The authors would like to acknowledge the principal investigators of the studies from which formal EDSS scores were obtained: Drs. Robert Naismith at Washington University in St. Louis, and Clyde Markowitz and Dina Jacobs at the University of Pennsylvania.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.msard.2019.101884.

References

Cohen, J.A., Reingold, S.C., Polman, C.H., Wolinsky, J.S., International Advisory Committee on Clinical Trials in Multiple Sclerosis, 2012. Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects. Lancet Neurol. 11 (5), 467–476.
Kurtzke, J.F., 1983. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 33 (11), 1444–1452.
Rae-Grant, A., Bennett, A., Sanders, A.E., Phipps, M., Cheng, E., Bever, C., 2015. Quality improvement in neurology: multiple sclerosis quality measures: executive summary. Neurology 85 (21), 1904–1908.
Baldassari, L.E., Salter, A.R., Longbrake, E.E., Cross, A.H., Naismith, R.T., 2018. Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS. Mult. Scler. 24 (10), 1347–1355.
Bowen, J., Gibbons, L., Gianas, A., Kraft, G.H., 2001. Self-administered expanded disability status scale with functional system scores correlates well with a physician-administered test. Mult. Scler. 7 (3), 201–206.
Collins, C.D., Ivry, B., Bowen, J.D., Cheng, E.M., Dobson, R., Goodin, D.S., et al., 2016. A comparative analysis of patient-reported expanded disability status scale tools. Mult. Scler. 22 (10), 1349–1358.
Huda, S., Cavey, A., Izat, A., Mattison, P., Boggild, M., Palace, J., 2016. Nurse led telephone assessment of expanded disability status scale assessment in MS patients at high levels of disability. J. Neurol. Sci. 362, 66–68.
Ingram, G., Colley, E., Ben-Shlomo, Y., Cossburn, M., Hirst, C.L., Pickersgill, T.P., et al., 2010. Validity of patient-derived disability and clinical data in multiple sclerosis. Mult. Scler. 16 (4), 472–479.
Lechner-Scott, J., Kappos, L., Hofman, M., Polman, C.H., Ronner, H., Montalban, X., et al., 2003. Can the expanded disability status scale be assessed by telephone? Mult. Scler. 9 (2), 154–159.
Kappos, L., D'Souza, M., Lechner-Scott, J., Lienert, C., 2015. On the origin of Neurostatus. Mult. Scler. Relat. Disord. 4 (3), 182–185.
D'Souza, M., Yaldizli, O., John, R., Vogt, D.R., Papadopoulou, A., Lucassen, E., et al., 2017. Neurostatus e-Scoring improves consistency of expanded disability status scale assessments: a proof of concept study. Mult. Scler. 23 (4), 597–603.
Ontaneda, D., Fox, R.J., Chataway, J., 2015. Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives. Lancet Neurol. 14 (2), 208–223.
Kappos, L., Bar-Or, A., Cree, B.A.C., Fox, R.J., Giovannoni, G., Gold, R., et al., 2018. Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study. Lancet 391 (10127), 1263–1273.
Noseworthy, J.H., Vandervoort, M.K., Wong, C.J., Ebers, G.C., 1990. Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperative MS Study Group. Neurology 40 (6), 971–975.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33 (1), 159–174.
Plemel, J.R., Liu, W.Q., Yong, V.W., 2017. Remyelination therapies: a new direction and challenge in multiple sclerosis. Nat. Rev. Drug Discov. 16 (9), 617–634.
7
Amato, M.P., Fratiglioni, L., Groppi, C., Siracusa, G., Amaducci, L., 1988. Interrater reliability in assessing functional systems and disability on the Kurtzke scale in multiple sclerosis. Arch. Neurol. 45 (7), 746–748.
Francis, D.A., Bain, P., Swan, A.V., Hughes, R.A., 1991. An assessment of disability rating scales used in multiple sclerosis. Arch. Neurol. 48 (3), 299–301.
Goodkin, D.E., Cookfair, D., Wende, K., Bourdette, D., Pullicino, P., Scherokman, B., et al., 1992. Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group. Neurology 42 (4), 859–863.
Noseworthy, J.H., 1994. Clinical scoring methods for multiple sclerosis. Ann. Neurol. 36 (Suppl), S80–S85.
Sharrack, B., Hughes, R.A., 1996. Clinical scales for multiple sclerosis. J. Neurol. Sci. 135 (1), 1–9.
Verdier-Taillefer, M.H., Zuber, M., Lyon-Caen, O., Clanet, M., Gout, O., Louis, C., et al., 1991. Observer disagreement in rating neurologic impairment in multiple sclerosis: facts and consequences. Eur. Neurol. 31 (2), 117–119.