ORIGINAL ARTICLE
The Symptom Inventory Disability-Specific Short Forms for Multiple Sclerosis: Reliability and Factor Structure
Carolyn E. Schwartz, ScD, Rita K. Bode, PhD, Timothy Vollmer, MD

ABSTRACT. Schwartz CE, Bode RK, Vollmer T. The Symptom Inventory disability-specific short forms for multiple sclerosis: reliability and factor structure. Arch Phys Med Rehabil 2012;93:1629-36.

Objective: To further the development of the 99-item Symptom Inventory (SI) for multiple sclerosis (MS) using modern test theory methods to create 3 disability-specific short forms for MS patient subgroups identified using Performance Scale (PS) items.
Design: A web-based cross-sectional study.
Setting: National MS Registry.
Participants: People with MS (N=1532) who participate in the North American Research Committee on Multiple Sclerosis Registry.
Interventions: None.
Main Outcome Measures: The SI; the disease-specific PS and the Patient-Determined Disease Steps; and the generic Short-Form 12.
Results: When the original SI subscales did not demonstrate unidimensionality, exploratory factor analysis was conducted yielding 14 factors that could be classified using the structure of the PS. Confirmatory factor analysis confirmed the unidimensionality of the hand function, vision, fatigue, cognitive, bowel/bladder, spasticity, and pain scales. The mobility scale was split into mobility and use of assistive devices; the sensory scale was split into sensory and vasomotor. Item response theory analyses revealed good model fit.
Conclusions: This study provides empirical support for a 10-scale symptom measure for use in MS clinical research, with short forms in 5 scales tailored to have good specificity for people with mild, moderate, and severe disability and single forms for the remaining 5 scales. The PS items can serve as a screener for these disability-specific short forms, which provide choice and flexibility that are similar to a computerized adaptive test but without the reliance on real-time computer infrastructure.
Key Words: Outcome Assessment, Patient; Psychometrics; Quality of life; Rehabilitation; Symptoms.
© 2012 by the American Congress of Rehabilitation Medicine

The study of neurologic impairment and disability in multiple sclerosis (MS) has been advanced by the development and integration of patient-reported outcome (PRO) measures into clinical research. While a number of PRO measures have been developed in the past 2 decades, the clinician-reported Expanded Disability Status Scale (EDSS)1 remains the most widely used in MS research.2 Although its widespread use ensures the comparability of results, the scale’s lack of precision in defining mild, moderate, and severe impairment, as well as divergences in mean staying time at a particular scale level, leads to a substantial degree of interexaminer variability.2 The scale also loses some of its objectivity in the areas of bowel and bladder function, as well as ambulation, because it relies on subjective assessment and is not responsive to clinical change, particularly with regard to upper limb and cognitive dysfunction.2

More than a decade ago, a PRO measure was validated that is of particular interest because it serves as a useful measure of symptoms. The 99-item Symptom Inventory (SI) measure of impairment3 was validated in large samples in the 1990s and has been used by clinical researchers as an inexpensive and valid replacement or complement to the EDSS. In the time since the original validation article3 was published, there have been 2 major changes that serve as the foundation for the current work.
First, in 2006, the Food and Drug Administration published its guidance on the use of PRO measures in medical product development to support labeling claims.4 This guidance formalized the use of PRO measures in drug development and emphasized the use of symptom and function measures in such research.4 The publication of this document has provided a clear path for MS clinical researchers to document the benefits of new disease-modifying treatments using PRO measures and, in particular, symptom and disability

List of Abbreviations
From the DeltaQuest Foundation, Inc, Concord, MA (Schwartz, Bode); Departments of Medicine and Orthopaedic Surgery, Tufts University Medical School, Boston, MA (Schwartz); Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL (Bode); Department of Neurology, University of Colorado Denver, Denver, CO (Vollmer); and Rocky Mountain Multiple Sclerosis Center, Aurora, CO (Vollmer). Supported in part by a Metric Development and Validation Award (USAMRAA grant no. MS090018), and by a CMSC/Global MS Registry Visiting Scientist Fellowship, which was supported through a Foundation of the Consortium of Multiple Sclerosis Centers grant from EMD Serono, Inc. CMSC/Global MS Registry is supported by the Consortium of Multiple Sclerosis Centers and its Foundation. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit on the authors or on any organization with which the authors are associated. Reprint requests to Carolyn E. Schwartz, ScD, 31 Mitchell Rd, Concord, MA 01742, e-mail:
[email protected]. In-press corrected proof published online on May 15, 2012, at www.archives-pmr.org. 0003-9993/12/9309-01285$36.00/0 http://dx.doi.org/10.1016/j.apmr.2012.03.006
CFI       comparative fit index
EDSS      Expanded Disability Status Scale
ICF       International Classification of Functioning, Disability and Health
IRB       institutional review board
IRT       item response theory
MS        multiple sclerosis
NARCOMS   North American Research Committee on Multiple Sclerosis
PDDS      Patient-Determined Disease Steps
PRO       patient-reported outcome
PS        Performance Scales
RMSEA     root mean square error of approximation
SI        Symptom Inventory
SI-SF     Symptom Inventory short-form
2-PL      2-parameter logistic (model)
Arch Phys Med Rehabil Vol 93, September 2012
SYMPTOM INVENTORY SHORT FORMS, Schwartz
measures. There is an acute need to hone the SI for use in clinical research so that it is shorter and provides short forms tailored to patient subgroups with specific levels of disability.

The second major change that motivates the current work is that psychometric methods have grown substantially. These methods have opened the door to a deeper and more nuanced approach to working with data and for characterizing clinically important change,5 of relevance both for individual patient monitoring and for assessment of treatment value.6 Methodologic advances from educational testing using item response theory (IRT)7 have been applied to health-related quality-of-life assessment, leading to a paradigm shift in PRO measure assessment. These methods allow the selection of items for short forms based on the range of the underlying trait (ie, θ) that is of most interest. Thus, short forms can be created for different disease groupings or levels of disabilities, rather than having 1 short form for all. This is particularly relevant for the SI given its 99-item length and the importance of developing abbreviated forms for clinical MS research.

The purpose of the present work is to use classic and modern test theory methods to reduce the number of items in the SI and generate item calibrations for the development of short forms. We aim to use the Performance Scales (PS) items as screener items to identify disability subgroups for each SI subscale, and to create 3 disability-specific short forms for specific subgroups of patients with MS.

METHODS

Sample and Design
This project involved cross-sectional data from 1532 people with MS who participate in the North American Research Committee on Multiple Sclerosis (NARCOMS) Registry.
NARCOMS is a self-report registry of more than 36,000 individuals who have MS, with approximately 10,000 updating their data every 6 months using either paper or secure web-based survey forms capturing data on demographics, disease characteristics, disability, treatments, and access to health providers. Potential candidates for the study were selected among those NARCOMS Registry participants who completed the latest 2 semiannual update surveys online, reside in the United States, and are at least 18 years of age (n>5000). To ensure adequate cell sizes, we obtained a stratified sample with equal distribution by sex, age, course of disease (relapsing or not), and level of disability as measured by the most recent Patient-Determined Disease Steps (PDDS) score (0–2, mild; 3–4, moderate; 5–8, severe). These NARCOMS participants were sent an invitation to participate in this add-on survey after they completed the spring 2010 semiannual update.

Procedure
An e-mail notification describing the study was sent to a randomly selected cohort. Once our desired sample size was reached, no more e-mails were sent out. This e-mail included a unique identifier that was linkable to the NARCOMS database, and a link to a web-based set of questionnaires designed for this study. The survey engine is SurveyGizmo.com (www.surveygizmo.com), a user-friendly and Health Insurance Portability and Accountability Act–compliant interface for collecting data in a secure environment. The SurveyGizmo questionnaire began with an online (written) consent form and the measures described below. Data from the NARCOMS spring update and the SurveyGizmo data were then linked by the NARCOMS Coordinating Center using the NARCOMS unique identifier, and data were deidentified before data analysis. The project was reviewed and approved by
the institutional review boards (IRBs) associated with NARCOMS (the Western IRB, Olympia, WA) and DeltaQuest Foundation (the New England IRB, Newton, MA).

Measures
In addition to demographic characteristics and MS-related treatment information, data from the following 2 PRO tools were used in the present psychometric development work:
1. The 99-item SI. Initial validation of the SI supported the reliability and validity of its 6 subscales designed to reflect localization of brain lesions (visual, left and right hemisphere, brainstem and cerebellum, spinal cord, and nonlocalized symptoms).3,8,9 Item response options range from 0 (“not at all”) to 4 (“a great deal”) or, if the patient deems the item not applicable, the item is scored as missing or −99. SI scale scores were combined additively to produce an overall index of neurologic impairment.
2. The PS.3 The PS is a measure of 8 domains of neurologic disability (mobility, hand function, vision, fatigue, cognition, bladder/bowel, sensory, spasticity). PS scores range from 0 (“normal”) to 5 (“total [subscale name] disability”) on all subscales except mobility, for which the highest score is 6, and are aggregated for an overall index of neurologic disability. The PS has been consistently used since the inception of NARCOMS as a brief, comprehensive measure of disability.

Two other PRO measures were included in the study for the purpose of sample description. The first was the PDDS,10 a self-report measure that was modeled after and correlates highly with the EDSS. The patient characterizes disability level into 1 of 9 steps (0, normal; 1, mild disability; 2, moderate disability; 3, gait disability; 4, early cane; 5, late cane; 6, bilateral support; 7, wheelchair or scooter; 8, bedridden). The second was the Medical Outcomes Study Short-Form 12-Item Health Survey, version 2 (reference 11), a generic measure of functional health.
This 12-item measure yields physical and mental health component scores that are reported in T scores, having a range of 0 to 100, a mean of 50, and an SD of 10.

Statistical Analysis
Descriptive analysis. Preliminary analyses were conducted on data for each existing SI scale to examine the distribution of responses within each item to identify categories with sparse data (potentially problematic in the IRT analysis) and to identify out-of-range responses and violations of monotonically increasing or decreasing sum score means across categories of the rating scale. Reliability analyses were conducted to assess the internal consistency of the items within the scales. The raw frequencies and sum score means for respondents choosing each response category were examined to identify categories with sparse data and inversions in category means. SPSS version 17.0 (supplier a) was used in these analyses.

Dimensionality. To evaluate the dimensionality of the scales, factor analysis was used. Before these analyses, the sample was randomly split in half: one sample for use in the exploratory factor analyses and the other for use in the confirmatory factor analyses. Since the factor structure of the SI was previously established,3 confirmatory factor analysis was conducted on the item data for each SI scale by using Mplus version 5.2 (supplier b). Because the SI responses were categorical, polychoric correlation matrices were used as input. Dimensionality was assessed using the weighted least-squares estimator with adjustments for means and variances (WLSMV); the criteria for acceptable unidimensionality used
were a comparative fit index (CFI) greater than .90 (reference 12) and a root mean square error of approximation (RMSEA) less than .08 (reference 13), with CFI greater than .95 and RMSEA less than .06 considered evidence of good fit. In addition, local dependence was assessed; the criterion used for acceptable independence was an interitem correlation of residuals less than .20 (reference 14). When these criteria were met, the dimensionality of the SI scales was confirmed. When scale unidimensionality was not confirmed, an exploratory factor analysis was conducted on the complete item set (99 items) using Mplus with the other half of the sample. For this analysis, polychoric correlations and unweighted least-squares estimation were used to identify items loading primarily on each scale. Scree plots and eigenvalues greater than 1 were used to determine the number of factors in the data, and parallel analysis using RawPar within SPSS was used to confirm the number of factors identified in the exploratory factor analysis. For scales in which a single dimension was identified, confirmatory factor analysis was conducted on the other half of the sample to confirm the revised factor structure. For scales in which multiple dimensions were identified, half of the sample was used in an exploratory factor analysis to identify unidimensional subsets of items, and the other half was used in a confirmatory factor analysis to confirm the new structure.

IRT calibration and model fit. For scales determined to be unidimensional, IRT methods7 were used with data from the entire sample to calibrate the items. Initially, Multilog version 7.0 (supplier c) was used to implement the 2-parameter logistic (2-PL) model. To estimate the parameters of each item (slopes and thetas for each threshold) on the latent trait, Samejima’s15 graded response model was used along with item and scale information functions to assess precision along the trait continuum.
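As a minimal sketch of the graded response model named above, the following evaluates category probabilities for one polytomous item from a slope and a set of ordered thresholds. It only evaluates the model; it does not estimate parameters as Multilog does. The function name and the slope and threshold values are hypothetical and are not taken from the study.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Return P(X = k) for k = 0..m under a graded response model.

    theta      -- person location on the latent trait
    slope      -- item discrimination
    thresholds -- ascending threshold locations b_1 <= ... <= b_m
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative curves P(X >= k) for k = 1..m, each a 2-PL logistic function.
    cumulative = [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= m + 1) = 0, then take differences.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical 5-category item (response options 0 to 4): one slope, four thresholds.
probs = grm_category_probs(theta=0.0, slope=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

The five probabilities sum to 1 for any theta, and a steeper slope concentrates the probability mass on fewer categories, which is why higher slopes correspond to more precise items.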
For slopes, values greater than 2.00 and a range of thresholds as wide as the scores for the sample are considered good. Fit to the 2-PL IRT model was estimated using the S-X² and S-G² statistics,16 with values with a probability of less than .001 used to identify items that did not fit this IRT model. When misfit was found using the 2-PL model, the less restrictive Rasch model was implemented using Winsteps version 3.69 (supplier d). To estimate the item difficulties and thresholds using the Rasch model, the rating scale model was used.17 Fit to the Rasch model was assessed using infit mean square fit statistics, with values greater than 1.40 identifying items that misfit the model.18 For both models, misfitting items were dropped from the item set and the analyses repeated until the item set contained no misfitting items. The measurement quality of the scales was evaluated by examining separation (the extent to which the items differentiate among persons with different levels of disability) and its corresponding reliability. Separation values greater than 2.00 indicated that the items were able to differentiate persons into at least 3 levels (with a corresponding reliability of .80, which is interpreted the same as Cronbach α), and values less than 1.50 (with a corresponding reliability of .70) indicated that the items were unable to differentiate persons into more than 1 level.18

Selection of items for short forms. To facilitate the use of the disability-level short forms, it was decided that the method used to select items for each form had to be the same across levels and scales. When it became apparent that the only approach in which all items fit the model was the Rasch approach, we selected items based on their difficulty level provided by the Rasch analysis. In this approach, the average location was used to rank the items, and natural breaks in the locations were used to identify items for specific short forms.
Items with the lowest average locations were designated as the items for the short form designed for patients with minimal
Table 1: Sample Characteristics (n=1532)

Characteristics                      Values
Age                                  54.21±9.41
Sex (female)                         75%
Marital status (married)             71%
Time since diagnosis                 7.6% <5y
                                     20.6% 5–9y
                                     44.2% 10–19y
                                     20.8% 20–29y
                                     6.8% >30y
PDDS                                 3.38±2.39
PS
  Mobility                           2.71±1.99
  Hand function                      1.47±1.25
  Vision                             1.09±1.08
  Fatigue                            2.50±1.37
  Cognition                          1.47±1.22
  Bowel and bladder                  1.66±1.27
  Sensory                            1.65±1.24
  Spasticity                         1.65±1.29
  Pain                               1.64±1.38
SF-12 physical component score       37.65±11.67
SF-12 mental component score         48.42±10.90

NOTE. Values are mean ± SD or as otherwise indicated. Abbreviation: SF-12, Medical Outcomes Study Short-Form 12-Item Health Survey.
limitations, items with the highest average locations were designated as the items for the short form designed for patients with severe limitations, and items in the middle range of average locations (overlap with highest and lowest when necessary) were designated as the items for the short form designed for patients with moderate limitations. Because item locations and estimates were on the same scale, it was possible to use the range of item locations within each level to recommend the appropriate disability-level short form for patients choosing each category in the corresponding PS scale.

RESULTS

Sample
Table 1 summarizes the demographic and disease-related characteristics of the overall sample, and supplemental figures 1 and 2, available online only at the Archives website: www.archives-pmr.org, provide histograms of the distributions for the PS items and PDDS scores. This sample is representative of a broad range of disability and thus appropriate for testing the psychometric characteristics of disability-specific short forms, where the full range of the underlying trait of interest (ie, disability) is represented. A comparison of our sample’s demographics to the full NARCOMS sample exemplified in the work by Marrie et al19 suggests that our sample is similar to the full NARCOMS sample in terms of sex and level of disability, but is slightly older (our mean age, 54.2y as compared with 52.5y in the Marrie sample; P<.0001) and received a diagnosis at a slightly older age (our mean age at diagnosis, 39.1y as compared with 37.9y in the Marrie sample; P<.0001) (see supplemental table 1, available online only at the Archives website: www.archives-pmr.org).

Descriptive Analyses
With the use of the original factor structure, there were no out-of-range responses or categories with sparse data. No inversion in category means was found in 3 scales (left hemisphere, right hemisphere, and nonlocalized subscales) but there was marginal evidence that 1 vision, 2 brainstem/cerebellar, and 2 spinal cord items contained category inversions. When examined in the revised factor structure, 4 of 5 of the items with category inversions measured use of assistive mobility devices.

Dimensionality
Examination of the dimensionality using the original factor structure showed single dimensions in 3 scales (vision, right hemisphere, and nonlocalized) but multiple dimensions in the remaining 3 scales: 3 factors in left hemisphere, 2 factors in brainstem/cerebellar, and 4 factors in spinal cord items. Since these results did not support the unidimensionality of the existing SI scales, an exploratory factor analysis of all 99 items was conducted, and the number of factors was confirmed using parallel (ie, bootstrapped) analysis. These analyses revealed 14 factors with eigenvalues greater than 1.0. On examination of the items loading on each factor, it became clear that the items could be classified using the structure of the PS (see table 2 for the mapping of items onto the revised factor structure).

Table 2: Mapping Between PS-Based Structure of SI and Original Subscales*
(Where 2 values are separated by a slash, the first is the original result and the second is the result after deletion.)

| New SI Subscale Based on PS Structure | Original SI Subscale From Which Items Were Drawn (N Items) | Reliability: α Reliability | Reliability: Corrected Item-Total Correlations (Range) | Exploratory Factor Analysis: No. of Factors | Exploratory Factor Analysis: No. of Items | Confirmatory Factor Analysis: CFI | Confirmatory Factor Analysis: RMSEA | Confirmatory Factor Analysis: Locally Dependent Item Pairs |
| Mobility | Brainstem & cerebellar (6); Spinal cord (12) | .95 | .31–.87 | 2 / 1 | 18 / 14 | .979 / .979 | .130 / .130 | 10 pairs / None |
| Hand function | Left hemisphere (4); Right hemisphere (2) | .89 | .65–.77 | 1 | 6 | .967 | .298 | 6 pairs |
| Vision | Vision (8); Brainstem & cerebellar (2) | .88 | .50–.70 | 1 / 1 | 10 / 8 | .949 / .989 | .131 / .057 | 12 pairs / None |
| Fatigue | Brainstem & cerebellar (2); Spinal cord (3); Nonlocalized (4) | .93 | .52–.85 | 1 / 1 | 12 / 9 | .984 / .984 | .190 / .231 | 4 pairs / None |
| Cognitive | Left hemisphere (25); Right hemisphere (3) | .95 | .42–.83 | 2 / 1 | 15 / 13 | .971 / .974 | .089 / .096 | None (2 poorly loading) / None |
| Bowel and bladder | Spinal cord (5) | .86 | .50–.81 | 1 | 4 | .988 | .188 | None |
| Sensory | Left hemisphere (3); Right hemisphere (3); Brainstem & cerebellar (6); Spinal cord (10); Nonlocalized (2) | .91 | .32–.70 | 5 / 1 (Sensory) | 24 / 9 | .910 / .920 | .145 / .155 | 2 pairs / 1 pair marginal |
| Sensory: Vasomotor (after deletion) | | | | 1 | 3 | >.999 | .000 | None |
| Spasticity | Spinal cord (5) | .84 | .57–.74 | 1 | 5 | .945 | .280 | 3 pairs |
| Pain | Spinal cord (3) | .77 | .53–.69 | 1 | 3 | >.999 | .000 | None |
| Unclassified | Left hemisphere item related to speech (1) | NA | NA | NA | NA | NA | NA | NA |

Abbreviation: NA, not applicable. *For domains in which multiple factors were identified or the confirmatory model was not satisfactory, exploratory and confirmatory factor analyses were rerun after deleting items. If not, the cells reflecting these statistics are not applicable.
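The dimensionality criteria stated in Methods (CFI greater than .90 and RMSEA less than .08 for acceptable fit, CFI greater than .95 and RMSEA less than .06 for good fit, and residual correlations less than .20 for local independence) can be expressed as simple checks. The sketch below is illustrative only: the function names and labels are hypothetical, the use of the absolute value of the residual correlation is an assumption, and the residual correlations in the example are invented. The CFI and RMSEA in the example are the table 2 values for vision after deletion.

```python
def label_cfi(cfi):
    """Label a comparative fit index using the thresholds given in Methods."""
    if cfi > 0.95:
        return "good"
    if cfi > 0.90:
        return "acceptable"
    return "below criterion"


def label_rmsea(rmsea):
    """Label an RMSEA using the thresholds given in Methods."""
    if rmsea < 0.06:
        return "good"
    if rmsea < 0.08:
        return "acceptable"
    return "above criterion"


def flag_local_dependence(residual_correlations, cutoff=0.20):
    """Return item pairs whose residual correlation reaches the cutoff.

    Assumption: the magnitude (absolute value) of the correlation is compared.
    """
    return [pair for pair, r in residual_correlations.items() if abs(r) >= cutoff]


# Vision scale after deletion, as reported in table 2.
print(label_cfi(0.989), label_rmsea(0.057))

# Invented residual correlations for three hypothetical items.
residuals = {("item_1", "item_2"): 0.25, ("item_1", "item_3"): -0.05}
print(flag_local_dependence(residuals))
```

Keeping the two fit indices as separate labels avoids building in an assumption about how they are weighed against each other when they disagree.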
The results of the exploratory factor analyses conducted on each new scale are presented in table 2. Single factors were identified for hand function, vision, fatigue, cognitive, bowel/bladder, spasticity, and pain. Mobility and sensory subscales were not unidimensional until each was split into 2 subscales: mobility and use of assistive devices, and sensory (including neuropathic pain) and vasomotor, respectively. For mobility, 4 items measuring use of mobility assistive devices loaded on a second factor, although the first factor accounted for 79% of the variance. For sensory, after consultation with the clinical neurologist on the project (T.V.), 2 scales were created: one (consisting of 10 items) to measure sensory symptoms and the other (consisting of 3 items) to measure vasomotor symptoms. Exploratory factor analysis was conducted on the sensory scale before and after deleting a poorly loading item. After deletion, a single factor was identified for the items in both the sensory and vasomotor scales. The confirmatory factor analyses revealed that the items in each scale were unidimensional. The confirmatory factor analyses were also used to identify locally dependent items that should not be included in the same short form (see table 2). For some scales (bowel and bladder, pain, vasomotor), none of the
items were locally dependent. In others (vision, fatigue, cognitive), one of the pairs of locally dependent items was deleted, and in still others (mobility, hand function, spasticity), where the items represented different levels of disability, the items were retained because they could be placed in different short forms.

IRT Analysis
Two IRT analyses were conducted on the unidimensional SI scales. The results of both analyses are presented in table 3. With the use of the 2-PL model, which parameterizes discrimination using slopes, only half of the scales fit the model. However, items that misfit the 2-PL model fit the Rasch model, which does not parameterize discrimination. Note that in order to obtain model fit, the mobility items dealing with use of assistive devices were dichotomized. No other rescoring was necessary in this or the other scales. To confirm that the use of the less restrictive Rasch model did not result in different estimates of a person’s disability, scores from each approach were compared. Since the 2 methods anchor their scales differently, the resulting scores were not expected to be the same. However, their correlations across scales ranged from .956 for fatigue to .981 for mobility, indicating the 2 estimates ranked the respondents in essentially the same order.

Short-Form Item Selection
With the use of the results of the Rasch analysis, items were selected for disability-level short forms. For scales that had a sufficient number of items and that sufficiently differentiated people at various levels of the trait (mobility, vision, fatigue, cognition, sensory), item locations (average thresholds) were used to select items for the short forms that would be administered based on responses to the screener.
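The selection rule described in Methods (rank items by average location, then use natural breaks in the locations to assign items to forms) can be sketched as follows. The article does not state how natural breaks were identified, so this sketch assumes they are the largest gaps between adjacent ranked locations. The function name, item names, and locations are hypothetical, and the overlap allowed for the moderate-disability form is not modeled.

```python
def split_by_natural_breaks(locations, n_groups=3):
    """Rank items by location and cut at the (n_groups - 1) largest gaps.

    locations -- dict mapping item name to its average location (logits)
    Returns a list of item-name lists ordered from lowest to highest location.
    """
    ranked = sorted(locations.items(), key=lambda pair: pair[1])
    if n_groups < 1 or n_groups > len(ranked):
        raise ValueError("n_groups must be between 1 and the number of items")
    # Gap between each item and the next one in the ranking, with its position.
    gaps = [(ranked[i + 1][1] - ranked[i][1], i) for i in range(len(ranked) - 1)]
    # Positions after which to cut: the largest gaps, restored to rank order.
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[: n_groups - 1])
    groups, start = [], 0
    for i in cut_after:
        groups.append([name for name, _ in ranked[start : i + 1]])
        start = i + 1
    groups.append([name for name, _ in ranked[start:]])
    return groups


# Invented item locations for one scale, in logits.
example_locations = {
    "item_a": -1.8, "item_b": -1.5, "item_c": -1.4,
    "item_d": -0.2, "item_e": 0.0, "item_f": 0.3,
    "item_g": 1.6, "item_h": 1.9,
}
example_groups = split_by_natural_breaks(example_locations)
for label, items in zip(["mild", "moderate", "severe"], example_groups):
    print(label, items)
```

In this invented example the two largest gaps fall after item_c and after item_f, so the lowest-location items form the mild form and the highest-location items form the severe form.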
For scales that either contained too few items or did not sufficiently differentiate people at various levels to justify more than 1 scale across disability level (hand function, bowel and bladder, spasticity, pain, vasomotor), the same items would be administered to all. The number of items within each level ranges from 3 to 7, and
the extent of overlap in the moderate-disability form was a function of the number of items available.

Scoring
The SI short forms are scored using T scores, with a range of 0 to 100, a mean of 50, and an SD of 10. A proposed scoring algorithm and disability-level short-form items for each scale, along with the corresponding PS scores, are available by contacting the lead author (C.E.S.).

DISCUSSION
This study provides empirical support for a 10-scale symptom measure for use in MS clinical research, with short forms for half of the scales tailored for people with mild, moderate, and severe disability and short forms consisting of the same items administered for the other half of the scales. These short forms would reduce the respondent burden while retaining high levels of precision and reliability. By mapping these scales to the PS measures of MS disability, 2 goals are achieved. First, the PS serves as a screener tool for the selection of the appropriate SI short-form (SI-SF). The PS can thus be used alone or in conjunction with the SI-SFs, depending on the nature of information desired. Second, the information provided by the SI-SFs is complementary to the PS, providing more in-depth information about the specific nature of the patient’s symptom experience.

Our findings did not confirm the original factor structure that was published in 1999.3 This unexpected result reflects the different analytic methods used. In the 1999 article, we did not use factor analysis to create the subscales, but rather we tested the internal consistency of subscales that were defined by presumed central nervous system localization of the symptoms. The internal consistency of the subscales was high, as it was in our analyses of the original subscales in the present sample. In the 1999 work, logistic regression modeling was used to create a 29-item short form based on each subscale’s ability to explain substantial variance in discriminating known groups based on EDSS scores.
In the present work, a more psychometrically specific variation of logistic modeling was used (ie, IRT modeling) to generate
Table 3: IRT Calibrations of SI Subscales

| Subscale | 2-PL Model, Fit to the 2-PL Model: No. of Misfitting Items | 2-PL Model, Fit to the 2-PL Model: Range of Item Fit Statistics | Rasch Model, Ability to Distinguish Levels of Disability: Separation (No. of Distinct Levels) | Rasch Model, Ability to Distinguish Levels of Disability: Corresponding Reliability (Interpreted Same as Cronbach α) | Rasch Model, Fit to the Rasch Model: No. of Misfitting Items | Rasch Model, Fit to the Rasch Model: Range of Item Fit Statistics |
| Acceptable statistic | | P<.001 | 2.00 (3) | .80 | | 0.7–1.4 |
| Mobility* | 0 | <.001–.965 | 3.41 (3+) | .92 | 0 | 0.65–1.40 |
| Hand function | 4 | <.001–.233 | 1.92 (~3) | .79 | 0 | 0.84–1.23 |
| Vision | 0 | .009–.966 | 1.49 (~2) | .69 | 1 marginal | 0.80–1.44 |
| Fatigue | 5 | <.001–.234 | 2.71 (3+) | .88 | 1 marginal | 0.63–1.50 |
| Cognitive | 0 | .010–.997 | 2.94 (3+) | .90 | 1 marginal | 0.60–1.42 |
| Bowel and bladder | 2 | <.001–.049 | 1.93 (~3) | .79 | 1 marginal | 0.58–1.41 |
| Sensory | 0 | .001–.560 | 1.74 (2) | .75 | 0 | 0.72–1.28 |
| Spasticity | 0 | .009–.373 | 1.57 (~2) | .71 | 0 | 0.76–1.25 |
| Pain | 2 | <.001–.103 | 0.99 (1) | .50 | 0 | 0.85–1.30 |
| Vasomotor | 2 | <.001–.205 | 1.10 (1) | .55 | 0 | 0.85–1.34 |

*In order to obtain model fit, the mobility items dealing with use of assistive devices were dichotomized. No other rescoring was necessary in this or the other scales.
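The separation and reliability columns of table 3 are two expressions of the same quantity. In Rasch measurement, separation G and reliability R are related by R = G²/(1 + G²), which gives .80 at a separation of 2.00 as stated in Methods. The sketch below applies that standard relation to several separation values from table 3; the function names are hypothetical, and the tabled reliabilities are matched only to within rounding of the published separations.

```python
import math


def reliability_from_separation(separation):
    """Reliability implied by a Rasch separation index: G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def separation_from_reliability(reliability):
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# Separation values and reported reliabilities taken from table 3.
table_3_pairs = [(2.00, 0.80), (3.41, 0.92), (1.49, 0.69), (1.74, 0.75), (1.10, 0.55)]
for separation, reported in table_3_pairs:
    implied = reliability_from_separation(separation)
    print(f"separation {separation:.2f}: implied {implied:.3f}, reported {reported:.2f}")
```

Because the relation is monotonic, either index can serve as the criterion; reliability is the one read on the familiar 0 to 1 metric.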
item calibrations that guided the selection of short forms based on the items’ ability to distinguish severity levels of the measured trait. This IRT modeling was done after the requisite factor analysis to confirm the unidimensionality of each scale, a necessary prerequisite for IRT models. The increased precision of the IRT analytic methods has resulted in single short forms with 21 items for scales assessing hand function, bowel/bladder, spasticity, pain, and vasomotor symptoms. The short-form variations focus on discriminating levels of disability in mobility, vision, fatigue, cognition, and sensory symptoms, and can range from 40 to 44 items if all scales are used within a level of disability. In this study, we produced a calibrated item “bank” for each scale using an IRT approach. An advantage of using this approach is that scores on the resulting short forms are on the same scale and can be compared even if different short forms of the scale are administered over time. A unique application of the new short forms is that they can be used individually or in conjunction with each other, depending on the desired application. For example, if one is using the tools for clinical monitoring and there is no change in reported
scores on some PS items, then the SI-SF might not be used in its entirety. The clinician might request more information only on the SI-SF scale that evidenced change in the PS item. Further, one could use different variants of the SI-SF scales, depending on the level of disability within a scale. For example, one might opt for the moderate-disability cognition and fatigue SI-SFs but the mild mobility SI-SF. The selection would be based on the level of disability indicated by the PS screener.

A Conceptual “Blueprint” for Understanding Measurement Models
Symptom experience is a critical component of the patient’s daily experience because of direct effects on functional health (eg, role performance), direct medical expenditures (eg, health care utilization), and indirect costs (eg, lost productivity). The disability-specific short forms of the SI may be useful measurement tools in a number of contexts, such as for recently diagnosed patients whose disability is less apparent to the observer. The tool may also be useful in clinical research and practice focused on testing whether an intervention improves
[Figure 1 schematic. Labels: Characteristics of the Individual (Symptom Amplification, Goal Attainment Motivation, Value Preferences); Appraisal; Clinical Indicators (EDSS, MSFC); Symptom Experience; Evaluative Functional Health; Quality of Life (General Health and Well-Being); Symptom Impact; Formative Measurement Model; Reflective Measurement Model.]
Fig 1. A conceptual “blueprint” for understanding measurement models. Our conceptual framework builds on Wilson and Cleary’s22 seminal article that elucidated the relationship between different types of outcome measures in understanding clinical indicators. Consistent with the ICF,23-26 the model posits causal linkages between biological or physiologic variables, symptoms, functioning, general health perceptions, and overall quality of life.22 Two important distinctions are made: A first distinction is that quality of life is not entirely different from biological parameters but can be placed on a continuum from least to most subjective. A second distinction is that symptom experience and symptom impact are qualitatively, theoretically, and statistically different. Abbreviation: MSFC, MS Functional Composite.
Arch Phys Med Rehabil Vol 93, September 2012
SYMPTOM INVENTORY SHORT FORMS, Schwartz
symptom experience, as compared with preventing deterioration. It is important to note, however, that the measurement of symptom experience is distinct from measuring symptom impact or overall quality of life. We believe that the SI represents a useful clinimetric20,21 tool because it captures symptom experience in a way that can be useful for clinicians and clinical researchers. Our conceptual framework builds on Wilson and Cleary’s22 seminal article that elucidated the relationship between different types of outcome measures in understanding clinical indicators. Consistent with the International Classification of Functioning, Disability and Health (ICF),23-26 the model posits causal linkages between biological or physiologic variables, symptoms, functioning, general health perceptions, and overall quality of life.22 Figure 1 presents a schematic of the Wilson and Cleary22 model that integrates key distinctions relevant to symptom research. A first distinction is that quality of life is not entirely different from biological parameters but can be placed on a continuum from least to most subjective. The outcomes in this model include clinical indicators (eg, EDSS, MS Functional Composite), symptom experience, evaluative functional health, and quality of life. Evaluative functional health refers to health-related outcomes that require the individual to evaluate the subjective impact of symptoms or disability. This would be comparable to “function” in the ICF terminology. Assessments focus on the evaluated difficulty or ease in performing specific tasks or functions, and can relate to physical, social, emotional, and role functioning measured by disease-specific or cross-disease (generic) measures. Quality of life is a broader construct, reflected by general health perceptions and existential or eudemonic well-being.27 These more global measures take account of the weights or values patients attach to different symptoms or functional impairments. 
Overall quality of life may be strongly influenced by factors such as an individual’s economic and employment status, family situation, or the political environment.22

A second distinction is that symptom experience and symptom impact are qualitatively, theoretically, and statistically different. The statistical and psychometric literature suggests that symptom measures are “causal indicators” (that is, they measure something that causes change in functional health outcomes), whereas most evaluative PRO measures are “effect indicators” (that is, they measure something that reflects changes in functional health outcomes).28-31 Figure 1 illustrates how this causal-effect indicator distinction relates to the measurement model of PRO data: causal indicators fit a formative measurement model (ie, measures or indicators x1, x2, and x3 cause changes in the latent variable ξ1). In contrast, effect indicators fit a reflective measurement model (ie, measures or indicators x4, x5, and x6 reflect changes in the latent variable ξ2). This distinction should also be considered in the statistical modeling of PRO data by specifying different structural equation models. Further refinement of the SI should focus on understanding the relationship between symptom change and other outcome measures in order to better treat symptoms.

Study Limitations

The limitations of the present work should be noted. First, the content validity of the SI disability-specific short forms could be improved. For example, the item bank would be enhanced by the addition of low-level hand function symptoms, such as those that would prevent one from being able to feed oneself. While the items included are unidimensional and fit the measurement model, the range of difficulty of the tasks assessed by the current items is relatively narrow. Further development of the measure might add items that specifically measure low-level hand function, but care would need to be taken to ensure that the new items are unidimensional with the other hand function items. A second limitation is that the present work does not address psychometric characteristics related to longitudinal change, because such analyses are beyond the scope of the cross-sectional data used here. Our companion article32 addresses the stability of the SI-SFs, their responsiveness to clinical change, and interpretation guidelines for the SI-SFs (eg, defining a minimal clinically important difference).

CONCLUSIONS

PRO measures are increasingly recognized as important in clinical research, with recent trials of disease-modifying agents using PRO measures as tertiary endpoints.33 This article reports on the further development of a PRO measure for use in assessing symptom experience in MS, which the companion article32 demonstrates is a distinct construct from evaluative functional health outcomes, such as quality of life. Disability-specific short forms of the SI can be used in conjunction with the PS as a screener, with SI-SF scales used individually, in groups of several at the same or different levels of disability, or as all 10 scales, depending on the desired information or application of the tool. These short forms thus provide a level of choice and flexibility that is similar to a computerized adaptive test34 but without the reliance on real-time computer infrastructure. Measuring symptom experience may be useful in understanding the short- and long-term effects of disease-modifying and rehabilitation treatments. The SI disability-specific short forms may thus be a useful addition to the available clinical research tools.
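The Discussion states that, because the short forms are drawn from an IRT-calibrated item bank, scores remain comparable even when different short forms of a scale are administered. The sketch below illustrates that property only; it is not the authors' calibration or scoring procedure. It assumes a graded response model (the reference list cites Samejima) and expected a posteriori scoring with a standard normal prior, and every item name, discrimination, and threshold in `BANK`, as well as the two form definitions, is hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each response category.

    a          -- item discrimination (hypothetical value, must be positive)
    thresholds -- increasing category boundaries b_1 < ... < b_{K-1}
    """
    # Cumulative probability of responding in category k or higher.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def eap_score(responses, items, grid_step=0.05):
    """Expected a posteriori trait estimate under a standard normal prior."""
    n_points = int(round(8.0 / grid_step)) + 1
    grid = [-4.0 + i * grid_step for i in range(n_points)]
    num = den = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for resp, (a, thresholds) in zip(responses, items):
            weight *= grm_category_probs(theta, a, thresholds)[resp]
        num += theta * weight
        den += weight
    return num / den


# Hypothetical calibrated bank for one scale: name -> (discrimination, thresholds).
# Higher theta means more severe symptoms; responses are coded 0 to 3.
BANK = {
    "easy_1": (1.8, [-2.0, -1.2, -0.4]),
    "easy_2": (1.5, [-1.6, -0.8, 0.0]),
    "mid_1": (2.0, [-0.8, 0.0, 0.8]),
    "hard_1": (1.7, [0.2, 1.0, 1.8]),
    "hard_2": (1.9, [0.6, 1.4, 2.2]),
}
# Hypothetical short forms targeting different severity ranges.
MILD_FORM = ["easy_1", "easy_2", "mid_1"]
SEVERE_FORM = ["mid_1", "hard_1", "hard_2"]


def score_form(form, answers):
    """Score one short form; the result is on the bank's common theta scale."""
    return eap_score(answers, [BANK[name] for name in form])


if __name__ == "__main__":
    print("mild form, low responses:   ", round(score_form(MILD_FORM, [0, 1, 0]), 2))
    print("severe form, high responses:", round(score_form(SEVERE_FORM, [2, 3, 3]), 2))
```

Because both forms use item parameters from the same bank, the two estimates are expressed on one trait scale, which is what would allow a patient to move from one disability-specific form to another over time without losing comparability.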
Acknowledgments: We thank Gary Cutter, PhD, and Stacey Cofield, PhD, for data management services early in the project, and Brian Quaranto, BS, for data analysis services later in the project; and Margaret Nosek, PhD, Ruth Ann Marrie, MD, PhD, and Robert Fox, MD, for helpful comments on an earlier draft of this article.

References

1. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an Expanded Disability Status Scale (EDSS). Neurology 1983;33:1444-52.
2. Amato MP, Portaccio E. Clinical outcomes measures in multiple sclerosis. J Neurol Sci 2007;259:118-22.
3. Schwartz CE, Vollmer T, Lee H. Reliability and validity of two self-report measures of impairment and disability for MS. North American Research Consortium on Multiple Sclerosis Outcomes Study Group. Neurology 1999;52:63-70.
4. Food and Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. Washington (DC): US Dept of Health and Human Services; 2009.
5. Wells G, Beaton D, Shea B, et al. Minimal clinically important differences: review of methods. J Rheumatol 2001;28:406-12.
6. Sprangers MAG, Moinpour CM, Moynihan TJ, Patrick DL, Revicki DA; Clinical Significance Consensus Meeting Group. Assessing meaningful change in quality of life over time: a user’s guide for clinicians. Mayo Clin Proc 2002;77:561-71.
7. Embretson SE, Reise SP. Item response theory for psychologists. Mahwah: Lawrence Erlbaum; 2000.
8. Marrie RA, Goldman M. Validity of performance scales for disability assessment in multiple sclerosis. Mult Scler 2007;13:1176-82.
9. Motl RW, Schwartz CE, Vollmer T. Continued validation of the Symptom Inventory in multiple sclerosis. J Neurol Sci 2009;285:134-6.
10. Hohol MJ, Orav EJ, Weiner HL. Disease steps in multiple sclerosis: a simple approach to evaluate disease progression. Neurology 1995;45:251-5.
11. Ware JE Jr, Kosinski M, Keller SD. A 12-item Short-Form Health Survey. Med Care 1996;34:220-33.
12. Hu L-T, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling 1999;6:1-55.
13. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing structural equation models. Newbury Park: Sage Publications; 1993. p 136-62.
14. Marsh HW, Balla JR, McDonald RP. Goodness-of-fit indexes in confirmatory factor analysis: the effect of sample size. Psychol Bull 1988;103:391-410.
15. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Suppl 1969;34(4 Part 2):100.
16. Orlando M, Thissen D. Further examination of the performance of S-X2, an item fit index for dichotomous item response theory models. Appl Psychol Meas 2003;27:289-98.
17. Wright BD, Masters GN. Rating scale analysis: Rasch measurement. Chicago: MESA Pr; 1982.
18. Bond TG, Fox CM. Applying the Rasch model. Mahwah: Lawrence Erlbaum; 2001.
19. Marrie RA, Cutter G, Tyry T, Vollmer T, Campagnolo D. Disparities in the management of multiple sclerosis-related bladder symptoms. Neurology 2007;68:1971-8.
20. Feinstein AR. Multi-item “instruments” vs Virginia Apgar’s principles of clinimetrics. Arch Intern Med 1999;159:125-8.
21. Feinstein AR. An additional basic science for clinical medicine: IV. The development of clinimetrics. Ann Intern Med 1983;99:843-8.
22. Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA 1995;273:59-65.
23. Stucki G. International classification of functioning, disability, and health (ICF): a promising framework and classification for rehabilitation medicine. Am J Phys Med Rehabil 2005;84:733-40.
24. Holper L, Coenen M, Weise A, Stucki G, Cieza A, Kesselring J. Characterization of functioning in multiple sclerosis using the ICF. J Neurol 2010;257:103-13.
25. Stucki G, Cieza A, Melvin J. The international classification of functioning, disability, and health: a unifying model for the conceptual description of rehabilitation strategy. J Rehabil Med 2007;39:279-85.
26. Stucki G, Melvin J. The international classification of functioning, disability and health: a unifying model for the conceptual description of physical medicine and rehabilitation. J Rehabil Med 2007;39:286-92.
27. Smith JA. The idea of health: a philosophical inquiry. ANS Adv Nurs Sci 1981;3:43-50.
28. Fayers PM. Quality-of-life measurement in clinical trials: the impact of causal variables. J Biopharm Stat 2004;14:155-76.
29. Fayers PM, Hand DJ. Factor analysis, causal indicators, and quality of life. Qual Life Res 1997;6:139-50.
30. Fayers PM, Hand DJ. Causal variables, indicator variables and measurement scales: an example from quality of life. J R Stat Soc 2002;165:233-61.
31. Fayers PM, Hand DJ, Bjordal K, Groenvold M. Causal indicators in quality of life research. Qual Life Res 1997;6:393-406.
32. Schwartz CE, Bode RK, Quaranto BR, Vollmer T. The Symptom Inventory disability-specific short forms for multiple sclerosis: construct validity, responsiveness, and interpretation. Arch Phys Med Rehabil 2012;93:1617-28.
33. Miller D, Rudick RA, Hutchinson M. Patient-centered outcomes: translating clinical efficacy into benefits on health-related quality of life. Neurology 2010;74(Suppl 3):S24-35.
34. Wainer H, Dorans N, Flaugher R, et al. Computerized adaptive testing: a primer. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates; 2000.

Suppliers

a. SPSS version 17; SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.
b. Mplus version 5.2; Muthén & Muthén, 3463 Stoner Ave, Los Angeles, CA 90066.
c. Multilog version 7.0; Scientific Software International Inc, 7383 N Lincoln Ave, Ste 100, Lincolnwood, IL 60712-1747.
d. Winsteps version 3.69; KAGI, 1442-A Walnut St, PMB #392, Berkeley, CA 94709-1405. www.winsteps.com.
Supplemental Fig 1. Histograms of PS items.
Supplemental Fig 2. Histogram of PDDS scores.
Supplemental Table 1: A Comparison of Sample Characteristics of Overall NARCOMS Registry Participants and Our Sample

                                   NARCOMS Overall (n=9688)             Our Sample (n=1532)
Variable                           n        Mean/%/Median   SD/IQR      n       Mean/%/Median   SD/IQR
Age (mean) (y)                     16,858   52.5            10.6        1532    54.2            9.4*
Age at diagnosis (mean) (y)        16,858   37.9            9.7         1532    39.1            9.6*
Sex (% female)                     16,858   75.3                        1532    75
PDDS (median)                      16,858   3               5           1532    4               4
Performance Scales Items’ Medians
  Mobility                         16,858   3               4           1532    3               3
  Hand                             16,858   1               1           1532    1               1
  Vision                           16,858   1               2           1532    1               2
  Fatigue                          16,858   3               2           1532    3               3
  Cognition                        16,858   1               2           1532    1               1
  Bladder                          16,858   2               2           1532    1               2
  Sensory                          16,858   1               2           1532    1               2

Abbreviation: IQR, interquartile range. *P<.0001 in comparing the NARCOMS overall sample and our subsample.
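As a rough plausibility check on the P<.0001 footnote of Supplemental Table 1, the reported means and SDs can be compared from summary statistics alone. The sketch below is an illustration, not the authors' analysis: the table does not state which test was used, so an unequal-variance (Welch) statistic with a normal approximation is assumed here. It also treats the two groups as independent even though the study sample is drawn from the registry, and it uses the registry n from the table's n column (16,858).

```python
import math


def welch_z(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) statistic for a difference in two means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


def two_sided_p_normal(z):
    """Two-sided P value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values as read from Supplemental Table 1 (registry first, study sample second).
age = welch_z(52.5, 10.6, 16858, 54.2, 9.4, 1532)
age_dx = welch_z(37.9, 9.7, 16858, 39.1, 9.6, 1532)

for label, z in (("Age", age), ("Age at diagnosis", age_dx)):
    below = two_sided_p_normal(z) < 1e-4
    print(f"{label}: z = {z:.2f}, P below .0001: {below}")
```

With samples this large, the normal approximation to the Welch statistic is adequate for a check of this kind; a formal comparison would need the raw data and the authors' actual test.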