Sleep Medicine 10 (2009) 531–539 www.elsevier.com/locate/sleep
Original Article
Psychometric evaluation and tests of validity of the Medical Outcomes Study 12-item Sleep Scale (MOS sleep)
Richard P. Allen a,*, Mark Kosinski b, Christina E. Hill-Zabala c, Michael O. Calloway c
a Department of Neurology, Johns Hopkins University, Asthma and Allergy Building 1B46b, 5501 Hopkins Bayview Circle, Baltimore, MD 21224, USA
b QualityMetric Inc., 640 George Washington Highway, Suite 201, Lincoln, RI 02865, USA
c GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA
Received 20 January 2008; received in revised form 2 June 2008; accepted 16 June 2008
Available online 19 September 2008
Abstract
Objective: To validate the psychometric properties of the Medical Outcomes Study (MOS) Sleep Scale in subjects with restless legs syndrome (RLS).
Methods: Data from a clinical trial program involving two Phase III, double-blind, placebo-controlled trials of ropinirole in subjects with moderate-to-severe primary RLS were analyzed. Subjects were assessed on the MOS Sleep Scale at baseline and at Weeks 8 and 12.
Results: The baseline validation population included 551 subjects, of whom 523 had full longitudinal data. Psychometric assessment of four MOS sleep domains revealed satisfactory item convergent validity (r ≥ 0.40) for most items. All domain items in both trials surpassed the standard for item discriminant validity, with no significant floor or ceiling effects. The MOS sleep domain scores showed good internal consistency reliability. The criterion for concurrent validity (r = 0.40) was exceeded in the correlations between the RLS overall quality-of-life score and the sleep problems index II. The clinical validity of the MOS Sleep Scale was demonstrated against self-reported RLS symptoms and clinician-determined severity, and changes in MOS Sleep Scale scores were responsive to improvements in RLS severity, as measured by the Clinical Global Impression-Improvement and Severity-of-Illness scales.
Conclusion: The MOS Sleep Scale is a reliable, valid tool for assessing changes in the sleep of subjects with moderate-to-severe primary RLS. The somnolence domain failed to relate to the clinical severity of RLS, indicating a possible sleep–wake relationship unique to RLS. Use of this scale to evaluate other conditions causing sleep disturbance is supported.
© 2008 Elsevier B.V. All rights reserved.
Keywords: Restless legs syndrome; Ropinirole; Sleep; Sleep disturbance; Validity tests; MOS sleep; Psychometrics of MOS sleep; Validation MOS sleep; Somnolence; Hyperarousal
* Corresponding author. Tel.: +1 410 550 2609; fax: +1 410 550 3364. E-mail address: [email protected] (R.P. Allen).
1389-9457/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.sleep.2008.06.003
1. Introduction
1.1. Sleep and the restless legs syndrome
The chronic neurological disorder restless legs syndrome (RLS) disrupts sleep to the extent that sleep disturbance is considered one of the associated clinical features of the disorder [1]. Sleep disturbance is associated with much of the primary morbidity of the disorder [2] and is a primary complaint of almost 90% of patients with clinically significant RLS [3]. Thus, characterizing and evaluating RLS should include assessments of sleep problems, and evaluations of treatment efficacy should not be considered complete without addressing changes in sleep patterns. Sleep problems in RLS can be evaluated and characterized using objective polysomnographic assessments (although these are neither the standard of care nor required for diagnosis) and subjective assessments capturing the patient’s perceptions of sleep. Objective assessments based on sleep laboratory studies have characterized the sleep of RLS patients as involving excessive periodic limb movements, reduced sleep efficiency, and reduced total sleep times [4,5]. However, an assessment tool that characterizes the subjective experience of sleep relevant to patients with RLS has not been developed.
1.2. MOS Sleep Scale
The Medical Outcomes Study (MOS) Sleep Scale is a patient-reported, non-disease-specific instrument for evaluating sleep outcomes. Consisting of 12 items, the MOS Sleep Scale measures subjective experiences of sleep across several domains, and may therefore be relevant for evaluating RLS-related sleep problems. Development of the MOS Sleep Scale was based on an extensive review of the published literature on sleep-related problems and sleep disorders, which identified sleep domains representing common areas of agreement across all the classifications of sleep disorders [6]. The scale was designed to include items that measure all of the characteristics of sleep identified across different sleep-related diseases or syndromes.
The 12 items of the MOS Sleep Scale measure sleep parameters across six domains: (1) sleep disturbance, which measures the ability to fall asleep and maintain restful sleep (four items); (2) sleep adequacy, which measures whether sleep is sufficient to provide restoration on waking (two items); (3) sleep quantity, which measures the number of hours an individual has slept each night (one item); (4) somnolence, which measures daytime drowsiness or sleepiness (three items); (5) snoring (one item); and (6) shortness of breath or headache (one item).
The scale also includes two indices. The sleep problems index I summarizes sleep problems using an abbreviated six-item index, and the sleep problems index II uses nine of the 12 items to compute an overall sleep problem summary. Both indices contain questions from the sleep disturbance, sleep adequacy, respiratory impairment, and somnolence domains, and neither includes the question on sleep quantity. The three questions in index II that are not included in index I concern time to fall asleep, feeling that sleep was quiet, and feeling drowsy or sleepy during the day.
Ten of the scale’s 12 items are scored using a six-point response scale, one item uses a five-point Likert scale, and sleep quantity is an open-ended question recording the actual number of hours slept. All domains except sleep quantity are recalibrated on a 0–100 scale that represents the percentage of a particular sleep domain; sleep quantity is recorded as 0–24 h. Higher scores for the sleep disturbance and somnolence domains and for the sleep indices indicate worse sleep problems, whereas lower scores for sleep quantity and sleep adequacy indicate worse sleep problems. The recall period for the MOS Sleep Scale is the preceding 4 weeks.
1.3. Previous psychometric evaluation of the MOS Sleep Scale
Psychometric evaluation of the MOS Sleep Scale during its development [6] supported its use in the assessment of sleep problems (and changes in sleep problems) in both clinical and non-clinical populations. Subsequently, the scale has been used extensively in a variety of clinical populations, such as patients with cancer pain [7] or overactive bladder [8,9], and those on maintenance dialysis [10]. Most recently, the scale was psychometrically validated in the general US population [11] and in two groups of patients with neuropathic pain [11,12]. These studies all demonstrated the internal consistency and test–retest reliability of the scale scores, their content and construct validity, and the sensitivity of the scales in detecting differences in sleep-related problems. To date, however, no evidence has been presented on the psychometric properties of the MOS Sleep Scale in an RLS population.
1.4. Aim
A key factor in determining whether a specific patient-reported outcome instrument is appropriate is that it should be validated in the population of interest. This ensures that the measure is appropriate (sensitive, specific, and responsive) in that population. Therefore, the aim of this study was to evaluate the psychometric properties of the MOS Sleep Scale in subjects with moderate-to-severe primary RLS. The validation process, carried out as a secondary analysis of clinical trial data, included assessment of the reliability, validity (including construct, concurrent, and clinical validity), and responsiveness of the MOS Sleep Scale in two Phase III clinical trials of ropinirole in the treatment of RLS (Therapy with Ropinirole: Efficacy And Tolerability in RLS [TREAT RLS] 1 and 2) [13,14].
2. Methods
2.1. Data sources
The study designs for the data sources have been fully described elsewhere [13,14]. Briefly, TREAT RLS 1 and 2 were 12-week, multicenter, double-blind, placebo-controlled, Phase III studies of the efficacy and tolerability of ropinirole in patients with moderate-to-severe primary RLS. TREAT RLS 1 comprised 284 subjects and TREAT RLS 2 comprised 267 subjects, recruited from multiple sites in Europe, the USA, and Australia. Inclusion criteria were age 18–79 years; diagnosis of primary RLS using the International Restless Legs Syndrome Study Group (IRLSSG) diagnostic criteria [1]; a baseline total score of ≥15 on the International Restless Legs Scale (IRLS); and at least 15 nights of RLS symptoms during the previous month. Exclusion criteria were signs of secondary RLS (e.g., renal failure or iron deficiency); augmentation or rebound with previous treatment; daytime RLS symptoms requiring treatment; primary sleep disorders, movement disorders, or medical conditions that affect the assessment of RLS; use of medications known to affect RLS or sleep; or withdrawal, introduction, or dose change of medications known to inhibit or induce cytochrome P450 CYP1A2.
Subjects were assessed on the MOS Sleep Scale at baseline and at Weeks 8 and 12. The validation analyses used two study samples: a baseline validation sample that included all patients who had evaluable MOS Sleep Scale scores at baseline (i.e., <50% missing data, as specified by the developers of the questionnaire); and a longitudinal sample with full data (i.e., evaluable scores at baseline and at a post-baseline visit [Week 8 or 12]). The choice of statistics was dictated by the type of analysis, correlational or a difference of means. For tests of differences of means, non-parametric tests were used because the samples were divided across several groups of interest.
2.2. Psychometric analyses: outcome measures
In the psychometric analyses presented here, only four of the MOS sleep domains (sleep disturbance, sleep adequacy, sleep quantity, and somnolence) and the more comprehensive sleep problems index II were used.
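The evaluability rule above (a domain score is computed only when fewer than 50% of its items are missing) and the 0–100 recalibration described in Section 1.2 can be illustrated with a short Python sketch. It is not the developers' scoring algorithm, which is not reproduced in this article: it assumes that the items of a domain have already been recoded to point in the same direction, that six-point items are coded 1–6, and that a domain score is the mean of the linearly rescaled answered items. The names `rescale_item` and `score_domain` are hypothetical.

```python
from typing import Optional, Sequence


def rescale_item(response: int, low: int = 1, high: int = 6) -> float:
    """Linearly map a response coded low..high onto 0-100 (assumed transform)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    # Multiply before dividing so that integer inputs give exact results.
    return (response - low) * 100.0 / (high - low)


def score_domain(responses: Sequence[Optional[int]],
                 low: int = 1, high: int = 6) -> Optional[float]:
    """Mean of the rescaled answered items, or None if the domain is not evaluable.

    None marks a missing item; a domain with 50% or more of its items missing
    is treated as not evaluable, mirroring the <50% missing-data rule.
    """
    answered = [r for r in responses if r is not None]
    if 2 * len(answered) <= len(responses):  # missing share is 50% or more
        return None
    return sum(rescale_item(r, low, high) for r in answered) / len(answered)


# Hypothetical four-item domain with one missing answer (3 of 4 items present).
print(score_domain([2, 4, None, 6]))      # 60.0
# Two of four items missing (50%): not evaluable.
print(score_domain([3, None, None, 5]))   # None
```

Scoring each domain separately in this way is what allows a respondent to contribute to some domains but not others, which is consistent with the differing n values across domains in the tables.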
Although the snoring and respiratory problem domains measure distinct sleep problems, they were not considered directly relevant to RLS or likely to be responsive to treatment, and thus were not analyzed. The sleep problems index I was also not analyzed, as it is a shorter version of the sleep problems index II and the two are highly correlated. Because the findings from both studies were very similar, only one study (TREAT RLS 1) is reported in full in this article. Findings from TREAT RLS 2 are discussed only where they confirm or deviate from those of TREAT RLS 1 (selected data from TREAT RLS 2 are presented in Supplementary Tables 1A–5A).
Other outcome measures captured in these studies were the IRLS (primary endpoint), a 10-item, disease-specific measure of RLS symptom severity and frequency (score ranging from 0, “no symptoms”, to 40, “very severe symptoms”) [15]; the Clinical Global Impression-Improvement (CGI-I) scale, a seven-point clinician-rated scale of change in the patient’s condition compared with pre-treatment baseline; the Clinical Global Impression-Severity-of-Illness (CGI-S) scale, another seven-point clinician-rated scale that compares the patient’s severity of illness with that of other patients with RLS; and the Johns Hopkins RLS quality-of-life questionnaire (RLSQoL), an 18-item, patient-reported, disease-specific questionnaire that assesses the impact of RLS symptoms on daily life, emotional well-being, and the social/work life of RLS subjects [16].
2.3. Tests of construct validity
The MOS sleep domains and the sleep problems index II were assessed for construct validity (i.e., item convergence, item discrimination, and internal consistency [6,11]). Other scaling assumptions tested were clinical validity, floor and ceiling effects, and responsiveness to changes in RLS symptoms. These psychometric tests are summarized below.
2.3.1. Item convergence and discrimination
While no hard and fast rules exist for when a sleep domain should be viewed as having sufficient convergence, in these analyses each item in a domain had to correlate with its domain at 0.40 or above. The 0.40 criterion is based on the interpretation of factor loadings in principal components analysis, where 0.40 is often accepted as an appropriate cut-off for domain development [17]. Item discrimination requires that domain items have higher correlations with the items in their own domain than with items in other domains [18]; a large percentage of strong correlations with items outside their own domain would call the scaling assumptions into question.
2.3.2. Internal consistency assessment of reliability
Internal consistency is the extent to which individual items are consistent with each other, and was estimated at the scale and domain level using Cronbach’s α coefficient.
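Cronbach’s α compares the sum of the item variances with the variance of the summed score: for a domain of k items, α = k/(k − 1) × (1 − Σ s²ᵢ / s²_total). The Python sketch below computes it from item-level responses using only the standard library; the function name and the toy responses are illustrative, not trial data. A single-item domain such as sleep quantity has no α, which the sketch reflects by requiring at least two items.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each given as one score per respondent.

    items[j][i] is respondent i's score on item j; all items must cover the
    same respondents in the same order (complete cases only).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("items need the same number (>= 2) of respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total score has zero variance")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy two-item domain answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
print(round(alpha, 2))   # 0.75
print(alpha >= 0.70)     # True: meets the 0.70 reliability standard
```

Sample variances are used throughout; because α depends only on the ratio of variances, using population variances consistently would give the same result.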
For a domain to be considered reliable, an α coefficient of at least 0.70 is recommended [19,20]. Internal consistency was assessed for the sleep problems index II and each domain score.
2.3.3. Concurrent validity
Concurrent validity is based on the correlation between a newly developed measure and an accepted measure of the same construct. To demonstrate acceptable concurrent validity, the MOS scale domains were expected to exceed correlations of 0.40 with other accepted outcomes associated with sleep. The RLSQoL, which assesses the impact of RLS symptoms on patients’ QoL (including patients’ sleep and daytime somnolence) [21,22], was captured in the trial program and therefore provided an acceptable and appropriate measure for assessing concurrent validity. Thus, the concurrent validity of the MOS Sleep Scale was based on the correlations of the baseline MOS sleep domain scores and the sleep problems index II score with the baseline RLSQoL score.
2.3.4. Clinical validity
The MOS Sleep Scale was assessed for clinical validity by examining the relationship between the severity of RLS symptoms and the scores on the MOS Sleep Scale, using the logic of known-groups validity [23]. If the MOS Sleep Scale has clinical validity, its scores should be related to symptom severity; that is, MOS Sleep Scale scores should be worse for subjects with more severe self-reported and clinician-determined severity and with more frequent RLS symptoms. To test this, “criterion” groups were constructed from three criterion variables: (1) self-reported RLS symptom severity (mild, moderate, severe, or very severe), as determined by item 6 of the IRLS (“How severe was your RLS as a whole?”); (2) RLS symptom frequency, as determined by item 7 of the IRLS (“How often did you get RLS symptoms?”); and (3) clinician-rated severity, defined by scores on the CGI-S scale (patients with a score of 0 were not included). Clinical validity would be established if the mean baseline MOS domain scores differed significantly and linearly across groups of subjects that differed in symptom severity and frequency. The differences in mean scores across the criterion groups were tested using non-parametric analysis of variance (Kruskal–Wallis tests).
2.3.5. Responsiveness
Responsiveness of the MOS Sleep Scale was examined by comparing changes in MOS sleep domain scores and the sleep problems index II score against changes in clinician-rated severity in the study population with full longitudinal data (i.e., between baseline and Weeks 8 and 12). Responsiveness was assessed in subject groups defined by their CGI-I score as clinically improved (score 1–3), unchanged (score 4), or worsened (score 5–7). Responsiveness was also assessed using the change in CGI-S score between baseline and Weeks 8 and 12. Subjects were defined as “improved” if the difference over time in CGI-S score was positive (i.e., a reduction in severity), “unchanged” if their change score was 0, and “worsened” if the change was negative (i.e., an increase in severity). Subjects “not assessed” (CGI-I or CGI-S score of 0) were not included in the analysis.
2.3.6. Known-groups validity
Known-groups validity was assessed by examining the differences in MOS sleep domain scores and the sleep problems index II score between the severity groups using Mann–Whitney–Wilcoxon tests. Effect size (ES) was calculated by dividing the change in mean MOS sleep domain scores and sleep problems index II score (from baseline to a subsequent time-point) by the standard deviation (SD) of the scores at baseline. An ES below 0.20 indicated no change; between 0.20 and 0.50, a small change; between 0.50 and 0.79, a moderate change; and ≥0.80, a large change [24]. Standardized response means were calculated by dividing the change in mean MOS sleep domain scores and sleep problems index II score by the SD of the change scores for stable patients. Scale responsiveness would be demonstrated by a strong and linear relationship between changes in scores on sleep domains and changes in the clinical presentation of the subjects. In addition, paired Wilcoxon signed rank tests were calculated in subjects whose sleep problems worsened, improved, or remained unchanged.
2.3.7. Floor and ceiling effects
Floor and ceiling effects were judged acceptable if <5% of subjects had the lowest or highest possible MOS sleep total score (scale as a whole), <20% had the lowest or highest possible domain score (domains), and <50% gave the lowest or highest response for an item (individual items). In short, an appropriate scale should not have a majority of responses at one or both extremes of the response categories.
3. Results
3.1. Patient populations
In TREAT RLS 1, 284 subjects were included in the baseline validation population. Of these, 14 had no evaluable post-baseline MOS Sleep Scale questionnaires and were excluded from the longitudinal analyses. Therefore, the TREAT RLS 1 longitudinal validation population comprised 270 subjects. Demographic characteristics of the subjects were broadly representative of those with moderate-to-severe primary RLS [13,14].
The corresponding numbers for the baseline validation and longitudinal populations in TREAT RLS 2 were 267 and 253, respectively.
3.2. Tests of scaling assumptions
Overall, the results for the MOS sleep domains demonstrated satisfactory item convergence (Table 1). In TREAT RLS 1, item 3 of the sleep disturbance domain (“How often in the past 4 weeks did you feel that your sleep was not quiet?”) did not satisfy the r ≥ 0.40 criterion for item convergent validity (r = 0.38). Likewise, item 11 of the somnolence domain (“How often in the past 4 weeks did you take naps [5 min or longer] during the day?”) did not meet this criterion (r = 0.30). In TREAT RLS 2, all item–domain correlations surpassed the 0.40 criterion for acceptable convergence (Supplementary Table 1A).

Table 1
Item convergent and discrimination results for the MOS Sleep Scale (TREAT RLS 1; baseline validation population)
MOS Sleep Scale | Number of items | Item convergent validity: range of correlations | Item convergent validity: success rate (%) a | Item discriminant validity: success rate (%) b | Floor (%) c | Ceiling (%) d
Sleep disturbance | 4 | 0.38–0.71 | 75.0 | 100 | 0.4 | 6.0
Sleep adequacy | 2 | 0.61–0.61 | 100 | 100 | 16.2 | 0.7
Somnolence | 3 | 0.30–0.64 | 66.7 | 100 | 4.2 | 1.1
Sleep problems index II | 9 | 0.24–0.63 | 77.8 | NA | 0 | 0.4
MOS, Medical Outcomes Study; NA, data not applicable; TREAT RLS, Therapy with Ropinirole: Efficacy And Tolerability in RLS.
a Correlations ≥ 0.40.
b Item–scale correlations higher with the item’s own subscale than with the other MOS sleep subscales.
c Proportion of respondents with the lowest possible score.
d Proportion of respondents with the highest possible score.

For the sleep problems index II, two items fell below the 0.40 criterion: item 5 (“How often in the past 4 weeks did you awaken feeling short of breath or with a headache?”), which is not included in any of the four domains of interest, and item 9 (“How often during the past 4 weeks did you have trouble staying awake during the day?”) of the somnolence domain. Item 5 was also below the 0.40 criterion in TREAT RLS 2 (r = 0.30). Item 4 of the sleep adequacy domain (“How often during the past 4 weeks did you get enough sleep to feel rested upon waking in the morning?”) fell below the 0.40 criterion in TREAT RLS 2, but not in TREAT RLS 1 (Supplementary Table 2A).
All items in TREAT RLS 1 met the standard for item discrimination, i.e., had higher item–domain correlations with their own domain (adjusted for overlap) than with other domains (Table 1). Item–domain correlations cannot be used to judge the distinctiveness of the sleep problems index II, because it comprises almost all items in the scale and there is therefore no other domain with which its correlations can meaningfully be compared. Similar results were found in TREAT RLS 2 (Supplementary Table 1A).
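The item-level checks summarized in Table 1 can be sketched as follows. Each item is correlated with the rest of its own domain (the item itself removed, i.e., corrected for overlap) and with every other domain; convergence requires the corrected correlation to reach 0.40, and discrimination requires it to exceed the item’s correlations with the other domains. Under this approach the corrected domain of a two-item domain such as sleep adequacy is simply the other item, which would account for its single correlation (0.61–0.61). The sketch assumes unweighted mean domain scores, Pearson correlations, and complete data, and adds the floor and ceiling percentages for completeness; all function names and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Item = Sequence[float]


def pearson(x: Item, y: Item) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def mean_score(items: Sequence[Item]) -> List[float]:
    """Unweighted mean of several items for each respondent."""
    return [sum(values) / len(values) for values in zip(*items)]


def item_checks(domains: Dict[str, List[Item]], threshold: float = 0.40
                ) -> List[Tuple[str, int, float, bool, bool]]:
    """Return (domain, item index, corrected r, convergent?, discriminant?) per item."""
    results = []
    for name, items in domains.items():
        if len(items) < 2:
            raise ValueError(f"domain {name!r} needs at least two items")
        for i, item in enumerate(items):
            rest = items[:i] + items[i + 1:]  # drop the item itself
            r_own = pearson(item, mean_score(rest))
            r_other = [pearson(item, mean_score(other))
                       for other_name, other in domains.items() if other_name != name]
            results.append((name, i, r_own, r_own >= threshold,
                            all(r_own > r for r in r_other)))
    return results


def floor_ceiling(scores: Sequence[float], lowest: float, highest: float
                  ) -> Tuple[float, float]:
    """Percentage of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    return (100.0 * sum(s == lowest for s in scores) / n,
            100.0 * sum(s == highest for s in scores) / n)


# Toy data: two hypothetical two-item domains answered by five respondents.
toy = {"A": [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]],
       "B": [[5, 4, 3, 2, 1], [4, 5, 2, 3, 1]]}
for row in item_checks(toy):
    print(row)
```

In the toy data every item correlates at about 0.8 with the rest of its own domain and negatively with the other domain, so all four items pass both checks.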
3.3. Internal consistency reliability
The MOS Sleep Scale domain scores showed acceptable levels of internal consistency reliability (Table 2). In TREAT RLS 1, Cronbach’s α coefficients for the sleep problems index II and for the sleep disturbance and sleep adequacy domains all exceeded the 0.70 standard for internal consistency reliability; the somnolence domain did not (0.66). No coefficient was calculated for sleep quantity because it is a single-item domain. In TREAT RLS 2, the α coefficients for all domains surpassed 0.70 (Supplementary Table 2A).
3.4. Concurrent validity
With the exception of sleep quantity (r = 0.33), the correlations of the MOS sleep domains and the sleep problems index II with the RLSQoL overall life impact score were above the threshold (r = 0.40), and all correlations were statistically significant, indicating concurrent validity of the MOS Sleep Scale (Table 2). These results indicate that as sleep problems worsened, overall quality-of-life scores decreased.

Table 2
Internal consistency reliability results for the MOS Sleep Scale and correlations between the RLSQoL total life impact score and MOS Sleep Scale scores (TREAT RLS 1; baseline validation population)
MOS Sleep Scale | Number of items | n | Internal consistency reliability a | Correlation b | P-value | n
Sleep disturbance | 4 | 284 | 0.75 | 0.42 | <0.0001 | 284
Sleep quantity | 1 | 276 | NA | 0.33 | <0.0001 | 276
Sleep adequacy | 2 | 264 | 0.76 | 0.44 | <0.0001 | 284
Somnolence | 3 | 284 | 0.66 | 0.42 | <0.0001 | 284
Sleep problems index II | 9 | 284 | 0.78 | 0.59 | <0.0001 | 284
MOS, Medical Outcomes Study; NA, data not applicable; RLSQoL, restless legs syndrome quality-of-life questionnaire; TREAT RLS, Therapy with Ropinirole: Efficacy And Tolerability in RLS.
a Cronbach’s α coefficient.
b Spearman’s correlation coefficient.

3.5. Clinical validity
The MOS Sleep Scale showed good clinical validity, with sleep problems worsening as the severity of RLS symptoms increased. In TREAT RLS 1, all scores for the MOS sleep domains and the sleep problems index II discriminated between RLS symptom severity groups, as defined by IRLS item 6 (Table 3). In TREAT RLS 2, although somnolence tended to increase with RLS severity, the somnolence domain score did not show acceptable clinical discrimination (Supplementary Table 3A).
When RLS severity was defined by item 7 of the IRLS (frequency of RLS symptoms), all domains of the MOS Sleep Scale except somnolence differentiated between the frequency groups in TREAT RLS 1 (data not shown). Differences between groups were statistically significant for the sleep problems index II (P = 0.0096), sleep disturbance (P = 0.0033), sleep quantity (P = 0.0030), and sleep adequacy (P = 0.0125), with sleep problems increasing with the frequency of RLS symptoms. In TREAT RLS 2, only the differences in sleep disturbance scores by symptom frequency were statistically significant (P = 0.0230).
A third criterion variable assessed the ability of the MOS sleep domains to distinguish subjects classified by physicians according to disease severity, using the CGI-S. In TREAT RLS 1, mean scores on the sleep problems index II and on the sleep disturbance, sleep quantity, and sleep adequacy domains all differed significantly among groups of subjects classified by their CGI-S score (Table 4). Although reported somnolence increased with severity, this trend was not statistically significant (P = 0.579). In TREAT RLS 2, scores for all four domains and the sleep problems index II differed significantly by CGI-S level (Supplementary Table 4A).

Table 3
Clinical validity: MOS Sleep Scale domain and index II scores at baseline for each RLS severity ranking, as defined by item 6 of the IRLS (“How severe was your RLS as a whole?”) (TREAT RLS 1; baseline validation population)
RLS severity | n | Mean | SD
Sleep disturbance (Kruskal–Wallis P = 0.0001)
Mild | 6 | 50.17 | 33.45
Moderate | 99 | 47.36 | 22.89
Severe | 124 | 63.00 | 23.42
Very severe | 55 | 73.96 | 19.21
Total | 284 | 59.40 | 24.69
Sleep quantity, h (Kruskal–Wallis P = 0.0001)
Mild | 6 | 6.17 | 1.47
Moderate | 97 | 6.12 | 1.32
Severe | 120 | 5.10 | 1.57
Very severe | 53 | 4.74 | 1.47
Total | 276 | 5.41 | 1.56
Sleep adequacy (Kruskal–Wallis P = 0.0002)
Mild | 6 | 31.67 | 34.30
Moderate | 99 | 40.30 | 24.30
Severe | 124 | 30.16 | 24.66
Very severe | 55 | 24.36 | 27.06
Total | 284 | 32.61 | 25.81
Somnolence (Kruskal–Wallis P = 0.0350)
Mild | 6 | 50.17 | 26.78
Moderate | 99 | 31.78 | 18.37
Severe | 124 | 36.77 | 23.24
Very severe | 55 | 41.36 | 22.32
Total | 284 | 36.20 | 21.81
Sleep problems index II (Kruskal–Wallis P = 0.0001)
Mild | 6 | 51.67 | 19.91
Moderate | 99 | 43.22 | 15.39
Severe | 124 | 54.30 | 17.12
Very severe | 55 | 63.75 | 15.52
Total | 284 | 52.21 | 17.84
MOS, Medical Outcomes Study; IRLS, International Restless Legs Scale; SD, standard deviation; TREAT RLS, Therapy with Ropinirole: Efficacy And Tolerability in RLS.

Table 4
Clinical validity: comparison of MOS Sleep Scale scores at baseline by CGI-S group (TREAT RLS 1; baseline validation population)
CGI-S group | n | Mean | SD
Sleep disturbance (Kruskal–Wallis P = 0.0001)
Normal, not at all ill | 6 | 48.17 | 33.11
Borderline ill | 3 | 65.00 | 13.11
Mildly ill | 26 | 44.31 | 23.89
Moderately ill | 99 | 50.12 | 22.81
Markedly ill | 91 | 66.51 | 21.76
Severely ill | 54 | 70.06 | 22.47
Among the most extremely ill patients | 5 | 87.40 | 22.72
Total | 284 | 59.40 | 24.69
Sleep quantity, h (Kruskal–Wallis P = 0.0001)
Normal, not at all ill | 6 | 5.83 | 0.75
Borderline ill | 3 | 6.00 | 0.00
Mildly ill | 25 | 6.16 | 1.43
Moderately ill | 95 | 5.98 | 1.38
Markedly ill | 91 | 4.92 | 1.50
Severely ill | 51 | 4.88 | 1.72
Among the most extremely ill patients | 5 | 4.40 | 0.55
Total | 276 | 5.41 | 1.56
Sleep adequacy (Kruskal–Wallis P = 0.0001)
Normal, not at all ill | 6 | 5.83 | 0.75
Borderline ill | 3 | 6.00 | 0.00
Mildly ill | 25 | 6.16 | 1.43
Moderately ill | 95 | 5.98 | 1.38
Markedly ill | 91 | 4.92 | 1.50
Severely ill | 51 | 4.88 | 1.72
Among the most extremely ill patients | 5 | 4.40 | 0.55
Total | 276 | 5.41 | 1.56
Somnolence (Kruskal–Wallis P = 0.5791)
Normal, not at all ill | 6 | 27.50 | 19.94
Borderline ill | 3 | 22.33 | 20.40
Mildly ill | 26 | 34.58 | 20.93
Moderately ill | 99 | 35.16 | 19.53
Markedly ill | 91 | 36.42 | 23.61
Severely ill | 54 | 40.76 | 23.89
Total | 284 | 36.20 | 21.81
Sleep problems index II (Kruskal–Wallis P = 0.0001)
Normal, not at all ill | 6 | 44.67 | 18.16
Borderline ill | 3 | 51.67 | 14.29
Mildly ill | 26 | 42.65 | 16.66
Moderately ill | 99 | 46.37 | 16.72
Markedly ill | 91 | 56.07 | 15.85
Severely ill | 54 | 60.33 | 18.38
Among the most extremely ill patients | 5 | 69.00 | 13.95
Total | 284 | 52.21 | 17.84
CGI-S, Clinical Global Impression-Severity-of-Illness; MOS, Medical Outcomes Study; SD, standard deviation; TREAT RLS, Therapy with Ropinirole: Efficacy And Tolerability in RLS.

3.6. Responsiveness
In both trials, changes in MOS Sleep Scale scores between baseline and Weeks 8 or 12 differed significantly between groups of subjects classified by their improvement at that time (CGI-I rating of improved, no change, or worsened). In TREAT RLS 1 at Week 8, all four domain scores and the sleep problems index II score showed significant differences between subject groups classified in this manner (Table 5). However, the number of subjects whose RLS worsened by Week 8 was small (n = 3 in TREAT RLS 1), so no conclusions can be drawn for this group. Likewise, in TREAT RLS 2 at Week 8, all four domain scores and the sleep problems index II score showed significant differences among subject groups classified by their CGI improvement (Supplementary Table 5A). Similar results were found up to Week 12 in both studies among CGI improvement groups (detailed results not shown).
When responsiveness was tested using changes in CGI-S score between baseline and endpoint, the results were similar to those obtained using the CGI-I. In TREAT RLS 1 at Weeks 8 and 12, all domain scores and the sleep problems index II score showed significant differences among subject groups. Similarly, in TREAT RLS 2 at Week 8, all domain scores and the sleep problems index II score showed significant differences among subject groups classified by their CGI-S. At Week 12 in TREAT RLS 2, however, the changes in sleep adequacy and somnolence did not differ by change in severity, as assessed using the CGI-S scale.

Table 5
Mean change in MOS Sleep Scale scores from baseline to Week 8 by CGI-I group (“improved”, “no change”, and “worsened”) (TREAT RLS 1; longitudinal validation population)
CGI-I group | n | Mean change | SD | Effect size | Standardized response mean | Wilcoxon signed rank test (P-value)
Sleep disturbance (Kruskal–Wallis P = 0.0001)
Improved | 140 | 22.56 | 23.72 | 0.92 | 1.08 | 0.0001
No change | 94 | 7.88 | 20.99 | 0.32 | 0.38 | 0.0012
Worsened | 3 | 4.33 | 34.08 | | |
Total | 237 | 16.51 | 23.84 | | |
Sleep quantity, h (Kruskal–Wallis P = 0.0001)
Improved | 135 | 0.94 | 1.42 | 0.59 | 0.75 | 0.0001
No change | 91 | 0.04 | 1.26 | 0.03 | 0.04 | 0.7928
Worsened | 3 | 0.67 | 0.58 | | |
Total | 229 | 0.58 | 1.42 | | |
Sleep adequacy (Kruskal–Wallis P = 0.0001)
Improved | 140 | 25.50 | 32.17 | 0.99 | 0.98 | 0.0001
No change | 94 | 3.83 | 26.03 | 0.15 | 0.15 | 0.2097
Worsened | 3 | 3.33 | 45.09 | | |
Total | 237 | 16.62 | 31.75 | | |
Somnolence (Kruskal–Wallis P = 0.0183)
Improved | 140 | 11.03 | 17.95 | 0.53 | 0.59 | 0.0001
No change | 94 | 5.23 | 18.69 | 0.25 | 0.28 | 0.0101
Worsened | 3 | 4.67 | 21.57 | | |
Total | 237 | 8.65 | 18.42 | | |
Sleep problems index II (Kruskal–Wallis P = 0.0001)
Improved | 140 | 19.01 | 17.82 | 1.06 | 1.13 | 0.0001
No change | 94 | 6.11 | 16.78 | 0.34 | 0.36 | 0.0010
Worsened | 3 | 5.67 | 21.73 | | |
Total | 237 | 13.73 | 18.51 | | |
CGI-I, Clinical Global Impression-Improvement; MOS, Medical Outcomes Study; SD, standard deviation; TREAT RLS, Therapy with Ropinirole: Efficacy And Tolerability in RLS. Empty cells: not calculated.

4. Discussion
The results from this psychometric evaluation of the MOS Sleep Scale in two clinical trials of subjects with RLS provide evidence of the psychometric integrity of the sleep problems index II and the MOS Sleep Scale domains. The standard for item convergence (correlation ≥0.40) was surpassed by all domain items in both trials, with the exception of items 3 (“How often in the past 4 weeks did you feel that your sleep was not quiet?”) and 11 (“How often do you take naps [5 min or longer] during the day?”) in TREAT RLS 1. As item 3 only just failed to meet the standard for item convergence (item–domain correlation of 0.38), this finding is not considered troublesome. The low item–domain correlation for item 11 may arise because subjects have as much difficulty napping during the day as they do sleeping at night. In both trials, all items in all domains surpassed the standard for item discrimination, and there were no significant floor or ceiling effects in either study. This is further evidence of the acceptable psychometric properties of the MOS Sleep Scale in this population.
The sleep problems index II and the domains of the MOS Sleep Scale also demonstrated adequate reliability in both studies: all scores exceeded the standard for internal consistency reliability except the somnolence domain in TREAT RLS 1, which was just below the criterion. The r = 0.40 standard for concurrent validity was exceeded in the correlations between the RLSQoL overall life impact score and the sleep problems index II in both studies. Moreover, all of the MOS sleep domains demonstrated concurrent validity (correlations >0.40) with the exception of the sleep quantity domain (r = 0.33 in TREAT RLS 1 [Table 2]; also below the criterion in TREAT RLS 2).
Using the logic of known-groups validity, the clinical validity of the MOS Sleep Scale was established: mean scores on each MOS sleep domain differed across the criterion groups in the hypothesized manner. Specifically, the MOS sleep domains adequately discriminated the different RLS severity levels, with the exception of subjects with mild disease severity in TREAT RLS 1; however, this group was small (n = 6), making the conclusion tenuous. The discriminatory power of the MOS sleep domains for symptom frequency was not as robust as for severity levels.
The “occasionally” symptom-frequency group consisted of a single patient in TREAT RLS 1 and four subjects in TREAT RLS 2; findings for this group should therefore be interpreted with caution. Nonetheless, the weaker results for symptom frequency suggest that frequency may be less closely related to sleep problems than symptom severity is. This is consistent with an analysis of the IRLS, which showed that symptom frequency was less strongly related to sleep problems than symptom severity [25]. Scores on the sleep problems index II and on the sleep disturbance, sleep quantity, and sleep adequacy domains were worse for subjects with worse CGI-S ratings, indicating good correspondence with actual clinical judgement. Sleep problems increased with greater symptom severity for subjects rated “moderately ill” or worse. This relation was not found for subjects rated “mildly ill” or better. The lack of statistical association in these categories may result from small sample sizes.
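The known-groups logic used in this discussion (mean domain scores should line up with criterion groups such as RLS severity levels or CGI-S ratings) can be sketched as follows. This is a hypothetical illustration, not the authors' procedure. It assumes a domain scored so that higher means more disturbance and checks that group means rise strictly along the hypothesized order. It also computes a one-way ANOVA F statistic as one common test of between-group differences; the specific test used is not restated in this section. No p-value is computed, because the Python standard library has no F distribution. With groups as small as one or four subjects, such a statistic is unstable, which is why those findings are treated cautiously.

```python
from statistics import mean


def group_means(scores_by_group):
    """Mean domain score within each criterion group (e.g. a severity level)."""
    return {g: mean(s) for g, s in scores_by_group.items()}


def ordered_as_hypothesized(means, order, increasing=True):
    """True if group means rise (or fall) strictly along the hypothesized order."""
    vals = [means[g] for g in order]
    steps = list(zip(vals, vals[1:]))
    if increasing:
        return all(a < b for a, b in steps)
    return all(a > b for a, b in steps)


def one_way_anova_f(scores_by_group):
    """F = between-group mean square / within-group mean square."""
    groups = list(scores_by_group.values())
    everyone = [x for g in groups for x in g]
    grand, k, n = mean(everyone), len(groups), len(everyone)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores on a domain where higher = more disturbance.
scores = {
    "mild": [10, 20, 15],
    "moderate": [30, 35, 40],
    "severe": [50, 55, 60],
}
means = group_means(scores)
print(means, ordered_as_hypothesized(means, ["mild", "moderate", "severe"]))
print(one_way_anova_f(scores))
```

For a domain scored in the opposite direction (higher means better), the same check would be run with `increasing=False`.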
In all clinical validity tests, the somnolence domain was less consistently related to RLS symptom severity than the other MOS sleep domains were. One possible explanation is that daytime somnolence is confounded by other, unmeasured predictors. Another is that subjects’ daytime responsibilities and duties prevented daytime napping, even when they felt the need. A further possibility is that severe symptoms increase sleep disturbance more than they increase somnolence, suggesting that severe symptoms may produce arousal effects that counteract the somnolence expected from the sleep disturbance. This finding of possible hyperarousal in RLS deserves further study. On the basis of these results, the somnolence domain should be used cautiously when evaluating treatment efficacy. The lack of relationship between the somnolence domain and disease severity may reflect a unique and interesting characteristic of RLS.
In both studies, changes in MOS Sleep Scale scores between baseline and Weeks 8 and 12 were responsive to improvements in RLS severity, as measured by the CGI-I and CGI-S scales: sleep problems improved as symptoms improved. Effect sizes were large for the sleep disturbance and sleep adequacy domains and the sleep problems index II, and small or moderate for the sleep quantity and somnolence domains.
In conclusion, this study found the MOS Sleep Scale to be a reliable and valid (including responsiveness to change) scale for use with subjects with moderate-to-severe primary RLS. The somnolence domain showed an unexpected lack of change with RLS severity, even though all other sleep measures indicated greater sleep disruption. This suggests a possible unique feature of the effects of RLS on sleep and waking, consistent with the clinical impression that RLS patients rarely complain of falling asleep while driving despite profound sleep loss at night.
The use of this scale to measure sleep changes in the RLS patient population is therefore supported. Moreover, our findings provide additional support for the general use of the MOS Sleep Scale to evaluate disease impact on sleep.
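The responsiveness findings in this discussion are expressed as effect sizes classified as small, moderate, or large. The following is a minimal sketch. It assumes the effect size is the mean change from baseline divided by the baseline standard deviation, which is one common convention; the formula actually used is not restated in this section. It also uses Cohen's conventional benchmarks of 0.2, 0.5 and 0.8 [24], labelling Cohen's "medium" as "moderate" to match the wording above. The function names and scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the baseline SD (assumed convention)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def cohen_label(es):
    """Cohen's conventional benchmarks: 0.2 small, 0.5 moderate, 0.8 large."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical sleep disturbance scores (higher = worse) for five subjects.
baseline = [40, 50, 60, 70, 80]
week_12 = [30, 35, 45, 50, 60]
es = effect_size(baseline, week_12)
print(round(es, 2), cohen_label(es))  # prints: -1.01 large
```

The negative sign here reflects a fall in a domain where lower scores are better. A related index, the standardized response mean, divides the mean change by the standard deviation of the change scores instead, so the two can differ when change scores are more or less variable than baseline scores.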
Acknowledgements
This study was supported by GlaxoSmithKline Research and Development. Dr. Allen has in the last 12 months received travel support, honoraria or consultant fees from GlaxoSmithKline, Boehringer Ingelheim, Schwarz Pharma, Xenoport, Luitpold, Respironics, IM systems, and Orion Pharma. Mr. Kosinski has received consultant fees from GlaxoSmithKline. Drs. Hill-Zabala and Calloway are employees of GlaxoSmithKline.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.sleep.2008.06.003.
References
[1] Allen RP, Picchietti D, Hening WA, Trenkwalder C, Walters AS, Montplaisir J. Restless legs syndrome: diagnostic criteria, special considerations, and epidemiology. A report from the restless legs syndrome diagnosis and epidemiology workshop at the National Institutes of Health. Sleep Med 2003;4:101–19.
[2] Kushida CA, Allen RP, Atkinson MJ. Modeling the causal relationships between symptoms associated with restless legs syndrome and the patient-reported impact of RLS. Sleep Med 2004;5:485–8.
[3] Hening W, Walters AS, Allen RP, Montplaisir J, Myers A, Ferini-Strambi L. Impact, diagnosis and treatment of restless legs syndrome (RLS) in a primary care population: the REST (RLS epidemiology, symptoms, and treatment) primary care study. Sleep Med 2004;5:237–46.
[4] Allen RP, Earley CJ. Restless legs syndrome: a review of clinical and pathophysiologic features. J Clin Neurophysiol 2001;18:128–47.
[5] Saletu B, Gruber G, Saletu M, Brandstätter N, Hauer C, Prause W, et al. Sleep laboratory studies in restless legs syndrome patients as compared with normals and acute effects of ropinirole. 1. Findings on objective and subjective sleep and awakening quality. Neuropsychobiology 2000;41:181–9.
[6] Hays RD, Stewart AL. Sleep measures. In: Stewart AL, Ware JE Jr, editors. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham and London: Duke University Press; 1992. p. 235–59.
[7] Payne R, Mathias SD, Pasta DJ, Wanke LA, Williams R, Mahmoud R. Quality of life and cancer pain: satisfaction and side effects with transdermal fentanyl versus oral morphine. J Clin Oncol 1998;16:1588–93.
[8] Coyne KS, Zhou Z, Bhattacharyya SK, Thompson CL, Dhawan R, Versi E. The prevalence of nocturia and its effect on health-related quality of life and sleep in a community sample in the USA. BJU Int 2003;92:948–54.
[9] Coyne KS, Zhou Z, Thompson C, Versi E. The impact on health-related quality of life of stress, urge and mixed urinary incontinence. BJU Int 2003;92:731–5.
[10] Unruh ML, Hartunian MG, Chapman MM, Jaber BL. Sleep quality and clinical correlates in patients on maintenance dialysis. Clin Nephrol 2003;59:280–8.
[11] Hays RD, Martin SA, Sesti AM, Spritzer KL. Psychometric properties of the Medical Outcomes Study Sleep measure. Sleep Med 2005;6:41–4.
[12] Rejas J, Ribera MV, Ruiz M, Masrramón X. Psychometric properties of the MOS (Medical Outcomes Study) Sleep Scale in patients with neuropathic pain. Eur J Pain 2007;11:329–40.
[13] Trenkwalder C, Garcia-Borreguero D, Montagna P, Lainey E, de Weerd AW, Tidswell P, et al. Ropinirole in the treatment of restless legs syndrome: results from the TREAT RLS 1 study, a 12 week, randomised, placebo controlled study in 10 European countries. J Neurol Neurosurg Psychiatry 2004;75:92–7.
[14] Walters AS, Ondo WG, Dreykluft T, Grunstein R, Lee D, Sethi K. Ropinirole is effective in the treatment of restless legs syndrome. TREAT RLS 2: a 12-week, double-blind, randomized, parallel-group, placebo-controlled study. Mov Disord 2004;19:1414–23.
[15] Walters AS, LeBrocq C, Dhar A, Hening W, Rosen R, Allen RP, et al. Validation of the International Restless Legs Syndrome Study Group rating scale for restless legs syndrome. Sleep Med 2003;4:121–32.
[16] Abetz L, Arbuckle R, Allen R, Mavraki E, Kirsch J. Validating the restless legs syndrome quality of life questionnaire (RLSQOL) in a trial patient population. Value Health 2004;7:793.
[17] Stevens J. Applied multivariate statistics for the social sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1986.
[18] Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 1959;56:81–105.
[19] Nunnally J, Bernstein IH. Chapter 7: the assessment of reliability. In: Psychometric theory. New York: McGraw-Hill; 1994. p. 248–92.
[20] Chassany O, Sagnier P, Marquis P, Fullerton S, Aaronson N. Patient-reported outcomes and regulatory issues: the example of health related quality of life – a European guidance document for the improved integration of health related quality of life assessment in the drug regulatory process. Drug Inf J 2002;36:209–38.
[21] Abetz L, Arbuckle R, Allen RP, Mavraki K, Kirsch J. The reliability, validity and responsiveness of the restless legs syndrome quality of life questionnaire (RLSQoL) in a trial population. Health Qual Life Outcomes 2005;3:79.
[22] Abetz L, Vallow SM, Kirsch J, Allen RP, Washburn T, Earley CJ. Validation of the restless legs syndrome quality of life questionnaire. Value Health 2005;8:157–67.
[23] Kerlinger F. Foundations of behavioral research. 2nd ed. New York: Holt, Rinehart and Winston; 1973.
[24] Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1988.
[25] Abetz L, Arbuckle R, Allen RP, Garcia-Borreguero D, Hening W, Walters AS, et al. The reliability, validity and responsiveness of the International Restless Legs Syndrome Study Group rating scale and subscales in a clinical-trial setting. Sleep Med 2006;7:340–9.