Psychometric Evaluation of the Diabetes Symptom Checklist-Revised (DSC-R)—A Measure of Symptom Distress


Value in Health, Volume 12, Number 8, 2009, pages 1168–1175

Robert A. Arbuckle, MA,1 Louise Humphrey, MSc,1 Kawitha Vardeva, MSc,2 Bhakti Arondekar, PhD, MBA,3 Muriel Danten-Viala, MSc,4 Jane A. Scott, PhD,1 Frank J. Snoek, PhD5

1Mapi Values, Bollington, UK; 2Amgen Ltd, Cambridge, UK; 3GlaxoSmithKline, Philadelphia, PA, USA; 4Mapi Values, Lyon, France; 5VU University Medical Center, Amsterdam, The Netherlands

ABSTRACT

Objective: To assess the psychometric validity, reliability, responsiveness, and minimal important differences of the Diabetes Symptoms Checklist-Revised (DSC-R), a widely used patient-reported outcome measure of diabetes symptom distress.

Research Design and Methods: Psychometric validity of the DSC-R was assessed using blinded data from a large-scale trial of approximately 4000 type 2 diabetes patients. Confirmatory factorial analysis (CFA) and multitrait analysis were used to examine the construct validity of the structure of the DSC-R. DSC-R internal consistency, discriminative validity, and responsiveness were also assessed. Distribution and anchor-based methods were used to estimate minimal important differences for DSC-R domains.

Results: Mean age of the sample was 56 years, 42% were female, and 88% were Caucasian. Patients had a mean body mass index (BMI) of 32.2 and a mean fasting glucose level of 151.7 mg/dl. CFA and multitrait analysis indicated that the scoring of the DSC-R has acceptable construct validity. Item-scale correlations ranged from 0.44 to 0.78. Cronbach’s alpha coefficients ranged from 0.69 to 0.87. At baseline, DSC-R scores were higher among patients with higher BMI scores (P < 0.0001), supporting the discriminative validity of the DSC-R. Minimal important difference estimates ranged from 0.39 to 0.60 points when using distribution methods and from 0.00 to 0.33 when estimated using anchor-based methods.

Conclusions: The DSC-R demonstrated excellent psychometric properties when tested in a large-scale diabetes clinical trial. Responsiveness and test–retest reliability of the DSC-R warrant further evaluation.

Keywords: diabetes, patient reported outcome, psychometric validation, quality of life, symptom distress.

Introduction

There is general consensus that quality of life is an important outcome of diabetes care, requiring reliable patient-reported measures pertaining to the physical, social, and psychological domains [1]. The Diabetes Symptoms Checklist (DSC) was developed to capture the subjective experience of diabetes-related symptoms and changes therein as a result of medical treatment [2]. Symptoms associated with type 2 diabetes may be directly related to hyperglycemia (e.g., excessive thirst, dryness of the mouth, fatigue, frequent urination), complications associated with diabetes (e.g., loss of sensation in the extremities), and the treatment of diabetes (e.g., hypoglycemia) [3].

The DSC items were derived from a review of the literature and discussions with experienced physicians in the field [2]. Guided by their clinical knowledge, the developers identified eight domains of importance for diabetes symptom distress. A final selection of 34 items measuring these eight domains was made based on psychometric criteria.

Based on research findings [4–6], the developers of the instrument later sought to improve the DSC in two ways. First, the frequency scale was replaced by a dichotomous “yes” or “no” response for the presence or absence of each symptom. This change was made because at times patients found the dual response format confusing to answer and missing data could be a problem. Moreover, reported frequency and burden were generally highly intercorrelated (>0.80), suggesting redundancy. Second, the scaling was changed from a 4-point to a 5-point Likert scale. No changes to item content were made. The resulting instrument is known as the DSC-Revised (DSC-R).

Validity and reliability for the DSC-R have not been reported in the literature, although recent studies suggest that the DSC-R has satisfactory reliability in both newly diagnosed and insulin-treated type 2 diabetes patients [7–11]. To address the need for evidence of psychometric validity, a blinded, post hoc analysis of data from a large-scale, multicenter, randomized controlled clinical trial was performed to assess the psychometric properties of the DSC-R.

Address correspondence to: Rob Arbuckle, Mapi Values, Adelphi Mill, Bollington, Macclesfield, Cheshire SK10 5JB UK. E-mail: Rob. [email protected]
doi: 10.1111/j.1524-4733.2009.00571.x

Subjects and Methods

Post hoc psychometric validation analyses were performed using blinded data from A Diabetes Outcome Progression Trial (ADOPT) [12]. The ADOPT study was a double-blind, randomized, parallel group study comparing rosiglitazone, metformin, and glyburide as an initial treatment for 4360 recently diagnosed patients with type 2 diabetes. Details of the study design are reported elsewhere [12]. The sample included patients from the United States (n = 1644), Canada (n = 612), France (n = 388), Germany (n = 466), Spain (n = 397), UK (n = 313), and other countries (n = 466). The primary outcome was the time to monotherapy failure, defined as fasting plasma glucose of more than 180 mg/dl on consecutive testing after at least 6 weeks of treatment at the maximum tolerated dose of study medication.

Cross-sectional psychometric validation analyses (all analyses except responsiveness and minimal important differences [MID]) were performed on 4286 patients who completed the DSC-R at baseline. The 3594 patients who completed the DSC-R and the Short-Form 36 (SF-36) at 1-year follow-up were included in the longitudinal analyses.

© 2009, International Society for Pharmacoeconomics and Outcomes Research (ISPOR)


Figure 1 Example items showing the format of the Diabetes Symptoms Checklist-Revised.

Questionnaires

Diabetes Symptom Checklist-Revised

The 34 items of the DSC-R are grouped into eight symptom clusters or domains, each measuring a different aspect of diabetes symptomatology: hyperglycemic, hypoglycemic, psychological-cognitive, psychological-fatigue, cardiovascular, neurological-pain, neurological-sensory, and ophthalmologic. A conceptual framework detailing the items included in each domain is presented in Figure 2. For each item, participants are asked if they have experienced the symptom in the past 4 weeks, and if yes, how troublesome that particular symptom is for them. Two example items demonstrating the format of the questionnaire are provided in Figure 1. Items are summed to form domain scores and all items of the DSC-R can be summed together to form a total score. Higher scores indicate greater symptom burden.
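As a rough illustration of the scoring just described, the sketch below sums item scores into domain scores and a total score. It assumes, beyond what the text states, that a symptom that did not occur is coded 0 and a symptom that occurred is coded by its 1 to 5 troublesomeness rating. The item-to-domain grouping follows the model shown in Figure 2; the names `DOMAIN_ITEMS`, `item_score`, and `dsc_r_scores` are hypothetical, and the optional averaging is an assumption rather than part of the published scoring.

```python
from typing import Dict, Optional

# Item-to-domain grouping as listed in the Figure 2 model (34 items, 8 domains).
DOMAIN_ITEMS: Dict[str, tuple] = {
    "psychological_fatigue": (1, 4, 17, 20),
    "psychological_cognitive": (6, 7, 31, 33),
    "neuropathic_pain": (2, 15, 21, 25),
    "neuropathic_sensory": (3, 9, 11, 26, 29, 34),
    "cardiovascular": (5, 13, 24, 30),
    "ophthalmologic": (10, 14, 18, 22, 28),
    "hypoglycemic": (8, 19, 27),
    "hyperglycemic": (12, 16, 23, 32),
}


def item_score(occurred: bool, troublesomeness: Optional[int] = None) -> int:
    """Assumed coding: 0 if the symptom did not occur, else the 1-5 rating."""
    if not occurred:
        return 0
    if troublesomeness is None or not 1 <= troublesomeness <= 5:
        raise ValueError("a symptom that occurred needs a rating from 1 to 5")
    return troublesomeness


def dsc_r_scores(item_scores: Dict[int, int], average: bool = False) -> Dict[str, float]:
    """Sum (or, as an assumed alternative, average) item scores per domain and overall."""
    scores: Dict[str, float] = {}
    for domain, items in DOMAIN_ITEMS.items():
        domain_sum = sum(item_scores[i] for i in items)
        scores[domain] = domain_sum / len(items) if average else domain_sum
    total = sum(item_scores[i] for i in range(1, 35))
    scores["total"] = total / 34 if average else total
    return scores
```

A respondent who reports only "very thirsty" (item 12) and "dry mouth" (item 16) would receive a nonzero score on the hyperglycemic domain and zero on every other domain, which is the pattern behind the floor effects reported in the Results.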

Short-Form 36

The SF-36 is a 36-item measure of perceived health status with established validity in both healthy subjects and somatic patients [13–16]. A total score can be calculated as well as a Physical Component and Mental Component summary score. Domain scores range from 0 to 100; higher scores indicate better health status. The recall period is the past 4 weeks.

Clinical Measures

Body mass index (BMI) and HbA1c levels were recorded at both study visits reported here.

Figure 2 Original model of the Diabetes Symptoms Checklist-Revised tested by confirmatory factorial analysis. Figures on single arrows: factor loadings; figures on two-headed curved arrows: correlations among factors; item errors are not shown.

DSC-R Total Score, with the loading of each domain on the Total Score and the items belonging to each domain:

Psychological fatigue (0.81): 1 – Lack of strength (energy)?; 4 – An overall sense of fatigue?; 17 – Increasing fatigue during the course of the day?; 20 – Fatigue in the morning when getting up?
Psychological cognitive (0.84): 6 – Sleepiness or drowsiness?; 7 – Difficulty concentrating?; 31 – Dull head?; 33 – Difficulty staying attentive?
Neuropathic pain (0.68): 2 – Aching calves when walking?; 15 – Burning pain in the calves at night?; 21 – Shooting pains in the legs?; 25 – Burning pain in the legs during day?
Neuropathic sensoric (0.66): 3 – Numbness (loss of sensation) in feet?; 9 – Numbness (loss of sensation) in the hands?; 11 – Tingling sensations in the limbs at night?; 26 – Tingling or prickling sensation in hands or fingers?; 29 – Odd feeling in legs or feet when touching?; 34 – Tingling or prickling sensations in legs or feet?
Cardiovascular (0.81): 5 – Shortness of breath at night?; 13 – Palpitations or pains in the breast or heart region?; 24 – Pains in the breast or heart region?; 30 – Shortness of breath during exercise?
Ophthalmologic (0.69): 10 – Persistently blurred vision (also with glasses on)?; 14 – Deteriorating vision?; 18 – Flashes or black spots in the field of vision?; 22 – Fluctuating clear and blurred vision?; 28 – Sudden deterioration of vision?
Hypoglycemic (0.77): 8 – Moodiness?; 19 – Irritability just before a meal?; 27 – Easily irritated or annoyed?
Hyperglycemic (0.69): 12 – Very thirsty?; 16 – Dry mouth?; 23 – Frequent voiding?; 32 – Drinking a lot (all sort of beverage)?

Correlations among factors: Psychological fatigue with Psychological cognitive, 0.33; Neuropathic pain with Neuropathic sensoric, 0.60.

Analysis

All analyses were performed post hoc on blinded data from the ADOPT clinical trial. With the exception of the analyses of responsiveness and MID, all analyses were performed on baseline cross-sectional data. Responsiveness and MID analyses were assessed using baseline and 1 year (visit 9) data.

Confirmatory factorial analysis (CFA) was used to evaluate the overall structure of the DSC-R, as shown in Figure 2. The goodness of fit of the model was assessed based on the Goodness of Fit Index (GFI) (good fit if >0.90), the Root Mean Square Residual (RMR) (good fit if <0.05), the Comparative Fit Index (CFI) (good fit if >0.95), and the Root Mean Square Error of Approximation (RMSEA) (good fit if <0.05) [17].

Multitrait analysis was performed to examine the validity of the DSC-R item-scale structure. This included tests of item convergent validity (the correlation between each item and its own scale should be ≥0.40) [18] and internal consistency reliability (alpha coefficients ≥0.70 for acceptable internal consistency) [19]. Percentages scoring at floor and ceiling (lowest and highest possible scores) were also examined to check for the presence of floor or ceiling effects, and the DSC-R interscale correlations were examined.

Concurrent validity (also known as convergent/divergent validity) involves analyzing correlation levels between the scales of the studied questionnaire and scales of well-established and validated questionnaires measuring similar concepts [18]. Scales measuring similar concepts are expected to correlate more highly than scales measuring unrelated concepts. Concurrent validity was assessed by examining Spearman’s correlations between the DSC-R scores and the SF-36 scores.

The known groups validity of the DSC-R (ability to distinguish among groups of patients that would be expected to differ [18]) was evaluated by examining differences in DSC-R scores according to HbA1c levels (<6%, 6–6.9%, 7–7.9%, ≥8%) and BMI levels (<25, 25–29.9, 30–39.9, ≥40). It was hypothesized that there would be statistically significant differences among the groups compared (P < 0.05), as assessed using analysis of variance (ANOVA).

Responsiveness was evaluated by comparing changes in DSC-R scores from baseline to 1 year later among subgroups of patients defined by the change score on SF-36 item 2 (“compared to one year ago, how would you rate your health in general now?”). Only patients who completed the DSC-R at both visits were included in this analysis. Effect sizes were calculated as the mean difference (change score) in scores from baseline to 1 year later divided by the standard deviation of the baseline score. Effect sizes of 0.2, 0.5, and 0.8 were considered small, moderate, and large, respectively, as defined by Cohen [20]. Effect sizes were expected to be moderate or large (>0.50) in the expected direction for patients who reported their health to have improved or worsened, and small or negligible (<0.25) for patients who reported having experienced no change in their health.

An MID for a patient reported outcome (PRO) has been defined as the “smallest difference in score in the domain of interest which patients perceive as beneficial, and which would mandate, in the absence of troublesome side effects and excessive cost, a change in patient’s management” [21]. It is recommended that more than one method of estimating MID is employed; thus, in the present study both distribution-based and anchor-based methods were used. First, an effect size of 0.5 (a change of 0.5 of a standard deviation) was considered to be clinically significant (distribution method) [22]. Second, the MID was considered to be the change in DSC-R scores for patients who considered their health to be “somewhat better” or “somewhat worse” in response to the SF-36 item 2 “compared to one year ago, how would you rate your health in general now?” (anchor-based method).
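The effect size and the two MID definitions above reduce to a few lines of arithmetic. The following is a minimal sketch, not the authors' SAS code: the function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`, with an n - 1 denominator) is an assumption, since the text does not say which estimator was used. Because higher DSC-R scores mean greater symptom burden, a positive change indicates worsening.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change (follow-up minus baseline) divided by the baseline SD."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def interpret(es):
    """Label the magnitude of an effect size with Cohen's 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


def distribution_mid(baseline, fraction=0.5):
    """Distribution-based MID: a fixed fraction (here 0.5) of the baseline SD."""
    return fraction * stdev(baseline)


def anchor_mid(changes_by_anchor, anchor_group):
    """Anchor-based MID: mean change score within one anchor response group."""
    return mean(changes_by_anchor[anchor_group])
```

In use, `changes_by_anchor` would map each response to SF-36 item 2 (for example "somewhat better") to the list of DSC-R change scores of the patients who gave that response.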

For all analyses, the threshold for significance was P < 0.05. Where correlations were evaluated, Pearson’s correlation coefficient was used. ANOVA was used in the comparison of groups. All data processing and analyses were performed using SAS software (Statistical Analysis System, Version 9, SAS Institute, Cary, NC).
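The two scaling tests used in the multitrait analysis, item convergent validity and Cronbach's alpha, can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the SAS procedure used in the study: `items` is a hypothetical list holding one list of scores per item (one value per respondent), complete data are assumed, and the item-scale correlation is corrected for overlap by correlating each item with the sum of the other items in its scale.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of the scale sum)."""
    k = len(items)
    scale_sums = [sum(row) for row in zip(*items)]
    item_variance = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / pvariance(scale_sums))


def corrected_item_scale_correlations(items):
    """Correlate each item with the sum of the remaining items in its scale."""
    correlations = []
    for idx, col in enumerate(items):
        rest = [sum(v for j, v in enumerate(row) if j != idx) for row in zip(*items)]
        correlations.append(pearson(col, rest))
    return correlations
```

A scale would pass the criteria stated above if every corrected correlation is at least 0.40 and alpha is at least 0.70.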

Results

Cross-sectional analyses were performed on 4286 randomized patients. Mean age for the overall population was 56 years. Forty-two percent of the sample was female; 88% were Caucasian, 2% were Asian, 4% were black, 5% were Hispanic, and 1% other. Mean BMI was 32.2; mean waist–hip ratio was 0.95; mean number of years with diabetes was 0.8 years; mean HbA1c was 7.36%; and the mean fasting glucose level was 151.7 mg/dl.

The amount of missing data was very low, with less than 3% missing for every item at all study visits. At baseline, the percentage of missing data for the DSC-R items ranged from 0.37% (n = 16) for item 15 (“Pain in calves at night”) to 1.63% for item 23 (“Frequent voiding”). Thus, the quality of completion was considered excellent.

The model fit indices for the CFA were close to, but did not all meet, the criteria for acceptability: the GFI was 0.9022 (marginally above the 0.90 criterion), the RMR was 0.0522, the CFI was 0.9029, and the RMSEA was 0.055 (90% lower confidence limit: 0.0538). Standardized factor loadings ranged from 0.50 to 0.85; all were well above the 0.40 threshold used as a rule of thumb for acceptable factor loadings.

Scaling test results are summarized in Table 1. Item-domain correlations ranged from 0.44 to 0.78; thus all items met the test of item convergent validity (a correlation of >0.40 between the item and the other items in its domain). Almost all of the scales surpassed the 0.70 threshold for acceptable internal consistency reliability (Cronbach’s alpha range: 0.69–0.87). “Cardiovascular” was the single exception, with an alpha coefficient of 0.69, just below the alpha threshold. Across all of the domains there was a high floor effect (for each domain the percentage of patients with the lowest possible score ranged from 27.8% to 60.2%), reflecting the fact that many patients gave the response “Did not occur” for all the items in those domains.

Table 1 DSC-R scaling test results

| DSC-R domains | Item convergent validity* | Internal consistency (Cronbach’s alpha)† | % scoring at floor | % scoring at ceiling |
|---|---|---|---|---|
| Psychological-fatigue | 0.64–0.78 | 0.87 | 31.6 | 0.3 |
| Psychological-cognitive | 0.48–0.70 | 0.79 | 38.5 | 0.1 |
| Neuropathic pain | 0.51–0.63 | 0.76 | 60.2 | 0.1 |
| Neuropathic sensory | 0.56–0.70 | 0.84 | 51.3 | 0 |
| Cardiovascular | 0.44–0.55 | 0.69 | 44.8 | 0 |
| Ophthalmologic | 0.48–0.74 | 0.85 | 53.2 | 0 |
| Hypoglycemic | 0.50–0.70 | 0.77 | 51.8 | 0.1 |
| Hyperglycemic | 0.46–0.71 | 0.79 | 27.8 | 0.2 |
| Total score | n/a | 0.94 | 9 | 0 |

*Correlation between each item and its own scale (corrected for overlap) should be ≥0.40 [18].
†Cronbach’s alpha should be ≥0.70 for acceptable internal consistency reliability [19].
DSC-R, Diabetes Symptoms Checklist-Revised.

Zero-order interdomain correlations ranged from r = 0.39 (between Neuropathic Sensory and Hyperglycemic domains) to r = 0.71 (between Psychological Fatigue and Psychological Cognitive domains). Neuropathic Sensory and Neuropathic Pain domains showed a zero-order correlation of 0.62. Thus, we believe that the domains are related but not redundant.

Correlations between the DSC-R scale scores and the SF-36 scale scores ranged from -0.22 to -0.69 and were mostly small to moderate (Table 2). Scales measuring concepts that would be expected to be related correlated the most highly. For example, the highest correlation was between the DSC-R Psychological Fatigue domain and the SF-36 Vitality scale (r = -0.69), both of which measure fatigue/vitality. Similarly, the DSC-R Neuropathic Pain domain correlated most highly with the SF-36 Bodily Pain scale (r = -0.38), and the DSC-R Hypoglycemic domain (which includes items asking about mood and irritability) correlated most highly with the SF-36 Mental Health scale (r = -0.45) and the Mental Component Scale (r = -0.45).

Table 2 Concurrent validity: Correlation coefficients between DSC-R scores and SF-36 scale scores (N = 4286). Cells show the correlation coefficient* with the number of patients in parentheses; rows are SF-36 scales and columns are DSC-R domains.

| SF-36 scales | Psychological-fatigue | Psychological-cognitive | Neuropathic pain | Neuropathic sensory | Cardiovascular | Ophthalmologic | Hypoglycemic | Hyperglycemic | Total |
|---|---|---|---|---|---|---|---|---|---|
| Bodily pain | -0.43 (4256) | -0.37 (4259) | -0.38 (4275) | -0.37 (4270) | -0.37 (4272) | -0.27 (4274) | -0.31 (4278) | -0.29 (4255) | -0.47 (4230) |
| General health | -0.46 (4245) | -0.37 (4248) | -0.28 (4264) | -0.25 (4259) | -0.34 (4261) | -0.23 (4263) | -0.32 (4267) | -0.30 (4244) | -0.45 (4219) |
| Mental health | -0.45 (4249) | -0.39 (4252) | -0.25 (4267) | -0.23 (4262) | -0.30 (4264) | -0.23 (4266) | -0.45 (4270) | -0.28 (4247) | -0.46 (4223) |
| Physical functioning | -0.41 (4242) | -0.32 (4244) | -0.34 (4260) | -0.28 (4255) | -0.41 (4257) | -0.23 (4260) | -0.25 (4264) | -0.27 (4241) | -0.43 (4216) |
| Role emotional | -0.41 (4232) | -0.38 (4249) | -0.23 (4250) | -0.25 (4244) | -0.29 (4246) | -0.25 (4249) | -0.35 (4253) | -0.26 (4230) | -0.41 (4206) |
| Role physical | -0.45 (4211) | -0.37 (4212) | -0.30 (4229) | -0.28 (4223) | -0.32 (4225) | -0.25 (4228) | -0.30 (4233) | -0.28 (4209) | -0.44 (4186) |
| Social functioning | -0.46 (4256) | -0.42 (4259) | -0.32 (4275) | -0.29 (4270) | -0.35 (4272) | -0.27 (4274) | -0.42 (4278) | -0.28 (4255) | -0.49 (4230) |
| Vitality | -0.69 (4250) | -0.52 (4253) | -0.33 (4268) | -0.30 (4263) | -0.44 (4265) | -0.29 (4267) | -0.42 (4271) | -0.36 (4248) | -0.61 (4224) |
| Mental component summary | -0.48 (4177) | -0.42 (4178) | -0.22 (4194) | -0.22 (4188) | -0.29 (4190) | -0.24 (4193) | -0.45 (4198) | -0.28 (4174) | -0.47 (4152) |
| Physical component summary | -0.47 (4177) | -0.36 (4178) | -0.38 (4194) | -0.33 (4188) | -0.42 (4190) | -0.26 (4193) | -0.25 (4198) | -0.31 (4174) | -0.48 (4152) |

*Spearman’s correlation, P < 0.0001 for all correlations examined.
DSC-R, Diabetes Symptoms Checklist-Revised; SF-36, Short-Form 36.

Known groups validity results are summarized in Table 3. There were statistically significant differences in all DSC-R scale scores among the four groups determined by HbA1c level (P < 0.05). For all DSC-R scales, among the three groups <6%, 6% to 6.9%, and 7% to 7.9%, there was a clear linear progression of higher DSC-R scores for higher HbA1c levels. However, with the exceptions of the Neuropathic pain and Ophthalmologic scales, the ≥8% group did not continue this progression: for all other DSC-R scales, scores for this group were slightly lower than for the 7% to 7.9% group. For all DSC-R scales, there was a linear pattern of higher DSC-R scores (indicating greater symptom bother) for patients with higher BMI, with statistically significant differences among the groups (P < 0.0001).

Responsiveness results are summarized in Table 4. Analysis of changes in DSC-R scale scores from baseline to year one revealed the expected pattern, but changes were very small for all groups. Patients who believed they had “much worse health” compared to baseline had worse scores on the DSC-R scales compared to baseline (effect sizes ranged from 0.18 to 0.51). Patients who reported their health was “about the same” had negligible changes in their DSC-R scores between baseline and 1-year follow-up (effect sizes ranged from -0.12 to 0.10), and there were small or negligible improvements in DSC-R scores in patients who believed their health was much better (effect sizes ranged from -0.32 to -0.08). This pattern was observed across all scales, except the “Hyperglycemic” scale.

MID were estimated to range from 0.39 to 0.60 points across domains when estimated using distribution methods (0.5 of a standard deviation) and from 0.00 to 0.33 points when estimated using anchor-based methods (change for patients who reported that they were “somewhat better” or “somewhat worse” on the SF-36 item 2). Therefore, a conservative approach would be to adopt the distribution-based MID for all domains.
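The CFA fit criteria stated in the Analysis section can be compared mechanically with the four fit indices reported in these Results. The sketch below is only an illustration of that comparison; `FIT_CRITERIA`, `REPORTED`, and `check_fit` are hypothetical names, and the cutoffs and values are copied from the text. Applied literally, the stated cutoffs classify the reported GFI as meeting its criterion and the RMR, CFI, and RMSEA as falling short.

```python
# Cutoffs as stated in the Analysis section: (direction, threshold).
FIT_CRITERIA = {
    "GFI": (">", 0.90),
    "RMR": ("<", 0.05),
    "CFI": (">", 0.95),
    "RMSEA": ("<", 0.05),
}

# Fit indices as reported in the Results.
REPORTED = {"GFI": 0.9022, "RMR": 0.0522, "CFI": 0.9029, "RMSEA": 0.055}


def check_fit(indices, criteria=FIT_CRITERIA):
    """Return, for each index, whether it satisfies its stated cutoff."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = criteria[name]
        results[name] = value > cutoff if direction == ">" else value < cutoff
    return results


def shortfall(indices, criteria=FIT_CRITERIA):
    """Absolute distance between each index and its cutoff, for judging how narrow a miss is."""
    return {name: abs(value - criteria[name][1]) for name, value in indices.items()}
```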

Conclusions

The ADOPT study provided an excellent opportunity to examine the psychometric properties of the DSC-R in a large sample of type 2 diabetes patients. Analysis of data gathered from this study confirmed that the DSC-R has good validity and reliability.

Construct validity testing through CFA and multitrait analyses confirmed the validity of the DSC-R item-domain structure, with all items loading on and correlating with the domain in which they were included. A priori criteria for “goodness of fit” were not quite met, but in all cases the criteria were only narrowly missed. Furthermore, it could be argued that different symptom clusters should not necessarily be expected to be (strongly) related. For example, a patient experiencing ophthalmologic symptoms (and associated distress) would not necessarily be expected to be experiencing cardiovascular symptoms. The Cardiovascular domain was the only domain that failed (narrowly) to meet the criterion for internal consistency, but this is unsurprising given that the scale includes items relating to shortness of breath and pain/palpitations in the heart, which would not necessarily be expected to be closely related.

Both correlations among the factors of the CFA (shown in Fig. 2) and zero-order Pearson’s correlations between the domains (reported in the Results) were examined. For the “psychological fatigue” and “psychological cognitive” domains, there was a considerable difference between the zero-order or “raw” correlation between the domains (0.71) and the “adjusted” interdomain correlation in the CFA model (0.33). This difference is most likely explained by the fact that these two factors are highly related to the Total score (factor loading with Total score = 0.81 for “psychological fatigue” and 0.84 for “psychological cognitive”). As the Total score explains a large part of the variance shared by these two factors, it leaves a “residual” correlation of only 0.33 between the “psychological” domains in the model. By contrast, the raw correlation between the “neuropathic pain” and “neuropathic sensoric” domains is close to the correlation given by the model (0.62 and 0.60, respectively), and the loadings of these “neuropathic” domains on the Total score are lower than those of the “psychological” domains (0.68 and 0.66 vs. 0.81 and 0.84). This shows that the two “neuropathic” domains may have a strong “residual” correlation independent of the Total score.

The relatively high correlation between the two “neuropathic” domains could indicate the presence of a higher-order factor. However, it was deliberately decided not to evaluate this as part of the model, because the objective was to evaluate the fit of the original model of the DSC-R and not to try to find the best structure. Furthermore, it could also be argued that the two domains should be left separate as they are distinct in terms of item content: one focuses on neuropathic pain while the other asks about other neuropathic sensations. As the model fit was acceptable with this specification and the other psychometric results of the two separate domains were good, it was decided to keep these two distinct (but correlated) domains separate. Future validation of the DSC-R could further explore whether these domains are best kept separate or if they should be combined into a single higher-order domain.

An examination of correlations between DSC-R scores and SF-36 scale scores suggested that the DSC-R has acceptable concurrent validity. Domains that one would expect to be related correlated more highly than domains measuring less closely related concepts. Correlations were generally small or moderate, but this is unsurprising given that the DSC-R is a disease-specific measure of symptom distress and the SF-36 is a generic measure

Table 3 Known groups validity of DSC-R total score and by domain. Cells show mean (SD), n.

HbA1c levels

| DSC-R domains | <6% | 6–6.9% | 7–7.9% | ≥8% | P-value (ANOVA) |
|---|---|---|---|---|---|
| Psychological-fatigue | 1.07 (1.09), n = 168 | 1.16 (1.21), n = 1274 | 1.27 (1.2), n = 1885 | 1.24 (1.19), n = 838 | 0.0035 |
| Psychological-cognitive | 0.61 (0.85), n = 167 | 0.78 (0.93), n = 1278 | 0.86 (0.98), n = 1883 | 0.84 (0.95), n = 840 | 0.001 |
| Neuropathic pain | 0.36 (0.69), n = 168 | 0.42 (0.73), n = 1280 | 0.49 (0.78), n = 1892 | 0.52 (0.83), n = 842 | 0.0009 |
| Neuropathic sensory | 0.42 (0.76), n = 167 | 0.54 (0.8), n = 1279 | 0.61 (0.84), n = 1889 | 0.59 (0.85), n = 843 | 0.0008 |
| Cardiovascular | 0.42 (0.62), n = 167 | 0.58 (0.78), n = 1278 | 0.67 (0.78), n = 1892 | 0.61 (0.79), n = 843 | <0.0001 |
| Ophthalmologic | 0.4 (0.67), n = 168 | 0.55 (0.83), n = 1279 | 0.58 (0.85), n = 1893 | 0.64 (0.87), n = 841 | 0.0004 |
| Hypoglycemic | 0.61 (0.93), n = 168 | 0.72 (0.98), n = 1280 | 0.79 (1.01), n = 1894 | 0.77 (1), n = 843 | 0.0151 |
| Hyperglycemic | 0.95 (1.07), n = 167 | 1.15 (1.14), n = 1272 | 1.23 (1.16), n = 1886 | 1.19 (1.13), n = 838 | 0.0048 |
| Total score | 0.6 (0.59), n = 167 | 0.74 (0.68), n = 1264 | 0.81 (0.72), n = 1870 | 0.8 (0.71), n = 831 | <0.0001 |

BMI

| DSC-R domains | <25 | 25–29.9 | 30–39.9 | ≥40 | P-value (ANOVA) |
|---|---|---|---|---|---|
| Psychological-fatigue | 1.04 (1.11), n = 332 | 1.05 (1.12), n = 1428 | 1.27 (1.21), n = 2039 | 1.62 (1.34), n = 451 | <0.0001 |
| Psychological-cognitive | 0.69 (0.91), n = 335 | 0.70 (0.87), n = 1429 | 0.87 (0.98), n = 2036 | 1.04 (1.04), n = 453 | <0.0001 |
| Neuropathic pain | 0.37 (0.69), n = 338 | 0.38 (0.70), n = 1435 | 0.49 (0.78), n = 2042 | 0.67 (0.91), n = 454 | <0.0001 |
| Neuropathic sensory | 0.45 (0.76), n = 337 | 0.47 (0.75), n = 1433 | 0.60 (0.82), n = 2040 | 0.85 (1.00), n = 454 | <0.0001 |
| Cardiovascular | 0.41 (0.65), n = 338 | 0.50 (0.71), n = 1433 | 0.67 (0.79), n = 2040 | 0.92 (0.90), n = 455 | <0.0001 |
| Ophthalmologic | 0.50 (0.79), n = 335 | 0.50 (0.77), n = 1435 | 0.61 (0.87), n = 2045 | 0.73 (0.92), n = 453 | <0.0001 |
| Hypoglycemic | 0.59 (0.89), n = 337 | 0.63 (0.91), n = 1433 | 0.80 (1.00), n = 2049 | 1.04 (1.16), n = 453 | <0.0001 |
| Hyperglycemic | 0.96 (1.04), n = 336 | 1.02 (1.06), n = 1430 | 1.24 (1.15), n = 2032 | 1.59 (1.30), n = 451 | <0.0001 |
| Total score | 0.63 (0.64), n = 330 | 0.65 (0.63), n = 1415 | 0.82 (0.71), n = 2023 | 1.06 (0.81), n = 448 | <0.0001 |

ANOVA, analysis of variance; BMI, body mass index; DSC-R, Diabetes Symptoms Checklist-Revised.

Table 4 Responsiveness of DSC-R total and subscale scores. Cells show effect size† (n) for each change group*.

| DSC-R domains | “Much better” | “Somewhat better” | “About the same” | “Somewhat worse” | “Much worse” |
|---|---|---|---|---|---|
| (row label not recoverable) | -0.13 (543) | -0.08 (1009) | -0.01 (1683) | 0.22 (215) | 0.25 (20) |
| (row label not recoverable) | -0.21 (535) | -0.10 (1001) | 0.03 (1664) | 0.33 (211) | 0.51 (19) |
| (row label not recoverable) | -0.10 (539) | -0.02 (1013) | 0.10 (1678) | 0.21 (214) | 0.19 (20) |
| (row label not recoverable) | -0.08 (541) | -0.00 (1009) | 0.06 (1680) | 0.21 (216) | 0.18 (20) |
| (row label not recoverable) | -0.18 (542) | -0.08 (1010) | 0.06 (1681) | 0.29 (216) | 0.40 (21) |
| (row label not recoverable) | -0.16 (540) | -0.07 (1013) | -0.01 (1681) | 0.06 (214) | 0.39 (20) |
| Hypoglycemic | -0.13 (543) | -0.06 (1011) | 0.03 (1682) | 0.22 (215) | 0.24 (20) |
| Hyperglycemic | -0.32 (538) | -0.20 (1015) | -0.12 (1686) | -0.07 (214) | 0.24 (21) |
| Total score | -0.22 (526) | -0.11 (986) | 0.02 (1645) | 0.25 (209) | 0.48 (19) |

The six rows without a label belong to the Psychological-fatigue, Psychological-cognitive, Neuropathic pain, Neuropathic sensory, Cardiovascular, and Ophthalmologic domains and are listed in the order in which they appear in the source.

*Defined by responses to SF-36 item 2 “Compared to one year ago, how would you rate your health in general now?”
†Mean DSC-R change score between baseline and year 1 divided by the standard deviation at baseline.
DSC-R, Diabetes Symptoms Checklist-Revised; SF-36, Short-Form 36.

of health status. Indeed, the relatively small correlations provide support for the value of a disease-specific measure such as the DSC-R.

Analysis of DSC-R scores according to differences in clinical variables supported the discriminative or “known groups” validity of the instrument. DSC-R scores were higher among patients with higher BMI levels. Interestingly, a recent study in nondiabetic obese and nonobese participants also found significantly higher DSC-R scores in the obese subjects, further confirming the discriminative validity of the measure [23]. The finding that a linear relationship between DSC-R scores and HbA1c levels was not observed above HbA1c values of 7% to 8% is intriguing and deserves further investigation.

MID was estimated using both anchor and distribution methods, with the anchor-based estimates resulting in much smaller MIDs than the distribution-based estimates. The applicability of the anchor-based results may be limited by the appropriateness of the anchors used. Ideally, an anchor more closely related to diabetes symptom distress would have been used. In the absence of such an anchor, the SF-36 change item was used; however, the results should be interpreted with caution due to the questionable appropriateness of this anchor. A conservative approach would be to accept the MID estimated based on the distribution approach of 0.5 of the baseline standard deviation. This is more conservative because the distribution approach resulted in the largest MID and would therefore require a greater change in score in order to conclude that a meaningful difference had been achieved.

The study has a number of limitations that need to be mentioned. First, the study was performed in recently diagnosed participants (less than 3 years) who still had few diabetes-related symptoms. This was reflected in the high percentages of patients who scored at floor. Despite the changes being very small, there were still clear differences in DSC-R change scores between subjects who reported changes in their health and those who reported their health to be unchanged. Further evaluation of changes over time in a sample of patients who have more severe diabetes symptoms is recommended. Second, because the psychometric validation was performed post hoc using clinical trial data, test–retest reliability could not be assessed and should be evaluated in future studies. Third, due to the post hoc nature of the study, we were constrained in the other PRO and clinical measures available for validation of the DSC-R. In particular, the criterion used to define change groups in the responsiveness analysis was less than optimal; further testing of responsiveness using a change criterion focused on changes in diabetes symptoms (as opposed to general health) is recommended as a priority.

In summary, the evidence reported here suggests that the DSC-R has acceptable reliability, validity, and sensitivity to changes over time, thus making it a suitable measure of diabetes symptom burden for use in clinical research involving patients with type 2 diabetes.

Acknowledgments

The authors would like to acknowledge the contribution of the ADOPT steering committee in designing the ADOPT trial and collecting the data as well as commenting on a draft of the article.

Source of financial support: This article was supported by a grant from GlaxoSmithKline Pharmaceuticals.

Competing interests: KV was and BA is an employee of GlaxoSmithKline. LH, RA, JS, and MV are all employees of Mapi Values and were contracted by GlaxoSmithKline to conduct the study. FS has received financial compensation from GSK to consult on this validation project.

Authors’ Contributions

RA and JS designed the study, interpreted the results, and participated in the writing of the article. LH wrote the first draft of the article. MV wrote the first draft of the analysis plan, performed the psychometric validation analysis, and reviewed the article. KV and BA were involved in the design of the study, reviewed the analysis plan, interpreted the results, and reviewed the article. FS reviewed the analysis plan, interpreted the results, and reviewed and participated in the writing of the article.

References

1 Rubin RR, Peyrot M. Quality of life and diabetes. Diabetes Metab Res Rev 1999;15:205–18.
2 Grootenhuis PA, Snoek FJ, Heine RJ, Bouter LM. Development of a type 2 diabetes symptom checklist: a measure of symptom severity. Diabet Med 1994;11:253–61.
3 American Diabetes Association Diabetes Symptoms. Available from: http://www.diabetes.org/diabetes-symptoms.jsp [Accessed October, 2007].
4 Valk GD, Grootenhuis PA, Bouter LM, Bertelsmann FW. Complaints of neuropathy related to the clinical and neurophysiological assessment of nerve function in patients with diabetes mellitus. Diabetes Res Clin Pract 1994;26:29–34.
5 Van der Does FE, De Neeling JN, Snoek FJ, et al. Symptoms and well-being in relation to glycemic control in type II diabetes. Diabetes Care 1996;19:204–10.
6 de Sonnaville JJ, Snoek FJ, Colly LP, et al. Well-being and symptoms in relation to insulin therapy in type 2 diabetes. Diabetes Care 1998;21:919–24.
7 Adriaanse MC, Dekker JM, Spijkerman AM, et al. Health-related quality of life in the first year following diagnosis of Type 2 diabetes: newly diagnosed patients in general practice compared with screening-detected patients. The Hoorn Screening Study. Diabet Med 2004;21:1075–81.
8 Adriaanse MC, Dekker JM, Spijkerman AM, et al. Diabetes-related symptoms and negative mood in participants of a targeted population-screening program for type 2 diabetes: the Hoorn Screening Study. Qual Life Res 2005;14:1501–9.
9 Fischer JS, McLaughlin T, Loza L, et al. The impact of insulin glargine on clinical and humanistic outcomes in patients uncontrolled on other insulin and oral agents: an office-based naturalistic study. Curr Med Res Opin 2004;20:1703–10.
10 Secnik BK, Matza LS, Oglesby A, et al. Patient-reported outcomes in a trial of exenatide and insulin glargine for the treatment of type 2 diabetes. Health Qual Life Outcomes 2006;4:80.
11 Vinik AI, Zhang Q. Adding insulin glargine versus rosiglitazone: health-related quality-of-life impact in type 2 diabetes. Diabetes Care 2007;30:795–800.
12 Kahn SE, Haffner SM, Heise MA, et al. Glycemic durability of rosiglitazone, metformin, or glyburide monotherapy. N Engl J Med 2006;355:2427–43.
13 McHorney CA, Kosinski M, Ware JE Jr. Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care 1994;32:551–67.
14 McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995;4:293–307.
15 McHorney CA, Ware JE Jr, Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994;32:40–66.
16 McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247–63.
17 Shumaker RE, Lomax RG. A Beginner’s Guide to Structural Equation Modeling. Mahwah, NJ: Lawrence Erlbaum Associates Publishers, 1996.
18 Hays R, Hayashi T. Beyond internal consistency reliability: rationale and user’s guide for multitrait analysis program on the microcomputer. Behav Res Methods Instrum Comput 1990;22:167–75.
19 Nunnally JC, Bernstein IH. Psychometric Theory. New York: McGraw-Hill, 1994.
20 Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press, 1988.
21 Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res 1993;2:221–6.
22 Middel B, Stewart R, Bouma J, et al. How to validate clinically important change in health-related functional status. Is the magnitude of the effect size consistently related to magnitude of change as indicated by a global question rating? J Eval Clin Pract 2001;7:399–410.
23 Matza LS. Obese versus non-obese patients with type 2 diabetes: patient reported outcomes and utility of weight change. Curr Med Res Opin 2007;23:2051–62.