An integrated method to determine meaningful changes in health-related quality of life


Journal of Clinical Epidemiology 57 (2004) 1153–1160

Ross D. Crosby (a, b, *), Ronette L. Kolotkin (c, d), G. Rhys Williams (e)

a Neuropsychiatric Research Institute, 700 First Avenue South, Fargo, North Dakota, 58107, USA
b University of North Dakota School of Medicine and Health Sciences, 1919 Elm St North, Fargo, North Dakota, 58102, USA
c Obesity and Quality of Life Consulting, 1400 Norwood Ave, Durham, North Carolina, 27707, USA
d Department of Community and Family Medicine, Duke University Medical Center, PO Box 2914, Durham, North Carolina, 27710, USA
e Bristol-Myers Squibb, PO Box 4000, Princeton, New Jersey, 08543-4000, USA

Accepted 14 April 2004

Abstract

Objective: We describe an integrated method for determining meaningful change in health-related quality of life (HRQOL) that combines information from anchor-based and distribution-based methods and illustrate this method using data aggregated from weight loss studies.

Study Design and Setting: A total of 1476 participants in weight loss studies were evaluated at baseline and at 6 months using the Impact of Weight on Quality of Life-Lite (IWQOL-Lite). Severity of baseline impairment was determined by comparing scores with those obtained from a normative sample of 534 normal/overweight individuals. The precision of the IWQOL-Lite was evaluated using standard error of measurement corrected for regression to the mean. Weight loss was used as an anchor for evaluating changes in IWQOL-Lite scores.

Results: Change in HRQOL varied as a function of weight loss and baseline severity of HRQOL. Using this integrated method, an improvement of 7.7 to 12 points (depending on baseline severity) on IWQOL-Lite total score is considered meaningful.

Conclusion: Meaningful change in HRQOL can be determined using an integrated method that (1) combines information from anchor-based and distribution-based methods, (2) reconciles discrepancies between these two methods, and (3) adjusts for baseline severity and regression to the mean. This method may be applied to other types of HRQOL measures and conditions.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Meaningful change; Clinically important differences; Health-related quality of life; IWQOL-Lite; Weight loss; Obesity

1. Introduction

There is growing interest in the field of obesity in health-related quality of life (HRQOL) [1–4]. Measures of HRQOL are increasingly being used to evaluate treatments for obesity, to make therapeutic decisions about the initiation and type of obesity treatment, and to allocate clinical and research resources. Obesity has been consistently linked to impairments in important aspects of HRQOL, including physical health, emotional well-being, and psychosocial functioning [5–7]. Further, weight loss has been associated with improvements in HRQOL [8–11]. However, little work has examined whether these observed improvements in HRQOL after weight loss are meaningful.

* Corresponding author. Neuropsychiatric Research Institute, 700 First Avenue South, Fargo, ND 58107. Tel.: 701-293-1335; fax: 701-293-3226. E-mail address: [email protected] (R.D. Crosby). Portions of this paper were presented at the 12th European Congress on Obesity, Helsinki, Finland, May 2003. 0895-4356/04/$ – see front matter © 2004 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2004.04.004

It has long been recognized that traditional statistical methods that are used to evaluate treatment efficacy are inadequate for addressing issues of the clinical significance of the effects of that treatment [12–15]. As noted by Jacobson and Truax [15], “Whether a treatment effect exists in the statistical sense has little to do with the clinical significance of the effect.” Statistical effects are those that occur beyond some level of chance. In contrast, the clinical significance of that effect refers to the benefits derived from that treatment, its impact upon the patient, and its implications for treatment of the patient [15–17].

A number of methods have been proposed for establishing clinical significance in HRQOL [17]. These methods can be broadly classified as anchor-based methods or distribution-based methods. Anchor-based methods [18] compare changes in HRQOL outcome with other measures or known phenomena that have clinical relevance. Lydick and Epstein [18] have likened this to establishing the construct validity of a measure. Examples of anchors that have been used include global ratings of change [16,19] and comparison to


normative populations [15]. We have suggested the term “criterion-referenced change” to refer to meaningful change established using anchor-based methods [17]. Anchor-based methods do not take into account the measurement precision of the instrument. Consequently, these methods provide no information about the range of change that would be expected by random variation alone.

In contrast to anchor-based methods, distribution-based methods use some statistical property of the sample or the instrument to establish clinically meaningful change. Examples of distribution-based techniques include effect size [20], reliable change index [15], and standard error of measurement (SEM) [11,21]. Several distribution-based methods (e.g., reliable change index, SEM) take into account the measurement precision of the instrument. We have suggested the term “precision-referenced change” to describe meaningful change established using distribution-based measures of instrument precision [17]. A disadvantage of distribution-based methods is that there are few agreed-upon benchmarks for establishing clinically significant change. In addition, distribution-based methods alone do not provide a good sense of the clinical relevance of the observed change.

In our recent review [17], we identified the need for an integrated system of determining meaningful change that combines information from anchor-based and distribution-based methods and takes other relevant factors, such as baseline impairment in HRQOL and regression to the mean (RTM), into account. We are aware of only two previous attempts to combine anchor-based and distribution-based methods for determining meaningful change in HRQOL. Cella et al. [22] used anchor-based and distribution-based information to determine meaningful change in HRQOL among cancer patients. Concordance rates between these two methods were found to be high, supporting the validity of their established cutoffs. 
No guidelines were provided for resolving discrepancies in classification between these methods when they did occur. Jacobson and Truax [15] describe a method that integrates anchor-based and distribution-based approaches to determine meaningful change. As the anchor-based criteria, they propose comparing post-treatment functioning to characteristics of a known functional population or a known dysfunctional population of relevance. As the distribution-based measure, they propose the reliable change index for determining whether the change after treatment exceeds the limits of random variation. An individual is considered to be improved (or deteriorated) only when they meet the anchor-based criteria and distribution-based criteria for establishing clinically meaningful change. Thus, information from both of these methods is used to establish cutoffs to determine meaningful change. Several investigators have raised the question of whether individuals with more severe impairments in HRQOL require a greater change to be considered meaningful than those with less severe impairments [17,23–24]. Few studies have examined this issue directly. Stratford et al. [25] reported that patients with more severe low back pain required

greater change to be considered “clinically important” than those with less severe pain. We have previously reported [26] that among obese individuals experiencing comparable weight loss, those with more severe initial impairments in HRQOL reported greater improvement than those with less severe impairments. Finally, in a longitudinal study of individuals losing and subsequently regaining at least 5% of their initial weight, Engel et al. [27] reported that those with more severe initial impairments in HRQOL experienced greater improvements during weight loss and greater deterioration during weight regain than those with less severe impairments. Taken together, these findings suggest the importance of considering baseline HRQOL impairments in establishing cutoffs for determining meaningful change.

A second issue of relevance, related to that of baseline severity, is RTM. RTM is a statistical error-based artifact describing the tendency of extreme scores to become less extreme at follow-up. RTM is typically established by showing a correlation between baseline distance from the mean (typically a normative mean) and subsequent change. When present, RTM has implications for establishing criteria for determining meaningful change because individuals with more extreme scores improve, on average, more than individuals with less extreme scores. Several distribution-based methods have been proposed that take RTM into account [28–30]. All of these methods share the feature that different thresholds are established for determining meaningful change based on the initial distance from the normative mean.

The purpose of this article is to describe and illustrate an integrated method of determining meaningful change in HRQOL in response to weight loss treatment. Similar to the methods described by Cella et al. [22] and Jacobson and Truax [15], the current method combines information from anchor-based and distribution-based techniques. 
In addition, this integrated method takes into account the severity of baseline impairment in HRQOL and RTM and provides a systematic method for resolving discrepancies between anchor-based and distribution-based classification.

2. Methods

2.1. Subjects

Subjects were 1,476 (1,101 women, 375 men) obese (body mass index [BMI] ≥30) individuals participating in one of eight weight loss studies. These studies included an open-label trial combining phentermine-fenfluramine and dietary counseling (n = 181) [11]; four double-blind, placebo-controlled trials of sibutramine (n = 469) [31]; a naturalistic weight loss study in a managed care setting (n = 337) [32]; a double-blind, placebo-controlled trial of bupropion (n = 232) [33]; and a randomized controlled trial comparing a self-help and a structured commercial weight loss program (n = 257) [34]. Subjects assigned to both active medication and placebo in the placebo-controlled trials were included to provide a full range of weight loss/gain.


2.2. Measures

2.2.1. Impact of Weight on Quality of Life-Lite

The Impact of Weight on Quality of Life-Lite (IWQOL-Lite) [35] is a 31-item self-report measure that assesses obesity-specific HRQOL. The IWQOL-Lite contains items from five domains (physical function, self-esteem, sexual life, public distress, and work) and provides scores for separate domains and a total score. Internal consistency coefficients for the IWQOL-Lite have ranged from 0.90 to 0.96 [35] and test-retest coefficients from 0.81 to 0.94 [36]. Confirmatory factor analysis has been used to verify the scale structure of the IWQOL-Lite [35]. Improvements in IWQOL-Lite scores have been shown to correlate with weight loss [11,35]. All IWQOL-Lite scores are based on 0 to 100 scoring, with 0 representing the poorest and 100 the best quality of life.

2.3. Procedures

Participants completed the IWQOL-Lite at entry into their respective studies and again at 6 months. Height, weight, and BMI data were obtained for each subject at both points in time.

2.4. Statistical Analyses

Initial severity in obesity-specific HRQOL was determined by comparing baseline IWQOL-Lite total scores with scores obtained from a normative sample consisting of 534 normal weight and overweight (BMI = 18–29.9) individuals (238 women, 296 men) not currently enrolled in any weight loss treatment program [37]. The following criteria were used to determine baseline severity of HRQOL: none, <1 standard deviation (SD) below normative mean; mild, ≥1 but <2 SD from normative mean; moderate, ≥2 but <3 SD from normative mean; and severe, ≥3 SD from normative mean. Differences in BMI across severity categories were compared using one-way ANOVA. Changes in HRQOL over the 6-month interval were calculated as the difference between the 6-month and baseline IWQOL-Lite total scores, so that positive values indicate improvement. Effect sizes were calculated by dividing the 6-month change in IWQOL-Lite total score by the SD of the entire sample at baseline (17.7). 
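As a rough illustration of the severity bands and the effect size defined above, the following Python sketch classifies a baseline total score and scales a change score. The normative mean (94.7) and SD (7.6) are the values reported in Section 3.2, and 17.7 is the baseline SD of the full sample; the function and constant names are hypothetical and are not part of any IWQOL-Lite scoring material.

```python
# Illustrative sketch only: names are assumptions, values come from the article.
NORM_MEAN = 94.7     # normative IWQOL-Lite total score mean (Section 3.2)
NORM_SD = 7.6        # normative IWQOL-Lite total score SD (Section 3.2)
BASELINE_SD = 17.7   # SD of the entire study sample at baseline


def severity(baseline_score, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Return the baseline impairment band for a total score (higher = better)."""
    sd_below = (norm_mean - baseline_score) / norm_sd
    if sd_below < 1:
        return "none"
    if sd_below < 2:
        return "mild"
    if sd_below < 3:
        return "moderate"
    return "severe"


def effect_size(change, baseline_sd=BASELINE_SD):
    """Scale a 6-month change score by the baseline SD of the whole sample."""
    return change / baseline_sd


if __name__ == "__main__":
    for score in (93.6, 83.0, 76.0, 56.1):
        print(score, severity(score))
    print(round(effect_size(12.0), 2))
```

With these constants the band boundaries fall at roughly 87.1, 79.5, and 71.9 points on the total score.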
Participants were classified in terms of 6-month weight loss into the following categories: (1) 0.1% to 9.9% gain, (2) 0% to 4.9% loss, (3) 5% to 9.9% loss, and (4) 10%+ loss. A two-way ANOVA was performed comparing HRQOL change in terms of baseline HRQOL impairment (none, mild, moderate, severe) and weight loss category. Partial η² coefficients were calculated for main effects and interactions to determine the unique portion of criterion variance accounted for by each term.

The precision of the IWQOL-Lite was evaluated based on the SEM. The SEM was calculated by the following formula [38]:

$$\mathrm{SEM} = SD_t \sqrt{1 - r_t}$$

where $SD_t$ is the standard deviation of the sample and $r_t$ is


the reliability of the instrument. The SEM is often used to evaluate the differences between scores of individuals taken at different times and is thus “particularly well suited to the interpretation of individual scores” [38]. The SEM is best considered a property of the measurement instrument rather than a characteristic of a sample. Unlike the SD and the reliability coefficient of the instrument, which vary from sample to sample, the SEM remains relatively constant across homogeneous and heterogeneous samples [38–39]. Based upon the recommendation of Hageman and Arrindell [29] for calculating SEM, a single fixed value of the IWQOL-Lite SEM was assumed. This approach has the advantage of allowing comparison of results across studies. As internal consistency coefficients were available for the IWQOL-Lite based on a much larger sample than test-retest coefficients, values for $SD_t$ and $r_t$ were calculated using the current IWQOL-Lite database of 3,643 respondents [37].

Baseline to 6-month change scores were adjusted for RTM using the Edwards-Nunnally method [28,40]. The Edwards-Nunnally method classifies pre-post change scores as improved or deteriorated based on confidence intervals (CIs) calculated using the SEM. However, unlike the SEM method, which centers the CI on the pretest score ($x_1$), the CI for the Edwards-Nunnally method is centered around the estimated unbiased (referred to as the “true”) pretest score corrected for RTM. Consequently, an individual’s estimated true score would be closer to the mean of the group than to the actual pretest score, unless the test is perfectly reliable [41], and the discrepancy between actual and “true” score would be magnified as the unreliability of the measure increases. The Edwards-Nunnally index was calculated as:

$$\frac{t' - x_2}{\mathrm{SEM}}$$

where $t'$, the true score, is $t' = r_t(x_1 - M_G) + M_G$, in which $r_t$ is the reliability coefficient of the measure, and $M_G$ is the mean of the normative group toward which the scores are assumed to regress. 
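The definitions above can be rearranged into cutoffs on the change score itself. The following is a worked rearrangement rather than a formula taken from the article: it assumes a 95% interval (z = 1.96) and that higher scores indicate better quality of life, and the symbol $\Delta$ for the change score $x_2 - x_1$ is introduced here only for illustration.

```latex
\begin{align*}
t' - x_1 &= r_t(x_1 - M_G) + M_G - x_1 = (1 - r_t)(M_G - x_1) \\
\text{improved:}\quad x_2 > t' + 1.96\,\mathrm{SEM}
  &\iff \Delta > (1 - r_t)(M_G - x_1) + 1.96\,\mathrm{SEM} \\
\text{deteriorated:}\quad x_2 < t' - 1.96\,\mathrm{SEM}
  &\iff \Delta < (1 - r_t)(M_G - x_1) - 1.96\,\mathrm{SEM}
\end{align*}
```

The shift term $(1 - r_t)(M_G - x_1)$ vanishes only when $r_t = 1$ or $x_1 = M_G$; otherwise both cutoffs move by the same amount in the direction of the normative mean.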
In contrast to the CI based on the SEM, the Edwards-Nunnally CI is asymmetrical around the actual pre-test score ($x_1$). Thus, if the pretest score is below the normative mean (assuming higher scores indicate better quality of life), the post-test score must be farther from the pretest score (i.e., greater improvement is required) to be considered improved for the Edwards-Nunnally method than for the SEM method. The opposite pattern is true for determining meaningful deterioration. Threshold values of ±1.96 were used based on the 95% CI to establish cutoffs for improvement and deterioration, respectively. Individuals falling within that interval were classified as unchanged.

The mean HRQOL change for participants losing 5% to 9.9% within each baseline HRQOL severity classification was used as the anchor for determining criterion-referenced improvement. The 5% to 9.9% weight loss anchor was chosen based on recommendations made by the Food and Drug Administration [42] and on recommendations made


by recent reviewers of the weight loss literature [43–44]. Individuals with change scores greater than that cutoff were considered to be improved, and individuals with change scores below this cutoff were considered to be unchanged. Because <2% of all participants gained 5% or more during the 6-month interval, no criterion-referenced cutoffs were established for deterioration.

Discrepancies between criterion-referenced and precision-referenced (corrected for RTM) improvement cutoffs were resolved by choosing the larger of the two cutoffs for any given baseline IWQOL-Lite score. Cutoffs for determining meaningful deterioration were based solely on the precision-referenced cutoff. For any given baseline IWQOL-Lite score, individuals with a 6-month IWQOL-Lite change greater than the established improvement cutoff were classified as demonstrating meaningful improvement in HRQOL, individuals with a 6-month IWQOL-Lite change less than the established deterioration cutoff were classified as demonstrating meaningful deterioration in HRQOL, and all others were considered as unchanged.

Spearman’s rank order correlations (rho) were calculated between HRQOL change category (1 = improved, 2 = unchanged, 3 = deteriorated), baseline impairment category (1 = none, 2 = mild, 3 = moderate, 4 = severe), and weight loss category (1 = 0.1% to 9.9% gain, 2 = 0% to 4.9% loss, 3 = 5% to 9.9% loss, 4 = 10%+ loss).

3. Results

Table 1
IWQOL-Lite total score at baseline and change at 6 months

Baseline IWQOL impairment       Baseline        Change (a)      Effect
(Weight loss/gain)              (mean ± SD)     (mean ± SD)     size (b)

None (<1 SD)
  0.1–9.9% gain (n = 66)        93.6 ± 3.3      −1.9 ± 6.3      −0.11
  0–4.9% loss (n = 117)         93.0 ± 3.1      −1.0 ± 8.7      −0.06
  5–9.9% loss (n = 80)          93.2 ± 3.5      −0.4 ± 8.3      −0.02
  10%+ loss (n = 74)            92.9 ± 3.3       1.9 ± 5.2       0.11
Mild (1–2 SD)
  0.1–9.9% gain (n = 52)        83.0 ± 2.2      −0.7 ± 9.3      −0.04
  0–4.9% loss (n = 74)          83.2 ± 2.3       3.8 ± 7.5       0.21
  5–9.9% loss (n = 60)          83.8 ± 2.7       4.7 ± 6.7       0.27
  10%+ loss (n = 70)            83.1 ± 2.2       7.9 ± 8.4       0.45
Moderate (2–3 SD)
  0.1–9.9% gain (n = 32)        76.0 ± 2.1       0.2 ± 7.1       0.01
  0–4.9% loss (n = 77)          75.7 ± 2.0       3.2 ± 9.1       0.18
  5–9.9% loss (n = 47)          76.1 ± 2.0       7.2 ± 7.3       0.41
  10%+ loss (n = 65)            75.6 ± 2.0      10.2 ± 8.5       0.58
Severe (>3 SD)
  0.1–9.9% gain (n = 110)       56.1 ± 11.3      6.1 ± 11.7      0.35
  0–4.9% loss (n = 224)         54.2 ± 13.2     10.1 ± 12.1      0.57
  5–9.9% loss (n = 164)         55.8 ± 11.8     12.0 ± 11.8      0.68
  10%+ loss (n = 164)           57.7 ± 10.4     18.8 ± 13.6      1.06

a Positive change indicates improvement; negative change indicates deterioration.
b Based on standard deviation for all subjects at baseline (17.7).

3.1. Demographics

The sample consisted of 1,101 women (74.6%) and 375 men (25.4%). The average age of the participants was 47.5 (SD = 10.6) with a range of 19 to 79 years. The average BMI for women was 37.0 (SD = 5.3, range = 30–63) and for men was 36.1 (SD = 5.2, range = 30–67). Ethnic background information was not available for the majority of participants.

3.2. Baseline HRQOL impairment

The mean IWQOL-Lite total score for the normative sample was 94.7 (SD = 7.6). Comparison of baseline IWQOL-Lite scores in the current sample to the normative mean resulted in the following classifications of baseline HRQOL impairment: none = 337 (22.8%), mild = 256 (17.3%), moderate = 221 (15.0%), and severe = 662 (44.9%). IWQOL-Lite total scores by baseline severity category are presented in Table 1. Average BMIs for the four severity groups were 34.7 (SD = 3.7), 35.5 (SD = 4.3), 36.8 (SD = 5.2), and 38.2 (SD = 5.9), respectively (F = 40.3; df = 3,1472; P < .001).

3.3. 6-Month weight loss/gain

Average 6-month weight loss was 6.0% (SD = 6.7, range = −8.7% to 30.1%). Frequencies in each weight loss group were 0.1% to 9.9% gain = 260 (17.6%), 0% to 4.9% loss = 492 (33.3%), 5% to 9.9% loss = 351 (23.8%), and 10%+ loss = 373 (25.3%).

3.4. IWQOL-Lite 6-month change

Baseline IWQOL-Lite total scores and 6-month change are presented in Table 1 by baseline HRQOL impairment (none, mild, moderate, severe) and 6-month weight loss (0.1% to 9.9% gain, 0% to 4.9% loss, 5% to 9.9% loss, 10%+ loss). Greater HRQOL change is observed with greater weight loss (F = 36.0; df = 3,1460; P < .001; partial η² = 0.069) and more severe baseline HRQOL impairments (F = 111.0; df = 3,1460; P < .001; partial η² = 0.186). The comparison of η² values indicates that HRQOL change is more strongly associated with baseline HRQOL severity than with amount of weight loss, thereby supporting the emphasis of baseline impairments in the current approach. Further, a significant interaction between weight loss and baseline HRQOL impairment (F = 2.60; df = 9,1460; P = .006; partial η² = 0.016) suggests that the influence of weight change on HRQOL change is minimal with no baseline HRQOL impairment but is substantial with severe impairments.

3.5. Regression to the mean

The Pearson correlation between pretest and post-test scores in the current sample was 0.760. The correlation of baseline IWQOL-Lite total score with change from baseline to 6-month was −0.452 (P < .001). The absolute difference


between baseline score and the normative mean was correlated 0.446 (P < .001) with baseline to 6-month change. These findings suggest considerable regression toward the normative mean. Specifically, individuals with more severe impairments in IWQOL-Lite total score at baseline exhibited greater change in IWQOL-Lite total score at 6 months.

3.6. Precision-referenced change

Cronbach’s alpha for the normative sample was 0.965, and the SD was 20.7, resulting in a SEM for the IWQOL-Lite total score of 3.87. The upper and lower bounds of the 95% Edwards-Nunnally CIs for determining precision-referenced improvement and deterioration were calculated. These cutoffs are presented in Table 2. For reference purposes only, precision-referenced cutoffs not corrected for RTM are also presented. The Edwards-Nunnally cutoffs required to establish precision-referenced improvement range from 7.7 to 10.9 points depending upon baseline HRQOL score. The Edwards-Nunnally cutoffs required to establish precision-referenced deterioration range from −4.4 to −7.8 depending upon baseline HRQOL score.
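The precision-referenced cutoffs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it takes the reliability (0.965), SD (20.7), and normative mean (94.7) reported in the text, rounds the SEM to the reported 3.87, and expresses the 95% Edwards-Nunnally bounds as change from the baseline score. The function names `sem` and `en_cutoffs` are hypothetical.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def en_cutoffs(baseline, norm_mean, reliability, sem_value, z=1.96):
    """Return (improvement, deterioration) cutoffs as change from baseline.

    The interval is centered on the estimated true score
    t' = r * (x1 - M) + M rather than on the observed baseline x1.
    """
    true_score = reliability * (baseline - norm_mean) + norm_mean
    shift = true_score - baseline        # pull toward the normative mean
    half_width = z * sem_value
    return shift + half_width, shift - half_width


if __name__ == "__main__":
    # Assumption: the SEM is rounded to two decimals, as reported in the text.
    sem_value = round(sem(20.7, 0.965), 2)
    for score in (92, 40, 0):
        up, down = en_cutoffs(score, 94.7, 0.965, sem_value)
        print(score, round(up, 1), round(down, 1))
```

For baseline scores of 92, 40, and 0 this sketch gives improvement cutoffs of about 7.7, 9.5, and 10.9 points, which agree with the Edwards-Nunnally improvement column of Table 2. It does not apply the floor and ceiling of the 0 to 100 scale, so it also returns values where the table leaves a cell empty.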

3.7. Criterion-referenced change

The cutoffs used to establish criterion-referenced improvement for the four baseline HRQOL impairment groups were: none = −0.4, mild = 4.7, moderate = 7.2, severe = 12.0. These cutoffs are shown in Table 2. No criterion-referenced cutoffs were established for deterioration because fewer than 2% of participants gained 5% or more during the 6-month period.

Table 2
Precision- and criterion-referenced change cutoffs by baseline HRQOL score

                      Improvement cutoffs (a)                     Deterioration cutoffs
HRQOL baseline        Precision-referenced       Criterion-       Precision-referenced
Severity    Score     w/o RTM  Edwards-Nunnally  referenced       w/o RTM  Edwards-Nunnally (b)

None        100       n/a      n/a               −0.4             −7.6     −7.8*
            99        n/a      n/a               −0.4             −7.6     −7.7*
            98        n/a      n/a               −0.4             −7.6     −7.7*
            97        n/a      n/a               −0.4             −7.6     −7.7*
            96        n/a      n/a               −0.4             −7.6     −7.6*
            95        n/a      n/a               −0.4             −7.6     −7.6*
            94        n/a      n/a               −0.4             −7.6     −7.6*
            93        n/a      n/a               −0.4             −7.6     −7.5*
            92        7.6      7.7*              −0.4             −7.6     −7.5*
            91        7.6      7.7*              −0.4             −7.6     −7.5*
            90        7.6      7.7*              −0.4             −7.6     −7.4*
            89        7.6      7.8*              −0.4             −7.6     −7.4*
            88        7.6      7.8*              −0.4             −7.6     −7.4*
Mild        87        7.6      7.9*              4.7              −7.6     −7.3*
            86        7.6      7.9*              4.7              −7.6     −7.3*
            85        7.6      7.9*              4.7              −7.6     −7.2*
            84        7.6      8.0*              4.7              −7.6     −7.2*
            83        7.6      8.0*              4.7              −7.6     −7.2*
            82        7.6      8.0*              4.7              −7.6     −7.1*
            81        7.6      8.1*              4.7              −7.6     −7.1*
            80        7.6      8.1*              4.7              −7.6     −7.1*
Moderate    79        7.6      8.1*              7.2              −7.6     −7.0*
            78        7.6      8.2*              7.2              −7.6     −7.0*
            77        7.6      8.2*              7.2              −7.6     −7.0*
            76        7.6      8.2*              7.2              −7.6     −6.9*
            75        7.6      8.3*              7.2              −7.6     −6.9*
            74        7.6      8.3*              7.2              −7.6     −6.9*
            73        7.6      8.3*              7.2              −7.6     −6.8*
            72        7.6      8.4*              7.2              −7.6     −6.8*
Severe      70        7.6      8.4               12.0*            −7.6     −6.7*
            65        7.6      8.6               12.0*            −7.6     −6.5*
            60        7.6      8.8               12.0*            −7.6     −6.4*
            55        7.6      9.0               12.0*            −7.6     −6.2*
            50        7.6      9.1               12.0*            −7.6     −6.0*
            45        7.6      9.3               12.0*            −7.6     −5.8*
            40        7.6      9.5               12.0*            −7.6     −5.7*
            35        7.6      9.7               12.0*            −7.6     −5.5*
            30        7.6      9.8               12.0*            −7.6     −5.3*
            25        7.6      10.0              12.0*            −7.6     −5.1*
            20        7.6      10.2              12.0*            −7.6     −5.0*
            15        7.6      10.4              12.0*            −7.6     −4.8*
            10        7.6      10.5              12.0*            −7.6     −4.6*
            5         7.6      10.7              12.0*            n/a      −4.4*
            0         7.6      10.9              12.0*            n/a      n/a

RTM, regression to the mean; n/a, no cutoff listed.
a Positive cutoffs indicate improvement (i.e., increase), and negative cutoffs indicate a deterioration (i.e., decrease) in IWQOL-Lite total scores from baseline to 6 months.
b An asterisk marks the final meaningful change cutoff (set in boldface in the original).

3.8. Resolving precision-referenced and criterion-referenced cutoffs

Table 2 presents the precision-referenced and criterion-referenced cutoffs for establishing meaningful change. Considered separately, neither approach provides reasonable cutoffs across all four baseline HRQOL categories. For example, based on the Edwards-Nunnally precision-referenced cutoff for improvement, an individual in the severe range with a score of 40 at baseline would be required to improve by 9.5 points at follow-up to be considered improved. However, if we examine HRQOL changes among those with severe baseline impairments, almost 40% of those who gained 0.1% to 9.9% of their weight and over half those who lost only 0% to 4.9% meet this criterion. Consequently, the precision-referenced cutoff seems to be unreasonably low in this example. In contrast, the criterion-referenced cutoff for establishing meaningful improvement for individuals with a baseline score >87.1 (i.e., no impairment) is −0.4, indicating a slight decrease in HRQOL. This cutoff is also unreasonably low. Discrepancies between criterion-referenced and Edwards-Nunnally precision-referenced improvement cutoffs

were resolved by choosing the greater of the two cutoffs for any given baseline IWQOL-Lite score. This is shown in Table 2 (boldface numbers). Thus, for individuals in the “severe” category at baseline, an improvement of 12.0 points or more is considered to be meaningful based on the criterion-referenced cutoff. For individuals in the remaining categories,


the cutoffs for determining meaningful improvement are based on the Edwards-Nunnally precision-referenced criteria. These cutoffs range from 7.7 to 8.4 points. In establishing cutoffs for meaningful deterioration, only the precision-referenced cutoff is used to establish deterioration, with cutoffs ranging from −4.4 to −7.8 points depending upon baseline HRQOL score.

Table 3 presents the percentage of individuals classified as improved, unchanged, or deteriorated based on choosing the larger of the Edwards-Nunnally and criterion-referenced cutoffs. This classification is presented as a function of baseline HRQOL severity and weight loss. These percentages can be interpreted as the relative probability of achieving meaningful improvement and deterioration for any combination of baseline HRQOL impairment and subsequent weight loss.

The results presented in Table 3 are generally consistent with expectations and support the validity of this classification system in several respects. First, within each impairment classification, the percentage of participants classified as “improved” increases with greater weight loss. Similarly, the percentage of participants classified as “deteriorated” decreases with greater weight loss, with one exception. Second, relatively few participants received a classification that was seemingly inconsistent with observed weight loss. Less than 6% (69 of 1,216) of participants losing any amount of weight were classified as “deteriorated”; less than 16% (41 of 260) of those gaining any weight were classified as “improved,” the majority (29 of 41) of which were in the severely impaired category. Finally, the classification of improved/unchanged/deteriorated is significantly correlated with baseline impairment classification (Spearman’s rho = −0.303, P < .001) and weight loss classification (Spearman’s rho = −0.301, P < .001). 
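The resolution rule and the resulting three-way classification can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, the cutoff values in the example are read from Table 2, and a strict "greater than" comparison is assumed, following the wording of the Methods section.

```python
def integrated_improvement_cutoff(precision_cutoff, criterion_cutoff):
    """Resolve the two improvement cutoffs by taking the larger one."""
    return max(precision_cutoff, criterion_cutoff)


def classify_change(change, improvement_cutoff, deterioration_cutoff):
    """Classify a 6-month change score (positive = improvement)."""
    if change > improvement_cutoff:
        return "improved"
    if change < deterioration_cutoff:
        return "deteriorated"
    return "unchanged"


if __name__ == "__main__":
    # Baseline score 40 (severe): Edwards-Nunnally 9.5, criterion 12.0,
    # Edwards-Nunnally deterioration cutoff -5.7 (values from Table 2).
    cutoff = integrated_improvement_cutoff(9.5, 12.0)
    for change in (13.0, 10.0, -6.0):
        print(change, classify_change(change, cutoff, -5.7))
```

In this example a 10-point gain from a baseline of 40 exceeds the precision-referenced cutoff but not the criterion-referenced one, so it is classified as unchanged under the integrated rule.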
Table 3
HRQOL improvement and deterioration rates by baseline HRQOL severity and 6-month weight loss

Baseline IWQOL impairment
(Weight gain/loss)              Improved        Unchanged       Deteriorated

None (<1 SD)
  0.1–9.9% gain (n = 66)        1 (1.5%)        56 (84.8%)      9 (13.6%)
  0–4.9% loss (n = 117)         3 (2.6%)        101 (86.3%)     13 (11.1%)
  5–9.9% loss (n = 80)          4 (5.0%)        69 (86.3%)      7 (8.8%)
  10%+ loss (n = 74)            9 (12.2%)       61 (82.4%)      4 (5.4%)
Mild (1–2 SD)
  0.1–9.9% gain (n = 52)        6 (11.5%)       33 (63.5%)      13 (25.0%)
  0–4.9% loss (n = 74)          25 (33.8%)      45 (60.8%)      4 (5.4%)
  5–9.9% loss (n = 60)          22 (36.7%)      36 (60.0%)      2 (3.3%)
  10%+ loss (n = 70)            44 (62.9%)      24 (34.3%)      2 (2.9%)
Moderate (2–3 SD)
  0.1–9.9% gain (n = 32)        5 (15.6%)       21 (65.6%)      6 (18.8%)
  0–4.9% loss (n = 77)          19 (24.7%)      50 (64.9%)      8 (10.4%)
  5–9.9% loss (n = 47)          20 (42.6%)      27 (57.4%)      0 (0.0%)
  10%+ loss (n = 65)            38 (58.5%)      25 (38.5%)      2 (3.1%)
Severe (3+ SD)
  0.1–9.9% gain (n = 110)       29 (26.4%)      66 (60.0%)      15 (13.6%)
  0–4.9% loss (n = 224)         96 (42.9%)      111 (49.6%)     17 (7.6%)
  5–9.9% loss (n = 164)         81 (49.4%)      77 (47.0%)      6 (3.7%)
  10%+ loss (n = 164)           118 (72.0%)     42 (25.6%)      4 (2.4%)

4. Discussion

We describe and illustrate an integrated method for determining meaningful change in HRQOL that combines information from anchor-based and distribution-based methods. This integrated method takes into account measurement precision, external reference information, baseline HRQOL impairment, and RTM. Although we use weight loss and obesity-specific quality of life to illustrate this method in the current example, this method provides a general framework that can easily be adapted to other applications and settings. This integrated method may be used with other anchor-based or distribution-based techniques to establish cutoffs and can be applied to other time frames, instruments, populations, or disease states. Although other researchers have used a combination of anchor-based and distribution-based methods to establish meaningful change in HRQOL [15,22], we believe that we are the first to integrate these methods in a systematic way that resolves discrepancies obtained by the two methods and takes other important factors into account.

This method provides a useful research tool for evaluating whether changes in HRQOL that occur in clinical trials are meaningful. When evaluating the efficacy of different treatments, this method may be used to determine the percentage of subjects in each treatment group that exhibited meaningful improvement (or deterioration) in HRQOL. Such information is a valuable addition to tests of statistical significance used to determine whether group means (or group change scores) differ on HRQOL. The method we describe may also prove to be a convenient clinical tool, providing guidelines to clinicians for the interpretation of baseline HRQOL impairment and changes in HRQOL over time. Use of this method may facilitate identification of factors potentially associated with meaningful improvement in HRQOL, such as treatment modality, individual patient characteristics, and duration and severity of illness. 
Finally, this method allows for the systematic evaluation of claims made by weight loss products and pharmacologic agents regarding their ability to produce meaningful improvement in quality of life.

The data presented herein highlight several important points. First, they illustrate the limitations of using a single method for determining meaningful change. The distribution-based method produced unrealistically low cutoffs for individuals with severe impairments in HRQOL. Likewise, anchor-based methods produced unrealistically low cutoffs for individuals with mild or low impairments in HRQOL. Only by combining information from multiple methods were more realistic cutoffs achieved. Second, consistent with the findings of Stratford et al. [25], these data support the importance of considering initial HRQOL severity in determining meaningful change. Baseline HRQOL impairment was found to be a better predictor of subsequent changes in HRQOL than was weight change during that same period. Third, similar to the findings reported by Speer [28], these data support the importance of considering RTM. Substantial correlations were found between distance from the normative mean at baseline and subsequent change in HRQOL. Fourth, these data illustrate that factors other than weight loss may influence changes in HRQOL. Baseline HRQOL impairment is one of these factors. However, there may be other factors that affect changes in HRQOL, such as physical health, treatment factors, patient expectations, and the increased social support and self-monitoring that occur while one is in treatment.

Several observations can be made regarding RTM. First, in circumstances in which HRQOL differs substantially by subgroups (e.g., gender, comorbid condition), it may be advantageous to compute norms separately for each subgroup and adjust change scores for regression to the subgroup mean. In the current study, gender accounted for <3% of the criterion variance in the normative sample. On that basis, we elected not to use subgroup norms, although there may be circumstances in which the use of subgroup norms is warranted. Second, RTM is almost always present in longitudinal studies of HRQOL to some degree due to the use of measures that are less than perfectly reliable. In the current study, correcting for RTM changed precision-referenced improvement and deterioration cutoffs by 3 points (Table 2). The decision whether to adjust for RTM should be based on the reliability of the measure and the observed correlation between baseline HRQOL scores and subsequent HRQOL change. Using Cohen’s [45] definition of a “medium” correlation as a general guideline, we suggest considering adjustment for RTM when the correlation between baseline HRQOL scores and subsequent HRQOL change reaches 0.30.

Much of the current literature on change in HRQOL instruments has been devoted to identifying minimal score differences that can be interpreted as meaningful [16,19,46,47].
The minimal important difference has been defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side-effects and excessive cost, a change in the patient’s management” [16]. In contrast, the current approach resolves discrepancies between precision- and criterion-referenced cutoffs by selecting the larger of the two. Consequently, the resultant cutoffs for the current approach may be somewhat larger, and thus more conservative, than those determined using previous methods and therefore may not represent minimally meaningful change.

Several limitations of the current approach must be noted. First, all of the individuals in the current sample were participants in weight loss studies or programs. Studies have shown that obese treatment seekers differ on quality of life from obese persons not seeking treatment [48,49]. Consequently, the current findings may not be generalizable to individuals losing or gaining weight who are not actively involved in weight loss programs. Second, relatively few individuals in the current sample experienced substantial weight gain. As such, no anchors were available for establishing reasonable cutoffs for meaningful deterioration in HRQOL. Consequently, the proposed deterioration cutoffs based on distribution-based methods must be considered tentative. Third, the proposed method for determining meaningful change in HRQOL has been developed and applied only in the context of obesity treatment. Future research is needed to demonstrate whether this method will produce comparable results when applied to other disease-specific quality of life areas. Similarly, it remains to be seen what effect the use of different anchors, different distribution-based techniques, or different methods for classifying baseline impairment would have on the results.

Another potential limitation of this method is that many HRQOL instruments may lack normative data, which are essential in adjusting for RTM. In those cases, our recommendation is to apply this integrated method using precision-referenced cutoffs that are not adjusted for RTM. However, users should be aware that not correcting for RTM may result in less conservative cutoffs for determining meaningful change, particularly for individuals with severe HRQOL impairments at baseline.

In conclusion, meaningful change in HRQOL can be determined using an integrated method that (1) combines information from anchor-based and distribution-based methods, (2) reconciles discrepancies between these two methods, and (3) adjusts for baseline severity and RTM. Using this integrated method, an improvement of 7.7 to 12 points (depending on baseline severity) on IWQOL-Lite total score is considered meaningful. The method we describe may be applied to other types of HRQOL measures and conditions.
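
As an illustration only, the decision rules discussed above might be sketched in Python as follows. The sketch covers three steps: keeping the larger of the precision-referenced and criterion-referenced cutoffs, computing the percentage of subjects per treatment group whose improvement meets that cutoff, and flagging when the baseline-change correlation reaches 0.30. All function names, the severity labels, the cutoff values, and the example records are hypothetical placeholders rather than values from this study. The sketch also assumes that a positive change score denotes improvement, that a change equal to the cutoff counts as meaningful, and that the 0.30 guideline refers to the absolute value of the Pearson correlation.

```python
from collections import defaultdict
from math import sqrt


def integrated_cutoff(precision_cutoff, criterion_cutoff):
    """Resolve a discrepancy by keeping the larger (more conservative) cutoff."""
    return max(precision_cutoff, criterion_cutoff)


def percent_improved(records, cutoffs):
    """Percentage of subjects per treatment group with meaningful improvement.

    records: iterable of (group, severity, change) tuples; a positive change
             is assumed to mean improved HRQOL.
    cutoffs: dict mapping severity -> (precision_cutoff, criterion_cutoff).
    """
    improved = defaultdict(int)
    total = defaultdict(int)
    for group, severity, change in records:
        total[group] += 1
        # Assumption: a change equal to the cutoff counts as meaningful.
        if change >= integrated_cutoff(*cutoffs[severity]):
            improved[group] += 1
    return {g: 100.0 * improved[g] / total[g] for g in total}


def consider_rtm_adjustment(baseline, change, threshold=0.30):
    """True when |Pearson r| between baseline score and change reaches threshold."""
    n = len(baseline)
    mean_b = sum(baseline) / n
    mean_c = sum(change) / n
    cov = sum((b - mean_b) * (c - mean_c) for b, c in zip(baseline, change))
    var_b = sum((b - mean_b) ** 2 for b in baseline)
    var_c = sum((c - mean_c) ** 2 for c in change)
    denom = sqrt(var_b * var_c)
    if denom == 0:
        return False  # correlation undefined when either series is constant
    return abs(cov / denom) >= threshold


# Hypothetical placeholder cutoffs (precision-referenced, criterion-referenced);
# these are NOT the cutoffs reported in this paper.
example_cutoffs = {"mild": (8.0, 6.0), "severe": (9.0, 12.0)}

# Hypothetical (group, baseline severity, change score) records.
example_records = [
    ("drug", "mild", 9.0),
    ("drug", "mild", 8.0),
    ("drug", "severe", 10.0),
    ("drug", "severe", 14.0),
    ("placebo", "mild", 3.0),
    ("placebo", "severe", 13.0),
]

print(percent_improved(example_records, example_cutoffs))
```

Taking the maximum of the two cutoffs mirrors the conservative choice described in the Discussion; a different reconciliation rule would require changing only `integrated_cutoff`.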

Acknowledgments

We acknowledge the contributions of Amy Phillips and Talat Ashraf from Abbott Laboratories; Julie Porter, Marsha Raebel, and Douglas Conner of Kaiser Permanente of Colorado; the Obesity Research Network; Stan Heshka; Weight Watchers International; and Guilford Hartley from Hennepin County Medical Center. Financial support for this project was provided by Bristol-Myers Squibb, Princeton, New Jersey.

References

[1] Kolotkin RL, Meter K, Williams GR. Quality of life and obesity. Obes Rev 2001;2:219–29.
[2] Kushner RF, Foster G. Obesity and quality of life. Nutrition 2000;16:947–52.
[3] Fontaine KR, Barofsky I. Obesity and health-related quality of life. Obes Rev 2001;2:173–82.
[4] Sullivan M, Karlsson J, Sjostrom L, Taft C. Why quality of life measures should be used in the treatment of patients with obesity. In: Bjorntorp P, editor. International textbook of obesity. New York: John Wiley & Sons; 2001. p. 485–510.
[5] Sullivan HM, Karlsson J, Sjostrom L, Backman L, Bengtsson C, Bouchard C, Dahlgren S, Jonsson E, Larsson B, Lindstedt S. Swedish obese subjects (SOS): an intervention study of obesity. Baseline evaluation of health and psychosocial functioning in the first 1743 subjects examined. Int J Obes 1993;17:503–12.
[6] Kawachi I. Physical and psychological consequences of weight gain. J Clin Psychiatry 1999;60(Suppl 21):5–9.


[7] de Zwaan M, Mitchell JE, Howell M, Monson N, Swan-Kremeier L, Roerig JL, Kolotkin RL, Crosby RD. Two measures of health related quality of life in morbid obesity. Obes Res 2002;10:1143–51.
[8] Rippe JM, Price JM, Hess SA, Kline G, DeMers KA, Damitz S, Kreidieh I, Freedson P. Improved psychological well-being, quality of life, and health practices in moderately overweight women participating in a 12-week structured weight loss program. Obes Res 1998;6:208–18.
[9] Fine JT, Colditz GA, Coakley EH, Moseley G, Manson JE, Willett WC, Kawachi I. A prospective study of weight change and health-related quality of life in women. JAMA 1999;282:2136–42.
[10] Fontaine KR, Barofsky I, Andersen RE, Bartlett SJ, Wiersema L, Cheskin LJ, Franckowiak SC. Impact of weight loss on health-related quality of life. Qual Life Res 1999;8:275–7.
[11] Kolotkin RL, Crosby RD, Williams GR, Hartley GG, Nicol S. The relationship between health-related quality of life and weight loss. Obes Res 2001;9:564–71.
[12] Jacobson NS, Follette WC, Revenstorf D. Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance. Behav Ther 1984;15:336–52.
[13] Barlow DH. On the relation of clinical research to clinical practices: current issue, new directions. J Consult Clin Psychol 1981;49:147–55.
[14] Yeaton WH, Sechrest L. Critical dimensions in the choice and maintenance of successful treatments: strength, integrity, and effectiveness. J Consult Clin Psychol 1981;49:156–67.
[15] Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 1991;59:12–9.
[16] Jaeschke R, Singer J, Guyatt GH. Ascertaining the minimal clinically important difference. Control Clin Trials 1989;10:407–15.
[17] Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 2003;56:395–7.
[18] Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res 1993;2:221–6.
[19] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease specific quality of life questionnaire. J Clin Epidemiol 1994;47:81–7.
[20] Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27(Suppl 3):S178–89.
[21] Wyrwich K, Nienaber N, Tierney W, Wolinsky F. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. J Clin Epidemiol 1999;52:861–73.
[22] Cella D, Eton DT, Fairclough DL, Bonomi P, Heyes AE, Silberman C, Wolf MK, Johnson DH. What is clinically meaningful change on the Functional Assessment of Cancer Therapy-Lung (FACT-L) questionnaire? Results from Eastern Cooperative Oncology Group (ECOG) Study 5592. J Clin Epidemiol 2002;55:285–95.
[23] McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med 1997;127:743–50.
[24] Samsa G, Edelman D, Rothman ML, Williams GR, Lipscomb J, Matchar D. Determining clinically important differences in health status measures: a general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999;15:141–55.
[25] Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: Part 1. Phys Ther 1998;78:1186–96.
[26] Kolotkin RL, Crosby RD, Williams GR. Integrating anchor-based and distribution-based methods to determine clinically meaningful change in obesity-specific quality of life. Qual Life Res 2002;11:670.
[27] Engel SG, Crosby RD, Kolotkin RL, Hartley GG, Williams GR, Wonderlich SA, Mitchell JE. The impact of weight loss and regain on obesity-specific quality of life: mirror image or differential effect? Obes Res 2003;11:1207–13.

[28] Speer DC. Clinically significant change: Jacobson and Truax (1991) revisited. J Consult Clin Psychol 1992;60:402–8.
[29] Hageman WJ, Arrindell WA. Establishing clinically significant change: increment of precision and distinction between individual and group level of analysis. Behav Res Ther 1999;37:1169–93.
[30] Hsu LM. Regression toward the mean associated with measurement error and the identification of improvement and deterioration in psychotherapy. J Consult Clin Psychol 1995;63:141–4.
[31] Samsa GP, Kolotkin RL, Williams GR, Nguyen MH, Mendel CM. Effect of moderate weight loss on health-related quality of life: an analysis of combined data from 4 randomized trials of sibutramine versus placebo. Am J Manag Care 2001;7:875–83.
[32] Raebel MA, Conner DA, Porter JA, Lanty FA, Vogel EA, Gay EC, Merenich JA. The long-term outcomes of sibutramine effectiveness on weight (LOSE Weight) study in a managed care organization: twelve-month clinical outcomes in a naturalistic clinical setting. Diabetes 2002;51(Suppl 2):A413.
[33] Anderson JW, Greenway FL, Fujioka K, Gadde KM, McKenney J, O’Neil PM. Bupropion SR enhances weight loss: a 48-week double-blind, placebo-controlled trial. Obes Res 2002;10:633–41.
[34] Heshka S, Anderson JW, Atkinson RL, Greenway F, Hill JO, Phinney SD, Kolotkin RL, Miller-Kovach K, Pi-Sunyer FX. Weight loss with self-help compared with a structured commercial program: a randomized trial. JAMA 2003;289:1792–8.
[35] Kolotkin RL, Crosby RD, Kosloski KD, Williams GR. Development of a brief measure to assess quality of life in obesity. Obes Res 2001;9:102–11.
[36] Kolotkin RL, Crosby RD. Psychometric evaluation of the Impact of Weight on Quality of Life-Lite questionnaire (IWQOL-Lite) in a community sample. Qual Life Res 2002;10:748–56.
[37] Kolotkin RL, Crosby RD. The Impact of Weight on Quality of Life-Lite (IWQOL-Lite): user’s manual. Durham (NC): Obesity and Quality of Life Consulting; 2002.
[38] Anastasi A, Urbina S. Psychological testing. 7th edition. Upper Saddle River (NJ): Prentice Hall; 1997.
[39] Nunnally JC, Bernstein IH. Psychometric theory. 3rd edition. New York: McGraw-Hill; 1994.
[40] Edwards DW, Yarvis RM, Mueller DP, Zingale HC, Wagman WJ. Test-taking and the stability of adjustment scales: can we assess patient deterioration? Eval Q 1978;2:275–92.
[41] Stanley JC. Reliability. In: Thorndike RL, editor. Educational measurement. 2nd edition. Washington, DC: American Council on Education; 1971. p. 356–442.
[42] Food and Drug Administration Guidance for the Clinical Evaluation of Weight-Control Drugs. Available at: http://www.fda.gov/cder/guidance/obesity.pdf (accessed April 10, 2004).
[43] Vidal J. Updated review on the benefits of weight loss. Int J Obes 2002;26(Suppl 4):S25–8.
[44] Pasanisi F, Contaldo F, deSimone G, Mancini M. Benefits of sustained moderate weight loss in obesity. Nutr Metab Cardiovasc Dis 2001;11:401–6.
[45] Cohen J. Statistical power analysis for the behavioral sciences. 2nd edition. Hillsdale (New Jersey): Lawrence Erlbaum; 1988.
[46] Hudgens SA, Yost K, Cella D, Hahn E, Peterman A. Comparing retrospective and prospective anchors for identifying minimally important differences. Qual Life Res 2002;11:629.
[47] Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Clinical Significance Consensus Group. Methods to explain the clinical significance of health status measures. Mayo Clin Proc 2002;77:371–83.
[48] Fontaine KR, Bartlett SJ, Barofsky I. Health-related quality of life among obese persons seeking and not currently seeking treatment. Int J Eat Disord 2000;27:101–5.
[49] Kolotkin RL, Crosby RD, Williams GR. Health-related quality of life varies among obese sub-groups. Obes Res 2002;10:748–56.