Journal of Affective Disorders 105 (2008) 81 – 91 www.elsevier.com/locate/jad
Research report
The responsiveness of EQ-5D utility scores in patients with depression: A comparison with instruments measuring quality of life, psychopathology and social functioning ☆ Oliver H. Günther a,b,⁎, Christiane Roick b , Matthias C. Angermeyer b , Hans-Helmut König a,b a
Health Economics Research Unit, University of Leipzig, Johannisallee 20, 04317 Leipzig, Germany b Department of Psychiatry, University of Leipzig, Johannisallee 20, 04317 Leipzig, Germany Received 25 January 2007; received in revised form 20 April 2007; accepted 20 April 2007 Available online 29 May 2007
Abstract
Introduction: The EQ-5D provides preference weights (utilities) for health-related quality of life to be used for calculating quality-adjusted life years (QALYs) in cost-utility analysis. The aim of this study was to compare differences in EQ-5D utility scores with differences in quality of life, psychopathology, and social functioning scores.
Methods: In an observational longitudinal cohort study, EQ-5D utilities (EQ visual analogue scale (EQ VAS), EQ-5D indices of the United Kingdom (EQ-5D index-UK) and Germany (EQ-5D index-D)) were compared with scores of the WHOQOL-BREF, CGI, and GAF at baseline and at 18 months (N = 104). The patients' health status at follow-up was categorized as “worse”, “stable”, or “better” using the EQ-5D transition question (patient-based anchor) and the Bech–Rafaelsen melancholy scale (clinician-based anchor). Effect sizes (ES) were used to compare differences in scores within each group over time; regression analysis was used to derive meaningful difference scores in health status associated with a shift from “stable” to “better” health status.
Results: The most responsive instrument was the CGI (patient-based anchor: ES = |0.98|; clinician-based anchor: ES = |1.35|); responsiveness was large in EQ VAS (patient-based anchor: ES = |0.84|; clinician-based anchor: ES = |1.19|), but rather small to medium for EQ-5D index-UK (patient-based anchor: ES = |0.55|; clinician-based anchor: ES = |0.65|) and EQ-5D index-D (patient-based anchor: ES = |0.41|; clinician-based anchor: ES = |0.45|). Compared with the other instruments, the shift to a “better health status” was smaller if elicited by the EQ-5D indices.
Discussion: Both EQ-5D indices were less responsive and need larger patient samples to detect meaningful differences compared with EQ VAS and the other instruments.
© 2007 Elsevier B.V. All rights reserved.
Keywords: EQ-5D; Quality of life; Depression; Responsiveness; EuroQol
☆ Conflict of interest: All authors declare that they have no conflicts of interest. Contributors: All authors contributed to and have approved the final manuscript: Oliver H. Günther undertook the statistical analysis and wrote the first draft and the final version of the manuscript. Christiane Roick designed the study and managed the process of data collection. Matthias C. Angermeyer planned the study and reviewed the final manuscript. Hans-Helmut König analyzed the data and reviewed the first draft. Role of funding source: Funding for this study was provided by German Statutory Health Insurance (grant number 932000-050) and the German Federal Ministry of Education and Research (grant number 01ZZ0106); both had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication. ⁎ Corresponding author. University of Leipzig, Health Economics Research Unit, Department of Psychiatry, Johannisallee 20, D-04317 Leipzig, Germany. Tel.: +49 341 97 24560; fax: +49 341 97 24569. E-mail address:
[email protected] (O.H. Günther).
0165-0327/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jad.2007.04.018
O.H. Günther et al. / Journal of Affective Disorders 105 (2008) 81–91
1. Introduction
The EQ-5D is a short generic patient-rated questionnaire measuring health-related quality of life (HRQOL). It is often applied as a measure of outcome in studies comparing different treatments (The EuroQol Group provides a detailed reference list [April 2007]: http://www.euroqol.org). The EQ-5D provides two important aspects: a descriptive profile of HRQOL based on five dimensions including mobility, self-care, usual activities, pain/discomfort, anxiety/depression, and a valuation of the profile by a visual analogue scale (EQ VAS), i.e. a single score reflecting patients’ preferences (Brooks, 1996; The EuroQol Group, 1990). In addition, for various countries (including the United Kingdom, Germany, and the United States) an index score is available assigned to all possible health states described by the EQ-5D according to a particular set of preference values derived from surveys of the general population (Dolan, 1997; Greiner et al., 2004; Shaw et al., 2005). The patients' scores and/or index scores derived from the general population might be used in evaluating change in health status, with the former reflecting the preferences of beneficiaries of care and the latter reflecting community preferences (Dolan, 1999). For the purpose of cost-utility analysis in economic evaluation, with the consequences of treatment being measured in terms of quality-adjusted life years (QALYs), these preference weights are typically used for calculating QALYs (Drummond et al., 1997; Gold et al., 1996). Decisions about the suitability of the EQ-5D in economic evaluation, especially concerning the EQ VAS and the various country specific societal EQ-5D indices, need to be based on a clear conceptual framework; that means the EQ-5D has to demonstrate its psychometric validity and reliability (Revicki et al., 2000).
In the field of depressive disorders, several studies evaluated the suitability of the EQ-5D index-UK (Hayhurst et al., 2006; Lamers et al., 2006; Sapin et al., 2004). Sapin et al. showed that the EQ-5D index-UK differed significantly by disease severity level, with more severely ill patients having lower index scores (Sapin et al., 2004). Further studies demonstrated that the mean EQ-5D index-UK decreased with deterioration in health status and that the EQ-5D index-UK discriminates between groups with varying levels of depression (Hayhurst et al., 2006; Lamers et al., 2006). Overall, these results support the view that the EQ-5D index-UK reflects psychopathology and mental aspects of the quality of life in patients with major depression.
In addition to aspects of validity and reliability, the EQ-5D has to show evidence of responsiveness. Responsiveness reflects the ability of an instrument to detect a change in health status. It is determined by evaluating the relationship between changes in clinical endpoints and changes in an instrument’s outcome over time in either observational studies or clinical trials (Guyatt et al., 1987; Revicki et al., 2000). Recently, the Food and Drug Administration (FDA) in the United States issued a draft guidance on patient-reported outcomes, including methods to calculate responsiveness and to interpret the detected changes as meaningful (Food and Drug Administration, 2006). The recommended best practice in the evaluation of responsiveness is the calculation of various distribution-based estimates (i.e. effect size, standardized response mean, standard error of measurement) under several anchor-based criteria (i.e. patient or clinician ratings of global improvement) (Revicki et al., 2006). Here, the anchor-based criterion is used as an external indicator to assign patients to groups reflecting “no change” and “(small) positive/negative change”. The distribution-based estimates describing responsiveness consist of a ratio in which the difference between mean baseline and endpoint scores forms the numerator and different estimates of variability form the denominator. Each of these statistical measures acts as a quantitative description of change within the groups. Guidance on interpreting the magnitude of a distribution-based estimate, for example whether differences in scores are viewed as meaningful, is available (Cohen, 1988; Norman et al., 2003).
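The ratio structure of the distribution-based estimates described above can be illustrated with a minimal Python sketch. It is not the authors' code; the function names are hypothetical, and the cut-offs of 0.20, 0.50 and 0.80 are Cohen's conventional thresholds. The example values are the EQ VAS summary statistics of the "better" group under the patient-based anchor as reported in Table 2 of this paper.

```python
def effect_size(mean_change, sd_baseline):
    """Effect size: mean change divided by the SD of the baseline scores."""
    return mean_change / sd_baseline


def standardized_response_mean(mean_change, sd_change):
    """SRM: mean change divided by the SD of the individual change scores."""
    return mean_change / sd_change


def cohen_label(value):
    """Label the absolute magnitude using Cohen's conventional cut-offs."""
    magnitude = abs(value)
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "medium"
    return "large"


# EQ VAS, "better" group, patient-based anchor (summary values from Table 2):
# mean change 19.20, baseline SD 22.96, SD of change scores 25.71.
es = effect_size(19.20, 22.96)
srm = standardized_response_mean(19.20, 25.71)
print(round(es, 2), cohen_label(es))    # 0.84 large
print(round(srm, 2), cohen_label(srm))  # 0.75 medium
```

Both results agree with the corresponding EQ VAS entries in Table 3, which suggests that the reported statistics follow this ratio definition.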
Nonetheless, there is no gold standard for deciding whether differences in scores are meaningful from either the patient’s or the clinician’s perspective, but some methods are useful for interpretation; for example, one definition of a meaningful difference is based on the “minimal important difference” (MID) in scores perceptible to patients as a beneficial change (Guyatt et al., 2002; Jaeschke et al., 1989). In practice, the MID is viewed as the difference in scores between the group with “no change” and the group with “small positive/negative change”. Thus, the MID is an interpretation of change across the groups. The objective of this study was to compare and contrast the responsiveness in preference-based scores of the EQ-5D for patients with depression with the responsiveness in composite summary scores of instruments measuring quality of life, psychopathology and social functioning. To facilitate interpreting whether the differences in scores are meaningful, we combined measurement information with respect to a patient and clinician anchor-based criterion. We hypothesized the
following: (1) since the preference-based scores of the EQ-5D as well as the other instruments used for comparison reflect aspects of HRQOL, the difference scores should show at least a moderate relationship with each other; (2) since the studies published on quality of life in depression showed that HRQOL is related to depressive symptoms, instruments measuring a specific “facet of depression” are expected to reflect a more responsive, meaningful change than the generic preference-based scores derived from the EQ-5D (Papakostas et al., 2004; Wiebe et al., 2003).
2. Methods
2.1. Subjects
Patients suffering from an affective disorder according to the International Statistical Classification of Diseases (ICD-10) (World Health Organization, 2004) were recruited consecutively by clinicians at three departments of psychiatry in the federal state “Schleswig Holstein”, Germany. Only patients with a depressive episode according to the ICD classification F32.1, F32.2, F33.1 and F33.2 participated in this observational longitudinal cohort study. Consequently, patients with psychotic symptoms and patients with manic symptoms were excluded from participation. Baseline interviews were administered after admission from September 2003 until March 2004 by trained interviewers as part of a research study within the pilot project of the Regional Budget for Mental Health Care (Roick et al., 2005). All patients received therapy and were interviewed again after a follow-up period of eighteen months. They were at least 18 years old and resided in “Schleswig Holstein” or the city of “Hamburg”. Their participation was voluntary, and non-participation was not recorded.
2.2. Measures
All subjects were assessed by a set of instruments containing self-rated instruments and scales to be completed by clinicians. The study measures will be briefly described.
2.2.1. EQ-5D
A standard version of the EQ-5D was administered, comprised of five items relating to problems in the following dimensions: “mobility”, “self-care”, “usual activities”, “pain/discomfort”, and “anxiety/depression” (Brooks, 1996; The EuroQol Group, 1990). Responses in each dimension are divided into three ordinal levels coded: (1) no problems, (2) moderate problems, and (3) extreme problems. Theoretically, 3^5 = 243 different health states can be defined. The EQ-5D descriptive system was followed by a visual analogue scale (EQ VAS), similar to a thermometer ranging from 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ VAS records the respondent's self-rated valuation of HRQOL, which is based on the respondent's preferences.
2.2.2. EQ-5D index
Two different EQ-5D indices which represent societal preference values were used in the present study: the first index was obtained from a large United Kingdom population sample (n = 2997), where the valuation of 42 EQ-5D health states by Time Trade Off (TTO) technique was used to derive an algorithm for societal preference values of all 243 possible EQ-5D health states (Dolan, 1997). The second index used resulted from TTO based valuations of 36 EQ-5D health states by a random sample of the German general population (n = 334) (Greiner et al., 2004). According to each individual's health status on the self-classifier of the EQ-5D, one EQ-5D index-UK value and one EQ-5D index-D value were assigned.
2.2.3. WHOQOL-BREF
The WHOQOL-BREF is a self-rated generic questionnaire for measuring quality of life over the previous two weeks (WHOQOL Group, 1998). In the study only the WHOQOL-BREF score for overall quality of life was used. This score is calculated from two items ranging from 1 (worst) to 100 (best) (Angermeyer et al., 2000).
2.2.4. CGI-S, GAF
CGI-S is a single-item scale rated by the clinician. It is a standard measure for global assessment of severity of illness rated on a seven-point-Likert scale ranging from 1 (“not ill at all”) to 7 (“among the most extremely ill individuals”) (Guy, 1976). The GAF single-item scale measures the overall level of occupational functioning (Goldman et al., 1992). Rated by the clinician, the GAF scale consists of a series of ranked sentences associated with numerical scores ranging from 1 (worst) to 100 (best).
2.2.5. Bech–Rafaelsen melancholia scale (BRAMES)
The BRAMES is used to assess the severity of depression rated by the clinician (Bech et al., 1979). It consists of 11 items, all scored on a five-point Likert scale. The BRAMES fulfills the criteria of unidimensionality resulting in a total score range from 0 (best) to 44 (worst) (Licht et al., 2005).
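The count of 243 possible health states in the EQ-5D descriptive system follows from five dimensions with three levels each. The short Python sketch below enumerates them. Writing a state as a five-digit string such as "11222" (one digit per dimension, in the order listed in the text) is a common shorthand and is an assumption here, as are the function names.

```python
from itertools import product

# Dimension order as listed in the description of the EQ-5D above.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVEL_LABELS = {1: "no problems", 2: "moderate problems", 3: "extreme problems"}


def all_health_states():
    """Return every combination of levels 1-3 over the five dimensions."""
    return ["".join(str(level) for level in profile)
            for profile in product(LEVEL_LABELS, repeat=len(DIMENSIONS))]


def describe(state):
    """Translate a five-digit state string into a dimension -> label mapping."""
    return {dim: LEVEL_LABELS[int(digit)] for dim, digit in zip(DIMENSIONS, state)}


states = all_health_states()
print(len(states))        # 243
print(describe("11222"))  # no problems in mobility/self-care, moderate elsewhere
```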
2.3. Statistical analysis
The relationship of difference scores in EQ-5D utilities and in the composite overall measures used in the study was determined by analyzing their level of correlation. Since the EQ VAS score and the EQ-5D indices did not follow a normal distribution, the non-parametric Spearman rank correlation coefficient (rs) was calculated. Correlation was considered to be small for 0.1 < |rs| ≤ 0.3, medium for 0.3 < |rs| ≤ 0.5, and large for |rs| > 0.5.
All patients were classified into three groups according to their change between baseline and follow-up on two anchor-based criteria: the first criterion was the EQ-5D transition question, with patients rating their health status as worse, stable or better at follow-up in comparison with the baseline interview. Patients answered the transition question after the EQ-5D descriptive profile and the EQ VAS. The second criterion was based on the clinical change in BRAMES score: an interval defined as half a standard deviation (SD) around zero change represented the category “stable health status”. Negative values beyond this interval represented the category “worse health status”, whereas positive values beyond this interval represented the category “better health status” (Norman et al., 2003).
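The clinician-based anchor and the correlation categories described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the SD is taken from the change scores themselves (the text does not say which SD was used), change scores are oriented so that positive values mean improvement, the label "negligible" for |rs| ≤ 0.1 is added here for completeness, and the data are hypothetical.

```python
from statistics import stdev


def classify_by_half_sd(changes):
    """Assign each change score to 'worse', 'stable' or 'better'.

    Assumption: 'stable' is the interval of half an SD (of the change
    scores) around zero; positive changes indicate improvement.
    """
    half_sd = 0.5 * stdev(changes)
    groups = []
    for change in changes:
        if change < -half_sd:
            groups.append("worse")
        elif change > half_sd:
            groups.append("better")
        else:
            groups.append("stable")
    return groups


def correlation_magnitude(rs):
    """Categorise a Spearman coefficient by its absolute value."""
    size = abs(rs)
    if size > 0.5:
        return "large"
    if size > 0.3:
        return "medium"
    if size > 0.1:
        return "small"
    return "negligible"  # label not used in the paper; added for completeness


# Hypothetical change scores, for illustration only.
example_changes = [-8, -1, 0, 1, 2, 9, 10]
example_groups = classify_by_half_sd(example_changes)
print(example_groups)
print(correlation_magnitude(0.945), correlation_magnitude(-0.462))
```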
Responsiveness was compared by paired t-test statistics, effect sizes (ES), and standardized response means (SRM). The ES was calculated as $\mathrm{ES} = (M_2 - M_1) / SD_{\mathrm{baseline}}$, where $M_1$ is the mean score of the baseline assessment, $M_2$ the mean score of the post-assessment, and $SD_{\mathrm{baseline}}$ the pooled SD of the baseline assessment. The SRM was calculated as $\mathrm{SRM} = (M_2 - M_1) / SD_{M_2 - M_1}$, where the numerator remains the same as for the ES, but the denominator represents the SD of the difference scores. We considered an absolute magnitude of ES and SRM below 0.20 as trivial, from 0.20 to below 0.50 as small, from 0.50 to below 0.80 as medium, and of 0.80 or above as large, based on Cohen's interpretation guidelines (Cohen, 1988). Missing values (< 1.3%) were replaced by the baseline value or by the follow-up value, respectively. As a consequence, the difference between the baseline and the follow-up value was equal to zero. In the case of missing values at both measurement points, the difference in scores was set equal to zero (Sprangers et al., 2002). “Meaningful differences” in health status were estimated by a linear regression model. The model is represented by the equation $\Delta T = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5 + e$, where $\Delta T$ is the difference in the instrument's score between the baseline and the follow-up assessment, $a$ is the constant,
Fig. 1. Distribution of responses to items of the EQ-5D self-classifier in patient sample at baseline and eighteen months after (N = 104).
Table 1 Correlation between change scores of preference-based measures and of other instruments’ overall scores (N = 104)

| | WHOQOL-BREF (total score) | CGI | GAF | EQ VAS | EQ-5D index UK | EQ-5D index D | BRAMES |
|---|---|---|---|---|---|---|---|
| WHOQOL-BREF (total score) | 1.00 | | | | | | |
| CGI | −0.568⁎ | 1.00 | | | | | |
| GAF | 0.591⁎ | −0.784⁎ | 1.00 | | | | |
| EQ VAS | 0.642⁎ | −0.444⁎ | 0.508⁎ | 1.00 | | | |
| EQ-5D index UK | 0.545⁎ | −0.539⁎ | 0.492⁎ | 0.440⁎ | 1.00 | | |
| EQ-5D index D | 0.429⁎ | −0.441⁎ | 0.382⁎ | 0.321⁎ | 0.945⁎ | 1.00 | |
| BRAMES | −0.680⁎ | 0.704⁎ | −0.748⁎ | −0.574⁎ | −0.576⁎ | −0.462⁎ | 1.00 |

⁎p < 0.01.
$b_1$ and $b_2$ are the regression coefficients for “worse” and “better” health status, $X_1$ and $X_2$ are the dummy variables for “worse” and “better” health status, $b_3$ to $b_5$ are the regression coefficients for the variables $X_3$ (score at baseline), $X_4$ (the period since first diagnosis, in years), and $X_5$ (age of the patient), and $e$ is the error term. This regression model generates coefficients that reflect the incremental amount in difference scores for a shift to better/worse health status compared with the stable health status, while controlling for the instrument's baseline score, the period since first diagnosis, and age. All calculations were performed using the software STATA (STATA Corp., College Station, Texas, USA, Version 9.2).
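The dummy coding in the regression model above can be made concrete with a small, self-contained Python sketch. It is not the authors' STATA analysis: the helper names, the patients and the generating coefficients are hypothetical, the data are noise-free so that the fit simply recovers the coefficients used to create them, and no standard errors are computed. The stable group is the reference category, so the coefficients of the two dummy variables are read as shifts relative to stable health status.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def design_row(group, baseline, years_since_diagnosis, age):
    """Constant, dummies X1 (worse) and X2 (better), then X3, X4, X5."""
    return [1.0,
            1.0 if group == "worse" else 0.0,
            1.0 if group == "better" else 0.0,
            baseline, years_since_diagnosis, age]


def fit_ols(rows, outcomes):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical patients: (anchor group, baseline score, years since diagnosis, age).
patients = [("stable", 50.0, 2.0, 30.0), ("stable", 60.0, 5.0, 45.0),
            ("worse", 40.0, 1.0, 52.0), ("worse", 55.0, 8.0, 38.0),
            ("better", 35.0, 3.0, 61.0), ("better", 45.0, 10.0, 27.0),
            ("stable", 70.0, 12.0, 58.0), ("better", 65.0, 4.0, 49.0)]
# Hypothetical generating coefficients: a, b1, b2, b3, b4, b5.
true_coefficients = [40.0, -18.0, 12.0, -0.7, 0.1, 0.05]

rows = [design_row(*patient) for patient in patients]
outcomes = [sum(c * x for c, x in zip(true_coefficients, row)) for row in rows]
estimates = fit_ols(rows, outcomes)
print([round(value, 3) for value in estimates])
```

In this sketch the second and third estimates correspond to the shifts from stable to worse and from stable to better health status, which is the quantity the paper interprets as a meaningful difference.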
2.4. Ethics
The research protocol of this study was reviewed by the Committees of Research Ethics at the Medical Faculty of the University of Leipzig.
3. Results
3.1. Demographic characteristics
Of the 141 patients initially participating in the study at baseline, 37 dropped out at follow-up, resulting in 104 patients completing both interviews. At baseline the mean age of patients who completed the study was
Table 2 Baseline, follow-up, and change scores of instruments for patients classified as deteriorated, stable and improved by criterion EQ-5D transition question and BRAMES anchor
Anchors: EQ-5D transition question (patient based); BRAMES psychopathology scale (clinician based). Values are mean (SD).

| Instrument [possible range of scores: worst–best] | Time period | Patient based: Worse HS (N = 16) | Patient based: Stable HS (N = 39) | Patient based: Better HS (N = 49) | Clinician based: Worse HS (N = 10) | Clinician based: Stable HS (N = 43) | Clinician based: Better HS (N = 51) |
|---|---|---|---|---|---|---|---|
| WHOQOL-BREF global score [0–100] | Baseline | 34.38 (23.50) | 45.19 (23.41) | 44.39 (26.03) | 52.50 (18.44) | 51.74 (25.96) | 34.07 (21.66) |
| | Follow-up | 26.56 (13.60) | 49.36 (18.57) | 67.60 (18.91) | 30.00 (18.82) | 52.61 (21.57) | 60.78 (21.94) |
| | Change | −7.81 (28.46) | 4.17 (23.18) | 23.21 (25.26) | −22.50 (24.15) | 0.87 (21.55) | 26.71 (22.36) |
| CGI [6–0] | Baseline | 3.31 (0.87) | 2.95 (1.34) | 3.02 (1.20) | 2.50 (0.85) | 2.65 (1.23) | 3.47 (1.10) |
| | Follow-up | 3.87 (0.93) | 2.49 (1.39) | 1.84 (1.28) | 3.70 (0.95) | 2.53 (1.28) | 1.98 (1.46) |
| | Change | 0.56 (0.89) | −0.46 (1.60) | −1.18 (1.44) | 1.20 (0.79) | −0.12 (1.07) | −1.49 (1.46) |
| GAF [1–100] | Baseline | 53.88 (12.60) | 61.13 (17.92) | 59.82 (16.76) | 68.30 (11.85) | 65.88 (15.34) | 52.18 (15.66) |
| | Follow-up | 51.56 (8.54) | 62.10 (13.68) | 67.57 (14.96) | 49.80 (8.61) | 61.88 (13.49) | 66.65 (15.03) |
| | Change | −2.31 (14.30) | 0.97 (20.73) | 7.76 (15.79) | −18.50 (10.81) | −4.0 (10.91) | 14.47 (16.49) |
| EQ VAS [0–100] | Baseline | 36.38 (24.36) | 51.82 (23.26) | 53.88 (22.96) | 51.90 (27.40) | 58.77 (23.94) | 43.08 (20.98) |
| | Follow-up | 38.25 (21.36) | 60.21 (18.67) | 73.73 (17.17) | 40.20 (20.53) | 62.05 (19.32) | 68.06 (21.59) |
| | Change | 1.88 (26.04) | 8.38 (21.58) | 19.20 (25.71) | −11.7 (20.48) | 3.28 (20.56) | 24.98 (22.48) |
| EQ-5D index UK [−0.59–1.0] | Baseline | 0.461 (0.290) | 0.583 (0.315) | 0.643 (0.283) | 0.677 (0.283) | 0.662 (0.284) | 0.518 (0.303) |
| | Follow-up | 0.171 (0.326) | 0.659 (0.256) | 0.798 (0.194) | 0.413 (0.462) | 0.626 (0.303) | 0.716 (0.285) |
| | Change | −0.290 (0.301) | 0.075 (0.301) | 0.155 (0.286) | −0.265 (0.310) | −0.037 (0.235) | 0.198 (0.336) |
| EQ-5D index D [−0.21–1.0] | Baseline | 0.661 (0.245) | 0.754 (0.265) | 0.808 (0.231) | 0.826 (0.207) | 0.811 (0.230) | 0.715 (0.266) |
| | Follow-up | 0.412 (0.282) | 0.810 (0.221) | 0.903 (0.144) | 0.645 (0.401) | 0.776 (0.253) | 0.836 (0.224) |
| | Change | −0.249 (0.246) | 0.056 (0.260) | 0.095 (0.243) | −0.181 (0.286) | −0.035 (0.207) | 0.121 (0.293) |

HS = Health status; SD = Standard deviation.
Table 3 Comparison of responsiveness statistics for all instruments' composite overall scores and EQ-5D's preference measures by external anchors of change
Anchors: EQ-5D transition question (patient based); BRAMES psychopathology scale (clinician based).

| Statistics | Summary score | Patient based: Worse HS (N = 16) | Patient based: Stable HS (N = 39) | Patient based: Better HS (N = 49) | Clinician based: Worse HS (N = 10) | Clinician based: Stable HS (N = 40) | Clinician based: Better HS (N = 51) |
|---|---|---|---|---|---|---|---|
| T-statistics (paired t-test) | WHOQOL-BREF (total score) | −1.10 | 1.12 | 6.43⁎⁎ | −2.95⁎ | 0.27 | 8.53⁎⁎ |
| | CGI | 1.96 | −1.80 | −5.76⁎⁎ | 4.81⁎⁎ | −0.71 | −7.28⁎⁎ |
| | GAF | −0.65 | 0.29 | 3.44⁎⁎ | −5.41⁎⁎ | −2.41⁎ | 6.27⁎⁎ |
| | EQ VAS | 0.29 | 2.43⁎ | 5.23⁎⁎ | −1.81 | 1.05 | 7.93⁎⁎ |
| | EQ-5D index UK | −3.85⁎⁎ | 1.53 | 3.79⁎⁎ | −2.70⁎ | −1.03 | 4.21⁎⁎ |
| | EQ-5D index D | −4.07⁎⁎ | 1.34 | 2.73⁎⁎ | −2.06 | 1.11 | 2.93⁎ |
| Effect size | WHOQOL-BREF (total score) | −0.33 | 0.18 | 0.89 | −1.22 | 0.03 | 1.23 |
| | CGI | 0.64 | −0.34 | −0.98 | 1.41 | −0.10 | −1.35 |
| | GAF | −0.18 | 0.05 | 0.46 | −1.56 | −0.26 | 0.92 |
| | EQ VAS | 0.08 | 0.36 | 0.84 | −0.43 | 0.14 | 1.19 |
| | EQ-5D index UK | −1.00 | 0.24 | 0.55 | −0.94 | −0.13 | 0.65 |
| | EQ-5D index D | −1.02 | 0.21 | 0.41 | −0.87 | −0.15 | 0.45 |
| SRM | WHOQOL-BREF (total score) | −0.27 | 0.18 | 0.92 | −0.93 | 0.04 | 1.19 |
| | CGI | 0.63 | −0.29 | −0.82 | 1.52 | −0.11 | −1.02 |
| | GAF | −0.16 | 0.05 | 0.49 | −1.71 | −0.37 | 0.88 |
| | EQ VAS | 0.07 | 0.39 | 0.75 | −0.57 | 0.16 | 1.11 |
| | EQ-5D index UK | −0.96 | 0.25 | 0.54 | −0.85 | −0.16 | 0.59 |
| | EQ-5D index D | −1.01 | 0.22 | 0.39 | −0.63 | −0.17 | 0.41 |

Significance of t-statistics: ⁎p < 0.05; ⁎⁎p < 0.01. SRM = Standardized response mean. HS = Health status.
47.2 (SD 13.9). The age ranged from 20 to 86 years, and the majority (70.2%) were female. More than half of all patients lived with a partner (51.9%). Most patients were diagnosed with moderate to severe levels of depressive episodes (61.6%), followed by repeated depressive episodes (38.5%). The mean duration of disease was 7.2 (SD 8.0) years and was associated with 1.8 (SD 2.1) inpatient stays on average. At the time of recruitment 40.8% of the patients were using inpatient care, 35.0% were outpatients, and 24.2% were day patients. At baseline, mean scores in measures of quality of life of patients who completed the follow-up were similar to those of patients not participating in the follow-up interview (p > 0.05). Some mean scores in psychopathology (CGI, BRAMES) and social functioning (GAF) differed significantly between the two groups, indicating a higher impairment in the drop-out group.
3.2. Health status assessed by the EQ-5D descriptive system
Fig. 1 shows the frequency of patients indicating problems in the dimensions of the EQ-5D descriptive
system: at baseline patients indicated problems predominantly in the dimension “anxiety/depression” (78.8%), followed by “usual activities” (66.4%), “pain/discomfort” (66.0%), “mobility” (28.8%) and “self care” (27.9%). At follow-up, improvements in health status were clearly indicated in the dimensions “usual activities”, “pain/discomfort”, and “anxiety/depression”. The number of patients indicating an improvement in health status elicited by one of the EQ-5D dimensions ranged from 16 patients in the dimension “mobility” to 40 patients in the dimension “usual activities”. Indicated deterioration in health status ranged from 13 patients in the dimension “usual activities” to 21 patients in the dimension “anxiety/depression”. At baseline, the most frequently reported EQ-5D self-classified health state showed no problems in the dimensions “mobility” and “self care”, but moderate problems in the other dimensions, which was indicated by 12.6% of all patients. The proportion of individuals with extreme problems in at least one of the dimensions was 25.4%. At follow-up the most frequently reported EQ-5D self-classified health state was no problems on any dimension, which was indicated by 17.3%. Extreme problems in at least one dimension were stated by 21.5%.
3.3. Relationship of difference scores in the EQ VAS and in the EQ-5D indices with other instruments
Table 1 demonstrates that difference scores of the EQ VAS and the EQ-5D indices showed significant correlation with all other difference scores of instruments used for comparison. Correlations of the EQ VAS and the EQ-5D indices with the BRAMES total score indicate a medium to large relationship. The EQ-5D index-D and the EQ-5D index-UK showed a large correlation of rs = 0.945. Correlations of the EQ VAS score with overall composite scales of the instruments tended to be stronger than correlations of the EQ-5D indices with these scales.
3.4. Responsiveness of the EQ VAS and the EQ-5D indices compared with other instruments
Table 2 shows the mean scores and the SD of the baseline interview, at follow-up and the resulting difference scores classified by the levels “worse health status”, “stable health status” and “better health status” according to the two anchors “EQ-5D transition question” and “BRAMES psychopathology scale”. For both anchors and all instruments, the category “better health status” was associated with mean difference scores reflecting an improvement in health status. The category “worse health status” was associated with mean difference scores reflecting a
Table 4 Results of the regression model estimating score differences of the instruments interpreted as meaningful according to the patient/clinician-based anchors, relative to the stable health status and controlled for instrument’s baseline score, period since first diagnosis, and age (N = 104)

Anchor by EQ-5D transition question (patient based)

| Regression coefficient | WHOQOL-BREF | CGI | GAF | EQ VAS | EQ-5D index UK | EQ-5D index D |
|---|---|---|---|---|---|---|
| Constant [SE] | 36.48 [7.41] | 1.77 [0.55] | 39.97 [7.30] | 45.41 [7.81] | 0.560 [0.098] | 0.648 [0.093] |
| (95%-CI) | 21.76; 51.19 | 0.68; 2.86 | 25.48; 54.45 | 29.91; 60.92 | 0.366; 0.754 | 0.464; 0.833 |
| Shift in change scores from stable HS to worse HS [SE] | −18.78 [5.30] | 0.98 [0.37] | −6.77 [3.98] | −18.11 [5.50] | −0.470 [0.038] | −0.397 [0.057] |
| (95%-CI) | −29.28; −8.27 | 0.24; 1.71 | −14.67; 1.14 | −29.02; −7.20 | −0.605; −0.334 | −0.509; −0.284 |
| Shift in change scores from stable HS to better HS [SE] | 19.02 [3.76] | −0.75 [0.26] | 6.84 [2.83] | 12.29 [3.86] | 0.104 [0.049] | 0.059 [0.040] |
| (95%-CI) | 11.56; 26.48 | −1.28; −0.23 | 1.24; 12.45 | 4.62; 19.95 | 0.007; 0.200 | −0.022; 0.139 |
| Instrument's score at baseline [SE] | −0.75 [0.07] | −0.67 [0.10] | −0.72 [0.08] | −0.73 [0.07] | −0.661 [0.075] | −0.674 [0.075] |
| (95%-CI) | −0.89; −0.61 | −0.87; −0.47 | −0.88; −0.57 | −0.88; −0.58 | −0.809; −0.513 | −0.822; −0.526 |
| Period since first diagnosis (years) [SE] | −0.23 [0.23] | 0.03 [0.02] | −0.28 [0.17] | 0.06 [0.24] | 0.004 [0.003] | 0.004 [0.002] |
| (95%-CI) | −0.69; 0.23 | −0.004; 0.06 | −0.63; 0.07 | −0.41; 0.54 | −0.002; 0.009 | −0.0004; 0.009 |
| Age [SE] | 0.06 [0.13] | 0.01 [0.01] | 0.14 [0.10] | 0.01 [0.14] | −0.002 [0.002] | −0.002 [0.001] |
| (95%-CI) | −0.20; 0.32 | −0.03; 0.01 | −0.06; 0.33 | −0.26; 0.28 | −0.006; 0.001 | −0.005; 0.0006 |
| R2 | 0.63 | 0.41 | 0.51 | 0.53 | 0.58 | 0.58 |

Anchor by BRAMES psychopathology scale (clinician based)

| Regression coefficient | WHOQOL-BREF | CGI | GAF | EQ VAS | EQ-5D index UK | EQ-5D index D |
|---|---|---|---|---|---|---|
| Constant [SE] | 29.13 [8.55] | 1.34 [0.51] | 21.48 [7.48] | 34.31 [8.39] | 0.328 [0.124] | 0.429 [0.121] |
| (95%-CI) | 12.17; 46.09 | 0.33; 2.34 | 6.63; 36.32 | 17.66; 50.96 | 0.082; 0.575 | 0.188; 0.669 |
| Shift in change scores from stable HS to worse HS [SE] | −23.18 [6.59] | 1.26 [0.41] | −13.18 [4.08] | −18.55 [6.33] | −0.223 [0.095] | −0.139 [0.080] |
| (95%-CI) | −36.25; −10.10 | 0.44; 2.08 | −21.29; −5.08 | −31.12; −5.99 | −0.411; −0.036 | −0.298; 0.021 |
| Shift in change scores from stable HS to better HS [SE] | 15.98 [4.25] | −1.00 [0.27] | 11.98 [2.73] | 13.49 [4.00] | 0.167 [0.059] | 0.106 [0.049] |
| (95%-CI) | 7.55; 24.40 | −1.53; −0.47 | 6.57; 17.39 | 5.54; 21.43 | 0.050; 0.283 | 0.008; 0.204 |
| Instrument's score at baseline [SE] | −0.54 [0.08] | −0.44 [0.11] | −0.49 [0.08] | −0.53 [0.08] | −0.444 [0.092] | −0.505 [0.094] |
| (95%-CI) | −0.70; −0.38 | −0.65; −0.23 | −0.65; −0.34 | −0.69; −0.38 | −0.627; −0.261 | −0.691; −0.319 |
| Period since first diagnosis (years) [SE] | −0.17 [0.25] | 0.02 [0.16] | −0.19 [0.16] | 0.07 [0.24] | 0.01 [0.04] | 0.02 [0.003] |
| (95%-CI) | −0.67; 0.33 | −0.01; 0.05 | −0.50; 0.12 | −0.40; 0.55 | −0.006; 0.008 | −0.004; 0.008 |
| Age [SE] | 0.021 [0.14] | −0.01 [0.01] | 0.17 [0.09] | −0.01 [0.14] | −0.02 [0.002] | −0.01 [0.002] |
| (95%-CI) | −0.26; 0.30 | −0.03; 0.01 | −0.001; 0.35 | −0.27; 0.26 | −0.006; 0.002 | −0.005; 0.002 |
| R2 | 0.56 | 0.45 | 0.60 | 0.51 | 0.37 | 0.34 |

HS = Health status; SE = Standard error; 95%-CI = 95% confidence interval.
deterioration in health status, except for the EQ VAS anchored by the EQ-5D transition question; here the EQ VAS difference score reflects an improvement in health, although the anchor is associated with “worse health status”. Another striking issue concerns both EQ-5D indices: the absolute values of mean difference scores reflecting a deterioration in health status in the category “worse health status” were considerably larger than the mean difference scores reflecting an improvement in health status in the category “better health state”.
Table 3 shows the responsiveness statistics (T-statistics, effect size (ES), standardized response mean (SRM)) for all instruments used split by the two anchors. The results are mainly based on the mean scores and SD presented in Table 2. In general, responsiveness of instruments was larger according to the clinician-based anchor compared with the patient-based anchor. Moreover, the WHOQOL-BREF, the CGI, and the EQ VAS showed a large improvement in health in the category “better health status” according to both anchors. Concerning the category “better health status”, the EQ-5D index-UK demonstrated rather medium ES and SRM, whereas the responsiveness of the EQ-5D index-D was even smaller. Concerning the category “worse health status” the ES and SRM of both EQ-5D indices were almost twice as large, indicating rather large responsiveness.
3.5. Meaningful differences of the EQ VAS and the EQ-5D indices compared with other instruments
Table 4 shows the results of the regression analysis. In general, absolute values of regression coefficients associated with a shift to “better health status” were smaller than regression coefficients associated with a shift to “worse health status” (for both anchors). Moreover, all coefficients indicating the influence of the baseline score were significant, but age and period since first diagnosis had no significant influence on difference scores.
That is, the smaller the score at baseline, the larger the difference between the two assessments. In detail, the WHOQOL-BREF and the EQ VAS showed similar meaningful differences under both anchors relative to the group with stable health status. That the difference scores recording a meaningful improvement/deterioration in health status could differ across anchors became particularly apparent in the EQ-5D indices: the EQ-5D index-UK indicated a gain between 0.104 and 0.167 as a meaningful difference associated with a shift to a “better health status” and a reduction between 0.223 and 0.470 as a meaningful difference associated with a shift to a “worse
health status”. Compared with the instruments measuring quality of life, psychopathology or social functioning, the shift to a “better health status” was smaller if elicited by the EQ-5D indices.
4. Discussion
Consistent with the first hypothesis, there was a medium to large overlap of constructs measuring aspects of quality of life, psychopathology, and social functioning. More specifically, the EQ VAS seemed to measure similar constructs as the WHOQOL-BREF, whereas the two EQ-5D indices showed less overlap with the instruments used for comparison. The EQ-5D index-UK revealed almost perfect correlation with the EQ-5D index-D. Thus, correlation analysis could lead to the impression that the measurement constructs recorded by the EQ-5D indices and the remaining instruments were not perfectly congruent. Results concerning the second hypothesis supported this impression; in comparison with the responsiveness statistics of the WHOQOL-BREF, the CGI, and the GAF, responsiveness was large in the EQ VAS, yet rather medium in the EQ-5D indices. The EQ-5D indices differed from other instruments with respect to quantifying a shift in health status according to both a patient-based and a clinician-based anchor. In this respect, preference weights elicited from the general population seemed either to underestimate a health improvement or probably to measure a construct different from that of the instruments used for comparison. It is important to appreciate that particularly the characteristics of the EQ-5D preference-based measures, the definition of the anchors used and the nature of illness might account for the instrument’s responsiveness to record meaningful change in health status (Beaton et al., 2001; Krabbe et al., 2004). The characteristics of EQ-5D index scores are rather complex, reflecting status on health dimensions as well as preferences of the general population.
At baseline patients were mainly affected by problems in the EQ-5D dimensions “usual activities”, “pain/discomfort”, and “anxiety/depression”; at follow-up substantial improvement was indicated in the two EQ-5D dimensions “anxiety/depression” and “usual activities”, with a smaller number of patients reporting moderate and extreme problems. Responsiveness and meaningful differences of the EQ-5D indices were partly determined by their scoring algorithm. It contains a term – N3 – that reduces the index by 0.269 (EQ-5D index-UK) and 0.323 (EQ-5D index-D) units if patients indicate “extreme problems” in any dimension. Thus, if patients shift away from “extreme problems”, the EQ-5D index increases by the same amount. In this sample only four
patients changed in this direction. In the case of the EQ-5D index-UK, a gain of 0.094 units was attached to the shift from “extreme problems” to “no problems” in the dimension “usual activities”, and a gain of 0.236 units was attached to such a shift in the dimension “anxiety/depression” (Dolan, 1997). Such a shift in the dimension “mobility”, which captures problems principally due to physical impairment, would lead to a gain of 0.314 units. Thus, the numerical weight of each dimension on the composite preference index suggests that the general population in the United Kingdom assigns “usual activities” and “anxiety/depression” lower importance than, for example, “mobility” and “pain/discomfort” within the five-dimensional health state described by the EQ-5D. As a consequence, it seems that disease-specific aspects of depression have a rather low impact on the EQ-5D index-UK (Willige van de et al., 2005). With respect to the EQ-5D index-D, the shift from “extreme problems” to “no problems” in the dimensions “usual activities” and “anxiety/depression” leads to zero unit change and a gain of only 0.065 units, respectively (Greiner et al., 2004). In contrast to the two indices, disease-specific aspects of depression seem to exert influence on the EQ VAS evaluation of health status. This may be one reason why the EQ VAS exhibited larger responsiveness than the EQ-5D indices in the category “better” health status of both anchors. The EQ-5D index-UK and, to a greater extent, the EQ-5D index-D recorded a change within the category “worse health status” with greater responsiveness than a change in the category “better health status”. Intuitively, one explanation might be that the general population associates a greater loss in utility if patients shift to a “worse health status”. This would support the validity of the EQ-5D index in this patient sample.
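The scoring arithmetic discussed above (dimension-specific decrements plus the N3 term) can be made concrete with a minimal sketch of the EQ-5D index-UK. The decrements 0.094, 0.236 and 0.314 and the N3 value of 0.269 are those quoted in the text; the remaining decrements and the constant are recalled from the published UK tariff and should be treated as assumptions to be checked against Dolan (1997). The function and variable names are hypothetical.

```python
# Decrements for level 2 and level 3 in each dimension of the UK tariff.
# Values not quoted in the text are assumptions; verify against Dolan (1997).
UK_DECREMENTS = {
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}
UK_CONSTANT = 0.081  # assumption: subtracted once if any dimension is above level 1
UK_N3 = 0.269        # from the text: subtracted once if any dimension is at level 3


def eq5d_index_uk(state: str) -> float:
    """Score a five-digit health state such as '11123' (levels 1 to 3 per dimension)."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    levels = [int(ch) for ch in state]
    if all(level == 1 for level in levels):
        return 1.0  # full health
    index = 1.0 - UK_CONSTANT
    for (level2, level3), level in zip(UK_DECREMENTS.values(), levels):
        if level == 2:
            index -= level2
        elif level == 3:
            index -= level3
    if 3 in levels:
        index -= UK_N3  # the N3 term is applied once, however many level-3 answers occur
    return round(index, 3)


before = eq5d_index_uk("11123")  # extreme problems in anxiety/depression
after = eq5d_index_uk("11121")   # no problems in anxiety/depression
print(before, after, round(after - before, 3))
```

In this hypothetical pair of states the gain is 0.236 + 0.269 = 0.505, because the only "extreme problems" answer disappears and the N3 term is no longer applied. If another dimension had remained at level 3, the gain would have been 0.236 alone.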
Another possible explanation is the often reported ceiling effect of the EQ-5D indices (Brazier et al., 2004; König et al., 2006). Thus, the larger range at the bottom of the EQ-5D indices provided a potential for the assessment of larger change in health than at the compressed top of the scale. It seems that the responsiveness of the EQ-5D indices depends not only on the patient's change in health status but also on the degree of the patient's impairment. An instrument's responsiveness and its ability to detect a “meaningful difference” depend to a considerable extent on the anchor's definition (Guyatt et al., 2002). A clinician’s opinion about a “meaningful difference” may differ from the patient's opinion and may lead to different results. In the patient-based anchor, the EQ-5D transition questions determined the categories of change. The estimation of “meaningful differences” of
an instrument which serves as an anchor (EQ-5D transition questions) and which is also part of the analysis (EQ VAS, EQ-5D indices) may lead to confounded results. Moreover, the likelihood of a patient belonging to one category defined by one of the anchors might be affected by the follow-up score. Supplementary Spearman correlation analysis showed large positive correlation of the EQ-5D transition question with follow-up scores, but small, near-zero correlation with baseline scores. This may be a hint that patients do not remember their baseline health status well at the time of answering the transition question (Norman et al., 1997). In contrast, the clinician-based anchor showed large positive correlation with follow-up scores and large negative correlation with baseline scores, which was expected since this transition assessment is based on the distribution of a disease-specific instrument measuring psychopathology. Additionally, the time between baseline and follow-up assessment was up to 18 months, which is a long time frame compared with other longitudinal studies evaluating HRQOL in depression (Sapin et al., 2004). A response shift due to adaptation in health status over time can be expected, and the meaningful difference score might not be constant over time, especially if a large change in health status is expected at the beginning of therapy. The selection of the five EQ-5D dimensions describing HRQOL was mainly based on the experience of the EuroQol group members along with a review of other generic HRQOL instruments (Brooks, 1996). With regard to patients with depression, empirical results suggest that other attributes like “sleep, cognition, energy, and participation in recreation” also have an impact on HRQOL (Skevington and Wright, 2001).
In brief, quality of life in patients with depression is presumably more comprehensive than the EQ-5D is able to capture, and the extent to which EQ-5D index scores reflect problems other than those in the dimension “anxiety/depression” in patients with depression needs further research (Supina et al., 2007). It should be noted that the EQ-5D index was primarily developed as an instrument to measure outcome for QALY analysis in economic evaluation rather than to elicit HRQOL in clinical trials (Supina et al., 2007). Within the framework of economic evaluation, the incremental cost-effectiveness ratio (ICER) (i.e. additional cost per QALY gained) is more informative than the difference in HRQOL alone. Theoretically, even a small difference in the EQ-5D index might still be “cost-effective” if the additional cost for such a change in HRQOL is very low. In conclusion, despite the small
“meaningful differences”, the economic information might be useful from a decision maker’s perspective. However, the low responsiveness of the EQ-5D index tends to increase the uncertainty associated with the estimated ICER. Further research is needed at least in terms of how change in health status of patients with depression is elicited by other preference-based questionnaires (e.g. the SF-6D, the HUI or the AQoL). A limitation of our study was the sample size, which may have precluded the detection of the very small responsiveness of the EQ-5D indices. We suggest that further research should resolve problems in statistical power by using a larger sample. However, the sample size was adequate to show the superior responsiveness of the instruments used for comparison. Therefore, given limited study resources, investigators have to consider that a larger sample size is required to detect significant meaningful differences in EQ-5D index scores of patients with depression.
Acknowledgement
This study was funded by the German Statutory Health Insurance (grant number 932000-050) and the German Federal Ministry of Education and Research (grant number 01ZZ0106).
References
Angermeyer, M.C., Kilian, R., Matschinger, H., 2000. Handbuch für die deutsche Version der WHO Instrumente zur Erfassung der Lebensqualität. Hogrefe Verlag. Beaton, D.E., Bombardier, C., Katz, J.N., Wright, J.G., 2001. A taxonomy for responsiveness. J. Clin. Epidemiol. 54, 1204–1217. Bech, P., Bolwig, T.G., Kramp, P., Rafaelsen, O.J., 1979. The Bech–Rafaelsen mania scale and the Hamilton depression scale. Acta Psychiatr. Scand. 59, 420–430. Brazier, J., Roberts, J., Tsuchiya, A., Busschbach, J., 2004. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 13, 873–884. Brooks, R., 1996. EuroQol: the current state of play. Health Policy 37, 53–72. Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale, NJ.
Dolan, P., 1997. Modeling valuations for EuroQol health states. Med. Care 35, 1095–1108. Dolan, P., 1999. Whose preferences count? Med. Decis. Mak. 19, 482–486. Drummond, M., Jonsson, B., Rutten, F., 1997. The role of economic evaluation in the pricing and reimbursement of medicines. Health Policy 40, 199–215. Food and Drug Administration, 2006. Draft guidance for industry patient reported outcome measures: use in medical product development in support labeling claims. Fed. Regist. 71, 5862–5863. Gold, M., Siegel, J., Russel, L., Weinstein, M., 1996. Cost-Effectiveness in Health and Medicine. Oxford University Press, New York.
Goldman, H.H., Skodol, A.E., Lave, T.R., 1992. Revising axis V for DSM-IV: a review of measures of social functioning. Am. J. Psychiatry 149, 1148–1156. Greiner, W., Claes, C., Busschbach, J.J., Graf von der Schulenburg, J.M., 2004. Validating the EQ-5D with time trade off for the German population. Eur. J. Health Econ. 6, 124–130. Guy, W., 1976. CGI, ECDEU assessment manual for psychopharmacology. US Department of Health, Education, and Welfare, pp. 76–338. Guyatt, G., Walter, S., Norman, G., 1987. Measuring change over time: assessing the usefulness of evaluative instruments. J. Chronic. Dis. 40, 171–178. Guyatt, G.H., Osoba, D., Wu, A.W., Wyrwich, K.W., Norman, G.R., 2002. Methods to explain the clinical significance of health status measures. Mayo Clin. Proc. 77, 371–383 (Clinical Significance Consensus Meeting Group). Hayhurst, H., Palmer, S., Abbott, R., Johnson, T., Scott, J., 2006. Measuring health related quality of life in bipolar disorder: relationship of the EuroQol (EQ-5D) to condition-specific measures. Qual. Life Res. 15, 1271–1280. Jaeschke, R., Singer, J., Guyatt, G.H., 1989. Measurement of health status. Ascertaining the minimal clinically important difference. Control. Clin. Trials 10, 407–415. König, H.H., Roick, C., Angermeyer, M.C., 2006. Validity of the EQ-5D in assessing and valuing health status in patients with schizophrenic, schizotypal or delusional disorders. Eur. Psychiatr. 22, 177–187. Krabbe, P.F., Peerenboom, L., Langenhoff, B.S., Ruers, T.J., 2004. Responsiveness of the generic EQ-5D summary measure compared to the disease-specific EORTC QLQ C-30. Qual. Life Res. 13, 1247–1253. Lamers, L.M., Bouwmans, C.A., van Straten, A., Donker, M.C., Hakkaart, L., 2006. Comparison of EQ-5D and SF-6D utilities in mental health patients. Health Econ. 15, 1229–1236. Licht, R.W., Qvitzau, S., Allerup, P., Bech, P., 2005.
Validation of the Bech–Rafaelsen melancholia scale and the Hamilton depression scale in patients with major depression; is the total score a valid measure of illness severity? Acta Psychiatr. Scand. 111, 144–149. Norman, G.R., Stratford, P., Regehr, G., 1997. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J. Clin. Epidemiol. 50, 869–879. Norman, G.R., Sloan, J.A., Wyrwich, K.W., 2003. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med. Care 41, 582–592. Papakostas, G.I., Petersen, T., Mahal, Y., Mischoulon, D., Nierenberg, A.A., Fava, M., 2004. Quality of life assessments in major depressive disorder: a review of the literature. Gen. Hosp. Psych. 26, 13–17. Revicki, D.A., Osoba, D., Fairclough, D., Barofsky, I., Berzon, R., Leidy, N.K., Rothman, M., 2000. Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Qual. Life Res. 9, 887–900. Revicki, D.A., Cella, D., Hays, R.D., Sloan, J.A., Lenderking, W.R., Aaronson, N.K., 2006. Responsiveness and minimal important differences for patient reported outcomes. Health Qual. Life Outcomes 4, 70. Roick, C., Deister, A., Zeichner, D., Birker, T., König, H.H., Angermeyer, M.C., 2005. The regional budget for mental health care: a new approach to combine inpatient and outpatient care. Psychiatr. Prax. 32, 177–184. Sapin, C., Fantino, B., Nowicki, M.L., Kind, P., 2004. Usefulness of EQ-5D in assessing health status in primary care patients with major depressive disorder. Health Qual. Life Outcomes 2, 20.
Shaw, J.W., Johnson, J.A., Coons, S.J., 2005. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med. Care 43, 203–220. Skevington, S.M., Wright, A., 2001. Changes in the quality of life of patients receiving antidepressant medication in primary care: validation of the WHOQOL-100. Br. J. Psychiatry 178, 261–267. Sprangers, M.A., Moinpour, C.M., Moynihan, T.J., Patrick, D.L., Revicki, D.A., 2002. Assessing meaningful change in quality of life over time: a users' guide for clinicians. Mayo Clin. Proc. 77, 561–571. Supina, A.L., Johnson, J.A., Patten, S.B., Williams, J.V., Maxwell, C.J., 2007. The usefulness of the EQ-5D in differentiating among persons with major depressive episode and anxiety. Qual. Life Res. 16, 749–754.
The EuroQol Group, 1990. EuroQol - a new facility for the measurement of health-related quality of life. The EuroQol group. Health Policy 16, 199–208. WHOQOL Group, 1998. Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychol. Med. 28, 551–558. Wiebe, S., Guyatt, G., Weaver, B., Matijevic, S., Sidwell, C., 2003. Comparative responsiveness of generic and specific quality-of-life instruments. J. Clin. Epidemiol. 56, 52–60. Willige van de, G., Wiersma, D., Nienhuis, F.J., Jenner, J.A., 2005. Changes in quality of life in chronic psychiatric patients: a comparison between EuroQol (EQ-5D) and WHOQoL. Qual. Life Res. 14, 441–451. World Health Organization, 2004. International Statistical Classification of Diseases and Related Health Problems, Tenth Revision.