Original Research PULMONARY ARTERIAL HYPERTENSION
Estimating a Minimally Important Difference in Pulmonary Arterial Hypertension Following Treatment With Sildenafil*
Claire Gilbert, MSc; Martin C. J. Brown, MSc; Joseph C. Cappelleri, PhD; Martin Carlsson, MS; and Stephen P. McKenna, PhD
Background: No guidelines exist to help physicians determine whether functional and health-related quality of life (HRQoL) changes observed following treatment of patients with pulmonary arterial hypertension (PAH) represent important benefits. These analyses were undertaken to help define a minimally important difference (MID) in exercise capacity, measured by the 6-min walk distance (6MWD), and HRQoL, measured by the Short Form-36 (SF-36) questionnaire, in patients with PAH.
Patients and methods: Data from a 12-week sildenafil study in patients with PAH were used to calculate MIDs for 6MWD and the SF-36 physical functioning, role-physical, social functioning, and vitality scales using effect size, SEM, and SE of the difference approaches. Data from all patients enrolled into the treatment groups in the study were included.
Results: A range of plausible MID estimates, including a score change for SF-36 scales and a change in distance walked in meters for 6MWD, were generated for each end point. Mean values were calculated for each outcome and recommended as MIDs for each parameter. Based on these computations, the mean MIDs for the SF-36 physical functioning, role-physical, social functioning, and vitality scales and for 6MWD were 13, 25, 21, and 15 points, and 41 m, respectively.
Conclusions: This is the first clinical investigation to estimate MIDs for key SF-36 domains and 6MWD in patients with PAH and provides a much needed metric for interpreting the level of change in patients with PAH against which other treatments and trials can be measured. (CHEST 2009; 135:137-142)
Key words: minimal important difference; patient-perceived benefits; pulmonary hypertension; sildenafil; sustained improved health-related quality of life
Abbreviations: df = degrees of freedom; EOT = end of treatment; ES = effect size; HRQoL = health-related quality of life; MID = minimal important difference; PAH = pulmonary arterial hypertension; Sdiff = SE of the difference; SF-36 = Short Form-36; 6MWD = 6-min walk distance; SUPER = Sildenafil Use in Pulmonary Arterial Hypertension; WHO = World Health Organization
*From PGRD (Ms. Gilbert), Pfizer Ltd., Sandwich, Kent, UK; Global Health Outcomes Research (Mr. Brown), UCB Celltech, Slough, Berkshire, UK; Pfizer Inc (Dr. Cappelleri and Mr. Carlsson), Groton, CT; and Galen Research (Dr. McKenna), University of Central Lancashire, Manchester, UK.
Ms. Gilbert is an employee of Pfizer Ltd., Dr. Cappelleri and Mr. Carlsson are employees of Pfizer Inc, Mr. Brown is a former employee of Pfizer Ltd., and Dr. McKenna was an unpaid consultant on this project. There are no other conflicts of interest. This study was funded by Pfizer Ltd., Sandwich, Kent, UK.
Manuscript received January 30, 2008; revision accepted August 21, 2008.
Reproduction of this article is prohibited without written permission from the American College of Chest Physicians (www.chestjournal.org/misc/reprints.shtml).
Correspondence to: Claire Gilbert, PGRD, Pfizer Ltd., Ramsgate Rd, Sandwich CT13 9NJ, UK; e-mail: [email protected]
DOI: 10.1378/chest.07-0275
www.chestjournal.org CHEST / 135 / 1 / JANUARY, 2009

Pulmonary arterial hypertension (PAH), a devastating condition of varied etiology, is characterized by vascular remodeling and a progressive increase in pulmonary vascular resistance causing failure of the right ventricle and premature death. It is defined as persistently elevated pulmonary arterial pressure > 25 mm Hg at rest and > 30 mm Hg with exercise.1-3 Patients with PAH exhibit extensive decreases in physical mobility, energy, and sleep, and increases in pain, emotional reactions, and social isolation.4 Typically, patients with PAH can walk < 400 m as measured by 6-min walk distance (6MWD) compared with 400 to 700 m for healthy individuals.5,6 Improvements following treatment have been reported: patients treated with epoprostenol were less fatigued, emotionally stronger, and had greater feelings of control over their disease than those not treated. Thus, both clinical and health-related quality of life (HRQoL) end points are essential for determining the outcomes of treatment.7 However, no previous trials have attempted to quantitate the minimal effect in response to treatment that can be interpreted as important.

There is controversy within the literature about the precise definitions and methods used to interpret changes in end points. A common approach is to use the minimal important difference (MID), an estimation of the smallest difference in a measured end point that signifies an important rather than trivial, although statistically significant, change in outcome.8 Calculating MID allows us to interpret the meaning and relevance of any change observed and to judge the magnitude and relevance of the benefits.8-10 However, defining MID is problematic and complicated by differences in patient or physician perceptions. For example, individuals may assess the same benefit differently, or the same individual may change the value placed on a particular benefit based on circumstance.10 Therefore, determinations of MID may vary depending on how the outcomes are defined and measured.

No estimates of MID have yet been published for outcomes in PAH, nor is there a consensus on the methods to use. The main approaches are either anchor based8-12 or distribution based.8-18 While anchor-based calculations of MID may often be preferred,8 this approach could not be used in the present study as no suitable anchor was included in this trial.
Instead, several distribution-based methods, namely effect size (ES),13,14 SEM,10 and SE of the difference (Sdiff),16 were used to produce a range of MIDs. We report this first attempt at estimating MID in PAH for prespecified Short Form-36 (SF-36) scales based on the results obtained in a randomized clinical trial. The MID for the 6MWD was also determined.
MATERIALS AND METHODS
The Sildenafil Use in Pulmonary Arterial Hypertension (SUPER) trial (Revatio; Pfizer Inc; New York, NY) was a double-blind, placebo-controlled study6 of 278 patients with symptomatic PAH. Patients were randomized to placebo or oral sildenafil three times daily for 12 weeks. Study medication was added to each patient's background therapy. The primary clinical end point was exercise capacity measured by 6MWD.5,6 A secondary end point was HRQoL assessed with the SF-36 questionnaire.19 In the analyses that follow, data were pooled from all treatment groups (n = 207).
HRQoL Instrument
The SF-36 version 1 is a widely used, multidimensional, generic HRQoL questionnaire that measures elements of physical and mental health with scores ranging from 0 to 100.20 It was selected for the SUPER trial because it appeared to address areas of HRQoL affected by PAH. No PAH-specific patient-reported outcome measures (eg, the Cambridge Pulmonary Hypertension Outcome Review21) were available at the onset of this trial. Four SF-36 scales expected to be most relevant to patients with PAH were prespecified for these analyses: the physical functioning and role-physical scales, which address physical parameters (eg, climbing stairs, lifting groceries, and walking a specified distance), and the social functioning and vitality scales. The SF-36 questionnaire was completed at the time of randomization and at the end of treatment (EOT).
Estimating a MID
Three distribution-based methods of estimating MID were explored as follows:
ES: ES provides a benchmark for assessing the magnitude and meaning of changes in health status.13 ES was defined as the average of the EOT minus baseline score divided by the SD of the baseline scores. Since no consensus exists on what constitutes a meaningful ES in patients with PAH, MID values were calculated using general guidelines for "small-to-moderate" (0.3), "moderate" (0.5), and "large" (0.8) ES levels,15 for the 6MWD test and each of the prespecified SF-36 scales.
SEM: SEM is a cross-sectional statistic defined as "the variability between an individual's observed score and the true score," which estimates the extent to which observed change is true change, above and beyond individual measurement error.17 It is based on the premise that any change > 1 SEM,16-18 or > 1.96 SEM,17,18 is likely to be meaningful. For the SF-36 domains, SEM was calculated at both thresholds as the SD of the instrument multiplied by the square root of 1 minus its reliability coefficient (ie, Cronbach α).10,14 Since the 6MWD test contains only a single determination, no measure of Cronbach α could be obtained. Instead, an estimate of the intraclass correlation coefficient was used as a reliability coefficient. It was calculated from the screening-to-baseline period within the study, during which 6MWD was not expected to change. This reliability estimate was then used for the calculations made at both time points (for baseline-based estimates and for the follow-up-based estimates leading to Sdiff).
Sdiff: Sdiff is based on SEM at two time points within a study and may offer a more accurate estimation of errors in measurement of the instrument being used, and a better interpretation of longitudinal change within a population.16 It is calculated as the square root of the sum of the squares of the SEM at the two time points within the trial:
$$S_{\text{diff}} = \sqrt{\mathrm{SEM}_{\text{baseline}}^{2} + \mathrm{SEM}_{\text{EOT}}^{2}}$$
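As a rough illustration of the three distribution-based calculations described above, the following Python sketch computes an ES-based threshold, SEM, and Sdiff. The function names and the input values (baseline SD, EOT SD, and reliability) are hypothetical and chosen only to show the arithmetic; they are not data from the SUPER trial. Taking the ES-based threshold as the ES level multiplied by the baseline SD is an assumption that follows from rearranging the ES definition.

```python
import math


def es_threshold(sd_baseline: float, es_level: float) -> float:
    """Score change corresponding to a given ES level (ES level x baseline SD)."""
    return es_level * sd_baseline


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def sdiff(sem_baseline: float, sem_eot: float) -> float:
    """SE of the difference: sqrt(SEM_baseline^2 + SEM_EOT^2)."""
    return math.sqrt(sem_baseline ** 2 + sem_eot ** 2)


# Hypothetical inputs for a single scale (not values from the trial).
SD_BASELINE = 20.0
SD_EOT = 22.0
RELIABILITY = 0.84  # e.g., Cronbach alpha or an intraclass correlation

sem_baseline = sem(SD_BASELINE, RELIABILITY)
sem_eot = sem(SD_EOT, RELIABILITY)
s_diff = sdiff(sem_baseline, sem_eot)

# The seven candidate thresholds considered for each end point.
estimates = {
    "ES 0.3": es_threshold(SD_BASELINE, 0.3),
    "ES 0.5": es_threshold(SD_BASELINE, 0.5),
    "ES 0.8": es_threshold(SD_BASELINE, 0.8),
    "1 SEM": sem_baseline,
    "1 Sdiff": s_diff,
    "1.96 SEM": 1.96 * sem_baseline,
    "1.96 Sdiff": 1.96 * s_diff,
}

for label, value in estimates.items():
    print(f"{label}: {value:.2f}")
```

Because Sdiff combines the SEM from two time points, it is always at least as large as either SEM alone, which is why Sdiff-based thresholds sit above the corresponding SEM-based ones.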
Corresponding estimates based on Sdiff were calculated for the 6MWD test and for each of the prespecified SF-36 scales at each threshold value of SEM.
Interpreting the Range of Estimates
As a result of the focus on distribution-based methods, the work reported here uses multiple approaches to estimate a range of change that can be interpreted as meaningful and could be assumed to incorporate the MID, rather than determining one singular threshold. Researchers have used ES levels from 0.2 to 0.8 to estimate MID,22,23 with the minimal change usually associated with ES between 0.2 and 0.5.24 Published reports state that 1 SEM can be equated with meaningful benefit in HRQoL measures,16,18,25 and that 1.96 SEM offers a more conservative estimate. Since no firm consensus exists on what magnitude of ES or SEM corresponds to a MID, the required subgroup of those experiencing minimal change, as described by Farivar et al,26 could not be clearly identified. The spread of estimates selected to represent a whole range of estimates was evaluated and, in the absence of a standard method for deciding one "correct" value from the range of estimates, the average was taken to represent a central magnitude of change that could be indicative of MID. The utility and limitations of these methods have been discussed in the literature. A post hoc responder analysis (χ²) was then conducted to evaluate the number of patients achieving the MID in the treatment and placebo arms of the SUPER study. As this work is an exploratory examination of methods used to estimate MID and the number of tests conducted was not high, no adjustment in p values was made.
RESULTS
Patient Characteristics
There were no significant differences in the baseline characteristics of the 207 patients in the different treatment groups of the trial (Table 1). The most frequent diagnosis was idiopathic PAH (64%) followed by PAH associated with connective tissue disease (30%). An overwhelming majority of the patients (97%) were placed in either World Health Organization (WHO) functional class II (36%) or functional class III (62%).

Table 1-Baseline Characteristics of Patients Enrolled*

                                     Sildenafil Treatment Groups
Characteristics                      20 mg (n = 69)   40 mg (n = 67)   80 mg (n = 71)
Female gender                        49 (71)          47 (70)          56 (79)
WHO functional class
  I                                  0 (0)            0 (0)            0 (0)
  II                                 24 (35)          23 (34)          28 (39)
  III                                40 (58)          44 (66)          44 (59)
  IV                                 5 (7)            0 (0)            1 (1)
Diagnosis
  Idiopathic PAH                     44 (64)          43 (64)          46 (65)
  Associated PAH
    Connective tissue disease        21 (30)          20 (30)          21 (30)
    Repaired congenital S-P shunts   4 (6)            4 (6)            4 (6)
6MWD, m                              347 ± 90         345 ± 77         339 ± 79

*Data are presented as No. (%) or mean ± SD.

Calculating MID
There were statistically significant differences from baseline to week 12 in all the prespecified SF-36 scales and 6MWD for the treatment groups when compared with placebo.19 While there was no statistically significant change in the placebo group for the 6MWD, there were statistically significant placebo effects for some SF-36 scales (paired t tests; degrees of freedom [df]): physical functioning (t[df = 69] = 2.347; p = 0.022; mean change, 4.48), role-physical (t[df = 67] = 3.142; p = 0.003; mean change, 15.44), social functioning (t[df = 69] = 2.473; p = 0.016; mean change, 7.32), and vitality (t[df = 69] = 2.679; p = 0.009; mean change, 5.5). Point estimates of MID were calculated using each of the methods described and are shown in Table 2 together with the mean and median of these estimates. The mean MIDs were similar for the SF-36 physical functioning and vitality scales (13 and 15, respectively) and a little higher for the social functioning (21) and role-physical (25) scales. The mean MID for the 6MWD was 41 m. There was considerable variation in the estimates on which the mean for each end point was calculated, with a range of 35 points for the social functioning scale and > 55 m for the 6MWD (Table 2). As expected, the magnitude of estimates increased with level of ES from 0.3 to 0.8 and also with SEM threshold. The Sdiff calculations incorporated the variability of the measure at both baseline and EOT and, consequently, were consistently higher than the SEM calculations.
Responder Analysis
In a retrospective responder analysis (χ²), a higher sampled proportion of patients achieved the computed MID in the sildenafil group than in the placebo group for the 6MWD and the SF-36 physical functioning, role-physical, and vitality scales (Table 3). However, there was no difference between the groups for the social functioning scale. Statistical significance by χ² analysis was observed only for the 6MWD (p < 0.0001) and the physical functioning scale (p = 0.034).

Table 2-Calculating a MID: Comparison of Estimates Using All Methods*

End Point                    ES 0.3   ES 0.5   ES 0.8   1 SEM    1 Sdiff  1.96 SEM  1.96 Sdiff  Range         Median  Mean†   MID†
SF-36 physical functioning   5.84     9.73     15.57    7.67     12.92    15.03     25.32       5.84-25.32    12.92   13.15   13
SF-36 role-physical          11.98    19.96    31.94    15.04    23.66    29.48     46.37       11.98-46.37   23.66   25.49   25
SF-36 social functioning     7.91     13.19    21.10    14.39    21.74    28.20     42.61       7.91-42.61    21.10   21.31   21
SF-36 vitality               6.88     11.46    18.34    9.31     13.70    18.25     26.85       6.88-26.85    13.70   14.97   15
6MWD, m                      18.70    31.16    49.86    24.74    37.83    48.49     74.15       18.70-74.15   37.83   40.70   41

*ES 0.3 (small to moderate); ES 0.5 (moderate); ES 0.8 (large); 1 SEM = change of 1 SEM at baseline; 1 Sdiff = change of 1 Sdiff; 1.96 SEM = change of 1.96 SEM at baseline; 1.96 Sdiff = change of 1.96 Sdiff.
†Mean of all MID estimations.

DISCUSSION
To interpret any change in score on a trial end point accurately, it is essential to distinguish between changes that are clinically important or meaningful to the patient, and those that are trivial to the patient even though they may be statistically significant. The usual way of doing this is to calculate the MID: the smallest change in outcome that is considered to be meaningful to the patient. In the SUPER trial,21 statistically significant improvements were observed in several clinical parameters by EOT in patients treated with oral sildenafil. These were an increase in distance walked and improvements in hemodynamic parameters, WHO functional class, and relevant HRQoL domains.19 The clinical improvements observed in the sildenafil-treated patients would be expected to have a positive impact on scores on the SF-36 scales that measure parameters associated with PAH symptoms (ie, physical functioning, role-physical, social functioning, and vitality).

Table 3-Responder Analysis: Comparison of Placebo- and Sildenafil-Treated Groups

Parameters                    Placebo   Sildenafil   p Value    χ² (df = 1)
SF-36 physical functioning                           0.034      4.5156
  Total treated, No.          70        200
  Responders, No.             21        89
  Responders, %               30        45
SF-36 role-physical                                  0.236      1.4041
  Total treated, No.          68        200
  Responders, No.             26        93
  Responders, %               38        47
SF-36 social functioning                             0.992      0.0001
  Total treated, No.          70        200
  Responders, No.             27        77
  Responders, %               39        39
SF-36 vitality                                       0.202      1.6243
  Total treated, No.          70        200
  Responders, No.             23        83
  Responders, %               33        42
6MWD                                                 < 0.0001   25.5001
  Total treated, No.          66        200
  Responders, No.             10        101
  Responders, %               15        51
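The χ² statistics reported in Table 3 can be checked from the responder counts alone. The sketch below is only an illustration: it assumes an uncorrected Pearson χ² test on each 2 × 2 table (responder vs nonresponder, placebo vs sildenafil), which appears consistent with the tabulated values, and the function name `chi_square_2x2` is hypothetical. For df = 1, the p value is obtained from the complementary error function in the Python standard library.

```python
import math


def chi_square_2x2(resp_a, total_a, resp_b, total_b):
    """Pearson chi-square (df = 1, no continuity correction) and its p value."""
    a, b = resp_a, total_a - resp_a  # group A: responders, nonresponders
    c, d = resp_b, total_b - resp_b  # group B: responders, nonresponders
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# (responders, total treated) for placebo and sildenafil, as listed in Table 3.
table3 = {
    "SF-36 physical functioning": ((21, 70), (89, 200)),
    "SF-36 role-physical": ((26, 68), (93, 200)),
    "SF-36 social functioning": ((27, 70), (77, 200)),
    "SF-36 vitality": ((23, 70), (83, 200)),
    "6MWD": ((10, 66), (101, 200)),
}

for end_point, ((r_pl, n_pl), (r_si, n_si)) in table3.items():
    chi2, p = chi_square_2x2(r_pl, n_pl, r_si, n_si)
    print(f"{end_point}: chi2 = {chi2:.4f}, p = {p:.4f}")
```

No adjustment for multiple comparisons is applied in this sketch, in keeping with the exploratory approach described in the methods.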
MID estimations were explored for these elements of HRQoL and for 6MWD, the primary clinical outcome measured, to identify factors that could help physicians interpret these results more accurately. The current analysis focused on the estimation of MID using several distribution-based methods, each with its own strengths and limitations. However, there is no agreement about which particular method, or level of change within a method, should be taken as equivalent to MID. In the absence of an anchor-based estimate of the MID, we have calculated a range of potential values for the MID based on statistics recommended for this purpose in the literature.10,14,16-18 As expected, the different methods yielded a wide range of estimates, with the level of MID varying predictably within each methodology (eg, the estimates increased with increasing levels of ES and threshold value of SEM). The method used to calculate Sdiff, which takes the reliability of the scale at both baseline and EOT into account, rather than using the average of the reliability from both time points, could be said to inflate the resulting Sdiff value. This is confirmed by the Sdiff calculations being consistently higher than the SEM estimates, and suggests that our postulated MID may be higher than it should be. With this in mind, we evaluated the median of the MID estimates to compare with the mean and explore the robustness of the data. As can be seen in Table 2, this would result in slightly lower MID estimates for the role-physical (24 vs 25) and vitality (14 vs 15) SF-36 scales, and a somewhat lower MID for the 6MWD (38 vs 41). The MID values for the physical functioning and social functioning SF-36 scales remain the same.
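The comparison of mean and median can be reproduced directly from the seven point estimates listed for each end point in Table 2. The following Python sketch is offered only as an illustration of that arithmetic; the dictionary layout and variable names are hypothetical, while the numbers are those given in Table 2.

```python
from statistics import mean, median

# Seven point estimates per end point, in the order
# ES 0.3, ES 0.5, ES 0.8, 1 SEM, 1 Sdiff, 1.96 SEM, 1.96 Sdiff (Table 2).
estimates = {
    "SF-36 physical functioning": [5.84, 9.73, 15.57, 7.67, 12.92, 15.03, 25.32],
    "SF-36 role-physical": [11.98, 19.96, 31.94, 15.04, 23.66, 29.48, 46.37],
    "SF-36 social functioning": [7.91, 13.19, 21.10, 14.39, 21.74, 28.20, 42.61],
    "SF-36 vitality": [6.88, 11.46, 18.34, 9.31, 13.70, 18.25, 26.85],
    "6MWD (m)": [18.70, 31.16, 49.86, 24.74, 37.83, 48.49, 74.15],
}

# Central value (mean and median) and spread (max - min) for each end point.
summary = {
    name: {
        "mean": mean(values),
        "median": median(values),
        "spread": max(values) - min(values),
    }
    for name, values in estimates.items()
}

for name, stats in summary.items():
    print(
        f"{name}: mean = {stats['mean']:.2f} (MID {round(stats['mean'])}), "
        f"median = {stats['median']:.2f} ({round(stats['median'])}), "
        f"spread = {stats['spread']:.2f}"
    )
```

Rounding each mean to the nearest whole number gives the recommended MIDs, and rounding each median gives the slightly lower alternatives discussed above.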
It is interesting to note that the MID obtained for 6MWD in the present study (41 m) compares closely with the 40 m reported for patients with chronic lung disease.28 The changes from baseline to EOT observed in those receiving sildenafil in the SUPER study were above the mean MID levels calculated for the primary end point (6MWD)6 and the physical functioning19 subscale of the SF-36. These are the two most salient end points for patients with PAH because they reflect limitations in patient activity. The retrospective responder analysis was consistent with this interpretation because, despite a high placebo response on the SF-36 scales (30 to 39% being considered responders on all SF-36 scales explored), a greater proportion of patients were identified as responders (changed by at least the MID level) on these two scales among subjects receiving sildenafil compared with those receiving placebo. However, these calculations were based on post hoc analyses for which the study was not powered. As such, these results need to be interpreted with caution.
The main limitation of this analysis was the lack of anchor-based estimates of MID generated from patients or physicians. Including such assessments could change the MID values obtained. However, studies17,18 that have included both anchor-based and distribution-based estimates of MID have shown that the values are generally similar. At the onset of the SUPER study, the SF-36 provided the best option for measuring HRQoL because it addressed all the physical parameters of importance for patients with PAH. It will be interesting to evaluate this in relation to meaningful change on disease-specific measures that have since become available, such as the Cambridge Pulmonary Hypertension Outcome Review.21
This study provides a much needed metric for interpreting the level of change in patients with PAH against which other treatments can be measured. By employing multiple methods, we generated a range of MID estimates for each outcome, which we suggest should be used as guides for interpreting the change being assessed. Based on these computations, the mean MIDs for the SF-36 physical functioning, role-physical, social functioning, and vitality scales and for 6MWD were 13, 25, 21, and 15 points, and 41 m, respectively. Although these estimates are likely to be higher than they should be due to the inflated Sdiff calculation, it is believed that the analyses reported here will provide useful standards for interpreting the effectiveness of PAH therapies in future trials. However, further work on patient and clinical perspectives of change using anchor-based methods is strongly encouraged to supplement the findings reported here.
ACKNOWLEDGMENT: The authors acknowledge Mukund Nori, PhD, MBA, of Envision Pharma for editorial assistance.
REFERENCES
1 Galiè N, Torbicki A, Barst R, et al. Guidelines on diagnosis and treatment of pulmonary arterial hypertension: the Task Force on Diagnosis and Treatment of Pulmonary Arterial Hypertension of the European Society of Cardiology. Eur Heart J 2004; 25:2243-2278
2 Farber HW, Loscalzo J. Pulmonary arterial hypertension. N Engl J Med 2004; 351:1655-1665
3 Humbert M, Sitbon O, Simonneau G. Treatment of pulmonary arterial hypertension. N Engl J Med 2004; 351:1425-1436
4 Shafazand S, Goldstein MK, Doyle RL, et al. Health-related quality of life in patients with pulmonary arterial hypertension. Chest 2004; 126:1452-1459
5 Enright PL. The six-minute walk test. Respir Care 2003; 48:783-785
6 Galiè N, Ghofrani HA, Torbicki A, et al. Sildenafil citrate therapy for pulmonary arterial hypertension. N Engl J Med 2005; 353:2148-2157
7 Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002; 11:193-205
8 Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. J COPD 2005; 2:63-67
9 Hays RD, Woolley JM. The concept of clinically meaningful difference in health-related quality-of-life research: how meaningful is it? Pharmacoeconomics 2000; 18:419-423
10 Guyatt GH, Osoba D, Wu AW, et al. Methods to explain the clinical significance of health status measures. Mayo Clin Proc 2002; 77:371-383
11 Norman GR, Sridhar FG, Guyatt GH, et al. Relation of distribution- and anchor-based approaches in interpretation of changes in health-related quality of life. Med Care 2001; 39:1039-1047
12 Wells G, Beaton D, Shea B, et al. Minimal clinically important differences: review of methods. J Rheumatol 2001; 28:406-412
13 Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27:S178-S189
14 Sprangers MA, Moinpour CM, Moynihan TJ, et al. Assessing meaningful change in quality of life over time: a users' guide for clinicians. Mayo Clin Proc 2002; 77:561-571
15 Cohen J. Statistical power analysis for the behavioral sciences. New York, NY: Academic Press, 1977
16 Fitzpatrick R, Norquist JM, Jenkinson C. Distribution-based criteria for change in health-related quality of life in Parkinson's disease. J Clin Epidemiol 2004; 57:40-44
17 Wyrwich KW, Nienaber NA, Tierney WM, et al. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 1999; 37:469-478
18 Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 1999; 52:861-873
19 Pepke-Zaba J, Gilbert C, Collings L, et al. Sildenafil improves health-related quality of life in patients with pulmonary arterial hypertension. Chest 2008; 133:183-189
20 Ware JE Jr, Snow KK, Kosinski M. SF-36 health survey: manual and interpretation guide. Lincoln, RI: Quality Metric, 2000
21 McKenna SP, Doughty N, Meads DM, et al. The Cambridge Pulmonary Hypertension Outcome Review (CAMPHOR): a measure of health-related quality of life and quality of life for patients with pulmonary hypertension. Qual Life Res 2006; 15:103-115
22 Samsa G, Edelman D, Rothman ML, et al. Determining clinically important differences in health status measures: a general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics 1999; 15:141-155
23 Symonds T, Spino C, Sisson M, et al. Defining the minimum important difference in female sexual dysfunction: how to decide among different estimates from different methods? 12th Annual Conference of the International Society for Quality of Life Research. San Francisco CA. Qual Life Res 2005; 2040
24 Yost KJ, Cella D, Chawla A, et al. Minimally important differences were estimated for the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches. J Clin Epidemiol 2005; 58:1241-1251
25 Wyrwich KW, Tierney WM, Wolinsky FD. Using the standard error of measurement to identify important changes on the asthma quality of life questionnaire. Qual Life Res 2002; 11:1-7
26 Farivar SS, Liu H, Hays RD. Half standard deviation estimate of the minimally important difference in HRQoL scores? Expert Rev Pharmacoeconomics Outcomes Res 2004; 4:515-523
27 Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 2003; 56:395-407
28 Redelmeier DA, Bayoumi AM, Goldstein RS, et al. Interpreting small differences in functional status: the six minute walk test in chronic lung disease patients. Am J Respir Crit Care Med 1997; 155:1278-1282