A questionnaire found disease-specific WORC index is not more responsive than SPADI and OSS in rotator cuff disease


Journal of Clinical Epidemiology 63 (2010) 575–584

ORIGINAL ARTICLE

Ole M. Ekeberg a,*, Erik Bautz-Holter a, Anne Keller a, Einar K. Tveitå a, Niels G. Juel a, Jens I. Brox b

a Department of Physical Medicine and Rehabilitation, Oslo University Hospital, Ullevål and Medical Faculty, University of Oslo, 0407 Oslo, Norway
b Department of Orthopaedics, Back Surgery and Physical Medicine and Rehabilitation Section, Oslo University Hospital, Rikshospitalet and Medical Faculty, University of Oslo, 0027 Oslo, Norway

Accepted 3 July 2009

Abstract

Objectives: To compare responsiveness and minimal clinically important change (MCIC) for the disease-specific Western Ontario Rotator Cuff index (WORC) and the two region-specific questionnaires Shoulder Pain and Disability Index (SPADI) and Oxford Shoulder Scale (OSS) in patients with rotator cuff disease receiving corticosteroid injection therapy.

Study Design and Setting: One hundred twenty-one patients with rotator cuff disease. Western Ontario Rotator Cuff index, SPADI, and OSS were administered before treatment and at 2 and 6 weeks after corticosteroid injection. Responsiveness was compared between questionnaires using the standardized response mean (SRM), area under the receiver operating characteristic curve, and reliable change proportion (RCP) statistics. Minimal clinically important change estimates were reported.

Results: The differences between questionnaires were small and not consistent across the different responsiveness indices. Shoulder Pain and Disability Index was significantly more responsive than OSS measured by SRM and RCP at 2 and 6 weeks. Western Ontario Rotator Cuff index was significantly more responsive than OSS in RCP and area under receiver operating characteristic curve at 6 weeks. Shoulder Pain and Disability Index was significantly more responsive than WORC measured by RCP at 2 weeks. Minimal clinically important change was estimated at 5, 275, and 20 points for OSS, WORC, and SPADI, respectively.

Conclusions: All questionnaires are suitable for measuring change in patients with rotator cuff disease. Disease-specific WORC index is not more responsive than the region-specific SPADI and OSS in rotator cuff disease.

© 2010 Elsevier Inc. All rights reserved.

Keywords: Responsiveness; Minimal clinically important change; Rotator cuff disease; SPADI; OSS; WORC

1. Introduction

Patient-centered evaluation of pain, disability, and quality of life has been recognized as important in clinical research of interventions for shoulder pain. Many self-administered questionnaires have been developed, but only a few have been thoroughly validated in different clinical settings and by independent researchers [1]. The selection of a proper tool is challenging for most clinicians and researchers. Responsiveness has been defined as the ability to detect change when a real change has occurred [2] and is regarded as a form of construct validity [3]. This is an important characteristic of any measure of treatment effects. Responsiveness of an outcome may depend on several factors such as the patient population studied, type of intervention, timing of data collection, and the construct of change being quantified [4]. Thus, it is necessary to investigate the responsiveness of the outcome measures in different clinical settings [5].

Controversies exist about how to measure responsiveness. Several approaches have been proposed based on study design and the construct of change. The different strategies have been divided into anchor-based and distribution-based methods. The anchor-based method links the instrument to a meaningful external anchor, whereas the distribution-based approaches include methods based on sample variability and measurement precision. As both approaches have advantages and limitations, some authors advocate the use of both anchor- and distribution-based methods [5–7].

Self-report health assessment questionnaires are classified as generic, region-specific, and disease-specific measures. Specific measures are generally believed to be more responsive than generic measures [8]. Disease-specific questionnaires for patients with shoulder pain have been developed [9]. Region-specific questionnaires have the advantage that the same questionnaire can be used in patients with different diagnoses. A responsive measure will be preferred because the greater the responsiveness of an outcome measure, the fewer the patients required to detect significant between-group treatment effects. The responsiveness of the disease-specific Western Ontario Rotator Cuff index (WORC) has been compared with other region-specific shoulder questionnaires, but the results diverge and lack proper statistical testing of differences between questionnaires [10–12].

It is often difficult to determine what constitutes important changes in self-reported health status questionnaires [3,7]. Statistical difference is largely dependent on sample size and has little relation to the importance of the observed differences. To guide the interpretation of change scores, estimates of minimum changes in scores that are considered important by patients or clinicians are established [7]. This threshold has been designated the minimal important difference or minimal clinically important change (MCIC). We were not able to find any published studies reporting MCIC values for the WORC index and the Oxford Shoulder Scale (OSS).

The aim of this study is twofold: first, we want to determine whether the responsiveness of the disease-specific WORC index is superior to that of two commonly used region-specific scales, the OSS and the Shoulder Pain and Disability Index (SPADI), in patients with rotator cuff disease; second, we want to present estimates of the MCIC of these outcome scores.

What is new?

Key findings
– WORC is not more responsive than SPADI and OSS in patients with rotator cuff disease.
– New estimates of minimal clinically important change illustrate the limitation of using only one method.

What this study adds to what was known?
– Disease-specific Western Ontario Rotator Cuff index (WORC) has been compared with other region-specific shoulder questionnaires with diverging results. The current study provides statistical testing of differences in responsiveness estimates between questionnaires.
– MCIC estimates of OSS and WORC have not previously been reported.

What is the implication, what should change now?
– No additional benefit of using the disease-specific WORC index as outcome measure in clinical trials of rotator cuff disease.
– The MCICs provided may be used when planning and interpreting clinical trials.

* Corresponding author. Department of Physical Medicine and Rehabilitation, Oslo University Hospital, Ullevål and Medical Faculty, University of Oslo, Kirkeveien 166, 0407 Oslo, Norway. Tel.: +47-23027444; fax: +47-23027455. E-mail address: [email protected] (O.M. Ekeberg).
0895-4356/10/$ – see front matter © 2010 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2009.07.012

2. Methods

2.1. Study design and setting

We invited general practitioners in Oslo, serving a population of half a million people, to refer patients with rotator cuff disease to the outpatient clinic of the Physical Medicine and Rehabilitation Department at Oslo University Hospital, Ullevål. All patients were considered for inclusion in a randomized controlled trial comparing systemic and ultrasound-guided corticosteroid injection [13]. We included patients who were at least 18 years old and had all the following: pain on abduction; less than a 50% reduced passive glenohumeral range of motion in no more than one direction of external rotation, internal rotation, or abduction; pain on two of three isometric tests for abduction, external rotation, and internal rotation; and a positive Hawkins-Kennedy impingement sign. We excluded patients if they had symptomatic acromioclavicular arthritis, clinical and radiological findings indicating glenohumeral joint pathology, referred pain from the neck or internal organs, generalized muscular pain syndrome with bilateral muscular pain in neck and shoulders, a history of inflammatory arthritis, or insulin-dependent diabetes mellitus. Patients with previous fractures or surgery to the shoulder, or with contraindications to local steroid injections, were also excluded.

From March 2005 until October 2006, 312 patients were considered for eligibility. One hundred seventy-three patients did not meet inclusion criteria. Thirty-three patients refused to participate, leaving 106 patients for the randomized trial. The randomized study was double blind. Local ultrasound-guided injections of triamcinolone and xylocain in the subacromial bursa and xylocain in the gluteal region were compared with ultrasound-guided injection of xylocain in the subacromial bursa and triamcinolone and xylocain in the gluteal region.
Fifteen patients, not willing to participate in the randomized trial or excluded from the randomized trial because of a low SPADI score or a duration of symptoms less than 3 months, agreed to participate in the responsiveness study and received ultrasound-guided triamcinolone and xylocain injection in the subacromial bursa. Patients participating in the randomized study (n = 106) were assessed at the initial visit and followed up at 2 and 6 weeks. The patients only participating in the responsiveness study (n = 15) were unable to attend long-term follow-up or required additional treatment and were therefore only followed from the initial visit until the 2-week follow-up. Seventy-four of the 121 patients in the responsiveness study filled in the questionnaires on two occasions before treatment with a 1-week interval, and the 55 of these indicating no change between visits on an 18-point change in main complaint scale were used for test-retest reliability. The results of the test-retest study have been reported previously, and a detailed description of analysis is presented there [14]. The intervals between the initial visit and the 2- and 6-week follow-up were used for responsiveness and MCIC analyses.

At inclusion, each patient completed a demographic summary form along with the SPADI, OSS, and WORC. At 2 and 6 weeks after treatment, patients completed the set of outcome measures and were asked to rate the change in main complaint in shoulder condition after treatment on an 18-point ordinal scale. The patients were also asked to rate if the observed changes were of importance to their overall shoulder condition. The regional committee for medical research ethics in Norway approved the project.

2.2. Outcome measures

Norwegian versions of SPADI, WORC, and OSS have been culturally adapted according to the guidelines in the literature [14,15].

The WORC index is a self-report questionnaire developed to measure health-related quality of life in patients with rotator cuff disease [16]. Western Ontario Rotator Cuff index consists of 21 items in five domains: physical symptoms (six items), sports and recreation (four items), work (four items), lifestyle (four items), and emotions (three items). Each item is scored on a 100-mm visual analog scale and summed to a total score of maximally 2,100, with a higher score indicating a reduced quality of life.

Oxford Shoulder Score is a self-report questionnaire developed for patients having shoulder disease other than instability and consists of 12 questions about pain and disability [17]. The patients reported their pain or difficulty in completing a task by circling a number from 1 to 5 according to the verbal anchors after each number. All items were summed up for a total score ranging from 12 to 60. The sum scores were converted as recommended by Dawson et al. [18], to range from 0 (worst) to 48 (best). In the original publication of OSS, all respondents were asked to consider their shoulder for the last 4 weeks when completing the questionnaires. To compare the questionnaires, this was revised to pertain to the most recent week.

Shoulder Pain and Disability Index is a self-report questionnaire for patients with shoulder pain and consists of 13 items divided into two domains: pain (five items) and disability (eight items) [19]. According to the original scoring system described by Roach et al., all items were rated using visual analog scales. At scoring, the visual analog scales were divided into 12 segments scored from 0 to 11. The subscale scores were calculated by summing the item scores in each subscale, dividing by the maximum possible score, and multiplying by 100. The total SPADI score was calculated by averaging the pain and disability subscale scores. The total SPADI score can range from 0 to 100, and a higher score indicates worse shoulder pain and function.

2.2.1. Change in main complaint

All patients were asked to evaluate their change in main complaint between the initial visit and the 2-week follow-up and between the initial visit and the 6-week follow-up on an 18-point ordinal scale ranging from −9 to 9, anchored on the extreme values: worst possible and best possible.

2.2.2. Important change

All patients indicating a change in main complaint at the 2-week and 6-week follow-up were asked to indicate if they believed that the degree of change experienced after treatment was of importance to their shoulder condition by using one of the answer categories "yes" or "no."

2.3. Statistical analysis

2.3.1. Sample size

No formal sample size calculation was undertaken before starting the study. Generally, a sample size of 50 subjects is recommended for estimation of MCIC [20].

2.4. Descriptive analyses

Continuous variables were reported using mean and 95% confidence intervals (CI) when data had a normal distribution and median and quartiles if the distribution was skewed. All estimates were based on total raw sum scores. The distribution of baseline scores was inspected for possible floor and ceiling effects. To ease the comparison of the different responsiveness estimates between the questionnaires, results were also reported after transformation of all scores to range from 0 (worst) to 100 (best). Total sum scores of OSS and SPADI were calculated if no more than two items were missing [18,19]. We implemented a previously used "missing rule" requiring at least two-thirds of items to be completed for calculating a sum score in WORC [21]. The Spearman rho correlations between the questionnaire change scores and the main complaint question were calculated to evaluate the suitability of this transition question as anchor.

2.5. Responsiveness

Responsiveness was assessed using three different strategies: standardized response mean (SRM), receiver operating characteristic curve (ROC curve), and reliable change proportion (RCP). Standardized response mean statistics were calculated for the improved (SRMimproved) and the unchanged groups (SRMunchanged). Patient perceived important change was determined using the important change question and the 18-point main complaint question. The distribution of change scores of the main complaint question was inspected in relation to the important change question. There was an obvious cutoff between three and four in the main complaint question for patient-rated improvement as shown in Fig. 1. Because we did not want to overlook patients undergoing important improvement, we chose to consider patients scoring three or more in the main complaint question as improved. Seven subjects at 2-week and six subjects at 6-week follow-up reported deterioration on the change in main complaint and rated the deterioration as important on the important change question. Due to the low number of subjects deteriorating, a "worse" subgroup was not included. We excluded these subjects from the analyses.

The SRMs were calculated by dividing the mean change score by the standard deviation (SD) of the change scores. According to Cohen [22], effect sizes (ES) of 0.2, 0.5, and 0.8 or above are considered small, moderate, and large clinical changes, respectively. SRMimproved represents the distribution-based responsiveness of the questionnaires. SRMunchanged represents the specificity to change, because change without clinical relevance may occur in instrument scores. Ninety-five percent CIs of the SRMs and the difference between SRMs were estimated using a jackknife method calculated with the statistical package R [23,24].

Second, a ROC curve analysis was used to assess the sensitivity and specificity of correctly classifying patients to the improved or the unchanged group anchored to the main complaint question as defined above. The ROC curve is created by plotting the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) and shows the tradeoff between the true-positive success and the false-positive error at each of several cutoff points in the change score.
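The SRM and its jackknife interval described above can be sketched as follows. This is a minimal illustration, not the authors' R code: it assumes the standard leave-one-out jackknife standard error and a normal quantile of 1.96, neither of which is specified in the text, and the function names and example change scores are hypothetical.

```python
import math
import statistics


def srm(changes):
    """Standardized response mean: mean change divided by the SD of the change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def jackknife_ci(changes, stat=srm, z=1.96):
    """Approximate 95% CI for `stat` from a leave-one-out jackknife standard error.

    Assumption: normal quantile z = 1.96; the source does not state the variant used.
    """
    n = len(changes)
    theta_hat = stat(changes)
    # Recompute the statistic n times, each time leaving one patient out.
    leave_one_out = [stat(changes[:i] + changes[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out))
    return theta_hat - z * se, theta_hat + z * se


# Hypothetical change scores for illustration only (not data from the study).
example_changes = [30.0, 12.0, 45.0, 22.0, 38.0, 8.0, 27.0, 51.0, 19.0, 33.0]
example_srm = srm(example_changes)
example_ci = jackknife_ci(example_changes)
```

The same `jackknife_ci` helper could be applied to a difference between two SRMs by passing a statistic that takes paired change scores, which is one plausible way to obtain the intervals for between-questionnaire contrasts.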
Area under receiver operating characteristic curve (ROCAUC) can be interpreted as the ability of the instrument to discriminate between the improved and the unchanged groups. A perfect accuracy would yield a value of 1, whereas a value of 0.5 would indicate chance alone.
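One way to read the area under the ROC curve is as the probability that a randomly chosen improved patient has a larger change score than a randomly chosen unchanged patient. The sketch below computes it directly from that definition, counting ties as one half; the function name and the toy inputs are illustrative assumptions, and the study itself used dedicated statistical software.

```python
def roc_auc(improved, unchanged):
    """Area under the ROC curve as the proportion of (improved, unchanged) pairs
    in which the improved patient has the larger change score; ties count 0.5."""
    wins = 0.0
    for x in improved:
        for y in unchanged:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved) * len(unchanged))


# Perfect separation gives 1.0; identical distributions give 0.5 (chance).
perfect = roc_auc([3, 4], [1, 2])
chance = roc_auc([1, 2], [1, 2])
```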

Responsiveness was considered adequate when ROCAUC was more than 0.70 [20]. The ROCAUC is reported with 95% CIs and compared between questionnaires using the DeLong-DeLong-Clarke-Pearson comparison method [25] calculated with the statistical software Analyze-it Method Evaluation Edition for Microsoft Excel (version 2.12).

Third, the RCP was estimated for each questionnaire. Reliable change proportion is defined as the proportion of patients improving by more than the smallest detectable change (SDC) for each outcome measure [26]. The SDC for an individual patient was calculated for each instrument by $\mathrm{SDC} = \mathrm{SEM}_{\mathrm{agreement}} \times 1.96 \times \sqrt{2}$, where 1.96 represents the z score corresponding to the 95% level of confidence and $\mathrm{SEM}_{\mathrm{agreement}}$ is calculated by taking the square root of the mean square error term of the one-way ANOVA report using subjects as a factor. The SDC estimates for SPADI, OSS, and WORC were based on the 55 patients responding to the outcome measures on two occasions with a 1-week interval [14]. The RCPs are reported as proportions with CIs using the Wilson method [27]. Difference between RCPs with 95% CIs was calculated using the recommended Newcombe method for paired samples [27]. We used the statistical computer programme Confidence Interval Analysis 2.1.2 for calculating CIs of the RCPs [27]. Remaining statistics were computed using the SPSS version 16.0 for Mac (SPSS, Chicago, Illinois, USA).

2.6. Minimal clinically important change

There are several different approaches for estimating MCIC, and little evidence exists that relates the size of MCIC to the method applied [28]. We used two different strategies for estimating the MCIC, the anchor-based MCIC distribution method proposed by de Vet et al. [7] and the mean change score of patients with minimal clinically important improvement. For the present study, the latter includes patients scoring 3 and 4 on the main complaint score. The anchor-based MCIC distribution integrates a distribution-based and an anchor-based method.
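The smallest detectable change and the reliable change proportion described under Responsiveness can be sketched as below, together with a Wilson score interval for the proportion. This is an illustrative sketch only: the function names are hypothetical, and the count of 56 of 79 improved patients is an assumed example chosen to be consistent with a proportion of about 71%, not a figure stated in the text.

```python
import math


def sdc(sem, z=1.96):
    """Smallest detectable change for an individual: SEM x 1.96 x sqrt(2)."""
    return sem * z * math.sqrt(2)


def reliable_change_count(changes, sdc_value):
    """Number of patients whose improvement exceeds the SDC, and the group size."""
    exceeding = sum(1 for c in changes if c > sdc_value)
    return exceeding, len(changes)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Assumed illustration: 56 of 79 patients exceeding the SDC (about 71%).
low, high = wilson_ci(56, 79)
```

With these assumed counts the interval comes out at roughly 60% to 80%.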
Fig. 1. Distribution of change in main complaint scores according to patient-perceived importance of change at 2 and 6 weeks.

Patients were categorized as improved and unchanged according to scoring on the main complaint question. Distribution plots of change scores were depicted with MCIC thresholds drawn as lines in the plot based on the optimal cutoff of ROC curve analysis and a 95% limit method. Optimal cutoff point by the ROC method (MCICROC) was determined as the upper left point on the ROC curve for each questionnaire. The 95% limit cutoff (MCIC95% limit) is defined as the upper limit of the distribution of persons who are, according to the anchor, not importantly changed. MCIC95% limit was estimated by adding the mean change to the product of 1.645 and the SD of change ($\mathrm{mean}_{\mathrm{change}} + 1.645 \times \mathrm{SD}_{\mathrm{change}}$). The MCIC95% limit was drawn as lines in the distribution plots for visual comparison with the MCICROC. The underlying concept is that MCICROC should be detectable beyond measurement error [7]. Because the estimate of MCIC is likely to be affected by the baseline scores [29,30], MCIC was calculated according to the lowest and highest tertiles of baseline scores and percent improvement from baseline.
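The two MCIC thresholds described above can be sketched as follows. The "upper left point" of the ROC curve is interpreted here as the cutoff closest to the corner where sensitivity and specificity both equal 1; that interpretation, the function names, and the toy change scores are assumptions for illustration. The summary-statistic example uses the 2-week SPADI mean change and SD of the unchanged group as reported in Table 2.

```python
import math
import statistics


def mcic_roc(improved, unchanged):
    """Cutoff whose (1 - specificity, sensitivity) point lies closest to the
    upper-left corner of the ROC plot. A change score >= cutoff counts as improved."""
    best = None
    for cutoff in sorted(set(improved) | set(unchanged)):
        sens = sum(x >= cutoff for x in improved) / len(improved)
        spec = sum(y < cutoff for y in unchanged) / len(unchanged)
        dist = math.hypot(1 - sens, 1 - spec)
        if best is None or dist < best[0]:
            best = (dist, cutoff, sens, spec)
    return best[1], best[2], best[3]


def limit_from_summary(mean_change, sd_change):
    """95% limit of the unchanged group: mean change + 1.645 x SD of change."""
    return mean_change + 1.645 * sd_change


def mcic_95_limit(unchanged):
    """Same limit computed from the raw change scores of the unchanged group."""
    return limit_from_summary(statistics.mean(unchanged), statistics.stdev(unchanged))


# SPADI, 2 weeks, unchanged group: mean change 7.9, SD 14.1 (Table 2).
spadi_limit_2wk = limit_from_summary(7.9, 14.1)  # about 31.1
```

Other definitions of the optimal cutoff exist, for example maximizing the sum of sensitivity and specificity, and they can give a different threshold on the same data.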

3. Results

Forty-four men and 77 women with a mean age of 51 years (SD 11) participated in the study (Table 1). Two patients did not attend the 2-week follow-up. All patients were encouraged to answer all items in the questionnaires. At follow-up of 2 and 6 weeks, one subject did not fill in WORC and one subject did not fill in OSS and thereby violated the missing rules. At baseline, there were five patients leaving one missing item and two patients leaving two missing items in SPADI, three patients leaving one missing item in OSS and six patients leaving one missing item, and one and two patients leaving three and four missing items, respectively, in WORC. At 2-week follow-up, 10 patients left one missing item in SPADI and OSS, 2 patients left one, 2 patients left two, and 1 patient left three missing items in WORC. At 6-week follow-up, five patients left one and two patients left two missing items in SPADI; four patients left one missing item in OSS; and six patients one item, one patient three items, and one patient five missing items in WORC. No formal investigations were planned to reveal reasons for missing items, but it is our impression that patients were reluctant to answer questions they did not think pertained to them. Missing items were randomly distributed in OSS and SPADI. Patients seemed more reluctant to answer question 8: "How much difficulty do you experience doing push-ups or other strenuous shoulder exercises because of your shoulder?", item 9: "How much has your shoulder affected your ability to throw hard or far?" and item 17: "How much difficulty do you have 'roughhousing or horsing around' with family and friends?" in WORC.

Two patients had significant improvement on all outcome measures, at clinical examination, rated their shoulder as good or excellent at 2- and 6-week follow-up, rated the change as important at 2-week follow-up, but still answered that the change was unimportant at 6 weeks. We believed that these two patients had misinterpreted the important change question at 6 weeks. The patients were contacted retrospectively and affirmed our suspicion. After a discussion within the research group, the responses were corrected.

All change scores were normally distributed. There were no signs of floor or ceiling effects in any of the sum scores. The Spearman rho correlations between the anchor and the change scores of OSS, SPADI, and WORC ranged from 0.50 to 0.75 and imply an appreciable association between the anchor and the outcome change scores [5,7,31]. Seventy-nine of 112 (71%) patients rated themselves as improved at 2 weeks and 65 of 100 (65%) patients at 6 weeks according to the chosen anchor.

Table 1
Demographic characteristics at baseline

Characteristic                 Value
Age, mean (SD), yr             51 (11)
Sex (male/female)              44/77
Symptom duration
  <6 mo                        35 (29)
  6 mo to 1 yr                 39 (32)
  1–2 yr                       20 (17)
  >2 yr                        27 (22)
Dominant arm affected          73 (60)
On sick leave                  36 (30)
Baseline SPADI score           52 (19)
Baseline OSS score             29 (7)
Baseline WORC score            1,129 (345)

Notes: Values are numbers (percentages) unless stated otherwise.
Abbreviations: SPADI, Shoulder Pain and Disability Index; OSS, Oxford Shoulder Scale; WORC, Western Ontario Rotator Cuff Index.

3.1. Distribution-based responsiveness

Mean change scores and SRMs with 95% CIs in the improved and the unchanged groups are presented in Table 2. For the improved group, the SRMs were high for all three questionnaires at both 2- and 6-week follow-up. Shoulder Pain and Disability Index had the highest point estimate of SRM.
In the improved group, the 95% CI of the difference between SRMs did not include zero, which implies a significantly higher SRM in SPADI compared with OSS at 2 and 6 weeks (Table 3). Other contrasts did not reach statistical significance. For the unchanged group, the SRMs were small and differences between scores nonsignificant. Reliable change proportions and differences between RCPs are presented for the improved and the unchanged groups in Tables 2 and 3. Reliable change proportions for the improved group were 53%, 60%, and 71% at 2 weeks and 59%, 74%, and 82% at 6 weeks for OSS, WORC, and SPADI, respectively. The 95% CI of the difference between RCPs did not include zero and implies a significantly higher RCPimproved of SPADI compared with OSS and WORC at 2 weeks and a significantly higher RCP of WORC vs. OSS and SPADI vs. OSS at 6 weeks. All remaining contrasts were nonsignificant. There were no statistically significant differences in RCPunchanged of the questionnaires.

Table 2
Measurement's mean baseline scores, change scores, SRM, and RCP with 95% confidence intervals in patients reporting improvement and no change

Improved group
                     SPADI                 WORC                  OSS
Baseline score
  2 wk               50.1 (17.4)           1,102.9 (331.4)       29.2 (6.4)
  6 wk               53.3 (16.6)           1,180.8 (316.5)       28.7 (5.9)
Mean change score
  2 wk               30.2 (18.6)           529.1 (352.4)         8.7 (6.6)
  6 wk               36.3 (17.7)           671.6 (339.3)         10.9 (6.5)
SRM
  2 wk               1.62 (1.31, 1.94)     1.46 (1.21, 1.70)     1.32 (1.05, 1.60)
  6 wk               1.87 (1.54, 2.19)     1.69 (1.36, 2.02)     1.52 (1.22, 1.81)
RCP
  2 wk               71% (60–80%)          60% (49–70%)          53% (42–63%)
  6 wk               82% (70–89%)          74% (62–83%)          59% (47–71%)

Unchanged group
                     SPADI                 WORC                  OSS
Baseline score
  2 wk               54.6 (22.1)           1,161.4 (347.0)       28.1 (7.0)
  6 wk               52.1 (20.1)           1,103.1 (362.5)       28.9 (6.4)
Mean change score
  2 wk               7.9 (14.1)            73.3 (201.2)          1.1 (5.0)
  6 wk               8.4 (17.2)            60.9 (268.4)          1.5 (4.7)
SRM
  2 wk               0.56 (−0.05, 1.18)    0.39 (0.05, 0.72)     0.22 (−0.16, 0.59)
  6 wk               0.35 (0.17, 0.70)     0.09 (−0.25, 0.42)    0.15 (−0.19, 0.50)
RCP
  2 wk               9% (3–24%)            10% (3–25%)           6% (2–20%)
  6 wk               14% (6–25%)           6% (2–19%)            3% (1–15%)

Notes: Means and SDs reported for baseline and change scores. 95% CIs for SRM and RCP statistics reported in parentheses.
Abbreviations: SPADI, Shoulder Pain and Disability Index; WORC, Western Ontario Rotator Cuff Index; OSS, Oxford Shoulder Scale; SRM, standardized response mean; RCP, reliable change proportion.

3.2. Anchor-based responsiveness

Figure 2 and Tables 3 and 4 show the results of the ROC analyses. ROCAUC was high and above 0.70 for all outcome measures. Western Ontario Rotator Cuff index had the highest ROCAUC and was significantly higher than OSS (difference 0.09, 95% CI: 0.03, 0.15) (Table 3 and Fig. 2). The difference between ROCAUC of SPADI and WORC was nonsignificant.

3.3. Minimal clinically important change

The frequency distribution of the change scores from baseline to 6-week follow-up according to the chosen anchor is depicted in Fig. 3 with cutoff lines for MCICROC and MCIC95% limit. The ROC cutoff points with highest sensitivity and specificity measured on a 0–100 scale were 20.3, 14.6, and 10.5 at 2 weeks and 20.0, 8.3, and 12.8 at 6 weeks for SPADI, OSS, and WORC, respectively (Table 4). For concurrent comparison, using the mean value of patients scoring 3 and 4 on the global scale, MCICmean were 15.4, 9.0, and 11.5 at 2 weeks and 23.1, 9.6, and 17.0 at 6 weeks for SPADI, OSS, and WORC, respectively. There was a tendency toward an increase in MCIC in the tertile with the highest baseline score (Table 4). The percent changes from baseline necessary for patient-rated improvement were 47%, 38%, and 22% at 2 weeks and 30.5%, 25%, and 22.6% at 6 weeks for SPADI, OSS, and WORC, respectively. The MCIC95% limit was 31.1, 19.3, and 19.3 at 2 weeks and 24.6, 16.9, and 15.4 at 6 weeks for SPADI, OSS, and WORC, respectively. The SDC was 19.7 for SPADI, 16.1 for OSS, and 17.2 for WORC and within range of the MCIC95% limit.
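The raw-score thresholds and their 0–100 equivalents reported for OSS and WORC can be related by dividing by the width of each raw scale (48 for the converted OSS, 2,100 for WORC). The sketch below is a simple check of that arithmetic against a few values from Table 4; the function and constant names are illustrative, and the assumption that change scores were rescaled by a plain division is ours rather than a stated step of the analysis.

```python
OSS_RANGE = 48      # converted OSS runs from 0 to 48
WORC_RANGE = 2100   # WORC total runs from 0 to 2,100


def to_percent_scale(raw_change, scale_range):
    """Express a raw change score as points on a 0-100 scale."""
    return raw_change / scale_range * 100


# Raw MCIC(ROC) cutoffs from Table 4 and their 0-100 equivalents.
oss_2wk = round(to_percent_scale(7.0, OSS_RANGE), 1)      # 14.6
oss_6wk = round(to_percent_scale(4.0, OSS_RANGE), 1)      # 8.3
worc_2wk = round(to_percent_scale(220.4, WORC_RANGE), 1)  # 10.5
worc_6wk = round(to_percent_scale(269, WORC_RANGE), 1)    # 12.8
```

SPADI is already scored from 0 to 100, so its change scores need no rescaling.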

Table 3
Differences in SRM, RCP, and ROCAUC between SPADI, OSS, and WORC for improved and unchanged groups reported with 95% CIs

                  Improved: SRM           Improved: RCP             ROCAUC                    Unchanged: SRM          Unchanged: RCP
SPADI vs. WORC
  2 wk            0.17 (−0.04, 0.38)      0.11 (0.02, 0.19)*        −0.04 (−0.10, 0.02)       0.17 (−0.28, 0.89)      −0.03 (−0.19, 0.13)
  6 wk            0.17 (−0.10, 0.44)      0.07 (−0.05, 0.18)        −0.01 (−0.06, 0.03)       0.26 (−0.12, 0.64)      −0.09 (−0.23, 0.03)
OSS vs. WORC
  2 wk            −0.13 (−0.41, 0.14)     −0.08 (−0.18, 0.03)       −0.06 (−0.13, 0.02)       −0.17 (−0.54, 0.20)     −0.03 (−0.19, 0.12)
  6 wk            −0.17 (−0.48, 0.14)     −0.18 (−0.30, −0.06)*     −0.09 (−0.15, −0.03)*     0.07 (−0.28, 0.42)      −0.03 (−0.16, 0.08)
SPADI vs. OSS
  2 wk            0.30 (0.01, 0.60)*      0.17 (0.07, 0.27)*        −0.02 (−0.11, 0.07)       0.35 (−0.19, 0.89)      −0.03 (−0.19, 0.12)
  6 wk            0.34 (0.04, 0.65)*      0.23 (0.12, 0.35)*        0.05 (−0.01, 0.13)        0.19 (−0.21, 0.59)      −0.11 (−0.27, 0.02)

Abbreviations: CI, confidence interval; SRM, standardized response mean; RCP, reliable change proportion; SPADI, Shoulder Pain and Disability Index; WORC, Western Ontario Rotator Cuff; OSS, Oxford Shoulder Scale.
* Significant difference at the P = 0.05 level. No adjustments made for multiple contrasts.

Fig. 2. ROC curves of SPADI, OSS, and WORC at 2- and 6-week follow-up. ROC, Receiver Operating Characteristic; SPADI, Shoulder Pain and Disability Index.

4. Discussion

The primary objective of this study was to determine whether responsiveness of the disease-specific WORC index was greater than the region-specific SPADI and OSS shoulder questionnaires. The study sample consisted of 121 patients with a clinical diagnosis of rotator cuff disease who had systemic or local corticosteroid injections. Applying a main complaint scale as an external anchor, all scores showed high SRMs, area under the ROC curves, and RCPs in the improved group. The SRMunchanged and RCPunchanged were low and not significantly different from zero, indicating adequate specificity to change for all questionnaires. Our results demonstrate that all questionnaires are suitable for measuring change in rotator cuff disease. The disease-specific WORC index is not superior to the region-specific SPADI or OSS at measuring change over time. The different responsiveness indices did not provide consistent significant differences between questionnaires.

Controversies exist regarding optimal design and methods used in responsiveness studies. Stratford and Riddle recommend that analytic strategies should be based on study design and the corresponding sample change characteristics [32]. Others have proposed combining anchor- and distribution-based approaches [5–7]. In the present study, all point estimates favored SPADI, and the SRM and RCP statistics indicated that SPADI was significantly superior to OSS. Based on these analyses, it would be reasonable to conclude that SPADI provides the best responsiveness, but the ROC curve analysis did not favor SPADI over OSS. When analyzing responsiveness in a population with heterogeneous patient change composition, the area under ROC curve is recommended as a valid measure of responsiveness [32]. Results from the ROC curve analysis, weighting sensitivity and specificity as equally important, suggest that a conservative conclusion of no difference between questionnaires is most appropriate.

Several authors have investigated the responsiveness of SPADI and reported high SRMs, ESs, RCPs, and ROCAUC compared with other region-specific questionnaires [26,33–35]. Oxford Shoulder Scale has shown comparable ES to SPADI in a population of patients with rotator cuff disease who received a variety of different treatments, although OSS was more responsive than the generic SF-36 and the disability index of the Stanford Health Assessment Questionnaire [17,36,37]. MacDermid et al. found higher point estimates of SRM of WORC compared with two region-specific shoulder scores, but CIs were not provided [12]. Others have not found evidence for superior responsiveness of WORC compared with region-specific questionnaires [10,11]. Generally, evidence does not seem to support the notion that disease-specific questionnaires are more responsive than region-specific questionnaires in musculoskeletal disease [38,39].

The secondary objective of this study was to estimate an MCIC in these outcome measures. Minimal clinically important change defines the smallest meaningful change score of

Fig. 3. Frequency distribution plots of change scores of SPADI, OSS, and WORC (0–100 score) after 6 weeks in the unchanged and improved groups according to the chosen anchor. MCICROC and 95% limit cutoff points are drawn as lines in the plots. SPADI, Shoulder Pain and Disability Index.


Table 4
Area under ROC curve and MCIC estimates with 95% confidence intervals according to patient perceived important change in main complaint

| Questionnaire | Time | ROCAUC | MCICROC | Sensitivity/specificity (%) | MCIC95% limit | MCICmean | MCICpercent | MCIC1.tertile | MCIC3.tertile |
|---|---|---|---|---|---|---|---|---|---|
| SPADI | 2 wk | 0.84 (0.76, 0.91) | 20.3 | 70/97 | 31.1 | 15.4 | 47.0 | 20.9 | 15.0 |
| SPADI | 6 wk | 0.92 (0.87, 0.97) | 20.0 | 80/91 | 24.6 | 23.1 | 30.5 | 13 | 15.7 |
| OSS | 2 wk | 0.82 (0.74, 0.90) | 7.0 (14.6) | 58/94 | 9.3 (19.3) | 4.3 (9.0) | 38.0 | 4.2 (8.8) | 6.5 (13.5) |
| OSS | 6 wk | 0.87 (0.80, 0.94) | 4.0 (8.3) | 87/71 | 8.1 (16.9) | 4.6 (9.6) | 25.0 | 2.5 (5.2) | 8 (16.7) |
| WORC | 2 wk | 0.87 (0.80, 0.94) | 220.4 (10.5) | 74/81 | 404.3 (19.3) | 242.0 (11.5) | 22.0 | 19.4 (0.9) | 388.5 (18.5) |
| WORC | 6 wk | 0.95 (0.89, 0.98) | 269 (12.8) | 85/97 | 324.3 (15.4) | 358.0 (17.0) | 22.6 | 241.7 (11.5) | 308.5 (14.7) |

SDC (one value per questionnaire): SPADI 19.7; OSS 7.7 (16.1); WORC 361.2 (17.2).

Notes: ROCAUC reported with 95% CI. Numbers are given as raw scores with 0–100 scales in parentheses.
Abbreviations: MCIC, minimal clinically important change; ROCAUC, area under receiver operating characteristic curve; MCICROC, MCIC measured by optimal cutoff of ROC curve; MCIC95% limit, MCIC measured by 95% limit of “unchanged group”; MCICmean, MCIC measured by mean score rating change minimal important, that is, change in main complaint to 3 or 4; MCICpercent, MCIC measured by optimal cutoff of ROC curve based on percent change from baseline score; SDC, smallest detectable change from test-retest study; MCIC1.tertile, MCIC3.tertile, MCIC measured by optimal cutoff of ROC curve based on subjects with lowest and highest tertile baseline score, respectively; SPADI, Shoulder Pain and Disability Index; OSS, Oxford Shoulder Scale; WORC, Western Ontario Rotator Cuff.
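The parenthesized values in Table 4 express raw change scores on a common 0–100 scale. A minimal sketch of that rescaling is given below. The raw score ranges are assumptions, since they are not stated in this section: 48 points for OSS and 2100 points for WORC were chosen because they reproduce the parenthesized values in the table, and SPADI is taken to be scored 0–100 already. The names `SCALE_RANGE` and `to_percent_of_scale` are hypothetical.

```python
# Assumed raw score ranges (not stated in this section of the paper).
# They were chosen because they reproduce the parenthesized values in Table 4.
SCALE_RANGE = {"SPADI": 100.0, "OSS": 48.0, "WORC": 2100.0}


def to_percent_of_scale(raw_change, questionnaire):
    """Express a raw change score as a percentage of the questionnaire's range."""
    return 100.0 * raw_change / SCALE_RANGE[questionnaire]


if __name__ == "__main__":
    # MCICROC at 2 weeks, raw values taken from Table 4.
    for name, raw in [("SPADI", 20.3), ("OSS", 7.0), ("WORC", 220.4)]:
        print(name, raw, round(to_percent_of_scale(raw, name), 1))
```

Rescaling in this way allows the MCIC estimates of the three questionnaires to be compared directly, which the raw scores do not permit.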

an outcome measure. Minimal clinically important change is not a fixed number, but a context-specific value that may depend on several factors such as the perspective from which minimal importance is considered, the baseline values on the outcome instrument under study, and the statistical method used [7,28]. Several approaches for estimating MCIC have been reported in the literature [6,40]. In the present study, the MCICROC and the MCICmean methods showed comparable values for important change. Incorporating the measurement error, as with the MCIC95% limit method, resulted in a larger value and a more conservative estimate of the MCIC. The results imply that change scores above the individual measurement error will automatically be clinically relevant and that these questionnaires are better suited for group-level than individual patient evaluation.

The MCIC values are point estimates. Because of the inherent uncertainty in both the estimates and the methodology used, it has been recommended to report the MCIC estimates as absolute values but incorporating a small range of uncertainty [5]. The present results suggest that a change of approximately 5 points in OSS, 275 points in WORC, and 20 points in SPADI is necessary for patient perceived important change. We found increasing MCIC values with increasing baseline score for OSS and WORC, but not for SPADI. Results from subgroup analyses should be interpreted with caution. Receiver operating characteristic curves of nonparametric data from small subgroups are not smooth, which increases the risk of deriving erratic optimal cutoff points [41].

The MCIC values obtained in the present study using the percentage change from baseline are comparable to changes considered to be clinically important for a range of commonly used back pain outcome measures [42]. The MCIC of SPADI has been reported to be 8, 10, and 13.8 points [26,43,44]. Differences in the populations tested and in the statistical and methodological strategies used in these studies may explain the different estimates of MCIC. We were not able to find any published MCIC values for WORC and OSS. The developers of OSS have proposed using half of the SD of change until estimates are available [45].

There is a debate regarding the use of MCIC values [42]. In clinical trials, between-group differences are compared. Some authors have proposed that because MCIC values are given for individual patient changes, the MCIC estimates cannot be used directly in the interpretation of between-group differences. They argue that statistically significant between-group differences smaller than the MCIC may have clinical relevance and suggest analyzing proportions of patients who benefit from treatments [7,46].

A methodological challenge in responsiveness analyses is the lack of a “gold standard” for the construct of clinical change. We contrasted the change scores of the outcome measurements in those who improved and those who had no change on the self-perceived global assessment of change. The potential bias of using a single global scale as a gold standard has been discussed, especially with respect to recall bias and the unknown reliability and validity of global ratings [47,48]. Other authors regard global rating scales of change as clinically relevant outcome measures [49]. Ostelo and de Vet convincingly argued, “most physicians would be reluctant to label a patient as improved or deteriorated against the patients’ personal assessment” [50]. A single global rating question may reflect other constructs such as patient satisfaction rather than improved pain, function, or return to work. In the present study, the anchor correlated with the outcomes at 0.5 or above, as recommended when assuming sufficient validity of a global scale. This threshold is not a guarantee of sufficient validity, as a correlation of 0.5 explains only 25% of the variability (0.5 squared is 0.25), and the outcomes may measure different constructs.

Missing items occurred in all questionnaires.
In SPADI and OSS, the number of missing items was within the recommended “missing rules.” No missing rules are published for the WORC index. The number of missing items was slightly higher in WORC compared with OSS and SPADI. Although unlikely, missing items may challenge the validity of the reported results.

A cutoff point separating stable from changed patients is by definition arbitrary. The cutoff has often been chosen without any relation to patient or clinician rating of change in condition. We focused on within-patient change in shoulder condition and emphasized the importance the patient attached to the observed change. The choice of cutoff had an impact on the size of all estimates. Choosing a less strict cutoff (change in main complaint = 2) reduced all responsiveness estimates and a stricter cutoff (change in main complaint = 4) increased all responsiveness estimates, but the between-scale differences were unaltered. The MCICs were approximately halved in size with the less strict cutoff and remained unchanged with the stricter cutoff. The SDC values from the test-retest study were generally slightly smaller than the MCIC95% limit values. This finding indicates that the chosen cutoff resulted in an “unchanged group” consisting of subjects who may have undergone a small but patient-rated unimportant change in condition.

Results presented in this study are likely to be affected by patient selection, treatment, follow-up period, and the choice of anchor and are therefore not necessarily generalizable to other settings. It has been recommended that responsiveness and MCICs should be demonstrated and documented for the particular study population [5].

The differences in responsiveness between scores in the present study were small and inconsistent between different responsiveness indices. There is no agreement on what constitutes important between-score differences in responsiveness or on the implications of these differences. In research, SPADI and WORC may be preferred. In clinical practice, features such as comprehensibility, administration time, and ease of scoring may outweigh small differences in responsiveness. The choice of what constitutes a clinically important change is a clinical decision. It is reasonable to demand a larger improvement for operative treatment than for a clinician’s advice.
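As a rough illustration of the anchor-based indices discussed in this section, the sketch below computes a standardized response mean, an area under the ROC curve, and the ROC cutoff that weights sensitivity and specificity equally. It is not the analysis code used in the study. The change scores are invented, the function names are hypothetical, and the rule for breaking ties between equally good cutoffs (keep the lowest) is an assumption. Positive change scores are taken to mean improvement.

```python
from statistics import mean, stdev


def srm(changes):
    """Standardized response mean: mean change divided by the SD of change."""
    return mean(changes) / stdev(changes)


def roc_auc(improved, unchanged):
    """Probability that a randomly chosen improved patient has a larger change
    score than a randomly chosen unchanged patient (ties count one half)."""
    wins = sum(
        1.0 if i > u else 0.5 if i == u else 0.0
        for i in improved
        for u in unchanged
    )
    return wins / (len(improved) * len(unchanged))


def optimal_cutoff(improved, unchanged):
    """Cutoff that maximizes sensitivity + specificity (equal weighting).

    A patient is classified as improved when the change score is >= cutoff.
    Assumption: when several cutoffs tie, the lowest one is kept.
    """
    best = None
    for cutoff in sorted(set(improved) | set(unchanged)):
        sensitivity = sum(x >= cutoff for x in improved) / len(improved)
        specificity = sum(x < cutoff for x in unchanged) / len(unchanged)
        score = sensitivity + specificity
        if best is None or score > best[0]:
            best = (score, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Invented change scores on a 0-100 scale, grouped by a hypothetical anchor.
improved = [35, 28, 22, 40, 18, 30, 25, 45]
unchanged = [5, -3, 12, 0, 8, 20, -6, 10]

if __name__ == "__main__":
    print("SRM improved:", round(srm(improved), 2))
    print("SRM unchanged:", round(srm(unchanged), 2))
    print("ROC AUC:", roc_auc(improved, unchanged))
    print("Cutoff, sensitivity, specificity:", optimal_cutoff(improved, unchanged))
```

Moving patients between the two lists mimics a stricter or less strict anchor cutoff, and all three quantities shift accordingly, which is the dependence on the anchor definition described above.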

5. Conclusion

The findings of the present study suggest that all questionnaires are suitable for measuring change in patients with rotator cuff disease. We did not identify consistent significant differences in responsiveness between the disease-specific WORC index and the region-specific SPADI and OSS in this patient sample.

Acknowledgments

The University of Oslo funded this study. All authors declare no competing interests. We wish to thank Margrethe Grotle at FORMI, Division for Neuroscience and Musculoskeletal Medicine at Oslo University Hospital, Ullevål, for helpful advice in the planning of this study; the Department of Physical Medicine and Rehabilitation at Oslo University Hospital, Ullevål, for recruitment of patients; and Jan Michael Gran at the Department of Biostatistics at the University of Oslo for help with the jackknife procedures.

References

[1] Bot SD, Terwee CB, van der Windt DA, Bouter LM, Dekker J, de Vet HC. Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Ann Rheum Dis 2004;63:335–41.
[2] Liang MH. Evaluating measurement responsiveness. J Rheumatol 1995;22:1191–2.
[3] Streiner DL, Norman GR. Health measurement scales. New York: Oxford University Press; 2003.
[4] Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine 2000;25:3192–9.
[5] Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008;61:102–9.
[6] Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 2003;56:395–407.
[7] de Vet HC, Ostelo RW, Terwee CB, van der Roer N, Knol DL, Beckerman H, et al. Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res 2007;16:131–42.
[8] Patrick DL, Deyo RA. Generic and disease-specific measures in assessing health status and quality of life. Med Care 1989;27(3 Suppl):S217–32.
[9] Kirkley A, Griffin S, Dainty K. Scoring systems for the functional assessment of the shoulder. Arthroscopy 2003;19:1109–20.
[10] Holtby R, Razmjou H. Measurement properties of the Western Ontario rotator cuff outcome measure: a preliminary report. J Shoulder Elbow Surg 2005;14:506–10.
[11] Razmjou H, Bean A, van OV, MacDermid JC, Holtby R. Cross-sectional and longitudinal construct validity of two rotator cuff disease-specific outcome measures. BMC Musculoskelet Disord 2006;7:26.
[12] MacDermid JC, Drosdowech D, Faber K. Responsiveness of self-report scales in patients recovering from rotator cuff surgery. J Shoulder Elbow Surg 2006;15:407–14.
[13] Ekeberg OM, Bautz-Holter E, Tveita EK, Juel NG, Kvalheim S, Brox JI. Subacromial ultrasound guided or systemic steroid injection for rotator cuff disease: randomised double blind study. BMJ 2009;338:a3112.
[14] Ekeberg OM, Bautz-Holter E, Tveita EK, Keller A, Juel NG, Brox JI. Agreement, reliability and validity in 3 shoulder questionnaires in patients with rotator cuff disease. BMC Musculoskelet Disord 2008;9:68.
[15] Tveita EK, Ekeberg OM, Juel NG, Bautz-Holter E. Responsiveness of the Shoulder Pain and Disability Index in patients with adhesive capsulitis. BMC Musculoskelet Disord 2008;9:161.
[16] Kirkley A, Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: the Western Ontario Rotator Cuff Index. Clin J Sport Med 2003;13:84–92.
[17] Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br 1996;78:593–600.


[18] Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford shoulder score revisited. Arch Orthop Trauma Surg 2008;129:119–23.
[19] Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a Shoulder Pain and Disability Index. Arthritis Care Res 1991;4:143–9.
[20] Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42.
[21] Angst F, Pap G, Mannion AF, Herren DB, Aeschlimann A, Schwyzer HK, et al. Comprehensive assessment of clinical outcome and quality of life after total shoulder arthroplasty: usefulness and validity of subjective outcome measures. Arthritis Rheum 2004;51:819–28.
[22] Cohen J. Statistical power analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Lawrence Erlbaum; 1988.
[23] Efron B, Tibshirani RJ. An introduction to the bootstrap. 1st edition. New York, NY: Chapman & Hall; 1993.
[24] R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2007.
[25] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
[26] Schmitt JS, Di Fabio RP. Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. J Clin Epidemiol 2004;57:1008–18.
[27] Altman D, Machin D, Bryant T, Gardner M. Statistics with confidence. London: BMJ Books; 2007.
[28] Beaton DE, Boers M, Wells GA. Many faces of the minimal clinically important difference (MCID): a literature review and directions for future research. Curr Opin Rheumatol 2002;14:109–14.
[29] Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 1. Phys Ther 1998;78:1186–96.
[30] Farrar JT, Young JP, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain 2001;94:149–58.
[31] Cella D, Hahn EA, Dineen K. Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res 2002;11:207–21.
[32] Stratford PW, Riddle DL. Assessing sensitivity to change: choosing the appropriate change coefficient. Health Qual Life Outcomes 2005;3:23.
[33] Angst F, Goldhahn J, Drerup S, Aeschlimann A, Schwyzer HK, Simmen BR. Responsiveness of six outcome assessment instruments in total shoulder arthroplasty. Arthritis Rheum 2008;59:391–8.
[34] Heald SL, Riddle DL, Lamb RL. The Shoulder Pain and Disability Index: the construct validity and responsiveness of a region-specific disability measure. Phys Ther 1997;77:1079–89.
[35] Beaton D, Richards RR. Assessing the reliability and responsiveness of 5 shoulder questionnaires. J Shoulder Elbow Surg 1998;7:565–72.

[36] Cloke DJ, Lynn SE, Watson H, Steen IN, Purdy S, Williams JR. A comparison of functional, patient-based scores in subacromial impingement. J Shoulder Elbow Surg 2005;14:380–4.
[37] Dawson J, Hill G, Fitzpatrick R, Carr A. Comparison of clinical and patient-based measures to assess medium-term outcomes following shoulder surgery for disorders of the rotator cuff. Arthritis Rheum 2002;47:513–9.
[38] Stratford PW, Kennedy DM, Hanna SE. Condition-specific Western Ontario McMaster Osteoarthritis Index was not superior to region-specific Lower Extremity Functional Scale at detecting change. J Clin Epidemiol 2004;57:1025–32.
[39] Beaton DE, Katz JN, Fossel AH, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. J Hand Ther 2001;14:128–46.
[40] Wells G, Beaton D, Shea B, Boers M, Simon L, Strand V, et al. Minimal clinically important differences: review of methods. J Rheumatol 2001;28:406–12.
[41] van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine 2006;31:578–82.
[42] Ostelo RW, Deyo RA, Stratford P, Waddell G, Croft P, Von Korff M, et al. Interpreting change scores for pain and functional status in low back pain: towards international consensus regarding minimal important change. Spine 2008;33:90–4.
[43] Paul A, Lewis M, Shadforth MF, Croft PR, Van Der Windt DA, Hay EM. A comparison of four shoulder-specific questionnaires in primary care. Ann Rheum Dis 2004;63:1293–9.
[44] Williams JW Jr, Holleman DR Jr, Simel DL. Measuring shoulder function with the Shoulder Pain and Disability Index. J Rheumatol 1995;22:727–32.
[45] Dawson J, Doll H, Boller I, Fitzpatrick R, Little C, Rees J, et al. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br 2008;90:466–73.
[46] Guyatt GH, Juniper EF, Walter SD, Griffith LE, Goldstein RS. Interpreting treatment effects in randomised trials. BMJ 1998;316:690–3.
[47] Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50:869–79.
[48] Schmitt J, Di Fabio RP. The validity of prospective and retrospective global change criterion measures. Arch Phys Med Rehabil 2005;86:2270–6.
[49] Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and general recommendations. Spine 2000;25:3100–3.
[50] Ostelo RW, de Vet HC. Clinically important outcomes in low back pain. Best Pract Res Clin Rheumatol 2005;19:593–607.