Validity, Reproducibility, and Responsiveness of the Oxford Hip Score in Patients Undergoing Surgery for Femoroacetabular Impingement


Franco M. Impellizzeri, Ph.D., Anne F. Mannion, Ph.D., Florian D. Naal, M.D., and Michael Leunig, M.D.

Purpose: To examine the validity, reproducibility, and responsiveness of the Oxford Hip Score (OHS) in patients with femoroacetabular impingement (FAI).

Methods: One hundred twenty-six consecutive patients with FAI and 550 patients undergoing total hip arthroplasty (THA) completed the OHS and the Hip Outcome Score (HOS) at baseline and at 6 and 12 months postoperatively. The patients also rated the global treatment outcome (“How much did the operation help your hip problem?”) on a 5-point Likert scale. Sixty-eight FAI and 96 THA patients completed the OHS twice within 2 weeks so that we could assess its reproducibility.

Results: The reproducibility of the OHS was good and was similar for THA and FAI patients (standard error of measurement of 5.6% for THA and 6.2% for FAI and intraclass correlation coefficient of 0.97 for both FAI and THA). In the FAI group, the correlations between the OHS and HOS subscale scores were strong (r = 0.67 to 0.85). The internal responsiveness (standardized response mean) of the OHS in FAI patients was high and similar to that of the HOS (from 0.84 to 1.48 for the OHS and from 0.75 to 1.53 for the HOS). External responsiveness was confirmed by the strong correlations between the change scores for the 2 instruments (r = 0.60 to 0.76) and between the change scores of the OHS and the global treatment outcome score (r = 0.52 to 0.60). No floor or ceiling effects were found, and internal consistency was high (Cronbach α = 0.94). Exploratory factor analysis showed a 2-factor structure for the OHS in both the THA and FAI groups.

Conclusions: We conclude that the OHS, though originally developed for patients undergoing THA, represents an appropriate outcome instrument for assessing pain and function in FAI patients treated with arthroscopy or mini-open surgery.

Level of Evidence: Level III, diagnostic study of consecutive patients (without consistently applied reference gold standard).

Femoroacetabular impingement (FAI) occurs when there is a repetitive impact of the femoral head/neck against the acetabular labrum and/or its adjacent cartilage. In recent years there has been a gradual acceptance of the FAI concept as a disease mechanism of the hip, and the syndrome has been described more frequently in the literature.1,2 Accordingly, the number

From the Department of Research and Development, Schulthess Clinic, Zurich, Switzerland. F.M.I. and A.F.M. contributed equally to the work. The authors report the following potential conflict of interest or source of funding: F.D.N. receives support from Smith & Nephew. Consultant fees for academic hip courses. Deutsche Arthrose-Hilfe e.V. Grant for research project in arthroplasty. M.L. receives support from Smith & Nephew. Received October 17, 2013; accepted July 25, 2014. Address correspondence to Franco M. Impellizzeri, Ph.D., Department of Research and Development, Schulthess Clinic, Lengghalde 2, 8008 Zurich, Switzerland. E-mail: [email protected] © 2015 by the Arthroscopy Association of North America 0749-8063/13741/$36.00 http://dx.doi.org/10.1016/j.arthro.2014.07.022


of studies examining the results of various types of treatment for FAI has also increased.3 The efficacy of treatment is most commonly assessed using patient-reported outcome measures. Indeed, such self-reports provide a more comprehensive assessment of patient health and disability status than the use of clinical outcomes alone. However, with the exception of a few instruments such as the Hip Outcome Score (HOS),4 the patient-reported outcome measures used in previous studies on FAI have not been externally validated for use in FAI patients.5-7 The Oxford Hip Score (OHS) is one such instrument that was recently used in the assessment of FAI patients.8,9 The OHS was developed for use in total hip arthroplasty (THA) patients and addresses their perceptions of pain, mobility, and function in connection with their hip problem.10,11 Examination of the composite items of the OHS suggests that it should have adequate content validity for use in patients with other hip problems. In THA patients the use of the OHS is associated with very high response rates,12 and this makes it feasible for implementation in

Arthroscopy: The Journal of Arthroscopic and Related Surgery, Vol 31, No 1 (January), 2015: pp 42-50

OXFORD HIP SCORE FOR FAI

national joint registries.13 If the OHS were to prove to be a valid instrument also in FAI patients, it would serve as a suitable questionnaire that is quick and easy to complete (12 items),14 which would be an advantage in the systematic evaluation of outcomes in future FAI registries. Registries typically require valid but brief instruments to minimize the burden on respondents and the administrative system. Many studies using the OHS have already been published in patients with hip replacement and resurfacing (including controls); the wider use of the OHS in patients with other pathologic conditions would facilitate the comparison of outcomes in different hip conditions and after different types of surgery. Lastly, given that outcome studies using the OHS in FAI patients have already been published,8,9 it is necessary to examine the validity of the OHS in this population to compare with historical controls and correctly interpret the conclusions of such studies. The purpose of this study was to examine the validity, reproducibility, and responsiveness of the OHS in patients with FAI. We hypothesized that the OHS would show acceptable validity, reproducibility, and responsiveness, allowing its use to be extended to the assessment of treatment outcome in patients undergoing surgery for FAI.

Methods

Design

The OHS was originally developed and validated for use in THA patients, and we therefore compared its psychometric characteristics (reliability and responsiveness) in FAI patients with those of the original population for whom the questionnaire was designed. Data were collected from both THA patients and FAI patients using a prospective observational design.

Participants

This study involved consecutive FAI patients undergoing surgery according to the following inclusion criteria: cam, localized pincer, or mild to moderate mixed impingement in hips with at most early osteoarthritis (<1 ); arthroscopic or mini-open osteochondroplasty as the foreseen surgical intervention; and a good understanding of written German. To cover the entire range of FAI patients, other than these inclusion criteria, no exclusion criteria were used. FAI patients commonly presented with moderate to severe groin pain, usually exacerbated by physical activity and prolonged sitting, that was not resolved by conservative treatment. Symptoms had been present for about 6 months to 2 years. Diagnosis of impingement was based on patient history, clinical examination (reduced flexion and internal rotation and positive impingement test), conventional radiographs (anteroposterior pelvis and cross-table lateral views), and obligatory magnetic


resonance imaging with intra-articular gadolinium contrast of the involved hip. All hips had no or only slight degenerative changes. Arthritic changes with Tönnis grade 2 or greater were a contraindication.

Over the same period (2009 to 2011), 613 consecutive patients undergoing primary THA (mean age, 65.6 years [standard deviation (SD), 10.5 years]; 52% male and 48% female patients) were asked to complete the questionnaire booklet at baseline and at 6 and 12 months after surgery. Patients undergoing THA had clinical and radiographic evidence of end-stage hip osteoarthritis. Usually, these patients had undergone prior conservative treatment comprising physiotherapy, oral nonsteroidal anti-inflammatory drugs, and therapeutic hip joint injections.

The questionnaire booklet containing the OHS and HOS was completed up to 3 weeks before surgery. At 6 and 12 months’ follow-up, the questionnaire booklet was sent out by post to those patients who had returned a preoperative questionnaire, with the request to complete it and return it using the stamped addressed envelope provided. To examine the test-retest reproducibility of the OHS, 200 patients (100 FAI and 100 THA patients) were asked to complete the questionnaire a second time within 2 weeks (approximately one-third at baseline and two-thirds at 6 and 12 months’ follow-up). Patients sending back the second questionnaire after more than 2 weeks were excluded.

The total sample size was selected based on published recommendations suggesting a minimum of 50 subjects for reliability and a minimum of 100 subjects for factor analysis with a subject-to-variable ratio of at least 10:1 (120 FAI subjects for the 12 items of the OHS).15,16 For the THA patients, we simply collected questionnaires from all patients over the same study period. The study was approved by the local ethical committee, and all patients gave their written informed consent to participate.

Questionnaires

Oxford Hip Score. Patients completed the cross-culturally adapted and validated German version17 of the OHS.10 The OHS consists of 12 questions asking patients to describe their hip pain and function during the past 4 weeks.10,18 Each item uses a 5-point response scale with values from 0 to 4. An overall score is created by summing the responses to each of the 12 questions. The total score can range from 0 to 48 (most recent scoring system), where 0 is the worst possible score, indicating severe hip problems, and 48 is the best score, suggesting excellent hip function.12 According to the instructions of the developers, if 1 or 2 items were not completed, the missing values were substituted with the mean value of all the available responses. If more than 2 questions were unanswered, the overall score was not calculated. If


F. M. IMPELLIZZERI ET AL.

patients indicated 2 answers for 1 item, the worst (most severe) response was used.

Hip Outcome Score. Patients completed the cross-culturally adapted and validated German version19 of the HOS.4,20,21 The HOS comprises 2 separately scored subscales: the activities-of-daily-living (ADL) subscale with 19 items and the sport subscale with 9 items. Each item uses an adjectival scale ranging from “no difficulty at all” to “unable to do.” Examples of ADL items include standing for 15 minutes, putting on socks and shoes, walking up a steep hill, going up 1 flight of stairs, deep squatting, rolling over in bed, performing heavy work, and twisting/pivoting on the involved leg. Examples of sport items are running 1 mile, jumping, landing, and cutting/lateral movement. ADL and sport subscale scores can range from 0 to 100, with higher scores representing better physical function. Details on data sampling and scoring have been published previously.4,21 According to the developers, questionnaires can be scored if at least 14 items have been completed on the ADL subscale and 7 on the sport subscale.

Global Treatment Outcome. At 6 and 12 months’ follow-up, the patients rated the global treatment outcome (GTO) (“How much did the operation help your hip problem?”) on a 5-point Likert scale (with the following response options: operation “helped a lot,” “helped,” “helped only little,” “didn’t help,” or “made things worse”).9,22

Statistical Analysis

Measures of centrality and dispersion include means and standard deviations unless otherwise stated.

Reproducibility (Reliability and Measurement Error). Reliability concerns the degree to which individuals maintain their position in a sample with repeated measurements.23,24 We assessed this type of reliability with the intraclass correlation coefficient (ICC), using a 2-way random-effects model with single measures. Measurement error (agreement) was determined by calculating the standard error of measurement (SEM).
The minimal detectable change (MDC), that is, the smallest individual change that can be considered real and not the result of measurement error, was calculated as SEM × √2 × 1.96. Consistent deviations in measures between the 2 assessment time points were examined by inspecting the 95% confidence intervals (CIs) of the differences between measurements. Heteroscedasticity was examined by plotting the absolute differences between the 2 sets of scores against their means and calculating the Pearson correlation coefficient between these 2 variables; significant correlations indicated the presence of heteroscedasticity.25
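As a rough illustration of the MDC formula above, the following Python sketch evaluates SEM × √2 × 1.96. The function name is invented for this example, and the SEM inputs (1.7 and 2.2 OHS points) are the values reported for the FAI and THA groups in the Results; the original analysis was run in SPSS and Excel, so this is only a sketch of the arithmetic, not the authors' code.

```python
import math

Z_95 = 1.96  # z value for a 95% confidence level, as used in the text


def minimal_detectable_change(sem: float, z: float = Z_95) -> float:
    """Smallest individual change exceeding measurement error: SEM * sqrt(2) * z."""
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    return sem * math.sqrt(2) * z


# SEM values (in OHS points) taken from the Results section of the paper.
for group, sem in {"FAI": 1.7, "THA": 2.2}.items():
    print(f"{group}: MDC = {minimal_detectable_change(sem):.1f} OHS points")
```

With these inputs the sketch gives 4.7 points for FAI and 6.1 points for THA, in line with the MDC values stated in the Results.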

Convergent (Construct) Validity and Floor/Ceiling Effects. Convergent validity between the OHS and HOS was examined using Pearson product moment correlations. Floor and ceiling effects were calculated as the percentages of patients showing the worst and best values for the instrument, respectively.

Factorial Structure and Internal Consistency. The structure of the OHS was examined using exploratory factor analysis. In previous studies the factor structure of the Oxford Knee Score has been examined using principal component factor analysis (extraction method) with non-orthogonal promax rotation.26,27 However, principal component analysis is the preferred method for data reduction, not for detecting underlying instrument structure. According to Fabrigar et al.,28 a better extraction method is maximum likelihood or principal axis factoring, depending on the data distribution. However, the results of principal component analysis were similar to those of principal axis factoring, and therefore, to allow direct comparison with the findings of Xie et al.,26,27 we presented the data extracted using the same method. We retained factors with eigenvalues greater than 1.0 and based on the scree test. Internal consistency was determined using the Cronbach α.

Internal and External Responsiveness. The changes over time in OHS and HOS values were evaluated using repeated-measures analysis of variance (ANOVA) with 1 within-subject factor on 3 levels (time: baseline, 6 months, and 12 months) and 1 between-subjects factor on 2 levels (FAI group: arthroscopy and mini-open). The changes in THA were examined using repeated-measures ANOVA with only 1 within-subject factor (time).
We calculated internal responsiveness (i.e., the ability of an instrument to detect change, regardless of whether it is meaningful29) using the standardized response mean (SRM [group mean change score/standard deviation change score]).29 Internal responsiveness was calculated for each of the patient groups dichotomized based on the responses to the GTO item: those indicating that the operation “helped” or “helped a lot” were classified as “good,” and those indicating that it “helped only little,” “didn’t help,” or “made things worse” were classified as “moderate-low.” The SRM was also calculated for the FAI group dichotomized based on the surgical approach used: arthroscopy and mini-open. External responsiveness (i.e., the ability to detect change over time in the construct to be measured) was examined by assessing the correlations (Pearson) between change scores in both the OHS and HOS, as well as between change scores in the OHS and the GTO scores. The correlations were expected to be large (>0.50)


according to the Cohen classification. Changes in the OHS and HOS values over time were analyzed using a repeated-measures 1-way ANOVA. All the described analyses were carried out using SPSS software (version 17; SPSS, Chicago, IL) and purpose-made spreadsheets in Excel (Microsoft, Redmond, WA). P < .05 was considered statistically significant.
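The standardized response mean defined above (group mean change score divided by the standard deviation of the change scores) can be sketched in a few lines of Python. The function name and the patient scores below are hypothetical and serve only to show the calculation; using the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean of the change scores / sample SD of the change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("need at least 2 paired scores")
    sd = stdev(changes)
    if sd == 0:
        raise ValueError("SD of change scores is zero; SRM is undefined")
    return mean(changes) / sd


# Hypothetical OHS totals (0 to 48) for 5 patients; not data from the study.
baseline = [20, 25, 18, 30, 22]
follow_up = [35, 38, 30, 36, 40]
print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
```

Because the denominator is the spread of the change scores, two groups with the same mean improvement can have different SRMs if one group responds more variably.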

Table 1. Floor and Ceiling Effects of OHS and HOS in Patients With FAI and THA

                 Preoperatively        6 mo                  12 mo
                 Floor     Ceiling     Floor     Ceiling     Floor     Ceiling
FAI
  OHS            0.0%      0.6%        0.0%      4.1%        0.0%      7.6%
  HOS-ADL        0.0%      0.0%        0.0%      14.5%       0.7%      16.8%
  HOS-Sport      8.6%      0.4%        1.9%      9.6%        1.0%      3.8%
THA
  OHS            0.0%      0.0%        0.0%      18.5%       0.0%      33.7%
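The floor and ceiling percentages in Table 1 follow the definition given in the Statistical Analysis section: the share of patients with the worst and the best possible score. A minimal Python sketch of that calculation is shown below, assuming the OHS range of 0 to 48; the function name, the threshold constant, and the example scores are hypothetical and are not study data.

```python
NOTABLE_THRESHOLD = 15.0  # percentage above which the paper treats an effect as notable


def floor_ceiling_effects(scores, worst=0, best=48):
    """Return (floor %, ceiling %) for a list of total scores."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor_pct = 100.0 * sum(s == worst for s in scores) / n
    ceiling_pct = 100.0 * sum(s == best for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical OHS totals for 10 patients; not data from the study.
scores = [48, 48, 40, 35, 0, 22, 48, 31, 44, 27]
floor_pct, ceiling_pct = floor_ceiling_effects(scores)
print(f"floor {floor_pct:.1f}%, ceiling {ceiling_pct:.1f}%")
print("notable ceiling effect:", ceiling_pct > NOTABLE_THRESHOLD)
```

For the HOS subscales the same function would be called with `best=100`, since those scores range from 0 to 100.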

Results

One hundred sixty-five consecutive patients with FAI (mean age, 36.1 years [SD, 11.7 years]; 47% male and 53% female patients) undergoing either arthroscopic surgery with labral preservation (52%) or limited anterolateral open surgery (i.e., mini-open) with labral resection (48%) participated in the study. The data from 126 FAI patients (76%) who sent back the questionnaire booklet at all 3 time points were used for the analyses, after the exclusion of 5 patients who had undergone revision or had undergone surgery on the other side. Their mean age was 35 years (SD, 12 years), male patients comprised 49% and female patients comprised 51%, and arthroscopic surgery was performed in 54% and mini-open surgery in 46%. To identify a potential selection bias, all 39 nonrespondents were contacted by phone and asked why they did not send back the questionnaire booklet. Of these, 25 (64%) could be reached by phone: 11 said they had forgotten but they intended to send back the booklet (but never did), 6 declared that they had no time or just could not be bothered with filling out the questionnaires, and only 3 declared that their current status was not good and they were dissatisfied (and therefore did not want to cooperate with us). For the reliability analyses, 68 FAI and 96 THA patients returned the second OHS questionnaire within 2 weeks. Of the THA patients, 550 (89%) returned the questionnaire at all 3 time points. Their mean age was 66 years (SD, 10 years); 51% were male and 49% were female patients.

Factorial Structure and Internal Consistency

The exploratory factor analysis showed a similar 2-factor structure in both THA and FAI patients. The correlation coefficients between these 2 factors were large (r > 0.50) in both THA patients (r = 0.59) and FAI patients (r = 0.53). The pattern matrix is presented in Table 2. The Cronbach α was 0.94 in FAI patients, with values greater than 0.93 when items were deleted. The Cronbach α was 0.89 in THA patients and was greater than 0.87 when items were deleted.
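For readers who want to see how the internal-consistency figures above are obtained, the sketch below computes Cronbach α and the "α if item deleted" value from a matrix of item responses using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total score). The function names and the small response matrix are hypothetical; the study's own values came from SPSS on the full 12-item OHS data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score variance is zero; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows, index):
    """Cronbach alpha recomputed with the item at position `index` removed."""
    return cronbach_alpha([row[:index] + row[index + 1:] for row in rows])


# Hypothetical responses (0 to 4) of 4 respondents to 3 items; not study data.
responses = [[4, 3, 4], [2, 2, 1], [3, 3, 3], [0, 1, 1]]
print(f"alpha = {cronbach_alpha(responses):.2f}")
print(f"alpha without item 1 = {alpha_if_item_deleted(responses, 0):.2f}")
```

If α stays roughly constant when each item is removed in turn, as reported above, no single item is driving or weakening the internal consistency of the scale.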

Reproducibility (Reliability and Measurement Error)

The reproducibility of the OHS was calculated separately for patients with FAI and THA. No significant and substantial bias in repeated measures was found for either FAI patients (test-retest difference, −0.5; 95% CI, −1.2 to 0.1) or THA patients (test-retest difference, 0.7; 95% CI, 0.1 to 1.3). Both reliability and agreement were similar for the 2 populations. The ICCs for both FAI and THA were high, in each case being 0.97 (95% CI, 0.96 to 0.98). The SEM was 1.7 points (95% CI, 1.4 to 2.2 points) for FAI and 2.2 points (95% CI, 1.8 to 2.5 points) for THA; these figures corresponded to coefficients of variation (percent of mean values) of 4.5% and 6.3%, respectively. The corresponding MDC values for FAI and THA were 4.7 points and 6.1 points, respectively.

Convergent (Construct) Validity and Floor/Ceiling Effects of OHS in FAI Patients

At each time point (baseline and 6 and 12 months after surgery), the OHS values were significantly and highly correlated with both HOS-ADL scores (r = 0.84 to 0.85) and HOS-Sport scores (r = 0.67 to 0.74). At baseline, no notable floor or ceiling effects (i.e., no proportions >15%) were found for either HOS or OHS (Table 1). At 6 and 12 months’ follow-up, there were higher ceiling effects but they still did not exceed 15%, except for the HOS-ADL, where they were just over 15%. In contrast, in THA patients a high ceiling effect for the OHS (34%) was found at 12 months’ follow-up.

Internal and External Responsiveness in FAI and THA Patients

No time × FAI group interactions (P = .246 to .912) were found, indicating that the changes over time were similar for the FAI arthroscopy and mini-open groups. The absolute scores for OHS and HOS in the FAI patients at the 3 time points for those reporting a “good global outcome” are presented in Figure 1. Both groups improved significantly (P < .001). Similarly, the THA patients with a good global outcome showed higher OHS raw scores at 6 months (40.3 [SD, 9.1]) and at 12 months (44.0 [SD, 6.2]) compared with baseline (25.3 [SD, 7.4]; P < .001).


Table 2. Exploratory Factor Analysis (Pattern Matrix) of OHS in Patients With THA and FAI

Columns: THA (Factor 1, Factor 2), FAI (Factor 1, Factor 2)

Items
Question 1: How would you describe the pain you usually had from your hip?
Question 2: Have you had any trouble with washing and drying yourself (all over) because of your hip?
Question 3: Have you had any trouble getting in and out of a car or using public transport because of your hip?
Question 4: Have you been able to put on a pair of socks, stockings, or tights?
Question 5: Could you do the household shopping on your own?
Question 6: For how long have you been able to walk before pain from your hip became severe (with or without a stick)?
Question 7: Have you been able to climb a flight of stairs?
Question 8: After a meal (sat at a table), how painful has it been for you to stand up from a chair because of your hip?
Question 9: Have you been limping when walking because of your hip?
Question 10: Have you had any sudden severe pain (“shooting,” “stabbing,” or “spasms”) from the affected hip?
Question 11: How much has pain from your hip interfered with your usual work (including housework)?
Question 12: Have you been troubled by pain from your hip in bed at night?

In FAI patients, internal responsiveness was similar for the OHS and HOS-ADL scale (Table 3). The SRM values for the OHS in THA patients were high (1.49 and 2.23 at 6 and 12 months, respectively) and greater than those in FAI patients (0.87 to 1.53). Examining the external responsiveness in FAI patients, we found large, significant correlations between the change scores of the OHS and both the HOS-ADL (r = 0.73 and r = 0.76 at 6 and 12 months, respectively; P < .001) and HOS-Sport (r = 0.66 and r = 0.60 at 6 and 12 months, respectively; P < .001). Large, significant correlations were also found between OHS and the rating of improvement at 12 months (r = 0.60). The relations between GTO and HOS-ADL and between GTO and HOS-Sport were similar (r = 0.59 and r = 0.52, respectively).
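The external-responsiveness check described above amounts to correlating change scores from two instruments and comparing the coefficient with the "large" threshold of 0.50. The following Python sketch shows one way to do this with a hand-written Pearson correlation; the function names and all scores are hypothetical and are not taken from the study data.

```python
import math

LARGE_CORRELATION = 0.50  # threshold for a large correlation used in the paper


def change_scores(baseline, follow_up):
    """Follow-up minus baseline for each patient."""
    return [post - pre for pre, post in zip(baseline, follow_up)]


def pearson_r(x, y):
    """Pearson product moment correlation between two paired lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for 5 patients; not data from the study.
ohs_change = change_scores([20, 25, 18, 30, 22], [35, 38, 30, 36, 40])
hos_adl_change = change_scores([50, 60, 45, 70, 55], [80, 85, 70, 78, 90])
r = pearson_r(ohs_change, hos_adl_change)
print(f"r = {r:.2f}, large: {r > LARGE_CORRELATION}")
```

The same function could be applied to the OHS change scores and a numerically coded GTO rating, which is how the second set of correlations in the paragraph above would be obtained under this sketch.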

Discussion

The results of this study clearly show that the OHS, though developed for patients with THA, exhibits

Factor 1

FAI Factor 2 0.83

Factor 1

0.89

0.97

0.71

0.68

0.93 0.74 0.39

0.91 0.73

Factor 2 0.83

0.64

0.74

0.49 0.45

0.44

0.80

0.60 0.87

0.50

0.68

0.81

0.71

0.55

psychometric properties similar to the HOS questionnaire, which was specifically developed for FAI patients. The OHS showed high reproducibility, as well as good construct validity and responsiveness, thus supporting the appropriateness of its use in patients with FAI. Reproducibility (Reliability and Measurement Error) The reproducibility of the OHS was good in both THA and FAI patients. The ICC values were very high (0.97) in both patient groups and well above the 0.70 cutoff value suggested as the minimal acceptable ICC.16 The high ICCs suggest that the OHS displays a good ability to differentiate between patients and is therefore a suitable questionnaire for cross-sectional evaluations. However, in the routine setting and for longitudinal assessment of an individual patient or groups of patients (e.g., for evaluating the response to treatment), measurement agreement is the crucial characteristic an instrument should possess: the higher the agreement (i.e., the smaller the SEM), the smaller

Fig 1. OHS and HOS values at baseline and follow-up (6 and 12 months) for patients with FAI operated on with arthroscopy or mini-open technique.


Table 3. Change Scores and Internal Responsiveness

                                    6 mo − Baseline         12 mo − Baseline        SRM                SRM
                          n     %   Mean (95% CI)           Mean (95% CI)           Baseline to 6 mo   Baseline to 12 mo
FAI with arthroscopy
  OHS
    Good                 43    72   14.4 (10.7 to 18.1)     18.4 (14.8 to 22.0)     1.15               1.48
    Moderate-low         17    28   4.7 (−3.8 to 13.2)      4.7 (−2.5 to 12.0)      0.26               0.30
  HOS-ADL
    Good                 43    73   17.5 (12.8 to 22.1)     20.3 (16.3 to 24.3)     1.13               1.49
    Moderate-low         16    27   6.3 (−3.7 to 16.2)      1.1 (−6.6 to 8.8)       0.30               0.07
  HOS-Sport
    Good                 31    66   22.4 (15.7 to 29.0)     27.6 (21.7 to 33.4)     1.15               1.53
    Moderate-low         16    34   5.5 (−6.6 to 17.6)      3.7 (−5.7 to 13.1)      0.22               0.19
FAI with open technique
  OHS
    Good                 46    70   12.9 (8.5 to 17.3)      18.1 (13.3 to 22.9)     0.84               1.08
    Moderate-low         20    30   6.0 (1.6 to 10.5)       3.5 (−0.4 to 7.5)       0.59               0.38
  HOS-ADL
    Good                 41    66   12.0 (7.9 to 16.0)      16.6 (12.1 to 21.2)     0.88               1.11
    Moderate-low         21    34   4.5 (−2.7 to 11.8)      3.3 (−3.3 to 9.9)       0.27               0.22
  HOS-Sport
    Good                 35    70   17.0 (10.0 to 24.0)     30.8 (24.0 to 37.6)     0.75               1.28
    Moderate-low         15    30   1.5 (−8.6 to 11.6)      5.9 (−0.5 to 12.3)      0.07               0.42
THA
  OHS
    Good                541    98   31.2 (29.4 to 32.9)     39.0 (37.5 to 40.4)     1.49               2.23
    Moderate-low          9     2   2.8 (−8.9 to 14.5)      9.2 (−2.0 to 20.4)      0.16               0.51

the changes that can be detected.24 We found low SEM values for the OHS in both THA and FAI patients. However, to evaluate the acceptability of the level of agreement, it has been suggested that the corresponding MDC (i.e., the smallest change that can be considered real and not the result of measurement error) should be lower than the minimal clinically important change (MCIC).16,30 In a previous study using the same FAI population as in this study, the MCIC was shown to be 6 points for the OHS.9 The MDC reported in our study for FAI was 4.7 and, therefore, lower than the MCIC, indicating that measurement agreement for the OHS was acceptable. Interestingly, we found a similar level of reproducibility for the OHS in THA patients, the population for whom the instrument was originally developed.

Convergent (Construct) Validity and Floor/Ceiling Effects

In this study we used the HOS as the reference instrument for assessing convergent validity because this is one of the few instruments specifically developed for patients with FAI.4,20,21 Significant and large correlation coefficients were found for the relation between the OHS and HOS-ADL scores. Although the OHS does not specifically address the sport domain, large correlations were also found between the OHS and HOS-Sport subscale scores. These associations confirm the convergent validity of the OHS.

Floor and ceiling effects are important measurement properties because they can influence the ability of an instrument to show further deterioration or improvement in the given health state. The OHS did not show any notable floor or ceiling effects at any time point, with values well below the 15% level considered acceptable.

Factorial Structure and Internal Consistency

Although the factorial structure of an instrument is an attribute that should be tested in the validation process,15 relatively few studies have examined the structure of the OHS and most have involved the knee version of the instrument.26,27 In assessing validity the confirmatory approach is generally preferred, but this can only be performed when the instrument has a well-defined factor structure, which is not the case for the OHS. In previous studies on the OHS in total knee arthroplasty patients,26,27 exploratory factor analysis revealed 2 factors, suggesting that the instrument measures more than 1 construct. From examination of the item classification, it seems that 1 factor is more related to the difficulties in physical activities and the other is more related to pain, and of course, these 2 factors correlate with each other. The 2-factor structure was found in both THA and FAI patients, with the items loading similarly on the 2 factors. The internal consistency of the OHS was high and similar to that reported in the original validation studies in patients undergoing total knee or hip replacement.10,31


Internal and External Responsiveness

Internal responsiveness refers to the ability of an instrument to detect change regardless of whether it is meaningful (distribution-based approach).29,32 The internal responsiveness can be calculated in different ways,29 but for this study, we used the SRM. In this study we calculated the internal responsiveness for the patients classified into 2 groups based on the perceived global improvement, and we hypothesized a greater value for the patients indicating that the operation “helped” or “helped a lot” compared with the patients indicating that the treatment “helped only little,” “didn’t help,” or “made things worse.” As expected, those perceiving substantial improvements (“good” group) showed higher SRMs than those who perceived slight or no benefits from the intervention. When only the patients in the “good” group were examined, the OHS showed high internal responsiveness, particularly in the FAI patients operated on with arthroscopy. In patients treated with mini-open surgery, the internal responsiveness was lower. This finding may suggest that their improvement after the operation was not as great compared with arthroscopy patients. However, observation of the raw OHS (and HOS) data showed that the absolute changes at 6 and 12 months were similar. Therefore it appears that the lower SRM values were mainly because of the larger standard deviation of the change scores (used as the denominator in the calculation of SRM) in mini-open FAI patients. This just indicates a higher interindividual variability in the treatment response, which was nonetheless “good” in the majority of patients; indeed, approximately 70% of FAI patients operated on with either technique reported a good GTO (“helped” or “helped a lot”). The OHS and the HOS showed similar internal responsiveness in the FAI patients.
The OHS showed higher responsiveness in THA patients than in FAI patients, the likely consequence of the typically greater improvement seen in THA patients. Overall, these results indicate that, in FAI patients, the OHS shows good internal responsiveness, similar to that of the HOS subscales. It has been suggested that responsiveness should not be viewed as a separate measurement property from validity because responsiveness in fact reflects the validity of a change score.15,33 We examined the external responsiveness by assessing the correlations between the change scores for the OHS and the change scores for the reference instrument (HOS). In addition, we examined the associations between OHS change scores and the perceived GTO (degree of improvement) declared by the patients. The correlation coefficients describing all these relations were large. Interestingly, we also found large correlations between the change scores for the OHS and the HOS-Sport subscales, despite the fact that the OHS does not specifically assess the sport domain. This finding suggests that improvements

in sport are reflected by improvements in general physical activity and that sport-specific items contribute only marginally to making the instrument more responsive in FAI patients. Nevertheless, the lack of any sports items may be seen as a potential limitation of the instrument because the ability to play sports is typically considered a concern in FAI patients. Interestingly, however, a recent study has shown that the main reasons given by FAI patients for undergoing surgery are pain reduction (33%), fear of worsening of the situation (20%), and desire to improve their performance in everyday activities (16%); sports participation was the main reason for surgery in just 10% of the patients, with 20% indicating that it was their second or third most important reason for undergoing surgery.8 In other words, contrary to popular belief, sports participation is the main concern for only a small proportion of the patients (1 to 2 of 10). Furthermore, we showed that the OHS values correlated well with those of the HOS-Sport scale, suggesting that the construct measured with the OHS is closely related to that reflecting limitations in playing sports, as measured by the HOS. If more precise information about limitations in sports participation is required, a recently validated sports scale for FAI patients could easily be added.34 Overall, the psychometric properties of the OHS were comparable with those reported in THA patients, the target group for the instrument, and similar to the HOS, which is an instrument specifically developed for FAI patients. 
Furthermore, previous studies have already provided information facilitating the interpretation of the OHS values in FAI patients, with an MCIC for improvement of about 6 points (feeling better), and a total score higher than 40 points for an acceptable state (feeling good).8,9 Although our study supports the validity of the OHS in FAI patients, future investigations should further examine the content validity of the OHS for FAI patients.

Limitations

One potential limitation of this study is the relatively low response rate (76%). As a standard quality-control procedure, we usually contact by phone all the patients who do not send back the questionnaires. On the basis of these phone interviews, only 3 patients reported that they were dissatisfied and therefore not willing to cooperate. Therefore we probably lost to follow-up only a very small percentage of patients who could have influenced the outcome data and, in particular, the responsiveness values. Furthermore, because the aim of this study was not to interpret the magnitude of the changes per se but rather to compare 2 different instruments, we do not consider this a major limitation; selection bias would have influenced the absolute changes but not the comparison between instruments.
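The interpretation thresholds cited in the Discussion (an MCIC for improvement of about 6 points and a total score higher than 40 points for an acceptable state) could be applied to individual patients along the following lines. This is a hedged sketch only: the function name is hypothetical, and treating "about 6 points" as a change of 6 points or more is an assumption made for illustration, not a rule stated by the cited studies.

```python
# Thresholds taken from the text; the ">= 6" reading of "about 6 points"
# is an assumption made for this illustration.
MCIC_POINTS = 6
ACCEPTABLE_STATE = 40


def interpret_ohs(baseline, follow_up,
                  mcic=MCIC_POINTS, acceptable=ACCEPTABLE_STATE):
    """Classify one patient's OHS result (hypothetical helper).

    Assumes higher OHS totals indicate better status.
    """
    change = follow_up - baseline
    return {
        "change": change,
        # "Feeling better": improvement of at least the MCIC.
        "feeling_better": change >= mcic,
        # "Feeling good": follow-up total above the acceptable-state cutoff.
        "feeling_good": follow_up > acceptable,
    }


# Hypothetical patients (illustration only, not study data).
print(interpret_ohs(30, 42))  # improved by 12 points, ends above 40
print(interpret_ohs(38, 41))  # improved by 3 points, ends above 40
print(interpret_ohs(20, 30))  # improved by 10 points, ends at or below 40
```

Keeping the two criteria separate reflects the distinction drawn in the text between feeling better (sufficient change) and feeling good (sufficient final state): a patient can meet one without the other.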


Conclusions

We conclude that the OHS, though originally developed for patients undergoing THA, represents an appropriate outcome instrument for assessing pain and function in FAI patients treated with arthroscopy or mini-open surgery.

References

1. Leunig M, Beaule PE, Ganz R. The concept of femoroacetabular impingement: Current status and future perspectives. Clin Orthop Relat Res 2009;467:616-622.
2. Botser IB, Smith TW Jr, Nasser R, Domb BG. Open surgical dislocation versus arthroscopy for femoroacetabular impingement: A comparison of clinical outcomes. Arthroscopy 2011;27:270-278.
3. Clohisy JC, St John LC, Schutz AL. Surgical treatment of femoroacetabular impingement: A systematic review of the literature. Clin Orthop Relat Res 2010;468:555-564.
4. Martin RL, Philippon MJ. Evidence of reliability and responsiveness for the hip outcome score. Arthroscopy 2008;24:676-682.
5. Cooper AP, Basheer SZ, Maheshwari R, Regan L, Madan SS. Outcomes of hip arthroscopy. A prospective analysis and comparison between patients under 25 and over 25 years of age. Br J Sports Med 2013;47:234-238.
6. Lodhia P, Slobogean GP, Noonan VK, Gilbart MK. Patient-reported outcome instruments for femoroacetabular impingement and hip labral pathology: A systematic review of the clinimetric evidence. Arthroscopy 2011;27:279-286.
7. Gedouin JE, May O, Bonin N, et al. Assessment of arthroscopic management of femoroacetabular impingement. A prospective multicenter study. Orthop Traumatol Surg Res 2010;96:S59-S67.
8. Mannion AF, Impellizzeri FM, Naal FD, Leunig M. Fulfilment of patient-rated expectations predicts the outcome of surgery for femoroacetabular impingement. Osteoarthritis Cartilage 2013;21:44-50.
9. Impellizzeri FM, Mannion AF, Naal FD, Hersche O, Leunig M. The early outcome of surgical treatment for femoroacetabular impingement: Success depends on how you measure it. Osteoarthritis Cartilage 2012;20:638-645.
10. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br 1996;78:185-190.
11. Fitzpatrick R, Morris R, Hajat S, et al. The value of short and simple measures to assess outcomes for patients of total hip replacement surgery. Qual Health Care 2000;9:146-150.
12. Murray DW, Fitzpatrick R, Rogers K, et al. The use of the Oxford hip and knee scores. J Bone Joint Surg Br 2007;89:1010-1014.
13. Paulsen A, Pedersen AB, Overgaard S, Roos EM. Feasibility of 4 patient-reported outcome measures in a registry setting. Acta Orthop 2012;83:321-327.


14. Ahmad MA, Xypnitos FN, Giannoudis PV. Measuring hip outcomes: Common scales and checklists. Injury 2011;42:259-264.
15. Mokkink LB, Terwee CB, Knol DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content. BMC Med Res Methodol 2010;10:22.
16. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34-42.
17. Naal FD, Sieverding M, Impellizzeri FM, von Knoch F, Mannion AF, Leunig M. Reliability and validity of the cross-culturally adapted German Oxford hip score. Clin Orthop Relat Res 2009;467:952-957.
18. Dawson J, Fitzpatrick R, Murray D, Carr A. A response to issues raised in a recent paper concerning the Oxford knee score. Knee 2006;13:66-68.
19. Naal FD, Impellizzeri FM, Miozzari HH, Mannion AF, Leunig M. The German hip outcome score: Validation in patients undergoing surgical treatment for femoroacetabular impingement. Arthroscopy 2011;27:339-345.
20. Martin RL, Philippon MJ. Evidence of validity for the hip outcome score in hip arthroscopy. Arthroscopy 2007;23:822-826.
21. Martin RL, Kelly BT, Philippon MJ. Evidence of validity for the hip outcome score. Arthroscopy 2006;22:1304-1311.
22. Mannion AF, Porchet F, Kleinstuck FS, et al. The quality of spine surgery from the patient’s perspective: Part 2. Minimal clinically important difference for improvement and deterioration as measured with the Core Outcome Measures Index. Eur Spine J 2009;18:374-379 (suppl 3).
23. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737-745.
24. de Vet HC, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033-1039.
25. Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.
26. Xie F, Ye H, Zhang Y, Liu X, Lei T, Li SC. Extension from inpatients to outpatients: Validity and reliability of the Oxford Knee Score in measuring health outcomes in patients with knee osteoarthritis. Int J Rheum Dis 2011;14:206-210.
27. Xie F, Li SC, Lo NN, et al. Cross-cultural adaptation and validation of Singapore English and Chinese Versions of the Oxford Knee Score (OKS) in knee osteoarthritis patients undergoing total knee replacement. Osteoarthritis Cartilage 2007;15:1019-1024.
28. Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor analysis in psychological research. Psychol Methods 1999;4:272-299.
29. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: Guidelines for instrument evaluation. Qual Life Res 2003;12:349-362.


30. Scientific Advisor Committee of Medical Outcomes Trust. Assessing health status and quality-of-life instruments: Attributes and review criteria. Qual Life Res 2002;11:193-205.
31. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br 1998;80:63-69.
32. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: A critical review and recommendations. J Clin Epidemiol 2000;53:459-468.
33. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Qual Life Res 2010;19:539-549.
34. Naal FD, Miozzari HH, Kelly BT, Magennis EM, Leunig M, Noetzli HP. The hip sports activity scale (HSAS) for patients with femoroacetabular impingement. Hip Int 2013;23:204-211.