Mapping WOMAC onto the EQ-5D-5L Utility Index in Patients with Hip or Knee Osteoarthritis



Contents lists available at sciencedirect.com Journal homepage: www.elsevier.com/locate/jval

Amaia Bilbao, PhD,1,2,3,* Jesús Martín-Fernández, MD, PhD,2,4,5 Lidia García-Pérez,2,6 Juan Carlos Arenaza, MD,2,7 Gloria Ariza-Cardiel, MD, PhD,2,4 Yolanda Ramallo-Fariña, MSc,2,6 Laura Ansola, MSc1

1Osakidetza Basque Health Service, Basurto University Hospital, Research Unit, Bilbao, Spain; 2Health Service Research Network on Chronic Diseases (REDISSEC), Bilbao, Spain; 3Kronikgune Institute for Health Services Research, Barakaldo, Spain; 4Oeste Multiprofessional Teaching Unit of Primary and Community Care, Primary Healthcare Management, Madrid Health Service, Madrid, Spain; 5Health Sciences Faculty, Rey Juan Carlos University, Madrid, Spain; 6Fundación Canaria de Investigación Sanitaria (FUNCANIS), Santa Cruz de Tenerife, Tenerife, Spain; 7Osakidetza Basque Health Service, Basurto University Hospital, Traumatology and Orthopedic Surgery Service, Bilbao, Spain.

ABSTRACT

Objectives: To map the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) onto the EQ-5D-5L in patients with hip or knee osteoarthritis (OA).

Methods: A prospective observational study was conducted on 758 patients with hip or knee OA who completed the EQ-5D-5L and WOMAC questionnaires. Of these, 644 completed both questionnaires again 6 months later. Baseline data were used to derive the mapping functions. Generalized additive models were used to identify the powers to which the WOMAC subscales should be raised to achieve a linear relationship with the response. For the modeling, general linear models (GLMs), Tobit models, and beta regression models were used. Age, sex, and affected joint were also considered. Preferred models were selected based on the Akaike and Bayesian information criteria, adjusted R², mean absolute error (MAE), and root mean squared error (RMSE). The functions were validated with the follow-up data using MAE, RMSE, and the intraclass correlation coefficient.

Results: The preferred models were a GLM with Pain² + Pain³ + Function + Pain × Function as covariates and a beta model with Pain³ + Function + Function² + Function³ as covariates. Their adjusted R² values were similar (0.6190 and 0.6136, respectively). The predictive performance of the 2 models in the validation sample was similar, and both overpredicted utilities for health states worse than death.

Conclusion: To our knowledge, these are the first functions mapping the WOMAC onto the EQ-5D-5L in patients with hip or knee OA. They showed acceptable fit and precision. They could be useful for clinicians and researchers who need to conduct cost-effectiveness studies when no generic preference-based health-related quality of life instrument is available to derive utilities.

Keywords: EQ-5D, mapping, osteoarthritis, utility index, WOMAC.

VALUE HEALTH. 2019; -(-):-–-

Introduction

Osteoarthritis (OA) is a chronic disease and a major, frequent cause of pain and disability, particularly in aging populations,1,2 and it represents a serious public health problem.3,4 The disease commonly affects the hip and knee joints. In elderly people, the considerable associated functional limitation and social isolation can impair health-related quality of life (HRQoL).2,5,6 The main symptoms are joint pain and functional limitation. For this reason, there is growing interest in measuring HRQoL7, alongside clinical parameters, as an important indicator of the effects of OA and its treatment on patients' lives.

Moreover, OA consumes a great amount of social and health resources, representing a significant economic burden for patients, healthcare providers, and society in general.8-11 To make efficient use of limited healthcare resources, cost-effectiveness evidence has become more important than ever for decision makers at various levels.12 In cost-effectiveness studies, one of the most commonly recommended outcome measures is the quality-adjusted life year (QALY).13,14 The QALY combines both quality and quantity of life and allows broad comparisons between different treatment strategies, patient populations, and clinical settings. Two parameters are needed to calculate QALYs: the utility of a given health state, as a measure of quality of life, and the amount of time spent in that state, as a measure of length of life. Utility scores represent the strength of a person's preferences for health states and range from 0 (dead) to 1 (full health). Negative values are also possible for health states deemed worse than death.

Utilities are normally measured with generic HRQoL questionnaires, and the EQ-5D is one of the most widely used.15,16 It is a preference-based tool for describing and valuing health17 based on a simple, brief, self-administered questionnaire, which has been adapted to and validated in a large number of populations and contexts (www.euroqol.org). The original EQ-5D, called the EQ-5D-3L, consists of 5 items covering 5 dimensions, each with 3 response categories.18 A preference-based scoring function is used to convert the descriptive system into a summary utility score. More recently, the EuroQol group developed the EQ-5D-5L,19 which offers 5 response options for each dimension to overcome some limitations of the EQ-5D-3L.20-26 Nevertheless, in clinical practice, such generic instruments are used much less often than disease-specific HRQoL questionnaires. These address more specific aspects of the disease, have greater discriminatory power, and are more sensitive to change.27,28 Among the available disease-specific questionnaires, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is one of the most widely used instruments for evaluating symptoms and function in patients with hip or knee OA.29-31 Unfortunately, most of these disease-specific HRQoL questionnaires are not preference based and cannot be used to estimate utilities for cost-utility studies.

* Address correspondence to: Amaia Bilbao, Unidad de Investigación, Hospital Universitario Basurto, Avda. Montevideo, 18, 48013 Bilbao, Bizkaia, Spain. Email: [email protected] 1098-3015/$36.00 - see front matter Copyright © 2019, ISPOR–The Professional Society for Health Economics and Outcomes Research. Published by Elsevier Inc. https://doi.org/10.1016/j.jval.2019.09.2755

VALUE IN HEALTH, 2019
To overcome this limitation, it has been suggested that algorithms be used to map disease-specific measures onto generic preference-based instruments, so that utilities can be estimated from the disease-specific questionnaires.32,33 Three algorithms have already been developed to map the WOMAC onto the EQ-5D.34-37 All of them, however, used the original EQ-5D-3L rather than the EQ-5D-5L. Therefore, the objective of this study was to develop functions mapping the WOMAC onto the EQ-5D-5L in patients with hip or knee OA, using different statistical strategies, and to compare and validate the derived functions.
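The QALY arithmetic described in the Introduction multiplies the utility of a health state by the time spent in it. It can be sketched in a few lines of Python. This is an illustrative sketch only. The function names are hypothetical, and the trapezoidal (area-under-the-curve) interpolation between 2 measurement points is a common convention assumed here, not a method described in this study. The example reuses the baseline and 6-month mean utilities reported in Table 1 purely as numbers, not as one patient's trajectory.

```python
def qalys_constant(utility: float, years: float) -> float:
    """QALYs for time spent in a single health state: utility x duration."""
    return utility * years


def qalys_trapezoid(u_start: float, u_end: float, years: float) -> float:
    """QALYs between two utility measurements, assuming (hypothetically)
    that utility changes linearly between them (area under the curve)."""
    return (u_start + u_end) / 2.0 * years


# Illustrative numbers: mean EQ-5D-5L utilities at baseline (0.5328) and
# at 6 months (0.5979), taken from Table 1, over a 0.5-year interval.
print(qalys_constant(0.5328, 0.5))                     # utility held at baseline
print(round(qalys_trapezoid(0.5328, 0.5979, 0.5), 4))  # linear change assumed
```

The trapezoidal variant is shown because utilities in longitudinal studies are usually observed only at discrete time points. Other interpolation rules would give different QALY totals.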

Methods

Study Population

We included patients recruited from 6 hospitals and 21 primary care centers of the National Health Service in 3 regions of Spain (Basque Country, Canary Islands, and Madrid), to which the researchers of the Health Service Research Network on Chronic Diseases (REDISSEC) belong. Consecutive patients with a diagnosis of hip or knee OA who were seen in hospital traumatology or rheumatology departments or in primary care during the study period were invited to participate. We excluded patients with malignant or other organic diseases or psychiatric disorders that hindered participation, as well as those who could neither read nor understand Spanish. Patients were recruited between November 2014 and December 2015. The study was approved by the review board of each institution.

Measurements

All patients were given a letter informing them about the study and asking for their voluntary participation. Those who agreed to participate were given the EQ-5D-5L19 and WOMAC30 questionnaires, along with some questions on sociodemographic characteristics. Six months later, the same questionnaires were mailed to patients at home for completion and return by mail. Patients who had not replied after 15 days received a reminder telephone call. Those who still had not responded 15 days later were sent the questionnaire again. In addition, trained personnel retrieved general demographic and clinical data from the patients' medical records at baseline and at follow-up.

The EQ-5D-5L19 was developed concurrently in different languages, including Spanish. Its 5 questions on the respondent's state of health cover mobility, self-care, usual activities, pain or discomfort, and anxiety or depression. Each is rated on a 5-point scale from 1 (no problems) to 5 (unable to perform/extreme problems). The respondent's health status is expressed as a 5-digit profile based on the answers to the 5 questions. The preference-based scoring function for the Spanish population is used to derive a utility index. This is a weighted score ranging from −0.4162 to 1, with higher scores indicating better HRQoL.38 The EQ-5D-5L has shown good psychometric properties in patients with hip or knee OA.21,39,40

The WOMAC is a health status instrument designed specifically for patients with hip or knee OA.30 It is a multidimensional scale comprising 24 items grouped into 3 dimensions: pain (5 items), stiffness (2 items), and physical function (17 items). We used the Likert version of the WOMAC. Each item has 5 response levels representing different degrees of intensity, from 0 (none) to 4 (extreme). The scores for each dimension and the total score range from 0 to 100, with higher scores indicating poorer health status. The WOMAC has been translated into Spanish and validated in Spain.27,28,41
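The scoring rules above (WOMAC Likert items scored 0 to 4, reported on a 0-100 scale; EQ-5D-5L answers combined into a 5-digit profile) can be sketched as follows. This sketch rests on several assumptions. It assumes that each WOMAC score is the raw item sum linearly rescaled to 0-100, which is consistent with the stated range. It also assumes complete responses: the official WOMAC user guide's handling of missing items is not reproduced. All function names are hypothetical. Converting an EQ-5D-5L profile into a utility requires the Spanish value set (reference 38), whose coefficients are not reproduced here.

```python
from typing import Sequence

# Items per subscale in the WOMAC (Likert version): 24 items in total.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM_SCORE = 4  # each item is scored 0 (none) to 4 (extreme)


def womac_0_100(item_scores: Sequence[int], n_items: int) -> float:
    """Rescale the raw sum of complete Likert item scores to 0-100
    (0 = best, 100 = worst). Assumes no missing items."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(item_scores)}")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("Likert item scores must be integers from 0 to 4")
    return 100.0 * sum(item_scores) / (n_items * MAX_ITEM_SCORE)


def womac_subscale(item_scores: Sequence[int], subscale: str) -> float:
    """Score one subscale ('pain', 'stiffness', or 'function') on 0-100."""
    return womac_0_100(item_scores, WOMAC_ITEMS[subscale])


def womac_total(pain: Sequence[int], stiffness: Sequence[int],
                function: Sequence[int]) -> float:
    """Total score over all 24 items, rescaled the same way."""
    items = list(pain) + list(stiffness) + list(function)
    return womac_0_100(items, sum(WOMAC_ITEMS.values()))


def eq5d5l_profile(levels: Sequence[int]) -> str:
    """5-digit EQ-5D-5L profile in the order mobility, self-care, usual
    activities, pain/discomfort, anxiety/depression (levels 1-5)."""
    if len(levels) != 5 or any(lvl not in (1, 2, 3, 4, 5) for lvl in levels):
        raise ValueError("need 5 levels, each from 1 to 5")
    return "".join(str(lvl) for lvl in levels)


print(womac_subscale([2, 2, 3, 1, 2], "pain"))  # raw sum 10 of a possible 20
print(eq5d5l_profile([2, 1, 3, 3, 1]))
```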

Statistical Analysis

To describe the sample, we used means and standard deviations (SDs) for quantitative variables and frequencies and percentages for categorical variables. The distribution of the EQ-5D-5L index was examined with a histogram, and floor and ceiling effects were assessed. Baseline data (derivation sample) were used to develop the mapping functions, and the 6-month follow-up data (validation sample) were used to validate them. We explored the relationship between the EQ-5D-5L index and the WOMAC subscales using Spearman's correlation coefficient.
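The descriptive checks above can be sketched in plain Python: floor and ceiling effects as the percentage of respondents at the lowest (−0.4162) and highest (1) possible index, and Spearman's correlation as the Pearson correlation of average ranks. This is a minimal, dependency-free sketch. The function names are hypothetical. Excluding missing values (None) from the denominator follows the note to Table 1 ("Percentages exclude patients with missing data"). The toy data are invented.

```python
import math
from typing import Optional, Sequence

EQ5D5L_MIN = -0.4162  # worst state in the Spanish EQ-5D-5L value set (per the text)
EQ5D5L_MAX = 1.0      # full health


def floor_ceiling_pct(utilities: Sequence[Optional[float]], tol: float = 1e-9):
    """Percentages of respondents at the lowest and highest possible index,
    excluding missing values (None) from the denominator."""
    vals = [u for u in utilities if u is not None]
    n = len(vals)
    floor = sum(1 for u in vals if abs(u - EQ5D5L_MIN) <= tol)
    ceiling = sum(1 for u in vals if abs(u - EQ5D5L_MAX) <= tol)
    return 100.0 * floor / n, 100.0 * ceiling / n


def average_ranks(x: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: higher WOMAC (worse health) with lower utility gives a negative rho.
print(floor_ceiling_pct([-0.4162, 1.0, 0.5, 1.0, None]))
print(spearman([10, 20, 30, 40], [0.9, 0.7, 0.4, 0.1]))
```

Ranking with averaged ties matches the usual definition of Spearman's coefficient when ties occur. Ties are frequent with discrete questionnaire scores.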

Derivation of the mapping functions

We used 3 different statistical approaches: general linear models (GLMs), Tobit models, and beta models.

1. A GLM assumes that the dependent variable is continuous and that the residuals are normally distributed. Moreover, HRQoL outcomes such as the EQ-5D-5L index are restricted to an interval, so a GLM may not always be appropriate.42
2. To overcome this issue, we constructed Tobit models.43 These are based on linear models but treat the response variable as censored at the limits of the range in which it is defined. The dependent variable (EQ-5D-5L index) is censored at a lower bound of −0.4162 and an upper bound of 1, which makes this method appropriate.44,45 As in GLMs, the residuals must be normally distributed.
3. To avoid this assumption, we built beta regression models.46 These use the logit function as a link and can model outcomes with skewed distributions. The response variable must lie in the open unit interval (0, 1), so we transformed the EQ-5D-5L index in 2 steps. First, we applied a linear transformation to the closed interval [0, 1], $Y^{*} = (Y - a)/(b - a)$, where $Y$ is the observed dependent variable, $a$ is the smallest possible score ($a = -0.4162$), and $b$ is the highest ($b = 1$). Second, we moved the boundary points to slightly higher or lower values with $Y^{**} = [Y^{*}(N - 1) + 0.5]/N$, where $Y^{*}$ is the observed dependent variable transformed to [0, 1] and $N$ is the number of individuals.46

All 3 types of models assume that the relationship of each covariate with the response variable is linear. To address this, before developing the mapping functions, we used generalized additive models (GAMs)47,48 to identify the powers to which the WOMAC subscales should be raised. We then raised the WOMAC subscales to the estimated powers and included them in the models as covariates. The same strategy was used in all 3 statistical approaches. Four models were developed, starting with the most parsimonious, with the following predictor variables:

- Model 1: WOMAC subscales.
- Model 2: WOMAC subscales with P < .10.
- Model 3: WOMAC subscales plus interactions plus powers derived from GAMs.
- Model 4: WOMAC subscales plus interactions plus powers with P < .10.

To aid interpretation, all candidate WOMAC covariates were linearly transformed to the interval [0, 100], matching the range of the original WOMAC subscales. Each model was fitted as a complete case analysis. To select the preferred model per approach, we considered the following goodness-of-fit measures49: the Akaike (AIC) and Bayesian (BIC) information criteria and, for the GLM and beta models, the adjusted R². We also compared the 4 models on the following measures of predictive performance50: (1) the mean absolute error (MAE), the mean of the absolute differences between observed and predicted utilities; and (2) the root mean squared error (RMSE), the square root of the mean of the squared differences between observed and predicted utilities.
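The 2-step transformation of the EQ-5D-5L index for beta regression, and one possible way to invert it, can be sketched as follows. The forward steps follow the 2 formulas in the text (a linear map to [0, 1], then compression of the boundary points into (0, 1)). The inverse function is an assumption added for illustration: the text does not say how beta-model predictions were mapped back to the utility scale. The inverse needs the same N as the derivation sample. Function names are hypothetical.

```python
A = -0.4162  # smallest possible EQ-5D-5L index (Spanish value set)
B = 1.0      # largest possible index


def to_closed_unit(y: float, a: float = A, b: float = B) -> float:
    """Step 1: Y* = (Y - a) / (b - a), mapping [a, b] onto [0, 1]."""
    return (y - a) / (b - a)


def squeeze(y_star: float, n: int) -> float:
    """Step 2: Y** = [Y*(N - 1) + 0.5] / N, pulling 0 and 1 into (0, 1)."""
    return (y_star * (n - 1) + 0.5) / n


def beta_response(utilities, a: float = A, b: float = B) -> list:
    """Transform observed utilities into the open interval required by beta regression."""
    n = len(utilities)
    return [squeeze(to_closed_unit(u, a, b), n) for u in utilities]


def to_utility_scale(mu: float, n: int, a: float = A, b: float = B) -> float:
    """Assumed back-transformation: undo step 2, then step 1, to express a
    value on the (0, 1) beta scale as an EQ-5D-5L utility."""
    y_star = (mu * n - 0.5) / (n - 1)
    return a + y_star * (b - a)


utilities = [-0.4162, 0.5328, 1.0]
transformed = beta_response(utilities)
print([round(v, 4) for v in transformed])
print([round(to_utility_scale(v, len(utilities)), 4) for v in transformed])
```

With a derivation sample of N = 748, the compression moves the boundary values by only about 0.0007 on the [0, 1] scale. The beta response therefore stays very close to the linearly rescaled index.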
We used bootstrap resampling with replacement51 to evaluate the performance of the predictive models. Specifically, 1000 bootstrap samples were generated, and in each replication the MAE and RMSE were calculated for each predictive model. The 95% probability intervals for each measure were calculated as the 2.5th and 97.5th percentiles of the distribution of the resulting bootstrap values. A lower AIC, BIC, MAE, and RMSE, and a higher R², indicate a better fit. The residuals of the GLM and Tobit models were examined graphically to assess whether they followed a normal distribution. Model coefficients were reported with their standard errors (SEs). For the final preferred models, we also tested whether adding demographic variables (age and sex) and affected joint improved the fit. We also simulated data from the preferred models and plotted the cumulative distribution functions to compare the simulated with the observed data across the spectrum of disease severity.49,52 Regarding sample size, a minimum of 400 patients was needed for the GLM to obtain excellent predictive power (R² ≥ 0.5) with about 10 predictive variables.53 The beta models required a smaller sample size.54
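The MAE, the RMSE, and their bootstrap percentile intervals can be sketched as follows. The text does not say whether the models were refitted in each bootstrap replication. This sketch therefore assumes the simpler scheme of resampling fixed (observed, predicted) pairs with replacement. The percentile uses linear interpolation between order statistics, which is one of several common conventions. Function names, the seed, and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute difference between observed and predicted utilities."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Square root of the mean squared difference."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile (q in [0, 1]) with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_pi(obs, pred, metric: Metric, n_boot: int = 1000, seed: int = 1):
    """2.5th and 97.5th percentiles of a metric over n_boot resamples of
    (observed, predicted) pairs drawn with replacement (no model refitting)."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    return percentile(stats, 0.025), percentile(stats, 0.975)


# Hypothetical observed and predicted utilities.
obs = [0.21, 0.65, 0.80, 0.43, -0.10, 0.91, 0.55, 0.70]
pred = [0.30, 0.60, 0.75, 0.50, 0.15, 0.85, 0.52, 0.66]
print(round(mae(obs, pred), 4), bootstrap_pi(obs, pred, mae))
print(round(rmse(obs, pred), 4), bootstrap_pi(obs, pred, rmse))
```

If the models were instead refitted in each replication, the same percentile step would apply. Only the inner loop would change.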

Validation of the mapping functions

To validate the preferred mapping functions in the follow-up sample, we estimated the MAE, the RMSE, and the intraclass correlation coefficient (ICC), a measure of agreement between an individual's predicted and observed utility scores.55 To assess the performance of the mapping functions across the range of the EQ-5D-5L index, we plotted the observed utility scores against the prediction errors (predicted score minus observed score) from the preferred predictive models. Predictive performance was also analyzed in several subgroups:

- by interval of the EQ-5D-5L index;
- by interval of the WOMAC total score;
- for patients who underwent hip or knee replacement between baseline and follow-up (surgical patients) and those who did not (nonsurgical patients);
- by recruitment place.

We also studied the predictive precision of the preferred models at the individual level.37 The absolute difference between the predicted and observed utility scores was estimated and classified into ranges. The percentage of patients falling into each range was then compared across the preferred models from the different statistical approaches.

Finally, observed and predicted mean EQ-5D-5L changes were calculated by subtracting the baseline score from the follow-up score. Effect sizes (ESs), defined as the mean change score divided by the SD of the change score,56 were calculated, and Cohen's benchmarks were used to classify their magnitude.57 This analysis was performed separately for surgical and nonsurgical patients. The statistical analyses were performed with R software version 3.4.1 (R Development Core Team, 2017) and SAS 9.2 for Windows (SAS Institute, Cary, NC).
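Two of the validation statistics above can be sketched in plain Python: the ICC between observed and predicted utilities, and the ES of change with Cohen's benchmarks. The text does not state which ICC form was used. This sketch assumes the two-way, absolute-agreement, single-measure form (McGraw and Wong's ICC(A,1)), which treats the observed and predicted scores as 2 "raters" and penalizes systematic bias. The ES uses the sample SD (n − 1 denominator) of the change scores. Labelling values below 0.2 as "negligible" is an assumed convention. Function names and toy data are hypothetical.

```python
import statistics
from typing import Sequence


def icc_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Two-way, absolute-agreement, single-measure ICC, i.e. ICC(A,1),
    with observed and predicted scores as the k = 2 'raters'."""
    n, k = len(obs), 2
    grand = (sum(obs) + sum(pred)) / (n * k)
    row_means = [(o + p) / k for o, p in zip(obs, pred)]
    col_means = [sum(obs) / n, sum(pred) / n]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, pair in enumerate(zip(obs, pred))
        for j, x in enumerate(pair)
    )
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES = mean change / SD of change, where change = follow-up minus baseline."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(change) / statistics.stdev(change)


def cohen_label(es: float) -> str:
    """Cohen's conventional benchmarks (0.2, 0.5, 0.8) applied to |ES|."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


obs = [0.1, 0.5, 0.9]
print(icc_agreement(obs, [o + 0.1 for o in obs]))  # a constant bias lowers ICC(A,1)
es = effect_size([0.2, 0.3, 0.4], [0.5, 0.5, 0.9])
print(round(es, 2), cohen_label(es))
```

An absolute-agreement ICC is shown because a mapping function that is shifted by a constant would still correlate perfectly with the observed scores, yet would give biased utilities.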

Results

During the recruitment period, 758 patients met the selection criteria and agreed to participate. Of these, 644 (84.96%) completed the questionnaires at 6 months. Table 1 summarizes the sociodemographic and clinical data. At baseline, the mean age was 69.78 years (SD = 10.57), 61.87% were women, and 47.63% had hip OA. The distribution of the characteristics was similar at 6 months, and 130 (20.19%) patients had undergone hip or knee replacement surgery by the time of follow-up. Table 1 also reports descriptive statistics of the WOMAC and EQ-5D-5L scores. The WOMAC subscales ranged from 0 to 100 at both baseline and follow-up. The mean EQ-5D-5L index was 0.5328 (SD = 0.2874) at baseline and 0.5979 (SD = 0.2900) at follow-up. Floor and ceiling effects in the EQ-5D-5L were minimal (range, 0% to 4.41%), as were missing data (1.06% at baseline and 1.40% at follow-up). Figure 1 shows the distribution of the EQ-5D-5L index. At baseline, Spearman's correlation coefficients between EQ-5D-5L utilities and the WOMAC pain, function, and stiffness subscales were −0.69, −0.78, and −0.58, respectively. The correlations were even stronger at 6 months (−0.76, −0.85, and −0.70, respectively).

Table 1. Characteristics of the study participants at baseline and at 6 months of follow-up.

Parameter | At baseline (n = 758) | At 6 months (n = 644)
Age (years), mean (SD) | 69.78 (10.57) | 70.29 (10.38)
Sex, women, n (%) | 469 (61.87) | 382 (59.32)
Body mass index (kg/m²), n (%) | |
  <25 | 153 (20.56) | 129 (20.44)
  25-30 | 312 (41.94) | 265 (42)
  ≥30 | 279 (37.50) | 237 (37.56)
Charlson comorbidity index, n (%) | |
  0 | 423 (55.95) | 365 (56.85)
  1 | 179 (23.68) | 149 (23.21)
  >1 | 154 (20.37) | 128 (19.94)
Affected joint, n (%) | |
  Hip | 361 (47.63) | 313 (48.60)
  Knee | 397 (52.37) | 331 (51.40)
Recruitment place, n (%) | |
  Primary care | 295 (38.92) | 264 (40.99)
  Hospital, Traumatology Service | 392 (51.71) | 333 (51.71)
  Hospital, Rheumatology Service | 71 (9.37) | 47 (7.30)
WOMAC subscales, mean (SD) | |
  Pain | 46.43 (21.60) | 39.54 (24.34)
  Function | 51.72 (21.97) | 44.44 (24.59)
  Stiffness | 47.50 (25.71) | 40.66 (26.57)
EQ-5D-5L utility index | |
  Mean (SD) | 0.5328 (0.2874) | 0.5979 (0.2900)
  Range | −0.4162 to 1 | −0.3982 to 1
  Floor effect, n (%) | 2 (0.27) | 0 (0)
  Ceiling effect, n (%) | 19 (2.53) | 28 (4.41)

Note. Percentages exclude patients with missing data. The scores for the WOMAC subscales range from 0 to 100, with higher scores indicating worse health status. The scores for the EQ-5D-5L utility index range from −0.4162 to 1, with higher scores indicating better health status. SD indicates standard deviation.

Derivation of the Mapping Functions

The GAMs estimated a cubic (third-power) relationship between the EQ-5D-5L utilities and the WOMAC scores. The exception was stiffness in the beta models, for which the relationship was linear.

Of the 4 predictive models derived from the GLM (Table 2), Model 4 showed the best goodness of fit. It had the lowest AIC and BIC, the highest adjusted R², and a range of predicted values somewhat closer to the observed range. Models 3 and 4 had the best predictive accuracy, with the lowest MAE and RMSE. Consequently, Model 4, including Pain² + Pain³ + Function + Pain × Function, was selected as the preferred GLM. The residuals of Model 4 followed a normal distribution (see Appendix Figure 1S in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005).

In the Tobit regression (Table 2), the best predictive accuracy and goodness of fit were also found in Model 4, which included Pain³ + Function. However, the Tobit model had slightly higher MAE and RMSE than the GLM, and the preferred GLM (Model 4) did not predict values outside the range of the EQ-5D-5L index. For these reasons, the Tobit model was not considered a preferred model.

In the beta regression, Models 3 and 4 had similar MAE and RMSE, but Model 4 yielded the lowest AIC and BIC and the highest adjusted R². Therefore, this model, including Pain³ + Function + Function² + Function³, was selected as the preferred beta model (Table 2).

When age, sex, and affected joint were added to the preferred functions (Model 5), only sex yielded P < .10 (Table 2). Model 4 and Model 5 had similar goodness of fit and predictive accuracy: Model 5 had a slightly higher adjusted R² and lower AIC but a higher BIC. We therefore concluded that adding sex did not improve Model 4, and Model 5 was rejected. Thus, Model 4 of the GLM and Model 4 of the beta regression were selected as the preferred mapping functions. Appendix Figure 2S (in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005) shows that data simulated from the preferred models fit the observed data closely for both the GLM and beta models. Table 3 shows the parameter estimates for the preferred models. Appendix Tables 1S and 2S (in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005) show, respectively, the variance-covariance matrix and the equations for both prediction models to estimate the EQ-5D-5L utilities. Appendix Table 3S in Supplemental Materials gives the R syntax for applying the mapping functions.
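The covariates of the 2 preferred models can be built from the WOMAC pain and function scores with the rescaling indicated by the Table 3 labels (Pain²/100, Pain³/10,000, Pain × Function/100, and so on). Each transformed covariate then again spans 0 to 100. This sketch reproduces only that covariate construction and a generic linear predictor. No coefficients are hard-coded: they should be taken from Table 3 or Appendix Table 2S. The dictionary keys and function names are hypothetical. For the beta model, the linear predictor is on the logit scale. It must be passed through the inverse logit and then mapped back from the (0, 1) scale to the utility scale.

```python
import math


def glm_model4_covariates(pain: float, function: float) -> dict:
    """Covariates of the preferred GLM (Pain^2 + Pain^3 + Function +
    Pain x Function), each scaled to 0-100 as in the Table 3 labels."""
    return {
        "pain2_100": pain ** 2 / 100,
        "pain3_10000": pain ** 3 / 10_000,
        "function": function,
        "pain_x_function_100": pain * function / 100,
    }


def beta_model4_covariates(pain: float, function: float) -> dict:
    """Covariates of the preferred beta model (Pain^3 + Function +
    Function^2 + Function^3), scaled the same way."""
    return {
        "pain3_10000": pain ** 3 / 10_000,
        "function": function,
        "function2_100": function ** 2 / 100,
        "function3_10000": function ** 3 / 10_000,
    }


def linear_predictor(intercept: float, coefs: dict, covariates: dict) -> float:
    """intercept + sum(coefficient x covariate). Coefficients must be supplied
    by the user from the published estimates; none are assumed here."""
    return intercept + sum(coefs[name] * value for name, value in covariates.items())


def inverse_logit(eta: float) -> float:
    """Mean on the (0, 1) scale from a beta-model linear predictor (logit link)."""
    return 1.0 / (1.0 + math.exp(-eta))


print(glm_model4_covariates(50, 50))
print(beta_model4_covariates(100, 100))
```

Rescaling the squared and cubed terms keeps all coefficients on comparable magnitudes, which is the stated reason for the transformation ("to help interpret the models").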

Validation of the Mapping Functions

Table 3 shows the predictive performance of the preferred models in the validation sample. The predicted mean scores were similar for both models and close to the observed mean scores, and both models underestimated the observed variance. The lower limit of the predicted range was closer to the observed minimum with the beta model. The fit indices were very similar: MAE and RMSE were slightly better with the GLM, whereas the ICC was slightly better with the beta model.

Figure 2 and Appendix Table 4S (in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005) show how the prediction errors of both models vary with the observed EQ-5D-5L utility. Both models overpredicted very severe health states. That is, both functions adequately fit health states between 0 and 1, but not those worse than death (negative utility values). Appendix Tables 5S and 6S (in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005) show the predictive accuracy of both models by WOMAC total score, by surgical status, and by recruitment place. The results were very similar for both models, with worse predictive accuracy for higher WOMAC values and for nonsurgical patients.

Table 4 presents the number and percentage of patients for whom the absolute difference between predicted and observed EQ-5D-5L utilities fell into various ranges. The distribution was similar for both models (P = .6757). Appendix Table 7S (in Supplemental Materials found at https://doi.org/10.1016/j.vhri.2019.09.005) shows the observed and predicted EQ-5D-5L change scores. Among surgical patients, both observed and predicted changes were large (ES > 0.8), although the predicted ESs were larger than the observed ones. In any case, the results from the GLM and beta regression were similar.

Discussion

This prospective study, based on a large cohort of patients seen in different hospitals and primary care centers, provides 2 alternative algorithms with similar performance for mapping the disease-specific WOMAC onto the generic preference-based EQ-5D-5L. To our knowledge, this is the first report of an algorithm mapping the WOMAC onto the more recently developed EQ-5D-5L19,38 in patients with hip or knee OA. Moreover, the mapping functions showed acceptable fit. They therefore provide a way to estimate utilities for cost-effectiveness studies when generic preference-based HRQoL questionnaires are not available.

In patients with hip or knee OA, numerous studies have linked the EQ-5D with disease-specific instruments,34-37,58-61 but all of them used the original EQ-5D-3L, not the EQ-5D-5L. Despite its strong assumptions, the GLM is widely used to derive mapping functions,34,37,60 probably owing to its simplicity and ease of interpretation, even though the response variable may not be normally distributed. The GLM is not suited to response variables with such distributions, and the resulting bias has been demonstrated in OA patients.34,62 Other studies have proposed alternative modeling options given the characteristics of the EQ-5D index.36,58,59,61 A strength of our study is that, in addition to GLMs, we used 2 other statistical methods: Tobit models, because the EQ-5D index is restricted to an interval, and beta models, because EQ-5D scores usually have a skewed distribution.35,36,58-61

Stiffness was dropped from the final models because it had no significant effect on the EQ-5D-5L score. This is also reflected in the weaker correlation between the EQ-5D-5L index and this subscale. The finding agrees with the results of Wailoo et al36 and Barton et al,34 who concluded that the stiffness subscale was relatively unimportant in determining EQ-5D utilities. The stiffness domain is largely redundant and is often excluded when the WOMAC is used.63 Both the WOMAC pain and function subscales appeared in all preferred models, although in different forms.

Another strength of our study was the use of GAMs to identify the powers to which the WOMAC subscales should be raised, because all 3 statistical methods assume a linear relationship between the response and the explanatory variables. The GAMs indicated that the WOMAC subscales should be raised to the third power. In contrast, other existing algorithms have considered only second powers, without any prior analysis.34,36,37

Previous studies have handled sociodemographic variables differently. Xie et al37 did not consider age or sex. Barton et al34 included sex, age, and age² and concluded that the model improved, although neither sex nor age² was significant. Wailoo et al36 also considered age and sex, and both were significant. In our case, including sociodemographic variables did not improve model performance. The preferred models were validated on data collected at a different time point (6-month follow-up) from the data used to derive them, although both came from the same cohort of patients.

Figure 1. Distribution of the EQ-5D-5L utility index at baseline. (Histogram; x-axis: EQ-5D-5L index, −0.4 to 1.0; y-axis: percent, 0 to 20.0.)
The small differences in performance between the GLM and beta models may be because the EQ-5D-5L19 overcomes limitations of the EQ-5D-3L, such as the ceiling effect and bimodal distribution (Figure 1), at least in patients with hip or knee OA.39 These advantages of the EQ-5D-5L may explain why, unlike in other studies,36 the Tobit and beta models did not outperform the GLM. We therefore conclude that the GLM is not as inappropriate as might have been expected. Compared with the other 3 existing algorithms,34-37 both the GLM and beta model obtained much higher adjusted R² values (GLM, 0.6190; beta, 0.6136) than those of Barton et al34 and Xie et al37 (0.313 and 0.449, respectively). Nevertheless, the results are not completely comparable, because the other studies used the EQ-5D-3L.

Both preferred models overpredicted utilities for very severe health states (negative utility values). This phenomenon has been reported by other authors,34-36,62 although the overprediction we found was less marked than with the algorithms derived by Barton et al34 or in the validation by Kiadaliri et al35. Both of those studies found marked overestimation for values lower than 0.5. This general problem in mapping may be due to regression to the mean or to the conceptual difference between the WOMAC and the EQ-5D-5L: whereas the former focuses on physical function, the latter also covers mental aspects such as anxiety and depression. In our case, the overestimation was less marked and limited to health states worse than death. This might be because our sample included a broad spectrum of patients covering all levels of severity, as reflected in the distribution of utility scores. Nonetheless, predicted values from both models for very severe health states should be applied with caution. The implications of using these predicted utilities in a cost-effectiveness study also need further study.

The predicted EQ-5D-5L change scores derived from the preferred GLM and beta models were very similar. The mapping-derived mean change scores were similar to the observed ones in nonsurgical patients and slightly lower in surgical patients. Nevertheless, the ESs from mapping-derived scores were higher than those from observed scores, because both preferred mapping functions underestimated the observed variance.

This study has some limitations.
In particular, the sample studied may not be representative of the Spanish population with hip or knee OA. Nevertheless, patients included covered a wide range of disease severity and were recruited in different


Table 2. Fit measures for the different models used to predict the EQ-5D-5L utility index from WOMAC scores with different methodologies in the derivation sample.

Models without demographics: Model 1, domain scores; Model 2, domain scores with P < .10; Model 3, domains plus interactions and powers; Model 4, domains plus interactions and powers with P < .10. Model with demographics: Model 5, Model 4 plus demographics with P < .10.

Method / measure | Model 1 | Model 2 | Model 3 | Model 4 | Model 5
GLM | | | | |
Variables | P + F + S | P + F | P + P² + P³ + F + F² + F³ + S + S² + S³ + P×F + P×S + F×S | P² + P³ + F + P×F | P² + P³ + F + P×F + Sex*
n | 747 | 748 | 747 | 748 | 748
Predicted index, mean (SD) | 0.5327 (0.2237) | 0.5328 (0.2235) | 0.5327 (0.2271) | 0.5328 (0.2267) | 0.5328 (0.2269)
Predicted index, range | 0.0037 to 1.0671 | 0.0105 to 1.0630 | −0.1871 to 0.9832 | −0.1756 to 0.9516 | −0.1654 to 0.9650
AIC | −424.04 | −426.33 | −442.14 | −456.05 | −457.05
BIC | −400.96 | −407.86 | −377.51 | −428.34 | −424.73
Adjusted R² | 0.6026 | 0.6025 | 0.6167 | 0.6190 | 0.6200
MAE (95% PI) | 0.140 (0.132–0.148) | 0.140 (0.132–0.148) | 0.134 (0.126–0.142) | 0.134 (0.126–0.142) | 0.134 (0.126–0.142)
RMSE (95% PI) | 0.181 (0.171–0.191) | 0.181 (0.170–0.191) | 0.177 (0.166–0.187) | 0.177 (0.166–0.187) | 0.177 (0.166–0.187)
Tobit | | | | |
Variables | P + F + S | P + F | P + P² + P³ + F + F² + F³ + S + S² + S³ + P×F + P×S + F×S | P³ + F | P³ + F + Sex†
n | 747 | 748 | 747 | 748 | 748
Predicted index, mean (SD) | 0.5351 (0.2289) | 0.5352 (0.2286) | 0.5354 (0.2323) | 0.5357 (0.2319) | 0.5357 (0.2322)
Predicted index, range | −0.0114 to 1 | −0.0034 to 1 | −0.2087 to 1 | −0.1401 to 1 | −0.1346 to 1
AIC | −352.81 | −354.79 | −362.90 | −376.28 | −377.72
BIC | −329.73 | −336.32 | −298.27 | −357.81 | −354.63
MAE (95% PI) | 0.139 (0.131–0.147) | 0.139 (0.131–0.148) | 0.134 (0.127–0.142) | 0.135 (0.126–0.143) | 0.134 (0.126–0.143)
RMSE (95% PI) | 0.181 (0.170–0.191) | 0.181 (0.170–0.191) | 0.177 (0.167–0.187) | 0.178 (0.167–0.188) | 0.177 (0.166–0.187)
Beta | | | | |
Variables | P + F + S | P + F | P + P² + P³ + F + F² + F³ + S + P×F + P×S + F×S | P³ + F + F² + F³ | P³ + F + F² + F³ + Sex‖
n | 747 | 748 | 747 | 748 | 748
Predicted index, mean (SD) | 0.5251 (0.2371) | 0.5253 (0.2371) | 0.5248 (0.2330) | 0.5249 (0.2325) | 0.5249 (0.2330)
Predicted index, range | −0.0799 to 0.9171 | −0.0737 to 0.9161 | −0.2246 to 0.9674 | −0.2113 to 0.9667 | −0.2080 to 0.9688
AIC | −1069.62 | −1072.60 | −1101.41 | −1108.79 | −1111.68
BIC | −1046.54 | −1054.13 | −1046.02 | −1081.09 | −1079.36
Adjusted R² | 0.6089 | 0.6088 | 0.6126 | 0.6136 | 0.6156
MAE (95% PI) | 0.136 (0.128–0.144) | 0.136 (0.128–0.145) | 0.135 (0.128–0.143) | 0.135 (0.127–0.144) | 0.135 (0.127–0.143)
RMSE (95% PI) | 0.180 (0.169–0.191) | 0.180 (0.168–0.191) | 0.178 (0.167–0.189) | 0.178 (0.167–0.189) | 0.178 (0.167–0.188)

AIC indicates Akaike information criterion; BIC, Bayesian information criterion; F, function; GLM, general linear model; MAE, mean absolute error; P, pain; PI, probability interval; RMSE, root mean square error; S, stiffness; SD, standard deviation.
*Adding sex to Model 4 in the GLM, the P value for sex was .0845, but the P value for P² was .1443.
†Adding sex to Model 4 in the Tobit model, the P value for sex was .0640.
‖Adding sex to Model 4 in the beta model, the P value for sex was .0271.

--

7

Table 3. Parameter estimates for the preferred models used to predict the EQ-5D-5L utility index based on WOMAC scores in the derivation sample and the predictive accuracy of these preferred models in the validation sample. GLM

Beta*

Model 4

Model 4

Derivation sample (n = 748)† Parameters, b (SE) Intercept Pain2/100 Pain3/10,000 Function Function2/100 Function3/10,000 Pain$Function/100

0.9516 (0.0279)‡ 0.0034 (0.0021)§ 20.0044 (0.0020)k 20.0062 (0.0012)‡

3.7265 (0.2076)‡ 20.0091 20.1145 0.1633 20.0948

20.0042 (0.0024)§

Validation sample (n = 633)

(0.0018)‡ (0.0138)‡ (0.0285)‡ (0.0181)‡



Predicted index Mean (SD) Range

0.5981 (0.2445) 20.1756 to 0.9663

Predictive accuracy MAE (95% PI) RMSE (95% PI) ICC (95% PI)

0.117 (0.108 –0.125) 0.158 (0.147 – 0.170) 0.826 (0.800 – 0.851)

0.5932 (0.2531) 20.2113 to 0.9667 0.118 (0.109 – 0.126) 0.159 (0.147 – 0.171) 0.830 (0.804 – 0.855)

Note. The scores for the WOMAC subscales range from 0 to 100, with higher scores indicating worse health status. All the possible covariates were transformed linearly to the interval (0-100), like the range of original WOMAC scales. The scores for the EQ-5D-5L utility index range from 20.4162 to 1, with higher scores indicating better health status. Appendix Table 2S in the Supplemental Materials shows the equations for both preferred prediction models to estimate the EQ-5D-5L utilities. GLM indicates general linear model; ICC, intraclass correlation coefficient; MAE, mean absolute error; PI, probability interval; RMSE, root mean square error; SD, standard deviation; SE, standard error. † Considering Model 4, the valid sample size without missing data in the EQ-5D-5L utility index, and WOMAC pain and function scores is 748 in the derivation sample and 633 in the validation sample. *In the beta regression model, the dependent variable is the logit of the transformed EQ-5D-5L utility index to the range (0, 1). Therefore, to obtain the predicted EQ-5D5L utility this transformation must be undone as follows: (1) we estimate the logit of the EQ-5D-5L utility index transformed to the open unit interval (0, 1), called A, as a linear combination of the variables based on the b parameters; (2) we estimate the EQ-5D-5L utility index transformed to the open unit interval (0, 1), called B, as B ¼ eA = ð1 1 eA Þ; and (3) we transform the estimation of the EQ-5D-5L utility index to the possible range of the index in the Spanish population, as EQ-5D-5L utility index = 1.4162 $ B – 0.4162. ‡ P , .001. § .05 # P , .10. k .001 # P , .05.

Figure 2. Comparison of the observed EQ-5D-5L utility index and the prediction errors of Model 4 for the general linear model (GLM) and beta model in the validation sample. Beta: Model 4

-0.5

0.0

Prediction error

0.0

-1.0

-0.5 -1.0

Prediction error

0.5

0.5

1.0

1.0

GLM: Model 4

-0.4

-0.2

0.0

0.2

0.4

0.6

Observed EQ-5D-5L utility index

0.8

1.0

-0.4

-0.2

0.0

0.2

0.4

0.6

Observed EQ-5D-5L utility index

0.8

1.0

8

- 2019

VALUE IN HEALTH

Table 4. Predictive precision of the preferred models at the individual level in the validation sample.

REFERENCES 1.

jDifferencej

GLM Model 4

Beta Model 4

n (%)

n (%)

2.

3.

jDj # 0.01

46 (7.27)

45 (7.11)

0.01 , jDj # 0.03

79 (12.48)

73 (11.53)

0.03 , jDj # 0.05

68 (10.74)

74 (11.69)

0.05 , jDj # 0.07

83 (13.11)

68 (10.74)

0.07 , jDj # 0.10

82 (12.95)

89 (14.06)

0.10 , jDj # 0.15

108 (17.06)

106 (16.75)

0.15 , jDj # 0.20

48 (7.58)

69 (10.90)

0.20 , jDj # 0.25

46 (7.27)

40 (6.32)

7.

0.25 , jDj # 0.32

36 (5.69)

33 (5.21)

8.

jDj . 0.32

37 (5.85)

36 (5.69)

9.

Note. D = difference between the predicted and observed EQ-5D-5L utility index. GLM indicates general linear model.

4.

5.

6.

10.

11.

healthcare settings and geographic regions. Furthermore, the EQ5D-5L and the WOMAC scores do not match perfectly, since the WOMAC does not measure mental health aspects. Nevertheless, the high correlation observed between the EQ-5D-5L and WOMAC scores indicates that the algorithms do provide valid and precise predictions. Another limitation is that the validation sample is composed of the same patients as the derivation sample. Therefore, as recommended,49 an external validation of the proposed models should be carried out. In conclusion, the present study provides 2 easy-to-apply mapping algorithms, both of which have acceptable goodness of fit and precision, allowing their use to predict utilities of the more recently developed EQ-5D-5L from the specific WOMAC questionnaire in patients with hip or knee OA. In clinical practice and clinical studies, disease-specific tools are more commonly used than generic ones, thus posing a barrier to cost-effectiveness studies. The identification of suitable mapping functions reduces this barrier by allowing the estimation of utilities, implying more opportunities to conduct cost-effectiveness studies.
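The information criteria in Table 2 follow the usual definitions, AIC = 2k − 2 log L and BIC = k log n − 2 log L, where log L is the maximized log-likelihood, k the number of estimated parameters and n the sample size. The Python sketch below (the function names `loglik_from_aic` and `info_criteria` are hypothetical) recovers log L from a reported AIC and recomputes BIC. It assumes that k counts the regression coefficients, including the intercept, plus one scale parameter (the residual SD in the GLM and Tobit models, the precision parameter in the beta model); this accounting is consistent with the AIC and BIC pairs reported in Table 2.

```python
import math
from typing import Tuple


def loglik_from_aic(aic: float, k: int) -> float:
    """Maximized log-likelihood implied by AIC = 2k - 2 log L."""
    return (2 * k - aic) / 2


def info_criteria(loglik: float, k: int, n: int) -> Tuple[float, float]:
    """Return (AIC, BIC) for a model with k estimated parameters and n observations."""
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    return aic, bic


if __name__ == "__main__":
    # GLM Model 1 in Table 2: intercept + P + F + S + residual SD -> k = 5, n = 747.
    ll = loglik_from_aic(-424.04, k=5)
    aic, bic = info_criteria(ll, k=5, n=747)
    print(round(ll, 2), round(aic, 2), round(bic, 2))  # 217.02 -424.04 -400.96
```

The same check with k = 6 and n = 748 for beta Model 4 (intercept, 4 covariates and the precision parameter) reproduces the reported BIC of −1081.09 to within rounding. Note that the difference BIC − AIC = k(log n − 2) depends only on k and n, which is why the penalty for the 12- to 14-parameter Model 3 is so much larger than for Model 4.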
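As a minimal sketch of how the two preferred algorithms might be applied, the Python code below transcribes the Model 4 coefficients reported in Table 3 and implements, for the beta model, the three-step back-transformation described in the footnote to that table. It assumes WOMAC pain and function subscale scores on the 0 to 100 scale (higher scores indicating worse health); the names `predict_glm` and `predict_beta` are hypothetical. Because Table 3 reports coefficients rounded to 4 decimals, predictions may differ slightly from those obtained with the equations in Appendix Table 2S, which should be preferred for applied work.

```python
import math

# Model 4 coefficients transcribed from Table 3 (rounded to 4 decimals).
# Each covariate is on a 0-100 scale: Pain^2/100, Pain^3/10,000,
# Function^2/100, Function^3/10,000 and Pain*Function/100.
GLM_COEF = {
    "intercept": 0.9516,
    "pain2": 0.0034,
    "pain3": -0.0044,
    "function": -0.0062,
    "pain_x_function": -0.0042,
}
BETA_COEF = {
    "intercept": 3.7265,
    "pain3": -0.0091,
    "function": -0.1145,
    "function2": 0.1633,
    "function3": -0.0948,
}

# Range of the Spanish EQ-5D-5L index used in the Table 3 footnote: [-0.4162, 1].
INDEX_MIN = -0.4162
INDEX_SPAN = 1.4162


def _check_womac(score: float, name: str) -> None:
    """Reject values outside the 0-100 WOMAC subscale range."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"{name} must be a WOMAC subscale score in [0, 100], got {score}")


def predict_glm(pain: float, function: float) -> float:
    """Hypothetical helper: EQ-5D-5L utility predicted by GLM Model 4."""
    _check_womac(pain, "pain")
    _check_womac(function, "function")
    c = GLM_COEF
    return (c["intercept"]
            + c["pain2"] * pain ** 2 / 100
            + c["pain3"] * pain ** 3 / 10_000
            + c["function"] * function
            + c["pain_x_function"] * pain * function / 100)


def predict_beta(pain: float, function: float) -> float:
    """Hypothetical helper: EQ-5D-5L utility predicted by beta Model 4."""
    _check_womac(pain, "pain")
    _check_womac(function, "function")
    c = BETA_COEF
    # Step 1: linear predictor A (logit of the utility rescaled to (0, 1)).
    a = (c["intercept"]
         + c["pain3"] * pain ** 3 / 10_000
         + c["function"] * function
         + c["function2"] * function ** 2 / 100
         + c["function3"] * function ** 3 / 10_000)
    # Step 2: inverse logit; 1 / (1 + e^-A) is algebraically equal to e^A / (1 + e^A).
    b = 1.0 / (1.0 + math.exp(-a))
    # Step 3: map B back to the Spanish index range.
    return INDEX_SPAN * b + INDEX_MIN


if __name__ == "__main__":
    # Best WOMAC state: no pain and no functional limitation.
    print(round(predict_glm(0, 0), 4))   # 0.9516
    print(round(predict_beta(0, 0), 4))  # 0.9667
```

The two functions differ in one design respect: the inverse-logit step keeps every beta prediction inside the interval (−0.4162, 1) by construction, whereas the GLM prediction is a polynomial in the WOMAC scores with no built-in bounds, so its range depends only on the covariate values supplied.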
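The accuracy measures reported in Tables 2 to 4 can be computed from paired predicted and observed utilities. The Python sketch below computes the MAE and RMSE, counts patients in the absolute-difference bands of Table 4, and attaches a 95% interval to a metric by resampling patients with replacement. The percentile bootstrap is an assumption about how such probability intervals could be obtained (the procedure is not described in this section), and the function names, the 2000 replicates and the fixed seed are illustrative choices.

```python
import bisect
import math
import random
from typing import Callable, List, Sequence, Tuple

# Upper bounds of the absolute-difference bands in Table 4; anything above
# the last bound falls in the final band, abs(D) > 0.32.
BAND_EDGES = [0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.20, 0.25, 0.32]

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error between predicted and observed utilities."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed utilities."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def band_counts(pred: Sequence[float], obs: Sequence[float]) -> List[int]:
    """Number of patients per Table 4 band (each band is a < abs(D) <= b)."""
    counts = [0] * (len(BAND_EDGES) + 1)
    for p, o in zip(pred, obs):
        # bisect_left returns the first edge >= abs(D), so a value equal to an
        # edge is counted in the band that the edge closes.
        counts[bisect.bisect_left(BAND_EDGES, abs(p - o))] += 1
    return counts


def bootstrap_pi(pred: Sequence[float], obs: Sequence[float], metric: Metric,
                 n_boot: int = 2000, level: float = 0.95,
                 seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric (assumed procedure)."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping (pred, obs) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([pred[i] for i in idx], [obs[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lo, hi
```

Dividing each band count by the number of patients (633 in the validation sample) and multiplying by 100 gives the percentages shown in Table 4. Because utilities are stored as floating-point numbers, a difference that should equal a band edge exactly may land on either side of it; rounding D to a few decimals before binning avoids this.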

Acknowledgments


We are grateful to colleagues in the participating hospitals and primary care centers for their support and to all patients for their collaboration. We acknowledge help from the Biostatistics Research Group (Biostit), supported by the Department of Education, Linguistics Policy and Culture of the Basque Government (Ref: IT620-13). The authors also acknowledge editorial assistance provided by Ideas Need Communicating Language Services, through the translation and edition service of the Basque Foundation for Health Innovation and Research (BIOEF).

Source of financial support: This study was supported by grants from the Carlos III Health Institute (Refs: PI13/00560, PI13/00518, and PI13/00648) and the European Regional Development Fund.


Supplemental Materials


Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.vhri.2019.09.005.

References

1. Mahon JL, Bourne RB, Rorabeck CH, et al. Health-related quality of life and mobility of patients awaiting elective total hip arthroplasty: a prospective study. CMAJ. 2002;167(10):1115–1121.
2. Núñez M, Lozano L, Núñez E, et al. Total knee replacement and health-related quality of life: factors influencing long-term outcomes. Arthritis Rheum. 2009;61(8):1062–1069.
3. Cushnaghan J, Coggon D, Reading I, et al. Long-term outcome following total hip arthroplasty: a controlled longitudinal study. Arthritis Rheum. 2007;57(8):1375–1380.
4. Rat AC, Guillemin F, Osnowycz G, et al. Total hip or knee replacement for osteoarthritis: mid- and long-term quality of life. Arthritis Care Res. 2010;62(1):54–62.
5. Dawson J, Linsell L, Zondervan K, et al. Epidemiology of hip and knee pain and its impact on overall health status in older adults. Rheumatology (Oxford). 2004;43(4):497–504.
6. Quintana JM, Arostegui I, Escobar A, et al. Prevalence of knee and hip osteoarthritis and the appropriateness of joint replacement in an older population. Arch Intern Med. 2008;168(14):1576–1584.
7. Kaufman S. The emerging role of health-related quality of life: data in clinical research, part 2. Clin Res. 2001;1:38–43.
8. Leardini G, Salaffi F, Caporali R, et al. Direct and indirect costs of osteoarthritis of the knee. Clin Exp Rheumatol. 2004;22(6):699–706.
9. Loza E, Lopez-Gomez JM, Abasolo L, et al. Economic burden of knee and hip osteoarthritis in Spain. Arthritis Rheum. 2009;61(2):158–165.
10. Rabenda V, Manette C, Lemmens R, et al. Direct and indirect costs attributable to osteoarthritis in active subjects. J Rheumatol. 2006;33(6):1152–1158.
11. Salmon JH, Rat AC, Sellam J, et al. Economic impact of lower-limb osteoarthritis worldwide: a systematic review of cost-of-illness studies. Osteoarthritis Cartilage. 2016;24(9):1500–1508.
12. Zhang W, Moskowitz RW, Nuki G, et al. OARSI recommendations for the management of hip and knee osteoarthritis, part I: critical appraisal of existing treatment guidelines and systematic review of current research evidence. Osteoarthritis Cartilage. 2007;15(9):981–1000.
13. Australia Commonwealth Department of Human Services and Health. Guidelines for the Pharmaceutical Industry on the Preparation of Submissions to the Pharmaceutical Benefits Advisory Committee. Canberra: Commonwealth Department of Human Services and Health; 1995.
14. Canadian Agency for Drugs and Technologies in Health (CADTH). Guidelines for the Economic Evaluation of Health Technologies: Canada. 4th ed. Ottawa: CADTH; 2017.
15. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. New York: Oxford University Press; 2007.
16. Drummond MF, Sculpher MJ, Torrance GW, et al. Methods for the Economic Evaluation of Health Care Programmes. 3rd ed. New York: Oxford University Press; 2005.
17. Brooks R. EuroQol: the current state of play. Health Policy. 1996;37(1):53–72.
18. EuroQol Group. EuroQol–a new facility for the measurement of health-related quality of life. Health Policy. 1990;16(3):199–208.
19. Herdman M, Gudex C, Lloyd A, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011;20(10):1727–1736.
20. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 2004;13(9):873–884.
21. Conner-Spady BL, Marshall DA, Bohm E, et al. Reliability and validity of the EQ-5D-5L compared to the EQ-5D-3L in patients with osteoarthritis referred for hip and knee replacement. Qual Life Res. 2015;24(7):1775–1784.
22. Fransen M, Edmonds J. Reliability and validity of the EuroQol in patients with osteoarthritis of the knee. Rheumatology (Oxford). 1999;38(9):807–813.
23. Janssen MF, Pickard AS, Golicki D, et al. Measurement properties of the EQ-5D-5L compared to the EQ-5D-3L across eight patient groups: a multicountry study. Qual Life Res. 2013;22(7):1717–1727.
24. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Med Care. 2000;38(1):115–121.
25. Ostendorf M, van Stel HF, Buskens E, et al. Patient-reported outcome in total hip replacement. A comparison of five instruments of health status. J Bone Joint Surg Br. 2004;86(6):801–808.
26. Sullivan PW, Lawrence WF, Ghushchyan V. A national catalog of preference-based scores for chronic conditions in the United States. Med Care. 2005;43(7):736–749.
27. Escobar A, Quintana JM, Bilbao A, et al. Responsiveness and clinically important differences for the WOMAC and SF-36 after total knee replacement. Osteoarthritis Cartilage. 2007;15(3):273–280.
28. Quintana JM, Escobar A, Bilbao A, et al. Responsiveness and clinically important differences for the WOMAC and SF-36 after hip joint replacement. Osteoarthritis Cartilage. 2005;13(12):1076–1083.
29. Anderson JG, Wixson RL, Tsai D, et al. Functional outcome and patient satisfaction in total knee patients over the age of 75. J Arthroplasty. 1996;11(7):831–840.
30. Bellamy N, Buchanan WW, Goldsmith CH, et al. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15(12):1833–1840.
31. Hawker G, Melfi C, Paul J, et al. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol. 1995;22(6):1193–1196.
32. Brazier J. Valuing health states for use in cost-effectiveness analysis. Pharmacoeconomics. 2008;26(9):769–779.
33. Brazier JE, Yang Y, Tsuchiya A, Rowen DL. A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. Eur J Health Econ. 2010;11(2):215–225.
34. Barton GR, Sach TH, Jenkinson C, et al. Do estimates of cost-utility based on the EQ-5D differ from those based on the mapping of utility scores? Health Qual Life Outcomes. 2008;6:51.
35. Kiadaliri AA, Englund M. Assessing the external validity of algorithms to estimate EQ-5D-3L from the WOMAC. Health Qual Life Outcomes. 2016;14(1):141.
36. Wailoo A, Hernandez AM, Escobar MA. Modelling the relationship between the WOMAC Osteoarthritis Index and EQ-5D. Health Qual Life Outcomes. 2014;12:37.
37. Xie F, Pullenayegum EM, Li SC, et al. Use of a disease-specific instrument in economic evaluations: mapping WOMAC onto the EQ-5D utility index. Value Health. 2010;13(8):873–878.
38. Ramos-Goñi JM, Craig BM, Oppe M, et al. Handling data quality issues to estimate the Spanish EQ-5D-5L value set using a hybrid interval regression approach. Value Health. 2018;21(5):596–604.
39. Bilbao A, Garcia-Perez L, Arenaza JC, et al. Psychometric properties of the EQ-5D-5L in patients with hip or knee osteoarthritis: reliability, validity and responsiveness. Qual Life Res. 2018;27(11):2897–2908.
40. Conner-Spady BL, Marshall DA, Bohm E, et al. Comparing the validity and responsiveness of the EQ-5D-5L to the Oxford hip and knee scores and SF-12 in osteoarthritis patients 1 year following total joint replacement. Qual Life Res. 2018;27(5):1311–1322.
41. Escobar A, Quintana JM, Bilbao A, et al. Validation of the Spanish version of the WOMAC questionnaire for patients with hip or knee osteoarthritis. Western Ontario and McMaster Universities Osteoarthritis Index. Clin Rheumatol. 2002;21(6):466–471.
42. Arostegui I, Nunez-Anton V, Quintana JM. Statistical approaches to analyse patient-reported outcomes as response variables: an application to health-related quality of life. Stat Methods Med Res. 2012;21(2):189–214.
43. Pullenayegum EM, Tarride JE, Xie F, et al. Analysis of health utility data when some subjects attain the upper bound of 1: are Tobit and CLAD models appropriate? Value Health. 2010;13(4):487–494.
44. Austin PC, Escobar M, Kopec JA. The use of the Tobit model for analyzing measures of health status. Qual Life Res. 2000;9(8):901–910.
45. Sullivan PW. Are utilities bounded at 1.0? Implications for statistical analysis and scale development. Med Decis Making. 2011;31(6):787–789.
46. Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods. 2006;11(1):54–71.

47. Hastie T, Tibshirani R. Generalized Additive Models. London: Chapman & Hall; 1990.
48. Wood S. On confidence intervals for GAMs based on penalized regression splines. Aust N Z J Stat. 2006;48:445–464.
49. Wailoo AJ, Hernandez-Alava M, Manca A, et al. Mapping to estimate health-state utility from non-preference-based outcome measures: an ISPOR good practices for outcomes research task force report. Value Health. 2017;20(1):18–27.
50. Grootendorst P, Marshall D, Pericak D, et al. A model to estimate health utilities index mark 3 utility scores from WOMAC index scores in patients with osteoarthritis of the knee. J Rheumatol. 2007;34(3):534–542.
51. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. CRC Press; 1994.
52. Gray LA, Wailoo AJ, Hernandez AM. Mapping the FACT-B instrument to EQ-5D-3L in patients with breast cancer using adjusted limited dependent variable mixture models versus response mapping. Value Health. 2018;21(12):1399–1405.
53. Knofczynski GT, Mundfrom D. Sample sizes when using multiple linear regression for prediction. Educ Psychol Meas. 2008;68:431–442.
54. Meaney C, Moineddin R. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design. BMC Med Res Methodol. 2014;14:14.
55. Schuck P. Assessing reproducibility for interval data in health-related quality of life questionnaires: which coefficient should be used? Qual Life Res. 2004;13(3):571–586.
56. Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol. 1993;46(12):1417–1432.
57. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–159.
58. Dakin H, Gray A, Murray D. Mapping analyses to estimate EQ-5D utilities and responses based on Oxford Knee Score. Qual Life Res. 2013;22(3):683–694.
59. Kim HL, Kim D, Jang EJ, et al. Mapping health assessment questionnaire disability index (HAQ-DI) score, pain visual analog scale (VAS), and disease activity score in 28 joints (DAS28) onto the EuroQol-5D (EQ-5D) utility score with the KORean Observational study Network for Arthritis (KORONA) registry data. Rheumatol Int. 2016;36(4):505–513.
60. Oppe M, Devlin N, Black N. Comparison of the underlying constructs of the EQ-5D and Oxford Hip Score: implications for mapping. Value Health. 2011;14(6):884–891.
61. Pinedo-Villanueva RA, Turner D, Judge A, et al. Mapping the Oxford hip score onto the EQ-5D utility index. Qual Life Res. 2013;22(3):665–675.
62. Marshall D, Pericak D, Grootendorst P, et al. Validation of a prediction model to estimate health utilities index Mark 3 utility scores from WOMAC index scores in patients with osteoarthritis of the hip. Value Health. 2008;11(3):470–477.
63. Whitehouse SL, Lingard EA, Katz JN, Learmonth ID. Development and testing of a reduced WOMAC function scale. J Bone Joint Surg Br. 2003;85(5):706–711.