
How Many Alternatives Can Be Ranked? A Comparison of the Paired Comparison and Ranking Methods

Minsu Ock, MD, MPA,1 Nari Yi, MPH,2 Jeonghoon Ahn, PhD,3 Min-Woo Jo, MD, PhD1,*

1 Department of Preventive Medicine, University of Ulsan College of Medicine, Seoul, South Korea; 2 Vital Statistics Division, Statistics Korea, Daejeon, South Korea; 3 Department of Health Convergence, Ewha Womans University, Seoul, South Korea

* Address correspondence to: Min-Woo Jo, Department of Preventive Medicine, University of Ulsan College of Medicine, 86 Asanbyeongwon-gil, Songpa-Gu, Seoul 138-736, South Korea. E-mail: [email protected].

Copyright © 2016, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. http://dx.doi.org/10.1016/j.jval.2016.03.1836

ABSTRACT

Objectives: To determine the feasibility of converting ranking data into paired comparison (PC) data and to suggest the number of alternatives that can be ranked by comparing a PC and a ranking method. Methods: Using a total of 222 health states, a household survey was conducted in a sample of 300 individuals from the general population. Each respondent performed a PC 15 times and a ranking method 6 times (two attempts each of ranking three, four, and five health states). The health states of the PC and the ranking method were constructed to overlap each other. We converted the ranked data into PC data and examined the consistency of the response rate. Applying probit regression, we obtained the predicted probability of each method. Pearson correlation coefficients were determined between the predicted probabilities of those methods. The mean absolute error was also assessed between the observed and the predicted values. Results: The overall consistency of the response rate was 82.8%. The Pearson correlation coefficients were 0.789, 0.852, and 0.893 for ranking three, four, and five health states, respectively. The lowest mean absolute error was 0.082 (95% confidence interval [CI] 0.074–0.090) in ranking five health states, followed by 0.123 (95% CI 0.111–0.135) in ranking four health states and 0.126 (95% CI 0.113–0.138) in ranking three health states. Conclusions: After empirically examining the consistency of the response rate between a PC and a ranking method, we suggest that using five alternatives in the ranking method may be superior to using three or four alternatives.

Keywords: consistency, discrete choice experiments, paired comparison, ranking.

Introduction

Two types of methods can be used in health economics to elicit stated preferences: cardinal methods and ordinal methods [1]. Typical examples of cardinal methods are the standard gamble (SG) and the time trade-off (TTO), and examples of ordinal methods include discrete choice experiments (DCE) and ranking methods [2,3]. The SG has a rigorous theoretical background because the task is conducted through comparisons of two alternatives under uncertainty about possible events or episodes [1]. It has several weaknesses from the perspective of feasibility, however, because people cannot easily understand the concept of event probability. Furthermore, the values from the SG can be affected by respondents' loss aversion and risk attitude [1]. The TTO was then suggested by Torrance [4] as an alternative to the SG, and the person trade-off (PTO) was proposed by Nord [5] to estimate disability weights that can be used to calculate disability-adjusted life-years. Although a TTO is considered easier for the public to understand than an SG, time preference can affect the values of a TTO [6]. Furthermore, the lack of a theoretical basis and ethical concerns have constantly been raised as criticisms of the PTO [7–9].

Ordinal methods, such as the DCE, have recently been used to overcome the limitations of cardinal methods. For example, the EuroQol Group has adopted the DCE on a trial basis for the valuation study of the five-level version of the EuroQol five-dimensional questionnaire [10]. In addition, in the Global Burden of Disease (GBD) Study 2010, the primary method for eliciting respondent preferences was a paired comparison (PC, a type of DCE), in which respondents were asked to select the better health state between two options [11]. Although a DCE can easily be conducted in the general population, there may be difficulties in study design when there are many alternatives to be compared. In the case of the GBD Study 2010, 220 unique health states were compared with each other, and approximately 30,000 respondents participated in household surveys or a Web survey [11]. Obtaining a large sample size or increasing the number of questions for each respondent is needed to compare a large number of alternatives.


If there are many alternatives to be compared, ranking methods can be viable alternatives to a DCE because the data from a ranking method, in particular a complete ranking, contain more statistical information than those from a DCE [12,13]. Nevertheless, the number of alternatives in each ranking question cannot be increased indefinitely because of cognitive burden: the burden increases substantially with the number of alternatives to be ranked, although there seems to be no consensus on how many alternatives should be ranked [14]. Furthermore, in comparison with a DCE, the limited attention given to analytical methods for ranking data might be another problem [15]. Existing analytical methods for ranked data, such as the rank-ordered logit model, are difficult to apply if the alternatives to be ranked have no attributes or dimensions. Consequently, there have been attempts to convert ranking data into PC data [15,16]. Such analytical methods, however, may not reflect the actual preferences of respondents if the data from the ranking method do not agree with the converted PC data, that is, if there are logically inconsistent responses between PC data and ranking data [12]. These weaknesses of the analytical methods for ranking data could act as barriers to the wider application of ranking methods.

In our study, we compared the consistency of the response rates between a PC and a ranking method to explore the feasibility of applying analytical methods for DCE to ranking data. We also assessed how many alternatives can be ranked, considering the Pearson correlation coefficients between the PC and the ranking method. Specifically, we compared the stated preference of two health states from a PC with that of three, four, and five health states from a ranking method.

Methods

Study Design and Health States

A household survey was conducted in a representative sample of 300 members of the general population in the capital area of South Korea. Sampling was performed using a multistage stratified quota method. A sample quota was assigned to each region of the capital area (Seoul city, Inchon city, and Gyeonggi province) according to sex, age, and education level as defined by the June 2013 resident registration data available through the Ministry of Administration and Security, South Korea. The survey was performed between March 25, 2014, and April 4, 2014. Data were collected using a survey program involving computer-assisted face-to-face interviews. The survey program recorded response times, so we could identify how much time each respondent spent on each valuation method. We used a total of 222 health states, which reflected a diversity of health outcomes resulting from various disease causes. Of the 222 health states, 220 were from the GBD 2010 disability weight study and 2 were "full health" and "being dead." Each health state (except "full health" and "being dead") consisted of a brief lay description that explained the meaning of the health state in various aspects of health [11]. M. Ock initially translated the 220 health states from the GBD 2010 disability weight study into Korean, and M.-W. Jo modified them. A reverse translation was also performed by a bilingual person, and M. Ock and M.-W. Jo reconfirmed the translation.

Valuation Method and Survey Procedure

A computer program was developed for this study using the design of a previous study [11]. Participants were initially asked for details about their sex, age, and educational level. Next, they performed three valuation methods (a PC, a PTO, and a ranking method). First, in each PC, the respondents were asked to select the healthier of two health states, considering physical or mental problems. To elicit a preference for health states, we asked the respondents to imagine experiencing the health problems for the rest of their lives. Second, in the PTO, the respondents compared the health benefit of two life-saving programs to elicit trade-offs between "being dead" and less fatal health states [11]. The purpose of the PTO in this study, however, was to erase the memory of the PC, and so we did not analyze its results. Third, in the ranking method, the respondents were asked to rank health states in order of good health, considering physical or mental problems. As in the PC, we asked them to imagine experiencing the health problems for the rest of their lives. Each respondent completed the PC 15 times, the PTO 3 times, and the ranking method 6 times; the six ranking questions comprised two questions each of ranking three, four, and five health states. After completing the valuation methods, the participants were also asked for clinical information, including ambulatory care visits in the past 2 weeks, hospitalization in the past 12 months, and morbidity.

Composition of the Questions

To compare the results between the PC and the ranking methods, the health states of the PC and the ranking methods were constructed to overlap each other. Table 1 shows the overall composition of the questions for the valuation methods. "Being dead" was always included in 1-A and 3-E, whereas "full health" was always included in 1-H and 3-F. The fifth and sixth questions of the ranking method (3-E and 3-F) served as reference points for the other questions in terms of selecting the health states to be compared or ranked. For example, if H1, H2, H3, and H4 are selected among the 220 health states in 3-E, the 5 health states to be ranked are H1, H2, H3, H4, and "being dead." Then, the 4 health states of 3-C are randomly determined from the 5 health states in 3-E, and the 3 health states of 3-A are randomly chosen from the 4 health states in 3-C. The 2 health states of 1-A to be compared are "being dead" and a health state randomly selected from the 4 health states in 3-E (excluding "being dead"). The 6 questions from 1-B to 1-G are analogous to playing a full league with 4 soccer teams: all possible pairwise comparisons among the 4 health states from 3-E (excluding "being dead") make up the 6 questions from 1-B to 1-G (4C2 = 6). If the 5 health states of 3-F are selected (including "full health"), the health states of 1-H to 1-N, 3-B, and 3-D are randomly determined from 3-F in the same way. As a result, each respondent evaluated 10 health states (including "being dead" and "full health") using the PC and the ranking methods.
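To make the "full league" construction concrete, the following is a minimal Python sketch of how the PC questions tied to ranking set 3-E could be assembled. The health-state labels (H1–H4) and variable names are illustrative stand-ins, not the study's actual survey software.

```python
import itertools
import random

# Hypothetical labels standing in for GBD health states; H1-H4 play the role of
# the four randomly drawn states that accompany "being dead" in question 3-E.
ranking_set_3E = ["being dead", "H1", "H2", "H3", "H4"]

# 1-A: "being dead" versus one randomly chosen state from the other four.
non_dead = [s for s in ranking_set_3E if s != "being dead"]
pc_1A = ("being dead", random.choice(non_dead))

# 1-B to 1-G: the "full league" of the four non-dead states, i.e. every possible
# pairwise match-up, 4C2 = 6 questions in total.
full_league = list(itertools.combinations(non_dead, 2))

print(pc_1A)
print(full_league)       # [('H1', 'H2'), ('H1', 'H3'), ..., ('H3', 'H4')]
print(len(full_league))  # 6
```

The analogous construction from 3-F (anchored on "full health") yields questions 1-H to 1-N.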

Analysis

Initially, descriptive analyses of sociodemographic factors were performed. Before comparing the results between the PC and the ranking methods, we converted the ranked data into PC data. For example, if the order of health states was "full health > H5 > H6" in ranking 3 health states (e.g., 3-B), it was converted as follows: "full health > H5," "full health > H6," and "H5 > H6." This conversion was applied to the other ranking methods (i.e., ranking four and ranking five health states). After conversion, we examined the consistency of the response rate, defined as follows: (the number of coincident responses between the PC and the converted ranking method)/(the number of converted responses in the ranking method) × 100%.
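The conversion and the consistency calculation can be illustrated with a short Python sketch; the function names and health-state labels are hypothetical, and the worked example simply reproduces the 66.7% case described below (Table 2).

```python
from itertools import combinations

def ranking_to_pairs(ranking):
    """Convert an ordered ranking (best to worst) into its implied paired
    comparisons, e.g. ["full health", "H5", "H6"] ->
    {("full health", "H5"), ("full health", "H6"), ("H5", "H6")},
    where each tuple means "first element preferred over second"."""
    return {(better, worse) for better, worse in combinations(ranking, 2)}

def consistency_of_response_rate(ranking, pc_responses):
    """Percentage of implied pairs that coincide with the observed PC answers.
    pc_responses maps a frozenset of two states to the state chosen as better."""
    implied = ranking_to_pairs(ranking)
    coincident = sum(
        1 for better, worse in implied
        if pc_responses.get(frozenset((better, worse))) == better
    )
    return coincident / len(implied) * 100

# Worked example mirroring Table 2: the ranking implies H5 > H6, but the PC
# answer for that pair was H6, so 2 of 3 pairs coincide (66.7%).
ranking = ["full health", "H5", "H6"]
pc = {
    frozenset(("full health", "H5")): "full health",
    frozenset(("full health", "H6")): "full health",
    frozenset(("H5", "H6")): "H6",
}
print(round(consistency_of_response_rate(ranking, pc), 1))  # 66.7
```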


Table 1 – The composition of questions for valuation methods.

N | Description | Example (H1–H220, full health, and being dead)

PC (15 questions)*
1-A | Being dead and 1 randomly selected health state among the 4 health states from 3-E (excluding being dead) | Being dead, H1
1-B | Full league of the 4 health states from 3-E | H1, H2
1-C | Full league of the 4 health states from 3-E | H1, H3
1-D | Full league of the 4 health states from 3-E | H1, H4
1-E | Full league of the 4 health states from 3-E | H2, H3
1-F | Full league of the 4 health states from 3-E | H2, H4
1-G | Full league of the 4 health states from 3-E | H3, H4
1-H | Full health and 1 randomly selected health state among the 4 health states from 3-F (excluding full health) | Full health, H5
1-I | Full league of the 4 health states from 3-F | H5, H6
1-J | Full league of the 4 health states from 3-F | H5, H7
1-K | Full league of the 4 health states from 3-F | H5, H8
1-L | Full league of the 4 health states from 3-F | H6, H7
1-M | Full league of the 4 health states from 3-F | H6, H8
1-N | Full league of the 4 health states from 3-F | H7, H8
1-O | 1 randomly selected health state among the 4 health states from 3-E (excluding being dead) and 1 randomly selected health state among the 4 health states from 3-F (excluding full health) | H1, H5

PTO (3 questions)
2-A | 1 randomly selected health state among 220 health states | H1
2-B | 1 randomly selected health state among 219 health states (excluding the health state from 2-A) | H10
2-C | 1 randomly selected health state among 218 health states (excluding the health states from 2-A and 2-B) | H100

Ranking methods (6 questions)
3-A | 3 randomly selected health states among the 4 health states from 3-C | Being dead, H1, H2
3-B | 3 randomly selected health states among the 4 health states from 3-D | Full health, H5, H6
3-C | 4 randomly selected health states among the 5 health states from 3-E | Being dead, H1, H2, H3
3-D | 4 randomly selected health states among the 5 health states from 3-F | Full health, H5, H6, H7
3-E | Being dead and 4 randomly selected health states among 220 health states | Being dead, H1, H2, H3, H4
3-F | Full health and 4 randomly selected health states among 216 health states (excluding the health states from 3-E) | Full health, H5, H6, H7, H8

PC, paired comparison; PTO, person trade-off.
* The order of questions in the paired comparison was randomly determined.

For example, as noted above, assume that the order of health states in ranking three health states was "full health > H5 > H6," whereas the responses to the PC were "full health > H5," "full health > H6," and "H6 > H5"; then the consistency of the response rate is 66.7% (Table 2). We calculated the consistency of the response rate for each of the ranking methods (ranking three, four, and five health states). Furthermore, the consistencies of the response rate by sociodemographic factors and by the timed recordings of respondents were compared using Student t tests and analysis of variance. The timed recordings of respondents were divided into two groups: "greater than or equal to the average time spent on all the ranking methods" and "less than the average time."

Table 2 – Example of calculating the consistency of response rate.

PC response | Converted ranking response (from 3-B: full health > H5 > H6)
1-I: full health > H5 (coincidence) | full health > H5 (coincidence)
1-J: full health > H6 (coincidence) | full health > H6 (coincidence)
1-K: H6 > H5 (not coincidence) | H5 > H6 (not coincidence)
Consistency of response rate = 2/3 = 66.7%

PC, paired comparison.

To compare the stated preference of two health states from the PC with that of three, four, and five health states from the converted ranking methods, we ran probit regression analyses with the stated discrete choice of preference as the dependent variable. The independent variable was the 222 health states (including "full health" and "being dead"), treated as dummy variables with "full health" as the reference. The 220 health states were a set of brief lay descriptions that highlighted the major health outcomes in simple, nonclinical terms, reflecting all aspects of health [15]. For example, the health state of diabetic foot was defined in the GBD Study 2010 as "has a sore on the foot that is swollen and causes some difficulty in walking." It is difficult to classify such health states by their characteristics because they have no attributes or dimensions; for this reason, we ran probit regression rather than a rank-ordered logit model. Using the coefficient estimates for each health state, we obtained the predicted probabilities of each health state in the PC and in each of the ranking methods (ranking three, four, and five health states). The predicted probability of each health state was defined as the probability of not being chosen compared with "full health."
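The sketch below illustrates this modeling step with synthetic data: a probit model with health-state dummies ("full health" as the reference) and per-state predicted probabilities of non-choice. The data, the stand-in state labels (H1–H3), and the exact specification are illustrative assumptions only; the study itself was analyzed in Stata.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in data: each row is one converted paired-comparison record
# for a health state, with not_chosen = 1 if that state was not selected.
states = ["full health", "H1", "H2", "H3"]   # stand-ins for the 222 states
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "state": pd.Categorical(rng.choice(states, size=400), categories=states),
    "not_chosen": rng.integers(0, 2, size=400),
})

# Dummy-code the health states, dropping "full health" as the reference category.
X = sm.add_constant(pd.get_dummies(df["state"]).drop(columns="full health").astype(float))
model = sm.Probit(df["not_chosen"].astype(float), X).fit(disp=False)

# Predicted probability of non-choice for each state relative to "full health".
profiles = sm.add_constant(
    pd.get_dummies(pd.Categorical(states, categories=states))
      .drop(columns="full health").astype(float),
    has_constant="add",
)
pred = pd.Series(np.asarray(model.predict(profiles)), index=states)
print(pred)
```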


Pearson correlation coefficients were determined between the predicted probability of the PC and that of each ranking method (ranking three, four, and five health states). The mean absolute error (MAE) was also assessed between the observed and the predicted values. Stata 13.1 (StataCorp, College Station, TX) and Microsoft Office Excel 2007 were used for all statistical analyses, and P values less than 0.05 were considered statistically significant. This study was approved by the institutional review board of the Asan Medical Center (S2014-0323-0001), and written informed consent was obtained from all respondents before survey participation.
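As a small illustration of these two summary measures, the sketch below computes a Pearson correlation coefficient between two sets of predicted probabilities and an MAE between observed and predicted values; all numbers are made up for illustration and are not study results.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative per-state values: pred_pc and pred_rank5 stand for predicted
# non-choice probabilities from the PC model and from the converted five-state
# ranking model; observed_rank5 stands for observed non-choice proportions.
pred_pc = np.array([0.10, 0.35, 0.52, 0.78, 0.90])
pred_rank5 = np.array([0.14, 0.30, 0.55, 0.70, 0.95])
observed_rank5 = np.array([0.12, 0.28, 0.60, 0.72, 0.93])

r, p_value = pearsonr(pred_pc, pred_rank5)           # correlation between the two methods
mae = np.mean(np.abs(observed_rank5 - pred_rank5))   # mean absolute error, observed vs. predicted

print(f"Pearson r = {r:.3f} (P = {p_value:.3f}), MAE = {mae:.3f}")
```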

Results

A total of 300 respondents participated in this survey, comprising 147 (49.0%) male and 153 (51.0%) female respondents. The mean age of the study participants was 43.9 ± 13.6 years, and the mean time to complete all the ranking methods was 629 ± 155 seconds. The details of the respondents' sociodemographic factors and clinical information are presented in Table 3.

Table 4 presents the consistencies of the response rate in the three types of ranking methods by sociodemographic factors and by the time spent on all the ranking methods. The overall consistency of the response rate was 82.8%. The highest consistency of the response rate was 83.8% (in ranking three health states), followed by 83.5% (in ranking four health states) and 82.0% (in ranking five health states). The consistency of the response rate was higher in males than in females, and the highest consistency (84.8%) was seen in the 40- to 49-year age group. As the education level of respondents increased, the consistency of the response rate tended to increase. Furthermore, respondents who spent at least the average time on the ranking methods displayed a slightly higher consistency of the response rate than did respondents who spent less than the average time. Nevertheless, none of the differences in the consistency of the response rate by sociodemographic factors or by time spent on all the ranking methods was statistically significant.

The Pearson correlation coefficients between the PC and the ranking methods are shown in Figure 1: 0.789 (in ranking three health states), 0.852 (in ranking four health states), and 0.893 (in ranking five health states). The lowest MAE was 0.082 (95% confidence interval [CI] 0.074–0.090) in ranking five health states, followed by 0.123 (95% CI 0.111–0.135) in ranking four health states and 0.126 (95% CI 0.113–0.138) in ranking three health states.

Discussion

This study was conducted to compare the stated preference of two health states from a PC with that of three, four, and five health states from a ranking method. In pursuit of this aim, we examined the consistency of the response rate to determine the feasibility of converting ranking data into PC data so that analytical methods for DCE can be adapted to ranking data. We also assessed how many alternatives can be ranked, mainly considering the consistency of the response rates and the Pearson correlation coefficients between the PC and the ranking methods.

Before comparing the results between the PC and the ranking methods, we converted the ranked data into PC data and calculated the consistency of the response rate. As the number of health states in the ranking method increased, the consistency of the response rate declined, but only to a small extent.

Table 3 – Sociodemographic characteristics and clinical information for the respondents.

Characteristic | N | %
Sex
  Male | 147 | 49.0
  Female | 153 | 51.0
Age group (y)
  19–29 | 57 | 19.0
  30–39 | 63 | 21.0
  40–49 | 62 | 20.7
  50–59 | 58 | 19.3
  ≥60 | 60 | 20.0
Education level
  Elementary school graduate or lower | 3 | 1.0
  Middle school graduate | 13 | 4.3
  High school graduate | 184 | 61.3
  College graduate or higher | 100 | 33.3
Ambulatory care visit in the past 2 wk
  Yes | 21 | 7.0
  No | 279 | 93.0
Hospitalization in the past 12 mo
  Yes | 4 | 1.3
  No | 296 | 98.7
Morbidity
  Yes | 18 | 6.0
  No | 282 | 94.0

Furthermore, there was no statistically significant difference in the consistency of the response rate of the ranking methods by sociodemographic factors or by the time spent on all the ranking methods. In particular, given the small differences in the consistency of the response rate by educational level, these results are probably mainly due to the ease of ordinal preference elicitation methods, including the ranking method [17,18]. These results also suggest that the cognitive burden of the ranking method is not as high as it might seem.

How many health states people can order from healthier to less healthy using a ranking method is also an important question. Because the complexity of a ranking method increases as the number of health states to be ranked increases, the results from a ranking method may not reflect the exact preferences of the respondents [19]. Nevertheless, ranking a small number of health states does not produce much statistical information compared with a PC. Considering the number of ranked alternatives in other studies [3,20,21], there still appears to be no consensus on the number of alternatives to be ranked. Although it is not easy to settle this question on the basis of our results, the consistency of the response rate declined only slightly as the number of health states in the ranking method increased up to five. We had anticipated that the cognitive burden of the ranking method would be similar in ranking three, four, and five alternatives. We also expected that the consistency of the response rate in ranking five health states in this study would be acceptable, although, to our knowledge, there has been no empirical research on an acceptable absolute standard for an inconsistent response rate. Furthermore, the Pearson correlation coefficient was the highest and the MAE was the lowest in ranking five health states. We assume that the higher Pearson correlation coefficient in ranking five alternatives may be related to the amount of statistical information, which is the strength of the ranking method: ranking three, four, and five alternatives produces 3, 6, and 10 paired comparisons, respectively. Consequently, when there are many health states to be compared, using five health states in the ranking method will be more appropriate than using three or four health states.


Table 4 – Consistency of response rate (%) in three types of ranking methods.

Characteristic | Ranking 3 | Ranking 4 | Ranking 5 | Overall ranking method | P value
Sex | | | | | 0.1547
  Male | 86.2 | 84.4 | 82.7 | 83.7 |
  Female | 81.6 | 82.6 | 81.4 | 81.8 |
Age group (y) | | | | | 0.4987
  19–29 | 84.6 | 82.7 | 81.1 | 82.1 |
  30–39 | 83.3 | 82.4 | 82.8 | 82.7 |
  40–49 | 86.9 | 86.5 | 83.2 | 84.8 |
  50–59 | 85.3 | 80.9 | 79.9 | 81.1 |
  ≥60 | 79.1 | 84.6 | 82.9 | 82.9 |
Education level | | | | | 0.6906
  Elementary school graduate or lower | 52.8 | 70.0 | 84.2 | 75.6 |
  Middle school graduate | 76.7 | 84.4 | 81.4 | 81.5 |
  High school graduate | 85.7 | 83.5 | 82.1 | 83.1 |
  College graduate or higher | 82.3 | 83.6 | 81.9 | 82.5 |
Time spent on ranking methods | | | | | 0.6180
  Greater than or equal to the average time | 83.9 | 83.7 | 81.4 | 82.5 |
  Less than the average time | 83.7 | 83.2 | 82.8 | 83.1 |
Total | 83.8 | 83.5 | 82.0 | 82.8 |

Fig. 1 – Correlation of predicted values between the paired comparison and the ranking methods.


In comparison with methods for analyzing PC data, methods for analyzing ranked data have been reported only recently [22,23]. Furthermore, application of the rank-ordered logit model, the most typical analytical method for ranked data, is not easy if, as in the GBD Study 2010, the alternatives cannot realistically be reduced to attributes or dimensions. For PC data, in contrast, Thurstone's model has been used since the 1920s [24], the Bradley-Terry model has also been widely used [25], and in recent years conditional logistic regression or probit regression has commonly been used [26]. If the consistency of the response rate is acceptable, converting ranked data into PC data could be applied in the analysis of ranked data. Although further empirical studies are needed to evaluate the degree of consistency between the values obtained from the PC and the ranking methods, this study suggests the possibility of converting ranked data into PC data.

This study had some limitations. First, not all responses between the PC and the ranking methods could be compared because of the limited number of questions. Because each respondent ranked a set of five health states twice, 20 PCs would have been needed to evaluate the complete consistency of the response rate (5C2 × 2 = 20), but we checked only 14 PCs (1-O served as a link between the two sets of five health states). The six unasked PC questions, however, all included "full health" or "being dead." Respondents would likely have been less confused by these six unasked PC questions than by the other PC questions, because "full health" and "being dead" can act as anchor points when eliciting preferences. For this reason, we expect that the consistency of the response rate estimated in this study is more likely to be underestimated than overestimated. Second, memory effects might be present. Although we used the PTO to erase the memory of the PC, a memory effect between the PC and the ranking methods may still exist, which would lead to overestimation of the consistency of the response rate. Furthermore, respondents evaluated health states in the order of ranking three, four, and then five health states; they might therefore have found it easier to respond to ranking five health states than to ranking three health states because of a memory effect. Third, we used the health states from the GBD Study 2010 in the evaluation of the PC and the ranking methods. Because these health states have no attributes or dimensions, respondents may have found them easier to manage than health states with attributes or dimensions, such as those of the EuroQol five-dimensional questionnaire. This factor might therefore have led to overestimation of the consistency of the response rate and limits the generalizability of our results to settings that use health states with attributes or dimensions. In addition, we could not determine the consistency of the response rate for comparisons between closely similar health states.

Despite these limitations, our study empirically examined the consistency of the response rate between a PC and a ranking method and suggests that using five alternatives in the ranking method will be more appropriate than using three or four alternatives. It would be meaningful to conduct similar studies that expand the number of health states to be ranked (more than five) or that use health states with attributes or dimensions. If six or more alternatives are ranked, the consistency of the response rate may decrease, but this needs to be empirically validated. More empirical evidence comparing ranking methods with other valuation methods will be needed to make the ranking method widely usable.

Source of financial support: This study was supported by a grant from the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (study no. HI13C0729).

REFERENCES

[1] Ali S, Ronaldson S. Ordinal preference elicitation methods in health economics and health services research: using discrete choice experiments and ranking methods. Br Med Bull 2012;103:21–44.
[2] Salomon JA, Murray CJ. A multi-method approach to measuring health-state valuations. Health Econ 2004;13:281–90.
[3] Ratcliffe J, Brazier J, Tsuchiya A, et al. Using DCE and ranking data to estimate cardinal values for health states for deriving a preference-based single index from the sexual quality of life questionnaire. Health Econ 2009;18:1261–76.
[4] Torrance GW. Social preferences for health states: an empirical evaluation of three measurement techniques. Socioecon Plann Sci 1976;10:129–36.
[5] Nord E. The person-trade-off approach to valuing health care programs. Med Decis Making 1995;15:201–8.
[6] Johannesson M, Pliskin JS, Weinstein MC. A note on QALYs, time tradeoff, and discounting. Med Decis Making 1994;14:188–93.
[7] Østerdal LP. The lack of theoretical support for using person trade-offs in QALY-type models. Eur J Health Econ 2009;10:429–36.
[8] Doctor JN, Miyamoto J, Bleichrodt H. When are person tradeoffs valid? J Health Econ 2009;28:1018–27.
[9] Arnesen T, Nord E. The value of DALY life: problems with ethics and validity of disability adjusted life years. BMJ 1999;319:1423–5.
[10] Oppe M, Devlin NJ, van Hout B, et al. A program of methodological research to arrive at the new international EQ-5D-5L valuation protocol. Value Health 2014;17:445–53.
[11] Salomon JA, Vos T, Hogan DR, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet 2012;380:2129–43.
[12] Flynn TN, Louviere JJ, Peters TJ, Coast J. Best-worst scaling: what it can do for health care research and how to do it. J Health Econ 2007;26:171–89.
[13] Fok D, Paap R, Van Dijk B. A rank ordered logit model with unobserved heterogeneity in ranking capabilities. J Appl Econ 2012;27:831–46.
[14] Bridges JFP, Buttorff C, Groothuis-Oudshoorn K. Estimating patients' preferences for medical devices: does the number of profiles in choice experiments matter? NBER Working Paper 17482. Cambridge, MA: National Bureau of Economic Research, 2011.
[15] Thuesen KF. Analysis of Ranked Preference Data [dissertation]. Lyngby, Denmark: Technical University of Denmark, 2007.
[16] Bradley M, Daly A. Use of the logit scaling approach to test for rank-order and fatigue effects in stated preference data. Transportation 1994;21:167–84.
[17] Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. Oxford, UK: Oxford University Press, 2007.
[18] Brazier J, Rowen D, Yang Y, Tsuchiya A. Using rank and discrete choice data to estimate health state utility values on the QALY scale. MPRA Paper No. 29891. Sheffield, UK: The University of Sheffield, 2009.
[19] Carson RT, Louviere JJ. A common nomenclature for stated preference elicitation approaches. Environ Res Econ 2011;49:539–59.
[20] Slothuus U, Larsen ML, Junker P. The contingent ranking method—a feasible and valid method when eliciting preferences for health care? Soc Sci Med 2002;54:1601–9.
[21] Stolk EA, Oppe M, Scalone L, Krabbe PF. Discrete choice modeling for the quantification of health states: the case of the EQ-5D. Value Health 2010;13:1005–13.
[22] Lee PH, Yu PLH. Distance-based tree models for ranking data. Comput Stat Data Anal 2010;54:1672–82.
[23] Lee PH, Yu PLH. Mixtures of weighted distance-based models for ranking data with applications in political studies. Comput Stat Data Anal 2012;56:2486–500.
[24] Thurstone LL. A law of comparative judgment. Psychol Rev 1927;34:273.
[25] Bradley RA, Terry ME. Rank analysis of incomplete block designs, I: the method of paired comparisons. Biometrika 1952;39:324–45.
[26] de Bekker-Grob EW, Ryan M, Gerard K. Discrete choice experiments in health economics: a review of the literature. Health Econ 2012;21:145–72.