VALUE IN HEALTH ] (2017) ]]]–]]]
The Effect of Level Overlap and Color Coding on Attribute Non-attendance in Discrete Choice Experiments

Marcel F. Jonker, PhD1,2,*, Bas Donkers, PhD1,3, Esther W. de Bekker-Grob, PhD1,2,4, Elly A. Stolk, PhD1,5

1Erasmus Choice Modelling Centre, Erasmus University Rotterdam, The Netherlands; 2Erasmus School of Health Policy & Management, Erasmus University Rotterdam, The Netherlands; 3Erasmus School of Economics, Erasmus University Rotterdam, The Netherlands; 4Department of Public Health, Erasmus University Medical Centre, Rotterdam, The Netherlands; 5EuroQol Research Foundation, Rotterdam, The Netherlands
ABSTRACT
Objective: The aim of this study was to test the hypothesis that level overlap and color coding can mitigate or even preclude the occurrence of attribute non-attendance in discrete choice experiments. Methods: A randomized controlled experiment with five experimental study arms was designed to investigate the independent and combined impact of level overlap and color coding on respondents' attribute non-attendance. The systematic differences between the study arms allowed for a direct comparison of observed dropout rates and of estimates of the average number of attributes attended to by respondents, obtained using augmented mixed logit models that explicitly incorporate attribute non-attendance. Results: In the base-case study arm, without level overlap or color coding, the observed dropout rate was 14% and respondents attended, on average, to only two of the five attributes. Introduced independently, level overlap and color coding each reduced the dropout rate to approximately 10% and increased attribute attendance to three attributes. The combination of level overlap and color coding, however, was most effective: it reduced the dropout rate to 8% and improved attribute attendance to four out of five attributes. The latter essentially removes the need to explicitly accommodate attribute non-attendance when analyzing the choice data. Conclusions: On the basis of the presented results, level overlap and color coding are recommendable strategies to reduce the dropout rate and improve attribute attendance in discrete choice experiments. Keywords: discrete choice experiment, attribute non-attendance, level overlap, color coding.
In health-related discrete choice experiment (DCE) research, it is generally assumed that participating survey respondents process all available information when choosing among different options. An extensive literature, however, has recognized that respondents in DCE surveys often simplify their choice tasks by ignoring one or more of the included attributes [1–14]. If not adequately accounted for, this can result in biased preference estimates and divergent willingness-to-pay estimates. In this paper, we postulate and evaluate the hypothesis that the use of level overlap and color coding can reduce, or even preclude, the occurrence of attribute non-attendance during the data collection itself. This can avoid the need for more advanced econometric models that specifically take attribute attendance into account and consequently simplify the statistical analyses after the data have been collected. Attribute level overlap is a DCE design characteristic that requires, in each choice task, that a certain number of attributes be presented at the same level. With different attributes being "overlapped" in different choice tasks, the use of dominant attribute strategies (e.g., always choosing the option with the highest level on one specific attribute) is no longer viable, and
respondents are thereby stimulated to evaluate all attributes when completing the choice tasks. In addition to level overlap, color coding and other visually informative presentations can be used to (further) reduce the level of task complexity and hence reduce the need for choice heuristics [15–19]. Moreover, when combined with level overlap, color coding automatically informs respondents about which levels are presented at the same value. Accordingly, color coding is expected to exert a stand-alone effect on attribute non-attendance as well as amplify the effect of level overlap.
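The mechanism can be illustrated with a small sketch (hypothetical attribute levels and a hypothetical "dominant" attribute, not the study's design code): when the dominant attribute is overlapped, the heuristic produces no answer and the respondent must consider the remaining attributes.

```python
# Illustrative sketch: a choice task is two alternatives, each described by
# five attribute levels (1 = best ... 5 = worst). A "dominant attribute"
# respondent always picks the alternative with the better level on one
# specific attribute, here (hypothetically) pain/discomfort at index 3.
PAIN = 3

def dominant_choice(alt_a, alt_b, attr=PAIN):
    """Return 0 or 1 for the alternative with the better (lower) level,
    or None when the levels overlap and the heuristic cannot decide."""
    if alt_a[attr] == alt_b[attr]:
        return None  # level overlap forces the respondent to look elsewhere
    return 0 if alt_a[attr] < alt_b[attr] else 1

# Without overlap, the heuristic always produces an answer:
assert dominant_choice([1, 2, 3, 4, 5], [2, 3, 4, 5, 1]) == 0
# With the pain attribute overlapped, it is uninformative:
assert dominant_choice([1, 2, 3, 4, 5], [2, 3, 4, 4, 1]) is None
```

With different attributes overlapped in different tasks, no single-attribute rule can resolve every choice, which is exactly the pressure toward full attribute attendance described above.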
Copyright © 2017, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.
Methods

Randomized Discrete Choice Experiment

A randomized controlled experiment with five experimental study arms was designed to investigate the separate and combined impact of level overlap and color coding on respondents' attribute non-attendance. To create sufficient potential for attribute non-attendance, a relatively large DCE was used with 21
* Address correspondence to: Marcel F. Jonker, Erasmus School of Health Policy & Management, Erasmus University Rotterdam, PO Box 1738, 3000DR Rotterdam, The Netherlands. E-mail:
[email protected]
http://dx.doi.org/10.1016/j.jval.2017.10.002
choice tasks per respondent based on the EQ-5D-5L [20]. The EQ-5D-5L is a generic health state instrument with five dimensions (i.e., mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). EQ-5D-5L health states are defined by selecting one of the five levels from each dimension, and preferences for each level were derived by asking respondents to repeatedly choose between two hypothetical health states (Fig. 1). The observed discrete choices were subsequently used to infer the respondents' preference weights attached to the various health problems in each dimension, always with the "no health problem" levels used as the reference category.

Fig. 1 – Visual presentation of the choice tasks. (1) "No color"; (2) "Shades of purple"; (3) "Highlighting of differences."
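As a concrete illustration of the descriptive system (a sketch, not part of the study's materials), the full EQ-5D-5L state space can be enumerated directly from the five-dimensions, five-levels structure:

```python
from itertools import product

# The five EQ-5D-5L dimensions named in the text; each takes one of five
# levels, conventionally coded 1 (no problems) to 5 (extreme problems).
DIMENSIONS = ["mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression"]

# A health state is one level per dimension, written as a 5-digit code,
# e.g. "21345".
states = ["".join(map(str, levels))
          for levels in product(range(1, 6), repeat=len(DIMENSIONS))]

assert len(states) == 5 ** 5 == 3125  # the full EQ-5D-5L state space
assert states[0] == "11111"           # full health, the reference state
```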
The DCE Design

Bayesian efficient design algorithms were used to optimize two separate DCE designs: one with level overlap and one without. The design without level overlap had all attributes at different levels. In contrast, the design with level overlap always had three attributes in each choice task fixed at the same level. This implied a maximum difference in level overlap between the two designs and hence maximum statistical power to establish differences in the dropout rate and attribute non-attendance. Both DCE designs were optimized on the basis of a D-efficiency criterion. To maximize the D-efficiency of the designs while accommodating substantial respondent heterogeneity, a so-called heterogeneous DCE design [21] was used. Heterogeneous DCE designs consist of several subdesigns that are optimized simultaneously, but each participating respondent is asked to complete only a single subdesign, so no additional effort from individual respondents is required. The Bayesian design optimization algorithms were implemented in Julia [22]. All designs were optimized for the standard conditional logit model based on a main-effects utility function. The required prior preference information was obtained from previous EQ-5D-5L research and updated after a pilot run of 200 respondents to maximize statistical efficiency. The Bayesian D-efficiency criterion was based on a Latin hypercube sample of 100 draws, optimized (also using Julia) to maximize the minimum distance between points. The optimization criterion was calculated as the weighted average Bayesian D-efficiency of the 21 choice tasks per subdesign, with one-third of the weight assigned to the combined (i.e., population) efficiency of the eight included subdesigns and two-thirds of the weight assigned to the individual D-efficiency of the subdesigns.
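The Bayesian D-efficiency criterion can be sketched as follows (an illustrative implementation, not the authors' Julia code; the toy design and random normal draws stand in for the optimized design and the Latin hypercube sample):

```python
import numpy as np

rng = np.random.default_rng(0)

def mnl_information(X, beta):
    """Fisher information matrix of a conditional logit (MNL) design.
    X: (tasks, alternatives, K) attribute coding; beta: (K,) parameters."""
    K = X.shape[2]
    info = np.zeros((K, K))
    for Xt in X:                       # accumulate one choice task at a time
        p = np.exp(Xt @ beta)
        p /= p.sum()                   # choice probabilities within the task
        Z = Xt - p @ Xt                # deviations from the weighted mean
        info += Z.T @ (p[:, None] * Z)
    return info

def bayesian_d_error(X, prior_draws):
    """Average D-error over prior parameter draws (lower is better);
    Bayesian D-efficient design algorithms minimize this quantity."""
    K = X.shape[2]
    errs = [np.linalg.det(mnl_information(X, b)) ** (-1 / K)
            for b in prior_draws]
    return float(np.mean(errs))

# Toy example: 10 pairwise tasks, 4 parameters, 100 prior draws.
X = rng.standard_normal((10, 2, 4))
draws = 0.5 * rng.standard_normal((100, 4))
print(bayesian_d_error(X, draws))
```

A design optimizer would search over candidate designs X, subject to constraints such as "exactly three attributes overlapped per task", to minimize this averaged D-error.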
Color Coding

Both DCE designs (i.e., with and without level overlap) were presented to respondents with and without color coding. Figure 1 presents examples of the different layouts that were used. As shown, the format without color coding uses no visual aids except for the use of bold face to highlight specific levels. The intensity color coding scheme implements the color-blind optimized shades of purple, as previously used by Jonker et al. [23,24], with the darkest purple used to denote the worst level and lighter purple used to denote better EQ-5D-5L levels. This color coding
scheme has been specifically optimized to signal differences in attribute levels for individuals with red–green color blindness (i.e., the most prevalent form of color blindness) while keeping the text readable for respondents with other forms of color blindness. Finally, the highlighting layout (as previously used by Norman et al. [25], for example) emphasizes the differences between choice options via different background colors and bold black lines. This color coding format inherently depends on some degree of overlap and, as such, could not be used in combination with the DCE design without level overlap. Hence, as shown in Table 1, there were five experimental study arms.
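The intensity scheme rests on a single-hue lightness ramp: ordering levels by luminance rather than hue keeps the ordering visible to red–green color-blind viewers. A minimal sketch (the hex endpoints below are illustrative, not the published palette):

```python
# Hypothetical intensity ramp: lighter purple = better EQ-5D-5L level,
# darker purple = worse level. A single-hue ramp varies luminance, not hue,
# so the ordering survives red-green color blindness.
def purple_ramp(n=5, light=(237, 231, 246), dark=(84, 39, 143)):
    """Linearly interpolate n shades from light (level 1) to dark (level n)."""
    shades = []
    for i in range(n):
        t = i / (n - 1)
        rgb = tuple(round(l + t * (d - l)) for l, d in zip(light, dark))
        shades.append("#%02x%02x%02x" % rgb)
    return shades

print(purple_ramp())  # level 1 (lightest) ... level 5 (darkest)
```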
Respondent Recruitment and Survey Structure

A sample from the Dutch general population, nationally representative in terms of age, gender, education, and geographic region, was recruited online via Survey Sampling International. The sample size was intentionally large (i.e., exceeding 500 respondents per study arm) to provide ample statistical power to detect differences between the study arms. All participating survey respondents were randomly assigned to one of the five study arms and received a small financial compensation (i.e., approximately 1 euro) upon completion of the survey. It was recorded whether each respondent completed the survey and, if not, at which question the dropout occurred. The survey itself was structured as follows. First, the survey was briefly introduced, followed by a self-rating question in which respondents were asked to rate their current health in terms of the EQ-5D-5L health dimensions. This procedure familiarized respondents with the format and color coding of the health states used in the questionnaire. Subsequently, two warm-up questions introduced respondents to the layout and the type of trade-offs in the DCE questions. Then, the actual set of 21 pairwise choice tasks was shown in a randomized order. To avoid too much repetition, a few demographic background questions were included after DCE questions 7 and 14. Finally, at the end of the survey, respondents were asked several debriefing questions, including a ranking of the visual formats used in the surveys.
Statistical Analyses

Respondents' health state preferences were modeled with a standard mixed logit model extended with stochastic attribute selection [2]. Such a model accommodates the panel structure of the choice data, allows for correlated preference parameters, and can be used to calculate the (average) number of attributes attended to by survey respondents. In the standard mixed logit model, the utility for alternative a in choice task t for respondent i is modeled as the product of the EQ-5D-5L health state characteristics (X_{ita}) and health state preference parameters (\beta_i):

u_{ita} = \beta_i' X_{ita} + \varepsilon_{ita}    (1)

with \varepsilon_{ita} denoting an independent and identically Gumbel-distributed error term. The respondent-specific \beta-parameters are assumed to be multivariate normally distributed with a common sample mean, \mu, and covariance matrix, \Sigma:

\beta_i \sim \text{MultivariateNormal}(\mu, \Sigma)    (2)

The stochastic attribute selection is subsequently introduced using a vector, \tau_i, that is element-wise multiplied with \beta_i:

u_{ita} = (\tau_i \circ \beta_i)' X_{ita} + \varepsilon_{ita}    (3)

Each element in \tau_i has a value of 1 if the corresponding element in \beta_i is attended to, and a value of 0 otherwise. The \tau_i are thus assumed to be binary (i.e., Bernoulli) distributed with probability of success p:

\tau_{i[1:5]} \sim \text{Bernoulli}(p_{[1:5]})    (4)

Here, the subscript [1:5] denotes that there are only five binary indicators (i.e., one per EQ-5D dimension). Accordingly, all elements of \tau_i pertaining to a specific EQ-5D attribute are automatically set to the same value: 1 if the entire attribute is attended to (which implies that all corresponding elements in \beta_i are multiplied by 1), or 0 if the attribute is not attended to (which effectively sets all relevant elements of \beta_i in the utility specification to 0). Similar to Scarpa et al. [2], Bayesian methods were used to fit the models and calculate the number of attributes attended to in each study arm. The latter was implemented by monitoring the average of the sum of the \tau_{i[1:5]}. All models were implemented in the BUGS (Bayesian inference Using Gibbs Sampling) language and fitted using the OpenBUGS software, making use of two user-written extensions to speed up the likelihood computations and improve the efficiency of the Markov chain Monte Carlo sampling. Appendix A contains the full model code and specification of the priors.

Results

In total, 3394 respondents were recruited to participate in the survey. This resulted in 2656 completes (78%) and 738 dropouts (22%). Of the 738 dropouts, 298 (40%) occurred during the introduction of the survey, 302 (41%) during the DCE tasks, and another 138 (19%) after completion of the DCE choice tasks. The last group was caused by browser compatibility problems with the final ranking task of the visual formats. These respondents were not excluded but,
Table 1 – Number of completes, dropout rate, and attributes attended to per study arm.

Study arm  Overlap  Presentation       No. completes  No. dropouts*  Dropout rate  No. attributes attended, on average†
1          No       No color           553            89             13.9%         2.14 [1.86–2.40]
2          No       Shades of purple   554            60             9.8%          2.76 [2.50–3.03]
3          Yes      No color           567            60             9.6%          3.49 [3.25–3.72]
4          Yes      Highlighting       563            50             8.2%          3.78 [3.58–3.97]
5          Yes      Shades of purple   557            43             7.2%          3.98 [3.77–4.22]

* Dropouts occurring during the DCE choice tasks.
† 95% credible intervals in parentheses.
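As a quick arithmetic check, the dropout rates in Table 1 follow directly from the reported counts, with rate = dropouts / (completes + dropouts):

```python
# Reproducing the Table 1 dropout rates from the reported counts.
arms = {  # arm: (completes, dropouts during the DCE tasks)
    1: (553, 89), 2: (554, 60), 3: (567, 60), 4: (563, 50), 5: (557, 43),
}
for arm, (done, out) in arms.items():
    rate = 100 * out / (done + out)
    print(f"arm {arm}: {rate:.1f}%")
# arm 1: 13.9%, arm 2: 9.8%, arm 3: 9.6%, arm 4: 8.2%, arm 5: 7.2%
```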
instead, were combined with the other completes, resulting in a total of 2794 respondents in the statistical analyses.

Table 1 presents the number of completes, the observed dropout rate, and the estimated average number of attributes attended to for each experimental study arm. In the base-case study arm, with neither level overlap nor color coding, the dropout rate was 13.9% and respondents were estimated to take, on average, only 2.1 of the five attributes into account when choosing between health states. With the introduction of color coding, these numbers improved to 9.8% and 2.8 attributes, and with the introduction of level overlap, to 9.6% and 3.5 attributes, respectively. Interestingly, the combination of overlap and color coding was particularly effective: level overlap with highlighting reduced the dropout rate to 8.2% and increased attribute attendance to 3.8 attributes, whereas level overlap with intensity color coding reduced the dropout rate to 7.2% and increased respondents' attribute attendance to 4.0 attributes.
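The "average number of attributes attended to" is the posterior mean of the sum of the attendance indicators \tau. The data-generating process behind this quantity (Eqs. 1–4) can be sketched with a small simulation; all parameter values below are illustrative, and this is a forward simulation, not the fitted OpenBUGS model:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5  # one parameter per EQ-5D dimension (main-effects sketch)

# Illustrative population preference distribution (Eq. 2) and attendance
# probabilities (Eq. 4); not the study's fitted estimates.
mu = np.array([-1.0, -0.8, -0.6, -1.2, -0.7])
Sigma = 0.2 * np.eye(K)
p_attend = np.full(K, 0.7)

def simulate_choices(n_resp=500, n_tasks=21):
    """Generate pairwise choices from the mixed logit with stochastic
    attribute selection: betas masked element-wise by Bernoulli taus."""
    X = rng.standard_normal((n_resp, n_tasks, 2, K))       # attribute coding
    beta = rng.multivariate_normal(mu, Sigma, size=n_resp)  # Eq. 2
    tau = rng.binomial(1, p_attend, size=(n_resp, K))       # Eq. 4
    masked = tau * beta                                     # Eq. 3
    choices = []
    for i in range(n_resp):
        v = X[i] @ masked[i]                        # (n_tasks, 2) utilities
        prob_b = 1 / (1 + np.exp(-(v[:, 1] - v[:, 0])))  # logit for alt. B
        choices.append(rng.random(n_tasks) < prob_b)
    return np.array(choices), X, tau

choices, X, tau = simulate_choices()
# The quantity reported in Table 1: average sum of the attendance indicators.
print(tau.sum(axis=1).mean())  # close to 5 * 0.7 = 3.5 on average
```

In the actual analysis, \tau is latent and its posterior is sampled by MCMC; the simulation only illustrates what the monitored quantity measures.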
Discussion

On the basis of a randomized controlled discrete choice experiment, the use of level overlap and color coding was confirmed to significantly improve respondents' attribute attendance. The combination of level overlap with color coding was particularly effective: it increased the number of attributes taken into consideration by respondents from 2.1 attributes in the base-case scenario to 3.8 and 4.0 attributes for the "highlighting of differences" and "shades of purple" formats, respectively. Interestingly, respondents themselves also preferred the use of color coding in their DCE. In the final ranking task, 59% of respondents preferred the shades of purple, 23% preferred the highlighting of differences, and only 18% preferred the format without any color coding. Most importantly, with only one attribute unattended (by definition the least important attribute, with the smallest and least significant preference estimates), the use of level overlap and color coding essentially removes the necessity to explicitly model potential attribute non-attendance after the data have been collected. This is further illustrated by a direct comparison of all EQ-5D-5L health state values in each of the five samples, derived using statistical models that did and did not explicitly correct for potential attribute non-attendance (see Appendix B).

Overall, our results are congruent with those of Maddala et al. [26], who reported a (statistically insignificant) finding that respondents in a DCE design with level overlap were less likely to adopt a dominant attribute strategy. However, as already noted by Maddala et al., there is also a cost associated with the introduction of level overlap in the form of a reduction in the statistical efficiency of the DCE design [26]. In terms of the DCE sample size calculations proposed by De Bekker-Grob et al. [27], the implemented DCE design with overlap required on average 35 additional respondents when evaluated for a conditional logit model, and six additional respondents when evaluated for a mixed logit (MIXL) model, to obtain statistically significant results for all EQ-5D-5L parameters. In clinical research such an increase can be problematic; here, however, the loss in statistical efficiency was almost entirely compensated for by the lower dropout rate. Moreover, a minor reduction in statistical efficiency seems, from a quality perspective, a small price to pay for lower task complexity, a lower dropout rate, and more informative choice data when respondents (in a design without level overlap) would otherwise have resorted to simplifying choice heuristics.

Conclusions

On the basis of the presented results, we can recommend the use of level overlap and color coding as strategies to reduce the dropout rate and improve respondents' attribute attendance in discrete choice experiments. Not all DCE design generation software packages include level overlap options yet. However, the next update of the Ngene software package [28] is expected to include the capability to impose level overlap in DCE designs, and other design optimization packages, such as JMP [29], already include the option to specify a minimum amount of level overlap.

Funding

The authors gratefully acknowledge financial support from the EuroQol Research Foundation.

Disclaimer

The views and opinions expressed in this article are those of the authors and do not necessarily reflect those of the EuroQol Group.

Conflict of Interest

The authors have indicated that they have no conflicts of interest with regard to the content of this article.
Supplemental Materials

Supplemental material accompanying this article can be found in the online version as a hyperlink at http://dx.doi.org/10.1016/j.jval.2017.10.002 or, if a hard copy of the article, at www.valueinhealthjournal.com/issues (select volume, issue, and article).
REFERENCES
[1] Campbell D, Hutchinson WG, Scarpa R. Incorporating discontinuous preferences into the analysis of discrete choice experiments. Environ Resource Econ 2008;41:401–17. [2] Scarpa R, Gilbride TJ, Campbell D, et al. Modelling attribute nonattendance in choice experiments for rural landscape valuation. Eur Rev Agri Econ 2009;36(2):151–74. [3] Campbell D, Hensher DA, Scarpa R. Non-attendance to attributes in environmental choice analysis: a latent class specification. J Environ Plan Man 2011;54(8):1061–76. [4] Scarpa R, Zanoli R, Bruschi V, et al. Inferred and stated attribute nonattendance in food choice experiments. Am J Agri Econ 2012;95 (1):165–80. [5] Hensher DA, Rose J, Greene WH. The implications on willingness to pay of respondents ignoring specific attributes. Transportation 2005;32 (3):203–22. [6] Hensher DA, Rose JM. Simplifying choice through attribute preservation or non-attendance: implications for willingness to pay. Transportation Res Part E Logistics Transport Rev 2009;45(4):583–90. [7] Hensher DA, Rose JM, Greene WH. Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design. Transportation 2012;39(2):235–45. [8] Hess S, Hensher DA, Daly A. Not bored yet–revisiting respondent fatigue in stated choice experiments. Transportation Res Part A Policy Pract 2012;46(3):626–44. [9] Hess S, Stathopoulos A, Campbell D, et al. It’s not that I don’t care, I just don’t care very much: confounding between attribute non-attendance and taste heterogeneity. Transportation 2013;40(3):583–607. [10] Hole AR. A discrete choice model with endogenous attribute attendance. Econ Lett 2011;110(3):203–5.
[11] Lagarde M. Investigating attribute non‐attendance and its consequences in choice experiments with latent class models. Health Econ 2013;22(5):554–67. [12] Hole AR, Kolstad JR, Gyrd-Hansen D. Inferred vs. stated attribute nonattendance in choice experiments: a study of doctors’ prescription behaviour. J Econ Behav Org 2013;96:21–31. [13] Hole AR, Norman R, Viney R. Response patterns in health state valuation using endogenous attribute attendance and latent class analysis. Health Econ 2016;25(2):212–24. [14] Erdem S, Campbell D, Hole AR. Accounting for attribute‐level non‐ attendance in a health choice experiment: does it matter? Health Econ 2015;24(7):773–89. [15] Hawley ST, Zikmund-Fisher B, Ubel P, et al. The impact of the format of graphical presentation on health-related knowledge and treatment choices. Patient Educ Couns 2008;73(3):448–55. [16] Hauber AB, Johnson FR, Grotzinger KM, et al. Patients’ benefit-risk preferences for chronic idiopathic thrombocytopenic purpura therapies. Ann Pharmacother 2010;44(3):479–88. [17] Gonzalez JM, Johnson FR, Runken MC, et al. Evaluating migraineurs’ preferences for migraine treatment outcomes using a choice experiment. Headache J Head Face Pain 2013;53(10):1635–50. [18] Mühlbacher AC, Bethge S. Reduce mortality risk above all else: a discrete-choice experiment in acute coronary syndrome patients. Pharmacoeconomics. 2015;33(1):71–81. [19] Mühlbacher A, Bethge S. First and foremost battle the virus: eliciting patient preferences in antiviral therapy for hepatitis C using a discrete choice experiment. Value Health. 2016;19(6):776–87.
[20] Herdman M, Gudex C, Lloyd A, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res 2011;20(10):1727–36. [21] Sandor Z, Wedel M. Heterogeneous conjoint choice designs. J Market Res 2005;42(2):210–8. [22] Bezanson J, Karpinski S, Shah VB, et al. Julia: a fast dynamic language for technical computing. arXiv preprint arXiv:12095145 2012. [23] Jonker MF, Stolk EA, Donkers B. Valuing EQ-5D-5L using DCE Duration: too complex or not? EuroQol Plenary 2012. [24] Jonker MF, Attema AE, Donkers B, et al. Are health state valuations from the general public biased? A test of health state reference dependency using self‐assessed health and an efficient discrete choice experiment. Health Econ 2016 Oct 27. http://dx.doi.org/10.1002/hec. 3445. [Epub ahead of print]. [25] Norman R, Viney R, Aaronson N, et al. Using a discrete choice experiment to value the QLU-C10D: feasibility and sensitivity to presentation format. Qual Life Res 2016;25(3):637–49. [26] Maddala T, Phillips KA, Reed Johnson F. An experiment on simplifying conjoint analysis designs for measuring preferences. Health Econ 2003;12(12):1035–47. [27] de Bekker-Grob EW, Donkers B, Jonker MF, et al. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. Patient 2015;8(5):373–84. [28] Rose J, Collins A, Bliemer M, et al. NGENE software, version: 1.1.2. Build 2014. [29] JMP®, Version 13. Cary, NC: SAS Institute Inc., 1989.