
Journal of Health Economics 26 (2007) 463–482

Exploring preference anomalies in double bounded contingent valuation

Verity Watson∗, Mandy Ryan∗

Health Economics Research Unit, Institute of Applied Health Sciences, University of Aberdeen, Polwarth Building, Foresterhill, Aberdeen, AB25 9ZD, UK

Received 7 February 2006; received in revised form 24 October 2006; accepted 25 October 2006. Available online 22 November 2006.

Abstract

Double bounded dichotomous choice (DBDC) contingent valuation offers increased efficiency of willingness to pay (WTP) estimates compared with the single bounded format. However, evidence suggests DBDC generates anomalous respondent behaviour. This paper provides the first investigation and explanation of these anomalies in health. Results suggest the incentives for truthful preference revelation are altered in the presence of a follow up question. This result is found using both regression techniques and analysis of raw responses. Although findings suggest 'very certain' respondents exhibit less anomalous behaviour, inconsistencies remain across bounds. The results of this study question the use of iterative valuation formats.
© 2006 Elsevier B.V. All rights reserved.

JEL classification: D12; D60; I10

Keywords: Contingent valuation; Anomalies; Prospect theory; Anchoring; Calibration

∗ Corresponding authors. Tel.: +44 1224 555937; fax: +44 1224 550926. E-mail addresses: [email protected] (V. Watson), [email protected] (M. Ryan).

0167-6296/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2006.10.009

1. Introduction

Whilst contingent valuation is increasingly used in health economics to estimate willingness to pay (WTP) (Diener et al., 1998; Klose, 1999; Smith, 2003), the appropriate elicitation format is a continuing source of debate (Smith, 2003). There is a consensus that an open-ended (OE) approach is not appropriate (Arrow et al., 1993; Donaldson et al., 1997). The payment card (PC) approach has proved popular (Smith, 2003). Proponents of this method argue it mimics real life, allowing individuals to "shop around" for a value that is the most they would pay (Donaldson et al., 1997). Following the National Oceanic and Atmospheric Administration (NOAA) panel recommendations (Arrow et al., 1993), there exist many applications of the single bounded dichotomous choice (SBDC) approach in both health and environmental economics (Smith, 2003; Bateman et al., 2002). The advantages of this approach include cognitive simplicity for respondents and reduced incentives for strategic behaviour (Hoehn and Randall, 1987; Carson et al., 1999). The SBDC approach, however, provides limited information regarding respondents' true WTP, with larger sample sizes required for an equivalent level of statistical precision compared with PC or OE elicitation formats.

To increase statistical efficiency the initial dichotomous choice question (DC1) can be supplemented with a second dichotomous choice question (DC2), resulting in the double bounded dichotomous choice (DBDC) format. Responses to DC1 determine the bid offered in DC2: if respondents state 'yes' in DC1, a higher bid is offered in DC2; conversely, if respondents state 'no' in DC1, a lower bid is offered in DC2. Hanemann et al. (1991) provided empirical support for increased statistical efficiency through the reduced variance of WTP estimates from DBDC experiments. However, a well-documented result from DBDC applications is that resultant welfare estimates are lower than those from SBDC (Hanemann et al., 1991; Cameron and Quiggan, 1994; McFadden, 1994; Alberini et al., 1997; Clarke, 2000; Bateman et al., 2001; Kennedy, 2002; Whitehead, 2002). Moreover, responses to DC2 are dependent on DC1 (Alberini et al., 1997; Cameron and Quiggan, 1994). Within health economics applications of the DBDC format are limited, although there appears to be growing interest in the method (Clarke, 2000; Liu et al., 2000; Kennedy, 2002; Asfaw and von Braun, 2005; Hackl and Pruckner, 2005; Prosser et al., 2005). Of the existing studies only Clarke (2000) and Kennedy (2002) reported comparisons of WTP estimates from DBDC and SBDC models. Both found lower WTP estimates from the DBDC model; neither study investigated the behavioural motivations underlying their results. Given the evidence of anomalous preference expressions, questions are raised concerning the behavioural motivations within DBDC studies.

A number of specific explanations for the anomalous behaviour observed in DBDC studies have been proposed. Respondents may use the information provided by DC1 to inform their response to DC2 through anchoring or starting point bias (Boyle et al., 1985; Herriges and Shogren, 1996; Boyle et al., 1997). Alternatively, applying prospect theory (Kahneman and Tversky, 1979), respondents would frame DC2 as a gain or loss against the initial bid in DC1 (DeShazo, 2002). Carson et al. (1999) proposed cost based expectations: here respondents assume the initial bid amount represents the cost of providing the good, so a higher DC2 is seen as an attempt by the government to raise additional revenue, and a lower DC2 implies a lower quality good will be provided. Bateman et al. (1999) argued that 'guilt and indignation' affect respondents when answering a DBDC question. Indignation occurs when respondents perceive that they struck a deal with the interviewer in DC1; when asked a higher bid amount in DC2, respondents feel the interviewer has reneged on the deal.
Conversely, guilt (or a sense of social responsibility) occurs when respondents state 'no' to DC1 and are then asked a lower bid amount in DC2. Yea saying has been proposed as an explanation of SBDC resulting in higher welfare estimates than OE or PC CV approaches (Holmes and Kramer, 1995; Kanninen, 1995; Ready et al., 1996; Frew et al., 2003; Ryan et al., 2004). In DBDC, yea saying would manifest as a tendency for respondents to say 'yes' to all bids, resulting in higher welfare estimates. Finally, strategic behaviour may be induced where respondents perceive that DC2 indicates uncertainty and price flexibility, resulting in respondents being provided with an incentive to understate true WTP (Carson et al., 1999).
This paper considers the application of DBDC to estimate WTP within health, and investigates whether preference anomalies are observed. The application is concerned with preferences for the provision of a national air ambulance service (AAS). Following estimation of WTP, a number of explanations for any divergence between DC1 and DC2 are investigated. Consideration is also given to how preference anomalies differ according to respondents' certainty when completing the CV task. Consequently, this study not only provides the first consideration of these issues in a health care context, but also tests the generalisability of DBDC results from other disciplines (Hunter, 2001) and the robustness of these findings when respondent certainty is taken into account.

Section 2 discusses the study design, methods of analysis for DBDC data, and tests of preference anomalies. These tests can be split into two groups: (i) tests incorporated into the analytical framework and (ii) tests exploiting the study design to consider aggregate response patterns across DC1 and DC2. Section 3 presents the results and Section 4 discusses these results.

2. Methods

2.1. The experiment

This study sought public perceptions of, and WTP for, the provision of a national air ambulance service.1 Between August and September 2002, a representative sample of 1400 members of the public was interviewed using computer assisted telephone interviews (CATI). For the general survey results see Johnston and Ryan (2002). Prior to the DBDC questions respondents were asked if they were willing to pay anything for the good. Respondents stating 'yes' were offered a randomly assigned 'base bid'; respondents stating 'no' were asked the reasons for their response. The objective of this paper is to consider anomalies in DBDC, thus respondents who stated 'no' to the screening question are not considered in the analysis.2

Advanced disclosure has been argued to reduce the likelihood of preference anomalies; accordingly respondents were informed they would be presented with two valuation questions. Respondents were told the second valuation question was dependent on their response to the initial question: the bid amount would be higher if they stated 'yes' and lower if they stated 'no' to the initial valuation question.

Respondents were randomly presented one of five 'base bids' (DC1): £25, £50, £100, £200, and £300. If a 'yes' ('no') response was given to DC1, respondents were offered a higher (lower) 'follow up bid' (DC2). The DC2 bid levels for each base bid were: £25 (lower = £10, higher = £50); £50 (lower = £25, higher = £100); £100 (lower = £50, higher = £200); £200 (lower = £100, higher = £300); and £300 (lower = £200, higher = £400).3 The experimental design of the bid vector was chosen to ensure overlapping bid levels across bounds. A feature of this design is the ability to compare responses to the same bid amount (e.g. £50) across bounds. For instance, responses to the bid amount £50 can be compared when £50 is presented as the base bid or the follow up bid. From the design, £50 is a follow up bid both after a lower base bid (£25) and after a higher base bid (£100). The experimental design permits consideration of four bid groupings (£50, £100, £200, and £300).

1 Whilst the results presented provide insight into the value the community places on air ambulances, the primary objective is to investigate anomalies in DBDC data.
2 In the valuation exercise those who answered 'no' to the screening question were asked why, to distinguish 'protestors' from genuine zero values (Johnston and Ryan, 2002).
3 The bid levels were obtained from a CV survey of air ambulances conducted in the Grampian Region of Scotland (Ryan et al., 2004). Response monitoring ensured an equal split of respondents across bid levels.
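To make the design concrete, the sketch below (our illustration with invented names, not the study's survey software) encodes the bid vector and the response pattern a truthful respondent with a given WTP would produce:

```python
# Base bid -> (lower follow up, higher follow up), in pounds, as above.
BID_DESIGN = {25: (10, 50), 50: (25, 100), 100: (50, 200),
              200: (100, 300), 300: (200, 400)}

def dbdc_responses(wtp: float, base_bid: int) -> tuple:
    """DC1/DC2 answers of a truthful respondent with the given WTP."""
    lower, higher = BID_DESIGN[base_bid]
    yes1 = wtp >= base_bid
    t2 = higher if yes1 else lower      # DC2 bid depends on the DC1 answer
    yes2 = wtp >= t2
    return yes1, yes2

# E.g. WTP = £120 at a £100 base bid: 'yes' to £100 then 'no' to the £200
# follow up, bounding WTP in [100, 200).
assert dbdc_responses(120, 100) == (True, False)
```

The overlapping structure is visible directly: £50 appears both as a base bid and as the follow up bid after £25 and after £100.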

Following both DC1 and DC2 respondents were asked to assess their degree of certainty in their response. Certainty was measured on a 5-point scale (1 = very uncertain to 5 = very certain). Respondents' certainty following DC1 is used to distinguish between 'very certain' respondents (certainty = 5) and 'less certain' respondents (certainty = 1, 2, 3, 4). Degree of certainty has previously been used in stated preference (SP) studies to filter 'false' yes responses. Evidence suggests that if response data are calibrated by certainty, resultant WTP estimates have higher external validity (Johannesson et al., 1998; Blumenschein et al., 1998; Blumenschein et al., 2001).

2.2. Data analysis

Table 1 provides a summary of the regression models estimated to investigate both welfare estimates from DBDC experiments (Models I–IV) and preference anomalies (Models V–VIII).

2.2.1. Estimating willingness to pay using DBDC data

Responses to DBDC data, as with SBDC data, are analysed within the random utility model (McFadden, 1974; Hanemann, 1984). Let t1 be the base bid at DC1 and t2 be the follow up bid at DC2. Possible responses are:

Yes|Yes ⇒ WTP ≥ t2
Yes|No ⇒ t1 ≤ WTP < t2
No|Yes ⇒ t1 > WTP ≥ t2
No|No ⇒ WTP < t2

Following this:

WTPij = zijβ + εij    (1)

where WTPij is the jth individual's WTP, i = 1, 2 indexes DC1 and DC2, respectively, zij is a vector of variables relating to the individual and β the corresponding parameter vector. The error term, εij, incorporates both individual and question specific error. Thus, Eq. (1) incorporates the notion that individual j may respond differently to each question, i.

Table 1
Summary of empirical models

| Model | Description |
|---|---|
| Analysis and welfare estimates from DBDC | |
| Model I | Bivariate probit |
| Model II | Interval data model |
| Model III | Probit analysis (DC1 only) |
| Model IV | Probit analysis (DC2 only) |
| Models of preference anomalies | |
| Model V | Random effects probit (naïve model) |
| Model VI | Shift model (random effects probit) |
| Model VII | Anchoring model (random effects probit) |
| Model VIII | Shift and anchoring model (random effects probit) |

Combining Eq. (1) with the response descriptors above, the probability of respondent j answering 'yes' to DC1 and 'no' to DC2 is expressed as:

Pr(yes, no) = Pr(WTP1j ≥ t1, WTP2j < t2)
            = Pr(z1β + ε1j ≥ t1, z2β + ε2j < t2)    (2)

Expanding to incorporate all response combinations results in the likelihood function:

Lj(zβ|t) = Pr(z1β + ε1j ≥ t1, z2β + ε2j < t2)^YN
         × Pr(z1β + ε1j ≥ t1, z2β + ε2j ≥ t2)^YY
         × Pr(z1β + ε1j < t1, z2β + ε2j < t2)^NN
         × Pr(z1β + ε1j < t1, z2β + ε2j ≥ t2)^NY    (3)

where YN, YY, NN and NY are indicators equal to one for the observed response pattern and zero otherwise.

Assuming the error terms ε1j and ε2j are normally distributed with mean zero and variances σ1² and σ2², respectively, and allowing for correlation between DC1 and DC2, expressed by ρ, Eq. (3) is estimated using the bivariate probit model (Cameron and Quiggan, 1994). This is referred to as Model I. A restricted version of the bivariate probit model is the interval data model (Hanemann et al., 1991).4 Here responses to DC1 and DC2 are assumed to be motivated by the same latent WTP value, observed differences are due to randomness in the underlying WTP distribution, and ρ = 1. Thus:

z1jβ = z2jβ
ε1j = ε2j    (4)
σ1 = σ2

These restrictions are tested using a likelihood ratio test comparing the bivariate probit (Model I) and the interval data model (Model II). Analysing responses to DC1 and DC2 separately (referred to as Models III and IV, respectively), as if elicited from independent DC experiments, assumes no correlation between the responses (ρ = 0). Again, this restriction is tested using a likelihood ratio test between Model I and Models III and IV. Further, comparing the interval data model with responses to DC1 only (Model III) provides a test of the common finding that DBDC results in lower WTP estimates than SBDC. For simplicity all models are estimated with only a constant and the bid vector. Models I–IV are re-estimated for 'very certain' and 'less certain' respondents to test whether any observed preference anomalies persist. The restriction that preferences are the same across 'very certain' and 'less certain' respondents is tested using a likelihood ratio test. Given evidence that 'very certain' responses have higher external validity, it is hypothesised that preference anomalies may differ across groups.

4 The interval data model is the most commonly used method of analysis for DBDC data. Further, Alberini (1995) found welfare estimates from the model were relatively unbiased even when ρ was as low as 0.2.
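To illustrate how Eq. (3) is taken to data, the following is a minimal sketch (ours, not the authors' estimation code) of the bivariate probit log-likelihood for the specification used here, a constant and the bid only, so that WTPi = ai + ei and a respondent says 'yes' at bound i when ai + ei ≥ ti:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

def neg_loglik(params, t1, t2, y1, y2):
    """params = (a1, a2, log s1, log s2, atanh rho); y1, y2 are 0/1 arrays."""
    a1, a2, ls1, ls2, zr = params
    s1, s2, rho = np.exp(ls1), np.exp(ls2), np.tanh(zr)
    v1 = (t1 - a1) / s1                    # standardised bid thresholds
    v2 = (t2 - a2) / s2
    ll = 0.0
    for v1j, v2j, y1j, y2j in zip(v1, v2, y1, y2):
        q1, q2 = 2 * y1j - 1, 2 * y2j - 1  # +1 for 'yes', -1 for 'no'
        # For standard bivariate normal (u1, u2) with correlation rho,
        # P(q1*u1 > q1*v1, q2*u2 > q2*v2) equals the bivariate normal CDF
        # at (-q1*v1, -q2*v2) with correlation q1*q2*rho.
        p = multivariate_normal.cdf(
            [-q1 * v1j, -q2 * v2j], mean=[0.0, 0.0],
            cov=[[1.0, q1 * q2 * rho], [q1 * q2 * rho, 1.0]])
        ll += np.log(max(p, 1e-300))
    return -ll

# Fit Model I by maximum likelihood, e.g.:
# res = minimize(neg_loglik, x0=[100, 100, 4, 4, 0],
#                args=(t1, t2, y1, y2), method="Nelder-Mead")
```

Model II (the interval data model) is the special case a1 = a2, s1 = s2 and ρ = 1 of Eq. (4); Models III and IV each use one bound in isolation.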


2.2.2. Incorporating preference anomalies into empirical models

Regression analysis is used to investigate any preference anomalies found in Models I–IV above. The hypothesis tested is that respondents' preferences shift between DC1 and DC2. Assuming responses to DC1 are based on respondents' true WTP, responses to DC2 are based on respondents' true WTP plus the effect of the follow up question. This effect is captured through the inclusion of a structural shift parameter, δ (Alberini et al., 1997):

WTP2 = WTP1 + δ    (5)

A negative coefficient indicates the follow up question increases respondents' probability of rejecting the bid amount in DC2. This finding is consistent with several proposed explanations for the divergence between SBDC and DBDC, including prospect theory, indignation, and cost expectations. A positive coefficient would be consistent with yea saying.

Anchoring (or starting point bias) can also be incorporated into this model (Herriges and Shogren, 1996). Again responses to DC1 are based on respondents' true WTP. However, respondents faced with the second bid level, t2, assume the true value of the good lies between t1 and t2. Accordingly, WTP expressed at DC2 (WTP2) is a weighted average of true WTP (WTP1) and the bid level t1:

WTP2 = (1 − γ)WTP1 + γt1    (6)

where 0 ≤ γ ≤ 1. If γ = 0 no anchoring is present and WTP2 = WTP1 (responses to both DC1 and DC2 are based on the same underlying WTP). However, if γ > 0 anchoring is present and WTP2 ≠ WTP1. As γ tends to 1 anchoring increases, and WTP2 tends to the base bid, t1. The presence of a structural shift and anchoring are tested using:

WTPi = β0 + βt ti + βD D + γt1 D    (7)

where β0 is the constant term, incorporating respondents' preference for the good's provision, and ti is the bid amount at bound i, i = 1, 2. A shift effect can be tested through consideration of δ = βD, where D is a dummy variable equal to 1 for DC2 and equal to 0 for DC1. Under the weighted average hypothesis 0 < γ < 1, where γ is the coefficient on t1 when it is included as a covariate in modelling responses to DC2. For DC1 (D = 0), where there is no shift effect of the follow up question and no anchoring, Eq. (7) reduces to:

β0 + βt t1    (8)

For DC2 (D = 1), Eq. (7) becomes:

β0 + βt t2 + βD + γt1    (9)

Four random effects probit models are estimated (see Table 1): the naïve model with no controls (Model V); the shift effect model (Model VI); the anchoring model (Model VII); and the combined shift and anchoring model (Model VIII). These models are estimated for the whole sample and for subgroups of respondents according to reported response certainty at the base bid (certainty = 5 and certainty = 1, 2, 3, 4). Respondents reporting certainty = 5 are assumed to hold well-defined preferences and to be less likely to base subsequent valuations on information provided in DC1. In this case the shift and anchoring parameters are expected to be insignificant.
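As a sketch of how Eqs. (7)–(9) translate into an estimable model (our construction, with illustrative column names, not the authors' code), the snippet below builds the stacked design matrix from a long-format data set with one row per respondent per bound. The paper estimates random effects probits; for brevity this illustration fits a pooled probit that ignores the respondent-level random effect:

```python
import pandas as pd
import statsmodels.api as sm

def shift_anchor_probit(df: pd.DataFrame):
    """df columns: bound (1 or 2), bid (t_i), base_bid (t_1), yes (0/1)."""
    D = (df["bound"] == 2).astype(float)     # follow up dummy
    X = pd.DataFrame({
        "const": 1.0,                        # beta_0
        "bid": df["bid"],                    # beta_t * t_i
        "D": D,                              # shift effect, delta = beta_D
        "t1_x_D": df["base_bid"] * D,        # anchoring term, gamma
    })
    return sm.Probit(df["yes"], X).fit(disp=False)   # Model VIII analogue
```

Dropping "t1_x_D" gives the shift model (VI), dropping "D" gives the anchoring model (VII), and dropping both gives the naïve model (V).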


2.2.3. Investigating anomalies through response patterns

Consideration is given here to the pattern of raw responses to explain preference anomalies (DeShazo, 2002). Hanemann and Kanninen (1999) tested response consistency using a nonparametric test in which the unconditional probability of stating 'yes' to a bid amount, t, in DC1 is compared with the conditional probability of accepting the same bid in DC2, having stated 'yes' to a lower bid, tL, or 'no' to a higher bid, tH. Accordingly, it is expected that consistent responses will satisfy the following conditions:

Pr{'yes' to t} = Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}

or

Pr{'yes' to t} = Pr{'yes' to t | 'no' to tH} × Pr{'no' to tH}

Applying this framework permits the proposed explanations for the divergence between SBDC and DBDC to be investigated.
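Each such condition is assessed with a simple two-sample test of proportions (as noted at the end of this section). A minimal sketch, using approximate counts read from Table 2 for the £25/£50 grouping, is:

```python
# 103 of 150 respondents said 'yes' when £50 was the base bid, while 42 of
# the 157 offered a £25 base bid followed the ascending path 'yes' to £25
# then 'yes' to £50; under consistent preferences these proportions should
# not differ significantly.
from statsmodels.stats.proportion import proportions_ztest

z, p = proportions_ztest(count=[103, 42], nobs=[150, 157])
print(f"P(yes to £50) vs P(Y25 Y50): z = {z:.2f}, p = {p:.4f}")
```

Table 5a shows that, in the data, these two proportions in fact differ sharply (67.8% versus 26.6%).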


2.2.3.1. Prospect theory. Prospect theory (Kahneman and Tversky, 1979) assumes respondents form a reference point in answering 'yes' to DC1 (respondents answering 'no' are assumed not to form a reference point). A higher follow up bid at DC2 is compared with this reference point and is negatively framed, thus:

Pr{'yes' to t} > Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}
Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} > Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

and

Pr{'no' to tL} = Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

2.2.3.2. Guilt and indignation. Bateman et al. (2001) proposed that indignation occurs when respondents believe they struck a deal with the interviewer in answering 'yes' to DC1. This increases the probability of a negative follow up response. Conversely, when respondents state 'no' to DC1 guilt, or a sense of social responsibility, increases the probability of a positive response to the follow up question. The effect in the middle interval is dependent on the relative strength of the guilt and indignation effects (denoted '?' below):

Pr{'yes' to t} > Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}
Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} ? Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

and

Pr{'no' to tL} > Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

2.2.3.3. Cost expectations. Carson et al. (1999) argue that respondents may interpret DC1 as the cost of providing the good. Accordingly, when respondents state 'yes' to DC1, DC2 is seen as an attempt by government to obtain additional funds beyond the actual cost of the good, reducing conditional 'yes' responses thus:

Pr{'yes' to t} > Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}

Conversely, following a 'no' response to DC1, respondents perceive a lower quality good will be provided, reducing conditional 'yes' responses to DC2 thus:

Pr{'no' to tL} < Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

When the hypotheses at the upper and lower intervals are combined, the predicted impact on the middle interval is:

Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} > Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

Under the cost expectations explanation respondents do not express their valuation of the good in DC2 but rather react to the new information, based on the expectation that the good will 'cost' the DC1 amount.

2.2.3.4. Strategic behaviour. Proponents of this explanation suggest the presence of DC2 indicates price flexibility, compromising incentive compatibility and provoking strategic behaviour (Carson et al., 1999; DeShazo, 2002). It is argued that respondents are more likely to answer 'no' to DC2, regardless of whether DC2 is in the ascending or descending sequence. Thus:

Pr{'yes' to t} > Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}
Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} > Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

and

Pr{'no' to tL} < Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

2.2.3.5. Yea saying. Yea-sayers will state 'yes' to any bid offered (Holmes and Kramer, 1995; Kanninen, 1995; Ready et al., 1996). This should not affect the descending sequence, given that this starts with a 'no'; the descending sequence is therefore anomaly free and is used as a reference with which to compare the ascending sequence. The probability of respondents answering 'yes' to both DC1 and DC2 will be greater than the probability of stating 'yes' to DC1. Thus:

Pr{'yes' to t} < Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}
Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} < Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

and

Pr{'no' to tL} < Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

2.2.3.6. Anchoring. The weighted average anchoring explanation assumes follow up responses at DC2 are based on a weighted average of the DC1 and DC2 bids (Herriges and Shogren, 1996). Respondents in ascending sequences form an anchor at DC1, with DC2 interpreted as a lower weighted average bid, increasing the probability of 'yes' to DC2:

Pr{'yes' to t} < Pr{'yes' to t | 'yes' to tL} × Pr{'yes' to tL}


Respondents in descending sequences anchor responses at the higher DC1 level, and DC2 is interpreted as a higher weighted average bid, decreasing the probability of 'yes' to DC2:

Pr{'no' to tL} < Pr{'no' to tL | 'no' to t} × Pr{'no' to t}

The effect upon the middle interval of the sequence is dependent on the relative strengths of the effects in the upper and lower intervals:

Pr{'no' to t | 'yes' to tL} × Pr{'yes' to tL} ? Pr{'yes' to tL | 'no' to t} × Pr{'no' to t}

Raw responses to DC1 and DC2 from the DBDC experiment are used to investigate the above hypotheses. The design of the bid vector allows comparison across four bid groupings (£25 and £50; £50 and £100; £100 and £200; £200 and £300). The significance of differences between conditional and unconditional probabilities is tested using a test of proportions. Response patterns are again investigated according to respondents' self reported certainty following DC1. A priori, 'less certain' respondents are expected to be more susceptible to using the information presented in DC1 to inform their valuations in DC2.

3. Results

A summary of responses across bid levels, for the full sample and the certainty subgroups, is presented in Table 2. The a priori expectation that the probability of 'yes' falls as the bid level increases was fulfilled for all the data. For the full sample the proportion of 'yes' responses to DC1 ranged from 76% for £25 to 23% for £300. A similar pattern was observed for the proportion of 'yes' responses to DC2 in the lower bound (76–21%) and the upper bound (35–20%). Across bounds 'very certain' respondents were more likely to state 'yes' to lower bid levels and less likely to state 'yes' to higher bid levels. For example, in the case of 'very certain' respondents (certainty = 5), the probability of 'yes' to the initial bid fell from 80% for £25 to 15% for £300; for 'less certain' respondents (certainty = 1, 2, 3, 4) the probability of 'yes' to the initial bid fell from 72% to 31%.

3.1. Comparison of willingness to pay estimates

Estimates of mean WTP from Models I–IV (Table 3) were in line with existing evidence. Willingness to pay estimated from DBDC data using the interval data model (Model II) was lower than that from the SBDC model of initial bids (Model III) (£104.34 < £141.91). In the bivariate probit model (Model I) rho (ρ) was negative and significant, implying a negative correlation between responses to DC1 and DC2 (ρ = −0.239; χ²(d.f.) = 9.928(1)). This was confirmed by the WTP estimates, where WTP estimated from Model III, using only initial responses, was higher than WTP estimated from responses to DC2 (Model IV). Likelihood ratio tests indicated the restrictions imposed by Models II, III and IV were invalid compared to the unrestricted bivariate probit model (Model I) (Model I versus Model II: 239.71 ∼ χ²(2); Models III and IV versus Model I: 8.32 ∼ χ²(2)). A likelihood ratio test indicated the restriction of equal preferences for 'very certain' and 'less certain' respondents was not valid in Models I–III (96.7 ∼ χ²(2), 53.54 ∼ χ²(2), and 16.52 ∼ χ²(2), respectively). For Model IV the likelihood ratio test statistic (0.16 ∼ χ²(2)) cannot reject the restriction that preferences were the same across 'very certain' and 'less certain' respondents.

Table 2
Responses to DBDC bid levels

Full sample

| Base bid (DC1) | N | Yes N (%) | No N (%) | Upper bound (DC2) | Yes N (%) | No N (%) | Lower bound (DC2) | Yes N (%) | No N (%) |
|---|---|---|---|---|---|---|---|---|---|
| £25 | 157 | 120 (76.4) | 37 (23.6) | £50 | 42 (35) | 78 (65) | £10 | 28 (75.7) | 9 (24.3) |
| £50 | 150 | 103 (68.6) | 47 (31.3) | £100 | 35 (33.9) | 68 (66.1) | £25 | 31 (65.9) | 16 (34.1) |
| £100 | 172 | 92 (53.4) | 80 (46.5) | £200 | 20 (21.7) | 72 (78.3) | £50 | 44 (55) | 36 (45) |
| £200 | 151 | 56 (37.1) | 95 (62.9) | £300 | 13 (23.2) | 43 (76.8) | £100 | 42 (44.2) | 53 (55.8) |
| £300 | 172 | 39 (22.7) | 133 (77.3) | £400 | 8 (20.5) | 31 (79.5) | £200 | 28 (21) | 105 (79) |

'Very certain' (certainty = 5)

| Base bid (DC1) | N | Yes N (%) | No N (%) | Upper bound (DC2) | Yes N (%) | No N (%) | Lower bound (DC2) | Yes N (%) | No N (%) |
|---|---|---|---|---|---|---|---|---|---|
| £25 | 85 | 68 (80) | 17 (20) | £50 | 32 (47.1) | 36 (52.9) | £10 | 14 (82.4) | 3 (17.6) |
| £50 | 69 | 47 (68.1) | 22 (31.9) | £100 | 21 (44.7) | 26 (55.3) | £25 | 11 (50) | 11 (50) |
| £100 | 81 | 42 (51.9) | 39 (48.1) | £200 | 14 (33.3) | 28 (66.7) | £50 | 18 (46.2) | 21 (53.8) |
| £200 | 77 | 18 (23.4) | 59 (76.6) | £300 | 7 (38.9) | 11 (61.1) | £100 | 22 (37.3) | 37 (62.7) |
| £300 | 96 | 15 (15.6) | 81 (84.4) | £400 | 5 (33.3) | 10 (66.7) | £200 | 10 (12.3) | 71 (87.7) |

'Less certain' (certainty = 1–4)

| Base bid (DC1) | N | Yes N (%) | No N (%) | Upper bound (DC2) | Yes N (%) | No N (%) | Lower bound (DC2) | Yes N (%) | No N (%) |
|---|---|---|---|---|---|---|---|---|---|
| £25 | 72 | 52 (72.2) | 20 (27.8) | £50 | 10 (19.2) | 42 (80.1) | £10 | 14 (70) | 6 (30) |
| £50 | 80 | 55 (68.8) | 25 (31.3) | £100 | 14 (25.5) | 41 (74.5) | £25 | 20 (80) | 5 (20) |
| £100 | 89 | 50 (56.2) | 39 (43.8) | £200 | 6 (12) | 44 (88) | £50 | 26 (66.7) | 11 (28.2) |
| £200 | 74 | 38 (51.3) | 36 (48.6) | £300 | 6 (15.8) | 32 (84.2) | £100 | 20 (55.6) | 16 (44.4) |
| £300 | 74 | 23 (31.1) | 51 (68.9) | £400 | 3 (13) | 20 (86.9) | £200 | 18 (35.3) | 33 (64.7) |

Table 3
Welfare estimates from DBDC data (mean WTP, £)

| | Model I (bivariate probit) | Model II (interval data) | Model III (DC1 responses) | Model IV (DC2 responses) |
|---|---|---|---|---|
| Full sample | | | | |
| Lower bound | 142.81 (142.27–143.42) | | 141.91 (141.15–142.27) | |
| Upper bound | −33.59 (−40.52 to −26.64) | | | 27.46 (26.14–28.78) |
| Overall | | 104.34 (95.93–112.77) | | |
| Observations | 802 | 802 | 802 | 802 |
| Log-likelihood | −984.71 | −1224.42 | −491.12 | −497.75 |
| Rho (ρ) | −0.239 | 1 | | |
| Chi-squared (d.f.) | 9.928 (1) | | | |
| 'Very certain' (certainty = 5) | | | | |
| Lower bound | 121.68 (120.46–121.74) | | 123.05 (122.05–123.33) | |
| Upper bound | 38.29 (36.31–40.26) | | | 10.73 (7.84–13.64) |
| Overall | | 95.37 (91.79–107.17) | | |
| Observations | 408 | 408 | 408 | 408 |
| Log-likelihood | −487.96 | −506.42 | −227.87 | −261.55 |
| Rho (ρ) | 0.182 | 1 | | |
| Chi-squared (d.f.) | 2.936 (1) | | | |
| 'Less certain' (certainty = 1, 2, 3, 4) | | | | |
| Lower bound | 178.72 (177.45–180.01) | | 177.65 (176.36–178.96) | |
| Upper bound | −204.44 (−679.47 to 270.59) | | | 31.73 (30.23–33.24) |
| Overall | | 112.28 (100.52–125.29) | | |
| Observations | 394 | 394 | 394 | 394 |
| Log-likelihood | −469.98 | −669.65 | −254.99 | −236.28 |
| Rho (ρ) | −0.628 | 1 | | |
| Chi-squared (d.f.) | 42.576 (1) | | | |

95% confidence intervals for welfare estimates are reported in parentheses, calculated by bootstrapping with 1000 replications.
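As a worked check (our arithmetic, using the Table 3 log-likelihoods), the independence test statistic reported in Section 3.1 for the full sample is recovered as:

LR = 2[ℓI − (ℓIII + ℓIV)] = 2[−984.71 − (−491.12 − 497.75)] = 8.32 ∼ χ²(2)

and the corresponding statistics for the certainty subgroups (2.92 and 42.58) follow in the same way.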


For all models WTP estimated for 'very certain' respondents was lower than for the full sample, in line with existing literature (Johannesson et al., 1998; Blumenschein et al., 1998; Blumenschein et al., 2001). Further, within 'very certain' respondents WTP estimated using the interval data model was lower than WTP from the SBDC model of initial bids. Combined with a ρ significantly less than 1 (ρ = 0.182; χ² = 2.936), this implies a positive association between responses to DC1 and DC2, and suggests that preference anomalies persist when only 'very certain' respondents are considered. Likelihood ratio tests of the restrictions in Models II, III and IV compared with the unrestricted Model I indicated that, whilst the restriction imposed by Model II was not valid (36.92 ∼ χ²(2)), the restriction that the models are independent could not be rejected (Models III and IV versus Model I: 2.92 ∼ χ²(2)).

For 'less certain' respondents, WTP estimated using the interval data model from DBDC data was lower than WTP from the SBDC model of initial bids. Welfare estimates for this group were higher than for 'very certain' respondents and for the full sample. Here, ρ was significant and negative, with a stronger negative association between responses to DC1 and DC2 than for the full sample. Likelihood ratio tests rejected the restrictions imposed by Model II and by Models III and IV compared with the unrestricted Model I (Model I versus Model II: 399.34 ∼ χ²(2); Models III and IV versus Model I: 42.58 ∼ χ²(2)).

3.2. Exploring anomalies based on regression analysis

The random effects probit models, incorporating controls for preference anomalies, are presented in Table 4. Model V presents the results of the random effects probit model with no anomaly tests incorporated. The shift parameter in Model VI, δ, was negative and significant, indicating respondents' WTP differed significantly across bounds. In Model VII, the coefficient on t1 × D (γ in Eq. (7)) was negative and significant across all samples, implying weighted average anchoring was not observed. The shift and anchoring effects were included together in Model VIII. For the full sample, the shift variable was significant and negative, and the anchoring variable was positive and significant at the 10% level. This suggests that once the shift in preferences across bounds was controlled for, a weak weighted average anchoring effect was present in the data. When Model VIII was re-estimated for 'very certain' respondents only, the shift parameter remained significant but no significant anchoring effect was found. This implies the follow up question affected respondents' stated WTP, but the anomalies were not explained by anchoring. For 'less certain' respondents a significant anchoring effect was found. This result is in line with behavioural explanations proposing that respondents who are uncertain of their true WTP anchor on DC1 (Herriges and Shogren, 1996).


Table 4
Empirical test of preference anomalies

Full sample

| Variable | Model V (naïve) | Model VI (shift) | Model VII (anchoring) | Model VIII (shift and anchoring) |
|---|---|---|---|---|
| Constant | 0.407 (7.537)*** | 0.628 (9.63)*** | 0.435 (7.91)*** | 0.673 (9.62)*** |
| Bid | −0.004 (12.68)*** | −0.004 (12.85)*** | −0.003 (10.87)*** | −0.005 (12.12)*** |
| D | | −0.418 (6.37)*** | | −0.544 |
| t1 × D | | | −0.0012 (3.43)*** | 0.0009 (1.79)* |
| Observations | 1604 | 1604 | 1604 | 1604 |
| Log-likelihood | −1012.9947 | −992.5204 | −1007.0404 | −990.9196 |
| Rho (ρ) | 8.32 × 10⁻⁸ (−0.0001) | 8.32 × 10⁻⁸ (−0.0001) | 8.32 × 10⁻⁸ (−0.0001) | 8.32 × 10⁻⁸ (−0.0001) |

'Very certain' (certainty = 5)

| Variable | Model V (naïve) | Model VI (shift) | Model VII (anchoring) | Model VIII (shift and anchoring) |
|---|---|---|---|---|
| Constant | 0.590 (5.11)*** | 0.783 (5.83)*** | 0.591 (5.67)*** | 0.808 (5.46)*** |
| Bid | −0.006 (−7.31)*** | −0.007 (−7.48)*** | −0.006 (−7.38)*** | −0.007 (−6.67)*** |
| D | | −0.332 (−3.29)*** | | −0.382 (−2.46)** |
| t1 × D | | | −0.0012 (−2.27)** | 0.0003 (0.043) |
| Observations | 816 | 816 | 816 | 816 |
| Log-likelihood | −497.3287 | −491.8025 | −494.9089 | −491.7094 |
| Rho (ρ) | 0.274 (0.092) | 0.286 (0.090) | 0.217 (0.075) | 0.300 (0.094) |

'Less certain' (certainty = 1, 2, 3, 4)

| Variable | Model V (naïve) | Model VI (shift) | Model VII (anchoring) | Model VIII (shift and anchoring) |
|---|---|---|---|---|
| Constant | 0.396 (5.15)*** | 0.666 (7.25)*** | 0.412 (5.31)*** | 0.794 (7.98)*** |
| Bid | −0.004 (−8.03)*** | −0.004 (−7.97)*** | −0.003 (−6.75)*** | −0.005 (−8.61)*** |
| D | | −0.539 (−5.74)*** | | −0.889 (−6.39)*** |
| t1 × D | | | −0.0009 (−1.86)* | 0.003 (3.47)*** |
| Observations | 778 | 778 | 778 | 778 |
| Log-likelihood | −502.0453 | −485.3973 | −500.3123 | −479.4125 |
| Rho (ρ) | 1.13 × 10⁻⁷ (0.00004) | 1.13 × 10⁻⁷ (0.00004) | 1.13 × 10⁻⁷ (0.00004) | 1.13 × 10⁻⁷ (0.00004) |

(***), (**), (*) denote significance at the 1%, 5% and 10% levels, respectively. t statistics reported in parentheses.
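As a rough cross-check (our arithmetic, not reported in the paper), a probit with only a constant and the bid implies mean WTP of −(constant)/(bid coefficient); for the full-sample naïve model this gives:

mean WTP ≈ 0.407/0.004 ≈ £102

close to the interval data estimate of £104.34 in Table 3.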

3.3. Exploring anomalies based on response patterns

Results from the investigation of the raw data are shown in Tables 5a–5c. Table 5a reports the results for the total group. Across intervals and groupings, response probabilities were significantly different at the 5% level (except the lower interval of grouping four). This provides evidence of anomalous behaviour between DC1 and DC2. Responses were best explained by the guilt/indignation hypothesis (Bateman et al., 1999).

Table 5b reports patterns for 'very certain' respondents. In contrast to the full sample, response patterns were mixed: responses for the lower bid groupings were consistent with prospect theory and indignation (without guilt), whilst at the higher bid groupings response patterns were consistent with well-defined preferences.

Table 5c reports response patterns for 'less certain' respondents. Response patterns indicated significant indignation and guilt effects.


Table 5a
Results from tests for consistency of responses (all respondents, N = 802)

| | Bid increasing path | Proportion (%) | Bid decreasing path | Proportion (%) | Observed pattern | Possible explanation |
|---|---|---|---|---|---|---|
| Grouping 1: £25 and £50 | | | | | | Guilt/indignation |
| Upper interval | P(Y25 Y50) | 26.6 | P(Y50) | 67.8 | <*** | |
| Middle interval | P(Y25 N50) | 49.7 | P(N50 Y25) | 20.3 | >*** | |
| Lower interval | P(N25) | 23.5 | P(N50 N25) | 10.2 | >** | |
| Grouping 2: £50 and £100 | | | | | | Guilt/indignation |
| Upper interval | P(Y50 Y100) | 22.9 | P(Y100) | 52.9 | <*** | |
| Middle interval | P(Y50 N100) | 44.7 | P(N100 Y50) | 25.3 | >*** | |
| Lower interval | P(N50) | 30.9 | P(N100 N50) | 20.7 | >** | |
| Grouping 3: £100 and £200 | | | | | | Guilt/indignation |
| Upper interval | P(Y100 Y200) | 11.5 | P(Y200) | 37.1 | <*** | |
| Middle interval | P(Y100 N200) | 41.4 | P(N200 Y100) | 27.8 | >*** | |
| Lower interval | P(N100) | 45.9 | P(N200 N100) | 35.1 | >** | |
| Grouping 4: £200 and £300 | | | | | | Guilt/indignation |
| Upper interval | P(Y200 Y300) | 8.6 | P(Y300) | 22.5 | <*** | |
| Middle interval | P(Y200 N300) | 28.4 | P(N300 Y200) | 16.1 | >*** | |
| Lower interval | P(N200) | 62.9 | P(N300 N200) | 60 | > | |

(***), (**), (*) denote significant differences between proportions at the 1%, 5%, and 10% levels, respectively.

Table 5b
Results from tests for consistency of responses ('very certain' respondents [certainty = 5], N = 408)

| | Bid increasing path | Proportion (%) | Bid decreasing path | Proportion (%) | Observed pattern | Possible explanation |
|---|---|---|---|---|---|---|
| Grouping 1: £25 and £50 | | | | | | Prospect theory/indignation |
| Upper interval | P(Y25 Y50) | 27.3 | P(Y50) | 57.7 | <*** | |
| Middle interval | P(Y25 N50) | 42.4 | P(N50 Y25) | 15.7 | >*** | |
| Lower interval | P(N25) | 20 | P(N50 N25) | 15.7 | =*** | |
| Grouping 2: £50 and £100 | | | | | | Prospect theory/indignation |
| Upper interval | P(Y50 Y100) | 30 | P(Y100) | 50.6 | <*** | |
| Middle interval | P(Y50 N100) | 38.6 | P(N100 Y50) | 21.7 | >*** | |
| Lower interval | P(N50) | 31.4 | P(N100 N50) | 27.7 | =*** | |
| Grouping 3: £100 and £200 | | | | | | Consistent |
| Upper interval | P(Y100 Y200) | 16.8 | P(Y200) | 23.3 | =*** | |
| Middle interval | P(Y100 N200) | 33.7 | P(N200 Y100) | 28.0 | =*** | |
| Lower interval | P(N100) | 49.4 | P(N200 N100) | 48.1 | =*** | |
| Grouping 4: £200 and £300 | | | | | | Unclear |
| Upper interval | P(Y200 Y300) | 9.1 | P(Y300) | 16.3 | >* | |
| Middle interval | P(Y200 N300) | 14.3 | P(N300 Y200) | 10.2 | =*** | |
| Lower interval | P(N200) | 76.6 | P(N300 N200) | 73.5 | =*** | |

(***), (**), (*) denote significant differences between proportions at the 1%, 5%, and 10% levels, respectively.


Table 5c
Results from tests for consistency of responses ('less certain' respondents [certainty = 1–4], N = 389)

| | Bid increasing path | Proportion (%) | Bid decreasing path | Proportion (%) | Observed pattern | Possible explanation |
|---|---|---|---|---|---|---|
| Grouping 1: £25 and £50 | | | | | | Indignation/guilt |
| Upper interval | P(Y25 Y50) | 13.8 | P(Y50) | 69.1 | <*** | |
| Middle interval | P(Y25 N50) | 58.3 | P(N50 Y25) | 24.7 | >*** | |
| Lower interval | P(N25) | 27.8 | P(N50 N25) | 6.2 | >*** | |
| Grouping 2: £50 and £100 | | | | | | Indignation/guilt |
| Upper interval | P(Y50 Y100) | 17.3 | P(Y100) | 54.9 | <*** | |
| Middle interval | P(Y50 N100) | 51.8 | P(N100 Y50) | 28.6 | >*** | |
| Lower interval | P(N50) | 30.8 | P(N100 N50) | 16.5 | >*** | |
| Grouping 3: £100 and £200 | | | | | | Indignation/guilt |
| Upper interval | P(Y100 Y200) | 6.6 | P(Y200) | 51.3 | <*** | |
| Middle interval | P(Y100 N200) | 48.3 | P(N200 Y100) | 27 | >*** | |
| Lower interval | P(N100) | 45 | P(N200 N100) | 21.6 | >*** | |
| Grouping 4: £200 and £300 | | | | | | Indignation/prospect theory |
| Upper interval | P(Y200 Y300) | 8.1 | P(Y300) | 31.5 | <*** | |
| Middle interval | P(Y200 N300) | 43.2 | P(N300 Y200) | 19.5 | >** | |
| Lower interval | P(N200) | 48.6 | P(N300 N200) | 44.7 | = | |

(***), (**), (*) denote significant differences between proportions at the 1%, 5%, and 10% levels, respectively.

4. Discussion

Consistent with previous studies, evidence was found that welfare estimates derived from DBDC data were lower than those generated from SBDC. Further, the bivariate probit model indicated low correlation between DC1 and DC2. These results held when models were re-estimated according to respondents' certainty. Rho (ρ) was positive in the case of 'very certain' respondents and negative for 'less certain' respondents, suggesting behavioural motivations differed between groups.

These findings were consistent with results from the random effects probit models incorporating tests of preference shifts and anchoring. All models provided evidence of a significant shift effect between DC1 and DC2. This suggested DC1 and DC2 were not drawn from the same underlying distribution, and provided evidence of incentive incompatibility of the follow up question. Weak evidence of a weighted average anchoring effect was found for the full sample. When 'very certain' respondents were considered no significant anchoring effect was found, again suggesting different behavioural motivations across certainty groups.

Response patterns in the raw data were consistent with the indignation/guilt hypothesis for the full sample. These results are in contrast to DeShazo (2002), who reported evidence of prospect theory (although DeShazo's results were consistent with indignation). 'Very certain' respondents at lower bid levels had response patterns similar to those reported by DeShazo, whilst response patterns at higher bid groupings were consistent with well-defined preferences. Overall, greater response consistency was found at higher bid levels. Alberini et al. (1997) argued that when bid amounts are close to true WTP, respondents face a more difficult task. Accordingly, when bid amounts are (considerably) higher than a respondent's WTP, the task is simplified; respondents should find it easier to provide consistent responses (and be more certain of their response). An alternative explanation is that when respondents are faced with higher bid amounts they are more considered in their responses, as they perceive making a 'mistake' to be more costly. 'Less certain' respondents' response patterns were consistent with the indignation/guilt hypothesis, although it was not possible to distinguish between prospect theory and indignation for the highest bid grouping. Bateman et al. (2002), through debriefing focus groups, reported qualitative evidence of an indignation effect. Further research is needed to investigate the findings from the analysis of raw responses, and qualitative data is likely to be useful here.

DeShazo (2002) concluded only responses in the ascending sequence were anomalous and thus recommended only 'no' responses be followed by a subsequent bid. The results of this study found anomalies in both the ascending and descending sequences. This suggests the presence of any follow up question creates anomalous responses. Our results have potential implications for all multiple bounded elicitation formats, including the payment card, iterative bidding and the random card sort (Smith, 2006). For example, in payment card formats respondents consider each of the amounts presented to them, and state if they would be willing to pay that amount. Whilst payment card respondents have advanced disclosure (they see all the amounts they face), the study presented here also informed respondents that they would face a second valuation question, higher if they stated 'yes' to DC1 and lower if they stated 'no'; anomalous behaviour nevertheless persisted. Future research should investigate these issues within the context of payment cards, considering ascending and descending bid amounts.

The NOAA panel recommended the use of dichotomous choice contingent valuation based on the perceived incentive compatibility of the method. The move from a single to a double bounded dichotomous choice structure may compromise this incentive compatibility, implying the use of DBDC should be avoided in favour of SBDC (Carson et al., 1999). If anomalies are attributed to incentive incompatibility introduced by the follow up question, Carson et al. (1999) predict response patterns consistent with the cost expectations explanation. In both this study and DeShazo (2002) anomalies were not consistent with this explanation, but rather with prospect theory and indignation. Indignation may also be a consequence of incentive incompatibility, as the respondent does not truthfully reveal their valuation of the good.

The explanations proposed for anomalous behaviour (with the exception of yea saying) assume utility maximisation at DC1. However, the patterns observed may indicate respondents were not utility maximising, and did not hold well-defined preferences for the good in question (McFadden, 1994; Sugden, 1999). In such cases respondents may use the contingent valuation question frame to aid their preference formation. Gregory et al. (1993) acknowledge this when recommending researchers 'should function not as archaeologists, carefully uncovering what is there, but as architects, working to build a defensible expression of value'. This would indicate preferences are malleable: dependent upon, and constructed in response to, the contingent valuation frame (Bateman and Mawby, 2004; Hanley and Shogren, 2005; Sugden, 2005).

The existence of well-formed preferences may be related to respondents' familiarity with, or experience of, the good being valued (Boyle et al., 1993; Roach et al., 1999; Braga and Starmer, 2005; McCollum and Boyle, 2005). Evidence here is mixed. Boyle et al. (1993) and McCollum and Boyle (2005), when looking at preferences for boating on the Colorado river and moose hunting in Maine, found no significant difference between WTP estimates elicited from more experienced and less experienced respondents. In contrast Roach et al. (1999), in a study concerned with boating preferences, found respondents' experience influenced their WTP estimates; they question whether respondents with little or no experience are able to provide valid responses to stated preference tasks. Many applications of stated preference techniques in health economics elicit the preferences of patients for a treatment they have experienced. In these cases respondents can reasonably be expected to have a greater familiarity with the good in question.

Future work should consider whether anomalies such as those reported in this paper persist under varying degrees of respondent familiarity. San Miguel and Ryan (2003), in a series of discrete choice experiments valuing goods that differ in familiarity (a supermarket, a dentist appointment, and bowel cancer screening), did not find evidence of preference construction. As respondents are presented with a number of choices, does this elicitation format allow individuals to form their preferences? Indeed many DCEs include a set of warm up questions allowing respondents to form their preferences. When respondents are familiar with the good in question researchers may be able to carefully uncover existing preferences (acting as archaeologists). Conversely, when respondents are asked to value a good they have no direct experience of, in this case an air ambulance service, respondents' preferences may build on the information they receive in the question frame. Hanley and Shogren (2005) discuss the presence of a middle ground between well-defined preferences and constructed preferences, where respondents have a range of uncertainty. This study found 'very certain' respondents exhibited less anomalous behaviour than 'less certain' respondents, perhaps indicating 'less certain' respondents were more dependent on the question frame to form their preferences.

Providing respondents with the opportunity to 'learn' their preferences within a CV experiment may overcome some of the anomalies observed. Within health economics, Dolan et al. (1999), eliciting patients' views on priority setting over two focus group sessions, found views on priority setting were systematically different after respondents were given the opportunity for discussion and deliberation. This led the authors to conclude that the results of 'surveys, which do not allow respondents the time or opportunity to reflect on their preferences' may be doubtful. Future stated preference studies should consider information provision and how best to provide information to respondents prior to a valuation task. The citizens' jury, a promising approach which allows respondents to reflect on their preferences, has been combined with stated preference techniques within the environmental economics literature (Kenyon et al., 2001; Kenyon et al., 2003; MacMillan et al., 2002; MacMillan et al., in press). This methodology combines a contingent valuation task with elements of participatory analysis: respondents are given time to reflect upon their preferences and obtain larger amounts of information, especially in the case of unfamiliar goods. Of particular relevance is a study by MacMillan et al. (in press) that valued two goods, Red Kite preservation and renewable energy. Respondents' familiarity with the goods differed: respondents were unfamiliar with Red Kite preservation and familiar with renewable energy. Findings indicated WTP values were significantly different for the unfamiliar good after respondents were given the opportunity to deliberate and obtain additional information. The authors concluded that CV could act as a 'preference engine'.

The presence of observed anomalies in the application of CV questions its use at the policy level. Despite the UK Treasury's Green Book recommending the use of monetary measures to value non-marketed goods, cost-benefit analysis (CBA) has not been used at the policy level within health care (Drummond et al., 2005). Anomalies, such as those reported in this paper, undermine the validity of CV. A recent special issue of Environmental and Resource Economics (Volume 32, Number 1, September 2005) considered how stated preference research could cope with observed anomalies. In the opening paper Sugden (2005) proposes a framework for the discussion of observed anomalies: recognise the aspiration of SP techniques as legitimate; accept that best practice in SP is not a closed world (evidence may be found from other judgement and decision-making tasks); and adopt a precautionary principle by developing methodologies to cope with anomalies. Following this he summarises five alternative ways to respond to the problems anomalies create in CBA: be pragmatic, allow for preference discovery in experiments, develop new theories of preferences, consider market simulation, and measure happiness (Sugden, 2005). The interested reader is directed to this special issue.

In conclusion, this study found serious anomalies in a widely applied contingent valuation technique: the DBDC format. In line with previous studies, WTP estimated from DBDC was lower than WTP from SBDC. Results of regression models and tests of raw data response patterns indicated anomalous behaviour across bounds. Anomalies were present for ascending and descending bounds, and for 'less certain' and 'very certain' respondents, although they were greater for 'less certain' respondents. The results may indicate that for some health interventions, where there is limited familiarity, respondents do not hold theoretically consistent preferences. The implication of this conclusion could be seen as worrying: one interpretation is that the results of such experiments cannot be used within an economic evaluation framework. However, once we recognise that economists can act as architects, working to build preferences, rather than archaeologists, working to uncover them, CV tasks can be designed accordingly. If CV is to be adopted by policymakers the health economics community needs to discuss how to deal with anomalies.

Acknowledgements

The authors gratefully acknowledge comments from Ian Bateman on an earlier version of this paper. Financial support from the Department of Health, University of Aberdeen and the Health Foundation is acknowledged. The Chief Scientist Office of the Scottish Executive Health Department (SEHD) funds HERU. The usual disclaimer applies.

References

Alberini, A., 1995. Efficiency vs bias of willingness to pay estimates: bivariate and interval data models. Journal of Environmental Economics and Management 29, 169–180.
Alberini, A., Kanninen, B., Carson, R., 1997. Modelling response incentive effects in dichotomous choice contingent valuation data. Land Economics 73, 309–324.
Arrow, K., Solow, R., Portney, P.R., Leamer, E.E., Radner, R., Schuman, H., 1993. Report of the NOAA panel on contingent valuation. Federal Register 58, 4601–4614.
Asfaw, A., von Braun, J., 2005. Innovations in health care financing: new evidence on the prospect of community health insurance schemes in the rural areas of Ethiopia. International Journal of Health Care Finance and Economics 5, 241–253.
Bateman, I.J., Mawby, J., 2004. First impressions count: interviewer appearance and information effects in stated preference studies. Ecological Economics 49, 47–55.
Bateman, I.J., Langford, I.H., Jones, A.P., Kerr, G.N., 2001. Bound and path effects in double and triple bounded dichotomous choice contingent valuation. Resource and Energy Economics 23, 191–213.
Bateman, I.J., Carson, R.T., Day, B., Hanemann, M., Hanley, N., Hett, T., Jones-Lee, M., Loomes, G., Mourato, S., Özdemiroglu, E., Pearce, D.W., Sugden, R., Swanson, J., 2002. Economic Valuation with Stated Preferences: A Manual. Edward Elgar.
Blumenschein, K., Johannesson, M., Yokoyama, K.K., Freeman, P.R., 2001. Hypothetical versus actual willingness to pay in the health care sector: results from a field experiment. Journal of Health Economics 20, 441–457.
Blumenschein, K., Johannesson, M., Blomquist, G.C., Liljas, B., O'Conor, R.M., 1998. Experimental results on expressed certainty and hypothetical bias in contingent valuation. Southern Economic Journal 65, 169–177.
Boyle, K., Bishop, R.C., Welsh, M.P., 1985. Starting point bias in contingent valuation bidding games. Land Economics 61, 188–194.
Boyle, K., Welsh, M.P., Bishop, R.C., 1993. The role of question order and respondent experience in contingent valuation studies. Journal of Environmental Economics and Management 25, 45–55.
Boyle, K., Johnson, F.R., McCollum, D.W., 1997. Anchoring and adjustment in single bounded contingent valuation questions. American Journal of Agricultural Economics 79, 1495–1500.

Braga, J., Starmer, C., 2005. Preference anomalies, preference elicitation and the discovered preference hypothesis. Environmental and Resource Economics 32, 55–89.
Cameron, T., Quiggan, J., 1994. Estimation using contingent valuation data from a dichotomous choice with follow up questionnaire. Journal of Environmental Economics and Management 27, 218–234.
Carson, R., Groves, T., Machina, M., 1999. Incentive and informational properties of preference questions. Plenary address, Ninth Annual Conference of the European Association of Environmental and Resource Economists (EAERE), Oslo, Norway, June 1999.
Clarke, P.M., 2000. Valuing the benefits of mobile mammographic screening units using the contingent valuation method. Applied Economics 32, 1647–1655.
DeShazo, J.R., 2002. Designing transactions without framing effects in iterative question formats. Journal of Environmental Economics and Management 43, 360–385.
Diener, A., O'Brien, B., Gafni, A., 1998. Health care contingent valuation studies: a review and classification of the literature. Health Economics 7, 313–326.
Dolan, P., Cookson, R., Ferguson, B., 1999. The effect of group discussions on the public's view regarding priorities in health care. British Medical Journal 318, 916–919.
Donaldson, C., Thomas, R., Torgerson, D., 1997. Validity of open-ended and payment scale approaches to eliciting willingness to pay. Applied Economics 29, 79–84.
Drummond, M., Sculpher, M., Torrance, G., O'Brien, B., Stoddart, G., 2005. Methods for the Economic Evaluation of Health Care Programmes, third ed. Oxford University Press.
Frew, E.J., Whynes, D.K., Wolstenholme, J.L., 2003. Eliciting willingness to pay: comparing closed-ended with open-ended and payment scale formats. Medical Decision Making 23, 150–159.
Gregory, R., Lichtenstein, S., Slovic, P., 1993. Valuing environmental resources: a constructive approach. Journal of Risk and Uncertainty 7, 171–197.
Hackl, F., Pruckner, G.J., 2005. Warm glow, free riding and vehicle neutrality in a health-related contingent valuation study. Health Economics 14, 293–306.
Hanemann, W.M., 1984. Welfare evaluations in contingent valuation experiments with discrete responses. American Journal of Agricultural Economics 66, 332–341.
Hanemann, W.M., Loomis, J., Kanninen, B., 1991. Statistical efficiency of double bounded dichotomous choice contingent valuation. American Journal of Agricultural Economics 73, 1255–1263.
Hanemann, W.M., Kanninen, B., 1999. The statistical analysis of discrete-response CV data. In: Bateman, I.J., Willis, K.G. (Eds.), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU and Developing Countries. Oxford University Press.
Hanley, N., Shogren, J.F., 2005. Is cost-benefit analysis anomaly-proof? Environmental and Resource Economics 32, 13–34.
Herriges, J.A., Shogren, J.F., 1996. Starting point bias in dichotomous choice valuation with follow-up questioning. Journal of Environmental Economics and Management 30, 112–131.
Hoehn, J., Randall, A., 1987. A satisfactory benefit cost indicator from contingent valuation. Journal of Environmental Economics and Management 14, 226–247.
Holmes, T.P., Kramer, R.A., 1995. An independent sample test of yea-saying and starting point bias in dichotomous choice contingent valuation. Journal of Environmental Economics and Management 29, 121–132.
Hunter, J.E., 2001. The desperate need for replications. Journal of Consumer Research 28, 149–158.
Johannesson, M., Liljas, B., Johansson, P.O., 1998. An experimental comparison of dichotomous choice valuation questions and real purchase decisions. Applied Economics 30, 643–647.
Johnston, D., Ryan, M., 2002. Air ambulances: public perceptions of value. Report prepared for the Department of Health.
Kahneman, D., Tversky, A., 1979. Prospect theory: an analysis of decision under risk. Econometrica 47, 263–291.
Kanninen, B., 1995. Bias in discrete response contingent valuation. Journal of Environmental Economics and Management 28, 114–125.
Kennedy, C.A., 2002. Revealed preference valuation compared to contingent valuation: radon-induced lung cancer prevention. Health Economics 11, 585–598.
Kenyon, W., Hanley, N., Nevin, C., 2001. Citizens' juries: an aid to environmental valuation? Environment and Planning C: Government and Policy 19, 557–566.
Kenyon, W., Nevin, C., Hanley, N., 2003. Enhancing environmental decision-making using citizens' juries. Local Environment 8, 221–232.
Klose, T., 1999. The contingent valuation method in health care. Health Policy 47, 97–123.
Liu, J.T., Hammitt, J.K., Wang, J.D., Liu, J.L., 2000. Mother's willingness to pay for her own and her child's health: a contingent valuation study in Taiwan. Health Economics 9, 319–326.


MacMillan, D., Philip, L., Hanley, N., Alvarez-Farizo, B., 2002. Valuing non-market benefits of wild goose conservation: a comparison of interview and group-based approaches. Ecological Economics 43, 49–59.
MacMillan, D., Hanley, N., Lienhoop, N., in press. Contingent valuation: environmental polling or preference engine? Ecological Economics.
McCollum, D.W., Boyle, K.J., 2005. The effect of respondent experience/knowledge in the elicitation of contingent values: an investigation of convergent validity, procedural invariance and reliability. Environmental and Resource Economics 30, 23–33.
McFadden, D., 1974. Conditional logit analysis of qualitative choice behaviour. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York.
McFadden, D., 1994. Contingent valuation and social choice. American Journal of Agricultural Economics 76, 689–708.
Prosser, L.A., Bridges, C.B., Uyeki, T.M., Rego, V.H., Ray, G.T., Meltzer, M.I., Schwartz, B., Thompson, W.W., Fukuda, K., Lieu, T.A., 2005. Values for preventing influenza-related morbidity and vaccine adverse events in children. Health and Quality of Life Outcomes 3.
Ready, R.C., Buzby, J.C., Hu, D., 1996. Differences between continuous and discrete contingent value estimates. Land Economics 72, 397–411.
Roach, B., Boyle, K., Bergstrom, J.C., Reiling, S.D., 1999. The effect of instream flows on whitewater visitation and consumer surplus: a contingent valuation application to the Dead River, Maine. Rivers 7, 11–20.
Ryan, M., Scott, D.A., Donaldson, C., 2004. Valuing health care using willingness to pay: a comparison of the payment card and dichotomous choice methods. Journal of Health Economics 23, 237–258.
San Miguel, F., Ryan, M., 2003. Revisiting the axiom of completeness in healthcare. Health Economics 12, 295–307.
Smith, R.D., 2003. Construction of the contingent valuation market in health care: a critical assessment. Health Economics 12, 609–628.
Smith, R.D., 2006. It's not just what you do, it's the way that you do it: the effect of different payment card formats and survey administration on willingness to pay for health gain. Health Economics 15, 281–293.
Sugden, R., 1999. Alternatives to the neo-classical theory of choice. In: Bateman, I.J., Willis, K.G. (Eds.), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU and Developing Countries. Oxford University Press.
Sugden, R., 2005. Anomalies and stated preference techniques: a framework for coping strategies. Environmental and Resource Economics 32, 1–12.
Whitehead, J.C., 2002. Incentive compatibility and starting point bias in iterative valuation questions. Land Economics 78, 285–297.