Multi-attribute, multi-alternative models of choice: Choice, reaction time, and process tracing

Cognitive Psychology 98 (2017) 45–72
Andrew L. Cohen (University of Massachusetts Amherst, United States; corresponding author), Namyi Kang (Amherst College, United States), Tanya L. Leise (Amherst College, United States)

Article history: Accepted 8 August 2017

Keywords: Decision making; Computational modeling; Response time; Process tracing

Abstract

The first aim of this research is to compare computational models of multi-alternative, multi-attribute choice when attribute values are explicit. The choice predictions of utility (standard random utility & weighted valuation), heuristic (elimination-by-aspects, lexicographic, & maximum attribute value), and dynamic (multi-alternative decision field theory, MDFT, & a version of the multi-attribute linear ballistic accumulator, MLBA) models are contrasted on both preferential and risky choice data. Using both maximum likelihood and cross-validation fit measures on choice data, the utility and dynamic models are preferred over the heuristic models for risky choice, with a slight overall advantage for the MLBA for preferential choice. The response time predictions of these models (except the MDFT) are then tested. Although the MLBA accurately predicts response time distributions, it only weakly accounts for stimulus-level differences. The other models completely fail to account for stimulus-level differences. Process-tracing measures, i.e., eye and mouse tracking, were also collected. None of the qualitative predictions of the models are completely supported by those data. These results suggest that the models may not appropriately represent the interaction of attention and preference formation. To overcome this potential shortcoming, the second aim of this research is to test preference-formation assumptions, independently of attention, by developing the models of attentional sampling (MAS) family, which incorporates the empirical gaze patterns into a sequential sampling framework. An MAS variant that includes attribute values, but only updates the currently viewed alternative and does not contrast values across alternatives, performs well in both experiments. Overall, the results support the dynamic models, but point to the need to incorporate a framework that more accurately reflects the relationship between attention and the preference-formation process.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

People are often asked to choose from a set of alternatives, each of which varies across multiple attributes or features, for example, choosing which of three apartments to rent based on location, condition, kitchen, and size.

Author note: The authors gratefully thank Adrian Staub for use of eye tracking equipment, Cheryl A. Nicholas for assistance in collecting data, and Joseph Johnson for early discussions regarding this work. TL was supported in part by a grant from the Amherst College Faculty Research Award Program, funded by the H. Axel Schupf '57 Fund for Intellectual Life. Corresponding author at: Department of Psychological and Brain Sciences, University of Massachusetts Amherst, United States. E-mail address: [email protected] (A.L. Cohen).

http://dx.doi.org/10.1016/j.cogpsych.2017.08.001
0010-0285/© 2017 Elsevier Inc. All rights reserved.


Modelers of such multi-attribute, multi-alternative decisions have made a variety of assumptions about the processes underlying choice, including how people gather and use information. Early models assumed decisions are made based on the calculation of valuation or utility (e.g., Rieskamp, Busemeyer, & Mellers, 2006; Thurstone, 1927; Train, 2003). To allow for rapid decisions based on a subset of the information, later models assumed choices were driven by heuristics (e.g., Gigerenzer & Goldstein, 1996; Hogarth & Karelaia, 2007; Shah & Oppenheimer, 2008). More recently, models have incorporated dynamic processes in which information is integrated over time until some threshold condition is met, triggering the choice (Pleskac, Diederich, & Wallsten, 2015). Many of these models have been extended from simple binary choice to multi-attribute, multi-alternative decision making, and can incorporate both subjective valuation of alternatives and the allocation of attention to different attributes of the alternatives.

The first aim of this research is to compare computational models of multi-alternative, multi-attribute choice when the stimulus attributes are explicit and novel to participants. This comparison is designed to be more comprehensive than previous work, comparing a broad set of models on both preferential and risky choice data using choice, response time, and attentional measures. By explicit attributes, we mean that the value of each attribute for each alternative is presented separately, as in the previous apartment example, as opposed to, for example, selecting among a set of pictures of snack foods. By novel attributes, we mean that participants begin the task without prior knowledge or opinions about the alternatives.

We compare utility and heuristic models with representatives of the relatively new class of dynamic models. The models are compared in two experimental paradigms: preferential and risky choice. In the preferential choice task, participants select from three apartments that vary on four attributes. In the risky choice task, participants select from three gambles, each with three probabilistic outcomes. In both tasks, the information is presented in a grid with alternatives as rows and attributes as columns. A similar stimulus format has been used in previous research, including the collection of process-tracing measures (e.g., Kwak, Payne, Cohen, & Huettel, 2015; Payne, Bettman, & Johnson, 1988; Venkatraman, Payne, & Huettel, 2014).

To provide a wide-ranging assessment, three basic classes of models are contrasted. Here each model is briefly described; the model details are provided in Appendix A. All of the models were selected as reasonable strategies that a participant might employ when faced with a choice between multiple alternatives with explicit, multiple attributes presented in a grid.

The first class of models calculates a utility for each alternative based on a weighted combination of the attribute values plus noise. Choice is determined by the probability that an alternative's utility is greater than those of the other alternatives. These utilities are calculated independently, that is, they are not influenced by the other alternatives. The standard random utility model (SRU; Rieskamp et al., 2006) and weighted valuation model (WV; Kahneman & Tversky, 1979) are used in preferential and risky choice, respectively.
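As a concrete illustration of this first class (a minimal sketch, not the authors' implementation; the ratings, weights, Gaussian noise assumption, and function names are illustrative), choice probabilities for a simple random-utility rule can be approximated by Monte Carlo simulation:

```python
import numpy as np

def random_utility_choice_probs(ratings, weights, noise_sd=1.0, n_sim=100_000, seed=0):
    """Monte Carlo choice probabilities for a simple random-utility rule.

    ratings: (n_alternatives, n_attributes) array of attribute values.
    weights: (n_attributes,) importance weights.
    On each simulated trial, independent Gaussian noise is added to each
    alternative's weighted utility and the highest-utility alternative wins.
    """
    rng = np.random.default_rng(seed)
    base = ratings @ weights                         # deterministic utilities
    noise = rng.normal(0.0, noise_sd, size=(n_sim, len(base)))
    winners = np.argmax(base + noise, axis=1)        # simulated choices
    return np.bincount(winners, minlength=len(base)) / n_sim

# Three apartments rated 1-5 stars on four attributes (rows = alternatives).
ratings = np.array([[2., 5., 3., 4.],
                    [4., 3., 4., 2.],
                    [2., 3., 3., 2.]])
weights = np.array([0.4, 0.3, 0.2, 0.1])
print(random_utility_choice_probs(ratings, weights))
```

The actual SRU and WV specifications (Appendix A) differ in their noise and valuation assumptions; the point here is only the general structure: weighted valuation plus noise, with choice by maximum.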
The second class of models assumes that choice is driven by heuristics, allowing a choice to be determined without performing demanding computations. In contrast to the previous class of models, a choice may be made without considering all of the available information. An extensive set of heuristics has been developed (e.g., Shah & Oppenheimer, 2008). Here we examine three heuristics: elimination by aspects (EBA; Tversky, 1972), the lexicographic heuristic (LEX; Fishburn, 1974), and the maximum attribute value heuristic (MV; similar, in this scenario, to the maximax criterion); a schematic implementation of all three is sketched at the end of this section. EBA is a process of sequential elimination. An attribute is randomly selected, with the probability of selection based on the importance of the attribute. Any alternative with a low value for that attribute is eliminated. The process continues until only one alternative remains. The LEX heuristic also steps through attributes, but seeks a dominating alternative. The MV heuristic makes the very simple, but reasonable, assumption that the alternative with the maximum attribute value is selected. A minimax heuristic, in which the alternative with the maximum worst attribute value is selected, was also tried, but, because it performed significantly worse than MV, will not be discussed further.

The third class of models assumes information is accumulated and integrated over time until a threshold is reached. These dynamic choice, sequential sampling models are typically implemented as noisy diffusion processes. Examples include the leaky competing accumulator model (LCA; Usher & McClelland, 2004), decision field theory (DFT; Busemeyer & Townsend, 1993), and the linear ballistic accumulator (LBA; Brown & Heathcote, 2008). Here we consider versions of the LBA and DFT models that have been adapted to multi-alternative and multi-attribute paradigms: the multi-attribute linear ballistic accumulator (MLBA; Trueblood, Brown, & Heathcote, 2014) and the multi-alternative decision field theory (MDFT; Roe, Busemeyer, & Townsend, 2001). To date, the MLBA has only been applied to preferential choice stimuli with two attributes. This model is adapted here to risky choice and to stimuli with more than two attributes. Because both preferential and risky choice are considered and because different stimuli are used than in the past, both the MDFT and MLBA formulations were adapted to the current paradigms. In particular, in the current versions of both the MDFT and the MLBA, attentional weighting is based on prospect theory (Tversky & Kahneman, 1992), and in the current version of the MLBA subjective valuation is also based on prospect theory. Details are provided in Appendix A.

Because, in many studies, not all of the models under consideration make response time predictions, most prior work involving the comparison of dynamic models only considers choice performance, not response time. For example, Berkowitsch, Scheibehenne, and Rieskamp (2014) compared the MDFT to utility models in a preferential choice task. Newell and Lee (2011) compared a sequential sampling model to a rational model and a set of heuristic models in a categorization task. Scheibehenne, Rieskamp, and González-Vallejo (2009) compared the DFT to the proportional difference model (González-Vallejo, 2002) in a risky choice task. Trueblood et al. (2014) compared the MDFT and MLBA choice predictions for inference and perceptual experiments. Fiedler and Glöckner (2012) compared the DFT to a parallel constraint satisfaction model (Glöckner & Herbold, 2011), a utility model, and heuristic models in a risky choice task, although they did consider qualitative response time predictions.

Following these studies, in the current work, all of the models are first compared on choice performance. We go beyond past work, however, in the following ways. First, a wider range of models than is typically used is compared. Second, two dynamic models, the MDFT and MLBA, are directly compared on both preferential and risky choice (also see Trueblood et al., 2014). Third, two fit measures are used. Most previous work has relied on maximum likelihood parameter estimates and related measures such as AIC and BIC. Here, the models are fit using both maximum likelihood and cross-validation. Although maximum likelihood has robust mathematical properties, cross-validation results are more easily interpretable and naturally account for differences in model complexity. We also fit the models to both choice and response time data for the preferential choice and risky choice experiments.

An advantage of dynamic models is the precise, quantitative prediction of both choice and response time. Surprisingly little work has compared the dynamic models of decision making discussed previously on quantitative response time prediction for multi-attribute, multi-alternative choice. Researchers have looked at each of the component parts. For example, Glöckner, Hilbig, and Jekel (2014) showed that the predictions of a parallel constraint satisfaction model strongly correlate with participant RTs in a multi-attribute, binary choice task. Ashby, Jekel, Dickert, and Glöckner (2016) compared drift-diffusion models on their ability to fit response times in a multi-alternative preferential choice task, without explicit attributes. Krajbich and Rangel (2011) examined the relationship between choice and reaction time in a multi-alternative context, with preformed preferences and without explicit attributes. The only research we know of that has quantitatively modeled response times in a multi-attribute, multi-alternative task is Trueblood and Dasari (2017), who fit the MLBA to both choice and response time, but in a perceptual task.

We examine how well the heuristic models and the MLBA can jointly predict choice and response time. The simplicity of the MLBA facilitates closed-form solutions for calculating choice and response time probabilities. Because there is no closed-form solution for the MDFT when trials are terminated by the participant, making it computationally intractable, we do not fit this model to response times; we are not aware of any other work that has done so.

To preview, the utility and dynamic models perform well on the choice data of both experiments. At best, however, the models can only weakly predict the choice plus RT data. One reason that these models may not be able to account for the RT data is that, in this experimental paradigm, they do not appropriately represent the interaction of attention and preference formation. To test the attention assumptions of the models, we also consider process-tracing measures, such as eye or mouse tracking (Schulte-Mecklenbeck, Kühberger, & Ranyard, 2011). This analysis considers the order of information search, how many and which values are attended, and whether attention focuses on a subset of alternatives as choice progresses.
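As a concrete illustration of the heuristic class introduced above, the following sketch implements schematic versions of LEX, MV, and EBA. It is a minimal reading of the verbal descriptions, not the paper's exact specification; the cutoff rule for EBA and all tie-breaking choices are illustrative assumptions.

```python
import numpy as np

def lexicographic(ratings, attribute_order, seed=0):
    """LEX: step through attributes in importance order; stop as soon as one
    alternative uniquely holds the best value on the current attribute."""
    rng = np.random.default_rng(seed)
    survivors = np.arange(ratings.shape[0])
    for a in attribute_order:
        vals = ratings[survivors, a]
        survivors = survivors[vals == vals.max()]
        if len(survivors) == 1:
            return int(survivors[0])
    return int(rng.choice(survivors))                # tie after all attributes

def max_attribute_value(ratings):
    """MV: choose the alternative containing the single largest value."""
    return int(np.unravel_index(np.argmax(ratings), ratings.shape)[0])

def eliminate_by_aspects(ratings, attr_weights, cutoff, seed=0):
    """EBA: sample attributes (probability proportional to importance) and
    eliminate alternatives below the cutoff, until one alternative remains."""
    rng = np.random.default_rng(seed)
    alive = np.arange(ratings.shape[0])
    attrs = list(range(ratings.shape[1]))
    w = np.asarray(attr_weights, dtype=float)
    while len(alive) > 1 and attrs:
        p = w[attrs] / w[attrs].sum()
        a = attrs.pop(int(rng.choice(len(attrs), p=p)))
        keep = ratings[alive, a] >= cutoff
        if keep.any():                               # never eliminate everyone
            alive = alive[keep]
    return int(rng.choice(alive))
```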
As a further preview, we find that none of the models' information-gathering assumptions completely match the data, suggesting that these models do not appropriately capture patterns of attention. Other studies have reached similar conclusions (Fiedler & Glöckner, 2012; Glöckner & Herbold, 2011). Another potential problem is that, as typically formulated, the utility, MV, MDFT (the closed-form solution), and MLBA models assume that the participant has full stimulus knowledge from the beginning of the trial. For perceptual stimuli, in which the information-gathering phase can be rapid, this assumption may be appropriate. Indeed, similar models have been successful at modeling RT in other judgment tasks (e.g., Brown & Heathcote, 2008; Ratcliff & McKoon, 2008; Trueblood & Dasari, 2017) or choice tasks in which the attributes are implicit and spatially co-located. For example, Krajbich, Armel, and Rangel (2010) and Krajbich and Rangel (2011) used the attentional drift-diffusion model (ADDM) to model choice between multiple products, each of which was visually displayed as a single image, so the attributes are implicit. When faced with a set of stimuli in which the values are explicit and spatially separated, however, each attribute must be individually attended and this knowledge has to be built up over time during the choice (Noguchi & Stewart, 2014; Stewart, Hermens, & Matthews, 2016). The current versions of the MDFT and MLBA cannot naturally incorporate this assumption. Furthermore, these models assume that all pertinent information is utilized by the decision maker, but it is not uncommon for a participant to view only a subset of the information before making a decision (Glöckner & Betsch, 2008; Lohse & Johnson, 1996; Nicholas & Cohen, 2016).

When fitting to data, the attentional and preference-formation assumptions of models like the MDFT and MLBA are jointly tested. Given that the attentional assumptions may be inappropriate for the current task, as discussed previously, it would be beneficial to isolate and test the preference-formation assumptions. Therefore, the second aim of this research is to test preference-formation assumptions, independently of attention, by developing a model framework that directly incorporates empirical process-tracing information. That is, this model framework uses the actual sequence of participant attention to attributes and alternatives to drive preference formation. To this end, a family of dynamic models based on ideas from the MDFT, MLBA, and ADDM, which we call the models of attentional sampling (MAS), is developed; its variants incorporate different preference-formation assumptions and are compared on how well they account for choice. The three tested preference-formation assumptions, drawn from previous models, are whether dimension values are used to directly update preference states, whether people contrast values across alternatives within an attribute, and whether the accumulated evidence is updated for all alternatives concurrently or only for the currently viewed alternative. These assumptions are detailed below.

The research proceeds as follows. First, we describe the preferential and risky choice experiments in which choice, response time, and process-tracing patterns are measured, as well as a third risky choice experiment with many within-subject trials, but no process-tracing data, used to further constrain the models' response time predictions.


Second, a wide range of utility, heuristic, and dynamic models are compared on their ability to predict choice data. Third, the models are fit to both choice and response times. Fourth, the models are compared on their ability to predict process-tracing patterns. Finally, the MAS model family is developed and used to test hypotheses about the preference-formation behavior of participants independent of attentional assumptions. All models are fit using a single, common set of parameters for all stimuli at the participant level.

2. Methods

2.1. Preferential choice experiment

In the preferential choice experiment, participants selected one of three apartments based on 1–5 star ratings on four attributes.

2.1.1. Participants

Thirty-eight undergraduates at the University of Massachusetts Amherst received course credits for participation.

2.1.2. Stimuli

The three apartments varied on four attributes (Ease of Transportation, Size, Condition, and Kitchen Facilities) and were presented in a 3×4 grid. A sample stimulus is provided in Fig. 1. Each row was an alternative and each column was an attribute. Each cell in the grid indicated the 1–5 star rating of the apartment on the corresponding attribute. The ratings were defined relative to the participant's ideal value on that attribute, with 5 representing "very good for you" and 1 "very bad for you". Each cell was 6 cm wide by 4 cm tall. Participants sat approximately 75 cm from the screen; thus, each cell subtended approximately 4.58 by 3.06 degrees of visual angle. Each cell served as an area of interest (AOI).

To provide participants with an incentive to search the grid for information, while at the same time ensuring that there was no easy-to-choose winner, two of the three alternatives were high alternatives and one was a low alternative. The mean ratings for the high alternatives in a stimulus were identical and were selected randomly from 2 to 4 stars. Because each participant may weight the attributes differently, equal mean ratings translate to similar, but not equal, utility for the good alternatives. The mean rating for the low alternative was, on average, 0.5 stars below that of the good alternatives. The rules for generating stimuli were not explained to participants. The same set of 45 stimuli was used for all participants. The mapping of apartment to row was randomly assigned, but fixed across participants within a particular stimulus. To control for order effects, four different counterbalanced attribute orderings were generated. Five catch trials were included in which one alternative clearly dominated the other two.

2.1.3. Procedure

The experiment was implemented in Experiment Builder (SR Research). On each trial, the participant selected one of the three apartments via a keyboard press. Eye tracking was done using an EyeLink 1000 (SR Research), with a drift correction procedure before each trial. There were 2 practice trials. Stimulus order was randomized for each participant.

Fig. 1. Sample preferential choice stimulus.


2.1.4. Data

Six participants were removed because they gave the wrong answer on more than one of the 5 catch trials, leaving a total of 32 participants. Catch trials are not included in the data. Trials were eliminated if the response time (RT) was more than 3.5 standard deviations away from the participant's mean RT; one trial each was dropped from 7 of the 32 participants. Because viewing fewer than 3 rating cells implies that the participant did not have sufficient information to make a meaningful decision, such trials were also eliminated; two trials were dropped from one participant. Under the assumption that there is a minimum gaze duration required to be able to interpret information, gazes with durations in the lowest 5% for each participant were eliminated from analysis. Fixations were determined using the default settings in Data Viewer (SR Research).

2.2. Risky choice experiment

In the risky choice experiment, participants selected one of three gambles based on three probabilistic outcomes. The design is very similar to Nicholas and Cohen (2016).

2.2.1. Participants

Twenty-eight University of Massachusetts Amherst undergraduates participated for course credit. Participants were also entered into a raffle based on performance; see Nicholas and Cohen (2016) for details. Different participants were involved in the preferential and risky choice experiments.

2.2.2. Stimuli

Each stimulus consisted of three gambles. Each gamble had three possible outcomes, and each outcome was associated with a probability. Each stimulus was presented in a 3×3 grid. A sample, schematic stimulus is provided in Fig. 2. Each row was a gamble alternative. Each column was a potential outcome. The top row contained the probability of each outcome. The probabilities were randomly generated on every trial from a uniform distribution from 0 to 1, normalized to sum to 1. Probabilities outside the range 0.10 to 0.45 triggered re-sampling. Probability order was random.

As in the preferential choice experiment, there were two types of alternatives: high and low. Outcomes for the high alternatives were sampled from a normal distribution with mean 60 and standard deviation 10. Outcomes for the low alternative were sampled from a normal distribution with standard deviation 10. To ensure that the low probability outcomes were relevant, the mean of the low outcome distribution changed based on the probability. For a probability of 0.45 (the maximum possible), the mean was 60 (the same as for the high alternative outcomes). For a probability of 0.10 (the lowest possible), the mean was 45. The means for outcomes with probabilities between 0.10 and 0.45 were linearly interpolated between these two endpoints and rounded. On every trial, there was one low and two high alternatives. To prevent extreme outliers, any gambles with outcomes outside the range 5–100 were re-sampled. The row order of the 3 alternatives was randomized. The rules for generating stimuli were not explained to participants; a generation sketch is given at the end of this section.

2.2.3. Procedure

The task was to select a gamble to maximize payout. On half of the trials, the participant was interrupted with a secondary spatial task, i.e., determining if a rotated R was presented in a normal or mirror configuration. Because only the non-interruption trials are analyzed here, the interruption trials will not be discussed (see Nicholas & Cohen, 2016, for details). The experiment was broken into 11 blocks: 1 practice block and 10 experiment blocks. Each experiment block had 3 interruption trials and 3 non-interruption trials, for a total of 30 non-interruption trials. At the start of each trial, only the alternative labels were visible; the rest of the grid was blank. Participants could then view the outcomes and probabilities by clicking the mouse in the empty cells. The value in a cell was visible while the mouse button was depressed and was removed from view when the mouse button was released. An alternative was selected by clicking on an alternative label. At least one outcome needed to be viewed before selecting an alternative. The participant played against the computer, which selected the gamble with the highest expected value not selected by the participant. The gamble was realized after each trial, with outcomes added to the participant and computer totals. If the participant gained more total money than the computer, they qualified for a raffle.
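For concreteness, here is a sketch of the stimulus-generation scheme described in Section 2.2.2 (illustrative; details such as rounding and the re-sampling loops are our reading of the text, not the authors' code):

```python
import numpy as np

def make_risky_stimulus(seed=None):
    """Generate one 3-gamble risky choice stimulus.

    Returns (probs, outcomes): probs is the length-3 outcome-probability
    vector; outcomes is a 3x3 array with rows 0-1 the 'high' gambles and
    row 2 the 'low' gamble (row order would be shuffled for presentation).
    """
    rng = np.random.default_rng(seed)
    while True:                                      # keep all p in [0.10, 0.45]
        p = rng.uniform(0, 1, 3)
        p /= p.sum()
        if (p >= 0.10).all() and (p <= 0.45).all():
            break

    def draw(mean):                                  # re-sample outcomes outside 5-100
        while True:
            x = round(rng.normal(mean, 10))
            if 5 <= x <= 100:
                return x

    high = [[draw(60) for _ in range(3)] for _ in range(2)]
    # Low-gamble outcome means interpolate from 45 (p = 0.10) to 60 (p = 0.45).
    low_means = 45 + (p - 0.10) / (0.45 - 0.10) * 15
    low = [draw(m) for m in low_means]
    return p, np.array(high + [low])

probs, outcomes = make_risky_stimulus(seed=1)
print(probs.round(2), outcomes, sep="\n")
```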

Fig. 2. Sample risky choice stimulus.


2.2.4. Data

The data from all non-interruption trials from all participants were used.

2.3. Risky choice replication experiment

A replication of the risky choice experiment was also run with many more trials per participant. This experiment was identical to the previous risky choice experiment except for the four differences explained below. It was specifically designed to constrain the response time models and so will only be discussed in that section.

2.3.1. Participants

Data were collected from 12 University of Massachusetts Amherst undergraduates, who participated for $11. A participant received $10 if the overall amount won at the end of the experiment was higher than the computer's. One participant with near-random choices and very short RTs was removed from analysis.

2.3.2. Stimuli

The stimuli were identical to the risky choice experiment.

2.3.3. Procedure

The procedure was identical to the risky choice experiment with four differences. First, because the focus of this analysis is the RTs, not process tracing, all of the information was visible to the participant throughout the trial. Second, as discussed previously, a participant received $10 if the overall amount won at the end of the experiment was higher than the computer's. Third, after 2 practice trials, each participant was run on three blocks of 101 trials, with a short break between blocks. Finally, there were no interruption trials.

2.3.4. Data

The first trial of each block was not analyzed, for a total of 300 trials per participant. RTs tend to change over time within an experiment. To select a set of stationary trials, i.e., trials on which the mean and variance of the RTs do not change, 235 consecutive trials were selected for each participant that maximized stationarity as defined by the Dickey-Fuller regression test (p < 0.05; Dickey & Fuller, 1979). (Due to a computer error, only the first 241 trials of one participant were stored, which is why 235 consecutive trials are considered.)
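A rough sketch of this stationary-window selection follows, using the augmented Dickey-Fuller test from statsmodels as a stand-in for the Dickey-Fuller regression in the paper (the authors' exact regression specification and selection criterion are not given, so the "most stationary window" rule below is an assumption):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def most_stationary_window(rts, width=235):
    """Return the width-trial window of an RT series that looks most
    stationary under the (augmented) Dickey-Fuller test, i.e., the window
    with the strongest rejection of a unit root."""
    rts = np.asarray(rts, dtype=float)
    best_start, best_p = 0, np.inf
    for start in range(len(rts) - width + 1):
        p_value = adfuller(rts[start:start + width])[1]   # element 1 = p-value
        if p_value < best_p:
            best_start, best_p = start, p_value
    return rts[best_start:best_start + width], best_p
```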
2.4. Behavioral results

On average, participants selected one of the two high alternatives on 89.4% (sd = 5.3%), 92.6% (sd = 7.1%), and 90.5% (sd = 8.3%) of the preferential, risky choice, and risky choice replication trials, respectively. The best alternative was defined based on participant-reported attribute ratings in the preferential choice experiment and expected value in the risky choice experiments. On average, participants selected the best alternative on 62.0% (sd = 9.9%), 73.8% (sd = 13.2%), and 68.9% (sd = 15.3%) of the trials in the preferential, risky choice, and risky choice replication experiments, respectively. The means across participants of the median RTs were 8.54 s (sd = 4.25 s), 11.00 s (sd = 5.75 s), and 7.25 s (sd = 2.06 s) on the preferential, risky choice, and risky choice replication trials, respectively. The 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles for each participant in the preferential, risky choice, and risky choice replication experiments are provided in the middle right panels of Figs. 5, 6, and 7, respectively. Process-tracing results are described below as part of the model process-tracing analysis.

3. Modeling choice and response time

3.1. Model fitting

Recall that the model details are provided in Appendix A. All model parameters for each model are provided in Tables A1–A3 in Appendix A. Model parameters were fit to participant choice data in two ways: maximizing log likelihood and cross-validation. Means of all maximum log likelihood parameters for all models are provided in a Supplement.

3.1.1. Maximum log likelihood

Maximum likelihood estimators have desirable mathematical properties such as consistency and efficiency (Lewandowsky & Farrell, 2010). The model was fit separately to the data from each participant. Let R_t be the response of the participant on trial t. Given the stimulus presented on trial t and a set of model parameters θ, each model M generates the probability of choosing R_t, i.e., P(R_t | M, θ). The maximum log likelihood given model M is then defined as

$$\ln L = \max_{\theta} \sum_{t} \ln\left(P(R_t \mid M, \theta)\right),$$


where t ranges across all trials for the participant. Calculating the likelihood also provides a convenient means for computing measures such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which evaluate evidence for one model over another, taking model complexity into account. Here, we report BIC and Bayes factors (BF). BIC is defined as

$$\mathrm{BIC} = -2 \ln L + k \ln(n),$$

where ln L is the maximum log likelihood measure, n is the number of trials, and k is the number of free model parameters. Tables A1–A3 in Appendix A provide the number of parameters (degrees of freedom, DOF) used to compute BIC for each model. The BF is a measure of the relative evidence for one model over another, given the data. The BF yielding evidence for Model 1 over Model 2 can be approximated from BIC as (Wagenmakers, 2007)

$$\mathrm{BF} = \exp\!\left(\frac{\mathrm{BIC}_2 - \mathrm{BIC}_1}{2}\right),$$

where BIC_x is the BIC for Model x.
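These two formulas are direct to compute; for example (illustrative numbers, not values from the paper):

```python
import numpy as np

def bic(ln_l, n_trials, k_params):
    """BIC = -2 ln L + k ln(n)."""
    return -2.0 * ln_l + k_params * np.log(n_trials)

def bayes_factor(bic_model1, bic_model2):
    """Approximate BF for Model 1 over Model 2 (Wagenmakers, 2007)."""
    return np.exp((bic_model2 - bic_model1) / 2.0)

# Model 1 fits slightly better (higher ln L) but has one extra parameter.
bf = bayes_factor(bic(ln_l=-120.0, n_trials=45, k_params=5),
                  bic(ln_l=-123.0, n_trials=45, k_params=4))
print(bf)   # about 3; by the thresholds given next, only weak evidence for Model 1
```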

We follow the guidelines from Raftery (1995) and define BF upper thresholds of 3, 20, and 150 for weak, positive, and strong evidence favoring one model over another.

3.1.2. Cross-validation

Maximum likelihood and its derivatives provide an excellent way to compare the relative performance of models, but are difficult to interpret in terms of evaluating how fully a model accounts for the observed data. We therefore also employed cross-validation using the expected proportion of correctly predicted choices. Cross-validation avoids the issue of overfitting that can occur when all trials are used to fit a model, naturally tends to favor simpler models over very complex ones (Lewandowsky & Farrell, 2010), and provides a means of assessing how well a model can predict a participant's choices on trials not used in the fitting process. While for large sample sizes the cross-validation criterion is asymptotically equivalent to AIC (Browne, 2000), for the sample sizes we consider here, cross-validation provides useful information concerning predictive ability. For instance, Towal, Mormann, and Koch (2013) used leave-one-out cross-validation to test a drift-diffusion model of preferential choice. We apply k-fold cross-validation (Geisser, 1975), in which the set of trials for each participant is divided into k subsets. The fitting process is repeated k times, withholding a different subset each time from the training set used to fit the parameters. For cross-validation, the expected value of the proportion of correct model predictions was maximized on the training set. In each repetition, the model is tested on the withheld subset, yielding the probability of the participant's choice on each test trial. The final measure is the average across the k testing subsets. In the preferential and risky choice experiments, k is 9 and 6, respectively, yielding a testing subset size of 5 in both experiments.

3.1.3. Fitting choice and RT

When both choice and RTs are fit, cross-validation is no longer an appropriate fitting technique, so only the maximum log likelihood fits were used. The models were fit to choice and RT data using the 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles and maximum log likelihood, as described in Donkin, Averell, Brown, and Heathcote (2009) and Heathcote, Brown, and Mewhort (2002). That is, the participant's data on each trial include both a selected alternative R_t and an RT quantile Q_t. For a given stimulus and a given set of parameters θ, the model M predicts the probability of each alternative and RT quantile combination. These model predictions are used to calculate a maximum log likelihood for that participant as

$$\ln L = \max_{\theta} \sum_{t} \ln\left(P(R_t \cap Q_t \mid M, \theta)\right),$$

where the sum is across all trials t for a participant.
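In code, this quantile-based likelihood amounts to binning each trial's RT by the participant's quantiles and summing log probabilities. The sketch below assumes a hypothetical `model_prob` function standing in for a model's closed-form prediction (e.g., the MLBA's); it is illustrative, not the authors' implementation:

```python
import numpy as np

def quantile_bin(rt, quantiles):
    """Map an RT into one of the 6 bins bounded by the participant's
    0.1/0.3/0.5/0.7/0.9 RT quantiles."""
    return int(np.searchsorted(quantiles, rt))

def choice_rt_log_likelihood(trials, quantiles, model_prob, theta):
    """Sum of ln P(R_t and Q_t | M, theta) over trials.

    trials: iterable of (stimulus, chosen_alternative, rt) tuples.
    model_prob(stimulus, choice, bin_index, theta): hypothetical function
    returning the model's predicted probability of that alternative being
    chosen with an RT falling in that quantile bin.
    """
    ll = 0.0
    for stimulus, choice, rt in trials:
        p = model_prob(stimulus, choice, quantile_bin(rt, quantiles), theta)
        ll += np.log(max(p, 1e-12))                  # guard against log(0)
    return ll
```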

We also fit the MLBA to the choice and RT data in a direct manner, i.e., without using quantiles. Because the results of these fits were both qualitatively and quantitatively similar to the quantile-based fits, we only report the quantile fits.

3.1.4. Parameter search

Fits were accomplished by first employing a coarse grid search across the parameter space, followed by a simplex optimization method using constrained fminsearch in MATLAB (MathWorks) with multiple randomized starts near the best parameter set found by the grid search (Lewandowsky & Farrell, 2010). Parameter constraints are provided in Table A1 for the preferential choice models and in Table A2 for the risky choice models.
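A Python analogue of this two-stage search is sketched below (the paper's fits used constrained fminsearch in MATLAB; here scipy's Nelder-Mead simplex plus clipping stands in, and the restart scheme is illustrative):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def fit_parameters(neg_ln_l, grids, bounds, n_restarts=10, seed=0):
    """Coarse grid search, then Nelder-Mead restarts near the grid winner.

    neg_ln_l: function mapping a parameter vector to -ln L.
    grids: one coarse value array per parameter.
    bounds: (low, high) pair per parameter, enforced by clipping.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    best = np.array(min(itertools.product(*grids),
                        key=lambda th: neg_ln_l(np.array(th))))
    best_val = neg_ln_l(best)
    for _ in range(n_restarts):                      # randomized local restarts
        start = np.clip(best + rng.normal(0, 0.05 * (hi - lo)), lo, hi)
        res = minimize(neg_ln_l, start, method="Nelder-Mead")
        cand = np.clip(res.x, lo, hi)                # crude constraint handling
        if neg_ln_l(cand) < best_val:
            best, best_val = cand, neg_ln_l(cand)
    return best, best_val
```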


3.2. Choice

The utility, 3 heuristic, and 2 dynamic models were first compared on how well they account for choice in both the preferential and risky choice experiments. Given a particular set of alternatives, each of these models produces a probability distribution of choice across the alternatives. Because participants differ in how they weight attributes, we fit data within each individual, rather than fitting averaged data. For instance, Ashby et al. (2016) advocate fitting drift-diffusion models to individual choice data (also see Cohen, Sanborn, & Shiffrin, 2008).

Results are provided in Figs. 3 and 4 for the preferential and risky choice experiments, respectively. The upper left panel provides the negative BIC measure for each model; negative BIC was used so that higher values are more favorable. The box plots show the results across participants. The lower left panel shows the ln(BF) for each pair of models. Each group of bars displays the evidence for the model listed on the x-axis compared to each of the other models. Each bar shows the central 50% (thicker bar) and 90% (thinner bar) range of ln(BF) values across participants. The higher the value, the stronger the relative evidence for the given model. For example, in Fig. 3, the ln(BF) provides at least positive evidence for most participants in favor of the MLBA relative to the SRU and MDFT, and very strong evidence for all participants in favor of the MLBA relative to the LEX, EBA, and MV.

The upper right panel shows the average proportion correct across the testing subsets of the k-fold cross-validation procedure. A value of 1 indicates that a model perfectly predicted choices in all testing subsets when fit to the training subsets. Chance performance in both experiments is 1/3. The box plots show the results across participants. The lower right panel shows pairwise model differences in the cross-validation measure. Higher values indicate better performance for the model listed on the x-axis relative to the other models. Each bar shows the central 50% (thicker bar) and 90% (thinner bar) range of cross-validation differences across participants. For example, in Fig. 3, the MLBA had higher predictive performance than the SRU for almost all participants, than the MDFT for approximately 75% of participants, and than the LEX, EBA, and MV for all participants.

First, consider the results from the preferential choice experiment shown in Fig. 3. Regardless of the measure used, all of the heuristic models do poorly, so the focus will be on the other models. The SRU, MDFT, and MLBA yield similar, but not identical, results. The BF produces weak to strong evidence in favor of the MDFT over the SRU, and of the MLBA over both the SRU and MDFT. Similarly, on the cross-validation measure, the MDFT accounts somewhat better for the data than the SRU, and the MLBA somewhat better than the MDFT.

Next, consider the results from the risky choice experiment shown in Fig. 4. Although the MV model does a good job for a small subset of participants, on the whole, the heuristic models fare poorly again. The WV, MDFT, and MLBA results are very similar, and neither the BF nor the cross-validation measure provides clear evidence for one model over the others.

In summary, similar to the results of Glöckner and Pachur (2012), the heuristic models perform poorly. For the risky choice data, the WV, MDFT, and MLBA models produce similar results. For the preferential choice data, the MLBA has a slight advantage over the MDFT, which itself is slightly better than the SRU. Note that, according to the cross-validation results, the SRU, WV, MDFT, and MLBA have fairly good predictive power, especially for the risky choice data. This result is particularly impressive given that, as discussed previously, 2 of the 3 alternatives in each trial were designed to have comparable utility. That is, the models are able to predict how a participant will differentiate between two similar alternatives.

Fig. 3. Choice results across participants from the preferential choice experiment. Upper left: -BIC results. Lower left: ln(BF) for each pair of models. Upper right: Expected proportion correct from the cross-validation procedure. Lower right: Pairwise model differences in expected proportion correct from the cross-validation procedure.


Fig. 4. Choice results across participants from the risky choice experiment. Upper left: -BIC results. Lower left: ln(BF) for each pair of models. Upper right: Expected proportion correct from the cross-validation procedure. Lower right: Pairwise model differences in expected proportion correct from the cross-validation procedure.

3.3. Choice plus response time

The MLBA, LEX, and EBA were jointly fit to choice and response time (RT) data from both experiments. Although the MDFT does, in principle, make reaction time predictions, in practice, as discussed previously, fitting the optional stopping time version of the model is computationally intractable. The EBA and LEX heuristics involve comparison of alternatives, examining attributes in turn until a decision is made; the response time therefore depends on how many attributes must be examined until a single alternative remains. The utility and MV models were also fit, but, because these models do not make explicit RT predictions (i.e., all values must be viewed on every trial), the RTs from these models were assumed to follow an ex-Gaussian distribution, which describes RTs well in a wide variety of experiments (Lacouture & Cousineau, 2008). These latter models serve as a comparison for the MLBA, LEX, and EBA, which intrinsically predict RTs.

3.3.1. Overall fit

Consider the overall fit of the models as measured by BIC. The fitting results are shown in Figs. 5 and 6 for the preferential and risky choice data, respectively. As before, the upper left panel provides the negative BIC measure for each model and the middle left panel shows the ln(BF) for each pair of models. The LEX and EBA models perform poorly for both experiments. For the preferential choice experiment, the MLBA performs better for most participants than the SRU model which, in turn, performs better than the MV model for all participants. For the risky choice experiment, the WV model performs better for most participants than the MLBA and MV, which perform similarly.

3.3.2. Choice

Next, consider how well the models account for choice performance. The upper right panel shows the mean proportion correct for each model across participants. Again, the LEX and EBA models fare poorly. For both experiments, the utility models and the MLBA perform similarly and better than the MV model.

3.3.3. Response time

Finally, consider how well the models account for RTs. The middle right panels of Figs. 5 and 6 show the RT quantiles for each participant in the preferential and risky choice experiments, respectively. Each vertical set of 5 dots gives the 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles for a single participant. The participants are ordered by median RT.
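The ex-Gaussian assumed for the utility and MV models is simply the sum of a Gaussian and an independent exponential; scipy exposes it as `exponnorm`. A minimal sketch with illustrative parameters (not fits from this paper):

```python
from scipy.stats import exponnorm

def ex_gaussian(mu, sigma, tau):
    """Ex-Gaussian RT distribution: Gaussian(mu, sigma) + Exponential(tau).
    scipy's parameterization is exponnorm(K = tau/sigma, loc = mu, scale = sigma)."""
    return exponnorm(tau / sigma, loc=mu, scale=sigma)

rt_dist = ex_gaussian(mu=5.0, sigma=1.0, tau=3.5)    # seconds, illustrative
print(rt_dist.ppf([0.1, 0.3, 0.5, 0.7, 0.9]))        # predicted RT quantiles
```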


Fig. 5. Choice and RT results across participants from the preferential choice experiment. Upper left: -BIC results. Middle left: ln(BF) for each pair of models. Lower left: Kendall's tau rank correlation for RTs. Upper right: Proportion correct predictions. Middle right: Quantile analysis for each participant for the ex-Gaussian distribution and MLBA models. Lower right: Kendall's tau correlations between the actual and MLBA predicted RTs when fit to RTs in the original and permuted order for participant and simulated data.

To assess the ability of the models to account for the overall pattern of RTs, we first consider the RT distribution combining across trials for each participant. The dashed lines show the predicted RT quantiles for the utility and MV models, which both rely on the ex-Gaussian (exG) to predict RTs. The solid lines show the predicted RT quantiles for the MLBA model. Both the ex-Gaussian and the MLBA provide an excellent account of the overall RT distributions for each participant in each experiment, even though the models were not fit directly to the RT distributions, but rather to the trial-by-trial RT quantiles. This result, that the MLBA can predict RT distributions, supports previous analyses of the LBA (Brown & Heathcote, 2008). The EBA and LEX models fail to account for these distributions (see Fig. B1 in Appendix B).

We next examine the trial-by-trial model predictions. The lower left panels of Figs. 5 and 6 show the rank correlations (rs; Kendall's tau) between the participant RTs and predicted RTs for each model in the preferential and risky choice experiments, respectively. The LEX and EBA models continue to perform poorly, with correlations near 0. Although it was able to account for the overall RT distributions, the ex-Gaussian (i.e., the utility and MV models) cannot account for these data, again with correlations near 0. Given that the ex-Gaussian does not take any stimulus information into account, this result is not surprising. Unlike the ex-Gaussian, however, the MLBA has the advantage of being able to generate RT predictions on a stimulus-by-stimulus basis. As discussed previously, to our knowledge, this aspect of the MLBA has never been tested. The MLBA performs somewhat better than the other models, with median correlations of 0.14 and 0.16 in the preferential and risky choice experiments, respectively.


Fig. 6. Choice and RT results across participants from the risky choice experiment. Upper left: -BIC results. Middle left: ln(BF) for each pair of models. Lower left: Kendall's tau rank correlation for RTs. Upper right: Proportion correct predictions. Middle right: Quantile analysis for each participant for the ex-Gaussian distribution and MLBA models. Lower right: Kendall's tau correlations between the actual and MLBA predicted RTs when fit to RTs in the original and permuted order for participant and simulated data.

We now explore the MLBA RT fits in more detail. Consider the lower right panels of Figs. 5 and 6. The x-axis coordinate of each filled circle represents the rank correlation between a participant's RTs and the MLBA RT predictions (i.e., the correlations from the lower left panels). The gray vertical line shows the p = 0.05 cutoff for the correlation; that is, correlations to the right of this line are significantly different from 0 at the α = 0.05 level. Seven of the 32 participants in the preferential choice experiment and 7 of the 28 participants in the risky choice experiment had a correlation that was significantly different from 0.

As another way to test whether the correlation to the participant RT data is significant, we examined correlations resulting from scrambled RT data. That is, for each participant, each stimulus was randomly paired with an RT from that participant, without replacement. The MLBA was fit to the choice data and these permuted RTs, and a correlation was generated as above. This process was repeated 100 times. For each of the 100 permuted data sets, we found the rank correlation with predicted RT, and then determined the middle 95% interval of the resulting set of correlations. A participant RT correlation above this interval indicates that the model is accounting for meaningful patterns in the RT data. The results were very similar to using the direct correlation significance cutoff: 8 and 6 participants had a correlation above the 95% interval for the preferential and risky choice experiments, respectively.
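The permutation logic is easy to sketch. The version below is simplified relative to the paper: the authors refit the MLBA to each permuted data set, a step represented here only by a hypothetical `fit_and_predict` function, so just the interval construction is shown:

```python
import numpy as np
from scipy.stats import kendalltau

def permutation_interval(observed_rts, fit_and_predict, n_perm=100, seed=0):
    """Middle 95% interval of observed-vs-predicted RT rank correlations
    when the model is fit to RTs in permuted order.

    fit_and_predict: hypothetical function that fits the model to an RT
    vector and returns per-trial predicted RTs (a stand-in for the full
    MLBA choice-plus-RT fit)."""
    rng = np.random.default_rng(seed)
    taus = []
    for _ in range(n_perm):
        predicted = fit_and_predict(rng.permutation(observed_rts))
        taus.append(kendalltau(observed_rts, predicted)[0])  # [0] = tau
    return np.percentile(taus, [2.5, 97.5])
```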


One potential concern is that there are insufficient trials to accurately recover model parameters for each participant. We address this concern in four ways. First, we determine whether we are able to account for RT data that were actually generated by the model with the given experimental designs. The MLBA was fit to choice and RT data that were generated by the MLBA using the same stimuli and number of trials as used in the two experiments. RT data were simulated from the MLBA using randomly generated parameters with the same range as the fit parameters. The MLBA was fit to the simulated data and also to permuted data, as described above. The resulting correlations are shown as triangles in Figs. 5 and 6, shaded by the value of the drift rate variability, s, used to generate the simulated data (to indicate the level of noise in that simulation; lighter colors indicate higher values of s, i.e., higher noise; values range from 0 to 0.08). For both experiments, most of the correlations are strongly significant for the simulated data, suggesting that the MLBA can account for patterns in the RT data in this experimental context. To examine why a few of these simulated correlations are relatively low, we explored the effect of noise in the model, specifically in the drift rate variability, s. With low to medium noise, the model does an excellent job recovering the RT data, again demonstrating that parameter recovery is possible with the current experimental design. With high noise, the correlations have a similar range as those for the participant data. The high-noise results can be interpreted in two ways: the model cannot fully account for the RTs and so increases noise to improve fit, or participant data are inherently noisy. An exploration of this difficult question is left for future research.

Second, we verify the experimental design and fitting method through a test of parameter recovery. Data were generated from the MLBA using randomly generated parameters from the same range as the participant parameters. We tested the ability of the model to recover the parameter values and correctly predict choice and RT. For both experiments, the recovery was highly successful, again suggesting that these experiments are sufficient to fit both choice and RT data. Details of the analysis and results are provided in Appendix C.

The next two approaches, pooling data and a new experiment with more trials per participant, are designed to increase the sample size of the data. Only risky choice is considered because individual variability in preferential choice attribute weights precludes pooling across participants, the original risky choice experiment had a smaller sample size than the preferential choice experiment, the application of the MLBA to risky choice is novel, and we wanted to ensure that requiring participants to click cells to view values did not fundamentally alter choice strategy.

Third, we fit to choice and RT data pooled across participants for the risky choice experiment. To account for differences in RT across participants, the RTs were Vincentized (Heathcote, 1996), a method for creating a pooled RT distribution. The MLBA was then fit to these pooled data, consisting of 840 trials, as described previously, for both choice and RT. The model correctly predicted 73.8% of the choices, which is very similar to the mean of the individually fit participants (75.2%). The rank correlation of the participant and predicted RTs is 0.09, which is similar to, but slightly lower than, the mean of the individually fit participants (0.14). These results are consistent with those reported previously.
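Vincentizing is essentially quantile averaging; a minimal sketch (the probability grid is an illustrative choice, not taken from the paper):

```python
import numpy as np

def vincentize(rt_lists, probs=np.linspace(0.05, 0.95, 19)):
    """Pool RT distributions across participants by averaging each
    participant's RT quantiles at a common set of probabilities
    (Heathcote, 1996)."""
    quantiles = np.array([np.quantile(np.asarray(rts, dtype=float), probs)
                          for rts in rt_lists])
    return quantiles.mean(axis=0)                    # pooled quantile function
```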
Finally, we ran the risky choice replication experiment described previously, with many more trials per participant. The models were fit to the individual participant data as before. The results are provided in Fig. 7, which follows the same format as Fig. 6. Consistent with the previous risky choice experiment, the heuristic models perform poorly on both choice and RT, the WV and MLBA perform similarly on choice, and the MLBA better accounts for the RT data. For six of the participants, the MLBA achieved a significant correlation with the RT data. Using the 95% interval determined from the 100 permutation fits, as described previously, the results were nearly identical to using the direct correlation significance cutoff: 7 participants had a correlation above the 95% interval.

Having more trials per participant also allows us to determine a more stable RT estimate by averaging across stimuli within a participant. Recall that the stimuli were all generated from a common underlying distribution; that is, there were no a priori stimulus classes. We therefore allow the model itself to group the stimuli, based on predicted RT. Given the best-fitting choice-plus-RT parameters, the MLBA was used to generate an expected RT for each stimulus. The stimuli were then placed into 5 equal-sized bins from predicted fastest to slowest, i.e., 47 stimuli in each bin. The results are shown in Fig. 8. Each panel of this figure shows the range of participant RTs for each set of binned stimuli for each participant. The indicated correlations (rs), provided for reference, are between the participant RTs and predicted RTs for all stimuli and correspond to the rank correlations shown in Fig. 7. There is no discernible relationship between observed and predicted RT for Participants 1–5. Participant 11, on the other hand, shows a clear, monotonic relationship between the bins and participant RT. For the remaining participants, Participants 6–10, the fastest bin is fairly well predicted, although, with a few exceptions, there is little to no relationship in the other bins. There are clearly individual differences in how well the MLBA can predict RTs. For one participant, it does a moderately good job. For about half of the participants it does poorly, and for the other half it can reliably predict which stimuli will be responded to fastest.

To look further into the relationship between stimulus characteristics and MLBA RT predictions, consider the lower right panel of Fig. 8. Each point is a participant. The y-axis provides the rank correlation between participant RT and the MLBA predicted RT, corresponding to both the MLBA data in the lower right panel of Fig. 7 and the rs in the other panels of Fig. 8. That is, the higher the y-axis correlation, the better the relationship between participant RT and MLBA predicted RT. The x-axis provides the rank correlation between participant RT and the difference in expected value between the top two alternatives of each stimulus. That is, a more negative x-axis correlation means that participants take longer to respond to stimuli in which the expected values of the top two alternatives are similar. The correspondence between these two measures is remarkable. The strong negative linear relationship means that the MLBA RT predictions are strongly related to how easy it is to distinguish the expected values of the alternatives. Participants who are faster on easy-to-discriminate alternatives are better predicted by the MLBA on RT. Some, but not all, participants follow this pattern, suggesting that choice is not always based purely on a straight expected-value calculation.

In summary, the utility and MLBA models continue to account for the choice data. Both the ex-Gaussian and MLBA do an excellent job of describing the overall RT distribution. The ex-Gaussian, however, cannot predict RTs on a trial-by-trial level.


Fig. 7. Choice and RT results across participants from the risky choice replication experiment. Upper left: -BIC results. Middle left: ln(BF) for each pair of models. Lower left: Kendall's tau rank correlation for RTs. Upper right: Proportion correct predictions. Middle right: Quantile analysis for each participant for the ex-Gaussian distribution and MLBA models. Lower right: Kendall's tau correlations between the actual and MLBA predicted RTs when fit to RTs in the original and permuted order for participant data.

The MLBA does account for some of the trial-by-trial RT variability; however, it does so only weakly and only for a subset of participants. Fitting the MLBA to simulated data with participant-level noise highlights that the noise dominates any potential explanatory power of the model. That is, even if the MLBA is the correct model, the vast majority of the variability in the RT data is left unexplained.

4. Process-tracing analysis

Although the utility and dynamic models can both account for choice performance, the RT analyses only weakly support the dynamic models, represented by the MLBA, over the utility models. All of these analyses thus far, however, assume that the participant has access to and uses the complete set of attribute values for all alternatives. As discussed previously, this assumption is very strong and probably incorrect (Glöckner & Betsch, 2008; Lohse & Johnson, 1996; Nicholas & Cohen, 2016). To explore whether the addition of an attention component to the models can improve predictive ability, the next set of analyses incorporates what information was actually viewed and in what order. The experiments provide this information in the form of process-tracing measures: eye tracking in the preferential choice experiment and mouse tracking in the risky choice experiment.


Fig. 8. Each panel shows the range of participant RTs for each set of stimuli, as binned by the MLBA. Bin 1 corresponds to the fastest 20% of the stimuli as predicted by the MLBA. Bin 2 corresponds to the next fastest 20% of the stimuli, and so on. The indicated correlation is between the participant RTs and predicted RTs for all stimuli. The lower right panel shows the rank correlation of participant RT and predicted MLBA RT, on the y-axis, vs. the rank correlation of participant RT and the difference in expected value between the top two gambles, on the x-axis.

We derive qualitative hypotheses about the process-tracing measures for each of the models and test these hypotheses against the observed data. The SRU and WV utility models are adapted to the process-tracing paradigm. We also consider the MLBA and two forms of the MDFT: a closed form and a sequential form. The closed form of the MDFT was derived by Roe et al. (2001) and was used to predict choice above. The closed form uses an external stopping rule, i.e., the trial length is predetermined, and essentially provides an expected value over all possible information-search trajectories from the sequential model. The sequential model assumes that the participant views one attribute at a time and sequentially updates preferences for the alternatives. Because the MV heuristic performed well for a subset of participants in the risky choice experiment, it is also adapted to the process-tracing paradigm. Because they prescribe the information-search pattern, and so are not easily adapted to this paradigm, the EBA and LEX heuristics are not considered here.

Here we define a gaze or fixation as any set of consecutive views or mouse clicks on a value in the stimulus matrix before moving away from that value. For example, in Fig. 1, if a participant moves their eye around the 2 stars on the size attribute for apartment 1 before moving on to the 5 stars on apartment 2, that counts as a single gaze on the 2 stars. For simplicity, we use the term gaze to refer to both eye fixations and mouse clicks.

We follow previous research (Fiedler & Glöckner, 2012; Glöckner & Herbold, 2011) that has derived model-based predictions about the information-search process. We focus on four measures: search direction, the number of values examined, the gaze cascade effect, and choosing the last alternative viewed. A summary of the process-tracing predictions is provided in Table 1. It is important to note that, whereas a "yes" in Table 1 means that the hypothesis necessarily follows, a "no" means that the hypothesis does not necessarily follow, not that it cannot sometimes occur.

Search direction is a measure of whether participants tend to collect information within-alternative (i.e., looking at different attributes within an alternative before moving on to a new alternative) or within-attribute (i.e., looking at the same attribute across alternatives before moving on to a new attribute). Within-alternative search is the hallmark of compensatory strategies (Payne et al., 1988) such as the utility models considered here: to calculate an alternative's utility, the participant needs to sequentially consider all of the values within that alternative. The MDFT contrasts attribute values across alternatives; thus, the most natural prediction for the MDFT is within-attribute search. Predictions of the MLBA are somewhat more complicated. The MLBA incorporates both front- and back-end processes. The front-end, which is assumed to take the same time for all stimuli, is the information-gathering phase that sets the mean drift rates for each alternative (Trueblood et al., 2014). Although not explicit, the drift rate formulas suggest that pairs of alternatives are compared within each attribute, leading to a within-attribute search pattern (but see Noguchi & Stewart, 2014).


Table 1. Model process tracing hypotheses.

Note: Shaded cells were supported by the data.

There is no information search in the back-end process, which accounts for variation in RTs. Thus, in Table 1, the MLBA is listed as predicting both unspecified and within-attribute search. Because search for the maximum value can proceed in any direction, the MV heuristic is agnostic regarding search direction.

The Payne Index (PI; Payne, 1976) was used to measure search direction. The PI is defined as

$$\mathrm{PI} = \frac{\text{within-alternative} - \text{within-attribute}}{\text{within-alternative} + \text{within-attribute}},$$

where within-alternative and within-attribute are the number of within-alternative and within-attribute transitions, respectively. Transitions that cross both attributes and alternatives are discarded. This measure ranges from −1 (totally within-attribute) to +1 (totally within-alternative). The median PI was 0.23 (middle 90% = 0.04, 0.44) in the preferential choice experiment and 0.24 (middle 90% = −0.46, 0.75) in the risky choice experiment, strongly suggesting within-alternative search for preferential choice and moderately favoring within-alternative search for risky choice.

The PI, however, is a biased measure when the matrix of information is not square (Böckenholt & Hynan, 1994; Reisen, Hoffrage, & Mast, 2008). For example, in the preferential choice experiment, there are 4 attributes, but only 3 alternatives, which makes it more likely that a participant will make a within-alternative transition. To account for this bias, we determined the middle 90% interval of PIs for each participant assuming random transitions.² Each participant's PI was then compared to the corresponding interval. In the preferential choice experiment, 37.5% and 25% of the participants favored within-alternative and within-attribute search, respectively. In the risky choice experiment, 64.3% and 14.3% of the participants favored within-alternative and within-attribute search, respectively. The PI for the remaining participants did not differ from chance. More participants used within-alternative search, but there were individual differences.

Models of choice often assume that all of the information is accessed by the participant. Participants, however, do not always examine all values provided (Glöckner & Betsch, 2008; Lohse & Johnson, 1996; Nicholas & Cohen, 2016). The utility and MV models require that all values are known. The closed-form version of the MDFT and the MLBA also assume all information is accessed. In the sequential version of the MDFT, not all information is necessarily accessed before the threshold is reached, i.e., before the choice is made. The proportion of unique values viewed (i.e., repeated views of the same value were not counted) was calculated for each participant within each trial. For the preferential and risky choice experiments, the mean proportion of unique values viewed was 0.75 (middle 90% = 0.55, 0.91) and 0.87 (middle 90% = 0.58, 1.00), respectively. Participants usually did not view all available information. They did, however, view information multiple times. The mean total number of values viewed, including repeats, was 20.72 (middle 90% = 10.22, 35.72) and 13.74 (middle 90% = 5.63, 23.30) in the preferential and risky choice experiments, respectively.

As the trial progresses, participants tend to look more often at the alternative they ultimately select. This progression is called the gaze cascade effect (Shimojo, Simion, Shimojo, & Scheier, 2003). Because none of the models considered in this section narrows the set of alternatives over time,³ unlike, for example, the EBA and LEX, they do not predict a gaze cascade effect. To measure the gaze cascade effect, we split each trial into the first, middle, and final third of gazes. For each third, we calculated the proportion of gazes on the alternative that was ultimately selected on that trial. An increasing proportion indicates a gaze cascade effect.
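Before turning to the results, two of these measures are sketched below as they could be computed from a single trial's gaze sequence. The helper names are illustrative, and the chance baseline here is a simple permutation of the observed gazes, whereas the procedure in footnote 2 additionally matches each participant's transition counts:

```python
import numpy as np

def payne_index(gazes):
    """PI from a trial's (alternative, attribute) gaze sequence.

    Transitions that change both the alternative and the attribute are
    discarded, as in the text.
    """
    within_alt = within_att = 0
    for (a1, t1), (a2, t2) in zip(gazes, gazes[1:]):
        if a1 == a2 and t1 != t2:
            within_alt += 1
        elif t1 == t2 and a1 != a2:
            within_att += 1
    total = within_alt + within_att
    return (within_alt - within_att) / total if total else np.nan

def chance_interval(gazes, n_sim=10_000, seed=0):
    """Middle 90% interval of the PI under reshuffled gaze orders
    (a simple stand-in for the matched-transition simulation)."""
    rng = np.random.default_rng(seed)
    sims = [payne_index([gazes[k] for k in rng.permutation(len(gazes))])
            for _ in range(n_sim)]
    return np.nanpercentile(sims, [5, 95])

def cascade_proportions(gazes, chosen):
    """Proportion of gazes on the ultimately chosen alternative in the
    first, middle, and final thirds; an increase indicates a cascade."""
    alts = np.array([alt for alt, _attr in gazes])
    return [float(np.mean(third == chosen))
            for third in np.array_split(alts, 3)]
```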
The proportion of gazes on the selected alternative in each third was 0.34, 0.42, and 0.55 in the preferential choice experiment and 0.35, 0.35, and 0.45 in the risky choice experiment.⁴ There was a weak gaze cascade effect, which we conjecture was mainly driven by elimination of the weakest, low alternative.

An extreme version of the gaze cascade effect is the prediction that the participant will select the last option viewed. None of the considered models predicts that the last alternative viewed is necessarily selected.

² We simulated 10,000 random sequences of information search for each participant with the sum of within-alternative and within-attribute transitions equaling that of the participant.
³ Although Roe et al. (2001) suggest removing alternatives that hit a lower threshold.
⁴ Preferential choice: F(2, 62) = 141.6, p < 0.001; Bins 1 vs. 2, t(31) = 7.8, p < 0.001; Bins 1 vs. 3, t(31) = 13.3, p < 0.001; Bins 2 vs. 3, t(31) = 11.5, p < 0.001. Risky choice: F(2, 54) = 9.6, p < 0.001; Bins 1 vs. 2, t(27) = 0.1, p = 0.89; Bins 1 vs. 3, t(27) = 4.4, p < 0.001; Bins 2 vs. 3, t(27) = 2.9, p = 0.007.


The mean proportion of trials on which the last alternative viewed was selected was 0.71 (middle 90% = 0.49, 0.87) in the preferential choice experiment and 0.55 (middle 90% = 0.16, 0.78) in the risky choice experiment. Participants did not necessarily select the last alternative viewed, but some were biased towards viewing their selected alternative last.

The results of the process tracing hypothesis analysis are equivocal. None of the models is fully supported. This analysis therefore suggests that the information-gathering assumptions of the models do not fully capture human behavior.

5. Models of attentional sampling

The second aim of this research is to test preference-formation assumptions, independently of attention, by developing a model framework that directly incorporates empirical process-tracing information. That is, the models are adapted to act on the viewed values as they arrive sequentially in time. Because the models incorporate the actual gaze trajectories used by the participants, RT is fixed for each trial, so only choice is considered. All model details are provided in Appendix A.

Prior work has found that process-tracing fixation data can be predictive of trial-by-trial choice (Franco-Watkins, Davis, & Johnson, 2016; Glöckner & Herbold, 2011; Johnson, Schulte-Mecklenbeck, & Willemsen, 2008). For example, the ADDM uses empirical transition probabilities between products to drive the stochastic evidence accumulation process. A similar approach was used by Towal et al. (2013). Ashby et al. (2016) incorporated moment-by-moment eye fixations into a set of drift-diffusion models to predict multi-alternative choice. These previous studies, however, do not consider multiple, explicit attributes.

As a basis of comparison, the utility models are adapted to use the information-processing data. The SRU and WV utility models are identical to the models described previously, with the addition that all values unknown to the participant are replaced by the mean value across all attributes and trials. For the MV heuristic model, all missing values are assumed to be 0, so they could not be selected as the maximum value.

Building on ideas from the MDFT, MLBA, and ADDM, we also develop and test a family of sequential sampling models, the models of attentional sampling (MAS). In each MAS model, information is added to the decision process in the order in which it was viewed. As for the SRU and WV, unviewed values are assumed to equal the mean value across all attributes and trials. Each MAS instantiation incorporates a different set of preference-formation assumptions. The first assumption is that attribute values are used to directly update preference states. Although it is intuitive that values play a role in preference formation, once their influence on attention is accounted for, the scope of their further contribution is an open question. Recently, researchers have taken the strong position that, in certain choice contexts, attention predicts choice independently of values (Stewart et al., 2016). For example, it could be that attention to each alternative is driven by the stimulus values, but that the preference states of attended alternatives change at a fixed rate, regardless of the values. The second assumption is that people contrast values across alternatives within an attribute, rather than using the attribute values independently.
In the former case, the preference state for each alternative is updated based on the difference between its attribute value and the mean of the attribute values of the other alternatives. In the latter case, each alternative is updated based only on its own attribute value. The final assumption is that people update the preference states, i.e., the accumulated evidence, for all alternatives concurrently, rather than only updating the preference state for the currently viewed alternative. Because the second two assumptions are only relevant if values are attended, we only consider five of the eight possible variants.

The MAS variants were named as follows. Each model has three binary digits after "MAS": the first digit indicates whether values were used, the second whether value contrasts were used, and the third whether the preference states for all alternatives were updated. For example, MAS100 uses values, but does not incorporate contrasts, and only updates the preference state for the currently viewed alternative. MAS000 only tracks attention, ignoring values, and only updates the viewed alternative. MAS111, which uses values, incorporates contrasts, and updates all alternatives, is equivalent to the sequential sampling version of the MDFT. See Appendix A for model details.

For computational considerations, the implemented versions of the MAS incorporate normally distributed additive noise at the end of a trial, rather than moment-by-moment information accumulation noise. The accumulation rates change as new information is viewed. This simplification approximates the actual stochastic process while remaining computationally tractable, and is akin to the constant drift rate assumption of the MLBA. Indeed, although the MLBA is not considered here as a separate model, the current implementation is similar to a piecewise linear version of the MLBA (Holmes, Trueblood, & Heathcote, 2016).

The purpose here is both to compare the MAS family of models against the utility and MV models and to test the three assumptions discussed previously. The models were evaluated, as described in the previous section on choice, using BIC and cross-validation. The results are shown in Fig. 9 for the preferential choice experiment and Fig. 10 for the risky choice experiment. These figures follow the same format as Figs. 3 and 4.

For the preferential choice experiment, most of the models perform similarly, with two exceptions. Models MAS100 and MAS000 have a much better BIC than the rest of the models, with MAS100 providing weak to strong evidence over MAS000 across most participants. The cross-validation, however, clearly favors MAS100 (−BIC median = −62.1; CV E(Pcorr) median = 0.76). For preferential choice, the dynamic models were thus supported over the utility and heuristic models. These results suggest that values were used during preference formation, but that participants did not contrast attributes across alternatives and updated preference states one at a time, consistent with the within-alternative gaze pattern.
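For reference, the two evaluation measures used here can be sketched as follows. This is a minimal illustration, assuming per-trial predicted choice probabilities from a fitted model; the ln(BF) approximation from BIC differences follows Raftery (1995):

```python
import numpy as np

def bic(log_lik, n_params, n_trials):
    """Bayesian information criterion; the figures plot -BIC, so larger
    -BIC means better fit."""
    return -2.0 * log_lik + n_params * np.log(n_trials)

def ln_bayes_factor(bic_1, bic_2):
    """ln(BF) of model 1 over model 2, approximated from the BICs
    (Raftery, 1995)."""
    return (bic_2 - bic_1) / 2.0

def expected_p_correct(pred_probs, choices):
    """Cross-validation E(Pcorr): mean predicted probability of the
    alternative the participant actually chose on held-out trials."""
    pred_probs = np.asarray(pred_probs)              # trials x alternatives
    return float(pred_probs[np.arange(len(choices)), choices].mean())
```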


Fig. 9. Process tracing results across participants from the preferential choice experiment. Upper left: -BIC results. Lower left: ln(BF) for each pair of models. Upper right: Expected proportion correct from the cross-validation procedure. Lower right: Pairwise model differences in expected proportion correct from the cross-validation procedure.

Fig. 10. Process tracing results across participants from the risky choice experiment. Upper left: -BIC results. Lower left: ln(BF) for each pair of models. Upper right: Expected proportion correct from the cross-validation procedure. Lower right: Pairwise model differences in expected proportion correct from the cross-validation procedure.


For the risky choice experiment, the results are less clear. Using both BIC and cross-validation, MAS100 again performs best, but the advantage is less pronounced (−BIC median = 35.6; CV E(Pcorr) median = 0.80). Model MAS000 performs far worse than all of the other MAS variants, which again suggests that value is an important part of the preference-formation process (cf. Stewart et al., 2016). The BIC results, where MAS110 and MAS100 perform somewhat better than MAS111 and MAS101, suggest that preference states are updated one alternative at a time. Because MAS100 and MAS110 produce similar results, the findings are ambivalent on whether participants contrast values. Except for a few participants, the MV heuristic model performs poorly. The WV model performance is very similar to MAS100. This latter result is perhaps not surprising given that the accumulator models converge to the prospect theory predictions.

In absolute terms, the performance of MAS100 in both experiments is impressive. As demonstrated by the cross-validation results, the predictive power of this model is quite robust. Inclusion of the process tracing information significantly improves model performance, increasing average predictive accuracy by up to 8 percentage points over the best choice models discussed previously. These results highlight that knowledge of the values is not sufficient; it is also important to incorporate time-extended attentional processes into models of choice. Although they did not perform well in this experimental context, one potential advantage of heuristics like the EBA and LEX (and, for example, the take-the-best heuristic in a judgment task; Gigerenzer & Goldstein, 1996) is that they naturally incorporate such attentional assumptions. The finding that the utility model performed relatively poorly in preferential choice, but relatively well in risky choice, may point to an underlying difference in the information search process in these two paradigms.

6. Discussion

The first aim of this research was to compare utility, heuristic, and dynamic models in a multi-alternative, multi-attribute experimental paradigm where all attribute values were explicit and presented separately. Both preferential and risky choice experimental environments were considered. Models were compared on choice alone, choice plus response time, and on qualitative process-tracing predictions.

In the preferential choice task, the choice data clearly favor the MLBA. Although less clear, the risky choice task results favor the utility and dynamic models. Some previous work on choice has favored sequential sampling models over utility (Berkowitsch et al., 2014; Fiedler & Glöckner, 2012) and heuristic models (Fiedler & Glöckner, 2012; Newell & Lee, 2011; Scheibehenne et al., 2009). The MDFT and MLBA were developed, in large part, to account for context effects such as the similarity, compromise, and attraction effects (Roe et al., 2001). The focus here is on a more general choice task. Thus, although the utility models perform reasonably well, especially for the risky choice task, prior work has shown that the dynamic models can account for a wider range of qualitative response patterns (Roe et al., 2001; Trueblood et al., 2014). Recall that, in both experimental paradigms, the stimuli were deliberately constructed to make discrimination of two of the choices difficult. Thus, these models are doing a particularly good job in a challenging choice environment.
To determine whether the results are confined to these tasks and stimuli, future studies should explore other experimental paradigms.

Consistent with previous results using the LBA (Brown & Heathcote, 2008), the MLBA does an excellent job accounting for overall response time distributions in both tasks. However, the MLBA only weakly predicts RTs for individual stimuli (but see Trueblood & Dasari, 2017). None of the other models could predict RT at the stimulus level. Furthermore, the results of the process tracing hypothesis analysis (Table 1) were equivocal. None of the models had all of their predictions supported by the data. A similar result was found, for example, by Fiedler and Glöckner (2012). These results suggest that, although the current models can account for choice performance and, to a certain extent, reaction times, they do not account for the entire range of processes involved in these decision-making tasks. In particular, the models may not appropriately represent the interaction of attention and preference formation, but these two processes are intertwined in the models and so are tested concurrently.

To overcome this potential shortcoming, the second aim of this research was to test preference-formation assumptions, independently of attention, by developing the MAS model family, which incorporates the empirical gaze patterns into a sequential sampling framework. The MAS variant that includes attribute values, but only updates the currently viewed alternative and does not contrast values across alternatives, clearly performs best in the preferential choice task and marginally better in the risky choice task. The performance of this best model is impressive, correctly predicting up to 80% of choices on average.

This research extends previous studies in the following ways. First, a wide range of utility, heuristic, and dynamic models were compared on choice using common data sets generated from previously untested stimulus classes. Second, these models were tested on both preferential and risky choice. Where necessary, the models were adapted to risky choice and extended to stimuli with multiple attributes. Third, on choice, the models were compared using both maximum likelihood and cross-validation measures. Fourth, the models were also compared using choice plus response time data. In particular, despite the model's ability to produce response time predictions, we know of no prior work fitting the MLBA to response time data for preferential or risky choice with explicit attributes (but see Trueblood & Dasari, 2017, for a perceptual task). Finally, we developed a family of models to incorporate value-by-value process tracing information. Although previous work (e.g., Krajbich & Rangel, 2011) has included process tracing data in sequential models of choice, it has usually involved switching between alternatives using empirical transition probabilities and has not been done with explicit, separated attribute values.


Other researchers have developed models of choice that could incorporate process tracing data. As noted previously, Krajbich et al. (2010) and Krajbich and Rangel (2011) used transition probabilities from eye tracking data to drive a diffusion model of multi-alternative choice, although explicit attributes were not presented. Diederich and Oswald (2016) developed a multi-stage model for multi-attribute choice in which attention could switch between attributes (also see Diederich, 1995, 1997). That model provides a framework to account for both choice and response time. Unlike the current implementation, however, it is assumed that values from all alternatives are concurrently available when an attribute is viewed. Cohen and Nosofsky (2003) developed an extension of the exemplar-based random-walk model of categorization (Nosofsky & Palmeri, 1997) in which the similarity relations between stimuli changed over time as attributes were attended (also see Lamberts, 1995). Unlike the MAS models, all attended attributes are assumed to be processed concurrently. Drift diffusion models with constantly changing drift rates have also been developed (Heath, 1981; McClelland, 1979; Ratcliff, 1980; Smith & Ratcliff, 2009), although they tend to be computationally challenging.

Adding a process tracing component to a sequential sampling model significantly improved the model's ability to account for human data. This result strongly underscores the importance of the attention process when modeling choice. Previous research has also found evidence for models of choice that incorporate attention (Ashby et al., 2016). The strength of the link between attention and preference formation, however, remains unclear. The front-end of the MLBA assumes that the information-search time is constant across stimuli (Trueblood et al., 2014). Shimojo et al. (2003) proposed a positive feedback loop of preferential looking (i.e., you look at things you like) combined with an exposure effect (i.e., the more you look at something, the more you like it). On the other hand, Krajbich and Rangel (2011) assert that their results are consistent with the assumption that attention drives the preference-formation process, but that preference states may not significantly influence attention. In an eye-tracking study of risky choice, Stewart et al. (2016) found that people choose the gamble they look at more often, independent of the values they view. In a simulation study, Mullett and Stewart (2016) showed that the gaze cascade effect can occur even with a random gaze process, but that it does require a relative, rather than absolute, stopping rule. To complicate matters further, the relation between attention and preference formation may change during the decision process (Shi, Wedel, & Pieters, 2013). These results echo the findings of Orquin and Loose (2013), who examined a broad set of models and concluded that none adequately accounted for the role of attention in decision making.

In summary, we found that dynamic models perform well for choice data in multi-attribute, multi-alternative decisions, although they do not account for all qualitative aspects of choice or RT prediction.
In general, these and other sequential sampling models hold much promise as a paradigm for studying the interplay between attention and preference formation, in that they assume that a person gathers information to form preferences by switching attention between the attributes and alternatives, until the evidence is sufficiently strong to select one alternative. Moving forward, such decision models need to incorporate a framework that more accurately reflects the relationship between the attention and preference-formation processes. This framework should include a theory of what information will be viewed next, based on the current preference state and the information gathered thus far (e.g., Yang, Toubia, & De Jong, 2015). Coupled with an improved stopping rule, such an updated framework could also lead to more accurate RT predictions.

Appendix A. Model specifications

This appendix provides the detailed specifications for each model. Tables A1–A4 provide a summary of the model parameters.

A.1. Standard random utility (SRU) model

A.1.1. Choice-only version

The SRU model (e.g., Train, 2003) calculates the utility Ui of the ith alternative based on the values Vij for each attribute j presented in the stimulus grid, combined with the importance weight wj that the participant gives to each attribute. We use the probit model assumption of normally distributed noise:

$$U_i = \sum_{j=1}^{4} w_j V_{ij} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

The probability of selecting alternative i is given by

$$P(i) = P(U_i > U_k \;\; \forall k \neq i).$$
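Because the probit choice probability has no convenient closed form with three alternatives, it can be estimated by simulation. The following is a minimal sketch, illustrative only and not the paper's fitting code:

```python
import numpy as np

def sru_choice_probs(values, w, sigma, n_sim=100_000, seed=0):
    """Monte Carlo estimate of P(i) = P(U_i > U_k for all k != i).

    `values` is an alternatives x attributes matrix V, `w` the attribute
    weights (summing to 1), and `sigma` the utility noise standard
    deviation.
    """
    rng = np.random.default_rng(seed)
    mean_u = np.asarray(values, float) @ np.asarray(w, float)
    noise = rng.normal(0.0, sigma, size=(n_sim, mean_u.size))
    winners = np.argmax(mean_u + noise, axis=1)
    return np.bincount(winners, minlength=mean_u.size) / n_sim
```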

A.1.2. Choice-plus-RT version

Choice was determined as above. RTs followed an ex-Gaussian distribution, which was fit in MATLAB using the exgauss package (Zandbelt, 2014). The ex-Gaussian probability density function is defined as

$$f(t; \mu, \sigma, \lambda) = \frac{\lambda}{2}\, e^{\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2t\right)}\, \operatorname{erfc}\!\left(\frac{\mu + \lambda\sigma^2 - t}{\sqrt{2}\,\sigma}\right).$$
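A direct transcription of this density, as a sketch assuming SciPy's complementary error function:

```python
import numpy as np
from scipy.special import erfc

def exgauss_pdf(t, mu, sigma, lam):
    """Ex-Gaussian density: a Gaussian(mu, sigma) convolved with an
    exponential of rate lam, matching the formula above."""
    t = np.asarray(t, dtype=float)
    return (lam / 2.0) \
        * np.exp((lam / 2.0) * (2.0 * mu + lam * sigma**2 - 2.0 * t)) \
        * erfc((mu + lam * sigma**2 - t) / (np.sqrt(2.0) * sigma))
```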


Table A1. Preferential choice model parameters for choice-only data.

Model | DOF | Parameters                                                | Range
SRU   | 4   | wj = weight for jth attribute, j = 1, 2, 3, 4             | 0 ≤ wj ≤ 1, Σwj = 1
      |     | σ = noise standard deviation                              | 0 ≤ σ ≤ 2
MDFT  | 4   | φ1 = sensitivity parameter                                | 0 ≤ φ1 ≤ 1
      |     | φ2 = decay parameter                                      | 0 ≤ φ2 ≤ 1
      |     | w = dominant direction weight                             | w = 1/2
      |     | σ = noise standard deviation                              | 0 ≤ σ ≤ 2
      |     | γ = weighting function exponent                           | 0 ≤ γ ≤ 2
MLBA  | 4   | s = noise standard deviation for drift rates              | 0 ≤ s ≤ 1
      |     | I0 = constant in drift rate formula                       | 0 ≤ I0 ≤ 10
      |     | cd = scaling factor in drift rate formula                 | 0 ≤ cd ≤ 10
      |     | α = scaling exponent                                      | 0 ≤ α ≤ 2
      |     | γ = weighting function exponent                           | γ = 1
      |     | A = starting range                                        | A = 100
      |     | b = threshold                                             | b = 1000
EBA   | 4   | wj = probability weight for jth attribute, j = 1, 2, 3, 4 | 0 ≤ wj ≤ 1, Σwj = 1
      |     | h = elimination threshold                                 | 1 ≤ h ≤ 4
LEX   | 4   | wj = probability weight for jth attribute, j = 1, 2, 3, 4 | 0 ≤ wj ≤ 1, Σwj = 1
      |     | γ = weighting function exponent                           | 0 ≤ γ ≤ 2
MV    | 2   | γ = weighting function exponent                           | 0 ≤ γ ≤ 2
      |     | σ = noise standard deviation                              | 0 ≤ σ ≤ 2

Table A2. Risky choice model parameters for choice-only data.

Model | DOF | Parameters                                              | Range
WV    | 3   | α = scaling exponent                                    | 0 ≤ α ≤ 2
      |     | γ = probability weighting function exponent             | 0 ≤ γ ≤ 2
      |     | σ = noise standard deviation                            | 0 ≤ σ ≤ 2
MDFT  | 4   | φ1 = sensitivity parameter                              | 0 ≤ φ1 ≤ 1
      |     | φ2 = decay parameter                                    | 0 ≤ φ2 ≤ 1
      |     | w = dominant direction weight                           | w = 1/2
      |     | σ = noise standard deviation                            | 0 ≤ σ ≤ 2
      |     | γ = weighting function exponent                         | 0 ≤ γ ≤ 2
MLBA  | 4   | s = noise standard deviation for drift rates            | 0 ≤ s ≤ 1
      |     | I0 = constant in drift rate formula                     | 0 ≤ I0 ≤ 10
      |     | cd = scaling factor in drift rate formula               | 0 ≤ cd ≤ 10
      |     | α = scaling exponent                                    | α = 1
      |     | γ = weighting function exponent                         | 0 ≤ γ ≤ 2
      |     | A = starting range                                      | A = 100
      |     | b = threshold                                           | b = 1000
EBA   | 3   | α = scaling exponent                                    | 0 ≤ α ≤ 2
      |     | γ = probability weighting function exponent             | 0 ≤ γ ≤ 2
      |     | h = elimination threshold                               | 1 ≤ h ≤ 5
LEX   | 3   | vj = probability weight for jth attribute, j = 1, 2, 3  | 0 ≤ vj ≤ 1, Σvj = 1
      |     | γ = weighting function exponent                         | 0 ≤ γ ≤ 2
MV    | 2   | γ = weighting function exponent                         | 0 ≤ γ ≤ 2
      |     | σ = noise standard deviation                            | 0 ≤ σ ≤ 2

A.1.3. Process-tracing version

The process tracing version assumes that only the viewed values are included in the decision-making process. All missing values are replaced by the mean value across all attributes and trials.

A.2. Weighted valuation (WV) model

A.2.1. Choice-only version

The WV model employs prospect theory to calculate the overall subjective value Ui of the ith alternative, given the probability pj and reward Vij of the jth outcome. We assume a normally distributed additive noise term. The subjective value of the reward is assumed to follow a power function (Tversky & Kahneman, 1992), and the weighting function for the probabilities takes the form derived in Karmarkar (1979; without rank-dependent weighting).

Table A3. Additional preferential and risky choice model parameters for choice-plus-RT data.

Model | DOF | Parameters                                             | Range
exG   | 3   | μ = mean of the Gaussian component                     | 0 ≤ μ ≤ 100
      |     | σ = standard deviation of the Gaussian component       | 0 ≤ σ ≤ 100
      |     | λ = rate of the exponential component                  | 0 ≤ λ ≤ 100
MLBA  | 1   | cm = additional scaling factor in drift rate formula   | 0 ≤ cm ≤ 10
      |     | Tnd = non-decision time                                | Tnd = 0
EBA   | 2   | Tmean = mean time to view a value                      | 0 ≤ Tmean ≤ 100
      |     | Tsdev = sd of time to view a value                     | 0 ≤ Tsdev ≤ 100
LEX   | 2   | Tmean = mean time to view a value                      | 0 ≤ Tmean ≤ 100
      |     | Tsdev = sd of time to view a value                     | 0 ≤ Tsdev ≤ 100

Notes: The exG model is used to predict RT for the SRU, WV, and MV models. The parameters in this table are in addition to the parameters from Tables A1 and A2. The overall DOF for a model is the sum of the DOF in Table A1 or Table A2 and Table A3. The MLBA RT is measured in ms; all other models are in seconds.

Table A4. Process tracing model parameters.

Model  | DOF | Parameters                                     | Range
SRU    | 4   | wj = weight for jth attribute, j = 1, 2, 3, 4  | 0 ≤ wj ≤ 1, Σwj = 1
       |     | σ = noise standard deviation                   | 0 ≤ σ ≤ 2
WV     | 3   | α = scaling exponent                           | 0 ≤ α ≤ 2
       |     | γ = probability weighting function exponent    | 0 ≤ γ ≤ 2
       |     | σ = noise standard deviation                   | 0 ≤ σ ≤ 2
MASXXX | 3   | φ1 = sensitivity parameter                     | φ1 = 0
       |     | φ2 = decay parameter                           | 0 ≤ φ2 ≤ 1
       |     | σ = noise standard deviation                   | 0 ≤ σ ≤ 2
       |     | γ = weighting function exponent                | 0 ≤ γ ≤ 2
MV     | 2   | γ = weighting function exponent                | 0 ≤ γ ≤ 2
       |     | σ = noise standard deviation                   | 0 ≤ σ ≤ 2

$$U_i = \sum_{j=1}^{3} \pi(p_j)\, V_{ij}^{\alpha} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2),$$

where

$$\pi(p) = \frac{p^{\gamma}}{p^{\gamma} + (1 - p)^{\gamma}}. \tag{A1}$$

The probability of selecting alternative i is given by

$$P(i) = P(U_i > U_k \;\; \forall k \neq i).$$
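A minimal sketch of these two pieces (illustrative helper names, not the paper's code):

```python
import numpy as np

def pi_weight(p, gamma):
    """Karmarkar (1979) probability weighting function, Eq. (A1)."""
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1.0 - p)**gamma)

def wv_mean_utility(values, probs, alpha, gamma):
    """Deterministic part of the WV utility: sum_j pi(p_j) * V_ij^alpha.

    `values` is an alternatives x outcomes matrix of rewards and `probs`
    the outcome probabilities shared across alternatives; the Gaussian
    noise term is added separately.
    """
    return (np.asarray(values, float)**alpha) @ pi_weight(probs, gamma)
```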

A.2.2. Choice-plus-RT version

As for the SRU model, RTs are assumed to follow an ex-Gaussian distribution.

A.2.3. Process-tracing version

The process tracing version assumes that only the viewed values are included in the decision-making process. All missing values are replaced by the mean value across all attributes and trials.

A.3. Multi-alternative decision field theory (MDFT)

The MDFT (Roe et al., 2001) generates preference states for each alternative in a vector P(t). The preference states evolve according to

$$P(t+1) = S\,P(t) + V(t+1), \tag{A2}$$

with initial state P(0) equal to the zero vector. V(t) is a vector of valences computed from the stimulus values. The valence vector is defined by V(t) = CMW(t), where Mij equals the value of alternative i on attribute j, C is a contrast matrix with diagonal entries equal to 1 and off-diagonal entries equal to −½, and W(t) indicates the attention allocated to each attribute, modeling attention switching as a multinomial


trials process (Roe et al., 2001). In the preferential choice task, we set the expected value of W(t) using Eq. (A1) applied to the fitted SRU wj parameters for relative importance of attributes. For the risky choice task, we set the expected value of W(t) using Eq. (A1) applied to the outcome probabilities. In both cases the weights were normalized to sum to one. S is a feedback matrix that defines how preference strength decays over time and how preference strengths interact across alternatives. We define the feedback matrix S as suggested in Hotaling, Busemeyer, and Li (2010):

$$S_{ij} = \delta_{ij} - \phi_2 \exp\!\left(-\phi_1 D_{ij}^2\right), \tag{A3}$$

where Dij represents the psychological distance between alternatives i and j in the attribute space, as defined in Berkowitsch, Scheibehenne, Rieskamp, and Matthaus (2015), and δij is the Kronecker delta. We assume an external stopping time of 500 (in arbitrary units, as actual RTs are not being fit here) to ensure model convergence and fit parameters using the closed-form solution given in Roe et al. (2001). This version of the MDFT is a modified version of Hotaling et al. (2010); the only difference is the use of the prospect theory Eq. (A1) when calculating attention weights. This change was motivated by the need to transform probabilities in the risky choice experiment.

A.4. Models of attentional sampling (MASXXX) using process tracing

The following models of attentional sampling (MAS) variants use process tracing information to update the preference state for each alternative over time. These models build off of the base version of the MDFT described previously. Information is added to the decision process in the order in which it is viewed. Each gaze or mouse click on a value is quantized; that is, the amount of time spent viewing a value is not used. Each unit time step is equivalent to the examination of one value, i.e., one gaze. Until a stimulus value is viewed, that value is assumed to equal the mean of all values across all attributes and alternatives. This assumption produced the best results of the numerous default values that were tried, including setting the default value to zero and allowing the default value to be a free parameter. Once viewed, the value is assumed known for the remainder of the trial. Future models could include, for example, a fading memory for not only preference states, but also values.

The model updates preference states according to Eq. (A2) from the MDFT, but with φ1 from Eq. (A3) fixed at 0 (results were insensitive to this parameter). The components of the attention vector are Wj(t) = π(xj) from Eq. (A1) if attribute j is attended at time t and 0 otherwise, where xj = wj from the SRU and xj = pj from the WV for the preferential and risky choice experiments, respectively. This form of Wj accounts for both which attribute is attended and how much weight is placed on that attribute. Recall that the matrix M has entries Mij that equal the value of alternative i on attribute j. To normalize values across the two experiments, this matrix M is transformed to N = (M/⟨M⟩)^α, where α = 0.5 and 2 in the preferential and risky choice experiments, respectively, and ⟨M⟩ is the mean value across all attributes, alternatives, and trials. Dividing by ⟨M⟩ has no effect on the model predictions, but serves to keep the parameter values similar across experiments. The contrast matrix C is defined as for the MDFT. For computational tractability, normally distributed noise with mean 0 and standard deviation σ is added at the end of a trial, rather than as moment-by-moment information accumulation noise.

Variants of the model were developed that assume different preference-formation updating processes. Each model has three binary digits after "MAS": the first digit indicates whether values were used, the second whether value contrasts were used, and the third whether the preference states for all alternatives were updated. The valence vectors for the five variants are:

MAS111: V(t) = CNW(t).
MAS101: V(t) = NW(t).
MAS110: Vi(t) = [CNW(t)]i if alternative i is attended at time t, and 0 otherwise.
MAS100: Vi(t) = [NW(t)]i if alternative i is attended at time t, and 0 otherwise.
MAS000: Vi(t) = Wj(t) when attending attribute j of alternative i at time t, and 0 otherwise.
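A compact sketch of these variants under the simplifications stated above (φ1 fixed at 0; the end-of-trial Gaussian noise is omitted for clarity). The function name and structure are ours, not the paper's implementation:

```python
import numpy as np

def mas_preference_states(gazes, N, x, phi2, gamma, variant='100'):
    """Accumulate preference states over one trial's gaze sequence.

    `gazes` gives the viewing order as (alternative i, attribute j)
    index pairs, `N` the transformed alternatives x attributes value
    matrix, and `x` the attribute weights entering Eq. (A1) (SRU w's or
    outcome probabilities). Implements Eq. (A2); with phi1 = 0, Eq. (A3)
    reduces to S = I - phi2 * J, where J is the all-ones matrix.
    """
    x = np.asarray(x, dtype=float)
    n_alt, n_att = N.shape
    C = 1.5 * np.eye(n_alt) - 0.5            # contrast matrix: 1 on the
                                             # diagonal, -1/2 off it
    S = np.eye(n_alt) - phi2                 # Eq. (A3) with phi1 = 0
    pi = x**gamma / (x**gamma + (1.0 - x)**gamma)   # Eq. (A1)
    use_vals, use_contrast, update_all = (d == '1' for d in variant)
    P = np.zeros(n_alt)
    for i, j in gazes:
        W = np.zeros(n_att)
        W[j] = pi[j]                         # only attribute j is attended
        if not use_vals:                     # MAS000: attention alone
            V = np.zeros(n_alt)
            V[i] = W[j]
        else:
            V = (C if use_contrast else np.eye(n_alt)) @ N @ W
            if not update_all:               # only the viewed alternative moves
                V = np.where(np.arange(n_alt) == i, V, 0.0)
        P = S @ P + V                        # Eq. (A2)
    return P
```

For example, MAS100 corresponds to `variant='100'` and MAS111 (the sequential MDFT) to `variant='111'`.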

To illustrate how they make use of the process tracing information, Fig. A1 provides a representative trajectory from each of these MAS variants for a common stimulus.

A.5. Multi-attribute linear ballistic accumulator (MLBA) model

A.5.1. Choice-only version

The MLBA (Trueblood et al., 2014) is an extension of the LBA (Brown & Heathcote, 2008) to multi-attribute choice. The LBA is a race model in which accumulators for each alternative move at a constant drift rate towards a choice threshold. The LBA builds stochastic elements into the starting points (uniform random variables chosen from [0, A]) and the drift rates (di, based on stimulus information and with normally distributed additive noise, s) for each accumulator, after which the race to threshold b is deterministic and linear. Thus, once the drift rates are determined from the stimuli, the MLBA proceeds as a ballistic accumulator as in the LBA (Eqs. (1)–(3) of Brown & Heathcote, 2008).
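The race stage itself is simple to simulate, given drift rates computed as in the next subsection. A sketch under the parameterization above (uniform start points on [0, A], drift noise s, fixed threshold b; defaults follow Tables A1 and A2):

```python
import numpy as np

def lba_race(d, A=100.0, b=1000.0, s=0.1, n_sim=10_000, seed=0):
    """Simulate the ballistic race stage of the (M)LBA.

    Given mean drift rates `d`, draw uniform start points on [0, A] and
    drifts from N(d_i, s); the race to threshold b is then deterministic
    and linear. Returns simulated choices and response times.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    starts = rng.uniform(0.0, A, size=(n_sim, d.size))
    drifts = rng.normal(d, s, size=(n_sim, d.size))
    # The paper sets negative-drift finishing times to 1000 x the fastest
    # accumulator; infinite finishing times pick the same winner here.
    times = np.where(drifts > 0, (b - starts) / drifts, np.inf)
    choice = np.argmin(times, axis=1)
    rt = times[np.arange(n_sim), choice]
    return choice, rt
```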


Fig. A1. The best-fitting process-tracing MAS model trajectories for a single, representative stimulus for a single participant. The lines show how the preference states for the three alternatives (i.e., the three gambles) change over time. Each column is an individual gaze. The color of the column and the number at the top of the column denote the viewed alternative and attribute, respectively. Dark, medium, and light gray colors denote alternatives 1, 2, and 3, respectively. Lower right: The risky choice stimulus values. Gamble 3 was selected by the participant.

In the MLBA, the drift rates are based on pairwise differences of subjective valuations of attributes across alternatives. Let Vik be the value of the kth attribute of the ith alternative. The subjective valuation uik of Vik is taken from prospect theory (Tversky & Kahneman, 1992) and defined as

$$u_{ik} = V_{ik}^{\alpha}.$$

The pairwise difference of subjective valuations for the kth attribute, uik − ujk, is multiplied by the attention weight ak. This attention weight is defined using the weighting function π, with parameter γ, as defined in Eq. (A1). That is,

$$a_k = \pi(w_k),$$

where wk is equal to the fitted SRU w parameters for the relative importance of attributes in the preferential choice experiment, and wk = pk, the probability of the kth outcome, in the risky choice experiment. The drift rate di for the ith accumulator, associated with the ith alternative, is given by

$$d_i = I_0 + \frac{c_d \sum_{j} \sum_{k} a_k \left(u_{ik} - u_{jk}\right)}{\sum_{k} a_k \sum_{m} u_{mk}}.$$

The additive constant I0 provides a baseline drift rate. The scaling parameter cd determines the extent to which differences in subjective valuation affect the drift rate, as in Trueblood and Dasari (2017). The inner sum of the numerator sums across all weighted pairwise differences of subjective values and is an extension of Eq. (3) of Trueblood et al. (2014) to multiple attributes. The outer sum of the numerator then sums these values across all alternatives and is equivalent to Eq. (1) of Trueblood et al. (2014), with the constant I0 added as described previously. The sums in the denominator are normalizing factors. The first sum normalizes the attention weights to sum to 1. The second sum serves to differentiate the effects of the parameters cd and α; without this term, scaling of the difference could be accomplished by changing either cd or α. This factor also has the added benefit of centering the subjective valuations on one.

Because the attributes are constant across trials, in the preferential choice experiment γ = 1, so ak = wk. Letting γ vary does not improve model performance in this context. In the risky choice experiment, γ is a free parameter. To set the overall scale of the race process, the threshold b was fixed. The model fits were insensitive to the value of A, which was also left fixed. Because it improved fits in the preferential choice paradigm, but not the risky choice paradigm, α was free in preferential choice, but fixed to 1 in risky choice. Following the R code in the supplemental materials of Donkin et al. (2009) and Heathcote et al. (2002), finishing times for alternatives with negative drift rates are set to a very large value (1000 × the fastest accumulator).

This version of the MLBA is modified from Trueblood et al. (2014). Although the conceptual framework and base LBA parameters are identical to Trueblood et al. (2014), we modified the original version of the MLBA to account for multiple attributes and a more general class of stimuli. The MLBA was modified in the following ways. First, the scaling parameter cd was added, as in Trueblood and Dasari (2017). Second, we use a different subjective valuation function. In the original MLBA, the subjective value formulation involves 2 attributes and assumes alternatives lie on a line, ℓ, in stimulus space. Our stimuli are generated from distributions and are not constrained to lie on a line or plane. The original MLBA projects the attribute values of alternatives on ℓ onto a (generalized)


ellipse. For alternatives not lying on ℓ, the attribute values are projected onto the x- and y-intercepts parallel to ℓ. Although useful for reproducing context effects, this formulation is difficult to generalize to different stimulus classes. For example, it is not a continuous transformation: the subjective values of alternatives off ℓ do not approach the value of the limiting alternative on ℓ. One of the advantages of the ellipse approach is to allow for extremeness aversion (or seeking). The subjective valuation function in the current work is taken from prospect theory (Tversky & Kahneman, 1992), which is a well-established and arguably less complex transformation that still allows for extremeness aversion (or seeking). Third, we use a different weighting function. The weighting function π used here is taken from prospect theory (Tversky & Kahneman, 1992) and has been used extensively in risky choice paradigms. The original weighting function of Trueblood et al. (2014), which allows for weighting based on the similarity of alternatives, was also tried, but was not as successful. The parameter λ from that weighting function tended to be fit to 0, implying that the weighting term did not depend on the value difference, i.e., the weight was a constant. The context in which this effect of similarity would occur may not have been present in the current experiments, highlighting the need to consider different experimental paradigms.

A.5.2. Choice-plus-RT version

To account for both choice and RT, the mean drift rate calculation was updated to the following:

$$d_i = I_0 + \frac{c_d \sum_{j} \sum_{k} a_k \left(u_{ik} - u_{jk}\right)}{\sum_{k} a_k \sum_{m} u_{mk}} + \frac{c_m \max_k \left(a_k u_{ik}\right)}{\sum_{k} a_k}.$$
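For concreteness, a sketch of this drift-rate computation. Note that grouping the denominator as the product of the two displayed sums is our reading of the original layout and is an assumption; with c_m = 0 this reduces to the choice-only drift rate:

```python
import numpy as np

def mlba_drift_rates(V, w, alpha, gamma, I0, c_d, c_m=0.0):
    """Mean drift rates for the MLBA variants defined above.

    `V` is an alternatives x attributes value matrix and `w` the
    attribute weights (SRU w's or outcome probabilities).
    """
    V, w = np.asarray(V, float), np.asarray(w, float)
    u = V**alpha                                    # subjective valuations
    a = w**gamma / (w**gamma + (1.0 - w)**gamma)    # attention weights, Eq. (A1)
    diff = u[:, None, :] - u[None, :, :]            # u_ik - u_jk for all pairs
    num = c_d * (a * diff).sum(axis=(1, 2))         # sum over j and k
    denom = a.sum() * u.sum()                       # assumed denominator grouping
    extra = c_m * (a * u).max(axis=1) / a.sum()     # absolute-value term
    return I0 + num / denom + extra
```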

The left two terms are identical to the choice-only model. That model was tried, but the addition of the third term significantly improved performance. Some very common models of choice, e.g., utility theory, rely not on comparisons across alternatives, but on the absolute values of the attributes. We therefore added the rightmost term, which allows the model to also account for absolute attribute values, i.e., not only value differences. In particular, the drift rate of an alternative with a high weighted value is increased, making it more likely to be selected. The additional parameter, cm, scales this term. The addition of this term significantly improved the fit of the model for choice and RT data (but not for choice alone). Other alternatives were tried, for example, relying on the sum of weighted values rather than the max value, but did not improve fit as much.

Response time is defined as the sum of the time it takes the first accumulator to hit the response threshold and a non-decision time, Tnd. The model produced virtually identical results whether or not a non-decision time parameter Tnd was included, and, therefore, Tnd was set to 0. The model was again insensitive to the value of A, which was left fixed. Fixing these parameters avoided the BIC penalty associated with additional parameters without reducing overall fit.

A.6. Elimination-by-aspects (EBA) heuristic

A.6.1. Choice-only version

We adapt the originally proposed EBA heuristic (Tversky, 1972) to the preferential choice task as follows. The jth attribute is randomly chosen with probability wj, and any alternative with a value below the threshold parameter h for that attribute is eliminated. The process iterates through different attributes until only one alternative remains, which is then chosen. If all attributes have been considered and more than one alternative remains, an alternative is randomly selected from the remaining alternatives with the highest value on the last considered attribute. Each attribute is attended at most once. For the risky choice task, the heuristic can be modified by using Eq. (A1), with π(pj) playing the role of wj.

A.6.2. Choice-plus-RT version

All possible orderings of attention to attributes until a decision is made are considered. The probability of each ordering i, P(ordi), is determined by the wj. Each attribute is attended at most once. After each attribute is attended, the number of remaining alternatives is updated as in the choice-only version. Each ordering i is associated with a number of values, n(ordi), in the grid that must be examined before reaching a decision. The time taken to examine a value is assumed to be normally distributed with mean Tmean and standard deviation Tsdev, i.e., N(Tmean, Tsdev). The overall RT distribution for choice j is given by

$$\sum_{i} P(\mathrm{ord}_i)\, n(\mathrm{ord}_i)\, N(T_{\mathrm{mean}}, T_{\mathrm{sdev}}),$$

where i ranges over all orderings leading to choice j.

A.7. Lexicographic (LEX) heuristic

A.7.1. Choice-only version

Similar to the EBA, the LEX considers the attributes in turn, with the jth attribute randomly chosen with probability wj. If one alternative has a greater value for that attribute than all other remaining alternatives, that alternative is chosen. If there is a tie for the best value, then the next attribute is considered for the tied alternatives, repeating until a choice is made. Each attribute is attended at most once. For the risky choice task, the importance of the attributes is not known until the stimulus is


presented. Participants may be selecting an attribute to consider based on the probability of the outcome, a preference ordering for the attributes (e.g., preferring the left-most attribute), or both. Indeed, participants in our studies showed a strong left-to-right order bias in the risky choice experiment. Therefore, the heuristic was modified by using weights, vj, to account for positional preference. These weights were combined with the weighting function of Eq. (A1), so that wj = vj π(pj). This variant was also tried for the EBA, but did not improve performance.

A.7.2. Choice-plus-RT version

RT is determined in the same way as for the EBA.

A.8. Maximum attribute value (MV) heuristic

A.8.1. Choice-only version

The MV heuristic assumes that the participant searches the grid for the maximum value, weighted according to the importance of the attribute, and chooses the alternative with that maximum value. For the preferential choice task, values for each attribute are weighted using Eq. (A1) (applied to the fitted SRU wj parameters for the relative importance of attributes), plus a normally distributed noise term:

$$U_i = \max_{1 \le j \le 4}\left(\pi(w_j)\, V_{ij}\right) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

A similar formula holds for the risky choice task, using the probability pj in place of wj.

A.8.2. Choice-plus-RT version

As for the utility models, RTs are assumed to follow an ex-Gaussian distribution.

A.8.3. Process-tracing version

In the process tracing version, only viewed values are considered for the maximum value. All missing values are assumed to be 0.

Appendix B. LEX and EBA quantiles

Fig. B1 in this appendix provides the results of the quantile analyses of the EBA and LEX models for the preferential and risky choice experiments.

Fig. B1. Quantile analysis for each participant for the LEX and EBA models for the preferential choice (left) and risky choice (right) experiments.

Appendix C. MLBA parameter and data recovery

This appendix provides the results of the MLBA parameter and data recovery analysis. We generated 200 sets of parameters randomly selected from the same ranges as the participant fit parameters. Each of those parameter sets was used to generate simulated choice and RT data. The simulated data were then fit in the same manner as the participant data. The results are provided in Fig. C1 for the preferential choice experiment and Fig. C2 for the risky choice experiment. The left panel displays the fitted vs. generating values for each of the 5 MLBA parameters. Parameters were normalized by dividing by the maximum value of each parameter range. For the simulated data, the MLBA fits provide a predicted probability of each alternative on each trial. The probabilities of the selected alternatives, averaged across trials, provide the expected value of the proportion correct for a given set of parameters. The middle panel provides a histogram of the expected value of the


Fig. C1. Results of the MLBA parameter and data recovery analysis for the preferential choice experiment. Left panel: Normalized fitted vs. generating parameter values. Middle panel: Histogram of the expected value of the proportion of correctly predicted choices. Right panel: Normalized predicted vs. simulated RTs.

Fig. C2. Results of the MLBA parameter and data recovery analysis for the risky choice experiment. Left panel: Normalized fitted vs. generating parameter values. Middle panel: Histogram of the expected value of the proportion of correctly predicted choices. Right panel: Normalized predicted vs. simulated RTs.

proportion of correctly predicted choices across all sets of parameters. The fitted parameters also generate RT predictions. The right panel provides the complete set of predicted RTs against the corresponding simulated RTs for all trials across the 200 parameter sets. The RTs are normalized by dividing by the maximum simulated RT for a given simulated data set.

Appendix D. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.cogpsych.2017.08.001.

References

Ashby, N. J., Jekel, M., Dickert, S., & Glöckner, A. (2016). Finding the right fit: A comparison of process assumptions underlying popular drift-diffusion models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(12), 1982.
Berkowitsch, N. A., Scheibehenne, B., & Rieskamp, J. (2014). Rigorously testing multialternative decision field theory against random utility models. Journal of Experimental Psychology: General, 143(3), 1331.
Berkowitsch, N. A., Scheibehenne, B., Rieskamp, J., & Matthaus, M. (2015). A generalized distance function for preferential choices. British Journal of Mathematical and Statistical Psychology, 68(2), 310–325. http://dx.doi.org/10.1111/bmsp.12048
Böckenholt, U., & Hynan, L. S. (1994). Caveats on a process-tracing measure and a remedy. Journal of Behavioral Decision Making, 7(2), 103–117.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178.
Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108–132. http://dx.doi.org/10.1006/jmps.1999.1279
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100(3), 432–459. http://dx.doi.org/10.1037/0033-295X.100.3.432
Cohen, A. L., & Nosofsky, R. M. (2003). An extension of the exemplar-based random-walk model to separable-dimension stimuli. Journal of Mathematical Psychology, 47(2), 150–165.


Cohen, A. L., Sanborn, A. N., & Shiffrin, R. M. (2008). Model evaluation using grouped or individual data. Psychonomic Bulletin & Review, 15(4), 692–712.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a), 427–431.
Diederich, A. (1995). Intersensory facilitation of reaction time: Evaluation of counter and diffusion coactivation models. Journal of Mathematical Psychology, 39(2), 197–215.
Diederich, A. (1997). Dynamic stochastic models for decision making under time constraints. Journal of Mathematical Psychology, 41(3), 260–274.
Diederich, A., & Oswald, P. (2016). Multi-stage sequential sampling models with finite or infinite time horizon and variable boundaries. Journal of Mathematical Psychology, 74, 128–145.
Donkin, C., Averell, L., Brown, S., & Heathcote, A. (2009). Getting more from accuracy and response time data: Methods for fitting the linear ballistic accumulator. Behavior Research Methods, 41(4), 1095–1110. http://dx.doi.org/10.3758/brm.41.4.1095
Fiedler, S., & Glöckner, A. (2012). The dynamics of decision making in risky choice: An eye-tracking analysis. Frontiers in Psychology, 3(335). http://dx.doi.org/10.3389/fpsyg.2012.00335
Fishburn, F. (1974). Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11), 1442–1471.
Franco-Watkins, A. M., Davis, M. E., & Johnson, J. G. (2016). The ticking time bomb: Using eye-tracking methodology to capture attentional processing during gradual time constraints. Attention, Perception, & Psychophysics, 78(8), 2363–2372.
Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350), 320–328.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650–669. http://dx.doi.org/10.1037/0033-295X.103.4.650
Glöckner, A., & Betsch, T. (2008). Multiple-reason decision making based on automatic processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(5), 1055.
Glöckner, A., & Herbold, A.-K. (2011). An eye-tracking study on information processing in risky decisions: Evidence for compensatory strategies based on automatic processes. Journal of Behavioral Decision Making, 24(1), 71–98. http://dx.doi.org/10.1002/bdm.684
Glöckner, A., Hilbig, B. E., & Jekel, M. (2014). What is adaptive about adaptive decision making? A parallel constraint satisfaction account. Cognition, 133(3), 641–666.
Glöckner, A., & Pachur, T. (2012). Cognitive models of risky choice: Parameter stability and predictive accuracy of prospect theory. Cognition, 123(1), 21–32.
Gonzalez-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109(1), 137.
Heath, R. A. (1981). A tandem random walk model for psychological discrimination. British Journal of Mathematical and Statistical Psychology, 34(1), 76–92.
Heathcote, A. (1996). RTSYS: A DOS application for the analysis of reaction time data. Behavior Research Methods, 28(3), 427–445.
Heathcote, A., Brown, S., & Mewhort, D. J. K. (2002). Quantile maximum likelihood estimation of response time distributions. Psychonomic Bulletin & Review, 9(2), 394–401. http://dx.doi.org/10.3758/bf03196299
Hogarth, R. M., & Karelaia, N. (2007). Heuristic and linear models of judgment: Matching rules and environments. Psychological Review, 114(3), 733–758. http://dx.doi.org/10.1037/0033-295X.114.3.733
Holmes, W. R., Trueblood, J. S., & Heathcote, A. (2016). A new framework for modeling decisions about changing information: The Piecewise Linear Ballistic Accumulator model. Cognitive Psychology, 85, 1–29.
Hotaling, J. M., Busemeyer, J. R., & Li, J. (2010). Theoretical developments in decision field theory: Comment on Tsetsos, Usher, and Chater (2010). Psychological Review, 117(4), 1294–1298. http://dx.doi.org/10.1037/a0020401
Johnson, E. J., Schulte-Mecklenbeck, M., & Willemsen, M. C. (2008). Process models deserve process data: Comment on Brandstätter, Gigerenzer, and Hertwig (2006).
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292.
Karmarkar, U. S. (1979). Subjectively weighted utility and the Allais paradox. Organizational Behavior and Human Performance, 24(1), 67–72.
Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292–1298.
Krajbich, I., & Rangel, A. (2011). Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proceedings of the National Academy of Sciences, 108(33), 13852–13857.
Kwak, Y., Payne, J. W., Cohen, A. L., & Huettel, S. A. (2015). The rational adolescent: Strategic information processing during decision making revealed by eye tracking. Cognitive Development, 36, 20–30.
Lacouture, Y., & Cousineau, D. (2008). How to use MATLAB to fit the ex-Gaussian and other probability functions to a distribution of response times. Tutorials in Quantitative Methods for Psychology, 4(1), 35–45.
Lamberts, K. (1995). Categorization under time pressure. Journal of Experimental Psychology: General, 124(2), 161.
Lewandowsky, S., & Farrell, S. (2010). Computational modeling in cognition: Principles and practice. Sage Publications.
Lohse, G. L., & Johnson, E. J. (1996). A comparison of two process tracing methods for choice tasks. Organizational Behavior and Human Decision Processes, 68, 28–43.
McClelland, J. L. (1979). On the time relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 86(4), 287.
Mullett, T. L., & Stewart, N. (2016). Implications of visual attention phenomena for models of preferential choice. Decision, 3(4), 231.
Newell, B. R., & Lee, M. D. (2011). The right tool for the job? Comparing an evidence accumulation and a naive strategy selection model of decision making. Journal of Behavioral Decision Making, 24(5), 456–481.
Nicholas, C. A., & Cohen, A. L. (2016). The effect of interruption on the decision-making process. Judgment and Decision Making, 11(6), 611.
Noguchi, T., & Stewart, N. (2014). In the attraction, compromise, and similarity effects, alternatives are repeatedly compared in pairs on single dimensions. Cognition, 132(1), 44–56.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104(2), 266.
Orquin, J. L., & Loose, S. M. (2013). Attention and choice: A review on eye movements in decision making. Acta Psychologica, 144(1), 190–206.
Payne, J. W. (1976). Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16(2), 366–387.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534.
Pleskac, T. J., Diederich, A., & Wallsten, T. S. (2015). Models of decision making under risk and uncertainty. In The Oxford handbook of computational and mathematical psychology (p. 209). USA: Oxford University Press.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 111–163.
Ratcliff, R. (1980). A note on modeling accumulation of information when the rate of accumulation changes over time. Journal of Mathematical Psychology, 21(2), 178–184.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922. http://dx.doi.org/10.1162/neco.2008.12-06-420
Reisen, N., Hoffrage, U., & Mast, F. W. (2008). Identifying decision strategies in a consumer choice situation. Judgment and Decision Making, 3(8), 641.
Rieskamp, J., Busemeyer, J. R., & Mellers, B. A. (2006). Extending the bounds of rationality: Evidence and theories of preferential choice. Journal of Economic Literature, 44(3), 631–661.
Roe, R. M., Busemeyer, J. R., & Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108(2), 370.
Scheibehenne, B., Rieskamp, J., & González-Vallejo, C. (2009). Cognitive models of choice: Comparing decision field theory to the proportional difference model. Cognitive Science, 33(5), 911–939.

Schulte-Mecklenbeck, M., Kühberger, A., & Ranyard, R. (2011). A handbook of process tracing methods for decision research: A critical review and user’s guide. Psychology Press.
Shah, A. K., & Oppenheimer, D. M. (2008). Heuristics made easy: An effort-reduction framework. Psychological Bulletin, 134(2), 207.
Shi, S. W., Wedel, M., & Pieters, R. (2013). Information acquisition during online decision making: A model-based exploration using eye-tracking data. Management Science, 59(5), 1009–1026.
Shimojo, S., Simion, C., Shimojo, E., & Scheier, C. (2003). Gaze bias both reflects and influences preference. Nature Neuroscience, 6(12), 1317–1322.
Smith, P. L., & Ratcliff, R. (2009). An integrated theory of attention and decision making in visual signal detection. Psychological Review, 116(2), 283.
Stewart, N., Hermens, F., & Matthews, W. J. (2016). Eye movements in risky choice. Journal of Behavioral Decision Making, 29(2–3), 116.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286. http://dx.doi.org/10.1037/h0070288.
Towal, R. B., Mormann, M., & Koch, C. (2013). Simultaneous modeling of visual saliency and value computation improves predictions of economic choice. Proceedings of the National Academy of Sciences, 110(40), E3858–E3867.
Train, K. (2003). Discrete choice methods with simulation. Cambridge University Press.
Trueblood, J. S., Brown, S. D., & Heathcote, A. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2), 179.
Trueblood, J. S., & Dasari, A. (2017). The impact of presentation order on the attraction effect in decision-making. In Proceedings of the 39th Annual Conference of the Cognitive Science Society.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79(4), 281.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
Usher, M., & McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111, 757–769.
Venkatraman, V., Payne, J. W., & Huettel, S. A. (2014). An overall probability of winning heuristic for complex risky decisions: Choice and eye fixation evidence. Organizational Behavior and Human Decision Processes, 125(2), 73–87.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
Yang, L., Toubia, O., & De Jong, M. G. (2015). A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research, 52(2), 166–183.
Zandbelt, B. (2014). Exgauss: A MATLAB toolbox for fitting the ex-Gaussian distribution to response time data.