Journal of Mathematical Psychology 91 (2019) 145–158
Contents lists available at ScienceDirect
Journal of Mathematical Psychology journal homepage: www.elsevier.com/locate/jmp
Linking the diffusion model and general recognition theory: Circular diffusion with bivariate-normally distributed drift rates
Philip L. Smith
Melbourne School of Psychological Sciences, The University of Melbourne, Victoria, 3010, Australia
Highlights
• Develops link between the diffusion decision model and general recognition theory.
• Assumes bivariate normal distribution of drift rates in circular diffusion model.
• Derives distributions of decision times and outcomes for correlated drift rate components.
• Speed and accuracy predictions depend on the covariance matrix of the drift rate distribution.
Article info
Article history: Received 21 December 2018; received in revised form 14 June 2019; accepted 15 June 2019; available online 25 June 2019
Keywords: Diffusion process; Signal detection; Response time; Psychophysics; Working memory
Abstract
The circular diffusion model is a model of continuous outcome decisions, which are modeled as evidence accumulation by a two-dimensional Wiener diffusion process on the interior of a disk whose bounding circle represents the decision criterion. When there is across-trial variability in the evidence entering the decision process, represented by variability in drift rates, the model predicts that inaccurate responses will be slower than accurate responses, in agreement with, and generalizing, the slow-error property of the one-dimensional diffusion model of two-choice decisions. A natural generalization of the one-dimensional model’s assumption of normally distributed drift rates is provided by general recognition theory, which represents the perceptual effects of a two-dimensional stimulus as a bivariate distribution with possibly correlated components. Analytic predictions are derived for the joint distributions of decision outcomes and decision times for the circular diffusion model with bivariate-normally distributed drift rates with correlated components. The decision time and accuracy predictions of the model are shown to depend on the relationship between the orientation of the ellipses of equal likelihood of the drift-rate distribution and the orientation of the mean drift rate vector. The analytic expressions for the model can be computed efficiently and are well suited to fitting data. © 2019 Elsevier Inc. All rights reserved.
https://doi.org/10.1016/j.jmp.2019.06.002 0022-2496/© 2019 Elsevier Inc. All rights reserved.
1. Introduction
Researchers in cognitive psychology have increasingly become interested in experimental tasks in which people make judgments about continuously distributed stimulus attributes, such as the hues of color patches, the orientations of line segments or gratings, or the direction of motion. In the area of visual working memory, in particular, these kinds of continuous outcome decisions play a central role in studies of how memory representations change in quality as the number of items stored in memory varies (Adam, Vogel, & Awh, 2017; van den Berg, Awh, & Ma, 2014; Marshall & Bays, 2013; Oberauer & Lin, 2017; Wilken & Ma, 2004; Zhang & Luck, 2008). In many tasks of this kind, although not all, the decision can usefully be thought of as a mapping from a two-dimensional (2D) stimulus space to a
continuous, closed response space: The stimuli are represented as points or vectors in a 2D space and the response space is a circular contour [−π, π] or a subset of it. As an example of this kind of task, Fig. 1a shows a stimulus like the ones used in a study by Adam et al. (2017, Experiments 1b and 2b), in which people were asked to report the remembered orientation of a radius inscribed within a circle and to indicate their decision by clicking a mouse on the circle’s circumference. Fig. 1b shows a commonly used variant of the task in which people are asked to make decisions about the orientation of a bar or a grating patch (Gorgoraptis, Catalao, Bays, & Husain, 2011; Kvam, 2019; Marshall & Bays, 2013; Rademaker, Tredway, & Tong, 2012). The second example differs from the first only in the rotational symmetry of the stimuli, which reduces the range of unique decision outcomes from [−π, π] to [0, π]. In what is perhaps the most widely-studied form of the continuous outcome task, people make decisions about the hues of color patches that
are defined in a 2D isoluminant color space (Wyszecki & Stiles, 1982) and express their decisions by indicating the corresponding point on a color wheel (Adam et al., 2017, Experiments 1a and 2a; Bays, Catalao, & Husain, 2009; Kool, Conway, & Turk-Browne, 2014; Oberauer & Lin, 2017; Wilken & Ma, 2004; Zhang & Luck, 2008). Historically, the continuous outcome decision task has its origins in the method of adjustment of classical psychophysics (Woodworth & Schlosberg, 1954), in which sensory thresholds were measured by asking people to adjust the intensity of a variable stimulus to match a standard. Prinzmetal, Amiri, Allen, and Edwards (1998) adapted the task to study the effects of attention on perceptual variance and it was subsequently used by Wilken and Ma (2004) to study visual working memory. Since then, the task has become the method of choice for many researchers in visual working memory (see Ma, Husain, & Bays, 2014, for a review), for whom its appeal is that – unlike the traditional two-alternative forced-choice task – it yields a measure of the precision of responding. Precision characterizes the dispersion of decision outcomes around the true stimulus value and can be interpreted as an expression of the fidelity with which items are represented in memory. The challenge for researchers is then to explain the decrease in item precision that is found empirically with increasing memory load. A number of competing theories of visual working memory have been proposed that seek to explain this relationship, all of which use continuous outcome performance as the meeting point for theory and data (Bays, 2014; Bays et al., 2009; van den Berg et al., 2014; Oberauer & Lin, 2017; Zhang & Luck, 2008). One of the limitations of research in this area is that, until recently, there has been no working model of the decision process that is involved in continuous outcome tasks.
There has, in particular, been no counterpart of the sequential sampling models of two-choice decision making (Busemeyer & Townsend, 1993; Ratcliff, 1978; Usher & McClelland, 2001), which have successfully accounted for the relationship between decision outcomes and decision times in a variety of settings (Ratcliff, Smith, & McKoon, 2015). Recently, however, two such models have been proposed, the circular diffusion model of Smith (2016) and the spatially continuous diffusion model (SCDM) of Ratcliff (2018). Both of these models represent decision making as evidence accumulation by a diffusion or diffusion-like process. Smith’s model assumes that the dimensionality of the decision process is equal to the dimensionality of the stimulus space, which, for decisions about color, orientation, and direction of motion, leads to a 2D evidence accumulation process. Ratcliff’s model assumes that the dimensionality of the process is equal to the dimensionality of the response space (i.e., the number of different response alternatives), which leads to an infinite-dimensional evidence accumulation process.1 My focus in this article is on one of these models, the circular diffusion model of Smith (2016), and my aim is to present new results that elaborate the theoretical link between it and another model of decisions about multi-attribute stimuli, the general recognition theory (GRT) of Ashby and Townsend (1986). The latter is a multidimensional generalization of signal detection theory (Green & Swets, 1966), which allows for correlated across-trial variability in the representations of stimulus attributes. Here I show that the GRT assumption, that two-feature stimuli are represented cognitively by bivariate normal distributions with correlated dimensions, can be incorporated into the circular diffusion model in a straightforward way, and that it provides explicit
1 The evidence accumulation process in Ratcliff’s (2018) model is infinite dimensional in a phase space sense. The phase space of a system describes the number of different coordinates needed to characterize its state (Schutz, 1980, p. 28).
Fig. 1. Stimuli for continuous outcome and GRT decision tasks. (a) Orientation discrimination task. The task is to indicate the remembered orientation of the radius by indicating the corresponding point on the circumference of the circle. (b) Orientation discrimination task with restricted [0, π] range of outcomes. (c) GRT decision task. The stimulus consists of two binary-valued (vertical or horizontal) features. The task is to identify which of the four possible stimulus configurations was presented.
analytic predictions for the speed and accuracy of continuous outcome decisions about stimuli with correlated across-trial variability in their components. I have focused on the circular diffusion model because, unlike the SCDM of Ratcliff (2018), for which no analytic solutions exist, there exist tractable analytic solutions for the circular diffusion model that allow the connections with GRT to be developed in an explicit way. As a motivating example of why this link may be theoretically interesting, Fig. 1c shows an example of a decision task similar to one studied by Corbett and Smith (2017) that could be represented in a GRT framework.2 The stimuli consist of pairs of binary-valued features — in this case, pairs of gratings, each of which can be oriented vertically (V) or horizontally (H). The decision maker’s task is to determine which of the four possible stimulus configurations (VV, VH, HV, or HH) was presented and respond accordingly. As formalized in Section 1.2, GRT assumes that two-feature stimuli of this kind are represented cognitively as bivariate normal distributions, possibly with correlated components, where the ‘‘components’’ are representations of the individual features. Psychologically, the mean of the distribution characterizes the average quality of the encoded stimulus representation, the variance characterizes across-trial variability in quality, and the correlation characterizes across-trial covariance. The decision on any trial is based on a single bivariate sample from this distribution which is compared with a pair of decision bounds that partition the space into four response regions (Ashby & Townsend, 1986; Kadlec & Townsend, 1992). As discussed by Smith (2016) and developed in Section 4.2, these kinds of categorical decisions can be represented in a natural way in the circular diffusion model by partitioning the continuous response space with decision bounds. 
Like the standard form of the two-choice diffusion model, the circular diffusion model assumes that, in addition to moment-to-moment diffusive variability in evidence accumulation, there may be across-trial variability in the evidence entering the decision process, which is represented by the drift rate, and in the amount of evidence required for a response, which is represented by the decision criterion. The link between this kind of representation and GRT comes via the assumption of across-trial variability in drift rates. Specifically, the bivariate normal stimulus representation in GRT, including any correlation between the stimulus features, can be identified with the components of drift rate variability in the
2 Corbett and Smith (2017, Experiments 1a and 1b) studied search for vertical grating targets among horizontal distractors in four- rather than two-item displays. Smith and Corbett (2019) presented a hyperspherical generalization of the circular diffusion model (see Section 2.2) to account for speed and accuracy of decisions about the contents of four-item displays. Unlike the models described in this article, they assumed the components of across-trial variability in the drift rate of the 4D diffusion process were independent.
the drift rate has a length or norm, $\|\boldsymbol{\mu}\| = \sqrt{\mu_1^2 + \mu_2^2}$, and a phase angle, or direction, $\theta_\mu = \arctan(\mu_2/\mu_1)$. Psychologically, the drift
Fig. 2. Circular diffusion model of continuous outcome decision making. Evidence accumulation is modeled as a two dimensional Wiener diffusion process on the interior of a disk, whose bounding circle, of radius a, represents the decision criterion. The process X t = (Xt1 , Xt2 ) consists of independent components, with drift rate µ = (µ1 , µ2 ) and infinitesimal standard deviation σ . In polar coordinates, the norm of the drift rate is ∥µ∥ and the phase angle is θµ . A decision is made when the accumulating evidence reaches the criterion. The hitting point, X θ , represents the decision outcome and the hitting time, Ta , represents the decision time.
circular diffusion model. My goal in this article is to characterize the implications of this identification in detail.
1.1. The circular diffusion model
Fig. 2 summarizes the main properties of the circular diffusion model. The model conceives of the decision process as a noisy, two-dimensional (2D) evidence accumulation process on the interior of a disk in the Cartesian plane, $\mathbb{R}^2$, whose bounding circle, of radius $a$, represents the decision criterion for the task. The evidence accumulation process is represented by a 2D Wiener diffusion process, $\mathbf{X}_t = (X_t^1, X_t^2)'$, described by the vector-valued stochastic differential equation
$$d\mathbf{X}_t = \boldsymbol{\mu}\,dt + \boldsymbol{\sigma}\,d\mathbf{W}_t, \tag{1}$$
or in components,
$$\begin{bmatrix} dX_t^1 \\ dX_t^2 \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} dt + \begin{bmatrix} \sigma & 0 \\ 0 & \sigma \end{bmatrix} \begin{bmatrix} dW_t^1 \\ dW_t^2 \end{bmatrix}, \tag{2}$$
where dWt = [dWt1 , dWt2 ]′ is the differential of a two-dimensional Brownian motion, or Wiener, process and where the prime denotes the matrix transpose. The quantity dW t describes the horizontal and vertical components of the random change in the process X t during a small interval of length dt. The quality of the information in the stimulus is represented by the drift rate, µ = (µ1 , µ2 )′ , while the rate at which the process diffuses towards the decision boundary is determined by the infinitesimal standard deviation, σ , which is assumed to be the same in the horizontal and vertical directions. The square of the infinitesimal standard deviation, σ 2 , is the diffusion coefficient. The matrix σ on the right-hand side of (1) and (2) is the dispersion matrix. Psychologically, the model has very similar properties to the 1D diffusion model of two-choice decisions of Ratcliff and colleagues (Ratcliff, 1978; Ratcliff & McKoon, 2008). On stimulus presentation, noisy evidence begins to accumulate, starting at the origin, X 0 = 0, at a rate and in a direction determined by the drift rate vector µ. The irregular sample path in Fig. 2 represents the accumulating evidence on a single experimental trial. Because of the circular symmetry of the model, it is often more convenient to represent the drift rate in polar coordinates. In polar coordinates,
norm represents the quality of the information in the stimulus while the phase angle represents its identity. Evidence accumulation continues until the process hits a point on the bounding circle. The hitting point is denoted X θ and the hitting time is denoted Ta , where the capitalization in the notation indicates that the hitting point and hitting time are both random variables. In applications of the model to continuous outcome decision tasks, the hitting point corresponds to the subject’s report of the stimulus identity and the hitting time corresponds to the decision time. Because of the noisiness of the evidence accumulation process, repeated presentation of the same stimulus, represented by a given value of the drift rate vector, µ, will lead to a range of different hitting points and decision outcomes across trials. The predicted distribution of hitting points follows a von Mises distribution (Rogers & Williams, 1987/2000; Smith, 2016), which is a circular counterpart of the normal distribution. Distributions of report outcomes from continuous outcome decision tasks are well described by mixtures of von Mises distributions (van den Berg et al., 2014; Oberauer & Lin, 2017; Zhang & Luck, 2008) so the model aligns nicely with the current experimental literature. Clearly, the circular symmetry of the model implies a strong set of assumptions about the underlying stimulus representations and how decisions are made about them. The model in the form shown in Fig. 2 is likely not applicable to tasks in which the response alternatives are represented as a set of points on a line or a semicircle (Ratcliff, 2018) — although Smith (2016) discussed ways in which it might be extended to these kinds of tasks as well. Nevertheless, the model in its basic form is well suited to the most widely-used continuous outcome tasks in the cognitive literature, such as decisions about color, orientation, or direction of motion. 
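The accumulation-to-criterion process described above can be illustrated by direct simulation. The following is a minimal sketch, assuming a simple Euler-Maruyama discretization of Eq. (1); the function name `simulate_trial`, the step size `dt`, and the parameter values are illustrative choices rather than part of the model's analytic development, and the discrete time step biases the simulated hitting times slightly upward.

```python
import math
import random


def simulate_trial(mu, a=1.0, sigma=1.0, dt=0.001, rng=random):
    """Simulate one trial; return (hitting angle, hitting time).

    Euler-Maruyama approximation (an assumption of this sketch) to
    dX_t = mu dt + sigma dW_t, started at the origin and stopped when
    the process first leaves the disk of radius a.
    """
    x1 = x2 = 0.0
    t = 0.0
    step_sd = sigma * math.sqrt(dt)  # s.d. of each Wiener increment
    while math.hypot(x1, x2) < a:
        x1 += mu[0] * dt + step_sd * rng.gauss(0.0, 1.0)
        x2 += mu[1] * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    return math.atan2(x2, x1), t


# Illustrative values: drift norm 1.5 with phase angle 0.
rng = random.Random(1)
trials = [simulate_trial((1.5, 0.0), rng=rng) for _ in range(200)]
mean_cos = sum(math.cos(theta) for theta, _ in trials) / len(trials)
mean_time = sum(t for _, t in trials) / len(trials)
```

Here `mean_cos` summarizes how tightly the simulated hitting points cluster around the drift direction, and `mean_time` is the simulated mean decision time.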
The model can be extended to apply to rotationally symmetrical orientation discrimination tasks like the one in Fig. 1b either by remapping the space via the transformation θ ↦ 2θ (Kvam, 2019), or by identifying antipodal points of the response circle (Lilburn, Smith, & Sewell, 2019, see Section 3.3). The model’s principal theoretical advantage is that it yields analytic expressions for the joint distributions of decision times and decision outcomes that are fast and easy to compute. As a corollary, it yields an explicit and theoretically meaningful expression for the precision of the distribution of report outcomes. The dispersion properties of the von Mises distribution are characterized by a precision parameter, κ (Fisher, 1993; Mardia & Jupp, 1999), which determines how concentrated or spread out the responses are on the range (−π, π). How precision varies with memory load is a central theoretical question in the contemporary visual working memory literature, but current memory models provide no explicit account of why the distribution of decision outcomes should analytically follow a von Mises distribution or why precision should take the form it does. Smith (2016, Eq. 29) showed that the precision of the distribution of decision outcomes predicted by the circular diffusion model admits the decomposition
$$\kappa = \frac{a\|\boldsymbol{\mu}\|}{\sigma^2}. \tag{3}$$
That is, precision is the product of the decision criterion and the drift norm, scaled in diffusion coefficient units. Put otherwise, precision is equal to the quality of the information in the stimulus, multiplied by the amount of information needed for a response, divided by the noisiness of the evidence accumulation process. This explicit and interpretable decomposition of precision relates the model in an explicit way to earlier random walk models of two-choice decision making (Link, 1975) and is one of its attractive theoretical properties.
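As a worked illustration of the decomposition in Eq. (3), the sketch below computes κ from the criterion, drift rate, and infinitesimal standard deviation, and evaluates the corresponding von Mises density of hitting points. The function names and the series evaluation of the modified Bessel function I0 are assumptions of this sketch, and the density is taken to be centered on the drift phase angle.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def precision(mu, a, sigma):
    """Eq. (3): kappa = a * ||mu|| / sigma^2."""
    return a * math.hypot(mu[0], mu[1]) / sigma ** 2


def von_mises_pdf(theta, theta_mu, kappa):
    """Von Mises density on the circle with mean direction theta_mu."""
    return math.exp(kappa * math.cos(theta - theta_mu)) / (
        2.0 * math.pi * bessel_i0(kappa)
    )


# Illustrative values: doubling the criterion a doubles the precision.
kappa_low = precision((1.5, 0.0), a=1.0, sigma=1.0)
kappa_high = precision((1.5, 0.0), a=2.0, sigma=1.0)
```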
The circular diffusion model shares many of the well-known properties of random walk and diffusion process models of two-choice decisions, particularly those pertaining to response time (RT) ordering (Luce, 1986). Like those models, when the only source of variability in the model is within-trial noise in the evidence accumulation process, the model predicts that the distributions of RT will be the same for all decision outcomes.3 When there is across-trial variability in the quality of stimulus information entering the decision process, represented by variability in the drift rate, it predicts that mean RT and accuracy will be negatively correlated. When there is across-trial variability in the amount of evidence needed for a response, represented by variability in the radius of the criterion circle, then it predicts that mean RT and accuracy will be positively correlated. These properties are continuous counterparts of the slow-error and fast-error predictions of the 1D diffusion model of two-choice decisions with variability in drift rate and starting point, respectively (Ratcliff & McKoon, 2008), which are widely used to fit empirical data (Ratcliff & Smith, 2004). Like the 1D diffusion model, the relationships between mean RT and accuracy carry over to the ordering of distributions of RT (Smith, 2016, Figure 11).
1.2. General recognition theory
General recognition theory (GRT) was introduced by Ashby and Townsend (1986) as a model of the accuracy of decisions about multi-attribute stimuli. It is most straightforwardly viewed as a generalization of the multidimensional signal detection model with independent Gaussian components that is widely used in the psychophysics literature (Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, Ames, & Lindsey, 1993; Shaw, 1982; Smith, 1998).
Whereas the Gaussian signal detection model assumes that the perceptual effect of a stimulus or set of stimulus features can be described by independent normal distributions, GRT allows the perceptual effects of the features to be correlated. In the most widely studied form of GRT, which describes a stimulus with two correlated features, the perceptual effect of a two-feature stimulus is represented by a bivariate normal distribution with probability density function
$$N(\boldsymbol{\xi}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{2\pi|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\boldsymbol{\xi} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\boldsymbol{\xi} - \boldsymbol{\mu})\right], \tag{4}$$
where $\boldsymbol{\xi}' = (\xi_1, \xi_2)$ is a bivariate normal random variable with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ with determinant $|\boldsymbol{\Sigma}|$. In components,
$$N(\xi_1, \xi_2; \mu_1, \mu_2, \eta_1, \eta_2, \rho) = \frac{1}{2\pi\eta_1\eta_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(\xi_1-\mu_1)^2}{\eta_1^2} - \frac{2\rho(\xi_1-\mu_1)(\xi_2-\mu_2)}{\eta_1\eta_2} + \frac{(\xi_2-\mu_2)^2}{\eta_2^2}\right]\right\}, \tag{5}$$
3 One form of random walk model, Link and Heath’s (1975) relative judgment theory, can predict both fast errors and slow errors without across-trial variability. It does so by assuming asymmetry in the moment generating function of the distribution of increments to the walk. In relative judgment theory, the ordering of mean RTs for correct responses and errors depends on a parameter, γi , the moment generating asymmetry parameter, which is equal to minus the ratio of the derivatives of the moment generating function, M(θ ), evaluated at the zero and nonzero roots of the equation M(θ ) = 1 (Link, 1975, p. 133). When the ratio is greater than 1.0, correct responses will be slower than errors and when the ratio is less than 1.0, correct responses will be faster than errors. Although the device of moment generating function asymmetry provides a formal solution to the problem of RT ordering, the theory did not ultimately gain widespread acceptance because it did not provide an account of how and why the distribution of increments to the walk should vary with stimulus conditions and experimental instructions in the way the theory implied.
where $\mu_1$ and $\mu_2$ and $\eta_1^2$ and $\eta_2^2$ are the means and the variances of the features $\xi_1$ and $\xi_2$, respectively, and $\rho$ is the correlation between them. Much of the theoretical interest in GRT lies in analyzing how the perceptual effect of a stimulus interacts with the decision rule used to classify it. The most common assumption is that the decision maker partitions the (ξ1, ξ2) perceptual space into decision regions by means of decision bounds, usually represented as straight lines, which may or may not be parallel to the coordinate axes. The proportion of responses of each kind depends both on the distributions of the stimulus components, including any correlation between them, and the way in which the perceptual space is partitioned by decision bounds. As in signal detection theory, the partitioning of the perceptual space by decision bounds in GRT is tantamount to transforming, or mapping, the perceptual space into the evidence space of the decision process, in such a way that perceptual representations falling on one side of a decision bound provide evidence for classifying the stimulus in one way and representations falling on the other side provide evidence for classifying it in another. In passing from a static model like signal detection theory or GRT to a dynamic model like the 1D or 2D diffusion models, in which evidence is represented by a drift rate, the mapping from perceptual space to evidence space is effected by comparing the perceptual representation to a referent (Link, 1975) or drift criterion (Ratcliff, 1985; Ratcliff, Van Zandt, & McKoon, 1999). The resulting evidence space can be viewed as an affine transformation of the perceptual space: Psychologically, the drift criterion is an indifference point in the perceptual space that represents the zero point of the evidence space.
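The agreement between the matrix form of the bivariate normal density in Eq. (4) and its component form in Eq. (5) can be checked numerically. The sketch below is one way of doing so, assuming the covariance matrix has diagonal entries η1² and η2² and off-diagonal entries ρη1η2; the function names are illustrative.

```python
import math


def bvn_pdf_components(xi1, xi2, mu1, mu2, eta1, eta2, rho):
    """Component form of the bivariate normal density, Eq. (5)."""
    z1 = (xi1 - mu1) / eta1
    z2 = (xi2 - mu2) / eta2
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * eta1 * eta2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm


def bvn_pdf_matrix(xi, mu, cov):
    """Matrix form, Eq. (4), with the 2x2 inverse written out explicitly."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = xi[0] - mu[0], xi[1] - mu[1]
    # (xi - mu)' Sigma^{-1} (xi - mu) for a 2x2 covariance matrix
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Illustrative parameter values.
eta1, eta2, rho = 1.2, 0.8, 0.4
cov = ((eta1 ** 2, rho * eta1 * eta2), (rho * eta1 * eta2, eta2 ** 2))
p_components = bvn_pdf_components(0.3, -0.2, 1.0, 0.5, eta1, eta2, rho)
p_matrix = bvn_pdf_matrix((0.3, -0.2), (1.0, 0.5), cov)
```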
In the 1D diffusion model, the drift criterion is scalar-valued: If x is the perceptual effect produced by a given stimulus and x0 is the drift criterion, then the drift rate on that trial will be µ = x − x0 . In higher dimensions, the drift criterion will be vector-valued (Smith & Corbett, 2019). In a 2D diffusion model, if the perceptual effect produced by a given stimulus is x = (x1 , x2 ) and the drift criterion is x0 , then the drift rate on that trial will be µ = x − x0 . In the circular diffusion model of Fig. 2, the magnitude of the drift norm, ∥µ∥, is equal to the Euclidean distance of the perceptual effect, x, from the indifference point, x0 . In Ashby’s (2000) stochastic GRT model, discussed in Section 4.3, which models binary classification decisions about stimuli defined in 2D perceptual spaces, the drift criterion is replaced by a decision bound, which is a line that partitions the perceptual space into two decision regions. Like the 1D diffusion model, the magnitude of the drift rate, |µ|, in Ashby’s model is equal to the minimum Euclidean distance of the stimulus from this line. A large part of the appeal of GRT lies in the plausibility of its correlated-components assumption, which captures, among other things, the effects of trial-to-trial fluctuations in attention. If there are fluctuations in the amount of attention directed to the stimulus or stimulus display as a whole, such that both components are either well attended or poorly attended, then a positive correlation between them would be expected. If there are fluctuations in the allocation of attention across features, such that one feature is attended at the expense of the other, then a negative correlation between them would be expected. Bonnel and Prinzmetal (1998) reported these kinds of correlations in a dual-task study in which subjects were required to make simultaneous judgments about color and shape. 
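The mapping from correlated perceptual effects to drift rates can be sketched as follows. This is a minimal illustration, assuming perceptual effects drawn from a bivariate normal distribution and a vector-valued drift criterion x0; the function names and all parameter values are hypothetical. A positive `rho` corresponds to fluctuations in attention to the display as a whole, a negative `rho` to a trade-off between features.

```python
import math
import random


def sample_drift_rate(mean, eta, rho, criterion, rng=random):
    """Draw one perceptual effect x and return the drift rate x - x0."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mean[0] + eta[0] * z1
    # Mixing z1 into the second component induces correlation rho.
    x2 = mean[1] + eta[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (x1 - criterion[0], x2 - criterion[1])


def sample_correlation(pairs):
    """Pearson correlation between the two drift rate components."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    v1 = sum((p[0] - m1) ** 2 for p in pairs)
    v2 = sum((p[1] - m2) ** 2 for p in pairs)
    return c12 / math.sqrt(v1 * v2)


rng = random.Random(7)
drifts = [
    sample_drift_rate((2.0, 1.0), (1.0, 0.5), 0.6, (0.5, 0.5), rng)
    for _ in range(5000)
]
r = sample_correlation(drifts)
# Drift norm on each trial: distance of the perceptual effect from x0.
norms = [math.hypot(d1, d2) for d1, d2 in drifts]
```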
When the task required judgments about the color and shape of the same stimulus, performance on the two tasks was positively correlated; when the task required judgments about the color and shape of stimuli that were separated in space, performance on the two tasks was negatively correlated. Compared to the independent-components signal detection model, the richer mathematical representation provided by GRT
provides a way to embody these kinds of assumptions in a formal model and to investigate their consequences in a systematic way.4 Since it was first introduced, GRT has undergone extensive theoretical and methodological development, and has been widely applied to a variety of cognitive tasks (Ashby & Lee, 1991; Fitousi, 2018; Kadlec & Townsend, 1992; Silbert & Thomas, 2013, 2014, 2017; Soto & Ashby, 2015; Soto, Vucovich, Musgrave, & Ashby, 2015; Thomas, 1995, 2001, 2003; Thomas, Altieri, Silbert, Wenger, & Wessels, 2015). Paralleling these developments, attempts have been made to develop stochastic versions of GRT that can predict RT as well as accuracy (Ashby, 1989, 2000; Townsend, Houpt, & Silbert, 2012). While some of these representations, particularly that of Townsend et al., allow for fairly general assumptions about the effect of the correlation between the stimulus components on the rates at which they are processed, the resulting models are not particularly analytically tractable. In contrast, the model I consider here, which combines the circular diffusion model with GRT assumptions about across-trial variability and correlated stimulus components, while more restrictive than the model of Townsend et al., yields explicit analytic expressions for the joint distributions of decision outcomes and RT. The stochastic GRT model of Ashby (2000) also leads to an analytically tractable RT model, but only for tasks in which the 2D stimulus space is mapped to a two-choice response. I consider the relationship between the circular diffusion model with decision bounds and Ashby’s model in Section 4.3. The organization of the rest of this article is as follows. In Section 2.1, I review the mathematical foundations of the circular diffusion model. To formalize the link with GRT, I consider how the predictions of the circular diffusion model are affected by across-trial variability in drift rates.
In Section 2.2, I review the results of Smith and Corbett (2019) who derived predictions for a model with across-trial variability in drift rates under the restrictive assumption that the components of drift rate variability are independent. In Section 3, I present new results on the properties of the model with correlated across-trial variability in drift rates. In some tasks, there may be no strong basis for assuming privileged or canonical coordinates for the stimulus space. The components of across-trial variability are then most usefully conceived of as variability in the radial and tangential components of drift rate. Making this identification is tantamount to rotating the coordinates of the process so that one of the axes aligns with the mean drift rate vector. The results in Section 3.3 extend the model to this more general setting. Specifically, they express the properties of the process in the new, rotated coordinates as a function of the old coordinates, and characterize the relationship between the two. In Section 4, I consider the empirical predictions of models with correlated drift rate components, including, in Section 4.2, the case in which the response space is partitioned into categories with decision bounds, as foreshadowed in the discussion of the decision task in Fig. 1c. Section 5 considers general implications of the results.
4 Another way to think about the effects of attention in GRT, which links with the literature on continuous-outcome decision tasks, is to assume that attention affects the variances of stimulus representations across trials. Prinzmetal et al. (1998) found that attention reduced report-outcome variance in a continuous-outcome decision task relative to inattention and, consistent with this, Maddox and Dodd (2003) found that the estimated stimulus variances in a GRT categorization model were reduced on an attended dimension relative to an irrelevant dimension. Maddox and Dodd attributed the reduction in variance to the effects of selective attention to the relevant dimension. It is not my intention to try to adjudicate between these alternative views of attention – and, indeed, different kinds of effects may be found in different tasks – but simply to show how correlated across-trial variability can be incorporated into the circular diffusion model and to indicate why it may be of theoretical interest to do so.
2. Mathematical preliminaries

2.1. The Girsanov theorem and the Bessel process

The main theoretical quantities of interest in the circular diffusion model are the joint distributions of decision times and decision outcomes. Smith (2016) showed that the joint distribution can be obtained by applying the Girsanov change-of-measure theorem (Karatzas & Shreve, 1991; Rogers & Williams, 1987/2000) to the first-passage time distribution of the 2D Bessel process. The Bessel process, $R_t$, describes the Euclidean distance, $\|W_t\| = \sqrt{(W_t^1)^2 + (W_t^2)^2}$, of a Wiener diffusion process, $W_t$, with zero drift rate from its starting point. When $W_0 = 0$, all of the relevant information about the zero-drift process is carried by the Euclidean distance process. We denote by $dP_t(a)$ the probability density function of the Bessel process through the decision boundary $a$ at time $t$. Symbolically,

$$dP_t(a) = \frac{d}{dt} P[R_t \le a \mid R_s < a,\ 0 \le s < t].$$

The first-passage time density function has an infinite series representation of the form

$$dP_t(a) = \frac{\sigma^2}{a^2} \sum_{k=1}^{\infty} \frac{j_{0,k}}{J_1(j_{0,k})} \exp\left(-\frac{j_{0,k}^2 \sigma^2}{2a^2}\, t\right), \tag{6}$$
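As an illustration of how the series in (6) can be evaluated, the following is a minimal sketch in plain Python. It assumes the Bessel functions are computed from their standard integral representation and the zeros of $J_0$ by Newton's method; the function names (`bessel_j`, `j0_zeros`, `fpt_density`), the default parameters, and the truncation at 60 terms are illustrative choices, not part of the article. The truncated series is unreliable for very small t, where many terms are needed.

```python
import math
from functools import lru_cache

def bessel_j(n, x, steps=400):
    """J_n(x) from (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau (midpoint rule).
    Assumption: |x| stays well below `steps`, which holds for the zeros used here."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

@lru_cache(maxsize=None)
def j0_zeros(count):
    """First `count` positive zeros of J_0, refined by Newton's method."""
    zeros = []
    for k in range(1, count + 1):
        beta = (k - 0.25) * math.pi
        x = beta + 1.0 / (8.0 * beta)              # asymptotic starting value
        for _ in range(6):
            x += bessel_j(0, x) / bessel_j(1, x)   # uses J_0'(x) = -J_1(x)
        zeros.append(x)
    return tuple(zeros)

def fpt_density(t, a=1.2, sigma=1.0, n_terms=60):
    """Truncated version of series (6): zero-drift first-passage time density at t > 0."""
    total = 0.0
    for j in j0_zeros(n_terms):
        total += j / bessel_j(1, j) * math.exp(-(j * sigma) ** 2 * t / (2.0 * a ** 2))
    return sigma ** 2 / a ** 2 * total

if __name__ == "__main__":
    for t in (0.2, 0.5, 1.0, 2.0):
        print(t, fpt_density(t))
```

One way to sanity-check such an implementation is through the mean of the density: integrating (6) term by term against t gives a series that should equal a²/(2σ²), the mean exit time of a driftless 2D Wiener process from a disk of radius a.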
(Borodin & Salminen, 1996, p. 297; Hamana & Matsumoto, 2013). In this equation $J_1(x)$ is a first-order Bessel function of the first kind and the $j_{0,k}$ terms are the zeros of a zero-order Bessel function of the first kind, $J_0(x)$ (Abramowitz & Stegun, 1965, p. 360). The term $J_1(j_{0,k})$ in the denominator represents the values of the function $J_1(x)$ evaluated at the zeros of $J_0(x)$. We denote by $d\tilde{P}_t(\theta_T)$ the probability density that the nonzero-drift diffusion process hits the bounding circle at a point $X_T$ with phase angle $\theta_T$ at time $T_a$. We use a capital T to indicate that the hitting point and hitting time are both random variables but omit the a from the notation for the hitting time to avoid double subscripting. The tilde indicates that $\tilde{P}_t$ is the probability distribution of the nonzero-drift process. The Girsanov theorem states that the probability density functions $d\tilde{P}_t(\theta_T)$ and $dP_t(a)$ are related in a simple way, via an exponential martingale $Z_T(X)$,

$$d\tilde{P}_t(\theta_T) = Z_T(X)\, dP_t(a), \tag{7}$$
where

$$Z_T(X) = \exp\left[\frac{1}{\sigma^2}(\mu \cdot X_T) - \frac{1}{2\sigma^2}\|\mu\|^2 T\right], \tag{8}$$

and where $(\mu \cdot X_T) = \sum_{i=1}^{2} \mu_i X_T^i$ is the dot product of the drift rate vector and a random vector $X_T$ with norm $a$ and phase angle $\theta_T$. In polar coordinates, the components of $X_T$ map out a locus of random hitting points on the bounding circle with $X_T^1 = a\cos\theta_T$ and $X_T^2 = a\sin\theta_T$, so $Z_T(X)$ can be represented in polar form as

$$Z_T(X) = \exp\left[\frac{1}{\sigma^2}(a\mu_1\cos\theta_T + a\mu_2\sin\theta_T) - \frac{1}{2\sigma^2}\|\mu\|^2 T\right]. \tag{9}$$
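The polar form (9) is simple enough to evaluate directly. The sketch below, with the hypothetical function name `z_polar` and illustrative parameter values, computes $Z_T(X)$ on a grid of hitting angles and checks two properties that follow from its form: the ratio of $Z_T$ at two hitting angles does not depend on the hitting time, and $Z_T$ is largest where the hitting angle matches the phase angle of the drift vector.

```python
import math

def z_polar(theta, T, mu, a=1.2, sigma=1.0):
    """Eq. (9): exponential martingale at hitting angle theta and hitting time T."""
    mu1, mu2 = mu
    angular = math.exp(a * (mu1 * math.cos(theta) + mu2 * math.sin(theta)) / sigma ** 2)
    temporal = math.exp(-(mu1 ** 2 + mu2 ** 2) * T / (2.0 * sigma ** 2))
    return angular * temporal

def argmax_angle(T, mu, n=360):
    """Hitting angle on a regular grid over [-pi, pi) at which z_polar is largest."""
    grid = [-math.pi + 2.0 * math.pi * k / n for k in range(n)]
    return max(grid, key=lambda th: z_polar(th, T, mu))

if __name__ == "__main__":
    mu = (1.0, 1.0)                      # drift vector with phase angle pi/4
    print(argmax_angle(0.5, mu), math.pi / 4)
    # The ratio between two angles is the same at every hitting time:
    print(z_polar(0.3, 0.5, mu) / z_polar(2.0, 0.5, mu))
    print(z_polar(0.3, 1.7, mu) / z_polar(2.0, 1.7, mu))
```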
Eqs. (9) and (6) used in (7) provide explicit expressions for the joint first-passage time density functions of the process $X_T$. The function $Z_T(X)$ factors into a product of two terms, one which depends on the phase angle of the hitting point and is independent of hitting time and another which depends on the hitting time and is independent of the hitting point. It is immediate from this factorization that the distribution of decision times is the same for all decision outcomes.5

5 Time is capitalized as T in (7) through (10) to indicate that the decision time is a random variable. The martingale property of $Z_T(X)$ means that (7) holds at both fixed and random times; the latter include stopping times of the underlying probability space, including the first time at which the process hits the decision boundary. The fact that (7) holds at stopping times is what allows us to substitute points on the boundary $(a\cos\theta, a\sin\theta)$ for $X_T$ to obtain the explicit representation (9). When $Z_T(X)$ is applied to $dP_t(a)$ in (7), T takes on specific values T = t and the time-dependent part of (7) becomes $\exp[-\|\mu\|^2 t/(2\sigma^2)]\, dP_t(a)$.

The ratio of probability densities,

$$\frac{d\tilde{P}_t(\theta_T)}{dP_t(a)} = Z_T(X), \tag{10}$$

is known as the Radon–Nikodym derivative of the probability measure $\tilde{P}_t$ with respect to the measure $P_t$, and has a useful statistical interpretation as a likelihood ratio process. Specifically, it defines the likelihood of obtaining the observed evidence process $X_t$, given a drift rate of µ, versus the likelihood of obtaining the same process given a drift rate of 0. Evidently, the value of $Z_T(X)$ will be maximized when the dot product $(\mu \cdot X_T)$ is maximized. The maximum will occur when the phase angles of µ and $X_T$ coincide, which means that the most likely place for the process to hit the bounding circle is at the point corresponding to the phase angle of the drift vector. An observer who reports the hitting point as the decision about the identity of the stimulus is a maximum likelihood observer.

2.2. Across-trial variability in drift rates

When there is across-trial variability in drift rates, the model predicts slow errors, just like the standard form of the 1D model (Ratcliff & McKoon, 2008). Eq. (7) shows that the effects of drift rate on decision times and decision outcomes are confined to the function $Z_T(X)$. The joint density in the presence of drift rate variability can therefore be obtained by marginalizing $Z_T(X)$ across the distribution of drift rates. When the components of drift rate variability are independent, this calculation is comparatively straightforward. Smith and Corbett (2019) carried out the calculation explicitly for a 4D generalization of the model, which represents evidence accumulation as diffusion by a 4D Wiener process on the interior of a hypersphere, $S^3 \subset \mathbb{R}^4$. When the components are independent, $Z_T(X)$ factors into a product of components,

$$Z_T(X) = \prod_i Z_T^i(X^i) \triangleq \prod_i \exp\left[\frac{1}{\sigma^2}\mu_i X_T^i - \frac{1}{2\sigma^2}\mu_i^2 T\right]. \tag{11}$$

(Here I am using the delta-equivalent symbol “≜” to mean “defined to be equal to” (e.g., Karatzas & Shreve, 1991). Other common notations are “$\overset{\mathrm{def}}{=}$”, the Pascal-like assignment symbol, “:=”, and the equivalence symbol, “≡”. I favor the delta-equivalent symbol because it is compact, typographically distinct, and has no semantic ambiguity.) Like (11), the independent-components distribution of drift rates factors into a product of univariate components, so the integration of $Z_T(X)$ over drift rates can be carried out on a componentwise basis. We denote by $\bar{Z}_T^i(X^i)$ the ith component of $Z_T(X)$ in the presence of drift rate variability. The overbar notation is intended to imply averaging or marginalization. For a multivariate normal distribution of drift rates with independent components, with component means $\nu_i$, and a common standard deviation η, Smith and Corbett (2019) showed that $\bar{Z}_T^i(X^i)$ has the form

$$\bar{Z}_T^i(X^i) = \frac{1}{\sqrt{(\eta/\sigma)^2 T + 1}} \exp\left\{-\frac{\nu_i^2}{2\eta^2} + \frac{[X_T^i(\eta/\sigma)^2 + \nu_i]^2}{2\eta^2[(\eta/\sigma)^2 T + 1]}\right\}. \tag{12}$$

The calculation required to obtain (12) essentially recapitulates a similar calculation carried out by Tuerlinckx (2004, Eq. 28), who derived an expression for the first-passage time density for the two-barrier (1D) Wiener diffusion model in the presence of across-trial variability in drift rate. An alternative, and potentially more illuminating, representation of $\bar{Z}_T^i(X^i)$ is
$$\bar{Z}_T^i(X^i) = \frac{Z_T^i(X^i; \nu_i)}{\sqrt{T(\eta/\sigma)^2 + 1}} \exp\left\{\frac{(X_T^i - T\nu_i)^2 (\eta/\sigma^2)^2}{2[T(\eta/\sigma)^2 + 1]}\right\}, \tag{13}$$
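The closed form in (12) can be checked against a direct numerical marginalization of one component of (11) over a normal distribution of drift rates. The following sketch does this with a simple trapezoidal rule. The function names and parameter values are illustrative, and `zbar_factored` is a rearrangement worked out here directly from (12) by expanding the quadratic and factoring out the component of (11) evaluated at the mean drift rate; it is offered as a check, not as the article's own statement.

```python
import math

def zbar_closed(x, T, nu, eta, sigma=1.0):
    """Eq. (12): one component of Z_T marginalized over mu ~ N(nu, eta^2)."""
    r = (eta / sigma) ** 2
    denom = r * T + 1.0
    expo = -nu ** 2 / (2.0 * eta ** 2) + (x * r + nu) ** 2 / (2.0 * eta ** 2 * denom)
    return math.exp(expo) / math.sqrt(denom)

def zbar_factored(x, T, nu, eta, sigma=1.0):
    """Rearrangement of (12): component of (11) at mu = nu times a correction factor."""
    denom = T * (eta / sigma) ** 2 + 1.0
    z_at_mean = math.exp(nu * x / sigma ** 2 - nu ** 2 * T / (2.0 * sigma ** 2))
    corr = math.exp((x - T * nu) ** 2 * (eta / sigma ** 2) ** 2 / (2.0 * denom))
    return z_at_mean * corr / math.sqrt(denom)

def zbar_numeric(x, T, nu, eta, sigma=1.0, n=4001, width=10.0):
    """Trapezoidal integration of N(mu; nu, eta^2) * exp(mu*x/sigma^2 - mu^2*T/(2*sigma^2))."""
    lo = nu - width * eta
    h = 2.0 * width * eta / (n - 1)
    total = 0.0
    for k in range(n):
        mu = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        dens = math.exp(-(mu - nu) ** 2 / (2.0 * eta ** 2)) / (eta * math.sqrt(2.0 * math.pi))
        z = math.exp(mu * x / sigma ** 2 - mu ** 2 * T / (2.0 * sigma ** 2))
        total += weight * dens * z
    return total * h

if __name__ == "__main__":
    args = dict(x=1.2, T=0.5, nu=1.5, eta=1.0, sigma=1.3)
    print(zbar_closed(**args), zbar_factored(**args), zbar_numeric(**args))
```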
where $Z_T^i(X^i; \nu_i)$ denotes the ith component of $Z_T(X)$ in (11), with $\nu_i$ replacing $\mu_i$. Eq. (13) is obtained by expanding the quadratic in (12), putting the two terms over a common denominator, and then factoring out $Z_T^i(X^i; \nu_i)$.

3. Multivariate normal drift rates

3.1. Correlated components

The main result of this article is the derivation of the joint density for the circular diffusion model where the drift rates are bivariate-normally distributed with correlated components. By the preceding discussion, this means we need to evaluate the iterated integral

$$\bar{Z}_T(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} N(\mu_1, \mu_2; \nu_1, \nu_2)\, Z_T(X)\, d\mu_1\, d\mu_2, \tag{14}$$

where

$$N(\mu_1, \mu_2; \nu_1, \nu_2) = \frac{1}{2\pi\eta_1\eta_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(\mu_1 - \nu_1)^2}{\eta_1^2} - \frac{2\rho(\mu_1 - \nu_1)(\mu_2 - \nu_2)}{\eta_1\eta_2} + \frac{(\mu_2 - \nu_2)^2}{\eta_2^2}\right]\right\}, \tag{15}$$
and where $Z_T(X)$ is given by (8). The dependence of N(·) on the standard deviations, $\eta_1$ and $\eta_2$, and the correlation, ρ, has been suppressed from the notation for compactness. Unlike the calculation reported by Smith and Corbett (2019), the presence of the cross-product term in (15) means that (14) cannot be reduced to a componentwise product of integrals, but must be evaluated directly. The technique required to evaluate (14) is similar to that required to derive the moment generating function of a bivariate normal distribution, but the resulting expressions are appreciably more complex because $Z_T(X)$ contributes both linear and quadratic terms to the integrand. In brief, when evaluating the iterated integral, one term of the cross-product is fixed and the other is variable with respect to a given variable of integration. The fixed term can therefore be treated as constant while the inner integral is evaluated. After the inner integral is evaluated, one of the variables is removed from the equation. The outer integral can then be evaluated, treating the previously fixed term as variable. In what follows I freely use substitution in order to prevent the resulting expressions from becoming unmanageably large. We initially make the substitution $a_i \triangleq X_T^i/\sigma^2$ and $b \triangleq T/(2\sigma^2)$, allowing us to write $Z_T(X)$ in (8) as

$$Z_T(X) = \exp(a_1\mu_1 + a_2\mu_2 - b\mu_1^2 - b\mu_2^2). \tag{16}$$

In order to simplify the cross-product term in (15) as much as possible, we make the change of variables, $x = (\mu_1 - \nu_1)/\eta_1$ and $y = (\mu_2 - \nu_2)/\eta_2$. We also define $\kappa \triangleq 1 - \rho^2$. (This usage is distinct from the earlier usage of κ in (3), as the precision of a von Mises distribution.) The bivariate normal distribution in (15) then becomes

$$N(x, y) = \frac{1}{2\pi\eta_1\eta_2\sqrt{\kappa}} \exp\left[-\frac{1}{2\kappa}\left(x^2 - 2\rho xy + y^2\right)\right]. \tag{17}$$
We note that the change of variables leads to the relation $d\mu_1\, d\mu_2 = \eta_1\eta_2\, dx\, dy$, so that integration of $N(x, y)$ with respect to x and y will lead to canceling of the standard deviation terms in (17). With the change of variables and the indicated substitutions, $Z_T(X)$ becomes

$$Z_T(X) = \exp\left[-b\eta_1^2 x^2 + (a_1 - 2b\nu_1)\eta_1 x + (a_1\nu_1 - b\nu_1^2) - b\eta_2^2 y^2 + (a_2 - 2b\nu_2)\eta_2 y + (a_2\nu_2 - b\nu_2^2)\right]. \tag{18}$$

We then make the substitutions

$$\begin{aligned}
A &\triangleq b\eta_1^2, & B &\triangleq (a_1 - 2b\nu_1)\eta_1, & C &\triangleq a_1\nu_1 - b\nu_1^2,\\
D &\triangleq b\eta_2^2, & E &\triangleq (a_2 - 2b\nu_2)\eta_2, & F &\triangleq a_2\nu_2 - b\nu_2^2.
\end{aligned}$$

With these substitutions (14) can be written in the explicit form

$$\bar{Z}_T(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{\kappa}} \exp\left[-\frac{1}{2\kappa}\left(x^2 - 2\rho xy + y^2\right)\right] \exp\left(-Ax^2 + Bx + C - Dy^2 + Ey + F\right) dx\, dy,$$

or, after taking out constant terms from under the inner and outer integrals, as

$$\bar{Z}_T(X) = \frac{\exp(C + F)}{2\pi\sqrt{\kappa}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\kappa}\left(y^2 + 2\kappa Dy^2 - 2\kappa Ey\right)\right] \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\kappa}\left(x^2 + 2\kappa Ax^2 - 2\kappa Bx - 2\rho xy\right)\right] dx\, dy. \tag{19}$$

We make the further substitutions $G \triangleq 1 + 2\kappa A$ and $H \triangleq 1 + 2\kappa D$ to obtain the reduced representation

$$\bar{Z}_T(X) = \frac{\exp(C + F)}{2\pi\sqrt{\kappa}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\kappa}\left(Hy^2 - 2\kappa Ey\right)\right] \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\kappa}\left[Gx^2 - 2(\kappa B + \rho y)x\right]\right\} dx\, dy. \tag{20}$$

We can now complete the square in the inner integral by writing the exponent as

$$\exp\left\{-\frac{G}{2\kappa}\left[x^2 - \frac{2(\kappa B + \rho y)x}{G} + \frac{(\kappa B + \rho y)^2}{G^2}\right] + \frac{(\kappa B + \rho y)^2}{2\kappa G}\right\}$$

or

$$\exp\left\{-\frac{G}{2\kappa}\left[x - \frac{\kappa B + \rho y}{G}\right]^2\right\} \exp\left\{\frac{(\kappa B + \rho y)^2}{2\kappa G}\right\}.$$

The first of these terms can be recognized as the exponent of a normal distribution with variance $\kappa/G$ and mean $(\kappa B + \rho y)/G$. It therefore integrates to $\sqrt{2\pi\kappa/G}$, which reduces (20) to the single integral

$$\bar{Z}_T(X) = \frac{\exp(C + F)}{\sqrt{2\pi G}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\kappa}\left(Hy^2 - 2\kappa Ey\right) + \frac{(\kappa B + \rho y)^2}{2\kappa G}\right] dy. \tag{21}$$

We now carry out a second completion of the square operation in (21). We write the exponent as

$$\exp\left\{-\frac{1}{2\kappa G}\left[(GH - \rho^2)y^2 - 2\kappa(GE + \rho B)y - \kappa^2 B^2\right]\right\}$$

and complete the square to obtain

$$\exp\left\{-\frac{GH - \rho^2}{2\kappa G}\left[y^2 - \frac{2\kappa(GE + \rho B)y}{GH - \rho^2} + \frac{\kappa^2(GE + \rho B)^2}{(GH - \rho^2)^2}\right] + \frac{\kappa(GE + \rho B)^2}{2G(GH - \rho^2)} + \frac{\kappa B^2}{2G}\right\}$$

or

$$\exp\left\{-\frac{GH - \rho^2}{2\kappa G}\left[y - \frac{\kappa(GE + \rho B)}{GH - \rho^2}\right]^2\right\} \exp\left\{\frac{\kappa(GE + \rho B)^2}{2G(GH - \rho^2)} + \frac{\kappa B^2}{2G}\right\}.$$

The first term can be recognized as the exponent of a normal distribution with variance $\kappa G/(GH - \rho^2)$ and mean $\kappa(GE + \rho B)/(GH - \rho^2)$. It integrates to $\sqrt{2\pi\kappa G/(GH - \rho^2)}$, to yield, finally,

$$\bar{Z}_T(X) = \frac{\exp(C + F)\sqrt{\kappa}}{\sqrt{GH - \rho^2}} \exp\left[\frac{\kappa(GE + \rho B)^2}{2G(GH - \rho^2)} + \frac{\kappa B^2}{2G}\right]. \tag{22}$$

Making use of the fact that $G = 1 + 2\kappa A$ and $H = 1 + 2\kappa D$ we obtain

$$GH - \rho^2 = \kappa[1 + 2(A + D) + 4\kappa AD],$$

and putting the terms in the exponent over a common denominator allows (22) to be written in terms of the intermediate-level substitutions as

$$\bar{Z}_T(X) = \frac{\exp(C + F)}{\sqrt{1 + 2(A + D) + 4(1 - \rho^2)AD}} \exp\left\{\frac{GE^2 + 2\rho BE + HB^2}{2[1 + 2(A + D) + 4(1 - \rho^2)AD]}\right\}. \tag{23}$$

Eq. (23) provides us with a representation of $\bar{Z}_T(X)$ in the presence of bivariate-normal across-trial variability in drift rates that is suitable for use in computations. When used in (7) it gives the joint distribution of decision times and decision outcomes for a circular diffusion model with correlated drift-rate components.

3.2. Uncorrelated components

When ρ = 0 the components of drift rate become uncorrelated so we would expect that $\bar{Z}_T(X)$ in (23) would reduce to a product of terms, $\bar{Z}_T(X^i)$, of the form (12), or equivalently, (13). Written in terms of the original variables, the denominator term, $2[1 + 2(A + D) + 4(1 - \rho^2)AD]$ in (23) is equal to

$$2\left\{1 + (T/\sigma^2)(\eta_1^2 + \eta_2^2) + (1 - \rho^2)(T^2/\sigma^4)\,\eta_1^2\eta_2^2\right\},$$

or

$$2\left\{[1 + T(\eta_1/\sigma)^2][1 + T(\eta_2/\sigma)^2] - \rho^2 T^2(\eta_1/\sigma)^2(\eta_2/\sigma)^2\right\},$$

which reduces when ρ = 0 to the simple product

$$2[1 + T(\eta_1/\sigma)^2][1 + T(\eta_2/\sigma)^2], \tag{24}$$

in agreement with the denominator in (13). When ρ = 0 the exponent in (23) becomes $(GE^2 + HB^2)/\{2[1 + 2(A + D) + 4AD]\}$, which, written in terms of the original variables, and in the light of (24), becomes

$$\exp\left\{\frac{[1 + T(\eta_1/\sigma)^2][X_T^2 - T\nu_2]^2(\eta_2/\sigma^2)^2 + [1 + T(\eta_2/\sigma)^2][X_T^1 - T\nu_1]^2(\eta_1/\sigma^2)^2}{2[1 + T(\eta_1/\sigma)^2][1 + T(\eta_2/\sigma)^2]}\right\},$$

which can be rearranged to yield

$$\exp\left\{\frac{[X_T^1 - T\nu_1]^2(\eta_1/\sigma^2)^2}{2[1 + T(\eta_1/\sigma)^2]} + \frac{[X_T^2 - T\nu_2]^2(\eta_2/\sigma^2)^2}{2[1 + T(\eta_2/\sigma)^2]}\right\},$$
while the exp(C + F) term in (23) is

$$\exp(C + F) = \exp(a_1\nu_1 - b\nu_1^2 + a_2\nu_2 - b\nu_2^2) = \exp\left\{\frac{\nu_1 X_T^1 + \nu_2 X_T^2}{\sigma^2} - \frac{(\nu_1^2 + \nu_2^2)T}{2\sigma^2}\right\} \triangleq Z_T(X; \nu).$$

Putting these elements together, with the leading factor $1/\sqrt{1 + 2(A + D) + 4AD}$ in (23) factored in the same way as (24), gives

$$\bar{Z}_T(X) = \prod_{i=1}^{2} \frac{Z_T(X^i; \nu_i)}{\sqrt{1 + T(\eta_i/\sigma)^2}} \exp\left\{\frac{[X_T^i - T\nu_i]^2(\eta_i/\sigma^2)^2}{2[1 + T(\eta_i/\sigma)^2]}\right\}, \tag{25}$$

in agreement with the results of Smith and Corbett (2019) and Tuerlinckx (2004) in (13).

3.3. The rotationally invariant model

The representation in (23) is a fairly general one, inasmuch as it prescribes no relationship between ϕ, the phase angle of the mean drift rate vector, and ρ, the correlation between the components of drift rate variability. Geometrically, ρ determines the orientation of the ellipses of equal likelihood of $N(\mu_1, \mu_2)$. An important special case theoretically is the one shown in Fig. 3, which I call the rotationally invariant model. In this model, the major axes of the ellipses of equal likelihood align with the mean drift rate vector, $\nu_\phi$. When ϕ = 0, $\nu_\phi = (\nu_1, 0) = (\|\nu\|, 0)$, and the mean drift vector points along the positive $X^1$ axis. We say that such a mean drift vector is in canonical orientation. Under these circumstances, the standard deviations, $\eta_1$ and $\eta_2$, of $N(\mu_1, \mu_2)$ describe the radial and tangential components of drift rate variability, respectively. The radial component is the component of drift rate variability normal to the decision boundary and the tangential component is the component along the boundary. Concretely, the radial component characterizes variability in the quality of the encoded stimulus information and the tangential component characterizes variability in its identity. In the rotationally invariant model, the radial and tangential components of drift rate variability are independent of ϕ. Fig. 3 shows this invariance for a distribution of drift rates with ϕ = π/2. This model is a theoretically important one because in many situations we might expect that the radial and tangential components of drift rate variability would be independent of the stimulus identity, which is characterized by ϕ. When the mean drift rate is in canonical orientation, ρ = 0 and the radial and tangential components of drift rate variability are independent. When the components are independent, the predictions can be derived using Smith and Corbett’s (2019) results, which were reproduced in Section 3.2.

Fig. 3. Rotationally invariant model. The major axes of the ellipses of equal likelihood of $N(\mu_1, \mu_2)$ align with the mean drift rate vectors, $\nu_\phi$. The standard deviations $\eta_1$ and $\eta_2$ then describe the radial and tangential components of drift rate variability.

We can characterize the properties of the rotationally invariant model by linearly transforming the distribution of drift rates so that its major axis remains aligned with $\nu_\phi$. In preparation for Section 4.1, we consider a more general form of the model than the one in Fig. 3, in which we allow ρ, the correlation between the components of drift rate for a stimulus in canonical orientation, to be nonzero. When ρ = 0, this model reduces to the model of Fig. 3. As the variance and covariance properties of the distribution of drift rates are independent of its mean, we proceed as if the latter were zero. Let $\mu = (\mu_1, \mu_2)$ be the components of drift rate in the standard basis, $(X^1, X^2)$, and let $\mu' = (\mu_1', \mu_2')$ be the coordinates of the new distribution after rotating its major axis by ϕ. The new and old coordinates are related by a rotation matrix,

$$\mu' = R_\phi\, \mu,$$

where

$$R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix},$$

and where $\phi = \arctan(\nu_2/\nu_1)$, from which we obtain

$$\begin{aligned}
\mu_1' &= \cos\phi\,\mu_1 - \sin\phi\,\mu_2\\
\mu_2' &= \sin\phi\,\mu_1 + \cos\phi\,\mu_2.
\end{aligned} \tag{26}$$

Taking expectations yields an expression for the covariance of the components of drift rate in the new coordinates,

$$\begin{aligned}
\mathrm{cov}(\mu_1', \mu_2') &= E[(\cos\phi\,\mu_1 - \sin\phi\,\mu_2)(\sin\phi\,\mu_1 + \cos\phi\,\mu_2)]\\
&= \cos\phi\sin\phi\, E[\mu_1^2] - \sin\phi\cos\phi\, E[\mu_2^2] + (\cos^2\phi - \sin^2\phi)\, E[\mu_1\mu_2]\\
&= \sin\phi\cos\phi\,(\eta_1^2 - \eta_2^2) + (\cos^2\phi - \sin^2\phi)\,\rho\eta_1\eta_2\\
&= \tfrac{1}{2}\sin 2\phi\,(\eta_1 - \eta_2)(\eta_1 + \eta_2) + \cos 2\phi\,\rho\eta_1\eta_2,
\end{aligned} \tag{27}$$

where we have made use of the fact that $\rho = E[\mu_1\mu_2]/\sqrt{\eta_1^2\eta_2^2}$ in the old coordinates. A similar calculation yields expressions for the variances

$$\begin{aligned}
\mathrm{var}(\mu_1') &= \cos^2\phi\,\eta_1^2 + \sin^2\phi\,\eta_2^2 - \sin 2\phi\,\rho\eta_1\eta_2\\
\mathrm{var}(\mu_2') &= \sin^2\phi\,\eta_1^2 + \cos^2\phi\,\eta_2^2 + \sin 2\phi\,\rho\eta_1\eta_2,
\end{aligned} \tag{28}$$

from which we obtain Eq. (29) after some algebra. (See Box I.)

Fig. 4 shows predicted distributions of decision outcomes, $\tilde{P}_{\text{rot. inv.}}(\theta; \nu_\phi, \eta_\phi)$, and mean RTs for the rotationally invariant model, where the components of $\eta_\phi = (\eta_{\text{rad.}}, \eta_{\text{tan.}})$ are the radial and tangential components of drift rate variability. The three sets of functions are for phase angles of ϕ = 0, π/4, π/2, and, as expected, they are identical except for a shift in phase angle. In practical terms, the calculations in (27) through (29) are not needed to obtain predictions for the rotationally invariant model, because they can always be obtained from those of the independent components model in canonical orientation by making use of the relationship $\tilde{P}_{\text{rot. inv.}}(\theta; \nu_\phi, \eta_\phi) = \tilde{P}_{\text{ind. comp.}}(\theta - \phi; \nu_{\text{can.}}, \eta_\phi)$, with $\nu_{\text{can.}} = (\|\nu_\phi\|, 0)$, in an obvious notation. Nevertheless, they are of theoretical interest because they make explicit the mathematical relationship between the two models. Smith (2016, Appendix C) showed that the mean decision time $E_\mu[T_a]$ for the circular diffusion model with no across-trial
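$$\begin{aligned}
\rho' &= \frac{\mathrm{cov}(\mu_1', \mu_2')}{\sqrt{\mathrm{var}(\mu_1')\,\mathrm{var}(\mu_2')}}\\
&= \frac{\tfrac{1}{2}\left[\sin 2\phi\,(\eta_1 - \eta_2)(\eta_1 + \eta_2) + 2\cos 2\phi\,\rho\eta_1\eta_2\right]}{\sqrt{(\cos^2\phi\,\eta_1^2 + \sin^2\phi\,\eta_2^2 - \sin 2\phi\,\rho\eta_1\eta_2)(\sin^2\phi\,\eta_1^2 + \cos^2\phi\,\eta_2^2 + \sin 2\phi\,\rho\eta_1\eta_2)}}\\
&= \frac{\sin 2\phi\,(\eta_1 - \eta_2)(\eta_1 + \eta_2) + 2\cos 2\phi\,\rho\eta_1\eta_2}{\sqrt{\sin^2 2\phi\,(\eta_1 - \eta_2)^2(\eta_1 + \eta_2)^2 + 2\sin 4\phi\,(\eta_1 - \eta_2)(\eta_1 + \eta_2)\rho\eta_1\eta_2 + 4\eta_1^2\eta_2^2(1 - \sin^2 2\phi\,\rho^2)}},
\end{aligned} \tag{29}$$

Box I.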
Fig. 4. Distributions of decision outcomes and mean decision times for the rotationally invariant model. The solid (dark gray), dashed, and solid-dotted (light gray) lines are predictions for phase angles of ϕ = 0, π/4, and π/2 radians, respectively. The other generating parameters for the model were ∥ν∥ = 1.5, ηrad. = 1.0, ηtan. = 0.5, σ = 1.0, a = 1.2.
variability and drift rate µ is

$$E_\mu[T_a] = \frac{a\, I_1(a\|\mu\|/\sigma^2)}{\|\mu\|\, I_0(a\|\mu\|/\sigma^2)}, \tag{30}$$

where $I_1(\cdot)$ and $I_0(\cdot)$ are modified Bessel functions of the first kind of order one and order zero, respectively (Abramowitz & Stegun, 1965, p. 375). As noted previously, the mean decision time is constant for all values of hitting point, θ. The slow-error pattern in Fig. 4 comes about from the mixing of decision times across drift rates: specifically, from the fact that decision times are longer and distributions of decision outcomes are more dispersed when drift rates are lower, as implied by (3). Low drift rate trials contribute more probability mass to the tails of the distribution of decision outcomes than do high drift rate trials and as a result, decision times in the tails are longer. For any such mixture model, in which the probability density of drift rate µ is p(µ) and the distribution of decision outcomes is $\tilde{P}(\theta; \mu)$, the marginal mean decision time, conditional on decision outcome, is

$$\bar{E}[T_a](\theta) = \frac{\int E_\mu[T_a]\, \tilde{P}(\theta; \mu)\, p(\mu)\, d\mu}{\int \tilde{P}(\theta; \mu)\, p(\mu)\, d\mu}. \tag{31}$$
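The mixture in (31) can be approximated numerically. The sketch below is one way to do so under stated assumptions: the distribution of decision outcomes for a fixed drift rate is taken to be von Mises with precision a‖µ‖/σ², which is what the angular factor of (9) implies after normalization; p(µ) is bivariate normal; and the integrals are replaced by sums over a regular grid of drift rates. The function names, grid size, and parameter defaults are illustrative only.

```python
import math

def bessel_i(n, x, steps=48):
    """I_n(x) from (1/pi) * int_0^pi exp(x*cos(tau)) * cos(n*tau) dtau (midpoint rule)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.exp(x * math.cos(tau)) * math.cos(n * tau)
    return total * h / math.pi

def mean_dt(mu_norm, a=1.2, sigma=1.0):
    """Eq. (30); near zero drift the limiting value a^2 / (2 sigma^2) is returned."""
    kappa = a * mu_norm / sigma ** 2
    if kappa < 1e-6:
        return a ** 2 / (2.0 * sigma ** 2)
    return a * bessel_i(1, kappa) / (mu_norm * bessel_i(0, kappa))

def mixture_mean_dt(theta, nu=(1.5, 0.0), eta=(1.0, 0.5), rho=0.0,
                    a=1.2, sigma=1.0, n=41, width=5.0):
    """Grid approximation to Eq. (31) for a bivariate normal p(mu)."""
    num = den = 0.0
    for i in range(n):
        x = width * (2.0 * i / (n - 1) - 1.0)          # standardized drift, component 1
        m1 = nu[0] + eta[0] * x
        for j in range(n):
            y = width * (2.0 * j / (n - 1) - 1.0)      # standardized drift, component 2
            m2 = nu[1] + eta[1] * y
            # Unnormalized bivariate normal weight; the constant cancels in the ratio.
            p = math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho ** 2)))
            norm = math.hypot(m1, m2)
            kappa = a * norm / sigma ** 2
            # Assumed von Mises density of the hitting angle for this drift rate.
            dens = math.exp(a * (m1 * math.cos(theta) + m2 * math.sin(theta)) / sigma ** 2)
            dens /= 2.0 * math.pi * bessel_i(0, kappa)
            num += mean_dt(norm, a, sigma) * dens * p
            den += dens * p
    return num / den

if __name__ == "__main__":
    for theta in (0.0, math.pi / 2, math.pi):
        print(theta, mixture_mean_dt(theta))
```

With the default values, which follow the generating parameters listed for Fig. 4 in canonical orientation, the conditional mean decision time at the mean drift direction comes out shorter than at the opposite side of the circle, which is the slow-error pattern described in the text.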
The mean decision times in Fig. 4 reflect the mixing properties of (31). As the right-hand panel of the figure shows, the addition of across-trial variability in drift rates causes the mean decision time as a function of phase angle to vary by almost 150 ms. One potential application of the rotationally invariant model is to the study of visual working memory. In a recent study using a two-alternative forced-choice task with fine orientation discrimination judgments, Lilburn et al. (2019) showed that the effects of orientation tuning and memory strength were separable: Changing the number of items in memory changed the strength of
the memory representation, as measured by signal detection d′ , but did not change the orientation tuning. They analyzed their data using a signal detection decision model and a tuned-channel stimulus coding model but, as they discussed, their results could also be represented using a circular diffusion model in which the response circle is partitioned into four categories using decision bounds (Smith, 2016; Smith & Corbett, 2019), two of which represent stimuli oriented clockwise to the vertical and two of which represent stimuli oriented anticlockwise.6 When orientation discrimination decisions are characterized in this way, the separability of memory strength and orientation tuning imply a model in which the radial component of drift rate is affected by the number of items in memory while the tangential component remains invariant. The rotationally invariant model provides a natural way to represent the effects of across-trial variability in stimulus encoding in a model of this kind. See the Discussion of Lilburn et al.’s article for details. A related application, also in the area of visual working memory, was reported by Fitousi (2018). Fitousi studied the binding of color and shape features in memory using a dual-presentation paradigm, in which an initial 500 ms preview of the stimulus was followed after 2000 ms by a variable-duration target stimulus. Participants made independent judgments of whether the color and the shape of the stimulus changed from the preview to the target. The resulting accuracy data were well described by a correlated-components GRT model, in which the sign of the correlation depended on whether features were repeated or alternated. When both features were repeated or both were alternated, the correlation was positive; when one was repeated and the other was alternated, the correlation was negative. 
The resulting ‘‘clover leaf’’ pattern of correlation (Fitousi, 2018, Figure 4) resembles that predicted by the rotationally invariant model (Fig. 4). Expressed in the language of that model, the clover leaf pattern found by Fitousi is one in which drift vectors oriented at π/4 and 5π/4 are associated with positive correlations (both features repeated or both alternated, respectively) and drift vectors oriented at 3π/4 and 7π/4 are associated with negative correlations (one feature repeated and one feature alternated). This relationship between the location of the stimulus in decision space and the correlations it induces is nicely captured by the rotationally invariant model.
6 Lilburn et al.’s experiment used Gabor patch stimuli, which are invariant under a rotation of π radians, like the stimuli in Fig. 1b. Four rather than two categories are needed to model decisions about such stimuli using the circular diffusion model because the implied orientation changes at 0, π/2, π, and 3π/2 radians. An alternative approach to modeling decisions about rotationally invariant stimuli was investigated by Kvam (2019), who used a continuous outcome task in which the θ ∈ [0, π] orientation space was mapped to a [0, 2π] response circle via the mapping θ ↦ 2θ. The task yielded regular data that were well fit by an elaborated version of the circular diffusion model.
Fig. 5. The effects of correlated and uncorrelated variability in the components of drift rate. The left-hand and right-hand panels show, respectively, distributions of decision outcomes and mean decision times for a stimulus in canonical orientation, (ν1 = 1.5, ν2 = 0). The dashed lines are for no across-trial variability (η1 = 0, η2 = 0). The solid lines are predictions for unequal, uncorrelated variability, (η1 = 1.0, η2 = 0.5, ρ = 0). The circles and crosses are predictions for positively and negatively correlated components, (η1 = 1.0, η2 = 0.5, ρ = 0.85) and (η1 = 1.0, η2 = 0.5, ρ = −0.85), respectively. The other model parameters were σ = 1.0 and a = 1.2.
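For parameter values like those in the caption above, the marginalized likelihood term in (23) can be evaluated in closed form and compared with a brute-force numerical evaluation of the double integral in (14). The sketch below is an illustration under assumptions: the function names are hypothetical, the substitutions A through H follow the definitions given with (18) through (20), and the numerical integral uses a plain grid sum in the standardized variables x and y.

```python
import math

def zbar_23(x1, x2, T, nu, eta, rho, sigma=1.0):
    """Closed form (23) for hitting point (x1, x2) and hitting time T."""
    a1, a2 = x1 / sigma ** 2, x2 / sigma ** 2
    b = T / (2.0 * sigma ** 2)
    kappa = 1.0 - rho ** 2
    A, B, C = b * eta[0] ** 2, (a1 - 2 * b * nu[0]) * eta[0], a1 * nu[0] - b * nu[0] ** 2
    D, E, F = b * eta[1] ** 2, (a2 - 2 * b * nu[1]) * eta[1], a2 * nu[1] - b * nu[1] ** 2
    G, H = 1.0 + 2.0 * kappa * A, 1.0 + 2.0 * kappa * D
    denom = 1.0 + 2.0 * (A + D) + 4.0 * kappa * A * D
    expo = (G * E ** 2 + 2.0 * rho * B * E + H * B ** 2) / (2.0 * denom)
    return math.exp(C + F) / math.sqrt(denom) * math.exp(expo)

def zbar_numeric(x1, x2, T, nu, eta, rho, sigma=1.0, n=241, width=8.0):
    """Grid-sum approximation to the double integral (14), using (8) and (15)."""
    kappa = 1.0 - rho ** 2
    h = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -width + i * h
        m1 = nu[0] + eta[0] * x
        for j in range(n):
            y = -width + j * h
            m2 = nu[1] + eta[1] * y
            log_n = -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * kappa)
            log_z = (m1 * x1 + m2 * x2) / sigma ** 2 - (m1 * m1 + m2 * m2) * T / (2.0 * sigma ** 2)
            total += math.exp(log_n + log_z)
    # d(mu1) d(mu2) = eta1 * eta2 * dx dy cancels the eta terms in the density.
    return total * h * h / (2.0 * math.pi * math.sqrt(kappa))

if __name__ == "__main__":
    a, T, nu, eta = 1.2, 0.4, (1.5, 0.0), (1.0, 0.5)
    for rho in (0.0, 0.85, -0.85):
        for theta in (-0.5, 0.5):
            x1, x2 = a * math.cos(theta), a * math.sin(theta)
            print(rho, theta, zbar_23(x1, x2, T, nu, eta, rho),
                  zbar_numeric(x1, x2, T, nu, eta, rho))
```

In this sketch the values at hitting angles of +0.5 and −0.5 radians coincide when ρ = 0 and differ when ρ ≠ 0, which is the left-right asymmetry that the text identifies as the signature of correlated drift rate components.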
4. Further properties of the model
Fig. 6. Circular diffusion model with categorical decision bounds. The stimulus consists of two features, A and B, with levels a, ā, and b, b̄, respectively, and the decision task is to identify which of the four possible stimulus configurations was presented. The response circle is partitioned into four regions by decision bounds aligned with the cardinal axes. The decision maker responds AB, AB̄, ĀB, or ĀB̄ depending on the quadrant of the circle in which the process hits the decision boundary. For the task in Fig. 1c, the features A and B would correspond to the gratings on the left and right and the levels would correspond to their orientations.
4.1. General effects of correlation in drift rate components

The effects of correlation in the components of drift rate variability that are not aligned with the mean drift vector are shown in Fig. 5. The left-hand panel shows distributions of decision outcomes and the right-hand panel shows mean decision times. The figure shows predictions for four models. The first model is for the “pure” circular diffusion model with no across-trial variability. This model has the highest precision (smallest standard deviation) decision outcomes and predicts constant mean decision times (the flat dashed line in the right-hand panel). The joint distributions of decision outcomes and decision times predicted by the model are scaled copies of one another (Smith, 2016, Figure 5). The second model is for uncorrelated drift rate components in which the standard deviation of the radial component was twice the tangential component. Introduction of across-trial variability reduces the precision of the distribution of decision outcomes and induces a slow-error pattern like the one in Fig. 5. The remaining two sets of predictions are for models with positive (ρ = 0.85) and negative (ρ = −0.85) correlations between the components. The effect of correlation is to rotate the ellipses of equal likelihood in Fig. 3 anticlockwise or clockwise relative to the phase angle of the mean drift vector. Rotation has the effect of biasing both the distribution of decision outcomes and decision times relative to those of the uncorrelated model. A positive correlation shifts the peak of the distribution of outcomes slightly to the right (anticlockwise) and more substantially increases the distribution shoulder. A negative correlation does the converse. Although the mean drift vector has a phase angle of ϕ = 0, correlation biases the population of drift rates in either a clockwise or anticlockwise direction. The effect of such biasing is most apparent in the distributions of decision times.
When the components are uncorrelated, the minimum mean decision time occurs at ϕ = 0. The minimum is shifted to the left or right depending on the sign of the correlation. A positive correlation increases the proportion of drift rates with positive phase angles and a negative correlation does the converse. This increase slightly shifts the peak of the distribution of decision outcomes and has a more pronounced effect on the distribution of mean decision times, whose minimum, for the particular set of parameters used to generate the figure, is shifted by around π/9 radians
(20°). However, the main effect of correlation is to introduce asymmetry into the distributions of decision outcomes and mean decision times. When there is uncorrelated across-trial variability, the distributions of decision outcomes and mean decision times are symmetrical around their vertical midline. Increasing the correlation increases the degree of left–right asymmetry. In general, asymmetry, in both decision outcomes and decision times, can be viewed as a qualitative signature of correlation in the components of drift rate.

4.2. Categorical decision bounds

Smith (2016) described how the circular diffusion model can be linked to GRT in an explicit way, by partitioning the response circle with decision bounds, as shown in Fig. 6. The model assumes that the decision maker responds with the category name of the segment of the response circle that contains the hitting point. The response probabilities and decision times for this model can be obtained straightforwardly, by integrating over the points on the decision boundary within each response category. The discussion in Section 1 of the task in Fig. 1c attempted to show why this kind of model might be theoretically interesting. As a concrete example, Smith and Corbett (2019) used a 4D generalization of the model with categorical decision bounds to represent decisions about four-element displays, in which the decision process had 16 possible end states. Fig. 7 shows predictions for the model with categorical decision bounds in which the components were negatively correlated, uncorrelated, or positively correlated. The predictions are for an unbiased decision process in which the decision bounds are aligned with the cardinal axes. The phase angle of the mean drift vector, ν, was ϕ = π/4 radians, which is centered on the first quadrant. The predictions show the same qualitative properties as those of the continuous model in Fig. 5.
Correlation has a larger effect on the distribution of mean decision times than on the distribution of decision outcomes. As in the continuous model, correlation introduces asymmetry into the distribution of decision outcomes. Positive correlation shifts the peak of the distribution of decision outcomes slightly to the right (anticlockwise) while negative correlation shifts it slightly to the
P.L. Smith / Journal of Mathematical Psychology 91 (2019) 145–158
155
left (clockwise). Correlation has the opposite effect at large phase angles: The shoulder of the distribution at 3π/4 radians, which corresponds to an angular error of π/2, is higher for negative correlation than for positive correlation; the shoulder at −π/4 radians, which corresponds to an angular error of −π/2, shows the opposite relationship. At an angular error of π radians, there is no effect of correlation on decision probability. As in Fig. 5, the effect of correlation on mean decision time is more pronounced than it is on accuracy. The distributions of decision times have a fixed point at ϕ = 0. Mean decision times for negatively correlated stimuli are longer than those for positively correlated stimuli at positive angular differences and shorter at negative angular differences. Again, these orderings reflect the effects of different mixtures of drift rates. The asymmetry in the distributions of decision outcomes is a reflection of the fact that correlation alters the relative proportions of probability mass that exceed a given angular separation in the positive and negative directions. The combination of increased probability mass and smaller drift rate norms on one side of the distribution than the other produces the asymmetry in mean decision times.

The model of Fig. 6 assumes that the decision bounds that partition the response circle are aligned with the cardinal axes of the space, but there is no necessity for this to be the case. As an example both of how this assumption can be relaxed and of why one might wish to do so, Smith and Corbett (2019) considered a 4D model with categorical bounds in which the placement of the decision bounds was viewed as an expression of response bias. This kind of bias can be given a likelihood ratio interpretation by virtue of the function Z_T(X) in (10) and its interpretation as a likelihood ratio process. Consider, for example, a version of the task in Fig. 6, in which the drift rate vectors for the four kinds of stimuli have phase angles of −3π/4, −π/4, π/4, and 3π/4 radians and all have the same norms. Associated with each stimulus will be a function Z_T(X) which characterizes the likelihood of the process hitting the boundary at the point θ as a function of the drift rate for that stimulus. By the symmetry of the construction, these four functions will intersect on the cardinal axes, at ±π, −π/2, 0, and π/2 radians, and observers who align their decision bounds with the axes are unbiased in a likelihood ratio sense because the points of intersection are points of equal likelihood. Conversely, observers who place their decision bounds at different points, so that some of the decision regions are larger than others, are biased, again in a likelihood ratio sense. Smith and Corbett found evidence for this kind of bias in a task requiring detection of single targets in four-item arrays but not in one requiring detection of pairs of targets.

When there is across-trial variability in drift rates, especially when the components of drift rate are correlated, the likelihood ratio interpretation becomes less straightforward because all of the associated likelihoods are then compound rather than simple likelihoods. However, the broad qualitative interpretation of boundary placement as an expression of response bias remains a useful one. The predictions in Fig. 7 will be affected, to some degree at least, by boundary placement because the boundaries determine the bounds of integration in the computation of decision probabilities and decision times. I have not made any attempt here to characterize how the predictions change with changes in boundary placement in a systematic way.

Fig. 7. Predictions for a model with categorical decision bounds. The mean drift vector was centered in the first quadrant, ν = (1.5, 1.5), ϕ = π/4. The category boundaries were set at ±π, −π/2, 0, and π/2. The predictions are for models with ρ = 0 (inverted triangles), ρ = 0.85 (circles), ρ = −0.85 (crosses). The correlations are specified relative to the canonical orientation (ϕ = 0). The other model parameters were η1 = 1.0, η2 = 1.0, σ = 1.0, a = 1.2.
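The likelihood ratio reading of boundary placement can be illustrated with a minimal numerical sketch. It assumes that Z_T(X) has the standard Girsanov exponential form exp[(μ·X_T)/σ² − ‖μ‖²T/(2σ²)]; equation (10) is not reproduced in this section, so that form is an assumption here. With equal drift norms the second term is common to all four stimuli, and only the term in cos(θ − ϕ) distinguishes them at a boundary point θ. The function names and the parameter values (drift norm 2.0, a = 1.2, σ = 1.0, and the shifted bound at π/8) are hypothetical and chosen only for illustration.

```python
import numpy as np

def boundary_log_likelihood(theta, phase, norm=2.0, a=1.2, sigma=1.0):
    """Stimulus-dependent part of log Z_T at the boundary point a*(cos theta, sin theta).

    Assumed form: log Z_T = (mu . X_T) / sigma**2 - |mu|**2 * T / (2 * sigma**2).
    The second term is identical for stimuli with equal drift norms, so it is dropped.
    """
    return (a * norm / sigma**2) * np.cos(theta - phase)

# Phase angles of the four stimuli and the four cardinal bounds (from the text).
phases = np.array([-3 * np.pi / 4, -np.pi / 4, np.pi / 4, 3 * np.pi / 4])
cardinal_bounds = np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi])

def top_two_gap(theta):
    """Log likelihood ratio between the two most likely stimuli at angle theta."""
    log_lik = np.sort(boundary_log_likelihood(theta, phases))
    return log_lik[-1] - log_lik[-2]

gaps_on_axes = np.array([top_two_gap(t) for t in cardinal_bounds])
gap_shifted = top_two_gap(np.pi / 8)  # hypothetical bound rotated away from the axis

print("log likelihood ratios on the cardinal axes:", np.round(gaps_on_axes, 12))
print("log likelihood ratio at a shifted bound:", round(float(gap_shifted), 3))
```

Under these assumptions the log likelihood ratio between the two nearest stimuli vanishes on the cardinal axes, which is the sense in which axis-aligned bounds are unbiased, and it is nonzero at the shifted bound. The sketch covers only the case without across-trial drift variability, in which the likelihoods are simple rather than compound.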
4.3. Ashby’s stochastic GRT model

The model of Fig. 6 can usefully be compared with the stochastic GRT model of Ashby (2000), which also links a 2D diffusion model of evidence accumulation to a model of categorical responding. Unlike the model of Fig. 6, Ashby’s model applies to tasks in which a two-choice categorization response is required. Also unlike the model of Fig. 6, in which the only correlation between the features of the stimulus is in the drift rates, Ashby’s model permits correlation between the components of the evidence process (X_t^1, X_t^2). Mathematically, correlation is represented by nonzero off-diagonal elements in the dispersion matrix, σ, in (1). The model assumes that the evidence state in the decision process is equal to the value of a linear discriminant function computed on the components of the process (X_t^1, X_t^2). Geometrically, the discriminant function can be thought of as the projection of (X_t^1, X_t^2) onto a line normal to a decision bound that partitions the stimulus space into response categories. Computing the discriminant function has the effect of reducing the dimensionality of the process from two dimensions to one: Both the drift rate and the diffusion coefficient of the resulting 1D process are single numbers, which are weighted sums of the corresponding coefficients of the 2D process, with weights given by the discriminant function. The diffusion coefficient, in particular, is a weighted sum of the squares of the diagonal and off-diagonal elements of the dispersion matrix.
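The dimensionality reduction can be sketched numerically. The example below is not Ashby's (2000) implementation; it relies only on the standard result that a linear functional w·X_t of a 2D Wiener process with drift μ and dispersion matrix σ is a 1D Wiener process with drift w·μ and diffusion coefficient wᵀσσᵀw. The function name, the discriminant weights, and the numerical values are hypothetical.

```python
import numpy as np

def project_to_discriminant(drift, dispersion, weights):
    """Reduce a 2D diffusion to the 1D process defined by a linear discriminant.

    drift: length-2 drift vector; dispersion: 2x2 dispersion matrix;
    weights: length-2 discriminant weights (all names are illustrative).
    Returns the 1D drift rate and the 1D diffusion coefficient (variance per unit time).
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(drift, dtype=float)
    s = np.asarray(dispersion, dtype=float)
    covariance = s @ s.T                 # infinitesimal covariance of the 2D process
    drift_1d = w @ mu                    # weighted sum of the drift components
    diffusion_1d = w @ covariance @ w    # quadratic form in the dispersion elements
    return drift_1d, diffusion_1d

weights = np.array([1.0, -1.0]) / np.sqrt(2.0)   # hypothetical unit-length discriminant
drift = np.array([1.0, 0.2])

# Two different 2D noise structures: independent components and correlated components.
independent = np.eye(2)
correlated = np.linalg.cholesky(np.array([[1.5, 0.5], [0.5, 1.5]]))

reduced_independent = project_to_discriminant(drift, independent, weights)
reduced_correlated = project_to_discriminant(drift, correlated, weights)
print(reduced_independent, reduced_correlated)
```

With these particular values the two dispersion matrices yield the same 1D drift rate and the same 1D diffusion coefficient, which illustrates how the covariance terms can be absorbed into a single number once the discriminant function is computed.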
Because of the reduction in dimensionality that occurs when computing the discriminant function, the additional generality of the nonzero covariance terms in the dispersion matrix is absorbed in the diffusion coefficient of the 1D process and consequently does little or no predictive work in the resulting model.[7] Because the decision process in Ashby’s (2000) model is a 1D Wiener process, the predicted decision time distributions for the model are just the first-passage time distributions of a two-barrier Wiener process (Ratcliff, 1978; Smith, 1990, 2016). Consequently, like any 1D Wiener model, if the starting point of the process is equidistant between the decision criteria, Ashby’s model predicts equal decision times for correct responses and errors in the absence of across-trial variability in drift rates and decision criteria. In applications of his model to categorization tasks, in which people are asked to categorize members of a set of stimuli located at different points in the stimulus space, across-trial variability in drift rate is supplied by differences in the positions of the stimuli relative to the decision bound. Stimuli that are closer to the bound generate smaller values of the discriminant function, which translate into smaller drift rates, longer RTs, and higher probability of errors (Ashby, 2000, Figure 5). These properties are the 1D counterparts of those of the circular diffusion model with categorical decision bounds.

[7] One place where assumptions about the covariance structure of the dispersion matrix may have testable consequences is in comparing RTs for stimuli located at different distances from the decision bound. In Ashby’s (2000) model, this distance affects both the drift rate and the diffusion coefficient of the 1D discriminant function process. The diffusion coefficient sets the “clock” of the process, which determines how rapidly it diffuses towards the boundaries. In practice, quite small differences in the diffusion coefficient can lead to fairly large changes in RT distributions (Donkin, Brown, & Heathcote, 2009; Smith, Ratcliff, & Sewell, 2014). I am not aware of any published studies that have investigated these predictions of Ashby’s model at a distributional level, which is where stimulus-specific differences in the diffusion coefficient would be most apparent. Ashby’s model also includes an additional 1D source of “criterial” noise that can potentially be traded off against the 2D diffusion noise in the GRT model, which would act to limit the effects of differences in the diffusion coefficient for stimuli located at different distances from the decision bound. But to the extent that the model predictions are dominated by 1D criterial noise rather than 2D GRT noise, the latter should then be viewed as doing relatively little predictive work, as commented in the text.

In sum, then, the circular diffusion model with decision bounds and Ashby’s (2000) stochastic GRT model represent two different ways of obtaining tractable analytic RT models from GRT, and possess similar properties. The main difference between the two models is in the kinds of tasks to which they apply. Ashby’s model applies most naturally to categorization tasks in which people are asked to make binary classification judgments about two-feature stimuli. The circular diffusion model with decision bounds applies most naturally to tasks in which people are asked to make four-way classification judgments about such stimuli and, in the generalized form considered by Smith and Corbett (2019), to higher dimensional analogues of such tasks. Both models have clear and distinct areas of application, although arguably the four-way classification task is closer in spirit to the kinds of multi-way “recognition” judgments that GRT was originally intended to characterize.

5. Discussion

The results I have presented in this article provide an analytic characterization of the link between the circular diffusion model and GRT.
The assumption I have used to develop this link was that the bivariate-normally distributed stimulus components of a GRT model can be interpreted as components of a vector-valued drift rate in the circular diffusion model. Trial-to-trial variability in the perceptual representation of stimuli in GRT then has a natural expression as across-trial variability in the drift rate of the diffusion process. Analytic predictions for the resulting model can be derived in a (conceptually) straightforward way by marginalizing the exponential martingale in the Girsanov theorem across the distribution of drift rates. The resulting expression, while somewhat complex, nevertheless provides an explicit analytic characterization of the joint distribution of decision outcomes and decision times that is fairly efficient to compute. The predictions of decision outcomes and mean decision times presented here were computed directly from the joint distributions obtained in this way. The resulting model is, of course, not completely general. It assumes, in particular, that the effects of correlation between stimulus components are confined to the across-trial distribution of drift rates and that the two components of the evidence accumulation process are independent. Whether this is actually so is an open question that can only be answered by investigating the model’s ability to account for data. Because of the analytic form of the model’s likelihood equations, this is an eminently feasible undertaking, especially when compared to more complex models that do not have analytic likelihoods and which can only be investigated using Monte Carlo or other numerical methods. The uncorrelatedness of the two components of the evidence accumulation process and the strong symmetry of the associated decision process in Fig. 1 are what allow analytic predictions for the model to be derived via the Girsanov theorem and the Bessel
process. These fairly strong assumptions render what would otherwise be a complex model surprisingly tractable. While it would be conceptually straightforward to relax these assumptions, doing so would break the strong symmetry properties on which the analytic tractability of the model depends. Naturally, other, more general, forms of the model could be investigated using different methods, like Monte Carlo simulation or finite-state Markov chain approximation (Diederich & Busemeyer, 2003). However, there are strong theoretical and practical reasons for favoring models with analytic prediction equations whenever possible. One reason is that they have the potential to provide important theoretical insights, like the analytic decomposition of precision of (3), which simulation methods and purely numerical methods usually do not. Another reason is that they provide an efficient tool for fitting, which makes the task of evaluating a large set of candidate model alternatives more feasible than it would be otherwise. Apart from these pragmatic considerations, in many empirical settings the idea that the correlation between stimulus components reflects variability across trials is a natural one to make cognitively. Such a representation follows from the assumption that variability in stimulus encoding represents trial-to-trial variability in attention, as I remarked in Section 1.2. This kind of model expresses the idea that attention varies across trials but is stable or comparatively stable within trials. Of course, other assumptions are possible. Smith and colleagues have investigated the idea that the spatial distribution of attention can vary continuously during a trial (Sewell & Smith, 2012; Smith & Ratcliff, 2009) and Diederich and colleagues have investigated the complementary idea that attention may be rapidly and serially allocated to the individual features of an item during a trial (Diederich, 2016; Diederich & Trueblood, 2018). 
In any setting, the appropriateness of a given set of assumptions will depend on the phenomena under investigation and the cognitive processes that are of theoretical interest. The utility of the refinements proposed by Smith and Diederich notwithstanding, there are strong heuristic grounds for starting with the simplest set of assumptions possible and then investigating how many relevant phenomena they can explain. This is particularly so in cases where the other aspects of the model are fairly complex, as is the case of the model presented here.

6. Conclusion

In this article, I have derived an analytic expression for the joint distribution of decision outcomes and decision times for a circular diffusion model in which the drift rates are bivariate-normally distributed with correlated components. The resulting model can be viewed as a diffusion model counterpart of GRT, which characterizes the perceptual representation of a two-feature stimulus in a similar way. The predicted effects of drift rate in the circular diffusion model are completely specified by the exponential martingale in the Girsanov change-of-measure theorem, which characterizes the joint distributions of a diffusion process with a nonzero drift rate as a function of a process with zero drift rate. The effects of across-trial variability in drift rate can therefore be characterized by marginalizing the exponential martingale across the bivariate-normal distribution of drift rates. The resulting expressions, although somewhat complex algebraically, are nevertheless efficient to compute and are well suited to fitting data. I discussed the qualitative properties of the model, including the important special case in which the ellipses of equal likelihood remain aligned with the mean drift vector as the phase angle of the drift rate changes.
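The marginalization step can be checked numerically with a short sketch. This is not the published code; it assumes the exponential martingale has the form exp[(μ·x)/σ² − ‖μ‖²t/(2σ²)] and that μ is bivariate normal with mean ν and covariance matrix Σ, in which case the expectation over μ is a Gaussian integral with a closed form. The function names and parameter values are hypothetical, and the sketch covers only the drift-dependent factor, not the zero-drift density that the full joint distribution also requires.

```python
import numpy as np

def marginal_girsanov_factor(x, t, nu, cov, sigma=1.0):
    """Closed-form E[exp((mu . x)/sigma^2 - |mu|^2 t/(2 sigma^2))] for mu ~ N(nu, cov)."""
    x, nu, cov = (np.asarray(v, dtype=float) for v in (x, nu, cov))
    b = x / sigma**2                      # linear coefficient of mu
    c = t / sigma**2                      # quadratic coefficient of mu
    precision = np.linalg.inv(cov)
    a_mat = precision + c * np.eye(2)     # precision after completing the square
    h = precision @ nu + b
    exponent = 0.5 * h @ np.linalg.solve(a_mat, h) - 0.5 * nu @ precision @ nu
    return np.exp(exponent) / np.sqrt(np.linalg.det(np.eye(2) + c * cov))

def monte_carlo_factor(x, t, nu, cov, sigma=1.0, n=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, used as a cross-check."""
    rng = np.random.default_rng(seed)
    mu = rng.multivariate_normal(nu, cov, size=n)
    log_z = mu @ np.asarray(x, dtype=float) / sigma**2 \
        - 0.5 * t * np.sum(mu**2, axis=1) / sigma**2
    return np.exp(log_z).mean()

# Illustrative values: correlated drift components and a point on a circle of radius 1.2.
nu = np.array([1.5, 1.5])
cov = np.array([[1.0, 0.85], [0.85, 1.0]])
theta = np.pi / 4 + 0.5
x = 1.2 * np.array([np.cos(theta), np.sin(theta)])
t = 0.6

closed_form = marginal_girsanov_factor(x, t, nu, cov)
simulated = monte_carlo_factor(x, t, nu, cov)
print(closed_form, simulated)
```

The two values should agree to within Monte Carlo error. Evaluating the closed form requires only a 2 × 2 linear solve and a determinant, which is consistent with the remark that the marginalized expressions are efficient to compute.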
Across-trial variability in drift rates reduces the precision of the distribution of decision outcomes and results in inaccurate responses being slower than accurate responses. When the major axes of the ellipses of equal
likelihood are not aligned with the mean drift vector, correlation in the components of drift rate introduces left–right asymmetries into the distributions of decision outcomes and decision times. Such asymmetry can be viewed as a qualitative signature of correlation in the drift rates. Overall, I have shown that the GRT assumptions about perceptual representations of stimuli carry over to the circular diffusion model in a conceptually natural and analytically tractable way. The resulting model is likely to be useful as a tool for characterizing the choice probabilities and response times in both continuous report and categorical decision tasks.

Acknowledgments

The research in this article was supported by Australian Research Council Discovery Grant DP180101686. Code for the model can be downloaded from https://github.com/philipls/GRTDiffusion. I thank Simon Lilburn for helpful comments on an earlier draft of the article.

References

Abramowitz, M., & Stegun, I. (1965). Handbook of mathematical functions. New York, NY: Dover.
Adam, K. C. S., Vogel, E. K., & Awh, E. (2017). Clear evidence for item limits in visual working memory. Cognitive Psychology, 97, 79–97.
Ashby, F. G. (1989). Stochastic general recognition theory. In D. Vickers, & P. L. Smith (Eds.), Human information processing: measures, mechanisms, and models (pp. 435–457). Amsterdam: Elsevier.
Ashby, F. G. (2000). A stochastic version of general recognition theory. Journal of Mathematical Psychology, 44, 310–329.
Ashby, F. G., & Lee, W. W. (1991). Predicting similarity and categorization from identification. Journal of Experimental Psychology: General, 120, 150–172.
Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154–179.
Bays, P. M. (2014). Noise in neural populations accounts for errors in working memory. Journal of Neuroscience, 34, 3632–3645.
Bays, P. M., Catalao, R. F. G., & Husain, M. (2009). The precision of visual working memory is set by allocation of a shared resource. Journal of Vision, 9(10), Art. 7.
van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working memory models. Psychological Review, 121, 124–149.
Bonnel, A. M., & Prinzmetal, W. (1998). Dividing attention between the color and the shape of objects. Perception & Psychophysics, 60, 113–124.
Borodin, A. N., & Salminen, P. (1996). Handbook of Brownian motion - Facts and formulae. Basel: Birkhäuser.
Busemeyer, J., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459.
Corbett, E. A., & Smith, P. L. (2017). The magical number one-on-square-root-two: The double target deficit in brief visual displays. Journal of Experimental Psychology: Human Perception and Performance, 43, 1376–1396.
Diederich, A. (2016). A multistage attention-switching model account for payoff effects on perceptual decision tasks with manipulated processing order. Decision, 3, 81–114.
Diederich, A., & Busemeyer, J. (2003). Simple matrix methods for analyzing diffusion models of choice probability, choice response time and simple response time. Journal of Mathematical Psychology, 47, 304–322.
Diederich, A., & Trueblood, J. S. (2018). A dynamic dual process model of risky decision making. Psychological Review, 125, 270–292.
Donkin, C., Brown, S. D., & Heathcote, A. (2009). The overconstraint of response time models: Rethinking the scaling problem. Psychonomic Bulletin & Review, 16, 1129–1135.
Eckstein, M. P., Thomas, J. P., Palmer, J., & Shimozaki, S. (2000). A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays. Perception & Psychophysics, 62, 425–451.
Fisher, N. I. (1993). Statistical analysis of circular data. Cambridge, U.K.: Cambridge University Press.
Fitousi, D. (2018). Feature binding in visual short term memory: A general recognition theory analysis. Psychonomic Bulletin & Review, 25, 1104–1113.
Gorgoraptis, N., Catalao, R. F. G., Bays, P. M., & Husain, M. (2011). Dynamic updating of working memory objects. Journal of Neuroscience, 31, 8502–8511.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York, NY: Wiley.
Hamana, Y., & Matsumoto, H. (2013). The probability distributions of the first hitting times of Bessel processes. Transactions of the American Mathematical Society, 365, 5237–5257.
Kadlec, H., & Townsend, J. T. (1992). Implications of marginal and conditional detection parameters for the separabilities and independence of perceptual dimensions. Journal of Mathematical Psychology, 36, 325–374.
Karatzas, I., & Shreve, S. E. (1991). Brownian motion and stochastic calculus. New York, NY: Springer.
Kool, W., Conway, A. R. A., & Turk-Browne, N. B. (2014). Sequential dynamics in visual short-term memory. Attention, Perception, & Psychophysics, 76, 1885–1901.
Kvam, P. D. (2019). Modeling accuracy, response time, and bias in continuous outcome orientation judgments. Journal of Experimental Psychology: Human Perception and Performance, 45, 301–318.
Lilburn, S. D., Smith, P. L., & Sewell, D. K. (2019). The separable effects of feature precision and item load in visual short-term memory. Journal of Vision, 19(1), Art. 2.
Link, S. W. (1975). The relative judgment theory of two choice response time. Journal of Mathematical Psychology, 12, 114–135.
Link, S. W., & Heath, R. A. (1975). A sequential theory of psychological discrimination. Psychometrika, 40, 77–105.
Luce, R. D. (1986). Response times: their role in inferring elementary mental organization. New York, NY: Oxford University Press.
Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17, 347–356.
Maddox, W. T., & Dodd, J. L. (2003). Separating perceptual and decisional attention processes in the identification and categorization of integral-dimension stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 467–480.
Mardia, K. V., & Jupp, P. E. (1999). Directional statistics. New York, NY: Wiley.
Marshall, L., & Bays, P. M. (2013). Obligatory encoding of task-irrelevant features depletes working memory resources. Journal of Vision, 13(2), Art. 21.
Oberauer, K., & Lin, H.-Y. (2017). An interference model of visual working memory. Psychological Review, 124, 21–59.
Palmer, J., Ames, C. T., & Lindsey, D. T. (1993). Measuring the effect of attention on simple visual search. Journal of Experimental Psychology: Human Perception and Performance, 19, 108–130.
Prinzmetal, W., Amiri, H., Allen, K., & Edwards, T. (1998). Phenomenology of attention. 1. Color, location, orientation and spatial frequency. Journal of Experimental Psychology: Human Perception and Performance, 24, 261–282.
Rademaker, R. L., Tredway, C. H., & Tong, F. (2012). Introspective judgments predict the precision and likelihood of successful maintenance of visual working memory. Journal of Vision, 12(13), Art. 21.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Ratcliff, R. (1985). Theoretical implications of speed and accuracy of positive and negative responses. Psychological Review, 92, 212–225.
Ratcliff, R. (2018). Decision making on spatially continuous scales. Psychological Review, 125, 851–887.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.
Ratcliff, R., & Smith, P. L. (2004). A comparison of sequential-sampling models for two choice reaction time. Psychological Review, 111, 333–367.
Ratcliff, R., Smith, P. L., & McKoon, G. (2015). Modeling response time and accuracy data. Current Directions in Psychological Science, 24, 458–470.
Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. Psychological Review, 106, 261–300.
Rogers, L. C. G., & Williams, D. (1987/2000). Diffusions, Markov processes and martingales. In Itô calculus: vol. 2, Chichester, U.K.: Wiley, reprinted 2000 Cambridge University Press, Cambridge, U.K.
Schutz, B. (1980). Geometrical methods of mathematical physics. Cambridge, UK: Cambridge University Press.
Sewell, D. K., & Smith, P. L. (2012). Attentional control in visual signal detection: Effects of abrupt-onset and no-onset stimuli. Journal of Experimental Psychology: Human Perception and Performance, 38, 1043–1068.
Shaw, M. I. (1982). Attending to multiple sources of information: I. The integration of information in decision making. Cognitive Psychology, 14, 353–409.
Silbert, N. H., & Thomas, R. D. (2013). Decisional separability, model identification, and statistical inference in the general recognition theory framework. Psychonomic Bulletin & Review, 20, 1–20.
Silbert, N. H., & Thomas, R. D. (2014). Optimal response selection and decisional separability in general recognition theory. Journal of Mathematical Psychology, 60, 72–81.
Silbert, N. H., & Thomas, R. D. (2017). Identifiability and testability in GRT with individual differences. Journal of Mathematical Psychology, 77, 187–196.
Smith, P. L. (1990). A note on the distribution of response times for a random walk with Gaussian increments. Journal of Mathematical Psychology, 34, 445–459.
Smith, P. L. (1998). Attention and luminance detection: A quantitative analysis. Journal of Experimental Psychology: Human Perception and Performance, 24, 105–133.
Smith, P. L. (2016). Diffusion theory of decision making in continuous report. Psychological Review, 123, 425–451.
Smith, P. L., & Corbett, E. A. (2019). Speeded multielement decision-making as diffusion in a hypersphere: Theory and application to double-target detection. Psychonomic Bulletin & Review, 26, 127–162.
Smith, P. L., & Ratcliff, R. (2009). An integrated theory of attention and decision making in visual signal detection. Psychological Review, 116, 283–317.
Smith, P. L., Ratcliff, R., & Sewell, D. K. (2014). Modeling perceptual discrimination in dynamic noise: Time-changed diffusion and release from inhibition. Journal of Mathematical Psychology, 59, 95–113.
Soto, F. A., & Ashby, F. G. (2015). Categorization training increases the perceptual separability of novel dimensions. Cognition, 139, 105–129.
Soto, F. A., Vucovich, L., Musgrave, R., & Ashby, F. G. (2015). General recognition theory with individual differences: A new method for examining perceptual and decisional interactions with an application to face perception. Psychonomic Bulletin & Review, 22, 88–111.
Thomas, R. D. (1995). Gaussian general recognition theory and perceptual independence. Psychological Review, 102, 192–200.
Thomas, R. D. (2001). Perceptual interactions of facial dimensions in speeded classification and identification. Perception & Psychophysics, 63, 625–650.
Thomas, R. D. (2003). Further considerations of a general d′ in multidimensional space. Journal of Mathematical Psychology, 47, 220–224.
Thomas, R., Altieri, N., Silbert, N., Wenger, M., & Wessels, P. (2015). Multidimensional signal detection decision models of the uncertainty task: Application to face perception. Journal of Mathematical Psychology, 66, 16–33.
Townsend, J. T., Houpt, J. W., & Silbert, N. H. (2012). General recognition theory extended to include response times: Predictions for a class of parallel systems. Journal of Mathematical Psychology, 56, 476–494.
Tuerlinckx, F. (2004). The efficient computation of the cumulative distribution and probability density functions in the diffusion model. Behavior Research Methods, Instruments, & Computers, 36, 702–716.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550–592.
Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4(12), Art. 11.
Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology (revised ed.). London, UK: Methuen.
Wyszecki, G., & Stiles, W. S. (1982). Color science: concepts and methods, quantitative data and formulae (2nd ed.). New York, NY: Wiley.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235.