Journal of Mathematical Psychology 45, 603–634 (2001). doi:10.1006/jmps.2000.1340, available online at http://www.idealibrary.com
Extending General Processing Tree Models to Analyze Reaction Time Experiments

Xiangen Hu
The University of Memphis

This research was supported by Grant BNS-8910552 from the National Science Foundation to William H. Batchelder (University of California, Irvine) and David M. Riefer (California State University, San Bernardino). The original idea of this paper evolved from a personal communication between the author and John Kounios. The author thanks Dr. William H. Batchelder, who provided instructional comments to the author about this project. The author also thanks Drs. Barbara Dosher, William Dwyer, John Kounios, David LaBerge, R. Duncan Luce, William Marks, Louis Narens, David M. Riefer, and William Shadish and several anonymous reviewers for their helpful comments. Address correspondence and requests for reprints and computer programs to Dr. Xiangen Hu, Department of Psychology, The University of Memphis, Memphis, TN 38152. E-mail: xhu@memphis.edu.
General processing tree (GPT) models are usually used to analyze categorical data collected in psychological experiments. Such models assume functional relations between probabilities of the observed behavior categories and the unobservable choice probabilities involved in a cognitive task. This paper extends GPT models for categorical data to the analysis of continuous data in a class of response time (RT) experiments in cognitive psychology. Suppose that a cognitive task involves several discrete processing stages and both accuracy (categorical) and latency (continuous) measures are obtained for each of the response categories. Furthermore, suppose that the task can be modeled by a GPT model that assumes serialization among the stages. The observed latencies of the response categories are functions of the choice probabilities and processing times (PT) at each of the processing stages. The functional relations are determined by the processing structure of the task. A general framework is presented and it is applied to a set of data obtained from a source monitoring experiment. © 2001 Academic Press
INTRODUCTION
Dependent measures for a cognitive task can be classified into two categories: discrete measures that record the types of behavior observed (e.g., accuracy data) and continuous measures that record the degree or latency of the observed behavior (e.g., response time (RT) data). Cognitive psychologists have a long tradition of analyzing RT data to infer mechanisms of the unobservable mental processes. For certain paradigms, such as the simple detection paradigm and the discrimination paradigm (see Luce, 1986, for details), mathematical models have been developed to obtain RT distributions of the unobservable mental processes by certain types of decomposition of the observed RT distributions. However, the current approach turns to another tradition in analyzing accuracy and RT data. Imagine that one examines accuracy and RT data from a cognitive task that involves several temporally distinct mental processes with several different behavior responses. In such experiments, the traditional decomposition of the observed RT distribution into unobservable RTs for the mental processes may not be feasible, simply because of the complexity of the processes or the insufficiency of the RT observations. In such a situation, one may want to consider primarily the accuracy data and use RT data, at the level of the mean, as a supplemental dependent variable. The purpose of this paper is to provide a general framework for such experiments and to demonstrate how to analyze accuracy and RT data in a unified framework.

The current approach assumes the discrete stages model of Sternberg (1969) and uses GPT models (Hu & Batchelder, 1994) to model the cognitive processes. The basic idea is to use a GPT model as a model for the cognitive processes and then to extend the model, with its structural property, to analyze RT data. The advantage of the current approach is that the GPT model methodology has been studied (Hu & Batchelder, 1994), implemented (Hu, 1998a, 1998b), and applied in cognitive psychology (see Batchelder & Riefer, 1997, for a review) as a family of models of choice probability for categorical data, so that results can be applied directly to experiments. The emphasis of the current study is to show, at a general level, how such a methodology can be applied.

In the literature, there are approaches that are in the same spirit as the current study. For example, Link (1982) demonstrated that a simple GPT model can be used to analyze both accuracy and mean RT data in a lexical decision task. There are also theoretical approaches that use a general network model to study accuracy and RT data (Shiffrin & Thompson, 1988; Shiffrin, 1997). The current approach lies between a paradigm-specific approach (Link, 1982) and conceptual approaches (Shiffrin & Thompson, 1988; Shiffrin, 1997). It can be viewed as an extension of Link's (1982) approach, because more general results are obtained. On the other hand, the current study considers only GPT models, which are a special case of Shiffrin's (Shiffrin & Thompson, 1988; Shiffrin, 1997) general network, and the level of analysis is restricted to the mean of the RT observation. From this point of view, the current study can be viewed as a special case of Shiffrin's (Shiffrin & Thompson, 1988; Shiffrin, 1997) approach.

I will start the discussion by examining Link's example; then I will introduce the GPT model and extend the model to handle mean RT. After obtaining the results, I will apply them to a set of data collected in a source monitoring experiment. Finally, I will briefly discuss the relationships between the current approach and other approaches dealing with RT and accuracy data.

LINK'S MODEL OF LEXICAL DECISION
The example Link (1982) used is a hypothetical lexical decision experiment designed to allow investigation of the relation between the speed of memory search and the size of the effective lexicon. Suppose that we examine three groups of participants with different effective lexicon sizes: Korsakoff patients, nonnative English speakers, and native English speakers. Assume that both categorical data (accuracy measurements) and RT data are collected from the task (as shown in Table 1), in which participants are asked to decide whether a letter string is a word or a nonword.

TABLE 1
Response Time and Percent Correct for the Three Groups of Subjects

                    Category responses     RT responses
Group               P_C      P_E           M      M_C    M_E
(1) Korsakoff       0.60     0.40          540    540    540
(2) Nonnatives      0.75     0.25          570    560    600
(3) Readers         0.95     0.05          616    580    1,300

Note. P_C is the percent of correct responses; P_E the percent of incorrect responses; M the mean response time; M_C the mean correct response time; M_E the mean error response time.

Without any modeling approach, one might conclude from the data that RT is closely related to the probabilities of correct responses and that probabilities of correct responses are functions of the effective lexicon, i.e., RT is closely related to the effective lexicons of the participants. With a modeling approach, parameters θ_1 and θ_2 can be used for substantive psychological processes,

$$\begin{aligned}
\Pr[\text{recognized as word} \mid \text{given a word}] &= \theta_1 \\
\Pr[\text{not recognized as word} \mid \text{given a word}] &= 1-\theta_1 \\
\Pr[\text{respond ``word''} \mid \text{not recognized as word}] &= \theta_2 \\
\Pr[\text{not respond ``word''} \mid \text{not recognized as word}] &= 1-\theta_2.
\end{aligned}\tag{1}$$

Note that in (1) the parameters appear in the form θ, (1−θ), which is the typical form of link probabilities (a formal definition is given in Definition 1). (See Fig. 1 for a graphical representation.)

FIG. 1. A two-state representation of distinct psychological processes. Of major interest is State A, which results in perfect performance when a word is presented. Associated with each state is a unique response time (RT) distribution.

Given link probabilities, probabilities for each of the categories can be obtained easily as

$$\begin{aligned}
\Pr[\text{respond ``word''} \mid \text{given a word}] &= \theta_1+(1-\theta_1)\,\theta_2 \\
\Pr[\text{respond ``nonword''} \mid \text{given a word}] &= (1-\theta_1)(1-\theta_2).
\end{aligned}\tag{2}$$

Given any specific value for θ_2 (for example, θ_2 = 0.5), θ_1 can be estimated (see Table 2).

TABLE 2
Probability of Identifying the Letter String as a Word, i = 1, 2, 3
Estimated probabilities (θ_2 = 0.5)

Group             Probability of successful search (θ_1)
(1) Korsakoff     0.20
(2) Nonnatives    0.50
(3) Readers       0.90

Given the values of the parameters, let the random variable τ_1 be the PT that the participant consumes in memory search and the random variable τ_2 be the time for the participant to guess a letter string as a "word." Let the random vector T = (T_1, T_2), where T_1 and T_2 are the times for the participant to give "word" responses and "nonword" responses, respectively. Link (1982) uses the model

$$\begin{aligned}
E(T_1) &= \frac{(1-\theta_1)\,\theta_2}{\theta_1+(1-\theta_1)\,\theta_2}\bigl(E(\tau_1)+E(\tau_2)\bigr)+\frac{\theta_1}{\theta_1+(1-\theta_1)\,\theta_2}\,E(\tau_1) \\
E(T_2) &= E(\tau_1)+E(\tau_2)
\end{aligned}\tag{3}$$

to obtain the estimates for the expected PT. With this model, Link (1982) successfully disentangled the time spent on search and the time spent on guessing, and with the assumption of equal bias (θ_2 = 0.5) the model showed that equal amounts of time are spent in memory search. The RT differences are due to the different times spent on guessing (see Table 3).

TABLE 3
Estimated Times for Identifying the Letter String as a Word, i = 1, 2, 3
Estimated processing times

Group             Time spent on searching    Time spent on guessing
(1) Korsakoff     540.0                      0.0
(2) Nonnatives    540.0                      60.0
(3) Readers       540.0                      760.0

The special parameterization of the simple model allows one to estimate the parameters and to disentangle unobservable processes. Link suggested that this simple model could be used as a method for what is called "correcting response measures for guessing."
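To make the arithmetic concrete, the following sketch recovers Table 2 and Table 3 from the Table 1 data under Link's equal-bias assumption. It simply inverts Eq. (2) for θ_1 and Eq. (3) for the two mean PTs; the variable names are illustrative, not from the paper.

```python
# A minimal numerical check of Eqs. (2) and (3), using the Table 1 data.
groups = {
    "Korsakoff":  {"pc": 0.60, "mc": 540.0, "me": 540.0},
    "Nonnatives": {"pc": 0.75, "mc": 560.0, "me": 600.0},
    "Readers":    {"pc": 0.95, "mc": 580.0, "me": 1300.0},
}
theta2 = 0.5  # equal-bias assumption, as in Link (1982)

for name, d in groups.items():
    # Eq. (2): pc = theta1 + (1 - theta1) * theta2, solved for theta1.
    theta1 = (d["pc"] - theta2) / (1.0 - theta2)
    # Eq. (3): E(T1) = w*(E(tau1)+E(tau2)) + (1-w)*E(tau1), where w is the
    # conditional probability that a correct "word" response came from the
    # guessing branch; E(T2) = E(tau1) + E(tau2).
    w = (1.0 - theta1) * theta2 / d["pc"]
    tau1 = (d["mc"] - w * d["me"]) / (1.0 - w)   # mean search time
    tau2 = d["me"] - tau1                        # mean guessing time
    print(f"{name:10s} theta1={theta1:.2f} E(tau1)={tau1:6.1f} E(tau2)={tau2:6.1f}")
```

Running this reproduces the 540-ms search time for all three groups and the guessing times 0, 60, and 760 ms.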
GPT MODELS
The tree diagram used in Link's (1982) study (shown in Fig. 1) is a typical example of a family of models currently used in cognitive psychology. Such a graphical tree can be represented mathematically in terms of a binomial distribution with probabilities in the form of (2). In the model, cognitive processes (such as recognition and response bias) are modeled by nodes of the tree diagram and the corresponding probabilities are modeled by the parameters θ_1, θ_2. Furthermore, the parameters appear in the model Eq. (2) in the form of θ_1, θ_2 or (1−θ_1), (1−θ_2). Models with the above characteristics have been used in cognitive psychology research, especially in human memory studies (Riefer & Rouder, 1992; Riefer & Batchelder, 1991; Batchelder & Riefer, 1980; Batchelder, Hu, & Riefer, 1994; Batchelder & Riefer, 1986; Riefer, Hu, & Batchelder, 1994; Batchelder, Riefer, & Hu, 1994). Hu and Batchelder (1994) studied the entire family of such models and named the family general processing tree (GPT) models (see Hu, 1998b, for other resources about GPT models).

The approach in this paper is in the same spirit as Link (1982). However, I use GPT models for the categorical data, which are much more general than the model Link (1982) used. We first give a formal definition of GPT models.

Definition 1. Let M(Θ; ⟨c_ij⟩; ⟨a_ijkl⟩) be a parametric multinomial model defined over J observable categories C_j and with S + K parameters, called link probabilities, partitioned into K groups,

$$\Theta=(\Theta_1,\ldots,\Theta_K),\qquad \Theta_k=(\theta_{k1},\ldots,\theta_{k(J_k+1)}),\qquad \sum_{l=1}^{J_k+1}\theta_{kl}=1,\qquad \sum_{k=1}^{K}J_k=S.$$

Then M is a GPT model if there are positive integers I_j, nonnegative integers a_ijkl, and nonnegative reals c_ij so that the category probabilities p_j(Θ) can be written in the form

$$p_j(\Theta)=\Pr(C_j;\Theta)=\sum_{i=1}^{I_j}p_{ij}(\Theta),\tag{4}$$

where j = 1, ..., J and i = 1, ..., I_j with

$$\sum_{j=1}^{J}p_j(\Theta)=1,\tag{5}$$

where

$$p_{ij}(\Theta)=\Pr(B_{ij};\Theta)=c_{ij}\prod_{k=1}^{K}\left[\prod_{l=1}^{J_k+1}\theta_{kl}^{a_{ijkl}}\right],\tag{6}$$

which is called the branch probability.

The above definition of GPT models is a direct extension of the definition of Hu and Batchelder (1994). This definition covers situations where the branches are not binary. When all of the branches are binary, namely, J_k = 1, k = 1, ..., K, the definition is identical to the Hu and Batchelder (1994) definition. To understand the above definition, three examples are examined next. In the first example, we rewrite the category probabilities of Link's model to show that the model for lexical decision is a GPT model; in the second example, we provide a purely statistical example of dice tossing to show the meaning of the structure constants ⟨a_ijkl⟩ and ⟨c_ij⟩; in the third example, we present a model of source monitoring that will be used in a later section of the paper.

LINK'S MODEL AS A GPT MODEL
It is very easy to rewrite the category probabilities (2) of Link's model in the form of (4) and (6). Link's model is a parametric binomial (J = 2) with p_1(Θ) = p_11(Θ) + p_21(Θ), p_2(Θ) = p_12(Θ), where Θ is defined as

$$\Theta=(\Theta_1,\Theta_2);\qquad \Theta_1=(\theta_{11},\theta_{12}),\quad \theta_{11}+\theta_{12}=1;\qquad \Theta_2=(\theta_{21},\theta_{22}),\quad \theta_{21}+\theta_{22}=1;$$
$$\theta_{11}=\theta_1,\quad \theta_{12}=1-\theta_1;\qquad \theta_{21}=\theta_2,\quad \theta_{22}=1-\theta_2;$$

and branch probabilities are functions of the link probabilities,

$$\begin{aligned}
p_{11}(\Theta)&=\theta_{11}^{1}\theta_{12}^{0}\theta_{21}^{0}\theta_{22}^{0}, &\quad a_{1111}&=1, &\quad a_{1112}&=a_{1121}=a_{1122}=0;\\
p_{21}(\Theta)&=\theta_{11}^{0}\theta_{12}^{1}\theta_{21}^{1}\theta_{22}^{0}, &\quad a_{2111}&=0, &\quad a_{2112}&=a_{2121}=1,\quad a_{2122}=0;\\
p_{12}(\Theta)&=\theta_{11}^{0}\theta_{12}^{1}\theta_{21}^{0}\theta_{22}^{1}, &\quad a_{1211}&=0, &\quad a_{1212}&=1,\quad a_{1221}=0,\quad a_{1222}=1.
\end{aligned}$$
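The branch representation above can be evaluated mechanically: store the exponents a_ijkl and the constants c_ij, compute each branch probability by Eq. (6), and sum branch probabilities within a category by Eq. (4). The encoding below is a sketch for Link's model (the dictionary keys and variable names are mine, not from the paper).

```python
import numpy as np

# Link's model encoded as a GPT model.
# Link probabilities, ordered (theta_11, theta_12, theta_21, theta_22).
theta1, theta2 = 0.5, 0.5
links = np.array([theta1, 1 - theta1, theta2, 1 - theta2])

# One row of exponents a_ijkl per branch, columns ordered as above (Eq. (6)).
branches = {
    ("word", 1):    np.array([1, 0, 0, 0]),  # recognized -> "word"
    ("word", 2):    np.array([0, 1, 1, 0]),  # not recognized, guess "word"
    ("nonword", 1): np.array([0, 1, 0, 1]),  # not recognized, guess "nonword"
}
c = {key: 1.0 for key in branches}  # structure constants c_ij (all 1 here)

# Eq. (6): branch probability; Eq. (4): category probability.
p_branch = {key: c[key] * np.prod(links ** a) for key, a in branches.items()}
p_cat = {}
for (cat, _), p in p_branch.items():
    p_cat[cat] = p_cat.get(cat, 0.0) + p
print(p_cat)  # {'word': theta1 + (1-theta1)*theta2, 'nonword': (1-theta1)*(1-theta2)}
```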
A GPT MODEL FOR DICE TOSSING
To understand the definition of GPT models with nonbinary branches, consider the toss of two identical dice. The probability θ_l is associated with face l, l = 1, ..., 6. For each combination {l_1, l_2}, l_1, l_2 = 1, ..., 6,

$$\Pr(\{l_1,l_2\})=\begin{cases}\theta_l^2 & l_1=l_2=l\\ 2\,\theta_{l_1}\theta_{l_2} & l_1\neq l_2,\end{cases}\tag{7}$$

where

$$\sum_{l=1}^{6}\theta_l=1.$$

Note that the factor 2 in Eq. (7) corresponds to the structure constant c_ij in the definition. In most cases, a_ijkl takes on values of only 0 or 1. In other cases, it can be any nonnegative integer; in this example, because the two dice are identical, the exponent 2 arises when l_1 = l_2.

A GPT MODEL OF SOURCE MONITORING
In a source monitoring experiment (Johnson, Hashtroudi, & Lindsay, 1993), participants study items from two or more different sources and then are tested in a recognition memory task, in which they are required to classify each test item according to its source. For example, in the study phase of a typical two-source monitoring experiment, the experimenter might present some items to participants as pictures, while others are presented as words. In the testing phase, participants are presented with items from the studied list (picture items or word items) and with some items that were not on the studied list (new). Participants are required to classify each item as a picture, word, or new. In a source monitoring experiment, several cognitive processes are of interest to cognitive psychologists, such as item recognition (or detection), source discrimination, and response biases. In the literature, several source monitoring experiments have been conducted to study these processes (Batchelder et al., 1994a; Bayen, Murnane, & Erdfelder, 1996; Ferguson, Hashtroudi, & Johnson, 1992; Hashtroudi, Johnson, & Chrosniak, 1989; Johnson et al., 1993; Johnson, Foley, Suengas, & Raye, 1988; Lindsay, Johnson, & Kwon, 1991; Lindsay & Johnson, 1989; Riefer et al., 1994). Typical data from a two-source monitoring experiment can be displayed in a 3 × 3 table, as shown in Table 4.

TABLE 4
Data Representation for a Two-Source Monitoring Experiment

                           Responses
Sources       Source 1    Source 2    New     Marginal (fixed)
Source 1      Y_1         Y_2         Y_3     Y_1. = Y_1 + Y_2 + Y_3
Source 2      Y_4         Y_5         Y_6     Y_2. = Y_4 + Y_5 + Y_6
New           Y_7         Y_8         Y_9     Y_3. = Y_7 + Y_8 + Y_9

FIG. 2. Tree for source i items (i = 1, 2). R_i denotes the response "source i," N the response "new item."

FIG. 3. D_3 is the probability of detecting new items as new. With D_3 = 0, one gets the high-threshold model.

TABLE 5
Definition of the Parameters for the Source Monitoring Model

Stimulus discrimination
  θ_11 (D_1)   Probability of correct detection for an old Source 1 item.
  θ_21 (D_2)   Probability of correct detection for an old Source 2 item.
  θ_81 (D_3)   Probability of correct detection for a New item.
Source discrimination
  θ_31 (d_1)   Probability of discriminating the source of detected old Source 1 items.
  θ_41 (d_2)   Probability of discriminating the source of detected old Source 2 items.
Response bias
  θ_51 (b)     Probability of responding Source 1 or Source 2 to a New item.
  θ_61 (a)     Probability of guessing that a detected item belongs to Source 1.
  θ_71 (g)     Probability of guessing that an undetected item belongs to Source 1.

Note. To be consistent with the notation in this paper, parameters are denoted as θ_11, ..., θ_81. In Bayen et al. (1996), the corresponding parameters are D_1, D_2, d_1, d_2, b, a, g, D_3.

Batchelder and Riefer (1990) studied source monitoring experiments and proposed a multinomial model for such paradigms. Figures 2 and 3 show a variation of Batchelder and Riefer's (1990) model. In this model, cognitive processes are measured by eight specifically designated parameters (see Table 5). Under the assumption that the discrimination process starts only after detection is completed, a processing tree model is obtained (see Fig. 2). The equations for the observed categories can therefore be obtained from the tree representation, which is in the same functional form as specified by (4) and (6):

$$p_j(\Theta)=p_{1j}(\Theta)+p_{2j}(\Theta)+p_{3j}(\Theta)$$
$$\begin{aligned}
p_1(\Theta)&=\theta_{11}\theta_{31}+\theta_{11}\theta_{32}\theta_{61}+\theta_{12}\theta_{51}\theta_{71}\\
p_2(\Theta)&=\theta_{11}\theta_{32}\theta_{62}+\theta_{12}\theta_{51}\theta_{72}\\
p_3(\Theta)&=\theta_{12}\theta_{52}\\
p_4(\Theta)&=\theta_{21}\theta_{42}\theta_{61}+\theta_{22}\theta_{51}\theta_{71}\\
p_5(\Theta)&=\theta_{21}\theta_{41}+\theta_{21}\theta_{42}\theta_{62}+\theta_{22}\theta_{51}\theta_{72}\\
p_6(\Theta)&=\theta_{22}\theta_{52}\\
p_7(\Theta)&=\theta_{51}\theta_{71}\theta_{82}\\
p_8(\Theta)&=\theta_{51}\theta_{72}\theta_{82}\\
p_9(\Theta)&=\theta_{81}+\theta_{52}\theta_{82}
\end{aligned}\tag{8}$$
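Because each row of Table 4 comes from its own processing tree, the nine probabilities in (8) should sum to one within each item type (Source 1, Source 2, New). A quick sanity check of that property, with arbitrary illustrative parameter values (a sketch; the variable names are mine):

```python
import numpy as np

# A numerical sanity check of Eq. (8).
rng = np.random.default_rng(0)
th = {}
for k in range(1, 9):                    # eight parameter groups
    v = rng.uniform(0.05, 0.95)          # arbitrary link probability theta_k1
    th[(k, 1)], th[(k, 2)] = v, 1 - v    # theta_k1 + theta_k2 = 1

p = [
    th[1,1]*th[3,1] + th[1,1]*th[3,2]*th[6,1] + th[1,2]*th[5,1]*th[7,1],  # p1
    th[1,1]*th[3,2]*th[6,2] + th[1,2]*th[5,1]*th[7,2],                    # p2
    th[1,2]*th[5,2],                                                      # p3
    th[2,1]*th[4,2]*th[6,1] + th[2,2]*th[5,1]*th[7,1],                    # p4
    th[2,1]*th[4,1] + th[2,1]*th[4,2]*th[6,2] + th[2,2]*th[5,1]*th[7,2],  # p5
    th[2,2]*th[5,2],                                                      # p6
    th[5,1]*th[7,1]*th[8,2],                                              # p7
    th[5,1]*th[7,2]*th[8,2],                                              # p8
    th[8,1] + th[5,2]*th[8,2],                                            # p9
]
# Each item type (each row of Table 4) has its own tree, so each triple sums to 1.
print(sum(p[0:3]), sum(p[3:6]), sum(p[6:9]))  # -> 1.0 1.0 1.0
```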
EXTENDING GPT MODELS TO HANDLE RT EXPERIMENTS
We have seen how Link's (1982) example using the lexical decision task, when both accuracy and latency data are obtained, can be considered in terms of a GPT model. Let us consider a more general situation in a cognitive task where participants make different categorical responses and, at the same time, latency data are collected for each of the responses. Furthermore, assume that the mental processes involved in the task are serially arranged and that information transmission can be approximated in an all-or-none fashion. Under such conditions, a GPT model can be used to represent the mental processes, where the serialization of the mental processes is represented as a tree structure and information transmission from state to state is represented as the conditional probabilities (or choice probabilities) of the model, namely, the link parameters. In this section, I assume that a GPT model is used for such a task and that link probabilities are estimated from the accuracy data (Hu & Batchelder, 1994; Hu, 1998a, 1998b). Given the parameter estimates and the structure of the GPT model, we then apply it to the obtained mean RT for each of the observed categories. By using the estimated parameters and the same GPT model, a straightforward relationship between the observed mean RTs and the unobserved mean PTs for the participating mental processes can be obtained. For the discussion of this section, I first outline a few necessary assumptions.

Assumptions

The assumptions necessary for the current approach to work are those of the discrete stages model of Sternberg (1969). More specifically, I am considering cognitive tasks in which information is processed in finitely many stages in a serial fashion. At each stage, a positive amount of PT is consumed and the processed information is passed to one and only one of its several alternative states immediately afterward. The dependent measures for the given task are the response categories and the latency of the responses. Furthermore, I assume that such a discrete stages model can be modeled by an identifiable¹ GPT model. For later discussion, I formally assume that a cognitive task meets the following basic assumptions.

¹ Identifiability means unique parameter recoverability from category probabilities.

Assumption 1 (Discrete Stages). The task involves K discrete processing stages.

Assumption 2 (Serialization). In any stage, say stage k, information is received from the previous stage and the processed information is transmitted to one and only one of its (J_k + 1) immediate stages with probability θ_kl, l = 1, ..., (J_k + 1), k = 1, ..., K.

Assumption 3 (Identifiability). The task can be modeled by an identifiable GPT model (Definition 1), M(Θ; ⟨c_ij⟩; ⟨a_ijkl⟩), with each branch B_ij representing a possible serial processing path leading to its terminal category C_j.

Assumption 4 (Nonzero RT). For each link in the tree, there is an RT distribution with a positive mean.

Assumption 5 (Equal RT). When nodes are repeated in the tree, the RT distributions for corresponding links are identically distributed.

Assumption 6 (Additivity). The RT distribution for a branch B_ij is the sum of the (not necessarily independent) link RTs.

Assumption 7 (Independence). Choice probabilities are independent of PT.

There is no need to explain any of the assumptions except for the identifiability and independence assumptions. The identifiability assumption says that (1) the task can be modeled by a GPT model and (2) for any given set of data the MLE of the parameters is unique. With the first six assumptions above, there are
two sets of random variables: discrete random variables that correspond to the choice at each stage and continuous random variables associated with the PT of the choices made. The independence assumption assumes that the choice probabilities at each stage of the process are independent of PT. For example, Link (1982) assumed that the random variable for the PT is a function of the choice probability but the choice probability is independent of the PT. With this assumption he was able to estimate the choice probability first and then use the estimated choice probability to estimate the mean PT associated with each of the stages. This is a rather strong assumption. I will briefly discuss the situation when this assumption is violated. Next, we present the RT extension of GPT models based on the above seven assumptions.

RESULT

Based on the above seven assumptions, I can first estimate the choice probabilities based on the categorical responses. After the choice probabilities have been obtained, I then use the observed latency data to recover the PT. Because the estimation theory of the choice probabilities for GPT models has been systematically studied and implemented (Hu & Batchelder, 1994; Hu, 1998a, 1998b), in this paper I will always assume that the parameters of the GPT model have been uniquely obtained from the accuracy data. The major theorem of this paper is a direct extension of Link's approach.

Theorem 1. Assume that at any stage associated with parameter group Θ_k the absolute duration (PT) for passing information to the l-th immediate stage is a random variable τ_kl, l = 1, 2, ..., (J_k + 1). So, there are (J_k + 1) PTs (τ_kl, l = 1, 2, ..., J_k + 1) corresponding to the link parameter vector Θ_k, one for each of the (J_k + 1) immediate stages next to stage k, k = 1, ..., K. Then

$$\mathbf{T}=\Lambda\,\mathbf{t}',\tag{9}$$

where

1. T = (E(T_1), ..., E(T_J))′, t = (t_1, ..., t_K), t_k = (E(τ_k1), ..., E(τ_k(J_k+1))), and T_j is the random variable that represents the RT for observable category C_j, j = 1, ..., J.

2. Λ is a J × (Σ_{k=1}^{K}(J_k + 1)) matrix of functions of Θ = (Θ_1, ..., Θ_K).

Proof. Assume that T_ij is the random variable corresponding to the time spent on finishing branch B_ij. From the additivity assumption and (6),

$$E(T_{ij})=\sum_{k=1}^{K}\left[\sum_{l=1}^{J_k+1}a_{ijkl}E(\tau_{kl})\right].\tag{10}$$

Multiplying both sides by p_ij(Θ)/p_j(Θ), we have

$$\frac{p_{ij}(\Theta)}{p_j(\Theta)}E(T_{ij})=\frac{p_{ij}(\Theta)}{p_j(\Theta)}\sum_{k=1}^{K}\left[\sum_{l=1}^{J_k+1}a_{ijkl}E(\tau_{kl})\right].\tag{11}$$

From (4), E(T_j), the observed mean RT for category C_j, is a weighted average of the mean RTs, E(T_ij), over all the branches that go to the category, namely,

$$E(T_j)=\sum_{i=1}^{I_j}\frac{p_{ij}(\Theta)}{p_j(\Theta)}E(T_{ij}).\tag{12}$$

Denote

$$\mathbf{a}_{ijk}=(a_{ijk1},\ldots,a_{ijk(J_k+1)})_{1\times(J_k+1)}.\tag{13}$$

We can rewrite a portion of (11) as

$$\sum_{l=1}^{J_k+1}a_{ijkl}E(\tau_{kl})=\mathbf{a}_{ijk}\,\mathbf{t}_k'.$$

Then (11) becomes

$$\frac{p_{ij}(\Theta)}{p_j(\Theta)}E(T_{ij})=\frac{p_{ij}(\Theta)}{p_j(\Theta)}\sum_{k=1}^{K}(\mathbf{a}_{ijk}\,\mathbf{t}_k').\tag{14}$$

Furthermore, denote

$$\mathbf{a}_{ij}=(\mathbf{a}_{ij1},\ldots,\mathbf{a}_{ijK})_{1\times\sum_{k=1}^{K}(J_k+1)}.\tag{15}$$

From (12) and (14),

$$E(T_j)=\sum_{i=1}^{I_j}\left[\frac{p_{ij}(\Theta)}{p_j(\Theta)}(\mathbf{a}_{ij}\,\mathbf{t}')\right]=\left(\sum_{i=1}^{I_j}\left[\frac{p_{ij}(\Theta)}{p_j(\Theta)}\mathbf{a}_{ij}\right]\right)\mathbf{t}'.$$

Denote

$$\Lambda=\begin{pmatrix}\sum_{i=1}^{I_1}\dfrac{p_{i1}(\Theta)}{p_1(\Theta)}\mathbf{a}_{i1}\\ \vdots\\ \sum_{i=1}^{I_J}\dfrac{p_{iJ}(\Theta)}{p_J(\Theta)}\mathbf{a}_{iJ}\end{pmatrix}_{J\times\sum_{k=1}^{K}(J_k+1)};\tag{16}$$

then (9) is obtained. ∎
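Equation (16) says that row j of Λ is the probability-weighted average of the exponent vectors a_ij over the branches ending in category C_j. The following sketch (function and variable names are mine, not the paper's) assembles Λ this way; run on Link's model, it anticipates the matrix derived as Eq. (17) in the worked example below.

```python
import numpy as np

def build_design_matrix(branch_cat, branch_prob, A):
    """Assemble Lambda of Eq. (16).

    branch_cat[i]  : category index j of branch i
    branch_prob[i] : branch probability p_ij(Theta), from Eq. (6)
    A[i, :]        : exponent row a_ij of branch i, from Eq. (15)
    """
    branch_cat = np.asarray(branch_cat)
    branch_prob = np.asarray(branch_prob, dtype=float)
    J = branch_cat.max() + 1
    Lam = np.zeros((J, A.shape[1]))
    for j in range(J):
        rows = branch_cat == j
        p_j = branch_prob[rows].sum()        # Eq. (4)
        w = branch_prob[rows] / p_j          # weights p_ij / p_j
        Lam[j] = w @ A[rows]                 # weighted average of the a_ij
    return Lam

# Link's model: branches (word,1), (word,2), (nonword,1).
theta1, theta2 = 0.5, 0.5
A = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]])
probs = [theta1, (1 - theta1) * theta2, (1 - theta1) * (1 - theta2)]
print(build_design_matrix([0, 0, 1], probs, A))
```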
To understand the above theorem and how the methods work, let us apply the results of the theorem back to Link's example. The observed mean RTs and the mean PTs for each of the stages are

$$\mathbf{T}=(E(T_1),E(T_2))',\qquad \mathbf{t}_1=(E(\tau_{11}),E(\tau_{12})),\qquad \mathbf{t}_2=(E(\tau_{21}),E(\tau_{22})).$$

The a_ijk, defined in (13), can be obtained as

$$\begin{aligned}
\mathbf{a}_{111}&=(a_{1111},a_{1112})_{1\times 2}=(1,0) &\qquad \mathbf{a}_{112}&=(a_{1121},a_{1122})_{1\times 2}=(0,0)\\
\mathbf{a}_{211}&=(a_{2111},a_{2112})_{1\times 2}=(0,1) &\qquad \mathbf{a}_{212}&=(a_{2121},a_{2122})_{1\times 2}=(1,0)\\
\mathbf{a}_{121}&=(a_{1211},a_{1212})_{1\times 2}=(0,1) &\qquad \mathbf{a}_{122}&=(a_{1221},a_{1222})_{1\times 2}=(0,1),
\end{aligned}$$

where J = 2 and J_k = 1, k = 1, 2; the a_ij, defined in (15), are hence

$$\begin{aligned}
\mathbf{a}_{11}&=(\mathbf{a}_{111},\mathbf{a}_{112})=(a_{1111},a_{1112},a_{1121},a_{1122})_{1\times 4}=(1,0,0,0)\\
\mathbf{a}_{21}&=(\mathbf{a}_{211},\mathbf{a}_{212})=(a_{2111},a_{2112},a_{2121},a_{2122})_{1\times 4}=(0,1,1,0)\\
\mathbf{a}_{12}&=(\mathbf{a}_{121},\mathbf{a}_{122})=(a_{1211},a_{1212},a_{1221},a_{1222})_{1\times 4}=(0,1,0,1),
\end{aligned}$$

where i = 1, ..., I_j, J = 2. Using the obtained a_ij, the sums Σ_{i=1}^{I_j}[(p_ij(Θ)/p_j(Θ)) a_ij] are obtained,

$$\begin{aligned}
\sum_{i=1}^{I_1}\left[\frac{p_{i1}(\Theta)}{p_1(\Theta)}\mathbf{a}_{i1}\right]&=\frac{p_{11}(\Theta)}{p_1(\Theta)}(1,0,0,0)+\frac{p_{21}(\Theta)}{p_1(\Theta)}(0,1,1,0)\\
\sum_{i=1}^{I_2}\left[\frac{p_{i2}(\Theta)}{p_2(\Theta)}\mathbf{a}_{i2}\right]&=\frac{p_{12}(\Theta)}{p_2(\Theta)}(0,1,0,1).
\end{aligned}$$

Finally,

$$\Lambda=\begin{pmatrix}\dfrac{p_{11}(\Theta)}{p_1(\Theta)} & \dfrac{p_{21}(\Theta)}{p_1(\Theta)} & \dfrac{p_{21}(\Theta)}{p_1(\Theta)} & 0\\[2mm] 0 & \dfrac{p_{12}(\Theta)}{p_2(\Theta)} & 0 & \dfrac{p_{12}(\Theta)}{p_2(\Theta)}\end{pmatrix}\tag{17}$$

and

$$(E(T_1),E(T_2))'=\Lambda\,\mathbf{t}'
=\begin{pmatrix}\dfrac{p_{11}(\Theta)}{p_1(\Theta)} & \dfrac{p_{21}(\Theta)}{p_1(\Theta)} & \dfrac{p_{21}(\Theta)}{p_1(\Theta)} & 0\\[2mm] 0 & \dfrac{p_{12}(\Theta)}{p_2(\Theta)} & 0 & \dfrac{p_{12}(\Theta)}{p_2(\Theta)}\end{pmatrix}
\begin{pmatrix}E(\tau_{11})\\E(\tau_{12})\\E(\tau_{21})\\E(\tau_{22})\end{pmatrix}
=\begin{pmatrix}\dfrac{\theta_{12}\theta_{21}}{\theta_{12}\theta_{21}+\theta_{11}}\bigl(E(\tau_{12})+E(\tau_{21})\bigr)+\dfrac{\theta_{11}}{\theta_{12}\theta_{21}+\theta_{11}}E(\tau_{11})\\[2mm] E(\tau_{12})+E(\tau_{22})\end{pmatrix}.\tag{18}$$
Note that (18) contains four mean PTs while there are only two observed mean RTs, so the solution for (18) will not be unique. In general, Eq. (9) is a system of simultaneous equations relating an unobservable vector of mean PTs, t, and a vector of observed mean RTs, T, one for each of the categories. Because (9) specifies only the relationship between the observed mean latencies and the mean PTs of the unobservable processes, there are no distributional assumptions to provide statistical tests for the fit of the model. I will discuss the model from the mathematical point
of view, namely, the existence and uniqueness of the solutions. Given the existence of a unique solution, I then consider the interpretations of the parameters from the point of view of psychology. For example, in order to have a meaningful interpretation of the obtained t, which is a vector of the mean PTs of the participating processes, all elements of t need to be nonnegative. Such a consideration is useful and it may be used to question certain assumptions about cognitive processes. For instance, if under a given assumption the unique t is obtained with some negative elements, then it is reasonable to assume that the relationship between the observed mean RTs and the unobserved mean PTs is incorrectly specified; hence alternative assumptions about the processes are supported. I will use this as a guideline when I analyze the example in the next section. Next, I discuss the mathematical properties of the system of simultaneous equations specified in (9).

Uniqueness of t

The existence and uniqueness problem for (9) is standard linear algebra. Usually, researchers are more interested in the existence and uniqueness of solutions for the problem of minimizing the residual of (9), namely,

$$\|\mathbf{T}-\Lambda\,\mathbf{t}'\|.\tag{19}$$

Results in matrix algebra show that a necessary and sufficient condition for the existence of a unique t that minimizes the residual (19) of (9) is that Λ′Λ has full rank. Since Λ is a J × (Σ_{k=1}^{K}(J_k+1)) matrix, a necessary condition for the existence of such a t is

$$\sum_{k=1}^{K}(J_k+1)\le J.\tag{20}$$
In other words, the number of possible unknown mean PTs, namely Σ_{k=1}^{K}(J_k+1), should be less than or equal to the number of observed categories J. In some cases, Λ′Λ may not be a full rank matrix (for instance, when Λ has rank less than J) even if (20) holds. However, for any given Λ, numerical procedures are available to determine its rank. The ideal situation is to obtain a model such that (1) Λ′Λ is full rank, so the solution for the minimization is unique, and (2) min ‖T − Λt′‖ = 0, which means there is an exact solution for (9). Usually, Σ_{k=1}^{K}(J_k+1) > J, which means Λ′Λ cannot be full rank and unique solutions do not exist. Next, I provide two special ways of reducing the number of unknowns in (9) to meet the necessary condition for a unique solution of (19). I will then provide a general approach for this purpose.

Equating Mean PTs to Each Other

In a GPT model, the number of unknown mean PTs (namely, Σ_{k=1}^{K}(J_k+1)) is usually larger than the number of independent parameters (namely, Σ_{k=1}^{K}J_k). In most cases, the number of unknown mean PTs exceeds the number of observed categories (namely, J). For example, in the source monitoring example, the total number of unknown mean PTs is 16 (8 parameters, J_k = 1, Σ_{k=1}^{K}(J_k+1) = 16) and the number of mean RTs is just 9 (the 9 cells of the 3 × 3 data table). The maximum number of unknown PTs that can be uniquely obtained is 9, but the maximum number of choice parameters that can be uniquely estimated is 6 (the choice probability data have only 6 degrees of freedom). The most straightforward solution for such a problem is to assume that certain processes have the same PT random variable, and hence the same mean PT. For example, in Link's model there should be 4 PT random variables: 2 for memory searching (success or failure) and 2 for guessing (word or nonword). Link (1982) assumes that the time consumed by memory search is τ_1, regardless of whether the search is successful or not. By making such an assumption, only one mean PT is associated with searching. In the same way, the time variable for participants to guess is τ_2, regardless of guessing words or nonwords. With these assumptions, Link was able to obtain the mean PTs, E(τ_1) and E(τ_2), uniquely. Notice that Σ_{k=1}^{K}(J_k+1) is the largest number of unknown PTs in M(Θ; ⟨c_ij⟩; ⟨a_ijkl⟩), which is determined by the number of parameter groups (K) and the number of link probabilities within each group (J_k+1). From what Link did for the lexical decision model, I extend the same method to the general situation where a GPT model is used. In general, the restriction assumes that there is only one PT in parameter group k_0 (associated with the link probabilities Θ_{k_0} = (θ_{k_0 1}, ..., θ_{k_0 (J_{k_0}+1)}), 1 ≤ k_0 ≤ K), which can be written in the form

$$E(\tau_{k_0 1})=\cdots=E(\tau_{k_0 (J_{k_0}+1)})=E(\tau_{k_0}).\tag{21}$$
With such a restriction, the number of unknown mean PTs is reduced by J_{k_0}, and the new model

$$\mathbf{T}=\Lambda^{*}(\mathbf{t}^{*})'\tag{22}$$

can be obtained from the original model (9), where t* = (t_1, ..., t_K), with

$$\mathbf{t}_k=\begin{cases}(E(\tau_{k1}),\ldots,E(\tau_{k(J_k+1)})) & k\neq k_0\\ E(\tau_{k_0}) & k=k_0,\end{cases}$$

and

$$\Lambda^{*}=\begin{pmatrix}\sum_{i=1}^{I_1}\dfrac{p_{i1}(\Theta)}{p_1(\Theta)}\mathbf{a}^{*}_{i1}\\ \vdots\\ \sum_{i=1}^{I_J}\dfrac{p_{iJ}(\Theta)}{p_J(\Theta)}\mathbf{a}^{*}_{iJ}\end{pmatrix}_{J\times(\sum_{k=1}^{K}(J_k+1)-J_{k_0})},\qquad \mathbf{a}^{*}_{ij}=(\mathbf{a}^{*}_{ij1},\ldots,\mathbf{a}^{*}_{ijK}),$$

with

$$\mathbf{a}^{*}_{ijk}=\begin{cases}(a_{ijk1},\ldots,a_{ijk(J_k+1)})_{1\times(J_k+1)} & k\neq k_0\\[1mm] \displaystyle\sum_{l=1}^{J_{k_0}+1}a_{ijk_0l} & k=k_0.\end{cases}\tag{23}$$
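Computationally, restriction (21) simply collapses the (J_{k_0}+1) columns of Λ belonging to group k_0 into a single column equal to their sum, since Σ_l a_{ijk_0 l}E(τ_{k_0 l}) = (Σ_l a_{ijk_0 l})E(τ_{k_0}). A sketch (function and argument names are mine):

```python
import numpy as np

def equate_group(Lam, group_cols):
    """Apply restriction (21): one mean PT for a whole parameter group.

    Lam        : J x (sum of (J_k+1)) design matrix of Eq. (16)
    group_cols : column indices of Lam belonging to group k0
    Returns the reduced design matrix of Eq. (23).
    """
    group_cols = sorted(group_cols)
    keep = [c for c in range(Lam.shape[1]) if c not in group_cols]
    merged = Lam[:, group_cols].sum(axis=1, keepdims=True)
    # Place the merged column where the group's first column used to be.
    pos = sum(c < group_cols[0] for c in keep)
    return np.hstack([Lam[:, keep[:pos]], merged, Lam[:, keep[pos:]]])

# Link's example at theta1 = theta2 = 0.5: collapsing the search columns
# (tau_11, tau_12) and the guessing columns (tau_21, tau_22) of Eq. (17)
# yields the 2 x 2 system of Eq. (3).
Lam = np.array([[2/3, 1/3, 1/3, 0.0],
                [0.0, 1.0, 0.0, 1.0]])
Lam = equate_group(Lam, [0, 1])          # one search PT
Lam = equate_group(Lam, [1, 2])          # one guessing PT
print(Lam)   # [[1.0, 0.333...], [1.0, 1.0]]
```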
One can imagine that, by applying restrictions in the form of (21) repeatedly, eventually the number of unknown mean PTs will be less than or equal to the number of observed categories. It is important to point out that the original model (9) and the model after the restriction (22) have the same structure. Applying this method to Link's example by assuming E(τ_11) = E(τ_12) = E(τ_1) and E(τ_21) = E(τ_22) = E(τ_2), Eq. (18) is identical to (3); hence, given the observed RTs (Table 1), the PTs (Table 3) can be obtained.

Equating Mean PTs to Constants

The second solution is to assume²

$$E(\tau_{k_0 l_0})=C_{k_0 l_0},\tag{24}$$

for some 1 ≤ l_0 ≤ J_{k_0}+1, 1 ≤ k_0 ≤ K, where C_{k_0 l_0} is a constant. To understand (24) in the current approach, assume that one tries to examine the differential effect of instructions on the decision processes of the participants. Participants are required to make one of several key presses. It is reasonable to assume that the time associated with the motor response is constant, while the times for the decision processes differ under different instructions. In this situation, if one assumes that C (any constant) is the mean RT for the motor response, then C disappears when the differences among the times for the decision processes are examined. With restriction (24), the new model is

$$\mathbf{T}=\Lambda''(\mathbf{t}'')'+\mathbf{C}',\tag{25}$$

where C = (c_1, ..., c_J) is a 1 × J vector of constants,

$$c_j=\sum_{i=1}^{I_j}\frac{p_{ij}(\Theta)}{p_j(\Theta)}\,a_{ijk_0l_0}C_{k_0l_0},$$

and t″ = (t_1, ..., t_K), with

$$\mathbf{t}_k=\begin{cases}
(E(\tau_{k1}),\ldots,E(\tau_{k(J_k+1)})) & k\neq k_0\\
(E(\tau_{k_01}),\ldots,E(\tau_{k_0(l_0-1)}),E(\tau_{k_0(l_0+1)}),\ldots,E(\tau_{k_0(J_{k_0}+1)})) & k=k_0,\ 1<l_0<J_{k_0}+1\\
(E(\tau_{k_01}),\ldots,E(\tau_{k_0J_{k_0}})) & k=k_0,\ l_0=J_{k_0}+1\\
(E(\tau_{k_02}),\ldots,E(\tau_{k_0(J_{k_0}+1)})) & k=k_0,\ l_0=1.
\end{cases}$$

² This is often done in mathematical modeling. For example, in Link's model of lexical decision, there are two parameters, b for the response bias and p for the probability of a correct identification of a word. The fact that there are only two alternatives (word and nonword) allows one to have only one independent parameter in the model. Link simply assumed b to be the constant 0.5, so p could be obtained uniquely.
The corresponding design matrix is

$$\Lambda''=\begin{pmatrix}\sum_{i=1}^{I_1}\dfrac{p_{i1}(\Theta)}{p_1(\Theta)}\mathbf{a}''_{i1}\\ \vdots\\ \sum_{i=1}^{I_J}\dfrac{p_{iJ}(\Theta)}{p_J(\Theta)}\mathbf{a}''_{iJ}\end{pmatrix}_{J\times([\sum_{k=1}^{K}(J_k+1)]-1)},\qquad \mathbf{a}''_{ij}=(\mathbf{a}''_{ij1},\ldots,\mathbf{a}''_{ijK}),$$

with

$$\mathbf{a}''_{ijk}=\begin{cases}
(a_{ijk1},\ldots,a_{ijk(J_k+1)})_{1\times(J_k+1)} & k\neq k_0\\
(a_{ijk_01},\ldots,a_{ijk_0(l_0-1)},a_{ijk_0(l_0+1)},\ldots,a_{ijk_0(J_{k_0}+1)})_{1\times J_{k_0}} & k=k_0,\ 1<l_0<J_{k_0}+1\\
(a_{ijk_01},\ldots,a_{ijk_0J_{k_0}})_{1\times J_{k_0}} & k=k_0,\ l_0=J_{k_0}+1\\
(a_{ijk_02},\ldots,a_{ijk_0(J_{k_0}+1)})_{1\times J_{k_0}} & k=k_0,\ l_0=1.
\end{cases}\tag{26}$$
Note that (25) is no longer in the form of (9) or (22). A simple manipulation of (25) changes it into the same form as (9) or (22); for example,

$$\mathbf{T}''=\Lambda''(\mathbf{t}'')',\tag{27}$$

where T″ = T − C′. Given (9), one may use the methods described above to reduce the number of unknown mean PTs so that the necessary condition for a unique solution can be satisfied. It is relatively easy to determine the uniqueness of a vector t that minimizes the residual (19). For example, a singular value decomposition routine (Press, Teukolsky, Vetterling, & Flannery, 1992, Chap. 2) can be used to determine the rank of Λ′Λ. If Λ′Λ has full rank, then a unique vector t that minimizes the residual (19) can be obtained.

General Approaches

Equating mean PTs to each other and equating a mean PT to a constant are two special cases of hypotheses that are motivated by psychological principles. As one can see, in either case the resulting relationship between the observed mean RTs and the mean PTs is the original Eq. (9) plus some linear constraints on the vector t. For example, assuming that two mean PTs are the same is equivalent to eliminating one element of t, which means reducing the number of columns of Λ by 1. In the same way, equating one PT to a constant eliminates one element of t, reduces the number of columns of Λ by 1, and adds a constant vector C. From these observations, a general approach to model restriction can be applied. The purpose of imposing constraints on (9) is to reduce the number of unknown mean PTs in the system of equations (9), so that the remaining mean PTs can be obtained (uniquely). As indicated by the previous special examples, there are two types of constraints:
1. constraints that result in the elimination of elements of t (equating two or more mean PTs is a special case), and

2. constraints that result in the elimination of elements of t and add a constant vector to the system of equations (equating a mean PT to a constant is a special case).

In both cases, the new model will be of the form

$$\mathbf{T}=\Lambda\,\mathbf{t}' \quad\text{(same as (9))}$$
$$U\,\mathbf{t}'=V,$$

where Ut′ = V represents some linear constraints on the mean PTs; U is a matrix with the same number of columns as the number of elements in t, and V is a vector with the same number of elements as the number of rows of U. V = 0 (the zero vector) corresponds to the first case and V ≠ 0 corresponds to the second case. Next, I apply the theorem and the above-mentioned methods to a source monitoring experiment.
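Before turning to the example, here is one way such constrained problems can be handled numerically; this is a sketch with numpy, not the paper's program, and the function names are illustrative. The constraint set is parameterized as t = t0 + Nz, with N a basis of the null space of U obtained from a singular value decomposition, and uniqueness is checked through the singular values of Λ, as suggested above.

```python
import numpy as np

def solve_constrained(Lam, T, U=None, V=None):
    """Minimize ||T - Lam @ t|| subject to U @ t = V (if constraints given)."""
    if U is None:
        t, *_ = np.linalg.lstsq(Lam, T, rcond=None)
        return t
    # Particular solution of the constraints and a basis of their null space.
    t0, *_ = np.linalg.lstsq(U, V, rcond=None)
    _, s, Vt = np.linalg.svd(U)
    rank = int((s > 1e-10).sum())
    N = Vt[rank:].T                      # columns span the null space of U
    z, *_ = np.linalg.lstsq(Lam @ N, T - Lam @ t0, rcond=None)
    return t0 + N @ z

def is_unique(Lam, tol=1e-10):
    """Lam' Lam has full rank iff Lam has full column rank, i.e., all
    singular values of Lam are positive (and rows >= columns)."""
    s = np.linalg.svd(Lam, compute_uv=False)
    return Lam.shape[0] >= Lam.shape[1] and s.min() > tol
```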
AN EXAMPLE
In this section, we will apply the model and analysis in this paper to a set of RT data collected in a source monitoring experiment. I will emphasize the application of the methodology and demonstrate how one could use the developed methods to analyze RT data. I will not discuss the details of the theoretical background of the experiments (see Johnson, Kounios, & Reeder, 1994, for detailed discussions of the experiments). First, I will analyze the RT data using a GPT model for source monitoring (see Figs. 2 and 3 and the parameter definitions in Table 5). Then I will analyze an alternative processing tree model for source monitoring (see Fig. 4), namely, a model that assumes a different order for the detection and discrimination processes. I argue that the extension of the GPT model to handle RT data may help researchers rule out invalid assumptions in cognitive modeling.

TABLE 6
Frequency Data Table of the Experiment (Johnson et al., 1994)

       P      I      N
P      480    89     31
I      29     545    24
N      5      19     1174
TABLE 7
Mean Response Time Data Table of the Experiment

       P      I      N
P      1726   2402   1907
I      1930   1845   1397
N      3615   2581   1285
Analyzing RT Data for Source Monitoring

Johnson et al. (1994)³ conducted a source monitoring experiment in which both frequency and RT data were collected. In their experiment, participants were asked whether a given item was perceived or imagined in the learning phase or whether it was a new item that was not learned in the learning phase. Both response frequency (Table 6) and RT (Table 7) data were collected. Note that if one sets θ_8 (or D_3) = 0 (in Fig. 3), then the Batchelder–Riefer model is obtained (see Batchelder and Riefer (1990) for details). The frequency data in Table 6 are used to fit Model 5c from Batchelder and Riefer (1990).⁴ Parameter values can be estimated and are included in Table 8.

³ Personal communication. The frequency data have been published (Johnson et al., 1994). I use the unpublished RT data for demonstration only.

⁴ Model 5c means that participants do not differ in item detection (D_1 = D_2) and response biases (a = g). See Batchelder and Riefer (1990) for details.

I now consider the residual

$$\|\mathbf{T}-\Lambda\,\mathbf{t}'\|,\tag{28}$$

where T is the vector of the observed mean RTs (Table 7), t is a 1 × 16 vector of unknown mean PTs corresponding to the 8 processes in the model, and Λ is a 9 × 16 design matrix which can be obtained using (16), where the a_ij are rows of the power matrix (Table 9) and p_ij(Θ)/p_j(Θ) is computed by (8) with the parameter values given in Table 8; the resulting design matrix is shown in Table 10. As I have previously discussed, a necessary condition for the existence of a unique t that minimizes the residual (28) is that the number of unknown elements in t be no larger than the number of rows of the design matrix Λ. Next, I will apply the above methods to reduce the number of unknown elements in t in order to find a unique solution.

One PT for Each Process

I first assume that each subprocess has only one PT. In this case, I use (21) and apply the algorithm outlined in (23) repeatedly to obtain a design matrix Λ*
TABLE 8
Parameter Estimates of Model 5c in Batchelder and Riefer (1990). Model 5c Assumes That There Are No Differences in Item Detection (D_1 = D_2) and That Response Biases Are the Same for Detected and Undetected Items (g = a)

Parameters
In model        In tree           MLE        95% CI
θ_81            D_3               0.00000
θ_11 = θ_21     (D_1 = D_2) D     0.95315    [0.94300, 0.96331]
θ_31            d_1               0.80317    [0.75664, 0.84970]
θ_41            d_2               0.75846    [0.58423, 0.93270]
θ_51            b                 0.02003    [0.01337, 0.02669]
θ_61 = θ_71     (a = g) g         0.20850    [0.07215, 0.34486]

χ²(1) = 0.91187
(Table 11). Although Λ* has eight columns and nine rows, it turns out that Λ* has only rank seven, so we still cannot obtain a unique t = (E(τ_1), ..., E(τ_8)) to minimize the residual (19). This is a situation where, even though the necessary condition (20) holds, a unique solution still does not exist. This fact indicates that only seven of the eight mean PTs are independent. One way to solve such a
TABLE 9
Power Matrix for the Source Monitoring Model

j  i   θ11 θ12 θ21 θ22 θ31 θ32 θ41 θ42 θ51 θ52 θ61 θ62 θ71 θ72 θ81 θ82
1  1    1   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
1  2    1   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0
1  3    0   1   0   0   0   0   0   0   1   0   0   0   1   0   0   0
2  1    1   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
2  2    0   1   0   0   0   0   0   0   1   0   0   0   0   1   0   0
3  1    0   1   0   0   0   0   0   0   0   1   0   0   0   0   0   0
4  1    0   0   1   0   0   0   0   1   0   0   1   0   0   0   0   0
4  2    0   0   0   1   0   0   0   0   1   0   0   0   1   0   0   0
5  1    0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0
5  2    0   0   1   0   0   0   0   1   0   0   0   1   0   0   0   0
5  3    0   0   0   1   0   0   0   0   1   0   0   0   0   1   0   0
6  1    0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   0
7  1    0   0   0   0   0   0   0   0   1   0   0   0   1   0   0   1
8  1    0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   1
9  1    0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
9  2    0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   1
problem is to set one of the eight mean PTs to a free constant, which is what I turn to next.

Applying Constant Restrictions to (21)

To say that a 9 × 8 matrix has rank seven means that one of its columns is linearly dependent on the other seven columns. Assuming that one element in t = (E(τ_1), ..., E(τ_8)) equals a constant removes one column, and the result is a 9 × 7 full rank matrix. For example, assuming⁵ that E(τ_8) = C, a new model in the form of (24) can be obtained, with a 9 × 7 design matrix resulting from elimination of the last column of Λ* (Table 11). By doing so, a unique set of mean PTs,

$$\begin{aligned}
E(\tau_1)&=623+C &\qquad E(\tau_2)&=112+C\\
E(\tau_3)&=1199-C &\qquad E(\tau_4)&=1500-C\\
E(\tau_5)&=1285-C &\qquad E(\tau_6)&=455\\
E(\tau_7)&=1813 &\qquad E(\tau_8)&=C,
\end{aligned}\tag{29}$$

is obtained, with a minimum value of the residual ‖T − Λ*(t*)′‖ = 776. Note that before any restrictions on the mean PTs, the residual (28) can be minimized to a value of zero, but the solution is not unique. After the parameter restrictions, namely allowing only one PT for each parameter group and restricting E(τ_8) = C, the residual can be minimized with a unique solution, but its value is not zero. There is an obvious trade-off between the number of parameters involved in the residual formula and the minimal value of the residual. The goal of this practice is to minimize the value of the residual to zero with as few parameters as possible. Next, I will examine the model and try to find a situation such that a minimum number of PTs can be uniquely obtained (up to a certain constant) and the residual minimized to a value of zero.

Mixed Approach: Certain Processes Have Two PTs Whereas Other Processes Have Only One PT

One can systematically release the restrictions on each of the processes. Obviously, only certain parameters allow two PTs and still result in a full rank⁵

⁵ Except for E(τ_6) and E(τ_7), assigning any one of the other elements in t to a constant will result in a full rank (rank = number of columns) design matrix.
TABLE 10
Original Design Matrix for the Detection-First Model

           j=1     j=2     j=3    j=4     j=5     j=6    j=7    j=8    j=9
E(τ11)     1.0     0.995   0      0       0       0      0      0      0
E(τ12)     0       0.005   1.0    0       0       0      0      0      0
E(τ21)     0       0       0      0.996   0.999   0      0      0      0
E(τ22)     0       0       0      0.004   0.001   1.0    0      0      0
E(τ31)     0.951   0       0      0       0       0      0      0      0
E(τ32)     0.049   0.995   0      0       0       0      0      0      0
E(τ41)     0       0       0      0       0.798   0      0      0      0
E(τ42)     0       0       0      0.996   0.201   0      0      0      0
E(τ51)     0       0.005   0      0.004   0.001   0      1.0    1.0    0
E(τ52)     0       0       1.0    0       0       1.0    0      0      1.0
E(τ61)     0.049   0       0      0.996   0       0      0      0      0
E(τ62)     0       0.995   0      0       0.201   0      0      0      0
E(τ71)     0       0       0      0.004   0       0      1.0    0      0
E(τ72)     0       0.005   0      0       0.001   0      0      1.0    0
E(τ81)     0       0       0      0       0       0      0      0      0
E(τ82)     0       0       0      0       0       0      1.0    1.0    1.0
matrix. For example, we can assume two mean PTs associated with the response bias processes, namely, the processes with probabilities θ_6 and θ_7:

$$\mathbf{t}^*=(E(\tau_1),E(\tau_2),E(\tau_3),E(\tau_4),E(\tau_5),E(\tau_{61}),E(\tau_{62}),E(\tau_{71}),E(\tau_{72})).\tag{30}$$

The resulting design matrix (see Table 12) has rank 9 and reduces the value of the residual (19) to 0. Finally, a configuration is obtained such that (1) the mean PT for θ_8 is constant, and (2) there is one PT for each of the processes except the decision processes at the discrimination level. Under such conditions, unique mean PTs

TABLE 11
Design Matrix for the Model in Which Only One Mean Processing Time for Each Parameter Group Exists

          j=1    j=2     j=3    j=4     j=5     j=6    j=7    j=8    j=9
E(τ1)     1.0    1.0     1.0    0       0       0      0      0      0
E(τ2)     0      0       0      1.0     1.0     1.0    0      0      0
E(τ3)     1.0    0.995   0      0       0       0      0      0      0
E(τ4)     0      0       0      0.996   0.999   0      0      0      0
E(τ5)     0      0.005   1.0    0.004   0.001   1.0    1.0    1.0    1.0
E(τ6)     0.049  0.995   0      0.996   0.201   0      0      0      0
E(τ7)     0      0.005   0      0.004   0.001   0      1.0    1.0    0
E(τ8)     0      0       0      0       0       0      1.0    1.0    1.0
TABLE 12
Design Matrix for the Model in Which Only One Mean Processing Time for Each Parameter Group Exists, Except the Decision Processes Associated with Probabilities θ_6 and θ_7

           j=1       j=2       j=3    j=4       j=5       j=6    j=7    j=8    j=9
E(τ1)      1.0       1.0       1.0    0         0         0      0      0      0
E(τ2)      0         0         0      1.0       1.0       1.0    0      0      0
E(τ3)      0.99976   0.99502   0      0         0         0      0      0      0
E(τ4)      0         0         0      0.99594   0.99918   0      0      0      0
E(τ5)      0.00024   0.00498   1.0    0.00406   0.00082   1.0    1.0    1.0    1.0
E(τ61)     0.04855   0         0      0.99594   0         0      0      0      0
E(τ62)     0         0.99502   0      0         0.20138   0      0      0      0
E(τ71)     0.00024   0         0      0.00406   0         0      1.0    0      0
E(τ72)     0         0.00498   0      0         0.00082   0      0      1.0    0

$$\begin{aligned}
E(\tau_1)&=622+C &\qquad E(\tau_2)&=112+C\\
E(\tau_3)&=1093-C &\qquad E(\tau_4)&=1594-C\\
E(\tau_5)&=1285-C &\qquad E(\tau_{61})&=216\\
E(\tau_{62})&=683 &\qquad E(\tau_{71})&=2330\\
E(\tau_{72})&=1296 &\qquad E(\tau_8)&=C
\end{aligned}\tag{31}$$

are obtained. From the above analysis of the source monitoring data, it seems that there is no theoretical guarantee of the existence of a model that has a unique solution minimizing the residual to zero. I have been exploring mathematical as well as numerical solutions to this problem. The current study provides only a canonical solution driven by the numerical properties of the original model, which is determined by the structure of the tree model and the observed categorical frequencies. The steps used to obtain such a model can be outlined as follows:

1. Restrict one PT to each of the parameter groups. By doing so, the resulting model will have fewer mean PT parameters than the number of observed mean RTs.

2. If the design matrix is full rank, then consider the next step; otherwise, select a few parameters in the model and set them to constants until a model with a full rank design matrix is obtained.
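The two steps can be mechanized as in the rough sketch below (my construction under the stated assumptions; the greedy column-dropping rule is only one possible way to choose which PTs to fix, and the names are mine). Fixing a PT to a constant is treated as dropping its column, with the constant absorbed into the right-hand side as in (27).

```python
import numpy as np

def reduce_to_unique(Lam, group_cols):
    """Steps 1-2 above: collapse each parameter group to a single column,
    then drop columns (fix those PTs to constants) until the design matrix
    has full column rank. Indices in `fixed` refer to the matrix as it
    shrinks, so they must be mapped back for interpretation."""
    # Step 1: one mean PT per parameter group (restriction (21)).
    Lam1 = np.column_stack([Lam[:, cols].sum(axis=1) for cols in group_cols])
    fixed = []
    # Step 2: fix PTs to constants while the matrix is rank deficient.
    while np.linalg.matrix_rank(Lam1) < Lam1.shape[1]:
        for c in range(Lam1.shape[1]):
            trial = np.delete(Lam1, c, axis=1)
            if np.linalg.matrix_rank(trial) == trial.shape[1]:
                fixed.append(c)
                Lam1 = trial
                break
        else:
            break  # no single column removal repairs the deficiency
    return Lam1, fixed
```

Applied to the 9 × 16 matrix of Table 10 with its eight column pairs, this follows the route taken in the text: collapse to eight columns, detect the rank deficiency, and fix one mean PT to a constant.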
Of the above two steps, the second needs extra explanation. From an algebraic point of view, given a system of simultaneous equations in the form of (22), the dimension of the solution space equals the number of unknown parameters minus the rank of the matrix Λ*; this difference is the number of parameters that must be assigned constants. From the modeling point of view, assigning certain parameters as constants helps the interpretation of the model parameters. For example, by assigning an arbitrary constant C to E(τ_8), unique solutions (up to the constant C) are obtained. Furthermore, certain relations among the parameters remain unchanged. For instance, the difference between E(τ_1) and E(τ_2), which are the mean PTs for detecting source A and source B, respectively, is independent of the constant. The same is true of the difference between E(τ_3) and E(τ_4), which are the mean PTs for discriminating the two sources. It can be seen from (31) that for values of C with 0 < C < 1093, all of the mean PTs are positive.
FIG. 4. Tree model for source i items (i = 1, 2). Parameters D_i, d_i, a, b, g have the same meanings as in Batchelder and Riefer (1990). This tree model corresponds to the assumption that the discrimination process occurs before the detection process.

TABLE 13
Power Matrix for the Source Monitoring Model with the Discrimination Process Prior to the Detection Process

j  i   θ11 θ12 θ21 θ22 θ31 θ32 θ41 θ42 θ51 θ52 θ61 θ62 θ71 θ72 θ81 θ82
1  1    0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
1  2    1   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0
1  3    0   1   0   0   0   1   0   0   1   0   0   0   1   0   0   0
2  1    1   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
2  2    0   1   0   0   0   1   0   0   1   0   0   0   0   1   0   0
3  1    0   1   0   0   0   1   0   0   0   1   0   0   0   0   0   0
4  1    0   0   1   0   0   0   0   1   0   0   1   0   0   0   0   0
4  2    0   0   0   1   0   0   0   1   1   0   0   0   1   0   0   0
5  1    0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
5  2    0   0   1   0   0   0   0   1   0   0   0   1   0   0   0   0
5  3    0   0   0   1   0   0   0   1   1   0   0   0   0   1   0   0
6  1    0   0   0   1   0   0   0   1   0   1   0   0   0   0   0   0
7  1    0   0   0   0   0   0   0   0   1   0   0   0   1   0   0   1
8  1    0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   1
9  1    0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
9  2    0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   1
Suppose instead that participants first attempt to discriminate the source (namely, color) of the items. Figure 4 is a tree representation (for old source A and source B items) of such an assumption. Similar model equations (32) and a power matrix (Table 13) are obtained, with all parameter definitions identical to those of the Batchelder–Riefer model (Table 5).

$$p_j(\Theta)=p_{1j}(\Theta)+p_{2j}(\Theta)+p_{3j}(\Theta)$$
$$\begin{aligned}
p_1(\Theta)&=\theta_{31}+\theta_{32}\theta_{11}\theta_{61}+\theta_{32}\theta_{12}\theta_{51}\theta_{71}\\
p_2(\Theta)&=\theta_{32}\theta_{11}\theta_{62}+\theta_{32}\theta_{12}\theta_{51}\theta_{72}\\
p_3(\Theta)&=\theta_{32}\theta_{12}\theta_{52}\\
p_4(\Theta)&=\theta_{42}\theta_{21}\theta_{61}+\theta_{42}\theta_{22}\theta_{51}\theta_{71}\\
p_5(\Theta)&=\theta_{41}+\theta_{42}\theta_{21}\theta_{62}+\theta_{42}\theta_{22}\theta_{51}\theta_{72}\\
p_6(\Theta)&=\theta_{42}\theta_{22}\theta_{52}\\
p_7(\Theta)&=\theta_{82}\theta_{51}\theta_{71}\\
p_8(\Theta)&=\theta_{82}\theta_{51}\theta_{72}\\
p_9(\Theta)&=\theta_{81}+\theta_{82}\theta_{52}
\end{aligned}\tag{32}$$

I use the observed data (Table 6) to fit this model. It is interesting to note that, in comparison with the fit of the detection-first model, the data fit the discrimination-first model better. I first use the power matrix (Table 13) and the parameter estimates (Table 14) to obtain the original design matrix (Table 15). I then use the same methods that I used in analyzing the detection-first model to decide the unknown mean PTs, t*, for the RT model. I obtain the same vector t* as in (30). By restricting E(τ_81) = E(τ_82) = C, a design matrix (Table 16) for the new model is obtained, and the solution (33) is unique (up to the constant C).

TABLE 14
Parameter Estimates of the Model under the Assumption That the Discrimination Process Occurs before the Detection Process

Parameters
In model       In tree           MLE        95% CI
θ_8            D_3               0.00000
θ_1 = θ_2      (D_1 = D_2) D     0.80136    [0.75651, 0.84621]
θ_3            d_1               0.74601    [0.70504, 0.78698]
θ_4            d_2               0.78236    [0.71760, 0.84712]
θ_5            b                 0.02003    [0.01337, 0.02669]
θ_6 = θ_7      (a = g) g         0.26395    [0.17520, 0.35269]

χ²(1) = 0.6157561060
TABLE 15
Original Design Matrix for the Discrimination-First Model

           j=1     j=2     j=3    j=4     j=5     j=6    j=7    j=8    j=9
E(τ11)     0.067   0.995   0      0       0       0      0      0      0
E(τ12)     0       0.005   1.0    0       0       0      0      0      0
E(τ21)     0       0       0      0.995   0.141   0      0      0      0
E(τ22)     0       0       0      0.005   0.001   1.0    0      0      0
E(τ31)     0.933   0       0      0       0       0      0      0      0
E(τ32)     0.067   1.0     1.0    0       0       0      0      0      0
E(τ41)     0       0       0      0       0.858   0      0      0      0
E(τ42)     0       0       0      1.0     0.142   1.0    0      0      0
E(τ51)     0       0.005   0      0.005   0.001   0      1.0    1.0    0
E(τ52)     0       0       1.0    0       0       1.0    0      0      1.0
E(τ61)     0.067   0       0      0.995   0       0      0      0      0
E(τ62)     0       0.995   0      0       0.141   0      0      0      0
E(τ71)     0       0       0      0.005   0       0      1.0    0      0
E(τ72)     0       0.005   0      0       0.001   0      0      1.0    0
E(τ81)     0       0       0      0       0       0      0      0      0
E(τ82)     0       0       0      0       0       0      1.0    1.0    1.0
$$\begin{aligned}
E(\tau_1)&=-1053+C &\qquad E(\tau_2)&=-1725+C\\
E(\tau_3)&=1675 &\qquad E(\tau_4)&=1837\\
E(\tau_5)&=1285-C &\qquad E(\tau_{61})&=1809-C\\
E(\tau_{62})&=1776-C &\qquad E(\tau_{71})&=2330\\
E(\tau_{72})&=1296 &\qquad E(\tau_8)&=C
\end{aligned}\tag{33}$$

It is interesting to note that, from (33), for any C > 0 there are some elements of t* that are negative. In other words, a discrimination-first model of source monitoring cannot explain the observed RT data without employing negative PTs, although the model cannot be rejected on the basis of the frequency data. Given (33) and the unavoidable negative solutions for the mean PTs, the discrimination-first model is subject to question. This outcome is consistent with the SAT study of Johnson et al. (1994), in which they found evidence that item detection precedes source discrimination.
TABLE 16
Design Matrix for the Discrimination-First Model with the Same Parameter Restrictions as in Table 12

          j=1     j=2     j=3    j=4     j=5     j=6    j=7    j=8    j=9
E(τ1)     0.067   1.0     1.0    0       0       0      0      0      0
E(τ2)     0       0       0      1.0     0.142   1.0    0      0      0
E(τ3)     1.0     1.0     1.0    0       0       0      0      0      0
E(τ4)     0       0       0      1.0     1.0     1.0    0      0      0
E(τ5)     0       0.005   1.0    0.005   0.001   1.0    1.0    1.0    1.0
E(τ61)    0.067   0       0      0.995   0       0      0      0      0
E(τ62)    0       0.995   0      0       0.141   0      0      0      0
E(τ71)    0       0       0      0.005   0       0      1.0    0      0
E(τ72)    0       0.005   0      0       0.001   0      0      1.0    0
DISCUSSION
The current approach was developed from a simple example of Link (1982). However, its approach to incorporating RTs into GPT models is related to several other extant approaches, e.g., the serial-parallel trade-off (Townsend, 1984), PERT networks (Townsend & Schweickert, 1989; Schweickert & Townsend, 1989), order-of-processing models (Fisher & Goldstein, 1983; Goldstein & Fisher, 1991), and semi-Markovian processing models (Shiffrin & Thompson, 1988; Shiffrin, 1997). All these approaches share the assumption that observed RTs are computed over one or more possible processing paths or sequences. The current approach differs from most applications of these other frameworks by allowing multiple response categories and by using estimation techniques to obtain the response category parameters separately from the RTs. In particular, the current approach uses the GPT model as the model for the categorical data and utilizes the unified mathematical form of GPT models in estimating the means of the RTs. The special model and processing assumptions in the current approach bring some advantages and, at the same time, some disadvantages. From the point of view of data analysis, the current approach is the simplest. GPT models have been used in cognitive psychology, especially in human memory research. In the literature, GPT models have been used as theoretical models, where theoretical assumptions are embedded in the structure of the models, and as measurement models, where model parameters are choice probabilities of the mental processes (see Batchelder & Riefer, 1997, for a review). Using a GPT model as the model of the categorical data and extending the model to examine latency data allow one to investigate theoretical assumptions using both categorical data and latency data, as has been demonstrated in the example of source monitoring. The simplicity of data analysis in the current approach is due to certain simplifications in the model assumptions and in the level of analysis of the RT distribution, which bring several limitations to the current approach. To be specific, there are three limitations:
The first limitation is due to the serialization assumption of the processes. Although such an assumption simplifies computation, it nevertheless limits the range of possible applications.

The second limitation is due to the assumption that the choice probabilities (model parameters θ) are independent of the choice processing times (τ) in each of the states. In general, this assumption is violated.

The third limitation is that the current model examines PT only at the level of the mean. Because there is no information about the variability of RT, (9) is not capable of providing significance tests. Because of this, the current approach can be used only to obtain supporting evidence (all mean PT estimates positive) or an indication of invalid assumptions (some mean PT estimates negative). Further examination of the model is needed in either case.

General solutions to the above limitations of the current study are implied in several approaches in the literature (Luce, 1986; Townsend & Ashby, 1983). For example, for the independence assumption, one can assume that the choice probabilities are related to the PTs for each of the mental processes. Although different functional relations can be hypothesized, a reasonable assumption is the framework of random walk models (Link & Heath, 1975; Ratcliff, 1978), in which the relations are functions of the drift rate and boundaries (Shiffrin, 1997, p. 337). In this case, the link parameter θ_kl is a function G_kl of the mean PTs of all the alternatives for the given mental process k, namely,

$$\theta_{kl}=G_{kl}\bigl(E(\tau_{k1}),\ldots,E(\tau_{k(J_k+1)})\bigr),\qquad l=1,\ldots,(J_k+1);\quad k=1,\ldots,K.$$

Furthermore, the link parameters are also related to the observed frequencies in the form of (4) and (6), and E(τ_kl), l = 1, ..., (J_k+1), k = 1, ..., K, satisfy (9), where T is the observed mean RT, t is the vector of the E(τ_kl), and Λ is a matrix of functions of the parameters. Obviously, under such an assumption, the computation will be more complicated. In general, different assumptions of the random walk models (Vickers, 1970; Link, 1975; Link & Heath, 1975; Laming, 1968; Smith & Vickers, 1988) will give different forms of the functional relations G_kl. For any given assumption, which will entail a specific form of G_kl, an algorithm can be obtained, but that is beyond the reach of the current approach. In the current approach, assuming independence simplifies the computation, and restricting the analysis of RT to the mean allows us to extend the multiplicative form of the link probabilities (as in Eq. (6)) to the additivity of the mean PTs (as in (11)) in a straightforward way.

Conceptually, Shiffrin and colleagues (Shiffrin & Thompson, 1988; Shiffrin, 1997) provide a general framework with solutions for each of the limitations. However, their approach remains at a conceptual level (the assumption of the network model is very general). It has been pointed out (Shiffrin, 1997, p. 342) that the implementation of the general network model is complicated. As we have demonstrated in this study, there are obvious advantages to using GPT models as a special case of more complex network models. Because the mathematical and statistical properties of GPT models are better understood than those of general network models, the current approach is able to obtain specific algorithms. Furthermore, GPT models have
Additional work continuing the current study would be to implement the framework of Shiffrin and colleagues (Shiffrin & Thompson, 1988; Shiffrin, 1997) further on GPT models.

CONCLUSION

Link (1982) provided an early example of how a categorical data model can be used to analyze RT data. This article takes Link's basic methodology and generalizes it to the much wider family of GPT models. The study provides a simple framework for experiments in which both accuracy and latency data are collected. The basic assumption of the current approach is that the cognitive task satisfies the assumptions of the discrete stages model of Sternberg (1969). The basic idea is to analyze the categorical data with a GPT model, in which each observed category probability is a sum of products of the choice probabilities of the mental processes (as shown in (4) and (6)), and then to extend the GPT model to obtain a system of simultaneous linear equations relating the observed mean latencies to the unobserved mean PTs of the mental processes. The basic result of this paper is that, given a GPT model representation of a cognitive task, not only can the choice probabilities of the mental processes be obtained (Hu & Batchelder, 1994; Hu, 1990, 1993, 1998a), but so can the mean PTs of the processes (Hu, 1998a). The ideal situation for applying these results is when researchers analyze categorical data with a GPT model and also examine RT data at the level of the observed mean latencies of the behavior categories. The source monitoring example demonstrated how the results can be used: although the method cannot by itself prove a theory, it can provide evidence of invalid assumptions about certain cognitive phenomena.

The current study is a special case of a broader approach developed by Shiffrin and Thompson (Shiffrin & Thompson, 1988; Shiffrin, 1997). Instead of developing a framework for general network models, the current approach is based on the family of GPT models, which is, structurally, a special case of their general network model. All of the algorithms are obtained from the mathematical form of this special family of models. The emphasis of the current study is to demonstrate how GPT models can be used to handle RTs. Future work continuing the current approach is to develop algorithms that implement the results of Shiffrin and Thompson (Shiffrin & Thompson, 1988; Shiffrin, 1997) for the family of GPT models.

REFERENCES

Batchelder, W. H., Hu, X., & Riefer, D. M. (1994a). Analysis of a model for source monitoring. In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 51–65). New York: Springer/European Mathematical Psychology Society.

Batchelder, W. H., & Riefer, D. M. (1980). Separation of storage and retrieval factors in free recall of clusterable pairs. Psychological Review, 87, 375–397.
Batchelder, W. H., & Riefer, D. M. (1986). The statistical analysis of a model for storage and retrieval processes in human memory. British Journal of Mathematical and Statistical Psychology, 39, 129–149.

Batchelder, W. H., & Riefer, D. M. (1990). Multinomial processing models of source monitoring. Psychological Review, 97(4), 548–564.

Batchelder, W. H., & Riefer, D. M. (1997). Theoretical and empirical review of multinomial process tree modeling (Tech. Rep. MBS 97-17). Irvine, CA: Mathematical Behavioral Sciences, UC Irvine.

Batchelder, W. H., Riefer, D. M., & Hu, X. (1994b). Measuring memory factors in source monitoring: Reply to Kinchla. Psychological Review, 101(1), 172–176.

Bayen, U. J., Murnane, K., & Erdfelder, E. (1996). Source discrimination, item detection, and multinomial models of source monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 197–215.

Ferguson, S. A., Hashtroudi, S., & Johnson, M. K. (1992). Age differences in using source-relevant cues. Psychology and Aging, 7, 443–452.

Fisher, D. L., & Goldstein, W. M. (1983). Stochastic PERT networks as models of cognition: Derivation of the mean, variance, and distribution of reaction time using order-of-processing (OP) diagrams. Journal of Mathematical Psychology, 27, 121–151.

Goldstein, W. M., & Fisher, D. L. (1991). Stochastic networks as models of cognition: Derivation of response time distributions using the order of processing method. Journal of Mathematical Psychology, 35, 214–241.

Hashtroudi, S., Johnson, M. K., & Chrosniak, L. D. (1989). Aging and source monitoring. Psychology and Aging, 4, 106–112.

Hu, X. (1990). Source: Computer program for source monitoring experiments (available upon request). Memphis, TN: Department of Psychology, The University of Memphis.

Hu, X. (1993). MBT: Computer program for general processing tree models (available upon request). Memphis, TN: Department of Psychology, The University of Memphis.

Hu, X. (1998a). GPT: Computer program for arbitrary general processing tree models (available at http://xhuoffice.psyc.memphis.edu/gpt/index.htm). Memphis, TN: Department of Psychology, The University of Memphis.

Hu, X. (1998b). Resources for general processing tree (GPT) models [on-line] (available at http://xhuoffice.psyc.memphis.edu/gpt/index.htm).

Hu, X., & Batchelder, W. H. (1994). The statistical analysis of general processing tree models with the EM algorithm. Psychometrika, 59(1), 21–47.

Johnson, M. K., Foley, M. A., Suengas, A. G., & Raye, C. L. (1988). Phenomenal characteristics of memories for perceived and imagined autobiographical events. Journal of Experimental Psychology: General, 117, 371–376.

Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114, 3–28.

Johnson, M. K., Kounios, J., & Reeder, J. A. (1994). Time-course studies of reality monitoring and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(6), 1409–1419.

Laming, D. (1968). Information theory of choice reaction times. New York: Academic Press.

Lindsay, D. S., & Johnson, M. K. (1989). The eyewitness suggestibility effect and memory for source. Memory and Cognition, 17, 349–358.

Lindsay, D. S., Johnson, M. K., & Kwon, P. (1991). Developmental changes in memory source monitoring. Journal of Experimental Child Psychology, 52, 297–318.

Link, S. W. (1975). The relative judgement theory of two choice response times. Journal of Mathematical Psychology, 12, 114–135.

Link, S. W. (1982). Correcting response measures for guessing and partial information. Psychological Bulletin, 92(2), 469–486.

Link, S. W., & Heath, R. A. (1975). A sequential theory of psychological discrimination. Psychometrika, 40, 77–105.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press. (Available on-line at http://cfatab.harvard.edu/nr/nronline.html)

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.

Riefer, D. M., Hu, X., & Batchelder, W. H. (1994). Response strategies in source monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(3), 680–693.

Riefer, D. M., & Batchelder, W. H. (1991). Age differences in storage and retrieval: A multinomial modeling analysis. Bulletin of the Psychonomic Society, 29(5), 415–418.

Riefer, D. M., & Rouder, J. N. (1992). A multinomial modeling analysis of the mnemonic benefits of bizarre imagery. Memory and Cognition, 20(6), 601–611.

Schweickert, R., & Townsend, J. T. (1989). A trichotomy method: Interactions of factors prolonging sequential and concurrent mental processes in stochastic PERT networks. Journal of Mathematical Psychology, 33, 328–347.

Shiffrin, R., & Thompson, M. (1988). Moments of transition-additive random variables defined on finite regenerative random processes. Journal of Mathematical Psychology, 32, 313–340.

Shiffrin, R. M. (1997). A network model for multiple choice: Accuracy and response times. In A. A. J. Marley (Ed.), Choice, decision, and measurement: Essays in honor of Duncan Luce (pp. 329–346). NJ: Lawrence Erlbaum.

Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32, 135–168.

Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. In W. G. Koster (Ed.), Attention and performance II. Acta Psychologica, 30, 276–315.

Townsend, J. T. (1984). Uncovering mental processes with factorial experiments. Journal of Mathematical Psychology, 28, 363–400.

Townsend, J. T., & Ashby, F. G. (1983). The stochastic modeling of elementary psychological processes. Cambridge, England: Cambridge University Press.

Townsend, J. T., & Schweickert, R. (1989). Toward the trichotomy method of reaction times: Laying the foundation of stochastic mental networks. Journal of Mathematical Psychology, 33, 309–327.

Vickers, D. (1970). Evidence for an accumulator model of psychophysical discrimination. Ergonomics, 13, 37–58.

Received: November 14, 1996; published online: April 4, 2001