JOURNAL OF MATHEMATICAL PSYCHOLOGY: 3, 140-162 (1966)

Theoretical Interpretations of a Markov Model for Avoidance Conditioning¹

JOHN THEIOS AND JOHN BRELSFORD, JR.

The University of Texas, Austin, Texas
Several alternative theoretical interpretations of a general Markov model for avoidance conditioning were considered. Experimental tests with two different conditioning procedures indicated that a two-stage conditioning interpretation (emotional conditioning followed by instrumental conditioning) was not tenable. Two interpretations are consistent with the obtained data. One is based on the notions of a temporary, imperfect memory and a relatively permanent, perfect memory for the association between the CS and an arousal response. The other involves representing the CS and the CS-plus-UCS as sets of stimuli with common components. It was possible to account for performance differences between rats permitted to make escape responses and rats prevented from making escape responses by differences in one parameter of the model and to demonstrate parameter invariance for the remaining parameters.
Simple avoidance conditioning in rats has been described as a Markov process involving three discrete levels of performance (Bower and Theios, 1964; Theios, 1963). Although the abstract structure of the Markov model has been able to predict avoidance conditioning data well, there exists no completely satisfactory psychological interpretation of the process. The present research is directed toward testing two alternative interpretations of the general Markov model for avoidance conditioning.
TWO-STAGE CONDITIONING INTERPRETATION
¹ This research was supported by PHS research grant HD 00950 from the National Institute of Child Health and Human Development, Public Health Service.

In an early paper Theios (1963) suggested that a subject can be viewed as being naive when an avoidance conditioning experiment begins. In this initial, naive state the organism gets shocked on every trial, since the probability of an avoidance response is zero in the naive state. The shock will cause both central and autonomic responses in the organism which, for convenience, can be called emotional arousal responses. The first stage of learning consists of conditioning the emotional responses to the conditioned stimulus (CS) which signals the start of a trial. This conditioning occurs with probability c on any trial. After conditioning of the emotional responses the organism is in an intermediate state where the presentation of the CS induces emotional arousal. When aroused, the organism should make an avoidance response by chance with probability p, which is constant over trials. For example, with probability p arousal may lead to increased activity which removes the organism from the shock compartment before the onset of shock. With probability 1 - p arousal may lead to crouching or "freezing" behavior which prevents an avoidance response. The second stage of learning consists of conditioning the instrumental avoidance response to proprioceptive stimuli from the emotional responses. The instrumental conditioning may occur on a trial with probability s. The Markov process is described by the following transition matrix, response probability vector, and starting vector:
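As a sketch only, the three-state chain implied by the verbal description above can be written out as follows. The state labels A (instrumental response conditioned), I (intermediate, aroused state), and N (naive state) are our own shorthand and are assumptions, as is the entry of zero for a direct transition from N to A; that entry is chosen to agree with the later statement that this model has the same structure as the memory model of Eq. 2.

```latex
% Sketch of the two-stage model; state labels A, I, N are assumed, not from the source.
\[
\begin{array}{c|ccc|c|c}
  & A & I & N & \Pr(\text{avoid}) & \Pr(\text{start}) \\ \hline
A & 1 & 0     & 0     & 1 & 0 \\
I & s & 1 - s & 0     & p & 0 \\
N & 0 & c     & 1 - c & 0 & 1
\end{array}
\]
```

Under this reading the subject stays in N with probability 1 - c on each shock trial, and once in I it avoids with probability p until the instrumental response is conditioned with probability s.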
MEMORY INTERPRETATION

An alternative interpretation of the Markov process may be given in terms of temporary and permanent memory. Again the process is assumed to start in a naive state N where the probability of an avoidance response is zero. However, each time the subject is shocked, emotional arousal becomes conditioned to the CS with a probability equal to unity. With probability c, an association is formed between the stimulus consequences of arousal and the instrumental running response. When this happens the process leaves state N and the association between the CS and arousal is stored in temporary memory, state T. When the association is in temporary memory, the CS will evoke arousal with a probability of unity, and arousal will lead to an avoidance response with a probability of unity. Following an avoidance response, the association may be transferred from temporary to permanent memory, state P, with a fixed probability s. In state P the CS will always induce an arousal response which in turn always leads to an instrumental running response. Thus, the probability of an avoidance response in state P is unity. If the association between the CS and arousal is not transferred to permanent memory, however, it may be forgotten (or if you prefer, may be extinguished) with probability q during the intertrial interval. This is represented by the process moving from state T to state F. If forgetting occurs, that is, if the subject fails to become aroused, there is no longer a stimulus for the instrumental running response, and the subject makes an error the next time the CS is presented. When the subject gets shocked, the association between the CS and emotional arousal gets reconditioned with a probability of unity. The association may be stored in permanent memory with probability s. With probability (1 - s) it is not stored in permanent memory but is stored in temporary memory. With probability q, the association may be forgotten during the intertrial interval, and the process moves back to state F before the start of the next trial. The memory model is described by the following transition matrix, response probability vector, and starting vector:

$$
\begin{array}{c|cccc|c|c}
  & P & T & F & N & \Pr(\text{avoid}) & \Pr(\text{start}) \\ \hline
P & 1 & 0          & 0       & 0     & 1 & 0 \\
T & s & (1-s)(1-q) & (1-s)q  & 0     & 1 & 0 \\
F & s & (1-s)(1-q) & (1-s)q  & 0     & 0 & 0 \\
N & 0 & c(1-q)     & cq      & 1 - c & 0 & 1
\end{array}
\tag{2}
$$
where N represents the naive state, F the forgotten state, T the temporary memory state, and P the permanent memory state. The structure of this model is identical to that proposed in Eq. 1. However, for the memory interpretation, the intermediate state has been separated into two states to specifically indicate the forgetting process.

TEST OF THE INTERPRETATIONS
In order to effect a comparison between the two interpretations, two groups of rats were given avoidance training with slightly different procedures. A control group given regular one-way avoidance training (cf. Theios and Dunaway, 1964) was compared to a group that was prevented from making the escape response on error trials. This latter group is called the trapped group, since these subjects were trapped and shocked in the starting box when they failed to run. According to the identifications of the two-stage conditioning interpretation, the parameter c represents the probability of conditioning emotional arousal to the CS. It follows that the value of c should not be affected by trapping, since classical conditioning of emotional responses could occur equally well whether or not the subject was allowed to make the escape response. On the other hand, it would be expected that the value of the parameter s would be reduced by trapping since s is the probability of conditioning the instrumental response which is blocked on shock trials. The memory interpretation leads to just the opposite predictions. The value of the parameter c is the probability of conditioning the instrumental response to the stimulus consequences of arousal. Since the instrumental response is blocked on trapped trials, the value of c should be reduced by trapping. On the other hand, the value of the parameter s should not be affected by trapping. The parameter s is the probability of permanently storing the association between the CS and emotional arousal. Since storage is a central event not dependent upon the occurrence of the running response, the value of s should be unaffected by trapping.
At first blush, it might be expected that the operation of trapping should reduce the parameter c to a value of zero since the trapped organism can never make an instrumental escape response. In other words, it might be expected that the trapped subjects should not learn to avoid. This type of reasoning, however, is not sound from either a contiguity or a reinforcement position on learning. On trapped trials the sequence of events is as follows: (1) the CS is presented, (2) the rat sits, (3) shock is added to the CS, (4) the rat has an arousal reaction, (5) the rat runs in random directions, and (6) while the rat is running the CS and shock are terminated. The crucial point is that the trapped rat runs in the presence of the CS and the stimulus consequences of his arousal reaction. In terms of the memory interpretation, it can be seen that the response of running can become conditioned to the stimuli from arousal, but with a probability lower than that in the case where the subject is permitted to make a complete running escape response. Because of the physical reflecting barriers in the conditioning situation, a rat trained with the trapping procedure will make an avoidance response if he continues to run randomly after the CS has induced arousal.

Method

Subjects. The subjects were 262 male albino rats between 90 and 130 days of age obtained from the Josamar Co. of Houston, Texas. There were 166 subjects in the normative control group and 96 in the trapped group.

Apparatus. The conditioning box consisted of two 14-inch long, 9-inch wide, and 6-inch high compartments separated by a motor-driven guillotine door. The raising and lowering of the door took approximately .1 sec. The end wall of each compartment consisted of a milk glass plate behind which was a 15-watt incandescent lamp. The floor of each compartment consisted of 3/16-inch brass rods spaced 11/16 inch apart, positioned parallel to the door, in a hinged frame. The weight of a rat depressed a microswitch mounted beneath each frame. The apparatus was painted flat black and had hinged Plexiglass tops covered with black cloth mesh. A 1-inch speaker was centered in the top of each compartment. The apparatus rested on foam rubber cushioning in a modified horizontal freezer chest which had been entirely lined with acoustical tile for sound attenuation. All stimulus presentation, timing, and response recording was programmed by Foringer, Grason-Stadler, and Lehigh Valley Electronics modules. The CS was the complex consisting of the opening of the door between the compartments, the offset of the light in the starting compartment, and the onset of 90-db wide band white noise. The UCS was scrambled electric shock delivered by a Lehigh Valley Electronics LVE 1531 Constant Current Shocker set at 2.5 ma. Response latency in .10 sec units and type of responses (errors or avoidances) were recorded by a Grason-Stadler E4600A Print-Out-Counter.

Procedure. The CS-UCS interval was 5 sec, and the intertrial interval was 30 sec. All trials began with the subject in the starting compartment. If the subject left the starting compartment during a trial he was returned there 5 sec after the end of the trial. Pretest trials consisted of the presentation of the CS alone for 11 sec. During this period shock was never presented, and the subject's responses had no effect on the termination of the CS. Each subject was given pretest trials until he reached a criterion of five successive failures to respond within 5 sec after the beginning of the trial. Training began on the next trial. During training, if the subject remained in the starting box for 5 sec after CS onset, he was shocked. Responses from the start box to the safe box with latencies of 5 sec or less were scored as avoidance responses (CR's).
Responses with latencies longer than 5 sec were scored as errors. Each subject was run to a criterion of 10 consecutive avoidance responses. Control group subjects were given standard delayed conditioning in which the CS and, on error trials, the shock were terminated when the subject left the compartment. For the subjects of the trapped group, the door between the two compartments came down at the onset of the shock. The shock continued for .85 sec, and then the trial ended. The .85 sec shock was chosen because it is the average escape latency, and thus the mean duration of shock which subjects receive when given regular avoidance training under this set of experimental conditions.
Results

As expected, trapping did affect total errors. The obtained mean total errors for the control group was 3.2 and that for the trapped group was 5.2. The difference is statistically significant, t(260) = 3.04, p < .01. The theoretical expectation of total errors is

$$E(T) = \frac{1}{c} + \frac{q}{s}. \tag{3}$$

The difference between the control and trapped groups could have been due to a change in the values of the parameters c, s, or q. However, according to the interpretations, the varying parameter should be either c or s, but not q. The two-stage conditioning interpretation predicts that the value of s, but not c, should be reduced by trapping. The memory interpretation predicts that the value of c, but not s, should be reduced by trapping. The number of errors after the first avoidance response, Z, is a function of s and q, but not c,

$$E(Z) = \frac{q}{s} - \frac{q}{1 - (1 - s)q}. \tag{4}$$
The two-stage conditioning interpretation predicts a difference in s across the two experimental conditions and thus differences between the trapped and control groups on the statistic Z. The memory interpretation, on the other hand, predicts invariance of s across experimental conditions and thus no differences between trapped and control groups on the statistic Z. The obtained mean number of errors after the first avoidance response was .80 for the control group and .76 for the trapped group. The difference is not statistically significant, t(260) = .19, p > .50, indicating that we cannot reject the hypothesis that the parameters s and q were unaffected by trapping. Since there was a significant difference between the groups in mean total errors, it is most likely that the parameter c was reduced by trapping. These results suggest that the two-stage conditioning interpretation is incorrect and that the memory interpretation is consistent with the data.
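The argument above can be illustrated numerically. The following minimal sketch evaluates Eqs. 3 and 4 with the memory-model estimates that the paper reports later in Tables 5 and 6 (s = .343 and q = .52 for both groups, c = .625 for the control group and c = .300 for the trapped group). The function names are our own and are not part of the original analysis.

```python
def expected_total_errors(c, s, q):
    """Eq. 3: E(T) = 1/c + q/s."""
    return 1.0 / c + q / s


def expected_errors_after_first_avoidance(s, q):
    """Eq. 4: E(Z) = q/s - q / (1 - (1 - s) q). The parameter c does not appear."""
    return q / s - q / (1.0 - (1.0 - s) * q)


# Memory-model estimates taken from Tables 5 and 6 of the paper.
S, Q = 0.343, 0.52
C_BY_GROUP = {"control": 0.625, "trapped": 0.300}

for group, c in C_BY_GROUP.items():
    e_t = expected_total_errors(c, S, Q)
    e_z = expected_errors_after_first_avoidance(S, Q)
    print(f"{group}: E(T) = {e_t:.2f}, E(Z) = {e_z:.2f}")
```

With these values only E(T) changes between groups, which is the pattern in the obtained means (3.2 versus 5.2 total errors, but .80 versus .76 errors after the first avoidance).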
Homogeneity of Sequences Following First Avoidance
Since the memory interpretation requires that both control and trapped subjects have the same values of the parameters s and q, the two groups of subjects should yield sets of response sequences following the first avoidance which do not differ significantly. This prediction follows since the response sequences following the first avoidance are functions of only s and q. To test the homogeneity of the sets of sequences, the frequency of each possible sequence of responses on the first four trials following the first avoidance response was tabulated for each group, and a χ² test was performed on the hypothesis that the proportion of any particular sequence should be the same for the control and trapped groups. Table 1 presents the obtained and expected proportions of each sequence for both groups. As can be seen, the two sets of data are quite homogeneous (χ² = 12.73, df = 14, p > .50). In performing this χ² test the cell with the zero expected value was not considered.

TABLE 1
PROPORTION OF RESPONSE SEQUENCES ON FIRST FOUR TRIALS AFTER THE FIRST AVOIDANCEᵃ
(1 = error, 0 = avoid)

Response sequence    Control      Trapped     Expected
(01234)              (N = 166)    (N = 96)    (weighted average)
01111                  .018         .010        .015
01110                  .018         .021        .019
01101                  .018         0           .011
01100                  .042         .031        .038
01011                  0            0           0ᵇ
01010                  .024         .010        .019
01001                  .006         0           .004
01000                  .084         .052        .073
00111                  0            .031        .011
00110                  .018         .021        .019
00101                  .012         .010        .011
00100                  .048         .083        .061
00011                  0            .010        .004
00010                  .030         .021        .027
00001                  .030         .031        .031
00000                  .651         .667        .657

ᵃ χ² = 12.73, df = 14, p > .50.
ᵇ This row was not considered in the analysis.
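The homogeneity test behind Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the cell counts are recovered by multiplying the tabled proportions by the group sizes and rounding, and the function name is hypothetical. Because the counts come from rounded proportions, the statistic should land near, but need not equal, the reported 12.73.

```python
N_CONTROL, N_TRAPPED = 166, 96

# Proportions from Table 1, in row order.
P_CONTROL = [.018, .018, .018, .042, 0, .024, .006, .084,
             0, .018, .012, .048, 0, .030, .030, .651]
P_TRAPPED = [.010, .021, 0, .031, 0, .010, 0, .052,
             .031, .021, .010, .083, .010, .021, .031, .667]

# Assumption: counts are the rounded products of proportion and group size.
control_counts = [round(p * N_CONTROL) for p in P_CONTROL]
trapped_counts = [round(p * N_TRAPPED) for p in P_TRAPPED]


def homogeneity_chi_square(counts_a, counts_b):
    """Pearson chi-square for two groups over the same categories.

    Categories that are empty in both groups are skipped, as in the paper.
    Returns (chi_square, degrees_of_freedom).
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    chi_square, used_rows = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        row_total = a + b
        if row_total == 0:
            continue
        used_rows += 1
        for observed, group_n in ((a, n_a), (b, n_b)):
            expected = row_total * group_n / total
            chi_square += (observed - expected) ** 2 / expected
    return chi_square, used_rows - 1


chi_square, df = homogeneity_chi_square(control_counts, trapped_counts)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The expected proportions in the last column of Table 1 are the pooled row totals divided by 262, for example (3 + 1)/262 = .015 for the first row.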
VARIATIONS ON THE GENERAL MODEL

The learning model we have been considering is a special case of a general four-state Markov model which has the following transition matrix and associated vectors:

$$
\begin{array}{c|cccc|c|c}
  & P  & T         & F         & N     & \Pr(\text{avoid}) & \Pr(\text{start}) \\ \hline
P & 1  & 0         & 0         & 0     & 1 & 0 \\
T & s  & (1-s)p    & (1-s)q    & 0     & 1 & 0 \\
F & e  & (1-e)p    & (1-e)q    & 0     & 0 & 0 \\
N & cd & c(1-d)p   & c(1-d)q   & 1 - c & 0 & 1
\end{array}
\tag{5}
$$
The parameter p is equal to 1 - q. There are two differences between this model and the one given in Eq. 2. First, in the general model, given that the initial state is left on a trial, it is possible to make a direct transition to the absorbing state with probability d on that same trial. Second, absorption from the two intermediate states may take place with different probabilities, s and e. The general model has five parameters. In all likelihood, fewer than five parameters will permit an adequate description of a given set of data. Thus, it should be possible to equate two or more of the parameters or to fix some of the parameters at specified values, such as zero or unity. The restrictions on the parameters should be dictated by theoretical interpretations of the abstract model. A number of theoretically attractive interpretations are consistent with the fact that the control and trapped groups did not differ on responses following the last error.

Temporary-Permanent Memory Interpretation
The memory storage interpretation which we have been considering requires that s = e and that d = 0. The parameter s is identified as the probability of permanently storing the association between the CS and the arousal response. The restriction that d = 0 requires that the association first has to be stored in temporary memory and is only later transferred to permanent memory.

Long Term-Short Term Memory Interpretation
Atkinson and Crothers (1964) have applied a similar long term-short term memory model to paired-associate learning. Applied to avoidance conditioning the long-short model is a special case of the general model given in Eq. 5. The restriction is that the parameters s, e, and d are all equal and identified as the probability of permanently storing the association between the CS and arousal. The only difference between the long-short and temporary-permanent memory models is whether or not the association may be stored directly into long term memory with probability d as soon as it is formed.

Classical Conditioning Interpretation
A stimulus sampling interpretation of classical conditioning, originally suggested by W. K. Estes, is presented in Bower and Theios (1964). This interpretation is consistent with the data from the control and trapped groups and yields a model which is also a special case of the general model. The interpretation requires that the CS-alone and CS-plus-shock be represented as two sets of stimuli with an overlap equal to a proportion p of the CS set. The stimulus pattern consisting of all the elements in the CS-plus-shock set may become conditioned to the running response with probability c on each shock trial. On trials subsequent to the conditioning of running to CS-plus-shock, the rat will make a running response to the CS-alone with probability p since a proportion p of the CS-alone set is conditioned to running. When the rat does make a running response to the CS-alone the response may become conditioned by contiguity with probability s. This interpretation requires that the parameters d and e be equal to zero.

The three interpretations we have considered are similar in that they each have three free parameters. We will now consider an interpretation that has four free parameters.

Dependent Forgetting Model
The interpretations that have been considered all make the prediction that the responses on the intermediate trials between a subject's first avoidance response and last error must be statistically independent. It might be the case in a set of data that this prediction will not be upheld. The general model can be modified to account for sequential dependencies in the intermediate response sequences if we let (1 - e)q' be the probability of remaining in the forgotten state from trial n to n + 1 where q' is not equal to q of the general model. When this modification is coupled with the temporary-permanent memory interpretation, q is identified as the probability of forgetting the association between the CS and arousal following an avoidance response and q' is identified as the probability of forgetting the association following a shock trial. As q and q' diverge in value, an intermediate response would become more and more dependent upon the preceding response.
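Written out, the modification described above changes only the row for state F in Eq. 5. The display below is a sketch under two assumptions of ours: the temporary-permanent restrictions e = s and d = 0 are imposed, and the F-to-T entry is taken to be (1 - s)(1 - q') so that the row sums to one. The paper states only the F-to-F entry explicitly.

```latex
% Dependent forgetting model under e = s, d = 0 (assumed form of the F row).
\[
\begin{array}{c|cccc}
  & P & T            & F         & N     \\ \hline
P & 1 & 0            & 0         & 0     \\
T & s & (1-s)(1-q)   & (1-s)q    & 0     \\
F & s & (1-s)(1-q')  & (1-s)q'   & 0     \\
N & 0 & c(1-q)       & cq        & 1 - c
\end{array}
\]
```

Setting q' = q makes the T and F rows identical, which recovers the memory model of Eq. 2 and with it the independence of successive intermediate responses.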
APPLICATIONS OF THE SUBMODELS TO THE DATA

Independence

The first three of the four interpretations require that the responses on the intermediate trials between a subject's first avoidance and last error should be statistically independent (cf. Suppes and Ginsberg, 1963). The frequencies of the first-order response-response transitions over all trials strictly between a subject's first avoidance and last error for all subjects are given in Table 2. The data for the control and trapped groups have been combined because (a) all the submodels predict that the response sequences following the first response for the two groups should be homogeneous, (b) we were unable to statistically reject that hypothesis earlier, and (c) there are only 32 subjects in the control group and only 16 subjects in the trapped group who have at least two trials strictly between their first avoidance and last error. The χ² = 1.14, df = 1, p > .25, which indicates that the hypothesis of statistical independence cannot be rejected for the entire sample of subjects.

TABLE 2
TESTS OF INDEPENDENCE OF THE RESPONSES OVER TRIALS STRICTLY BETWEEN A SUBJECT'S FIRST AVOIDANCE AND LAST ERROR

Trialᵃ                 Control                Trapped                Combined
n       n+1        Obtained  Expected     Obtained  Expected     Obtained  Expected
error   error         25       26.2          22       17.3          47       43.5
error   avoid         33       31.8           9       13.7          42       45.5
avoid   error         27       25.8          11       15.7          38       41.5
avoid   avoid         30       31.2          17       12.3          47       43.5
Total                115      115.0          59       59.0         174      174.0
χ²                        .21                   6.30                   1.14
df                         1                      1                      1
p                       > .50                  < .02                  > .20

ᵃ Trials n and n + 1 are both strictly between the first avoidance response and the last error.

It was noticed, however, that three of the 96 trapped Ss showed somewhat dependent response sequences in that they had a run of errors or a run of avoidances as long as 5 or 6. In fact, if an independence test is run only on the data of the trapped group the resulting χ² = 6.30 is significant, df = 1, p < .02. However, if the three atypical trapped subjects are removed from the analysis, the hypothesis of independence cannot be rejected, χ² = 2.27, df = 1, p > .13. In an attempt to obtain more information about whether the intermediate responses from the trapped subjects are dependent or independent, 34 more subjects were run under the trapped condition in a later experiment. The obtained frequencies of the four possible response-response transitions on the intermediate trials were 5, 4, 5, and 6, respectively, in the order given in Table 2. These additional results certainly do not add support to the hypothesis that the trapping procedure leads to dependent intermediate responses. Since we are not inclined to make a decision for dependence primarily on the basis of three subjects of a total of 96 trapped subjects, we leave this point to the discretion of the reader. If the true state of affairs is independence, then the first three interpretations are applicable. If the data are dependent, then a model like the fourth interpretation would be applicable.

Stationarity
According to a theorem proved by James Greeno, the general model predicts that the probability of an avoidance response should be constant over all the intermediate trials between a subject's first avoidance and last error and equal to

$$P(I) = \frac{p(1 - s)}{1 - ps - qe}. \tag{6}$$
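As a small numerical illustration of Eq. 6, the sketch below evaluates the predicted proportion for two of the restrictions discussed earlier. The function name is our own. When s = e the expression reduces to p, and when e = 0 it becomes p(1 - s)/(1 - ps); the parameter values are the control-group estimates reported later in Table 5.

```python
def intermediate_avoidance_probability(s, e, q):
    """Eq. 6: probability of an avoidance on trials strictly between the
    first avoidance and the last error, with p = 1 - q."""
    p = 1.0 - q
    return p * (1.0 - s) / (1.0 - p * s - q * e)


# Temporary-permanent memory restriction (s = e): the value equals p = 1 - q.
memory = intermediate_avoidance_probability(s=0.343, e=0.343, q=0.52)

# Classical conditioning restriction (e = 0).
conditioning = intermediate_avoidance_probability(s=0.356, e=0.0, q=0.362)

print(f"memory: {memory:.3f}, conditioning: {conditioning:.3f}")
```

These predictions can be set beside the observed mean proportion of .54 reported in Table 3.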
TABLE 3
PROPORTION OF AVOIDANCE RESPONSES ON TRIALS BETWEEN A SUBJECT'S FIRST AVOIDANCE AND LAST ERROR

             Control Group          Trapped Group          Combined
Trial     Proportion  Cases      Proportion  Cases      Proportion  Cases
1            .52      (44)          .75      (28)          .61      (72)
2            .47      (32)          .50      (16)          .48      (48)
3            .54      (26)          .40      (10)          .50      (36)
4            .71      (17)          .42      (33)ᵃ         .56      (25)
5            .50      (12)                                 .44      (18)
6            .55      (29)ᵃ                                .50      (14)
7                                                          .45      (11)
8                                                          .58      (24)ᵃ
Mean         .54                    .54                    .54
χ²          2.43                   1.66                   3.51
df           5                      3                      7
p          > .70                  > .05                  > .80

ᵃ These cells contain data from all the remaining trials since the expected frequencies of individual trials dropped below 4 at these points.
Table 3 presents the individual proportions on successive intermediate trials for the control, trapped, and combined groups. The three sets of data points are relatively constant and do not differ significantly from each other or the weighted average. This type of analysis is a valid test of stationarity if all subjects in the sample have the same values of the parameters. However, it has been argued (Suppes and Ginsberg, 1963; Prokasy, 1964) that if the subjects are not homogeneous with respect to the parameter values it may be possible to obtain constant data points with this analysis when the responses are in fact not stationary. To forestall this type of argument a Vincent-type stationarity analysis has been performed (cf. Suppes and Ginsberg, 1963). For each rat that had at least two trials between his first avoidance and last error, the response sequence on these trials was divided into fifths, and the proportion of avoidance responses in each fifth was counted. If the process is stationary, the number of avoidance responses in each fifth should be equal within chance limits. On the other hand, most alternative learning models would predict that there should be significantly more avoidance responses in the later fifths than in the earlier fifths. Table 4 presents the Vincentized stationarity analysis for the control and trapped groups. In each case, χ² tests indicate that the hypothesis of stationarity of the Vincentized data points cannot be rejected.
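The division into fifths can be sketched as follows. The paper does not say how sequences whose length is not a multiple of five were split, so the proportional allocation used here (a trial that straddles a boundary contributes to each fifth in proportion to its overlap) is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction


def vincent_fifths(responses, parts=5):
    """Proportion of avoidances in each of `parts` equal segments of a
    response sequence coded 0 = avoid, 1 = error.

    Each trial i occupies the interval [i, i + 1). A trial that straddles a
    segment boundary is shared between the segments by overlap length.
    """
    n = len(responses)
    width = Fraction(n, parts)
    proportions = []
    for j in range(parts):
        lo, hi = j * width, (j + 1) * width
        avoid_weight = Fraction(0)
        for i, response in enumerate(responses):
            overlap = min(Fraction(i + 1), hi) - max(Fraction(i), lo)
            if overlap > 0 and response == 0:
                avoid_weight += overlap
        proportions.append(avoid_weight / width)
    return proportions


# A two-trial intermediate sequence: avoidance followed by an error.
print([float(x) for x in vincent_fifths([0, 1])])
```

Averaging these per-subject proportions over subjects, fifth by fifth, gives group values of the kind reported in Table 4.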
TABLE 4
VINCENTIZED PROPORTION OF AVOIDANCE RESPONSES ON TRIALS BETWEEN THE FIRST AVOIDANCE AND LAST ERROR

Successive fifths     Control     Trapped
1                       .58         .73
2                       .47         .45
3                       .43         .41
4                       .57         .33
5                       .68         .60
Mean                    .55         .51
N                       32          16
χ²                     6.62        7.22        3.80ᵃ
df                      4           4           4ᵃ
p                     > .10       > .10       > .30ᵃ

ᵃ Test of the hypothesis that the two groups do not differ from each other.
Sequential Analysis

In order to make predictions from the submodels, estimates of the parameters c, s, e, and q are needed. Consider all possible sequences of errors and avoidance responses on the first five trials of the experiment. Since all subjects begin with an error, there are 16 possible sequences. All the parameters of a submodel are involved in the theoretical expressions for the expected probabilities of the 16 different sequences. The 16 equations for the general model, which can be derived in a straightforward manner, are listed in the Appendix. In order to estimate parameter values and provide a goodness-of-fit test of the model a Control Data Corporation 1604 computer was used to select values for the parameters which minimized the χ² between obtained and expected frequencies of the 16 sequences for the control group. This estimation technique has been described in detail by Atkinson and Crothers (1964). A consideration of the first five trials of the data provides an exhaustive test in this case since over 72% of the subjects had their last error on or before trial 5 and over 91% of all errors occurred during the first 5 trials. Table 5 presents the obtained and predicted frequencies of each response sequence for each submodel. The table also presents the best estimates of the values of the parameters and the results of the χ² goodness-of-fit tests. In each test, the degrees of freedom are equal to the number of cells for sequences less the number of parameters estimated minus one for the restriction on the number of cases. In computing the χ² values, sequences with an expected value of less than 2.0 were pooled together. Statistically, none of the four submodels can be rejected. It can be seen, however, that the memory model yields the best fit, with the dependent model a close second. It is interesting to note that the long-short and classical conditioning models are giving almost identical predictions, in spite of the fact that the long-short model permits transitions to the absorbing state following intermediate errors while the classical conditioning model does not permit such transitions.

TABLE 5
FREQUENCY OF RESPONSE SEQUENCES ON FIRST FIVE TRIALS FOR THE CONTROL GROUP
(1 = error, 0 = avoid)

Sequence    Data    Memory    Long-Short    Conditioning    Dependent
11111        10      13.2       16.4           16.2           12.6
11110        18      16.3       16.1           16.0           16.1
11101         9       6.6        6.8            6.8            6.8
11100        29      25.9       22.4           22.2           26.0
11011         4       4.2        4.1            4.0            3.8
11010         5       8.0        6.9            7.0            8.9
11001         5       3.9        4.3            4.5            4.0
11000        41      38.2       31.9           31.6           39.3
10111         4       2.0ᵃ       1.9ᵃ           1.8ᵃ           1.5ᵃ
10110         2       3.8        3.1            3.1            3.4
10101         1       1.8ᵃ       1.9ᵃ           2.0ᵃ           1.9ᵃ
10100         7       9.4        6.4            6.6            9.6
10011         1       1.8ᵃ       1.9ᵃ           2.0ᵃ           1.5ᵃ
10010         5       3.5        3.2            3.5            3.6
10001         1       1.7ᵃ       2.0ᵃ           2.3            1.6ᵃ
10000        24      25.7       36.7           36.4           25.4

N           166     166.0      166.0          166.0          166.0
c                    .625       .551           .554           .610
s                    .343       .231           .356           .350
e                    .343ᵇ      .231ᵇ          .000ᵇ          .350ᵇ
d                    .000ᵇ      .231ᵇ          .000ᵇ          .000ᵇ
q                    .52        .492           .362           .520
q'                   .52ᵇ       .492ᵇ          .362ᵇ          .460
χ²                  6.16      14.54          14.81           5.83
df                    9          9             10              8
p                  > .75      > .10          > .10          > .50

ᵃ These cells were pooled together.
ᵇ These values were fixed by theoretical interpretation.

Under the identifications made in each of the submodels, it follows that the control and trapped groups should differ only on the value of the parameter c and that the rest of the parameters should be the same for both groups. As a first test of invariance of the parameters, the values of s, e, and q were fixed at the values estimated for the control group, and the value of c for the trapped group was estimated by minimizing the χ² between obtained and predicted frequencies of the 16 possible response sequences on the first five trials of the experiment. The obtained and predicted frequencies of the 16 response sequences for each submodel are given in Table 6, along with parameter values and measures of goodness-of-fit. Again, none of the four submodels can be rejected on statistical grounds. All the submodels are predicting about equally well; however, the classical conditioning and long-short models are giving slightly better fits to the trapped data. For the trapped group, over 56% of the subjects had their last error by trial five, and over 64% of all errors occurred during the first five trials.

TABLE 6
FREQUENCY OF RESPONSE SEQUENCES ON THE FIRST FIVE TRIALS FOR THE TRAPPED GROUP
(1 = error, 0 = avoid)

Sequence    Data    Memory    Long-Short    Conditioning    Dependent
11111        33      32.5       33.7           33.8           43.8
11110        16      13.1       12.1           12.0           12.0
11101         5       4.0        3.6            3.6            3.5
11100        12      13.0       11.9           11.8           11.0
11011         0       1.7ᵃ       1.6ᵃ           1.5ᵃ           1.2ᵃ
11010         2       3.2        2.7            2.7            2.8
11001         2       1.6ᵃ       1.6ᵃ           1.7ᵃ           1.2ᵃ
11000        12      13.1       12.3           12.2           10.4
10111         0        .5ᵃ        .5ᵃ            .5ᵃ            .3ᵃ
10110         0       1.1ᵃ        .9ᵃ            .9ᵃ            .7ᵃ
10101         0        .5ᵃ        .6ᵃ            .6ᵃ            .4ᵃ
10100         0       2.6        1.8ᵃ           1.9ᵃ           2.0
10011         2        .5ᵃ        .6ᵃ            .6ᵃ            .3ᵃ
10010         8       1.0ᵃ        .9ᵃ           1.0ᵃ            .8ᵃ
10001         0        .5ᵃ        .6ᵃ            .7ᵃ            .3ᵃ
10000         4       7.1       10.6           10.5            5.3

N            96      96.0       96.0           96.0           96.0
c                    .300       .276           .272           .220
χ²                  8.54       6.98           6.85           8.65
df                    7          6              6              7
p                  > .25      > .30          > .30          > .25

ᵃ These cells were pooled together.
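The predicted frequencies in Tables 5 and 6 are sequence probabilities under the general model multiplied by the number of subjects. The paper obtains them from closed-form equations listed in its Appendix; the sketch below instead uses a forward recursion over the transition matrix of Eq. 5, which is an alternative route of our own choosing. The function names, and the treatment of q' as an optional argument affecting only the F row, are assumptions.

```python
from itertools import product


def general_model(c, s, e, d, q, q_prime=None):
    """Transition rows of Eq. 5 in the state order P, T, F, N.

    q_prime is the forgetting probability after a shock trial (row F only);
    it defaults to q, which gives the independent submodels.
    """
    if q_prime is None:
        q_prime = q
    p, p_prime = 1.0 - q, 1.0 - q_prime
    return [
        [1.0, 0.0, 0.0, 0.0],
        [s, (1 - s) * p, (1 - s) * q, 0.0],
        [e, (1 - e) * p_prime, (1 - e) * q_prime, 0.0],
        [c * d, c * (1 - d) * p, c * (1 - d) * q, 1.0 - c],
    ]


ERROR_STATES = {2, 3}  # F and N give errors; P and T give avoidances


def sequence_probability(seq, matrix):
    """Probability of a response string such as '11000' (1 = error),
    with the process starting in state N on trial 1."""
    weights = [0.0, 0.0, 0.0, 1.0]
    for k, symbol in enumerate(seq):
        if k > 0:  # move the process forward one trial
            weights = [sum(weights[i] * matrix[i][j] for i in range(4))
                       for j in range(4)]
        # keep only states whose response matches the observed symbol
        weights = [w if (j in ERROR_STATES) == (symbol == "1") else 0.0
                   for j, w in enumerate(weights)]
    return sum(weights)


memory = general_model(c=0.625, s=0.343, e=0.343, d=0.0, q=0.52)
sequences = ["1" + "".join(tail) for tail in product("10", repeat=4)]
for seq in sequences:
    print(seq, round(166 * sequence_probability(seq, memory), 1))
```

With the memory-model estimates for the control group this gives, for example, about 13.2 expected cases of 11111 and 38.2 of 11000, in agreement with Table 5. A minimum χ² fit would wrap `sequence_probability` in a numerical search over the free parameters, which is not shown here.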
Distribution of Total Errors

As a further test of the model, the values of the parameters which were estimated on the basis of the first five trials were assumed to hold for all trials of the experiment, and predictions were made for a number of random variables defined over all trials. Let T be a random variable which designates the total number of errors made by a subject. The theoretical probability distribution of total errors for the general model is

$$
\Pr(T = i) = [d + (1-d)pf]\,c(1-c)^{i-1} + \frac{(1-d)(1-pf)\,bc}{c-b}\left[(1-b)^{i-1} - (1-c)^{i-1}\right] \tag{7}
$$

for i = 1, 2, 3, ..., where f = s/(q + ps) and b = e + (1 - e)pf. The derivation of the distribution is given in Theios (1965). Obtained and predicted distributions of T for the control and trapped groups are given in Tables 7 and 8 for each of the submodels. Kolmogorov-Smirnov one-sample goodness-of-fit tests indicate that for each submodel, the deviations between obtained and predicted frequencies could have occurred quite easily due to chance sampling under the hypothesis that the data were generated by a process having the properties of the submodel (Birnbaum, 1952). The one-sample goodness-of-fit test is such that the smaller D (the maximum difference between the observed and theoretical cumulative proportions) the more likely that the observed data are random samples from a distribution equal to the theoretical distribution. Again the submodels are predicting the two observed distributions quite well, with the memory model predicting slightly better for the control group and the long-short and conditioning models predicting slightly better for the trapped group.

TABLE 7
FREQUENCY DISTRIBUTION OF TOTAL ERRORS FOR THE CONTROL GROUP

Number of errors    Data    Memory    Long-Short    Conditioning
0                     0       0          0             0
1                    22      24.9       35.4          34.8
2                    46      48.8       37.6          37.3
3                    38      38.0       30.2          30.1
4                    27      24.1       21.7          21.8
5                    19      13.9       14.7          14.9
6                     7       7.7        9.7           9.9
7                     3       4.1        6.2           6.4
8                     2       2.1        3.9           4.1
9                     1       1.1        2.5           2.6
10                    0        .6        1.5           1.6
11                    1        .3        1.0           1.0

N                   166
D                            .035       .081          .079
df                           162        162           162
p                          > .87      > .17         > .17

TABLE 8
FREQUENCY DISTRIBUTION OF TOTAL ERRORS FOR THE TRAPPED GROUP

Number of errors    Data    Memory    Long-Short    Conditioning
0                     0       0          0             0
1                     4       6.9       10.3          10.1
2                    18      15.8       13.1          13.5
3                    16      16.5       13.8          13.6
4                    13      14.3       12.3          12.3
5                    11      11.4       10.4          10.4
6                     9       8.6        8.4           8.4
7                     2       6.4        6.6           6.7
8                     5       4.6        5.1           5.2
9                     6       3.3        3.9           4.0
10                    5       2.4        3.0           3.0
11                    1       1.7        2.2           2.3
12                    3       1.2        1.7           1.7
13                    1        .8        1.2           1.3
14                    1        .6         .9            .9
15                    1        .4         .6            .6

N                    96
D                            .073       .065          .063
df                            94         94            94
p                          > .65      > .81         > .81

Observable States Interpretations

Atkinson and Crothers (1964) and Greeno and Steiner (1964) have shown that certain classes of Markov models which appear to have different structures may lead to identical predictions. In view of this it should be pointed out that the numerical values of the predictions reported in this paper probably could be obtained from related models in which the transition probabilities differed from those dictated by the interpretations which have been considered. The close correspondence between the predictions of the long-short and the classical conditioning models suggests that it may be possible to show that these two submodels are isomorphic.
Greeno and Steiner (1964) have pointed out the desirability of using observable states models because of their mathematical tractability and ease of parameter estimation. For a model to have observable states, it must be possible to determine uniquely, from a knowledge of the entire response sequence of a subject, which state the process was in on every trial. If the states are observable, then the maximum likelihood estimate of a parameter eij specifying a transition from any state i to any other state j is just the number of times the process went from state i to state j divided by the number of times the process was in state i. It is also possible to determine the likelihood of an entire matrix of response sequences under the hypothesis that the data were generated by a process having the properties of the observable states model. Thus, the goodness of fit of observable states models can be easily determined. The simplest observable states model consistent with avoidance conditioning data would have four states and is defined by the following matrix and vectors:
P P T F N
-1 0
i
T
F
N
0
0
0 0
4’
1-Z-q) 0
0
l-q
c
1il iI Pr(avoid)
Pr(start)
.
1-c-d.
01
01 *
(8)
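The maximum likelihood estimate described above (transitions from state i to state j, divided by the number of times the process was in state i) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mle_transitions` and the two state sequences are hypothetical, the sequences are not taken from the data or constrained by Eq. 8, and the count of visits to state i is taken over trials that have a following trial.

```python
from collections import Counter


def mle_transitions(state_sequences):
    """Estimate each transition probability as count(i -> j) / count(i)."""
    pair_counts = Counter()   # number of observed i -> j transitions
    from_counts = Counter()   # number of times state i was left (or re-entered)
    for seq in state_sequences:
        for i, j in zip(seq, seq[1:]):
            pair_counts[(i, j)] += 1
            from_counts[i] += 1
    return {(i, j): n / from_counts[i] for (i, j), n in pair_counts.items()}


# Hypothetical state sequences for two subjects, using the state labels P, T, F, N.
sequences = ["NNNTFTP", "NNFP"]
estimates = mle_transitions(sequences)
for (i, j), value in sorted(estimates.items()):
    print(i, "->", j, round(value, 3))
```

Because every state is identified on every trial, no iterative search is needed; the estimates are simple relative frequencies.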
There is one aspect of avoidance conditioning data which complicates the use of an observable states approach, however. The model above and most other simple observable states models require that the number of errors before the first avoidance (J_0) be distributed geometrically. This is not generally the case in avoidance conditioning. On the other hand, the general model given in Eq. 5 predicts that J_0 is not distributed geometrically, but rather

\[
\Pr(J_0 = i) = [d + (1-d)(1-q)]\,c(1-c)^{i-1} + \frac{(1-d)\,qc(1-a)}{a-(1-c)}\left[a^{i-1} - (1-c)^{i-1}\right] \tag{9}
\]

for i = 1, 2, 3, ..., and a = q(1 - e). The derivation of Eq. 9 may be found in Theios (1965). Tables 9 and 10 present the obtained and predicted distributions of errors before the first avoidance for the simple observable states model and the submodels which we have been considering. As can be seen, the geometric predictions do not describe the data as well as those of the submodels in either the control or the trapped condition. An identifiable states model which would predict a nongeometric distribution of errors before the first avoidance would have to link state occupancy during the initial run of errors directly to trial number. This modification would complicate the model to such an extent that mathematical tractability and psychological interpretation would be severely compromised.
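Eq. 9, like the total-error distribution of Eq. 7, combines a geometric run of errors in the naive state with a possible second geometric run in the intermediate state. The sketch below evaluates both distributions as read here, assuming the coefficient of the bracketed term in Eq. 9 is (1 - d)qc(1 - a)/[a - (1 - c)] and that of Eq. 7 is (1 - d)(1 - pf)bc/(c - b). The function names and the parameter values are hypothetical and are not the estimates reported in the paper.

```python
def pr_total_errors(i, c, d, e, p, s):
    """Pr(T = i) as read from Eq. 7; requires c != b."""
    q = 1.0 - p
    f = s / (q + p * s)
    b = e + (1.0 - e) * p * f
    geometric_part = (d + (1.0 - d) * p * f) * c * (1.0 - c) ** (i - 1)
    mixed_part = ((1.0 - d) * (1.0 - p * f) * b * c / (c - b)
                  * ((1.0 - b) ** (i - 1) - (1.0 - c) ** (i - 1)))
    return geometric_part + mixed_part


def pr_errors_before_first_avoidance(i, c, d, e, p):
    """Pr(J_0 = i) as read from Eq. 9; requires a != 1 - c."""
    q = 1.0 - p
    a = q * (1.0 - e)
    geometric_part = (d + (1.0 - d) * (1.0 - q)) * c * (1.0 - c) ** (i - 1)
    mixed_part = ((1.0 - d) * q * c * (1.0 - a) / (a - (1.0 - c))
                  * (a ** (i - 1) - (1.0 - c) ** (i - 1)))
    return geometric_part + mixed_part


# Hypothetical parameter values, for illustration only.
params = dict(c=0.3, d=0.1, e=0.2, p=0.5)
j0 = [pr_errors_before_first_avoidance(i, **params) for i in range(1, 400)]
t = [pr_total_errors(i, s=0.25, **params) for i in range(1, 400)]

# A geometric distribution has a constant ratio between successive terms;
# the ratios printed here change with i.
ratios = [j0[k + 1] / j0[k] for k in range(3)]
print(sum(j0), sum(t))
print(ratios)
```

Both sums should be 1 up to rounding, and the changing ratios show why a model that forces J_0 to be geometric cannot reproduce this shape.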
TABLE 9
FREQUENCY DISTRIBUTION OF ERRORS BEFORE THE FIRST AVOIDANCE FOR THE CONTROL GROUP

Errors   Data   Memory   Long-Short   Conditioning   Geometric
0          0      0         0            0              0
1         44     49.8      57.1         57.6           62.3
2         58     54.2      47.1         47.1           38.9
3         39     32.5      29.2         29.0           24.3
4         18     16.3      16.1         16.0           15.2
5          4      7.5       8.4          8.3            9.5
6          1      3.3       4.2          4.1            5.9
7          2      1.4       2.0          2.0            3.7
N        166
D                .038      .079         .082           .110
df               162       162          162            162
P               > .68     > .17        > .17          < .05

TABLE 10
FREQUENCY DISTRIBUTION OF ERRORS BEFORE THE FIRST AVOIDANCE FOR THE TRAPPED GROUP

Errors   Data   Memory   Long-Short   Conditioning   Geometric
 0         0      0         0            0              0
 1        14     13.8      16.5         16.7           19.9
 2        16     19.5      18.2         18.2           15.8
 3        17     17.0      15.5         15.4           12.5
 4        16     13.1      12.1         12.0            9.9
 5         6      9.6       9.1          9.0            7.9
 6         9      6.8       6.7          6.7            6.2
 7         1      4.8       4.9          4.9            4.9
 8         4      3.4       3.6          3.6            3.9
 9         3      2.4       2.6          2.6            3.1
10         5      1.7       1.9          1.9            2.5
11         1      1.2       1.4          1.4            2.0
12         2       .8       1.0          1.0            1.5
13         1       .6        .7           .7            1.2
14         1       .4        .5           .5            1.0
N         96
D                .059      .049         .050           .061
df                94        94           94             94
P               > .80     > .94        > .94          > .80
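The statistic D in these tables is the largest gap between the observed and predicted cumulative proportions. A minimal Python sketch follows, using the Data, Memory, and Geometric columns of Table 9. The function name `ks_d` is an assumption of this sketch, and the predicted frequencies are treated as expected counts out of the same N as the data.

```python
from itertools import accumulate


def ks_d(observed, predicted):
    """Largest absolute gap between cumulative observed and predicted proportions."""
    n = sum(observed)
    gaps = [abs(o - p)
            for o, p in zip(accumulate(observed), accumulate(predicted))]
    return max(gaps) / n


# Table 9 (control group), errors before the first avoidance, 0 through 7.
observed = [0, 44, 58, 39, 18, 4, 1, 2]
memory = [0, 49.8, 54.2, 32.5, 16.3, 7.5, 3.3, 1.4]
geometric = [0, 62.3, 38.9, 24.3, 15.2, 9.5, 5.9, 3.7]

d_memory = ks_d(observed, memory)
d_geometric = ks_d(observed, geometric)
print(round(d_memory, 3), round(d_geometric, 3))
```

The two values should land close to the tabled .038 and .110. Small discrepancies are to be expected because the tabled frequencies are rounded to one decimal place.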
CONCLUSIONS
The results and analyses which have been presented indicate that the fine grain of avoidance conditioning data can be accounted for by the general Markov model of Bower and Theios (1964) and Theios (1965) which describes learning as transitions among discrete performance levels. In addition, parameter invariance has been demonstrated across experimental conditions. It was possible to estimate values of two of the three parameters for the control group and to successfully assume that these values held for the trapped group in spite of the fact that the trapped subjects could not make an escape response on error trials while the control subjects could escape.

Once the general model is accepted as a reasonable description of the avoidance conditioning process, theoretical interpretation and physical identification of the states and parameters of the model become of primary interest. Since the differences between the control and trapped groups could be accounted for by differences in the first state-of-conditioning parameter, c, the early psychological interpretation of the conditioning process suggested by Theios (1963) is not tenable. According to that interpretation the organism first learns to be emotionally aroused in the presence of the CS and then later learns the instrumental running response. Since blocking of the running response reduced the value of c, but not e, learning of the instrumental running response must have occurred entirely in the first stage of conditioning.

Two theoretical interpretations of avoidance conditioning seem consistent with the experimental result that only the parameter c was reduced by trapping. The first is a memory interpretation in which:
(1) An association between the CS and an emotional arousal reaction occurs with probability unity following a pairing of CS and shock.
(2) The stimulus for the instrumental running response is proprioceptive feedback from emotional arousal.
(3) A relatively permanent association between the arousal stimuli and the instrumental response is formed with probability c following application of shock.
(4) The association between the CS and the arousal response may suffer temporary extinction or forgetting before it is stored in a relatively permanent memory.

Both the temporary-permanent and long-short submodels are consistent with this interpretation. According to the interpretation, errors following the first avoidance occur because the stimulus for the avoidance response, feedback from arousal, is not present. In a recent unpublished doctoral dissertation, the second author manipulated arousal directly and obtained data favorable to the interpretation.

The classical conditioning submodel leads to an alternative interpretation which is also consistent with the results. The subject first learns with probability c to run to the stimulus complex consisting of the CS-plus-shock. After this has taken place, the subject with probability p may generalize and make a response to CS-alone, because
of the common components of CS-alone and CS-plus-shock. If a response to CS-alone occurs, it may become conditioned to CS-alone with probability s. The question of whether the memory interpretation, the classical conditioning interpretation, or some other interpretation of the general model is better will have to be decided by sets of coordinating experiments in which physical variables related to the interpretations are manipulated. For example, if it could be experimentally arranged so that a subject was emotionally aroused on each trial following the first avoidance, the memory interpretation would predict that there should be no errors after the first avoidance response. Taking the alternative theoretical approach, the classical conditioning interpretation predicts that the value of the parameter p should be a direct function of the physical similarity of the conditioned stimulus (CS) and the unconditioned stimulus (UCS).
APPENDIX
Define a random variable X_n which indicates the response made by a subject on trial n of an experiment:

\[
X_n =
\begin{cases}
0 & \text{if an avoidance response occurs on trial } n \\
1 & \text{if an error occurs on trial } n
\end{cases}
\tag{10}
\]

for n = 1, 2, 3, .... Let

\[
\Pr(hijkm) = \Pr(X_1 = h,\ X_2 = i,\ X_3 = j,\ X_4 = k,\ X_5 = m) \tag{11}
\]

for h, i, j, k, and m = 0, 1. Then for the general form of the model the following 16 equations hold, where a = (1 - e)q:

\[
\Pr(11111) = (1-c)^4 + (1-c)^3 c(1-d)q + (1-c)^2 c(1-d)qa + (1-c)c(1-d)qa^2 + c(1-d)qa^3 \tag{12}
\]

\[
\begin{aligned}
\Pr(11110) ={}& (1-c)^3 cd + (1-c)^3 c(1-d)p + (1-c)^2 c(1-d)q(1-a) \\
&+ (1-c)c(1-d)qa(1-a) + c(1-d)qa^2(1-a)
\end{aligned}
\tag{13}
\]

\[
\Pr(11101) = (1-c)^2 c(1-d)p(1-s)q + (1-c)c(1-d)q^2(1-e)p(1-s) + c(1-d)q^2 a(1-e)p(1-s) \tag{14}
\]

\[
\begin{aligned}
\Pr(11100) ={}& (1-c)^2 cd + (1-c)^2 c(1-d)ps + (1-c)^2 c(1-d)p^2(1-s) \\
&+ (1-c)c(1-d)qe + (1-c)c(1-d)q(1-e)ps + (1-c)c(1-d)q(1-e)(1-s)p^2 \\
&+ c(1-d)qae + c(1-d)qa(1-e)ps + c(1-d)qa(1-e)(1-s)p^2
\end{aligned}
\tag{15}
\]

\[
\Pr(11011) = (1-c)c(1-d)p(1-s)qa + c(1-d)q^2(1-e)p(1-s)a \tag{16}
\]

\[
\Pr(11010) = (1-c)c(1-d)p(1-s)q(1-a) + c(1-d)q^2(1-e)p(1-s)(1-a) \tag{17}
\]

\[
\Pr(11001) = (1-c)c(1-d)p^2(1-s)^2 q + c(1-d)q(1-e)p^2(1-s)^2 q \tag{18}
\]

\[
\begin{aligned}
\Pr(11000) ={}& (1-c)cd + (1-c)c(1-d)ps + (1-c)c(1-d)p^2(1-s)s \\
&+ (1-c)c(1-d)p^3(1-s)^2 + c(1-d)qe \\
&+ c(1-d)q(1-e)ps + c(1-d)q(1-e)p^2(1-s)s \\
&+ c(1-d)q(1-e)p^3(1-s)^2
\end{aligned}
\tag{19}
\]

\[
\Pr(10111) = c(1-d)p(1-s)qa^2 \tag{20}
\]

\[
\Pr(10110) = c(1-d)p(1-s)qa(1-a) \tag{21}
\]

\[
\Pr(10101) = c(1-d)p^2(1-s)^2 q^2(1-e) \tag{22}
\]

\[
\Pr(10100) = c(1-d)p(1-s)qe + c(1-d)p^2(1-s)q(1-e)s + c(1-d)p^3(1-s)^2 q(1-e) \tag{23}
\]

\[
\Pr(10011) = c(1-d)p^2(1-s)^2 qa \tag{24}
\]

\[
\Pr(10010) = c(1-d)p^2(1-s)^2 q(1-a) \tag{25}
\]

\[
\Pr(10001) = c(1-d)p^3(1-s)^3 q \tag{26}
\]

\[
\Pr(10000) = cd + c(1-d)ps + c(1-d)p^2(1-s)s + c(1-d)p^3(1-s)^2 s + c(1-d)p^4(1-s)^3 \tag{27}
\]

TABLE 11
CODED RAW DATA FOR INDIVIDUAL TRAPPED GROUP RATS

Rat   1 0 1 0 1 0 1 0 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
4 3 3 613 211 5 2 4 5 2314121 121 13 1 2 4 3 10 2 5 12 4 10 1 2 9
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
2 2 4 3 3 8 3 1 1 3 226 6 121 2 3 241 4 2 3 1 6
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
121 6 442 4 9 121 11 2 2 7 421 14 1 314121111 8 8 122 63311 8 3 121 6 10
22 23 24 25 26 27 28 29 30 31 32
4 6 121 312 44421 5 45 1 12 2 2 3
54 55 56 57 58 59 60 61 62 63 64
313 311 31111 123 121 4 2 4 9 43 1 2261
86 87 88 89 90 91 92 93 94 95 96
1
10 511 4 5 2 10 121 3 6 61 1 211

Note. The columns represent consecutive runs of errors and runs of avoidance responses. The columns labeled 1 designate runs of errors. The columns labeled 0 designate runs of successes. The entries in the columns give the length of the runs in number of trials. The final criterion run of 10 successes has not been indicated, but it follows the last entry for each rat.
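The run-length coding described in the note can be expanded back into a trial-by-trial response sequence. The Python sketch below assumes, as the note states, that runs alternate starting with a run of errors and that a criterion run of 10 successes follows the last entry. The function name `decode_runs` and the example row are hypothetical; the example row is not taken from the tables.

```python
def decode_runs(run_lengths, criterion=10):
    """Expand alternating error/success run lengths into a 0/1 response sequence.

    1 = error, 0 = avoidance. The first run is a run of errors, and the
    final criterion run of successes is appended at the end.
    """
    responses = []
    for index, length in enumerate(run_lengths):
        value = 1 if index % 2 == 0 else 0  # even positions are error runs
        responses.extend([value] * length)
    responses.extend([0] * criterion)
    return responses


# Hypothetical coded row: 2 errors, 1 avoidance, 3 errors, then criterion.
example = decode_runs([2, 1, 3])
print(example)
print("total errors:", sum(example))
print("errors before first avoidance:", example.index(0))
```

With sequences in this form, statistics such as total errors or errors before the first avoidance reduce to simple counts.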
TABLE 12
CODED RAW DATA FOR INDIVIDUAL CONTROL GROUP RATS

Rat   1 0 1 0 1 0 1 0 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
22221 2 3 2 3 5 31211 1 2 2 2 1112111 3 12 1 4 2 4 3 4 4
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
2 121 3 12111 4 2 2 312 14121 411 2311121 4 121 31312122 2 2 2 3 1131121 4
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
1
2 2 5 2 412 21214 2 4 1 2 4 2 2 3 3 21111 1 114 5
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
2 412 4 13 1 11211 3 114 3111121 213 112 1 1 1 1 241 3 3
36 37 38 39 40 41 125 126 127 128 129 130 131 132 133 134 135
2 I 2 12111 3 221431112 341 1 5 3 1 1 3 5 2 212 1
136 137 138
211 2 4
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
1 1 1 2 1 4 22121 3 2 1 1 1 2 211 3 1 1 1 1 1 1 3 212 342 1 1 1 1 1 1 1 1 2 52211 3 5 211 43 1 3 21111 1 12121 4 1 114 2 1 4 3 2312111 6
103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 154 155 156 157 158 159 160 161 162 163 164 165 166
2 1 2 1 1421133 311 2 2 3 1 1 3 1 2 3 1 3 2 2 1 2 1 3 3 1 1 1 12212 221 321 3 1 3 1 1 3 1 23111 7 221 3
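The five-trial sequence probabilities of the Appendix (Eqs. 12-27) can be checked numerically by stepping a three-state chain through a response sequence. The sketch below is an assumption-laden reading of the general model, inferred from the terms of those equations: the state labels N (naive), I (intermediate), and P (absorbed) are this sketch's own, and the parameter values are hypothetical.

```python
from itertools import product


def sequence_probability(responses, c, d, e, p, s):
    """Probability of a 0/1 response sequence (1 = error, 0 = avoidance)."""
    q = 1.0 - p
    w = {"N": 1.0, "I": 0.0, "P": 0.0}  # every subject starts naive
    for x in responses:
        nxt = {"N": 0.0, "I": 0.0, "P": 0.0}
        if x == 1:
            # Naive state: an error is certain; the subject leaves with probability c.
            nxt["N"] += w["N"] * (1.0 - c)
            nxt["P"] += w["N"] * c * d
            nxt["I"] += w["N"] * c * (1.0 - d)
            # Intermediate state: error with probability q, then absorbed with probability e.
            nxt["P"] += w["I"] * q * e
            nxt["I"] += w["I"] * q * (1.0 - e)
        else:
            # Intermediate state: avoidance with probability p, then absorbed with probability s.
            nxt["P"] += w["I"] * p * s
            nxt["I"] += w["I"] * p * (1.0 - s)
            # Absorbed state: avoidance is certain.
            nxt["P"] += w["P"]
        w = nxt
    return sum(w.values())


# Hypothetical parameter values, for illustration only.
c, d, e, p, s = 0.3, 0.1, 0.2, 0.5, 0.25
q = 1.0 - p
a = (1.0 - e) * q

total = sum(sequence_probability((1,) + rest, c, d, e, p, s)
            for rest in product((0, 1), repeat=4))
eq12 = ((1 - c) ** 4 + (1 - c) ** 3 * c * (1 - d) * q
        + (1 - c) ** 2 * c * (1 - d) * q * a
        + (1 - c) * c * (1 - d) * q * a ** 2 + c * (1 - d) * q * a ** 3)
eq26 = c * (1 - d) * p ** 3 * (1 - s) ** 3 * q
pr_11111 = sequence_probability((1, 1, 1, 1, 1), c, d, e, p, s)
pr_10001 = sequence_probability((1, 0, 0, 0, 1), c, d, e, p, s)
print(total, pr_11111 - eq12, pr_10001 - eq26)
```

The 16 sequences that begin with an error should sum to 1, and the recursion should agree with the closed forms for Pr(11111) and Pr(10001) up to rounding.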
REFERENCES
ATKINSON, R. C., AND CROTHERS, E. J. A comparison of paired-associate learning models having different acquisition and retention axioms. Journal of Mathematical Psychology, 1964, 1, 285-315.
BIRNBAUM, Z. W. Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample size. Journal of the American Statistical Association, 1952, 47, 425-441.
BOWER, G. H., AND THEIOS, J. A learning model for discrete performance levels. In R. C. Atkinson (Ed.), Studies in mathematical psychology. Stanford: Stanford Univer. Press, 1964. Pp. 1-31.
GREENO, J. G., AND STEINER, T. E. Markovian processes with identifiable states: General considerations and application to all-or-none learning. Psychometrika, 1964, 29, 309-333.
PROKASY, W. F. Pattern models, sampling theory, D(A(1 - lO+N)) . . . and some conditioning data. Paper read at Scientific Meetings of the Psychonomic Society, Niagara Falls, Ontario, Canada, Oct. 10, 1964.
SUPPES, P., AND GINSBERG, ROSE. A fundamental property of all-or-none models, binomial distribution of responses prior to conditioning, with application to concept formation in children. Psychological Review, 1963, 70, 139-161.
THEIOS, J. Simple conditioning as two-stage all-or-none learning. Psychological Review, 1963, 70, 403-417.
THEIOS, J. The mathematical structure of reversal learning in a shock-escape T-maze: Overtraining and successive reversals. Journal of Mathematical Psychology, 1965, 2, 26-52.
THEIOS, J., AND DUNAWAY, J. E. One-way versus shuttle avoidance conditioning. Psychonomic Science, 1964, 1, 251-252.

RECEIVED: January 28, 1965