JOURNAL OF EXPERIMENTAL CHILD PSYCHOLOGY 6, 667-676 (1968)

Differential Delay-of-Reward Training and Subsequent Discrimination Learning in Children¹

DONALD J. TYRRELL
Franklin and Marshall College
Four groups of 16 first-grade children were given 40 single-stimulus training trials and then 100 trials on a subsequent two-choice visual discrimination test which employed the previous stimuli as discriminanda. Two groups (D-0; D-10) experienced differential delays (either zero or 10 seconds) of reward following a response to each stimulus during the training trials. One Group (N-0) of nondifferentially trained Ss experienced a zero-second delay of reward, while Ss in the other nondifferential Group (N-10) experienced a 10-second delay, following a response to both stimuli. During the discrimination test the stimuli associated with zero delay in the differential conditions became positive for one group of subjects (D-0) and negative for the other (D-10). The positive discriminative stimuli in the nondifferential conditions were assigned randomly. Response speeds to the stimuli during training were inversely related to the delay of reward associated with that stimulus. Ss in Group D-10 began the discrimination with a significant number of incorrect responses. Their performance over trials, however, increased rapidly, and the overall level of discriminative performance for the differential Groups (D-0 and D-10) was superior to that following nondifferential training (N-0 and N-10). Results were interpreted in terms of a mediating response model.
¹This investigation was supported in part by Public Health Service Fellowship 1-F1-MH-24,469-01 from the National Institute of Mental Health. The author is indebted to Dr. S. L. Witryol for his thoughtful assistance throughout the study. He is also grateful to Mr. Donald Parker, principal, and his excellent staff at Meadowbrook School in Tolland, Connecticut.

Investigators working with such diverse subject populations as octopuses (Mackintosh and Mackintosh, 1963), rats (Lawrence, 1949), and children (Kendler and Kendler, 1962; Zeaman and House, 1963) have found it necessary to postulate “two-stage,” “chaining,” or “mediating response” models of discriminative behavior. These theorists hypothesize some covert response (e.g., perceptual or verbal) which functionally alters the effective stimulus situation and must be acquired by the organism prior to, or simultaneous with, acquisition of the instrumental (goal) response. The present experiment attempts to demonstrate that differential reward-delay characteristics affect the acquisition of this mediational response in young children.

Several independent experiments have demonstrated a relationship between reward delay and performance of children in a discrimination learning situation (e.g., Fagan and Witryol, 1966; Terrell, 1965). The relationships established have been consonant with predictions of one-stage or non-mediational models of discrimination learning (e.g., Spence, 1936). Consequently the reward delay has been assumed to affect, directly, the strength of the instrumental approach response. In the present investigation children were trained on differential reward delays associated with stimuli subsequently presented as discriminanda in two-choice visual discrimination problems. Test performance was hypothesized to indicate whether prediscrimination reward delay experience affects the acquisition of either instrumental or mediational responses transferred to the discrimination tests.

Lipsitt, Castaneda, and Kemble (1959) demonstrated that kindergarten children who had experienced prediscrimination trials, during which subsequent positive and negative stimuli were associated with immediate and delayed reward, respectively, performed at a higher level on the discrimination task than did control subjects who had experienced immediate reward to both stimuli. This facilitation in performance was postulated to result from the positive transfer of differential levels of approach strength established during the differential delay training. The previously immediately rewarded, positive, discriminative stimulus would have a higher level of approach response strength at the start of discrimination than would the negative discrimination stimulus which was associated with a delay of reward during training. The authors suggested an alternative interpretation, that the differential delays during the prediscrimination trials might have produced differential symbolic cues, serving to increase stimulus distinctiveness and, hence, the speed of learning.
If learning a visual discrimination problem does involve acquiring a sequence or chain of two responses as postulated by House and Zeaman (1963), then facilitation of discriminative behavior can be produced by the positive transfer of either the mediational response, or the instrumental response, or both (Shepp and Turrisi, 1966). Experimental conditions similar to those of Lipsitt, Castaneda, and Kemble were employed in the present study in an attempt to assess these alternative transfer possibilities. First-grade children were trained on either differential or nondifferential delays of reward during single-stimulus prediscrimination trials, and were then tested on a two-choice simultaneous discrimination problem. One-half of the differentially trained Ss were tested on a problem in which the immediately rewarded prediscrimination stimulus became positive (Group D-0); the stimulus previously associated with a 10-second
reward delay was correct for the other half (Group D-10). Two nondifferential delay control groups were employed; the first was trained with immediate reward following responses to both stimuli (Group N-0), while Ss in the other group experienced a 10-second reward delay following responses to both stimuli (Group N-10). All Ss experienced a 5-second delay of reward during the discrimination test trials. In a design such as this, the reward delay employed during the discrimination test is crucial. Any difference in delay between groups would have an effect upon discriminative performance. If either a zero or 10-second delay were employed, two groups would have experienced such a change. For these reasons a 5-second test delay was selected so that all Ss experienced a change in delay upon entering the discrimination test. This condition allowed the comparison of performance on a standard test condition as a function of differential versus nondifferential reward delays during training while controlling for unequal reward during the discrimination test. It also made possible the comparison of performance during discrimination between groups changing from zero and 10-second training delays to the 5-second discrimination test reward delay. If the effect of the differential reward delays during training is to produce different levels of approach strength associated with the stimuli, and this difference is transferred to the discrimination problem, positive transfer of this approach strength should facilitate performance for Ss experiencing the immediately rewarded training stimulus as correct during discrimination (Group D-0). Conversely, performance would be retarded for Ss experiencing that stimulus as negative during discrimination (Group D-10).
On the other hand, if the differential training delays facilitate the acquisition of a relevant mediational response which is transferred to the discrimination problem, the differentially trained Ss in both groups (D-0 and D-10) should experience positive transfer of this mediational response, which should facilitate their discrimination performance compared to nondifferentially trained Ss. Finally, the differential delays might produce the acquisition of differential instrumental response strengths and relevant mediational responses during the prediscrimination trials. If this were the case, both differentially trained groups would experience positive transfer of the mediational response from training to the discrimination test. However, while Ss in Group D-0 experience positive instrumental response transfer, Ss in Group D-10 would experience negative instrumental response transfer. The condition of positive mediational response transfer and negative instrumental response transfer that would exist for Ss in Group D-10 is characteristic of a reversal shift, which typically produces an initial depression of discriminative performance, followed by a rapid rise to asymptotic levels.
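The three alternatives outlined above can be laid out side by side in a short sketch. The encoding (+1 for positive transfer, -1 for negative transfer, 0 for no transfer), the account labels, and the function name are illustrative assumptions; only the direction of each prediction is taken from the text, and treating the nondifferential groups as a zero-transfer baseline is likewise an assumption.

```python
# Predicted transfer from training to the discrimination test under each account.
# +1 = positive transfer, -1 = negative transfer, 0 = no transfer (assumed baseline).
GROUPS = ("D-0", "D-10", "N-0", "N-10")
ACCOUNTS = ("instrumental", "mediational", "both")  # hypothetical labels


def predicted_transfer(account):
    """Map each group to an (instrumental, mediational) transfer pair."""
    if account not in ACCOUNTS:
        raise ValueError(f"unknown account: {account!r}")
    predictions = {}
    for group in GROUPS:
        instrumental = 0
        mediational = 0
        if group in ("D-0", "D-10"):  # only differential training is expected to transfer
            if account in ("instrumental", "both"):
                # Zero-delay stimulus is positive for D-0 and negative for D-10.
                instrumental = 1 if group == "D-0" else -1
            if account in ("mediational", "both"):
                mediational = 1
        predictions[group] = (instrumental, mediational)
    return predictions


if __name__ == "__main__":
    for account in ACCOUNTS:
        print(account, predicted_transfer(account))
```

Under the combined account, Group D-10 pairs negative instrumental transfer with positive mediational transfer, which is the reversal-shift pattern described in the text.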
METHOD
Subjects and Design

Ss were 32 boys and 32 girls attending first grade in Tolland, Connecticut. Mean CA was 78 months with a range of 73-84 months and, although no standard test scores were available, teacher ratings yielded an estimate of average intelligence for the group. Each child was assigned to one (N = 16) of four experimental groups in a random fashion with restrictions for age and sex matching. All Ss were given 40 single-stimulus training trials and a subsequent two-choice visual discrimination test.

Training task. Each training trial consisted of the presentation of a single stimulus (from a pair of stimuli) to which S was instructed to respond. Following every response a reinforcement was delivered automatically after the appropriate delay period. Ss in both of the differential reward delay groups (D-0 and D-10) experienced zero delay following a response to stimulus A, and a 10-second delay following a response to stimulus B. The specific pattern corresponding to A and B was counterbalanced over subjects to minimize any systematic stimulus bias. Ss in each of the two nondifferential delay conditions (Groups N-0 and N-10) experienced the same reward delay for both stimuli, either zero or 10 seconds. The test conditions further defined the four experimental groups.

Discrimination test. Ss experienced a 5-second delay of reward following a correct response during the simultaneous, two-choice discrimination test. The positive discrimination stimulus for the first differential delay Group (D-0) was the one previously associated with zero delay during the training trials; Ss in the remaining differential delay Group (D-10) were rewarded for responding to the stimulus previously associated with the 10-second delay. Ss in the two nondifferential delay Groups (N-0 and N-10) had the positive stimulus assigned randomly, with the restriction that each stimulus was positive for one half the Ss in each group.
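The design just described can be summarized in a short sketch that also checks the rationale for the 5-second test delay. The delays and group labels come from the text; the dictionary layout and the names `delay_change` and `groups_with_change` are illustrative assumptions. The change is computed for the stimulus that becomes positive in the test.

```python
# Training delay (seconds) for each stimulus, by group, as described in the text.
# The data layout and all names here are illustrative assumptions.
TRAINING_DELAYS = {
    "D-0": {"A": 0, "B": 10},
    "D-10": {"A": 0, "B": 10},
    "N-0": {"A": 0, "B": 0},
    "N-10": {"A": 10, "B": 10},
}

# Stimulus that becomes positive in the discrimination test. In the
# nondifferential groups it was assigned randomly; both stimuli share one
# training delay there, so the choice of "A" below does not affect the result.
POSITIVE_STIMULUS = {"D-0": "A", "D-10": "B", "N-0": "A", "N-10": "A"}

TEST_DELAY = 5  # seconds; the delay used for all groups during the test


def delay_change(group, test_delay=TEST_DELAY):
    """Signed change in delay for the positive stimulus, from training to test."""
    trained = TRAINING_DELAYS[group][POSITIVE_STIMULUS[group]]
    return test_delay - trained


def groups_with_change(test_delay):
    """Groups whose positive-stimulus delay would differ between training and test."""
    return sorted(g for g in TRAINING_DELAYS if delay_change(g, test_delay) != 0)


if __name__ == "__main__":
    for candidate in (0, 5, 10):
        print(candidate, groups_with_change(candidate))
```

With a 5-second test delay every group experiences a 5-second change (an increase for groups whose positive stimulus was trained at zero delay, a decrease for those trained at 10 seconds), whereas a test delay of 0 or 10 seconds would change the delay for only two of the four groups.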
Apparatus

A three-choice visual display apparatus was used. Mounted on a table between E and S was a 3 × 4-foot gray plywood panel. Centered on the base line 4 in. apart were hinged 3 × 5-in. response doors, constructed so that they remained open after S responded by pushing them. Mounted on each door was a display cell which projected a 1 1/4-in. pattern. E manipulated a control panel enabling him to preselect the pattern (X or +) to be projected upon each door. A trial was initiated when E pushed a single button which illuminated the patterns and started a response timer. At
the instant of response, defined as a +$-in. opening of either door, the stimulus patterns were extinguished, the response timer was stopped, and another timer was activated. The second timer controlled the time between response and reward delivery. The reward (a trinket) was dropped automatically through the open response door.

Procedure

Each S was brought to the experimental room individually, and seated in front of the apparatus. E explained the procedure for the first “game” (prediscrimination trials) and demonstrated by giving two unreinforced presentations. Following these practice trials 40 single-stimulus training trials were administered; each of the two stimuli was presented 20 times with the order of appearance randomized. Only the center response panel was utilized during these trials and a gray panel covered the two end doors. Immediately upon completion of these 40 trials E uncovered the two end response panels, covered the center panel, and introduced S to the discrimination test. The game was described as one in which two “pictures” would be turned on at the same time. The names of the stimuli were not spoken by E. These discrimination trials continued until 20 correct responses were made in a block of 25 trials, or for a total of 100 trials, at which time the game was ended. The position of the positive stimulus was randomized according to the Gellermann series. A noncorrection technique was employed and an incorrect response was followed by a delay equivalent to the delay of reward following a correct response, before the response panel was reset for the following trial. The reinforcer employed was a small plastic “cow” demonstrated to be a preferred reward object for first-grade children (Witryol, Tyrrell, and Lowden, 1965) over massed trials.

RESULTS AND DISCUSSION
Training Trials

A measure of response speed during training was derived by converting response latencies to reciprocal values. Figure 1 presents these response speeds to each stimulus for all four experimental groups plotted in four blocks of five responses to each of the two stimuli. Each subject in the two differential training groups (D-0 and D-10) experienced a different delay of reward following a response to each of the stimuli; therefore, the response speeds to the stimulus associated with the immediate and the 10-second delay are plotted separately. Within the nondifferential groups (N-0 or N-10), the response speeds are plotted separately for the stimulus which was to be positive or negative during the discrimination test for each of these groups. Each point represents the mean response speed for 16 subjects for five responses to one stimulus (a total of 80 measures per point).

FIG. 1. Response speeds to each stimulus plotted in blocks of five responses during training for all experimental groups.

Individual t tests between response speeds associated with each of the two stimuli within each group indicated no significant within-group differences during the first block of five responses to each stimulus. Analysis of response speeds in the last block of five responses indicated significantly greater speeds to the stimulus associated with zero, compared to 10-second, reward delay in both differential training groups (t = 3.96, df = 158, p < .001 and t = 2.28, df = 158, p < .01 for D-0 and D-10, respectively), but no reliable differences in speed existed to the stimuli within each of the nondifferential training groups (N-0 or N-10). To test differences in the last block of responses between nondifferential groups (N-0 versus N-10), response speeds for both stimuli were compared, yielding a statistical significance (t = 14.31, df = 318, p < .001)
favoring immediate reward. These results indicate that response speeds associated with each stimulus do depend upon the delay of reward associated with a response to that stimulus. Within each of the differential groups (D-0 and D-10) and between the nondifferential groups (N-0 versus N-10), the speed of response to the stimulus associated with zero reward delay was more rapid than that to the stimulus followed by a 10-second delay in all comparisons.

One additional analysis was performed on the latencies in the final block of trials to test the effects of having two different reward delays associated with the stimuli in the differentially rewarded groups. For purposes of this analysis the response latencies for all Ss in both differential groups to each delay-associated stimulus were compared to latencies of all Ss in the appropriate nondifferential group. Thus, response speeds of Ss in the immediately rewarded group (N-0) were compared to the latencies of all Ss in both differential groups to the immediately rewarded stimulus, and response latencies in the nondifferential group experiencing the 10-second delay of reward (N-10) were compared to the latencies of Ss in the differential groups to the stimulus associated with the 10-second delay of reward. These analyses attained significance (t = 5.43, df = 318, p < .001 and t = 5.46, df = 318, p < .001, respectively) and indicate that terminal response speeds were faster to immediately rewarded stimuli in nondifferential groups than in differential groups and were slower to the 10-second delayed reward stimuli in the nondifferential groups than in the differential groups. This replicates a similar effect noted by Lipsitt et al. (1959) and may be attributed to the presence of generalization which would affect the response speeds in the differential groups.

Discrimination Test
The effect of the differential training conditions upon discrimination test performance was tested by an analysis of the number of correct responses over trials for all four experimental groups. For purposes of this analysis those Ss who reached the criterion of 20 correct responses in a block of 25 trials prior to 100 trials were given a continuing score equal to the score they obtained on the last 20 trials of their criterion runs. The results of a split-plot double classification analysis of variance of the number of correct responses over trials provided a test for the effect of differential versus nondifferential training, as well as a test for any possible contrast effects of switching from a training delay (either zero or 10 seconds) associated with the positive discriminative stimulus, to the delay of 5 seconds associated with it during the discrimination test. The main plot analysis comparing test performance as a function of
differential versus nondifferential training approached significance (F = 2.86, df = 1,60, .10 > p > .05). This effect provides partial support for the prediction of superior discrimination performance following differential training conditions enhancing the transfer of a relevant mediating response. The main effect of changing the delay of reward (from either zero or 10 seconds) associated with a stimulus during training to the 5-second delay experienced during discrimination was not significant (F < 1, df = 1,60, p > .10); however, the interaction of Training Condition × Delay Change was significant (F = 6.45, df = 1,60, p < .05) as was the triple interaction of Training Condition × Delay Change × Trials (F = 2.43, df = 4,240, p < .05). The overall trials effect was highly significant (F = 48.45, df = 4,240, p < .01).

In order to understand the meaning of these significant interactions a series of simple split-plot analyses of variance were computed. These revealed no significant differences between the two nondifferential training groups; however, the difference between the differential groups reached significance (F = 4.76, df = 1,30, p < .05) as did the interaction of these Groups × Trials (F = 4.41, df = 4,120, p < .01). These results, combined with an inspection of Figure 2, which presents the mean correct responses over trial blocks for the two differential groups and the combined nondifferential groups, demonstrate that Group D-0 was superior to all groups, and that Group D-10, although initially inferior to all groups, rapidly surpasses all but Group D-0. The initial inferiority of performance in Group D-10 is also indicated by an analysis of the distribution of responses on the first discrimination trial. The obtained distribution of the number of Ss making a correct response shows a significant deviation from chance expectations, with 12 and 1 correct response in Groups D-0 and D-10, respectively. It can be concluded then that the initial probability of responding to the positive stimulus is affected by the differential delay conditions. When the positive discriminative stimulus was previously associated with a zero-second delay there was a higher probability of making a correct response than when the 10-second delay stimulus became positive. Impairment in initial performance for Group D-10 probably attenuated the overall superiority of the differential conditions, and the rapid increase in performance would account for the interactions noted earlier.

FIG. 2. Mean number of correct responses during discrimination test plotted in blocks of 20 trials.

SUMMARY
The superior level of discrimination test performance produced by differential training (D-0 and D-10), the initial retardation of discriminative performance in Group D-10, and the rapid rise to asymptote in the latter were interpreted as supporting the hypothesis that differential delay training increased the probability of making relevant mediating, as well as instrumental, responses. Thus, the transfer of a relevant mediating response from training to test produced an overall superiority in discriminative test performance following the differential, compared to nondifferential, training. Furthermore, the discrepancy in discrimination performance between groups experiencing differential training (D-0 and D-10) was explained by the transfer of a high probability of responding to the positive cue for Ss in Group D-0, and a low response probability to the correct cue by Ss in D-10, on the initial discrimination test trials.

Although it is impossible, from these results, to specify the exact nature of the mediating response acquired following differential training, recent evidence offers cogent arguments for an attentional or observing response such as postulated by Zeaman and House (1963). Witryol, Lowden, and Fagan (1967) have demonstrated that differential reward training can influence observing response probabilities in discrimination learning. It seems possible that reward value and reward delays influence an attentional mediating response in similar, if not identical, fashion.

REFERENCES

FAGAN, J. F., AND WITRYOL, S. L. The effects of instructional set and delay of reward on children’s learning in a simultaneous discrimination task. Child Development, 1966, 37, 433-438.

KENDLER, H. H., AND KENDLER, T. S. Vertical and horizontal processes in problem solving. Psychological Review, 1962, 69, 1-16.

LAWRENCE, D. H. Acquired distinctiveness of cues: I. Transfer between discrimination on the basis of familiarity with the stimulus. Journal of Experimental Psychology, 1949, 39, 770-784.

LIPSITT, L. P., CASTANEDA, A., AND KEMBLE, J. Effects of delayed reward pre-training on discrimination learning of children. Child Development, 1959, 30, 273-278.

MACKINTOSH, N. J., AND MACKINTOSH, J. Reversal learning in Octopus vulgaris Lamarck with and without irrelevant cues. Quarterly Journal of Experimental Psychology, 1963, 15, 236-242.

SHEPP, B. E., AND TURRISI, F. D. Learning and transfer of mediated responses in discriminative learning. In N. R. Ellis (Ed.), International Review of Research in Mental Retardation. New York: Academic Press, 1966, Pp. 85-121.

SPENCE, K. W. The nature of discrimination learning in animals. Psychological Review, 1936, 43, 427-449.

TERRELL, G. Delayed reinforcement effects. In L. P. Lipsitt and C. C. Spiker (Eds.), Advances in Child Development and Behavior. New York: Academic Press, 1965, Pp. 127-158.

WITRYOL, S. L., TYRRELL, D. J., AND LOWDEN, L. M. Development of incentive values in childhood. Genetic Psychology Monographs, 1965, 72, 201-246.

WITRYOL, S. L., LOWDEN, L. M., AND FAGAN, J. F. Incentive effects upon attention in children’s discrimination learning. Journal of Experimental Child Psychology, 1967, 5, 94-108.

ZEAMAN, D., AND HOUSE, B. J. The role of attention in retardate discrimination learning. In N. R. Ellis (Ed.), Handbook of Mental Deficiency. New York: McGraw-Hill, 1963, Pp. 159-223.