Memory facilitation by posttrial hypothalamic stimulation and other reinforcers: A central theory of reinforcement

Biobehavioral Reviews, Vol. 1, pp. 143-150, 1977. Copyright © ANKHO International Inc. All rights of reproduction in any form reserved. Printed in the U.S.A.

Memory Facilitation by Posttrial Hypothalamic Stimulation and Other Reinforcers: A Central Theory of Reinforcement

JOSEPH P. HUSTON, CORNELIA C. MUELLER AND CESARE MONDADORI

Institute of Pharmacology, University of Zürich, Gloriastrasse 32, 8006 Zürich, Switzerland

(Received 15 June 1977)

HUSTON, J. P., C. MUELLER AND C. MONDADORI. Memory facilitation by posttrial hypothalamic stimulation and other reinforcers: a central theory of reinforcement. BIOBEHAV. REV. 1(3) 143-150, 1977. - A central theory of reinforcement is presented, which is based on the assumption that reinforcers act directly on dynamic memory processes. One prediction from the model is that learning of various tasks should be improved as a result of reinforcement presented during the period of short-term memory. To test this hypothesis the reinforcers food, water or electrical stimulation of the lateral hypothalamus were presented posttrial in diverse learning situations. Posttrial food reinforcement facilitated passive avoidance learning in mice. In rats 30 sec of posttrial reinforcing brain stimulation facilitated learning of a shuttle-box avoidance, step-down avoidance, and small box avoidance. Appetitive T-maze learning was improved with posttrial reinforcing brain stimulation contingent on errors. Learning of a conditioned taste aversion was not influenced by reinforcing brain stimulation presented during the CS-UCS interval.

Hypothalamic stimulation

Short-term memory

Reinforcement

Brain stimulation

THIS paper presents a central theory of reinforcement based on hypothetical memory processes, followed by a review of experimental results of relevance for the theory.

Supported by SNSF grant No. 3, 6610.75. This manuscript is also published in the Proceedings of the 5th International Neurobiology Symposium, Magdeburg, GDR, June, 1977.

A MEMORY PROCESSING THEORY OF REINFORCEMENT

In the context of traditional reinforcement theory memory is often considered as something that is somehow determined by and, thus, follows reinforcement. This has usually led to a separation between theories of memory and theories of reinforcement. The following account integrates the concepts of memory and reinforcement by invoking hypothetical memory processes to account for the action of reinforcers on behavior (see also [16,17]).

We can start with the proposition that reinforcers act on memory processes. This simply means that reinforcing events prevent memory traces from fading. In summary, reinforcers will be considered to prevent an immediate memory trace from fading, and thus, to establish a short-term memory trace, which, if it persists, will create a long-term memory trace.

In the simple operant conditioning situation the reinforcer is said to act backwards in time on the preceding response or the stimulus-response connection, which is thus strengthened as a result. We can go further and hypothesize that the reinforcer acts on a memory of the response or of the stimulus-response contiguity. This requires the assumption that any response or operant (or discriminative stimulus-operant behavior contiguity) leaves an immediate memory trace, which normally fades in a short time unless it is strengthened or prolonged by some extra critical event. This critical event is the presentation of a reinforcer. The reinforcer strengthens or prolongs this immediate memory to the extent that it is presented close in time to the response. Thus, the closer the reinforcer occurs in time to the response, the less likely it is that the memory has faded (the stronger the still existing memory trace), and the more likely that the trace can be maintained at a higher level, and thus, be established as a short-term memory. Figure 1 schematically shows the relationship between the delay of reinforcement and the hypothetical decay function of an immediate memory trace (arbitrarily drawn to be convex, rather than linear or concave). It shows that an immediate reinforcer (R1) will establish a more prominent short-term memory trace than a delayed reinforcer (R2). Such a temporal gradient of strength of reinforcement has long been supposed from studies dealing with the effects of delayed reinforcement [32,37]. Hence, according to our model, a reinforcer strengthens memory simply by ensuring that it persists in time and enters a more prolonged, but labile state, which traditionally is called short-term memory storage.

FIG. 1. An operant response (in contiguity with a discriminative stimulus situation) leaves an immediate memory trace. A reinforcer (reward) applied during this period (R1 and R2) will prolong the trace to establish short-term memory (STM), which, depending on its intensity and duration, will determine the level of long-term memory (LTM). It follows that a reinforcer (R3) presented during the STM period could facilitate LTM by prolonging the STM trace.

The concept of a labile short-term memory storage has been characterized and operationalized by a large body of evidence [25] showing that within a restricted period of time after a learning trial an electroconvulsive shock (or other treatments, including brain stimulation and drugs) will prevent learning (i.e. will lead to a retrograde amnesia for the task). Thus, during a limited period of time after a learning trial the memory trace is not yet permanent or fixed (or consolidated) into a long-term memory. (The period of long-term, or permanent memory storage subsequent to short-term storage can be defined, for example, by the relative failure of post-trial amnestic treatments to influence recall). Since during the presumed short-term storage a memory trace is easily disrupted by convulsions and other interventions, it is widely held that memory during this time is coded in the form of dynamic physiological events (e.g. some form of persisting electrical activity in the brain). It can be assumed that the extent to which a short-term memory trace may be transformed into a more permanent trace depends on its magnitude and persistence; the longer it endures and the more intense it is, the more likely it will lead to "fixed" structural or chemical changes, and thus, a more permanent storage, which is less prone to disruption.

The question arises as to how a reinforcer could act to strengthen or maintain memory traces. A neurophysiological foundation for such a process of prolongation of a memory trace is provided by evidence that neural events can be operantly conditioned. Electrical activity of the brain (including activity of single neurons, evoked potentials, and components of the gross EEG) can be either maintained or modified by contingent reinforcement [2, 11, 13, 33, 39]. Since short-term memory traces, by reason of their lability, are widely conceived of as being coded in the form of active neurophysiological activity, it follows that short-term memory traces can be operantly conditioned (can be made to persist by contingent reinforcement). Hence, a reinforcer contingent on a given pattern of neural activity (memory) will prolong this pattern, or decrease the likelihood that it will fade or be disrupted. Such considerations provide a physiological rationale for the theory that reward acts by maintaining or prolonging memory traces.

If reinforcers act by strengthening immediate memory (i.e. by causing memory traces to persist in time), it follows that a reinforcer should have a similar direct facilitating effect on any form of dynamic memory, including short-term memory. A reinforcer presented during the posttrial period of labile short-term storage (Fig. 1, R3) should increase the likelihood that the memory trace will persist, and thus the probability and degree of its fixation in a long-term store. Hence, a reinforcer closely contingent on a stimulus-operant contiguity brings the memory into short-term storage; a reinforcing state of affairs during short-term storage should also facilitate the fixation of the trace into the long-term store.

It follows from the memory-facilitation model of reinforcement that it should be possible to facilitate other kinds of learning by presenting reinforcers during the immediate- or short-term memory phases, irrespective of whether the memory was established by operant conditioning, avoidance conditioning or any other memory paradigm, including imprinting. (Further implications of this model are presented elsewhere [16].)

EFFECTS OF POSTTRIAL REINFORCEMENT

We have carried out a number of experiments to test the above hypothesis that reinforcers act directly on memory processes. To examine the effects of posttrial reinforcement, avoidance tasks were preferred to appetitive tasks, since a facilitating effect of a posttrial reinforcer in an appetitive learning situation could be interpreted simply in terms of additive effects of positive reinforcers, whereas a facilitation of avoidance learning as a result of posttrial positive reinforcement would be rather counterintuitive from conventional considerations regarding the action of reinforcers; i.e. a reinforcer following a punishing event would be expected to alleviate instead of enhance the effects of said punishment; whereas from the standpoint of our theory a positive reinforcer presented during the labile memory for an avoidance task should facilitate learning of the task. Accordingly, our first studies dealt with the effects of posttrial reinforcement on avoidance learning.
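The temporal gradient described in the theory section (an earlier reinforcer acts on a stronger, less faded trace) can be made concrete with a small numerical sketch. This is an illustration only: the exponential form, the time constant `tau_sec`, and the function names are assumptions introduced here, since the paper states only that the decay curve in Fig. 1 is drawn arbitrarily.

```python
import math


def trace_strength(t_sec, tau_sec=10.0):
    """Hypothetical strength of the immediate memory trace t_sec after the response.

    Exponential decay and tau_sec are illustrative assumptions, not values
    taken from the paper.
    """
    if t_sec < 0:
        raise ValueError("time after the response must be non-negative")
    return math.exp(-t_sec / tau_sec)


def stm_level(reinforcer_delay_sec, tau_sec=10.0):
    """Level at which a reinforcer arriving after the given delay holds the trace.

    In this toy model the reinforcer simply maintains whatever trace strength
    is still present when it arrives.
    """
    return trace_strength(reinforcer_delay_sec, tau_sec)


# An immediate reinforcer (R1) versus a delayed one (R2); delays are made up.
early = stm_level(1.0)
late = stm_level(8.0)
print(f"R1 (1 sec delay): {early:.2f}   R2 (8 sec delay): {late:.2f}")
```

Under any monotonically decaying trace the same ordering holds, so the choice of an exponential affects only the numbers, not the qualitative gradient.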

(A) POSTTRIAL FOOD OR WATER REINFORCEMENT

Facilitation of Passive Avoidance Learning in Mice with Posttrial Food Reinforcement

The first study to propose as well as to demonstrate a facilitation of avoidance learning with posttrial reinforcement employed a one-trial step-down avoidance task with mice [18]. Upon stepping off a round platform onto a grid floor and getting a footshock (2 mA, 1 sec) the animals received 1 min access to food at various times after the footshock. After 24 hr they were retested on the platform. The lower curve of Fig. 2 summarizes the results from the first study. It shows that the step-down latencies were longer (learning was superior to controls) in the groups given food reinforcement within 20 and 50 sec after the learning trial. This experiment was replicated successfully several times. The upper curve of Fig. 2 shows the results of a replication with somewhat different procedures by Mondadori et al. [27]. Significantly better learning was exhibited by the groups given posttrial access to food 10 and 30 sec after the footshock (1 mA, 1 sec). Virtually the same results were obtained in another (unpublished) study in which the mice were directly placed onto the grid floor and given a footshock prior to being tested for step-down avoidance learning 24 hr later. In a further replication [17] facilitation of learning was obtained with posttrial food reinforcement presented within 50-60 sec after the footshock. The phenomenon of a time-dependent facilitation of avoidance learning with posttrial food reinforcement seemed robust enough to warrant the conclusion that the hypothesis outlined above (that reinforcement acts directly on memory processing) is tenable and worthy of further investigation.

It should be noted that the results are rather counterintuitive from the standpoint of conventional considerations of the contingencies involved; i.e. it might be expected that a punishment followed by a reinforcer should lead to poorer, rather than improved learning, unless the reinforcer acted directly on the memory processing of the behavior-punishment contingency, as predicted from the model above.

FIG. 2. Mean step-down latencies (SDL) of experimental (posttrial food reward) and control (C) animals. Lower curve: groups were determined on basis of time of access to food, irrespective of onset of feeding. Upper curve: groups determined on basis of exact time of onset of feeding.

Effects of Posttrial Drinking on Passive Avoidance Learning (Step-Down) in Rats

The following study (unpublished) was performed to test the influence of posttrial liquid reinforcement on learning of a step-down avoidance task in rats. Prior to the experiment the rats were handled daily for 2 weeks, and during the second week were brought under stimulus control of a water bottle; i.e., after 48 hr of water deprivation they had access to water from a bottle 4 times per day, each time for 2 min, so that they learned to commence drinking within a few seconds after presentation of the water bottle. Then, after 24 hr of further water deprivation, they were given a trial in the step-down apparatus. The apparatus consisted of a large box (50 × 50 × 37 cm) with an electrifiable grid floor and an adjoining smaller box containing the platform (10 × 11 × 2 cm high). The 1 mA footshock was 1 sec in duration. Subsequent to the footshock the various groups were given 2 min of access to water at either 10, 20, 40, 80 or 160 sec delays (18 animals per group). The control animals received 2 min access to water just before the step-down experiment. They were all retested 24 hr later. Figure 3 summarizes the results. There were no significant differences between any of the groups, although there is an apparent tendency towards superior learning for the 10 sec delay group. Therefore, the results of this study are ambiguous with respect to the question of facilitation of avoidance learning by posttrial reinforcement.

FIG. 3. Step-down latencies (SDL) for experimental (2 min of posttrial drinking) and control (pretrial drinking) rats in step-down passive avoidance task.


(B) POSTTRIAL REINFORCING LATERAL HYPOTHALAMIC STIMULATION

Facilitation of Shuttle-Box Avoidance Learning

The first evidence for a facilitation of avoidance learning with posttrial reinforcing brain stimulation was provided by our previous study [26] and an (unpublished) replication. The preparation in the present and all the other brain stimulation studies described below consisted of adult male albino Sprague-Dawley rats implanted with bipolar 0.2 mm diam. stainless steel electrodes into the lateral hypothalamus (AP 5.4; H -2.6; L 1.6; [30]). The minimal current level that still maintained self-stimulation with 0.2 sec trains of 100 Hz sinusoidal stimulation ranged from 20-80 µA, rms.

The animals were trained in a one-way step-through active avoidance task. The learning apparatus consisted of a white and an adjoining black chamber (18 × 25 × 20 cm each) with electrifiable grid floors. The footshock was a scrambled 1 mA, 3 sec duration current. A trial consisted of placing an animal into the black chamber. It had to enter the white chamber within 10 sec to avoid the footshock. After 23 trials, all animals reliably performed the avoidance response. For the main experiment they were divided into an experimental and a control group and subjected to a reversal learning procedure; i.e. they were placed into the dark compartment where they immediately performed the well established step-through avoidance. As soon as they entered the previously safe white box they received a footshock. They were then removed and in another chamber the experimental animals received 30 sec rewarding brain stimulation (0.2 sec on/0.8 sec off) 30 sec after the footshock. The controls were handled identically, but received no stimulation. Retest performance, i.e., step-through latency from the black to the white chamber, was measured after 24 hr. A trial was terminated when retest latency exceeded 150 sec.

The results are presented in Fig. 4. Whereas the controls did not show learning (mean step-through latency 9 sec), the experimental animals showed a significant increase in step-through latency (mean 59 sec). The difference between the two groups at retest was statistically significant. Hence, posttrial hypothalamic stimulation facilitated learning to avoid the previously safe compartment. From conventional considerations of the contingencies involved in this experiment one would expect that a reward following a punishment would weaken the effects of the punishment, and lead to poorer instead of improved learning. A facilitation of learning, as found above, however, would be expected if the reward had its effects on the memory processing of the behavior-punishment contingency.

FIG. 4. Step-through latencies of 8 control (no stimulation) and 7 experimental (30 sec posttrial hypothalamic stimulation) rats to enter the goal compartment. Trial A: mean latencies after 7 days of training to avoid shock in the start compartment. Trial B: mean latencies in first trial after receiving a footshock in the previously safe compartment.

Figure 5 shows the unpublished result of a similar study. Virtually the same procedure was used, except that an additional group was introduced which received the posttrial reinforcing hypothalamic stimulation at a 300 sec delay. Since the animals did not learn to withhold the conditioned avoidance response so well as in the above study, they were tested over 6 retest trials (spaced 24 hr apart). The footshock plus the posttrial treatments were administered after each trial, unless the avoidance response was learned to a criterion of 150 sec. The 30 sec delay group performed significantly better than the other two groups (see Fig. 5). Hence, 30 sec delayed posttrial reinforcing stimulation facilitated avoidance learning as in the above study. However, 300 sec delayed posttrial stimulation did not facilitate learning, suggesting that the influence of reinforcement on short-term memory processes is time-dependent.
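The "percent learners" measure used for the second shuttle-box study (animals reaching the 150 sec retest criterion) can be computed as in the following minimal sketch. The function name and the latency values are hypothetical and serve only to show the arithmetic; they are not data from the paper.

```python
CRITERION_SEC = 150  # retest criterion stated in the text


def percent_learners(latencies_sec, criterion_sec=CRITERION_SEC):
    """Percentage of animals whose retest latency reached the criterion."""
    if not latencies_sec:
        raise ValueError("at least one latency is required")
    learners = sum(1 for latency in latencies_sec if latency >= criterion_sec)
    return 100.0 * learners / len(latencies_sec)


# Made-up latencies for one group of 8 animals; trials end at the criterion,
# so no value exceeds 150 sec.
hypothetical_group = [150, 42, 150, 9, 150, 87, 150, 150]
print(percent_learners(hypothetical_group))  # 5 of 8 animals reach criterion
```

Because trials are terminated at the criterion, the measure depends only on how many animals reach it, not on how long they would otherwise have waited.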

FIG. 5. Percent learners (retest latencies of ≥ 150 sec) of two experimental (stimulation 30 or 300 sec after shock) and one control group (sham stimulation after 30 sec) during 6 days (6 trials) of avoidance learning.

Facilitation of Passive Avoidance Learning (2-Compartment Test)

To examine the effects of posttrial reinforcing hypothalamic stimulation on passive avoidance learning the 2-compartment test was used [4,22]. The apparatus consists of a little box (21 × 13 × 16 cm), which adjoins a larger chamber (50 × 50 × 37 cm). In this task the rat's tendency to spend most of its time inside the small box is reduced subsequent to the animal's receiving inescapable electrical footshock in the small chamber.

Forty-six rats, in which electrical stimulation of the lateral hypothalamus was reinforcing, were used. The animals were given 3 baseline trials (one 5 min trial per day) during which the entries and time spent in the little box were recorded. They were then assigned to 3 groups: to 2 experimental groups (15 rats per group) that received 30 sec of reinforcing stimulation (0.2 sec on/0.8 sec off) either immediately or 30 sec after the footshocks in the little box; and a control group (n = 16) that was handled exactly as the 30 sec posttrial stimulation group, but received no stimulation. The following procedure was then repeated 3 times: The animals were placed into the small box and administered inescapable scrambled footshock (1 mA, 3 sec duration). The respective posttrial treatments were then administered. Twenty-four hr later they were tested in the same manner as during the baseline trials.

The experimental group that received the 30 sec delayed posttrial reinforcing stimulation performed significantly better than both the immediate reinforcement group and the control group (p < 0.01 and p < 0.02, respectively; Krauth test for time-effect curves, Krauth [20]). Figure 6 illustrates this finding by showing the percentage of learners (rats that never entered the small box during the 5 min testing periods) for the 3 groups over 3 trials. As in the shuttle-box experiments described above, the result can be interpreted in terms of a direct facilitating effect of posttrial reinforcement on short-term memory processes.
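The intermittent stimulation schedule reported for these studies (30 sec of stimulation delivered as 0.2 sec on/0.8 sec off) implies a fixed number of trains and a fixed total on-time. The sketch below works this out in integer milliseconds; the function name is hypothetical, and it assumes the 30 sec window holds whole on/off cycles starting with an "on" phase, which the paper does not state explicitly.

```python
def train_onsets_ms(total_ms=30_000, on_ms=200, off_ms=800):
    """Onset times (ms from the start of stimulation) of each train.

    Assumes whole on/off cycles beginning with an 'on' phase.
    """
    period_ms = on_ms + off_ms
    n_trains = total_ms // period_ms
    return [i * period_ms for i in range(n_trains)]


ON_MS, OFF_MS = 200, 800
onsets = train_onsets_ms(30_000, ON_MS, OFF_MS)
total_on_ms = len(onsets) * ON_MS
duty_cycle = ON_MS / (ON_MS + OFF_MS)

print(len(onsets), "trains;", total_on_ms / 1000, "sec of current;",
      "duty cycle", duty_cycle)
```

Under these assumptions the animals receive 30 trains and 6 sec of actual current within the 30 sec period, which is the quantity to keep in mind when comparing the intermittent schedule with continuous stimulation in the step-down study.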

Facilitation o f Passive Avoidance Learning (Step-Down Task] A n o t h e r u n p u b l i s h e d s t u d y p r o v i d e s f u r t h e r e v i d e n c e for p o s t t r i a l f a c i l i t a t i o n o f passive a v o i d a n c e l e a r n i n g w i t h r e i n f o r c i n g h y p o t h a l a m i c s t i m u l a t i o n . In the p r e s e n t s t u d y the same a p p a r a t u s as in t h e p o s t t r i a l liquid r e i n f o r c e m e n t s t u d y was used. However, i n s t e a d of receiving f o o t s h o c k c o n t i n g e n t u p o n s t e p p i n g off the p l a t f o r m , the a n i m a l s were initially d i r e c t l y placed o n the grid f l o o r w h e r e t h e y received the f o o t s h o c k . To test a v o i d a n c e of the grid f l o o r 24 h r l a t e r t h e y were placed o n t o the p l a t f o r m . O p t i m a l s e l f - s t i m u l a t i o n p a r a m e t e r s were d e t e r m i n e d for 29 male Sprague-Dawley rats i m p l a n t e d w i t h e l e c t r o d e s in the lateral h y p o t h a l a m u s .

5

sec

controt

I

I

2

3

trial FIG. 6. Percentage of animals which successfully avoided little box in which they were shocked (total avoidance over the 5 min trial), over 3 trials. Experimental groups received 30 sec of reinforcing brain stimulation either 30 sec (thick solid line) or < 5 sec (thin solid line) after each footshock. The controls received sham stimulation 30 sec after each footshock. E x p e r i m e n t a l g r o u p I (n = 9) received 30 sec delayed p o s t t r i a l s t i m u l a t i o n i n t e r m i t t e n t l y (0.2 sec o n / 0 . 8 sec off) for 3 0 sec d u r a t i o n . E x p e r i m e n t a l group II (n = 10) was t r e a t e d t h e same w i t h c o n t i n u o u s s t i m u l a t i o n . The c o n t r o l group (n = 10) was h a n d l e d i d e n t i c a l l y w i t h o u t b e i n g given stimulation. On t h e basis of the p r e v i o u s e x p e r i m e n t s it was a s s u m e d t h a t t h e 30 sec d e l a y e d p o s t t r i a l d i s c o n t i n u o u s s t i m u l a t i o n group w o u l d s h o w f a c i l i t a t i o n of m e m o r y a n d t h e r e f o r e learn b e t t e r t h a n the c o n t r o l group. It was also e x p e c t e d t h a t t h e c o n t i n u o u s s t i m u l a t i o n m i g h t have some aversive p r o p e r t i e s a n d t h u s i n h i b i t p e r f o r m a n c e of the l e a r n i n g task; a n d t h e r e b y a c c o u n t for the divergent results s u m m a rized in p a r t C below. T h e s p o n t a n e o u s s t e p - d o w n b e h a v i o r was m e a s u r e d in t h r e e baseline trials. The a n i m a l s were p u t i n t o the p l a t f o r m b o x a n d a f t e r 10 sec the guillotine d o o r was raised and the s t e p - d o w n l a t e n c y ( S D L ) was m e a s u r e d . In the following learning trial t h e animals were p u t directly o n t o the grid f l o o r o f t h e large b o x and a 1 m A s c r a m b l e d f o o t s h o c k was a d m i n i s t e r e d for 1 sec. 
T h e n the animals were r e m o v e d f r o m t h e l e a r n i n g s i t u a t i o n and s t i m u l a t e d or sham stimulated in a separate c o n t a i n e r 30 sec a f t e r the f o o t s h o c k for the d u r a t i o n o f 30 sec a c c o r d i n g to the group t r e a t m e n t . T h e retest trial was carried o u t 24 h r later. As in t h e baseline trial the animals were p u t i n t o t h e p l a t f o r m b o x ,

148

H U S T O N , M U E L L E R AND M O N D A D O R I

80 exp. I exp. II control

6O u Ca I/1 v

_J a UO c

40

E

20

-<3 I I*2

I

I

3*4

5*6

appetitive T-maze learning. It should be e m p h a s i z e d t h a t p o s t t r i a l r e i n f o r c e m e n t in this case was a d m i n i s t e r e d o n l y a f t e r w r o n g responses. In the T-maze s i t u a t i o n any i m p r o v e m e n t in learning due to p o s t t r i a l r e i n f o r c e m e n t a f t e r correct r e s p o n s e s could simply be i n t e r p r e t e d in t e r m s o f additive effects of reinforcers; whereas a facilitation as a result of p o s t t r i a l reward a f t e r errors would be c o u n t e r i n t u i t i v e b y c o n v e n t i o n a l c o n s i d e r a t i o n s , yet would s u p p o r t the h y p o t h e s i s o f a direct a c t i o n of reward o n s h o r t - t e r m m e m o r y processes. F o u r t y male Sprague-Dawley rats w i t h e l e c t r o d e s imp l a n t e d in the lateral h y p o t h a l a m u s were used. The T-maze consisted of a 100 x 10 x 40 cm stem and t w o 45 x 10 x 4 0 cm arms. One arm had black, the o t h e r white walls. Prior to the learning of the simple left-right d i s c r i m i n a t i o n ( b l a c k - l e f t / w h i t e - r i g h t ) food was available in b o t h arms o f the maze. The rats were t h e n assigned to 2 groups: an e x p e r i m e n t a l group (n = 19) t h a t received posttrial r e i n f o r c i n g brain s t i m u l a t i o n (0.2 sec o n / 0 . 8 sec off; 30 sec d u r a t i o n ) 30 sec a f t e r a w r o n g r e s p o n s e in the T-maze (i.e. e n t r y i n t o the arm t h a t was n o t r e i n f o r c e d w i t h f o o d ) and a c o n t r o l group (n = 21) w h i c h was h a n d l e d the same b u t did n o t receive any s t i m u l a t i o n . E n t r y of the correct arm was r e i n f o r c e d w i t h c h o c o l a t e biscuits, w h i c h were i n t r o d u c e d a f t e r the rat h a d m a d e the response. Brain s t i m u l a t i o n was a d m i n i s t e r e d elsewhere in a cylindrical c o n t a i n e r . Two trials p e r day were a d m i n i s t e r e d . 
Figure 8 s h o w s a p o r t i o n of the results. The animals t h a t received p o s t t r i a l r e i n f o r c i n g brain s t i m u l a t i o n p e r f o r m e d significantly superior to the c o n t r o l group by various criteria.

trial

80

FIG. 7. Mean step-down latencies after posttrial discontinuous brain stimulation (experimental group I), continuous stimulation (experimental group II) and control group over 6 trials. the d o o r was o p e n e d a f t e r 10 sec and the SDL was measured. If an a n i m a l r e m a i n e d 150 sec ( c r i t e r i o n ) o n the p l a t f o r m it was c o n s i d e r e d a l e a r n e r and r e m o v e d f r o m the e x p e r i m e n t a l setting. L e a r n i n g a n d r e t e s t trials followed each o t h e r w i t h an i n t e r t r i a l l a t e n c y of 24 h r u n t i l 6 r e t e s t trials were c o m p l e t e d . T h e r e was a significant difference b e t w e e n t h e experim e n t a l g r o u p I ( d i s c o n t i n u o u s s t i m u l a t i o n ) and the c o n t r o l group, b u t n o significant d i f f e r e n c e s b e t w e e n the experim e n t a l group II ( c o n t i n u o u s s t i m u l a t i o n ) and a n y o t h e r group. The results are s u m m a r i z e d in Fig. 7. In general the a n i m a l s s h o w e d r a t h e r p o o r learning. This m i g h t be due to the e x p e r i m e n t a l design involving the c o n d i t i o n i n g of a grid aversion ( n o t p u n i s h m e n t of stepd o w n ) and the b r i e f f o o t s h o c k of 1 sec. In s u m m a r y , 30 sec delayed p o s t t r i a l r e i n f o r c i n g discont i n u o u s s t i m u l a t i o n (0.2 sec o n / 0 . 8 sec off) r e s u l t e d in significantly b e t t e r p e r f o r m a n c e of a c o n d i t i o n e d a v o i d a n c e task. The c o n t i n u o u s s t i m u l a t i o n m i g h t have b e e n less rewarding or even p u n i s h i n g and t h e r e f o r e less effective in facilitating s h o r t - t e r m m e m o r y processes, as suggested b e l o w in section C.

Facilitation of Appetitive T-Maze Learning by Posttrial Reinforcement Contingent on Errors

The following unpublished study suggests that posttrial reinforcing hypothalamic stimulation can also facilitate

[Figure 8 plot: percent correct responses (y-axis) across trials 1+2, 3+4 and 5+6 (x-axis) for the 30 sec stimulation group and the control group.]

FIG. 8. Percent correct responses in T-maze (food-reinforced discrimination; black-left/white-right) over 6 trials (3 days). Experimental group received 30 sec of reinforcing brain stimulation 30 sec after each wrong response (no food). Control animals received sham stimulation after each wrong response.


Therefore, contrary to common sense, but in support of our hypothesis, posttrial reinforcing brain stimulation presented 30 sec after a wrong response in a T-maze improved rather than hindered learning of this task. It should be noted that Deweer [9] similarly found a facilitation of appetitive T-maze learning by administering electrical stimulation of the reticular formation contingent on errors.

Failure to Facilitate Conditioned Taste Aversion

An unsuccessful attempt was made in the following (unpublished) study to facilitate learning of a taste aversion task with reinforcing lateral hypothalamic stimulation. The reinforcing stimulation was presented after the CS, prior to exposure to the UCS. The design was adapted from Nachman [28], who found a very weak amnestic effect when 5 or 10 sec of saccharin drinking were followed by ECS. Two experimental groups (17 rats per group) received 30 sec of reinforcing stimulation (0.2 sec on/0.8 sec off) either immediately or 30 sec after a 10 sec bout of saccharin drinking. An implanted control group (n = 16) was handled the same as the 30 sec delayed stimulation group, but received no stimulation. Two hr later 0.14 M LiCl solution (2% of body weight) was injected IP. The following day the animals had access to saccharin solution for 10 min. Although all 3 groups showed significant taste aversion towards saccharin, there were no significant differences between the 3 groups. Thus, reinforcing brain stimulation presented within 60 sec after exposure to the CS (saccharin), prior to presentation of the UCS (LiCl), did not influence learning.

(C) AMNESIA WITH POSTTRIAL STIMULATION OF SUBSTANTIA NIGRA

Routtenberg and Holzman [34] reported that electrical stimulation of the substantia nigra (presumably pars compacta) during or after a learning trial disrupted passive avoidance learning in rats.
Fibiger and Phillips [12] confirmed this finding, and found that the stimulation-induced amnesia could be prevented by ipsilateral 6-OHDA lesion of the dopaminergic nigro-striatal bundle. In view of the fact that electrical stimulation of the substantia nigra tends to be reinforcing (some of the animals of Routtenberg and Holzman self-stimulated), it would seem that these results contradict our findings with lateral hypothalamic stimulation. One possible explanation for this apparent discrepancy is that the electrical stimulation in these two studies was disruptive; i.e. they used continuous trains of stimulation, which could be disruptive and/or aversive, compared to our use of discontinuous (0.2 sec on/0.8 sec off) stimulation, which approximates the topography of self-administered stimulation. Studies are now in progress to determine whether amnesia and facilitation of learning result from, respectively, continuous and discontinuous posttrial stimulation in the substantia nigra and lateral hypothalamus, or whether other factors (e.g. anatomical differences, nature of learning task) account for the difference. Preliminary results suggest that posttrial continuous stimulation is less effective than discontinuous stimulation in facilitating passive avoidance learning (see Facilitation of Passive Avoidance Learning (Step-Down Task), above).

(D) FACILITATION OF LEARNING WITH NON-REINFORCING POSTTRIAL BRAIN STIMULATION?

Facilitation of learning has also been reported with low intensity stimulation of the mesencephalic reticular formation (RF) [3, 7, 9], the hippocampus [8, 10, 23] and amygdala [14]. Although relatively weak self-stimulation can be obtained from all these structures, none of these authors reported rewarding effects of the stimulation, and Bloch [3] even reported an absence of self-stimulation with his reticular formation stimulation.

(E) FACILITATION OF LEARNING WITH POSTTRIAL DRUG INJECTION

Various central nervous stimulants, when injected posttrial, can enhance learning, presumably by facilitation of memory storage processes in a time-dependent manner [6]. Particularly interesting for our hypothesis that posttrial reinforcement acts on memory processes are studies on the facilitation of learning with posttrial drugs that may influence reward processes, or are among the drugs that animals learn to self-administer. Among these are norepinephrine (NE) and amphetamine. NE has long been implicated in the reward process [31, 38], and so has amphetamine [35], which animals self-administer [40]. Hall [15] found a time-dependent amnestic effect of posttrial IP injection of diethyldithiocarbamate (DDC) on discriminated shock-escape learning. Hall suggests that the reduced brain NE may be responsible for the amnesia since "the amnesic dose of DDC reduced brain NE content to half that seen in mice treated with the nonamnesic dose". These results support the earlier findings of Stein et al. [36] that injection of NE into the lateral ventricles of rats prevented the amnestic effects of DDC pretreatment on passive avoidance learning (see also [24]). Posttrial injection of d-amphetamine has been shown to facilitate learning [21, 24], although opposite effects have also been reported (e.g. [19]).

It would be important for the theory of posttrial reinforcement of memory processes to determine, on the one hand, whether drugs or other treatments (such as the various types of brain stimulation; see D above) that facilitate learning when administered posttrial can serve as reinforcers in conventional operant conditioning situations; and, on the other hand, whether known reinforcers can facilitate learning when administered posttrial. Of interest would be, for instance, a number of substances, such as morphine in addicted rats [5, 29] and apomorphine [1], which animals self-administer and which also influence brain stimulation reward.

REFERENCES

1. Baxter, B. L., M. I. Gluckman, L. Stein and R. A. Scerni. Self-injection of apomorphine in the rat: positive reinforcement by a dopamine receptor stimulant. Pharmac. Biochem. Behav. 2: 387-391, 1974.
2. Black, A. H. The direct control of neural processes by reward and punishment. Am. Sci. 59: 236-245, 1971.
3. Bloch, V. Facts and hypotheses concerning memory consolidation. Brain Res. 24: 561-575, 1970.
4. Bureš, J. and O. Burešová. Cortical spreading depression as a memory disturbing factor. J. comp. physiol. Psychol. 56: 268-272, 1963.

5. Davis, W. M. and S. G. Smith. Blocking of morphine based reinforcement by alpha-methyltyrosine. Life Sci. 12: 185-191, 1973.
6. Dawson, R. G. and J. L. McGaugh. Drug facilitation of learning and memory. In: The Physiological Basis of Memory, edited by J. A. Deutsch. New York: Academic Press, 1973, pp. 77-111.
7. Denti, A., J. L. McGaugh, P. W. Landfield and P. Shinkman. Effects of post-trial electrical stimulation of the mesencephalic reticular formation on avoidance learning in rats. Physiol. Behav. 5: 659-662, 1970.
8. Destrade, C., B. Soumireu-Mourat and B. Cardo. Effects of post-trial hippocampal stimulation on acquisition of operant behavior in the mouse. Behav. Biol. 8: 713-724, 1973.
9. Deweer, B. Selective facilitative effect of post-trial reticular formation stimulation in discriminative learning in the rat. Behav. Proc. 1977, in press.
10. Erickson, C. K. and J. B. Patel. Facilitation of avoidance learning by post-trial hippocampal electrical stimulation. J. comp. physiol. Psychol. 68: 400-406, 1969.
11. Fetz, E. E. Operant conditioning of cortical unit activity. Science 163: 955-957, 1969.
12. Fibiger, H. C. and A. G. Phillips. Retrograde amnesia after electrical stimulation of the substantia nigra: mediation by the dopaminergic nigro-neostriatal bundle. Brain Res. 116: 23-33, 1976.
13. Fox, S. S. and A. P. Rudell. Operant controlled neural event: formal and systematic approach to electrical coding of behavior in man. Science 162: 1299-1302, 1968.
14. Gold, P. E., L. Hankins, R. M. Edwards, J. Chester and J. L. McGaugh. Memory interference and facilitation with post-trial amygdala stimulation: effect on memory varies with footshock level. Brain Res. 86: 509-513, 1975.
15. Hall, M. E. The effects of norepinephrine biosynthesis inhibition on the consolidation of two discriminated escape responses. Behav. Biol. 16: 145-153, 1976.
16. Huston, J. P. Physiologische und motivationale Aspekte der Verstärkung. In: Pawlow und die Folgen. Psychologie des 20. Jahrhunderts, edited by H. Zeier. Zürich: Kindler Verlag, 1977, Vol. 5.
17. Huston, J. P. and C. Mondadori. Memory and reinforcement: A model. In: Proc. 2nd Int. Congr. CIANS, Prague, 1975. Activ. nerv. Sup. 19: 17-19, 1977.
18. Huston, J. P., C. Mondadori and P. G. Waser. Facilitation of learning by reward of post-trial memory processes. Experientia 30: 1038-1040, 1974.
19. James, D. T. D. Post-trial d-amphetamine sulfate and one-trial learning in mice. J. comp. physiol. Psychol. 89: 626-635, 1975.
20. Krauth, J. Nichtparametrische Ansätze zur Auswertung von Verlaufskurven. Biometrische Zeitschrift 15: 557-566, 1973.
21. Krivanek, J. A. and J. L. McGaugh. Facilitating effects of pre- and post-trial amphetamine administration on discrimination learning in mice. Agents and Actions 1: 36-42, 1969.
22. Kurtz, K. H. and J. Pearl. The effect of prior fear experience on acquired-drive learning. J. comp. physiol. Psychol. 53: 201-206, 1960.

23. Landfield, P. W., R. J. Tusa and J. L. McGaugh. Effects of post-trial hippocampal stimulation on memory storage and E.E.G. activity. Behav. Biol. 8: 485-505, 1973.
24. McGaugh, J. L. and P. E. Gold. The effects of drugs and electrical stimulation of the brain on memory storage processes. In: Neurobiological Basis of Memory Formation, edited by H. Matthies. Berlin: VEB Verlag Volk und Gesundheit, 1974, pp. 474-499.
25. McGaugh, J. L. and M. J. Herz. Memory Consolidation. San Francisco: Albion, 1972.
26. Mondadori, C., K. Ornstein, P. G. Waser and J. P. Huston. Post-trial reinforcing hypothalamic stimulation can facilitate avoidance learning. Neurosci. Letters 2: 183-187, 1976.
27. Mondadori, C., P. G. Waser and J. P. Huston. Time dependent effects of post-trial reinforcement, punishment or ECS on passive avoidance learning. Physiol. Behav. 1977, in press.
28. Nachman, M. Limited effects of electroconvulsive shock on memory of taste stimulation. J. comp. physiol. Psychol. 73: 31-37, 1970.
29. Ornstein, K. and J. P. Huston. Interaction between morphine and reinforcing lateral hypothalamic stimulation. Psychopharmacology 1977, in press.
30. Pellegrino, L. J. and A. J. Cushman. A Stereotaxic Atlas of the Rat Brain. New York: Appleton-Century-Crofts, 1967.
31. Poschel, B. P. H. and F. W. Ninteman. Hypothalamic self-stimulation: its suppression by blockade of norepinephrine biosynthesis and reinstatement by methamphetamine. Life Sci. 5: 11-16, 1966.
32. Renner, K. E. Delay of reinforcement: a historical review. Psychol. Bull. 61: 341-361, 1964.
33. Rosenfeld, J. P. and B. E. Hetzler. Operant controlled evoked responses: discrimination of conditioned and normally occurring components. Science 181: 767-770, 1973.
34. Routtenberg, A. and N. Holzman. Memory disruption by electrical stimulation of the substantia nigra, pars compacta. Science 181: 83-86, 1973.
35. Stein, L. Self-stimulation of the brain and the central stimulant action of amphetamine. Fedn. Proc. 23: 836-850, 1964.
36. Stein, L., J. D. Belluzzi and C. D. Wise. Memory enhancement by central administration of norepinephrine. Brain Res. 84: 329-335, 1975.
37. Tarpy, R. M. and F. L. Sawabini. Reinforcement delay: a selective review of the last decade. Psychol. Bull. 81: 984-997, 1974.
38. Wise, C. D., B. D. Berger and L. Stein. Evidence of α-noradrenergic reward receptors and serotonergic punishment receptors in the rat brain. Biol. Psychiat. 6: 3-21, 1973.
39. Wyrwicka, W. and M. B. Sterman. Instrumental conditioning of sensorimotor cortex EEG spindles in the waking cat. Physiol. Behav. 3: 703-707, 1968.
40. Yokel, R. A. and R. Pickens. Self-administration of optical isomers of amphetamine and methylamphetamine by rats. J. Pharmac. exp. Ther. 187: 27-33, 1973.