Repeated extinction and reversal learning of an approach response supports an arousal-mediated learning model


Behavioural Processes 87 (2011) 125–134


Behavioural Processes journal homepage: www.elsevier.com/locate/behavproc

Christopher A. Podlesnik a,∗, Federico Sanabria b,∗∗

a Department of Pharmacology, University of Michigan Medical School, 3415 MSB I, 1301 Catherine Street, Ann Arbor, MI 48109-5632, United States
b Arizona State University, Department of Psychology, P.O. Box 871104, Tempe, AZ 85287-1104, United States

Article info

Article history: Received 14 September 2010; received in revised form 1 December 2010; accepted 12 December 2010.

Keywords: Pavlovian conditioned approach; Extinction; Reversal learning; Arousal; Nose poke; Rat

Abstract

We assessed the effects of repeated extinction and reversals of two conditional stimuli (CS+/CS−) on an appetitive conditioned approach response in rats. Three results were observed that could not be accounted for by a simple linear operator model such as the one proposed by Rescorla and Wagner (1972): (1) responding to a CS− declined faster when a CS+ was simultaneously extinguished; (2) reacquisition of pre-extinction performance recovered rapidly within one session; and (3) reversal of CS+/CS− contingencies resulted in a more rapid recovery to the current CS− (former CS+) than the current CS+, accompanied by a slower acquisition of performance to the current CS+. An arousal parameter that mediates learning was introduced to a linear operator model to account for these effects. The arousal-mediated learning model adequately fit the data and predicted data from a second experiment with different rats in which only repeated reversals of CS+/CS− were assessed. According to this arousal-mediated learning model, learning is accelerated by US-elicited arousal and it slows down in the absence of US. Because arousal varies faster than conditioning, the model accounts for the decline in responding during extinction mainly through a reduction in arousal, not a change in learning. By preserving learning during extinction, the model is able to account for relapse effects like rapid reacquisition, renewal, and reinstatement.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Eliminating the correlation between a conditional stimulus (CS) and an unconditional stimulus (US) results in a progressive decline in responding to the CS. Extinction of the conditional stimulus is a learning process, critical to the organism’s adaptation to a changing environment. There has been much conceptual and theoretical development devoted toward elucidating processes mediating declines in conditioned performance during extinction (e.g., Bouton, 2004; Gallistel and Gibbon, 2000; Killeen et al., 2009; Mackintosh, 1975; Pearce and Hall, 1980; Rescorla, 2001; Rescorla and Wagner, 1972; Wagner, 1981). One development that has become almost universally accepted is that declines in responding are not a result of extinction eliminating prior excitatory learning (e.g., Rescorla, 1993). Instead, at least two distinct processes emerge in extinction when US presentations are eliminated: (1) the sudden absence of the US results in a discriminable change in stimulus conditions, i.e., a generalization decrement develops, and (2) the organism’s expectancy that a CS signals a forthcoming US is violated, initiating new inhibitory learning (Mackintosh, 1974). Bouton (2004) further argues that this new inhibitory learning occurs in what becomes a novel stimulus context introduced by removal of the US with the extinction contingency (i.e., the generalization decrement). Thus, conditioning and extinction are learned and expressed conditionally as a function of the prevailing context.

Evidence that extinction results in context-mediated learning comes from situations in which, following extinction, reestablishing the original conditioning context restores responding (see Bouton, 2004, for a review). For instance, first training a CS-US association in one context (Context A), followed by extinguishing the CS in a different context (Context B), and finally reintroducing the original Context A produces a marked increase in responding to the CS, even in the absence of the US (Bouton and King, 1983; Bouton and Peck, 1989). These findings suggest that learning about the CS-US relation is preserved throughout extinction despite elimination of conditioned performance; re-exposure to the original conditioning context reveals this preserved learning. Related phenomena in which responding readily recovers following extinction, such as spontaneous recovery and rapid reacquisition, are attributed to a failure to retrieve the extinction memory when testing occurs outside of the temporal or stimulus context mediating extinction.

∗ Corresponding author. Tel.: +1 734 615 5012; fax: +1 734 764 7118.
∗∗ Corresponding author. Tel.: +1 480 9654687; fax: +1 480 9658544.
E-mail addresses: [email protected] (C.A. Podlesnik), [email protected] (F. Sanabria).
0376-6357/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.beproc.2010.12.005


C.A. Podlesnik, F. Sanabria / Behavioural Processes 87 (2011) 125–134

Bouton (1993, 2004) suggested that extinction is just one example of a set of phenomena in which new learning temporarily interferes with performance, but not with the integrity, of an initially trained association. One such situation, the reversal of Pavlovian contingencies, has received relatively little attention with regard to exploring the behavioral processes mediating its effect. In reversal learning experiments, training begins with one CS paired with a US (i.e., CS+) while a different CS is paired with nonreinforcement (i.e., CS−). Following training, the CS+ and CS− are reversed. Consistent with Bouton’s (1993) interference hypothesis, reversing Pavlovian contingencies results in effects similar to standard extinction (e.g., Spear et al., 1980; Thomas et al., 1985). The originally trained CS-US association survives subsequent reversals of CS+ and CS−. The persistence of prior associations following CS+/CS− reversal was clearly shown by rats spontaneously recovering an initially trained magazine approach response that was conditional to a CS+/CS− discrimination, following five reversals of the original discrimination (Rescorla, 2007). During acquisition, eight 30-s diffuse light and white noise presentations were arranged per session as conditional stimuli, with one stimulus preceding the presentation of a food pellet (i.e., CS+) and the other preceding nonreinforcement (i.e., CS−). Next, CS+ and CS− assignments were reversed five times across eight-session blocks. One additional reversal was arranged to equate levels of responding to CS+ and CS− before rats were given six days without experimental sessions. Finally, the rats were placed back in their experimental chambers for a single session in which the light and tone were presented without reinforcement. 
Large elevations in responding were observed only to the most recently extinguished CS, as is observed with spontaneous recovery following simple extinction of a CS (e.g., Brooks and Bouton, 1993; Pavlov, 1927). Such effects suggest that similar processes mediate the learning underlying both extinction and reversal learning. Although Bouton’s (1993) interference hypothesis accounts for a wide range of phenomena, its implementation as a dynamic learning model is rather challenging. It is unclear, for instance, whether new learning needs to be encoded in a unique storage module to maintain the integrity of old learning. Various solutions have been proposed in which old learning is encoded in the hidden layers of a neural network (Burgos and Murillo-Rodriguez, 2007; Kehoe, 1988; Larrauri and Schmajuk, 2008). The importance of specific components of these networks is not assessed explicitly, and thus it is difficult to determine the merit of these models beyond the proximity of predicted to obtained data. Relative to neural networks, simple learning rules such as those articulated by classic dynamic learning models (Mackintosh, 1975; Pearce and Hall, 1980; Rescorla and Wagner, 1972) have the advantage of simplicity and tractability but, as with Bouton’s model, they do not specify a solution to the problem of how previously acquired associations endure under extinction. Miller et al. (1995) suggest that interference may stem from the asymmetry and simultaneity of excitatory and inhibitory conditioning, but no satisfactory quantitative implementation of this solution has been advanced. This paper introduces a dynamic learning model that accounts for the persistence of learned associations over repeated extinction, reacquisition, and reversal training. 
The model is based on Killeen et al.’s (2009) notion that the probability of responding to a CS on any given trial is a function of past CS-US pairings, much in line with classic dynamic learning models and even earlier stimulus sampling models (Estes, 1950), and of the momentum of response and no-response states. The new model explains the persistence of learning using just one additional assumption, that learning and performance are conditional on US-elicited arousal (Killeen et al., 1978).

To collect behavioral data, we took advantage of the fact that organisms approach discrete visual stimuli that predict the delivery of appetitive rewards (Brown and Jenkins, 1968; Hearst and Jenkins, 1974; Peterson et al., 1972). The discovery of conditioned approach responses was important because the realm of Pavlovian conditioning broadened from relatively simple reflexes to larger skeletal movements (Hearst, 1977). Approach responses show all signs of being instances of Pavlovian conditioning (Farwell and Ayres, 1979), and can be measured conveniently and automatically to assess responding to CSs differentially associated with a US.

2. Experiment 1

Extinction and reversal learning data were collected under varying training conditions using two CSs. Conditioned responding was alternately acquired for one CS and extinguished for both CSs in blocks of multiple sessions. We propose an arousal-mediated learning model to account for conditioned response probabilities across all training conditions. Model parameters were estimated and possible variations of the model are considered. Finally, the model is extended to account for latencies, the intervals between CS onset and conditioned response.

2.1. Methods

2.1.1. Subjects

Five male Sprague–Dawley rats originally were obtained from Harlan (Indianapolis, IN) and were maintained in a temperature- and humidity-controlled environment on a 12-h light/12-h dark cycle with lights on at 7:00 am. Rats weighed approximately 300 g and were maintained at approximately 80% of their adult weights (±20 g) by postsession feeding of rat chow. Prior to the present study, all rats participated in a study examining the effects of systemic injections of dopaminergic compounds on conditioned approach responses. Given the previous conditioning history, no additional preliminary training was necessary. All studies were carried out in accordance with the Guide for Care and Use of Laboratory Animals as adopted by the National Institutes of Health. University of Michigan’s Committee on the Use and Care of Animals approved all experimental protocols.

2.1.2. Apparatus

Six Med Associates® (St. Albans, VT, USA) operant conditioning chambers were used. Each chamber was approximately 30 cm long, 24 cm wide, and 21 cm high, and housed in a sound-attenuating cubicle with a ventilation fan. A dipper (0.1-ml reservoir) that could deliver liquid food (Vanilla Ensure®) was centered on the front panel within an approximately 4.1 cm (h) × 3.5 cm (w) aperture with its bottom edge 2 cm above a grid floor. An LED recessed in the roof of the aperture could be turned on to illuminate the aperture and was used to present the conditional stimuli. An infrared photobeam located immediately above the dipper receptacle recorded head entries into the aperture, which were the primary dependent measure in the present study. Control of experimental events and data recording were conducted with Med Associates interfacing and programming. Sessions occurred 5 days per week at approximately the same time.

2.1.3. Procedures

All sessions consisted of 16 Pavlovian conditioned approach trials with two different visual conditional stimuli and were approximately 40 min in duration. During the first Acquisition condition for three rats, 8 trials per session consisted of illuminating the aperture with a steady light for 15 s prior to a 7-s presentation of the dipper (hereafter CS+ trials). Another 8 trials per session consisted of flashing the aperture light on and off every 0.1 s for 15 s prior to 7 s of no dipper presentation (hereafter CS− trials). The CS+ and CS− stimuli were reversed for the other two rats. Prior to all CS+

Table 1
Conditions, conditional stimulus assignments, and number of sessions in Experiment 1.

Condition   Contingency     CS+   CS−   Sessions
1           Acquisition     A     B     18
2           Extinction      A     B     9
3           Reacquisition   A     B     15
4           Extinction      A     B     9
5           Reversal        B     A     21
6           Extinction      B     A     9
7           Reversal        A     B     21
8           Extinction      A     B     10
9           Reacquisition   A     B     20

and CS− trials, a variable intertrial interval (ITI) was arranged with an average of 120 s (range 75 s to 165 s). The order of CS+ and CS− presentations was chosen randomly throughout all sessions, with the exception that each stimulus could be presented at most twice in succession. Table 1 shows the conditions, with the number of sessions, in the order in which they occurred. Acquisition (Condition 1) occurred as described above. During all Extinction conditions (Conditions 2, 4, 6, and 8), dipper presentations were discontinued following presentation of both conditional stimuli. Following Extinction conditions, the original contingencies prior to Extinction were reestablished during Reacquisition conditions (Conditions 3 and 9) or were reversed between conditional stimuli during Reversal conditions (Conditions 5 and 7). To keep track of the stimulus assignments throughout all conditions, the CS+ and CS− stimuli arranged during the initial Acquisition condition will be referred to as Stimulus A and Stimulus B, respectively, throughout (regardless of whether they are the CS+ or CS−).

2.2. Results

The points in the top panel of Fig. 1 represent, for each session, the mean proportion of trials with a head entry. The solid symbols represent responding to Stimulus A; the open symbols, to Stimulus B. Rats initially responded to A and B on 85% of the trials (Condition 1, Acquisition). As training progressed in this condition, responses to B (the CS−) dropped to 40%, whereas responses to A (the CS+) remained above their initial frequency. Under Extinction (Condition 2), responding to A and B declined to 15 and 5% of trials, respectively. On the first session in which the CS+ was reinstated (Condition 3, Reacquisition), responding to A and B recovered to pre-extinction levels. 
Responding to A during Reacquisition remained relatively stable, whereas responding to B declined to 30% of trials, seemingly continuing the downward trend observed during Acquisition and interrupted by Extinction. Extinction performance was replicated in the second Extinction condition (Condition 4). When the dipper activations were reinstated, but following B and not A (Condition 5, Reversal), responding to A and B recovered close to pre-extinction levels, but whereas responding for B (now the CS+) rose over 90% of trials within 4 sessions and remained at that level, responding for A (now the CS−) slowly dropped to 70% of trials. Prior extinction performance was essentially replicated in the third Extinction condition (Condition 6). Reversal performance in Condition 5 was replicated in the second Reversal condition (Condition 7), but with A as CS+ and B as CS−. Prior extinction performance was again replicated in the final Extinction condition (Condition 8). Responding to A (CS+) during the final Reacquisition (Condition 9) remained relatively stable, whereas responding to B (CS−) declined to 35% of trials. The latter was a replication of the first Reacquisition (Condition 3), and like then, responding to A and B during the second Reacquisition appeared to continue the trend observed before Extinction.


Data depicted in Fig. 1 reveal 3 non-trivial effects:

1. Responding to the CS− declined faster in the absence of the US following CS+ (Extinction: Conditions 2 and 4) than when the US followed CS+ (Conditions 1 and 3).
2. When training conditions were reinstated following Extinction (Conditions 3 and 9), pre-extinction performance to CS+ and CS− recovered within a session.
3. CS+/CS− reversals following Extinction (Conditions 5 and 7) resulted in a rapid recovery followed by a progressive decline of responding to the former CS+, and a slower acquisition of responding to the former CS−.

We aimed at accounting for these effects, along with more conventional effects (e.g., less responding to CS− than to CS+), by formulating a trial-by-trial model that would reproduce the data shown in the top panel of Fig. 1. The continuous lines in the top panel of Fig. 1 are the output of the model, showing that it succeeded in approximating the data. The bottom panel of Fig. 1 shows the fitting residuals. We will first describe the model, then we will disassemble it and consider its effectiveness in the absence of individual components.

2.3. Modeling

2.3.1. An arousal-mediated learning model¹

2.3.1.1. Arousal. An incentive is “the end which serves to arouse, to direct, and to bring to a conclusion some persistent activity” (Simmons, 1924; from Flaherty, 1996). Arousal is a latent variable that mediates between exposure to the unconditioned stimulus (US; the input) and changes in frequency of US-relevant responses (the output; cf. Killeen, 1975; Killeen et al., 1978).² It is assumed that arousal increases with each presentation of the US, regardless of the CS that preceded it, and declines with time. Arousal (A) is dimensionless and bounded between 1 (maximum arousal) and zero (no arousal). When arousal increases, it does so immediately with the presentation of the US by a proportion αA of its distance to 1. When arousal decreases, it does so over the course of the ITI by a proportion βA of its distance to zero. Fig. 2 illustrates this process. At the end of trial t, a presentation of the US increases arousal level A to A′, which then declines continually until the onset of trial t + 1. In this example, the change in A from t to t + 1, ΔA+, involves both the US-elicited increase in arousal and its time-dependent decrease. Because trial t + 1 does not finish with a US, the subsequent change in A, ΔA−, only involves the decrease in arousal at rate βA. Putting these two processes in the form of equations, we obtain

$$\begin{aligned} \Delta A^{+} &= \alpha_A (1 - \beta_A)(1 - A) + \beta_A (0 - A) \\ \Delta A^{-} &= \beta_A (0 - A) \end{aligned} \qquad 0 \le \alpha_A, \beta_A \le 1 \tag{1}$$
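As an illustration only, the two updates in Equation (1) can be written as a single trial-to-trial step. The sketch below uses Python, which is an assumption of this example (the authors' own simulation is the Excel file cited in footnote 1); the function names `update_arousal` and `us_asymptote` are hypothetical, and the parameter values are the means reported in Table 2.

```python
def update_arousal(arousal, us_presented, alpha_a, beta_a):
    """Return arousal at the onset of the next trial, following Equation (1).

    alpha_a: proportion of the distance to 1 gained when the US is presented.
    beta_a:  proportion of the distance to 0 lost over the intertrial interval.
    """
    if us_presented:
        # US-elicited growth followed by time-dependent decline
        delta = alpha_a * (1 - beta_a) * (1 - arousal) + beta_a * (0 - arousal)
    else:
        # Time-dependent decline only
        delta = beta_a * (0 - arousal)
    return arousal + delta


def us_asymptote(alpha_a, beta_a):
    """Arousal level at which the US-trial increment in Equation (1) is zero."""
    return alpha_a * (1 - beta_a) / (alpha_a + beta_a - alpha_a * beta_a)


if __name__ == "__main__":
    alpha_a, beta_a = 0.542, 0.066  # mean estimates from Table 2
    arousal = 0.0
    for _ in range(20):  # 20 consecutive trials ending with the US
        arousal = update_arousal(arousal, True, alpha_a, beta_a)
    print(round(arousal, 4), round(us_asymptote(alpha_a, beta_a), 4))
```

Setting the US-trial increment to zero gives the fixed point αA(1 − βA)/(αA + βA − αAβA), which lies below 1 whenever βA > 0; without the US, arousal simply decays geometrically toward zero.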

2.3.1.2. Learning. The associative strength of a cue (V) changes according to a linear operator model similar to Rescorla and Wagner’s (1972):

$$\begin{aligned} \Delta V^{+} &= A\,\alpha_V (1 - V) \\ \Delta V^{-} &= A\,\beta_V (0 - V) \end{aligned} \qquad 0 \le \alpha_V, \beta_V \le 1 \tag{2}$$
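A minimal Python sketch of Equation (2) follows, under the assumption that the arousal value passed in is the one in effect when the CS was presented. The function name `update_strength` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def update_strength(v, arousal, us_presented, alpha_v, beta_v):
    """Return the associative strength of the CS after one trial (Equation (2)).

    The fixed rate (alpha_v or beta_v) is scaled by the current arousal,
    so learning stalls as arousal approaches zero.
    """
    if us_presented:
        return v + arousal * alpha_v * (1 - v)  # approach 1 when the US follows
    return v + arousal * beta_v * (0 - v)       # approach 0 when it does not


if __name__ == "__main__":
    # Same extinction trial at two arousal levels (illustrative values)
    print(update_strength(0.5, 1.0, False, 0.4, 0.1))  # full arousal
    print(update_strength(0.5, 0.0, False, 0.4, 0.1))  # no arousal, V unchanged
```

With arousal fixed at 1 the rule reduces to an ordinary linear operator; with arousal at 0 the associative strength is left untouched, which is how the model preserves learning when the US is withheld.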

Like A, V is a latent variable. Unlike A, however, V is specific to the CS. When the US is present, V for the preceding CS approaches 1 (top of Equation (2)); when the US is absent, V approaches 0 (bottom of Equation (2)). The rate of approach to either asymptote is the product of a fixed rate of change (αV for acquisition; βV for extinction) and the level of arousal when the CS was presented. The mediation of arousal in learning implies that when the US is frequently presented (by itself or along with any other stimuli), CSs are more rapidly learned, extinguished, and discriminated; as time since the last US elapses, learning progressively draws to a pause.

1 A simulation of the model is available in Microsoft Excel 2008 format at http://psychology.clas.asu.edu/files/SimAMLM.xlsx.
2 Here we only consider food-elicited arousal, which facilitates food-seeking behavior. More generally, arousal may be elicited by any US, but the behavioral system facilitated by arousal would likely depend on the nature of the US (Timberlake, 1994).

Fig. 1. Top panel: Proportion of trials with a head-poke conditioned approach response across sessions of Experiment 1. Data points are means (n = 5) and functions are model fits. Solid vertical lines indicate onset of conditioning and dotted vertical lines indicate onset of extinction. Note that “rev” at top of figure indicates where CS+ and CS− were reversed following extinction. Bottom panel: Mean of residuals of model fits (predicted − obtained).

2.3.1.3. Response state. Performance depends both on learning the CS-US association (V) and on energizing the response that is elicited by the association (A). The relation between performance, learning, and arousal is most simply expressed as

$$p(S_t) = A_t V_t, \tag{3}$$

where p(S) is the probability of entering into a response state and the subscript t serves as trial index. Conditioned responses are elicited only when in a response state; it is therefore assumed that the rat was in a response state on a given trial if at least one response occurred during that trial.

Fig. 2. Illustration of how arousal changes from trial to trial according to the Arousal-Mediated Learning Model. The grey area is arousal (A); light grey is arousal during the trial, when the CS is present (note the simplifying assumption that A is constant during trials); dark grey is arousal during the ITI. At the end of trial t, the US increases A by a proportion αA of its distance to 1. By the beginning of trial t + 1, A has declined by a proportion βA of its distance to zero (to make the decline in A visible in this illustration, the value of βA was inflated almost 10-fold from empirical estimates). There is no US at the end of trial t + 1, and therefore no increase in A. In the subsequent ITI, A declines at the same rate as in the previous ITI.

2.3.1.4. Persistence of response state. Response states persist over multiple trials. On any given trial, the probability of staying in a response state is constant, θ. Thus, the probability p(Rt) of observing at least one conditioned response in trial t is

$$p(R_t) = p(S_t) + [1 - p(S_t)]\,\theta R_{t-1}, \qquad 0 \le \theta \le 1, \quad R_{t-1} \in \{0, 1\} \tag{4}$$
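Equations (3) and (4) can be sketched together in a few lines of Python. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, the persistence rate is written `theta` here, and the input values are made up for the example.

```python
def response_state_probability(arousal, v):
    """Equation (3): probability of entering a response state on a trial."""
    return arousal * v


def response_probability(arousal, v, theta, responded_last_trial):
    """Equation (4): probability of at least one conditioned response.

    theta is the persistence rate; responded_last_trial plays the role of
    the Boolean R on the previous trial (False on the first trial of a session).
    """
    p_state = response_state_probability(arousal, v)
    r_prev = 1 if responded_last_trial else 0
    return p_state + (1 - p_state) * theta * r_prev


if __name__ == "__main__":
    # Illustrative values: arousal 0.8, associative strength 0.5, theta 0.2
    print(response_probability(0.8, 0.5, 0.2, responded_last_trial=False))
    print(response_probability(0.8, 0.5, 0.2, responded_last_trial=True))
```

A response on the previous trial can only raise the predicted probability, and the boost shrinks as the probability of entering a new response state approaches 1.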

Equation (4) indicates that a response may occur because the rat enters a new response state [with probability p(S)] or because it stayed in an ongoing response state (with probability θ). Boolean variable R indicates whether a conditioned response was elicited in the previous trial (Rt−1 = 1) or not (Rt−1 = 0), thus indicating whether or not the rat was in a response state in the previous trial.

2.3.1.5. Changes in arousal, learning, and performance between sessions. So far we have considered how variables change between

Table 2
Parameters of the arousal-mediated learning model, estimated from Experiment 1.

Parameter                     Rat 1   Rat 2   Rat 3   Rat 4   Rat 5   Mean
Initial variable values
  Arousal (A)                 1.00    1.00    1.00    1.00    .000    .800
  Associative strength (V)    .567    .653    .498    .832    .919    .694
Constants
  Arousal growth rate (αA)    .239    .607    .254    .695    .916    .542
  Arousal decline rate (βA)   .062    .048    .065    .064    .090    .066
  Acquisition rate (αV)       1.00    .174    1.00    .167    .027    .474
  Extinction rate (βV)        .008    .004    .011    .000    .008    .006
  Persistence rate (θ)        .358    .000    .228    .240    .214    .208


Table 3
Differences in maximum likelihood estimates (ΔMLE) between alternative models and the arousal-mediated learning model.

Model                             Modified Equation   Constraint                          ΔMLE
Simple arousal                    1                   ΔA+ = αA(1 − A)                     −397.5
Arousal-independent learning      2                   ΔV+ = αV(1 − V), ΔV− = βV(0 − V)    −23.84
Arousal-independent performance   3                   p(St) = kVt                         −853.36
No persistence                    4                   θ = 0                               −56.13

Note: See text for further explanation of model parameters.

trials, but not between sessions, which may pose challenges for the model. For instance, if arousal simply decayed as a function of time since the last US, it would most likely reset between sessions. A model like that, as will be shown, cannot account for the data. Similarly, robust effects of long inter-session intervals, such as spontaneous recovery, make it unrealistic that variables would remain unaffected between sessions. Variables therefore are not completely reset at session start, but do not continue unaffected by the inter-session interval. The impact of a long interval is somewhere in between, a partial resetting of all variables. The simplest way to implement such partial resetting in our model is by assuming that, at the beginning of each session, the values of A and V are their average value in the previous session. (Alternatively, we considered a more flexible implementation, by assuming that the value of each variable was the weighted mean of (1) its value at the beginning of the preceding session, and (2) its value at the end of the preceding session. This implementation was rejected because it is less parsimonious, adding the weighting factor as a free parameter, and does not provide a better fit of the model to the data.) Another important assumption regarding inter-session intervals is whether response states may persist across sessions. We assumed that response states declined over time, and thus could not persist across long inter-session intervals. This assumption was implemented by setting Rt−1 = 0 in Equation (4) when t is the first trial in a session.

2.3.2. Parameter estimation

The arousal-mediated learning model has 5 free parameters: arousal growth rate (αA), arousal decline rate (βA), acquisition rate (αV), extinction rate (βV), and persistence rate (θ). 
These values, along with the initial values of arousal (A) and associative strength (V), were estimated by fitting model predictions to data, using the maximum likelihood estimation (MLE) method,³ implemented using the Solver add-in of Microsoft Excel 2008. The average fits are shown in the top panel of Fig. 1 as continuous curves. Table 2 shows the estimates for individual rats. For all rats except one (Rat 5), arousal was very high at the beginning of the experiment. Initial associative strength was also much higher than what would be expected if the CSs were novel. Arousal grew faster than it declined, and acquisition was faster than extinction (see also Rescorla, 2002). With the exception of 2 rats, whose ‘acquisition’ was immediate (αV = 1.0), arousal changed faster (i.e., parameter values were higher) than learning. For all rats except one (Rat 2), responding persisted over consecutive trials at levels of 21% to 36%.

3 Maximum likelihood was computed by varying free parameters to fit predicted probability p(Rt) to data (conditioned response present, Rt = 1; conditioned response absent, Rt = 0) on each trial. Fitting was conducted by maximizing

$$\sum_{t} \log\left[ R_t\, p(R_t) + (1 - R_t)\,(1 - p(R_t)) \right], \qquad R_t \in \{0, 1\} \tag{F1}$$

which yields the log-likelihood of the model, the log probability of the data given the best estimates of model parameters.

2.3.3. Model evaluation

We tested alternatives to the full-scale model (the “target”) by systematically removing some of its components and adding new ones. These changes sometimes involved a removal or addition of free parameters. Alternative models were evaluated by taking the difference between each of their MLE and that of the target model (i.e., ΔMLE). Larger positive ΔMLE were indicative of stronger evidence favoring the alternative model; negative ΔMLE favored the target model. Following the Akaike Information Criterion (AIC; Burnham and Anderson, 2002), the removal of one parameter was justified when ΔMLE ≥ −1, whereas the addition of one parameter was justified when ΔMLE ≥ 3. This model selection rule favors more parsimonious models, even when goodness of fit is reduced slightly (i.e., −1 ≤ ΔMLE < 0), over more complex models, except when the latter provides a substantially superior fit to the data. Next we discuss each alternative model and evaluate it on the basis of its corresponding ΔMLE (shown in Table 3). We also discuss possible explanations for the shortcomings of each model, on the basis of visual inspection of its best fit to the data (fits are not shown).

2.3.3.1. Simple arousal model. The assumption of time-dependent decline of arousal, shown in Fig. 2, appears to add an unnecessary nuance to the arousal function. It seems much simpler to assume that arousal growth is symmetrical to arousal decay, and thus ΔA+ = αA(1 − A). This model assumes that A does not decline with time, but only when the CS is not followed by the US. Such change in the model produced a substantial reduction in MLE, and thus was not justified. Quantitatively, the main weakness of the simple arousal model is that arousal growth now approaches an asymptote of 1, whereas in the target model it approaches αA(1 − βA)/(αA + βA − αAβA). As shown in Fig. 1, responsiveness to the CS+ always reached asymptote close to, but not quite at, 1. With a large estimate of αA and a small estimate of βA (see Table 2), the target model faithfully reproduced this effect, whereas the simple arousal model was systematically unable to.

2.3.3.2. Arousal-independent learning model. It seems somewhat arbitrary that learning rate is as strongly dependent on arousal as Equation (2) suggests. We thus considered setting A = 1 in Equation (2). The ΔMLE supports the notion that arousal does mediate learning. The difference between model predictions is somewhat subtle (in fact, it is not visible in a plot like Fig. 1) but it is systematic: the MLE of the target model was higher for every rat.

2.3.3.3. Arousal-independent performance model. We also considered the possibility that the probability of entering a response state was not dependent on arousal, but was instead proportional to learning. This means that Equation (3) is replaced by p(St) = kVt.


The new proportionality constant k is a new free parameter that needs to be justified. Such justification, however, was not substantiated. Without the mediation of arousal, response probability could not decline faster across sessions in Extinction conditions than on Acquisition, Reacquisition, and Reversal conditions.

Fig. 3. Latency to CS+ and CS− as a function of response probability estimated by the arousal-mediated learning model. The midpoints of .05-wide response probability bins are in the x axis; median latencies within each probability bin are in the y axis. The solid curve is the prediction of a random response process (Equation (6)).

becomes longer, and the probability of observing a response during the trial declines. More precisely, when responses are randomly elicited,

$$p(R) = 1 - \exp\left(-\frac{\delta}{T}\right), \qquad \delta, T > 0 \tag{5}$$

where p(R) is the probability of observing a response within an interval δ (which is in the same units as T). Assuming δ = 15 s (the duration of the CS), median latency L = ln(2)T, and that the maximum value of L is 15 s, L may be solved for as a function of p(R):

$$L = \min\left( \frac{-\ln(2) \times 15\ \mathrm{s}}{\ln(1 - p(R))},\ 15\ \mathrm{s} \right), \qquad 0 < p(R) < 1 \tag{6}$$
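The link between Equations (5) and (6) can be checked numerically. The Python sketch below is a hedged illustration, not the authors' code: the function names and the constant `CS_DURATION_S` are hypothetical, and the 15-s window is the CS duration stated in the text.

```python
import math

CS_DURATION_S = 15.0  # duration of the CS, used as the observation window


def response_probability_from_interval(mean_interval_s, window_s=CS_DURATION_S):
    """Equation (5): probability of a response within the window when
    responses occur at random with mean inter-response time mean_interval_s."""
    return 1 - math.exp(-window_s / mean_interval_s)


def predicted_median_latency(p_response, window_s=CS_DURATION_S):
    """Equation (6): median latency implied by p(R), capped at the window."""
    if not 0 < p_response < 1:
        raise ValueError("p_response must lie strictly between 0 and 1")
    uncapped = -math.log(2) * window_s / math.log(1 - p_response)
    return min(uncapped, window_s)


if __name__ == "__main__":
    for p in (0.3, 0.75, 0.9):
        print(p, round(predicted_median_latency(p), 2))
```

Under these assumptions the cap binds whenever p(R) is at or below .5, because the uncapped median ln(2)T then equals or exceeds the 15-s window; above .5 the predicted latency falls as p(R) rises.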

Note that Equation (6) has no free parameters: L is completely determined by p(R) and vice versa. The function relating L to p(R) is plotted in Fig. 3 as a continuous curve. For most values of p(R), Equation (6) correctly predicted L, but when p(R) > .8, Equation (6) underpredicted L. This fit suggests that rats produced conditioned responses at random rates, but that the maximum speed at which a conditioned response was elicited (the left tail of the distribution of latencies) was limited by variables not characterized here. This underprediction cannot be explained by the right-censored nature of latency data, because it was observed when p(R) > .5.

2.5. Discussion

2.3.3.4. No persistence model. Finally, we asked whether response state persistence was a necessary complication of the model. In order to implement an alternative model with no persistence, the persistence parameter was forced to be zero, thus eliminating one free parameter. The MLE did not support this alternative model. Fits of the no-persistence model are very similar to those shown in Fig. 1, but the small twitches in the curves are ironed out. Such twitches accounted for a substantial amount of variance in the data.

2.4. Latencies

The arousal-mediated learning model is a model of conditioned response probability. We asked whether this model could be extended to describe response latencies, which is another measure of response strength (Killeen and Hall, 2001). We examined the correlation between latency and model-estimated probability, and compared it against predictions based on the assumption of random responding within response states. Response latencies were measured as the time between the onset of the CS and the first break in the infrared beam in the food receptacle (the conditioned response); for computational convenience, if a conditioned response was not elicited within the 15 s CS, the latency was registered as 15 s. Latencies were pooled across rats and binned according to the conditioned response probability indicated by the arousal-mediated model [p(R)]. Median latencies are shown in Fig. 3.

Fig. 3 shows that in most of the trials in which p(R) < .5, no response was elicited; in most of the trials in which p(R) > .5, a response was elicited. Latencies rapidly shortened as p(R) increased from .5 to .7, and then reached asymptote at about 7 s when p(R) > .7. The orderly change in response latency as a function of p(R) suggests that these measures of response strength are related (Killeen and Hall, 2001). Indeed, response latency and probability may be mathematically linked (Killeen et al., 2002).
If within a response state responses were elicited randomly with a fixed probability p(R), starting at the onset of the CS, the interval between responses would be exponentially distributed with a median of ln(2)T, where T is the mean time between responses. The median expected interval would be the best estimate of the median expected latency. If the time T between responses increases, the median expected latency

The data in Fig. 1 presented three challenges to learning models: Why does extinction of one CS accelerate the extinction of the other CS? How did training performance recover so quickly following extinction? Why did this happen even when the former CS+ was no longer reinforced? The arousal-mediated learning model answers all three questions. For conditioned responding to occur, not only must a CS-US correlation have been learned, but the animal must be aroused by the US; responding is the product of learning and arousal. The notion of performance as a function of learning and arousal is not new; it dates at least from Hull’s (1943) Principles of Behavior (his equation, Excitatory Potential = Habit Strength × Drive, is analogous to our Equation (3)). More recently, this idea has been couched in modern terms by Killeen’s Mathematical Principles of Reinforcement (Killeen, 1994, 1998; Killeen and Sitomer, 2003). In fact, the notion of arousal advanced here is almost the same as that expounded by Killeen et al. (1978). Specifically, successive presentations of a US result in cumulative increments in unconditioned arousal toward an asymptote; arousal declines in the absence of the US. The concepts of arousal, drive, and linear-operator learning are not new, but a quantitative solution to their interaction had not been advanced before. The notion that arousal mediates the learning coefficient provides a novel set of tools to predict complex changes in conditioned performance. It is important to note here that, even though arousal may be unconditionally elicited by the US, it does not mean that arousal cannot be conditionally elicited. Note that the model carries over arousal from one session to the next; that would not be possible if arousal was only elicited by the US. 
Bouton (1993) has suggested that, instead of functioning directly as a CS, the experimental context sets the occasion for the meaning of a given CS (see Bouton and Brooks, 1993; Honey and Watt, 1999; Swartzentruber, 1991). Given that arousal is not specific to a particular CS in this model, arousal had to be elicited directly by the context by virtue of its pairing with the US. Given that the stimulus context of the experimental chamber is not extinguished, context-dependent arousal allows for rapid recovery of arousal at the beginning of each session, and for arousal to mediate learning in the manner shown following extinction at the outset of Conditions 5 and 7. In those conditions,


responding to the current CS− (previous CS+) was re-acquired more rapidly than the current CS+. Rapid reacquisition following extinction has been a problem of great interest in associative learning research, mainly because it implies that extinction learning is not the same as excitatory “unlearning” (Bouton, 2004; Delamater, 2004). As previously mentioned, Bouton (2004) has argued that the inhibitory effects of extinction are mediated by context, and thus any indication that training context is reinstated, such as by US delivery, should immediately recover pre-extinction performance. The arousal-mediated learning model proposes a somewhat different explanation: Fast recovery happens primarily because extinction performance is driven by decline in arousal, and secondarily because learning is slowed down during extinction, when arousal is low. Recovery of pre-extinction performance is due mostly to a fast recovery of arousal, not to relearning. Because arousal may be elicited by context, the present model may account for most major recovery phenomena in which stimulus context mediates performance, such as ABA context renewal (see Bouton, 1993). From these data, however, it is difficult to contrast the merit of an explanation of recovery based on context-dependent inhibition versus one based on arousal. One piece of evidence appears to support the latter: Reacquisition of fear conditioning is slower following explicitly unpaired presentations of CS and US than following presentations of the US alone (Rauhut et al., 2001). It is unclear how the context-dependent inhibition hypothesis can accommodate such an effect. In contrast, the arousal hypothesis provides a straightforward explanation: arousal is maintained at high levels with the non-contingent presentation of the US, facilitating learning of the negative correlation, and in turn interfering with reacquisition.
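The arousal account of rapid reacquisition can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the fitted model: the no-US updates follow the extinction expression given in footnote 4, the US-trial updates are assumed to be the corresponding linear operators, response-state persistence is ignored, and the rate values in `RATES` are hypothetical rather than the estimates in Table 2.

```python
def step(arousal, strength, us_present, alpha_a, beta_a, alpha_v, beta_v):
    """One trial: update arousal first, then let it gate the change in V."""
    if us_present:
        arousal += alpha_a * (1.0 - arousal)              # assumed growth rule
        strength += arousal * alpha_v * (1.0 - strength)  # assumed acquisition rule
    else:
        arousal -= beta_a * arousal                       # decay, as in footnote 4
        strength -= arousal * beta_v * strength           # extinction, as in footnote 4
    return arousal, strength


def run(arousal, strength, n_trials, us_present, **rates):
    """Run n_trials of one trial type; return final A, final V, and A * V per trial."""
    probabilities = []
    for _ in range(n_trials):
        arousal, strength = step(arousal, strength, us_present, **rates)
        probabilities.append(arousal * strength)
    return arousal, strength, probabilities


# Hypothetical rates chosen for illustration only.
RATES = dict(alpha_a=0.5, beta_a=0.2, alpha_v=0.1, beta_v=0.05)

_, _, first_acquisition = run(0.0, 0.0, 3, True, **RATES)    # naive start
a, v, _ = run(0.0, 0.0, 200, True, **RATES)                  # extended training
a_ext, v_ext, extinction = run(a, v, 50, False, **RATES)     # extinction
_, _, reacquisition = run(a_ext, v_ext, 3, True, **RATES)    # US reintroduced

if __name__ == "__main__":
    print(f"after extinction: A = {a_ext:.5f}, V = {v_ext:.3f}, "
          f"p(S) = {extinction[-1]:.5f}")
    print("first 3 acquisition trials:  ", [round(p, 3) for p in first_acquisition])
    print("first 3 reacquisition trials:", [round(p, 3) for p in reacquisition])
```

With these illustrative rates, the response-state probability falls to near zero during extinction because arousal collapses, while most of the associative strength is preserved; a few US presentations then restore responding much faster than in original acquisition.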
Finally, upon reintroducing and reversing the US following extinction of both CSs, we observed greater levels of responding to the current CS− (previous CS+) relative to the current CS+ (see Conditions 5 and 7). This faster reacquisition of current CS− responding occurred despite the fact that the US was presented following the current CS+. To our knowledge, this is a novel finding, and the model anticipated this recovery of responding to a former CS+ following extinction and a CS+/CS− reversal. The presentation of a US, through arousal, recovers the associative strength of both CSs near pre-extinction levels, which yields a relatively slow acquisition of the new CS+ after each reversal. There are, however, some salient caveats to the model proposed here. First, there are some systematic deviations in the adjustment of the model to data. The bottom panel of Fig. 1 shows the residuals of the model fit; positive and negative values indicate, respectively, overestimation and underestimation of response probability. The model underestimates responsiveness to CS− at the beginning of Conditions 1, 3, 5, and 7 (Acquisition, Reacquisition, and both Reversals). Asymptotic performance at the end of the first Reacquisition condition is mostly overestimated. Responsiveness to CS− is overestimated during some Extinction conditions, particularly at the beginning of Conditions 2 and 4, and throughout Condition 8. Performance during the third Extinction condition (Condition 6), however, is underestimated, particularly during the CS+. It is unclear what causes these divergences, but it seems reasonable to attribute them to the long-term interference of early learning on later learning. Note, for instance, that the extinction curves drawn by the model are virtually identical in all conditions; the direction of the residuals indicates that extinction performance varied systematically between conditions. 
Further progress in modeling learning will depend, in part, on accounting for such long-term effects. Another caveat is that $\alpha_A$ and $\alpha_V$ are negatively correlated ($r^2$ = .92). This correlation indicates that arousal growth and acquisition are collinear, and a single parameter may account for both. This would explain why acquisition rates for two rats were estimated at the unlikely level of 1.0: the work done by one of the correlated parameters ($\alpha_A$) might have freed the other one ($\alpha_V$) to take any arbitrary value. One solution to this collinearity is to assume that acquisition is not mediated by arousal, only extinction. We intuit, however, that such seemingly arbitrary asymmetry needs to be supported by a reasonable theoretical basis and a larger database than the one provided here.

3. Experiment 2

Data from Experiment 1 provided informative constraints to a model of associative learning. We do not know, however, if the model can accurately predict conditioned approach performance; we only know that it can describe it quite well. The distinction is not trivial. Very complex models may provide good descriptions of the data by fitting well to relevant processes and to noise. Simpler models are more effective at isolating the former from the latter. In Section 2.3.3, we sought to simplify our model by removing selected components, but no satisfactory description could be attained. In Experiment 2, we collected more data from different rats, using a different sequence of conditions; we asked whether the model selected in Experiment 1 could predict the new data set. Such prediction would indicate that the model could generalize to other subjects and procedures.

Table 4
Conditions, conditional stimulus assignments, and number of sessions in Experiment 2.

Condition   Contingency   CS+   CS−   Sessions
1           Acquisition   A     B     18
2           Reversal      B     A     21
3           Reversal      A     B     21
4           Reversal      B     A     21
5           Reversal      A     B     21

3.1. Methods

3.1.1. Subjects and apparatus

Six male Sprague-Dawley rats were obtained and housed as described for Experiment 1, and had a similar experimental history. The stimuli, computer, and conditioning chambers were the same as those used in Experiment 1.

3.1.2. Procedures

All aspects of the basic procedure were identical to those described for Experiment 1, with the steady and flashing conditional stimuli counterbalanced across rats.
As in Experiment 1, the CS+ and CS− stimuli arranged during the initial Acquisition condition will be referred to as Stimulus A and Stimulus B, respectively, regardless of whether they later serve as the CS+ or CS−. The primary procedural difference in Experiment 2 was that the conditional stimuli were reversed across successive conditions. That is, dipper presentations followed Stimulus A but not Stimulus B during Conditions 1, 3, and 5. Dipper presentations followed Stimulus B but not Stimulus A during Conditions 2 and 4. Table 4 shows these conditions and number of sessions in each condition.

3.2. Results

The points in the top panel of Fig. 4 represent, for each session, the mean proportion of trials with a head entry. At the beginning of the Acquisition condition, rats responded to A (the CS+) on 92–96% of the trials, and to B (the CS−) on 83–90% of the trials. By the end of this condition, responding to A remained mostly stable (96–98%), and responding to B declined drastically (48–54%). Subsequent CS+/CS− reversals generated a consistent pattern: responding to the new CS+ attained asymptotic levels within 1 or 2 sessions, whereas responding to the new CS− progressively declined. Despite this consistency in performance, the probability of responding to the CS− in sessions 17–18 of each condition (all conditions were conducted for at least 18 sessions) reveals an upward trend across conditions: .51, .66, .71, .73, and .72 (see also Couvillon and Bitterman, 1986). This trend suggests that extinction became slower, or its asymptote became higher, with reversal training.

Fig. 4. Top panel: Proportion of trials with a head-poke conditioned approach response across sessions of Experiment 2. Data points are means (n = 6) and functions are model fits. Solid vertical lines indicate reversal of CS+ and CS−. Bottom panel: Mean of residuals of model fits (predicted–obtained).

3.3. Model predictions

The continuous curves in the top panel of Fig. 4 are traces of the model specified by Equations (1)–(4), based on mean parameter values estimated in Experiment 1 (Table 2, rightmost column). Initial values of A and V were fitted using the MLE method; they were both estimated to be 1.00. The curves fell reasonably close to the data. Residuals shown in the bottom panel of Fig. 4 reveal no systematic deviations of predictions of CS+ performance relative to data. Predictions of CS− performance, however, diverged somewhat more noticeably and systematically from data. CS− performance was mostly overestimated in Acquisition (Condition 1), particularly during the last sessions. The pattern of CS− residuals shown in the first Reversal (Condition 2) reveals the divergence between a nearly linear prediction and a reversed-S-shaped performance. CS− residuals in Conditions 3 and 5 do not appear to follow an identifiable pattern. CS− performance in Condition 4 appears to be somewhat underestimated, particularly during the last sessions.

3.4. Discussion

Each CS+/CS− reversal produced a similar response pattern: responding to the CS+ was rapidly reacquired, whereas responding to the CS− slowly declined. The proposed learning model reproduced this pattern faithfully. It is particularly noteworthy that the model predicted the reversal pattern using parameters estimated for different rats exposed to different experimental conditions. Moreover, there were no immediate CS+/CS− reversals in Experiment 1. Extinction conditions were interposed between reversals in Experiment 1, yielding response patterns visibly different from those generated by the immediate reversals of Experiment 2. The model’s ability to accommodate both datasets with a single set of parameters is indicative of its generality. Despite the similarity in performance after each reversal, there were systematic changes across conditions. Particularly, the decline in responding to the CS− was faster during the Acquisition condition than during any Reversal condition. Also, responding to the former CS+ appeared to have persisted longer following the first Reversal (Condition 2) than following any other Reversal condition, generating a distinctly reversed-S-shaped response curve. Both patterns suggest that early CS+ training slightly slowed down subsequent CS− training. Just as in Experiment 1, the model did not account for these long-term effects.
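The qualitative reversal pattern (fast reacquisition to the new CS+, slow decline to the new CS−) can be reproduced with a simple simulation of the Table 4 schedule. The sketch below rests on several assumptions that are not taken from the paper: CS+ and CS− trials strictly alternate, there are 20 trials of each type per session, the US-trial updates are plain linear operators, response-state persistence is ignored, and the four rate values are hypothetical (a large arousal growth rate and a small arousal decay rate) rather than the Table 2 estimates. Only the session counts, the stimulus assignments, and the initial values A = V = 1 come from the text.

```python
SESSIONS_PER_CONDITION = [18, 21, 21, 21, 21]      # Table 4
CS_PLUS_BY_CONDITION = ["A", "B", "A", "B", "A"]   # Table 4
TRIALS_PER_TYPE = 20                               # hypothetical session length

# Hypothetical rates for illustration only.
ALPHA_A, BETA_A, ALPHA_V, BETA_V = 0.8, 0.1, 0.2, 0.002


def simulate_reversals():
    """Return one (condition, session, mean p for CS+, mean p for CS-) row per session."""
    arousal = 1.0                      # initial A, as estimated in Section 3.3
    strength = {"A": 1.0, "B": 1.0}    # initial V, as estimated in Section 3.3
    rows = []
    for condition, (n_sessions, cs_plus) in enumerate(
            zip(SESSIONS_PER_CONDITION, CS_PLUS_BY_CONDITION), start=1):
        cs_minus = "B" if cs_plus == "A" else "A"
        for session in range(1, n_sessions + 1):
            p_plus = p_minus = 0.0
            for _ in range(TRIALS_PER_TYPE):
                # Reinforced trial: arousal grows, then gates acquisition.
                arousal += ALPHA_A * (1.0 - arousal)
                strength[cs_plus] += arousal * ALPHA_V * (1.0 - strength[cs_plus])
                p_plus += arousal * strength[cs_plus]
                # Nonreinforced trial: arousal decays, then gates extinction.
                arousal -= BETA_A * arousal
                strength[cs_minus] -= arousal * BETA_V * strength[cs_minus]
                p_minus += arousal * strength[cs_minus]
            rows.append((condition, session,
                         p_plus / TRIALS_PER_TYPE, p_minus / TRIALS_PER_TYPE))
    return rows


if __name__ == "__main__":
    for condition, session, p_plus, p_minus in simulate_reversals():
        if session in (1, 2, SESSIONS_PER_CONDITION[condition - 1]):
            print(f"condition {condition}, session {session:2d}: "
                  f"CS+ {p_plus:.2f}, CS- {p_minus:.2f}")
```

Because arousal is shared by both stimuli and stays high while the US is delivered on half of the trials, the new CS+ recovers within one or two simulated sessions, whereas the new CS− declines gradually. The sketch is not expected to reproduce the across-condition trend noted above, since nothing in it changes between reversals.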


4. General discussion

The arousal-mediated learning model provided an adequate description of CS+/CS− maintenance, extinction, and reversal performance (Experiment 1), and accurately predicted performance of a different group of rats exposed to periodic CS+/CS− reversals (Experiment 2). The model adjusts the Rescorla-Wagner linear operator by letting the US-dependent rate parameter (Rescorla and Wagner, 1972, called it $\beta$; we call it arousal, or A) vary with US density. The model also allows for arousal to be conditioned to the context. These adjustments are minor in size, but have significant consequences. If arousal varies faster than conditioning, the model accounts for the extinction of conditioned responding as mainly a reduction in arousal, not in conditioning. By preserving conditioning during extinction, the model also accounts for rapid reacquisition, renewal, and reinstatement effects. Moreover, by incorporating the notions of arousal and persistent response state, the model advances the Rescorla-Wagner intuitions toward an account of learning and performance, not just learning.

There is a clear analogy between the arousal-mediated learning model and other classic models derived from the Rescorla-Wagner linear operator. Whereas Rescorla and Wagner (1972) assumed that US- and CS-dependent learning rates were constant, Mackintosh (1975) and Pearce and Hall (1980) suggested that the CS-dependent rate varied with the amount of attention allocated to the CS, which presumably varies with conditioning. Although the sort of variations in CS-dependent rate suggested by the Mackintosh and Pearce-Hall models cannot account for reacquisition effects the way changes in arousal do, there is no inherent reason why attentional processes cannot be incorporated into the arousal-mediated model, modulating changes in $\alpha_V$ and $\beta_V$ (Equation (2)), which are currently constant.
Conversely, as animals may pay more attention to their environment when highly aroused, the present formalism may accommodate those data. Killeen et al. (2009) suggested a dynamic model of conditioning and extinction similar to the arousal-mediated learning model advanced here, but without the arousal component. In their Experiment 1, when rate of reinforcement varied across conditions, parameter estimates changed; in their Experiment 2, when rate of reinforcement was constant (only the CS duration varied) parameter estimates remained constant. It thus appears that parameter invariance across conditions in their Experiment 1 might have been achieved by incorporating arousal into Killeen et al.’s model. The notions of response perseverance and random responding within response states in the arousal-mediated learning model are directly borrowed from Killeen et al.’s model. The arousal-mediated learning model is not without limitations. In particular, we mentioned the collinearity of the rates of arousal and conditioning acquisition and that the model failed to predict changes in performance over repeated training conditions. Moreover, the arousal-mediated learning model also cannot account for the resurgence of extinguished performance such as that reported by Lindblom and Jenkins (1981). In their experiment, a conditioned response was extinguished by presenting the US either randomly or only in the absence of the CS. When the US was no longer presented in the final condition, conditioned responding resurged. Contrary to these findings, the arousal-mediated learning model predicts that non-contingent presentation of the US should have more effectively trained the CS-no US association and therefore no resurgence should be expected. Another important limitation is that the model cannot account for some well-established extinction effects. 
It cannot account for the faster extinction of conditioned responses following continuous rather than partial reinforcement (Haselgrove et al., 2004). This is because extinction rate, according to the model, rapidly converges to 1 − $\beta_A$ within a few trials (see footnote 4), and nothing in the model indicates that $\beta_A$ should be sensitive to training conditions. For this reason, the model also cannot account for data supporting behavioral momentum theory (see Nevin, 1992; Nevin and Grace, 2000, for reviews), in which operant responding is more resistant within and across successive sessions of extinction in discriminative stimulus contexts presenting higher rates of reinforcement (Nevin and Grace, 2005). For instance, Nevin et al. (1990) arranged a multiple schedule in which equal rates of food reinforcement were presented for pigeons keypecking in two components. Additional food was presented independent of keypecking in one component, degrading the operant contingency but enhancing the Pavlovian contingency (and arousal). Because resistance to disruption by satiation and extinction was greater in the component with added food, Nevin et al. suggested that Pavlovian, not operant, contingencies primarily govern the persistence of operant behavior. Recently, Podlesnik and Shahan (2009, 2010) showed that, like resistance to disruption, relapse of discriminated operant behavior in resurgence, reinstatement, and renewal procedures is also governed by Pavlovian contingencies. Consistent with Bouton’s (1993, 2004) interference framework, Podlesnik and Shahan attributed the greater relapse in richer contexts to the differential Pavlovian relations surviving extinction and their subsequent re-establishment. However, the Pavlovian relations between reinforcement and discriminative stimulus contexts influencing these effects also have been described as “non-specific effects that arouse or modulate behavior” (Nevin et al., 1990, p. 374) in a way analogous to a “central motive state” proposed by Bindra (1972) and Rescorla and Solomon (1967). This provides conceptual support for the notion that arousal plays a role in mediating differences in extinction and relapse of complex operant performance.
Further theoretical development, perhaps by borrowing ideas from behavioral momentum theory, would result in the model accounting for such training effects (see also Killeen, 1998). The interference hypothesis of extinction (Bouton, 1993, 2004) is likely to account for some of the limitations of the arousal-mediated learning model. The price of such account, however, is the reduced precision that is inherent to qualitative relative to quantitative learning models. The challenge for learning researchers is to mathematize the intuitions provided by the interference hypothesis. Whether such intuitions may be reduced to arousal processes is yet to be determined by further research. It is also yet unclear whether neural network models such as those proposed by Burgos and Murillo-Rodriguez (2007), Kehoe (1988), and Larrauri and Schmajuk (2008) can overcome the limitations of the arousal-mediated learning model. In the meantime, it appears that a productive route to advance learning theory is to further develop models that yield closed-form expressions to describe conditioning and extinction effects, thus balancing precision with simplicity. We hope that the arousal-mediated learning model will serve as a launch pad for such developments.

4 Combining Equations (1)–(3), the probability of a response state in the t-th extinction trial may be expressed as

$$p(S_t) = A_0 V_0 (1 - \beta_A)^t \prod_{n=1}^{t} \left[1 - A_0 \beta_V (1 - \beta_A)^n\right], \qquad 0 \le A_0, V_0, \beta_A, \beta_V \le 1 \tag{F2}$$

where $A_0$ and $V_0$ are the arousal and associative strengths at the beginning of the extinction session (respectively, the averages of A and V during the last reinforcement session). Extinction rate may thus be estimated as

$$\frac{p(S_t)}{p(S_{t-1})} = (1 - \beta_A)\left[1 - A_0 \beta_V (1 - \beta_A)^t\right]. \tag{F3}$$

Extinction rate converges to 1 − $\beta_A$ as extinction advances and t becomes larger.
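The closed-form expressions in this footnote can be verified against trial-by-trial updates. The sketch below assumes that, on each no-US trial, arousal is reduced first and the reduced value then gates the loss of associative strength, which is the ordering implied by the product in Equation (F2); function names are illustrative, and the values passed in the usage lines are hypothetical.

```python
import math


def p_state_closed_form(t, a0, v0, beta_a, beta_v):
    """Equation (F2): response-state probability on the t-th extinction trial."""
    product = math.prod(
        1.0 - a0 * beta_v * (1.0 - beta_a) ** n for n in range(1, t + 1)
    )
    return a0 * v0 * (1.0 - beta_a) ** t * product


def p_state_recursive(t, a0, v0, beta_a, beta_v):
    """Same quantity obtained by iterating the no-US updates t times."""
    arousal, strength = a0, v0
    for _ in range(t):
        arousal *= 1.0 - beta_a              # arousal decays without the US
        strength *= 1.0 - arousal * beta_v   # loss of V is gated by current arousal
    return arousal * strength


def extinction_rate(t, a0, beta_a, beta_v):
    """Equation (F3): ratio p(S_t) / p(S_{t-1})."""
    return (1.0 - beta_a) * (1.0 - a0 * beta_v * (1.0 - beta_a) ** t)


if __name__ == "__main__":
    a0, v0, beta_a, beta_v = 0.9, 0.8, 0.2, 0.05   # hypothetical values
    for t in (1, 5, 10, 40):
        print(t, p_state_closed_form(t, a0, v0, beta_a, beta_v),
              extinction_rate(t, a0, beta_a, beta_v))
```

As t grows, the bracketed term in Equation (F3) approaches 1, so the printed rate approaches 1 − β_A regardless of the starting values, which is the property the main text relies on.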


Acknowledgements

Order of authorship was determined by the flip of a coin. The authors would like to thank Adam Kynaston for assistance conducting these experiments, Peter Killeen for feedback regarding the quantitative analyses, Corina Jimenez-Gomez for feedback on a previous version of this manuscript, Jim Woods for use of the laboratory facilities, and Geoff White and two anonymous reviewers for their invaluable feedback on the submitted manuscript.

References

Bindra, D., 1972. A unified account of Pavlovian conditioning and operant training. In H.A. Black, W.F. Prokasy (Eds.), Classical Conditioning II: Current research and theory. Appleton-Century-Crofts, New York, pp. 453–482. Bouton, M.E., 1993. Context, time, and memory retrieval in the interference paradigms of Pavlovian learning. Psych. Bull. 114, 80–99. Bouton, M.E., 2004. Context and behavioral processes in extinction. Learn. Mem. 11, 485–494. Bouton, M.E., Brooks, D.C., 1993. Time and context effects on performance in a Pavlovian discrimination reversal. J. Exp. Psych. Anim. Behav. Proc. 19, 165–179. Bouton, M.E., King, D.A., 1983. Contextual control of the extinction of conditioned fear: tests for the associative value of the context. J. Exp. Psych. Anim. Behav. Proc. 9, 248–265. Bouton, M.E., Peck, C.A., 1989. Context effects on conditioning, extinction, and reinstatement in an appetitive conditioning preparation. Anim. Learn. Behav. 17, 188–198. Brooks, D.C., Bouton, M.E., 1993. A retrieval cue for extinction attenuates spontaneous recovery. J. Exp. Psych. Anim. Behav. Proc. 19, 77–89. Brown, P.L., Jenkins, H.M., 1968. Auto-shaping of pigeon’s key-peck. J. Exp. Anal. Behav. 11, 1–8. Burgos, J.E., Murillo-Rodriguez, E., 2007. Neural-network simulations of two context-dependence phenomena. Behav. Proc. 75, 242–249. Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: a Practical Information-theoretic Approach. Springer-Verlag, New York. Couvillon, P.A., Bitterman, M.E., 1986.
Performance of honeybees in reversal and ambiguous-cue problems: tests of a choice model. Anim. Learn. Behav. 14, 225–231. Delamater, A.R., 2004. Experimental extinction in Pavlovian conditioning: behavioural and neuroscience perspectives. Quart. J. Exp. Psych. B 57, 97–132. Estes, W.K., 1950. Toward a statistical theory of learning. Psych. Rev. 57, 94–107. Farwell, B.J., Ayres, J.J.B., 1979. Stimulus-reinforcer and response-reinforcer relations in the control of conditioned appetitive headpoking (goal tracking) in rats. Learn. Motiv. 10, 295–312. Flaherty, C.F., 1996. Incentive Relativity. Cambridge University Press, New York. Gallistel, C.R., Gibbon, J., 2000. Time, rate, and conditioning. Psych. Rev. 107, 289–344. Haselgrove, M., Aydin, A., Pearce, J.M., 2004. A partial reinforcement extinction effect despite equal rates of reinforcement during Pavlovian conditioning. J. Exp. Psych. Anim. Behav. Proc. 30, 240–250. Hearst, E., 1977. Classical conditioning as the formation of interstimulus associations: stimulus substitution, parasitic reinforcement, and autoshaping. In A. Dickinson, R.A. Boakes (Eds.), Mechanisms of learning and motivation: a memorial volume to Jerzy Konorski. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 19–52. Hearst, E., Jenkins, H.M., 1974. Sign-tracking: The Stimulus-reinforcer Relation and Directed Action. The Psychonomic Society, Austin, TX. Honey, R.C., Watt, A., 1999. Acquired relational equivalence between contexts and features. J. Exp. Psych. Anim. Behav. Proc. 25, 324–333. Hull, C.L., 1943. Principles of Behavior: An Introduction to Behavior Theory. Appleton-Century-Crofts, New York. Kehoe, E.J., 1988. A layered network model of associative learning: learning to learn and configuration. Psych. Rev. 95, 411–433. Killeen, P.R., 1975. On the temporal control of behavior. Psych. Rev. 82, 89–115. Killeen, P.R., 1994. Mathematical principles of reinforcement. Behav. Brain. Sci. 17, 105–135. Killeen, P.R., 1998. 
The first principle of reinforcement. In C.D.L. Wynne, J.E.R. Staddon (Eds.) Models of Action: Mechanisms for Adaptive Behavior. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 127–156.

Killeen, P.R., Hall, S.S., 2001. The principal components of response strength. J. Exp. Anal. Behav. 75, 111–134. Killeen, P.R., Hall, S.S., Reilly, M.P., Kettle, L.C., 2002. Molecular analyses of the principal components of response strength. J. Exp. Anal. Behav. 78, 127–160. Killeen, P.R., Hanson, S.J., Osborne, S.R., 1978. Arousal: its genesis and manifestation as response rate. Psych. Rev. 85, 571–581. Killeen, P.R., Sanabria, F., Dolgov, I., 2009. The dynamics of conditioning and extinction. J. Exp. Psych. Anim. Behav. Proc. 35, 447–472. Killeen, P.R., Sitomer, M.T., 2003. MPR. Behav. Proc., 49–64. Larrauri, J.A., Schmajuk, N.A., 2008. Associative models can describe both causal learning and conditioning. Behav. Proc. 77, 443–445. Lindblom, L.L., Jenkins, H.M., 1981. Responses eliminated by noncontingent or negatively contingent reinforcement recover in extinction. J. Exp. Psych. Anim. Behav. Proc. 7, 175–190. Mackintosh, N.J., 1974. The Psychology of Animal Learning. Academic Press, London. Mackintosh, N.J., 1975. Theory of attention: variations in associability of stimuli with reinforcement. Psych. Rev. 82, 276–298. Miller, R.R., Barnet, R.C., Grahame, N.J., 1995. Assessment of the Rescorla-Wagner model. Psych. Bull. 117, 363–386. Nevin, J.A., 1992. An integrative model for the study of behavioral momentum. J. Exp. Anal. Behav. 57, 301–316. Nevin, J.A., Grace, R.C., 2000. Behavioral momentum and the law of effect. Behav. Brain. Sci., 73–90. Nevin, J.A., Grace, R.C., 2005. Resistance to extinction in steady state and in transition. J. Exp Psych. Anim. Behav. Proc. 31, 199–212. Nevin, J.A., Tota, M.E., Torquato, R.D., Shull, R.L., 1990. Alternative reinforcement increases resistance to change: Pavlovian or operant contingencies? J. Exp. Anal. Behav. 53, 359–379. Pavlov, I.P., 1927. Conditioned Reflexes (G.V. Anrep, translation). Oxford University Press, London. Pearce, J.M., Hall, G., 1980. 
A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psych. Rev. 87, 532–552. Peterson, G.B., Ackil, J.E., Frommer, G.P., Hearst, E.S., 1972. Conditioned approach and contact behavior toward signals for food or brain-stimulation reinforcement. Science, 1009–1011. Podlesnik, C.A., Shahan, T.A., 2009. Behavioral momentum and relapse of extinguished operant responding. Learn. Behav. 37, 357–364. Podlesnik, C.A., Shahan, T.A., 2010. Extinction, relapse, and behavioral momentum. Behav. Proc. 84, 400–411. Rauhut, A.S., Thomas, B.L., Ayres, J.J.B., 2001. Treatments that weaken Pavlovian conditioned fear and thwart its renewal in rats: implications for treating human phobias. J. Exp. Psych. Anim. Behav. Proc. 27, 99–114. Rescorla, R.A., 1993. Preservation of response-outcome associations through extinction. Anim. Learn. Behav. 21, 238–245. Rescorla, R.A., 2001. Experimental extinction. In R.R. Mowrer, S.B. Klein (Eds.), Handbook of contemporary learning theories. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 119–154. Rescorla, R.A., 2002. Comparison of the rates of associative change during acquisition and extinction. J. Exp. Psych. Anim. Behav. Proc. 28, 406–415. Rescorla, R.A., 2007. Spontaneous recovery after reversal and partial reinforcement. Learn. Behav. 35, 191–200. Rescorla, R.A., Solomon, R.L., 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental learning. Psych. Rev. 74, 151–182. Rescorla, R.A., Wagner, A.R., 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black, H.A., Prokasy, W.F. (Eds.), Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts, New York, pp. 64–99. Simmons, R., 1924. The relative effectiveness of certain incentives in animal learning. Comp. Psych. Monograph. 11, 1–79. Spear, N.E., Smith, G.J., Bryan, R., Gordon, W., Timmons, R., Chiszar, D., 1980. 
Contextual influences on the interaction between conflicting memories in the rat. Anim. Learn. Behav., 273–281. Swartzentruber, D., 1991. Blocking between occasion setters and contextual stimuli. J. Exp. Psych. Anim. Behav. Proc. 17, 163–173. Thomas, D.R., McKelvie, A.R., Mah, W.L., 1985. Context as a conditional cue in operant discrimination reversal learning. J. Exp. Psych. Anim. Behav. Proc. 11, 317–330. Timberlake, W., 1994. Behavior systems, associationism, and Pavlovian conditioning. Psychon. Bull. Rev. 1, 405–420. Wagner, A.R., 1981. SOP: a model of automatic memory processing in animal behavior. In: Spear, N.E., Miller, R.R. (Eds.), Information Processing in Animals: Memory Mechanisms. Erlbaum, Hillsdale, NJ, pp. 5–47.