Optimal behavior and concurrent variable ratio-variable interval schedules


Journal of Mathematical Psychology 49 (2005) 339–353 www.elsevier.com/locate/jmp

Rachel Belinsky a,*, Fernando González b, Jeanne Stahl b

a Department of Mathematics and Statistics, Georgia State University, University Plaza, Atlanta, GA 30303, USA
b Department of Psychology, Morris Brown College, 641 Martin Luther King Jr. Blvd, Atlanta, GA 30314, USA

Received 4 August 2004; received in revised form 14 February 2005

Abstract

Concurrent variable ratio-variable interval (CONC VRVI) schedules of reinforcement, and the time-based analog of the same schedule (CONC VT*VT), have been used to determine whether the matching law accounts for the distribution of choices between the behavior alternatives more accurately than the assumption that subjects distribute time between the alternatives to maximize total reinforcement rate. The results of those experiments leave room for interpretation. One problem is the lack of understanding of the theoretical outcomes associated with maximization in these schedules. A precise understanding of the characteristics of optimal behavior (OB) could help identify experimental evidence of OB. Here we derive equations that describe the optimal times the subject should spend on each alternative of the schedule. We provide a table of the optimal times for a wide range of parameter values of the schedule that experimenters can use to easily compare experimental results to the results expected if subjects behave optimally. We also derive a function m that relates matching and optimal performance, and we prove interesting characteristics of the function. Finally, we describe features of OB with CONC VT*VT and with concurrent variable time schedules that can be used to identify evidence of OB.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Optimal behavior; Optimization; Matching; Matching law; Reinforcement maximization; Concurrent variable ratio-variable interval schedules; Schedules of reinforcement

doi:10.1016/j.jmp.2005.02.005

1. Introduction

1.1. Reinforcement maximization and the matching law

The two-independent-component concurrent variable interval schedule (CONC VIVI) has often been used to study how infrahuman animals distribute time between alternative behaviors that yield different rates of reinforcement. In this experimental preparation, a laboratory animal such as a rat is trained to emit two different responses (e.g., pushing down on lever 1 or lever 2) that are occasionally followed by presentation of a reinforcer (e.g., a food pellet). A computer controls and records all relevant events that take place during the experimental sessions, such as the presentation of the reinforcer, the presentation of stimuli that may be paired with the different response alternatives, and the time of occurrence of each response. The following equation, now usually called the matching law, was suggested by Herrnstein (1961) to describe the subject's performance under two-alternative CONC VIVI schedules:

    P1/(P1 + P2) = r1/(r1 + r2)    (1.1)

or, equivalently,

    P1/P2 = r1/r2,

where P1, P2 refer to the number of responses on each of the two alternatives and r1, r2 are the corresponding


number of received reinforcements. When a subject's allocation of responses between the alternatives results in data that are described accurately by Eq. (1.1), the subject is said to be "matching". Because many studies have reported consistent deviation from matching (for reviews see Wearden & Burgess, 1982; Davison & McCarthy, 1988), several variations of the matching law have been put forth that yield better fits to the data. Baum (1974), for example, suggested the exponentiated version

    P1/P2 = c (r1/r2)^s

or, equivalently,

    P1/(P1 + P2) = r1^s/(r1^s + r2^s/c),    (1.2)

where c and s are free parameters. Eq. (1.2) is referred to as the generalized matching law (G-match). In addition to response frequency, the dwell time (DT) on each alternative (the time spent at each alternative) has been used as a dependent variable in concurrent schedule research. Then Eq. (1.2) is expressed as

    t1/(t1 + t2) = r1^s/(r1^s + r2^s/c),    (1.3)

where t1 and t2 stand for the two DTs. Eq. (1.3) has repeatedly been shown to fit the data from CONC VIVI experiments as accurately as (1.2) (Baum, 1979; Davison & McCarthy, 1988; Wearden & Burgess, 1982), and it also describes very accurately the results of experiments that used concurrent variable time schedules (CONC VTVT). The difference between CONC VIVI and CONC VTVT schedules is that in the latter the subject is not required to make a specific response to produce a reinforcer. For example, Baum and Rachlin (1969) programmed a CONC VTVT in a shuttle box. The behavior alternatives were the subject's presence on one or the other side of the box. A reinforcement ready for delivery on one side of the box was delivered when the subject was on that side. No other specific actions were required on the part of the subject. The dependent variable was the time spent on each side, i.e., the DT at each alternative.

One issue concerning matching that has interested investigators is the relation between matching behavior and optimal behavior (OB), that is, whether a subject by matching also approximates the maximum number of reinforcements that the experimental procedure can yield during the session. This is an obvious question, as it is reasonable to assume that contingencies of reinforcement will act over time to select behaviors that optimize net energy gain for the organism (Stephens & Krebs, 1986). In the experimental preparations used in most studies concerning concurrent schedules, the reinforced behavior is a light downward push on a lever; therefore the cost of behaving is minimal and can usually be ignored in the calculation of energy gain. That leaves the number of earned reinforcements as the only factor relevant to optimization in a session of reasonably short duration. Houston and McNamara (1981) derived equations that describe the times an animal should spend on each alternative of a CONC VTVT to maximize total reinforcement rate and showed that the matching law is incompatible with the optimal DT equations. In a previous paper (Belinsky, Gonzalez, & Stahl, 2004) we derived a function m that relates matching and optimal performance in CONC VTVT. We proved theorems establishing the extreme values of the function m and proved other interesting formal characteristics of the function. In addition, we provided a table of optimal dwell times (ODT) for a wide range of parameter values of the CONC VTVT that experimenters can use to compare experimental results to the results expected if subjects behave optimally.

The present paper extends our analysis of optimal performance to a time-dependent analog of the concurrent variable ratio-variable interval schedule (CONC VRVI). In general, CONC VRVI schedules yield data that are described as accurately by Eq. (1.3) as are the data obtained with CONC VIVI or CONC VTVT schedules. However, CONC VIVI data are usually best fitted with s < 1 in Eq. (1.3), whereas CONC VRVI data are generally best fitted with s > 1. This indicates that with CONC VRVI schedules the animals tend to "overmatch" (Baum, 1979), that is, they spend more time on the component that yields the higher reinforcement rate than is required for strict matching (i.e., s = 1). The shaded area in Fig. 1 denotes G-match for the range of values of s that describe most published results. Investigators (e.g., Baum & Aparicio, 1999; Herrnstein & Heyman, 1979; Heyman & Herrnstein, 1986) have used CONC VRVI schedules to generate data for testing competing behavioral theories of choice, correctly reasoning (although no mathematical proof was provided) that if subjects distributed responses in CONC VRVI so as to maximize reinforcement rate, the resulting data would indicate overmatching. It was not specified, however, how large the parameter s in Eq. (1.2) would be if the animal behaved optimally. It seems, therefore, that derivation of ODT equations that are applicable to CONC VRVI would represent a contribution of value to researchers interested in matching and behavioral studies of choice. As we did in our paper on CONC VTVT, we provide a table of ODTs for a wide range of values of the procedure parameters of the component schedules that experimenters could use to compare against data. We also determine analytically the extreme values of the function m for the time-dependent analog of the CONC VRVI schedule and prove several theorems concerning interesting characteristics of the function. Here we do


Fig. 1. Dwell time proportion w = DT2/(DT1 + DT2) on the alternative associated with the component schedule with the lower programmed reinforcement rate (VT2) as a function of the proportion of earned reinforcements u = ER2/(ER1 + ER2) yielded by the same schedule. It is assumed that the programmed mean inter-reinforcement interval for the other schedule (VT1) was held constant at 1/R1 and that VT2 has any value 1/R2 greater than 15 s, as long as R2 = λ·R1. The curves represent optimal behavior (OB) with each of three CODs and, therefore, three values of τ. The shaded area represents G-match for the range of values of s (Eq. (1.3)) that describe most published results. The OB curves are concave up portions of hyperbolae. They are bounded at (0, 0) on the low end. The high ends depend on the value of τ (see Theorem 8). The OB curves could be approximately overlaid by curves based on G-match with very high s values. However, since the OB curves only exist over a fraction of the values of u whereas G-match exists over all values of u, it does not make sense to describe the OB curves using G-match.
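The G-match region shaded in Fig. 1 comes directly from Eq. (1.3). The short Python sketch below (ours, not part of the original paper) evaluates the predicted dwell-time proportion for given reinforcement counts and the free parameters c and s.

```python
def gmatch_proportion(r1, r2, s=1.0, c=1.0):
    """Generalized matching law (Eq. (1.3)): predicted proportion
    t1/(t1 + t2) of total dwell time allocated to alternative 1."""
    return r1**s / (r1**s + r2**s / c)

# Strict matching (s = c = 1) reduces to r1/(r1 + r2):
print(gmatch_proportion(30, 10))        # 0.75
# Overmatching (s > 1) shifts allocation toward the richer alternative:
print(gmatch_proportion(30, 10, s=2.0)) # 0.9
```

For s only slightly above 1, as typically reported for CONC VRVI data, the predicted curve bends only mildly away from strict matching.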

not review published data to show how they compare to optimal behavior. We simply hope to provide the means by which others could conduct such an analysis.

1.2. Description of the problem and notations

A VI schedule stipulates that a reinforcer becomes available for delivery after intervals of unpredictable duration. For example, in a VI60 s schedule a food pellet becomes available once per minute on the average. Presentation of the reinforcer immediately follows the first response emitted after the food pellet is made available by the schedule. At any point in time only one reinforcer is held waiting for delivery, and no more are made available until the stored one is delivered. A VR schedule stipulates that a reinforcer is delivered after the subject emits a number of responses that varies unpredictably from reinforcement to reinforcement. For example, in a VR10 a food pellet is presented to the subject, on the average, after it emits 10 responses. In CONC VRVI, as in CONC VIVI schedules, a changeover delay (COD) is


typically imposed to keep the subject from switching at a high rate. A COD prescribes that the subject cannot obtain a reinforcer within some time period following a changeover. For example, in a CONC VI1VI2 schedule with a COD of 3 s, a reinforcer that becomes available on the VI1 component while the subject is responding on the VI2 component will be delivered following the first response on VI1 emitted at least 3 s after the changeover to VI1.

In a CONC VR1VI2 the two component schedules operate independently, one for each of two behavior alternatives BA1 and BA2, respectively. While engaged on BA1, the subject obtains reinforcements according to schedule VR1. At the same time, since VI2 continues to run, the probability that reinforcement will be available when the subject changes over to BA2 increases. However, since the operation of VR1 depends on BA1 responses, the VR1 schedule does not run when the subject is engaged on BA2, and therefore a VR1 reinforcer is never stored for delivery after a changeover to BA1.

The analysis of optimal performance we described in our previous paper (Belinsky et al., 2004) was based on the CONC VTVT schedule (see also Houston & McNamara, 1981) rather than on CONC VIVI. The results of this analysis apply to CONC VIVI when response rates during the VI components are high relative to the programmed reinforcement rates. The published data suggest that this is a safe assumption. Similarly, here we base our analysis on an analog of CONC VRVI in which a specific response is not required to produce the reinforcer. In this respect, the schedule considered here is related to CONC VRVI in the same way that CONC VTVT is related to CONC VIVI. It should be noted that at least two research papers (DeCarlo, 1985; Heyman & Herrnstein, 1986) report data generated with the schedule that we use for the present analysis. Those papers concern experiments to test the predictions of matching and maximization accounts of choice. The time-based analog of CONC VRVI was used in those studies because with this schedule, as with CONC VRVI, matching and total reinforcement maximization predict different values of the parameter s in Eq. (1.3). The time-based analog of CONC VRVI also offered the advantage that response costs should be equivalent in the component schedules and, therefore, would not confound the results.

We assume a variant of the two-component CONC VTi (i = 1, 2) for two behavior alternatives. Behavior alternative 1 (BA1) is reinforced according to component schedule VT1*, which is the time-dependent VR analog, and behavior alternative 2 (BA2) is reinforced according to a variable time component schedule VT2. Neither of the component schedules operates during the time occupied by reinforcement presentation. The reinforcement presentation times are discarded from


the analysis. Whenever the subject switches from one alternative to the other, a changeover delay (COD) is imposed during which reinforcers are not delivered. Component VT1* is a variable time schedule that operates only when the subject is engaged in BA1, but not during CODs. Component VT2 is a standard variable time schedule that operates when the subject is engaged on either behavior alternative and during the CODs. When the subject is at BA1, only one reinforcer scheduled by the VT2 component is stored for delivery, following the COD, after the subject switches to BA2. Programmed reinforcement (PR) rates R1 and R2 correspond to VT1* and VT2. Since a VT1* reinforcer is never stored for delivery following a switch to BA1, it is obvious that if R2 ≥ R1, then the subject maximizes reinforcement rate by never engaging in BA1. Therefore, our analysis assumes that R2 < R1. We refer to VT1* as the "dense" schedule and to VT2 as the "lean" schedule. According to the Poisson distribution, the probability that a reinforcer has not been set up on VT2 by time t is e^{−R2·t}, and the average time between reinforcements is 1/R2. When the component schedule VT2 sets up a reinforcer for delivery while the subject is engaged in BA1, schedule VT2 stops functioning, and the unobtained reinforcer is stored until the subject switches to BA2 and gets it. At that point VT2 resumes functioning. When the animal switches to component i, a COD of duration si must elapse before a reinforcer can be obtained. The dwell time t1 is the time elapsed between the changeovers to and from BA1, and t2 is the time elapsed between the changeovers to and from BA2. If si ≥ 1/Ri, then the subject will obtain more reinforcers by not changing over. Therefore, we assume that si < 1/Ri. The variables that determine earned reinforcements (ER) are the stay times a and b, which represent the portions of the dwell times during which reinforcers can be delivered. Thus, t1 = s1 + a and t2 = s2 + b. Also remember that we write ODT for optimal dwell time, ER for the number of earned reinforcements, MER for the maximum number of earned reinforcements, and OB for optimal behavior.
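To make the schedule just described concrete, the following Python sketch (ours; the function names and parameter values are illustrative, not from the paper) simulates many changeover cycles of the time-based analog (VT1* running only during the stay time a; VT2 running throughout but storing at most one reinforcer) and checks the simulated ER rate against the expected rate implied by r1 = R1·a and r2 = p + R2·b.

```python
import math
import random

def exact_rate(R1, R2, a, b, s1, s2):
    """Mean ER rate implied by r1 = R1*a and r2 = p + R2*b with
    p = 1 - exp(-R2*(a + s1 + s2)), over the cycle a + b + s1 + s2."""
    p = 1.0 - math.exp(-R2 * (a + s1 + s2))
    return (R1 * a + p + R2 * b) / (a + b + s1 + s2)

def simulated_rate(R1, R2, a, b, s1, s2, n_cycles=100_000, seed=1):
    """Monte Carlo estimate of the same rate. Exponential set-up times
    make it irrelevant (by memorylessness) whether the VT1* timer is
    frozen or reset between visits to BA1."""
    rng = random.Random(seed)
    earned = 0
    vt2_wait = rng.expovariate(R2)  # time left until VT2 sets up a reinforcer
    for _ in range(n_cycles):
        # BA1: VT1* delivers as a Poisson process of rate R1 during stay time a
        t = rng.expovariate(R1)
        while t <= a:
            earned += 1
            t += rng.expovariate(R1)
        # VT2 keeps running during s1 + a + s2 but stores at most one
        # reinforcer; a stored reinforcer is collected right after the COD
        away = s1 + a + s2
        if vt2_wait <= away:
            earned += 1
            vt2_wait = rng.expovariate(R2)  # VT2 resumes after collection
        else:
            vt2_wait -= away
        # BA2: reinforcers are collected as VT2 sets them up during stay time b
        rem = b
        while vt2_wait <= rem:
            earned += 1
            rem -= vt2_wait
            vt2_wait = rng.expovariate(R2)
        vt2_wait -= rem
    return earned / (n_cycles * (a + b + s1 + s2))

# Illustrative parameters: R1 = 1, R2 = 0.5 (min^-1), a = 1, b = 0.5,
# CODs s1 = s2 = 0.1 (min)
sim = simulated_rate(1.0, 0.5, 1.0, 0.5, 0.1, 0.1)
exact = exact_rate(1.0, 0.5, 1.0, 0.5, 0.1, 0.1)
assert abs(sim - exact) < 0.02
```

The agreement of the two estimates is a consistency check on the cycle model; the analysis below works with the exact expectations only.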

2. Basic equations

It is assumed that the subject's behavior is sensitive to the parameters of the problem (Ri, si). When the animal switches to BA2, the probability that a reinforcer will not be available is e^{−R2(a + s1 + s2)}; therefore the probability of collecting a reinforcer after the COD is p = 1 − e^{−R2(a + s1 + s2)}. The animal then remains engaged at BA2 for a time b during which the expected mean number of ER is R2·b. Thus, the expected mean number of ER during time t2 is r2 = p + R2·b. But when the animal switches to BA1, nothing is stored there and, after the COD elapses, reinforcements are delivered at the rate R1. The animal then remains engaged at BA1 for a time a during which the expected number of ER is R1·a. Thus, the expected mean number of ER during time t1 is r1 = R1·a. For the cycle t1 + t2 the expected mean number of ER is r = r1 + r2 = p + R1·a + R2·b. The total duration of the cycle is t = a + b + s1 + s2, so that if R0(a, b) is the mean ER rate for stay times a, b, then

    (a + b + s1 + s2) R0 = r1 + r2,    (2.1)

that is,

    (a + b + s1 + s2) R0 = 1 − e^{−R2(a + s1 + s2)} + R1·a + R2·b.

In most experiments s1 = s2, but no extra effort is required to find the optimal behavior when the CODs are not equal. Note that the CODs enter Eq. (2.1) only as s1 + s2. This means that for given R1 and R2 the optimal behavior depends on the sum of the CODs. Thus, any problem with unequal CODs is equivalent to one with each COD having the value s, where 2s = s1 + s2. With this in mind, we may rewrite Eq. (2.1) as

    (a + b + 2s) R0 = 1 − e^{−R2(a + 2s)} + R1·a + R2·b.    (2.2)

We assume that R1 > R2 and let λ = R2/R1, hence λ < 1. Note that here a, b, s, t1, t2, t are times in minutes and R1, R2, R0 are reinforcement rates in min^{−1}. It is convenient to have unitless variables

    x = R1·a,  y = R1·b,  τ = R1·s,  T1 = R1·t1,  T2 = R1·t2,  T = R1·t,  R = R0/R1.    (2.3)

Here x, y, τ, T1, T2, T are unitless normalized times, and R is the unitless normalized mean ER rate. The unitless PR rate on BA1 is 1 and the unitless PR rate on BA2 is λ. Note that τ = R1·s < 1. The mean numbers of ER in terms of unitless variables are

    r1 = x,  r2 = 1 − e^{−λ(x + 2τ)} + λy.    (2.4)

Eq. (2.1) then becomes

    (x + y + 2τ) R = r1 + r2,    (2.5)

that is, from Eq. (2.2),

    (x + y + 2τ) R = 1 − e^{−λ(x + 2τ)} + x + λy    (2.6)

and

    p = 1 − e^{−λ(x + 2τ)}.    (2.7)

The unitless dwell times are T1 = x + τ for BA1 and T2 = y + τ for BA2. The total cycle duration is T = T1 + T2 = x + y + 2τ.

The following theorem shows that staying some time on each side after the COD elapses is not an optimal solution. For OB the subject must stay on the dense


schedule some time x, switch to the lean schedule only to take a stored reinforcer, and then switch back.

Theorem 1. The function R, defined by Eq. (2.6), has an absolute maximum when 0 < τ < 0.5, y = 0 and x = z/λ − 2τ, where z is a solution to the equation

    e^{−z}(z + 1) = 1 − 2τ.    (2.8)

Proof. To find the maximum value of the ER rate (MER), we differentiate Eq. (2.6) with respect to x and y:

    R + (x + y + 2τ) ∂R/∂x = λe^{−λ(x + 2τ)} + 1,
    R + (x + y + 2τ) ∂R/∂y = λ

and equate the derivatives of R to 0:

    R = λe^{−λ(x + 2τ)} + 1,
    R = λ.

Now we see that the equation

    R = λ = λe^{−λ(x + 2τ)} + 1

does not have any solutions, because λ < 1, but the right-hand side is greater than 1. The function R does not have stationary points inside the region x > 0, y > 0 and does not have a maximum there. (See details in Appendix A.) Thus, we set y = 0. Then, for fixed λ and τ, R depends on one variable x, and Eq. (2.6) becomes

    (x + 2τ) R = 1 − e^{−λ(x + 2τ)} + x.    (2.9)

We differentiate Eq. (2.9) with respect to x,

    R + (x + 2τ) R′ = λe^{−λ(x + 2τ)} + 1,

and equate the derivative of R to 0:

    R = λe^{−λ(x + 2τ)} + 1.    (2.10)

This is the maximum value of R, because R″ < 0 at the point where R′ = 0:

    2R′ + (x + 2τ) R″ = −λ²e^{−λ(x + 2τ)}.

Combining Eqs. (2.9) and (2.10), we find that the optimal stay time x on BA1 satisfies the equation

    (x + 2τ)(λe^{−λ(x + 2τ)} + 1) = 1 − e^{−λ(x + 2τ)} + x,

which may be brought to the form

    e^{−λ(x + 2τ)}(λ(x + 2τ) + 1) = 1 − 2τ.    (2.11)

Let z = λ(x + 2τ); then Eq. (2.11) becomes

    e^{−z}(z + 1) = 1 − 2τ.

We see that in this equation z depends on τ only and does not depend on λ. Since the left-hand side is positive, we require τ < 0.5. □
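Eq. (2.8) has no closed-form solution, but since its left-hand side decreases monotonically from 1 to 0 on z > 0, z is easy to obtain numerically. The Python sketch below (ours, not from the paper) solves Eq. (2.8) by bisection and recovers the optimal stay time x = z/λ − 2τ of Theorem 1, reproducing entries of Table 1.

```python
import math

def solve_z(tau, tol=1e-12):
    """Solve e^{-z}(z + 1) = 1 - 2*tau (Eq. (2.8)) by bisection.
    The left-hand side decreases from 1 to 0 on z > 0, so the root
    exists and is unique for every 0 < tau < 0.5."""
    assert 0.0 < tau < 0.5
    lo, hi = 0.0, 60.0
    target = 1.0 - 2.0 * tau
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1.0) > target:
            lo = mid  # LHS still too large: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_stay_time(lam, tau):
    """Optimal stay time on BA1 (Theorem 1): x = z/lambda - 2*tau.
    Note that z depends on tau only, not on lambda."""
    return solve_z(tau) / lam - 2.0 * tau

# Reproduce two entries of Table 1 (tau = 0.04; lambda = 1 and lambda = 0.04):
assert abs(optimal_stay_time(1.0, 0.04) - 0.38575) < 1e-3
assert abs(optimal_stay_time(0.04, 0.04) - 11.5636) < 1e-2
```

Converting back to real time, the optimal stay on the dense schedule is a = x/R1 minutes.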

Table 1
Optimal stay times x for different values of λ (rows) and τ (columns)

λ\τ    0.04      0.08      0.12      0.16      0.2       0.24      0.28      0.32      0.36      0.4       0.44      0.48
0.04   11.5636   17.6409   23.1114   28.4828   34.0105   39.9027   46.3929   53.8065   62.6787   74.0577   90.597    124.359
0.08   5.74182   8.74047   11.4357   14.0814   16.8053   19.7114   22.9165   26.5832   30.9793   36.6288   44.8585   61.6995
0.12   3.80122   5.77365   7.5438    9.28094   11.0702   12.9809   15.091    17.5088   20.4129   24.1526   29.6123   40.813
0.16   2.83091   4.29024   5.59785   6.8807    8.20263   9.61568   11.1782   12.9716   15.1297   17.9144   21.9893   30.3697
0.2    2.24873   3.40019   4.43028   5.44056   6.48211   7.59654   8.83058   10.2493   11.9597   14.1715   17.4154   24.1038
0.24   1.86061   2.80682   3.6519    4.48047   5.33509   6.25045   7.26548   8.43441   9.84644   11.6763   14.3662   19.9265
0.28   1.58338   2.38299   3.09592   3.79469   4.51579   5.28896   6.14756   7.13807   8.33695   9.89395   12.1881   16.9427
0.32   1.37546   2.06512   2.67893   3.28035   3.90132   4.56784   5.30911   6.16581   7.20483   8.55721   10.5546   14.7049
0.36   1.21374   1.81788   2.3546    2.88031   3.42339   4.00697   4.65699   5.40961   6.3243    7.51752   9.28412   12.9643
0.4    1.08436   1.62009   2.09514   2.56028   3.04105   3.55827   4.13529   4.80465   5.61987   6.68577   8.2677    11.5719
0.44   0.97851   1.45827   1.88286   2.29844   2.72823   3.19116   3.70845   4.30968   5.04352   6.00524   7.4361    10.4326
0.48   0.8903    1.32341   1.70595   2.08023   2.46754   2.88523   3.35274   3.89721   4.56322   5.43814   6.74309   9.48325
0.52   0.81567   1.2093    1.55626   1.8956    2.24696   2.62636   3.05176   3.54819   4.15682   4.95828   6.1567    8.67992
0.56   0.75169   1.1115    1.42796   1.73734   2.0579    2.40448   2.79378   3.24903   3.80848   4.54698   5.65407   7.99136
0.6    0.69624   1.02673   1.31676   1.60019   1.89404   2.21218   2.57019   2.98976   3.50658   4.19051   5.21847   7.3946
0.64   0.64773   0.95256   1.21946   1.48018   1.75066   2.04392   2.37456   2.7629    3.24242   3.8786    4.83732   6.87244
0.68   0.60492   0.88711   1.13361   1.37428   1.62415   1.89545   2.20194   2.56273   3.00933   3.60339   4.501     6.41171
0.72   0.56687   0.82894   1.0573    1.28016   1.5117    1.76348   2.04849   2.3848    2.80215   3.35876   4.20206   6.00217
0.76   0.53282   0.77689   0.98902   1.19594   1.41108   1.64541   1.91121   2.2256    2.61677   3.13988   3.93458   5.63574
0.8    0.50218   0.73005   0.92757   1.12014   1.32053   1.53914   1.78765   2.08232   2.44993   2.94288   3.69385   5.30595
0.84   0.47446   0.68766   0.87197   1.05156   1.2386    1.44299   1.67585   1.95269   2.29898   2.76465   3.47605   5.00757
0.88   0.44926   0.64913   0.82143   0.98922   1.16412   1.35558   1.57422   1.83484   2.16176   2.60262   3.27805   4.73632
0.92   0.42625   0.61395   0.77528   0.9323    1.09611   1.27577   1.48143   1.72724   2.03646   2.45468   3.09726   4.48865
0.96   0.40515   0.58171   0.73298   0.88012   1.03377   1.20261   1.39637   1.6286    1.92161   2.31907   2.93154   4.26162
1      0.38575   0.55204   0.69406   0.83211   0.97642   1.13531   1.31812   1.53786   1.81595   2.19431   2.77908   4.05276


Note: Given τ < 0.5, we solve Eq. (2.8) for z numerically and, because z = λ(x + 2τ), we find the optimal stay time x = z/λ − 2τ.

Sometimes it will be convenient to express τ through z:

    τ = ½(1 − e^{−z}(z + 1)) = ½(1 − e^{−z} − e^{−z}·z).    (2.12)

The equations for the number of reinforcements obtained at BA1 and BA2 become

    r1 = z/λ − 2τ,  r2 = 1 − e^{−z}.    (2.13)

Table 1 provides the ODT values obtained by numerical solution of Eq. (2.11) for a wide range of values of the parameters λ and τ. We describe in Appendix D how to obtain the ODTs for values of λ and τ that do not appear in the table margins, and we show how to convert the normalized values to seconds.

The following theorem provides estimates of the optimal stay time x.

Theorem 2. If x is the optimal stay time, then
(1) x ≥ max{2(√τ/λ − τ), τ(6.70182/λ − 2)},
(2) x ≈ 2(√τ/λ − τ) for small τ,
(3) x ≈ τ(6.70182/λ − 2) for 0.2 < τ < 0.3, 0 < λ < 1.

See proof in Appendix B.

3. The matching function m

We define the matching function m, which is related to the matching law, as taking values that satisfy the relation

    m·r2/(r1 + r2) = T2/(T1 + T2),  that is,  m·r2/r = T2/T,  for 0 < τ < 0.5,

or, equivalently,

    r1/r2 + (1 − m) = m·T1/T2,

when T1 and T2 are optimal, where r = r1 + r2 is the total number of reinforcements obtained from both schedules during the time T = T1 + T2. It is clear that, on the strength of Eq. (2.11), when T1 and T2 are optimal, the stay time x is uniquely determined by λ and τ. Therefore, the matching function m depends only on two variables, λ and τ. Since ODT and MER are complexly interdependent, the range of the function m is not obvious. In order to determine the extreme values of m, we first need to express m in terms of λ and τ. In the analyses that follow, all mentions of proportions refer to the ratio of measures taken during the lean component (VT2) to the sum of the same measures for both components.

Theorem 3. If x is a solution of Eq. (2.11), then

    m = τ(λe^{−λ(x + 2τ)} + 1)/(1 − e^{−λ(x + 2τ)}) = τ(e^z + λ)/(e^z − 1),

where z = λ(x + 2τ) is a solution of Eq. (2.8), and

    m(λ, 0) = lim_{τ→0} m(λ, τ) = 0,  m(λ, 0.5) = lim_{τ→0.5} m(λ, τ) = 0.5.

Proof. Let (for τ > 0)

    Z = T1/T2 = (x + τ)/τ;

then x + τ = τZ and x + 2τ = (x + τ) + τ = τZ + τ = τ(Z + 1). Now we express the ER ratio in terms of Z using Eq. (2.5) with y = 0:

    r1/r2 + 1 = (r1 + r2)/r2 = (x + 2τ)R/r2 = τ(Z + 1)R/r2 = (Z + 1)·τR/r2.

By (2.10) and (2.4) with y = 0, we find

    r1/r2 + 1 = (Z + 1)·τR/r2 = (Z + 1)·τ(λe^{−λ(x + 2τ)} + 1)/(1 − e^{−λ(x + 2τ)}).

Let

    m = τ(λe^{−λ(x + 2τ)} + 1)/(1 − e^{−λ(x + 2τ)}) = τ(λe^{−z} + 1)/(1 − e^{−z}) = τ(e^z + λ)/(e^z − 1);

then r1/r2 + 1 = m(Z + 1), or

    r1/r2 + 1 = m(T1/T2 + 1).    (3.1)

For proof of m(λ, 0) = lim_{τ→0} m(λ, τ) = 0 and m(λ, 0.5) = lim_{τ→0.5} m(λ, τ) = 0.5 see Appendix C (Propositions C.1 and C.2). □

Below we prove that 0 < m < 0.5 for all admissible values 0 < λ < 1 and 0 < τ < 0.5, on solutions of Eq. (2.11).

Theorem 4. The function m satisfies the inequality 0 < m < 0.5 for all 0 < λ < 1, 0 < τ < 0.5.

See proof in Appendix C, Section C.5.
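Once z is known, the function m and the bounds above are easy to check numerically. The Python sketch below (ours, not from the paper) evaluates m from Theorem 3 and spot-checks Theorem 4 and the lower bound of Theorem 2 on a small grid.

```python
import math

def solve_z(tau):
    """Bisection for e^{-z}(z + 1) = 1 - 2*tau (Eq. (2.8)); the LHS decreases."""
    lo, hi = 0.0, 60.0
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1.0) > 1.0 - 2.0 * tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def m_fn(lam, tau):
    """Matching function on the optimal solution (Theorem 3):
    m = tau*(e^z + lambda)/(e^z - 1)."""
    z = solve_z(tau)
    return tau * (math.exp(z) + lam) / (math.exp(z) - 1.0)

for lam in (0.1, 0.5, 0.9):
    for tau in (0.05, 0.25, 0.45):
        z = solve_z(tau)
        m = m_fn(lam, tau)
        assert 0.0 < m < 0.5                            # Theorem 4
        x = z / lam - 2.0 * tau                         # optimal stay time (Theorem 1)
        assert x >= 2.0 * (math.sqrt(tau) / lam - tau)  # Theorem 2, part (1)
```

The grid values here are arbitrary; any 0 < λ < 1 and 0 < τ < 0.5 should satisfy the same assertions.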


4. Some characteristics of m, MER, ODT and their relations

Proof. We have w¼

The following theorem shows that m, that is, the ratio of the proportion of ODT to the proportion of MER, is a linear function of l, that is, the ratio of R2 to R1 . Theorem 5. The graph of m as a function of l is a straightline, whose m-intercept 1et z increases from 0 to 0.5 as t increases from 0 to 0.5, and whose slope k ¼ ez t1 is small: 0oko0:0781141. Proof. Since z does not depend on l (Eq. (2.8)), the derivative qm ql is constant for any fixed value of t. Therefore the graph of m as a function of l is a straight line. The slope of this line is (by Eqs. (3.1) and (2.12)) k¼

t 1  ez  zez ¼ . ez  1 2ðez  1Þ

345

T2 t lt lt ¼ ¼ . ¼ x þ 2t lðx þ 2tÞ z T

We see that w is proportional to l for any fixed t, therefore w ¼ TT2 increases when l increases and wo tz for all 0olo1. & It is proved in Appendix B that the function zt has minimum 6.70182 at t ¼ 0:267581 that corresponds z ¼ 1:79328. Therefore tz has maximum 0.149213 at t ¼ 0:267581 for any fixed l. If l is fixed, then wp0:149213l for all 0oto0:5, and wmax ¼ 0:149213l at t ¼ 0:267581. Theorem 7. The proportion u ¼ rr2 increases when l increases for any fixed t and decreases when t increases l for any fixed l. Moreover, uo lþ1 for all 0oto0:5 and ð1ez Þ uo zð1þez Þ for all 0olo1. Proof. We have by the definition of m and Eq. (3.1),

We find the derivative of the slope and equate it to 0: dk 1  ez þ ð2  ez Þz ¼ ; dz 2ðez  1Þ2

dk ¼0 dz

at z ¼ 0:842569;

ðas t ¼ 0:103292Þ,

kð0:842569Þ ¼ 0:0781141. Since limz!0 k ¼ limz!0

1ez zez 2ðez 1Þ

r2 T2 t lt ¼ ¼ ¼ r mT mðx þ 2tÞ mlðx þ 2tÞ lt lð1  ez Þ ¼ . ¼ mz zð1 þ lez Þ



¼ limz!0

ez ez þzez 2ez

z

¼ limz!0 ze2ez ¼ 0 (by L’Hopital’s Rule) and limz!1 z z k ¼ limz!1 1e2ðez ze ¼ 0, we conclude that 1Þ 0oko0:0781141 for all values of t. Slopes of these lines are very small. This means that dependence m on l is very weak, when l changes from 0 to 1, values of m change not more than by 0.0781141. & The following three theorems (6–8) describe the relation between the values of the MER proportion, the ODT proportion and l, t. Theorems 6 and 7 show that for a fixed value of t, the MER proportion and the ODT proportion increase as l increases, and that for a fixed value of l the MER proportion decreases as t increases, but the ODT proportion has maximum at t ¼ 0:267581. Therefore, it follows that these proportions do not change together for a fixed value of l in the direction predicted by the matching law. Theorem 8 shows that the graph of the relation between the MER and ODT proportions for any fixed t is a concave up portion of a hyperbola that starts at the origin ð0; 0Þ and ends at a point that depends on t. T2 T

Theorem 6. The ODT proportion w ¼ is proportional to l for any fixed t. Moreover, if t is fixed, then wo tz for all 0olo1; if l is fixed, then wo0:149213l for all 0oto0:5.

We have exactly the same expression for concurrent l variable interval schedules. See proof of uo lþ1 in the paper by Belinsky et al. (2004). To prove the second part, we note that qu 1ez ¼ 40, therefore the maximum value of u is ql zð1þlez Þ2 at l ¼ 1. Theorem 8. The relationship between w ¼ TT2 and u ¼ rr2 is described by a concave up portion of a hyperbola w¼

k1 u , k2  k3 u

(4.1)

where k1 ; k2 ; k3 are values that depend on t only: k1 ¼ tez ; k2 ¼ ez  1; k3 ¼ z. Endpoints of this portion z of hyperbola are the origin ð0; 0Þ and the point ð1e Þ t ðzð1þle z Þ ; zÞ. For any fixed l, the relationship between w ¼ TT2 and u ¼ rr2 is described by the parametric equation z

z

lð1e Þ lð1e ze u ¼ zð1þle z Þ, w ¼ 2z

z Þ

with parameter z.

Proof. We have from Eq. (2.13): r2 1  ez 1  ez ¼ ¼ . r1 þ r2 x þ 1  ez z  2t þ 1  ez l Since z is a solution to Eq. (2.8), we may substitute ez ðz þ 1Þ for 1  2t: r2 1  ez ¼z . r1 þ r2 þ zez l

ARTICLE IN PRESS R. Belinsky et al. / Journal of Mathematical Psychology 49 (2005) 339–353

346 r2 1 þr2

Let u ¼ r

¼

1ez z þzez l

, solve this equation for l:

z



ez

e uz  1  uz

Z ¼ k4 x þ 1.

and substitute this expression into Eq. (3.1): m¼

ez

tez .  1  uz

If we set k1 ¼ tez ; k2 ¼ ez  1; k3 ¼ z, where z is the k1 solution of Eq. (2.8), we obtain the formula m ¼ k k . 2 3u T2 The optimal proportion of time w ¼ T þT spent on 1 2 BA2 is utez w ¼ mu ¼ z e  1  uz r2 where u ¼ . r1 þ r2

k1 u w¼ k2  k3 u

or

Thus, if τ is fixed, then the graph of w = T2/(T1 + T2) vs. u = r2/(r1 + r2) is a part of a hyperbola that is increasing and concave up, as we can see in Fig. 1; the value of w increases as u increases. As follows from Theorems 6 and 7, both u and w increase as λ increases. Therefore this portion of the hyperbola lies between the origin (0, 0), where λ = 0, and the point ((1 − e^{−z})/(z(1 + e^{−z})), τ/z), where λ = 1. The function (1 − e^{−z})/(z(1 + e^{−z})) decreases as τ increases, because its derivative with respect to z, −(1 − 2ze^{−z} − e^{−2z})/(z²(1 + e^{−z})²), is negative and dz/dτ > 0. Therefore the bigger the τ, the shorter the portion of the hyperbola in the direction of the u-axis. For any τ, the portion of the hyperbola passes through the origin (0, 0) with slope k1/k2 = τe^{z}/(e^{z} − 1); therefore the bigger the τ, the higher the corresponding portion of the hyperbola. But τ/z has maximum 0.149213 at τ = 0.267581, as follows from Proposition B.1, Appendix B. Thus, for τ = 0.267581 the portion of the hyperbola reaches the highest level, 0.149213; all other curves stop short of that level (see Fig. 1).

For any fixed λ, the relationship between w = T2/(T1 + T2) and u = r2/(r1 + r2) is described by the parametric equations

u = λ(1 − e^{−z})/(z(1 + λe^{−z})), w = λ(1 − e^{−z} − ze^{−z})/(2z),

with parameter z. The endpoints of these curves are (λ/(1 + λ), 0), when τ approaches 0, and (0, 0), when τ approaches 0.5. These curves have a maximum at τ = 0.267581, which corresponds to z = 1.79328, where the derivative ∂w/∂z equals 0 (see details in Appendix B). The highest point on this arc-shaped curve is (0.464848λ/(1 + 0.166422λ), 0.149213λ). □

The following theorem shows that for experimentally relevant values of λ and τ, and given that τ is fixed, there is a linear relationship between Z = T1/T2 and x = r1/r2, where Z > 1, x > 1.

Theorem 9. For any fixed τ, the relationship between Z = T1/T2 and x = r1/r2 is linear, with slope depending only on τ:

Z = k4·x + 1.

The slope k4 of these lines decreases from infinity to 2 when τ increases, and the Z-intercept is 1 for all lines. The graph of Z vs. x is a ray emitted from the point

((z − 2τ)/(1 − e^{−z}), z/τ − 1).

For any fixed λ, the relationship between x = r1/r2 and Z = T1/T2 is described by the parametric equations

x = z(1/λ + e^{−z})/(1 − e^{−z}) − 1, Z = 2z/(λ(1 − e^{−z} − ze^{−z})) − 1,

with parameter z.

Proof. It is obvious that w = 1/(Z + 1) and u = 1/(x + 1). If we substitute these formulas into Eq. (4.1), we get

Z = ((1 − e^{−z})/τ)·x + (1 − e^{−z} − ze^{−z} − τ)/τ

or, by Eq. (2.12),

Z = ((1 − e^{−z})/τ)·x + (2τ − τ)/τ = ((1 − e^{−z})/τ)·x + 1.

Thus, there is a linear relationship between Z = T1/T2 and x = r1/r2 for any fixed value of τ. We have proved in Appendix C, Section C.3, that the function τ/(1 − e^{−z}) increases from 0 to 1/2. Therefore the slope (1 − e^{−z})/τ decreases from infinity to 2 as τ changes from 0 to 0.5, and the Z-intercept is 1. Since

x = r1/r2 = (z/λ − 2τ)/(1 − e^{−z}) and Z = T1/T2 = z/(λτ) − 1,

we see that x and Z decrease from ∞ to (z − 2τ)/(1 − e^{−z}) and z/τ − 1, respectively, when λ increases from 0 to 1. Therefore the graph of Z vs. x is a ray whose initial point is ((z − 2τ)/(1 − e^{−z}), z/τ − 1). If τ is close to 0.5, then we have (see Appendix B for justification)

lim_{τ→0.5} (z − 2τ)/(1 − e^{−z}) = ∞ and lim_{τ→0.5} (z/τ − 1) = ∞.

We use Eq. (2.12) and L'Hôpital's Rule to determine the behavior of the abscissa for small τ:

lim_{τ→0} (z − 2τ)/(1 − e^{−z}) = lim_{z→0} (z − 1 + ze^{−z} + e^{−z})/(1 − e^{−z}) = lim_{z→0} (1 − ze^{−z})/e^{−z} = 1.

The abscissa (z − 2τ)/(1 − e^{−z}) is an increasing function of τ,


because its derivative is positive:

((z − 2τ)/(1 − e^{−z}))′ = ((z − 1 + ze^{−z} + e^{−z})/(1 − e^{−z}))′ = (e^{3z} + e^{z}(1 − 2z) − 1 − e^{2z})/(e^{z} − 1)².

The numerator here is positive, since it may be represented as a sum of a series with positive coefficients:

e^{3z} + e^{z}(1 − 2z) − 1 − e^{2z} = Σ_{n=2}^{∞} ((3^{n} + 1 − 2n − 2^{n})/n!)·z^{n}.

Therefore, when τ increases, the initial point of the ray shifts from 1 to the right. The ordinate z/τ − 1 has minimum 5.70182 at τ = 0.267581 (see Appendix B). Therefore the initial point of the ray is above Z = 5.70182 for any 0 < τ < 0.5.

For any fixed λ, the relationship between x = r1/r2 and Z = T1/T2 is described by the parametric equations

x = z(1/λ + e^{−z})/(1 − e^{−z}) − 1, Z = 2z/(λ(1 − e^{−z} − ze^{−z})) − 1,

with parameter z (use Eq. (2.12)). Because

lim_{τ→0} Z = lim_{τ→0} (z/(λτ) − 1) = ∞, lim_{τ→0} x = lim_{z→0} (z(1/λ + e^{−z})/(1 − e^{−z}) − 1) = 1/λ,

lim_{τ→0.5} Z = lim_{τ→0.5} (z/(λτ) − 1) = ∞, lim_{τ→0.5} x = lim_{z→∞} (z(1/λ + e^{−z})/(1 − e^{−z}) − 1) = ∞,

the graph of Z vs. x has the vertical asymptote x = 1/λ and a minimum point at τ = 0.267581, which corresponds to z = 1.79328, where the derivative ∂Z/∂z equals 0 (see details in Appendix B). The lowest point on the graph is (2.15128/λ − 0.642, 6.70182/λ − 1). □
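As a quick numerical illustration of Theorem 9 (our sketch, not part of the original analysis), the snippet below solves Eq. (2.8) by bisection for a fixed τ and checks that the points (x, Z) generated by varying λ fall on the line Z = k4·x + 1 with k4 = (1 − e^{−z})/τ:

```python
import math

# Our numerical sketch: verify the linear relation of Theorem 9.
def solve_z(tau):
    # e^{-z}(z+1) decreases from 1 to 0, so bisection finds the unique
    # root of e^{-z}(z+1) = 1 - 2*tau for any 0 < tau < 0.5.
    lo, hi = 1e-9, 60.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.exp(-mid) * (mid + 1) > 1 - 2 * tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

tau = 0.2
z = solve_z(tau)                  # z depends on tau only, not on lambda
k4 = (1 - math.exp(-z)) / tau     # slope from the proof above
for lam in (0.1, 0.4, 0.9):
    x = (z / lam - 2 * tau) / (1 - math.exp(-z))  # MER ratio r1/r2
    Z = z / (lam * tau) - 1                       # ODT ratio T1/T2
    print(abs(Z - (k4 * x + 1)) < 1e-9)           # True: point lies on the line
```

The residual is zero up to floating-point error for every λ, because the λ-dependence cancels exactly, as in the proof.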

5. Equations and formulas expressed in terms of actual variables

All the equations and formulas above are in terms of normalized variables. We believe it may be helpful to present them also in terms of the actual variables. The following equations yield the time a in minutes, provided that b = 0 and 0 < s < 0.5/R1, and the function m (cf. Eqs. (2.11), (3.1) and (2.13), by using Eq. (2.3)):

e^{−R2(a+2s)}(1 + R2(a + 2s)) = 1 − 2R1·s,

m = s(R2·e^{−R2(a+2s)} + R1)/(1 − e^{−R2(a+2s)}),

r1 = R1·a, r2 = 1 − e^{−R2(a+2s)}, r = r1 + r2.
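A minimal sketch of how these actual-variable equations can be used in practice (ours, not part of the paper): solve the first equation for the stay time a by bisection, then evaluate m, r1 and r2. The rates below are taken from the worked example in Appendix D (R1 = 1/15 per s, R2 = 1/45 per s, COD s = 2 s).

```python
import math

# Our sketch: solve e^{-R2(a+2s)}(1 + R2(a+2s)) = 1 - 2*R1*s for a.
R1, R2, s = 1 / 15, 1 / 45, 2.0

lo, hi = 0.0, 1e4
for _ in range(200):   # the left-hand side decreases in a, so bisection works
    a = (lo + hi) / 2
    if math.exp(-R2 * (a + 2 * s)) * (1 + R2 * (a + 2 * s)) > 1 - 2 * R1 * s:
        lo = a
    else:
        hi = a

a = (lo + hi) / 2
r1 = R1 * a
r2 = 1 - math.exp(-R2 * (a + 2 * s))
m = s * (R2 * math.exp(-R2 * (a + 2 * s)) + R1) / r2
print(round(a, 1), round(r1 + r2, 3), round(m, 3))  # ≈ 41.3, 3.388, 0.236
```

The solution a ≈ 41.3 s agrees with the normalized value x ≈ 2.75 reported in Appendix D (a = x/R1).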

6. Discussion

The schedule CONC VRVI and its time-based analog CONC VT*VT have been used to investigate whether matching characterizes the behavior of animals in two-alternative choice situations better than total reinforcement maximization does. The results of several experiments (Baum & Aparicio, 1999; DeCarlo, 1985; Heyman & Herrnstein, 1986) appear to leave room for interpretation. Part of the problem seems to be that there has not been a precise understanding of the theoretical outcomes associated with reinforcement maximization that could help identify experimental evidence of optimal behavior, or of approximations thereof. As we did in our paper concerning CONC VTVT schedules (Belinsky et al., 2004), here we derived the ODT equations for CONC VT*VT and used those equations to characterize the function m(λ, τ) that satisfies the following equation when T1 and T2 are optimal:

m · r2/(r1 + r2) = T2/(T1 + T2).

We also explored features of the relationship between the ODT proportion (w) for the lean (VT) schedule and the MER proportion (u) for the same, and, similarly, the relationship between the ODT ratio (Z) and the MER ratio (x). The findings most pertinent to the identification of data denoting an approximation to optimal behavior with CONC VT*VT schedules are summarized in Table 2. For comparison, we also show in Table 2 the same features of optimal performance with CONC VTVT schedules. With the CONC VT*VT schedule, the region of the procedure-parameter (λ, τ) space in which optimal behavior requires that the animal switch between alternatives is restricted to values of τ < 0.5 (Table 2, row 1), and within that region the optimal stay time on the VT component is zero or, equivalently, the optimal dwell time on VT is equal to the COD. In the same region, the minimum optimal stay time on the VT* component depends on λ and τ and is positive for τ > 0 (Table 2, row 2). In their analysis of optimal behavior in CONC VTVT schedules, Houston and McNamara (1981) labeled this portion of the parameter space, where for optimal behavior (OB) the subject must spend some time on the dense component and switch to the lean component just to get a stored reinforcer, the "Stay–Switch" region.

Table 2
Comparison of performance features that can be used to identify approximations to optimal behavior in CONC VT*VT and CONC VTVT in the Stay–Switch region

Feature | CONC VT*VT | CONC VTVT
(1) Boundaries of Stay–Switch region | 0 < λ < 1; 0 < τ < 0.5 | 0 < λ < 1; 0 < τ < τ0, where τ0 ≤ 0.921 depends on λ (Belinsky et al., 2004)
(2) Lower boundary for minimum optimal stay time on dense schedule | x ≥ τ(6.702/λ − 2) | x ≥ 2τ(1 − λ)/λ
(3) Function m | m is a linear function of λ; slope varies between 0 and 0.078, depending on τ | m is a linear function of λ; slope varies between 0 and 0.5, depending on τ
(4) Relationship between the dwell-time proportion w and the earned-reinforcement proportion u for a fixed λ as τ varies | The graph of w vs. u is an arc-shaped curve with maximum at the value of u corresponding to τ ≈ 0.268 | w is an increasing function of u
(5) Upper boundary of the dwell-time proportion w | w ≤ 0.149λ; w = 0.149λ for τ ≈ 0.268 | w ≤ λ/2; w = λ/2 at τ = 0
(6) Relationship between the dwell-time ratio Z = T1/T2 and the reinforcement ratio x = r1/r2 | Z is a linear function of x: Z = kx + 1, where k ≥ 2 and may be very large for small τ; the Z-intercept is 1 for all τ | Z is a linear function of x: Z = k1·x + k2, where 1.086 < k1 < 2 and 0.861 < k2 < 1, depending on τ

The Stay–Switch region encompasses a substantially greater portion of the parameter space in CONC VTVT (81.6% of the square 0 < λ < 1, 0 < τ < 1) than in CONC VT*VT (50% of the same square). In CONC VTVT there is also a Stay–Stay region, in which optimal performance requires that the animal spend some time in both component schedules. We show in Theorem 1 that there is no Stay–Stay region with CONC VT*VT.

One of the findings of our analysis of optimal behavior in CONC VTVT (Belinsky et al., 2004) is that as λ increases the function m(λ, τ) increases linearly, with a slope that varies between 0 and 0.5 depending on τ. The present analysis of OB in CONC VT*VT also indicates that m is a linear function of λ, but the slope of the line can only vary between 0 and 0.078 (Table 2, row 3), denoting a very weak dependence of m on λ.

With CONC VTVT the ODT proportion w increases as u increases for any fixed value of λ as τ varies. (A proof of this relationship was not provided in our paper on CONC VTVT (Belinsky et al., 2004); it is provided here in Appendix E.) With CONC VT*VT, however, if λ is fixed and τ varies, w first increases and then decreases as u increases. The maximum point of the curve is always at τ ≈ 0.268 (Table 2, row 5). This is an interesting result that is clearly incompatible with matching. We can vary u by changing the programmed reinforcement rates (λ), by changing the COD (τ), or by changing both. According to G-match (Eq. (1.3)), the relationship between w and u is monotonic regardless of how the variation in u is accomplished. However, if the subject behaves optimally under CONC VT*VT, then, if λ is constant and τ is varied, the relation between w and u is a concave-down curve with a maximum at the value of u corresponding to τ ≈ 0.268.

Another result from our analysis of CONC VTVT performance (Belinsky et al., 2004) was the finding that in the Stay–Switch region the relationship between the ODT proportion w and the MER proportion u is a concave-up portion of a hyperbola and, thus, mathematically incompatible with G-match. However, when the curve representing the relationship between w and u in the Stay–Stay region is added at the end of the hyperbola to indicate graphically the relation between w and u across all 0 < u < 0.5, the resulting curve can be fitted relatively accurately by G-match with s > 1. Here, we found that the relation between w and u for any fixed τ in CONC VT*VT is also a concave-up portion of a hyperbola. The fraction of the domain of u over which the function exists gets smaller as τ increases and can be quite small for τ close to 0.5 (see Fig. 1). The endpoint, when λ = 1, is always lower than w = 0.149 (Table 2, row 5; see Fig. 1). Since there is no Stay–Stay region with CONC VT*VT, to perform optimally


outside the Stay–Switch region the subject must never switch away from the dense schedule. Consequently, a functional relationship between w and u exists only for the values of u and w that encompass the described hyperbola portions. It does not make sense to attempt fitting G-match to these curves. Finally, with CONC VT*VT as with CONC VTVT, there is a linear relationship between the ODT ratio Z and the MER ratio x (Table 2, row 6). However, with the former, the Z-intercept is always 1 and the slope is larger than 2 and decreases as τ increases, whereas with the latter schedule the Z-intercept is between 0.86 and 1 and the slope is greater than 1 but less than 2. Although our analysis is based on a CONC VT*VT schedule, all the findings are applicable as approximations to CONC VRVI when response rates are reasonably high in both the VR and VI components. To compare the DT obtained with a CONC VRVI to the ODT from our analysis, one must first estimate the parameter value of a VT* that would make it equivalent to the VR component under consideration in terms of the generated reinforcement rate. The VT* parameter is the inverse of the programmed reinforcement rate (1/R1), whereas the VR parameter is the mean number of responses per reinforcement. The best estimate of the parameter of a VT* schedule component that makes the schedule equivalent, in terms of reinforcement rate, to the VR for which the data are available is the mean of the distribution of inter-reinforcement intervals (IRIs) obtained with the VR schedule. The accuracy of the approximation will depend only on the variance and other features of the distribution of IRIs, as will be discussed below. Response rate during the VR component is irrelevant except as it determines the characteristics of the IRI distribution. The VT and VI parameters are the inverse of the programmed reinforcement rate; thus, as long as the VI response rate is reasonably high, the VI parameter can be used to compute λ.
Even if the total reinforcement maximization hypothesis were true, that is, if with repeated exposure to the contingencies prescribed by concurrent schedules a subject's behavior evolved so as to maximize the total reinforcement rate, asymptotic performance must always be less than optimal. Our optimality analysis yields ODT values for one cycle of behavior. A cycle begins with a switch to one schedule component and ends with the next switch back to the same component. Experiments consist of multiple cycles, and the dependent variable is the average of the DTs over several cycles. The inevitable behavioral variability ensures that the average ER rate from a multi-cycle experiment can never reach the maximum rate from a single cycle. This does not negate the applicability of our findings to experimental results (see the discussion of this issue in Belinsky et al., 2004), but it ensures that a subject's asymptotic performance cannot be consistently optimal.


In addition to the constraints on optimality imposed by behavior variability, it is obvious that once a sufficiently high percentage of the maximum reinforcement rate is achieved, the subject’s behavior will no longer be sensitive to further increases. In CONC VTVT and CONC VT*VT, when ER rate is near the maximum, a further increase in programmed reinforcement rate requires a minute behavior adjustment that is likely to be below the threshold for behavior differentiation. In addition, given a high enough ER rate level, any further rate increase will be small in comparison to the already achieved level and, therefore, it is likely that it will be below the subject’s threshold for reinforcement rate discrimination. It follows that OB can be approached but not achieved. The utility of our analysis is based on the assumption that OB, and therefore MER, might be approached or, at least, that they are important to the performance. Many of the papers we have already cited investigated optimization as a plausible explanation for the distribution of behavior between the reinforced alternatives. While there is much evidence that G-match describes stable behavior in many concurrent schedules, it is also clear that animals tend to maximize rather than match when, for example, stimuli are presented to signal the deviations from matching that yield higher reinforcement rates (Heyman & Tanz, 1995). By using the equations we derived here and making some assumptions concerning G-match and its parameters it would be possible to characterize mathematically the differences between matching and optimization predictions. However, that is an undertaking well beyond the scope of this paper. Our analysis is useful for the investigators, because it offers several ways to assess if data from CONC VTVT or CONC VT*VT schedules denote an approach to OB. 
All the features of OB discussed above should be roughly approximated if a subject's behavior is driven towards optimality by the programmed contingencies. In a well-constructed experiment, severe violation of the bounds or relationships specified in the features listed in Table 2 can be taken as evidence that the distribution of behavior between the alternatives is not determined exclusively by increases in total ER rate. We believe that the form of these violations can provide information that can assist theorists and investigators in developing and testing behavioral models to account for subjects' performance with the schedules discussed here. It is also worth considering that the matching law (and G-match) concerns the relationship between the behavior proportion and the ER proportion and, therefore, says nothing about the dwell time at each alternative (DTi, i = 1, 2). One advantage of the OB assumption is that it allows the prediction of optimal DTi (ODTi). We believe that experimenters should compare measured dwell times to ODTi. The observed deviations might be useful in the development of models


that predict DTi as a function of the true independent variables, that is, the programmed reinforcement rates and the COD. Our results support the analyses by other investigators (Baum & Aparicio, 1999; DeCarlo, 1985; Herrnstein & Heyman, 1979; Heyman & Herrnstein, 1986) indicating that CONC VRVI and CONC VT*VT differentiate between matching and maximization of total ER rate more clearly than CONC VIVI (or CONC VTVT). Therefore, if the aim of an experiment were to pit the matching hypothesis against the total ER maximization hypothesis, it would be best to use the CONC VT*VT schedule. In such an experiment both λ and τ should be manipulated systematically. This is most easily accomplished by (1) making the programmed reinforcement rate in VT* the same in all conditions, (2) manipulating λ by varying the programmed reinforcement rate in VT, and (3) manipulating τ by varying the COD.

Acknowledgments

This work was supported by Grant 5R24DA07256 from the National Institute on Drug Abuse Minority Institutions' Drug Abuse Research Development Program.

Appendix A. An addition to the proof of Theorem 1

It was shown that the function R does not have stationary points inside the region x ≥ 0, y ≥ 0. Therefore its absolute maximum is on the boundaries x = 0 or y = 0, or at the corners (∞, ∞), (∞, 0), (0, ∞). The latter would mean that the maximum is not attainable. We have from Eq. (2.6)

lim_{(x,y)→(∞,∞)} R = lim_{(x,y)→(∞,∞)} (1 − e^{−λ(x+2τ)} + x + λy)/(x + y + 2τ) = lim_{(x,y)→(∞,∞)} (1 + λy/x)/(1 + y/x).

This limit depends on k = y/x. When 0 < k < ∞, we have λ < R < 1.

Let x = 0. Then

R = (1 − e^{−2λτ} + λy)/(y + 2τ), ∂R/∂y = (2λτ − 1 + e^{−2λτ})/(y + 2τ)².

To show that the numerator 2λτ − 1 + e^{−2λτ} is positive for all λ and τ, we set 2λτ = z and find the derivative of the function g(z) = z − 1 + e^{−z}. Since g′(z) = 1 − e^{−z} > 0 and g(0) = 0, we conclude that g(z) > 0 and therefore ∂R/∂y > 0 for all λ and τ. Thus, R is an increasing function of y, and because lim_{y→∞} (1 − e^{−2λτ} + λy)/(y + 2τ) = λ, we conclude that

(1 − e^{−2λτ})/(2τ) < R < λ.

Let y = 0. Then

R = (1 − e^{−λ(x+2τ)} + x)/(x + 2τ).

It was shown in Section 2 that R has its maximum at x = z/λ − 2τ, where z is a solution to Eq. (2.8), and R = λe^{−z} + 1 > 1 at this point (see Eqs. (2.9)–(2.11)). Therefore this is the absolute maximum of the function R.

Appendix B. Properties of the solution of Eq. (2.8)

Since the left-hand side of Eq. (2.8) is positive, we conclude that τ < 0.5. The derivative d(e^{−z}(z + 1))/dz = −ze^{−z} < 0, therefore the function on the left decreases from 1 to 0, and Eq. (2.8) has exactly one solution for any value of 0 < τ < 0.5. If τ = 0, we have the equation e^{−z}(z + 1) = 1, which has the only solution z = 0.

Proposition B.1. If z is a solution of Eq. (2.8), then

1. z is an increasing function of τ.
2. lim_{τ→0.5} z = ∞ and lim_{τ→0.5} e^{−z} = 0.   (B.1)
3. lim_{τ→0} z/τ = ∞ and min z/τ = 6.70182 at τ = 0.267581.   (B.2)
4. z > max(2√τ, 6.70182τ), and z ≈ 2√τ for small τ.
5. z ≈ 6.70182τ for 0.2 < τ < 0.3 and all values of λ.

Proof. When τ → 0.5, we have from Eq. (2.8)

lim_{τ→0.5} e^{−z}(z + 1) = 0,

therefore

lim_{τ→0.5} z = ∞, lim_{τ→0.5} e^{−z} = 0 and lim_{τ→0.5} z/τ = ∞.

Moreover, by (2.12),

2τ = 1 − e^{−z}(z + 1) = Σ_{n=2}^{∞} (−1)^{n}((n − 1)/n!)·z^{n}.

We have an alternating series with decreasing, vanishing coefficients; it converges for each z > 0 and its sum is less than the first term: 2τ < z²/2, that is, z > 2√τ and x > 2(√τ/λ − τ). For small values of τ we have z ≈ 2√τ and x ≈ 2(√τ/λ − τ).

The derivative of z with respect to τ from Eq. (2.8) is

dz/dτ = (d(1 − 2τ)/dτ)/(d(e^{−z}(z + 1))/dz) = (−2)/(−ze^{−z}) = 2e^{z}/z.   (B.3)

We see that dz/dτ > 0, which means that z increases when τ increases. Because lim_{τ→0} z = 0, we may estimate the behavior of the ratio z/τ using L'Hôpital's Rule and Eq. (B.3):

lim_{τ→0} z/τ = lim_{τ→0} (dz/dτ)/1 = lim_{τ→0} 2e^{z}/z = ∞.

To find the minimum of this ratio, find its derivative (use Eq. (B.3)):

d(z/τ)/dτ = (τz′ − z)/τ² = (2e^{z}τ − z²)/(τ²z).

Equate this derivative to 0: τ = ½z²e^{−z}, and substitute τ into Eq. (2.8):

e^{−z}(z + 1) = 1 − z²e^{−z}.

The solution to this equation is z = 1.79328, which corresponds to τ = 0.267581; the minimum of the ratio z/τ is 6.70182; that is, z ≥ 6.70182τ and x ≥ τ(6.70182/λ − 2). Since the derivative equals 0 at τ = 0.267581, the ratio changes very little near this point, and we see that z ≈ 6.70182τ and x ≈ τ(6.70182/λ − 2) for 0.2 < τ < 0.3 and all values of λ. □

Note: We have two lower boundaries for z. Equating them, we get 6.70182τ = 2√τ. The solution to this equation is τ = 0.0890583. This means that for τ < 0.0890583 the estimates z > 2√τ and x > 2(√τ/λ − τ) are better, but for τ > 0.0890583 it is better to use z ≥ 6.70182τ and x ≥ τ(6.70182/λ − 2).

Appendix C. Estimation of the function m

C.1. Extreme values of the function m on the horizontal line τ = 0

Proposition C.1. On the horizontal line τ = 0 the function m is constant, m = 0.

Proof. According to (3.1) and (B.2), since lim_{τ→0} z = 0,

m = lim_{τ→0} τ(λe^{−z} + 1)/(1 − e^{−z}) = lim_{τ→0} τ(λ + 1)/z = 0.

Therefore for 0 < λ < 1 we have m = 0 on this line. □

C.2. Extreme values of the function m on the horizontal line τ = 0.5

Proposition C.2. On the horizontal line τ = 0.5, the function m is constant, m = 0.5.

Proof. According to (3.1) and (B.1),

m = lim_{τ→0.5} τ(λe^{−z} + 1)/(1 − e^{−z}) = 0.5.

Therefore for 0 < λ < 1 we have m = 0.5 on this line. □

C.3. Extreme values of the function m on the vertical line λ = 0

Proposition C.3. The values of the function m on the vertical line λ = 0 satisfy the inequality 0 < m < 0.5 for 0 < τ < 0.5.

Proof. m = lim_{λ→0} τ(λe^{−z} + 1)/(1 − e^{−z}) = τ/(1 − e^{−z}). To estimate this function, we exclude τ using Eq. (2.12):

τ/(1 − e^{−z}) = (1 − e^{−z} − ze^{−z})/(2(1 − e^{−z})) = 0.5 − z/(2(e^{z} − 1)).

We know that the series (e^{z} − 1)/z = Σ_{n=1}^{∞} z^{n−1}/n! converges for all z, and because all its coefficients are positive, its sum is an increasing function that grows from 1 to ∞. Therefore z/(2(e^{z} − 1)) decreases from 0.5 to 0, and 0.5 − z/(2(e^{z} − 1)) increases from 0 to 0.5 on this line. □

C.4. Extreme values of the function m on the vertical line λ = 1

Proposition C.4. The values of the function m on the vertical line λ = 1 satisfy the inequality 0 < m < 0.5 for 0 < τ < 0.5.

Proof. In this case m = τ(e^{−z} + 1)/(1 − e^{−z}) = (1 − e^{−z} − ze^{−z})(e^{−z} + 1)/(2(1 − e^{−z})) (by Eq. (2.12)). Find the derivative:

dm/dz = (2 − 2e^{z} + (2 + e^{z} − e^{−z})z)/(2(e^{z} − 1)²).

We show that the numerator is positive:

2 − 2e^{z} + (2 + e^{z} − e^{−z})z = z² − z³/3 + z⁴/4 − z⁵/60 + ⋯
= 2·Σ_{n=1}^{∞} ((2n − 1)z^{2n}/(2n)! − z^{2n+1}/(2n + 1)!)
= Σ_{n=2}^{∞} ((n(1 + (−1)^{n}) − 2)/n!)·z^{n}.   (C.1)

We have an alternating series with decreasing, vanishing coefficients; it converges for each z > 0 and its sum has the same sign as its first term, that is, positive. Therefore the function m is increasing on this line. Since lim_{τ→0} z = 0, we have

lim_{τ→0} τ(e^{−z} + 1)/(1 − e^{−z}) = lim_{τ→0} 2τ/z = 0, by Eq. (B.2).

Since lim_{τ→0.5} z = ∞, we have

lim_{τ→0.5} τ(e^{−z} + 1)/(1 − e^{−z}) = 0.5.

Thus, 0 < m < 0.5 on this line. □

C.5. Extreme values of the function m

Proof of Theorem 4. It is obvious that

∂m/∂λ = τe^{−z}/(1 − e^{−z}) > 0,

which means that for any constant τ the value of m increases with λ. Therefore the function m does not have stationary points inside the region. Its minimum is on the line τ = 0, that is, 0 (Proposition C.1). Its maximum is on the line τ = 0.5, that is, 0.5 (Proposition C.2). We conclude that 0 < m < 0.5 for the entire domain 0 < λ < 1, 0 < τ < 0.5. □
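The claims of Proposition B.1(3) and Theorem 4 are easy to spot-check numerically. The sketch below (ours, not part of the paper) solves Eq. (2.8) by bisection over a grid of τ values, locates the minimum of z/τ, and verifies the bounds 0 < m < 0.5 at a few (λ, τ) points:

```python
import math

# Our sketch: min of z/tau (Proposition B.1) and bounds on m (Theorem 4).
def solve_z(tau):
    lo, hi = 1e-12, 80.0
    for _ in range(100):   # bisection on e^{-z}(z+1) = 1 - 2*tau (decreasing)
        mid = (lo + hi) / 2
        if math.exp(-mid) * (mid + 1) > 1 - 2 * tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

taus = [i / 10000 for i in range(1, 5000)]          # tau in (0, 0.5)
tau_min = min(taus, key=lambda t: solve_z(t) / t)
print(round(tau_min, 4), round(solve_z(tau_min) / tau_min, 5))  # ≈ 0.2676, 6.70182

for lam in (0.05, 0.5, 0.95):
    for tau in (0.01, 0.25, 0.49):
        z = solve_z(tau)
        m = tau * (lam * math.exp(-z) + 1) / (1 - math.exp(-z))
        assert 0 < m < 0.5   # Theorem 4
print("bounds hold")
```

The grid minimum lands next to τ = 0.267581 with z/τ ≈ 6.70182, matching the values used throughout Appendices B–D.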

Appendix D. Computing optimal dwell times

To facilitate the computation of optimal dwell times, we have prepared Table 1. The table was created by solving Eq. (2.11) numerically for x as τ was varied between 0.04 and 0.48 in steps of 0.04, and λ was varied between 0.04 and 1 in steps of 0.04. The entry x is the optimal normalized stay time in the dense component. To use the table, one must first convert the COD s to a normalized value τ = R1·s. If, for example, R1 = 1/15 (1/s), R2 = 1/45 (1/s) and COD s = 2 s, then λ = 15/45 ≈ 0.333 and τ = 2/15 ≈ 0.133. If the exact values of τ and λ are not in the table, then we use linear interpolation. First we find the closest smaller and the closest larger values. In our case, τ1 = 0.12, τ2 = 0.16 and λ1 = 0.32, λ2 = 0.36. We have 4 points in our table: (τ1, λ1), (τ1, λ2), (τ2, λ1), (τ2, λ2). The corresponding values of time are

x11 = 2.679, x12 = 2.355, x21 = 3.280, x22 = 2.880.

First, we interpolate with respect to τ for λ1 and λ2 separately:

q1 = x11 + ((τ − τ1)/(τ2 − τ1))(x21 − x11) = 2.679 + ((0.133 − 0.12)/(0.16 − 0.12))(3.280 − 2.679) ≈ 2.874,

q2 = x12 + ((τ − τ1)/(τ2 − τ1))(x22 − x12) = 2.355 + ((0.133 − 0.12)/(0.16 − 0.12))(2.880 − 2.355) ≈ 2.525.

Then we interpolate with respect to λ:

x ≈ q1 + ((λ − λ1)/(λ2 − λ1))(q2 − q1) = 2.874 + ((0.333 − 0.32)/(0.36 − 0.32))(2.525 − 2.874) ≈ 2.761.

The value of x calculated by solving Eq. (2.11) numerically for τ = 0.133 and λ = 0.333 is 2.751. Clearly, the interpolation provides a good approximation. Since the time x is normalized, we need to get the actual value: a = x/R1 ≈ 2.761 × 15 = 41.415 s, and the dwell time on BA1 is 41.415 s + 2 s = 43.415 s. Thus, the optimal behavior in a CONC VT15*VT45 with COD = 2 s is to stay on the VT15* side for 43.415 s, switch to the VT45 side, remain there until the COD elapses (2 s), and switch back.

If 0.2 < τ < 0.3, then it is easier to compute x by the formula given in Theorem 2: x ≈ τ(6.70182/λ − 2). For example, if τ = 0.26, λ = 0.3, then x ≈ 0.26(6.702/0.3 − 2) = 5.288, whereas the value of x calculated by solving Eq. (2.11) numerically for τ = 0.26, λ = 0.3 is 5.290. For very small τ (τ < 0.01), a good approximation may be obtained by the other formula given in Theorem 2: x ≈ 2(√τ/λ − τ). For example, if τ = 0.008, λ = 0.5, then x ≈ 2(√0.008/0.5 − 0.008) = 0.341771, whereas the value of x calculated by solving Eq. (2.11) numerically for τ = 0.008, λ = 0.5 is 0.365038. Clearly, these formulas provide a good and easy approximation.

The earned reinforcements can be computed by entering x and y = 0 in Eq. (2.4). The maximum number of reinforcements that can be earned in a cycle, given the selected values of the programmed reinforcement rates and COD, is r1 + r2.

Although our analysis is based on a CONC VT*VT schedule, all the findings as well as Table 1 are applicable to CONC VRVI as approximations when response rates are reasonably high in both the VR and VI components. To compare the DT obtained with a CONC VRVI to the ODT from our analysis, one must first estimate the parameter value of a VT* that would make it equivalent to the VR component under consideration in terms of the generated reinforcement rate. The VT* parameter is the inverse of the reinforcement rate (1/R1), whereas the VR parameter is the mean number of responses per reinforcement. The best estimate of the parameter of a VT* schedule component that is equivalent to the VR for which the data are available is the mean of the distribution of inter-reinforcement intervals obtained with the VR schedule. The VT and VI parameters are the inverse of the programmed reinforcement rate; thus, as long as the VI response rate is reasonably high, the VI parameter can be used for 1/R2 in Table 1.
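The table lookup and the two approximation formulas can be reproduced directly. The following sketch (ours, not part of the paper) computes the exact normalized stay time x by bisection on Eq. (2.8) and compares it with the closed-form approximations quoted above:

```python
import math

# Our sketch: exact optimal normalized stay time x vs. the approximations.
def solve_z(tau):
    lo, hi = 1e-12, 80.0
    for _ in range(200):   # bisection on e^{-z}(z+1) = 1 - 2*tau (decreasing)
        mid = (lo + hi) / 2
        if math.exp(-mid) * (mid + 1) > 1 - 2 * tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def x_exact(tau, lam):
    return solve_z(tau) / lam - 2 * tau

# mid-range COD (0.2 < tau < 0.3): x ~ tau*(6.70182/lam - 2)
print(f"{x_exact(0.26, 0.3):.3f} {0.26 * (6.70182 / 0.3 - 2):.3f}")      # 5.290 5.288
# very small COD: x ~ 2*(sqrt(tau)/lam - tau)
print(f"{x_exact(0.008, 0.5):.6f} {2 * (math.sqrt(0.008) / 0.5 - 0.008):.6f}")  # 0.365038 0.341771
```

The printed pairs reproduce the worked examples of Appendix D (5.290 vs. 5.288, and 0.365038 vs. 0.341771).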

Appendix E. Relation between w and u for any fixed value of λ as τ varies in CONC VTVT and CONC VT*VT

In both cases we have the same expressions for w and u:

w = λτ/z, u = λ(1 − e^{−z})/(z(1 + λe^{−z})),

and we look at the derivative dw/du = w_τ/u_τ, which characterizes the growth of w as a function of u for fixed λ. As shown in our previous paper (Belinsky, Gonzalez, & Stahl, 2004, Theorems 5 and 6), in CONC VTVT u_τ < 0 and w_τ < 0; therefore dw/du > 0 and w is an increasing function of u for any fixed λ. In contrast, with CONC VT*VT, w_τ changes its sign at τ = 0.267581 for any fixed λ, while u_τ < 0 for all τ (see Theorems 6 and 7). Thus, when u rises from 0 to λ(1 − e^{−z})/(z(1 + λe^{−z})), with z = 1.79328 (the value corresponding to τ = 0.267581), the value of w increases from 0 to λ(1 − e^{−z} − ze^{−z})/(2z); when u rises from λ(1 − e^{−z})/(z(1 + λe^{−z})) to λ/(λ + 1), the value of w decreases from λ(1 − e^{−z} − ze^{−z})/(2z) to 0.
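This rise-and-fall pattern is easy to confirm numerically. The sketch below (ours, not part of the paper) traces the parametric curve (u(z), w(z)) for one fixed λ and locates the peak of w; the result matches Theorem 8 and Proposition B.1 (z ≈ 1.793, w/λ ≈ 0.149213):

```python
import math

# Our sketch: trace (u(z), w(z)) from Appendix E and find where w peaks.
lam = 0.5  # an arbitrary fixed rate ratio, 0 < lam < 1

def u_w(z):
    ez = math.exp(-z)
    u = lam * (1 - ez) / (z * (1 + lam * ez))  # MER proportion for the lean VT
    w = lam * (1 - ez - z * ez) / (2 * z)      # ODT proportion, equals lam*tau/z
    return u, w

zs = [i / 1000 for i in range(1, 20001)]       # parameter z in (0, 20]
z_star = max(zs, key=lambda z: u_w(z)[1])
u_star, w_star = u_w(z_star)
print(round(z_star, 3), round(w_star / lam, 6))  # ≈ 1.793 and ≈ 0.149213
```

Repeating the sweep for other values of λ in (0, 1) leaves z_star unchanged, since the peak location depends only on τ.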

References

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242.
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281.
Baum, W. M., & Aparicio, C. F. (1999). Optimality and concurrent variable-interval variable-ratio schedules. Journal of the Experimental Analysis of Behavior, 71, 75–89.
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874.
Belinsky, R., Gonzalez, F., & Stahl, J. (2004). Optimal behavior and concurrent variable interval schedules. Journal of Mathematical Psychology, 48, 251–266.
Davison, M., & McCarthy, D. (1988). The matching law. Hillsdale, NJ: Lawrence Erlbaum.
DeCarlo, L. T. (1985). Matching and maximizing with variable-time schedules. Journal of the Experimental Analysis of Behavior, 43, 75–81.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272.
Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209–223.
Heyman, G. M., & Herrnstein, R. J. (1986). More on concurrent interval-ratio schedules: A replication and review. Journal of the Experimental Analysis of Behavior, 46, 331–351.
Heyman, G. M., & Tanz, L. (1995). How to teach a pigeon to maximize overall reinforcement rate. Journal of the Experimental Analysis of Behavior, 64, 277–297.
Houston, A. I., & McNamara, J. (1981). How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367–396.
Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.
Wearden, J. H., & Burgess, I. S. (1982). Matching since Baum (1979). Journal of the Experimental Analysis of Behavior, 38, 339–348.