Optimal behavior and concurrent variable interval schedules


Journal of Mathematical Psychology 48 (2004) 247–262

Rachel Belinsky (a,*), Fernando González (b) and Jeanne Stahl (b)

(a) Mathematics Department, Morris Brown College, USA
(b) Psychology Department, Morris Brown College, USA

Received 16 April 2003; revised 24 March 2004

Abstract

Behavior maintained with 2-component concurrent variable interval schedules of reinforcement (CONC VIVI) is described well by the matching law. Deviations from matching behavior have been handled by adding free parameters to the matching law equation. With CONC VIVI schedules there are infinitely many solutions to the matching law equation at each value of the procedural parameters. However, at each value of the procedural parameters, only one combination of durations of intervals spent in each VI component (dwell times) yields the combined maximum reinforcement rate. The equations that yield the optimal dwell times solution for CONC VIVI schedules are mathematically incompatible with the matching law. Optimal performance and matching coincide only when the parameter values of the two VI components are equal. It seems reasonable to use optimal behavior to assess performance in these schedules. Researchers have not compared optimal and empirical performances in CONC VIVI, possibly because the equations for optimal dwell times (ODT) can be solved only numerically. We present a table of ODT for a wide range of VIs and changeover delays. We also derive a function $m$ that can be used to compare matching data and the matching behavior predictions of optimization. We prove that $0.5 < m < 1.003502$, and we describe some of the more interesting properties of the function.
© 2004 Elsevier Inc. All rights reserved.

Keywords: Optimal behavior; Optimization; Matching behavior; Matching law; Concurrent variable interval schedules; Schedules of reinforcement

* Corresponding author. Present address: Department of Mathematics and Statistics, Georgia State University, University Plaza, Atlanta, GA 30303, USA. E-mail address: [email protected] (R. Belinsky).
0022-2496/\$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jmp.2004.04.001

1. Introduction

1.1. The matching law

The procedure most frequently used to study the distribution of choices among alternatives in infrahuman subjects is the 2-independent-component concurrent variable interval schedule (CONC VIVI). This experimental preparation typically involves a laboratory animal (e.g., a rat) trained to emit two different predefined responses (e.g., pressing on lever 1 or lever 2) that are occasionally followed by a reinforcer (e.g., a food pellet). A computer controls all the conditions in the experimental chamber, including the presentation of reinforcers and the presentation of visual or auditory stimuli that may be paired with the different response alternatives. The computer also records the time of occurrence of each response and all other relevant events that take place during the experimental session.

A VI schedule prescribes that the reinforcer becomes available for delivery after intervals that vary in duration unpredictably. For example, in a VI 60-s schedule a food pellet becomes available on average once per minute. The first lever press emitted after a reinforcer becomes available delivers the food pellet. At any point in time only one undelivered reinforcer is stored waiting for delivery, and no more are made available until the stored reinforcer is delivered.

In a CONC VI$_1$ VI$_2$ schedule, two VI schedules operate independently, one for each of two behavior alternatives (BA), e.g., lever 1 and lever 2. While engaged on $BA_1$, the subject obtains reinforcements according to schedule VI$_1$. At the same time, since the VI$_2$ schedule continues to run, the probability that reinforcement will be available when the subject eventually changes over to $BA_2$ increases. Similarly, as the subject responds on $BA_2$, the probability that reinforcement will be available for a response on $BA_1$ increases with time spent at $BA_2$. A changeover delay (COD) is typically imposed to keep subjects from switching at a high rate between $BA_1$ and $BA_2$. The COD stipulates that the subject cannot obtain a reinforcer within some time period following a changeover (i.e., switch) from one alternative to the other. For example, with a COD of 3 s, a reinforcer that becomes available on $BA_2$ while the subject is engaged on $BA_1$ will be delivered for the first response on $BA_2$ emitted at least 3 s after a switch to $BA_2$.

In studies involving CONC VI$_1$ VI$_2$ the usual variable of interest is the number of responses emitted at one of the alternatives expressed as a proportion of the number of responses emitted at both alternatives. This choice of variable stems from the finding that CONC VI$_1$ VI$_2$ data are described well by the equation

$$\frac{P_1}{P_1 + P_2} = \frac{r_1}{r_1 + r_2} \tag{1.1}$$

or, equivalently,

$$\frac{P_1}{P_2} = \frac{r_1}{r_2},$$

where $P_1$ and $P_2$ refer to the responses on $BA_1$ and $BA_2$, respectively, and $r_1$ and $r_2$ are the earned reinforcements (ER) on each alternative. Eq. (1.1) has had considerable impact in behavioral psychology. The equivalence between the proportional measure of behavior and the proportional measure of obtained reinforcements is commonly referred to as the matching law (Herrnstein, 1970).

The matching law has generated much experimentation and stimulating debate. Many studies report consistent deviations from Eq. (1.1) (see Davidson & McCarthy, 1988, for review) and several variations of the matching law have been put forth to account for the deviations. Baum (1974), for example, suggested the exponentiated version

$$\frac{P_1}{P_2} = c \left(\frac{r_1}{r_2}\right)^{s},$$

which may also be expressed as

$$\frac{P_1}{P_1 + P_2} = \frac{r_1^{s}}{r_1^{s} + r_2^{s}/c}. \tag{1.2}$$
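The two forms of the exponentiated relation can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name and the reinforcement counts (30 and 10) are hypothetical illustrations, and the defaults $c = 1$, $s = 1$ are chosen so that the function reduces to Eq. (1.1).

```python
def generalized_matching_proportion(r1, r2, c=1.0, s=1.0):
    """Predicted proportion P1 / (P1 + P2) under the proportion form, Eq. (1.2)."""
    return r1**s / (r1**s + r2**s / c)


def generalized_matching_ratio(r1, r2, c=1.0, s=1.0):
    """Predicted ratio P1 / P2 under the exponentiated ratio form."""
    return c * (r1 / r2) ** s


# Hypothetical earned reinforcements on the two alternatives.
r1, r2 = 30.0, 10.0

# With c = 1 and s = 1 the prediction is strict matching, Eq. (1.1).
strict = generalized_matching_proportion(r1, r2)

# An exponent below 1 moves the predicted proportion toward 0.5.
flattened = generalized_matching_proportion(r1, r2, s=0.8)

print(strict, flattened)
```

Converting the ratio form to a proportion, ratio / (1 + ratio), gives the same number as the proportion form for any $c$ and $s$, which is the sense in which the two displayed equations are equivalent.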

Eq. (1.2) is referred to as the generalized matching law (G-match). The parameters $c$ and $s$ are allowed to vary to fit Eq. (1.2) to data generated by manipulating the values of VI$_1$ and VI$_2$.

A variation of the CONC VI$_1$ VI$_2$ that is relevant to this paper is the two-alternative concurrent variable time schedule (CONC VT$_1$ VT$_2$) (Baum & Rachlin, 1969; Brownstein & Pliskoff, 1968). CONC VT$_1$ VT$_2$ differs from CONC VI$_1$ VI$_2$ in that reinforcements are delivered, after the COD, as soon as they become available; that is, the subject need not emit a specific response to produce an available reinforcer. Baum and Rachlin, for example, programmed a CONC VT$_1$ VT$_2$ in a shuttle box. The behavior alternatives were being at either of the two sides of the box. Changing over consisted of moving from one side to the other. The measured variable in this procedure was the time spent at each alternative (dwell time or DT), rather than the number of responses, and the generalized matching law was expressed as

$$\frac{t_1}{t_1 + t_2} = \frac{r_1^{s}}{r_1^{s} + r_2^{s}/c}, \tag{1.3}$$

where $t_1$ and $t_2$ represent the DT at $BA_1$ and $BA_2$.

Dwell time is also a reasonable measure in CONC VI$_1$ VI$_2$ schedules. In order to respond at an alternative, the subject must spend time there and, since response rate is generally quite high relative to reinforcement rate, the average time differential between availability and delivery of reinforcement is usually relatively small. Consequently, both the response proportion (Eq. (1.2)) and the DT proportion (Eq. (1.3)) have been used to accurately describe the results of CONC VI$_1$ VI$_2$ schedules in terms of matching (Baum, 1979).

The matching law is an empirically grounded description of a relationship between behavior and earned reinforcements (ER). There are many solutions to the matching equation at each value of the procedural variables. Nevertheless, the fact that the matching law describes well the form of the relationship between the behavior and ER proportions across many procedural variations of the CONC VI$_1$ VI$_2$ suggests that some basic process yields this ubiquitous result.

1.2. Reinforcement maximization in CONC VI$_1$ VI$_2$

A reasonable a priori approach to explaining reinforced behavior is to assume that contingencies of reinforcement act over time to select behaviors that optimize net energy gain for the organism (Stephens & Krebs, 1986). In the case of experimental preparations such as concurrent schedules, the cost of behaving (e.g., pressing down on a light lever) is minimal and equal on both alternatives, so it can be ignored in the calculation of energy gain. That leaves the overall ratio of programmed reinforcement rates on the two schedules as the only factor relevant to optimization during an experimental session of reasonably short duration. Houston and McNamara (1981) derived equations that describe the dwell times that maximize total earned reinforcement rate on CONC VT$_1$ VT$_2$ and showed that the matching law is mathematically incompatible with the ODT equations.

Data from experiments concerning CONC VI$_1$ VI$_2$ are described well by Eq. (1.3) with values of $s$ between 0.8 and 1.1 (Baum, 1979; Wearden & Burguess, 1982). Fig. 1 shows that over the domain of procedural parameter values used in most experiments, the relation between the ODT proportion and the corresponding maximum ER (MER) proportion can also be fitted well by Eq. (1.3); however, the values of $s$ that yield the best fit to the optimal performance vary between 1.11 and 1.55 depending on the COD. It seems plausible that stable performance on CONC VI$_1$ VI$_2$ (or CONC VT$_1$ VT$_2$) schedules may represent an approximation to optimal dwell times (ODT) that is differentially affected by some nonrandom variable that depends on the values of the VI pairs.

In their paper, Houston and McNamara (1981) did not provide a simple method for calculating ODTs. Perhaps for this reason experimenters have not usually contrasted their results with expected ODTs. In our view, such a comparison could lead to a better understanding of CONC VT$_1$ VT$_2$ data. In this paper we present a relatively straightforward method for calculating the optimal values of $t_1$ and $t_2$. We also offer a matching function $m$ that describes the relationship between the ODT proportion and the MER proportion for the BA associated with the lower programmed reinforcement (PR) rate. This function can be used to contrast empirical matching and the "matching" that would result if the animal behaved optimally. We investigate some of the characteristics of the function $m$, and we prove that $0.5 < m < 1.003502$. Here we do not review published data to show how they compare with optimal behavior. We simply hope to provide means by which others could conduct such an analysis with relative ease. As was the case in Houston and McNamara's paper, our analysis is based on CONC VT$_1$ VT$_2$, but the results should yield good approximations to dwell time data in CONC VI$_1$ VI$_2$ schedules.

Fig. 1. Proportion of dwell time (DT) on the alternative associated with the VI component schedule (VI$_2$) with the lower programmed reinforcement (PR) rate as a function of the proportion of the earned reinforcements (ER) yielded by the same schedule. It is assumed that the programmed mean inter-reinforcement interval for the other schedule (VI$_1$) was held constant at $1/R_1 = 15$ s and that VI$_2$ has any value $1/R_2$ greater than 15 s, as long as $R_2 = \lambda R_1$. The COD was 3 s; therefore, $\tau = 0.2$ for all pairs of VI values (see Eq. (1.6)). The curve representing optimal performance is a hyperbola in the Stay–Switch region (Theorem 7). The sharp change in the tangent of the curve indicates the transition from the Stay–Switch region to the Stay–Stay region. The generalized matching law (G-match) was fitted to the optimal performance curve. Since G-match is not a hyperbola, except at $s = 1$, the fit can only be approximately accurate. The shaded area represents G-match for the range of values of $s$ that describe most published results.

1.3. Description of the problem and notations

We assume a CONC VT$_i$ schedule. Each component schedule $i$ ($i = 1, 2$) has a constant probability of setting up a reinforcer per unit time. $R_1$ and $R_2$ are the programmed reinforcement (PR) rates on two behavior alternatives $BA_1$ and $BA_2$. For our purposes $R_1 \ge R_2$, and we refer to VI$_1$ as the "dense" component schedule and to VI$_2$ as the "lean" component schedule. According to the Poisson distribution, the probability that a reinforcer has not been set up on VI$_i$ by time $t$ is $e^{-R_i t}$, and the average time between reinforcements is $1/R_i$. When a component schedule, e.g., VT$_1$, sets up a reinforcer for delivery while the subject is engaged in $BA_2$, schedule VT$_1$ stops functioning, and the unobtained reinforcer is stored until the subject switches to $BA_1$ and gets it. At that point VT$_1$ resumes function. When the animal switches to component $i$, a changeover delay (COD) of duration $\sigma_i$ must elapse before a reinforcer can be earned. The dwell time $t_1$ is the time elapsed between the changeovers to and from $BA_1$, and $t_2$ is the time elapsed between the changeovers to and from $BA_2$. If $\sigma_i \ge 1/R_i$, then the subject will obtain more reinforcers by not changing over. Therefore, we assume that $\sigma_i < 1/R_i$. The variables that determine the number of earned reinforcements (ER) are the stay times $a$ and $b$, which represent the portion of the dwell times during which reinforcers can be delivered. Thus, $t_1 = \sigma_1 + a$ and $t_2 = \sigma_2 + b$. It is assumed that the subject's behavior adapts to changes in the parameters $(R_i, \sigma_i)$ of the procedure.

When the animal switches to $BA_1$, the probability that a reinforcer will not be available after the COD elapses is $e^{-R_1(b+\sigma_1+\sigma_2)}$; therefore the probability of collecting a reinforcer at termination of the COD $\sigma_1$ is $p_1 = 1 - e^{-R_1(b+\sigma_1+\sigma_2)}$. The animal then remains engaged at $BA_1$ for a time $a$ during which the expected number of ER is $R_1 a$. Thus, the expected mean number of ER during time $t_1$ is $r_1 = p_1 + R_1 a$. The same argument for $BA_2$ yields the following expression for the expected
mean number of ER during time $t_2$: $r_2 = p_2 + R_2 b$, where $p_2 = 1 - e^{-R_2(a+\sigma_1+\sigma_2)}$. For the cycle $t_1 + t_2$ the expected mean number of ER is $r = r_1 + r_2 = p_1 + p_2 + R_1 a + R_2 b$. The total duration of the cycle is $t = a + b + \sigma_1 + \sigma_2$, so that if $R_0(a, b)$ is the mean ER rate for stay times $a$, $b$, then

$$(a + b + \sigma_1 + \sigma_2) R_0 = r_1 + r_2, \tag{1.4}$$

that is,

$$(a + b + \sigma_1 + \sigma_2) R_0 = 2 - e^{-R_1(b+\sigma_1+\sigma_2)} - e^{-R_2(a+\sigma_1+\sigma_2)} + R_1 a + R_2 b.$$

In most experiments $\sigma_1 = \sigma_2$. However, when analyzing optimal performance, it does not matter whether the CODs are equal or not. Note that in Eq. (1.4) the CODs are present only as the sum $\sigma_1 + \sigma_2$. This means that any problem with different CODs is the same as one in which each COD has the value $\sigma$, where $2\sigma = \sigma_1 + \sigma_2$. Thus, we may rewrite Eq. (1.4) as

$$(a + b + 2\sigma) R_0 = 2 - e^{-R_1(b+2\sigma)} - e^{-R_2(a+2\sigma)} + R_1 a + R_2 b. \tag{1.5}$$

We assume that $R_1 \ge R_2$ and let $\lambda = R_2/R_1$; hence $\lambda \le 1$. Note that here $a$, $b$ are stay times in minutes, $\sigma$ is the COD in minutes, $t_1$, $t_2$, $t$ are dwell times (DT) in minutes, $R_1$, $R_2$ are PR rates in min$^{-1}$ and $R_0$ is the ER rate in min$^{-1}$. To facilitate the mathematical argument without losing generality, it is convenient to have unitless variables:

$$x = R_1 a, \quad y = R_1 b, \quad \tau = R_1 \sigma, \quad T_1 = R_1 t_1, \quad T_2 = R_1 t_2, \quad T = R_1 t, \quad R = \frac{R_0}{R_1}. \tag{1.6}$$

Here $x$, $y$, $\tau$, $T_1$, $T_2$, $T$ are unitless normalized times, and $R$ is the unitless normalized ER rate. Clearly, the unitless PR rate on $BA_1$ is $R_1/R_1 = 1$ and the unitless PR rate on $BA_2$ is $R_2/R_1 = \lambda$. Note that $\tau = R_1 \sigma < 1$. The mean numbers of ER in terms of unitless variables are

$$r_1 = 1 - e^{-(y+2\tau)} + x, \qquad r_2 = 1 - e^{-\lambda(x+2\tau)} + \lambda y. \tag{1.7}$$

Eq. (1.4) then becomes

$$(x + y + 2\tau) R = r_1 + r_2, \tag{1.8}$$

that is, from Eq. (1.5),

$$(x + y + 2\tau) R = 2 - e^{-(y+2\tau)} - e^{-\lambda(x+2\tau)} + x + \lambda y, \tag{1.9}$$

and

$$p_1 = 1 - e^{-(y+2\tau)}, \qquad p_2 = 1 - e^{-\lambda(x+2\tau)}. \tag{1.10}$$

Also remember that we write ODT for optimal dwell time and MER for maximum number of earned reinforcements. The unitless dwell times are $T_1 = x + \tau$ for $BA_1$ and $T_2 = y + \tau$ for $BA_2$. The total cycle duration is $T = T_1 + T_2 = x + y + 2\tau$.

Eq. (1.9) coincides with Eq. (7) in Houston and McNamara (1981). Houston and McNamara also showed that the parameter space for all possible values $(\lambda, \tau)$, $0 < \lambda \le 1$, $0 < \tau < 1$, is divided into three regions where the changeover (switching) strategies for MER rate are: Never Switch, Stay–Stay, Stay–Switch.

(I) Never Switch. When the set of parameters $(\lambda, \tau)$ is in this region, the subject maximizes ER rate by engaging exclusively on the BA associated with the higher PR rate.
(II) Stay–Stay. When $(\lambda, \tau)$ is in this region, the subject maximizes ER rate by engaging for times $x$ and $y$ in $BA_1$ and $BA_2$, respectively. Here both $x$ and $y$ are positive.
(III) Stay–Switch. When $(\lambda, \tau)$ is in this region, the subject maximizes ER rate by engaging for a time $x$ in the alternative with the higher PR rate, switching to the other schedule, and remaining there after the COD only for a moment to obtain a reinforcer that might have been stored. In this case $x > 0$ and $y = 0$.

The boundaries between the regions are described as follows:

1. The Stay–Stay region is bounded by the vertical line $\lambda = 1$, and by the two curves described below (Eqs. (1.11) and (1.15)).

2. The boundary between the Never Switch and Stay–Stay regions is described by the equation

$$\tau = \frac{1}{2\lambda}\bigl(1 + \lambda + (1 - \lambda)\operatorname{Ln}(1 - \lambda)\bigr), \tag{1.11}$$

for $0.841405 \le \lambda \le 1$, $0.920703 \le \tau < 1$. Note that $\lambda_0 \approx 0.841405$ is the solution of the equation

$$1 + \lambda + \operatorname{Ln}(1 - \lambda) = 0 \tag{1.12}$$

and that the corresponding value of $\tau$ from Eq. (1.11) is $\tau_0 \approx 0.920703$. Besides that, $\tau_0 \approx 0.920703$ is the solution of the equation

$$2 - 2\tau - e^{-2\tau} = 0. \tag{1.13}$$

On this boundary we have

$$y = -\frac{1}{\lambda}\bigl(1 + \lambda + \operatorname{Ln}(1 - \lambda)\bigr). \tag{1.14}$$

3. The boundary between the Stay–Switch and Stay–Stay regions is described by the equation

$$\frac{\lambda + e^{-2\tau} - 1}{\lambda}\left(1 - \operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda}\right) = 2 - 2\tau - e^{-2\tau}. \tag{1.15}$$

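Let

$$f(\lambda, \tau) = \frac{\lambda + e^{-2\tau} - 1}{\lambda}\left(1 - \operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda}\right) - 2 + 2\tau + e^{-2\tau}.$$

The derivatives $\partial f/\partial\lambda$ and $\partial f/\partial\tau$ are

$$\frac{\partial f}{\partial\lambda} = -\frac{1}{\lambda^{2}}\bigl(1 - e^{-2\tau}\bigr)\operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda},$$

$$\frac{\partial f}{\partial\tau} = 2\bigl(1 - e^{-2\tau}\bigr) + \frac{2e^{-2\tau}}{\lambda}\operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda}.$$

We see that the system $\partial f/\partial\lambda = 0$, $f = 0$ has the obvious solution $\lambda = 1$, $\tau = 0$, which is the endpoint of the curve, whereas the system $\partial f/\partial\tau = 0$, $f = 0$ has the solution $\lambda \approx 0.838274$, $\tau \approx 0.85671$. This means that the curve has a horizontal tangent at the endpoint $(1, 0)$ and a vertical tangent at the point $(0.838274, 0.85671)$; therefore Eq. (1.15) applies for $0.838274 \le \lambda < 1$, $0 < \tau < 0.920703$. The admissible values for the interior of the Stay–Stay region are $0.838274 < \lambda < 1$, $0 < \tau < 1$, as long as $f(\lambda, \tau) > 0$ and $\tau < g(\lambda)$, where $g(\lambda)$ is the right side of Eq. (1.11).

4. The Stay–Switch region is bounded by the curve described above (Eq. (1.15)), the horizontal line $\tau = 0$, the horizontal line $\tau = \tau_0 \approx 0.920703$ ($0 < \lambda < 0.841405$), and the vertical line $\lambda = 0$.

Thus, a pair $(\lambda, \tau)$ belongs to the interior of the Stay–Stay region if

$$0.838274 < \lambda < 1, \tag{1.16}$$

$$\tau < \frac{1}{2\lambda}\bigl(1 + \lambda + (1 - \lambda)\operatorname{Ln}(1 - \lambda)\bigr), \tag{1.17}$$

$$\frac{\lambda + e^{-2\tau} - 1}{\lambda}\left(1 - \operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda}\right) > 2 - 2\tau - e^{-2\tau}. \tag{1.18}$$

A pair $(\lambda, \tau)$ belongs to the interior of the Stay–Switch region if

$$0 < \lambda < 1, \tag{1.19}$$

$$0 < \tau < \tau_0 \approx 0.920703, \tag{1.20}$$

$$\frac{\lambda + e^{-2\tau} - 1}{\lambda}\left(1 - \operatorname{Ln}\frac{\lambda + e^{-2\tau} - 1}{\lambda}\right) < 2 - 2\tau - e^{-2\tau}. \tag{1.21}$$

The boundaries are shown in Fig. 2, which corresponds to Houston and McNamara's (1981) Fig. 2. In addition, we may point out that the area of the Stay–Stay region is approx. 0.1092, the area of the Stay–Switch region is approx. 0.8163, and the remaining part of the square $0 < \tau < 1$, $0 < \lambda < 1$, the Never Switch region, has an area of approx. 0.075.

Fig. 2. Procedure parameter space for a 2-independent-component CONC VTVT schedule of reinforcement.

2. Equations for optimal stay times

2.1. The Stay–Stay region

Houston and McNamara (1981) showed that in the case of optimal behavior

$$R = \lambda + 1 - \lambda p_2 = \lambda + 1 - p_1. \tag{2.1}$$

Therefore, for the optimal solution

$$\lambda p_2 = p_1, \tag{2.2}$$

that is, by Eq. (1.10),

$$1 - e^{-(y+2\tau)} = \lambda\bigl(1 - e^{-\lambda(x+2\tau)}\bigr).$$

Solving this equation for $x$, we get

$$x = -2\tau - \frac{1}{\lambda}\operatorname{Ln}\left(1 - \frac{1}{\lambda}\bigl(1 - e^{-(y+2\tau)}\bigr)\right). \tag{2.3}$$

Combining Eqs. (1.9), (1.10), (2.1) and (2.2) yields

$$(x + y + 2\tau)\bigl(\lambda + e^{-(y+2\tau)}\bigr) = \bigl(1 - e^{-(y+2\tau)}\bigr)\frac{\lambda + 1}{\lambda} + x + \lambda y. \tag{2.4}$$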
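Eqs. (2.3) and (2.4) can be reduced to one equation in $y$ by substituting Eq. (2.3) into Eq. (2.4). The following is a minimal sketch of that idea, not the authors' program: the function name, the bisection method, the tolerance and the search bracket are all assumptions made for illustration, and the code presumes that $(\lambda, \tau)$ lies in the Stay–Stay region (or on the line $\lambda = 1$).

```python
import math


def stay_stay_times(lam, tau, tol=1e-12):
    """Solve Eqs. (2.3)-(2.4) for the normalized stay times (x, y) by bisection on y.

    Assumes (lam, tau) is a Stay-Stay pair; no region check is performed here.
    """

    def x_of_y(y):
        # Eq. (2.3); the log argument 1 - (1 - e)/lam is written as (lam - 1 + e)/lam.
        e = math.exp(-(y + 2 * tau))
        return -2 * tau - math.log((lam - 1 + e) / lam) / lam

    def residual(y):
        # Left side minus right side of Eq. (2.4) with x replaced by x(y).
        e = math.exp(-(y + 2 * tau))
        x = x_of_y(y)
        return (x + y + 2 * tau) * (lam + e) - (1 - e) * (lam + 1) / lam - x - lam * y

    lo = 0.0
    # For lam < 1 the logarithm in Eq. (2.3) is defined only below this value of y;
    # for lam = 1 an arbitrary large bracket (50) is used.
    hi = 50.0 if lam >= 1 else -math.log(1 - lam) - 2 * tau - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:   # residual is positive below the root in this region
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    return x_of_y(y), y


x, y = stay_stay_times(1.0, 0.5)
print(round(x, 3), round(y, 3))
```

For $\lambda = 1$ the two components are identical, so the sketch should return $x = y$, which gives a quick sanity check on the substitution.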

Thus, we have the system of Eqs. (2.3), (2.4) with respect to $x$, $y$, where $\lambda$, $\tau$ are in the Stay–Stay domain. We may solve the system numerically for $x$ and $y$ for each pair $(\lambda, \tau)$ satisfying (1.16)–(1.18).

2.2. The Stay–Switch region

In the Stay–Switch region $y = 0$. Thus,

$$r_1 = x + 1 - e^{-2\tau}, \qquad r_2 = 1 - e^{-\lambda(x+2\tau)},$$

and

$$(x + 2\tau) R = r_1 + r_2. \tag{2.5}$$

Houston and McNamara (1981) showed that in this case

$$R = \lambda e^{-\lambda(x+2\tau)} + 1 \tag{2.6}$$

and the optimal stay time $x$ on $BA_1$ satisfies the equation

$$e^{-\lambda(x+2\tau)}\bigl(\lambda(x + 2\tau) + 1\bigr) = 2 - e^{-2\tau} - 2\tau. \tag{2.7}$$

Let $z = \lambda(x + 2\tau)$; then Eq. (2.7) becomes

$$e^{-z}(z + 1) = 2 - e^{-2\tau} - 2\tau, \tag{2.8}$$

where we see that $z$ depends on $\tau$ only and does not depend on $\lambda$. For each pair $(\lambda, \tau)$ satisfying (1.19)–(1.21), first we solve Eq. (2.8) for $z$ numerically and then find the optimal stay time $x = z/\lambda - 2\tau$. The equations for the number of reinforcements obtained from VI$_1$ and VI$_2$ become

$$r_1 = \frac{z}{\lambda} - 2\tau + 1 - e^{-2\tau}, \qquad r_2 = 1 - e^{-z}. \tag{2.9}$$
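The two-step procedure for the Stay–Switch region (solve Eq. (2.8) for $z$, then recover $x$ and the reinforcements from Eq. (2.9)) can be sketched as follows. This is an illustrative sketch only: the function name, the bisection method and the bracket $[0, 50]$ are assumptions, and the check on the right-hand side simply enforces $0 < \tau < \tau_0$ from condition (1.20).

```python
import math


def stay_switch_solution(lam, tau, tol=1e-12):
    """Return (z, x, r1, r2) for a Stay-Switch pair (lam, tau), using Eqs. (2.8)-(2.9)."""
    target = 2 - math.exp(-2 * tau) - 2 * tau          # right side of Eq. (2.8)
    if not 0 < target < 1:
        raise ValueError("tau must satisfy 0 < tau < tau_0 (condition (1.20))")

    # exp(-z) * (z + 1) decreases from 1 toward 0 for z > 0, so bisection applies.
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1) > target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)

    x = z / lam - 2 * tau                               # optimal stay time on BA1
    r1 = x + 1 - math.exp(-2 * tau)                     # Eq. (2.9)
    r2 = 1 - math.exp(-z)                               # Eq. (2.9)
    return z, x, r1, r2


z, x, r1, r2 = stay_switch_solution(0.333, 0.133)
print(round(x, 3))
```

Because $z$ depends on $\tau$ only, the value of $z$ can be computed once per COD and reused for every $\lambda$ in the Stay–Switch region.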

2.3. Computing optimal dwell times

To facilitate the computation of optimal dwell times, we have prepared Table 1.¹ The table was created by solving Eqs. (2.3), (2.4) and (2.7) numerically for $x$ and $y$ as $\tau$ was varied between 0.05 and 0.9 in steps of 0.05, and $\lambda$ was varied between 0.04 and 1 in steps of 0.04. The entries $t_a = x$ and $t_b = y$ are the optimal normalized stay times in the dense and the lean reinforcement components, respectively. The zeros for $t_b$ denote the Stay–Switch region, where optimization requires that the subject spend no time engaged in the lean reinforcement BA following the COD.

¹ A computer program that generates exact values of t and r to the 4th decimal place can be obtained from Fernando Gonzalez at [email protected].

To use the table, one must first convert the COD to a normalized $\tau = R_1 \sigma$ value. If, for example, $R_1 = 1/15$ (1/s), $R_2 = 1/45$ (1/s) and COD $\sigma = 2$ s, then $\lambda = 15/45 \approx 0.333$ and $\tau = 2/15 \approx 0.133$. If the exact values of $\tau$ and $\lambda$ are not in the table, then we use linear interpolation. First we find the closest smaller values and the closest larger values. In our case, $\tau_1 = 0.1$, $\tau_2 = 0.15$ and $\lambda_1 = 0.32$, $\lambda_2 = 0.36$. We have 4 points in our table: $(\tau_1, \lambda_1)$, $(\tau_2, \lambda_1)$, $(\tau_1, \lambda_2)$, $(\tau_2, \lambda_2)$. The corresponding values of time, where $t_{ij}$ is the entry for $\lambda_i$ and $\tau_j$, are:

$$t_{11} = 0.448, \quad t_{12} = 0.691, \quad t_{21} = 0.376, \quad t_{22} = 0.581.$$

First, we interpolate with respect to $\tau$ for $\lambda_1$ and $\lambda_2$ separately:

$$q_1 = t_{11} + \frac{\tau - \tau_1}{\tau_2 - \tau_1}(t_{12} - t_{11}) = 0.448 + \frac{0.133 - 0.1}{0.15 - 0.1}(0.691 - 0.448) \approx 0.608,$$

$$q_2 = t_{21} + \frac{\tau - \tau_1}{\tau_2 - \tau_1}(t_{22} - t_{21}) = 0.376 + \frac{0.133 - 0.1}{0.15 - 0.1}(0.581 - 0.376) \approx 0.511.$$

Then, we interpolate with respect to $\lambda$:

$$t_a = q_1 + \frac{\lambda - \lambda_1}{\lambda_2 - \lambda_1}(q_2 - q_1) = 0.608 + \frac{0.333 - 0.32}{0.36 - 0.32}(0.511 - 0.608) \approx 0.576.$$

The value of $t_a$ calculated by solving Eq. (2.7) numerically for $\tau = 0.133$ and $\lambda = 0.333$ is 0.573. Clearly, the interpolation provides a good approximation. Since the time $t_a = x$ is normalized, we need to get the actual value:

$$a = \frac{t_a}{R_1} \approx 0.576 \times 15 = 8.64 \text{ s},$$

and the dwell time on $BA_1$ is 8.64 s + 2 s = 10.64 s. The actual value for $t_b = y$ is 0; therefore $b = 0$ and the dwell time on $BA_2$ is 2 s. Thus, the optimal behavior in a CONC VI15 VI45 with COD = 2 s is to stay on the VI15 side for 10.64 s, switch to the VI45 side, remain there until the COD elapses (2 s) and switch back.

The earned reinforcements can be computed by entering $x = t_a$ and $y = 0$ in Eq. (1.7). The maximum number of reinforcements that can be earned in a cycle given the selected values of programmed reinforcement rates and COD is $r_1 + r_2$.
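The interpolation steps of the worked example can be written as a short routine. This is a sketch under stated assumptions: the function name and argument order are hypothetical, and the four table entries are typed in by hand from the example (0.448, 0.691, 0.376, 0.581) rather than read from a stored copy of Table 1.

```python
def bilinear_stay_time(lam, tau, lam_lo, lam_hi, tau_lo, tau_hi, t11, t12, t21, t22):
    """Interpolate a normalized stay time between four neighboring table entries.

    t11, t12 are the entries at lam_lo for tau_lo and tau_hi;
    t21, t22 are the entries at lam_hi for tau_lo and tau_hi.
    """
    w_tau = (tau - tau_lo) / (tau_hi - tau_lo)
    q1 = t11 + w_tau * (t12 - t11)        # interpolate along tau at lam_lo
    q2 = t21 + w_tau * (t22 - t21)        # interpolate along tau at lam_hi
    w_lam = (lam - lam_lo) / (lam_hi - lam_lo)
    return q1 + w_lam * (q2 - q1)         # interpolate along lam


# Worked example: VI 15 s, VI 45 s, COD 2 s.
ta = bilinear_stay_time(0.333, 0.133, 0.32, 0.36, 0.10, 0.15,
                        0.448, 0.691, 0.376, 0.581)
stay_seconds = ta * 15.0                  # a = t_a / R1 with 1/R1 = 15 s
dwell_seconds = stay_seconds + 2.0        # add the 2 s COD
print(round(ta, 3), round(dwell_seconds, 2))
```

The routine carries full precision through the intermediate steps, so its result can differ from the hand calculation in the third decimal place, where the text rounds $q_1$ and $q_2$ before the final step.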

3. The matching function m

We define the matching function $m(\lambda, \tau)$, which is related to the matching law, as taking values that satisfy the relation

$$\frac{r_2}{r_1 + r_2}\, m(\lambda, \tau) = \frac{T_2}{T_1 + T_2} \tag{3.1}$$

when $T_1$ and $T_2$ are optimal. It is clear that, on the strength of Eqs. (2.3), (2.4) and (2.7), when $T_1$ and $T_2$ are optimal, the stay times $x$, $y$ are uniquely determined by $\lambda$ and $\tau$. Therefore the matching function $m$ depends only on two variables, $\lambda$ and $\tau$. Since ODT and MER are complexly interdependent, the range of the function $m$ is not obvious. In order to determine

Table 1. Optimal normalized stay times $t_a$ and $t_b$ for different values of $\lambda$ (rows) and $\tau$ (columns)

0.05

l

ta

0.04 2.443 0.08 1.172 0.12 0.748 0.16 0.536 0.2 0.409 0.24 0.324 0.28 0.263 0.32 0.218 0.36 0.183 0.4 0.154 0.44 0.131 0.48 0.112 0.52 0.096 0.56 0.082 0.6 0.070 0.64 0.059 0.68 0.050 0.72 0.041 0.76 0.034 0.8 0.027 0.84 0.021 0.88 0.016 0.92 0.011 0.96 0.006 1 0.002 t 0.05 l

ta

0.1

0.15

0.2 tb

ta

0.3 tb

ta

0.35 tb

ta

0.4 tb

ta

0.45

tb

ta

tb

ta

tb

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.002

4.982 2.391 1.527 1.095 0.836 0.664 0.540 0.448 0.376 0.318 0.271 0.232 0.199 0.170 0.145 0.124 0.105 0.088 0.073 0.059 0.047 0.036 0.025 0.016 0.007 0.55

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.007

7.628 3.664 2.343 1.682 1.286 1.021 0.833 0.691 0.581 0.493 0.421 0.361 0.310 0.266 0.229 0.196 0.166 0.140 0.117 0.096 0.078 0.060 0.045 0.030 0.017 0.6

0 10.399 0 13.313 0 16.395 0 19.673 0 23.186 0 26.983 0 5.000 0 6.407 0 7.897 0 9.487 0 11.193 0 13.041 0 3.200 0 4.104 0 5.065 0 6.091 0 7.195 0 8.394 0 2.300 0 2.953 0 3.649 0 4.393 0 5.197 0 6.071 0 1.760 0 2.263 0 2.799 0 3.375 0 3.997 0 4.677 0 1.400 0 1.802 0 2.232 0 2.696 0 3.198 0 3.747 0 1.143 0 1.473 0 1.828 0 2.210 0 2.627 0 3.083 0 0.950 0 1.227 0 1.524 0 1.847 0 2.198 0 2.585 0 0.800 0 1.035 0 1.288 0 1.564 0 1.865 0 2.198 0 0.680 0 0.881 0 1.099 0 1.337 0 1.599 0 1.888 0 0.582 0 0.756 0 0.945 0 1.152 0 1.381 0 1.635 0 0.500 0 0.651 0 0.816 0 0.998 0 1.199 0 1.424 0 0.431 0 0.563 0 0.707 0 0.867 0 1.045 0 1.245 0 0.371 0 0.487 0 0.614 0 0.755 0 0.913 0 1.092 0 0.320 0 0.421 0 0.533 0 0.658 0 0.799 0 0.959 0 0.275 0 0.363 0 0.462 0 0.573 0 0.699 0 0.843 0 0.235 0 0.313 0 0.400 0 0.498 0 0.611 0 0.740 0 0.200 0 0.267 0 0.344 0 0.432 0 0.533 0 0.649 0 0.168 0 0.227 0 0.294 0 0.372 0 0.462 0 0.568 0 0.140 0 0.191 0 0.250 0 0.319 0 0.399 0 0.494 0 0.114 0 0.158 0 0.209 0 0.270 0 0.342 0 0.428 0 0.091 0 0.128 0 0.172 0 0.226 0 0.290 0 0.367 0 0.070 0 0.101 0 0.139 0.478 0.186 0.188 0.242 0.376 0.310 0.230 0.050 0.105 0.075 0.230 0.107 0.404 0.146 0.634 0.194 0.926 0.253 0.017 0.031 0.031 0.050 0.050 0.076 0.076 0.108 0.108 0.147 0.147 0.196 0.65 0.7 0.75 0.8 0.85 0.9

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.062 0.129 0.196

tb

ta

tb

ta

tb

tb

0.04 31.127 0 35.708 0 40.854 0 0.08 15.063 0 17.304 0 19.827 0 0.12 9.709 0 11.169 0 12.818 0 0.16 7.032 0 8.102 0 9.314 0 0.2 5.425 0 6.262 0 7.211 0 0.24 4.354 0 5.035 0 5.809 0 0.28 3.590 0 4.158 0 4.808 0 0.32 3.016 0 3.501 0 4.057 0 0.36 2.570 0 2.990 0 3.473 0 0.4 2.213 0 2.581 0 3.005 0 0.44 1.921 0 2.246 0 2.623 0 0.48 1.677 0 1.967 0 2.305 0 0.52 1.471 0 1.731 0 2.035 0 0.56 1.295 0 1.529 0 1.804 0 0.6 1.142 0 1.354 0 1.604 0 0.64 1.008 0 1.201 0 1.428 0 0.68 0.890 0 1.065 0 1.274 0 0.72 0.785 0 0.945 0 1.136 0 0.76 0.691 0 0.837 0 1.013 0 0.8 0.606 0 0.740 0 0.903 0 0.84 0.530 0 0.653 0 0.803 0 0.88 0.460 0.121 0.572 0.033 0.710 0.059 0.92 0.392 0.920 0.492 0.129 0.614 0.173 0.96 0.324 0.174 0.411 0.228 0.517 0.293 1 0.256 0.256 0.330 0.330 0.419 0.419

ta

0.25

tb

ta

tb

ta

tb

ta

tb

ta

tb

ta

tb

ta

tb

46.756 22.728 14.719 10.714 8.311 6.709 5.565 4.707 4.039 3.506 3.069 2.705 2.397 2.133 1.904 1.703 1.527 1.370 1.229 1.103 0.988 0.880 0.766 0.648 0.528

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.090 0.226 0.372 0.528

53.720 26.160 16.973 12.380 9.624 7.787 6.474 5.490 4.724 4.112 3.611 3.193 2.840 2.537 2.275 2.045 1.842 1.662 1.501 1.356 1.225 1.097 0.959 0.814 0.665

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.124 0.287 0.467 0.665

62.290 30.395 19.763 14.448 11.258 9.132 7.613 6.474 5.588 4.879 4.299 3.816 3.407 3.056 2.753 2.487 2.252 2.044 1.857 1.690 1.538 1.384 1.213 1.028 0.837

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.163 0.359 0.581 0.837

73.575 35.988 23.458 17.194 13.435 10.929 9.139 7.797 6.753 5.918 5.234 4.665 4.183 3.770 3.412 3.098 2.822 2.576 2.357 2.159 1.980 1.789 1.565 1.318 1.060

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.002 0.204 0.440 0.719 1.060

90.496 44.398 29.032 21.349 16.739 13.666 11.471 9.825 8.544 7.520 6.681 5.983 5.392 4.885 4.446 4.062 3.723 3.422 3.152 2.910 2.690 2.435 2.109 1.742 1.365

0 127.77 0 62.984 0 41.390 0 30.592 0 24.114 0 19.795 0 16.710 0 14.396 0 12.597 0 11.157 0 9.979 0 8.997 0 8.167 0 7.455 0 6.838 0 6.298 0 5.822 0 5.398 0 5.019 0 4.678 0.009 4.370 0.244 3.902 0.529 3.211 0.887 2.484 1.365 1.815

the extreme values of $m(\lambda, \tau)$ in the Stay–Stay and Stay–Switch regions, we first need to express $m(\lambda, \tau)$ in terms of $\lambda$ and $\tau$. In the analyses that follow, all mentions of

ta

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.004 0.273 0.617 1.087 1.815

proportions refer to the ratio of measures taken during the lean component (VI$_2$) to the sum of the same measures for both components.
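Relation (3.1) can be evaluated directly once a pair of stay times is known, using Eq. (1.7) for $r_1$, $r_2$ and $T_1 = x + \tau$, $T_2 = y + \tau$. The sketch below is illustrative only: the function name is hypothetical, and the stay times passed to it are assumed to be optimal values (here copied from table entries quoted in the worked example and from the symmetric case $\lambda = 1$).

```python
import math


def matching_function_from_times(lam, tau, x, y):
    """Compute m from relation (3.1) for given normalized stay times x and y."""
    r1 = 1 - math.exp(-(y + 2 * tau)) + x              # Eq. (1.7)
    r2 = 1 - math.exp(-lam * (x + 2 * tau)) + lam * y  # Eq. (1.7)
    time_share = (y + tau) / (x + y + 2 * tau)         # T2 / (T1 + T2)
    reinforcement_share = r2 / (r1 + r2)               # r2 / (r1 + r2)
    return time_share / reinforcement_share


# Equal components (lam = 1) with equal stay times: both proportions are 0.5.
m_equal = matching_function_from_times(1.0, 0.5, 0.256, 0.256)

# A Stay-Switch entry: lam = 0.32, tau = 0.1, t_a = 0.448, t_b = 0.
m_lean = matching_function_from_times(0.32, 0.1, 0.448, 0.0)
print(m_equal, round(m_lean, 3))
```

A value of $m$ below 1 means that the time proportion on the lean alternative is smaller than its reinforcement proportion.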

Theorem 1. In the Stay–Stay domain

$$m(\lambda, \tau) = \frac{\lambda (y + \tau)\bigl(\lambda + e^{-(y+2\tau)}\bigr)}{\lambda^{2} y + 1 - e^{-(y+2\tau)}}, \tag{3.2}$$

where $y$ is the solution of Eqs. (2.3), (2.4) for any given $\lambda$ and $\tau$.

See proof in Appendix A (Section A.1).

Theorem 2. In the Stay–Switch domain

$$m(\lambda, \tau) = \frac{\tau\bigl(\lambda e^{-z} + 1\bigr)}{1 - e^{-z}} \quad \text{for } \tau > 0, \tag{3.3}$$

where $z$ is the solution of Eq. (2.8) for any given $\tau$, and $m(\lambda, 0) = \lim_{\tau \to 0} m(\lambda, \tau) = \frac{\lambda + 1}{2}$.

See proof in Appendix A (Section A.2) and Appendix C (Proposition C.3).

Although there are two different expressions for the function $m(\lambda, \tau)$ (i.e., Eqs. (3.2) and (3.3)), we show in Appendix A (Proposition A.1) that they coincide on the common boundary between the Stay–Stay and Stay–Switch regions. The following corollary shows that in the Stay–Switch region, $m(\lambda, \tau)$, that is, the ratio of the ODT proportion to the MER proportion, is a linear function of $\lambda$, that is, of the ratio of $R_2$ to $R_1$.

Corollary. The graph of $m(\lambda, \tau)$ as a function of $\lambda$ within the Stay–Switch region is a straight line, whose $m$-intercept $\frac{\tau}{1 - e^{-z}}$ increases from 0.5 to $\tau_0 \approx 0.920703$ as $\tau$ increases from 0 to $\tau_0 \approx 0.920703$, and whose slope $\frac{\tau e^{-z}}{1 - e^{-z}}$ is close to 0 when $\tau$ is close to 0.920703 and close to 0.5 when $\tau$ is close to 0.

This result follows from Proposition C.5 in Appendix C and Eqs. (C.1), (C.5).

The following theorems (3 and 4) concern the possible values of $m(\lambda, \tau)$ in the two regions. These theorems show that a matching law analysis of optimal performance in CONC VIVI schedules would yield "overmatching" (Baum, 1979), which means that the optimizing subject would spend more time in the dense schedule component than is predicted by the strict matching law (Eq. (1.1)).

Theorem 3. In the Stay–Stay domain $0.854714 < m(\lambda, \tau) < 1.00350$ for all values $(\lambda, \tau)$ satisfying (1.16)–(1.18), on solutions of Eqs. (2.3), (2.4) in which $y > 0$.

See proof in Appendix B (Section B.4).

Theorem 4. In the Stay–Switch domain $0.5 < m(\lambda, \tau) < 1$ for all values $(\lambda, \tau)$ satisfying (1.19)–(1.21), on solutions of Eq. (2.8).

See proof in Appendix C (Section C.6).
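The closed form of Theorem 2, the linearity in $\lambda$ stated in the Corollary, and the bounds of Theorem 4 can be spot-checked numerically. The following is a sketch for illustration, not a proof: the function names, the bisection solver for Eq. (2.8) and the sample values ($\tau = 0.2$ with $\lambda = 0.2, 0.4, 0.6$, which have $t_b = 0$ in Table 1) are assumptions.

```python
import math


def solve_z(tau, tol=1e-12):
    """Solve Eq. (2.8), exp(-z) * (z + 1) = 2 - exp(-2 tau) - 2 tau, by bisection."""
    target = 2 - math.exp(-2 * tau) - 2 * tau
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1) > target:   # left side decreases in z
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def matching_function_stay_switch(lam, tau):
    """Eq. (3.3): m = tau * (lam * exp(-z) + 1) / (1 - exp(-z))."""
    z = solve_z(tau)
    return tau * (lam * math.exp(-z) + 1) / (1 - math.exp(-z))


def line_coefficients(tau):
    """Intercept and slope of m as a function of lam (Corollary)."""
    z = solve_z(tau)
    intercept = tau / (1 - math.exp(-z))
    slope = tau * math.exp(-z) / (1 - math.exp(-z))
    return intercept, slope


tau = 0.2
values = [matching_function_stay_switch(lam, tau) for lam in (0.2, 0.4, 0.6)]
intercept, slope = line_coefficients(tau)
print([round(v, 4) for v in values], round(intercept, 4), round(slope, 4))
```

Three equally spaced values of $\lambda$ should give equally spaced values of $m$, and each value should equal intercept + slope × $\lambda$.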

The following three theorems (5–7) describe the relation between the values of the MER proportion, the ODT proportion and $\lambda$, $\tau$. Theorems 5 and 6 show that for a fixed value of $\tau$, the MER proportion and the ODT proportion increase as $\lambda$ increases, and that for a fixed value of $\lambda$ both the MER and ODT proportions decrease as $\tau$ increases. Therefore, it follows that both proportions change together in the direction predicted by the matching law. However, Theorem 7 shows that the graph of the relation between the MER and ODT proportions is a concave up portion of a hyperbola, whereas the G-match law predicts that the relation should be best described by a power function.

Theorem 5. In the Stay–Switch domain, the ODT proportion $w = T_2/T$ is proportional to $\lambda$ for any fixed $\tau$ and decreases when $\tau$ increases for any fixed $\lambda$. Moreover, $w < \lambda/2$ for all $0 < \tau < \tau_0 \approx 0.920703$.

Proof. We have in the Stay–Switch domain

$$w = \frac{T_2}{T} = \frac{\tau}{x + 2\tau} = \frac{\lambda\tau}{\lambda(x + 2\tau)} = \frac{\lambda\tau}{z}.$$

We see that $w$ is proportional to $\lambda$ for any fixed $\tau$; therefore $w = T_2/T$ increases when $\lambda$ increases. We prove in Appendix C (Proposition C.2) that the function $z/\tau$ is increasing and $z/\tau > 2$ for all $0 < \tau < \tau_0 \approx 0.920703$; therefore $\tau/z$ is a decreasing function of $\tau$ for any fixed $\lambda$, and $\tau/z < 1/2$. Thus, $w < \lambda/2$. □

Theorem 6. In the Stay–Switch domain, the proportion $u = r_2/r$ increases when $\lambda$ increases for any fixed $\tau$ and decreases when $\tau$ increases for any fixed $\lambda$. Moreover, $u < \frac{\lambda}{\lambda + 1}$ for all $0 < \tau < \tau_0 \approx 0.920703$.

See proof in Appendix D.

Theorem 7. In the Stay–Switch domain, the relationship between $w = T_2/T$ and $u = r_2/r$ is described by a concave up portion of a hyperbola:

$$w = \frac{k_1 u}{k_2 - k_3 u}, \tag{3.4}$$

where $k_1$, $k_2$, $k_3$ are values that depend on $\tau$ only.

Proof. We have in the Stay–Switch domain (see Eqs. (2.9)):

$$\frac{r_2}{r_1 + r_2} = \frac{1 - e^{-z}}{\frac{z}{\lambda} - 2\tau + 2 - e^{-2\tau} - e^{-z}}.$$

Since $z$ is a solution to Eq. (2.8), we may substitute $e^{-z}(z + 1)$ for $2 - e^{-2\tau} - 2\tau$:

$$\frac{r_2}{r_1 + r_2} = \frac{1 - e^{-z}}{\frac{z}{\lambda} + z e^{-z}}.$$

Solve the equation $u = \dfrac{1 - e^{-z}}{\frac{z}{\lambda} + z e^{-z}}$ for $\lambda$:

$$\lambda = \frac{e^{z} u z}{e^{z} - 1 - u z},$$

and substitute this expression into Eq. (3.3):

$$m = \frac{\tau e^{z}}{e^{z} - 1 - u z}.$$

If we set $k_1 = \tau e^{z}$, $k_2 = e^{z} - 1$, $k_3 = z$, where $z$ is the solution of Eq. (2.8), we obtain the formula $m = \frac{k_1}{k_2 - k_3 u}$. The proportion $w = \frac{T_2}{T_1 + T_2}$ is $w = m u = \frac{u \tau e^{z}}{e^{z} - 1 - u z}$. Thus, if $\tau$ is fixed, then the graph of $w = \frac{T_2}{T_1 + T_2}$ vs. $u = \frac{r_2}{r_1 + r_2}$ within the Stay–Switch domain is a concave up portion of a hyperbola, as we can see in Fig. 1. The value of $w$ increases as $u$ increases. □

The following theorem shows that for experimentally relevant values of $\lambda$ and $\tau$, and given that $\tau$ is fixed, there is a linear relationship between $\eta = T_1/T_2$ and $\xi = r_1/r_2$, where $\eta > 1$, $\xi > 1$.

Theorem 8. In the Stay–Switch domain, the relationship between $\eta = T_1/T_2$ and $\xi = r_1/r_2$ is linear with slope and intercept that depend only on $\tau$:

$$\eta = k_4 \xi + k_5,$$

where $k_4$, $k_5$ are values that depend on $t$ only. The slope $k_4$ of these lines decreases from 2 to $\frac{1}{t_0} \approx 1.086127$, and the $\eta$-intercept $k_5$ increases from $-1$ to $\frac{1}{t_0} - 1 \approx 0.086127$ as $t$ increases from 0 to $t_0 \approx 0.920703$.

Proof. It is obvious that $w = \frac{1}{\eta+1}$ and $u = \frac{1}{\xi+1}$. If we substitute these formulas into Eq. (3.4), we get the following:

$$\eta = \frac{1-e^{-z}}{t}\,\xi + \frac{1 - e^{-z} - z e^{-z} - t}{t}.$$

Thus, there is a linear relationship between $\eta = T_1/T_2$ and $\xi = r_1/r_2$ for any fixed value of $t$ within the Stay–Switch domain. We proved in Appendix C (Proposition C.5) that the function $\frac{t}{1-e^{-z}}$ increases from 0.5 to $t_0 \approx 0.920703$. Therefore the slope of these lines decreases from 2 to $\frac{1}{t_0} \approx 1.086127$ as a function of $t$. By Eq. (2.8) the intercept equals $\frac{e^{-2t} + t - 1}{t}$, so it increases from $-1$ to $\frac{1-t_0}{t_0} \approx 0.086127$ as $t$ increases from 0 to $t_0 \approx 0.920703$. □

These results concerning the linear relationship between $m$ and $\lambda$, or $\eta$ and $\xi$, in the Stay–Switch domain, and concerning the hyperbolic relationship between the ODT proportion and the MER proportion in the same domain, are based on $t$ being constant.

4. Equations and formulas expressed in terms of actual variables

All the equations and formulas above are in terms of normalized variables. We believe it may be helpful to present them also in terms of the actual variables. Given $s$ in minutes, $R_1 \ge R_2$ in min$^{-1}$, and provided that $0 < s < R_1^{-1}$, the equations to find the optimal stay times $a$ and $b$ in minutes in the Stay–Stay domain, and the matching function $m(\lambda, s)$, are as follows (compare with Eqs. (2.3), (2.4), (3.2) by using Eq. (1.6)):

$$a = -2s - \frac{1}{R_2}\,\mathrm{Ln}\!\left(1 - \frac{1}{\lambda}\left(1 - e^{-R_1(b+2s)}\right)\right),$$

$$R_1(a+b+2s)\left(\lambda + e^{-R_1(b+2s)}\right) = \left(1 - e^{-R_1(b+2s)}\right)\frac{1+\lambda}{\lambda} + R_1(a + \lambda b),$$

where $\lambda = \frac{R_2}{R_1}$, and

$$m(\lambda, s) = \frac{\lambda R_1 (b+s)\left(\lambda + e^{-R_1(b+2s)}\right)}{\lambda R_2 b + 1 - e^{-R_1(b+2s)}}. \tag{4.1}$$

In the Stay–Switch domain the following equations yield the time $a$ in minutes, provided that $b = 0$ and $0 < s < \frac{t_0}{R_1} \approx \frac{0.920703}{R_1}$, and the function $m(\lambda, s)$ (compare with Eqs. (2.7), (3.3) by using Eq. (1.6)):

$$e^{-R_2(a+2s)}\left(1 + R_2(a+2s)\right) = 2 - e^{-2R_1 s} - 2R_1 s$$

and

$$m(\lambda, s) = \frac{s\left(R_2 e^{-R_2(a+2s)} + R_1\right)}{1 - e^{-R_2(a+2s)}}. \tag{4.2}$$

In both domains, mean MER can be computed by

$$r_1 = 1 - e^{-R_1(b+2s)} + R_1 a, \qquad r_2 = 1 - e^{-R_2(a+2s)} + R_2 b, \qquad r = r_1 + r_2. \tag{4.3}$$
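As an illustration of how the Stay–Switch formulas in actual variables might be used, the sketch below computes the optimal dense-side stay time $a$, the value of $m$, and the quantities $r_1$, $r_2$ of Eq. (4.3) for one schedule pair. It is a hedged example, not the authors' table-generation code: the function names and the sample values (VI 30 s, VI 60 s, 2-s COD) are hypothetical, the root is found by simple bisection, and the parameters are assumed to lie in the Stay–Switch domain.

```python
import math


def solve_z(t, tol=1e-13):
    """Bisection for exp(-z)(z+1) = 2 - exp(-2t) - 2t with z > 0."""
    target = 2.0 - math.exp(-2.0 * t) - 2.0 * t
    if t <= 0.0 or not (0.0 < target < 1.0):
        raise ValueError("normalized COD t = R1*s must satisfy 0 < t < t0")
    lo, hi = 0.0, 1.0
    while math.exp(-hi) * (hi + 1.0) > target:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        # The left side decreases in z: too large a value means z is too small.
        if math.exp(-mid) * (mid + 1.0) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def stay_switch_optimum(R1, R2, s):
    """R1 >= R2 in reinforcers/min, s = COD in minutes; assumes b = 0."""
    z = solve_z(R1 * s)                 # z = R2 * (a + 2s)
    a = z / R2 - 2.0 * s                # optimal stay time on the dense schedule
    ez = math.exp(-z)
    m = s * (R2 * ez + R1) / (1.0 - ez)             # Eq. (4.2)
    r1 = 1.0 - math.exp(-2.0 * R1 * s) + R1 * a     # Eq. (4.3) with b = 0
    r2 = 1.0 - ez
    return {"a": a, "z": z, "m": m, "r1": r1, "r2": r2}


if __name__ == "__main__":
    # Hypothetical example: VI 30 s (R1 = 2/min), VI 60 s (R2 = 1/min), COD = 2 s.
    res = stay_switch_optimum(2.0, 1.0, 2.0 / 60.0)
    print("a (s):", 60.0 * res["a"], " m:", res["m"])
```

A useful sanity check is that $m \cdot r_2/(r_1+r_2)$ reproduces the time proportion $s/(a+2s)$ spent in the lean component.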

5. Discussion

We employed methods used by Houston and McNamara (1981) to create an ODT table for CONC VIVI schedules that can be used to compute a good approximation to the ODT for any experimentally useful pair of VI values and COD. Experimenters can use this information to assess the extent to which performance in CONC VIVI approximates optimal behavior, and to develop models that describe the deviations.

The matching law has been commonly used to account for behavior in CONC VIVI schedules. Because data often deviate from the matching law in systematic ways, the equation of the matching law has been modified often (e.g., Baum, 1974; Davison & Jones, 1995). These modifications retain the assumption that the matching law is key to describing and understanding the behavior engendered by the schedule. However, the matching law does not account for the number of obtained reinforcements at each alternative nor, therefore, for the total number of obtained reinforcements.


Nevertheless, since researchers have concentrated on the relation between the proportion of time or responses at an alternative and the corresponding proportion of ER, it seemed reasonable to explore how those proportions are related when the animal is behaving optimally. To that end, we derived the function $m(\lambda,t)$ that satisfies the following equation when $T_1$ and $T_2$ are optimal:

$$m(\lambda,t)\,\frac{r_2}{r_1+r_2} = \frac{T_2}{T_1+T_2}.$$

We proved that $0.5 < m(\lambda,t) < 1.003502$ for all pairs of VI values and all COD. We also proved that, for a fixed $t$, $m(\lambda,t)$ is an increasing linear function of $\lambda$ in the domain of the procedure-parameter space that includes the parameter values used in most experiments (Stay–Switch domain). In addition, we showed that (1) the graph of the ODT proportion for the lean schedule against the MER proportion for the same schedule is a concave up portion of a hyperbola, but that (2) the ratio of the ODT for the dense schedule to the ODT for the lean schedule is a linear function of the ratio of the MER for the dense schedule to the MER for the lean schedule. We believe that comparing the values and characteristics of the function $m(\lambda,t)$ to the value of the empirically determined ratio of the proportions may provide useful information on matching behavior.

Our optimality analysis, like Houston and McNamara's (1981), yields ODT values for one cycle of behavior in a CONC VIVI schedule. A cycle begins with a switch to one schedule and ends with the next switch back to the same schedule. In experiments involving CONC VIVI schedules, the dependent variable is the average of the DTs over many cycles. Given the nature of the experimental subjects, it is inevitable that the DTs will vary from cycle to cycle. Therefore, the average reinforcement rate from a multicycle experiment can never equal the maximum reinforcement rate. This inescapable bias of the experimental results raises the issue of whether our optimality analysis is an abstract exercise that has no relevance to experimental work. The applicability of our analysis to data depends on the extent to which the simplified mathematical model used here accurately simulates experimental results obtained with CONC VIVI (or CONC VTVT). We believe our results are applicable. This view is supported by an analysis by Heyman (1982), who showed (pp. 465–468) that a model that assumes fixed DT durations and a model that assumes a Poisson distribution of DT durations yield very similar rates of reinforcement across a range of experimentally reasonable DT proportions and across a wide range of CODs.

There is only one optimal solution to the maximization of reinforcement rates under CONC VIVI schedules. Like Houston and McNamara before us, we intend to describe characteristics of the optimal solution with the expectation that understanding optimal performance will shed light on actual, non-optimal, behavior and on the factors that account for the failure to optimize.

In all the published experimental studies we are aware of, $t$ changed as $\lambda$ was varied; that is, either the COD was kept constant as both schedules were changed, or both the COD and the schedules were changed but not in such a way as to (1) keep $t$ constant, or (2) vary $t$ in a systematic fashion. In our analysis, as in Houston and McNamara's (1981), $t$ is one of two parameters that determine optimal performance in CONC VIVI. In order to explore the connection between some of the results of our analysis (e.g., Theorems 5–8) and empirical results, experiments with CONC VIVI schedules would have to be conducted in which $t$ is fixed (or is varied systematically). This is most easily accomplished by (1) making the programmed reinforcement rate in the dense component schedule the same in all conditions, (2) varying $\lambda$ by manipulating the parameter value of the lean schedule, and (3) manipulating $t$ by varying the COD.
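The three-step design just described can be turned into a small planning helper. The sketch below is only an illustration under stated assumptions: it takes the normalization to be $t = R_1 s$ and $\lambda = R_2/R_1$ (the correspondence implied by comparing the Section 4 formulas with their normalized versions), and the function name, argument names, and example values are hypothetical.

```python
def design_conditions(vi_dense_s, lambdas, t_values):
    """List schedule settings that hold the dense VI fixed.

    vi_dense_s : mean interval of the dense VI schedule, in seconds
    lambdas    : desired values of lambda = R2 / R1 (0 < lambda <= 1)
    t_values   : desired values of the normalized COD, t = R1 * s
    """
    conditions = []
    for t in t_values:
        # t = R1 * s with R1 = 1 / (dense mean interval), so the COD is t * interval.
        cod_s = t * vi_dense_s
        for lam in lambdas:
            conditions.append({
                "t": t,
                "lambda": lam,
                "vi_dense_s": vi_dense_s,
                "vi_lean_s": vi_dense_s / lam,  # R2 = lambda * R1
                "cod_s": cod_s,
            })
    return conditions


if __name__ == "__main__":
    # Hypothetical plan: dense VI 30 s, three lean schedules, two normalized CODs.
    for row in design_conditions(30.0, [0.5, 0.25, 0.125], [0.05, 0.1]):
        print(row)
```

Within each value of $t$ only the lean schedule changes, so $\lambda$ varies while the COD and the dense schedule stay fixed.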

Acknowledgments

This work was supported by Grant 5r24DAO7256 from the National Institute on Drug Abuse/Minority Institutions Research Development Program and the National Science Foundation agreement No. IBN987654. We wish to thank Alicia Askew, Ph.D., for her helpful suggestions on the manuscript.

Appendix A. The matching function $m(\lambda,t)$

A.1. Function $m(\lambda,t)$ in the Stay–Stay domain

Proof of Theorem 1. Let

$$\frac{T_1}{T_2} = \frac{x+t}{y+t} = \eta;$$

then $x + t = (y+t)\eta$, and

$$x + y + 2t = (x+t) + (y+t) = (y+t)\eta + (y+t) = (y+t)(\eta+1).$$

Now we express the MER ratio in terms of $\eta$ by Eq. (1.8):

$$\frac{r_1}{r_2} + 1 = \frac{r_1+r_2}{r_2} = \frac{(x+y+2t)R}{r_2} = \frac{(y+t)(\eta+1)R}{r_2} = (\eta+1)\frac{(y+t)R}{r_2}.$$

Using Eqs. (2.1), (2.2) and (1.7), we find:

$$\frac{r_1}{r_2} + 1 = (\eta+1)\frac{(y+t)R}{r_2} = (\eta+1)\frac{\lambda(y+t)\left(\lambda + e^{-(y+2t)}\right)}{\lambda^2 y + 1 - e^{-(y+2t)}}.$$

Let

$$m(\lambda,t) = \frac{\lambda(y+t)\left(\lambda + e^{-(y+2t)}\right)}{\lambda^2 y + 1 - e^{-(y+2t)}};$$

then

$$\frac{r_1}{r_2} + 1 = m\,(\eta+1), \quad\text{or}\quad \frac{r_1}{r_2} + 1 = m\left(\frac{T_1}{T_2} + 1\right). \tag{A.1}$$

We may rewrite the latest formula as

$$\frac{r_1}{r_2} + (1-m) = m\,\frac{T_1}{T_2} \quad\text{and}\quad m\,\frac{r_2}{r} = \frac{T_2}{T},$$

where $r = r_1 + r_2$ is the total number of rewards earned from both schedules during the time $T = T_1 + T_2$. In some of the above formulas $m$ was written for $m(\lambda,t)$. For clarity and brevity, we do the same in some other places. □

A.2. Function $m$ in the Stay–Switch domain

Proof of Theorem 2. Let (for $t > 0$)

$$\frac{T_1}{T_2} = \frac{x+t}{t} = \eta;$$

then $x + t = t\eta$, and

$$x + 2t = (x+t) + t = t\eta + t = t(\eta+1).$$

Now we express the MER ratio in terms of $\eta$ using Eq. (2.5):

$$\frac{r_1}{r_2} + 1 = \frac{r_1+r_2}{r_2} = \frac{(x+2t)R}{r_2} = \frac{t(\eta+1)R}{r_2} = (\eta+1)\frac{tR}{r_2}.$$

By (2.6), we find:

$$\frac{r_1}{r_2} + 1 = (\eta+1)\frac{tR}{r_2} = (\eta+1)\frac{t\left(\lambda e^{-\lambda(x+2t)} + 1\right)}{r_2} = (\eta+1)\frac{t\left(\lambda e^{-\lambda(x+2t)} + 1\right)}{1 - e^{-\lambda(x+2t)}}.$$

Let

$$m(\lambda,t) = \frac{t\left(\lambda e^{-\lambda(x+2t)} + 1\right)}{1 - e^{-\lambda(x+2t)}} = \frac{t\left(\lambda e^{-z} + 1\right)}{1 - e^{-z}}, \tag{A.2}$$

where $z = \lambda(x+2t)$; then

$$\frac{r_1}{r_2} + 1 = m\,(\eta+1), \quad\text{or}\quad \frac{r_1}{r_2} + 1 = m\left(\frac{T_1}{T_2} + 1\right).$$

Again we may rewrite the latest formula as

$$\frac{r_1}{r_2} + (1-m) = m\,\frac{T_1}{T_2},$$

or as

$$m\,\frac{r_2}{r} = \frac{T_2}{T},$$

where $r = r_1 + r_2$ is the total number of rewards earned from both schedules during the time $T = T_1 + T_2$. Thus, we have exactly the same expressions as in Eq. (A.1), but the value of $m(\lambda,t)$ was computed differently. □

A.3. The function $m(\lambda,t)$ on the common boundary

Proposition A.1. The function $m(\lambda,t)$, defined as

$$m(\lambda,t) = \begin{cases} \dfrac{\lambda(y+t)\left(\lambda + e^{-(y+2t)}\right)}{\lambda^2 y + 1 - e^{-(y+2t)}} & \text{if } (\lambda,t) \text{ satisfies (1.16)–(1.18)}, \\[2ex] \dfrac{t\left(\lambda e^{-\lambda(x+2t)} + 1\right)}{1 - e^{-\lambda(x+2t)}} & \text{if } (\lambda,t) \text{ satisfies (1.19)–(1.21)}, \end{cases}$$

is continuous on the common boundary.

Proof. We have two different expressions for the function $m$. Let us temporarily denote

$$m_s = \frac{\lambda(y+t)\left(\lambda + e^{-(y+2t)}\right)}{\lambda^2 y + 1 - e^{-(y+2t)}}$$

in the Stay–Stay domain and

$$m_w = \frac{t\left(\lambda e^{-\lambda(x+2t)} + 1\right)}{1 - e^{-\lambda(x+2t)}} \tag{A.3}$$

in the Stay–Switch domain, and show that they coincide on the common boundary. Since $y = 0$ on this boundary, we have

$$m_s = \frac{\lambda t\left(\lambda + e^{-2t}\right)}{1 - e^{-2t}}.$$

Then, from Eq. (2.3),

$$x = -2t - \frac{1}{\lambda}\,\mathrm{Ln}\!\left(\frac{\lambda - 1 + e^{-2t}}{\lambda}\right);$$

therefore

$$\lambda(x+2t) = -\mathrm{Ln}\!\left(\frac{\lambda - 1 + e^{-2t}}{\lambda}\right),$$

and we find

$$\lambda e^{-\lambda(x+2t)} = \lambda\cdot\frac{\lambda - 1 + e^{-2t}}{\lambda} = \lambda - 1 + e^{-2t}.$$

Substitute the previous result into (A.3):

$$m_w = \frac{t\left(\lambda - 1 + e^{-2t} + 1\right)}{1 - \frac{1}{\lambda}\left(\lambda - 1 + e^{-2t}\right)} = \frac{\lambda t\left(\lambda + e^{-2t}\right)}{1 - e^{-2t}} = m_s. \qquad □$$

Appendix B. Estimation of the function $m(\lambda,t)$ in the Stay–Stay region

In this and the following sections the referenced regions and boundaries are those described in Section 1.3 and identified in Fig. 2.

B.1. Extreme values of function $m$ on the vertical line $\lambda = 1$

Proposition B.1. The function $m(\lambda,t)$ on the vertical line $\lambda = 1$ is constant: $m(\lambda,t) = 1$ on this line.

Proof. If we set $\lambda = 1$ in Eq. (2.3), we will get $x = y$. Then Eq. (2.4) for optimal stay time on BA2 becomes

$$(y+t)\left(e^{-(y+2t)} + 1\right) = 1 - e^{-(y+2t)} + y. \tag{B.1}$$

In this case $\lambda = 1$; therefore

$$m(\lambda,t) = \frac{(y+t)\left(1 + e^{-(y+2t)}\right)}{y + 1 - e^{-(y+2t)}}.$$

Thus, by Eq. (B.1), $m = 1$ for all solutions of Eqs. (2.3), (2.4) with $\lambda = 1$. □

B.2. Extreme values of function $m(\lambda,t)$ on the boundary (1.11)

Proposition B.2. The values of the function $m(\lambda,t)$ on the boundary (1.11) satisfy the inequality $0.920703 \le m(\lambda,t) \le 1.003503$.

Proof. Substitute the functions $t$ and $y$ from Eqs. (1.11), (1.14) into (3.2):

$$M(\lambda) = m(y(\lambda), \lambda, t(\lambda)) = \frac{(1+\lambda)\left(1 + \mathrm{Ln}(1-\lambda)\right)}{2\lambda\left(\lambda + \mathrm{Ln}(1-\lambda)\right)},$$

then find the derivative:

$$\frac{dM}{d\lambda} = -\frac{\lambda + (1+\lambda)^2\,\mathrm{Ln}(1-\lambda) + \mathrm{Ln}^2(1-\lambda)}{2\lambda^2\left(\lambda + \mathrm{Ln}(1-\lambda)\right)^2}.$$

The derivative equals 0 for $\lambda \approx 0.97337$, and the function $M$ has a maximum at this point:

$$M(0.97337) \approx 1.003502 \quad\text{when}\quad t \approx 0.964082.$$

At the left end:

$$M(0.841405) = m(0, 0.841405, 0.920703) \approx 0.920703.$$

At the right end:

$$M(1) = \lim_{\lambda\to 1}\frac{(1+\lambda)\left(1 + \mathrm{Ln}(1-\lambda)\right)}{2\lambda\left(\lambda + \mathrm{Ln}(1-\lambda)\right)} = \lim_{\lambda\to 1}\frac{1 + \mathrm{Ln}(1-\lambda)}{\lambda + \mathrm{Ln}(1-\lambda)} = \lim_{\lambda\to 1}\frac{-\frac{1}{1-\lambda}}{1 - \frac{1}{1-\lambda}} = \lim_{\lambda\to 1}\frac{1}{\lambda} = 1.$$

Thus, $0.920703 \le m(\lambda,t) \le 1.003502$ on this boundary. Note that $m(\lambda,t) = 1$ on this boundary for $\lambda \approx 0.944352$ and $t \approx 0.944352$. Values of $m(\lambda,t)$ are slightly greater than 1 on this boundary for $0.944352 < \lambda < 1$. Values of $t$ defined by (1.11) vary from 0.944352 to 1 on this part of the boundary. □

B.3. Extreme values of function $m(\lambda,t)$ on the boundary (1.15)

Proposition B.3. The values of the function $m(\lambda,t)$ on the boundary (1.15) satisfy the inequality $0.854714 \le m(\lambda,t) \le 1$.

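Proof. On this boundary $y = 0$. We find a critical point of the function $m(\lambda,t)$ in Eq. (3.2) subject to the constraint (Eq. (1.15)) by the method of Lagrange multipliers. For this purpose we find the partial derivatives

$$\left.\frac{\partial m}{\partial \lambda}\right|_{y=0}, \quad \left.\frac{\partial m}{\partial t}\right|_{y=0}, \quad \frac{\partial f}{\partial \lambda}, \quad \frac{\partial f}{\partial t},$$

where

$$f(\lambda,t) = \frac{\lambda + e^{-2t} - 1}{\lambda}\left(1 - \mathrm{Ln}\,\frac{\lambda + e^{-2t} - 1}{\lambda}\right) - 2 + 2t + e^{-2t}.$$

Results are:

$$\left.\frac{\partial m}{\partial \lambda}\right|_{y=0} = \frac{\left(e^{-2t} + 2\lambda\right)t}{1 - e^{-2t}},$$

$$\left.\frac{\partial m}{\partial t}\right|_{y=0} = \frac{\lambda\left(1 - e^{-2t} + \lambda\left(e^{2t} - 1 - 2t\right) - 2t\right)}{e^{2t}\left(1 - e^{-2t}\right)^2},$$

$$\frac{\partial f}{\partial \lambda} = -\frac{\left(1 - e^{-2t}\right)\mathrm{Ln}\,\frac{\lambda + e^{-2t} - 1}{\lambda}}{\lambda^2},$$

$$\frac{\partial f}{\partial t} = \frac{2\left(\lambda\left(1 - e^{-2t}\right) + e^{-2t}\,\mathrm{Ln}\,\frac{\lambda + e^{-2t} - 1}{\lambda}\right)}{\lambda}.$$

We solve the equations

$$\frac{\partial m}{\partial \lambda} = \alpha\frac{\partial f}{\partial \lambda}, \qquad \frac{\partial m}{\partial t} = \alpha\frac{\partial f}{\partial t}, \qquad f = 0$$

numerically. The solution is: $\lambda \approx 0.859661$, $t \approx 0.591242$, $\alpha \approx 1.11992$. Now we find the value of the function $m(\lambda,t)$ at this critical point and at the end points:

$$m(0.859661, 0.591242) \approx 0.854714, \qquad m(0.841405, 0.920703) \approx 0.920702, \qquad m(1, 0) = 1.$$

The minimum value of the function $m(\lambda,t)$ on this boundary is 0.854714; the maximum value is 1. □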
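The boundary extrema quoted in Propositions B.2 and B.3 can be reproduced with a coarse numerical scan. This is a hedged sketch rather than the Lagrange-multiplier computation used in the proofs: it evaluates $M(\lambda)$ from Proposition B.2 on a grid, and it parameterizes the boundary (1.15) by $t$ using $\lambda = (1-e^{-2t})/(1-e^{-z})$, which follows from the relation $\lambda e^{-z} = \lambda - 1 + e^{-2t}$ at $y = 0$ in Section A.3. All function names and grid sizes are hypothetical.

```python
import math


def solve_z(t, tol=1e-13):
    """Bisection for exp(-z)(z+1) = 2 - exp(-2t) - 2t with z > 0 (Eq. (2.8))."""
    target = 2.0 - math.exp(-2.0 * t) - 2.0 * t
    if t <= 0.0 or not (0.0 < target < 1.0):
        raise ValueError("t must satisfy 0 < t < t0")
    lo, hi = 0.0, 1.0
    while math.exp(-hi) * (hi + 1.0) > target:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1.0) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def M_boundary_111(lam):
    """M(lambda) on the boundary (1.11), as written in Proposition B.2."""
    log_term = math.log(1.0 - lam)
    return (1.0 + lam) * (1.0 + log_term) / (2.0 * lam * (lam + log_term))


def lambda_on_115(t):
    """lambda on the boundary (1.15) for a given t (derived from Section A.3)."""
    return (1.0 - math.exp(-2.0 * t)) / (1.0 - math.exp(-solve_z(t)))


def m_on_115(t):
    """m on the boundary (1.15), where y = 0."""
    lam = lambda_on_115(t)
    e2t = math.exp(-2.0 * t)
    return lam * t * (lam + e2t) / (1.0 - e2t)


def scan(f, lo, hi, n):
    """Evaluate f at n + 1 equally spaced points; return (x, f(x)) pairs."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return [(x, f(x)) for x in xs]


if __name__ == "__main__":
    lam_star, M_star = max(scan(M_boundary_111, 0.841405, 0.9999, 20000), key=lambda p: p[1])
    t_star, m_star = min(scan(m_on_115, 0.01, 0.92, 4550), key=lambda p: p[1])
    print("max on (1.11):", lam_star, M_star)
    print("min on (1.15):", t_star, lambda_on_115(t_star), m_star)
```

A grid scan is used instead of a derivative-based method so that the check stays independent of the derivative formulas in the proofs.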

B.4. Extreme values of function $m(\lambda,t)$ in the Stay–Stay region

Proof of Theorem 3. We want to find critical values of the function $m(y,\lambda,t)$, described by Eq. (3.2), inside the Stay–Stay region, where $y$ is the solution of Eqs. (2.3), (2.4) uniquely defined by $\lambda$ and $t$. We find a critical point of the function $m(y,\lambda,t)$ in Eq. (3.2) subject to the constraints (Eqs. (2.3), (2.4)) by the method of Lagrange multipliers, as we did in the previous section. For this purpose we find the partial derivatives

$$\frac{\partial m}{\partial y}, \quad \frac{\partial m}{\partial \lambda}, \quad \frac{\partial m}{\partial t}, \quad \frac{\partial F}{\partial y}, \quad \frac{\partial F}{\partial \lambda}, \quad \frac{\partial F}{\partial t},$$

where

$$F(y,\lambda,t) = \frac{\left(1 - e^{-(y+2t)}\right)\frac{\lambda+1}{\lambda} + x + \lambda y}{x + y + 2t} - e^{-(y+2t)} - \lambda,$$

$$x = -2t - \frac{1}{\lambda}\,\mathrm{Ln}\!\left(1 - \frac{1}{\lambda}\left(1 - e^{-(y+2t)}\right)\right).$$

We found these derivatives and solved the system of equations:

$$\frac{\partial m}{\partial y} = \alpha\frac{\partial F}{\partial y}, \qquad \frac{\partial m}{\partial \lambda} = \alpha\frac{\partial F}{\partial \lambda}, \qquad \frac{\partial m}{\partial t} = \alpha\frac{\partial F}{\partial t}, \qquad F = 0.$$

The solution is: $y = 2.03892$, $\lambda = 1$, $t = 0.917849$, $\alpha = 0.97966$. This point is on the boundary $\lambda = 1$. The value of the function $m(\lambda,t)$ is $m(2.03892, 1, 0.917849) = 1$. The function $m$ does not have extreme values inside the Stay–Stay region. Therefore its maximum and minimum values are on the boundaries, and $0.854714 < m(\lambda,t) < 1.00350$ inside the Stay–Stay region, as follows from Propositions B.1–B.3. □

Appendix C. Estimation of function $m(\lambda,t)$ in the Stay–Switch region

C.1. Estimation of $z$ as a function of $t$

Proposition C.1. If $z$ is a solution of Eq. (2.8), then

1. $z$ is an increasing function of $t$;
2. $\lim_{t\to t_0} z = \infty$ and $\lim_{t\to t_0} e^{-z} = 0$. (C.1)

Proof. Eq. (2.8) determines $z$ as a function of $t$. If $t = 0$, we have the equation

$$e^{-z}(z+1) = 1,$$

which has the only solution $z = 0$. The derivative of $z$ with respect to $t$ from Eq. (2.8) is

$$\frac{dz}{dt} = \frac{\frac{d}{dt}\left(2 - e^{-2t} - 2t\right)}{\frac{d}{dz}\left(e^{-z}(z+1)\right)} = \frac{2e^{z}\left(1 - e^{-2t}\right)}{z}. \tag{C.2}$$

We see that $\frac{dz}{dt} > 0$, which means that $z$ is increasing when $t$ is increasing. It is easy to check that the function $F(t) = 2 - e^{-2t} - 2t$ from Eq. (2.8) is decreasing from 1 to 0 on the interval $t \in (0, t_0)$, where $t_0 \approx 0.920703$ is a solution of Eq. (1.13). When $t \to t_0$, we have from Eq. (2.8)

$$\lim_{t\to t_0} e^{-z}(z+1) = 0,$$

therefore

$$\lim_{t\to t_0} z = \infty \quad\text{and}\quad \lim_{t\to t_0} e^{-z} = 0. \qquad □$$

Proposition C.2. If $z$ is a solution of Eq. (2.8), then

1. $z \approx 2t + \frac{2}{3}t^2 + \frac{5}{9}t^3 + \frac{59}{135}t^4$ for $0 < t < 0.413395$; (C.3)
2. $z > 2t$ for all $t \in (0, t_0)$; (C.4)
3. $\lim_{t\to 0} \frac{z}{t} = 2$ and $z \approx 2t$ for small $t$; (C.5)
4. $\frac{z}{t}$ is an increasing function of $t$ on the interval $(0, t_0)$ and $\frac{z}{t} > 2$ for all $t \in (0, t_0)$.

Proof. We will show that $z > 2t$ for all $t \in (0, t_0)$. It will be enough to show that $\frac{dz}{dt} > 2$. Differentiating Eq. (C.2) gives

$$\frac{d^2 z}{dt^2} = \frac{4e^{2z}\left[\left(1 - e^{-2t}\right)^2 (z-1) + z^2 e^{-z-2t}\right]}{z^3},$$

which is obviously positive for $z \ge 1$; that means that $\frac{dz}{dt}$ is increasing for $z \ge 1$. If $\frac{dz}{dt} > 2$ at $z = 1$, then it is true for $z > 1$. To estimate the dependence of $z$ on $t$ for $z < 1$, which corresponds to $t < 0.413395$, we may represent $z$ as a sum of the series with respect to $t$:

$$z = a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots.$$

The left side of (2.8) is represented by

$$e^{-z}(z+1) = 1 - \frac{a_1^2}{2}t^2 + \left(\frac{a_1^3}{3} - a_1 a_2\right)t^3 + \left(-\frac{a_1^4}{8} + a_1^2 a_2 - \frac{a_2^2}{2} - a_1 a_3\right)t^4 + \left(\frac{a_1^5}{30} - \frac{a_1^3 a_2}{2} + a_1^2 a_3 - a_2 a_3 + a_1 a_2^2 - a_1 a_4\right)t^5 + \cdots.$$

The right side of Eq. (2.8) is represented by

$$2 - e^{-2t} - 2t = 1 - 2t^2 + \frac{4}{3}t^3 - \frac{2}{3}t^4 + \frac{4}{15}t^5 + \cdots.$$

Equating coefficients, we solve the system of equations:

$$-\frac{a_1^2}{2} = -2, \qquad a_1 = 2;$$

$$\frac{a_1^3}{3} - a_1 a_2 = \frac{4}{3}, \qquad a_2 = \frac{2}{3};$$

$$-\frac{a_1^4}{8} + a_1^2 a_2 - \frac{a_2^2}{2} - a_1 a_3 = -\frac{2}{3}, \qquad a_3 = \frac{5}{9};$$

$$\frac{a_1^5}{30} - \frac{a_1^3 a_2}{2} + a_1^2 a_3 - a_2 a_3 + a_1 a_2^2 - a_1 a_4 = \frac{4}{15}, \qquad a_4 = \frac{59}{135}.$$

Therefore, we obtain (C.3):

$$z \approx 2t + \frac{2}{3}t^2 + \frac{5}{9}t^3 + \frac{59}{135}t^4.$$

Note that $\left.\left(2t + \frac{2}{3}t^2 + \frac{5}{9}t^3 + \frac{59}{135}t^4\right)\right|_{t=0.413395} \approx 0.992733$. Eq. (C.3) provides a good approximation for $z$ on the interval $0 < t < 0.413395$, where $z < 1$. We see that $z > 2t$ and

$$\frac{dz}{dt} = 2 + \frac{4}{3}t + \frac{15}{9}t^2 + \frac{236}{135}t^3 > 2.$$

At the end of this interval we have

$$\left.\frac{dz}{dt}\right|_{t=0.413395} \approx 2.95952.$$

Because $\frac{dz}{dt} > 2$ for all $t \in (0, t_0)$, we conclude that $z > 2t$ for all $t \in (0, t_0)$.

It follows from Eq. (C.3) that $\lim_{t\to 0}\frac{z}{t} = 2$ and $z \approx 2t$ for small $t$; that is, $\lambda(x+2t) \approx 2t$ and $x \approx \frac{2t(1-\lambda)}{\lambda}$ for small $t$.

Now we prove that the function $\frac{z}{t}$ is increasing. Find the derivative:

$$\frac{d}{dt}\left(\frac{z}{t}\right) = \frac{t z' - z}{t^2}.$$

Since the denominator is positive, we are interested in the sign of the numerator (use Eq. (C.2)):

$$t z' - z = t\,\frac{2e^{z}\left(1 - e^{-2t}\right)}{z} - z = \frac{2e^{z} t\left(1 - e^{-2t}\right) - z^2}{z} = \frac{e^{z}\left(2t\left(1 - e^{-2t}\right) - z^2 e^{-z}\right)}{z}.$$

Again, we consider only the sign of one factor in the numerator:

$$h = 2t\left(1 - e^{-2t}\right) - z^2 e^{-z}.$$

Find the derivative (use Eq. (C.2)):

$$\frac{dh}{dt} = \frac{d\left(2t\left(1 - e^{-2t}\right)\right)}{dt} - \frac{d\left(z^2 e^{-z}\right)}{dz}\,z' = 2 - 2e^{-2t} + 4te^{-2t} - e^{-z} z(2 - z)\,\frac{2e^{z}\left(1 - e^{-2t}\right)}{z} = 2\left(1 - e^{-2t}\right)(z - 1) + 4te^{-2t}.$$

This derivative is obviously positive for $z \ge 1$. Therefore $h$ is increasing for $z \ge 1$, which corresponds to $t \ge 0.413395$. Because $h(0.413395, 1) \approx 0.0972302$, we conclude that $h > 0$ for all $t \ge 0.413395$; therefore $\frac{d}{dt}\left(\frac{z}{t}\right) > 0$ and $\frac{z}{t}$ is an increasing function of $t$ for $t \ge 0.413395$.

If $t < 0.413395$, we may use the series (C.3) to determine that the function $\frac{z}{t}$ increases. Thus, $\frac{z}{t}$ is an increasing function of $t$ for $0 < t < t_0 \approx 0.920703$ and, by Eq. (C.4), $\frac{z}{t} > 2$. □

C.2. Extreme values of function $m(\lambda,t)$ on the horizontal line $t = 0$

Proposition C.3. On the horizontal line $t = 0$:

1. $m(\lambda, 0) = \frac{\lambda+1}{2}$; (C.6)
2. $0.5 < m(\lambda, 0) \le 1$ for $0 < \lambda \le 1$.

Proof. The function $m$ is defined by Eq. (3.3) in the Stay–Switch domain for $t > 0$. We define $m(\lambda, 0)$ by the formula

$$m(\lambda, 0) = \lim_{t\to 0} m(\lambda, t).$$

According to Eq. (C.5), $z = \lambda(x+2t) \approx 2t$ for small $t$, and we have for $m(\lambda, 0)$ from Eq. (3.3):

$$m(\lambda, 0) = \lim_{t\to 0}\frac{t\left(\lambda e^{-2t} + 1\right)}{1 - e^{-2t}} = \lim_{t\to 0}\frac{t(\lambda+1)}{2t} = \frac{\lambda+1}{2}.$$

Therefore for $0 < \lambda \le 1$ we have $0.5 < m \le 1$. □

C.3. Extreme values of function $m(\lambda,t)$ on the horizontal line $t = t_0 \approx 0.920703$

Proposition C.4. On the horizontal line $t = t_0 \approx 0.920703$ the function $m(\lambda,t)$ is constant:

$$m(\lambda, t_0) = t_0 \approx 0.920703. \tag{C.7}$$

Proof. We have from Eqs. (3.3) and (C.1):

$$m(\lambda, t_0) = \lim_{t\to t_0}\frac{t\left(\lambda e^{-z} + 1\right)}{1 - e^{-z}} = \lim_{t\to t_0} t = t_0 \approx 0.920703$$

for all $0 < \lambda \le 0.841405$. □

C.4. Extreme values of function $m(\lambda,t)$ on the vertical line $\lambda = 0$

Note that $\lambda = 0$ does not make practical sense, but we may consider very small values of $\lambda$.

Proposition C.5. On the vertical line $\lambda = 0$, the function $m(\lambda,t)$ takes the value $m(\lambda,t) = \frac{t}{1-e^{-z}}$, and

1. $m(\lambda,t)$ is an increasing function of $t$;
2. $0.5 < m(\lambda,t) < 0.920703$ for $0 < t < 0.920703$ when $\lambda$ is close to 0.

Proof. Let $\lambda = 0$ in Eq. (3.3):

$$m(0,t) = \frac{t}{1 - e^{-z}}.$$

We know that $z$ as a function of $t$, and its derivative with respect to $t$, are determined by Eqs. (2.8) and (C.2). Therefore the complete derivative of $m(0,t)$ with respect to $t$ is given by the formula:

$$\left(\frac{\partial m}{\partial t}\right)_{comp} = \frac{\partial m}{\partial t} + \frac{\partial m}{\partial z}\cdot\frac{dz}{dt}.$$

Thus,

$$\left(\frac{\partial m}{\partial t}\right)_{comp} = \frac{1}{1 - e^{-z}} - \frac{e^{-z} t}{\left(1 - e^{-z}\right)^2}\cdot\frac{2\left(1 - e^{-2t}\right)}{z e^{-z}} = \frac{z\left(1 - e^{-z}\right) - 2t\left(1 - e^{-2t}\right)}{z\left(1 - e^{-z}\right)^2}.$$

Show that the numerator $G = z\left(1 - e^{-z}\right) - 2t\left(1 - e^{-2t}\right)$ is positive for $t > 0$. Since the derivative of the function $G_0 = x\left(1 - e^{-x}\right)$ satisfies

$$\frac{dG_0}{dx} = \left(1 - e^{-x}\right) + x e^{-x} > 0,$$

we conclude that $G_0$ is increasing, and because $z > 2t$ (by Eq. (C.4)), we confirm that $z\left(1 - e^{-z}\right) - 2t\left(1 - e^{-2t}\right) > 0$. Thus, $m = \frac{t}{1-e^{-z}}$ is an increasing function of $t$ and

$$m|_{t=0} < m(0,t) < m|_{t=t_0},$$

where

$$m|_{t=0} = \lim_{t\to 0}\frac{t}{1 - e^{-z}} = \lim_{t\to 0}\frac{t}{1 - e^{-2t}} = \lim_{t\to 0}\frac{t}{2t} = \frac{1}{2},$$

$$m|_{t=t_0} = \lim_{t\to t_0}\frac{t}{1 - e^{-z}} = \lim_{t\to t_0} t = t_0 \approx 0.920703.$$

(Recall that $z \approx 2t$ for small $t$ by Eq. (C.5) and $\lim_{t\to t_0} e^{-z} = 0$ by Eq. (C.1).) Therefore,

$$0.5 < m(\lambda,t) < 0.920703 \quad\text{for}\quad 0 < t < 0.920703$$

when $\lambda$ is close to 0. □

C.5. Extreme values of the function $m(\lambda,t)$ on the boundary (1.15)

Proposition C.6. The values of the function $m(\lambda,t)$ on the boundary (1.15) satisfy the inequality $0.854714 \le m(\lambda,t) \le 1$.

Proof. This boundary is common for two regions. The function $m$, as computed by two different formulas, takes on the same values on the common boundary (see Section A.3, Proposition A.1). The minimum and maximum values of the function on this boundary (0.854714 and 1) were estimated in Section B.3, Proposition B.3. □

C.6. Estimation of the function $m$ in the Stay–Switch region

Proof of Theorem 4. It is obvious that

$$\frac{\partial m}{\partial \lambda} = \frac{t e^{-z}}{1 - e^{-z}} > 0, \tag{C.8}$$

which means that for any constant $t$ the value of $m$ is increasing. Therefore the function $m$ does not have stationary points inside the region. Its minimum is on the line $\lambda = 0$, that is, 0.5. Its maximum is at $\lambda = 1$, which, in this region, is the point where $m(1, 0) = 1$. We conclude that $0.5 < m(\lambda,t) < 1$ inside the Stay–Switch region, as follows from Propositions C.3 to C.6. □

Appendix D. Estimation of the ratio $u = r_2/r$ in the Stay–Switch domain

Proof of Theorem 6. Obviously, $0 < \frac{r_2}{r} < \frac{1}{2}$, but we may get a more accurate estimate for the Stay–Switch domain. We have by (3.1) and (3.3):

$$u = \frac{r_2}{r} = \frac{T_2}{mT} = \frac{t}{m(x+2t)} = \frac{\lambda t}{m\lambda(x+2t)} = \frac{\lambda t}{mz} = \frac{\lambda\left(1 - e^{-z}\right)}{z\left(1 + \lambda e^{-z}\right)}.$$

Here we write $m$ for $m(\lambda,t)$. Therefore

$$\frac{\partial u}{\partial \lambda} = \frac{1 - e^{-z}}{z\left(1 + \lambda e^{-z}\right)^2} > 0,$$

and $u$ increases when $\lambda$ increases for any fixed $t$.

Show that $\left(\frac{\partial u}{\partial t}\right)_{complete} < 0$. The function $u$ depends on $t$ through $z$ only, and $\frac{dz}{dt} > 0$ (by Eq. (C.2)). Therefore it is enough to consider the sign of $\frac{\partial u}{\partial z}$:

$$\frac{\partial u}{\partial z} = \frac{\lambda\left(e^{z} z(1+\lambda) - \left(e^{z} - 1\right)\left(e^{z} + \lambda\right)\right)}{z^2\left(e^{z} + \lambda\right)^2}.$$

Let $H = e^{z} z(1+\lambda) - \left(e^{z} - 1\right)\left(e^{z} + \lambda\right)$; then

$$\frac{\partial H}{\partial z} = -e^{z}\left[2\left(e^{z} - 1\right) - z(1+\lambda)\right].$$

Because

$$\left[2\left(e^{z} - 1\right) - z(1+\lambda)\right] > 2z - z - \lambda z = z(1-\lambda) > 0,$$

we conclude that $\frac{\partial H}{\partial z} < 0$; the function $H$ is decreasing. Since $H|_{z=0} = 0$, we have $H < 0$ and $\left(\frac{\partial u}{\partial t}\right)_{complete} < 0$; that is, the function $u$ is a decreasing function of $t$ for any constant $\lambda$. Since

$$\lim_{t\to 0} u = \lim_{t\to 0}\frac{\lambda\left(1 - e^{-z}\right)}{z\left(1 + \lambda e^{-z}\right)} = \frac{\lambda}{\lambda+1},$$

we have $u < \frac{\lambda}{\lambda+1}$ for all $0 < \lambda < 1$, $0 < t < t_0 \approx 0.920703$. □
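The estimates of Appendices C and D lend themselves to a quick numerical spot check. The sketch below is an illustration under stated assumptions, not part of the proofs: it solves Eq. (2.8), taken in the form $e^{-z}(z+1) = 2 - e^{-2t} - 2t$, by bisection, compares the root with the truncated series (C.3), and evaluates $u = \lambda(1-e^{-z})/(z(1+\lambda e^{-z}))$ from the proof of Theorem 6. The function names are hypothetical.

```python
import math


def solve_z(t, tol=1e-13):
    """Bisection for exp(-z)(z+1) = 2 - exp(-2t) - 2t with z > 0 (Eq. (2.8))."""
    target = 2.0 - math.exp(-2.0 * t) - 2.0 * t
    if t <= 0.0 or not (0.0 < target < 1.0):
        raise ValueError("t must satisfy 0 < t < t0")
    lo, hi = 0.0, 1.0
    while math.exp(-hi) * (hi + 1.0) > target:
        hi *= 2.0
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) * (mid + 1.0) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def series_c3(t):
    """Four-term series approximation of z from Eq. (C.3)."""
    return 2.0 * t + (2.0 / 3.0) * t**2 + (5.0 / 9.0) * t**3 + (59.0 / 135.0) * t**4


def u_ratio(lam, t):
    """MER proportion u = r2 / r in the Stay-Switch domain (Appendix D)."""
    z = solve_z(t)
    ez = math.exp(-z)
    return lam * (1.0 - ez) / (z * (1.0 + lam * ez))


if __name__ == "__main__":
    for t in (0.05, 0.1, 0.2, 0.413395):
        z = solve_z(t)
        print(f"t={t}: z={z:.6f}, series={series_c3(t):.6f}, z/t={z / t:.4f}")
    for t in (0.05, 0.3, 0.9):
        print(f"t={t}: u(0.5, t)={u_ratio(0.5, t):.6f}, bound={0.5 / 1.5:.6f}")
```

Printing $z/t$ alongside the series values makes it easy to see both claims of Proposition C.2 at once: the ratio stays above 2 and grows with $t$, while the series tracks the root closely for small $t$.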

Appendix E. Interesting observation

As we saw before, $\lambda_0 \approx 0.841405$ is a solution of Eq. (1.12), and $t_0 \approx 0.920703$ is a solution of Eq. (1.13), and both give a solution of Eqs. (1.11) and (1.15). Therefore,

$$\frac{\lambda_0 + e^{-2t_0} - 1}{\lambda_0}\left(1 - \mathrm{Ln}\,\frac{\lambda_0 + e^{-2t_0} - 1}{\lambda_0}\right) = 0 \quad\text{and}\quad 2 - 2t_0 = e^{-2t_0}$$

at this point. Thus, we have:

$$\lambda_0 + e^{-2t_0} - 1 = 0, \qquad \lambda_0 + 1 - 2t_0 = 0, \qquad t_0 = \frac{\lambda_0 + 1}{2}, \qquad 0.920703 \approx \frac{0.841405 + 1}{2}.$$

Because both numbers are approximate solutions of the corresponding Eqs. (1.12) and (1.13), it is not easy to note this unexpectedly simple relationship.
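The relationship can be confirmed in a few lines. This is a minimal sketch that assumes only the two relations stated above, $2 - 2t_0 = e^{-2t_0}$ and $\lambda_0 + e^{-2t_0} - 1 = 0$; the helper name `find_t0` is hypothetical, and the root is located by bisection.

```python
import math


def find_t0(tol=1e-15):
    """Root of 2 - exp(-2t) - 2t = 0 on (0.5, 1), located by bisection."""
    lo, hi = 0.5, 1.0  # the function is positive at 0.5 and negative at 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 - math.exp(-2.0 * mid) - 2.0 * mid > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


t0 = find_t0()
lam0 = 1.0 - math.exp(-2.0 * t0)  # from lambda0 + exp(-2 t0) - 1 = 0

if __name__ == "__main__":
    print("t0 =", t0)
    print("lambda0 =", lam0)
    print("(lambda0 + 1) / 2 =", (lam0 + 1.0) / 2.0)
```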

References

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242.

Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281.

Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874.

Brownstein, A. J., & Pliskoff, S. S. (1968). Some effects of relative reinforcement rates and changeover delay in response-independent concurrent schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 683–688.

Davison, M., & Jones, B. M. (1995). A quantitative analysis of extreme choice. Journal of the Experimental Analysis of Behavior, 64, 147–162.

Davison, M., & McCarthy, D. (1988). The matching law. Hillsdale, NJ: Lawrence Erlbaum.

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.

Heyman, G. M. (1982). Is time allocation unconditioned behavior? In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 457–490). Cambridge, MA: Ballinger.

Houston, A. I., & McNamara, J. (1981). How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367–396.

Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.

Wearden, J. H., & Burgess, I. S. (1982). Matching since Baum (1979). Journal of the Experimental Analysis of Behavior, 38, 339–348.