Probabilistic choice: A simple invariance


Behavioural Processes, 15 (1987) 59-92
Elsevier Science Publishers B.V. (Biomedical Division)

PROBABILISTIC CHOICE: A SIMPLE INVARIANCE

J. M. HORNER and J. E. R. STADDON

Department of Psychology, Duke University, Durham, NC 27706, U.S.A.

(Accepted 3 June 1987)

ABSTRACT

Horner, J. M. and Staddon, J. E. R., 1987. Probabilistic choice: A simple invariance. Behav. Processes, 15: 59-92.

When subjects must choose repeatedly between two or more alternatives, each of which dispenses reward on a probabilistic basis (two-armed bandit), their behavior is guided by the two possible outcomes, reward and nonreward. The simplest stochastic choice rule is that the probability of choosing an alternative increases following a reward and decreases following a nonreward (reward following). We show experimentally and theoretically that animal subjects behave as if the absolute magnitudes of the changes in choice probability caused by reward and nonreward do not depend on the response which produced the reward or nonreward (source independence), and that the effects of reward and nonreward are in constant ratio under fixed conditions (effect-ratio invariance)--properties that fit the definition of satisficing. Our experimental results are either not predicted by, or are inconsistent with, other theories of free-operant choice such as Bush-Mosteller, molar maximization, momentary maximizing, and melioration (matching).

Key words: satisficing, stochastic model, ratio schedules, effect-ratio analysis, pigeon, Bush-Mosteller, invariance, reward, equilibrium

INTRODUCTION

Understanding how reward affects future choices is one of the oldest and most recurrent questions in psychology. The simplest way to study the problem is in a symmetrical, free-operant, two-choice situation: the subject has available two identical responses, each of which is rewarded probabilistically. In such a situation the only information the subject receives is the outcome of each choice--reward or nonreward--and the objective is to discover the rule by which each outcome affects future choices. The process responsible may be quite complicated, but we believe that a theory of choice in situations like this can nevertheless be boiled down to two parts: a variable to which the organism is sensitive, and a rule that links changes in this variable to changes in behavior.

Two main kinds of theory have been widely discussed in recent treatments of free-operant choice. Optimality theories assume that animals always act so as to maximize something. Molar maximizing theories assume that the quantity maximized is overall reward rate, computed over a relatively long time period or time window (e.g., Staddon & Motheral, 1978; Rachlin, Green, Kagel & Battalio, 1976; Rachlin & Burkhard, 1978). Momentary maximizing theories assume that animals always choose the alternative with the higher local (momentary) reward probability (Shimp, 1966, 1969; Silberberg, Hamilton, Ziriax & Casey, 1978; Hinson & Staddon, 1983a, 1983b). Melioration (Herrnstein & Vaughan, 1980; Vaughan, 1981), a close relative, assumes that animals act so as to equalize the local reward rates associated with the alternatives, a process tending to matching. The second kind of theory is the stochastic learning model, the oldest example of which is the linear-operator model proposed by Bush and Mosteller (1955); this approach has many derivatives (e.g., Sternberg, 1963; Rescorla & Wagner, 1972; Myers, 1976; Myerson & Miezin, 1980; Daly & Daly, 1982). Stochastic learning models assume that reward and nonreward act directly on response probability. The two approaches have different emphases: the maximizing and melioration theories deal with aggregative (molar) properties of the data and are often silent about the behavior-change process responsible for them, whereas the stochastic models describe a process but have not dealt with the molar regularities.

There are several reasons for considering an approach different from the current ones. (a) Although matching of response ratios to obtained reward ratios is often found when pigeons respond on concurrent variable-interval variable-interval (conc VI VI) schedules, deviations are common and are often systematic: for example, extreme preferences occur more often when the scheduled reward rates are high (VI values small) and when they are low (VI values large). (b) Matching is sometimes violated: systematic deviations occur on concurrent variable-ratio schedules and on spaced-responding schedules. (c) Matching and maximizing accounts say little about the mechanism by which each individual reward and nonreward affects the next choice.

In exploratory experiments in our laboratory (Horner, 1986) we looked at choice performance under conditions where the absolute reward probabilities, and not just the relative values emphasized by current theories, were varied systematically. In several of these experiments, performance turned out not to be explicable by any of the current theories of free-operant choice: preference varied as a function of absolute reward probability, whereas matching/melioration, molar maximizing, and momentary maximizing are all sensitive only to relative reward rates or probabilities.

These experiments used a choice procedure, described in detail below, in which pigeons chose repeatedly between two identical probabilistic (random-ratio) alternatives--a "two-armed bandit," in the terminology of the decision-theory literature. In one experiment we varied the absolute reward probability, which was always equal for the two keys: p (on the right key) and q (on the left) were set in ABA sequence at p = q = 1/75 and p = q = 1/20. Sessions terminated after the 48th food delivery. Figure 1 shows the preference (the proportion of right-key choices, s, i.e., R/(R+L)) across daily sessions for each of four pigeons. During the first exposure to p = 1/75 (days 1-3) all animals were close to indifference. But when p = 1/20 (days 4-8), all animals developed extreme preferences for one or the other alternative. As the figure shows, indifference was recoverable: preferences shifted back across days after the p = 1/75 condition was restored (days 9-10).

When payoff probabilities are equal, neither matching nor maximizing offers any guidance in this experiment. Let x and y be the rates of pecking on the two keys. Since both keys are random-ratio schedules with identical probabilities, the obtained reward rates are R(x) = px and R(y) = py; hence for any pattern of responding, x/y = R(x)/R(y), so that every choice proportion is consistent with matching (and with melioration). The overall reward rate is also the same for all choice proportions, so any preference is consistent with molar maximization; and momentary maximizing is silent, because the momentary reward probabilities of the two choices are always identical, and picking the higher-probability-of-reward response forces no preference when the two probabilities are equal. Thus the systematic dependence of preference on absolute reward probability is predicted by none of these accounts.
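The indeterminacy is easy to check numerically. The sketch below, in Python, assumes nothing beyond simple random-ratio scheduling; the response totals are illustrative, not data from the experiment.

    # Sketch: why matching is silent on the equal-probability two-armed bandit.
    # On random-ratio schedules the expected reward rates are R(x) = p*x and
    # R(y) = q*y, so when p = q the obtained reward ratio R(x)/R(y) always
    # equals the response ratio x/y: every choice pattern "matches".

    def reward_ratio(x, y, p, q):
        """Expected reward-rate ratio R(x)/R(y) on two random-ratio schedules."""
        return (p * x) / (q * y)

    p = q = 1 / 20            # equal reward probabilities (one pilot condition)
    for x, y in [(900, 100), (500, 500), (100, 900)]:
        print(x / y, reward_ratio(x, y, p, q))   # response ratio == reward ratio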

FIGURE 1. The effect of absolute reward probability on choice between identical probabilistic alternatives ("two-armed bandit"). The figure plots the proportion of choices of the right-hand alternative, s, across daily sessions for each animal, for two conditions, p = 1/75 and p = 1/20, in ABA sequence. Open squares, Bird 096; closed diamonds, Bird 145; closed squares, Bird 151; and closed triangles, Bird 156.

Neither molar maximizing nor momentary maximizing predicts this result, since both are indifferent between alternatives with equal reward probabilities whatever their absolute value; and melioration (matching) can accommodate the result only in the empty sense that it is consistent with any pattern of choice on this procedure. It might be argued that under molar maximizing animals will sometimes sample both alternatives (cf. Allison, 1983), but sampling cannot explain why indifference gives way to persistent fixation as the absolute reward probabilities rise.

The pattern shown in Figure 1--indifference when both probabilities are low, exclusive choice of one alternative when both are high--is the commonest one, and we have seen it in many replications; but it is not the only pattern we have obtained. Under some conditions animals persist on one side for some time and then switch; under other conditions, and with unequal probabilities (p # q), the results are more complicated, depending on what the two probabilities are. It seems clear from the full range of results that something simpler than melioration, molar maximizing, or momentary maximizing is going on. Our reflections on these results led us to a theory for probabilistic choice that we call effect-ratio invariance, or ratio-invariance for short. We first describe the theory and show how it yields the necessary and sufficient conditions for indifference and for exclusive choice on probabilistic schedules. Unfortunately, the standard two-armed-bandit procedure is a poor way to isolate the process, because indifference and exclusivity are predicted by a great many theories under one condition or another. We therefore derive predictions for a class of procedures--interdependent schedules--on which the competing theories cannot all accommodate the same results. The first experiment tests predictions for an equal-probability (symmetrical) interdependent schedule. The second experiment tests predictions for an asymmetrical interdependent schedule.

RATIO-INVARIANCE

Our preliminary results with equal-reward-probability choice rule out both local reward rate and local reward probability as the variables guiding choice. Accounts such as melioration and momentary maximizing might nevertheless be modified to fit these data by adding assumptions about averaging windows and the like. To show how a particular dynamic process can account for these results, however, we begin with the most primitive version of the law of effect, namely the assumption that a rewarded choice increases in probability while an unrewarded choice decreases in probability (reward following). This idea is consistent with standard stochastic learning models, but it needs to be made explicit; our construction is simpler than the standard one. We develop it intuitively here; a more formal development is given in Appendix A.

Our initial assumption is that each reward is an amount of increment, and each nonreward an amount of decrement, in the probability of the response that produced it. Let responding be allocated in proportions s and 1-s to the Right and Left keys, respectively, and let the reward probabilities on Right and Left be p and q, respectively. Each reward increments the proportion of the rewarded response by an amount a(s) that will in general be some function of s, and each nonreward decrements the proportion of the unrewarded response by an amount b(s). We can define a quantity delta(s), the expected change in choice proportion, and derive it from the contributions of the four possible choice-reward combinations. The expected contributions to delta(s) of the four possible outcomes for responses to Right and Left are as follows:

Reward on R:     a(s)ps,            (1a)
Reward on L:     -a(s)q(1-s),       (1b)
Nonreward on R:  -b(s)(1-p)s,       (1c)
Nonreward on L:  b(s)(1-q)(1-s).    (1d)

Note that s, the proportion of R (right) choices, is incremented by both reward on R and nonreward on L, and decremented by the other two possibilities. In words, Eq. 1a says that the contribution of a rewarded R response is equal to the probability of such a co-occurrence, ps, multiplied by the change in s associated with a rewarded response, a(s); and similarly for the other equations. delta(s) is just the sum of terms 1a-d. Summing these four terms yields:

delta(s) = a(s)ps - a(s)q(1-s) - b(s)(1-p)s + b(s)(1-q)(1-s),   (2a)

which yields after rearrangement the following expression for delta(s) as a function of s, p, q, a(s) and b(s):

delta(s) = s[(p+q)(a(s)+b(s)) - 2b(s)] + b(s) - q[a(s)+b(s)].   (2b)

Another way to think of Eq. 2a is in terms of the absolute number of rewards and nonrewards on each side. If the number of rewards on Right and Left is denoted by RR and RL, and the number of nonrewards by NR and NL, then Eq. 2a can also be written

delta(s) = {a(s)[RR - RL] - b(s)[NR - NL]}/S,   (2c)

where S is the total number of responses (rewards plus nonrewards). Equation 2 represents choice as a stochastic process in which choice probability is altered up or down by the quantities a(s) and b(s), depending on the outcome of each response. Equation 2c makes it obvious that in this model the absolute value of the change in s depends only on the outcome--reward or nonreward--and not on its source (a Left or Right choice): at a given s-value, reward on the right increments s by an amount a(s), whereas reward on the left decrements s by the same amount, and similarly for nonreward and b(s). It is as if the subject has a single probability setting, which each outcome event moves up or down. By the symmetry of the model, delta(s) is zero (a potential equilibrium) when the weighted differences between rewards, and between nonrewards, on the two sides cancel--for example, when the number of rewards and nonrewards on both sides is equal.

This property, which may be termed source-independence, is violated by the standard Bush-Mosteller linear-operator models. In the standard models a reward on the Right increments s by an amount A(1-s), whereas a reward on the Left decrements s by an amount As (where A is a learning-rate constant, 0 < A < 1): the subject is assumed to act differently depending on which response produced the outcome, and the effects of reward and nonreward on the two responses can be the same only when s = 1/2.

So far we have said nothing about the details of a(s) and b(s), the changes in choice probability associated with reward and nonreward. (It might seem necessary that a(s) and b(s) be constrained in such a way that s remains within the interval 0-1, but this is not in fact essential to the derivation of meaningful equilibrium predictions.) Equation 2 thus generates a general class of novel reward-following models. It turns out that we need not know the form of a(s) and b(s) in order to derive equilibrium predictions.
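A numerical sketch of Eqs. 1 and 2 may be helpful. The Python fragment below checks Eq. 2b against a direct Monte Carlo average of the four outcomes of Eqs. 1a-1d; the constant forms chosen for a(s) and b(s) are hypothetical placeholders, since the text leaves these functions unspecified.

    import random

    def a(s): return 0.010     # increment per reward (hypothetical constant)
    def b(s): return 0.0003    # decrement per nonreward (hypothetical constant)

    def delta_expected(s, p, q):
        """Eq. 2b: delta(s) = s[(p+q)(a+b) - 2b] + b - q(a+b)."""
        return s * ((p + q) * (a(s) + b(s)) - 2 * b(s)) + b(s) - q * (a(s) + b(s))

    def delta_monte_carlo(s, p, q, n=1_000_000):
        """Average the four outcomes of Eqs. 1a-1d over simulated choices."""
        total = 0.0
        for _ in range(n):
            if random.random() < s:                       # Right choice
                total += a(s) if random.random() < p else -b(s)
            else:                                         # Left choice
                total += -a(s) if random.random() < q else b(s)
        return total / n

    s, p, q = 0.7, 1 / 20, 1 / 20
    print(delta_expected(s, p, q), delta_monte_carlo(s, p, q))  # agree to ~1e-5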

Let us consider how a(s) might depend on s. Bush-Mosteller assumes that a(s) will differ depending on whether reward is for a left or a right response. For example, if s is close to zero, reward for a right response has a large effect, reward for a left response a negligible effect, with the opposite being true for nonreward. The absolute magnitude, as well as the sign, of the effect of reward or nonreward on preference thus depends on the source of the event. Moreover, reward and nonreward vary reciprocally: when reward has a large effect, nonreward has a small one, and vice versa. These asymmetries do not seem plausible. It may be at least as reasonable to assume that the subject has a fixed degree of certainty about the properties of each alternative in a symmetrical situation: if he is relatively certain, then reward and nonreward, for either choice, will both have a small effect; conversely, if he is uncertain, both will have a large effect. In other words, it seems reasonable to consider the possibility that the ability of both reward and nonreward to affect preference depends in a similar way on the value of s. The simplest possible assumption consistent with the intuition of similar dependence on s is just that the ratio a(s)/b(s) is approximately constant for all s-values. It is easy to show that this assumption is by itself sufficient to generate equilibrium predictions from Eq. 2, even without any knowledge whatever about the form of a(s) (and b(s)). The proof is as follows.

If a(s)/b(s) = v, a constant, then so is the quantity w = b(s)/[a(s)+b(s)] = 1/(v+1), which we term the effect ratio. Dividing both sides of Eq. 2 by the quantity [a(s)+b(s)] and making the substitution yields:

delta(s)/[a(s)+b(s)] = s(p + q - 2w) + w - q.

If we are interested only in the equilibrium properties of this process, we need to know only the s-values for which delta(s) is zero and the slope of the delta(s) function at those points. (In the absence of more information about a(s) and b(s), we must forego predictions about the shape of the acquisition function, the number of responses needed to achieve equilibrium, or the variance of the choice distributions.) For an equilibrium analysis, the positive multiplier 1/[a(s)+b(s)] on the left-hand side of the equation can therefore be ignored, which yields the following fundamental relation:

delta(s) = s(p + q - 2w) + w - q.   (3a)

The comparable derivation from Eq. 2c yields

delta(s) = v[RR - RL] - [NR - NL].   (3b)
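Because Eq. 3a is linear in s, its equilibrium can be written in closed form. The following sketch (Python; the parameter values are those of the pilot experiment and Figure 2) locates the crossing and classifies its stability from the sign of the slope:

    # Sketch: equilibrium analysis of Eq. 3a, delta(s) = s(p + q - 2w) + w - q.
    # The single crossing is s* = (q - w)/(p + q - 2w); it is stable when the
    # slope (p + q - 2w) is negative, unstable when positive.

    def equilibrium(p, q, w):
        slope = p + q - 2 * w
        s_star = (q - w) / slope
        return s_star, ("stable" if slope < 0 else "unstable")

    print(equilibrium(1/75, 1/75, 0.03))   # (0.5, 'stable')   -> indifference
    print(equilibrium(1/20, 1/20, 0.03))   # (0.5, 'unstable') -> exclusive choice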


We term the process from which Eq. 3 is derived--reward following with constancy of the ratio of the effects of reward and nonreward--effect-ratio invariance, or ratio invariance for short.

The simplest way to derive predictions from Eq. 3 is to plot delta(s) (i.e., Eq. 3a) as a function of the choice proportion, s, for fixed values of p, q and w. For the simple two-armed bandit this function is linear in s, but for more complex procedures it has other forms, as we show in a moment. An equilibrium is a point where the function crosses the abscissa, i.e., where delta(s) is equal to zero. The equilibrium is stable if the slope of the function at that point is negative (perturbations of s in either direction produce values of delta(s) that oppose the change) and unstable if the slope is positive (perturbations are amplified).

Figure 2 shows a plot of this type for the first condition in our pilot two-armed-bandit experiment (i.e., p = q = 1/75); we set w, the effect ratio, at 0.03, a value in the range we have established from other experiments. Under these conditions, Eq. 3 crosses the abscissa at the point s = 0.5: indifference. Indifference is a stable equilibrium here, because the line has a negative slope (the quantity (p + q - 2w) is negative because p,q < w), so that small perturbations of s in either direction away from the point s = 0.5 produce changes that oppose the perturbation. Figure 2 also shows the prediction for the other condition in the pilot experiment, p = q = 1/20 (which is greater than our w-value of 0.03). Here the solution delta(s) = 0 is at s = 0.5, as before, but the equilibrium is unstable, because the slope of the line is positive: perturbations of s in either direction are amplified, not opposed, by the resulting change in s. Consequently, the predicted outcome is exclusive choice, a choice mode at s = 0 or s = 1.

FIGURE 2. Prediction of ratio-invariance for two equal probabilistic schedules. The figure plots delta(s)--calculated from Eq. 3--as a function of the choice proportion, s, for p = 1/75 (solid line) and p = 1/20 (dashed line). The effect ratio (see text), w, is 0.03.
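The same prediction can be illustrated by simulating the process itself. In the sketch below only the ratio w = b/(a+b) = 0.03 is taken from the text; the absolute step sizes a and b are hypothetical.

    import random

    def run(p, a=0.0097, b=0.0003, s0=0.5, n=20_000, seed=1):
        """Trajectory of the ratio-invariant reward-following process
        on a two-armed bandit with equal reward probability p."""
        random.seed(seed)
        s = s0
        for _ in range(n):
            if random.random() < s:                   # Right choice
                s += a if random.random() < p else -b
            else:                                     # Left choice
                s += -a if random.random() < p else b
            s = min(1.0, max(0.0, s))                 # keep s a proportion
        return s

    print(run(1/75))   # tends to stay near 0.5: stable indifference
    print(run(1/20))   # tends to end near 0 or 1: exclusive choice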

Thus, the major results from the equal-probability pilot experiment--exclusive choice when p is high, indifference when p is low--are consistent with ratio invariance. Neither matching (melioration) and its derivatives, nor molar maximization, nor momentary maximizing can predict these results, because on the equal-probability schedule each of them is consistent with any pattern of stable choice. Bush-Mosteller cannot predict the obtained pattern either: it predicts stable indifference for all p-values (see Appendix A).

Of course, the disproof of the competing theories depends upon constancy of the effect ratio, w, between conditions, and it is not clear that w can be assumed constant across conditions as different as p = 1/75 and p = 1/20 (see the following discussion). A procedure in which the results depend in some simple way on w under constant conditions would clearly be desirable. And in any case, the two-armed bandit is so simple that a great many processes can accommodate its commonest results; it is a poor test of the theory. More powerful tests of ratio-invariance require procedures for which the competing theories make specific, different predictions for the same situation (i.e., under constant conditions, where we can assume a constant w). Experiments 1 and 2 used such procedures.

EXPERIMENT 1 - Preference Patterns on a Symmetrical Interdependent Schedule

The quantity w = b(s)/[a(s)+b(s)], which represents the relative effect of nonreward and reward, is obviously critical to the experimental analysis. There is reason to believe that w will differ from individual to individual (or even in the same individual under different experimental conditions). Food and no food are not the same events for a very hungry animal and for a satiated one. For example, food is not rewarding to a satiated animal--it acts as if w is large--whereas the effects of reward for a very hungry animal will be large relative to the effects of nonreward, so its w will be small. Anything that affects the motivational states of our animals is likely to affect the value of w. Since we cannot equate exactly the animals' motivational states, and since w is certain to differ among individuals, any experimental test of our analysis must be such as to allow for individual variation in w.

1. There is obviously a third possibility, namely, p = w--for which ratio invariance makes no point prediction (i.e., there is no unique equilibrium value). Our experience leads us to think that this condition may be hard to achieve in practice, however, because w is not independent of such things as overall reward rate and the animal's motivational condition. When the animal can directly affect reward rate (as on the probabilistic schedules studied here) the p = w condition may therefore be an unstable one. We should also note that our analysis can predict the mode(s) of the steady-state choice distribution, but not its variance, which will depend upon the unknown functions a(s) and b(s). For the equal-probability two-armed-bandit situation, the prediction is therefore for a mode either at s = 0.5 or at s = 1 or 0, depending upon whether w is greater or less than p.

It is also possible that w can vary as a function of the animal's choice history, so that between-conditions comparisons of individual behavior, and group-average data, are ruled out. We need procedures that allow us to predict patterns of choice within a fixed situation, in which w can be treated as a constant.

Eq. 3 suggests that a fruitful generalization is possible. If, instead of being constant, the reward probabilities p and q are functions of the choice proportion, s (we define how s is measured in a moment), then Eq. 3 becomes

delta(s) = s[p(s) + q(s) - 2w] + w - q(s).   (4)

Once p(s) and q(s) are defined, we can use Eq. 4 as before to find the stable and unstable equilibria. By choosing the reward functions appropriately, we can construct schedules in which each response affects the reward probability of both alternatives, and for which the competing theories are forced to make very different predictions. These interdependent schedules formally resemble the frequency-dependent contingencies of the multiperson game known as the N-person prisoner's dilemma (Schelling, 1978), and of frequency-dependent selection in population biology. They turn out to provide powerful conditions for parametric tests of ratio invariance against the other current theories of free-operant choice.

In these experiments the controlling computer recomputed the choice proportion, s, after every response, over the preceding M responses. The main effect of the memory parameter M seems to be on the variance of the choice distributions, but not on the location of the predicted choice modes (see General Discussion). For Experiments 1 and 2, M was 32.

In Experiment 1 the reward probability was always the same for both responses and was controlled by a single function of s, p(s) = q(s)--hence a symmetrical interdependent (SI) schedule. The top panel of Figure 3 shows the reward function we used. It consists of two linear segments (a bilinear, inverted-V function of s), with a maximum payoff probability at indifference, p(.5) = 0.085, and two payoff minima at exclusive choice, p(0) = 0.005 and p(1) = 0.045. In other words, choosing the two keys about equally often yields the highest probability of reward for both choices, whereas exclusive responding on the left key (s = 0) or the right key (s = 1) yields a low reward probability on both.

FIGURE 3. Top panel: Reward function for the symmetrical interdependent schedule. The figure plots the probability of reward on both alternatives, p(s), as a function of the proportion of choices of the right-hand alternative, s (s was computed over the preceding 32 choices in this experiment). Bottom panel: Prediction of ratio-invariance for the symmetrical interdependent schedule for three w-values. The figure plots delta(s) as a function of s, calculated from Eq. 4, by substituting the reward function shown in the top panel for p(s) and q(s) (Eq. B3). The three curves were generated with w values of .085 (curve I), .058 (curve II) and .032 (curve III). Stable equilibria for curve III are at points d and e, for curve II at points b and c, and for curve I at point a (indifference).
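Equilibria of Eq. 4 for an arbitrary reward function can be located numerically by scanning delta(s) for sign changes. The sketch below uses a bilinear p(s) with the approximate values read from the text; the exact parameters of the published function may differ.

    # Sketch: equilibrium scan for Eq. 4 on the SI schedule, where q(s) = p(s).

    def p_bilinear(s, p0=0.005, pmax=0.085, p1=0.045, peak=0.5):
        if s <= peak:
            return p0 + (pmax - p0) * s / peak
        return pmax + (p1 - pmax) * (s - peak) / (1 - peak)

    def delta(s, w, p=p_bilinear):
        return s * (2 * p(s) - 2 * w) + w - p(s)      # Eq. 4 with q(s) = p(s)

    def equilibria(w, n=10_000):
        """Sign changes of delta locate equilibria; slope sign gives stability."""
        found = []
        for i in range(n):
            s0, s1 = i / n, (i + 1) / n
            d0, d1 = delta(s0, w), delta(s1, w)
            if d0 == 0 or d0 * d1 < 0:
                found.append((round(s0, 3), "stable" if d1 < d0 else "unstable"))
        return found

    for w in (0.085, 0.058, 0.032):   # the three w-values of Figure 3
        print(w, equilibria(w))
    # w = 0.085: only s = 0.5, stable (indifference).
    # w = 0.058: s = 0.5 unstable; stable modes where p(s) = w on each limb.
    # w = 0.032: stable mode on the left limb; delta > 0 over the whole right
    #            limb, so the process runs to exclusive choice at s = 1.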

Because the payoff probability is always the same for both keys, the SI procedure forces matching. Where x and y are the response rates on the two keys and R(x) and R(y) the obtained reward rates, both choices are rewarded with the same probability at every moment, so that R(x)/x = R(y)/y, i.e., x/y = R(x)/R(y)--matching--no matter how the animal distributes its choices. Note also that the payoff probability is maximal when the animal chooses the two keys equally often, i.e., when 16 of the 32 choices in the computing block are to the left key and 16 to the right. Because the procedure forces matching at all choice proportions, matching and melioration make no prediction, while molar (unconstrained) maximizing, momentary maximizing, and ratio-invariance make different predictions.

Momentary maximizing, like matching, makes no prediction, because the momentary payoff probability is always equal for the two choices. Simple molar maximization, however, predicts indifference, because overall reward probability is at a maximum at s = 0.5; indifference is also the intuitively appropriate prediction, since that is where reward probability for both choices is highest. (Molar maximizing with additional constraints--on memory, for example--might well predict other outcomes, but the most plausible choices of constraint yield the same indifference solution as unconstrained maximization.)

Predictions for ratio-invariance can be derived by substituting the bilinear reward function of Figure 3 for p(s) and q(s) in Eq. 4 (Eq. B3). As in Figure 2, we look for the zeroes of the resulting delta(s) function and the slope of the function at those points: a zero crossed with negative slope is a stable equilibrium, one crossed with positive slope is unstable. A plot of the delta(s) functions for three w values is in the bottom panel of Figure 3.

Curve I (w = .085) shows the prediction when w is at least as great as p(s) for all s: delta(s) is zero only at indifference (s = 0.5), and perturbations of s in either direction are opposed, so indifference (point a) is the only equilibrium and it is stable. Curve II (w = .058) shows the prediction when w is within the range of permitted p(s) values on both sides of the maximum. The delta(s) function is zero at indifference, but the slope there is positive (because p(.5) > w), so indifference is unstable; delta(s) is also zero at the two points where p(s) = w, points b (s = 0.36) and c (s = 0.78), and the negative slope at those points shows that they are stable equilibria. (It can also be shown analytically that these two s-values are attractors; see Appendix B for formal derivations.) Since points b and c lie on opposite sides of the maximum at the same p(s) value, the predicted choice distribution is bimodal, with both modes associated with the same probability of reward. Finally, curve III (w = .032) shows the prediction when w is within the range of p(s) values on one side of the maximum but below all the p(s) values on the other side: indifference is again unstable, and there is a stable equilibrium, point d, where p(s) = w on the left limb; but there is no p(s) value equal to w on the right-hand side, delta(s) is everywhere positive there, and a choice mode is therefore predicted at exclusive choice, s = 1 (point e).

In summary, there are only three choice patterns on the SI procedure that are compatible with ratio invariance: indifference; bimodal partial preference, with the two modes at the same p(s) value on opposite sides of the maximum; or bimodal choice with one mode at a partial-preference value on one side of the maximum and the other mode at exclusive choice (s = 1 for the reward function used here, because the minimal p(s) on the right-hand side is the higher of the two minima).

METHOD

Subjects

Four adult White Carneau pigeons served. Each bird was maintained at 85% of its free-feeding weight, and all had previous experience with various schedules of reinforcement.

Apparatus

All experiments were performed in a ventilated, sound-attenuating, aluminum and Plexiglas operant-conditioning chamber measuring 30 cm long x 28 cm wide x 33 cm high. Three translucent keys, 2 cm in diameter, were fixed in a triangular arrangement, apex upward, on one wall. The keys were 5.5 cm from each other, the two lower keys were 20.5 cm from the floor, and the top key was located 25.5 cm from the floor. Each key was transilluminated by a 28-V white light. Directly below and on the same wall as the three keys was a 4 x 5-cm aperture for a food magazine. The food aperture was 8 cm below the two lower keys. Reward was 2-sec access to grain. During the reward operation all keylights were turned off and a 28-V light above the hopper was turned on. The experiment was controlled by a SYM microcomputer that recorded individual experimental events and their times to within 1/50th of a second. Data from each experimental session were transferred to another computer for analysis.

Procedure

We used a choice procedure intended to equate the effortfulness of "staying" (two successive responses on the same key) and "switching" (two successive responses on different keys). Each choice involved two key-pecks. The first peck was on the lighted top key: a single peck on the top key turned it off and turned on the two bottom keys. A peck to either of the two bottom keys turned both off, delivered a reward if one was scheduled, and turned on the top key again. This two-response procedure was used in all experiments, but we varied between experiments the rule that determined the probability of reward for pecks on the two bottom keys. In Experiment 1, the probability of reward, p(s), was always the same for both choices; p(s) was determined by the proportion of choices, s, over the preceding 32 choices according to the bilinear reward function shown in Figure 3. The controlling computer calculated s after every choice and determined the probability of reward for that choice accordingly. The first 50 responses at the beginning of each session were not rewarded; rewards were delivered according to the symmetrical interdependent schedule thereafter. Sessions terminated after the 70th reward. The experiment ran for 10 daily sessions.
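A minimal sketch of this scheduling rule (not the original SYM program; the pigeon is replaced by a random chooser, and the first-50-unrewarded and 70th-reward rules are omitted) may clarify how s controls the reward probability:

    import random
    from collections import deque

    M = 32

    def p_bilinear(s):                       # values approximate, as in Figure 3
        return 0.005 + 0.16 * s if s <= 0.5 else 0.085 - 0.08 * (s - 0.5)

    def session(n_choices=2000, seed=0):
        rng = random.Random(seed)
        window = deque((rng.random() < 0.5 for _ in range(M)), maxlen=M)
        rewards = 0
        for _ in range(n_choices):
            s = sum(window) / M                  # proportion of Right choices
            choice_right = rng.random() < 0.5    # placeholder choice rule
            if rng.random() < p_bilinear(s):     # same p(s) for either key (SI)
                rewards += 1
            window.append(choice_right)
        return rewards

    print(session())   # roughly 2000 * p(0.5) = 170 for an indifferent chooser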


RESULTS

Figure 4 shows typical data in comprehensive form for a single animal on the symmetrical interdependent schedule. The four rows show details of behavior in each of four successive experimental sessions. The left panel in each row shows the choice trajectory for the session: successive L-choices and R-choices are represented along the abscissa and ordinate, respectively, and rewards are shown as blips on the trajectory. The center panel in each row shows the value of s (i.e., the proportion of R-choices, computed over the preceding 32 choices) choice-by-choice throughout the session, with the bilinear reward function used in this experiment superimposed. The right panel in each row summarizes the session as a choice distribution: the relative frequency of each s-value across the entire session.

Figure 4 suggests that the typical pattern is a stable preference within a session--a single choice mode--with the preferred s-value varying systematically across sessions, so that the distribution for the experiment as a whole is multimodal. This impression is confirmed by Figure 5, which shows the choice distributions for all four animals, made by collapsing each animal's responses across all 10 experimental days.

Although the animals differed in degree, their preference patterns were similar in two respects. First, each animal had a modal choice proportion at a particular s-value, and for none of the animals was the major choice mode at indifference, the point of highest reward probability. Second, the choice distributions were bimodal, with modes on the two sides of the maximum of the reward function; the two modes are clearest for birds 096, 145 and 151, while bird 156 shows a distinctly more peaked pattern. Because the reward function is not perfectly symmetrical, s-values at equal distances from the 0.5 point are not associated with the same probability of reward, p(s); note that the modal s-values tend to fall at points on the two limbs associated with the same p(s) value. Table 1 gives the s-values at each choice mode and the corresponding payoff probabilities.

FIGURE 4. Four successive experimental sessions for pigeon 096 on the symmetrical interdependent schedule. Within each row, the left panel shows the choice trajectory throughout the session; the center panel shows the value of s (averaged over the preceding 32 choices) choice-by-choice throughout the session; and the right-hand panel shows the distribution of s-values (choice distribution). The bilinear reward function is also shown.


FIGURE 5. Choice distributions (percentage of the total number of choices for which an animal achieved a particular proportion of right-hand choices, s, measured over the preceding 32 choices) on the symmetrical interdependent schedule for all four subjects. Data for each animal were combined from all 10 experimental days. Solid line is the bilinear reward function; broken line is at the estimated w-value for this choice distribution (see text).

TABLE 1. Choice modes and associated reward probabilities on the symmetrical interdependent schedule. Columns 2 and 3 give the s-values at each mode in Figure 5; columns 4 and 5 give the corresponding payoff probabilities, p(s).

DISCUSSION

The symmetrical interdependent schedule forces matching at all choice proportions, so matching (melioration) makes no predictions here. Momentary maximizing is also inapplicable, because the reward probability, p(s), is always the same for both alternatives. Bush-Mosteller predicts indifference, because reward probabilities were the same on both keys (see Appendix A). Molar maximizing also implies indifference in this experiment, because indifference is the choice proportion at which both keys pay off with the highest probability. Despite its intuitive appeal, given our reward function, indifference is not what we found: no animal showed a stable preference at s = 0.5.

Each animal's choice distribution nevertheless conforms to one of the three patterns permitted by ratio invariance; comparison of Figure 5 with Figure 3 shows how. Birds 096, 151 and 145 show choice distributions with two preferred modes, of the type predicted with w2 in Figure 3, and bird 156 shows a pattern closer to the type predicted with w3.

The partial-preference mode gives us an estimate of w for each animal: under ratio invariance a partial-preference mode sits where p(s) = w, so the reward probability associated with the "preferred" mode (the mode with the largest number of responses) is an estimate of the animal's effect ratio, w. The dashed horizontal line in each panel of Figure 5 is drawn at the estimated w-value for that choice distribution.

It is worth noting that for the three birds with two preferred modes (096, 145 and 151), the two modal p(s) values in Table 1 are close, consistent with the prediction that the two modes should pay off with the same probability. There is a small, systematic difference: the preferred p(s) on the right-hand side of the maximum is slightly greater than the p(s) for the mode on the other side. Given normal variation of s around the modal value, the steeper slope of the left limb of the bilinear function means that the reward probability actually obtained near the left-limb mode deviates more from the modal value than near the right-limb mode; this asymmetry may account for the small, systematic difference between the two modal p(s) values.

We have presented data for only one reward function, and these data represent only a small selection of the experiments we have done with the SI procedure. For example, we have looked at bilinear reward functions with maxima at points other than s = 0.5; under some of these conditions we have obtained distributions with a mode at the indifference point, which is not the s-value that maximizes reward rate under these conditions. All these data (some of which we expect to report elsewhere) support ratio-invariance, and we have obtained instances of all three of the permitted patterns.

In this experiment we compared the predictions of ratio invariance directly with molar maximization and momentary maximizing (melioration made no point predictions). In Experiment 2, we compare directly the predictions of all four theories: melioration, molar maximization, momentary maximizing, and ratio invariance.
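The w-estimation step described above--take the modal s-value and read off p(s)--can be sketched as follows; the s-values are illustrative stand-ins, not data from Figure 5.

    from collections import Counter

    def p_bilinear(s):
        return 0.005 + 0.16 * s if s <= 0.5 else 0.085 - 0.08 * (s - 0.5)

    def estimate_w(s_values, bins=50):
        """Bin the s-values, find the modal bin, return (mode, p(mode))."""
        counts = Counter(round(s * bins) / bins for s in s_values)
        mode_s, _ = counts.most_common(1)[0]
        return mode_s, p_bilinear(mode_s)

    s_values = [0.34, 0.36, 0.33, 0.35, 0.36, 0.84, 0.83, 0.85, 0.36, 0.35]
    print(estimate_w(s_values))   # mode near 0.36 -> w estimate ~ p(0.36) = 0.063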


EXPERIMENT 2 - Preference Patterns on an Asymmetrical Interdependent Schedule

Experiment 1 ruled out molar maximizing, but matching and momentary maximizing made no predictions there, so the SI schedule cannot differentiate among all the processes of interest. In Experiment 2 we describe a schedule on which melioration (matching), molar maximizing, momentary maximizing, and ratio invariance are all forced to make different predictions.

The asymmetrical interdependent (AI) schedule is similar to the SI schedule in that the probability of reward for a response depends on the proportion of choices, s, over the preceding M responses. But in the AI schedule the reward probability is not the same for the two alternatives: in this experiment, the probability of reward was always twice as high on one alternative as on the other, q(s) = 2p(s) for all values of s. The top panel of Figure 6 shows the reward functions for the positive-slope condition. Both probabilities increase linearly with s, the proportion of choices of the right-hand key: p(s) = Ks on the right key (K = 0.066) and q(s) = 2p(s) on the left. Because the left key always pays off with twice the probability of the right, animals will mostly choose the left key; we therefore term the left key the majority alternative and the right key the minority alternative in this condition. The interdependence works so that responses to the minority alternative increase the probability of reward on both keys.

Because q(s) = 2p(s), the reward probabilities on the two alternatives are never equal, so the animal can match response and reward ratios, or equalize local reward rates, only by responding exclusively on one key (degenerate matching). Melioration therefore predicts exclusive choice of the majority alternative, whose local reward probability is always the higher; momentary maximizing makes the same prediction, since the majority key always offers the higher momentary payoff probability. Note that exclusive majority choice (s close to zero) drives the reward probability on both keys almost to zero (the schedule "defaults" to a small reward probability, 0.001, so that responding is not extinguished entirely). Molar maximization predicts the opposite extreme: the overall probability of reward, sp(s) + (1-s)q(s), increases with s to a maximum at exclusive choice of the minority alternative.2

2. For the functions given, P = sKs + (1-s)2Ks = 2Ks - Ks^2, which has a maximum at s = 1, exclusive choice of the minority.
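The arithmetic in footnote 2 can be checked directly; the grid search below confirms that the overall reward probability on the AI schedule peaks at exclusive minority choice:

    # Sketch of the footnote-2 calculation. With p(s) = Ks on the minority key
    # and q(s) = 2Ks on the majority key, the overall reward probability is
    # P(s) = s*p(s) + (1-s)*q(s) = 2Ks - Ks**2, which rises monotonically on
    # [0, 1] to its maximum at s = 1.

    K = 0.066

    def overall_p(s):
        return s * (K * s) + (1 - s) * (2 * K * s)   # = 2Ks - Ks^2

    best = max((overall_p(i / 100), i / 100) for i in range(101))
    print(best)   # (0.066, 1.0): molar maximizing -> exclusive minority choice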

FIGURE 6. Top panel: Reward functions for the asymmetrical interdependent schedule in the positive-slope condition. The figure shows the probability of reward for the left-hand, q(s), and right-hand, p(s), alternatives as a function of the proportion of choices to the right-hand alternative, s: q(s) = 2p(s) for all values of s. The dashed line is the overall probability of reward (i.e., sp(s) + (1-s)q(s)). Bottom panel: predictions of ratio-invariance for the asymmetrical interdependent schedule (positive-slope condition), for three values of w/K (K = reward-function slope). The figure plots delta(s) as a function of s, calculated from Eq. 4 in the text by substituting the reward functions in Figure 6 for p(s) and q(s) (Eq. B7): curve I, w/K = 0.5; II, w/K = 1.0; III, w/K = 2.0.

Ratio-invariance predicts something different from both of these extreme outcomes (see Appendix B for the full argument). The animal should always develop a stable partial preference favoring the majority alternative, with a mode in the region of low minority s-values (s < 1/3 in the positive-slope condition, where right is the minority). When w is small relative to the reward-function slope K, there is in addition the possibility of a second choice mode at exclusive choice of the minority alternative.

The bottom panel of Figure 6 illustrates these predictions, derived by substituting the reward functions in the top panel (i.e., p(s) and q(s)) in Eq. 4, simplifying, and then plotting delta(s) for the full range of s-values (Eq. B6). The panel shows three delta(s) functions, for w/K = 0.5, 1.0 and 2.0 (curves I, II and III). For values of w/K > 1 (i.e., effect ratio greater than reward-function slope) there is a single stable equilibrium at a partial preference favoring the majority alternative (s < 1/2, when left is the majority); curve III is an example. When w/K < 1, there is also the possibility of a second choice mode at exclusive choice of the minority (s = 1, if right is the minority), and the partial-preference mode shifts further towards the majority (s = 0); curve I is an example. Curve II defines the boundary between these two cases (w/K = 1). In the terms used earlier in discussing Figure 3, the zeroes crossed with negative slope are attractors and the zero crossed with positive slope is unstable.

Thus matching (melioration) and momentary maximizing, molar maximizing, and ratio invariance make three distinct predictions about preference on the asymmetrical interdependent schedule, which is therefore a good way to choose among them.

METHOD

Subjects

Four adult White Carneau pigeons served. Each bird was maintained at 85% of its free-feeding weight, and all had previous experience with various schedules of reinforcement.

Apparatus

The apparatus and controlling instrumentation were the same as in Experiment 1.

Procedure

This procedure was based on the two-response choice procedure described in Experiment 1. As in the SI schedule, the probability of reward for either of the two keys was determined by the proportion of choices over the last 32 cycles (M = 32). But in the AI schedule the probability of reward on the two alternatives is not the same: in this experiment, the probability of reward was always twice as high on one alternative as on the other. For the minority choice p(s) = 0.066s (so that q(s) = 0.132s). As before, the computer recalculated the choice proportion after every response and determined the reward probability for that response accordingly. The first 50 choices in every experimental session were never rewarded. Sessions were terminated after the 48th reward. For the first 10 sessions, the right key was the minority (positive-slope condition); for the last ten sessions, the significance of the two keys was reversed (negative-slope condition).

RESULTS

Figure 7 shows the choice distribution for each bird in the positive-slope condition; Figure 8 shows the same data for the negative-slope condition. For all birds in both conditions there is a partial-preference mode. For two of the birds in the positive-slope condition, 151 and 156, there is also a mode at exclusive choice of the minority alternative (bird 156 shows only a small partial-preference mode near the minority mode).


FIGURE 7. Choice distributions (percentage of the total number of choices for which an animal achieved a particular proportion of right-hand choices, s, measured over the preceding 32 choices) on the asymmetrical interdependent schedule. This figure shows the choice distributions for all animals for the positive-slope condition. Data for each animal were combined from all 10 experimental days in each condition. Solid line, p(s) function; broken line, overall reward probability.


FIGURE 8. Choice distributions, as in Figure 7, for the negative-slope condition. Data for each animal were combined from all 10 experimental days in each condition. Solid line, p(s) function; broken line, overall reward probability.


TABLE 2. Choice modes and associated reward probabilities for both conditions of the asymmetrical interdependent schedule. Columns 2 and 3 give the s-values at the partial-preference choice mode and the corresponding reward probability, p(s); column 4 is the w value calculated from the partial-preference mode, from Eq. B6. The reward-function slope, K, was 0.066.

Negative-Slope Condition
Bird      s       p(s)     Calculated w
096      .469     .070       -.230
145      .688     .042        .059
151      .844     .022        .025
156      .938     .010        .010

In the negative-slope condition (Figure 8) all animals again show a partial-preference mode, but a much larger one. Most importantly, all animals, except 096 in the negative-slope condition, show a predominant partial-preference mode on the majority side. Table 2 gives the choice proportion at the partial-preference mode and the corresponding probability of reward, together with the w values for each animal, calculated using Equation B6.
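Inverting the equilibrium condition gives the back-calculation used for Table 2: setting delta(s*) = 0 in the AI quadratic and solving for w yields w = Ks*(2 - 3s*)/(1 - 2s*), with s* mirrored (s* -> 1 - s*) for the negative-slope condition. A sketch of the calculation, offered as our reading of Eq. B6:

    K = 0.066

    def w_from_mode(s_mode, negative_slope=False):
        """w implied by an observed partial-preference mode on the AI schedule."""
        u = 1 - s_mode if negative_slope else s_mode
        return K * u * (2 - 3 * u) / (1 - 2 * u)

    # Bird 145's negative-slope mode from Table 2:
    print(round(w_from_mode(0.688, negative_slope=True), 3))   # ~0.058, near .059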

DISCUSSION

Neither matching, nor momentary maximizing, nor molar maximizing can predict the results from this experiment. Matching and momentary maximizing predict exclusive choice of the majority alternative, which failed to occur. Molar maximizing predicts exclusive choice of the minority alternative, which also failed to occur. Birds 151 and 156 do show a choice mode at the minority alternative, but this preference is in conjunction with a stable partial-preference mode near the majority alternative, consistent with that shown by the other animals; simple molar maximizing cannot account for both aspects of these results.

The results are consistent with ratio-invariance in seven out of eight subject-conditions: (a) All animals showed a stable partial preference that favored the majority alternative in the positive-slope condition. (b) All of the animals, except bird 096, showed a stable partial preference that favored the majority

81

alternative

in the negative-slope

and 156, the

showed

a second

positive-slope Bird

096’s

performance

condition

procedure

bird

with

for

its

similar

consideration

condition

b(s)),

reward

to

the

birds

showed

consistent

the

recover

first

condition

156--its

previous

second

animal

predicting

the

time

related

it

will

using

have

been

this

Since

in Table

the too

AI

short

a

experiment; the negative-slope

we lack

nonreward

take

for

156 in the

mode in

condition.

in

be accounted

this

of

151

invariance.

studies

may simply

(and the

that

ratio-invariance,

that

w-values

less

(see

exclusive

than K (see

Appendix

two reward

for

the

choice

of

detailed function,

reward-following

is not

each

to

that

were

animal

whereas

negative-slope

possible

21,

functions

expectations,

096 in the

Table

the minority. this

is

also

B).

were the same for

up to

bird it

mode at

by giving

w values

of

a second

invariance

145 and 156 lived

performance

this

Later

10 sessions

to bird the

ratio

for

birds

alternative

cannot

w-value

a(s)

showed

ratio

We had expected could

with

condition

function

calculated

with

Pigeons

consistent

animals,

stabilize.

Some of all

the

mode shown by bird

in the

may apply

on the

minority

calculated

that

of

the

of

a weak exception.

stabilize

we have no way of

process

Since

also

may be a vestige

information

negative

096 suggest to

also

partial-preference is

behavior

is

Two of

choice

the negative-slope

the

secondary

(c)

exclusive

which

in

(note

small

negative-slope

time

mode at

condition,

by ratio-invariance 21, and the

condition.

compare

in both

bird

is

images

not.

Because

inconsistent

in the

we

conditions.

151 did

condition w values

mirror

two

with

conditions

for

animal. The asymmetrical

The asymmetrical interdependent procedure shows the dangers of pure hill-climbing rules such as melioration and momentary maximizing, which always drive choice towards the alternative with the better payoff probability. Both rules are purely relative: no matter how good an alternative is in absolute terms, the animal will shift if the other alternative is better, so that under many conditions a rule of this kind takes the animal into a trap--here, self-generated extinction. Ratio invariance takes account of absolute as well as relative payoff: in selecting between two alternatives, the animal in effect compares each payoff probability with the effect ratio, w, and will shift away from an alternative only if its payoff probability is less than w. Ratio invariance may be summarized as "if it's good enough, go for it; otherwise, sample."
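For concreteness, the rule can be written as a one-step update on the choice probability s. The sketch below is a minimal illustration assuming constant effect sizes a and b (so that the effect ratio is w = b/(a+b)); the function name and the clipping of s to [0, 1] are our own conventions, not part of the stability analysis in Appendix A.

```python
import random

def reward_following_step(s, a, b, p, q):
    """One trial of ratio-invariant reward-following on a two-armed
    bandit.  s is the probability of a right response; a and b are the
    effects of reward and nonreward (constant here, so w = b/(a + b));
    p and q are the payoff probabilities for right and left."""
    if random.random() < s:                      # right response
        ds = a if random.random() < p else -b    # reward: +a; nonreward: -b
    else:                                        # left response
        ds = -a if random.random() < q else b    # reward: -a; nonreward: +b
    return min(1.0, max(0.0, s + ds))            # confine s to [0, 1]
```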

GENERAL DISCUSSION

If one choice has a higher probability of being rewarded--or is rewarded at a higher rate--than another, an animal would do well to opt for it. Payoff probability and reward rate should therefore be important determiners of choice, and numerous studies (see Mackintosh, 1974; Staddon, 1983, for reviews) show that animals do indeed choose the alternative with the higher payoff probability or reward rate under many conditions. It has become a truism that animals are sensitive to reward probability and rate, and it has seemed reasonable to assume that this sensitivity is direct--to identify some hard-to-measure quantity such as "response strength" with reward probability or rate. Our results suggest something simpler: animals may identify the higher-payoff option using a simple, unsophisticated reward-following strategy. No doubt human beings often act intelligently; but in animals the apparent sensitivity to reward probability and reward rate may be neither direct nor intelligent--just the product of a mechanism that accomplishes the choice of the highest-probability or highest-rate alternative by unintelligent means. Such a strategy works well under many natural conditions, but it is not infallible: it can be made to fail under the complex, artificial conditions of choice experiments, at least under the restricted conditions we have studied. The nature of these failures can tell us something interesting about the nature of the strategy.

It has often been noted that reward (reinforcement) reduces behavioral variation (e.g., Schwartz, 1980; Staddon & Simmelhag, 1971); the reduction of variation has even been made the defining property of reinforcement. It is noteworthy, therefore, that ratio invariance has just this property. In Experiments 1 and 2, the higher the reward probability (relative to w), the closer the choice modes were to exclusivity; our pilot experiments also showed that pigeons fixate on one key when reward probability is high. Ratio invariance thus provides an explanation for the covariation of stereotypy and reward probability: behavior is more stereotyped when reward probability is high than when it is low.

Ratio-invariant reward-following provides both an old and a new account of recurrent choice. It is old because it is a stochastic model, like the standard linear-operator "learning" models (e.g., Rescorla & Wagner, 1972; Vaughan, 1982). It is new in that it assumes source independence and constancy of the effect ratio--the change of choice probability depends on the sign, but not on the source or the absolute amount, of the effects of reward and nonreward--neither of which is a property of the standard stochastic models. Unlike those models, it also takes as its dependent variable directly measured choice proportion, not some hard-to-measure intervening variable such as "stimulus value"; this is a pleasing gain in concreteness, as well as an aid to the experimental approach. Moreover, our results seem either inexplicable by, or absolutely incompatible with, the simple theories of free-operant choice such as Bush-Mosteller, molar and momentary maximizing, and melioration.

An intriguing property of ratio invariance is the effect ratio w (the effect of nonreward relative to the sum of the effects of reward and nonreward), which seems to be approximately constant for a given animal in a given situation.

We might expect that conditions (such as depression) or treatments (such as food deprivation or drugs) that affect sensitivity to reward will have measurable effects on w. If w increases with overall reward rate, as our earlier arguments suggest it might, then ratio invariance has precisely the properties needed for an optimal patch-foraging mechanism consistent with the much-studied marginal-value theorem (Charnov, 1976). The argument is as follows. As a given food patch depletes, the reward probability must drop, eventually reaching a p value less than the value of w. At this point, the animal should begin to show a partial preference (as in Experiment 1), i.e., to leave the patch and sample another. If w is directly related to overall food rate, as seems likely, then this "giving-up time" will be shorter (i.e., will occur at higher probability values) in richer environments, exactly as the marginal-value theorem requires.

Ratio invariance has properties similar to what Simon (1956) has termed satisficing; that is, it yields behavior that is often, but not always, optimal--but always results in some gain. An organism that satisfices will settle for an acceptable alternative, even if a better one is available, and will not settle for the best alternative if it is not good enough. Both of these characteristics are predicted by ratio invariance in the two-armed-bandit situation, since fixation on the minority alternative is possible, providing the minority payoff probability is greater than w, and partial preference is possible if neither p nor q exceeds w. Neither melioration nor momentary maximizing is a satisficing rule. In Experiment 2, for example, either rule, strictly applied, leads to a reward rate close to zero--because both involve comparison only between the proffered alternatives, not with an internal standard. The essence of satisficing is the acceptance or rejection of an alternative with respect to a threshold value (the parameter w, in ratio invariance). This property is guaranteed by our version of reward-following because it includes nonreward avoidance: not only does the probability of a behavior increase following reward, it also decreases following nonreward--which protects the animal against the self-generated extinction entailed by both melioration and momentary maximizing under the asymmetrical interdependent procedure in Experiment 2.
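The equilibrium conditions behind these satisficing properties (derived in Appendix A) can be collected into a small decision table. The helper below is hypothetical--a sketch of the three cases, not part of the original analysis--and it ignores the boundary cases p = w and q = w.

```python
def predicted_outcome(p, q, w):
    """Equilibrium predicted by ratio invariance for a two-armed bandit
    with payoff probabilities p, q and effect ratio w (Appendix A)."""
    if p < w and q < w:
        # stable partial preference at s = (q - w)/(p + q - 2w)
        return "partial preference", (q - w) / (p + q - 2 * w)
    if p > w and q > w:
        # interior zero is unstable: exclusive choice whose side
        # depends on the initial value of s (minority capture possible)
        return "exclusive choice, set by initial preference", None
    # p > w > q (or q > w > p): exclusive choice of the majority
    return "exclusive choice of the majority", None
```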

Extensions to other choice procedures

Two-armed bandit, unequal probabilities. Exclusive choice of the majority is the usual result on these procedures (e.g., Herrnstein & Loveland, 1975), and it is the prediction of ratio invariance when p < w < q (Appendix A). Ratio invariance also makes the counter-intuitive prediction that animals can show exclusive choice of the minority alternative when the subject begins with


a sufficiently strong initial preference for it and p and q are both greater than w (see Appendix A for proof). We have found (Horner & Staddon, unpublished data) that animals will often favor the minority alternative for some time if they begin with an initial preference for it--the subject may be "captured" by the minority side--though the preference is usually transient: after sufficient experience of nonreward on the minority, responding shifts to the majority. The transience of this apparent capture depends on the nature of the effect functions, a(s) and b(s). Table 3 illustrates a simulation of 1000 "stat" subjects using the simplest possible effect functions (a(s) = constant, b(s) = constant): an initial concentration of choice proportions on the minority side shifts towards the majority after many choices. Similar simulations using the effect functions a(s) = As(1-s) and b(s) = Bs(1-s), where 0 < B < A < 1, produce a much more persistent mode at exclusive minority choice.3

Since we do not know the details of the effect functions, we can predict that exclusive minority choice is possible, but we cannot predict where (at which s-values) the stable modes will occur for a given animal, or how persistent they will be; nor can we be absolutely sure that an observed exclusive preference is truly permanent (because the effects of reward and nonreward may occasionally change). Nevertheless, this incompleteness is not a serious limitation of ratio invariance as a theory. By separating the effect ratio, w, from the absolute sizes of the effects, a(s) and b(s), the analysis places sufficient constraints on the choice proportions at which modes can occur that the prediction could be invalidated by data; it also emphasizes the inherent randomness of choice in the two-armed-bandit situation, and so helps define what we mean by stable preference--a persistent mode, not an unchanging sequence of choices. Moreover, since the stability properties depend only upon the ratio of the two effect functions, not on their absolute sizes, these results pave the way for a truly comprehensive dynamic account.

3. For the first simulation, a(s) = 0.098 and b(s) = 0.002, with s-values < 0 or > 1 set equal to 0 or 1, respectively, so that s is limited to the interval 0-1. For the second simulation, A = 4a and B = 4b. We have not pursued these simulations to the point where we can be sure whether the exclusive minority choice under the second pair of effect functions is truly permanent, or merely relatively long lasting.

TABLE 3

Ratio-invariant reward-following simulation for choice between two unequal probabilistic schedules. The table gives the percentage (out of 1000 "stat" subjects) showing different values of s (columns) after different numbers of choices, from 1 to 10,000 (rows). All subjects began with s in the interval 0.65-0.73. The reward probabilities were p = 0.05 and q = 0.10; w = 0.02, and a(s) and b(s) were positive constants over the range 0 < s < 1, zero otherwise. Zeroes are omitted to increase readability.

(Table body: columns are mean-choice-proportion bins of width .09 spanning 0 to 1; rows give the distribution after 1, 5, 20, 50, 100, 200, 500, 1000, 2000, 5000, and 10,000 choices.)
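A simulation in the style of Table 3 can be re-created from the parameters given in the caption and in footnote 3 (a = 0.098, b = 0.002, so w = 0.02). The sketch below is our reconstruction of the "stat"-subject procedure, not the original program.

```python
import random

def stat_subject(n_choices, p=0.05, q=0.10, a=0.098, b=0.002):
    """Run one 'stat' subject for n_choices trials and return its
    final choice proportion s (parameters as in Table 3)."""
    s = random.uniform(0.65, 0.73)        # initial band from the caption
    for _ in range(n_choices):
        if random.random() < s:           # right response, payoff p
            s += a if random.random() < p else -b
        else:                             # left response, payoff q
            s += -a if random.random() < q else b
        s = min(1.0, max(0.0, s))         # clip to 0-1, as in footnote 3
    return s

# Distribution over 1000 stat subjects after 10,000 choices:
# finals = [stat_subject(10_000) for _ in range(1000)]
```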

Interdependent variable-interval schedules. Our analysis may account for regularities in the data of Herrnstein and Vaughan (1980) on an interdependent procedure based on a variable-interval (VI) schedule. In this experiment, local probabilities of reward were held constant and equal for both choices, but the rate of reward depended upon current preference (averaged over a block of time). The reward function had a maximum at s = 0.25; nevertheless, all four pigeons showed a preference close to indifference (s = 0.5: Herrnstein & Vaughan, Fig. 5.9). Ratio-invariance may account for these data as follows. On variable-interval schedules, response rate is typically much higher than reward rate; for example, pigeons may peck at 80/min for rewards delivered at a rate of just 1/min (cf. Catania & Reynolds, 1968), so that reward probability, p(s)--rewards per response--is typically quite small (here 1/80, or about 0.0125; generally on the order of 0.01-0.02), smaller than the typical w values in our experiments. When the reward function in an equal-probability interdependent schedule does not permit a p(s) value greater than or equal to w, ratio invariance predicts indifference between the alternatives (see Appendix B), which is what Herrnstein and Vaughan found.

Position habits. It is well known that animals in discrimination tasks with spatial choice responses (e.g., the left and right arms of a T-maze) will often revert to fixed choice of one alternative when they are unable to make the necessary discrimination. The reward probabilities in these experiments are typically high: it may be possible to obtain a reward on every trial if responding is correct, for example, which implies p = 0.5 even if the animal cannot discriminate at all. Hence, position habits may simply reflect ratio invariance, as we showed in Experiment 1 for the high-reward-probability

condition. An implication of this analysis is that position habits shown repeatedly by the same individual in a series of difficult tasks, such as successive discrimination reversals, should be uncorrelated from one task to the next (i.e., the animal should settle randomly on one side or the other). Apparently this kind of independence is often found (N. J. Mackintosh, personal communication), although we have not been able to find published data bearing on this prediction.

Concurrent variable-interval schedules. Ratio-invariance can be applied to choice between variable-interval schedules, a much-studied procedure, if we make some assumption about the way in which reward rate affects response rate; time, excluded heretofore, must be reintroduced into the analysis. Nevertheless, interesting predictions can be derived using only the well-established fact that reward rate has little effect on response rate over a wide range of VI values (cf. Catania & Reynolds, 1968). On VI schedules, reward probability is inversely related to response rate: the more the pigeon pecks, the less likely each peck will be rewarded. The exact form of this inverse relation depends upon the molar feedback function appropriate for VI (Baum, 1973). If we let x and y be the response rates to the two alternatives, assume that overall response rate is approximately constant, x + y = K, and use the feedback function appropriate for random responding on VI schedules (Staddon & Motheral, 1978), then the reward probabilities for the two choices are p(s) = A/(A + x) and p(1-s) = B/(B + y), where A and B are the scheduled (maximum) reward rates for the two VI schedules and s (= x/(x+y)) is the proportion of right-choices, as before. Substituting these values for p(s) and q(s) in Equation 4 allows us to derive the delta(s) function for the two-choice VI schedule (see Appendix C).
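If Equation 4 has the same interdependent-schedule form as Eq. B1 (as the derivation in Appendix A implies), then with x = Ks and y = K(1-s) the substitution yields, explicitly,

delta(s) = s[A/(A + Ks) + B/(B + K(1-s)) - 2w] + w - B/(B + K(1-s)),

whose zeros, and the slope of delta(s) at those zeros, give the equilibria discussed below.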

The delta(s) function makes two main predictions about choice between VI schedules. First, as the schedules become richer, typical p values are high and exclusive choice of the majority alternative is implied; conversely, between lean schedules, where p(s) never exceeds w, the model predicts indifference. This prediction is consistent with the data of Fantino et al. (1972), who showed that animals tended toward exclusive choice of the majority alternative when both schedules were rich (choice between VI 6s and VI 12s) and were closer to indifference between lean schedules (VI 600s vs. VI 1200s) than between schedules of moderate value (VI 60s vs. VI 120s). (Note that these changes in preference occurred even though the scheduled reward ratios were constant; cf. Staddon, Hinson & Kram, 1981.)

A related prediction involves matching. Given that overall response rate is approximately constant over a wide range of scheduled values, ratio-invariance implies that animals on typical concurrent VI schedules will approximately match response ratios to reward ratios, but with a bias of constant direction--consistent overmatching or undermatching with respect to scheduled reward rates--associated with different values of w. As the value of w changes from 0.5 to 0, the predicted performance moves from consistent undermatching to consistent overmatching with respect to scheduled values. Thus, manipulations that affect the value of w (such as food deprivation, or anything else that changes the relative effects of reward and nonreward) should have corresponding effects on the degree of overmatching or undermatching: satiated animals should overmatch, very hungry animals undermatch. This second prediction has not been systematically tested.

Obviously, our predictions assume that the changes in reward probability associated with different scheduled VI values (i.e., the differential effect of reward probability) will have a larger effect than any opposing change in w. We suspect, for reasons already discussed, that w will increase with richer VI values, so that the effect of an increase in absolute reward probability will be reduced; we assume that at high reward rates the increase in reward probability, p, is such as to produce a net increase in the ratio p/w, even though w also increases. Both of these predictions, and a definitive extension of our analysis to concurrent VI schedules, must await measurement of the effects of reward rate and deprivation on the value of w.

CONCLUSION

We have shown that a source-independent, effect-ratio-invariant, reward-following process is sufficient to account for a wide range of individual-subject data on simple, unconstrained probabilistic choice procedures such as the two-armed bandit, as well as published data on several standard procedures, such as concurrent and interdependent variable-interval schedules. Our results are either not explained by, or actually incompatible with, standard treatments of choice in operant situations, such as Bush-Mosteller, molar maximizing, melioration (matching), and momentary maximizing. Ratio invariance is a type of satisficing: it avoids the traps to which pure hill-climbing rules are liable, has properties consistent with optimal-foraging theory, and may underlie puzzling empirical effects such as position habits. Ratio invariance may not be the, or the only, process operating in situations where cues such as time are available, however: in more complex situations, such as choice between interval schedules, processes such as momentary maximizing may dominate. Future work will need to settle just how many processes are involved in simple recurrent choice, and whether they cooperate or act in a mutually exclusive fashion under various regimens.


ACKNOWLEDGEMENTS

This paper is based on a dissertation submitted by JMH in partial fulfillment of the requirements for the Ph.D. degree at Duke University. Parts of the work were presented at the November, 1985, meeting of the Psychonomic Society in Boston, MA. We thank Juan Delius and Mark Rausher for comments on the manuscript and Donald Laming for suggesting the Bush-Mosteller stability analysis. The research was supported by grants to Duke University from the National Science Foundation and the Pew Memorial Trust. JERS wishes to thank the Alexander von Humboldt-Stiftung for support, and Juan Delius and the Ruhr-Universität, Bochum, FRG, for hospitality during the preparation of the paper.

REFERENCES

Allison, J., 1983. Behavioral Economics. Praeger, New York.
Baum, W. M., 1973. The correlation-based law of effect. J. Exp. Anal. Behav., 20: 137-153.
Bush, R. R. and Mosteller, F., 1955. Stochastic Models for Learning. Wiley, New York.
Catania, A. C. and Reynolds, G. S., 1968. A quantitative analysis of the behavior maintained by interval schedules of reinforcement. J. Exp. Anal. Behav., 11: 327-383.
Charnov, E. L., 1976. Optimal foraging: The marginal value theorem. Theor. Pop. Biol., 9: 129-136.
Daly, H. B. and Daly, J. T., 1982. A mathematical model of reward and aversive nonreward: Its application to over 30 appetitive learning situations. J. Exp. Psych.: Gen., 111: 441-480.
Fantino, E., Squires, N., Delbrück, N. and Peterson, C., 1972. Choice behavior and the accessibility of the reinforcer. J. Exp. Anal. Behav., 18: 35-43.
Herrnstein, R. J. and Loveland, D. H., 1975. Maximizing and matching on concurrent ratio schedules. J. Exp. Anal. Behav., 24: 107-116.
Herrnstein, R. J. and Vaughan, W., 1980. Melioration and behavioral allocation. In: J. E. R. Staddon (Editor), Limits to Action: The Allocation of Individual Behavior. Academic Press, New York.
Hinson, J. M. and Staddon, J. E. R., 1983a. Hill-climbing by pigeons. J. Exp. Anal. Behav., 39: 25-47.
Hinson, J. M. and Staddon, J. E. R., 1983b. Matching, maximizing and hill-climbing. J. Exp. Anal. Behav., 40: 321-331.
Horner, J. M., 1986. Reward-following as a determinant of choice. Ph.D. Dissertation, Duke University, Durham, NC.
Mackintosh, N. J., 1974. The Psychology of Animal Learning. Academic Press, New York.
Mosteller, F., 1955. Stochastic learning models. Proceedings of the Third Berkeley Symposium on Mathematical Statistics, 5: 151-167.
Myerson, J. and Hale, S., 1984. Transition-state behavior on conc VR VR: A comparison of melioration and the kinetic model. Paper presented at the meeting of the Association for Behavior Analysis, Nashville, TN.
Myerson, J. and Miezin, F. M., 1980. The kinetics of choice: An operant systems analysis. Psychol. Rev., 87: 160-174.
Rachlin, H., 1978. A molar theory of reinforcement schedules. J. Exp. Anal. Behav., 30: 345-360.
Rachlin, H. and Burkhard, B., 1978. The temporal triangle: Response substitution in instrumental conditioning. Psychol. Rev., 85: 22-48.
Rachlin, H., Green, L., Kagel, J. H. and Battalio, R. C., 1976. Economic demand theory and psychological studies of choice. In: G. Bower (Editor), The Psychology of Learning and Motivation: Vol. 10. Academic Press, New York.
Rachlin, H. C., Kagel, J. H. and Battalio, R. C., 1980. Substitutability in time allocation. Psychol. Rev., 87: 355-374.
Rescorla, R. A. and Wagner, A. R., 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: A. H. Black and W. F. Prokasy (Editors), Classical Conditioning II. Appleton-Century-Crofts, New York.

Schelling, T. C., 1978. Micromotives and Macrobehavior. Norton, New York.
Schwartz, B., 1980. Development of complex, stereotyped behavior in the pigeon. J. Exp. Anal. Behav., 33: 153-166.
Shimp, C. P., 1966. Probabilistically reinforced choice behavior in pigeons. J. Exp. Anal. Behav., 9: 443-455.
Shimp, C. P., 1969. Optimal behavior in free-operant experiments. Psychol. Rev., 76: 97-112.
Silberberg, A., Hamilton, B., Ziriax, J. M. and Casey, S., 1978. The structure of choice. J. Exp. Psych.: An. Behav. Proc., 4: 368-398.
Simon, H. A., 1956. Rational choice and the structure of the environment. Psychol. Rev., 63: 129-138.
Staddon, J. E. R., 1965. Some properties of spaced responding in pigeons. J. Exp. Anal. Behav., 8: 19-27.
Staddon, J. E. R., 1977. On Herrnstein's equation and related forms. J. Exp. Anal. Behav., 28: 163-170.
Staddon, J. E. R., 1980. Optimality analyses of operant behavior and their relation to optimal foraging. In: J. E. R. Staddon (Editor), Limits to Action: The Allocation of Individual Behavior. Academic Press, New York.
Staddon, J. E. R., 1983. Adaptive Behavior and Learning. Cambridge University Press, Cambridge.
Staddon, J. E. R. and Hinson, J. M., 1983. Optimization: A result or a mechanism? Science, 221: 976-977.
Staddon, J. E. R., Hinson, J. M. and Kram, R., 1981. Optimal choice. J. Exp. Anal. Behav., 35: 397-412.
Staddon, J. E. R. and Motheral, S., 1978. On matching and maximizing in operant choice experiments. Psychol. Rev., 85: 436-444.
Staddon, J. E. R. and Simmelhag, V., 1971. The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychol. Rev., 78: 3-43.
Sternberg, S., 1963. Stochastic learning theory. In: R. D. Luce, R. R. Bush and E. Galanter (Editors), Handbook of Mathematical Psychology: Vol. 2. Wiley, New York.
Vaughan, W., 1981. Melioration, matching and maximization. J. Exp. Anal. Behav., 36: 141-149.
Vaughan, W., 1982. Choice and the Rescorla-Wagner model. In: M. L. Commons, R. J. Herrnstein and H. Rachlin (Editors), Quantitative Analyses of Behavior: Vol. 2. Matching and Maximizing Accounts. Ballinger, Cambridge, MA.

APPENDIX A

Stability Properties of Reward-Following Models

Ratio invariance. We consider the general two-choice probabilistic situation, with payoff probabilities p and q on right and left, respectively. Let s_n be the probability that the nth response is to the right-hand alternative. Then the four possible sequences of events consequent on the nth response, together with their associated probabilities, are as shown in the contingency table below (event probabilities are given first in each cell, changes in s in parentheses):

Outcome      Response on: Right          Response on: Left
Reward       s_n p  (a(s_n))             (1-s_n)q  (-a(s_n))
Nonreward    s_n(1-p)  (-b(s_n))         (1-s_n)(1-q)  (b(s_n))

(The changes in s_n following reward and nonreward, a(s_n) and b(s_n), can be constrained such that 0 < s_n < 1, but this is not necessary for the stability analysis.) The four possible changes in s_n, i.e., s_{n+1} - s_n, are as follows:

Reward on R:   s_{n+1} - s_n = a(s_n),   with probability s_n p
Nonr.  on R:   s_{n+1} - s_n = -b(s_n),  with probability s_n(1-p)
Reward on L:   s_{n+1} - s_n = -a(s_n),  with probability (1-s_n)q
Nonr.  on L:   s_{n+1} - s_n = b(s_n),   with probability (1-s_n)(1-q)

The expected value of the change in s following response n can be obtained by summing the four contributions:

E{s_{n+1} - s_n | s_n} = a(s_n)s_n p - b(s_n)s_n(1-p) - a(s_n)(1-s_n)q + b(s_n)(1-s_n)(1-q),

which can be rearranged to

E{s_{n+1} - s_n | s_n} = s_n[(p+q)(a(s_n)+b(s_n)) - 2b(s_n)] + b(s_n) - q(a(s_n)+b(s_n)),

which is the same as Eq. 2 in the text, where E{s_{n+1} - s_n | s_n} is termed delta(s) and s_n is written simply as s; equation 3,

delta(s) = s(p + q - 2w) + w - q,

where w = b(s)/[a(s)+b(s)], is there derived from Eq. 2.

Two-armed bandit: stability analysis. For ease of reading in subsequent analyses we replace s_n by s. Equation 3 in the text describes the two-armed-bandit situation, with payoff probabilities p and q. The equilibrium behavior for ratio invariance depends upon the conditions for equilibrium (delta(s) = 0), partial preference (0 < s^ < 1), and stability (negative slope for equation 3). The equilibrium condition yields

s^ = (q - w)/(p + q - 2w),    (A1)

which gives s^ = 0.5 for the case p = q. The stability condition requires that p + q < 2w, which is satisfied for the case p = q if p < w. For a partial preference, given that s^ must always be positive, the numerator of Eq. A1 must be negative (like the denominator) if the partial preference is to be stable, i.e., w > q; so the condition for a stable partial preference is that p, q < w.

If p + q > 2w, so that there is no stable interior equilibrium, we can distinguish two cases: p, q > w, or p > w > q (or q < w < p). In the first case, delta(s) has a zero at the point given by Eq. A1, but it is unstable, so that the equilibrium depends on the initial value of s: if to the left of the zero, s^ = 0; if to the right, s^ = 1. This solution implies the possibility of a stable minority choice--but for reasons discussed in the text this is not always a realizable long-term outcome. In the final case, the prediction is stable exclusive choice of the majority alternative.

Bush-Mosteller: stability analysis. The analysis for the two-armed-bandit situation can be done in the same way as for ratio invariance. The outcome table is as follows, where A and B are constants, 0 < A, B < 1:

Reward on R:   s_{n+1} = s_n + A(1-s_n),  with probability s_n p
Nonr.  on R:   s_{n+1} = s_n - Bs_n,      with probability s_n(1-p)
Reward on L:   s_{n+1} = s_n - As_n,      with probability (1-s_n)q
Nonr.  on L:   s_{n+1} = s_n + B(1-s_n),  with probability (1-s_n)(1-q)

Replacing s_n by s for ease of reading, as before, and setting p = q, yields the expectation function

delta(s) = Aps(1-s) - Bs²(1-p) - Aps(1-s) + B(1-s)²(1-p),

which simplifies at once to

delta(s) = B(1-p)(1-2s).    (A2)

For an equilibrium, delta(s) = 0, which yields s = 1/2 as a stable solution for all p values, since the quantity B(1-p) is always > 0. The solution for the symmetrical interdependent schedule in Experiment 1 is the same, indifference, since we merely replace p in Eq. A2 with p(s).
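The two algebraic steps above--the rearrangement leading to Eq. 3 and the simplification leading to Eq. A2--can be checked mechanically. The snippet below is a verification sketch using the sympy library; it is not part of the original analysis.

```python
import sympy as sp

s, p, q, A, B, a, b = sp.symbols('s p q A B a b', positive=True)

# Ratio invariance: expected change in s (Appendix A) equals
# (a + b) times Eq. 3, with w = b/(a + b).
E = a*s*p - b*s*(1 - p) - a*(1 - s)*q + b*(1 - s)*(1 - q)
w = b / (a + b)
print(sp.simplify(E - (a + b)*(s*(p + q - 2*w) + w - q)))  # -> 0

# Bush-Mosteller with p = q: the expectation collapses to Eq. A2.
E_bm = A*p*s*(1 - s) - B*s**2*(1 - p) - A*p*s*(1 - s) + B*(1 - s)**2*(1 - p)
print(sp.expand(E_bm - B*(1 - p)*(1 - 2*s)))               # -> 0
```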

APPENDIX B

Interdependent Probabilistic Schedules: Ratio-Invariance Predictions

In an interdependent schedule the payoff probability for each choice, p(s) or q(s), depends upon choice proportion, s. The equilibrium properties of such a schedule can be derived from Eq. 3 in the text by substituting p(s) and q(s) for p and q, and proceeding as before:

delta(s) = s(p(s) + q(s) - 2w) + w - q(s).    (B1)

Symmetrical interdependent schedule. For the schedule in Experiment 1, p(s) = q(s) for all values of s, whence Eq. B1 becomes

delta(s) = 2s(p(s) - w) + w - p(s),    (B2)

from which it is apparent that delta(s) is always zero when p(s) = w, so that if p(s) = w is a possible value for p(s) it is a potential equilibrium. To find the actual s value corresponding to p(s) = w, we need to substitute the actual function to be used for p(s). The left limb of the bilinear function used in Experiment 1 is p(s) = Ks + J, where K and J are constants and J is small; to an approximation p(s) = Ks, whence

delta(s) = 2s(Ks - w) + w - Ks,

which simplifies to

delta(s) = 2Ks² - s(2w + K) + w.    (B3)

Setting Eq. B3 = 0 yields the roots s = 1/2 and s = w/K, which are potential equilibria. The derivative of Eq. B3 is

d/ds[delta(s)] = 4Ks - 2w - K.    (B4)

Substituting s = 1/2 or s = w/K yields the condition that the equilibrium at s^ = 1/2 is stable iff w > K/2, whereas the equilibrium at w/K is stable iff w < K/2. Since our linear function p(s) = Ks extends from s = 0 to s = 1/2, it will obviously intersect the line p(s) = w iff w < K/2, as Figure 3 in the text indicates. The analysis for the other limb of the bilinear function in Experiment 1 is the same. Thus, when w > K/2, s = 1/2 is the only stable equilibrium; otherwise, there will be a stable equilibrium at the point s = w/K. These possibilities are all summarized graphically for the bilinear function in Figure 3.


Asymmetrical interdependent schedule. For the schedule in Experiment 2, the payoff probability on the left, q(s), was always twice that on the right: q(s) = 2p(s). Substituting this constraint into Eq. B1 yields

delta(s) = s(3p(s) - 2w) + w - 2p(s).    (B5)

Substituting for p(s) the linear function (p(s) = Ks) shown in Figure 6 gives

delta(s) = 3Ks² - 2s(w + K) + w.    (B6)

The two roots of this equation represent potential equilibria, whose stability must be assessed in the usual way. The properties of Eq. B6 can be seen most easily when we equate it to zero (to find the roots) and divide through by K; if we denote w/K by V, Eq. B6 then becomes

0 = 3s² - 2s(V + 1) + V.    (B7)

If V = 1 (i.e., w = K), Eq. B7 has a root at s = 1, where delta(s) has a positive slope; the other root is at s = 1/3. For values of V > 1 (w > K), there is no second root in the interval 0 < s < 1, and the first root is always in the interval 1/3 < s < 1/2. When 0 < V < 1 (w < K), there are always two roots: a stable root in the interval 0 < s < 1/3, and an unstable root in the interval 1/2 < s < 1. Examples of delta(s) functions with w/K in each of these three ranges are shown in Figure 7. Thus, for the asymmetrical interdependent schedule, ratio invariance predicts that the partial-preference mode must always be in the region 0 < s < 1/2 and that there will be a second mode, at exclusive minority choice (s = 1), only when the partial-preference mode is at an s-value less than 1/3. The relative strength of these two modes depends upon unspecified factors, such as the degree of variability in behavior and the actual increment and decrement functions, a(s) and b(s). We may speculate that the size of the right-hand (minority-exclusive-choice) mode will depend upon the value of the unstable root: the closer it is to 1.0, the less likely it is that a substantial minority-choice mode will develop.
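The three ranges for Eq. B7 can be checked numerically. The function below simply solves the quadratic; it is offered only as a sketch (the cut-offs 1/3 and 1/2 follow from the analysis above).

```python
def b7_roots(V):
    """Roots of 3s^2 - 2(V + 1)s + V = 0 (Eq. B7, with V = w/K).
    The smaller root is the stable one; the larger is unstable and
    lies outside 0 < s < 1 whenever V > 1."""
    half_disc = ((V + 1) ** 2 - 3 * V) ** 0.5
    return ((V + 1) - half_disc) / 3, ((V + 1) + half_disc) / 3

for V in (0.5, 1.0, 2.0):
    print(V, b7_roots(V))
# V = 0.5: roots ~0.211 and ~0.789 (two roots in 0 < s < 1)
# V = 1.0: roots 1/3 and 1 exactly
# V = 2.0: roots ~0.423 and ~1.577 (no second root in 0 < s < 1)
```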

APPENDIX C

Choice on Concurrent Variable-Interval Schedules

The reward-following model can be adapted to VI choice by substituting the appropriate molar feedback function (MFF) for p(s) and q(s) in Eq. 4. Because the probability of reward on an interval schedule depends upon the rate of responding, some assumption must be made about rates. We assumed that the sum of rates to the two choices, x + y = K, is constant. Choice proportion, s, is then equal to x/(x+y). Using the MFF for random responding, reward rate is given by

R(x) = Ax/(A + x),

where A is the scheduled VI rate (i.e., the maximum possible reward rate). Given that reward probability is just R(x)/x (rewards per response), it follows that

p(s) = A/(A + x),    (C1)
q(s) = B/(B + y).    (C2)

Substituting K - x for y in Eq. C2, and substituting the result, together with Eq. C1, into Eq. 4 in the text yields the appropriate delta(s) function. We arrived at the properties described in the text simply by inspecting the plotted functions for the appropriate ranges of A and B values.
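The inspection step can be reproduced with a short routine. The sketch below assumes, as in Appendix B, that the choice equation has the form delta(s) = s(p(s) + q(s) - 2w) + w - q(s), with the constant response-rate total x + y = K; the grid scan for stable sign changes stands in for the authors' plotted-function check.

```python
def delta_vi(s, A, B, K, w):
    """delta(s) for concurrent VI choice: Eqs. C1-C2 substituted into
    the interdependent form of the choice equation."""
    p = A / (A + K * s)              # Eq. C1, with x = K*s
    q = B / (B + K * (1 - s))        # Eq. C2, with y = K*(1 - s)
    return s * (p + q - 2 * w) + w - q

def stable_zeros(A, B, K, w, n=1000):
    """s-values where delta(s) crosses zero with negative slope,
    i.e., the stable equilibria."""
    grid = [i / n for i in range(1, n)]
    vals = [delta_vi(s, A, B, K, w) for s in grid]
    return [grid[i] for i in range(len(vals) - 1) if vals[i] > 0 > vals[i + 1]]
```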