BULLETIN OF MATHEMATICAL BIOLOGY VOLUME 37, 1975
BATESIAN MIMICRY AND SIGNAL DETECTION THEORY

A. OATEN
Department of Biological Sciences, University of California, Santa Barbara, California 93106

C. E. M. PEARCE
Department of Applied Mathematics, University of Adelaide, Adelaide, S. Australia 5001

M. E. B. SMYTH*
Department of Zoology, University of Adelaide, Adelaide, S. Australia
Signal Detection Theory can be used to provide a mathematical model describing the choice of a predator trying to distinguish between a model and a Batesian mimic. The mathematical model yields a number of deductions, in particular that it may or may not assist the mimic population if mimics more closely resemble their models. The assumptions underlying the analysis are discussed in some detail.
* Deceased January 1974.

1. Introduction. Imagine a predator has found a distinctively colored and patterned insect. He has already tried eating insects like this one; some were edible but some were followed by severely unpleasant consequences--prolonged nausea and vomiting perhaps (Brower et al., 1967), or a sting in the mouth (Brower et al., 1960). The problem confronting this predator is simply stated: is this insect harmless and edible or is it not? Is it a Batesian mimic or a model? Mimics and models are not exactly the same but they are similar enough to be confused. It is as if the predator must decide which of two overlapping distributions of information the present piece belongs to--that set of information arising from all mimics, or that set arising from all models. If the two sets were normally distributed with equal variance, the situation could be visualized as in Figure 1. Sometimes, as at A, the information almost certainly came from a mimic; sometimes, as at B, it almost certainly came from a model. But many pieces of information between A and B are more or less ambiguous; they could be from either mimic or model.

Figure 1. The Signal Detection Theory model for the predator's choices between mimics and models. The predator's choices are made as if, by the time it reaches the predator's decision center, the information from all mimics is normally distributed and overlaps the normal distribution of information from all models. The symbols are for reference from the text. [Horizontal axis: information at predator's decision centre; labelled points A, C, 0, B and separation d.]

This formulation of the problem makes it amenable to analysis by means of a model from the body of theory known as signal detection theory (SDT), developed for the analysis of risky decisions. The use of SDT for the analysis of Batesian mimicry is not new; Duncan and Sheppard (1965) have already used it, together with the results of a simple experiment, to show that the relative selective advantage gained by a mimic depends on the costs to the predator of mistakenly eating a model.

2. Mathematical Model. A brief outline of the model and its assumptions is given below; a derivation from a set of more fundamental assumptions is given in the Appendix; more detail can be found in Swets et al. (1961), Green and Swets (1966) or McNicol (1971). The problem the predator faces is usually summarized as follows: one of two states (mimic or model) obtains; the predator does not know which, but he does
know, say from past experience, that these states occur with relative frequencies P(M) and P(D) = 1 - P(M): these are the a priori probabilities; the predator also will observe a random signal which will be one of a set of values, S, but whose actual value, s, will be chosen according to a probability distribution which depends on which state obtains: the probability distribution will have probability density function $f_1(s)$ if one state (mimic) obtains and $f_2(s)$ if the other state obtains, both $f_1$ and $f_2$ being known to the predator; finally the predator must choose one of two possible actions (attack or ignore) each of which is appropriate for one state (a different one) and inappropriate for the other: the predator receives a payoff which depends on the state that obtains and on the action he takes; and (in a non-trivial problem) which action leads to the greater payoff will depend on which of the unknown states obtains. In table form we have
                                   State 1 (Mimic)    State 2 (Model)
Payoff matrix
  Action 1 (Attack)                      a                  -b
  Action 2 (Ignore)                      0                   0
A priori probabilities                 P(M)                P(D)
Probability density of signal          f1(s)               f2(s)
where a is what the predator gains by eating a mimic and b is what he loses by eating a model. In the Appendix it is shown that, under reasonable assumptions, an optimal strategy for a predator is to eat the prey if

$$\frac{f_1(s)}{f_2(s)} \ge \frac{b\,P(D)}{a\,P(M)} \tag{1}$$
where s is the signal the predator receives. There are many possible forms for S, $f_1(s)$ and $f_2(s)$. However, it is not unreasonable to suppose that the signal s, as perceived by the predator, is a number and is made up of a large number of "sub-signals", indicators of various aspects of the prey's appearance. If we can assume the signal is, in fact, a linear combination of a large number of independent (or even "almost independent") sub-signals, whose variances do not vary too greatly, then the distributions $f_1$ and $f_2$ can be taken to be normal (see, e.g. Feller, 1966, p. 491; Ibragimov and Linnik, 1965, pp. 423-432, described in Stein, 1971). We comment further on this assumption following (c) in Section 3. Rescaling and translating appropriately, we can take $f_2(s)$ to be the standard normal density function,
$$f_2(s) = \phi(s) = (2\pi)^{-1/2}\exp\{-\tfrac{1}{2}s^2\}, \tag{2}$$
and $f_1$ to have mean $-d$ and variance $\sigma^2$ so that

$$f_1(s) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(s+d)^2}{2\sigma^2}\right]. \tag{3}$$

The predator should eat the prey if the ratio of these is large enough--specifically, if

$$\sigma^{-1}\exp\left[-\frac{(s+d)^2}{2\sigma^2} + \frac{1}{2}s^2\right] \ge \frac{b\,P(D)}{a\,P(M)}.$$

Writing b/a = r and P(D)/P(M) (= 1/P(M) - 1) as p, this is

$$s^2(\sigma^2 - 1) - 2sd - (d^2 + 2\sigma^2\log \sigma rp) \ge 0. \tag{4}$$
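The step from (1) to (4) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the parameter values are illustrative assumptions, and the densities are those of (2) and (3).

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def attack_by_ratio(s, d, sigma, r, p):
    """Rule (1): attack when f1(s)/f2(s) >= r*p, with mimic ~ N(-d, sigma^2), model ~ N(0, 1)."""
    return normal_pdf(s, -d, sigma) / normal_pdf(s) >= r * p

def attack_by_quadratic(s, d, sigma, r, p):
    """Rule (4): the same test after taking logs and multiplying through by 2*sigma^2."""
    lhs = (s * s * (sigma ** 2 - 1.0) - 2.0 * s * d
           - (d * d + 2.0 * sigma ** 2 * math.log(sigma * r * p)))
    return lhs >= 0.0

# Illustrative values only (assumptions, not taken from the paper): d = 1, r = 0.5, p = 1.5.
grid = [x / 10.0 for x in range(-60, 61)]
agree = all(
    attack_by_ratio(s, 1.0, sigma, 0.5, 1.5) == attack_by_quadratic(s, 1.0, sigma, 0.5, 1.5)
    for sigma in (0.7, 1.0, 1.5)
    for s in grid
)
print("rules (1) and (4) agree on the grid:", agree)
```

The two rules are algebraically identical, so any disagreement could only arise from rounding at a signal lying exactly on the boundary of the attack set; the grid above avoids such points for the chosen values.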
3. Predictions of the Model. In this section we consider the effects of changes in d (the "difference" between mimic signals and model signals) on the proportions of the mimic and model populations under attack by the predator. If our SDT model is correct, then any prey, whether mimic or model, will come under attack if its signal satisfies (4). Thus if A is the set of values of s for which (4) holds, then $\int_A f_1(s)\,ds$ is the proportion of the mimic population under attack by the predator. With $f_1$ as in (3) and $\Phi$ as the normal distribution function $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2}\exp\{-y^2/2\}\,dy$, we consider three cases which arise naturally from (4).

(a) $\sigma > 1$. In this case, A (the set of values of s for which (4) holds so that the predator will attack) consists of all points outside the interval

$$(\sigma^2 - 1)^{-1}\left[d \pm \sigma\{d^2 + 2(\sigma^2 - 1)\log \sigma rp\}^{1/2}\right].$$

(If $d^2 + 2(\sigma^2 - 1)\log \sigma rp < 0$, all prey should be eaten.) Consequently, with $\gamma = \sigma\{d^2 + 2(\sigma^2 - 1)\log \sigma rp\}^{1/2}$, the proportion of the mimic population that is under attack is

$$P(d) = 1 - \Phi[\{\sigma(\sigma^2 - 1)\}^{-1}(\sigma^2 d + \gamma)] + \Phi[\{\sigma(\sigma^2 - 1)\}^{-1}(\sigma^2 d - \gamma)].$$

Differentiating this with respect to d we have

$$P'(d) = \frac{\sigma(\gamma + d)}{\gamma(\sigma^2 - 1)}\,\phi\!\left(\frac{\sigma^2 d - \gamma}{\sigma(\sigma^2 - 1)}\right)\left[\frac{\gamma - d}{\gamma + d} - \exp\left\{-\frac{2\gamma d}{(\sigma^2 - 1)^2}\right\}\right].$$

This is positive--implying a larger proportion comes under attack as d increases, i.e. as mimics become less like models--if

$$\frac{\gamma - d}{\gamma + d} - \exp\left\{-\frac{2\gamma d}{(\sigma^2 - 1)^2}\right\} > 0. \tag{5}$$
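As an illustration of case (a), the sketch below evaluates P(d) and the left side of (5) and compares the sign of the latter with a finite-difference slope of P. The function names, the use of a single argument `rp` for the product rp, and the values sigma = 1.5, rp = 0.5 are assumptions made for the example.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gamma_root(d, sigma, rp):
    """gamma = sigma*sqrt(d^2 + 2(sigma^2 - 1) log(sigma*rp)); None when the root is imaginary."""
    inner = d * d + 2.0 * (sigma ** 2 - 1.0) * math.log(sigma * rp)
    return sigma * math.sqrt(inner) if inner >= 0.0 else None

def mimic_attack_proportion(d, sigma, rp):
    """P(d) for sigma > 1: mass of N(-d, sigma^2) outside the interval of case (a)."""
    g = gamma_root(d, sigma, rp)
    if g is None:              # (4) then holds for every s, so all prey are eaten
        return 1.0
    scale = sigma * (sigma ** 2 - 1.0)
    return 1.0 - Phi((sigma ** 2 * d + g) / scale) + Phi((sigma ** 2 * d - g) / scale)

def condition_5(d, sigma, rp):
    """Left side of (5); defined when gamma is real and d > 0."""
    g = gamma_root(d, sigma, rp)
    return (g - d) / (g + d) - math.exp(-2.0 * g * d / (sigma ** 2 - 1.0) ** 2)

def slope(d, sigma, rp, h=1e-5):
    """Central finite-difference estimate of P'(d)."""
    return (mimic_attack_proportion(d + h, sigma, rp)
            - mimic_attack_proportion(d - h, sigma, rp)) / (2.0 * h)

# Illustrative values: here e = 2(sigma^2 - 1) log(sigma*rp) < 0, i.e. Case 1 of (a).
for d in (1.0, 3.0):
    print(d, condition_5(d, 1.5, 0.5) > 0, slope(d, 1.5, 0.5) > 0)
```

For these values the condition is negative at d = 1 and positive at d = 3, so a small reduction in d near d = 1 raises the proportion of mimics attacked, while near d = 3 it lowers it.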
Let $e = 2(\sigma^2 - 1)\log \sigma rp$, so $\gamma = \sigma\{d^2 + e\}^{1/2}$. We need to consider three cases.

Case 1 $e < 0$. The left side of (5) is clearly continuous in d and is clearly negative at $d^2 = -e$, where $\gamma = 0$ [for $d^2 < -e$, P(d) = 1]. Thus there is a range of values of d, $(\sqrt{-e}, d')$ say, for which P'(d) < 0, so that a decrease in d actually results in an increase in the proportion of mimics attacked.

Case 2 $e = 0$. Again the left side of (5) is continuous and is $[(\sigma - 1)/(\sigma + 1)] - 1 < 0$ for d = 0, so again there is a range of values of d, (0, d') say, for which a decrease in d is harmful to the mimic.

Case 3 $e > 0$. This case is most easily dealt with by observing that the left side of (5) is continuous in e, so if it is negative for e = 0, it will be negative for some positive values. It is, however, possible to choose e large enough so the left side of (5) is never negative. This is the only case (for $\sigma > 1$) where a decrease in d is always advantageous. It corresponds to models being common and/or very noxious (rp > 1).

(b) $\sigma < 1$.
Here, A consists of all points inside the interval

$$(1 - \sigma^2)^{-1}\left[-d \pm \sigma\{d^2 - 2(1 - \sigma^2)\log \sigma rp\}^{1/2}\right].$$

[If $d^2 - 2(1 - \sigma^2)\log \sigma rp < 0$, no prey should be eaten.] With $\eta = \sigma\{d^2 - 2(1 - \sigma^2)\log \sigma rp\}^{1/2}$, the proportion of mimics under attack is

$$P(d) = \Phi[\{\sigma(1 - \sigma^2)\}^{-1}(\eta - \sigma^2 d)] - \Phi[\{\sigma(1 - \sigma^2)\}^{-1}(-\eta - \sigma^2 d)].$$

Proceeding as in (a), we find that P'(d) is positive if

$$\frac{d - \eta}{d + \eta} + \exp\left\{-\frac{2\eta d}{(1 - \sigma^2)^2}\right\} > 0. \tag{6}$$

Again we have three cases, depending on the value of $\delta = 2(1 - \sigma^2)\log \sigma rp$.

Case 1 $\delta < 0$. The left side of (6) is 0 when d = 0. Its derivative at d = 0 is

$$\frac{2}{\sigma\sqrt{-\delta}}\left(1 + \frac{\sigma^2\delta}{(1 - \sigma^2)^2}\right)$$

which is negative if $\delta < -\sigma^{-2}(1 - \sigma^2)^2$; i.e. when rp < 1 (models are rare and/or not very noxious) and $\sigma^2$ is close to unity. In this case, when d is near zero, decreasing d will increase the proportion under attack.

Case 2 $\delta = 0$. The left side of (6) is $[(1 - \sigma)/(1 + \sigma)] + \exp\{-2\sigma d^2/(1 - \sigma^2)^2\} > 0$ throughout, so decreasing d is always advantageous.
Case 3 $\delta > 0$. $P(d) = 0$ for $d < \sqrt{\delta}$. For all other d, the left side of (6) is positive, so decreasing d is always advantageous.

(c) $\sigma = 1$. Here the proportion under attack is

$$P(d) = \Phi(d/2 - d^{-1}\log rp). \tag{7}$$
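Equation (7) is simple enough to tabulate directly. The sketch below is illustrative only: the function names and the value rp = 0.5 are assumptions. It locates the value of d at which the slope of P(d) changes sign, which exists only when rp < 1.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mimic_attack_proportion_equal_variance(d, rp):
    """Equation (7): P(d) = Phi(d/2 - log(rp)/d), for d > 0."""
    return Phi(d / 2.0 - math.log(rp) / d)

def turning_point(rp):
    """Value of d below which a smaller d raises P(d); None unless rp < 1."""
    return math.sqrt(-2.0 * math.log(rp)) if rp < 1.0 else None

rp = 0.5                      # illustrative: models rare and/or only mildly noxious
d_star = turning_point(rp)    # sqrt(2 log 2), about 1.18
for d in (0.3, 0.6, d_star, 2.0, 3.0):
    print(f"d = {d:.2f}  P(d) = {mimic_attack_proportion_equal_variance(d, rp):.4f}")
```

With rp = 0.5 the proportion attacked is smallest at d close to 1.18 and rises on either side, so mimics that are already closer to the models than this gain nothing, as a population, from a further reduction in d.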
If log rp > 0, decreasing d will decrease P(d); but if log rp < 0, then P(d) is increased by decreasing d if $d < \{-2\log rp\}^{1/2}$, i.e. if $(-d/2 - d^{-1}\log rp)' + 1 < 0$.

One can fit this case [and, with more effort, presumably (a) and (b)] into a more general context that may partly overcome objections to assuming signals are normally distributed. Suppose that $f_2(s) = f_1(s - d)$ and that

$$rp = \frac{b}{a}\,\frac{P(D)}{P(M)} < 1.$$

Then from (1), if d = 0 the entire mimic population is under attack. Assuming P(d) is differentiable, which will almost always be the case, there will be an interval (d', d'') where it is disadvantageous to the mimic population to have d decrease. (d' is the "last" d for which P(d) = 1; strictly it is $\sup\{d_0 : P(d) = 1 \text{ for all } d \in [0, d_0)\}$.)

Thus, although it is to the advantage of individual mimics to look more like models (i.e. to emit a signal which is more to be expected from a model than from a mimic), it may be disadvantageous to the mimic population as a whole, in the sense that a larger proportion of mimics may come under attack if the distribution of mimic signals becomes more like the distribution of model signals.

The generalization to arbitrary density functions, two paragraphs back, can be taken further to signals of more than one dimension (e.g. appearance and sound). Suppose, for example, the signal is k-dimensional, and that its distribution is k-variate normal, with covariance matrix $\Sigma$ and mean vector $-d$ (if the signal comes from a mimic) or 0 (if from a model). It is then easy to show that condition (1) leads to the predator eating the prey if the signal vector s satisfies

$$s'\Sigma^{-1}s - (s + d)'\Sigma^{-1}(s + d) > 2\log rp$$

from which the quadratic terms cancel to give

$$d'\Sigma^{-1}s < -\tfrac{1}{2}d'\Sigma^{-1}d - \log rp.$$

[Again, r = b/a and p = P(D)/P(M).] This criterion has us looking at the single number, $d'\Sigma^{-1}s$, which is a linear combination of the components of the signal vector, and which has a normal distribution with mean $-d'\Sigma^{-1}d$ or 0, for mimic or model respectively, and variance $d'\Sigma^{-1}d$. Thus in this case
(which requires the covariance matrix, $\Sigma$, to be the same for mimic and model), we need consider only the univariate, normally distributed, "combination signal" $d'\Sigma^{-1}s$. In this situation, the proportion of the mimic population under attack is

$$P(d) = \Phi\{\tfrac{1}{2}(d'\Sigma^{-1}d)^{1/2} - \log rp/(d'\Sigma^{-1}d)^{1/2}\}. \tag{8}$$
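For a two-dimensional signal the combination signal and equation (8) can be computed by hand. The sketch below assumes a 2x2 covariance matrix written as nested tuples; the helper names and the numerical values are illustrative assumptions, not anything specified in the paper.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve_2x2(cov, v):
    """Return cov^{-1} v for a symmetric 2x2 matrix given as ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    return ((c * v[0] - b * v[1]) / det, (a * v[1] - b * v[0]) / det)

def combination_signal(cov, d, s):
    """The scalar d' cov^{-1} s on which the predator's decision depends."""
    w = solve_2x2(cov, d)
    return w[0] * s[0] + w[1] * s[1]

def attacks(cov, d, s, rp):
    """Attack when d' cov^{-1} s < -(1/2) d' cov^{-1} d - log(rp)."""
    dd = combination_signal(cov, d, d)
    return combination_signal(cov, d, s) < -0.5 * dd - math.log(rp)

def mimic_attack_proportion(cov, d, rp):
    """Equation (8), written in terms of D = (d' cov^{-1} d)^{1/2}."""
    D = math.sqrt(combination_signal(cov, d, d))
    return Phi(0.5 * D - math.log(rp) / D)

# Illustrative values: correlated components with unequal variances.
cov = ((1.0, 0.5), (0.5, 2.0))
d = (1.0, 1.0)
print(attacks(cov, d, (-1.0, -1.0), 0.8))   # signal at the mimic mean
print(attacks(cov, d, (0.0, 0.0), 0.8))     # signal at the model mean
print(mimic_attack_proportion(cov, d, 0.8))
```

With an identity covariance matrix and d = (d, 0) the last function reduces to equation (7), which gives a convenient consistency check.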
The question is, how does this behave as d gets "closer" to 0? However "close" is ambiguous here: it may depend on the scales used to measure the various components of the signal, or on the identification of the components themselves: e.g. "appearance" may appear in several components as "amount of yellow color", "lightness of color", etc., some of which may be highly correlated. This suggests we should measure how much the mimics "look like" the models not by the size of d but by the difference between the two distributions. An obvious and standard way of measuring the difference (or distance) between two distributions is by the "average" squared difference between their densities, i.e. by $\int\{f_1(s) - f_2(s)\}^2\,ds$. In the present case it is easy to show, by using the transformation $v = \Sigma^{-1/2}s$, that the distance (in this sense) between the mimic and model signal distributions is $2^{1-k/2}(2\pi)^{-k/2}[1 - \exp\{-\tfrac{1}{4}d'\Sigma^{-1}d\}]$. This suggests we use $D = \{d'\Sigma^{-1}d\}^{1/2}$ as a measure of how far apart the mimic and model populations are, in terms of their signals. If we do, we see easily from (8) that the proportion under attack when the distance between the two distributions is D is $P(D) = \Phi(\tfrac{1}{2}D - D^{-1}\log rp)$. Comparing this with (7), and the conclusions following from it, we see again that decreasing the distance between the distributions--having the mimics more like the models--can actually increase the proportion, P(D), of mimics under attack if rp < 1 and if $D < \{-2\log rp\}^{1/2}$: i.e. if the two populations are already close and if the "risk" is "worthwhile" because either mimics are very frequent relative to models or the advantage of eating a mimic outweighs the disadvantage of eating a model. Again, this case can be partially extended to more general distributions.
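For reference, the squared-difference distance used in this section can be worked out as follows. This is a sketch under the stated assumptions (both densities k-variate normal with common covariance matrix, evaluated after the transformation $v = \Sigma^{-1/2}s$); the symbols $g_1$, $g_2$ and $\mu$ are introduced here for the derivation only.

```latex
% After v = \Sigma^{-1/2} s the two densities are N(-\mu, I_k) and N(0, I_k),
% with \mu = \Sigma^{-1/2} d, so that \mu'\mu = d'\Sigma^{-1}d = D^2.
\begin{align*}
\int g_i(v)^2\,dv
  &= (2\pi)^{-k}\int \exp\{-v'v\}\,dv = (4\pi)^{-k/2}, \qquad i = 1, 2,\\
\int g_1(v)\,g_2(v)\,dv
  &= (2\pi)^{-k}\int \exp\{-(v + \tfrac12\mu)'(v + \tfrac12\mu) - \tfrac14 D^2\}\,dv
   = (4\pi)^{-k/2}\exp\{-\tfrac14 D^2\},\\
\int \{g_1(v) - g_2(v)\}^2\,dv
  &= 2\,(4\pi)^{-k/2}\bigl[1 - \exp\{-\tfrac14 D^2\}\bigr]
   = 2^{1-k/2}(2\pi)^{-k/2}\bigl[1 - \exp\{-\tfrac14 d'\Sigma^{-1}d\}\bigr].
\end{align*}
```

The constant in front depends only on k, so the distance is an increasing function of D alone, which is the property the argument relies on.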
If $f_2(s) = f_1(s - d)$ then, although it is not clear in general how to measure the distance between the distributions of model and mimic signals, it is clear that any reasonable measure will have the property that, for any d, the density $f_1(s - \alpha d)$ is closer to $f_1(s)$ than $f_1(s - \beta d)$ is, provided $\alpha < \beta$. Consequently we can pick on a fixed direction, represented by d, and ask whether, if the distribution of model signals has density $f_1(s - \alpha d)$, the proportion of mimics under attack necessarily decreases as $\alpha$ decreases and mimics become "more like" models. Again, however, if rp < 1 the entire mimic population is under
attack when $\alpha = 0$, so there is an interval, $(\alpha', \alpha'')$, in which it is disadvantageous to the mimic population to have $\alpha$ decrease.

In all the cases discussed, then, it is possible for the proportion of mimics under attack actually to increase if the mimic population becomes more successful in imitating models, provided rp is small enough--i.e. provided either mimics are relatively abundant or the noxiousness of models is not great compared to the value of mimics.

The situation for models is similar. If $\sigma > 1$ the proportion of models under attack is

$$Q(d) = 1 - \Phi[(\sigma^2 - 1)^{-1}(d + \gamma)] + \Phi[(\sigma^2 - 1)^{-1}(d - \gamma)],$$

where $\gamma$ is as in case (a), previously. From this, Q'(d) is positive if

$$\frac{\gamma - \sigma^2 d}{\gamma + \sigma^2 d} - \exp\left\{-\frac{2\gamma d}{(\sigma^2 - 1)^2}\right\} > 0. \tag{9}$$

Comparing this with (5), we see that it is possible to have Q'(d) > 0 (which would mean that an increase in d would increase the proportion of models under attack, a counter-intuitive situation), but only if P'(d) > 0, i.e. only if it is advantageous for mimics to decrease d. If $\sigma < 1$, then

$$Q(d) = \Phi[(1 - \sigma^2)^{-1}(-d + \eta)] - \Phi[(1 - \sigma^2)^{-1}(-d - \eta)],$$

where $\eta = \sigma\{d^2 - 2(1 - \sigma^2)\log \sigma rp\}^{1/2}$ as in case (b). From this, Q'(d) > 0 if

$$\frac{\sigma^2 d - \eta}{\sigma^2 d + \eta} + \exp\left\{-\frac{2\eta d}{(1 - \sigma^2)^2}\right\} > 0. \tag{10}$$
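The relation between the conditions for models, (9) and (10), and those for mimics, (5) and (6), can be explored numerically. In the sketch below the function names, the `population` argument and all parameter values are illustrative assumptions; the root function uses the fact that $\gamma$ of case (a) and the corresponding root of case (b) are given by the same expression.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def root(d, sigma, rp):
    """sigma*sqrt(d^2 + 2(sigma^2 - 1) log(sigma*rp)); None when imaginary."""
    inner = d * d + 2.0 * (sigma ** 2 - 1.0) * math.log(sigma * rp)
    return sigma * math.sqrt(inner) if inner >= 0.0 else None

def slope_condition(d, sigma, rp, population):
    """Left side of (5)/(6) for 'mimic', of (9)/(10) for 'model' (sigma != 1, d > 0)."""
    g = root(d, sigma, rp)
    w = 1.0 if population == "mimic" else sigma ** 2
    e = math.exp(-2.0 * g * d / (sigma ** 2 - 1.0) ** 2)
    if sigma > 1.0:
        return (g - w * d) / (g + w * d) - e
    return (w * d - g) / (w * d + g) + e

def model_attack_proportion(d, sigma, rp):
    """Q(d) for sigma > 1: mass of N(0, 1) outside the interval of case (a)."""
    g = root(d, sigma, rp)
    if g is None:
        return 1.0
    return 1.0 - Phi((d + g) / (sigma ** 2 - 1.0)) + Phi((d - g) / (sigma ** 2 - 1.0))

# The model condition never exceeds the mimic condition on this illustrative grid.
never_exceeds = all(
    slope_condition(d, s, rp, "model") <= slope_condition(d, s, rp, "mimic")
    for s in (0.6, 1.8) for rp in (0.3, 1.0, 3.0) for d in (0.2, 0.5, 1.0, 2.0, 4.0)
    if root(d, s, rp) is not None
)
print(never_exceeds)
```

Since the left side for models is never larger than that for mimics, a positive value for models forces a positive value for mimics, which is the "only if" statement made in the text.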
Comparing (10) with (6), we see again that Q'(d) > 0 only if P'(d) > 0. Finally, if $\sigma = 1$, $Q(d) = \Phi[-(d/2) - d^{-1}\log rp]$ so Q'(d) > 0 if $d < \{2\log rp\}^{1/2}$ {i.e. if $[-(d/2) - d^{-1}\log rp]' > 0$}; if log rp < 0, Q'(d) < 0. Again, this fits a more general context: if $f_1(s) = f_2(s + d)$ and rp > 1, then if d = 0 the predator will attack no part of either mimic or model population. As d is increased, some sets of possible signals become so much more likely to be due to mimics than to models that prey emitting these signals come under attack. Thus part of the model population comes under attack, i.e. an increase in d is "bad" for both populations. Similarly, if rp < 1, a decrease in d can be bad for both populations. These remarks can also be shown to apply to the case of multidimensional signals, as discussed earlier for mimics.

We have illustrated examples of some of these results in Figure 2. We can summarize our results qualitatively: for large rp, decreasing d is advantageous to both mimics and models; for small rp, decreasing d is disadvantageous to both; for "intermediate" rp, decreasing d is advantageous to mimics, but disadvantageous to models. How small or large rp must be for these effects depends on d (and also $\sigma$); if $\sigma = 1$, then for large d the "intermediate" case applies, and for small d one of the extremes--the advantageous one if rp > 1, disadvantageous if rp < 1, and ambiguous if rp = 1: this is the case where the predator's two options have the same expected gain.

Figure 2. The survival of encountered mimics of different degrees of resemblance to models at three different relative frequencies and at two ratios b:a. When b = 0.5a, curves for which P(M) < 1/2 will converge to the top right of the figure, like the 0.1 curve, while those for which P(M) > 1/2 will converge to the bottom left. When b = 10a, the corresponding cut-off point is at P(M) = 10/11. The curve for P(M) = 0.9 when b = 0.5a is too close to the abscissa to be shown. [Two panels: (a) b = 0.5a, (b) b = 10a; abscissa: increasing resemblance of mimics to models (-d); curves labelled by P(M).]

It should be stressed that the effects described here are for the population as a whole. Although a decrease in d may not be advantageous for the population, an individual mimic with a large s may have an advantage over other mimics;
more precisely, in the case $\sigma = 1$, if $s_1 > s_2$, an individual mimic with signal $s_1$ is less likely to come under attack than a mimic with signal $s_2$: any predator whose a and b are such that it is worthwhile for him to attack a prey emitting $s_1$ will also find it worthwhile to attack a prey emitting $s_2$; but there are values of a and b (more precisely, of b/a) which will make it worthwhile to attack $s_2$ but not $s_1$. Thus there is an advantage to the individual with a high s, but as this trait is passed on, so that the mean for the population increases, it may happen that the population is in a worse way (i.e. more of it is being attacked) than before. What is good for the individual may not be good for the population.

In the same way, as pointed out by Fisher (1958, p. 165), it is advantageous for an individual model to emit a large s, i.e. to have a signal as different as possible from those of mimics. Thus the model population will tend to evolve away from the mimic population, a tendency which may actually be disadvantageous to the models; so again the good of the individual may not coincide with that of the population. In both cases, the apparently paradoxical situation occurs, if it does at all, only for small values of d. Such values will occur only if the rate at which mimics evolve towards models is greater than that at which models evolve away from mimics. We might speculate further that this could be the case, at least when d is large since the proportion of mimics under attack is always greater than that of models, so that selective pressures may be more intense for mimics. This seems to be so in some cases (Nur, 1970).

A further complication arises from the fact that the predator does not really know $f_1$ and $f_2$, but (we assume) estimates them on the basis of prior experience. If $f_1$ and $f_2$ are changing, the predator could be expected to be always somewhat out of date, in which case these changes would be advantageous, at least for a time, to both populations. It is possible, however, that the predator's strategy allows for this (as we have not done in the Appendix) by being more daring than would be justifiable if $f_1$ and $f_2$ were fixed.
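The individual-advantage argument made earlier in this section for $\sigma = 1$ can be made concrete. With equal variances a prey emitting s is attacked when s <= -d/2 - log(rp)/d, which rearranges to r <= exp{-d(s + d/2)}/p. The sketch below uses hypothetical function names and illustrative values to compute this largest ratio r = b/a.

```python
import math

def attacked(s, d, r, p):
    """Equal-variance rule from (4): attack when s <= -d/2 - log(r*p)/d."""
    return s <= -d / 2.0 - math.log(r * p) / d

def largest_r_attacking(s, d, p):
    """Largest ratio r = b/a at which a predator still attacks a prey emitting s."""
    return math.exp(-d * (s + d / 2.0)) / p

# Illustrative values: two mimics with signals s1 > s2, d = 1, p = 1.
s1, s2 = 0.5, -0.5
print(largest_r_attacking(s1, 1.0, 1.0), largest_r_attacking(s2, 1.0, 1.0))
```

The bound decreases as s increases, so every predator willing to attack the prey emitting s1 also attacks the prey emitting s2, while predators whose r lies between the two bounds attack only the latter.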
4. Generalizations of the Model. The conclusions of the previous section hold good without the full particularity of the assumptions made. We might, for example, wish to allow for deviations from optimality in the predator's choice of criterion information values (i.e. the predator does not always make the optimal choice but randomizes somewhat) and variation in the values of a and b. For simplicity, take $\sigma = 1$ and suppose a predator's criterion follows a distribution instead of assuming the optimal deterministic value $c = -(d/2) - d^{-1}\log rp$. If the density function for a deviation u from c is g(u), then

$$P(d) = \int_{-\infty}^{\infty} g(u)\,du \int_{-\infty}^{c+u} \phi(x + d)\,dx.$$

Differentiation with respect to d yields

$$P'(d) = (c' + 1)\int_{-\infty}^{\infty} g(u)\,\phi(c + u + d)\,du$$

so that the sign of P' is that of c' + 1 as when c is not random. Thus whether there is advantage or disadvantage for the mimic in more closely resembling the model is independent of the form of the predator's criterion distribution density function g.

When $\sigma \ne 1$, we similarly posit that a predator selects his two criterion points to deviate amounts $u_1$, $u_2$ from the deterministic values $c_1$, $c_2$ of s for which relation (4) holds with equality, with probability density $g(u_1, u_2)$, i.e. we suppose g to be largely independent of the choice of $c_1$, $c_2$. The predation rate for encountered mimics becomes
$$P(d) = \iint g(u_1, u_2)\,du_1\,du_2\,\Bigl[\;\ldots\;\phi\!\left(\frac{x + d}{\sigma}\right)\ldots\;\Bigr]$$

and we find

$$P'(d) = \ldots$$

where $g_1$, $g_2$ denote the marginal densities of g. It follows in particular that the relative advantage in decreasing d is as before provided the distributions of the criterion values are sufficiently concentrated about $c_1$ and $c_2$.

If the predator sticks strictly to the criterion, but the criterion is itself random because of randomness in the values of a and b, as might arise from variation in hunger, the effect is not quite the same. Since a and b enter into c through the term $d^{-1}\log rp$, a change in d will result not only in a change in location for the distribution of c but also a change in scale. With $\sigma = 1$ and r random, with density function h, we have

$$P(d) = \int h(r)\,\Phi(c + d)\,dr,$$

where $c = -d/2 - d^{-1}\log rp$. Differentiation with respect to d now yields

$$P' = \int h(r)\,\phi(c + d)\,[c' + 1]\,dr = \int h(r)\,\phi\!\left(\frac{d}{2} - \frac{\log rp}{d}\right)\left[\frac{1}{2} + \frac{\log rp}{d^2}\right]dr.$$
It follows that the survival rate of encountered mimics at a given value of d increases with increasing resemblance to models only if the density h does not have too much of its associated mass occurring at low values, i.e. if it does not happen too often that the gain in eating a mimic is high compared with the loss in eating a model, or if the relative frequency of mimics is not too high.
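Both equal-variance generalizations of this section can be illustrated numerically. The sketch below makes two assumptions that are not in the paper: the deviation density g is taken to be normal with standard deviation `tau`, and the density h of r is replaced by a discrete set of values with weights. Function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def criterion(d, rp):
    """Deterministic criterion c = -d/2 - log(rp)/d for sigma = 1."""
    return -d / 2.0 - math.log(rp) / d

def attack_with_noisy_criterion(d, rp, tau, n=4001, width=8.0):
    """P(d) = integral of g(u) Phi(c + u + d) du with g assumed N(0, tau^2); trapezoid rule."""
    c = criterion(d, rp)
    step = 2.0 * width * tau / (n - 1)
    total = 0.0
    for i in range(n):
        u = -width * tau + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * phi(u / tau) / tau * Phi(c + u + d) * step
    return total

def attack_with_random_r(d, p, r_values, weights):
    """P(d) = sum_i h_i Phi(c_i + d) for a discrete distribution of r (an assumption)."""
    return sum(w * Phi(criterion(d, r * p) + d) for r, w in zip(r_values, weights))

def slope_with_random_r(d, p, r_values, weights):
    """P'(d) = sum_i h_i phi(d/2 - log(r_i p)/d) (1/2 + log(r_i p)/d^2)."""
    return sum(
        w * phi(d / 2.0 - math.log(r * p) / d) * (0.5 + math.log(r * p) / d ** 2)
        for r, w in zip(r_values, weights)
    )

print(attack_with_noisy_criterion(0.6, 0.5, 0.5))
print(slope_with_random_r(1.0, 1.0, (0.2, 3.0), (0.7, 0.3)))
```

For a normal g the first integral has the closed form Phi((c + d)/sqrt(1 + tau^2)), which gives a check on the quadrature; the sign of its slope in d is that of c' + 1 whatever the value of tau, in line with the first conclusion above.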
5. Discussion. In this section we make some general remarks about the SDT model and the selective advantages of mimicry. These remarks are necessarily qualitative and somewhat imprecise; we have tried, in the Appendix, to state general assumptions from which predator behavior, and some of our remarks here, can be deduced, but it is very difficult to quantify the predator's behavior in any realistic way.

At least four factors determine the selective advantages of mimicry: the resemblance of the mimic to the model, the noxiousness of the model, the relative abundance of mimics and models and the abundance of alternative food. The first three of these have been shown experimentally to influence the protection given a mimic, but it is not yet possible to see clearly the relative importance of each of them, nor how they interact. But if a predator does in fact conform to the SDT model, then we can equate each of these four factors with terms in (4) and, by varying these terms, explore the consequences to the mimics. Clearly d in (4) is a measure of the apparent difference between mimic and model. P(M) and 1 - P(M) are the relative abundances of mimic and model. The two values a and b do not have any straightforward interpretation and indeed they would be almost impossible to measure independently of the SDT analysis. But the ratio of b to a in (4) will be influenced by the noxiousness of the model, which influences b, and the hunger of the predator and the abundance of alternative food, which will influence both a and b. We discuss this again towards the end of the Appendix. Thus, given (4) and its consequences in (a), (b) or (c) we can compute the proportion of the mimic population which is liable to attack (i.e. whose signal falls in A) for any values of r, $\sigma$, d and P(M). Such proportions can be interpreted as relative selective values by dividing the survival of mimic or model under one set of values of the parameters by their survival under another set.
These selective values assume that any factors not taken into account in (4) remain constant. Three of these factors are important: (a) the proportion of the population which is encountered by the predator; (b) the proportion of the apparent variance in the signals at the decision center which derives from phenotypic variation in the prey; (c) the heritability of the resemblance of mimics to models (Falconer, 1961). The SDT model will only be as good as its assumptions, so they will be examined here in some detail. Note that SDT is not offering any hypotheses about the mechanisms of perception, about how information is received and processed. Some writers have argued that SDT is appropriate to some detection problems because the nervous system is essentially noisy; it is neural noise that results in normal distributions of signals at the decision center. This problem need not concern us here. Nor do we need to enter into current
psychological controversies about whether SDT provides the most appropriate framework for analysis; these controversies often seem to be between rival accounts which in any given context make remarkably similar predictions. All that SDT prescribes for our purposes is that the predator behaves as if all information available to him for distinguishing mimics from models is distributed normally. This information is about several different items--the color of a spot, the size of a spot, the angle of a line, and so on. Each of these dimensions alone might be amenable to SDT analysis; we assume that all of them together are. In other words SDT is not for our purposes a theory about perception, only a tool for analysis. The analysis will only be justified by experiments testing hypotheses generated by SDT, and some of these experiments are now being done. But a priori the use of the model is not unreasonable; it has been successfully used in experiments on selective perception by human subjects (Broadbent, 1971) and although it has not been much used for the analysis of experiments on other animals, it has been shown to give a good account of experiments on stimulus generalization (Blough, 1967) and discrimination in pigeons (Boneau and Cole, 1967; Suboski, 1971). In addition, Terman and Terman (1972) asked rats to distinguish between a sound of "standard" (100 dB) intensity and an attenuated comparison [at (100 - d) dB for various d] in several series of trials, with the sound in each trial being chosen randomly, and with P, the probability that the standard is used, varying from series to series. It seems reasonable to regard a signal sent with intensity 100 - d as being received with intensity s (say) where s has some distribution centered at 100 - d, so the signals received are random with densities $f_1$ and $f_2$, which may plausibly be assumed to be normal, very possibly with the same variances.
It is clear that both d, the difference between the distributions, and P, the a priori probability of the standard sound, affected the rat's choices in the way SDT would predict, but whether (7) gives a good fit to the results is not clear from the data they have given. Other workers have shown that, at least in some cases, the signals received are multidimensional and that the way the components are weighted in decision making can depend on past experience (Blough, 1972; Chase and Heinemann, 1972): nevertheless it seems likely that the decision can be regarded as based on a linear combination of the components (perhaps after suitable rescaling) so our model and its multivariate extensions may well be applicable. The SDT model assumes that the predator seeks to maximize his gain from any given choice. To begin with it is very difficult if not impossible to quantify the predator's gain and loss. We might calculate his calorific gain in eating a mimic and his average calorific loss in losing the contents of his gizzard if he eats a model, but the relative loss in passing over a mimic will depend on hunger
and the abundance of alternative food, and a nauseated predator which has just eaten a model not only loses his stomach contents but up to an hour of hunting time as well (Brower et al., 1968), the value of which again depends on the abundance of food. Further, values of a and b calculated in purely energetic terms will not give anything like a complete answer, for if a predator eats a model it will suffer not only a loss of gizzard contents and hunting time but a severe psychological shock as well, which it might not care to repeat. It is not possible to be precise about this without being very specific about some of the features mentioned above, and about what it is the predator is (teleologically speaking) aiming to do--and without actual solutions to the predator's optimization problem. However, it is clear intuitively, and is a consequence of the analysis in the Appendix, that b/a will vary from predator to predator and from time to time. Thus the situation the prey population face is of a variable b/a--the value of b/a for the next predator they meet is a random variable. The effect of a change in the density function, $f_1$, depends on b/a [= r, in (4)], so the overall effect will be an average. However, consideration of the effect of particular values of b/a can indicate how the average will behave, especially as the range of b/a values may not be large: the factors affecting b/a--abundance of alternative food, noxiousness of the model, etc.--tend to affect all predators similarly, so the range of values of b/a at any one time may not be very great. The situation facing the predator is rather like that in certain psychological experiments in which human subjects are given a detection task, fined for errors and paid for correct decisions. In such experiments one rarely finds that the subject acts as an ideal observer in strictly financial terms; more usually he shows some systematic departure from ideal behavior (Broadbent, 1971, p. 448).
A natural predator, which is not playing a game in the way the subject of a psychological experiment is, might depart less from the ideal and better fit our assumption because at least some of the time the profits and losses involved are important, matters of life and death. However, this would not be an easy matter to decide, since b/a varies over time according to the predator's experience and "aims", and hence may not be easily estimated.

Most optimization models are likely to fail, at least at times, because it is not always possible to optimize all behaviors simultaneously. Even if an animal compromises by optimizing a weighted combination of all its behavioral systems (and we know of no evidence that it does), this might not often correspond to an optimization of any one of them. A bird using an optimum strategy for gathering food might not be able simultaneously to optimize its defense against predators, or its likelihood of mating, or of holding a territory.
In the Appendix we have assumed that strategies for other behavior are determined independently of procedures for distinguishing between mimics and models: that the latter are determined within the context of the former. This seems unexceptionable, rather like maximizing a function of x and y by first finding the maximum for each fixed y and then finding the y which maximizes these maxima. However, this assumption allows a and b to vary according to the choice of these other strategies; in addition, P(M) and P(D) may vary, as in the Appendix, rather than be fixed, as in our discussion. A further complication, not considered in the Appendix, is that since $f_1(s)$ and $f_2(s)$, the densities of mimic and model signals, may change as mimics evolve towards models and models away from mimics, it might be wise for the predator, at times, to adopt a strategy which does not seem optimal, to keep abreast of these changes. The optimal decision rule then, as discussed here, is a weakly supported assumption made mostly for its simplicity; it should be possible to revise the model, or at least to produce testable refinements of it, if experiments show systematic departures from optimal strategy.

The SDT model is only relevant to predators which are already acquainted with both models and mimics.
When the predator population contains individuals which have not yet encountered models, the selective forces on mimics, and particularly the effects of relative abundance on those selective forces, will be different; the advantages of mimicry will be less, especially if mimics are relatively abundant. Waldbauer and Sheldon (1971) have shown that in the case of Dipteran mimics of Vespid wasps the mimics are relatively scarce when young birds are fledging and beginning to forage for themselves, and relatively much more abundant both earlier and later. Whatever the reason for this, one effect would be to give the mimics better protection, since when they are relatively common the birds would mostly have had experience with their models.

APPENDIX

In this section we attempt to set out plausible assumptions from which it will follow that a predator, faced with a prey which may be a mimic or a model, and which emits signal $s$, should eat the prey if $f_1(s)/f_2(s)$ is large enough, where $f_1$ and $f_2$ are the densities of the signals emitted by, respectively, mimics and models. Unfortunately the task of proving such a result is similar to that of proving 2 + 2 = 4; we need a rather elaborate structure, difficult to specify, to demonstrate something that seems intuitively obvious: in fact our assumptions may be less intuitively acceptable than the results we use them to prove. (In a helpful review of this paper, Dr. M. Terman has drawn our attention to the lucid article by Boneau (1974) which discusses a structure similar to the one we give here, but with less mathematical detail.)
We first assume that the predator has a history, $h$, which can be regarded as the history (and future) of opportunities and traps offered to the predator by his surroundings.
This would involve all aspects of the predator's life; we take it as

$$h = [(t_1, t_2, \ldots);\ (k_1, k_2, \ldots);\ (s_1, s_2, \ldots);\ (x_1, x_2, \ldots);\ R(h)]$$

where $t_1, t_2, \ldots$ are the times of arrival of prey (meaning mimics and models) within the predator's range of attack, $k_i$ indicates whether the $i$th prey is mimic ($k_i = M_i$) or model ($k_i = D_i$), $s_i$ is the signal emitted by the $i$th prey ("signal" being anything detected by the predator: appearance, sound, etc.), $x_i$ is a number in the interval [0, 1] whose purpose we give in a moment, and $R(h)$ is simply "other aspects" of the predator's history, e.g. that pertaining to alternative food, reproduction, etc. For any $h$ we take $h_i$ to be the part of the history occurring before the arrival of the $i$th prey, $\tilde h_i$ to be the part occurring after the $i$th prey arrival, and $h_i^0$ for the part occurring at time $t_i$. Thus

$$h_i = [(t_1, \ldots, t_{i-1};\ t_i);\ (k_1, \ldots, k_{i-1});\ (s_1, \ldots, s_{i-1});\ (x_1, \ldots, x_{i-1});\ R(h_i)],$$

where the semi-colon before $t_i$ indicates that the history extends to time $t_i$ but not to the $i$th prey arrival and where $R(h_i)$ stands for all other aspects of the predator's history up to, but not including, time $t_i$;

$$h_i^0 = [t_i, k_i, s_i, x_i, R(h_i^0)],$$

and

$$\tilde h_i = [(t_{i+1}, \ldots);\ (k_{i+1}, \ldots);\ (s_{i+1}, \ldots);\ (x_{i+1}, \ldots);\ R(\tilde h_i)].$$
The experience of the predator differs from his history by omitting that part of his history of which he is unaware and by including the record of his actions. We take

$$e = [(t_1, t_2, \ldots);\ (l_1, l_2, \ldots);\ (s_1, s_2, \ldots);\ (x_1, x_2, \ldots);\ R(e)],$$

where $l_i$ is one of $M_i(g)$ ($i$th prey was a mimic and was eaten), $D_i(g)$ ($i$th prey a model and eaten), and $N_i$ ($i$th prey not eaten), and $R(e)$ involves all other aspects of the predator's experience. At this time we introduce the events $M_i(\sim g)$ and $D_i(\sim g)$, that the $i$th prey was not eaten and was, respectively, a mimic or a model. The predator will not be able to distinguish between these events since he will not know whether an uneaten prey was a mimic or a model. As in the case of histories, we need to be able to split $e$ up, this time into $(e_i, t_i, s_i, x_i, l_i, R(e_i^0), \tilde e_i)$ where $e_i$ and $\tilde e_i$ are the experiences before and after the $i$th prey arrival, and have expressions analogous to those for $h_i$ and $\tilde h_i$, and the other symbols together give $e_i^0$, what happens as the $i$th prey arrives: the predator's awareness of its arrival and signal $(t_i, s_i)$, the decision to eat it or not and the eating or letting go $(x_i, l_i)$, and any other simultaneous experience, $R(e_i^0)$. We can assume the decision and meal follow after $s_i$ by a short time. We write $(e_i, s)$ for $(e_i, t_i, s)$; this will be the experience of a predator who, at time $t_i$, receives the signal $s$ from his $i$th prey.
He now has to decide what to do with it; this is where the $x_i$'s are used. A strategy, $\sigma$, for the predator is a function $\sigma(e_i, s)$, defined for each possible experience $e$, integer $i$, and signal $s$, to be a number in [0, 1]. The predator determines $\sigma(e_i, s)$ when the $i$th prey emits signal $s$, and he eats the prey if $x_i < \sigma(e_i, s)$; we will take the $x_i$'s to be uniformly distributed on [0, 1], so that $\sigma(e_i, s)$ is the probability that a predator, with experience $e_i$, will eat a prey emitting signal $s$. As for history and experience, we write $\sigma_i$, $\sigma_i^0$ and $\tilde\sigma_i$ for the function $\sigma$ restricted to values of $e_j$ for $j < i$, $j = i$ and $j > i$ respectively.

Let $H$, $E$, $S$ and $\Sigma$ be the sets of all possible histories, experiences, signals and strategies, respectively. We assume the predator chooses a strategy $\sigma \in \Sigma$ (where $\in$ denotes inclusion); he then gets a pair $(h, e) \in H \times E$, chosen according to a distribution, $P_\sigma$, which has the following structure: for any $e_i$ and $e_i^0$, (a) if $A$ be any collection of possible values of $h_i$ and $B$ any collection of possible values of $h_i^0$, then $P_\sigma[h_i^0 \in B \mid e_i, h_i \in A]$ is independent of $\sigma$; (b) if $A$ be any collection of possible values of $(h_i, h_i^0)$ and $B$ any collection of possible values of $\tilde h_i$, then $P_\sigma[\tilde h_i \in B \mid (h_i, h_i^0) \in A, (e_i, e_i^0)] = P_{\sigma'}[\tilde h_i \in B \mid (h_i, h_i^0) \in A, (e_i, e_i^0)]$. The idea of these conditions is that future history may be affected by past experience, e.g.
if a predator attacks a prey he may frighten other prey away; consequently the distribution of $(h, e)$ depends on the choice of $\sigma$, but only to the extent that $\sigma$ is reflected in $e$, i.e. if we know the predator's actions, then we know how he has affected the distribution of his
future history without needing to know the actual decision process that caused him to choose these actions. We have two further conditions, rather easier to state: (c) the probability density, on $S$, of prey signals is $f_1$ if the prey is a mimic and $f_2$ if it is a model. Formally: let $A$ be any set of possible values of $(h_i, e_i)$, and let $B$ be any collection of possible signals. Then

$$P_\sigma[s_i \in B \mid A, k_i = M_i] = \int_B f_1(s)\, ds$$

and

$$P_\sigma[s_i \in B \mid A, k_i = D_i] = \int_B f_2(s)\, ds$$

for each $i$ and $\sigma$, where "$ds$" denotes the measure on $S$ with respect to which $f_1$ and $f_2$ are densities. (d) The $x_i$'s are independently uniformly distributed on [0, 1], and not affected by other aspects of either history or experience.

We assume that, given $h_i$ and $\sigma_i$, $e_i$ is completely determined, i.e. $e_i$ is a function of opportunities to the $i$th arrival (including the $x_i$'s) and the process by which actions were decided on. We thus assume that decisions relating the history to the other aspects of experience, $R(e)$, are made by preassigned methods, methods which may perhaps be adapted to the choice of $\sigma$. Thus $e = e[\sigma, h]$, $e_i = e_i[\sigma_i, h_i]$, $\tilde e_i = \tilde e_i[\tilde\sigma_i, \tilde h_i]$.

We assume that the predator chooses $\sigma$ to maximize some numerical function, $C(e) = C(e[\sigma, h])$, of his experience. Since $h$ is random, he chooses $\sigma$ to maximize the expected value of this criterion. Thus the optimal $\sigma$ is defined by
$$E_\sigma\{C(e[\sigma, h])\} = \max_{\eta \in \Sigma} E_\eta\{C(e[\eta, h])\}.$$

In order to describe the form of an optimal strategy, we resort to a device common in dynamic programming (see, e.g., Bellman and Kalaba, 1965, pp. 35 ff.). We assume that, for each $e$ and $j > i$, we have found the optimal value of $\sigma(e_j, s)$, and now work backwards to get the best $\sigma(e_i, s)$. Thus our predator has, by some means, got experience $e_i$, and has received a signal $s$; no matter whether the prey is mimic or model or whether he eats it or not, he will know how best to act after dealing with it. What should he do now? Suppose the strategy to be used after this decision is made is $\eta$, so for all $e \in E$,

$$E_\eta\{C \mid e_i, e_i^0\} = E_\eta\{C(e_i, e_i^0, \tilde e_i(\tilde\eta_i, \tilde h_i)) \mid e_i, e_i^0\} = \max_{\sigma \in \Sigma} E_\sigma\{C \mid e_i, e_i^0\}. \tag{A1}$$
After the experience $e_i$ and signal $s$, the probability the newly-arrived prey is a mimic is $P(M_i \mid e_i, s)$, and that it is a model is $P(D_i \mid e_i, s) = 1 - P(M_i \mid e_i, s)$. By assumptions (a) and (c), these probabilities do not depend on the strategy used so far, except through $e_i$. If the predator now uses strategy $\sigma$, then, using assumption (b),

$$\begin{aligned}
E_{(\sigma,\eta)}\{C \mid e_i, s\} ={}& P[M_i \mid e_i, s]\,\bigl[\sigma(e_i, s)\,E_\eta\{C \mid e_i, s, M_i(g)\} + (1 - \sigma(e_i, s))\,E_\eta\{C \mid e_i, s, M_i(\sim g)\}\bigr] \\
&+ P[D_i \mid e_i, s]\,\bigl[\sigma(e_i, s)\,E_\eta\{C \mid e_i, s, D_i(g)\} + (1 - \sigma(e_i, s))\,E_\eta\{C \mid e_i, s, D_i(\sim g)\}\bigr].
\end{aligned}$$

(E.g. the first of the above 4 terms is (the probability the prey is a mimic) × (the probability the prey is eaten) × (the expected value of the criterion if the prey is a mimic and is eaten).) Rearranging, we get

$$\begin{aligned}
E_{(\sigma,\eta)}\{C \mid e_i, s\} ={}& P(M_i \mid e_i, s)\,E_\eta\{C \mid e_i, s, M_i(\sim g)\} + P(D_i \mid e_i, s)\,E_\eta\{C \mid e_i, s, D_i(\sim g)\} \\
&+ \sigma(e_i, s)\,\bigl\{P(M_i \mid e_i, s)\,F_1(\eta, e_i, s) - P(D_i \mid e_i, s)\,F_2(\eta, e_i, s)\bigr\}
\end{aligned} \tag{A2}$$
where

$$\begin{aligned}
F_1(\eta, e_i, s) &= E_\eta\{C \mid e_i, s, M_i(g)\} - E_\eta\{C \mid e_i, s, M_i(\sim g)\}, \\
F_2(\eta, e_i, s) &= E_\eta\{C \mid e_i, s, D_i(\sim g)\} - E_\eta\{C \mid e_i, s, D_i(g)\}.
\end{aligned} \tag{A3}$$
If the predator uses strategy $\psi$ for the first $i - 1$ prey, and $\sigma$ for the $i$th, the expected value of his criterion will be

$$E_{(\psi,\sigma,\eta)}[C] = \int E_{(\sigma,\eta)}[C \mid e_i, s]\, dP_\psi(e_i, s)$$

(i.e. $E[C] = E[E[C \mid e_i, s]]$, a standard formula in probability), so the optimal value of $\sigma$ will be that which maximizes (A2). Clearly this means $\sigma(e_i, s)$ should be 1 when its coefficient is positive, 0 when its coefficient is negative, and arbitrary otherwise. Arbitrarily taking $\sigma = 0$ in the arbitrary case, we have that the optimal choice for $\sigma(e_i, s)$ is 1 if

$$\frac{P(M_i \mid e_i, s)}{P(D_i \mid e_i, s)} > \frac{F_2(\eta, e_i, s)}{F_1(\eta, e_i, s)} \tag{A4}$$

assuming $F_1(\eta, e_i, s) > 0$.
Since

$$P(M_i \mid e_i, s) = \frac{P(M_i \mid e_i)\, f_1(s)}{P(M_i \mid e_i)\, f_1(s) + P(D_i \mid e_i)\, f_2(s)}$$

by (a) and (c), and $P(D_i \mid e_i, s) = 1 - P(M_i \mid e_i, s)$, (A4) becomes

$$\frac{f_1(s)}{f_2(s)} > \frac{P(D_i \mid e_i)\, F_2(\eta, e_i, s)}{P(M_i \mid e_i)\, F_1(\eta, e_i, s)}. \tag{A5}$$
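The rule (A5) can be sketched numerically. The following is a minimal illustration, not part of the original analysis: it assumes equal-variance normal signal densities (the case the paper visualizes in Figure 1), with hypothetical means `mimic_mean` and `model_mean`, and it writes `a` for $F_1$ and `b` for $F_2$, both taken to be positive constants. All function and parameter names are illustrative.

```python
import math

def normal_pdf(s, mean, sd):
    """Density of a normal distribution with the given mean and sd at s."""
    z = (s - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def should_eat(s, p_mimic, a, b, mimic_mean=0.0, model_mean=1.0, sd=1.0):
    """Return True when f1(s)/f2(s) exceeds P(D) b / (P(M) a), as in (A5).

    a stands for F1 and b for F2; both are assumed positive here.
    The normal densities and their parameters are assumptions of this sketch.
    """
    if not (0.0 < p_mimic < 1.0) or a <= 0.0 or b <= 0.0:
        raise ValueError("need 0 < p_mimic < 1 and a, b > 0")
    likelihood_ratio = normal_pdf(s, mimic_mean, sd) / normal_pdf(s, model_mean, sd)
    threshold = (1.0 - p_mimic) * b / (p_mimic * a)
    return likelihood_ratio > threshold

def criterion_signal(p_mimic, a, b, mimic_mean=0.0, model_mean=1.0, sd=1.0):
    """Signal value at which the likelihood ratio equals the threshold.

    For equal-variance normals, log(f1/f2) is linear in s:
    log(f1/f2) = (mimic_mean - model_mean) * (s - midpoint) / sd**2,
    so the rule reduces to a single cut-off on the signal axis.
    """
    log_threshold = math.log((1.0 - p_mimic) * b / (p_mimic * a))
    midpoint = 0.5 * (mimic_mean + model_mean)
    return midpoint + log_threshold * sd * sd / (mimic_mean - model_mean)
```

Under these assumptions, raising `b` relative to `a` (a more noxious model, say) moves the cut-off away from the model mean, so fewer signals lead to an attack.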
(In our use of this result in the body of the paper, we have written $P(M_i \mid e_i)$, $P(D_i \mid e_i)$, $F_1(\eta, e_i, s)$ and $F_2(\eta, e_i, s)$ as $P(M)$, $P(D)$, $a$ and $b$ respectively.)

This result, shorn of the unwieldy structure we have made to accommodate the predator-prey situation, has the following simple, standard derivation: we observe $s$, which is distributed according to one of the densities $f_1$ and $f_2$; we have to guess which, and if we guess $f_1$ when it is really $f_2$ we lose $F_2(\eta, e_i, s)$, and if we guess $f_2$ when it is really $f_1$ we lose $F_1(\eta, e_i, s)$. We lose zero for a correct guess. To help us, we are told the distribution was chosen by a random mechanism which chooses $f_1$ with probability $P(M_i \mid e_i)$ and $f_2$ with probability $P(D_i \mid e_i)$. If we decide that, on observing $s$, we will guess the distribution to be $f_1$ with probability $p(s)$, our expected loss is $\int_S \{1 - p(s)\}\, F_1(\eta, e_i, s)\, f_1(s)\, ds$ if the distribution is really $f_1$, and $\int_S p(s)\, F_2(\eta, e_i, s)\, f_2(s)\, ds$ if it is really $f_2$; taking into account the probabilities of these two situations, our overall expected loss is
$$\int_S \bigl[ P(M_i \mid e_i)\,\{1 - p(s)\}\, F_1(\eta, e_i, s)\, f_1(s) + P(D_i \mid e_i)\, p(s)\, F_2(\eta, e_i, s)\, f_2(s) \bigr]\, ds.$$
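This overall expected loss can be evaluated numerically for a family of simple guessing rules. The sketch below is illustrative only: it assumes unit-variance normal densities with mimic signals centred at 0 and model signals at 1, treats $F_1$ and $F_2$ as constants `a` and `b`, and restricts $p(s)$ to cut-off rules (guess $f_1$ when $s$ is below a cut-off). The function names and the integration range are choices made for this example.

```python
import math

def normal_pdf(s, mean, sd=1.0):
    """Density of a normal distribution with the given mean and sd at s."""
    z = (s - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def expected_loss(cutoff, p_mimic, a, b, mimic_mean=0.0, model_mean=1.0,
                  lo=-8.0, hi=9.0, steps=4000):
    """Midpoint-rule approximation of the overall expected loss for the rule
    p(s) = 1 (guess mimic) when s < cutoff and p(s) = 0 (guess model) otherwise."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        s = lo + (k + 0.5) * width
        p = 1.0 if s < cutoff else 0.0
        # First term: prey is a mimic but was judged a model (loss a = F1).
        # Second term: prey is a model but was judged a mimic (loss b = F2).
        total += (p_mimic * (1.0 - p) * a * normal_pdf(s, mimic_mean)
                  + (1.0 - p_mimic) * p * b * normal_pdf(s, model_mean)) * width
    return total

def best_cutoff_on_grid(p_mimic, a, b, grid):
    """Return the cut-off in `grid` with the smallest approximate expected loss."""
    return min(grid, key=lambda c: expected_loss(c, p_mimic, a, b))
```

With these assumed densities the likelihood ratio $f_1(s)/f_2(s)$ equals $\exp(0.5 - s)$, so a cut-off on $s$ is the same thing as a threshold on the likelihood ratio, and the grid search can be compared directly with the threshold that appears in (A5).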
Clearly, to keep this as small as possible, we should choose $p(s)$ to be 1 when its coefficient is the smaller; i.e. when (A5) holds.

At present, the right side of (A5) depends on $s$. This could be the case if $s$ had an effect on future history apart from being an indicator of the type of prey, e.g. if $s$ was a warning cry. Even here, $s$ could cancel between numerator and denominator, i.e. its effect could be the same regardless of whether the predator eats the prey or not. If, however, $s$ does not have an effect on the future beyond indicating the type of prey (formally, if for any $i$ and any subset, $A$, of $H \times E$, not involving $k_i$ or $s_i$, $P_\sigma(A \mid k_i, s_i) = P_\sigma(A \mid k_i)$ for all $\sigma$ and for $k_i = M_i$ or $k_i = D_i$) and if the criterion function, $C(e)$, does not
involve any $s_i$ (one would expect it to involve only the types of prey eaten and the times they were eaten), then [referring to (A3), defining $F_1$ and $F_2$]

$$E_\eta[C \mid e_i, s, M_i(g)] = \int C\{e_i, t_i, s, x_i, M_i(g), R(e_i^0), \tilde e_i(\tilde\eta_i, \tilde h_i)\}\, dP_\eta\{\tilde h_i \mid e_i, s, M_i(g)\},$$

and our assumptions enable both $s$'s to be removed from the right side, which is thus independent of $s$. Similarly, the other constituents of $F_1(\eta, e_i, s)$ are independent of $s$.

The relation (A4) also depends on $F_1(\eta, e_i, s) > 0$. If $F_1 < 0$ and $F_2 > 0$, $\sigma(e_i, s) = 0$ for all $s$: eating any prey is disadvantageous to the predator. If $F_1 > 0$ and $F_2 < 0$, $\sigma(e_i, s) = 1$ for all $s$. If $F_1 < 0$ and $F_2 < 0$, then it is a disadvantage to eat a mimic, but an advantage to eat a model, an unlikely situation which is the only case that does not fit the form of (A4), i.e. to eat if $f_1(s)/f_2(s)$ is large enough. There are situations where $F_1 < 0$ or $F_2 < 0$ (but not both) may occur: if eating a prey could frighten other prey away we would have $F_1 < 0$; if eating a model provides useful information about the nature of the next prey arrivals we could have $F_2 < 0$. (A somewhat similar situation, choosing an option which has bad immediate consequences but possibly good long-term consequences, occurs when parents expose their children to mumps to prevent their catching it later.)

For many predators it is reasonable to assume (i) a meal takes no time, i.e. the predator misses no opportunities by eating a mimic (though eating a model may disable him for a time), and (ii) his history is unaffected by his actions, e.g. he does not frighten prey away.
Formally this means that the distributions $P_\sigma$ are identical and that the distributions of $(h_i^0, \tilde h_i)$ given $h_i$, and of $\tilde h_i$ given $(h_i, h_i^0)$ are independent, respectively, of $e_i$ and $(e_i, e_i^0)$. A slightly simpler probability structure is now possible: the predator is assigned a history $h$ according to the distribution $P$ on $H$; he chooses $\sigma$ to maximize $E[C\{e(\sigma, h)\}]$. It also seems reasonable to assume that, other things being equal, it is better to eat a mimic than not to: i.e. if $e$ and $e^*$ are two experiences differing only in that $e$ has $l_i = M_i(g)$ where $e^*$ has $l_i = N_i$, then $C(e) \ge C(e^*)$. [This predator's ability to choose is unaffected by past experience; he cannot make "bad" choices for good long-term results. We also assume $R(e)$ does not depend on the sequence $(l_1, \ldots)$; $R(e)$, $R(e_i^0)$, etc., will be omitted in what follows, to simplify notation.]

Suppose these assumptions hold, and that $\eta$ is optimal in the sense of (A1); we show that $F_1(\eta, e_i, s) < 0$ (we allow dependence on $s$ here) leads to a contradiction. Define $\eta^*$, for $j > i$, as follows:

$$\eta^*\{(e_i, s, M_i(g), \tilde e_{ij}), s_j\} = \eta\{(e_i, s, N_i, \tilde e_{ij}), s_j\} \tag{A6}$$

with $\eta^*(e_j, s_j) = \eta(e_j, s_j)$ if $l_i \ne M_i(g)$,
where $\tilde e_{ij}$ is the part of $e$ coming after the $i$th prey has been dealt with but before the $j$th has arrived. In what follows we abbreviate $\eta^*[e_i, s, M_i(g), \tilde e_{ij}, s_j]$ to $\eta^*_M(e_j, s_j)$, and the right side of (A6) to $\eta_N(e_j, s_j)$. The purpose of the definition, perhaps obscured by the notation, is that $\eta^*$ agrees with $\eta$ except if the $i$th prey is a mimic and is eaten, in which case it agrees with the value of $\eta$ for the corresponding experience with the $i$th prey not eaten. Then

$$\begin{aligned}
E[C\{e(\eta^*, h)\} \mid e_i, s, M_i(g)] &= \int C\{e_i, s, M_i(g), \tilde e_i(\eta^*_M, \tilde h_i)\}\, dP(\tilde h_i) \\
&= \int C\{e_i, s, M_i(g), \tilde e_i(\eta_N, \tilde h_i)\}\, dP(\tilde h_i) \\
&\ge \int C\{e_i, s, N_i, \tilde e_i(\eta_N, \tilde h_i)\}\, dP(\tilde h_i) \\
&= E[C\{e(\eta, h)\} \mid e_i, s, N_i] \\
&> E[C\{e(\eta, h)\} \mid e_i, s, M_i(g)]
\end{aligned}$$
since $F_1(\eta, e_i, s) < 0$. The first equality above follows from $\eta^*_M$ being, by definition, identical to $\eta_N$. The first inequality follows since $C(e) \ge C(e^*)$ if $e$ has $M_i(g)$ where $e^*$ has $N_i$. We thus have that $\eta^*$ is a strictly better procedure than $\eta$, contradicting the optimality of $\eta$ as defined in (A1). Thus, if $\eta$ is optimal, $F_1(\eta, e_i, s) \ge 0$.

Finally we suggest an example of the situation we have been describing. Suppose that at any time the predator enjoys a "well-being" function, $w(t)$: the amount of food in his gut, perhaps, or of stored carbohydrates. If no prey is eaten, $w(t)$ declines at some rate dependent, perhaps, on both $t$ and $w(t)$; if a mimic or alternative food is eaten, $w(t)$ jumps up, and if a model is eaten, $w(t)$ drops. We illustrate in Figure 3.
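The well-being process just described can be simulated in a few lines. The sketch below is a deliberately crude illustration and not the authors' model: time is discrete, the decline per step is constant, prey and alternative food arrive at random with hypothetical probabilities, the predator ignores signals and eats each prey with a fixed probability `eat_probability`, and death is assumed to occur when $w$ falls below a fixed `floor`. Every parameter name and default value is an assumption made for this example.

```python
import random

def simulate_wellbeing(eat_probability, steps=1000, seed=0, w0=10.0, decline=1.0,
                       p_prey=0.3, p_mimic=0.5, p_alt_food=0.2,
                       jump_mimic=4.0, jump_alt=3.0, drop_model=5.0, floor=0.0):
    """Simulate a discrete-time w(t).

    Returns (time_of_death, path), where time_of_death is None if w never
    falls below `floor` within `steps` steps and path lists w at each step.
    """
    rng = random.Random(seed)
    w = w0
    path = [w]
    for t in range(1, steps + 1):
        w -= decline                      # steady decline when nothing is eaten
        if rng.random() < p_alt_food:     # alternative food turns up and is eaten
            w += jump_alt
        if rng.random() < p_prey:         # a mimic or a model arrives
            is_mimic = rng.random() < p_mimic
            if rng.random() < eat_probability:
                # Eating a mimic raises w; eating a model lowers it.
                w += jump_mimic if is_mimic else -drop_model
        path.append(w)
        if w < floor:
            return t, path
    return None, path
```

Averaging the returned time of death over many seeds, for different values of `eat_probability`, gives a rough feel for how a strategy might affect expected length of life under these assumptions.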
[Figure 3: $w(t)$ plotted against $t$, with events marked A to E.]
Figure 3. A possible $w(t)$. Mimics have been eaten at A and C, giving a jump of $m$; alternate food at B gives a jump of $a$; a model at D gives a drop of $d$. The predator dies at E.

There are many possible features of $w(t)$ one might seek to maximize; one possibility is that the predator dies if $w(t)$ ever falls below a fixed amount, so he tries to choose a strategy to maximize his expected length of life. Even under simple conditions, this does not seem to be an easy problem; however, one might expect, intuitively, that if $w(t)$ were low the predator would have fewer inhibitions about eating, since his hunger forces him to take chances. This agrees with (A5): if $w(t)$ is low then the difference between eating a mimic and not eating may be quite large, so the chance is worth taking. Similarly, both (A5) and intuition suggest that if the prey tend mainly to be mimics, the chances are that a signal that suggests a model is more likely to come from an unusual mimic. The presence of abundant alternative food may make the predator more cautious since, even if $w(t)$ is low, there is a good chance of alternative food appearing before $w(t)$ reaches the critical point: in terms of (A2) and (A5), $E_\eta\{C \mid e_i, s, D_i(\sim g)\}$ is still quite large, so $F_1(\eta, e_i, s)$ (called "a" in the text) is not so large, nor $F_2(\eta, e_i, s)$ (= b) so small, that the predator is forced to take chances.

LITERATURE

Bellman, R. and R. Kalaba. 1965. Dynamic Programming and Modern Control Theory. New York: Academic Press.
Blough, D. S. 1967. "Stimulus Generalization as Signal Detection in Pigeons." Science, 158, 940-941.
Blough, D. S. 1972. "Recognition by the Pigeon of Stimuli Varying in Two Dimensions." J. Exp. Anal. Behav., 18, 345-367.
Boneau, C. A. 1974. "Paradigm Regained? Cognitive Behaviorism Restated." Amer. Psych., May 1974, 297-309.
Boneau, C. A. and J. L. Cole. 1967. "Decision Theory, the Pigeon and the Psychophysical Function." Psych. Rev., 74, 123-135.
Broadbent, D. E. 1971. Decision and Stress. London: Academic Press.
Brower, L. P., J. van Z. Brower and J. M. Corvino. 1967. "Plant Poisons in a Terrestrial Food Chain." Proc. Nat. Acad. Sci., U.S.A., 57, 893-898.
Brower, L. P., J. van Z. Brower and P. W. Westcott. 1960. "The Reactions of Toads (Bufo terrestris) to Bumblebees (Bombus americanorum) and their Robberfly Mimics (Mallophora bomboides)." Amer. Nat., 94, 343-355.
Brower, L. P., W. N. Ryerson, L. L. Coppinger and S. C. Glazier. 1968. "Ecological Chemistry and the Palatability Spectrum." Science, 161, 1349-1351.
Chase, S. and E. G. Heinemann. 1972. "Choices Based on Redundant Information: An Analysis of Two-Dimensional Stimulus Control." J. Exp. Psych., 92, 161-175.
Duncan, C. J. and P. M. Sheppard. 1965. "Sensory Discrimination and its Role in the Evolution of Batesian Mimicry." Behaviour, 24, 269-282.
Falconer, D. S. 1961. Introduction to Quantitative Genetics. New York: Ronald Press.
Feller, W. 1966. An Introduction to Probability Theory and its Applications, Vol. II. New York: Wiley.
Fisher, Sir R. A. 1958. Genetical Theory of Natural Selection, 2nd Edn. New York: Dover Publications.
Green, D. M. and J. A. Swets. 1966. Signal Detection Theory and Psychophysics. New York: Wiley.
Ibragimov, I. A. and Yu. V. Linnik. 1965. Independent and Stationarily Connected Random Variables. Moscow: Nauka (In Russian).
McNicol, D. 1972. A Primer of Signal Detection Theory. London: Allen & Unwin.
Nur, U. 1970. "Evolutionary Rates of Models and Mimics in Batesian Mimicry." Amer. Nat., 104, 477-486.
Stein, C. 1971. "A Bound for the Error in the Normal Approximation to the Distribution of a Sum of Dependent Random Variables." Proc. Sixth Berkeley Symp. Math. Stat. and Prob., Vol. 2. Berkeley: University of California Press.
Suboski, M. D. 1967. "The Analysis of Classical Discrimination Conditioning Experiments." Psych. Bull., 68, 235-242.
Swets, J. A., W. P. Tanner and T. G. Birdsall. 1961. "Decision Problems in Perceptions." Psych. Rev., 68, 301-340.
Terman, M. and J. S. Terman. 1972. "Concurrent Variation of Response Bias and Sensitivity in an Operant-psychophysical Test." Perception Psychophys., 11, 428-431.
Waldbauer, G. P. and J. K. Sheldon. 1971. "Phenological Relationships of Some Aculeate Hymenoptera, their Dipteran Mimics, and Insectivorous Birds." Evolution, 25, 371-382.
RECEIVED 4-2-74
REVISED 12-30-74