Animal Behaviour 82 (2011) 85–92
Animal Behaviour journal homepage: www.elsevier.com/locate/anbehav
The evolution of superstition through optimal use of incomplete information
Kevin R. Abbott*, Thomas N. Sherratt
Department of Biology, Carleton University
Article info
Article history: Received 16 November 2010; Initial acceptance 1 February 2011; Final acceptance 21 March 2011; Available online 8 May 2011; MS. number: A10-00872
Keywords: Bayesian learning; causal learning; exploration–exploitation trade-off; optimization; superstition; two-armed bandit problem
While superstitions appear maladaptive, they may be the inevitable result of an adaptive causal learning mechanism that simultaneously reduces the risk of two types of errors: the error of failing to exploit an existing causal relationship and the error of trying to exploit a nonexistent causal relationship. An individual’s exploration–exploitation strategy is a key component of managing this trade-off. In particular, on any given trial, the individual must decide whether to give the action that maximizes its expected fitness based on current information (exploit) or to give the action that provides the most information about the true nature of the causal relationship (explore). We present a version of this ‘two-armed bandit’ problem that allows us to identify the optimal exploration–exploitation strategy, and to determine how various parameters affect the probability that an individual will develop a superstition. We find that superstitions are more likely when the cost of the superstition is low relative to the perceived benefits, and when the individual’s prior beliefs suggest that the superstition is true. Furthermore, we find that both the total number of learning trials available and the nature of the individual’s uncertainty affect the probability of superstition, but that the nature of these effects depends on the individual’s prior beliefs. © 2011 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
* Correspondence: K. R. Abbott, Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, ON K1S 5B6, Canada. E-mail address: [email protected] (K.R. Abbott).
0003-3472/$38.00 doi:10.1016/j.anbehav.2011.04.002
… every man must judge for himself between conflicting vague probabilities. (Charles Darwin 1887, page 307)
Dave knew perfectly well that making five copies [of the chain letter] and sending them to his friends wasn’t going to bring him good luck. It was the bad luck he was worried about. … It’s a hard world. You can’t be too careful. It’s not such a big deal to make five photocopies. Even at forty-six cents, a stamp is still a bargain. (Stuart McLean 2000, page 42)
Superstitious behaviours can be defined as actions (or inactions) that are given in order to affect the probability that a beneficial outcome occurs when, in fact, there is no causal relationship between the action and the outcome (Skinner 1948). A stricter definition of a superstition (which, following Hood 2010, we henceforth refer to as a ‘supernatural superstition’) is one where there are no rational grounds to believe in a relationship between action and outcome, so that the agent’s prior belief is that the relationship is unlikely. World leaders, athletes and media celebrities have all admitted to engaging in such superstitions (supernatural and otherwise), ranging from carrying ‘lucky coins’ to not treading on white lines (reviewed in: Wargo 2008; Hood 2010). Superstitions are challenging to explain from an evolutionary perspective when the action carries a cost, because in these instances one might expect that individuals engaging in superstitions would have lower fitness than individuals not engaging in these behaviours. Several authors have resolved the paradox by acknowledging that individuals that wish to exploit causal relationships in the environment must rely on incomplete information about causality, and that superstitious behaviours may be an unavoidable deleterious side effect of adaptively utilizing this information (Killeen 1978; Beck & Forstmeier 2007; Foster & Kokko 2009). This information, albeit incomplete, can be generated by natural selection (i.e. instinct) (Foster & Kokko 2009), cultural transmission, personal learning, or a combination of all three (McNamara et al. 2006; Beck & Forstmeier 2007). Here we focus primarily, although not exclusively, on personally learned information, not least because learning strategies may play a dominant role in superstition formation (Beck & Forstmeier 2007).
Skinner (1948) identified what he considered superstitious behaviour in pigeons, namely idiosyncratic behaviours (including head swinging or turning anticlockwise) in advance of food delivery, despite the fact that food was provided at fixed intervals. In so doing, he proposed a well-known learning-based explanation for these behaviours, suggesting that they might be the result of chance early pairings of certain actions and beneficial outcomes. For example, if an outcome occurs by chance after the agent performs an action, this might suggest to the agent that the action increases the probability that the outcome occurs. Even if there is no causal relationship between the action and outcome, such superstitions can be maintained if the agent ceases to explore the consequences of alternative actions, because there is too much to lose if the relationship turns out to be valid. Nevertheless, despite Skinner’s experimentally inspired hypothesis, and multiple formulations of the same basic idea (reviewed in Wargo 2008), the relationship between optimal exploration–exploitation strategies and the development of superstitions remains almost entirely unexplored from a quantitative perspective. Previous models that have attempted to elucidate the adaptive significance of superstitious behaviour (Beck & Forstmeier 2007; Foster & Kokko 2009) did not consider the learning process explicitly. While Beck & Forstmeier (2007) did raise the possibility that an adaptive learning strategy might be important, they did not explore the idea in detail. In this paper, we address this shortfall. Specifically, we deal directly with how prior belief, chance events and the optimal use of incomplete information can combine to generate superstitious behaviours over multiple trials, rather than considering costs and benefits in one-off decisions (Foster & Kokko 2009).
Consider an individual that is attempting to determine whether a given action is causally related to a given outcome. What the individual needs to learn is whether the outcome is more or less likely to occur if the action is performed than if some other action is given (including nonaction). This requires exploring (i.e. testing) all possible actions. However, on most trials, the information the individual has already acquired will tentatively suggest that one particular action is associated with the highest fitness returns.
This can create a trade-off between giving the response that maximizes expected fitness on any given trial based on current knowledge (exploitation) and giving the response that provides the most information (exploration) about the true nature of the causal relationship between the individual’s actions and outcomes (Cohen et al. 2007). As we will see, the payoff-maximizing strategy should include a mechanism that allows an individual to explore in early trials and shift to exploitation in later trials.
MODEL DESCRIPTION AND RESULTS
In this paper, we present and analyse a model of optimal exploration–exploitation strategies for an individual learning the causal relationship between actions and outcomes. To make the model as tractable and informative as possible, we make several simplifying assumptions about the nature of proximate costs and benefits associated with actions and outcomes. We assume that there are only two possible mutually exclusive actions that the individual can perform on any given trial: one that is costly, Ac, and one that is cost-free, Af. One of these actions can, as necessary, be considered a ‘nonaction’ (i.e. the behaviour of not adopting the alternative action). We also assume there are only two possible mutually exclusive outcomes to a trial: a relatively beneficial outcome, O+, and an outcome, O−, that is less beneficial relative to O+. For simplicity, we arbitrarily assume that the value of O− is 0, the benefit of O+ is b = 1, and the cost of Ac, c, is expressed as some fraction of b. There are two relevant conditional probabilities, Pr(O+|Ac) and Pr(O+|Af). We assume that the individual knows the value of one of these conditional probabilities (the case where both values are known is trivial; the case where there is uncertainty associated with both values is interesting, but complex, and is left for future papers). In the next subsection (The Costly Exploration Case), we develop the model and describe the results for the case where Pr(O+|Ac) is unknown. We then briefly describe how the model can be modified for the case where Pr(O+|Af) is unknown (see The Cheap Exploration Case) and show the relevant results. Finally, we describe how the results differ between the two cases (see Costly versus Cheap Exploration).
The Costly Exploration Case
Model description
If Af represents cost-free nonaction, then an individual will probably have had extensive experience with Af, and should therefore know the probability pk = Pr(O+|Af) with certainty (see Table 1 for a description of all mathematical terminology). At the same time, if the costly action, Ac, is relatively novel, then the individual will not know the exact size of the (hence unknown) probability pu = Pr(O+|Ac). We refer to this as the ‘costly exploration’ case. Carrying lucky charms may reflect this combination of conditions. For example, carrying a lucky charm, Ac, imposes a cost (albeit an extremely small one, in terms of purchasing, packing and keeping track of the charm) relative to not carrying the charm, Af. At the same time, individuals will probably initially have more experience with Af, and therefore should have far greater knowledge of the value of Pr(O+|Af) than of Pr(O+|Ac). Giving the action associated with the known conditional probability (Af in the current case) provides a known rate of return, but it provides no further information about the value of pk. Giving the action associated with the unknown conditional probability (referred to as exploration) does, however, provide information about the true value of pu as well as a potential reward.
Table 1
Description of model parameters and variables, with value/details for the costly exploration case (e.g. lucky charms) and the cheap exploration case (e.g. the number 13)
O+: relatively profitable outcome
O−: relatively unprofitable outcome
Ac: relatively costly action. Costly exploration case: costly novel action. Cheap exploration case: costly familiar action (e.g. actively avoiding Af)
Af: relatively cost-free (cheap) action. Costly exploration case: cheap familiar action (e.g. passively avoiding Ac). Cheap exploration case: cheap novel action
b: benefit of O+ relative to O−. Both cases: 1
c: cost of Ac relative to Af. Both cases: represented as some fraction of b
pk: true and known probability of getting the beneficial outcome, O+, with the highly familiar action. Costly exploration case: Pr(O+|Af). Cheap exploration case: Pr(O+|Ac)
pu: true, but unknown, probability of getting the beneficial outcome, O+, with the novel action. Costly exploration case: Pr(O+|Ac). Cheap exploration case: Pr(O+|Af)
a, b: parameters of the Beta prior pu distribution that defines the individual’s prior beliefs about the true value of pu
N: total number of trials available
n: current number of exploration trials. Costly exploration case: number of Ac responses. Cheap exploration case: number of Af responses
r: current number of exploration trials on which the beneficial outcome, O+, occurred
Eu(r,n): expected value of the Beta distribution that defines the individual’s current beliefs about the true value of pu. Both cases: (a + r)/(a + b + n)
F(r,n): the maximum expected return on all future trials. Costly exploration case: max((N − n)pk, S(r,n) − c). Cheap exploration case: max((N − n)(pk − c), S(r,n))
S(r,n): the long-term expected return if the individual performs at least one more exploration trial. Both cases: Eu(r,n)(1 + F(r + 1, n + 1)) + (1 − Eu(r,n))F(r, n + 1)
We assume the individual has a set of prior beliefs about the true value of pu, characterized as a Beta probability distribution (we refer to this as the prior pu distribution throughout this paper). On exploratory trials, where the presence or absence of the beneficial outcome is observed, we assume standard binomial Bayesian updating (Bolker 2008), under which the individual’s updated knowledge about pu can also be described by a posterior Beta probability distribution (see Appendix for further details). Specifically, after n exploration trials, r of which resulted in the beneficial outcome, O+, and n − r of which resulted in O−, the expected value of the posterior probability distribution is given by Eu(r,n) = (a + r)/(a + b + n) (see equation (A1)). Note that Eu(0,0) gives the expected value of the prior pu distribution before any exploration trials (we use pu to describe the true value of the relevant conditional probability, and Eu(r,n) to describe the individual’s expectation based on current belief about pu; we assume that the individual’s current belief about pk and the true value of pk are the same, so no additional notation is required).
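The Beta-binomial updating described above can be illustrated with a minimal Python sketch. This is not the authors' code. It computes the posterior mean Eu(r,n) = (a + r)/(a + b + n) and recovers the Beta parameters a and b from a chosen prior mean and variance by the method of moments. The function names are hypothetical. The method-of-moments step is our assumption about one way to hold the prior variance fixed while varying Eu(0,0). It does, however, reproduce the a = b = 4.5 quoted for Eu(0,0) = 0.5 and variance 0.025 in the Fig. 1 caption.

```python
def beta_params_from_moments(mean, var):
    """Hypothetical helper: Beta(a, b) with the given mean and variance (method of moments)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0  # this is a + b
    return mean * total, (1.0 - mean) * total


def posterior_mean(a, b, r, n):
    """Eu(r, n): expected pu after r beneficial outcomes in n exploration trials."""
    return (a + r) / (a + b + n)


a, b = beta_params_from_moments(0.5, 0.025)  # approximately 4.5 and 4.5
print(a, b)
print(posterior_mean(a, b, 3, 5))  # prior mean pulled towards the observed 3/5
```

Because the prior contributes a + b "pseudo-trials", a prior with smaller variance (larger a + b) moves more slowly in response to the same run of outcomes.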
We analyse the effects of the known conditional probability and the nature of the uncertainty about the unknown conditional probability by varying the value of pk and the shape of the prior pu distribution. For a given analysis, we hold the variance of the prior pu distribution constant so that varying the shape of the prior pu distribution is equivalent to varying the value of Eu(0,0). We also assume that the individual knows the total number of trials available, N. Collectively, these assumptions make our system a two-armed bandit problem (McNamara & Houston 1980), where the optimal exploration–exploitation strategy can be identified using a combination of the appropriate learning rules (see above) and dynamic programming (Jones 1978; see Appendix for further details). It has long been recognized that the optimal solution to this specific type of problem consists of regions of stopping and continuation, separated by a single boundary (Jones 1978), so that for each number of trials, n, there is a single threshold of beneficial outcomes, r, below which the optimal individual will permanently cease exploration (e.g. Fig. 1). If exploration (performing the costly Ac in this case) is terminated, then the agent will henceforth only select the action (the cost-free Af in this case) with known probability of beneficial outcome, a process of exploitation. If exploration is not terminated, then the exploitation phase is harder to delineate. Selecting the action with unknown probability of beneficial outcome can arise either because the future value of gaining the information is deemed sufficiently high (exploration), or because the agent perceives it as the action that gives the highest immediate reward (exploitation), or some mixture of the two. On the last trial, there is clearly no future value to the information, and the agent should simply choose the action with the highest perceived net reward (i.e. behaviour on the final trial is always exploitative).
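The dynamic program for the costly exploration case can be sketched in Python, written directly from the F(r,n) and S(r,n) recursions in Table 1. It uses the terminal condition F(r,N) = 0, which follows from F being the return over the remaining trials. This is an illustration, not the authors' implementation. The names solve_costly and critical_line are hypothetical, and breaking ties in favour of stopping is our own choice. For each n, critical_line returns the smallest r at which exploration continues, which is the kind of stopping boundary plotted in Fig. 1.

```python
from functools import lru_cache


def solve_costly(prior_a, prior_b, pk, c, N):
    """Costly exploration case (benefit b = 1, value of O- = 0).

    Returns (F, explore): F(r, n) is the maximum expected return over the
    remaining N - n trials; explore(r, n) is True if one more trial of the
    novel, costly action Ac beats stopping and using Af for the rest.
    """

    def eu(r, n):
        # Posterior mean of pu after r beneficial outcomes in n exploration trials.
        return (prior_a + r) / (prior_a + prior_b + n)

    @lru_cache(maxsize=None)
    def F(r, n):
        if n == N:  # no trials left
            return 0.0
        return max((N - n) * pk, S(r, n) - c)

    @lru_cache(maxsize=None)
    def S(r, n):
        p = eu(r, n)
        return p * (1.0 + F(r + 1, n + 1)) + (1.0 - p) * F(r, n + 1)

    def explore(r, n):
        # Strict inequality: ties are resolved in favour of stopping.
        return S(r, n) - c > (N - n) * pk

    return F, explore


def critical_line(explore, N):
    """For each n < N, the smallest r at which exploration continues (None if never)."""
    return [next((r for r in range(n + 1) if explore(r, n)), None) for n in range(N)]


# Parameter values of the thin solid line in Fig. 1 (a = b = 4.5, pk = 0.5, c = 0.05, N = 25).
F, explore = solve_costly(4.5, 4.5, pk=0.5, c=0.05, N=25)
print(critical_line(explore, 25))
```

Memoizing F and S keeps the program at O(N²) states, which is small for the values of N used in the paper (up to 50).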
The fact that the prior pu distribution includes a range of hypothesized pu values from 0 to 1 implies that the individual entertains three possibilities for the response with unknown probability of beneficial outcome (Ac in this case). It could serve to prevent the beneficial outcome O+ (pu < pk), it could help cause O+ (pu > pk), or it could have no causal relationship with O+ (pu = pk). This also suggests that the individual could make a range of errors involving overestimating or underestimating the preventative or causal power that either response has over O+. However, for simplicity, and because of how we have defined superstitions (actions given to bring about beneficial outcomes, or prevent deleterious outcomes, when actions and outcomes are not causally related), our analysis focuses on cases where the true value of pu = pk, so that the only error of interest is giving the costly response Ac when the action has no effect.
It will also be useful to classify whether an individual’s prior state of belief suggests that the optimal response is Ac or Af. If benefit b = 1, then Ac is optimal any time Pr(O+|Ac) − c > Pr(O+|Af). Therefore, in the current case, individuals will start with a prior belief that the benefits of Ac outweigh the costs whenever Eu(0,0) − c > pk (we refer to this as prior belief). They will start believing that Ac is not worth the cost when Eu(0,0) − c < pk (prior disbelief). Prior disbelief reflects a ‘supernatural superstition’ if sampling the action with unknown probability of reward continues despite this initial belief that it is not cost effective. Prior indifference is the border region where Eu(0,0) − c = pk. Note that in this terminology, it is possible for an individual to suspect that Ac causes O+, but still be in a state of prior disbelief because the expected causal power of Ac is not enough to justify the cost. Throughout this paper, prior belief, disbelief and indifference refer to whether or not an individual starts with the belief that the costly Ac is the optimal response. This is related to, but (for c > 0) not the same as, the individual’s initial expectation of the causal power that Ac has over O+.
We ran forward iterations (5000 replicates per parameter value combination) based on the optimal exploration–exploitation strategies identified for a set of parameter values by the dynamic program (see Appendix). We classified an individual in a forward iteration as having developed a superstition if it was still giving the costly response, Ac, on the final trial (n = N) despite the fact that no causal relationship existed between the actions and outcomes. Focusing on the last trial captures the sense that superstitions are persistent and resistant to extinction by evidence. From these data, we calculated the probability that a superstition develops for a range of c, N, pk (= pu), and Eu(0,0).
[Figure 1: X axis, number of exploration trials (n), 0–25; Y axis, number of successful trials (r), 0–25]
Figure 1. Optimal decision rules: the optimal exploration–exploitation strategy of an individual when the prior pu distribution is symmetrical around Eu(0,0) = 0.5 with variance = 0.025 (a = b = 4.5) and the known probability, pk = 0.5. Parameter values: total number of trials, N = 25, cost, c = 0.05 (thin lines) or 0.15 (thick lines), for both the costly exploration (solid lines) and the cheap exploration cases (dashed lines). This optimal strategy is shown in terms of the combinations of number of exploration trials, n, and number of beneficial outcomes, r, for which the individual will (above the relevant line) or will not (below the relevant line) continue to explore the informative response. An individual’s experience in a causal learning problem can be visualized as a path on or below the line r = n (dotted line). Exploration ends if this path crosses below the relevant line into the exploitation region. In the costly exploration case, an individual is counted as superstitious if its path reaches the right boundary without passing into the exploitation region at any point. In the cheap exploration case, an individual is counted as superstitious if at any point its path crosses into the exploitation region. The exploitation region naturally clusters in the area where n is relatively large and r is relatively small, and the smaller this region, the greater the willingness to explore. Note that it is possible to get a strategy where the point r = n = 0 is in the exploitation region, so that individuals will not explore the action with unknown probability of reward for even one trial. In that case, the probability of superstition is 0 in the costly exploration case (as demonstrated by the thick solid line) and 1 in the cheap exploration case. Conversely, it is possible to get a strategy where there is no exploitation region (probability of superstition = 1 in the costly exploration case and probability of superstition = 0 in the cheap exploration case).
Results
The simplest, and most obvious, results have to do with the effect of the cost of Ac, c, on the probability of superstition. An extensive search of parameter space suggests that small c consistently favours higher probability of superstitions compared to large c. The reason for this result can be understood in terms of Fig. 1 (solid lines for the current costly exploration case). When exploration is costly, the superstitious response is informative, but costly. When the cost of Ac is low, individuals are more willing to continue exploring the informative response, even if prior experience does not strongly support the superstitious belief that Ac causes the beneficial outcome, O+. In Fig. 1 this is demonstrated by the fact that the solid critical line shifts towards the X axis, and the size of the exploration region expands, as c decreases. In the current case, superstitions are effectively synonymous with a continued willingness to explore the unknown, so an expansion of the exploration region increases the probability of superstition.
The left column of Fig. 2 shows the effect of degree of prior belief (Eu(0,0) relative to pk) and the total number of trials, N, on the probability of superstition for the costly exploration case. As might be expected, the probability of superstition is higher when the individual has a prior belief that the costly action, Ac, is the optimal response (region above the dotted lines in Fig. 2). Superstitions are still possible with prior indifference (dotted lines shown in Fig. 2). They are possible even when individuals have a prior belief that Ac is the suboptimal response (prior disbelief reflecting a supernatural superstition; region below the dotted lines in Fig. 2). Figure 2, as well as an extensive search of parameter space, suggests that the effect of N on the probability of superstition depends on the individual’s prior belief about which response is optimal in the current case. In particular, individuals with prior belief or prior indifference are more likely to develop a superstition if N is small, but individuals with strong prior disbelief are more likely to develop a superstition if N is large. Note that the effect of N is not easily characterized when individuals have weak prior disbelief, in particular when the individual believes that Pr(O+|Ac) − c < Pr(O+|Af) < Pr(O+|Ac). The interaction between degree of prior belief and N can be qualitatively understood. When an individual has a strong prior belief that Ac is the optimal response, extensive exploration is required for the individual to learn that pu ≈ pk. Under these conditions, individuals will only lose this prior belief if N is large enough to favour (or even permit) extensive exploration. Therefore, when N is small, individuals will fail to lose their prior belief and will perform Ac on the final trial (i.e. will be counted as having a superstition). Conversely, when individuals have a prior belief that Ac is the suboptimal response (prior disbelief in the terminology given above), extensive exploration is required for the individual to acquire the belief that the causal power of Ac over O+ justifies the costs. This means that individuals might perform Ac on the final trial only if N is sufficiently large. Note that for any given pair of actions, the chance sequence of events required to shift an individual from prior disbelief to belief becomes more unlikely the stronger the preconception, so, as mentioned above, superstitions (‘supernatural superstitions’) are generally unlikely in any given case with prior disbelief.
[Figure 2: six panels; columns, costly exploration case (left) and cheap exploration case (right); rows, N = 50 (top), 10 (middle) and 2 (bottom); X axis, Pr(O+|Af) (pk or Eu(0,0)), 0.3–0.7; Y axis, Pr(O+|Ac) (Eu(0,0) or pk), 0.3–0.7; scale 0–1, probability of superstition]
Figure 2. Probability of superstition: the probability that individuals using the optimal exploration–exploitation rules develop a superstition as a function of each individual’s prior beliefs, when the total number of trials is N = 2 (bottom row), 10 (middle row) or 50 (top row), for both the costly exploration (left column) and the cheap exploration (right column) cases. Here cost, c = 0.05, and the variance of the prior pu distribution = 0.025. In all panels, the X axis gives the individual’s prior belief (i.e. on trial 0) that the beneficial outcome, O+, will follow the relatively cheap action, Af (i.e. pk in the costly exploration case and Eu(0,0) in the cheap exploration case). Conversely, the Y axis gives the individual’s prior belief that the beneficial outcome, O+, will follow the relatively costly action, Ac (Eu(0,0) in the costly exploration case and pk in the cheap exploration case). This means that in all panels, the region above the dotted white line represents prior belief, the area below represents prior disbelief, and the dotted line itself represents prior indifference (see text).
The Cheap Exploration Case
Model description
Suppose Ac involves paying some cost to avoid some novel state or environment, and Af involves passively accepting this novel state. The individual should then tend to know the probability of the beneficial event given the costly action, pk = Pr(O+|Ac), but be less certain about the probability of the beneficial outcome given the cheap action, pu = Pr(O+|Af) (see Table 1 for a comparison of the two cases). We refer to this as the cheap exploration case. An example of this involves avoiding supposedly unlucky symbols such as the number 13. The act of avoiding these symbols, Ac, imposes a cost (such as arranging to swap one’s airline seat) relative to nonavoidance, Af. However, as encounters with these symbols are relatively rare, individuals should already have considerable (cost-free) experience with the outcomes when the unlucky symbol is not selected. They should therefore have greater knowledge about the value of Pr(O+|Ac) than of Pr(O+|Af). Therefore, in contrast to the costly exploration case, here it is the relatively cheap action, Af, that provides information about the true value of pu (i.e. exploration). Conversely, in the current case, it is the costly action, Ac, that provides reward at a known rate, but no further information about the value of pk. The analysis of this case is very similar to the analysis described for the costly exploration case. It is effectively a matter of switching which conditional probability is associated with pk versus pu (the Appendix highlights these small differences in the set-up of the two models). This switch does, however, affect the technical definition of prior belief or disbelief, as originally given for the costly exploration case. For the current case, the individual has a prior belief (disbelief) that Ac is the optimal action when (for b = 1) Eu(0,0) < pk − c (Eu(0,0) > pk − c).
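The forward iterations and the prior-belief definitions for both cases can be sketched in Python under stated assumptions. The sketch has three parts:

* make_policy builds the optimal stopping rule from the Table 1 recursions. The cheap case differs only in that the known action returns pk − c per trial and sampling the novel action is cost-free.
* prob_superstition simulates individuals with pu = pk and applies the superstition criteria given in the Fig. 1 caption.
* prior_state applies the prior belief, disbelief and indifference definitions for each case.

All names, the random seed, the tolerance and the tie-breaking rule are illustrative assumptions, not the authors' code. The sketch makes no claim to reproduce the values in Fig. 2.

```python
import random
from functools import lru_cache


def make_policy(case, prior_a, prior_b, pk, c, N):
    """Optimal stopping rule for one case (b = 1, O- = 0); returns explore(r, n)."""
    if case == "costly":
        known_rate, explore_cost = pk, c        # known action Af is free; sampling Ac costs c
    elif case == "cheap":
        known_rate, explore_cost = pk - c, 0.0  # known action Ac costs c; sampling Af is free
    else:
        raise ValueError("case must be 'costly' or 'cheap'")

    def eu(r, n):
        return (prior_a + r) / (prior_a + prior_b + n)

    @lru_cache(maxsize=None)
    def F(r, n):
        if n == N:
            return 0.0
        return max((N - n) * known_rate, S(r, n) - explore_cost)

    @lru_cache(maxsize=None)
    def S(r, n):
        p = eu(r, n)
        return p * (1.0 + F(r + 1, n + 1)) + (1.0 - p) * F(r, n + 1)

    def explore(r, n):
        # Strict inequality: ties are resolved in favour of stopping (our choice).
        return S(r, n) - explore_cost > (N - n) * known_rate

    return explore


def prob_superstition(case, prior_a, prior_b, pk, c, N, reps=5000, seed=1):
    """Forward iterations with no causal relationship (pu = pk)."""
    explore = make_policy(case, prior_a, prior_b, pk, c, N)
    rng = random.Random(seed)
    pu = pk
    superstitious = 0
    for _ in range(reps):
        r = n = 0
        while n < N and explore(r, n):
            if rng.random() < pu:  # beneficial outcome O+ on this exploration trial
                r += 1
            n += 1
        stopped_early = n < N
        if case == "costly":
            # Superstitious if still performing Ac (exploring) on the final trial.
            superstitious += 0 if stopped_early else 1
        else:
            # Superstitious if it switched permanently to the costly Ac at any point.
            superstitious += 1 if stopped_early else 0
    return superstitious / reps


def prior_state(case, eu0, pk, c, tol=1e-9):
    """Prior belief, disbelief or indifference that Ac is the optimal action (b = 1)."""
    if case == "costly":
        diff = (eu0 - c) - pk  # Eu(0,0) - c versus pk = Pr(O+|Af)
    elif case == "cheap":
        diff = (pk - c) - eu0  # pk - c = Pr(O+|Ac) - c versus Eu(0,0)
    else:
        raise ValueError("case must be 'costly' or 'cheap'")
    if diff > tol:
        return "belief"
    if diff < -tol:
        return "disbelief"
    return "indifference"


if __name__ == "__main__":
    for case in ("costly", "cheap"):
        p = prob_superstition(case, 4.5, 4.5, pk=0.5, c=0.05, N=25)
        print(case, prior_state(case, 0.5, 0.5, 0.05), p)
```

The default of 5000 replicates mirrors the number of forward iterations stated in the text. With a fixed seed, the estimates for the two cases can be compared on identical random sequences.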
Furthermore, prior indifference is defined as Eu(0,0) = pk − c in the current case.
Results
In the current cheap exploration case, the critical line shown in Fig. 1 (dashed lines) shifts towards the positive diagonal as the cost of Ac, c, decreases. Note that this effect of c on these critical lines differs from that of the costly exploration case. Nevertheless, the effect of c on the probability that an individual develops a superstition is the same in both cases: superstitions are less likely with higher c. In the current case, superstitions develop when individuals cross the critical line and permanently cease exploring the relatively cheap action, Af. Therefore, when c is small, the critical line has a higher slope and the nonsuperstitious exploration region is small, which means that the probability of superstition is high. The right column of Fig. 2 shows the effect of degree of prior belief (Eu(0,0) relative to pk) and the total number of trials, N, on the probability of superstition for the current case. As in the costly exploration case, the probability of superstition in the current case is positively related to the degree of prior belief in the optimality of the costly action, Ac, although prior belief is not required for the development of a superstition. Similarly, the effect of N on the probability of superstition is qualitatively the same for the current case as for the costly exploration case (i.e. with prior belief, superstitions are more likely with small N; with prior disbelief, superstitions are more likely with large N). The justification for the interacting effect of N and degree of prior belief given for the costly exploration case also applies to the current case.
Costly versus Cheap Exploration
A comparison of the two columns in Fig. 2 suggests that there are systematic differences between the probability that a superstition develops in the costly exploration case and the comparable cheap exploration case. In particular, for equal values of cost, c, total number of trials, N, and degree of belief in the value of Ac, the probability of superstition is generally higher for the cheap exploration case than the corresponding costly exploration case. While not immediately obvious, this result makes sense. Costly exploration-type superstitions require the individual to explore (i.e. acquire information about the true nature of the causal relationship between action and outcome) for all N trials. By contrast, cheap exploration-type superstitions simply require a one-off determination, at any stage of the sampling process, that Ac is the optimal action and no further exploration will take place. In effect, costly exploration-type superstitions are harder to maintain because the individual must stay in the exploration region depicted in Fig. 1 for all N trials, whereas cheap exploration-type superstitions only require one shift into the exploitation region at any point during the N trials. It also makes sense that in Fig. 2 the exception to this trend (ignoring trivial exceptions involving ceiling and floor effects) involves low N (compare the upper right end of the prior indifference lines in the bottom row of Fig. 2). An extensive exploration of parameter space confirms that these exceptions are limited to very low N (< 5), as well as to situations with prior indifference where the individual expects O+ to occur at high rates regardless of the action given.
DISCUSSION
Model Predictions and Empirical Results
The results of our model can be used to understand when superstitions will arise and what types of superstitions should be more common. They also show how superstitions might be considered unfortunate, but unavoidable, side-effects of optimal exploration–exploitation strategies. First, our model predicts that existing superstitions should be cheap relative to their perceived benefits.
For example, in the costly exploration case, the model suggests that superstitions that involve carrying small, lightweight lucky charms might persist because the same general learning rules for identifying causal relationships in other settings are advantageous, while here they do next to no harm. Similarly, in the cheap exploration case, avoiding the number 13 may impose a relatively small cost with potentially large benefit, which might explain why this superstition persists (see also the Stuart McLean quote above). Note that cost is scaled to the benefits in our analyses, meaning that costly superstitions can be maintained in cases where the potential benefit is high. The prediction that superstitions will be more common when costs are low is shared with a range of previous optimality models (see Beck & Forstmeier 2007; Foster & Kokko 2009), and is supported by numerous empirical findings. As the cost of superstitions is not generally measured, then the prediction tends to be interpreted in the equivalent form ‘superstitions are more common when the benefits are high’. For example, Rudski & Edwards (2007) found that students reported a greater likelihood of engaging in superstitious rituals to influence the outcome of an exam when the exam was worth a greater portion of the final grade. In a classic observation, Malinowski (1954) reported that Trobriand Islanders had numerous superstitious rituals associated with open sea fishing (a dangerous activity), but none associated with fishing in a lagoon (a safe activity). Similarly, Keinan (1994) found that Israeli citizens that lived in areas exposed to missile attacks during the first Gulf War reported more superstitious beliefs than comparable citizens living in safer areas (for other examples, see Killeen 1978; Padgett & Jorgenson 1982; Biner et al. 1995). Second, our model predicts that superstitions are more likely to develop around causal relationships that, while nonexistent, are
plausible a priori (Beck & Forstmeier 2007; Foster & Kokko 2009), although we note that (supernatural) superstitious actions may still develop in individuals that are initially sceptical of their benefits. The individual’s prior beliefs about the superstition are defined by the shape of the prior pu distribution relative to the value of the known probability, pk. This prior pu distribution can be shaped by the individual’s (i.e. personal learning) and/or the species’ (i.e. natural selection) experience with causal and noncausal relationships that have existed in past environments (McNamara et al. 2006; Foster & Kokko 2009). In addition, it is possible that a superstition developed de novo by one individual might, through cultural transmission, affect the prior pu distribution of other individuals in a way that makes it more likely that they develop the same superstition (Beck & Forstmeier 2007). Thus, while ‘plausibility’ can derive directly from an understanding (instinctive or learned) of how nature works, it can also derive from what others believe. This might explain why some superstitions (such as carrying rabbit feet, or throwing salt over shoulders) are so common within a population, rather than each individual having its own unique set of superstitions. While the effect of prior belief is not as obvious as the effect of costs and benefits, it is hardly unexpected. However, because previous studies do not seem to have manipulated, or measured, anything like prior belief, there is less direct support for this prediction. There is one category of result that does provide indirect support for the predicted effect of prior belief. In particular, if individuals are highly confident of getting the beneficial outcome, O+, there is little room for a superstitious behaviour to increase the probability of O+, and thus prior confidence could be considered a form of prior disbelief, which should be associated with a low probability of developing a superstition.
There are several studies that show this expected negative association between confidence and superstitions. For example, Rudski & Edwards (2007) found that individuals reported that they were less likely to use superstitious rituals to influence the outcome of easy tasks, or of tasks for which they were better prepared, than of hard tasks. Similarly, among baseball players, Ciborowski (1997) reported that superstitions were less common when the team was likely to win, and Gmelch (1971) found that fielding superstitions were rare, but batting superstitions were common (in any given play, the probability of success of a fielder can be greater than 90%, whereas in batting the probability of success could be less than 30%). Note that to the extent that confidence is the equivalent of the known probability, pk, and that there is some room for a novel action to cause a reduction in the probability of getting the beneficial outcome, O+, this argument and these examples are consistent only with the costly exploration case. As discussed below, it is not always obvious which type of superstition (i.e. costly or cheap exploration) a given example represents, or whether real-world superstitions can really be categorized dichotomously, rather than based on a more continuous classification, with different degrees of certainty in the chance of a beneficial outcome arising from two (or more) alternative actions. Third, our results suggest that the effect of the number of opportunities, N, to evaluate the consequences of giving the action with unknown probability of outcome depends on the degree of prior belief. In particular, with prior belief, low N favours the development of superstitions, while with prior disbelief, high N favours the development of superstitions. Many superstitions arise in situations where N is relatively low (such as not shaving beards in hockey playoff finals).
In these cases, the individuals concerned do not expect to have many opportunities to exploit (or learn about) the causal relationship (or lack thereof) and may therefore rely on culturally influenced (e.g. team) routines to guide them. While prior disbelief does not favour the development of superstitions, our results suggest that a large N makes the development
of this type of superstition more likely because it gives the individual enough time to shift from a state of disbelief to belief. Thus, it may be that rare ‘a priori implausible’ superstitions are generally based around situations that the individual expects to encounter frequently and in which it has already witnessed some chance beneficial effects. We know of no data related to these predictions, but it is interesting to note that the previous analytical models of the optimality of superstitious behaviours focus on extremely low N cases (i.e. N = 1) and suggest that superstitions are optimal only when plausible. For example, Foster & Kokko’s (2009) model 1 proposes a requirement for superstition that is identical to our definition of prior belief. Similarly, Pascal’s wager (Foster & Kokko 2009) and its variants, such as the question of whether body-preserving cryogenics is an optimal investment (Shaw 2009), also consider cases in which N = 1, and effectively find the associated behaviours (religious behaviours, paying for cryogenic preservation) to be optimal in cases where our definition of prior belief holds. Interestingly, many of these N = 1 cases would not be classified as superstitions by most people. As Foster & Kokko (2009) suggested, these behaviours are more appropriately thought of as a ‘good wager’ (page 33). Furthermore, our definition of superstitions requires some objective knowledge that there is no causal relationship, and with Pascal’s wager and its cryogenics variant, we are unable to identify the outcome of any of the single trials allotted to all individuals, and are thus unable to objectively determine the true nature of the causal relationship. Lastly, our results seem to suggest that, all else being equal, cheap exploration-type superstitions (e.g. avoiding the number 13) should generally be more common than costly exploration-type superstitions (e.g. carrying lucky charms).
However, with prior belief and low N (the conditions that generate the most superstitions; see above), there is little difference between the frequencies of the two types of superstitions. Furthermore, with prior indifference and low N, superstitions are more likely in the costly exploration case. It is therefore difficult to make strong predictions about which type the relatively common ‘plausible’ superstitions should tend to be. The model does predict that the rarer ‘implausible’ superstitions should tend to be of the cheap exploration type. We know of no previous data set that could be applied to these predictions, but this is not surprising for two reasons. First, previous authors do not seem to have classified superstitions according to whether the unknown action (i.e. exploration) is a cheap or costly response. Second, as we highlight here, our predictions about the relative frequency of the two types of superstitions are somewhat equivocal. We suspect that strong predictions about what forms of uncertainty favour the development of superstitions are more likely to come from a more realistic model that acknowledges that individuals will never perfectly know the value of the conditional probabilities associated with any of the possible actions (i.e. there is no known probability, pk, and all actions are exploratory or informative).
Comparison of Models
To our knowledge, this is the first time that the solution techniques for two-armed bandit problems have been applied to understand the formation and persistence of superstitions, and they appear very well suited to the task. There are, however, other complementary ways to view this type of problem. For example, causal learning can be thought of as a form of signal detection (Green 1966), where causal relationships are the signal to be detected (Killeen 1978; Allan et al. 2005, 2008; Beck & Forstmeier 2007; Foster & Kokko 2009) and reducing one type of error (i.e.
failing to exploit an existing causal relationship) will increase the other type of error (i.e. superstitiously trying to exploit a nonexistent causal relationship). While the signal detection approach is conceptually
appealing, it is not particularly useful as a modelling approach for this problem. In particular, the dynamic programming approach that we have used here seamlessly deals with the optimal learning and decision strategies (i.e. exploration and exploitation) whereas traditional signal detection theory focuses only on the optimal decision aspect of the problem. Alternatively, Stephens (1989, 2007) modelled this type of problem by explicitly calculating the benefits of reducing uncertainty (i.e. the ‘value of information’). This value-of-information approach yields some interesting and intuitive insights that are consistent with our results. In particular, Stephens (2007) emphasized that paying a cost to acquire information is only beneficial when the information has the potential to change behaviour (i.e. could actually change an individual’s belief about what is the optimal action). For example, in the costly exploration case, acquiring information involves exploring the costly Ac response. If the individual starts with a prior (correct in our case) belief that Ac is the suboptimal response (prior disbelief), paying the exploration cost will only be adaptive in cases where it is plausible that the individual could learn (incorrectly in our case) that Ac is actually the optimal response. Therefore, the value-of-information approach suggests that acquiring a superstition in the costly exploration case requires that the prior disbelief is sufficiently weak and that the total number of trials, N, is sufficiently high, so that the individual could actually get the information that would shift them to a state of belief. A similar value-of-information argument can be applied to the cheap exploration case. In both cases, the predictions of the value-of-information approach are consistent with the predictions produced by our simulations. 
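As a rough illustration of how such simulations can be set up, the Python sketch below is our own reconstruction, not the code used to produce the figures. It solves the recursion given in the Appendix (Eqs A1–A4, benefit b = 1) by backward induction and then simulates individuals for whom no causal relationship exists, i.e. the unknown probability of O+ truly equals pk. As described in the Results, a costly exploration-type superstition is scored when the individual explores on all N trials, and a cheap exploration-type superstition when it switches to Ac at any point. The function names, the Beta prior parameters alpha and beta, the tie-breaking rule (stop exploring when both options are equal) and the parameter values in the usage loop are illustrative assumptions.

```python
import random


def optimal_explore_policy(N, pk, c, alpha, beta, case):
    """Backward induction over Eqs (A1)-(A4) with benefit b = 1.
    Returns explore[n][r] = True if giving the action with unknown
    probability is strictly better than switching to the known action."""
    F = [[0.0] * (n + 1) for n in range(N + 1)]  # row N: F(r, N) = 0
    explore = [[False] * (n + 1) for n in range(N + 1)]
    for n in range(N - 1, -1, -1):
        for r in range(n + 1):
            eu = (alpha + r) / (alpha + beta + n)                          # (A1)
            S = eu * (1.0 + F[n + 1][r + 1]) + (1.0 - eu) * F[n + 1][r]    # (A4)
            if case == "costly":
                stop, go = (N - n) * pk, S - c        # (A2)
            elif case == "cheap":
                stop, go = (N - n) * (pk - c), S      # (A3)
            else:
                raise ValueError("case must be 'costly' or 'cheap'")
            explore[n][r] = go > stop                 # ties -> stop exploring
            F[n][r] = max(stop, go)
    return explore


def prob_superstition(N, pk, c, alpha, beta, case, reps=20000, seed=1):
    """Monte Carlo estimate of Pr(superstition) when there is no causal
    relationship, i.e. the unknown probability of O+ truly equals pk."""
    explore = optimal_explore_policy(N, pk, c, alpha, beta, case)
    rng = random.Random(seed)
    superstitious = 0
    for _ in range(reps):
        r = n = 0
        # Stopping is permanent: once the known action is chosen, no new
        # information about the unknown action is gained.
        while n < N and explore[n][r]:
            if rng.random() < pk:   # chance beneficial outcome O+
                r += 1
            n += 1
        explored_all = n == N
        if case == "costly" and explored_all:        # kept giving Ac throughout
            superstitious += 1
        elif case == "cheap" and not explored_all:   # switched to Ac
            superstitious += 1
    return superstitious / reps


for case in ("costly", "cheap"):
    print(case, prob_superstition(N=10, pk=0.5, c=0.1, alpha=2, beta=2, case=case))
```

Varying N, c, alpha and beta in this sketch is one way to probe the qualitative trends discussed above; a fixed random seed keeps runs reproducible.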
While we have not explicitly used the value-of-information approach, it is interesting to note how Foster & Kokko’s (2009, section 2.a) basic model and our models of superstition fit into this framework. The value of information about the true state of the world (e.g. knowledge about the actual value of the unknown probability, pu) is defined as the difference between the success of an informed individual and that of an uninformed individual (Gould 1974; Stephens 1989, 2007; see also Clark & Mangel 2000, page 240). Foster & Kokko modelled uninformed individuals that do not acquire information from their experiences and therefore base their behaviour entirely on the prior distributions. We, however, modelled individuals that could acquire information that may be used to change their behaviour. The two models produced similar results when the informational value of different actions did not influence which actions an individual performed (i.e. when the value of information was low relative to the cost of acquiring information). Thus, Foster & Kokko’s model can be seen as a special case of our model in which the parameter value combination either discourages any exploration or does not permit the acquisition of enough information to change behaviour (e.g. low total number of trials, N). Both models predict that superstitions are more likely when the cost of the superstition is low, and when the superstitious belief is plausible a priori. However, the fact that we modelled adaptive causal learning allowed us to characterize the effects of N (for examples of the interplay between N, or its surrogate, and learning strategies in other contexts, see McNamara & Houston 1980; Eliassen et al. 2007) and the nature of the individual’s uncertainty about the causal relationship (i.e. the comparison between the two cases).
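The informed/uninformed comparison can be made concrete with a small sketch, assuming the Appendix recursion with benefit b = 1. Here the ‘uninformed’ individual is assumed, in the spirit of the Foster & Kokko comparison above, to choose whichever action has the higher prior expected return and to give it on all N trials, while the ‘informed’ individual follows the optimal learning policy, whose expected return is F(0,0). Their difference is one simple measure of the value of information. The function names and parameter values are hypothetical. With N = 1 the difference is zero, in line with the point that single-trial models leave no room for learning to change behaviour.

```python
def optimal_return(N, pk, c, alpha, beta, case):
    """F(0, 0): expected return of the optimal learner, Eqs (A1)-(A4), b = 1."""
    F_next = [0.0] * (N + 1)              # F(r, N) = 0 for r = 0..N
    for n in range(N - 1, -1, -1):
        F_now = []
        for r in range(n + 1):
            eu = (alpha + r) / (alpha + beta + n)
            S = eu * (1.0 + F_next[r + 1]) + (1.0 - eu) * F_next[r]
            if case == "costly":
                F_now.append(max((N - n) * pk, S - c))
            else:  # "cheap"
                F_now.append(max((N - n) * (pk - c), S))
        F_next = F_now
    return F_next[0]


def uninformed_return(N, pk, c, alpha, beta, case):
    """Expected return of an individual that never updates its prior and
    gives, on all N trials, whichever action looks better a priori."""
    e0 = alpha / (alpha + beta)           # prior mean of pu
    if case == "costly":
        return max(N * pk, N * (e0 - c))
    return max(N * (pk - c), N * e0)


def value_of_information(N, pk, c, alpha, beta, case):
    """Informed minus uninformed expected return."""
    return (optimal_return(N, pk, c, alpha, beta, case)
            - uninformed_return(N, pk, c, alpha, beta, case))


print(round(value_of_information(1, 0.5, 0.05, 1, 1, "costly"), 4))  # 0.0
print(round(value_of_information(2, 0.5, 0.05, 1, 1, "costly"), 4))  # 0.0083
```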
Concluding Remarks
Superstitious behaviours suggest a mismatch between an individual’s perception of the causal structure of the environment and the actual structure of the environment. The existence of these mismatches may imply that the individual’s causal learning mechanisms are poorly adapted to the current environment (Gilovich 1991), or that the superstitious behaviours generate some secondary benefit that makes the behaviour adaptive overall (Sosis & Alcorta 2003). Our model, however, emphasizes that even perfectly adaptive causal learning mechanisms will sometimes produce locally maladaptive behaviour (Killeen 1978; Beck & Forstmeier 2007; Foster & Kokko 2009) simply through processes such as early chance associations of specific actions with beneficial outcomes, which it would be suboptimal to evaluate further (as anticipated by Skinner 1948), or culturally transmitted prior beliefs that would be too costly to put to the test should they prove to be true. A thorough study of the genesis of superstitions will require understanding which proximate causal learning mechanisms are ultimately adaptive and what factors make these adaptive mechanisms more or less likely to produce locally maladaptive behaviour. Here we have presented an adaptive causal learning mechanism that focuses on the problem of exploration intensity, which we think is a key component of the problem. For illustrative purposes, we have focused on the interaction between exploration and superstitions for simple (although probably common) types of causal learning problems. In any given instance the potential for superstition formation may be small, but across multiple instances, chance associations between certain actions and beneficial outcomes will almost inevitably arise. Further understanding of how superstitions develop will require considering the adaptive exploration strategies for other types of causal learning problems (e.g. cases with >2 potential causes and outcomes, or cases where there is uncertainty about the probability of O+ under both alternative actions rather than only one). Furthermore, there are factors that we have not considered that should affect the optimal resolution of the exploration–exploitation trade-off. For example, risk sensitivity (Caraco 1980; Real 1980; Smallwood 1996) might affect optimal learning in several related ways. First, the extent to which an individual discounts the future (Gittins & Jones 1974; Kagel et al. 1986; Cohen et al. 2007) should affect the degree to which the individual favours immediate gains (exploitation) versus potentially greater future gains (exploration). Second, an individual’s state (such as its hunger) and resulting preference for immediate gains might vary from trial to trial as a function of success on previous trials or other external factors (Mangel & Clark 1988; Clark & Mangel 2000), so that desperate individuals may be more likely to develop superstitions. Alternatively, an individual’s state could be approximately constant over all N trials. In this situation, risk-averse individuals might engage in little exploration because it yields more variable success.
Acknowledgments
This paper evolved from an assignment in a Causal Learning course taught by Lorraine Allan. We thank L. Allan, Dan Goldreich, Peter Killeen and Shep Siegel for discussion and comments early in this project. Chris Hassall and our anonymous referees kindly commented on the manuscript. We thank Margaret Abbott for her un-ironic and heartfelt prayers regarding this project. K.R.A. and T.N.S. are funded by Natural Sciences and Engineering Research Council of Canada (NSERC) Accelerator and NSERC Discovery grants, respectively, awarded to T.N.S.
References
Allan, L. G., Siegel, S. & Tangen, J. M. 2005. A signal detection analysis of contingency data. Learning & Behavior, 33, 250–263.
Allan, L. G., Hannah, S. D., Crump, M. J. & Siegel, S. 2008. The psychophysics of contingency assessment. Journal of Experimental Psychology: General, 137, 226–243.
Beck, J. & Forstmeier, W. 2007. Superstition and belief as inevitable by-products of an adaptive learning strategy. Human Nature, 18, 35–46.
Biner, P. M., Angle, S. T., Park, J. H., Mellinger, A. E. & Barber, B. C. 1995. Need state and the illusion of control. Personality and Social Psychology Bulletin, 21, 899–907.
Bolker, B. M. 2008. Ecological Models and Data in R. Princeton, New Jersey: Princeton University Press.
Caraco, T. 1980. On foraging time allocation in a stochastic environment. Ecology, 61, 119–128.
Ciborowski, T. 1997. ‘Superstition’ in the collegiate baseball player. Sport Psychologist, 11, 305–317.
Clark, C. W. & Mangel, M. 2000. Dynamic State Variable Models in Ecology: Methods and Applications. Oxford: Oxford University Press.
Cohen, J. D., McClure, S. M. & Yu, A. J. 2007. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philosophical Transactions of the Royal Society B, 362, 933–942.
Darwin, C. 1887. The life and letters of Charles Darwin, including an autobiographical chapter. page 307. In: The Complete Work of Charles Darwin Online: Vol. 1 (Ed. by F. Darwin). London: J. Murray. http://darwin-online.org.uk (accessed 15 December 2010).
Eliassen, S., Jørgensen, C., Mangel, M. & Giske, J. 2007. Exploration or exploitation: life expectancy changes the value of learning in foraging strategies. Oikos, 116, 513–523.
Foster, K. R. & Kokko, H. 2009. The evolution of superstitious and superstition-like behaviour. Proceedings of the Royal Society B, 276, 31–37.
Gilovich, T. 1991. How We Know What Isn’t So: the Fallibility of Human Reason in Everyday Life. New York: Free Press.
Gittins, J. C. & Jones, D. M. 1974. A dynamic allocation index for the sequential design of experiments. In: Progress in Statistics (Ed. by J. Gani), pp. 241–266. Amsterdam: North-Holland.
Gmelch, G. 1971. Baseball magic. In: Conformity and Conflict: Readings in Cultural Anthropology (Ed. by J. P. Spradley & D. W. McCurdy), pp. 346–352. Boston: Little & Brown.
Gould, J. 1974. Risk, stochastic preference, and the value of information. Journal of Economic Theory, 8, 64–84.
Green, D. M. 1966. Signal Detection Theory and Psychophysics. New York: J. Wiley.
Hood, B. M. 2010. The Science of Superstition: How the Developing Brain Creates Supernatural Beliefs. San Francisco: HarperOne.
Jones, P. 1978. On the two armed bandit with one probability known. Metrika, 25, 235–239.
Kagel, J. H., Green, L. & Caraco, T. 1986. When foragers discount the future: constraint or adaptation? Animal Behaviour, 35, 271–283.
Keinan, G. 1994. Effects of stress and tolerance of ambiguity on magical thinking. Journal of Personality and Social Psychology, 67, 48–55.
Killeen, P. R. 1978. Superstition: a matter of bias, not detectability. Science, 199, 88–90.
McLean, S. 2000. Vinyl Cafe Unplugged. Toronto: Penguin Canada.
McNamara, J. & Houston, A. 1980. The application of statistical decision theory to animal behaviour. Journal of Theoretical Biology, 85, 673–690.
McNamara, J. M., Green, R. F. & Olsson, O. 2006. Bayes’ theorem and its applications in animal behaviour. Oikos, 112, 243–251.
Malinowski, B. 1954. Magic, Science, and Religion and Other Essays. Garden City, New York: Doubleday.
Mangel, M. & Clark, C. W. 1988. Dynamic Modeling in Behavioral Ecology. Princeton, New Jersey: Princeton University Press.
Padgett, V. R. & Jorgenson, D. O. 1982. Superstition and economic threat: Germany, 1918–1940. Personality and Social Psychology Bulletin, 8, 736–741.
Real, L. 1980. On uncertainty and the law of diminishing returns in evolution and behavior. In: Limits to Action: the Allocation of Individual Behavior (Ed. by J. E. R. Staddon), pp. 37–64. New York: Academic Press.
Rudski, J. M. & Edwards, A. 2007. Malinowski goes to college: factors influencing students’ use of ritual and superstition. Journal of General Psychology, 134, 389–403.
Shaw, D. 2009. Cryoethics: seeking life after death. Bioethics, 23, 515–521.
Skinner, B. F. 1948. ‘Superstition’ in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Smallwood, P. D. 1996. An introduction to risk sensitivity: the use of Jensen’s inequality to clarify evolutionary arguments of adaptation and constraint. American Zoologist, 36, 392–401.
Sosis, R. & Alcorta, C. 2003. Signaling, solidarity, and the sacred: the evolution of religious behavior. Evolutionary Anthropology, 12, 264–274.
Stephens, D. W. 1989. Variance and the value of information. American Naturalist, 134, 128–140.
Stephens, D. W. 2007. Models of information use. In: Foraging: Behavior and Ecology (Ed. by D. W. Stephens, J. S. Brown & R. C. Ydenberg), pp. 31–58. Chicago: University of Chicago Press.
Wargo, E. 2008. The many lives of superstition. Observer, 21, 18–24.
APPENDIX
BAYESIAN UPDATING AND DYNAMIC PROGRAMMING DETAILS
Bayesian Updating
The prior pu distribution is a Beta distribution characterized by two shape parameters, α and β. After n exploration trials, r of which result in the beneficial outcome, O+, and n − r of which result in O−, the posterior belief distribution is another Beta distribution with shape parameters α′ = α + r and β′ = β + n − r. The expected value of the posterior distribution for a given n and r is therefore

$$E_u(r,n) = \frac{\alpha'}{\alpha' + \beta'} = \frac{\alpha + r}{\alpha + \beta + n} \qquad \text{(A1)}$$
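Eq. (A1) is the standard Beta–binomial conjugate update. A minimal Python sketch follows; the function names are hypothetical, and the uniform prior Beta(1, 1) in the example is an arbitrary choice rather than a value used in the analyses.

```python
def posterior_params(alpha, beta, n, r):
    """Beta(alpha, beta) prior on pu updated after r beneficial outcomes (O+)
    and n - r non-beneficial outcomes in n exploration trials."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return alpha + r, beta + (n - r)


def expected_pu(alpha, beta, n, r):
    """Posterior mean E_u(r, n) = (alpha + r) / (alpha + beta + n), Eq. (A1)."""
    a_post, b_post = posterior_params(alpha, beta, n, r)
    return a_post / (a_post + b_post)


# Uniform prior Beta(1, 1); 3 of 4 exploration trials gave O+
print(posterior_params(1, 1, 4, 3))  # (4, 2)
print(expected_pu(1, 1, 4, 3))       # 0.6666666666666666
```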
Dynamic Programming
With b = 1, for any given pair (r, n) with r ≤ n and n ≤ N, the maximum expected return over the remaining N − n trials is given by

$$F(r,n) = \max\big((N - n)\,p_k,\; S(r,n) - c\big) \qquad \text{(A2)}$$
for the costly exploration case and

$$F(r,n) = \max\big((N - n)(p_k - c),\; S(r,n)\big) \qquad \text{(A3)}$$
for the cheap exploration case. In both cases, the first term describes the expectation if the individual ceases exploration after these n trials and gives the response with known probability of outcome on all subsequent (N − n) trials. Similarly, S(r,n) describes the long-term expectation if the individual continues exploring the response with unknown probability of beneficial outcome for at least one more trial, which is given by

$$S(r,n) = E_u(r,n)\big(1 + F(r+1, n+1)\big) + \big(1 - E_u(r,n)\big)\,F(r, n+1) \qquad \text{(A4)}$$
where Eu(r,n) gives the estimated mean probability that the beneficial outcome O+ will occur on the next trial if the action associated with the unknown conditional probability is given. The optimal decision (continue exploring or start exploiting) can be determined for all (r,n) pairs by setting F(r,N) = 0 for all r and working backwards (‘backwards induction’) using a standard stochastic dynamic programming algorithm (Mangel & Clark 1988).
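The backward induction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name solve_bandit, the parameter names alpha and beta for the Beta prior, the array layout F[n][r] and the rule that ties are resolved in favour of exploiting are all our own choices.

```python
def solve_bandit(N, pk, c, alpha, beta, case="costly"):
    """Backward induction for Eqs (A1)-(A4) with benefit b = 1.

    Returns (F, explore): F[n][r] is the maximum expected return over the
    remaining N - n trials after r beneficial outcomes in n exploration
    trials; explore[n][r] is True if continuing to explore is optimal.
    """
    if case not in ("costly", "cheap"):
        raise ValueError("case must be 'costly' or 'cheap'")
    # Terminal condition: F(r, N) = 0 for all r (row N is all zeros).
    F = [[0.0] * (n + 1) for n in range(N + 1)]
    explore = [[False] * (n + 1) for n in range(N + 1)]
    for n in range(N - 1, -1, -1):           # work backwards from n = N - 1
        for r in range(n + 1):
            eu = (alpha + r) / (alpha + beta + n)                          # (A1)
            S = eu * (1.0 + F[n + 1][r + 1]) + (1.0 - eu) * F[n + 1][r]    # (A4)
            if case == "costly":
                stop, go = (N - n) * pk, S - c        # (A2): exploring costs c
            else:
                stop, go = (N - n) * (pk - c), S      # (A3): known Ac costs c per trial
            explore[n][r] = go > stop                 # ties resolved as "stop"
            F[n][r] = max(stop, go)
    return F, explore


F, explore = solve_bandit(N=20, pk=0.5, c=0.1, alpha=1, beta=1, case="costly")
print(F[0][0], explore[0][0])
```

Storing the whole decision table, rather than only F(0,0), makes it straightforward to simulate individual learning trajectories forward from (r, n) = (0, 0).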