Games and Economic Behavior 82 (2013) 31–51
On games of strategic experimentation ✩

Dinah Rosenberg a, Antoine Salomon b, Nicolas Vieille a,∗

a Economics and Decision Sciences Department, HEC Paris, and GREG-HEC, France
b LAGA, Institut Galilée, Université Paris 13, France
Article info
Article history: Received 25 November 2011; available online 28 June 2013.
JEL classification: D83, D82.
Keywords: Strategic experimentation; Optimal stopping; Real options; Incomplete information.

Abstract
We study a class of symmetric strategic experimentation games. Each of two players faces an (exponential) two-armed bandit problem, and must decide when to stop experimenting with the risky arm. The equilibrium amount of experimentation depends on the degree to which experimentation outcomes are observed, and on the correlation between the two individual bandit problems. When experimentation outcomes are public, the game is basically one of strategic complementarities. When experimentation decisions are public, but outcomes are private, the strategic interaction is more complex. We fully characterize the equilibrium behavior in both informational setups, leading to a clear comparison between the two. In particular, equilibrium payoffs are higher when experimentation outcomes are public.
© 2013 Elsevier Inc. All rights reserved.
Strategic experimentation issues are prevalent in most situations of social learning. In such setups, an agent may learn useful information by experimenting himself, or by observing other agents. Consider two pharmaceutical firms pursuing research on two related molecules. This research may eventually yield positive results if the molecule or one of its derivatives has the desired medical effect, or may never yield such results if this is not the case. Given this uncertainty, how long should each firm be willing to do research in the absence of any positive results? The answer is likely to depend on the informational spillovers between the two firms, on at least two grounds. On the one hand, the correlation between the two research outcomes affects the statistical inferences of the firms. In the above instance, or e.g. in the case of two firms drilling nearby oil tracts, it is often natural to assume that these outcomes are positively correlated. Klein and Rady (2011) discuss examples from economics of law, in which the relevant outcomes are instead typically negatively correlated. On the other hand, the answer depends on the extent to which research outcomes within one firm are observed by the other firm. There are cases in which research successes are immediately publicized (say, in the form of specific patents), and other cases in which only research policy decisions are publicly observed. We refer to Bergemann and Valimaki (2008) for a recent overview of strategic experimentation games. Typical applications include dynamic R&D (see e.g. Moscarini and Squintani, 2010; Malueg and Tsutsui, 1997), competition in an uncertain environment (McLennan, 1984), financial contracting (Bergemann and Hege, 2005), etc. The analysis of such situations usually builds on the model of (two-player) bandit games, introduced in Bolton and Harris (1999). In these games, each player is facing a continuous time, two-arm bandit problem. 
One of the two arms is a safe one, and yields a constant payoff flow, while the type of the other is ex ante unknown. Each player has to choose how to allocate time between the two arms, based on information acquired along the play. Partly for tractability reasons, many papers have studied variants of so-called exponential bandit games, popularized by Keller et al. (2005) (see Keller and Rady, 2010;
✩ The authors thank the ANR Explora and Jeudy, and Vieille thanks the HEC foundation for financial support.
∗ Corresponding author.
E-mail addresses: [email protected] (D. Rosenberg), [email protected] (A. Salomon), [email protected] (N. Vieille).
http://dx.doi.org/10.1016/j.geb.2013.06.006
Klein and Rady, 2011; Murto and Välimäki, 2009 among others). With the exception of Murto and Välimäki (2009) (and Aoyagi, 1998; Rosenberg et al., 2007, though in a different setup), all papers assume that the experimentation outcomes are publicly observed. Besides, all papers assume that the types of the two risky arms are equal, with the exception of Klein and Rady (2011), which assume perfectly negatively correlated types, and Murto and Välimäki (2009), which assume partially positively correlated types. Our purpose here is to study how precisely the two dimensions – correlation of the experimentation problems and observability of the experimentation outcomes – combine and influence the equilibrium outcomes. Following the literature, we focus on exponential bandits. In contrast with most earlier work, we assume that the decision to switch from the risky arm to the safe one is irreversible. We study two informational regimes. In the first one, experimentation outcomes (actions and payoffs) are publicly observed. There is incomplete information about the risky arms’ types, but no asymmetric information between the players, and our game of timing is essentially a game of pure coordination: beyond a certain time, a player is willing to experiment only to the extent that the other player still experiments. Accordingly, the formal analysis is rather straightforward. The belief assigned by a player to his arm being good declines continuously as long as no success occurs. If the two arms are positively correlated, a success hit by the other player brings good news, and the absence of such a success is bad news, so that the belief declines faster than in a one-player setup. If the arms are negatively correlated, such an event instead brings bad news, and the belief held by the other player then jumps downwards. In that setup, all equilibria are pure and symmetric, and Pareto ranked. 
At any such equilibrium, both players drop out at the same time t, unless one of them has experienced a success prior to t. In the latter case, a ‘successful’ player never drops out, and an ‘unsuccessful’ one adjusts to the news, taking correlation into account, and behaves optimally in the continuation, one-player problem. In the second informational scenario, only experimentation decisions are observed. Thus, a player only observes whether the other player is still active or not. The game does not exhibit strategic complementarities any longer. In any time interval, a player deduces valuable information from the other player’s behavior, only inasmuch as that player may choose to drop out in that time interval. At a symmetric equilibrium, there exists no time at which a player drops out with positive probability. Otherwise indeed, the information obtained by the other player at such an atom would have a positive value, and he would rather wait beyond that time and get this information, contradicting the symmetric equilibrium assumption. This implies that the belief of player i evolves continuously, until player j drops out. When the latter event occurs, this is evidence that player j did not get any payoff from the risky arm. The effect of correlation is then reversed, when compared to the previous scenario. If the two arms are positively correlated, such an event is bad news, in that the belief of player i jumps downwards. A contrario, such an event is good news if the two arms are negatively correlated, and the belief of player i jumps upwards. There is a unique symmetric equilibrium. It is mixed, with a continuous distribution, and the cdf of the equilibrium strategy is obtained as the solution to a linear, second-order differential equation. This contrasting equilibrium structure (compared with the first scenario) can be understood as follows. In the first scenario, beliefs jump when a success occurs. 
This occurs at a random time, whose distribution is exogenous and has a continuous density. In the second scenario, beliefs jump when a player drops out. The distribution of this exit time is endogenous and (at least for symmetric equilibria) must be a continuous one, as we argued above. There is no asymmetric equilibrium outcome if types are positively correlated, but there are some if types are negatively correlated. All of them are purely atomic and (weakly) Pareto dominate the symmetric one. The equilibrium analysis allows us to answer a number of comparative statics questions, and several conclusions are worth mentioning here. For some of these conclusions, it is useful to organize the discussion according to the direction of the beliefs' jumps. Assume first that events/informational jumps bring good news, so that beliefs jump upwards wherever discontinuous.1 Then, the belief threshold at which a player chooses to drop out is the same as the belief threshold at which he would optimally choose to drop out if he were alone. In other words, the (marginal) option value of observing the other player is then equal to zero. In a sense, equilibria thus exhibit no encouragement effect (see Bolton and Harris, 1999). This is similar to an observation made in Murto and Välimäki (2009). When events instead bring bad news, symmetric equilibria always exhibit an encouragement effect, and players keep experimenting with beliefs below the one-player threshold, see Theorems 1 and 2.2 The proofs for the two information scenarios are fairly different. Yet, there is a common intuition. If player i is indifferent between dropping out at times t and t + dt, then it must be that the informational benefits derived from experimenting between t and t + dt just offset the opportunity cost of doing so: thus, an event might occur that triggers him to continue beyond t + dt.
But if events bring bad news, an event linked to player j will increase the willingness of player i to drop out. Thus, the trade-off faced by player i at time t is the same as if he were alone. More importantly, the ranking of the two information scenarios is unambiguous, no matter what the correlation between the two risky arms is. (Symmetric) equilibrium payoffs are higher in the first scenario (public disclosure of experimentation
1 Again, this is the case as soon as arms are negatively correlated and all information is public, or arms are positively correlated and only actions are publicly observed.
2 In each of the four combinations but one, there is a unique symmetric equilibrium.
outcomes) than in the second one. Moreover, when only experimentation decisions are publicly observed, the equilibrium payoff at the symmetric equilibrium is always equal to the equilibrium payoff of the one-player game where a player gets no information from the other player. Thus, there is no value in observing the other player's actions, unless payoffs are observed as well. Unfortunately, we know of neither a direct proof for these conclusions, nor a direct and conclusive intuition. Instead, both conclusions follow as a consequence of the equilibrium analysis of our games. The impact of the correlation is unambiguous as well, under both informational assumptions. Not surprisingly, when experimentation outcomes are public, the more correlated the risky types are, the higher the equilibrium payoffs are. When instead experimentation outcomes are private, it follows from the previous paragraph that equilibrium payoffs are, surprisingly, independent of the correlation between the risky types. We stress that equilibrium behavior does however depend on the amount and on the sign of the correlation. Finally, we do not provide meaningful answers to questions such as the impact of the correlation on the (random) amount of experimentation. To see why, assume that experimentation outcomes are public, and that the risky arms are negatively correlated. As long as a player does not experience a success, this is good news to the other player – the higher the correlation, the better the news, and the more the other player will be willing to experiment in the future. As soon as a player experiences a success, this is bad news to the other, and the higher the correlation, the less the other player will experiment beyond that point. The paper is organized as follows. We lay down the model in Section 1. We state our results, and provide intuitions and heuristic derivations in Sections 2 and 3. Detailed proofs are in Appendices A and B.

1. The model

1.1. Setup and comments

1.1.1. The setup

Time is continuous. Each of two players is facing a strategic experimentation problem, modeled as an optimal stopping problem. Each player has two possible actions, called arms. One of the arms does not involve any uncertainty, and yields a constant payoff flow with present value s. The other arm is risky. Its type is ex ante unknown, and may be either Good (G) or Bad (B). We denote by R1 and R2 the types of the two arms, and by q the prior distribution of the type profile (R1, R2). The decision to switch from the risky arm to the safe one is irreversible. That is, each player has to choose when, if ever, to drop out and stop experimenting. We will throughout denote by θi the time at which player i chooses to drop out. A risky arm of a bad type never yields any payoff. A risky arm of a good type yields no payoff until a random time τ, but yields a constant payoff flow with present value γ from τ on. The random time τ follows an exponential distribution with parameter λ. Conditional on the type profile, the two risky arms are independent. That is, in the event where both arms are good, the two success times τ1 and τ2 associated with the risky arms of the two players are independent random variables.3 We focus on symmetric problems. Thus, the two players share the same discount rate r > 0, and the ex ante probability of one's arm being good is the same for both players. We make no assumption on q beyond symmetry, and we measure correlation between the two arms by ρ := q(R1 = R2 = G) − q(R1 = G)q(R2 = G). In particular, the two risky arms are positively correlated if ρ > 0, and negatively correlated if ρ < 0. We study two variants of this model, which differ in the information made available to the players along the play. In the first variant, all information – actions and past payoffs – is public. We refer to it as the public scenario. The players always share a common posterior belief over the two types.
The game then exhibits strategic complementarities: the longer player j is willing to experiment, the higher the option value for player i of experimenting and, therefore, the longer player i is willing to experiment. As long as both players are active (and no payoff is received), the probability assigned to one's arm being good decreases continuously.4 The higher the correlation ρ, the faster this belief decreases.5 Indeed, if ρ > 0, the information that the other player did not hit a success is further statistical evidence that one's own arm is bad. Instead, when ρ < 0, this information mitigates to some extent this evidence. When a player, say i, first experiences a success, the belief of player j adjusts to the news. If ρ > 0, player i's success is good news, and player j's belief jumps upwards. If ρ < 0, this is bad news, and player j's belief jumps downwards. When only actions are disclosed, the informational interaction is more complex. As an illustration, observe that the optimal payoff of player i is the same, whether player j plans to drop out at time 0, or whether player j plans never to drop out. Indeed, in both cases, player i infers no information whatsoever from observing player j. Thus, in contrast to the public scenario, player i's optimal payoff is non-monotonic in the amount of experimentation performed by player j.
3 There is a very simple formal setup fitting this description. Let X1 and X2 be two independent random variables with an exponential distribution with parameter λ. Nature chooses the pair (R1, R2) of risky types according to the common prior q ∈ Δ({G, B} × {G, B}). The time τi at which player i's risky arm starts delivering payoffs is τi := +∞ if Ri = B, and τi := Xi if Ri = G.
4 This relies on the symmetry assumption.
5 Throughout, and depending on the context, we use the word belief to denote either the posterior distribution of the pair of types, or the posterior probability assigned to one's arm being good. We hope that no confusion will arise.
More generally, the inference from player j's behavior is contingent on player j's strategy: over any time interval, the 'amount' of information derived by player i depends on how likely it is that player j would drop out in that interval (in the absence of a success). As long as both players are active, and as time goes by, the information that player j does not drop out becomes stronger evidence that player j may have hit a success. (However, the actual belief held by player i depends upon player j's strategy.) The higher the correlation ρ, the slower the beliefs decrease. This is in contrast with the other scenario. When a player, say i, first drops out at time θi, this provides conclusive evidence that τi ≥ θi, and player j updates his belief. If ρ > 0, the exit of player i is bad news, and player j's belief jumps downwards. If ρ < 0, it instead jumps upwards.

1.1.2. The one-player benchmark

Whatever the scenario, one option available to player i is to ignore player j. Thus, the one-player decision problem P in which player i only observes his own payoffs is a useful benchmark. The problem P is standard, and we refer to e.g. Presman (1991) (which solves more complex setups). The decision maker i has to choose the time t when to drop out or, equivalently, a belief threshold, since the belief p(t) := P(Ri = G | τi ≥ t) held at time t is given by the decreasing map

p(t) := pe^{−λt} / (pe^{−λt} + (1 − p)),

where p denotes the prior probability that one's arm is good.
Given the strategy t, the payoff is γe^{−rτ} if τ < t, and se^{−rt} if τ ≥ t, so that the expected payoff of the decision maker is

π(t) := pγ ∫_0^t λe^{−(r+λ)x} dx + se^{−rt}(pe^{−λt} + (1 − p)),   (1)

which has a unique maximum when

p(t) = p* := rs / (λ(γ − s)).
As a function of the initial belief p ∈ (0, 1], the optimal payoff of the decision maker in P is given by

W(p) := gp + (s − gp*) · ((1 − p)/(1 − p*)) · (((1 − p)p*) / (p(1 − p*)))^{r/λ}   if p ≥ p*,

and

W(p) = s   if p < p*,

where g = γ ∫_0^{+∞} λe^{−λx} e^{−rx} dx = γλ/(r + λ) is the ex ante expected payoff of a good risky arm. To avoid trivialities, we assume g > s > 0. We will only make use of the fact that W(·) is a non-decreasing C1 function such that W(p) ≥ s. In addition, we use the property that π′(0) > 0 (resp. π′(0) < 0) when p > p* (resp. p < p*), which easily follows from (1). In some proofs, we will write πp(t) to stress that π does depend on the initial belief p.
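As a concrete illustration, the one-player benchmark can be evaluated numerically. The sketch below uses parameter values of our own choosing (they are not taken from the paper) and checks that the payoff π of (1) is maximized at the time t* where the belief p(t) hits the cut-off p*, with resulting value W(p).

```python
import math

# Illustrative parameters (assumed for this sketch, not from the paper):
# discount rate r, arrival rate lam, present values gamma_ (good arm,
# after a success) and s (safe arm), prior belief p.
r, lam, gamma_, s, p = 0.1, 0.5, 2.0, 0.5, 0.5

g = gamma_ * lam / (r + lam)            # ex ante value of a good risky arm
p_star = r * s / (lam * (gamma_ - s))   # one-player cut-off belief

def belief(t):
    """p(t) = P(R = G | no success by time t)."""
    e = math.exp(-lam * t)
    return p * e / (p * e + 1 - p)

def pi(t):
    """Expected payoff (1) of planning to drop out at time t."""
    experiment = p * gamma_ * lam / (r + lam) * (1 - math.exp(-(r + lam) * t))
    stop = s * math.exp(-r * t) * (p * math.exp(-lam * t) + 1 - p)
    return experiment + stop

def W(pp):
    """Optimal payoff of the one-player problem P at initial belief pp."""
    if pp < p_star:
        return s
    odds = (1 - pp) * p_star / (pp * (1 - p_star))
    return g * pp + (s - g * p_star) * (1 - pp) / (1 - p_star) * odds ** (r / lam)

# Optimal stopping time: the time at which the belief reaches p_star.
t_star = math.log(p * (1 - p_star) / (p_star * (1 - p))) / lam
```

With these numbers, g ≈ 1.67 > s and p* = 1/15, so t* ≈ 5.28; a grid search confirms that π peaks at t* with value W(p), and that W(p*) = s (value matching at the cut-off).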
1.2. Strategies and payoffs

A strategy dictates when to stop, as a function of all available information. Since time is continuous, some care is needed. We argue here that following most "private histories", the optimal decision is unambiguous. This allows for a conceptually straightforward definition of strategies, and of expected payoffs.

1.2.1. The public scenario

We start with the case where experimentation outcomes are public. Consider any time t, and focus on player i, who we assume is active up to time t. If τi < t, player i's risky arm is known to be good, and player i should stick to it. On the other hand, if either τj < t or θj < t, player j's future behavior is 'irrelevant'. In other words, when at time min(τj, θj), player i should update his belief, and proceed with the optimal policy in the one-player decision problem P. Hence, player i's optimal behavior at time t and beyond is given by the analysis of the one-player case, unless min(τi, τj, θj) ≥ t. Accordingly, we define a pure strategy of player i to be a time ti ∈ [0, +∞], with the interpretation that player i drops out at time ti if player j has not dropped out earlier (θj ≥ ti), and if no payoff has been received prior to ti (min(τi, τj) ≥ ti). A mixed strategy is a probability distribution over [0, +∞]. We now provide a formula for the expected payoff γ_i^P(t1, t2) induced by an arbitrary pure profile (t1, t2). Assume first that t1 = t2 = t. The outcome of the game is determined as follows. In the event where min(τ1, τ2) ≥ t, both players drop out at time t. If a success occurs prior to t, say at time τ1 < min(τ2, t), player 2 updates his belief to ψ(τ1), where ψ(x) := P(R2 = G | R1 = G, τ2 ≥ x) is the belief of a player, conditional on the other risky arm being good. Player 2 then remains active until his belief reaches p*, with an expected continuation payoff (at time τ1) equal to W(ψ(τ1)). Thus, when t1 = t2, the expected payoff of player i is

γ_i^P(ti, tj) := γE[e^{−rτi} 1{τi < min(τj, t)}] + E[e^{−rτj} W(ψ(τj)) 1{τj < min(τi, t)}] + se^{−rt} P(min(τ1, τ2) ≥ t).

(Note that the probability that τ1 and τ2 are equal and finite is equal to zero: P(τ1 = τ2 < +∞) = 0.)
Assume next that t1 < t2. In the event where min(τ1, τ2) ≥ t1, player 1 drops out at time t1, and player 2 next waits until his belief reaches p*. Player 2's continuation payoff at time t1 is given by W(φ(t1)), where φ(x) := P(R2 = G | τ1 ≥ x, τ2 ≥ x) is a player's belief at time x if no success occurred prior to x. If instead min(τ1, τ2) < t1, the 'non-successful' player updates his belief at time min(τ1, τ2) to ψ(min(τ1, τ2)), and again, switches to the optimal policy in P, with a continuation payoff equal to W(ψ(min(τ1, τ2))). Observe that the outcome of the profile (t1, t2) does not depend on t2 as long as t2 > t1. Observe also that player 1's expected payoff does not depend on t2, as long as t2 ≥ t1. Thus, γ_1^P(t1, t2) = γ_1^P(t1, t1), while

γ_2^P(t1, t2) = γE[e^{−rτ2} 1{τ2 < min(τ1, t1)}] + E[e^{−rτ1} W(ψ(τ1)) 1{τ1 < min(τ2, t1)}] + e^{−rt1} W(φ(t1)) P(min(τ1, τ2) ≥ t1).
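To make these expressions concrete, the outcome of a symmetric profile (t, t) can be generated path by path, following the description above. The sketch below uses assumed parameter values and an assumed symmetric joint prior q with correlation ρ = 0.15 (all numbers are ours, for illustration only), and averages player 1's realized discounted payoff.

```python
import math, random

# Assumed illustrative numbers: r, lam, gamma_, s, and a symmetric joint
# prior q over {G,B}^2 with marginal 0.5 and rho = 0.4 - 0.25 = 0.15.
r, lam, gamma_, s = 0.1, 0.5, 2.0, 0.5
qGG, qGB, qBB = 0.4, 0.1, 0.4           # q(GB) = q(BG) = qGB
p_star = r * s / (lam * (gamma_ - s))
g = gamma_ * lam / (r + lam)

def W(pp):                              # one-player value, Section 1.1.2
    if pp < p_star:
        return s
    odds = (1 - pp) * p_star / (pp * (1 - p_star))
    return g * pp + (s - g * p_star) * (1 - pp) / (1 - p_star) * odds ** (r / lam)

def psi(x):                             # P(R2 = G | R1 = G, tau2 >= x)
    e = qGG * math.exp(-lam * x)
    return e / (e + qGB)

def draw(rng):
    """One path: types from q, then conditionally independent success times."""
    u = rng.random()
    R1, R2 = u < qGG + qGB, u < qGG or qGG + qGB <= u < qGG + 2 * qGB
    tau1 = rng.expovariate(lam) if R1 else math.inf
    tau2 = rng.expovariate(lam) if R2 else math.inf
    return tau1, tau2

def payoff1_public(t, rng):
    """Player 1's realized discounted payoff at the profile (t, t)."""
    tau1, tau2 = draw(rng)
    if tau1 < min(tau2, t):             # own success first: stay forever
        return gamma_ * math.exp(-r * tau1)
    if tau2 < min(tau1, t):             # other's success first: continue optimally
        return math.exp(-r * tau2) * W(psi(tau2))
    return s * math.exp(-r * t)         # no success by t: both drop out

rng = random.Random(0)
t = 3.0
est = sum(payoff1_public(t, rng) for _ in range(20000)) / 20000
```

Each path's payoff lies between se^{−rt} and γ by construction, and the average estimates γ_1^P(t, t).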
1.2.2. The private scenario

We now turn to the definition of strategies and to the computation of payoffs in the private scenario. Again, consider any time t ∈ R+, and focus on player i, who we assume is active up to t. As above, if τi < t, player i's arm is known to be good, and player i therefore sticks to it. If now θj < t,6 this is evidence that τj ≥ θj. At time θj, player i then updates his belief to φ(θj) (= P(Ri = G | min(τi, τj) ≥ θj)), remains alone and therefore remains active until his belief reaches p*. Hence, player i's optimal decision at t is dictated by the analysis of the one-player problem P, unless min(τi, θj) ≥ t. Here again, we thus let a pure strategy of player i be a time ti ∈ [0, +∞], with the interpretation that player i drops out at ti if τi ≥ ti and θj ≥ ti. Again, a mixed strategy is a probability distribution over [0, +∞]. Even though a pure strategy is the same mathematical object in the two scenarios, the interpretation is of course different. Consider an arbitrary pure profile (t1, t2). Assume first that t1 = t2 = t. In that case, player i drops out at t if τi ≥ t, and stays forever if τi < t. Player i's expected payoff γ_i^A(t, t) is thus given by

γ_i^A(t, t) := γE[e^{−rτi} 1{τi < t}] + se^{−rt} P(τi ≥ t).
Assume next that t1 < t2. Player 1 drops out at time t1 if τ1 ≥ t1, and stays forever otherwise. Hence, player 1's expected payoff is

γ_1^A(t1, t2) = γE[e^{−rτ1} 1{τ1 < t1}] + se^{−rt1} P(τ1 ≥ t1),

while

γ_2^A(t1, t2) = e^{−rt1} W(φ(t1)) P(min(τ1, τ2) ≥ t1) + γE[e^{−rτ2}(1{τ2 < t1} + 1{τ1 < t1} 1{t1 ≤ τ2 < t2})] + se^{−rt2} P(τ1 < t1, τ2 ≥ t2).
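As a consistency check on the private-scenario payoffs, one can simulate the outcome rules described above path by path and compare the average to a closed form for γ_2^A. The sketch below does this under assumed parameter values and an assumed positively correlated prior; the helper names and the numbers are ours, not the paper's.

```python
import math, random

# Assumed illustrative numbers; the prior q is symmetric with rho = 0.15.
r, lam, gamma_, s = 0.1, 0.5, 2.0, 0.5
qGG, qGB, qBB = 0.4, 0.1, 0.4
p = qGG + qGB                           # marginal P(Ri = G)
p_star = r * s / (lam * (gamma_ - s))
g = gamma_ * lam / (r + lam)

def W(pp):                              # one-player value, Section 1.1.2
    if pp < p_star:
        return s
    odds = (1 - pp) * p_star / (pp * (1 - p_star))
    return g * pp + (s - g * p_star) * (1 - pp) / (1 - p_star) * odds ** (r / lam)

def phi(t):                             # P(R2 = G | tau1 >= t, tau2 >= t)
    e = math.exp(-lam * t)
    return (qGG * e * e + qGB * e) / (qGG * e * e + 2 * qGB * e + qBB)

def gamma2A(t1, t2):
    """Closed form for player 2's payoff when t1 < t2 (private scenario)."""
    E = math.exp
    a = lam / (lam + r)
    no_success = qGG * E(-2 * lam * t1) + 2 * qGB * E(-lam * t1) + qBB
    early = p * a * (1 - E(-(lam + r) * t1))                 # own success before t1
    late = qGG * (1 - E(-lam * t1)) * a * (E(-(lam + r) * t1) - E(-(lam + r) * t2))
    tail = s * E(-r * t2) * (1 - E(-lam * t1)) * (qGG * E(-lam * t2) + qGB)
    return E(-r * t1) * W(phi(t1)) * no_success + gamma_ * (early + late) + tail

def payoff2_private(t1, t2, rng):
    """Player 2's realized payoff at (t1, t2) with t1 < t2, path by path."""
    u = rng.random()
    R1, R2 = u < qGG + qGB, u < qGG or qGG + qGB <= u < qGG + 2 * qGB
    tau1 = rng.expovariate(lam) if R1 else math.inf
    tau2 = rng.expovariate(lam) if R2 else math.inf
    if tau2 < t1:                       # own success before t1: stay forever
        return gamma_ * math.exp(-r * tau2)
    if tau1 >= t1:                      # player 1 drops out at t1: continue optimally
        return math.exp(-r * t1) * W(phi(t1))
    if tau2 < t2:                       # player 1 stays silent; own success before t2
        return gamma_ * math.exp(-r * tau2)
    return s * math.exp(-r * t2)        # no news by t2: drop out at t2

rng = random.Random(0)
n = 50000
est = sum(payoff2_private(2.0, 4.0, rng) for _ in range(n)) / n
```

The Monte Carlo average and the closed form agree up to sampling error, which gives some comfort that the event decomposition (own success before t1; player 1's exit at t1; own success in [t1, t2); no news by t2) is exhaustive.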
1.2.3. Conceptual issues

The definition and interpretation of strategies raises a few conceptual issues. For concreteness, we focus on the private scenario, and let a pure profile (t1, t2) be given, with t1 < t2. Assume that the event θ1 = t1 < τ2 materializes.7 According to the previous interpretation, player 1 drops out at t1, player 2 then updates his belief to φ(t1), and switches to the optimal policy in P. Depending on t1 and on ρ, it may well be that φ(t1) < p*, in which case player 2 wishes to stop as early as possible following t1 and even, possibly, to overturn his time-t1 decision. How then should player 2's decision at time t1 be defined? Preventing player 2 from dropping out at time t1, after he has seen that player 1 has dropped out, would lead to spurious equilibrium non-existence results. As a matter of fact, expected payoffs would not even be defined. To avoid this, we implicitly adopted the view that player 2 is allowed to revise his time-t1 decision, should player 1 decide to drop out. This leads to the informal idea that each player has two decision nodes at each time instant t. At the first one, a player conditions his decision on information available prior to t. Should he decide to remain active, and should some event occur at time t, he is next allowed to revise his decision – that is, to drop out – with no delay.8 We leave aside the technical issues involved in defining a game tree that fits with these interpretations. However, the previous formulas for expected payoffs lead to a well-defined game in strategic form, in which the space of pure strategies is R+ ∪ {+∞}. We emphasize that a strategy only tells when to stop. That is, the pure strategy t does not specify when to stop in the counterfactual event where one did not stop at time t when required. This prevents us from discussing equilibrium refinements (which anyway are not well-defined in this continuous-time framework), and all of our results are stated in terms of Nash equilibrium.9
6 Recall that τj is not observed by player i.
7 This is the event where no success has been hit prior to t1: min(τ1, τ2) ≥ t1.
8 Obviously, one cannot merge these two nodes into a single decision node, at which each player would be allowed to condition his time-t decision on the other player's time-t decision. Indeed, circularity problems would arise, leading to an outcome indeterminacy.
9 Note however that, informally, strategies of player i play 'optimally' when alone, and do not involve non-credible threats.
2. Public experimentation outcomes

This section is devoted to the analysis of the public scenario. In this scenario, the game exhibits strategic complementarities, and the analysis is to some extent predictable. We here state our results, and will provide some intuition for the one result which may sound surprising. We recall that φ(t) is the belief held by a player at time t conditional on no success having occurred prior to t, that is,

φ(t) := P(Ri = G | τi ≥ t, τj ≥ t),

and we denote by tφ ∈ R+ the time at which φ(tφ) = p*,10 so that φ(t) > p* for t < tφ. Given that each player always has the option of ignoring the other player, no player will ever drop out prior to tφ if no success occurs. The extent to which a player will experiment beyond tφ depends on the option value of observing the other
player. It is thus useful to introduce the auxiliary decision problem P̃, in which player i observes at any time t both risky arms. Equivalently, P̃ is the best-reply problem faced by player i in the public scenario, if player j plans never to drop out. We show in Appendices A and B that the decision maker has a unique optimal strategy in P̃, denoted t**, with t** ≥ tφ. Under this strategy, the decision maker drops out at time t** if no success occurred (that is, min(τi, τj) ≥ t**), and otherwise drops out if his belief ever reaches p*. In particular, no player ever wants to experiment beyond time t**, even if the other player were to experiment forever. The following result formalizes these intuitions.

Theorem 1. The symmetric equilibria are the pairs (t, t), with t ∈ [tφ, t**]. More generally, the equilibria are the profiles (t, σ), where t ∈ [tφ, t**] and σ is any distribution such that σ([t, +∞]) = 1, together with the profiles obtained when exchanging the two players.

Since the outcomes induced by the strategy profiles (t, t) and (t, σ) are the same whenever σ([t, +∞]) = 1, asymmetric equilibria are not distinguishable from symmetric ones. When the correlation parameter ρ is negative, one can actually say much more.
Proposition 1. The optimal time t** is given by t** = tφ if ρ ≤ 0, and satisfies t** > tφ if ρ > 0.

If ρ ≤ 0, the optimal policy in P̃ is thus to drop out as soon as one's belief coincides with p*, the optimal cut-off in the one-player benchmark decision problem. In that sense, the marginal option value of observing the other player is equal to zero.11 Consequently, there is a unique symmetric equilibrium when risky arms are negatively correlated. We here provide the intuition behind Proposition 1. It is useful to keep in mind that the beliefs we have introduced so far satisfy φ(·) > p(·) > ψ(·) if ρ < 0, and φ(·) < p(·) < ψ(·) if ρ > 0. We place ourselves at time t**, in the event where min(τ1, τ2) ≥ t**. The dynamic programming principle asserts that the decision maker is indifferent between dropping out at time t** and experimenting during an additional, short amount of time dt (and then behaving in an optimal way at time t** + dt, given the information acquired between t** and t** + dt). Three events may occur in this short time interval. Firstly, player 1 may hit a success: τ1 ∈ [t**, t** + dt). The probability of this event is approximately λφ(t**) dt. If it occurs, the decision maker's overall continuation payoff, discounted back to time t**, is approximately equal to γ.12 It may also happen that player 2 hits a success, but player 1 does not: τ2 ∈ [t**, t** + dt), τ1 ≥ t** + dt. The probability that both players hit a success between t** and t** + dt is of the order of (dt)², irrespective of the correlation between the two arms' types. Hence, the probability of only player 2 hitting a success is again approximately λφ(t**) dt. Moreover, should this occur, the overall continuation payoff of the decision maker would be equal to the value W(ψ(t**)) of P, computed at the updated belief. The most likely event is that no player hits a success. The probability of that event is approximately 1 − 2λφ(t**) dt.
In that case, the flow payoff of the decision maker is exactly 0, and his continuation payoff is exactly s, since the decision maker then drops out at time t** + dt. Hence his overall payoff is equal to e^{−r dt}s when discounted back to t**. When neglecting all terms of an order smaller than dt, the indifference condition at time t** becomes
s = λφ(t**)γ dt + λφ(t**)W(ψ(t**)) dt + (1 − r dt)(1 − 2λφ(t**) dt)s,

which simplifies into
10 Given the symmetry assumption, the map φ(·) is continuous and decreasing, unless risky arms are perfectly negatively correlated, in which case φ is constant, and tφ := +∞. We stress that all of our results are valid when arms are perfectly negatively correlated. Yet, to avoid qualifiers such as the one at the beginning of this footnote, we find it convenient to assume that they are not. Only minor and obvious modifications are needed when they are.
11 However, the value of P̃ is higher than the value of P.
12 The exact continuation payoff depends on the exact value of τ1. But it is multiplied by a probability which is of the order of dt.
λφ(t**)γ + λφ(t**)W(ψ(t**)) − (r + 2λφ(t**))s = 0.   (2)

When ρ < 0, one has ψ(t**) < φ(t**) ≤ φ(tφ) = p*, so that W(ψ(t**)) = s. In that case, Eq. (2) yields

φ(t**) = rs / (λ(γ − s)) = p*,

and t** = tφ as claimed. Intuitively, whether or not player 2 hits a success in the interval [t**, t** + dt], player 1 does not experiment beyond t** + dt. Hence, observing player 2 is valueless to player 1 beyond t**, and player 1 should not be willing to experiment with beliefs below p*. We stress that the observation of player 2 prior to t** is valuable, since it affects the speed at which beliefs change. If instead ρ > 0, one has ψ(tφ) > φ(tφ), hence W(ψ(tφ)) > s, and tφ is no longer a solution of Eq. (2). Therefore, t** > tφ.

3. Private experimentation outcomes

This section is devoted to the second scenario, in which only actions are publicly disclosed. The full analysis raises many technical complications, and all detailed proofs are in Appendices A and B. This section is organized as follows. We first argue that there is a unique symmetric equilibrium, and we describe it. We next provide additional qualitative insights, and discuss asymmetric equilibria. A distinguishing feature of this scenario is that beliefs are endogenous. Given a strategy σj of player j, we denote by ξi(t) the belief held by player i at time t if player j is still active:
ξ_i(t) := P(R_i = G | θ_j ≥ t, τ_i ≥ t).

The map ξ_i is left-continuous, and admits a right-limit at each t ≥ 0, denoted ξ_i(t+) = P(R_i = G | θ_j > t, τ_i > t). The discontinuities of the belief ξ_i(·) coincide with the atoms of the distribution σ_j. This new belief function compares to the earlier ones as follows: one has φ(t) > p(t) > ξ_i(t) ≥ ψ(t) for all t > 0 if ρ < 0, while the inequalities are reversed if ρ > 0.
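The ordering of these belief functions can be checked numerically. The sketch below (all parameter values are hypothetical) computes p(t), φ(t) and ψ(t) for a negatively correlated prior, where q_{ω1ω2} denotes the prior probability that (R_1, R_2) = (ω_1, ω_2), and verifies φ(t) > p(t) > ψ(t) on a grid:

```python
import math

# Illustrative (hypothetical) parameters. q_GG etc. are prior probabilities of
# the type profile (R_1, R_2); here q_GG*q_BB < q_GB*q_BG, so the arms are
# negatively correlated (rho < 0).
lam = 1.0
q_GG, q_GB, q_BG, q_BB = 0.3, 0.3, 0.3, 0.1
p0 = q_GG + q_GB  # prior P(R_i = G)

def p(t):
    """Belief from one's own experience only: no success up to time t."""
    e = p0 * math.exp(-lam * t)
    return e / (e + 1.0 - p0)

def phi(t):
    """Belief given tau_i >= t and tau_j >= t (neither player has succeeded)."""
    num = q_GG * math.exp(-2.0 * lam * t) + q_GB * math.exp(-lam * t)
    den = num + q_BG * math.exp(-lam * t) + q_BB
    return num / den

def psi(t):
    """Belief given tau_i >= t and R_j = G (the other arm is known good)."""
    num = q_GG * math.exp(-lam * t)
    return num / (num + q_BG)

# With rho < 0 the text's ordering is phi(t) > p(t) > psi(t) for t > 0
# (and xi(t) lies between p(t) and psi(t)).
grid = [0.1 * k for k in range(1, 50)]
ordering_holds = all(phi(t) > p(t) > psi(t) for t in grid)
```

With a positively correlated prior (e.g. q_GG = 0.45, q_GB = q_BG = 0.15, q_BB = 0.25), the same computation yields the reversed ordering.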
3.1. Symmetric equilibria

We assume that a symmetric equilibrium (σ, σ) exists, and derive necessary conditions. We first spend some time proving that σ, viewed as a probability measure over R+ ∪ {+∞}, is a non-atomic measure. We then argue that the support of the distribution must be an interval. Finally, we show that the indifference condition of a player uniquely pins down the cdf of σ as the unique solution to a linear, second-order differential equation with non-constant coefficients, and with given boundary values. Proving that a symmetric equilibrium exists is done by checking that the unique equilibrium candidate is indeed an equilibrium (see Appendices A and B).

Proposition 2. The measure σ is non-atomic.

The intuition is as follows. An atom in the strategy of player j is a time t at which player i gets a "grain" of information. When best-replying to that strategy, player i would rather wait and see this information than, say, drop out just before t. Turning this intuition into a proof is a bit delicate. Let us assume, for the sake of contradiction, that the distribution σ has an atom at time t_0, and consider player i. By the equilibrium condition, dropping out is the optimal thing for player i to do at time t_0, and it yields a continuation payoff of s. On the other hand, waiting to see player j's decision yields information. Conditional on player j's decision, the continuation payoff of player i is at least s, and it is equal to W(φ(t_0)) should player j drop out. Thus, the equilibrium condition implies W(φ(t_0)) = s, and therefore φ(t_0) ≤ p∗.

The remaining part of the proof differs according to the sign of ρ. Assume first that ρ < 0. In that case, beliefs satisfy the inequalities13
ξ(t_0+) < ξ(t_0) < φ(t_0) ≤ p∗.

We claim that player i should drop out before t_0 rather than wait until t_0 and then drop out. The argument is a bit tricky, and goes as follows. Assume that, at a time t < t_0 very close to t_0, player i is considering whether to experiment until t_0 (but not until t_0+), or to drop out at t. The latter decision yields s. Consider now the former alternative, and consider the hypothetical situation in which player i were choosing not to observe player j's actions in the interval [t, t_0). Since ξ(t_0) < p∗, the expected continuation payoff of player i would be below s, and the expected loss relative to s would be of the order of t_0 − t.14 However, player i is observing player j's decisions, and is adjusting to them. This added flexibility

13 To compute ξ_i(t_0) and ξ_i(t_0+), one conditions respectively on τ_i ≥ t_0, θ_j ≥ t_0 and on τ_i ≥ t_0, θ_j > t_0. Since σ_j({t_0}) > 0 and since ρ < 0, player i is strictly more pessimistic on the latter event, hence the first inequality.
14 This continuation payoff includes the possibility that player i may hit a success in [t, t_0).
makes a difference relative to the previous hypothetical situation only if player j happens to drop out in the time interval [t, t_0). In that event, the marginal gain of player i between the hypothetical and the true situations is at most of the order of W(φ(t)) − s. Since φ(t_0) ≤ p∗, and since W ∘ φ is of class C1, this marginal gain W(φ(t)) − s is at most of the order of t_0 − t. On the other hand, the probability of player j dropping out in the half-open interval [t, t_0) decreases to zero as t → t_0. Collecting these observations, it can be shown that the marginal expected value of observing player j's actions is arbitrarily small compared to t_0 − t. Hence the expected payoff of player i is higher when dropping out at any time t < t_0 sufficiently close to t_0, which contradicts the equilibrium property.

Assume next that ρ > 0. The inequalities on beliefs are reversed, and we now have
ξ(t_0+) > ξ(t_0) > φ(t_0).   (3)
Our argument here differs according to whether ξ(t_0) < p∗ or ξ(t_0) ≥ p∗. If ξ(t_0) < p∗, we adapt the argument from the previous paragraph and again argue that player i should stop shortly before t_0, instead of waiting and dropping out at time t_0. As above, if player i were choosing not to observe player j between t and t_0, the increase in player i's payoff when dropping out at time t < t_0 would be of the order of t_0 − t. Observing player j's behavior would make a difference to player i's behavior only if player j were dropping out between t and t_0. Given that ρ > 0, player i would then drop out immediately following player j, rather than wait until t_0. By dropping out earlier, player i saves the cost of waiting, which is of the order of t_0 − t. Since the probability of that event decreases to zero as t → t_0, and as above, the marginal value of observing player j's actions cannot offset the expected loss for t sufficiently close to t_0. If instead ξ(t_0) ≥ p∗, then ξ(t_0+) > p∗ by (3), and we claim that player i should instead wait beyond t_0. Indeed, if at time t_0 player i chooses to wait for player j's decision, player i may condition his continuation behavior on player j's decision and mimic it: drop out if player j does, and remain active a bit longer if player j does not drop out. Since ξ(t_0+) > p∗, the continuation payoff of such a strategy is higher than s – in contradiction with the equilibrium property. This concludes the discussion of Proposition 2.

Proposition 2 has a number of easy consequences. First, since σ has no atom, the payoff function t → γ_i^A(t, σ) of player i when facing σ is a continuous function. Denoting by S the support of the measure σ, the equilibrium condition then implies that γ_i^A(t, σ) is equal to player i's equilibrium payoff for every t ∈ S. This in turn implies that S is an interval.
Indeed, if S were not a connected set, there would exist t_0, t_1 ∈ S such that t_0 < t_1 and S ∩ (t_0, t_1) = ∅. Then ξ(t) is equal to P(R_i = G | τ_i ≥ t, θ_j > t_0) for all t ∈ (t_0, t_1), and is thus decreasing on the interval (t_0, t_1). In particular, either ξ(t_0) > p∗ or ξ(t_1) < p∗ holds. In the former case, the strategy t_0 is not a best-reply, since player i would gain by waiting a bit longer. In the latter case, the strategy t_1 is not a best-reply, since player i would gain by dropping out prior to t_1 (using arguments similar to the proof of Proposition 2). This shows that S is an interval. In addition, one has ξ(min S) = p∗. Indeed, much as above, player i would gain by dropping out a bit earlier (resp. later) if ξ(min S) < p∗ (resp. ξ(min S) > p∗). Hence, min S = t_p.

We now argue informally that the indifference condition uniquely pins down both the endpoint max S and the distribution σ. We denote by F(t) := σ([0, t]) = σ([t_p, t]) (t ∈ S) the cdf of σ. The value of F(t) is also equal to the probability P(θ_2 ≤ t | R_2 = B) that player 2 drops out before t if his risky arm is bad,15 and F̃(t) := P(θ_2 ≤ t | R_2 = G) is related to F through the identity
1 − F̃(t) = P(θ_2 > t, τ_2 > t | R_2 = G) + P(θ_2 > τ_2, τ_2 ≤ t | R_2 = G)
          = e^{−λt}(1 − F(t)) + ∫_0^t λe^{−λz}(1 − F(z)) dz.
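The identity relating F̃ to F lends itself to a direct numerical check. The sketch below uses hypothetical parameters (σ is taken uniform on [0.5, 1.5] purely for illustration) to recover F̃ from F by quadrature, and confirms that F̃(t) ≤ F(t): a player whose arm is good drops out less often, since he may hit a success first and then never drop out.

```python
import math

# A sketch with hypothetical parameters: sigma is uniform on [a, b], so
# F(t) = clamp((t - a)/(b - a)), and F_tilde is recovered from the identity
#   1 - F_tilde(t) = e^{-lam t}(1 - F(t)) + int_0^t lam e^{-lam z}(1 - F(z)) dz.
lam, a, b = 1.0, 0.5, 1.5

def F(t):
    return min(1.0, max(0.0, (t - a) / (b - a)))

def F_tilde(t, n=20000):
    # Trapezoidal quadrature for the integral term.
    h = t / n
    integ = 0.0
    for k in range(n + 1):
        z = k * h
        w = 0.5 if k in (0, n) else 1.0
        integ += w * lam * math.exp(-lam * z) * (1.0 - F(z))
    integ *= h
    return 1.0 - (math.exp(-lam * t) * (1.0 - F(t)) + integ)

# F_tilde(t) <= F(t), and F_tilde is itself non-decreasing (it is a sub-cdf:
# with a good arm, the player may succeed and then never drop out).
ts = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
vals = [(F_tilde(t), F(t)) for t in ts]
```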
Let a time t ∈ [t_p, max S) be arbitrary, and consider a slightly later time t + dt ∈ S. When facing σ, by the equilibrium condition the two strategies t and t + dt must yield the same continuation payoff as of time t. These two strategies yield the same outcome if either τ_i < t or θ_j < t, hence we thereafter assume that min(τ_i, θ_j) ≥ t. Consider the three mutually exclusive events A := {τ_i < t + dt}, B := {τ_i ≥ t + dt, θ_j ∈ [t, t + dt)}, and C := {τ_i ≥ t + dt, θ_j ≥ t + dt}. The probability of A is of the order of λξ(t) dt, while the probability of B is P(θ_j ∈ [t, t + dt) | min(τ_i, θ_j) ≥ t). The continuation payoff induced by the strategy profile (t + dt, σ) as of time t is approximately equal to γ on the event A, is equal to W(φ(t)) on the event B, and is equal to se^{−r dt} on the event C, while the continuation payoff induced by the profile (t, σ) is equal to s in all three cases. Writing the indifference condition, and letting dt → 0, one sees that the limit

μ(t) := lim_{dt→0} P(θ_j ∈ [t, t + dt) | min(τ_i, θ_j) ≥ t)/dt

is well-defined, and solves the equation

(γ − s)λξ(t) + (W(φ(t)) − s)μ(t) = rs.   (4)

The limit μ may be interpreted as the hazard rate of θ_j, conditional on τ_i ≥ t.

15 There is a small caveat which is addressed in Appendices A and B.
Both the hazard rate μ(t) and the belief ξ(t) may be rewritten using the cdf F. Indeed, the probability P(τ_i ≥ t, θ_j ∈ [t, t + dt)) is given by

(q_GG e^{−λt} + q_GB)(F̃(t + dt) − F̃(t)) + (q_GB e^{−λt} + q_BB)(F(t + dt) − F(t)),
where q_{ω1ω2} stands for the prior probability that R_1 = ω_1 and R_2 = ω_2, and a similar formula holds for P(τ_i ≥ t, θ_j ≥ t). When plugging into (4), standard but tedious algebraic manipulations show that F solves the integro-differential equation (5) on the interval S:

(W(φ(t)) − s)/(λ(γ − s)) · F′(t) = (p∗ − φ(t))(1 − F(t)) + λφ(t)(p∗ − ψ(t)) ∫_0^t e^{λ(t−x)}(1 − F(x)) dx.   (5)
Observe that, when the correlation parameter ρ is positive, one has φ(t) < p∗ for t ≥ t_φ (and t_p > t_φ), hence the left-hand side of (5) vanishes and the equation reduces to a first-order, linear differential equation. We will show that it has a unique solution F∗ on [t_p, t_ψ] such that F∗(t_p) = 0. When instead ρ < 0, W(φ(t)) > s for all t ∈ [t_p, t_φ), and, using standard results on differential equations, there is a unique function G∗ that solves (5) on the interval [t_p, t_φ), and such that G∗(t) = 0 for all t ≤ t_p.

Theorem 2. There is a unique symmetric equilibrium (σ, σ). The distribution σ is non-atomic. If ρ > 0, the support of σ is the interval [t_p, t_ψ], and the cdf of σ is given by F∗. If ρ < 0, the support of σ is equal to the interval [t_p, t̄], where t̄ := min{t: G∗(t) = 1} < t_ψ. The cdf of σ is equal to G∗ on this interval.
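For the case ρ > 0, the reduced equation can be integrated numerically. The sketch below is based on our reconstruction of Eq. (5) (the sign conventions and all parameter values are assumptions of this illustration): with W(φ(t)) = s on [t_p, t_ψ], (5) reduces to H′(t) = g(t)H(t) for H(t) := ∫_0^t e^{−λx}(1 − F(x)) dx, with g := λφ(ψ − p∗)/(p∗ − φ) and F(t) = 1 − e^{λt} g(t) H(t). A forward Euler pass then traces the candidate cdf F∗:

```python
import math

# A numerical sketch for rho > 0, under our reconstruction of Eq. (5)
# (an assumption of this illustration), with illustrative parameters.
r, lam, s, gamma = 1.0, 1.0, 1.0, 3.0
p_star = r * s / (lam * (gamma - s))                 # = 0.5
q_GG, q_GB, q_BG, q_BB = 0.45, 0.15, 0.15, 0.25     # q_GG*q_BB > q_GB*q_BG: rho > 0
p0 = q_GG + q_GB

def phi(t):
    num = q_GG * math.exp(-2.0 * lam * t) + q_GB * math.exp(-lam * t)
    return num / (num + q_BG * math.exp(-lam * t) + q_BB)

def psi(t):
    num = q_GG * math.exp(-lam * t)
    return num / (num + q_BG)

t_p = math.log(p0 * (1 - p_star) / (p_star * (1 - p0))) / lam    # p(t_p) = p*
t_psi = math.log((1 - p_star) * q_GG / (p_star * q_BG)) / lam    # psi(t_psi) = p*

def g(t):
    return lam * phi(t) * (psi(t) - p_star) / (p_star - phi(t))

# Forward Euler for H' = g H; F vanishes before t_p, so H(t_p) = (1 - e^{-lam t_p})/lam.
n = 20000
h = (t_psi - t_p) / n
H = (1.0 - math.exp(-lam * t_p)) / lam
Fs = []
for k in range(n + 1):
    t = t_p + k * h
    Fs.append(1.0 - math.exp(lam * t) * g(t) * H)
    H += h * g(t) * H
```

The computed cdf starts at 0 at t_p and ends at 1 at t_ψ automatically, since g vanishes where ψ = p∗ — consistent with the support [t_p, t_ψ] in Theorem 2.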
The case ρ > 0 will be taken up again in the next subsection. When ρ < 0, the equilibrium logic is the following. Observe first that player i keeps getting more pessimistic with time, as long as τ_i ≥ t and θ_j ≥ t. This is the combination of two effects. First, the fact that player i does not get any payoff is bad news in itself. Second, the information that player j does not drop out is an increasingly stronger indication that player j may have hit a success with the risky arm, which makes player i even more pessimistic since ρ < 0. On the other hand, the event of player j dropping out at t is good news, but less so the larger t is. Indeed, if player j drops out at time t, player i's continuation payoff jumps to W(φ(t)), and that payoff is decreasing with t. Since, by the equilibrium condition, player i is indifferent between all strategies in the support of σ, it must be that the chances of observing this good news increase with time. That is, the hazard rate of θ_j should increase with time, in a way that exactly offsets the two effects identified above. This condition is expressed by (4) or (5).

3.2. Further results

More precise results are available, but depend on the sign of ρ.
3.2.1. The case of a positive correlation

When ρ > 0, the symmetric equilibrium is the unique equilibrium in non-atomic distributions,16 but there exist purely atomic equilibria (see below). However, all the equilibria share the same, striking, qualitative feature.

Theorem 3. Let (σ_1, σ_2) be an equilibrium, and let S_i ⊆ R+ ∪ {+∞} denote the support of the strategy σ_i. Then for every t ∈ S_i, one has σ_j({t}) = 0 and ξ_i(t) = p∗.

The first part of the theorem generalizes the insights of the symmetric case: if the equilibrium strategy of a player involves an atom at some time t_0, then the other player will not drop out in some neighborhood of t_0. The main conclusion is the second one. It asserts that, whenever player i finds it optimal to drop out (t ∈ S_i), his belief is equal to the one-player optimal cut-off p∗. In a sense, this result mirrors the conclusion of Proposition 1. In both cases, when reaching the one-player cut-off value p∗, the marginal value of the informational spillover is equal to zero, and it is optimal to drop out, whether or not the other player may be observed in the future.

This result casts additional light on Theorem 2. At the symmetric equilibrium (positive ρ), the belief ξ(t) of a player exceeds p∗ for t < t_p, is equal to p∗ throughout the interval [t_p, t_ψ], and falls below p∗ for t > t_ψ. Thus, the intensity with which a player drops out in the interval [t_p, t_ψ] is tuned in such a way that the bad news from not hitting a success is exactly offset by the good news from seeing the opponent remain active, so that one's belief remains constant. The intuition for this result is discussed in the next section. We now discuss the existence of equilibria involving atoms.

16 Proof available upon request.
Proposition 3. There are exactly two pure equilibria: (t_p, t_ψ) and the one obtained when exchanging the two players.

The logic is clear. At time t_p, player 2 deduces from player 1's decision whether player 1 has hit a success or not. If player 1 drops out, the belief of player 2 drops down to φ(t_p) < p∗, and player 2 drops out. If player 1 remains active, then player 2 deduces that R_1 = G, and his belief is from that point on given by ξ_2(t) = ψ(t), so that player 2 finds it optimal to experiment until t_ψ. On the other hand, when facing player 2's strategy, the only way in which player 1 could infer something from player 2's decisions is by waiting until time t_ψ, and then deducing whether τ_2 ≥ t_ψ or not. However, in both cases, player 1 would find it optimal to drop out immediately. Thus, the unique best-reply of player 1 is the strategy t_p.

We conjecture the existence of more complex asymmetric equilibria. More specifically, we conjecture that, for each k ∈ N, there exist two sets S_1 = {u_1, ..., u_k} ⊂ R+ and S_2 = {v_1, ..., v_k} ⊂ R+ such that t_p = u_1 < v_1 < ··· < u_k < v_k = t_ψ, and an equilibrium (σ_1, σ_2) in which σ_i has support S_i. The logic is as follows. If player 1 does not drop out at time u_ℓ, this is good news for player 2, whose belief jumps above p∗ and then declines continuously until it reaches p∗ at time v_ℓ. Meanwhile, player 1's belief declines below p∗ until date v_ℓ. If player 2 does not drop out at that date, player 1's belief jumps above p∗ and then declines continuously until it reaches p∗ at time u_{ℓ+1}. And so on. In addition, the parameters of the equilibrium are tuned in such a way that, at time u_ℓ, the continuation payoff induced by the strategy u_{ℓ+1} is equal to s, so that player 1 is indifferent between all strategies in S_1 (and a similar indifference condition holds for player 2).17
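The two cut-off dates of the pure equilibrium have closed forms, since p(t) and ψ(t) are explicit. A small sketch (all numbers hypothetical):

```python
import math

# The pure equilibrium (t_p, t_psi) of Proposition 3, computed in closed form
# for an illustrative positively correlated prior.
r, lam, s, gamma = 1.0, 1.0, 1.0, 3.0
p_star = r * s / (lam * (gamma - s))
q_GG, q_GB, q_BG, q_BB = 0.45, 0.15, 0.15, 0.25    # q_GG*q_BB > q_GB*q_BG: rho > 0
p0 = q_GG + q_GB

def p(t):
    """Player 1's belief: no success up to t, no outside information."""
    e = p0 * math.exp(-lam * t)
    return e / (e + 1.0 - p0)

def psi(t):
    """Player 2's belief once R_1 = G is learned and tau_2 >= t."""
    num = q_GG * math.exp(-lam * t)
    return num / (num + q_BG)

# Player 1 stops when p(t) reaches p*; player 2, having inferred R_1 = G from
# player 1 remaining active, experiments until psi(t) reaches p*.
t_p = math.log(p0 * (1 - p_star) / (p_star * (1 - p0))) / lam
t_psi = math.log((1 - p_star) * q_GG / (p_star * q_BG)) / lam
```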
Theorem 4. The symmetric equilibrium is the unique equilibrium.

The detailed proof of Theorem 4 is involved, and is given in Appendices A and B. The trickier part is to prove that no equilibrium can possibly involve atoms. It consists in elaborating upon the arguments provided for the symmetric equilibrium case, taking into account asymmetries between the players. While we make no attempt at giving a complete intuition, we wish to make two comments.

First, it is instructive to see why pure equilibria do not exist. Let us argue by contradiction, and assume that such an equilibrium (t_1, t_2) exists. W.l.o.g., assume that t_1 ≤ t_2. At equilibrium, player 1's behavior then does not rely on observing player 2's behavior (indeed, player 1 either drops out at t_1 or never, depending on τ_1). Thus, it must be that t_1 = t_p. When facing the strategy t_p, all approximate best-replies of player 2 are of the following type. Wait until player 1's decision at time t_p. If player 1 drops out, infer that τ_1 ≥ t_p, update one's belief to φ(t_p) > p(t_p) = p∗, and continue experimenting. If instead player 1 remains active, infer that R_1 = G, update one's belief to ψ(t_p) < p(t_p) = p∗, and drop out (unless of course τ_2 < t_p, in which case player 2 never drops out). Thus, t_2 = t_1 = t_p. But now, the strategy t_p is not a best-reply of player 1. Indeed, by choosing not to drop out at time t_p, player 1 can infer from player 2's reaction whether τ_2 < t_p or τ_2 ≥ t_p, update one's belief to φ(t_p) in the latter case, and get a continuation payoff above s. Note that this sketch also rules out approximate equilibria. The same logic allows one to rule out purely atomic equilibria.
It is also instructive to get some intuition for why asymmetric equilibria (σ_1, σ_2) of the following continuous/atomic type do not exist: the two distributions σ_1 and σ_2 (i) have a common connected support [t_p, max S] with lower endpoint t_p and (ii) involve no atom, except that σ_2 assigns positive probability to t_p. That is, the cdf F_1 of σ_1 is continuous, while the cdf F_2 of σ_2 is discontinuous at t_p, and continuous elsewhere. (In the case where ρ > 0, such equilibria are ruled out by Theorem 3, which however does not apply here.) Assume to the contrary that such an equilibrium does exist. Then, at time t_p+ (if player 2 remains active), player 1 is more pessimistic than player 2: ξ_1(t_p+) < ξ_2(t_p+). Since players 1 and 2 are both indifferent between dropping out immediately and experimenting a bit further, it must be that player 1 assigns higher chances than player 2 does to the event that he will receive good news in this time interval. In other words, the probability of player 2 dropping out should be higher than the probability of player 1 dropping out, to compensate for the fact that player 1 is more pessimistic. Should players remain active a bit beyond t_p, this will make player 1 even more pessimistic than player 2. Proceeding "inductively", one obtains more generally that F_2(t) > F_1(t) for all t ∈ (t_p, max S], which contradicts the equality F_1(max S) = F_2(max S) = 1. The formal argument consists in a detailed analysis of the asymmetric versions of Eqs. (4) and (5).

3.3. Comments

We here briefly compare and contrast the results obtained for the two informational scenarios. We view our main finding to be the following one. Recall that when ρ < 0 and payoffs are disclosed, player j's success is bad news for player i. Similarly, when ρ > 0 and payoffs are not disclosed, the exit of player j is bad news for player i. That is, in these two cases, informational events bring sudden bad news, and it is the absence of such events which gradually brings good news.
17 It is not difficult to prove that all asymmetric equilibria have to be of this form.
In both cases (see Proposition 1 and Theorem 3), the marginal value of getting external information is equal to zero when one's belief is equal to p∗. There is a simple common intuition behind Proposition 1 and Theorem 3. Assume that, at time t ∈ R+, player i is considering whether to drop out or to experiment a bit more. The rationale for continuing to experiment is that there is a chance that some event will occur that causes a discontinuity in player i's belief and triggers him to continue beyond t + dt. Such an event may either be one's own success, or an event coming from the observation of player j. In the latter case, however, that event would bring bad news, and would only reinforce player i's desire to drop out. Hence, only the prospect of hitting a success may justify waiting. But then, the condition of being indifferent between dropping out at time t or at time t + dt takes the same form as in the decision problem P. By contrast, when events bring good news, players do experiment with beliefs below p∗.18

When focusing on symmetric equilibria, the effect of information on equilibrium payoffs is clear. Indeed, when payoffs are not disclosed, the equilibrium payoff at the symmetric equilibrium is equal to the value of the decision problem P. (This is because the strategy t_p is a best-reply to the symmetric equilibrium strategy, and then yields the same payoff as in P.) Thus, a player's equilibrium payoff is not increased by observing the other player's decisions. However, a player's equilibrium behavior is modified when observing the other player's decisions, and does depend on the correlation ρ. On the other hand, when payoffs are disclosed, all symmetric equilibria yield a payoff above the value of P (no matter what ρ is). The previous conclusion hinges on the symmetry requirement. At asymmetric equilibria, one of the players gets a payoff strictly higher than the value of P.
Dispensing with the symmetry requirement allows players to coordinate more efficiently, while preserving equilibrium. This raises the issue of equilibrium efficiency. When payoffs are disclosed, all equilibria are inefficient. As should be expected, there is too little experimentation at equilibrium. When payoffs are not disclosed, the efficiency issue is delicate. As we said, the symmetric equilibrium is inefficient, and it is (weakly) Pareto dominated by any asymmetric equilibrium when ρ > 0. The ranking of asymmetric equilibria is, however, unclear. The reason is the following. Let us compare the pure equilibrium (t_p, t_ψ) and the mixed equilibrium in which player 1 randomizes between t_p and some later date t̄ ∈ (t_p, t_ψ), while player 2 randomizes between some t ∈ (t_p, t̄) and t_ψ. In both cases, the equilibrium payoff of player 2 is achieved with the strategy t_ψ. Under this strategy, player 2 either drops out immediately after player 1, or drops out at time t_ψ (unless he has hit a success). On the other hand, player 1 eventually drops out earlier, and with higher probability, when using the pure strategy t_p rather than the mixed equilibrium strategy. Hence, player 2 will drop out earlier and with higher probability in the pure equilibrium than in the mixed one. This has a positive impact on player 2's payoff if his own arm is of a bad type, but not otherwise. Hence the overall effect on player 2's payoff is a priori ambiguous. More generally, the characterization of the strategy profile for which the sum of the players' payoffs is maximal remains an open issue.

We finally discuss the impact of the correlation ρ on the equilibrium outcome. When payoffs are not disclosed, the sign and amount of correlation affect the equilibrium play, but not equilibrium payoffs, at least at the symmetric equilibrium. When payoffs are disclosed, and focusing for concreteness on the symmetric equilibrium (t_φ, t_φ), the impact of correlation is predictable.
Numerical evidence suggests that the equilibrium payoff is minimal when ρ = 0, and tends to increase as ρ increases in absolute value.

Appendix A. Public experimentation outcomes

We collect in this section the formal proofs for the public scenario. We start with Proposition 1 (the analysis of the decision problem P̃), then establish the equilibrium characterization result, Theorem 1. We write o(ε) to denote any function f(ε) defined for all ε > 0 small enough and such that lim_{ε→0} f(ε)/ε = 0.

A.1. Proof of Proposition 1

For concreteness, we assume that the decision maker is player 2. The expected payoff induced in P̃ by the pure strategy t is denoted by π̃(t), and is given by
π̃(t) = E[e^{−rτ_1} W(ψ(τ_1)) 1_{τ_1 ≤ min(τ_2, t)}] + γE[e^{−rτ_2} 1_{τ_2 < min(τ_1, t)}] + se^{−rt} P(O_t)   (6)

(recall that O_t denotes the event {min(τ_1, τ_2) ≥ t}). For t′ > t, one has

π̃(t′) − π̃(t) = E[e^{−rτ_1} W(ψ(τ_1)) 1_{t ≤ τ_1 < min(τ_2, t′)}] + γE[e^{−rτ_2} 1_{t ≤ τ_2 < min(τ_1, t′)}] + s(e^{−rt′} P(O_{t′}) − e^{−rt} P(O_t)).   (7)

We will prove that there is a unique optimal strategy t∗∗, which is obtained as the unique solution to the equation

λφ(t∗∗)(W(ψ(t∗∗)) + γ − 2s) = rs.

In addition, t∗∗ ≥ t_φ, and t∗∗ > t_φ if and only if ρ > 0. The proof is organized in several steps.

18 Except at the symmetric equilibrium (t_φ, t_φ) when payoffs are disclosed.
Claim 5. There exists an optimal strategy.

Proof. By dominated convergence, π̃ is continuous over R+, and has a limit as t → +∞, which we denote π̃(+∞). Letting t′ → +∞ in (7), one gets

π̃(+∞) − π̃(t) = e^{−rt} E[e^{−r(τ_1−t)} W(ψ(τ_1)) 1_{t ≤ τ_1 < τ_2}] + γe^{−rt} E[e^{−r(τ_2−t)} 1_{t ≤ τ_2 < τ_1}] − se^{−rt} P(O_t).   (8)

The two expectations on the right-hand side of (8) converge to zero as t → +∞, while P(O_t) has a positive limit. Thus, π̃(+∞) < π̃(t) for every large t. It follows that π̃ admits (at least) one maximum. □

Claim 6. The function π̃ is of class C1 and its derivative is given by

π̃′(t) = e^{−rt} P(O_t) × (λφ(t)(W(ψ(t)) + γ − 2s) − rs),   t ∈ R+.   (9)
Proof. Fix t_0 ∈ R+. We will prove the claim relative to the right-derivative. Similar arguments apply to the left-derivative. Observe first that, for t > t_0,

P(τ_1, τ_2 ∈ [t_0, t]) ≤ P(τ_1, τ_2 ∈ [t_0, t] | R_1 = R_2 = G) = e^{−2λt_0}(1 − e^{−λ(t−t_0)})² ≤ λ²(t − t_0)².

Therefore,

P(τ_1, τ_2 ∈ [t_0, t]) = o(t − t_0).   (10)
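The elementary bound used here, 1 − e^{−x} ≤ x, can be checked numerically (illustrative values):

```python
import math

# Numerical check of the bound behind (10) (illustrative values): conditional
# on R_1 = R_2 = G,
#   P(tau_1, tau_2 in [t0, t]) = e^{-2 lam t0} (1 - e^{-lam dt})^2 <= (lam dt)^2,
# with dt = t - t0, which is o(dt).
lam, t0 = 1.0, 0.5

def joint_prob(dt):
    return math.exp(-2.0 * lam * t0) * (1.0 - math.exp(-lam * dt)) ** 2

checks = [joint_prob(dt) <= (lam * dt) ** 2 for dt in (0.5, 0.1, 0.01, 0.001)]
# Dividing by dt gives a quantity that vanishes as dt -> 0: this is exactly
# what P(tau_1, tau_2 in [t0, t]) = o(t - t0) means.
ratios = [joint_prob(dt) / dt for dt in (0.1, 0.01, 0.001)]
```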
Thus, up to an o(t − t_0) term, the indicators 1_{t_0 ≤ τ_1 < min(τ_2, t)} and 1_{t_0 ≤ τ_1 < t} may be interchanged (and similarly with τ_2) when taking the limit of (π̃(t) − π̃(t_0))/(t − t_0). Next, consider in turn the two expectations in (7), and divide them by P(O_{t_0}). Observe first that

(1/(t − t_0)) E[e^{−r(τ_1−t_0)} W(ψ(τ_1)) 1_{τ_1 ∈ [t_0, t)} | O_{t_0}] = (φ(t_0)/(t − t_0)) ∫_{t_0}^t e^{−r(x−t_0)} W(ψ(x)) λe^{−λ(x−t_0)} dx,

which converges to λφ(t_0) W(ψ(t_0)) as t ↓ t_0, since W(·) and ψ(·) are continuous. Observe next that (1/(t − t_0)) E[e^{−r(τ_2−t_0)} 1_{τ_2 ∈ [t_0, t)} | O_{t_0}] converges to λφ(t_0), for similar reasons.
We finally deal with the last term in (7). One has

(1/(t − t_0)) · (1/P(O_{t_0})) · (e^{−r(t−t_0)} P(O_t) − P(O_{t_0}))
 = (1/(t − t_0)) (e^{−r(t−t_0)} P(O_t | O_{t_0}) − 1)
 = (1/(t − t_0)) (e^{−r(t−t_0)} − 1) P(O_t | O_{t_0}) − (1/(t − t_0)) P(min(τ_1, τ_2) < t | O_{t_0}).

The first term converges to −r, and the second one has the same limit, 2λφ(t_0), as the expression

P(τ_1 ∈ [t_0, t) | O_{t_0}) + P(τ_2 ∈ [t_0, t) | O_{t_0}).

The result follows when adding these different limits. □
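The last limit can be verified numerically. The sketch below (illustrative symmetric prior; we take O_u to be the event that neither player has had a success by time u, an assumption consistent with its use above) compares the difference quotient at a small dt with the claimed limit −r − 2λφ(t_0):

```python
import math

# With O_u = {min(tau_1, tau_2) >= u} and a symmetric prior (q_BG = q_GB),
# P(O_u) = q_GG e^{-2 lam u} + 2 q_GB e^{-lam u} + q_BB. Illustrative values.
r, lam = 1.0, 1.0
q_GG, q_GB, q_BB = 0.3, 0.3, 0.1

def P_O(u):
    return q_GG * math.exp(-2.0 * lam * u) + 2.0 * q_GB * math.exp(-lam * u) + q_BB

def phi(u):
    num = q_GG * math.exp(-2.0 * lam * u) + q_GB * math.exp(-lam * u)
    return num / P_O(u)

t0 = 0.7
target = -r - 2.0 * lam * phi(t0)
dt = 1e-6
# Difference quotient of e^{-r dt} P(O_{t0+dt} | O_{t0}) - 1, divided by dt.
approx = (math.exp(-r * dt) * P_O(t0 + dt) / P_O(t0) - 1.0) / dt
```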
We now conclude the proof of Proposition 1. Let t∗∗ ∈ R+ be any point at which π̃ reaches its maximum. One has π̃′(t∗∗) = 0, that is,

λφ(t∗∗) W(ψ(t∗∗)) + λφ(t∗∗)γ − s(r + 2λφ(t∗∗)) = 0,

or equivalently,

λφ(t∗∗)(W(ψ(t∗∗)) + γ − 2s) = rs.   (11)

Since φ is decreasing, positive and continuous, and since W(ψ(·)) is non-increasing, Eq. (11) has at most one solution. Recall now that t_φ is defined by φ(t_φ) = p∗, and thus

λφ(t_φ)(γ − s) = rs.   (12)

Since W(ψ(t∗∗)) ≥ s, (11) and (12) imply φ(t∗∗) ≤ φ(t_φ), hence t∗∗ ≥ t_φ. If ρ < 0, one has ψ(t∗∗) ≤ φ(t∗∗) ≤ φ(t_φ) = p∗, hence W(ψ(t∗∗)) = s, so that Eq. (11) reduces in that case to λφ(t∗∗)(γ − s) = rs, and t∗∗ = t_φ.
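Numerically, t_φ (and hence, when ρ < 0, the optimal horizon t∗∗) is easily found by bisection on φ(t) = p∗. A sketch with hypothetical parameters:

```python
import math

# Bisection for t_phi, the time at which phi(t) reaches the one-player cut-off
# p* = rs/(lam(gamma - s)). When rho < 0, the proof above gives t** = t_phi.
# All parameter values are illustrative.
r, lam, s, gamma = 1.0, 1.0, 1.0, 3.0
p_star = r * s / (lam * (gamma - s))
q_GG, q_GB, q_BG, q_BB = 0.3, 0.3, 0.3, 0.1    # negatively correlated prior

def phi(t):
    num = q_GG * math.exp(-2.0 * lam * t) + q_GB * math.exp(-lam * t)
    return num / (num + q_BG * math.exp(-lam * t) + q_BB)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t_phi = bisect(lambda t: phi(t) - p_star, 0.0, 10.0)
```

Since φ is continuous and decreasing (footnote 10), the bisection root is the unique solution of φ(t) = p∗.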
If instead ρ > 0, then ψ(t_φ) > φ(t_φ) = p∗, hence W(ψ(t_φ)) > s, so that t_φ is not a solution to (11). In that case, one therefore has t∗∗ > t_φ. This concludes the proof of Proposition 1. Note that the proof also implies that the payoff function π̃(·) is increasing for t < t∗∗, and decreasing for t > t∗∗.

A.2. Proof of Theorem 1

We first describe the best-reply correspondence of player i. Recall (see Section 1.2.1) that player i's expected payoff γ_i^P(t_i, t_j) is equal to π̃(t_i) as long as t_i ≤ t_j, and does not depend on t_i whenever t_i > t_j. The continuity properties of γ_i^P(·,·) on the diagonal t_1 = t_2 play a role in the analysis and need to be clarified. To this end, fix t_1 ∈ R+, and compare the outcomes induced by the profiles (t_1, t_1) and (t_1, t_2), where t_2 > t_1. The two profiles yield the same outcome, except possibly if min(τ_1, τ_2) ≥ t_1. On this latter event, both players exit at time t_1 according to the profile (t_1, t_1). Instead, under the profile (t_1, t_2), player 1 drops out at time t_1, and player 2 gets a continuation payoff of W(φ(t_1)) ≥ s. Formally,
γ_2^P(t_1, t_2) − γ_2^P(t_1, t_1) = P(O_{t_1}) e^{−rt_1} (W(φ(t_1)) − s).

The continuation payoff W(φ(t_1)) exceeds s if and only if φ(t_1) > p∗, that is, iff t_1 < t_φ. It follows that γ_2^P(t_1, t_1) ≤ γ_2^P(t_1, t_2), with a strict inequality if and only if t_1 < t_φ.

This observation allows us to pin down the best-reply correspondence of player 2. Fix first a pure strategy t_1 of player 1. The map t_2 → γ_2^P(t_1, t_2) coincides with π̃(·) on [0, t_1], and is constant on (t_1, +∞], with γ_2^P(t_1, t_1+) ≥ γ_2^P(t_1, t_1), and a strict inequality if and only if t_1 < t_φ. Since π̃(·) is single-peaked, with a maximum at t∗∗, it follows that the set B_2(t_1) of pure best-replies to t_1 is given by (i) B_2(t_1) = (t_1, +∞] if t_1 < t_φ, (ii) B_2(t_1) = [t_1, +∞] if t_1 ∈ [t_φ, t∗∗], and (iii) B_2(t_1) = {t∗∗} if t_1 > t∗∗. In particular, if (t, t) is a pure symmetric equilibrium, then t ≥ t_φ by (i) and t ≤ t∗∗ by (iii), hence t ∈ [t_φ, t∗∗]. Conversely, any profile (t, t) such that t ∈ [t_φ, t∗∗] is an equilibrium.

The more general conclusions are readily available as well. Observe that the pure strategy t∗∗ is a best-reply to any pure strategy of player 1 and, therefore, to any mixed strategy as well. Fix now a mixed strategy σ_1 of player 1, and let t_2 ∈ [0, +∞] be arbitrary. Let us compare the outcomes induced by the two profiles (σ_1, t_2) and (σ_1, t∗∗). On the event min(τ_1, τ_2, θ_1) < min(t_2, t∗∗), the two outcomes coincide. Assume instead that the event min(τ_1, τ_2, θ_1) ≥ min(t_2, t∗∗) occurs. If t_2 > t∗∗, it is strictly suboptimal to experiment beyond t∗∗, hence the expected continuation payoff, conditional on min(τ_1, τ_2, θ_1) ≥ t∗∗, is strictly higher under (σ_1, t∗∗). Thus, one has γ_2^P(σ_1, t_2) = γ_2^P(σ_1, t∗∗) if and only if σ_1((t∗∗, +∞]) = 0. If now t_2 < t∗∗, the only optimal policy is to continue until player 1 drops out.
Thus, conditional on min(τ_1, τ_2, θ_1) ≥ t_2, the expected continuation payoff of player 2 is strictly higher under (σ_1, t∗∗). In that case, γ_2^P(σ_1, t_2) = γ_2^P(σ_1, t∗∗) if and only if σ_1((t_2, +∞]) = 0.

This allows us to characterize the set of all equilibria. Since γ_i^P is not a continuous function, a little care is needed. Let (σ_1, σ_2) be an equilibrium, and denote by S_i the support of the distribution σ_i. By the equilibrium property, one has γ_i^P(t_i, σ_j) = γ_i^P(σ_i, σ_j) (= γ_i^P(t∗∗, σ_j)) for σ_i-a.e. t_i ∈ S_i. Note that min S_i ≥ t_φ for i = 1, 2.

Assume that, say, min S_1 < t∗∗.19 Since min S_1 ∈ S_1, there exist strategies t_1 arbitrarily close to min S_1 such that γ_1^P(t_1, σ_2) = γ_1^P(t∗∗, σ_2). By the previous paragraph, one has σ_2((t_1, +∞]) = 0 for any such t_1. This implies that max S_2 ≤ min S_1 and, a fortiori, min S_2 < t∗∗. This allows us to exchange the roles of the two players in this argument. One then obtains max S_1 ≤ min S_2. Hence, min S_1 = max S_1 = min S_2 = max S_2. That is, (σ_1, σ_2) is a pure symmetric equilibrium.

Assume now that, say, max S_1 > t∗∗. The previous argument shows that min S_1 ≥ t∗∗ and min S_2 ≥ t∗∗. Since max S_1 ∈ S_1, there exist strategies t_1 > t∗∗ such that γ_1^P(t_1, σ_2) = γ_1^P(t∗∗, σ_2). Therefore, σ_2((t∗∗, +∞]) = 0. Hence, σ_2 is the pure strategy t∗∗, while σ_1 is concentrated on [t∗∗, +∞]. Conversely, any such pair (σ_1, σ_2) is an equilibrium. This concludes the proof of Theorem 1.

Appendix B. Private experimentation outcomes

We here provide all formal proofs for the scenario in which outcomes are private. The section is organized as follows. Section B.1 addresses a number of technical issues relating to payoff monotonicity and beliefs. Section B.2 lists immediate implications for best-replies. Section B.3 introduces the differential equation that describes non-atomic equilibria.
Sections B.4 and B.5 deal respectively with the case of a positive and of a negative correlation. We let a strategy σ j of player j be fixed in Sections B.1 through B.3. All probabilities are computed under the assumption that player j is using the strategy σ j .
19 If ρ < 0, one has t∗∗ = t_φ, hence this situation cannot possibly arise.
D. Rosenberg et al. / Games and Economic Behavior 82 (2013) 31–51
B.1. Technical preliminaries

For notational simplicity, we set A_i(t) := {τ_i ≥ t, θ_j ≥ t} and A_i(t+) := {τ_i ≥ t, θ_j > t}.20 Recall that the belief function ξ_i(·) is defined by ξ_i(t) := P(R_i = G | A_i(t)) (t ∈ R_+). It is readily checked that ξ_i(·) is left-continuous over R_+, and has a right-limit at each t ∈ R_+, which is given by ξ_i(t+) := P(R_i = G | A_i(t+)). In addition, ξ_i is continuous at t ∈ R_+ if and only if σ_j({t}) = 0.

B.1.1. Expected payoffs

For t ≥ 0, we provide a convenient expression for the payoff γ_i^A(t, σ_j) induced by the strategy profile (t, σ_j). Under this profile, the continuation payoff of player i at time min(t, τ_i, θ_j) is given (i) by s if θ_j ≥ t and τ_i ≥ t, (ii) by γ if τ_i ≤ θ_j and τ_i < t, and (iii) by W(φ(θ_j)) if θ_j < min(t, τ_i). Since τ_i has a density conditional on R_i = G, one has P(τ_i = θ_j < +∞) = 0, and player i's expected payoff is therefore equal to
γ_i^A(t, σ_j) = s e^{−rt} P(A_i(t)) + γ E[e^{−rτ_i} 1_{τ_i < min(t, θ_j)}] + E[W(φ(θ_j)) e^{−rθ_j} 1_{θ_j < min(t, τ_i)}].    (13)

Hence, for 0 ≤ t_0 ≤ t,

γ_i^A(t, σ_j) − γ_i^A(t_0, σ_j) = s e^{−rt} P(A_i(t)) − s e^{−rt_0} P(A_i(t_0)) + γ E[e^{−rτ_i} 1_{t_0 ≤ τ_i < min(t, θ_j)}] + E[W(φ(θ_j)) e^{−rθ_j} 1_{t_0 ≤ θ_j < min(t, τ_i)}].    (14)
We state a few properties. The first one follows from (13) by dominated convergence.

Fact 0. The map t → γ_i^A(t, σ_j) is left-continuous over R_+, and admits a right-limit at each t ∈ R_+, given by

γ_i^A(t+, σ_j) = γ_i^A(t, σ_j) + e^{−rt} P(θ_j = t, τ_i ≥ t) × (W(φ(t)) − s).    (15)
In the next statements, we let t > 0 be given.21 Since τ_i follows an exponential distribution if R_i = G, the instantaneous probability of success is equal to λ, if the arm is good. That is:

Fact 1. P(τ_i < t + ε | A_i(t)) = ελ ξ_i(t) + o(ε).

Since P(θ_j ∈ (t, t + ε)) converges to zero as ε → 0, and since θ_j and τ_i are conditionally independent given the type profile (R_i, R_j), Fact 1 implies that

P(t < θ_j < t + ε, t ≤ τ_i < t + ε) = o(ε).    (16)
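Fact 1 is simply the first-order expansion of the exponential success probability, weighted by the current belief. A minimal numerical sanity check, using toy values for λ and for the belief ξ_i(t) (which in the model is pinned down by the primitives), and exploiting the memorylessness of the exponential distribution:

```python
import math

lam = 0.8   # success intensity of a good arm (toy value)
xi = 0.6    # current belief that the arm is good (toy value)

def prob_success_soon(eps: float) -> float:
    # P(tau_i < t + eps | A_i(t)): only a good arm can produce a success,
    # and conditional on R_i = G the residual life of tau_i is Exp(lam)
    # by memorylessness of the exponential distribution.
    return xi * (1.0 - math.exp(-lam * eps))

# Fact 1: the probability equals eps * lam * xi + o(eps);
# the error of the first-order expansion shrinks like eps**2.
for eps in (1e-2, 1e-3, 1e-4):
    exact = prob_success_soon(eps)
    approx = eps * lam * xi
    print(eps, exact, abs(exact - approx) / eps)
```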
Next, observe that

e^{−rε} P(A_i(t+ε) | A_i(t+)) − 1 = (e^{−rε} − 1) P(A_i(t+ε) | A_i(t+)) − P(min(θ_j, τ_i) < t + ε | A_i(t+)).    (17)

The first term on the right-hand side of (17) is equal to −rε + o(ε). Using Fact 1 and (16), the second term is equal to

P(min(θ_j, τ_i) < t + ε | A_i(t+)) = P(θ_j < t + ε | A_i(t+)) + ελ ξ_i(t+) + o(ε).

Plugging into (17), one gets Fact 2 below.

Fact 2. One has

e^{−rε} P(A_i(t+ε) | A_i(t+)) + P(θ_j < t + ε | A_i(t+)) = 1 − ε(r + λξ_i(t+)) + o(ε).

The next statement is related, and slightly more general.

Fact 3. Let β(·) be a C¹ function of time. One has

E[β(θ_j) 1_{θ_j < min(t+ε, τ_i)} | A_i(t+)] − β(t) P(θ_j < t + ε | A_i(t+)) = o(ε),    (18)

and

E[β(τ_i) 1_{τ_i < min(t+ε, θ_j)} | A_i(t+)] = ελ ξ_i(t+) β(t) + o(ε).    (19)

20 These are the events on which player i is actively making a decision. 21 We assume without further notice that P(θ_j > t) > 0.
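The second assertion of Fact 3 can be checked numerically in the benchmark case where θ_j puts no mass near t, so that the conditioning reduces to {τ_i ≥ t}; the values of λ, ξ and the test function β below are toy choices, not model quantities:

```python
import math

lam, xi, t = 0.8, 0.6, 1.5   # toy values
beta = math.cos               # any C^1 test function

def lhs(eps: float, n: int = 10_000) -> float:
    # E[beta(tau_i) 1_{t < tau_i < t+eps} | tau_i >= t], by a midpoint
    # Riemann sum: only a good arm (probability xi) can yield a success,
    # with conditional density lam * exp(-lam (x - t)) on [t, t+eps).
    h = eps / n
    total = 0.0
    for k in range(n):
        x = t + (k + 0.5) * h
        total += beta(x) * lam * math.exp(-lam * (x - t)) * h
    return xi * total

# Fact 3, second assertion: lhs(eps) = eps * lam * xi * beta(t) + o(eps)
for eps in (1e-2, 1e-3):
    print(eps, lhs(eps), eps * lam * xi * beta(t))
```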
Note first that for ε > 0 small, sup_{x∈[t,t+ε]} |β(x) − β(t)| is at most of the order of εβ′(t). Thus, the left-hand side of (18) is at most

εβ′(t) P(θ_j < t + ε | A_i(t+)) + o(ε) = o(ε),

since P(θ_j < t + ε | A_i(t+)) converges to zero as ε → 0. This establishes the first assertion. Note next that the left-hand side of (19) is equal to
β(t) P(τ_i < min(t + ε, θ_j) | A_i(t+)) + o(ε), which, thanks to (16) and Fact 1, is equal in turn to

β(t) P(τ_i < t + ε | A_i(t+)) + o(ε) = ελ ξ_i(t+) β(t) + o(ε).

This establishes the second assertion.

B.1.2. Beliefs and payoff monotonicity

We here state and prove two lemmas which are used later.

Lemma 1. Let I be an open interval such that σ_j(I) = 0. Let t_0 ∈ I be such that ξ_i(t_0) < p∗. Then t → γ_i^A(t, σ_j) is decreasing over the interval [t_0, +∞) ∩ I.

Proof. The intuition is straightforward. Consider time t_0. Since player j does not drop out in I, player i is facing a one-player problem as long as t ∈ I. Since ξ_i(t_0) < p∗, he is better off dropping out at time t_0 than at any later time t ∈ I. Formally, note first that the assumption σ_j(I) = 0 implies that ξ_i(·) is decreasing over I. It is thus enough to prove that γ_i^A(t, σ_j) < γ_i^A(t_0, σ_j) for each t > t_0, t ∈ I. Using (14), one has
(γ_i^A(t, σ_j) − γ_i^A(t_0, σ_j)) / (e^{−rt_0} P(A_i(t_0))) = s e^{−r(t−t_0)} P(τ_i ≥ t | A_i(t_0)) + γ E[e^{−r(τ_i−t_0)} 1_{t_0 ≤ τ_i < t} | A_i(t_0)] − s.    (20)

The sum of the first two terms of the right-hand side of (20) is equal to π_{ξ_i(t_0)}(t − t_0), which is less than s since ξ_i(t_0) < p∗. □
Lemma 2. Let t_0 ∈ R_+ be such that ξ_i(t_0+) > p∗. Then t → γ_i^A(t, σ_j) is increasing over [t_0, t_0 + ε], for ε > 0 small enough.

Proof. Since the inequality ξ_i(t_0+) > p∗ implies that ξ_i(t+) > p∗ for all t > t_0 close enough to t_0, it is sufficient to prove that γ_i^A(t, σ_j) > γ_i^A(t_0, σ_j) for all t > t_0 close enough to t_0. Again, the intuition is straightforward. When at time t_0 (on the event A_i(t_0)), player i expects a continuation payoff above s if player j does not drop out, since ξ_i(t_0+) > p∗. Hence dropping out at time t_0 is suboptimal. We now formalize this intuition. Since W(φ(t)) ≥ s for every t ≥ t_0, one has, using (14),
γ_i^A(t, σ_j) − γ_i^A(t_0, σ_j) ≥ s e^{−rt} P(A_i(t)) − s e^{−rt_0} P(A_i(t_0)) + s E[e^{−rθ_j} 1_{t_0 ≤ θ_j < min(t, τ_i)}] + γ E[e^{−rτ_i} 1_{t_0 ≤ τ_i < min(t, θ_j)}].

When canceling the event θ_j = t_0 in the second and third terms, and using the fact that τ_i has a density, one obtains

γ_i^A(t, σ_j) − γ_i^A(t_0, σ_j) ≥ s e^{−rt} P(A_i(t)) − s e^{−rt_0} P(A_i(t_0+)) + s E[e^{−rθ_j} 1_{t_0 < θ_j < min(t, τ_i)}] + γ E[e^{−rτ_i} 1_{t_0 < τ_i < min(t, θ_j)}]
≥ s e^{−rt} P(τ_i ≥ t, θ_j > t_0) − s e^{−rt_0} P(A_i(t_0+)) + γ E[e^{−rτ_i} 1_{t_0 < τ_i < min(t, θ_j)}]
≥ s e^{−rt} P(τ_i ≥ t, θ_j > t_0) − s e^{−rt_0} P(A_i(t_0+)) + γ E[e^{−rτ_i} 1_{t_0 < τ_i < t, θ_j > t_0}] + o(t − t_0),

where the last inequality follows using (16). When divided by e^{−rt_0} P(A_i(t_0+)), the right-hand side of this last inequality is equal to

π_{ξ_i(t_0+)}(t − t_0) − s + o(t − t_0) = (t − t_0) π′_{ξ_i(t_0+)}(0) + o(t − t_0) > 0,

where the final inequality holds since ξ_i(t_0+) > p∗. □
B.2. Best-reply restrictions

In this section, we proceed with equilibrium necessary conditions, and we let σ_i be a best-reply to σ_j. We denote by S_i and S_j the supports of σ_i and σ_j. We derive two properties. Lemma 3 provides a formal version of the intuitive assertion that a player will not drop out as long as his belief exceeds p∗. Since the belief ξ_i need not be continuous, this will surprisingly turn out to be somewhat delicate. Lemma 5 proves that player i does not drop out immediately before an atom of σ_j.

Lemma 3. One has
• ξ_i(t_0+) ≤ p∗ for all t_0 ∈ S_i.
• ξ_i(t_0) (= ξ_i(t_0+)) = p∗ for all t_0 ∈ S_i \ S_j.

Proof. By the best-reply property, one has γ_i^A(t, σ_j) = γ_i^A(σ_i, σ_j) for σ_i-a.e. t ∈ S_i. Let t_0 ∈ R_+ be such that ξ_i(t_0+) > p∗. By Lemma 2, t → γ_i^A(t, σ_j) is increasing over some interval [t_0, t_0 + ε]. Using in addition the left-continuity of t → γ_i^A(t, σ_j) at t_0, there exists ε > 0 such that γ_i^A(t, σ_j) < γ_i^A(t_0 + ε, σ_j) for all t ∈ [t_0 − ε, t_0 + ε). Therefore, σ_i([t_0 − ε, t_0 + ε)) = 0, and t_0 ∉ S_i. This proves the first statement.

We turn to the second statement. Let I be an open interval containing t_0, and such that I ∩ S_j = ∅. In particular, t → γ_i^A(t, σ_j) is continuous on I. Thus,

γ_i^A(t, σ_j) ≤ γ_i^A(t_0, σ_j) = γ_i^A(σ_i, σ_j),

for each t ∈ I. The result then follows from Lemma 1. □
Lemma 4. Let t_0 ∈ R_+. If ξ_i(t_0) > p∗, then σ_i([t_0 − ε, t_0]) = 0 for some ε > 0.

Proof. Since ξ_i is left-continuous, one has ξ_i(t) > p∗ for all t in some interval [t_0 − ε, t_0]. By Lemma 2, this implies that t → γ_i^A(t, σ_j) is increasing on the interval [t_0 − ε, t_0]. Hence, σ_i([t_0 − ε, t_0)) = 0. It remains to prove that σ_i({t_0}) = 0. We distinguish two cases. Assume first that σ_j({t_0}) = 0, so that ξ_i(t_0+) = ξ_i(t_0) > p∗. By Lemma 2, the map t → γ_i^A(t, σ_j) is increasing on some interval [t_0, t_0 + ε], and therefore σ_i([t_0, t_0 + ε)) = 0, as desired. If instead σ_j({t_0}) > 0, then ξ_i(·) is discontinuous at t_0. Since ξ_i(t_0) is a weighted average of φ(t_0) and of ξ_i(t_0+), one has either φ(t_0) > p∗ or ξ_i(t_0+) > p∗. If ξ_i(t_0+) > p∗, one has, as in the previous paragraph, σ_i([t_0, t_0 + ε)) = 0 for some ε > 0. If φ(t_0) > p∗, one has W(φ(t_0)) > s. Therefore, γ_i^A(t_0+, σ_j) > γ_i^A(t_0, σ_j), using (15), hence σ_i({t_0}) = 0. □

We stress that Lemmas 3 and 4 do not rule out the possibility that ξ_i(t) > p∗ for some t ∈ S_i. Indeed, Lemmas 3 and 4 are consistent with a situation in which ξ_i(t_0) > p∗ ≥ ξ_i(t_0+), and with a strategy σ_i that assigns probability zero to some interval [t_0 − ε, t_0], yet assigns a positive probability to all intervals (t_0, t_0 + ε), ε > 0. The conclusion that ξ_i(t) ≤ p∗ for all t ∈ S_i is nevertheless valid, and follows from our characterization results. Yet, we do not know of a more direct proof.

Lemma 5. Let t_0 ∈ R_+ be given. If σ_j({t_0}) > 0, then σ_i([t_0 − ε, t_0]) = 0 for some ε > 0.

Proof. There are three cases. Firstly, if γ_i^A(t_0+, σ_j) > γ_i^A(t_0, σ_j), then by left-continuity, one has γ_i^A(t, σ_j) < γ_i^A(t′, σ_j) for all t < t_0 < t′ close enough to t_0. The result follows in that case. Secondly, if ξ_i(t_0) > p∗, the conclusion follows by Lemma 4.
Lastly, we are left with the case in which ξ_i(t_0) ≤ p∗ and γ_i^A(t_0+, σ_j) = γ_i^A(t_0, σ_j). Using (15), one must have W(φ(t_0)) = s, so that φ(t_0) ≤ p∗. By Lemma 3, one has ξ_i(t_0+) ≤ p∗ as well. Since ξ_i is discontinuous at t_0, and since ξ_i(t_0) is a weighted average of ξ_i(t_0+) and of φ(t_0), this implies that ξ_i(t_0) < p∗. We claim that one must then have γ_i^A(t_0, σ_j) < γ_i^A(t, σ_j) for t < t_0 close to t_0. Since γ_i^A(t_0, σ_j) = γ_i^A(t_0+, σ_j) = γ_i^A(σ_i, σ_j), this will yield the desired contradiction. We rely on Eq. (14). Since t → W(φ(t)) is a C¹ function, with W(φ(t_0)) = s, and by Fact 3, standard manipulations show that for t < t_0, one has

(γ_i^A(t_0, σ_j) − γ_i^A(t+, σ_j)) / P(A_i(t+)) = s e^{−rt_0} P(τ_i ≥ t_0 | A_i(t+)) + γ E[e^{−rτ_i} 1_{t ≤ τ_i < min(t_0, θ_j)} | A_i(t+)] − s e^{−rt} + o(t_0 − t).    (21)

When divided by e^{−rt}, the right-hand side of (21) is equal to π_{ξ_i(t+)}(t_0 − t) − s + o(t_0 − t). Since ξ_i(t_0) < p∗ and since ξ_i(t+) → ξ_i(t_0) as t → t_0, it is also equal to (t_0 − t) π′_{ξ_i(t_0)}(0) + o(t_0 − t) < 0. □
B.3. On non-atomic equilibria

Recall that σ_j is a given strategy of player j. We set F_j(t) := σ_j([0, t]). We will modify σ_j, and will assume here that player j ignores player i. In effect, we assume that player j's behavior is as follows: a random time θ_j is drawn using the probability distribution σ_j, and player j drops out at θ_j if θ_j ≤ τ_j, and at time +∞ otherwise. Thus, we modify player j's behavior only in the event where player i drops out before player j. Plainly, this has no effect on player i's best-reply. One advantage of this reinterpretation is that the behavior of player j does not depend on the past behavior of player i. With this reinterpretation, F_j(t) is also equal to P(θ_j ≤ t | R_j = B). We also introduce H_j(t) := ∫_0^t e^{−λx}(1 − F_j(x)) dx.

Proposition 4. Let I be an interval in R_+ with non-empty interior, such that σ_j({t}) = 0 for each t ∈ I. Then the following two statements are equivalent:

• On the interval I, the map t → γ_i^A(t, σ_j) is constant;
• On the interval I, the map H_j is of class C² and is a solution to the linear, second-order differential equation

[(W(φ(t)) − s) / (λ(γ − s))] (H″(t) + λH′(t)) = (φ(t) − p∗) H′(t) + λφ(t)(ψ(t) − p∗) H(t).    (22)
Proof. In this proof, we omit a few tedious but routine computations. We first prove that the first statement implies the second one. Fix t ∈ I, and let ε > 0 be such that t + ε ∈ I. Using (14), the equality γ_i^A(t, σ_j) = γ_i^A(t + ε, σ_j) writes

s = s e^{−rε} P(A_i(t+ε) | A_i(t)) + E[W(φ(θ_j)) e^{−r(θ_j−t)} 1_{θ_j < min(t+ε, τ_i)} | A_i(t)] + γ E[e^{−r(τ_i−t)} 1_{τ_i < min(t+ε, θ_j)} | A_i(t)].    (23)

When dividing by ε, taking the limit ε → 0 in the previous equality, using the fact that x → W(φ(x)) e^{−r(x−t)} is C¹, and Facts 1–3, one eventually obtains

0 = −rs + λξ_i(t)(γ − s) + (W(φ(t)) − s) lim_{ε→0} P(θ_j < t + ε | A_i(t)) / ε.    (24)
We introduce the function F̃_j(x) := P(θ_j ≤ x | R_j = G), which is related to F_j by the identity

1 − F̃_j(x) = e^{−λx}(1 − F_j(x)) + ∫_0^x λ e^{−λz}(1 − F_j(z)) dz.
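This identity can be verified numerically for any toy choice of F_j, by computing F̃_j directly as ∫_0^x P(τ_j ≥ z) dF_j(z) — player j drops out at θ_j only if no success occurred first — and comparing with the right-hand side. The uniform distribution below is an arbitrary stand-in, not a model object:

```python
import math

lam = 0.7  # toy intensity

# Toy planned-stop distribution: theta_j uniform on [0, 2], so that
# F_j(x) = P(theta_j <= x | R_j = B) = min(x, 2) / 2.
def F(x: float) -> float:
    return min(x, 2.0) / 2.0

def Ftilde_direct(x: float, n: int = 20_000) -> float:
    # P(j has dropped out by x | R_j = G): j drops at theta_j only if
    # theta_j <= tau_j, with tau_j ~ Exp(lam), so
    # F~(x) = int_0^x exp(-lam z) dF(z)  (midpoint rule).
    h = x / n
    total = 0.0
    for k in range(n):
        z = (k + 0.5) * h
        dF = F(z + h / 2) - F(z - h / 2)
        total += math.exp(-lam * z) * dF
    return total

def Ftilde_identity(x: float, n: int = 20_000) -> float:
    # F~(x) recovered from the identity:
    # 1 - F~(x) = e^{-lam x}(1 - F(x)) + int_0^x lam e^{-lam z}(1 - F(z)) dz.
    h = x / n
    integral = sum(lam * math.exp(-lam * (k + 0.5) * h)
                   * (1.0 - F((k + 0.5) * h)) * h for k in range(n))
    return 1.0 - (math.exp(-lam * x) * (1.0 - F(x)) + integral)

x = 1.3
print(Ftilde_direct(x), Ftilde_identity(x))  # the two values agree
```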
With this notation at hand, elementary computations yield

P(θ_j < t + ε, A_i(t)) = (q_{GG} e^{−λt} + q_{GB})(F̃_j(t+ε) − F̃_j(t)) + (q_{GB} e^{−λt} + q_{BB})(F_j(t+ε) − F_j(t)).
Using the relation between F̃_j and F_j, and the continuity of F_j, Eq. (24) implies that F_j is of class C¹ on the interval I, and therefore, F̃_j is C¹ as well, hence H_j is C². In addition, F̃′_j(t) = e^{−λt} F′_j(t) for each t ∈ I. It follows that

lim_{ε→0} P(θ_j < t + ε | A_i(t)) / ε = (q_{GG} e^{−2λt} + 2q_{GB} e^{−λt} + q_{BB}) F′_j(t) / P(A_i(t)).    (25)
On the other hand, P(A_i(t)) is given by

P(A_i(t)) = q_{GG} e^{−λt}(1 − F̃_j(t)) + q_{GB} e^{−λt}(1 − F_j(t)) + q_{GB}(1 − F̃_j(t)) + q_{BB}(1 − F_j(t)),    (26)

while

P(R_i = G, A_i(t)) = q_{GG} e^{−λt}(1 − F̃_j(t)) + q_{GB} e^{−λt}(1 − F_j(t)).    (27)
When plugging (25), (26) and (27) into (24), using the relation between F̃_j and F_j, and after some manipulations, one obtains

(W(φ(t)) − s)(q_{GG} e^{−2λt} + 2q_{GB} e^{−λt} + q_{BB}) F′_j(t)
= (1 − F_j(t)) [rs(q_{GG} e^{−2λt} + 2q_{GB} e^{−λt} + q_{BB}) − λ(γ − s)(q_{GG} e^{−2λt} + q_{GB} e^{−λt})]
+ (∫_0^t λ e^{−λx}(1 − F_j(x)) dx) × [rs(q_{GG} e^{−λt} + q_{GB}) − λ(γ − s) q_{GG} e^{−λt}].    (28)

Observe next that q_{GG} e^{−2λt} + 2q_{GB} e^{−λt} + q_{BB} = P(τ_i ≥ t, τ_j ≥ t), while
q_{GG} e^{−2λt} + q_{GB} e^{−λt} = P(R_i = G, τ_i ≥ t, τ_j ≥ t) = φ(t) P(τ_i ≥ t, τ_j ≥ t). On the other hand,

q_{GG} e^{−λt} / (q_{GG} e^{−λt} + q_{GB}) = P(R_i = G, R_j = G, τ_i ≥ t) / P(R_j = G, τ_i ≥ t) = ψ(t).
Therefore, one has

(W(φ(t)) − s) F′_j(t) = (1 − F_j(t))(rs − λ(γ − s)φ(t)) + φ(t) e^{λt}(rs − λ(γ − s)ψ(t)) ∫_0^t λ e^{−λx}(1 − F_j(x)) dx,

or equivalently, using p∗ := rs / (λ(γ − s)),

[(W(φ(t)) − s) / (λ(γ − s))] F′_j(t) = (1 − F_j(t))(p∗ − φ(t)) + λφ(t)(p∗ − ψ(t)) e^{λt} ∫_0^t e^{−λx}(1 − F_j(x)) dx.

Since H_j(t) = ∫_0^t e^{−λx}(1 − F_j(x)) dx, it follows that H_j solves the equation
[(W(φ(t)) − s) / (λ(γ − s))] (H″_j(t) + λH′_j(t)) = (φ(t) − p∗) H′_j(t) + λφ(t)(ψ(t) − p∗) H_j(t)

on the interval I, as desired.

Conversely, if H_j(t) := ∫_0^t e^{−λx}(1 − F_j(x)) dx is a C² function and solves (22) on the interval, then the previous computation shows that the map t → γ_i^A(t, σ_j) is differentiable on I, with a derivative equal to zero throughout I. It follows that it is a constant map, as desired. □

B.4. The case ρ > 0

We here prove most of the statements relative to the positive correlation case. We start with Theorem 3. We then rely on Theorem 3 to prove the relevant statement in Theorem 2.

B.4.1. Proof of Theorem 3

We let (σ_1, σ_2) be an equilibrium, and t_0 ∈ S_i. Assume for a moment that ξ_i(t_0) > p∗. Since ρ > 0, this implies that ξ_i(t+) > p∗ throughout some neighborhood of t_0. Hence, by Lemma 2, the map t → γ_i^A(t, σ_j) is increasing over some neighborhood, which is in contradiction with t_0 ∈ S_i. Therefore, ξ_i(t_0) ≤ p∗. Since ρ > 0, this implies φ(t_0) ≤ p∗. By Eq. (15), this implies in turn that t → γ_i^A(t, σ_j) is continuous at t_0, so that γ_i^A(t_0, σ_j) = γ_i^A(σ_i, σ_j). We next claim that ξ_i(t_0) = p∗. If indeed ξ_i(t_0) < p∗, the second part of the proof of Lemma 5 applies here and yields γ_i^A(t_0, σ_j) < γ_i^A(t, σ_j) for t < t_0 close to t_0 – a contradiction. Therefore, ξ_i(t_0) = p∗.

We next prove that σ_j({t_0}) = 0, so that ξ_i(t_0+) = ξ_i(t_0) = p∗, as desired. Assume to the contrary that σ_j({t_0}) > 0. By Lemma 5, one has σ_i([t_0 − ε, t_0]) = 0 for some ε > 0. On the other hand, and since ξ_i is discontinuous at t_0, one has ξ_i(t_0+) > p∗. Using Lemma 2, this implies that σ_i([t_0, t_0 + ε]) = 0 for ε > 0 small enough. Hence σ_i assigns probability zero to some neighborhood of t_0. But this is in contradiction with t_0 ∈ S_i.

B.4.2. Proof of Theorem 2

The proof is organized as follows. We first prove in Lemma 6 that any symmetric equilibrium must have a connected support, equal to the interval [t_p, t_ψ]. We next rely on Proposition 4 to characterize a unique equilibrium candidate, and check that it indeed defines an equilibrium. Let (σ, σ) be a symmetric equilibrium.

Lemma 6. The support S of σ is equal to [t_p, t_ψ].

Proof. Since (σ, σ) is an equilibrium and by Lemma 5, the distribution σ has no atom, and thus ξ_1(·) = ξ_2(·) is continuous on S. By Theorem 3, ξ_1(·) = ξ_2(·) is equal to p∗ on S. Finally, if I is any interval such that σ(I) = 0, then ξ_i is decreasing on I. This implies that S is an interval. Since σ([0, min S]) = 0 and σ([0, max S]) = 1, one has respectively ξ_i(min S) = p(min S) and ξ_i(max S) = ψ(max S), hence min S = t_p and max S = t_ψ. □
Define F(t) := σ([0, t]), and set H(t) := ∫_0^t e^{−λx}(1 − F(x)) dx. By Lemma 6, H(t_p) = ∫_0^{t_p} e^{−λt} dt = (1 − e^{−λt_p})/λ. Since σ has no atom, the payoff function t → γ_i^A(t, σ) is constant on the interval [t_p, t_ψ]. Therefore, by Proposition 4, H is C² on the
interval [t_p, t_ψ] and solves Eq. (22) on that interval. (This implies that F solves (5).) Since ρ > 0, one has φ(t) < p∗, and therefore W(φ(t)) = s, for each t ∈ [t_p, t_ψ]. Hence Eq. (22) boils down to the first-order equation

H′(x) = [λφ(x)(ψ(x) − p∗) / (p∗ − φ(x))] H(x).    (29)

This equation has a unique solution such that H(t_p) = (1 − e^{−λt_p})/λ. This shows that a symmetric equilibrium, if it exists, must be unique.

Conversely, consider the (unique) solution H to Eq. (29) on the interval [t_p, t_ψ] that satisfies the initial condition H(t_p) = (1 − e^{−λt_p})/λ, and observe that H is smooth on the interval [t_p, t_ψ]. Set F(t) = 1 − e^{λt} H′(t) for each t ∈ [t_p, t_ψ]. We now prove that F is the cumulative distribution function of some mixed strategy, which forms a symmetric equilibrium. Observe first that, using (29) and the boundary condition, one has F(t_p) = 0 and F(t_ψ) = 1. It can be checked that the derivative of F is of the sign of ρ on the interval [t_p, t_ψ], so that F is increasing.22 Let σ be the probability distribution defined by σ([0, t]) = F(t). Note that σ has no atom. When facing the strategy σ, the belief of player i satisfies ξ_i(t) = P(R_i = G | τ_i ≥ t) > p∗ for t < t_p, and ξ_i(t) = ψ(t) < p∗ for t > t_ψ. Hence any best-reply of player i to σ assigns probability 1 to [t_p, t_ψ]. By Proposition 4, the map t → γ_i^A(t, σ) is constant on [t_p, t_ψ]. Hence, any strategy that assigns probability 1 to [t_p, t_ψ] is a best-reply to σ. Therefore, (σ, σ) is an equilibrium.
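For intuition, Eq. (29) can be integrated numerically from the boundary value H(t_p) = (1 − e^{−λt_p})/λ and compared with its closed form H(t) = H(t_p) exp(∫_{t_p}^t g(x) dx), where g denotes the coefficient on the right-hand side. The functions φ and ψ below are hypothetical stand-ins with the sign pattern of the ρ > 0 case (φ < p∗, ψ > p∗ until t_ψ), not the model's actual belief functions:

```python
import math

# Toy stand-ins (NOT the model's belief functions): we only need
# p* - phi > 0 and psi - p* >= 0 on [t_p, t_psi], as in the rho > 0 case.
lam, p_star, t_p, t_psi = 0.8, 0.5, 0.4, 1.6
phi = lambda x: 0.45 - 0.1 * (x - t_p)       # stays below p*
psi = lambda x: p_star + 0.3 * (t_psi - x)   # stays above p* until t_psi

def g(x: float) -> float:
    # coefficient of (29): H'(x) = g(x) H(x)
    return lam * phi(x) * (psi(x) - p_star) / (p_star - phi(x))

def solve_H(n: int = 2000) -> float:
    # classical RK4 integration of H' = g H from the boundary value at t_p
    h = (t_psi - t_p) / n
    t, H = t_p, (1.0 - math.exp(-lam * t_p)) / lam
    for _ in range(n):
        k1 = g(t) * H
        k2 = g(t + h / 2) * (H + h * k1 / 2)
        k3 = g(t + h / 2) * (H + h * k2 / 2)
        k4 = g(t + h) * (H + h * k3)
        H += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return H

def closed_form(n: int = 200_000) -> float:
    # H(t_psi) = H(t_p) * exp(int_{t_p}^{t_psi} g(x) dx), midpoint rule
    h = (t_psi - t_p) / n
    integral = sum(g(t_p + (k + 0.5) * h) * h for k in range(n))
    return (1.0 - math.exp(-lam * t_p)) / lam * math.exp(integral)

print(solve_H(), closed_form())  # the two values agree
```

The agreement of the two computations illustrates why the boundary condition at t_p pins down a unique candidate H, and hence a unique symmetric equilibrium.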
B.5. The case ρ < 0

We here prove that when ρ < 0, there is a unique equilibrium, which is symmetric and given by Theorem 2.
B.5.1. Uniqueness

We let a tentative equilibrium (σ_1, σ_2) be given. Building on earlier results, we first prove (Lemma 7) that σ_1 and σ_2 must share the same support, which is an interval starting at t_p. We then argue that one of the strategies, say σ_1, is non-atomic, while σ_2 has at most one atom, necessarily located at t_p. Proposition 4 allows us to uniquely characterize σ_1. The trickier and longer part of the proof consists in showing that the common support property prevents the existence of an atom at t_p.

Lemma 7. One has S_1 = S_2 = [t_p, t̄], for some t̄ ∈ (t_p, t_φ).

Proof. We first argue by way of contradiction that S_1 = S_2, and thus assume that there exists t_0 ∈ S_1 \ S_2. By Lemma 3, one has ξ_1(t_0) = p∗. By Lemma 2, and since ξ_1 is decreasing, one has t_0 = min S_1. Hence, ξ_2(t) is simply equal to P(R_2 = G | τ_2 ≥ t) for t < t_0. This implies that ξ_2(t) ≥ ξ_1(t), and thus ξ_2(t) > p∗, for each t < t_0. By Lemma 2, one must then have t_0 < min S_2. Set t′ := inf{t > t_0: t ∈ S_1 ∪ S_2}. Observe that ξ_1(t) < p∗ for each t ∈ (t_0, min S_2), hence t′ ∉ S_1 by Lemma 3. This implies that t′ = min S_2. We claim that min S_2 belongs to S_1 as well. If instead min S_2 ∉ S_1, then ξ_2(min S_2) = p∗ by Lemma 3. On the other hand, one has ξ_1(t) ≥ ξ_2(t) for t ≤ min S_2, which yields ξ_2(min S_2) < p∗ since ξ_1(t) < p∗ for each t > t_0 – a contradiction. Thus min S_2 ∈ S_1 ∩ S_2, as claimed. Note that we have proven as well that ξ_i(min S_2) < p∗ for i = 1, 2. By Lemma 5, at least one of the two strategies σ_1, σ_2 does not have an atom at min S_2. Fix i ∈ {1, 2} such that σ_i({min S_2}) = 0. Then γ_j^A(·, σ_i) is continuous at min S_2, so that γ_j^A(min S_2, σ_i) = γ_j^A(σ_j, σ_i). By Lemma 1, the map t → γ_j^A(t, σ_i) is decreasing over the interval [t_0, min S_2]. This is in contradiction with the equality γ_j^A(min S_2, σ_i) = γ_j^A(σ_j, σ_i). This concludes the proof that S_1 = S_2 (= S).

We next prove, again by way of contradiction, that S is connected. Assume instead that (t, t̃) ∩ S = ∅, for some t < t̃ in S. Using Lemma 5, assume wlog that σ_2({t}) = 0, so that γ_2^A(t, σ_1) = γ_2^A(σ_1, σ_2). Since t̃ ∈ S, one has σ_1({t̃}) = σ_2({t̃}) = 0,23 and ξ_2(t) ≤ p∗. By Lemma 1 again, this implies that the map t → γ_2^A(t, σ_1) is decreasing over the interval (t, t̃] – a contradiction. □

By Lemma 5, neither of the two distributions
σ_1 and σ_2 has an atom in (t_p, t̄], and at most one of the two distributions has an atom at t_p. Assume that, say, σ_1 has no atom. We denote by F_1 the cdf of σ_1, and introduce H_1(t) = ∫_0^t e^{−λx}(1 − F_1(x)) dx. Since σ_1 is non-atomic, the map t → γ_2^A(t, σ_1) is constant on S. By Proposition 4, H_1 is C² on S, and solves the linear equation (22) on this interval. Since W(φ(t)) > s on the interval [t_p, t_φ), the coefficient of H″_1 does not vanish on this interval, and there is a unique solution to (22) on the interval [t_p, t_φ) that assumes given initial values for H_1(t_p) and H′_1(t_p). On the other hand, note that H_1 satisfies the boundary conditions H_1(t_p) = (1 − e^{−λt_p})/λ and H′_1(t_p) = e^{−λt_p}. Hence H_1, and therefore F_1, is uniquely defined. This in turn uniquely defines t̄ as the smallest t such that F_1(t) = 1.24
22 Details of the computation are available upon request.
23 Otherwise, since S_1 = S_2, t̃ would be a common atom of σ_1 and of σ_2 – contradicting Lemma 5.
24 Recall that we are assuming the existence of an equilibrium.
We now address the more delicate part – proving that σ_1 = σ_2. We build on the intuition stated at the end of Section 3.2.2. Should σ_2({t_p}) > 0, player 1 would be more pessimistic than player 2 at time t_p+: ξ_1(t_p+) < ξ_2(t_p+). Since both players find it optimal to experiment beyond t_p, player 1 must therefore assign higher chances than player 2 to the fact that the other player drops out soon. That is, the (conditional) hazard rate of θ_2 must exceed the hazard rate of θ_1 in a neighborhood of t_p. Proceeding “inductively”, we will show that F_2(t) > F_1(t) for all t ∈ (t_p, t̄], which contradicts the equality F_1(t̄) = F_2(t̄) = 1.

Introduce the map H_2(t) := ∫_0^t e^{−λx}(1 − F_2(x)) dx, for t > t_p. We argue by contradiction and assume that F_2(t_p) > 0 (= F_1(t_p)). Since σ_2 has no atom in (t_p, t̄], one has γ_1^A(t, σ_2) = γ_1^A(σ_1, σ_2) for each t ∈ (t_p, t̄], and therefore H_2 solves (22) on the interval (t_p, t̄]. In addition, when setting H_2(t_p) := (1 − e^{−λt_p})/λ and H′_2(t_p) = e^{−λt_p}(1 − F_2(t_p)), the map H_2 is C² on the closed interval [t_p, t̄] and solves (22) on [t_p, t̄].

We set Ω(t) := (1 − F_1(t)) H_2(t) − (1 − F_2(t)) H_1(t). Note that Ω(t_p) > 0, while Ω(t̄) = 0 since F_1(t̄) = F_2(t̄) = 1. Set t̃ := min{t ∈ [t_p, t̄]: Ω(t) = 0}. Since Ω(t̄) = 0, one has t̃ ∈ [t_p, t̄] and Ω(t̃) = 0. Since Ω(t_p) > 0, one has Ω(t) > 0 for each t ∈ [t_p, t̃). From (22), one has

F′_i(t) / H_i(t) = [λ(γ − s) / (W(φ(t)) − s)] [λφ(t)(p∗ − ψ(t)) e^{λt} + (p∗ − φ(t)) (1 − F_i(t)) / H_i(t)],

for t ∈ [t_p, t̄]. Hence, for fixed t, the ratio F′_i(t)/H_i(t) is decreasing w.r.t. the ratio (1 − F_i(t))/H_i(t). Since Ω(t) > 0 for t ∈ [t_p, t̃), it follows that F′_1(t)/H_1(t) < F′_2(t)/H_2(t) for t ∈ [t_p, t̃), and therefore Ω(·) is increasing over the interval [t_p, t̃] – in contradiction with the equality Ω(t̃) = 0 < Ω(t_p). This concludes the proof that there is at most one equilibrium.
B.5.2. Existence

Since W(φ(x)) > s for x ∈ [t_p, t_φ), there is a C² function H that solves (22) on the interval [t_p, t_φ) and such that H(t_p) = (1 − e^{−λt_p})/λ and H′(t_p) = e^{−λt_p}.

Lemma 8. Let t̃ ∈ [t_p, t_φ) be such that H′(t) > 0 on [t_p, t̃]. Then H″(t) + λH′(t) ≤ 0 for each t ∈ [t_p, t̃]. In particular, H′ is decreasing over [t_p, t̃].

Proof. Set t_0 := max{t ∈ [t_p, t̃]: H″ + λH′ ≤ 0 on [t_p, t]}. From (22), one can check that H″(t_p) + λH′(t_p) < 0, hence t_0 > t_p. We claim that t_0 = t̃. Assume to the contrary that t_0 < t̃. One has H″(t) ≤ −λH′(t) < 0 for each t ∈ [t_p, t_0], hence H′ is decreasing on [t_p, t_0], and H″(t_0) < 0. Hence there exists t_1 ∈ (t_0, t̃] such that H′ is decreasing on [t_0, t_1]. We rewrite Eq. (22) as

α(t)(H″(t) + λH′(t)) = a(t) H′(t) + b(t) H(t),

with α(t) := e^{λt}(W(φ(t)) − s) P(min(τ_1, τ_2) ≥ t) > 0 for each t ∈ [t_p, t_φ],

a(t) := λ(γ − s) P(R_i = G, τ_j ≥ t) − rs e^{λt} P(min(τ_1, τ_2) ≥ t),
b(t) := λ(λ(γ − s) e^{λt} P(R_i = R_j = G, min(τ_i, τ_j) ≥ t) − rs P(R_i = G, τ_j ≥ t)).

It can be shown that a(·) is positive and decreasing, while b(·) is negative and decreasing on [t_p, t_1]. On the other hand, H′ is positive and decreasing, while H is positive and increasing on [t_p, t_1]. Therefore,

a(t) H′(t) + b(t) H(t) < a(t_p) H′(t_p) + b(t_p) H(t_p)    (30)

for each t ∈ (t_p, t_1], and an easy computation shows that the right-hand side of (30) is equal to zero. Hence H″ + λH′ is negative on [t_0, t_1] – in contradiction with the definition of t_0. □
Lemma 9. There exists t < t_φ such that H′(t) = 0.

Proof. Assume instead that H′(t) > 0 for each t ∈ [t_p, t_φ). Integrating (22), there are c_1, c_2 ∈ R such that

H′(t) + λH(t) = c_1 + c_2 [∫_{t_p}^t ((φ(x) − p∗) / (W(φ(x)) − s)) H′(x) dx + λ ∫_{t_p}^t (φ(x)(ψ(x) − p∗) / (W(φ(x)) − s)) H(x) dx].    (31)

By Lemma 8, H′ is decreasing and positive on [t_p, t_φ), and is therefore bounded. Note also that H is positive, increasing and bounded. Since the map x → W(φ(x)) − s is C¹ and vanishes at x = t_φ, the first integral in (31) has a finite limit when t → t_φ, while the second one converges to −∞, which implies H′(t) → −∞ – a contradiction. □
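Both the uniqueness and the existence arguments rest on viewing (22) as a first-order system in (H, H′), integrated forward from the boundary values at t_p. The sketch below shows such an integration; for verifiability it uses constant toy coefficients (α ≡ 1, a ≡ 0, b ≡ 2, λ = 1), for which the exact solution through H(0) = 1, H′(0) = 1 is H(t) = e^t — the model's actual coefficients depend on φ, ψ, W and the prior q's:

```python
import math

lam = 1.0
alpha = lambda t: 1.0   # toy constant coefficients; in the model these
a     = lambda t: 0.0   # depend on phi, psi, W and the prior q's
b     = lambda t: 2.0

def rhs(t, H, Hp):
    # (22) rewritten as alpha (H'' + lam H') = a H' + b H, solved for H''
    return (a(t) * Hp + b(t) * H) / alpha(t) - lam * Hp

def integrate(H0, Hp0, t0, t1, n=4000):
    # classical RK4 on the first-order system (H, H')
    h = (t1 - t0) / n
    t, H, Hp = t0, H0, Hp0
    for _ in range(n):
        k1h, k1p = Hp, rhs(t, H, Hp)
        k2h, k2p = Hp + h*k1p/2, rhs(t + h/2, H + h*k1h/2, Hp + h*k1p/2)
        k3h, k3p = Hp + h*k2p/2, rhs(t + h/2, H + h*k2h/2, Hp + h*k2p/2)
        k4h, k4p = Hp + h*k3p, rhs(t + h, H + h*k3h, Hp + h*k3p)
        H  += h * (k1h + 2*k2h + 2*k3h + k4h) / 6
        Hp += h * (k1p + 2*k2p + 2*k3p + k4p) / 6
        t += h
    return H, Hp

# With these toy coefficients, H'' = 2H - H', whose solution through
# H(0) = 1, H'(0) = 1 is H(t) = e^t.
H1, Hp1 = integrate(1.0, 1.0, 0.0, 1.0)
print(H1, Hp1)
```

In the model itself, one would integrate with the time-varying coefficients until H′ first hits zero, which is exactly how t̄ is defined in the construction that follows.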
Define t̄ := min{t ∈ [t_p, t_φ]: H′(t) = 0}. Define F by F(t) = 0 for t ≤ t_p, F(t) = 1 − e^{λt} H′(t) for t ∈ [t_p, t̄], and F(t) = 1 for t > t̄. The map F is increasing on [t_p, t̄] by Lemma 8, and is continuous over R_+. It is therefore the cdf of a non-atomic measure σ, which is concentrated on [t_p, t̄]. We now argue that (σ, σ) is a symmetric equilibrium. Since σ is non-atomic, the map t → γ_i^A(t, σ) is continuous over the compact set R_+ ∪ {+∞}, and a best-reply therefore exists. When facing σ, one has ξ_i(t) > p∗ for t < t_p, and ξ_i(t) < p∗ for t > t̄, hence the support of any best-reply is included in [t_p, t̄]. By Proposition 4, the map t → γ_i^A(t, σ) is constant on [t_p, t̄], and thus any strategy with support in [t_p, t̄] is a best-reply to σ. Therefore, (σ, σ) is an equilibrium.

References

Aoyagi, M., 1998. Mutual observability and the convergence of actions in a multi-person two-armed bandit model. J. Econ. Theory 82, 405–424.
Bergemann, D., Hege, U., 2005. The financing of innovation: Learning and stopping. RAND J. Econ. 36 (4), 719–752.
Bergemann, D., Välimäki, J., 2008. Bandit problems. In: Durlauf, S., Blume, L. (Eds.), The New Palgrave Dictionary of Economics, 2nd edition. Palgrave Macmillan, Basingstoke, New York.
Bolton, P., Harris, C., 1999. Strategic experimentation. Econometrica 67 (2), 349–374.
Keller, G., Rady, S., 2010. Strategic experimentation with Poisson bandits. Theoretical Econ. 5, 275–311.
Keller, G., Rady, S., Cripps, M., 2005. Strategic experimentation with exponential bandits. Econometrica 73 (1), 39–68.
Klein, N., Rady, S., 2011. Negatively correlated bandits. Rev. Econ. Stud. 78 (2), 693–732.
McLennan, A., 1984. Price diversity and incomplete learning in the long run. J. Econ. Dynam. Control 7, 331–347.
Malueg, D.A., Tsutsui, S.O., 1997. Dynamic R&D competition with learning. RAND J. Econ. 28 (4), 751–772.
Moscarini, G., Squintani, F., 2010. Competitive experimentation with private information: The survivor's curse. J. Econ. Theory 145, 639–660.
Murto, P., Välimäki, J., 2009. Learning and information aggregation in an exit game. Mimeo.
Presman, E.L., 1991. Poisson version of the two-armed bandit problem with discounting. Theory Probab. Appl. 35, 307–317.
Rosenberg, D., Solan, E., Vieille, N., 2007. Social learning in one-arm bandit problems. Econometrica 75 (6), 1591–1611.