Discounted quotas
Alexander Frankel 1
Chicago Booth, United States
Received 23 January 2015; final version received 3 August 2016; accepted 10 August 2016. Available online 2 September 2016.
Abstract This paper extends the concept of a quota contract to account for discounting and for the possibility of infinitely many periods: a discounted quota fixes the number of expected discounted plays on each action. I first present a repeated principal-agent contracting environment in which menus of discounted quota contracts are optimal. I then recursively characterize the dynamics of discounted quotas for an infinitely repeated iid problem. Dynamics are described more explicitly for the limit as interactions become frequent, and for the case where only two actions are available. © 2016 Elsevier Inc. All rights reserved.
JEL classification: D82; D86 Keywords: Dynamic contracts; Credence goods; Delegation
1. Introduction

Contracts between a principal and agent are often conditioned on the actions that an agent takes, but not on the outcomes of those actions. For instance, insurance companies tend to reimburse hospitals according to a fee-for-service model rather than paying for performance. The hospital isn't fined when the patient suffers a complication. In environments where an agent's
E-mail address: [email protected].
1 Thanks to Manuel Amador, Eric Budish, Jeremy Bulow, Johannes Hörner, Michael Ostrovsky, Canice Prendergast, Michael Schwarz, and Lars Stole for useful discussions; and especially to Andrzej Skrzypacz and Bob Wilson for their guidance and advice.
actions are observable but where outcomes may be hard to contract on, how should a principal design a long-term incentive contract? One form of contract which does not depend on outcomes is a quota, i.e., a restriction on the number of times that each treatment can be performed. Quotas have the virtue of giving no incentive for an agent to push for higher revenue or lower cost actions at any point in time. Suppose a governmental insurance provider imposes a yearly quota over the number of times that a hospital can perform costly, intensive treatments versus cheap, simple treatments – call them Surgery versus Aspirin. Without a quota, the hospital would have an incentive to push the higher profit procedure onto everyone. But with a quota, it gets paid the same amount and incurs the same costs no matter what is the order of the procedures. The hospital is willing to give the sicker patients Surgeries and the healthier patients Aspirins. Of course, quotas are not perfect when the distribution of needs is not precisely known in advance. If more sick patients show up than expected in some year, a quota will force the hospital to give the last ones Aspirin instead of the Surgery they require. One way of minimizing the error in a quota is to push back the time horizon. Instead of having a separate quota every year, have a quota over ten-year blocks. Longer quotas average out the short-term noise and allow more patients to receive the appropriate treatment. As we extend the time horizon, though, the discounting of the future becomes more important. In a quota over a long block of time, the hospital might prefer to treat early patients with inexpensive Aspirin and save costly Surgeries for the end. Or if it receives a large payment at the time of performing each Surgery, it would do Surgeries early and save Aspirins for later. Loosely speaking, discounting limits the length of the block of periods over which a quota retains its incentive properties. More impatient agents must be given shorter blocks. In this paper I argue that instead of creating quotas over artificial blocks of periods, we should in fact link together all decisions over the lifetime of the contract. We can do this in a way that maintains incentives for any given discount factor of the agent, and across any number of periods. A discounted quota contract specifies that over the entire lifetime of the interaction, each action is to be played a certain number of discounted times. We can implement the discounted quota as follows. At the beginning of a period, the contract keeps track of a “weight” – a number of remaining discounted plays – on each action. When an action is played its weight is decremented by one, just as in a standard quota. But to adjust for discounting, at the beginning of the next period all weights are scaled up by an “interest rate” equal to the inverse of the agent’s discount factor. In other words, when actions are not played, their weights recharge over time. So whether a Surgery is undertaken today or saved up for later, the hospital incurs a constant discounted number of Surgery costs. If payments are also independent of the timing of procedures, then the agent has no incentive to play lower cost actions earlier or later. Note that the discounted quota constraints depend only on the agent’s discount factor. The government’s suggestions for which treatment to apply to which patient might change based on whether it is a patient social planner or an impatient political actor. 
But the interest rate used for scaling up weights after each period is unchanged. Section 2 formalizes the logic of this hospital example, and then Section 3 models a more general contracting environment in which a principal and agent serve a stream of customers. The agent (hospital) privately observes the state of the world each period (health condition of an arriving patient) and then performs an observable, contractible action (treatment). The principal (government) wants actions to be appropriate for the underlying conditions. The agent’s costs of each action are state-independent and privately observed. In this environment, I find that menus of discounted quotas are optimal contracts. These quotas provide good incentives to the agent
in that she is made indifferent over all action choices in each period, so she is willing to follow the principal’s preferred match of actions to states. In an extension (see Section 5), I show that discounted quotas continue to give the agents good incentives when payoffs are partially aligned – for instance, hospitals care about patient welfare – though they may no longer be part of an optimal contract. Having introduced and motivated discounted quotas as a contract form, Section 4 investigates their dynamics. What is the rule that determines when an action should be taken? Under this action policy, what are the principal’s corresponding payoffs? In a setting with iid states drawn from some known distribution, actions and payoffs can be derived as the solution to a simple dynamic program. I characterize the solution to this dynamic program in detail for two special cases. First, I look at the limit as actions become frequent or, equivalently, as players become patient. Increasing the frequency of actions effectively lengthens the relevant time horizon of the players; it is as if we have added extra periods in a finite non-discounted quota. In an iid setting, the aggregate uncertainty over the empirical frequency of different states vanishes. We can therefore calculate the principal’s continuation payoffs nonrecursively starting from any history of a discounted quota contract. Second, I consider the case where there are only two available actions. The policy will take the simple form of a cutoff rule. For each underlying state, we take an action if and only if it has a high enough weight (number of remaining plays). So the more an action has been played, the scarcer it becomes, and the more reluctant the agent is to play it. The weight cutoff for a state depends on the relative payoffs of each action. A patient who would benefit from either Aspirin or Surgery may be given whichever treatment happens to have a higher weight. A patient who desperately needs Surgery, on the other hand, would be treated with Aspirin only if we are out, or almost out, of Surgeries. One question regarding the dynamics of discounted quotas is whether actions are ever “used up.” When an action is down to a single play remaining, are there states under which it should be taken? Or should the action be saved for the future, letting its weight recharge? The answer turns out to depend on the relative discounting of the principal and the agent. When the principal is more patient than the agent, he will always prefer to save a little bit of weight for the future. No action is ever played for a final time. When the principal is less patient than the agent, he will be willing to use up actions. In that case, over an infinite time horizon, some action will eventually be used up and the players will be “immiserated” – the one remaining action will be played forever after, with no flexibility to play different actions in different states. Equal discount factors is a knife-edge case: in the optimal contract, an action can only be used up in the best possible state for that action. So if states are finite and the best state is realized with positive probability, one of the actions will eventually be used up. If states are continuous and any given state occurs with zero probability, actions will never be used up. The model I consider, in which the principal conditions a contract on actions but not on outcomes, can be thought of as a dynamic analogue of the verifiable credence goods model in Dulleck and Kerschbamer (2006). 
As in that paper, the principal cares about the state of the world while the agent has state-independent preferences. In motivating the use of action restrictions rather than money, I also add in private information on the agent's costs. Chakraborty and Harbaugh (2010) and Frankel and Schwarz (2014) look at alternative information revelation games over multiple decisions in which the observer of the private information has state-independent preferences. Those papers exogenously assume that monetary incentives are not used.
If the agent’s preferences as well as the principal’s were to depend on the state of the world, we could think of the model with contracting only on actions as a dynamic version of a delegation problem (see Holmström, 1977). Frankel (2014) and Frankel (forthcoming) look at delegation games over many decisions that may occur all at once, or over time. In particular, Frankel (2014) explores environments where the agent has unknown state-dependent preferences and shows that quota-like contracts can be max–min optimal – maximizing the principal’s payoffs against worst-case agent biases. Guo and Horner (2015) looks at delegation dynamics when underlying states evolve according to a two-state Markov process and finds that the optimal contract takes the form of a kind of dynamic budget over the number of times actions can be taken.2 The idea of restricting the number of times that an action can be taken, or a message reported, in a mechanism or an equilibrium of a repeated game goes back a long way in economics. Without solving for optimal mechanisms, many such papers show that linking multiple decisions through a quota improves payoffs relative to a static model, or show that the payoffs may approach approximate efficiency as the number of decisions grows large. Most prominently, Jackson and Sonnenschein (2007) generalize an example of Townsend (1982) to develop a “linking mechanism” for determining allocations. The mechanism places a quota over the number of times that agents can report realizations of their private values over a finite block of periods. It approaches efficiency in relationships with many periods and iid valuations. To give a sampling of other work in this space, Skrzypacz and Hopenhayn (2004) looks at a repeated auction setting and suggests a “chips mechanism” which approaches a theoretical upper bound on efficiency. Casella (2005) proposes a dynamic voting mechanism with “storable votes” which may perform better than repeated one-shot voting. Chakraborty and Harbaugh (2007) finds that quotas get a decisionmaker approximately first-best payoffs in a cheap talk game over many ex ante identical decisions. Escobar and Toikka (2013) analyze linking mechanisms in a Markovian rather than iid environment. In the current paper I do find approximate efficiency results similar to the ones in the literature, but such results are not my focus.3 The basic incentive property of a discounted quota – that it can make an agent indifferent across actions and therefore willing to act in another’s best interests – was originally noted in Frankel and Schwarz (2014), as a property of a repeated game equilibrium. Specifically, in that paper short-lived customers hire a long-lived expert in each period, and the expert diagnoses and treats a problem. Experts are willing to diagnose the customers’ problems truthfully (in a positive share of periods) when the equilibrium supports an implicit discounted quota: buyers stop hiring a given expert if she makes certain diagnoses too frequently. The paper used such quotas to construct approximately efficient equilibria in a limit with patient experts. By contrast, the current paper derives these quotas and studies their optimal implementation in a contracting setting, where there is a long-lived principal with commitment power. Instead of running a sequence of quotas over artificially short blocks of periods, as in Frankel and Schwarz (2014), all periods should be aggregated into a single block. I can explore the dynamics and corresponding payoffs
2 Amador et al. (2003, Section 4) and Athey et al. (2005) look at dynamic delegation games with tradeoffs driven by time-inconsistent preferences, and find conditions under which budgets or quotas across decisions are not beneficial. Instead, it is optimal to repeatedly use static – that is, history-independent – contracts. 3 In many of these papers, the agent’s relevant private information is on her costs or benefits of each action. The current paper focuses instead on an agent with information about which action the principal would prefer. Discounted quotas can be used in either case (see Section 5 on altruistic principals), but it is the latter setting in which I motivate quotas as optimal contracts.
of the contract even when players have low discount factors, and even when discount factors disagree.

2. Motivating examples

There is a government (principal) and a hospital (agent). In each period t = 1, ..., t̄, a patient arrives with condition θ_t (the state of the world) and then the patient receives a treatment a_t (the action). The condition is drawn according to some stochastic process and may be Severe (Sev) or Mild. There are two possible treatments for a patient, either Surgery (Surg) or Aspirin (Asp). The government prefers that patients with Severe conditions get Surgeries and patients with Mild conditions get Aspirin. Writing his stage utility as U_P(a, θ), we have that U_P(Surg, Sev) > U_P(Asp, Sev) and U_P(Surg, Mild) < U_P(Asp, Mild). The hospital doesn't care about patient welfare. She simply faces a cost U_A(Surg) for performing a Surgery, and a cost U_A(Asp) for administering Aspirin. These costs are drawn independently from some atomless distributions at the beginning of time, are constant across periods, and are privately observed. If there is a transfer payment τ_t from the government to the hospital in period t, the stage payoff of the hospital in period t is U_A(a_t) + τ_t and the stage payoff of the government is U_P(a_t, θ_t) − τ_t. These payoffs are discounted by both players at discount factor β. Treatments are contractible and publicly observable, but the state is observable only to the agent. The agent learns state θ_t at the beginning of period t and the principal never learns the state. A contract specifies the set of actions that may be taken at each public history along with a corresponding transfer payment for each action.

Example 1. Let there be t̄ = 10 periods of the game, and no discounting (β = 1). Consider two types of contracts, fee-for-service and quota.

Under a fee-for-service contract, the government fixes some payments τ^Surg for Surgery and τ^Asp for Aspirin. At period t, the hospital is free to give any treatment a_t it desires, and receives the corresponding payment τ^{a_t}. Notice that, for any τ^Surg and τ^Asp that are chosen, the hospital will almost surely strictly prefer one treatment over the other. It will treat every patient with a Surgery if its costs are drawn such that U_A(Surg) + τ^Surg > U_A(Asp) + τ^Asp, and it will treat every patient with Aspirin if U_A(Surg) + τ^Surg < U_A(Asp) + τ^Asp. It will only be indifferent and willing to perform either treatment in the zero probability case that these terms are exactly equal.

Under a quota contract, the principal makes some fixed transfer payment to the agent over the course of the game, and then requires that Surgery be undertaken in exactly 0 ≤ n ≤ 10 out of the 10 periods. (If 30% of patients were expected to require Surgery, for instance, a natural choice would be n = 3.) Aspirin must then be chosen in 10 − n periods. Whatever actions it plays in each period, it will ultimately incur n Surgery costs of U_A(Surg) and 10 − n Aspirin costs of U_A(Asp). The hospital is therefore indifferent over the timing of Surgery and Aspirin choices, and is willing to follow any strategy the government proposes. For instance, the government may suggest Surgery for every Severe patient and Aspirin for every Mild patient until one of the actions is used up. The agent would then play the remaining action for the rest of the game.

The quota constraints can be recursively formulated in the following manner.
Let the "weights" W_t^Surg and W_t^Asp denote the remaining number of plays on Surgery and Aspirin at the start of period t. It holds that W_t^Surg + W_t^Asp = 11 − t. Action Surg can be taken in the current period if W_t^Surg ≥ 1, and action Asp can be taken if W_t^Asp ≥ 1. After an action is taken at period t, reduce the weight on that action by one unit at t + 1, and keep the other weight fixed. □

As the example illustrates, a quota can induce an agent to match her action to the state of the world, at least some of the time. Moreover, if action restrictions of some form are to be used, it is easy to see that they should be made as flexible as possible. Once it has been established that the agent must play three Surgeries over ten periods, say, there is no point in any additional restriction on the agent's behavior. For instance, it is counterproductive to tell the agent in advance that of the three total Surgeries, she must play exactly one Surgery over periods 1 through 3, or at most two Surgeries over periods 1 through 7. The principal does better just to give the agent the aggregate quota constraint of three Surgeries over ten periods. The agent can then use her information to play in the principal's best interests.

But, as long as the principal doesn't know the agent's precise tradeoffs across actions, fee-for-service will not get the agent to condition her action on the state. Indeed, as I will formalize in Lemma 1 of Section 3.4, more complicated monetary payment schemes would also fail. A policy which adjusted payments up or down over time in response to the agent's action choices, for instance, could never calibrate payments to an agent's exact indifference level. Even a screening contract, in which an agent could select her reimbursement rate from a menu after observing her costs, would not solve the problem. She would misreport her costs if the principal were to attempt to use these reports to set payments that would make her indifferent.

The analysis of fee-for-service contracts would not change if we added in discounting. Whatever payments were suggested, the agent would still almost always push for Surgery or for Aspirin independently of the patient's need. The next example shows how to modify quotas to account for discounting.

Example 2. Let there be t̄ = ∞ periods of the game, and let the discount factor be β = .9. In discounted terms, there are 1/(1 − β) = 10 periods' worth of payoffs remaining from any point onward. Consider a discounted quota of the following form: starting from the first period, Surgery must be played n discounted times, and Aspirin 10 − n discounted times. So, whatever her strategy, the agent's total discounted action payoff will be n · U_A(Surg) + (10 − n) · U_A(Asp). The agent is once again indifferent over all strategies and is willing to follow the principal's suggestions.

To recursively write these quota constraints, let the weights W_t^Surg and W_t^Asp denote the remaining number of discounted plays on Surgery and Aspirin at the start of period t. It holds that W_t^Surg + W_t^Asp = 10 at each t. An action can be played when its weight is greater than or equal to 1.⁴ If an action is taken today then the discounting constraint requires that W_t^a = 1 + β W_{t+1}^a. If action a is not taken, it must be that W_t^a = 0 + β W_{t+1}^a. Rearranging, next period's weight is given by

\[
W_{t+1}^a =
\begin{cases}
\dfrac{W_t^a - 1}{\beta} & \text{if } a_t = a, \\[2mm]
\dfrac{W_t^a}{\beta} & \text{if } a_t \neq a.
\end{cases}
\]

In other words, just as in a standard quota, we decrement the weight by one unit after we take an action. Moving to the next period, though, we scale all of the weights up by 1/β. With β = .9 this gets us back to a total weight of 10 going forward from t + 1.

4 In the model of Section 3 I allow for fractional actions via randomization.
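To see these dynamics in code, here is a minimal sketch. It is my own illustration rather than anything in the paper; the starting weights match the hypothetical numbers used in the numerical illustration that follows.

```python
# Minimal sketch of the weight recursion in Example 2: after an action is played,
# its weight falls by one, and all weights are then scaled up by 1/beta before the
# next period, so weights "recharge" when an action is not used.

beta = 0.9
weights = {"Surg": 1.0, "Asp": 9.0}   # hypothetical starting weights

def play(action, weights, beta):
    """Return next period's weights after `action` is taken this period."""
    assert weights[action] >= 1, "an action needs a weight of at least 1 to be played"
    return {a: (w - (1 if a == action else 0)) / beta for a, w in weights.items()}

print(play("Asp", weights, beta))    # {'Surg': 1.11..., 'Asp': 8.88...}
print(play("Surg", weights, beta))   # {'Surg': 0.0, 'Asp': 10.0}
```

Either way, the total weight returns to 1/(1 − β) = 10 at the start of the next period.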
For instance, suppose at period 1 we have weights W_1^Surg = 1 and W_1^Asp = 9. If Aspirin is played in the first period, then at period 2 the weights become W_2^Surg = 1/.9 = 1.11 and W_2^Asp = (9 − 1)/.9 = 8.89. If Surgery is played in the first period, then period 2 weights go to W_2^Surg = (1 − 1)/.9 = 0 and W_2^Asp = 9/.9 = 10. □

While β represented both the agent's and principal's discount factors (β_A = β_P = β), the form of the quota constraints in Example 2 depended only on the agent's discount factor. A principal with a different discount factor than the agent (β_P ≠ β_A) would write a quota which evaluated the number of "discounted times" that each action could be played at the agent's discount factor. Of course, the principal's discount factor might affect his suggestions to the agent about which actions to play.

In coming sections I generalize these examples and show that under certain conditions, menus of discounted quotas are unimprovable. That is, they can be motivated as optimal contracts. Then I characterize the principal's preferred strategy mapping histories and states into actions in a discounted quota – when should we play Surgery versus Aspirin? – and I solve for the corresponding payoffs.

3. The model

A principal and agent contract over the actions that will be taken and the transfers that will be paid over the course of a number of periods. If the contract is rejected, both players receive an outside option utility which I normalize to 0. If the contract is accepted, then in each period t = 1, 2, ..., t̄ there will be an underlying state of the world θ_t ∈ Θ; an action a_t ∈ A; and a transfer payment τ_t ∈ ℝ paid from the principal to the agent. I assume that the set of possible actions A is finite, which I will write as A = {a^1, ..., a^I} for I ∈ ℕ. There may be finitely many periods (t̄ ∈ ℕ) or infinitely many (t̄ = ∞). It will be convenient to describe payoffs and information following an accepted contract before getting into the details of the contracting environment.

3.1. Payoffs in an accepted contract

The principal and agent objective functions are as follows:

\[
\text{Principal: } V_P = \sum_{t=1}^{\bar t} \Bigl( \beta_P^{\,t-1}\, U_P(a_t, \theta_t) - G_{Pt}(\tau_t) \Bigr)
\]
\[
\text{Agent: } V_A = \sum_{t=1}^{\bar t} \Bigl( \beta_A^{\,t-1}\, U_A(a_t) + G_{At}(\tau_t) \Bigr).
\]
Let us first focus on the first terms, the action payoffs. In period t , the principal’s action payoff is UP (at , θt ). His preferences over actions depend on the underlying state of the world θt . The agent, on the other hand, has state-independent preferences. Her payoff for taking action at in period t is UA (at ), regardless of the state.5 The principal and agent discount these payoffs at discount factors βP and βA , which need not be identical. Both discount factors are positive. To 5 In Section 5, I discuss the possibility that the agent may have state-dependent utility. For instance, a hospital may value patient outcomes in addition to treatment costs.
ensure that payoffs do not go to infinity, I require that UP is bounded if is not a finite set; and I assume that the discount factors are strictly less than 1 in the case that t = ∞. Transfer payoffs in period t are given by GP t (τt ) and GAt (τt ), where the G functions are directly defined in terms of discounted value relative to period 1. The key assumption is that transfer payoffs are additively separable from action payoffs. Given the separability, I will show that the principal can separate the problems of providing incentives for appropriate actions and of determining transfers payments. This paper focuses only on the aspect of incentivizing actions, so I make no explicit restrictions on the period-by-period transfer payoffs GP t and GAt – we may have quasilinear utility, risk aversion, limited liability (infinite costs of payments from the agent to the principal), etc. Presumably each G function should be increasing.6,7 3.2. Information The agent receives private information on two dimensions: her action costs UA (a), and each period’s state of the world θt . The agent observes her action costs prior to accepting a contract, and these utilities are fully persistent across time. Writing the agent’s action costs as the vector UA = (UA (a 1 ), . . . , UA (a I )), I assume that UA has a nondegenerate prior over RI from the perspective of the principal. By nondegenerate I mean that any set of Lebesgue measure 0 in RI has a probability of 0. This implies that there are no atoms of probability, and that (for instance) the ratio of U (a 1 ) to U (a 2 ) is not predetermined. I will write the support of UA as U ⊆ RI . The principal has a prior belief on the distribution of UA over U , but he never observes the agent’s costs directly. If the contract is accepted then in each period t = 1, . . . , t , the agent privately observes the state of the world θt before the action at is taken publicly. A contract, described below, will use the agent’s private information on the state to help determine the action. The players have some prior over the joint distribution of states over time; states are drawn according to some exogenous random process, not necessarily iid.8 The principal never directly observes information about the past, current, or future states. He only learns about states through the agent’s choices or the agent’s communications, and he can never verify the agent’s reports. For instance, the principal either does not observe or cannot contract on his own past utility realizations. In the hospital context, the government never sees patient outcomes and cannot audit the medical records to confirm that the hospital is taking appropriate procedures. For simplicity, I assume all other aspects of the game – the agent’s transfer payoff functions GAt , her outside option payoff – are common knowledge. Without loss, then, we can normalize the agent’s outside option payoff to 0. 6 With quasilinear utility of money discounted at different rates by the principal and the agent, a “money pump” would be available by which both players could get infinite payoffs by having the patient player lend to the impatient player. While there are obvious restrictions that would rule this out – for instance, limited liability or a requirement that monetary payoffs are discounted at the same market interest rate – these technical issues are a distraction from the analysis I perform. 
7 Indeed, the results of the paper would go through essentially unchanged even if we generalized transfer payoffs much further, as long as we kept them separable from action payoffs. For instance, one could allow for risk aversion plus “savings” by relaxing the additive separability of transfer payoffs over time. However, this generalization would add notational burden to the paper. 8 It is convenient to assume a common prior but in the later analysis we will see that the agent’s beliefs will not be relevant.
3.3. Contracts

The principal's goal is to write a contract which uses the agent's private information to choose actions which are appropriate to the underlying states of the world, even though the agent does not care about those underlying states. Actions are observable and contractible, but outcomes are not. In this contracting environment, I allow for mechanisms that induce stochastic actions. We can think of the principal and agent as having access to a public randomization device which chooses actions for them. Moreover, I assume that the principal and agent are both fully committed to the contract: there is no moral hazard problem in getting the agent to take the prescribed action or getting the principal to pay the transfer payment.

A contract takes the following form:

1. In period 0:
(a) The agent observes her utility vector U_A ∈ U.
(b) The agent chooses to accept or reject the contract. If the contract is rejected, the game ends and both players take their outside option utility. If the contract is accepted, proceed.
(c) The agent makes some report m_0 from a message space M_0.
2. In each period t = 1, ..., t̄:
(a) The agent observes the state θ_t ∈ Θ.
(b) The agent makes some report m_t from a message space M_t.
(c) An action and transfer (a_t, τ_t) ∈ A × T are determined – possibly stochastically – by the contract.

To be willing to accept the contract the agent must receive an expected utility of at least her outside option, 0.

Formalizing the contracting notation, say that at t ≥ 1 the "public t-history" H_t is the list of contractible information available at the beginning of time t, prior to the agent's report. By the separability of payoffs across time, it is without loss to condition contracts only on reports and not on past realizations of actions and transfers (whose distributions were determined by said reports)9: H_t ≡ (m_0, m_1, ..., m_{t−1}). Let 𝓗_t ≡ ∏_{s=0}^{t−1} M_s be the set of possible public t-histories H_t. A contract C = (C_t)_{t=1}^{t̄} is then a collection of measurable functions C_t which map public t-histories and period-t reports into distributions over actions and transfers. For each t ≥ 1, C_t : 𝓗_t × M_t → Δ(A × T), where Δ(A × T) denotes the space of measurable distributions over A × T. The action and transfer (a_t, τ_t) are chosen as a draw from the distribution C_t(H_t, m_t).

9 The restriction of contracts to condition only on reports rules out randomization that induces correlation across periods – for instance, flip a coin and take a^1 forever after if heads, and a^2 forever after if tails. But for any contract with this kind of persistent randomization, there is a payoff-equivalent one of the form I consider in which all randomization occurs period-by-period – flip a coin independently in each period to pick a^1 or a^2.
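To fix ideas about the objects just defined, the sketch below is my own illustration (the type names and helper functions are hypothetical, not the paper's notation): it represents a period-t contract component as a mapping from the public history and the current report to a distribution over action–transfer pairs, with Example 1's fee-for-service arrangement as one concrete instance.

```python
import random
from typing import Callable, List, Tuple

# A period-t contract component C_t maps the public history (here, just the list of
# past reports) and the current report m_t to a distribution over (action, transfer)
# pairs. A "distribution" is sketched as a list of ((action, transfer), probability).
Outcome = Tuple[str, float]                        # (action, transfer)
Distribution = List[Tuple[Outcome, float]]
ContractComponent = Callable[[List[str], str], Distribution]

def fee_for_service(tau_surg: float, tau_asp: float) -> ContractComponent:
    """Example 1's fee-for-service contract: the report names a treatment, the
    treatment is carried out for sure, and the payment depends only on the treatment."""
    def C_t(public_history: List[str], report: str) -> Distribution:
        treatment = "Surg" if report == "Surg" else "Asp"
        transfer = tau_surg if treatment == "Surg" else tau_asp
        return [((treatment, transfer), 1.0)]       # degenerate distribution
    return C_t

def draw(dist: Distribution) -> Outcome:
    """Resolve the public randomization device: draw one (action, transfer) pair."""
    outcomes = [o for o, _ in dist]
    probs = [p for _, p in dist]
    return random.choices(outcomes, weights=probs, k=1)[0]

C_t = fee_for_service(tau_surg=100.0, tau_asp=10.0)
print(draw(C_t(public_history=[], report="Surg")))  # ('Surg', 100.0)
```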
Offered a contract C, an agent who accepts the contract chooses a strategy mapping the full set of information available to her at each point in time into a report. At time 0, the agent chooses a message m_0 based only on her utility realization U_A. At times t ≥ 1, the agent chooses the message m_t based on her private history PH_t in addition to the current state. The private history PH_t ≡ (U_A, θ^{t−1}, H_t) includes her utility realization in U, the true realizations of the past states θ^{t−1} ≡ (θ_1, ..., θ_{t−1}) ∈ Θ^{t−1}, and the public history in 𝓗_t. So the agent's strategy σ = (σ_t)_{t=0}^{t̄} is a collection of measurable functions with

\[
\sigma_0 : U \to M_0, \qquad \sigma_t : U \times \Theta^{t-1} \times \mathcal{H}_t \times \Theta \to M_t \quad \text{for } t \ge 1.
\]

The set of possible strategies is denoted Σ.

In this contracting environment, the initial report m_0 allows the principal to screen across agent utility types, and the later messages m_t allow for actions to be conditioned on state realizations. Indeed, by the revelation principle (see Myerson, 1986 or Pavan et al., 2014 for a broader discussion), this contracting space is without loss of generality in the sense that it includes direct contracts. A direct contract is one in which the agent is asked to report all of her private information at the time she receives it: specifically, M_0 = U and M_t = Θ. In a direct contract, I write the initial message m_0 as Û_A and the time-t message m_t as θ̂_t, the reported type and reported state. The public t-history would then be given by H_t = (Û_A, θ̂^{t−1}) for θ̂^{t−1} ≡ (θ̂_1, ..., θ̂_{t−1}). The truthful strategy of a direct contract, denoted σ*, is the one for which Û_A is given by σ*_0(U_A) = U_A and the subsequent reports θ̂_t are given by σ*_t(PH_t, θ_t) = θ_t for all PH_t. A history is truthful if the agent has followed the truthful strategy to that point. A direct contract is said to be incentive compatible if every type U_A who accepts the contract finds it optimal to play according to the truthful strategy at time 0 and at all truthful histories.10 The revelation principle states that any equilibrium of any indirect contract – possibly with a game form different from that assumed above – can be implemented by the truthful strategy in an incentive compatible direct contract.

3.4. Discounted quotas as optimal contracts

In this model, the agent has state-independent preferences, and the principal never learns information about the states except through the agent's reports. So the agent's payoff in an accepted contract is independent of the realized states. If the agent is to condition her reports on the states of the world – if she is to make truthful reports in an incentive compatible direct contract – she must be made indifferent over all possible reports at times t ≥ 1. Let us return to the agent's payoff function to see how the agent can be made indifferent. Starting at the beginning of period t, the agent's payoff going forward is

\[
V_A = \mathbb{E}\left[ \sum_{s=t}^{\bar t} \Bigl( \beta_A^{\,s-1}\, U_A(a_s) + G_{As}(\tau_s) \Bigr) \right].
\]
10 For incentive compatibility, it need not be the case that the truthful strategy is optimal conditional on mistruthful past play. For instance, if the agent has misreported her utility value, it may be optimal for her to misreport states. It does however need to be the case that the truthful strategy is optimal going forward among all possible strategies (i.e., joint deviations), not just one-shot deviation strategies that play mistruthfully today and return to truth tomorrow.
The expectation is taken over current and future actions and transfers, given the agent's strategy and the private history (which implies beliefs on the distribution of current and future states). Plug in t = 1 at period 0, the start of the game. We can scale up by β_A^{1−t} to bring these numbers into units of action utility in period t, and then decompose the action payoff as the agent's cost or benefit of taking action a^i times the expected number of discounted times that the action will be played over the remaining lifetime of the contract. Writing this rescaled agent objective function as Ṽ_A,

\[
\tilde V_A = \sum_{i=1}^{I} \Biggl( U_A(a^i) \cdot \underbrace{\Bigl( \sum_{s=t}^{\bar t} \beta_A^{\,s-t}\, \mathrm{Prob}[a_s = a^i] \Bigr)}_{W^i} \Biggr) + \underbrace{\mathbb{E}\Bigl[ \beta_A^{\,1-t} \sum_{s=t}^{\bar t} G_{As}(\tau_s) \Bigr]}_{T_A}. \tag{1}
\]

Call the expected discounted number of times that action a^i will be played the weight on that action, W^i, as indicated in Equation (1). The vector of weights is W = (W^1, ..., W^I). Call the rescaled expected transfer payoff T_A. Going forward from any point in a contract, the agent's strategy σ implies a weight and transfer (W, T_A). Using dot-product notation, the weight and transfer yield a payoff of

\[
\tilde V_A = U_A \cdot W + T_A. \tag{2}
\]
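A quick numerical check of this decomposition (my own sketch with hypothetical numbers, not anything from the paper): for an arbitrary strategy, computing the agent's payoff directly and computing it through the weights W and transfer payoff T_A of Equations (1) and (2) give the same answer. Here the transfer payoffs G_At are treated, as in the model, as already expressed in discounted value relative to period 1, and the horizon is truncated for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

beta_A = 0.9
T = 200                                      # truncated horizon (hypothetical)
U_A = np.array([-5.0, -1.0])                 # costs of two actions (hypothetical)

# A hypothetical strategy: in each period, a probability distribution over actions,
# plus some transfer payoffs G_At(tau_t), already in period-1 discounted value.
probs = rng.dirichlet(np.ones(2), size=T)    # probs[t, i] = Prob[a_t = a^i]
G_A = rng.normal(size=T)

discounts = beta_A ** np.arange(T)           # beta_A^{t-1}, t = 1..T

# Direct computation of the agent's expected payoff V_A.
V_A_direct = np.sum(discounts * (probs @ U_A) + G_A)

# Decomposition: weights W^i and rescaled transfer payoff T_A (Equation (1), at t = 1).
W = discounts @ probs                        # W^i = sum_t beta_A^{t-1} Prob[a_t = a^i]
T_A = np.sum(G_A)

V_A_decomposed = U_A @ W + T_A               # Equation (2)
print(W, V_A_direct, V_A_decomposed)         # the two payoffs coincide
```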
The agent chooses a strategy σ to maximize UA · W + TA over some set of feasible (W, TA ) pairs. Given a contract, let the set of feasible (W, TA ) pairs over all possible strategies be denoted by WT 0 at period 0, and by WT (Ht ) at public history Ht . (This set is determined only by public histories.) Lemma 1, below, shows how incentive compatibility restricts the weights and transfers in a direct contract. At any history following the acceptance of an incentive compatible direct contract, it is almost always the case that all state reports lead not just to the same payoff UA · W + TA going forward but to the same component values of both W and TA . The mathematical intuition is as follows. A contract specifies a menu of (W, TA ) from which the agent may choose. The envelope theorem says that, if the derivative of the agent’s maximized utility with respect to the vector UA exists, then it is equal to the maximizing W . So if the derivative exists, the maximizing W is unique, and so too is the maximizing TA . In fact, such a derivative must exist at almost every UA . And because UA was assumed to be drawn from a nondegenerate distribution, the realized UA almost surely leads to a unique pair (W, TA ) that maximizes payoffs. Definition (Almost every public history). Fix a contract C. A statement holds for almost every public 1-history if, for almost every UA at which the contract is accepted, it holds for H1 = (Uˆ A ) with Uˆ A = UA . For t ≥ 2, a statement holds for almost every public t -history if, for almost every public (t − 1)-history Ht−1 and for every θˆt−1 , it holds for Ht−1 followed by report θˆt−1 . Lemma 1. Fix some incentive compatible direct contract. For each t ≥ 1, for almost every public t-history Ht it holds that WT (Ht ) is a singleton. All proofs are in Appendix A. To understand the content of this lemma, think back to Example 1. There are t = 10 periods, no discounting, and the agent has quasilinear utility. The state of the world in each period is Mild or Severe, and the action Surgery or Aspirin is to be taken. Suppose the principal suggests a
fee-for-service contract, with payment in each period of τ^Surg for Surgery and τ^Asp for Aspirin. As an indirect contract, it is certainly not the case that there is a single feasible weight and transfer WT(H_t) over all possible strategies going forward from a history H_t. Starting at period 6, for instance, the agent could choose to play 5 more Surgeries; 5 more Aspirins; or any convex combination of the two. But the lemma applies to incentive compatible direct contracts, not to indirect contracts. So let us reinterpret this fee-for-service contract as an equivalent incentive compatible direct contract.

At time 0, the agent observes her utility realization U_A = (U_A^Surg, U_A^Asp). If U_A^Surg + τ^Surg > U_A^Asp + τ^Asp, then the agent would optimally choose Surgery in every period going forward, independent of state realizations. So after reporting utility such that U_A^Surg + τ^Surg > U_A^Asp + τ^Asp, the direct contract maps any period-t state report θ̂_t into Surgery. At period t the remaining weight on Surgery would be W^Surg = 11 − t and the weight on Aspirin would be W^Asp = 0, no matter what state-reporting strategy she chose. The lifetime transfer payoff would be T_A = 10τ^Surg. Likewise, if the agent reported utilities such that U_A^Surg + τ^Surg < U_A^Asp + τ^Asp, then all state reports would lead to action Aspirin in every period, pinning down the weights and transfers. It is only agents with utility U_A^Surg + τ^Surg = U_A^Asp + τ^Asp for whom the weights and transfers going forward are not predetermined. But agents with these knife-edge payoffs are realized with zero probability.

Lemma 1 extends this logic to fully general contracts. As an indirect contract, an agent is given some menu of possibilities at time 0, and at times t ≥ 1 observes the state realizations and makes additional choices. The lemma establishes that, having reached period t on some equilibrium path of play, it is almost surely true that the weights and transfers going forward will not depend on any current or future state realizations. In other words, interpreted as a direct contract, the agent's initial report of Û_A effectively amounts to a selection of time-1 weights and transfers (W, T_A) from a menu. Once selected, she abides by these weights forever after.

Even with action weights constrained to be constant across state realizations, a contract in which actions respond to states of the world is still possible. Recall Examples 1 and 2: by using a quota, sufficiently corrected for discounting, today's action can in fact be conditioned on the state in such a way as to keep the weight fixed. We will explore this idea by formally defining discounted quota contracts. These contracts are most conveniently written as indirect ones.

Definition. A discounted quota specifies an initial weight vector W_1 = (W_1^1, ..., W_1^I) such that Σ_i W_1^i = Σ_{t=1}^{t̄} β_A^{t−1}. In period t ≥ 1, after observing the state θ_t, the agent makes a report π_t = (π_t^1, ..., π_t^I) ∈ Δ(A) subject to the constraint that Σ_{s=1}^{t} β_A^{s−1} π_s^i ≤ W_1^i for each i = 1, ..., I. Then the mechanism draws action a_t from the distribution π_t.11 The transfer payment τ_t is determined exogenously to the agent's current and past state reports, and to the history of past actions and transfers.

A menu of discounted quotas is a contract that gives the agent a choice of discounted quotas (weights and transfer policies) at time 0. The agent can reject the contract, or accept and report m_0 to select one of the discounted quotas to play for the remainder of the game.

11 That is, the message space M_t is Δ(A), with the report m_t equal to the distribution π_t. Any report π_t violating the quota constraint Σ_{s=1}^{t} β_A^{s−1} π_s^i ≤ W_1^i can be arbitrarily mapped into some nonviolating report.
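One concrete way to read this definition is as a bookkeeping rule: keep a running total of the discounted plays Σ_s β_A^{s−1} π_s^i on each action and refuse any report that would push some total above its initial weight W_1^i. The sketch below is my own illustration of that rule (the class and variable names are hypothetical, not the paper's notation).

```python
import numpy as np

class DiscountedQuota:
    """Sketch of the constraint in the Definition: over the lifetime of the contract,
    the discounted plays on each action must satisfy
        sum_{s<=t} beta_A^{s-1} * pi_s^i  <=  W_1^i   for every i and every t."""

    def __init__(self, initial_weights, beta_A):
        self.W1 = np.asarray(initial_weights, dtype=float)   # W_1^i
        self.beta_A = beta_A
        self.used = np.zeros_like(self.W1)  # accumulated sum_s beta_A^{s-1} * pi_s^i
        self.t = 1                          # current period

    def feasible(self, pi):
        """Check that a report pi lies in Delta(A) and respects the quota."""
        pi = np.asarray(pi, dtype=float)
        on_simplex = pi.min() >= 0 and abs(pi.sum() - 1.0) < 1e-9
        within_quota = np.all(self.used + self.beta_A ** (self.t - 1) * pi <= self.W1 + 1e-9)
        return on_simplex and within_quota

    def report(self, pi, rng=None):
        """Record the period-t report and draw the realized action from pi."""
        rng = np.random.default_rng() if rng is None else rng
        assert self.feasible(pi), "report violates the discounted quota constraint"
        pi = np.asarray(pi, dtype=float)
        self.used += self.beta_A ** (self.t - 1) * pi
        self.t += 1
        return rng.choice(len(pi), p=pi / pi.sum())

# Hypothetical usage, echoing Example 2: beta_A = .9, total weight 10 split as (1, 9).
quota = DiscountedQuota(initial_weights=[1.0, 9.0], beta_A=0.9)
action = quota.report([0.0, 1.0])   # play the second action (Aspirin) for sure in period 1
```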
Unpacking this definition, the agent chooses actions in period t through her report π_t. The report is a distribution over actions in Δ(A), indicating that the agent may choose stochastic actions that are resolved by a public randomization device. The term π_t^i denotes the probability that action a^i is taken at period t. Adding these discounted probabilities over time, we get that Σ_{s≤t} β_A^{s−1} π_s^i is the expected discounted number of times that action a^i has been played over periods 1 through t, given the agent's reports. (The realized number of plays may be larger or smaller.) A discounted quota lets the agent choose any actions she wants, or any distributions of actions, subject to the constraint that over the lifetime of the contract the expected discounted number of plays on action a^i never goes above a specified weight W_1^i. There is no slack, so by the end of the contract the discounted number of plays exactly hits the limiting weight. Transfers are determined independently of the action choices. A menu of discounted quotas allows agents to choose from different initial weights, and perhaps different transfer policies. For instance, the principal may offer the agent a choice of many Surgeries and few Aspirins for a large payment, or few Surgeries and many Aspirins for a small payment.

In a discounted quota, it is convenient to rewrite the weight restriction Σ_{s=1}^{t} β_A^{s−1} π_s^i ≤ W_1^i in terms of the constraints on the period-t action: β_A^{t−1} π_t^i ≤ W_1^i − Σ_{s=1}^{t−1} β_A^{s−1} π_s^i, or

\[
\pi_t^i \le W_t^i, \qquad \text{for } W_t^i \equiv \frac{W_1^i - \sum_{s=1}^{t-1} \beta_A^{\,s-1} \pi_s^i}{\beta_A^{\,t-1}},
\]

where W_t^i represents the period-t present value of remaining plays on a^i. In recursive terms, the equation of motion of weights across the vector of actions is

\[
W_{t+1} = \frac{W_t - \pi_t}{\beta_A}. \tag{3}
\]

So to find the new weight on action a^i, we start with the previous period's weight and then subtract the probability the action was to be taken – i.e., the expected number of times it was played. Then we scale up this value by the inverse of the agent's discount factor to account for discounting; see a simple numerical example of these dynamics in Example 2.

Given a discounted quota contract, the agent is indifferent over all action choices. The principal, of course, is not. He wants the agent to choose actions that are appropriate to the state of the world. The principal's continuation payoffs in a discounted quota can be broken up into the payoff from actions plus that from transfers. The transfer payoffs are independent of the agent's reports, and so can be thought of as a sunk cost.12 Given some discounted quota and some agent strategy, denote the principal's present value of expected lifetime action payoffs starting at private history PH_t by Y(PH_t):

\[
Y(PH_t) \equiv \sum_{s=t}^{\bar t} \beta_P^{\,s-t}\, \mathbb{E}\bigl[ U_P(a_s, \theta_s) \,\big|\, PH_t \bigr].
\]

I call Y the principal's action payoff. Note that we have conditioned this action payoff on the private, not the public, history. In other words, we take expectations with respect to the full information available to the agent.

12 The transfer payments affect the principal's ex ante problem of designing an optimal menu of discounted quotas. One would have to specify transfer payoff functions G_At and G_Pt in order to solve this ex ante problem.
This is because the agent’s strategy and the distribution of future states may both depend on the realizations of past states θ t−1 that have been privately observed by the agent.13 Say that the agent’s strategy over actions in a discounted quota is principal-optimal if it maximizes Y going forward from every history, subject to the constraints of the contract.14 At history P Ht with state θt , let π∗ (θt |P Ht ) denote the action choices in a principal optimal strategy, and Y ∗ (P Ht ) the corresponding principal-optimal action payoff. Principal-optimal strategies do not necessarily myopically maximize the principal’s stage payoff in each period; it can be better to “save up” an action to use it in the future. I characterize principal-optimal payoffs and policies in an iid setting in Section 4. The main result of the paper is that menus of discounted quotas can in fact replicate and improve on any alternative contract. In other words, menus of discounted quotas are optimal contracts. The logic is as follows. In an arbitrary contract, Lemma 1 states that the agent can look forward at the beginning of period 1 and state exactly how many expected discounted times she will play each action, and how much she will be paid, if she is to play optimally. At this point the principal could replace the contract going forward with a discounted quota which replicates these weights and transfers. The agent would be indifferent to such a swap, as we see from Equation (2). In the new discounted quota the agent would be indifferent over all action choices, and so would have the option of replicating (at least in expectation) the outcomes from the original contract. And by construction, the principal-optimal strategy does weakly better for the principal. So the principal might as well pose the contract as a menu of discounted quotas from the beginning, and ask the agent to follow a principal-optimal strategy once she selects a quota. Theorem 1. Any contract can be weakly improved upon by a menu of discounted quotas in which the agent plays a principal-optimal strategy. See the proof for a formalization of the argument in the previous paragraph. An interpretation of this theorem is that the discounted quota contract form solves a dynamic moral hazard problem, that of getting the agent to select appropriate actions at different states. At the initial stage, there is also a problem of screening over agent types with a menu of discounted quotas. Section 4 studies the dynamics of payoffs and action choices once a discounted quota has been selected. Appendix B discusses in more detail the problem of solving for the optimal menu – that is, sets of weight and transfer pairs. 3.5. Discussion of modeling assumptions Let us pause now to highlight three key assumptions of the model. We can shed light on the intuition behind Theorem 1 by illustrating how the result may fail to hold when the assumptions are violated. If the assumptions were relaxed, discounted quotas might still provide an appealing benchmark as contracts which were robust to many details of the economic environment. But it would be possible to fine-tune and improve upon them. 13 As a technical matter, one could augment the discounted quota to have the agent report states as well (without directly affecting actions or transfers), in which case we could replace the private history with a public history. 
Alternatively, in the implementation of a discounted quota as a direct contract, the states would be truthfully reported, and so the payoff could again be taken as a function only of the public history. 14 The principal-optimal strategy only refers to the agent’s choices in a given discounted quota, not over the choice of discounted quotas from a menu.
1. No contracting on outcomes: The principal receives no information about past states, or about whether past actions were appropriate. For instance, the government cannot observe health outcomes of patients treated by the hospital. If outcome signals were contractible, the principal would have new ways of incentivizing the agent to give appropriate treatments. One obvious channel would be performance pay: the contract could specify that the hospital is paid more money when more of its patients survive. Or if patient health records could be audited, revealing information about past states of the world, the contract could punish the hospital for having performed inappropriate treatments. Indeed, even if these signals were observable but not verifiable, the principal might be able to make use of performance measures through a so-called implicit or relational contract (see, e.g., Pearce and Stachetti, 1998; Baker et al., 2002; Levin, 2003, or MacLeod, 2003). After subjectively observing a patient with a poor health outcome, say, the government could pay out a low bonus to the hospital. 2. State-independent preferences of the agent: The agent’s payoff for action a i , UA (a i ), does not depend on the state of the world. That is, hospital payoffs depend on the treatments that are taken, but not on patients’ underlying conditions. This assumption is fundamental to the model. First, it implies that an agent can only be given incentives to condition the action on the state of the world by making her indifferent across actions. Second, it guarantees that discounted quota contracts achieve this indifference. So when the agent is given a discounted quota, she is willing to follow the principal’s recommendations on how to match actions to states. In many economic environments, of course, we would expect an agent’s payoffs to vary with the underlying state of the world. Moreover, the exact indifference invoked in the above argument is potentially worrisome on robustness grounds. The agent has no direct preference for maximizing the principal’s payoffs rather than playing in any other manner. Very small utility shocks – for instance, a random ti added to the payoffs of action a i at time t – would shift the agent’s behavior, and reduce the principal’s payoffs, discontinuously. Likewise, if the agent were required to pay any arbitrarily small cost to learn the state of the world, she would never find such an investment worthwhile. As I explore in Section 5, under some kinds of state-dependent utility the agent would not only continue to play principal-optimal strategies, but would do so with strict rather than weak incentives. One example of these more aligned preferences would be an “altruistic” agent who places some weight on the principal’s action payoffs. Suppose the hospital in the motivating examples faced private costs for each treatment, as before, but also cared about patient welfare. The hospital would then strictly prefer to act in the government’s best interests instead of being indifferent.15 With state-dependent preferences that induce alignment, however, we would no longer expect menus of discounted quotas to be optimal contracts. The principal could take advantage of slackened incentive constraints to give the agent more flexibility than before and still have her 15 The flip side, of course, is that an agent with preferences perturbed against the direction of alignment would act strictly opposed to the principal’s interests in a discounted quota. 
Think of a “disgruntled employee” who resolves indifferences in favor of harming her employer. Facing a disgruntled employee, the principal would certainly want to avoid quota-like contracts. It would be better to mandate actions in each period without giving the agent any input at all.
act in his best interests. Think of the limit as the agent’s preferences become fully aligned with the principal’s: the agent should face no action restrictions at all. If a hospital acts to maximize patient welfare, her freely chosen actions already give the principal first-best payoffs, better than any quota. 3. Generic uncertainty over agent utilities: The vector UA is drawn from a nondegenerate distribution and is privately observed by the agent. In other words, the government insurance provider is uncertain about the hospital’s precise costs for different treatments. This final highlighted assumption is perhaps the most subtle. As long as the agent is playing to maximize the principal’s payoff, the principal wants her to have as much flexibility as possible. The assumption of generic uncertainty is what guarantees that a discounted quota is in fact “as much flexibility as possible”. With generic uncertainty, if we relaxed the quota constraints at all, the agent would (almost surely) no longer be indifferent, and would no longer condition her actions on the states of the world. But with more precise information on the agent’s utility, new contract forms could potentially maintain indifference while giving the agent additional flexibility. Such a contract could potentially improve on any menu of discounted quotas. To illustrate the intuition, return to the setting of Example 1: Surgery or Aspirin is taken in each period, and the agent has quasilinear utility over money. With generic uncertainty over UA , we saw that a fee-for-service contract was never able make the agent willing to condition her action on the state of the world. For any payments τ Surg and τ Asp , it held with probability one that either UA (Surg) + τ Surg < UA (Asp) + τ Asp and the agent always chooses Aspirin; or that UA (Surg) + τ Surg > UA (Asp) + τ Asp and the agent always chooses Surgery. But now suppose that the principal knew the realization of UA at the start of the contract. (The logic would be similar, though probabilistic, if the principal knew some vector UA that could be drawn with positive probability.) The principal could set payments to make UA(Surg) + τ Surg exactly equal to UA (Asp) + τ Asp . The agent would then be willing to choose the appropriate action at every period. When the number of actions I is greater than or equal to three, we can make a similar argument even without invoking transfers. Using W i to indicate the discounted number of times that action a i is to be played, a discounted quota allows only a single vector (W 1 , ..., W I ). This quota weight i yields a lifetime agent payoff (ignoring transfers) of i UA (a ) · W i . But if the agent’s utility is known rather than uncertain, we can keep the payoff pinned down while givingthe agent additional flexibility: allow any weights (W 1 , ..., W I ) subject to a specified value of i UA (a i ) · W i . Now the set of allowed weight vectors is an (I − 2)-dimensional surface rather than a single point.16 The agent has some freedom to condition the ultimate weights on the state realizations, which can only help the principal.17 When one or more of these assumptions are relaxed – allowing for outcome measures or audits, say – discounted quotas may still provide appealing benchmarks as contracts that are robust to many details of the economic environment. 
They do not depend on the principal’s or agent’s utility over transfer payments, and by construction are independent of the information structure relating outcomes to publicly or privately observed signals. 16 The surface has I − 2 dimensions because there are I variables related by two linear equations: U (a i ) · W i is i A fixed, and i W i adds up to the total number of discounted periods. 17 With known agent utility, with an exogenous restriction that transfer payments can not be used in contracting, and
with only two possible actions, it is possible that discounted quotas could emerge as features of optimal contracts.
4. Principal-optimal actions and payoffs
Say that the agent is given a discounted quota contract, or selects one from a menu, and follows the principal-optimal strategy. What actions will be chosen in each period? What are the principal's corresponding payoffs? To study the dynamics of discounted quotas in the cleanest environment, I assume for the rest of the section that the interaction is infinitely repeated and that states are iid. (See Guo and Horner, 2015 for dynamics in a related model with Markovian states.)
Assumption 1 (Maintained throughout Section 4). There are infinitely many periods (t̄ = ∞), and in each period the state is drawn from a commonly known, iid distribution over Θ.
Once the agent has chosen a discounted quota, the history of the contract PHt will generally affect actions π∗(θt|PHt) and payoffs Y∗(PHt) in two ways. First, the past actions (π1, ..., πt−1) constrain what actions the agent may take in the future through the quota constraint. Second, the past states may affect beliefs about the distribution of future states. But under Assumption 1, beliefs on the distribution of future states are independent of past state realizations, and independent of calendar time as well. So history only matters through the impact of past action choices on the remaining weights Wt. We can therefore write Y∗(PHt) as Y∗(W), and π∗(θt|PHt) as π∗(θ|W). At any history with weights W, Y∗(W) tells us the principal's continuation action payoff under a principal-optimal strategy, and π∗(θ|W) tells us which action is to be taken in the current period as a function of the state. In Section 4.2, where I look at payoffs as discount factors vary, I will write Y∗(W) as Y∗(W|βA, βP) to highlight the dependence of Y∗ on the parameters βA and βP.
4.1. The Bellman equation
Fix an arbitrary history under a discounted quota contract. The equation of motion (3) tells us, for any current weights and any actions taken, what the weights will be in the next period. We can plug in this equation of motion to write the principal-optimal action payoff Y∗ and the action choices π∗ as solutions to a dynamic programming problem.
Lemma 2. The action payoff Y∗(W) is the unique solution to the following Bellman equation:
Y∗(W) = E_θ [ max_{π(θ|W)} Σ_{i=1}^{A} π^i(θ|W) UP(a^i, θ) + βP Y∗( (W − π(θ|W)) / βA ) ]    (4)
s.t.  Σ_i π^i(θ|W) = 1   ∀θ
      π^i(θ|W) ≥ 0   ∀θ, ∀i    (5)
      π^i(θ|W) ≤ W^i   ∀θ, ∀i
with π∗(θ|W) any argmax of the above program. Furthermore, Y∗ is continuous and weakly concave.
In the maximization problem, the first term gives the principal's payoff in the current period from taking action π. The second term gives the principal's discounted continuation payoff starting at the next period, with weights incremented appropriately.
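The Bellman equation of Lemma 2 lends itself to a standard numerical treatment. The sketch below is not the paper's own code: it is a minimal illustration of solving the two-action version by value function iteration on a grid, with the discount factors, state probabilities, and payoff numbers assumed purely for the example, and the maximization over the mixing probability done by a brute-force grid search.

```python
import numpy as np

# Illustrative primitives (assumed values): two actions, two iid states
beta_A, beta_P = 0.9, 0.9
p = np.array([0.8, 0.2])                 # state probabilities p^theta
U_P = np.array([[1.0, -4.0],             # U_P(a^1, theta) across states
                [0.0,  0.0]])            # U_P(a^2, theta) across states
W_total = 1.0 / (1.0 - beta_A)           # weights always sum to 1/(1 - beta_A)

W1_grid = np.linspace(0.0, W_total, 201) # grid over the weight W^1 on action a^1
Y = np.zeros_like(W1_grid)               # initial guess for Y*(W)

def bellman(Y_old):
    """One application of the Bellman operator of Lemma 2 (two-action case)."""
    Y_new = np.empty_like(Y_old)
    for k, W1 in enumerate(W1_grid):
        W2 = W_total - W1
        total = 0.0
        for th in range(len(p)):
            # Feasible probabilities on a^1: pi1 in [0,1], pi1 <= W1, 1 - pi1 <= W2
            lo, hi = max(0.0, 1.0 - W2), min(1.0, W1)
            pi1 = np.linspace(lo, hi, 101)
            stage = pi1 * U_P[0, th] + (1.0 - pi1) * U_P[1, th]
            W1_next = (W1 - pi1) / beta_A        # unused weight recharges by 1/beta_A
            cont = np.interp(W1_next, W1_grid, Y_old)
            total += p[th] * np.max(stage + beta_P * cont)
        Y_new[k] = total
    return Y_new

for _ in range(2000):                     # iterate to (approximate) convergence
    Y_next = bellman(Y)
    if np.max(np.abs(Y_next - Y)) < 1e-8:
        Y = Y_next
        break
    Y = Y_next

per_period = (1.0 - beta_P) * Y           # per-period payoff (1 - beta_P) Y*(W)
print(per_period[np.argmin(np.abs(W1_grid - 0.8 * W_total))])
```

In this sketch the weight on a^2 is implied by W^2 = 1/(1 − βA) − W^1, and the interpolation step evaluates the continuation value at the recharged weight (W^1 − π^1)/βA from the equation of motion.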
Both the "state parameter" W and the "policy function" π in this problem are I − 1 dimensional; each has I coordinates which sum to a fixed amount.18 The constraints in (5) are just adding-up constraints on π. We need not consider additional incentive constraints because, as in any discounted quota, the agent is indifferent over action choices.
In Sections 4.2 and 4.3 I characterize Y∗ and π∗ more explicitly in two cases of particular interest: as interactions become frequent, and when there are only two available actions.
4.2. Special Case 1: frequent interactions
In per-period terms, the principal-optimal action payoff is given by (1 − βP)Y∗(W|βA, βP). I will characterize the limit of this per-period payoff as the frequency of interaction increases, or equivalently as players become very patient. In particular, I will let βP = βA^γ for some γ > 0 and take the discount factors to 1 while holding γ constant. The parameter γ determines the relative discounting or relative patience of the principal and the agent. For instance, if γ = 2 and so βP = βA^2, the principal discounts "twice as fast" as the agent: the principal discounts in one period what the agent discounts in two. A relative patience of γ = 1 means that the players are equally patient; γ < 1 means that the principal is more patient than the agent; and γ > 1 indicates that the principal is less patient.
I assume that Θ is a finite set throughout Section 4.2:
Assumption 2 (Maintained throughout Section 4.2). Θ is a finite set. The commonly known, iid probability of realizing θ ∈ Θ is p^θ > 0 in each period.
Intuitively, making the players more patient while keeping their relative patience the same has two effects. First, because we are in an iid setting, extending the relevant time horizon outward removes players' uncertainty over the long-run distribution of states. That effectively turns the dynamic problem of allocating actions to states into a static one.19 Second, making periods "shorter" gets rid of integer constraints related to the discreteness of time. It is as if each state θ arrives at a fixed flow rate in continuous time, and likewise actions are assigned to states as a flow. In the limiting world, one only needs to keep track of the discounted amount of time that each action will be assigned to each state, from the separate perspectives of both the principal and the agent. The principal's payoff is determined by the amount of principal-discounted time for which each action is played at each state. The quota constraint restricts the amount of agent-discounted time for which an action is to be played, adding up across states.
If the principal and agent have equal discount factors (βA = βP, or γ = 1) then agent-discounted time and principal-discounted time are the same. But when the discount factors differ, the principal and agent will generally disagree about how much time has been allocated to a given
18 In much of the literature on dynamic contracting, an optimal contract in an infinitely repeated iid environment can be characterized recursively by a one dimensional "state variable" that keeps track of the agent's promised utility (for example, Spear and Srivastava, 1987). These papers assume that the agent's utility function is common knowledge. In this model with generic uncertainty over the agent's utility, contracts will instead be characterized by the vector of promised W and TA values. And by the separability of the problem, the transfer TA does not affect the action policy or the payoff from actions.
19 The iid assumption is not crucially important here. Any stochastic process without uncertainty over the long-run distributions of states would give essentially the same results.
action. A patient principal (βA < βP, or γ < 1) weights later periods more strongly than does the agent. In other words, consider playing an action for a given amount of agent-discounted time in either early or late periods. A patient principal evaluates the action as being played for longer when it is taken in late rather than early periods. When the principal is impatient (βA > βP, or γ > 1), the reverse holds: playing an action for a given amount of agent-discounted time at the beginning of the game corresponds to a longer amount of principal-discounted time than does playing it at the end of the game.
One can generalize and formalize the above discussion, as worked out in the proof of Proposition 1 below. I first rescale time to be in [0,1] and to be undiscounted from the agent's perspective.20 Playing an action from L to L + l in this rescaled time – a share l of the agent-discounted periods – corresponds to a share of (1 − L)^γ − (1 − L − l)^γ of principal-discounted periods. Pushing actions later by increasing L for fixed l increases the share of the principal's periods if he is patient (γ < 1), decreases the share if he is impatient (γ > 1), and doesn't change anything if he is equally as patient as the agent (γ = 1).
As we fix the relative patience parameter γ while taking both discount factors to 1 we can solve for the optimal allocation of actions to states at different points in time. Indeed, we only need to figure out the amount of agent-discounted time for which action i should be played at state θ – this term is denoted l^{iθ} in the definition of y∗_lim below. Given these values of l^{iθ}, it is then easy to determine for which (rescaled) times the actions are played. When the principal and agent have equal discount factors, the timing is irrelevant – payoffs are determined entirely by the set of l^{iθ}. When the principal is patient, his more preferred actions at a given state are pushed later (higher L^{iθ}) and his less preferred actions are pushed earlier (lower L^{iθ}). The reverse holds when the principal is impatient.
These arguments give us the following construction of the principal's limiting per-period payoff in a discounted quota as "interactions become frequent." For γ > 0 and w ∈ Δ(A),21 define y∗_lim(w|γ) as
y∗_lim(w|γ) ≡ max_{l^{iθ}} Σ_{i,θ} p^θ UP(a^i, θ) · [ (1 − L^{iθ})^γ − (1 − L^{iθ} − l^{iθ})^γ ]    (6)
s.t.  l^{iθ} ≥ 0   for all i, θ
      Σ_i l^{iθ} = 1   for all θ    (7)
      Σ_θ p^θ l^{iθ} = w^i   for all i
where
L^{iθ} ≡ Σ_j l^{jθ}, summing over j such that UP(a^j, θ) < UP(a^i, θ) (or such that UP(a^j, θ) = UP(a^i, θ) and j < i), if γ ≤ 1;
L^{iθ} ≡ Σ_j l^{jθ}, summing over j such that UP(a^j, θ) > UP(a^i, θ) (or such that UP(a^j, θ) = UP(a^i, θ) and j > i), if γ > 1.    (8)
20 To rescale time to be undiscounted from the agent's perspective, apply the following transformation. For a given discount factor βA, rename period t as time ξ = 1 − βA^{t−1} in [0, 1) – or, a little more precisely, as the interval [1 − βA^{t−1}, 1 − βA^t). Then in the limit as βA → 1, the agent weights each increment of rescaled time in [0,1) equally. (Due to kinks at integer values of t, time is approximately but not perfectly undiscounted under this transformation for βA < 1.)
21 That is, w = (w^1, ..., w^I) ∈ [0, 1]^I such that Σ_i w^i = 1; recall that I = |A|.
Proposition 1. Fix a relative patience parameter γ > 0 and a weight share w ∈ Δ(A). Consider a sequence of discount factors and weights (βA^(n), βP^(n), W^(n))_{n∈N} with (i) lim_{n→∞} βA^(n) = lim_{n→∞} βP^(n) = 1, (ii) βP^(n) = (βA^(n))^γ for all n, and (iii) W^(n) = w / (1 − βA^(n)) for all n.
1. For each n, it holds that (1 − βP^(n)) Y∗(W^(n) | βA^(n), βP^(n)) ≤ y∗_lim(w|γ).
2. Taking the limit as n goes to infinity,
lim_{n→∞} (1 − βP^(n)) Y∗(W^(n) | βA^(n), βP^(n)) = y∗_lim(w|γ).
Importantly, the evaluation of y∗_lim(w|γ) by maximizing (6) subject to (7) and (8) can be carried out numerically and nonrecursively. There are I · |Θ| variables to maximize over, the choice of each l^{iθ}. In the case of γ = 1, the expression [(1 − L^{iθ})^γ − (1 − L^{iθ} − l^{iθ})^γ] in (6) simplifies to l^{iθ}; the maximization reduces to a linear program, for which solutions are particularly easy to compute.
If the principal could observe the state in each period and assign his most preferred action, he would get an average per-period action payoff of y^max ≡ E_θ[max_{a^i} UP(a^i, θ)]. In the frequent-interaction limit, there is no uncertainty over the number of discounted times that each action would be played under this first-best policy. So the limiting payoff y∗_lim(w|γ) achieves this theoretical maximum y^max when the initial weight share w^i on an action a^i is set to the probability that a^i is the principal's preferred action. As a corollary of Proposition 1, part 2, if starting weights are chosen correctly then the principal's per-period action payoff approaches the first-best level as the parties become patient.22
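For γ = 1 the program above can be handed directly to an off-the-shelf linear programming routine. The following is a minimal sketch, not from the paper, with two-action, two-state numbers assumed purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative primitives (assumed values): two actions, two states
p = np.array([0.8, 0.2])              # iid state probabilities p^theta
U_P = np.array([[1.0, -4.0],          # U_P(a^i, theta); rows index actions
                [0.0,  0.0]])
w = np.array([0.8, 0.2])              # weight shares, summing to 1
I, S = U_P.shape

# Variables l^{i theta}, flattened row-major. For gamma = 1 the objective is
# max sum_{i,theta} p^theta U_P(a^i,theta) l^{i theta}; linprog minimizes, so negate.
c = -(U_P * p).flatten()

# Equality constraints: sum_i l^{i theta} = 1 for each state;
# sum_theta p^theta l^{i theta} = w^i for each action.
A_eq = np.zeros((S + I, I * S))
b_eq = np.concatenate([np.ones(S), w])
for th in range(S):
    A_eq[th, [i * S + th for i in range(I)]] = 1.0
for i in range(I):
    A_eq[S + i, i * S:(i + 1) * S] = p

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (I * S))
print("y*_lim(w | gamma=1) =", -res.fun)   # 0.8 for these illustrative numbers
```

For γ ≠ 1 the objective depends on the ordering-dependent terms L^{iθ}, so a nonlinear solver would be needed in place of linprog; the point remains that the problem is a single static maximization rather than a recursion.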
Corollary 1 (Asymptotic first-best payoffs). Let a∗(θ) be some maximizer of UP(a^i, θ). Let w∗ ∈ Δ(A) be such that for each action a^i, w∗^i is the probability (over realizations of θ) that a∗(θ) = a^i. Fix any γ > 0 and consider a sequence of discount factors and weights (βA^(n), βP^(n), W^(n))_{n∈N} with (i) lim_{n→∞} βA^(n) = lim_{n→∞} βP^(n) = 1, (ii) βP^(n) = (βA^(n))^γ for all n, and (iii) W^(n) = w∗ / (1 − βA^(n)) for all n. It holds that
lim_{n→∞} (1 − βP^(n)) Y∗(W^(n) | βA^(n), βP^(n)) = y^max.
This corollary replicates standard results from previous work on quotas, such as Jackson and Sonnenschein (2007); see further discussion in the introduction. The main contribution of the analysis of this section is not that we get the expected approximate first-best payoff result when weights are set at one particular level. The main result is that we can calculate the payoffs going forward from any weights in a non-recursive manner. Some properties of Proposition 1 and its corollary are illustrated in Fig. 1, which takes a simple 2-action, 2-state example and plots the principal’s value as a function of weight for different discount factors. Fixing a level of relative patience between the players, the thick curve in each 22 It does not follow that these are the optimal starting weights for the principal to assign; see Appendix B. For instance, some action may be preferred by the principal in every state but is known to be very costly for the agent. So the principal might put a low weight on it and ask that it only be taken if it would be exceptionally valuable. If the weight on that action were increased, the agent would require a larger transfer in order to accept the contract.
Fig. 1. Principal value functions for βA = .7 (dashed), .9 (thin), and lim βA → 1 (thick). These graphs show the principal’s per-period action payoff for an example with a variety of discount factors. The dashed curves show the value functions for βA = .7; thin, for βA = .9; and thick, for the limit as βA → 1. For each graph βP is a fixed power of the varying βA (see captions). The x axis is the agent’s weight share on action a 1 , (1 − βA )W 1 . The y axis is the principal’s per-period action payoff as a function of the weight, (1 − βP )Y ∗ (W |βA , βP ). In all cases, = {θ+ , θ− } and A = {a 1 , a 2 } with p θ+ = .8 and p θ− = .2. The principal’s utility from a 1 is UP (a 1 , θ+ ) = 1, and UP (a 1 , θ− ) = −4; the utility from a 2 is UP (a 2 , θ) = 0 for each θ . So if either action a 1 or action a 2 is taken unconditional on the state, the expected principal payoff is 0. The first-best principal level per period would be .8, if a 1 were to be taken in every θ+ state and a 2 were to be taken in every θ− state. In all cases, this value is achieved by the βA → 1 value function at a starting weight of (1 − βA )W 1 = .8 on a 1 and (1 − βA )W 2 = .2 on a 2 – the probabilities that the actions are optimal in a given period. Each of the βA = .7 and βA = .9 curves are approximated using value function iteration on the program of Lemma 2 over a grid of 500 points, while the βA → 1 curves are calculated pointwise according to the maximization of (6) subject to (7) and (8) (Proposition 1, part 2).
panel is the limiting payoff per period as both discount factors approach 1. The dashed and thin curves show per period payoffs for low and high discount factors away from the limit. We see that the limiting thick curves confirm Corollary 1: neither (i) the principal’s maximum possible payoff per period, nor (ii) the starting weight which achieves this maximum, depends on how patient is the principal relative to the agent. For discount factors away from the limit, however, these two properties no longer hold. The precise levels of the principal’s maximum value and the starting weight which achieves the maximum must be computed recursively (see Lemma 2).
Fixing a relative patience of the principal and the agent and taking discount factors to 1, the principal's per-period payoff grows towards the frequent-interaction limit.
4.3. Special Case 2: two actions
In Section 4.3 I suppose that the agent has only two available actions. This can be interpreted as "work" versus "shirk," if working is observable. I am no longer maintaining Assumption 2, that the set of states is finite.
Assumption 3 (Maintained throughout Section 4.3). A = {a^1, a^2}.
The spaces of possible W's and π's are both one-dimensional; W^1 and π^1 determine W^2 and π^2. In this subsection I will therefore write Y∗(·) and π∗(θ|·) as functions just of W^1, and I will look for the optimal policy π∗^1(θ|W^1). The principal's payoffs under a given state are also effectively captured by a one-dimensional parameter. Let u(θ) be the relative utility from action a^1 versus a^2 in state θ: u(θ) ≡ UP(a^1, θ) − UP(a^2, θ).
Proposition 2. There exists a weight "cutoff" function W_c^1 : R → [0, 1/(1 − βA)] for which the following defines a principal-optimal policy:
π∗^1(θ|W^1) = 0,  if W^1 < βA W_c^1(u(θ));
π∗^1(θ|W^1) = W^1 − βA W_c^1(u(θ)),  if βA W_c^1(u(θ)) ≤ W^1 ≤ 1 + βA W_c^1(u(θ));
π∗^1(θ|W^1) = 1,  if W^1 > 1 + βA W_c^1(u(θ)).
This cutoff function W_c^1(u) is weakly decreasing in u. Say that the agent enters some period with a weight W^1 on action a^1, and then observes state θ with relative payoffs u(θ). If the weight on action a^1 is high relative to the cutoff value W_c^1(u(θ)), it is optimal to play a^1 (π∗^1 = 1). The plentiful action a^1 is played while the scarce action a^2 is saved for later. If the weight on a^1 is low, the agent instead plays a^2 (π∗^1 = 0). And over an intermediate range of weights, the agent mixes between actions a^1 and a^2 in such a way that the weight starting in the next period, (W^1 − π^1)/βA, is exactly the cutoff value.
These cutoff weights decline in u(θ). That is, in states where the relative payoff to action a^1 is higher, a^1 is more likely to be played – it is played for a greater range of weights. A cutoff weight of W_c^1(u(θ)) = 0 means that action a^1 is so valuable in the current state that it is played if there is any weight at all remaining on the action. A cutoff weight of W_c^1(u(θ)) = 1/(1 − βA) instead implies that a^2 is so valuable, or a^1 is so harmful, that a^2 is played if at all possible.
As derived in the proof of Proposition 2, the cutoff W_c^1(θ) is taken to be some point such that the derivative (or, at a kink, supergradient) of Y∗(·) at W_c^1(θ) is equal to (βA/βP) u(θ). Loosely speaking, this comes from the first-order condition of Equation (4).23
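The policy in Proposition 2 is mechanical once the cutoff is known. The sketch below, not from the paper, simply transcribes the three cases; the cutoff value is passed in as an input, since computing it requires the derivative condition on Y∗ just described.

```python
def pi_star_1(W1, beta_A, cutoff):
    """Principal-optimal probability of playing a^1 this period (Proposition 2).

    `cutoff` stands for W_c^1(u(theta)) and is treated as given here.
    """
    target = beta_A * cutoff
    if W1 < target:          # a^1 is scarce relative to the cutoff: play a^2
        return 0.0
    if W1 > 1.0 + target:    # a^1 is plentiful: play a^1 for sure
        return 1.0
    # Intermediate range: mix so that next period's weight (W1 - pi1)/beta_A
    # lands exactly on the cutoff value.
    return W1 - target
```

In the intermediate branch the returned probability lies in [0, 1] by construction, and the implied next-period weight is exactly the cutoff, matching the description above.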
Fig. 2. Illustration of Proposition 2.
Fig. 2 illustrates how the cutoffs W_c^1(θ) determine the action policy π∗^1 of Proposition 2.
One key question is whether actions are ever "used up" in a discounted quota. Will some action eventually be played until its weight runs out, then never played again? Or will the principal avoid playing actions when their weights are low, waiting to play them in the future once the weights have recharged? Once one action has been used up, the other action will be played in every period afterwards and the weight will never again move. As I will show, the answer will depend on the relative discount factors, with βA = βP being a knife-edge special case. In the following analysis
23 In fact, the optimal action policy π∗ is unique if Y∗ is strictly concave. The only flexibility comes if Y∗(W^1) has an interval with constant slope equal to (βA/βP) u(θ) for some θ. Then it is equally optimal to play as if the cutoff value for θ is any point in this interval of constant slope, and we need not choose the same cutoff value in different periods.
I will make a technical assumption (formalized below in Assumption 4) that each action is "good enough" that neither one dominates the other.
Defining our terms more carefully, at some history of a discounted quota contract I say that action a^i has been used up if the weight on that action is W^i = 0. So action a^1 has been used up if W^1 = 0, and action a^2 has been used up if W^1 = 1/(1 − βA) (and hence W^2 = 0). Call W^1 = 0 and W^1 = 1/(1 − βA) corner weights, and all other weights interior. Once a corner weight has been reached, we will remain at that weight for every future period.
Now, take some discounted quota contract with specified initial weights, and consider how the weights update over time across possible paths of play. If it is the case that for all t and at every possible t-history there is an interior weight, I say that neither action will be used up in the contract. On the other hand, if it is the case that as t → ∞ the probability over t-histories goes to 1 that we are at a corner weight, I say that some action will be used up.
According to Proposition 2, in state θ the action chosen will bring next period's weight as close as possible to the cutoff weight W_c^1(θ). So these cutoffs tell us whether actions will ever be used up.
Lemma 3. Take some discounted quota contract following the principal-optimal policy of Proposition 2, with interior initial weights.
1. If there is a probability 0 (with respect to realizations of θ) that W_c^1(u(θ)) is equal to a corner weight, then neither action will be used up.
2. If there is a positive probability that W_c^1(u(θ)) is equal to 0 and a positive probability that W_c^1(u(θ)) is equal to 1/(1 − βA), then some action will be used up.
In order to cleanly characterize whether actions are used up, I make certain technical assumptions. For βP ≤ βA, I will assume that there is some state in which action a^1 is preferred to a^2, and some state in which a^2 is preferred to a^1. For βP > βA, I will make a stronger assumption that each action has a sufficiently high probability of being preferred to the other.
Recall that u(θ) = UP(a^1, θ) − UP(a^2, θ) is the relative payoff of action a^1 versus a^2. Let F(u) ≡ Prob[u(θ) ≤ u] be the cdf of these relative payoffs, with respect to the iid state realizations. Principal payoffs are bounded, so the distribution of utilities given by F has a compact support. Let ū be the maximum of this support – the highest relative payoff of action a^1 – and u̲ be the minimum. Let F^{−1} be the inverse cdf function of relative payoffs: F^{−1}(p) ≡ inf{u : F(u) ≥ p} for p ∈ [0, 1].
βA βP
!1 such that 1−p F −1 (q)dq > 0, and !p such that 0 F −1 (q)dq < 0.
βA βP
As the frequency of interactions increases in the manner of Section 4.2, the ratio βA/βP goes to 1. So as players become patient, Assumption 4(ii) is satisfied under the same conditions as Assumption 4(i), that u̲ < 0 < ū.
Lemma 4. Fix discount factors βA and βP, and let Assumption 4 hold. There exists a cutoff function W_c^1(u) which defines a principal-optimal policy as in Proposition 2, satisfying:
1. If βA < βP: For all θ with u(θ) ∈ [u̲, ū], there is an interior cutoff W_c^1(u(θ)) ∈ (0, 1/(1 − βA)).
2. If βA = βP: W_c^1(u(θ)) = 0 if and only if u(θ) = ū, and W_c^1(u(θ)) = 1/(1 − βA) if and only if u(θ) = u̲.
3. If βA > βP: There is a positive measure of states with cutoff W_c^1(u(θ)) = 0, and a positive measure of states with cutoff W_c^1(u(θ)) = 1/(1 − βA).
Proposition 3 combines Lemmas 3 and 4 to tell us whether or not actions will be used up:
Proposition 3. Fix discount factors βA and βP, and let Assumption 4 hold. Under the principal-optimal policy of Lemma 4,
1. Suppose βA < βP (patient principal). Then neither action will be used up.
2. Suppose βA = βP. If states are discrete, so there is a positive probability that u(θ) = ū and a positive probability that u(θ) = u̲, then some action will be used up. If states are continuous and there is a 0 probability that either u(θ) = ū or u(θ) = u̲, then neither action will be used up.
3. Suppose βA > βP (impatient principal). Then some action will be used up.
An impatient principal with βP < βA prefers to use weight on a relatively good action today instead of saving it for later. Any state which gives a payoff close to an action's best possible payoff will have a corner cutoff. So, some action will be used up. When the principal is patient and βP > βA, the principal will not want to use up an action even in that action's best possible state. The patience of the principal implies that the weight grows "faster" than the rate at which the principal discounts payoffs. He is always better off saving some weight and letting it recharge, so that it can grow and be used again in the future. Neither action will be used up. When discount factors are equal, the principal will be willing to use up an action only in the state which is best for that action. So if states are discrete and there is a positive probability of realizing the best state for the action, then some action will be used up. If states are continuous and the best state for an action has a 0 probability of ever being realized, neither action will be used up.
5. Discounted quotas with state-dependent preferences
In this section I reconsider the assumption that the agent has state-independent preferences. As described in Section 3.5, this assumption was necessary for deriving menus of discounted quotas as an optimal contract form. But we can explore the properties and applicability of discounted quotas in environments in which the agent's stage utilities vary with the state of the world. In this section, let payoffs be given by
Principal: VP = Σ_{t=1}^{t̄} [ ŨPt(at, θt) − GPt(τt) ]
Agent: VA = Σ_{t=1}^{t̄} [ ŨAt(at, θt) + GAt(τt) ].
This formulation generalizes the original payoff formulation in two ways. First, the agent's utility may now depend on the states of the world. Second, action payoffs may vary over time in ways other than simple discounting.24 As a matter of terminology, under these utilities we may need to clarify the discount factor used in determining the constraints of a "discounted quota". When it may be unclear, I say that a discounted quota with discounting β is one that fixes Σ_{s=1}^{t̄} β^{s−1} π_s^i for each i; β takes the earlier role of βA.
The key incentive property of discounted quotas was that, conditional on the constraints of the contracts, all agent types were willing to act in the principal's best interests – to play a principal-optimal strategy. In the language of Frankel (2014), discounted quotas satisfy the property of "aligned delegation". But the alignment was weak, in that it was achieved through indifference: agents with state-independent preferences were equally willing to play any strategy, including the principal's preferred one. This indifference implied a lack of robustness. Adding arbitrarily small perturbations to the model would ruin the alignment and discontinuously reduce the principal's payoff from a discounted quota. For instance, the agent's behavior would change discontinuously if she received small utility shocks each period,25 or if she were required to pay a small cost to learn the state of the world.
If players had arbitrary state- and/or time-dependent preferences, there would be no reason to expect any alignment of incentives and thus no reason to expect discounted quota contracts to be a reasonable or effective contract form. Below, however, I show that there are some settings in which the utility functions would naturally be state-dependent in a manner that actually strengthens the alignment: discounted quotas would satisfy aligned delegation through strict incentives rather than indifference. Of course, discounted quota contracts would no longer be optimal. But the agent would continue to play principal-optimal strategies, so we would analyze the dynamics of such contracts and the principal's corresponding payoffs exactly as in Section 4. Law-of-large-numbers arguments (as in Jackson and Sonnenschein, 2007) would continue to imply that such quotas would give high payoffs in settings with many decisions and little discounting. And, perhaps most importantly, the contracts would now be robust to small perturbations. Starting from a strict alignment of incentives, adding small shocks to the agent's payoffs would have continuous rather than discontinuous effects on agent behavior and on principal payoffs.
Definition. Agent and principal payoffs VA and VP are weakly aligned at discount factor β if for every joint distribution of states, under any discounted quota with discounting β, there exists an optimal strategy for the agent which is principal-optimal. The payoffs are strictly aligned at discount factor β if every optimal strategy for the agent is principal-optimal.
24 For simplicity, I assume here that t̄ and each ŨAt are known in advance to the agent. The important feature is just that
the principal never has any information about these that the agent does not have, and hence the agent would not benefit from communication with the principal.
25 In the above formulation, we could put in utility shocks through ŨAt(a^i, θ) = βA^{t−1} · (UA(a^i) + ε_t^i), with ε_t^i drawn from a continuous distribution with arbitrarily small support. In a discounted quota the agent would take actions entirely based on ε_t^i realizations rather than principal-relevant states of the world θt. This would hold if the agent knows ŨAt, and hence the realization of ε_t^i, at time 0 (as suggested in footnote 24); or if the utility shocks ε_t^i were only revealed at time t.
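To fix ideas about the bookkeeping behind the constraint Σ_{s=1}^{t̄} β^{s−1} π_s^i defined above, here is a minimal sketch, not from the paper, of the weight-tracking rule implied by the equation of motion (3): a played action loses one unit of weight and the remaining weights are then scaled up by 1/β. The starting weights and play sequences are assumed purely for illustration, and the sketch restricts attention to pure actions.

```python
beta = 0.9
W0 = [2.0, 1.5]                      # hypothetical starting weights on two actions

def simulate(action_sequence, beta, W0):
    """Weight-recharge bookkeeping for a discounted quota with discounting beta."""
    w = list(W0)                     # remaining weights at the start of the current period
    spent = [0.0] * len(W0)          # discounted plays so far: sum_s beta^(s-1) 1{a_s = i}
    for s, a in enumerate(action_sequence, start=1):
        assert w[a] >= 1.0, "no discounted plays left on this action"
        w[a] -= 1.0                  # playing an action uses one unit of weight
        spent[a] += beta ** (s - 1)
        # Invariant of the implementation: discounted plays so far plus the
        # re-discounted remaining weight recover the original weight vector.
        for i in range(len(W0)):
            assert abs(spent[i] + beta ** (s - 1) * w[i] - W0[i]) < 1e-12
        w = [x / beta for x in w]    # unused weight grows by the factor 1/beta
    return spent, w

print(simulate([0, 1, 0], beta, W0))
print(simulate([1, 0, 0], beta, W0))  # a different order draws down the same quota
```

The assertion inside the loop checks that the lifetime discounted count on each action is pinned down by the starting weights, whatever the order of play, up to the weights running out.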
In the model introduced in Section 3, payoffs are weakly aligned but not strictly aligned at discount factor βA. Below, I give conditions on preferences that would guarantee strict alignment at some discount factor.
Altruistic agent. I say that there is an altruistic agent if utilities are of the form
ŨPt(at, θt) = βP^{t−1} UP(at, θt)
ŨAt(at, θt) = βA^{t−1} UA(at) + α βP^{t−1} UP(at, θt)
for α, βA, βP > 0. The first term in each player's utility is familiar from the original model. The agent, a hospital, faces private costs of performing treatments, inducing state-independent payoffs UA(at). These costs are discounted over time at discount factor βA. The principal, the government, maximizes patient welfare of UP(at, θt), discounted at the social discount factor βP. But there is also a new term in the agent's utility: the hospital places some positive weight on patient welfare, discounted at the social discount factor.
With an altruistic agent, payoffs are strictly aligned at discount factor βA. We see that the agent's lifetime action payoff in a discounted quota with weight vector W would be a constant Σ_i UA(a^i) · W^i (independent of chosen actions and realized states), plus some weight on the principal's action payoff. So the set of maximizing action choices would be exactly those which maximized the principal's action payoff. That is, adding some level of partial altruism to the agent leads discounted quotas to be strictly rather than weakly aligned.
Altruistic principal. I say that there is an altruistic principal if utilities are of the form
ŨPt(at, θt) = βP^{t−1} UP(at) + α βA^{t−1} UA(at, θt)
ŨAt(at, θt) = βA^{t−1} UA(at, θt)
for α, βA, βP > 0. An altruistic principal is mathematically equivalent to an altruistic agent when the players have equal discount factors, but not when discount factors differ. Regardless, this formulation captures a different motivation in which it is the principal rather than the agent who incurs a state-independent cost to generate an allocation. The agent has a state-dependent value for this allocation. The principal factors in both the agent's welfare and the production costs.26
With an altruistic principal, payoffs are strictly aligned at discount factor βP, the discount factor on the state-independent terms. The logic is similar to the case of the altruistic agent: under a discounted quota, the principal's action payoff is equal to a constant plus some weight on the agent's action payoff. So maximizing the agent's action payoff in a discounted quota is identical to maximizing the principal's action payoff.
Objective functions in this class are used in two recent papers about repeated decision-making with discounting (Guo and Horner, 2015 and Lipnowski and Ramos, 2015). Lipnowski and Ramos (2015) motivate principal altruism by considering the principal to be a central authority that serves an agent, one of its constituents. Think of a regional government paying for local infrastructure projects, or a school administration paying a department's cost of hiring new faculty. Guo and Horner (2015) consider a more classical mechanism design framework in which
26 Suppose that discount factors do in fact differ. Then in both examples of altruism, it must be the case that the "altruistic" player, J = A or P, has state-independent fundamental preferences UJ. Only the other player ¬J may have state-dependent preferences U¬J.
the planner produces a good or not in each period. The cost of production is borne by the planner, and the state-dependent benefit accrues to the agent. The planner has an efficiency objective which aggregates the agent’s benefit and its own costs. It is useful to clarify some of the similarities and differences between the current paper and these other two. Guo and Horner (2015) give the principal commitment power as a mechanism designer, as I do, while Lipnowski and Ramos (2015) search for a principal-optimal repeated game equilibrium. Both of the other papers assume binary actions (the principal produces a good or not), binary states (agents have high value or low value if the good is produced, zero otherwise), and a common discount factor. I allow for general action and state spaces, and for different discount factors. And unlike the current paper, these other papers make the assumption that the agent’s utility function is commonly known to the principal and that monetary transfers are exogenously ruled out.27 Finally, Lipnowski and Ramos (2015) consider an iid environment, while Guo and Horner (2015) assume that the state process is Markovian. My characterization in Section 3 of the optimal contract as a menu of discounted quotas made no assumptions about the state process, while my analysis of dynamics in Section 4 studied the iid case. In terms of the results, Guo and Horner (2015) find an optimal mechanism that takes the form of a budget on the number of times the agent can request the service “in a row”. When she requests the service, one unit is deducted from the budget. When she does not, the budget is scaled up, although in a manner that is more complicated than just multiplying by the inverse of the discount factor.28 Lipnowski and Ramos (2015) find an optimal equilibrium with two regimes. The game begins with a “capped budget”, which is essentially an implementation of a discounted quota on the number of times the service can be provided. Thanks to the principal’s lack of commitment in the repeated game, though, there is a limit on how much weight can be accumulated. Once the budget runs out and there are no more plays remaining, the equilibrium shifts to a “controlled budget” regime in which the principal enforces minimum waiting periods between the provision of each service. There is also an indirect connection between the use of quotas for an altruistic principal and the use of quotas for an efficiency-minded principal mediating between multiple agents. Consider specifically Jackson and Sonnenschein (2004) (the extended working paper version of Jackson and Sonnenschein, 2007) and Escobar and Toikka (2013). These papers find that sequences of (non-discounted) quotas over finite blocks can approximate efficiency as discount factors approach one. In the two papers the principal has no costs or benefits of his own, but rather mediates between multiple agents: each agent has private information in each period on her payoff from a public allocation. Compare that to the altruistic principal set-up where there is a single agent and the principal has direct action payoffs. The connection is the following: from the perspective of a given agent in a multiple-agent setting, the sum of all payoffs of all other players can be thought of as a stochastic analog to the altruistic principal’s utility UP (at ). It is a cost borne by others which depends on the action but not the agent’s type. 
To incorporate information from multiple agents, these papers give the agents quotas over state reports rather than actions; the action is 27 See the discussion of generic uncertainty in Section 3.5 for an explanation of how, with more than two actions, one could improve on a quota when utilities are commonly known even without using monetary contracts. Rather than fixing the number of plays on each action, we would allow any sequence of actions that achieved a specified level for the discounted lifetime costs. 28 In fact, depending on the level of the budget, it is possible that the budget can fall rather than rise when the service is not requested. In that case the budget still falls by less than one full unit, maintaining some incentive for the agent not to request the service.
determined by the joint reports of the agents. These quotas are such that one’s own sequence of reports cannot affect the “externality” imposed on other players, at least in a limit with many decisions and/or high discount factors. Therefore each agent’s behavior is approximately aligned with the planner, and she is willing to report states approximately truthfully. Generalized quadratic loss utility. The previous examples made no parametric assumptions on utility functions themselves. Alignment was generated through “altruism” in which one player with state-independent payoffs cared about the other’s state-dependent payoffs. Another source of alignment can be through parametric or functional form assumptions on the utility functions. If both players have utilities in the same class of functions, then it is possible that they may face identical maximization problems in a discounted quota. Here, I illustrate an example of such alignment, previously introduced for a non-discounted environment in Frankel (2014) (see Appendix Section D.1 of that paper). Let A and be subsets of the real line, with A finite and compact. I say that players have generalized quadratic loss utility if utilities are of the form U˜ P t (a, θ ) = −β t−1 (cP (a) − θ )2 , for some strictly increasing function cP U˜ At (a, θ ) = −β t−1 (cA (a) − θ )2 , for some strictly increasing function cA where the principal and agent share a discount factor β > 0. The simplest case is where cP (a) = a while cA (a) = a − λ, for some constant λ ∈ R. That corresponds to the familiar quadratic loss payoffs with constant biases: the principal wants to match the action at as closely as possible to the state θt , the agent prefers at = θt + λ, and both players have quadratic losses about their ideal points. This more general functional form allows for each player to have an arbitrary increasing mapping of states to ideal points. Maintaining quadratic losses means that preferences over actions given a distribution of possible states are the same as if the average state were realized. With generalized quadratic loss utilities, payoffs are strictly aligned at discount factor β. The logic is essentially that, regardless of c(·), players want to assortatively match actions to states to the extent that this is possible. For any strategy and any beliefs over the distribution of states, one can calculate the average state at which an action is played (correcting for discounting with an appropriate weighted average). The optimal strategy, independently of c(·), maps the lowest action to the lowest possible average state; the next lowest action to the lowest possible average conditional on the previous assignment; and so forth up through the highest action, matched to the highest possible average state. See Proposition 4 of Appendix C. 6. Conclusion This paper presents an economic environment in which discounted quotas are optimal contract forms, and then it explores the dynamics of such mechanisms. In a broader sense, I think of this paper as showing how to properly account for discounting in quotas. Let me reiterate the distinction between the current treatment of discounting and that in some past work on sequential quotas. Jackson and Sonnenschein (2004, Corollary 2) and Escobar and Toikka (2013) divide time into blocks of periods and impose separate undiscounted quotas over each block. Discounting is relevant in that incentives can be maintained over longer blocks as discount factors grow. 
As discount factors go to 1 and discounting vanishes, the blocks can become arbitrarily long. In this limit, they find that quotas implement approximately efficient allocations at an appropriate starting weight. In the current paper, by contrast, I build discounting directly into the quota
constraints. All periods are now bound together into a single block – even when players are impatient. In a recursive environment the action payoffs and policies are simple to compute with a dynamic program, for any discount factors of the principal and agent and going forward from any starting weight.
Appendix A. Proofs
Proof of Lemma 1. Fix some incentive compatible direct contract.
Step 1: I first establish the following claim.
Claim 1. For almost every UA,
argmax_{(W,TA) ∈ WT_0} [ Σ_i W^i · UA(a^i) + TA ]
is a singleton.
Proof. Let VA(UA) denote the expected payoff of an agent with utility vector UA at time 0, conditional on accepting a contract:
VA(UA) ≡ max_{(W,TA) ∈ WT_0} [ Σ_{i=1}^{A} W^i · UA(a^i) + TA ].    (9)
We now apply the envelope theorem. By Theorem 1 of Milgrom and Segal (2002), if VA(UA) is differentiable with respect to UA(a^i) then for any optimizing pair (W∗, TA∗),
∂VA/∂UA(a^i) (UA(a^1), ..., UA(a^I)) = W∗^i.
Theorem 2 of Milgrom and Segal (2002) implies that VA(·) is absolutely continuous and so this derivative exists everywhere except for a set of utilities of measure 0.29 UA is drawn from a nondegenerate distribution, so the derivative exists with probability 1. If this derivative exists, then the right-hand side must be unique. This establishes the result for each W^i. The result for TA follows from Equation (9); if two pairs of (W, TA) have the same W's but different TA's, then the pair with higher TA is preferred. This completes the proof of the claim.
Step 2: Show the result for t = 1, i.e., show that for almost every UA, it holds that WT((UA)) is a singleton (where WT((UA)) represents the weights and transfers available after a report of ÛA = UA). It is enough to show that WT((UA)) is a subset of
argmax_{(W,TA) ∈ WT_0} [ Σ_i W^i · UA(a^i) + TA ]
since Claim 1 above established that this argmax set is a singleton for almost every UA. By the construction of the maximization problem, the agent's optimal strategy achieves weights and transfers in WT((UA)); and the truthful strategy σ∗ is optimal under the assumption of incentive compatibility.
29 To apply the theorem, W^i must be bounded; by construction, W^i ∈ [0, 1/(1 − βA)].
Now suppose that, following an initial truthful report of UA, some (strictly sub-optimal, non-truthful) strategy σ yields weights and transfers (W, T) ∉ WT((UA)). But there is a sequence of states such that the truthful strategy σ∗ would make all of the same reports as does σ. Since under σ∗ the private observations of states never affect the agent's utility directly, the payoff from σ must be the same as σ∗. Contradiction. So, in fact, all strategies give weights and transfers in WT((UA)).
Step 3: Show the result for t ≥ 2. I will proceed by induction. Say that there is a unique weight and transfer payoff available at almost every public t-history, for t ≥ 1; I seek to show that the same must hold at t + 1-histories. By the restriction to contracts which depend only on reports (without loss, with respect to payoffs; see footnote 9), we can observe that if at time t ≥ 1 and history Ht there is a unique weight and transfer payoff (Wt, TA,t) in WT(Ht), then after any report θ̂t, the weight and transfer must also be unique. In particular, say that after report θ̂t the contract chooses action a^i with probability π^i and allocates an expected transfer utility of τ̃ = βA^{1−t} E_{τt}[GAt(τt)] to the agent. Then at history Ht followed by θ̂t, the weight on action a^i is given deterministically by W^i_{t+1} = (W^i_t − π^i)/βA (see equation (3)), and the continuation transfer payoff is given by TA,t+1 = (TA,t − τ̃)/βA.30 This completes the proof. □
Proof of Theorem 1. Fix some initial contract C; without loss, we can take C to be an incentive compatible direct contract. I seek to show that we can find a menu of discounted quotas which replicates the agent acceptance decisions for each UA and, for almost every UA which accepts the contract, (i) replicates the action payoffs of C under a particular agent strategy, while (ii) weakly lowers the principal's transfer costs (independently of agent strategy). If this is the case, then the same menu of discounted quota contracts under the principal-optimal strategy weakly improves the principal's payoffs relative to C.
We construct the menu of discounted quotas in the following manner. For each public 1-history H1 = (ÛA) – that is, for each ÛA ∈ U – find some point (W(ÛA), TA(ÛA)) ∈ WT(H1|C). (For notational ease, I use WT(H1|C) to indicate that the weight and transfer are for the history at the original contract.) Assign each ÛA to a discounted quota, so that the agent selects from the menu of discounted quotas by making an announcement ÛA. Upon announcing ÛA, the agent is given the following discounted quota. First, weights are set to W(ÛA). Second, the sequence of transfer payments (τt)_{t=1}^{t̄} is chosen to minimize the principal's transfer costs Σ_t GPt(τt) subject to setting the agent's transfer payoff Σ_t GAt(τt) equal to TA(ÛA).
Clearly the agent is indifferent between this menu of contracts and the original one – she gets the same action and transfer payments going forward for every initial report she can make – and so she will accept the menu of discounted quotas exactly when she accepts the original contract, and conditional on accepting is willing to announce ÛA = UA. For every UA for which WT((UA)|C) is unique, the principal gets lower expected transfer costs regardless of agent strategy. Moreover, at these UA values the principal can replicate his action payoffs for every realized history of states (θ1, ..., θt) by proposing the following strategy (recall that the agent is indifferent across strategies). Under the original contract C, find the induced distribution of period-t actions at truthful public history (UA, θ1, ..., θt−1) and report θt;
30 Without the restriction to contracts that depend only on reports – that is, allowing contracts with superfluous randomization – the result of the lemma would still hold. But weights and transfers at t + 1 might be random, with expectation at the given values of Wt+1 and TA,t+1, rather than deterministic.
under the discounted quota, at this utility and this sequence of states, the agent chooses π to replicate this distribution. This choice of π is feasible because every report under C yields lifetime weights of W(UA), the unique weight in WT((UA)|C).
At those values of UA for which WT((UA)|C) is not unique, the principal may not be able to replicate action payoffs while weakly decreasing transfer costs. But such values of UA are zero probability realizations by Lemma 1, and by the assumption of a nondegenerate distribution of UA. The agent can play arbitrarily after realizing one of these utility values. As promised, the proposed choice of π almost surely replicates the principal's action payoffs; moving to the principal-optimal strategy then weakly improves the principal's payoffs. □
Proof of Lemma 2. Here I follow standard dynamic programming arguments. Suppose that we have promised weight W at a private history which, for convenience, I take to be PH1. (The history is relevant only through its influence on the current weight.)
Y∗(W) = sup_{π(·,·)} Σ_{i=1}^{A} Σ_{s=1}^{∞} βP^{s−1} E_{θs,Hs}[ π^i(PHs, θs) UP(a^i, θs) ]    (10)
subject to π(PHs, θs) being a valid probability vector, and W = Σ_{s=1}^{∞} βA^{s−1} E_{θs,Hs}[π(PHs, θs)]. The principal-optimal policy π∗(PHs, θs) is defined as a maximizer of this program, if one exists.
We can replace the sup with a max, because the constraints on π are compact and the maximand is continuous in π. The same logic implies that Y∗ is continuous. Y∗(W) is also weakly concave in W; this follows because at any averaged weight, an averaged action can be played which yields the average of the values.31
We can now rewrite the sequence problem (10) in a recursive functional form with the following constraints, where the argument of Y∗ comes from the equation of motion (3):
Y∗(W) = E_θ [ max_{π(θ|W)} Σ_{i=1}^{A} π^i(θ|W) UP(a^i, θ) + βP Y∗( (W − π(θ|W)) / βA ) ]
s.t.  Σ_i π^i(θ|W) = 1   ∀θ
      0 ≤ π^i(θ|W)   ∀θ, ∀i
      π^i(θ|W) ≤ W^i   ∀θ, ∀i.
At this point we can observe that the stage utility Σ_{i=1}^{A} π^i(θ|W) UP(a^i, θ) is linear in π (and so weakly concave) and bounded, and the discount factor is less than one, so standard results (for instance, Stokey and Lucas, 1989 Chapter 4) imply the equivalence of the recursive functional problem and the sequence problem. Furthermore they imply that the solution to the functional problem is the unique fixed point of the associated contraction mapping. □
428
A. Frankel / Journal of Economic Theory 166 (2016) 396–444
in a mathematical optimization problem similar to that defining principal-optimal payoffs, but “as if” (i) states arrived and actions were played in continuous time, with (ii) states arriving at a constant and deterministic (“certain”) flow rate. I then show that the continuous certainty value ∗ at the same level of relative patience. Finally, I find a (subopexactly gives the payoffs of ylim ∗ as we take this timal) policy in a discounted quota in which action payoffs approximate ylim ∗ is a lower bound as well as an upper bound for the limiting payoffs, same limit. Therefore ylim and hence the limiting payoffs exactly achieve this value. Proofs of contained lemmas appear directly following this proof. Definition. Given βA , βP , and a vector of weights W , let the continuous certainty value ∗ (W |β , β ) be defined by YCC A P $ − log βP ζ max β X iθ UP (a i , θ)dζ 1 − βP P ζ {Xζiθ }i≤A,θ∈,ζ ∈[0,∞) ∞
∗ YCC (W |βA , βP ) =
i
s.t.
(11)
i,θ 0
Xζiθ = p θ
Xζiθ ! ∞ − log βA ζ iθ θ 0 1−βA βA Xζ dζ
∀ζ, θ
≥0
∀ζ, θ, i .
= Wi
∀i
(12)
To understand this definition, think of the continuous variable ζ ∈ [0, ∞) as representing time. (The program of (11) subject to (12) defines a mathematical maximization problem, not a new continuous-time game; the interpretation here is just a heuristic to motivate this maximization problem.) There is then a constant flow amount p θ of each state θ at each time ζ , and this flow is to be filled in with pθ units of actions from A. In this problem βA and βP are the discount factors relative to each unit length of time – the amount that flow payoffs in time ζ + 1 are discounted at time ζ . A policy function Xζiθ describes the level of action a i allocated to state θ at time ζ , given weights W at time 0. The constraints (12) follow those in (5). A quantity p θ units of actions are allocated to each state θ at each time (contrasting to a quantity of one probability unit of actions assigned to the realized state θt in the original problem); actions are nonnegative; and the discounted quantity of log β action a i used over the lifetime is W i . The −1−β coefficients rescale the continuous time period n 32 from n to n + 1 to a discounted mass of β . Lemma 5 shows that the continuous certainty value is weakly greater than the original value function. ∗ (W |β , β ). Lemma 5. Fixing βA and βP , Y ∗ (W |βA , βP ) ≤ YCC A P
Next, we restate the continuous certainty value under a few changes of variables: ∗ (W |β , β ) = Lemma 6. Fix βA < 1, βP = βA , and a vector of weights W . Then (1 − βP )YCC A P ∗ ∗ yCC (w|γ ) for w = (1 − βA ) · W in (A), and yCC defined as γ
! ! 32 The expression − log β is the multiplicative inverse of 1 β ζ dζ . So − log β n+1 β ζ dζ = β n . As β → 1, this ex0 n 1−β 1−β
pression approaches 1.
A. Frankel / Journal of Economic Theory 166 (2016) 396–444
∗ yCC (w|γ ) =
s.t.
$
max
429
1
γ (1 − ξ )γ −1 xξiθ UP (a i , θ)dξ
{xξiθ }i≤A,θ∈,ξ ∈[0,1) i,θ 0 iθ θ ∀ξ, θ x = p i ξ ∀ξ, θ, i xξiθ ≥ 0 ! 1 iθ i ∀i θ 0 xξ dξ = w
(13)
(14)
.
The argument follows from a straightforward change of the “time” variable from ζ ∈ [0, ∞) ζ to ξ = 1 − βA ∈ [0, 1). I use lowercase y rather than uppercase Y to indicate that payoffs are in per-period terms. I correspondingly use the lowercase xξiθ to indicate the policy rather than Xζiθ . It holds that any maximizer xξiθ of (13) subject to (14) exactly gives a maximizer Xζiθ of (11) ζ
subject to (12) – specifically, setting Xζiθ = xξiθ for ξ = 1 − βA , or ζ = (log(1 − ξ ))/(log(βA )). Among other conclusions, the above lemma shows that the per-period continuous certainty ∗ = (1 − β )Y ∗ depends on the discount factors β and β = β γ only through the value yCC P CC A P A patience parameter γ . Next, we show that this per-period continuous certainty value is exactly ∗ defined as the maximization of (6) subject to (7) and (8). equal to ylim ∗ (W |β , β ) = Lemma 7. Fix βA < 1, βP = βA , and any weights W . It holds that (1 − βP )YCC A P ∗ ylim ((1 − βA )W |γ ). γ
∗ Lemma 5 combined with Lemma 7 completes the proof of Proposition 1, part 1, that ylim ∗ gives an upper bound on (1 − βP )Y . I continue now with the proof of part 2 by showing that it also gives a lower bound on the limit. ∗ , with corresponding {Liθ } implied by (8). Let {l∗iθ } be maximizers of the problem defining ylim ∗ We interpret these maximizers as suggesting that as time ξ runs from 0 to 1, we “play action a i iθ iθ at state θ ” from time ξ = Liθ ∗ to time ξ = L∗ + l∗ . That is, we choose continuous certainty 33 policies of % iθ iθ p θ if ξ ∈ [Liθ ∗ , L∗ + l∗ ) iθ x∗ξ (w|γ ) ≡ 0 otherwise % ζ iθ iθ p θ if 1 − βA ∈ [Liθ ∗ , L∗ + l∗ ) iθ . (W |βA , βP ) ≡ X∗ζ 0 otherwise
These policies indeed maximize the respective programs of (13) subject to (14), or (11) subject to (12)). (It is convenient here to highlight the dependence of x∗ and X∗ on the initial weights and the discounting; other times, when these values are implied, I omit this dependence.) As described in the body of the paper, under these policies the action a i is played at a discounted ζ γ share of l iθ periods under the agent’s discounting of time ζ at βA , and a share of (1 − Liθ ∗ ) − γ ζ ζ iθ iθ γ (1 − L∗ − l∗ ) under the principal’s discounting at βP = (βA ) . We now seek to approximate this policy in the game actually being played, in which time t ∈ N is discrete, and a single iid state is realized at each period. 33 Here and throughout the proof, I highlight the dependence of terms such as X or x on parameters such as W and w, ∗ ∗ or βA , βP , and γ , in their definitions; in later occurrences I often omit this dependence for notational clarity.
430
A. Frankel / Journal of Economic Theory 166 (2016) 396–444 γ
First, for some fixed W and βA , and βP = βA , consider a state- and time-dependent, but not history-dependent, “proposed” policy πprop (θ, t). Under this policy, if state θt = θ is realized at i (t, θ ): time t, then action a i is played with probability πprop i πprop (t, θ |W, βA , βP ) ≡
iθ X∗(t−1) (W |βA , βP )
pθ
x iθ =
t−1 ) ∗(1−βA
((1 − βA )W |γ ) pθ
.
iθ at ζ = t − 1.34 Let That is, at time t and current state θ , the action played is that implied by X∗ζ Yprop (W |βA , βP ) be the principal’s action payoff if policy πprop is played.
Lemma 8. Fix $\gamma > 0$ and $w \in \Delta(A)$. Consider a sequence of discount factors and weights $(\beta_A^{(n)}, \beta_P^{(n)}, W^{(n)})_{n \in \mathbb{N}}$ with (i) $\lim_{n\to\infty}\beta_A^{(n)} = \lim_{n\to\infty}\beta_P^{(n)} = 1$, (ii) $\beta_P^{(n)} = (\beta_A^{(n)})^{\gamma}$ for all $n$, and (iii) $W^{(n)} = \frac{w}{1-\beta_A^{(n)}}$ for all $n$. It holds that $(1-\beta_P^{(n)})\,Y_{prop}(W^{(n)} \mid \beta_A^{(n)},\beta_P^{(n)}) \to y^{*}_{\lim}(w \mid \gamma)$. Moreover, under policy $\pi_{prop}$, it holds that $(1-\beta_A^{(n)})\,E\big[\sum_{t=1}^{\infty}(\beta_A^{(n)})^{t-1}\pi^{i}_{prop}(t,\theta_t \mid W^{(n)},\beta_A^{(n)},\beta_P^{(n)})\big] \to w^{i}$ for each $i$.

The policy $\pi_{prop}$ cannot exactly be implemented in a discounted quota with starting weight $W$, for two reasons. First, the mean number of discounted plays on action $a^i$ is approximately but perhaps not exactly $W^{i}$ for $\beta_A < 1$. Second, the policy leads to some variance over the number of times actions are played, induced by the dependence of actions on states and the uncertain arrival of states of the world. However, as $\beta_A \to 1$, both of these problems essentially disappear. There exists a discounted-quota policy which approximates $\pi_{prop}$ as $\beta_A \to 1$, and which achieves per-period payoffs approximately equal to $y^{*}_{\lim}$. Because $y^{*}_{\lim}$ is an upper bound on per-period payoffs, we see that the action-optimal payoffs in a discounted quota approach this level.

Making this precise, consider a policy $\pi_{bound}$ from histories and reported states into actions:

$$\pi_{bound}(H_t,\theta_t \mid W,\beta_A,\beta_P) \;\equiv\; \begin{cases} \pi_{prop}(t,\theta_t \mid W,\beta_A,\beta_P) & \text{if } W^{i}(H_t) + \beta_A^{t-1}\pi^{i}_{prop}(t,\theta_t) \le W^{i} \text{ for all } i \\ \text{arbitrary } \pi \text{ with } W^{i}(H_t) + \beta_A^{t-1}\pi^{i} \le W^{i} & \text{otherwise,} \end{cases}$$

where $W(H_t)$ is the discounted number of plays so far, relative to period 1:

$$W(H_t) \;\equiv\; \sum_{s=1}^{t-1}\beta_A^{s-1}\,\pi_{bound}(H_s,\theta_s).$$
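The bookkeeping behind $\pi_{bound}$ is easy to make concrete. The following is a minimal simulation sketch of my own (not code from the paper): the state probabilities and the pre-computed proposed policy are hypothetical inputs, and the fallback rule is just one of the many "arbitrary" feasible choices the definition allows.

```python
import numpy as np

def simulate_pi_bound(W, beta_A, p, pi_prop, T, seed=0):
    """Track the discounted plays W^i(H_t) under the capped policy pi_bound.

    W       : np.array of starting weights, one entry per action a^i
    beta_A  : agent's discount factor in (0, 1)
    p       : np.array of iid state probabilities
    pi_prop : function (t, theta_index) -> array of action probabilities
    Returns the vector of discounted plays after T periods; it never exceeds W
    componentwise, and approaches W as beta_A -> 1 and T grows large.
    """
    rng = np.random.default_rng(seed)
    used = np.zeros_like(W, dtype=float)   # W^i(H_t), discounted relative to period 1
    for t in range(1, T + 1):
        theta = rng.choice(len(p), p=p)
        disc = beta_A ** (t - 1)
        pi = np.array(pi_prop(t, theta), dtype=float)
        if np.any(used + disc * pi > W + 1e-12):
            # Following pi_prop would overshoot some quota: switch to any feasible
            # mixed action.  Here we fill actions greedily up to their remaining
            # room; when W sums to 1/(1 - beta_A), total room is at least
            # beta_A^{t-1}, so a feasible mixed action always exists.
            room = np.maximum(W - used, 0.0)
            pi = np.zeros_like(room)
            left = 1.0
            for i in np.argsort(-room):
                take = min(left, room[i] / disc)
                pi[i] = take
                left -= take
                if left <= 1e-12:
                    break
        used += disc * pi
    return used
```

Running such a simulation with $\beta_A$ close to 1 illustrates the content of Lemma 8: the fallback branch is triggered on a vanishing share of discounted periods, and the discounted plays converge to $W$.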
Starting at time $t = 1$, this contract does in fact deliver a weight of $W$: the total discounted weight used on $a^i$ through time $t$ is a weakly increasing series bounded above by $W^{i}$, and these series must approach their bounds because the weights from the contract add up to the same sum ($1/(1-\beta_A)$) as the components of $W$. Therefore $\pi_{bound}$ is implementable as an incentive compatible policy under a discounted quota with starting weight $W$. It only remains to show that in the limit as $n \to \infty$, $\pi_{bound}$ is equal to $\pi_{prop}$ for a share of discounted periods approaching 1, and hence the payoff from $\pi_{bound}$ approximates the payoff from $\pi_{prop}$.

34. Because $X^{i\theta}_{*\zeta}$ is a step function in $\zeta$ with a finite number of steps – at most $I \cdot |\Theta|$ – it will not matter in the limit as $\beta_A \to 1$ whether the action at time $t$ is that implied by $\zeta = t-1$, $\zeta = t$, or anything in between.
I seek to show that for almost every $\xi \in [0,1)$, for $t = \lfloor \log(1-\xi)/\log\beta_A \rfloor$ (with $\lfloor\cdot\rfloor$ denoting the floor function), $E[\pi_{bound}(H_t,\theta_t)]$ goes to $\pi_{prop}(t,\theta_t)$ as $\beta_A \to 1$, taking expectations over possible histories (realizations of $\theta$'s). That is, the probability goes to 1 that, for all $i$,

$$W^{i}(H_t) + \beta_A^{t-1}\pi^{i}_{prop}(t,\theta_t) \;\le\; W^{i}. \qquad (15)$$

For each action $i$, define $\bar{\xi}^{i}$ as the last time $\xi$ for which $a^i$ is ever played in the continuous certainty problem:

$$\bar{\xi}^{i} \;\equiv\; \max_{\theta}\{L^{i\theta}_{*} + l^{i\theta}_{*}\}.$$

For $\xi > \bar{\xi}^{i}$, it holds that $1-\beta_A^{t-1}$ is larger than $\bar{\xi}^{i}$ at $t = \lfloor \log(1-\xi)/\log\beta_A \rfloor$ for $\beta_A$ large enough, at which point $\pi^{i}_{prop}(t,\theta_t) = 0$ and so the inequality (15) is satisfied automatically.

For $\xi < \bar{\xi}^{i}$, it holds that $1-\beta_A^{t-1}$ is smaller than $\bar{\xi}^{i}$. If $\pi_{bound} = \pi_{prop}$ at every time less than or equal to $t$, then by a law of large numbers we could see that with probability 1, $(1-\beta_A)W^{i}(H_t)$ would approach $\sum_{\theta}\int_0^{\xi} x^{i\theta}_{*s}\,ds < w^{i}$ as $\beta_A \to 1$, and so the inequality (15) would be satisfied with arbitrarily high probability. We cannot apply this argument so simply, because $\pi_{bound}$ may not have been equal to $\pi_{prop}$ at all past times, but the same logic applies if $\pi_{bound} = \pi_{prop}$ at an arbitrarily high proportion of past times. And if we order the actions so that $\bar{\xi}^{1} \le \bar{\xi}^{2} \le \cdots \le \bar{\xi}^{I}$, we can see that $\pi_{bound} = \pi_{prop}$ prior to $\bar{\xi}^{1}$ at a proportion of times approaching 100%; then the same applies for times between $\bar{\xi}^{1}$ and $\bar{\xi}^{2}$, and so on.

So for any $\epsilon > 0$, with a probability approaching 1, $\pi_{bound} = \pi_{prop}$ at every time in $[0, \bar{\xi}^{1}-\epsilon]$ and in each of the nontrivial intervals $(\bar{\xi}^{i}+\epsilon,\, \bar{\xi}^{i+1}-\epsilon)$, and furthermore on these intervals the probability that $W^{i}(H_t) + \beta_A^{t-1}\pi^{i}_{prop}(t,\theta_t) \le W^{i}$ for each $i$ goes to 1.

Putting that all together, $y^{*}_{\lim}(w \mid \gamma)$ is both an upper bound on $\lim_{n\to\infty}(1-\beta_P^{(n)})Y^{*}(W^{(n)} \mid \beta_A^{(n)},\beta_P^{(n)})$ (part 1) and a lower bound on it (by constructing a possibly suboptimal policy $\pi_{bound}$ which achieves that payoff in the limit). Therefore it is equal to the limit. □
Proof of Lemma 5. Notice that $Y^{*}(W \mid \beta_A,\beta_P)$ can be written non-recursively as

$$Y^{*}(W \mid \beta_A,\beta_P) = \max_{\pi^{i}(H_t,\theta_t)} \sum_{t=1}^{\infty}\sum_{i,\theta} \beta_P^{t-1}\, p^{\theta}\, E_{H_t}[\pi^{i}(H_t,\theta) \mid H_t]\, U_P(a^i,\theta)$$

subject to the appropriate constraints on $\pi$. The notation $E_{H_t}[\pi^{i}(H_t,\theta) \mid H_t]$ indicates the expected value of $\pi^{i}$ (the probability that action $i$ is taken) conditional on state $\theta$ being realized after history $H_t$, averaging across all histories. And $X^{i\theta}_{\zeta} = p^{\theta}E_{H_t}[\pi^{i}(H_t,\theta_t) \mid H_t]$ for $t - 1 = \lfloor\zeta\rfloor$ is a feasible continuous certainty policy $X$ for any feasible $\pi$, by the law of iterated expectations. Letting $[X]$ be such a policy for an optimizing $\pi_{*}$,

$$Y^{*}_{CC}(W \mid \beta_A,\beta_P) \;\ge\; \sum_{i,\theta}\int_0^{\infty}\frac{-\log\beta_P}{1-\beta_P}\, e^{\zeta\log\beta_P}\, [X]^{i\theta}_{\zeta}\, U_P(a^i,\theta)\,d\zeta \;=\; \sum_{t=1}^{\infty}\sum_{i,\theta}\beta_P^{t-1}\,[X]^{i\theta}_{t-1}\, U_P(a^i,\theta) \;=\; Y^{*}(W). \quad \Box$$
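The discretization step in this chain rests on the elementary identity $\int_{t-1}^{t}(-\log\beta)\beta^{\zeta}\,d\zeta = \beta^{t-1}(1-\beta)$ (essentially the identity recorded in footnote 32). A short numerical check, purely for illustration:

```python
import numpy as np
from scipy.integrate import quad

beta, t = 0.9, 5  # arbitrary test values
val, _ = quad(lambda z: -np.log(beta) * beta ** z, t - 1, t)
# the piecewise-constant continuous-time weight on period t equals beta^(t-1)*(1-beta)
assert abs(val - beta ** (t - 1) * (1 - beta)) < 1e-10
```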
Proof of Lemma 6. In (11) subject to (12), relabel the policy choice as $x$ rather than $X$, and consider the change of variables from $\zeta \in [0,\infty)$ to $\xi = 1-\beta_A^{\zeta}$ (i.e., $\zeta = \log(1-\xi)/\log\beta_A$) and $w = (1-\beta_A)W$. In an integration, then, $d\xi = -(\log\beta_A)\beta_A^{\zeta}\,d\zeta$. First, we get the last line of (14) from (12) by

$$\sum_{\theta}\int_0^{\infty}\frac{-\log\beta_A}{1-\beta_A}\beta_A^{\zeta}X^{i\theta}_{\zeta}\,d\zeta = W^{i} \;\Longleftrightarrow\; \sum_{\theta}\int_0^{\infty}-(\log\beta_A)\beta_A^{\zeta}X^{i\theta}_{\zeta}\,d\zeta = w^{i} \;\Longleftrightarrow\; \sum_{\theta}\int_0^1 x^{i\theta}_{\xi}\,d\xi = w^{i}.$$
Then we get the maximand in (13) from $(1-\beta_P)$ times the maximand in (11) by

$$(1-\beta_P)\int_0^{\infty}\frac{-\log\beta_P}{1-\beta_P}\beta_P^{\zeta}X^{i\theta}_{\zeta}U_P(a^i,\theta)\,d\zeta \;=\; \int_0^1 \frac{\log\beta_P}{\log\beta_A}\frac{\beta_P^{\zeta}}{\beta_A^{\zeta}}\, x^{i\theta}_{\xi}U_P(a^i,\theta)\,d\xi \;=\; \int_0^1 \gamma(1-\xi)^{\gamma-1}x^{i\theta}_{\xi}U_P(a^i,\theta)\,d\xi,$$

where the second equality follows from the fact that $\beta_P = \beta_A^{\gamma}$, and hence (i) $\gamma = (\log\beta_P)/(\log\beta_A)$, and (ii) $\beta_P^{\zeta}/\beta_A^{\zeta} = \beta_A^{\zeta(\gamma-1)} = (1-\xi)^{\gamma-1}$. □
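As a numerical sanity check of this change of variables (illustrative only; an arbitrary smooth test function stands in for $\sum_{i,\theta}X^{i\theta}_{\zeta}U_P(a^i,\theta)$ evaluated along $\xi = 1-\beta_A^{\zeta}$), one can verify that $(1-\beta_P)$ times the $\zeta$-integral equals the $\xi$-integral whenever $\beta_P = \beta_A^{\gamma}$:

```python
import numpy as np
from scipy.integrate import quad

beta_A, gamma = 0.95, 2.0               # arbitrary test values
beta_P = beta_A ** gamma
g = lambda xi: np.sin(3 * xi) + 2.0     # arbitrary test function of xi

# (1 - beta_P) times the maximand of (11), written in zeta-time
lhs, _ = quad(lambda z: -np.log(beta_P) * beta_P ** z * g(1 - beta_A ** z), 0, np.inf)
# the maximand of (13), written in xi-time
rhs, _ = quad(lambda xi: gamma * (1 - xi) ** (gamma - 1) * g(xi), 0, 1)
assert abs(lhs - rhs) < 1e-6
```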
Proof of Lemma 7. Given Lemma 6, it suffices to show that $y^{*}_{CC}(w \mid \gamma) = y^{*}_{\lim}(w \mid \gamma)$. I say that a policy $x = \{x^{i\theta}_{\xi}\}$ is feasible if it satisfies the constraints (14), and optimal if it maximizes (13) subject to being feasible. The following claim helps us to solve the maximization problem of (13) subject to (14). In words, if actions $a^i$ and $a^j$ are ever taken at (allocated to) some state $\theta$, the claim tells us the chronological order in which these actions must be taken. When the principal is more patient, the action that is better for the principal is always taken later; when the principal is less patient, better actions are always taken earlier; and the order is irrelevant when the parties are equally patient.
Claim 2. Take initial weights $w$ and relative patience $\gamma$ as given.

1. If $\gamma \ne 1$: Fix some actions $a^i, a^j \in A$ and a state $\theta \in \Theta$, and suppose that the principal prefers action $a^j$ to $a^i$ in state $\theta$: $U_P(a^i,\theta) < U_P(a^j,\theta)$. Take some optimal policy $x$.
(a) If $\gamma < 1$ (patient principal, $\beta_P > \beta_A$): for any $\tilde{\xi} \in (0,1)$ it cannot hold that $\int_0^{\tilde{\xi}} x^{j\theta}_{\xi}(w)\,d\xi > 0$ and $\int_{\tilde{\xi}}^1 x^{i\theta}_{\xi}(w)\,d\xi > 0$.
(b) If $\gamma > 1$ (impatient principal, $\beta_P < \beta_A$): for any $\tilde{\xi} \in (0,1)$ it cannot hold that $\int_0^{\tilde{\xi}} x^{i\theta}_{\xi}(w)\,d\xi > 0$ and $\int_{\tilde{\xi}}^1 x^{j\theta}_{\xi}(w)\,d\xi > 0$.
2. If $\gamma = 1$ (players are equally patient, $\beta_P = \beta_A$): Take any two feasible policies $x_{\xi}$ and $\hat{x}_{\xi}$. If for each $i$ and $\theta$ it holds that $\int_0^1 x^{i\theta}_{\xi}\,d\xi = \int_0^1 \hat{x}^{i\theta}_{\xi}\,d\xi$, then $x_{\xi}$ and $\hat{x}_{\xi}$ yield the same evaluation of the maximand of (13).

Any feasible policy $x^{i\theta}_{\xi}$ which satisfies the requirements of Claim 2 can be characterized by $I\cdot|\Theta|$ parameters, the "lengths of time" for which each action $a^i$ is played at state $\theta$ – the measure of time for which $x^{i\theta}_{\xi} = p^{\theta}$ in state $\theta$, as opposed to $x^{i\theta}_{\xi} = 0$, in an optimal policy $x$. We can write these lengths of time as $l^{i\theta} = \frac{1}{p^{\theta}}\int_0^1 x^{i\theta}_{\xi}(w)\,d\xi$. The claim then tells us the chronological order of those intervals (or, for $\gamma = 1$, that the order does not matter). The problem of maximizing (13) subject to (14) over $I\cdot|\Theta|$ separate functions, $\{x^{i\theta}\}$, can now be reduced to a maximization over $I\cdot|\Theta|$ scalar variables. For each proposed $\{l^{i\theta}\}$, we can calculate the times $\xi$ during which $a^i$ is played (that is, $x^{i\theta}_{\xi} = p^{\theta}$ rather than 0) and then plug this into the objective function (13). The value of $L^{i\theta}$ as defined in (8) gives us the time $\xi \in [0,1)$ at which $a^i$ starts being played – that is, the action is played from $L^{i\theta}$ until $L^{i\theta}+l^{i\theta}$. If two actions $a^i$ and $a^j$ give an equal principal payoff $U_P$ at some state, I use the convention that actions are played in order of their index (the number $i$ or $j$). Because timing is irrelevant at $\gamma = 1$, we can without loss assign it the timing under $\gamma < 1$.

Finally, evaluating the integral in the objective (13) with $x^{i\theta}_{\xi}$ equal to $p^{\theta}$ for $\xi \in [L^{i\theta}, L^{i\theta}+l^{i\theta})$ and 0 otherwise yields the objective (6):

$$\int_0^1 \gamma(1-\xi)^{\gamma-1}x^{i\theta}_{\xi}U_P(a^i,\theta)\,d\xi = \int_{L^{i\theta}}^{L^{i\theta}+l^{i\theta}}\gamma(1-\xi)^{\gamma-1}p^{\theta}U_P(a^i,\theta)\,d\xi = -p^{\theta}U_P(a^i,\theta)\big[(1-\xi)^{\gamma}\big]_{L^{i\theta}}^{L^{i\theta}+l^{i\theta}} = p^{\theta}U_P(a^i,\theta)\big[(1-L^{i\theta})^{\gamma} - (1-L^{i\theta}-l^{i\theta})^{\gamma}\big]. \quad \Box$$
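To illustrate the reduction to scalar lengths, here is a small sketch of my own (not code from the paper) that takes proposed lengths $l^{i\theta}$, orders the intervals within each state in a way consistent with Claim 2 for $\gamma < 1$ (actions the principal likes less are played earlier), and evaluates the resulting objective term by term using the formula just derived.

```python
import numpy as np

def objective_from_lengths(l, U_P, p, gamma):
    """Evaluate the reduced objective from proposed lengths of time.

    l     : array (I, S); l[i, s] is the xi-time length on action a^i in state s,
            with the lengths in each column summing to 1
    U_P   : array (I, S) of principal payoffs U_P(a^i, theta_s)
    p     : array (S,) of state probabilities
    gamma : relative patience parameter, here taken to be < 1 (patient principal)
    """
    I, S = l.shape
    total = 0.0
    for s in range(S):
        # Claim 2 with gamma < 1: within a state, lower-payoff actions come first
        order = np.argsort(U_P[:, s])
        L = 0.0  # starting time of the next interval
        for i in order:
            total += p[s] * U_P[i, s] * ((1 - L) ** gamma
                                         - (1 - L - l[i, s]) ** gamma)
            L += l[i, s]
    return total
```

Maximizing this function over feasible lengths is the finite-dimensional problem to which (13)-(14) has been reduced.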
Proof of Claim 2. 1. I show the proof for the $\gamma < 1$ case, (a). The $\gamma > 1$ case, (b), proceeds similarly.

Suppose the conclusion of (a) is false. Then we can find $\tilde{\xi}$ and $\epsilon > 0$ with $\int_0^{\tilde{\xi}} x^{j\theta}_{\xi}(w)\,d\xi \ge \epsilon$ and $\int_{\tilde{\xi}}^1 x^{i\theta}_{\xi}(w)\,d\xi = \delta \ge \epsilon$. Consider switching a mass of $\epsilon$ of action $a^i$ after $\tilde{\xi}$ with a mass of $\epsilon$ of action $a^j$ before $\tilde{\xi}$, leaving $x^{k\theta}_{\xi}$ constant for all other $k$. Let $y_{\xi}$ be the old $x^{i\theta}_{\xi}$ minus the new $x^{i\theta}_{\xi}$ after the switch, or equivalently the new $x^{j\theta}_{\xi}$ minus the old $x^{j\theta}_{\xi}$. Then $\int_0^{\xi}y_s\,ds$ is a continuous function which is 0 at $\xi = 0$, decreases to $-\epsilon$ at $\tilde{\xi}$, and rises back to 0 at $\xi = 1$; it is weakly negative everywhere. The new utility after the replacement minus the old utility is

$$\big(U_P(a^j,\theta)-U_P(a^i,\theta)\big)\int_0^1 \gamma(1-\xi)^{\gamma-1}y_{\xi}\,d\xi \;=\; -\big(U_P(a^j,\theta)-U_P(a^i,\theta)\big)\int_0^1\Big(\int_0^{\xi}y_s\,ds\Big)\gamma(1-\gamma)(1-\xi)^{\gamma-2}\,d\xi \;>\; 0,$$
where the equality comes from integrating by parts. We have found a feasible improvement over the optimal policy, which gives the desired contradiction.

2. The $\gamma = 1$ case comes immediately from Equation (13). □

Proof of Lemma 8. For the first part, it suffices to show that $(1-\beta_P)Y_{prop} \to (1-\beta_P)Y^{*}_{CC}$ (omitting the $(n)$ superscripts). Given $X^{i\theta}_{*\zeta}$ maximizing (11) subject to (12), we can write $(1-\beta_P)Y^{*}_{CC}$ as

$$\sum_{i,\theta}\int_0^{\infty}-(\log\beta_P)\beta_P^{\zeta}X^{i\theta}_{*\zeta}U_P(a^i,\theta)\,d\zeta \;=\; -(\log\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}\int_0^1\beta_P^{\zeta}X^{i\theta}_{*(t-1+\zeta)}U_P(a^i,\theta)\,d\zeta.$$
Because $X^{i\theta}_{*\zeta}$ is a step function with finitely many steps (at most $I\cdot|\Theta|$), all but finitely many values of $t$ are unchanged in the right-most sum when we replace $X^{i\theta}_{*(t-1+\zeta)}$ with $X^{i\theta}_{*(t-1)}$. And because $-\log\beta_P$ goes to 0 as we take $\beta_P$ to 1, these finitely many terms contribute nothing to the right-hand side in the limit. So as $\beta_P \to 1$ we have $(1-\beta_P)Y^{*}_{CC}$ approaching
$$-(\log\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}\int_0^1\beta_P^{\zeta}X^{i\theta}_{*(t-1)}U_P(a^i,\theta)\,d\zeta \;=\; -(\log\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}X^{i\theta}_{*(t-1)}U_P(a^i,\theta)\int_0^1\beta_P^{\zeta}\,d\zeta$$
$$=\; -(\log\beta_P)\,\frac{1-\beta_P}{-\log\beta_P}\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}X^{i\theta}_{*(t-1)}U_P(a^i,\theta) \;=\; (1-\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}X^{i\theta}_{*(t-1)}U_P(a^i,\theta).$$
On the other hand, $(1-\beta_P)Y_{prop}$ is equal to

$$(1-\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}\,p^{\theta}\left(\frac{X^{i\theta}_{*(t-1)}}{p^{\theta}}U_P(a^i,\theta)\right) \;=\; (1-\beta_P)\sum_{i,\theta}\sum_{t=1}^{\infty}\beta_P^{t-1}X^{i\theta}_{*(t-1)}U_P(a^i,\theta),$$

where the $p^{\theta}$ on the left-hand side is the probability of state $\theta$ being realized in a given period, and the subsequent parenthetical is the payoff conditional on that realization. So we see that $(1-\beta_P)Y_{prop} \to (1-\beta_P)Y^{*}_{CC}$ as $\beta_P \to 1$.

Next, I want to show that $(1-\beta_A)$ times the expected discounted number of plays under $\pi_{prop}$ goes to $w$. Recall that by (12), $\sum_{\theta}\int_0^{\infty}\frac{-\log\beta_A}{1-\beta_A}\beta_A^{\zeta}X^{i\theta}_{*\zeta}\,d\zeta = W^{i}$, and so

$$\sum_{\theta}\int_0^{\infty}-(\log\beta_A)\beta_A^{\zeta}X^{i\theta}_{*\zeta}\,d\zeta = w^{i}.$$
We now proceed as above:

$$-(\log\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}\int_0^1\beta_A^{\zeta}X^{i\theta}_{*(t-1+\zeta)}\,d\zeta = w^{i}$$
$$\Longrightarrow\; -(\log\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}\int_0^1\beta_A^{\zeta}X^{i\theta}_{*(t-1)}\,d\zeta \to w^{i} \text{ as } \beta_A \to 1$$
$$\Longrightarrow\; -(\log\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}X^{i\theta}_{*(t-1)}\int_0^1\beta_A^{\zeta}\,d\zeta \to w^{i}$$
$$\Longrightarrow\; -(\log\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}X^{i\theta}_{*(t-1)}\,\frac{1-\beta_A}{-\log\beta_A} \to w^{i}$$
$$\Longrightarrow\; (1-\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}X^{i\theta}_{*(t-1)} \to w^{i}.$$

The left-hand side of this last line is equal to $(1-\beta_A)\sum_{\theta}\sum_{t=1}^{\infty}\beta_A^{t-1}p^{\theta}\cdot\frac{X^{i\theta}_{*(t-1)}}{p^{\theta}}$, which is exactly $(1-\beta_A)$ times the expected discounted number of plays of action $a^i$ under strategy $\pi_{prop}$. □
Proof of Proposition 2. Let $W^{1}_{c}(\theta)$ be some solution to the following maximization problem:

$$W^{1}_{c}(\theta) \in \underset{\hat{W}^{1}\in[0,\,\frac{1}{1-\beta_A}]}{\operatorname{argmax}}\; \beta_P Y^{*}(\hat{W}^{1}) - \beta_A u(\theta)\hat{W}^{1}. \qquad (16)$$

$Y^{*}$ is concave, so the maximizer $W^{1}_{c}(\theta)$ is weakly decreasing in $u(\theta)$. (In the case of strict concavity, the argmax would be unique. At differentiable points of $Y^{*}$, the derivative of $Y^{*}$ is $\frac{\beta_A}{\beta_P}u(\theta)$ at the argmax, while at kinks this is instead a supergradient.)

Given $W$ and $\theta$, the principal's problem of maximizing the RHS of (4) subject to (5) can be written as finding $\tilde{W}^{1}$ to maximize

$$\tilde{W}^{1} \in \underset{\hat{W}^{1}\in[\frac{W^{1}-1}{\beta_A},\,\frac{W^{1}}{\beta_A}]\cap[0,\,\frac{1}{1-\beta_A}]}{\operatorname{argmax}}\; (W^{1}-\beta_A\hat{W}^{1})u(\theta) + \beta_P Y^{*}(\hat{W}^{1}) + U_P(a^2,\theta), \qquad (17)$$

where $\hat{W}^{1} = \frac{W^{1}-\pi^{1}}{\beta_A}$. $W^{1}$ and $U_P(a^2,\theta)$ are constants in $\hat{W}^{1}$, so this problem is identical to (16) except for the constraints. There are three cases: either $W^{1}_{c}(\theta)$ is in the constraint set for (17), all points in the constraint set are less than $W^{1}_{c}(\theta)$, or all points are greater than $W^{1}_{c}(\theta)$. In the first case, $\tilde{W}^{1}$ is the unconstrained maximizer $W^{1}_{c}(\theta)$. In the latter two cases, by the concavity of the function $Y^{*}$, $\tilde{W}^{1}$ is as close as possible to $W^{1}_{c}(\theta)$ – the uppermost or lowermost point in the constraint set, respectively. □
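In implementation terms, Proposition 2 says that each period's update simply clamps the cutoff $W^{1}_{c}(\theta)$ to the feasible interval. A minimal sketch of my own (the cutoff function is a hypothetical input, not from the paper):

```python
def cutoff_step(W1, theta, beta_A, cutoff):
    """One period of the two-action cutoff contract.

    W1     : current weight on action a^1
    cutoff : function theta -> W_c^1(theta), a maximizer of (16)
    Returns (pi1, W1_next): the probability of playing a^1 this period and
    next period's weight.
    """
    lo = (W1 - 1.0) / beta_A                      # reached if a^1 is played for sure
    hi = W1 / beta_A                              # reached if a^1 is not played
    lo, hi = max(lo, 0.0), min(hi, 1.0 / (1.0 - beta_A))
    W1_next = min(max(cutoff(theta), lo), hi)     # clamp the cutoff to the constraint set
    pi1 = W1 - beta_A * W1_next                   # implied by W1_next = (W1 - pi1) / beta_A
    return pi1, W1_next
```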
Proof of Lemma 3. 1. Starting at a history with $0 < W^{1} < \frac{1}{1-\beta_A}$, if state $\theta$ is realized and $W^{1}_{c}(\theta) > 0$, then next period's weight will be $\min\{\frac{W^{1}}{\beta_A},\, W^{1}_{c}(\theta)\} > 0$. So if almost all cutoffs are greater than 0, then with probability 1 the weight $W^{1}$ will never hit 0. Likewise, if almost all cutoffs are less than $\frac{1}{1-\beta_A}$, then the weight will almost surely never hit $\frac{1}{1-\beta_A}$.
2. Suppose we begin with an interior weight $W^{1}$. We will reach a weight of 0 and use up action $a^1$ if the cutoff is 0 for the next $K^{1}(W^{1})$ periods, where $K^{1}(W^{1})$ is the smallest integer such that $\sum_{s=1}^{K^{1}(W^{1})}\beta_A^{s-1} \ge W^{1}$. Similarly, we will reach a weight of $\frac{1}{1-\beta_A}$ and use up $a^2$ if the cutoff is $\frac{1}{1-\beta_A}$ for the next $K^{2}(W^{1})$ periods, with $K^{2}(W^{1}) = K^{1}(\frac{1}{1-\beta_A}-W^{1})$.

Say that $W^{1}_{c}(\theta) = 0$ with probability $P > 0$ and $W^{1}_{c}(\theta) = \frac{1}{1-\beta_A}$ with probability $Q > 0$. From any starting weight, either a string of $K^{1}\big(\frac{1}{2(1-\beta_A)}\big)$ cutoffs of 0 in a row or a string of the same number of $\frac{1}{1-\beta_A}$ cutoffs in a row will use up an action. So there is a probability of at least $\min\{P^{K^{1}(\frac{1}{2(1-\beta_A)})},\, Q^{K^{1}(\frac{1}{2(1-\beta_A)})}\}$ that at the end of the next $K^{1}\big(\frac{1}{2(1-\beta_A)}\big)$ periods, one of the actions will be used up. Because this is an infinitely repeated game, some action will be used up in finite time almost surely. □

Proof of Lemma 4. I will prove everything for the $W^{1}_{c}(u(\theta)) = 0$ case. An identical argument would establish the results for $W^{1}_{c}(u(\theta)) = \frac{1}{1-\beta_A}$.

To determine whether $W^{1}_{c}(u(\theta)) = 0$, it suffices only to look at the right-derivative of $Y^{*}$ at 0. As discussed above, $Y^{*}$ has a supergradient of $\frac{\beta_A}{\beta_P}u(\theta)$ at $W^{1}_{c}(u(\theta))$. So

$$\frac{dY^{*}}{dW^{1}}(0^{+}) \begin{cases} < \frac{\beta_A}{\beta_P}\bar{u} & \Longrightarrow\; W^{1}_{c}(u(\theta)) = 0 \text{ if } u(\theta) \text{ is sufficiently close to } \bar{u} \\ = \frac{\beta_A}{\beta_P}\bar{u} & \Longrightarrow\; W^{1}_{c}(u(\theta)) = 0 \text{ if and only if } u(\theta) \ge \bar{u} \\ > \frac{\beta_A}{\beta_P}\bar{u} & \Longrightarrow\; W^{1}_{c}(u(\theta)) > 0 \text{ if } u(\theta) \le \bar{u}. \end{cases}$$

So to prove the result, it suffices to show that

$$\frac{dY^{*}}{dW^{1}}(0^{+}) \begin{cases} < \frac{\beta_A}{\beta_P}\bar{u} & \text{when } \beta_A > \beta_P \\ = \bar{u} & \text{when } \beta_A = \beta_P \\ > \frac{\beta_A}{\beta_P}\bar{u} & \text{when } \beta_A < \beta_P. \end{cases}$$

Step 1 (Upper bounds): Show that $\frac{dY^{*}}{dW^{1}}(0^{+}) \le \bar{u}$ for $\beta_A = \beta_P$, and $\frac{dY^{*}}{dW^{1}}(0^{+}) < \frac{\beta_A}{\beta_P}\bar{u}$ for $\beta_A > \beta_P$.

Case 1: Assumption 2 holds – states are finite, and $\theta$ is realized with probability $p^{\theta}$. When states are finite, consider the frequent-interaction limiting value $y^{*}_{\lim}$ defined as the maximization of (6) subject to (7) and (8) in Section 4.2. From Proposition 1, part 1, we know that $Y^{*}(W) \le \frac{y^{*}_{\lim}((1-\beta_A)W)}{1-\beta_P}$ for $w = (1-\beta_A)W$; and at $W^{1} = 0$ the expression holds with equality, because $Y^{*}(0) = \frac{\sum_{\theta}p^{\theta}U_P(a^2,\theta)}{1-\beta_P} = \frac{y^{*}_{\lim}(0)}{1-\beta_P}$. Therefore we have a bound on the derivative of $Y^{*}$ at 0: $\frac{dY^{*}}{dW^{1}}(0^{+}) \le \frac{1-\beta_A}{1-\beta_P}\frac{dy^{*}_{\lim}}{dw^{1}}(0^{+})$.

We can calculate $\frac{dy^{*}_{\lim}}{dw^{1}}(0^{+})$ explicitly. For $\beta_A \ge \beta_P$ (i.e., $\gamma \ge 1$) and $\bar{u} > 0$, the first incremental unit of $w^{1}$ is allocated to some state $\bar{\theta}$ for which $u(\bar{\theta}) = \bar{u}$, and is pushed towards the earliest possible time (that is, in the maximization problem defining $y^{*}_{\lim}$ we take $l^{1\theta}$ positive only for $\theta = \bar{\theta}$, and it holds that $L^{1\bar{\theta}} = 0$). Action $a^2$ is allocated to all other states and times. This gives, for $w^{1}$ near 0,

$$y^{*}_{\lim}(w^{1}) = \bar{u}\,p^{\bar{\theta}}\Big(1-\big(1-\tfrac{w^{1}}{p^{\bar{\theta}}}\big)^{\gamma}\Big) + \sum_{\theta}p^{\theta}U_P(a^2,\theta) \ \text{ if } \gamma > 1, \qquad y^{*}_{\lim}(w^{1}) = \bar{u}\,w^{1} + \sum_{\theta}p^{\theta}U_P(a^2,\theta) \ \text{ if } \gamma = 1.$$

Taking derivatives and plugging in $\gamma = \frac{\log\beta_P}{\log\beta_A}$,

$$\frac{dy^{*}_{\lim}}{dw^{1}}(0^{+}) = \begin{cases} \frac{\log\beta_P}{\log\beta_A}\bar{u} & \text{if } \beta_A > \beta_P \\ \bar{u} & \text{if } \beta_A = \beta_P \end{cases} \;\Longrightarrow\; \frac{dY^{*}}{dW^{1}}(0^{+}) \le \begin{cases} \frac{1-\beta_A}{1-\beta_P}\frac{\log\beta_P}{\log\beta_A}\bar{u} & \text{if } \beta_A > \beta_P \\ \bar{u} & \text{if } \beta_A = \beta_P. \end{cases} \qquad (18)$$

This yields the desired inequalities, noting that $\frac{1-\beta_A}{1-\beta_P}\frac{\log\beta_P}{\log\beta_A} < \frac{\beta_A}{\beta_P}$ when $\beta_A > \beta_P$.^{35}

Case 2: The state space is not finite. Consider some sequence of finite approximations of the probability distribution over $\Theta$, indexed by $k = 1, 2, \ldots$, with a corresponding distribution $F_k(\cdot)$ over $u(\cdot)$ values and a maximum $u(\cdot)$ value of $\bar{u}_k$. Say that under this approximation we have value function $Y^{*}_{k}$. As the finite approximation becomes arbitrarily fine and $F_k$ approaches $F$ in distribution, the corresponding value functions – and their derivatives – approach the original: $\frac{dY^{*}_{k}}{dW^{1}}(0^{+}) \to \frac{dY^{*}}{dW^{1}}(0^{+})$. But $\bar{u}_k \to \bar{u}$ as $k \to \infty$, and so the limit of $\frac{dY^{*}_{k}}{dW^{1}}(0^{+})$ is exactly as given by (18). Therefore the above results from Case 1 still hold.

Step 2 (Lower bounds): Show that $\frac{dY^{*}}{dW^{1}}(0^{+}) \ge \bar{u}$ for $\beta_A = \beta_P$, and $\frac{dY^{*}}{dW^{1}}(0^{+}) = \infty$ for $\beta_A < \beta_P$.

It suffices to construct some discounted quota contract whose value function has these right-derivatives at 0; the optimal contract has value pointwise greater than or equal to that of any other contract, and all contracts have identical values at $Y(0)$. So the optimal contract has at least as large a derivative at 0.^{36}

If $\beta_A = \beta_P$, fix $\hat{u} \in (\underline{u},\bar{u})$ and $\delta \in (0,1)$. Consider a contract of the cutoff form as in Proposition 2, with the following cutoffs: $W^{1}_{c}(u) = 0$ for $u > \hat{u}$, and $W^{1}_{c}(u) = \delta$ for $u \le \hat{u}$. Let $\hat{p} = \text{Prob}[u(\theta) > \hat{u}] \in (0,1)$. The value $Y(\delta)$ is finite. Consider $Y(\epsilon)-Y(0)$ for $\epsilon = \delta\beta_A^{t}$:^{37}

$$Y(\epsilon)-Y(0) \;\ge\; \sum_{s=1}^{t}\underbrace{\beta_P^{s-1}(1-\hat{p})^{s-1}\hat{p}}_{\text{if } u(\theta_l)\le\hat{u} \text{ in periods } l=1,\ldots,s-1 \text{ and } u(\theta_s)>\hat{u}}\cdot\frac{\epsilon}{\beta_A^{s-1}}\hat{u} \;+\; \underbrace{\beta_P^{t}(1-\hat{p})^{t}}_{\text{if } u(\theta_l)\le\hat{u} \text{ in periods } l=1,\ldots,t}\big(Y(\delta)-Y(0)\big).$$

(The inequality follows because we replace $u(\theta_s)$ with $\hat{u}$ in the first term.) Dividing through by $\epsilon$, plugging in $\beta_A = \beta_P$, and reducing the summation gives

$$\frac{Y(\epsilon)-Y(0)}{\epsilon} \;\ge\; \hat{u}\big(1-(1-\hat{p})^{t}\big) + (1-\hat{p})^{t}\,\frac{Y(\delta)-Y(0)}{\delta}.$$

Taking $\epsilon = \delta\beta_A^{t}$ to 0 by raising $t$ to infinity, this yields $\lim_{\epsilon\to 0}\frac{Y(\epsilon)-Y(0)}{\epsilon} \ge \hat{u}$. We can do this exercise for any $\hat{u} < \bar{u}$, implying that $\frac{dY^{*}}{dW^{1}}(0^{+}) \ge \bar{u}$.

If $\beta_A < \beta_P$, find $p > 1-\frac{\beta_A}{\beta_P}$ with $\int_{1-p}^{1}F^{-1}(q)\,dq > 0$. Let $\mu = \frac{1}{p}\int_{1-p}^{1}F^{-1}(q)\,dq$.

35. The inequality $\frac{1-\beta_A}{1-\beta_P}\frac{\log\beta_P}{\log\beta_A} < \frac{\beta_A}{\beta_P}$ is equivalent to $\frac{\beta_P\log\beta_P}{1-\beta_P} > \frac{\beta_A\log\beta_A}{1-\beta_A}$, so it suffices to show that $\frac{s\log s}{1-s}$ decreases in $s$ on $0 < s < 1$. This can easily be confirmed by taking the derivative and recalling that $\log s < s-1$ in this range.

36. Technically I will not show that the limit of the difference quotient exists, and so formally I will not have proven that $Y$ has a derivative at 0 – only that there is a sequence of $\epsilon$ going to 0 for which $(Y(\epsilon)-Y(0))/\epsilon$ approaches the appropriate value. But this is sufficient to provide a lower bound on $\frac{dY^{*}}{dW^{1}}(0^{+})$, which does exist by concavity.

37. Subtracting out $Y(0)$ from each $Y$ value is akin to normalizing the payoff from $a^1$ in state $\theta$ to $u(\theta)$, and the payoff from $a^2$ to 0.
I want to define an indicator variable $\chi_t$ that is 1 when $\theta_t$ is in the top $p$ fraction of the distribution. When the distribution of states is continuous this is straightforward: take $\chi_t = 1$ if $F(u(\theta_t)) \ge 1-p$, and $\chi_t = 0$ if $F(u(\theta_t)) < 1-p$. If states are finite, or if there are atoms of probability mass on states, I may have to randomize conditional on $\theta_t$. In this case, if $F(u(\theta_t)) \ge 1-p$ then let $\chi_t = 1$ with probability 1, and if $F(u(\theta_t)) < 1-p$ but $\lim_{u\to u(\theta_t)^{+}}F(u) \ge 1-p$ then let $\chi_t = 1$ with probability $\frac{\lim_{u\to u(\theta_t)^{+}}F(u)-(1-p)}{\lim_{u\to u(\theta_t)^{+}}F(u)-F(u(\theta_t))}$. Otherwise let $\chi_t = 0$. So in any period, $\chi_t$ is 1 with probability $p$ and 0 with probability $1-p$. Conditional on $\chi_t = 1$, $u(\theta_t)$ has mean $\mu$.

Now take any integer $r \ge 1$ and a value $\delta_2 \in \big(0,\,\min\{1,\,\frac{1}{1-\beta_A}-1\}\big]$. These parameters $r$ and $\delta_2$ will define a contract. Let $\delta_1 = \beta_A^{r}\delta_2$.

For $W^{1} < \delta_1$, the contract specifies that action $a^2$ is taken for sure and next period's weight goes to $\frac{W^{1}}{\beta_A}$. For $\delta_1 \le W^{1} < \delta_2$, we look at the $\chi$ variable. If $\chi = 1$ in this period, then action $a^1$ is taken with probability $W^{1}$ and next period's weight goes to $\hat{W}^{1} = 0$; if $\chi = 0$, action $a^2$ is taken and next period's weight becomes $\frac{W^{1}}{\beta_A}$. For $W^{1} \ge \delta_2$, let the contract be arbitrary.

Now, solve for $Y(\epsilon)-Y(0)$ with $\epsilon = \delta_1\beta_A^{s} = \delta_2\beta_A^{s+r}$. Without loss of generality, say that we start at period $t = 1$. With initial weight $W^{1} = \epsilon$, we take action $a^2$ in periods $t = 1$ through $t = s$; then for the next $r$ periods we use up $a^1$ in the case of $\chi = 1$, and otherwise take $a^2$:

$$Y(\epsilon)-Y(0) \;=\; \sum_{t=1}^{r}\underbrace{\beta_P^{s+t-1}(1-p)^{t-1}p}_{\text{if } \chi_l=0 \text{ in periods } l=s+1,\ldots,s+t-1 \text{ and } \chi_{s+t}=1}\cdot\frac{\epsilon}{\beta_A^{s+t-1}}\mu \;+\; \underbrace{\beta_P^{s+r}(1-p)^{r}}_{\text{if } \chi_l=0 \text{ in periods } l=s+1,\ldots,s+r}\big(Y(\delta_2)-Y(0)\big)$$
$$=\; \epsilon\left(\frac{\beta_P}{\beta_A}\right)^{s}\left[\,\sum_{t=1}^{r}\left(\frac{(1-p)\beta_P}{\beta_A}\right)^{t-1}p\mu \;+\; \left(\frac{(1-p)\beta_P}{\beta_A}\right)^{r}\frac{Y(\delta_2)-Y(0)}{\delta_2}\right].$$

Dividing through by $\epsilon$ and noting that $\frac{(1-p)\beta_P}{\beta_A} < 1$ because $p > 1-\frac{\beta_A}{\beta_P}$, this gives us

$$\frac{Y(\epsilon)-Y(0)}{\epsilon} \;=\; \left(\frac{\beta_P}{\beta_A}\right)^{s}\left[\,\frac{1-\big(\frac{(1-p)\beta_P}{\beta_A}\big)^{r}}{1-\frac{(1-p)\beta_P}{\beta_A}}\,p\mu \;+\; \left(\frac{(1-p)\beta_P}{\beta_A}\right)^{r}\frac{Y(\delta_2)-Y(0)}{\delta_2}\right]. \qquad (19)$$

For any fixed $\delta_2$, there exists $r$ large enough so that the bracketed expression in (19) is positive; we can see this because the limit of the bracketed expression as $r \to \infty$ is $\frac{p\mu}{1-\frac{(1-p)\beta_P}{\beta_A}} > 0$. So, find such an $r$ and $\delta_2$ for which the bracketed expression is positive; these two parameters define a contract.

Now we can take $\epsilon$ to 0 by increasing $s$ to infinity. Because $\frac{\beta_P}{\beta_A} > 1$, this takes the right-hand side of (19) to infinity. This is a lower bound on the derivative of $Y^{*}$ at 0, so $\frac{dY^{*}}{dW^{1}}(0^{+})$ must be positive infinity. □

Proof of Proposition 3. Follows from Lemmas 3 and 4. □
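To see the divergence in (19) numerically, one can evaluate its right-hand side for illustrative parameter values; the term $(Y(\delta_2)-Y(0))/\delta_2$ is replaced by an arbitrary finite constant, which is all the argument requires. This is purely an illustration, not part of the proof.

```python
import numpy as np

beta_A, beta_P = 0.90, 0.95     # agent less patient than the principal
p, mu = 0.5, 1.0                # any p > 1 - beta_A / beta_P and mu > 0 will do
Y_term = -10.0                  # stands in for (Y(delta_2) - Y(0)) / delta_2
rho = (1 - p) * beta_P / beta_A # < 1 because p > 1 - beta_A / beta_P

r = 50                          # r chosen large enough that the bracket is positive
bracket = (1 - rho ** r) / (1 - rho) * p * mu + rho ** r * Y_term
assert bracket > 0

for s in (10, 50, 100):
    print(s, (beta_P / beta_A) ** s * bracket)  # the RHS of (19): grows without bound in s
```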
Appendix B. Optimal menus

In this appendix I discuss the principal's ex ante problem of setting an optimal discounted quota, or an optimal menu of discounted quotas, given the agent's participation constraint. Suppose here that players have quasilinear utility in money, discounted at the same market rate by both players. Then the realized monetary benefit to the agent over the lifetime of any accepted contract is exactly equal to the principal's monetary cost. Moreover, this transfer payment is entirely determined by the agent's initial choice of a discounted quota from a menu. Once the agent has accepted a contract and made such a choice, the payment is independent of all future play. So a menu of quotas consists of a menu of weight and transfer pairs.

Specifically, let the number of agent-discounted plays over the lifetime of the game be denoted by $M \equiv \sum_{t=1}^{\bar{t}}\beta_A^{t-1}$. We can write a menu of discounted quotas as a menu of pairs $(W,T)$, where $W$ is the initial weight vector in $M(A)$ and $T$ is the payment from the principal to the agent. The agent either picks a pair $(W,T)$ from the menu or rejects the contract. For each weight vector $W$, there can only be a single transfer $T$ that the agent might choose: the highest available one. Let $T(W)$ denote the transfer that would be selected for each available weight $W$. If a contract is rejected, both players get a payoff normalized to 0. If the agent accepts and chooses a pair $(W,T)$, she gets a lifetime expected utility of

$$W \cdot U_A + T, \qquad (20)$$

with $W \cdot U_A$ indicating the dot product of two vectors. Once $(W,T)$ has been selected, the principal's expected utility going forward can be written as

$$Y^{*}(W) - T, \qquad (21)$$

where $Y^{*}(W)$ denotes the principal-optimal action payoff starting from the initial period with weights $W$.^{38} In this appendix the principal takes as given the function $Y^{*}$ as well as some beliefs over the agent's utility type $U_A$. He then chooses a menu of $(W,T)$ pairs to maximize his expectation of (21) times the probability of acceptance, subject to the agent's acceptance decision and her choice of $(W,T)$ to maximize (20) conditional on her type.

It generally holds that $Y^{*}$ is weakly concave in $W$, by the same argument as in the proof of Lemma 2. A further description of the function $Y^{*}(W)$ would depend on the primitives of the problem: the number of periods $\bar{t}$, the principal and agent discount factors, the joint distribution of states over time, and the principal's utility function $U_P(a,\theta)$. (Section 4 shows how to solve for $Y^{*}(W)$ in an infinitely repeated iid environment.) Even so, the solution for $Y^{*}$ is separable from the solution for the optimal menu of discounted quotas given $Y^{*}$. In other words, the solution to the dynamic moral hazard problem of getting the agent to select appropriate actions is separable from the static screening problem over agent types. Moral hazard is addressed by the use of discounted quotas combined with principal-optimal strategies. This appendix considers the screening problem.

For simplicity, in the following two subsections I suppose that players have access to unrestricted transfers. That is, I make no assumption of limited liability.

38. Following Theorem 1, the optimal contract involves the agent playing a principal-optimal strategy. Note that I am not assuming that we must be in the kind of recursive environment studied in Section 4. The function $Y^{*}$ indicates the principal-optimal action payoff starting at the initial period, which may differ from such a function starting at later histories.
B.1. A single type $U_A$

To illustrate the simplest trade-offs that emerge in the solution for an optimal menu of discounted quotas, I now look at the case where the agent's type is known to be $U_A$.^{39} Here, the "menu" will be a single point $(W,T)$.

If the agent is to participate, joint surplus across players adds up to $Y^{*}(W) + U_A \cdot W$. Let $W^{*}$ be a weight vector maximizing joint surplus. $W^{*}$ satisfies the first order condition $\frac{\partial Y^{*}}{\partial W^{i}}(W^{*}) = -U_A^{i}$ (interpreting the derivatives as supergradients if necessary). If $Y^{*}(W^{*}) + U_A \cdot W^{*} \le 0$, then no mutually beneficial contract can be found – the principal does best to make an offer that will be rejected. If $Y^{*}(W^{*}) + U_A \cdot W^{*} > 0$, then there are gains to be had from contracting. In that case, the principal-optimal contract $(W,T)$ sets $W = W^{*}$ and $T = -U_A \cdot W^{*}$. This contract maximizes total surplus, then sets a payment to make the agent exactly indifferent to participating. The agent is left with utility of 0, and the principal gets all of the surplus.

We see that the principal does not necessarily start the contract at the weight which maximizes $Y^{*}$. Instead, the starting weight is pushed in the direction of the agent's preferences so that the principal can get the agent's participation more cheaply.

39. In keeping with the assumption of a nondegenerate type distribution in Section 3, we can think of this exercise as solving for the optimal contract – which is a menu of discounted quotas – in a limit where the support of the agent's type becomes concentrated at a single value of $U_A$. If the agent's type were exactly known to be $U_A$, however, then discounted quotas might not be a feature of the optimal contract.

B.2. The general problem

Now let there be a distribution over agent types $U_A$. A first observation is that the transfer function $T(W)$ can without loss of generality be taken to be weakly concave over the set of available weights $W$. This result follows because the agent has linear payoffs in each $W^{i}$ term. For any vector $U_A$, the maximizer of $W \cdot U_A + T(W)$ is on the "concavification" of that objective function – the smallest concave function everywhere weakly above it – over the domain of available $W$. And a point is on the concavification of $W \cdot U_A + T(W)$ if and only if it is on the concavification of $T(W)$, since the addition of linear terms does not affect concavity. So we can remove extraneous $(W,T(W))$ pairs from the menu at points $W$ where $T(W)$ is below its concavification.

Moreover, suppose that agent types $U_A \in \mathbb{R}^{I}$ are drawn from a distribution with compact support. Then, given any menu $(W,T(W))$, we can in fact extend the set of available weights to the full domain $W \in M(A)$, and likewise extend $T(W)$ to be a weakly concave function over the full domain, without affecting agent behavior. First, by compactness, we can find a sufficiently low transfer $\underline{T}$ such that if any new weight $W$ were offered with transfer $T(W) = \underline{T}$, it would not be chosen by any type $U_A$. Let $\hat{T}(W)$ be equal to $T(W)$ for every $W$ in the original menu, and $\hat{T}(W) = \underline{T}$ at all other $W$. Now extend $T(W)$ to the full domain $M(A)$ by taking it as the concavification of $\hat{T}(W)$. Allowing an agent to pick any $W \in M(A)$ at these extended transfers does not change any agent type's problem of choosing to either reject the contract, or to accept it and select an optimal $(W,T(W))$ from the original menu.^{40}
40. For every point with $T(W) > \hat{T}(W)$, it holds that $(W,T(W))$ is the convex combination of two points $(W',T(W'))$ for which $T(W') = \hat{T}(W')$. And when a convex combination of two points $(W',T(W'))$ and $(W'',T(W''))$ is added to the menu, the argmax of (20) will always include one or both of the edges. So for every agent type $U_A$, the optimizing choice of $W$ will remain optimal after $T(W)$ is extended to the full domain in this manner. Indeed, as in Lemma 1, a nondegenerate distribution of agent types implies that even after this extension agents almost surely have a unique optimizer. So with probability one this extension adds no new options which the agent is even indifferent to choosing.
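As a concrete illustration of the concavification step described in B.2 – a sketch of my own, on a one-dimensional grid of weights with hypothetical menu data; the paper's weights are vectors, but the envelope construction is the same idea – the extended transfer schedule is the upper concave envelope of $\hat{T}$:

```python
import numpy as np

def upper_concave_envelope(W_grid, T_hat):
    """Smallest concave function weakly above T_hat on the sorted grid W_grid."""
    hull = []  # vertices of the upper hull, scanned left to right
    for w, t in zip(W_grid, T_hat):
        while len(hull) >= 2:
            (w1, t1), (w2, t2) = hull[-2], hull[-1]
            # drop the middle vertex if it lies weakly below the chord from (w1,t1) to (w,t)
            if (t2 - t1) * (w - w1) <= (t - t1) * (w2 - w1):
                hull.pop()
            else:
                break
        hull.append((w, t))
    xs, ys = zip(*hull)
    return np.interp(W_grid, xs, ys)

# Hypothetical menu: a few offered (weight, transfer) pairs, and a very low
# transfer T_underbar everywhere else
W_grid = np.linspace(0.0, 10.0, 101)
T_hat = np.full_like(W_grid, -100.0)
for w, t in [(2.0, 1.0), (5.0, 3.0), (8.0, 2.0)]:
    T_hat[np.argmin(np.abs(W_grid - w))] = t

T_extended = upper_concave_envelope(W_grid, T_hat)  # weakly concave and >= T_hat
```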
In essence, then, the problem of screening across agent types reduces to the choice of a weakly concave transfer function $T(W)$ over the set of all possible weights. I leave a more explicit characterization of the properties of optimal menus for future work.

Appendix C. Alignment under generalized quadratic loss utilities

Proposition 4. Under generalized quadratic loss utilities, payoffs are strictly aligned at discount factor $\beta$.

Proof of Proposition 4. As a preliminary step, let us define some notation. At the start of period $t$, there is a mass of $M_t \equiv \sum_{s=t}^{\bar{t}}\beta^{s-t}$ weighted periods remaining. Order actions $a^1 < a^2 < \cdots < a^I$. A discounted quota places some weight $W_t^{i}$ on action $a^i$ at time $t$, and it holds that $\sum_i W_t^{i} = M_t$. Given weights $W_t$ at time $t$, for any $q \in [0,M_t)$ we can define the "$q$th action" $\hat{a}_t(q)$ as the action $a^i$ which satisfies $\sum_{j<i}W_t^{j} \le q < \sum_{j\le i}W_t^{j}$. Let $\bar{\theta}_t^{i}$ denote the expectation, at time $t$ after observing $\theta_t$, of the discounted average state over periods $s \ge t$ at which $a^i$ will be played. (If $W_t^{i} = 0$, define $\bar{\theta}_t^{i} \equiv 0$.) We can then write player $J$'s objective under a discounted quota, going forward from time $t$, as

$$\sum_i W_t^{i}\,c_J(a^i)\,\bar{\theta}_t^{i}. \qquad (22)$$

One's lifetime utility over action payoffs is equal to (22), plus terms which do not depend on the strategy. I will now show the result for $\bar{t}$ finite, and then extend to the infinite case (with $\beta < 1$). For the finite case, the argument is an extension of that in Appendix D.1 of Frankel (2014) to allow for discounting; see that paper for a more thorough interpretation of the proposed "sequential-assortative" strategy.

Finite $\bar{t}$. Consider a discounted quota at period $t$ with past and current states $\theta^{t}$. Define a measure $\nu_t(\cdot\,;\theta^{t})$ of mass $M_t$ over $\mathbb{R}$ in the following manner, given by its cumulative measure function. At $t = \bar{t}$, in which case $M_t = 1$, $\nu_t$ places a mass of one on $\theta_t$:
$$\nu_{\bar{t}}(\theta;\theta^{\bar{t}}) = \begin{cases} 0 & \text{if } \theta < \theta_{\bar{t}} \\ 1 & \text{if } \theta \ge \theta_{\bar{t}}. \end{cases}$$

At $t < \bar{t}$, define $\nu_t$ given $\nu_{t+1}$ by backwards induction:

$$\nu_t(\theta;\theta^{t}) = \begin{cases} \beta E_{\theta_{t+1}}[\nu_{t+1}(\theta \mid \theta^{t+1})] & \text{if } \theta < \theta_t \\ 1+\beta E_{\theta_{t+1}}[\nu_{t+1}(\theta \mid \theta^{t+1})] & \text{if } \theta \ge \theta_t, \end{cases}$$

where at time $t$ it holds that $\theta_{t+1}$, and thus $\theta^{t+1} = (\theta_1,\ldots,\theta_t,\theta_{t+1})$, is uncertain. Each expectation is taken over the realization of $\theta_{t+1}$, given the knowledge of $\theta^{t} = (\theta_1,\ldots,\theta_t)$.

Now, slightly tweaking the notation of the proof of Lemma 7, part 2 in Frankel (2014) (see Appendix D.1 of that paper), let $\tilde{\theta}_t(q;\theta^{t})$ for $q \in [0,M_t)$ be the inverse function of the cumulative measure function $\nu_t$:

$$\tilde{\theta}_t(q;\theta^{t}) \equiv \sup\{\theta \in \mathbb{R} \mid \nu_t(\theta;\theta^{t}) \le q\}.$$

The interpretation is that $\tilde{\theta}_t(q;\theta^{t})$ tells us the state at the $q$th quantile of $\nu_t$, for $q \in [0,M_t)$. Let $q^{*} = \min\{q \mid \tilde{\theta}_t(q;\theta^{t}) = \theta_t\}$. This value exists, and is at most $M_t-1$, because $\nu_t$ places a mass of at least one on $\theta_t$. (That is, there is an interval of $q$'s of length at least 1 for which $\tilde{\theta}_t(q;\theta^{t}) = \theta_t$.) One can think of the current state $\theta_t$ as being represented by the interval of quantiles $q^{*}$ to $q^{*}+1$ – it holds that $\tilde{\theta}_t(q;\theta^{t}) = \theta_t$ for all of the $q$'s in this range. Then, following the logic of the proof of Lemma 7, part 2 of Frankel (2014), a sequential-assortative strategy^{41} plays the range of actions $\hat{a}_t(q)$ over $q$ in the interval $[q^{*},q^{*}+1)$:

$$\pi^{i} = \int_{q^{*}}^{q^{*}+1}\mathbf{1}\{\hat{a}_t(q) = a^i\}\,dq.$$
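In pseudo-implementation terms (my own sketch, not code from the paper), the period-$t$ mixed action just measures how much of the unit quantile interval $[q^{*},q^{*}+1)$ falls into each action's cell of the cumulative-weight partition:

```python
import numpy as np

def sequential_assortative_mix(W_t, q_star):
    """Action probabilities this period under the sequential-assortative rule.

    W_t    : array of current weights W_t^i, in the order a^1 < a^2 < ... < a^I
    q_star : left endpoint of the unit quantile interval representing theta_t
    Action a^i occupies the cell [sum_{j<i} W_t^j, sum_{j<=i} W_t^j) of [0, M_t);
    pi^i is the length of that cell's overlap with [q_star, q_star + 1).
    """
    cum = np.concatenate(([0.0], np.cumsum(W_t)))
    lo, hi = q_star, q_star + 1.0
    pi = np.maximum(np.minimum(cum[1:], hi) - np.maximum(cum[:-1], lo), 0.0)
    return pi  # sums to 1 whenever q_star + 1 <= M_t
```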
This is the exact analog of the strategy in Frankel (2014), modulo the fact that we now treat future states as a measure with mass $M_t-1$ rather than as $\bar{t}-t$ discrete events. By backwards induction, this can be shown to be an optimal strategy following exactly the proof of Lemma 7, part 2 of Frankel (2014); see that paper for details. The intuition is that, if this strategy is to be played in the future, then the expected payoff going forward from period $t+1$ will be equal to that from assortatively matching the remaining actions to the measure of states $\nu_{t+1}$ (up to a constant, as in the normalization following (22)). Given that, it is optimal to play actions which assortatively match the quantile interval of the current state ($q^{*}$ to $q^{*}+1$ under $\tilde{\theta}_t$, induced by $\nu_t$) to the quantile interval of remaining actions today ($q^{*}$ to $q^{*}+1$ under $\hat{a}_t$).

The above establishes that there is an optimal strategy for the agent which does not depend on $c_A$. Hence it is optimal for the principal as well, since it would be optimal for $c_A = c_P$. I seek now to establish strictness: I need to show that all optimal strategies of the agent are optimal for the principal. Let $\bar{\theta}_t^{i*}$ be the average state at which action $a^i$ is played under the sequential-assortative strategy:

$$\bar{\theta}_t^{i*} = \frac{1}{W_t^{i}}\int_0^{M_t}\mathbf{1}\{\hat{a}_t(q) = a^i\}\,\tilde{\theta}_t(q)\,dq.$$

41. The "remaining distribution" of actions in that paper exactly corresponds to the weights in this paper.
I claim now that all optimal strategies for an agent with strictly increasing $c_A$ must yield the same average states $\bar{\theta}_t^{i}$ as does this strategy. If so, then the principal is indifferent across all agent-optimal strategies, so all optimal strategies for the agent are indeed optimal for the principal.

Suppose not: suppose there is some alternative strategy which yields $\bar{\theta}_t^{i} \ne \bar{\theta}_t^{i*}$ for some $i$, but for which $\sum_i W_t^{i}c_A(a^i)\bar{\theta}_t^{i} = \sum_i W_t^{i}c_A(a^i)\bar{\theta}_t^{i*}$.^{42} Then find $i$ for which $W_t^{i} > 0$ and for which $\bar{\theta}_t^{i} \ne \bar{\theta}_t^{i*}$ – without loss, suppose $\bar{\theta}_t^{i} > \bar{\theta}_t^{i*}$. Then there is another strictly increasing function defining a possible agent type, $c_A'$, for which $c_A'(a) = c_A(a)$ at all $a \ne a^i$, and for which $c_A'(a^i) > c_A(a^i)$; we simply add a small enough $\epsilon$ to $c_A$ at $a^i$. (Because $c_A$ is strictly increasing, such an addition can always be done while maintaining $c_A'$ as increasing.) Clearly an agent of type $c_A'$ strictly prefers the alternative strategy to the sequential-assortative one, which contradicts the sequential-assortative one's being optimal.

42. Recall that $\bar{\theta}_t^{i}$ is defined as 0 in the case of $W_t^{i} = 0$, and so if the $\bar{\theta}$'s disagree then they must do so on an $i$ for which $W_t^{i} > 0$.

Infinite $\bar{t}$. For any game with infinite $\bar{t}$ and $\beta < 1$, it holds that for any $\epsilon > 0$ we can find a large enough $\hat{t}$ such that the payoffs over the whole game ($\sum_{t=1}^{\infty}\beta^{t-1}U_J(a_t,\theta_t)$) are within $\epsilon$ of the payoffs accumulated in the first $\hat{t}$ periods ($\sum_{t=1}^{\hat{t}}\beta^{t-1}U_J(a_t,\theta_t)$). Moreover, payoffs are also continuous in the weights, so payoffs are within $\epsilon$ of those of a discounted quota with $\hat{t}$ periods and with starting weights adjusted by less than $\sum_{t>\hat{t}}\beta^{t-1}$ so that the weights add up to $\sum_{t=1}^{\hat{t}}\beta^{t-1}$. Along a sequence of these discounted quotas with $\hat{t} \to \infty$, the above argument gives us a sequence of strategies which are independent of $c_J$, and which approach the optimal payoff in the limit game for any increasing $c_J$. Moreover, the same argument as above shows that the limiting $\bar{\theta}^{i}$'s must also be unique across optimal strategies, yielding strict alignment. □

References

Amador, Manuel, Werning, Ivan, Angeletos, George-Marios, 2003. Commitment vs. flexibility. National Bureau of Economic Research.
Athey, Susan, Atkeson, Andrew, Kehoe, Patrick J., 2005. The optimal degree of discretion in monetary policy. Econometrica 73 (5), 1431–1475.
Baker, George, Gibbons, Robert, Murphy, Kevin J., 2002. Relational contracts and the theory of the firm. Q. J. Econ. 117 (1), 39–84.
Casella, Alessandra, 2005. Storable votes. Games Econ. Behav. 51, 391–419.
Chakraborty, Archishman, Harbaugh, Rick, 2007. Comparative cheap talk. J. Econ. Theory 132 (1), 70–94.
Chakraborty, Archishman, Harbaugh, Rick, 2010. Persuasion by cheap talk. Am. Econ. Rev. 100 (5), 2361–2382.
Dulleck, Uwe, Kerschbamer, Rudolf, 2006. On doctors, mechanics and computer specialists: the economics of credence goods. J. Econ. Lit. 44, 5–42.
Escobar, Juan, Toikka, Juuso, 2013. Efficiency in games with Markovian private information. Econometrica 81, 1887–1934.
Frankel, Alexander, 2014. Aligned delegation. Am. Econ. Rev. 104 (1), 66–83.
Frankel, Alexander, forthcoming. Delegating multiple decisions. AEJ: Microeconomics. https://www.aeaweb.org/articles?id=10.1257/mic.2013-0270.
Frankel, Alexander, Schwarz, Michael, 2014. Experts and their records. Econ. Inq. 52 (1), 56–71.
Guo, Yingni, Hörner, Johannes, 2015. Dynamic mechanisms without money. Working paper.
Holmström, Bengt Robert, 1977. On incentives and control in organizations. Stanford University.
Jackson, Matthew O., Sonnenschein, Hugo, 2004. Overcoming incentive constraints by linking decisions. (Extended version).
Jackson, Matthew O., Sonnenschein, Hugo, 2007. Overcoming incentive constraints. Econometrica 75 (1), 241–257.
Levin, Jonathan, 2003. Relational incentive contracts. Am. Econ. Rev. 93 (3), 835–857.
Lipnowski, Elliot, Ramos, Joao, 2015. Repeated delegation. Working paper.
MacLeod, W. Bentley, 2003. Optimal contracting with subjective evaluation. Am. Econ. Rev. 93 (1), 216–240.
Milgrom, Paul, Segal, Ilya, 2002. Envelope theorems for arbitrary choice sets. Econometrica 70 (2), 583–601.
Myerson, Roger B., 1986. Multistage games with communication. Econometrica 54 (2), 323–358.
Pavan, Alessandro, Segal, Ilya, Toikka, Juuso, 2014. Dynamic mechanism design: a Myersonian approach. Econometrica 82, 601–653.
Pearce, David G., Stacchetti, Ennio, 1998. The interaction of implicit and explicit contracts in repeated agency. Games Econ. Behav. 23, 75–96.
Skrzypacz, Andrzej, Hopenhayn, Hugo, 2004. Tacit collusion in repeated auctions. J. Econ. Theory 114, 153–169.
Spear, Stephen E., Srivastava, Sanjay, 1987. On repeated moral hazard with discounting. Rev. Econ. Stud. 54 (4), 599–617.
Stokey, Nancy L., Lucas, Robert E., 1989. Recursive Methods in Economic Dynamics. Harvard University Press.
Townsend, Robert M., 1982. Optimal multiperiod contracts and the gain from enduring relationships under private information. J. Polit. Econ. 90 (6), 1166–1186.