Dynamic agency with persistent observable shocks




Accepted Manuscript

Rui Li

PII: S0304-4068(17)30068-X
DOI: http://dx.doi.org/10.1016/j.jmateco.2017.04.003
Reference: MATECO 2153

To appear in: Journal of Mathematical Economics

Received date: 8 December 2014
Revised date: 22 April 2017
Accepted date: 23 April 2017

Please cite this article as: Li, R., Dynamic agency with persistent observable shocks. Journal of Mathematical Economics (2017), http://dx.doi.org/10.1016/j.jmateco.2017.04.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Dynamic Agency with Persistent Observable Shocks

Rui Li∗
College of Management
University of Massachusetts Boston

Abstract

This paper studies a continuous-time hidden-action model with persistent observable shocks. In this model, I develop a method to characterize the optimal contract with history-dependent effort exertion and shirking decisions. Temporal shirking is always optimal after some histories as long as a positive persistent shock is expected. As a result, my model gives rise to a mechanism through which the moral hazard problem amplifies macroeconomic fluctuations. I also show the pattern of the agent’s utility adjustments with respect to persistent shocks and its implications for compensation design.

Keywords: Dynamic contract, persistent shock, business cycle.

1 Introduction

Firms are subject to large, unpredictable shocks that are beyond their control, are publicly observable, and have persistent impacts on their profitability. For example, a natural disaster can destroy a firm’s production facilities, requiring time-consuming rebuilding, or a technological innovation can permanently improve a firm’s productivity. In particular, most macroeconomic shocks are observable or can be inferred from publicly available aggregate economic indexes, and they are persistent, as shown by the real business cycle literature. Therefore, if information frictions exist, understanding incentive provision with persistent observable shocks can have profound implications for many important economic issues, especially in macroeconomics. According to the models of Holmström [1979, 1982], these shocks should be irrelevant to contract design because tying the agent’s utility to shocks beyond his control does not provide additional incentives but only makes the contract riskier. In a dynamic model, however, the persistent observable shocks affect the principal’s marginal cost of providing the agent with utility. His utility level then needs to be adjusted upon each shock to keep this marginal cost unchanged for efficiency.¹ Consequently, the optimal incentive provision and effort exertion depend on the history of the shocks.

∗ 100 Morrissey Boulevard, Boston, MA 02125. Email: [email protected]. TEL: 1-617-287-3182.

¹ The intuition is that the agent’s utility adjustments with respect to the observable shocks do not provide additional incentives but only serve to smooth the marginal cost of utility provision upon the state transitions. Obviously, if the observable shocks were independent and identically distributed (i.i.d.) across time, optimal risk sharing would imply that the agent’s continuation utility should be independent of observable shocks.

In this paper, I introduce persistent observable shocks into a dynamic version of the model introduced by Holmström [1979] to investigate how these shocks affect contract design. Specifically, the model is based on Sannikov [2008], in which a risk-neutral principal delegates her firm to a risk-averse agent over a long time horizon, with the agent exerting costly effort to increase the firm’s income at each instant in time. Moral hazard arises because the principal is not able to distinguish the agent’s effort from unobservable productivity shocks that are i.i.d. across time. The firm’s profitability is further subject to persistent observable shocks, which are characterized by Markov transitions between a high-productivity state and a low-productivity one. Under this framework, this paper develops a method to characterize the optimal contract, which enables us to study the optimal effort exertion conditional on the history. Notice that DeMarzo et al. [2012] and Hoffmann and Pfeil [2010] have studied similar models with a risk-neutral agent. Unlike my paper, they consider only contracts under which the highest level of effort is always enforced, so that effort exertion is restricted to be independent of the history.²

² Their models are based on the hidden-action model of DeMarzo and Sannikov [2006]. As these authors point out, assuming away shirking may be suboptimal: “Because the reduction in cash flows due to shirking is bounded – unlike the case of diversion – it may be optimal to stop providing incentives and to allow the agent to shirk after some histories.” Piskorski and Tchistyi [2010, 2011] study a cash-diversion (also called hidden-information) model similar to DeMarzo et al. [2012] and Hoffmann and Pfeil [2010]. The key difference is that in a cash-diversion model, incentive compatibility is all about “truth telling,” so that, given incentive compatibility, the agent’s hidden behavior in their models is independent of the history.

Under the optimal contract in my model, the agent is always allowed to shirk temporally in the bad state after some histories. To understand the intuition, notice that the incentive provision has to be terminated after a sequence of rewards or punishments because of the income effect and limited liability.³ In the bad state, terminating the incentive provision means that the firm is going to lose profitable opportunities in the future when the situation improves. Hence, if replacing the agent is costly, it may be optimal to suspend the incentive provision temporally to prevent termination and resume it later when the firm becomes more productive. For the same reason, symmetrically, in the good state, the principal would never allow the agent to shirk temporally and revert to working later. Here, the key trade-off is between the incentives provided in the two different states across time. A comparative statics analysis shows a “substitution effect”; that is, effort exertion in the bad state decreases with productivity in the good state. This procyclical pattern of effort gives rise to a mechanism amplifying business cycle shocks, which is distinct from the mechanisms based on various agency frictions proposed in a vast literature.⁴

³ In this model, the principal rewards and punishes the agent by changing his continuation utility level under the contract. Compensating the agent’s effort cost is expensive when this utility level becomes high after a sequence of rewards because of the income effect; additional punishment is not possible if his utility level becomes very low after a sequence of punishments because of limited liability. In both cases, the principal is not able to provide incentives.

⁴ Here, without attempting a summary, I refer the reader to Bernanke and Gertler [1989], Bernanke et al. [1996], Kiyotaki and Moore [1997], and Rampini [2004].

Another implication of my model regards the adjustments of the agent’s utility level upon the persistent shocks. As discussed, the incentive provision has to be terminated after a sequence of rewards because of the income effect. Furthermore, it is profitable to keep inducing the agent to exert effort when productivity is high. So, if the agent’s utility is already at a high level in the good state, additional rewards for good performance should be put on hold to alleviate the income effect until the downturn arrives. Since the unpaid rewards have to be compensated, a lump-sum payment will be given to the agent when the negative shock hits, which instantaneously lifts his utility level. This pattern may explain why many top executives were rewarded when the recent financial crisis occurred, whereas many of their employees suffered the consequences.⁵ It may also explain why some top executives leave their firms with generous severance pay after their firms accidentally incur massive losses.⁶

⁵ For example, in the United States, “pensions for top executives rose an average of 19% in 2008, with more than 200 executives seeing pensions increase more than 50%,” while “the share prices at the companies declined an average of 37% in 2008 and many firms froze employee pensions and suspended retirement-plan contributions” (Ellen E. Schultz and Tom McGinty, “Pensions for Executives on Rise,” Wall Street Journal, November 3, 2009, http://online.wsj.com/article/SB125719963066023835.html).

⁶ For example, the former CEO of BP, Tony Hayward, received “severance pay of a year’s salary (about £1m, or $1.6m) and the right to start drawing from a pension pot conservatively valued at £11m,” while “the firm announced a record loss of $17 billion, the consequence of a one-off charge of $32 billion to clean up the oil spill, compensate its victims and settle fines” after the oil spill in the Gulf of Mexico (“The Wages of Failure,” The Economist, July 29, 2010, http://www.economist.com/node/16693567). Please note that the occurrence of the explosion might have been under the CEO’s control, but the consequences of a massive oil spill were not and were quite unpredictable. In fact, explosions and oil spills are not rare events in BP’s history.

Now, let us turn to the methodological contribution of this paper. To characterize the optimal contract using dynamic programming, we need to solve for the value functions of the principal in the two states, which are solutions to a pair of Hamilton-Jacobi-Bellman (HJB) ordinary differential equations (ODEs). They are difficult to work with, however, because they are intertwined through the delay and advance terms associated with the state switchings. So, instead of solving them directly, I propose an iterative procedure that successively constructs two sequences of value functions, and I show that the sequences converge to the value functions under the optimal contract.⁷ Each step of the iterative procedure is an extension of the method developed by Sannikov [2008] that allows us to handle persistent observable shocks, in which the state space of the contract design is divided into working and shirking regions. The parts of the value function over the two regions are characterized separately and then smoothly pasted together. This method helps us avoid singularity issues.

⁷ See DeMarzo et al. [2012] (Appendix D) for a similar iterative procedure that they use for their numerical solutions without justifying its convergence and optimality. Specifically, I study an altered model with finitely many state transitions, which allows us to start with the value function after the last shock using the method in Sannikov [2008]. Then, step by step backward, I recursively characterize the value function between each pair of consecutive transitions. Here, a procedure to characterize the value function in each step based on the one constructed in the previous step is developed. As we keep iterating this procedure, the value functions converge to the ones under the optimal contract with infinitely many Markov transitions.

This paper belongs to the growing literature on the continuous-time hidden-action model pioneered by Holmström and Milgrom [1987] and developed by Schättler and Sung [1993], Sannikov [2008], DeMarzo and Sannikov [2006], and Williams [2009].⁸ DeMarzo et al. [2012] and Hoffmann and Pfeil [2010] introduce persistent observable shocks into the hidden-action version of the model in DeMarzo and Sannikov [2006] and focus on the contracts under which the highest level of effort is always enforced. In addition to taking into account the history-dependent effort exertion under the fully optimal contract, as a complement to DeMarzo et al. [2012] and Hoffmann and Pfeil [2010],⁹ my paper considers a strictly risk-averse agent. Zhu [2013] also investigates a type of optimal contract with temporal shirking in the model of DeMarzo and Sannikov [2006], with the following differences. First, Zhu [2013] does not consider persistent observable shocks, which are the main subject of my paper. Second, under the optimal contract he studies, temporal shirking is offered because, as compensation, it is more efficient than a cash payment, whereas, in my model, it is the result of the optimal allocation of the limited incentives that can be provided across different states. Third, in Zhu [2013], shirking is frequent but instantaneous,¹⁰ whereas, in my model, a temporal phase of shirking always starts in the bad state and concludes when a transition to the good state occurs. Another closely related paper is Szydlowski [2012], in which the principal is ambiguity averse and the firm is subject to shocks to the contracting environment with an unknown distribution.

⁸ See Cvitanić and Zhang [2013] for a complete description of the development of the literature.

⁹ My method cannot be used in the models of DeMarzo et al. [2012] and Hoffmann and Pfeil [2010] because my model requires the agent to be strictly risk averse and as patient as the principal, whereas in their models, the agent is risk neutral and less patient than the principal.

The remainder of the paper is organized as follows. In Section 2, I introduce the model, and in Section 3, I show the incentive constraint and the HJB equations for the value functions. The iterative procedure is developed in Section 4, the economic implications of the model are discussed in Section 5, and Section 6 concludes the paper.

2 The model

In the model, a risk-neutral principal delegates her firm to a risk-averse agent over the time horizon [0, ∞).

2.1 Production technology

The cumulative income process of the firm is denoted $\{y_t\}$,¹¹ and

$$dy_t = g(a_t, \theta_t)\,dt + \sigma(\theta_t)\,dz_t \quad \text{and} \quad y_0 = 0, \tag{1}$$

where $dy_t$ is the instantaneous income, or cash flow, of the firm at time $t$. The firm experiences two-state Markov transitions indicated by the process $\{\theta_t\}$ with $\theta_t \in \{0, 1\}$. The Poisson transition rate at $t$ is $\lambda(\theta_t) > 0$. Each transition is publicly observable and can be viewed as a persistent shock that has an impact on the firm. I use the terms “shock” and “state transition” interchangeably. According to (1), given the state, the firm’s expected income, $g(a_t, \theta_t)$, is determined by the agent’s private choice of his level of effort $a_t \in [0, \bar{a}]$.¹² The actual income is also subject to unobservable shocks characterized by a standard Brownian motion $\{z_t\}$, which causes the moral hazard problem that will be introduced later. The functions $g$ and $\sigma$ satisfy the following conditions.

Assumption 1. For all $\theta \in \{0, 1\}$: (a) $g(\cdot, \theta)$ is twice continuously differentiable with $0 \le g_a(a, \theta) < \infty$ and $g_{aa}(a, \theta) \le 0$ for all $a \in [0, \bar{a}]$;¹³ (b) $g(0, \theta) = 0$; (c) $\underline{\sigma} \equiv \min_{\tilde{\theta} \in \{0,1\}} \sigma(\tilde{\theta}) > 0$.

¹⁰ The agent shirks when his utility level, which is driven by a diffusion process, hits a reflecting boundary.

¹¹ In the rest of this paper, after defining a process, I use the corresponding uppercase letter to denote it. For example, $Y$ refers to the process $\{y_t\}$ hereafter.
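As a purely illustrative sketch, the income process (1) can be simulated with an Euler scheme. The linear technology $g(a, \theta) = (1 + \theta)a$, the constant effort level, the constant volatility and transition rate, and every numerical value below are assumptions chosen for illustration; none of them is taken from the paper.

```python
import math
import random


def simulate_income(effort, g, sigma, lam, theta0=0, horizon=10.0, dt=0.01, seed=0):
    """Euler scheme for dy = g(a, theta) dt + sigma(theta) dz, where theta is a
    two-state Markov chain that switches at Poisson rate lam(theta)."""
    rng = random.Random(seed)
    n_steps = int(round(horizon / dt))
    y, theta = 0.0, theta0                      # y_0 = 0
    path_y, path_theta = [y], [theta]
    for _ in range(n_steps):
        dz = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over dt
        y += g(effort, theta) * dt + sigma(theta) * dz
        # Approximation: any arrival within the step is treated as one switch.
        if rng.random() < 1.0 - math.exp(-lam(theta) * dt):
            theta = 1 - theta
        path_y.append(y)
        path_theta.append(theta)
    return path_y, path_theta


# Assumed primitives (hypothetical): linear technology, more productive in state 1.
def g_linear(a, theta):
    return (1.0 + theta) * a


def sigma_const(theta):
    return 1.0


def lam_const(theta):
    return 0.5


ys, thetas = simulate_income(0.5, g_linear, sigma_const, lam_const)
```

The returned lists hold one sample path of $Y$ and $\Theta$ on the time grid; only these two paths are publicly observable, while the effort level and the Brownian increments are not.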

2.2 Preferences and moral hazard

The principal delegates the firm to the agent by offering a contract denoted $(\{a_t\}, \{c_t\})$, with $a_t$ and $c_t$ being the agent’s effort input and compensation at time $t$, respectively. Because of limited liability, $c_t$ must be nonnegative. Processes $A$ and $C$ have to be adapted to $\{\mathcal{F}_t\}$, the filtration generated by $Y$ and $\Theta$, which are publicly observable. For technical reasons, I require $A$ to be $\{\mathcal{F}_t\}$-predictable.¹⁴ I assume that the principal can commit to a long-term contract and that the agent has an outside option worth zero, so that, given limited liability, there is no relevant participation constraint.

A moral hazard problem arises because the principal cannot distinguish the unobservable shocks $dz_t$ in income from the agent’s private effort. So, under a contract, the agent could choose an alternative effort policy, an $\{\mathcal{F}_t\}$-predictable effort process $\hat{A}$ other than $A$, to maximize the following objective function:

$$\max_{\hat{A} \in \mathcal{A}} \; \beta E^{\hat{A}}\left[\int_0^\infty e^{-\beta t}\left(u(c_t) - h(\hat{a}_t)\right) dt\right]. \tag{2}$$

Here, $\mathcal{A}$ is the set of effort processes, $\beta > 0$ is the discount rate, and $E^{\hat{A}}$ is the expectation operator based on the joint probability measure of $Y$ and $\Theta$ implied by $\hat{A}$.¹⁵ The utility function, $u$, and the effort cost function, $h$, satisfy the following conditions.

Assumption 2. The function $u$ is twice continuously differentiable with $u(0) = 0$, $u' > 0$, $u'' < 0$, and $\lim_{c \to \infty} u'(c) = 0$; $h$ is twice continuously differentiable with $h(0) = 0$, $h' > 0$, $h'' > 0$, and $h'(0) = \underline{h}$ for some real number $\underline{h} > 0$.

¹² All the results in this paper would still hold if the interval $[0, \bar{a}]$ were replaced by the binary set $\{0, \bar{a}\}$.

¹³ In this paper, the derivatives on the boundaries are one-sided.

¹⁴ See Chapter 5 of Elliott [1982] for a definition of the predictability of stochastic processes.

¹⁵ See Appendix A for the measure transfer.

I focus on the incentive-compatible contracts that promise a given level of initial expected utility, $w$, to the agent. Specifically, a contract $(A, C)$ is incentive compatible if $A$ solves (2). The principal’s objective is

$$\max_{(A, C)} \; \beta E^{A}\left[\int_0^\infty e^{-\beta t}\left(dy_t - c_t\,dt\right)\right]$$

such that $(A, C)$ is incentive compatible and

$$\beta E^{A}\left[\int_0^\infty e^{-\beta t}\left(u(c_t) - h(a_t)\right) dt\right] = w. \tag{3}$$

Finally, I impose the following assumption so that, with moral hazard, the firm is more profitable in the good state ($\theta_t = 1$) than in the bad state ($\theta_t = 0$).

Assumption 3. $g(a, 1) > g(a, 0)$ and

$$\frac{g_a(a, 1)}{\sigma(1)} \ge \frac{g_a(a, 0)}{\sigma(0)}$$

for all $a \in [0, \bar{a}]$.

Namely, the firm is more productive for every level of the agent’s effort, and the principal’s inference about the agent’s effort based on the firm’s income is more precise in the good state.

3 Incentive compatibility and the HJB differential equations

Let $w_t$ be the agent’s continuation utility given a contract $(A, C)$, that is,

$$w_t \equiv \beta E^{A}\left[\int_t^\infty e^{-\beta(\tau - t)}\left(u(c_\tau) - h(a_\tau)\right) d\tau \,\Big|\, \mathcal{F}_t\right] \quad \text{for } t \in [0, \infty).$$

Define $\{z_t^A\}$ such that $dz_t^A = \left(dy_t - g(a_t, \theta_t)\,dt\right)/\sigma(\theta_t)$ with $z_0^A = 0$, which is a standard Brownian motion under the measure $\mu^A$, the probability measure under the effort process $A$ defined in Appendix A. Finally, let $n_t$ denote the number of state transitions that the firm has experienced up to $t$. The martingale representation theorem then implies the following result.

Proposition 1. (Law of motion of $W$) Let $A$ and $C$ be the policies of the agent’s effort and compensation, respectively. Then there exist two $\{\mathcal{F}_t\}$-predictable and square-integrable processes, $\{\phi_t\}$ and $\{\varphi_t\}$, such that

$$dw_t = \beta\left(w_t - u(c_t) + h(a_t)\right) dt + \beta\varphi_t\,dz_t^A + \beta\phi_t\left(-\lambda(\theta_t)\,dt + dn_t\right) \quad \text{for } t \in [0, \infty). \tag{4}$$
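To fix ideas, the law of motion (4) can be discretized one step at a time. The sketch below is only an illustration: the function name, the step size, the choices $u(c) = \sqrt{c}$ and $h(a) = 0.5a + 0.5a^2$, and all numerical values are assumptions, not objects from the paper. The argument `varphi` multiplies the Brownian increment and `phi` multiplies the compensated jump term.

```python
import math


def continuation_utility_step(w, c, a, varphi, phi, lam, beta, u, h, dz, jump, dt):
    """One Euler step of (4):
    dw = beta (w - u(c) + h(a)) dt + beta varphi dz + beta phi (-lam dt + dn)."""
    drift = beta * (w - u(c) + h(a)) * dt
    diffusion = beta * varphi * dz                    # sensitivity to performance
    dn = 1.0 if jump else 0.0                         # 1 if a state transition occurs
    compensated_jump = beta * phi * (-lam * dt + dn)  # sensitivity to the shock
    return w + drift + diffusion + compensated_jump


# Assumed primitives (hypothetical): u(c) = sqrt(c), h(a) = 0.5 a + 0.5 a^2.
u = math.sqrt
h = lambda a: 0.5 * a + 0.5 * a * a

w_no_jump = continuation_utility_step(1.0, 1.0, 0.0, 0.0, 0.2, 0.5, 0.1, u, h,
                                      dz=0.0, jump=False, dt=0.01)
w_jump = continuation_utility_step(1.0, 1.0, 0.0, 0.0, 0.2, 0.5, 0.1, u, h,
                                   dz=0.0, jump=True, dt=0.01)
```

Between transitions the term $-\beta\phi_t\lambda(\theta_t)\,dt$ offsets the expected jump $\beta\phi_t$, which arrives with probability of roughly $\lambda(\theta_t)\,dt$ per step, so the jump component has zero mean.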

Proof. See Appendix B.

The coefficient $\varphi_t$ is the sensitivity of $w_t$ to the agent’s performance,¹⁶ which determines his incentives for effort exertion. The coefficient $\phi_t$ is the sensitivity to the persistent shocks, which determines the adjustments in $w_t$ upon state transitions. The two coefficients are optimally chosen as part of the contract design. The following incentive compatibility condition extends the one in Sannikov [2008].

Proposition 2. (Incentive Compatibility) A contract $(A, C)$ is incentive compatible if and only if

$$a_t = \arg\max_{a \in [0, \bar{a}]} \; \varphi_t \frac{g(a, \theta_t)}{\sigma(\theta_t)} - h(a) \tag{5}$$

for all $t \in [0, \infty)$ at which $\theta_t$ is continuous.¹⁷

According to (5), incentives are provided only through the sensitivity to the unobservable shocks, in line with the “sufficient statistic” theorem of Holmström [1979]. Since the agent is strictly risk averse, $\varphi_t$ should be the minimum that induces a given level of effort. For $a \in [0, \bar{a}]$ and $\theta \in \{0, 1\}$, define

$$\rho(a, \theta) = \frac{\sigma(\theta)}{g_a(a, \theta)} h'(a) \quad \text{and} \quad \varphi(a, \theta) = \begin{cases} \rho(a, \theta) & \text{if } a \in (0, \bar{a}]; \\ 0 & \text{if } a = 0. \end{cases} \tag{6}$$

Thus, under the optimal contract, (5) is equivalent to

$$\varphi_t = \varphi(a_t, \theta_t) \quad \text{for } t \in [0, \infty). \tag{7}$$
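The link between (5) and (6) can be checked numerically. In the minimal sketch below, the linear technology $g(a, \theta) = \kappa a$ (so that $g_a = \kappa$), the cost function $h(a) = 0.2a + 0.5a^2$, the grid search, and all parameter values are assumptions made for illustration only. Setting the sensitivity to $\rho(a^*, \theta)$ should make $a^*$ the agent’s best response, and a zero sensitivity should lead to shirking.

```python
def rho(a, sigma, kappa, h_prime):
    """rho(a, theta) = sigma(theta) h'(a) / g_a(a, theta), with g_a = kappa (assumed)."""
    return sigma * h_prime(a) / kappa


def best_response(varphi, sigma, kappa, h, a_bar=1.0, n=1000):
    """Grid search for the maximizer of varphi * g(a) / sigma - h(a) over [0, a_bar]."""
    grid = [a_bar * i / n for i in range(n + 1)]
    return max(grid, key=lambda a: varphi * kappa * a / sigma - h(a))


# Assumed cost function (hypothetical): h(0) = 0, h' > 0, h'' > 0, h'(0) = 0.2.
h = lambda a: 0.2 * a + 0.5 * a * a
h_prime = lambda a: 0.2 + a

target_effort = 0.4
varphi_star = rho(target_effort, sigma=1.5, kappa=2.0, h_prime=h_prime)
induced = best_response(varphi_star, sigma=1.5, kappa=2.0, h=h)
shirk = best_response(0.0, sigma=1.5, kappa=2.0, h=h)
```

With these assumed forms the agent’s objective reduces to $0.4a - 0.5a^2$, so the grid search should recover the target effort up to the grid resolution.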

Let $J : [0, u(\infty)) \times \{0, 1\} \to \mathbb{R}$ be the value function indicating the principal’s expected payoff under the optimal contract given the agent’s continuation utility and the state of the firm. We expect this function to solve the HJB equation

$$0 = \max_{c,\, \phi,\, a \in [0, \bar{a}]} \left\{ \big(g(a, \theta) - c - J(w, \theta)\big) + \big(w - u(c) + h(a) - \lambda(\theta)\phi\big) J_w(w, \theta) + \frac{\beta\varphi^2(a, \theta)}{2} J_{ww}(w, \theta) + \frac{\lambda(\theta)}{\beta}\big(J(w + \beta\phi, 1 - \theta) - J(w, \theta)\big) \right\} \tag{8}$$

for $\theta \in \{0, 1\}$. However, (8) represents a pair of ODEs intertwined through advanced and delayed terms, which are difficult to solve. So, instead of characterizing the solutions directly, I iteratively construct two value function sequences and show that they converge to $J(w, 1)$ and $J(w, 0)$, respectively.

¹⁶ Notice that $dz_t^A$ indicates the unobservable productivity shocks that the principal is not able to distinguish from the firm’s income.

¹⁷ Since the effort process $A$ is predictable, its trajectory is almost surely continuous from the left. Therefore, the incentive-compatibility condition is not satisfied at the instant when a transition occurs. Hereafter, I omit the phrase “at which $\theta_t$ is continuous” because doing so will not cause confusion.

4 Iterative construction and verification of $J(w, \theta)$

To illustrate the iterative construction, imagine that the firm experiences only finitely many state transitions and then stays in the bad state permanently. The value function after the last transition can be characterized by using the method developed in Sannikov [2008]. Let us denote it $J^0(\cdot, 0)$. Based on this value function, we recursively characterize $J^1(\cdot, 1)$, the good-state value function just prior to the last shock, and then, based on $J^1(\cdot, 1)$, characterize $J^1(\cdot, 0)$, the bad-state value function prior to the penultimate transition, and so on. In this way, we construct two value function sequences, $\{J^k(\cdot, 1)\}_{k \in \mathbb{N}}$ and $\{J^k(\cdot, 0)\}_{k \in \mathbb{N}}$, as illustrated in Figure 1.

Figure 1: Iterative construction and value function sequences. The horizontal axis represents time. The solid curve is a trajectory of $\Theta$, which indicates the state transitions. The value functions in the sequences are indicated along the curve: $J^3(\cdot, 1)$, $J^2(\cdot, 1)$, and $J^1(\cdot, 1)$ on the successive good-state ($\theta_t = 1$) segments, and $J^2(\cdot, 0)$, $J^1(\cdot, 0)$, and $J^0(\cdot, 0)$ on the successive bad-state ($\theta_t = 0$) segments, the last of which is the permanent bad state.
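The order in which the two sequences are built can be summarized schematically. The sketch below captures only the bookkeeping of the backward construction; `recursive_step` is a hypothetical placeholder for the procedure of Subsection 4.2 (it is not implemented here), and the label-producing stand-in is an assumption used purely to display the indexing.

```python
def build_sequences(J0_bad, recursive_step, n_rounds):
    """Alternate between the two states, moving backward from the last transition.

    J0_bad         -- J^0(., 0), the value function in the permanent bad state
    recursive_step -- hypothetical callable (post_shock, state) -> pre-shock value function
    Returns (good, bad) with good[k-1] = J^k(., 1) and bad[k] = J^k(., 0).
    """
    good, bad = [], [J0_bad]
    for _ in range(n_rounds):
        # J^k(., 1) is built from J^{k-1}(., 0), the value after the next shock.
        good.append(recursive_step(post_shock=bad[-1], state=1))
        # J^k(., 0) is built from J^k(., 1).
        bad.append(recursive_step(post_shock=good[-1], state=0))
    return good, bad


# Stand-in step that only records which function each object was built from.
def label_step(post_shock, state):
    return "pre-shock value in state %d given [%s]" % (state, post_shock)


good, bad = build_sequences("J^0(.,0)", label_step, n_rounds=2)
```

In an actual computation each element would be a function of $w$ (for instance, values on a grid), and the loop would stop once successive elements are close in a chosen norm; that stopping rule is likewise an assumption of this sketch.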

I show that as $k$ goes to infinity, the two sequences converge to $J(\cdot, 1)$ and $J(\cdot, 0)$, respectively. Here is the plan of this section. In Subsection 4.1, I briefly characterize $J^0(\cdot, 0)$ by recalling Sannikov [2008]. In Subsection 4.2, I explain the recursive procedure that enables us to construct the two value function sequences step by step. In Subsection 4.3, I show that the two sequences converge to the value functions in the contracting environment with Markov state transitions. In this section, the discussion is heuristic, and I leave the formal verification to the propositions and their proofs.

Before proceeding with our discussion, I show a preliminary result that simplifies our analysis. Define the retirement contract to be the restricted-optimal contract under which $a_t = 0$ for $t \in [0, \infty)$; namely, the agent is allowed to shirk permanently. Optimal risk sharing implies that his compensation should be constant across time and equal to $c^R(w) = u^{-1}(w)$. Thus, the principal’s expected payoff is characterized by the retirement value function, $J^R$, which satisfies

$$J^R(w) = -u^{-1}(w) \quad \text{for } w \in [0, u(\infty)). \tag{9}$$

The following lemma is similar to Lemma 4 in Sannikov [2008].

Lemma 1. Define $\bar{g}_a \equiv \max_{\theta \in \{0,1\}} g_a(0, \theta)$ and $w^{RC} \in [0, u(\infty))$ such that $u'\left(c^R\left(w^{RC}\right)\right) = \underline{h}/\bar{g}_a$. If a contract $(A, C)$ provides the agent with expected utility $w \ge w^{RC}$ and yields the principal’s expected payoff $\Pi$, then $\Pi \le J^R(w)$.

Intuitively, the income effect implies that the agent should retire if his continuation utility is too high because compensating his effort cost is very expensive, and $w^{RC}$ is an upper bound of $w$ beyond which the agent should retire given any history; on the other hand, limited liability implies that the agent cannot be punished if his continuation utility reaches zero, and he then has to retire. As a result, we can focus on the contract design over $[0, w^{RC}]$. For convenience, I define the following set of “well-behaved” functions.

Definition 1. Let $\mathcal{J}$ be the set of differentiable and strictly concave functions $J$ over $[0, u(\infty))$ such that $J(0) = 0$, $J'(0) < \infty$, and
(a) $J(w) = J^R(w)$ for $w \in [w^{RC}, u(\infty))$;
(b) $J(w) \ge J^R(w)$ for $w \in [0, u(\infty))$;
(c) $J'$ is Lipschitz continuous and piecewise differentiable over $[0, w^{RC}]$;¹⁸
(d) $J(w) \le \bar{g}$ for all $w \in [0, u(\infty))$ with $\bar{g} \equiv g(\bar{a}, 1)$.
Notice that Lipschitz continuity in Part (c) implies that $J'$ is absolutely continuous.¹⁹

¹⁸ Here, “piecewise differentiable” means that there exists a sequence of disjoint intervals, the union of which is its domain, and $J'$ is differentiable in the interiors of the intervals.

¹⁹ See Problem 5.20 in Royden [1988].
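A concrete case may help in reading (9) and the threshold $w^{RC}$ in Lemma 1. The sketch below assumes $u(c) = \sqrt{c}$, which satisfies the conditions imposed on $u$ but is not a functional form used in the paper; the values passed for $h'(0)$ and $\bar{g}_a$ are likewise hypothetical.

```python
import math


def retirement_value(w):
    """J^R(w) = -u^{-1}(w); with the assumed u(c) = sqrt(c), u^{-1}(w) = w**2."""
    return -(w ** 2)


def retirement_threshold(h_low, g_a_bar):
    """w^RC solving u'(c^R(w)) = h_low / g_a_bar, where h_low stands for h'(0).

    With u(c) = sqrt(c): c^R(w) = w**2 and u'(c^R(w)) = 1 / (2 w),
    so the threshold is w^RC = g_a_bar / (2 h_low)."""
    return g_a_bar / (2.0 * h_low)


w_rc = retirement_threshold(h_low=0.5, g_a_bar=2.0)   # assumed values
marginal_utility_at_threshold = 1.0 / (2.0 * math.sqrt(w_rc ** 2))
principal_payoff_at_threshold = retirement_value(w_rc)
```

Under these assumptions the threshold rises with the marginal product of effort $\bar{g}_a$ and falls with the marginal effort cost $h'(0)$, in line with the intuition that retirement is triggered once compensating effort becomes too expensive.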

4.1 Characterizing $J^0(\cdot, 0)$

Since the firm is no longer subject to any state transition, the method developed in Sannikov [2008] applies. Here, I briefly introduce the basic idea. The state space of $w$ is divided into a shirking region and an effort-exertion region. In the shirking region, the agent retires and $J^0(\cdot, 0)$ is represented by $J^R$. In the effort-exertion region, the agent is induced to exert effort. Typically, this region is an interval²⁰ contained in $[0, w^{RC}]$, and over this interval $J^0(\cdot, 0)$ is represented by the solution to an HJB differential equation that satisfies a set of value-matching and smooth-pasting conditions on the boundaries and dominates $J^R$.²¹ We can summarize the idea as follows. We take $J^R$ as a “baseline” value function and determine the region of the state space where inducing the agent to work improves the principal’s expected payoff. According to the construction, it is easy to show the following result.

Lemma 2. $J^0(\cdot, 0) \in \mathcal{J}$.

Since, without persistent shocks, the production technology is time invariant and the agent is as patient as the principal, deferring the incentive provision and production is never efficient. Thus, shirking is always permanent. Namely, the principal either induces the agent to work or allows him to retire. However, if the firm transitions between two states with different profitabilities, the regions of shirking and effort exertion may change from state to state, so that shirking may be temporal and the agent may revert to working from shirking when the state changes. If we characterize the value functions recursively state by state as illustrated in Figure 1, in each step, the division of the $w$-space should depend on the effort policy in the current state. When the next state transition occurs, a new phase starts and the effort-exertion policy may change. By extending the methodology introduced in Sannikov [2008], I develop a recursive procedure that enables us to accomplish each step in the iterative construction.

²⁰ There could be multiple such intervals.

²¹ The intuition behind the smooth-pasting condition is that, if the contract starts in the effort-exertion region, the stopping time at which the agent is allowed to shirk must be optimal. See Dixit and Pindyck [1994].

4.2 The recursive procedure

Starting with the characterization of $J^1(\cdot, 1)$, which is based on $J^0(\cdot, 0)$, we will use the recursive procedure repeatedly to construct the value function sequences. So, to describe this procedure in general terms, I call the current value function that we are constructing the pre-shock value function and denote it by $J^O$. I call the value function after the next persistent shock the post-shock value function and denote it by $J^P$, which is given and assumed to be a member of $\mathcal{J}$. Since we are focusing on the current state, I omit the dependence of the contract on $\theta_t$. I also denote the Poisson time of the next shock by $T$. Essentially, we are considering the optimal contract design over the horizon $[0, T]$ given the optimal contract after $T$, which yields the value function $J^P$.

Figure 2: Construction of the pre-shock value function, $J^O$. The horizontal axis represents the agent’s continuation utility, $w$, and the vertical axis represents the principal’s expected payoff. The dashed curve is the shirking value function, $J^S(w)$. The solid curve, labeled “Improvement,” indicates the improvement based on $J^S$ over the interval labeled “Region with Effort.” The pre-shock value function is represented by $J^S$ outside this interval and by the improvement function inside it.

The procedure consists of two steps, which are illustrated in Figure 2. In the first step, we consider the shirking contract, the optimal contract under which the agent has to be allowed to shirk until $T$, and the shirking value function, $J^S$, implied by this contract. Using $J^S$ as the baseline value function, in the second step we focus on the optimal contract that induces the agent to exert effort until $T$ to determine the region of the state space where doing so improves the principal’s payoff. As we did earlier when $J^R$ was being used as the baseline value function, we determine unique solutions to an HJB differential equation associated with positive levels of effort that can be smoothly pasted on top of $J^S$. Then, as illustrated in Figure 2, $J^O$ is represented by the solution (the solid curve labeled “Improvement”) in the region with effort and by $J^S$ (the dashed curve) outside it.

4.2.1 The value function $J^S$ and the shirking contract

Since the agent is risk averse, when he is allowed to shirk, his compensation and continuation utility should be constant across time. Thus, $J^S$ satisfies²²

$$0 = \max_{c,\, \phi} \; \beta\left(-c - J^S(w)\right) + J^S_w(w)\,\beta\left(w - u(c) - \lambda\phi\right) + \lambda\left(J^P(w + \beta\phi) - J^S(w)\right). \tag{10}$$

According to the right-hand side of (10), the compensation policy function, $c^S$, and the continuation utility adjustment policy function, $\phi^S$, satisfy

$$c^S(w) = \begin{cases} 0 & \text{if } J^S_w(w) \ge \dfrac{-1}{u'(0)}, \\[2mm] (u')^{-1}\!\left(\dfrac{-1}{J^S_w(w)}\right) & \text{otherwise,} \end{cases} \tag{11}$$

and²³

$$J^S_w(w) = J^P_w\left(w + \beta\phi^S(w)\right). \tag{12}$$

Here, (12) is the slope-matching condition requiring that the marginal value of providing the agent with utility be equal before and after the adjustment of $w$ upon the shock. Intuitively, the adjustment $\phi^S$ does not affect the agent’s incentives, and the marginal value should be independent of whether the shock hits in the next moment. Otherwise, rearranging the utility provision could improve the principal’s payoff while leaving the agent’s utility intact. Since $w$ is time invariant, its drift is zero and (4) implies

$$w - u\left(c^S(w)\right) - \lambda\phi^S(w) = 0 \quad \text{for } w \in [0, u(\infty)). \tag{13}$$

Condition (13), along with (11) and (12), implies a characterization of J S (w). To see it, we define a function ω : [0, u(∞)) → [0, u(∞)) such that ω(w) =

22

    

λ β+λ w λ β+λ w

+

β β+λ u

(

u′−1

(

−1 P (w) Jw

))

if w ∈ [0, wP ]; if w ∈

(14)

(wP , u(∞)),

In fact, Proposition C.1 in Appendix C.2 establishes concavity and differentiability of J S . S P As shown in Proposition C.2 in Appendix C.2, Jw (0) ≤ Jw (0), and then the concavity of J S implies S that the first-order condition of ϕ is interior. 23

13

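with $w^P \in [0, u(\infty))$ satisfying $J^P_w(w^P) = \frac{-1}{u'(0)}$.$^{24}$ In fact, $\omega(w)$ is the level from which the agent’s continuation utility is adjusted to $w$ at $T$, namely, $\omega(w_T) = w_{T-}$.$^{25}$ Then we have the following characterization of $J^S$.

Proposition 3. Suppose that $J^P \in \mathcal{J}$. For $w \in [0, u(\infty))$,

$$J^S_w(w) = J^P_w\left(\omega^{-1}(w)\right) \quad \text{and} \quad J^S(w) = \int_0^w J^P_w\left(\omega^{-1}(\hat{w})\right) d\hat{w}, \tag{15}$$

and

$$J^S(w) = -\frac{\beta}{\beta+\lambda}\,\psi\!\left(J^P_w(\omega^{-1}(w))\right) + \frac{\lambda}{\beta+\lambda}\, J^P\!\left(\frac{\beta+\lambda}{\lambda}\, w - \frac{\beta}{\lambda}\, u\!\left(\psi\!\left(J^P_w(\omega^{-1}(w))\right)\right)\right), \tag{16}$$

where $\psi : (-\infty, J^P_w(0)] \to \mathbb{R}_+$ satisfies$^{26}$

$$\psi(x) = \begin{cases} u'^{-1}\!\left(\dfrac{-1}{x}\right) & \text{if } x \in (-\infty, J^P_w(w^P)), \\[1ex] 0 & \text{if } x \in [J^P_w(w^P), J^P_w(0)]. \end{cases}$$

Proof. See Appendix C.1.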
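The algebra linking (13), (14), and the argument of $J^P$ in (16) can be checked numerically. The following is a minimal sketch rather than part of the original analysis: it takes β and λ from Example 1, assumes $u(c) = \sqrt{c}$, and treats the compensation level `c` as a given input instead of deriving it from $J^P_w$. The function names are hypothetical.

```python
import math

BETA = 0.04  # discount rate, value taken from Example 1
LAM = 0.1    # Poisson arrival rate of the shock, value taken from Example 1


def u(c):
    """Assumed utility function u(c) = sqrt(c)."""
    return math.sqrt(c)


def post_shock_utility(w, c, beta=BETA, lam=LAM):
    """Post-shock utility w + beta*phi, with phi pinned down by (13)."""
    phi = (w - u(c)) / lam  # zero-drift condition (13)
    return w + beta * phi


def omega(w_post, c, beta=BETA, lam=LAM):
    """Pre-shock utility implied by (14) for a given compensation level c."""
    return lam / (beta + lam) * w_post + beta / (beta + lam) * u(c)
```

Under these assumptions, `omega(post_shock_utility(w, c), c)` returns `w` up to rounding error, which is the sense in which ω maps the post-shock level back to the pre-shock level.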

Equation (15) is straightforward given the slope matching condition (12), and, by combining (15) with (14), we obtain (16). The next proposition characterizes the shirking contract implying $J^S$ and verifies its optimality. The proof is standard and omitted.$^{27}$

Proposition 4. Suppose that $J^P \in \mathcal{J}$. Let $c^S$ and $\phi^S$ be the functions defined by (11) and (12), respectively, and let the law of motion of the agent’s continuation utility satisfy

$$w_t = \begin{cases} w & \text{if } t < T, \\ w + \beta\phi^S(w) & \text{if } t = T. \end{cases} \tag{17}$$

Then under the shirking contract that promises the agent expected utility $w$, $a_t = 0$ and $c_t = c^S(w_t)$ for $t \in [0, T]$. The contract yields the expected payoff $J^S(w)$ to the principal.

24 Here, $w^P$ is the threshold level of $w$ in the state after the shock such that the agent’s compensation is zero if $w \le w^P$ and positive if $w > w^P$.
25 To understand the expression of $\omega$, notice that the agent’s compensation and $w_t$ are time invariant prior to $T$. Then $\omega(w) = E\left[\beta \int_0^T e^{-\beta t} u\left(c^S\right) dt + e^{-\beta T} w\right]$. Equations (11) and (12) imply that if $w \in [0, w^P]$, $c^S = 0$ and we have the first expression in (14); otherwise, $c^S = u'^{-1}\left(-1/J^P_w(w)\right)$ and we have the second one.
26 Here, function $\psi$ indicates the agent’s compensation level given the principal’s marginal value of providing the agent with additional continuation utility under the shirking contract.
27 Additional useful results about $J^S$ are discussed in Appendix C.2.

4.2.2

The HJB equation with effort exertion and smooth pasting

Now, we focus on the contracts inducing the agent to exert positive levels of effort to determine the region of the w-space in which such contracts can raise the principal’s expected payoff above $J^S$. To do so, I impose the sufficient incentive-compatibility condition

$$\varphi_t = \rho(a_t, \theta_t), \tag{18}$$

which is equivalent to (7) if and only if $a_t > 0$. The principal’s maximization problem implies the following HJB differential equation:

$$J^E_{ww}(w) = \min_{a,c,\phi}\; \frac{-2\left[g(a) - c - J^E(w) + J^E_w(w)\left(w - u(c) + h(a) - \lambda\phi\right) + \frac{\lambda}{\beta}\left(J^P(w + \beta\phi) - J^E(w)\right)\right]}{\beta\rho(a)^2}. \tag{19}$$
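To make the structure of (19) concrete, the right-hand side can be evaluated by brute force on a grid of controls. The sketch below is only an illustration under stated assumptions: the function name, the grids, and every callable passed in (including `rho`, whose functional form is not given in this section) are hypothetical inputs, and `rho(a)` is assumed to be strictly positive for positive effort.

```python
import math


def hjb_curvature(w, J, Jw, JP, g, h, u, rho, beta, lam,
                  a_grid, c_grid, phi_grid):
    """Grid-search evaluation of the right-hand side of (19).

    J and Jw are candidate values of J^E(w) and J^E_w(w); JP is the
    post-shock value function. All callables are user-supplied assumptions.
    """
    best = math.inf
    for a in a_grid:
        if a <= 0:
            continue  # (18) is imposed only for positive effort
        for c in c_grid:
            for phi in phi_grid:
                w_post = w + beta * phi
                if w_post < 0:
                    continue  # keep the post-shock utility non-negative
                bracket = (g(a) - c - J
                           + Jw * (w - u(c) + h(a) - lam * phi)
                           + (lam / beta) * (JP(w_post) - J))
                best = min(best, -2.0 * bracket / (beta * rho(a) ** 2))
    return best
```

A finer grid can only lower the returned value, since the function takes a minimum over the candidate controls; solving the differential equation itself would additionally require the boundary conditions in Definition 2.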

We pay attention to a particular kind of solution for $J^E$, defined below.

Definition 2. An interval $[w_L, w_R] \subset [0, w^{RC}]$ is a working interval if there exists a solution to (19), $J^E$, such that:

(a) $J^E(w_R) = J^S(w_R)$ and $J^E_w(w_R) = J^S_w(w_R)$;
(b) $J^E(w_L) = J^S(w_L)$ and $J^E_w(w_L) \ge J^S_w(w_L)$, with equality if $w_L > 0$;
(c) $J^E(w) > J^S(w)$ for $w \in (w_L, w_R)$ and $J^E(w) \ge J^S(w)$ for all $w \in [0, w^{RC}]$.

$J^E$ is called the working function associated with the working interval $[w_L, w_R]$.

The following proposition characterizes $J^O$ and shows that it is well defined, unique, and belongs to $\mathcal{J}$.

Proposition 5. Suppose that $J^P \in \mathcal{J}$. Define $J^O : [0, u(\infty)) \to \mathbb{R}$, such that if $w$ is in the interior of a working interval $[w_L, w_R]$, then $J^O(w) = J^E(w)$ with $J^E$ being the associated working function; otherwise, $J^O(w) = J^S(w)$. Then $J^O \in \mathcal{J}$.

Proof. See Appendix C.4.

Inside a working interval, it is optimal to induce the agent to exert effort, and the principal’s expected payoff is improved from $J^S$ according to Part (c) of Definition 2. Parts (a) and (b) are the value-matching and smooth-pasting conditions on the boundaries of the interval, which are necessary because the incentive provision should be optimally stopped once $w_t$ hits one of the boundaries prior to $T$. To describe the optimal contract, I denote the policy functions implied by (19) by $a$, $c$, and $\phi$. Let $T^S = \inf\{t < T : w_t = w_L \text{ or } w_R\}$ be the time that $w_t$ hits one of the working interval boundaries. For any $w \in (w_L, w_R)$ with $[w_L, w_R]$ being a working interval, define the agent’s continuation utility process $\{w_t\}_{t \in [0,T]}$ such that for $t \in [0, T^S \wedge T)$,

$$dw_t = \beta\left[w_t - u\left(c(w_t)\right) + h\left(a(w_t)\right) - \lambda\phi(w_t)\right] dt + \beta\,\frac{\rho\left(a(w_t)\right)}{\sigma}\left(dy_t - g\left(a(w_t)\right) dt\right) \quad \text{with } w_0 = w; \tag{20}$$

if $T^S < T$, for $t \in [T^S, T)$, $w_t = w_{T^S}$ and $w_T = w_{T^S} + \beta\phi^S(w_{T^S})$; if $T^S > T$, $w_T = w_{T-} + \beta\phi(w_{T-})$. The following proposition characterizes the optimal contract and verifies the optimality of $J^O$.

Proposition 6. Suppose that $J^P \in \mathcal{J}$. If $w$ is not in the interior of any working interval, the shirking contract is optimal and the principal’s expected payoff is $J^O(w) = J^S(w)$. Now suppose $w$ is in the interior of a working interval $[w_L, w_R]$. Then the law of motion of $w_t$ is characterized by (20). For $t \in [0, T^S \wedge T)$, $a_t = a(w_t)$ and $c_t = c(w_t)$; if $T^S < T$, the contract switches to the shirking contract promising $w_{T^S}$ at $T^S$; if $T^S > T$, the contract switches to the optimal contract over $[T, \infty)$ promising $w_T$ at $T$. The principal’s resulting expected payoff is $J^O(w) = J^E(w)$.

Proof. See Appendix C.5.

The slope matching condition (12) still holds for the continuation utility adjustment policy, $\phi$, for the same reason mentioned in the context of the shirking contract. The following theorem concludes that the agent exerts positive levels of effort if and only if $w_t$ is inside a working interval.$^{28}$

Theorem 1. Suppose that $J^P \in \mathcal{J}$ and that the optimal incentive-compatible contract $(A, C)$ is implemented, which provides the agent with expected utility $w \in (w_L, w_R)$ with $[w_L, w_R]$ being a working interval. Then $a_t > 0$ if and only if $t \in [0, T^S \wedge T)$ with $T^S = \inf\{t < T : w_t = w_L \text{ or } w_R\}$.

Proof. See Appendix C.6.

28 Notice that the optimality of the contract inside a working interval is verified given the sufficient incentive constraint (18). Theorem 1 shows that it is indeed optimal to induce positive levels of effort so that this condition is necessary as well.

When a persistent shock hits, the continuation utility is adjusted to $w_T = w_{T-} + \beta\phi(w_{T-})$, and whether the agent should exert effort depends on whether or not $w_T$ is inside a working interval of $J^P$.

4.3

Convergence of the sequences

Lemma 2 and Proposition 5 imply that all the value functions in the two sequences that are characterized by repeatedly applying the recursive procedure belong to $\mathcal{J}$. The following lemma shows that the sequences are increasing in $k$.

Lemma 3. For any $w \in [0, u(\infty))$, $J^k(w, \theta) \le J^{k'}(w, \theta)$ if $k < k'$ for all $\theta \in \{0, 1\}$.

Proof. See Appendix D.1.

The intuition is as follows. Prior to the permanent transition to the bad state, the number of visits to and the expected length of time experienced in the good state increase with $k$, as does the principal’s expected payoff. As the principal’s payoff is bounded, Lemma 3 implies that $\{J^k(\cdot, 0)\}_{k \in \mathbb{N}}$ and $\{J^k(\cdot, 1)\}_{k \in \mathbb{N}}$ converge pointwise to a unique pair of functions. I denote them $\bar{J}(\cdot, 0)$ and $\bar{J}(\cdot, 1)$, respectively, and show that they are the value functions with Markov state transitions that satisfy (8).$^{29}$

4.4

The limit functions

To verify that the limit functions $\bar{J}(\cdot, 1)$ and $\bar{J}(\cdot, 0)$ are the value functions in the good and the bad state, respectively, I first show that they are upper bounds on the expected payoffs achieved under all incentive-compatible contracts with Markov state transitions.

Proposition 7. For any $w \in [0, w^{RC}]$, let $\hat{J}^1$ and $\hat{J}^0$ be the expected payoffs of the principal under some incentive-compatible contracts promising the agent expected utility $w$ in the good and bad states, respectively. Then, $\hat{J}^1 \le \bar{J}(w, 1)$ and $\hat{J}^0 \le \bar{J}(w, 0)$.

Proof. See Appendix D.2.

29 In fact, the two sequences converge uniformly, as shown in Lemma D.2 in Appendix D.3.

Let $J^*(\cdot, 1)$ and $J^*(\cdot, 0)$ be the pre-shock value functions constructed by using $\bar{J}(\cdot, 0)$ and $\bar{J}(\cdot, 1)$ as the post-shock value functions, respectively.$^{30}$ The following proposition says that, after one round of iteration of the procedure, we get the same pair so that $\bar{J}(\cdot, 0)$ and $\bar{J}(\cdot, 1)$ form a “fixed point” of the procedure.

Proposition 8. For any $w \in [0, w^{RC}]$, $J^*(w, 0) = \bar{J}(w, 0)$ and $J^*(w, 1) = \bar{J}(w, 1)$.

Proof. See Appendix D.3.

A consequence of Proposition 8 is that $\bar{J}(\cdot, 1)$ and $\bar{J}(\cdot, 0)$ satisfy (8).$^{31}$ Now I conclude this subsection by showing that expected payoffs $\bar{J}(w, 1)$ and $\bar{J}(w, 0)$ can be achieved under some incentive-compatible contracts.

Proposition 9. For any $w \in [0, w^{RC}]$, the principal’s expected payoffs in the good and bad states, $\bar{J}(w, 1)$ and $\bar{J}(w, 0)$, respectively, can be achieved under an incentive-compatible contract providing the agent with expected utility $w$.

Proof. See Appendix D.4.

Propositions 7 and 9 imply that $\bar{J}(\cdot, 1)$ and $\bar{J}(\cdot, 0)$ are the value functions, $J(\cdot, 1)$ and $J(\cdot, 0)$, as desired. Furthermore, the optimal policy functions can be characterized by following the recursive procedure with them as the post-shock value functions.

5

Economic implications

In this section, we discuss the economic implications of the model based on some analytical results and numerical examples. Before proceeding to the discussions, I briefly introduce the dynamics of the agent’s effort under the optimal contract. If the agent’s initial continuation utility $w$ is not in the interior of a working interval, the contract starts with shirking. Otherwise, it induces the agent to work, and $w_t$ moves around inside the working interval driven by the agent’s performance. Once $w_t$ hits one of the boundaries, the agent is allowed to shirk. In any case, if a state transition occurs, $w_t$ is adjusted to a new level, and a new phase of the contract starts. The new effort policy depends on whether the adjusted level is located in the interior of a working interval of the new state.

30 I show in Corollary D.1 in Appendix D.3 that the two limit functions belong to $\mathcal{J}$. Hence, the recursive procedure applies to them.
31 In fact, $\bar{J}(\cdot, 1)$ and $\bar{J}(\cdot, 0)$ are piecewise second-order differentiable. Therefore, the solution is in the Carathéodory sense defined in Coddington and Levinson [1955] and Hartman [2002].

5.1

Continuation utility adjustment with respect to a persistent observable shock

Let us start with the adjustment of $w_t$ with respect to the persistent observable shocks and let $\phi : [0, w^{RC}) \times \{0, 1\} \to \mathbb{R}$ be the continuation utility adjustment policy function. The slope-matching rule immediately implies the following pattern of the adjustments of $w_t$ with respect to the state transitions.

Corollary 1. Suppose that $[w_L, w_R] \subset [0, w^{RC})$ is an interval of the agent’s continuation utility level such that $\bar{J}(w, 1) > \bar{J}(w, 0)$ for all $w \in (w_L, w_R)$ and $\bar{J}(w, 1) = \bar{J}(w, 0)$ for $w = w_L$ or $w_R$. Then for any $w_M \in (w_L, w_R)$, there exists an interval $U_L^0 \subset (w_L, w_M]$ such that $\phi(w, 0) < 0$ for all $w \in U_L^0$ and an interval $U_R^0 \subset [w_M, w_R)$ such that $\phi(w, 0) > 0$ for all $w \in U_R^0$; symmetrically, there exists an interval $U_L^1 \subset (w_L, w_M]$ such that $\phi(w, 1) > 0$ for all $w \in U_L^1$ and an interval $U_R^1 \subset [w_M, w_R)$ such that $\phi(w, 1) < 0$ for all $w \in U_R^1$.

According to the characterization of the value functions, the interval $[w_L, w_R]$ mentioned in Corollary 1 is a working interval in the good state.$^{32}$ I graphically illustrate the continuation utility pattern described in this corollary based on the following numerical example.

Example 1. $g(a, \theta) = \alpha(\theta) a$ with $\alpha(\theta)$ being the marginal product of the agent’s effort in state $\theta$, $\alpha(1) = 0.35$, and $\alpha(0) = 0.3$; the effort level is chosen from $[0, 1]$; $\lambda(1) = \lambda(0) = 0.1$; $\sigma(1) = \sigma(0) = 0.4$; $\beta = 0.04$; $u(a) = \sqrt{a}$; and $h(a) = 0.5a^2 + 0.4a$.

32 Assumption 3 implies $\bar{J}(w, 1) \ge \bar{J}(w, 0)$ for all $w \in [0, u(\infty))$. The inequality is strict and the adjustment is not zero if and only if $\bar{J}(w, 0) > J^R(w)$. See Proposition D.2 in Appendix D.5.

[Figure 3. Vertical axis: adjustment of continuation utility; horizontal axis: w; legend: Good State, Bad State.]

Figure 3: Continuation utility adjustments. The horizontal axis represents the agent’s continuation utility. The solid curve indicates the adjustment of the agent’s continuation utility when the firm transitions from the good state to the bad state, and the dashed curve indicates the adjustment when the firm transitions from the bad state to the good state.

Figure 3 illustrates the policy functions of the continuation utility adjustment upon a persistent shock in the two states. In this example, there is one working interval in each state, and the bad-state interval is a subset of the good-state one.33 The continuation utility responds to persistent shocks over the working interval in the good state. Upon a bad shock, the adjustment is positive when wt is at high levels and negative when wt is at low levels (the solid curve). Upon a good shock, the pattern is the opposite (the dashed curve). This adjustment pattern is the result of the optimal dependence of wt on the arrivals of the persistent shocks. Since the shocks are beyond the agent’s control, this dependence does not affect his incentives to exert effort. So the only restriction when we design this dependence is the promise-keeping condition. Specifically, there are two ways to tie wt to the arrival of the next persistent shock without changing the expected utility promised to the agent: (i) by letting it drift toward lower levels prior to the shock and adjusting it back to higher levels when the shock hits or (ii) by doing the opposite. These two different ways correspond to the negative and positive values of ϕ, the coefficient of the compensated jump martingale −λdt + dnt in the law of motion (4). In any case, the expectation of the change in a martingale is zero.
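The claim that either arrangement leaves the promised utility unchanged in expectation can be verified in closed form. The sketch below is an illustration only: it assumes a constant adjustment coefficient ϕ up to the shock (in the model ϕ depends on $w_t$), an exponentially distributed arrival time with rate λ, and a hypothetical function name.

```python
import math


def expected_compensated_change(phi, lam, beta, horizon):
    """Expected value of beta*phi*(-lam*min(t, T) + 1{T <= t}) at t = horizon,
    where T is exponentially distributed with rate lam."""
    prob_shock = 1.0 - math.exp(-lam * horizon)     # P(T <= t)
    expected_wait = prob_shock / lam                # E[min(t, T)]
    drift_part = -beta * lam * phi * expected_wait  # drift accumulated before the shock
    jump_part = beta * phi * prob_shock             # jump of size beta*phi at the shock
    return drift_part + jump_part
```

With a positive `phi`, the drift part is negative and the jump part is positive, which corresponds to arrangement (i); a negative `phi` gives arrangement (ii). In both cases the two parts cancel, reflecting the martingale property of the compensated jump process.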

Because of limited liability and the income effect, if the principal keeps inducing the agent to exert effort, she will have to allow him to shirk when $w_t$ reaches the low or high shirking region. In the good state, the agent is more productive and avoiding the shirking regions is more valuable. Therefore, in the good state, we should adopt the arrangement (i) mentioned above when $w_t$ is at a relatively high level and (ii) when it is at a relatively low level. If the firm is in the bad state, the arrangement follows the opposite pattern, according to which $w_t$ is adjusted to a moderate level far from the shirking regions when the firm transitions into the good state. As a result, the incentive provision becomes more efficient when productivity goes up.

The dependence of $w_t$ on the state transitions under the optimal contract has the following implications for the employment compensation design over the business cycle in practice. We could interpret the agent’s continuation utility in the model as the rank of an employee, which is associated with future benefits in the rest of his tenure, such as bonus, pension, and vacation time, and the rank is determined by his past performance. During a boom period, if a high-ranking employee’s effort inputs are more valuable, he should be punished for bad outcomes, but his rewards for good ones should be put on hold so that he will not become too rich to be motivated because of the income effect. When a downturn of the economy comes, a lump-sum reward could be paid as compensation if he keeps performing well. Symmetrically, a low-ranking employee should be rewarded only for good outcomes but not punished for bad ones so that he would not become too poor to be punished. If his performance turns out to be bad, instead of cutting his benefits or demoting him, the firm should put the punishments on hold. If, however, this employee cannot improve until the recession comes, the contract should specify a severe punishment. This adjustment pattern may explain why some top-ranking CEOs received bonuses when the recent financial crisis hit the economy, but low-ranking employees suffered the consequences.

33 Outside the good-state working interval, the agent retires and $w_t$ is time invariant. See Corollary 2 below.

5.2

Responses of effort with respect to persistent observable shocks

In this subsection, we discuss the responses of the agent’s effort exertion to the state transitions and show that, at macroeconomic levels, the moral hazard problem amplifies the impact of the business cycle on the productivity of the economy. Let us start with the following general result.

Theorem 2. If $S^1$ and $S^0$ are the regions of the w-space in which the agent is allowed to shirk in the good and bad states, respectively, then $S^1$ is a strict subset of $S^0$. If $[\bar{w}_L, \bar{w}_R]$ is a working interval in the good state, there exist $w_L$ and $w_R$ such that $\bar{w}_L < w_L < w_R < \bar{w}_R$ and the agent shirks over $[\bar{w}_L, w_L] \cup [w_R, \bar{w}_R]$ in the bad state.

Proof. See Appendix E.

Theorem 2 implies that the agent is more likely to be allowed to shirk in the bad state. When the firm transitions from the bad state to the good state, if $w_t$ was originally in $S^0 \setminus S^1$ so that the agent was shirking, he immediately reverts to exerting effort. When the firm transitions from the good state to the bad state, with some probability, $w_t$ is adjusted into $S^0 \setminus S^1$ and the agent stops working immediately.$^{34}$

Readers may be inclined to explain this pattern by the trade-off between the benefit and the cost of inducing effort exertion that we see in static moral hazard models pointwise across time. Specifically, the agent is allowed to shirk in the bad state because the marginal product of his effort is too low and not worth the incentive compensation paid to him. However, the explanation proposed by this model is different and based on an intertemporal trade-off between the good and bad states. Notice that Theorem 2 is independent of the specific level of productivity. In fact, no matter how productive the agent is in the bad state, he is always allowed to shirk after some histories if a better state is expected in the future. To see why, notice that the incentive provision requires the principal to be able to reward and punish the agent. However, limited liability and the income effect imply a lower and an upper bound of the level of $w_t$ at which rewards and punishments can be implemented. Hence, if the principal keeps inducing the agent to exert effort, she will lose this ability and will have to terminate the incentive provision in the finite future. Namely, incentives that can be provided are scarce so that a trade-off between the incentive provisions in the two states arises. Obviously, when termination is close, it is optimal to temporarily suspend incentive provision in the bad state so that the firm could resume production later when productivity goes up. For the same reason, symmetrically we have the following result, which is a corollary of Proposition C.2 in Appendix C.2.

Corollary 2. Under the optimal contract, the agent does not revert from shirking to working upon a transition from the good state to the bad state.

In other words, shirking is permanent in the good state so that the agent retires once the shirking region is reached. Intuitively, there is no need to reserve the ability to provide incentives for the future if current productivity is high.

34 More precisely, this happens if $w_t$ is originally in $\omega\left(S^0 \setminus S^1\right)$. Here, function $\omega$ is defined by (14) with the post-shock value function being $J(\cdot, 0)$.

5.2.1

Comparative statics analysis

To see how the cross-state trade-off affects the incentive provision and effort exertion, I demonstrate two numerical comparative statics in Figure 4 by using Example 1 as the benchmark. In the left panel of Figure 4, I keep everything else unchanged and plot the effort policy functions in the bad state with different values of $\alpha(1)$, the marginal product of the agent’s effort in the good state. As $\alpha(1)$ increases, the agent exerts lower levels of effort and shirks more in the bad state. This comparative statics analysis illustrates a “substitution effect” between the allocations of incentive provision in the two states: suspending incentive provision in the bad state is more valuable if the anticipation of future productivity in the good state is higher.

[Figure 4. Two panels plotting effort in the bad state (vertical axis) against w (horizontal axis). Left panel legend: α(1) = 0.32, α(1) = 0.35, α(1) = 0.38. Right panel legend: λ(0) = 0.05, λ(0) = 0.1, λ(0) = 0.15.]

Figure 4: Comparative statics with respect to α(1) and λ(0). In both panels, the horizontal axis represents the agent’s continuation utility, and the vertical axis represents his effort in the bad state. In the left panel, the dashed-dotted, dashed, and solid curves represent the policy functions with α(1) being 0.32, 0.35, and 0.38, respectively. In the right panel, the dashed-dotted, dashed, and solid curves represent the policy functions with λ(0) being 0.05, 0.1, and 0.15, respectively.

In the right panel of Figure 4, I plot the policy functions with different values of $\lambda(0)$, the Poisson transition rate from the bad state to the good. The agent exerts lower levels of effort and shirks more with a higher value of $\lambda(0)$. The intuition is that the expected waiting time for the coming transition to the good state decreases with $\lambda(0)$, as does the forgone income of the firm that is due to shirking. Therefore, as $\lambda(0)$ increases, the opportunity cost of allowing the agent to shirk to reserve profitable opportunities in the future decreases.

5.2.2

Amplification of the impact of the business cycle on output

In this subsection, we consider an economy that consists of a large number of firms with identical production technology, and each firm delegates its production activities to an agent according to the optimal contract. The economy is subject to a business cycle characterized by the Markov switching process $\Theta$, which indicates the fluctuation of the marginal product of all the agents’ effort. I assume that the Brownian motion $Z$ is i.i.d. across the firms. Namely, the unobservable shocks causing the moral hazard problem in each firm are idiosyncratic. Furthermore, let us focus on the case in which the agent’s effort choice is binary, either shirking or working. In this economy, Theorem 2 implies a mechanism through which the information frictions amplify the impact of the business cycle on the total output of the economy. Obviously, when a downturn of the economy comes, some firms switch to the shirking contract and temporarily stop producing, and when the economy recovers, they resume the incentive provisions and the agents revert to working. Therefore, a negative business cycle shock not only lowers the productivity of the firms but also reduces effort exertion.

To numerically demonstrate this mechanism, I simulate such an economy consisting of one million firms for 250 years. The production technology and the preferences are the same as those in Example 1, except for the following: the level of effort is chosen from the binary set $\{0, 1\}$ with the effort costs $h(1) = 0.7$ and $h(0) = 0$; the marginal product of the agent’s effort in a boom period is $\alpha(1) = 1$ and that in a recession period is $\alpha(0) = 0.8$; $\beta = 0.03$; $\lambda(1) = \lambda(0) = 0.25$. To make sure that the aggregate production activities of the economy follow a stationary pattern and to prevent the firms from being ultimately absorbed by the retirement regions after a long time of simulation, I assume that each firm exogenously and independently exits from the economy with a Poisson rate 0.07 and is replaced by a new firm. The contract in each new firm initially promises the agent a utility level that maximizes the principal’s expected payoff in the current state. The time unit is one year, and in Figure 5, I interpret the simulation results over the last 50 years.
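The exogenous ingredients of this simulation can be sketched as follows. This is not the author's code: the function names, the time step, the seed, the initial state, and the first-order discretization of the switching probability are all assumptions made for illustration, and the contract dynamics inside each firm are deliberately left out.

```python
import random


def simulate_theta(rate_good_to_bad, rate_bad_to_good, years, dt, seed=0):
    """Simulate a two-state Markov switching path on a time grid of step dt."""
    rng = random.Random(seed)
    theta = 1  # start in the good state (an arbitrary choice)
    path = []
    for _ in range(int(round(years / dt))):
        path.append(theta)
        rate = rate_good_to_bad if theta == 1 else rate_bad_to_good
        if rng.random() < rate * dt:  # first-order approximation of the switching probability
            theta = 1 - theta
    return path


def expected_firm_lifetime(exit_rate):
    """Mean lifetime in years of a firm that exits at a constant Poisson rate."""
    return 1.0 / exit_rate
```

For example, `simulate_theta(0.25, 0.25, 250, 0.01)` produces a business-cycle path over 250 years, and `expected_firm_lifetime(0.07)` gives the average time a firm stays in the economy before being replaced, roughly 14 years.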

[Figure 5. Two panels with time in years (200 to 250) on the horizontal axis and a vertical scale from 0 to 1.]

Figure 5: Amplification of the impact of cyclical productivity shocks on output. In both panels, the horizontal axis represents the time. In the upper panel, the dashed curve represents the sample path of Θ used in the simulation, which indicates the fluctuations in productivity. The solid curve represents the average output level of the firms. In the bottom panel, the solid curve indicates the fraction of firms with output, namely, with agents’ effort exertions.

In the upper panel, the dashed curve indicates the fluctuation of $\alpha(\theta_t)$ over the business cycle. Notice that, in an efficient outcome in the first-best case, the agents always exert effort, and this curve also indicates the fluctuation of the average output of the economy over the business cycle. The solid curve indicates the average output level of the economy under the optimal contract with moral hazard. Obviously, the average output level is more volatile than productivity across time, and the impacts of the business cycle are amplified. In the bottom panel, I plot the fraction of firms without output in which the agents are allowed to shirk as a function of time. As predicted, in a recession period, a larger fraction of the firms suspend incentive provision, and the total effort exertion is procyclical. A large literature in macroeconomics is devoted to understanding the mechanisms through which agency frictions amplify business cycle shocks. My model provides a mechanism based on information frictions that is distinct from the existing mechanisms. This mechanism is the result of the efficient allocation of incentives across different business cycle states.
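The amplification logic with binary effort reduces to simple arithmetic: average output equals productivity times the fraction of firms whose agents work. The sketch below illustrates this under that assumption; the function names are hypothetical, and the working fractions are inputs the reader must supply, not results reported in the paper.

```python
def average_output(alpha, fraction_working):
    """Average output per firm when working agents exert effort 1 and the rest exert 0."""
    return alpha * fraction_working


def amplification_ratio(alpha_good, alpha_bad, frac_good, frac_bad):
    """Ratio of the boom/recession output gap to the pure productivity gap."""
    output_gap = (average_output(alpha_good, frac_good)
                  / average_output(alpha_bad, frac_bad))
    productivity_gap = alpha_good / alpha_bad
    return output_gap / productivity_gap
```

The ratio simplifies to `frac_good / frac_bad`, so it exceeds one exactly when the fraction of working firms is procyclical, and it equals one when that fraction does not respond to the state.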

6

Conclusion

In this paper, I develop a method to characterize the optimal contract in a continuous-time moral hazard model with persistent observable shocks characterized by a two-state Markov switching process. This method allows us to consider the optimal contract with history-dependent effort exertion and shirking decisions. In addition to the methodological contribution, the model highlights an important trade-off between the effort exertion in different states across time, which potentially gives rise to a mechanism amplifying the productivity shocks over the business cycle. The pattern of the agent’s utility adjustments with respect to the persistent shocks may shed some light on compensation design observed in practice.

Acknowledgment I am grateful to Noah Williams for his guidance and encouragement. I also thank Hengjie Ai, Bo Chen, Chang-Koo Chi, Raymond Deneckere, Chao He, Daniel Quint, Ming Li, Michael Rapp, Marzena Rostek, Bill Sandholm, Yuliy Sannikov, Ricardo Serrano-Padial, Neng Wang and Marek Weretka. All remaining errors are mine.

Appendix A

The probability basis and the measure transfer

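Let $\left(\Omega^\Theta, \mathcal{F}^\Theta, \{\mathcal{F}^\Theta_t\}, \mu^\Theta\right)$ be the probability basis of $\Theta$ with $\Omega^\Theta$ being the set of $\{0, 1\}$-valued piecewise constant functions on $[0, \infty)$. Now I characterize the measure transfer over the $Z$ space under an effort policy $A \in \mathcal{A}$ conditioning on a given trajectory $\hat{\Theta} \in \Omega^\Theta$. Let $\Omega^Z$ be the set of continuous functions on $[0, \infty)$ equipped with the sigma algebra $\mathcal{F}^Z$ and let $\bar{\mu}^Z$ be the Wiener measure on $\left(\Omega^Z, \mathcal{F}^Z\right)$ which implies a standard Brownian motion over $\Omega^Z$, denoted $\{\bar{z}_t\}$. Let $\{\mathcal{F}^Z_t\}$ be the filtration generated by $\bar{Z}$. Given trajectory $\hat{\Theta}$, the processes $\{g(a_t, \hat{\theta}_t)\}$ and $\{\sigma(\hat{\theta}_t)\}$ are $\mathcal{F}^Z_t$-predictable. Furthermore

$$\frac{g(a_t, \hat{\theta}_t)}{\sigma(\hat{\theta}_t)} \le \frac{g(\bar{a}, 1)}{\min\{\sigma(0), \sigma(1)\}} < \infty \quad \text{for all } t \in [0, \infty).$$

Therefore we can define a $\bar{\mu}^Z$-martingale, $\{\xi(\bar{Z}, \hat{\Theta}, t)\}$, such that

$$\xi(\bar{Z}, \hat{\Theta}, t) = \exp\left\{\int_0^t \frac{g(a_s, \hat{\theta}_s)}{\sigma(\hat{\theta}_s)}\, d\bar{z}_s - \frac{1}{2}\int_0^t \left(\frac{g(a_s, \hat{\theta}_s)}{\sigma(\hat{\theta}_s)}\right)^2 ds\right\}. \tag{A.1}$$

By (A.1), $E\left[\xi(\bar{Z}, \hat{\Theta}, t)\right] = 1$ under $\bar{\mu}^Z$. Hence the measure $\mu^Z_{A,t}(\cdot \mid \hat{\Theta})$ such that $\mu^Z_{A,t}(S^Z \mid \hat{\Theta}) = \int_{S^Z} \xi(\bar{Z}, \hat{\Theta}, t)\, \bar{\mu}^Z(d\bar{Z})$ for any $S^Z \in \mathcal{F}^Z_t$ is equivalent to $\bar{\mu}^Z$ and the family of measures $\{\mu^Z_{A,t}(\cdot \mid \hat{\Theta})\}$ is consistent, i.e. $\mu^Z_{A,t'}(\cdot \mid \hat{\Theta})$ equals $\mu^Z_{A,t}(\cdot \mid \hat{\Theta})$ on $\mathcal{F}^Z_t$ if $t' \ge t$. Then according to Stroock [1987] (Lemma 4.2), there exists a unique $\mu^Z_A(\cdot \mid \hat{\Theta})$ on $\left(\Omega^Z, \mathcal{F}^Z\right)$ such that $\mu^Z_A(\cdot \mid \hat{\Theta})\big|_{\mathcal{F}^Z_t} = \mu^Z_{A,t}(\cdot \mid \hat{\Theta})$ for all $t \in [0, \infty)$. Let $y_t = \int_0^t \sigma(\hat{\theta}_s)\, d\bar{z}_s$. By Stroock [1987] (Lemma 4.3),$^{35}$ the process $\{z^A_t\}$ defined by $z^A_t = \bar{z}_t - \int_0^t \frac{g(a_s, \hat{\theta}_s)}{\sigma(\hat{\theta}_s)}\, ds$ is a standard Brownian motion under $\mu^Z_A(\cdot \mid \hat{\Theta})$.

To characterize the measure transfer over the joint space, define $\Omega \equiv \Omega^Z \times \Omega^\Theta$, $\mathcal{F} \equiv \mathcal{F}^Z \otimes \mathcal{F}^\Theta$ and $\mathcal{F}_t \equiv \mathcal{F}^Z_t \otimes \mathcal{F}^\Theta_t$ for $t \in [0, \infty)$. Define the measure $\mu_A$ on $\Omega$ such that $\mu_A\left(S^Z \times S^\Theta\right) \equiv \int_{S^\Theta} \int_{S^Z} \mu^Z_A(dZ \mid \hat{\Theta})\, \mu^\Theta(d\hat{\Theta})$ for any $S^Z \in \mathcal{F}^Z$ and $S^\Theta \in \mathcal{F}^\Theta$. So the probability basis given the effort policies $A$ is $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mu_A)$.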

B

Proof of Proposition 1

Define

$$\Upsilon_\infty(A, C) = \beta \int_0^\infty e^{-\beta t}\left(u(c_t) - h(a_t)\right) dt$$

as the agent’s random total utility under $(A, C)$, and for $t \in [0, \infty)$, define $\Upsilon_t(A, C)$ as the $\mathcal{F}_t$-conditional expectation of $\Upsilon_\infty(A, C)$. So,

$$\Upsilon_t(A, C) = \beta \int_0^t e^{-\beta\tau}\left(u(c_\tau) - h(a_\tau)\right) d\tau + e^{-\beta t} w_t. \tag{B.1}$$

Since $\Upsilon_\infty(A, C)$ is an integrable $\mathcal{F}$-measurable random variable, $\{\Upsilon_t(A, C)\}$ is a uniformly integrable martingale under measure $\mu_A$,$^{36}$ and thus a local martingale.$^{37}$ By the Martingale Representation Theorem$^{38}$ we have

$$d\Upsilon_t(A, C) = \beta e^{-\beta t} \varphi_t\, dz^A_t + \beta e^{-\beta t} \phi_t\left(-\lambda(\theta_t)\, dt + dn_t\right) \tag{B.2}$$

with processes $\{\varphi_t\}$ and $\{\phi_t\}$ being two $\{\mathcal{F}_t\}$-predictable and square integrable processes. Hence (B.1) and (B.2) imply the desired result.
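The final step can be spelled out as follows. This is a sketch of the omitted algebra, assuming that the desired result is the law of motion of $w_t$ referred to as (4) in the main text; the drift and diffusion terms obtained here agree with those in (20) when $\varphi_t = \rho(a_t, \theta_t)$.

```latex
% Differentiate (B.1) with respect to t:
\begin{align*}
d\Upsilon_t(A,C) &= \beta e^{-\beta t}\bigl(u(c_t) - h(a_t)\bigr)\,dt
                   - \beta e^{-\beta t} w_t\,dt + e^{-\beta t}\,dw_t .
\end{align*}
% Equate the right-hand side with (B.2) and multiply through by e^{\beta t}:
\begin{align*}
dw_t &= \beta\bigl(w_t - u(c_t) + h(a_t)\bigr)\,dt
        + \beta\varphi_t\,dz^A_t
        + \beta\phi_t\bigl(-\lambda(\theta_t)\,dt + dn_t\bigr).
\end{align*}
```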

C

Appendix to Subsection 4.2

C.1

Proof of Proposition 3

Let us start with a preliminary result characterizing the function $\omega$ defined by (14).

Lemma C.1. Suppose that $J^P \in \mathcal{J}$. Then (a) $\omega$ is strictly increasing and continuous, and $\omega^{-1}$ is well defined, strictly increasing, and continuous; (b) $\omega$ is differentiable over $[0, w^P]$ and differentiable at any point over $(w^P, u(\infty))$ where $J^P_w$ is; (c) $\omega(w) = w$ for $w \in [w^{RC}, u(\infty))$.

35 See also Proposition 1 of Huang and Pagès [1992].
36 See Elliott [1982] Theorem 4.11.
37 See Jacod and Shiryaev [2002] Chapter I, Definition 1.45.
38 See Jacod and Shiryaev [2002] Chapter III, Theorem 4.34.

Proof. Part (a): The result is true over $[0, w^P]$ obviously. Suppose $w \in (w^P, u(\infty))$. First, strict concavity of $J^P$ and $u$ implies that $\omega'(w) > 0$; second, $\omega$ is continuous over $(w^P, u(\infty))$; third, notice that $\lim_{\tilde{w} \downarrow w^P} \frac{-1}{J^P_w(\tilde{w})} = u'(0)$ and $\lim_{\tilde{w} \downarrow w^P} u'^{-1}\left(\frac{-1}{J^P_w(\tilde{w})}\right) = 0$, and thus $\lim_{\tilde{w} \downarrow w^P} \omega(\tilde{w}) = \frac{\lambda}{\beta+\lambda} w^P = \omega(w^P)$, which implies that $\omega$ is continuous at $w^P$ and we have (a).

Part (b): The result is true over $[0, w^P]$ obviously. Suppose $w \in (w^P, u(\infty))$. Notice that $J^P_w(w) < 0$. Therefore, $\omega$ is differentiable over $(w^P, u(\infty))$ wherever $J^P_w$ is.

Part (c): Since $J^R_w(0) = \frac{-1}{u'(0)}$ and $J^P_w\left(w^{RC}\right) = J^R_w\left(w^{RC}\right)$, strict concavity of $J^P$ implies that $w^P \le w^{RC}$ and $[w^{RC}, u(\infty)) \subset [w^P, u(\infty))$. For all $w \in [w^{RC}, u(\infty))$, $J^P_w(w) = J^R_w(w)$ and, by Lemma C.2, the second expression in (14) implies $\omega(w) = \frac{\lambda}{\beta+\lambda} w + \frac{\beta}{\beta+\lambda} u\left(u'^{-1}\left(\frac{-1}{J^P_w(w)}\right)\right) = w$. So we have the desired result.

Now I prove the proposition. Denote by $\bar{J}^S$ the function on the right hand side of (16). I prove that $J^S(w) = \bar{J}^S(w)$ for all $w \in [0, u(\infty))$. If $w \in \left[0, \frac{\lambda}{\beta+\lambda} w^P\right]$, $J^P_w\left(\omega^{-1}(w)\right) \in [J^P_w(w^P), J^P_w(0)]$ and $\psi\left(J^P_w(\omega^{-1}(w))\right) = 0$, which imply the result. Now, we assume that $w \in \left(\frac{\lambda}{\beta+\lambda} w^P, u(\infty)\right)$. I show that $\bar{J}^S_w(w) = J^P_w(\omega^{-1}(w))$. Suppose that $J^P_w$ is differentiable at $\omega^{-1}(\tilde{w})$ with $\tilde{w} \in \left(\frac{\lambda}{\beta+\lambda} w^P, u(\infty)\right)$. Then, parts (a) and (b) of Lemma C.1 imply that $\omega^{-1}$ is differentiable at $\tilde{w}$. By differentiating with respect to $w$ on both hand sides of (16) and combining with (14), we have

$$\bar{J}^S_w(\tilde{w}) = J^P_w(\omega^{-1}(\tilde{w})). \tag{C.1}$$

According to part (c) of Definition 1, if $J^P_w$ is not differentiable at $\omega^{-1}(\tilde{w})$, there exist two open intervals $(\tilde{w}_l, \tilde{w})$ and $(\tilde{w}, \tilde{w}_r)$ over which $J^P_w(\omega^{-1})$ is differentiable. Consequently, $\lim_{w \uparrow \tilde{w}} \bar{J}^S_w(w) = \lim_{w \downarrow \tilde{w}} \bar{J}^S_w(w) = J^P_w(\omega^{-1}(\tilde{w}))$. Since $\bar{J}^S$ is continuous over $\left(\frac{\lambda}{\beta+\lambda} w^P, u(\infty)\right)$ according to (16), we have $\bar{J}^S_w(\tilde{w}) = J^P_w(\omega^{-1}(\tilde{w}))$. Now we check that $\bar{J}^S$ is continuous from the right at $\frac{\lambda}{\beta+\lambda} w^P$ to guarantee that $\bar{J}^S$ can be written as an integral of its derivative. According to (16), we only need to check that $\psi(J^P_w(\omega^{-1}))$ is continuous from the right at this point. Since $\lim_{\tilde{w} \downarrow \frac{\lambda}{\beta+\lambda} w^P} \omega^{-1}(\tilde{w}) = w^P$, $\lim_{\tilde{w} \downarrow \frac{\lambda}{\beta+\lambda} w^P} \psi(J^P_w(\omega^{-1}(\tilde{w}))) = 0$. Therefore, $\psi(J^P_w(\omega^{-1}))$ is continuous from the right at $\frac{\lambda}{\beta+\lambda} w^P$ and so is $\bar{J}^S$. Notice that $\bar{J}^S(0) = 0$, so we have the desired result.

C.2

Properties of the shirking value function

In this subsection, I show some properties of $J^S$. Let us start with the following two preliminary results.

Lemma C.2. Let the function $c^R : [0, u(\infty)) \to \mathbb{R}_+$ be the retirement compensation policy function such that $c^R(w) = -J^R(w)$ for $w \in [0, u(\infty))$, then $c^R(w) = (u')^{-1}\left(-1/J^R_w(w)\right)$ for $w \in [0, u(\infty))$.

The proof is omitted as it is obvious.

Lemma C.3. Suppose that $J^S_w(w) = J^P_w(w)$ for some $w \in [0, u(\infty))$, then $J^S(w) \le J^P(w)$.

Proof. $J^S_w(w) = J^P_w(w)$ implies $\phi^S(w) = 0$ and, by (13), we have $u\left(\psi\left(J^P_w(\omega^{-1}(w))\right)\right) = w$ which, by Lemma C.2, implies $u\left((u')^{-1}\left(-1/J^P_w(w)\right)\right) = u\left((u')^{-1}\left(-1/J^R_w(w)\right)\right)$ and then $J^P_w(w) = J^R_w(w)$. So we have that $J^S_w(w) = J^P_w(w)$ implies $J^S_w(w) = J^R_w(w)$ and $c^S(w) = c^R(w) = -J^R(w)$. Hence (10) implies $\beta(J^R(w) - J^S(w)) + \lambda(J^P(w) - J^S(w)) = 0$. Since $J^S(w) \ge J^R(w)$, $J^P(w) \ge J^S(w)$ and we have the desired result.

The following proposition depicts the properties of $J^S$.

Proposition C.1. The function $J^S$ characterized in Proposition 3 satisfies: (a) $J^S(w) = J^R(w)$ for $w \in [w^{RC}, u(\infty))$; (b) $J^S(0) = 0$, $J^S$ is differentiable and strictly concave; $J^S_w$ is Lipschitz continuous over $[0, w^{RC}]$, absolutely continuous over $[0, u(\infty))$ and piecewise differentiable.

Proof. Part (a): Straightforward according to (16) and Lemma C.2.

Part (b): Part (a) of Lemma C.1 and (16) imply that $J^S$ is strictly concave, and (16) also implies its differentiability. Now, I show that $J^S_w$ is Lipschitz continuous over $[0, w^{RC}]$. Suppose that $J^P_w\left(\omega^{-1}(\cdot)\right)$ is differentiable at $\tilde w \in [0, w^{RC}]$, hence (15) implies
\[ J^S_{ww}(\tilde w) = J^P_{ww}\left(\omega^{-1}(\tilde w)\right) \frac{d}{dw}\left(\omega^{-1}(\tilde w)\right). \tag{C.2} \]

Letting $\hat w = \omega^{-1}(\tilde w)$, we have
\[ \frac{d}{dw}\omega(\hat w) = \frac{\lambda}{\beta+\lambda} + \frac{\beta}{\beta+\lambda} \underbrace{\left[\frac{-1}{J^P_w(\hat w)} \left((u')^{-1}\right)'\left(\frac{-1}{J^P_w(\hat w)}\right) \frac{J^P_{ww}(\hat w)}{J^P_w(\hat w)^2}\right]}_{>0} \ge \frac{\lambda}{\beta+\lambda}. \]

So, (C.2) implies $\left|J^S_{ww}(\tilde w)\right| \le \frac{\beta+\lambda}{\lambda}\left|J^P_{ww}\left(\omega^{-1}(\tilde w)\right)\right|$, which is bounded due to Lipschitz continuity of $J^P_w$. Furthermore, $J^P_w$ is piecewise continuous and we have the desired result.

The following proposition characterizes the relationships between $J^S$, $J^R$ and $J^P$.

Proposition C.2. Suppose that $J^P \in \mathcal{J}$, then for any $w \in [0, u(\infty))$ we have (a) $J^S(w) = J^P(w)$ and $J^S_w(w) = J^P_w(w)$ if and only if $J^P(w) = J^R(w)$; (b) $J^S(w) \le J^P(w)$, in addition, if $J^S(w) \ne J^R(w)$, $J^S(w) < J^P(w)$; (c) $J^S(w) = J^P(w)$ if and only if $J^P(w) = J^R(w)$ if and only if $J^S(w) = J^R(w)$.

Proof. Part (a): Suppose that $J^S(w) = J^P(w)$ and $J^S_w(w) = J^P_w(w)$, then the optimality condition implies that $c^S(w) = c^P(w)$$^{39}$ and $\phi^S(w) = 0$. Consequently, (16) implies $J^S(w) = -\frac{\beta}{\beta+\lambda} c^S(w) + \frac{\lambda}{\beta+\lambda} J^S(w)$, and thus $c^S(w) = -J^S(w)$. By (13) we have $w = u\left(c^S(w)\right)$ and thus $J^S(w) = -u^{-1}(w) = J^R(w)$. Hence, $J^P(w) = J^R(w)$. To prove the other direction, suppose that $J^P(w) = J^R(w)$. By part (b) of Definition 1, $J^P_w(w) = J^R_w(w)$. Let $\tilde w = \omega(w)$. By Lemma C.2, $\tilde w = w$. Therefore $\phi^S(w) = 0$ and (13) implies $u\left(c^S(w)\right) = w$ and $c^S(w) = -J^R(w)$, and we have $J^S_w(w) = J^R_w(w) = J^P_w(w)$. By combining the results with (16), we have $J^S(w) = J^R(w) = J^P(w)$.

Part (b): First I show that $J^S(w) \le J^P(w)$ for $w \in [0, u(\infty))$. We can focus on the interval $[0, w^{RC}]$. Suppose there is an interval $[w_L, w_R] \subset [0, w^{RC}]$ such that $J^S(w_L) = J^P(w_L)$, $J^S(w_R) = J^P(w_R)$ and $J^S(w) > J^P(w)$ for all $w \in (w_L, w_R)$. Therefore, for any real number $\varepsilon > 0$, there exists a $w_1 \in [w_L, w_L + \varepsilon]$ such that $J^S_w(w_1) > J^P_w(w_1)$ and a $w_2 \in [w_R - \varepsilon, w_R]$ such that $J^S_w(w_2) < J^P_w(w_2)$. Let $\varepsilon$ be sufficiently small so that $w_1 < w_2$. According to part (b) of Proposition C.1 and part (c) of Definition 1, there exists a $w^* \in (w_1, w_2)$ such that $J^S_w(w^*) = J^P_w(w^*)$. By Lemma C.3, we have $J^S(w^*) \le J^P(w^*)$, a contradiction. Now suppose $J^S(w) \ne J^R(w)$ for some $w \in [0, u(\infty))$. Note that, if $J^S(w) = J^P(w)$, then $J^S_w(w) = J^P_w(w)$ because $J^S \le J^P$, and part (a) then implies $J^S(w) = J^R(w)$, a contradiction. Therefore $J^S(w) \ne J^P(w)$, hence $J^S(w) < J^P(w)$.

Part (c): By parts (a) and (b), $J^S(w) = J^P(w)$ implies $J^P(w) = J^R(w)$ and then implies $J^S(w) = J^R(w)$ for any $w \in [0, u(\infty))$. Now I show that $J^S(w) = J^R(w)$ implies $J^S(w) = J^P(w)$. Assume that $J^S(w) = J^R(w)$ for some $w \in [0, u(\infty))$, then $J^S_w(w) = J^R_w(w)$ and hence $c^S(w) = c^R(w)$, which implies that $u\left(c^S(w)\right) = w$ and (13) implies $\phi^S(w) = 0$. By (16), $J^S(w) = J^P(w)$.

$^{39}$ Here $c^P$ is the compensation policy function implied by $J^P$.
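Lemma C.2 above is stated without proof. A short derivation, sketched here under the assumptions that $u$ is strictly increasing, strictly concave and differentiable, and that $J^R(w) = -u^{-1}(w)$ (the identity used in the proof of Proposition C.2), runs as follows:

```latex
\begin{align*}
c^R(w) &= -J^R(w) = u^{-1}(w), \\
J^R_w(w) &= -\frac{d}{dw}u^{-1}(w)
          = -\frac{1}{u'\left(u^{-1}(w)\right)}
          = -\frac{1}{u'\left(c^R(w)\right)}, \\
u'\left(c^R(w)\right) &= -\frac{1}{J^R_w(w)}
\quad\Longrightarrow\quad
c^R(w) = (u')^{-1}\!\left(-\frac{1}{J^R_w(w)}\right).
\end{align*}
```

The last step only inverts $u'$, which is possible because $u'$ is strictly decreasing.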

C.3

Preliminary Results and Properties of J E

In this subsection, I show the preliminary results needed in Appendices C.4 and C.5. Some of the results are directly extended from corresponding results in Sannikov [2008],$^{40}$ so their proofs are omitted.

Proposition C.3. Suppose that $J^P \in \mathcal{J}$. Let $c : [0, u(\infty)) \to \mathbb{R}_+$ and $\phi : [0, u(\infty)) \to \mathbb{R}$ be the policy functions implied by (19). Then, for all $w \in [0, u(\infty))$: $c(w) = 0$ if $J^E_w(w) \ge \frac{-1}{u'(0)}$ and $c(w) = (u')^{-1}\left(\frac{-1}{J^E_w(w)}\right)$ otherwise; if $J^E_w(w) \ge J^P_w(0)$,
\[ \phi(w) = \frac{-1}{\beta} w, \tag{C.3} \]
and if $J^E_w(w) < J^P_w(0)$, $\phi(w)$ is uniquely determined by the condition
\[ J^P_w(w + \beta\phi(w)) = J^E_w(w). \tag{C.4} \]

The proof is straightforward and is omitted. The following lemma establishes the existence and uniqueness of the solution to (19) satisfying a set of boundary conditions.

Lemma C.4. Suppose that $J^P \in \mathcal{J}$. Given the initial values of $J^E(w)$ and $J^E_w(w)$ at any $w \in [0, u(\infty))$, the solution to (19) exists and is unique.

Combined with Lemma C.4, the following lemma implies that any solution to (19) is strictly concave, or strictly convex, or linear.

Lemma C.5. Suppose that $J^P \in \mathcal{J}$. Let $J^E$ be a solution to (19). For any $w \in [0, u(\infty))$, if $J^E_{ww}(w) < 0$, $J^E$ is strictly concave and if $J^E_{ww}(w) > 0$, $J^E$ is strictly convex.

The following lemma shows a "single crossing" property of the solutions of (19).

Lemma C.6. Suppose that $J^P \in \mathcal{J}$ and $J^{E1}$ and $J^{E2}$ are two solutions of (19). If $J^{E1}(\tilde w) = J^{E2}(\tilde w)$ and $J^{E1}_w(\tilde w) > J^{E2}_w(\tilde w)$ at some point $\tilde w \in [0, u(\infty))$, then $J^{E1}(w) < J^{E2}(w)$, $J^{E1}_w(w) < J^{E2}_w(w)$ for all $w \in [0, \tilde w)$ and $J^{E1}(w) > J^{E2}(w)$, $J^{E1}_w(w) > J^{E2}_w(w)$ for all $w \in (\tilde w, u(\infty))$.

$^{40}$ Specifically, Lemmas C.4 and C.5 are extended from Lemma 1 in Sannikov [2008], Lemma C.6 is from Lemma 2, and Lemma C.8 is from Lemma 3 in his paper.
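The policy rules of Proposition C.3 are two pointwise conditions, so they can be evaluated numerically once the slopes are known. The sketch below is only an illustration: the utility $u(c) = \log(1+c)$, the linear slope `jp_w`, the bracket `w_max` and all function names are assumptions made for the example, not objects from the paper, and condition (C.4) is solved by plain bisection.

```python
from typing import Callable


def consumption_policy(je_w: float, u_prime_0: float,
                       u_prime_inv: Callable[[float], float]) -> float:
    """c(w) in Proposition C.3, given the slope J^E_w(w)."""
    # Corner case: pay nothing when J^E_w(w) >= -1/u'(0).
    if je_w >= -1.0 / u_prime_0:
        return 0.0
    # Interior case: u'(c) = -1/J^E_w(w).
    return u_prime_inv(-1.0 / je_w)


def shock_adjustment(w: float, je_w: float, beta: float,
                     jp_w: Callable[[float], float],
                     w_max: float, tol: float = 1e-12) -> float:
    """phi(w) in Proposition C.3; jp_w must be strictly decreasing."""
    # (C.3): the post-shock utility w + beta*phi is driven to zero.
    if je_w >= jp_w(0.0):
        return -w / beta
    # (C.4): find x = w + beta*phi with J^P_w(x) = J^E_w(w) by bisection.
    # Assumes the root lies in [0, w_max], i.e. jp_w(w_max) <= je_w.
    lo, hi = 0.0, w_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if jp_w(mid) > je_w:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi) - w) / beta


# Hypothetical primitives: u(c) = log(1 + c), so u'(0) = 1 and
# (u')^{-1}(y) = 1/y - 1; J^P_w(x) = 1 - 2x is an arbitrary decreasing slope.
u_prime_inv = lambda y: 1.0 / y - 1.0
jp_w = lambda x: 1.0 - 2.0 * x

c_interior = consumption_policy(-2.0, 1.0, u_prime_inv)
c_corner = consumption_policy(-0.5, 1.0, u_prime_inv)
phi_interior = shock_adjustment(1.0, -2.0, 0.5, jp_w, w_max=5.0)
phi_corner = shock_adjustment(1.0, 2.0, 0.5, jp_w, w_max=5.0)
```

Bisection is used only because (C.4) has a unique root when $J^P_w$ is strictly decreasing; any bracketing root finder would serve equally well.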

The following lemma shows that if $J^E$ is tangent to $J^S$ at some point, it is concave.

Lemma C.7. Suppose that $J^P \in \mathcal{J}$ and $J^E$ is a solution to (19) such that $J^E(\tilde w) = J^S(\tilde w)$ and $J^E_w(\tilde w) = J^S_w(\tilde w)$ for some $\tilde w \in [0, u(\infty))$. Then $J^E_{ww}(w) \le 0$ for all $w \in [0, u(\infty))$.

Proof. By plugging $J^E(\tilde w) = J^S(\tilde w)$ and $J^E_w(\tilde w) = J^S_w(\tilde w)$ into (19) we have
\[ J^E_{ww}(\tilde w) = \min_{a,c,\phi} -2\left[-c - J^S(\tilde w) + J^S_w(\tilde w)\left(\tilde w - u(c) - \lambda\phi\right) + \frac{\lambda}{\beta}\left(J^P(\tilde w + \beta\phi) - J^S(\tilde w)\right) + g(a) + J^E_w(\tilde w)h(a)\right] \Big/ \beta\rho(a)^2. \]
If we choose $c = c^S(\tilde w)$ and $\phi = \phi^S(\tilde w)$ defined in (11) and (12), the above equation implies
\[ -c^S(\tilde w) - J^S(\tilde w) + J^S_w(\tilde w)\left(\tilde w - u\left(c^S(\tilde w)\right) - \lambda\phi^S(\tilde w)\right) + \frac{\lambda}{\beta}\left(J^P\left(\tilde w + \beta\phi^S(\tilde w)\right) - J^S(\tilde w)\right) = 0 \]
and thus $J^E_{ww}(\tilde w) \le \min_a -2\left[g(a) + J^E_w(\tilde w)h(a)\right]/\beta\rho(a)^2 \le 0$. Therefore, by Lemma C.5, we have the desired result.

The following lemma shows that, if $J^E$ is a strictly concave solution to (19) and is an improvement over $J^S$, then it cannot step over the point $\left(w^{RC}, J^S\left(w^{RC}\right)\right)$.

Lemma C.8. Suppose that J P ∈ J , J E is a strictly concave solution to (19) and [wL , wR ] ⊂ [0, u(∞)) such that J E (w) > J S (w) for all w ∈ (wL , wR ), J E (wL ) = J S (wL ) and J E (wR ) = J S (wR ). If wL ∈ [0, wRC ), then wR ≤ wRC .

C.4

Proof of Proposition 5

The following two lemmas are necessary. The proofs are based on strict concavity of $J^S$, Lemma C.6, and the standard shooting argument, so I omit them.

Lemma C.9. Suppose that $[w_{L,1}, w_{R,1}]$ and $[w_{L,2}, w_{R,2}]$ are two different working intervals, then $(w_{L,1}, w_{R,1}) \cap (w_{L,2}, w_{R,2}) = \emptyset$.

Lemma C.10. Suppose that $H$ is a solution to (19) satisfying: $H(w_l) = J^S(w_l)$, $H(w_r) = J^S(w_r)$ with $0 \le w_l < w_r \le w^{RC}$, and $H(w) > J^S(w)$ for all $w \in (w_l, w_r)$. Then, there exists a unique working interval, $[w_L, w_R]$, such that $[w_l, w_r] \subset [w_L, w_R]$ and the associated working function, $J^E$, satisfies $J^E(w) \ge H(w)$ for all $w \in [w_l, w_r]$ with the inequality holding strictly if $[w_l, w_r]$ is a proper subset of $[w_L, w_R]$.

Notice that each working interval has positive Lebesgue measure. Therefore, there are at most countably many of them over $[0, w^{RC}]$, denoted by $[w_L^k, w_R^k]$ for $k \in \mathbb{N}$.$^{41}$ Notice that Lemma C.9 and Lemma C.10 imply the existence and uniqueness of $J^O$. In particular, if there is no improvement, $J^O(w) = J^S(w)$ for $w \in [0, u(\infty))$.

Now, I show that $J^O \in \mathcal{J}$. Since $J^S$ is concave, Lemma C.5 implies concavity of $J^O$. The smooth pasting conditions of the working functions and differentiability of $J^S$ imply differentiability of $J^O$. So Parts (a), (b) and (d) of Definition 1 are straightforward. To show Part (c), let $N^S$ be the subset of $[0, w^{RC}]$ over which $J^S_w$ is not differentiable and $N^A$ be the set of boundary points of the working intervals. So, $J^O_w$ is not differentiable in a subset of $\left(N^S \setminus \left(\cup_k [w_L^k, w_R^k]\right)\right) \cup N^A$, which contains at most countably many points. Furthermore, since each working interval $[w_L^k, w_R^k]$ has strictly positive Lebesgue measure and any two working intervals do not intersect, $J^O_w$ is piecewise differentiable. Notice that $J^S_w$ is Lipschitz continuous, each working function has bounded second order derivative over the compact set $[0, w^{RC}]$, and $J^O_w$ is continuous and has no increment over the set where it is not differentiable (smooth pasting), so it is Lipschitz continuous and we have the desired result.

$^{41}$ See Zhong [1974] Exercise 1.1.4.

C.5

Proof of Proposition 6

The proof consists of two parts. First, I show that the contract generates the expected payoff $J^O(w)$ for the principal; second, I show that this is the highest expected payoff that can be achieved under an incentive-compatible contract.

C.5.1

J O (w) can be achieved under an incentive-compatible contract

Denote the optimal contract characterized in Proposition 6 by $(A, C)$. The following lemma shows that the policy functions under $(A, C)$ are bounded. The proof is obvious and omitted.

Lemma C.11. Suppose that $J^P \in \mathcal{J}$. Define the policy functions of the agent's compensation $c : [0, w^{RC}] \to \mathbb{R}_+$ and the adjustment of the agent's continuation utility with respect to the observable shock $\phi : [0, w^{RC}] \to \mathbb{R}$ as follows: (1) If $w$ is in the interior of a working interval with associated working function being $J^E$, $c(w)$ and $\phi(w)$ are the minimizers of the objective function in (19) associated with $J^E(w)$. (2) If $w$ is not in the interior of any working interval, $c(w) = c^S(w)$ and $\phi(w) = \phi^S(w)$ with $c^S$ and $\phi^S$ being defined by (11), (C.3) and (C.4). Then $c(w) \in [0, (u')^{-1}(w^{RC})]$ and $\beta\phi(w) \in [-w^{RC}, w^{RC}]$.

I only need to show the case in which $w$ is in the interior of a working interval. Based on Lemma C.11, it is easy to check that (20) has a unique weak solution and $(A, C)$ generates the continuation utility process $W$. Now, I show that $(A, C)$ generates expected payoff $J^O(w) = J^E(w)$. Let $\{w^L_t\}_{t \in [0, T^S \wedge T]}$ be the left-continuous modification of $\{w_t\}_{t \in [0, T^S \wedge T]}$. The expected payoff of the principal is
\[ \Pi = E^A\left[\beta\int_0^{T^S \wedge T} e^{-\beta t}\left(g(a_t) - c_t\right)dt + e^{-\beta(T^S \wedge T)} J^S\left(w^L_{T^S \wedge T}\right) + e^{-\beta(T^S \wedge T)}\mathbf{1}\left(\{T^S > T\}\right)\left(J^P\left(w^L_{T^S \wedge T} + \beta\phi\left(w^L_{T^S \wedge T}\right)\right) - J^S\left(w^L_{T^S \wedge T}\right)\right)\right]. \]

For $t \in [0, T^S \wedge T]$, define
\[ \Pi_t = \beta\int_0^t e^{-\beta\tau}\left(g(a_\tau) - c_\tau\right)d\tau + e^{-\beta t}J^O\left(w^L_t\right) + e^{-\beta t}\mathbf{1}\left(\{t = T\}\right)\left(J^P\left(w^L_t + \beta\phi\left(w^L_t\right)\right) - J^O\left(w^L_t\right)\right). \]
Since $J^O\left(w^L_{T^S}\right) = J^S\left(w^L_{T^S}\right)$ if $T^S < T$, $\Pi = E^A\left[\Pi_{T^S \wedge T}\right]$. I show that $\{\Pi_t\}_{t \in [0, T^S \wedge T]}$ is a martingale. Notice that $J^E\left(w^L_t\right) = J^O\left(w^L_t\right)$ for $t \in [0, T^S \wedge T]$, hence
\[ e^{\beta t}d\Pi_t = \left[\beta\left(g(a_t) - c_t\right) - \beta J^E\left(w^L_t\right) + \beta J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) + h(a_t) - \lambda\phi\left(w^L_t\right)\right) + \frac{1}{2}\beta^2\rho(a_t)^2 J^E_{ww}\left(w^L_t\right) + \lambda\left(J^P\left(w^L_t + \beta\phi\left(w^L_t\right)\right) - J^E\left(w^L_t\right)\right)\right]dt + \beta\rho(a_t)J^E_w\left(w^L_t\right)dz^A_t + \lambda\left(J^P\left(w^L_t + \beta\phi\left(w^L_t\right)\right) - J^E\left(w^L_t\right)\right)dm_t. \]
Since $Z^A$ and $M$ are two martingales and $J^E$ satisfies (19) with $a\left(w^L_t\right)$, $c\left(w^L_t\right)$ and $\phi\left(w^L_t\right)$ being the minimizers, the drift term vanishes and $\{\Pi_t\}$ is a martingale. Therefore $\Pi = E^A\left[\Pi_{T^S \wedge T}\right] = \Pi_0 = J^E(w) = J^O(w)$ and we have the desired result.

C.5.2

J O (w) is the highest expected payoff that can be achieved under an incentive-compatible contract

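I start with the following lemma.

Lemma C.12. Suppose that $J^P \in \mathcal{J}$ and that $J^E$ is a concave solution to (19) such that $J^E(w) \ge J^S(w)$ for all $w \in [0, u(\infty))$. Then any incentive-compatible contract promising the agent expected utility $w$ cannot yield an expected payoff greater than $J^E(w)$.

Proof. Let $(A, C)$ be an incentive-compatible contract providing the agent with expected utility $w$. According to Proposition 2, a necessary condition for incentive compatibility of $(A, C)$ is
\[ \varphi_t \in [0, \rho(0)] \text{ if } a_t = 0, \text{ and } \varphi_t \ge \rho(a_t) \text{ if } a_t > 0 \text{ for all } t \in [0, T^R \wedge T]. \tag{C.5} \]
Let $T^R = \inf\left\{t < T : w_t = 0 \text{ or } w_t \ge w^{RC}\right\}$. The expected payoff of the principal generated by the contract is
\[ \Pi = E^A\left[\beta\int_0^{T^R \wedge T} e^{-\beta t}\left(g(a_t) - c_t\right)dt + e^{-\beta(T^R \wedge T)}\mathbf{1}\left(\{T > T^R\}\right)J^R\left(w^L_{T^R}\right) + e^{-\beta(T^R \wedge T)}\mathbf{1}\left(\{T < T^R\}\right)J^P\left(w^L_T + \beta\phi_T\right)\right]. \]
Define
\[ \bar\Pi = E^A\left[\beta\int_0^{T^R \wedge T} e^{-\beta t}\left(g(a_t) - c_t\right)dt + e^{-\beta(T^R \wedge T)}J^E\left(w^L_{T^R \wedge T}\right) + e^{-\beta(T^R \wedge T)}\mathbf{1}\left(\{T < T^R\}\right)\left(J^P\left(w^L_T + \beta\phi_T\right) - J^E\left(w^L_T\right)\right)\right]. \]
By assumption and since $J^S(w) \ge J^R(w)$, $J^E(w) \ge J^R(w)$ for all $w \in [0, u(\infty))$ and thus $\Pi \le \bar\Pi$. For $t \in [0, T^R \wedge T]$, define
\[ \bar\Pi_t = \beta\int_0^t e^{-\beta\tau}\left(g(a_\tau) - c_\tau\right)d\tau + e^{-\beta t}J^E\left(w^L_t\right) + e^{-\beta t}\mathbf{1}\left(\{t = T\}\right)\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right) \]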


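and $\bar\Pi = E^A\left[\bar\Pi_{T^R \wedge T}\right]$. Now I show that $\left\{\bar\Pi_t\right\}_{t \in [0, T^R \wedge T]}$ is a super-martingale. Notice that
\[ e^{\beta t}d\bar\Pi_t = \beta\left[g(a_t) - c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) + h(a_t) - \lambda\phi_t\right) + \frac{1}{2}\beta\varphi_t^2 J^E_{ww}\left(w^L_t\right) + \frac{\lambda}{\beta}\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right)\right]dt + \beta\varphi_t J^E_w\left(w^L_t\right)dz^A_t + \lambda\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right)dm_t. \]

Since $Z^A$ and $M$ are two martingales, I only need to show that
\[ D_t \equiv \beta\left[g(a_t) - c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) + h(a_t) - \lambda\phi_t\right) + \frac{1}{2}\beta\varphi_t^2 J^E_{ww}\left(w^L_t\right) + \frac{\lambda}{\beta}\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right)\right] \le 0 \]
for all $t \in [0, T^R \wedge T)$. If $a_t > 0$, by (C.5) and concavity of $J^E$, we have
\[ D_t \le \beta\left[g(a_t) - c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) + h(a_t) - \lambda\phi_t\right) + \frac{1}{2}\beta\rho(a_t)^2 J^E_{ww}\left(w^L_t\right) + \frac{\lambda}{\beta}\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right)\right]. \]
Because $J^E$ is a solution to (19), $D_t \le 0$. If $a_t = 0$, we have
\[ D_t \le \beta\left[-c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) - \lambda\phi_t\right) + \frac{1}{2}\beta\varphi_t^2 J^E_{ww}\left(w^L_t\right) + \frac{\lambda}{\beta}\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right)\right)\right] \tag{C.6} \]
and we need to consider two cases.

Case 1: $J^E_w\left(w^L_t\right) > J^P_w(w)$ for $w \in [0, u(\infty))$. By assumption, $J^E(0) \ge J^S(0) = J^P(0) = 0$, and
\[ J^E_w(w) > J^P_w(w) \text{ and } J^E(w) \ge J^P(w) \text{ for all } w \in [0, w^L_t] \tag{C.7} \]
with the second inequality being strict if $w > 0$. Therefore, if $\phi_t \le 0$, concavity of $J^E$ and (C.7) imply
\[ J^E\left(w^L_t\right) + \beta\phi_t J^E_w\left(w^L_t\right) \ge J^E\left(w^L_t + \beta\phi_t\right) \ge J^P\left(w^L_t + \beta\phi_t\right). \tag{C.8} \]
If $\phi_t > 0$, concavity of $J^P$ and (C.7) imply
\[ J^E\left(w^L_t\right) + \beta\phi_t J^E_w\left(w^L_t\right) \ge J^P\left(w^L_t\right) + \beta\phi_t J^P_w\left(w^L_t\right) \ge J^P\left(w^L_t + \beta\phi_t\right). \tag{C.9} \]
Hence
\[ \beta\phi_t J^E_w\left(w^L_t\right) + J^E\left(w^L_t\right) - J^P\left(w^L_t + \beta\phi_t\right) \ge 0. \tag{C.10} \]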

Let $\tilde w = u(c_t)$ and thus $J^R(\tilde w) = -c_t$. By the fact that $J^E(w) \ge J^R(w)$ for $w \in [0, u(\infty))$ and concavity of $J^E$ we have
\[ J^E\left(w^L_t\right) + c_t + J^E_w\left(w^L_t\right)\left(u(c_t) - w^L_t\right) \ge 0. \tag{C.11} \]
By plugging (C.10) and (C.11) into (C.6), we have $D_t < \beta\left[\frac{1}{2}J^E_{ww}\left(w^L_t\right)\beta(\varphi_t)^2\right] \le 0$.

Case 2: There is a $\tilde w \in [0, u(\infty))$ such that $J^E_w\left(w^L_t\right) = J^P_w(\tilde w)$. Hence, by (15), there is a $w' \in [0, u(\infty))$ such that $J^E_w\left(w^L_t\right) = J^S_w(w')$. Concavity of $J^E$ implies
\[ J^E\left(w^L_t\right) - J^E\left(w'\right) + J^E_w\left(w^L_t\right)\left(w' - w^L_t\right) \ge 0. \tag{C.12} \]
Notice that $J^E(w) \ge J^S(w)$ for $w \in [0, u(\infty))$. As a result, we have
\[ J^E\left(w^L_t\right) - J^S\left(w'\right) + J^S_w\left(w'\right)\left(w' - w^L_t\right) \ge 0. \tag{C.13} \]
First we plug (C.12) into (C.6) and obtain
\[ D_t \le \beta\left[-c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) - \frac{\lambda}{\beta}\left(\beta\phi_t - \left(w' - w^L_t\right)\right)\right)\right] + \lambda\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w^L_t\right) + J^E\left(w^L_t\right) - J^E\left(w'\right)\right) \]
\[ = \beta\left[-c_t - J^E\left(w^L_t\right) + J^E_w\left(w^L_t\right)\left(w^L_t - u(c_t) - \frac{\lambda}{\beta}\left(\beta\phi_t - \left(w' - w^L_t\right)\right)\right)\right] + \lambda\left(J^P\left(w^L_t + \beta\phi_t\right) - J^E\left(w'\right)\right). \]
Second we plug (C.13) in and obtain
\[ D_t \le \beta\left[-c_t - J^S\left(w'\right) + J^S_w\left(w'\right)\left(w' - u(c_t) - \lambda\tilde\phi_t\right)\right] + \lambda\left(J^P\left(w' + \beta\tilde\phi_t\right) - J^S\left(w'\right)\right). \]
Here $\tilde\phi_t = \left(\beta\phi_t - \left(w' - w^L_t\right)\right)/\beta$. Since $J^S$ satisfies (19), $D_t \le 0$ and $\left\{\bar\Pi_t\right\}_{t \in [0, T^R \wedge T]}$ is an $\mathcal{F}_t$-adapted super-martingale. Therefore $\bar\Pi = E^A\left[\bar\Pi_{T^R \wedge T}\right] \le \bar\Pi_0 = J^E(w)$ and we have the desired result.

Now we turn to the proof of the proposition. There are two cases:

Case 1: $w$ is in the interior of a working interval $[w_L, w_R]$ with associated working curve $J^E$, and thus $J^O(w) = J^E(w)$. According to the construction, $J^E$ is strictly concave and $J^E(w) \ge J^S(w)$ for all $w \in [0, w^{RC}]$. Thus by Lemma C.12, no incentive-compatible contract providing expected utility $w$ to the agent can generate an expected payoff of the principal greater than $J^E(w) = J^O(w)$.

Case 2: $w$ is not in any working interval. Suppose that $J^E$ is the solution to (19) with initial condition $J^E(w) = J^S(w)$ and $J^E_w(w) = J^S_w(w)$. By Lemma C.7, $J^E$ is concave. Now I show that $J^E(w) \ge J^S(w)$ for all $w \in [0, u(\infty))$. Suppose not and that, without loss of generality, there is a $\tilde w \in (w, w^{RC})$ such that $J^E(\tilde w) < J^S(\tilde w)$. Then there is a $w' \in (w, \tilde w)$ such that $J^E(w') = J^S(w')$. By Lemma C.10, there is a working interval containing $[w, \tilde w]$, which contradicts the assumption that $w$ is not in any working interval. Therefore, by Lemma C.12, there is no incentive-compatible contract providing the agent with $w$ and yielding an expected payoff greater than $J^E(w) = J^S(w)$. Hence we have the desired result.

C.6

Proof of Theorem 1

Before $T^S \wedge T$, $w_t \in (w_L, w_R)$. Let us return to the proof of Lemma C.12 and adopt the notation there. I show that if $a_t = 0$, $D_t < 0$ for $t \in [0, T^S \wedge T)$. Strict concavity of $J^E$ implies the following. In Case 1, (C.8) and (C.10) are strict inequalities and hence $D_t < 0$. In Case 2, if $w^L_t \ne w'$, (C.12) is a strict inequality, and if $w^L_t = w'$, (C.13) is a strict inequality, so $D_t < 0$. Therefore, we have the desired result.

D

Appendix to Subsections 4.3 and 4.4

In this section, it is convenient to introduce a standard Brownian motion $\{z^o_t\}$ which is independent of $Z$. This process serves as a publicly observable randomization that helps in the proofs of some of the results but would not affect the expected payoff under the optimal contract due to concavity of the value functions. Define the function $\varphi^o(a) = \sqrt{\varphi(a, 0)^2 - \varphi(a, 1)^2}$ for all $a \in [0, \bar a]$. Part (b) of Assumption 3 guarantees that $\varphi^o$ is well defined.

D.1

Proof of Lemma 3

I only show that $J^{k+1}(w, 0) \ge J^k(w, 0)$ for all $k \in \mathbb{N}$. The proof for the other sequence is similar. Let $\left(A^k, C^k\right)$ be the optimal contract promising the agent expected utility $w$ and started in the bad-state period in which $J^k(\cdot, 0)$ is the value function. The firm is going to experience $k - 1$ good-state periods in the future. Denote the agent's continuation utility process under this contract by $\left\{w^k_t\right\}$. Now let us implement $\left(A^k, C^k\right)$ starting in the bad-state period in which $J^{k+1}(\cdot, 0)$ is the value function and the firm is going to experience one more good-state period. Everything is the same except that in the last good-state period the continuation utility process under this implementation, $\left\{w^{k+1}_t\right\}$, follows
\[ dw^{k+1}_t = \beta\left(w^{k+1}_t - u(c^k_t) + h(a^k_t)\right)dt + \beta\varphi(a^k_t, 1)dz^A_t + \beta\varphi^o(a^k_t)dz^o_t. \]

Obviously this implementation is incentive compatible and the distributions of W k+1 and W k are identical. So the agent’s expected utility is w. Part (a) of Assumption 3 implies that the principal’s expected payoff is no less than J k (w, 0). Concavity of J k+1 (·, 0) implies that public randomizations cannot improve the expected payoff. So we have the desired result.
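The claim that $W^{k+1}$ and $W^k$ have identical distributions rests on the public randomization topping up the good-state volatility. A sketch of that step, assuming $z^A$ and $z^o$ are independent standard Brownian motions, taking the effort level $a$ as given, and using the definition $\varphi^o(a) = \sqrt{\varphi(a,0)^2 - \varphi(a,1)^2}$ from the beginning of this appendix:

```latex
\begin{align*}
\varphi^o(a)^2 &= \varphi(a,0)^2 - \varphi(a,1)^2, \\
\operatorname{Var}\!\left(\beta\varphi(a,1)\,dz^A_t + \beta\varphi^o(a)\,dz^o_t\right)
 &= \beta^2\left(\varphi(a,1)^2 + \varphi^o(a)^2\right)dt
  = \beta^2\varphi(a,0)^2\,dt .
\end{align*}
```

Under these assumptions the combined diffusion term in the additional good-state period has the same instantaneous variance as a term $\beta\varphi(a,0)\,dz^A_t$. Reading the latter as the bad-state volatility is an inference from the notation, not a statement taken from the text.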

D.2

Proof of Proposition 7

I only show the result about $\hat J^1$ and $\bar J(\cdot, 1)$ as the proof for $\hat J^0$ and $\bar J(\cdot, 0)$ is similar. Let $(A, C)$ be the contract promising $w$ in the good state. Denote by $T(n)$ the Poisson time of the $n$th state transition. For each $k \in \mathbb{N}$, let $\left(A^k, C^k\right)$ be the contract that is identical to $(A, C)$ up to $T(2k - 1)$, the time of a transition into the bad state, and then allows the agent to retire with his continuation utility at $T(2k - 1)$ matching that under $(A, C)$ for all histories. The difference between the payoffs of the principal at $T(2k - 1)$ under $\left(A^k, C^k\right)$ and $(A, C)$ is bounded from above by $\bar\delta \equiv \bar g - J^R(w^{RC})$. Obviously, the promise-keeping and incentive compatibility conditions are satisfied under $\left(A^k, C^k\right)$. We denote the principal's expected payoff by $\Pi^k$ and thus
\[ \Pi^k \ge \hat J^1 - E\left[\beta\int_{T(2k-1)}^{\infty} e^{-\beta t}\bar\delta\,dt\right] = \hat J^1 - E\left[e^{-\beta T(2k-1)}\right]\bar\delta. \tag{D.1} \]

Notice that $T(2k - 1) = \sum_{i=1}^{k} x^1_i + \sum_{i=1}^{k-1} x^0_i$ with $\left\{x^1_i\right\}_{i=1}^{k}$ being $k$ independent exponential random variables with rate parameter $\lambda(1)$ and $\left\{x^0_i\right\}_{i=1}^{k-1}$ being $k - 1$ ones with $\lambda(0)$. Without loss of generality, I assume that $\lambda(1) < \lambda(0)$. In addition, let $\left\{\bar x^1_i\right\}_{i=1}^{k}$ be $k$ independent exponential random variables with rate parameter $\lambda(0) - \lambda(1)$. Then the rate parameter of $\min\left\{x^1_i, \bar x^1_i\right\}$ is $\lambda(0)$ for all $i$ and $T(2k - 1) \ge \sum_{i=1}^{k}\min\left\{x^1_i, \bar x^1_i\right\} + \sum_{i=1}^{k-1} x^0_i$ with the right hand side of the inequality being the sum of $2k - 1$ independent exponential random variables with rate parameter $\lambda(0)$, which is a Gamma-$\left(2k - 1, \frac{1}{\lambda(0)}\right)$ distributed random variable. Therefore $E\left[e^{-\beta T(2k-1)}\right] < \left(\frac{1}{1 + \frac{\beta}{\lambda(0)}}\right)^{2k-1}$. Hence (D.1) implies that $\lim_{k \to \infty}\Pi^k = \hat J^1$. On the other hand, by the definition of $J^k(\cdot, 1)$, $\Pi^k \le J^k(w, 1)$ for all $k \in \mathbb{N}$. So we have the desired result.
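The discounting bound in this proof can be checked in closed form, because the Laplace transform of an exponential variable with rate $r$ is $r/(r+\beta)$. The sketch below compares the exact value of $E[e^{-\beta T(2k-1)}]$ with the Gamma bound; the parameter values and function names are hypothetical and chosen only so that the good-state rate is below the bad-state rate, as the proof assumes.

```python
def laplace_exponential(rate: float, beta: float) -> float:
    """E[exp(-beta * X)] for X ~ Exponential(rate)."""
    return rate / (rate + beta)


def discount_at_transition(k: int, lam_good: float, lam_bad: float,
                           beta: float) -> float:
    """E[exp(-beta * T(2k-1))]: k good-state and k-1 bad-state spells."""
    return (laplace_exponential(lam_good, beta) ** k
            * laplace_exponential(lam_bad, beta) ** (k - 1))


def gamma_bound(k: int, lam_bad: float, beta: float) -> float:
    """Laplace transform of a sum of 2k-1 Exponential(lam_bad) variables."""
    return laplace_exponential(lam_bad, beta) ** (2 * k - 1)


# Hypothetical parameters with lam_good < lam_bad, as assumed in the proof.
beta, lam_good, lam_bad = 0.1, 0.5, 1.0
exact = [discount_at_transition(k, lam_good, lam_bad, beta) for k in range(1, 51)]
bound = [gamma_bound(k, lam_bad, beta) for k in range(1, 51)]
```

Both sequences decay geometrically in $k$, which is what drives $E[e^{-\beta T(2k-1)}]\bar\delta$ to zero in (D.1).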

D.3

Proof of Proposition 8

Let $\bar J^S(\cdot, 1)$ and $\bar J^S(\cdot, 0)$ be the shirking value functions constructed with $\bar J(\cdot, 0)$ and $\bar J(\cdot, 1)$ being the post-shock value functions, respectively. Similarly, let $J^{Sk}(\cdot, 0)$ and $J^{Sk}(\cdot, 1)$ be the shirking value functions generated by using $J^k(\cdot, 0)$ and $J^k(\cdot, 1)$ as the post-shock value functions, respectively. I focus on the part of the proof for $\bar J(\cdot, 0)$ and $J^*(\cdot, 0)$ as the other part is similar. The proof is divided into two lemmas. In Lemma D.5, I show that the result holds over the region where $\bar J(\cdot, 0)$ strictly dominates $\bar J^S(\cdot, 1)$ and in Lemma D.6 I show that it holds over the region where $\bar J(\cdot, 0)$ and $\bar J^S(\cdot, 1)$ are equal. Before proceeding to the main part of the proof, I show four preliminary results.

Lemma D.1. For any $w \in [0, w^{RC}]$ and $k \in \mathbb{N}$, $J^k(w, 1) \ge J^k(w, 0)$ and thus $\bar J(w, 1) \ge \bar J(w, 0)$.

Proof. Let $\left(A^k, C^k\right)$ be the contract providing the agent with $w$ in a bad-state period in which $J^k(\cdot, 0)$ is the value function. Let $\{w_t\}$ be the continuation utility process and $\{\phi_t\}$ be the sensitivity process with respect to the persistent shocks. Let us turn to the good-state period in which $J^k(\cdot, 1)$ is the value function and denote the arrival times of the $2k - 1$ state transitions that the firm is going to experience by $\left\{\bar T(n)\right\}_{n=1}^{2k-1}$. We implement $\left(A^k, C^k\right)$ starting in this good-state period as follows. Let $T^C$ be the ringing time of a public Poisson clock with rate $\lambda(0)$. Define a new Poisson time sequence $\{T(n)\}_{n=1}^{2k-2}$ such that: if $T^C < \bar T(1)$, $T(1) = T^C$ and $T(i) = \bar T(i - 1)$ for $i = 2, ..., 2k - 2$; if $T^C > \bar T(1)$, we ignore the ringing and set $T(i) = \bar T(i + 1)$ for $i = 1, ..., 2k - 2$. In both cases, the sequence $\{T(n)\}_{n=1}^{2k-2}$ is drawn from the same distribution as the arrival times of the $2k - 2$ transitions that a firm is subject to in the bad-state period mentioned at the beginning and we implement $\left(A^k, C^k\right)$ accordingly such that the continuation utility process under this implementation, $\left\{w^k_t\right\}$, satisfies
\[ dw^k_t = \beta\left(w^k_t - u(c^k_t) + h(a_t)\right)dt + \beta\varphi(a_t, 1)dz^A_t + \beta\varphi^o(a_t)dz^o_t + \phi_t\left(-\lambda(0)dt + dn_t\right) \]
if $t \in \left[0, \min\left\{T^C, \bar T(1)\right\}\right]$, and in the case of $T^C > \bar T(1)$,
\[ dw^k_t = \beta\left(w^k_t - u(c^k_t) + h(a_t)\right)dt + \beta\varphi(a_t, 1)dz^A_t + \beta\varphi^o(a_t)dz^o_t \]
if $t \in \left[\bar T(2k-2), \bar T(2k-1)\right]$. Obviously this implementation is incentive compatible and the agent's expected utility is $w$; the principal's expected payoff is no less than $J^k(w, 0)$ but no greater than $J^k(w, 1)$. So $J^k(w, 1) \ge J^k(w, 0)$.

This lemma implies that, as $k$ converges to infinity, we have $\bar J(w, 1) \ge \bar J(w, 0)$ for all $w \in [0, w^{RC}]$ and the following results.

Lemma D.2. Over $[0, w^{RC}]$,
(a) the sequences $\left\{J^k(\cdot, 0)\right\}_{k \in \mathbb{N}}$ and $\left\{J^k(\cdot, 1)\right\}_{k \in \mathbb{N}}$ uniformly converge to $\bar J(\cdot, 0)$ and $\bar J(\cdot, 1)$ respectively, which are concave;
(b) $\bar J(\cdot, 0)$ and $\bar J(\cdot, 1)$ are differentiable; furthermore $\left\{J^k_w(\cdot, 0)\right\}_{k \in \mathbb{N}}$ and $\left\{J^k_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ uniformly converge to $\bar J_w(\cdot, 0)$ and $\bar J_w(\cdot, 1)$ respectively, which are Lipschitz continuous;
(c) the analogous results of Parts (a) and (b) hold with $\left\{J^{Sk}(\cdot, 0)\right\}_{k \in \mathbb{N}}$, $\left\{J^{Sk}(\cdot, 1)\right\}_{k \in \mathbb{N}}$, $\bar J^S(\cdot, 0)$, $\bar J^S(\cdot, 1)$ and their derivatives.

Proof. According to Proposition 5, $J^k(\cdot, 0)$ and $J^k(\cdot, 1)$ are strictly concave for all $k \in \mathbb{N}$. Then point-wise convergence implies uniform convergence$^{42}$ and we have Part (a).

To prove Part (b), I show the results about $\left\{J^k(\cdot, 1)\right\}_{k \in \mathbb{N}}$ and $\bar J(\cdot, 1)$ and their derivatives as the remaining part is similar. By Proposition 5, $J^k_w(\cdot, 1)$ is Lipschitz continuous for each $k$, so I show that $\left\{J^k_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ are bounded by the same Lipschitz constant. Note that, by Lemma 3 and Lemma D.1, $J^k(w, 1) \ge J^k(w, 0)$ for all $w \in [0, w^{RC}]$ such that $J^{Sk}(w, 1) > J^R(w)$. Hence for any $w \in [0, w^{RC}]$, either $J^k(w, 1) = \bar J(w, 0)$ or $w$ is in the interior of a working interval of $J^k(\cdot, 1)$. So we consider these two cases.

If $J^k(w, 1) = J^R(w)$, $\left|J^k_{ww}(w, 1)\right| \le \max_{\tilde w \in [0, w^{RC}]}\left|\frac{d^2}{dw^2}u^{-1}(\tilde w)\right|$, which is a constant independent of $w$ and $k$.

If $w$ is in a working interval of $J^k(\cdot, 1)$, $J^k(w, 1)$ satisfies (19). Since $J^k_w\left(w^{RC}, 0\right) \le J^k_w(w, 1) \le \bar J_w(0)$ and $J^R\left(w^{RC}\right) \le J^k(w, 1) \le \max_{\tilde w \in [0, w^{RC}]}\bar J(\tilde w)$ for all $k \in \mathbb{N}$ and the optimal $a$, $c$ and $\phi$ on the right hand side are chosen from compact sets, $\left|J^k_{ww}(w, 1)\right| < M$ with some $M$ independent of $k$ and $w$. So $\left\{J^k_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ are Lipschitz continuous and bounded by the same Lipschitz constant. By the Arzelà-Ascoli theorem,$^{43}$ $\left\{J^k_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ has a uniformly convergent sub-sequence and the limit function is also Lipschitz continuous and bounded by the same Lipschitz constant. Uniform convergence of $\left\{J^k(\cdot, 1)\right\}_{k \in \mathbb{N}}$ implies the uniqueness of the limit function, which is $\bar J_w(\cdot, 1)$, and $\bar J(\cdot, 1)$ is differentiable.$^{44}$

To prove Part (c), I only show the results for $\left\{J^{Sk}_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ and $\bar J^S_w(\cdot, 1)$ as the remaining part is similar. By Proposition 3, $J^{Sk}_{ww}(w, 1) = J^k_{ww}\left(\omega^{-1}(w), 1\right)\frac{d}{dw}\left(\omega^{-1}(w)\right)$ almost everywhere in $[0, w^{RC}]$. Here $\omega$ is defined by (14) with $J^P$ being $J^k(\cdot, 1)$. As I have shown in Part (b), $\left|J^k_{ww}\left(\omega^{-1}(w), 1\right)\right| < M$ almost everywhere. Since $\frac{d}{dw}\left(\omega^{-1}(w)\right) = \frac{1}{\omega'\left(\omega^{-1}(w)\right)}$ and $\omega^{-1}(w) \in [0, w^{RC}]$ for any $w \in [0, w^{RC}]$, we check that $\omega'(w)$ is bounded from below over $[0, w^{RC}]$. Let $w^P \in [0, w^{RC}]$ be such that $J^k_w\left(w^P, 1\right) = -\frac{1}{u'(0)}$, hence we have $\omega'(w) = \frac{\lambda(0)}{\beta+\lambda(0)}$ for $w \in [0, w^P]$ and
\[ \omega'(w) = \frac{\lambda(0)}{\beta+\lambda(0)} + \frac{\beta}{\beta+\lambda(0)}\underbrace{\left[\frac{-1}{J^k_w(w, 1)}\left((u')^{-1}\right)'\left(\frac{-1}{J^k_w(w, 1)}\right)\frac{J^k_{ww}(w, 1)}{\left(J^k_w(w, 1)\right)^2}\right]}_{>0} \]
for $w \in (w^P, w^{RC}]$. Therefore $\omega'(w) \ge \frac{\lambda(0)}{\beta+\lambda(0)}$ and then $\frac{d}{dw}\omega^{-1}(w) \le \frac{\beta+\lambda(0)}{\lambda(0)}$, which implies $\left|J^{Sk}_{ww}(w, 1)\right| \le M\frac{\beta+\lambda(0)}{\lambda(0)}$. Hence $\left\{J^{Sk}_w(\cdot, 1)\right\}_{k \in \mathbb{N}}$ are Lipschitz continuous and bounded by the same Lipschitz constant and we have the desired result.

$^{42}$ See Rockafellar [1970] Theorem 10.8.
$^{43}$ See Renardy and Rogers [2004] Theorem 4.17.
$^{44}$ See Rudin [1976] Theorem 7.17.

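Lemma D.3. Define the function $\bar\Psi^k : \mathbb{R}^2 \times [0, w^{RC}] \to \mathbb{R}$ such that
\[ \bar\Psi^k\left(\bar J, \bar J_w, \bar w\right) = \min_{a,c,\phi} -2\left[g(a, 1) - c - \bar J + \bar J_w\left(\bar w - u(c) + h(a) - \lambda(1)\phi\right) + \frac{\lambda(1)}{\beta}\left(J^k(\bar w + \beta\phi, 0) - \bar J\right)\right] \Big/ \beta\rho(a)^2, \]
and define $\bar\Psi : \mathbb{R}^2 \times [0, w^{RC}] \to \mathbb{R}$ such that
\[ \bar\Psi\left(\bar J, \bar J_w, \bar w\right) = \min_{a,c,\phi} -2\left[g(a, 1) - c - \bar J + \bar J_w\left(\bar w - u(c) + h(a) - \lambda(1)\phi\right) + \frac{\lambda(1)}{\beta}\left(\bar J(\bar w + \beta\phi, 0) - \bar J\right)\right] \Big/ \beta\rho(a)^2. \]
Then $\bar\Psi$ and $\bar\Psi^k$ are continuous for all $k \in \mathbb{N}$ and $\lim_{k \to \infty}\bar\Psi^k\left(\bar J, \bar J_w, \bar w\right) = \bar\Psi\left(\bar J, \bar J_w, \bar w\right)$. Furthermore, if we define $\Psi^k$ and $\Psi$ symmetrically with $\left\{J^k(\cdot, 1)\right\}_{k \in \mathbb{N}}$ and $\bar J(\cdot, 1)$, the result holds as well.

Proof. Continuity is shown in the proof of Lemma C.4. So I show convergence. Let $\hat\phi^k$ be the optimal $\phi$ for the objective in the definition of $\bar\Psi^k$ and $\hat\phi$ that of $\bar\Psi$. Set $\hat w^k = \bar w + \beta\hat\phi^k$ and $\hat w = \bar w + \beta\hat\phi$. Define $\xi_k \equiv -\lambda(1)\hat\phi^k\bar J_w + \frac{\lambda(1)}{\beta}\left(J^k\left(\hat w^k, 0\right) - \bar J\right)$ and $\xi \equiv -\lambda(1)\hat\phi\bar J_w + \frac{\lambda(1)}{\beta}\left(\bar J(\hat w, 0) - \bar J\right)$. I show that $\lim_{k \to \infty}\xi_k = \xi$.

Case 1: $\bar J(\cdot, 0)$ is strictly concave. I show that $\lim_{k \to \infty}\hat w^k = \hat w$, the unique value such that $\bar J_w(\hat w, 0) = \bar J_w$ or $\hat w = 0$ if $\bar J_w(0, 0) < \bar J_w$. Suppose not, then without loss of generality, there is a sub-sequence of $\left\{\hat w^k\right\}_{k \in \mathbb{N}}$, $\left\{\tilde w^k\right\}_{k \in \mathbb{N}}$, such that $\lim_{k \to \infty}\tilde w^k = \tilde w > \hat w$. Denote the associated sub-sequence of $\left\{J^k_w(\cdot, 0)\right\}_{k \in \mathbb{N}}$ by $\left\{\tilde J^k_w\right\}_{k \in \mathbb{N}}$. Uniform convergence of $\left\{J^k_w(\cdot, 0)\right\}_{k \in \mathbb{N}}$ implies that $\bar J_w(\tilde w, 0) = \lim_{k \to \infty}\tilde J^k_w\left(\tilde w^k\right) = \bar J_w = \bar J_w(\hat w, 0)$, which contradicts strict concavity of $\bar J(\cdot, 0)$. Therefore we have $\lim_{k \to \infty}\hat w^k = \hat w$ and thus $\lim_{k \to \infty}\xi_k = \xi$.

Case 2: $\bar J(\cdot, 0)$ is not strictly concave and there is an interval $[w', w'']$ such that $\bar J_w(w, 0) = \bar J_w$ if $w \in [w', w'']$. So any $\phi$ satisfying $\bar w + \beta\phi \in [w', w'']$ is optimal and I choose $\hat\phi = \frac{1}{\beta}(w' - \bar w)$ and $\hat w = w'$. In fact, $\xi$ does not change if $\hat w \in [w', w'']$ changes. By a similar argument to that in Case 1, $\left\{\hat w^k\right\}_{k \in \mathbb{N}}$ "converges" into $[w', w'']$. Namely, the limit of any convergent sub-sequence is in $[w', w'']$. Therefore we have $\lim_{k \to \infty}\xi_k = \xi$.

Note that the objective function in $\bar\Psi^k\left(\bar J, \bar J_w, \bar w\right)$ is continuous in $\xi_k$, $a$ and $c$. Therefore $\lim_{k \to \infty}\xi_k = \xi$ implies the desired result.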

Lemma D.4. Suppose that $[w_L, w_R] \subset [0, w^{RC}]$ is an interval such that $\bar J(w_L, 0) = \bar J^S(w_L, 1)$, $\bar J(w_R, 0) = \bar J^S(w_R, 1)$ and $\bar J(w, 0) > \bar J^S(w, 1)$ for $w \in (w_L, w_R)$. Then there is a sequence of functions over $[0, w^{RC}]$, $\left\{H^k\right\}_{k \in \mathbb{N}}$, such that $H^k$ is a working function of $J^k(\cdot, 0)$ for $k \in \mathbb{N}$. $\left\{H^k\right\}_{k \in \mathbb{N}}$ converges to $\bar J(\cdot, 0)$ and $\left\{H^k_w\right\}_{k \in \mathbb{N}}$ converges to $\bar J_w(\cdot, 0)$ over $[w_L, w_R]$ uniformly. The results also hold for $\bar J(\cdot, 1)$.

Proof. First, I show that the set $\left\{w \in [w_L, w_R] : \bar J(w, 0) > \bar J^S(w, 1) + \varepsilon\right\}$ is connected for some small number $\varepsilon > 0$. Suppose not such that the closed set
\[ I_n \equiv \left\{w \in (w_L, w_R) : \bar J(w, 0) \le \bar J^S(w, 1) + \frac{1}{n}\right\} \]
is nonempty for all $n \in \mathbb{N}$. Obviously $I_n \subset I_{n'}$ if $n > n'$. By the Nested Interval theorem, $\lim_{n \to \infty}I_n \ne \emptyset$ and $\left\{w \in (w_L, w_R) : \bar J(w, 0) \le \bar J^S(w, 1)\right\} \ne \emptyset$, a contradiction. So, by uniform convergence of $\left\{J^k(\cdot, 0)\right\}$, there is a $\bar k \in \mathbb{N}$ such that for any $k > \bar k$, $\max_{w \in [0, w^{RC}]}\left|\bar J(w, 0) - J^k(w, 0)\right| < \varepsilon$. Therefore the set
\[ U_k = \left\{w \in [w_L, w_R] : J^k(w, 0) \ge \bar J^S(w, 1)\right\} \]
is connected and $J^k(\cdot, 0)$ is represented by one of its working functions for all $k > \bar k$, which is denoted by $H^k$, because in the interior of $U_k$, $J^k(w, 0) > \bar J^S(w, 1) \ge J^{Sk-1}(w, 1)$.

Since $\left\{J^k(\cdot, 0)\right\}_{k \in \mathbb{N}}$ converges to $\bar J(\cdot, 0)$ over $[w_L, w_R]$ uniformly, $\lim_{k \to \infty}U_k = (w_L, w_R)$. So $\left\{H^k\right\}_{k \in \mathbb{N}}$ converges to $\bar J(\cdot, 0)$ over $(w_L, w_R)$ uniformly. Continuity of $\left\{H^k\right\}_{k \in \mathbb{N}}$ and $\bar J(\cdot, 0)$ implies that convergence holds over $[w_L, w_R]$. As shown in Lemma D.2, $\left\{H^k_w\right\}_{k \in \mathbb{N}}$ are Lipschitz continuous and bounded by a common Lipschitz constant over $[w_L, w_R]$. So the sequence has a convergent sub-sequence in the sense of uniform convergence. Hence, uniform convergence of $\left\{H^k\right\}_{k \in \mathbb{N}}$ implies the uniqueness of the limit, which is $\bar J_w(\cdot, 0)$, and we have the desired result.

Lemma D.5. Suppose that $[w_L, w_R] \subset [0, w^{RC}]$ is an interval such that $\bar J(w_L, 0) = \bar J^S(w_L, 1)$, $\bar J(w_R, 0) = \bar J^S(w_R, 1)$ and $\bar J(w, 0) > \bar J^S(w, 1)$ for $w \in (w_L, w_R)$, then $J^*(w, 0) = \bar J(w, 0)$ for $w \in [w_L, w_R]$. Analogous results hold for $J^*(\cdot, 1)$ and $\bar J(\cdot, 1)$.

Proof. We continue to use the notation in the proofs of Lemmas D.3 and D.4. I show that $[w_L, w_R]$ is a working interval of $J^*(\cdot, 0)$. Namely, $H$ is the solution to
\[ H_{ww}(w) = \Psi\left(H(w), H_w(w), w\right) \text{ for } w \in [0, w^{RC}] \tag{D.2} \]
with boundary conditions $H(w_L) = \bar J(w_L, 0)$, $H_w(w_L) = \bar J_w(w_L, 0)$ and $H(w) \ge \bar J^S(w, 1)$ for $w \in [0, w^{RC}]$. Let $\left\{H^k\right\}_{k \in \mathbb{N}}$ be the sequence of working functions described in Lemma D.4. For any $k \in \mathbb{N}$, $H^k$ is the solution to
\[ H^k_{ww}(w) = \Psi^k\left(H^k(w), H^k_w(w), w\right) \tag{D.3} \]
with boundary conditions $H^k(w_L) = J^k(w_L, 0)$ and $H^k_w(w_L) = J^k_w(w_L, 0)$. By Lemma C.4, the solutions of (D.2) and (D.3) are unique. By Lemma D.3, $\lim_{k \to \infty}\Psi^k\left(\bar J, \bar J_w, \bar w\right) = \Psi\left(\bar J, \bar J_w, \bar w\right)$ for any $\left(\bar J, \bar J_w, \bar w\right) \in \mathbb{R}^2 \times [0, w^{RC}]$. Then $\lim_{k \to \infty}H^k(w) = H(w)$ and $\lim_{k \to \infty}H^k_w(w) = H_w(w)$ for all $w \in [0, w^{RC}]$. Note that $\lim_{k \to \infty}J^{Sk}(w, 1) = \bar J^S(w, 1)$ and $H^k(w) \ge J^{Sk}(w, 1)$ for all $k \in \mathbb{N}$ and $w \in [0, w^{RC}]$. So $H(w) \ge \bar J^S(w, 1)$ for all $w \in [0, w^{RC}]$. Furthermore $H^k(w_L) = J^{Sk}(w_L, 1)$ and $H^k(w_R) = J^{Sk}(w_R, 1)$ imply $H(w_L) = \bar J^S(w_L, 1)$ and $H(w_R) = \bar J^S(w_R, 1)$. So $H$ is a working function of $J^*(\cdot, 0)$ and $J^*(w, 0) = \bar J(w, 0)$ for $w \in [w_L, w_R]$.

Lemma D.6. Suppose that $[w_L, w_R] \subset [0, w^{RC}]$ is an interval such that $\bar J(w, 0) = \bar J^S(w, 1)$ for all $w \in [w_L, w_R]$, then $J^*(w, 0) = \bar J^S(w, 1)$ for all $w \in [w_L, w_R]$. Symmetrically, suppose that $[w_L, w_R] \subset [0, w^{RC}]$ is an interval such that $\bar J(w, 1) = \bar J^S(w, 0)$ for all $w \in [w_L, w_R]$, then $J^*(w, 1) = \bar J^S(w, 0)$ for all $w \in [w_L, w_R]$.

Proof. I show the results for $J^*(w, 0)$ and $\bar J(w, 1)$. Without loss of generality, I assume that $[w_L, w_R]$ is the largest interval satisfying the conditions in the statement of the lemma. Suppose not, so that $J^*(\bar w, 0) > \bar J^S(\bar w, 1)$ for some $\bar w \in [w_L, w_R]$. Then there is a working function of $J^*(\cdot, 0)$, $H$, such that $H(w_l) = \bar J^S(w_l, 1)$, $H(w_r) = \bar J^S(w_r, 1)$ and $H(w) > \bar J^S(w, 1)$ for $w \in (w_l, w_r)$, with $w_L \le w_l < w_r \le w_R$ and $\bar w \in (w_l, w_r)$.

Let $\hat w \in (w_l, w_r)$ and let $\hat h$ be the solution to the differential equation $\hat h_{ww}(w) = \Psi(\hat h(w), \hat h_w(w), w)$ with boundary conditions $\hat h(\hat w) = \bar J^S(\hat w, 1)$ and $\hat h_w(\hat w) = \bar J^S_w(\hat w, 1)$. Since $\hat h(\hat w) = \bar J^S(\hat w, 1) < H(\hat w)$, Lemma C.6 implies $\hat h(w_l) < \bar J^S(w_l, 1)$ and $\hat h(w_r) < \bar J^S(w_r, 1)$. By the continuous dependence of the solution on the initial conditions, there is a small real number $\varepsilon > 0$ such that the solution, $h$, to the differential equation $h_{ww}(w) = \Psi(h(w), h_w(w), w)$ with initial conditions $h(\hat w) = \bar J^S(\hat w, 1)$ and $h_w(\hat w) = \bar J^S_w(\hat w, 1) + \varepsilon$ satisfies $\bar J^S(w_l, 1) - h(w_l) \equiv \epsilon_1 > 0$ and $\bar J^S(w_r, 1) - h(w_r) \equiv \epsilon_2 > 0$. Since $h_w(\hat w) > \bar J^S_w(\hat w, 1)$, there is a $w^h \in (\hat w, w_r)$ such that $h(w^h) - \bar J^S(w^h, 1) \equiv \epsilon > 0$.

Let $\{h^k\}_{k \in \mathbb{N}}$ be the sequence of functions such that, for any $k \in \mathbb{N}$, $h^k$ is the solution to the ODE $h^k_{ww}(w) = \Psi^k(h^k(w), h^k_w(w), w)$ with boundary conditions $h^k(\hat w) = J^{Sk}(\hat w, 1)$ and $h^k_w(\hat w) = J^{Sk}_w(\hat w, 1) + \varepsilon$. By Lemma D.3 and the fact that $\lim_{k \to \infty} h^k(\hat w) = h(\hat w)$, we have $h(w) = \lim_{k \to \infty} h^k(w)$ for all $w \in [0, w^{RC}]$. Therefore, there exists a $k_1$ such that for all $k > k_1$ the following conditions are satisfied: (1) $|h^k(w^h) - h(w^h)| < \epsilon$, so $h^k(w^h) > \bar J^S(w^h, 1) \ge J^{Sk}(w^h, 1)$; (2) $|h^k(w_l) - h(w_l)| < \epsilon_1/3$, so $h^k(w_l) < h(w_l) + \epsilon_1/3 = \bar J^S(w_l, 1) - 2\epsilon_1/3$; (3) $|h^k(w_r) - h(w_r)| < \epsilon_2/3$, so $h^k(w_r) < h(w_r) + \epsilon_2/3 = \bar J^S(w_r, 1) - 2\epsilon_2/3$.

On the other hand, since $\lim_{k \to \infty} J^{Sk}(w_l, 1) = \bar J^S(w_l, 1)$, there is a $k_2$ such that for all $k > k_2$, $J^{Sk}(w_l, 1) > \bar J^S(w_l, 1) - \epsilon_1/3$, and since $\lim_{k \to \infty} J^{Sk}(w_r, 1) = \bar J^S(w_r, 1)$, there is a $k_3$ such that for all $k > k_3$, $J^{Sk}(w_r, 1) > \bar J^S(w_r, 1) - \epsilon_2/3$. Hence, for all $k > \max\{k_1, k_2, k_3\}$, we have $h^k(w_l) < J^{Sk}(w_l, 1)$, $h^k(w_r) < J^{Sk}(w_r, 1)$ and $h^k(w^h) > \bar J^S(w^h, 1) \ge J^{Sk}(w^h, 1)$. Therefore, by Lemma C.10, there is a working function of $J^k(\cdot, 0)$ which dominates $h^k$, and thus $J^k(w^h, 0) > h^k(w^h) > \bar J^S(w^h, 1)$. Then $\bar J(w^h, 0) = \lim_{k \to \infty} J^k(w^h, 0) \ge \lim_{k \to \infty} h^k(w^h) = h(w^h) > \bar J^S(w^h, 1)$ with $w^h \in [w_L, w_R]$, a contradiction.

Lemmas D.5 and D.6 imply Proposition 8. The proofs of the lemmas also imply the following corollary.

Corollary D.1. $\bar J(\cdot, 1)$ and $\bar J(\cdot, 0)$ belong to $\mathcal{J}$.
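As a reading aid, the bookkeeping with thirds of $\epsilon_1$ and $\epsilon_2$ in the proof of Lemma D.6 can be collected into one chain of inequalities per endpoint. This is only a sketch: it takes the gaps to be $\epsilon_1 = \bar J^S(w_l, 1) - h(w_l)$ and $\epsilon_2 = \bar J^S(w_r, 1) - h(w_r)$ as defined in the proof, assumes $k > \max\{k_1, k_2, k_3\}$, and restates the proof's own bounds without adding any new result.

```latex
\begin{align*}
h^k(w_l) &< h(w_l) + \tfrac{\epsilon_1}{3}
          = \bar J^S(w_l,1) - \tfrac{2\epsilon_1}{3}
          < \bar J^S(w_l,1) - \tfrac{\epsilon_1}{3}
          < J^{Sk}(w_l,1), \\
h^k(w_r) &< h(w_r) + \tfrac{\epsilon_2}{3}
          = \bar J^S(w_r,1) - \tfrac{2\epsilon_2}{3}
          < \bar J^S(w_r,1) - \tfrac{\epsilon_2}{3}
          < J^{Sk}(w_r,1), \\
h^k(w^h) &> h(w^h) - \epsilon
          = \bar J^S(w^h,1)
          \ge J^{Sk}(w^h,1).
\end{align*}
```

The first two lines put $h^k$ strictly below $J^{Sk}(\cdot, 1)$ at both endpoints, and the third puts it strictly above at the interior point $w^h$, which is the configuration to which Lemma C.10 is applied.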

D.4 Proof of Proposition 9

Let $T(n)$ be the stopping time defined in Appendix D.2. If $n$ is even, it is the time of a transition into the bad state. For each $k \in \mathbb{N}$, let us consider the following counterfactual. The contract is terminated at $T(2k)$ with the principal's terminal payoff being characterized by $\bar J(w_{T(2k)}, 0)$. Denote the optimal contract promising the agent $w$ in this contractual environment by $(\hat A^k, \hat C^k)$. Then Proposition 8 implies that the principal's expected payoff is $\bar J(w, 0)$ under this contract. Now let us return to the actual environment and implement $(\hat A^k, \hat C^k)$ up to $T(2k)$, and then let the agent retire with $w_{T(2k)}$ matching that in the counterfactual for all histories. Obviously, this implementation is incentive compatible and yields a lower expected payoff, with the difference being bounded by $\max_{w \in [0, w^{RC}]} \left[\bar J(w, 0) - J^R(w)\right]$.

Let $\hat\pi^k$ be the principal's expected payoff under the implementation of $(\hat A^k, \hat C^k)$ in the actual environment described above. Then $\hat\pi^k \ge \bar J(w, 0) - \varrho E\left[e^{-\beta T(2k)}\right]$ with $\varrho = \max_{w \in [0, w^{RC}]} \left[\bar J(w, 0) - J^R(w)\right]$. As shown in the proof of Proposition D.2, $\lim_{k \to \infty} E\left[e^{-\beta T(2k)}\right] = 0$ and thus $\lim_{k \to \infty} \hat\pi^k \ge \bar J(w, 0)$. On the other hand, Proposition 9 implies that $\bar J(w, 0)$ is the upper bound of the expected payoff under an incentive-compatible contract. So $\lim_{k \to \infty} \hat\pi^k = \bar J(w, 0)$ and we have the desired result.
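The squeeze argument in this proof can be laid out as a short chain. The closed form for $E[e^{-\beta T(2k)}]$ in the second line is an illustrative assumption and is not taken from the text: it supposes that the observable state follows a two-state Markov chain with constant transition intensities $\lambda(0)$ and $\lambda(1)$, so that $T(2k)$ is a sum of $k$ independent exponential sojourn times in each state, and that $\beta > 0$. The actual definition of $T(n)$ is the one in Appendix D.2.

```latex
\begin{align*}
\bar J(w,0) \;\ge\; \hat\pi^k
  &\;\ge\; \bar J(w,0) - \varrho\, E\!\left[e^{-\beta T(2k)}\right], \\
E\!\left[e^{-\beta T(2k)}\right]
  &= \left(\frac{\lambda(0)}{\lambda(0)+\beta}\right)^{k}
     \left(\frac{\lambda(1)}{\lambda(1)+\beta}\right)^{k}
     \longrightarrow 0 \quad \text{as } k \to \infty
     \quad \text{(assumed exponential sojourn times)}, \\
\text{hence}\quad \lim_{k\to\infty} \hat\pi^k &= \bar J(w,0).
\end{align*}
```

Under that assumption each factor is strictly below one, so the discounted loss from terminating at $T(2k)$ vanishes geometrically in $k$; the proof itself only needs the limit to be zero.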

D.5 Properties of the Value Functions and the Optimal Contracts

The following proposition shows strict concavity of the two value functions.

Proposition D.1. $\bar J(\cdot, 1)$ and $\bar J(\cdot, 0)$ are strictly concave.

Proof. According to their characterizations, if one of the two value functions is strictly concave, so is the other. So I show strict concavity of $\bar J(\cdot, 0)$. Suppose not, so that there is an interval $[\tilde w', \tilde w''] \subset [0, w^{RC}]$ such that $\bar J_{ww}(w, 0) = 0$ for all $w \in [\tilde w', \tilde w'']$. Since $J^R$ is strictly concave, there is an interval $[w', w''] \subset [\tilde w', \tilde w'']$ such that $\bar J(w, 0) > J^R(w)$ for $w \in [w', w'']$. I show that $\bar J(\cdot, 1)$ is strictly concave over $[w', w'']$.

Proposition 9 and Lemma D.1 imply $\bar J(w, 0) \le \bar J(w, 1)$. By Proposition C.2, the fact that $\bar J(w, 0) > J^R(w)$ for $w \in [w', w'']$ implies that $J^R(w) < \bar J(w, 0) \le \bar J(w, 1)$. Therefore, $[w', w'']$ is a subset of a working interval of $\bar J(\cdot, 1)$, which is denoted by $[w_L, w_R]$, and $\bar J(\cdot, 1)$ is represented by the working function associated with it. Furthermore, $\bar J(w_L, 1) = J^R(w_L)$, $\bar J(w_R, 1) = J^R(w_R)$, $\bar J_w(w_L, 1) = J^R_w(w_L)$ and $\bar J_w(w_R, 1) = J^R_w(w_R)$, so $\bar J_w(w_L, 1) > \bar J_w(w_R, 1)$. So Lemma C.5 implies that $\bar J(\cdot, 1)$ is strictly concave over $[w_L, w_R]$ and hence over $[w', w'']$, a contradiction.

The following proposition shows that $\bar J(\cdot, 1)$ and $\bar J(\cdot, 0)$ strictly dominate $J^R$ over the same region of continuation utilities.

Proposition D.2. For any $w \in [0, w^{RC}]$, $\bar J(w, 0) > J^R(w)$ if and only if $\bar J(w, 1) > J^R(w)$.

Proof. Lemma D.1 implies that $\bar J(w, 1) \ge \bar J(w, 0)$ for all $w \in [0, w^{RC}]$. We only need to show that $\bar J(w, 1) > J^R(w)$ implies $\bar J(w, 0) > J^R(w)$. This holds because $\bar J(w, 0) \ge \bar J^S(w, 1)$, and $\bar J^S(w, 1) > J^R(w)$ if and only if $\bar J(w, 1) > J^R(w)$ by part (a) of Proposition C.2. So we have the desired result.

A straightforward corollary of this proposition is that in the region where $\bar J(\cdot, 1)$ strictly dominates $\bar J(\cdot, 0)$, both $\bar J(\cdot, 1)$ and $\bar J(\cdot, 0)$ strictly dominate $J^R$.

Corollary D.2. For any $w \in [0, w^{RC}]$, $\bar J(w, 1) > \bar J(w, 0)$ implies $\bar J(w, 1) > J^R(w)$ and $\bar J(w, 0) > J^R(w)$.

Based on the result above, the next proposition shows that the agent's continuation utility is adjusted with respect to the state transitions as long as $\bar J(w, 1) > J^R(w)$.

Proposition D.3. Suppose that $[w_L, w_R] \subset [0, u(\infty))$ is an interval such that $\bar J(w, 1) > \bar J(w, 0)$ for all $w \in (w_L, w_R)$ and $\bar J(w, 1) = \bar J(w, 0)$ for $w = w_L$ or $w_R$. Then $\bar J_w(w_L, 1) = \bar J_w(w_L, 0)$ and $\bar J_w(w_R, 1) = \bar J_w(w_R, 0)$.

Proof. According to the continuity of the two value functions and their characterization, we only need to show that $\bar J_w(0, 1) = \bar J_w(0, 0)$ in the case of $w_L = 0$. Notice that (15) and the definition of $\omega$ in (14) imply $\bar J_w(0, 1) = J^S_w(0, 1)$. On the other hand, $\bar J(0, 1) = \bar J(0, 0) = J^S(0, 1)$ and $\bar J(w, 1) \ge \bar J(w, 0) \ge \bar J^S(w, 1)$ for all $w \in [0, w^{RC}]$. Therefore, we have the desired result.

E Proof of Theorem 2

Equivalently, we show the following: if $[\bar w_L, \bar w_R]$ is a working interval in the good state, then there exist $w_L, w_R$ such that $\bar w_L < w_L < w_R < \bar w_R$, and the agent shirks over the intervals $[\bar w_L, w_L]$ and $[w_R, \bar w_R]$ in the bad state. I only prove the result about $[\bar w_L, w_L]$, as the other side is similar. Suppose not, so that the agent exerts effort in both states over this interval. Due to Part (b) of Proposition C.2, $\bar J(\bar w_L, 1) = \bar J^S(\bar w_L, 0) \le \bar J(\bar w_L, 0)$ and thus, together with $\bar J(\cdot, 1) \ge \bar J(\cdot, 0)$, $\bar J(\bar w_L, 1) = \bar J(\bar w_L, 0)$. If $\bar w_L > 0$, the smooth pasting condition implies
\[
\bar J_w(\bar w_L, 1) = \bar J^S_w(\bar w_L, 0). \tag{E.1}
\]
If $\bar w_L = 0$, Proposition 3 implies (E.1). So $\bar J(\cdot, 1)$ and $\bar J(\cdot, 0)$ satisfy
\[
\bar J_{ww}(\bar w_L, 1) = \min_{a, c, \phi} \frac{-2\left[ g(a, 1) - c - \bar J(\bar w_L, 1) + \bar J_w(\bar w_L, 1)\left(\bar w_L - u(c) + h(a) - \lambda(1)\phi\right) + \frac{\lambda(1)}{\beta}\left(\bar J(\bar w_L + \beta\phi, 0) - \bar J(\bar w_L, 1)\right) \right]}{\beta\rho(a, 1)^2};
\]
\[
\bar J_{ww}(\bar w_L, 0) = \min_{a, c, \phi} \frac{-2\left[ g(a, 0) - c - \bar J(\bar w_L, 0) + \bar J_w(\bar w_L, 0)\left(\bar w_L - u(c) + h(a) - \lambda(0)\phi\right) + \frac{\lambda(0)}{\beta}\left(\bar J(\bar w_L + \beta\phi, 1) - \bar J(\bar w_L, 0)\right) \right]}{\beta\rho(a, 0)^2}.
\]
Equation (E.1) and the slope matching rule allow us to simplify the equations to
\[
\bar J_{ww}(\bar w_L, 1) = \min_{a, c} \frac{-2\left[ g(a, 1) - c - \bar J(\bar w_L, 1) + \bar J_w(\bar w_L, 1)\left(\bar w_L - u(c) + h(a)\right) \right]}{\beta\rho(a, 1)^2};
\]
\[
\bar J_{ww}(\bar w_L, 0) = \min_{a, c} \frac{-2\left[ g(a, 0) - c - \bar J(\bar w_L, 0) + \bar J_w(\bar w_L, 0)\left(\bar w_L - u(c) + h(a)\right) \right]}{\beta\rho(a, 0)^2}.
\]
The optimal $c$ is the same in the two HJB equations. So Assumption 3 implies
\[
\bar J_{ww}(\bar w_L, 1) < \bar J_{ww}(\bar w_L, 0) \tag{E.2}
\]
since $\bar J(\bar w_L, 1) = \bar J(\bar w_L, 0)$ and $\bar J_w(\bar w_L, 1) = \bar J_w(\bar w_L, 0)$. (E.2) implies that there exists a $\tilde w_r > \bar w_L$ such that $\bar J(w, 1) < \bar J(w, 0)$ for $w \in (\bar w_L, \tilde w_r]$, a contradiction to the fact that $\bar J(w, 1) \ge \bar J(w, 0)$ for $w \in [0, u(\infty))$. Thus there must be an interval $[\bar w_L, w_L]$ over which the agent shirks in the bad state. The second claim in this theorem is straightforward given the above result, Proposition C.2, and the fact that $\bar J(\cdot, 1)$ dominates $\bar J(\cdot, 0)$.
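The step from (E.2) to the contradiction is a second-order Taylor argument, which can be sketched as follows. The symbol $\Delta$ is introduced here only for illustration, and the sketch assumes that both value functions are twice continuously differentiable just to the right of $\bar w_L$, which the text does not state explicitly at this point.

```latex
\begin{align*}
\Delta(w) &\equiv \bar J(w,1) - \bar J(w,0), \\
\Delta(\bar w_L) &= 0, \qquad
\Delta_w(\bar w_L) = 0, \qquad
\Delta_{ww}(\bar w_L) < 0 \quad \text{by (E.2)}, \\
\Delta(w) &= \tfrac{1}{2}\,\Delta_{ww}(\bar w_L)\,(w-\bar w_L)^2
            + o\!\left((w-\bar w_L)^2\right) < 0
  \quad \text{for } w > \bar w_L \text{ sufficiently close to } \bar w_L .
\end{align*}
```

So equal levels and equal slopes at $\bar w_L$, combined with a strictly smaller second derivative in the good state, would push $\bar J(\cdot, 1)$ strictly below $\bar J(\cdot, 0)$ immediately to the right of $\bar w_L$, which is what the dominance of $\bar J(\cdot, 1)$ rules out.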

References

B. Bernanke and M. Gertler. Agency costs, net worth, and business fluctuations. American Economic Review, 79:14–31, 1989.
B. Bernanke, M. Gertler, and S. Gilchrist. The financial accelerator and the flight to quality. Review of Economics and Statistics, 78:1–15, 1996.
E. A. Coddington and N. Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, 1955.
J. Cvitanić and J. Zhang. Contract Theory in Continuous-Time Models. Springer, 2013.
P. DeMarzo and Y. Sannikov. Optimal security design and dynamic capital structure in a continuous-time agency model. Journal of Finance, 61:2681–2724, 2006.
P. DeMarzo, M. Fishman, Z. He, and N. Wang. Dynamic agency and q theory of investment. The Journal of Finance, 67:2295–2340, 2012.
A. Dixit and R. Pindyck. Investment under Uncertainty. Princeton University Press, 1994.
R. Elliott. Stochastic Calculus and Applications. Springer-Verlag, 1982.
P. Hartman. Ordinary Differential Equations. Society for Industrial and Applied Mathematics, 2002.
F. Hoffmann and S. Pfeil. Reward for luck in a dynamic agency model. The Review of Financial Studies, 23:3329–3345, 2010.
B. Holmström. Moral hazard and observability. The Bell Journal of Economics, 10:74–91, 1979.
B. Holmström. Moral hazard in teams. The Bell Journal of Economics, 13:324–340, 1982.
B. Holmström and P. Milgrom. Aggregation and linearity in the provision of intertemporal incentives. Econometrica, 55:303–328, 1987.
Chi-Fu Huang and H. Pagès. Optimal consumption and portfolio policies with an infinite horizon: Existence and convergence. The Annals of Applied Probability, 2:36–64, 1992.
J. Jacod and A. Shiryaev. Limit Theorems for Stochastic Processes. Springer, 2002.
N. Kiyotaki and J. Moore. Credit cycles. Journal of Political Economy, 105:211–248, 1997.
T. Piskorski and A. Tchistyi. Optimal mortgage design. The Review of Financial Studies, 23:3098–3140, 2010.
T. Piskorski and A. Tchistyi. Stochastic house appreciation and optimal mortgage lending. The Review of Financial Studies, 24:1407–1446, 2011.
A. Rampini. Entrepreneurial activity, risk, and the business cycle. Journal of Monetary Economics, 51:555–573, 2004.
M. Renardy and R. Rogers. An Introduction to Partial Differential Equations. Springer-Verlag, 2004.
R. Rockafellar. Convex Analysis. Princeton University Press, 1970.
H. Royden. Real Analysis. Prentice Hall, 1988.
W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 1976.
Y. Sannikov. A continuous-time version of the principal-agent problem. Review of Economic Studies, 75:957–984, 2008.
H. Schättler and J. Sung. The first-order approach to the continuous-time principal-agent problem with exponential utility. Journal of Economic Theory, 61:331–371, 1993.
D. Stroock. Lectures on Stochastic Analysis: Diffusion Theory. Cambridge University Press, 1987.
M. Szydlowski. Ambiguity in dynamic contracts. Working Paper, University of Minnesota, 2012.
N. Williams. On dynamic principal-agent problems in continuous time. Working Paper, University of Wisconsin-Madison, 2009.
K. Zhong. A Course in Probability Theory. Academic Press, 1974.
J. Zhu. Optimal contracts with shirking. Review of Economic Studies, 80:812–839, 2013.