Journal of Economic Dynamics & Control 42 (2014) 1–12
An escape time interpretation of robust control

In-Koo Cho a,c, Kenneth Kasa b,*

a Department of Economics, University of Illinois, Urbana, IL 61801, USA
b Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada V5A 1S6
c College of Economics and Finance, Hanyang University, Seongdong-gu, Seoul 133-1791, Korea
Article history: Received 28 August 2013; received in revised form 23 February 2014; accepted 28 February 2014; available online 12 March 2014

JEL classification: C61; D81

Keywords: Robust control; Large deviations

Abstract: This paper studies the problem of an agent who wants to prevent the state from exceeding a critical threshold. Even though the agent is presumed to know the model, the optimal policy is computed by solving a conventional robust control problem. That is, robustness is induced here by objectives rather than uncertainty, and so is an example of the duality between risk-sensitivity and robustness. However, here the agent only incurs costs upon escape to a critical region, not during ‘normal times’. We argue that this is often a more realistic model of macroeconomic policymaking.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction

Robust control methods have become popular in economics.1 This popularity stems from their ability to provide convenient formalizations of ambiguity and Knightian Uncertainty.2 The presumption in typical applications of robust control is that agents want to maximize expected utility, but confront forms of model uncertainty that are difficult to capture with conventional finite-dimensional Bayesian priors. In response, agents construct a set of plausible priors, and optimize against the worst-case prior. This can be interpreted as maximizing expected utility with respect to a pessimistically distorted prior. This paper argues that robust control methods are useful even when agents know the correct model, or are able to construct a unique prior about it. In many policy settings, what policymakers really care about are extreme events, e.g., crises or ‘market meltdowns’. Using results from Dupuis and McEneaney (1997), we show that robust control policies can be interpreted as minimizing the expectation of an exponential function of the escape time to a critical loss threshold. This implies agents are especially averse to rapid escapes. The fact that expectations are evaluated using the true (undistorted) probability measure implies agents trust their models and confront no model uncertainty. Instead, robust policies are useful because they reduce the likelihood that a known and trusted system will rapidly hit an undesirable threshold.3
* Corresponding author. Tel.: +1 778 782 5406. E-mail addresses: [email protected] (I.-K. Cho), [email protected] (K. Kasa). URL: http://www.sfu.ca/kkasa (K. Kasa).
1 See Hansen and Sargent (2008) for a textbook treatment. For an up-to-date survey see Hansen and Sargent (2011).
2 Strzalecki (2011) provides axiomatic foundations linking robust control, ambiguity, and Knightian Uncertainty.
3 This class of problems has been studied in the engineering literature. Meerkov and Runolfsson (1988) and Clark and Vinter (2012) provide examples.
http://dx.doi.org/10.1016/j.jedc.2014.02.014
That robust control, motivated by model uncertainty, can be reinterpreted as an aspect of preferences, motivated by ‘risk-sensitivity’, is already well known. Mathematically, this equivalence derives from a Legendre transform duality between the two.4 What is new here is our specification of the costs experienced by our risk-sensitive agent. Normally, these costs are determined by the instantaneous loss function of the agent, which depends on the current values of the state and control variables. In contrast, we suppose our agent experiences no state costs whatsoever during ‘normal times’, i.e., when instantaneous losses are below the threshold. What matters to him is the frequency of disasters. Despite this change in the cost function, we show there is still a connection between robustness and risk-sensitivity. However, from a mathematical standpoint, our result is based more on the Feynman–Kac formula than on the Legendre transform. We make no effort here to derive this boundary cost objective function from first principles. A natural question, of course, is why the policymaker does not simply maximize the welfare of the agents who elected or appointed him. One possibility is that policymakers lack the sort of detailed information on individual preferences that would allow them to perform this sort of optimization, and so resort to simpler and more robust rules of thumb, like avoiding disasters. Another possibility is that agents themselves experience ‘zones of indifference’, perhaps based on information processing constraints and ‘rational inattention’, which make them primarily sensitive to thresholds. However, we leave the exploration of these possibilities to future research. It should be noted, however, that our analysis does not require the assumption that costs only depend on hitting times. Neglecting running costs simply makes the math easier, and in our opinion, makes the demonstrated correspondence more interesting.
The remainder of the paper is organized as follows. Section 2 lays out a conventional continuous-time Linear-Quadratic Regulator (LQR) problem, and shows that the optimally controlled state evolves as an Ornstein–Uhlenbeck process. Section 3 does the same thing for the robust control case. Section 4 first compares the stationary and hitting-time distributions implied by conventional and robust control policies. The stationary distributions are both Gaussian, but robust policies produce distributions with a smaller variance. Using results from Day (1983), we show that first hitting-time densities are asymptotically exponential (in the small noise limit), and that robust policies produce densities that are skewed toward longer escape times. We then discuss the sense in which conventional robust control policies are implied by an objective that penalizes rapid hitting times. Section 5 applies the results to an example in which a firm attempts to maximize the expected present value of dividend payments out of a stochastic net cash flow process subject to a bankruptcy constraint. We show that an increase in bankruptcy costs and an increased preference for robustness both lead to an increase in the dividend payment threshold. Section 6 concludes and offers a few suggestions for future research.
2. The conventional LQR

Given our focus on hitting and escape time issues, it proves convenient to work in continuous time. So consider an agent with the following objective function:

$$V(x_0) = \min_u E_0 \int_0^\infty \frac{1}{2}\left[x(t)^2 + \lambda u(t)^2\right] e^{-\rho t}\,dt \qquad (2.1)$$

where x(t) is the state, u(t) is a control variable, and λ is a parameter representing the relative cost of control. Without loss of generality, we assume that x(t) and u(t) are both scalars. The state transition equation is given by the following linear diffusion process:

$$dx = (-ax + bu)\,dt + \sigma\,dW \qquad (2.2)$$

where W(t) is a Brownian motion process. Employing Ito's Lemma, the Hamilton–Jacobi–Bellman (HJB) equation for this problem is given by the following second-order ordinary differential equation:

$$\rho V(x) = \min_u \left\{\frac{1}{2}\left(x^2 + \lambda u^2\right) + V'(x)(-ax + bu) + \frac{1}{2}\sigma^2 V''(x)\right\} \qquad (2.3)$$

which yields the optimal policy function, $u(x) = -(b/\lambda)V'(x)$. Substituting this back in, and conjecturing that $V(x) = \frac{1}{2}Px^2 + C$ produces the following algebraic Riccati equation for P:

$$\rho P = 1 + \frac{b^2}{\lambda}P^2 - 2P\left(a + \frac{b^2}{\lambda}P\right) \qquad (2.4)$$

which has the following unique positive solution:5

$$P = \frac{-(\rho + 2a) + \sqrt{(\rho + 2a)^2 + 4b^2\lambda^{-1}}}{2b^2\lambda^{-1}} \qquad (2.5)$$

4 Hansen and Sargent (2008) provide a detailed discussion of the connections between robustness and risk-sensitivity.
5 Given P, one can readily verify that the constant is given by $C = \sigma^2 P/(2\rho)$. Note, this constant does not influence behavior.
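As a quick numerical check of (2.4)–(2.5), the following Python sketch computes P at the parameter values used later in Section 4.1 (ρ = 0.05, λ = 1.0, b = 0.14, a = −0.5ρ) and verifies that it solves the Riccati equation:

```python
import math

# Parameter values from the numerical example in Section 4.1
rho, lam, b, a = 0.05, 1.0, 0.14, -0.025   # a = -0.5*rho, so rho + 2a = 0

# Eq. (2.5): unique positive root of the Riccati equation (2.4),
# which can be rearranged as (b^2/lam)*P^2 + (rho + 2a)*P - 1 = 0.
k = b**2 / lam
P = (-(rho + 2*a) + math.sqrt((rho + 2*a)**2 + 4*k)) / (2*k)

# Check P against (2.4): rho*P = 1 - 2*a*P - (b^2/lam)*P^2
residual = rho*P - (1 - 2*a*P - k*P**2)
print(P, residual)  # P = sqrt(lam)/b ≈ 7.1429 when rho + 2a = 0
```

With ρ + 2a = 0 the solution collapses to $P = \sqrt{\lambda}/b$, which is why this parameterization is convenient later on.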
For future reference, note that the optimally controlled state evolves according to the following Ornstein–Uhlenbeck process:

$$dx = -\left(a + \frac{b^2}{\lambda}P\right)x\,dt + \sigma\,dW \equiv -\beta x\,dt + \sigma\,dW \qquad (2.6)$$

This process has a Gaussian stationary distribution, with a zero long-run mean and a long-run variance of $\sigma^2/2\beta$. Keep in mind the crucial assumption here is that the agent only cares about expected losses, which depend only on the average behavior of the state. Our contention is that in practice policymakers often care more about sample paths, and in particular are especially averse to extreme events, like ‘market meltdowns’.

3. A robust LQR

Now suppose the agent distrusts the model given by Eq. (2.2), in a way that cannot be captured by a conventional Bayesian prior. Following Hansen and Sargent (2008), we assume that the agent enlists the services of a so-called ‘evil agent’, who is imagined to select models so as to maximize the agent's expected losses. Since this worst-case model depends on the agent's own policy, the agent views himself as being immersed in a dynamic zero-sum game. To prevent the agent from being unduly pessimistic, in the sense that he would be hedging against empirically implausible models, the evil agent is assumed to pay a penalty that is proportional to the relative entropy between the evil agent's worst-case model and the agent's own benchmark model in (2.2). It turns out that in continuous time these alternative models differ only in their drift terms. To see how this works, let $q_t^0$ be the probability measure defined by the Brownian motion process in the benchmark model (2.2), and let $q_t$ be some alternative probability measure, defined by some competing model. The (discounted) relative entropy between $q_t$ and $q_t^0$ is then defined as follows:6

$$R(q) = \int_0^\infty e^{-\rho t}\left[\int \log\left(\frac{dq_t}{dq_t^0}\right)dq_t\right]dt \qquad (3.1)$$

Evidently, R(q) is just an expected log-likelihood ratio statistic, with expectations computed using the distorted probability measure.
It can also be interpreted as the Kullback–Leibler ‘distance’ between $q_t$ and $q_t^0$. From Girsanov's Theorem we have

$$\int \log\left(\frac{dq_t}{dq_t^0}\right)dq_t = \tilde{E}\left[\frac{1}{2}\int_0^t |h_s|^2\,ds\right]$$

where $\tilde{E}$ denotes expectations with respect to the distorted measure $q_t$, and $h_s$ represents a square-integrable process that is progressively measurable with respect to the filtration generated by $q_t$. Again from Girsanov's Theorem, we can view $q_t$ as being induced by the following drift-distorted Brownian motion:7

$$\tilde{W}(t) = W(t) - \int_0^t h_s\,ds$$

which then defines the following conveniently parameterized set of alternative models:

$$dx = (-ax + bu + \sigma h)\,dt + \sigma\,d\tilde{W} \qquad (3.2)$$
The evil agent picks h so as to maximize the agent's losses, subject to a (discounted) relative entropy penalty. This gives rise to the following dynamic zero-sum game:

$$V(x_0) = \min_u \max_h E_0 \int_0^\infty \frac{1}{2}\left[x(t)^2 + \lambda u(t)^2 - \theta h(t)^2\right]e^{-\rho t}\,dt \qquad (3.3)$$

subject to the state transition equation in (3.2). The crucial parameter here is θ, which penalizes the actions of the evil agent. We shall see that as θ → ∞ the agent's policy converges to the conventional (nonrobust) policy. Using Ito's Lemma, the Hamilton–Jacobi–Isaacs equation for this game can be written as follows:8

$$\rho V(x) = \min_u \max_h \left\{\frac{1}{2}\left(x^2 + \lambda u^2 - \theta h^2\right) + V'(x)\left(-ax + bu + \sigma h\right) + \frac{1}{2}\sigma^2 V''(x)\right\} \qquad (3.4)$$

which gives rise to the following policy functions:

$$u(x) = -\frac{b}{\lambda}V'(x) \qquad\qquad h(x) = \frac{\sigma}{\theta}V'(x)$$

6 See Hansen et al. (2006) for a detailed discussion of robust control in continuous-time models, and in particular, on the role of discounting in the definition of relative entropy.
7 There are some subtleties here arising from the possibility that $\tilde{W}$ and W generate different filtrations. See Hansen et al. (2006) for details.
8 As always, the question arises as to whether we can reverse the ordering of the min and max operators here. In what follows we assume this ‘Isaacs condition’ is satisfied. Again, see Hansen et al. (2006) for details.
Substituting these back into (3.4) and conjecturing that $V(x) = \frac{1}{2}P_\theta x^2 + C_\theta$ produces the following modified Riccati equation for $P_\theta$:

$$\left(\frac{b^2}{\lambda} - \frac{\sigma^2}{\theta}\right)P_\theta^2 + (\rho + 2a)P_\theta - 1 = 0$$

which has the unique positive solution:9

$$P_\theta = \frac{-(\rho + 2a) + \sqrt{(\rho + 2a)^2 + 4\left(b^2\lambda^{-1} - \sigma^2\theta^{-1}\right)}}{2\left(b^2\lambda^{-1} - \sigma^2\theta^{-1}\right)} \qquad (3.5)$$
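A short Python sketch makes the comparative statics of (3.5) concrete, again using the Section 4.1 parameter values (so that ρ + 2a = 0):

```python
import math

# Section 4.1 parameter values: rho = 0.05, lam = 1.0, b = 0.14,
# sigma^2 = 0.01, a = -0.5*rho.
rho, lam, b, sigma2, a = 0.05, 1.0, 0.14, 0.01, -0.025

def P_theta(theta):
    # Eq. (3.5); requires theta > sigma2*lam/b**2 (the breakdown point,
    # footnote 9). theta = inf recovers the nonrobust solution (2.5).
    k = b**2/lam - sigma2/theta
    return (-(rho + 2*a) + math.sqrt((rho + 2*a)**2 + 4*k)) / (2*k)

P = P_theta(float('inf'))   # nonrobust benchmark
print(P, P_theta(1.0), P_theta(5000.0))
# P_theta exceeds P for finite theta, and P_theta(5000) is already
# indistinguishable from P, consistent with the text's use of
# theta = 5000 to approximate the nonrobust policy.
```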
Notice that $P_\theta \to P$ as θ → ∞, so that robust control policies converge to their traditional Rational Expectations counterparts as the evil agent's actions are penalized more severely. Also note the breakdown of Certainty Equivalence, since now σ² enters the agent's decision rule. Assuming the agent's doubts are only ‘in his head’, so that the benchmark model is in fact the true data-generating process, implies that the state follows the Ornstein–Uhlenbeck process

$$dx = -\left(a + \frac{b^2}{\lambda}P_\theta\right)x\,dt + \sigma\,dW \equiv -\beta_\theta x\,dt + \sigma\,dW \qquad (3.6)$$

where $\beta_\theta$ is defined exactly as before. The only difference is that P is replaced by $P_\theta$.

4. Robustness and escape

This section consists of two parts. The first part compares the first hitting time densities of traditional and robust control policies. We do this by comparing the large deviation properties of the two Ornstein–Uhlenbeck processes in Eqs. (2.6) and (3.6). We show that the hitting time densities of robust policies are shifted toward longer first hitting times. In the second part, we interpret these results using results from Dupuis and McEneaney (1997). We show that if agents minimize the expectation of an exponentially decreasing function of the state variable's first hitting time, the resulting optimal policy function corresponds to a robust control policy function.

4.1. First hitting-time densities

Notice that traditional and robust control policies differ only in their resulting value function parameters, P and $P_\theta$. Comparing Eqs. (2.5) and (3.5), it is clear that $P_\theta > P$. Since the policy functions take the form $u(x) = -(b/\lambda)Px$, this implies robust policies are more ‘activist’. Fig. 1 presents plots of the stationary state densities implied by the robust and nonrobust policy functions. Since in both cases the state follows an Ornstein–Uhlenbeck process, they are both Gaussian. The preference parameters are ρ = 0.05 and λ = 1.0. The state transition parameters are σ² = 0.01, b = 0.14, and a = −0.5ρ.
Note that a is set so that $P_\theta$ has the simple evaluation $P_\theta = (b^2\lambda^{-1} - \sigma^2\theta^{-1})^{-1/2}$. Also note that these parameter values imply the breakdown point is $\hat{\theta} = 0.5$. The depicted robust density pertains to the value θ = 1.0, while the nonrobust density is approximated by setting θ = 5000. Not surprisingly, the greater activism of the robust policy produces a stationary state density with smaller variance. By itself, this suggests that robust policies are less prone to disasters. However, it is important to keep in mind that the kinds of disasters that are likely to matter in practice are those that result from a sequence of small shocks, as opposed to a single large shock. Crises typically occur when an economy has been pushed to ‘the edge’, e.g., following a gradual (and presumably undetected) build-up of debt and excessive leverage. Since the optimally controlled state process is not i.i.d., the stationary variance can be a misleading indicator of first-hitting times. For example, the standard deviations of the above distributions are not that different. The nonrobust standard deviation is 0.21, while the robust standard deviation is 0.17. A naive use of these standard deviations to calculate disaster probabilities would suggest that there is not much difference.

Unfortunately, the first-passage density of an Ornstein–Uhlenbeck process does not exist in closed form.10 However, since we are interested in the probability of disasters, which are presumably rare events, we can obtain a small-noise approximation using the pioneering large deviation results of Freidlin and Wentzell (1998). To apply these results, start by scaling the state process as follows: $dx = -\beta x\,dt + \sqrt{\varepsilon}\sigma\,dW$. Next, let $\bar{x}$ be a critical threshold, and suppose we are interested in keeping the state within the symmetric interval $[-\bar{x}, \bar{x}]$. If we then define $\tau = \inf\{t > 0 : |x(t)| \geq \bar{x}\}$ as the first-passage time to the boundary, we have (see Theorems 3.1 and 4.1 in ch. 4 of Freidlin and Wentzell, 1998)

$$\lim_{\varepsilon \to 0}\,\varepsilon \log\{E(\tau)\} = \frac{2\beta\bar{x}^2}{\sigma^2} \qquad (4.1)$$

9 We must impose the existence condition $\theta > \sigma^2\lambda/b^2$. The lower bound here corresponds to the agent's ‘breakdown point’. See, e.g., Hansen and Sargent (2008).
10 However, there is a well known series expansion, known as the ‘confluent hypergeometric function’.
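The standard deviations quoted above, and the large-deviation exponents that govern (4.1), can be checked numerically (a sketch; the mapping from θ to β follows (2.6) and (3.6)):

```python
import math

# Parameter values from the text: rho = 0.05, lam = 1.0, b = 0.14,
# sigma^2 = 0.01, a = -0.5*rho; theta = 5000 proxies the nonrobust case.
rho, lam, b, sigma2, a = 0.05, 1.0, 0.14, 0.01, -0.025

def beta(theta):
    # beta = a + (b^2/lam)*P_theta, with P_theta from (3.5)
    k = b**2/lam - sigma2/theta
    P = (-(rho + 2*a) + math.sqrt((rho + 2*a)**2 + 4*k)) / (2*k)
    return a + (b**2/lam)*P

beta_n, beta_r = beta(5000.0), beta(1.0)

# Stationary standard deviations sqrt(sigma^2/(2*beta)):
sd_n = math.sqrt(sigma2/(2*beta_n))   # nonrobust: ~0.21
sd_r = math.sqrt(sigma2/(2*beta_r))   # robust:    ~0.17

# Large-deviation exponents 2*beta*x_bar^2/sigma^2 from (4.1), x_bar = 1:
print(sd_n, sd_r, 2*beta_n/sigma2, 2*beta_r/sigma2)
```

The two standard deviations look similar, but the escape-time exponents differ substantially, which is the point the text makes next.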
Fig. 1. Stationary state densities.

Fig. 2. First passage densities.
Hence, the mean first-passage time is an exponential function of the process parameters. This has two important implications. First, it implies that disasters are rare events. Second, it implies that even small differences in parameters can have large effects on mean first-passage times. For example, setting $\bar{x} = 1$ in the above example, so that we are essentially considering a five sigma event, then using the above parameter values in (4.1) implies that the mean first-passage time for the nonrobust policy is about 15 years. (The discount parameter is suggestive of an annual time period.) In contrast, the robust policy implies a mean first-passage time of 457 years! Note that naive use of the central limit theorem to predict disaster probabilities would produce two errors. First, it would lend a false sense of security. After all, a one-time five sigma deviation would be a one in a million event. Second, it would lead you to believe that robust policies do not make much difference (at least in percentage terms), since they would both appear to rule out escapes on any relevant time scale. Avoidance of these kinds of errors is the raison d'être of large deviations theory.

Our central contention here is that policymakers care about sample paths, not averages. Comparing mean first-passage times is a useful start, but again, the first moment is itself just an average, and one might argue that what really matters is the probability of rapid escapes. Otherwise, policy might unduly focus on very low probability events that nonetheless exert a strong influence on the mean. Ideally, what we want is the entire first-passage density. Day (1983) shows that this density is asymptotically (in the small noise limit) exponential. Specifically, let $f(\tau)$ be the first-passage density to some boundary $\bar{x}$, and let $\bar{\tau} = E(\tau)$ be its mean. Then Day (1983) shows that

$$\lim_{\varepsilon \to 0} f(\tau) = \bar{\tau}^{-1} e^{-\tau/\bar{\tau}}$$

Fig. 2 plots $f(\tau)$ for the same parameter values as in Fig. 1. Evidently, not only does the robust policy produce a larger mean first-passage time, it is much less likely to produce rapid escapes. For example, the nonrobust policy has a 28% chance of producing a crisis within 5 years, whereas with the robust policy the probability of a crisis remains negligible over any relevant time horizon.
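The 28% figure follows directly from Day's exponential approximation together with the mean escape times quoted above (roughly 15 years nonrobust, 457 years robust); a one-line Python check:

```python
import math

def crisis_prob(horizon, tau_bar):
    # P(tau <= horizon) under Day's asymptotic density f(tau) = exp(-tau/tau_bar)/tau_bar
    return 1.0 - math.exp(-horizon/tau_bar)

print(crisis_prob(5, 15.0))    # nonrobust: about 0.28
print(crisis_prob(5, 457.0))   # robust: about 0.011, i.e. negligible
```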
4.2. Escape time control

The previous section showed that a robust control policy, designed to keep the average performance of an unknown model near some desired operating level, has the indirect effect of reducing disaster probabilities. Here we approach the problem from the opposite direction. Now suppose the agent has full confidence in his model, but rather than try to keep the system near a target, he tries to keep the system away from a critical threshold. We shall now show that the solution of this problem can be cast as a conventional robust control problem, i.e., as a dynamic zero-sum game. The new twist here is that now the evil agent not only pays a relative entropy cost, but also a time cost. Once again, let $\tau = \inf\{s > 0 : x(t+s) \geq \bar{x}\}$ be the first-passage time to some critical threshold $\bar{x}$, and consider an agent with the following objective function:

$$V(x) = \min_u E_t \exp\left\{\frac{1}{\varepsilon}\left[\int_t^\tau \frac{1}{2}\lambda u(s)^2\,ds - \theta\tau\right]\right\} \qquad (4.2)$$

subject to the scaled state transition equation

$$dx = (ax + bu)\,dt + \sqrt{\varepsilon}\sigma\,dW \qquad (4.3)$$
Notice there is no model uncertainty here. The expectations operator in (4.2) is based on the known law of motion in (4.3). Also notice that no explicit state running costs are incurred. Rather, state costs are implicit, and reflect a fear of the boundary. Finally, notice that the agent wants τ to be big, and the fact that losses depend exponentially on τ implies that the agent is especially averse to rapid escapes. Using the Feynman–Kac formula to evaluate the expectation in (4.2) yields the following HJB equation:11

$$\min_u \left\{\frac{1}{2}\varepsilon\sigma^2 V''(x) + (ax + bu)V'(x) - \frac{1}{\varepsilon}\left(\theta - \frac{1}{2}\lambda u^2\right)V(x)\right\} = 0$$

We shall study this equation using two tricks pioneered by Fleming (1978). First, define the log transformation, $F(x) = -\varepsilon\log(V(x))$, and write the HJB equation as follows:

$$\max_u \left\{\frac{1}{2}\varepsilon\sigma^2 F''(x) - \frac{1}{2}\sigma^2\left(F'(x)\right)^2 + (ax + bu)F'(x) + \theta - \frac{1}{2}\lambda u^2\right\} = 0$$

where now the boundary condition becomes $F(\bar{x}) = 0$ (assuming we normalize $V(\bar{x}) = 1$). At first sight, this seems like regress, not progress, since we have converted a linear ODE into a nonlinear ODE. However, we can now exploit Fleming's second trick to express this nonlinearity as the outcome of an optimization problem. Specifically, using the fact that

$$p\,f(x,u) - \frac{1}{2}\sigma^2 p^2 = \min_v\left\{p\left[f(x,u) + \sigma v\right] + \frac{1}{2}v^2\right\}$$

we can write the HJB equation as follows:

$$\frac{1}{2}\varepsilon\sigma^2 F''(x) + \max_u \min_v\left\{F'(x)\left(ax + bu + \sigma v\right) + \frac{1}{2}v^2 - \frac{1}{2}\lambda u^2 + \theta\right\} = 0 \qquad (4.4)$$
Now comes the crucial insight. Notice that Eq. (4.4) is the HJB equation for the following zero-sum differential game:

$$F_\varepsilon(x) = \max_u \min_v E_0 \int_0^\tau \frac{1}{2}\left[2\theta + v^2 - \lambda u^2\right]dt$$
$$\text{s.t.}\quad dx = (ax + bu + \sigma v)\,dt + \sqrt{\varepsilon}\sigma\,dW \qquad (4.5)$$

Notice the resemblance here to a traditional robust control problem. A hypothetical evil agent distorts the model's drift term subject to a penalty. As usual, the evil agent's cost function depends on the magnitude (relative entropy) of his distortion, as measured by v². The new twist here is that his cost function also depends on the duration of his distortion, as captured by the penalty parameter θ. If θ = 0, the agent does not care about escapes, and given the absence of state costs, the optimal policy is clearly u = 0. Despite this surface resemblance, keep in mind that the introduction of a pessimistically distorted model does not reflect model uncertainty. It reflects a desire to avoid a boundary. This (endogenous) boundary cost makes this problem like a ‘pursuit-evasion’ game. The policymaker tries to delay hitting a boundary, while his evil twin tries to accelerate the hitting time. As a result, the calibration of θ works somewhat differently here. Rather than being related to detection error probabilities, θ measures the flow marginal benefits from keeping the system alive (or the annuitized opportunity cost of allowing it to fail). The Markov perfect Nash policies associated with this game are as follows:

$$u = \frac{b}{\lambda}F_\varepsilon'(x) \qquad\qquad v = -\sigma F_\varepsilon'(x) \qquad (4.6)$$
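As a sanity check on (4.6), one can verify numerically that these policies attain the saddle point of the Hamiltonian inside (4.4) for a fixed costate value p = F′(x). The sketch below uses illustrative numbers (the Fig. 3 parameterization; the specific x and p values are arbitrary):

```python
import numpy as np

# Hamiltonian from (4.4): H(u,v) = p*(a*x + b*u + sigma*v) + 0.5*v^2 - 0.5*lam*u^2 + theta
a, b, lam, sigma, theta = 0.025, 0.14, 3.0, 0.1, 0.2
x, p = 0.5, -7.7   # arbitrary state and costate values for the check

def H(u, v):
    return p*(a*x + b*u + sigma*v) + 0.5*v**2 - 0.5*lam*u**2 + theta

u_star, v_star = (b/lam)*p, -sigma*p   # the candidate policies (4.6)

# Grid search: u* should maximize H(., v*); v* should minimize H(u*, .)
grid = np.linspace(-5, 5, 100001)
u_hat = grid[np.argmax(H(grid, v_star))]
v_hat = grid[np.argmin(H(u_star, grid))]
print(u_star, u_hat, v_star, v_hat)
```

Since H is concave in u and convex in v, the grid optima coincide with the first-order conditions behind (4.6).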
Unfortunately, after substituting these back into the HJB equation in (4.4) we get an ODE that does not have a closed-form solution. The good news, however, is that we are not really interested in solving this problem for an arbitrary ε. The idea here is to avoid crises, which are presumably rare events. As a result, we can settle for a small noise approximation. Dupuis and McEneaney (1997) show that $\lim_{\varepsilon\to 0} F_\varepsilon(x) = F_0(x)$, where $F_0(x)$ is the unique ‘viscosity solution’ of the deterministic differential game obtained by setting ε = 0 in (4.4).12 Substituting the policy functions in (4.6) into the HJB equation in (4.4) and then setting ε = 0 gives us the following small noise limit:

$$\frac{1}{2}\left(b^2\lambda^{-1} - \sigma^2\right)\left(F_0'\right)^2 + axF_0' + \theta = 0$$

To decide which is the appropriate root of this quadratic, remember that u = 0 when θ = 0. Evidently, we must therefore take the positive square root when a > 0 and the negative root when a < 0. This gives us the following solution for $F_0'$:

$$F_0'(x) = \frac{-ax \pm \sqrt{(ax)^2 - 2\theta\left(b^2\lambda^{-1} - \sigma^2\right)}}{b^2\lambda^{-1} - \sigma^2} \qquad (4.7)$$

11 See, e.g., Karlin and Taylor (1981).

Fig. 3. Escape time control policy (θ = 0.2 and θ = 4.0).
In principle, we could integrate this to get the (log transformed) value function. However, the solution is rather messy, and unnecessary in any event if we are only interested in observed behavior (i.e., the policy function), since that only depends on $F_0'(x)$.13 Fig. 3 contains plots of the policy function for two values of θ, assuming the crisis threshold is $\bar{x} = 1$. The parameter values are the same as in Figs. 1 and 2, with two exceptions. First, a = 0.5ρ instead of −0.5ρ. This implies the uncontrolled deterministic system will hit $\bar{x}$. To prevent this, the agent must set u < 0. Second, λ is increased from 1.0 to 3.0. This makes the evil agent relatively more powerful, so that despite the agent's best efforts, the state eventually hits the threshold. This ensures the value function is finite. The key point to notice here is that when θ increases, so that disasters become more costly, the agent's policy becomes more aggressive. That is, the agent responds to increased boundary costs exactly like he would to an increase in model uncertainty! However, the correspondence here between robust and escape time control is not exact. In particular, closer inspection of Fig. 3 reveals that the policy also becomes less aggressive as the boundary is approached. This reflects the fact that the marginal benefit from not hitting the threshold is constant while the marginal cost of exerting control is increasing. To make the correspondence between robust and escape time control closer, we could simply reinstate quadratic state costs into the objective function. This would make the agent want to increase control effort as x rises. However, it is clear that the nonlinearity exhibited by the boundary cost policy function in (4.7) means that in this Linear-Quadratic environment the correspondence between the two can never be exact. In the following section, we shall study an example where the correspondence is exact. In this example, the policy function itself features a threshold, and this threshold responds in the same way to model uncertainty and boundary costs.

12 Fleming and Soner (1993) provide a thorough discussion of viscosity solution methods in the context of dynamic optimization. Note, this raises the question of how accurate these approximations are for a given ε > 0. Unfortunately, there are few general results here. Dembo and Zeitouni (1998, ch. 3) contain a discussion of ‘sharp’ large deviations results and higher-order asymptotics, which would deliver more accurate approximations. However, these are difficult to implement. Anderson et al. (2012) provide examples where these kinds of approximations are quite accurate. On the other hand, Kolyuzhnov et al. (2014) argue that they can be inaccurate for empirically plausible values of ε.
13 A caveat: the viscosity solution method tells us that $F_\varepsilon$ converges pointwise to $F_0$. It does not in general guarantee that $F_\varepsilon'$ converges to $F_0'$. We assume this is the case.
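The policy function plotted in Fig. 3 is easy to reproduce from (4.6) and (4.7); a Python sketch with the Fig. 3 parameterization (a = 0.5ρ > 0, so the positive square root applies):

```python
import math

# Fig. 3 parameterization: rho = 0.05, a = 0.5*rho, b = 0.14,
# sigma^2 = 0.01, lam = 3.0, crisis threshold x_bar = 1.
a, b, sigma2, lam = 0.025, 0.14, 0.01, 3.0
c = b**2/lam - sigma2   # note c < 0 here, so the discriminant is always positive

def u(x, theta):
    # u(x) = (b/lam)*F0'(x), with F0' from (4.7), positive root since a > 0
    F0p = (-a*x + math.sqrt((a*x)**2 - 2*theta*c)) / c
    return (b/lam) * F0p

for theta in (0.2, 4.0):
    print(theta, u(0.0, theta), u(1.0, theta))
# A larger theta makes the control uniformly more aggressive (more negative),
# while for fixed theta |u| shrinks as x approaches the threshold,
# matching the two features of Fig. 3 discussed in the text.
```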
5. An example

Although barriers and thresholds are pervasive in economics, perhaps the canonical example arises from a bankruptcy constraint. In this section, we study the problem of a firm that wants to maximize the expected present value of dividend payments out of a stochastic net cash flow process. The firm goes bankrupt if its accumulated cash reserves ever hit zero. Following Dutta and Radner (1999), we show that even a risk-neutral firm will only pay dividends if accumulated cash reserves are sufficiently above zero. We can characterize this dividend payment threshold using the tools of option pricing, since the payment of a dividend here is tantamount to the exercise of an option. We solve this problem under three different scenarios. First, we suppose that the firm simply maximizes the expected present value of dividend payments. The only cost of bankruptcy is that the firm goes out of business. Second, we suppose the firm suffers an additional explicit bankruptcy cost. If this future cost is discounted, then it effectively adds a flow payment to the firm, providing it an additional incentive to stay in business, which leads the firm to increase its dividend payment threshold. Finally, in the third case we eliminate the bankruptcy cost and instead suppose the firm entertains doubts about the specification of the net cash flow process. In response, the firm formulates a robust dividend payment policy. We then derive conditions under which the robust dividend policy is equivalent to the bankruptcy cost policy.

5.1. Risk-neutrality with no bankruptcy cost

The analysis here is a special case of Dutta and Radner (1999). They study a more complicated problem in which the firm can also influence the drift and volatility of the net cash flow process. For simplicity, we assume it is exogenous. In particular, letting dx(t) denote net earnings at instant t, we assume it is governed by the following diffusion process:

$$dx = \mu\,dt + \sigma\,dW, \qquad \mu > 0 \qquad (5.1)$$
where μ and σ² are both constant. If we then let dz(t) denote the rate of dividend payments, accumulated cash reserves are given by

$$y(t) = y_0 + x(t) - z(t) \qquad (5.2)$$

where $y_0$ represents initial cash reserves. The firm's objective is to maximize the expected present discounted value of the flow of dividend payments

$$E\left[\int_0^\tau e^{-rt}\,dz(t)\,\Big|\,y(0) = y_0\right] \qquad (5.3)$$
where $\tau = \inf\{t > 0 : y(t) = 0\}$ is the point in time at which the firm goes bankrupt. As emphasized by Dutta and Radner (1999), maximizing the present value of dividend payments (i.e., the value of the firm) does not guarantee survival. In fact, we shall see that a value maximizing firm goes bankrupt with probability one. The HJB equation for (5.3) is given by

$$rV(y) = \max_{dz}\left\{dz + (\mu - dz)V'(y) + \frac{1}{2}\sigma^2 V''(y)\right\} \qquad (5.4)$$

Note that since the instantaneous marginal benefit from paying a unit of dividends is just 1, while the instantaneous marginal cost is $V'(y)$, the firm will not pay any dividends if $V'(y) > 1$. Moreover, since the cost of changing the level of cash reserves is proportional to the size of the change, the optimal dividend policy will be one of ‘barrier control’, where dividends are paid out so as to keep y from rising above some barrier, b > 0. To fully characterize the optimal policy we just need to compute the barrier b.14 As a first step, we solve the HJB equation in (5.4) given the conjecture dz = 0. Since we are then left with a linear second-order ODE, the general solution for the value function is easily obtained:

$$V(y) = Ae^{\lambda_1 y} + Be^{\lambda_2 y}, \quad y \leq b; \qquad V(y) = V(b) + y - b, \quad y \geq b \qquad (5.5)$$

where $\lambda_1$ and $\lambda_2$ are given by the roots of the characteristic equation $\frac{1}{2}\sigma^2\lambda^2 + \mu\lambda - r = 0$:

$$\lambda_1 = \frac{-\mu + \sqrt{\mu^2 + 2r\sigma^2}}{\sigma^2} > 0 \qquad (5.6)$$

$$\lambda_2 = \frac{-\mu - \sqrt{\mu^2 + 2r\sigma^2}}{\sigma^2} < 0 \qquad (5.7)$$
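A quick numerical check of (5.6)–(5.7) (parameter values are illustrative; the paper does not calibrate this example):

```python
import math

# Verify that lambda_1 > 0 > lambda_2 solve 0.5*sigma2*l^2 + mu*l - r = 0
mu, r, sigma2 = 0.1, 0.05, 0.04
root = math.sqrt(mu**2 + 2*r*sigma2)
lam1 = (-mu + root) / sigma2
lam2 = (-mu - root) / sigma2
for l in (lam1, lam2):
    print(l, 0.5*sigma2*l**2 + mu*l - r)  # residuals should vanish
```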
The constants of integration, (A, B), and the optimal dividend payment threshold, b, are then determined by the following three boundary conditions:

V(0) = 0    (5.8)

V′(b) = 1    (5.9)

V″(b) = 0    (5.10)
14 See, e.g., Stokey (2009, ch. 10) for a more rigorous discussion. Note that if y₀ > b, the firm pays dividends at an infinite rate, so as to push y discretely downward to b.
The first condition just follows from the fact that a bankrupt firm is worth nothing, along with the assumption that there is no explicit bankruptcy cost. The next two conditions are the well-known 'smooth pasting' and 'super contact' conditions. Using the fact that |λ2| > λ1, one can readily verify that V(y) is strictly concave for y < b, so that (5.9) verifies our conjecture that the firm will never pay dividends if y < b. Nonetheless, note that the firm still has a positive option value in this region, as determined by the two exponential functions in (5.5). Since (5.8) implies A + B = 0, it is clear that b is fully determined by the super contact condition in (5.10),
(λ2/λ1)² = e^{(λ1 + |λ2|) b}    (5.11)

From this one can readily verify that b increases with s². A larger variance makes it more likely that for a given y the bankruptcy constraint will be hit. Even though the firm is risk-neutral, this constraint effectively makes the firm risk averse, and so it raises its dividend threshold. A larger variance also increases the option value of delaying dividend payments, which also acts to increase b.

The influence of μ is more subtle, and also more important, since we shall see below that robustness operates through a drift distortion. On the one hand, an increase in μ obviously makes it less likely that the firm will hit the bankruptcy constraint. As a result, the firm can afford to start paying dividends with a smaller cushion of cash reserves, so b decreases. On the other hand, a higher μ also increases the value of the option to wait, which increases b. In general, the effect is ambiguous. One can show, however, that b is more likely to decrease with μ when μ is large and r is small.

Perhaps the most interesting aspect of this problem, and the central focus of Dutta and Radner (1999), is that a value-maximizing firm will go bankrupt with probability one. This simply follows from the fact that the optimal dividend policy imposes a fixed upper bound on y. As noted earlier, first-passage densities rarely exist in closed form. However, if we just want to compute the mean time until bankruptcy, we can proceed as follows. Starting with Kolmogorov's backward equation, which characterizes the evolution of the probability distribution of the controlled y(t) process given an absorbing barrier at 0 and a reflecting barrier at b, we take the Laplace transform to convert this PDE into an algebraic equation in the transform. Differentiating with respect to the transform variable and evaluating at zero then gives the mean time until absorption.
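Since (5.11) can be solved explicitly for the barrier, b = 2 ln(|λ2|/λ1)/(λ1 + |λ2|), the solution is easy to check numerically. The following Python sketch (illustrative, not part of the original paper) computes the roots (5.6)–(5.7) and the barrier for the parameter values used later in Section 5.3, and verifies the boundary conditions:

```python
import math

mu, s, r = 0.06, 0.1, 0.01   # parameter values used later in Section 5.3

# Roots of the characteristic equation (1/2)s^2 lam^2 + mu*lam - r = 0, i.e. (5.6)-(5.7)
disc = math.sqrt(mu**2 + 2*r*s**2)
lam1 = (-mu + disc) / s**2   # positive root
lam2 = (-mu - disc) / s**2   # negative root

# (5.11): (lam2/lam1)^2 = exp((lam1 + |lam2|)*b)  =>  closed form for the barrier
b = 2*math.log(abs(lam2)/lam1) / (lam1 + abs(lam2))

# A + B = 0 from (5.8); A then follows from smooth pasting V'(b) = 1, i.e. (5.9)
A = 1.0 / (lam1*math.exp(lam1*b) - lam2*math.exp(lam2*b))
B = -A

# Super contact V''(b) = 0, i.e. (5.10), should hold at the computed barrier
Vpp = A*lam1**2*math.exp(lam1*b) + B*lam2**2*math.exp(lam2*b)
print(round(b, 3))   # approximately 0.698, as reported in Section 5.3
```

The computed barrier matches the benchmark value of roughly 0.7 reported for these parameters in Section 5.3.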
This gives us15

E(τ) = (s²/2μ²) e^{(2μ/s²) b} [1 − e^{−(2μ/s²) y}] − y/μ    (5.12)

Not surprisingly, E(τ) increases with y. Although it may appear at first sight as if E(τ) also increases with b, remember that b is endogenous, and depends on μ and s². For example, a higher b could be the result of a lower μ.

5.2. Risk-neutrality with bankruptcy cost

In the previous analysis, the firm only cared about bankruptcy to the extent that it influenced its ability to pay dividends. Our central contention in this paper is that boundaries often inflict costs of their own. This is especially true in the case of bankruptcy where, for example, legal fees can be substantial. If we let θ denote the bankruptcy cost, and assume that it is discounted at the same rate as dividends, we get

max_z E[ ∫₀^τ e^{−rt} dz(t) ] − E[ θ e^{−rτ} ]    (5.13)

Using the fact that r ∫₀^τ e^{−rs} ds = 1 − e^{−rτ}, we can write this as

max_z E[ ∫₀^τ e^{−rt} (dz(t) + θr) dt ] − θ    (5.14)
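As a consistency check on the mean-escape-time formula, the sketch below (illustrative, not from the paper) evaluates (5.12) and verifies by finite differences that it solves the ODE (1/2)s²m″ + μm′ = −1 underlying the Laplace-transform calculation, with m(0) = 0 at the absorbing barrier:

```python
import math

mu, s, r = 0.06, 0.1, 0.01          # parameter values used in Section 5.3
disc = math.sqrt(mu**2 + 2*r*s**2)
lam1 = (-mu + disc) / s**2
lam2 = (-mu - disc) / s**2
b = 2*math.log(abs(lam2)/lam1) / (lam1 + abs(lam2))   # barrier from (5.11)
k = 2*mu / s**2

def mean_escape_time(y):
    """E(tau) from (5.12): mean time to bankruptcy starting from reserves y."""
    return (s**2/(2*mu**2)) * math.exp(k*b) * (1 - math.exp(-k*y)) - y/mu

# Finite-difference residual of (1/2)s^2 m'' + mu m' = -1 at an interior point
y0, h = 0.3, 1e-4
m2 = (mean_escape_time(y0+h) - 2*mean_escape_time(y0) + mean_escape_time(y0-h)) / h**2
m1 = (mean_escape_time(y0+h) - mean_escape_time(y0-h)) / (2*h)
residual = 0.5*s**2*m2 + mu*m1 + 1.0   # should be close to zero
```

The same function also confirms the comparative static noted in the text: E(τ) increases with y.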
The point to observe here is that a discounted bankruptcy cost has the effect of introducing an additional positive flow return, even when no dividends are being paid. Since this flow return ceases if the firm goes bankrupt, it makes the firm want to stay in business longer. Clearly, when θ increases, so does the flow return, and so the more the firm wants to delay bankruptcy. The HJB equation for this new problem is

rV(y) = max_z { dz + rθ + (μ − dz)V′(y) + (1/2)s²V″(y) }    (5.15)

Conjecturing once again that the optimal policy is one of barrier control at y = b, we can solve for the value function

V(y) = θ + A e^{λ1 y} + B e^{λ2 y},    y ≤ b    (5.16)
Importantly, (λ1, λ2) satisfy the same characteristic equation as before, and so are given by (5.6) and (5.7). The boundary conditions are also the same as before. Notice, however, that instead of A + B = 0, we now have A + B = −θ. This makes the characterization of the optimal boundary a little more difficult, as it is no longer fully determined by the super contact condition. Fortunately, for our purposes we do not need an explicit analytical solution. We just need to know how b changes with θ, and this is relatively straightforward. First, let S(b) ≡ λ1² e^{λ1 b} − λ2² e^{−|λ2| b}, and note that the optimal threshold without bankruptcy cost is determined by the unique b > 0 such that S(b) = 0. Next, note that the super contact condition with bankruptcy cost θ is given by

A S(b) = θ λ2² e^{−|λ2| b}    (5.17)

15 See Cox and Miller (1965, pp. 233–234) for details.
One can readily verify that S(b) is increasing in b, while A and the right-hand side of (5.17) are both positive. As a result, we know that the optimal threshold with bankruptcy cost exceeds the optimal threshold without bankruptcy cost. Moreover, since A decreases with θ while the right-hand side increases, we know that the threshold becomes higher as bankruptcy becomes more costly. This makes perfect sense. When the cost of bankruptcy increases, the firm becomes more cautious about paying dividends. More interestingly, we shall now see that the firm does the same thing when it becomes more unsure about the process driving net cash flows.
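Although the barrier no longer has a closed form, it is easy to pin down numerically: the super contact condition (5.17) gives A as a function of b, V(0) = 0 gives B = −θ − A, and smooth pasting V′(b) = 1 then determines b by bisection to the right of the zero of S. The sketch below (illustrative, not from the paper; parameter values borrowed from Section 5.3) confirms the comparative statics claimed in the text:

```python
import math

mu, s, r = 0.06, 0.1, 0.01
disc = math.sqrt(mu**2 + 2*r*s**2)
lam1 = (-mu + disc) / s**2   # positive root of (1/2)s^2 lam^2 + mu*lam - r = 0
lam2 = (-mu - disc) / s**2   # negative root
b0 = 2*math.log(abs(lam2)/lam1) / (lam1 + abs(lam2))  # threshold with theta = 0

def S(b):
    return lam1**2*math.exp(lam1*b) - lam2**2*math.exp(lam2*b)

def smooth_pasting_gap(b, theta):
    # Super contact (5.17) pins down A given b; V(0) = 0 gives B = -theta - A.
    A = theta * lam2**2 * math.exp(lam2*b) / S(b)
    B = -theta - A
    return A*lam1*math.exp(lam1*b) + B*lam2*math.exp(lam2*b) - 1.0  # V'(b) - 1

def barrier(theta):
    lo, hi = b0 + 1e-9, 10.0   # S(b) > 0 only to the right of b0
    for _ in range(100):       # bisection: gap is positive near b0, negative for large b
        mid = 0.5*(lo + hi)
        if smooth_pasting_gap(mid, theta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

b_small, b_large = barrier(0.1), barrier(0.2)
```

For these parameters the threshold with bankruptcy cost exceeds b0, and it rises further as θ increases, exactly as the argument above predicts.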
5.3. Risk-sensitivity/robustness with no bankruptcy cost

Let us now go back to the case where there is no explicit bankruptcy cost. Instead, the firm doubts the specification of the net cash flow process in (5.1). In response, it formulates a robust dividend policy.16 As discussed earlier, this yields the following HJB equation:

rV(y) = max_z min_h { dz + (θ/2)h² + (μ − dz + sh)V′(y) + (1/2)s²V″(y) }    (5.18)

where h represents a drift distortion, and where θ can now be interpreted as a penalty on the relative entropy between the benchmark cash flow process and the evil agent's worst-case model. The policy function of the evil agent is easily seen to be h = −(s/θ)V′(y). If this is substituted back into (5.18), we get the following nonlinear second-order ODE for the value function:

rV(y) = max_z { dz + (μ − dz)V′(y) − (s²/2θ)[V′(y)]² + (1/2)s²V″(y) }    (5.19)

Not surprisingly, this does not have a closed-form solution, which leaves us with three options. First, we could simply solve numerically for b for alternative values of the parameters. Although this would not be difficult, it would leave open the question of whether a barrier control policy is in fact optimal. Second, since we can solve the equation for the case θ⁻¹ = 0, we could construct a perturbation solution for small θ⁻¹. Although Anderson et al. (2012) show that this can be an attractive alternative in robust control problems, we instead pursue a third option, which is to scale the robustness parameter, θ, so as to unwind the nonlinearity. In particular, note that if we replace the constant penalty with the state-dependent penalty θ(y) = (1/2)θsV′(y), we obtain the following linear ODE for the value function17:

rV(y) = max_z { dz + (μ − dz − sθ⁻¹)V′(y) + (1/2)s²V″(y) }    (5.20)

As in the earlier Linear-Quadratic example, a preference for robustness introduces a pessimistic drift distortion. This distortion decreases with the penalty parameter, θ. The analysis at this point can proceed exactly as in the no-bankruptcy case in Section 5.1.
The optimal policy will once again be a barrier control policy, and to calculate this policy we just need to substitute μ − sθ⁻¹ in place of μ in Eq. (5.11). Fig. 4 plots the resulting optimal threshold as a function of θ⁻¹ for the following parameter values: μ = 0.06, s = 0.1, and r = 0.01. With no concern for model misspecification (i.e., θ⁻¹ = 0), the optimal threshold is approximately b = 0.7. As the preference for robustness increases, so does the threshold. This is quite intuitive. As your doubts about the growth in net cash flows increase, you play it safe by requiring a higher stock of cash reserves before you start paying dividends.

Notice, however, that beyond a certain point the threshold decreases sharply, and eventually it even falls below the benchmark value. Extreme doubts about cash flows translate into a very low growth estimate. In the limit, as μ → 0, we have |λ2| → λ1 and b → 0. Beyond a certain point, the decline in the option value dominates the decline in present value, and so the threshold declines. (See Trojanowska and Kort, 2010 for further discussion.)

Although the policy function in this example is somewhat different than in our earlier Linear-Quadratic example, the punchline here is the same. An agent seeking robustness behaves the same (within limits) as an agent facing boundary costs.

16 This problem has received some recent attention in the real options literature. For example, Trojanowska and Kort (2010) and Miao and Wang (2011) study the effects of ambiguity on the option to invest in the presence of fixed costs. They find that the effects are ambiguous in general. In Trojanowska and Kort (2010), whether ambiguity delays investment depends on the relative strength of opposing option and present value effects. In Miao and Wang (2011), it depends on whether terminal payoffs are ambiguous or not.

17 This scaling trick was also used by Maenhout (2004) to obtain homothetic robust portfolio policies. However, he scales θ by V(y) instead of V′(y). This has somewhat different behavioral implications. Scaling by V′ implies the agent's preference for robustness increases with wealth/cash reserves, whereas scaling by V implies the opposite.
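The non-monotone shape of the threshold is easy to reproduce: trace the barrier implied by (5.11) with the distorted drift μ − sθ⁻¹. The sketch below (illustrative, not from the paper) confirms that for μ = 0.06, s = 0.1, r = 0.01 the threshold first rises above the benchmark value of roughly 0.7 and then, as θ⁻¹ approaches μ/s = 0.6, falls below it:

```python
import math

def barrier(mu, s, r):
    """Dividend threshold from (5.11); requires mu > 0."""
    disc = math.sqrt(mu**2 + 2*r*s**2)
    lam1 = (-mu + disc) / s**2
    lam2 = (-mu - disc) / s**2
    return 2*math.log(abs(lam2)/lam1) / (lam1 + abs(lam2))

mu, s, r = 0.06, 0.1, 0.01   # parameter values used for Fig. 4
# A preference for robustness distorts the drift downward: mu - s/theta.
inv_thetas = [i/100 for i in range(0, 56, 5)]          # 1/theta from 0 to 0.55
bs = [barrier(mu - s*x, s, r) for x in inv_thetas]
print(round(bs[0], 3))   # benchmark threshold (1/theta = 0): about 0.698
```

The grid stops short of θ⁻¹ = 0.6, where the distorted drift hits zero and, as noted in the text, b → 0.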
[Figure 4 appears here: the optimal dividend payment threshold (vertical axis, ranging from 0.45 to 0.95) plotted against 1/θ (horizontal axis, ranging from 0 to 0.7).]
Fig. 4. Dividend payment threshold vs. preference for robustness.
6. Conclusion

Macroeconomic policy analysis normally presumes that policymakers want to keep the average performance of a system near some target operating level. For example, this is the case when policy is based on maximizing the expected utility of a representative agent, with utility quadratically approximated around some deterministic or frictionless steady state. In this paper, we have studied the problem of a policymaker who wants to keep the system away from a critical threshold. The motivation for this is that in practice optimal targets may be hard to measure or observe, since they depend on the preferences of individual agents. In contrast, crisis thresholds are often easier to measure, and therefore provide a more secure foundation for policy. We show that the response to an increase in boundary costs is analogous to an increase in model uncertainty, i.e., in both cases optimal policy becomes more vigilant and activist. Not only does this result provide a novel example of the duality between risk-sensitivity and robustness, it also greatly expands the relevant domain of robust control methods.

There are many possible routes for extensions and refinements. One obvious extension would be to consider multiple states and controls. In this case, a crisis might be defined by escape to a nontrivial region of the state space. The only barriers to such an extension are computational, rather than conceptual. Rather than having to solve a nonlinear ODE, we would face the much harder problem of solving a nonlinear PDE. Another possible extension would be to exploit the connections between viscosity solutions and the theory of large deviations. Both are based on small noise approximations. Fleming and Tsai (1981) pursue this route, and show that escape time control can be cast as a dynamic zero-sum game in which agents battle over the size of the large deviations rate function. However, in their formulation timing is important, and decision rules are not generally time consistent. Boue and Dupuis (2001) discuss this issue, and offer some suggested resolutions.

Finally, it would of course be useful to consider other examples. In the realm of macroeconomic policy, the zero lower bound on nominal interest rates or the depletion of foreign exchange reserves might be thought of as introducing critical thresholds into policy. However, we leave the exploration of these possibilities to future research.
Acknowledgments

We thank two anonymous referees for helpful comments.

References

Anderson, E.W., Hansen, L.P., Sargent, T.J., 2012. Small noise methods for risk-sensitive/robust economies. J. Econ. Dyn. Control 36, 468–500.
Boue, M., Dupuis, P., 2001. Risk-sensitive and robust escape control for degenerate diffusion processes. Math. Control Signals Syst. 14, 62–85.
Clark, J., Vinter, R., 2012. Stochastic exit time problems arising in process control. Stochastics 84, 667–681.
Cox, D., Miller, H., 1965. The Theory of Stochastic Processes. Chapman and Hall, London.
Day, M.V., 1983. On the exponential exit law in the small parameter exit problem. Stochastics 8, 297–323.
Dembo, A., Zeitouni, O., 1998. Large Deviations Techniques and Applications. Springer-Verlag, New York.
Dupuis, P., McEneaney, W.M., 1997. Risk-sensitive and robust escape criteria. SIAM J. Control Optim. 35, 2021–2049.
Dutta, P.K., Radner, R., 1999. Profit maximization and the market selection hypothesis. Rev. Econ. Stud. 66, 769–798.
Fleming, W.H., 1978. Exit probabilities and optimal stochastic control. Appl. Math. Optim. 4, 329–346.
Fleming, W.H., Soner, H.M., 1993. Controlled Markov Processes and Viscosity Solutions. Springer-Verlag, New York.
Fleming, W.H., Tsai, C.-P., 1981. Optimal exit probabilities and differential games. Appl. Math. Optim. 7, 253–282.
Freidlin, M., Wentzell, A., 1998. Random Perturbations of Dynamical Systems, 2nd ed. Springer-Verlag, New York.
Hansen, L.P., Sargent, T.J., 2008. Robustness. Princeton University Press, Princeton.
Hansen, L.P., Sargent, T.J., 2011. Wanting robustness in macroeconomics. In: Friedman, B.M., Woodford, M. (Eds.), Handbook of Monetary Economics, vol. 3B. North-Holland, Amsterdam, pp. 1097–1157.
Hansen, L.P., Sargent, T.J., Turmuhambetova, G.A., Williams, N., 2006. Robust control, min–max expected utility, and model misspecification. J. Econ. Theory 128, 45–90.
Karlin, S., Taylor, H.M., 1981. A Second Course in Stochastic Processes. Academic Press, New York.
Kolyuzhnov, D., Bogomolova, A., Slobodyan, S., 2014. Escape dynamics: a continuous-time approximation. J. Econ. Dyn. Control 38, 161–183.
Maenhout, P.J., 2004. Robust portfolio rules and asset pricing. Rev. Financ. Stud. 17, 951–983.
Meerkov, S.M., Runolfsson, T., 1988. Residence time control. IEEE Trans. Autom. Control 33, 323–332.
Miao, J., Wang, N., 2011. Risk, uncertainty, and option exercise. J. Econ. Dyn. Control 35, 442–461.
Stokey, N.L., 2009. The Economics of Inaction: Stochastic Control Models with Fixed Costs. Princeton University Press, Princeton, New Jersey.
Strzalecki, T., 2011. Axiomatic foundations of multiplier preferences. Econometrica 79, 47–73.
Trojanowska, M., Kort, P., 2010. The worst case for real options. J. Optim. Theory Appl. 146, 709–734.