Journal of Computational Physics 235 (2013) 1–13
Forward and adjoint sensitivity computation of chaotic dynamical systems

Qiqi Wang
Department of Aeronautics and Astronautics, MIT, 77 Mass Ave., Cambridge, MA 02139, USA
Article history: Received 23 February 2012; Accepted 5 September 2012; Available online 27 October 2012

Keywords: Sensitivity analysis; Linear response; Adjoint equation; Unsteady adjoint; Chaos; Statistical average; Lyapunov exponent; Lyapunov covariant vector; Lorenz attractor
Abstract: This paper describes a forward algorithm and an adjoint algorithm for computing sensitivity derivatives in chaotic dynamical systems, such as the Lorenz attractor. The algorithms compute the derivative of long time averaged "statistical" quantities to infinitesimal perturbations of the system parameters. The algorithms are demonstrated on the Lorenz attractor. We show that sensitivity derivatives of statistical quantities can be accurately estimated using a single, short trajectory (over a time interval of 20) on the Lorenz attractor.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

Computational methods for sensitivity analysis are a powerful tool in modern computational science and engineering. These methods calculate the derivatives of output quantities with respect to input parameters in computational simulations. There are two types of algorithms for computing sensitivity derivatives: forward algorithms and adjoint algorithms. Forward algorithms are more efficient for computing sensitivity derivatives of many output quantities to a few input parameters; adjoint algorithms are more efficient for computing sensitivity derivatives of a few output quantities to many input parameters. Key applications of computational methods for sensitivity analysis include aerodynamic shape optimization [3], adaptive grid refinement [9], and data assimilation for weather forecasting [8].

In simulations of chaotic dynamical systems, such as turbulent flows and the climate system, many output quantities of interest are "statistical averages". Denote the state of the dynamical system as $x(t)$; for a function of the state $J(x)$, the corresponding statistical quantity $\langle J\rangle$ is defined as an average of $J(x(t))$ over an infinitely long time interval:
$$\langle J\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T J(x(t))\,dt. \qquad (1)$$
For ergodic dynamical systems, a statistical average only depends on the governing dynamical system, and does not depend on the particular choice of trajectory $x(t)$. Many statistical averages, such as the mean aerodynamic forces in turbulent flow simulations and the mean global temperature in climate simulations, are of great scientific and engineering interest. Computing sensitivities of these statistical quantities to input parameters can be useful in many applications.
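As a concrete illustration of Eq. (1), the following Python sketch approximates $\langle J\rangle$ by a finite time average along a single trajectory of the Lorenz system used later in Section 7. The sketch is ours, not part of the original algorithm description; the function names, integration tolerances and the choice $J(x) = x_3$ are illustrative assumptions.

```python
# Illustrative sketch: approximate the statistical average <J> of Eq. (1)
# by a finite time average over one Lorenz trajectory.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, x, s=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side f(x) of the Lorenz system, Eq. (41)."""
    return [s * (x[1] - x[0]),
            x[0] * (r - x[2]) - x[1],
            x[0] * x[1] - b * x[2]]

def finite_time_average(J, T=1000.0, spinup=50.0, x0=(1.0, 1.0, 1.0)):
    """Approximate <J> = lim_{T->inf} (1/T) int_0^T J(x(t)) dt."""
    # Sample the trajectory uniformly after a spin-up segment, so that the
    # averaging window lies roughly on the attractor.
    t = np.linspace(spinup, spinup + T, 200001)
    sol = solve_ivp(lorenz, (0.0, spinup + T), x0, t_eval=t,
                    rtol=1e-8, atol=1e-8)
    return float(np.mean(J(sol.y)))

if __name__ == "__main__":
    # <x_3> at the classic parameter values; roughly 23.5.
    print(finite_time_average(lambda x: x[2]))
```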
The differentiability of these statistical averages with respect to parameters of interest has been established through recent developments in the Linear Response Theory for dissipative chaos [6,7]. A class of chaotic dynamical systems, known as "quasi-hyperbolic" systems, has been proven to have statistical quantities that are differentiable with respect to small perturbations. These systems include the Lorenz attractor, and possibly many systems of engineering interest, such as turbulent flows.

Despite recent advances both in Linear Response Theory [7] and in numerical methods for sensitivity computation of unsteady systems [10,4], sensitivity computation of statistical quantities in chaotic dynamical systems remains difficult. A major challenge in computing sensitivities in chaotic dynamical systems is their sensitivity to the initial condition, commonly known as the "butterfly effect". The linearized equations, used both in forward and adjoint sensitivity computations, give rise to solutions that blow up exponentially. When a statistical quantity is approximated by a finite time average, the computed sensitivity derivative of the finite time average diverges to infinity, instead of converging to the sensitivity derivative of the statistical quantity [5]. Existing methods for computing correct sensitivity derivatives of statistical quantities usually involve averaging over a large number of ensemble calculations [5,1]. The resulting high computation cost makes these methods unattractive in many applications.

This paper outlines a computational method for efficiently estimating the sensitivity derivative of time averaged statistical quantities, relying on a single trajectory over a small time interval. The key idea of our method, inversion of the "shadow" operator, is already used as a tool for proving structural stability of strange attractors [6]. The key strategy of our method, divide and conquer of the shadow operator, is inspired by recent advances in numerical computation of the Lyapunov covariant vectors [2,11].

In the rest of this paper, Section 2 describes the "shadow" operator, on which our method is based. Section 3 derives the sensitivity analysis algorithm by inverting the shadow operator. Section 4 introduces a fix to the singularity of the shadow operator. Section 5 summarizes the forward sensitivity analysis algorithm. Section 6 derives the corresponding adjoint version of the sensitivity analysis algorithm. Section 7 demonstrates both the forward and adjoint algorithms on the Lorenz attractor. Section 8 concludes this paper.

The paper uses the following mathematical notation: vector fields in the state space (e.g. $f(x)$, $\phi_i(x)$) are column vectors; gradients of scalar fields (e.g. $\partial a_i^x/\partial x$) are row vectors; gradients of vector fields (e.g. $\partial f/\partial x$) are matrices with each row being a dimension of $f$, and each column being a dimension of $x$. The $(\cdot)$ sign is used to identify matrix-vector products or vector-vector inner products. For a trajectory $x(t)$ satisfying $\frac{dx}{dt} = f(x)$ and a scalar or vector field $a(x)$ in the state space, we often use $\frac{da}{dt}$ to denote $\frac{da(x(t))}{dt}$. The chain rule $\frac{da}{dt} = \frac{da}{dx}\frac{dx}{dt} = \frac{da}{dx}\cdot f$ is often used without explanation.

2. The "shadow" operator

For a smooth, uniformly bounded $n$ dimensional vector field $\delta x(x)$ defined on the $n$ dimensional state space of $x$, the following transform defines a slightly "distorted" coordinate system for the state space:
$$x'(x) = x + \epsilon\,\delta x(x), \qquad (2)$$
where $\epsilon$ is a small real number. Note that for an infinitesimal $\epsilon$, the following relation holds:
$$x'(x) - x = \epsilon\,\delta x(x) = \epsilon\,\delta x(x') + O(\epsilon^2). \qquad (3)$$
We call the transform from $x$ to $x'$ a "shadow coordinate transform". In particular, consider a trajectory $x(t)$ and the corresponding transformed trajectory $x'(t) = x'(x(t))$. For a small $\epsilon$, the transformed trajectory $x'(t)$ "shadows" the original trajectory $x(t)$, i.e., it stays uniformly close to $x(t)$ forever. Fig. 1 shows an example of a trajectory and its shadow. Now consider a trajectory $x(t)$ satisfying an ordinary differential equation
$$\dot{x} = f(x) \qquad (4)$$
with a smooth vector field $f(x)$. The same trajectory expressed in the transformed "shadow" coordinates, $x'(t)$, does not satisfy the same differential equation. Instead, from Eq. (3), we obtain
$$\dot{x}' = f(x) + \epsilon\,\frac{\partial \delta x}{\partial x} f(x) = f(x') - \epsilon\,\frac{\partial f}{\partial x}\delta x(x') + \epsilon\,\frac{\partial \delta x}{\partial x} f(x') + O(\epsilon^2). \qquad (5)$$
In other words, the shadow trajectory $x'(t)$ satisfies a slightly perturbed equation
$$\dot{x}' = f(x') + \epsilon\,\delta f(x') + O(\epsilon^2), \qquad (6)$$
where the perturbation $\delta f$ is
$$\delta f(x) = -\frac{\partial f}{\partial x}\delta x(x) + \frac{\partial \delta x}{\partial x} f(x) = -\frac{\partial f}{\partial x}\delta x(x) + \frac{d\,\delta x}{dt} =: (S_f\,\delta x)(x). \qquad (7)$$
For a given differential equation $\dot{x} = f(x)$, Eq. (7) defines a linear operator $S_f: \delta x \mapsto \delta f$. We call $S_f$ the "shadow operator" of $f$. For any smooth vector field $\delta x(x)$ that defines a slightly distorted "shadow" coordinate system in the state space, $S_f$
Fig. 1. A trajectory of the Lorenz attractor under a shadow coordinate transform. The black trajectory shows $x(t)$, and the red trajectory shows $x'(t)$. The perturbation $\delta x$ shown corresponds to an infinitesimal change in the parameter $r$, and is explained in detail in Section 7.
determines a unique smooth vector field $\delta f(x)$ that defines a perturbation to the differential equation. Any trajectory of the original differential equation satisfies the perturbed equation in the distorted coordinates.

Given an ergodic dynamical system $\dot{x} = f(x)$ and a pair $(\delta x, \delta f)$ that satisfies $\delta f = S_f\,\delta x$, $\delta x$ determines the sensitivity of statistical quantities of the dynamical system to an infinitesimal perturbation $\epsilon\,\delta f$. Let $J(x)$ be a smooth scalar function of the state, and consider the statistical average $\langle J\rangle$ as defined in Eq. (1). The sensitivity derivative of $\langle J\rangle$ to the infinitesimal perturbation $\epsilon\,\delta f$ is by definition
$$\frac{d\langle J\rangle}{d\epsilon} = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(\lim_{T\to\infty}\frac{1}{T}\int_0^T J(x'(t))\,dt - \lim_{T\to\infty}\frac{1}{T}\int_0^T J(x(t))\,dt\right), \qquad (8)$$
where by the ergodicity assumption, the statistical average of the perturbed system can be computed by averaging over $x'(t)$, which satisfies the perturbed governing differential equation. Continuing from Eq. (8),
$$\frac{d\langle J\rangle}{d\epsilon} = \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}\int_0^T \frac{J(x'(t)) - J(x(t))}{\epsilon}\,dt = \lim_{T\to\infty}\lim_{\epsilon\to 0}\frac{1}{T}\int_0^T \frac{J(x'(t)) - J(x(t))}{\epsilon}\,dt = \lim_{T\to\infty}\frac{1}{T}\int_0^T \frac{\partial J}{\partial x}\cdot\delta x\,dt = \left\langle \frac{\partial J}{\partial x}\cdot\delta x\right\rangle. \qquad (9)$$
Eq. (9) represents the sensitivity derivative of a statistical quantity $\langle J\rangle$ to the size $\epsilon$ of a perturbation $\epsilon\,\delta f$. There are two subtle points here. First, the two limits $\lim_{\epsilon\to 0}$ and $\lim_{T\to\infty}$ commute with each other for the following reason: the two trajectories $x'(t)$ and $x(t)$ stay uniformly close to each other forever; therefore,
$$\frac{J(x'(t)) - J(x(t))}{\epsilon} \xrightarrow{\epsilon\to 0} \frac{\partial J}{\partial x}\cdot\delta x, \qquad (10)$$
uniformly for all t. Consequently,
$$\frac{1}{T}\int_0^T \frac{J(x'(t)) - J(x(t))}{\epsilon}\,dt \xrightarrow{\epsilon\to 0} \frac{1}{T}\int_0^T \frac{\partial J}{\partial x}\cdot\delta x\,dt, \qquad (11)$$
uniformly for all $T$. Thus the two limits commute. Second, the two trajectories $x'(t)$ and $x(t)$ start at a specially positioned pair of initial conditions, $x'(0) = x(0) + \epsilon\,\delta x(x(0))$. Almost any other pair of initial conditions (e.g. $x'(0) = x(0)$) would make the two trajectories diverge as a result of the "butterfly effect". They would not stay uniformly close to each other, and the limits $\lim_{\epsilon\to 0}$ and $\lim_{T\to\infty}$ would not commute.
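The divergence referred to above is easy to observe numerically. The short sketch below (ours; the chosen initial point and parameter offset are arbitrary) integrates two copies of the Lorenz system from the same initial condition, one with a slightly perturbed parameter $r$, and prints the rapidly growing separation; this is why a naive pairing of trajectories cannot be used in Eq. (9).

```python
# Illustrative sketch: two Lorenz trajectories started from the same initial
# condition, but with slightly different values of r, separate exponentially
# (the "butterfly effect").
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, x, s=10.0, r=28.0, b=8.0 / 3.0):
    return [s * (x[1] - x[0]),
            x[0] * (r - x[2]) - x[1],
            x[0] * x[1] - b * x[2]]

x0 = [-8.0, -9.0, 25.0]                       # a point near the attractor
t_eval = np.linspace(0.0, 25.0, 251)
base = solve_ivp(lorenz, (0.0, 25.0), x0, t_eval=t_eval,
                 rtol=1e-10, atol=1e-10)
pert = solve_ivp(lambda t, x: lorenz(t, x, r=28.001), (0.0, 25.0), x0,
                 t_eval=t_eval, rtol=1e-10, atol=1e-10)

for i in range(0, 251, 50):                   # print every 5 time units
    gap = np.linalg.norm(base.y[:, i] - pert.y[:, i])
    print(f"t = {t_eval[i]:5.1f}   |x'(t) - x(t)| = {gap:.3e}")
```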
Eq. (9) represents the sensitivity derivative of the statistical quantity $\langle J\rangle$ to the infinitesimal perturbation $\epsilon\,\delta f$ as another statistical quantity, $\langle \frac{\partial J}{\partial x}\cdot\delta x\rangle$. We can compute it by averaging $\frac{\partial J}{\partial x}\cdot\delta x$ over a sufficiently long trajectory, provided that $\delta x = S_f^{-1}\delta f$ is known along the trajectory. The next section describes how to numerically compute $\delta x = S_f^{-1}\delta f$ for a given $\delta f$.

3. Inverting the shadow operator

Perturbations to input parameters can often be represented as perturbations to the dynamics. Consider a differential equation $\dot{x} = f(x; s_1, s_2, \ldots, s_m)$ parameterized by $m$ input variables; an infinitesimal perturbation of an input parameter $s_j \to s_j + \epsilon$ can be represented as a perturbation to the dynamics $\delta f = \frac{\partial f}{\partial s_j}$.

Eq. (9) defines the sensitivity derivative of the statistical quantity $\langle J\rangle$ to an infinitesimal perturbation $\epsilon\,\delta f$, provided that a $\delta x$ can be found satisfying $\delta f = S_f\,\delta x$, where $S_f$ is the shadow operator. To compute the sensitivity by evaluating Eq. (9), one must first numerically invert $S_f$ for a given $\delta f$ to find the corresponding $\delta x$.

The key ingredient of the numerical inversion of $S_f$ is the Lyapunov spectrum decomposition, which can be computed efficiently [11,2]. In particular, we focus on the case when the system $\dot{x} = f(x)$ has distinct Lyapunov exponents. Denote the Lyapunov covariant vectors as $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$. Each $\phi_i$ is a vector field in the state space satisfying
$$\frac{d}{dt}\phi_i(x(t)) = \frac{\partial f}{\partial x}\phi_i(x(t)) - \lambda_i\,\phi_i(x(t)), \qquad (12)$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the Lyapunov exponents in decreasing order. The Lyapunov spectrum decomposition enables a divide and conquer strategy for computing $\delta x = S_f^{-1}\delta f$. For any $\delta f(x)$ and every point $x$ on the attractor, both $\delta x(x)$ and $\delta f(x)$ can be decomposed into the Lyapunov covariant vector directions almost everywhere, i.e.
$$\delta x(x) = \sum_{i=1}^n a_i^x(x)\,\phi_i(x), \qquad (13)$$
$$\delta f(x) = \sum_{i=1}^n a_i^f(x)\,\phi_i(x), \qquad (14)$$
where $a_i^x$ and $a_i^f$ are scalar fields in the state space. From the form of $S_f$ in Eq. (7), we obtain
$$S_f(a_i^x\phi_i) = -\frac{\partial f}{\partial x}\bigl(a_i^x(x)\,\phi_i(x)\bigr) + \frac{d}{dt}\bigl(a_i^x(x)\,\phi_i(x)\bigr) = -a_i^x(x)\,\frac{\partial f}{\partial x}\phi_i(x) + \frac{da_i^x(x)}{dt}\,\phi_i(x) + a_i^x(x)\,\frac{d\phi_i(x)}{dt}. \qquad (15)$$
By substituting Eq. (12) into the last term of Eq. (15), we obtain
$$S_f(a_i^x\phi_i) = \left(\frac{da_i^x(x)}{dt} - \lambda_i\,a_i^x(x)\right)\phi_i(x). \qquad (16)$$
By combining Eq. (16) with Eqs. (13) and (14) and the linear relation $\delta f = S_f\,\delta x$, we finally obtain
$$\delta f = \sum_{i=1}^n S_f\!\left(a_i^x\,\phi_i\right) = \sum_{i=1}^n \underbrace{\left(\frac{da_i^x}{dt} - \lambda_i\,a_i^x\right)}_{a_i^f}\,\phi_i. \qquad (17)$$
Eqs. (16) and (17) are useful for two reasons:

1. They indicate that the shadow operator $S_f$, applied to a scalar-field multiple $a_i^x(x)$ of $\phi_i(x)$, generates another scalar-field multiple $a_i^f(x)$ of the same vector field $\phi_i(x)$. Therefore, one can compute $S_f^{-1}\delta f$ by first decomposing $\delta f$ as in Eq. (14) to obtain the $a_i^f$. If each $a_i^x$ can be calculated from the corresponding $a_i^f$, then $\delta x$ can be computed with Eq. (13), completing the inversion.
2. They define a scalar ordinary differential equation that governs the relation between $a_i^x$ and $a_i^f$ along a trajectory $x(t)$:
$$\frac{da_i^x(x)}{dt} = a_i^f(x) + \lambda_i\,a_i^x(x). \qquad (18)$$
This equation can be used to obtain $a_i^x$ from $a_i^f$ along a trajectory, thereby filling the gap in the inversion procedure of $S_f$ outlined above. For each positive Lyapunov exponent $\lambda_i$, one can integrate the ordinary differential equation
$$\frac{d\tilde{a}_i^x}{dt} = a_i^f + \lambda_i\,\tilde{a}_i^x \qquad (19)$$
backwards in time from an arbitrary terminal condition, and the difference between $\tilde{a}_i^x(t)$ and the desired $a_i^x(x)$ will decrease exponentially. For each negative Lyapunov exponent $\lambda_i$, Eq. (19) can be integrated forward in time from an arbitrary initial
condition, and $\tilde{a}_i^x(t)$ will converge exponentially to the desired $a_i^x(x)$. For a zero Lyapunov exponent $\lambda_i = 0$, Section 4 introduces a solution.

4. Time dilation and compression

There is a fundamental problem in the inversion method derived in Section 3: $S_f$ is not invertible for certain $\delta f$. This can be shown with the following analysis. Any continuous time dynamical system with a non-trivial attractor must have a zero Lyapunov exponent $\lambda_{n_0} = 0$; the corresponding Lyapunov covariant vector is $\phi_{n_0}(x) = f(x)$. This can be verified by substituting $\lambda_i = 0$ and $\phi_i = f$ into Eq. (12). For this $i = n_0$, Eq. (19) becomes
$$a_{n_0}^f(x) = \frac{da_{n_0}^x(x)}{dt}. \qquad (20)$$
By taking an infinitely long time average on both sides of Eq. (20), we obtain
$$\left\langle a_{n_0}^f(x)\right\rangle = \lim_{T\to\infty}\frac{a_{n_0}^x(x(T)) - a_{n_0}^x(x(0))}{T} = 0. \qquad (21)$$
Eq. (21) implies that for any $\delta f = S_f\,\delta x$, the $i = n_0$ component of its Lyapunov decomposition (as in Eq. (14)) must satisfy $\langle a_{n_0}^f(x)\rangle = 0$. Any $\delta f$ that does not satisfy this linear relation, e.g. $\delta f \equiv f$, is not in the range space of $S_f$, and the corresponding $\delta x = S_f^{-1}\delta f$ does not exist.

Our solution to this problem is to complement $S_f$ with a "global time dilation and compression" constant $\eta$, whose effect accounts for the component of $\delta f$ that lies outside the range space of $S_f$. We call $\eta$ a time dilation constant for short. The combined effect of a time dilation constant and a shadow transform can produce all smooth perturbations $\delta f$. The idea comes from the fact that for a constant $\eta$, the time dilated or compressed system $\dot{x} = (1 + \eta)f(x)$ has exactly the same statistics $\langle J\rangle$, as defined in Eq. (1), as the original system $\dot{x} = f(x)$. Therefore, the perturbation in any $\langle J\rangle$ due to any $\delta f$ is equal to the perturbation in $\langle J\rangle$ due to $\eta f(x) + \delta f(x)$. Consequently, the sensitivity derivative to $\delta f$ can be computed if we can find a $\delta x$ that satisfies $S_f\,\delta x = \eta f(x) + \delta f(x)$ for some $\eta$. We use the "free" constant $\eta$ to put $\eta f(x) + \delta f(x)$ into the range space of $S_f$. By substituting $\eta f(x) + \delta f(x)$ into the constraint Eq. (21) that identifies the range space of $S_f$, the appropriate $\eta$ must satisfy the following equation
$$\eta + \left\langle a_{n_0}^f\right\rangle = 0, \qquad (22)$$
which we use to numerically compute $\eta$. Once the appropriate time dilation constant $\eta$ is computed, $\eta f(x) + \delta f(x)$ is in the range space of $S_f$. We use the procedure in Section 3 to compute $\delta x = S_f^{-1}(\eta f + \delta f)$, then use Eq. (9) to compute the desired sensitivity derivative $d\langle J\rangle/d\epsilon$. The addition of $\eta f$ to $\delta f$ affects Eq. (19) only for $i = n_0$, making it
$$\frac{da_{n_0}^x(x)}{dt} = a_{n_0}^f(x) + \eta. \qquad (23)$$
Eq. (23) indicates that $a_{n_0}^x$ can be computed by integrating the right hand side along the trajectory. The solution to Eq. (23) admits an arbitrary additive constant. The effect of this arbitrary constant is the following: by substituting Eq. (13) into Eq. (9), the contribution from the $i = n_0$ term of $\delta x$ to $d\langle J\rangle/d\epsilon$ is
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T a_{n_0}^x\,\frac{dJ}{dt}\,dt. \qquad (24)$$
Therefore, any constant addition to $a_{n_0}^x$ vanishes as $T \to \infty$. Computationally, however, Eq. (9) must be approximated by a finite time average. We find it beneficial to adjust the level of $a_{n_0}^x$ so that approximately $\langle a_{n_0}^x\rangle = 0$, in order to control the error due to finite time averaging.

5. The forward sensitivity analysis algorithms

For a given $\dot{x} = f(x)$, $\delta f$ and $J(x)$, the mathematical developments in Sections 3 and 4 are summarized into Algorithm 1 for computing the sensitivity derivative $d\langle J\rangle/d\epsilon$ as in Eq. (9). The preparation phase of the algorithm (Steps 1-3) computes a trajectory and the Lyapunov spectrum decomposition along the trajectory. The algorithm then proceeds by decomposing $\delta f$ (Step 4), computing $\delta x$ (Steps 5-7), and finally computing $d\langle J\rangle/d\epsilon$ (Step 8). The sensitivity derivatives of many different statistical quantities $\langle J_1\rangle, \langle J_2\rangle, \ldots$ to a single $\delta f$ can be computed by repeating only the last step of the algorithm. Therefore, this is a "forward" algorithm in the sense that it efficiently computes the sensitivity of multiple output quantities to a single input parameter (the size $\epsilon$ of the perturbation $\delta f$). We will see that this is in sharp contrast to the "adjoint" algorithm described in Section 6, which efficiently computes the sensitivity derivative of one output statistical quantity $\langle J\rangle$ to many perturbations $\delta f_1, \delta f_2, \ldots$. It is worth noting that the $\delta x$ computed using Algorithm 1 satisfies the forward tangent equation
$$\dot{\delta x} = \frac{\partial f}{\partial x}\delta x + \eta f + \delta f. \qquad (25)$$
This can be verified by taking the derivative of Eq. (13), substituting Eqs. (19) and (23), then using Eq. (14). However, $\delta x$ must satisfy both an initial condition and a terminal condition, making Eq. (25) difficult to solve with conventional time integration methods. In fact, Algorithm 1 is equivalent to splitting $\delta x$ into stable, neutral and unstable components, corresponding to positive, zero and negative Lyapunov exponents, then solving Eq. (25) separately for each component in different time directions. This alternative version of the forward sensitivity computation algorithm could be useful for large systems to avoid computation of all the Lyapunov covariant vectors.

Algorithm 1. The forward sensitivity analysis algorithm

(1) Choose a "spin-up buffer time" $T_B$ and a "statistical averaging time" $T_A$. $T_B$ should be much longer than $1/|\lambda_i|$ for all nonzero Lyapunov exponents $\lambda_i$, so that the solutions of Eq. (19) can reach $a_i^x$ over a time span of $T_B$. $T_A$ should be much longer than the decorrelation time of the dynamics, so that one can accurately approximate a statistical quantity by averaging over $[0, T_A]$.
(2) Obtain an initial condition on the attractor at $t = -T_B$, e.g., by solving $\dot{x} = f(x)$ for a sufficiently long time span, starting from an arbitrary initial condition.
(3) Solve $\dot{x} = f(x)$ to obtain a trajectory $x(t)$, $t \in [-T_B, T_A + T_B]$; compute the Lyapunov exponents $\lambda_i$ and the Lyapunov covariant vectors $\phi_i(x(t))$ along the trajectory, e.g., using the algorithms in [11,2].
(4) Perform the Lyapunov spectrum decomposition of $\delta f(x)$ along the trajectory $x(t)$ to obtain $a_i^f(x)$, $i = 1, \ldots, n$, as in Eq. (14).
(5) Compute the global time dilation constant $\eta$ using Eq. (22).
(6) Solve the differential Eq. (19) to obtain the $a_i^x$ over the time interval $[0, T_A]$. The equations with positive $\lambda_i$ are solved backward in time from $t = T_A + T_B$ to $t = 0$; the ones with negative $\lambda_i$ are solved forward in time from $t = -T_B$ to $t = T_A$. For $\lambda_{n_0} = 0$, Eq. (23) is integrated, and the mean of $a_{n_0}^x$ is set to zero. (See the sketch following this listing.)
(7) Compute $\delta x$ along the trajectory $x(t)$, $t \in [0, T_A]$, with Eq. (13).
(8) Compute $d\langle J\rangle/d\epsilon$ using Eq. (9) by averaging over the time interval $[0, T_A]$.
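To make Step 6 concrete, the following sketch (ours, not code from the paper) integrates the scalar equations (19) and (23) with a plain explicit Euler scheme, given samples of $a_i^f$, the exponent $\lambda_i$ and the time dilation constant $\eta$ on a uniform time grid; all names are assumptions.

```python
# Illustrative sketch of Algorithm 1, Step 6.  Given samples of a_i^f on a
# uniform time grid, the exponent lambda_i and the time dilation constant eta,
# integrate
#   d a_i^x / dt = a_i^f + lambda_i * a_i^x        (Eq. (19))
#   d a_n0^x / dt = a_n0^f + eta                   (Eq. (23))
# backward in time for lambda_i > 0 and forward in time for lambda_i < 0, so
# that the influence of the arbitrary start value decays exponentially.
import numpy as np

def solve_a_x(a_f, lam, dt, eta=0.0):
    """a_f: samples of a_i^f at uniform steps of size dt; lam: lambda_i."""
    n = len(a_f)
    a_x = np.zeros(n)
    if lam > 0.0:                        # unstable direction: backward sweep
        for k in range(n - 2, -1, -1):
            a_x[k] = a_x[k + 1] - dt * (a_f[k + 1] + lam * a_x[k + 1])
    elif lam < 0.0:                      # stable direction: forward sweep
        for k in range(1, n):
            a_x[k] = a_x[k - 1] + dt * (a_f[k - 1] + lam * a_x[k - 1])
    else:                                # neutral direction: Eq. (23), de-meaned
        for k in range(1, n):
            a_x[k] = a_x[k - 1] + dt * (a_f[k - 1] + eta)
        a_x -= a_x.mean()
    return a_x
```

A production implementation would use a higher order (or exact exponential) integrator and discard the buffer segments of length $T_B$; explicit Euler is used here only to keep the sketch short.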
6. The adjoint sensitivity analysis algorithm

The adjoint algorithm starts by trying to find an adjoint vector field $\hat{f}(x)$, such that the sensitivity derivative of the given statistical quantity $\langle J\rangle$ to any infinitesimal perturbation $\delta f$ can be represented as
$$\frac{d\langle J\rangle}{d\epsilon} = \left\langle \hat{f}^T\,\delta f\right\rangle. \qquad (26)$$
Both $\hat{f}$ in Eq. (26) and $\frac{\partial J}{\partial x}$ in Eq. (9) can be decomposed into linear combinations of the adjoint Lyapunov covariant vectors almost everywhere on the attractor:
$$\hat{f}(x) = \sum_{i=1}^n \hat{a}_i^f(x)\,\psi_i(x), \qquad (27)$$
$$\left(\frac{\partial J}{\partial x}\right)^T = \sum_{i=1}^n \hat{a}_i^x(x)\,\psi_i(x), \qquad (28)$$
where the adjoint Lyapunov covariant vectors $\psi_i$ satisfy
$$\frac{d}{dt}\psi_i(x(t)) = -\left(\frac{\partial f}{\partial x}\right)^T\psi_i(x(t)) + \lambda_i\,\psi_i(x(t)); \qquad (29)$$
with proper normalization, the (primal) Lyapunov covariant vectors $\phi_i$ and the adjoint Lyapunov covariant vectors $\psi_i$ have the following conjugate relation:
$$\psi_i(x)^T\phi_j(x) = \begin{cases} 0, & i \neq j, \\ 1, & i = j, \end{cases} \qquad (30)$$
i.e., the $n \times n$ matrix formed by all the $\phi_i$ and the $n \times n$ matrix formed by all the $\psi_i$ are the transposed inverses of each other at every point $x$ in the state space. By substituting Eqs. (13) and (28) into Eq. (9), and using the conjugate relation in Eq. (30), we obtain
$$\frac{d\langle J\rangle}{d\epsilon} = \sum_{i=1}^n \left\langle \hat{a}_i^x\,a_i^x\right\rangle. \qquad (31)$$
Similarly, by combining Eqs. (26), (14), (27) and (30), it can be shown that $\hat{f}$ satisfies Eq. (26) if and only if
$$\frac{d\langle J\rangle}{d\epsilon} = \sum_{i=1}^n \left\langle \hat{a}_i^f\,a_i^f\right\rangle. \qquad (32)$$
Comparing Eqs. (31) and (32) leads to the following conclusion: Eq. (26) can be satisfied by finding $\hat{a}_i^f$ that satisfy
$$\left\langle \hat{a}_i^f\,a_i^f\right\rangle = \left\langle \hat{a}_i^x\,a_i^x\right\rangle, \quad i = 1, \ldots, n. \qquad (33)$$
The $\hat{a}_i^f$ that satisfy Eq. (33) can be found using the relation between $a_i^f$ and $a_i^x$ in Eq. (18). By multiplying $\hat{a}_i^f$ on both sides of Eq. (18) and integrating by parts in time, we obtain
$$\frac{1}{T}\int_0^T \hat{a}_i^f\,a_i^f\,dt = \frac{\hat{a}_i^f\,a_i^x\big|_0^T}{T} - \frac{1}{T}\int_0^T\left(\frac{d\hat{a}_i^f}{dt} + \lambda_i\,\hat{a}_i^f\right)a_i^x\,dt \qquad (34)$$
for $i \neq n_0$. Applying the same technique to Eq. (23), we obtain, for $i = n_0$,
$$\frac{1}{T}\int_0^T \hat{a}_{n_0}^f\,a_{n_0}^f\,dt = \frac{\hat{a}_{n_0}^f\,a_{n_0}^x\big|_0^T}{T} - \frac{1}{T}\int_0^T\frac{d\hat{a}_{n_0}^f}{dt}\,a_{n_0}^x\,dt - \eta\,\frac{1}{T}\int_0^T \hat{a}_{n_0}^f\,dt. \qquad (35)$$
If we set the $\hat{a}_i^f$ to satisfy the following relations
$$\frac{d\hat{a}_i^f(x)}{dt} = -\hat{a}_i^x(x) - \lambda_i\,\hat{a}_i^f(x), \quad i \neq n_0; \qquad \frac{d\hat{a}_{n_0}^f(x)}{dt} = -\hat{a}_{n_0}^x(x), \quad \left\langle \hat{a}_{n_0}^f\right\rangle = 0, \quad i = n_0, \qquad (36)$$
then Eqs. (34) and (35) become
$$\frac{1}{T}\int_0^T \hat{a}_i^f\,a_i^f\,dt = \frac{\hat{a}_i^f\,a_i^x\big|_0^T}{T} + \frac{1}{T}\int_0^T \hat{a}_i^x\,a_i^x\,dt, \quad i \neq n_0;$$
$$\frac{1}{T}\int_0^T \hat{a}_{n_0}^f\,a_{n_0}^f\,dt = \frac{\hat{a}_{n_0}^f\,a_{n_0}^x\big|_0^T}{T} + \frac{1}{T}\int_0^T \hat{a}_{n_0}^x\,a_{n_0}^x\,dt - \eta\left(\frac{1}{T}\int_0^T \hat{a}_{n_0}^f\,dt - \left\langle \hat{a}_{n_0}^f\right\rangle\right), \quad i = n_0. \qquad (37)$$
As $T \to \infty$, both equations reduce to Eq. (33). In summary, if the scalar fields $\hat{a}_i^f$ satisfy Eq. (36), then they also satisfy Eq. (37) and thus Eq. (33); as a result, the $\hat{f}$ formed by these $\hat{a}_i^f$ through Eq. (27) satisfies Eq. (26) and is therefore the desired adjoint vector field.

For each $i \neq n_0$, the scalar field $\hat{a}_i^f$ satisfying Eq. (36) can be computed by solving the ordinary differential equation
$$\frac{d\hat{a}_i^f}{dt} = -\hat{a}_i^x - \lambda_i\,\hat{a}_i^f. \qquad (38)$$
Contrary to the computation of $a_i^x$ through solving Eq. (19), the time integration should be forward in time for positive $\lambda_i$ and backward in time for negative $\lambda_i$, in order for the difference between the computed $\hat{a}_i^f(t)$ and the desired $\hat{a}_i^f(x(t))$ to diminish exponentially. The $i = n_0$ equation in Eq. (36) can be directly integrated to obtain $\hat{a}_{n_0}^f(x)$. The equation is well defined because the right hand side is mean zero:
$$\frac{1}{T}\int_0^T \hat{a}_{n_0}^x(x(t))\,dt = \frac{1}{T}\int_0^T \frac{\partial J}{\partial x}\,\phi_{n_0}\,dt = \frac{1}{T}\int_0^T \frac{dJ}{dt}\,dt \;\xrightarrow{T\to\infty}\; 0. \qquad (39)$$
Therefore, integrating the right hand side of the $i = n_0$ equation in Eq. (36) over time and subtracting the mean of the integral yields the solution $\hat{a}_{n_0}^f(x)$ to the $i = n_0$ case of Eq. (36).

Algorithm 2. The adjoint sensitivity analysis algorithm

(1) Choose a "spin-up buffer time" $T_B$ and a "statistical averaging time" $T_A$. $T_B$ should be much longer than $1/|\lambda_i|$ for all nonzero Lyapunov exponents $\lambda_i$, so that the solutions of Eq. (19) can reach $a_i^x$ over a time span of $T_B$. $T_A$ should be much longer than the decorrelation time of the dynamics, so that one can accurately approximate a statistical quantity by averaging over $[0, T_A]$.
(2) Obtain an initial condition on the attractor at $t = -T_B$, e.g., by solving $\dot{x} = f(x)$ for a sufficiently long time span, starting from an arbitrary initial condition.
(3) Solve $\dot{x} = f(x)$ to obtain a trajectory $x(t)$, $t \in [-T_B, T_A + T_B]$; compute the Lyapunov exponents $\lambda_i$ and the Lyapunov covariant vectors $\phi_i(x(t))$ along the trajectory, e.g., using the algorithms in [11,2].
(4) Perform the Lyapunov spectrum decomposition of $(\partial J/\partial x)^T$ along the trajectory $x(t)$ to obtain $\hat{a}_i^x(x(t))$, $i = 1, \ldots, n$, as in Eq. (28). (See the sketch following this listing.)
(5) Solve the differential Eq. (38) to obtain $\hat{a}_i^f(x(t))$ over the time interval $[0, T_A]$. The equations with negative $\lambda_i$ are solved backward in time from $t = T_A + T_B$ to $t = 0$; the ones with positive $\lambda_i$ are solved forward in time from $t = -T_B$ to $t = T_A$. For $i = n_0$, the $i = n_0$ equation of Eq. (36) is integrated along the trajectory; the mean of the integral is subtracted from the integral itself to obtain $\hat{a}_{n_0}^f$.
(6) Compute $\hat{f}$ along the trajectory $x(t)$, $t \in [0, T_A]$, with Eq. (27).
(7) Compute $d\langle J\rangle/d\epsilon$ using Eq. (26) by averaging over the time interval $[0, T_A]$.
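The sketch below (ours, not code from the paper) shows how Step 4 can be carried out at a single point of the trajectory: the adjoint covariant vectors $\psi_i$ follow from the conjugate relation (30), and the coefficients $\hat{a}_i^x$ from Eq. (28). The matrix entries and array names are made up for illustration.

```python
# Illustrative sketch of Algorithm 2, Step 4, at one trajectory point x(t).
import numpy as np

def adjoint_decompose(Phi, dJdx):
    """Phi: matrix whose columns are the primal covariant vectors phi_i.
    dJdx: the row vector dJ/dx at the same point.
    Returns (Psi, a_hat_x): adjoint vectors psi_i as columns of Psi (Eq. (30))
    and the coefficients a_hat_i^x of Eq. (28)."""
    # Eq. (30): the matrix of psi_i is the transposed inverse of the matrix
    # of phi_i, so that psi_i^T phi_j equals 1 if i == j and 0 otherwise.
    Psi = np.linalg.inv(Phi).T
    # Eq. (28): (dJ/dx)^T = sum_i a_hat_i^x psi_i.
    a_hat_x = np.linalg.solve(Psi, np.asarray(dJdx, dtype=float))
    return Psi, a_hat_x

# Example with J(x) = x_3, so dJ/dx = (0, 0, 1), and made-up covariant vectors.
Phi = np.array([[ 0.4, -0.8, 0.1],
                [ 0.7,  0.3, 0.2],
                [-0.6,  0.5, 1.0]])
Psi, a_hat_x = adjoint_decompose(Phi, [0.0, 0.0, 1.0])
print(a_hat_x)   # equals Phi^T (dJ/dx)^T, i.e. the third row of Phi here
```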
The above analysis is summarized in Algorithm 2 for computing the sensitivity derivative of the statistical average $\langle J\rangle$ to infinitesimal perturbations $\delta f$. The preparation phase of the algorithm (Steps 1-3) is exactly the same as in Algorithm 1. These steps compute a trajectory $x(t)$ and the Lyapunov spectrum decomposition along the trajectory. The adjoint algorithm then starts by decomposing the derivative vector $(\partial J/\partial x)^T$ (Step 4), followed by computing the adjoint vector $\hat{f}$ (Steps 5-6), and finally computing $d\langle J\rangle/d\epsilon$ for a particular $\delta f$ (Step 7). Note that the sensitivity of the same $\langle J\rangle$ to many different perturbations $\delta f_1, \delta f_2, \ldots$ can be computed by repeating only the last step of the algorithm. Therefore, this is an "adjoint" algorithm, in the sense that it efficiently computes the sensitivity derivatives of a single output quantity to many input perturbations. It is worth noting that the $\hat{f}$ computed using Algorithm 2 satisfies the adjoint equation
$$\dot{\hat{f}} = -\left(\frac{\partial f}{\partial x}\right)^T\hat{f} - \left(\frac{\partial J}{\partial x}\right)^T. \qquad (40)$$
This can be verified by taking the derivative of Eq. (27), substituting Eq. (36), then using Eq. (28). However, $\hat{f}$ must satisfy both an initial condition and a terminal condition, making Eq. (40) difficult to solve with conventional time integration methods. In fact, Algorithm 2 is equivalent to splitting $\hat{f}$ into stable, neutral and unstable components, corresponding to positive, zero and negative Lyapunov exponents, then solving Eq. (40) separately for each component in different time directions. This alternative version of the adjoint sensitivity computation algorithm could be useful for large systems, to avoid computation of all the Lyapunov covariant vectors.

7. An example: the Lorenz attractor

We consider the Lorenz attractor $\dot{x} = f(x)$, where $x = (x_1, x_2, x_3)^T$ and
$$f(x) = \begin{pmatrix} s\,(x_2 - x_1) \\ x_1(r - x_3) - x_2 \\ x_1 x_2 - b\,x_3 \end{pmatrix}. \qquad (41)$$

The "classic" parameter values $s = 10$, $r = 28$, $b = 8/3$ are used. Both the forward sensitivity analysis algorithm (Algorithm 1) and the adjoint sensitivity analysis algorithm (Algorithm 2) are performed on this system. We want to demonstrate the computational efficiency of our algorithms; therefore, we choose a relatively short statistical averaging interval of $T_A = 10$ and a spin-up buffer period of $T_B = 5$. Only a single trajectory of length $T_A + 2T_B$ on the attractor is required in our algorithm. Considering that the oscillation period of the Lorenz attractor is around 1, the combined trajectory length of 20 is a reasonable time integration length for most simulations of chaotic dynamical systems. In our example, we start the time integration from $t = -10$ at $x = (8.67139571762, 4.98065219709, 25)$, and integrate the equation to $t = -5$, to ensure that the entire trajectory from $-T_B$ to $T_A + T_B$ is roughly on the attractor. The rest of the discussion in this section is focused on the trajectory $x(t)$ for $t \in [-T_B, T_A + T_B]$.

7.1. Lyapunov covariant vectors

The Lyapunov covariant vectors are computed in Step 3 of both Algorithms 1 and 2, over the time interval $[-T_B, T_A + T_B]$. These vectors, along with the trajectory $x(t)$, are shown in Fig. 2. The three dimensional Lorenz attractor has three pairs of Lyapunov exponents and Lyapunov covariant vectors. $\lambda_1$ is the only positive Lyapunov exponent, and $\phi_1$ is computed by integrating the tangent linear equation
$$\dot{\tilde{x}} = \frac{\partial f}{\partial x}\tilde{x} \qquad (42)$$
forward in time from an arbitrary initial condition at $t = -T_B$. The first Lyapunov exponent is estimated to be $\lambda_1 \approx 0.95$ through a linear regression of $\tilde{x}$ in the log space. The first Lyapunov vector is then obtained as $\phi_1 = \tilde{x}\,e^{-\lambda_1 t}$.
Fig. 2. The Lyapunov covariant vectors of the Lorenz attractor along the trajectory $x(t)$ for $t \in [0, 10]$. The x-axes are $t$; the blue, green and red lines correspond to the $x_1$, $x_2$ and $x_3$ coordinates in the state space, respectively.
$\lambda_2 = 0$ is the vanishing Lyapunov exponent; therefore, $\phi_2 = h\,f(x)$, where $h = 1/\sqrt{\langle\|f\|_2^2\rangle}$ is a normalizing constant that makes the mean magnitude of $\phi_2$ equal to 1. The third Lyapunov exponent $\lambda_3$ is negative, so $\phi_3$ is computed by integrating the tangent linear Eq. (42) backwards in time from an arbitrary initial condition at $t = T_A + T_B$. The third Lyapunov exponent is estimated to be $\lambda_3 \approx -14.6$ through a linear regression of the backward solution $\tilde{x}$ in the log space. The third Lyapunov vector is then obtained as $\phi_3 = \tilde{x}\,e^{-\lambda_3 t}$.
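The following sketch (ours, not code from the paper) reproduces the essence of this step: it integrates the tangent linear Eq. (42) alongside the Lorenz trajectory, renormalizing the tangent vector periodically, and accumulates the logarithmic growth factors to estimate $\lambda_1$. The segment length, number of segments and starting point are arbitrary choices, and the result should come out close to the value 0.95 quoted above.

```python
# Illustrative sketch: estimate lambda_1 by integrating the tangent linear
# equation (42) along a Lorenz trajectory with periodic renormalization.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_with_tangent(t, z, s=10.0, r=28.0, b=8.0 / 3.0):
    x, v = z[:3], z[3:]
    f = [s * (x[1] - x[0]), x[0] * (r - x[2]) - x[1], x[0] * x[1] - b * x[2]]
    jac = np.array([[-s,        s,    0.0 ],
                    [r - x[2], -1.0, -x[0]],
                    [x[1],      x[0], -b  ]])
    return np.concatenate([f, jac @ v])      # primal and tangent dynamics

x = np.array([-8.0, -9.0, 25.0])             # roughly on the attractor
v = np.array([1.0, 0.0, 0.0])                # arbitrary tangent vector
dt, n_seg, log_growth = 0.5, 200, 0.0
for _ in range(n_seg):
    sol = solve_ivp(lorenz_with_tangent, (0.0, dt), np.concatenate([x, v]),
                    rtol=1e-9, atol=1e-9)
    x, v = sol.y[:3, -1], sol.y[3:, -1]
    log_growth += np.log(np.linalg.norm(v))
    v /= np.linalg.norm(v)                   # renormalize to avoid overflow
print("lambda_1 estimate:", log_growth / (n_seg * dt))   # about 0.9
```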
7.2. Forward sensitivity analysis

We demonstrate our forward sensitivity analysis algorithm by computing the sensitivity derivatives of three statistical quantities, $\langle x_1^2\rangle$, $\langle x_2^2\rangle$ and $\langle x_3\rangle$, to a small perturbation in the system parameter $r$ in the Lorenz attractor Eq. (41). The infinitesimal perturbation $r \to r + \epsilon$ is equivalent to the perturbation
$$\delta f = \frac{\partial f}{\partial r} = (0, x_1, 0)^T. \qquad (43)$$
The forcing term defined in Eq. (43) is plotted in Fig. 3(a). Fig. 3(b) plots the decomposition coefficients $a_i^f$, computed by solving the $3\times 3$ linear system defined by Eq. (14) at every point on the trajectory.
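As an illustration of this step (ours, not code from the paper; the covariant vectors below are made up), the coefficients at one trajectory point are obtained by stacking $\phi_1, \phi_2, \phi_3$ as the columns of a matrix and solving the $3\times 3$ system of Eq. (14):

```python
# Illustrative sketch: decompose delta_f(x) = (0, x_1, 0)^T of Eq. (43) onto
# the Lyapunov covariant vectors phi_i at one trajectory point, Eq. (14).
import numpy as np

def decompose_df(Phi, x):
    """Phi: 3x3 matrix whose columns are phi_1, phi_2, phi_3 at the point x.
    Returns (a_1^f, a_2^f, a_3^f) such that delta_f = sum_i a_i^f phi_i."""
    delta_f = np.array([0.0, x[0], 0.0])     # Eq. (43)
    return np.linalg.solve(Phi, delta_f)

x = np.array([-8.0, -9.0, 25.0])             # made-up point, for illustration
Phi = np.array([[ 0.4, -0.8, 0.1],
                [ 0.7,  0.3, 0.2],
                [-0.6,  0.5, 1.0]])          # made-up covariant vectors
a_f = decompose_df(Phi, x)
print(a_f, np.allclose(Phi @ a_f, [0.0, x[0], 0.0]))   # reconstruction check
```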
Fig. 3. Lyapunov vector decomposition of $\delta f$. The x-axes are $t$; the blue, green and red lines on the left are the first, second and third components of $\delta f$ as defined in Eq. (43); the blue, green and red lines on the right are $a_1^f$, $a_2^f$ and $a_3^f$ in the decomposition of $\delta f$ (Eq. (14)), respectively.
For each $a_i^f$ obtained through the decomposition, Eq. (19) or (23) is solved to obtain $a_i^x$. For $i = 1$, Eq. (19) is solved backwards in time from $t = T_A + T_B$ to $t = 0$. For $i = n_0 = 2$, the time dilation constant is estimated to be $\eta \approx 2.78$, and Eq. (23) is integrated to obtain $a_2^x$. For $i = 3$, Eq. (19) is solved forward in time from $t = -T_B$ to $t = T_A$. The resulting values of $a_i^x$, $i = 1, 2, 3$, are plotted in Fig. 4(a). These values are then substituted into Eq. (13) to obtain $\delta x$, as plotted in Fig. 4(b). The "shadow" trajectory defined as $x' = x + \epsilon\,\delta x$ is also plotted in Fig. 1 as the red lines, for $\epsilon = 1/3$. This $\delta x = S_f^{-1}\delta f$ is approximately the shadow coordinate perturbation "induced" by a $1/3$ increase in the input parameter $r$, a.k.a. the Rayleigh number of the Lorenz attractor.

The last step of the forward sensitivity analysis algorithm is computing the sensitivity derivatives of the output statistical quantities using Eq. (9). We found that using a windowed time averaging [4] yields more accurate sensitivities. Our estimates over the time interval $[0, T_A]$ are
$$\frac{d\langle x_1^2\rangle}{dr} \approx 2.64, \qquad \frac{d\langle x_2^2\rangle}{dr} \approx 3.99, \qquad \frac{d\langle x_3\rangle}{dr} \approx 1.01. \qquad (44)$$
These sensitivity values compare well to results obtained through finite differences, as shown in Section 7.4.

7.3. Adjoint sensitivity analysis

We demonstrate our adjoint sensitivity analysis algorithm by computing the sensitivity derivatives of the statistical quantity $\langle x_3\rangle$ to small perturbations in the three system parameters $s$, $r$ and $b$ in the Lorenz attractor Eq. (41). The first three steps of Algorithm 2 are the same as in Algorithm 1, and have been demonstrated in Section 7.1. Step 4 involves decomposing $(\partial J/\partial x)^T$ into the three adjoint Lyapunov covariant vectors. In our case $J(x) = x_3$, therefore $\partial J/\partial x \equiv (0, 0, 1)$, as plotted in Fig. 5(a). The adjoint Lyapunov covariant vectors $\psi_i$ can be computed using Eq. (30) by inverting the $3\times 3$ matrix formed by the (primal) Lyapunov covariant vectors $\phi_i$ at every point on the trajectory. The coefficients $\hat{a}_i^x$, $i = 1, 2, 3$, can then be computed by solving Eq. (28). These scalar quantities along the trajectory are plotted in Fig. 5(b) for $t \in [0, T_A]$.

Once we obtain the $\hat{a}_i^x$, the $\hat{a}_i^f$ can be computed by solving Eq. (38). The solution is plotted in Fig. 6(a). Eq. (27) can then be used to combine the $\hat{a}_i^f$ into the adjoint vector $\hat{f}$. The computed $\hat{f}$ along the trajectory is plotted both in Fig. 6(b) as a function of $t$, and also in Fig. 7 as arrows on the trajectory in the state space. The last step of the adjoint sensitivity analysis algorithm is computing the sensitivity derivatives of $\langle J\rangle$ to the perturbations $\delta f_s = \frac{\partial f}{\partial s}$, $\delta f_r = \frac{\partial f}{\partial r}$ and $\delta f_b = \frac{\partial f}{\partial b}$ using Eq. (26). Our estimates over the time interval $[0, T_A]$ are computed as
$$\frac{d\langle x_3\rangle}{ds} \approx 0.21, \qquad \frac{d\langle x_3\rangle}{dr} \approx 0.97, \qquad \frac{d\langle x_3\rangle}{db} \approx -1.74. \qquad (45)$$
Note that $d\langle x_3\rangle/dr$ estimated using the adjoint method differs from the same value estimated using the forward method (44). This discrepancy can be caused by the different numerical treatments of the time dilation term in the two methods. The forward method numerically estimates the time dilation constant $\eta$ through Eq. (22), while the adjoint method sets the mean of $\hat{a}_{n_0}^f$ to zero (Eq. (36)), so that the computation is independent of the value of $\eta$. This difference could cause an apparent discrepancy in the estimated sensitivity derivatives. The next section compares these sensitivity estimates, together with the sensitivity estimates computed in Section 7.2, to a finite difference study.

7.4. Comparison with the finite difference method

To reduce the noise in the computed statistical quantities in the finite difference study, a very long time integration length of $T = 100{,}000$ is used for each simulation. Despite this long time averaging, the computed quantities contain statistical noise of the order of 0.01. The noise limits the step size of the finite difference sensitivity study. Fortunately, all of the output statistical
Fig. 4. Inversion of $S_f$ for $\delta x = S_f^{-1}\delta f$. The x-axes are $t$; the blue, green and red lines on the left are $a_1^x$, $a_2^x$ and $a_3^x$, respectively; the blue, green and red lines on the right are the first, second and third components of $\delta x$, computed via Eq. (13).
Fig. 5. Adjoint Lyapunov vector decomposition of $\partial J/\partial x$. The x-axes are $t$; the blue, green and red lines on the left are the first, second and third components of $\partial J/\partial x$; the blue, green and red lines on the right are $\hat{a}_1^x$, $\hat{a}_2^x$ and $\hat{a}_3^x$ in the decomposition of $\partial J/\partial x$ (Eq. (28)), respectively.
Fig. 6. Computation of the adjoint solution $\hat{f}$ for the Lorenz attractor. The x-axes are $t$; the blue, green and red lines on the left are $\hat{a}_1^f$, $\hat{a}_2^f$ and $\hat{a}_3^f$, respectively; the blue, green and red lines on the right are the first, second and third components of $\hat{f}$, computed via Eq. (27).
Fig. 7. The adjoint sensitivity derivative $\hat{f}$ as in Eq. (26), represented by arrows on the trajectory.
Fig. 8. Histogram of sensitivities computed using Algorithm 1 (forward sensitivity analysis) starting from 200 random initial conditions, with $T_A = 10$, $T_B = 5$. The red region identifies the $3\sigma$ confidence interval estimated using finite difference regression.
Fig. 9. Histogram of sensitivities computed using Algorithm 2 (adjoint sensitivity analysis) starting from 200 random initial conditions, with $T_A = 10$, $T_B = 5$. The red region identifies the $3\sigma$ confidence interval estimated using finite difference regression.
quantities seem fairly linear with respect to the input parameters, and a moderately large step size of the order of 0.1 can be used. To further reduce the effect of statistical noise, we perform linear regressions through 10 simulations of the Lorenz attractor, with $r$ equally spaced between 27.9 and 28.1. The total time integration length (excluding spin-up time) is 1,000,000. The resulting computation cost is in sharp contrast to our method, which involves a trajectory of length only 20. A similar analysis is performed for the parameters $s$ and $b$, where 10 values of $s$ equally spaced between 9.8 and 10.2 are used, and 10 values of $b$ equally spaced between $8/3 - 0.02$ and $8/3 + 0.02$ are used. The slopes estimated from the linear regressions, together with $3\sigma$ confidence intervals (where $\sigma$ is the standard error of the linear regression), are listed below:
$$\frac{d\langle x_1^2\rangle}{dr} = 2.70 \pm 0.10, \qquad \frac{d\langle x_2^2\rangle}{dr} = 3.87 \pm 0.18, \qquad \frac{d\langle x_3\rangle}{dr} = 1.01 \pm 0.04,$$
$$\frac{d\langle x_3\rangle}{ds} = 0.16 \pm 0.02, \qquad \frac{d\langle x_3\rangle}{db} = -1.68 \pm 0.15. \qquad (46)$$
To further assess the accuracy of our algorithms, which involve finite time approximations to Eqs. (9) and (26), we repeated both Algorithms 1 and 2 200 times, starting from random initial conditions at $t = -10$. We keep the statistical averaging time $T_A = 10$ and the spin-up buffer time $T_B = 5$. The resulting histogram of sensitivities computed with Algorithm 1 is shown in Fig. 8; the histogram of sensitivities computed with Algorithm 2 is shown in Fig. 9. The finite difference estimates are also indicated in these plots. We observe that our algorithms compute accurate sensitivities most of the time. However, some of the computed sensitivities seem to have heavy tails in their distribution. This may be due to the behavior of the Lorenz attractor near the unstable fixed point $(0, 0, 0)$. Similar heavy tailed distributions have been observed in other studies of the Lorenz attractor [1], which found that certain quantities computed on the Lorenz attractor can have unbounded second moments. This could be the case in our sensitivity estimates. Despite this minor drawback, the sensitivities computed using our algorithms are of good quality, and our algorithms are much more efficient than existing sensitivity computation methods using ensemble averages.

8. Conclusion

This paper derived a forward algorithm and an adjoint algorithm for computing sensitivity derivatives in chaotic dynamical systems. Both algorithms efficiently compute the derivative of statistical quantities $\langle J\rangle$ to infinitesimal perturbations $\delta f$ of the dynamics.
The forward algorithm starts from a given perturbation $\delta f$, and computes a perturbed "shadow" coordinate system $\delta x$, e.g. as shown in Fig. 1. The sensitivity derivatives of multiple statistical quantities to the given $\delta f$ can be computed from $\delta x$. The adjoint algorithm starts from a statistical quantity $\langle J\rangle$, and computes an adjoint vector $\hat{f}$, e.g. as shown in Fig. 7. The sensitivity derivative of the given $\langle J\rangle$ to multiple input perturbations can be computed from $\hat{f}$.

We demonstrated both the forward and adjoint algorithms on the Lorenz attractor at standard parameter values. The forward sensitivity analysis algorithm is used to simultaneously compute $\partial\langle x_1^2\rangle/\partial r$, $\partial\langle x_2^2\rangle/\partial r$ and $\partial\langle x_3\rangle/\partial r$; the adjoint sensitivity analysis algorithm is used to simultaneously compute $\partial\langle x_3\rangle/\partial s$, $\partial\langle x_3\rangle/\partial r$ and $\partial\langle x_3\rangle/\partial b$. We show that, using a single trajectory of length about 20, both algorithms can efficiently compute accurate estimates of all the sensitivity derivatives.

References

[1] G. Eyink, T. Haine, D. Lea, Ruelle's linear response formula, ensemble adjoint schemes and Lévy flights, Nonlinearity 17 (2004) 1867-1889.
[2] F. Ginelli, P. Poggi, A. Turchi, H. Chaté, R. Livi, A. Politi, Characterizing dynamics with covariant Lyapunov vectors, Physical Review Letters 99 (2007) 130601.
[3] A. Jameson, Aerodynamic design via control theory, Journal of Scientific Computing 3 (1988) 233-260.
[4] J. Krakos, Q. Wang, S. Hall, D. Darmofal, Sensitivity analysis of limit cycle oscillations, Journal of Computational Physics 231 (8) (2012) 3228-3245.
[5] D. Lea, M. Allen, T. Haine, Sensitivity analysis of the climate of a chaotic system, Tellus 52A (2000) 523-532.
[6] D. Ruelle, Differentiation of SRB states, Communications in Mathematical Physics 187 (1997) 227-241.
[7] D. Ruelle, A review of linear response theory for general differentiable dynamical systems, Nonlinearity 22 (4) (2009) 855.
[8] J.-N. Thepaut, P. Courtier, Four-dimensional variational data assimilation using the adjoint of a multilevel primitive-equation model, Quarterly Journal of the Royal Meteorological Society 117 (502) (1991) 1225-1254.
[9] D. Venditti, D. Darmofal, Grid adaptation for functional outputs: application to two-dimensional inviscid flow, Journal of Computational Physics 176 (2002) 40-69.
[10] Q. Wang, P. Moin, G. Iaccarino, Minimal repetition dynamic checkpointing algorithm for unsteady adjoint calculation, SIAM Journal on Scientific Computing 31 (4) (2009) 2549-2567.
[11] C. Wolfe, R. Samelson, An efficient method for recovering Lyapunov vectors from singular vectors, Tellus A 59 (3) (2007) 355-366.