Aerospace Science and Technology 99 (2020) 105715
Contents lists available at ScienceDirect
Aerospace Science and Technology www.elsevier.com/locate/aescte
Satellite proximate pursuit-evasion game with different thrust configurations ✩ Dong Ye a , Mingming Shi b,∗ , Zhaowei Sun a a b
Research Center of Satellite, Harbin Institute of Technology, 150001, Harbin, China Faculty of Sciences and Engineering, University of Groningen, 9747AG, Groningen, the Netherlands
a r t i c l e
i n f o
Article history: Received 19 July 2019 Received in revised form 15 December 2019 Accepted 16 January 2020 Available online 23 January 2020 Communicated by Christian Circi Keywords: Pursuit-evasion game CW equations Saddle solution Newton method
a b s t r a c t This paper investigates the proximity satellite pursuit-evasion game where the pursuer carries three orthogonal thrusters, each of which can exert a magnitude-limited force along one axis, while the evader has a single thruster which can produce a bounded acceleration in any direction. The pursuer or the evader tries their best to capture or escape by performing orbital maneuvers. By establishing a local moving coordinate frame on the originally revolving orbit of the evader, we reduce the dynamics of each player to the linear Clohessy-Wiltshire equations. We then transform the problem of finding the optimal controls associated with the saddle point solution of the game into the two-point boundary value problem, which is solved by combining the heuristic searching and Newton method. At last, by numerical simulations, we discuss the effectiveness of the proposed algorithm in finding the open-loop solution to the game. We show that, in contrast with pursuit-evasion games where each player has one single thruster, the problem considered in this paper may not be solved efficiently by the indirect method, since there exist some initial states of the players such that the proposed algorithm fails to solve the open-loop control. © 2020 Elsevier Masson SAS. All rights reserved.
1. Introduction In this paper, we investigate proximate satellite pursuit-evasion game, in which the pursuer and evader with magnitude bounded controls try to minimize or maximize the capture time, respectively. An issue for solving this type of differential games is that the analytic feedback solution is often difficult to derive from the Hamilton-Jacobi-Isaacs (HJI) partial differential equations [1]. Hence, most literature [2–7] has been dedicated to solving the open-loop trajectory of the saddle-point solution. Even though some early works [8,9] provided near-optimal close-loop control methods for the minmax-range game of two spacecrafts with constant-thrust, the results still rely on the pursuer to solve the open-loop solution periodically. A standard procedure for finding the open-loop saddle-point solution usually converts the pursuitevasion game into a two-point boundary value problem (TPBVP), which is then solved by the direct, indirect, or semi-direct method [5], [7–13]. For instance, when the pursuing satellite is far away
✩ The work of Dong Ye was supported by the National Natural Science Foundation of China (61603115, 91638301, 51875119) and the National Defense Basic Research Program (JCKY2017603B006). Corresponding author. E-mail addresses:
[email protected] (D. Ye),
[email protected] (M. Shi),
[email protected] (Z. Sun).
*
https://doi.org/10.1016/j.ast.2020.105715 1270-9638/© 2020 Elsevier Masson SAS. All rights reserved.
from the evader, Mauro Pontani [5,1] gave a semi-direct method to solve the open-loop solution of the three dimensional nonlinear spacecraft pursuit-evasion game. Readers who are interested in this problem shall also check [7] for the direct method, and [6] for the homotopic approach. The solving procedure is also applied to the scenario where the pursuer is sufficiently close to the evader. In this situation, the nonlinear dynamics of each player can be reduced to the linear CW equations [2], or to the quasilinear Tschauner-Hempel (TH) equations [4], by building a local coordinate frame at a virtual satellite, which follows a circular or elliptical orbit near the pursuer and evader. The reduced dynamics lessens the complexity of the satellite pursuit-evasion game. For example, the authors in [2] showed that by exploiting the linear CW equations, the number of unknowns required to solve the TPBVP associated with the open-loop solution reduces from 13 in [5] to 4, which sharply decreases the difficulty in solving the problem. This merit has also been verified in [3], and even in [4], where the dynamics of each player is represented by the TH equations. All the aforementioned papers [2–7] assume that the players have the same thrust configuration. Namely, each player only carries one single thruster that can generate a magnitude-limited acceleration in any direction. However, from a practical point of view, the thrust configurations of the pursuer and the evader may be different. In fact, since the intention of the pursuer is to capture the evader as soon as possible, the designer of the pursuing satel-
2
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
lite may want to improve the maneuverability of the pursuer by adding extra thrusters. Hence, dissimilarly from [2–7], in this paper we consider a new scenario for the satellite pursuit-evasion game. We assume that the pursuer carries three orthogonal thrusters, each of which can exert a magnitude-bounded force along one direction. This thrust configuration has also been assumed in the research of formation flying [14], station keeping [15], and spacecraft rendezvous [16], [17], [18]. In contrast, the evader has a single thruster which can produce a finite acceleration in any direction. Compared with the existing literature, the problem serves as a starting point for considering more complex and real satellite pursuit-evasion game. To facilitate the presentation of this main idea, we consider the proximate satellite pursuit-evasion game and let the dynamics of the players be represented by the CW equations. We show that the number of the unknown variables for computing the open-loop saddle-point solution of the pursuit-evasion game, can also be decreased from 13 in [5] to 4. We use the indirect method to find the solution of the game. Specifically, to solve the TPBVP transformed from the game, we first heuristically search an initial guess for the unknown variables by the Particle Swarm Optimization (PSO) algorithm, which has been an effective method for solving the trajectory optimization problems, like the optimal orbit transfer [19, 20], the formation reconfiguration [21,22], and also the pursuitevasion game [2,4]. Then, we refine the initial guess by the Newton method. By the numerical simulation, we show that the Newton iteration can stop very quickly for some initial states of the pursuer and the evader. However, if the initial states of the players are not nice, it is difficult to solve the open-loop control of the pursuitevasion game. This undesirable feature motivates us to write this note to call more attention from the researchers to propose other efficient methods for solving the problem considered in this paper. At last, we mention that if the velocities of the pursuer and evader are required to be the same at the game ending time, satellite pursuit-evasion game can be considered as the problem of satellites rendezvous with non-cooperative target and/or bounded disturbance [23–25]. However, [23,25] assumed that the input of the chaser are not bounded, while [24] could not prove that the position of the purser will exactly converge to the evader. Hence, all the methods in these three papers are not suitable for the problem of satellite pursuit-evasion game. The rest of the paper consists of five sections. In Section 2, we introduce the dynamics of the players. In Section 3, we derive the expression for the saddle-point optimal controls of the pursuer and evader, and prove a theorem that can be used to reduce the number of state variables in the game dynamics. Section 4 converts the pursuit-evasion game into the TPBVP, and provides the solving method. In Section 5, we demonstrate the proposed method by several numerical examples, and show that the pursuit-evasion game considered in this paper may be difficult to solve by the indirect method for some initial states. At last, Section 6 concludes the paper. 2. Relative orbital dynamics Consider a chief satellite that is moving on a circular orbit around the Earth. The motion equations of a deputy relative to the chief can be represented in a local, rotating reference frame, where the origin O is set at the center of the chief, the O x axis points towards the radial direction of the temporal location of the chief, the O y axis follows the along-track direction of the chief, and the O z axis points in the direction of the orbital angular momentum, completing a right-handed coordinate frame. If the distance between the deputy and the chief is sufficiently small compared with the distance from the deputy to the center of the Earth, the motion
equations of the deputy in the local coordinate frame O xyz can be simplified to the CW equations [2,26], that is
x¨ δ − 2ω y˙ δ − 3ω2 xδ = u x y¨ δ + 2ω x˙ δ = u y z¨ δ + ω2 zδ = u z
(1)
where xδ , y δ , zδ are the three components of the position of the deputy in the local coordinate frame O xyz, ω is the orbital angular velocity of the chief, and u x , u y , and u z represent the three components of the acceleration in the coordinate frame O xyz. We define the system state variables as x = [xδ y δ zδ x˙ δ y˙ δ z˙ δ ] T , so that we can rewrite the deputy dynamics in the state space form, that is
x˙ = Ax + Bu , with
A=
⎡
(2)
O 3×3 A1
3ω 2 ⎣ 0 A1 = 0
0 0 0
I 3×3 A2 0 0
−ω2
,
⎤ ⎦,
B=
O 3×3 I 3×3
⎡
0 A 2 = ⎣ −2ω 0
(3) 2ω 0 0
⎤
0 0⎦ 0
If the pursuer P and evader E are close and do not travel a long distance during the game, a virtual chief that moves along a circular reference orbit can be selected near the trajectories of the two players, and the local coordinate frame O xyz can be defined accordingly. Straightforwardly, we can obtain the same dynamics as (2) for the pursuer and the evader. To distinguish the two players, we let the state of the pursuer be x P = [(r P ) ( v P ) ] ∈ R6 , where r P and v P are the position and velocity vectors of the pursuer in the local coordinate frame O xyz, respectively, and let x E = [(r E ) ( v E ) ] ∈ R6 , where r E and v E are the states of the evader. Then, the dynamics for both players is given by
x˙ P = Ax P + Bu P x˙ E = Ax E + Bu E
(4)
where u P and u E are the accelerations of the pursuer and evader, respectively. Unlike Refs. [1–7] and [27], in this paper we assume that the configurations of the thrust of the pursuer and the evader are different. Specifically, the pursuer can exert propulsions in all three axes of the local coordinate frame. The acceleration along each axis is independent of others, and has the following magnitude constraint,
|u P , x | ≤ ρ P , |u P , y | ≤ ρ P , |u P , z | ≤ ρ P
(5)
The assumption that the satellite has three independent thrusters, each of which having a magnitude constraint, has also been adopted in formation flying [14], station keeping [15], and spacecraft rendezvous [16], [17], [18]. Conversely, the evader only uses one thrust that can point in an arbitrary direction. The magnitude of the acceleration of the evader satisfies
E u ≤ ρ E 2
(6)
To make the interception achievable, we assume ρ P > ρ E , namely the pursuer has a higher maneuverability than the target. In the later, we denote by U P /U E the admissible control set for the pursuer/evader, where u P /u E should satisfy (5)/(6).
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
The assumption of dissimilar thrust configurations and that the pursuer takes three independent thrusters is based on the considerations that more thrusters in general brings more flexibility and maneuverability for the pursuer, whose mission is to capture the evader as soon as possible. We note that the evader may also take more than one thruster, and other thrust configurations for the players are also possible. However, the main objective of this paper is to draw the attentions of researchers to a more broad research scenario, where the players do not only need to take one single thruster as in [1–7], [27]. Hence, we will not elaborate on these situations.
3
= arg max (λ E , v ) u E u E ∈U E
= ρE
λE ,v λ E , v
(12)
The adjoint variables satisfy the dynamics as
∂H = − AλP ∂ xP ∂H λ˙ E = E = − A λ E ∂x
λ˙ P =
(13)
From Eq, (8), the function for the terminal condition ψ is 3. Game formulation
ψ(x Pf , x Ef , t f ) = υ (r Pf − r Ef ) + t f
3.1. Two player zero-sum differential game The interception problem can be formulated as a zero-sum game, which ends when the pursuer captures the evader. The cost of the game is the capture time t f , which is minimized by the pursuer, and maximized by evader. Hence, the pay-off function of the game is defined as
J =tf
(7)
We denote the game starting time as t 0 . In the later context, for any signal c (t ) : R → Rn , we denote by c 0 and c f the value of c at time t 0 and t f , respectively. For instance, x0 = x(t 0 ) and x f = x(t f ). The game ends when the position of the pursuer coincides with that of the evader, that is
(14)
where v ∈ R3 is the Lagrange multiplier. The terminal condition for the adjoint variable is given by
λ Pf = λ Ef =
∂ψ(x Pf , x Ef , t f ) ∂ x Pf ∂ψ(x Ef , x Ef , t f ) ∂ x Ef
(15)
Substituting (14) into the equality above leads to
λ Pf ,r = υ , λ Pf , v
= 03 ,
λ Ef ,r = −υ λ Ef , v = 03 .
(16)
At last, exploiting the transversality condition, we have
r Pf = r Ef
(8) P
E
We denote by λ = [(λ ) (λ ) ] ∈ R the adjoint variables, where λ P = [(λ P ,r ) (λ P , v ) ] and λ E = [(λ E ,r ) (λ E , v ) ], with λ P ,r , λ P , v , λ E ,r , λ E , v ∈ R3 , are the adjoint variables associated with the state of the pursuer and the evader, respectively. Then, the Hamiltonian function for the game is 12
(λ Pf ) ( Ax Pf + Bu Pf ) + (λ Ef ) ( Ax Ef + Bu Ef ) + 1 = (λ Pf ,r ) ( v Pf − v Ef ) + 1 = 0
(17)
3.2. Reduced zero-sum game problem
where
In this subsection, we show that the zero-sum game from the satellite pursuit-evasion problem can be transformed into a game with reduced states and dynamics. This reduction is based on the observation that the adjoint variables λ P and λ E satisfy
H P = (λ P ) Ax P + (λ P ) Bu P
λ P (t ) + λ E (t ) = 06
P
H=H +H
E
E
E
(9)
E
E
H = (λ ) Ax + (λ ) Bu
E
(10)
According to the Pontryagin Maximum Principle, the Hamiltonian is minimized by the pursuer and maximized by the evader. It is trivial that the Hamiltonian is separable since H P and H E are decoupled. Then, the minimization and maximization of H commute, which implies that the saddle solution of the differential game (u ∗P , u ∗E ) exists. Therefore, we can obtain the optimal control input for the pursuer [7], that is
u
P ,∗
= arg min H
P
u P ∈U P
for all t ∈ [t 0 , t f ]. This observation has been shown in [2,3] for the satellite pursuit-evasion game where each of the players takes only one thruster, and is shown also true when the orbit of the chief is elliptical [4]. One can verify that this observation also holds for the problem considered in this paper. Based on this observation, we then present the reduced zerosum game. Theorem 1. For the pursuit-evasion game defined on the system dynamics (4), the saddle-point control strategies of the players in the pursuitevasion game are equivalent to the control inputs, given as follows,
= arg min ((λ P ) Ax P + (λ P ) Bu P )
u P (t ) = −ρ P sign(λ v )
= arg min (λ P , v ) u P
u E (t ) = −ρ E
u P ∈U P u P ∈U P
= −ρ P sign(λ P , v ), and the evader
u E ,∗ = arg max H E u E ∈U E
= arg max ((λ E ) Ax E + (λ E ) Bu E ) u E ∈U E
(11)
(18)
λv , λ v
(19)
where λ v ∈ R3 is the sub-vector of λ := [(λr ) T (λ v ) T ] T ∈ R6 , which along with x = [r v ] ∈ R6 is the solution of the following differential equations
x˙ = Ax + B (u P − u E )
(20)
λ˙ = − A λ,
(21)
T
4
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
From the equalities above and λ vf = 03 , we have that if λrf is given, whereas the adjoint variable λ(τ ) for all τ ∈ [0, τ f ] can be computed by,
with the initial and terminal conditions,
x0 = x0P − x0E r f = 03
(22)
λ vf = 03 ˜ f := (λr )T v f + 1 = 0 H f
λ(τ ) = (τ )λ0 = (τ ) (τ f )−1 λrf = (τ ) (−τ f )λrf r λ
11 (τ − τ f ) r λ = (τ − τ f ) f =
21 (τ − τ f ) f 03
Proof. Define x = x P − x E , then by the system dynamics (4), we get the differential equation (20) with the initial condition x0 = x0P − x0E . The game ending condition x Pf = x Ef yields x f = 03 . Let
Hence, the optimal control inputs can be expressed as
λ = λ P , then by (18), we have that the control input of the pursuer satisfies the first equality of (19), where λ is the solution of (21) with terminal condition λ vf = 03 and
u E (τ ) = −
Hf +1=
(λrf )T ( v Pf
−
v Ef ) + 1
=
(λrf )T
v f + 1 = 0,
This theorem implies that the optimal control strategies for the players equal to those of the zero-sum differential game where the system state is the difference of the states of the pursuer and the evader and follows the same dynamics as that of the pursuer or the evader. Based on Theorem 1, the number of the state and adjoint variables reduces to 12, which is half of that of the original pursuit-evasion game. 4. Indirect game solving method 4.1. TPBVP and nonlinear equation solving
11 (t , t 0 ) 12 (t , t 0 ) .
21 (t , t 0 ) 22 (t , t 0 )
(24)
Since both (2) and (21) are linear, (t , t 0 ) and (t f , t 0 ) should only depend on τ . Hence, we replace the time t by τ in the later context and denote τ0 = t 0 − t 0 = 0, and τ f = t f − t 0 . For any matrix M (t , t 0 ) that only depends on τ , we will denote M (t , t 0 ) and M (t f , t 0 ) as M (τ ) and M (τ f ), respectively. For example, (τ ) = (t , t 0 ) and 11 (τ f ) = 11 (t f , t 0 ). For any τ1 , τ2 ∈ R, it is trivial to get
(τ1 )−1 = (−τ1 ),
(τ1 )−1 = (−τ1 )
(τ1 ) (τ2 ) = (τ1 + τ2 ), (τ1 ) (τ2 ) = (τ1 + τ2 )
21 (τ − τ f )λrf 21 (τ − τ f )λrf 2
(25)
ρE
(27)
Then, we can compute the final state of the reduced game as,
t f x f = (t f , t 0 )x(t 0 ) +
(t f , s) B (u P (s) − u E (s))ds t0
τ f = (τ f )x0 +
(τ f − s) B (u P (s) − u E (s))ds
(28)
0
Let ξ = [(λrf ) , τ f ] ∈ R4 be the unknown for the TPBVP. For each specific ξ , we can obtain λ(τ ) from Eq. (26) and the control inputs u P (τ ) and u E (τ ) from Eq. (27) for t ∈ [0, τ f ]. The state x(τ ) then can be derived by substituting the control inputs into (28) ˜ (ξ ) can also be calculated by x(τ f ) and integrating. The value of H and λ(τ f ). Imposing the second and fourth necessary conditions in (22) implies that the pursuit-evasion game can be reduced to solve ξ from the following nonlinear equation
From the previous subsection, we know that the satellite pursuit-evasion game can be cast as the two-point boundary value problem, in which the state vector [x λ ] satisfies the differential equation (20), with control inputs and boundary constraints given by (19) and (22), respectively. Notice that the computation of the optimal control inputs for both the players requires to know the adjoint λ(t ) at each time t ∈ [t 0 , t f ], which can be calculated from Eq. (21) if the boundary condition of λ and the game ending time t f are given. Hence, we focus on solving t f and the adjoint variable at any one of the terminal time, λ(t 0 ) or λ(t f ), such that the boundary constraints (22) hold. Note that this solving formulation is called the indirect method [3,7]. We first reduce the number of the unknowns that needs to be solved by exploiting the boundary condition (22) of the adjoint variable λ. Let (t , t 0 ) and (t , t 0 ) ∈ R6×6 be the state transition matrices of the system (2) and (21), respectively. Their detail expressions can be derived from the definition of the state transition matrix. We then let τ = t − t 0 and partition (t , t 0 ) as
(t , t 0 ) =
u P (τ ) = −sign( 21 (τ − τ f )λrf )ρ P
(23)
namely the third equality of (22). Finally, since λ E (t ) = −λ P (t ) for all t ∈ [t 0 , t f ], we have that the control input of the evader satisfies the second equality of (19). This ends the proof. 2
(26)
F (ξ ) :=
rf ˜f H
= 04
(29)
4.2. Newton method If we denote with ξ ∗ the solution of (29), we can solve ξ ∗ using the Newton method giving a proper initial guess ξ0 . In detail, let ξn be the approximation of ξ ∗ at the nth iteration, with n = 1, 2, . . . , then ξn updates as
ξn+1 = ξn − α J −1 (ξn ) F (ξn ),
(30)
where α > 0 is the size of the iteration step that can be chosen by trial and error, and J (ξn ) is the Jacobian matrix of F (ξn ) (also termed as the sensitivity matrix in [6]), namely,
⎡ ∂ F (ξn ) ⎣ J (ξn ) = = ∂ξn
∂r f ∂λrf ,n ∂H f ∂λrf ,n
dr f dτ f ,n dH f dτ f ,n
⎤ ⎦
(31)
We then explain how to evaluate the four blocks in J (ξn ). Note that for a given ξn , x(τ ), λ(τ ), u P (τ ), and u E (τ ) can be computed from Eq. (26), (27), and (28). Hence, the following value can be obtained straightforwardly
dx f dτ f ,n
= Ax f + B (u P (τ f ,n ) − u E (τ f ,n )),
(32)
by which, we can calculate the two following blocks in J (ξn ),
∂r f =vf ∂ τ f ,n ∂λrf ,n ∂H f ∂v f ∂v f = v f + (λrf ,n ) = (λrf ,n ) ∂ τ f ,n ∂ τ f ,n ∂ τ f ,n ∂ τ f ,n
(33)
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
∂r
∂H
The terms ∂λr f and ∂λr f are undetermined. For the latter, we f ,n f ,n have
∂λrf ∂ v f ∂ v f ∂H f r r = v + (λ ) = v + (λ ) , f f f ,n f ,n ∂λrf ,n ∂λrf ,n ∂λrf ,n ∂λrf ,n ∂v
(34)
5
for food. It was initially introduced by Eberhart and Kennedy, and later used widely in trajectory optimization problems, [2,22,4,21, 19]. Moreover, compared with the genetic algorithm, it is shown that the PSO algorithm has the same effectiveness in finding the global optimal solution, but significantly better computational efficiency [28,4].
∂r
where ∂λr f is still undetermined. Notice that the terms ∂λr f and f ,n f ,n ∂v f ∂x are the sub-matrices of ∂λr f . We approximate them by the ∂λrf ,n f ,n
finite difference method, i.e.
∂x f x f (ξn1 ) − x f (ξn ) x f (ξn2 ) − x f (ξn ) x f (ξn3 ) − x f (ξn ) ≈ , , r ∂λ f ,n h h h
4.4. Solving procedure and algorithm Considering the last two subsections, the procedure to solve the unknown ξ from (29) and extract the saddle-point solution of the pursuit-evasion game is given in Algorithm 1, in which nmax is the maximum number of iterations of the Newton method.
(35) Algorithm 1: Solving the open-loop solution of the game.
where
ξn1 ξn2 ξn3
Input: x P (0), x E (0), t 0 , ω and the solving tolerance ε . Result: t f , λrf , u P ,∗ , u E ,∗ and the trajectory of the game.
= ξn + h[1, 0, 0] = ξn + h[0, 1, 0] = ξn + h[0, 0, 1]
(36) ∂r
∂v
with h > 0 sufficiently small. Hence, given ξn , ∂λr f and ∂λr f can f ,n f ,n be determined in three steps: a) Choose a sufficiently small h > 0 and obtain ξn1 , ξn2 , ξn3 as in (36). b) For each ξni , i = 1, 2, 3, compute the adjoint variable λ(ξni ) from Eq. (26) and the state x f (ξni ) from
Initialization: x0 = x P (0) − x E (0), K > 0, h > 0, α > 0, nmax and n = 0; begin Solve the optimization problem (38) to get ξ0 by some heuristic searching method ; Using ξ0 to compute x f by (28) and F (ξ0 ) by (29); while F (ξn ) ≥ ε and n ≤ nmax do ∂r
∂x
∂x
∂x
∂H
Determine the Jacobian matrix J (ξn ) and update ξn+1 as in (30); Compute the final state x f (ξn+1 ) by (28) and F (ξn+1 ) by (29); n = n + 1; end Extract t f and λrf from ξn+1 . For each t ∈ [t 0 , t f ], calculate λ(t ) by (26),
(37)
Remark 1. In [6], to obtain ∂λ f , a matrix differential equation for f ∂x is first derived as ∂λ
∂r
Compute ∂ τ f by (35), and then calculate ∂λr f and ∂λr f by (34); f ,n f ,n f ,n
the integration (28). c) Calculate the matrix ∂λr f from Eq. (35). f ,n After obtaining F (ξn ) and the four blocks in J (ξn ), we can update ξn+1 by the iteration (30). Given a solving tolerance ε , the iteration stops when
F (ξn+1 ) < ε
∂H
Compute ∂ τ f and ∂ τ f by (33); f ,n f ,n Compute ξni for i = 1, 2, 3 by (36), and for each ξni with i = 1, 2, 3, calculate the adjoint variable λ(ξni ) by (26), the final state x f (ξni ) by (28) and F (ξni ) by (29);
the optimal control input as u P (t ), u E (t ) by (27) and the open-loop saddle-point trajectory of the game; end
f
d ∂ x(t ) dt ∂λ f
=
∂ dx(t ) ∂x ∂uP ∂uE =A +B −B ∂λ f dt ∂λ f ∂λ f ∂λ f
5. Numerical example and discussion
∂x
d ∂x The matrix ∂λ f is then computed by integrating dt ∂λ f from t 0 f to t f . In this paper, we can not apply this computing method, since the discontinuity in the sign function prevents us to derive an explicit expression for ∂∂λu P . Hence, we use the finite difference f
method. 4.3. Finding initial guess ξ0 by heuristic search The Newton method can converge to the solution of equations quickly. However it is sensitive to the initial guess of the solution. We find the initial guess for the Newton method by solving an optimization problem as follows,
min F (ξ ) K F (ξ ) ξ
(38)
where K = diag[ki ] is a diagonal matrix, with ki > 0, i ∈ {1, . . . , 4} being the weight associated with the ith equality in F (ξ ). The optimization problem (38) is converted from the problem of solving F (ξ ) = 04 , and it is easy to know that F (ξ ) = 04 if and only if the objective function reach the minimum. We can use heuristic methods, like genetic algorithm, cuckoo search or, PSO algorithm to provide an initial guess ξ0 of the solution of (38). In this paper, we use the PSO algorithm to solve, which is a population-based, stochastic optimizer that models the behavior of a flock of birds searching
In this section, we present the numerical simulations for the pursuit-evasion game considered in this paper. In the simulation, we assume that before the game starts, the evader is moving along a circular orbit, which is selected as the reference orbit. The local moving coordinate frame O xyz is built as mentioned in Section 2 with the origin O being the center of the evader as it was still moving along the original orbit without any maneuver. The height of the reference orbit is 1000 km, hence the angular velocity are ω = 9.962 × 10−4 rad/s. The magnitudes of propulsive accelerations of the pursuer and the evader are ρ P = 0.0002 km/s2 and ρ E = 0.0001 km/s2 . We implement the method for computing the open-loop saddle-point trajectory in Matlab by a 2.7 GHz Intel Core processor on Windows 10 with 8 GB of RAM. To find the initial guess of ξ ∗ , we use the swarm particle algorithm, which is implemented by the Matlab swarmparticle function, with 100 maximum iterations and 30 swarm size for each iteration. Other parameters in the swarmparticle function are selected as default. The weight ki is set as ki = 104 for i = 1, 2, 3 and ki = 100 for i = 4, which implies that we penalize more on the miss of the pursuer and the evader at the game ending time t f . In the Newton iteration, we let the size of the step be α = 0.6, and the iteration stop tolerance be ε = 0.01. We will present three examples with different initial states of the players. The open-loop optimal controls of the first two can be solved efficiently by Algorithm 1, while the solution of the third one is difficult to find.
6
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
Fig. 1. History of the pursuit evasion game for Example a. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
ξ ∗ = [10.689342 3.787719 5.547900 360.124]
5.1. Successfully solving Example a. The initial states of the pursuer and evader in the coordinate frame O xyz are given by,
x P (0) = [8 7 10 0.001 0.003 0.002]
(39)
x E (0) = [0 0 0 0 0 0]
(40)
where the units are km for the position and km/s for the velocity. Using Algorithm 1, we have
which implies that the game ending time is t f = 360.124 seconds. The whole solving procedure costs 24.786 seconds, while the Newton method stops after 7 iterations in 0.101 seconds. The final distance of the two players is 0.18 m, which implies that Algorithm 1 successfully solves the open-loop optimal trajectory of the game. The evolution of the states of the pursuer and the evader and the trajectory of the pursuit-evasion are shown in Fig. 1. From Fig. 1(a), we can see that the x, y , z components of the positions of the two players both decrease. Apparently, the three components xδP . y δP , zδP
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
7
Fig. 2. History of the pursuit evasion game for Example b.
decreases more quickly. Finally, the pursuer successfully captures the evader. From Fig. 1(c), the three components u xP , u Py , u zP are all negative over the game, which implies that the adjoint variable λ v does not change its sign. This is also confirmed in Fig. 1(d), where the three components of u E are all negative for t ∈ [t 0 , t f ]. Example b. The initial states of the pursuer and evader in the coordinate frame O xyz are given by,
x P (0) = [−15 − 5 20 0.000 − 0.005 − 0.004] E
x (0) = [0 0 0 0 0 0]
(41) (42)
Then using Algorithm 1, we have
ξ ∗ = [−6.42322 − 9.00389 4.697 452.782] which implies that the game ending time is t f = 452.782 seconds. The whole solving procedure costs 24.429 seconds, while the Newton method stops after 7 iterations in 0.100 seconds. The final distance of the two players is 0.14 m, which implies that Algorithm 1 successfully found the open-loop optimal control of the game in this case. The evolution of the states of the players and the trajectory of the pursuit-evasion are shown in Fig. 1. From
8
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
Fig. 3. Unsuccessful finding of the open-loop solution of the pursuit-evasion game where the initial conditions of the players are given in Section 5.2 and the players have different thrust configurations.
Fig. 2(c), u xP , u Py are positive, and u zP is negative for all t ∈ [t 0 .t f ], which implies that λ v does not change its sign over the game and is confirmed from the history of u E in Fig. 2(d).
with the saddle point solution of the pursuit-evasion game. In detail, the initial states of the two players are
x P (0) = [8 − 7 5 0 0 0]
(43)
5.2. Unsuccessful solving and discussion
x (0) = [0 0 0 0 0 0] .
We emphasize that there exist some initial states of the players such that the indirect method may not be able to solve the problem. This is shown by the example in this subsection, where Algorithm 1 fails in solving the open-loop trajectory associated
Simulations show that the solution ξ ∗ for initial condition cannot be found by Algorithm 1, even if we increase the swarm size to 500 and the maximum iterations to 800 in the swarmpaticle function, and decreases the step size α in the Newton method to
E
(44)
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
9
Fig. 4. Successful finding of the open-loop solution of the pursuit-evasion game where the initial conditions of the players are given in Section 5.2 and both players have one single thruster.
0.000005. We also try to solve the game by first normalizing the variables. In detail, we normalize the players’ states by taking the units of the distance and the time to be 6378.137 km and 806.811 seconds, (which makes the gravitational constant μ = 1), and restrict the first three adjoint variables in the unknown ξ within [−1, 1]. However, the indirect method still can not give a correct solution. The final distance of the two players is larger than 2 km by the indirect method. The simulation result for this incorrect solution is given in Fig. 3. We stress that the pursuer should be able to capture the evader, namely the considered TVBVP has a solution, and the inability of
Algorithm 1 for this example is not caused by our programming skill. We show this by simulation. We consider another scenario that the pursuer also takes a single thruster as in [2,3], which can produce the acceleration at any direction, with magnitude ρ p = 0.2 m/s2 , namely u P 2 ≤ ρ P . Obviously, the pursuer has lower maneuvering ability than that in the previous paragraph.1 By the
1
The thrust constraint for the pursuer in the previous paragraph is |u P ,x | ≤
ρ P , |u P , y |, |u P ,z | ≤ ρ P .
10
D. Ye et al. / Aerospace Science and Technology 99 (2020) 105715
principle of minimization, the optimal control of the pursuer is given by
u P = −ρ P
λv λ v
(45)
and the optimal control of the evader is the same as (21). The dynamics of the state and adjoint variables are the same as (20) and (21), and the terminal conditions of the corresponding TPBVP are same as (22). We also solve the TPBVP for this scenario using Algorithm 1. The whole solving procedure costs 15.803 seconds, while the Newton method stops after 7 iterations in 0.097 seconds. The solution for the unknown is given by
ξ ∗ = [16.514451 − 13.330707 7.347035 532.606], and the final distance of the players is 0.61 m. The simulation result for this example is represented in Fig. 4, which implies that Algorithm 1 can successfully find the open-loop solution of this example when the pursuer also takes a single thruster. Now consider the example in the previous paragraph. Since the pursuer has higher maneuvering ability than that in this paragraph, we know that the pursuer should be able to successfully capture the evader. Notice that the only difference between the pursuit-evasion game considered in the previous graph and the one in [2,3] is the thrust configuration, hence the inefficiency in solving the open-loop solution of the problem in our paper should come from the assumption that the pursuer has three independent thrusters. When applying the optimal control theory, this thrust configuration results in the strong discontinuous sign function when generating the control input of the pursuer. Notice that control input of the evader considered in this paper is the same as that in [2,3]. We conjecture that the difficulty in solving the problem in this paper comes from the sign function. By comparing Fig. 3(c) with Fig. 1(c) and Fig. 2(c), we also notice that, in the unsuccessful solving example, the acceleration of the pursuer in the x-axis, u xP , changes its sign during the game. Hence, it is also possible that the sign change of the control input causes the inability of the indirect method in solving the pursuitevasion problem with different thrust configurations. 6. Conclusions In this paper, we investigated the satellite proximate interception problem in which the thrust configurations of the two players are different. In detail, we assumed that the pursuer takes three independent thrusters, each of which can produce a saturated fixeddirection acceleration, while the evader carries one thruster, which can produces a bounded acceleration at any direction. By choosing a virtual circular reference orbit, the CW linear dynamics is established for both the players. The pursuit-evasion game is then transferred into the two-point boundary value problem with 4 unknowns and 4 nonlinear equations. Based on the indirect method, we proposed a numerical algorithm to solve the TPBVP. The simulation shows that if the initial states of the pursuer and evader are nice, the proposed method can efficiently find the open-loop solution of the game. However, unlike the pursuitevasion game where both the players only take one single thruster, there exist some initial states of the players that deteriorate the solving ability of the indirect method. Hence, a natural question for the future arises: how to design an efficient algorithm to solve the open-loop control of the satellite pursuit-evasion game with different thrust configurations.
Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. References [1] M. Pontani, Numerical solution of orbital combat games involving missiles and spacecraft, Dyn. Games Appl. 1 (4) (2011) 534–557. [2] J. Stupik, M. Pontani, B. Conway, Optimal pursuit/evasion spacecraft trajectories in the hill reference frame, in: AIAA/AAS Astrodynamics Specialist Conference, AIAA, 2012. [3] Z.-y. Li, H. Zhu, Z. Yang, Y.-z. Luo, A dimension-reduction solution of free-time differential games for spacecraft pursuit-evasion, Acta Astronaut. [4] E.R. Prince, J.A. Hess, R.G. Cobb, R.W. Carr, Elliptical orbit proximity operations differential games, J. Guid. Control Dyn. (2019) 1–15, https://doi.org/10.2514/1. G004031. [5] B.A. Conway, M. Pontani, Numerical solution of the three-dimensional orbital pursuit-evasion game, J. Guid. Control Dyn. 32 (2) (2009) 474–487. [6] W.T. Hafer, H.L. Reed, J.D. Turner, K. Pham, Sensitivity methods applied to orbital pursuit evasion, J. Guid. Control Dyn. 38 (6) (2015) 1118–1126. [7] H.-X. Shen, C. Lorenze, Revisit of the three-dimensional orbital pursuit-evasion game, J. Guid. Control Dyn. 41 (8) (2018) 1823–1831. [8] G. Anderson, G. Bohn, A near-optimal control law for pursuit-evasion problems between two spacecraft, AIAA J. 15 (8) (1977) 1203–1205. [9] G.M. Anderson, Feedback control for a pursuing spacecraft using differential dynamic programming, AIAA J. 15 (8) (1977) 1084–1088. [10] K. Horie, B.A. Conway, Optimal fighter pursuit-evasion maneuvers found via two-sided optimization, J. Guid. Control Dyn. 29 (1) (2006) 105–112. [11] K. Horie, B.A. Conway, Genetic algorithm preprocessing for numerical solution of differential games problems, J. Guid. Control Dyn. 27 (6) (2004) 1075–1078. [12] B.A. Conway, A survey of methods available for the numerical optimization of continuous dynamic systems, J. Optim. Theory Appl. 152 (2) (2012) 271–306. [13] B.A. Conway, M. Pontani, Optimal interception of evasive missile warheads: numerical solution of the differential game, J. Guid. Control Dyn. 31 (4) (2008) 1111–1122. [14] P. Palmer, Optimal relocation of satellites flying in near-circular-orbit formations, J. Guid. Control Dyn. 29 (3) (2006) 519–526. [15] R.R. Kumar, H. Seywald, Fuel-optimal stationkeeping via differential inclusions, J. Guid. Control Dyn. 18 (5) (1995) 1156–1162. [16] Q. Wang, B. Zhou, G.-R. Duan, Robust gain scheduled control of spacecraft rendezvous system subject to input saturation, Aerosp. Sci. Technol. 42 (2015) 442–450. [17] L.S. Breger, J.P. How, Safe trajectories for autonomous rendezvous of spacecraft, J. Guid. Control Dyn. 31 (5) (2008) 1478–1489. [18] Q. Li, J. Yuan, B. Zhang, C. Gao, Model predictive control for autonomous rendezvous and docking with a tumbling target, Aerosp. Sci. Technol. 69 (2017) 700–711. [19] J. Shan, Y. Ren, Low-thrust trajectory design with constrained particle swarm optimization, Aerosp. Sci. Technol. 36 (2014) 114–124. [20] M. Pontani, B.A. Conway, Particle swarm optimization applied to space trajectories, J. Guid. Control Dyn. 33 (5) (2010) 1429–1441, https://doi.org/10.2514/ 1.48475. [21] D. Spiller, K. Basu, F. Curti, C. Circi, On the optimal passive formation reconfiguration by using attitude control, Acta Astronaut. 153 (2018) 259–273. [22] G. Di Mauro, D. Spiller, S.F. Rafano Carnà, R. Bevilacqua, Minimum-fuel control strategy for spacecraft formation reconfiguration via finite-time maneuvers, J. Guid. Control Dyn. 42 (4) (2019) 752–768. [23] Y. Wang, H. Ji, Integrated relative position and attitude control for spacecraft rendezvous with ISS and finite-time convergence, Aerosp. Sci. Technol. 85 (2019) 234–245. [24] Q. Li, J. Yuan, B. Zhang, H. Wang, Disturbance observer based control for spacecraft proximity operations with path constraint, Aerosp. Sci. Technol. 79 (2018) 154–163. [25] Y. Huang, Y. Jia, Integrated robust adaptive tracking control of non-cooperative fly-around mission subject to input saturation and full state constraints, Aerosp. Sci. Technol. 79 (2018) 233–245. [26] J. Sullivan, S. Grimberg, S. D’Amico, Comprehensive survey and assessment of spacecraft relative motion dynamics models, J. Guid. Control Dyn. 40 (8) (2017) 1837–1859. [27] D. Ye, M. Shi, Z. Sun, Satellite proximate interception vector guidance based on differential games, Chin. J. Aeronaut. 31 (6) (2018) 1352–1361. [28] R. Hassan, B. Cohanim, O. de Weck, G. Venter, A comparison of particle swarm optimization and the genetic algorithm, https://doi.org/10.2514/6.2005-1897.