An aero-engine life-cycle maintenance policy optimization algorithm: Reinforcement learning approach
Zhen LI, Shisheng ZHONG *, Lin LIN
School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
Received 9 July 2018; revised 7 September 2018; accepted 22 October 2018
KEYWORDS: Aero-engine; Hybrid strategy; Maintenance policy; Optimization algorithm; Reinforcement learning
Abstract: An aero-engine maintenance policy plays a crucial role in reasonably reducing maintenance cost. An aero-engine is a type of complex equipment with long service-life. In engineering, a hybrid maintenance strategy is adopted to improve the aero-engine operational reliability. Thus, the long service-life and the hybrid maintenance strategy should be considered synchronously in aero-engine maintenance policy optimization. This paper proposes an aero-engine life-cycle maintenance policy optimization algorithm that synchronously considers the long service-life and the hybrid maintenance strategy. The reinforcement learning approach was adopted to illustrate the optimization framework, in which maintenance policy optimization was formulated as a Markov decision process. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the maintenance policy. Compared with traditional aero-engine maintenance policy optimization methods, the long service-life and the hybrid maintenance strategy could be addressed synchronously by the proposed algorithm. Two numerical experiments and algorithm analyses were performed to illustrate the optimization algorithm in detail.
© 2019 Production and hosting by Elsevier Ltd. on behalf of Chinese Society of Aeronautics and Astronautics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction

An aero-engine, which is composed of mechanic-electric-hydraulic coupling systems, is the power plant of an aircraft.1,2
[email protected] (S. ZHONG). Peer review under responsibility of Editorial Committee of CJA.
Production and hosting by Elsevier
It has been reported that more than 30% of aircraft mechanical problems are related to aero-engines, and the aero-engine maintenance cost contributes about 30% of an airline's direct operating cost.3 A maintenance optimization method provides an effective way to reduce the maintenance cost reasonably.4 In general, excessive maintenance is costly, while insufficient maintenance may lead to disasters. Thus, a maintenance policy plays a crucial role in balancing maintenance cost and operational reliability.5 However, it is not easy to optimize an aero-engine maintenance policy manually, especially when the long service-life and the hybrid maintenance strategy are taken into consideration synchronously.
In engineering, a hybrid maintenance strategy is adopted to improve civil aero-engine operational reliability. Strategies of Condition-Based Maintenance (CBM), Hard-time Maintenance (HM), and failure Corrective Maintenance (CM) are included in the hybrid maintenance strategy.6 As aero-engine performance deterioration is inevitable,7 gas path performance parameters are monitored for CBM. A Life Limit Part (LLP) should be replaced before its life limitation,8 so HM is adopted. Moreover, CM is performed when an aero-engine is in the random failure state. Thus, the hybrid maintenance strategy should be considered in aero-engine maintenance policy optimization. However, few existing aero-engine maintenance optimization methods are able to address the hybrid strategy. At the same time, an aero-engine is a type of equipment with long service-life,9 which should be synchronously considered in maintenance policy optimization. To the best of our knowledge, this is the first paper to investigate an aero-engine maintenance policy optimization method that can synchronously address the hybrid maintenance strategy and long service-life.

In traditional aero-engine maintenance optimization methods, maintenance intervals are used to decide when to repair an aero-engine, and maintenance work-scopes indicate how to carry out maintenance actions. Maintenance intervals and work-scopes can be obtained by traditional separate models. For example, LLP replacement intervals and performance recovery intervals can be optimized separately by current models.10-12 Based on maintenance intervals, maintenance work-scopes are obtained by traditional decision-making models.13,14 Based on traditional optimization methods, an aero-engine maintenance decision support system was proposed by Fu et al.,15 in which maintenance interval and work-scope optimization models were presented. An optimization method for reliability-centered maintenance was proposed by Crocker and Kumar,16 in which the concepts of soft life and hard life were used to optimize a military aero-engine maintenance policy. A multi-objective evolutionary algorithm was also adopted to solve the aero-engine maintenance scheduling problem,17 taking module exchange into consideration. To optimize an aero-engine maintenance policy, traditional optimization methods would become extremely complicated when the long service-life and the hybrid maintenance strategy are considered synchronously.18

In general, machine learning methods include supervised learning, unsupervised learning, and reinforcement learning,19 and the reinforcement learning method has attracted increasing interest in solving decision-making problems.20 Reinforcement learning is a machine learning method in which an agent learns how to behave through action rewards. Different from widely used supervised learning methods, there is no presentation of input-output pairs in reinforcement learning. In reinforcement learning, an agent chooses an available action according to the environment on each decision epoch. The chosen action changes the environment, along with a reward to the agent. The objective of the agent is to find the action collection whose reward is maximal in the long run.
Reinforcement learning methods have been adopted in energy system charging policy optimization,21,22 energy system and distributed system schedule determination,23,24 multiple robotic task optimization,25 demand response optimization,26,27 robust control optimization,28 multiple satellites task planning,29 etc. Although reinforcement learning methods have been successfully applied, they have not aroused much attention in aero-engine maintenance policy optimization. Reinforcement learning provides a more appropriate way for aero-engine life-cycle maintenance policy optimization. Thus, an aero-engine life-cycle maintenance policy optimization algorithm is proposed based on reinforcement learning. The main contributions of this paper are as follows:

(1) An aero-engine life-cycle maintenance policy optimization algorithm is proposed, which can synchronously address the aero-engine long service-life and the hybrid maintenance strategy.
(2) To address the hybrid strategy, the aero-engine state is represented by a multi-dimensional state space.
(3) Reinforcement learning is adopted to illustrate the maintenance policy optimization. In the reinforcement learning framework, maintenance policy optimization is formulated as a discrete Markov Decision Process (MDP), and the Gauss–Seidel value iteration algorithm is adopted to optimize the maintenance policy.

The remainder of this paper is organized as follows. Section 2 introduces aero-engine maintenance policy optimization and the reinforcement learning approach, and the deficiencies of traditional optimization methods are analyzed. In Section 3, the proposed aero-engine life-cycle maintenance policy optimization algorithm is described in detail. In Section 4, two simulation experiments and algorithm analyses are used to illustrate the proposed optimization algorithm. Conclusions and future work are discussed in Section 5.

2. Aero-engine maintenance policy optimization and reinforcement learning

2.1. Aero-engine maintenance policy optimization

A maintenance policy indicates when and how to repair an aero-engine. Traditionally, maintenance intervals indicate when to repair an aero-engine; they are obtained by separate optimization models that address the CBM, HM, and CM strategies separately. In traditional methods, maintenance work-scopes indicate how to repair an aero-engine, and they are optimized based on the outputs of the interval optimization models. In summary, there are several deficiencies in traditional aero-engine maintenance optimization methods, as follows:

(1) In traditional methods, maintenance intervals and work-scopes are optimized by separate models, and work-scopes are optimized based on interval optimization results. Thus, the interactions of maintenance intervals and work-scopes are neglected. Moreover, interval optimization errors would be propagated to work-scope optimization.
(2) Because the hybrid maintenance strategy is adopted for civil aero-engines, hybrid strategies should be addressed synchronously in optimization. However, traditional optimization methods address hybrid strategies separately, and interactions of hybrid strategies are neglected.
(3) It is difficult for traditional optimization methods to address the aero-engine long service-life and the hybrid maintenance strategy synchronously.
(4) Definite optimization results are obtained by traditional optimization methods. Due to random factors, definite optimization results may be poorly applicable in engineering.

To deal with the aforementioned deficiencies of traditional methods, an aero-engine life-cycle maintenance policy optimization algorithm is proposed based on the reinforcement learning approach. Taking the place of maintenance intervals and work-scopes in traditional methods, the maintenance policy indicates when and how to repair an aero-engine. The proposed optimization algorithm is able to address the hybrid maintenance strategy and long service-life synchronously. To address the hybrid maintenance strategy, a multi-dimensional space is adopted to represent the aero-engine state. The maintenance strategies of CBM and CM are formulated as an MDP, and imperfect repair and random factors are all considered in state transition. Aero-engine maintenance policy optimization is illustrated by the reinforcement learning approach, and the Gauss–Seidel value iteration algorithm is adopted to seek the optimal maintenance policy. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm makes it possible to optimize the life-cycle maintenance policy. A comparison between traditional optimization methods and the proposed optimization algorithm is shown in Fig. 1.

In Fig. 1, traditional aero-engine maintenance optimization methods are shown on the left. Traditional methods are constituted by separate models, including an LLP interval optimization model, a performance interval optimization model, and a work-scope optimization model. LLP replacement and performance recovery intervals are obtained by separate optimization models. Based on the LLP replacement, performance recovery, and corrective maintenance intervals, the maintenance work-scope is optimized. The proposed aero-engine life-cycle maintenance policy optimization algorithm is shown on the right in Fig. 1. Based on the reinforcement learning framework, the strategies of HM, CM, and CBM are addressed synchronously by the proposed optimization algorithm. The traditional separate optimization models of the LLP replacement interval, the performance recovery interval, and the maintenance work-scope are replaced by the proposed optimization algorithm.

Fig. 1 Comparison between traditional methods and proposed optimization algorithm.

2.2. Reinforcement learning approach

Reinforcement learning is a machine learning method which is widely used to solve multi-step, sequential decision problems. Different from supervised learning, no pre-specified model is required in reinforcement learning. In aero-engine life-cycle maintenance policy optimization, few historical data are available for training a pre-specified model. Thus, reinforcement learning provides a more appropriate way for aero-engine life-cycle maintenance policy optimization. Meanwhile, the aero-engine long service-life and the hybrid maintenance strategy can be addressed synchronously by reinforcement learning.

In reinforcement learning, an agent takes on the work of optimizing an aero-engine maintenance policy. The agent is able to respond to dynamically changing aero-engine states through ongoing learning methods.30 In aero-engine maintenance policy optimization, aero-engine states are represented by a multi-dimensional state space. To optimize the maintenance policy, the agent chooses a maintenance action according to the aero-engine state. The aero-engine state is changed by the chosen maintenance action, along with the maintenance cost, as shown in Fig. 2. The optimal objective of the agent is to find the maintenance action collection whose total cost is the minimum in the long run.
Fig. 2 Schematic diagram of reinforcement learning.

Value iteration is a reinforcement learning algorithm that is widely adopted in solving decision-making problems. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm provides an appropriate way to optimize an aero-engine life-cycle maintenance policy. According to the Gauss–Seidel value iteration algorithm, an agent runs multiple episodes for the purpose of exploring and finding the optimal policy. The learning process is conducted for a sufficient number of iterations, and the total cost of each iteration is recorded. The minimum total cost is represented as the Q-value, which is updated every iteration, and the Bellman equation is adopted as the updating mechanism in the Gauss–Seidel value iteration algorithm. The convergence of value iteration methods has been widely proven.31 Thus, based on the reinforcement learning framework, the Gauss–Seidel value iteration algorithm is adopted in the proposed aero-engine life-cycle maintenance policy optimization algorithm.

3. Aero-engine life-cycle maintenance policy optimization algorithm

The reinforcement learning approach is adopted in the proposed aero-engine life-cycle maintenance policy optimization algorithm, in which the hybrid maintenance strategy and long service-life are considered synchronously. In the reinforcement learning framework, the aero-engine state, maintenance actions, state transition matrices, and maintenance costs should be determined. To address the hybrid maintenance strategy, a multi-dimensional state space is adopted to represent the aero-engine state, taking performance, LLP, and random failure states into consideration synchronously. The optimal objective of the proposed optimization algorithm is to obtain a maintenance policy whose total cost is the minimum in the long run. The reinforcement learning algorithm of Gauss–Seidel value iteration is adopted, addressing long-term optimization. In this section, the proposed optimization algorithm is described in detail.

3.1. Aero-engine states

In the reinforcement learning framework, the aero-engine state is changed by performed actions. As the hybrid maintenance strategy is considered in the proposed optimization algorithm, a multi-dimensional state space is adopted to represent the aero-engine state, in which performance, LLP, and random failure states are considered synchronously. The multi-dimensional state space is represented by S = {Xi, Yj, Zk, ...}, where Xi, Yj, Zk, ... denote the sub-states in the multi-dimensional state space. Each sub-state denotes one considered factor; for example, sub-state Xi denotes the LLP state, sub-state Yj denotes the performance state, and Zk denotes the random failure state.

(1) LLP state

In engineering, an LLP must be replaced before its life limitation, and the HM strategy is adopted for LLP maintenance. Traditionally, the flight-cycle or flight-hour is adopted to represent the LLP state. For example, a fan shaft is an LLP of a CFM56-5B aero-engine, and its life limitation is 30,000 flight-cycles; that is, a fan shaft must be replaced before 30,000 flight-cycles. For convenience, the LLP state is represented by discrete time increments.32 In the proposed optimization algorithm, the LLP state is represented by several state levels, denoted as {Ti | i = 0, 1, 2, ...}, where T0 denotes the all-new state; when m < n, state Tn is "older" than state Tm. It is defined that an LLP is in state Tn when tn-1 < tllp <= tn, where tn-1 and tn are boundary values and tllp is the real LLP state, measured by flight-cycle or flight-hour.

(2) Performance state

As the CBM strategy is adopted in aero-engine maintenance, the aero-engine performance state should also be considered in the proposed optimization algorithm. Most in-service aero-engines are equipped with condition monitoring systems, and performance parameters are sent to the ground by the aircraft communication addressing and reporting system in quasi real-time. Traditionally, the aero-engine performance state is assessed by performance parameters.33,34 In engineering, the Exhaust Gas Temperature Margin (EGTM) is adopted as an aero-engine performance indicator.35 The EGTM is defined as the margin between the exhaust gas temperature and the red-line temperature. Aero-engine performance degrades as the engine operates, presented as EGTM declining.36,37 When the EGTM is close to the limitation, performance recovery maintenance should be performed. The 1000 flight-cycles EGTM parameters of a CFM56-5B aero-engine are shown in Fig. 3.

Fig. 3 1000 flight-cycles EGTM of a CFM56-5B aero-engine.

As EGTM parameters are typical time series,38 the algorithm would become extremely complicated if EGTM time series were adopted directly. For convenience, the CBM strategy is formulated as a discrete MDP in the reinforcement learning framework. Thus, the aero-engine performance state is represented by several levels, denoted as {Di | i = 0, 1, 2, ...}, where D0 denotes the all-new performance state; when m < n, state Dn is worse than state Dm. It is defined that the performance state is Dn when dn-1 < dper <= dn, in which dn-1 and dn are
the boundary values; dper is the EGTM real value, measured in centigrade. Thus, in the proposed optimization algorithm, the aero-engine performance state is indicated by the performance levels.

(3) Random failure state

Although an aero-engine is highly reliable, it is subject to random failures in practice. When an aero-engine is in the random failure state, CM should be performed to drive it back to the working state. Unlike the LLP or performance state, the random failure state is represented by two levels, denoted as {F0, F1}, where F0 denotes the working state and F1 denotes the random failure state.

From the above, when LLP, performance, and random failure states are all considered, the aero-engine state is represented by a multi-dimensional state space, denoted as

$$ S = \{ T_i, D_j, F_k \mid i = 0, 1, 2, \ldots;\ j = 0, 1, 2, \ldots;\ k = 0, 1 \} \tag{1} $$

where Ti denotes the LLP state, Dj denotes the performance state, and Fk denotes the random failure state.

3.2. Maintenance actions

In reinforcement learning, maintenance actions are performed on every decision epoch. The decision epochs are denoted as {Ei | i = 0, 1, 2, ..., m, ...}, and state Si is regarded as the status between epochs Ei-1 and Ei. As LLP, performance, and random failure states are all considered, LLP replacement actions, performance recovery actions, and failure corrective actions should be determined on each decision epoch.

LLP replacement actions denote maintenance actions of replacing an aero-engine LLP. When an LLP replacement action is performed, the corresponding LLP is replaced, and the LLP state is changed. LLP replacement actions are represented by {Arep,i | i = 0, 1, 2, ...}, where Arep,0 denotes no LLP replaced and Arep,m (m != 0) denotes LLP m replaced. When Arep,m is performed, the state of LLP m is changed to the all-new state.

Performance recovery actions denote maintenance actions of recovering aero-engine performance. When a performance recovery action is performed, the performance state is recovered by a definite number of levels, and the performance state is changed. Performance recovery actions are represented by {Arec,j | j = 0, 1, 2, ...}, where Arec,0 denotes no performance recovery action performed and Arec,m (m != 0) denotes the action of recovering m performance levels.

Failure corrective actions denote maintenance actions of bringing a failure-state aero-engine back to the running state. When the aero-engine is trapped in the random failure state, a failure corrective action should be performed. Failure corrective actions are represented by Acor = {Acor,0, Acor,1}, where Acor,0 denotes no corrective maintenance performed and Acor,1 denotes corrective maintenance performed.

From the above, aero-engine maintenance actions are represented by

$$ A = \{ A_{rep,i}, A_{rec,j}, A_{cor,k} \mid i = 0, 1, 2, \ldots;\ j = 0, 1, 2, \ldots;\ k = 0, 1 \} \tag{2} $$

Because an aero-engine does not operate during the maintenance process, maintenance actions are assumed to be "instantaneous" in the proposed optimization algorithm.39
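To make the state and action spaces of Eqs. (1) and (2) concrete, the short Python sketch below enumerates them for a small illustrative configuration. The level counts and the string names are assumptions for illustration only; the paper leaves them open at this point.

```python
from itertools import product

# Illustrative level counts only; not values from the paper.
N_LLP_LEVELS, N_PERF_LEVELS = 6, 6

LLP_STATES = [f"T{i}" for i in range(N_LLP_LEVELS)]    # T0 = all-new LLP
PERF_STATES = [f"D{j}" for j in range(N_PERF_LEVELS)]  # D0 = all-new performance
FAIL_STATES = ["F0", "F1"]                             # working / random failure

# Multi-dimensional state space S of Eq. (1) and action set A of Eq. (2).
S = list(product(LLP_STATES, PERF_STATES, FAIL_STATES))
A = list(product(["Arep0", "Arep1"],                   # no replacement / replace LLP 1
                 [f"Arec{m}" for m in range(4)],       # recover 0..3 performance levels
                 ["Acor0", "Acor1"]))                  # no CM / corrective maintenance

print(len(S), len(A))   # 72 states and 16 actions for this illustrative configuration
```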
3.3. State transition

In the reinforcement learning framework, the aero-engine state is changed by performed maintenance actions. Thus, LLP, performance, and random failure state transitions are illustrated as follows.

(1) LLP state transition

In engineering, the LLP state is measured by the flight-cycle or flight-hour, which increases directly as an aero-engine operates. The LLP state is recovered to the all-new state when an LLP replacement action is performed. Thus, the LLP state transfers deterministically, without uncertainty. When action Arep,0 is performed, LLP state Ti transfers to the definite state Ti+1, that is, p(Ti+1 | Ti, Arep,0) = 1. When action Arep,1 is performed, LLP state Ti transfers to T0, that is, p(T0 | Ti, Arep,1) = 1. A schematic diagram of LLP state transition is shown in Fig. 4.

(2) Performance state transition

Aero-engine performance deterioration is inevitable in engineering. As the imperfect maintenance concept and random factors are considered in the proposed optimization algorithm, the CBM strategy is formulated as a discrete MDP, and probability matrices are adopted in performance state transition. When no performance recovery action is performed, the performance state transfers according to recession probabilities. That is, when action Arec,0 is performed, the performance state transfers from Di to Dj with probability p(Dj | Di, Arec,0), denoted as

$$ p(D_j \mid D_i, A_{rec,0}) = p_{ij} = \begin{cases} 0 & i = 0, 1, \ldots, m;\ j < i \\ p_{ij} & i = 0, 1, \ldots, m;\ j \ge i \end{cases} \tag{3} $$

Because the maintenance concept of "as good as new" has been proven to be far from the truth,40 the more realistic concept of imperfect repair is adopted for performance recovery actions.41,42 That is, the performance state cannot be recovered to the all-new state by any maintenance action, and the performance state transfers according to the transition probability matrices. When action Arec,m (m > 0) is performed, the performance state transfers from Di to Di-m according to the probability matrix [p(Di-m | Di, Arec,m), i - m > 0]. A schematic diagram of performance state transition is shown in Fig. 5. The performance state transition probability matrices can be calculated by survival analysis based on the Weibull distribution.43,44
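The last statement can be made concrete with a small survival-analysis sketch: assuming a Weibull life distribution for the time an engine spends in one performance level, the one-epoch probability of degrading out of that level follows from the conditional survival function. The parameter values below are purely illustrative and are not taken from the paper.

```python
import numpy as np

def weibull_survival(t, eta, beta):
    """Weibull survival function S(t) = exp(-(t/eta)^beta)."""
    return np.exp(-(np.asarray(t, dtype=float) / eta) ** beta)

def one_step_degradation_prob(age, step, eta, beta):
    """Probability of leaving the current performance level during the next
    `step` flight-cycles, given survival in this level up to `age` cycles:
    1 - S(age + step) / S(age)."""
    return 1.0 - weibull_survival(age + step, eta, beta) / weibull_survival(age, eta, beta)

# Illustrative parameters only (not fitted to any real fleet data).
eta, beta = 4000.0, 1.8      # Weibull scale and shape for one performance level
print(one_step_degradation_prob(age=1000.0, step=500.0, eta=eta, beta=beta))
```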
Fig. 4 Schematic diagram of LLP state transition.
Fig. 5 Schematic diagram of performance state transition.
(3) Random failure state transition

As an aero-engine may be in the random failure state occasionally, the CM strategy is also formulated as a discrete MDP, and the random failure state transfers according to probability matrices. In the proposed optimization algorithm, an aero-engine may fall into the random failure state F1 with probability p(F1 | Dj, F0, Acor,0). To be more realistic, it is assumed that the random failure probability is lower when the aero-engine is in a better performance state. Thus, the random failure probability p(F1 | Dj, F0) is related to the performance state Dj. When the aero-engine is in the random failure state, corrective maintenance should be performed to drive it back to the working state. It is assumed that corrective maintenance is completely efficient, and the transition probability of corrective maintenance is represented as p(F0 | Dj, F1, Acor,1) = 1. A schematic diagram of random failure state transition is shown in Fig. 6.

From the above, the state transition matrix on Ei is represented by p(Si+1 | Si, Ai), where Si+1 denotes the aero-engine state on Ei+1, Si denotes the aero-engine state on Ei, and Ai denotes the maintenance action performed on Ei. Different sub-states transfer according to different modes, as illustrated above.

3.4. Total cost and optimization

In reinforcement learning, an agent chooses maintenance actions according to action costs.44,45 In the proposed optimization algorithm, the optimal objective is to obtain a maintenance policy whose total cost is the minimum in the long run. The maintenance cost on decision epoch Ek is calculated by

$$ c_k = C_{ope,k} + C_{rep,k} + C_{rec,k} + C_{cor,k} + C_{other} \tag{4} $$

where ck denotes the maintenance cost on decision epoch Ek; Cope,k denotes the operating cost; Crep,k, Crec,k, Ccor,k, and Cother denote the LLP replacement cost, the performance recovery cost, the corrective maintenance cost, and other costs, respectively.
Fig. 6 Schematic diagram of random failure state transition.
In general, an aero-engine in a good performance state has better economic efficiency and a lower random failure rate. Thus, Cope,k is determined by the performance state. When an LLP replacement action is performed, material and replacement costs are both counted in Crep,k(Arep,k); LLP replacement costs vary between different LLP replacement actions. The performance recovery cost is represented by Crec,k(Arec,k), and when m > n, Crec,i(Arec,m) > Crec,i(Arec,n). The corrective maintenance cost Ccor,k is counted when corrective maintenance is performed.

As the life-cycle maintenance policy is optimized by the proposed optimization algorithm, future maintenance costs should be counted in optimization. In reinforcement learning, a discount factor is adopted to address long-term optimization. Thus, the optimal objective of the proposed optimization algorithm is denoted as

$$ \text{obj}: \quad C = \min \left( c_k + \gamma c_{k+1} + \gamma^2 c_{k+2} + \cdots + \gamma^{K-k} c_K \right) \tag{5} $$

where C denotes the discounted future cost, and γ (γ ∈ [0, 1]) denotes the discount factor, representing the relative impact of future action costs. In reinforcement learning, when a larger γ is adopted, future action costs have a greater impact on maintenance action selection. That is, when γ = 0, the optimized policy is shortsighted, and the maintenance action is chosen by the current cost alone; when γ = 1, all future actions are considered in action selection, which brings a heavy calculation burden. Thus, a balance between future costs and the calculation burden should be struck, and the discount factor should be set as γ ∈ (0, 1), for example, γ = 0.9. As the Gauss–Seidel value iteration algorithm is an effective reinforcement learning algorithm that is widely used in policy optimization, it is adopted to seek the maintenance action collection whose discounted long-term cost is the minimum.
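A minimal sketch of the Gauss–Seidel value iteration described above is given below, written for a generic discounted cost-minimizing MDP. The array layout (one transition matrix and one cost vector per action) and all names are assumptions for illustration; the in-place update of V is what distinguishes the Gauss–Seidel variant from the Jacobi variant.

```python
import numpy as np

def gauss_seidel_value_iteration(P, C, gamma=0.9, tol=1e-6, max_sweeps=10_000):
    """Gauss-Seidel value iteration for a discounted cost-minimizing MDP.

    P : array of shape (n_actions, n_states, n_states), P[a, s, s'] = transition probability.
    C : array of shape (n_actions, n_states), C[a, s] = immediate cost of action a in state s.
    Returns the optimal value function and a greedy (optimal) policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            # In-place (Gauss-Seidel) update: already-updated entries of V are reused
            # within the same sweep, which typically speeds up convergence.
            q = C[:, s] + gamma * P[:, s, :] @ V
            v_new = q.min()
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    policy = np.array([(C[:, s] + gamma * P[:, s, :] @ V).argmin() for s in range(n_states)])
    return V, policy
```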
4. Numerical experiments of maintenance policy optimization

Two numerical experiments were used to illustrate the proposed aero-engine life-cycle maintenance policy optimization algorithm in detail. According to the reinforcement learning framework, the determinations of aero-engine states, maintenance actions, state transition matrices, and total cost matrices are described first. As traditional methods are unable to address the long service-life and the hybrid maintenance strategy synchronously, they were not adopted as benchmark methods. The reinforcement learning algorithm of Gauss–Seidel value iteration was adopted in the experiments.

4.1. Aero-engine states

In the reinforcement learning framework, the aero-engine state should be determined first. As the multi-dimensional state space was adopted to represent the aero-engine state, performance, LLP, and random failure states were all considered in the first numerical experiment.

The EGTM was adopted to represent the aero-engine performance state. For convenience, the EGTM time series were divided into several levels. In a sample fleet, there were three performance recovery actions, including minimal repair, medium repair, and overhaul repair. According to the performance recovery assumptions in Section 4.2, there should be at least five performance levels to fully illustrate the three performance recovery actions. In reinforcement learning, more performance levels would make the optimization algorithm more complicated, so five levels were sufficient to present the aero-engine performance state. In the numerical experiment, the performance state was divided into five levels, denoted as {D1, D2, D3, D4, D5} from good to bad, where D5 denoted the worst performance level. Besides, the all-new performance state was denoted as D0. Because performance and random failure states were both transferred by probabilities, the random failure state was regarded as a specific "performance state". Thus, in the numerical experiment, performance and random failure states were represented by one state-space dimension, denoted as {D0, D1, D2, D3, D4, D5, F}.

Although there were several LLPs in an aero-engine, for convenience, LLPs with the same life limitation were regarded as the same LLP type. In the first numerical experiment, one LLP type was taken into account, and the LLP state was measured by flight-cycles. Referring to the adopted performance state levels, the LLP state was divided into five levels, denoted as {T1, T2, T3, T4, T5}. Besides, T0 denoted the all-new state. Thus, the aero-engine state was represented by a two-dimensional state space, denoted as

$$ S = \{ D_i, T_j \mid i \in \{0, 1, 2, 3, 4, 5, 6\};\ j \in \{0, 1, 2, 3, 4, 5\} \} \tag{6} $$

where Di (i = 0, 1, 2, 3, 4, 5) denoted the performance state; D6 denoted the random failure state F; Tj (j = 0, 1, 2, 3, 4, 5) denoted the LLP state. The performance and random failure states formed the first dimension, and the LLP state formed the second dimension.
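For reference, the two-dimensional state space of Eq. (6) contains 7 x 6 = 42 states. The sketch below enumerates them and assigns each one a flat index in the block order used later (performance varying fastest within each LLP level); the names are illustrative.

```python
# Illustrative enumeration of the Experiment 1 state space (42 states).
PERF_LEVELS = ["D0", "D1", "D2", "D3", "D4", "D5", "F"]   # D6 = random failure state F
LLP_LEVELS = ["T0", "T1", "T2", "T3", "T4", "T5"]

STATES = [(d, t) for t in LLP_LEVELS for d in PERF_LEVELS]
STATE_INDEX = {s: k for k, s in enumerate(STATES)}        # flat index used in the matrices

print(len(STATES))                 # 42
print(STATE_INDEX[("D3", "T2")])   # flat index of state (D3, T2)
```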
4.2. Maintenance actions

In reinforcement learning, the aero-engine state was changed by performed maintenance actions, and maintenance actions should be determined according to the aero-engine state. Firstly, according to traditional and practical considerations,46 two performance recovery assumptions were made in the numerical experiment, as follows:

(1) An aero-engine could not be recovered to the all-new performance state by any performance recovery action.
(2) No maintenance action should be performed when the aero-engine was in the worst performance state.

According to the sample fleet operation, three actions of minimal repair, medium repair, and overhaul repair were included in performance recovery actions. Thus, three performance recovery actions were adopted in the numerical experiment, denoted as Arec = {Arec,1, Arec,2, Arec,3}, where Arec,1 was defined as the action to recover the performance state from Dx to Dx-1 when x - 1 > 0, or to keep the performance state Dx when x = 1; Arec,2 was defined as the action to recover the performance state from Dx to Dx-2 when x - 2 > 0, or to recover the performance state from Dx to D1 when 1 <= x <= 2; Arec,3 was defined as the action to recover the performance state from Dx to Dx-3 when x - 3 > 0, or to recover the performance state from Dx to D1 when 1 <= x <= 3. Besides, Arec,0 denoted that no performance recovery action was performed. In the reinforcement learning framework, Fig. 7 shows performance recovery action effects on performance states.

Fig. 7 Schematic diagrams of performance recovery action effects on performance states.

In the numerical experiment, the random failure state was regarded as a specific "performance state". Thus, it was assumed that performance recovery actions could drive a random-failure aero-engine back to the working state. As one LLP type was considered, LLP replacement actions were denoted as Arep = {Arep,0, Arep,1}. Fig. 8 shows the LLP replacement action effect on LLP states.

Fig. 8 Schematic diagram of the LLP replacement action effect on LLP states.

From the above, maintenance actions in the numerical experiment were represented by

$$ A = \{ A_{rep,j}, A_{rec,i} \mid i = 0, 1, 2, 3;\ j = 0, 1 \} \tag{7} $$

4.3. State transition

In the proposed optimization algorithm, state transition matrices were determined according to aero-engine states and maintenance actions. When action Arec,0 was performed,
the aero-engine performance state would transfer according to P0 = [p(Dj | Di, Arec,0)], denoted as

$$ P_0 = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} & p_{04} & p_{05} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & 0 & p_{22} & p_{23} & p_{24} & p_{25} \\ 0 & 0 & 0 & p_{33} & p_{34} & p_{35} \\ 0 & 0 & 0 & 0 & p_{44} & p_{45} \\ 0 & 0 & 0 & 0 & 0 & p_{55} \end{bmatrix} \tag{8} $$

where pij (i = j) denoted the probability that the aero-engine would stay in the current performance state, and pij (i < j) denoted the probability of the aero-engine transferring to a worse performance state. Based on the aforementioned methods, performance state Di and transition probability matrix P0 were obtained from the sample fleet, as shown in

$$ D_i = \begin{cases} D_0 & d_{per} \le 10 \\ D_1 & 10 < d_{per} \le 30 \\ D_2 & 30 < d_{per} \le 50 \\ D_3 & 50 < d_{per} \le 70 \\ D_4 & 70 < d_{per} \le 90 \\ D_5 & 90 < d_{per} \end{cases} \tag{9} $$

$$ P_0 = \begin{bmatrix} 0.0188 & 0.1014 & 0.2813 & 0.3742 & 0.2082 & 0.0161 \\ 0 & 0.0622 & 0.2527 & 0.4222 & 0.2457 & 0.0172 \\ 0 & 0 & 0.0739 & 0.4947 & 0.4105 & 0.0209 \\ 0 & 0 & 0 & 0.0882 & 0.8739 & 0.0379 \\ 0 & 0 & 0 & 0 & 0.9897 & 0.0103 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{10} $$
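Assuming the monitored value dper is compared against the boundaries of Eq. (9), the mapping to a performance level can be written as a one-line discretization; this is only an illustrative sketch.

```python
import numpy as np

BOUNDS = [10, 30, 50, 70, 90]   # level boundaries from Eq. (9)

def performance_level(d_per: float) -> str:
    """Return 'D0'..'D5' for a given d_per value, using the Eq. (9) boundaries."""
    return f"D{int(np.digitize(d_per, BOUNDS, right=True))}"

print(performance_level(25.0))   # 'D1'  (10 < d_per <= 30)
print(performance_level(95.0))   # 'D5'  (d_per > 90)
```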
Transition matrices of Arec,1, Arec,2, and Arec,3 were P1, P2, and P3, denoted as

$$ P_1 = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} & p_{04} & p_{05} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & 0 & p_{22} & p_{23} & p_{24} & p_{25} \\ 0 & 0 & 0 & p_{33} & p_{34} & p_{35} \\ 0 & 0 & 0 & 0 & p_{44} & p_{45} \end{bmatrix} $$

$$ P_2 = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} & p_{04} & p_{05} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & 0 & p_{22} & p_{23} & p_{24} & p_{25} \\ 0 & 0 & 0 & p_{33} & p_{34} & p_{35} \end{bmatrix} $$

$$ P_3 = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} & p_{04} & p_{05} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\ 0 & 0 & p_{22} & p_{23} & p_{24} & p_{25} \end{bmatrix} \tag{11} $$
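Under this reading of the recovery rule in Section 4.2 (action Arec,m moves level x to max(x - m, 1), and the all-new level D0 is never reachable by repair), the matrices of Eq. (11) can be generated from P0 as in the sketch below; this is an illustration of the rule, not the authors' code.

```python
import numpy as np

P0 = np.array([
    [0.0188, 0.1014, 0.2813, 0.3742, 0.2082, 0.0161],
    [0,      0.0622, 0.2527, 0.4222, 0.2457, 0.0172],
    [0,      0,      0.0739, 0.4947, 0.4105, 0.0209],
    [0,      0,      0,      0.0882, 0.8739, 0.0379],
    [0,      0,      0,      0,      0.9897, 0.0103],
    [0,      0,      0,      0,      0,      1.0   ],
])

def recovery_matrix(P0: np.ndarray, m: int) -> np.ndarray:
    """Transition matrix P_m of recovery action A_rec,m (m >= 1)."""
    n = P0.shape[0]
    rows = []
    for x in range(n):
        recovered = x if x == 0 else max(x - m, 1)   # imperfect repair: never back to D0
        rows.append(P0[recovered])                   # degrade from the recovered level
    return np.vstack(rows)

P1, P2, P3 = (recovery_matrix(P0, m) for m in (1, 2, 3))
assert np.allclose(P1.sum(axis=1), 1) and np.allclose(P3.sum(axis=1), 1)
```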
In the numerical experiment, the random failure rates were denoted as PF = [p(F | Dj, Arec,i)], j = 0, 1, ..., 5. The probabilities that the aero-engine recovers from the random failure state to the working state were denoted as PR = [p(Dj | F, Arec,i)], j = 1, 2, ..., 5, i = 1, 2, 3; PF,i was a column matrix and PR,i was a row matrix. PF = [PF,0, PF,1, PF,2, PF,3] and PR = [PR,1; PR,2; PR,3] adopted in the numerical experiment were

$$ P_F = \begin{bmatrix} 0.1 & 0.05 & 0.05 & 0.05 \\ 0.1 & 0.1 & 0.05 & 0.05 \\ 0.15 & 0.1 & 0.1 & 0.05 \\ 0.15 & 0.15 & 0.1 & 0.1 \\ 0.2 & 0.15 & 0.15 & 0.1 \\ 0.2 & 0.2 & 0.15 & 0.15 \end{bmatrix}, \quad P_R = \begin{bmatrix} 0 & 0 & 0 & 0.0882 & 0.8739 & 0.0379 \\ 0 & 0 & 0.0739 & 0.4947 & 0.4105 & 0.0209 \\ 0 & 0.0622 & 0.2527 & 0.4222 & 0.2457 & 0.0172 \end{bmatrix} \tag{12} $$

As the row sum of a transition probability matrix should equal 1, transition matrices P0, P1, P2, and P3 were adjusted to P0', P1', P2', and P3' according to PF and PR, denoted as

$$ P_0' = \big[ p'(D_j \mid D_i, A_{rec,0}) \big]_{i,j = 0, 1, \ldots, 6} = \begin{bmatrix} P_0 & P_{F,0} \\ 0 & 1 \end{bmatrix} \tag{13} $$

$$ P_1' = \big[ p'(D_j \mid D_i, A_{rec,1}) \big]_{i,j = 0, 1, \ldots, 6} = \begin{bmatrix} P_1 & P_{F,1} \\ P_{R,1} & 0 \end{bmatrix} \tag{14} $$

$$ P_2' = \big[ p'(D_j \mid D_i, A_{rec,2}) \big]_{i,j = 0, 1, \ldots, 6} = \begin{bmatrix} P_2 & P_{F,2} \\ P_{R,2} & 0 \end{bmatrix} \tag{15} $$

$$ P_3' = \big[ p'(D_j \mid D_i, A_{rec,3}) \big]_{i,j = 0, 1, \ldots, 6} = \begin{bmatrix} P_3 & P_{F,3} \\ P_{R,3} & 0 \end{bmatrix} \tag{16} $$

where each block is scaled so that every row of Pi' sums to 1.
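One way to realize the adjustment of Eqs. (13)-(16) is sketched below: the failure column PF,i and the recovery row PR,i are appended to Pi, and every row is rescaled to sum to 1. The function and variable names are illustrative assumptions.

```python
import numpy as np

def adjust(P_i, pf_col, pr_row=None):
    """Build the 7x7 matrix P_i' over {D0,...,D5, F} from P_i, P_F,i and P_R,i."""
    top = np.hstack([P_i, pf_col.reshape(-1, 1)])              # degradation + failure column
    if pr_row is None:                                          # A_rec,0: failure state is absorbing
        bottom = np.hstack([np.zeros(P_i.shape[1]), [1.0]])
    else:                                                       # recovery actions leave the failure state
        bottom = np.hstack([pr_row, [0.0]])
    M = np.vstack([top, bottom])
    return M / M.sum(axis=1, keepdims=True)                     # rescale each row to sum to 1

# Example with the Eq. (12) values for action A_rec,0 (P0 from Eq. (10), first column of PF):
# P0_adj = adjust(P0, PF[:, 0], None)
```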
As the transition probability is unavailable for the LLP state, LLP state Ti transfers to Ti+1 when Arep,0 is performed. The state transition set of A0 = {Arep,0, Arec,0} was denoted as p1(PF, P, PN, PR), as shown in Table 1. In Table 1, PF = PF,0, P = P0', PN = 1, PR = 0, and the blank sections are filled with zero matrices.

Table 1 Transition probability matrix of no LLP replacement action. (For each LLP level Ti with i < 5, the row block of states {(D0,Ti), ..., (F,Ti)} maps to the column block of LLP level Ti+1 through the sub-matrices P, PF, PN, and PR; the T5 block maps to itself; all blank blocks are zero matrices.)

In addition to action A0 = {Arep,0, Arec,0}, there were seven other maintenance actions. The total number of maintenance actions NA was calculated by

$$ N_A = \left( C_{n_{llp}}^1 + C_{n_{llp}}^2 + \cdots + C_{n_{llp}}^{n_{llp}} + 1 \right) \left( n_{reca} + 1 \right) \tag{17} $$

where nllp denoted the number of LLP types, and nreca denoted the number of performance recovery actions. When no LLP was replaced, the maintenance actions were denoted as Am = {Arep,0, Arec,i | i = 0, 1, 2, 3}, and the state transition matrices were denoted as p(S, Arep,0, Arec,i | i = 0, 1, 2, 3) = p1{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}, represented by the same form as in Table 1. When an LLP replacement action was performed, the maintenance actions were denoted as Am = {Arep,1, Arec,i | i = 0, 1, 2, 3}, and the state transition matrices were denoted as p(S, Arep,1, Arec,i | i = 0, 1, 2, 3) = p2{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}, as shown in Table 2.
Table 2 Transition probability matrix of an LLP replacement action. (Each row block of states {(D0,Ti), ..., (F,Ti)} maps back to the column block of LLP level T0 through the sub-matrices P, PF, PN, and PR; all blank blocks are zero matrices.)

Fig. 9 Flow diagram of maintenance policy optimization.
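The block structure described by Tables 1 and 2 can be assembled programmatically. With nllp = 1 and nreca = 3, Eq. (17) gives NA = (C(1,1) + 1)(3 + 1) = 8 actions in total, each with its own 42 x 42 matrix. The sketch below composes such a matrix from a 7 x 7 performance/failure block under my reading of the tables; the treatment of the last LLP level T5 as absorbing when no replacement is performed, and all names, are assumptions.

```python
import numpy as np

N_PERF, N_LLP = 7, 6          # {D0..D5, F} x {T0..T5}; 42 states in total

def full_transition(P_block: np.ndarray, replace_llp: bool) -> np.ndarray:
    """Compose the 42x42 matrix of a combined action from a 7x7 performance block.

    P_block     : adjusted matrix P_i' over {D0..D5, F} (Eqs. (13)-(16)).
    replace_llp : True for A_rep,1 (LLP resets to T0), False for A_rep,0
                  (LLP advances one level; T5 is kept as an absorbing level).
    """
    M = np.zeros((N_PERF * N_LLP, N_PERF * N_LLP))
    for t in range(N_LLP):
        t_next = 0 if replace_llp else min(t + 1, N_LLP - 1)
        M[t * N_PERF:(t + 1) * N_PERF, t_next * N_PERF:(t_next + 1) * N_PERF] = P_block
    return M

# Example (hypothetical P_block): rows of the composed matrix still sum to 1.
P_block = np.full((N_PERF, N_PERF), 1.0 / N_PERF)
assert np.allclose(full_transition(P_block, replace_llp=False).sum(axis=1), 1)
```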
4.4. Maintenance policy optimization

Based on the aforementioned methods, aero-engine states, maintenance actions, and state transition matrices were all determined. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the aero-engine maintenance policy. The flow diagram of the proposed aero-engine maintenance policy optimization algorithm is shown in Fig. 9.

Because real cost data was unavailable, a hypothetical cost matrix was adopted in the numerical experiment. In engineering, the cost matrix may change according to the actual maintenance cost, and this would not distort the analysis of the simulation results. In the numerical experiment, the LLP replacement cost Cllp was assumed to be 350, while the performance recovery costs Crec,1, Crec,2, and Crec,3 were assumed to be 300, 350, and 500, respectively. As an aero-engine has a long service-life, future maintenance actions should be fully considered in life-cycle maintenance policy optimization. Thus, a larger discount factor γ = 0.9 was adopted. In contrast to the Jacobi value iteration algorithm, the Gauss–Seidel value iteration algorithm converges faster. Thus, in the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the aero-engine maintenance policy.

As the aero-engine state was represented by a two-dimensional space, the optimal maintenance policy was presented by a two-dimensional policy map, shown in Fig. 10. Optimal maintenance actions on each decision epoch were shown in the policy map, and decision epochs were represented by aero-engine states. In the maintenance policy map of Fig. 10, the LLP state was regarded as the ordinate, and the performance state was regarded as the abscissa. Different maintenance actions were presented in different colors and shapes. In the legend, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A6 denoted action {Arep,1, Arec,2}. In engineering, a maintenance policy could be obtained according to the aero-engine state.

Fig. 10 Maintenance policy map of the numerical experiment.
4.5. Algorithm performance analysis

In this section, a more complex numerical experiment was conducted to illustrate the proposed optimization algorithm with more detailed instructions. In engineering, an aero-engine is usually composed of more than one LLP type, and some LLP states are measured by flight-hours, different from the one in the first numerical experiment. Thus, a more complex numerical experiment with two LLP types was conducted. To distinguish the two numerical experiments, the first numerical experiment was named Experiment 1, and the numerical experiment in this section was named Experiment 2.

Table 3 Transition matrices of no LLP replacement action.

Table 4 Transition probability matrices of LLP1 replacement actions.
4.5.1. Aero-engine states

Two LLP types were considered in Experiment 2, and the two LLP states were measured in different units: one in flight-cycles and the other in flight-hours. Thus, the aero-engine state was denoted by a three-dimensional state space. The three dimensions included the performance state, the LLP state measured by flight-cycle (LLP1), and the LLP state measured by flight-hour (LLP2). Corresponding to Experiment 1, in addition to the all-new states T0 and L0, the LLP1 state was divided into two levels, denoted as {T0, T1, T2}, and the LLP2 state was divided into three levels, denoted as {L0, L1, L2, L3}. As in Experiment 1, the performance state was represented by five levels, and the random failure state was regarded as a specific "performance state". Thus, the aero-engine state space was represented by

$$ S = \{ D_i, T_j, L_k \mid i \in \{0, 1, 2, \ldots, 6\};\ j \in \{0, 1, 2\};\ k \in \{0, 1, 2, 3\} \} \tag{18} $$

where Di (i = 0, 1, 2, 3, 4, 5) denoted the performance state; D6 denoted the random failure state; Tj denoted the LLP1 state; Lk denoted the LLP2 state.
4.5.2. Maintenance actions and state transition

In Experiment 2, LLP1 and LLP2 replacement actions were denoted as ArepT = {ArepT,0, ArepT,1} and ArepL = {ArepL,0, ArepL,1}. Performance recovery actions were the same as those in Experiment 1, and the performance recovery assumptions also held. Maintenance actions in Experiment 2 were represented by

$$ A = \{ A_{rec,i}, A_{repT,j}, A_{repL,k} \mid i = 0, 1, 2, 3;\ j = 0, 1;\ k = 0, 1 \} \tag{19} $$

Similar to LLP1, the LLP2 state would transfer as the flight-hours increase, and transition probabilities were unavailable for LLP2. However, the aero-engine state transition matrices were changed by LLP2. On Em, action Am = {Arec,i, ArepT,0, ArepL,0 | i = 0, 1, 2, 3} denoted no LLP replaced, and the transition matrices are presented in Table 3, denoted as p(S, Arec,i, ArepT,0, ArepL,0 | i = 0, 1, 2, 3) = p3{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}. The concrete forms of PF, Pi', and PR were the same as those in Experiment 1. State transition matrices of Am = {Arec,i, ArepT,1, ArepL,0 | i = 0, 1, 2, 3} are presented in Table 4, denoted as p(S, Arec,i, ArepT,1, ArepL,0 | i = 0, 1, 2, 3) = p4{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}. State transition matrices of Am = {Arec,i, ArepT,0, ArepL,1 | i = 0, 1, 2, 3} are presented in Table 5, denoted as p(S, Arec,i, ArepT,0, ArepL,1 | i = 0, 1, 2, 3) = p5{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}. State transition matrices of Am = {Arec,i, ArepT,1, ArepL,1 | i = 0, 1, 2, 3} are presented in Table 6, denoted as p(S, Arec,i, ArepT,1, ArepL,1 | i = 0, 1, 2, 3) = p6{PF,i, Pi', PN, PR,i | i = 0, 1, 2, 3}.

4.5.3. Maintenance policy optimization

In Experiment 2, the reinforcement learning algorithm of Gauss–Seidel value iteration was also adopted to optimize the maintenance policy. Hypothetical costs were adopted: the LLP1 replacement cost Cllp,1 was assumed to be 300, while the LLP2 replacement cost Cllp,2 was assumed to be 600. The three performance recovery action costs were assumed to be 200, 500, and 800, respectively. The discount factor was set as γ = 0.9. Because the aero-engine state was represented by a three-dimensional space, including the LLP1, LLP2, and performance states, the optimal maintenance policy was represented by a three-dimensional policy map, shown in Fig. 11.
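For Experiment 2, a flat indexing of the three-dimensional state space of Eq. (18) is needed when the transition matrices of Tables 3-6 are stored as arrays. The helper below is an illustrative assumption, not code from the paper; note that Eq. (18) nominally contains 7 x 3 x 4 = 84 combinations, while Table 7 reports a 49 x 49 transition matrix, presumably because only the LLP1/LLP2 combinations that actually occur (those listed in Tables 3-6) are used.

```python
from itertools import product

PERF = [f"D{i}" for i in range(6)] + ["F"]        # D0..D5 plus the failure state
LLP1 = [f"T{j}" for j in range(3)]                # flight-cycle LLP
LLP2 = [f"L{k}" for k in range(4)]                # flight-hour LLP

STATES = list(product(PERF, LLP1, LLP2))
INDEX = {s: n for n, s in enumerate(STATES)}

print(len(STATES))                    # 84 nominal combinations in Eq. (18)
print(INDEX[("D2", "T1", "L3")])      # flat index of one state
```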
Table 5 Transition probability matrices of LLP2 replacement actions.
Table 6 Transition probability matrices of LLP1 and LLP2 replacement actions.
Fig. 11 Maintenance policy map of Experiment 2.

In the three-dimensional policy map, the LLP1, LLP2, and performance states were regarded as the x axis, z axis, and y axis, respectively. The maintenance actions were presented in different colors and shapes. In the legend of the maintenance policy map in Fig. 11, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A12 denoted action {Arec,1, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}. Based on the policy map, a maintenance policy was obtained according to the aero-engine state.

4.5.4. Algorithm analysis

In the aforementioned two numerical experiments, a two-dimensional state space was adopted in Experiment 1, while a three-dimensional state space was adopted in Experiment 2. It was obvious that the state transition matrices of Experiment 2 were more complicated. Thus, the algorithm becomes more complex as the state number increases. The algorithm information of the two numerical experiments is shown in Table 7.

Table 7 Algorithm information of two experiments.
Item (Experiment 1 / Experiment 2):
State dimension: 2 / 3
Probability state: 1 / 1
Definite state: 1 / 2
Probability state level: 6 / 6
Definite state level: 5 / 5
Probability state action number: 3 / 3
Definite state action number: 1 / 2
Action number: 8 / 16
Transition matrix dimension: 42 × 42 / 49 × 49

In the aforementioned numerical experiments, performance and random failure states were defined as probability states, because they transferred by probabilities. LLP states did not transfer by probabilities, and were defined as definite states. As shown in the two numerical experiments, the state transition matrix forms were impacted by the definite states, and the transition probabilities were impacted by the probability states. In reinforcement learning, a larger state space makes the algorithm more complicated, and more iterations are needed to seek the optimal policy. The impact of aero-engine state space complexity on the algorithm was obvious.

As the discount factor is an important coefficient in reinforcement learning, its impact was analyzed by contrast experiments based on Experiment 1. Apart from the discount factor, all other parameters were the same as those in Experiment 1. Maintenance policy maps of the discount factor analysis are shown in the subgraphs of Fig. 12.
Fig. 12 Policy maps of discount factor analysis in Example 1.
In the policy maps, different maintenance actions were represented by different colors and shapes. In the legend of Fig. 12, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A5 denoted action {Arep,1, Arec,1}; A6 denoted action {Arep,1, Arec,2}. The analysis experiments showed that when the discount factor was set as γ > 0.43, the optimal policy maps were the same as that in Fig. 10; when the discount factor was set as γ < 0.35, the optimal policy maps were the same as that in Fig. 12(c). In Fig. 12, the discount factors were set as γ = 0.43, γ = 0.41, and γ = 0.35, respectively. The analysis experiments showed that when a smaller discount factor was adopted, more low-cost maintenance actions were adopted in the optimal maintenance policy. This is consistent with the aforementioned analysis of the discount factor.

As hypothetical maintenance costs were adopted in the numerical experiments, cost impacts were analyzed. Based on the aforementioned experiments, the medium performance recovery cost was set as Crec,2 = 350 and regarded as the benchmark cost. Maintenance policy maps of different cost ratios are shown in the subgraphs of Figs. 13 and 14.

Fig. 13 Policy maps of performance recovery cost analysis in Example 1.

In the legend of Fig. 13, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A3 denoted action {Arep,0, Arec,3}; A4 denoted action {Arep,1, Arec,0}; A5 denoted action {Arep,1, Arec,1}; A7 denoted action {Arep,1, Arec,3}. In the legend of Fig. 14, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A3 denoted action {Arec,3, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A8 denoted action {Arec,3, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A12 denoted action {Arec,1, ArepT,1, ArepL,1}; A14 denoted action {Arec,3, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}.
Fig. 14 Policy maps of performance recovery cost analysis in Example 2.

Fig. 15 Policy maps of LLP replacement cost analysis in Example 1.
Figs. 13(a) and 14(a) showed the optimal policy when Crec,1 decreased; Figs. 13(b) and 14(b) showed the optimal policy when Crec,3 decreased; Figs. 13(c) and 14(c) showed the optimal policy when Crec,1 and Crec,3 decreased simultaneously. As shown in Figs. 13(c) and 14(c), when Crec,1 and Crec,3 decreased simultaneously, the changes in the optimal policy were not obvious.

The impact of the LLP replacement cost on the optimal maintenance policy was analyzed by contrast experiments. The experiment results showed that the optimal maintenance policy would not change as the LLP replacement cost increased; however, the number of LLP replacement actions would increase as the LLP replacement cost decreased. Optimal policy maps of the LLP replacement cost analysis are shown in Figs. 15-17. In the legend of Fig. 15, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A6 denoted action {Arep,1, Arec,2}. In the legends of Figs. 16 and 17, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A2 denoted action {Arec,2, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A7 denoted action {Arec,2, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A13 denoted action {Arec,2, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}.

As shown in Fig. 15, the LLP replacement cost decreased from subgraph (a) to (c). In Figs. 16 and 17, the LLP1 and LLP2 replacement costs decreased from subgraph (a) to (b). The results showed that the number of LLP replacements would increase as the LLP replacement cost decreased, and policy changes appeared on interface decision epochs.

In the aforementioned experiments, the impact of the LLP residual life was not considered. Thus, based on the assumption that the random failure probability would increase as the LLP residual life decreases, contrast experiments were
Fig. 16 Policy maps of LLP1 replacement cost analysis in Example 2.
Fig. 17 Policy maps of LLP2 replacement cost analysis in Example 2.
performed to analyze the optimization algorithm. The optimal policy maps are shown in Fig. 18. In the legend of Fig. 18, A0 denoted action $\{A_{rec,0}, A_{repT,0}, A_{repL,0}\}$; A1 denoted action $\{A_{rec,1}, A_{repT,0}, A_{repL,0}\}$; A4 denoted action $\{A_{rec,0}, A_{repT,1}, A_{repL,0}\}$; A5 denoted action $\{A_{rec,0}, A_{repT,0}, A_{repL,1}\}$; A6 denoted action $\{A_{rec,1}, A_{repT,1}, A_{repL,0}\}$; A9 denoted action $\{A_{rec,1}, A_{repT,0}, A_{repL,1}\}$; A12 denoted action $\{A_{rec,1}, A_{repT,1}, A_{repL,1}\}$; A15 denoted action $\{A_{rec,0}, A_{repT,1}, A_{repL,1}\}$. Based on Experiment 2, the random failure probabilities were increased by 5%, 10%, and 15%, respectively, from subgraph (a) to (c) of Fig. 18, according to the residual life of the older LLP. The policy maps showed no variation. Thus, the LLP residual life may not affect the optimal maintenance policy.
Fig. 18 Policy maps with LLP lifetime impact on transition probability.
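The residual-life comparative experiment described above can be reproduced in outline once the MDP ingredients are available: the random failure probability is scaled by 5%, 10%, and 15%, the maintenance policy is re-optimized, and the resulting policy maps are compared with the baseline. The following Python sketch illustrates this procedure on a deliberately small, hypothetical model; the state space, action effects, cost figures, discount factor, and function names (build_transitions, action_costs, solve_policy) are illustrative assumptions, not the model or data used in this paper.

```python
import numpy as np

# Illustrative toy model (NOT the paper's data): 3 performance levels x 3
# LLP residual-life buckets, and 3 maintenance actions per decision epoch.
N_PERF, N_LIFE = 3, 3
N_S, N_A = N_PERF * N_LIFE, 3          # actions: 0 = do nothing, 1 = recovery, 2 = LLP replacement
GAMMA = 0.95                           # assumed discount factor
BASE_FAIL_PROB = 0.05                  # assumed baseline random failure probability

def build_transitions(fail_scale=1.0):
    """Per-action transition matrices P[a, s, s']; the random failure
    probability grows as the LLP residual life shrinks and is multiplied
    by `fail_scale`, mirroring the 5%/10%/15% comparative experiments."""
    P = np.zeros((N_A, N_S, N_S))
    for s in range(N_S):
        perf, life = divmod(s, N_LIFE)
        p_fail = min(BASE_FAIL_PROB * fail_scale * (N_LIFE - life), 1.0)
        for a in range(N_A):
            n_perf = 0 if a == 1 else min(perf + 1, N_PERF - 1)   # recovery restores performance
            n_life = N_LIFE - 1 if a == 2 else max(life - 1, 0)   # replacement restores LLP life
            nominal = n_perf * N_LIFE + n_life
            failed = (N_PERF - 1) * N_LIFE + n_life               # failure -> worst performance state
            P[a, s, nominal] += 1.0 - p_fail
            P[a, s, failed] += p_fail
    return P

def action_costs():
    """Immediate cost C[s, a] (hypothetical figures for illustration only)."""
    C = np.zeros((N_S, N_A))
    for s in range(N_S):
        perf, _ = divmod(s, N_LIFE)
        C[s, 0] = 10.0 * perf            # degradation-related operating cost
        C[s, 1] = 350.0 + 10.0 * perf    # performance recovery cost
        C[s, 2] = 500.0                  # LLP replacement cost
    return C

def solve_policy(P, C, tol=1e-6):
    """Standard (synchronous) value iteration; returns the greedy policy."""
    V = np.zeros(N_S)
    while True:
        Q = C + GAMMA * np.einsum('asj,j->sa', P, V)   # Q[s, a]
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmin(axis=1)
        V = V_new

# Baseline policy versus policies with an enhanced random failure probability.
C = action_costs()
base_policy = solve_policy(build_transitions(1.0), C)
for scale in (1.05, 1.10, 1.15):
    policy = solve_policy(build_transitions(scale), C)
    if np.array_equal(policy, base_policy):
        msg = "no change in the optimal policy map"
    else:
        msg = f"policy differs at states {np.flatnonzero(policy != base_policy)}"
    print(f"failure probability x{scale:.2f}: {msg}")
```

With these hypothetical numbers the printed result may or may not match the finding reported above that the policy maps stay unchanged; the point of the sketch is the experimental loop, not the numerical outcome.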
5. Conclusions

Based on the reinforcement learning approach, an aero-engine life-cycle maintenance policy optimization algorithm was proposed, which was able to address the long service-life and the hybrid maintenance strategy synchronously. To address the hybrid maintenance strategy, a multi-dimensional state space was adopted to represent the aero-engine state. Within the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the life-cycle maintenance policy.

Compared with traditional optimization methods, the optimal maintenance policy indicates both when and how to repair an aero-engine, taking the place of the maintenance intervals and work-scopes used in traditional methods. Meanwhile, the long service-life, the hybrid maintenance strategy, and the destabilizing effect of random factors were all addressed by the proposed optimization algorithm. Because little historical data was available to train a pre-specified optimization model of the aero-engine life-cycle maintenance policy, the reinforcement learning approach provided an appropriate alternative. In the reinforcement learning framework, the aero-engine state space, the maintenance actions, and the state transition matrices were determined according to real-life aero-engine operation, and the Gauss–Seidel value iteration algorithm was employed to solve the long-term decision-making problem. The proposed optimization algorithm would help in formulating a better aero-engine life-cycle maintenance policy, resulting in a lower life-cycle maintenance cost.

Two numerical experiments and algorithm analyses were performed to illustrate the proposed optimization algorithm in detail. As real aero-engine maintenance cost data was unavailable, hypothetical data was adopted in the numerical experiments. In future studies, maintenance cost calculation methods deserve further attention to improve the applicability of the proposed optimization algorithm.
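To make the Gauss–Seidel value iteration mentioned above concrete, the sketch below shows the in-place (asynchronous) update that distinguishes it from standard value iteration: within a sweep, each state's value is overwritten immediately, so later states in the same sweep already use the refreshed values. The MDP fed to the function here is a random placeholder, and the function signature is an illustrative assumption rather than the implementation used in this paper.

```python
import numpy as np

def gauss_seidel_value_iteration(P, C, gamma=0.95, tol=1e-8, max_sweeps=10_000):
    """Gauss-Seidel (in-place) value iteration for a discounted-cost MDP.

    P : (n_actions, n_states, n_states) transition tensor, rows summing to 1
    C : (n_states, n_actions) immediate cost matrix
    Returns the converged value vector and the greedy (cost-minimizing) policy.
    """
    n_a, n_s, _ = P.shape
    V = np.zeros(n_s)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_s):                      # sweep the states in a fixed order
            q = C[s] + gamma * P[:, s, :] @ V     # uses values already updated in this sweep
            v_new = q.min()
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new                          # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = np.array([(C[s] + gamma * P[:, s, :] @ V).argmin() for s in range(n_s)])
    return V, policy

# Minimal usage example on a random placeholder MDP.
rng = np.random.default_rng(0)
n_a, n_s = 3, 8
P = rng.random((n_a, n_s, n_s))
P /= P.sum(axis=2, keepdims=True)                 # normalize each row into a probability vector
C = 100.0 * rng.random((n_s, n_a))
V, policy = gauss_seidel_value_iteration(P, C)
print("values:", np.round(V, 2))
print("policy:", policy)
```

Because the refreshed values are reused within the same sweep, convergence is typically reached in fewer sweeps than with the synchronous update, which matters when the life-cycle state space is large.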
Acknowledgments

The authors thank the anonymous reviewers for their critical and constructive review of the manuscript. This work was co-supported by the Key National Natural Science Foundation of China (No. U1533202), the Civil Aviation Administration of China (No. MHRD20150104), and the Shandong Independent Innovation and Achievements Transformation Fund, China (No. 2014CGZH1101).