Reliability and maintenance for performance-balanced systems operating in a shock environment

Journal Pre-proof

Reliability and maintenance for performance-balanced systems operating in a shock environment
Xiaoyue Wang, Xian Zhao, Siqi Wang, Leping Sun

PII: S0951-8320(19)30131-0
DOI: https://doi.org/10.1016/j.ress.2019.106705
Reference: RESS 106705
To appear in: Reliability Engineering and System Safety
Received date: 29 January 2019
Revised date: 6 October 2019
Accepted date: 20 October 2019

Please cite this article as: Xiaoyue Wang, Xian Zhao, Siqi Wang, Leping Sun, Reliability and maintenance for performance-balanced systems operating in a shock environment, Reliability Engineering and System Safety (2019), doi: https://doi.org/10.1016/j.ress.2019.106705

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.

Highlights
• A performance-balanced system is built to enrich the research on balanced systems.
• A two-step FMCIA is constructed to derive system reliability.
• An integrated maintenance policy is designed for the proposed model.

Reliability and maintenance for performance-balanced systems operating in a shock environment
Xiaoyue Wang^a, Xian Zhao^{b,*}, Siqi Wang^b, Leping Sun^b
a Business School, Beijing Technology and Business University, Beijing, 100048, China
b School of Management & Economics, Beijing Institute of Technology, Beijing, 100081, China

Abstract
Driven by practical engineering applications, research on balanced systems has grown in recent years. The concept of balance differs across perspectives, and many reliability problems of balanced systems remain to be investigated. This paper considers a system that consists of multiple components with two-stage lifetimes. When the components that are in a poor operation stage concentrate in a particular range of the system, the highly non-uniform operation states of the components speed up the failure process of the whole system. This engineering situation is interpreted as the system losing its performance balance, and a performance-balanced system with three competing criteria of system performance imbalance is established. A two-step finite Markov chain imbedding approach is employed to derive probabilistic indices for both the components and the whole system operating in a shock environment. A combined maintenance policy is designed for the proposed model, and an optimization model is constructed by minimizing the mean maintenance cost. Numerical examples based on a battery pack system demonstrate the applicability of the performance-balanced system and the effectiveness of the proposed approach.

Keywords: Performance-balanced system; Shock environment; Two-step finite Markov chain imbedding approach; Maintenance policy.

Notations
$X_j$      time lag between the (j-1)-th and j-th shocks, $j = 1, 2, 3, \ldots$
$n$        number of components in the system
$L$        number of shocks that the system suffers
$d_1$      maximum number of invalid shocks between two consecutive valid shocks with sparse $d_1$
$d_2$      maximum number of components which are in Stage 1 between two consecutive components which are in Stage 2 with sparse $d_2$
$k_{c1}$   number of consecutive valid shocks with sparse $d_1$ required for the change point of component lifetime
$k_{c2}$   number of consecutive valid shocks required for component failure
$k_t$      number of cumulative valid shocks required for component failure
$b_{cw}$   number of consecutive components which are in Stage 2 that causes system performance imbalance
$b_{cdw}$  number of consecutive components with sparse $d_2$ which are in Stage 2 that causes system performance imbalance
$b_{tw}$   number of total components which are in Stage 2 that causes system performance imbalance
$c_{ins}$  cost for inspection of preventive maintenance
$c_r$      cost for replacing a failed component
$c_2$      cost for repairing a component which is in Stage 2
$T_p$      system age for preventive maintenance
$T_{fc}$   lifetime of the system which breaks down by component failures
$T_{fs}$   lifetime of the system which breaks down by system performance imbalance
$PH_d$     discrete phase-type distribution
$PH_c$     continuous phase-type distribution

* Corresponding author. Tel.: +86 01068918446. E-mail address: [email protected] (X. Zhao).

1. Introduction
Balanced systems have received increasing attention from researchers in the field of reliability and engineering. Hua and Elsayed [1] proposed methods for reliability estimation and the rebalancing problem of a k-out-of-n pairs: G balanced system. Hua and Elsayed [2, 3] also developed a degradation model and a Monte Carlo simulation-based reliability approximation method for such systems. As an extension, Guo and Elsayed [4] conducted reliability modeling and analysis for a (k1, k2)-out-of-(n, m) pairs: G balanced system. With the emergence of new concepts of system balance, various new balanced systems have been constructed. Using minimal path sets, Endharta et al. [5] evaluated the reliability of circular k-out-of-n: G balanced systems, where system balance means that the working components are proportionately spread throughout the system. Cui et al. [6] proposed four reliability models for a k-out-of-n: F balanced system with m sectors, where system balance means keeping the same number of working components in different sectors. In summary, k-out-of-n modelling has been studied extensively in research on balanced systems. A large body of related work exists on reliability modelling for various k-out-of-n systems, such as Zhao and Cui [7], Ling et al. [8] and Eryilmaz and Devrim [9]. Additionally, Cui et al. [10] established two new balanced reliability systems based on the differences of working states between components. Zhao et al. [11] proposed a general multi-state balanced system in which both the components and the whole system have multiple states. The definition of 'system balance' is not restricted to the existing research and can carry other meanings in real engineering applications. Because 'system balance' can be understood in different ways, the topic of balanced systems leaves considerable room for investigation.
Many industrial systems, such as energy generation systems, operate in a random shock environment and suffer performance deterioration from the damage of external shocks throughout their service life. Various shock models have been defined to analyze the reliability of systems whose state transitions are triggered by shocks. Shock models can be classified into five basic groups: the cumulative shock model [12], the run shock model [13, 14], the extreme shock model [15, 16], the delta shock model [17] and the mixed shock model [18, 19]. Considering mutable failure mechanisms, Zhao et al. [20] built a shock model with a self-healing mechanism and Zhao et al. [21] built a multi-state shock model. In shock models, the analysis of system performance is an important issue. Usually, system performance is characterized not only by reliability characteristics, such as the probability of failure-free operation, but also by characteristics of performance (output). For example, Cha and Finkelstein [22] analyzed the performance of systems running in a shock environment through a quality (output) function; Shen et al. [23] derived performance indices of a damage self-healing system, including system reliability and the remaining lifetime distribution. In this paper, the system performance is determined by the state of system operation, in other words, by the operation states of all components in a multi-component system.

From a new perspective of system balance, a concept of system performance balance is proposed in this paper, and a performance-balanced system is established according to the performance of all components in the system. Specifically, the performance deterioration of the components and of the whole system results from random shocks. In the proposed model, even though all components are operational, the system enters an accelerated failure process and loses its performance balance when the components with poor operation performance concentrate in a particular range and satisfy a certain pattern.

The performance-balanced system is proposed with the intent of modeling the performance imbalance problems of energy storage systems. The 'cell balancing' problem of battery pack systems in electric vehicles is taken as a typical example to illustrate the model. A battery pack system is composed of n battery cells in a series structure. For each battery cell, a single rapid charge or over-discharge may bring a certain amount of damage to the cell and can be considered a random shock [24]. An overheated or overcooled environment can also be regarded as a random shock, because extreme ambient temperature influences the performance of battery cells.
For each battery cell, an arriving shock damages it with a certain probability and causes no damage with the complementary probability. After a certain number of shocks attack the battery pack system, the battery cells operate at different levels of performance because of different levels of accumulated damage. Some battery cells operate at a very poor state of charge (SOC) because of serious damage, i.e. poor operating performance. In a series connection of battery cells, the SOC of the pack system is governed by the cells with the lowest SOC [25]. The 'cell balancing' problem of battery pack systems refers to the fact that non-uniform operation performance of the cells may cause a more serious inconsistency problem and a further accelerated degradation of the pack system [26]. This engineering problem is defined as the loss of system performance balance in this paper. When the battery system loses its performance balance, its accelerated degradation leads to a much lower output current and degrades the dynamic performance of the electric vehicle, and may even lead to a sudden halt of the vehicle and to traffic incidents. In view of this phenomenon, this paper considers that the battery pack system fails when it loses performance balance and must then be maintained immediately with appropriate actions. In addition, the failure of a single battery cell also results in system failure because of the series structure of the battery cells.

Appropriate maintenance actions can effectively achieve the desired level of system reliability at reasonable cost for engineering systems which suffer random failures [27]. The growing importance of maintenance has generated increasing interest in maintenance policies for improving system reliability. For example, preventive maintenance strategies, which have been widely adopted for systems running in a random environment, were studied by Yang et al. [28] and Cha et al. [29]; Zhang and Zeng [30] focused on the development of opportunistic maintenance policies for engineering systems. In this paper, classic preventive maintenance combined with corrective and opportunistic maintenance is employed for the performance-balanced systems.

Markov models have been proven to be an efficient and useful technique in the field of reliability [31, 32]. Fu [33] first introduced the finite Markov chain imbedding approach (FMCIA) and used it to obtain the reliability of a consecutive-k-out-of-n: F system. Reliability modeling and analyses of some specific systems have since been carried out with the FMCIA, such as [34, 35]. An overall review of the developments and applications of the FMCIA was conducted by [36]. Notably, the FMCIA has been increasingly used in diverse domains, such as start-up demonstration tests [37, 38], negative binomial distributions [39] and quality control [40].
In this paper, we extend the classic FMCIA and employ a two-step FMCIA to analyze the reliability of the proposed performance-balanced systems.

Research on reliability modelling and analysis for balanced systems has gradually attracted attention in the field of reliability. However, the existing studies on balanced systems are insufficient in the following aspects. First, the concept of system balance is limited to six types in the previous literature, whereas system balance takes a great variety of forms in real engineering applications, and more research can be conducted by proposing new concepts of system balance. Second, very few studies have considered the impact of shocks on balanced systems, and the reliability of balanced systems running in a shock environment has not been analyzed comprehensively. Third, appropriate maintenance actions are significant for the stable operation of systems, but maintenance policies for balanced systems have not been studied so far. To address these gaps, the main contributions of this paper to the current literature are as follows.

• First, a new concept of system balance, performance balance, is proposed for the first time, and a performance-balanced system whose degradation is caused by shocks is established to model the performance balance problem of energy storage systems.

• Second, this paper proposes an efficient and accurate methodology based on a two-step FMCIA that can be applied to derive the reliability and related probabilistic indices of performance-balanced systems. The proposed method improves on the traditional FMCIA: the first step investigates the operation process of the components and the second step derives the system reliability indices by the FMCIA.

• Third, an integrated maintenance policy is designed for performance-balanced systems, and a simulation model is developed to solve the maintenance optimization problem because finding the optimal solution analytically is computationally intensive.

The remainder of this paper is organized as follows. The model descriptions and assumptions are given in Section 2. In Section 3, we conduct the reliability analyses of a single component and of the whole system by a two-step FMCIA. In Section 4, an integrated maintenance policy is designed and the approximately optimal inspection interval is obtained by a simulation algorithm. Section 5 presents some quantitative results for the proposed model. The conclusion and future research directions are presented in Section 6.

2. Model descriptions
We consider a multi-component system which consists of n components connected in a series structure. The basic assumptions used in this paper are as follows.
(1) The components in the system are subject to external and possibly valid shocks at random times throughout their service life. The shocks can be classified into two types, valid shocks and invalid shocks.
(2) A valid shock causes a certain magnitude of damage to the component and is denoted as '1' in the shock sequence, while '0' is used to denote an invalid shock which causes no damage to the component.

(3) The probability of a valid shock for the i-th component is $p_i$ and the probability of an invalid shock for the i-th component is $q_i = 1 - p_i$. Let $X_j$ ($j \ge 1$) denote the inter-arrival time between the (j-1)-th and j-th shocks.

In this paper, the lifetime of the component is divided into two stages. When the component is in its Stage 1, it does not fail and operates with excellent operation performance. When the valid shocks that a component suffers become more frequent and intensive, the component state starts to become worse and the component may fail after subsequent valid shocks. This case is interpreted as the component moving to its Stage 2, in which it runs with poor operation performance. The change point at which the component moves from Stage 1 (excellent operation state) to Stage 2 (poor operation state) is defined by considering sparse connection: the change point of the component lifetime is the event that $k_{c1}$ consecutive valid shocks with sparse $d_1$ appear. If the number of failed trials between two adjacent successful trials is d or less, then the two successful trials are called two consecutive successful trials with sparse d [37]. When the component operates in Stage 2, it fails when $k_{c2}$ consecutive valid shocks or $k_t$ cumulative valid shocks attack the component in Stage 2, whichever comes first. Based on the above, the components in the system are subject to the two-stage failure process shown in Fig. 1.

[Figure: component state against time t, showing Stage 1, the change point ($k_{c1}$ consecutive valid shocks with sparse $d_1$), Stage 2 and the failure point ($k_{c2}$ consecutive valid shocks OR $k_t$ cumulative valid shocks); time marks 0, t1, t2.]

Fig. 1. Failure process of the component with a change point.
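The two-stage failure process described above can be traced on a concrete shock sequence. The following is a minimal sketch, not the authors' implementation: the function names `component_path` and `simulate_component` are hypothetical, and the sketch assumes that the Stage 2 counters start from zero at the change point.

```python
import random

def component_path(shocks, kc1, d1, kc2, kt):
    """Trace one component through a 0/1 shock sequence (1 = valid shock).

    Returns (change_point, failure_point) as 1-based shock counts;
    None means the event did not occur within the sequence.
    Assumption: Stage 2 counters start at zero when Stage 2 is entered.
    """
    run, gap = 0, 0      # Stage 1: sparse-d1 run of valid shocks, trailing invalid shocks
    cs2, ts = 0, 0       # Stage 2: consecutive and cumulative valid shocks
    change = None
    for m, s in enumerate(shocks, start=1):
        if change is None:                 # component is in Stage 1
            if s == 1:
                run, gap = run + 1, 0
                if run == kc1:             # kc1 consecutive valid shocks with sparse d1
                    change = m
            elif run > 0:
                gap += 1
                if gap > d1:               # more than d1 invalid shocks break the run
                    run, gap = 0, 0
        else:                              # component is in Stage 2
            if s == 1:
                cs2, ts = cs2 + 1, ts + 1
                if cs2 == kc2 or ts == kt: # whichever comes first
                    return change, m
            else:
                cs2 = 0
    return change, None

def simulate_component(p, n_shocks, kc1, d1, kc2, kt, seed=0):
    """Draw n_shocks independent shocks, each valid with probability p, and trace them."""
    rng = random.Random(seed)
    shocks = [1 if rng.random() < p else 0 for _ in range(n_shocks)]
    return component_path(shocks, kc1, d1, kc2, kt)
```

For instance, with $k_{c1}=3$, $d_1=1$, $k_{c2}=3$, $k_t=4$, the sequence 1,0,1,0,1,1,1,1 reaches the change point at the fifth shock (the single invalid shocks do not break the sparse run) and fails at the eighth shock through three consecutive valid shocks in Stage 2.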

From the perspective of the whole system, the system performance balance depends on the pattern of the locations of the components which are in their Stage 2. Three criteria of imbalance patterns are built for the performance-balanced system as follows:
(1) Criterion A: the number of consecutive components which are in their Stage 2 is not less than $b_{cw}$;
(2) Criterion B: the number of consecutive components with sparse $d_2$ which are in their Stage 2 is not less than $b_{cdw}$;
(3) Criterion C: the number of cumulative components which are in their Stage 2 is not less than $b_{tw}$.
Note that these three parameters should satisfy $b_{cw} < b_{cdw} < b_{tw}$. If Criterion A, B or C is met, the locations of the components which are in bad operating status satisfy a critical pattern, and the system loses its performance balance and fails. Equivalently, the system is assumed to keep its performance balance if and only if none of the three criteria is satisfied.
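The three criteria can be checked directly on a snapshot of the system. The sketch below is illustrative only: the function name `imbalance_criteria` and the boolean-list encoding of the component stages are assumptions made here, not notation from the paper.

```python
def imbalance_criteria(stage2, bcw, bcdw, d2, btw):
    """Return the set of satisfied criteria ('A', 'B', 'C').

    stage2: list of booleans ordered by component position,
            True if that component is currently in Stage 2.
    """
    longest_run = run = 0          # consecutive Stage 2 components (Criterion A)
    longest_sparse = sparse = 0    # consecutive with sparse d2 (Criterion B)
    gap = 0                        # Stage 1 components since the last Stage 2 one
    for in_s2 in stage2:
        if in_s2:
            run += 1
            sparse += 1
            gap = 0
        else:
            run = 0
            gap += 1
            if gap > d2:           # more than d2 Stage 1 components break the sparse run
                sparse = 0
        longest_run = max(longest_run, run)
        longest_sparse = max(longest_sparse, sparse)
    met = set()
    if longest_run >= bcw:
        met.add("A")
    if longest_sparse >= bcdw:
        met.add("B")
    if sum(stage2) >= btw:         # total number of Stage 2 components (Criterion C)
        met.add("C")
    return met
```

With $b_{cw}=2$, $b_{cdw}=3$, $d_2=1$, $b_{tw}=4$ and eight components, a snapshot in which only components 1, 3 and 5 are in Stage 2 triggers Criterion B alone, because each pair of neighbouring Stage 2 components is separated by a single Stage 1 component.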

[Figure: four panels, each showing the sequences of valid ('1') and invalid ('0') shocks for Cells 1 to 8 along the time axis t. Case (a): individual component failure. Case (b): system imbalance. Case (c): system imbalance. Case (d): system imbalance. Legend: one marker denotes the event that a cell moves to its Stage 2, the other denotes the event that a cell fails.]

Fig. 2. Four possible failure cases of the battery pack system in EVs with parameters $k_{c1}=3$, $d_1=1$, $k_{c2}=3$, $k_t=4$ and $b_{cw}=2$, $b_{cdw}=3$, $d_2=1$, $b_{tw}=4$.

For better understanding, a battery pack system with eight battery cells is taken as an illustration of the failure criteria of the proposed performance-balanced system. As shown in Fig. 2, when the system parameters are $k_{c1}=3$, $d_1=1$, $k_{c2}=3$, $k_t=4$ and $b_{cw}=2$, $b_{cdw}=3$, $d_2=1$, $b_{tw}=4$, four possible failure cases of the battery pack system are explained as follows. In Fig. 2, the horizontal axis is the timeline, and the valid and invalid shocks are denoted as '1' and '0' respectively. In case (a), cell 1 is the only one that moves into its Stage 2. When the number of consecutive valid shocks reaches $k_{c2}=3$ in its Stage 2, cell 1 fails and the whole battery pack system breaks down instantly. In case (b), cells 1, 3 and 5 operate with poor performance, which satisfies Criterion B of the system performance imbalance ($b_{cdw}=3$, $d_2=1$). In case (c), the system is unbalanced and fails when two consecutive battery cells both run in their Stage 2 (Criterion A, $b_{cw}=2$). Case (d) illustrates that the system imbalance and failure result from a total of four battery cells working in their Stage 2 (Criterion C, $b_{tw}=4$). Only a few possible failure cases of the whole battery pack system are given in Fig. 2; other failure scenarios can occur under the proposed model descriptions and assumptions.

3. Reliability analysis for the performance-balanced system
In this section, probabilistic analyses for the above model are presented by using a modified finite Markov chain imbedding approach, which is proposed for the first time in this paper. In order to model the performance balance problem of a multi-component system, a two-step FMCIA is employed, where the first step examines the failure process of a single component and the second step investigates the failure criteria of the entire system from an overall perspective.

3.1 Assessment for a single component
The first step is to obtain the failure probability of a single component in the system and the probability that the component is in its Stage 2. For the i-th component in the system, four random variables are defined in a sequence of m random shocks as follows. $N_{mi}^{cs1}$ represents the number of last consecutive valid shocks with sparse $d_1$ that attack the i-th component in its Stage 1. If $N_{mi}^{cs1}$ is larger than zero, $N_{mi}^{d}$ denotes the number of trailing invalid shocks that the i-th component suffers in its Stage 1; otherwise, $N_{mi}^{d}$ is written as '0' with no specific meaning. $N_{mi}^{cs2}$ and $N_{mi}^{ts}$ stand for the number of last consecutive valid shocks and the total number of valid shocks, respectively, that the i-th component suffers in its Stage 2. The imbedding Markov chain $\{Y_{mi}, m \ge 0\}$ associated with $N_{mi}^{cs1}$, $N_{mi}^{d}$, $N_{mi}^{cs2}$ and $N_{mi}^{ts}$ is defined as

$$Y_{mi} = \left(N_{mi}^{cs1}, N_{mi}^{d}, N_{mi}^{cs2}, N_{mi}^{ts}\right), \quad \text{for } m \ge 0.$$

The state space $\Omega_{mi}$ is

$$\Omega_{mi} = \left\{\left(n_i^{cs1}, n_i^{d}, n_i^{cs2}, n_i^{ts}\right) : 0 \le n_i^{cs1} \le k_{c1},\ 0 \le n_i^{d} \le d_1,\ 0 \le n_i^{cs2} < k_{c2},\ n_i^{cs2} \le n_i^{ts} < k_t\right\} \cup \left\{E_{fc}^{i}\right\},$$

where $n_i^{d} = 0$ if $n_i^{cs1} = 0$. The initial state is $Y_{0i} = (0,0,0,0)$, and $E_{fc}^{i}$ is the absorbing state which means that the i-th component fails. Appendix A gives the rules for the transition probabilities in the one-step transition probability matrix $\Lambda_{mi}^{f}$ among the states of the i-th component. The matrix $\Lambda_{mi}^{f}$ can be obtained according to Appendix A and, by the theory of Markov chains, partitioned into four blocks as follows,

$$\Lambda_{mi}^{f} = \begin{pmatrix} Q_i & R_i \\ 0 & I \end{pmatrix}_{(N_T+1)\times(N_T+1)}.$$

The total number of transient states is denoted by $N_T = (k_{c1}-1)(d_1+1) + 1 + \frac{1}{2}k_{c2}(2k_t - k_{c2} + 1)$. $Q_i$ (an $N_T \times N_T$ matrix) is the one-step transition probability matrix among transient states for the i-th component. $R_i$ stands for the one-step transition probability matrix of size $N_T \times 1$ from transient states to the absorbing state for the i-th component. $0$ (a zero matrix of size $1 \times N_T$) represents the one-step transition probability matrix from the absorbing state to transient states for the i-th component. $I$ stands for the one-step transition probability matrix among absorbing states, which is a first-order identity matrix. Additionally, the matrix $Q_i$ can be further divided into four parts,

$$Q_i = \begin{pmatrix} A_i & B_i \\ 0 & C_i \end{pmatrix}_{N_T \times N_T}.$$

There are $n_1 = (k_{c1}-1)(d_1+1) + 1$ and $n_2 = \frac{1}{2}k_{c2}(2k_t - k_{c2} + 1)$ transient states when the i-th component is in its Stage 1 and Stage 2 respectively. The matrix $A_i$ of size $n_1 \times n_1$ and the matrix $C_i$ of size $n_2 \times n_2$ contain the transition rules of the i-th component's transient states in its Stage 1 and Stage 2 respectively. The matrix $B_i$ (an $n_1 \times n_2$ matrix) includes the transition probabilities with which the i-th component moves from Stage 1 to its Stage 2. The shock length is denoted as $L$. Based on the above analysis, the probability mass function and the distribution function of the shock length $L$ when the i-th component fails after $l$ shocks are, respectively,

$$P_{cf}^{i}(L = l) = \pi_0 Q_i^{\,l-1} R_i, \qquad (1)$$

$$P_{cf}^{i}(L \le l) = \pi_0 \sum_{j=0}^{l-1} Q_i^{\,j} R_i, \qquad (2)$$

where $\pi_0 = (1, 0, \ldots, 0)_{1 \times N_T}$.

In order to analyze the reliability of the whole system, it is essential to obtain the probabilities that an individual component survives in its Stage 1 and in its Stage 2. The i-th component is in its Stage 2 exactly when $n_i^{cs1} = k_{c1}$. The survival probabilities of the i-th component in Stage 1 and Stage 2 after suffering $l$ shocks are written respectively as

$$P_{c1}^{i}(L = l) = \pi_0 Q_i^{\,l} e_1', \qquad (3)$$

$$P_{c2}^{i}(L = l) = \pi_0 Q_i^{\,l} e_2', \qquad (4)$$

where $\pi_0 = (1, 0, \ldots, 0)_{1 \times N_T}$,

$$e_1 = (\underbrace{1, 1, \ldots, 1}_{(k_{c1}-1)(d_1+1)+1},\ \underbrace{0, 0, \ldots, 0}_{\frac{1}{2}k_{c2}(2k_t-k_{c2}+1)}) \quad \text{and} \quad e_2 = (\underbrace{0, 0, \ldots, 0}_{(k_{c1}-1)(d_1+1)+1},\ \underbrace{1, 1, \ldots, 1}_{\frac{1}{2}k_{c2}(2k_t-k_{c2}+1)}).$$
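The quantities in (1) to (4) can be evaluated numerically once the transient states are enumerated. The sketch below is a hedged illustration: because Appendix A is not reproduced here, the transition rules are an assumption reconstructed from the model description in Section 2 (in particular, the Stage 2 counters are assumed to start at zero), and the function names are hypothetical.

```python
def build_component_chain(p, kc1, d1, kc2, kt):
    """Return (states, Q, R) for one component; p is the valid-shock probability.

    Assumed rules (Appendix A is not shown): a valid shock extends the current
    run; an invalid shock extends the gap or resets Stage 1; Stage 2 counters
    start at zero. Stage 1 states are listed first, then Stage 2 states.
    """
    q = 1.0 - p
    s1 = [(0, 0)] + [(c, d) for c in range(1, kc1) for d in range(d1 + 1)]
    s2 = [(c, t) for c in range(kc2) for t in range(c, kt)]
    states = [("S1",) + s for s in s1] + [("S2",) + s for s in s2]
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    R = [0.0] * n
    for (stage, a, b), k in idx.items():
        if stage == "S1":                       # a = sparse run length, b = trailing gap
            nxt = ("S2", 0, 0) if a + 1 == kc1 else ("S1", a + 1, 0)
            Q[k][idx[nxt]] += p                 # valid shock
            nxt = ("S1", 0, 0) if (a == 0 or b == d1) else ("S1", a, b + 1)
            Q[k][idx[nxt]] += q                 # invalid shock
        else:                                   # a = consecutive, b = cumulative valid shocks
            if a + 1 == kc2 or b + 1 == kt:
                R[k] += p                       # absorption: component failure
            else:
                Q[k][idx[("S2", a + 1, b + 1)]] += p
            Q[k][idx[("S2", 0, b)]] += q
    return states, Q, R

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def stage_probabilities(p, kc1, d1, kc2, kt, l):
    """Return (P_c1, P_c2, P_cf) after l shocks, following Eqs. (3), (4) and (2)."""
    states, Q, R = build_component_chain(p, kc1, d1, kc2, kt)
    n1 = (kc1 - 1) * (d1 + 1) + 1               # number of Stage 1 transient states
    v = [1.0] + [0.0] * (len(states) - 1)       # pi_0
    fail_cdf = 0.0
    for _ in range(l):
        fail_cdf += sum(x * r for x, r in zip(v, R))   # adds pi_0 Q^j R
        v = vec_mat(v, Q)                               # v becomes pi_0 Q^(j+1)
    return sum(v[:n1]), sum(v[n1:]), fail_cdf
```

For $k_{c1}=3$, $d_1=1$, $k_{c2}=3$, $k_t=4$ the enumeration yields $5 + 9 = 14$ transient states, which agrees with the formula for $N_T$, and the three returned probabilities sum to one for any number of shocks.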

Additionally, let the random variable $S_i^{f}$ denote the total number of shocks that the i-th component has suffered when it fails. The definition of $S_i^{f}$ means that the Markov chain enters the absorbing state, so $S_i^{f}$ follows a discrete PH distribution denoted as $S_i^{f} \sim PH_d(\alpha_i^{f}, Q_i)$, where $\alpha_i^{f} = (1, 0, \ldots, 0)_{1 \times N_T}$. Define a random variable $S_i^{g2}$ as the total number of shocks that the i-th component has suffered when it enters its Stage 2. If the first $n_1$ states are regarded as transient states, then the remaining states can be merged into one absorbing state which means that the i-th component has moved into its Stage 2. Consequently, $S_i^{g2}$ follows a discrete PH distribution as well and is represented as $S_i^{g2} \sim PH_d(\alpha_i^{g2}, A_i)$, where $\alpha_i^{g2} = (1, 0, \ldots, 0)_{1 \times n_1}$. A new one-step transition probability matrix $\Lambda_{mi}^{g2}$ can be obtained as

$$\Lambda_{mi}^{g2} = \begin{pmatrix} A_i & R_i^{g2} \\ 0 & I \end{pmatrix}_{(n_1+1)\times(n_1+1)}.$$

It is assumed that the inter-arrival time of the shocks $X_j$ ($j \ge 1$) follows a continuous phase-type distribution with representation $X_j \sim PH_c(\gamma, \eta)$. The random variables $M_i^{f}$ and $M_i^{g2}$ denote the times at which the i-th component fails and moves into its Stage 2 respectively, and are given by

$$M_i^{f} = \sum_{j=1}^{S_i^{f}} X_j, \qquad (5)$$

$$M_i^{g2} = \sum_{j=1}^{S_i^{g2}} X_j. \qquad (6)$$
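As a side remark that is not part of the original derivation, the means of these random sums follow from standard phase-type results. The sketch below additionally assumes that the inter-arrival times $X_j$ are independent and identically distributed and independent of whether each shock is valid, so that Wald's identity applies; $e'$ denotes a column vector of ones of the appropriate size.

```latex
\begin{align*}
E\!\left[S_i^{f}\right]  &= \alpha_i^{f}\,(I - Q_i)^{-1} e', &
E\!\left[S_i^{g2}\right] &= \alpha_i^{g2}\,(I - A_i)^{-1} e', \\
E\!\left[X_j\right]      &= -\gamma\,\eta^{-1} e', \\
E\!\left[M_i^{f}\right]  &= E\!\left[S_i^{f}\right] E\!\left[X_j\right], &
E\!\left[M_i^{g2}\right] &= E\!\left[S_i^{g2}\right] E\!\left[X_j\right].
\end{align*}
```

These expressions give a quick plausibility check for any numerical implementation: the mean time to enter Stage 2 must not exceed the mean time to failure.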

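On the basis of the closure properties of phase-type distributions [41], a matrix-based method is presented to obtain the distributions of $M_i^{f}$ and $M_i^{g2}$, which are

$$M_i^{f} \sim PH_c\!\left(\gamma \otimes \alpha_i^{f},\ \eta \otimes I + (a^0\gamma) \otimes Q_i\right), \qquad (7)$$

$$M_i^{g2} \sim PH_c\!\left(\gamma \otimes \alpha_i^{g2},\ \eta \otimes I + (a^0\gamma) \otimes A_i\right), \qquad (8)$$

where $a^0 = -\eta e'$, $I$ is an identity matrix and $\otimes$ is the Kronecker product. The cumulative distribution functions of $M_i^{f}$ and $M_i^{g2}$ can be derived respectively as

$$P\!\left(M_i^{f} \le t\right) = 1 - \left(\gamma \otimes \alpha_i^{f}\right) \exp\!\left\{\left[\eta \otimes I + (a^0\gamma) \otimes Q_i\right] t\right\} e', \qquad (9)$$

$$P\!\left(M_i^{g2} \le t\right) = 1 - \left(\gamma \otimes \alpha_i^{g2}\right) \exp\!\left\{\left[\eta \otimes I + (a^0\gamma) \otimes A_i\right] t\right\} e'. \qquad (10)$$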
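As a worked special case (an illustration added here, not taken from the paper), suppose that shocks arrive according to a Poisson process with rate $\lambda$, a symbol introduced only for this example. Then $X_j$ is exponentially distributed, which is the one-phase representation with $\gamma = 1$ and $\eta = -\lambda$, so that $a^0 = \lambda$ and the Kronecker products collapse:

```latex
\begin{align*}
\eta \otimes I + (a^0\gamma) \otimes Q_i &= -\lambda I + \lambda Q_i = \lambda\,(Q_i - I), \\
P\!\left(M_i^{f} \le t\right)
  &= 1 - \alpha_i^{f} \exp\{\lambda (Q_i - I)\,t\}\, e' \\
  &= 1 - \sum_{l=0}^{\infty} e^{-\lambda t}\,\frac{(\lambda t)^{l}}{l!}\;\alpha_i^{f} Q_i^{\,l} e'.
\end{align*}
```

The last line reads as the probability $\alpha_i^{f} Q_i^{\,l} e'$ that the component survives $l$ shocks, weighted by the Poisson probability that exactly $l$ shocks arrive in $(0, t]$. The same reduction holds for $M_i^{g2}$ with $A_i$ and $\alpha_i^{g2}$ in place of $Q_i$ and $\alpha_i^{f}$.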

Afterwards, $P_{cf}^{i}(t)$, $P_{c1}^{i}(t)$ and $P_{c2}^{i}(t)$ are used to denote the probabilities that, at time $t$, the i-th component has failed, is in its Stage 1 and is in its Stage 2 respectively. They can be calculated by

$$P_{cf}^{i}(t) = P\!\left(M_i^{f} \le t\right), \qquad (11)$$

$$P_{c1}^{i}(t) = 1 - P\!\left(M_i^{g2} \le t\right), \qquad (12)$$

$$P_{c2}^{i}(t) = P\!\left(M_i^{g2} \le t\right) - P\!\left(M_i^{f} \le t\right). \qquad (13)$$

3.2 Assessment for the system
With the probabilities that a single component has failed, is in its Stage 1 or is in its Stage 2 after $l$ shocks (at time $t$) obtained in Section 3.1, the FMCIA is used a second time in this subsection to derive the reliability function of the whole system, the expected shock length when the system fails and the expected system lifetime. The system failure caused by a single component failure is temporarily left out in order to describe the balance index of the system more clearly. Four random variables $N_k^{cw}$, $N_k^{cdw}$, $N_k^{dw}$, $N_k^{tw}$ are defined as follows. $N_k^{cw}$ denotes the number of the last consecutive components which are in their Stage 2 in the first $k$ components. $N_k^{cdw}$ represents the number of the last consecutive components which are in their Stage 2 with sparse $d_2$ in the first $k$ components. $N_k^{dw}$ represents the number of trailing components which are in their Stage 1 in the first $k$ components if $N_k^{cdw}$ is larger than zero; otherwise, $N_k^{dw}$ is always denoted as '0' ('0' is just a symbol with no specific meaning in this situation). $N_k^{tw}$ is the number of cumulative components which are in their Stage 2 in the first $k$ components. According to the model assumptions, a Markov chain $\{Y_k, k \ge 0\}$ is established for the entire system as

$$Y_k = \left(N_k^{cw}, N_k^{cdw}, N_k^{dw}, N_k^{tw}\right), \quad \text{for } 1 \le k \le n,$$

on the state space $\Omega_k$:

$$\Omega_k = S_k \cup \{E_{fs}\} = \left\{\left(n^{cw}, n^{cdw}, n^{dw}, n^{tw}\right) : 0 \le n^{cw} < b_{cw},\ n^{cw} \le n^{cdw} < b_{cdw},\ 0 \le n^{dw} \le d_2,\ n^{cdw} \le n^{tw} < b_{tw}\right\} \cup \{E_{fs}\},$$

where $n^{dw} = 0$ if $n^{cdw} = 0$. $E_{fs}$ is the absorbing state, which can be reached through seven possible events. Events 1, 2 and 3 mean that imbalance Criterion A, B and C, respectively, is satisfied and the system loses its performance balance. Events 4, 5 and 6 lead to system performance imbalance and failure when exactly two of Criteria A, B and C are met. Event 7 means that the system is unbalanced because Criteria A, B and C are satisfied simultaneously.

Procedure 1. Generate the one-step transition probability matrix for performance-balanced systems

Step 1: $Y_{k-1} = (n^{cw}, n^{cdw}, n^{dw}, n^{tw})$ is given; go to Step 2.
Step 2: If the k-th component is failed, with probability $P_{cf}^{k}(L \le l)$, then let $Y_k = E_{fc}$ and halt; else if the k-th component is working, with probability $1 - P_{cf}^{k}(L \le l)$, then go to Step 3.
Step 3: If the k-th component is in Stage 2, with probability $P_{c2}^{k}(L = l)$, then go to Step 4; else if the k-th component is in Stage 1, with probability $P_{c1}^{k}(L = l)$, then go to Step 7.
Step 4: If $n^{tw} = b_{tw} - 1$, then let $Y_k = E_{fs}$ and halt; else if $n^{tw} < b_{tw} - 1$, then go to Step 5.
Step 5: If $n^{cdw} = b_{cdw} - 1$, then let $Y_k = E_{fs}$ and halt; else if $n^{cdw} < b_{cdw} - 1$, then go to Step 6.
Step 6: If $n^{cw} = b_{cw} - 1$, then let $Y_k = E_{fs}$ and halt; else if $n^{cw} < b_{cw} - 1$, then let $Y_k = (n^{cw}+1, n^{cdw}+1, 0, n^{tw}+1)$ and halt.
Step 7: If $n^{cdw} = 0$, then let $Y_k = (0, 0, 0, n^{tw})$ and halt; else if $n^{cdw} > 0$, then go to Step 8.
Step 8: If $n^{dw} = d_2$, then let $Y_k = (0, 0, 0, n^{tw})$ and halt; else if $n^{dw} < d_2$, then let $Y_k = (0, n^{cdw}, n^{dw}+1, n^{tw})$ and halt.
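The steps of Procedure 1 map naturally onto a single state-update function. The following sketch is one possible reading of the procedure; the function name `next_state`, the status labels and the string encoding of the absorbing states are illustrative choices made here.

```python
def next_state(state, status, bcw, bcdw, d2, btw):
    """Apply Procedure 1 once.

    state:  (ncw, ncdw, ndw, ntw), or the absorbing label "E_fc" / "E_fs".
    status: "failed", "stage1" or "stage2" for the k-th component.
    """
    if state in ("E_fc", "E_fs"):        # absorbing states stay absorbing
        return state
    if status == "failed":               # Step 2: component failure
        return "E_fc"
    ncw, ncdw, ndw, ntw = state
    if status == "stage2":               # Steps 4 to 6
        if ntw == btw - 1:               # Criterion C would be met
            return "E_fs"
        if ncdw == bcdw - 1:             # Criterion B would be met
            return "E_fs"
        if ncw == bcw - 1:               # Criterion A would be met
            return "E_fs"
        return (ncw + 1, ncdw + 1, 0, ntw + 1)
    # status == "stage1": Steps 7 and 8
    if ncdw == 0 or ndw == d2:           # no sparse run to continue, or run broken
        return (0, 0, 0, ntw)
    return (0, ncdw, ndw + 1, ntw)

def trace(statuses, bcw, bcdw, d2, btw):
    """Scan the components in order from the initial state (0, 0, 0, 0)."""
    state = (0, 0, 0, 0)
    for status in statuses:
        state = next_state(state, status, bcw, bcdw, d2, btw)
    return state
```

Scanning the statuses stage2, stage1, stage2, stage1, stage2 with $b_{cw}=2$, $b_{cdw}=3$, $d_2=1$, $b_{tw}=4$ visits (1,1,0,1), (0,1,1,1), (1,2,0,2), (0,2,1,2) and then reaches $E_{fs}$ at the fifth component through Step 5, which corresponds to the Criterion B pattern of components 1, 3 and 5.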

From the perspective of all system failure criteria, $E_f$ is the whole set of system absorbing states, which includes $E_{fc}$ and $E_{fs}$. $E_{fc}$ represents the absorbing state in which the system has failed because of component failures. For the Markov chain $\{Y_k\}$, if $Y_{k-1}$ is a transient state, Procedure 1 shows how to generate the one-step transition probability matrix for the performance-balanced system. If $Y_{k-1}$ is an absorbing state, its transition is not included in Procedure 1 and it stays in the absorbing state with probability 1. After the derivation of the one-step transition probabilities based on Procedure 1, the one-step transition probability matrix $\Lambda_k$ can be partitioned into four parts by applying the theory of Markov chains as follows,

$$\Lambda_k = \begin{pmatrix} D_k & F_k \\ 0 & I_k \end{pmatrix}_{(N_H+2)\times(N_H+2)},$$

H presents the set of all transient states and the number of transient states is denoted by N H . Dk (a

N H  N H matrix) is the one-step transition probability matrix among transient states. Fk (a N H  2 matrix) includes the one-step transition probabilities from transient states to absorbing states. 0 is a zero matrix with a size 2  N H because it includes the one-step transition probabilities from absorbing states to transient states which are impossible. The last block I k (an identity matrix with a size 2  2 ) stands for the one-step transition probability matrix among absorbing states. Assume that there are n total components in the system. On the basis of the one-step probability matrix Λ k , some probabilistic indices related to the component number and shock length can be derived by the following equations. The reliability function of the system with n components after suffering l shocks can be written as n

R  l   π0  D j e '

(14)

j 1

where $\pi_0 = (1, 0, \ldots, 0)_{1 \times N_H}$, $e = (1, 1, \ldots, 1)_{1 \times N_H}$ and $D_j$ represents the one-step transition probability matrix among the transient states after the system suffers $l$ shocks. The probability distribution function of the shock length $L$ when the whole system fails after $l$ shocks is

$$P_{sf}(L \le l) = 1 - R(l) = 1 - \pi_0 \prod_{j=1}^{n} D_j e'. \tag{15}$$

Moreover, the probability mass function of the shock length $L$ when the system breaks down after $l$ shocks follows from Equation (16):

$$P_{sf}(L = l) = P_{sf}(L \le l) - P_{sf}(L \le l - 1) = \pi_0 \prod_{j=1}^{n} D_j^{*} e' - \pi_0 \prod_{j=1}^{n} D_j e', \tag{16}$$

where $D_j^{*}$ denotes the one-step transition probability matrix among the transient states after the system suffers $l - 1$ shocks. Let $L_s$ denote the total number of shocks until the system failure. The expected shock length $E(L_s)$ when the system fails can be calculated by

$$E(L_s) = \sum_{l=1}^{\infty} l \cdot P_{sf}(L = l). \tag{17}$$
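As a rough illustration of Equation (14), the product $\pi_0 \prod_j D_j e'$ can be evaluated by propagating the probability mass over the transient states one component at a time and discarding whatever is absorbed. The sketch below assumes the reading of Procedure 1 in which a threshold is hit at "threshold minus one"; the function name `system_reliability` and the argument `stage_probs` (one pair of Stage 1 and Stage 2 probabilities per component) are hypothetical.

```python
def system_reliability(stage_probs, b_cw, b_cdw, d2, b_tw):
    """Evaluate pi_0 * D_1 * ... * D_n * e' by forward propagation.

    stage_probs is a list of (p_stage1, p_stage2) pairs, one per component;
    the remaining mass 1 - p_stage1 - p_stage2 is the failure probability and
    is absorbed in E_fc, so it is simply dropped here.
    """
    dist = {(0, 0, 0, 0): 1.0}  # pi_0: all mass on the initial state
    for p_stage1, p_stage2 in stage_probs:
        new = {}
        for (n_cw, n_cdw, n_dw, n_tw), prob in dist.items():
            # Component in Stage 2: stay transient only if no criterion is hit.
            hit = (n_tw == b_tw - 1 or n_cdw == b_cdw - 1 or n_cw == b_cw - 1)
            if not hit:
                s = (n_cw + 1, n_cdw + 1, 0, n_tw + 1)
                new[s] = new.get(s, 0.0) + prob * p_stage2
            # Component in Stage 1: reset or extend the sparse run.
            if n_cdw == 0 or n_dw == d2:
                s = (0, 0, 0, n_tw)
            else:
                s = (0, n_cdw, n_dw + 1, n_tw)
            new[s] = new.get(s, 0.0) + prob * p_stage1
        dist = new
    return sum(dist.values())  # multiplication by e'


# Six identical cells with the probabilities used in the Section 5 example.
R10 = system_reliability([(0.3401, 0.4736)] * 6, b_cw=2, b_cdw=3, d2=1, b_tw=4)
```

With these inputs the value comes out close to the reliability of 0.0528 reported for the battery pack example, which suggests that this reading of the procedure is consistent with the paper's numbers.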

Furthermore, in the case of continuous time, a similar finite Markov chain can be established and the corresponding one-step transition probability matrix $\Lambda_k(t)$ can be derived by replacing $P_{cf}^{k}(L \le l)$, $P_{c1}^{k}(L = l)$ and $P_{c2}^{k}(L = l)$ with $P_{cf}^{k}(t)$, $P_{c1}^{k}(t)$ and $P_{c2}^{k}(t)$ respectively. Then, the reliability function of the system can be computed by

$$R(t) = \pi_0 \prod_{j=1}^{n} D_j(t) e', \tag{18}$$

where $D_j(t)$ is the one-step transition matrix among the transient states at time $t$. A random variable $T$ is then defined as the system lifetime and its expected value is obtained by

$$E(T) = \int_0^{\infty} R(t)\,dt. \tag{19}$$

4. Maintenance policy

In this section, a maintenance policy that combines preventive, corrective and opportunistic maintenance is designed for the proposed model. Preventive maintenance refers to the set of necessary operations applied to the system before its failure so that it can function well. Preventive maintenance plays an important role for an engineering system in improving its reliability, preventing its failures and reducing maintenance cost. A corrective maintenance action is performed to bring the system back to operation whenever the system breaks down. Opportunistic maintenance is implemented when maintenance opportunities are offered by the failures of other components in a multi-component system. A system cycle is finished when a maintenance action (preventive or corrective) is performed. The time taken by all maintenance actions is assumed to be negligible. The status of the components and the whole system can be identified immediately by a piece of monitoring equipment. The maintenance rules and the corresponding probabilities are given in Table 1, and the detailed maintenance actions are explained in the following.



• Preventive maintenance actions are performed on the badly damaged components which are in their Stage 2 at the fixed system age $T_p$ if the whole system still works. A total cost $c_{ins}$ is incurred for performing the inspection on all components. The cost of preventive maintenance for a component which is in its Stage 2 is $c_2$.

• Because there are two system failure criteria, corrective maintenance has two scenarios in the proposed model.

  o If the system fails because of component failures, each failed component is replaced with a new one at a replacement cost $c_r$, and opportunistic maintenance is implemented on the components which are in their Stage 2 at a cost $c_2$ for each one. Considering the different levels of component damage caused by shocks, the cost parameters $c_r$ and $c_2$ should satisfy $c_r > c_2$.

  o If the system failure results from system performance imbalance, all the badly damaged components which are in their Stage 2 are repaired at a cost $c_2$ for each one.

• No matter what kind of maintenance action is performed, it creates an opportunity for minor repairs of the components in their Stage 1, which are restored to as good as new. Owing to the excellent operating performance of the components in their Stage 1, the cost of these minor repairs is very low and is considered negligible in this paper.

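Table 1 Maintenance rules.

| Case | Condition | Cycle length | Cost | Probability |
|---|---|---|---|---|
| 1 | $T_p < T_{fc}$ and $T_p < T_{fs}$ | $T_p$ | $c_{ins} + k_1 c_2$ | $P_{m1}^{k_1} = P\{T_p < T_{fc},\ T_p < T_{fs},\ m_g = k_1\}$ |
| 2 | $T_{fc} < T_p$ and $T_{fc} < T_{fs}$ | $T_{fc}$ | $h c_r + k_2 c_2$ | $P_{m2}^{k_2,h} = P\{T_{fc} < T_p,\ T_{fc} < T_{fs},\ m_g = k_2,\ m_r = h\}$ |
| 3 | $T_{fs} < T_p$ and $T_{fs} < T_{fc}$ | $T_{fs}$ | $k_3 c_2$ | $P_{m3}^{k_3} = P\{T_{fs} < T_p,\ T_{fs} < T_{fc},\ m_g = k_3\}$ |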

In Table 1, $T_{fc}$ and $T_{fs}$ denote the system lifetime when the system fails due to individual component failure and due to system performance imbalance, respectively. The random variables $m_g$ and $m_r$ denote the total number of components which are in their Stage 2 and the total number of failed components in the system, respectively. In case 1, preventive maintenance is conducted before the system fails under either failure criterion. In case 2, corrective maintenance resulting from individual component failures is performed before the preventive maintenance. Corrective replacements are conducted for the failed components and opportunistic maintenance is implemented for the components which are in a poor operation state. Case 3 describes the situation where the system first loses its performance balance and fails, so corrective maintenance is implemented before the preventive maintenance. Based on the above, an optimization model that minimizes the mean cost rate per unit time is constructed as a function of the inspection time $T_p$:

$$\min C(T_p) = \frac{\displaystyle \sum_{k_1=0}^{b_{tw}-1} (c_{ins} + k_1 c_2) P_{m1}^{k_1} + \sum_{k_2=0}^{b_{tw}-1} \sum_{h=1}^{n-k_2} (h c_r + k_2 c_2) P_{m2}^{k_2,h} + \sum_{k_3=0}^{b_{tw}} (k_3 c_2) P_{m3}^{k_3}}{E\left[\min\left(T_p, T_{fc}, T_{fs}\right)\right]} \tag{20}$$

In this maintenance policy, it is difficult to derive the analytical expression of $C(T_p)$ because $T_{fc}$ and $T_{fs}$ are dependent. Simulation is a powerful tool for solving the optimization problems of complex engineering systems [42], so it is adopted in this paper to obtain the approximately optimal preventive maintenance inspection interval. The simulation procedure for a specified value of $T_p$ is presented in Procedure 2. In Procedure 2, $J$ is the number of simulation runs and $T_f$ represents the current system lifetime. The vector $H_{s2}$ is a $3 \times 1$ matrix whose three elements record the total number of components which are in Stage 2, the total number of failed components and the state of system performance balance ('0' means that the system operates and '1' means that the system has failed due to performance imbalance), respectively. Note that $H_{s2}$ should be reset to zeros after every maintenance action because all components which are in their Stage 2 are repaired. $K_s$ is an $n \times 4$ matrix which contains the current summarized results of the shock sequences of the $n$ components, $(n_{cs1}^{i}, n_{d}^{i}, n_{cs2}^{i}, n_{ts}^{i})$ where $i = 1, 2, \ldots, n$. The statistical results of arriving shocks in $K_s$ should also be reset to zeros after every maintenance action. For each $T_p$, $C_t$ and $L_t$ denote the total maintenance cost and the total cycle length over all simulation runs, respectively. After choosing an appropriate interval $[T_{min}, T_{max}]$ for $T_p$ with a step of 0.1, the average cost per unit time for each $T_p$ is calculated by $C(T_p) = C_t / L_t$. After all values of $C(T_p)$ are derived, the minimal $C(T_p^{*})$ can be found and the corresponding $T_p^{*}$ can be obtained. Once the (approximately) optimal inspection interval is determined for the performance-balanced system, it provides decision support for engineers to maintain and manage practical engineering systems with performance-balanced characteristics more effectively.

Procedure 2 Simulation for proposed maintenance policy

Step 1:

Initialize the model parameters $p, \lambda, k_{c1}, d_1, k_{c2}, k_t, b_{cw}, b_{cdw}, d_2, b_{tw}, c_r, c_2, c_{ins}$; define $T_p$, $T_f = 0$, $C_t = 0$, $L_t = 0$, $j = 1$, $J = 10000$, $H_{s2} = \mathrm{zeros}(3,1)$, $K_s = \mathrm{zeros}(n,4)$; then go to Step 2.

Step 2:

Simulate the arrival of shocks and update $T_f$, then go to Step 3.

Step 3:

If $T_f \ge T_p$, let $C_t = C_t + c_{ins} + k_1 c_2$, $L_t = L_t + T_p$ and update $H_{s2}$, $K_s$, $T_f$, then go to Step 6; else if $T_f < T_p$, update $H_{s2}$, $K_s$ and go to Step 4.

Step 4:

If the system operates, go back to Step 2; else if the system fails, go to Step 5.

Step 5:

If the system fails due to performance imbalance, let $C_t = C_t + k_3 c_2$, $L_t = L_t + T_f$, and update $H_{s2}$, $K_s$, $T_f$, then go to Step 6; else if the system fails due to component failures, let $C_t = C_t + h c_r + k_2 c_2$, $L_t = L_t + T_f$, and update $H_{s2}$, $K_s$, $T_f$, then go to Step 6.

Step 6:

If $j < J$, let $j = j + 1$ and go to Step 2; else if $j = J$, let $C(T_p) = C_t / L_t$ and halt.
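The steps of Procedure 2 can be sketched as a small Monte Carlo routine. This is only a minimal illustration under stated assumptions, not the authors' code: every shock is assumed to reach all components at once and to be valid for each component independently with probability `p`, shocks arrive with exponential inter-arrival times of rate `lam`, a simultaneous component failure and imbalance is counted as a component failure, and all function and parameter names are hypothetical.

```python
import random

def step_component(state, valid, kc1, d1, kc2, kt):
    """Advance one component by one shock; 'F' marks a failed component."""
    ncs1, nd, ncs2, nts = state
    if ncs1 < kc1:                                  # Stage 1
        if valid:
            return (ncs1 + 1, 0, 0, 0)
        if ncs1 == 0 or nd == d1:
            return (0, 0, 0, 0)
        return (ncs1, nd + 1, 0, 0)
    if valid:                                       # Stage 2, valid shock
        if ncs2 + 1 == kc2 or nts + 1 == kt:
            return "F"
        return (ncs1, 0, ncs2 + 1, nts + 1)
    return (ncs1, 0, 0, nts)                        # Stage 2, invalid shock

def imbalanced(stage2, bcw, bcdw, d2, btw):
    """Scan the components in order and test imbalance Criteria A, B and C."""
    ncw = ncdw = ndw = ntw = 0
    for in_stage2 in stage2:
        if in_stage2:
            ncw, ncdw, ndw, ntw = ncw + 1, ncdw + 1, 0, ntw + 1
            if ncw >= bcw or ncdw >= bcdw or ntw >= btw:
                return True
        elif ncdw == 0 or ndw == d2:
            ncw = ncdw = ndw = 0
        else:
            ncw, ndw = 0, ndw + 1
    return False

def cost_rate(Tp, n, lam, p, comp, bal, cr, c2, cins, runs=10000, seed=0):
    """Estimate C(Tp) = Ct / Lt over `runs` simulated cycles."""
    kc1 = comp[0]
    rng = random.Random(seed)
    Ct = Lt = 0.0
    for _ in range(runs):
        states, t = [(0, 0, 0, 0)] * n, 0.0
        while True:
            t += rng.expovariate(lam)               # next shock arrival
            if t >= Tp:                             # preventive maintenance
                k1 = sum(s[0] >= kc1 for s in states)
                Ct, Lt = Ct + cins + k1 * c2, Lt + Tp
                break
            states = [step_component(s, rng.random() < p, *comp) for s in states]
            failed = sum(s == "F" for s in states)
            stage2 = [s != "F" and s[0] >= kc1 for s in states]
            k = sum(stage2)
            if failed:                              # corrective, component failure
                Ct, Lt = Ct + failed * cr + k * c2, Lt + t
                break
            if imbalanced(stage2, *bal):            # corrective, imbalance
                Ct, Lt = Ct + k * c2, Lt + t
                break
    return Ct / Lt

def best_interval(grid, **kwargs):
    """Return the grid value of Tp with the smallest estimated cost rate."""
    return min(grid, key=lambda Tp: cost_rate(Tp, **kwargs))
```

A grid such as `[round(4 + 0.1 * i, 1) for i in range(41)]` passed to `best_interval` mirrors the search over $[T_{min}, T_{max}]$ with a step of 0.1. Because the shock mechanism above is a simplifying assumption, the estimates need not match the values reported in the paper.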


5. Numerical examples

In this section, we take a practical engineering application, a battery pack system, as the research object to demonstrate the proposed model with illustrative examples. Comparative analyses are conducted to examine the effects of different model parameters on the system probabilistic quantities. The proposed maintenance policy is also illustrated by comparing the results obtained with different model parameters.

Consider a battery pack system which is composed of six battery cells. All battery cells in the battery pack system may break down owing to the damage accumulated from random shocks during their lifetime. The whole system is functional if and only if all battery cells operate and the whole battery pack system keeps its performance balance. For the i-th battery cell, an arriving shock brings damage to it with probability $p_i = 0.3$. When the i-th battery cell suffers 2 consecutive valid shocks with sparse 1 ($k_{c1} = 2$, $d_1 = 1$), it is regarded as moving to its Stage 2 and operating with poor performance. The i-th battery cell fails after suffering 2 consecutive valid shocks or 3 total valid shocks in its Stage 2 ($k_{c2} = 2$, $k_t = 3$). Therefore, the number of transient states $N_T$ equals 8 and the state space is

$$\{(0,0,0,0), (1,0,0,0), (1,1,0,0), (2,0,0,0), (2,0,1,1), (2,0,0,1), (2,0,1,2), (2,0,0,2)\} \cup \{E_{fc}^{i}\}.$$

According to the analysis presented in Section 3.1, the transition diagram of a Markov chain for the i-th battery cell is presented in Fig. 3.

Fig. 3. Transition diagram of a Markov chain for the i-th battery cell with $k_{c1} = 2$, $d_1 = 1$, $k_{c2} = 2$, $k_t = 3$.

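The one-step transition probability matrix $\Lambda_{m_i}^{f}$ of this example, with rows and columns ordered as (0,0,0,0), (1,0,0,0), (1,1,0,0), (2,0,0,0), (2,0,1,1), (2,0,0,1), (2,0,1,2), (2,0,0,2), $E_{fc}^{i}$, is shown as follows:

$$\Lambda_{m_i}^{f} = \begin{bmatrix}
0.7 & 0.3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.7 & 0.3 & 0 & 0 & 0 & 0 & 0 \\
0.7 & 0 & 0 & 0.3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.7 & 0.3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.7 & 0 & 0 & 0.3 \\
0 & 0 & 0 & 0 & 0 & 0.7 & 0.3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.7 & 0.3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.7 & 0.3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}_{9 \times 9},$$

where

$$Q_i = \begin{bmatrix}
0.7 & 0.3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.7 & 0.3 & 0 & 0 & 0 & 0 \\
0.7 & 0 & 0 & 0.3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.7 & 0.3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.7 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.7 & 0.3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.7 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.7
\end{bmatrix} \quad \text{and} \quad A_i = \begin{bmatrix} 0.7 & 0.3 & 0 \\ 0 & 0 & 0.7 \\ 0.7 & 0 & 0 \end{bmatrix}.$$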
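A matrix of this form can be generated mechanically from the component transition rules. The following sketch is a hypothetical helper, not the authors' code: it assumes the rules as read from Appendix A (a valid shock in Stage 2 causes failure when the consecutive count would reach $k_{c2}$ or the total count would reach $k_t$), enumerates the reachable transient states from (0,0,0,0), and returns the transient block and the column of absorption probabilities. The names `step`, `build_chain` and `FAILED` are illustrative.

```python
FAILED = "E_fc"

def step(state, valid, kc1, d1, kc2, kt):
    """Next state of one component after a valid or invalid shock."""
    ncs1, nd, ncs2, nts = state
    if ncs1 < kc1:                          # Stage 1
        if valid:
            return (ncs1 + 1, 0, 0, 0)
        if ncs1 == 0 or nd == d1:
            return (0, 0, 0, 0)
        return (ncs1, nd + 1, 0, 0)
    if valid:                               # Stage 2, valid shock
        if ncs2 + 1 == kc2 or nts + 1 == kt:
            return FAILED
        return (ncs1, 0, ncs2 + 1, nts + 1)
    return (ncs1, 0, 0, nts)                # Stage 2, invalid shock

def build_chain(p, kc1, d1, kc2, kt):
    """Enumerate transient states; return (states, Q, R) with R the failure column."""
    states = [(0, 0, 0, 0)]
    index = {states[0]: 0}
    moves = []                              # (from_index, next_state, probability)
    i = 0
    while i < len(states):
        for valid, prob in ((False, 1.0 - p), (True, p)):
            nxt = step(states[i], valid, kc1, d1, kc2, kt)
            if nxt != FAILED and nxt not in index:
                index[nxt] = len(states)
                states.append(nxt)
            moves.append((i, nxt, prob))
        i += 1
    size = len(states)
    Q = [[0.0] * size for _ in range(size)]
    R = [0.0] * size
    for i, nxt, prob in moves:
        if nxt == FAILED:
            R[i] += prob
        else:
            Q[i][index[nxt]] += prob
    return states, Q, R


states, Q, R = build_chain(p=0.3, kc1=2, d1=1, kc2=2, kt=3)
```

For the parameters of this example the enumeration yields eight transient states in the same order as listed above, so `Q` and `R` correspond to the transient block and the last column of the matrix.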

According to Equations (1) and (2), the probability mass function and distribution function of the shock length $L$ when the i-th battery cell fails after ten shocks are given below:

$$P_{cf}^{i}(L = 10) = \pi_0 Q_i^{9} R_i = 0.0426,$$

$$P_{cf}^{i}(L \le 10) = \pi_0 \sum_{j=0}^{9} (Q_i)^{j} R_i = 0.1863.$$

Moreover, the survival probabilities of the i-th battery cell in Stage 1 and Stage 2 after suffering ten shocks in total are computed by Equations (3) and (4) respectively:

$$P_{c1}^{i}(L = 10) = \pi_0 (Q_i)^{10} e_1' = 0.3401,$$

$$P_{c2}^{i}(L = 10) = \pi_0 (Q_i)^{10} e_2' = 0.4736.$$
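These four quantities can be checked with a few lines of plain Python. The sketch below hard-codes the transient block $Q_i$ and an absorption column $R_i$ taken from the last column of the example matrix; it assumes that $e_1$ selects the three Stage 1 states and $e_2$ the five Stage 2 states, and the helper name `vec_mat` is hypothetical.

```python
Q = [
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7],
]
R = [0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.3, 0.3]   # one-step failure probabilities

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

v = [1.0] + [0.0] * 7          # pi_0
pmf = cdf = 0.0
for l in range(1, 11):
    pmf = sum(v[i] * R[i] for i in range(8))   # pi_0 Q^(l-1) R: failure at shock l
    cdf += pmf                                  # running sum gives P(L <= l)
    v = vec_mat(v, Q)                           # v becomes pi_0 Q^l

p_c1 = sum(v[:3])              # still in Stage 1 after ten shocks
p_c2 = sum(v[3:])              # in Stage 2 after ten shocks
```

After the loop, `pmf`, `cdf`, `p_c1` and `p_c2` should land close to the values quoted above, and `cdf + p_c1 + p_c2` equals one up to rounding because the three outcomes are exhaustive.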

From the perspective of the whole battery pack system, the system loses its performance balance and breaks down once one of the following events occurs: (1) the number of consecutive battery cells with poor performance reaches 2 ($b_{cw} = 2$); (2) the number of consecutive underperforming battery cells with sparse 1 reaches 3 ($b_{cdw} = 3$, $d_2 = 1$); (3) the total number of battery cells with poor performance reaches 4 ($b_{tw} = 4$). Therefore, the state space of the Markov chain $\{Y_k, 1 \le k \le n\}$ for the whole system is

$$\{(0,0,0,0), (1,1,0,1), (0,1,1,1), (1,2,0,2), (0,2,1,2), (0,0,0,2), (1,1,0,3), (0,1,1,3), (0,0,0,3), (0,0,0,1), (1,1,0,2), (0,1,1,2), (1,2,0,3), (0,2,1,3)\} \cup \{E_{fc}\} \cup \{E_{fs}\}.$$

The transition diagram among the system states is presented in Fig. 4. For the sake of clarity, the system absorbing states $E_{fc}$ and $E_{fs}$ each appear twice in Fig. 4. The transition probabilities between any two system states are given on the corresponding connecting lines between the states in Fig. 4.

Fig. 4. Transition diagram of a Markov chain for the performance-balanced system with $b_{cw} = 2$, $b_{cdw} = 3$, $d_2 = 1$, $b_{tw} = 4$.

According to Equations (14)-(17), the reliability of the whole battery pack system after ten shocks, $R(10)$, equals 0.0528, and the expected shock length when the battery pack system fails, $E(L_s)$, equals 5.7346. In the case of continuous time, assume that the inter-arrival time of successive shocks $X_j$ follows an exponential distribution represented as $X_j \sim PH_c(1, -0.5)$. When the battery pack system has suffered 10 shocks, the distributions of $S_i^{f}$, $S_i^{g2}$, $M_i^{f}$ and $M_i^{g2}$ can be obtained respectively as

$$S_i^{f} \sim PH_d(\alpha_i^{f}, Q_i), \quad S_i^{g2} \sim PH_d(\alpha_i^{g2}, A_i),$$

$$M_i^{f} \sim PH_c(\alpha_i^{f}, -0.5I + 0.5Q_i), \quad M_i^{g2} \sim PH_c(\alpha_i^{g2}, -0.5I + 0.5A_i),$$

where $\alpha_i^{f} = (1, 0, \ldots, 0)_{1 \times 8}$ and $\alpha_i^{g2} = (1, 0, 0)$.

Afterwards, the probabilities that the i-th battery cell has failed, stays in Stage 1 and stays in Stage 2 at time $t$ can be calculated respectively as follows:

$$P_{cf}^{i}(t) = 1 - \alpha_i^{f} \exp\left((-0.5I + 0.5Q_i)t\right) e',$$

$$P_{c1}^{i}(t) = 1 - \left[1 - \alpha_i^{g2} \exp\left((-0.5I + 0.5A_i)t\right) e'\right],$$

$$P_{c2}^{i}(t) = \left[1 - \alpha_i^{g2} \exp\left((-0.5I + 0.5A_i)t\right) e'\right] - \left[1 - \alpha_i^{f} \exp\left((-0.5I + 0.5Q_i)t\right) e'\right] = \alpha_i^{f} \exp\left((-0.5I + 0.5Q_i)t\right) e' - \alpha_i^{g2} \exp\left((-0.5I + 0.5A_i)t\right) e'.$$

Taking $t = 15$ as an example, the corresponding probabilities are $P_{cf}^{i}(t) = 0.1074$, $P_{c1}^{i}(t) = 0.4835$ and $P_{c2}^{i}(t) = 0.4091$.
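One way to evaluate these matrix exponentials without a linear algebra library uses the identity $\exp\left(\lambda (M - I) t\right) = \sum_{k \ge 0} e^{-\lambda t} \frac{(\lambda t)^k}{k!} M^k$, which holds because $I$ commutes with $M$; it simply says that the number of shocks by time $t$ is Poisson distributed. The sketch below applies it with $\lambda = 0.5$ and $t = 15$. The function names are hypothetical, and truncating the series at 200 terms is an assumption that is ample for $\lambda t = 7.5$.

```python
import math

Q = [
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7],
]
A = [[0.7, 0.3, 0.0], [0.0, 0.0, 0.7], [0.7, 0.0, 0.0]]

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def survival_at(t, lam, alpha, M, terms=200):
    """alpha * exp(lam * (M - I) * t) * e' as a Poisson-weighted sum of alpha M^k e'."""
    v = list(alpha)
    weight = math.exp(-lam * t)        # Poisson probability of k = 0 shocks
    total = 0.0
    for k in range(terms):
        total += weight * sum(v)       # P(k shocks) * P(not absorbed after k shocks)
        v = vec_mat(v, M)
        weight *= lam * t / (k + 1)    # move to the Poisson probability of k + 1
    return total

lam, t = 0.5, 15.0
alive = survival_at(t, lam, [1.0] + [0.0] * 7, Q)   # cell has not failed by time t
p_c1 = survival_at(t, lam, [1.0, 0.0, 0.0], A)      # cell is still in Stage 1
p_cf = 1.0 - alive
p_c2 = alive - p_c1
```

The three results should be close to the probabilities quoted for $t = 15$; a general-purpose matrix exponential routine would serve equally well.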

According to Equation (18), the system reliability $R(t)$ is 0.1846 when the system has been exposed to random shocks for $t = 15$. The expected lifetime of the battery pack system $E(T)$ equals 11.0305. When one model parameter of the performance-balanced system varies, its effects on the probabilistic indices in the shock-length case are the same as those in the continuous-time case; therefore, the sensitivity analyses of model parameters are conducted for the shock-length case in the following.

Taking the i-th battery cell as the research object, some comparative results are summarized in Table 2 to examine the effects of different parameter combinations on $P_{cf}^{i}(L = l)$ and $P_{cf}^{i}(L \le l)$. From cases 1-3, it can be seen that an increase of $d_1$ leads to higher values of $P_{cf}^{i}(L = l)$, $P_{cf}^{i}(L \le l)$ and $P_{c2}^{i}(L = l)$, because it is easier for the i-th battery cell to move to its Stage 2 when $d_1$ gets larger. On the contrary, when $k_{c1}$ becomes larger (cases 2, 4 and 5), the i-th battery cell stays in its Stage 1 with a higher probability $P_{c1}^{i}(L = l)$ and the failure probabilities of the battery cell, $P_{cf}^{i}(L = l)$ and $P_{cf}^{i}(L \le l)$, become much smaller. By comparing the corresponding results in Table 2, it is observed that a decrease of $k_{c2}$ or $k_t$ results in a smaller $P_{c2}^{i}(L = l)$ and a larger $P_{cf}^{i}(L \le l)$, which means that the battery cell has a lower probability of staying in Stage 2 and a higher probability of breaking down after ten shocks. The values of $P_{c2}^{i}(L = l)$ and $P_{cf}^{i}(L \le l)$ are affected greatly by the change of the probability of a valid shock

$p_i$. Specifically, much larger $P_{c2}^{i}(L = l)$ and $P_{cf}^{i}(L \le l)$ result from an increase of $p_i$, as can be seen by comparing cases 6, 10 and 11. When the i-th battery cell suffers more shocks (cases 12 and 13), it has a higher probability of failing and of staying in its Stage 2.

Table 2 Probability indices of the i-th battery cell after suffering l shocks with different parameters.

| Case | $p_i$ | $k_{c1}$ | $d_1$ | $k_{c2}$ | $k_t$ | $l$ | $P_{cf}^{i}(L = l)$ | $P_{cf}^{i}(L \le l)$ | $P_{c1}^{i}(L = l)$ | $P_{c2}^{i}(L = l)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 4 | 1 | 3 | 5 | 10 | 0.0017 | 0.0041 | 0.8590 | 0.1370 |
| 2 | 0.3 | 4 | 2 | 3 | 5 | 10 | 0.0026 | 0.0052 | 0.7652 | 0.2296 |
| 3 | 0.3 | 4 | 3 | 3 | 5 | 10 | 0.0028 | 0.0055 | 0.6976 | 0.2970 |
| 4 | 0.3 | 3 | 2 | 3 | 5 | 10 | 0.0062 | 0.0178 | 0.5638 | 0.4184 |
| 5 | 0.3 | 5 | 2 | 3 | 5 | 10 | 0.0006 | 0.0010 | 0.8879 | 0.1111 |
| 6 | 0.3 | 4 | 2 | 2 | 5 | 10 | 0.0113 | 0.0287 | 0.7652 | 0.2061 |
| 7 | 0.3 | 4 | 2 | 4 | 5 | 10 | 0.0005 | 0.0008 | 0.7652 | 0.2340 |
| 8 | 0.3 | 3 | 2 | 3 | 4 | 10 | 0.0085 | 0.0210 | 0.5638 | 0.4151 |
| 9 | 0.3 | 3 | 2 | 3 | 6 | 10 | 0.0061 | 0.0177 | 0.5638 | 0.4185 |
| 10 | 0.4 | 4 | 2 | 2 | 5 | 10 | 0.0372 | 0.1086 | 0.5413 | 0.3502 |
| 11 | 0.5 | 4 | 2 | 2 | 5 | 10 | 0.0762 | 0.2676 | 0.3193 | 0.4131 |
| 12 | 0.4 | 4 | 2 | 2 | 4 | 8 | 0.0231 | 0.0395 | 0.6483 | 0.3122 |
| 13 | 0.4 | 4 | 2 | 2 | 4 | 12 | 0.0409 | 0.1890 | 0.4574 | 0.3536 |

Table 3 presents comparative results for different balance parameters and the corresponding system reliabilities and expected shock lengths when $p_i = 0.25$ ($i = 1, 2, \ldots, n$) and $L = 10$. By comparing the corresponding cases, it can be observed that $R(l)$ and $E(L_s)$ increase with the growth of $b_{cw}$, $b_{cdw}$ and $b_{tw}$ respectively, because the battery pack system is less likely to fail when the imbalance criteria are harder to satisfy. Fig. 5 shows this relationship between the balance parameters ($b_{cw}$, $b_{cdw}$ and $b_{tw}$) and $R(l)$ more clearly as the shock length increases. Larger values of $n$ and $d_2$, by contrast, lead to a decline of $R(l)$ and $E(L_s)$, which can be found by comparing cases 2 and 6 and cases 2 and 3, respectively. The explanation is that the battery pack system is more likely to break down when the number of battery cells gets larger, and that Criterion B of system performance imbalance is easier to reach when $d_2$ gets larger. This conclusion is shown clearly in Fig. 6.

Table 3 Probabilistic indices of battery pack system after suffering l shocks with different model parameters.

| Case | $b_{cw}$ | $b_{cdw}$ | $d_2$ | $b_{tw}$ | $n$ | $R(l)$ | $E(L_s)$ |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 7 | 3 | 10 | 15 | 0.6458 | 12.3805 |
| 2 | 6 | 7 | 3 | 10 | 15 | 0.7981 | 14.0025 |
| 3 | 6 | 7 | 1 | 10 | 15 | 0.8757 | 15.6598 |
| 4 | 6 | 9 | 3 | 10 | 15 | 0.8818 | 15.6419 |
| 5 | 6 | 7 | 1 | 8 | 15 | 0.8432 | 14.6397 |
| 6 | 6 | 7 | 3 | 10 | 10 | 0.9199 | 17.5919 |

Fig. 5. Sensitivity analyses of $b_{cw}$, $b_{cdw}$, $b_{tw}$ for system reliability

Fig. 6. Sensitivity analyses of $n$, $d_2$ for system reliability

Additionally, Fig. 7 is plotted to investigate the effect of the parameters of the component failure mechanism on $R(l)$ with the fixed parameter combination $b_{cw} = 6$, $b_{cdw} = 7$, $d_2 = 3$, $b_{tw} = 10$, $n = 10$. Note that the component-level parameter combination $(k_{c1}, d_1, k_{c2}, k_t)$ has an obvious impact on the reliability of the whole battery pack system $R(l)$ and the expected shock length $E(L_s)$, because $P_{cf}^{i}(L \le l)$, $P_{c1}^{i}(L = l)$ and $P_{c2}^{i}(L = l)$ ($i = 1, 2, \ldots, n$) determine the transition probabilities among the system transient states. Specifically, $R(l)$ and $E(L_s)$ get larger with an increase of $k_{c1}$, $k_{c2}$ or $k_t$ and with a decrease of $d_1$.

Fig. 7. Sensitivity analyses of parameters of component failure mechanism for system reliability

For the proposed maintenance policy, we discuss the case in which the time lags between two successive shocks $X_j$ ($j \ge 1$) follow a common exponential distribution with parameter $\lambda$, represented as $X_j \sim PH_c(1, -\lambda)$ ($\gamma = 1$, $\eta = -\lambda$). The number of simulation runs is set to 10000 after a convergence analysis. The comparative results for the maintenance policy with different model parameters are given in Table 4, where the shock process parameters and cost parameters are set as $n = 8$, $\lambda = 0.5$, $p_i = 0.4$ ($i = 1, 2, \ldots, 8$) and $c_r = 80$, $c_2 = 10$, $c_{ins} = 20$ respectively. The results in Table 4 indicate that the optimal inspection time $T_p^{*}$ increases with a rise of $k_{c1}$, $k_{c2}$ or $k_t$. However, $T_p^{*}$ drops when $d_1$ increases from 1 to 2. When $k_{c1}$, $k_{c2}$ and $k_t$ get bigger, the conditions for the deterioration of the component operation state become harder to meet, so the optimal inspection time $T_p^{*}$ can be extended. The comparisons also show that $T_p^{*}$ becomes longer with an increase of $b_{cw}$, $b_{cdw}$ or $b_{tw}$. The interpretation is that the system imbalance criteria become more difficult to meet when the values of $b_{cw}$, $b_{cdw}$ and $b_{tw}$ get bigger, so the optimal inspection time $T_p^{*}$ can be extended.

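By comparing cases 7 and 8, it can be seen that decreasing $d_2$ has a remarkable effect on $T_p^{*}$, which increases by 12%; this can be explained by the greater difficulty for the system to reach one of the critical imbalance criteria.

Table 4 Optimal inspection interval and corresponding cost with different model parameters. Columns $k_{c1}$ to $k_t$ are component-level parameters and columns $b_{cw}$ to $b_{tw}$ are system-level parameters.

| Case | $k_{c1}$ | $d_1$ | $k_{c2}$ | $k_t$ | $b_{cw}$ | $b_{cdw}$ | $d_2$ | $b_{tw}$ | $T_p^{*}$ | $C(T_p^{*})$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 1 | 5 | 6 | 3 | 5 | 2 | 6 | 5.8 | 6.6577 |
| 2 | 6 | 1 | 5 | 6 | 3 | 5 | 2 | 6 | 6.6 | 4.9224 |
| 3 | 5 | 2 | 5 | 6 | 3 | 5 | 2 | 6 | 4.8 | 8.3897 |
| 4 | 5 | 1 | 4 | 6 | 3 | 5 | 2 | 6 | 5.2 | 6.9643 |
| 5 | 6 | 1 | 5 | 7 | 3 | 5 | 2 | 6 | 7.7 | 4.6966 |
| 6 | 6 | 1 | 5 | 7 | 4 | 5 | 2 | 6 | 7.9 | 4.7275 |
| 7 | 6 | 1 | 5 | 7 | 3 | 4 | 2 | 6 | 7.5 | 4.7308 |
| 8 | 6 | 1 | 5 | 7 | 3 | 4 | 1 | 6 | 8.4 | 4.7166 |
| 9 | 6 | 1 | 5 | 7 | 3 | 4 | 2 | 5 | 7.4 | 4.7190 |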

6. Conclusion

In this paper, we propose a performance-balanced system operating in a shock environment, which is rarely seen in the existing research. The performance-balanced system is composed of n components which may suffer valid shocks from the external environment. The lifetime of the components is divided into two stages, a good operation state and a poor operation state. A new concept of balance index is introduced to fit real engineering applications; it is defined according to whether the locations of the components which are in their inferior operation status satisfy a certain criterion. Three performance imbalance criteria of the system are proposed in this paper. The system fails when it loses its performance balance or a component fails, whichever comes first. Several probabilistic indices of the components and the system are obtained by a two-step finite Markov chain imbedding approach, such as the survival probabilities of the components in different states, the system reliability, the expected shock length and the expected system lifetime. A new maintenance policy is designed for the proposed model by considering preventive, corrective and opportunistic maintenance. With the intent of minimizing the maintenance cost, an optimization model is established and a simulation algorithm is given to derive the approximately optimal preventive maintenance inspection interval. Numerical examples for a battery pack system in an EV are given to illustrate the proposed model.

Future research can propose new concepts of system balance and establish other new balanced systems on the basis of other real engineering backgrounds. Besides, this work can be extended by considering that not only external shocks but also the internal degradation of components can trigger the transitions of component states. Another extension is the case in which the degradation process of a component (or system) has more than two stages. Finally, it is also worth studying a condition-based maintenance policy, which is very cost-efficient for the performance-balanced system, as well as the joint optimization of the maintenance policy and spare parts.

Conflict of Interest

No conflict of interest exists in the submission of this manuscript, and this manuscript is approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (71971026, 71572014) and the Basic Scientific Research Project of Education Department of Heilongjiang Province (135109529).

Appendix A. Transition probabilities among states of the i-th component

The transition probabilities among states of the i-th component are presented in the following.

(1) If 0  nics1  kc1 , 0  nid  d1 , nics 2  0 , nits  0 , P{Ymi  (nics1  1,0,0,0) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  pi .

(2) If nics1  0 , nid  0 , nics 2  0 , nits  0 , P{Ymi  (nics1  1,0,0,0) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  pi .

(3) If nics1  0 , nid  0 , nics 2  0 , nits  0 , P{Ymi  (0,0,0,0) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  1  pi .

(4) If 0  nics1  kc1 , 0  nid  d1 , nics 2  0 , nits  0 , P{Ymi  (nics1 , nid  1,0,0) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  1  pi .

(5) If 0  nics1  kc1 , nid  d1 , nics 2  0 , nits  0 , P{Ymi  (0,0,0,0) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  1  pi .

(6) If nics1  kc1 , nid  0 , 0  nics 2  kc 2  1 , nics 2  nits  kt  1 , P{Ymi  (nics1 , nid , nics 2  1, nits  1) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  pi .

(7) If nics1  kc1 , nid  0 , 0  nics 2  kc 2 , nics 2  nits  kt , P{Ymi  (nics1 , nid ,0, nits ) | Y( m1)i  (nics1 , nid , nics 2 , nits )}  1  pi .

(8) If nics1  kc1 , nid  0 , nics 2  kc 2  1 , nics 2  nits  kt  1 , P{Ymi  E ifc | Y( m1)i  (nics1 , nid , nics 2 , nits )}  pi .

(9) If nics1  kc1 , nid  0 , 0  nics 2  kc 2  1 , nits  kt  1 , P{Ymi  E ifc | Y( m1)i  (nics1 , nid , nics 2 , nits )}  pi .

32

(10) $P\{Y_{m_i} = E_{fc}^{i} \mid Y_{(m-1)_i} = E_{fc}^{i}\} = 1$.

(11) All other transition probabilities are zero.

References

[1] Hua DG, Elsayed EA. Reliability estimation of k-out-of-n pairs:G balanced systems with spatially distributed units. IEEE T Reliab 2016;65:886-900.
[2] Hua DG, Elsayed EA. Degradation analysis of k-out-of-n pairs:G balanced system with spatially distributed units. IEEE T Reliab 2016;65:941-56.
[3] Hua DG, Elsayed EA. Reliability approximation of k-out-of-n pairs:G balanced systems with spatially distributed units. IISE Trans 2018;50:616-26.
[4] Guo J, Elsayed EA. Reliability of balanced multi-level unmanned aerial vehicles. Comput Oper Res 2019;106:1-13.
[5] Endharta AJ, Yun WY, Ko YM. Reliability evaluation of circular k-out-of-n:G balanced systems through minimal path sets. Reliab Eng Syst Safe 2018;180:226-36.
[6] Cui LR, Gao HD, Mo YC. Reliability for k-out-of-n:F balanced systems with m sectors. IISE Trans 2018;50:381-93.
[7] Zhao X, Cui LR. Reliability evaluation of generalised multi-state k-out-of-n systems based on FMCI approach. Int J Syst Sci 2010;41:1437-43.
[8] Ling XL, Wei YZ, Si SB. Reliability optimization of k-out-of-n system with random selection of allocative components. Reliab Eng Syst Safe 2019;186:186-93.
[9] Eryilmaz S, Devrim Y. Reliability and optimal replacement policy for a k-out-of-n system subject to shocks. Reliab Eng Syst Safe 2019;188:393-7.
[10] Cui LR, Chen JH, Li XC. Balanced reliability systems under Markov processes. IISE Trans 2019;51:1025-35.
[11] Zhao X, Wang SQ, Wang XY, Fan Y. Multi-state balanced systems in a shock environment. Reliab Eng Syst Safe 2020; DOI: 10.1016/j.ress.2019.106592.
[12] Bai JM, Zhang ZG, Li ZH. Lifetime properties of a cumulative shock model with a cluster structure. Ann Oper Res 2014;212:21-41.
[13] Mallor F, Omey E. Shocks, runs and random sums. J Appl Probab 2001;38:438-48.
[14] Eryilmaz S. Discrete time shock models involving runs. Stat Probabil Lett 2015;107:93-100.
[15] Cha JH, Finkelstein M. On new classes of extreme shock models and some generalizations. J Appl Probab 2011;48:258-70.
[16] Cirillo P, Husler J. Extreme shock models: an alternative perspective. Stat Probabil Lett 2011;81:25-30.
[17] Li ZH, Kong XB. Life behavior of delta-shock model. Stat Probabil Lett 2007;77:577-87.
[18] Fan M, Zeng Z, Zio E, Kang R. Modeling dependent competing failure processes with degradation-shock dependence. Reliab Eng Syst Safe 2017;165:422-30.
[19] Rafiee K, Feng QM, Coit DW. Reliability assessment of competing risks with generalized mixed shock models. Reliab Eng Syst Safe 2017;159:1-11.
[20] Zhao X, Guo XX, Wang XY. Reliability and maintenance policies for a two-stage shock model with self-healing mechanism. Reliab Eng Syst Safe 2018;172:185-94.
[21] Zhao X, Wang SQ, Wang XY, Cai K. A multi-state shock model with mutative failure patterns. Reliab Eng Syst Safe 2018;178:1-11.
[22] Cha JH, Finkelstein M. On some characteristics of quality for systems operating in a random environment. P I Mech Eng O-J Ris 2019;233:257-67.
[23] Shen JY, Cui LR, Yi H. System performance of damage self-healing systems under random shocks by using discrete state method. Comput Ind Eng 2018;125:124-34.
[24] Che HY, Zeng SK, Guo JB. Reliability analysis of load-sharing systems subject to dependent degradation processes and random shocks. IEEE Access 2017;5:23395-404.
[25] Miyatake S, Susuki Y, Hikihara T, Itoh S, Tanaka K. Discharge characteristics of multicell lithium-ion battery with nonuniform cells. J Power Sources 2013;241:736-43.
[26] Gong XZ, Xiong R, Mi CC. Study of the characteristics of battery packs in electric vehicles with parallel-connected lithium-ion battery cells. IEEE T Ind Appl 2015;51:1872-9.
[27] Hoque KA, Mohamed OA, Savaria Y. Towards an accurate reliability, availability and maintainability analysis approach for satellite systems based on probabilistic model checking. Des Aut Test Europe 2015:1635-40.
[28] Yang L, Ma XB, Peng R, Zhai QQ, Zhao Y. A preventive maintenance policy based on dependent two-stage deterioration and external shocks. Reliab Eng Syst Safe 2017;160:201-11.
[29] Cha JH, Finkelstein M, Levitin G. On preventive maintenance of systems with lifetimes dependent on a random shock process. Reliab Eng Syst Safe 2017;168:90-7.
[30] Zhang XH, Zeng JC. A general modeling method for opportunistic maintenance modeling of multi-unit systems. Reliab Eng Syst Safe 2015;140:176-90.
[31] Hoque KA, Mohamed OA, Savaria Y. Formal analysis of SEU mitigation for early dependability and performability analysis of FPGA-based space applications. J Appl Logic 2017;25:47-68.
[32] Hoque KA, Mohamed OA, Savaria Y. Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs. Reliab Eng Syst Safe 2019;182:107-19.
[33] Fu JC. Reliability of consecutive-k-out-of-n:F-systems with (k-1)-step Markov dependence. IEEE T Reliab 1986;35:602-6.
[34] Cui LR, Lin C, Du SJ. m-consecutive-k, l-out-of-n systems. IEEE T Reliab 2015;64:386-93.
[35] Du SJ, Lin C, Cui LR. Reliabilities of a single-unit system with multi-phased missions. Commun Stat-Theor M 2016;45:2524-37.
[36] Wu TL. On finite Markov chain imbedding and its applications. Methodol Comput Appl 2013;15:453-65.
[37] Zhao X, Wang XY, Sun G. Start-up demonstration tests with sparse connection. Eur J Oper Res 2015;243:865-73.
[38] Zhao X, Wang XY, Coit DW, Chen Y. Start-up demonstration tests with the intent of equipment classification for balanced systems. IEEE T Reliab 2019;68:161-74.
[39] Wang XY, Zhao X, Sun JL. A compound negative binomial distribution with mutative termination conditions based on a change point. J Comput Appl Math 2019;351:237-49.
[40] Fu JC, Shmueli G, Chang YM. A unified Markov chain approach for computing the run length distribution in control charts with simple or compound rules. Stat Probabil Lett 2003;65:457-66.
[41] He QM. Fundamentals of matrix-analytic methods. New York, NY: Springer; 2014.
[42] Zhou J, Tsianikas S, Birnie DP, Coit DW. Economic and resilience benefit analysis of incorporating battery storage to photovoltaic array generation. Renew Energ 2019;135:652-62.