A confidence-based approach to reliability design considering correlated failures


Reliability Engineering and System Safety 165 (2017) 102–114




Lance Fiondella a,1, Yi-Kuei Lin b,⁎, Hoang Pham c, Ping-Chen Chang d, Chendong Li e

a Department of Electrical and Computer Engineering, University of Massachusetts, 285 Old Westport Road, Dartmouth, MA 02747, USA
b Department of Industrial Management, National Taiwan University of Science & Technology, Taipei 106, Taiwan ROC
c Department of Industrial Engineering, Rutgers University, Piscataway, NJ 08854, USA
d Department of Industrial Engineering and Management, National Quemoy University, Kinmen County 892, Taiwan ROC
e Dell Inc., 5455 Great America Parkway, Santa Clara, CA 95054, USA

Keywords: Reliability design; Redundancy allocation; s-correlated failure; s-confidence optimization

Abstract

To maintain a competitive edge, technology manufacturers must produce systems that are reliable enough to satisfy customers yet cheap enough to engineer profitably. This paper presents an optimization model to maximize the statistical confidence in product profitability, permitting flexibility in both the design and the number of units manufactured. This is unlike traditional approaches, which focus on the two cases that optimize the reliability of a single unit or the s-expected profit obtained from a very large number of units. These two extremes disregard a practical concern, namely the negative impact that a larger than s-expected number of failures will exert on product profitability. This paper formulates an optimization problem to mitigate this risk. Virtually all reliability optimization problems also assume that component failures are s-independent; the present paper does not impose this assumption. The utility of the approach is demonstrated through a series of examples which compare the reliability of systems designed with and without the assumption of s-correlated component failures. The results indicate that explicitly considering s-correlation consistently mitigates the risk to profitability more effectively than the same method when component failures are assumed to be s-independent.

1. Introduction

Reliability and cost are competing constraints in manufactured systems.2 Reliability is essential to achieve a desired level of customer satisfaction. On the other hand, cost control is critical to maintain product profitability. High reliability alone will not guarantee product viability because production cost must be managed. Similarly, arbitrary cost cutting can be detrimental to profit when the resulting system reliability is too low. Thus, a compromise between these two factors is necessary to optimize profitability. While the reliability of each unit and the overall profitability are desirable attributes of any system design, a methodology to address the uncertainty inherent in the production of a finite number of units is needed to mitigate risk.

Two approaches to optimize the reliability of commercial off-the-shelf systems dominate the research literature. The first [1] makes critical components fault-tolerant to improve system reliability. One shortcoming of the redundancy allocation approach is that virtually all systems are produced in quantities greater than one, and there is no guarantee that what is cost optimal for a single unit will be cost optimal for a larger number of items. Furthermore, the vast majority of reliability optimization techniques rely on the assumption that component failures are statistically independent, which can lead to optimistic overestimates of system reliability. A second popular approach to optimal reliability design [2] attempts to maximize the s-expected profit of a unit from an arbitrarily large population. A limitation of the profit maximizing approach is that not all products are capable of achieving an s-expected value because variance in the number of failures from a finite lot can introduce non-trivial variance into the actual profit derived. Recent contributions to cost-informed reliability optimization include the work of Amari et al. [3], who proposed optimal cost-effective design policies for k-out-of-n:G subsystems that can experience imperfect fault-coverage, and Amari and Pham [4], which



⁎ Corresponding author.
E-mail addresses: lfi[email protected] (L. Fiondella), [email protected] (Y.-K. Lin), [email protected] (H. Pham), [email protected] (P.-C. Chang), [email protected] (C. Li).
1 Tel.: +1 (508) 999-8596.
2 The terms system and unit are used interchangeably in this paper.
http://dx.doi.org/10.1016/j.ress.2017.03.025
Received 17 June 2016; Received in revised form 20 January 2017; Accepted 22 March 2017
Available online 28 March 2017
0951-8320/ © 2017 Elsevier Ltd. All rights reserved.


Nomenclature

xi: Component redundancy in position i.
p: Number of positions in system architecture.
ϕ: System structure function.
E[Rs]: s-expected system reliability.
m: Number of resource constraints.
bi: Budget constraint of ith resource.
gi: Resource i consumption function.
gi,j: Resource i consumed by redundancy in position j.
lj: Redundancy lower bound of position j.
uj: Redundancy upper bound of position j.
Cr: Reward for a reliable system.
Cm: Cost to manufacture a system.
Cl: Loss from an unreliable system.
F: Funds available for manufacturing.
N: Number of systems manufactured.
P: Random variable for profit of system design.
E[P]: s-expected profit of system design.
P*: Target profit of system design.
α: Probability P* is not achieved.
μ: s-expected component reliability.
n: Number of redundant components.
μ′: Reliability of parallel system composed of n components of reliability μ.
ρ: Correlation between component failures.
μ″: Reliability of parallel system composed of n components of reliability μ and correlation ρ.
μ (vector): 1 × n vector of non-identical s-expected component reliabilities.
μi: s-expected reliability of component i.
Σ: n × n correlation matrix.
ρi,j: Correlation between failures of components i and j.
R: 1 × n vector of component states.
ri: State of component i, ri ∈ {0, 1}.
λi,j: Poisson rate parameter encoding correlation between components i and j.
Λ: n × n upper diagonal matrix of rate parameters.
Λ^k: Λ matrix in iteration k.
λ^k_{i,j}: (i, j)th entry of Λ^k.
λ^k_min: Minimum λ^k_{i,j} > 0.
λmin: 1 × n(n + 1)/2 vector of λ^k_min.
X_k: kth Poisson variable, with rate λ^k_min.
X: 1 × n(n + 1)/2 Poisson vector encoding component correlations.
S^k: Set of components to which X_k(λ^k_min) is added.
S: 1 × n(n + 1)/2 vector of S^k.
Si,j: Set of components that fail when Xi = 1 and Xj = 1.
I{0}(ψ): Indicator function. I{0}(ψ) = 1 for ψ = 0.
C: Cutsets of system.
c: Individual cuts, c ∈ C.
Θ: 1 × n(n + 1)/2 vector of Poisson outcomes.
C(Θ): Set of component failures given Θ.
zi,j: jth node at depth i of branch and bound tree.
Zi: Set of nodes at depth i.

minimizes the cost of complex repairable systems by identifying the optimal number of spares for each subsystem. However, these recent contributions continue to assume that the s-expected profit can be attained by producing a large quantity of items. Thus, an approach that quantifies the impact of s-correlated component failures on the reliability of a system and of a larger than s-expected number of failures in a small lot of items will complement existing techniques well.

Two methods are frequently used for modeling the choice among uncertain outcomes: stochastic dominance and mean-risk approaches [5]. Value at Risk (VaR) measures an investment's risk by estimating how much an investment might lose given normal market conditions in a specified period of time; it is used by firms and regulators [6] in the financial industry to assess the assets that may be needed to cover possible losses. Conditional Value at Risk (CVaR) [7] is the expected return on a portfolio in the worst cases, which is intended to be more sensitive to the shape of the tail of the loss distribution.

This paper presents an approach to manage uncertainty when a system is produced in smaller quantities. Instead of maximizing the s-expected profit, the approach identifies a design that maximizes the statistical confidence that a desired profit will be realized. The goal is to identify the ideal combination of a reliable system design and the quantity of units to be produced given a limited budget. Thus, it becomes possible to manufacture a small number of highly reliable units of high quality or a larger number of lower quality units with lower reliability. This approach allows an organization to study the potential risk and reward of alternative designs. In addition to the greater realism enabled by considering the production of a finite number of units, the proposed approach also removes the widespread and unrealistic assumption that redundant components fail in a statistically independent manner.

Algebraic expressions [8] and numerical algorithms [9] to quantify the impact of s-correlation on discrete system reliability are utilized. The algebraic expressions are applicable to several common structures, including series, parallel, and series-parallel systems; they are computationally efficient and therefore suitable for the intensive calculations performed during optimization. The numerical algorithms are applicable to the broader class of coherent systems [1] and can therefore also consider systems possessing complex network structures. Our previous papers [8–10] were restricted to the modeling and sensitivity analysis of s-correlation on system reliability and have not been applied in the context of any reliability optimization problem. We demonstrate the effectiveness of these approaches for solving the s-confidence optimization problem through a series of examples. The results indicate that the proposed techniques integrating the assumption of s-correlated component failures outperform the same techniques when component failures are assumed to be s-independent, thereby mitigating the negative impact of correlation on reliability while simultaneously maximizing the s-confidence that profitability exceeds a desired target. Thus, the proposed approach can provide useful insight to managers, engineers, and scientists wishing to understand how correlated failures in the components of a system may impact its reliability and the potentially negative influence a larger than s-expected number of failures will exert on product profitability.

The paper is organized as follows: Section 2 summarizes related research. Section 3 outlines two widely accepted optimization models. Section 4 discusses some limitations of these previous approaches and proposes techniques to remedy these shortcomings. Section 5 provides illustrations. Section 6 offers conclusions and future research.

2. Related research

This section reviews the related research along multiple dimensions, including reliability optimization problems, the redundancy allocation problem, optimizing average system cost, correlated or dependent component failure times, and uncertainty and risk aversion for system reliability optimization.

2.1. Reliability optimization

Many techniques have been applied to reliability optimization problems including heuristic methods, dynamic programming [11],


discrete optimization [12], nonlinear programming [13], and metaheuristic algorithms like genetic algorithms [14], simulated annealing [15], and tabu search [16,17]. The monograph by Kuo et al. [1] provides a definitive summary of the application of these approaches to problems in reliability engineering. This reference surveys optimization problems in reliability such as redundancy allocation [18,19], component improvement and assignment [20], as well as multi-objective [21,22] problems.

2.2. Optimal redundancy allocation

Recent research on the redundancy allocation problem for reliability optimization includes the work of Tian et al. [23], who presented a redundancy allocation approach to optimize the reliability of multi-state series-parallel systems, while minimizing system cost and satisfying a system availability constraint. Recent advances such as [24] and [25] suggested new algorithms to optimize system reliability through redundancy allocation techniques. Liu et al. [26] developed a joint redundancy allocation and imperfect maintenance optimization strategy for multi-state systems to identify a multi-state element replacement strategy under imperfect repair to achieve a desired availability at minimum average cost. Lai and Yeh [27] proposed a two-stage simplified swarm optimization approach to solve the redundancy allocation problem for a multi-state bridge system found in load balancing and control. Xu and Liao [28] studied the reliability of one-shot systems containing multifunctional components such as ad hoc sensor networks, deriving expressions for system reliability and the reliability of each function and formulating the redundancy allocation problem to maximize system reliability. While all of these studies have made significant contributions to advance the state of the art in redundancy allocation for reliability optimization and related problems, they all seek to optimize a single system consisting of s-uncorrelated components.

2.3. Profit/cost optimization

The previous research on profit optimization focuses on selecting the optimal number of components to maximize the s-expected system reliability, while minimizing the s-expected system cost or maximizing the s-expected system profit. Several types of systems with two modes of failure have been studied, including {k, n − k + 1}-out-of-n systems [29,30], parallel systems [31], and parallel-series systems [32], as well as applications to multi-level maintenance policies [33].

2.4. Correlated or dependent component failures

Representative examples of research on correlated or dependent component failure times include the work of Coit and English [34], who presented a model based on proportional hazards to handle cases where component failure times are statistically correlated because a system's components are exposed to common environmental conditions. Song et al. [35] developed a multi-component reliability model and applied two preventive maintenance policies to a system where component failure processes are mutually competing and s-dependent due to simultaneous exposure to degradation and shock loads. Song et al. [36] later considered dependency of transmitted shock sizes and shock damages to specific failure processes of a system's components when a system is subject to hard and soft failures, where a hard failure can occur when transmitted system shocks are large enough to cause any component in a series system to fail immediately and the soft failure process can occur when any component deteriorates to a certain failure threshold.

2.5. Reliability optimization under uncertainty and risk aversion

Research on uncertainty and risk aversion for system reliability optimization includes the work of Rubinstein et al. [37], who extended redundancy allocation for reliability optimization of series-parallel systems to the case of uncertainty in system parameters. The authors presented methods based on analytic expressions and simulation to obtain the CDF of the total system capacity. To more faithfully address the practical concerns of the engineering design community, Coit and Smith [38] formulated redundancy allocation problems that use lower bounds for optimization when both the component reliability and time-to-failure are random variables and consider the risk profile of system designers and users. Coit and Smith [39] subsequently solved a redundancy allocation problem where the objective was to maximize a lower percentile of the system time-to-failure distribution when the available components possess random Weibull scale parameters, demonstrating that the design approach is sensitive to the user's perceived risk. Coit et al. [40] also addressed system reliability optimization when component reliability estimates are treated as random variables with estimation uncertainty. They apply multiple objective concepts, including Pareto optimality, to determine solutions that maximize system reliability and minimize variance, thereby providing flexibility in the decision-making process. Taflanidis and Beck [41] proposed an algorithm they call Stochastic Subset Optimization for reliability optimization problems where the design variables are uncertain. The approach simulates samples of variables that lead to system failure, and a smaller subset of the design space containing near-optimal design variables is identified while simultaneously performing sensitivity analysis to assess the influence of the design variables and uncertain model parameters.

Tekiner-Mogulkoc and Coit [42] describe system reliability optimization models considering uncertainty, proposing algorithms to minimize the coefficient of variation of the system reliability estimate with respect to a minimum system reliability constraint and other system-related constraints. Feizollahi et al. [43,44] proposed a robust deviation framework to deal with uncertain component reliabilities for the constrained redundancy optimization problem. Their approach is based on a linearized binary version of standard nonlinear integer programming formulations and extended by assuming that the component reliabilities belong to an interval uncertainty set, where only upper and lower bounds are known for each component reliability. They then develop a min-max regret model to handle data uncertainty. To accommodate unplanned variations or changing environments and operating stresses, Chatwattanasiri et al. [45] formulated a redundancy allocation problem to select the best design solution when there are multiple choices of components and system-level constraints, in order to support both risk-neutral and risk-averse decision-making.

2.6. Distinguishing contributions

The distinguishing contribution of the present paper is the combination of (i) discrete algebraic reliability expressions for series, parallel, and k-out-of-n:G systems composed of components with common reliability and correlation parameters and (ii) numerical algorithms for arbitrarily structured discrete systems composed of components possessing non-identical reliabilities and non-identical correlations, with a confidence-based approach for the average system profit optimization problem when the number of units manufactured is finite. Alternative combinations of dependency modeling and uncertainty and risk aversion strategies for system reliability optimization are also possible.

3. Reliability modeling and optimization

This section reviews the key features of two predominant challenges in reliability engineering, maximization of reliability and profitability. Section 3.1 reviews a widely-studied problem in reliability optimization, the redundancy allocation problem [1]. Section 3.2 outlines another closely related approach, which seeks to optimize the s-expected system profit [2].


3.1. Optimal reliability design

The framework proposed by Kuo et al. [1] divides the reliability optimization models discussed in the literature into eight categories. The present discussion is given in the context of the third category of this framework, redundancy allocation, because it is one of the most widely studied. The extensions presented here can also be applied to several of the other seven categories, including component assignment and multi-objective optimization problems, and we solve a component assignment problem in the illustrations.

In the redundancy allocation model, system reliability is improved by allocating additional components to each position3 in the system structure. However, this redundancy is limited by resource constraints. The challenge underlying this problem is to identify the optimal redundancy levels x1, …, xp that maximize the reliability of a p-stage system, while simultaneously satisfying a set of resource constraints. The mathematical formulation of this reliability optimization problem is [1]

max E[Rs] = ϕ(x1, …, xp)
s.t.: gi(x1, …, xp) ≤ bi, (1 ≤ i ≤ m)
      lj ≤ xj ≤ uj, (1 ≤ j ≤ p), (1)

where E[Rs] = ϕ(x1, …, xp) is the s-expected system reliability, computed in terms of redundancy levels x1, …, xp and system structure ϕ. The first constraint indicates that the total consumption of resource i, denoted gi(•), must be less than the maximum allowable budget bi for the ith of m resource constraints. Examples of resources include power consumption, cost, and weight. The second set of boundary constraints requires the redundancy in stage j of the system to be an integer greater than or equal to lj, but less than or equal to uj. Upper bounds on resources are determined by the nature of the system. For example, the cost of launching a space vehicle may limit redundancy in heavier components. Redundancy allocation to optimize system reliability is a nonlinear integer programming problem, and the majority of redundancy allocation studies assume that total expenditure is equivalent to the sum of the component resource expenditures

gi(x1, …, xp) = Σ_{j=1}^{p} gi,j(xj). (2)

Existing studies apply a wide variety of solution methods to problems of this form.

3.2. Optimal system profit

Another common approach to reliability optimization is to formulate an economic profit function [2]. This approach assigns a reward for reliable systems and exacts a penalty for systems that fail. A simple profit function is

E[P] = E[Rs](Cr − Cm) − (1 − E[Rs])Cl, (3)

where Cr, Cm, and Cl are constants for the reward, cost of manufacturing, and loss respectively. The nature of Cl varies depending on the type of product. For example, if Cr denotes the price paid for one unit, a money back guarantee can be characterized by Cl = Cm. Other more complicated models incorporating sophisticated warranty policies [46] are also possible. Clearly, profitability must come from the fraction of the items that are reliable. Furthermore, the cost of manufacturing must be less than the reward, otherwise the product cannot be profitable. Thus, an implicit constraint is Cr > Cm = Σ_{j=1}^{p} cj xj, where cj is the cost of a component in stage j. There will also be a relationship between manufacturing costs and the profit of a design. For example, a product built from cheap and unreliable components will possess a low reliability and often incur the penalty for failure. However, the cost of manufacturing a highly reliable system from more expensive, high-quality components will lower product profitability. Eq. (3) guides reliability apportionment to achieve maximum economic advantage, optimizing the s-expected system profit at the expense of system reliability. Thus, reliability may only be improved to the point it increases profit and no more.

4. Component correlation and uncertainty

This section discusses two assumptions of existing optimization approaches that limit their applicability. The first assumption is that component failures are statistically independent. The second assumption is that a large number of systems can be manufactured, which is necessary to guarantee the optimal system profit based on the s-expected component reliabilities. We then propose techniques to remove these limitations. Section 4.1 presents two techniques to compute the negative impact of s-correlated failures on system reliability. Section 4.2 proposes an alternative optimization approach that seeks to maximize the s-confidence that a design meets a target profit. These s-correlation and s-confidence optimization techniques are easily combined into a single procedure.

4.1. Component correlation

A notable shortcoming associated with previous formulations of the redundancy allocation problem is the assumption that the failure of redundant components occurs in a statistically independent manner. Under this assumption, the reliability expression for a parallel system with n components of s-expected reliability μ is

μ′ = 1 − (1 − μ)^n. (4)

Here s-expected reliability may be interpreted as the probability that a system or component survives beyond the end of a warranty period, which may be computed as the fraction of components that fail after the end of warranty. Thus, redundancy may be justified when the probability that a single component survives past the end of warranty is low. However, the assumption of statistically independent component failures is often violated in practice because various environmental factors can contribute to s-correlated failures. These s-correlated failures can lead to a larger number of component failures. Hence, a reliability estimate based on Eq. (4) may be optimistic.

The following sections introduce two techniques to compute system reliability when the components are s-correlated. The approach discussed in Section 4.1.1 presents an algebraic expression for s-correlated identical components [8], where components possess equal reliability and a common s-correlation coefficient. Section 4.1.2 provides an overview of an efficient numerical algorithm to compute the reliability of systems composed of s-correlated non-identical components [9]. This latter approach allows both the component reliabilities and pairwise component failure correlations to assume distinct values.

4.1.1. s-correlated identical components

To reduce the lack of realism embodied in the assumption of s-independent component failures, consider the case where n identical components possess s-expected reliability μ and common s-correlation coefficient ρ ≥ 0, which possesses a similar meaning to the correlation coefficient of the multivariate normal and characterizes the dependence between the failures of the components comprising a system due to their common operating environment. The reliability of a parallel system susceptible to s-correlated component failures may be characterized as a special case of the s-correlated binomial distribution [8], where at least one of n components is reliable.

3 The terms position and stage are used interchangeably throughout this paper.

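Before turning to correlated failures, the quantities above can be tied together in a short sketch. The following Python fragment evaluates Eq. (4) for a single parallel stage, the per-unit s-expected profit of Eq. (3), and the probability that a finite lot of N units meets a target profit, the finite-lot risk that motivates Section 4.2. All numeric values (reward, manufacturing cost, loss, lot size, and target) are illustrative assumptions rather than figures from the paper, and failures are assumed s-independent here.

```python
from math import comb

def parallel_reliability(mu, n):
    """Eq. (4): reliability of n parallel components with s-independent failures."""
    return 1.0 - (1.0 - mu) ** n

def expected_profit(rs, c_reward, c_mfg, c_loss):
    """Eq. (3): E[P] = E[Rs](Cr - Cm) - (1 - E[Rs])Cl, per unit."""
    return rs * (c_reward - c_mfg) - (1.0 - rs) * c_loss

def lot_profit_confidence(rs, n_units, c_reward, c_mfg, c_loss, target):
    """P(total profit of a lot of n_units >= target).

    The number of failed units is Binomial(n_units, 1 - rs): each reliable
    unit earns Cr - Cm and each failed unit loses Cl, mirroring Eq. (3)."""
    confidence = 0.0
    for k in range(n_units + 1):  # k = number of failed units in the lot
        p_k = comb(n_units, k) * (1.0 - rs) ** k * rs ** (n_units - k)
        lot_profit = (n_units - k) * (c_reward - c_mfg) - k * c_loss
        if lot_profit >= target:
            confidence += p_k
    return confidence

rs = parallel_reliability(0.9, 2)              # 0.99
print(expected_profit(rs, 100.0, 40.0, 40.0))  # about 59.0 per unit
# With these numbers a 20-unit lot misses a 1100 target whenever
# two or more units fail:
print(lot_profit_confidence(rs, 20, 100.0, 40.0, 40.0, 1100.0))
```

Note that with these assumed numbers the s-expected lot profit (20 × 59 = 1180) exceeds the 1100 target, yet the confidence of actually reaching the target is noticeably below 1; this gap between s-expected-profit optimization and confidence in a finite lot is precisely what the s-confidence formulation of Section 4.2 addresses.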


Theorem 4.1. The reliability of a parallel system with n identical components of s-expected reliability μ and s-correlation coefficient ρ ≥ 0 is

μ″ = 1 − [(1 − μ)(1 + ρ μ/(1 − μ))]^n / (1 + ρ μ/(1 − μ)). (5)

Note in (5) that limρ→0 μ″ simplifies to μ′ of Eq. (4). It also follows that limρ→1.0 μ″ = μ, indicating that s-correlation can completely spoil the potential benefits of component redundancy. Thus, Eq. (5) will deter an optimization procedure from selecting a high level of fault-tolerance in a stage where components possess a large amount of s-correlation because the reliability gains will not be worth the price paid for the redundancy.

In practice, the correlation coefficient can be estimated by subjecting a population of items to some form of life testing and then examining the components of each system upon failure to determine which components in that system experienced failure. The binary vectors obtained from the observations of component failure and non-failure within each individual system can be used to compute the sample correlation coefficients for each pair of components. The equation given in Theorem 4.1 makes the simplifying assumption that each correlation is identical, which may be reasonable in some systems where common manufacturing properties of the components or common environmental factors drive component failure. The following section discusses the more general case where component reliabilities and correlations are not assumed to be identical.

4.1.2. s-correlated non-identical components

This section presents a two-stage algorithm consisting of a Poisson encoding step and a branch and bound algorithm. The encoding algorithm is exact [9], precisely characterizing the success probabilities and correlations of the multivariate Bernoulli distribution representing component reliabilities and correlations with a quadratic number of Poisson variables. An alternative approach would require enumeration of all 2^n possible combinations of outcomes of component success and failure [47]. The branch and bound algorithm then prunes the Poisson encoding to only those combinations of outcomes that correspond to combinations of Bernoulli outcomes that contribute to system reliability according to the cut sets determined from the system's structure function. Thus, the multivariate Bernoulli distribution and the interpretation of its parameters as component reliabilities and correlations between the failures of the components can augment reliability optimization problems by considering the impact of correlation.

The algorithm for s-correlated non-identical components can be applied to any coherent system structure [1]. Thus, the approach is not limited to common structures like series, parallel, and series-parallel systems, and can also be used to evaluate the reliability of systems possessing complex network structures. The required inputs of the algorithm include the vector of s-expected component reliabilities μ, s-correlation matrix Σ, and system structure function ϕ. The component reliabilities are characterized as an s-correlated multivariate Bernoulli distribution R ∼ MVB(μ, Σ), and the system reliability is computed as the sum of the probabilities of the component outcomes that produce a reliable system according to the system structure function such that ϕ(R) = 1. The following sections discuss the two stages of the algorithm.

The first stage encodes the reliability of the n s-correlated components with O(n^2) s-independent Poisson variables, which is significantly more efficient than the O(2^n) parameters used by alternative techniques to quantify s-correlation [48,49]. Stage two computes system reliability from the Poisson encoding and the cutsets of the system determined from the structure function.

Poisson encoding algorithm: Algorithm 1 [9] expresses the component reliabilities as functions of s-independent Poisson variables. The Poisson rate parameter λ^0_{i,j} of matrix Λ^0 encodes the s-correlation between components i and j as a function of the components' s-expected reliabilities μi and μj as well as their s-correlation ρi,j. Each iteration l identifies the smallest non-zero entry λ^l_min of this matrix and creates the set S^l, which initially contains the subscripts i and j of this minimum element. Each component b ∉ S^l is added to the set S^l′. If all λ^{l−1}_{i,j} > 0, where i, j ∈ S^l′, component b is added to S^l. This requirement preserves the positivity of all entries in the matrix, ensuring that they will be valid rate parameters for Poisson variables. The value λ^l_min is then subtracted from the entries in the matrix that correspond to the s-correlations between the subset of components included in the set S^l. This process continues until all entries of the matrix Λ are zero. The algorithm returns two vectors. The first, λmin, contains the minimum element found in each iteration, and the second contains the sets of components S from which this minimum was subtracted. The λmin are the rates of the s-independent Poisson variables, and the sets S are the corresponding subsets of components to which these variables will be added.

Algorithm 1. MVB encoding algorithm.

INPUT: μ, Σ for i=1 to n do for j=i to n do ⎛ λi0, j = log⎜1 + ρi, j ⎝ end for end for l=1 while Λ ≠ 0

(1 − μi )(1 − μj ) μi μj

⎞ ⎟ ⎠

l λ min = FindNonZeroMinimum(Λl −1)

10:

S l = {i , j}

11:

for b ∉ S l do

12:

Sl′ = Sl ∪ b

13:

l l if λil,−1 j ≥ λ min , ∀ i , j (i < j ∈ S ′) then

14:

Sl = Sl′ else

15: 16: 17: 18: 19: 20:

Sl = Sl end if end for for sil , sjl ∈ S l , 1 ≤ i ≤ j ≤ n do l λil, j = λil,−1 j − λ min

21: end for 22: l←l+1 23: end while 24: return λmin , S For each 1 ≤ i ≤ n write the reliability of component i as

⎛ l ⎞ k Ri = I{0}⎜⎜∑ X (λ min ) × I (i ∈ S k )⎟⎟ , ⎝ k =1 ⎠

(6) k λ min

k

where X is a Poisson variable with rate parameter and I (i ∈ S ) = 1 if the ith component is a member of set Sk and zero otherwise. Component reliability is then determined according to the following indicator function

⎧1 ψ = 0 I{0}(ψ ) = ⎨ ⎩0 ψ > 0

(7)

Thus, a component is reliable only if all its constituent Poisson variables experience zero events. Furthermore, a Poisson variable that experiences one or more events fails components to which it belongs, encoding the s-correlations of the MVB distribution. This algorithm l k ∑k =1 λ min and that guarantees that I (i ∈ S k ) = λ i , i 106

Reliability Engineering and System Safety 165 (2017) 102–114

L. Fiondella et al. l

k ∑k =1 λ min I (i ∈ S k ) × I (j ∈ S k ) = λi, j . These two properties respectively ensure that E[Ri ] = μi and Corr (Ri , Rj ) = ρi, j , meaning that the Poisson variables encode the set of MVB parameters specified as input to Algorithm 1. Numerical example: The following three-component example illustrates the steps of Algorithm 1. The average component reliabilities and pairwise component correlations are set to 〈μ1 = 0.9136, μ2 = 0.9339, μ3 = 0.9274〉 and

Σ = | 1.0    0.088  0.050 |
    | 0.088  1.0    0.074 |
    | 0.050  0.074  1.0   |

The equation on line 4 within the loop on lines 2–6 produces the following upper-diagonal matrix of Poisson rates to encode the MVB parameters

Λ^0 = | 0.09036  0.00717  0.00429 |
      |          0.06839  0.00549 |
      |                   0.07537 |    (8)

Line 9 identifies the minimum λ^1_min = 0.00429, while line 10 initializes the set for this first iteration to S^1 = {1, 3}, since the minimum corresponds to the entry encoding the correlation between components one and three. Line 12 temporarily adds 2 to S^1, while line 13 confirms that λ^0_{1,1}, λ^0_{1,2}, λ^0_{1,3}, λ^0_{2,2}, λ^0_{2,3}, and λ^0_{3,3} are all greater than 0.00429, so line 14 retains 2 in S^1. Since S^1 = {1, 2, 3}, the loop on lines 19–20 subtracts 0.00429 from all six entries of Λ^0, producing

Λ^1 = | 0.08607  0.00288  0.00000 |
      |          0.06409  0.00120 |
      |                   0.07108 |    (9)

The minimum of this second iteration is λ^2_min = 0.00120 and initially S^2 = {2, 3}, corresponding to the location of this minimum within the matrix. Line 12 temporarily adds 1 to S^2, but line 16 ultimately discards this addition because line 13 identifies that λ^1_{1,3} is not greater than the present non-zero minimum 0.00120. Excluding 1 from S^2 prevents the encoding of the correlation between components one and three from exceeding ρ_{1,3} = 0.050. Thus, the loop on lines 19–20 subtracts 0.00120 from λ^1_{2,2}, λ^1_{2,3}, and λ^1_{3,3} only, producing

Λ^2 = | 0.08607  0.00288  0.00000 |
      |          0.06289  0.00000 |
      |                   0.06988 |    (10)

Similarly, the third iteration subtracts 0.00288 from λ^2_{1,1}, λ^2_{1,2}, and λ^2_{2,2}, while iterations four through six subtract from the remaining non-zero values on the diagonal one at a time. Table 1 summarizes the minimum and set of each iteration returned by Algorithm 1.

Table 1
Minimum λ^k_min and sets per iteration.

Iteration (k)   Minimum (λ^k_min)   Set (S^k)
1               0.00429             {1, 2, 3}
2               0.00120             {2, 3}
3               0.00288             {1, 2}
4               0.06001             {2}
5               0.06988             {3}
6               0.08319             {1}

Thus, the encodings of the three components according to Eq. (6) are

R_1 = I_{0}(X_1(0.00429) + X_3(0.00288) + X_6(0.08319))    (11)

R_2 = I_{0}(X_1(0.00429) + X_2(0.00120) + X_3(0.00288) + X_4(0.06001))    (12)

R_3 = I_{0}(X_1(0.00429) + X_2(0.00120) + X_5(0.06988))    (13)

As noted in Eq. (7), a component is reliable only if all of its Poisson variables experience zero events. Shared variables such as X_1 and X_3 in R_1 and R_2 encode the correlation between the failures of components one and two.

Branch and bound algorithm: This section presents an efficient algorithm to estimate reliability from the Poisson encodings of the component reliabilities and the system structure. The approach identifies all combinations of Poisson variables that lead to a reliable system without enumerating many of the combinations that lead to system failure. The procedure relies on a simple observation to bound the number of combinations of Poisson variables that must be examined. Given a system of n components, n(n + 1)/2 Poisson variables are sufficient to encode the component reliabilities and s-correlations. Consider a combination of Poisson variables Θ with component failure set C(Θ), which maps to a component reliability vector that produces an unreliable system. Clearly, this outcome contains a cut of the system structure that leads to system failure. Furthermore, any outcome Θ′ that changes any Poisson outcome from 0 to 1 can only increase the component failure set. Hence, C(Θ) ⊆ C(Θ′), and the combination of Poisson variables Θ′ must also map to system failure. The following algorithm uses this observation to reduce the number of Poisson combinations explored to produce a reliability estimate.

Algorithm 2. Branch and bound algorithm.
1: INPUT: S, λ_min, ϕ
2: C = FindCuts(ϕ)
3: Z^0 = 0
4: i = 0
5: while |Z^i| ≠ 0 do
6:   Z^{i+1} = GenerateChildren(Z^i)
7:   for j = 1 to |Z^{i+1}| do
8:     if C(Z^{i+1,j}) ⊇ (c ∈ C) then
9:       Z^{i+1} = Z^{i+1} − Z^{i+1,j}
10:    end if
11:  end for
12:  i ← i + 1
13: end while

The algorithm begins from the state consisting of all zeros (Z^0 = 0), which corresponds to zero events in all Poisson variables, no component failures, and a reliable system. The children of this combination of variables are generated by setting one bit in the sequence from zero to one. This produces n(n + 1)/2 children z_{1,j} with only the jth Poisson variable set to one (X_j = 1). The set S^j of components that fail when this variable is set to one is compared with the cut sets of the system. If the failure set of a child node contains a cut set of the system structure, the node is removed from further consideration. Members of Z^2 are generated from the combinations of outcomes that are not removed from Z^1 as follows. For each pair of remaining child nodes z_{1,j} and z_{1,k}, where j < k, node z_{1,j} generates a child with X_j = 1 and X_k = 1, possessing component failure set S^{j,k} = S^j ∪ S^k, which is compared with the system cut sets. Members of Z^i, i ≥ 3, are generated and tested for removal in a similar fashion. The algorithm terminates when every combination of Z^i contains a cut and produces an unreliable system. This approach finds all combinations of Poisson variables that produce a reliable system. In the worst case, the algorithm explores 2^{O(n^2)} nodes. In practice, however, many of the combinations that lead to system unreliability are quickly eliminated. Thus, the branch and bound algorithm is often much faster than a deterministic approach that enumerates all 2^{n(n+1)/2} combinations of Poisson variables.

The probabilities of the outcomes leading to a reliable system may be computed as follows. If the algorithm terminates after m ≤ n(n + 1)/2 iterations, then the members of the sets Z^0 through Z^{m−1} contain the combinations of Poisson variables needed for computing system reliability. Because the Poisson variables are s-independent, the probability of each combination is computed as Π_{k=1}^{n(n+1)/2} Pr{X_k(λ^k_min)}, where

Pr{X_k(λ^k_min)} = { e^{−λ^k_min},     X_k = 0
                   { 1 − e^{−λ^k_min}, X_k > 0    (14)

Summing the probabilities of these combinations of Poisson outcomes produces the system reliability estimate.

Numerical example: Continuing the three-component example illustrates the steps of Algorithm 2. Line 2 identifies the single cut set as {1, 2, 3}, the case where all three components of the parallel system fail. Figure 1 shows the result of executing the branch and bound algorithm on the Poisson encoding of the three-component parallel system. The node 000000 on the left corresponds to line 3, where all six Poisson variables experience zero events and all three components are reliable because there are no events in any of the Poisson variables in Eqs. (11)–(13). Line 6 generates the six children of 000000 shown in the second column of nodes in Figure 1, where a single distinct 0 has been changed to a 1 in each of these nodes. The node 100000 is grey and not relevant to the reliability calculation because Table 1 indicates that setting X_1 = 1 results in failure of all three components, since S^1 = {1, 2, 3}. However, none of the other five nodes produces system failure, and thus each must be expanded further. For example, node 010000 generates four children by preserving the prefix 01 and setting one of the final four bits to one. Node 011000 is eliminated because X_2 = 1 and X_3 = 1 fail the union of components {2, 3} and {1, 2} respectively, which means that all three components and the system fail. Node 010100 is retained because X_2 = 1 and X_4 = 1 fail the union of {2, 3} and {2}, which leaves component one in the operational state, and thus the system is reliable. The remaining states are enumerated in a similar fashion, considering only 23 of the 64 possible combinations to identify the 15 states corresponding to reliable operation of the three-component parallel system. Because the Poisson variables are independent, the reliability can be calculated as the sum of the probabilities of the 15 combinations identified by the branch and bound algorithm.
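As a check on this example, the encoding of Eqs. (11)–(13) can be sketched in Python. This is an illustration with assumed names rather than the paper's code, and it simply enumerates all 64 zero/non-zero Poisson patterns instead of branching and bounding:

```python
import itertools
import math

# Poisson rates and component membership sets returned by Algorithm 1
# for the three-component example (Table 1).
rates = [0.00429, 0.00120, 0.00288, 0.06001, 0.06988, 0.08319]
sets = [{1, 2, 3}, {2, 3}, {1, 2}, {2}, {3}, {1}]

def parallel_system_reliability(rates, sets, n=3):
    """Sum Eq. (14)-style probabilities over every zero/non-zero pattern
    of the Poisson variables; the parallel system survives whenever at
    least one component sees zero events in all of its variables."""
    reliability = 0.0
    for pattern in itertools.product((0, 1), repeat=len(rates)):
        prob = 1.0
        failed = set()
        for outcome, lam, members in zip(pattern, rates, sets):
            p_zero = math.exp(-lam)
            prob *= p_zero if outcome == 0 else 1.0 - p_zero
            if outcome:
                failed |= members  # any event fails every member component
        if failed != set(range(1, n + 1)):  # some component still works
            reliability += prob
    return reliability

# Marginal check: component 1 belongs to X1, X3, X6, so its marginal
# reliability is exp(-0.09036), recovering mu_1 = 0.9136.
mu1 = math.exp(-sum(lam for lam, s in zip(rates, sets) if 1 in s))
print(round(mu1, 4), round(parallel_system_reliability(rates, sets), 4))
# prints: 0.9136 0.9951
```

The system value agrees with the 0.995114 reported below up to the rounding of the published rates.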
The reliability is obtained by first computing the probability of zero events in each Poisson variable, e^{−λ^k_min}, using the rates given in Table 1. For example, the probability of node 010000 is 0.995716 × (1 − 0.998800) × 0.997123 × 0.941753 × 0.932508 × 0.920178. Computing the probabilities of the 15 nodes that do not lead to system failure and summing produces the reliability estimate 0.995114. This is lower than the standard calculation 1 − (1 − 0.9136)(1 − 0.9339)(1 − 0.9274) = 0.999585, which assumes component failures are s-independent. Thus, the approach quantifies the negative impact of correlated failures on system reliability (Table 2).

Fig. 1. Branch and bound of the three-component parallel system encoding.

4.2. Optimal s-confidence

A limitation of the optimal system profit model arises from the assumption that a product can be produced in sufficiently large quantities that there will be little deviation from the estimated s-expected system profit. A larger than expected number of failures in a small population of items will have a damaging impact on profitability. This could be a significant concern because too many failures will result in a more serious financial setback, harming an organization's ability to continue operations. A more realistic approach is to allow flexibility in the number and quality of the systems to ensure profitability. In certain cases, it may be more advantageous to produce fewer units at greater unit cost because the larger budget allocated to each unit produces more reliable systems with greater profit potential. In other circumstances, it could be more effective to produce more units at a smaller expense because a relatively small decrease in system reliability will be more than compensated for by the profits obtained from selling a larger population of items. In either case, flexibility to vary the design should be advantageous. There is no guarantee that the s-expected system profit will be realized. A more appropriate profit function for a small number of systems is

E[P|N] = Σ_{i=0}^{N} (N choose i) E[R_s]^i (1 − E[R_s])^{N−i} (i C_r − (N − i) C_l) + (F − N C_m),    (15)

where N is the number of units produced with the available funds F, C_r is a constant denoting the target selling price of a single unit, and C_l is the amount paid to the buyer of a failed unit. Intuitively, this sum quantifies the average over all possible combinations of i reliable units and N − i unreliable units, which are expressed by the terms E[R_s]^i and (1 − E[R_s])^{N−i} respectively, while the binomial coefficient (N choose i) counts the number of ways i of N units may be reliable. The term i C_r is the reward for i reliable units and (N − i) C_l the penalty for N − i unreliable units. Thus, subtracting the penalty from the reward provides the profit, which is multiplied by the probability of i reliable units and N − i unreliable units. Summing from i = 0 to N produces the average profit. The term (F − N C_m) represents the amount of capital remaining after the production of N units with manufacturing cost C_m. Hence, a design requires E[P] > F to be profitable. This characterization gives designers freedom because it provides the flexibility to choose N as well as the redundancy allocation that determines E[R_s]. Hence, both reliability and the number of units produced are variable, providing greater potential to achieve higher profit.

Maximizing the s-expected profit of Eq. (15) may still fail to mitigate risk. A more suitable objective may be to maximize s-confidence in the profitability of the items manufactured, suggesting an objective of the form Pr{P > P*} ≥ 1 − α, where P* is a constant representing the desired profit, P the profit achieved by the design, and α the type I error the producer is willing to tolerate. The mathematical formulation of the proposed optimization problem is

min α
s.t.: Pr{P > P*} ≥ 1 − α,
      N C_m < F.    (16)

The goal is to minimize the error α such that the lower bound on profitability is satisfied by selecting the number of units N and the redundancy allocation that produces the s-expected reliability E[R_s].
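To make Eqs. (15) and (16) concrete, a minimal sketch (not the paper's code; function names are illustrative) evaluates the binomial profit model directly. As an assumption, the refund C_l is set equal to the selling price C_r, i.e., a full money-back guarantee, a choice under which the sketch reproduces the E[P] ≈ 120.86 and 1 − α ≈ 0.1585 reported for design 1 of Example I below:

```python
from math import comb

def profit(i, N, Cr, Cl, Cm, F):
    """Realized profit when i of N units are reliable (terms of Eq. (15)),
    including the leftover capital F - N*Cm."""
    return i * Cr - (N - i) * Cl + (F - N * Cm)

def expected_profit(N, R, Cr, Cl, Cm, F):
    """E[P|N] of Eq. (15); F - N*Cm can sit inside the sum because the
    binomial probabilities total one."""
    return sum(comb(N, i) * R**i * (1 - R)**(N - i)
               * profit(i, N, Cr, Cl, Cm, F) for i in range(N + 1))

def confidence(N, R, Cr, Cl, Cm, F, P_star):
    """Pr{P > P*}: total binomial mass of outcomes whose profit exceeds P*."""
    return sum(comb(N, i) * R**i * (1 - R)**(N - i)
               for i in range(N + 1)
               if profit(i, N, Cr, Cl, Cm, F) > P_star)

# Design 1 of Example I: N = 12, E[Rs] = 0.7501, Cm = 8.26, Cr = 19.99,
# F = 100, target P* = 200, refund Cl = Cr (assumed money-back guarantee).
print(round(expected_profit(12, 0.7501, 19.99, 19.99, 8.26, 100), 2))
print(round(confidence(12, 0.7501, 19.99, 19.99, 8.26, 100, 200), 4))
```

With the rounded inputs the printed values land within about 0.01 of the tabulated 120.86 and 0.1585.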



Table 2
Probability of zero events.

X_k   Probability   X_k   Probability
X_1   0.995716      X_4   0.941753
X_2   0.998800      X_5   0.932508
X_3   0.997123      X_6   0.920178

This type of analysis quantifies the reasonable s-confidence one may place in achieving profit in excess of P*. If P* is too high, α may be large and therefore unacceptably risky, or the objective may possess no feasible solution, which would indicate that the goal profit is overly ambitious given the present parameters. Lowering P* may enable the identification of a lower value of α, and therefore corresponds to a more risk-averse decision maker. Studying P* along a spectrum of values produces a range of possible strategies from which one may select a suitable tradeoff between the level of risk and the potential for reward. A risk-averse planner may be willing to settle for a small profit, but in doing so would be compensated with a small α. Similarly, higher profits will come at the expense of larger α, which exposes the decision maker to a larger risk of nontrivial losses on their investments. This cautionary analysis is desirable because it emphasizes measurements that may promote informed decisions and avoid exaggerated expectations about an unrealistically high return on investment. Ultimately, it is the decision maker's choice to select a value of P*, and this may depend on the nature of the application for which their product is being designed. For example, safety-critical products may warrant a lower value of P* to achieve what is deemed to be an acceptably small value of α.

An alternative formulation seeks to maximize the probability that the profit exceeds a specified target, but imposes an upper bound of α* on the type I error the producer is willing to tolerate:

max Pr{P > P*}
s.t.: α ≤ α*,
      N C_m < F,    (17)

where Pr{P > P*} = 1 − α.

5. Illustrations

This section illustrates the optimal s-confidence approach for the cases of s-correlated identical and non-identical components. Due to the size of the solution state space, traditional solvers failed on relatively small problems. More specifically, attempts to apply mathematical programming approaches repeatedly encountered local maxima for a particular number of units N and could not escape these locally optimal solutions. Thus, Goldberg's Simple Genetic Algorithm (SGA) [14] was adopted to search for near-optimal solutions in each of the examples given here. Nevertheless, it may be possible to solve the formulation of the s-confidence optimization problem presented here more effectively with alternative methods. The appendix discusses the details of the SGA employed in the following examples.

5.1. s-correlated identical components

Consider a five-stage series system. Each of the five stages can be made more reliable by adding redundant components. However, the redundant components of each stage experience s-correlated failures, which can arise, for example, when components in a single stage overheat due to current irregularities. Hence, Eq. (5) represents the reliability of each of the five stages. It is also assumed that there is no s-correlation between the components in different stages. Thus, the s-expected system reliability is calculated as E[R_s] = Π_{j=1}^{5} μ_j″, the product of the reliabilities of the five redundant stages.

Table 3 lists the reliability, s-correlation, and cost of components for each stage. The component reliabilities are low, yet may be realistic when reliability characterizes the probability that an individual component survives beyond the end of a warranty period, indicating the potential advantage of redundancy. The s-correlations given in Table 3 were selected for the sake of illustration because larger s-correlations can severely detract from the reliability improvement attainable through redundancy and would greatly reduce the search space for optimization. The case where C_r = 19.99, C_l = C_m, F = 100, P* = 200 corresponds to circumstances where the product is sold for 19.99, the seller provides a money-back guarantee, the total amount of funds available is 100, and the target profit is 200. In the context of Eq. (1), cost is the only budgetary constraint, and an upper bound of seven components is imposed on the redundancy in each of the system's five stages. We note that for small systems such as the first example, it is possible to enumerate all combinations of redundancy in each of the five stages for all feasible values of N; this example is provided to illustrate concepts surrounding the s-confidence optimization approach. However, enumeration is infeasible for larger problems, in which case traditional optimization methods as well as meta-heuristics to identify near-optimal solutions will be necessary.

5.1.1. Example I

This example demonstrates that assuming statistically independent component failures can lead to a design that is suboptimal with respect to profit. Table 4 compares two designs that optimize the s-expected profit of Eq. (15). The first computes the impact of s-correlation according to Eq. (5), while the second employs the more commonly used Eq. (4), which ignores the s-correlations specified in Table 3. Design 2c evaluates this second design according to Eq. (5), which considers the s-correlations specified in Table 3. Design 2 allocates six components to stage one, seven components to stages two and three, and five components to stages four and five. This design appears to achieve higher s-expected reliability and profit with greater s-confidence. However, when the s-correlations given in Table 3 are factored into design 2 as design 2c, the expected profit falls from 198.17 to 120.54, which is less than the 120.86 of design 1. Thus, the second design actually possesses a lower profit than design 1, which explicitly incorporates s-correlation. This decrease in profit occurs because the s-expected unit reliability of design 2 falls from 0.9122 to 0.7504 when s-correlations are included in the system reliability calculation. While the reliability of design 2c is higher than design 1, the profit of design 2c is lower because the unit manufacturing cost is higher. Thus, optimization based on the assumption of s-independent component failures is dangerous because it can produce suboptimal designs with respect to profit and overestimate the s-confidence one may realistically place in this profit. It should also be noted that while this example provides a simple set of point calculations, it is also possible to conduct sensitivity analysis on the individual correlations to assess the impact that increasing or decreasing correlation would have on the optimality of solutions.

5.1.2. Example II

This example demonstrates the value of the s-confidence maximizing approach embodied in Eq. (17). Design 1 of Example I achieved an s-expected profit of 120.86, but the s-confidence in the target profit of 200 was only 0.1585. Table 5 shows the designs produced by solving

Table 3
Component specifications.

Stage   Reliability   s-correlation   Cost
1       0.474         0.041           0.25
2       0.425         0.047           0.21
3       0.473         0.033           0.29
4       0.564         0.034           0.35
5       0.534         0.057           0.31

Table 4
Optimal s-expected profit.

Design #         1                 2                 2c
s-correlation    Yes               No                Yes
N                12                12                —
Design           ⟨7, 7, 6, 5, 5⟩   ⟨6, 7, 7, 5, 5⟩   —
Cm               8.26              8.30              —
E[Rs]            0.7501            0.9122            0.7504
E[P]             120.86            198.17            120.54
(1 − α)          0.1585            0.7155            0.1592

Table 5
Optimal s-confidence.

Design #         1                 2                 2c
s-correlation    Yes               No                Yes
N                14                12                —
Design           ⟨5, 7, 5, 4, 5⟩   ⟨6, 7, 6, 5, 6⟩   —
Cm               7.12              8.32              —
E[Rs]            0.7058            0.9137            0.7509
E[P]             115.54            198.64            120.51
(1 − α)          0.1725            0.7224            0.1599

Fig. 2. Optimal s-confidence (y-axis: s-confidence, 0.1–0.9; x-axis: target profit, 65–200).

the s-confidence optimization problem with the same parameters as Example I for the s-correlated and s-uncorrelated cases, using Eqs. (5) and (4) respectively, as well as design 2 with s-correlation.

As in Example I, introducing s-correlation into design 2 achieves higher reliability and profit than design 1, but lower s-confidence. This occurs because Eq. (4) cannot determine the negative impact of s-correlation on redundancy. Examining the two designs indicates that the amount of redundancy in stages one, three, four, and five of design 1 is less than the redundancy in design 2. This lower level of redundancy in the s-correlated design can be explained by examining the component reliability and s-correlation specifications in Table 3 in light of Eq. (5). Components one, two, and three possess reliability less than 0.5 and could therefore benefit from higher levels of redundancy. However, the marginal improvement to reliability achieved by adding a sixth component to stages one or three does not outweigh the detrimental impact of s-correlation on parallel system reliability. Thus, Eq. (5) balances the reliability improvement obtainable with higher levels of redundancy against the negative influence of s-correlation on reliability. The lower redundancy in stages four and five may be explained in a similar manner.

Design 1 improves the s-confidence slightly to 0.1725 and only lowers the s-expected profit by 5.32. While this may be a reasonable tradeoff to increase the probability of obtaining the target profit, the s-confidence is still rather low. Thus, a target profit of P* = 200 may be unrealistically high, indicating that it may be appropriate to lower this goal. Figure 2 shows the s-confidence of the optimal s-correlated design determined from Eq. (17) for values of P* between 65 and 200. Decreasing P* improves the s-confidence that the target profit can be achieved. When P* = 120 the s-confidence is 0.6514, suggesting that the optimal design can realize a 20% profit with probability greater than 65%. Setting P* = F = 100 corresponds to a design that recovers the original manufacturing costs, and the optimal design attains an s-confidence of 0.7503. Furthermore, the s-expected profit is 118.52, demonstrating that this more conservative design approach achieves levels of profit very close to those optimizing the s-expected profit. The even more risk-averse design with P* = 80 minimizes the probability of a loss greater than 20% of F. In this scenario, the s-expected profit is still very good at 120.51, and the s-confidence of 0.8441 is also reasonably high.

Figure 3 shows the s-expected profit of the s-confidence optimizing design corresponding to the optimal solutions shown in Figure 2. In the interval P* = (170, 185), the s-expected profit of the s-confidence optimizing design suffers the largest decrease, with E[P] = 114.86. However, this lower profit is only 4.94% less than the profit of Design 1 in Example I, which maximized Eq. (15) and integrated s-correlation into the reliability calculation. Thus, for the problem studied here, the s-confidence maximizing approach improves the probability that the target profit P* is reached and never compromises the s-expected profit by more than 5%. This risk/reward tradeoff will be especially important when minimizing the probability of significant losses matters more than maximizing the s-expected profit.

Figure 3 also reveals the possibility of a nonlinear relationship between the target profit and the s-expected profit of the s-confidence optimizing design. This observation further illustrates the value of the approach to identify a design that best suits an organization's needs. For example, the designs associated with the target profits in the intervals (70, 80), (110, 120), and (150, 160) all achieve the maximum s-expected profit over the range considered, but s-confidence increases as the target profit decreases. Thus, there is no difference in these designs from the perspective of s-expected profit, but the more conservative designs corresponding to the interval (70, 80) may be preferable in order to minimize the probability of excessive losses. Figure 3 indicates that the s-expected profit rises and falls as the target profit increases because higher confidence in profit requires higher unit reliability, which requires higher levels of fault tolerance in the individual stages of the design. Moreover, higher levels of fault tolerance require additional cost, which lowers the number of units that can be built with the available funds. Thus, increasing the target profit can lower the number of units that can be built, leaving some funds unused because the increase in reliability would not be worth the corresponding increase in the s-confidence and s-expected profit of a design. Allowing variability in the number of units to be built thus introduces the fluctuations in s-expected profit, which justifies considering a range of target profits to identify a design that achieves a compromise between s-confidence and s-expected profit that the designer deems acceptable for their circumstances.

The run time of the genetic algorithm on Examples I and II was less than one second for both the s-uncorrelated and s-correlated identical-component cases, which used Eqs. (4) and (5) respectively to evaluate the fitness of candidate solutions.
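The sweep behind Figs. 2 and 3 can be sketched for a single fixed design with the binomial profit model. This is an illustration with assumed helper names: it holds design 1 of Example I fixed rather than re-optimizing the design at each target as the figures do, and it again assumes a full money-back refund (C_l = C_r). For a fixed design, confidence can only fall as the target rises:

```python
from math import comb

def confidence(N, R, Cr, Cl, Cm, F, P_star):
    """Pr{P > P*} under the binomial profit model of Eq. (15)."""
    return sum(comb(N, i) * R**i * (1 - R)**(N - i)
               for i in range(N + 1)
               if i * Cr - (N - i) * Cl + (F - N * Cm) > P_star)

# Sweep P* from 65 to 200 for design 1 of Example I (N = 12,
# E[Rs] = 0.7501); the per-target design search of Fig. 2 is omitted.
curve = [(p, confidence(12, 0.7501, 19.99, 19.99, 8.26, 100, p))
         for p in range(65, 201, 5)]

# Confidence is non-increasing in the target profit for a fixed design.
assert all(c1 >= c2 for (_, c1), (_, c2) in zip(curve, curve[1:]))
```

The re-optimized designs of Fig. 2 dominate this fixed-design curve, which is why the figure reports 0.8441 at P* = 80 while the fixed design achieves less.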

Fig. 3. s-expected profit (y-axis: s-expected profit, 114–121; x-axis: target profit, 65–200).

5.2. s-correlated non-identical components

This example considers a component selection problem for a system consisting of a single stage. Twenty-four components are available, and a subset of three to six components may be selected to design a parallel system. The upper bound of six components is imposed because the branch and bound algorithm is least efficient for a parallel system, since the only cut set is the set containing all n components. As a result, many child nodes can be generated that are not easily bounded by the algorithm. Furthermore, the number of possible Poisson combinations is 2^{n(n+1)/2}. Hence, the number of combinations in a tree for six components is 2,097,152, but 268,435,456 for a tree of seven components. In practice, the algorithm can produce a reliability estimate for a six-component parallel system in under 10 s. However, the algorithm may require 30 min or more to estimate the reliability of a parallel system composed of seven s-correlated components. Thus, we push the performance of the algorithm to its present limit for the purpose of optimizing system reliability. Table 6 shows the component parameters, including reliability and cost, while Table 7 provides the component s-correlations. The parameters C_r = 14.99, C_l = C_m, F = 75, P* = 130 indicate that the product is sold for 14.99, the seller offers a money-back guarantee, initial funds are 75, and the target profit is 130.

Table 6
Component parameters.

Comp. #   μi       Cost    Comp. #   μi       Cost
1         0.5522   1.48    13        0.4774   1.10
2         0.5533   1.48    14        0.4606   1.03
3         0.5821   1.64    15        0.5513   1.47
4         0.4934   1.18    16        0.5515   1.47
5         0.4679   1.06    17        0.5589   1.51
6         0.4576   1.01    18        0.4674   1.06
7         0.5969   1.72    19        0.5929   1.70
8         0.5391   1.41    20        0.5149   1.28
9         0.4977   1.20    21        0.4601   1.02
10        0.5392   1.41    22        0.4901   1.16
11        0.4801   1.12    23        0.4760   1.10
12        0.5180   1.30    24        0.5269   1.34
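The state-space counts quoted above can be checked in one line (illustrative function name):

```python
def poisson_state_space(n):
    """Number of zero/non-zero patterns over the n(n+1)/2 Poisson
    variables that encode an n-component s-correlated system."""
    return 2 ** (n * (n + 1) // 2)

print(poisson_state_space(6), poisson_state_space(7))
# prints: 2097152 268435456
```

The 128-fold jump from six to seven components (2^21 to 2^28 patterns) is what drives the runtime gap noted in the text.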

⎛1.0 0.0397 0.0104 0.0051⎞ ⎜ 1.0 0.0104 0.0174 ⎟ , Σ1 = ⎜ ⎟ 1.0 0.0159 ⎟ ⎜ ⎝ 1.0 ⎠

(18)

one sees that with the exception of ρ4,13 = 0.0397 most of the scorrelations are quite low. The s-correlation matrix for the components 〈6, 9, 13, 23〉 of design 2, however, is

⎛1.0 0.0340 0.0184 0.0366 ⎞ ⎜ 1.0 0.0380 0.0330 ⎟ , Σ2 = ⎜ ⎟ 1.0 0.01740 ⎟ ⎜ ⎝ 1.0 ⎠

(19)

5.2.1. Example III

Table 8 shows the solutions achieving near-optimal s-expected profit. In both designs 1 and 2, only three components are selected even though as many as six components are allowed. This is because the increase in reliability obtained from a fourth component does not improve profit sufficiently to outweigh the cost of the extra component. Thus, attempting to optimize s-expected profit encourages the production of more units of lower reliability. Examining the component parameters given in Tables 6 and 7 reveals that the components for design 1 possess low reliability (μ6 = 0.4577, μ14 = 0.4606, μ22 = 0.4900), but are also cheaper and exhibit lower s-correlations (ρ6,14 = 0.0161, ρ6,22 = 0.0059, ρ14,22 = 0.0098). Design 2 also selects components with lower reliabilities to reduce costs (μ4 = 0.4934, μ18 = 0.4674, μ21 = 0.4601), but chooses components with higher s-correlations (ρ4,18 = 0.0192, ρ4,21 = 0.0245, ρ18,21 = 0.0170). Evaluating design 2 with the s-correlations confirms that the system reliability, profit, and s-confidence are all lower than design 1.

5.2.2. Example IV

Table 9 shows the s-confidence-optimizing solutions. Both designs select four components, which lowers the number of units that can be produced but increases system reliability. This higher reliability lowers the probability of failures and increases the probability that the target profit is achieved. Thus, optimizing s-confidence tends to produce fewer units of higher reliability. Both designs select components 13 and 23. However, design 2 also picks components six and nine, whereas design 1 chooses components four and 14. All six of these components possess reliability lower than 0.5. This occurs because components with lower reliability also cost less. Thus, the s-confidence-optimizing approach also seeks to lower unit cost so that a larger number of units can be produced in pursuit of the target profit. Examining the s-correlation matrix of components 〈6, 9, 13, 23〉 comprising design 2, which contains four s-correlations in excess of 0.03, namely ρ6,9 = 0.0340, ρ6,23 = 0.0366, ρ9,13 = 0.0380, and ρ9,23 = 0.0330, reveals why design 1 avoids components six and nine in favor of components four and 14. Inspecting the reliability of these components, μ6 = 0.4576 and μ9 = 0.4977. Design 2 chooses these two components because of the tradeoff between reliability and cost. The reliabilities of the components unique to design 1 are μ4 = 0.4934 and μ14 = 0.4774. Component six possesses lower reliability than component 14 and higher s-correlation. Thus, design 1 achieves higher reliability, profit, and s-confidence because it considers s-correlation in addition to reliability and cost, illustrating the advantage of optimization procedures that explicitly quantify the negative impact of s-correlation on system reliability.

The genetic algorithm for Examples III and IV required approximately 10 s for the s-uncorrelated case with Eq. (4), but about 30 min for the case of s-correlated non-identical components, when the fitness of each possible solution was evaluated with the branch and bound algorithm. Thus, the excellent performance of the algebraic expression for s-correlated identical components suggests that algebraic expressions for non-identical components could enhance runtime appreciably, thereby improving the scalability of the approach and the size of the systems that can be optimized. More sophisticated optimization algorithms could also accelerate the solution of these problems, but analytical expressions for non-identical components will also be needed; these are the subject of ongoing research.
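The direction of the effect discussed in these examples, that positive failure correlation erodes the benefit of redundancy, can be illustrated with a small numerical sketch. The code below is not the paper's branch and bound procedure or its algebraic expression; it is a simple one-factor Gaussian latent-variable model (an assumption made here purely for illustration) in which identical components share a standard-normal common factor with loading sqrt(rho). Note that the latent loading rho is not numerically identical to the pairwise Bernoulli s-correlations ρi,j used in the paper.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse standard normal CDF by bisection (ample accuracy here)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def parallel_reliability(n, p, rho, steps=4000):
    """Reliability of an n-component parallel system of identical
    components, each working with probability p, whose failures are
    coupled through one shared standard-normal factor with loading
    sqrt(rho). rho = 0 recovers the s-independent result."""
    if rho == 0.0:
        return 1.0 - (1.0 - p) ** n
    z = phi_inv(p)
    a, b = -8.0, 8.0            # integration range for the shared factor
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        w = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
        # conditional on the shared factor, components fail independently
        q = 1.0 - phi((z - math.sqrt(rho) * w) / math.sqrt(1.0 - rho))
        total += weight * pdf * q ** n   # system fails only if all n fail
    return 1.0 - total * h
```

With component reliability near 0.49, even a modest positive loading lowers the four-component parallel system reliability below its s-independent value, mirroring the qualitative conclusion of Examples III and IV.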

6. Conclusions and future research

This paper develops an approach to optimize the reliability of discrete systems when component failures are s-correlated, so that the s-confidence in a target level of profitability is maximized. Our contributions are two-fold. We remove the widespread and dangerously simplifying assumption that fault-tolerant components fail in a statistically independent manner. We also address the risk posed by a larger than predicted number of failures in a finite batch of items by focusing on the s-confidence in the profit rather than the s-expected profit, which enables a manufacturer to mitigate this risk according to its tolerance for losses.

Table 7
Component s-correlations (lower triangle; columns 1–24 left to right, unit diagonal, matrix symmetric).

 1: 1.0
 2: 0.0361 1.0
 3: 0.0299 0.0118 1.0
 4: 0.0334 0.0121 0.0379 1.0
 5: 0.0084 0.0117 0.0218 0.0337 1.0
 6: 0.0152 0.0096 0.0356 0.0332 0.0160 1.0
 7: 0.0078 0.0084 0.0318 0.0228 0.0207 0.0185 1.0
 8: 0.0179 0.0370 0.0312 0.0051 0.0233 0.0123 0.0216 1.0
 9: 0.0074 0.0089 0.0172 0.0169 0.0295 0.0340 0.0275 0.0170 1.0
10: 0.0193 0.0316 0.0322 0.0376 0.0224 0.0344 0.0077 0.0249 0.0371 1.0
11: 0.0279 0.0060 0.0348 0.0310 0.0275 0.0309 0.0159 0.0250 0.0381 0.0176 1.0
12: 0.0057 0.0227 0.0273 0.0139 0.0056 0.0362 0.0163 0.0170 0.0356 0.0379 0.0169 1.0
13: 0.0177 0.0210 0.0256 0.0397 0.0294 0.0184 0.0330 0.0161 0.0380 0.0125 0.0324 0.0225 1.0
14: 0.0294 0.0310 0.0312 0.0104 0.0105 0.0161 0.0292 0.0283 0.0146 0.0331 0.0135 0.0345 0.0104 1.0
15: 0.0073 0.0282 0.0172 0.0361 0.0299 0.0064 0.0086 0.0224 0.0062 0.0334 0.0380 0.0164 0.0059 0.0058 1.0
16: 0.0098 0.0333 0.0113 0.0266 0.0146 0.0189 0.0223 0.0354 0.0071 0.0073 0.0356 0.0240 0.0175 0.0070 0.0104 1.0
17: 0.0208 0.0186 0.0346 0.0191 0.0205 0.0309 0.0386 0.0286 0.0250 0.0161 0.0153 0.0323 0.0385 0.0228 0.0343 0.0207 1.0
18: 0.0311 0.0262 0.0080 0.0192 0.0108 0.0155 0.0383 0.0256 0.0376 0.0302 0.0055 0.0080 0.0351 0.0160 0.0219 0.0274 0.0224 1.0
19: 0.0240 0.0125 0.0239 0.0240 0.0245 0.0170 0.0124 0.0116 0.0213 0.0315 0.0165 0.0112 0.0054 0.0218 0.0114 0.0319 0.0097 0.0286 1.0
20: 0.0114 0.0072 0.0327 0.0203 0.0185 0.0134 0.0089 0.0241 0.0221 0.0102 0.0076 0.0109 0.0060 0.0146 0.0095 0.0317 0.0203 0.0233 0.0336 1.0
21: 0.0101 0.0346 0.0134 0.0245 0.0338 0.0060 0.0275 0.0356 0.0130 0.0181 0.0156 0.0378 0.0359 0.0198 0.0217 0.0334 0.0340 0.0170 0.0257 0.0338 1.0
22: 0.0082 0.0198 0.0355 0.0162 0.0317 0.0059 0.0369 0.0363 0.0265 0.0346 0.0280 0.0145 0.0245 0.0098 0.0234 0.0266 0.0175 0.0371 0.0200 0.0283 0.0089 1.0
23: 0.0367 0.0073 0.0217 0.0051 0.0103 0.0366 0.0335 0.0080 0.0330 0.0335 0.0186 0.0156 0.0174 0.0159 0.0164 0.0299 0.0338 0.0242 0.0258 0.0324 0.0194 0.0199 1.0
24: 0.0380 0.0389 0.0381 0.0276 0.0345 0.0076 0.0180 0.0069 0.0397 0.0356 0.0351 0.0341 0.0274 0.0168 0.0091 0.0139 0.0085 0.0065 0.0112 0.0246 0.0151 0.0286 0.0254 1.0
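To make the design comparisons in Section 5.2 easy to reproduce, the handful of Table 7 entries cited in the text can be placed in a small lookup table. This is a convenience sketch; only the pairs cited in the discussion are transcribed, not the full matrix, and the function names are my own.

```python
from itertools import combinations

# Pairwise s-correlations cited in Section 5.2, transcribed from Table 7;
# the matrix is symmetric with unit diagonal.
RHO = {
    (6, 9): 0.0340, (6, 23): 0.0366, (9, 13): 0.0380, (9, 23): 0.0330,
    (6, 14): 0.0161, (6, 22): 0.0059, (14, 22): 0.0098,
    (4, 18): 0.0192, (4, 21): 0.0245, (18, 21): 0.0170,
}

def rho(i, j):
    """Look up the s-correlation between components i and j."""
    if i == j:
        return 1.0
    return RHO[(min(i, j), max(i, j))]  # raises KeyError for untranscribed pairs

def max_pairwise(design):
    """Largest s-correlation between any two components in a design."""
    return max(rho(i, j) for i, j in combinations(design, 2))
```

For the Example III designs, `max_pairwise((6, 14, 22))` returns 0.0161 while `max_pairwise((4, 18, 21))` returns 0.0245, confirming the observation in the text that design 1 selects the less correlated component set.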


Table 8
Optimal s-expected profit.

Design #   s-correlation   N    Design        Cm     E[Rs]    E[P]     (1 − α)%
1          Yes             23   〈6, 14, 22〉   3.20   0.8460   239.91   0.9823
2          No              23   〈4, 18, 21〉   3.26   0.8543   244.34   0.9872
2c         Yes             –    –             –      0.8443   237.44   0.9812

(Design 2c reports design 2 re-evaluated with the s-correlations included.)

Table 9
Optimal s-confidence.

Design #   s-correlation   N    Design            Cm     E[Rs]    E[P]     (1 − α)%
1          Yes             17   〈4, 13, 14, 23〉   4.40   0.9166   212.45   0.9894
2          No              17   〈6, 9, 13, 23〉    4.41   0.9254   216.80   0.9933
2c         Yes             –    –                 –      0.9073   207.56   0.9836

(Design 2c reports design 2 re-evaluated with the s-correlations included.)

This differs from previous approaches, which optimize the reliability of a single item or the s-expected profitability of items and assume component failures are statistically independent. Techniques for identical and non-identical s-correlated components were integrated into the optimization. The algebraic expression for s-correlated identical components requires that components possess equal reliability and a common s-correlation coefficient, whereas the numerical algorithm for non-identical components allows components to possess unequal means and distinct s-correlations. These two techniques were demonstrated in the context of the redundancy allocation and component selection problems, respectively. Genetic algorithms were implemented to solve these problems. Explicit consideration of s-correlation ensured that redundancy allocation did not implement excessively high levels of fault-tolerance in stages that would not benefit because the s-correlation between components was too high. The results thus demonstrate that explicitly considering s-correlations can identify better designs than traditional approaches, which assume component failures are statistically independent. Future research will pursue algebraic expressions to evaluate the reliability of non-identical s-correlated components. Additional modeling generalizations are also possible, including consideration of the variance and higher moments of component reliability, or common life distributions, and the corresponding impact of these more detailed characterizations of component reliability on the optimality of solutions.
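The distinction between s-expected profit (Table 8) and s-confidence in a target profit (Table 9) can be sketched numerically. The profit model below is a deliberately simplified stand-in, not the paper's model: the revenue, cost, and target figures are illustrative assumptions. With N units, each working independently with system reliability R, the number of working units is binomial, and the s-confidence is the probability that the realized profit meets the target.

```python
import math

def expected_profit(n_units, r_sys, revenue, unit_cost):
    """s-expected profit for a batch of n_units."""
    return n_units * (r_sys * revenue - unit_cost)

def profit_confidence(n_units, r_sys, revenue, unit_cost, target):
    """P(profit >= target): each unit works independently with
    probability r_sys, a working unit earns `revenue`, and every unit
    built costs `unit_cost`, so profit = k * revenue - n_units * unit_cost
    when k of the units work (k is binomially distributed)."""
    conf = 0.0
    for k in range(n_units + 1):
        if k * revenue - n_units * unit_cost >= target:
            conf += (math.comb(n_units, k)
                     * r_sys ** k * (1.0 - r_sys) ** (n_units - k))
    return conf
```

Under this toy model, a design with many cheap, less reliable units can maximize `expected_profit` while a smaller batch of more reliable units maximizes `profit_confidence`, which is the qualitative tradeoff between Tables 8 and 9.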

Appendix A. Simple genetic algorithm implementation

The parameters of the SGA [14] employed in Examples I and II are provided in Table A.10. To clarify the bit-specific mutation strategy, Fig. A.4 shows an example chromosome encoding a possible solution in the population. The chromosome encodes the redundancy of each stage of the system as a binary number. For example, the chromosome in Fig. A.4 allocates three components to stage one, five components to stage two, and six components to stage k. This example encodes the redundancy level of each stage with three bits.

Mutation is a strategy that avoids convergence to a local optimum by occasionally flipping bits from 0 to 1 or vice versa. However, setting the mutation probability too high can inhibit progress toward a global optimum. Note that bit flipping does not lead to uniform changes in the redundancy allocation problem because changing the first, second, or third bit of a stage's redundancy allocation will increase or decrease the number of components by four, two, or one respectively. Clearly, high mutation rates in the higher-order bits of a stage's redundancy may inhibit progress toward a global optimum. Thus, the high-order bits are assigned lower mutation rates, while the lower-order bits are allowed to mutate more frequently. Like Fig. A.4, Examples I and II used three bits to encode redundancy. The bit-specific mutation rates were set to 〈0.05, 0.1, 0.15〉. This strategy allowed variation in the search process without introducing excessive instability into convergence toward an optimal solution.

In Examples III and IV, the chromosome encoding solutions consisted of twenty-four bits, one for each component. A 1 in the ith bit of the chromosome indicates that component i was included in the parallel system; a 0 means that it was excluded. The mutation rate of each bit was 0.05. The number of generations, solutions in each generation, and crossover probability were the same as those given in Table A.10.

In all four examples, the fitness of a solution was computed as f = −log(α). This function maps the s-confidence in a solution from the interval (0, 1) to the interval (0, ∞), ensuring that a solution with very high s-confidence (α ≈ 0) was assigned a large fitness value. Thus, the fitness of a solution increased exponentially with a linear increase in s-confidence, improving the probability that a solution with better s-confidence was selected for crossover and mutation more often. Our choice of parameters for the SGA demonstrated very good performance, enabling quick convergence to a near-optimal solution. Nevertheless, the interested reader may wish to experiment with other parameter settings and genetic algorithm implementations to further enhance performance.

Table A.10
SGA parameters.

Generations   100
Population    100
Crossover     100%
Mutation      Bit specific

Fig. A.4. Redundancy encoding.
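The encoding, bit-specific mutation, and fitness mapping described in this appendix can be sketched as follows. This is an illustrative reimplementation rather than the authors' code; the function names and the example chromosome '011101110' (chosen so that the three stages decode to three, five, and six components, as in Fig. A.4) are assumptions made here.

```python
import math
import random

BITS_PER_STAGE = 3
MUT_RATES = (0.05, 0.10, 0.15)   # high-order bit mutates least often

def decode(chrom):
    """Split a bit string into per-stage redundancy levels."""
    return [int(chrom[i:i + BITS_PER_STAGE], 2)
            for i in range(0, len(chrom), BITS_PER_STAGE)]

def mutate(chrom):
    """Flip each bit with a rate tied to its significance within its stage."""
    bits = []
    for i, b in enumerate(chrom):
        if random.random() < MUT_RATES[i % BITS_PER_STAGE]:
            b = '1' if b == '0' else '0'
        bits.append(b)
    return ''.join(bits)

def crossover(a, b):
    """Single-point crossover of two equal-length chromosomes."""
    pt = random.randrange(1, len(a))
    return a[:pt] + b[pt:], b[:pt] + a[pt:]

def fitness(alpha):
    """f = -log(alpha): very high s-confidence (alpha near 0) maps to a
    large fitness value, as described in the appendix."""
    return -math.log(alpha)
```

For example, `decode('011101110')` yields `[3, 5, 6]`, matching the stage allocations illustrated by Fig. A.4.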

References

[1] Kuo W, Prasad V, Tillman F, Hwang C. Optimal reliability design: fundamentals and applications. New York, NY: Cambridge University Press; 2001.
[2] Pham H. Handbook of reliability engineering. New York, NY: Springer-Verlag; 2003. p. 19–39 [chapter "Reliability of Systems with Multiple Failure Modes"].
[3] Amari S, Pham H, Dill G. Optimal design of k-out-of-n:G subsystems subjected to imperfect fault-coverage. IEEE Trans Reliab 2004;53(4):567–75.
[4] Amari S, Pham H. A novel approach for optimal cost-effective design of complex repairable systems. IEEE Trans Syst Man Cybern A Syst Hum 2007;37(3):406–15.
[5] Ogryczak W, Ruszczynski A. From stochastic dominance to mean-risk models: semideviations as risk measures. Eur J Oper Res 1999;116(1):33–50.
[6] Lopez J. Regulatory evaluation of value-at-risk models. Working Paper 96-51, Wharton Financial Institutions Center; 1996.
[7] Rockafellar R, Uryasev S. Conditional value-at-risk for general loss distributions. J Bank Financ 2002;26(7):1443–71.
[8] Fiondella L, Zeephongsekul P. Reliability of systems with identically distributed correlated components. In: Proceedings of the ISSAT international conference on reliability and quality in design, Vancouver, Canada; 2011. p. 26–30.
[9] Fiondella L, Rajasekaran S, Gokhale S. Efficient software reliability analysis with correlated component failures. IEEE Trans Reliab 2013;62(1):244–55.
[10] Fiondella L, Zeephongsekul P. Trivariate Bernoulli distribution with application to software fault tolerance. Ann Oper Res 2015:1–15. http://dx.doi.org/10.1007/s10479-015-1798-4.
[11] Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
[12] Wolsey L, Nemhauser G. Integer and combinatorial optimization. Wiley series in discrete mathematics and optimization. New York, NY: John Wiley & Sons; 1999.
[13] Bazaraa M, Sherali H, Shetty C. Nonlinear programming: theory and algorithms. Hoboken, NJ: John Wiley & Sons; 1979.
[14] Goldberg D. Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley; 1989.
[15] Kirkpatrick S, Gelatt C, Vecchi M. Optimization by simulated annealing. Sci New Ser 1983;220(4598):671–80.
[16] Glover F. Tabu search, part I. ORSA J Comput 1989;1(3):190–206.
[17] Glover F. Tabu search, part II. ORSA J Comput 1990;2(1):4–32.
[18] Marseguerra M, Zio E, Podofillini L, Coit D. Optimal design of reliable network systems in presence of uncertainty. IEEE Trans Reliab 2005;54(2):243–53.
[19] Guilani P, Azimi P, Niaki S, Niaki S. Redundancy allocation problem of a system with increasing failure rates of components based on Weibull distribution: a simulation-based optimization approach. Reliab Eng Syst Saf 2016;152:187–96.
[20] Marseguerra M, Zio E, Martorell S. Basics of genetic algorithms optimization for RAMS applications. Reliab Eng Syst Saf 2006;91:977–91.
[21] Coit D, Konak A. Multiple weighted objectives heuristic for the redundancy allocation problem. IEEE Trans Reliab 2006;55(3):551–8.
[22] Taboada H, Baheranwala F, Coit D, Wattanapongsakorn N. Practical solutions for multi-objective optimization: an application to system reliability design problems. Reliab Eng Syst Saf 2007;92:314–22.
[23] Tian Z, Levitin G, Zuo MJ. A joint reliability-redundancy optimization approach for multi-state series-parallel systems. Reliab Eng Syst Saf 2009;94(10):1568–76.
[24] Khalili-Damghani K, Amiri M. Solving binary-state multi-objective reliability redundancy allocation series-parallel problem using efficient epsilon-constraint, multi-start partial bound enumeration algorithm, and DEA. Reliab Eng Syst Saf 2012;103:35–44.
[25] Okafor E, Sun Y. Multi-objective optimization of a series-parallel system using GPSIA. Reliab Eng Syst Saf 2012;103:61–71.
[26] Liu Y, Huang HZ, Wang Z, Li Y, Yang Y. A joint redundancy and imperfect maintenance strategy optimization for multi-state systems. IEEE Trans Reliab 2013;62(2):368–78.
[27] Lai CM, Yeh WC. Two-stage simplified swarm optimization for the redundancy allocation problem in a multi-state bridge system. Reliab Eng Syst Saf 2016;156:148–58.
[28] Xu Y, Liao H. Reliability analysis and redundancy allocation for a one-shot system containing multifunctional components. IEEE Trans Reliab 2016;65(2):1045–57.
[29] Pham H, Pham M. Optimal design of {k,n-k+1}-out-of-n:F systems (subject to two failure modes). IEEE Trans Reliab 1991;40(5):559–62.
[30] Pham H, Malon D. Optimal design of systems with competing failure modes. IEEE Trans Reliab 1994;43(2):251–4.
[31] Pham H. Optimal number of components for a parallel system with competing failure modes. Int J Syst Sci 1992;23(3):449–55.
[32] Pham H. Optimal design of parallel-series systems with competing failure modes. IEEE Trans Reliab 1992;41(4):583–7.
[33] Zhu W, Fouladirad M, Berenguer C. A multi-level maintenance policy for a multi-component and multifailure mode system with two independent failure modes. Reliab Eng Syst Saf 2016;153:50–63.
[34] Coit D, English J. System reliability modeling considering the dependence of component environmental influences. In: Proceedings of the annual reliability and maintainability symposium; 1999. p. 214–8.
[35] Song S, Coit D, Feng Q, Peng H. Reliability analysis for multi-component systems subject to multiple dependent competing failure processes. IEEE Trans Reliab 2014;63(1):331–45.
[36] Song S, Coit D, Feng Q. Reliability analysis of multiple-component series systems subject to hard and soft failures with dependent shock effects. IIE Trans 2016;48(8):720–35.
[37] Rubinstein R, Levitin G, Lisnianski A, Ben-Haim H. Redundancy optimization of static series-parallel reliability models under uncertainty. IEEE Trans Reliab 1997;46(4):503–11.
[38] Coit D, Smith A. Considering risk profiles in design optimization for series-parallel systems. In: Proceedings of the annual reliability and maintainability symposium; 1997. p. 271–7.
[39] Coit D, Smith A. Genetic algorithm to maximize a lower-bound for system time-to-failure with uncertain component Weibull parameters. Comput Ind Eng 2002;41(4):423–40.
[40] Coit D, Jin T, Wattanapongsakorn N. System optimization with component reliability estimation uncertainty: a multi-criteria approach. IEEE Trans Reliab 2004;53(3):369–80.
[41] Taflanidis A, Beck J. Stochastic subset optimization for reliability optimization and sensitivity analysis in system design. Comput Struct 2009;87(5–6):318–31.
[42] Tekiner-Mogulkoc H, Coit D. System reliability optimization considering uncertainty: minimization of the coefficient of variation for series-parallel systems. IEEE Trans Reliab 2011;60(3):667–74.
[43] Feizollahi M, Modarres M. The robust deviation redundancy allocation problem with interval component reliabilities. IEEE Trans Reliab 2012;61(4):957–65.
[44] Feizollahi MJ, Ahmed S, Modarres M. The robust redundancy allocation problem in series-parallel systems with budgeted uncertainty. IEEE Trans Reliab 2014;63(1):239–50.
[45] Chatwattanasiri N, Coit D, Wattanapongsakorn N. System redundancy optimization with uncertain stress-based component reliability: minimization of regret. Reliab Eng Syst Saf 2016;154:73–83.
[46] Murthy D, Blischke W. Warranty management and product manufacture. Springer series in reliability engineering. New York, NY: Springer-Verlag; 2005.
[47] Lai C, Xie M. Stochastic ageing and dependence for reliability. New York, NY: Springer; 2006.
[48] Chae K, Clark G. System reliability in the presence of common-cause failures. IEEE Trans Reliab 1986;35(1):32–5.
[49] Vaurio J. An implicit method for incorporating common-cause failures in system analysis. IEEE Trans Reliab 1998;47(2):173–80.