Reference optimality criterion for planning accelerated life testing


Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi

Ancha Xu a,b,∗, Yincai Tang c

a College of Mathematics and Information Science, Wenzhou University, Zhejiang, 325035, China
b School of Natural and Applied Sciences, Northwestern Polytechnical University, Xi'an 710072, China
c School of Finance and Statistics, East China Normal University, Shanghai, 200241, China

Article history: Received 28 April 2014; received in revised form 9 March 2015; accepted 16 June 2015; available online xxxx.

Keywords: Accelerated life testing; Bayesian approach; Exponential distribution; Reference prior; Shannon information

Abstract: Most of the current literature on planning accelerated life testing is based on the D-optimality and V-optimality criteria. Such methods minimize the generalized asymptotic variance of the maximum likelihood estimators of the model parameters, or that of a quantile lifetime. Similarly, the existing Bayesian planning criteria are usually based on the posterior variance of a quantile lifetime. In this paper, we present a coherent framework for planning accelerated life testing. Our approach is based on the expected Shannon information between the prior and posterior density functions, which is also the idea underlying the derivation of reference priors in Bayesian statistics; we therefore refer to it as the reference optimality criterion. The optimal design is selected by maximizing the expected Shannon information. Two optimization algorithms, one based on a large-sample approximation and the other on Monte Carlo simulation, are developed to find the optimal plans. Several examples are investigated for illustration. © 2015 Elsevier B.V. All rights reserved.

1. Introduction

Accelerated life testing (ALT) is commonly used to obtain failure-time data for products with high reliability. Compared with standard life-testing methods, ALT can greatly save time and expense. In ALT, specimens are tested at high stress levels to induce early failures, and the failure-time data collected from the ALT are extrapolated to estimate reliability characteristics (e.g., reliability, mean time to failure (MTTF), or some quantile of the lifetime distribution) at the normal stress condition. The fundamental theories and methodologies for lifetime analysis have been established over the last three decades; see Meeker and Escobar (1998) for a comprehensive survey. However, the precision of the reliability estimation depends on the design of the ALT plan. Nelson (1990) indicated that optimal testing plans allow the maximum possible information to be obtained from the tests, and that optimal testing can save 25%–50% of the specimens compared with unplanned testing at the same estimator precision.

The goal of ALT optimal design is to find an optimal experimental plan (specifying the number and magnitude of the accelerated stress levels, and the number of items to be tested at these stress levels) according to some criterion. Various optimization criteria have been used to design ALT plans. The most commonly used one is the V-optimality criterion, which develops plans that minimize the asymptotic variance of the maximum likelihood (ML) estimator of a specified quantile lifetime, or of a function of the model parameters, at the normal stress condition. In contrast, for a model with two stress



∗ Corresponding author at: College of Mathematics and Information Science, Wenzhou University, Zhejiang, 325035, China. E-mail address: [email protected] (A. Xu).

http://dx.doi.org/10.1016/j.jspi.2015.06.002 0378-3758/© 2015 Elsevier B.V. All rights reserved.


factors, Escobar and Meeker (1995) presented a criterion called the D-optimality criterion, which minimizes the determinant of the variance–covariance matrix of the estimates of the model parameters, with the aim of estimating them well as a whole. The motivation for the D-optimality criterion is that the volume of an approximate joint confidence region for the model parameters is inversely proportional to an estimate of the square root of the determinant of the variance–covariance matrix. For the scenario of a single stress factor, the V-optimality criterion has also been used by several authors; see Liu and Qiu (2011), Guan and Tang (2012) and Ye et al. (2013). These two criteria serve different purposes and lead to different results.

Determining the optimal plans under these two criteria usually requires initial values of the model parameters, which may be obtained from previous experience with similar products and materials, or from engineering judgment. Consequently, the robustness of such optimal plans to different initial values should be studied; see Fard and Li (2009) and Tsai et al. (2012). Besides, the V-optimality and D-optimality criteria are based on large-sample approximations, while ALTs are usually subject to the constraint that the available number of test units is small, either because of the high cost of the units or the limited availability of prototype units. In these cases, test planners may need to know the smallest possible number of units needed, how to choose the stress levels, and how to allocate the units to achieve a specified precision of the ML estimators. Ma and Meeker (2010) considered test plans for small sample sizes, restricting their analysis to constant-stress three-level compromise test plans.
A more general method for dealing with small sample sizes is the Bayesian approach, which combines prior information with data to provide better statistical precision. Incorporating prior information into test planning has been considered by several authors; see Zhang and Meeker (2006) and Yuan et al. (2012). Erkanli and Soyer (2000) pointed out that the Bayesian approach to the optimal design problem requires the specification of three components: (i) a utility (loss) function that reflects the consequences of selecting a specific design; (ii) a probability model; and (iii) a prior distribution reflecting the designer's beliefs about all unknown quantities. The posterior variance is used as the utility function in most of the literature on Bayesian planning. Verdinelli et al. (1993) used as utility the Shannon information between the marginal distribution of the data and the posterior predictive distribution; however, their method is quite limited and hard to implement, and its main drawback is that when the sample size is large enough, the optimal design becomes arbitrary (see the proof in Appendix A).

This paper has two goals.

1. Propose a Bayesian criterion for planning ALT. We achieve this by setting the utility function to the Shannon information between the prior and posterior density functions, and defining the criterion as the expectation of this Shannon information. The optimal design maximizes the expected Shannon information. The advantage of this Bayesian criterion is that the expected Shannon information can be interpreted as the amount of information provided by the observed data; an optimal test plan under this criterion thus maximizes the information from an experiment.

2. Present algorithms to find optimal test plans.
A large-sample approximation is used first, and a simple formula is obtained for finding an optimal design; in the examples investigated, the resulting optimal designs coincide with those based on the D-optimality and V-optimality criteria. When the sample size is not large enough, a Monte Carlo simulation algorithm is proposed, and we discuss its procedures in detail for potential use.

This paper proceeds as follows. Model assumptions and the prior distributions of the parameters are given in Sections 2 and 3, respectively. The reference planning criterion is proposed in Section 4. In Section 5, we present two algorithms, a large-sample approximation and a Monte Carlo simulation algorithm, to find the optimal designs, and several examples are given for illustration. We compare our method with other criteria in Section 6. Finally, we conclude the paper and discuss possible future research.

2. Model assumptions

Let T be the lifetime of a product, and suppose that T has a log-location–scale distribution with cumulative distribution function (cdf)

F(t | µ, σ) = Φ( (log(t) − µ)/σ ),   (2.1)

where µ and σ are the location and scale parameters of the log lifetime, and Φ(·) is a standardized location–scale cdf. When Φ(·) = Φnor(·), the cdf of the standard normal distribution, T is lognormal; T follows a Weibull distribution when Φ(x) = 1 − exp[−exp(x)]. If T follows an exponential distribution, then σ = 1 and Φ(x) = 1 − exp[−exp(x)]. Let S denote the accelerating variable; the most commonly used acceleration models can be written as

µ = a + bϕ(S),   (2.2)

where a and b > 0 are unknown parameters, and ϕ(S) is a given decreasing function of the stress level S. In particular, ϕ(S) = 1/S for the Arrhenius model and ϕ(S) = −log(S) for the inverse power law model. The scale σ does not depend on S, which means that the failure mechanism is the same under different stress levels; S should therefore stay within a certain range, up to the highest stress level Sh, so that the failure mechanism does not change. Let S0 be the use stress condition, and


S0 < S1 < S2 < · · · < Sk = Sh be the stress levels used in the ALT. The ALT may be a constant-stress, step-stress or progressive-stress test; different types of ALT have different settings (see Nelson, 1990, and Meeker and Escobar, 1998, for details). Whichever type is adopted, the model parameters are a, b and σ, denoted by θ = (a, b, σ). Usually, the quantity of interest is tq(S0), the q quantile of the life distribution at the use stress condition. From (2.1) and (2.2), we have

tq(S0) = exp{ a + bϕ(S0) + Φ^(−1)(q) σ }.

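As a quick numerical illustration of this quantile formula, the lognormal case can be cross-checked against a standard library parameterization. The planning values below (a, b, σ, S0, q, and the Arrhenius link ϕ(S) = 1/S) are hypothetical placeholders, not values from the paper:

```python
import numpy as np
from scipy.stats import norm, lognorm

# Hypothetical planning values (for illustration only), Arrhenius link phi(S) = 1/S:
a, b, sigma, S0, q = 2.0, 1500.0, 0.5, 300.0, 0.10

mu = a + b / S0                            # location parameter of log-lifetime at S0, eq. (2.2)
t_q = np.exp(mu + norm.ppf(q) * sigma)     # q quantile at the use stress, lognormal case

# Cross-check with scipy's lognormal parameterization (s = sigma, scale = exp(mu)):
t_q_scipy = lognorm(s=sigma, scale=np.exp(mu)).ppf(q)
assert np.isclose(t_q, t_q_scipy)
```

The agreement holds because log T is Normal(µ, σ) under the lognormal case of (2.1).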

3. The prior specification

The Bayesian method has many advantages in analyzing data; one of them is that prior information can be incorporated. A prior distribution can be elicited from experts' opinions or historical data. In the ALT model, θ = (a, b, σ) are the unknown parameters. Assume that the prior distribution has been elicited from experts' opinions or historical data, and denote it by π(θ). In practice, information about θ is hard to collect, and the optimal design is sensitive to the priors of a and b (see Zhang and Meeker, 2006, and Tang and Liu, 2010). Thus, we take the following transformation:

α = tq(S0), β = g(b), σ = σ,   (3.1)

where g(b) is a monotone function of b; e.g., β = g(b) = exp{b[ϕ(S0) − ϕ(Sk)]} is the acceleration factor of stress level Sk relative to S0. Denote η = (α, β, σ). When planning the constant-stress ALT, Zhang and Meeker (2006) argued for the independence of tq(S0), b and σ, and chose to specify independent priors for them; the joint prior for a, b and σ can then be obtained by transformation of random variables. The reparameterization we adopt is slightly different, but the independence of α, β and σ still holds according to Zhang and Meeker (2006). Assume we have independent prior information on α, β and σ, so that the prior distribution is formulated as π(η) = πα(α)πβ(β)πσ(σ). The joint prior distribution of θ can then be derived via a multivariate transformation of random variables.

4. Reference optimality criterion

Let D be a specific design and t the observed data. Denote the likelihood function by L(t | D, θ) or L(t | D, η). By Bayes' theorem, the posterior density functions of the model parameters θ and η are

p(θ | t, D) = L(t | D, θ)π(θ)/pt(t | D) and p(η | t, D) = L(t | D, η)π(η)/pt(t | D),

where pt(t | D) is the marginal density function of t, defined by

pt(t | D) = ∫ L(t | D, θ)π(θ) dθ = ∫ L(t | D, η)π(η) dη.

If the goal of an ALT is to estimate all the parameters in the model, then we define the utility function as the Shannon information between the prior and posterior density functions of θ. That is,

U(t, D) = ∫ p(θ | t, D) log[ p(θ | t, D)/π(θ) ] dθ.   (4.1)

If a quantile α = tq(S0) is of interest, then the Shannon information between the prior and posterior density functions of α is used instead. That is,

U(t, D) = ∫ p(α | t, D) log[ p(α | t, D)/π(α) ] dα,   (4.2)

where p(α | t, D) is the posterior density function of α, obtained as

p(α | t, D) = ∫∫ p(η | t, D) dβ dσ.

But for a given plan D, the Shannon information (4.1) or (4.2) depends on the unobserved data t. The pre-posterior expectation of the Shannon information over pt(t | D) can be used to obtain a Bayesian planning criterion. Thus the optimal plan is the one that maximizes

M(D) = E_{t|D}[U(t, D)]
     = ∫ pt(t | D) { ∫ p(θ | t, D) log[ p(θ | t, D)/π(θ) ] dθ } dt for (4.1), or
     = ∫ pt(t | D) { ∫ p(α | t, D) log[ p(α | t, D)/π(α) ] dα } dt for (4.2).   (4.3)
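To make the expectation in (4.3) concrete, it can be approximated by simulating data from the prior predictive distribution and averaging the prior-to-posterior Shannon information. A minimal sketch for a toy conjugate Beta–Bernoulli model — not the ALT model of this paper; the prior parameters, replication count, and sample sizes are illustrative:

```python
import numpy as np
from scipy.special import gammaln, digamma

rng = np.random.default_rng(0)

def kl_beta(a1, b1, a0, b0):
    """Closed-form KL divergence KL(Beta(a1, b1) || Beta(a0, b0))."""
    logB = lambda a, b: gammaln(a) + gammaln(b) - gammaln(a + b)
    return (logB(a0, b0) - logB(a1, b1)
            + (a1 - a0) * digamma(a1) + (b1 - b0) * digamma(b1)
            + (a0 - a1 + b0 - b1) * digamma(a1 + b1))

def expected_info(n, a0=2.0, b0=2.0, reps=4000):
    """M(D): average prior-to-posterior Shannon information over the prior predictive."""
    theta = rng.beta(a0, b0, size=reps)   # parameters drawn from the prior
    x = rng.binomial(n, theta)            # data drawn given each parameter
    return np.mean(kl_beta(a0 + x, b0 + n - x, a0, b0))

# A design with more observations carries more expected Shannon information:
assert expected_info(20) > expected_info(5)
```

The conjugacy makes each inner Shannon information available in closed form; in the ALT setting the same expectation is instead approximated or simulated, as developed in Section 5.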

Actually, Lindley (1956) was the first to use the expected Shannon information (4.3) in the context of Bayesian inference. He interpreted M (D) as the expected information about the model parameters to be provided by the data t when the prior densities of the model parameters are given. Berger et al. (2009) interpreted the expected Shannon information (4.3) as the


amount of information from the observed data given the prior: the sharper the prior, the smaller the amount of information to be expected from the data. Thus, given the prior, the Bayesian planning criterion finds an optimal plan that maximizes the amount of information from the data. Bernardo (1979) utilized this quantity to develop a noninformative prior, the reference prior, which is now very popular in Bayesian analysis; see Berger and Bernardo (1989), Ghosal (1997), Xu and Tang (2011, 2012) and Guan et al. (2013). We therefore call this Bayesian planning criterion the reference optimality criterion (ROC).

Criterion (4.3) involves high-dimensional integration. Usually, there is no closed form, and exact numerical computation is intractable, so approximations or simulations are needed. A large-sample approximation provides a useful basis for optimization.

4.1. Example 1: constant-stress ALT designs

Consider a k-level constant-stress ALT. Assume that the lifetime Xi at stress level Si is exponential with failure rate λi, and that the acceleration function is given by the power law model

λi = a Si^b, or log λi = log a + b log Si, i = 0, 1, 2, . . . , k,

where b is known and a is an unknown parameter. Vopatek (1992), Verdinelli et al. (1993) and Erkanli and Soyer (2000) considered a special case of this model with b = 1 and k = 1. Assume that there are n items on test, with ni items allocated to stress Si, and let Ti denote the total time on test at stress level Si. The objective of the design is to determine the optimal allocation of test items to the different stress levels. Let T = (T1, . . . , Tk). Then the likelihood function is

L(a | T, D) = a^n ( ∏_{i=1}^k Si^{b ni} ) exp( −a Σ_{i=1}^k Si^b Ti ).

A noninformative prior is assigned to a, that is, π(a) ∝ 1/a. Thus, we have

p(a | T, D) = ( Σ_{i=1}^k Si^b Ti )^n a^{n−1} exp( −a Σ_{i=1}^k Si^b Ti ) / Γ(n),

pT(T | D) = Γ(n) ∏_{i=1}^k Si^{b ni} / ( Σ_{i=1}^k Si^b Ti )^n.
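Writing c = Σ_{i=1}^k Si^b Ti, the posterior above is a Gamma(n, rate c) density. The Shannon information (4.1) against the unnormalized prior 1/a can then be evaluated by numerical integration; a sketch with illustrative values of n and c, showing that the result does not depend on c, and hence not on the data or the allocation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def shannon_info(n, c):
    """U(T, D) = ∫ p(a|T, D) log[p(a|T, D)/(1/a)] da for the Gamma(n, rate=c) posterior."""
    log_post = lambda a: n * np.log(c) + (n - 1) * np.log(a) - c * a - gammaln(n)
    integrand = lambda a: np.exp(log_post(a)) * (log_post(a) + np.log(a))
    value, _ = quad(integrand, 0.0, np.inf)
    return value

# The information is identical for very different values of c = sum_i S_i^b T_i:
assert np.isclose(shannon_info(10, 0.5), shannon_info(10, 50.0), atol=1e-5)
```

This numerical invariance anticipates the closed form derived next.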

Then the Shannon information (4.1) can be obtained as U(T, D) = nψ(n) − n − log Γ(n), where ψ(·) is the digamma function. Thus, M(D) = E_{T|D}[U(T, D)] = nψ(n) − n − log Γ(n), which does not depend on the ni. Therefore, when b is known in the power law model, it does not matter how many items are allocated to each stress level. This result was also noted by Vopatek (1992), Verdinelli et al. (1993) and Erkanli and Soyer (2000) for the special case of the power law with b = 1 and k = 1. When the failure rate at the use stress condition, λ0 = a S0^b, is of interest, the Shannon information (4.2) can be calculated, and M(D) is again independent of the ni.

5. Optimal algorithms

Criterion (4.3) involves calculation of the Shannon information and its marginal expectation over all possible data. Generally, there is no closed form, and exact numerical computation is intractable, so approximations or simulations are needed. We present two algorithms to find the optimal plans: one based on a large-sample approximation and the other based on Monte Carlo simulation.

5.1. Large-sample approximation

Let Iθ(D) and Iη(D) denote the expected Fisher information matrices of θ and η for the candidate plan D, respectively.

• When the estimates of θ are of interest in an ALT, we use (4.1) as the utility function. Then, as shown in Clarke and Barron (1990), under some regularity conditions and as n → ∞,

M(D) = (d/2) log( n/(2πe) ) + ∫ π(θ) log{|Iθ(D)|}^{1/2} dθ − ∫ π(θ) log(π(θ)) dθ + o(1),   (5.1)

where d is the dimension of θ and |Iθ(D)| is the determinant of Iθ(D). Thus, maximizing M(D) is equivalent to maximizing ∫ π(θ) log{|Iθ(D)|}^{1/2} dθ. Denote

H1(D) = ∫ π(θ) log{|Iθ(D)|}^{1/2} dθ.

A large |Iθ(D)| leads to a large H1(D), so M(D) in (5.1) is similar to the D-optimality criterion. Sometimes the evaluation of H1(D) is analytically intractable; then the following Monte Carlo simulation can be used to approximate H1(D):


1. Generate θ^(1), θ^(2), . . . , θ^(m) from π(θ), where m is large enough.
2. Approximate H1(D) by (1/m) Σ_{j=1}^m log{|I_{θ^(j)}(D)|}^{1/2}.

• When the quantile α = tq(S0) is of interest, (4.2) is used as the utility function. Then, it follows from Ghosh and Mukerjee (1992) that, under some regularity conditions and as n → ∞,

M(D) = (1/2) log( n/(2πe) ) + ∫ πα(α) log( ω(α)/πα(α) ) dα + o(1),   (5.2)

where

ω(α) = exp{ (1/2) ∫∫ πβ(β)πσ(σ) log( |Iη(D)|/|I−α(D)| ) dβ dσ }

and I−α(D) is the expected Fisher information matrix of (β, σ)′ with α held fixed. Let

H2(D) = ∫ πα(α) log( ω(α)/πα(α) ) dα.

Then finding the optimal design is equivalent to maximizing H2(D). Similarly, if H2(D) has no closed form, Monte Carlo simulation can be adopted. The procedure is as follows:

1. Generate α^(1), α^(2), . . . , α^(m1) from πα(α), where m1 is large enough.
2. Approximate H2(D) by (1/m1) Σ_{j=1}^{m1} log( ω(α^(j))/πα(α^(j)) ).
3. If ω(α^(j)) is also analytically intractable, generate β^(1), . . . , β^(m2) from πβ(β) and σ^(1), . . . , σ^(m2) from πσ(σ), and approximate ω(α^(j)) by exp{ (1/(2 m2)) Σ_{l=1}^{m2} log( |I_{η^(l)}(D)|/|I_{−α^(l)}(D)| ) }, where I_{η^(l)}(D) is Iη(D) with η replaced by (α^(j), β^(l), σ^(l))′, and I_{−α^(l)}(D) is I−α(D) with (β, σ)′ replaced by (β^(l), σ^(l))′.
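Steps 1–2 for H1(D) amount to averaging log-determinants of Fisher information matrices over prior draws. A minimal sketch, with a hypothetical fisher_info function and a placeholder prior, chosen so that |Iθ(D)| does not depend on θ and the exact answer is known:

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher_info(theta, D):
    """Hypothetical expected Fisher information; its determinant is D['n'] for any theta."""
    n = D["n"]
    return n * np.array([[1.0, 0.5], [0.5, 0.25 + 1.0 / n]])

def H1(D, prior_sampler, m=2000):
    thetas = prior_sampler(m)                                   # step 1: prior draws
    logdets = [np.log(np.linalg.det(fisher_info(t, D))) for t in thetas]
    return 0.5 * np.mean(logdets)                               # step 2: average log|I|^(1/2)

D = {"n": 50}
prior = lambda m: rng.normal(size=(m, 2))   # placeholder prior on theta = (a, b)
assert np.isclose(H1(D, prior), 0.5 * np.log(50))
```

In realistic models |Iθ(D)| varies with θ, and the same two-line loop produces the Monte Carlo average.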

5.2. Example 1 (continued)

We continue the example of Section 4.1, but assume the acceleration function is log λi = a + bϕ(Si), i = 0, 1, . . . , k, where both a and b are unknown. Let ϕi = ϕ(Si), i = 0, 1, . . . , k. For a fixed value of n, the design problem is to select the number of distinct stress levels k and the number of items ni tested at each stress level, i = 1, 2, . . . , k. We use the large-sample approximation to determine the optimal design under two circumstances.

1. (4.1) is used as the utility function. It can easily be shown that the Fisher information matrix of (a, b) is

I1 = [[ n, Σ_{i=1}^k ni ϕi ], [ Σ_{i=1}^k ni ϕi, Σ_{i=1}^k ni ϕi² ]].

I1 does not depend on the parameters, and |I1| = n Σ_{i=1}^k ni ϕi² − ( Σ_{i=1}^k ni ϕi )². Thus, for any proper prior π(θ), we have

H1(D) = (1/2) log(|I1|).

Then maximizing M(D) is equivalent to selecting n1, n2, . . . , nk and k to maximize

n Σ_{i=1}^k ni ϕi² − ( Σ_{i=1}^k ni ϕi )²,

which is the same as the result based on the D-optimality criterion.

2. (4.2) is used as the utility function. The failure rate at the use stress condition, λ0 = exp{a + bϕ0}, is of interest. Let β = exp{b(ϕ0 − ϕ1)}. The Jacobian from (a, b)′ to (λ0, β)′ is (ϕ0 − ϕ1)^(−1)/(λ0 β), so the transformation is one-to-one. Let ψi = (ϕ0 − ϕi)/(ϕ0 − ϕ1). It can easily be shown that the Fisher information matrix of (λ0, β) is

I2 = [[ n/λ0², ( Σ_{i=1}^k ni ψi )/(λ0 β) ], [ ( Σ_{i=1}^k ni ψi )/(λ0 β), ( Σ_{i=1}^k ni ψi² )/β² ]].


Table 1. Prior distributions and the corresponding optimal change times in the simple step-stress ALT.

Prior   λ0                           β         τ1       H2(D)
I       U(8 × 10^−4, 1.2 × 10^−3)    U(4, 6)   223.79   0.517
II      U(6 × 10^−4, 1.4 × 10^−3)    U(3, 7)   217.01   0.530
III     U(4 × 10^−4, 1.6 × 10^−3)    U(3, 7)   215.27   0.545
IV      U(2 × 10^−4, 1.8 × 10^−3)    U(3, 7)   204.56   0.598

Then |I2| = [ n Σ_{i=1}^k ni ψi² − ( Σ_{i=1}^k ni ψi )² ] / (λ0² β²), and the Fisher information of β given λ0 is I−λ0 = ( Σ_{i=1}^k ni ψi² )/β².

For any proper prior πβ(β),

ω(λ0) = exp{ (1/2) ∫ πβ(β) log( |I2(D)|/|I−λ0| ) dβ } = (1/λ0) { n − ( Σ_{i=1}^k ni ψi )² / Σ_{i=1}^k ni ψi² }^{1/2}.

Thus, for any proper prior πλ0(λ0),

H2(D) = ∫ πλ0(λ0) log( ω(λ0)/πλ0(λ0) ) dλ0 = (1/2) log{ n − ( Σ_{i=1}^k ni ψi )² / Σ_{i=1}^k ni ψi² } + C,

where C = −∫ πλ0(λ0) log( λ0 πλ0(λ0) ) dλ0 does not depend on n1, . . . , nk or k. The optimal plan is therefore equivalent to minimizing ( Σ_{i=1}^k ni ψi )² / Σ_{i=1}^k ni ψi², which again is the same as the result based on the V-optimality criterion.

5.3. Example 2: simple step-stress ALT designs

In a simple step-stress ALT, n units are initially tested at a lower stress level Sl > S0. The stress remains at Sl until the stress-changing time τ1, at which point it is increased to a higher stress level Sh. The test continues until the censoring time τ. The failure-time distribution at stress level Si, i ∈ {0, l, h}, is assumed to be exponential with failure rate λi, and the acceleration function is the same as in Example 1. In addition, the cumulative exposure (CE) model (Nelson, 1980) is assumed for the step-stress ALT. The goal of the ALT is to estimate λ0, so the design problem is to find the optimal change time τ1. This problem was considered by Bai et al. (1989), who used the V-optimality criterion to determine the optimal change time; here the reference planning criterion is used for illustration. Following Bai et al. (1989), it can be shown that the Fisher information matrix of (λ0, β)′ is



I2 = n [[ (P1 + P2(1 − P1))/λ0², (P1 ψ1 + P2(1 − P1)ψ2)/(λ0 β) ], [ (P1 ψ1 + P2(1 − P1)ψ2)/(λ0 β), (P1 ψ1² + P2(1 − P1)ψ2²)/β² ]],

where β = exp{b(ϕ0 − ϕ1)}, τ2 = τ − τ1, and Pi = 1 − exp{ −λ0 β^{ψi} τi }, i = 1, 2. In this case, ω(λ0) and H2(D) do not have closed forms, so Monte Carlo simulation is used. Assume that the stress is temperature with ϕ(S) = 1/S, S0 = 293, Sl = 333, Sh = 393 and τ = 400. We use continuous uniform distributions to express our uncertainty about the values of the model parameters, and assume that λ0 and β are independent. Four sets of prior distributions for (λ0, β) are considered in this example; see Table 1. The means of β under the four priors are the same, E(β) = 5, but we become more and more uncertain about λ0 from prior I to prior IV; for prior IV in particular, the range is quite large. The goal is to assess the effect of the priors on the optimal design. In the Monte Carlo simulation, we set m1 = m2 = 1000. The last two columns of Table 1 list the optimal change times and the criterion values; the results are quite robust across the different priors.

Note: If the design variables are the change time τ1 and the lower stress level Sl, then for all the above priors the optimal designs we obtain are τ1 = τ and Sl = S0; that is, testing all units at the use stress condition gives the maximum information. The same result was indicated by Miller and Nelson (1983), although they used the V-optimality criterion.
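The Fisher information matrix above is straightforward to evaluate numerically. A sketch using the temperature settings quoted above (the failure-rate, acceleration-factor and change-time values are illustrative), checking that the matrix is positive definite for an interior change time, so both parameters are estimable:

```python
import numpy as np

def fisher_step_stress(lam0, beta, tau1, tau=400.0, n=50):
    """Expected Fisher information of (lambda0, beta) for a simple step-stress ALT
    under the exponential/cumulative-exposure model, per the matrix displayed above."""
    phi = lambda S: 1.0 / S                       # assumed Arrhenius-type link phi(S) = 1/S
    S0, Sl, Sh = 293.0, 333.0, 393.0
    psi1 = 1.0                                    # psi_i = (phi0 - phi_i)/(phi0 - phi_l)
    psi2 = (phi(S0) - phi(Sh)) / (phi(S0) - phi(Sl))
    tau2 = tau - tau1
    P1 = 1.0 - np.exp(-lam0 * beta**psi1 * tau1)  # failure prob. before the change time
    P2 = 1.0 - np.exp(-lam0 * beta**psi2 * tau2)  # conditional failure prob. afterwards
    w1, w2 = P1, P2 * (1.0 - P1)
    return n * np.array([
        [(w1 + w2) / lam0**2,                     (w1 * psi1 + w2 * psi2) / (lam0 * beta)],
        [(w1 * psi1 + w2 * psi2) / (lam0 * beta), (w1 * psi1**2 + w2 * psi2**2) / beta**2],
    ])

I2 = fisher_step_stress(lam0=1e-3, beta=5.0, tau1=200.0)
eig = np.linalg.eigvalsh(I2)
assert np.all(eig > 0)   # positive definite for an interior change time
```

Positive definiteness follows from the Cauchy–Schwarz inequality whenever both weights are positive and ψ1 ≠ ψ2.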


5.4. Monte Carlo simulation algorithm

We combined Monte Carlo simulation with a large-sample approximation in Section 5.1. In this section, a fully Monte Carlo approach is explored to find the optimal plan in general cases, regardless of the sample size. Monte Carlo simulation usually requires m to be at least a few thousand, which can take an extremely long computational time. To reduce it, we adopt the surface-fitting technique proposed by Muller and Parmigiani (1995). Surface smoothing for finding optimal Bayesian designs has previously been considered by Erkanli and Soyer (2000), Liu and Tang (2010) and Yuan et al. (2012). We also use this method, but with the Epanechnikov kernel to fit a smooth curve or surface. The algorithm is summarized as follows:

1. Select J designs that spread over the design space, i.e., Di, i = 1, 2, . . . , J.
2. For each design Di, compute M(Di) using Monte Carlo simulation.
3. Fit a smooth curve or surface using the method of kernels.
4. Find the optimal design D∗ using the smoothed curve or surface.

Usually, the design space in ALT planning problems is finite, so the grid method can be used in the first two steps: first divide the design space into equal small parts, then compute M(D) at each grid point. The detailed methods for computing M(D) are given in Appendix B. After obtaining (Di, M(Di)), i = 1, 2, . . . , J, we use the Nadaraya–Watson estimate of M(D) with the Epanechnikov kernel, which is optimal in a mean squared error sense and more robust than the Gaussian kernel (Epanechnikov, 1969). For example, when only one decision variable is involved,

M(D) ≈ Σ_{i=1}^J K( (D − Di)/hD ) M(Di) / Σ_{i=1}^J K( (D − Di)/hD ),

where K(x) = (3/4)(1 − x²) 1{|x| < 1} and hD is the bandwidth, which controls the smoothness of the curve. The optimal bandwidth is determined by the Akaike information criterion (Hurvich et al., 1998). When two decision variables D = (D^(1), D^(2)) are

considered,

M(D) = M(D^(1), D^(2)) ≈ Σ_{i=1}^J K( (D^(1) − Di^(1))/h_{D^(1)} ) K( (D^(2) − Di^(2))/h_{D^(2)} ) M(Di^(1), Di^(2)) / Σ_{i=1}^J K( (D^(1) − Di^(1))/h_{D^(1)} ) K( (D^(2) − Di^(2))/h_{D^(2)} ),

where hD(1) and hD(2) are the bandwidths. In planning ALT, at most two decision variables are involved. Thus, the algorithm presented above can be used in practice. 5.5. Example 2 (continued) We consider the simple step-stress ALT design mentioned in Section 5.3, and use the Monte Carlo simulation algorithm. The setting is the same as Section 5.3. For illustration, we consider the prior set: U (8 × 10−4 , 1.2 × 10−3 ) for λ0 and U (4, 6) for β . Assume that the observed data t = (tl , th ), and ti = (ti1 , ti2 , . . . , tiri ) are the failure times at stress level Si , i = l, h. Then the likelihood function of (λ0 , β) is

L(t | λ0, β) = λ0^{rl + rh} β^{rl + rh ψ2} exp{ −λ0 β Tl − λ0 β^{ψ2} Th },

where Tl = Σ_{j=1}^{rl} tlj + (n − rl)τ1 and Th = Σ_{j=1}^{rh} (thj − τ1) + (n − rl − rh)(τ − τ1). Given the prior above, the full conditional posterior of λ0 given β is

p(λ0 | t, β) ∝ λ0^{rl + rh} exp{ −λ0 ( β Tl + β^{ψ2} Th ) } 1_{(8×10^−4, 1.2×10^−3)}(λ0),

which is a truncated gamma distribution with shape parameter rl + rh + 1 and rate parameter β Tl + β^{ψ2} Th. The algorithm of Scenario 2 in Appendix B can then be used. In the simulation, we set J = 100, N = 500, N1 = 1000, and n = 20, 50, 100 and 150. The optimal change times are 175.05, 186.56, 191.81 and 220.55 for n = 20, 50, 100 and 150, respectively. Unlike the result based on the large-sample approximation, the optimal change time varies with the sample size: under this prior setting, the larger the sample size, the later the change time. The reason is that when the sample size is small, the probability of observing failures at the lower stress is also small, so the stress needs to be increased earlier to induce failures faster. For n = 50, the effect of different J is also assessed, with J = 20, 50, 80 and 100. The results are shown in Fig. 1: the fitted curves are quite robust to different J, even for J = 20. Since the grid method is used in the Monte Carlo simulation algorithm, different J means different selected designs in step 1; thus, the algorithm is also robust to the choice of selected designs.
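Sampling λ0 from a truncated gamma full conditional of this form can be done by inverting the gamma CDF restricted to the prior interval. A sketch with illustrative shape and rate values (in practice the shape rl + rh + 1 and rate β Tl + β^{ψ2} Th are data-dependent):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)

def truncated_gamma(shape, rate, lo, hi, size=1000):
    """Inverse-CDF sampling of a Gamma(shape, rate) distribution restricted to (lo, hi)."""
    g = gamma(a=shape, scale=1.0 / rate)
    u = rng.uniform(g.cdf(lo), g.cdf(hi), size=size)  # uniform on the truncated CDF range
    return g.ppf(u)

# Illustrative values: shape ~ rl + rh + 1, rate ~ beta*Tl + beta**psi2 * Th,
# truncated to the prior interval (8e-4, 1.2e-3) used above.
samples = truncated_gamma(shape=11, rate=10000.0, lo=8e-4, hi=1.2e-3)
assert np.all((samples >= 8e-4) & (samples <= 1.2e-3))
```

Inversion is exact here because the gamma CDF and quantile function are available in closed numerical form, avoiding rejection sampling on a narrow interval.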




Fig. 1. The fitted curves for different J.
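The fitted curves of Fig. 1 are produced by the Nadaraya–Watson smoother with the Epanechnikov kernel described in Section 5.4. A minimal sketch on a synthetic criterion curve — the grid, noise level, bandwidth, and the placement of the true optimum at 0.6 are illustrative placeholders, not the paper's settings:

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel K(x) = (3/4)(1 - x^2) on |x| < 1, else 0."""
    return np.where(np.abs(x) < 1.0, 0.75 * (1.0 - x**2), 0.0)

def nadaraya_watson(D_grid, M_grid, D, h):
    """Kernel-weighted average of the simulated criterion values M(D_i) at design D."""
    w = epanechnikov((D - np.asarray(D_grid)) / h)
    return np.sum(w * np.asarray(M_grid)) / np.sum(w)

# Steps 1-2 stand-in: noisy Monte Carlo evaluations of a smooth criterion on a design grid.
rng = np.random.default_rng(2)
D_grid = np.linspace(0.0, 1.0, 41)
M_grid = -(D_grid - 0.6)**2 + 0.005 * rng.normal(size=D_grid.size)

# Steps 3-4: smooth the evaluations, then maximize the smoothed curve over a fine grid.
candidates = np.linspace(0.05, 0.95, 181)
D_star = max(candidates, key=lambda D: nadaraya_watson(D_grid, M_grid, D, h=0.15))
assert abs(D_star - 0.6) < 0.1
```

The smoother lets a modest number of noisy M(Di) evaluations stand in for a dense, expensive sweep of the design space, which is the point of the surface-fitting step.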

6. Comparison with other criteria

In this section, we compare our method with two other criteria: the V-optimality criterion and the Bayesian criterion of Zhang and Meeker (2006). We consider the planning problem based on Example 19.1 of Meeker and Escobar (1998). The lifetime of Device-A follows a lognormal distribution, and the engineers are interested in estimating the q = 0.01 quantile of the lifetime distribution at 10 °C. A sample of n = 165 units will be tested, with censoring time τ = 5000 h. No failures would be obtained if the units were tested at 10 °C, so an ALT is necessary. Because temperature is the stress, the Arrhenius acceleration model is used, that is, µ = a + bϕ(S), where ϕ(S) = 11605/(S + 273.15); 11605 is the reciprocal of Boltzmann's constant in units of electron volts per kelvin, and the highest temperature is 80 °C. The optimal number of stress levels is 2 for all the criteria, so a two-level constant-stress ALT is considered here. The planning problem is then to find the optimal lower stress level S1 and its allocation p1 = n1/n.

When the V-optimality criterion is used, the planning values of a, b and σ are assumed to be −13.5, 0.63 and 0.98, respectively. Thus, log(t0.01(S0)) and β = b(ϕ(S0) − ϕ(S2)) (the log acceleration factor of the highest stress level S2 relative to the use stress level S0) are expected to be around 10.04 and 5.12, respectively. When the Bayesian criteria are used, we take (log(t0.01(S0)), β, σ) as the new parameters, because log(t0.01(S0)) is of interest. Following Zhang and Meeker (2006), we assume log(t0.01(S0)), β and σ are independent. Four sets of prior distributions are assumed; see Table 2. Continuous uniform distributions are used to express our uncertainty about the values of the parameters; other distributions, such as the lognormal and gamma, can also be used according to experts' opinions (see Zhang and Meeker, 2006, and Liu and Tang, 2010).
The means of priors I–III are around the planning values, but the information becomes more and more vague from prior I to prior III. To assess the effect of the prior, prior IV is also used; its means are somewhat far from the planning values, although all the planning values still lie within the corresponding prior intervals. With these priors, we obtain the optimal plans based on ROC and on the Bayesian criterion (Zhang and Meeker, 2006). The results are listed in Table 3, where ROC 1 denotes ROC using the large-sample approximation, and ROC 2 denotes ROC using Monte Carlo simulation. In addition, the optimal test plan based on the V-optimality criterion is p1 = 0.772 and S1 = 40 °C. We use the planning values as the true model parameters, and generate datasets of 165 units with censoring time τ = 5000 h according to the test levels and allocations in Table 3. A total of 50,000 datasets are simulated. For each simulated dataset, the posterior mode of log(t0.01(S0)) is computed, and the mean and the square root of the mean squared error (SMSE) are calculated from the 50,000 posterior modes. Fig. 2 shows the histograms of the 50,000 posterior modes for ROC and the Bayesian criterion under the different priors. For the V-optimality criterion, the mean and SMSE of log(t0.01(S0)) are 10.07 and 0.355, respectively. From Fig. 2, we can see that:


Table 2
Prior distributions used in the comparison.

Prior   log(t0.01(S0))    β                σ
I       U(9.04, 11.04)    U(4.24, 6.00)    U(0.9, 1.06)
II      U(8.54, 11.54)    U(4.74, 6.50)    U(0.8, 1.16)
III     U(6.54, 13.54)    U(5.74, 7.50)    U(0.5, 1.46)
IV      U(8.54, 14.54)    U(2.00, 6.00)    U(0.9, 2.00)

Table 3
Optimal plans for different Bayesian criteria.

        ROC 1              ROC 2              Bayesian criterion
Prior   p1      S1         p1      S1         p1      S1
I       0.748   40 °C      0.856   39 °C      0.858   41 °C
II      0.711   42 °C      0.773   43 °C      0.767   48 °C
III     0.695   45 °C      0.749   35 °C      0.694   46 °C
IV      0.580   62 °C      0.627   47 °C      0.355   53 °C

1. When the prior is informative (prior I), ROC 2 performs best. The optimal plans based on the ROC and the Bayesian criterion perform better than that of the V-optimality criterion, since their means are closer to the true value 10.04 and their SMSEs are much smaller. See the first row of Fig. 2.
2. If we do not have much information about the model parameters, the V-optimality criterion is hard to implement, whereas Bayesian methods can still be used, for example with the vague priors II–IV. In such cases, ROC 2 performs best. Especially for prior IV, the SMSE based on ROC 2 is much smaller than those based on ROC 1 and the Bayesian criterion. See the second and third rows of Fig. 2.
3. The results based on ROC 1 are close to those based on ROC 2 for priors I–III, but much worse for prior IV. This suggests that the large-sample approximation works well when we have some prior information about the values of the model parameters.
4. The results based on ROC 2 are the most robust: the SMSEs based on ROC 2 range only from 0.342 to 0.369, even though prior IV differs significantly from priors I–III.
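The data-generation step of the Monte Carlo evaluation above can be sketched as follows. This is an illustrative sketch of our own, not the authors' code: it generates one Type-I censored dataset from the Arrhenius-lognormal model using the planning values and the V-optimality plan quoted in the text; the repeated computation of the posterior mode of log(t0.01(S0)) over 50,000 datasets is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Planning values for Example 19.1 (Meeker and Escobar, 1998)
a, b, sigma = -13.5, 0.63, 0.98
tau = 5000.0      # censoring time in hours
n = 165           # total number of test units

def phi(S):
    """Arrhenius transform 11605/(S + 273.15), with S in degrees Celsius."""
    return 11605.0 / (S + 273.15)

def simulate_plan(p1, S1, S2=80.0):
    """One Type-I censored dataset from a two-level constant-stress plan."""
    n1 = int(round(p1 * n))
    stress = np.concatenate([np.full(n1, S1), np.full(n - n1, S2)])
    mu = a + b * phi(stress)                         # log-location at each stress
    t = np.exp(mu + sigma * rng.standard_normal(n))  # lognormal lifetimes
    return np.minimum(t, tau), t > tau               # observed times, censor flags

# V-optimality plan quoted in the text: p1 = 0.772, S1 = 40 degrees C
obs, cens = simulate_plan(p1=0.772, S1=40.0)
print(f"failures observed before tau: {int((~cens).sum())} of {n}")
```

Repeating this generation step for each plan in Table 3 and fitting the model to each simulated dataset yields the means and SMSEs summarized above.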

7. Conclusion

In summary, this article has developed and illustrated a Bayesian criterion for planning accelerated life testing. The criterion is based on the expected Shannon information and is called the reference optimality criterion. The optimal design maximizes the expected Shannon information; that is, it maximizes the amount of information gained from the experiment. Two algorithms are proposed to find the optimal design. For the large-sample approximation, we give a closed-form formula and can easily obtain the optimal design combined with Monte Carlo simulation. For the Monte Carlo simulation algorithm, we give detailed simulation procedures for potential applications. To illustrate the Bayesian criterion, we use several examples to compare the results based on the criterion with those based on classical criteria. The proposed criterion is general, and could be used not only in ALT planning, but also in planning degradation testing.

Acknowledgments

The research is supported by the Natural Science Foundation of China (11201345, 11271136), the Program of Shanghai Subject Chief Scientist (14XD1401600), the 111 Project (B14019), the Natural Science Foundation of Zhejiang Province (LY15G010006), and the China Postdoctoral Science Foundation (2015M572599). The authors thank the referee for his/her constructive suggestions that greatly improved the article.

Appendix A

Let t* denote an unknown, but potentially observable, quantity from p(·|θ). Then the prior predictive distribution of t* is

    p(t*) = ∫ π(θ) p(t*|θ) dθ,

and the posterior predictive distribution of t* is

    p(t*|D, t) = ∫ p(t*|θ) p(θ|D, t) dθ.
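As a concrete sanity check of the prior predictive formula, the following sketch (a toy model of our choosing, not from the paper) uses exponential lifetimes with a Gamma prior on the rate, for which p(t*) has the closed Lomax form, and compares it with a Monte Carlo average of p(t*|θ) over prior draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential lifetimes p(t*|lam) = lam * exp(-lam * t*), with a
# Gamma(a0, b0) prior (rate parameterization) on lam. Then
#   p(t*) = ∫ pi(lam) p(t*|lam) dlam = a0 * b0^a0 / (b0 + t*)^(a0 + 1).
a0, b0 = 3.0, 2.0
t_star = 1.7

exact = a0 * b0**a0 / (b0 + t_star)**(a0 + 1)

# Monte Carlo version of the same integral: average p(t*|lam) over
# draws of lam from the prior
lam = rng.gamma(a0, 1.0 / b0, size=500_000)
mc = np.mean(lam * np.exp(-lam * t_star))
print(exact, mc)
```

With 500,000 prior draws the two values agree to about three decimal places.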


Fig. 2. Histogram of the posterior modes of log(t0.01 (S0 )) based on different criteria and priors.

Verdinelli et al. (1993) presented a design criterion based on the expected Shannon information between p(t*|D, t) and p(t*), that is,

    M*(D) = E_t [ ∫ p(t*|D, t) log( p(t*|D, t) / p(t*) ) dt* ]
          = ∫ p(t|D) [ ∫ p(t*|D, t) log( p(t*|D, t) / p(t*) ) dt* ] dt.

Then maximizing M*(D) with respect to D yields the optimal design.

Lemma 1. Under some regularity conditions, as n → ∞,

    ∫ L(t|D, θ) log( L(t|D, θ) / p(t|D) ) dt = (d/2) log( n/(2πe) ) + (1/2) log |I_D(θ)| + log( 1/π(θ) ) + o(1),

where d is the dimension of θ and I_D(θ) is the Fisher information matrix of θ given the design D. See the proof in Clarke and Barron (1990).

Theorem 1. Under the same conditions as Lemma 1, as n → ∞,

    M*(D) = −∫ p(t*) log p(t*) dt* + ∫ π(θ) ∫ p(t*|θ) log p(t*|θ) dt* dθ + o(1).    (A.1)


Proof. Under the conditions, the order of integration over t*, t and θ is exchangeable. Since

    p(t*|D, t) = ∫ p(t*|θ) π(θ) L(t|D, θ) dθ / p(t|D),

we have

    M*(D) = ∫ p(t|D) ∫ p(t*|D, t) log( p(t*|D, t) / p(t*) ) dt* dt
          = ∫∫∫ p(t*|θ) π(θ) L(t|D, θ) log( p(t*|D, t) / p(t*) ) dθ dt* dt
          = A − ∫ p(t*) log p(t*) dt*,

where A = ∫∫∫ p(t*|θ) π(θ) L(t|D, θ) log p(t*|D, t) dθ dt* dt. Substituting the expression for p(t*|D, t) into A, we have

    A = ∫∫∫ p(t*|θ) π(θ) L(t|D, θ) log( L(t|D, θ) / p(t|D) ) dt dt* dθ
      + ∫∫∫ p(t*|θ) π(θ) L(t|D, θ) log( ∫ p(t*|θ) π(θ) L(t|D, θ) dθ / L(t|D, θ) ) dt dt* dθ
      ≡ B1 + B2.

According to Lemma 1, we have

    B1 = (d/2) log( n/(2πe) ) + (1/2) ∫ π(θ) log |I_D(θ)| dθ + ∫ π(θ) log( 1/π(θ) ) dθ + o(1),

and

    B2 = ∫ π(θ) ∫∫ p(t*|θ) L(t|D, θ) log( ∫ p(t*|θ) L(t|D, θ) π(θ) dθ / ( p(t*|θ) L(t|D, θ) ) ) dt dt* dθ
       + ∫ π(θ) ∫∫ p(t*|θ) L(t|D, θ) log p(t*|θ) dt dt* dθ
       ≡ C1 + C2.

Since t* can be seen as a new observation, p(t*|θ)L(t|D, θ) is the likelihood function of (t*, t). Thus, according to Lemma 1,

    −C1 = (d/2) log( (n+1)/(2πe) ) + (1/2) ∫ π(θ) log |I*_D(θ)| dθ + ∫ π(θ) log( 1/π(θ) ) dθ + o(1),

where I*_D(θ) is the Fisher information matrix based on (t*, t), so that I*_D(θ) = ((n+1)/n) I_D(θ). Moreover, since ∫ L(t|D, θ) dt = 1,

    C2 = ∫ π(θ) ∫ p(t*|θ) log p(t*|θ) dt* dθ.

Thus,

    A = B1 + C1 + C2
      = (d/2) log( n/(n+1) ) + (1/2) ∫ π(θ) log( |I_D(θ)| / |I*_D(θ)| ) dθ + C2 + o(1)
      = d log( n/(n+1) ) + ∫ π(θ) ∫ p(t*|θ) log p(t*|θ) dt* dθ + o(1)
      = ∫ π(θ) ∫ p(t*|θ) log p(t*|θ) dt* dθ + o(1),

since |I*_D(θ)| = ((n+1)/n)^d |I_D(θ)| and d log( n/(n+1) ) = o(1). Combining this with the expression for M*(D) above, the result holds immediately. □



The result of Theorem 1 is interesting: as the sample size goes to infinity, M*(D) becomes independent of the design D, so the optimal design is arbitrary. Unlike classical design criteria based on the asymptotic normality of maximum likelihood estimators, the large-sample approximation therefore does not work well for this predictive criterion.
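The building block of this argument, Lemma 1, can be checked numerically in a model where p(t|D) has a closed form. The sketch below is a toy example of our own (not from the paper): n iid exponential lifetimes with a Gamma prior on the rate. Following Clarke and Barron (1990), |I_D(θ)| is interpreted here as the per-observation Fisher information 1/λ², with the sample size n carried by the first term; this reading is our assumption.

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(0)

# n iid exponential lifetimes with rate lam0 and Gamma(a0, b0) prior,
# so log p(t) is available in closed form.
a0, b0 = 3.0, 2.0
lam0 = 1.5
n, reps = 500, 4000

t = rng.exponential(1.0 / lam0, size=(reps, n))
s = t.sum(axis=1)

# Left side of Lemma 1: Monte Carlo estimate of
#   ∫ L(t|D, lam0) log( L(t|D, lam0) / p(t|D) ) dt
loglik = n * np.log(lam0) - lam0 * s
logmarg = (a0 * np.log(b0) - special.gammaln(a0)
           + special.gammaln(a0 + n) - (a0 + n) * np.log(b0 + s))
lhs = np.mean(loglik - logmarg)

# Right side: (d/2) log(n/(2*pi*e)) + (1/2) log|I(lam0)| + log(1/pi(lam0)),
# with d = 1 and per-observation Fisher information I(lam0) = 1/lam0**2
log_prior = stats.gamma.logpdf(lam0, a0, scale=1.0 / b0)
rhs = (0.5 * np.log(n / (2 * np.pi * np.e))
       + 0.5 * np.log(1.0 / lam0**2) - log_prior)
print(lhs, rhs)
```

At n = 500 the two sides agree to well within the Monte Carlo error, consistent with the o(1) remainder.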




Appendix B

The following procedure can be used to calculate M(D_i) for a design D_i.

1. Generate θ(i) from the prior π(θ), i = 1, 2, ..., N. For each θ(i), generate the data t(i) from the ALT model considered.

2. For (4.1), U(t(i), D_i) = ∫ p(θ|t(i), D_i) log( p(θ|t(i), D_i) / π(θ) ) dθ.

   • If p(θ|t(i), D_i) has a closed form, then we can generate posterior samples of θ, say θ(1), θ(2), ..., θ(N1), from p(θ|t(i), D_i) by an MCMC method, and estimate U(t(i), D_i) by (1/N1) Σ_{j=1}^{N1} log( p(θ(j)|t(i), D_i) / π(θ(j)) ).

   • In general,

         U(t(i), D_i) = ∫ p(θ|t(i), D_i) log( p(θ|t(i), D_i) / π(θ) ) dθ
                      = ∫ p(θ|t(i), D_i) log( L(t(i)|D_i, θ) / p_{t(i)}(t(i)|D_i) ) dθ
                      = ∫ p(θ|t(i), D_i) log L(t(i)|D_i, θ) dθ − log p_{t(i)}(t(i)|D_i)
                      ≡ G1 − log p_{t(i)}(t(i)|D_i).

     Then G1 can be estimated by (1/N1) Σ_{j=1}^{N1} log L(t(i)|D_i, θ(j)). According to Newton and Raftery (1994), p_{t(i)}(t(i)|D_i) can be estimated as

         p̂_{t(i)} = [ (1/N1) Σ_{j=1}^{N1} 1 / L(t(i)|D_i, θ(j)) ]^{−1},    (B.2)

     which is a simulation-consistent estimate. When p̂_{t(i)} is not stable, the following quantity can be considered instead:

         [ (1/N1) Σ_{j=1}^{N1} g(θ(j)) / ( L(t(i)|D_i, θ(j)) π(θ(j)) ) ]^{−1},    (B.3)

     where g(θ) is a density with tails thinner than the product of the prior and the likelihood; see Gelfand and Dey (1994).

   For (4.2), U(t(i), D_i) = ∫ p(α|t(i), D_i) log( p(α|t(i), D_i) / π_α(α) ) dα. We discuss the computation of U(t(i), D_i) in three scenarios.

   • Scenario 1: p(α|t(i), D_i) has a closed form. The computation is then straightforward. First, generate posterior samples of α, say α(1), α(2), ..., α(N1), from p(α|t(i), D_i), and estimate U(t(i), D_i) by (1/N1) Σ_{j=1}^{N1} log( p(α(j)|t(i), D_i) / π_α(α(j)) ).

   • Scenario 2: p(α|β, σ, t(i), D_i) has a closed form. We have

         p(α|t(i), D_i) = ∫∫ p(α|β, σ, t(i), D_i) p(β, σ|t(i), D_i) dβ dσ,

     where p(β, σ|t(i), D_i) is the joint posterior of (β, σ) given t(i) and D_i. We can then generate samples of η, say (α(1), β(1), σ(1))′, ..., (α(N1), β(N1), σ(N1))′, from π(η|t(i), D_i). For each fixed α*, p(α*|t(i), D_i) can be estimated as

         p̂(α*|t(i), D_i) = (1/N1) Σ_{j=1}^{N1} p(α*|β(j), σ(j), t(i), D_i).

     Then U(t(i), D_i) can be estimated by (1/N1) Σ_{l=1}^{N1} log( p̂(α(l)|t(i), D_i) / π_α(α(l)) ).

   • Scenario 3: In general,

         U(t(i), D_i) = ∫ p(α|t(i), D_i) log( p(α|t(i), D_i) / π_α(α) ) dα
                      = ∫ p(α|t(i), D_i) log( ∫∫ L(t(i)|D_i, η) π_β(β) π_σ(σ) dβ dσ / p_{t(i)}(t(i)|D_i) ) dα
                      = ∫ p(α|t(i), D_i) log( ∫∫ L(t(i)|D_i, η) π_β(β) π_σ(σ) dβ dσ ) dα − log p_{t(i)}(t(i)|D_i)
                      ≡ G2 − log p_{t(i)}(t(i)|D_i).

     Here p_{t(i)}(t(i)|D_i) can be estimated according to (B.2) or (B.3). G2 can be evaluated as follows: (i) generate samples of (β, σ), say (β0(1), σ0(1)), (β0(2), σ0(2)), ..., (β0(N2), σ0(N2)), from π_β(β)π_σ(σ); (ii) estimate G2 by

         (1/N1) Σ_{k=1}^{N1} log[ (1/N2) Σ_{j=1}^{N2} L(t(i)|D_i, α(k), β0(j), σ0(j)) ].

3. Compute the expected Shannon information M(D_i) by (1/N) Σ_{i=1}^{N} U(t(i), D_i).
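The marginal-likelihood estimators (B.2) and (B.3) can be illustrated on a conjugate toy model of our choosing (exponential lifetimes with a Gamma prior on the rate, not an ALT model from the paper), where log p(t) is known exactly and serves as a check. The harmonic-mean estimator (B.2) is computed on the log scale for stability; for (B.3) we take g to be a normal density matched to the posterior draws, which here has tails thinner than the likelihood times the prior (in practice g is often further truncated to a high-posterior region).

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(0)

# Exponential lifetimes with rate lam, Gamma(a0, b0) prior on lam,
# so the posterior is Gamma(a1, b1) and log p(t) is exact.
a0, b0 = 3.0, 2.0
t = rng.exponential(1.0 / 1.5, size=30)
n, s = len(t), t.sum()
a1, b1 = a0 + n, b0 + s

log_p_exact = (a0 * np.log(b0) - special.gammaln(a0)
               + special.gammaln(a1) - a1 * np.log(b1))

lam = rng.gamma(a1, 1.0 / b1, size=200_000)   # posterior draws
loglik = n * np.log(lam) - lam * s

# (B.2): harmonic-mean estimator, via log-sum-exp for numerical stability
log_p_hm = -(special.logsumexp(-loglik) - np.log(lam.size))

# (B.3): Gelfand-Dey estimator with g matched to the posterior draws
log_g = stats.norm.logpdf(lam, lam.mean(), lam.std())
log_prior = stats.gamma.logpdf(lam, a0, scale=1.0 / b0)
log_p_gd = -(special.logsumexp(log_g - loglik - log_prior)
             - np.log(lam.size))
print(log_p_exact, log_p_hm, log_p_gd)
```

In this run the Gelfand-Dey estimate is very close to the exact value, while the harmonic-mean estimate is noticeably noisier, mirroring the stability remark attached to (B.3).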

References

Bai, D.S., Kim, M.S., Lee, S.H., 1989. Optimum simple step stress accelerated life tests with censoring. IEEE Trans. Reliab. 38, 528–532.
Berger, J.O., Bernardo, J.M., 1989. Estimating a product of means: Bayesian analysis with reference priors. J. Amer. Statist. Assoc. 84, 200–207.
Berger, J.O., Bernardo, J.M., Sun, D., 2009. The formal definition of reference priors. Ann. Statist. 37, 905–938.
Bernardo, J.M., 1979. Reference posterior distributions for Bayesian inference (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 41, 113–147.
Clarke, B.S., Barron, A.R., 1990. Information theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory 36, 453–471.
Epanechnikov, V.A., 1969. Non-parametric estimation of a multivariate probability density. Theory Probab. Appl. 14, 153–158.
Erkanli, A., Soyer, R., 2000. Simulation-based designs for accelerated life tests. J. Statist. Plann. Inference 90, 335–348.
Escobar, L.A., Meeker, W.Q., 1995. Planning accelerated life test plans with two or more experimental factors. Technometrics 37, 411–427.
Fard, N., Li, C., 2009. Optimal simple step stress accelerated life test design for reliability prediction. J. Statist. Plann. Inference 139, 1799–1808.
Gelfand, A.E., Dey, D.K., 1994. Bayesian model choice: asymptotics and exact calculations. J. R. Stat. Soc. Ser. B Stat. Methodol. 56, 501–514.
Ghosal, S., 1997. Reference priors in multiparameter nonregular cases. TEST 6, 159–186.
Ghosh, J.K., Mukerjee, R., 1992. Noninformative priors (with discussion). In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Oxford University Press, pp. 195–210.
Guan, Q., Tang, Y., 2012. Optimal step-stress test under Type-I censoring for multivariate exponential distribution. J. Statist. Plann. Inference 142, 1908–1923.
Guan, Q., Tang, Y., Xu, A., 2013. Objective Bayesian analysis for bivariate Marshall–Olkin exponential distribution. Comput. Statist. Data Anal. 64, 299–313.
Hurvich, C.M., Simonoff, J.S., Tsai, C.L., 1998. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. Ser. B 60, 271–293.
Lindley, D.V., 1956. On a measure of the information provided by an experiment. Ann. Math. Statist. 27, 986–1005.
Liu, X., Qiu, W., 2011. Modeling and planning of step-stress accelerated life tests with independent competing risks. IEEE Trans. Reliab. 60, 712–720.
Liu, X., Tang, L.C., 2010. Accelerated life test plans for repairable systems with multiple independent risks. IEEE Trans. Reliab. 59, 115–127.
Ma, H., Meeker, W.Q., 2010. Strategy for planning accelerated life tests with small sample sizes. IEEE Trans. Reliab. 59, 610–619.
Meeker, W.Q., Escobar, L.A., 1998. Statistical Methods for Reliability Data. John Wiley & Sons.
Miller, R., Nelson, W., 1983. Optimum simple step-stress plans for accelerated life testing. IEEE Trans. Reliab. 32, 59–65.
Muller, P., Parmigiani, G., 1995. Optimal design via curve fitting of Monte Carlo experiments. J. Amer. Statist. Assoc. 90, 1322–1330.
Nelson, W., 1980. Accelerated life testing: step-stress models and data analysis. IEEE Trans. Reliab. 29, 103–108.
Nelson, W.B., 1990. Accelerated Testing. John Wiley & Sons.
Newton, M.A., Raftery, A.E., 1994. Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 56, 3–48.
Tang, L.C., Liu, X., 2010. Planning and inference for a sequential accelerated life test. J. Qual. Technol. 42, 103–118.
Tsai, C., Tseng, S., Balakrishnan, N., 2012. Optimal design for degradation tests based on gamma processes with random effects. IEEE Trans. Reliab. 61, 604–613.
Verdinelli, I., Polson, N.G., Singpurwalla, N.D., 1993. Shannon information and Bayesian design for prediction in accelerated life testing. In: Barlow, R.E., Clariotti, C.A., Spizzichino, F. (Eds.), Reliability and Decision Making. Chapman & Hall, pp. 247–256.
Vopatek, A.L., 1992. Design of accelerated life tests: a Bayesian approach (Doctoral Dissertation). The George Washington University.
Xu, A., Tang, Y., 2011. Objective Bayesian analysis of accelerated competing failure models under Type-I censoring. Comput. Statist. Data Anal. 55, 2830–2839.
Xu, A., Tang, Y., 2012. Objective Bayesian analysis for linear degradation models. Comm. Statist. Theory Methods 41, 4034–4046.
Ye, Z.S., Hong, Y., Xie, Y., 2013. How do heterogeneities in operating environment affect field failure predictions and test planning? Ann. Appl. Stat. 7, 2249–2271.
Yuan, T., Liu, X., Kuo, W., 2012. Planning simple step-stress accelerated life tests using Bayesian methods. IEEE Trans. Reliab. 61, 254–263.
Zhang, Y., Meeker, W.Q., 2006. Bayesian methods for planning accelerated life tests. Technometrics 48, 49–60.