Journal of Statistical Planning and Inference 138 (2008) 3660 -- 3666
On the mean residual life of records
Mohammad Z. Raqab (a,∗), Majid Asadi (b)
(a) Department of Mathematics, University of Jordan, Amman 11942, Jordan
(b) Department of Statistics, University of Isfahan, Isfahan 81744, Iran
Article history: Received 31 July 2006; received in revised form 14 September 2007; accepted 13 November 2007; available online 20 March 2008.
Abstract
Suppose that a technical system is subject to shocks, e.g. peaks of voltages from a sequence of independent and identically distributed voltages having a lower limit value $v > 0$. We propose a new definition for the mean residual life of the records of the sequence and study its various properties. © 2008 Elsevier B.V. All rights reserved.
Keywords: Record statistics; Hazard rate; Generalized Pareto distribution; Characterization; Stochastic order
1. Introduction
Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed (iid) voltages of a technical system with a common absolutely continuous cumulative distribution function (cdf) $F$, probability density function (pdf) $f$, and survival function $\bar F = 1 - F$. Suppose that the technical system is subject to shocks, e.g. peaks of voltages. If the shocks are viewed as realizations of this sequence of random variables (r.v.s), then the model of record statistics (values of successive peak voltages) is adequate. Let $X_{i:n}$ stand for the $i$th order statistic obtained from the first $n$ observations. The sequence of upper records can be defined as
\[ X_{U(n)} = X_{U_n:U_n}, \quad n = 0, 1, \ldots, \]
where
\[ U_0 = 1, \quad U_n = \min\{ j : j > U_{n-1},\ X_j > X_{U_{n-1}:U_{n-1}} \}, \quad n \ge 1. \]
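The definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original paper: the function name `upper_records` and the sample voltages are hypothetical, and the code simply keeps each observation that strictly exceeds all earlier ones, so the first observation is $X_{U(0)}$.

```python
def upper_records(xs):
    """Return the upper record values X_{U(0)}, X_{U(1)}, ... of the sequence xs."""
    records = []
    for x in xs:
        # The first observation is always a record; afterwards a value is a
        # record only if it strictly exceeds the current maximum.
        if not records or x > records[-1]:
            records.append(x)
    return records


# Hypothetical voltage readings, used only for illustration.
voltages = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(upper_records(voltages))
```

Because each new record must beat the previous one, the returned list is strictly increasing.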
The sequence of lower records can be defined similarly. In this paper, we consider the upper records. Generally, record values can be viewed as order statistics from a sample whose size is determined by the values and the order of occurrence of the observations. Consider a $k$-out-of-$n$ system, in which all components are independent and have the same distribution $F$. The failure time of such a system corresponds to the $(n - k + 1)$th order statistic from the underlying distribution $F$. Recent developments on the mean residual life of such systems are presented in Asadi and Bayramoglu (2005, 2006) and Asadi (2006). The $n$th upper record is just the failure time of a 1-out-of-$U(n)$ system. Under mild conditions, the structure of record values is the same as that of the occurrence times of some corresponding nonhomogeneous Poisson process, of some minimal repair scheme, and of the relevation transform (see Gupta and Kirmani, 1988). Record statistics can also be defined as a model for successive extremes in a sequence of iid r.v.s. They may be helpful as a model for successively largest insurance claims in non-life insurance, for highest water levels, or for highest temperatures. For more details about records and their applications, one may refer to Nagaraja (1988), Arnold et al. (1998), and Ahsanullah (2004).
∗ Corresponding author: M.Z. Raqab.
0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2007.11.014
Now, the marginal pdf of $X_{U(n)}$ is
\[ f_{U(n)}(x) = \frac{[-\log \bar F(x)]^n}{n!}\, f(x), \quad x > 0,\ n \ge 0, \tag{1} \]
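and the joint pdf of $X_{U(m)}$ and $X_{U(n)}$ ($0 \le m < n$) is given by
\[ f_{U(m),U(n)}(x_m, x_n) = \frac{[-\log \bar F(x_m)]^m}{m!}\, \frac{\left[-\log \dfrac{\bar F(x_n)}{\bar F(x_m)}\right]^{n-m-1}}{(n-m-1)!}\, \frac{f(x_m)}{\bar F(x_m)}\, f(x_n), \quad 0 < x_m < x_n < \infty. \tag{2} \]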
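As a brief worked illustration (an added example, assuming a standard exponential parent with survival function $e^{-x}$, which is not singled out at this point in the original text), the marginal pdf of the $n$th upper record reduces to a gamma density:

```latex
\[
  -\log \bar F(x) = x, \qquad f(x) = e^{-x}
  \quad\Longrightarrow\quad
  f_{U(n)}(x) = \frac{x^{n} e^{-x}}{n!}, \qquad x > 0,
\]
% i.e. X_{U(n)} follows a gamma distribution with shape n + 1 and unit scale,
% so that E[X_{U(n)}] = n + 1.
```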
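From (1), the survival function of $X_{U(n)}$, at value $v > 0$, is
\[
\begin{aligned}
\bar F_n(v) = P(X_{U(n)} > v) &= \int_v^\infty \frac{[-\log \bar F(x)]^n}{n!}\, f(x)\,dx \\
&= \int_{-\log \bar F(v)}^\infty \frac{x^n e^{-x}}{n!}\,dx \\
&= \bar F(v) \sum_{j=0}^n \frac{[-\log \bar F(v)]^j}{j!} \\
&= P(Y \le n),
\end{aligned} \tag{3}
\]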
where $Y$ is a Poisson r.v. with mean $-\log \bar F(v)$. Let $X$ be the voltage of a unit. Then, given that the voltage of the unit exceeds $v > 0$, the residual quantity above the lower limit value $v$, namely $(X - v \mid X > v)$, is called the residual r.v. The mean of this r.v. is called the mean residual at value $v$. For an extensive review and applications of the mean residual function, see, for example, Guess and Proschan (1988). For simplicity, let us denote the r.v. $(X - v \mid X > v)$ by $X_v$. Its survival function is
\[ \bar F(x \mid v) = P(X - v > x \mid X > v) = \frac{\bar F(v + x)}{\bar F(v)}. \]
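The Poisson form of the record survival function in (3) is easy to evaluate numerically. The following is a minimal sketch under stated assumptions: the helper name `record_survival` and its `cum_hazard` argument (the function $v \mapsto -\log$ of the parent survival function) are hypothetical names introduced here, and the exponential parent in the usage line is chosen only for illustration.

```python
import math


def record_survival(n, v, cum_hazard):
    """P(X_{U(n)} > v) via the Poisson sum in (3).

    cum_hazard(v) must return -log(survival function of the parent at v).
    """
    lam = cum_hazard(v)
    term = math.exp(-lam)  # j = 0 term, equal to the parent survival at v
    total = term
    for j in range(1, n + 1):
        term *= lam / j    # builds lam**j / j! * exp(-lam) iteratively
        total += term
    return total


# Standard exponential parent: cumulative hazard is the identity.
print(record_survival(1, 1.0, lambda x: x))
```

For $n = 0$ the sum collapses to the parent survival function, as it should, since $X_{U(0)} = X_1$.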
The mean residual of each unit is given by
\[ m(v) = E(X - v \mid X > v) = \int_0^\infty \bar F(x \mid v)\,dx = \frac{\int_v^\infty \bar F(x)\,dx}{\bar F(v)}. \]
The mean residual function uniquely characterizes the parent distribution $F(x)$ (see, for example, Kotz and Shanbhag, 1980). That is,
\[ \bar F(v) = \frac{m(0)}{m(v)} \exp\left\{ -\int_0^v \frac{1}{m(x)}\,dx \right\}. \]
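The inversion formula can be checked numerically. The sketch below is illustrative only: `survival_from_mrl` is a hypothetical helper using a simple midpoint rule, and the linear mean residual function $m(x) = (1 + \theta x)/(1 - \theta)$ with $\theta = 0.25$ is an assumed test case, for which the recovered survival function should be $(1 + \theta v)^{-1/\theta}$.

```python
import math


def survival_from_mrl(m, v, steps=20000):
    """Recover the survival function at v from a mean residual function m."""
    h = v / steps
    # Midpoint rule for the integral of 1/m(x) over [0, v].
    integral = sum(1.0 / m((i + 0.5) * h) for i in range(steps)) * h
    return m(0.0) / m(v) * math.exp(-integral)


theta = 0.25  # hypothetical shape parameter, chosen for illustration


def m_linear(x):
    """Assumed linear mean residual function (1 + theta x) / (1 - theta)."""
    return (1.0 + theta * x) / (1.0 - theta)


v = 2.0
recovered = survival_from_mrl(m_linear, v)
exact = (1.0 + theta * v) ** (-1.0 / theta)
print(recovered, exact)
```

The two printed numbers should agree up to the quadrature error of the midpoint rule.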
Generally, the mean residual function plays an important role in reliability theory, risk analysis, renewal theory, dynamic programming and branching processes. In this paper, we are interested in evaluating the residual records $X_{U(n)} - v$, $v \ge 0$, given that all units have voltages exceeding $v > 0$. Under this setup, we define the mean residual life of records (MRLR) as
\[ \psi_n(v) = E(X_{U(n)} - v \mid X_{U(0)} > v). \]
The MRLR $\psi_n(v)$ can be considered as the mean of the residual life length of a 1-out-of-$U(n)$ system. In the context of stochastic processes, a number of useful relations connect record values with Poisson processes. For example, if the $X_i$'s have a continuous cdf $F(x)$, then the sequence $\tau_n = -\log(1 - F(X_{U(n)}))$, where $\log$ stands for the natural logarithm, coincides with the jump times of a Poisson process. In this connection, the MRLRs arise naturally. For more details about the relations between stochastic processes and record values, see, for example, Nevzorov (1987) and Arnold et al. (1998, Chapter 7).
Several properties of $\psi_n(v)$ are studied. In Section 2, we obtain the form of $\psi_n(v)$ in terms of the distribution function $F$. It is shown, in that section, that when the underlying model is generalized Pareto, the MRLR $\psi_n(v)$ is a linear function of $v$. In Section 3, we prove a representation theorem by which one can recover the parent distribution using $\psi_n(v)$. It is also shown, among other things, that when the distribution function $F$ has a monotone hazard rate, the MRLR $\psi_n(v)$ is monotone.
2. The mean residual life of records
In this section, we consider the mean residual of the voltage peaks (records) on the basis that all voltages of units exceed a lower limit value $v > 0$. Let us first derive an expression for the mean residual life of records.
Theorem 1. Let $X_1, X_2, \ldots$ be a sequence of iid voltages from an absolutely continuous distribution $F$. Given that $X_{U(0)} > v$, $v > 0$, the mean residual life of records after $v$ is
\[ \psi_n(v) = \sum_{j=0}^n \frac{1}{j!} \int_v^\infty \left[ -\log \frac{\bar F(x)}{\bar F(v)} \right]^j \frac{\bar F(x)}{\bar F(v)}\,dx. \tag{4} \]
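Proof. It follows from (2) that the survival function of $(X_{U(n)} - v \mid X_{U(0)} > v)$ can be written as
\[
\begin{aligned}
\bar F_n(x \mid v) &= P(X_{U(n)} \ge v + x \mid X_{U(0)} > v) \\
&= \frac{1}{(n-1)!} \int_{v+x}^\infty \int_v^{x_n} \left[ -\log \frac{\bar F(x_n)}{\bar F(x_0)} \right]^{n-1} \frac{f(x_0)}{\bar F(x_0)}\, \frac{f(x_n)}{\bar F(v)}\,dx_0\,dx_n \\
&= \Gamma\!\left( n + 1,\ -\log \frac{\bar F(v + x)}{\bar F(v)} \right) \\
&= \sum_{j=0}^n \frac{1}{j!} \left[ -\log \frac{\bar F(x + v)}{\bar F(v)} \right]^j \frac{\bar F(x + v)}{\bar F(v)},
\end{aligned} \tag{5}
\]
where
\[ \Gamma(a, t) = \frac{1}{(a-1)!} \int_t^\infty x^{a-1} e^{-x}\,dx. \]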
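Clearly, the distribution of $(X_{U(n)} - v \mid X_{U(0)} > v)$ is just the distribution of the $n$th record of a sequence of iid r.v.s from the conditional distribution of $X_v = (X - v \mid X > v)$. Given that all units have voltages exceeding $v > 0$, we obtain the MRLR of the record system as
\[
\begin{aligned}
\psi_n(v) = E(X_{U(n)} - v \mid X_{U(0)} > v) &= \int_0^\infty \bar F_n(x \mid v)\,dx \\
&= \int_0^\infty \Gamma\!\left( n + 1,\ -\log \frac{\bar F(v + x)}{\bar F(v)} \right) dx \\
&= \sum_{j=0}^n \frac{1}{j!} \int_v^\infty \left[ -\log \frac{\bar F(x)}{\bar F(v)} \right]^j \frac{\bar F(x)}{\bar F(v)}\,dx.
\end{aligned} \tag{6}
\]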
This completes the proof of the theorem.
From (6), $\psi_n(v)$ can also be represented as
\[ \psi_n(v) = \int_v^\infty P(Y_v(x) \le n)\,dx, \]
where $Y_v(x)$ is a Poisson r.v. with mean $\lambda_v(x) = -\log[\bar F(x)/\bar F(v)]$, $x > v$.
The generalized Pareto distribution (GPD) plays an important role in reliability, extreme value theory and other branches of statistics. A r.v. $X$ is said to have a GPD if its survival function is given by
\[ \bar F(x; \theta) = (1 + \theta x)^{-\theta^{-1}}, \quad x \ge 0 \text{ when } 0 < \theta < 1, \quad 0 \le x \le -\theta^{-1} \text{ when } \theta < 0. \]
For $\theta \to 0$, the GPD becomes the standard exponential distribution. When $\theta > 0$, the family of GPD reduces to the Pareto type II distribution, also known as the Lomax distribution. This distribution has been used in studies of income, size of cities and reliability modeling. When $\theta < 0$, the GPD is just the power distribution. The Pareto distribution can also be used in the generation of self-similar traffic for packet injections within a task and in motivating the use of dynamic voltage scaling for minimizing power consumption (see, for example, Shang et al., 2003). For more details about the characterization of this model using records, one may refer to Raqab and Awad (2000).
Example 1. Let $X$ be a r.v. having GPD with survival function
\[ \bar F(x; \theta) = (1 + \theta x)^{-\theta^{-1}}, \quad x \ge 0 \text{ when } 0 < \theta < 1, \quad 0 \le x \le -\theta^{-1} \text{ when } \theta < 0. \]
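From (6), the mean residual life of records for the GPD can be simplified as
\[ \psi_n(v) = (1 + \theta v) \sum_{j=0}^n \frac{1}{(1 - \theta)^{j+1}}, \quad \theta < 1. \]
For the exponential distribution ($\theta \to 0$), the MRLR is just $\psi_n(v) = n + 1$, which does not depend on $v$. For $\theta < 1$ and $\theta \ne 0$, we immediately have
\[ \psi_n(v) = \frac{1 + \theta v}{\theta} \left[ \left( \frac{1}{1 - \theta} \right)^{n+1} - 1 \right]. \]
In the case of the power distribution function ($\theta < 0$), the MRLR of the system converges to $-(1 + \theta v)/\theta$ as $n$ gets large. In the case of the classical Pareto distribution (Pareto type I), with cdf $F(x) = 1 - x^{-\beta}$, $x > 1$, $\beta > 1$, the MRLR is derived to be
\[ \psi_n(v) = v \left[ \left( \frac{\beta}{\beta - 1} \right)^{n+1} - 1 \right], \quad \beta > 1. \]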
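The closed forms of Example 1 can be compared with a direct numerical evaluation of the Poisson representation of the MRLR, that is, the integral over $x > v$ of $P(Y \le n)$ with $Y$ Poisson of mean $-\log$ of the survival ratio at $x$ and $v$. The sketch below is an added illustration under stated assumptions: the helper names (`poisson_cdf`, `mrlr_numeric`, `mrlr_gpd`), the midpoint rule, the truncation point `upper`, and the parameter values are all hypothetical choices. A negative shape parameter is used so that the support is the finite interval $[0, -1/\theta]$.

```python
import math


def poisson_cdf(n, lam):
    """P(Y <= n) for a Poisson r.v. Y with mean lam."""
    term = math.exp(-lam)
    total = term
    for j in range(1, n + 1):
        term *= lam / j
        total += term
    return total


def mrlr_numeric(sf, n, v, upper, steps=20000):
    """Midpoint-rule approximation of the MRLR integral over (v, upper)."""
    h = (upper - v) / steps
    total = 0.0
    for i in range(steps):
        x = v + (i + 0.5) * h
        lam = -math.log(sf(x) / sf(v))  # Poisson mean at x, given threshold v
        total += poisson_cdf(n, lam)
    return total * h


def mrlr_gpd(theta, n, v):
    """Closed form from Example 1 (theta < 1); theta == 0 is the exponential case."""
    if theta == 0.0:
        return float(n + 1)
    return (1.0 + theta * v) / theta * ((1.0 - theta) ** (-(n + 1)) - 1.0)


theta, n, v = -0.5, 2, 0.5  # illustrative values only


def sf_gpd(x):
    """GPD survival function for the illustrative theta above."""
    return (1.0 + theta * x) ** (-1.0 / theta)


numeric = mrlr_numeric(sf_gpd, n, v, upper=-1.0 / theta)
closed = mrlr_gpd(theta, n, v)
print(numeric, closed)
```

For heavy-tailed cases ($\theta > 0$) a finite `upper` truncates the integral, so a much larger cutoff or a change of variables would be needed; that is outside the scope of this sketch.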
Generally, the increments of the MRLRs are
\[ \psi_n(v) - \psi_{n-1}(v) = \frac{1}{n!} \int_v^\infty [\lambda_v(x)]^n e^{-\lambda_v(x)}\,dx, \quad x > v. \]
This leads to the following remark.
Remark 1. For all $v > 0$, we have
\[ \psi_{n-1}(v) \le \psi_n(v), \quad n = 0, 1, \ldots, \]
and $\psi_{-1}(v) = 0$.
That is, $\psi_n(v)$ is an increasing function of $n$. The inequality becomes strict if the distribution function $F$ is strictly increasing. Recursively, $\psi_n(v) \ge m(v)$ for $v > 0$.
3. Characterizations and properties of $\psi_n(v)$
In this section, we present some distributional properties of the MRLR $\psi_n(v)$. The following theorem provides a characterization of the underlying distribution via $\psi_n(v)$ and $\psi_{n-1}(v)$.
Theorem 2. Let $X_1, X_2, \ldots$ be a sequence of iid voltages from a common absolutely continuous distribution function $F(x)$. Then the distribution function $F(x)$ can be characterized as follows:
\[ F(x) = 1 - \exp\left\{ -\int_0^x \frac{1 + \psi_n'(y)}{\psi_n(y) - \psi_{n-1}(y)}\,dy \right\}, \quad x > 0 \text{ and } n = 0, 1, \ldots, \tag{7} \]
where $\psi_{-1}(x) = 0$.
Proof. From (6), we have
\[ \psi_n(v) = \sum_{j=0}^n \int_v^\infty \frac{[\lambda_v(x)]^j e^{-\lambda_v(x)}}{j!}\,dx. \tag{8} \]
On differentiating both sides of (8) with respect to $v$, we immediately have
\[ \psi_n'(v) = r(v) \left[ \sum_{j=0}^n \int_v^\infty \frac{[\lambda_v(x)]^j e^{-\lambda_v(x)}}{j!}\,dx - \sum_{j=1}^n \int_v^\infty \frac{[\lambda_v(x)]^{j-1} e^{-\lambda_v(x)}}{(j-1)!}\,dx \right] - 1, \]
or equivalently
\[ 1 + \psi_n'(v) = r(v)\,(\psi_n(v) - \psi_{n-1}(v)), \]
where $r(v) = f(v)/\bar F(v)$ is the hazard rate of the distribution function $F$. Therefore
\[ r(v) = \frac{1 + \psi_n'(v)}{\psi_n(v) - \psi_{n-1}(v)}, \quad v > 0. \]
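As a quick sanity check of this hazard-rate identity (an added worked example, using the exponential case of Example 1, where the MRLR equals $n + 1$ for every $v$):

```latex
\[
  \psi_n(v) = n + 1
  \;\Longrightarrow\;
  \psi_n'(v) = 0, \quad \psi_n(v) - \psi_{n-1}(v) = 1,
  \quad\text{so}\quad
  r(v) = \frac{1 + 0}{1} = 1 ,
\]
% which is the constant hazard rate of the standard exponential distribution,
% and integrating r over [0, x] gives F(x) = 1 - e^{-x}.
```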
Integrating both sides of this expression for $r(v)$ from 0 to $x$ yields (7). In fact, if $F$ is strictly increasing then by Remark 1, $\psi_n(v) > \psi_{n-1}(v)$, $n = 1, 2, \ldots$, and $r(v)$ is well-defined.
Remark 2. For a technical system of voltage peaks from an iid sequence of voltages from a common cdf $F$ with
\[ \psi_n(v) = \frac{1 + \theta v}{\theta} \left( (1 - \theta)^{-(n+1)} - 1 \right), \quad \theta \ne 0,\ \theta < 1, \]
we have
\[ \psi_n(v) - \psi_{n-1}(v) = \frac{1 + \theta v}{(1 - \theta)^{n+1}}. \]
From Theorem 2, the resulting parent cdf is the GPD given in Example 1.
Theorem 3. Assume that $\phi(x) = m(x)/m(0)$, where $m$ denotes the mean residual life function of $F$. Then there exist a point $v_0 \in \mathbb{R}_+$ with $\bar F(v_0) > 0$ and a sequence $\{v_m : m = 1, 2, \ldots\}$ of points lying in $(v_0, \infty)$ such that it converges to $v_0$ and
\[ \left( \frac{X_{U(n)} - v_m}{\phi(v_m)} \,\Big|\, X_{U(0)} > v_m \right) \stackrel{d}{=} X_{U(n)}, \tag{9} \]
if and only if $F$ is GPD.
Proof. The "if" part of the theorem is easy to prove. To prove the "only if" part, note that the above equality holds if and only if
\[ \frac{1}{n!} \int_{H[v_m + \phi(v_m) x] - H(v_m)}^\infty u^n e^{-u}\,du = \frac{1}{n!} \int_{H(x)}^\infty u^n e^{-u}\,du, \]
where $H(x) = -\log \bar F(x)$ is the hazard function. This in turn implies that
\[ H[v_m + \phi(v_m) x] - H(v_m) = H(x), \quad m = 0, 1, 2, \ldots,\ x > 0. \]
From this we get that
\[ \bar F(v_m + \phi(v_m) x) = \bar F(v_m)\,\bar F(x), \quad m = 0, 1, 2, \ldots,\ x > 0. \]
The result follows from Theorem 3.4 of Asadi et al. (2001).
The concept of a proportional mean residual lives (MRLs) model, similar to Cox's proportional hazards model, has been proposed in the literature as a semi-parametric model to study the association between the mean residual life and an individual subject's explanatory covariates (see, for example, Maguluri and Zhang, 1994). Let $m_1$ and $m_2$ denote the MRLs of distribution functions $F$ and $G$, respectively. $F$ and $G$ are said to have proportional MRLs if and only if there exists a constant $\delta > 0$ such that
\[ m_1(t) = \delta\, m_2(t), \quad t > 0. \]
In the following theorem we prove that the MRLR $\psi_n(t)$ and the MRL of the parent distribution are proportional if and only if the underlying distribution is GPD.
Theorem 4. Let $X_1, X_2, \ldots$ be an iid sequence of voltages with absolutely continuous distribution function $F$. Denote by $m$ and $\psi_n$ the MRL of $F$ and the MRLR of the records, respectively, and assume that $m(0) = (1 - \theta)^{-1}$. Then
\[ \psi_n(v) = c_n\, m(v), \quad v > 0, \tag{10} \]
where $c_n = \theta^{-1} (1 - \theta) \left( (1 - \theta)^{-(n+1)} - 1 \right)$, $\theta < 1$, $\theta \ne 0$, if and only if $F$ is GPD of the form in Example 1.
Proof. The "if" part of the theorem easily follows from Example 1 on noting that for the GPD in this form we have $m(v) = (1 + \theta v)/(1 - \theta)$. To prove the "only if" part, let Eq. (10) hold. One can easily show that the relation between the hazard rate $r(v)$ and the MRL $m(v)$ is $r(v) = (1 + m'(v))/m(v)$. On the other hand, we have from Theorem 2 that
\[ r(v) = \frac{1 + \psi_n'(v)}{\psi_n(v) - \psi_{n-1}(v)}, \quad v > 0. \]
Hence we have
\[ \frac{1 + m'(v)}{m(v)} = \frac{1 + c_n m'(v)}{(c_n - c_{n-1})\, m(v)}, \]
which implies that
\[ m'(v) = \frac{c_n - c_{n-1} - 1}{c_{n-1}} = \frac{\theta}{1 - \theta}. \]
Integration of both sides of the above equation leads to
\[ m(v) = \frac{\theta}{1 - \theta}\, v + c, \]
for some constant $c$. Using the assumption that $m(0) = (1 - \theta)^{-1}$, we obtain $c = (1 - \theta)^{-1}$. Hence the MRL of $F$ is
\[ m(v) = \frac{1 + \theta v}{1 - \theta}, \quad \theta < 1,\ v > 0, \]
which is the MRL of GPD. As the MRL uniquely determines the distribution, the proof is complete.
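The algebraic step in the proof, namely that $(c_n - c_{n-1} - 1)/c_{n-1}$ equals $\theta/(1 - \theta)$ for every $n$, can be checked numerically. The snippet below is a small added sketch: the function name `c` and the value of `theta` are illustrative assumptions, and the loop starts at $n = 1$ because $c_{-1} = 0$.

```python
def c(n, theta):
    """Proportionality constant c_n from (10); requires theta < 1, theta != 0."""
    return (1.0 - theta) / theta * ((1.0 - theta) ** (-(n + 1)) - 1.0)


theta = 0.3  # illustrative value only
target = theta / (1.0 - theta)
slopes = []
for n in range(1, 6):
    # Slope m'(v) implied by equating the two expressions for the hazard rate.
    slope = (c(n, theta) - c(n - 1, theta) - 1.0) / c(n - 1, theta)
    slopes.append(slope)
    print(n, slope, target)
```

Note also that `c(0, theta)` evaluates to 1, in line with the fact that the MRLR of order 0 is the MRL itself.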
Remark 3. In Theorem 4 one does not actually need to assume that $m(0) = (1 - \theta)^{-1}$. If we assume that $m(0) = c$, where $c$ is an arbitrary positive constant, then Eq. (10), which gives $m(v) = [\theta/(1 - \theta)]\,v + c$, characterizes a two-parameter family of GPD with survival function
\[ \bar F(t) = \left( \frac{c(1 - \theta)}{\theta t + c(1 - \theta)} \right)^{\theta^{-1}}, \quad t > 0,\ \theta < 1,\ c > 0. \]
The following two theorems study the monotonicity of the MRLR $\psi_n(v)$ and the comparison between the MRLRs of two technical record systems. In this context, a distribution function $F$ is said to have an increasing failure rate (IFR) if its hazard rate $r(v)$ is increasing, and a decreasing failure rate (DFR) if $r(v)$ is decreasing.
Theorem 5. Let $X_1, X_2, \ldots$ be an iid sequence of voltages with absolutely continuous IFR (DFR) distribution function $F$. Then the MRLR of the system, $\psi_n(v)$, is decreasing (increasing).
Proof. The result follows from the fact that $F$ is IFR (DFR) if and only if, for $x, v > 0$, $\bar F(x + v)/\bar F(v)$ is decreasing (increasing) in $v$. If $F$ is IFR (DFR) then for $v_1 < v_2$, we have
\[ \frac{\bar F(x + v_1)}{\bar F(v_1)} \ge (\le)\ \frac{\bar F(x + v_2)}{\bar F(v_2)}, \]
or equivalently
\[ -\log \frac{\bar F(x + v_1)}{\bar F(v_1)} \le (\ge)\ -\log \frac{\bar F(x + v_2)}{\bar F(v_2)}. \]
From (5), we obtain
\[ \bar F_n'(x \mid v) = (r(v) - r(x + v))\,(\bar F_n(x \mid v) - \bar F_{n-1}(x \mid v)), \]
where $g'(\cdot)$ denotes the derivative of $g(\cdot)$, here taken with respect to $v$. Since
\[ \bar F_n(x \mid v) - \bar F_{n-1}(x \mid v) = \frac{1}{n!} \left[ -\log \frac{\bar F(x + v)}{\bar F(v)} \right]^n \frac{\bar F(x + v)}{\bar F(v)} > 0, \quad \text{for } x > 0, \]
$\bar F_n(x \mid v)$ is decreasing (increasing) in $v > 0$. Therefore, it follows from (6) that $\psi_n(v)$ is decreasing (increasing) in $v$.
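The monotonicity in Theorem 5 can be illustrated numerically with two Weibull-type parents. This is an added sketch under stated assumptions: the MRLR is computed from the Poisson representation using the cumulative hazard $H(x) = -\log$ of the survival function, the helper names, truncation widths and step counts are hypothetical, and the two parents (cumulative hazards $x^2$ and $\sqrt{x}$) are chosen only because their hazard rates are increasing and decreasing, respectively.

```python
import math


def poisson_cdf(n, lam):
    """P(Y <= n) for a Poisson r.v. Y with mean lam."""
    term = math.exp(-lam)
    total = term
    for j in range(1, n + 1):
        term *= lam / j
        total += term
    return total


def mrlr(cum_hazard, n, v, width, steps):
    """Midpoint-rule MRLR over (v, v + width), using H(x) - H(v) as Poisson mean."""
    h = width / steps
    base = cum_hazard(v)
    total = 0.0
    for i in range(steps):
        x = v + (i + 0.5) * h
        total += poisson_cdf(n, cum_hazard(x) - base)
    return total * h


points = (0.5, 1.0, 2.0)
# IFR example: H(x) = x**2, hazard rate 2x is increasing.
ifr_vals = [mrlr(lambda x: x ** 2, 1, v, 12.0, 6000) for v in points]
# DFR example: H(x) = sqrt(x), hazard rate 1 / (2 sqrt(x)) is decreasing.
dfr_vals = [mrlr(math.sqrt, 1, v, 4000.0, 40000) for v in points]
print(ifr_vals)
print(dfr_vals)
```

The first list should be decreasing in $v$ and the second increasing, in line with the theorem; the truncation widths are chosen large enough that the neglected tails are negligible for these two parents.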
Theorem 6. Let $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ be two iid sequences of voltages with absolutely continuous distribution functions $F$, $G$ and hazard rates $r_F(v)$ and $r_G(v)$, respectively. If for $v > 0$, $r_F(v) \le r_G(v)$, then $\psi_n^F(v) \ge \psi_n^G(v)$.
Proof. To prove the result, first note that if $X$ and $Y$ are two r.v.s such that $P(X > u) \le P(Y > u)$ for all $u$, then $X$ is said to be smaller than $Y$ in the usual stochastic order (denoted by $X \le_{st} Y$). One can note, from Shaked and Shanthikumar (1994, p. 13), that
\[ r_F(t) \le r_G(t) \iff (Y - t \mid Y > t) \le_{st} (X - t \mid X > t). \]
By noting that the stochastic order is preserved by records, we obtain
\[ (Y_{U(n)} - t \mid Y_{U(0)} > t) \le_{st} (X_{U(n)} - t \mid X_{U(0)} > t). \]
This, in turn, implies that $\psi_n^F(v) \ge \psi_n^G(v)$. Hence the proof is complete.
Acknowledgment
The authors thank the associate editor and referees for helpful suggestions which have considerably improved the presentation of the paper. The second author would like to thank the Statistics Center of Excellence of Ferdowsi University of Mashhad for supporting this research work.
References
Ahsanullah, M., 2004. Record Values: Theory and Applications. University Press of America Inc., New York.
Arnold, B.C., Balakrishnan, N., Nagaraja, H.N., 1998. Records. Wiley, New York.
Asadi, M., 2006. On the mean past lifetime of the components of a parallel system. J. Statist. Plann. Inference 136, 1197-1206.
Asadi, M., Bayramoglu, I., 2005. A note on the mean residual life function of a parallel system. Comm. Statist. Theory Methods 34 (2), 475-485.
Asadi, M., Bayramoglu, I., 2006. The mean residual life function of a k-out-of-n structure at the system level. IEEE Trans. Reliability 55 (2), 314-318.
Asadi, M., Rao, C.R., Shanbhag, D.N., 2001. Some unified characterization results on the generalized Pareto distribution. J. Statist. Plann. Inference 93, 29-50.
Guess, F., Proschan, F., 1988. Mean residual life: theory and applications. In: Krishnaiah, P.R., Rao, C.R. (Eds.), Handbook of Statistics, vol. 7. pp. 215-224.
Gupta, R.C., Kirmani, S.N.U.A., 1988. Closure and monotonicity properties of nonhomogeneous Poisson processes and record values. Probab. Eng. Inform. Sci. 2, 475-484.
Kotz, S., Shanbhag, D.N., 1980. Some new approaches to probability distributions. Adv. in Appl. Probab. 12, 903-921.
Maguluri, G., Zhang, C.H., 1994. Estimation in the mean residual life regression. J. Roy. Statist. Soc. B 56 (3), 477-489.
Nagaraja, H.N., 1988. Record values and related statistics: a review. Comm. Statist. Theory Methods 17, 2223-2238.
Nevzorov, V.B., 1987. Records. Theory Probab. Appl. 32 (2), 201-228 (English Translation).
Raqab, M.Z., Awad, A.M., 2000. Characterization of Pareto and related distributions. Metrika 52, 63-67.
Shaked, M., Shanthikumar, J.G., 1994. Stochastic Orders and Their Applications. Academic Press Inc., New York.
Shang, L., Peh, L.-S., Jha, N.K., 2003. Dynamic voltage scaling with links for power optimization of interconnection. In: Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA), Anaheim, CA.