Statistics and Probability Letters 79 (2009) 125–130
Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro
Equivalent processes of total time on test, Lorenz and inverse Lorenz processes

Janusz Kawczak a,∗, Reg Kulperger a, Hao Yu a

a Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada
Article info
Article history: Received 17 October 2004; received in revised form 11 January 2008; accepted 23 July 2008; available online 6 August 2008

Abstract
In this paper we use empirical processes indexed by functions to study total time on test, Lorenz and inverse Lorenz (Goldie) processes. We show that these processes and their function-indexed counterparts converge weakly to the same respective Gaussian limiting processes. © 2008 Elsevier B.V. All rights reserved.
MSC: primary 62G05 secondary 60F17
1. Introduction

Total time on test (TTT) is a statistical function that plays a central role in reliability and life testing. Applications and the asymptotic theory of TTT have been developed by many authors (see, for example, Barlow and Campo (1975), Barlow and Proschan (1975), Bergman and Klefsjö (1984) and Csörgő et al. (1986), hereafter CsCsH). The Lorenz curve is a graph of the cumulative proportion of total 'wealth' owned against the cumulative proportion of the population owning it. It is mainly used in economics, applied to income and wealth, and also to business concentration and the distribution of firm sizes (Hart, 1971, 1975). The asymptotic theory for Lorenz and inverse Lorenz processes has been developed by many authors (cf. Gastwirth (1971, 1972), Kakwani and Podder (1973), Goldie (1977), Chandra and Singpurwalla (1978), Sendler (1982) and CsCsH (1986)). In particular, the research monograph of CsCsH (1986) contains a comprehensive review of the asymptotic theory of TTT, Lorenz and inverse Lorenz processes in the i.i.d. case up to that time, and provides the first general convergence theory for i.i.d.-based empirical total time on test, Lorenz and inverse Lorenz processes and some related empirical reliability processes.

In what follows, X denotes a generic random variable with distribution function F, F(0) = 0, and we assume throughout that the corresponding mean is finite:
$$\mu = EX = \int_0^{\infty} x\,dF(x) = \int_0^{\infty} (1 - F(x))\,dx < \infty.$$
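For a sample, the same identity holds exactly with F replaced by the empirical distribution function: the area under the empirical survival function $1 - F_n$ equals the sample mean. The Python sketch below checks this with exact rational arithmetic; the function name `area_under_empirical_survival` is our own, chosen for illustration, and the sample is assumed nonnegative, as implied by F(0) = 0.

```python
from fractions import Fraction


def area_under_empirical_survival(sample):
    """Exact integral of 1 - F_n(x) over [0, infinity) for a nonnegative sample.

    F_n is a step function, so the integral reduces to a finite sum over the
    gaps between consecutive order statistics 0 <= x_(1) <= ... <= x_(n).
    """
    xs = sorted(sample)
    if xs and xs[0] < 0:
        raise ValueError("the sample must be nonnegative (F(0) = 0)")
    n = len(xs)
    area = Fraction(0)
    prev = 0
    for i, x in enumerate(xs):
        # For every y in the gap [prev, x) exactly i observations are <= y,
        # so 1 - F_n(y) = (n - i) / n there.  Tied observations give empty gaps.
        area += Fraction(n - i, n) * (x - prev)
        prev = x
    return area


sample = [3, 1, 4, 1, 5]
print(area_under_empirical_survival(sample))   # 14/5
print(Fraction(sum(sample), len(sample)))      # 14/5, the sample mean
```

Working with `Fraction` makes the equality exact rather than approximate; with float data the two numbers agree only up to rounding.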
∗ Corresponding author. E-mail address: [email protected] (J. Kawczak).
doi:10.1016/j.spl.2008.07.041

Introduce the TTT transform and the Lorenz curve of F, respectively, as

$$H_F(u) = \int_0^{Q(u)} (1 - F(x))\,dx, \qquad 0 \le u \le 1, \tag{1.1}$$

$$L_F(u) = \frac{1}{\mu}\int_0^{u} Q(t)\,dt, \qquad 0 \le u \le 1, \tag{1.2}$$
where $Q(t) = F^{-1}(t) = \inf\{x : F(x) \ge t\}$, $0 < t \le 1$, $Q(0) = Q(0+)$, denotes the left-continuous quantile function of the right-continuous distribution function F. Since the Lorenz curve $L_F$ is continuous and strictly increasing on [0, 1], it has a well-defined, continuous and strictly increasing inverse function $L_F^{-1}$ on [0, 1]. $L_F$ is also called the concentration curve pertaining to F. Let $F_n(x)$ and $Q_n(u)$ be the empirical distribution and quantile functions of the sample, defined respectively by

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I\{X_i \le x\}, \quad 0 \le x < \infty, \qquad Q_n(u) = F_n^{-1}(u), \quad 0 \le u \le 1, \tag{1.3}$$
where I(A) is the indicator function of the set A. The empirical TTT process $t_n(u)$ and its scaled version $s_n(u)$ are then defined, respectively, by

$$t_n(u) = n^{1/2}\big(H_{F_n}(u) - H_F(u)\big), \qquad 0 \le u \le 1, \tag{1.4}$$

$$s_n(u) = n^{1/2}\big(H_{F_n}(u)/\bar X_n - H_F(u)/\mu\big), \qquad 0 \le u \le 1, \tag{1.5}$$

where $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$ is the sample mean. Next, the unscaled empirical Lorenz process $k_n(u)$ is defined by
$$k_n(u) = n^{1/2}\big(\bar X_n L_{F_n}(u) - \mu L_F(u)\big), \qquad 0 \le u \le 1, \tag{1.6}$$

the empirical Lorenz process $l_n(u)$ is defined by

$$l_n(u) = n^{1/2}\big(L_{F_n}(u) - L_F(u)\big), \qquad 0 \le u \le 1, \tag{1.7}$$

and the empirical inverse Lorenz process, also called the Goldie concentration process, $c_n(u)$, is defined by

$$c_n(u) = n^{1/2}\big(L_{F_n}^{-1}(u) - L_F^{-1}(u)\big), \qquad 0 \le u \le 1. \tag{1.8}$$
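The empirical quantities entering (1.4)–(1.8) have simple closed forms in terms of the order statistics $x_{(1)} \le \dots \le x_{(n)}$. The Python sketch below is our own illustration (all function names are hypothetical) and assumes a sample of strictly positive observations, so that $L_{F_n}$ is strictly increasing. With $m = \max(1, \lceil nu \rceil)$, $H_{F_n}(u) = n^{-1}\{x_{(1)} + \dots + x_{(m)} + (n - m)x_{(m)}\}$, the classical total time on test at the m-th failure divided by n; $L_{F_n}$ is piecewise linear, so its inverse is found by locating the right linear piece. Passing u as a `Fraction` keeps grid points such as u = 3/10 exact, which plain floats do not guarantee inside `ceil`/`floor`.

```python
import math
from fractions import Fraction


def _order_statistics(sample):
    xs = sorted(sample)
    if not xs or xs[0] <= 0:
        raise ValueError("expected a nonempty sample of positive observations")
    return xs


def empirical_quantile(sample, u):
    """Q_n(u) = inf{x : F_n(x) >= u}, with Q_n(0) = Q_n(0+) = x_(1)."""
    xs = _order_statistics(sample)
    m = max(1, math.ceil(len(xs) * u))
    return xs[m - 1]


def empirical_ttt(sample, u):
    """H_{F_n}(u): integral of 1 - F_n over [0, Q_n(u)]."""
    xs = _order_statistics(sample)
    n = len(xs)
    m = max(1, math.ceil(n * u))
    # Total time on test up to the m-th order statistic, divided by n.
    return (sum(xs[:m]) + (n - m) * xs[m - 1]) / Fraction(n)


def empirical_lorenz(sample, u):
    """L_{F_n}(u) = (1 / mean) * integral_0^u Q_n(t) dt."""
    xs = _order_statistics(sample)
    n = len(xs)
    k = min(math.floor(n * u), n)       # number of full steps of Q_n in [0, u]
    area = sum(xs[:k])                  # n times the integral over [0, k/n]
    if k < n:
        area += (n * u - k) * xs[k]     # n times the partial step on [k/n, u]
    return area / Fraction(sum(xs))


def empirical_inverse_lorenz(sample, u):
    """L_{F_n}^{-1}(u): the v in [0, 1] with L_{F_n}(v) = u."""
    xs = _order_statistics(sample)
    n = len(xs)
    target = u * Fraction(sum(xs))      # u times n * mean
    cumulative = 0
    for k, x in enumerate(xs):
        if cumulative + x >= target:
            # On [k/n, (k+1)/n]: n * mean * L_{F_n}(v) = cumulative + (n v - k) x.
            return (k + (target - cumulative) / Fraction(x)) / n
        cumulative += x
    return Fraction(1)


data = [1, 2, 3, 4]
half = Fraction(1, 2)
print(empirical_quantile(data, half))                   # 2
print(empirical_ttt(data, half))                        # 7/4
print(empirical_lorenz(data, half))                     # 3/10
print(empirical_inverse_lorenz(data, Fraction(3, 10)))  # 1/2
```

Evaluating the processes (1.4)–(1.8) themselves additionally requires the population quantities $H_F$, $L_F$ and $L_F^{-1}$, which depend on the assumed F.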
The structure of this paper is as follows. In Section 2, we use the limiting covariance functions of the TTT, Lorenz and Goldie processes to derive empirical processes indexed by functions with the same covariance structure. We then verify that, under the same conditions, these processes converge weakly to the same respective Gaussian processes.

2. Main results

Let us first present the limiting Gaussian processes of $t_n(u)$, $s_n(u)$, $k_n(u)$, $l_n(u)$ and $c_n(u)$, respectively; proofs can be found in CsCsH (1986):

$$T_F(u) = \int_0^u B(t)\,dQ(t) + \frac{B(u)}{r(Q(u))}, \qquad 0 \le u \le 1,$$

$$S_F(u) = \mu^{-1} T_F(u) - \mu^{-2} H_F(u)\,T_F(1), \qquad 0 \le u \le 1,$$

$$\Delta_F(u) = \int_0^u B(t)\,dQ(t), \qquad 0 \le u \le 1,$$

$$\Lambda_F(u) = \mu^{-1}\{\Delta_F(u) - L_F(u)\,\Delta_F(1)\}, \qquad 0 \le u \le 1,$$

$$C_F(u) = \frac{\mu\,\Lambda_F(L_F^{-1}(u))}{Q(L_F^{-1}(u))} = \frac{\Delta_F(L_F^{-1}(u)) - u\,\Delta_F(1)}{Q(L_F^{-1}(u))}, \qquad 0 \le u \le 1,$$

where $\{B(u), 0 \le u \le 1\}$ is a standard Brownian bridge and $r(x) = r_F(x) = F'(x)/(1 - F(x))$ is the failure (hazard) rate function. To compute the covariance functions of these Gaussian processes, we need the following extended Hoeffding identity (cf. Theorem 2.3 of Yu (1993)):
$$\operatorname{Cov}\big(f_1(X_1), f_2(X_2)\big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_1'(x_1)\,f_2'(x_2)\,\operatorname{Cov}\big(I(X_1 \le x_1), I(X_2 \le x_2)\big)\,dx_1\,dx_2, \tag{2.1}$$

for any absolutely continuous functions $f_1$ and $f_2$. Based on formula (2.1), we have

$$\begin{aligned}
E\,T_F(s)T_F(t) &= \int_0^s\!\!\int_0^t E\,B(u)B(v)\,dQ(u)\,dQ(v) + \frac{1}{r(Q(s))}\int_0^t E\,B(s)B(v)\,dQ(v) \\
&\quad + \frac{1}{r(Q(t))}\int_0^s E\,B(u)B(t)\,dQ(u) + \frac{E\,B(s)B(t)}{r(Q(s))\,r(Q(t))} \\
&= \int_0^{Q(s)}\!\!\int_0^{Q(t)} \operatorname{Cov}\big(I(X \le x), I(X \le y)\big)\,dx\,dy + \frac{\operatorname{Cov}\big(I(F(X) > s), I(F(X) > t)\big)}{r(Q(s))\,r(Q(t))} \\
&\quad + \frac{1}{r(Q(s))}\int_0^{Q(t)}\!\!\int_0^{1} \operatorname{Cov}\big(I(I(F(X) > s) \le x), I(X \le y)\big)\,dx\,dy \\
&\quad + \frac{1}{r(Q(t))}\int_0^{Q(s)}\!\!\int_0^{1} \operatorname{Cov}\big(I(X \le x), I(I(F(X) > t) \le y)\big)\,dy\,dx \\
&= \operatorname{Cov}\big(t_F(X, s), t_F(X, t)\big),
\end{aligned} \tag{2.2}$$

where $t_F(X, u) = X I(X \le Q(u)) + \big(Q(u) + \frac{1}{r(Q(u))}\big) I(X > Q(u))$. Similarly, we have
$$E\,S_F(s)S_F(t) = \operatorname{Cov}\big(s_F(X, s), s_F(X, t)\big), \tag{2.3}$$

$$E\,\Delta_F(s)\Delta_F(t) = \int_0^s\!\!\int_0^t E\,B(u)B(v)\,dQ(u)\,dQ(v) = \int_0^{Q(s)}\!\!\int_0^{Q(t)} \operatorname{Cov}\big(I(X \le x), I(X \le y)\big)\,dx\,dy = \operatorname{Cov}\big(k_F(X, s), k_F(X, t)\big), \tag{2.4}$$

$$E\,\Lambda_F(s)\Lambda_F(t) = \operatorname{Cov}\big(l_F(X, s), l_F(X, t)\big), \tag{2.5}$$

$$E\,C_F(s)C_F(t) = \operatorname{Cov}\big(c_F(X, s), c_F(X, t)\big), \tag{2.6}$$

where

$$s_F(X, u) = \mu^{-1} t_F(X, u) - \mu^{-2} H_F(u)\,t_F(X, 1) = \mu^{-1} t_F(X, u) - \mu^{-2} H_F(u)\,X,$$

$$k_F(X, u) = X I(X \le Q(u)) + Q(u) I(X > Q(u)),$$

$$l_F(X, u) = \mu^{-1}\{k_F(X, u) - L_F(u)\,k_F(X, 1)\} = \mu^{-1}\{k_F(X, u) - L_F(u)\,X\},$$

$$c_F(X, u) = \frac{\mu\,l_F(X, L_F^{-1}(u))}{Q(L_F^{-1}(u))} = \frac{k_F(X, L_F^{-1}(u)) - uX}{Q(L_F^{-1}(u))}.$$
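As a numerical sanity check of (2.4), take F to be the Uniform(0, 1) distribution (our choice, made only because everything is explicit): then Q(u) = u, dQ(u) = du, $E\,B(u)B(v) = u \wedge v - uv$ and $k_F(X, u) = \min(X, u)$. The Python sketch below (function names are ours) approximates the Brownian-bridge double integral by the midpoint rule and compares it with the exact covariance of $k_F(X, s)$ and $k_F(X, t)$.

```python
def bridge_covariance_integral(s, t, grid=400):
    """Midpoint-rule value of int_0^s int_0^t (min(u, v) - u v) du dv,
    i.e. E Delta_F(s) Delta_F(t) when F is Uniform(0, 1)."""
    hs, ht = s / grid, t / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * hs
        for j in range(grid):
            v = (j + 0.5) * ht
            total += min(u, v) - u * v
    return total * hs * ht


def cov_kF_uniform(s, t):
    """Exact Cov(k_F(X, s), k_F(X, t)) with k_F(X, u) = min(X, u), X ~ Uniform(0, 1)."""
    a, b = min(s, t), max(s, t)
    mean_a = a - a * a / 2                 # E min(X, a)
    mean_b = b - b * b / 2                 # E min(X, b)
    # E[min(X, a) min(X, b)], split over X <= a, a < X <= b and X > b.
    cross = a**3 / 3 + a * (b * b - a * a) / 2 + a * b * (1 - b)
    return cross - mean_a * mean_b


approx = bridge_covariance_integral(0.3, 0.7)
exact = cov_kF_uniform(0.3, 0.7)
print(approx, exact)
print(round(exact, 6))   # 0.015975
```

The midpoint rule is exact on the cells where the integrand is bilinear, so the only discretization error comes from cells crossed by the diagonal u = v; a finer `grid` shrinks it further.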
In light of these calculations, it is of interest to study the distributional properties of the empirical processes corresponding to $t_F$, $s_F$, $k_F$, $l_F$ and $c_F$. We use a concise notation from the empirical-process standpoint; in each of the following processes the index set varies according to the definition of the process. For $0 \le u \le 1$, define

$$\hat t_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(t_F(X_i, u) - E\,t_F(X_i, u)\big),$$

$$\hat s_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(s_F(X_i, u) - E\,s_F(X_i, u)\big) = \mu^{-1}\hat t_n(u) - \mu^{-2} H_F(u)\,\hat t_n(1),$$

$$\hat k_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(k_F(X_i, u) - E\,k_F(X_i, u)\big),$$

$$\hat l_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(l_F(X_i, u) - E\,l_F(X_i, u)\big) = \mu^{-1}\big(\hat k_n(u) - L_F(u)\,\hat k_n(1)\big),$$

$$\hat c_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(c_F(X_i, u) - E\,c_F(X_i, u)\big) = \frac{1}{Q(L_F^{-1}(u))}\big(\hat k_n(L_F^{-1}(u)) - u\,\hat k_n(1)\big).$$
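Let us concentrate on the $\hat t_n(\cdot)$ process first, which can be naturally partitioned as

$$\begin{aligned}
\hat t_n(u) &= n^{-1/2}\sum_{i=1}^{n}\big(X_i I(X_i \le Q(u)) - E\,X_i I(X_i \le Q(u))\big) \\
&\quad + n^{-1/2}\Big(Q(u) + \frac{1}{r(Q(u))}\Big)\sum_{i=1}^{n}\{I(X_i > Q(u)) - E\,I(X_i > Q(u))\} \\
&= \hat t_n^{(1)}(u) + \hat t_n^{(2)}(u),
\end{aligned} \tag{2.7}$$

$$\begin{aligned}
\hat t_n(u) &= \hat t_n^{(1)}(u) + n^{-1/2}\,Q(u)\sum_{i=1}^{n}\{I(X_i > Q(u)) - E\,I(X_i > Q(u))\} \\
&\quad + n^{-1/2}\,\frac{1}{r(Q(u))}\sum_{i=1}^{n}\{I(X_i > Q(u)) - E\,I(X_i > Q(u))\} \\
&= \hat t_n^{(1)}(u) + \hat t_n^{(21)}(u) + \hat t_n^{(22)}(u),
\end{aligned} \tag{2.8}$$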
by using the definition of $t_F$. Let $\infty \cdot 0 = 0$ (so that $t_F(X, 1) = k_F(X, 1) = X$ almost surely), and let $\hat z_n = \sqrt{n}(\bar X_n - \mu)$, where $\bar X_n$ is the sample average; in particular $\hat t_n(1) = \hat k_n(1) = \hat z_n$. Then the processes of interest can be rewritten as linear combinations of the $\hat t_n^{(\cdot)}$ and $\hat z_n$ processes as follows:

$$\hat s_n(u) = \mu^{-1}\hat t_n(u) - \mu^{-2} H_F(u)\,\hat z_n, \tag{2.9}$$

$$\hat k_n(u) = \hat t_n^{(1)}(u) + \hat t_n^{(21)}(u), \tag{2.10}$$

$$\hat l_n(u) = \mu^{-1}\big(\hat t_n^{(1)}(u) + \hat t_n^{(21)}(u) - L_F(u)\,\hat z_n\big), \tag{2.11}$$

$$\hat c_n(u) = \frac{1}{Q(L_F^{-1}(u))}\big(\hat t_n^{(1)}(L_F^{-1}(u)) + \hat t_n^{(21)}(L_F^{-1}(u)) - u\,\hat z_n\big). \tag{2.12}$$
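The identities (2.7)–(2.11) are purely algebraic, so they can be checked on any sample once F is fixed. The Python sketch below assumes, for illustration only, that F is the standard exponential distribution, for which Q(u) = −log(1 − u), r ≡ 1, µ = 1, $H_F(u) = u$ and $L_F(u) = u + (1 - u)\log(1 - u)$; in particular $E\,t_F(X, u) = 1$ and $E\,k_F(X, u) = u$. It computes $\hat t_n$, $\hat k_n$ and $\hat l_n$ directly from $t_F$, $k_F$ and $l_F$ and compares them with the decompositions. The sample values and function names are hypothetical.

```python
import math


def lorenz_exp(u):
    """L_F(u) = u + (1 - u) log(1 - u) for the standard exponential (L_F(1) = 1)."""
    return 1.0 if u == 1 else u + (1 - u) * math.log1p(-u)


def decomposition(sample, u):
    """Return (t1, t21, t22, z): the terms of (2.7)-(2.8) and zhat_n, for Exp(1)."""
    root = math.sqrt(len(sample))
    q = -math.log1p(-u)                        # Q(u) = -log(1 - u)
    trunc_mean = 1 - (1 + q) * (1 - u)         # E X I(X <= Q(u))
    t1 = sum(x * (x <= q) - trunc_mean for x in sample) / root
    exceed = sum((x > q) - (1 - u) for x in sample) / root   # P(X > Q(u)) = 1 - u
    t21 = q * exceed
    t22 = exceed                               # 1 / r(Q(u)) = 1 because r is constant 1
    z = sum(x - 1 for x in sample) / root      # zhat_n with mu = 1
    return t1, t21, t22, z


def direct(sample, u):
    """(t_hat, k_hat, l_hat) computed straight from t_F, k_F and l_F for Exp(1)."""
    root = math.sqrt(len(sample))
    q, L = -math.log1p(-u), lorenz_exp(u)
    t_hat = sum(min(x, q) + (x > q) - 1 for x in sample) / root   # E t_F(X, u) = 1
    k_hat = sum(min(x, q) - u for x in sample) / root             # E k_F(X, u) = u
    l_hat = sum(min(x, q) - L * x - (u - L) for x in sample) / root
    return t_hat, k_hat, l_hat


sample = [0.2, 1.5, 0.7, 3.1, 0.05, 2.2]   # hypothetical observations
for u in (0.1, 0.5, 0.9):
    t1, t21, t22, z = decomposition(sample, u)
    t_hat, k_hat, l_hat = direct(sample, u)
    print(u,
          abs(t_hat - (t1 + t21 + t22)),                 # (2.8)
          abs(k_hat - (t1 + t21)),                       # (2.10)
          abs(l_hat - (t1 + t21 - lorenz_exp(u) * z)))   # (2.11) with mu = 1
```

Each printed difference should be at the level of floating-point rounding. The process $\hat c_n$ in (2.12) could be checked the same way, but $L_F^{-1}$ has no closed form for this F and would need a numerical root finder.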
It is apparent from the above display that, once the limiting distributions of the components of $\hat t_n$ are obtained, conclusions about the remaining processes follow. The process $\hat t_n$ itself, however, requires special treatment because of the extra term $\hat t_n^{(22)}$. Despite this difference, it is convenient to use the two-term representation in (2.7) to investigate $\hat t_n$. Since we are going to use results from Andersen and Dobrič (1987, 1988), it is convenient, where applicable, to adopt their stochastic-process notation. For the sake of completeness, let us recall some notation and useful results from those papers. Let $(\Omega, \mathcal{F}, P)$ and $(S, \mathcal{S}, \mu)$ be probability spaces and T an index set. If X is an S-valued random variable on $(\Omega, \mathcal{F}, P)$ with distribution $P_X = \mu$, then we say that X is P-perfect if $P^*(X \in A) = \mu^*(A)$ for all $A \subseteq S$, where $P^*$ denotes the outer P-measure (and $\mu^*$ the outer µ-measure). For a real-valued function f on $S \times T$ the following notation is adopted: $f_t = f(\cdot, t) \in \mathbb{R}^S$ and $f(s) = f(s, \cdot) \in \mathbb{R}^T$.
A stochastic process f on $(S, \mathcal{S}, \mu)$ with timeset T is a real-valued function on $S \times T$; it is said to be centered if

$$\int_S f_t\,d\mu = 0 \quad \forall t \in T,$$

and a second-order process if

$$\int_S f_t^2\,d\mu < \infty \quad \forall t \in T.$$

If f is a second-order process, then it induces a pseudometric $\rho_f$ on T given by

$$\rho_f(u, v)^2 = \int_S (f_u - f_v)^2\,d\mu \quad \forall u, v \in T.$$
Let $(B(T), \|\cdot\|_T)$ denote the set of all bounded real-valued functions on T, equipped with the topology induced by the supremum norm $\|\cdot\|_T$ on $\mathbb{R}^T$. Let $(S^{\mathbb{N}}, \mathcal{S}^{\mathbb{N}}, \mu^{\mathbb{N}})$ be a countable product of $(S, \mathcal{S}, \mu)$ and let $\{\Pi_n\}$ be the sequence of natural projections from $S^{\mathbb{N}}$ onto S. The stochastic process f is said to satisfy the central limit theorem in $(B(T), \|\cdot\|_T)$, written $f \in \mathrm{CLT}(B(T), \|\cdot\|_T)$, if $\{f(s) \mid s \in S\} \subseteq B(T)$ and there exists a Radon probability measure $\gamma_f$ on $(B(T), \|\cdot\|_T)$ such that $(1/\sqrt{n})\sum_{i=1}^{n} f(\Pi_i)$ converges in law to $\gamma_f$, i.e.,

$$\lim_{n\to\infty} E^*\,g\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} f(\Pi_i)\Big) = \int g\,d\gamma_f$$

for all bounded continuous functions g from $(B(T), \|\cdot\|_T)$ into $\mathbb{R}$. (Here $E^*$ denotes the upper $\mu^{\mathbb{N}}$-integral.) Also, as noted in Andersen and Dobrič (1988), when we consider the central limit theorem in $(\mathbb{R}^T, \|\cdot\|_T)$, there is no loss of generality in assuming that $\{f(s) \mid s \in S\} \subseteq B(T)$ and that the limit measure $\gamma_f$ is concentrated on $(B(T), \|\cdot\|_T)$.

Remark 1. In the above argument the natural projections can be replaced by a sequence of i.i.d. random variables without affecting the final statement.

Remark 2. If L is a subset of $\mathbb{R}^T$, then $(L, \|\cdot\|_T)$ denotes L equipped with the topology induced by $\|\cdot\|_T$, and $\mathcal{K}(L)$, $\mathcal{G}(L)$, $\mathcal{F}(L)$ and $\mathcal{B}(L)$ denote the sets of all compact, open, closed and Borel subsets of $(L, \|\cdot\|_T)$, respectively. A Borel measure on $(L, \|\cdot\|_T)$ is a measure on $(L, \mathcal{B}(L))$, and a Radon measure µ on $(L, \|\cdot\|_T)$ is a finite Borel measure on $(L, \|\cdot\|_T)$ such that

$$\mu(B) = \sup\{\mu(K) \mid K \subseteq B,\ K \in \mathcal{K}(L)\} \quad \text{for all } B \in \mathcal{B}(L).$$

Theorem 5.5 and Proposition 5.6, both in Andersen and Dobrič (1987), can equivalently be expressed by Theorem 1 in Andersen and Dobrič (1988):
Theorem 1. Let f be a centered, second-order stochastic process on $(S, \mathcal{S}, \mu)$ with timeset T such that $\{f(s) \mid s \in S\} \subseteq B(T)$. Let $X = \{X_n\}$ be a sequence of independent, identically distributed S-valued random variables on $(\Omega, \mathcal{F}, P)$ with common distribution µ, and assume that X is P-perfect. If $S_n = \sum_{i=1}^{n} f(X_i)$ for all $n \in \mathbb{N}$, then the following two statements are equivalent:
(i) $f \in \mathrm{CLT}(B(T), \|\cdot\|_T)$;
(ii) there exists a totally bounded pseudometric ρ on T such that $\{(1/\sqrt{n})S_n\}$ is eventually uniformly ρ-equicontinuous.
If one (and hence both) of these statements holds, then $\rho_f$ is totally bounded and $\gamma_f(C_u(T, \rho_f)) = 1$, and for every pseudometric $\rho'$ on T, $\{(1/\sqrt{n})S_n\}$ is eventually uniformly $\rho'$-equicontinuous if and only if $\gamma_f(C_u(T, \rho')) = 1$, where $C_u(T, \rho)$ is the set of all bounded, uniformly ρ-continuous real-valued functions on $(T, \rho)$.
It is well known that the weak convergence of $\hat t_n^{(i)}$, $i = 1, 2$, follows from the weak convergence of the finite-dimensional distributions together with stochastic equicontinuity (eventual tightness) with respect to some pseudometric under which the index set is totally bounded (see Andersen and Dobrič (1987)). The following is a direct consequence of the second statement in Theorem 1. For the process $\hat t_n^{(1)}$, it follows directly from the multivariate CLT that the finite-dimensional distributions converge to a multivariate Gaussian distribution with the appropriate covariance matrix. Here $f(X_i) = X_i I(X_i \le Q(\cdot)) - E\,X_i I(X_i \le Q(\cdot)) \in \mathbb{R}^{[0,1]}$. Moreover, an intrinsic metric of the process that exploits the structure of the covariance can be taken as

$$\varrho(s, t) = \Big(\int_{Q(s \wedge t)}^{Q(s \vee t)} x^2\,dF(x)\Big)^{1/2}, \qquad s, t \in [0, 1],$$

since $Q(\cdot)$ is a monotone nondecreasing real function. Thus it remains to show the stochastic equicontinuity of $\hat t_n^{(1)}$ with respect to ϱ, which is equivalent to the following condition: for each ε > 0 and η > 0 there is a δ > 0 such that

$$\limsup_{n\to\infty}\,\Pr\Big(\sup_{\varrho(s,t) < \delta} \big|\hat t_n^{(1)}(s) - \hat t_n^{(1)}(t)\big| > \eta\Big) < \varepsilon.$$
Proof. We give only a sketch of the proof. Since the increments of the stochastic process are bounded with respect to the pseudometric ϱ, the proof follows as in Pollard (1984), using the special construction of the grid together with the chaining argument.
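To make the grid in the chaining argument concrete, consider (purely for illustration) F = Uniform(0, 1), so that $\varrho(s, t)^2 = (b^3 - a^3)/3$ with $a = s \wedge t$ and $b = s \vee t$. Because $\varrho^2$ is additive over adjacent intervals, a grid $0 = u_0 < u_1 < \dots < u_m = 1$ with $\varrho(u_j, u_{j+1}) \le \delta$ needs only about $EX^2/\delta^2$ steps, which illustrates that $([0, 1], \varrho)$ is totally bounded in this example. The Python sketch below (function names are ours; this is one simple construction, not necessarily the one in Pollard (1984)) builds such a grid and also checks that the intrinsic pseudometric $\rho_f$ of $\hat t_n^{(1)}$ is dominated by ϱ.

```python
def varrho_uniform(s, t):
    """varrho(s, t) for Uniform(0, 1): (int_a^b x^2 dx)^{1/2}, a = min, b = max."""
    a, b = min(s, t), max(s, t)
    return ((b**3 - a**3) / 3) ** 0.5


def rho_f_uniform(s, t):
    """Intrinsic rho_f for f_t(x) = x I(x <= t) - E[X I(X <= t)], X ~ Uniform(0, 1).

    rho_f(s, t)^2 = Var(X I(a < X <= b)) = E[X^2 I] - (E[X I])^2.
    """
    a, b = min(s, t), max(s, t)
    second = (b**3 - a**3) / 3
    first = (b**2 - a**2) / 2
    return max(0.0, second - first**2) ** 0.5


def varrho_grid(delta):
    """Grid 0 = u_0 < ... < u_m = 1 with varrho(u_j, u_{j+1}) <= delta.

    For Uniform(0, 1), varrho(u_j, u_{j+1}) = delta exactly when
    u_{j+1}^3 = u_j^3 + 3 delta^2; the last step is truncated at 1.
    """
    grid = [0.0]
    while grid[-1] < 1.0:
        grid.append(min(1.0, (grid[-1] ** 3 + 3 * delta**2) ** (1 / 3)))
    return grid


grid = varrho_grid(0.1)
print(len(grid) - 1)   # 34 steps, since E X^2 / delta^2 = (1/3) / 0.01 is about 33.3
steps = list(zip(grid, grid[1:]))
print(max(varrho_uniform(a, b) for a, b in steps) <= 0.1 + 1e-12)         # True
print(all(rho_f_uniform(a, b) <= varrho_uniform(a, b) for a, b in steps))  # True
```

The domination $\rho_f \le \varrho$ holds because $\rho_f^2$ is a variance while $\varrho^2$ is the corresponding second moment.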
The complication arises in the treatment of the $\hat t_n^{(2)}$ process. Because the weight function may be unbounded, one cannot construct an appropriate pseudometric that makes the index space totally bounded. However, when a suitable condition is imposed on the weight function, as in the Chibisov–O'Reilly theorem, the weak convergence argument becomes standard. The empirical process $\hat t_n^{(2)}$ is an example of a classical weighted uniform empirical process and has been studied by many researchers. Let us ignore the centering of the process for a while, and denote by $q(\cdot)$ the weight function $q(\cdot) = Q(\cdot) + 1/r(Q(\cdot))$. Let P be Lebesgue measure on [0, 1], let $\mathcal{F} = \{q(u) I_{[0,u]} : u \in (0, 1/2]\}$, and assume that q is a measurable weight function for which there exists $\gamma \in (0, 1/2)$ such that q is non-increasing on $(0, \gamma)$ and uniformly bounded on $[\gamma, 1/2]$ (usually q is U-shaped, so the problem can be treated on the half-interval). By Theorem 4.4 and Example 4.9 in Andersen et al. (1988), or by Example 2.11.15 in van der Vaart and Wellner (1996), we obtain the Chibisov–O'Reilly theorem if the weight function satisfies

$$\int_0^{\gamma} \frac{1}{t(1-t)}\exp\Big(-\frac{\varepsilon}{t(1-t)\,q^2(t)}\Big)\,dt < \infty \quad \text{for all } \varepsilon > 0,$$

or, equivalently, $\mathcal{F}$ is P-pre-Gaussian and $(t(1-t))^{1/2}\,q(t) \to 0$ as $t \to 0$.
In our case all these conditions ought to be translated to accommodate the other end of the interval, in the vicinity of 1, but this is obtained automatically for q defined in this way. As a consequence of the above result, the process $\hat t_n^{(2)}$ has a Gaussian limiting distribution. Since the space of processes satisfying the CLT is a linear space, the limit law of the pointwise sum of $\hat t_n^{(1)}$ and $\hat t_n^{(2)}$ also belongs to the space of CLT. Furthermore, because of property (2.2), we can conclude that the covariance structure of the limiting process $T_F$ is the same as that of $\hat t_n$. Now, for the processes in (2.9)–(2.11), one obtains the limiting distributions immediately from the result about the $\hat t_n$ process; by the nature of the empirical processes, their covariance structures are those of the s, k and l processes. As for the $\hat c_n(u)$ process, once we notice that the Lorenz curve $L_F(u)$ of F is a monotone nondecreasing function of u, and hence so is its inverse, the weak convergence result readily follows. The only point that requires special care is zero, since $1/Q(L_F^{-1}(0))$ may be undefined or infinite. Without loss of generality, however, we can restrict the process $\hat c_n(u)$ to the interval (0, 1] instead of [0, 1].
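For a concrete weight, take (for illustration only) F to be the standard exponential distribution, for which Q(t) = −log(1 − t) and r ≡ 1, so that q(t) = 1 − log(1 − t). The Python sketch below evaluates $(t(1-t))^{1/2} q(t)$ at points approaching both endpoints; the values decrease towards 0, in line with the endpoint requirement stated above. This is only a numerical illustration under that assumption, not a verification of the integral condition or of the pre-Gaussian property; the function name is ours.

```python
import math


def scaled_weight_exp(t, one_minus_t):
    """(t (1 - t))^{1/2} q(t) with q(t) = Q(t) + 1 / r(Q(t)) = 1 - log(1 - t) for Exp(1).

    1 - t is passed separately so that points very close to t = 1 keep full precision.
    """
    return math.sqrt(t * one_minus_t) * (1.0 - math.log(one_minus_t))


near_zero = [scaled_weight_exp(10.0**-k, 1 - 10.0**-k) for k in range(1, 13)]
near_one = [scaled_weight_exp(1 - 10.0**-k, 10.0**-k) for k in range(1, 13)]
print(near_zero[-1], near_one[-1])   # both small: roughly 1e-6 and 3e-5
```

Near t = 1 the weight grows only logarithmically, which the factor $(1-t)^{1/2}$ easily absorbs; near t = 0 the weight is bounded (q(0) = 1).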
Acknowledgment

The work of these authors is supported by grants from the Natural Sciences and Engineering Research Council of Canada.

References

Andersen, N.T., Dobrič, V., 1987. The central limit theorem for stochastic processes. Ann. Probab. 15, 164–177.
Andersen, N.T., Dobrič, V., 1988. The central limit theorem for stochastic processes II. J. Theoret. Probab. 1, 287–303.
Andersen, N.T., Giné, E., Ossiander, M., Zinn, J., 1988. The central limit theorem and the law of iterated logarithm for empirical processes under local conditions. Probab. Theory Related Fields 77, 271–305.
Barlow, R.E., Campo, R., 1975. Total time on test processes and applications to failure data analysis. In: Barlow, et al. (Eds.), Reliability and Fault Tree Analysis. SIAM, Philadelphia, pp. 451–481.
Barlow, R.E., Proschan, F., 1975. Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
Bergman, B., Klefsjö, B., 1984. The total time on test and its use in reliability theory. Oper. Res. 32, 596–606.
Chandra, M., Singpurwalla, N.D., 1978. The Gini index, the Lorenz curve, and the total time on test transform. Serial T–368, George Washington University.
Csörgő, M., Csörgő, S., Horváth, L., 1986. An asymptotic theory for empirical reliability and concentration processes. Lecture Notes in Statistics, vol. 33. Springer-Verlag, New York.
Gastwirth, J.L., 1971. A general definition of the Lorenz curve. Econometrica 39, 1037–1039.
Gastwirth, J.L., 1972. The estimation of the Lorenz curve and Gini index. Rev. Econom. Statist. 54, 306–316.
Goldie, C.M., 1977. Convergence theorems for empirical Lorenz curves and their inverses. Adv. Appl. Probab. 9, 765–791.
Hart, P.E., 1971. Entropy and other measures of concentration. J. Roy. Statist. Soc. A 134, 73–85.
Hart, P.E., 1975. Moment distribution in economics: An exposition. J. Roy. Statist. Soc. A 138, 423–434.
Kakwani, N.C., Podder, N., 1973. On the estimation of Lorenz curves from grouped observations. Internat. Econom. Rev. 14, 278–292.
Pollard, D., 1984. Convergence of Stochastic Processes. Springer Series in Statistics. Springer-Verlag.
Sendler, W., 1982. On functionals of order statistics. Metrika 29, 19–54.
van der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes with Applications to Statistics. Springer Series in Statistics. Springer.
Yu, H., 1993. A Glivenko–Cantelli lemma and weak convergence for empirical processes of associated sequences. Probab. Theory Related Fields 95, 357–370.