Reliability Engineering and System Safety 80 (2003) 243–252 www.elsevier.com/locate/ress
Nonparametric predictive inference for grouped lifetime data F.P.A. Coolen*, K.J. Yan Department of Mathematical Sciences, Science Laboratories, University of Durham, South Road, Durham DH1 3LE, UK Received 5 June 2002; revised 16 December 2002; accepted 30 January 2003
Abstract This paper presents the application of a recently introduced nonparametric predictive inferential method to grouped lifetime data. Such data consist of numbers of events and numbers of right-censorings per interval, for a finite partition of the time-axis, without further information on the exact event and censoring times. Optimal bounds are derived for probabilities of events in terms of a future observation, where the bounds correspond to specific configurations of the event and censoring times per interval. The method is illustrated, and briefly compared to some alternative nonparametric approaches, via an example. q 2003 Elsevier Science Ltd. All rights reserved. Keywords: Censored data; Nonparametrics; Life tables; Prediction; Survival analysis
1. Introduction In this paper, a recently introduced method for statistical inference, called ‘nonparametric predictive inference’ (NPI), is applied to grouped data including right-censored observations. Such data often occur in situations concerning reliability and survival analysis, if the time-axis is partitioned into a finite number of intervals, and the data only consist of numbers of event times and censoring times per interval. A well-known example of such data is the use of so-called ‘life tables’ in actuarial methods. In reliability contexts, such data may typically appear on lifetimes of noncritical components in systems, where, e.g. once a month the components are inspected, showing if they have failed or not. In such situations, right-censoring could be due, for example, to components which fail caused by competing risks which are not the main failure modes under consideration, or by the components being replaced due to a predetermined preventive replacement policy. The statistical method for grouped data, presented in this paper, is based on quite minimal modelling assumptions, and is directly in terms of a random quantity representing a future observation, for example, time till failure of a future component. Generally, we will talk about ‘lifetime’ or ‘time * Corresponding author. Tel.: þ44-191-374-3022; fax: þ 44-191-3747388. E-mail address:
[email protected] (F.P.A. Coolen).
of event’ of ‘items’ (we prefer to use items over ‘individuals’, which is more common in survival analysis). We will assume that either a well-specified event happens, at a particular time, to each item for which we have an observation, or that a time is reported at which such an event has not yet occurred. All data are referred to as ‘observation (time)’, if it is a time at which the event of interest actually occurred we call it ‘event (time)’, else ‘censoring (time)’. Speaking in terms of ‘time’, we will restrict attention to nonnegative random quantities, so to random quantities and observations on the time-axis ½0; 1Þ: However, the method presented is more widely applicable, as only a finite partition of (part of) the real line is required. The formulae in this paper are such that computations are easy, as they are presented as product forms or have an explicitly recursive nature. In Section 2, the basics of nonparametric predictive inference are briefly summarized. Section 3 discusses, for grouped lifetime data, the influence of different possible configurations of event times and censoring times, and presents the main principle which enables derivation of optimal bounds for predictive probabilities. Section 4 presents lower and upper probabilities per interval, and Section 5 presents lower and upper predictive survival functions. In Section 6, our method is illustrated, and briefly compared with some alternative methods, via an example. This section may be the most interesting for readers who are less interested in the mathematical derivation and
0951-8320/03/$ - see front matter q 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0951-8320(03)00033-4
244
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
justification of this approach. Finally, in Section 7 we add some concluding remarks about the method and results presented in this paper.
2. Nonparametric predictive inference In this section, we summarize NPI for data including right-censored observations, as recently presented by Coolen and Yan [1], to which we refer for the theoretical justification and further detailed discussion of this method. Use of NPI in basic reliability situations has been presented by Coolen et al. [2]. Let the data consist of n observations, of which u are event times, 0 , t1 , · · · , tu ; and v ¼ n 2 u right-censoring times, 0 , c1 , · · · , cv : To simplify notation, let t0 ¼ 0 and tuþ1 ¼ 1; and we denote the right-censoring times in ðti ; tiþ1 Þ by ci1 , · · · , cili : We assume that there are no ties among the data, the method is easily adapted for ties [1]. Let nt be the number of items with observation time greater than t: We call this the number of items ‘at risk’ at time t; at an observation time the corresponding item is not included in nt : We denote the number of items at risk just prior to t by n~ t ; so n~ t ¼ nt þ 1 if t is an observation time in the data, else n~ t ¼ nt : Based on such data, Coolen and Yan [1] introduce, and justify, the assumption ‘right-censoring AðnÞ ’ (rc-AðnÞ ) for nonparametric predictive inference for the random quantity Xnþ1 representing the lifetime of a future item. Rightcensoring AðnÞ generalizes Hill’s AðnÞ [3 – 5], which underlies nonparametric predictive inference if the data do not include right-censored observations [2,6]. Berliner and Hill [7] applied AðnÞ directly to right-censored data, but this was only possible by using slightly different information from the right-censored observations, whereas our rc-AðnÞ -based method adds a further assumption enabling such observations to be taken into account precisely. Description of rc-AðnÞ requires notation for partial specification of probability distributions, which we call ‘M-function’ [1,2]. Definition (M-function). A partial specification of a probability distribution for a real-valued random quantity X can be provided via probability masses assigned to intervals, without any further restriction on the spread of the probability mass within each interval. A probability mass assigned, in such a way, to an interval ða; bÞ; is denoted by MX ða; bÞ; and referred to as M-function value for X on ða; bÞ: Clearly, all M-function values for X on all intervals should sum up to one, and each M-function value should be in ½0; 1: We wish to emphasize that no further assumptions are made about the distribution of the probability mass MX ða; bÞ over the interval ða; bÞ:
Definition. (rc-AðnÞ ) ([1,2]). The assumption ‘right-censoring AðnÞ ’ (rc-AðnÞ ) is that the probability distribution for a nonnegative random quantity Xnþ1 ; based on u event times and v right-censoring times, using the notation introduced above, is partially specified by ði ¼ 0; …; u; k ¼ 1; …; li Þ MXnþ1 ðti ; tiþ1 Þ ¼ MXnþ1 ðcik ; tiþ1 Þ ¼
1 nþ1
Y {r:cr ,ti }
1 ðn þ 1Þ~ncik
n~ cr þ 1 ; n~ cr Y
{r:cr ,cik }
n~ cr þ 1 : n~ cr
The product terms are defined as one if the product is taken over an empty set. The M-function values for Xnþ1 on other intervals are zero. This implicitly assumes noninformative censoring, as a post-data assumption related to exchangeability, at any time t; of all items known to be at risk at t; see Coolen and Yan [1], who also justify rc-AðnÞ : The partial specification of the probability distribution of Xnþ1 ; via these M-function values, enables nonparametric predictive inference when the problems considered can be formulated in terms of a future observation Xnþ1 : However, for many problems of interest, the M-function values only imply bounds for predictive probabilities, where optimal bounds are imprecise probabilities [2,6,8]. In Section 3 we discuss grouped data, and we prove a result on the configurations of event and censoring times, which enables derivation of such optimal bounds in Sections 4 and 5. One more property for probabilities and M-function values according to rc-AðnÞ ; as presented by Coolen and Yan [1], will be used in the Proof of Theorem 5, namely that PðXnþ1 [ ðti ; tiþ1 ÞÞ ¼ MXnþ1 ðtiþ1 ; tiþ2 Þ; for i ¼ 0; 1; …; u 2 1; where the probability on the left-hand side is the sum of all M-function values defined on ðti ; tiþ1 Þ and its sub-intervals ðcik ; tiþ1 Þ:
3. Grouped lifetime data and configurations In this paper, we consider NPI for grouped lifetime data, where event and censoring times are not actually observed, but only the numbers of each per interval are available, for a finite number of intervals which form a partition of the timeaxis. We assume that indeed no further information is available on the times of events and censorings within the intervals, that the intervals have been determined independently of the data, and that censoring was noninformative, so it occurs due to a mechanism that is independent of the remaining lifetimes [1,2]. We use the following notation for such data.
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
Let time points a1 , a2 , · · · , ak ; for k $ 2; partition the time-axis ½0; 1Þ into k þ 1 intervals Is ¼ ½as ; asþ1 Þ; for s ¼ 0; 1; …; k; with a0 ¼ 0 and akþ1 ¼ 1: Let es be the number of event times in Is ; and cs the number of rightcensoring times in Is ; and let ns ¼ es þPcs be the total number of observations in P Is : Let e ¼ ks¼0 es and c ¼ Pk k s¼0 cs ; and let n ¼ e þ c ¼ s¼0 ns be the total number of observations. Although we do not actually get to know the exact observation times, the orderings of event and censoring times within the intervals are important. Let the es ordered event times in Is be denoted by t1s , t2s , · · · , tess ; and the cs ordered censoring times in Is by cs1 , cs2 , · · · , cscs : For convenience, we assume no ties in this notation, but ties could be included without changing our main results in Sections 4 and 5. To develop NPI based on such data, the obvious problem is that the exact observation times, as needed for the M-function values for Xnþ1 in the definition of rc-AðnÞ ; are not available. These M-function values depend on the order of the event and censoring times, which we call ‘the configuration’ of the observations. For each configuration, we would have M-function values which partially specify the probability distribution for Xnþ1 ; and we could derive corresponding optimal bounds for probabilities of the form Xnþ1 [ B; where B , ½0; 1Þ [1,2,6]. The maximum lower bound would be derived by summing up all probability masses that, according to the M-function values, must necessarily be within B; and the minimum upper bound would be derived by summing up all such probability masses that could be within B: To derive such bounds for situations where B is equal to an interval Is (or a union of several such intervals), it turns out that the exact location of the event and censoring times within the intervals is not relevant, only their configuration. As there is a finite number of possible configurations, one could calculate the M-function values according to all possible configurations, calculate optimal bounds for Xnþ1 [ B for each configuration, and take the minimum (maximum) of the lower (upper) bounds per configuration as the lower (upper) probability for this event, that is the maximum lower (minimum upper) bound that can be justified without adding any further assumptions about the configurations. Although this is a natural way to proceed, the amount of computation involved would be enormous, making this approach not feasible for all but the smallest data sets. Luckily, however, we can derive these optimal bounds far quicker, as we can derive the configurations that actually lead to such bounds, therefore deleting the need to calculate M-functions for any other configurations. Such ‘optimal configurations’ will be presented in the next two sections, considering events Xnþ1 [ B with B ¼ Is (Section 4) and B ¼ ½as ; 1Þ (Section 5). The corresponding optimal configurations are all based on the following principle.
245
Theorem 1 (Optimal configuration principle, OCP). If a pair of neighbouring observations consists of an event time tj and a censoring time cl ; then, for any given set of further data consisting of u 2 1 event times and v 2 1 censoring times, the particular order of tj and cl influences the M-function values for Xnþ1 ; according to rc-AðnÞ ; as follows. Let CðtcÞ be the configuration for which tj , cl ; and let CðctÞ be the configuration for which cl , tj : Let MXtcnþ1 denote the rc-AðnÞ -based M-function values corresponding to CðtcÞ; and MXctnþ1 those corresponding to CðctÞ: Then: 1. M-function values on intervals before tj and cl do not depend on the order of tj and cl : So, for intervals ðti ; tiþ1 Þ and ðcik ; tiþ1 Þ; with tiþ1 , minðtj ; cl Þ; we have that MXtcnþ1 ðti ; tiþ1 Þ ¼ MXctnþ1 ðti ; tiþ1 Þ; MXtcnþ1 ðcik ; tiþ1 Þ ¼ MXctnþ1 ðcik ; tiþ1 Þ:
2. M-function values on intervals after tj and cl are larger for CðtcÞ than for CðctÞ: So, for intervals ðti ; tiþ1 Þ; with ti . maxðtj ; cl Þ; and ðcik ; tiþ1 Þ; with cik . maxðtj ; cl Þ; we have that MXtcnþ1 ðti ; tiþ1 Þ . MXctnþ1 ðti ; tiþ1 Þ; MXtcnþ1 ðcik ; tiþ1 Þ . MXctnþ1 ðcik ; tiþ1 Þ:
Proof. The relations between the different M-function values in OCP follow directly from the definition of rc-AðnÞ ; as for intervals before tj and cl ; all the factors in the products in the definitions of the M-function values are not affected by the order of tj and cl ; while for intervals after tj and cl ; all these factors but one remain unchanged, and the one changing factor is larger for CðtcÞ than for CðctÞ; since n~ cl is one smaller for CðtcÞ than for CðctÞ: A It turns out that consideration of these two types of Mfunctions is sufficient for the inferences considered in Sections 4 and 5. In fact, the optimal configurations of the event and censoring times per interval, leading to the lower and upper probabilities presented in Sections 4 and 5, will be justified straightforwardly by OCP. If one would wish to consider inferences on events Xnþ1 [ B; with B not consisting of one or more of the intervals Is ; then further aspects of the configurations would need to be considered. For OCP we did not consider the tied observation tj ¼ cl ; as the way that NPI would deal with such a tie [1] would not affect the optimal configurations, which is the reason why we do not have to focus on possible ties within the grouped data.
246
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
4. Predictive probabilities per interval In this section, we consider the probability that a future observation falls in one of the predetermined intervals Is ; based on grouped data consisting of n observations, using the notation introduced in Sections 2 and 3, and the assumption rc-AðnÞ : Clearly, we cannot derive precise probabilities for such events, both due to the nature of the M-functions in rc-AðnÞ and the way that the data are presented, without knowledge of the exact configurations within the intervals Is ; s ¼ 0; 1; …; k: Instead, we derive optimal bounds for these probabilities, i.e. the maximum lower bounds and minimum upper bounds that can be justified on the basis of the grouped data and rc-AðnÞ : These bounds are lower and upper probabilities [1,2,8], denoted by nþ1 [ Is Þ; respectively. The lower PðXnþ1 [ Is Þ and PðX probability PðXnþ1 [ Is Þ is determined by only summing up the probability masses that necessarily must be in Is ; according to the M-function values and the particular optimal configurations for all relevant intervals for which this probability mass is minimal. The upper probability nþ1 [ Is Þ is determined by summing up all probability PðX masses that could be in Is ; with the configurations for which this total probability mass is maximal. The principle OCP, derived in Section 3, is sufficient to derive the configurations corresponding to these lower and upper probabilities. We first consider the lower probabilities PðXnþ1 [ Is Þ; for s ¼ 0; 1; …; k: For given configurations for all the intervals, this lower probability is derived by only summing up the M-function values, according to rc-AðnÞ ; on intervals that are fully within Is ¼ ½as ; asþ1 Þ; so summing up the values MXnþ1 ðti ; tiþ1 Þ with ðti ; tiþ1 Þ , ½as ; asþ1 Þ and the MXnþ1 ðcik ; tiþ1 Þ with ðcik ; tiþ1 Þ , ½as ; asþ1 Þ: Let us now turn our attention to the configurations for all intervals Im ; m ¼ 0; 1; …; k; that lead to minimum probability for Xnþ1 to be in a particular Is ; so the configurations for which the above M-function values are minimal, we will denote these configurations as C Is ðIm Þ; and they are specified in Theorem 2. Theorem 2. (Configurations for PðXnþ1 [ Is Þ). Consider NPI, based on rc-AðnÞ ; for grouped data. For s ¼ 1; …; k 2 1; the following configurations lead to minimum probability mass for Xnþ1 in Is : m C Is ðIm Þ ¼ {cm cm , t1 };
for m , s;
m where {cm cm , t1 } is used to denote that all censorings in Im are assumed to take place before the event times in Im : The optimal configuration of the interval Is itself is:
C Is ðIs Þ ¼ {tess , cs1 }; so all events are assumed to happen prior to the censorings in this interval. Finally, the configurations in intervals beyond Is do not influence the M-function values within Is ; so we do not need to specify C Is ðIm Þ for m . s:
For I0 ; the only configuration that affects the M-function values within this interval is that of I0 itself, for which the optimal configuration is C I0 ðI0 Þ ¼ {te00 , c01 }; and C I0 ðIm Þ does not need to be specified for m . 0: For Ik ¼ ½ak ; 1Þ; the configurations of all intervals Im with m , k are relevant, but the actual configuration within Ik plays no role, as this configuration would effectively only serve to move probability mass within this interval. Hence, m C Ik ðIm Þ ¼ {cm cm , t1 }; for m , k;
and C Ik ðIk Þ does not need to be specified. Proof. These optimal configurations follow by (possibly repeated) application of the OCP presented in Section 3. For example, when considering Is with s ¼ 1; …; k; OCP implies that, for all intervals Im ; with m , s; any pair of neighbouring event time tj and censoring time cl ; within such an interval, should be configured like CðctÞ; in Theorem 1, so the censoring should be assumed to occur before the event time, in order to minimize the M-function values that together make up the lower probability for Xnþ1 [ Is : For any configuration, OCP can be applied repeatedly, every time moving a censoring time cl ; that is immediately to the right of an event time tj ; to the left of tj : This can be continued until all censoring times in each interval Im ; with m , s; are smaller than the event times in m the same interval, leading to {cm cm , t1 } for m , s: The optimal configuration in the interval Is ; for s ¼ 0; 1; …; k 2 1; follows from a simple argument, namely that the configuration CIs ðIs Þ must be the configuration that leads to maximum M-function values within the later intervals Im ; with m . s; as the sum of the M-function values in Is ; Isþ1 ; …; Ik is constant once the optimal configurations in the earlier intervals Im ; with m , s; are determined. Hence, with a similar argument as before, OCP leads to C Is ðIs Þ ¼ {tess , cs1 }; so all event times are assumed to take place before the censorings in Is : Clearly, changes in configurations within intervals Im ; with m . s; have no relevance for the M-function values within Is ; so do not need to be considered when determining PðXnþ1 [ Is Þ: The only optimal configuration not yet covered is C Ik ðIk Þ: Different configurations in Ik will affect the particular Mfunction values on intervals within Ik ; but the sum of these M-function values remains constant, as there are no intervals to the right of Ik that can take over some of the probability mass that is in Ik : A With the optimal configurations for PðXnþ1 [ Is Þ; as given in Theorem 2, and the definition of M-function values in rc-AðnÞ ; we can now determine these lower probabilities, keeping in mind that only M-function values on intervals that are completely within Is are summed up to give this lower probability. It is important to remark that here, as in
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
the rest of the paper, we assume that no single observation coincides with one of the values as that create the partition of the time-axis. (This would slightly complicate matters, but as we assumed that the as were determined independently of the data, and indeed probably before the data became available, this seems a reasonable assumption, even more since we do not actually know the precise observation times.) Theorem 3. (PðXnþ1 [ Is Þ). For grouped data, using the notation introduced in Section 3, the lower probabilities for the events Xnþ1 [ Is ; with s ¼ 0; 1; …; k; according to the assumption rc-AðnÞ ; are e0 ; PðXnþ1 [ I0 Þ ¼ nþ1 PðXnþ1 [ Is Þ ¼ 0;
PðXnþ1
for s ¼ 1; …; k 2 1; if es [ {0; 1}; 1 0
C B s21 B C es 2 1 Y cm C B [ Is Þ ¼ C; B1 þ m21 C X n þ 1 m¼0 B @ n2 nw 2 c m þ 1 A w¼0
for s ¼ 1; …; k 2 1; if es $ 2; 0 PðXnþ1
247
the super-additivity of such lower probabilities [8], i.e. PðXnþ1 [ Is < Im Þ $ PðXnþ1 [ Is Þ þ PðXnþ1 [ Im Þ for s – m; where the inequality is strict if there are censored observations in intervals which affect the relevant Mfunction values as described in Theorem 2 (we do not consider optimal configurations for the lower probability of Xnþ1 [ Is < Im ). The upper probabilities PðXnþ1 [ Is Þ; for s ¼ 0; 1; …; k; are derived in a similar way as the lower probabilities above, but now all the probability masses, as specified by the Mfunction values, that could actually be within Is ; are included. The optimal configurations are specified in Theorem 4, using C Is ðIm Þ to denote the configuration in Im that leads to maximum probability for Xnþ1 to be in Is : Theorem 4. (Configurations for PðXnþ1 [ Is Þ). Consider NPI, based on rc-AðnÞ ; for grouped data. For s ¼ 1; …; k 2 1; the following configurations lead to maximum probability mass for Xnþ1 in Is : C Is ðIm Þ ¼ {temm , cm 1 };
for m , s:
The optimal configuration of the interval Is itself is: 1
C B 21 B C nk kY cm C B [ Ik Þ ¼ C: B1 þ m21 C X n þ 1 m¼0 B @ n2 nw 2 c m þ 1 A w¼0
C Is ðIs Þ ¼ {cscs , t1s }: The configurations in intervals beyond Is do not influence the M-function values within Is ; so we do not need to specify C Is ðIm Þ for m . s: For I0 ; the optimal configuration is C I0 ðI0 Þ ¼ {c0c0 , t10 };
Proof. The lower probability PðXnþ1 [ I0 Þ follows straightforwardly from C I0 ðI0 Þ ¼ {te00 , c01 }; and rc-AðnÞ ; as for this configuration, intervals starting at censoring times within I0 ; on which M-function values are specified, all have the first event time in I1 ; that is t11 ; as right-end point, hence the probability masses specified by these M-function values are not necessarily within I0 : If there is at most one event time in an interval Is ; for s ¼ 1; …; k 2 1; then under the optimal configurations there are no M-function values specified on intervals that are completely within Is : The other lower probabilities specified in Theorem 3 are clearly the most interesting. These are derived by calculating the M-function values corresponding to the optimal configurations as specified in Theorem 2, and summing up all M-function values on intervals which are completely within the Is of interest. The final step is to prove that such sums are equal to the product forms provided in Theorem 3, which can be done via induction (as this is straightforward but tedious, we do not include it here, see Yan [9]). A
The optimal configurations for PðXnþ1 [ Is Þ; as given in Theorem 4, and the definition of M-function values in rcAðnÞ ; lead to the upper probabilities in Theorem 5, where all M-function values on intervals that have nonempty intersection with Is are summed up to give the upper probability for Xnþ1 to be in Is :
The optimal configurations of Theorem 2 cannot hold simultaneously, if indeed there are censored observations in the relevant intervals, which immediately confirms
Theorem 5. (PðXnþ1 [ Is Þ). For grouped data, using the notation introduced in Section 3, the upper probabilities for the events Xnþ1 [ Is ; with s ¼ 0; 1; …; k; according to
and CI0 ðIm Þ does not need to be specified for m . 0: For Ik ; the optimal configurations are C Ik ðIm Þ ¼ {temm , cm 1 };
for m , k;
and C Ik ðIk Þ does not need to be specified. Proof. These optimal configurations follow again from OCP, along the same lines as the proof of Theorem 2, but of course now using OCP to derive the maximum probability mass in Is ; leading to configurations where, when compared to those in Theorem 2, the order of all events and all censorings per interval is turned around. A
248
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
the assumption rc-AðnÞ ; are 2 0 PðXnþ1 [ Is Þ ¼
6Y s22 B B es þ 1 6 6 B1 þ 6 n þ 1 4m¼0 B @
13
n2
m X
cm nw 2 c m þ 1
C7 C7 C7 C7 A5
w¼0
1
0
C B C B cs21 þ cs C B B1 þ C; s 21 C B X @ n2 nw 2 c s þ 1 A w¼0
where the product term is defined as one if the product is over an empty set, and c21 ¼ 0: Proof. The main idea of the proof is as follows. The es event times partition Is into es þ 1 sub-intervals. The first one of these has nonempty intersection with the interval ðtes21 ; t1s Þ; on which an M-function value is defined, and, s21 because of the optimal configuration C Is ðIs21 Þ; also with s all intervals ðcs21 r ; t1 Þ; for r ¼ 1; …; cs21 ; on which also M-function values are defined. The optimal configuration in Is itself, C Is ðIs Þ; puts all censorings in Is prior to t1s ; so also the M-function values on all ðcsr ; t1s Þ; for r ¼ 1; …; cs ; are within ½as ; t1s Þ: The sum of all these M-function values is actually PðXnþ1 [ ðtes21 ; t1s Þ; and all this s21 probability mass can be put into ½as ; t1s Þ: According to C Is ðIs Þ; there are no censorings in any of the other subintervals of Is created by the event times, so the definition of rc-AðnÞ together with the result reported at the end of Section 2 (taken from Coolen and Yan [1]), imply that the probability masses on all these subintervals are equal to PðXnþ1 [ ðtes21 ; t1s ÞÞ: The product s21 form in Theorem 5 follows from rc-AðnÞ and the optimal configurations C Is ðIm Þ; for m , s; as presented in Theorem 4. The proof can again be completed by induction (this is again straightforward but tedious, and we do not include it here, see Yan [9]). A As for the lower probabilities above, these optimal configurations cannot occur simultaneously if there are censored data in the relevant intervals. This confirms subadditivity of such upper probabilities [8], i.e. PðXnþ1 [ Is < Im Þ # PðXnþ1 [ Is Þ þ CðXnþ1 [ Im Þ for s – m; with strict inequality if there are censored observations in intervals which affect the relevant M-function values according to Theorem 4.
5. Predictive survival functions In this section, we derive lower and upper probabilities for the events that Xnþ1 is greater than t; for t [ Is ; which define lower and upper survival functions for Xnþ1 for grouped data. The super- and sub-additivity of lower and
upper probabilities, respectively, mentioned above, prevent us from using the results from Section 4 directly. Instead, we must again find the optimal configurations, which luckily turn out to be rather straightforward in this case. We combine these configurations, and the actual upper and lower survival functions, in Theorems 6 and 7. Because the event and censoring times per interval Is ¼ ½as ; asþ1 Þ could be anywhere in this interval, without additional assumptions, the lower and upper survival functions, denoted by SðtÞ ¼ PðXnþ1 . tÞ and SðtÞ ¼ PðXnþ1 . tÞ; respectively, are step-functions, which only decrease at the as : Actually, SðtÞ is constant on Is ; so the upper survival function is continuous from the right at points as ; but the lower survival function is continuous from the left at as ; so SðtÞ is constant on I~s ¼ ðas ; asþ1 : We introduce some further, but straightforward, notation for the optimal configurations in Theorems 6 and 7. Theorem 6. (SðtÞ). Consider NPI, based on rc-AðnÞ ; for grouped data. Using the notation introduced in Section 3, the upper survival function SðtÞ ¼ PðXnþ1 . tÞ; at t [ Is ; is obtained for the configurations C SIs ðIm Þ ¼ {temm , cm 1 };
for m , s:
The optimal configuration of the interval Is takes a bit more consideration now, as we are actually interested in the upper survival function at time t [ Is : To get maximum probability mass to the right of t, all observations in Is are assumed to be greater than t, in which case the actual ordering does not influence the upper survival function value at t. So, the optimal configuration is: C SIs ðIs Þ ¼ {t , minðt1s ; cs1 Þ}: The configurations in intervals beyond Is do not influence SðtÞ; for t [ Is ; so we do not need to specify C SIs ðIm Þ for m . s: Let Ss denote the value of the upper survival function for t [ Is ¼ ½as ; asþ1 Þ; with s ¼ 0; 1; …; k; then, S0 ¼ 1; and, for s $ 1; Ss ¼ Ss21 2 es21 ps21 ; with p0 ¼
1 ; nþ1
and, for s $ 2; 0 p
s21
¼p
s22
1
sX 23
Bn2 nw 2 es22 þ 1 C C B C B w¼0 B C: s22 C B X A @ n2 nw þ 1 w¼0
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
249
Proof. The optimal configurations follow again from OCP. The upper survival function is derived by calculating the Mfunction values for the optimal configurations C SIs ðIm Þ; for m # s; where Ss is derived by subtracting from S s21 all probability mass which cannot be put to the right of t [ Is ; but which we had been able to put to the right of any w [ Is21 ; so the M-function values on intervals starting at the event and censoring times in Is21 ; under the optimal configuration. Again, further details of the proof are straightforward but somewhat tedious, and we do not include these here ¼ S[9].
and, for s $ 1;
Next we consider the lower survival function, which is constant on intervals I~s ¼ ðas ; asþ1 : Of course, we use the absence of knowledge on exact observation times within the intervals Is ; to put as little probability mass as necessary beyond t to derive SðtÞ ¼ PðXnþ1 . tÞ: Hence, for t [ I~s ; the observation times in Is are all assumed to be less than t; with further details on the optimal configuration included in Theorem 7.
Proof. The optimal configurations follow again from OCP. The lower survival function is derived by calculating the Mfunction values for the optimal configurations CSI~ ðIm Þ; for s m # s; where Ss~ is derived by subtracting from Ss.1 all probability mass which can be put to the left of any w [ I~s ; but which we had not been able to put to the left of t [ I~s21 ; so the M-function values on intervals starting at the event and censoring times in Is ; under the optimal configuration. Again, further details of the proof are straightforward but somewhat tedious, and we do not include these here, see Yan [9]. A
Theorem 7. (SðtÞ). Consider NPI, based on rc-AðnÞ ; for grouped data. Using the notation introduced in Section 3, the lower survival function SðtÞ ¼ PðXnþ1 . tÞ; at t [ I~s ; is obtained for the configurations m C SI~s ðIm Þ ¼ {cm cm , t1 };
for m , s:
The optimal configuration of the interval Is again needs some more consideration, as we are actually interested in the lower survival function at time t [ I~s : To get minimum probability mass to the right of t, all observations in Is are assumed to be less than t, and ordered such that the censorings are assumed to happen before the event times: C SI~s ðIs Þ ¼ {cscs , t1s ; and tess , t}: The configurations in intervals beyond Is do not influence SðtÞ; for t [ I~s ; so we do not need to specify CSI~ ðIm Þ for s m . s: Let Ss~ denote the value of the lower survival function for t [ I~s ¼ ðas ; asþ1 ; with s ¼ 0; 1; …; k; then, S0~ ¼
n 2 n0 ; n 2 c0 þ 1
and, for s $ 1; Ss~ ¼ Ss.1 þ qs21 2 ðes þ 1Þqs with q0 ¼
1 ; n 2 c0 þ 1
0 s
q ¼q
1
sX 21
B n2 nw þ 1 C C B C w¼0 C: B sX 21 C B @n2 n 2c þ1A
s21 B
w
s
w¼0
For completeness, it seems reasonable to define Sð0Þ ¼ 1; assuming that no events or censorings actually happened at time 0.
6. Example In this section, we illustrate NPI for grouped lifetime data via an example, in which we also compare it to two other statistical methods for such data. First, we compare the lower and upper survival functions with the so-called standard life table estimate, which we ^ denote by SðtÞ; see Lawless [10, Section 2.2] for details. This method provides an estimate of the underlying population lifetime distribution function, so its aim is not directly prediction, as it is in our method. It deals with censorings per interval, by effectively assuming that half of the events in an interval take place before all censorings, and the other half of events take place after all censorings in the interval. This enables a precise estimate of the survival function, corresponding to the underlying lifetime distribution, at the time points as which create the intervals. This estimator does not consider other time points. We do not discuss further variations, which can often be interpreted as following from slightly different assumptions for the exact censoring times within an interval [10]. It turns out that the value of this standard life table estimate at as is always between our lower and upper survival functions for Xnþ1 at as ; as illustrated in the example below. Secondly, we compare our inferences with the imprecise Dirichlet model by Walley [11], which was adapted for dealing with right-censored data by Coolen [12], and which gives bounds for estimated values of parameters us in a Bayesian multinomial model, where the categories are the same intervals as used in this paper, and the parameters have
250
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
the standard interpretation in terms of proportion of a large population to fall in each of these intervals, so again the inferential goal differs from ours, although a corresponding Bayesian predictive would suggest the probability for a future item’s event time to fall in a particular interval to be equal to the expected value of the corresponding parameter, which allows comparison with our method. However, censorings are dealt with differently in the imprecise Dirichlet model [12], as they are only assumed to take place at times as : In the example below, the cs censorings within interval Is are, for the imprecise Dirichlet model, assumed to take place at the right-end point asþ1 of this interval, so after the es event times in this interval, which explains why the corresponding bounds for the expected parameter values, Eðus Þ and Eðus Þ; are relatively close to our upper probability PðXnþ1 [ Is Þ; which is based on similar configurations for the intervals left of Is ; as shown in Section 4. However, our upper probability assumes a different configuration for Is ; leading to PðXnþ1 [ Is Þ $ Eðus Þ: The example illustrates that, if there are no censorings in all intervals Im ; with m # s; then PðXnþ1 [ Is Þ ¼ Eðus Þ: For completeness, we should remark that the imprecise Dirichlet model [11,12] requires specification of one parameter, which effectively controls the size of the set of corresponding Bayesian prior distributions. In our example, we set this parameter equal to 1, for larger values the bounds for the expected values can fall outside our corresponding lower and upper probabilities as based on rc-AðnÞ : There is no single value of that parameter which would always give equality of our imprecise probabilities and the bounds for the expected parameter values in the imprecise Dirichlet model. It should be remarked that the methods, illustrated in the example below, all give numerical values which are quite similar. The underlying reason is that all can be explained, with some small variations according to goal of inference and the added assumptions for dealing with censoring, via Efron’s [13] ‘redistribution of probability mass’ process, which in addition also underlies well-known nonparametric methods for inference based on right-censored data, such as the product-limit estimator by Kaplan and Meier [14] (see also Refs. [1,2]). 6.1. Example We use an example taken from Lawless [10, p. 57], who provides further details on context and the original source of these data. Effectively, the time-axis (in years) is partitioned into 11 intervals Is ; with a total number of 374 observations, consisting of es event times and cs censoring times per interval, as given in Table 1. Table 2 presents our lower and upper predictive survival functions for X375 ; as derived in Section 5, together with the corresponding survival function estimates according to
Table 1 Event and censoring times per interval Is
es
cs
[0,1) [1,2) [2,3) [3,4) [4,5) [5,6) [6,7) [7,8) [8,9) [9,10) [10,1)
90 76 51 25 20 7 4 1 3 2 47
0 0 0 12 5 9 9 3 5 5 0
the standard life table method [10], at points as ; denoted ^ s Þ: We see that these estimates are indeed always by Sða between our lower and upper survival functions. For such grouped data, the standard life table method does not define the survival function estimate at other time points, the values of the lower and upper survival functions, based on rc-Að374Þ ; at other t follow from Section 5, so SðtÞ ¼ Sðasþ1 Þ for all t [ I~s ¼ ðas ; asþ1 ; and SðtÞ ¼ Sðas Þ for all t [ I~s ¼ ½as ; asþ1 Þ: In the final interval, beyond a10 ¼ 10; the lower survival function is equal to 0, whereas the upper survival function remains equal to 0.236 without any further assumptions added. Table 3 gives the lower and upper probabilities for X375 to be in any of the intervals Is ; together with the lower and upper expected values for the parameters us in the imprecise Dirichlet model [12]. The difference between our upper and lower probabilities tends to increase with larger numbers of censorings in Is ; and in intervals Im ; with m , s: The lower probability for X375 [ ½7; 8Þ is equal to zero, as there was only one event time observed in that interval, and the bounds for the expected value of us ; according to the imprecise Dirichlet model, are always between our lower and upper probabilities, in this case, but the imprecise
Table 2 Lower and upper predictive survival functions, and standard life table estimates as
Sðas Þ
Sðas Þ
^ sÞ Sða
0 1 2 3 4 5 6 7 8 9 10
1 0.757 0.555 0.419 0.346 0.286 0.262 0.247 0.243 0.230 0.220
1 0.760 0.557 0.421 0.355 0.296 0.275 0.261 0.257 0.245 0.236
1 0.759 0.556 0.420 0.350 0.291 0.268 0.254 0.250 0.237 0.228
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252 Table 3 Lower and upper predictive probabilities for X375 [ Is ; and lower and upper expected values for us in imprecise Dirichlet model Is
[0,1) [1,2) [2,3) [3,4) [4,5) [5,6) [6,7) [7,8) [8,9) [9,10) [10,1)
rc-Að374Þ
Imprecise Dirichlet model
PðX375 [ Is Þ
PðX375 [ Is Þ
Eðus Þ
Eðus Þ
0.2400 0.2000 0.1333 0.0640 0.0548 0.0181 0.0100 0 0.0078 0.0043 0.2200
0.2427 0.2053 0.1387 0.0750 0.0642 0.0272 0.0193 0.0081 0.0177 0.0147 0.2357
0.2400 0.2027 0.1360 0.0667 0.0586 0.0216 0.0137 0.0039 0.0122 0.0089 0.2313
0.2427 0.2053 0.1387 0.0693 0.0616 0.0247 0.0172 0.0078 0.0163 0.0133 0.2357
Dirichlet model can be adapted, by different choice of the parameter mentioned before [11,12], such that this would not necessarily hold.
7. Concluding remarks It is important to consider how these lower and upper probabilities can be used and interpreted. Although they could be considered from a Bayesian perspective [1], the most logical interpretation has a more classical frequentist nature, in the sense that, if the procedures in this paper were applied very often, the lower and upper probabilities would be bounds on the frequencies with which the relevant events would indeed occur. It is clear that, if there are only few data, these bounds may become very wide, hence may not lead to strong enough inferences, which would indicate that either more data, or additional assumptions, may be required. An interesting way to use NPI is for studying the effect of further assumptions underlying alternative methods. If other methods give quite different inferential conclusions, these may be heavily influenced by the assumptions underlying those other methods. Although we restricted attention to use of NPI, for grouped data, for inferences on Xnþ1 [ Is and on the survival function, this method is also applicable for many other events of interest, as long as such events are directly formulated in terms of Xnþ1 : This could include actual decision problems, with utilities related to possible outcomes of Xnþ1 : However, deriving optimal configurations may not always be achievable from OCP, or other simple principles. It is interesting to remark that, during the last decade, there has been increasing interest in the use of imprecise probabilities in reliability, see e.g. Coolen and Newby [15], Kozine and Filimonov [16], and Utkin and Gurov [17]. A variety of aspects that might be modelled via imprecise
251
probabilities have been suggested in the literature [8,12, 15 – 17]. In the results presented in this paper, three different aspects were taken into account via imprecision, namely the fact that only for a few events precise probabilities are defined by AðnÞ [1,2,6], the inclusion of censored data via rcAðnÞ [1,2], and the fact that the observations were not precisely available, leading to lack of knowledge of the configurations. Detailed understanding of features which can, or should, be taken into account via imprecision, is an interesting topic for future research. We would also welcome, and hope to undertake, further study of alternative models based on imprecise probabilities for such data, and further development of inferential procedures based on NPI for reliability data. Without further adaption, however, Berliner and Hill’s [7] AðnÞ -based method for right-censored data is less attractive for grouped data, as their use of partial censoring information would, for configurations with rightcensorings assumed to occur prior to events within an interval Is ; effectively put those right-censored observations in the earlier interval Is21 ; which may indeed lead to loss of valuable information. Finally, we should remark that, if the exact event and censoring times were actually known, then rc-AðnÞ -based lower and upper survival functions [1,2] would, of course, be within those for grouped data as presented in Section 5, as the bounds derived here hold for all possible configurations.
Acknowledgements The authors gratefully acknowledge useful feedback from a referee.
References [1] Coolen FPA, Yan KJ. Nonparametric predictive inference with rightcensored data. Submitted for publication. [2] Coolen FPA, Coolen-Schrijner P, Yan KJ. Nonparametric predictive inference in reliability. Reliab Engng Syst Safety 2002;78: 185–93. [3] Hill BM. Posterior distribution of percentiles: Bayes’ theorem for sampling from a population. J Am Statist Assoc 1968;63:677–91. [4] Hill BM. De Finetti’s theorem, induction, and AðnÞ or Bayesian nonparametric predictive inference (with discussion). In: Bernardo JM, DeGroot MH, Lindley DV, Smith AFM, editors. Bayesian Statistics 3. Oxford: University Press; 1988. p. 211–41. [5] Hill BM. Parametric models for An : splitting processes and mixtures. J R Statist Soc B 1993;55:423 –33. [6] Augustin T, Coolen FPA. Nonparametric predictive inference and interval probability. (Available as Discussion Paper 257 at http://www.stat.uni-muenchen.de/sfb386/.) Submitted for publication. [7] Berliner LM, Hill BM. Bayesian nonparametric survival analysis (with discussion). J Am Statist Assoc 1988;83:772–84. [8] Walley P. Statistical reasoning with imprecise probabilities. London: Chapman & Hall; 1991.
252
F.P.A. Coolen, K.J. Yan / Reliability Engineering and System Safety 80 (2003) 243–252
[9] Yan KJ. Nonparametric predictive inference with right-censored data. PhD Thesis. University of Durham; 2002 (available from the corresponding author). [10] Lawless JF. Statistical models and methods for lifetime data. New York: Wiley; 1982. [11] Walley P. Inferences from multinomial data: learning about a bag of marbles (with discussion). J R Statist Soc, Ser B 1996;58: 3 –57. [12] Coolen FPA. An imprecise Dirichlet model for Bayesian analysis of failure data including right-censored observations. Reliab Engng Syst Safety 1997;56:61–8.
[13] Efron B. The two sample problem with censored data. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 4. New York: Prentice-Hall; 1967. p. 831 –53. [14] Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Statist Assoc 1958;53:457– 81. [15] Coolen FPA, Newby MJ. Bayesian reliability analysis with imprecise prior probabilities. Reliab Engng Syst Safety 1994;43:75– 85. [16] Kozine IO, Filimonov YV. Imprecise reliabilities: experiences and advances. Reliab Engng Syst Safety 2000;67:75–83. [17] Utkin LV, Gurov SV. Imprecise reliability for some new lifetime distribution classes. J Statist Plan Inference 2002;105:215– 32.