Computational Statistics & Data Analysis 37 (2001) 93–113 www.elsevier.com/locate/csda
Influence analysis to assess sensitivity of the dropout process

Geert Molenberghs a,*, Geert Verbeke b, Herbert Thijs c, Emmanuel Lesaffre d, Michael G. Kenward e
a Biostatistics, Center for Statistics, Limburgs Universitair Centrum, Universitaire Campus, Building D, B-3590 Diepenbeek, Belgium
b Katholieke Universiteit Leuven, Leuven, Belgium
c Limburgs Universitair Centrum, Diepenbeek, Belgium
d Katholieke Universiteit Leuven, Leuven, Belgium
e London School of Hygiene and Tropical Medicine, London, UK

Received 1 March 2000

* Corresponding author. Tel.: +32-11-26-8238; fax: +32-11-26-8299. E-mail address: [email protected] (G. Molenberghs).
Abstract

Diggle and Kenward (Appl. Statist. 43 (1994) 49) proposed a selection model for continuous longitudinal data subject to possible non-random dropout. It has provoked a large debate about the role for such models. The original enthusiasm was followed by skepticism about the strong but untestable assumptions upon which this type of model invariably rests. Since then, the view has emerged that these models should ideally be made part of a sensitivity analysis. One of their examples is a set of data on mastitis in dairy cattle, about which they concluded that the dropout process was non-random. The same data were used in Kenward (Statist. Med. 17 (1998) 2723), who performed an informal sensitivity analysis. These data thus present an interesting opportunity for a formal sensitivity assessment, as proposed by Verbeke et al. (Sensitivity analysis for non-random dropout: a local influence approach, 2000; submitted), based on local influence (Cook, J. Roy. Statist. Soc. Ser. B 48 (1986) 133). © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Dropout; Selection model; Global influence; Local influence; Case deletion; Sensitivity analysis
1. Introduction

In a longitudinal study or experiment, each unit is measured on several occasions. It is not unusual in practice for some sequences of measurements to terminate early
for reasons outside the control of the investigator, and any unit so affected is often called a dropout. It might therefore be necessary to accommodate dropout in the modeling process, which itself is sometimes of scientific interest. Rubin (1976) and Little and Rubin (1987, Chapter 6) make important distinctions between different missing value processes. A dropout process is said to be completely random (CRD) if the dropout is independent of both unobserved and observed data, and random (RD) if, conditional on the observed data, the dropout is independent of the unobserved measurements; otherwise the dropout process is termed non-random (NRD). If a dropout process is random, then a valid analysis can be obtained through a likelihood-based analysis that ignores the dropout mechanism, provided the parameter describing the measurement process is functionally independent of the parameter describing the dropout process, the so-called parameter distinctness condition. This situation is termed ignorable by Rubin (1976) and Little and Rubin (1987); see also Little (1995). It leads to considerable simplification in the analysis.

In many examples, however, the reasons for dropout are many and varied, and it is therefore difficult to justify on a priori grounds the assumption of random dropout. Arguably, in the presence of non-random dropout, a wholly satisfactory analysis of the data is not feasible. One approach is to estimate from the available data the parameters of a model representing a non-random dropout mechanism. This is the route taken by Diggle and Kenward (1994, henceforth referred to as DK) in the context of continuous longitudinal data. Further approaches are proposed by Laird et al. (1987), Wu and Bailey (1988, 1989) and Wu and Carroll (1988). Little (1995) gives a careful review of the different modeling approaches.

With the volume of literature on non-random missing data increasing, there has been growing concern about the fact that such models often rest on strong assumptions and relatively little evidence from the data themselves. This point was already raised by several discussants to DK (Rubin, 1994; Laird, 1994) and by Glynn et al. (1986). Since the model of DK fits within the class of selection models, it is fair to say that it raised, at first, expectations it could not meet; this was made clear by many discussants of their paper. This implies that, for example, formal tests for the null hypothesis of random missingness, while technically possible, should be approached with caution.

In response, there is a growing awareness of the need for methods that investigate the sensitivity of the results with respect to the model assumptions. Some progress can be made by examining the effect that dropout mechanism parameters, for which there is no information in the data, have on the analyses of the measurement parameters and on the remaining dropout parameters; Nordheim (1984) gives an example of such a sensitivity analysis with cross-sectional binomial outcomes. A similar principle is advocated by Little (1994). Molenberghs et al. (1999) illustrate the need for sensitivity analysis by reviewing some of the issues that arise with models for non-random missing data.

While a general awareness of the need for sensitivity analysis has grown, only few actual proposals have been made, and many of these are useful but ad hoc approaches. In our view, a more formal approach can be a useful component of a sensitivity analysis. In this paper, we use the local influence
approach, suggested by Cook (1986), to investigate the sensitivity of the inference to dropout model assumptions for the mastitis data, analyzed previously in DK, who concluded that dropout was non-random. This provides an interesting case, since Kenward (1998) conducted an informal sensitivity analysis of these data, thereby shedding a different light on the original analysis. A general theoretical treatment is given in Verbeke et al. (2000).

The mastitis data are introduced in Section 2. Section 3 sketches the non-random dropout model of DK, and Section 4 presents informal sensitivity analyses of the mastitis data. Section 5 reviews local influence theory and derives appropriate influence measures for the current model. A local influence analysis of the mastitis data is the focus of Section 6.

2. Mastitis in dairy cattle

This example, concerning the occurrence of the infectious disease mastitis in dairy cows, was introduced in DK and re-analyzed in Kenward (1998). Data were available on the milk yields, in thousands of liters, of 107 dairy cows from a single herd in two consecutive years: Y_ij (i = 1,...,107; j = 1, 2). In the first year, all animals were supposedly free of mastitis; in the second year, 27 became infected and dropped out. Mastitis typically reduces milk yield, and the question of scientific interest is whether the probability of occurrence of mastitis is related to the yield that would have been observed had mastitis not occurred (and hence had the animal not dropped out). A graphical representation of the complete data is given in Fig. 1. The second panel, as well as the highlighted cows, will be discussed in Section 6.

This setting is very interesting, since part of the scientific focus is on the dropout mechanism and on the potential outcome that would have been observed, had dropout (mastitis) been prevented. (This is in contrast to situations where dropout occurs due to death.) We will illustrate that the answer to such a question, while important, is very sensitive to modeling assumptions and/or to the presence of influential subjects. This fact was not taken into account in the original analysis by Diggle and Kenward (1994). The local influence approach to sensitivity analysis followed here has the ability, unlike case deletion, to reveal precisely which aspects of an influential subject drive the conclusions about the dropout model.
Fig. 1. Graphical representation of the mastitis data. The first panel shows a scatter plot of the second measurement versus the first measurement. The second panel shows a scatter plot of the change versus the baseline measurement.
3. Models for incomplete data

Let us introduce some notation. We assume that for cow i = 1,...,N in the study, two measurements of milk yield Y_ij are designed to be taken at years j = 1, 2. Both outcomes are grouped into a vector Y_i = (Y_i1, Y_i2)'. Let D_i be the dropout indicator for cow i, which is 1 if dropout occurs and 0 otherwise. It is often necessary to split the vector Y_i into observed (Y_i^o) and missing (Y_i^m) components, respectively. The taxonomy of Rubin (1976) and Little and Rubin (1987) is based on the selection model factorization of the full density:

    f(y_i, d_i | θ, ψ) = f(y_i | θ) f(d_i | y_i, ψ),        (3.1)

where the first factor, with parameter θ, is the marginal density of the measurement process and the second one, with parameter ψ, is the density of the missingness process, conditional on the outcomes. Precisely, the taxonomy is based on the second factor of (3.1):

    f(d_i | y_i, ψ) = f(d_i | y_i^o, y_i^m, ψ).        (3.2)

If (3.2) is independent of the measurements, i.e., reduces to f(d_i | ψ), then the process is CRD. If (3.2) is independent of the unobserved measurements Y_i^m, but depends on the observed measurements Y_i^o (i.e., reduces to f(d_i | y_i^o, ψ)), then the process is RD. Finally, when (3.2) depends on the missing values Y_i^m, the process is non-random (NRD).

DK combine a multivariate normal model for the measurement process with a logistic regression model for the dropout process. Precisely, the measurement model assumes

    Y_i ~ N(X_i β, Σ),        (3.3)

in which the design matrix is

    X_i = ( 1  0 )
          ( 1  1 ).

Here, the rows correspond to the means for years 1 and 2, respectively, while the columns represent the intercept and time effect, respectively. Further, β = (β_0, β_d)', and Σ is a (2 × 2) positive definite matrix. The dropout probability is modeled by means of a logistic regression:

    logit[g(y_i1, y_i2)] = logit[pr(D_i = 1 | y_i1, y_i2)] = ψ_0 + ψ_1 y_i1 + ω y_i2,   i = 1,...,N.        (3.4)
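For intuition, data from (3.3) and (3.4) are easy to simulate. The following is a minimal sketch in Python; the parameter values are purely illustrative (not estimates from the mastitis data), and the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 107
beta0, beta_d = 5.8, 0.7                    # illustrative values only
Sigma = np.array([[0.9, 0.6], [0.6, 1.3]])  # positive definite 2 x 2
psi0, psi1, omega = -2.5, 0.3, 0.0          # omega = 0 gives random dropout

Y = rng.multivariate_normal([beta0, beta0 + beta_d], Sigma, size=N)
p_drop = 1 / (1 + np.exp(-(psi0 + psi1 * Y[:, 0] + omega * Y[:, 1])))
D = rng.uniform(size=N) < p_drop            # D_i = 1: year-2 yield missing
```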
When ω equals zero, the dropout model is random, and all parameters can be estimated using standard software, since the measurement model (for which we use a bivariate normal model) and the dropout model (assumed to follow a logistic regression) can then be fitted separately. If ω ≠ 0, the dropout process is non-random. The full data likelihood contribution for subject i assumes the form

    L*(θ, ψ | y_i, d_i) ∝ f(y_i, d_i | θ, ψ).        (3.5)

In line with our discussion in the introduction, Rubin (1994) points out that such analyses heavily depend on the assumed dropout process, while it is impossible to find evidence for or against the model unless supplemental information on the dropouts is available. Further, note that in practice subjects may drop out for a variety of reasons, with several competing dropout processes operating simultaneously. A major problem is that non-random dropout models rest on strong assumptions, violation of which can have a strong impact on inference. In particular, it is conceivable that one or a few influential subjects affect inferences about the dropout mechanism: an otherwise random dropout mechanism may seem to be non-random. Detection of such subjects is therefore a crucial issue.

In the next section, this model is applied to the mastitis data and several informal sensitivity assessments are discussed. Thereafter, we will investigate the sensitivity of the estimation of the parameters of interest with respect to assumptions about the dropout model. To this end, we consider the following perturbed version of (3.4):

    logit[g(y_i1, y_i2)] = logit[pr(D_i = 1 | y_i1, y_i2)] = ψ_0 + ψ_1 y_i1 + ω_i y_i2,   i = 1,...,N,        (3.6)

in which different subjects are assigned different weights for the response at the second occasion in predicting dropout. Note that the ω_i are not to be considered as subject-specific parameters that need to be estimated, but rather as perturbations that will be used only to derive influence measures (Cook, 1986). If all ω_i equal zero, the model reduces to a RD model. Hence, it enables studying how small perturbations in the NRD direction can have a large impact on key features of the model (such as treatment effect, time evolution, variance components, or dropout parameters). This will be done using the local influence approach of Cook (1986), which we describe in Section 5. Of course, not all possible forms of impact resulting from sensitivity to dropout model assumptions will be found in this way. Therefore, the method proposed here should be viewed as one component of a sensitivity analysis, which ought ideally to be supplemented with other methods.
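As an aside, the observed-data likelihood (3.5) for model (3.3)-(3.4) is straightforward to evaluate numerically. The sketch below is our own illustration, not the authors' GAUSS implementation: it assumes NumPy arrays y1 and y2 for the two yields (with y2 arbitrary where missing), a boolean array drop, and uses Gauss-Hermite quadrature for the dropout integral where closed-form expressions could also be exploited.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm

# probabilists' Gauss-Hermite rule: E[h(Z)] ~ sum(w * h(x)) / sqrt(2*pi)
GH_X, GH_W = np.polynomial.hermite_e.hermegauss(30)

def neg_loglik(par, y1, y2, drop):
    """Observed-data -log L of the selection model (3.3)-(3.5).
    par = (beta0, beta_d, log sd1, log sd2, atanh rho, psi0, psi1, omega);
    drop[i] is True when cow i has no second-year measurement."""
    b0, bd, ls1, ls2, zr, p0, p1, om = par
    s1, s2, rho = np.exp(ls1), np.exp(ls2), np.tanh(zr)
    m1, m2 = b0, b0 + bd                     # year-1 and year-2 means
    cmean = m2 + rho * s2 / s1 * (y1 - m1)   # E(Y2 | Y1), bivariate normal
    csd = s2 * np.sqrt(1.0 - rho ** 2)       # SD(Y2 | Y1)
    ll = norm.logpdf(y1, m1, s1).sum()       # year-1 yields always observed
    obs = ~drop                              # completers: full contribution
    ll += norm.logpdf(y2[obs], cmean[obs], csd).sum()
    ll += np.log1p(-expit(p0 + p1 * y1[obs] + om * y2[obs])).sum()
    # dropouts: integrate g(y1, y2) against f(y2 | y1)
    yq = cmean[drop, None] + csd * GH_X[None, :]
    gq = expit(p0 + p1 * y1[drop, None] + om * yq)
    ll += np.log(gq @ GH_W / np.sqrt(2.0 * np.pi)).sum()
    return -ll
```

Holding omega at zero while minimizing (e.g., with scipy.optimize.minimize) reproduces a RD fit, for which the two factors of (3.1) indeed separate; freeing omega gives the NRD fit, and twice the difference between the two minima is the G² statistic used in Section 4.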
4. Mastitis in dairy cattle: informal sensitivity analysis

DK and Kenward (1998) performed several analyses of these data. In DK, a separate mean for each group defined by the year of first lactation and a common time effect were considered, together with an unstructured 2 × 2 covariance matrix. The dropout model included both Y_i1 and Y_i2 and was reparameterized in terms of the size variable (Y_i1 + Y_i2)/2 and the increment Y_i2 − Y_i1. It turned out that, in predicting dropout, the increment was an important variable, in contrast with a relatively small contribution of the size. If this model were assumed plausible, RD would be rejected on the basis of a likelihood ratio test statistic of G² = 5.11 on 1 degree of freedom.

Kenward (1998) carried out what we could term a data-driven sensitivity analysis; let us describe his results in some detail in this section. He started from the original model in DK, albeit with a common intercept, since there was no evidence for a dependence on first lactation year. The fits are reproduced in Table 1.
Table 1
Maximum likelihood estimates (standard errors) of random and non-random dropout models, fitted to the mastitis data set, under several deletion schemes. The column "all" retains all 107 cows; the other columns delete the indicated cows.

Random dropout
Effect                Parameter  all           (53,54,66,69)  (4,5)         (66)          (4,5,66)
Measurement model:
  Intercept           β_0        5.77 (0.09)   5.69 (0.09)    5.81 (0.08)   5.75 (0.09)   5.80 (0.09)
  Time effect         β_d        0.72 (0.11)   0.70 (0.11)    0.64 (0.09)   0.68 (0.10)   0.60 (0.08)
  First variance      σ_1²       0.87 (0.12)   0.76 (0.11)    0.77 (0.11)   0.86 (0.12)   0.76 (0.11)
  Second variance     σ_2²       1.30 (0.20)   1.08 (0.17)    1.30 (0.20)   1.10 (0.17)   1.09 (0.17)
  Correlation         ρ          0.58 (0.07)   0.45 (0.08)    0.72 (0.05)   0.57 (0.07)   0.73 (0.05)
Dropout model:
  Intercept           ψ_0        −2.65 (1.45)  −3.69 (1.63)   −2.34 (1.51)  −2.77 (1.47)  −2.48 (1.54)
  First measurement   ψ_1        0.27 (0.25)   0.46 (0.28)    0.22 (0.25)   0.29 (0.24)   0.24 (0.26)
  Second measurement  ω = ψ_2    0             0              0             0             0
−2 loglikelihood                 280.02        246.64         237.94        264.73        220.23

Non-random dropout
Effect                Parameter  all           (53,54,66,69)  (4,5)         (66)          (4,5,66)
Measurement model:
  Intercept           β_0        5.77 (0.09)   5.69 (0.09)    5.81 (0.08)   5.75 (0.09)   5.80 (0.09)
  Time effect         β_d        0.33 (0.14)   0.35 (0.14)    0.40 (0.18)   0.34 (0.14)   0.63 (0.29)
  First variance      σ_1²       0.87 (0.12)   0.76 (0.11)    0.77 (0.11)   0.86 (0.12)   0.76 (0.11)
  Second variance     σ_2²       1.61 (0.29)   1.29 (0.25)    1.39 (0.25)   1.34 (0.25)   1.10 (0.20)
  Correlation         ρ          0.48 (0.09)   0.42 (0.10)    0.67 (0.06)   0.48 (0.09)   0.73 (0.05)
Dropout model:
  Intercept           ψ_0        0.37 (2.33)   −0.37 (2.65)   −0.77 (2.04)  0.45 (2.35)   −2.77 (3.52)
  First measurement   ψ_1        2.25 (0.77)   2.11 (0.76)    1.61 (1.13)   2.06 (0.76)   0.07 (1.82)
  Second measurement  ω = ψ_2    −2.54 (0.83)  −2.22 (0.86)   −1.66 (1.29)  −2.33 (0.86)  0.20 (2.09)
−2 loglikelihood                 274.91        243.21         237.86        261.15        220.23
G² for NRD                       5.11          3.43           0.08          3.57          0.005
Using the likelihoods to compare the fit of the two models, we get a difference of G² = 5.11 (p = 0.02). Kenward (1998) found that the corresponding Wald test yields p = 0.002, and concluded that this discrepancy might suggest that the asymptotic approximations on which these tests are based are not very accurate. Nevertheless, there is a suggestion from the change in likelihood that ψ_2 is making a real contribution to the fit of the model.
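Both p-values can be checked directly from the "all" column of Table 1; a small verification in Python (scipy assumed):

```python
from scipy.stats import chi2, norm

g2 = 280.02 - 274.91            # -2 loglik: RD minus NRD, "all" column
print(chi2.sf(g2, df=1))        # ~0.024: likelihood ratio p-value
wald_z = -2.54 / 0.83           # omega estimate over its standard error
print(2 * norm.sf(abs(wald_z))) # ~0.002: two-sided Wald p-value
```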
The dropout model parameters, estimated in the NRD setting, can be found in Table 1. Some insight into this fitted model can be obtained by re-writing it in terms of the milk yield totals (Y_1 + Y_2) and increments (Y_2 − Y_1):

    logit[P(mastitis)] = 0.37 − 0.145(y_1 + y_2) − 2.395(y_2 − y_1).        (4.1)
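The coefficients of (4.1) are a linear reparameterization of the direct-variable estimates (ψ_0, ψ_1, ψ_2) = (0.37, 2.25, −2.54) from Table 1: since ψ_1 y_1 + ψ_2 y_2 = [(ψ_1 + ψ_2)/2](y_1 + y_2) + [(ψ_2 − ψ_1)/2](y_2 − y_1), a two-line check suffices:

```python
psi0, psi1, psi2 = 0.37, 2.25, -2.54         # NRD fit, "all" column of Table 1
print((psi1 + psi2) / 2, (psi2 - psi1) / 2)  # -0.145 and -2.395, as in (4.1)
```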
The probability of mastitis (i.e., dropout) increases with larger negative increments; that is, those animals who showed (or would have shown) a greater decrease in yield over the two years have a higher probability of getting mastitis. This is precisely at the heart of the scientific question. The other differences in parameter estimates between the two models are consistent with this: the NRD model predicts a smaller average increment in yield, β_d, with a larger second-year variance and a smaller correlation, caused by greater negative predicted differences between yields.

To gain some additional insight into these two fitted models, we now take a closer look at the raw data and the predictive behavior of the Gaussian NRD model. Under an NRD model, the predicted, or imputed, value of a missing observation is given by the ratio of expectations:

    ŷ^m = E_{Y^m|Y^o}[y^m P(d | y^o, y^m)] / E_{Y^m|Y^o}[P(d | y^o, y^m)].        (4.2)
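For the present model, the outer expectations in (4.2) are over the univariate normal distribution of Y_2 given Y_1, so quadrature is enough. A minimal sketch, assuming the conditional mean and standard deviation (cmean, csd) and the dropout parameters are taken from a fitted NRD model:

```python
import numpy as np

GH_X, GH_W = np.polynomial.hermite_e.hermegauss(40)  # probabilists' Hermite

def impute_y2(cmean, csd, psi0, psi1, omega, y1):
    """Imputed second-year yield (4.2) for one dropout: the ratio
    E[y2 g(y1, y2)] / E[g(y1, y2)], both expectations over f(y2 | y1)."""
    y2 = cmean + csd * GH_X                  # quadrature points for Y2 | Y1
    g = 1.0 / (1.0 + np.exp(-(psi0 + psi1 * y1 + omega * y2)))
    return (GH_W * y2 * g).sum() / (GH_W * g).sum()  # 1/sqrt(2*pi) cancels
```

With the fitted ω < 0, large candidate values of y_2 receive a small dropout probability, so the imputed value is pulled below the conditional mean; with ω = 0, g cancels and (4.2) reduces to the conditional mean itself.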
Recall that the fitted dropout model (4.1) implies that the probability of mastitis increases with decreasing values of the increment Y_2 − Y_1. We therefore plot the 27 imputed values of this quantity, together with the 80 observed increments, against the first-year yield Y_1. This is presented in Fig. 2, in which the imputed values (plus signs) are indicated along with the observed values (crosses). Note how the imputed values are almost linear in Y_1: this is a well-known property of the ratio (4.2) within this range of observations. The imputed values for the NRD model are all negative, in contrast to the observed increments, which are nearly all positive. With animals of this age, one would normally expect an increase in yield between the two years. The dropout model is imposing very atypical behavior on these animals; this corresponds to the statistical significance of the NRD component of the model (ψ_2) but of course necessitates further scrutiny.

Another feature of this plot is the pair of outlying observed points circled in the top left-hand corner. These two animals have the lowest and third lowest yields in the first year, but moderately large yields in the second, leading to the largest positive increments. It is likely that there is some anomaly, possibly illness, leading to their relatively low yields in the first year. One can conjecture that these two animals are the cause of the structure identified by the Gaussian NRD model. Under the joint Gaussian assumption, the NRD model essentially "fills in" the missing data to produce a complete Gaussian distribution. To counterbalance the effect of these two extreme positive increments, the dropout model predicts negative increments for the mastitic cows, leading to the results observed. As a check on this conjecture, we omit these two animals from the data set and refit the RD and NRD Gaussian models.
Fig. 2. Plot of observed and imputed year 2 − year 1 yield differences against year 1 yield. Three outlying points are indicated.
The resulting estimates are presented in the (4,5) column of Table 1. (Observation #66 will be removed by the local influence analysis of Section 6.) This procedure is similar to a global influence analysis by means of deleting two observations. It is clear that the influence on the measurement model parameters is small in the random dropout case, although the gap in the time effect β_d between the random and non-random dropout models is reduced when #4 and #5 are removed. The deviance difference is minimal, and the NRD model now shows no improvement in fit over RD. The estimates of the dropout parameters, while still moderately large in an absolute sense, are of the same size as their standard errors. In the absence of the anomalous animals (we removed #66 as well), the structure identified earlier in terms of the NRD dropout model no longer exists. The increments imputed by the fitted model are also plotted in Fig. 2, indicated by diamonds. While still lying in the lower region of the observed increments, these are now all positive and lie close to the increments imputed by the RD model (squares and triangles). Thus, we have a plausible representation of the data in terms of joint Gaussian milk yields, three pairs of outlying yields and no requirement for an NRD dropout process.

The three key assumptions underlying the NRD model are, first, the form chosen for the relationship between dropout probability and response; second, the transformations chosen for the responses; and, third, the distribution of the response or, more precisely, the conditional distribution of the possibly unobserved response given the observed response. In the current setting, for the first assumption, if there is dependence of mastitis occurrence on yield, experience with logistic regression tells us that the exact form of the link function in this relationship is unlikely to be critical. The second assumption may have an impact (cf. the choice between a direct-variable and an incremental-variable approach). In terms of sensitivity, Kenward (1998)
considered the third assumption, the distribution of the response, to be the most critical one. All the data from the first year are available, and a normal probability plot of these showed no great departures from the Gaussian assumption. Leaving this distribution unchanged, Kenward (1998) examined the effect of changing the conditional distribution of Y_2 given Y_1. One simple and obvious choice is a heavy-tailed t_m distribution. For the degrees of freedom, the following choices were made: m = 2, 10, 25. Not surprisingly, his finding was that the heavier the tails of the t distribution, the better the outliers were accommodated. As a result, the difference between the RD and NRD models vanished (e.g., G² = 1.08 for a t_2 distribution). In addition, Kenward (1998) found that the estimated yearly increment in milk yield, β_d, from the NRD model increases to the value estimated under the RD model. The results observed here are consistent with those from the deletion analysis. The two outlying pairs of measurements identified earlier are not inconsistent with the heavy-tailed t distribution, so they would require no "filling in" and hence provide no evidence for non-randomness in the dropout process under the second model.

It should be clear that interpreting a single non-random model, fitted to an incomplete set of data, should be avoided. Rather, a careful sensitivity assessment should supplement the fitting of such models.

5. Local influence for non-random dropout

5.1. Some background

A famous remark attributed to George Box is that all statistical models are wrong, but some are useful. Cook (1986) uses this idea to motivate his assessment of local influence. He suggests that more confidence can be put in a model which is relatively stable under small modifications. The best-known perturbation schemes are based on case deletion (Cook and Weisberg, 1982), in which the effect of completely removing cases from the analysis is studied. A quite different paradigm is the local influence approach, where one investigates how the results of an analysis are changed under small perturbations of the model. Lesaffre and Verbeke (1998) have shown that the local influence approach is useful for the detection of influential subjects in longitudinal data analysis. Moreover, since the resulting influence diagnostics can be expressed analytically, they can often be decomposed into interpretable components, leading to additional insight.

In our case, we are interested in the influence the non-randomness of dropout exerts on the parameters of interest, which will most often be the fixed-effects parameters, possibly supplemented with the variance components. Verbeke et al. (2000) have done this by considering (3.6) as the dropout model. Indeed, ω_i = 0 for all i corresponds to a RD process, which cannot influence the measurement model parameters. When small perturbations in a specific ω_i lead to relatively large differences in the model parameters, this suggests that the subject likely drives the conclusions in a disproportionate fashion. For example, if such a subject would drive the model
towards NRD, then the conditional expectations of the unobserved measurements, given the observed ones, may deviate substantially from those under a RD mechanism. Precisely this was observed by Kenward (1998) and reproduced in Fig. 2 (see also Section 4). This implies that influence on the dropout parameters may also have an impact on the measurement model parameters: since the conditional expectations change, so will the measurement model parameters. Hence, its detection is usually very important. Recall that in the current example both the dropout (mastitis) and outcome (milk yield) processes are of importance. Finally, note that such impact can indeed arise from a complete subject.

We denote the log-likelihood function corresponding to model (3.6) by

    ℓ(γ | ω) = Σ_{i=1}^{N} ℓ_i(γ | ω_i),        (5.1)

in which ℓ_i(γ | ω_i) is the contribution of the ith individual to the log-likelihood, and where γ = (θ, ψ) is the s-dimensional vector grouping the parameters of the measurement model and the dropout model, not including the N × 1 vector ω = (ω_1, ω_2,..., ω_N)' of weights defining the perturbation of the RD model. Note that the perturbations ω_i are conceptually different from the parameter ω in (3.4). This expression arises from taking the logarithm of (3.5), the model components of which are described in Section 3. It is assumed that the vector of perturbations ω belongs to an open subset Ω of R^N. For ω equal to ω_0 = (0, 0,..., 0)', ℓ(γ | ω_0) is the log-likelihood function which corresponds to a RD dropout model.

Let γ̂ be the maximum likelihood estimator for γ, obtained by maximizing ℓ(γ | ω_0), and let γ̂_ω denote the maximum likelihood estimator for γ under ℓ(γ | ω). The local influence approach now compares γ̂_ω with γ̂. Similar estimates indicate that the parameter estimates are robust with respect to perturbations of the RD model in the direction of non-random dropout. Strongly different estimates suggest that the estimation procedure is highly sensitive to such perturbations, which in turn suggests that the choice between a RD model and a non-random dropout model highly affects the results of the analysis. Cook (1986) proposed to measure the distance between γ̂_ω and γ̂ by the so-called likelihood displacement, defined by

    LD(ω) = 2[ℓ(γ̂ | ω_0) − ℓ(γ̂_ω | ω_0)].

This takes into account the variability of γ̂. Indeed, LD(ω) will be large if ℓ(γ | ω_0) is strongly curved at γ̂, which means that γ is estimated with high precision, and small otherwise. Therefore, a graph of LD(ω) versus ω contains essential information on the influence of perturbations. It is useful to view this graph as the geometric surface formed by the values of the (N + 1)-dimensional vector

    ξ(ω) = (ω', LD(ω))'

as ω varies throughout Ω. Since this so-called influence graph can only be depicted when N = 2, Cook (1986) proposed to look at local influence, i.e., at the normal curvatures (in the differential geometry sense), C_h say, of ξ(ω) in ω_0, in the direction of some N-dimensional vector h of unit length. Let Δ_i be the s-dimensional vector defined by

    Δ_i = ∂²ℓ_i(γ | ω_i) / ∂ω_i ∂γ,   evaluated at γ = γ̂ and ω_i = 0,

and define Δ as the (s × N) matrix with Δ_i as its ith column. Further, let L̈ denote the (s × s) matrix of second-order derivatives of ℓ(γ | ω_0) with respect to γ, also evaluated at γ = γ̂. Cook (1986) has then shown that C_h can be easily calculated by

    C_h = 2 |h' Δ' L̈^{-1} Δ h|.        (5.2)

There are several ways in which (5.2) can be used to study ξ(ω), each corresponding to a specific choice of the unit vector h. One evident choice is the vector h_i containing one in the ith position and zero elsewhere, corresponding to the perturbation of the ith weight only. This reflects the influence of allowing the ith subject to drop out non-randomly, while the others can only drop out at random. It immediately follows from (5.2) that the corresponding local influence measure C_i is given by

    C_i = 2 |Δ_i' L̈^{-1} Δ_i|.        (5.3)

Another important direction is the direction h_max of maximal normal curvature C_max. It shows how to perturb the RD model to obtain the largest local changes in the likelihood displacement. It is readily seen that C_max is the largest eigenvalue of −2 Δ' L̈^{-1} Δ, and that h_max is the corresponding eigenvector. When a subset γ_1 of γ = (γ_1, γ_2) is of special interest, a similar approach can be used, replacing the log-likelihood by the profile log-likelihood for γ_1. We refer to Cook (1986) and Lesaffre and Verbeke (1998) for details. For example, if the fixed-effects parameters are of special interest, C_i(β) can be calculated. On the other hand, if there is specific interest in the variance components (α, say), either directly or because they relate back to the standard errors of the fixed effects, C_i(α) can be computed.
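Computationally, (5.2) and (5.3) require only Δ and L̈, both evaluated at the RD estimate, so no NRD model ever needs to be fitted. A generic sketch of this step (our own helper, not the authors' GAUSS code; delta and the Hessian are assumed to be available, analytically or by numerical differentiation):

```python
import numpy as np

def local_influence(delta, hess):
    """Cook's local influence quantities from Delta (s x N) and the
    Hessian (s x s) of the log-likelihood, both at the RD estimate.
    Returns the C_i of (5.3) plus C_max and h_max."""
    curv = -2.0 * delta.T @ np.linalg.solve(hess, delta)  # -2 D' Ldd^{-1} D
    Ci = np.abs(np.diag(curv))             # per-subject curvature, as in (5.3)
    eigval, eigvec = np.linalg.eigh(curv)  # symmetric matrix, so use eigh
    return Ci, eigval[-1], eigvec[:, -1]   # h_max is defined up to sign
```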
5.2. Applied to the model of DK

We will consider complete and incomplete sequences in turn. The log-likelihood contribution for a complete sequence is

    ℓ_iω = ln f(y_i) + ln[1 − g(y_i1, y_i2)],

where the parameter dependencies are suppressed for notational ease. The density f(y_i) is bivariate normal, as described by (3.3). The mixed derivatives are straightforward to calculate:

    ∂²ℓ_iω / ∂θ ∂ω_i = 0,        (5.1)

    ∂²ℓ_iω / ∂ψ ∂ω_i = −h_i y_i2 g(y_i1, y_i2)[1 − g(y_i1, y_i2)].        (5.2)

Here h_i = (1, y_i1)' is the 'history' of the measurement process at the second occasion. Evaluating these under ω_i = 0 merely results in replacing g(y_i1, y_i2) by g(y_i1) = g(y_i1, y_i2)|_{ω_i=0}, which is the RD version of the dropout model.

The contribution from an incomplete sequence is more complicated. Its log-likelihood term is

    ℓ_iω = ln f(y_i1) + ln ∫ f(y_i2 | y_i1) g(y_i1, y_i2) dy_i2.

Applying the results of Verbeke et al. (2000), the mixed derivatives, evaluated at ω_i = 0, become

    ∂²ℓ_iω / ∂θ ∂ω_i |_{ω_i=0} = [1 − g(y_i1)] ∂λ(y_i2 | y_i1) / ∂θ,        (5.3)

    ∂²ℓ_iω / ∂ψ ∂ω_i |_{ω_i=0} = −h_i λ(y_i2 | y_i1) g(y_i1)[1 − g(y_i1)],        (5.4)

where

    λ(y_i2 | y_i1) = λ(y_i2) + σ_12 σ_11^{-1} [y_i1 − λ(y_i1)]        (5.5)

denotes the conditional expectation of the second measurement, given the first. The derivatives of (5.5) w.r.t. the measurement model parameters are

    ∂λ(y_i2 | y_i1) / ∂β = x_i2 − σ_12 σ_11^{-1} x_i1,        (5.6)

    ∂λ(y_i2 | y_i1) / ∂α = (∂σ_12/∂α − σ_12 σ_11^{-1} ∂σ_11/∂α) σ_11^{-1} [y_i1 − λ(y_i1)],        (5.7)

where x_ij is the jth row of X_i and α indicates the subvector of covariance parameters within the vector θ.

Note that L̈ is block-diagonal with blocks L̈(θ) and L̈(ψ), from which it follows that for any unit vector h, C_h equals C_h(θ) + C_h(ψ) (Cook, 1986), with

    C_h(θ) = −2 h' Δ(θ)' L̈(θ)^{-1} Δ(θ) h,

    C_h(ψ) = −2 h' Δ(ψ)' L̈(ψ)^{-1} Δ(ψ) h,

in which Δ(θ) and Δ(ψ) stack the mixed derivatives ∂²ℓ_iω / ∂θ ∂ω_i and ∂²ℓ_iω / ∂ψ ∂ω_i, evaluated at ω_i = 0, as columns; everything is evaluated at γ = γ̂.

Expressions (5.1) and (5.3) imply that the direct influence on θ only arises from those cows which drop out. This is intuitively clear from the following consideration. Suppose the model is fitted using the EM algorithm (Dempster et al., 1977); then the E step determines the expected value of the measurement at dropout, and the M step fits an ignorable model to the so-completed set of data. The only way in which the dropout process can influence the measurement model parameters is by predicting a value which deviates from the prediction under ignorability, which is simply the conditional mean of the missing measurement, given the history. From expression (5.3) it is clear that the corresponding contribution is large only if (1) the
dropout probability was small but the subject disappeared nevertheless, and (2) the conditional mean 'strongly depends' on the parameter of interest. Note that, as indicated earlier, there can be indirect influence from the completers on the measurement model parameters, through the impact they can have on the dropout parameters, and hence on all expectations in which the dropout parameters are involved.

Verbeke et al. (2000) derived a useful approximation for C_i(ψ):

    C_i(ψ) ≈ 2 (V_i y_i2²) { V_i h_i' (Σ_{j=1}^{N} V_j h_j h_j')^{-1} h_i },        (5.8)

where the factor in curly braces equals the hat-matrix diagonal and V_i = g(y_i1)[1 − g(y_i1)]. For a dropout, y_i2 has to be replaced by its conditional expectation (5.5).
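In code, (5.8) amounts to the hat-matrix diagonal of one weighted least-squares step for the RD logistic fit. A sketch under our own conventions (g holds the fitted RD probabilities g(y_i1); y2 holds the observed second yield for completers and the conditional expectation (5.5) for dropouts):

```python
import numpy as np

def Ci_psi_approx(y1, y2, g):
    """Approximate C_i(psi) from (5.8)."""
    H = np.column_stack([np.ones_like(y1), y1])  # histories h_i = (1, y_i1)'
    V = g * (1.0 - g)                            # logistic variance weights
    M = (H * V[:, None]).T @ H                   # sum_j V_j h_j h_j'
    hat = V * np.einsum("ij,jk,ik->i", H, np.linalg.inv(M), H)  # hat diagonal
    return 2.0 * (V * y2 ** 2) * hat
```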
6. Mastitis in dairy cattle: local influence approach

6.1. Initial analysis

We performed a local influence analysis of these data. Fig. 3 suggests that there are four influential subjects, #53, #54, #66 and #69, whereas #4 and #5 are not recovered.
Fig. 3. Index plots of C_i, C_i(θ), C_i(ψ) and of the components of the direction h_max of maximal curvature, when the dropout model is parameterized as a function of Y_i1 and Y_i2.
It is interesting to consider an analysis with these four cows removed. Unlike after removing #4 and #5, the influence on the likelihood ratio test is rather small: G² = 3.43 instead of the original 5.11. The influence on the measurement model parameters under both random and non-random dropout is small. We will carefully reflect upon these differences. First, differences between local and global influence will be discussed. Second, the impact of the parameterization of the dropout model on local influence will be considered.

It is very important to realize that one should not expect agreement between deletion and our local influence analysis. The latter focuses on the sensitivity of the results with respect to the assumed dropout model, more specifically on how the results change when the RD model is extended in the direction of non-random dropout. In particular, all subjects singled out so far are complete, and hence C_i(θ) ≡ 0, placing all influence on C_i(ψ) and h_max,i. This is confirmed by the C_i(θ) panel where, certainly when its scale is compared to that of C_i(ψ), little or no influence is detected. Of course, a legitimate concern is precisely where one should place a cut-off between subjects that are influential and those that are not. Clearly, additional work studying the stochastic behavior of the influence measures would be helpful. Meanwhile, informal guidelines can be used, such as studying the 5% relatively most influential subjects.

More insight can also be obtained from studying (5.8). The contribution for subject i is made up of three factors. The first factor, the variance V_i, is small for extreme dropout probabilities. Subjects having a very high probability of either remaining in the study or disappearing will be less influential through this factor. Cows #4 and #5 have dropout probabilities equal to 0.13 and 0.17, respectively. This has to be contrasted with the fact that the 107 cows in the study span the dropout probability interval [0.13, 0.37]. Thus, this component rather deflates the influence of subjects #4 and #5. Second, (5.8) contains a leverage factor in curly braces. Third, a subject is relatively more influential when both milk yields are high. (Recall that the first milk yield enters through the history h_i and that, for an incompletely observed animal, the second milk yield is replaced by its conditional expectation.)

We now need to question whether the latter feature is plausible or relevant. Since both measurements are positively correlated, animals with both milk yields high or both low will not be unusual. In contrast, Kenward (1998) observed that cows #4 and #5 are unusual on the basis of their increment. This is in line with several other applications of similar dropout models (DK; Molenberghs et al., 1997) in which it was found that a strong incremental component apparently yields a non-random dropout model. From our analysis, it is clear that such a conclusion may not be well founded, since removal of #4 and #5 leads to the disappearance of the non-random component. On the other hand, the size variable can often be replaced by just the last observed value, in which case the corresponding model is very close to random dropout. All of this points to the important role of the incremental variable.
Even though a dropout model parameterized in terms of Y_i1 and Y_i2 (a "direct-variable representation") is formally equivalent to a model parameterized in terms of the first variable Y_i1 and the increment Y_i2 − Y_i1 (an "incremental-variable representation"), these alternatives lead to different perturbations of the form (3.6).
Fig. 4. Index plots of C_i, C_i(θ), C_i(ψ) and of the components of the direction h_max of maximal curvature, when the dropout model is parameterized as a function of Y_i1 and Y_i2 − Y_i1.
At first sight, this feature can be seen as both an advantage and a disadvantage. The fact that reparameterizations of the linear predictor of the dropout model lead to different perturbation schemes requires careful reflection based on substantive knowledge in order to guide the analysis, such as the considerations on the incremental variable made earlier.

Fig. 4 presents the results of the influence analysis applied to the incremental-variable representation. From these diagnostic plots it is obvious that we recover three influential subjects: #4, #5, and #66. While Kenward (1998) did not consider #66 to be influential, it appears to be somewhat distant from the bulk of the data. The main difference between the two types is that the first two were likely sick during year 1 (as is the belief of dairy scientists), while this is not necessarily so for #66. An additional feature is that in all cases both C_i(ψ) and h_max show the same influential animals. In addition, h_max suggests that the influence for #66 is different from that for the others. It could be conjectured that #66 pulls the coefficient ω in a different direction than the other two. The other values are all relatively small. This could indicate that for the remaining 104 subjects, RD is plausible, while a deviation in the direction of the incremental variable, with differing signs, appears to be necessary for the other three subjects.

At this point, a comparison between h_max for the direct-variable and incremental-variable analyses is useful. Since the contributions h_i sum to one, these two plots are directly comparable.
Fig. 5. Index plots of the three components of C_i(ψ) when the dropout model is parameterized as a function of Y_i1 and Y_i2 − Y_i1.
There is no pronounced influence indication in the direct-variable case, so perhaps only random noise is seen there. A more formal way to distinguish between signal and noise in such plots is the subject of ongoing research.

6.2. Decomposition of influence

In Fig. 5, we have decomposed (5.8) into its three components: the variance V_i of the dropout probability, the incremental variable Y_i2 − Y_i1 (which is replaced by its predicted value for a dropout), and the hat-matrix diagonal. In agreement with the preceding discussion, the influence of #4 and #5 clearly stems from an unusually large increment, despite the influence being downplayed by V_i, since Y_41 and Y_51 are comparatively small and dropout increases with the milk yield in the first year. We noted already that cows #4 and #5 have relatively small dropout probabilities. In contrast, the dropout probability of #66 is large within the observed range [0.13, 0.37], since not only its increment but also its size is large. For all three cows, the increment is large, and hence changing its "parameter" ω_i can have a large impact on the other dropout parameters ψ_0 and ψ_1. But the difference in size explains the sign difference of h_max,4 and h_max,5 versus h_max,66.

Kenward (1998) considers extra analyses with #4 and #5 removed. The resulting likelihood ratio statistic reduces to G² = 0.08. When only #66 is removed, the likelihood ratio for non-random dropout is G² = 3.57, very similar to the one obtained when #53, #54, #66 and #69 were removed. Removing all three (#4, #5 and #66) results in G² = 0.005. Details are given in Table 1.
6.3. Rationale for incremental analysis

We now provide insight into why the transformation of direct outcomes to increments is useful. We noted already that the associated perturbation schemes (3.6) are different. An important device in this respect is the equality

    ψ_0 + ψ_1 Y_i1 + ψ_2 Y_i2 = ψ_0 + (ψ_1 + ψ_2) Y_i1 + ψ_2 (Y_i2 − Y_i1).        (6.1)
Eq. (6.1) shows that the direct-variable model checks the influence on the random dropout parameter ψ_1, whereas the random dropout parameter in the incremental model is ψ_1 + ψ_2. Not only is this a different parameter, it is also estimated with more precision. One often observes that ψ̂_1 and ψ̂_2 exhibit similar variances together with a negative correlation, in which case the linear combination with the smallest variance lies approximately in the direction of the sum ψ_1 + ψ_2. (When the correlation is positive, the difference direction ψ_1 − ψ_2 is obtained instead.)

Let us assess this in case all 107 observations are included. The estimated covariance matrix of (ψ̂_1, ψ̂_2) is

    (  0.59  −0.54 )
    ( −0.54   0.70 ),

with correlation −0.84. The variance of ψ̂_1 + ψ̂_2, on the other hand, is estimated to be 0.21. In this case, the direction of minimal variance is along (0.74, 0.67), which is indeed close to the sum direction. When all three influential subjects are removed, the estimated covariance matrix becomes

    (  3.31  −3.77 )
    ( −3.77   4.37 ),

with correlation −0.9897.
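These numbers can be reproduced mechanically from the quoted covariance matrix; for the all-cows case, for instance (eigenvectors from NumPy are defined up to sign):

```python
import numpy as np

S = np.array([[0.59, -0.54],
              [-0.54, 0.70]])                  # cov of (psi1-hat, psi2-hat)
print(S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]))    # correlation: about -0.84
print(np.ones(2) @ S @ np.ones(2))             # var(psi1 + psi2): 0.21
eigval, eigvec = np.linalg.eigh(S)
print(eigvec[:, 0])                            # minimal-variance direction,
                                               # roughly (0.74, 0.67)
```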
Removing only #4 and #5 yields an intermediate situation, the results of which are not shown. The variance of the sum is now 0.15, a further reduction, and still close to the direction of minimal variance. These considerations reinforce the claim that an incremental analysis is highly recommended. It might therefore be interesting to routinely construct a plot such as in Fig. 1, even with longer measurement sequences. On the other hand, transforming the dropout model to a size variable (Y_i1 + Y_i2)/2 will worsen the problem, since an insensitive parameter for Y_i1 will result. Finally, observe that transforming the dropout model to size and incremental variables at the same time, for the model with all three influential subjects removed, gives variances for the size and increment coefficients of 0.15 and 15.22, respectively. In other words, there is no evidence for an incremental effect, confirming that random dropout is plausible.

Further insight into why the incremental analysis is useful can be obtained from a representation of the profile likelihood function in ψ_2 (Fig. 6). The (non-convex) profile likelihood function is supplemented with a representation of the time effect β_d and of ψ_1, as functions of ψ_2.
Fig. 6. Representation of the profile likelihood function in ψ_2 over a range which includes the RD and NRD models. In addition, the evolution of β_d and ψ_1 is given.
In agreement with the considerations above, we observe once again that ψ_1 is almost exactly linear in ψ_2. More precisely, the size direction is constant, implying that its magnitude is nearly invariant to the missing data assumptions.

Although local and global influence are, strictly speaking, not equivalent, it is insightful to see how the global influence on θ can be linked to the behavior of C_i(ψ). We observed earlier that all locally influential subjects are completers and hence C_i(θ) ≡ 0. Yet, removing #4, #5 and #66 shows some effect on the discrepancy between the random dropout (RD) and non-random dropout (NRD) estimates of the time effect β_d. In particular, the RD and NRD estimates with all three subjects removed are virtually identical (0.60 and 0.63). Since these subjects are influential in C_i(ψ), the model could be improved by including incremental terms for these three subjects. Such a model would still imply random dropout. Of course, while this thought experiment is useful to increase insight, a modeler would not want to build such a contrived model. In contrast, allowing a dependence on the increment in all subjects will influence E(Y_i2 | y_i1, dropout) for all incomplete observations, and hence the measurement model parameters under NRD will change. In conclusion, this provides a way to assess the indirect influence of the dropout mechanism on the measurement model parameters through local influence methods. In the milk data set, this influence is likely due to the fact that an exceptional increment which is caused by a
different mechanism, perhaps a diseased animal during the first year, is nevertheless treated on an equal footing with the other observations within the dropout model. Such an analysis is not possible with the case-deletion method, because it cannot disentangle the various sources of influence.

In summary, our analysis agrees with and extends the sensitivity analysis of Kenward (1998). First, not only cows #4 and #5, but also #66, are found to drive the conclusion of Diggle and Kenward (1994) that dropout (i.e., occurrence of mastitis) is non-random. In other words, one cannot simply conclude that the occurrence of mastitis depends on milk yield. In fact, a model with all three cows removed points to completely random dropout. Further, by decomposing the local influence expressions, the mechanism by which these cows are influential is revealed. For #4 and #5, the unusually large increment, combined with a low value at year 1, is responsible. For cow #66, a large increment is also present, but the measurements themselves are normal to high.

7. Concluding remarks

In this paper, we have applied local influence tools (Cook, 1986; Lesaffre and Verbeke, 1998) to the selection model for continuous data subject to non-random dropout, as presented in DK. In particular, it is shown how the impact of small perturbations around the null model of missing at random will affect the measurement model and dropout model parameters. In order to calculate the influence diagnostics, it is not necessary to fit a non-random dropout model. All calculations have been carried out in GAUSS and the programs are available from the authors upon request.

The method was applied to the mastitis data set, studied in DK and Kenward (1998). We have illustrated that an informal sensitivity analysis based on substantive considerations and a formal approach such as a local influence analysis can usefully supplement each other. Kenward (1998) found that both removing subjects #4 and #5 and replacing the Gaussian conditional distribution of the second milk yield given the first one with a heavy-tailed t distribution indicated a strong sensitivity of the conclusions about the incremental effect from year 1 to year 2 and about the nature of the dropout process. For example, both of these actions removed the evidence for NRD. For the local influence analysis, it was deduced that an incremental-variable representation of the dropout mechanism is beneficial over a direct-variable representation. Contrasting our local influence approach with a case-deletion scheme as applied in Kenward (1998), we find the same two subjects, with in addition cow #66 being influential. One advantage of local influence is its ability to focus on the direct and indirect influence on the dropout and measurement model parameters that stems from perturbing the random dropout model in the direction of non-random dropout. In contrast, a case-deletion scheme combines all sources of influence, whether stemming from the dropout mechanism or not.

In this paper, we have studied one single but interesting case, with its specific aspects but also with its limitations. In different situations, local influence may be able to reveal different aspects of influence. For example, rather than detecting one
or a few cases, a cluster of influential subjects could be revealed. This is likely to happen in situations with more structure, such as hierarchical studies (multi-center trials, stratified samples, etc.). Limited simulations set up to explore such capabilities have been done and are reported in Verbeke et al. (2000). The results are encouraging in the sense that, when sufficiently different in key aspects, an entire group of subjects can be revealed with the proposed method.

Of course, the local influence parameterizations presented here are not the only ones possible or even desirable. It is clear that other schemes are worth considering as well. The advantage of the scheme considered here is that it does not imply excessive computational requirements and yields nicely interpretable components. An important issue is the inclusion of covariates into the dropout mechanism. This has not been possible here, given the relatively simple data structure; however, GAUSS code has been developed which allows for this possibility. Further research in this area is ongoing. Finally, the ideas outlined in this paper are not confined to the selection model of DK. Currently, work is being done to explore this route for categorical responses and for missingness models of the pattern-mixture type (Little, 1994).

In order to have more formal rules to decide whether a subject is clearly influential, clearly not influential, or borderline, additional research is required. Presently, rules of thumb, such as exploring the, say, 5% most influential subjects in more detail, can be used.

Finally, throughout this paper we have tried to convey our conviction that sensitivity assessment for NRD models is imperative. Both the formal and informal sensitivity analyses have illustrated that mechanical discrimination between RD and NRD, based on hypothesis testing, is a dead-end street, since NRD models are very sensitive to such aspects as modeling assumptions and outlying and otherwise influential observations. Insight into whether RD or rather NRD is to be preferred should therefore be the subject of an integrated sensitivity analysis.

Acknowledgements

We thank Rod Little and Don Rubin for helpful comments. We gratefully acknowledge support from the FWO-Vlaanderen Research Project "Sensitivity Analysis for Incomplete and Coarse Data" and from the Vlaams Instituut voor de Bevordering van het Wetenschappelijk-Technologisch Onderzoek in de Industrie.

References

Cook, R.D., 1986. Assessment of local influence. J. Roy. Statist. Soc. Ser. B 48, 133–169.
Cook, R.D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman & Hall, London.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1–38.
Diggle, P.J., Kenward, M.G., 1994. Informative dropout in longitudinal data analysis (with discussion). Appl. Statist. 43, 49–93.
Glynn, R.J., Laird, N.M., Rubin, D.B., 1986. Selection modeling versus mixture modeling with nonignorable nonresponse. In: Wainer, H. (Ed.), Drawing Inferences from Self-Selected Samples. Springer, New York, pp. 115–142.
Kenward, M.G., 1998. The effect of untestable assumptions for data with nonrandom dropout. Statist. Med. 17, 2723–2732.
Laird, N.M., 1994. Discussion to Diggle, P.J. and Kenward, M.G.: informative dropout in longitudinal data analysis. Appl. Statist. 43, 84.
Laird, N.M., Lange, N., Stram, D., 1987. Maximum likelihood computations with repeated measures: application of the EM algorithm. J. Amer. Statist. Assoc. 82, 97–105.
Lesaffre, E., Verbeke, G., 1998. Local influence in linear mixed models. Biometrics 54, 570–582.
Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. Wiley, New York.
Little, R.J.A., 1994. A class of pattern-mixture models for normal incomplete data. Biometrika 81, 471–483.
Little, R.J.A., 1995. Modeling the drop-out mechanism in repeated-measures studies. J. Amer. Statist. Assoc. 90, 1112–1121.
Molenberghs, G., Goetghebeur, E., Lipsitz, S.R., 1999. Non-random missingness in categorical data: strengths and limitations. Amer. Statist. 53, 110–118.
Molenberghs, G., Kenward, M.G., Lesaffre, E., 1997. The analysis of longitudinal ordinal data with nonrandom dropout. Biometrika 84, 33–44.
Nordheim, E.V., 1984. Inference from nonrandomly missing categorical data: an example from a genetic study on Turner's syndrome. J. Amer. Statist. Assoc. 79, 772–780.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.
Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Rubin, D.B., 1994. Discussion to Diggle, P.J. and Kenward, M.G.: informative dropout in longitudinal data analysis. Appl. Statist. 43, 80–82.
Verbeke, G., Molenberghs, G., Thijs, H., Lesaffre, E., Kenward, M.G., 2000. Sensitivity analysis for non-random dropout: a local influence approach. Biometrics 57, 43–50.
Wu, M.C., Bailey, K.R., 1988. Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statist. Med. 7, 337–346.
Wu, M.C., Bailey, K.R., 1989. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics 45, 939–955.
Wu, M.C., Carroll, R.J., 1988. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175–188.