Sensitivity analysis of longitudinal data with intermittent missing values


Statistical Methodology 4 (2007) 217–226

Ahmed M. Gad a, Abeer S. Ahmed b

a Department of Statistics, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt
b The National Centre for Social and Criminological Research, Cairo, Egypt

Received 19 May 2005; received in revised form 17 May 2006; accepted 20 August 2006

Abstract

Several models for longitudinal data with nonrandom missingness are available. The selection model of Diggle and Kenward is one of these models. Many authors have noted that this model rests on modelling assumptions, such as the response distribution, that cannot be tested from the observed data. A sensitivity analysis of the study's conclusions with respect to such assumptions is therefore needed. The stochastic EM algorithm is proposed and developed to handle continuous longitudinal data with nonrandom intermittent missing values when the responses have a non-normal distribution. This is a step towards investigating the sensitivity of the parameter estimates to a change of the response distribution. The proposed technique is applied to real data from the International Breast Cancer Study Group.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Longitudinal data; Intermittent missing; Stochastic EM; Non-normal distribution; t distribution; Breast cancer; Quality of life

1. Introduction

Longitudinal data studies, where each subject is measured repeatedly, are very common in many disciplines. A common phenomenon in such studies is incompleteness. The incompleteness may be due to early withdrawal of some subjects from the study (the dropout setting), or some values for some subjects may be unavailable (the intermittent setting). Following Little and Rubin's taxonomy [18], the missingness process can be classified into three categories.

doi:10.1016/j.stamet.2006.08.001


The process is said to be missing completely at random (MCAR) if the missingness is independent of both observed and unobserved data, and missing at random (MAR) if the missingness is independent of the unobserved data conditional on the observed data. The missingness process is termed missing not at random, or nonrandom (MNAR), if it is neither MCAR nor MAR.

Assume that there are m subjects participating in the study and that the intended number of measurements for the ith subject is n_i. Let Y_ij represent the jth measurement, j = 1, ..., n_i, on the ith subject, i = 1, ..., m. Let Y_i be an n_i × 1 vector containing the responses that would be obtained, for the ith subject, if there were no missing values, and denote the observed and missing components of Y_i by Y_{i,obs} and Y_{i,mis} respectively. Let R_i be a vector of missingness indicators: for a particular realization of (Y_i, R_i), each element of R_i takes the value one if the corresponding element of Y_i is observed and zero if it is missing. The response vector Y_i is assumed to follow the general linear model with correlated errors

$$Y_i = X_i \beta + \epsilon_i, \qquad (1)$$

where X_i is a known n_i × p matrix, β is a p × 1 vector of unknown parameters, and the ε_i are independently distributed random vectors with zero mean and covariance matrix V_i. The matrix V_i may have a parametric structure, i.e. its elements are functions of a smaller vector of parameters ω, in which case it can be written as V_i(ω). Alternatively, this matrix may be unstructured, in which case it contains n_i(n_i + 1)/2 parameters.

The selection model [8] is based on factorizing the complete data density function as f(Y_i, R_i) = f(Y_i | θ) f(R_i | Y_i, Ψ), where the first factor is the marginal density of the response and the second factor is the distribution of the missingness process conditional on the response. The pattern-mixture model [16] is based on factorizing the complete data density function as f(Y_i, R_i) = f(Y_i | R_i, γ) f(R_i | π), where γ and π are suitable parameters.

Celeux and Diebolt [1] introduced the stochastic EM algorithm as an alternative to the EM algorithm; it can be used when the E-step of the EM algorithm is intractable. The stochastic EM algorithm iterates two steps. In the S-step, the missing values are filled in with a single draw from the conditional distribution of the missing data given the observed values. In the M-step, the likelihood function of the pseudo-complete data is maximized using any conventional procedure. For more details on the stochastic EM algorithm see, for example, Ip [11] and Diebolt and Ip [4].

Diggle and Kenward [5] propose a general class of selection models for analyzing longitudinal data with dropouts. The probability of dropout at time d_i is modeled as a function of the measurement at time d_i and the observed measurements (the history); that is, P(D_i = d_i | history) = P_{d_i}(H_{d_i}, y_{d_i}; Ψ). They suggest the logistic model for the dropout process

$$\operatorname{logit} P_{d_i}(H_{d_i}, y_{d_i}; \Psi) = \Psi_0 + \sum_{j=1}^{d_i} \Psi_j y_j. \qquad (2)$$
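To make the dropout model (2) concrete, here is a minimal Python sketch of the implied dropout probability; the function name and the numerical values of Ψ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dropout_prob(history, psi):
    """P(dropout at time d | history) under the logistic model (2).

    history : y_1, ..., y_d, with y_d the (possibly unobserved) current value
    psi     : (psi_0, psi_1, ..., psi_d)
    """
    eta = psi[0] + np.dot(psi[1:len(history) + 1], history)  # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))                        # inverse logit

# Illustrative values: dropout becomes more likely as the responses grow.
print(dropout_prob(history=[5.8, 6.4], psi=[-3.0, 0.2, 0.3]))
```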


The Diggle–Kenward model [5] combines the multivariate normal model for the response with the logistic model (2) for the dropout process. They maximize the log-likelihood function using the Nelder–Mead simplex algorithm [22]. A detailed description of the model can be found in Diggle and Kenward [5]. In the selection model, the observed data density f(Y_obs, R) can be viewed as

$$f(Y_{obs}, R \mid \theta, \Psi) = \int f(Y_{obs}, Y_{mis} \mid \theta)\, P(R \mid Y_{obs}, Y_{mis}, \Psi)\, \mathrm{d}Y_{mis} = f(Y_{obs} \mid \theta)\, E_{f(Y_{mis} \mid Y_{obs})}\big[P(R \mid Y_{obs}, Y_{mis}, \Psi)\big].$$

It is clear that the key assumptions underlying the selection model are (i) the correct specification of the response distribution, and (ii) the model for the missing data mechanism. The response distribution cannot be fully observed because of the missing values. Many discussants of Diggle and Kenward [5] have mentioned the impact of this distribution on the study conclusions; see, for example, Little [17] and Rubin [23]. So the influence of this distribution on the study's conclusions should be part of a sensitivity analysis [14]. Fitting different distributions to the response enables us to investigate the properties of the proposed method when the response distribution is misspecified. Kenward [14] fits the normal model and the t model in the dropout setting. Other approaches to performing sensitivity analysis are available in the literature, such as fitting several models for the missing data mechanism; see, for example, Ibrahim et al. [10] and Minini and Chavance [19]. Fitting the pattern-mixture model in addition to the selection model can also serve as a sensitivity analysis tool; see Kenward and Molenberghs [15], Daniels and Hogan [3], Thijs et al. [25] and Molenberghs et al. [21]. The local influence approach, which investigates the effect of small perturbations of the missingness model on the study's conclusions, is another tool; see Molenberghs et al. [20], Steen et al. [24], Verbeke et al. [27] and Jansen et al. [12].

Most of these sensitivity analysis approaches have been applied in the dropout setting; less attention has been paid to sensitivity analysis in the intermittent setting. The aim of this paper is twofold. The first aim is to explore the effect of distributional assumptions on the parameter estimates of longitudinal data models in the presence of nonrandom intermittent missing values. We therefore adopt the change of distributional assumption as a sensitivity tool, and hence it is important to be able to fit, in a convenient way, distributions other than the normal. The second aim is to develop the stochastic EM algorithm to obtain parameter estimates for non-normal models with nonrandom intermittent missing values, the t distribution in particular.

In Section 2 the stochastic EM algorithm is proposed and developed for non-normal models, the t distribution in particular, in the presence of nonrandom intermittent missing data. In Section 3 the proposed approach is applied to a data set from the International Breast Cancer Study Group. In Section 4 a simulation study is performed to evaluate the properties of the proposed method.

2. Inference for non-normal models

2.1. Introduction

Gad and Ahmed [6] have proposed and developed the stochastic EM algorithm to obtain parameter estimates of longitudinal normal data models in the presence of intermittent missing data. In this section we develop and apply the stochastic EM algorithm to non-normal longitudinal data models.


The concentration is mainly on missing values where the pattern is intermittent and the mechanism is nonrandom, with random or completely random missing data included as special cases. This enables us to assess the impact of the distributional assumptions on the underlying parameters.

A family of non-normal distributions can be expressed as a mixture of normals,

$$f(Y_i) = \int f(Y_i \mid Q_i)\, h(Q_i)\, \mathrm{d}Q_i,$$

where Y_i | Q_i is normally distributed with mean μ_i and covariance matrix V_i/Q_i, and the Q_i are unobserved iid positive random variables with known density h(Q_i). It is easy to prove that Y_i, marginally, has a multivariate t distribution with mean μ_i, scale matrix V_i and ν degrees of freedom when Q_i is gamma distributed with parameters (ν/2, ν/2). Then the marginal density of Y_i is

$$f(Y_i) = \frac{\Gamma\big((\nu + n_i)/2\big)\, |V_i|^{-1/2}}{(\pi\nu)^{n_i/2}\, \Gamma(\nu/2)\, \big[1 + d_i^2/\nu\big]^{(\nu + n_i)/2}},$$

where

$$d_i^2 = (Y_i - \mu_i)' V_i^{-1} (Y_i - \mu_i). \qquad (3)$$
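As a numerical sanity check of this density, the following sketch evaluates the formula directly and compares it against scipy.stats.multivariate_t (available in SciPy 1.6 and later); all numerical values are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

def mvt_pdf(y, mu, V, nu):
    """Multivariate t density with mean mu, scale V, df nu (the formula above)."""
    n = len(mu)
    d2 = (y - mu) @ np.linalg.solve(V, y - mu)          # d_i^2 of Eq. (3)
    log_f = (gammaln((nu + n) / 2.0) - gammaln(nu / 2.0)
             - 0.5 * np.linalg.slogdet(V)[1]
             - 0.5 * n * np.log(np.pi * nu)
             - 0.5 * (nu + n) * np.log1p(d2 / nu))
    return np.exp(log_f)

idx = np.arange(3)
V = 0.5 ** np.abs(np.subtract.outer(idx, idx))          # AR(1)-type scale matrix
mu, y = np.zeros(3), np.array([0.3, -1.0, 0.7])
print(mvt_pdf(y, mu, V, nu=4))                          # direct formula
print(multivariate_t(loc=mu, shape=V, df=4).pdf(y))     # SciPy reference: agrees
```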

It can be shown that the conditional distribution of Q_i given Y_i is a gamma distribution with parameters (ν + n_i)/2 and (ν + d_i²)/2, where d_i is as in Eq. (3), i.e.

$$Q_i \mid Y_i \sim \operatorname{Gamma}\!\left(\frac{\nu + n_i}{2},\; \frac{\nu + d_i^2}{2}\right). \qquad (4)$$
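These two facts translate directly into simulation code. A minimal sketch, with illustrative values of μ_i, V_i and ν:

```python
import numpy as np

rng = np.random.default_rng(0)
nu, n = 4, 3
mu = np.full(n, 5.0)
idx = np.arange(n)
V = 0.5 ** np.abs(np.subtract.outer(idx, idx))      # illustrative scale matrix

# Marginal draw of Y_i via the mixture: Q_i ~ Gamma(nu/2, nu/2), Y_i|Q_i ~ N(mu, V/Q_i).
# numpy's gamma uses (shape, scale), so scale = 1/rate = 2/nu.
q = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)
y = rng.multivariate_normal(mu, V / q)

# Conditional draw of Q_i given Y_i, Eq. (4): Gamma((nu + n)/2, (nu + d_i^2)/2).
d2 = (y - mu) @ np.linalg.solve(V, y - mu)
q_given_y = rng.gamma(shape=(nu + n) / 2.0, scale=2.0 / (nu + d2))
```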

The objective is to find the stochastic EM estimates of μ_i and V_i for non-normal models. Assume that the missingness process is modeled by a logistic model; hence the parameters Ψ also need to be estimated. Let the parameters μ_i, V_i and Ψ be stacked in a vector θ. By introducing the random variables Q_i, the complete data of the ith subject are (Y_{i,obs}, Y_{i,mis}, Q_i, R_i). The joint density function of the complete data can be written as f(Y_{i,obs}, Y_{i,mis}, Q_i, R_i) = f(Y_{i,obs}, Y_{i,mis}, R_i | Q_i, θ) h(Q_i | ν), where θ is the vector of model parameters. Then the complete data log-likelihood function is

$$L(\theta, \nu \mid Y_{obs}, Y_{mis}, Q, R) = L_1(\theta \mid Y, R) + L_2(\nu \mid Q). \qquad (5)$$

Maximizing the log-likelihood function in Eq. (5) with respect to θ is equivalent to maximizing L_1(θ | Y, R) with respect to θ if ν is fixed. If ν is unknown and needs to be estimated, the MLEs of θ and ν can be obtained by separately maximizing L_1(θ | Y, R) and L_2(ν | Q).

In the S-step of the stochastic EM algorithm, we need to simulate the missing data from the conditional distribution of the missing data given the observed data for each subject. The missing data in this case are Y_{i,mis} and Q_i, so we need to simulate from the joint distribution of Y_{i,mis} and Q_i given the observed data, f(Y_{i,mis}, Q_i | Y_{i,obs}, R_i). This distribution can be decomposed as

$$f(Y_{i,mis}, Q_i \mid Y_{i,obs}, R_i) = f(Y_{i,mis} \mid Y_{i,obs}, Q_i, R_i)\, g(Q_i \mid Y_{i,obs}, R_i) = f(Y_{i,mis} \mid Y_{i,obs}, Q_i, R_i)\, h(Q_i \mid Y_{i,obs}),$$

because the Q_i do not depend on the R_i. Using this idea, the S-step can be performed in two sub-steps: the first simulates Q_i from the conditional distribution h(Q_i | Y_{i,obs}); the second simulates Y_{i,mis} from the conditional distribution f(Y_{i,mis} | Y_{i,obs}, Q_i, R_i). Once we have the pseudo-complete data, the log-likelihood function can be maximized using any procedure to obtain parameter estimates.

2.2. Inference for the t distribution model

In the intermittent pattern context, let the missing components of Y_i, Y_{i,mis}, be grouped in a 1 × r vector, Y_{i,mis} = (Y_{i,mis_1}, ..., Y_{i,mis_r}). At the (t + 1)th iteration, the steps of the stochastic EM algorithm can be developed as follows:

• S1-step: For each subject i, simulate Q_i^{(t+1)} from the conditional distribution of Q_i given the observed data Y_{i,obs} and the current parameter estimates θ^{(t)}. This conditional distribution is defined in Eq. (4).

• S2-step: A sample is drawn from the conditional distribution of Y_{i,mis} given the observed data. We adopt the Gibbs sampling technique for this simulation, so this step is executed in r sub-steps. First, Y_{i,mis_1}^{(t+1)} is simulated from f(Y_{i,mis_1} | Y_{i,mis_2}^{(t)}, ..., Y_{i,mis_r}^{(t)}, Y_{i,obs}, R_i, Q_i^{(t+1)}, θ^{(t)}). Then, in the second sub-step, Y_{i,mis_2}^{(t+1)} is simulated from f(Y_{i,mis_2} | Y_{i,mis_1}^{(t+1)}, Y_{i,mis_3}^{(t)}, ..., Y_{i,mis_r}^{(t)}, Y_{i,obs}, R_i, Q_i^{(t+1)}, θ^{(t)}), and so on. In the final sub-step, the last missing value Y_{i,mis_r}^{(t+1)} is simulated from f(Y_{i,mis_r} | Y_{i,mis_1}^{(t+1)}, ..., Y_{i,mis_{r-1}}^{(t+1)}, Y_{i,obs}, R_i, Q_i^{(t+1)}, θ^{(t)}).

The full conditional distribution does not have a standard form, hence direct simulation is not possible. To overcome this problem, an accept–reject procedure is developed to generate the missing values. The whole procedure is as follows:
(1) Generate a candidate value y* from the conditional distribution f(Y_{i,mis_j} | Y_{i,obs}, Y_{i,mis_1}^{(t+1)}, ..., Y_{i,mis_{j-1}}^{(t+1)}, Y_{i,mis_{j+1}}^{(t)}, ..., Y_{i,mis_r}^{(t)}, Q_i^{(t+1)}, θ^{(t)}). This distribution is a normal distribution, so direct simulation is possible; an advantage here is that the updated values Q_i^{(t+1)} are used.
(2) Calculate the probability of missingness for the candidate value y*, according to the assumed missingness model in Eq. (2), with the parameters Ψ fixed at their current values Ψ^{(t)}. Denote this probability by P_i.
(3) Simulate a random variate U from the uniform distribution on [0, 1]; take y_{i,mis_j} = y* if U ≤ P_i, otherwise return to step (1).

• M-step: Given Q_i^{(t+1)}, the complete data for subject i, Y_i, have a normal distribution with mean μ_i and covariance matrix V_i/Q_i. If the response Y_i is modeled as in Eq. (1), then the errors ε_i ~ MVN(0, V_i/Q_i). Using the transformation Y_i* = √Q_i Y_i, X_i* = √Q_i X_i, ε_i* = √Q_i ε_i, the transformed responses Y_i* follow the general linear model with ε_i* ~ MVN(0, V_i), and the Y_i* values are used in the maximization step. The M-step has two sub-steps: the normal step and the logistic step. In the normal step, the model parameter estimates θ are obtained using the transformed response values and any appropriate optimization approach, for example the Jennrich–Schluchter algorithm [13]. In the logistic step, the maximum likelihood estimates of the missingness parameters in the logistic model are obtained.
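As an illustration of the S-step, the sketch below performs one accept–reject update for a single missing entry of one subject. The conditional mean and variance are the standard multivariate normal results; the AR(1) covariance, the values of ψ and all function names are illustrative assumptions rather than the paper's code, and the missingness model is taken in the two-term form used later in Eq. (6).

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_normal(mu, Sigma, j, y):
    """Mean and variance of coordinate j of N(mu, Sigma) given the rest of y."""
    idx = np.delete(np.arange(len(mu)), j)
    A = np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, j])
    mean = mu[j] + A @ (y[idx] - mu[idx])
    var = Sigma[j, j] - Sigma[j, idx] @ A
    return mean, var

def s2_update(y, j, mu, V, q, psi):
    """One accept-reject draw for missing entry j, given Q_i = q and theta."""
    mean, var = cond_normal(mu, V / q, j, y)          # step (1): normal full conditional
    while True:
        y_star = rng.normal(mean, np.sqrt(var))       # candidate value
        prev = y[j - 1] if j > 0 else 0.0
        eta = psi[0] + psi[1] * prev + psi[2] * y_star
        p_mis = 1.0 / (1.0 + np.exp(-eta))            # step (2): missingness probability
        if rng.uniform() <= p_mis:                    # step (3): accept, else retry
            y[j] = y_star
            return y

# Illustrative use: a subject with 4 visits, third value missing.
n = 4
mu = np.full(n, 5.0)
idx = np.arange(n)
V = 4.0 * 0.5 ** np.abs(np.subtract.outer(idx, idx))  # AR(1) covariance
y = np.array([5.2, 6.1, 5.0, 4.8])                    # missing entry initialized at its mean
y = s2_update(y, j=2, mu=mu, V=V, q=0.9, psi=(-1.0, 0.4, 0.4))
```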


The iterative reweighted least squares method is used to find these estimates; see, for example, Collett [2].

When implementing the stochastic EM algorithm, we need to check the convergence of the resulting chain. Several convergence diagnostics have been proposed in the literature; of these, the Gelman–Rubin method [7] will be used. This method is based on generating multiple chains, k ≥ 2, in parallel for N = 2p iterations, starting each chain from a different point, where the starting distribution is over-dispersed relative to the target distribution. The method monitors the convergence of each scalar parameter of interest separately by evaluating the potential scale reduction factor (PSRF),

$$\hat{R} = \sqrt{\frac{n-1}{n} + \frac{1}{n}\,\frac{B}{W}},$$

where B/n is the between-sequence variance and W is the average of the within-sequence variances. The calculation depends on the last p iterations of each sequence. Convergence is achieved when the PSRF is close to one, which means that the parallel Markov chains essentially overlap; if the PSRF is large, further simulation may be needed.

3. Application: Breast cancer data

The proposed approach is applied to the breast cancer data. These data concern quality of life among breast cancer patients in a clinical trial conducted by the International Breast Cancer Study Group (IBCSG). In IBCSG Trial VI [9], premenopausal women with breast cancer were followed for traditional outcomes such as relapse and death, with an additional focus on quality of life. The study's objective was to compare quality of life among patients under four different chemotherapy treatments: A, B, C and D. Each patient was allocated randomly to one of the four treatments. Six measurements, one every three months, were planned for each patient during the chemotherapy administration. The patients were asked to complete a quality of life questionnaire. The Perceived Adjustment to Chronic Illness Scale (PACIS) was an intended response; this is a one-item scale comprising a global patient rating of the amount of effort it costs to cope with her illness. Some patients refused to complete the questionnaire at some visits, resulting in intermittent missing values. A patient may fail to appear to fill in the questionnaire because her mood is poor, and therefore the missing data mechanism is nonrandom.

The total number of patients surviving the study period is 446; 10 patients died during the study and are excluded from the analysis, so the missing values are not due to death. There are 64 patients with a missing response at the first assessment, and these are also excluded. So the number of subjects included in the analysis is 382. The PACIS is measured on a continuous scale from 0 to 100, where a larger score indicates that a greater amount of effort is required for the patient to cope with her illness. Following [9], we use a square-root transformation to normalize the data. The number of patients with missing responses on at least one occasion is 293 (77%); hence there are 89 (23%) study completers. The percentages of missing values are 29%, 36%, 47%, 54% and 62% for the second to the sixth visit, respectively. The percentages of patients with 0, 1, 2, 3, 4 and 5 missing responses are 23%, 18%, 13%, 13%, 14% and 19%, respectively.

Assuming that the responses are normally distributed, many authors have analyzed these data. Hürny et al. [9] use the complete case analysis for responses of the first four measurements.


Troxel et al. [26] have analyzed the responses for the first six months of the study, including the missing values. Ibrahim et al. [10] have analyzed the PACIS variable for the patients remaining in the study long enough to have all assessments. They use a random effects model with the AR(1) model for the covariance structure; the missing data mechanism is modeled using the logistic model in Eq. (2), where only the previous and the current responses are included. They conclude that the third treatment (treatment C) effect and the coefficient of the response at the previous time point, in the missingness model, are significant. Gad and Ahmed [6] have reanalyzed these data using the same logistic model and an arbitrary covariance structure, using the stochastic EM algorithm to obtain parameter estimates. They conclude that the covariate treatment C and the parameters associated with the current and previous responses, in the missingness model, are significant.

We adopt a mean model that allows each treatment to have its own effect. That is,

$$\mu_j = \mu_{0j} + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 \quad \text{for } j = 1, \ldots, 6,$$

where μ_{0j} is a constant shift at each assessment time and

$$(x_1, x_2, x_3) = \begin{cases} (1, 0, 0) & \text{for treatment A} \\ (0, 1, 0) & \text{for treatment B} \\ (0, 0, 1) & \text{for treatment C} \\ (0, 0, 0) & \text{for treatment D.} \end{cases}$$

The first-order autoregressive AR(1) model is adopted for the covariance structure; in this model, the (i, j)th element of the covariance matrix, σ_{ij}, equals σ²ρ^{|i−j|} for i, j = 1, ..., 6. An advantage of the proposed approach is that different covariance structures can be used; we have tried several, but only the results for the AR(1) structure are presented. For the missing data mechanism, we use the logistic regression model as in Eq. (2), including only the previous and the current responses to keep the model simple. That is,

$$\operatorname{logit}\, P(r_{ij} = 1 \mid \Psi) = \Psi_0 + \Psi_1 Y_{i,j-1} + \Psi_2 Y_{ij}, \qquad (6)$$

for i = 1, ..., 382 and j = 1, 2, ..., 6. The stochastic EM algorithm developed in Section 2 is applied to obtain parameter estimates of this model, assuming that ν is fixed. The degrees of freedom ν are fixed at 4, 10, 15, 20, 25 and 50. The number of iterations is set at 3000, with a burn-in period of 1000 iterations. The stochastic EM estimates of the mean, variance and missingness parameters are displayed in Table 1; the parameter estimates of the normal model [6] are presented in the last column of Table 1. The Gelman–Rubin method has been applied as a convergence check: the shrink factor has been calculated for each parameter, and most of the scale reduction values are close to 1, so we can conclude that the stochastic EM algorithm has converged.

The standard errors of the estimates are calculated, with ν fixed at 4, using a bootstrap method. One hundred samples were re-sampled from the observed data set, the stochastic EM algorithm was used to obtain the parameter estimates for each sample, and the standard deviations of these 100 estimates are taken as the standard errors. The results are also displayed in Table 1.

The results show that as the value of ν decreases, the estimate of Ψ2 decreases. Also, the estimates move towards those from the normal-based model as the value of ν increases. The mean and covariance parameters are only slightly affected by moving from the normal model to the t model. For each value of ν, the minus twice log-likelihood value at the stochastic EM estimates is calculated.
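A sketch of the bootstrap standard-error computation just described, resampling subjects with replacement and refitting; fit_sem is a hypothetical stand-in for a routine that runs the stochastic EM fit on a data set.

```python
import numpy as np

def bootstrap_se(data, fit_sem, n_boot=100, seed=0):
    """Bootstrap standard errors: resample subjects, refit, take SDs.

    data    : list of per-subject records (responses + missingness indicators)
    fit_sem : callable returning the parameter vector for a data set
              (hypothetical stand-in for the stochastic EM fit)
    """
    rng = np.random.default_rng(seed)
    m = len(data)
    estimates = []
    for _ in range(n_boot):
        sample = [data[i] for i in rng.integers(0, m, size=m)]  # resample subjects
        estimates.append(fit_sem(sample))
    return np.std(estimates, axis=0, ddof=1)  # SD over replicates = bootstrap SE
```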


Table 1
The SEM estimates (SE) for PACIS assuming the responses have the t distribution and the AR(1) covariance structure. Standard errors (in parentheses) are reported for ν = 4 and the normal model only.

Par.       t, ν=4         t, ν=10   t, ν=15   t, ν=20   t, ν=25   t, ν=50   Normal model
µ01        6.17 (0.18)    6.18      6.18      6.18      6.18      6.18      6.27 (0.15)
µ02        5.97 (0.21)    5.99      5.99      6.01      5.99      5.99      5.38 (0.15)
µ03        5.87 (0.18)    5.87      5.88      5.87      5.87      5.87      5.77 (0.15)
µ04        5.32 (0.18)    5.36      5.35      5.36      5.35      5.35      6.35 (0.15)
µ05        4.99 (0.16)    5.01      5.01      5.01      5.01      5.00      5.43 (0.15)
µ06        5.32 (0.15)    5.30      5.33      5.32      5.33      5.32      5.56 (0.15)
α1         −0.12 (0.13)   −0.12     −0.12     −0.12     −0.12     −0.13     −0.20 (0.17)
α2         0.05 (0.28)    0.05      0.04      0.05      0.04      0.04      0.04 (0.17)
α3         −0.61 (0.13)   −0.62     −0.60     −0.60     −0.60     −0.62     −0.72 (0.18)
ρ          0.48 (0.15)    0.48      0.47      0.48      0.48      0.47      0.42 (0.02)
σ²         3.75 (0.30)    3.72      3.69      3.75      3.75      3.75      4.49 (0.12)
ψ0         1.34 (0.42)    1.38      1.34      1.31      1.31      1.22      1.22 (0.07)
ψ1         1.21 (0.08)    1.31      1.39      1.44      1.49      1.62      1.61 (0.06)
ψ2         0.16 (0.12)    0.73      0.81      0.84      0.88      0.95      1.06 (0.08)
−2 log L   5424           5486      5491      5542      5558      5638      5894

It appears that the minimum value is attained at ν = 4. Comparing the −2 log-likelihood for the normal model (−2ℓ = 5894) and the t model with ν = 4 (−2ℓ = 5424), the t model with ν = 4 fits the data better, in the sense of a higher likelihood value, although no meaning can be attached to the statistical significance of the difference in these likelihood values.

The missingness parameters ψ0, ψ1 and ψ2 are all significant (each Z-value is greater than 13) in the normal model. For the t model with ν = 4, the parameters ψ0 (Z-value = 3.19) and ψ1 (Z-value = 15.2) are significant, whereas the parameter ψ2 (Z-value = 1.33) is not significant at any reasonable significance level. This means that, under the t model, the missing data mechanism can be accepted to be missing at random (MAR). In conclusion, assuming that the missingness model is correct, the t model can be used rather than the normal model with a nonrandom missingness mechanism; in this case there is no need to include the nonrandom missingness term in the missingness model. So, using the t model with small degrees of freedom, we can assume the MAR mechanism rather than the MNAR mechanism. Hence, on changing the distributional assumption from the normal to the t distribution, the missing data mechanism moves from MNAR to MAR.

4. Simulation study

A simulation study is conducted to evaluate the proposed method. It is based on data sets with m subjects and n observations per subject. We adopt the simple model E(Y_{ij}) = μ_j, where i = 1, ..., m and j = 1, ..., n. A stationary first-order autoregressive, AR(1), process is used to generate the residual component of the repeated measurements, i.e. Var(ε_{ij}) = σ² and Cov(ε_{ij}, ε_{ik}) = σ²ρ^{|j−k|}. The missingness model is assumed to be as in Eq. (6). The data are simulated from the multivariate normal distribution with the AR(1) covariance structure and the missingness model (6), with m = 100 subjects and n = 5 time points. Under this setting, the parameter vector is θ = (μ1, μ2, μ3, μ4, μ5, σ², ρ, ψ0, ψ1, ψ2)′. The parameters are set to the values μ1 = 6, μ2 = 6, μ3 = 6, μ4 = 5, μ5 = 5, σ² = 4, ρ = 0.5, ψ0 = 0, ψ1 = −2, ψ2 = 2. For each simulation we use 5000 replications.
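A minimal sketch of generating one replicate under this design (the normal case) follows; as in the accept–reject step of Section 2, the fitted logistic probability is treated here as the probability of a missing value, an interpretive assumption. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 100, 5
mu = np.array([6.0, 6.0, 6.0, 5.0, 5.0])
sigma2, rho = 4.0, 0.5
psi = np.array([0.0, -2.0, 2.0])                      # (psi_0, psi_1, psi_2)

# AR(1) covariance: Cov(eps_j, eps_k) = sigma^2 * rho^{|j-k|}
idx = np.arange(n)
V = sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))
Y = rng.multivariate_normal(mu, V, size=m)            # complete responses

# Impose intermittent missingness via model (6) at visits 2..n
R = np.ones((m, n), dtype=bool)                       # True = observed
for j in range(1, n):
    eta = psi[0] + psi[1] * Y[:, j - 1] + psi[2] * Y[:, j]
    p_mis = 1.0 / (1.0 + np.exp(-eta))
    R[:, j] = rng.uniform(size=m) >= p_mis            # missing with probability p_mis
Y_obs = np.where(R, Y, np.nan)                        # observed data for one replicate
```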


Table 2
Simulation results: the SEM estimates and standard errors for the normal model and the t distribution model with ν = 4

Parameter   t model: Estimate (SE)   Normal model: Estimate (SE)
µ1          6.00 (0.10)              6.00 (0.15)
µ2          5.85 (0.23)              5.98 (0.17)
µ3          5.81 (0.16)              5.97 (0.16)
µ4          4.92 (0.18)              5.05 (0.18)
µ5          4.99 (0.16)              5.03 (0.15)
ρ           0.49 (0.12)              0.52 (0.08)
σ²          3.51 (0.33)              4.09 (0.14)
ψ0          0.22 (0.12)              0.25 (0.15)
ψ1          −0.95 (0.18)             −2.05 (0.66)
ψ2          0.86 (0.95)              2.06 (0.65)

The stochastic EM algorithm is used to find the parameter estimates for each simulated data set, assuming the normal distribution model and the t distribution model. The number of iterations is set at 3000, with a burn-in period of 1000 iterations. The average parameter estimates are displayed in Table 2; for the t distribution model, the stochastic EM estimates at ν = 4 are reported. A range of values of the degrees of freedom ν has been used, but those results are not reported because the conclusions are similar to those of Section 3. The standard errors of the estimates are also obtained and displayed in Table 2.

The parameter ψ2, assuming the normal model, is significant at any reasonable significance level, with a Z-value of 3.17; this is evidence of an MNAR process. For the t distribution model with ν = 4, this parameter (ψ2) is not significant (Z-value = 0.90); this is evidence of an MAR process. Hence, using the t distribution model, we can move from the MNAR process to the less restrictive MAR process. The simulation supports the conclusions reached in Section 3. Other covariance structures have been tried, and different values for the missingness model parameters have been used; the behaviour and direction of the results for these settings follow the same pattern as the results above, so they are not reported.

Acknowledgements

The authors are grateful to the Quality of Life Committee of the IBCSG for permission to use the quality of life data. We would like to thank the Editor and Associate Editor, and two anonymous referees, for their helpful comments on the manuscript.

References

[1] G. Celeux, J. Diebolt, The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Computational Statistics Quarterly 2 (1985) 73–82.
[2] D. Collett, Modelling Binary Data, Chapman and Hall, London, 1991.
[3] M.J. Daniels, J.W. Hogan, Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout, Biometrics 56 (2000) 1241–1248.
[4] J. Diebolt, E.H.S. Ip, Stochastic EM algorithm: Method and application, in: W.R. Gilks, S. Richardson, D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice, Chapman and Hall, London, 1996 (Chapter 15).
[5] P.J. Diggle, M.G. Kenward, Informative drop-out in longitudinal data analysis, Applied Statistics 43 (1994) 49–93.


[6] A.M. Gad, A.S. Ahmed, Analysis of longitudinal data with intermittent missing values using the stochastic EM algorithm, Computational Statistics & Data Analysis 50 (2006) 2702–2714.
[7] A. Gelman, D.B. Rubin, Inference from iterative simulation using multiple sequences (with discussion), Statistical Science 7 (1992) 457–511.
[8] J.J. Heckman, The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement 5 (1976) 475–492.
[9] C. Hürny, J. Bernhard, R.D. Gelber, A. Coates, M. Castiglione, M. Isley, D. Dreher, H. Peterson, A. Goldhirsch, H.J. Senn, Quality of life measures for patients receiving adjuvant therapy for breast cancer: An international trial, European Journal of Cancer 28 (1992) 118–124.
[10] J.G. Ibrahim, M.H. Chen, S.R. Lipsitz, Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable, Biometrika 88 (2001) 551–564.
[11] E.H.S. Ip, A stochastic EM estimator in the presence of missing data: Theory and applications, Technical Report, Division of Biostatistics, Stanford University, Stanford, California, 1994.
[12] I. Jansen, G. Molenberghs, M. Aerts, H. Thijs, K.V. Steen, A local influence approach applied to binary data from a psychiatric study, Biometrics 59 (2003) 409–418.
[13] R.I. Jennrich, M.D. Schluchter, Unbalanced repeated-measures models with structured covariance matrices, Biometrics 42 (1986) 805–820.
[14] M.G. Kenward, Selection models for repeated measurements with nonrandom dropout: An illustration of sensitivity, Statistics in Medicine 17 (1998) 2723–2732.
[15] M.G. Kenward, G. Molenberghs, Parametric models for incomplete continuous and categorical longitudinal data, Statistical Methods in Medical Research 8 (1999) 51–83.
[16] R.J.A. Little, Pattern-mixture models for multivariate incomplete data, Journal of the American Statistical Association 88 (1993) 125–134.
[17] R.J.A. Little, Discussion of Diggle and Kenward: Informative drop-out in longitudinal data analysis, Applied Statistics 43 (1994) 78–95.
[18] R.J.A. Little, D.B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, 1987.
[19] P. Minini, M. Chavance, Sensitivity analysis of longitudinal data with drop-outs, Statistics in Medicine 23 (2004) 1039–1054.
[20] G. Molenberghs, G. Verbeke, H. Thijs, E. Lesaffre, M.G. Kenward, Mastitis in dairy cattle: A local influence approach to assess sensitivity of the dropout process, Computational Statistics & Data Analysis 37 (2001) 93–113.
[21] G. Molenberghs, H. Thijs, M.G. Kenward, G. Verbeke, Sensitivity analysis for continuous incomplete longitudinal outcomes, Statistica Neerlandica 57 (2003) 112–135.
[22] J.A. Nelder, R. Mead, A simplex method for function minimisation, The Computer Journal 7 (1965) 303–313.
[23] D.B. Rubin, Discussion of Diggle and Kenward: Informative drop-out in longitudinal data analysis, Applied Statistics 43 (1994) 80–82.
[24] K.V. Steen, G. Molenberghs, G. Verbeke, H. Thijs, A local influence approach to sensitivity analysis of incomplete longitudinal ordinal data, Statistical Modelling 1 (2001) 125–142.
[25] H. Thijs, G. Molenberghs, B. Michiels, G. Verbeke, D. Curran, Strategies to fit pattern-mixture models, Biostatistics 3 (2002) 245–265.
[26] A.B. Troxel, D.P. Harrington, S.R. Lipsitz, Analysis of longitudinal data with non-ignorable non-monotone missing values, Applied Statistics 47 (1998) 425–438.
[27] G. Verbeke, G. Molenberghs, H. Thijs, E. Lesaffre, M.G. Kenward, Sensitivity analysis for nonrandom dropout: A local influence approach, Biometrics 57 (2001) 7–14.