Computational Statistics & Data Analysis 46 (2004) 493 – 509 www.elsevier.com/locate/csda
Analyzing a randomized experiment with imperfect compliance and ignorable conditions for missing data: theoretical and computational issues Andrea Mercatanti∗ Dipartimento di Statistica e Matematica Applicata all’Economia, Universita di Pisa, V. Ridol 10, 56124 Pisa, Italy Received 1 January 2003; received in revised form 2 September 2003; accepted 4 September 2003
Abstract Theoretical and computational issues when making causal inference in randomized experiments with imperfect compliance and missing data are discussed. The analysis of the complications from which a randomized experiment can su1er is attractive, because it is not only limited to a randomized trial analysis. The template of a randomized experiment with imperfect compliance can be adopted in fact to identify and estimate causal e1ects also in observational studies. More, some complications like the presence of non-responses in the treatment and/or in the assignment to treatment are more plausible in the context of an observational study than in a randomized trial. From the theoretical point of view the complications in expliciting the likelihood function when the inference is based on ignorability conditions for the missing data mechanism are discussed. Then the inputs for implementing the EM algorithm and examples based on arti6cial data are presented. c 2003 Elsevier B.V. All rights reserved. Keywords: Causal inference; Missing data; Non-compliance; EM algorithm
1. Introduction Most of the methodologies proposed in the literature for evaluating causal e1ects are based on the concept of potential quantities (or counterfactuals). This means that, ∗
Fax: +390502216375. E-mail address:
[email protected] (A. Mercatanti).
c 2003 Elsevier B.V. All rights reserved. 0167-9473/$ - see front matter doi:10.1016/j.csda.2003.09.003
494
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
in order to de6ne causal e1ects, a comparison between the outcome of a generic individual i assigned to a treatment, and the outcome of the same individual if he was not assigned to that treatment is necessary (Rubin, 1974; Holland and Rubin, 1983). Usually, this comparison is made by a di1erence of potential outcome averages under the di1erent treatments, and over the whole population of units; this is the so-called average treatment e3ect (Holland, 1986). A randomized experiment with a binary treatment and imperfect compliance with respect to the assignments needs three basic variables in order to be de6ned: the outcome, Yi ; the treatment, Di ; and the assignment to treatment, Zi . The potential outcomes with respect to the assignments to treatments are de6ned as Yi (Zi = z) with z ∈ {1; 0}. The two components in any couple of potential quantities {Yi (Zi = 0); Yi (Zi = 1)}, are not contemporarily observable since one necessarily excludes the other, so the concept of potential quantities necessarily implies missing informations. This paper discusses the complications in handling missing data in a likelihood-based causal analysis for a randomized experiment with imperfect compliance when the missing data mechanism is supposed to be ignorable after conditioning on the observable quantities. In this context missing data have not only the meaning of unobserved potential quantities (for example the outcome of a treated individual if the individual would not be subjected to the treatment) but also one of a more general character, that is as non-responses in Yi , Di , and Zi (for example, a non-response in the outcome of a treated individual). The study of the complications from which a randomized experiment can su1er is important, because it is not only limited to a randomized trial analysis. The template of a randomized experiment with imperfect compliance can be adopted in fact for the identi6cation and estimation of treatment causal e1ects also in non-experimental situations. Angrist et al. (1996) show under which set of assumptions, a regression analysis supported by the use of instrumental variables identi6es causal treatment effects in observational studies. The template is that of a randomized experiment with imperfect compliance in the sense that the particular instrumental variable adopted should have the role of a random assignment for which the treatment does not necessarily comply. Moreover, in the context of this paper the presence of nonresponses in the treatment and/or in the assignment to treatment, is more plausible for an observational study than in a randomized trial. In an experimental context, the treatments and the assignments to treatment have the peculiarity of being “created” by a researcher who randomly assigned treatments to the units. So the probabilities to have non-responses in these two variables are usually remote. Differently, in observational studies the data are derived from other sources (for example answers to questionnaires) for which the presence of non-responses is not unusual. Section 2 brieGy describes the structure of a likelihood-based inference for causal e1ects in the presence of non-compliance. Section 3 presents the likelihood function on which the inference can be based in the presence of non-responses. Section 4 presents a computational way for maximizing the likelihood function or a posterior distribution through the EM algorithm. Section 5 illustrates two arti6cial data analysis examples.
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
495
2. Theoretical framework and basic notation In settings of imperfect compliance with respect to an assigned binary treatment, and on the basis of the concept of potential quantities, the whole population can be sub-divided into four sub-groups to characterize di1erent compliance behaviors. Units for which Zi = 1 implies Di = 1 and Zi = 0 implies Di = 0 (compliers) are induced to take the treatment by the assignment. Units for which Zi = 1 implies Di = 0 and Zi = 0 implies Di = 0 are called never-takers because they never take the treatment, while units for which Zi = 1 implies Di = 1 and Zi = 0 implies Di = 1 are called always-takers because they always take the treatment. Finally the units for which Zi =1 implies Di =0 and Zi = 0 implies Di = 1 do exactly the opposite of the assignment and are called deers. Each of these four groups de6ne a particular compliance-status. The starting point for handling the problem of missing data in a randomized experiment with imperfect compliance can be the likelihood function introduced by Imbens and Rubin (1997), and used in some empirical works, for example by Little and Yau (1998), and by Hirano et al. (2000). Given the assumptions of stable unit treatment value assumption (SUTVA) by which the potential quantities for each unit are unrelated to the treatment status of other units, and “Random assignment to treatment” by which the probability to be assigned to treatment is the same for each individual (Angrist et al., 1996) the likelihood function for the complete data X can be written: L(V|X) ˙ f(X; V) =
n
f(di ; yi ; V);
(1)
i=1
where X is the matrix formed by the observed and unobserved potential quantities, Xobs and Xmis ; f(di ; yi ; V) is the probability or density function for the i-row of X, (di ; yi ) = (dobs; i ; dmis; i ; yobs; i ; ymis; i ), and for a given value of the parametric vector V; dobs; i and yobs; i are the observed potential treatment and outcome; dmis; i and ymis; i are the unobserved potential treatment and outcome. The likelihood function based on the observed potential quantities is proportional to the integral of (1) respect to the unobserved part of X (Rubin, 1978; Imbens and Rubin, 1997): n L(V|Xobs ) ˙ f(di ; yi ; V) dXmis i=1 n
=
···
f(di ; yi ; V) dymis; i ddmis; i :
i=1
The resolution of the integral produces: i i i i (!a ga0 + !d gd0 )× (!n gn1 + !d gd1 ) L(V|Xobs ) ˙ i∈&(1;0)
×
i∈&(1;1)
i∈&(0;1) i i (!a ga1 + !c gc1 )×
i∈&(0;0)
i i (!n gn0 + !c gc0 );
(2)
496
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
where the parametric vector is V = (!a ; !n ; !c ; !d ; a0 ; a1 ; n0 ; n1; c0 ; c1 ; d0 ; d1 ) and &(d; z) is the set of units having treatment Dobs; i = d, and assigned to Zobs; i = z. Each of the four parameters !t represents the probabilities of an individual being in the t compliance status, where t = c (complier), n (never-taker), a (always-taker), d (deer); the function gtzi = gtz (yi ; tz ) is the outcome distribution for an individual in the t group and assigned to the treatment z.
3. The likelihood function with non-responses In this section the speci6cation of the likelihood function for a randomized experiment with imperfect compliance and with the possibility of having non-responses in the variables Yi , Di , Zi , is introduced. The problem of non-responses is, in general, particularly important and its study is justi6ed by diKculties in having complete datasets. The previous section showed that non-responses are not the only sources of missing information in randomized experiments with imperfect compliance; missing data in the form of unobserved potential quantities (that is the outcome and the treatment under the treatment not assigned: Ymis; i and Dmis; i ) are always present in this context. An analysis based on (2) can be performed only using datasets without nonresponses. Consequently, the units having at least one non-response in the variables Yi , Di , or Zi , should necessarily be dropped out of the analysis. But this procedure is not easily justi6ed, because in this way it would be necessary to satisfy the assumption called “missing completely at random”, by which the probability of non-response for every variable is the same for all the units. Indeed, only in this case deleting the units presenting at least one non-response in the variables Yi , Di , or Zi , does not inGuence a likelihood-based analysis (Rubin, 1976; Little and Rubin, 1987; Schafer, 1997). Less restrictive conditions can be imposed assuming the “missing at random” (MAR), and the “distinctness of parameter” (DOP) conditions (Rubin, 1976; Little and Rubin, 1987; Schafer, 1997). In order to explain these two conditions, let R indicates a n × p matrix of indicators, namely a matrix of binary variables assuming value 1 if the corresponding element of a generic n × p dataset X is observed (a response), and 0 otherwise (a non-response). Consequently the dataset X can be decomposed into two parts, the observed part, Xobs , and the unobserved part, Xmis . Let us introduce a model for R that depends on both X and a parameters vector ”: f(R|X; ”), and let us call it the “non-response mechanism”. The MAR assumption states that the probability or density function of R is independent on the unobserved part of the dataset, Xmis , after conditioning on the observed data, Xobs : f(R|X; ”) = f(R|Xobs ; Xmis ; ”) = f(R|Xobs ; ”):
(3)
Hence, by introducing the assumption of independence between individual behaviors, we can state that the probability of a certain value for ri (i row of R, i =1; : : : ; n) is the
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
497
same for all the individuals having the same value of the row vector xi; obs (observed part of the i row of X). To understand the importance of (3) we should consider the further condition of DOP which with MAR allows us to simplify the inference. The DOP condition (Little and Rubin, 1987; Rubin, 1987; Schafer, 1997), states that the parameters V of the model generating the data, f(X; V), and the parameters ” of the non-response mechanism have to be distinct. This means that the joint parameter space of (V; ”) must be the Cartesian cross-product of the marginal parameter spaces for V and ”, or from a Bayesian viewpoint that the prior distribution of V must be independent on that of ”. This is a reasonable condition in a lot of practical situations. Given the two conditions of MAR and DOP, it is possible to show that in a likelihood-based analysis, the consideration of the non-response mechanism can be avoided. In fact, the joint distribution of the observed quantities (namely the observed part of the dataset, Xobs , and the R matrix) under MAR is f(R; Xobs ; V; ”) = f(R; X; V; ”) dXmis = f(R|X; ”)f(X; V) dXmis = f(R|Xobs ; ”)
f(X; V) dXmis = f(R|Xobs ; ”) · f(Xobs ; V):
Under the further condition of DOP, a likelihood-based inference for V is not dependent on ”. It is then possible to carry out inference only by the likelihood function based on the observed part of the dataset: L(V|Xobs ) ˙ f(Xobs ; V) = f(X; V) dXmis : In other words, this means that the non-response mechanism can be ignored only after conditioning on the observable quantities. Recently, some methods for dealing with the problem of missing data in experiments with imperfect compliance have been proposed in the literature; for example Baker (2000) has proposed non-ignorable missing data mechanisms for analyzing a randomized cancer prevention trial with imperfect compliance, non-responses only in a binary outcome, and in the presence of an auxiliary variable (that is a variable observed after randomization and prior to the outcome). More recently, Yau and Little (2001) have proposed procedures based on an ignorable missing data mechanism for making inference about the Complier Average Causal E1ect in a particular setting of longitudinal study with baseline covariates and missing data only in the repeated outcome measures. In particular, Frangakis and Rubin (1999) have proposed a method for handling the problem of non-responses on the outcome, and in absence of always-takers. Their procedure is based on two conditions: the “compound exclusion restriction (CER) for never-takers” stating that assignments have no e1ect on both the potential outcomes and the response behaviors, and the “latent ignorability” of the non-response mechanism. The satisfaction of these two last assumptions is di1erent from supposing the MAR and DOP conditions satis6ed. The CER for never-takers and the latent ignorability make the missing data mechanism ignorable only after conditioning on the compliance status that cannot be always observed. The authors argued that latent ignorability is often more compelling in clinical trials analysis. This is because the compliance status can be considered an
498
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
important variable associated with an individual’s health characteristics related to the outcome and to the missing data mechanism. This justi6es the latent ignorability, that is the assumption that potential outcomes and associated potential non-response indicators are independent within each level of the compliance status. The argument is related to the case of clinical trials, where the outcome is collected subsequently to the treatment. But di1erent collection schemes can be less favourable to the assumption of latent ignorability. The randomized experiment with imperfect compliance is a template that can be adopted to make causal inference in non-experimental situations, the situations involving instrumental variables. These are the cases found in economic and social studies where it is usual to have data collected by a sample survey or census all at the same time, and where the response probabilities could not be a1ected by the act of non-compliance, but more likely by other variables associated with individual behavior characteristics. In these non-experimental data collection schemes assuming MAR and DOP can be more plausible. The satisfaction of MAR and DOP conditions is not in general less restrictive than the satisfaction of the latent ignorability and the CER for never-takers, the choice depends essentially on the context of the study and has to be evaluated case by case. In order to stress the importance of making a calibrated choice of the missing data mechanism, two examples based on arti6cial datasets will be proposed in Section 5. In a randomized experiment with imperfect compliance and missing data the likelihood function based on Xobs , under MAR, DOP, and the usual SUTVA and “Random assignment to treatment” conditions (for these two last assumptions see Angrist et al., 1996) can be obtained by integrating (2) with respect to the non-responses: L(V|Xobs ) ˙ · · · f(Xobs ; V) d zmis; i ddmis; i dymis; i =
N
···
f(di ; yi ; V) dymis; i ddmis; i d zmis; i ddmis; i dymis; i ;
(4)
i=1
where: ymis; i and dmis; i are the unobserved parts of yi and di (due to the concept of potential quantities), like in (2); zmis; i , dmis; i and ymis; i are the non-responses on the potentially observed part of yi and di . The parametric vector V is V = (!a ; !n ; !c ; !d ; a0 ; a1 ; n0 ; n1; c0 ; c1 ; d0 ; d1 ; z ): The only di1erence with respect to the vector V in (2) is the presence of the parameter representing the probability of assignment to the treatment: z = P(Zi = 1). This parameter was omitted in (2) because it is not inGuential on a likelihood-based analysis, given that the “Random assignment to treatment” condition was supposed to be satis6ed. But in the (4), given the possibility to have non-responses in the assignment to treatment, the probability to be assigned, z , has to be included in the likelihood function. The resolution of the multiple integrations in (4) produces the likelihood function based on the observed quantities that factors into eight terms (some details are in
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
499
Table 1 Speci6cation of the probability or density functions in (5) i i i i i fd; z; 1; 1; 1 = I(d=1; z=0) (1 − z )(!a ga0 + !d gd0 ) + I(d=0; z=1) z (!n gn1 + !d gd1 ) + I(d=1; z=1) z (!a ga1 i i i +!c gc1 ) + I(d=0; z=0) (1 − z )(!n gn0 + !c gc0 )
fd; z; 0; 1; 1 = I(d=1; z=0) (1 − z )(!a + !d ) + I(d=0; z=1) z (!n + !d ) + I(d=1; z=1) z (!a + !c ) +I(d=0; z=0) (1 − z )(!n + !c ) i i i i i i f:; z; 1; 0; 1 = I(z=0) (1 − z )(!a ga0 + !d gd0 + !n gn0 + !c gc0 ) + I(z=1) z (!a ga1 + !d gd1 i i +!n gn1 + !c gc1 )
fd; z; 0; 0; 1 = I(z=0) (1 − z ) + I(z=1) z i i i i i i i i fd; :; 1; 1; 0 = I(d=0) (!d gd1 + !n gn0 + !n gn1 + !c gc0 ) + I(d=1) (!d gd0 + !a ga0 + !a ga1 + !c gc1 )
fd; :; 0; 1; 0 = I(d=0) (!d + !n + !c ) + I(d=1) (!d + !a + !c ) i + gi + gi + gi + gi + gi + gi + gi f:; :; 1; 0; 0 = gd1 n0 n1 a0 a1 c0 c1 d0
f:; :; 0; 0; 0 = 1
Appendix A): L(V|Xobs ) =
d=1;0 z=1;0 i∈ (d; z;1;1;1)
×
i∈& (:; :; 1;0;0)
fd; z; 0; 1; 1
d=1;0 z=1;0 i∈(d; z;0;1;1)
fd; z; 0; 0; 1
z=1;0 i∈(:; z;0;0;1)
fd; :; 1; 1; 0 ×
d=1;0 i∈ (d;:; 1;1;0)
×
f:; z; 1; 0; 1 ×
z=1;0 i∈ (:; z;1;0;1)
×
fd; z; 1; 1; 1 ×
f:; :; 1; 0; 0 ×
fd; :; 0; 1; 0
d=1;0 i∈ (d;:; 0;1;0)
f:; :; 0; 0; 0 :
(5)
i∈ (:; :; 0;0;0)
Table 1 presents the speci6cation of the probability, or density, functions in (5). 4. Computation by the use of the EM algorithm The maximization of the observed log-likelihood function, l(V|Xobs ), can be performed by usual iterative methods. In particular, in order to exploit the particular structure of the likelihood function induced by the MAR and DOP assumptions, the maximization can be easily obtained using the EM algorithm (Dempster et al., 1977; Tanner, 1996). This is a computational method that can be used as a part of a likelihood or Bayesian analysis, whose goal is in locating the mode of the likelihood function or of a posterior distribution. It is a deterministic algorithm that works by running two steps iteratively: the E-step and the M-step. The basic idea of the EM algorithm is in
500
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
augmenting the observed data with imputed missing data (during the E-step) in order to produce a log-likelihood function easier to maximize (during the M-step) than the original log-likelihood function. The EM algorithm is attractive in making maximum likelihood inference in our context because if the compliance status and the outcome were known for all units, the likelihood would not involve mixture components (some details are in Appendix B). When the augmented log-likelihood function is linear in the missing data, the EM algorithm corresponds to 6ll-in missing data and then udpdating parameter estimates. In more general situations expected suKcient statistics are imputed, or the expected augmented log-likelihood function is computed at each iteration of the algorithm. This section outlines the general structure of the EM algorithm for a randomized experiment with imperfect compliance and missing data. Tables 2 and 3 present the inputs for calculating f(Xmis |Xobs ; V(t−1) ) in the E-step. The missing data in the complete likelihood function (1) can be: the outcome and/or the compliance status. In particular, the compliance status can be unobserved because of two reasons. One reason is due to the concept of potential outcomes, in fact the compliance status of a unit having treatment Di = d and assignment Zi = z, is exactly determined only by knowing the value of the treatment Di under the alternative assignment Zi = |1 − z|, and this can represent an unobserved counterfactual situation. For example a unit with Di = 1 and Zi = 1 could be either a complier or an always-taker. The other reason is due to the missingness in the treatment Di and/or in the assignment Zi . For example, in the absence of de6ers a unit for which Di = 0 and Zi = 1 is surely a never-taker; but if the treatment is missing the compliance status is unobserved, and the unit could be a complier, a never-taker, or an alwaystaker. Table 2 presents the inputs for calculating the conditional expected values of the unobserved compliance status. The compliance status can be represented by a fourcomponent indicator t = c (complier), n (never-taker), a (always-taker), d (deer). The conditional probability of subject i, in the set (ryobs; i ; rdobs; i ; rzobs; i ), being type t given the observed data and a current value of the vector V, is obtainable by a ratio of two quantities. ryobs; i , rdobs; i and rzobs; i are indicators assuming value 0 in case of non-response and value 1 in case of response on yobs; i , dobs; i and zobs; i respectively. The numerator of the ratio is the corresponding Table 2 entry and the denominator is the corresponding row total. The expected value of the compliance status for a subject i, is then represented by a four-component indicator of conditional probabilities. Table 3 presents the conditional distributions of the missing outcome, f(ymis; i |Xobs ; V), for an individual i in the set (0; rdobs; i ; rzobs; i ). If the augmented log-likelihood is linear in the missing outcomes (this is the case of a binary outcome), then the E-step replaces each unobserved compliance status by its conditional expected value, and each unobserved outcome by its expectation E(Ymis; i |Xobs ): E(Ymis; i |Xobs ) =
ymis; i f(ymis; i |Xobs ; V) dymis; i :
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
501
Table 2 Inputs for calculating the conditional probability of subject i being type t given observed data and parametric vector V
dobs; i
(1; 1; 1)
(1; 0; 1)
(0; 0; 1)
(1; 1; 0)
t=c
t=n
t=a
t=d
0
0
0
i !d gd1
0
0
0
1
0
i !n gn0 i !n gn1
1
0
0
0
1
i !c gc1
0
i !a ga1
i !a ga0
i !d gd0
0
0
0
!c
!n
0
0
0
1
0
!n
0
!d
1
0
0
0
!a
!d
1
1
!c
0
!a
0
.
0
i !c gc0
i !n gn0
i !a ga0
i !d gd0
.
1
i !c gc1
.
0
!c
!n
!a
!d
.
1
!c
!n
!a
!d
0
.
i (1 − z )!c gc0
i (1 − z )!n gn0
0
i z !d gd1
i z !c gc1
0
i (1 − z )!a ga0
i (1 − z )!d gd0
(1 − z )!c
(1 − z )!n
0
z !d
(1 − z )!a
(1 − z )!d
1
(0; 1; 0)
Subject type t
i !c gc0
1 (0; 1; 1)
zobs; i
0
.
.
i !n gn1
i +z !n gn1
i !a ga1
i +z !a ga1
i !d gd1
+z !n 1
.
z ! c
0
+z !a (1; 0; 0)
.
.
i (1 − z )!c gc0
i (1 − z )!n gn0
i (1 − z )!a ga0
i (1 − z )!d gd0
!c
!n
!a
!d
i +z !c gc1
(0; 0; 0)
.
.
i +z !n gn1
i +z !a ga1
i +z !d gd1
The subsequent M-step then maximizes the log-likelihood function based on the augmented dataset, that is based on the dataset created by merging the observed and the imputed data. This is equivalent to a weighted maximization of the log-likelihood function, where subjects are di1erently classi6ed in the di1erent compliance groups, t, with weights equal to the conditional probabilities of being in t calculated in the E-step.
502
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
Table 3 Conditional distributions of the missing outcome, f(ymis; i |Xobs ; V), for an individual i in the set (0; rdobs; i ; rzobs; i )
(0; 1; 1)
(0; 0; 1)
(0; 1; 0)
dobs; i
zobs; i
f(ymis; i |Yobs ; V)
0
0
i ! + gi ! )=(! + ! ) (gc0 c c n n0 n
0
1
1
0
1
1
.
0
.
1
0
.
i ! + [(1 − )gi + gi ]! + gi ! (1 − z )gc0 c z n0 z n1 n z d1 d
1
.
i ! + [(1 − )gi + gi ]! + gi ! (1 − z )gd0 z a0 z a1 a z c1 c d
i ! + gi ! )=(! + ! ) (gn1 n n d d1 d i ! + gi ! )=(! + ! ) (ga0 a a d d0 d i ! + gi ! )=(! + ! ) (gc1 c c a a1 a
i ! + gi ! + gi ! + gi ! ) (gc0 c n0 n a0 a d0 d i ! + gi ! + gi ! + gi ! ) (gc1 c n1 n a1 a d1 d
=[(1 − z )(!c + !n ) + z (!n + !d )]
=[(1 − z )(!a + !d ) + z (!c + !a )]
(0; 0; 0)
.
.
i + gi ]! + [(1 − )gi + gi ]! [(1 − z )gc0 z c1 c z d0 z d1 d
i + gi ]! + [(1 − )gi + gi ]! +[(1 − z )gn0 z n1 n z a0 z a1 a
Another simpli6ed case is when the augmented log-likelihood is linear in some missing suKcient statistics. This is the case of a normally distributed outcome, where 2 the log-likelihood is linear in the statistics y and i mis; i i ymis; i . The procedure is the same as the previous case of linearity in the missing outcomes apart from the fact that the E-step replaces conditional expected suKcient statistics this time. For obtaining a measure of uncertainty about the maximum likelihood estimate usual numerical or simulation-resampling methods can be used. Results produced by some analyses on arti6cial datasets have shown that increasing the relative amount of missing data complicates the calculations of the second derivatives of the log-likelihood. Consequently the Hessian matrix can be more easily and fastly calculated by the outer product of the 6rst derivatives at the maximum likelihood point. Alternatively, the bootstrap can be an attractive option given its good 6nite sample properties; analyses based on simulations have shown the calculations are more time consuming even if not in a prohibitive way. If one makes no further assumptions except SUTVA, randomization of assignment, MAR and DOP, then the likelihood function will generally not have a unique maximum. A unique maximum can be obtained by assuming the exclusion restriction and the monotonicity condition, but in more general cases some additional structure may be desiderable. This can be obtained by introducing prior distributions for the parameters. The above delineated general structure of the EM algorithm does not change, even in a Bayesian context, apart from the M-step which will involve the maximization of an expected augmented log-posterior distribution.
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
503
5. Two examples based on simulations In this section two examples based on arti6cial datasets are presented. The 6rst example concerns an application of the proposed procedure and a comparison with results obtained from inference based only on the respondent units. We are considering the case of a randomized experiment with imperfect compliance with a normally distributed outcome, under both monotonicity and the exclusion restriction, i i i i that is by assuming !d = 0, gtzi = (tz ; tz ), ga0 = ga1 , and gn0 = gn1 . Consequently, the parametric vector is V = (!a ; !n ; !c ; a ; n ; c0 ; c1 ; a ; n ; c0 ; c1 ; z ). Table 4 presents the parameters of a hypotetical population distribution; Table 5 presents a missing data mechanism satisfying the MAR and DOP assumptions. In particular, this is the case of an always observed treatment, where the probabilities of non-response in the outcome or in the assignment to treatment depend on the observed value of the treatment. A sample of size 500 has been repeatedly drawn from the hypotetical population described in Table 4, with P(Zi = 1) = 0:5, and non-responses have been generated accordingly to the mechanism described in Table 5. Table 6 reports mean biases, root mean squared errors, coverage rates of 95% con6dence intervals, and mean widths of the intervals, for the repeated estimates of the three probabilities !t and of the complier average causal e1ect (CACE): c1 − c0 . The estimates have been obtained by (i) the MLE calculated using the EM algorithm under MAR and DOP as described in the previous section, (ii) the MLE calculated using the EM algorithm but after deleting the units having at least one non-response, that is erroneously supposing the “missing completely at random” (MCAR) condition is satis6ed. The CACE has been estimated by also using the instrumental variable estimator (IVE) under the MCAR assumption. Simulations are calculated over 1000 arti6cial datasets from the hypotetical distribution. Table 4 Hypotetical population distribution
Ci
P(Ci |V)
f(yi |Ci = t; Zi = 0; V)
f(yi |Ci = t; Zi = 1; V)
a n c
0.3 0.45 0.25
N(0:5; 0:3) N(1; 0:2) N(0:6; 0:2)
N(0:5; 0:3) N(1; 0:2) N(0:3; 0:5)
Table 5 Hypotetical missing data mechanism
ri = (ryobs; i ; rdobs; i ; rzobs; i )
P { ri | Di = 1 }
P { ri | D i = 0 }
(1; 1; 1) (1; 1; 0) (0; 1; 1) (0; 1; 0)
0.42 0.4 0.3 0.12
0.64 0.2 0.2 0.04
504
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
Table 6 Operating characteristics of various procedures for replications from the population of Table 4 with P(Zi = 1) = 0:5
95% Interval Parameter
Estimator
Mean bias
Root MSE
Coverage rate
Mean width
!a
MLE (MAR + DOP) MLE (MCAR)
0.027 0.084
0.046 0.092
0.812 0.355
0.128 0.135
!n
MLE (MAR + DOP) MLE (MCAR)
−0.091
0.022
0.048 0.103
0.871 0.559
0.150 0.201
!c
MLE (MAR + DOP) MLE (MCAR)
−0.049
0.081 0.079
0.734 0.719
0.196 0.192
CACE
MLE (MAR + DOP) MLE (MCAR) IVE
0.039
0.097 0.161 0.758
0.916 0.869 0.972
0.345 0.500 2.089
0.062
−0.061
0.157
Table 7 Hypotetical population distribution
Ci
P(Ci |V)
f(yi |Ci = t; Zi = 0; V)
f(yi |Ci = t; Zi = 1; V)
n c
0.4 0.6
N(0; 1) N(2; 1)
N(0; 1) N(2:5; 1)
Table 6 shows that estimations based only on the respondents clearly undercover the three probabilities !t and the CACE. Only the IVE has a better coverage rate but at the cost of a dramatically higher mean width of associated 95% intervals. Moreover, estimations based on MAR and DOP obtain substantially better accuracies as suggested by the mean biases and root mean squared errors. Another interesting issue is the case of a hypotetical distribution in absence of always-takers and de6ers, and with non-responses only in a normally distributed outcome. This is the framework studied by Frangakis and Rubin (1999) and based on the latent ignorability and the CER for never-takers. Now we consider the hypotetical population distribution described in Table 7, which is characterized by: !a = 0, i i = gn1 . A sample of size 500 has been repeatedly drawn !d = 0, gtzi = (tz ; tz ), and gn0 from this population, with 250 units assigned Zi = 1 and 250 units assigned Zi = 0. Then non-responses have been generated according to a mechanism where missing data are only in the outcome and depend on the assignment to treatment (this satis6es the MAR and DOP conditions). In each of the three conditions of Table 8, we 6x the non-response probability given that Zi = 0 at P{ri = (ryobs; i ; rdobs; i ; rzobs; i ) = (0; 1; 1)|Zi = 0} = 0:5:
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
505
Table 8 Inference for the Complier Average Causal E1ect for replications from the population of Table 7 with 250 units assigned Zi = 1 and 250 assigned Zi = 0
P(ri∗ |Zi = 1)
0.00
0.05
0.10
ri∗
Estimator
MLE (MAR + DOP) Latent ignorability + CER for never-takers MLE (MAR + DOP) Latent ignorability + CER for never-takers MLE (MAR + DOP) Latent ignorability + CER for never-takers
Mean bias
Root MSE
95% Interval Coverage rate
Mean width
0.010
0.159
0.893
0.502
−0.498
0.526
0.220
0.706
0.032
0.200
0.833
0.530
−0.480
0.512
0.250
0.714
−0.033
0.273
0.724
0.566
−0.453
0.488
0.307
0.729
: (ryobs; i ; rdobs; i ; rzobs; i ) = (0; 1; 1).
The parameter that we vary is the non-response probability given that Zi = 1, that is P{ri = (ryobs; i ; rdobs; i ; rzobs; i ) = (0; 1; 1)|Zi = 1} = 0 or 0:05 or 0:10: Simulations are calculated over 1000 arti6cial datasets from the hypotetical distribution for each experimental condition. Table 8 compares the results obtained by applying the method proposed by Frangakis and Rubin (1999) and the MLE under MAR and DOP in undertaking inference for the CACE. Results show a relative advantage in using the MLE under MAR and DOP in terms of accuracy, coverage rates and width of associated 95% intervals. Note that this advantage decreases when the probability to non-response given Zi =1, P{ri =(0; 1; 1)|Zi =1} increases. This example stresses the importance of a well calibrated choice of the missing data mechanism taking into account the peculiarities of the collection scheme. We are supposing an ignorable missing data mechanism that given the peculiarities of the hypotetical distribution could be plausible in a observational study where the data are not collected via an experimental design. Then applying an inferential method based on conditions more adapted to randomized and, in particular, clinical trials (latent ignorabilty and CER for never-takers) can be quite misleading according to the non-responses probabilities. An interesting example of results produce under various missing data mechanisms (MAR, latent ignorabilty, CER for never-takers or compliers) in a clinical trial is in Mealli and Rubin (2003). For obtaining the results in the two proposed examples, the EM algorithm has been implemented using the statistical software GAUSS. This does not present particular inconveniences or diKculties given a basic knowledge of programming. MatLab, Stata, S-Plus can be other usual statistical softwares adapted easily to implement the EM algorithm.
506
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
6. Conclusions The problem of non-responses in a randomized experiment with imperfect compliance has been considered. The proposed analysis relies on the assumption that the non-response mechanism is ignorable after conditioning on the observable quantities. This can be an appropriate condition if the data collection scheme supports the hypothesis that non-response probabilities are a1ected by observable variables related to individual behaviors, and in particular in presence of missing data in the treatment and in the assignment to treatment. Besides theoretical results, a computational procedure for maximizing the loglikelihood function or the posterior distribution was proposed. This was based on the EM algorithm in order to exploit the particular structure of the likelihood induced by the satisfaction of the MAR and DOP assumptions. Two examples based on arti6cial data illustrate the relative merits of the proposed procedure for particular hypotetical distributions.
Acknowledgements The author thank E. Cleur, G. Ghilardi, A. Ichino, G.W. Imbens, F. Mealli, as well as the Associate Editor and two anonymous referees for useful comments and suggestions.
Appendix A The likelihood function (2), factors in four terms, and every term includes the likelihood for the units in the set &(Dobs; i ; Zobs; i ). The resolution of the integrals in (4) produces a similar factorization, but into eighteen terms. Each of these terms refers to a set obtainable by intersecting the four sets &(Dobs; i ; Zobs; i ): {&(0; 0); &(0; 1); &(1; 1); &(1; 0)} with the eight sets (ryobs; i ; rdobs; i ; rzobs; i ), where ryobs; i , rdobs; i and rzobs; i are indicators assuming value 0 in case of non-response and value 1 in case of response on yobs; i , dobs; i and zobs; i , respectively. All the possible intersections between &(Dobs; i ; Zobs; i ) and (ryobs; i ; rdobs; i ; rzobs; i ) produce the (4 × 8) = 32 sets (Dobs; i ; Zobs; i ; ryobs; i ; rdobs; i ; rzobs; i ). But given that the units in (1; 1; 1) ∪ (0; 1; 1) have at least one non-response in the treatment or in the assignment to treatment, the number of possible sets reduce to eighteen. For example the units in the set (1; 0; 1) have a non-response in the treatment, so it is not possible to specify the four possible sets created by intersecting (1; 0; 1) with {&(0; 0); &(0; 1); &(1; 1); &(1; 0)}. The question can be resolved by de6ning (:; z; 1; 0; 1) = (0; z; 1; 0; 1) ∪ (1; z; 1; 0; 1) to be the units in set (1; 0; 1) and for which Zobs; i = z. Analogous arguments hold for the other units. Now, let us consider the units in the set (:; z; 1; 0; 1); the probability
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
507
or density function for a unit in this set is obtainable by integrating f(di ; yi ; V) with respect to the unobserved quantities: · · · f(di ; yi ; V) dymis; i ddmis; i ddmis; i : From (4), the integrations with respect to ymis; i and dmis; i produce i i i i + !d gd0 ) + I(d=0) (!c gc0 + !n gn0 )} ddmis; i if z = 0; (1 − z ) {I(d=1) (!a ga0 z
i i i i { I(d=1) (!a ga1 + !c gc1 ) + I(d=0) (!n gn1 + !d gd1 )} ddmis; i if z = 1;
the further integration respect to dmis; i eliminates the indicators I(d=1) and I(d=0) : i i i i (1 − z )(!a ga0 + !d gd0 + !c gc0 + !n gn0 ) if z = 0; i i i i + !c gc1 + !n gn1 + !d gd1 ) if z = 1: z (!a ga1
In other words, the units in the set (:; z; 1; 0; 1) are a mixture of compliers, de6ers, always-takers and never-takers assigned to the treatment z. Analogous arguments hold for the other units.
Appendix B The EM algorithm works in general as follows (at the t iteration): • in the E-step, by calculating the expected l(V|X) with respect to f(Xmis |Xobs ; V(t−1) ): (t−1) ) = l(V|X) f(Xmis |Xobs ; V(t−1) ) dXmis : Q(V; V If l(V|X) is linear in Xmis this is equivalent to augmenting the dataset by the quan(t) tities Xmis : (t) Xmis = E(Xmis |Xobs ; V(t−1) );
• and in the M-step, by maximizing Q(V; V(t−1) ) with respect to V to obtain V(t) . If l(V|X) is linear in Xmis this is equivalent to maximize the log-likelihood based on (t) the augmented dataset, l(V|Xmis ; Xobs ), to obtain V(t) . In the presence of non-responses, the evaluation of conditional expectations in the E-step is easy under MAR. This assumption allows us to work without considering
508
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
the non-response mechanism; indeed under (3), and for every value of ”: f(Xmis |Xobs ; R; V(t−1) ; ”) = =
f(R|Xobs ; ”)f(Xmis ; Xobs ; V(t−1) ) f(R|Xobs ; ”)f(Xmis ; Xobs ; V(t−1) ) dXmis f(Xmis ; Xobs ; V(t−1) ) f(Xmis ; Xobs ; V(t−1) ) = f(Xobs ; V(t−1) ) f(Xmis ; Xobs ; V(t−1) ) dXmis
= f(Xmis |Xobs ; V(t−1) ): Again in the M-step, by the DOP assumption, maximizations do not require us to take into account the non-response mechanism given that: (t) (t) (t) f(R; Xmis ; Xobs ; V; ”) = f(R|Xmis ; Xobs ; ”) · f(Xmis ; Xobs ; V):
References Angrist, J.D., Imbens, G.W., Rubin, D.B., 1996. Identi6cation of causal e1ect using instrumental variables. J. Amer. Statist. Assoc. 91, 444–455. Baker, S.G., 2000. Analyzing a randomized cancer prevention trial with a missing binary outcome, an auxiliary variable, and all-or-none compliance. J. Amer. Statist. Assoc. 95, 43–50. Dempster, A.P., Laird, N., Rubin, D.B., 1977. Maximum likelihood estimation from incomplete data using the EM algorithm. J. Roy. Statist. Soc., Ser. B 39, 1–38. Frangakis, C.E., Rubin, D.B., 1999. Addressing complications of intention to treatment analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika 86, 365–379. Hirano, K., Imbens, G.W., Rubin, D.B., Zhou, X., 2000. Estimating the e1ect of an inGuenza vaccine in an encouragement design. Biostatistics 1, 69–88. Holland, P.W., 1986. Statistics and casual inference. J. Amer. Statist. Assoc. 81, 945–960. Holland, P.W., Rubin, D.B., 1983. On Lord’s Paradox. In: Wainer, H., Messick, S. (Eds.), Principals of Modern Psychological Measurement. Lawrence Erlbaum, New Jersey. Imbens, G.W., Rubin, D.B., 1997. Bayesian inference for causal e1ects in randomized experiments with non-compliance. Ann. Statist. 25, 305–327. Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. Wiley, New York. Little, R.J.A., Yau, L., 1998. Statistical techniques for analyzing data from prevention trials: treatment of no-shows using Rubin’s causal model. Psychol. Methods 3, 147–159. Mealli, F., Rubin, D.B., 2003. Assumptions when analyzing randomized experiments with noncompliance and missing outcomes. Health Services Outcome Res. Methodol., forthcoming. Rubin, D.B., 1974. Estimating causal e1ects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66, 688–701. Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592. Rubin, D.B., 1978. Bayesian inference for causal e1ects: the role of randomization. Ann. Statist. 6, 34–58. Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York. Schafer, J.L., 1997. Analysis of Incomplete Multivariate Data. Chapman & Hall, London.
A. Mercatanti / Computational Statistics & Data Analysis 46 (2004) 493 – 509
509
Tanner, M.A., 1996. Tools for Statistical Inference. Springer, Berlin. Yau, L., Little, R.J., 2001. Inference for the complier-average causal e1ect from longitudinal data subject to noncompliance and missing data, with application to a job training assessment for the unemployed. J. Amer. Statist. Assoc. 96, 1232–1244.